Distributed Trajectory Clustering of Vessel AIS Data

Author name: Stamatis Stefanopoulos
Title: Distributed Trajectory Clustering of Vessel AIS Data
Year: 2020-2021

Summary

Trajectory clustering is an important problem in which the position data of mobile objects, such as vehicles and vessels, is analyzed to extract knowledge that is later used for a wide range of management tasks. The recent proliferation of data-gathering devices allows data to be collected in much larger volumes. This challenges existing clustering algorithms, which by design are not always able to handle large datasets. One of the best-known trajectory clustering algorithms is TRACLUS, a generalization of DBSCAN to trajectory line segments.

However, because of the iterative approach and the repeated use of a spatial index that it inherits from DBSCAN, TRACLUS’s performance degrades as datasets grow, and its execution can be extremely slow in some cases. To tackle this shortcoming, we propose a distributed implementation of TRACLUS, built on Apache Spark, that can operate on very large datasets by applying one of two types of partitioning to the input data. Spatial partitioning splits the data according to its spatial distribution. Random partitioning splits the dataset at random into balanced subsets, without considering spatial criteria. Results from an empirical evaluation on real-world trajectories show that the proposed distributed variants improve runtime performance without jeopardizing the quality of the results or the clustering efficiency.
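
The difference between the two partitioning types can be illustrated with a minimal, single-machine sketch in plain Python. It is not the thesis implementation and does not use Apache Spark. The uniform grid, the midpoint rule, and the names spatial_partition, random_partition, cell_size, and num_parts are assumptions made only for illustration; the summary does not state how the spatial split is actually computed.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]  # a trajectory line segment: (start, end)


def spatial_partition(segments: List[Segment],
                      cell_size: float) -> Dict[Tuple[int, int], List[Segment]]:
    """Group segments by the grid cell that contains their midpoint.

    Assumption: a uniform grid keyed on the segment midpoint. Nearby
    segments tend to land in the same partition, but partition sizes
    follow the data density and may be unbalanced.
    """
    parts: Dict[Tuple[int, int], List[Segment]] = defaultdict(list)
    for (x1, y1), (x2, y2) in segments:
        mx, my = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        key = (math.floor(mx / cell_size), math.floor(my / cell_size))
        parts[key].append(((x1, y1), (x2, y2)))
    return dict(parts)


def random_partition(segments: List[Segment],
                     num_parts: int,
                     seed: int = 0) -> List[List[Segment]]:
    """Shuffle the segments, then deal them round-robin into num_parts lists.

    Partition sizes differ by at most one, but spatial locality is ignored.
    """
    shuffled = list(segments)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::num_parts] for i in range(num_parts)]
```

In this sketch, the spatial variant keeps neighboring segments together at the cost of possibly uneven partition sizes, while the random variant guarantees balance but scatters neighbors across partitions. Each partition could then be clustered independently, with a later step to reconcile the per-partition results.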