Itemset mining on a distributed hash table system

Author name: Enri Gaci
Title: Itemset mining on a distributed hash table system
Year: 2020-2021
Supervisor: Christos Tryfonopoulos

Summary

In today’s society, using applications for everyday tasks is so natural that most people do not even notice it. Shopping online, reading articles, and consuming content such as videos have at least one thing in common: they all suggest content based on the results of frequent itemset mining algorithms. This data mining mechanism helps us find restaurants we may like to visit or activities we may want to try without ever having to state explicitly what we want. Frequent itemset mining groups the items of a data set according to how often they occur together (e.g., are used, visited, or bought together). As long as the data sets to be mined could be stored on a single machine, the results of classic algorithms were satisfactory, but as the number of applications rises, so does the volume of data that must be taken into consideration. Centralized solutions become less viable as the volume grows, and researchers around the world are studying ways to obtain results from distributed systems. The aim of this thesis is to design and implement a distributed system capable of finding frequent itemsets. We start by running the computations on a single computer, and then investigate methods to distribute the data across multiple machines and perform distributed itemset mining. Our goal is for the itemsets produced by the distributed system to match those produced by the centralized system with the highest possible precision. Distributing the data gives us the opportunity to study more complex problems, such as protein synthesis and the coexistence of drugs. We may find relations in data that we could not handle before and make better predictions regarding weather or physical phenomena. Every obstacle we overcome opens the door to new and better systems that can provide more opportunities to people through access to new technologies.
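
To make the notion of frequent itemset mining concrete, the following is a minimal Python sketch of a brute-force, single-machine count. It is not the algorithm or the system developed in the thesis. The function names `frequent_itemsets` and `precision`, the parameters `min_support` and `max_size`, the sample baskets, and the particular definition of precision (the fraction of candidate itemsets that also appear in a reference result) are all assumptions introduced here for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Return {itemset: count} for itemsets found in at least min_support transactions.

    Brute-force enumeration, suitable only for tiny inputs (illustrative assumption).
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # de-duplicate items within a transaction
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


def precision(candidate, reference):
    """Fraction of candidate itemsets that also appear in the reference result.

    This definition is an assumption; the thesis may measure precision differently.
    """
    if not candidate:
        return 0.0
    return len(set(candidate) & set(reference)) / len(candidate)


# Hypothetical shopping baskets, invented for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]

centralized = frequent_itemsets(baskets, min_support=2)
for itemset, count in sorted(centralized.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

In this sketch, a result obtained from a distributed run could be passed as `candidate` to `precision`, with the single-machine result as `reference`, to see how closely the two agree.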

Keywords: distributed hash table, frequent itemset mining, peer-to-peer systems