Restoring consistency in large-scale Knowledge Graphs

Author name: Nikolaos Paraskakis
Title: Restoring consistency in large-scale Knowledge Graphs
Year: 2024-2025
Supervisor: Alexandros Artikis

Summary

The rapid growth and adoption of Knowledge Graphs (KGs) across domains such as biomedical informatics, enterprise systems, and semantic search have underscored the need to maintain their logical consistency. Large-scale KGs, often built from heterogeneous and noisy data sources, are highly susceptible to inconsistencies that impair reasoning and query reliability. This work proposes a framework that splits the KG into modules and performs inconsistency detection and repair on them in parallel, using various fixing strategies. The modules are merged using a neighborhood-based logic and a specified hop length, enabling the framework to detect and repair inconsistencies in KGs expressed in OWL2 Description Logic (i.e., SROIQ(D)).

To address the memory limitations of OWL2 reasoners and the need for high parallelism in large KGs, the framework leverages the big data platforms Apache Hadoop and Apache Spark, which enable distributed processing and scalability up to a billion triples. The implementation integrates a triple store for efficient data access and uses SPARQL for querying. This work examines the performance of three OWL2 reasoners (HermiT, Pellet, and JFact), the effectiveness of different fixing approaches, and the impact of the hop length on (i) the completeness of the result and (ii) the processing time. Experimental evaluation on the Lehigh University Benchmark (LUBM) dataset demonstrates the framework’s effectiveness, marking an advancement in the size of KGs (expressed in OWL2) that can be handled.
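The summary does not spell out how modules are built from neighborhoods, so the following is only an illustrative sketch of the general idea of a hop-bounded neighborhood, not the framework's actual algorithm. It assumes triples are plain `(subject, predicate, object)` tuples and treats edges as undirected; the function names `build_adjacency` and `k_hop_module` and the LUBM-flavoured sample data are hypothetical.

```python
from collections import defaultdict, deque


def build_adjacency(triples):
    """Index an undirected view of the graph: subject <-> object."""
    adjacency = defaultdict(set)
    for subject, _predicate, obj in triples:
        adjacency[subject].add(obj)
        adjacency[obj].add(subject)
    return adjacency


def k_hop_module(triples, seed, hops):
    """Return the triples whose subject and object both lie within
    `hops` edges of `seed` (breadth-first search with a depth limit)."""
    adjacency = build_adjacency(triples)
    distance = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if distance[node] == hops:
            continue  # do not expand beyond the hop limit
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return [t for t in triples if t[0] in distance and t[2] in distance]


# Hypothetical sample data, loosely modelled on LUBM vocabulary.
TRIPLES = [
    ("alice", "advisor", "bob"),
    ("bob", "worksFor", "dept0"),
    ("dept0", "subOrganizationOf", "univ0"),
    ("carol", "memberOf", "dept0"),
]

if __name__ == "__main__":
    for hops in range(4):
        module = k_hop_module(TRIPLES, "alice", hops)
        print(hops, len(module))
```

In this toy graph a larger hop length pulls more triples into the module around `alice`, which mirrors the trade-off the summary mentions: a larger neighborhood can expose more of the axioms involved in an inconsistency, at the cost of more data for the reasoner to process.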