Monitoring the Evolution of Scientific Topics and their Impact

Author name: Panagiotis Deligiannis
Title: Monitoring the Evolution of Scientific Topics and their Impact
Year: 2024-2025
Supervisor: Thanasis Vergoulis

Summary

The aim of this thesis is to train topic models and to estimate how topics evolve over time. For this purpose we employ Latent Dirichlet Allocation (LDA) topic models, trained on scientific publications related to cancer research. Initially, we limit the corpus to texts published in four consecutive years. These texts are preprocessed and vectorized, and then split into training, validation and test partitions. The trained models are evaluated, and for each year the best-performing model is selected as representative. After training, we encode topic evolution as the similarity between two successive topic models. Finally, for each inferred topic we compute aggregated impact metrics. We explore the trained models and the computed topic evolution through a Web application developed for this purpose. The interface offers a customized diagram visualization, as well as an overview of the learnt topics, thus facilitating the qualitative evaluation of the training procedure. The final results indicate shortcomings in the training process, since the models fail to capture sensible topics; by extension, the calculated evolution is also affected. Nevertheless, the developed application adequately serves its purpose of visualizing and evaluating the inferred topics and their evolution.
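As an illustration of how topic evolution between two successive models might be encoded, the following is a minimal sketch, not the method used in the thesis. It assumes that each yearly model is summarized as a topic-word weight matrix over a shared vocabulary, that cosine similarity is the similarity measure, and that a fixed threshold decides which topic pairs are linked. The function names, the toy matrices and the threshold value are all hypothetical.

```python
import numpy as np


def topic_similarity(topics_prev, topics_next):
    """Cosine similarity between every topic of one model and every topic of the next.

    Each argument is an (n_topics, vocab_size) array of topic-word weights.
    Both arrays are assumed to use the same vocabulary ordering.
    """
    prev_unit = topics_prev / np.linalg.norm(topics_prev, axis=1, keepdims=True)
    next_unit = topics_next / np.linalg.norm(topics_next, axis=1, keepdims=True)
    return prev_unit @ next_unit.T


def evolution_links(similarity, threshold=0.8):
    """List (prev_topic, next_topic, score) for pairs at or above the threshold.

    The threshold is an illustrative assumption, not a value from the thesis.
    """
    rows, cols = np.nonzero(similarity >= threshold)
    return [(int(i), int(j), float(similarity[i, j])) for i, j in zip(rows, cols)]


# Toy topic-word matrices for two consecutive years (4-word vocabulary).
year_1 = np.array([[0.7, 0.2, 0.1, 0.0],
                   [0.0, 0.1, 0.2, 0.7]])
year_2 = np.array([[0.6, 0.3, 0.1, 0.0],
                   [0.1, 0.1, 0.4, 0.4],
                   [0.0, 0.0, 0.1, 0.9]])

sim = topic_similarity(year_1, year_2)
links = evolution_links(sim, threshold=0.8)
for prev_topic, next_topic, score in links:
    print(f"year 1 topic {prev_topic} -> year 2 topic {next_topic}: {score:.3f}")
```

In this toy example the second topic of the first year is linked to two topics of the following year, which is the kind of split that a diagram visualization of topic evolution could display.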