Author name | Panagiotis Deligiannis |
---|---|
Title | Monitoring the Evolution of Scientific Topics and their Impact |
Year | 2024-2025 |
Supervisor | Thanasis Vergoulis |

The aim of this thesis is to train topic models and to estimate how topics evolve over time. For this purpose, we employ Latent Dirichlet Allocation (LDA) topic models, trained on scientific publications related to cancer research. First, we restrict the corpus to texts published in four consecutive years. These texts are preprocessed and vectorized, and then split into training, validation and test partitions. The trained models are evaluated, and for each year the best-performing model is selected as its representative. After training, we encode topic evolution as the similarity between the representative models of two successive years. Finally, we compute aggregated impact metrics for each inferred topic. A purpose-built web application lets us explore the trained models and the computed topic evolution. Its interface offers a custom diagram visualization and an overview of the learnt topics, which facilitates the qualitative evaluation of the training procedure. The final results indicate shortcomings in the training process, since the models fail to capture meaningful topics. Consequently, the computed topic evolution is affected as well. Nevertheless, the developed application adequately serves its purpose of visualizing and evaluating the inferred topics and their evolution.
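The abstract does not state how the similarity between two successive yearly models is measured. As a minimal sketch, assume that each topic is a probability distribution over a shared vocabulary and that topics of year t and year t+1 are compared with cosine similarity. This is one common choice, and the thesis may use a different measure. The names `cosine_similarity` and `topic_evolution`, the `threshold` cut-off and the toy topics are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Assumed representation: a topic is a distribution over a shared
# vocabulary, stored sparsely as {word: probability}.
Topic = Dict[str, float]


def cosine_similarity(a: Topic, b: Topic) -> float:
    """Cosine similarity between two sparse word distributions."""
    dot = sum(p * b.get(w, 0.0) for w, p in a.items())
    norm_a = math.sqrt(sum(p * p for p in a.values()))
    norm_b = math.sqrt(sum(p * p for p in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty topic is treated as similar to nothing
    return dot / (norm_a * norm_b)


def topic_evolution(
    earlier: List[Topic], later: List[Topic], threshold: float = 0.5
) -> List[Tuple[int, int, float]]:
    """Link topics of year t (earlier) to topics of year t+1 (later).

    Returns (i, j, similarity) for every pair of topics whose cosine
    similarity is at least `threshold` (a hypothetical cut-off).
    """
    links = []
    for i, topic_a in enumerate(earlier):
        for j, topic_b in enumerate(later):
            sim = cosine_similarity(topic_a, topic_b)
            if sim >= threshold:
                links.append((i, j, sim))
    return links


if __name__ == "__main__":
    # Toy topics for illustration only; not results from the thesis.
    year_t = [
        {"tumor": 0.5, "cell": 0.3, "growth": 0.2},
        {"patient": 0.6, "therapy": 0.4},
    ]
    year_t1 = [
        {"tumor": 0.4, "cell": 0.4, "immune": 0.2},
        {"therapy": 0.5, "trial": 0.5},
    ]
    for i, j, sim in topic_evolution(year_t, year_t1):
        print(f"topic {i} (year t) -> topic {j} (year t+1): {sim:.2f}")
```

In this sketch a topic may link to zero, one or several topics of the following year. Splits, merges and disappearing topics therefore show up directly as edges, or missing edges, in an evolution diagram.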