Author name | Georgios Vlachias |
---|---|
Title | Statistical and semantic analysis of BioASQ dataset for Question-Answering |
Year | 2017-2018 |
Supervisor | Anastasia Krithara |
The aim of this thesis is to provide an overview of the data released during the BioASQ challenge for the years 2013-2018. The goal is to discover patterns and derive knowledge about the performance of the systems that participated in the challenge. It also examines how effectively the biomedical questions were answered, depending on the type and the creator of each question. First, we perform an exploratory data analysis on the data collected from Task B of the BioASQ challenge. Our goal is to classify the biomedical questions into classes corresponding to varying degrees of difficulty and then to perform cluster analysis, using different clustering algorithms and evaluation metrics. We then use machine learning techniques to train models that predict the difficulty class of a question, in order to explore how well question difficulty characterizes the data released during the BioASQ challenge.