Statistical and semantic analysis of BioASQ dataset for Question-Answering

Author name: Georgios Vlachias
Title: Statistical and semantic analysis of BioASQ dataset for Question-Answering
Year: 2017-2018
Supervisor: Anastasia Krithara

Summary

The aim of this thesis is to provide an overview of the data issued during the BioASQ challenge in the years 2013-2018. We look for patterns and derive knowledge about the performance of the systems that participated in the challenge and about how effectively the biomedical questions were answered, depending on the type and the creator of the question. First, we perform an Exploratory Data Analysis on the data collected from Task B of the BioASQ challenge. Our goal is to classify the biomedical questions into classes corresponding to varying degrees of difficulty and then perform Cluster Analysis, using different clustering algorithms and evaluation metrics. We then use Machine Learning techniques to train models that predict the question difficulty class, in order to explore how well question difficulty characterizes the data issued during the BioASQ challenge.