Machine Learning methods for Missing values in longitudinal data and multivariate time series

Author name	Christos Platias
Title	Machine Learning methods for Missing values in longitudinal data and multivariate time series
Year	2017-2018
Supervisor	Georgios Petasis

Summary

Handling missing values in a dataset is a long-standing issue across many disciplines, such as health care, geosciences, biology and medicine. Missing values can arise from different sources, such as mishandling of samples, measurement errors, lack of responses, or deleted values. The main problem that follows is that many algorithms cannot run on incomplete datasets. Several methods exist for handling missing values, including “SoftImpute”, “k-nearest neighbor”, “mice”, “MatrixFactorization”, and “missForest”. However, performance comparisons of these methods are hard to find, because most research treats imputation as an intermediate step of a regression or classification task and reports only the performance on that task. In addition, comparisons with existing scientific work are difficult, due to the lack of publications with open-access datasets. Taking all of the above into consideration, this thesis had three goals. The first was to find and use open datasets from real use cases, so that anyone can access them and compare experimental results. The second was to propose a new imputation method; towards this end, two approaches were developed, one based on autoencoders and one on bagging. The third was to compare some of the most frequently used methods for missing data imputation. To achieve this, 13 different methods were tested on four different real-world, publicly available datasets.
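One common way to compare imputation methods directly, rather than through a downstream task, is to hide a fraction of known values, impute them, and measure the error on the hidden entries only. The sketch below illustrates that idea under stated assumptions: the summary does not describe the evaluation protocol actually used in the thesis, column-mean imputation is only a simple stand-in baseline, and the function names (`mask_values`, `mean_impute`, `rmse_on_hidden`) are hypothetical.

```python
import math
import random


def mask_values(data, fraction, seed=0):
    """Copy `data` and set a random fraction of its entries to None.

    Returns the masked copy and the list of (row, column) positions hidden.
    """
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(data) for j in range(len(row))]
    hidden = rng.sample(cells, int(len(cells) * fraction))
    masked = [list(row) for row in data]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def mean_impute(masked):
    """Fill each missing entry with the mean of the observed values in its column.

    Assumed baseline only; a column with no observed values falls back to 0.0.
    """
    means = []
    for j in range(len(masked[0])):
        observed = [row[j] for row in masked if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in masked]


def rmse_on_hidden(truth, imputed, hidden):
    """Root-mean-square error computed only over the artificially hidden cells."""
    squared = sum((truth[i][j] - imputed[i][j]) ** 2 for i, j in hidden)
    return math.sqrt(squared / len(hidden))


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    truth = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
    masked, hidden = mask_values(truth, fraction=0.25, seed=1)
    imputed = mean_impute(masked)
    print("hidden cells:", hidden)
    print("RMSE on hidden cells:", rmse_on_hidden(truth, imputed, hidden))
```

Any other imputation method could be substituted for `mean_impute` as long as it takes the masked table and returns a completed one, which lets several methods be scored on the same hidden cells.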

© National Centre for Scientific Research "Demokritos" for the Institute of Informatics & Telecommunications, and University of the Peloponnese for the Department of Informatics and Telecommunications. The contents of the "MSc in Data Science" website may be freely reproduced for non-commercial purposes.
