AI Driven Early Warnings of Defaults on Performing Credit: A Machine Learning Case Study of the Greek Lending Market

Author name: Anastasios Kanellopoulos
Title: AI Driven Early Warnings of Defaults on Performing Credit: A Machine Learning Case Study of the Greek Lending Market
Year: 2024-2025
Supervisor: Ilias Zavitsanos

Summary

This thesis applies machine learning (ML) models to predict loan defaults on current accounts in the Greek lending market. Using a unique data set from Qualco, a leading supplier of financial risk management software, the study aims to identify current accounts that have a high propensity to default in the near future. The research follows a systematic approach involving data preprocessing, feature engineering, and the application, tuning, and statistical testing of various ML models on this task. The models used include, but are not limited to, Logistic Regression and ensemble models such as Random Forest, XGBoost, LightGBM, and CatBoost. The final results, obtained from 10-fold cross-validation, show that all fully featured trained models outperform Logistic Regression, the baseline model used in the experiments, and that the difference in performance in pairwise comparisons of classifiers is statistically significant.

However, among the fully featured trained models, no single model shows a statistically significant performance advantage over the rest. The fully featured CatBoost, XGBoost, and LightGBM models achieved the best performance in this study, yet no pairwise comparison between these three models yielded a statistically significant difference in performance. Finally, the feature importance analysis, based on a final CatBoost model trained on both the training and validation datasets, revealed some of the most important factors for loan default prediction in the given dataset. These include, but are not limited to, the risk level reached at the previous time step, the installments due next month, and the city of the customer linked to each account, all of which are intuitive and align with the factors deemed important in the research. The results of this study serve as a useful starting point for further exploring the predictive power of complex ML models in the context of loan defaults, especially in the Greek lending market.
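
As an illustration of the pairwise comparisons described in this summary, the following is a minimal sketch of how per-fold scores from a 10-fold cross-validation could be compared between two classifiers. The summary does not name the statistical test used in the thesis, so the exact paired sign-flip permutation test shown here is an assumption, and the function name `paired_sign_flip_test` is hypothetical. The per-fold scores are placeholder values chosen only to make the example run; they are not results from the thesis.

```python
from itertools import product


def paired_sign_flip_test(scores_a, scores_b):
    """Exact two-sided paired permutation (sign-flip) test on per-fold scores.

    Returns the mean per-fold difference (a - b) and the p-value under the
    null hypothesis that swapping the two classifiers on any fold is equally
    likely to produce the observed scores.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both classifiers need exactly one score per fold")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)

    extreme = 0
    total = 0
    # Enumerate every sign assignment: 2**n cases (1024 for 10 folds).
    for signs in product((1, -1), repeat=n):
        permuted = abs(sum(s * d for s, d in zip(signs, diffs)) / n)
        # Small tolerance guards against floating-point round-off on ties.
        if permuted >= observed - 1e-12:
            extreme += 1
        total += 1
    return sum(diffs) / n, extreme / total


# Placeholder per-fold scores (NOT taken from the thesis), one per CV fold.
baseline_scores = [0.70, 0.71, 0.69, 0.72, 0.70, 0.68, 0.71, 0.70, 0.69, 0.71]
boosted_scores = [0.78, 0.79, 0.77, 0.80, 0.78, 0.77, 0.79, 0.78, 0.76, 0.80]

mean_diff, p_value = paired_sign_flip_test(boosted_scores, baseline_scores)
print(f"mean difference: {mean_diff:.3f}, p-value: {p_value:.4f}")
```

Scores from cross-validation folds are not fully independent because the training sets overlap, so a test of this kind is best read as approximate. When several classifier pairs are compared, a multiple-comparison correction may also be appropriate.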