AI-driven rehabilitation indicators for non-performing credit

Author name: Aikaterini Kinti
Title: AI-driven rehabilitation indicators for non-performing credit
Year: 2024-2025
Supervisor: Ilias Zavitsanos

Summary

The banking sector and lending organizations are undergoing significant transformation, driven by advances in Machine Learning (ML). A pivotal application of ML in this domain is loan default prediction, which is essential for developing robust credit scoring systems and maintaining the financial stability of banks and other financial institutions. This study uses SHAP and LIME to analyze the account- and customer-related attributes that contribute to non-performing loans (NPLs), with the goal of uncovering insights that can inform effective and mutually beneficial resolution strategies. Using a proprietary dataset comprising 326 attributes, the study addresses the challenge of imbalanced classification: the dataset is heavily skewed towards performing loans, a condition that often hinders model performance.

To address this, four experimental scenarios were explored: (a) a baseline model trained on the original dataset, (b) an artificially balanced dataset with equal class representation, (c) an approach combining oversampling via the Synthetic Minority Oversampling Technique (SMOTE) with undersampling via RandomUnderSampler, and (d) the application of focal and weighted loss functions to an XGBoost model. Among all scenarios, the combination of SMOTE and RandomUnderSampler proved most effective. The Random Forest model emerged as the top performer, achieving a ROC-AUC score of 0.7402 and a Precision-Recall AUC of 0.0126. This study emphasizes the critical role of tailored preprocessing and evaluation methodologies in navigating the complexities of imbalanced data. It demonstrates the potential of techniques that rebalance the data across the two classes or reweight the training objective, namely sampling strategies and loss function modifications, to highlight the attributes that are most predictive of loan defaults.
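
The combined resampling of scenario (c) can be illustrated with a minimal sketch. This is not the study's pipeline: it is a simplified, standard-library re-implementation of the SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours) followed by random undersampling of the majority class. The function names, the toy two-feature data, and the target ratios `over_ratio=0.1` and `under_ratio=0.5` are assumptions chosen for illustration only. In practice one would presumably use the `SMOTE` and `RandomUnderSampler` classes of the imbalanced-learn library.

```python
import math
import random
from collections import Counter


def smote_like(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (the SMOTE idea).
    Assumes at least two minority samples."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        u = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + u * (o - b) for b, o in zip(base, other)))
    return synthetic


def random_undersample(majority, n_keep, rng=None):
    """Keep a random subset of n_keep majority samples, without replacement."""
    rng = rng or random.Random(0)
    return rng.sample(majority, n_keep)


def resample(X, y, over_ratio=0.1, under_ratio=0.5, k=5, seed=0):
    """Oversample class 1 up to over_ratio * |majority|, then undersample
    class 0 until minority / majority equals under_ratio.
    Both ratios are hypothetical values, not taken from the study."""
    rng = random.Random(seed)
    majority = [x for x, label in zip(X, y) if label == 0]
    minority = [x for x, label in zip(X, y) if label == 1]

    target_min = max(len(minority), int(over_ratio * len(majority)))
    minority = minority + smote_like(minority, target_min - len(minority), k, rng)

    n_keep = min(len(majority), int(len(minority) / under_ratio))
    majority = random_undersample(majority, n_keep, rng)

    X_res = majority + minority
    y_res = [0] * len(majority) + [1] * len(minority)
    return X_res, y_res


# Toy data (made up): 1000 "performing" (0) vs 10 "non-performing" (1) points.
data_rng = random.Random(42)
X = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(1000)]
X += [(data_rng.gauss(2, 1), data_rng.gauss(2, 1)) for _ in range(10)]
y = [0] * 1000 + [1] * 10

X_res, y_res = resample(X, y)
print(Counter(y), "->", Counter(y_res))
```

Oversampling only part of the way and then undersampling avoids both extremes: generating a very large number of synthetic minority points, and discarding most of the majority class. Resampling of this kind is normally applied to the training split only, so that evaluation metrics such as ROC-AUC and Precision-Recall AUC are still computed on the original class distribution.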