Text-driven data exploration and reporting template generation

Author name	Alexandros-Konstantinos Vitsas
Title	Text-driven data exploration and reporting template generation
Year	2024-2025
Supervisor	Ilias Zavitsanos

Summary

Financial institutions face significant challenges in automating report generation from free-text descriptions. This thesis addresses these challenges by proposing a novel framework that transforms unstructured natural language inputs into structured financial report templates. The methodology integrates a custom Named Entity Recognition (NER) model, semantic search for column identification, and rule-based extraction for row selection. The system maps textual inputs to structured outputs using several text representation techniques, including Bag-of-Words (BoW), Term Frequency-Inverse Document Frequency (TF-IDF), and pretrained embeddings.
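As an illustration of the column identification step, the following is a minimal sketch of TF-IDF-based semantic search in plain Python. It is not the thesis implementation: the function names, the sample column names, and the query are hypothetical, and fitting the IDF weights on the columns together with the query is a simplification made for brevity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def tfidf_vectors(docs):
    # Build one sparse TF-IDF vector (term -> weight) per document.
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF, so every term keeps a positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()}
            for toks in tokenized]


def cosine(a, b):
    # Cosine similarity between two sparse vectors.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_columns(query, columns):
    # Assumption: the query is vectorized together with the column names.
    vectors = tfidf_vectors(columns + [query])
    query_vec = vectors[-1]
    scored = [(col, cosine(query_vec, vec))
              for col, vec in zip(columns, vectors[:-1])]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical column names and free-text request.
columns = ["net interest income", "total operating expenses",
           "loan loss provisions"]
ranked = rank_columns("show the interest income per quarter", columns)
```

The highest-ranked entry in `ranked` is the column whose name shares the most heavily weighted terms with the request. Pretrained embeddings could replace the TF-IDF vectors without changing the ranking logic.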

Experimental results show an average precision of 54% and recall of 57%, indicating that the system captures relevant metrics despite constraints posed by limited data and domain-specific terminology. Key contributions include a pipeline for automated report generation, the use of large language models (LLMs) for dataset augmentation, and a semantic search strategy optimized for financial reporting. While the results represent progress, limited dataset size and domain complexity remain the main challenges and point to directions for future work. Enhancements such as expanded datasets, advanced retrieval methods, and fine-tuned LLMs could further improve the system’s scalability and accuracy. This research provides a foundation for automating financial reporting and for streamlining data exploration and reporting in the financial domain.
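The summary does not state how precision and recall are computed. One plausible reading, assumed here, is a set-based comparison between the items placed in a generated template and those in a reference template. The sketch below follows that assumption; the function name and sample items are hypothetical.

```python
def precision_recall(predicted, expected):
    # Compare predicted template items against the reference items as sets.
    predicted, expected = set(predicted), set(expected)
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical example: 2 of 3 predicted items are correct,
# and 2 of 4 reference items are recovered.
p, r = precision_recall(
    ["revenue", "net income", "total assets"],
    ["revenue", "net income", "equity", "cash flow"],
)
```

Under this reading, the averages reported above would be the mean of these per-template scores across the evaluation set.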

Link to full text:

https://amitos.library.uop.gr/xmlui/handle/123456789/8517

© National Centre for Scientific Research "Demokritos" for the Institute of Informatics & Telecommunications and University of the Peloponnese for the Department of Informatics and Telecommunications. The contents of the "MSc in Data Science" website may be freely reproduced for non-commercial purposes.
