Author name | Alexandros-Konstantinos Vitsas |
---|---|
Title | Text-driven data exploration and reporting template generation |
Year | 2024-2025 |
Supervisor | Ilias Zavitsanos |
In today’s data-driven environment, financial institutions face significant challenges in automating report generation from free-text descriptions. This thesis addresses these challenges by proposing a novel framework that transforms unstructured natural-language inputs into structured financial report templates. The methodology combines a custom Named Entity Recognition (NER) model, semantic search for column identification, and rule-based extraction for row selection. Inputs are represented with Bag-of-Words (BoW), Term Frequency-Inverse Document Frequency (TF-IDF), and pretrained embeddings, which the system uses to map textual inputs to structured outputs.
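The column-identification and row-selection steps can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes each candidate column comes with a short text description, uses plain TF-IDF vectors with cosine similarity as a stand-in for the semantic search (the pretrained-embedding variant and the NER model are omitted), and uses a single regular expression for reporting periods as the rule-based row selector. The column names, descriptions, the `min_score` threshold, `PERIOD_RE`, and all helper functions are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(token_lists):
    """Smoothed inverse document frequency over the column descriptions."""
    n = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(tokens, idf):
    """Sparse TF-IDF vector; terms unseen in the column vocabulary are dropped."""
    counts = Counter(tokens)
    return {term: c * idf[term] for term, c in counts.items() if term in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_columns(query, column_descriptions, top_k=3, min_score=0.2):
    """Return up to top_k column names whose descriptions best match the query."""
    col_tokens = {name: tokenize(desc) for name, desc in column_descriptions.items()}
    idf = build_idf(list(col_tokens.values()))
    query_vec = tfidf(tokenize(query), idf)
    scores = {name: cosine(query_vec, tfidf(toks, idf)) for name, toks in col_tokens.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [name for name in ranked[:top_k] if scores[name] >= min_score]


# Hypothetical row-selection rule: an optional quarter (Q1-Q4) followed by a year.
PERIOD_RE = re.compile(r"\b(?:(Q[1-4])\s+)?(20\d{2})\b", re.IGNORECASE)


def extract_periods(query):
    """Return (quarter or None, year) pairs found in the query."""
    return [(q.upper() if q else None, int(year)) for q, year in PERIOD_RE.findall(query)]


# Hypothetical column catalogue, for illustration only.
COLUMNS = {
    "net_income": "net income profit after tax",
    "total_revenue": "total revenue sales turnover",
    "operating_expenses": "operating expenses opex costs",
    "total_assets": "total assets balance sheet",
}

if __name__ == "__main__":
    query = "Show total revenue and net income for Q2 2023"
    print(rank_columns(query, COLUMNS))
    print(extract_periods(query))
```

For the example query, `net_income` and `total_revenue` score highest (about 0.47 and 0.45), while `total_assets` matches only on "total" and falls below the threshold; the period rule yields `("Q2", 2023)`. Replacing `tfidf` with a pretrained sentence-embedding model would leave the ranking logic unchanged.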
Experimental results show an average precision of 54% and recall of 57%, indicating that the system captures a substantial share of the relevant metrics despite the constraints posed by limited data and domain-specific terminology. Key contributions include a pipeline for automated report generation, the use of large language models (LLMs) for dataset augmentation, and a semantic search strategy tailored to financial reporting. While the results are promising, the small dataset and the complexity of the domain remain open challenges and point to directions for future work: expanded datasets, more advanced retrieval methods, and fine-tuned LLMs could further improve the system’s scalability and accuracy. This research provides a foundation for automating financial reporting, toward a scalable and adaptable approach to data exploration and reporting in the financial domain.
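Precision here is the share of selected metrics that are relevant, and recall is the share of relevant metrics that are selected. A small sketch of how such scores could be computed, assuming each test query is scored by comparing its predicted set of columns with a gold set and that per-query scores are then macro-averaged (the averaging scheme is an assumption). The example pairs are invented and are not the thesis's evaluation data.

```python
def precision_recall(predicted, gold):
    """Set-based precision and recall for a single query."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def macro_average(pairs):
    """Average per-query precision and recall over (predicted, gold) pairs."""
    scores = [precision_recall(pred, gold) for pred, gold in pairs]
    n = len(scores)
    return sum(p for p, _ in scores) / n, sum(r for _, r in scores) / n


# Invented example: two queries with predicted vs. gold column sets.
EXAMPLE = [
    (["net_income", "total_revenue"], ["net_income", "operating_expenses"]),
    (["total_assets"], ["total_assets", "total_revenue"]),
]

if __name__ == "__main__":
    print(macro_average(EXAMPLE))  # (0.75, 0.5)
```

Macro-averaging weights every query equally; a micro-average would instead pool true positives across all queries, which favours queries with many gold metrics.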