Financial narrative summarization. Automatic Summaries of Annual Reports

Author name: Theodora Koutsothanasi
Title: Financial narrative summarization. Automatic Summaries of Annual Reports
Year: 2019-2020
Supervisor

George Giannakopoulos

Summary

Today, many interesting topics are characterized by uncertainty and complex relationship structures. One of them is the assessment of business issues and company performance. A large number of reports, written each year by a company's senior managers and analysts, address these questions. Although these reports take a long time to prepare and submit, important information is often lost due to their length and complexity. The current work aims to address these issues by creating an automatic summarization system that extracts a structured summary for each annual financial report. The research field of automatic summarization focuses on applying Natural Language Processing (NLP) and Machine Learning (ML) techniques to produce informative summaries. In our work, we focus on the Extractive Summarization variant, which scores and selects important sentences from the source text (in our case, financial reports), then combines and ranks them to create the final summaries. This thesis describes a proposed methodology for applying such techniques in the financial domain, along with i) a large-scale experimental setup, and ii) both automatic and human evaluations of the output summaries. Our Extractive Summarization approach builds a sentence classification system to label text sentences, using a dataset of financial reports and "ground truth" handwritten summaries. We use the "Bag of Words" method to represent each sentence in a vector space, and then compare the sentence vectors of the original texts with those of the summaries. Source sentences that are similar to summary sentences are marked as important for summary generation. Given this constructed dataset, we train binary classifiers using three different algorithms and merge their predictions to form the final summaries. We perform a three-part evaluation: first, we measure classifier performance, showing improved results compared to a naive baseline.
Second, we provide ROUGE scores for summarization with each classifier, along with indicative results of related systems. Finally, we performed a human evaluation trial in which participants were asked to express a preference between the automatic and handwritten summary versions of a financial text. The survey yields interesting findings: evaluators tend to prefer the automatic summary in terms of objectivity, and the automatic and handwritten summaries were scored similarly in terms of interpretability. We believe the obtained results provide useful insights for summarization (automatic or not) of financial reports. The study concludes with a brief summary of the material and contributions provided, along with proposals for future work.
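
The sentence-labelling step described in the summary can be illustrated with a minimal sketch. It is not the thesis implementation: the tokenizer, the use of cosine similarity as the comparison measure, the 0.5 threshold, and the function names (bag_of_words, cosine, label_sentences) are all assumptions made for illustration. The summary only states that Bag of Words vectors of source and summary sentences are compared and that similar source sentences are marked as important.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Map a sentence to a sparse term-count vector (assumed tokenizer)."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def label_sentences(report_sentences, summary_sentences, threshold=0.5):
    """Label a report sentence 1 if it is similar enough to any summary
    sentence, else 0. The threshold value is a hypothetical choice."""
    summary_vectors = [bag_of_words(s) for s in summary_sentences]
    labels = []
    for sentence in report_sentences:
        vec = bag_of_words(sentence)
        best = max((cosine(vec, sv) for sv in summary_vectors), default=0.0)
        labels.append(1 if best >= threshold else 0)
    return labels


# Toy data, invented for this example only.
report = [
    "Revenue grew by ten percent during the year.",
    "The board met four times.",
    "Operating profit increased due to lower costs.",
]
summary = ["Revenue grew ten percent.", "Operating profit increased."]
labels = label_sentences(report, summary)
```

Labels produced this way could serve as training targets for binary classifiers such as those mentioned in the summary. How the three classifiers' predictions are merged is not specified there, so that step is not sketched here.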