Speech Emotion Recognition on Greek Theatrical Data

Author name: Maria Moutti
Title: Speech Emotion Recognition on Greek Theatrical Data
Year: 2024-2025
Supervisor: Theodoros Giannakopoulos

Summary

The aim of this thesis is to develop and evaluate machine learning and deep learning models that accurately recognize emotions in Greek theatrical speech, thereby improving accessibility for individuals with hearing impairments and contributing to more inclusive cultural experiences. Speech emotion recognition (SER) in theatrical plays poses a distinctive challenge, because actors often strive to evoke deeper emotions from the audience. As a result, the valence and arousal attributes in datasets drawn from theatrical plays are likely to differ significantly from those in the standard SER datasets commonly used in the literature.

However, real-world datasets from theatrical plays are scarce. To address this gap, the thesis introduces GreThE, a new publicly available dataset for speech emotion recognition in Greek theatrical plays. The dataset includes utterances from various actors and plays, annotated for valence and arousal by multiple annotators, with inter-annotator agreement factored into the final ground truth. The experimental setup covers both traditional machine learning approaches (SVMs) and deep learning methods, with a particular focus on leveraging models pre-trained on well-resourced English-language datasets to improve emotion recognition performance in cross-domain settings.
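As an illustration of how inter-annotator agreement can be factored into a ground-truth label, the following is a minimal sketch of majority voting with an agreement threshold. It is not the procedure used to build GreThE: the function name `aggregate_labels`, the `min_agreement` threshold, and the label strings are all hypothetical choices made for this example.

```python
from collections import Counter


def aggregate_labels(annotations, min_agreement=0.5):
    """Majority-vote one utterance's labels (hypothetical helper).

    Returns (label, agreement), where agreement is the fraction of
    annotators who chose the winning label. Returns None when there
    are no annotations, when the top labels are tied, or when the
    agreement falls below min_agreement (an assumed threshold).
    """
    if not annotations:
        return None
    counts = Counter(annotations).most_common()
    top_label, top_count = counts[0]
    # A tie between the two most frequent labels means no clear majority.
    if len(counts) > 1 and counts[1][1] == top_count:
        return None
    agreement = top_count / len(annotations)
    if agreement < min_agreement:
        return None
    return top_label, agreement


# Example with made-up valence labels from three annotators.
result = aggregate_labels(["positive", "positive", "neutral"])
```

Under this scheme, utterances on which annotators disagree too strongly are left out of the ground truth instead of being assigned an unreliable label.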

The results indicate that deep learning architectures, particularly those using transfer learning, significantly outperform traditional methods, achieving higher accuracy in detecting complex emotional states. These findings have significant implications for cultural accessibility in Greek theatrical performances: by enabling automatic emotion recognition, the proposed models can provide a richer and more inclusive experience for spectators with hearing loss. The research also lays the groundwork for future studies on SER in other underrepresented languages and cultural contexts.