Deep Metric Learning for Music Information Retrieval

Author: Vasileios Mouchakis
Title: Deep Metric Learning for Music Information Retrieval
Year: 2021-2022
Supervisor: Theodoros Giannakopoulos

Summary

This master's thesis explores Deep Metric Learning (DML) for audio data representation. DML uses deep neural networks to automatically learn hierarchical audio embeddings from raw waveforms, aiming to capture similarity relationships between audio samples. The research evaluates two primary loss functions, Triplet Loss and Contrastive Loss, and their impact on producing meaningful audio embeddings.

In the Triplet Loss experiments, eight Convolutional Neural Network (CNN) models were trained. The third Triplet Loss model, particularly when paired with the Normalizer scaler, proved most successful, effectively preserving song similarities.
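The trained CNN architectures themselves are not reproduced here, but the Triplet Loss objective they optimize can be sketched in a few lines. The function below is a minimal NumPy version of the standard formulation: it pushes the anchor-positive distance below the anchor-negative distance by at least a margin (the margin value 0.2 is illustrative, not taken from the thesis).

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss on three embedding vectors.

    Encourages d(anchor, positive) + margin <= d(anchor, negative),
    using squared Euclidean distance; returns 0 when satisfied.
    """
    d_ap = np.sum((anchor - positive) ** 2)  # anchor-positive distance
    d_an = np.sum((anchor - negative) ** 2)  # anchor-negative distance
    return max(d_ap - d_an + margin, 0.0)

# Example: the negative is farther than the positive, so the loss is zero.
a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])
n = np.array([1.0, 0.0])
print(triplet_loss(a, p, n))
```

During training, this loss is averaged over mined (anchor, positive, negative) triplets of songs, so that embeddings of similar songs cluster together while dissimilar ones are pushed apart.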

Contrastive Loss experiments used two models based on the ResNet50 architecture. The second Contrastive Loss model outperformed the first, notably when assessed with the correlation distance metric and the MinMaxScaler, effectively capturing pairwise similarities.
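For comparison, Contrastive Loss operates on pairs rather than triplets. A minimal NumPy sketch of the classic pairwise formulation follows (the margin of 1.0 is illustrative; the thesis's actual hyperparameters are not reproduced here):

```python
import numpy as np

def contrastive_loss(x1, x2, y, margin=1.0):
    """Pairwise contrastive loss on two embedding vectors.

    y = 1 for a similar pair (pull together), y = 0 for a
    dissimilar pair (push apart to at least `margin`).
    """
    d = np.linalg.norm(x1 - x2)  # Euclidean distance between embeddings
    return y * d ** 2 + (1 - y) * max(margin - d, 0.0) ** 2

# A similar pair is penalized by its squared distance;
# a dissimilar pair is penalized only if it is closer than the margin.
x1 = np.array([0.0, 0.0])
x2 = np.array([1.0, 0.0])
print(contrastive_loss(x1, x2, y=1))
print(contrastive_loss(x1, x2, y=0))
```

Because the loss acts directly on labeled pairs, it tends to shape the embedding space around pairwise similarity judgments, which matches the pairwise evaluation reported above.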

In summary, choosing the appropriate loss function is crucial and depends on the nature of the audio data and specific task requirements. Triplet Loss is ideal for relative comparisons with normalized embeddings, while Contrastive Loss excels at capturing pairwise similarities. Distance metrics and scalers have a significant impact on model performance, necessitating careful selection.
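The scalers and distance metric mentioned above can be illustrated with small NumPy stand-ins for their scikit-learn counterparts (`Normalizer`, `MinMaxScaler`, and SciPy's correlation distance); these are simplified sketches, not the exact preprocessing pipeline from the thesis:

```python
import numpy as np

def l2_normalize(X):
    # Like sklearn's Normalizer: scale each embedding (row) to unit L2 norm.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.clip(norms, 1e-12, None)

def minmax_scale(X):
    # Like sklearn's MinMaxScaler: scale each feature (column) to [0, 1].
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / np.clip(mx - mn, 1e-12, None)

def correlation_distance(u, v):
    # 1 - Pearson correlation between two embedding vectors.
    u, v = u - u.mean(), v - v.mean()
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Perfectly correlated embeddings have correlation distance ~0,
# anti-correlated embeddings ~2, regardless of their scale.
print(correlation_distance(np.array([1.0, 2.0, 3.0]),
                           np.array([2.0, 4.0, 6.0])))
```

Because these transforms change which embeddings count as "close", the same trained model can rank song similarities quite differently under different scaler/metric combinations, which is why their selection matters.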

This research contributes valuable insights to deep metric learning for audio data, laying the groundwork for audio-related applications like music recommendation and content-based audio search. As the field progresses, further enhancements in loss functions, architectures, and evaluation techniques are expected, pushing the boundaries of deep metric learning in audio data analysis.