Instrument playing technique recognition

Author name: Konstantinos Paraskevoudis
Title: Instrument playing technique recognition
Year: 2017-2018
Supervisor: Theodoros Giannakopoulos

Summary

Instrument playing technique recognition is a growing research field of Music Information Retrieval. Regarding stringed instruments, an instrument playing technique can be defined as any particular movement of the player's fingers applied to the strings, either over the neck or over the body or sound hole of the instrument. In this work, the automated recognition of instrument playing techniques in solo recordings of the Greek stringed instrument bouzouki is examined. Towards this, a dataset comprising 336 recordings and 5 different playing techniques (slurring, trembling on one string, trembling on two strings, chord play, two-string play) is generated.

The signal of each recording is first broken into short-term frames and 34 audio features are extracted for each frame. In addition, the mean and standard deviation of each short-term feature sequence are computed for every mid-term segment, giving 68 audio features in total for each recording. Eight different combinations of short-term and mid-term window and step sizes are used to extract features and train models. Five machine learning models (SVM, K-NN, Gradient Boosting, Extra Trees, Random Forest) are trained on the extracted features, resulting in a total of 40 trained models.

The trained models are evaluated on a generated test set comprising 5 popular songs. Besides the five techniques, a 'None' class is introduced, since a segment can either contain a technique or be a simple melody (absence of technique). To address this, different confidence thresholds are applied to the models' classifications: if the confidence (classification probability) produced by a model for a specific segment is lower than the selected threshold, the 'None' class is assigned to that segment. The best-performing model was an SVM with a mid-term window of 0.9, mid-term step of 0.6, short-term window of 0.06 and short-term step of 0.02. This model achieved the highest F1 score (0.5880) at a confidence threshold of 0.4; as the confidence threshold rises, recall drops and precision rises. Finally, an open-source standalone Python script was developed which classifies each segment of a given solo bouzouki recording according to its playing technique, using the 6 best pre-trained models of this thesis.
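A minimal sketch of the confidence-thresholded classification step described above is given below. It is illustrative only, not the thesis code: it assumes that the 68-dimensional mid-term feature vectors have already been extracted for each segment, and it uses scikit-learn's SVC with probability estimates; all function and variable names are hypothetical.

```python
# Minimal sketch (not the thesis implementation): an SVM classifier with
# per-segment confidence thresholding, assuming mid-term feature vectors
# (e.g. 68 values: mean and std of 34 short-term features) are already
# extracted. All names below are illustrative.
import numpy as np
from sklearn.svm import SVC


def train_svm(X_train, y_train):
    """Train an SVM with probability estimates enabled."""
    clf = SVC(kernel="rbf", probability=True)
    clf.fit(X_train, y_train)
    return clf


def classify_segments(clf, X_segments, confidence=0.4):
    """Assign a technique to each mid-term segment, or 'None' when the
    highest class probability falls below the confidence threshold."""
    probs = clf.predict_proba(X_segments)   # shape: (n_segments, n_classes)
    best = np.argmax(probs, axis=1)
    labels = []
    for i, k in enumerate(best):
        if probs[i, k] < confidence:
            labels.append("None")            # absence of technique
        else:
            labels.append(str(clf.classes_[k]))
    return labels
```

Sweeping the confidence argument over a range of values reproduces the reported trade-off: higher thresholds assign 'None' more often, which lowers recall and raises precision for the five techniques.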