Multimodal summarization of user generated videos from wearable cameras

Author name: Theodoros Psallidas
Title: Multimodal summarization of user generated videos from wearable cameras
Year: 2019-2020
Supervisor

Theodoros Giannakopoulos

Summary

The aim of this thesis is to construct a video summarization procedure that distills a video sequence into a more compact and, at the same time, informative form. The exponential growth of user-generated content has increased the need for efficient video summarization schemes. However, most approaches underestimate the power of aural features, and they are designed to work mainly on commercial/professional videos. In this work, we present an approach that uses both audio and visual features in order to create video summaries from user-generated videos. Our approach produces dynamic video summaries, i.e., summaries consisting of the most “important” parts of the original video, arranged so as to preserve their temporal order. We use supervised knowledge from both of the aforementioned modalities and train a binary classifier, which learns to recognize the important parts of videos. Moreover, we present a novel user-generated dataset which contains videos from several categories. Every 1-second segment of each video in our dataset has been annotated by more than three annotators as being important or not. We evaluate our approach using several classification strategies based on audio, video, and fused features. Our experimental results illustrate the potential of our approach.
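The following is a minimal sketch of the supervised pipeline described above: per-second audio and visual feature vectors are fused by concatenation, a binary classifier is trained on the annotated importance labels, and the segments predicted as important are kept in temporal order to form the dynamic summary. The feature dimensions, the SVM classifier, and the probability threshold are illustrative assumptions, not choices stated in the thesis summary.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def train_importance_classifier(audio_feats, visual_feats, labels):
    """Train a binary classifier on fused (concatenated) per-second
    audio and visual feature vectors. `labels` holds 1 for segments
    annotated as important, 0 otherwise."""
    fused = np.hstack([audio_feats, visual_feats])  # feature-level fusion
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
    clf.fit(fused, labels)
    return clf


def select_summary_segments(clf, audio_feats, visual_feats, threshold=0.5):
    """Return indices of 1-second segments predicted as important,
    kept in their original temporal order; the dynamic summary is then
    formed by concatenating the corresponding video segments."""
    fused = np.hstack([audio_feats, visual_feats])
    probs = clf.predict_proba(fused)[:, 1]
    return [i for i, p in enumerate(probs) if p >= threshold]


if __name__ == "__main__":
    # Toy example with random features: 200 one-second segments,
    # 34 audio and 128 visual dimensions (placeholder sizes).
    rng = np.random.default_rng(0)
    audio = rng.normal(size=(200, 34))
    visual = rng.normal(size=(200, 128))
    labels = rng.integers(0, 2, size=200)

    clf = train_importance_classifier(audio, visual, labels)
    print(select_summary_segments(clf, audio[:10], visual[:10]))
```

Audio-only and visual-only baselines correspond to passing a single modality's features instead of the concatenated vector, which is how the audio, video, and fused classification strategies mentioned above can be compared.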