Visual Art Generation for Music

Author name: Nikolaos Papadopoulos
Title: Visual Art Generation for Music
Year: 2024-2025
Supervisor: Theodoros Giannakopoulos


Summary

This thesis explores the use of generative AI to create visual art for music and introduces a tool named Deforum Music Visualizer. The tool automatically creates visual art from music and is built on Deforum Stable Diffusion, an open-source, generative text-to-video diffusion framework. To incorporate both high- and low-level musical elements, it integrates extensive Music Information Retrieval (MIR) data into music-informed settings, along with conditional generation based on the song’s album cover. A survey of 45 participants (balanced female/male ratio, ages 19–59) was conducted to evaluate the tool’s effectiveness. Regardless of the participants’ music background, the fully automated process produced baseline results, scoring 3.0 ± 1.06 for Mean Enjoyment and 2.93 ± 1.20 for Mean ISA (incorporation of the song’s atmosphere) on a 5-point Likert scale (1–5). User-curated prompts yielded a statistically significant improvement in both Mean Enjoyment (3.63 ± 1.03) and Mean ISA (3.74 ± 1.06). The project’s GitHub repository is available at https://github.com/nickpadd/DeforumMusicVisualizer.