Deep Learning with Spectrograms – Audio Generation and Feature Transfer: Samuel Pierce-Davies

April 29, 2020 | 2:00 pm – 3:00 pm
Online seminar (Zoom meeting ID: 98104665986)

With the Artificial Intelligence revolution well underway, numerous techniques have been developed for applying AI processes in artistic contexts. One such system is Google's DeepDream, a revolutionary computer vision process dealing with stylistic feature transfer: it processes images, morphing them in unusual ways to better fit its understanding of what particular image features should look like (not dissimilar to how humans see faces and other shapes in clouds), producing ‘dreamed images’.

My project is concerned with applying a similar process to sound, to produce ‘dreamed audio’. This has involved research into cutting-edge Deep Learning feature-classification systems, which are typically designed with images as their training data. During this research, I reasoned: why not train them on images of sound? I have since been working with spectrograms, visual representations of the frequencies that make up a sound and how these change over time, generating them from Google’s NSynth machine learning dataset of individual instrumental notes. I am now running initial experiments on generating new sounds from this dataset, making use of Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs), as sketched below.
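
As an illustration of the spectrogram step, the sketch below loads a single NSynth note and renders its log-frequency spectrogram using librosa and matplotlib. The file path and STFT parameters (1024-sample window, 256-sample hop) are illustrative assumptions rather than the project's actual settings; NSynth notes themselves are 4-second clips sampled at 16 kHz.

```python
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

# Hypothetical path to one NSynth note -- substitute a real file from the dataset.
WAV_PATH = "nsynth/audio/keyboard_acoustic_000-060-100.wav"

# NSynth notes are 4-second clips sampled at 16 kHz.
y, sr = librosa.load(WAV_PATH, sr=16000)

# Short-time Fourier transform: the note's frequency content over time.
stft = librosa.stft(y, n_fft=1024, hop_length=256)

# Convert magnitudes to decibels for a perceptually sensible image.
spec_db = librosa.amplitude_to_db(np.abs(stft), ref=np.max)

# Render the spectrogram as an image a CNN could consume.
fig, ax = plt.subplots()
img = librosa.display.specshow(spec_db, sr=sr, hop_length=256,
                               x_axis="time", y_axis="log", ax=ax)
fig.colorbar(img, ax=ax, format="%+2.0f dB")
fig.savefig("note_spectrogram.png")
```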
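
To give a rough sense of the generative side, here is a minimal DCGAN-style generator/discriminator pair in TensorFlow/Keras that treats spectrograms as 64 × 64 single-channel images. Every architectural choice here (latent dimension, layer widths, image resolution) is a placeholder assumption, not the project's actual network.

```python
import tensorflow as tf
from tensorflow.keras import layers

LATENT_DIM = 100  # size of the random noise vector (an illustrative choice)

def build_generator():
    """Upsample a noise vector into a 64x64 single-channel spectrogram image."""
    return tf.keras.Sequential([
        layers.Input(shape=(LATENT_DIM,)),
        layers.Dense(8 * 8 * 128),
        layers.Reshape((8, 8, 128)),
        # Three stride-2 transposed convolutions: 8 -> 16 -> 32 -> 64 pixels.
        layers.Conv2DTranspose(64, 4, strides=2, padding="same", activation="relu"),
        layers.Conv2DTranspose(32, 4, strides=2, padding="same", activation="relu"),
        layers.Conv2DTranspose(1, 4, strides=2, padding="same", activation="tanh"),
    ])

def build_discriminator():
    """Convolutional classifier: real spectrogram vs. generated one."""
    return tf.keras.Sequential([
        layers.Input(shape=(64, 64, 1)),
        layers.Conv2D(32, 4, strides=2, padding="same", activation=tf.nn.leaky_relu),
        layers.Conv2D(64, 4, strides=2, padding="same", activation=tf.nn.leaky_relu),
        layers.Flatten(),
        layers.Dense(1),  # single real/fake logit
    ])
```

In a standard GAN training loop the two networks would be optimised adversarially; turning a generated spectrogram back into audible sound then requires phase reconstruction, for which Griffin–Lim is one common approach.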