Automatically understanding the content of audio data is useful for audio indexing and targeted distribution of media. One such task is audio segmentation, which divides an audio signal into homogeneous segments such as music or speech. Audio data can be labelled manually, but this is time-consuming and expensive. Researchers have therefore worked persistently to improve automatic audio segmentation using signal processing techniques and machine learning algorithms, yet these models still fall well short of human-level performance.
This talk explores how deep learning can be applied to audio segmentation. It aims to identify audio features and design deep neural networks (DNNs) that are well suited to this task. It discusses the strengths and weaknesses of different DNN architectures, such as recurrent neural networks (RNNs) and convolutional neural networks (CNNs), and explores how transfer learning can be adopted for the task.
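Whatever network produces the per-frame class decisions, a segmentation system still needs to turn noisy frame-level labels into clean segments. The talk does not specify its post-processing, but a common approach is median smoothing followed by merging runs of identical labels; the sketch below illustrates this (the hop size and minimum segment duration are illustrative assumptions, not values from the talk).

```python
import numpy as np

def median_smooth(frame_labels, width=5):
    """Median-filter integer frame labels to suppress spurious
    single-frame class flips before segment merging."""
    labels = np.asarray(frame_labels)
    pad = width // 2
    padded = np.pad(labels, pad, mode="edge")
    return np.array([int(np.median(padded[i:i + width]))
                     for i in range(len(labels))])

def frames_to_segments(frame_labels, hop_s=0.02, min_dur_s=0.5):
    """Merge consecutive frames with the same label into
    (start_s, end_s, label) segments, dropping segments shorter
    than min_dur_s. hop_s is the frame hop in seconds."""
    segments = []
    start = 0
    for i in range(1, len(frame_labels) + 1):
        if i == len(frame_labels) or frame_labels[i] != frame_labels[start]:
            if (i - start) * hop_s >= min_dur_s:
                segments.append((start * hop_s, i * hop_s, frame_labels[start]))
            start = i
    return segments
```

For example, smoothing the frame labels `[0, 0, 0, 1, 0, 0, 1, 1, 1, 1]` removes the isolated flips and yields two clean segments, one per class.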
The second objective of this project is to investigate how audio segmentation can be harnessed for intelligent remixing, which has applications such as customising radio according to the listener’s interests and schedule. This requires audio segmentation to run in real time and to support complex processes such as audio source separation; smooth transitions between the original and remixed audio also need to be rendered. Lastly, this talk presents a prototype that combines audio segmentation and intelligent remixing.
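One standard way to render the smooth transitions mentioned above is an equal-power crossfade, where the outgoing and incoming gains follow a cosine/sine law so that the summed power stays constant through the overlap. The talk does not say which fade law the prototype uses; this is a minimal sketch of the general technique.

```python
import numpy as np

def equal_power_crossfade(a, b, fade_len):
    """Crossfade from signal a into signal b over fade_len samples.
    cos/sin gains satisfy fade_out**2 + fade_in**2 == 1, so the
    combined power stays roughly constant through the transition."""
    t = np.linspace(0.0, np.pi / 2, fade_len)
    fade_out = np.cos(t)  # gain applied to the tail of a
    fade_in = np.sin(t)   # gain applied to the head of b
    overlap = a[-fade_len:] * fade_out + b[:fade_len] * fade_in
    return np.concatenate([a[:-fade_len], overlap, b[fade_len:]])
```

An amplitude-linear fade would instead dip in loudness mid-transition for uncorrelated signals, which is why the equal-power law is the usual choice for joining unrelated audio segments.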