[Postponed] Towards improved sound representation for machine learning – Carlos Tarjano Santos

March 18, 2020 | 2:00 pm – 3:00 pm
Rolle 114, University of Plymouth

This event has been postponed due to the COVID-19 situation. A new date and time for an online presentation of the talk will be announced shortly.

Free entrance. For members of the university only, unless otherwise stated.

The perception of sound arises from the translation, via the nervous system, of fluctuations in air pressure experienced in regions of the auditory system. This continuous physical phenomenon is often encoded in digital form as a discrete sequence of numbers, measured at regular intervals, each indicating the air pressure at a particular point in time. This representation, while appropriate in a number of applications, is suboptimal for the majority of machine learning tasks: for example, it adds computational burden to already resource-intensive algorithms and offers little insight into the structural characteristics of the sound. Frequency-domain representations tend to be more informative and are a suitable alternative, especially in classification tasks. Despite alleviating some of the shortcomings of time-domain representations, they still exhibit undesirable characteristics in the context of sound synthesis, such as the dependence of the wavelet transform on the appropriate selection of a mother wavelet and the fact that the Fourier transform is not localized in time. These shortcomings become more prominent in real-time sound synthesis in particular, motivating the investigation of alternative representations better suited to the high dimensionality, cyclic nature and other inherent characteristics of sound. Better representations have the potential to advance research at the interface between artificial intelligence and audio, enabling, for example, the transfer of many advances seen in the computer vision field. The aim of this presentation is thus to explore the possibilities for sound representation while introducing a new transform that aims to address some of the aforementioned shortcomings.