[ 15 November 2023 ]

DEGEM News – [ak-discourse] Youtube live feed for MIR Seminar @ IRCAM Thursday Nov 16

From: Lepa, Steffen via ak discourse
Date: Tue, 14 Nov 2023
Subject: [ak-discourse] Youtube live feed for MIR Seminar @ IRCAM Thursday Nov 16

Dear all,

For those who could not attend ISMIR this year, or simply did not get enough of it, here are the links to the YouTube live feeds of a great MIR seminar hosted at IRCAM this Thursday, Nov 16 (all times are Paris time):

– morning: https://youtube.com/live/kUSA2voSbzU

– afternoon: https://youtube.com/live/sGwiuGvHz0o


10:00 Introduction

10:15 Gaël Richard : „Hybrid deep learning for music analysis and synthesis“

11:15 Rémi Mignot : „Invariance learning for a music indexing robust to sound modifications“

14:00 Rachel Bittner : „Basic Pitch: A lightweight model for multi-pitch, note and pitch bend estimations in polyphonic music“

15:00 Romain Hennequin : „Labeling a Large Music Catalog“


Gaël Richard : „Hybrid deep learning for music analysis and synthesis“

Access to ever-increasing super-computing facilities, combined with the availability of huge (although largely unannotated) data repositories, has led to a significant trend towards purely data-driven deep learning approaches. However, these methods only loosely take into account the nature and structure of the processed data. We believe that it is important to build hybrid deep learning methods instead, by integrating our prior knowledge about the nature of the processed data, their generation process or, if possible, their perception by humans. We will illustrate the potential of such model-based deep learning approaches (or hybrid deep learning) for music analysis and synthesis.

Rémi Mignot : „Invariance learning for a music indexing robust to sound modifications“

Music indexing makes it possible to find music excerpts in a large music catalog and to detect duplicates. With the rise of social media, it is increasingly important for music owners to detect misuse and illegal use of their music. The main difficulty of this task is detecting music excerpts that have been strongly modified, intentionally or not. To deal with this issue, the presented method is based on an audio representation that is relevant to the music content and robust to some sound modifications. Then, using a data augmentation approach, a discriminant transformation is learnt to improve the robustness of the compact representation. Finally, a hash function is derived to allow fast search in a large catalog together with robustness to bit corruption.
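As a rough illustration of why a binary hash code can tolerate bit corruption, here is a minimal Python sketch. It is not the method presented in the talk: the 16-bit codes, the track names and the helper functions `hamming` and `lookup` are all hypothetical, and the lookup is a plain linear scan, whereas a real system would use an index structure to search a large catalog quickly.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer-encoded hash codes."""
    return bin(a ^ b).count("1")


def lookup(query: int, catalog: dict[str, int], max_bits: int = 2) -> list[str]:
    """Return catalog keys whose code is within max_bits of the query, nearest first."""
    hits = [(hamming(query, code), name) for name, code in catalog.items()]
    return [name for dist, name in sorted(hits) if dist <= max_bits]


# Hypothetical 16-bit codes; a real system would derive them from audio features.
catalog = {
    "track_a": 0b1011001110001111,
    "track_b": 0b0100110001110000,
    "track_c": 0b1011001110000000,
}

# Simulate a modified excerpt of track_a by flipping two bits of its code.
corrupted = catalog["track_a"] ^ 0b0000000000010001
matches = lookup(corrupted, catalog)
print(matches)
```

Matching on a small Hamming distance rather than on exact equality is what lets the corrupted code still retrieve its source track in this toy setting.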

Rachel Bittner : „Basic Pitch: A lightweight model for multi-pitch, note and pitch bend estimations in polyphonic music“

„Basic-pitch“ is a lightweight neural network for musical instrument transcription, which supports polyphonic outputs and generalizes to a wide variety of instruments (including vocals). In this talk, we will discuss how we built and evaluated this efficient and simple model, which was shown experimentally to be substantially better than a comparable baseline at detecting notes. The model is trained to jointly predict frame-wise onsets, multi-pitch and note activations, and we showed experimentally that this multi-output structure improves the resulting frame-level note accuracy. We will also listen to examples of using (and misusing) this model for creative purposes, with our open-source Python library or demo website: thanks to its scalability, the model can run in the browser, and your audio doesn’t even leave your own computer.

Paper: https://arxiv.org/abs/2203.09893

Code: https://github.com/spotify/basic-pitch

Demo: https://basicpitch.spotify.com

Romain Hennequin : „Labeling a Large Music Catalog“

Music streaming services such as Deezer offer their users a catalog of tens of millions of songs. Navigating such a vast catalog requires retrieving and organizing musical knowledge in an automated way using music information retrieval tools. Various musical dimensions can be considered, such as genre or mood, and all of these dimensions come with ambiguities. The talk will describe common issues with labeling large music catalogs, how to deal with them, and the remaining challenges.


Gaël Richard is a Full Professor in audio signal processing at Telecom-Paris, Institut polytechnique de Paris and the scientific co-director of the Hi! PARIS interdisciplinary center on Artificial Intelligence and Data analytics. His research interests are mainly in the field of speech and audio signal processing and include topics such as signal models, source separation, machine learning methods for audio/music signals and music information retrieval. In 2020, he received the Grand prize of IMT-National academy of science for his research contribution in sciences and technologies. In 2022, he was awarded an advanced ERC grant of the European Union for a project on hybrid deep learning for audio (HI-Audio). He is an IEEE Fellow.

Rémi Mignot is a tenured researcher at IRCAM (UMR STMS 9912) in Paris, France, and a member of the Analysis/Synthesis team. His research expertise focuses on machine learning and signal processing applied to audio processing and indexing. He received a PhD in Signal and Image Processing from Télécom ParisTech with IRCAM in 2009. He then held a first post-doctoral position at the Langevin Institute (ESPCI ParisTech and UPMC in Paris), where he studied the sampling of room impulse responses using Compressed Sensing, and a second post-doctoral position at Aalto University in Espoo, Finland, with a Marie Curie post-doctoral fellowship, where he worked on the sound synthesis of musical instruments based on an „Extended“ Subtractive Synthesis approach. In 2014 he returned to IRCAM to work on audio indexing and music information retrieval, and he has held a permanent position there since 2018.

Rachel Bittner is a Research Manager at Spotify in Paris. Before Spotify, she worked at NASA Ames Research Center in the Human Factors division. She received her Ph.D. degree in music technology and digital signal processing from New York University. Before that, she did a Master’s degree in Mathematics at New York University, and a joint Bachelor’s degree in Music Performance and Math at UC Irvine. Her research interests include automatic music transcription, musical source separation, metrics, and dataset creation.

Romain Hennequin is a Research Scientist at Deezer, where he heads the research team. He graduated in Computer Science from Ecole Polytechnique, UPMC (now Sorbonne Université), and Telecom Paris and earned a Ph.D. in signal processing from Telecom Paris. He has been working in industrial research for more than 10 years, addressing various topics such as source separation, music information retrieval, recommender systems, and graph mining.


Mathieu Lagrange

CNRS Researcher (HDR) (https://mathieulagrange.github.io)

on AI for Audio (https://audio.ls2n.fr)

Head of the SIMS team (https://sims.ls2n.fr)

LS2N, Campus Ecole Centrale de Nantes

1, rue de la Noë, 44321 Nantes, FRANCE

Zoom phone: +33 240 379 964