[ January 17, 2019 ]

DEGEM News – NEWS – Investigation of ‘label noise’ in sound event classification

From: Eduardo Fonseca via Kevin Austin via cec conference
Date: Sun, 13 Jan 2019
Subject: [cec-c] Investigation of ‘label noise’ in sound event classification


We’re pleased to announce the release of FSDnoisy18k, an open dataset to foster the investigation of label noise in sound event classification. It contains 42.5 hours of audio across 20 sound classes, including a small amount of manually labeled data and a larger quantity of real-world noisy data.

The dataset is released as part of our publication:

Learning Sound Event Classifiers from Web Audio with Noisy Labels
E. Fonseca, M. Plakal, D. P. W. Ellis, F. Font, X. Favory, and X. Serra.
arXiv preprint arXiv:1901.01189, 2019

where we present the dataset and a CNN baseline system. We show that training with large amounts of noisy data can outperform training with smaller amounts of carefully labeled data. We also show that noise-robust loss functions can be effective in improving performance in the presence of corrupted labels.
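
To give a rough idea of what a noise-robust loss function can look like, here is a minimal sketch of one example from the literature, the generalized cross-entropy (L_q) loss. This announcement does not say which losses the paper evaluates, so please treat the choice of loss, the function name lq_loss, and the value q = 0.7 as illustrative assumptions rather than a description of the released code.

```python
import numpy as np

def lq_loss(probs, labels, q=0.7):
    """Generalized cross-entropy (L_q) loss, averaged over the batch.

    Illustrative sketch only; name and default q are assumptions.
    probs:  (batch, classes) predicted class probabilities
    labels: (batch,) integer class indices, possibly noisy
    q:      in (0, 1]; as q approaches 0 the loss approaches
            cross-entropy, and q = 1 gives 1 - p (MAE-like)
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    # Probability the model assigns to each (possibly wrong) label.
    p_true = probs[np.arange(len(labels)), labels]
    # Bounded above by 1/q, so a confidently "wrong" example
    # (often a mislabeled one) cannot dominate the batch loss.
    return float(np.mean((1.0 - p_true ** q) / q))

# Usage with 20 classes, as in FSDnoisy18k: uniform predictions
# for two clips with arbitrary labels.
uniform = np.full((2, 20), 1.0 / 20)
loss = lq_loss(uniform, [3, 17])
```

The point of the bound is that standard cross-entropy grows without limit as the probability of the given label goes to zero, whereas this loss saturates at 1/q, which limits the influence of clips whose labels are wrong.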

FSDnoisy18k dataset: http://www.eduardofonseca.net/FSDnoisy18k/
Source code: https://github.com/edufonseca/icassp19

We hope you find these resources useful!

Thanks!

Eduardo


Eduardo Fonseca
Music Technology Group
Universitat Pompeu Fabra