From: Eduardo Fonseca via Kevin Austin via cec conference
Date: Sun, 13 Jan 2019
Subject: [cec-c] Investigation of 'label noise' in sound event classification
We’re pleased to announce the release of FSDnoisy18k, an open dataset to foster the investigation of label noise in sound event classification. It contains 42.5 hours of audio across 20 sound classes, including a small amount of manually labeled data and a larger quantity of real-world noisy data.
The dataset is released as part of our publication:
Learning Sound Event Classifiers from Web Audio with Noisy Labels
E. Fonseca, M. Plakal, D. P. W. Ellis, F. Font, X. Favory, and X. Serra.
arXiv preprint arXiv:1901.01189, 2019
where we present the dataset and a CNN baseline system. We show that training with large amounts of noisy data can outperform training with smaller amounts of carefully labeled data. We also show that noise-robust loss functions can be effective in improving performance in the presence of corrupted labels.
We hope you find these resources useful!
Music Technology Group
Universitat Pompeu Fabra