Hearing Aid Research Data Set for Acoustic Environment Recognition (HEAR-DS)
HEAR-DS provides binaural audio material recorded in acoustic environments typical of hearing aid users. Its goal is to help researchers train and test algorithms in environments relevant to hearing aids. A particular focus is on machine learning approaches such as deep neural networks (DNNs).
Please cite this work with DOI
10.1109/ICASSP40776.2020.9053611:
Hearing Aid Research Data Set for Acoustic Environment Recognition https://ieeexplore.ieee.org/document/9053611
(Andreas Hüwel, Dr. Kamil Adiloğlu and Dr. Jörg-Hendrik Bach), published at ICASSP2020
Download
HEAR-DS download link
Parts of HEAR-DS
HEAR-DS consists of the following parts. For the licensing of each part, see the LICENSE.txt in its subfolder:
- HEAR-DS/RawAudioCuts
- HEAR-DS/AudioSnippets
- HEAR-DS/Code
For further details, see
HEAR-DS README.txt
"Your browser may have problems with correctly showing the tree structure used in the readme.txt file. For this reason, please download the readme.txt and open it in an appropriate editor."
Overview Acoustic Environments
Cocktail party | |
Interfering speakers | |
In traffic | Speech in traffic |
In vehicle | Speech in vehicle |
Music | Speech in music |
Quiet indoors | Speech in quiet indoors |
Reverberant environment | Speech in reverberant environment |
Wind turbulence | Speech in wind turbulence |
Example of Speech in Background SNR Variations
Acoustic Environment | |||||
Speech in vehicle | SNR -10 | SNR -5 | SNR 0 | SNR 5 | SNR 10 |
As described in the paper, some of the audio material comes from third parties and therefore cannot be provided here. However, all the required data is accessible online, and with the scripts we provide, anyone can regenerate the entire data set themselves.
The noise audio material is from CHiME5, and the speech material mixed into the speech in background environments is from CHiME2. For CHiME2 (2013) and CHiME5 (2018), please contact the organizers for access to the datasets. Audio for music is from GTZAN.
An acoustic environment contains audio from different recording situations. Each recording situation has a unique ID (rec_id) and contains one or more recording sessions. From the raw audio of each recording session, we manually cut appropriate pieces of audio (the cuts) to fill each recording situation with audio; each cut has a locally unique cut_id. To generate the actual dataset for training machine learning systems, we performed another processing step that generates the 10 s audio samples for each acoustic environment, as further described in the Audio Samples subsection.
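The step from variable-length cuts to fixed-length samples can be sketched as follows. This is only an illustration, not the code shipped in HEAR-DS/Code: the function name `snippet_bounds` is hypothetical, and the choices to use non-overlapping snippets and to drop a trailing remainder shorter than 10 s are assumptions.

```python
def snippet_bounds(num_samples, sample_rate, snippet_seconds=10):
    """Return (start, end) sample indices of consecutive full-length snippets.

    Assumption: snippets do not overlap, and a remainder shorter than
    snippet_seconds at the end of the cut is discarded.
    """
    snippet_len = int(snippet_seconds * sample_rate)
    if snippet_len <= 0:
        raise ValueError("snippet length must be positive")
    count = num_samples // snippet_len
    return [(i * snippet_len, (i + 1) * snippet_len) for i in range(count)]


# A hypothetical 35 s cut at 48 kHz gives three full 10 s snippets;
# the remaining 5 s are not used.
bounds = snippet_bounds(num_samples=35 * 48000, sample_rate=48000)
print(bounds)
```

The index pairs can be used to slice the sample array of a cut, one slice per snippet.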
HEAR-DS Raw Audio Cuts
For each recording situation, one folder contains all the cut wav files.
Folder structure of the HEAR-DS samples:
For details, see
HEAR-DS README.txt
Due to the manual process of audio editing, the length of the cuts varies. The naming scheme is:
rec_id_<REC_ID>_cut_<CUT_ID>_<CUT_NAME>_<TRACKNAME>_<EXPORTER>.wav
<REC_ID> is a 3-digit number and <CUT_ID> a 2-digit number. <CUT_NAME> could be, for example, "startengine_driveoff" for InVehicle or "bell" for ReverberantEnvironment. <TRACKNAME> stands for one of the hearing aid microphones used [Mic_BTE_L_front, Mic_BTE_L_rear, Mic_BTE_R_front, Mic_BTE_R_rear, Mic_ITC_L, Mic_ITC_R]. <EXPORTER> is the name of the audio exporter used, currently "raw_48kHz32bit".
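A raw cut filename could be split into its fields roughly as shown below. The sketch assumes that the fields appear in the order rec_id_(3-digit number)_cut_(2-digit number)_(cut name)_(microphone)_(exporter).wav; the function name, the dictionary keys, and the example filename are hypothetical and should be checked against the actual files and the README. Because cut names, microphone names, and the exporter name all contain underscores, the known microphone names are used to find the field boundaries.

```python
import re

# Microphone names as listed in the description of the naming scheme.
MICROPHONES = [
    "Mic_BTE_L_front", "Mic_BTE_L_rear",
    "Mic_BTE_R_front", "Mic_BTE_R_rear",
    "Mic_ITC_L", "Mic_ITC_R",
]

# Assumed field order; the key names below are chosen for illustration only.
_CUT_RE = re.compile(
    r"^rec_id_(?P<rec_id>\d{3})_cut_(?P<cut_id>\d{2})_(?P<cut_name>.+)_"
    r"(?P<microphone>" + "|".join(MICROPHONES) + r")_(?P<exporter>.+)\.wav$"
)


def parse_cut_name(filename):
    """Split a raw cut filename into its fields (IDs are kept as strings)."""
    match = _CUT_RE.match(filename)
    if match is None:
        raise ValueError(f"not a raw cut filename: {filename!r}")
    return match.groupdict()


# Hypothetical filename built from the examples given in the text.
fields = parse_cut_name(
    "rec_id_001_cut_02_startengine_driveoff_Mic_BTE_L_rear_raw_48kHz32bit.wav"
)
print(fields["cut_name"], fields["microphone"], fields["exporter"])
```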
In this processing step, the raw audio cuts were further decomposed into 10 s snippets. These 10 s snippets are either used directly as background samples or mixed with random speech at different SNRs to create audio samples for the speech in background environments. The binaural speech source material comes from five different directions, from which we select at random; the start and end times of the source speech and the start time of the background snippet are also randomly selected. Finally, these 10 s samples form the HEAR-DS audio material for training machine learning systems, e.g., as input for the feature extraction step of deep neural networks.
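Mixing speech into a background snippet at a given SNR can be sketched as follows. This is a minimal single-channel illustration in plain Python, not the procedure used by the HEAR-DS scripts: the function names are hypothetical, the SNR is assumed to be defined on mean signal power over the whole snippet, and the speech (rather than the background) is assumed to be the signal that gets scaled. The toy signals are random noise, not audio from the data set.

```python
import math
import random


def mean_power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, background, snr_db):
    """Scale the speech so that the speech-to-background power ratio
    equals snr_db (in dB), then add it to the background."""
    if len(speech) != len(background):
        raise ValueError("speech and background must have the same length")
    p_speech = mean_power(speech)
    p_background = mean_power(background)
    if p_speech == 0 or p_background == 0:
        raise ValueError("signals must not be silent")
    # Target: 10 * log10(gain**2 * p_speech / p_background) == snr_db
    gain = math.sqrt(p_background / p_speech * 10 ** (snr_db / 10))
    return [b + gain * s for s, b in zip(speech, background)]


# Toy signals standing in for a speech segment and a background snippet.
rng = random.Random(0)
speech = [rng.gauss(0.0, 0.1) for _ in range(16000)]
background = [rng.gauss(0.0, 0.3) for _ in range(16000)]

# One mixture per SNR level, as in the SNR variations example.
mixtures = {snr: mix_at_snr(speech, background, snr) for snr in (-10, -5, 0, 5, 10)}
```

For binaural material, one would probably compute a single gain and apply it to both ear channels so that interaural level differences are preserved; this, too, is an assumption rather than a documented detail.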
Audio sample snippet file format
The naming scheme for snippets is:
<ENV_ID>_<REC_ID>_<CUT_ID>_<SNIP_ID>_<TRACKNAME>_<SAMPLERATE>.wav
- <ENV_ID>: 2-digit ID of the acoustic environment; each speech in background environment has its own ID, separate from that of the pure background environment.
- <REC_ID>: 3-digit ID of the recording situation.
- <CUT_ID>: 2-digit ID of the cut within the recording situation (unique across all sessions of that situation).
- <SNIP_ID>: 3-digit ID of the snippet within this cut.
- <TRACKNAME>: as described above.
- <SAMPLERATE>: in [48kHz, 16kHz]
For example, for the 16kHz version of the first snippet of the first cut of the recording situation "Oldenburg Church" in Reverberant Environment, the snippet filename is 06_005_00_000_BTE_L_front_16kHz.wav
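The snippet naming scheme can be parsed with a regular expression, for example to group files by environment or to select one sample rate. The following is a minimal sketch; the class and function names are hypothetical and not part of the HEAR-DS code. The IDs are kept as strings so that leading zeros are preserved.

```python
import re
from typing import NamedTuple


class SnippetName(NamedTuple):
    env_id: str
    rec_id: str
    cut_id: str
    snip_id: str
    trackname: str
    samplerate: str


# <ENV_ID>_<REC_ID>_<CUT_ID>_<SNIP_ID>_<TRACKNAME>_<SAMPLERATE>.wav
_SNIPPET_RE = re.compile(
    r"^(?P<env_id>\d{2})_(?P<rec_id>\d{3})_(?P<cut_id>\d{2})_(?P<snip_id>\d{3})"
    r"_(?P<trackname>.+)_(?P<samplerate>48kHz|16kHz)\.wav$"
)


def parse_snippet_name(filename):
    """Split a snippet filename into the fields of the naming scheme."""
    match = _SNIPPET_RE.match(filename)
    if match is None:
        raise ValueError(f"not a snippet filename: {filename!r}")
    return SnippetName(**match.groupdict())


info = parse_snippet_name("06_005_00_000_BTE_L_front_16kHz.wav")
print(info.env_id, info.rec_id, info.trackname, info.samplerate)
```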
For details, see
HEAR-DS.README.txt
This work was supported by the German Federal Ministry of Education and Research (BMBF), FZK 02K16C202 AUDIO-PSS.
The authors would like to thank Marei Typlt and the partners in the AUDIO-PSS project for their support in designing the acoustic environments and Audifon GmbH for providing the hearing aid dummies.