araus-dataset-baseline-models

This repository stores code to download the ARAUS dataset (v1 and v2) and train baseline models for the dataset (v1 only). For more details on the initial publication of the dataset (ARAUSv1), please refer to our publication (a preprint can be found on arXiv at https://doi.org/10.48550/arXiv.2207.01078):

Kenneth Ooi, Zhen-Ting Ong, Karn N. Watcharasupat, Bhan Lam, Joo Young Hong, Woon-Seng Gan, ARAUS: A large-scale dataset and baseline models of affective responses to augmented urban soundscapes, IEEE Transactions on Affective Computing, 2023.

For more details on ARAUSv2, please refer to the following publication (a preprint can be found at https://hdl.handle.net/10356/168665):

Kenneth Ooi, Zhen-Ting Ong, Karn N. Watcharasupat, Bhan Lam, Trevor Wong, Woon-Seng Gan. ARAUSv2: An Expanded Dataset and Multimodal Models of Affective Responses to Augmented Urban Soundscapes. In Proceedings of the 52nd International Congress and Exposition on Noise Control Engineering (Inter-Noise), Chiba (2023).

The ARAUS dataset makes use of urban soundscape recordings from the Urban Soundscapes of the World (USotW) database. If you use the ARAUS dataset or the USotW recordings in your work, please cite the following publication:

Bert De Coensel, Kang Sun and Dick Botteldooren. Urban Soundscapes of the World: selection and reproduction of urban acoustic environments with soundscape in mind. In Proceedings of the 46th International Congress and Exposition on Noise Control Engineering (Inter-Noise), Hong Kong (2017).

The ARAUS dataset also makes use of masker recordings taken from Freesound or Xeno-canto. If you use the ARAUS dataset or the masker recordings in your work, please ensure that you comply with the license (mainly Creative Commons licenses) and citation terms of the individual source files. We have collated these terms as part of the ARAUS dataset for easy reference, but make no representation or guarantee as to their accuracy or currency. In the event of any discrepancy, please defer to the metadata of the original source files hosted on their corresponding databases.

Getting started

First, obtain this repository either by downloading it manually from https://github.com/ntudsp/araus-dataset-baseline-models or by entering the following line in a terminal (this requires git to be installed):

git clone https://github.com/ntudsp/araus-dataset-baseline-models.git

You may then navigate to the downloaded folder with

cd araus-dataset-baseline-models

If you are using conda as your package manager, you may enter the following line into a terminal to install the required packages into a conda environment (or you may install them manually using the requirements stated in araus.yml):

conda env create -f araus.yml

Activate the conda environment by entering the following line into a terminal:

conda activate araus

(If you are running the code on a computer with macOS installed, and the above commands fail, try conda env create -f araus-mac.yml and conda activate araus-mac instead.)

To download the files making up the dataset (which include the raw audio of the individual maskers and soundscapes, as well as CSV files containing metadata and all data on the subjective responses in the dataset), you may then enter the following into a terminal (this will download ~3 GB of data from the Internet):

cd ./code
python download.py manifest.csv

If all files have downloaded successfully, your directory structure should match the one shown under "Directory structure" for the state after running `python download.py manifest.csv`.
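
As a quick sanity check after the download, the presence of the expected folders and CSV files can be verified programmatically. The following is a minimal sketch and not part of the repository: the helper name `missing_entries` is hypothetical, and the lists of expected entries are copied from the directory structure documented in this README.

```python
from pathlib import Path

# Top-level folders listed in this README for the state after download.py has run.
EXPECTED_DIRS = [
    "code", "data", "datav2", "figures",
    "maskers", "soundscapes", "soundscapes_raw",
]

# CSV files listed in this README for both ./data (v1) and ./datav2 (v2).
EXPECTED_CSVS = [
    "maskers.csv", "participants.csv", "participants_rejected.csv",
    "participants_rejected_reasons.csv", "responses.csv",
    "responses_rejected.csv", "soundscapes.csv",
]


def missing_entries(repo_root):
    """Return the expected folders and CSV files that are absent under repo_root."""
    root = Path(repo_root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    for folder in ("data", "datav2"):
        for name in EXPECTED_CSVS:
            if not (root / folder / name).is_file():
                missing.append(f"{folder}/{name}")
    return missing
```

When called from `./code` as `missing_entries("..")`, an empty list would indicate that every documented folder and CSV file is present. The sketch does not check file counts or file contents.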

Then, to run the replication code and baseline models, as reported in our publication, you may enter the following line into a terminal (this opens a Jupyter Notebook in your default browser):

jupyter lab --notebook-dir .. replication_code.ipynb

Alternatively, if you only wish to generate the augmented soundscapes for which the subjective responses in the ARAUSv1 dataset were collected (e.g. for your own analysis or exploration), you may then enter the following line into a terminal (this will generate ~132 GB of data as WAV files):

python make_augmented_soundscapes.py

The augmented soundscapes in the ARAUSv1 dataset may be generated in FLAC format instead (~50 GB of data):

python make_augmented_soundscapes.py -of flac

To generate the augmented soundscapes for ARAUSv2 (~166 GB), you may enter the following line into a terminal:

python make_augmented_soundscapes.py ..\datav2\responses.csv ..\datav2\soundscapes.csv ..\datav2\maskers.csv -f 0 1 2 3 4 5 6 7
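
The command above uses Windows-style path separators. On other platforms, the same call can be assembled with platform-appropriate paths. The following is a minimal sketch, assuming it is run from `./code` and that the positional arguments and the `-f` flag behave exactly as in the command above; the function name `araus_v2_command` is hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def araus_v2_command(data_dir=Path("..") / "datav2", folds=range(8)):
    """Build the argument list for generating the ARAUSv2 augmented soundscapes."""
    return [
        sys.executable,
        "make_augmented_soundscapes.py",
        # Same positional order as in the README command:
        # responses, soundscapes, maskers.
        str(data_dir / "responses.csv"),
        str(data_dir / "soundscapes.csv"),
        str(data_dir / "maskers.csv"),
        "-f",
        *[str(fold) for fold in folds],
    ]


# Uncomment to actually run the script (this generates ~166 GB of data):
# subprocess.run(araus_v2_command(), check=True)
```

Passing the arguments as a list avoids shell-specific quoting and separator issues, since `pathlib` inserts the separator of the current platform.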

We have also added a Jupyter Notebook containing extra code and discussions outside the scope of our publication, which you may access by entering the following line into a terminal (this opens a Jupyter Notebook in your default browser):

jupyter lab --notebook-dir .. extra_code.ipynb

Directory structure

This repository

.
├── code                                               # Code used to process the raw data and output the results.
│   ├── araus_tf.py
│   ├── araus_utils.py
│   ├── download.py   
│   ├── extra_code.ipynb 
│   ├── make_augmented_soundscapes.py
│   ├── manifest.csv
│   └── replication_code.ipynb
│
├── CITATION.cff                                       # Citation information for the dataset in plain text.
├── README.md                                          # This file.
├── araus-mac.yml                                      # The Anaconda environment containing required packages to run all the code in ./code (for macOS)
└── araus.yml                                          # The Anaconda environment containing required packages to run all the code in ./code (for Windows and Ubuntu).

After running `python download.py manifest.csv`

.
├── code                                               # Code used to process the raw data and output the results.
│   ├── araus_tf.py
│   ├── araus_utils.py
│   ├── download.py 
│   ├── extra_code.ipynb   
│   ├── make_augmented_soundscapes.py
│   ├── manifest.csv
│   └── replication_code.ipynb
│
├── data                                               # Folder containing all CSV data in the ARAUS dataset (v1)
│   ├── maskers.csv
│   ├── participants.csv
│   ├── participants_rejected.csv
│   ├── participants_rejected_reasons.csv
│   ├── responses.csv
│   ├── responses_rejected.csv
│   └── soundscapes.csv
│
├── datav2                                             # Folder containing all CSV data in the ARAUS dataset (v2)
│   ├── maskers.csv
│   ├── participants.csv
│   ├── participants_rejected.csv
│   ├── participants_rejected_reasons.csv
│   ├── responses.csv
│   ├── responses_rejected.csv
│   └── soundscapes.csv
│
├── figures                                            # Folder containing all reference figures used in the Jupyter Notebook (123 files; all png)
│   ├── circumplex_distribution_araus.png
│   ├── ...
│   └── times_taken_with_test.png
│
├── maskers                                            # Folder containing all maskers used in the ARAUS dataset (407 files; all mono, 44.1 kHz, 30 seconds in length).
│   ├── bird_00001.wav
│   ├── ...
│   └── wind_20016.wav
│
├── soundscapes                                        # Folder containing all soundscapes used in the ARAUS dataset (396 files; all binaural, 44.1 kHz, 30 seconds in length).
│   ├── R0001_segment_binaural_44100_1.wav
│   ├── ...
│   └── S0062_1min_SQobold_bin_44100_2.wav
│ 
├── soundscapes_raw                                    # Folder containing soundscapes from the Urban Soundscapes of the World database and the Lion City Soundscapes dataset (195 files; all binaural, 48 kHz, 60 seconds in length).
│   ├── R0001_segment_binaural.wav
│   ├── ...
│   └── S0062_1min_SQobold_bin.wav
│
├── CITATION.cff                                       # Citation information for the dataset in plain text.
├── README.md                                          # This file.
├── araus-mac.yml                                      # The Anaconda environment containing required packages to run all the code in ./code (for macOS)
└── araus.yml                                          # The Anaconda environment containing required packages to run all the code in ./code (for Windows and Ubuntu).

For more details on the contents of each CSV file, please refer to the sections with their filenames as titles.

Data files

All metadata on the soundscapes and maskers used for this dataset (ARAUSv1 in the data folder and ARAUSv2 in the datav2 folder), as well as all subjective perceptual responses collected as part of this dataset, is organised into four CSV files:

maskers.csv : Information about the maskers used in the dataset.
- For ARAUSv1, this consists of 280 non-silent maskers in the five-fold cross-validation set, 7 non-silent maskers in the independent test set, and 6 silent "maskers" used when no masker is to be added to an urban soundscape.
- For ARAUSv2, this consists of 280 non-silent maskers in the ARAUSv1 five-fold cross-validation set, 7 non-silent maskers in the ARAUSv1 test set, 112 non-silent maskers in the ARAUSv2 test set, and 8 silent "maskers" used when no masker is to be added to an urban soundscape.
soundscapes.csv : Information about the soundscapes used in the dataset
- For ARAUSv1, this consists of 234 soundscapes from the Urban Soundscapes of the World database in the five-fold cross-validation set, 6 soundscapes recorded by us in the independent test set, and 7 soundscapes not used in either set (the description of fold_s = -1 under soundscapes.csv explains why).
- For ARAUSv2, this consists of the above 247 soundscapes, and 96 soundscapes from the Lion City Soundscapes dataset in the ARAUSv2 test set.
participants.csv : Information about the participants who provided their responses to the stimuli used in the ARAUS dataset.
- For ARAUSv1, this consists of 600 participants in the five-fold cross-validation set and 5 in the independent test set
- For ARAUSv2, this consists of 680 participants in the ARAUSv1 five-fold cross-validation set, 5 in the ARAUSv1 test set, and 64 in the ARAUSv2 test set.
responses.csv : Information about the individual stimuli and responses used in the ARAUS dataset.
- For ARAUSv1, this consists of 25,200 responses in the five-fold cross-validation set, 240 responses in the independent test set, and 1,815 responses to auxiliary stimuli used for practice and data quality checks
- For ARAUSv2, this consists of 28,560 responses in the ARAUSv1 five-fold cross-validation set, 240 responses in the ARAUSv1 test set, 2,688 responses in the ARAUSv2 test set, and 2,247 responses to auxiliary stimuli used for practice and data quality checks

Further details on the CSV files can be found below the corresponding subheaders in this section.
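
The participant and response counts quoted above can be cross-checked with simple arithmetic. The following is a minimal sketch: the per-participant figures it derives are not stated in this README, and the check assumes that every participant in a given subset contributed the same number of responses and that auxiliary stimuli were presented to all participants.

```python
# (participants, responses) per subset, copied from the counts quoted in this README.
COUNTS = {
    "ARAUSv1": {"cv": (600, 25200), "test_v1": (5, 240), "auxiliary": 1815},
    "ARAUSv2": {
        "cv": (680, 28560),
        "test_v1": (5, 240),
        "test_v2": (64, 2688),
        "auxiliary": 2247,
    },
}


def responses_per_participant(n_participants, n_responses):
    """Return responses per participant, insisting on an exact division."""
    quotient, remainder = divmod(n_responses, n_participants)
    if remainder:
        raise ValueError("responses are not evenly divisible by participants")
    return quotient


def summarise(version):
    """Derive responses per participant for each subset of one dataset version."""
    entry = COUNTS[version]
    subsets = {name: pair for name, pair in entry.items() if name != "auxiliary"}
    summary = {
        name: responses_per_participant(*pair) for name, pair in subsets.items()
    }
    # Assumption: auxiliary stimuli were presented to every participant.
    total_participants = sum(n for n, _ in subsets.values())
    summary["auxiliary"] = responses_per_participant(
        total_participants, entry["auxiliary"]
    )
    return summary


for version in COUNTS:
    print(version, summarise(version))
```

Under these assumptions, every division is exact, which suggests that the quoted counts are mutually consistent.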

The CSV files can also be considered as individual database tables in an SQL database with the following fields as keys:

maskers.csv : Primary key = masker
soundscapes.csv : Primary key = soundscape
participants.csv : Primary key = participant
responses.csv : Primary key = (participant, stimulus_index), foreign keys = masker, soundscape, participant
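
The key relationships above can be illustrated with an in-memory SQLite database. This is a minimal sketch using toy rows rather than the real CSV contents: only the key columns named above (plus `class` and `insitu_leq` from the field descriptions) are included, and the participant identifier, the column types, and the numeric value are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys on request
conn.executescript("""
CREATE TABLE maskers (masker TEXT PRIMARY KEY, "class" TEXT);
CREATE TABLE soundscapes (soundscape TEXT PRIMARY KEY, insitu_leq REAL);
CREATE TABLE participants (participant TEXT PRIMARY KEY);
CREATE TABLE responses (
    participant TEXT REFERENCES participants(participant),
    stimulus_index INTEGER,
    masker TEXT REFERENCES maskers(masker),
    soundscape TEXT REFERENCES soundscapes(soundscape),
    PRIMARY KEY (participant, stimulus_index)
);
""")

# Toy rows for illustration only; the values are not taken from the dataset.
conn.execute("INSERT INTO maskers VALUES ('bird_00001.wav', 'bird')")
conn.execute(
    "INSERT INTO soundscapes VALUES ('R0001_segment_binaural_44100_1.wav', 65.0)"
)
conn.execute("INSERT INTO participants VALUES ('toy_participant_1')")
toy_response = (
    "toy_participant_1", 1, "bird_00001.wav", "R0001_segment_binaural_44100_1.wav"
)
conn.execute("INSERT INTO responses VALUES (?, ?, ?, ?)", toy_response)

# Join each response with the metadata of its masker and soundscape.
rows = conn.execute("""
    SELECT r.participant, r.stimulus_index, m."class", s.insitu_leq
    FROM responses AS r
    JOIN maskers AS m ON r.masker = m.masker
    JOIN soundscapes AS s ON r.soundscape = s.soundscape
""").fetchall()
print(rows)

# The composite primary key rejects a second row with the same
# (participant, stimulus_index) pair.
try:
    conn.execute("INSERT INTO responses VALUES (?, ?, ?, ?)", toy_response)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The same joins can be expressed with any dataframe library by merging responses.csv with the other three files on the `masker`, `soundscape`, and `participant` columns.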

There are three additional files containing rejected data from the dataset that we include in this repository for transparency and accountability, but that we do not recommend using:

participants_rejected.csv : Information about participants whose responses were rejected from the dataset.
- For ARAUSv1, this consists of 37 in the five-fold cross-validation set and 0 in the independent test set
- For ARAUSv2, this consists of 41 in the ARAUSv1 five-fold cross-validation set, 0 in the ARAUSv1 test set, and 8 in the ARAUSv2 test set.
participants_rejected_reasons.csv : Information about reasons why participants' responses were rejected from the dataset.
responses_rejected.csv : Information about the responses that were rejected from this dataset

`maskers.csv`

This CSV file contains information related to the maskers used to generate the stimuli for which the responses in responses.csv were collected, as well as relevant psychoacoustic parameters of the maskers computed after calibration to an L_A,eq value of 65 dB.

The maskers were processed from original source files that came from either Freesound or Xeno-canto, and hence may differ from the original files found at their respective websites because:

Portions of files that were originally longer than 30 seconds were cut to create the corresponding 30-second masker in this repository.
Portions of files that were originally shorter than 30 seconds may have been repeated (or have had silence added) to create the corresponding 30-second masker in this repository.
Noise cancellation/high-pass filtering may have been performed (using Audacity 2.3.2) on the original file to reduce the ambient/microphone noise present in it.

Hence, we recommend using the maskers downloaded using ./code/download.py (or manually downloaded from https://doi.org/10.21979/N9/9OTEVX) instead of the original source files at Freesound or Xeno-canto for analysis and training of models; information regarding the original source files has been provided only for the purposes of accountability and transparency.

Fields

masker : unique strings
- The name of the file containing the masker.
fold_m : integers in {0, 1, 2, 3, 4, 5, 6, 7}
- The fold index of the masker. The sets of maskers in each fold are pairwise disjoint.
- Keys:
  - 0 : ARAUSv1 test set.
  - 1 : Fold 1 of the 5-fold cross-validation set.
  - 2 : Fold 2 of the 5-fold cross-validation set.
  - 3 : Fold 3 of the 5-fold cross-validation set.
  - 4 : Fold 4 of the 5-fold cross-validation set.
  - 5 : Fold 5 of the 5-fold cross-validation set.
  - 6 : Fold 1 of the ARAUSv2 test set.
  - 7 : Fold 2 of the ARAUSv2 test set.
class : strings in {"bird", "construction", "silence", "traffic", "water", "wind"}
- The class that the masker belongs to.
- There is only one masker in the "silence" class for each fold and its corresponding audio file is a sequence of all zeros.
site : strings in {"Freesound", "Xeno-canto", "NIL"}
- The site where the original source file (that was processed to create the masker in this repository) can be found.
- "NIL" is used for maskers in the "silence" class.
- Please see the Freesound and Xeno-canto websites for more information on their respective databases.
site_index : non-negative integers
- The index of the original source file (that was processed to create the masker in this repository) on the website named in site.
- 0 is used for maskers in the "silence" class.
- Files originating from Freesound can be accessed at https://freesound.org/s/{site_index}/ and files originating from Xeno-canto can be accessed at www.xeno-canto.org/{site_index}.
citation : strings
- The citation for the original source files, as required by the respective sites to which the original source files were uploaded.
- "NIL" is used for maskers in the "silence" class.
- If you use any of these maskers for your work, please include the string in this field in your citations or acknowledgements.
license : strings
- The license under which the original source files (and hence the corresponding masker in this repository) are licensed.
- "NIL" is used for maskers in the "silence" class.
- We license the maskers provided in this repository under identical licenses as the original source files on Freesound/Xeno-canto. Hence, the licenses are potentially different for different maskers. An exhaustive list of the possible licenses appearing in this field is as follows (click on the links for more information on the respective licenses):
comments : strings
- Comments that we have regarding the masker (if any).
gain_##dB : floating point numbers
- Gain to apply to achieve an L_A,eq of ## decibels when played back over a pair of Beyerdynamic Custom One Pro headphones, powered by a Creative SoundBlaster E5 soundcard (set at volume 40).
- ## is replaced with integers between 46 and 83, inclusive.
- A gain of 1 is used for maskers in the "silence" class.
leq_at_gain_##dB : floating point numbers
- Actual L_A,eq measured by a GRAS 45BB Head and Torso Simulator, when a gain of gain_##dB was applied before playback over a pair of Beyerdynamic Custom One Pro headphones, powered by a Creative SoundBlaster E5 soundcard (set at volume 40).
- ## is replaced with integers between 46 and 83, inclusive.
- An L_A,eq of ## is used for maskers in the "silence" class.
Savg_m : floating point numbers
- Mean sharpness (in acum) over time, computed according to DIN 45692 assuming free field conditions. The computation method is as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies.
Smax_m : floating point numbers
- Maximum sharpness (in acum) attained, computed according to DIN 45692 assuming free field conditions.
Sargmax_m : floating point numbers
- Time (in seconds) when maximum sharpness (in acum) was attained, computed according to DIN 45692 assuming free field conditions.
S##_m : floating point number
- ## percent exceedance level of sharpness (in acum), computed according to DIN 45692 assuming free field conditions. This is the value exceeded ## percent of the time.
- ## is replaced by integers in {05, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95}.
Navg_m : floating point numbers
- Mean loudness (in sone) over time, computed according to ISO 532-1 assuming free field conditions. The computation method is as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies.
Nrmc_m : floating point numbers
- Root mean cubed loudness (in sone) over time, computed according to ISO 532-1 assuming free field conditions.
- This is the cube root of the mean of the cubed loudness values over time (i.e. the L₃ norm normalised by the number of values), so e.g. the root mean cube of the values 1 and 2 is ∛((1³ + 2³)/2) ≈ 1.65.
Nmax_m : floating point numbers
- Maximum loudness (in sone) attained, computed according to ISO 532-1 assuming free field conditions.
Nargmax_m : floating point numbers
- Time (in seconds) when maximum loudness (in sone) was attained, computed according to ISO 532-1 assuming free field conditions.
N##_m : floating point number
- ## percent exceedance level of loudness (in sone), computed according to ISO 532-1 assuming free field conditions. This is the value exceeded ## percent of the time.
- ## is replaced by integers in {05, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95}.
Favg_m : floating point numbers
- Mean fluctuation strength (in vacil) over time, at 1/1 Bark resolution, computed as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies.
- See Chapter 10 of "Psychoacoustics: Facts and Models (3rd ed.)" by Zwicker and Fastl for further information.
Fmax_m : floating point numbers
- Maximum fluctuation strength (in vacil) attained, at 1/1 Bark resolution, computed as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies.
Fargmax_m : floating point numbers
- Time (in seconds) when maximum fluctuation strength (in vacil) was attained, at 1/1 Bark resolution, computed as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies.
F##_m : floating point number
- ## percent exceedance level of fluctuation strength (in vacil), at 1/1 Bark resolution, computed as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies. This is the value exceeded ## percent of the time.
- ## is replaced by integers in {05, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95}.
LAavg_m : floating point numbers
- Exponentially averaged A-weighted sound pressure level (in decibels) over time, computed with fast averaging (i.e. with time constant of 125 milliseconds).
- Filter is designed according to ISO 1996-1. This is as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies.
LAmin_m : floating point numbers
- Minimum A-weighted sound pressure level (in decibels) attained.
LAargmin_m : floating point numbers
- Time (in seconds) when minimum A-weighted sound pressure level (in decibels) was attained.
LAmax_m : floating point numbers
- Maximum A-weighted sound pressure level (in decibels) attained.
LAargmax_m : floating point numbers
- Time (in seconds) when maximum A-weighted sound pressure level (in decibels) was attained.
LA##_m : floating point number
- ## percent exceedance level of A-weighted sound pressure level (in decibels). This is the value exceeded ## percent of the time.
- ## is replaced by integers in {05, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95}.
LCavg_m : floating point numbers
- Exponentially averaged C-weighted sound pressure level (in decibels) over time, computed with fast averaging (i.e. with time constant of 125 milliseconds).
- Filter is designed according to ISO 1996-1. This is as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies.
LCmin_m : floating point numbers
- Minimum C-weighted sound pressure level (in decibels) attained.
LCargmin_m : floating point numbers
- Time (in seconds) when minimum C-weighted sound pressure level (in decibels) was attained.
LCmax_m : floating point numbers
- Maximum C-weighted sound pressure level (in decibels) attained.
LCargmax_m : floating point numbers
- Time (in seconds) when maximum C-weighted sound pressure level (in decibels) was attained.
LC##_m : floating point number
- ## percent exceedance level of C-weighted sound pressure level (in decibels). This is the value exceeded ## percent of the time.
- ## is replaced by integers in {05, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95}.
Ravg_m : floating point numbers
- Mean roughness (in asper) over time, at 1/1 Bark resolution, computed as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies.
- See Chapter 11 of "Psychoacoustics: Facts and Models (3rd ed.)" by Zwicker and Fastl for further information.
Rmax_m : floating point numbers
- Maximum roughness (in asper) attained, at 1/1 Bark resolution, computed as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies.
Rargmax_m : floating point numbers
- Time (in seconds) when maximum roughness (in asper) was attained, at 1/1 Bark resolution, computed as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies.
R##_m : floating point number
- ## percent exceedance level of roughness (in asper), at 1/1 Bark resolution, computed as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies. This is the value exceeded ## percent of the time.
- ## is replaced by integers in {05, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95}.
Tgavg_m : floating point numbers
- Mean psychoacoustic tonality (in tonality units) over time of values greater than 0.02, computed with a frequency range of 20 Hz to 20 kHz according to ECMA 74. The computation method is as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies.
Tavg_m : floating point numbers
- Mean psychoacoustic tonality (in tonality units) over time, computed with a frequency range of 20 Hz to 20 kHz according to ECMA 74.
Tmax_m : floating point numbers
- Maximum psychoacoustic tonality (in tonality units) attained, computed with a frequency range of 20 Hz to 20 kHz according to ECMA 74.
Targmax_m : floating point numbers
- Time (in seconds) when maximum psychoacoustic tonality (in tonality units) was attained, computed with a frequency range of 20 Hz to 20 kHz according to ECMA 74.
T##_m : floating point number
- ## percent exceedance level of psychoacoustic tonality (in tonality units), computed with a frequency range of 20 Hz to 20 kHz according to ECMA 74. This is the value exceeded ## percent of the time.
- ## is replaced by integers in {05, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95}.
M#####_#_m : floating point numbers
- Power (in A-weighted decibels relative to 0.00002 Pa) at one-third octave band with centre frequency of #####.# Hz, computed using a Fast Fourier Transform with a Hanning window of size 8192 samples and 50% overlap between windows.
- #####_# is replaced by values in {00005_0, 00006_3, 00008_0, 00010_0, 00012_5, 00016_0, 00020_0, 00025_0, 00031_5, 00040_0, 00050_0, 00063_0, 00080_0, 00100_0, 00125_0, 00160_0, 00200_0, 00250_0, 00315_0, 00400_0, 00500_0, 00630_0, 00800_0, 01000_0, 01250_0, 01600_0, 02000_0, 02500_0, 03150_0, 04000_0, 05000_0, 06300_0, 08000_0, 10000_0, 12500_0, 16000_0, 20000_0}
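
The placeholder field names above (`##` and `#####_#`) can be expanded programmatically, e.g. to select columns after loading the CSV file. The following is a minimal sketch that assumes the column names in the CSV follow the patterns exactly as described above; the function names are hypothetical.

```python
# Percent values used for the exceedance-level fields (S##, N##, F##, LA##, ...).
EXCEEDANCE_PERCENTS = [5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95]

# One-third octave band centre frequencies (in Hz) listed for the M#####_# fields.
BAND_CENTRES_HZ = [
    5, 6.3, 8, 10, 12.5, 16, 20, 25, 31.5, 40, 50, 63, 80, 100, 125, 160,
    200, 250, 315, 400, 500, 630, 800, 1000, 1250, 1600, 2000, 2500, 3150,
    4000, 5000, 6300, 8000, 10000, 12500, 16000, 20000,
]


def exceedance_columns(prefix, suffix="m"):
    """E.g. prefix 'LA' and suffix 'm' give 'LA05_m', 'LA10_m', ..., 'LA95_m'."""
    return [f"{prefix}{p:02d}_{suffix}" for p in EXCEEDANCE_PERCENTS]


def band_columns(suffix="m"):
    """E.g. 6.3 Hz becomes 'M00006_3_m' and 20000 Hz becomes 'M20000_0_m'."""
    tags = [format(f, "07.1f").replace(".", "_") for f in BAND_CENTRES_HZ]
    return [f"M{tag}_{suffix}" for tag in tags]


def gain_columns():
    """Names of the gain_##dB and leq_at_gain_##dB fields for ## from 46 to 83."""
    levels = range(46, 84)
    return (
        [f"gain_{level}dB" for level in levels]
        + [f"leq_at_gain_{level}dB" for level in levels]
    )
```

Passing `suffix="s"` instead of the default would give the corresponding field names in soundscapes.csv.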

`soundscapes.csv`

This CSV file contains information related to the urban soundscapes used to generate the stimuli for which the responses in responses.csv were collected, as well as relevant psychoacoustic parameters of the urban soundscapes computed after calibration (i.e. after the soundscape was calibrated to its in-situ L_A,eq).
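
Several of the summary statistics tabulated for both maskers and soundscapes are simple reductions of a time series. The following is a minimal sketch of two of them, the root mean cube and the percent exceedance level, written from their verbal definitions. It is not the code used to produce the dataset, and the nearest-rank rule used for the exceedance level is an assumption. The unnormalised L₃ norm is included only for comparison.

```python
def root_mean_cube(values):
    """Cube root of the mean of the cubed values."""
    return (sum(v ** 3 for v in values) / len(values)) ** (1 / 3)


def l3_norm(values):
    """Unnormalised L3 norm: cube root of the sum of the cubed absolute values."""
    return sum(abs(v) ** 3 for v in values) ** (1 / 3)


def exceedance_level(values, percent):
    """Value exceeded `percent` percent of the time (nearest-rank rule, assumed).

    `percent` is an integer, e.g. 5 for N05 or 95 for N95.
    """
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values, reverse=True)
    # Number of samples allowed to lie strictly above the returned level.
    k = min(len(ordered) * percent // 100, len(ordered) - 1)
    return ordered[k]


print(round(root_mean_cube([1, 2]), 3))  # mean of cubes is 4.5
print(round(l3_norm([1, 2]), 3))         # sum of cubes is 9
print(exceedance_level(list(range(1, 101)), 5))
```

For the values 1 and 2, the root mean cube is about 1.65, whereas the unnormalised L₃ norm is about 2.08; the two differ by a factor of the cube root of the number of values. For the integers 1 to 100, the 5 percent exceedance level under this rule is 95, since exactly five values lie above it.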

Fields

soundscape : unique strings
- The name of the file containing the urban soundscape.
fold_s : integers in {-1, 0, 1, 2, 3, 4, 5, 6, 7}
- The fold index of the urban soundscape. The sets of urban soundscapes in each fold are pairwise disjoint.
- Keys:
  - -1 : Not in any fold. This could be because (a) the stimulus has an in-situ L_A,eq below 52 dB (to ensure that reproduction levels were significantly above the noise floor where the listening experiments were conducted), (b) the stimulus has an in-situ L_A,eq above 77 dB (to ensure safe listening levels for our participants), or (c) the stimulus was used as the practice (first), attention (middle), and consistency check (last) stimulus for all participants (to prevent data leakage since it is present in all folds).
  - 0 : ARAUSv1 test set.
  - 1 : Fold 1 of the 5-fold cross-validation set.
  - 2 : Fold 2 of the 5-fold cross-validation set.
  - 3 : Fold 3 of the 5-fold cross-validation set.
  - 4 : Fold 4 of the 5-fold cross-validation set.
  - 5 : Fold 5 of the 5-fold cross-validation set.
  - 6 : Fold 1 of the ARAUSv2 test set.
  - 7 : Fold 2 of the ARAUSv2 test set.
insitu_leq : floating point numbers
- For soundscapes in fold 0, this value is the in-situ L_A,eq (in decibels) of the urban soundscape, measured at the same time as the recording was made.
- For soundscapes in folds -1, 1, 2, 3, 4, and 5, this value was obtained by first calibrating the 1-minute long binaural recordings available in the Urban Soundscapes of the World database to the L_Aeq,1-min values provided on the database website (in the file SotW_LAeq_binaural_average_LR.xlsx), then measuring the L_Aeq,30-s of each half of the calibrated file (corresponding to the file name in soundscape).
- For soundscapes in folds 6 and 7, this value was obtained by first calibrating the 1-minute long binaural recordings available in the Lion City Soundscapes dataset to the L_Aeq,1-min values provided on the database website (under the field insitu_spl_1min_LR_eavg in the file metadata.tab), then measuring the L_Aeq,30-s of each half of the calibrated file (corresponding to the file name in soundscape).
gain_s : positive integers
- Gain to apply to achieve an L_A,eq of insitu_leq decibels when played back over a pair of Beyerdynamic Custom One Pro headphones, powered by a Creative SoundBlaster E5 soundcard (set at volume 40).
Savg_s : floating point numbers
- Mean sharpness (in acum) over time, computed according to DIN 45692 assuming free field conditions. The computation method is as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies.
Smax_s : floating point numbers
- Maximum sharpness (in acum) attained, computed according to DIN 45692 assuming free field conditions.
Sargmax_s : floating point numbers
- Time (in seconds) when maximum sharpness (in acum) was attained, computed according to DIN 45692 assuming free field conditions.
S##_s : floating point number
- ## percent exceedance level of sharpness (in acum), computed according to DIN 45692 assuming free field conditions. This is the value exceeded ## percent of the time.
- ## is replaced by integers in {05, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95}.
Navg_s : floating point numbers
- Mean loudness (in sone) over time, computed according to ISO 532-1 assuming free field conditions. The computation method is as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies.
Nrmc_s : floating point numbers
- Root mean cubed loudness (in sone) over time, computed according to ISO 532-1 assuming free field conditions.
- This is the cube root of the mean of the cubed loudness values over time (i.e. the L₃ norm normalised by the number of values), so e.g. the root mean cube of the values 1 and 2 is ∛((1³ + 2³)/2) ≈ 1.65.
Nmax_s : floating point numbers
- Maximum loudness (in sone) attained, computed according to ISO 532-1 assuming free field conditions.
Nargmax_s : floating point numbers
- Time (in seconds) when maximum loudness (in sone) was attained, computed according to ISO 532-1 assuming free field conditions.
N##_s : floating point number
- ## percent exceedance level of loudness (in sone), computed according to ISO 532-1 assuming free field conditions. This is the value exceeded ## percent of the time.
- ## is replaced by integers in {05, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95}.
Favg_s : floating point numbers
- Mean fluctuation strength (in vacil) over time, at 1/1 Bark resolution, computed as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies.
- See Chapter 10 of "Psychoacoustics: Facts and Models (3rd ed.)" by Zwicker and Fastl for further information.
Fmax_s : floating point numbers
- Maximum fluctuation strength (in vacil) attained, at 1/1 Bark resolution, computed as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies.
Fargmax_s : floating point numbers
- Time (in seconds) when maximum fluctuation strength (in vacil) was attained, at 1/1 Bark resolution, computed as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies.
F##_s : floating point number
- ## percent exceedance level of fluctuation strength (in vacil), at 1/1 Bark resolution, computed as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies. This is the value exceeded ## percent of the time.
- ## is replaced by integers in {05, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95}.
LAavg_s : floating point numbers
- Exponentially averaged A-weighted sound pressure level (in decibels) over time, computed with fast averaging (i.e. with time constant of 125 milliseconds).
- Filter is designed according to ISO 1996-1. This is as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies.
LAmin_s : floating point numbers
- Minimum A-weighted sound pressure level (in decibels) attained.
LAargmin_s : floating point numbers
- Time (in seconds) when minimum A-weighted sound pressure level (in decibels) was attained.
LAmax_s : floating point numbers
- Maximum A-weighted sound pressure level (in decibels) attained.
LAargmax_s : floating point numbers
- Time (in seconds) when maximum A-weighted sound pressure level (in decibels) was attained.
LA##_s : floating point number
- ## percent exceedance level of A-weighted sound pressure level (in decibels). This is the value exceeded ## percent of the time.
- ## is replaced by integers in {05, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95}.
LCavg_s : floating point numbers
- Exponentially averaged C-weighted sound pressure level (in decibels) over time, computed with fast averaging (i.e. with time constant of 125 milliseconds).
- Filter is designed according to ISO 1996-1. This is as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies.
LCmin_s : floating point numbers
- Minimum C-weighted sound pressure level (in decibels) attained.
LCargmin_s : floating point numbers
- Time (in seconds) when minimum C-weighted sound pressure level (in decibels) was attained.
LCmax_s : floating point numbers
- Maximum C-weighted sound pressure level (in decibels) attained.
LCargmax_s : floating point numbers
- Time (in seconds) when maximum C-weighted sound pressure level (in decibels) was attained.
LC##_s : floating point number
- ## percent exceedance level of C-weighted sound pressure level (in decibels). This is the value exceeded ## percent of the time.
- ## is replaced by integers in {05, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95}.
Ravg_s : floating point numbers
- Mean roughness (in asper) over time, at 1/1 Bark resolution, computed as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies.
- See Chapter 11 of "Psychoacoustics: Facts and Models (3rd ed.)" by Zwicker and Fastl for further information.
Rmax_s : floating point numbers
- Maximum roughness (in asper) attained, at 1/1 Bark resolution, computed as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies.
Rargmax_s : floating point numbers
- Time (in seconds) when maximum roughness (in asper) was attained, at 1/1 Bark resolution, computed as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies.
R##_s : floating point numbers
- ## percent exceedance level of roughness (in asper), at 1/1 Bark resolution, computed as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies. This is the value exceeded ## percent of the time.
- ## is replaced by integers in {05, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95}.
Tgavg_s : floating point numbers
- Mean psychoacoustic tonality (in tonality units) over time of values greater than 0.02, computed with a frequency range of 20 Hz to 20 kHz according to ECMA 74. The computation method is as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies.
Tavg_s : floating point numbers
- Mean psychoacoustic tonality (in tonality units) over time, computed with a frequency range of 20 Hz to 20 kHz according to ECMA 74.
Tmax_s : floating point numbers
- Maximum psychoacoustic tonality (in tonality units) attained, computed with a frequency range of 20 Hz to 20 kHz according to ECMA 74.
Targmax_s : floating point numbers
- Time (in seconds) when maximum psychoacoustic tonality (in tonality units) was attained, computed with a frequency range of 20 Hz to 20 kHz according to ECMA 74.
T##_s : floating point numbers
- ## percent exceedance level of psychoacoustic tonality (in tonality units), computed with a frequency range of 20 Hz to 20 kHz according to ECMA 74. This is the value exceeded ## percent of the time.
- ## is replaced by integers in {05, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95}.
M#####_#_s : floating point numbers
- Power (in A-weighted decibels relative to 0.00002 Pa) at one-third octave band with centre frequency of #####.# Hz, computed using a Fast Fourier Transform with a Hanning window of size 8192 samples and 50% overlap between windows.
- #####_# is replaced by values in {00005_0, 00006_3, 00008_0, 00010_0, 00012_5, 00016_0, 00020_0, 00025_0, 00031_5, 00040_0, 00050_0, 00063_0, 00080_0, 00100_0, 00125_0, 00160_0, 00200_0, 00250_0, 00315_0, 00400_0, 00500_0, 00630_0, 00800_0, 01000_0, 01250_0, 01600_0, 02000_0, 02500_0, 03150_0, 04000_0, 05000_0, 06300_0, 08000_0, 10000_0, 12500_0, 16000_0, 20000_0}
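
The ## and #####_# placeholders above expand to concrete column names. The following is a minimal Python sketch of that expansion, assuming the naming pattern is exactly as documented above. The helper names `exceedance_columns` and `band_columns` are hypothetical and are not part of this repository.

```python
# Percent exceedance levels documented for fields such as LA##, LC##, R## and T##.
EXCEEDANCE_LEVELS = ["05", "10", "20", "30", "40", "50", "60", "70", "80", "90", "95"]

# One-third octave band centre frequencies (in Hz), as listed for M#####_#.
BAND_CENTRES_HZ = [
    5.0, 6.3, 8.0, 10.0, 12.5, 16.0, 20.0, 25.0, 31.5, 40.0,
    50.0, 63.0, 80.0, 100.0, 125.0, 160.0, 200.0, 250.0, 315.0, 400.0,
    500.0, 630.0, 800.0, 1000.0, 1250.0, 1600.0, 2000.0, 2500.0, 3150.0, 4000.0,
    5000.0, 6300.0, 8000.0, 10000.0, 12500.0, 16000.0, 20000.0,
]


def exceedance_columns(prefix, suffix="s"):
    """Column names such as LA05_s, LA10_s, ... for one parameter prefix."""
    return [f"{prefix}{level}_{suffix}" for level in EXCEEDANCE_LEVELS]


def band_columns(suffix="s"):
    """Column names such as M00005_0_s, M00006_3_s, ... for all bands."""
    columns = []
    for hz in BAND_CENTRES_HZ:
        # Split e.g. 31.5 into whole part 31 and first decimal digit 5.
        whole, tenth = divmod(round(hz * 10), 10)
        columns.append(f"M{whole:05d}_{tenth}_{suffix}")
    return columns
```

For example, `exceedance_columns("LA")` yields the eleven LA##_s names, and `band_columns("r")` yields the corresponding M#####_#_r names used in the responses files.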

`participants.csv` and `participants_rejected.csv`

These CSV files contain all the information provided by the participants of the study in the listener context questionnaire. The file participants.csv contains the information provided by the participants whose responses were accepted into the dataset, whereas the file participants_rejected.csv contains the information provided by participants whose responses were rejected from the dataset altogether (the reasons why they were rejected can be found in participants_rejected_reasons.csv).

When participants' responses were rejected, a different participant was assigned the same ID as the participant whose responses were rejected and presented with the same set of stimuli as the participant whose responses were rejected. Hence, we recommend using only the data in participants.csv for analysis and training of models; the data in participants_rejected.csv is provided only for the purposes of accountability and transparency.

Fields

participant : unique strings
- The ID of the participant who provided the current row of information about themselves. Each ID corresponds to a unique participant.
- For participants.csv, these strings are of the form ARAUS_#####, where ##### is a unique sequence of 5 digits.
fold_p : integers in {0, 1, 2, 3, 4, 5, 6, 7}
- The fold index of the participant. The sets of participants in each fold are pairwise disjoint.
- Keys:
  - 0 : ARAUSv1 test set.
  - 1 : Fold 1 of the 5-fold cross-validation set.
  - 2 : Fold 2 of the 5-fold cross-validation set.
  - 3 : Fold 3 of the 5-fold cross-validation set.
  - 4 : Fold 4 of the 5-fold cross-validation set.
  - 5 : Fold 5 of the 5-fold cross-validation set.
  - 6 : Fold 1 of the ARAUSv2 test set.
  - 7 : Fold 2 of the ARAUSv2 test set.
language_a : integers in {0, 1}
- Response to the question "Do you speak fluently in any languages/dialects other than English?"
- Keys:
  - 0 : No
  - 1 : Yes
language_b : integers in {-1, 0, 1}
- Response to the question "Is English your first or native language?"
- Participants were only prompted to respond to this question if their response to language_a was "Yes". If the response to language_a was "No", then we set their response to this question to "Not applicable" by default.
- Keys:
  - -1 : Not applicable
  - 0 : No
  - 1 : Yes
language_c : integers in {-1, 0, 1}
- Response to the question "Among the languages/dialects you speak, would you consider yourself to be most fluent in English?"
- Participants were only prompted to respond to this question if their response to language_a was "Yes". If the response to language_a was "No", then we set their response to this question to "Not applicable" by default.
- Keys:
  - -1 : Not applicable
  - 0 : No
  - 1 : Yes
age : positive integers
- Response to the question "What is your age?"
- The responses we received ranged from 18 to 75 (inclusive).
gender : integers in {0, 1}
- Response to the question "What is your gender?"
- Keys:
  - 0 : Male
  - 1 : Female
- In the questionnaire, there was a field for participants to specify genders other than "Male" and "Female". A total of 2 participants specified a gender other than "Male" or "Female". Due to the risk of identification, we have randomly replaced their responses to this question with either "Male" or "Female" (50% chance each). Each response was replaced independently.
ethnic : integers in {0, 1, 2, 3}
- Response to the question "What is your ethnic group?"
- Keys:
  - 0 : Others
  - 1 : Chinese
  - 2 : Malay
  - 3 : Indian
occupation : integers in {0, 1, 2, 3, 4}
- Response to the question "What is your occupational status?"
- Keys:
  - 0 : Others
  - 1 : Student
  - 2 : Employed
  - 3 : Retired
  - 4 : Unemployed
education_a : integers in {0, 1, 2, ..., 8, 9}
- Response to the question "What is the highest level of education you have completed?"
- Keys:
  - 0 : Others
  - 1 : No qualification
  - 2 : Primary (PSLE), elementary school or equivalent
  - 3 : Secondary (GCE 'N' & 'O' level), middle school or equivalent
  - 4 : Institute of Technical Education or equivalent
  - 5 : Junior College ('A' level), high school or equivalent
  - 6 : Polytechnic and Arts Institution (Diploma level) or equivalent
  - 7 : University (Bachelor's Degree) or equivalent
  - 8 : University (Master's Degree) or equivalent
  - 9 : University (PhD)
education_b : integers in {-1, 0, 2, 3, 4, 5, 6, 7, 8, 9}
- Response to the question "What is the level of education you are currently undergoing?"
- Participants were only prompted to respond to this question if their response to occupation was "Student". If the response to occupation was not "Student", then we set their response to this question to "Not applicable" by default.
- Keys:
  - -1 : Not applicable
  - 0 : Others
  - 2 : Primary (PSLE), elementary school or equivalent
  - 3 : Secondary (GCE 'N' & 'O' level), middle school or equivalent
  - 4 : Institute of Technical Education or equivalent
  - 5 : Junior College ('A' level), high school or equivalent
  - 6 : Polytechnic and Arts Institution (Diploma level) or equivalent
  - 7 : University (Bachelor's Degree) or equivalent
  - 8 : University (Master's Degree) or equivalent
  - 9 : University (PhD)
dwelling : integers in {0, 1, 2, 3, 4}
- Response to "What dwelling type is your current main residence in Singapore?"
- Keys:
  - 0 : Others
  - 1 : Housing Development Board (HDB) flat or other public apartment
  - 2 : Hall of Residence or other student dormitory
  - 3 : Landed property
  - 4 : Condominium or other private apartment
citizen : integers in {0, 1}
- Response to "Are you a Singapore citizen?"
- Keys:
  - 0 : No
  - 1 : Yes
residence_length : integers in {0, 1}
- Response to "Have you resided in Singapore for more than 10 years?"
- Keys:
  - 0 : No
  - 1 : Yes
annoyance_freq : integers in {0, 1, 2, ..., 9, 10}
- Response to "How much has indoor/outdoor noise bothered, disturbed, or annoyed you over the past 12 months?" on an 11-point scale (0 = Not at all, 10 = Extremely)
quality : integers in {0, 1, 2, ..., 9, 10}
- Response to "How would you describe your satisfaction of the overall quality of the acoustic environment in Singapore?" on an 11-point scale (0 = Extremely dissatisfied, 10 = Extremely satisfied)
wnss : integers in {10, 11, 12, ..., 49, 50}
- Score on the truncated (10-item) Weinstein Noise Sensitivity Scale.
- See: Weinstein, N. D. (1978). Individual differences in reactions to noise: A longitudinal study in a college dormitory. Journal of Applied Psychology, 63, 458–466. doi:10.1037/0021-9010.63.4.458
- Cronbach's alpha for the responses we obtained was computed to be 0.835, with the 95% confidence interval being [0.814, 0.854].
pss : integers in {0, 1, 2, ..., 39, 40}
- Score on the truncated (10-item) Perceived Stress Scale developed by Cohen et al.
- The time scale used was 1 month (so all items began with "In the last month, how often have you...")
- For more information on the full (14-item) scale, see: Cohen, S., Kamarck, T., and Mermelstein, R. (1983). A global measure of perceived stress. Journal of Health and Social Behavior, 24, 386-396.
- For more information on the truncated (10-item) scale, see: Cohen, S. and Williamson, G. Perceived Stress in a Probability Sample of the United States. Spacapan, S. and Oskamp, S. (Eds.) The Social Psychology of Health. Newbury Park, CA: Sage, 1988.
- Cronbach's alpha for the responses was computed to be 0.875, with the 95% confidence interval being [0.860, 0.889].
who : integers in {0, 1, 2, ..., 24, 25}
- Score on the WHO-5 Well-Being Index.
- The time scale used was 2 weeks (so all items began with "Over the last two weeks, ...")
- See: World Health Organization, "WHO-5 Well-Being Index," 1998. [Online]. Available: https://www.psykiatri-regionh.dk/who-5/Documents/WHO-5 questionaire - English.pdf.
- Cronbach's alpha for the responses was computed to be 0.854, with the 95% confidence interval being [0.834, 0.871].
panas_pos : integers in {10, 11, 12, ..., 49, 50}
- Score for Positive Affect on the Positive and Negative Affect Schedule developed by Watson et al.
- The time scale used was 2 weeks (so all items began with "In the last two weeks, to what extent have you felt...")
- See: Watson, D., Clark, L. A., and Tellegen, A. (1988). Development and Validation of Brief Measures of Positive and Negative Affect: The PANAS Scales. Journal of Personality and Social Psychology, 54(6), 1063-1070.
- Cronbach's alpha for the responses was computed to be 0.886, with the 95% confidence interval being [0.872, 0.899].
panas_neg : integers in {10, 11, 12, ..., 49, 50}
- Score for Negative Affect on the Positive and Negative Affect Schedule developed by Watson et al.
- The time scale used was 2 weeks (so all items began with "In the last two weeks, to what extent have you felt...")
- See: Watson, D., Clark, L. A., and Tellegen, A. (1988). Development and Validation of Brief Measures of Positive and Negative Affect: The PANAS Scales. Journal of Personality and Social Psychology, 54(6), 1063-1070.
- Cronbach's alpha for the responses was computed to be 0.891, with the 95% confidence interval being [0.877, 0.903].
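
As an illustration of how the fold_p keys and the -1 ("Not applicable") codes above might be handled, here is a minimal pandas sketch. It is only one possible approach: the helper names `split_by_fold` and `mask_not_applicable` are hypothetical, the demo frame is synthetic and merely stands in for participants.csv, and whether to treat -1 as a missing value or as its own category is a modelling choice.

```python
import pandas as pd

# Fold keys as documented for fold_p.
V1_TEST_FOLD = 0
CV_FOLDS = [1, 2, 3, 4, 5]
V2_TEST_FOLDS = [6, 7]


def split_by_fold(participants, val_fold):
    """Return (train, val, v1_test, v2_test) frames for one cross-validation split."""
    if val_fold not in CV_FOLDS:
        raise ValueError(f"val_fold must be one of {CV_FOLDS}")
    fold = participants["fold_p"]
    train = participants[fold.isin(CV_FOLDS) & (fold != val_fold)]
    val = participants[fold == val_fold]
    v1_test = participants[fold == V1_TEST_FOLD]
    v2_test = participants[fold.isin(V2_TEST_FOLDS)]
    return train, val, v1_test, v2_test


def mask_not_applicable(participants, columns=("language_b", "language_c", "education_b")):
    """Replace the documented -1 ("Not applicable") code with a missing value."""
    out = participants.copy()
    for col in columns:
        out[col] = out[col].astype("float").where(out[col] != -1)
    return out


# Tiny synthetic frame standing in for participants.csv (all values are made up).
demo = pd.DataFrame({
    "participant": [f"ARAUS_{i:05d}" for i in range(8)],
    "fold_p": [0, 1, 2, 3, 4, 5, 6, 7],
    "language_b": [-1, 0, 1, -1, 0, 1, -1, 0],
    "language_c": [-1, 1, 1, -1, 0, 0, -1, 1],
    "education_b": [-1, 7, -1, 8, -1, -1, 9, -1],
})
train, val, v1_test, v2_test = split_by_fold(demo, val_fold=3)
masked = mask_not_applicable(demo)
```

With real data, `demo` would be replaced by the frame read from participants.csv, e.g. via `pd.read_csv`.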

`participants_rejected_reasons.csv`

This CSV file contains the reasons for the rejection of responses provided by participants in participants_rejected.csv.

Fields

participant : unique strings
- The ID of the participant who was rejected. Each ID corresponds to a unique participant.
rejection_reason : strings
- The reason why this participant's responses (in responses_rejected.csv) were rejected.
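
Since this file shares the participant field with participants_rejected.csv, the two can be joined on it. The sketch below uses pandas with synthetic stand-in frames; the IDs, the age values, and the reason strings are all made up for illustration, and the one-to-one check assumes that each ID appears once in each file, as the field descriptions suggest.

```python
import pandas as pd

# Synthetic stand-ins for participants_rejected.csv and
# participants_rejected_reasons.csv; IDs and reasons are made up.
rejected = pd.DataFrame({
    "participant": ["X1", "X2"],
    "age": [23, 41],
})
reasons = pd.DataFrame({
    "participant": ["X1", "X2"],
    "rejection_reason": ["example reason A", "example reason B"],
})

# Left join keeps every rejected participant; validate raises an error
# if an ID is duplicated in either frame.
rejected_with_reasons = rejected.merge(
    reasons, on="participant", how="left", validate="one_to_one"
)
```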

`responses.csv` and `responses_rejected.csv`

These CSV files contain the responses to the unique stimuli (= augmented soundscapes) provided by all study participants, as well as relevant psychoacoustic parameters of the stimuli computed after calibration, i.e. after:

- the soundscape was calibrated to its in-situ L_A,eq,
- the masker was calibrated at the specified SMR with respect to the calibrated soundscape, and
- the two tracks were digitally added together to make the stimulus.

Each stimulus was 30 seconds in length, and made by adding a 30-second recording of an urban soundscape (from the Urban Soundscapes of the World database) to a 30-second masker track at various soundscape-to-masker ratios (SMR).
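
The SMR is the soundscape level minus the masker level in decibels, as in the example given for the smr field (a soundscape at 65 dB with an SMR of -3 corresponds to a masker calibrated to 68 dB). The arithmetic can be sketched in Python as follows. The function names are hypothetical, and the conversion from a level difference to a linear amplitude factor is the standard 20·log10 relation rather than a description of the repository's own calibration code, which may differ.

```python
def masker_target_level(soundscape_laeq_db, smr_db):
    """Level (in dB) the masker is calibrated to for a given SMR."""
    # SMR = soundscape level - masker level, so masker level = soundscape level - SMR.
    return soundscape_laeq_db - smr_db


def gain_factor(current_level_db, target_level_db):
    """Linear amplitude factor that shifts a track from one level to another."""
    # A change of 20 dB corresponds to a factor of 10 in amplitude.
    return 10.0 ** ((target_level_db - current_level_db) / 20.0)
```

For instance, `masker_target_level(65, -3)` gives 68, and a masker currently at 60 dB would need `gain_factor(60, 68)` applied to its samples to reach that level.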

The file responses.csv contains the responses provided by the participants whose responses were accepted into the dataset, whereas the file responses_rejected.csv contains the responses rejected from the dataset altogether (the reasons why they were rejected can be found in participants_rejected_reasons.csv).

When responses were rejected, a different participant was assigned the same ID as the participant whose responses were rejected and presented with the same set of stimuli as the participant whose responses were rejected. Hence, we recommend using only the data in responses.csv for analysis and training of models; the data in responses_rejected.csv is provided only for the purposes of accountability and transparency.

Fields

participant : unique strings
- The ID of the participant who provided the current row of responses. Each ID corresponds to a unique participant.
- For responses.csv, these strings are of the form ARAUS_#####, where ##### is a unique sequence of 5 digits for each unique participant.
fold_r : integers in {-1, 0, 1, 2, 3, 4, 5, 6, 7}
- The fold index of the response. The sets of responses in each fold are pairwise disjoint.
- This is identical to the fold indices associated with soundscape (i.e. fold_s). When not -1, this is also identical to the fold indices associated with participant and masker (i.e. fold_p and fold_m).
- Keys:
  - -1 : Present as common stimulus in test set and all folds of 5-fold cross-validation set. Used as practice (first), attention (middle), and consistency check (last) stimulus for all participants.
  - 0 : ARAUSv1 test set.
  - 1 : Fold 1 of the 5-fold cross-validation set.
  - 2 : Fold 2 of the 5-fold cross-validation set.
  - 3 : Fold 3 of the 5-fold cross-validation set.
  - 4 : Fold 4 of the 5-fold cross-validation set.
  - 5 : Fold 5 of the 5-fold cross-validation set.
  - 6 : Fold 1 of the ARAUSv2 test set.
  - 7 : Fold 2 of the ARAUSv2 test set.
soundscape : unique strings
- The name of the file containing the urban soundscape that the masker in masker was added to.
masker : unique strings
- The name of the file containing the masker that was added to the urban soundscape in soundscape.
smr : integers in {-6, -3, 0, 3, 6}
- The soundscape-to-masker ratio (in decibels) at which masker was mixed with soundscape.
- E.g. if the in-situ L_A,eq of soundscape was measured to be 65 dB, and smr is -3, then masker would be calibrated to 68 dB before being added to soundscape to make the stimulus.
stimulus_index : integers in {1, 2, 3, ..., 50, 51}
- The index of the stimulus for the current participant.
- Participants were presented with stimuli in ascending order of stimulus_index.
- Each participant in the 5-fold cross-validation set and ARAUSv2 test set (fold_r in {1, 2, 3, 4, 5, 6, 7}) experienced 45 stimuli, and each participant in the ARAUSv1 test set (fold_r equal to 0) experienced 51 stimuli.
- The first stimulus presented to every participant, regardless of fold, was identical (the first 30 seconds of recording R0091 from the Urban Soundscapes of the World database). It served as a practice stimulus for participants to familiarise themselves with the questionnaire interface.
- The last stimulus presented to every participant, regardless of fold, was identical to the first stimulus. Responses to the first and last stimulus for a given participant may serve as a consistency check for individual participants' responses over the duration of the study.
- For every participant, one of the stimuli with stimulus_index in {15, 16, 17, ..., 24, 25} was designated as an attention stimulus, which was identical to the first and last stimulus presented. HOWEVER, instructions to choose the third option for all questions were overlaid on the video for the attention stimulus. Participants were not allowed to submit their answers until they had selected the third option for all questions related to the attention stimulus.
- We recommend removing rows corresponding to the first, last, and attention stimulus before using the data to train any model or perform any analysis, since those stimuli are identical for all participants regardless of fold.
time_taken : floating point numbers greater than or equal to 30
- The time between the initial onset of the current stimulus and the participant submitting their responses.
- Each 30-second stimulus was continuously repeated until the participant submitted their responses for that stimulus, with 5 seconds of silence between each repetition.
- Participants were not allowed to pause or stop the playback of the stimuli by themselves. Hence, it is guaranteed that the stimulus was played at the time intervals [0, 30], [35, 65], [70, 100], [105, 135], etc., and there was silence at the time intervals [30, 35], [65, 70], [100, 105], [135, 140] etc.
- Participants were not allowed to submit their responses before the end of the initial playback of each 30-second stimulus. This allowed them to experience each stimulus in its entirety before providing their responses.
is_attention : integers in {0, 1}
- Whether the present stimulus is an attention stimulus. See the details for stimulus_index for more information on the attention stimulus.
- If the present stimulus is an attention stimulus, then the responses for pleasant, eventful, chaotic, ..., monotonous, appropriate will all be 3.
- Keys:
  - 0 : Present stimulus is not an attention stimulus.
  - 1 : Present stimulus is an attention stimulus.
pleasant : integers in {1, 2, 3, 4, 5}
- Response to the question "To what extent do you agree or disagree that the present surrounding sound environment is pleasant?" on a 5-point scale (1 = Strongly disagree, 5 = Strongly agree)
eventful : integers in {1, 2, 3, 4, 5}
- Response to the question "To what extent do you agree or disagree that the present surrounding sound environment is eventful?" on a 5-point scale (1 = Strongly disagree, 5 = Strongly agree)
chaotic : integers in {1, 2, 3, 4, 5}
- Response to the question "To what extent do you agree or disagree that the present surrounding sound environment is chaotic?" on a 5-point scale (1 = Strongly disagree, 5 = Strongly agree)
vibrant : integers in {1, 2, 3, 4, 5}
- Response to the question "To what extent do you agree or disagree that the present surrounding sound environment is vibrant?" on a 5-point scale (1 = Strongly disagree, 5 = Strongly agree)
uneventful : integers in {1, 2, 3, 4, 5}
- Response to the question "To what extent do you agree or disagree that the present surrounding sound environment is uneventful?" on a 5-point scale (1 = Strongly disagree, 5 = Strongly agree)
calm : integers in {1, 2, 3, 4, 5}
- Response to the question "To what extent do you agree or disagree that the present surrounding sound environment is calm?" on a 5-point scale (1 = Strongly disagree, 5 = Strongly agree)
annoying : integers in {1, 2, 3, 4, 5}
- Response to the question "To what extent do you agree or disagree that the present surrounding sound environment is annoying?" on a 5-point scale (1 = Strongly disagree, 5 = Strongly agree)
monotonous : integers in {1, 2, 3, 4, 5}
- Response to the question "To what extent do you agree or disagree that the present surrounding sound environment is monotonous?" on a 5-point scale (1 = Strongly disagree, 5 = Strongly agree)
appropriate : integers in {1, 2, 3, 4, 5}
- Response to the question "Overall, to what extent is the present surrounding sound environment appropriate to the present place?" on a 5-point scale (1 = Not at all, 5 = Perfectly)
Savg_r : floating point numbers
- Mean sharpness (in acum) over time, computed according to DIN 45692 assuming free field conditions. The computation method is as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies.
Smax_r : floating point numbers
- Maximum sharpness (in acum) attained, computed according to DIN 45692 assuming free field conditions.
Sargmax_r : floating point numbers
- Time (in seconds) when maximum sharpness (in acum) was attained, computed according to DIN 45692 assuming free field conditions.
S##_r : floating point numbers
- ## percent exceedance level of sharpness (in acum), computed according to DIN 45692 assuming free field conditions. This is the value exceeded ## percent of the time.
- ## is replaced by integers in {05, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95}.
Navg_r : floating point numbers
- Mean loudness (in sone) over time, computed according to ISO 532-1 assuming free field conditions. The computation method is as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies.
Nrmc_r : floating point numbers
- Root mean cubed loudness (in sone) over time, computed according to ISO 532-1 assuming free field conditions.
- This is equal to the L₃ norm of loudness values over time, so e.g. the root mean cube of the values 1 and 2 is (1³ + 2³)^(1/3) ≈ 2.08.
Nmax_r : floating point numbers
- Maximum loudness (in sone) attained, computed according to ISO 532-1 assuming free field conditions.
Nargmax_r : floating point numbers
- Time (in seconds) when maximum loudness (in sone) was attained, computed according to ISO 532-1 assuming free field conditions.
N##_r : floating point numbers
- ## percent exceedance level of loudness (in sone), computed according to ISO 532-1 assuming free field conditions. This is the value exceeded ## percent of the time.
- ## is replaced by integers in {05, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95}.
Favg_r : floating point numbers
- Mean fluctuation strength (in vacil) over time, at 1/1 Bark resolution, computed as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies.
- See Chapter 10 of "Psychoacoustics: Facts and Models (3rd ed.)" by Zwicker and Fastl for further information.
Fmax_r : floating point numbers
- Maximum fluctuation strength (in vacil) attained, at 1/1 Bark resolution, computed as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies.
Fargmax_r : floating point numbers
- Time (in seconds) when maximum fluctuation strength (in vacil) was attained, at 1/1 Bark resolution, computed as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies.
F##_r : floating point numbers
- ## percent exceedance level of fluctuation strength (in vacil), at 1/1 Bark resolution, computed as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies. This is the value exceeded ## percent of the time.
- ## is replaced by integers in {05, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95}.
LAavg_r : floating point numbers
- Exponentially averaged A-weighted sound pressure level (in decibels) over time, computed with fast averaging (i.e. with time constant of 125 milliseconds).
- Filter is designed according to ISO 1996-1. This is as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies.
LAmin_r : floating point numbers
- Minimum A-weighted sound pressure level (in decibels) attained.
LAargmin_r : floating point numbers
- Time (in seconds) when minimum A-weighted sound pressure level (in decibels) was attained.
LAmax_r : floating point numbers
- Maximum A-weighted sound pressure level (in decibels) attained.
LAargmax_r : floating point numbers
- Time (in seconds) when maximum A-weighted sound pressure level (in decibels) was attained.
LA##_r : floating point numbers
- ## percent exceedance level of A-weighted sound pressure level (in decibels). This is the value exceeded ## percent of the time.
- ## is replaced by integers in {05, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95}.
LCavg_r : floating point numbers
- Exponentially averaged C-weighted sound pressure level (in decibels) over time, computed with fast averaging (i.e. with time constant of 125 milliseconds).
- Filter is designed according to ISO 1996-1. This is as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies.
LCmin_r : floating point numbers
- Minimum C-weighted sound pressure level (in decibels) attained.
LCargmin_r : floating point numbers
- Time (in seconds) when minimum C-weighted sound pressure level (in decibels) was attained.
LCmax_r : floating point numbers
- Maximum C-weighted sound pressure level (in decibels) attained.
LCargmax_r : floating point numbers
- Time (in seconds) when maximum C-weighted sound pressure level (in decibels) was attained.
LC##_r : floating point numbers
- ## percent exceedance level of C-weighted sound pressure level (in decibels). This is the value exceeded ## percent of the time.
- ## is replaced by integers in {05, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95}.
Ravg_r : floating point numbers
- Mean roughness (in asper) over time, at 1/1 Bark resolution, computed as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies.
- See Chapter 11 of "Psychoacoustics: Facts and Models (3rd ed.)" by Zwicker and Fastl for further information.
Rmax_r : floating point numbers
- Maximum roughness (in asper) attained, at 1/1 Bark resolution, computed as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies.
Rargmax_r : floating point numbers
- Time (in seconds) when maximum roughness (in asper) was attained, at 1/1 Bark resolution, computed as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies.
R##_r : floating point numbers
- ## percent exceedance level of roughness (in asper), at 1/1 Bark resolution, computed as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies. This is the value exceeded ## percent of the time.
- ## is replaced by integers in {05, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95}.
Tgavg_r : floating point numbers
- Mean psychoacoustic tonality (in tonality units) over time of values greater than 0.02, computed with a frequency range of 20 Hz to 20 kHz according to ECMA 74. The computation method is as recommended in Table D.1 of ISO 12913-3:2019 for soundscape studies.
Tavg_r : floating point numbers
- Mean psychoacoustic tonality (in tonality units) over time, computed with a frequency range of 20 Hz to 20 kHz according to ECMA 74.
Tmax_r : floating point numbers
- Maximum psychoacoustic tonality (in tonality units) attained, computed with a frequency range of 20 Hz to 20 kHz according to ECMA 74.
Targmax_r : floating point numbers
- Time (in seconds) when maximum psychoacoustic tonality (in tonality units) was attained, computed with a frequency range of 20 Hz to 20 kHz according to ECMA 74.
T##_r : floating point numbers
- ## percent exceedance level of psychoacoustic tonality (in tonality units), computed with a frequency range of 20 Hz to 20 kHz according to ECMA 74. This is the value exceeded ## percent of the time.
- ## is replaced by integers in {05, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95}.
M#####_#_r : floating point numbers
- Power (in A-weighted decibels relative to 0.00002 Pa) at one-third octave band with centre frequency of #####.# Hz, computed using a Fast Fourier Transform with a Hanning window of size 8192 samples and 50% overlap between windows.
- #####_# is replaced by values in {00005_0, 00006_3, 00008_0, 00010_0, 00012_5, 00016_0, 00020_0, 00025_0, 00031_5, 00040_0, 00050_0, 00063_0, 00080_0, 00100_0, 00125_0, 00160_0, 00200_0, 00250_0, 00315_0, 00400_0, 00500_0, 00630_0, 00800_0, 01000_0, 01250_0, 01600_0, 02000_0, 02500_0, 03150_0, 04000_0, 05000_0, 06300_0, 08000_0, 10000_0, 12500_0, 16000_0, 20000_0}
Leq_L_r : floating point numbers
- Exponentially averaged sound pressure level (in decibels) over time of the left channel of the augmented soundscape (with no weighting filter applied), computed with fast averaging (i.e. with time constant of 125 milliseconds) and measured according to the method described in http://dx.doi.org/10.1016/j.mex.2021.101288.
Leq_R_r : floating point numbers
- Exponentially averaged sound pressure level (in decibels) over time of the right channel of the augmented soundscape (with no weighting filter applied), computed with fast averaging (i.e. with time constant of 125 milliseconds) and measured according to the method described in http://dx.doi.org/10.1016/j.mex.2021.101288.
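
Two of the field descriptions above lend themselves to a short worked example: the recommendation (under stimulus_index) to remove the first, last, and attention stimulus of each participant, and the playback schedule described under time_taken. The following pandas sketch shows one way to do both. The function names are hypothetical and the demo frame is synthetic. According to the fold_r keys, the dropped rows should also be the ones with fold_r equal to -1, but that equivalence is not checked here.

```python
import math

import pandas as pd


def drop_common_stimuli(responses):
    """Drop each participant's first, last and attention stimulus rows."""
    grouped = responses.groupby("participant")["stimulus_index"]
    is_first = responses["stimulus_index"] == grouped.transform("min")
    is_last = responses["stimulus_index"] == grouped.transform("max")
    is_attention = responses["is_attention"] == 1
    return responses[~(is_first | is_last | is_attention)]


def playbacks_started(time_taken, stimulus_s=30.0, gap_s=5.0):
    """Number of playbacks begun by the time the responses were submitted."""
    # Playback k (k = 0, 1, 2, ...) starts at k * (stimulus_s + gap_s) seconds.
    return math.floor(time_taken / (stimulus_s + gap_s)) + 1


# Synthetic stand-in for responses.csv with 5 stimuli per participant
# (the real files have 45 or 51 per participant).
demo = pd.DataFrame({
    "participant": ["ARAUS_00001"] * 5 + ["ARAUS_00002"] * 5,
    "stimulus_index": [1, 2, 3, 4, 5] * 2,
    "is_attention": [0, 0, 1, 0, 0, 0, 1, 0, 0, 0],
})
kept = drop_common_stimuli(demo)
```

In this sketch, a time_taken of 64 seconds means the submission happened during the second playback, since the second playback occupies the interval [35, 65].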

Individual conda installation commands

conda install seaborn
conda install -c conda-forge pingouin
conda install -c conda-forge jupyterlab
conda install pandas
conda install -c conda-forge pysoundfile
conda install -c conda-forge librosa
conda install -c conda-forge scikit-learn
conda install -c conda-forge tensorflow-gpu
conda install -c conda-forge python-wget

Version history

2.0.0 : Updated download.py, manifest.csv, and make_augmented_soundscapes.py for compatibility with ARAUSv2 dataset release.
1.1.0 : Updated replication_code.ipynb and araus_utils.py to reflect additional plots and results mentioned in the reply letter to our submission to IEEE Transactions on Affective Computing, and added a new notebook extra_code.ipynb containing extra code and discussions that were outside of the scope of the manuscript.
1.0.0 : Updated replication_code.ipynb, araus_tf.py, and araus_utils.py to be compatible with changes to the ARAUS dataset format (Version 2.0 according to https://doi.org/10.21979/N9/9OTEVX). The participant field in ./data/responses.csv, ./data/responses_rejected.csv, ./data/participants.csv, ./data/participants_rejected.csv, and ./data/participants_rejected_reasons.csv has been updated to ARAUS_#####, where ##### is the string in the participant field of the previous version (Version 1.2 according to https://doi.org/10.21979/N9/9OTEVX). In addition, two new fields Leq_L_r and Leq_R_r have been added to ./data/responses.csv and ./data/responses_rejected.csv corresponding to the exponentially averaged sound pressure level (in decibels) over time (with no weighting filter applied), computed with fast averaging (i.e. with time constant of 125 milliseconds) and measured according to the method described in http://dx.doi.org/10.1016/j.mex.2021.101288.
0.0.2 : Added details of FFT for M#####_# documentation.
0.0.1 : Fixed some typos in readme.
0.0.0 : Initial release
