This repository contains the SLURPFOOD splits proposed in the paper "Out-of-distribution generalisation in spoken language understanding". Additionally, it contains scripts for generating the splits and creating the baselines.
To start, first clone the original SLURP repository into the parent directory. Then download the SLURP audio files by navigating to `slurp/scripts` and running `./download_audio.sh`.
The `slurpfood` directory contains two subdirectories: `scripts` and `splits`. The `scripts` directory contains the scripts used for reproducing the splits, while `splits` contains the actual data splits in JSON and CSV formats.
The `experiments` directory contains the SpeechBrain recipes for training the models discussed in the paper. To run an experiment, run `python train.py hyperparams.yaml`. The `train.py` script contains the main code for training and inference, whereas `hyperparams.yaml` contains the hyperparameters.
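For reference, SpeechBrain recipes usually follow the same entry-point pattern, resolving the YAML file with hyperpyyaml and allowing overrides from the command line; the sketch below illustrates that common pattern and is not the exact contents of `train.py`.

```python
# Typical SpeechBrain recipe entry point (illustrative sketch, not the actual train.py).
import sys
import speechbrain as sb
from hyperpyyaml import load_hyperpyyaml

if __name__ == "__main__":
    # Parse the YAML path plus any command-line overrides, e.g.
    # python train.py hyperparams.yaml --output_folder=results/run1
    hparams_file, run_opts, overrides = sb.parse_arguments(sys.argv[1:])
    with open(hparams_file) as fin:
        hparams = load_hyperpyyaml(fin, overrides)
    # hparams now holds the values and objects defined in the YAML; the recipe
    # builds a sb.Brain subclass from them and calls .fit() / .evaluate().
```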
For all the splits we used the scenario classification task, but some splits can be adapted for intent classification. The splits are derived from the official train, dev, and test portions of the original SLURP dataset.
The table below shows the class overlaps (intersections of class sets) between the train, dev, and test subsets for each label type (scenario, action, intent). The values in brackets are the similarities between the label frequency distributions of the different subsets.
These splits are used to assess the model's OOV generalisation ability when the scenario is seen in training, but the task (action) is novel. For instance, the training set may contain "launch super mario", where the scenario is "play" and the action is "game". Conversely, the test set could include the instruction "please play the next episode in the podcast", maintaining the "play" scenario but in combination with a novel "podcast" action.
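A simple way to verify this property on a split is to check that every test action is absent from training while its scenario is present. The sketch below assumes each record exposes "scenario" and "action" fields, which may differ from the actual field names in the split files.

```python
# Hedged sanity check for the OOV property; the field names are assumptions.
def check_oov_property(train_records, test_records):
    train_scenarios = {r["scenario"] for r in train_records}
    train_actions = {r["action"] for r in train_records}
    for r in test_records:
        assert r["scenario"] in train_scenarios, "test scenario must be seen in training"
        assert r["action"] not in train_actions, "test action must be novel"
```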
The CG splits are generated with the distribution-based compositionality assessment (DBCA) framework. To create a train-test split that requires compositional generalisation, the atom distributions should be similar between the train and test sets (low atom divergence), while the compound distributions should diverge (high compound divergence). In our CG splits, atoms are defined as the scenario and action labels, and the compound in each utterance is the combination of the two, i.e. the intent.
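Concretely, both divergences can be computed as one minus a Chernoff coefficient between the empirical train and test distributions. The sketch below follows the original DBCA formulation (α = 0.5 for atoms, α = 0.1 for compounds); the exact settings used for these splits may differ.

```python
# Sketch of the DBCA-style divergence: 1 - Chernoff coefficient between the
# empirical train and test distributions. Alpha values follow the original
# DBCA formulation and are not necessarily those used for these splits.
from collections import Counter

def divergence(train_items, test_items, alpha):
    p, q = Counter(train_items), Counter(test_items)
    p_tot, q_tot = sum(p.values()), sum(q.values())
    keys = set(p) | set(q)
    chernoff = sum((p[k] / p_tot) ** alpha * (q[k] / q_tot) ** (1 - alpha) for k in keys)
    return 1.0 - chernoff

train_compounds = ["play_game", "play_music", "alarm_set"]       # illustrative only
test_compounds  = ["play_podcasts", "alarm_query", "alarm_set"]  # illustrative only
print(divergence(train_compounds, test_compounds, alpha=0.1))    # compound divergence
# For the atom divergence, pass the scenario and action labels with alpha=0.5.
```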
These splits are used to predict the scenario from a combination of two utterances with different actions but the same scenario. For the CG split, we ensured that no action pair that occurs in the test set also appears in the training or development splits.
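As an illustration, pairing utterances that share a scenario but differ in action could look like the sketch below; this is a hypothetical outline, not the repository's actual pairing script, and the field names are assumptions.

```python
# Hypothetical sketch of forming same-scenario, different-action utterance pairs;
# field names "scenario" and "action" are assumptions about the split records.
from collections import defaultdict
from itertools import combinations

def candidate_pairs(records):
    by_scenario = defaultdict(list)
    for r in records:
        by_scenario[r["scenario"]].append(r)
    for items in by_scenario.values():
        for a, b in combinations(items, 2):
            if a["action"] != b["action"]:
                yield a, b  # a pair labelled with the shared scenario
```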
The goal of these splits is to test the generalisation of the models in mismatched acoustic environments. To achieve that, we used the recordings that were made with a headset for the training and development portions. The test set consists of two subsets: the first contains recordings made with a headset, while the second contains the same utterances recorded without a headset.
To cite the paper, use:
ToDo