This page covers various approaches and experiments, both failed and successful, that were carried out to improve the performance of the Precise models.

Audio Features

An overview of the various stages the audio goes through before being fed into the network:

Audio

(24000) (time)

The last 1.5 seconds of audio are captured from the microphone at 16000 samples per second, represented as 16-bit integers, giving a buffer of 24000 samples.
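As a rough sketch of what this buffer looks like in NumPy (the zero-filled stand-in buffer below is illustrative, not the actual microphone code):

import numpy as np

# Stand-in for the last 1.5 s of 16-bit little-endian PCM from the microphone
raw = bytes(2 * 24000)  # 24000 samples * 2 bytes per sample

audio = np.frombuffer(raw, dtype='<i2').astype(np.float32) / 32768.0
assert audio.shape == (24000,)  # 1.5 s * 16000 samples/s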

Frames

(29, 1600) (time, frame)

[Image: frames]

The audio is split into overlapping 0.1 second chunks, each shifted 0.05 seconds from the previous one, giving 29 frames of 1600 samples.
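A minimal NumPy sketch of this framing step, continuing from the audio buffer above (not the actual Precise implementation):

window = 1600  # 0.1 s * 16000 samples/s
hop = 800      # 0.05 s * 16000 samples/s

frames = np.stack([audio[start:start + window]
                   for start in range(0, len(audio) - window + 1, hop)])
assert frames.shape == (29, 1600)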

Power Spectrum

(29, 257) (time, frequency)

[Image: power spectrum]

Each frame is passed through a 512-point FFT and the squared magnitude is kept, condensing the frame into 257 frequency bins (n_fft / 2 + 1).
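A rough sketch of this step, continuing from the frames above (the normalization constant is illustrative; note that numpy's rfft crops each 1600-sample frame down to the 512-sample FFT size):

n_fft = 512

# rfft crops each frame to n_fft samples; n_fft // 2 + 1 = 257 bins come out
power_spectrum = np.abs(np.fft.rfft(frames, n=n_fft, axis=-1)) ** 2 / n_fft
assert power_spectrum.shape == (29, 257)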

Filterbanks

(29, 13) (time, filter)

[Image: filterbanked spectrum]

The frequencies near 13 points chosen on the mel scale are averaged together to form a condensed representation of the power spectrum. Afterwards, the log is taken to reveal more detail. If the log were taken before averaging, the filterbanks would be blurry from summing all the faint frequencies.
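A self-contained sketch of how such a mel filterbank can be built and applied, continuing from the power spectrum above (this is a generic textbook construction, not the exact code Precise uses):

def mel_filterbank(n_filt=13, n_fft=512, sample_rate=16000):
    """Triangular filters centered on points evenly spaced on the mel scale."""
    def hz_to_mel(hz):
        return 2595 * np.log10(1 + hz / 700)

    def mel_to_hz(mel):
        return 700 * (10 ** (mel / 2595) - 1)

    # n_filt + 2 edge points from 0 Hz up to the Nyquist frequency
    hz_points = mel_to_hz(np.linspace(0, hz_to_mel(sample_rate / 2), n_filt + 2))
    bins = np.floor((n_fft + 1) * hz_points / sample_rate).astype(int)

    fbank = np.zeros((n_filt, n_fft // 2 + 1))
    for i in range(1, n_filt + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for j in range(left, center):       # rising edge of the triangle
            fbank[i - 1, j] = (j - left) / max(center - left, 1)
        for j in range(center, right + 1):  # falling edge
            fbank[i - 1, j] = (right - j) / max(right - center, 1)
    return fbank

# Weight the power spectrum into 13 mel bins, then take the log afterwards
filterbanks = np.log(power_spectrum.dot(mel_filterbank().T) + 1e-10)
assert filterbanks.shape == (29, 13)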

MFCCs

(29, 13) (time, mfcc)

[Image: MFCCs]

The DCT of the log filterbanks is taken, which decorrelates the filters and makes the structure of each spectral band easier for the network to read.
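A sketch of this final step using SciPy's type-II DCT, keeping the 13 coefficients shown above:

from scipy.fftpack import dct

# Type-II DCT along the filter axis decorrelates the log filterbank energies
mfccs = dct(filterbanks, type=2, axis=-1, norm='ortho')[:, :13]
assert mfccs.shape == (29, 13)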

To Scale

[Images: chopped frames, chopped power spectrum, filterbanked spectrum, MFCCs]

These images, drawn to scale, show the relative size of the audio data at each step.

Architecture

MFCC Input

GRU

Model

from keras.models import Sequential
from keras.layers import GRU, Dense

# pr holds the listener (feature) parameters; params holds the model hyperparameters
model = Sequential()
model.add(GRU(
    params.recurrent_units, activation='linear',
    input_shape=(pr.n_features, pr.feature_size), dropout=params.dropout, name='net'
))
model.add(Dense(1, activation='sigmoid'))

Training

$ precise-train mfccs-gru.net path/to/data -e 21 -b 64 -s 0.1 -sb -mm val_acc -em
$ precise-train mfccs-gru.net path/to/data -e 100 -b 2048 -s 0.1 -sb -mm val_acc -em
Epoch 121/121
77500/77500 [==============================] - 7s 86us/step - loss: 0.0315 - acc: 0.9265 - false_pos: 0.0097 - false_neg: 0.1226 - val_loss: 0.0305 - val_acc: 0.9587 - val_false_pos: 0.0178 - val_false_neg: 0.0626

Accuracy

=== Summary ===
20015 out of 20807
96.19 %

2.2 % false positives
5.25 % false negatives

False Positives

=== Total ===
Hours: 56.76
Activations / Day: 56.66
Activated Chunks / Day: 117.97
Average Activation (*100): 0.51

2xGRU

Model

model = Sequential()
model.add(GRU(
    params.recurrent_units, activation='tanh',
    input_shape=(pr.n_features, pr.feature_size), dropout=params.dropout, name='net',
    return_sequences=True
))
model.add(GRU(
    params.recurrent_units, activation='linear', dropout=params.dropout,
))
model.add(Dense(1, activation='sigmoid'))

Training

$ precise-train mfccs-2xgru.net path/to/data -e 21 -b 64 -s 0.1 -sb -mm val_acc -em
$ precise-train mfccs-2xgru.net path/to/data -e 100 -b 2048 -s 0.1 -sb -mm val_acc -em
Epoch 121/121
77500/77500 [==============================] - 6s 73us/step - loss: 0.0272 - acc: 0.9372 - false_pos: 0.0088 - false_neg: 0.1042 - val_loss: 0.0263 - val_acc: 0.9657 - val_false_pos: 0.0123 - val_false_neg: 0.0541

Accuracy

=== Summary ===
20152 out of 20807
96.85 %

1.83 % false positives
4.33 % false negatives

False Positives

=== Total ===
Hours: 56.76
Activations / Day: 67.23
Activated Chunks / Day: 201.69
Average Activation (*100): 0.18

2xMFCC Input

GRU

Model

model = Sequential()
model.add(GRU(
    params.recurrent_units, activation='linear',
    input_shape=(pr.n_features, pr.feature_size), dropout=params.dropout, name='net'
))
model.add(Dense(1, activation='sigmoid'))

Training

$ precise-train 2xmfccs-gru.net path/to/data -e 21 -b 64 -s 0.1 -sb -mm val_acc -em
$ precise-train 2xmfccs-gru.net path/to/data -e 100 -b 2048 -s 0.1 -sb -mm val_acc -em
Epoch 121/121
77500/77500 [==============================] - 14s 174us/step - loss: 0.0288 - acc: 0.9364 - false_pos: 0.0096 - false_neg: 0.1052 - val_loss: 0.0259 - val_acc: 0.9601 - val_false_pos: 0.0119 - val_false_neg: 0.0651

Accuracy

=== Summary ===
20100 out of 20807
96.6 %

2.0 % false positives
4.66 % false negatives

False Positives

=== Total ===
Hours: 56.76
Activations / Day: 62.58
Activated Chunks / Day: 171.25
Average Activation (*100): 0.40

2xGRU

Model

model = Sequential()
model.add(GRU(
    params.recurrent_units, activation='tanh',
    input_shape=(pr.n_features, pr.feature_size), dropout=params.dropout, name='net',
    return_sequences=True
))
model.add(GRU(
    params.recurrent_units, activation='linear', dropout=params.dropout,
))
model.add(Dense(1, activation='sigmoid'))

Training

$ precise-train 2xmfccs-2xgru.net path/to/data -e 21 -b 64 -s 0.1 -sb -mm val_acc -em
$ precise-train 2xmfccs-2xgru.net path/to/data -e 100 -b 2048 -s 0.1 -sb -mm val_acc -em
Epoch 121/121
77500/77500 [==============================] - 9s 121us/step - loss: 0.0228 - acc: 0.9481 - false_pos: 0.0073 - false_neg: 0.0862 - val_loss: 0.0226 - val_acc: 0.9709 - val_false_pos: 0.0131 - val_false_neg: 0.0435

Accuracy

=== Summary ===
20269 out of 20807
97.41 %

1.87 % false positives
3.24 % false negatives

False Positives

=== Total ===
Hours: 56.76
Activations / Day: 67.23
Activated Chunks / Day: 201.69
Average Activation (*100): 0.18

Mel Spectrogram Input

pr = ListenerParams(
    window_t=0.1, hop_t=0.05, buffer_t=1.5, sample_rate=16000,
    sample_depth=2, n_mfcc=20, n_filt=20, n_fft=512, use_delta=False,
    vectorizer=Vectorizer.mels
)
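For intuition, and assuming the mels vectorizer yields log mel filterbank energies (i.e. the filterbank step from earlier without the DCT), the network input under these parameters would look roughly like the following, reusing the mel_filterbank sketch above:

# 20 filters instead of 13, and no DCT step afterwards
mels = np.log(power_spectrum.dot(mel_filterbank(n_filt=20).T) + 1e-10)
assert mels.shape == (29, 20)  # (pr.n_features, pr.feature_size)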

GRU

Model

model = Sequential()
model.add(GRU(
    params.recurrent_units, activation='linear',
    input_shape=(pr.n_features, pr.feature_size), dropout=params.dropout, name='net'
))
model.add(Dense(1, activation='sigmoid'))

Training

$ precise-train mels-gru.net path/to/data -e 21 -b 64 -s 0.1 -sb -mm val_acc -em
$ precise-train mels-gru.net path/to/data -e 100 -b 2048 -s 0.1 -sb -mm val_acc -em
Epoch 121/121
77500/77500 [==============================] - 2s 31us/step - loss: 0.0484 - acc: 0.8788 - false_pos: 0.0139 - false_neg: 0.2034 - val_loss: 0.0419 - val_acc: 0.9250 - val_false_pos: 0.0181 - val_false_neg: 0.1262

Accuracy

=== Summary ===
19381 out of 20807
93.15 %

2.07 % false positives
11.17 % false negatives

False Positives

=== Total ===
Hours: 56.76
Activations / Day: 114.59
Activated Chunks / Day: 169.13
Average Activation (*100): 0.70

[TimeDistributed Conv1D [10, 3], MaxPool1D]

Model

from keras.layers import Conv1D, MaxPooling1D, Reshape, TimeDistributed

model = Sequential()
# Add a channel dimension so Conv1D can run across each frame's filters
model.add(Reshape((pr.n_features, pr.feature_size, 1),
                  input_shape=(pr.n_features, pr.feature_size)))
model.add(TimeDistributed(Conv1D(10, 4, padding='same')))
model.add(TimeDistributed(MaxPooling1D(4)))
model.add(Reshape((pr.n_features, -1)))
model.add(GRU(params.recurrent_units, activation='linear', dropout=params.dropout))
model.add(Dense(1, activation='sigmoid'))

Training

$ precise-train mel-cnn1d.net path/to/data -e 100 -b 32 -s 0.1 -sb -mm val_acc -em
Epoch 33/100
77500/77500 [==============================] - 38s 493us/step - loss: 0.0386 - acc: 0.9143 - false_pos: 0.0139 - false_neg: 0.1407 - val_loss: 0.0358 - val_acc: 0.9446 - val_false_pos: 0.0198 - val_false_neg: 0.0880

Accuracy

=== Summary ===
19698 out of 20807
94.67 %

1.92 % false positives
8.41 % false negatives

False Positives

=== Total ===
Hours: 56.76
Activations / Day: 112.47
Activated Chunks / Day: 206.77
Average Activation (*100): 1.35