Research
This page documents the approaches and experiments, both failed and successful, that were tried in order to improve the performance of the Precise models.
An overview of the various stages the audio goes through before being fed into the network:
(24000)
(time)
The last 1.5 seconds of microphone audio is captured at 16000 samples per second as 16-bit integers, which gives 1.5 × 16000 = 24000 samples.
(29, 1600)
(time, frame)
The audio is split into 0.1-second chunks (1600 samples each), with each chunk starting 0.05 seconds (800 samples) after the previous one. This gives 29 overlapping chunks.
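The chunking arithmetic above can be checked with a short NumPy sketch. This is only an illustration under stated assumptions: `frame_audio` and the constant names are hypothetical and are not taken from the Precise code base, and a zero-filled array stands in for real microphone data.

```python
import numpy as np

SAMPLE_RATE = 16000   # samples per second (from the text)
BUFFER_T = 1.5        # seconds of audio kept
WINDOW_T = 0.1        # chunk length in seconds
HOP_T = 0.05          # offset between consecutive chunk starts, in seconds


def frame_audio(audio, window, hop):
    """Split a 1-D signal into overlapping rows of `window` samples."""
    n_frames = 1 + (len(audio) - window) // hop
    starts = hop * np.arange(n_frames)
    # Index matrix: row i selects samples starts[i] .. starts[i] + window - 1.
    idx = starts[:, None] + np.arange(window)[None, :]
    return audio[idx]


buffer_len = int(round(SAMPLE_RATE * BUFFER_T))   # 24000 samples
window = int(round(SAMPLE_RATE * WINDOW_T))       # 1600 samples
hop = int(round(SAMPLE_RATE * HOP_T))             # 800 samples

audio = np.zeros(buffer_len, dtype=np.int16)      # placeholder for microphone data
frames = frame_audio(audio, window, hop)
print(frames.shape)  # (29, 1600)
```

The frame count follows from 1 + (24000 - 1600) / 800 = 29, which matches the (29, 1600) shape listed above.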
(29, 257)
(time, frequency)
Each chunk is converted to a power spectrum. The 257 frequency bins are consistent with a 512-point FFT (512 / 2 + 1 = 257).
(29, 13)
(time, filter)
The power at frequencies near each of 13 points chosen on the mel scale is averaged to form a condensed representation of the power spectrum. The log is taken afterwards to reveal more detail. If the log were taken first, the filterbanks would be blurry, because all the faint frequencies would weigh heavily in each sum.
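The following sketch illustrates the filterbank step and the effect of where the log is applied. It assumes a textbook triangular mel filterbank, normalized so that each filter computes a weighted average. The function names, the 512-point FFT size, and the random stand-in power spectrum are assumptions made for illustration, not the Precise implementation.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filt, n_fft, sample_rate):
    """Triangular averaging filters, shape (n_filt, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    bin_freqs = np.linspace(0.0, sample_rate / 2, n_bins)
    # n_filt + 2 points evenly spaced on the mel scale; each filter uses
    # three neighbouring points as its left edge, peak, and right edge.
    mel_points = np.linspace(0.0, hz_to_mel(sample_rate / 2), n_filt + 2)
    hz_points = mel_to_hz(mel_points)
    filters = np.zeros((n_filt, n_bins))
    for i in range(n_filt):
        left, peak, right = hz_points[i:i + 3]
        rising = (bin_freqs - left) / (peak - left)
        falling = (right - bin_freqs) / (right - peak)
        filters[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    # Normalize each row so it computes a weighted average of its bins.
    return filters / filters.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
power = rng.random((29, 257)) + 1e-3      # stand-in power spectrum (time, frequency)
fb = mel_filterbank(13, 512, 16000)       # (13, 257)

log_energies = np.log(power @ fb.T)       # average first, then log: (29, 13)
log_first = np.log(power) @ fb.T          # log first, then average: (29, 13)
print(log_energies.shape)
```

With averaging weights, `log_first` can never exceed `log_energies` (Jensen's inequality), because taking the log first lets low-power bins pull every band down. This is one way to read the remark above about faint frequencies.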
(29, 13)
(time, mfcc)
The DCT of the log filterbank energies is taken, which decorrelates the bands and makes the structure of each spectrum easier for the network to use.
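As a rough illustration of this last step, the sketch below applies an orthonormal DCT-II along the filter axis using plain NumPy. The choice of the orthonormal DCT-II variant, the helper name `dct_matrix`, and the random stand-in input are assumptions; the source does not say which DCT variant or library Precise uses.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix of shape (n, n); row k is the k-th basis vector."""
    k = np.arange(n)[:, None]   # output coefficient index
    m = np.arange(n)[None, :]   # input sample index
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    scale = np.full(n, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return basis * scale[:, None]


rng = np.random.default_rng(0)
log_energies = rng.standard_normal((29, 13))   # stand-in for (time, filter) features

# Apply the transform to each time step independently.
mfccs = log_energies @ dct_matrix(13).T        # (time, mfcc)
print(mfccs.shape)  # (29, 13)
```

Because the matrix is orthonormal, the transform only rotates each 13-value frame and loses no information. A perfectly flat frame ends up with all of its energy in coefficient 0.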
from keras.layers import GRU, Dense
from keras.models import Sequential

model = Sequential()
model.add(GRU(
    params.recurrent_units, activation='linear',
    input_shape=(pr.n_features, pr.feature_size), dropout=params.dropout, name='net'
))
model.add(Dense(1, activation='sigmoid'))  # single output in [0, 1]
$ precise-train mfccs-gru.net path/to/data -e 21 -b 64 -s 0.1 -sb -mm val_acc -em
$ precise-train mfccs-gru.net path/to/data -e 100 -b 2048 -s 0.1 -sb -mm val_acc -em
Epoch 121/121
77500/77500 [==============================] - 7s 86us/step - loss: 0.0315 - acc: 0.9265 - false_pos: 0.0097 - false_neg: 0.1226 - val_loss: 0.0305 - val_acc: 0.9587 - val_false_pos: 0.0178 - val_false_neg: 0.0626
=== Summary ===
20015 out of 20807
96.19 %
2.2 % false positives
5.25 % false negatives
=== Total ===
Hours: 56.76
Activations / Day: 56.66
Activated Chunks / Day: 117.97
Average Activation (*100): 0.51
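from keras.layers import GRU, Dense
from keras.models import Sequential

model = Sequential()
model.add(GRU(
    params.recurrent_units, activation='tanh',
    input_shape=(pr.n_features, pr.feature_size), dropout=params.dropout, name='net',
    return_sequences=True  # emit every time step so the second GRU receives a sequence
))
model.add(GRU(
    params.recurrent_units, activation='linear', dropout=params.dropout,
))
model.add(Dense(1, activation='sigmoid'))  # single output in [0, 1]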
$ precise-train mfccs-2xgru.net path/to/data -e 21 -b 64 -s 0.1 -sb -mm val_acc -em
$ precise-train mfccs-2xgru.net path/to/data -e 100 -b 2048 -s 0.1 -sb -mm val_acc -em
Epoch 121/121
77500/77500 [==============================] - 6s 73us/step - loss: 0.0272 - acc: 0.9372 - false_pos: 0.0088 - false_neg: 0.1042 - val_loss: 0.0263 - val_acc: 0.9657 - val_false_pos: 0.0123 - val_false_neg: 0.0541
=== Summary ===
20152 out of 20807
96.85 %
1.83 % false positives
4.33 % false negatives
=== Total ===
Hours: 56.76
Activations / Day: 67.23
Activated Chunks / Day: 201.69
Average Activation (*100): 0.18
from keras.layers import GRU, Dense
from keras.models import Sequential

model = Sequential()
model.add(GRU(
    params.recurrent_units, activation='linear',
    input_shape=(pr.n_features, pr.feature_size), dropout=params.dropout, name='net'
))
model.add(Dense(1, activation='sigmoid'))  # single output in [0, 1]
$ precise-train 2xmfccs-gru.net path/to/data -e 21 -b 64 -s 0.1 -sb -mm val_acc -em
$ precise-train 2xmfccs-gru.net path/to/data -e 100 -b 2048 -s 0.1 -sb -mm val_acc -em
Epoch 121/121
77500/77500 [==============================] - 14s 174us/step - loss: 0.0288 - acc: 0.9364 - false_pos: 0.0096 - false_neg: 0.1052 - val_loss: 0.0259 - val_acc: 0.9601 - val_false_pos: 0.0119 - val_false_neg: 0.0651
=== Summary ===
20100 out of 20807
96.6 %
2.0 % false positives
4.66 % false negatives
=== Total ===
Hours: 56.76
Activations / Day: 62.58
Activated Chunks / Day: 171.25
Average Activation (*100): 0.40
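from keras.layers import GRU, Dense
from keras.models import Sequential

model = Sequential()
model.add(GRU(
    params.recurrent_units, activation='tanh',
    input_shape=(pr.n_features, pr.feature_size), dropout=params.dropout, name='net',
    return_sequences=True  # emit every time step so the second GRU receives a sequence
))
model.add(GRU(
    params.recurrent_units, activation='linear', dropout=params.dropout,
))
model.add(Dense(1, activation='sigmoid'))  # single output in [0, 1]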
$ precise-train 2xmfccs-2xgru.net path/to/data -e 21 -b 64 -s 0.1 -sb -mm val_acc -em
$ precise-train 2xmfccs-2xgru.net path/to/data -e 100 -b 2048 -s 0.1 -sb -mm val_acc -em
Epoch 121/121
77500/77500 [==============================] - 9s 121us/step - loss: 0.0228 - acc: 0.9481 - false_pos: 0.0073 - false_neg: 0.0862 - val_loss: 0.0226 - val_acc: 0.9709 - val_false_pos: 0.0131 - val_false_neg: 0.0435
=== Summary ===
20269 out of 20807
97.41 %
1.87 % false positives
3.24 % false negatives
=== Total ===
Hours: 56.76
Activations / Day: 67.23
Activated Chunks / Day: 201.69
Average Activation (*100): 0.18
pr = ListenerParams(
    window_t=0.1, hop_t=0.05, buffer_t=1.5, sample_rate=16000,
    sample_depth=2, n_mfcc=20, n_filt=20, n_fft=512, use_delta=False,
    vectorizer=Vectorizer.mels
)
from keras.layers import GRU, Dense
from keras.models import Sequential

model = Sequential()
model.add(GRU(
    params.recurrent_units, activation='linear',
    input_shape=(pr.n_features, pr.feature_size), dropout=params.dropout, name='net'
))
model.add(Dense(1, activation='sigmoid'))  # single output in [0, 1]
$ precise-train mels-gru.net path/to/data -e 21 -b 64 -s 0.1 -sb -mm val_acc -em
$ precise-train mels-gru.net path/to/data -e 100 -b 2048 -s 0.1 -sb -mm val_acc -em
Epoch 121/121
77500/77500 [==============================] - 2s 31us/step - loss: 0.0484 - acc: 0.8788 - false_pos: 0.0139 - false_neg: 0.2034 - val_loss: 0.0419 - val_acc: 0.9250 - val_false_pos: 0.0181 - val_false_neg: 0.1262
=== Summary ===
19381 out of 20807
93.15 %
2.07 % false positives
11.17 % false negatives
=== Total ===
Hours: 56.76
Activations / Day: 114.59
Activated Chunks / Day: 169.13
Average Activation (*100): 0.70
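from keras.layers import Conv1D, Dense, GRU, MaxPooling1D, Reshape, TimeDistributed
from keras.models import Sequential

input_shape = (pr.n_features, pr.feature_size)  # assumed: same input shape as the GRU models above
model = Sequential()
model.add(Reshape((pr.n_features, pr.feature_size, 1), input_shape=input_shape))
model.add(TimeDistributed(Conv1D(10, 4, padding='same')))  # convolve along the feature axis of each time step
model.add(TimeDistributed(MaxPooling1D(4)))
model.add(Reshape((pr.n_features, -1)))  # merge pooled features and channels into one vector per time step
model.add(GRU(params.recurrent_units, activation='linear', dropout=params.dropout))
model.add(Dense(1, activation='sigmoid'))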
$ precise-train mel-cnn1d.net path/to/data -e 100 -b 32 -s 0.1 -sb -mm val_acc -em
Epoch 33/100
77500/77500 [==============================] - 38s 493us/step - loss: 0.0386 - acc: 0.9143 - false_pos: 0.0139 - false_neg: 0.1407 - val_loss: 0.0358 - val_acc: 0.9446 - val_false_pos: 0.0198 - val_false_neg: 0.0880
=== Summary ===
19698 out of 20807
94.67 %
1.92 % false positives
8.41 % false negatives
=== Total ===
Hours: 56.76
Activations / Day: 112.47
Activated Chunks / Day: 206.77
Average Activation (*100): 1.35