Improve intermediate layer extraction explanation #1338
base: master
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -159,12 +159,17 @@ const char* TensorflowPredictFSDSINet::description = DOC( | |
"Note: This algorithm does not make any check on the input model so it is " | ||
"the user's responsibility to make sure it is a valid one.\n" | ||
"\n" | ||
"Note: The output of this algorithm is 2D, which is suitable for extracting embeddings or " | ||
"class activations (the output shape is, e.g., [time, number of classes]). If the output " | ||
"parameter is set to an intermediate layer with more dimensions, the output will be " | ||
"flattened to 2D.\n" | ||
"\n" | ||
Same comments as for TensorflowPredictEffnetDiscogs |
||
"Note: The FSD-SINet models were trained on normalized audio clips. " | ||
"Clip-level normalization is only implemented in standard mode since in streaming there is no access to the entire audio clip. " | ||
"In the streaming case, the user is responsible for controlling the dynamic range of the input signal. " | ||
"Ideally, the signal should be zero-mean (no DC) and normalized to the full dynamic range (-1, 1).\n\n" | ||
"References:\n" | ||
" [1] Fonseca, E., Ferraro, A., & Serra, X. (2021). Improving sound event classification by increasing shift invariance in convolutional neural networks. arXiv preprint arXiv:2107.00623.\n" | ||
" [1] Fonseca, E., Ferraro, A., & Serra, X. (2021). Improving sound event classification by increasing shift invariance in convolutional neural networks. arXiv preprint arXiv:2107.00623.\n\n" | ||
" [2] https://github.com/edufonseca/shift_sec" | ||
); | ||
|
||
|
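The note above asks streaming users to keep the input zero-mean and within the full dynamic range themselves. As a rough illustration of what clip-level normalization could look like when the whole clip is available, here is a minimal sketch. `normalizeClip` is a hypothetical helper, not part of Essentia, and peak scaling is only one assumed way to reach the target range; the normalization actually used for the FSD-SINet models is not shown in this diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not an Essentia API): removes the DC offset from a
// clip and scales it so that its largest absolute sample value is 1.
std::vector<float> normalizeClip(const std::vector<float>& clip) {
  if (clip.empty()) return {};

  // Mean of the clip, accumulated in double to limit rounding error.
  const double mean =
      std::accumulate(clip.begin(), clip.end(), 0.0) / clip.size();
  assert(std::isfinite(mean) && "clip must contain finite samples");

  std::vector<float> out(clip.size());
  float peak = 0.f;
  for (std::size_t i = 0; i < clip.size(); ++i) {
    out[i] = static_cast<float>(clip[i] - mean);  // subtract the DC component
    peak = std::max(peak, std::fabs(out[i]));
  }

  // A constant clip is all zeros after DC removal; there is nothing to scale.
  if (peak == 0.f) return out;

  for (float& sample : out) sample /= peak;
  return out;
}
```

This needs the entire clip up front to compute the mean and the peak, which is why it cannot be applied as-is in streaming mode.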
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -158,6 +158,11 @@ const char* TensorflowPredictMusiCNN::description = DOC( | |
"Note: This algorithm does not make any check on the input model so it is " | ||
"the user's responsibility to make sure it is a valid one.\n" | ||
"\n" | ||
"Note: The output of this algorithm is 2D, which is suitable for extracting embeddings or " | ||
"class activations (the output shape is, e.g., [time, number of classes]). If the output " | ||
"parameter is set to an intermediate layer with more dimensions, the output will be " | ||
"flattened to 2D.\n" | ||
Same comment as for TensorflowPredictEffnetDiscogs |
||
"\n" | ||
"References:\n" | ||
"\n" | ||
"1. Pons, J., & Serra, X. (2019). musicnn: Pre-trained convolutional neural " | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -156,6 +156,11 @@ const char* TensorflowPredictVGGish::description = DOC( | |
"Note: This algorithm does not make any check on the input model so it is " | ||
"the user's responsibility to make sure it is a valid one.\n" | ||
"\n" | ||
"Note: The output of this algorithm is 2D, which is suitable for extracting embeddings or " | ||
"class activations (the output shape is, e.g., [time, number of classes]). If the output " | ||
"parameter is set to an intermediate layer with more dimensions, the output will be " | ||
"flattened to 2D.\n" | ||
Same comment as for TensorflowPredictEffnetDiscogs |
||
"\n" | ||
"References:\n" | ||
"\n" | ||
"1. Gemmeke, J. et. al., AudioSet: An ontology and human-labelled dataset " | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -36,6 +36,7 @@ void TensorToVectorReal::configure() { | |
_channels = 0; | ||
_timeStamps = 0; | ||
_featsSize = 0; | ||
_warned = false; | ||
} | ||
|
||
|
||
|
@@ -44,6 +45,7 @@ void TensorToVectorReal::reset() { | |
_channels = 0; | ||
_timeStamps = 0; | ||
_featsSize = 0; | ||
_warned = false; | ||
} | ||
|
||
|
||
|
@@ -66,6 +68,11 @@ AlgorithmStatus TensorToVectorReal::process() { | |
_timeStamps = tensor.dimension(2); | ||
_featsSize = tensor.dimension(3); | ||
|
||
if (_channels != 1 && !_warned) { | ||
E_WARNING("TensorToVectorReal: The channel axis (dimension 1) of the input tensor has size larger than 1, but the output of this algorithm is 2D. The batch, channel, and time axes (dimensions 0, 1, 2) will be flattened to the first dimension of the output matrix."); | ||
We output a vector of vectors of reals, so the "matrix" terminology may be misleading. |
||
_warned = true; | ||
} | ||
|
||
_frame.setAcquireSize(_timeStamps * _channels * _batchSize); | ||
_frame.setReleaseSize(_timeStamps * _channels *_batchSize); | ||
|
||
|
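To make the flattening described in the warning concrete, the following is a minimal sketch of collapsing a [batch, channel, time, feats] tensor into a vector of vectors. `Tensor4D` and `flattenTo2D` are hypothetical names introduced for illustration, and the contiguous row-major layout is an assumption; this is not the code of `TensorToVectorReal`. The only detail taken from the diff is the outer size, `_timeStamps * _channels * _batchSize`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 4D tensor stored contiguously in row-major
// order with axes [batch, channel, time, feats]. Not an Essentia type.
struct Tensor4D {
  std::size_t batch, channels, time, feats;
  std::vector<float> data;
};

// Collapses the batch, channel, and time axes into the outer index, so the
// result holds batch * channels * time inner vectors of `feats` values each.
std::vector<std::vector<float>> flattenTo2D(const Tensor4D& t) {
  const std::size_t rows = t.batch * t.channels * t.time;
  assert(t.data.size() == rows * t.feats);  // data must match the dimensions

  std::vector<std::vector<float>> out;
  out.reserve(rows);
  for (std::size_t r = 0; r < rows; ++r) {
    // Each inner vector is one contiguous run of `feats` values.
    const auto first =
        t.data.begin() + static_cast<std::ptrdiff_t>(r * t.feats);
    out.emplace_back(first, first + static_cast<std::ptrdiff_t>(t.feats));
  }
  return out;
}
```

With a tensor of shape [1, 2, 3, 4], this yields 6 inner vectors of 4 values, so frames from different channels end up interleaved along the outer axis under the assumed layout. That mixing is what the one-time warning is meant to flag.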
Rephrased version (trying to simplify):