Improve intermediate layer extraction explanation #1338

Open: wants to merge 3 commits into master
@@ -156,6 +156,11 @@ const char* TensorflowPredictEffnetDiscogs::description = DOC(
"Note: This algorithm does not make any check on the input model so it is "
"the user's responsibility to make sure it is a valid one.\n"
"\n"
"Note: The output of this algorithm is 2D, which is suitable for extracting embeddings or "
"class activations (the output shape is, e.g., [time, number of classes]). If the output "
"parameter is set to an intermediate layer with more dimensions, the output will be "
"flattened to 2D.\n"
Review comment (Member):

Rephrased version (trying to simplify):

Note: The algorithm outputs a time series of class activations or embedding vectors, with a 2D shape [time, feature vector]. Feature vector values will be flattened if the output parameter is set to extract an intermediate layer with multiple dimensions.

"\n"
"References:\n"
"\n"
"1. Supported models at https://essentia.upf.edu/models/\n\n");
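To make the 2D-output note concrete, here is a minimal sketch, not Essentia's implementation, of how a 4D intermediate-layer tensor with shape [batch, channels, time, features] can map onto a list of feature-vector rows. It assumes a row-major buffer and follows the axis order stated in the TensorToVectorReal warning in this PR (batch, channel, and time axes merged into the first output dimension). The function name `flattenTo2D` is hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of Essentia): flattens a row-major buffer with
// shape [batch, channels, time, features] into one row per (batch, channel,
// time) triple, each row holding `features` values.
std::vector<std::vector<float>> flattenTo2D(const std::vector<float>& data,
                                            std::size_t batch, std::size_t channels,
                                            std::size_t time, std::size_t features) {
  if (data.size() != batch * channels * time * features) {
    throw std::invalid_argument("buffer size does not match the given shape");
  }
  std::vector<std::vector<float>> rows;
  rows.reserve(batch * channels * time);
  for (std::size_t b = 0; b < batch; ++b) {
    for (std::size_t c = 0; c < channels; ++c) {
      for (std::size_t t = 0; t < time; ++t) {
        // Row-major offset of element [b][c][t][0].
        const auto offset =
            static_cast<std::ptrdiff_t>(((b * channels + c) * time + t) * features);
        const auto len = static_cast<std::ptrdiff_t>(features);
        rows.emplace_back(data.begin() + offset, data.begin() + offset + len);
      }
    }
  }
  return rows;
}
```

For example, a layer of shape [1, 2, 3, 4] yields 1 × 2 × 3 = 6 rows of 4 values: rows 0 to 2 come from channel 0 and rows 3 to 5 from channel 1, so the channel axis is stacked along the first dimension rather than preserved as a separate axis.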
7 changes: 6 additions & 1 deletion src/algorithms/machinelearning/tensorflowpredictfsdsinet.cpp
@@ -159,12 +159,17 @@ const char* TensorflowPredictFSDSINet::description = DOC(
"Note: This algorithm does not make any check on the input model so it is "
"the user's responsibility to make sure it is a valid one.\n"
"\n"
"Note: The output of this algorithm is 2D, which is suitable for extracting embeddings or "
"class activations (the output shape is, e.g., [time, number of classes]). If the output "
"parameter is set to an intermediate layer with more dimensions, the output will be "
"flattened to 2D.\n"
"\n"
Review comment (Member):

Same comments as for TensorflowPredictEffnetDiscogs

"Note: The FSD-SINet models were trained on normalized audio clips. "
"Clip-level normalization is only implemented in standard mode since in streaming there is no access to the entire audio clip. "
"In the streaming case, the user is responsible for controlling the dynamic range of the input signal. "
"Ideally, the signal should be zero-mean (no DC) and normalized to the full dynamic range (-1, 1).\n\n"
"References:\n"
" [1] Fonseca, E., Ferraro, A., & Serra, X. (2021). Improving sound event classification by increasing shift invariance in convolutional neural networks. arXiv preprint arXiv:2107.00623.\n"
" [1] Fonseca, E., Ferraro, A., & Serra, X. (2021). Improving sound event classification by increasing shift invariance in convolutional neural networks. arXiv preprint arXiv:2107.00623.\n\n"
" [2] https://github.com/edufonseca/shift_sec"
);
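For the streaming case described in the FSD-SINet note, one possible way to condition a buffered signal before feeding it in is to subtract its mean (removing DC) and then divide by its peak absolute value so that samples fall within [-1, 1]. This is a minimal sketch under that assumption and does not necessarily match the clip-level normalization Essentia applies in standard mode; the function name `removeDcAndPeakNormalize` is hypothetical, and in a true streaming setup the mean and peak would have to be estimated from whatever audio is available up front.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of Essentia): returns a zero-mean copy of `x`
// scaled so that its largest absolute sample equals 1. Constant input becomes
// all zeros after mean removal, so no scaling is applied in that case.
std::vector<float> removeDcAndPeakNormalize(const std::vector<float>& x) {
  if (x.empty()) return {};
  const double mean =
      std::accumulate(x.begin(), x.end(), 0.0) / static_cast<double>(x.size());
  std::vector<float> y(x.size());
  double peak = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    y[i] = static_cast<float>(x[i] - mean);  // remove DC offset
    peak = std::max(peak, std::abs(static_cast<double>(y[i])));
  }
  if (peak > 0.0) {
    for (float& v : y) v = static_cast<float>(v / peak);  // scale to [-1, 1]
  }
  return y;
}
```

Peak normalization is chosen here because it directly enforces the (-1, 1) range mentioned in the note; RMS-based scaling would be an alternative but does not bound the peaks.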

5 changes: 5 additions & 0 deletions src/algorithms/machinelearning/tensorflowpredictmusicnn.cpp
@@ -158,6 +158,11 @@ const char* TensorflowPredictMusiCNN::description = DOC(
"Note: This algorithm does not make any check on the input model so it is "
"the user's responsibility to make sure it is a valid one.\n"
"\n"
"Note: The output of this algorithm is 2D, which is suitable for extracting embeddings or "
"class activations (the output shape is, e.g., [time, number of classes]). If the output "
"parameter is set to an intermediate layer with more dimensions, the output will be "
"flattened to 2D.\n"
Review comment (Member):

Same comment as for TensorflowPredictEffnetDiscogs

"\n"
"References:\n"
"\n"
"1. Pons, J., & Serra, X. (2019). musicnn: Pre-trained convolutional neural "
5 changes: 5 additions & 0 deletions src/algorithms/machinelearning/tensorflowpredictvggish.cpp
@@ -156,6 +156,11 @@ const char* TensorflowPredictVGGish::description = DOC(
"Note: This algorithm does not make any check on the input model so it is "
"the user's responsibility to make sure it is a valid one.\n"
"\n"
"Note: The output of this algorithm is 2D, which is suitable for extracting embeddings or "
"class activations (the output shape is, e.g., [time, number of classes]). If the output "
"parameter is set to an intermediate layer with more dimensions, the output will be "
"flattened to 2D.\n"
Review comment (Member):

Same comment as for TensorflowPredictEffnetDiscogs

"\n"
"References:\n"
"\n"
"1. Gemmeke, J. et al., AudioSet: An ontology and human-labelled dataset "
7 changes: 7 additions & 0 deletions src/algorithms/standard/tensortovectorreal.cpp
@@ -36,6 +36,7 @@ void TensorToVectorReal::configure() {
_channels = 0;
_timeStamps = 0;
_featsSize = 0;
_warned = false;
}


@@ -44,6 +45,7 @@ void TensorToVectorReal::reset() {
_channels = 0;
_timeStamps = 0;
_featsSize = 0;
_warned = false;
}


@@ -66,6 +68,11 @@ AlgorithmStatus TensorToVectorReal::process() {
_timeStamps = tensor.dimension(2);
_featsSize = tensor.dimension(3);

if (_channels != 1 && !_warned) {
E_WARNING("TensorToVectorReal: The channel axis (dimension 1) of the input tensor has size larger than 1, but the output of this algorithm is 2D. The batch, channel, and time axes (dimensions 0, 1, 2) will be flattened to the first dimension of the output matrix.");
Review comment (Member):

We output a vector of vector of reals, so the "matrix" terminology may be misleading.

_warned = true;
}

_frame.setAcquireSize(_timeStamps * _channels * _batchSize);
_frame.setReleaseSize(_timeStamps * _channels * _batchSize);

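The `_warned` flag added in this PR implements a warn-once pattern: the warning fires on the first tensor whose channel axis is larger than 1 and is re-armed whenever `configure()` or `reset()` runs. Below is a standalone sketch of that pattern under simplifying assumptions; the class `WarnOnceFlattener` and its members are hypothetical, `std::cerr` stands in for Essentia's `E_WARNING`, and the returned row count mirrors the `_timeStamps * _channels * _batchSize` acquire/release size used above.

```cpp
#include <cstddef>
#include <iostream>

// Hypothetical stand-in for the bookkeeping in TensorToVectorReal (not Essentia
// code): it emits the "channels > 1" warning at most once between resets.
class WarnOnceFlattener {
 public:
  // Re-arm the warning, as configure() and reset() do in the PR.
  void reset() { warned_ = false; }

  // Returns the number of output rows (frames) produced for one tensor of
  // shape [batch, channels, time, features]; features form each row.
  std::size_t process(std::size_t batch, std::size_t channels, std::size_t time) {
    if (channels != 1 && !warned_) {
      std::cerr << "warning: channel axis has size > 1; batch, channel and time "
                   "axes will be merged into the first output dimension\n";
      warned_ = true;
      ++warningCount_;
    }
    return time * channels * batch;
  }

  int warningCount() const { return warningCount_; }

 private:
  bool warned_ = false;
  int warningCount_ = 0;
};
```

Keeping the flag as a member rather than a function-local `static` means each algorithm instance warns independently and `reset()` can re-arm it.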
1 change: 1 addition & 0 deletions src/algorithms/standard/tensortovectorreal.h
@@ -37,6 +37,7 @@ class TensorToVectorReal : public Algorithm {
int _channels;
int _timeStamps;
int _featsSize;
bool _warned;

public:
TensorToVectorReal(){