Improve intermediate layer extraction explanation #1338

Open
wants to merge 3 commits into base: master

Conversation

@palonso (Contributor) commented May 26, 2023

TensorToVectorReal converts tensors to 2D arrays by flattening all axes but the last one into the first dimension.
Model-specific prediction algorithms (e.g., TensorflowPredictVGGish) use this algorithm to return 2D arrays because they are primarily intended for time-wise predictions or embeddings. However, these algorithms can also be used to extract intermediate layers of the models, which may have more than two dimensions. In that case, all dimensions but the last one are flattened. To address this:

  • TensorToVectorReal now emits a warning when it flattens a dimension.
  • We added notes explaining this behavior to the potentially affected algorithms.

Note that it is also possible to retrieve intermediate layers with their original shape using TensorflowPredict as discussed here.
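
The flattening described above can be sketched as follows. This is a minimal illustration, not the actual TensorToVectorReal code: `Tensor4D` and `flattenTo2D` are hypothetical names, and the sketch assumes a row-major tensor with axes [batch, channels, time, features], matching the axis order mentioned in the warning message of this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 4D tensor stored in row-major order with
// axes [batch, channels, time, features]. Not an Essentia type.
struct Tensor4D {
  std::size_t batch, channels, time, feats;
  std::vector<float> data;  // expected size: batch * channels * time * feats
};

// Flatten every axis except the last one into the first output dimension.
// The result has batch * channels * time rows, each holding `feats` values.
std::vector<std::vector<float>> flattenTo2D(const Tensor4D& t) {
  const std::size_t rows = t.batch * t.channels * t.time;
  assert(t.data.size() == rows * t.feats);

  std::vector<std::vector<float>> out;
  out.reserve(rows);
  for (std::size_t r = 0; r < rows; ++r) {
    // Each output row is one contiguous run of `feats` values.
    const auto first = t.data.begin() + static_cast<std::ptrdiff_t>(r * t.feats);
    out.emplace_back(first, first + static_cast<std::ptrdiff_t>(t.feats));
  }
  return out;
}
```

With a tensor of shape [1, 2, 3, 4], the sketch returns 6 rows of 4 values each, so the channel axis can no longer be told apart from the time axis in the output. That loss of structure is what the new warning reports.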

@palonso palonso requested a review from dbogdanov May 26, 2023 08:51

@dbogdanov (Member) left a comment

This looks good! I've left a proposal to improve the description of the algorithms' output in the docstring.

"Note: The output of this algorithm is 2D, which is suitable for extracting embeddings or "
"class activations (the output shape is, e.g., [time, number of classes]). If the output "
"parameter is set to an intermediate layer with more dimensions, the output will be "
"flattened to 2D.\n"

Member

Rephrased version (trying to simplify):

Note: The algorithm outputs a time series of class activations or embedding vectors, with a 2D shape [time, feature vector]. Feature vector values will be flattened if the output parameter is set to extract an intermediate layer with multiple dimensions.

"class activations (the output shape is, e.g., [time, number of classes]). If the output "
"parameter is set to an intermediate layer with more dimensions, the output will be "
"flattened to 2D.\n"
"\n"

Member

Same comments as for TensorflowPredictEffnetDiscogs

"Note: The output of this algorithm is 2D, which is suitable for extracting embeddings or "
"class activations (the output shape is, e.g., [time, number of classes]). If the output "
"parameter is set to an intermediate layer with more dimensions, the output will be "
"flattened to 2D.\n"

Member

Same comment as for TensorflowPredictEffnetDiscogs

"Note: The output of this algorithm is 2D, which is suitable for extracting embeddings or "
"class activations (the output shape is, e.g., [time, number of classes]). If the output "
"parameter is set to an intermediate layer with more dimensions, the output will be "
"flattened to 2D.\n"

Member

Same comment as for TensorflowPredictEffnetDiscogs

@@ -66,6 +68,11 @@ AlgorithmStatus TensorToVectorReal::process() {
_timeStamps = tensor.dimension(2);
_featsSize = tensor.dimension(3);

if (_channels != 1 && !_warned) {
E_WARNING("TensorToVectorReal: The channel axis (dimension 1) of the input tensor has size larger than 1, but the output of this algorithm is 2D. The batch, channel, and time axes (dimensions 0, 1, 2) will be flattened to the first dimension of the output matrix.");

Member

We output a vector of vectors of reals, so the "matrix" terminology may be misleading.
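
For illustration, a warn-once guard with the suggested wording might look like the sketch below. It is an assumption-laden example rather than the code in this PR: `FlattenWarner` and `check` are hypothetical names, `std::cerr` stands in for `E_WARNING`, and setting `_warned` inside the branch is assumed because the diff above is truncated before that point.

```cpp
#include <cstddef>
#include <iostream>

// Hypothetical helper mirroring the `_channels != 1 && !_warned` check
// shown in the diff. Not part of Essentia.
class FlattenWarner {
 public:
  // Returns true only the first time a channel axis larger than 1 is seen,
  // so the warning is printed once per instance instead of on every call.
  bool check(std::size_t channels) {
    if (channels != 1 && !_warned) {
      _warned = true;  // assumed: the flag is set after the first warning
      std::cerr << "TensorToVectorReal: the channel axis (dimension 1) of the "
                   "input tensor has size larger than 1, but the output is a "
                   "vector of vectors. The batch, channel, and time axes "
                   "(dimensions 0, 1, 2) will be flattened to the first "
                   "dimension of the output.\n";
      return true;
    }
    return false;
  }

 private:
  bool _warned = false;
};
```

The message avoids the word "matrix" and refers to the output as a vector of vectors, in line with the comment above.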

2 participants