Speed of generating fingerprints from custom source #23

Closed · mimbres opened this issue Jan 19, 2022 · 8 comments
Labels: bug (Something isn't working) · question (Further information is requested)



mimbres commented Jan 19, 2022


Hi, it might be related to this, but I'm trying to generate fingerprints from a custom source using the pretrained model you shared here: #10 (comment), and I was wondering if you could tell me the expected time for generating a fingerprint from a single query. It took 1629 seconds to generate fingerprints corresponding to 2 queries of 1-min length (even though there are 3 wav files in the source directory; I'm looking into why that is as well).
From the CLI output: 2/2 [==============================] - 1629s 47ms/step

I'm using a 40-CPU server with an RTX 3090.

Also, can you help me understand the shape of the resulting DB? I understand that the shape is n_items x d, where n_items is the number of audio files × batch size. I don't see what this batch size means, and therefore what determines the resulting DB shape.

Thanks in advance!

Originally posted by @guillemcortes in #8 (comment)

@mimbres mimbres assigned mimbres and unassigned mimbres Jan 19, 2022
@mimbres mimbres added the question Further information is requested label Jan 19, 2022

mimbres commented Jan 19, 2022

@guillemcortes

  1. Speed:
    No, it's weird to see 1629 s (~27 min) for the 2 x 1 min queries. I can't remember the exact elapsed time, but it should be processed in <1 s.
  • Did you add the --skip_dummy flag?
    python run.py generate CHECKPOINT_NAME CHECKPOINT_INDEX -c CONFIG_NAME --source SOURCE_ROOT_DIR --output FP_OUTPUT_DIR --skip_dummy
    
    Here, --skip_dummy means that we skip generating fingerprints for the 100K dummy songs.
  • Note that for the first experiment only, we need to extract fingerprints for the 100K dummy songs along with the custom source. In that case, it may take as long as 27 minutes, as you observed.
    python run.py generate CHECKPOINT_NAME CHECKPOINT_INDEX -c CONFIG_NAME --source SOURCE_ROOT_DIR --output FP_OUTPUT_DIR
    
  • I will update this answer after reproducing it in my environment tonight.
  2. Shape of the resulting DB:
  • In n_items x d, n_items is the number of fingerprints. By default, we extract one for each 1 s segment with a 0.5 s hop. Given 1 min x 2 songs (120 s total) of custom queries, n_items should be 238 = 2 * (60 * 2 - 1). The shape is stored in your logs/emb/xxxx/xxx/query_shape.npy.
  • By default, we use TS_BATCH_SZ: 125. This can be a problem in your case with only 2 songs (239 segments). Since 239 % 125 = 114, the last 114 segments will be dropped (Get fingerprints for custom wav input files #21). Setting TS_BATCH_SZ to a number that evenly divides your total segment count, like 239, is a temporary workaround (see the sketch below).
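
To make the arithmetic above concrete, here is a minimal sketch of the segment-count and batch-drop calculation (an illustration, not the repository's actual code; the exact off-by-one behavior at the edges, 238 vs. 239 segments, depends on padding):

    # Sketch of the segment-count / batch-drop arithmetic described above.
    def n_segments(duration_s, seg_len=1.0, hop=0.5):
        """Number of fixed-length segments fingerprinted per audio file."""
        return int((duration_s - seg_len) / hop) + 1

    per_song = n_segments(60.0)    # 119 segments for a 1-minute file
    n_items = 2 * per_song         # 238 = 2 * (60 * 2 - 1) for two songs
    bsz = 125                      # default TS_BATCH_SZ
    kept = (n_items // bsz) * bsz  # only full batches survive when the
                                   # remainder is dropped -> 125 fingerprints
    print(per_song, n_items, kept)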

@mimbres mimbres added the bug Something isn't working label Jan 19, 2022

mimbres commented Jan 19, 2022

  • I reproduced @guillemcortes's experiment using a 2 x 1 min custom source. The resulting shape was 125 x 128 (114 segments were dropped with TS_BATCH_SZ=125), as expected. However, I encountered a spurious error message, which can be ignored.

    python run.py generate CHECKPOINT_NAME --source ../neural-audio-fp-dataset/music/others/custom_source --skip_dummy
    
    Arugment 'checkpoint_index' was not specified.
    Searching for the latest checkpoint...
    ---Restored from ./logs/checkpoint/exp_mini640_tau005/ckpt-101---
    Data source: dict_keys(['custom_db']) unseen_icassp
    === Generating fingerprint from 'custom_db' bsz=125, 125 items, d=128 ===
    1/1 [==============================] - 6s 6s/step
    === Succesfully stored fingerprint to ./logs/emb//exp_mini640_tau005/101/ ===
    
    [ERROR MESSAGE AFTER THIS]
    File "run.py", line 111, in generate
        generate_fingerprint(cfg, checkpoint_name, checkpoint_index, source, output, skip_dummy)
      File "neural-audio-fp-dev/model/generate.py", line 190, in generate_fingerprint
        if sz_check['db'] != sz_check['query']:
    KeyError: 'db'
    
  • Check the resulting fingerprint shape:

    >>> import numpy as np
    >>> np.load('custom_db_shape.npy')
    array([125, 128])
    

ToDo:

  • Fix the ignorable error for custom source (a possible guard is sketched below), and rename the output files to custom_source*.*
  • Fix typo (Arugment)
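
For reference, here is a minimal sketch of what a guard for that KeyError might look like (an assumption about the fix, not the actual patch; sz_check is the dict from model/generate.py in the traceback above):

    # Hypothetical guard: with --skip_dummy there is no 'db' entry in
    # sz_check, so only compare shapes when both keys are present.
    if 'db' in sz_check and 'query' in sz_check:
        if sz_check['db'] != sz_check['query']:
            raise ValueError('DB and query fingerprint dimensions disagree.')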

@mimbres mimbres reopened this Jan 19, 2022

mimbres commented Jan 19, 2022

Now (7647aec) the output filenames are custom_source.mm and custom_source_shape.npy, because they can be used for both custom DB and query generation.
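
If you want to inspect the stored fingerprints themselves, a minimal sketch along these lines should work (dtype=np.float32 is an assumption here; check the config for the dtype actually used):

    import numpy as np

    # Load the stored shape, then memory-map the fingerprint matrix
    # without reading it all into RAM.
    shape = tuple(np.load('custom_source_shape.npy'))   # e.g. (238, 128)
    fp = np.memmap('custom_source.mm', dtype=np.float32, mode='r', shape=shape)
    print(fp.shape, fp[0, :5])   # first fingerprint's first 5 dimensions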


mimbres commented Jan 20, 2022

db28b6b resolves #21.

    >>> np.load('custom_db_shape.npy')
    array([238, 128])

@mimbres mimbres closed this as completed Jan 20, 2022
guillemcortes commented

Hi, I know you closed this issue; I just wanted to give an update.
Yes, the results I showed you were using the skip_dummy flag. I tried generating the fingerprints using CPU only, and it's fast now: around 12 s for three 1-minute queries with a TS_BATCH_SZ of 119. I still have to investigate why it's so slow with the GPU (and why it only computes fingerprints for 2 of the 3 available audio files), but for the moment I will stick to CPU. Thanks!


mimbres commented Jan 26, 2022

@guillemcortes I don't quite understand why it is so slow on the GPU. Have you ever tried training with the default config? One epoch (10K songs) usually takes around 20 min. If it takes much longer than that, I suspect a problem with the environment installation.

@mimbres mimbres reopened this Jan 26, 2022
guillemcortes commented

OK! I will try training with the default config and let you know!

guillemcortes commented

Hi, I tried reinstalling your Docker image, and now training from scratch with the default config (python run.py train test2 --max_epoch=10 -c default) takes around 16 min per epoch.
Sorry for the noise; I must have had my Docker image corrupted somehow. Thanks!

@mimbres mimbres closed this as completed Feb 15, 2022
Rodrigo29Almeida pushed a commit to Rodrigo29Almeida/neural-audio-fp that referenced this issue Apr 16, 2024