That is probably because silences in the input audio are trimmed when preprocess_wav is used. #45 and #63 describe similar problems. I am considering trimming the silences in the original audio as well before preprocessing, so that it matches the resemblyzer output. #63 mentions this as the fix, noting that wav is actually the trimmed audio. Other than that, I hope there are other solutions.
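One possible workaround, as a rough sketch only: if you can get hold of a per-sample boolean mask saying which samples of the original audio survived the trimming, you can map a timestamp on the trimmed audio back to the original timeline. As far as I can tell preprocess_wav does not return such a mask, so `keep_mask` below is an assumption (you would have to reproduce or patch the trimming step to obtain it), and `trimmed_to_original_time` is a hypothetical helper, not part of resemblyzer.

```python
import numpy as np

def trimmed_to_original_time(t_trimmed, keep_mask, sr=16000):
    """Map a time (seconds) on the trimmed audio to the original audio.

    keep_mask is an ASSUMED per-sample boolean array over the original
    audio: True where the sample was kept, False where it was trimmed.
    """
    # Original sample indices of the kept samples, in order.
    kept_idx = np.flatnonzero(np.asarray(keep_mask, dtype=bool))
    if kept_idx.size == 0:
        raise ValueError("keep_mask keeps no samples")
    # Sample position in the trimmed audio, clamped to its valid range.
    i = int(round(t_trimmed * sr))
    i = min(max(i, 0), kept_idx.size - 1)
    # The i-th kept sample sits at kept_idx[i] in the original audio.
    return kept_idx[i] / sr

# Toy example at 10 Hz: 1 s silence, 2 s speech, 1 s silence, 1 s speech.
toy_mask = np.array([False] * 10 + [True] * 20 + [False] * 10 + [True] * 10)
toy_start = trimmed_to_original_time(0.0, toy_mask, sr=10)
toy_after_gap = trimmed_to_original_time(2.0, toy_mask, sr=10)
```

In the toy example, the start of the trimmed audio lands after the leading silence in the original, and times after the removed gap are shifted by the gap length as well. This would also explain why the last timestamp falls short of the original file duration.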
Hello everybody, first of all, thanks to this community for supporting the developers.
I tried the resemblyzer diarization and the timestamps I got for each speaker do not match the original files.
For example:
1/ The last timestamp does not correspond to the end time of the wav file, even when we speak right up to the end.
2/ Does removing the silences shift every timestamp compared to the original wav file?
3/ Is the original wav trimmed during the VAD process, or during any other step (segmentation, clustering...)?
Thanks in advance!