After diarization, the timestamps I got don't match the original file #70

Open
teoh79 opened this issue Apr 30, 2022 · 3 comments

teoh79 commented Apr 30, 2022

Hello everybody, and first of all thanks to this community for supporting the developers.

I tried the Resemblyzer diarization and the timestamps I got for each speaker don't match the original file:

For example:
1/ The last timestamp doesn't correspond to the end time of the wav file, even though we speak right up to the end.

2/ Does removing the silences shift every timestamp relative to the original wav file?

3/ Is the original wav trimmed during the VAD step, or during any other step (segmentation, clustering...)?

Thanks in advance!

@theashishbhatt

@teoh79 I have noticed the same issue. The audio length in the output is shorter than the actual audio length.


ConnieZi commented Jun 17, 2022

That is probably because the silences in the input audio are trimmed when preprocess_wav is used. Similar problems are reported in #45 and #63. I am considering trimming the silences in the original audio as well before preprocessing, so that it matches the Resemblyzer output. This is also mentioned as the solution in #63, which says that wav is actually the trimmed audio. Other than that, I hope there are other solutions.
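
If the goal is instead to keep the original file and translate the diarization timestamps back onto its timeline, one possible approach is sketched below. It is only a sketch under assumptions: as far as I can tell, preprocess_wav returns just the trimmed audio and not the list of dropped regions, so `keep_mask` here is hypothetical and would have to be recomputed separately (for example by running the same kind of VAD over the original audio). The 16 kHz sampling rate and 30 ms window are also assumptions, and the helper names are made up for illustration; none of this is part of Resemblyzer.

```python
def build_sample_mapper(keep_mask, window_len):
    """Map a sample index in the trimmed audio to the original audio.

    keep_mask[i] is True if the i-th fixed-size window of the ORIGINAL
    audio was kept (voiced) and False if it was dropped as silence.
    keep_mask is a hypothetical input: it has to be recomputed by the
    caller, it is not something the library hands back.
    """
    # Original start sample of every kept window, in playback order.
    kept_starts = [i * window_len for i, keep in enumerate(keep_mask) if keep]
    if not kept_starts:
        raise ValueError("keep_mask keeps no audio at all")

    def to_original(n_trimmed):
        if n_trimmed < 0:
            raise ValueError("sample index must be non-negative")
        # Which kept window the trimmed sample falls into. The clamp lets
        # the very end of the trimmed audio map to the end of the last
        # kept window instead of running off the list.
        k = min(n_trimmed // window_len, len(kept_starts) - 1)
        return kept_starts[k] + (n_trimmed - k * window_len)

    return to_original


def trimmed_to_original_seconds(t_trimmed, mapper, sample_rate=16000):
    """Convert a timestamp (seconds) on the trimmed audio to the original."""
    return mapper(round(t_trimmed * sample_rate)) / sample_rate


# Assumed values: 16 kHz audio and 30 ms windows, i.e. 480 samples each.
window_len = 480
# Six windows of original audio; windows 0, 3 and 4 were dropped as silence.
keep_mask = [False, True, True, False, False, True]
mapper = build_sample_mapper(keep_mask, window_len)

print(mapper(0))     # 480: trimmed audio starts at the 2nd original window
print(mapper(960))   # 2400: the 3rd kept window is the 6th original window
```

With a mapping like this, the start and end of each speaker segment reported on the trimmed audio can be converted one by one, and the last timestamp lands at the end of the last voiced window of the original file rather than at the (shorter) trimmed length.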

@Nirannoel

How do I extract the timestamps?
