Episode 1 correction #1
Comments
I can look in more detail another time, but here's another one: At 26:46, the Bahasa is:
Which translates to:
But this is currently rendered as:
|
Should be more like
For more potential corrections, my version of these subtitles is here: https://www.youtube.com/watch?v=yP45Lx5S5TQ I live in Indonesia but I don't speak a huge amount of Bahasa, so mine may also contain mistakes. I use a few models on Google Cloud for the transcriptions 😄 |
@cgparkinson Your work is awesome!! Your subtitles are far better than ours -- well done! Some other ones that we got wrong are like when Tommy comes back after luring Alfa to the three-headed dragon room, and Vega asks Tommy where Alfa is. In our current translation he says something about swimming (?), and we weren't sure how to translate it. In your translation, it feels a lot smoother and makes MUCH more sense. :) I've transcribed the audio three different ways (using Whisper.cpp, Whisper in Python, and using YouTube's auto-transcribe + auto-translate), but none of them are perfect. I'm not sure if anything that we're doing would be useful to you, but if you would like to collaborate, I would be happy to add you as a member / manager of the Alfa Fan Sub YouTube page, or if you'd prefer to keep doing things on your own, I would be happy to redirect traffic to your channel. Right now it's me and my 3 kids working on this together, and we've got partial subtitles lined up for the first 4 episodes or so. |
Thank you for your kind words! It's so cool that you're doing this with your kids as well 😄 Feel free to use any part of my translation if you feel that it helps your own. My version of Episode 2 is halfway complete: https://www.youtube.com/watch?v=DCRfHzuvb60 My process is a little painful, to be honest, so I wasn't intending on continuing with it past Episode 2. It looks like this:
Do I understand correctly that you're not really using the original Bahasa in any part of your process? This may be where I'm able to help, as I live in Indonesia and I'm learning Bahasa. Obviously it needs to stay fun for you and your kids, so I also don't want to get in the way 😄 So, if you'd like to collaborate, how about I finish my method for Episode 2, you can compare like you did for Episode 1, and we try this for Episode 3?
That seems to be a nice async way to do this - should reduce the error, and it's mostly independent of your work, so you guys still work as a family 😄 What do you think? |
That sounds pretty good to me! I'll explore options for how I can generate the cleanest Bahasa transcription. Currently, here are the paths that I'm seeing:
Path 1: YouTube Automatic Transcription
Here is an example of process 1:
Path 2: Whisper.cpp Transcription
Here is an example of output from process 2: How does this look? I'm also happy to provide the auto-translated scripts for each of these alongside each one if that would help. Maybe in something like a Google Sheet that puts the Bahasa alongside the auto-translated English, and finally with a spot for you to write in your translation? I could then write a script to read in that Google sheet and convert the translations back to a .srt file for uploading to YouTube. However, if you have a different tool you're using for editing the .srt's, then by all means use that instead. Do you have opinions on the quality of the output of Process 1 vs. Process 2? |
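The sheet-to-.srt script mentioned above is not shown in the thread. As a minimal sketch of the idea, assuming the Google Sheet is exported as CSV with hypothetical columns `start` and `end` (in seconds), `bahasa`, and `translation`, something like this would work. The column names and function names are illustrative assumptions, not part of the project.

```python
import csv
import io


def to_srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def sheet_csv_to_srt(csv_text: str) -> str:
    """Build SRT text from CSV rows.

    Assumed (hypothetical) columns: start, end, bahasa, translation.
    Rows with an empty translation are skipped, and the remaining cues
    are renumbered from 1.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    blocks = []
    index = 0
    for row in reader:
        text = (row.get("translation") or "").strip()
        if not text:
            continue  # not translated yet, leave it out of the .srt
        index += 1
        start = to_srt_timestamp(float(row["start"]))
        end = to_srt_timestamp(float(row["end"]))
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    # SRT cues are separated by a blank line
    return "\n".join(blocks)


if __name__ == "__main__":
    # Placeholder rows, not real dialogue from the show
    sample = (
        "start,end,bahasa,translation\n"
        "1.0,2.5,(line 1),First line\n"
        "3.0,4.0,(line 2),\n"
        "5.0,6.0,(line 3),Third line\n"
    )
    print(sheet_csv_to_srt(sample))
```

Keeping the Bahasa column in the sheet but leaving it out of the output means the same sheet can serve both as the review surface and as the source for the upload file.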
And here's episode 3 for comparison:
YouTube Automatic Transcription:
Whisper.cpp (Model=Large) Transcription: |
Brilliant! Thanks for providing those SRTs so promptly. To me it seems like the Whisper output is so much better than the YouTube output that I think the YouTube output is mostly irrelevant, to be honest. The Whisper output is phenomenal and gives a really solid base. To be clear, I'm not proposing taking over your existing process - only adding another bit of material you can use when you're writing subtitles, or cross-check after you've written them. (Your subtitles are already very good for 99% of cases and very often better than mine!) I like working directly with the SRT, but happy to put it somewhere else if that's easier for you. I'll need until the end of the week to make good progress on episode 3. Sound good? 😄 |
Great feedback, thank you! I'll focus on the Whisper output and continue cleaning that up. If you notice common errors in timing or repetition, I'm happy to script those out. Thank you for the kind words re: our subtitles -- they've gone through many iterations, and I've also been leveraging ChatGPT heavily in helping us to smooth out our rough translation. Where we really fall down is when there are idioms or other issues that send us on a complete rabbit trail. I hope to have some time this weekend to clean up our subtitles using your reports -- I regret we haven't applied those yet. Thanks also for the clarification re: your intentions -- it sounds like you're happy with operating under the Alfa Fan Subtitle Project banner, and centralizing here? I'll make sure to include you in the credits -- how would you like to be credited? If you'd like, I'm also happy to invite you to manage the YouTube channel itself so that you can more easily make changes and operate on things. Would you like that? Thank you very much! Also, I just realized that I forgot to run my cleanup script on the transcription that I provided to you before. For comparison, here's what it looks like after running it through Do you have a preference on the "cleaned" version vs. not? These are all of the changes that it made:
|
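The cleanup script referred to in the comment above is not included in the thread, so its actual rules are unknown. Purely as an illustration of the kind of repetition fix discussed, here is a rough sketch that merges consecutive cues carrying identical text into one cue spanning both. The `Cue` class and both function names are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    start: str  # SRT timestamp, e.g. "00:00:01,000"
    end: str
    text: str


def parse_srt(srt_text: str) -> list[Cue]:
    """Parse SRT text into cues, ignoring malformed blocks."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        # Expect: index line, timing line, then one or more text lines
        if len(lines) < 3 or "-->" not in lines[1]:
            continue
        start, end = (part.strip() for part in lines[1].split("-->"))
        cues.append(Cue(start, end, "\n".join(lines[2:]).strip()))
    return cues


def merge_repeats(cues: list[Cue]) -> list[Cue]:
    """Collapse runs of consecutive cues with the same text (case-insensitive).

    The merged cue keeps the first start time and the last end time.
    The input list is not modified.
    """
    merged: list[Cue] = []
    for cue in cues:
        if merged and merged[-1].text.casefold() == cue.text.casefold():
            merged[-1].end = cue.end  # extend the previous cue
        else:
            merged.append(Cue(cue.start, cue.end, cue.text))
    return merged
```

Merging instead of deleting keeps the on-screen duration intact, which matters when the repeated cues cover one long spoken line. Genuine repeated dialogue would also be merged, so the result still needs a human pass.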
Great work, I've been doing my own subtitles but these are really nice.
I haven't reviewed the whole first episode yet but here's a start...
At 20:56 on Episode 1, I think the original Bahasa is:
Literally in English this would be:
Idiomatically this becomes:
Which Rey never forgives Vega for.
In the current translation this is rendered as: