This repo shows how to fetch the raw YouTube transcripts and use the Gemini Flash 8B API to format them. Made by @ldenoue
Try the demo at https://ldenoue.github.io/readabletranscripts and type any search term
To try it locally, run python3 -m http.server and open http://localhost:8000
Set your Gemini API key (create a free Gemini API key). Note: the key is used at line 159 in commit f5ea580.
Click on the video, e.g. https://ldenoue.github.io/readabletranscripts/?id=8yzmCt0QwOQ
You will see a summary of the video and the transcript below.
We first ask Gemini to extract important words by giving it the video title and its description. See line 837 in commit 8466ec2.
This context is essential to improve the accuracy of the transcripts. Titles and descriptions often contain human-edited text that includes proper names, acronyms, etc.
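As a rough illustration of this step, here is a minimal Python sketch of how such a keyword prompt could be assembled and its answer parsed. The prompt wording, the comma-separated answer format, and the function names build_keyword_prompt and parse_keywords are assumptions made for this example; they are not the prompt or the code used in the repo.

```python
def build_keyword_prompt(title: str, description: str) -> str:
    """Build a prompt asking the model for important words (hypothetical wording)."""
    return (
        "Extract the important words (proper names, acronyms, technical terms) "
        "from this video title and description. "
        "Return them as a comma-separated list.\n\n"
        f"Title: {title}\n"
        f"Description: {description}\n"
    )


def parse_keywords(response_text: str) -> list[str]:
    """Split a comma-separated answer into a de-duplicated list, keeping order."""
    seen = set()
    keywords = []
    for raw in response_text.split(","):
        word = raw.strip()
        # Compare case-insensitively so "Gemini" and "gemini" count once.
        if word and word.lower() not in seen:
            seen.add(word.lower())
            keywords.append(word)
    return keywords
```

The resulting list could then be prepended to each formatting prompt so the model sees the correct spelling of names and acronyms.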
We break up the raw YouTube transcript into chunks of 512 words. We feed each chunk to Gemini with a prompt (see line 410 in commit 5c51f52).
Notice that we send the requests in parallel to Gemini.
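The chunk-and-parallelize pattern can be sketched as follows. This is a Python illustration under stated assumptions, not the repo's code: format_chunk is a hypothetical placeholder that stands in for the real Gemini request, and chunk_words and format_all are names invented for this example.

```python
import asyncio


def chunk_words(text: str, size: int = 512) -> list[str]:
    """Split text into chunks of at most `size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


async def format_chunk(chunk: str) -> str:
    """Placeholder for the Gemini call: only capitalizes and adds a period."""
    await asyncio.sleep(0)  # stands in for the network round trip
    return chunk[:1].upper() + chunk[1:] + "."


async def format_all(chunks: list[str]) -> list[str]:
    """Run all requests concurrently; gather returns results in input order."""
    return await asyncio.gather(*(format_chunk(c) for c in chunks))
```

For example, asyncio.run(format_all(chunk_words(raw_text))) would return the formatted chunks in their original order, even if the individual requests finish out of order.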
Once we have the formatted chunks, we need to merge them.
For each pair of consecutive chunks, chunk1 and chunk2, we ask Gemini to merge the last sentence of chunk1 and the first sentence of chunk2 (see the prompt at line 433 in commit 5c51f52).
We merge the chunks and the seams between them to obtain the final transcript.
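The stitching logic can be sketched like this in Python. It is only an illustration: the regex sentence splitter is a naive assumption, merge_seam is a hypothetical placeholder for the Gemini merge prompt, and none of these names come from the repo.

```python
import re

# Naive sentence boundary: whitespace that follows ., ! or ?
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences, dropping empty pieces."""
    return [s for s in _SENTENCE_END.split(text.strip()) if s]


def merge_seam(last_sentence: str, first_sentence: str) -> str:
    """Placeholder for the Gemini merge prompt: joins the two fragments."""
    return last_sentence.rstrip(".") + " " + first_sentence[:1].lower() + first_sentence[1:]


def merge_chunks(chunks: list[str], merge=merge_seam) -> str:
    """Stitch formatted chunks, replacing each boundary pair with a merged seam."""
    if not chunks:
        return ""
    sentences = split_sentences(chunks[0])
    for chunk in chunks[1:]:
        following = split_sentences(chunk)
        if not following:
            continue
        if sentences:
            # Replace the last sentence so far and the first new one by the seam.
            sentences.append(merge(sentences.pop(), following[0]))
            sentences.extend(following[1:])
        else:
            sentences.extend(following)
    return " ".join(sentences)
```

Only the two sentences at each boundary are sent back to the model, so the cost of merging stays small compared with the cost of formatting the chunks themselves.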
To highlight the words as the video plays, we need to align the words of the raw YouTube transcript with those of the final, punctuated transcript.
We rely on diff.js for that.
Now words get highlighted as the video plays, and users can also jump into the video by clicking any word in the transcript.
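To make the alignment idea concrete, here is a Python sketch that uses the standard library's difflib in place of diff.js. It assumes each raw word comes with a start time; the function names and the rule that unmatched words inherit the previous timestamp are assumptions of this example, not the repo's behavior.

```python
import difflib
import re


def normalize(word: str) -> str:
    """Lowercase and strip punctuation so 'Hello,' matches 'hello'."""
    return re.sub(r"[^\w']", "", word.lower())


def align_timestamps(raw_words, raw_times, final_words):
    """Give each final word the time of its matching raw word.

    Words with no match inherit the previous known time
    (None if no earlier word matched).
    """
    a = [normalize(w) for w in raw_words]
    b = [normalize(w) for w in final_words]
    times = [None] * len(final_words)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            times[block.b + k] = raw_times[block.a + k]
    # Fill gaps left by words the model inserted or rewrote.
    last = None
    for i, t in enumerate(times):
        if t is None:
            times[i] = last
        else:
            last = t
    return times
```

With one timestamp per displayed word, highlighting amounts to finding the last word whose time is not after the player's current time, and click-to-seek amounts to seeking to the clicked word's time.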
Thanks for your contribution.