
A notebook for extracting embeddings from OpenAI's Jukebox model, following the approach described in Castellon et al. (2021) with some modifications followed in Spotify's Llark paper.


jonflynng/extract-jukebox-embeddings

extract-jukebox-embeddings

Link to Colab Notebook.

A notebook for extracting embeddings from OpenAI's Jukebox model, following the approach described in Castellon et al. (2021) with some modifications followed in Spotify's Llark paper:

  • Source: Output of the 36th layer of the Jukebox encoder
  • Original Jukebox encoding: 4800-dimensional vectors at 345Hz
  • Audio/embeddings are chunked into 25-second clips, since that is the longest input Jukebox can take; any clip shorter than 25 seconds is padded before being passed through Jukebox
  • Approach: Mean-pooling within 100ms frames, resulting in:
    • Downsampled frequency: 10Hz
    • Embedding size: about 1.2 × 10^6 values for a 25s audio clip
    • For a 25s audio clip the 2D array shape will be [240, 4800] (frames × dimensions), i.e. 240 × 4800 = 1,152,000 values
  • This method retains temporal information while reducing the embedding size
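
The chunking and pooling steps listed above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the notebook's actual code: the function names `chunk_and_pad` and `mean_pool_frames` are hypothetical, zero-padding is an assumption (the list only says clips are "padded"), the 44.1 kHz sample rate is assumed, and random data with a small feature dimension stands in for real Jukebox activations to keep the example light.

```python
import numpy as np


def chunk_and_pad(audio: np.ndarray, sample_rate: int, clip_seconds: float = 25.0) -> np.ndarray:
    """Split mono audio into fixed-length clips, zero-padding the last one.

    Zero-padding is an assumption; the README only states that short clips are padded.
    Returns an array of shape [n_clips, clip_len].
    """
    clip_len = int(round(clip_seconds * sample_rate))
    n_clips = max(1, -(-len(audio) // clip_len))  # ceiling division, at least one clip
    padded = np.zeros(n_clips * clip_len, dtype=audio.dtype)
    padded[: len(audio)] = audio
    return padded.reshape(n_clips, clip_len)


def mean_pool_frames(activations: np.ndarray, n_frames: int) -> np.ndarray:
    """Average a [T, D] activation array over time into [n_frames, D].

    The time axis is split into n_frames nearly equal, contiguous groups and
    each group is replaced by its mean, so temporal order is preserved.
    """
    if activations.ndim != 2:
        raise ValueError("expected a 2D array of shape [time, dim]")
    if not 0 < n_frames <= activations.shape[0]:
        raise ValueError("n_frames must be between 1 and the number of time steps")
    groups = np.array_split(activations, n_frames, axis=0)
    return np.stack([g.mean(axis=0) for g in groups])


# --- Demo with stand-in data (not real Jukebox output) ---
SAMPLE_RATE = 44_100  # assumed audio sample rate
rng = np.random.default_rng(0)

# 60 seconds of fake audio -> three 25-second clips, the last one half padding.
audio = rng.standard_normal(60 * SAMPLE_RATE).astype(np.float32)
clips = chunk_and_pad(audio, SAMPLE_RATE)

# Fake activations for one clip: 25 s at 345 Hz is 8625 time steps.
# The real feature dimension is 4800; 8 is used here to keep memory small.
activations = rng.standard_normal((25 * 345, 8))
pooled = mean_pool_frames(activations, n_frames=240)

print(clips.shape)   # (n_clips, samples_per_clip)
print(pooled.shape)  # (n_frames, dim)
```

With real activations the second dimension would be 4800, and `n_frames=240` would give the `[240, 4800]` shape quoted in the list.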

A Colab notebook gives us an easily reproducible environment and lets us take advantage of the cheap T4 GPUs that Colab offers.

Extended from this repo: https://github.com/Broccaloo/jukebox
