
Is there a code to generate data for music-dissector? #1

Open
vpavlenko opened this issue Sep 5, 2023 · 1 comment

Comments

@vpavlenko

vpavlenko commented Sep 5, 2023

Hi,

Thank you so much for putting together both All-in-One and Music-Dissector. Music-Dissector looks fantastic, and I'd like to open my own files in it. I found out about it when triaging a Cython/madmom issue that came up while building WaveBeat, as I'm preparing a doc on the current state of transcription.

I have successfully run allin1 locally to get the data:

[Screenshot (2023-09-05): the data files produced by running allin1 locally]

I see that music-dissector loads data in a different format to show the three energy bands: {"nav": {"bass": {"low": [2, 2, 2, 2, 2 etc.

Is there code to convert demucs and All-in-One results into this music-dissector format?
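For reference, from poking at the bundled JSON files, the overall shape seems to be roughly this (my own reconstruction, values elided; one entry per stem, e.g. bass):

{
  "duration": <seconds>,
  "nav": { "<stem>": { "low": [...], "mid": [...], "high": [...] }, ... },
  "wav": { "<stem>": { "low": [...], "mid": [...], "high": [...] }, ... }
}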

@tae-jun
Owner

tae-jun commented Sep 6, 2023

Hi, I'm glad that you could successfully install and run allin1 🥹
And I'm surprised that you are trying to open your own files on Music Dissector!

Well, I don't plan to officially release the code that generates input data for Music Dissector.
However, I can share the relevant code snippets.

I guess what's missing is the code for generating the waveforms, right?
As you may have already noticed, they are 3-band waveforms: blue = low, yellow = vocal (the mid band), white = high.
I simply extracted those band energies from the spectrograms that allin1 already uses; the specs below are exactly the same spectrograms, from here in allin1.
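If you kept allin1's byproducts (keep_byproducts=True), the per-stem spectrograms should already be on disk as .npy files. Here is a rough sketch of how to build the ex dict that the snippet below expects (untested, and the paths are just examples):

import numpy as np
from pathlib import Path

# Example only: adjust the path to wherever allin1 saved the spectrograms.
spec_path = Path('./spec/your_track.npy')
ex = {
  'spec': np.load(spec_path),   # expected shape: (stems, frames, freq bins)
  'track_key': spec_path.stem,  # used to name the output JSON file
}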

import json
from pathlib import Path

import numpy as np
from scipy.ndimage import median_filter

LOW = 250    # Hz: upper edge of the "low" band
HIGH = 4000  # Hz: lower edge of the "high" band
FPS = 100    # spectrogram frames per second

# Center frequencies (Hz) of the spectrogram's frequency bins.
BIN_FREQS = [
  43.06640625, 64.599609375, 86.1328125, 107.666015625, 129.19921875, 150.732421875, 172.265625, 193.798828125,
  215.33203125, 236.865234375, 258.3984375, 279.931640625, 301.46484375, 322.998046875, 344.53125, 366.064453125,
  387.59765625, 409.130859375, 430.6640625, 452.197265625, 495.263671875, 516.796875, 538.330078125, 581.396484375,
  624.462890625, 645.99609375, 689.0625, 732.12890625, 775.1953125, 839.794921875, 882.861328125, 925.927734375,
  990.52734375, 1055.126953125, 1098.193359375, 1184.326171875, 1248.92578125, 1313.525390625, 1399.658203125,
  1485.791015625, 1571.923828125, 1658.056640625, 1765.72265625, 1873.388671875, 1981.0546875, 2088.720703125,
  2217.919921875, 2347.119140625, 2497.8515625, 2627.05078125, 2799.31640625, 2950.048828125, 3143.84765625,
  3316.11328125, 3509.912109375, 3725.244140625, 3940.576171875, 4177.44140625, 4435.83984375, 4694.23828125,
  4974.169921875, 5275.634765625, 5577.099609375, 5921.630859375, 6266.162109375, 6653.759765625, 7041.357421875,
  7450.48828125, 7902.685546875, 8376.416015625, 8871.6796875, 9388.4765625, 9948.33984375, 10551.26953125,
  11175.732421875, 11843.26171875, 12553.857421875, 13285.986328125, 14082.71484375, 14922.509765625, 15805.37109375
]
BIN_FREQS = np.array(BIN_FREQS).round().astype(int)


def process(ex, outdir: Path):
  # ex['spec'] holds one spectrogram per demucs stem
  # (bass, drum, other, vocal), shaped (stems, frames, bins).
  specs = ex['spec']
  
  # Bin indices belonging to each frequency band.
  i_low = np.flatnonzero(BIN_FREQS < LOW)
  i_high = np.flatnonzero(BIN_FREQS > HIGH)
  i_mid = np.flatnonzero((LOW <= BIN_FREQS) & (BIN_FREQS <= HIGH))
  
  # Compute the max energy value for each frequency band considering all instruments.
  max_low = specs[:, :, i_low].max()
  max_mid = specs[:, :, i_mid].max()
  max_high = specs[:, :, i_high].max()
  
  # Mean energy per band, for each stem and frame.
  wavs_low, wavs_mid, wavs_high = [
    specs[:, :, indices].mean(axis=-1)
    for indices in [i_low, i_mid, i_high]
  ]
  # Normalize each band to [0, 1].
  wavs_low /= max_low
  wavs_mid /= max_mid
  wavs_high /= max_high
  assert wavs_low.max() <= 1.0
  assert wavs_mid.max() <= 1.0
  assert wavs_high.max() <= 1.0
  
  # Median-filter each stem's band energy over a 1-second window (FPS frames)
  # to get the smoother curves used in the nav (overview) view.
  navs_low = np.array([median_filter(wav, size=FPS) for wav in wavs_low])
  navs_mid = np.array([median_filter(wav, size=FPS) for wav in wavs_mid])
  navs_high = np.array([median_filter(wav, size=FPS) for wav in wavs_high])
  
  # Stack the bands cumulatively so they render as layered areas.
  navs_mid = navs_low + navs_mid
  navs_high = navs_mid + navs_high
  
  # Normalize the stacked nav curves by their joint maximum.
  max_nav = np.max([navs_low.max(), navs_mid.max(), navs_high.max()])
  navs_low /= max_nav
  navs_mid /= max_nav
  navs_high /= max_nav
  assert navs_high.max() <= 1.0
  
  data = {
    'nav': {},
    'wav': {},
  }
  
  # Emit per-stem curves; the stem order must match the first axis of specs.
  for (
    eg_low, eg_mid, eg_high,
    nav_low, nav_mid, nav_high,
    inst
  ) in zip(
    wavs_low, wavs_mid, wavs_high,
    navs_low, navs_mid, navs_high,
    [
      'bass',
      'drum',
      'other',
      'vocal',
    ]
  ):
    data['wav'][inst] = {
      'low': eg_low,
      'mid': eg_mid,
      'high': eg_high,
    }
    
    data['nav'][inst] = {
      'low': nav_low,
      'mid': nav_mid,
      'high': nav_high,
    }
  
  # apply_to_dict / to_uint8_list walk the nested dict and quantize each
  # float array into a JSON-friendly list of uint8 values.
  apply_to_dict(data, to_uint8_list)
  data['duration'] = specs.shape[1] / FPS  # track length in seconds
  
  outpath = outdir / f'{ex["track_key"]}.json'
  outpath.write_text(json.dumps(data))
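apply_to_dict and to_uint8_list aren't included in the snippet; here is a minimal sketch that would match the small integer values in the JSON sample (assumed implementations, not the originals):

def to_uint8_list(arr):
  # Assumed behavior: quantize floats in [0, 1] to ints in [0, 255].
  return (np.asarray(arr) * 255).round().astype(np.uint8).tolist()


def apply_to_dict(d, fn):
  # Assumed behavior: apply fn to every leaf value of a nested dict, in place.
  for key, value in d.items():
    if isinstance(value, dict):
      apply_to_dict(value, fn)
    else:
      d[key] = fn(value)

With those defined, calling process(ex, Path('./data')) should write <track_key>.json in the format Music Dissector loads ('./data' is just an example output directory).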

I hope this is helpful for you!
