Expected behaviour
When loading a stereo audio file and downmixing it to mono, I expect the resulting amplitudes to depend only on the content, not on the audio file format.
Actual behaviour
Currently, if a wave file has the same sample type as the one requested when loading, madmom uses scipy to load it; to downmix the signal to mono, it then uses its own madmom.audio.signal.remix function, which computes the arithmetic mean of the channels.
If there is a mismatch in sample types (e.g. the file is stored as float32 but loaded as float, or stored as 16-bit integers and loaded as float), madmom uses ffmpeg to load the file and, in the same step, to downmix it to mono.
The downmixing logic of ffmpeg apparently applies a normalizing factor of 2 / sqrt(2) relative to the arithmetic mean. As a result, the same file yields different amplitudes depending on which loading path is taken.
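To illustrate the discrepancy without needing an audio file, here is a minimal NumPy sketch. It does not call madmom or ffmpeg: `downmix_mean` mirrors the arithmetic mean described above, while `downmix_scaled` is a hypothetical stand-in that only assumes the 2 / sqrt(2) factor I observed, not ffmpeg's actual implementation. Both function names and the synthetic signal are made up for this example.

```python
import numpy as np


def downmix_mean(stereo):
    """Arithmetic mean of the channels; `stereo` has shape (num_samples, 2)."""
    return stereo.mean(axis=1)


def downmix_scaled(stereo):
    """Hypothetical stand-in for the ffmpeg path (assumption based on the
    observed ratio): the channel mean scaled by 2 / sqrt(2)."""
    return stereo.mean(axis=1) * 2 / np.sqrt(2)


# Synthetic stereo signal: two sines with different gain and phase.
t = np.linspace(0, 1, 1000, endpoint=False)
left = np.sin(2 * np.pi * 5 * t)
right = 0.5 * np.sin(2 * np.pi * 5 * t + 0.3)
stereo = np.stack([left, right], axis=1)

mono_mean = downmix_mean(stereo)
mono_scaled = downmix_scaled(stereo)

# Skip samples where the denominator is zero before taking the ratio.
mask = mono_scaled != 0
ratio = np.median(mono_mean[mask] / mono_scaled[mask])
print(ratio)  # approximately 0.7071, i.e. 1 / sqrt(2)
```

Under this assumption the two results differ by a constant gain only, so the waveform shape is identical and only the amplitude changes.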
Steps needed to reproduce the behaviour
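    import madmom
    import numpy as np

    # chirp.wav is stored as stereo 32-bit float
    # same sample type as the file: loaded via scipy, downmixed by madmom
    read_wave = madmom.io.load_audio_file('chirp.wav', num_channels=1, dtype=np.float32)[0]
    # different sample type: loaded and downmixed by ffmpeg
    # (np.float64 is what the deprecated alias np.float referred to)
    read_ffmpeg = madmom.io.load_audio_file('chirp.wav', num_channels=1, dtype=np.float64)[0]

    print(np.nanmedian(read_wave / read_ffmpeg))  # 0.7071...
    print(np.nanmedian((2 * read_wave / np.sqrt(2)) / read_ffmpeg))  # 1.0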
Information about installed software
madmom master branch
ffmpeg version 4.4.2-0ubuntu0.22.04.1