Realtime mode results in distortions, maybe clipping #84
This is strictly speaking a limitation of the API. The underlying cause is the need to avoid audible discontinuities when the stretch or shift factor is changed during live processing, specifically in the case where it is changed to 1.0 or from 1.0 to something else.
In the R2 engine, pure time-stretching at a factor of 1.0 does essentially nothing and there is no issue moving from 1.0 to some other factor in time only. But pitch-shifting is an issue, which is why the
In the R3 engine there is an additional multi-channel frequency handling layer. This can cause audible artifacts when enabled or disabled, so in real-time mode it is enabled always, even at 1.0. This is not usually audible with music, but can be observed with a test signal as you have found.
Ideally the channel handling would be totally transparent, and I would like to improve it in that direction if possible. Failing that, there probably ought to be another option, analogous to the
Of course we do also have far too many options already...
Thank you for your reply, but, admittedly, I am not really satisfied.
Maybe it is just me, but the issue is very noticeable not only in artificial test files but real-world audio, music and especially singing as well. Just pick some real-world music with a singing voice, convert it to 24000 Hz sampling rate and process it with rubberband and the set of switches mentioned above, i. e.
Note the (probably audible) difference between
Also note that the distortions are very noticeable even for mono input, so I do not see how the "additional multi-channel frequency handling layer" is supposed to cause this issue in those cases. Apparently, one can avoid at least some audible distortions by upsampling beforehand, e. g.
That is, the sampling rate presented to rubberband seems to have a quite noticeable effect on its output even if, signal-wise, the input stays more or less the same. This is the main reason why I think that there is a bug.
Anyway, I ran a few experiments. First, I tried upsampling
At that point, I started suspecting something like accumulated rounding errors: the higher the sampling rate, the more errors to accumulate. Or maybe aliasing depending on the sampling rate, who knows. So I tried something more interesting: upsampling by a factor other than 2.
As suspected, something interesting happened: The most pronounced peak moved! The overall shape (waveform) changed as well, and the other peaks appear less pronounced. What is going on here? I also had a look at the spectrograms. I visually inspected the spectrograms of all audio files I tested (both real-world audio as well as artificial test files) and compared them with their respective rubberband-processed versions. The following pattern appears to emerge:
What do you think? Is there anything new to you? If you don't see any flaw that might point to a bug, I would be glad if you could explain the observations I made. Just curious. (But if it happens to help finding hidden bugs, that would be awesome!) Feel free to ask if you have any questions.
Well, I am not trying to claim the situation is ideal, just that it is a logical compromise. Time-stretching introduces artifacts. Ideally those artifacts would disappear entirely at a ratio of 1.0. Also ideally, switching from 1.0 to another ratio during playback should be "noiseless" (i.e. introduce no clicks, or other artifacts that are not there during fixed-ratio processing). For a variable-rate timestretcher it's possible to make a case for the latter requirement as more important than the former (within reason). If you want a 1.0 ratio most of the time and don't mind a click when switching away from 1.0, you can either use offline mode, or bypass the timestretcher when the ratio is 1.0. Whereas if the timestretcher itself clicks when switching to or from 1.0, there is nothing that you as the user can do about it. As I said, this is an area I'd like to improve in future releases - ideally by improving the method so as to satisfy both requirements, but at least with an option. It isn't an issue with the R2 stretcher for various reasons, but the cause of this behaviour with R3 is essentially connected with why R3 usually sounds better in the first place.
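The second workaround mentioned above, bypassing the timestretcher when the ratio is 1.0, could be sketched roughly as follows. This is only an illustration: `process_block` and the `stretch` callable are hypothetical names standing in for whatever real-time stretcher the application uses, not part of the Rubber Band API, and the tolerance value is an assumption.

```python
from typing import Callable, List, Sequence

Block = Sequence[float]


def process_block(
    block: Block,
    ratio: float,
    stretch: Callable[[Block, float], List[float]],
    tolerance: float = 1e-9,
) -> List[float]:
    """Return the block untouched at a ratio of 1.0, otherwise delegate.

    `stretch` is a hypothetical stand-in for the application's real-time
    stretcher; it is not a Rubber Band function.
    """
    if abs(ratio - 1.0) <= tolerance:
        # Bypass: the output is identical to the input, but switching into
        # or out of this branch may click, as discussed in the comment above.
        return list(block)
    return stretch(block, ratio)
```

A real implementation would also have to account for the stretcher's processing delay when switching between the two branches, which this sketch leaves out.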
The API docs note that 44.1 or 48kHz are the intended rates for Rubber Band - other rates "should produce sensible output" but are not advised. Perhaps the command-line tool should also say this.
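A caller could check the sample rate before handing a file to the stretcher. The sketch below uses Python's standard `wave` module; the function name `rate_warning` and the message text are made up for illustration, and the two rates are simply the ones quoted above from the API docs.

```python
import wave
from typing import Optional

# The rates the comment above cites as the intended ones.
INTENDED_RATES_HZ = (44100, 48000)


def rate_warning(path: str) -> Optional[str]:
    """Return a warning string if the WAV file's rate is not an intended one."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
    if rate in INTENDED_RATES_HZ:
        return None
    return f"{path}: {rate} Hz is outside the intended rates {INTENDED_RATES_HZ}"
```

For the 24000 Hz file from the report, this would return a warning rather than `None`.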
Frequency channels, not audio file channels (sorry! that was confusing). The artifacts you're observing are around channel boundaries. The precise boundary frequencies may change depending on the content of the signal.
First of all, thank you for this awesome library. The new bugfix release (3.2.1) greatly improved the quality! However, I kind of hoped that it might fix another bug I just noticed. Unfortunately, it is still present (and it was already present in 3.1.2, if I inferred correctly from my package cache).
When using realtime mode (-R), the output is not identical to its input even if the parameters are set such that it should be (ignoring rounding errors). Consider the following: input.wav.
(Note: mpv should report pcm_s16le 1ch 24000Hz for the respective audio tracks in both files.)
(The sampling frequency does not matter much. However, some distortions are more pronounced when using a suitably low sampling frequency, making them audible, at least if you listen carefully.)
I analyzed the result using Tenacity, subtracting the input version from the output version by adding its inverse. The differences between the processed and the original version I noticed are:
Note that if I do not use realtime mode (by not using -R), the result is as expected, i.e. the difference between input and output is close to silence. (Actually, in this case, it is not perfect silence, which would be best, of course, but that is a minor issue compared to what this bug report is about. I can imagine, though, that even these tiny differences will disappear as soon as this bug is fixed.)
Since rubberband --full-help does not say anything about it, I consider this a bug. Also, unfortunately, it makes rubberband unsuitable for cases where realtime mode is necessary, for example, when using it for pitch-correct time scaling in mpv:
As described above, the distortions during playback are present even if there is no time scaling or pitch shifting taking place.
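The null test described above (subtracting the input from the output and looking at what remains) can also be expressed numerically. The following is a minimal sketch, assuming both signals are already decoded to floats in [-1.0, 1.0], share the same sample rate, and are time-aligned; the function name `residual_peak_dbfs` is made up for illustration.

```python
import math
from typing import Sequence


def residual_peak_dbfs(original: Sequence[float], processed: Sequence[float]) -> float:
    """Peak level of (processed - original) in dBFS.

    Returns -inf when the two signals cancel perfectly. Only the overlapping
    part of the two signals is compared.
    """
    n = min(len(original), len(processed))
    peak = max((abs(processed[i] - original[i]) for i in range(n)), default=0.0)
    return 20.0 * math.log10(peak) if peak > 0.0 else float("-inf")
```

A result close to `-inf` corresponds to the "close to silence" case; the higher the value, the more the processed signal deviates from the input.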
Please keep up the good work!