Ability to change the playback rate of an AudioBufferSourceNode without affecting the pitch #2487

Open
p-himik opened this issue Apr 27, 2022 · 7 comments
Labels
Needs Discussion The issue needs more discussion before it can be fixed.

p-himik commented Apr 27, 2022

Describe the feature
Basically the title - it would be nice to have an easy way to change the playback rate of a source node without affecting the pitch.

Is there a prototype?
Alas, I have no clue how to implement something like this at this point.

Describe the feature in more detail
In my particular case, I'm working on a web app that needs synthesized audio alignment and preview. By itself it's very simple - there's no audio know-how involved at all. But users have requested an ability to change the playback rate so they could align and preview the audio data much quicker. Initially I tried simply changing sourceNode.playbackRate.value - after all, changing the value of an attribute with the same name on <audio> and <video> works just fine. But as you can guess, it didn't produce the desired result.

There are also quite a few other instances when people ask for something like this:

One of those is from my own experience, the others are all from just the first page of a Google search. I'm sure I'd be able to find quite a few more if I spent more time on it.

Given that there already is such a functionality when it comes to playbackRate of <audio> and <video>, I'd think that it shouldn't be that hard from the implementation perspective to add it to AudioBufferSourceNode as well.

@agrathwohl

> Given that there already is such a functionality when it comes to playbackRate of <audio> and <video>, I'd think that it shouldn't be that hard from the implementation perspective to add it to AudioBufferSourceNode as well.

At the risk of seeming pedantic, I always found the playbackRate property of the audio element to be a bit of a half-measure, since many in the audio/DSP spaces are opinionated about time stretching algorithms and desire finer control over those algorithms based upon the type of audio content being processed. (For example, see Ardour's "Stretching" page in their documentation.)

A common use case for changing playback rate without changing pitch is to slow down or speed up voice recordings. This use case has WCAG implications, so the type of algorithm deployed can be pretty important for businesses. There are particular time stretching algorithms that address this need, by ensuring greater vocal intelligibility at the expense of lower dynamic & frequency ranges.

For music and sound effects use cases, the granularity of control is crucial since the algorithm deployed has a very noticeable impact upon the way the resulting audio will sound. This means the chosen algorithm has direct aesthetic/artistic consequences.

> Initially I tried simply changing sourceNode.playbackRate.value - after all, changing the value of an attribute with the same name on <audio> and <video> works just fine.

The playback rate of an audio buffer is conventionally understood to be a multiplier applied to the source's reported sampling frequency, which results in a change in the output pitch when no other DSP is applied. Given this, the current behavior of the AudioBufferSourceNode.playbackRate property seems correct to me.
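To make that coupling concrete, here is a minimal sketch in plain JavaScript (no Web Audio API involved). It assumes a simplified model in which playbackRate just means "read the buffer `rate` times faster" with linear interpolation; `resample` and `estimateFrequency` are hypothetical helper names, not part of any spec or library.

```javascript
// Simplified model of playbackRate: step through the input `rate` samples
// at a time, interpolating linearly between neighbours.
function resample(input, rate) {
  const outLength = Math.floor((input.length - 1) / rate) + 1;
  const output = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * rate;
    const i0 = Math.min(Math.floor(pos), input.length - 1);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    output[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return output;
}

// Rough frequency estimate: count upward zero crossings per second.
function estimateFrequency(samples, sampleRate) {
  let crossings = 0;
  for (let i = 1; i < samples.length; i++) {
    if (samples[i - 1] < 0 && samples[i] >= 0) crossings++;
  }
  return crossings / (samples.length / sampleRate);
}

// One second of a 440 Hz sine at 48 kHz, then "played" at rate 2.
const sampleRate = 48000;
const tone = new Float32Array(sampleRate);
for (let i = 0; i < tone.length; i++) {
  tone[i] = Math.sin((2 * Math.PI * 440 * i) / sampleRate);
}
const fast = resample(tone, 2);

// The result is half as long and its estimated frequency roughly doubles.
console.log(fast.length / tone.length);
console.log(estimateFrequency(fast, sampleRate) / estimateFrequency(tone, sampleRate));
```

In this model duration and pitch cannot be separated: the only way to finish sooner is to read the waveform faster, which raises every frequency by the same factor.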

My team and I have built an audio player that uses the media element's playbackRate property to achieve this kind of time stretching, keeping everything else strictly within the Web Audio API. However, out of a desire to achieve finer-grained control over the time stretching algorithm and overcome the constraints imposed by using media elements, we have been working on implementing a solution to address exactly this ticket's concern.

Our approach is to create an AudioWorklet that uses message port and audio parameters to set a connected source node's playbackRate property. Once the new rate has been sent from the worklet to the source node, the source node sends a message back to the worklet confirming that new value. This then kicks off logic to calculate the appropriate pitch change necessary to maintain a consistent pitch upon output.
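The pitch calculation mentioned above can be sketched as follows. This assumes the pitch shifter takes a frequency ratio where 1.0 means "unchanged"; the function names are hypothetical and the real processor may use a different convention.

```javascript
// Playing at `playbackRate` multiplies every frequency by that rate,
// so the shifter has to multiply by the reciprocal to cancel it out.
function compensatingPitchFactor(playbackRate) {
  if (!(playbackRate > 0)) {
    throw new RangeError('playbackRate must be a positive number');
  }
  return 1 / playbackRate;
}

// The same shift expressed in semitones (12 per octave), which is often
// easier to reason about than a raw ratio.
function pitchShiftInSemitones(playbackRate) {
  return 12 * Math.log2(playbackRate);
}

console.log(compensatingPitchFactor(2));   // shifter must halve the frequencies
console.log(pitchShiftInSemitones(2));     // because rate 2 raises pitch by an octave
```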

Here's some code -- would love others' thoughts on this approach and happy to answer any questions folks might have:

/**
 * Timestretch Worklet
 * ES6 class abstraction for a phaseVocoder worklet via the AudioWorkletNode. This
 * worklet is capable of shifting pitch without affecting the playback speed. Using
 * this in combination with adjusting playback speed, it can be used for a
 * timestretch effect in which audio playback speed changes without affecting pitch. An
 * example implementation of both can be found in www.js
 * @public
 * @class
 */
class TimestretchWorklet {

  /**
   * Create new iteration of the TimestretchWorklet class along with a new AudioWorkletNode
   * which is available on the class's .workletNode property.
   * @param {AudioContext} ctx - Web audio context to be used
   * @param {AudioBufferSourceNode} bufferSource - (optional) bufferSource to automatically connect the new AudioWorkletNode to
   * @param {string} modulePath - Path to the module (for use when custom paths to local assets are needed, ie: vue.js)
   * @param {opts} opts - (optional) Options to pass directly to the AudioWorkletNode
   * @param {float} pitch - (optional) Initial pitch shift for the AudioWorkletNode
   * @returns {TimestretchWorklet}
   */
  static async createWorklet({
    ctx,
    bufferSource,
    modulePath,
    opts={},
    pitch,
  }) {
    const worklet = new TimestretchWorklet(ctx)

    try {
      await ctx.audioWorklet.addModule(modulePath || 'phaseVocoder.js')
    } catch (err) {
      throw new Error(`Error adding module: ${err}`)
    }

    try {
      worklet.workletNode = new AudioWorkletNode(
        ctx,
        'phase-vocoder-processor',
        opts
      )

      if (pitch) {
        worklet.updatePitch(pitch)
      }

      if (bufferSource) {
        worklet.workletNode.parameters.get('playbackRate').value = bufferSource.playbackRate.value
        bufferSource.connect(worklet.workletNode);
        worklet.bufferSource = bufferSource;
      }

      // update playbackRate via message to ensure they stay in sync
      worklet.workletNode.port.onmessage = (e) => {
        const { data } = e
        // bufferSource may not be attached yet (see connectBufferSource)
        if (data.type === 'updatePlaybackRate' && worklet.bufferSource) {
          worklet.bufferSource.playbackRate.value = data.rate
        }
      }
    } catch (err) {
      throw new Error(`Error creating worklet node: ${err}`)
    }

    return worklet
  }

  /**
   * Meant for interior use only via the static method createWorklet()
   * @param {AudioContext} ctx - Web audio context to be used
   */
  constructor(ctx) {
    this.bufferSource = null;
    this.ctx = ctx;
    this.pitch = 1.0;
    this.playbackRate = 1.0;
    this.workletNode = null;
  }


  /**
   * Connects an audio bufferSource (AudioBufferSourceNode) to the existing AudioWorkletNode
   * @param {AudioBufferSourceNode} bufferSource - bufferSource to connect to the AudioWorkletNode
   */
  connectBufferSource(bufferSource) {
    if (!this.workletNode) {
      throw new Error('No worklet created. Call createWorklet() first')
    }

    this.workletNode.parameters.get('playbackRate').value = bufferSource.playbackRate.value
    bufferSource.connect(this.workletNode)
    this.bufferSource = bufferSource;
  }

  /**
   * Updates the pitch of the worklet via an {AudioParam} of the AudioWorkletNode's processor
   * @param {float} pitch - Value of the pitch to set (0.1 to 2.0)
   */
  updatePitch(pitch) {
    this.workletNode.parameters.get('pitchFactor').value = parseFloat(pitch)
  }

  /**
   * Updates the playback rate of the AudioWorkletNode parameter. The processor
   * keeps adjusting the pitch so that it stays the same despite the speed change.
   * @param {float} rate - Value of the playback rate to set
   */
  updateSpeed(rate) {
    let parsedRate = parseFloat(rate)
    this.workletNode.parameters.get('playbackRate').value = parsedRate
  }

}

module.exports = TimestretchWorklet

hoch added the Needs Discussion label May 5, 2022

hoch commented May 5, 2022

WG agreed that adding a preservePitch property on AudioBufferSourceNode can be useful, but there are more areas that we want to explore (e.g. quality, algorithm, complexity).

@chrisguttandin

The code example provided by @agrathwohl above made me think that it could be enough to add a separate PitchShiftNode (called 'phase-vocoder-processor' above) to the spec. Such a node could also be used independently of an AudioBufferSourceNode.

Let's say a separate PitchShiftNode exists and for the sake of simplicity it also has a playbackRate param. This param is used to shift the pitch back to the original pitch. To achieve the preservePitch effect one could build a graph like this.

┌──────┐ ┌────────────┐    ┌──────┐
│ ABSN │-│playbackRate│ ━━ │  CSN │
└──────┘ └────────────┘    └──────┘
   ┃                          ┃
┌──────┐ ┌────────────┐       ┃
│  PSN │-│playbackRate│ ━━━━━━┛
└──────┘ └────────────┘
   ┃
┌──────┐
│  DST │
└──────┘

The audio signal would be routed from the AudioBufferSourceNode through the PitchShiftNode into the destination. A ConstantSourceNode could be used to control both playbackRate AudioParams at the same time. This would make the back-and-forth messaging implemented above unnecessary.
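A minimal sketch of that wiring, under a few assumptions: there is no PitchShiftNode in the spec, so `shifter` stands in for any AudioWorkletNode that exposes a 'playbackRate' AudioParam, and `connectSharedRate` is a hypothetical helper name. It also assumes both params accept an intrinsic value of 0.

```javascript
// ABSN -> shifter -> destination, with one ConstantSourceNode driving
// both playbackRate params so they can never drift apart.
function connectSharedRate(ctx, source, shifter, rate) {
  const sourceRate = source.playbackRate;
  const shifterRate = shifter.parameters.get('playbackRate');

  // An AudioParam's computed value is its own value plus any connected
  // input, so zero the intrinsic values and let the control signal set them.
  sourceRate.value = 0;
  shifterRate.value = 0;

  const control = ctx.createConstantSource();
  control.offset.value = rate;
  control.connect(sourceRate);
  control.connect(shifterRate);
  control.start();

  source.connect(shifter);
  shifter.connect(ctx.destination);

  // Callers change the speed later via control.offset (an AudioParam),
  // e.g. control.offset.setValueAtTime(1.5, ctx.currentTime).
  return control;
}
```

Because the rate is now a single AudioParam, it can also be automated with the usual scheduling methods, and both nodes follow the same curve.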

While this approach is a little more complex than adding a simple preservePitch property it could be used for other sources which aren't an AudioBuffer, too.

But all of this could already be implemented using an AudioWorkletProcessor without any changes to the spec. The example above could be modified to do exactly that.

@chrisguttandin

After typing all of the above I realized that this is almost the same as the summary of last year's meeting. #2443 (comment)

🤦


hoch commented Sep 14, 2022

We think of two paths:

  1. If a user wants simple and cheap time-stretching, an ABSN.preservePitch switch can support that. The behavior/sonic characteristic of the time-stretching should match that of the UA's HTMLMediaElement counterpart.
  2. If a more sophisticated approach is required, a custom AudioWorkletNode can be used.

That leads to:

partial interface AudioBufferSourceNode {
  attribute boolean preservePitch; // proposed default: false
};
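How application code might combine the two paths can be sketched like this. Everything here is an assumption for illustration: `preservePitch` is only the attribute proposed in this thread and is not known to be implemented anywhere, `applyRate` is a hypothetical helper, and the fallback assumes a pitch-shifting AudioWorkletNode with a 'pitchFactor' param that takes a frequency ratio.

```javascript
// Sets the playback rate and tries to keep the pitch unchanged.
// Returns which strategy ended up being used.
function applyRate(source, rate, fallbackShifter) {
  source.playbackRate.value = rate;

  // Path 1: the proposed native switch, if the UA ever exposes it.
  if ('preservePitch' in source) {
    source.preservePitch = true;
    return 'native';
  }

  // Path 2: a custom AudioWorkletNode that shifts the pitch back.
  if (fallbackShifter) {
    fallbackShifter.parameters.get('pitchFactor').value = 1 / rate;
    return 'worklet';
  }

  // Neither is available: the pitch will follow the rate.
  return 'uncorrected';
}
```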


mdjp commented Jan 12, 2023

Will review on next call after grouping all related issues/requests.


Moebits commented Jun 28, 2023

Please add this; in 99% of cases you don't want to change the pitch. HTML audio elements already have preservesPitch to toggle this behavior, so it would be consistent to support it in the Web Audio API as well.

As a workaround I have been using this AudioWorklet to correct the pitch, but it doesn't really sound that great and you can notice a lot of artifacts. Hopefully a native solution would sound better. https://github.com/olvb/phaze
