[rustpotterks] Upgrade to version 2 (openhab#14615)
* [rustpotter] Use version 2

Signed-off-by: Miguel Álvarez <[email protected]>
GiviMAD authored and FordPrfkt committed Apr 19, 2023
1 parent f98083b commit 20e3df1
Showing 6 changed files with 213 additions and 183 deletions.
58 changes: 34 additions & 24 deletions bundles/org.openhab.voice.rustpotterks/README.md
@@ -5,24 +5,31 @@ This voice service allows you to use the open source library Rustpotter as your

Rustpotter provides personal on-device wake word detection. You need to generate a model for your keyword using audio samples.

You can test the library in your browser using these web pages:

- [The spot demo](https://givimad.github.io/rustpotter-worklet-demo/), which includes some example wakewords (but it's recommended to use your own).
- [The model creation demo](https://givimad.github.io/rustpotter-create-model-demo/), which allows you to record compatible wav files and generate a wakeword file that you can test on the previous page.

Important: No voice data captured by this service is uploaded to the Cloud.
The voice data is processed offline, locally on your openHAB server by Rustpotter.

## Configuration

After installing, you will be able to access the service options through the openHAB configuration page in UI (**Settings / Other Services - Rustpotter Keyword Spotter**) to edit them:

* **Threshold** - Configures the detector threshold: the minimum score (in the range 0 to 1) that a wake word template must obtain to trigger a detection. Defaults to 0.5.
* **Averaged Threshold** - Configures the detector averaged threshold: the minimum score (in the range 0 to 1) that the audio must obtain against a combination of the wake word templates; otherwise the detection is aborted. This avoids comparing the current frame against each of the wake word templates, which saves CPU. If set to 0, this functionality is disabled.
* **Eager mode** - Enables eager mode. End detection as soon as a result is over the score, instead of waiting to see if the next frame has a higher score.
* **Noise Detection Mode** - Use built-in noise detection to reduce computation in the absence of noise. Configures the difficulty to consider a frame as noise (the required noise level).
* **Noise Detection Sensitivity** - Noise/silence ratio in the last second to consider noise is detected. Defaults to 0.5.
* **VAD Mode** - Use a voice activity detector to reduce computation in the absence of vocal sound.
* **VAD Sensitivity** - Voice/silence ratio in the last second to consider voice is detected.
* **VAD Delay** - Seconds to disable the vad detector after voice is detected. Defaults to 3.
* **Comparator Ref** - Configures the reference for the comparator used to match the samples.
* **Comparator Band Size** - Configures the band-size for the comparator used to match the samples.

- **Threshold** - Configures the detector threshold: the minimum score (in the range 0 to 1) that a wake word template must obtain to trigger a detection. Defaults to 0.5.
- **Averaged Threshold** - Configures the detector averaged threshold: the minimum score (in the range 0 to 1) that the audio must obtain against a combination of the wake word templates; otherwise the detection is aborted. This avoids comparing the current frame against each of the wake word templates, which saves CPU. If set to 0, this functionality is disabled.
- **Score Mode** - Indicates how to calculate the final score.
- **Min Scores** - Minimum number of positive scores to consider a partial detection as a detection.
- **Comparator Ref** - Configures the reference for the comparator used to match the samples.
- **Comparator Band Size** - Configures the band-size for the comparator used to match the samples.
- **Gain Normalizer** - Enables an audio filter that attempts to bring the volume of the stream close to a reference level.
- **Min Gain** - Min gain applied by the gain normalizer filter.
- **Max Gain** - Max gain applied by the gain normalizer filter.
- **Gain Ref** - The RMS reference used by the gain normalizer to calculate the gain applied. If unset, an estimate of the wakeword level is used.
- **Band Pass** - Enables an audio filter that attenuates frequencies outside the low cutoff and high cutoff range.
- **Low Cutoff** - Low cutoff for the band-pass filter.
- **High Cutoff** - High cutoff for the band-pass filter.
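
As a rough illustration of how **Min Gain**, **Max Gain** and **Gain Ref** could interact, the following Java sketch computes a per-frame gain from the frame RMS and clamps it to the configured range. It is only a sketch under stated assumptions: the class and method names are hypothetical, and the formula (`gainRef / rms`, then clamped) is a guess at the general technique, not Rustpotter's actual implementation.

```java
// Hypothetical sketch of an RMS-based gain normalizer; not the library's code.
class GainNormalizerSketch {
    /** Root mean square of a frame of samples in the range [-1, 1]. */
    static double rms(float[] samples) {
        if (samples.length == 0) {
            return 0;
        }
        double sum = 0;
        for (float s : samples) {
            sum += s * s;
        }
        return Math.sqrt(sum / samples.length);
    }

    /** Gain that moves the frame RMS toward gainRef, clamped to [minGain, maxGain]. */
    static double computeGain(double frameRms, double gainRef, double minGain, double maxGain) {
        if (frameRms == 0) {
            return 1.0; // assumption: silent frames are left untouched
        }
        double gain = gainRef / frameRms;
        return Math.max(minGain, Math.min(maxGain, gain));
    }

    public static void main(String[] args) {
        float[] frame = { 0.5f, -0.5f, 0.5f, -0.5f }; // RMS is 0.5
        double gain = computeGain(rms(frame), 0.1, 0.5, 1.0);
        System.out.println("gain = " + gain); // prints "gain = 0.5"
    }
}
```

Under this sketch, a minimum gain of 0.5 and a maximum gain of 1 mean that loud frames are attenuated by at most half and quiet frames are never amplified.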

In case you would like to set up the service via a text file, create a new file in `$OPENHAB_ROOT/conf/services` named `rustpotterks.cfg`.

@@ -31,21 +38,24 @@ Its contents should look similar to:
```
org.openhab.voice.rustpotterks:threshold=0.5
org.openhab.voice.rustpotterks:averagedthreshold=0.2
org.openhab.voice.rustpotterks:scoreMode=max
org.openhab.voice.rustpotterks:minScores=5
org.openhab.voice.rustpotterks:comparatorRef=0.22
org.openhab.voice.rustpotterks:comparatorBandSize=6
org.openhab.voice.rustpotterks:eagerMode=true
org.openhab.voice.rustpotterks:noiseDetectionMode=hard
org.openhab.voice.rustpotterks:noiseDetectionSensitivity=0.5
org.openhab.voice.rustpotterks:vadMode=aggressive
org.openhab.voice.rustpotterks:vadSensitivity=0.5
org.openhab.voice.rustpotterks:vadDelay=3
org.openhab.voice.rustpotterks:comparatorBandSize=5
org.openhab.voice.rustpotterks:gainNormalizer=true
org.openhab.voice.rustpotterks:minGain=0.5
org.openhab.voice.rustpotterks:maxGain=1
org.openhab.voice.rustpotterks:gainRef=
org.openhab.voice.rustpotterks:bandPass=true
org.openhab.voice.rustpotterks:lowCutoff=80
org.openhab.voice.rustpotterks:highCutoff=400
```

## Magic Word Configuration

The magic word to spot is gathered from your 'Voice' configuration.

You can generate your own wake word model by using the [Rustpotter CLI](https://github.com/GiviMAD/rustpotter-cli).
You can generate your own wakeword files using the [Rustpotter CLI](https://github.com/GiviMAD/rustpotter-cli).

You can also download the models used as examples on the [rustpotter web demo](https://givimad.github.io/rustpotter-worklet-demo/) from [this folder](https://github.com/GiviMAD/rustpotter-worklet-demo/tree/main/static).

@@ -59,11 +69,11 @@ The service will only work if it's able to find the correct rpw for your magic w

You can set up your preferred default keyword spotter and default magic word in the UI:

* Go to **Settings**.
* Edit **System Services - Voice**.
* Set **Rustpotter Keyword Spotter** as **Default Keyword Spotter**.
* Choose your preferred **Magic Word** for your setup.
* Optionally choose your **Listening Switch** item, which will be switched ON while the dialog processor has spotted the keyword and is listening for commands.
- Go to **Settings**.
- Edit **System Services - Voice**.
- Set **Rustpotter Keyword Spotter** as **Default Keyword Spotter**.
- Choose your preferred **Magic Word** for your setup.
- Optionally choose your **Listening Switch** item, which will be switched ON while the dialog processor has spotted the keyword and is listening for commands.

In case you would like to configure these settings via a text file, you can edit the file `runtime.cfg` in `$OPENHAB_ROOT/conf/services` and set the following entries:

2 changes: 1 addition & 1 deletion bundles/org.openhab.voice.rustpotterks/pom.xml
@@ -18,7 +18,7 @@
<dependency>
<groupId>io.github.givimad</groupId>
<artifactId>rustpotter-java</artifactId>
<version>1.0.0</version>
<version>2.0.0</version>
</dependency>
</dependencies>
</project>
@@ -13,6 +13,7 @@
package org.openhab.voice.rustpotterks.internal;

import org.eclipse.jdt.annotation.NonNullByDefault;
import org.eclipse.jdt.annotation.Nullable;

/**
* The {@link RustpotterKSConfiguration} class contains fields mapping thing configuration parameters.
@@ -36,37 +37,49 @@ public class RustpotterKSConfiguration {
*/
public float averagedThreshold = 0.2f;
/**
* Terminate the detection as soon as one result is above the score,
* instead of waiting to see if the next frame has a higher score.
* Indicates how to calculate the final score.
*/
public boolean eagerMode = true;
public String scoreMode = "max";
/**
* Use built-in noise detection to reduce computation in the absence of noise.
* Configures the difficulty to consider a frame as noise (the required noise level).
* Minimum number of positive scores to consider a partial detection as a detection.
*/
public String noiseDetectionMode = "disabled";
public int minScores = 5;
/**
* Noise/silence ratio in the last second to consider noise is detected. Defaults to 0.5.
* Configures the reference for the comparator used to match the samples.
*/
public float noiseSensitivity = 0.5f;
public float comparatorRef = 0.22f;
/**
* Seconds to disable the vad detector after voice is detected. Defaults to 3.
* Configures the band-size for the comparator used to match the samples.
*/
public int vadDelay = 3;
public int comparatorBandSize = 5;
/**
* Voice/silence ratio in the last second to consider voice is detected.
* Enables an audio filter that attempts to bring the volume of the stream close to a reference level (the RMS of the
* samples is used as the volume measure).
*/
public float vadSensitivity = 0.5f;
public boolean gainNormalizer = false;
/**
* Use a voice activity detector to reduce computation in the absence of vocal sound.
* Min gain applied by the gain normalizer filter.
*/
public String vadMode = "disabled";
public float minGain = 0.5f;
/**
* Configures the reference for the comparator used to match the samples.
* Max gain applied by the gain normalizer filter.
*/
public float comparatorRef = 0.22f;
public float maxGain = 1f;
/**
* Configures the band-size for the comparator used to match the samples.
* Set the RMS reference used by the gain normalizer to calculate the gain applied. If unset, an estimate of the
* wakeword level is used.
*/
public @Nullable Float gainRef = null;
/**
* Enables an audio filter that attenuates frequencies outside the low cutoff and high cutoff range.
*/
public boolean bandPass = false;
/**
* Low cutoff for the band-pass filter.
*/
public float lowCutoff = 80f;
/**
* High cutoff for the band-pass filter.
*/
public int comparatorBandSize = 6;
public float highCutoff = 400f;
}
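
The `scoreMode` values can be read as different ways of collapsing several partial scores into one final score. The Java sketch below illustrates that idea only: `ScoreModeSketch`, `finalScore` and the nearest-rank percentile rule are assumptions made for this example, and the library may compute these values differently. Unknown mode names fall back to the maximum, mirroring the default branch of `getScoreMode` in this commit.

```java
import java.util.Arrays;

// Hypothetical sketch of score aggregation; not the library's implementation.
class ScoreModeSketch {
    static double finalScore(String mode, double[] scores) {
        if (scores.length == 0) {
            throw new IllegalArgumentException("at least one score is required");
        }
        double[] sorted = scores.clone();
        Arrays.sort(sorted);
        switch (mode) {
            case "average":
                return Arrays.stream(sorted).average().getAsDouble();
            case "median":
                return median(sorted);
            case "p25":
            case "p50":
            case "p75":
            case "p80":
            case "p90":
            case "p95":
                return percentile(sorted, Integer.parseInt(mode.substring(1)));
            case "max":
            default:
                return sorted[sorted.length - 1];
        }
    }

    /** Middle value, or the mean of the two middle values for an even count. */
    static double median(double[] sorted) {
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /** Nearest-rank percentile (an assumption made for this sketch). */
    static double percentile(double[] sorted, int p) {
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        double[] scores = { 0.2, 0.8, 0.5, 0.6 };
        System.out.println("max: " + finalScore("max", scores));
        System.out.println("p75: " + finalScore("p75", scores));
    }
}
```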
@@ -17,6 +17,7 @@
import java.io.File;
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
@@ -38,18 +39,17 @@
import org.openhab.core.voice.KSServiceHandle;
import org.openhab.core.voice.KSpottedEvent;
import org.osgi.framework.Constants;
import org.osgi.service.component.ComponentContext;
import org.osgi.service.component.annotations.Activate;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Modified;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import io.github.givimad.rustpotter_java.Endianness;
import io.github.givimad.rustpotter_java.NoiseDetectionMode;
import io.github.givimad.rustpotter_java.RustpotterJava;
import io.github.givimad.rustpotter_java.RustpotterJavaBuilder;
import io.github.givimad.rustpotter_java.VadMode;
import io.github.givimad.rustpotter_java.Rustpotter;
import io.github.givimad.rustpotter_java.RustpotterBuilder;
import io.github.givimad.rustpotter_java.SampleFormat;
import io.github.givimad.rustpotter_java.ScoreMode;

/**
* The {@link RustpotterKSService} is a keyword spotting implementation based on rustpotter.
@@ -76,7 +76,7 @@ public class RustpotterKSService implements KSService {
}

@Activate
protected void activate(ComponentContext componentContext, Map<String, Object> config) {
protected void activate(Map<String, Object> config) {
modified(config);
}

@@ -111,7 +111,7 @@ public KSServiceHandle spot(KSListener ksListener, AudioStream audioStream, Loca
throws KSException {
logger.debug("Loading library");
try {
RustpotterJava.loadLibrary();
Rustpotter.loadLibrary();
} catch (IOException e) {
throw new KSException("Unable to load rustpotter lib: " + e.getMessage());
}
@@ -126,8 +126,13 @@ public KSServiceHandle spot(KSListener ksListener, AudioStream audioStream, Loca
}
var endianness = isBigEndian ? Endianness.BIG : Endianness.LITTLE;
logger.debug("Audio wav spec: frequency '{}', bit depth '{}', channels '{}', '{}'", frequency, bitDepth,
channels, audioFormat.isBigEndian() ? "big-endian" : "little-endian");
RustpotterJava rustpotter = initRustpotter(frequency, bitDepth, channels, endianness);
channels, isBigEndian ? "big-endian" : "little-endian");
Rustpotter rustpotter;
try {
rustpotter = initRustpotter(frequency, bitDepth, channels, endianness);
} catch (Exception e) {
throw new KSException("Unable to configure rustpotter: " + e.getMessage(), e);
}
var modelName = keyword.replaceAll("\\s", "_") + ".rpw";
var modelPath = Path.of(RUSTPOTTER_FOLDER, modelName);
if (!modelPath.toFile().exists()) {
@@ -141,48 +146,43 @@ public KSServiceHandle spot(KSListener ksListener, AudioStream audioStream, Loca
logger.debug("Model '{}' loaded", modelPath);
AtomicBoolean aborted = new AtomicBoolean(false);
executor.submit(() -> processAudioStream(rustpotter, ksListener, audioStream, aborted));
return new KSServiceHandle() {
@Override
public void abort() {
logger.debug("Stopping service");
aborted.set(true);
}
return () -> {
logger.debug("Stopping service");
aborted.set(true);
};
}

private RustpotterJava initRustpotter(long frequency, int bitDepth, int channels, Endianness endianness) {
var rustpotterBuilder = new RustpotterJavaBuilder();
private Rustpotter initRustpotter(long frequency, int bitDepth, int channels, Endianness endianness)
throws Exception {
var rustpotterBuilder = new RustpotterBuilder();
// audio configs
rustpotterBuilder.setBitsPerSample(bitDepth);
rustpotterBuilder.setSampleRate(frequency);
rustpotterBuilder.setChannels(channels);
rustpotterBuilder.setSampleFormat(SampleFormat.INT);
rustpotterBuilder.setEndianness(endianness);
// detector configs
rustpotterBuilder.setThreshold(config.threshold);
rustpotterBuilder.setAveragedThreshold(config.averagedThreshold);
rustpotterBuilder.setScoreMode(getScoreMode(config.scoreMode));
rustpotterBuilder.setMinScores(config.minScores);
rustpotterBuilder.setComparatorRef(config.comparatorRef);
rustpotterBuilder.setComparatorBandSize(config.comparatorBandSize);
@Nullable
VadMode vadMode = getVADMode(config.vadMode);
if (vadMode != null) {
rustpotterBuilder.setVADMode(vadMode);
rustpotterBuilder.setVADSensitivity(config.vadSensitivity);
rustpotterBuilder.setVADDelay(config.vadDelay);
}
@Nullable
NoiseDetectionMode noiseDetectionMode = getNoiseMode(config.noiseDetectionMode);
if (noiseDetectionMode != null) {
rustpotterBuilder.setNoiseMode(noiseDetectionMode);
rustpotterBuilder.setNoiseSensitivity(config.noiseSensitivity);
}
rustpotterBuilder.setEagerMode(config.eagerMode);
// filter configs
rustpotterBuilder.setGainNormalizerEnabled(config.gainNormalizer);
rustpotterBuilder.setMinGain(config.minGain);
rustpotterBuilder.setMaxGain(config.maxGain);
rustpotterBuilder.setGainRef(config.gainRef);
rustpotterBuilder.setBandPassFilterEnabled(config.bandPass);
rustpotterBuilder.setBandPassLowCutoff(config.lowCutoff);
rustpotterBuilder.setBandPassHighCutoff(config.highCutoff);
// init the detector
var rustpotter = rustpotterBuilder.build();
rustpotterBuilder.delete();
return rustpotter;
}

private void processAudioStream(RustpotterJava rustpotter, KSListener ksListener, AudioStream audioStream,
private void processAudioStream(Rustpotter rustpotter, KSListener ksListener, AudioStream audioStream,
AtomicBoolean aborted) {
int numBytesRead;
var bufferSize = (int) rustpotter.getBytesPerFrame();
@@ -200,10 +200,20 @@ private void processAudioStream(RustpotterJava rustpotter, KSListener ksListener
continue;
}
remaining = bufferSize;
var result = rustpotter.processBuffer(audioBuffer);
var result = rustpotter.processBytes(audioBuffer);
if (result.isPresent()) {
var detection = result.get();
logger.debug("keyword '{}' detected with score {}!", detection.getName(), detection.getScore());
if (logger.isDebugEnabled()) {
ArrayList<String> scores = new ArrayList<>();
var scoreNames = detection.getScoreNames().split("\\|\\|");
var scoreValues = detection.getScores();
for (var i = 0; i < Integer.min(scoreNames.length, scoreValues.length); i++) {
scores.add("'" + scoreNames[i] + "': " + scoreValues[i]);
}
logger.debug("Detected '{}' with: Score: {}, AvgScore: {}, Count: {}, Gain: {}, Scores: {}",
detection.getName(), detection.getScore(), detection.getAvgScore(),
detection.getCounter(), detection.getGain(), String.join(", ", scores));
}
detection.delete();
ksListener.ksEventReceived(new KSpottedEvent());
}
@@ -216,35 +226,27 @@ private void processAudioStream(RustpotterJava rustpotter, KSListener ksListener
logger.debug("rustpotter stopped");
}

private @Nullable VadMode getVADMode(String mode) {
switch (mode) {
case "low-bitrate":
return VadMode.LOW_BITRATE;
case "quality":
return VadMode.QUALITY;
case "aggressive":
return VadMode.AGGRESSIVE;
case "very-aggressive":
return VadMode.VERY_AGGRESSIVE;
default:
return null;
}
}

private @Nullable NoiseDetectionMode getNoiseMode(String mode) {
private ScoreMode getScoreMode(String mode) {
switch (mode) {
case "easiest":
return NoiseDetectionMode.EASIEST;
case "easy":
return NoiseDetectionMode.EASY;
case "normal":
return NoiseDetectionMode.NORMAL;
case "hard":
return NoiseDetectionMode.HARD;
case "hardest":
return NoiseDetectionMode.HARDEST;
case "average":
return ScoreMode.AVG;
case "median":
return ScoreMode.MEDIAN;
case "p25":
return ScoreMode.P25;
case "p50":
return ScoreMode.P50;
case "p75":
return ScoreMode.P75;
case "p80":
return ScoreMode.P80;
case "p90":
return ScoreMode.P90;
case "p95":
return ScoreMode.P95;
case "max":
default:
return null;
return ScoreMode.MAX;
}
}
}
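
Two small string transformations in the service can be tried in isolation: deriving the `.rpw` file name from the magic word, and pairing the `||`-separated score names with their values for the debug log. The standalone Java sketch below reuses the expressions from the diff above. The class name, method names and the `float[]` type for the score values are assumptions made for the example, not part of the binding or the library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone helpers mirroring expressions used in the service.
class DetectionLogSketch {
    /** Replaces every whitespace character with '_' and appends the ".rpw" extension. */
    static String modelFileName(String keyword) {
        return keyword.replaceAll("\\s", "_") + ".rpw";
    }

    /** Pairs "||"-separated score names with their values, ignoring any unmatched tail. */
    static String formatScores(String scoreNames, float[] scoreValues) {
        List<String> scores = new ArrayList<>();
        String[] names = scoreNames.split("\\|\\|");
        for (int i = 0; i < Math.min(names.length, scoreValues.length); i++) {
            scores.add("'" + names[i] + "': " + scoreValues[i]);
        }
        return String.join(", ", scores);
    }

    public static void main(String[] args) {
        System.out.println(modelFileName("ok casa")); // prints "ok_casa.rpw"
        System.out.println(formatScores("a.wav||b.wav", new float[] { 0.5f, 0.75f }));
        // prints "'a.wav': 0.5, 'b.wav': 0.75"
    }
}
```

Because each whitespace character is replaced individually, a magic word containing two consecutive spaces maps to a file name with two underscores.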