Skip to content

Commit

Permalink
WIP: print wake volume
Browse files Browse the repository at this point in the history
This requires modifications to ESP-ADF, so we need to bump it to a newer
version.

This seems to be NULL on the first wake after boot, and sometimes it
prints a different value than the print in ESP-ADF. This is most likely
timing related, because we only set the volume on SR_RESULT_VERIFIED.

Small braindump:

This is going to be a tricky feature. What if 2 people in the same house
wake a different Willow at the same time? Maybe we should not read this
on wake, but rather after WIS returned the data. We can then send the
wake volume and the transcribed text to WAS, and if the text is
completely different, we don't cancel anything. This would also make
sure we have the correct wake volume.
  • Loading branch information
stintel committed Sep 8, 2023
1 parent 985e8ec commit 2f6c047
Show file tree
Hide file tree
Showing 3 changed files with 11 additions and 2 deletions.
2 changes: 1 addition & 1 deletion Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@ COPY container.gitconfig /root/.gitconfig
ENV PATH="$PATH:/willow/.local/bin"
WORKDIR /willow

ENV ADF_VER="2939a3e541f46f6dd916d2e3b7462609ec4fae5c"
ENV ADF_VER="1ed0196fe4623f5760e512eb851c038cf0785b06"
RUN \
cd /opt/esp/idf && \
curl https://raw.githubusercontent.com/toverainc/esp-adf/$ADF_VER/idf_patches/idf_v4.4_freertos.patch | patch -p1
7 changes: 7 additions & 0 deletions main/audio.c
Original file line number Diff line number Diff line change
Expand Up @@ -285,6 +285,13 @@ static esp_err_t cb_ar_event(audio_rec_evt_t are, void *data)
if (recording) {
break;
}
float *wake_volume_ptr = (float *)data;
if (wake_volume_ptr == NULL) {
ESP_LOGI(TAG, "wake_volume_ptr is NULL");
} else {
float wake_volume = *wake_volume_ptr;
ESP_LOGI(TAG, "wake volume: %f", wake_volume);
}
reset_timer(hdl_display_timer, DISPLAY_TIMEOUT_US, true);

speech_rec_mode = config_get_char("speech_rec_mode", DEFAULT_SPEECH_REC_MODE);
Expand Down
4 changes: 3 additions & 1 deletion main/log.c
Original file line number Diff line number Diff line change
Expand Up @@ -12,9 +12,11 @@ void init_logging(void)
esp_log_level_set("*", ESP_LOG_DEBUG);
#else
esp_log_level_set("*", ESP_LOG_ERROR);
esp_log_level_set("AUDIO_RECORDER", ESP_LOG_INFO);
esp_log_level_set("PERIPH_WIFI", ESP_LOG_WARN);
#endif


esp_log_level_set("WILLOW/AUDIO", WILLOW_LOG_LEVEL);
esp_log_level_set("WILLOW/CONFIG", WILLOW_LOG_LEVEL);
esp_log_level_set("WILLOW/DISPLAY", WILLOW_LOG_LEVEL);
Expand All @@ -32,4 +34,4 @@ void init_logging(void)
esp_log_level_set("WILLOW/TIMER", WILLOW_LOG_LEVEL);
esp_log_level_set("WILLOW/UI", WILLOW_LOG_LEVEL);
esp_log_level_set("WILLOW/WAS", WILLOW_LOG_LEVEL);
}
}

4 comments on commit 2f6c047

@nikito
Copy link
Contributor

@nikito nikito commented on 2f6c047 Sep 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on my experience I sometimes trigger two units at once, but the other is far enough away that it doesn't hear quite as well so the transcribed text is different. It may require that we not only look at the difference in volume levels, but may need to have a threshold that can determine if it is hearing from so far away that it's not likely from a nearby person? Not sure if there's another vector to determine likelihood that it's an actual person it heard versus accidental 🫤

@stintel
Copy link
Collaborator Author

@stintel stintel commented on 2f6c047 Sep 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kristiankielhofner
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few things:

  1. When you have multiple devices in earshot of one another turning the gain down helps a lot.

  2. We have a new linear gain AFE param we can play with.

  3. We can define a "minimum wake level" or similar in WAS as a threshold for wake consideration.

  4. We're introducing the concept of groups/zones in WAS. Most homes will only have one group, this is likely the level "wake winner" will be at initially. Zones will be for association of Willow device with a given room/zone in HA so you can do things like "turn off the lights". The nearest Willow wins wake, WAS knows which zone that device is in, and provides it to HA (possibly as crudely as appending ' in kitchen' or similar to the command).

Groups will also eventually support different groups of settings so you can do things like host WAS + WIS for friends and family, multiple homes, etc but that's for another day :).

  1. Checking text for a unique user won't work. When I have two-three devices wake simultaneously at least one of them often hallucinates like crazy, giving completely random output that would absolutely look like another independent session if we were to use ASR output.

Anyway, I'm not terribly concerned about multiple users waking simultaneously. The chances of a wake session (1-2 seconds) from two distinct speakers at any given time colliding are very, very low and this is an edge case I'm certainly willing to live with initially. We can develop more advanced approaches between Willow, WAS, and WIS in the future but overall I think this is a very good start.

@nikito
Copy link
Contributor

@nikito nikito commented on 2f6c047 Sep 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was thinking the same, the likelihood of two people triggering units at the exact same instant (within a second for instance) in normal use cases is pretty slim. Probably not worth worrying about such a case in the initial design at least, probably not at all unless people report it often enough to warrant it lol. 🙂

Please sign in to comment.