This is the research repository for Direction-of-Voice (DoV) Estimation for Intuitive Speech Interaction with Smart Devices Ecosystems (UIST 2020). It contains the featurization pipeline, the analysis code, and the demo code. More details can be found here.
Download the featurized dataset using the script in the data/featurized_data/ folder.
Set up the SRMR framework: https://github.com/jfsantos/SRMRpy
The Python notebooks can be found in the src folder:
- analysis.ipynb: Run different conditions to replicate the various results in the paper. More details are in the notebook.
- sample_prediction.ipynb: Sample prediction on an example multi-channel wav file.
Link to the full raw dataset: https://www.dropbox.com/s/gtw7o0nj0h7j4gy/subjectrecording.zip
The data is organized in the following manner:
- 10 participants (s1 to s10)
- 2 utterances (recording0 and recording1)
- 2 sessions (trial1 and trial2)
- 2 rooms (upstairs and downstairs)
- 2 device placements (wall and nowall)
- 3 user distances (1m, 3m and 5m)
- 3 polar positions (X0, X1 and X2)
- 8 angles (DoV angle: 45-degree increments around the full 360 degrees)
This leads to: 10 × 2 × 2 × 2 × 2 × 3 × 3 × 8 = 11520 recordings
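As a quick sanity check, the condition grid above can be enumerated in a few lines of Python. This is only a sketch: the variable names are made up for illustration, and the angle labels (0, 45, ..., 315) are an assumption about how the eight 45-degree steps are numbered, not something confirmed by the dataset.

```python
from itertools import product

# Factor levels as listed above. The names are illustrative only.
PARTICIPANTS = [f"s{i}" for i in range(1, 11)]
UTTERANCES = ["recording0", "recording1"]
SESSIONS = ["trial1", "trial2"]
ROOMS = ["upstairs", "downstairs"]
PLACEMENTS = ["wall", "nowall"]
DISTANCES = ["1m", "3m", "5m"]
POLAR_POSITIONS = ["X0", "X1", "X2"]
# Assumption: the 8 DoV angles are labeled 0, 45, ..., 315.
DOV_ANGLES = list(range(0, 360, 45))

# Every combination of the eight factors corresponds to one recording.
conditions = list(
    product(
        PARTICIPANTS,
        UTTERANCES,
        SESSIONS,
        ROOMS,
        PLACEMENTS,
        DISTANCES,
        POLAR_POSITIONS,
        DOV_ANGLES,
    )
)

print(len(conditions))  # 11520
```

Comparing this count against the number of recordings found on disk is a cheap way to detect an incomplete download.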
The hardware used is a Seeed ReSpeaker USB Mic Array (wiki here) flashed with the 6-channel, 48 kHz sampling frequency firmware (specified as "48k_6_channels_firmware.bin"). Here, channel 0 is processed audio for ASR, channels 1-4 are the raw data from the 4 microphones, and channel 5 is playback.
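If you capture your own interleaved 6-channel wav with this firmware, the channel layout above can be applied roughly as follows. This is a minimal sketch using only the Python standard library. The function name `read_raw_mic_channels` is hypothetical, and the code assumes 16-bit PCM samples, which may not match your recordings.

```python
import struct
import wave

# Channel layout described above: 0 = processed audio for ASR,
# 1-4 = raw microphones, 5 = playback.
RAW_MIC_CHANNELS = (1, 2, 3, 4)


def read_raw_mic_channels(path):
    """Return (sample_rate, [ch1, ch2, ch3, ch4]) from a 6-channel wav.

    Assumption: the file holds 16-bit PCM samples.
    """
    with wave.open(path, "rb") as wav:
        n_channels = wav.getnchannels()
        if n_channels != 6:
            raise ValueError(f"expected 6 channels, got {n_channels}")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    # Samples are interleaved frame by frame: c0 c1 c2 c3 c4 c5 c0 c1 ...
    n_samples = len(raw) // 2
    samples = struct.unpack(f"<{n_samples}h", raw[: n_samples * 2])

    # Taking every 6th sample starting at offset ch yields channel ch.
    mics = [list(samples[ch::n_channels]) for ch in RAW_MIC_CHANNELS]
    return sample_rate, mics
```

Channels 0 and 5 are dropped here because only the raw microphone signals are described as unprocessed data.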
The files are organized in the following directory structure:
- ParticipantID /
  - ParticipantID_RoomID_DevicePlacementID_SessionID /
    - PolarPositionID_Distance_PolarAngle /
      - UtteranceID_DoVAngle_MicChannel
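The naming scheme above lends itself to a small path parser that turns each file path into labeled metadata. The sketch below is an illustration under assumptions: `RecordingInfo` and `parse_recording_path` are hypothetical names, the fields are assumed to be separated by underscores exactly as in the template, and the example path is invented. Check a few real paths from the dataset before relying on it.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class RecordingInfo(NamedTuple):
    participant: str
    room: str
    device_placement: str
    session: str
    polar_position: str
    distance: str
    polar_angle: str
    utterance: str
    dov_angle: str
    mic_channel: str


def parse_recording_path(relative_path):
    """Split a dataset-relative path into its metadata fields.

    Assumption: each level follows the underscore-separated template
    shown above and no field contains an underscore itself.
    """
    parts = PurePosixPath(relative_path).parts
    if len(parts) != 4:
        raise ValueError(f"expected 4 path components, got {len(parts)}")
    participant_dir, session_dir, position_dir, file_name = parts

    # Tuple unpacking raises ValueError if a level has the wrong field count.
    participant, room, placement, session = session_dir.split("_")
    if participant != participant_dir:
        raise ValueError("participant ID differs between directory levels")
    polar_position, distance, polar_angle = position_dir.split("_")

    # Drop the file extension, if any, before splitting the file name.
    stem = PurePosixPath(file_name).stem
    utterance, dov_angle, mic_channel = stem.split("_")

    return RecordingInfo(
        participant, room, placement, session,
        polar_position, distance, polar_angle,
        utterance, dov_angle, mic_channel,
    )


# Hypothetical example path, made up to match the template.
info = parse_recording_path(
    "s1/s1_upstairs_wall_trial1/X0_1m_0/recording0_45_1.wav"
)
print(info.room, info.distance, info.dov_angle)
```

All fields are kept as strings so that the parser makes no further assumptions about how angles, distances, or channels are encoded.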
Karan Ahuja, Andy Kong, Mayank Goel, and Chris Harrison. 2020. Direction-of-Voice (DoV) Estimation for Intuitive Speech Interaction with Smart Devices Ecosystems. In Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology (UIST '20). Association for Computing Machinery, New York, NY, USA, 1121–1131. DOI:https://doi.org/10.1145/3379337.3415588
BibTeX reference:
@inproceedings{10.1145/3379337.3415588,
author = {Ahuja, Karan and Kong, Andy and Goel, Mayank and Harrison, Chris},
title = {Direction-of-Voice (DoV) Estimation for Intuitive Speech Interaction with Smart Devices Ecosystems},
year = {2020},
isbn = {9781450375146},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3379337.3415588},
doi = {10.1145/3379337.3415588},
booktitle = {Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology},
pages = {1121–1131},
numpages = {11},
keywords = {addressability, speaker orientation, voice interfaces},
location = {Virtual Event, USA},
series = {UIST '20}
}
This project is licensed under GPL v2.0; the license file is present in the repo. Please contact [email protected] if you would like another license for your use.
THE PROGRAM IS DISTRIBUTED IN THE HOPE THAT IT WILL BE USEFUL, BUT WITHOUT ANY WARRANTY. IT IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW THE AUTHOR WILL BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF THE AUTHOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.