[Story] Speech Recognition #1012

Open
2 of 10 tasks
ar13pit opened this issue Mar 7, 2020 · 12 comments

@ar13pit
Contributor

ar13pit commented Mar 7, 2020

Stage 1

Stage 2 (Enhancements to Yapykaldi)

  • Add voice activity detection (VAD) to know when to stop (or start) recognition
  • Add garbage phonemes to the closed grammar to make recognition more robust
  • Add wake word recognition to start recognition only on certain words (i.e. the first word of the grammar targets)
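
For the voice activity detection item, one simple way to decide when to stop listening is an energy threshold over audio frames. The sketch below is only an illustration of that idea and is not taken from yapykaldi; the function names, the threshold of 500 and the limit of 3 silent frames are all assumptions picked for the example.

```python
import math


def frame_rms(samples):
    """Root-mean-square energy of one frame of PCM sample values."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / float(len(samples)))


def find_speech_end(frames, threshold=500.0, max_silent_frames=3):
    """Return the index of the frame at which to stop listening, or None.

    Hypothetical helper: once at least one frame has reached the threshold
    (speech seen), stop after max_silent_frames consecutive quieter frames.
    """
    speech_seen = False
    silent_run = 0
    for index, frame in enumerate(frames):
        if frame_rms(frame) >= threshold:
            speech_seen = True
            silent_run = 0
        elif speech_seen:
            silent_run += 1
            if silent_run >= max_silent_frames:
                return index
    return None
```

Leading silence is ignored, so the same loop also covers the "or start" case: nothing counts until the first frame above the threshold.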

Low priority but important

  • Create an Ubuntu PPA and release the kaldi shared libs, include files, and yapykaldi. This will drastically simplify the installation process.
  • Move speech_recognition to Docker when the yapykaldi PPA is ready.
@LoyVanBeek
Member

I'll take up the

Create a ros HMI client wrapper for yapykaldi (yapykaldi_ros)

@ar13pit
Contributor Author

ar13pit commented Mar 31, 2020

I'll take up the

Create a ros HMI client wrapper for yapykaldi (yapykaldi_ros)

Great. One feature is needed there, though: caching of the grammar string. If the same grammar was sent in the previous HMI query, the speech model should be reused, as compiling it on every query would be very expensive.
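
A minimal sketch of that caching, assuming the wrapper has some function that turns a grammar string into a compiled model. `GrammarModelCache` and `compile_model` are hypothetical names for illustration, not part of yapykaldi or the HMI code.

```python
class GrammarModelCache(object):
    """Recompile the speech model only when the grammar string changes."""

    def __init__(self, compile_model):
        # compile_model: hypothetical callable, grammar string -> compiled model
        self._compile_model = compile_model
        # (grammar string, compiled model) of the previous query, if any
        self._cached = None

    def get(self, grammar):
        # Reuse the previous model when the grammar is unchanged.
        if self._cached is None or self._cached[0] != grammar:
            self._cached = (grammar, self._compile_model(grammar))
        return self._cached[1]


if __name__ == "__main__":
    compiled = []

    def slow_compile(grammar):
        # Stand-in for the expensive model compilation.
        compiled.append(grammar)
        return "model for %s" % grammar

    cache = GrammarModelCache(slow_compile)
    cache.get("T -> yes | no")
    cache.get("T -> yes | no")      # same grammar as the previous query: no recompile
    cache.get("T -> left | right")  # grammar changed: recompile
    print(len(compiled))            # prints 2
```

This only remembers the previous grammar, which matches the case described above; a dictionary keyed on the grammar string would be the obvious extension if several grammars alternate.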

@LoyVanBeek
Member

All right. I've spent most of my time setting up the dependencies, and I think I need to be on 18.04, so I'll continue next week.

@LoyVanBeek LoyVanBeek self-assigned this Mar 31, 2020
@LarsJanssenTUe
Contributor

All right. I've spent most of my time setting up the dependencies, and I think I need to be on 18.04, so I'll continue next week.

Needing to be on 18.04 would be undesirable unless strictly necessary. Why is this needed, @ar13pit?

@ar13pit
Contributor Author

ar13pit commented Mar 31, 2020

No, it's not needed. I wrote that in the README because I explicitly tested it on 18.04, but it will work on 16.04 as well, as long as CUDA is not being used and gcc7 is used.

@LoyVanBeek
Member

Hmm, kaldi complained about needing CMake 3.12 or something, while my 16.04 box has 3.5, and I could not get a higher version installed yet.

@ar13pit
Contributor Author

ar13pit commented Apr 1, 2020

Did you try running the command tue-get install python-yapykaldi? It should install all the dependencies and build kaldi and yapykaldi.

However, if you don't want to do that, go with pip2 install --user cmake. That way you won't have to spend time building CMake from source.
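
To check whether the cmake found on the PATH is new enough after the pip install, something like the sketch below could be used. The minimum of 3.12 is an assumption taken from the "3.12 or something" mentioned above, not a confirmed kaldi requirement, and the helper names are made up for this example.

```python
import re
import subprocess

# Assumption: minimum version as loosely quoted in this thread.
REQUIRED = (3, 12)


def parse_cmake_version(text):
    """Extract (major, minor, patch) from the output of `cmake --version`."""
    match = re.search(r"cmake version (\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("could not find a CMake version in: %r" % text)
    # A missing patch component is treated as 0.
    return tuple(int(part) for part in match.groups("0"))


def installed_cmake_version(executable="cmake"):
    """Run `cmake --version` and return the parsed version tuple."""
    output = subprocess.check_output([executable, "--version"])
    return parse_cmake_version(output.decode("utf-8", "replace"))


def is_new_enough(version, required=REQUIRED):
    """Compare only the major and minor components against the requirement."""
    return tuple(version[:2]) >= tuple(required)
```

Calling is_new_enough(installed_cmake_version()) then tells you whether the pip-installed cmake is the one being picked up; with --user installs that depends on ~/.local/bin being on the PATH.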

@LoyVanBeek
Member

LoyVanBeek commented Apr 18, 2020

@ar13pit is https://github.com/gooofy/zamia-speech/#model-adaptation what you meant by "Zamia speech JSGF"?

There is also https://pypi.org/project/pyjsgf/

@ar13pit
Contributor Author

ar13pit commented Apr 18, 2020

Yup and that section is a huge pile of shit.

@LoyVanBeek
Member

Decide where parsing of the output of yapykaldi into semantics must be done: in yapykaldi or in the ROS wrapper.
I currently have this in the ROS wrapper and I think that makes the most sense too.

Same for the conversion to JSGF format: I'd say that happens in the ROS wrapper, and the JSGF grammar is passed to Asr.recognize or Asr.start.
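
A rough sketch of that split, to make the layering concrete. This is not the actual yapykaldi_ros code: FakeAsr stands in for the recognizer, its recognize signature is assumed, and the rule dictionary and word-to-slot table are hypothetical formats invented for the example.

```python
def to_jsgf(name, rules, public_rule):
    """Render {rule_name: [alternatives]} as a JSGF grammar string."""
    lines = ["#JSGF V1.0;", "grammar %s;" % name]
    for rule_name in sorted(rules):
        prefix = "public " if rule_name == public_rule else ""
        expansion = " | ".join(rules[rule_name])
        lines.append("%s<%s> = %s;" % (prefix, rule_name, expansion))
    return "\n".join(lines)


class FakeAsr(object):
    """Stand-in recognizer: takes JSGF text, returns a fixed sentence."""

    def __init__(self, canned_text):
        self.canned_text = canned_text
        self.last_grammar = None

    def recognize(self, jsgf_grammar):
        # Assumed signature; the real Asr.recognize may differ.
        self.last_grammar = jsgf_grammar
        return self.canned_text


class RosWrapperSketch(object):
    """Owns both the grammar conversion and the parsing into semantics."""

    def __init__(self, asr, semantics):
        self._asr = asr
        self._semantics = semantics  # hypothetical: word -> (slot, value)

    def query(self, name, rules, public_rule):
        jsgf = to_jsgf(name, rules, public_rule)  # conversion lives in the wrapper
        text = self._asr.recognize(jsgf)          # recognizer only sees JSGF
        return text, self.parse(text)

    def parse(self, text):
        # Map recognized words onto semantic slots; unknown words are skipped.
        result = {}
        for word in text.split():
            if word in self._semantics:
                slot, value = self._semantics[word]
                result[slot] = value
        return result
```

With this layout the recognizer never needs to know about HMI grammars or semantics, which is the main argument for keeping both steps in the wrapper.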

@ar13pit
Contributor Author

ar13pit commented Apr 21, 2020

I'm actually getting rid of JSGF completely, as that is also an intermediate format. Instead I'll keep the NLTK grammar object, as that is pretty close to our grammar parser. So this object is passed on to Asr.recognize and a new ASR model is created.


3 participants