> **Note**: For llamaspeak version 2 with multimodal support, see the `local_llm` container.
- Talk live with LLMs using NVIDIA Riva ASR and TTS!
- Requires `riva-server` and `text-generation-webui` to be running
First, follow the steps from the `riva-client:python` package to run and test the Riva server:
- Start the Riva server on your Jetson by following `riva_quickstart_arm64`
- Run some of the Riva ASR examples to confirm that ASR is working: https://github.com/nvidia-riva/python-clients#asr
- Run some of the Riva TTS examples to confirm that TTS is working: https://github.com/nvidia-riva/python-clients#tts (a minimal connectivity check is also sketched after this list)
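As a quick connectivity check, the sketch below synthesizes a short phrase against a local Riva server using the `nvidia-riva-client` Python package. The server address and voice name here are assumptions matching the Riva quickstart defaults, so adjust them to your setup:

```python
# quick Riva TTS sanity check -- assumes the server is on localhost:50051
# and the default English voice from the quickstart is installed
import riva.client

auth = riva.client.Auth(uri="localhost:50051")
tts = riva.client.SpeechSynthesisService(auth)

resp = tts.synthesize(
    "Riva text-to-speech is working.",
    voice_name="English-US.Female-1",   # assumed quickstart voice
    language_code="en-US",
    sample_rate_hz=44100,
)
print(f"received {len(resp.audio)} bytes of PCM audio")
```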
You can also see this helpful video and guide from JetsonHacks for setting up Riva: Speech AI on Jetson Tutorial
Next, start `text-generation-webui` (version 1.7) with the `--api` flag and load your chat model of choice through its web UI on port 7860:
```bash
./run.sh --workdir /opt/text-generation-webui $(./autotag text-generation-webui:1.7) \
  python3 server.py --listen --verbose --api \
    --model-dir=/data/models/text-generation-webui
```
> **note**: launch the `text-generation-webui:1.7` container to maintain API compatibility
Alternatively, you can manually specify the model that you want to load without needing to use the web UI:
```bash
./run.sh --workdir /opt/text-generation-webui $(./autotag text-generation-webui:1.7) \
  python3 server.py --listen --verbose --api \
    --model-dir=/data/models/text-generation-webui \
    --model=llama-2-13b-chat.Q4_K_M.gguf \
    --loader=llamacpp \
    --n-gpu-layers=128 \
    --n_ctx=4096 \
    --n_batch=4096 \
    --threads=$(($(nproc) - 2))
```
See here for command-line arguments: https://github.com/oobabooga/text-generation-webui/tree/main#basic-settings
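To confirm the model is loaded and responding before starting llamaspeak, you can hit the web UI's API directly. This is a minimal sketch assuming the 1.7-era `--api` extension's legacy blocking endpoint on its default port 5000; the prompt and parameters are placeholders:

```python
# minimal request against text-generation-webui's legacy blocking API
# (assumes the --api extension's default port 5000)
import requests

payload = {
    "prompt": "Q: What is the capital of France?\nA:",
    "max_new_tokens": 32,
}

r = requests.post("http://localhost:5000/api/v1/generate", json=payload, timeout=120)
r.raise_for_status()
print(r.json()["results"][0]["text"])
```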
Browsers require HTTPS in order to access the client's microphone, so you'll need to create a self-signed SSL certificate and key:
```bash
$ cd /path/to/your/jetson-containers/data
$ openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -sha256 -days 365 -nodes -subj '/CN=localhost'
```
You'll want to place these in your `jetson-containers/data` directory, because this gets automatically mounted into the containers under `/data` and will keep your SSL certificate persistent across container runs. When you first navigate your browser to a page that uses these self-signed certificates, it will issue you a warning since they don't originate from a trusted authority:
You can choose to override this, and the warning won't re-appear until you change certificates or your device's hostname/IP changes.
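If you want to verify that the certificate and key pair load cleanly before launching the server, a one-off check with Python's standard `ssl` module works (the paths assume you ran the `openssl` command above from `jetson-containers/data`):

```python
# verify the self-signed cert/key pair parse correctly and match each other
import ssl

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.load_cert_chain(certfile="cert.pem", keyfile="key.pem")
print("certificate and key loaded OK")
```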
To run the llamaspeak chat server with its default arguments and the SSL keys you generated, start it like this:
```bash
./run.sh --env SSL_CERT=/data/cert.pem --env SSL_KEY=/data/key.pem $(./autotag llamaspeak)
```
See `chat.py` for command-line options that can be changed. For example, to enable `--verbose` or `--debug` logging:
```bash
./run.sh --workdir=/opt/llamaspeak \
  --env SSL_CERT=/data/cert.pem \
  --env SSL_KEY=/data/key.pem \
  $(./autotag llamaspeak) \
  python3 chat.py --verbose
```
> If you're having issues getting audio or responses from the web client, enable debug logging to check the message traffic.
The default port is `8050`, but that can be changed with the `--port` argument. You can then navigate your browser to `https://HOSTNAME:8050`.
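Since the certificate is self-signed, programmatic clients also need to skip verification. The sketch below just confirms the server is reachable, assuming the default port and localhost:

```python
# check that the llamaspeak web server is up, ignoring the self-signed cert
import requests
import urllib3

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

r = requests.get("https://localhost:8050", verify=False, timeout=10)
print(r.status_code)
```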