This project, part of a distributed interaction course at UPSSITECH, aims to integrate Ingescape into a humanoid robot head's voice interaction interface. The head, designed in SolidWorks, incorporates 3D-printed components and multiple hardware and software integrations.
- Voice Interaction: Real-time response to vocal commands and questions using speech recognition, transcription and speech synthesis with gTTS (Google Text-To-Speech).
- Dynamic Eye and Head Movements: Animated eyes on circular RGB LCD screens and head movements via MyActuator RMD-L-5005 Brushless Servomotors with CAN communication.
- Real-Time Transcription with Whisper: Integration of OpenAI's Whisper for seamless audio transcription.
- Future Development: Facial and gesture recognition via OpenCV and MediaPipe, leveraging a Raspberry Pi Camera Module v2 8MP.
Core Components:
- 1x Raspberry Pi 4 Model B - 8GB (running Ubuntu 24.04 LTS Server)
- 3x Waveshare 1.28inch Round LCD Module Displays for animated eyes
- 1x Raspberry Pi Camera Module v2 8MP for facial and gesture recognition in future iterations
- 3x MyActuator RMD-L-5005 brushless servomotors for head movement
- 1x USB to CAN Converter for servomotor communication
- 1x USB 2.0 Mini Microphone for audio input
- 1x USB Mini Speaker for sound output
Mechanical Components:
- Design: SolidWorks
- Slicing: PrusaSlicer
- Printing: Original Prusa MINI+ (180 x 180 x 180 mm volume)
- Materials: PETG filaments (Polymaker PolyLite PETG Black & Grey)
- Additional: Tinted visor
To ensure seamless installation and configuration, use the provided setup_software.sh script. This script handles all necessary installations, including Python 3.11, Ingescape, and other dependencies.
- Operating System: Ubuntu 24.04 LTS Server
- Voice and Interaction Libraries:
- Ingescape (Circle & Whiteboard)
- OpenAI's Whisper
- Python Version: 3.11
Please use the following command to install the dependencies:

```bash
sudo bash setup_laptop.sh
```
You can also modify all the device parameters in the main.py file. In the excerpt below, the Raspberry Pi block is commented out (wrapped in a docstring) and the laptop block is the active configuration:
```python
# CONFIG RASPBERRY PI
"""
simulation_mode = False
device = "wlan0"
playback_device_name = "UACDemoV1.0"
sample_rate = 48000
speed_factor_tts = 1.15
recording_device_name = "USB PnP Sound Device"
mic_sample_rate = 44100
silence_threshold = 0.02
silence_duration = 0.5
"""

# YOUR LAPTOP CONFIG
simulation_mode = True
device = "wlo1"
playback_device_name = "UACDemoV1.0"
sample_rate = 48000
speed_factor_tts = 1.15
recording_device_name = "USB PnP Sound Device"
mic_sample_rate = 44100
silence_threshold = 0.02
silence_duration = 0.5
```
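The device names above (e.g. "UACDemoV1.0", "USB PnP Sound Device") typically have to be resolved to concrete audio device indices at runtime. The snippet below is a minimal sketch of one way to do that, assuming the sounddevice library; the helper `find_device_index` is illustrative and not taken from main.py.

```python
# Illustrative helper (not from main.py): map a device name fragment to a
# sounddevice index so the configured names can be used for playback/recording.
import sounddevice as sd

def find_device_index(name_fragment: str) -> int | None:
    """Return the index of the first audio device whose name contains name_fragment."""
    for index, device in enumerate(sd.query_devices()):
        if name_fragment in device["name"]:
            return index
    return None

mic_index = find_device_index("USB PnP Sound Device")   # recording_device_name
speaker_index = find_device_index("UACDemoV1.0")        # playback_device_name
```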
The robot head is controlled by a Python-based system that listens to user input (voice commands) and responds with dynamic actions. The core functionalities are:
Voice Interaction:
- Whisper is used to transcribe speech in real-time.
- gTTS is used to generate speech responses from text.
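For orientation, here is a minimal sketch of those two building blocks. The model size ("base"), language setting, and file names are illustrative assumptions, not necessarily the values used in main.py.

```python
# Minimal sketch: transcribe a recorded question with Whisper, synthesize a reply with gTTS.
import whisper
from gtts import gTTS

model = whisper.load_model("base")                        # load the Whisper model once
result = model.transcribe("question.wav", language="fr")
user_text = result["text"]                                # transcribed user speech

tts = gTTS(text="Bonjour, comment puis-je vous aider ?", lang="fr")
tts.save("response.mp3")                                  # audio to play on the USB speaker
```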
Dynamic Eye, Mouth, and Head Movements:
- Eye and mouth animations are shown on the Waveshare LCD displays and on the Whiteboard simultaneously.
- Head movements are driven by the MyActuator RMD-L-5005 servomotors over CAN (see the hedged sketch below).
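The sketch below shows one way to command a single RMD-L-5005 over the USB-CAN adapter with python-can. The 0x140 + motor ID arbitration ID and the 0xA4 absolute-position command follow MyActuator's published RMD protocol, but the exact frame layout here is an assumption rather than code from this project.

```python
# Hedged sketch (assumed frame layout): command one RMD servomotor to an absolute angle.
import struct
import can

def send_position(bus: can.BusABC, motor_id: int, angle_deg: float, max_speed_dps: int = 180) -> None:
    angle_units = int(angle_deg * 100)  # RMD protocol: 0.01 degree per LSB, int32 little-endian
    data = struct.pack("<BBHi", 0xA4, 0x00, max_speed_dps, angle_units)
    bus.send(can.Message(arbitration_id=0x140 + motor_id, data=data, is_extended_id=False))

bus = can.interface.Bus(channel="can0", interface="socketcan")
send_position(bus, motor_id=1, angle_deg=30.0)  # turn the head (motor 1) to +30 degrees
```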
2.1. Whiteboard Interface: Animated Visual Feedback

The whiteboard is a visual interface where animated graphics (GIFs) are displayed to represent the robot's "expressions." This interface leverages the LCD screens for a more engaging interaction experience. Key aspects include:
- Dynamic Eye Movements:
  - Depending on the robot's emotional state or context, the eyes can blink, look left or right, and even display special animations (e.g., "amoureux" for love or "animal" for playful expressions).
  - The animations are displayed on Waveshare 1.28inch Round LCD modules, with GIFs or specific visuals representing the state.
- Mouth Animations:
  - Along with eye movements, mouth visuals change to reflect emotions (e.g., smile, wide open).
  - These animations provide non-verbal feedback that complements voice responses.
- Integration with Decisions:
  - The `Decision` class drives the updates on the whiteboard interface by selecting appropriate GIFs or animations based on user input and predefined responses. An example workflow is sketched below.
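As an illustrative example workflow (the function and GIF names here are hypothetical, not the project's actual assets), a transcription is matched against keywords and the corresponding animation is pushed to the whiteboard:

```python
# Hypothetical mapping from keywords in the transcription to whiteboard animations.
ANIMATIONS = {
    "bonjour": "eyes_blink.gif",
    "je t'aime": "amoureux.gif",
    "chien": "animal.gif",
}

def pick_animation(transcription: str) -> str:
    text = transcription.lower()
    for keyword, gif in ANIMATIONS.items():
        if keyword in text:
            return gif
    return "eyes_neutral.gif"  # default/neutral expression

gif = pick_animation("Bonjour, robot !")  # -> "eyes_blink.gif"
# decision.show_on_whiteboard(gif)        # hypothetical call that updates the whiteboard
```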
2.2. Chat Interface: Voice Interaction and Transcription

The chat interface provides real-time transcription of user speech and displays the robot's textual responses. It simulates a conversation log, making it easy for users to follow the interaction. Key components include:
- Speech Recognition:
  - The Whisper model transcribes user speech into text, which is displayed in the chat.
  - Example: If the user says, "Bonjour, robot !", the chat log will show: `User: Bonjour, robot !`
- Text-to-Speech Responses:
  - The robot generates a voice response using gTTS and simultaneously displays the response text in the chat.
  - Example: If the robot responds, "Bonjour, comment puis-je vous aider ?", the chat log will show: `Robot: Bonjour, comment puis-je vous aider ?`
- Seamless Integration with Decisions:
  - The `Decision` class matches the transcription to a predefined response and updates both the whiteboard and chat interfaces accordingly.
Decision-Making Class:
- The Decision class, using pre-programmed responses (e.g., greetings, commands), decides how the robot should respond based on the input.
- The `get_response` function processes the message, checks for greetings and keywords, and updates the robot's movements and facial expressions accordingly (a hedged sketch follows below).
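The sketch below shows what a `get_response`-style method might look like; the greetings list, return format, and expression/movement names are assumptions, not the actual implementation in this project.

```python
# Assumed shape: return the textual reply plus the expression and head movement to trigger.
GREETINGS = ("bonjour", "salut", "coucou")

def get_response(message: str) -> tuple[str, str, str]:
    text = message.lower()
    if any(greeting in text for greeting in GREETINGS):
        return ("Bonjour, comment puis-je vous aider ?", "smile", "nod")
    if "au revoir" in text:
        return ("Au revoir, à bientôt !", "eyes_blink", "turn_left")
    return ("Je n'ai pas compris.", "neutral", "none")

reply, expression, movement = get_response("Bonjour, robot !")
```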
Future Enhancements:
- Facial and gesture recognition using OpenCV and MediaPipe.
- Integration of the Raspberry Pi Camera for improved interaction.
This section describes the testing strategy implemented to ensure proper functionality of the humanoid robot head.
To perform integration testing:
- Run the main.py script.
- Make sure the `device` parameter matches your hardware setup. Replace `"wlan0"` with your specific device (e.g., `"Wi-Fi"`) by modifying line 38 in the main script:

```python
decision = Decision(device="wlan0", simulation_mode=simulation_mode)
```
For unit testing:
- Run the Python scripts for each module or file individually.
- Adjust the device configuration in the agent initialization at the start of each script. Replace `"Wi-Fi"` with your specific device (e.g., `"wlan0"`) as follows:

```python
agent = RobotHead(device="Wi-Fi", simulation_mode=True)
```
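If you are unsure what to pass as the device name, the standard-library snippet below lists the network interface names available on your machine (e.g. wlan0 or wlo1 on Linux, Wi-Fi on Windows):

```python
# List available network interface names to pick the right value for the device parameter.
import socket

for index, name in socket.if_nameindex():
    print(index, name)
```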
To evaluate the agent's performance:
- Execute the test_robothead.igsscript script: This script contains predefined scenarios to test the agent’s behavior.
The main goals of this project are:
- Real-time voice interaction: Provide a smooth, conversational interaction with the robot using voice commands.
- Dynamic feedback: Display visual feedback through dynamic eye movements and animated facial expressions.
- Extendable platform: Build a foundation for further features like facial recognition, gesture tracking, and more complex interactions.
For installation and setup, please refer to the setup_software.sh script. Follow its execution steps to prepare your environment.