Humanoid Robot Head with Real-Time Voice Interaction Using Ingescape

General Overview

This project, part of a distributed interaction course at UPSSITECH, aims to integrate Ingescape into a humanoid robot head's voice interaction interface. The head, designed in SolidWorks, incorporates 3D-printed components and multiple hardware and software integrations.

Key Features

  • Voice Interaction: Real-time response to vocal commands and questions using speech recognition, transcription and speech synthesis with gTTS (Google Text-To-Speech).
  • Dynamic Eye and Head Movements: Animated eyes on circular RGB LCD screens and head movements via MyActuator RMD-L-5005 Brushless Servomotors with CAN communication (see the CAN sketch after this list).
  • Real-Time Transcription with Whisper: Integration of OpenAI's Whisper for seamless audio transcription.
  • Future Development: Facial and gesture recognition via OpenCV and MediaPipe, leveraging a Raspberry Pi Camera Module v2 8MP.
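
As a rough illustration of the head-movement path, the sketch below sends a single command frame to a servomotor over CAN from Python. It is a hypothetical example assuming a SocketCAN interface named can0 and the python-can package; the arbitration ID and payload bytes shown are placeholders and must be taken from MyActuator's RMD protocol documentation, not from this sketch.

import can

MOTOR_ID = 0x141                         # placeholder: base command ID plus the motor number
COMMAND = [0xA4, 0, 0, 0, 0, 0, 0, 0]    # placeholder 8-byte payload (position-control command family)

with can.Bus(interface="socketcan", channel="can0") as bus:
    request = can.Message(arbitration_id=MOTOR_ID, data=COMMAND, is_extended_id=False)
    bus.send(request)
    status = bus.recv(timeout=1.0)       # the servo normally answers with a status frame
    print(status)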

Hardware Setup


Software Setup

To ensure seamless installation and configuration, use the provided setup_software.sh script. This script handles all necessary installations, including Python 3.11, Ingescape, and other dependencies.

Key Software

System Architecture

(System architecture diagram)

Installation

Please use the following command to install the dependencies:

sudo bash setup_laptop.sh

You can also adjust the device parameters directly in the main.py file:

# CONFIG RASPBERRY PI
"""
simulation_mode = False
device = "wlan0"
playback_device_name = "UACDemoV1.0"
sample_rate = 48000
speed_factor_tts = 1.15
recording_device_name = "USB PnP Sound Device"
mic_sample_rate = 44100
silence_threshold = 0.02
silence_duration = 0.5
"""

# YOUR LAPTOP CONFIG
simulation_mode = True                          # run without the physical robot hardware
device = "wlo1"                                 # network interface used by Ingescape
playback_device_name = "UACDemoV1.0"            # audio output (speaker) device name
sample_rate = 48000                             # playback sample rate (Hz)
speed_factor_tts = 1.15                         # speed-up factor applied to the gTTS audio
recording_device_name = "USB PnP Sound Device"  # microphone device name
mic_sample_rate = 44100                         # microphone sample rate (Hz)
silence_threshold = 0.02                        # amplitude below which audio is treated as silence
silence_duration = 0.5                          # seconds of silence that end a recording
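
As a reference for how the recording parameters above might be used, here is a minimal sketch based on the sounddevice and numpy packages. The find_input_device and record_until_silence helpers are hypothetical illustrations, not the project's actual capture code.

import numpy as np
import sounddevice as sd

def find_input_device(name_fragment):
    # Pick the first input device whose name contains e.g. "USB PnP Sound Device".
    for index, dev in enumerate(sd.query_devices()):
        if name_fragment in dev["name"] and dev["max_input_channels"] > 0:
            return index
    raise RuntimeError(f"No input device matching {name_fragment!r}")

def record_until_silence(device_name, sample_rate, threshold, max_silence):
    # Record 100 ms blocks and stop once the signal stays below the threshold
    # for max_silence seconds (silence_threshold / silence_duration above).
    device = find_input_device(device_name)
    block = int(sample_rate * 0.1)
    chunks, silent_time = [], 0.0
    with sd.InputStream(device=device, channels=1, samplerate=sample_rate,
                        blocksize=block, dtype="float32") as stream:
        while silent_time < max_silence:
            data, _ = stream.read(block)
            chunks.append(data.copy())
            rms = float(np.sqrt(np.mean(np.square(data))))
            silent_time = silent_time + 0.1 if rms < threshold else 0.0
    return np.concatenate(chunks)

audio = record_until_silence(recording_device_name, mic_sample_rate,
                             silence_threshold, silence_duration)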

How It Works

The robot head is controlled by a Python-based system that listens to user input (voice commands) and responds with dynamic actions. The core functionalities are:

  1. Voice Interaction:

    • Whisper transcribes user speech in real time.
    • gTTS generates spoken responses from text (a minimal end-to-end sketch appears at the end of this section).
  2. Dynamic Eye, Mouth, and Head Movements:

    • Eye and mouth animations are shown simultaneously on the Waveshare LCD displays and on the Whiteboard.

2.1. Whiteboard Interface: Animated Visual Feedback

The whiteboard is a visual interface where animated graphics (GIFs) are displayed to represent the robot's "expressions." It complements the LCD screens for a more engaging interaction experience. Key aspects include:

  • Dynamic Eye Movements:
    - Depending on the robot's emotional state or context, the eyes can blink, look left or right, and even display special animations (e.g., "amoureux" for love or "animal" for playful expressions).
    - The animations are displayed on Waveshare 1.28inch Round LCD modules, with GIFs or specific visuals representing the state.

  • Mouth Animations:
    - Along with eye movements, mouth visuals change to reflect emotions (e.g., smile, wide open).
    - These animations provide non-verbal feedback that complements voice responses.

  • Integration with Decisions:
    - The Decision class drives the updates on the whiteboard interface by selecting appropriate GIFs or animations based on user input and predefined responses.

    Example Workflow:

    • If the user says "heureux" (happy), the robot's eyes display the "star" GIF and the mouth shows a "moving mouth" animation.

2.2. Chat Interface: Voice Interaction and Transcription

The chat interface provides real-time transcription of user speech and displays the robot's textual responses. It simulates a conversation log, making it easy for users to follow the interaction. Key components include:

  • Speech Recognition:
    - The Whisper model transcribes user speech into text, which is displayed in the chat.
    - Example: if the user says "Bonjour, robot !" ("Hello, robot!"), the chat log shows: User: Bonjour, robot !

  • Text-to-Speech Responses:
    - The robot generates a voice response with gTTS and simultaneously displays the response text in the chat.
    - Example: if the robot answers "Bonjour, comment puis-je vous aider ?" ("Hello, how can I help you?"), the chat log shows: Robot: Bonjour, comment puis-je vous aider ?

  • Seamless Integration with Decisions:
    - The Decision class matches the transcription to a predefined response and updates both the whiteboard and chat interfaces accordingly.
  3. Decision-Making Class:

    • The Decision class, using pre-programmed responses (e.g., greetings, commands), decides how the robot should respond to the input.
    • The get_response function processes the message, checks for greetings and keywords, and updates the robot's movements and facial expressions accordingly (a simplified sketch of this matching appears at the end of this section).
  4. Future Enhancements:

    • Facial and gesture recognition using OpenCV and MediaPipe.
    • Integration of the Raspberry Pi Camera for improved interaction.
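
To make the voice pipeline above concrete, here is a minimal, hypothetical sketch of the transcription-to-speech loop, assuming the openai-whisper and gTTS packages. File names and the hard-coded reply are placeholders; in the real system the reply comes from the Decision class and playback goes to the configured output device.

import whisper
from gtts import gTTS

model = whisper.load_model("base")   # small multilingual model; larger models trade speed for accuracy

def transcribe(wav_path):
    # Turn one recorded utterance into text.
    result = model.transcribe(wav_path, language="fr")
    return result["text"].strip()

def speak(text, out_path="reply.mp3"):
    # Synthesize the reply as an MP3 file with gTTS.
    gTTS(text=text, lang="fr").save(out_path)
    return out_path

heard = transcribe("speech.wav")                  # e.g. "Bonjour, robot !"
reply = "Bonjour, comment puis-je vous aider ?"   # chosen by the Decision class in the real system
print("User:", heard)
print("Robot:", reply)
speak(reply)                                      # playback is handled separately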
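
The keyword matching performed by the Decision class can be pictured as a simple lookup table. The get_response below is a simplified, hypothetical sketch: the table contents and the returned expression names are illustrative, and the real class also updates the whiteboard, chat, and servomotors through Ingescape.

# Hypothetical, simplified view of the Decision keyword matching.
RESPONSES = {
    "bonjour": ("Bonjour, comment puis-je vous aider ?", "neutral"),
    "heureux": ("Je suis content de l'entendre !", "star"),
    "amoureux": ("Oh, comme c'est gentil !", "amoureux"),
}

def get_response(message):
    # Return (spoken reply, expression/GIF name) for a transcribed message.
    lowered = message.lower()
    for keyword, (reply, expression) in RESPONSES.items():
        if keyword in lowered:
            return reply, expression
    return "Je n'ai pas compris, pouvez-vous répéter ?", "neutral"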

V&V (Verification and Validation)

This section describes the testing strategy implemented to ensure proper functionality of the humanoid robot head.

1. Integration Testing

To perform integration testing:

  • Run the main.py script.
  • Be sure to adjust the device parameter to match your hardware setup. Replace "wlan0" with your specific network device (e.g., "Wi-Fi") by modifying line 38 of the main script:
    decision = Decision(device="wlan0", simulation_mode=simulation_mode)

2. Unit Testing

For unit testing:

  • Run the Python scripts for each module or file individually.
  • Adjust the device configuration in the agent initialization at the start of each script. Replace "Wi-Fi" with your specific device (e.g., "wlan0") as follows:
    agent = RobotHead(device="Wi-Fi", simulation_mode=True)

3. Agent Testing

To evaluate the agent's performance:

  • Execute the test_robothead.igsscript script; it contains predefined scenarios to test the agent's behavior.

Project Goals

The main goals of this project are:

  • Real-time voice interaction: Provide a smooth, conversational interaction with the robot using voice commands.
  • Dynamic feedback: Display visual feedback through dynamic eye movements and animated facial expressions.
  • Extendable platform: Build a foundation for further features like facial recognition, gesture tracking, and more complex interactions.

For installation and setup, please refer to the setup_software.sh script. Follow its execution steps to prepare your environment.
