This project was developed during our master's studies at the Kempten University of Applied Sciences in cooperation with the Institute for Data-optimised Manufacturing (IDF).
Project members: Linus Göhl, Quirin Sandt, Benjamin Schober
The goal of the project was to create a pipeline that converts spoken language to GCode (e.g. for a CNC milling machine). The pipeline consists of several components and works as follows:
- Audio is transcribed to text, which yields the prompt
- The prompt is used to generate images using Stable Diffusion
- Each image is rated by its quality and checked using object detection
- The selected image is preprocessed and converted to GCode
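
The four steps above can be sketched as a simple chain of stages. This is only an illustration of the data flow, not the project's actual code: `PipelineStages`, `run_pipeline` and the stage signatures are hypothetical names chosen for this example, and each stage is passed in as a plain callable so that the sketch runs without any models.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class PipelineStages:
    """Hypothetical container for the four stages (names are assumptions)."""
    transcribe: Callable[[bytes], str]                  # audio -> prompt
    generate_images: Callable[[str], List[str]]         # prompt -> candidate images
    select_image: Callable[[str, Sequence[str]], str]   # prompt, candidates -> chosen image
    to_gcode: Callable[[str], str]                      # chosen image -> GCode text


def run_pipeline(audio: bytes, stages: PipelineStages) -> str:
    """Run the stages in order and return the GCode produced by the last one."""
    prompt = stages.transcribe(audio)
    candidates = stages.generate_images(prompt)
    if not candidates:
        raise ValueError("image generation returned no candidates")
    chosen = stages.select_image(prompt, candidates)
    return stages.to_gcode(chosen)
```

Images are represented as strings here purely for brevity; in practice each stage would wrap the corresponding model or conversion step described below.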
Below is more detailed information about the specific pipeline parts, models and technologies used.
Most of the pipeline components are deployed within a Docker container running on a GPU cluster. The pipelines are accessed through a REST API.
Models and technologies used:
Model/Technology | Description | Link |
---|---|---|
openai/whisper-large-v2 | Speech recognition model (ASR) | OpenAI GitHub, HuggingFace Model, Paper |
Helsinki-NLP/opus-mt-de-en | Translation model | Helsinki-NLP GitHub, HuggingFace Model |
NLTK | Natural Language Toolkit. Used for keyword/noun extraction | NLTK GitHub, NLTK Website |
Since the pipeline is accessed through a REST API, all the functional parts are implemented in the class TextPipeline. When the pipeline is deployed, one instance of the class is created and the models are loaded into VRAM. Since the pipeline consists of multiple models and parts, the following endpoints and functions are available:
Endpoint | Description |
---|---|
/api/transcribe | Transcribes the audio file to text (executes the transcribe, translate and extract_nouns functions). |
/api/translate | Translates the text to English (executes the translate and extract_nouns functions). |
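
To illustrate how the two endpoints relate to the three functions, here is a minimal sketch. It is not the actual TextPipeline class: `TextPipelineSketch`, the handler names and the response keys are assumptions made for this example, and the models are replaced by injected callables. It also assumes that nouns are extracted from the translated English text.

```python
from typing import Callable, Dict, List


class TextPipelineSketch:
    """Hypothetical stand-in for TextPipeline; created once and reused per request."""

    def __init__(
        self,
        transcribe_fn: Callable[[bytes], str],
        translate_fn: Callable[[str], str],
        extract_nouns_fn: Callable[[str], List[str]],
    ) -> None:
        # In a real deployment the models would be loaded here, once.
        self._transcribe = transcribe_fn
        self._translate = translate_fn
        self._extract_nouns = extract_nouns_fn

    def handle_translate(self, text: str) -> Dict[str, object]:
        """Mirrors /api/translate: translate, then extract nouns."""
        english = self._translate(text)
        return {"translation": english, "nouns": self._extract_nouns(english)}

    def handle_transcribe(self, audio: bytes) -> Dict[str, object]:
        """Mirrors /api/transcribe: transcribe, then the same steps as above."""
        text = self._transcribe(audio)
        return {"transcription": text, **self.handle_translate(text)}
```

Reusing `handle_translate` inside `handle_transcribe` keeps the shared translate and noun-extraction steps in one place, which matches the overlap between the two endpoints in the table.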
Model/Technology | Description | Link |
---|---|---|
stabilityai/stable-diffusion-2-1-base | Image generation model | HuggingFace Model, Paper |
LAION-Aesthetics_Predictor V1 | Image rating model | GitHub, Paper |
Grounding DINO | Object detection model | GitHub |
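
One plausible way to combine the rating model and the object detector when picking an image is sketched below. How the project actually combines the two signals is not described here, so this is an assumption: candidates whose detected labels contain one of the prompt's nouns are preferred, and the highest aesthetic score wins among them. `Candidate` and `select_best` are hypothetical names, and the fallback to all candidates when nothing matches is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Candidate:
    name: str
    aesthetic_score: float           # e.g. output of an aesthetics predictor
    detected_labels: Sequence[str]   # e.g. labels returned by an object detector


def select_best(
    candidates: Sequence[Candidate], wanted_nouns: Sequence[str]
) -> Optional[Candidate]:
    """Prefer candidates showing a wanted noun, then pick the highest score."""
    wanted = {noun.lower() for noun in wanted_nouns}
    matching = [
        c for c in candidates
        if wanted & {label.lower() for label in c.detected_labels}
    ]
    # Assumption: if no candidate matches, fall back to rating alone.
    pool = matching or list(candidates)
    return max(pool, key=lambda c: c.aesthetic_score, default=None)
```

With this rule, an image that scores well but does not show the requested object loses to a lower-scored image that does.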
Note: This pipeline component is not deployed within a Docker container; it runs on the local machine.