Home

Speech-to-world-server Wiki

Welcome to the speech-to-world-server wiki! This wiki has two main goals: to explain how the code works, and to document the history of the project.

The philosophy of this project is to experiment with generative AI as a way to create virtual environments. It does not aim to create "full video games": creating a convincing environment is complex enough, and going much further would be unrealistic given my skills and available time. In addition, in the near future, generative AI could become an interesting way to play with new concepts and explore new ideas, which is exactly the goal of this project.

So explore, experiment, and have fun!

Main tasks

Currently, the project covers the following fields:

Skybox
- Skybox generation with diffusion (text-to-image) models.
- Skybox editing with inpainting techniques.
Terrain
- We use custom image-to-heightmap models.
3D elements
- Not implemented yet.
Ambient sounds
- Experimental text-to-sound using ollama and audiocraft.
Automatic Speech Recognition

Application tasks

On top of the previous tasks, the project aims to be accessible through an application. To achieve this, we need:

A network architecture. It currently uses TCP.
An application implementation. This is the purpose of speech-to-world-unity-client.
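
Since TCP is a byte stream with no message boundaries, a client and server usually have to agree on how to delimit messages. The sketch below shows one common way to do that, a length-prefixed frame. It is only an illustration: the 4-byte header and the names `send_message` and `recv_message` are assumptions made for this example, not the protocol actually used between the server and speech-to-world-unity-client.

```python
import socket
import struct

# Hypothetical framing: a 4-byte big-endian length, then the payload.
HEADER = struct.Struct("!I")


def send_message(sock: socket.socket, payload: bytes) -> None:
    """Send one message as a length header followed by the payload."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def _recv_exactly(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes, because recv() may return fewer."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed before the full message arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_message(sock: socket.socket) -> bytes:
    """Receive one message written by send_message()."""
    (length,) = HEADER.unpack(_recv_exactly(sock, HEADER.size))
    return _recv_exactly(sock, length)
```

These helpers work on any connected TCP socket, for example one returned by `socket.create_connection()` on the client side or by `accept()` on the server side. An explicit length prefix would let text prompts and larger binary payloads (such as a generated skybox image) share one connection without ambiguity.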
