Home
Welcome to the speech-to-world-server wiki! This wiki has two main goals: to explain how the code works, and to document the history of the project.
The philosophy of this project is to experiment with using generative AI to create virtual environments. It does not aim to create "full video games": creating a convincing environment is complex enough, and going much further would be unrealistic given my skills and the time available. In addition, generative AI may soon become an interesting way to play with new concepts and explore new ideas, which is exactly the goal of this project.
So explore, experiment, and have fun!
Currently, the project encompasses the following fields:
- Skybox
  - Skybox generation with diffusion (text-to-image) models.
  - Skybox editing with inpainting techniques.
- Terrain
  - We use some custom image-to-heightmap models.
- 3D elements
  - Not implemented yet.
- Ambient sounds
  - Experimental text-to-sound using ollama and audiocraft.
- Automatic Speech Recognition.
On top of the previous tasks, the project aims to be accessible through an application. To achieve this, we need:
- A network architecture. It currently uses TCP.
- An application implementation. This is the purpose of speech-to-world-unity-client.