HugoFara edited this page Sep 22, 2024 · 3 revisions

Speech-to-world-server Wiki

Welcome to the speech-to-world-server wiki! This wiki has two main goals: to explain how the code works, and to document the history of the project.

The philosophy of this project is to experiment with generative AI as a way to create virtual environments. It does not aim to create "full video games": creating a convincing environment is complex enough, and going much further would be unrealistic given my skills and available time. In the near future, generative AI may also be an interesting way to play with new concepts and explore new ideas, which is exactly the goal of this project.

So explore, experiment, and have fun!

Main tasks

Currently, the project encompasses the following fields:

  • Skybox
    • Skybox generation with diffusion (text-to-image) models.
    • Skybox editing with inpainting techniques.
  • Terrain
    • We use some custom image-to-heightmap models.
  • 3D elements
    • Not implemented yet.
  • Ambient sounds
    • Experimental text-to-sound using Ollama and AudioCraft.
  • Automatic Speech Recognition.

Application tasks

On top of the previous tasks, the project aims to be accessible via an application. To succeed, we need:
