This code is a small illustration of generative AI, specifically Variational Autoencoders (VAEs) (https://www.ibm.com/think/topics/variational-autoencoder). VAEs are deep learning models that encode input data into a continuous, probabilistic latent space; after a loss function is optimized, the decoder generates new data that are minor variations of the original dataset. This makes VAEs powerful tools for generating new samples that closely resemble the original data.
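As a minimal sketch of the idea (shown here in PyTorch; the actual code in this repository may differ), the encoder maps each input to the mean and log-variance of a Gaussian over the latent space, the reparameterization trick keeps sampling differentiable, and the loss combines a reconstruction term with a KL-divergence term against a standard-normal prior. The class name, layer sizes, and latent dimension below are illustrative assumptions, not this repository's settings:

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    """Minimal VAE: encode inputs to a Gaussian latent space, decode samples from it."""
    def __init__(self, n_features, latent_dim=8, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        self.fc_mu = nn.Linear(hidden, latent_dim)      # mean of q(z|x)
        self.fc_logvar = nn.Linear(hidden, latent_dim)  # log-variance of q(z|x)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_features)
        )

    def reparameterize(self, mu, logvar):
        # z = mu + sigma * eps, eps ~ N(0, I): sampling stays differentiable.
        eps = torch.randn_like(mu)
        return mu + torch.exp(0.5 * logvar) * eps

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = self.reparameterize(mu, logvar)
        return self.decoder(z), mu, logvar

def vae_loss(x_hat, x, mu, logvar):
    # Reconstruction term (MSE, suitable for continuous coordinates) plus the
    # KL divergence between q(z|x) and the standard-normal prior p(z).
    recon = torch.nn.functional.mse_loss(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```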
In quantum chemical (QM) calculations, extensive sampling of the potential energy surface (PES) of molecular systems is critically important. However, such calculations are typically expensive. In this illustration, given a small dataset of structures for a reaction, a VAE is used to generate a much larger dataset that closely resembles the original structures. The original dataset consists of ~70 configurations of an SN2 reaction along a reaction coordinate, constructed through a combination of shell and Tcl scripting followed by QM minimization with CP2K. Constructing ~10,000 structures along the same reaction coordinate using these methods would be non-trivial and expensive. This VAE code accomplishes it by encoding the original set of ~70 structures and, training on the coordinates alone, generating ~10K structures at a fraction of the computational cost. Further refinement of the code, including incorporation of on-the-fly QM calculations, is currently ongoing.
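The end-to-end workflow might look roughly like the sketch below, which reuses the `VAE` and `vae_loss` sketch above. The file names (`coords.npy`, `generated_coords.npy`), data shapes, and training hyperparameters are assumptions for illustration, not this repository's actual interface:

```python
import numpy as np
import torch

# Assumption: coords.npy holds the ~70 training structures as a
# (n_structures, 3 * n_atoms) array of flattened Cartesian coordinates.
coords = torch.tensor(np.load("coords.npy"), dtype=torch.float32)

model = VAE(n_features=coords.shape[1], latent_dim=8)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Small dataset, so full-batch training over many cheap epochs is reasonable.
for epoch in range(2000):
    opt.zero_grad()
    x_hat, mu, logvar = model(coords)
    loss = vae_loss(x_hat, coords, mu, logvar)
    loss.backward()
    opt.step()

# Generate ~10K new configurations by decoding samples drawn from the
# standard-normal prior over the latent space.
model.eval()
with torch.no_grad():
    z = torch.randn(10_000, 8)
    new_coords = model.decoder(z).numpy().reshape(10_000, -1, 3)

np.save("generated_coords.npy", new_coords)
```

Because the decoder was trained only on configurations along the reaction coordinate, samples from the prior land near that same manifold, which is what makes the generated structures cheap stand-ins for additional QM-minimized geometries.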