Llamole is a multimodal Large Language Model (LLM) that integrates a base LLM with the Graph Diffusion Transformer and Graph Neural Networks for multi-conditional molecular generation and multi-step reaction inference within texts.
📄 Paper: Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning
🔍 Abstract
While large language models (LLMs) have integrated images, adapting them to graphs remains challenging, limiting their applications in materials and drug design. This difficulty stems from the need for coherent autoregressive generation across texts and graphs. To address this, we introduce Llamole, the first multimodal LLM capable of interleaved text and graph generation, enabling molecular inverse design with retrosynthetic planning. Llamole integrates a base LLM with the Graph Diffusion Transformer and Graph Neural Networks for multi-conditional molecular generation and reaction inference within texts, while the LLM, with enhanced molecular understanding, flexibly controls activation among the different graph modules. Additionally, Llamole integrates A* search with LLM-based cost functions for efficient retrosynthetic planning. We create benchmarking datasets and conduct extensive experiments to evaluate Llamole against in-context learning and supervised fine-tuning. Llamole significantly outperforms 14 adapted LLMs across 12 metrics for controllable molecular design and retrosynthetic planning.
Initialize the environment by following these steps:
conda create --name llamole python=3.11 -y
conda activate llamole
./install_environment.sh
Alternatively, you can install all required dependencies using the requirements.sh
script.
- Hardware: A single V100 or A6000 GPU for inference.
- Configuration Files:
config/train/{model}_lora.yaml
config/generate/{model}_{task}.yaml
On the first run, the necessary models will be automatically downloaded, including:
-
Base LLMs (Please ensure you have access to the model):
-
Pretrained Graph Models:
- Graph Decoder: Graph Diffusion Transformer
- Graph Encoder: GIN-based Encoder
- Graph Predictor: GIN-based Predictor
-
Adapters and Connectors for integrating the base LLM with pretrained graph models.
If you prefer to download the models manually, refer to and place them in the following directories:
saves/graph_decoder
saves/graph_encoder
saves/graph_predictor
saves/{model_name}-Adapter
Launch the web interface using Gradio:
python launch.py
The default base LLM is Qwen2-7B-Instruct. If you wish to change this, please modify the args_dict
variable accordingly. Upon launch, the web UI will appear as shown below:
For command-line evaluation, specify the path to the configuration file:
python main.py eval config/generate/qwen_material.yaml
You can modify the configuration files to suit your custom datasets.
Note: Examples of training and evaluation datasets are available in the data
folder. For more details, refer to data/dataset_info.json
. To test generation on all MolQA questions, first download the dataset by running:
python main.py download_data
Then, update the configuration files to point to the downloaded dataset based on the names from data/dataset_info.json
.
The codebase supports multimodal graph-text supervised fine-tuning. Follow these steps:
-
Download MolQA Training Data:
python main.py download_data
Then you may need to modify the configuration files in the
config
folder to point to the downloaded training data. Skipping this step and directly using the command from step 2 will result in training only on the example training set. -
Run Fine-Tuning:
python main.py train config/train/mistral_lora.yaml
During the first run, pretrained graph models will be downloaded in the
saves
folder. Modify the configuration files as needed for your setup. An 80G A100 GPU is recommended for supervised fine-tuning.
If you find this repository useful, please cite our paper:
@misc{liu2024llamole,
title={Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning},
author={Gang Liu and Michael Sun and Wojciech Matusik and Meng Jiang and Jie Chen},
year={2024},
eprint={2410.04223},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2410.04223},
}
@article{liu2024graphdit,
title={Graph Diffusion Transformers for Multi-Conditional Molecular Generation},
author={Liu, Gang and Xu, Jiaxin and Luo, Tengfei and Jiang, Meng},
journal={Thirty-Eighth Annual Conference on Neural Information Processing Systems},
year={2024}
}
This codebase is built upon Llama-Factory. We extend our gratitude for their open-source contributions.
🔗 Huggingface Models: Llamole is developed with three variants (adapters) and three pretrained graph modules (encoder, decoder, predictor):
- Base LLM Variant 1: Llama-3.1-8b-Instruct
- Base LLM Variant 2: Qwen2-7B-Instruct
- Base LLM Variant 3: Mistral-7B-Instruct-v0.3
- Pretrained Graph Decoder for multi-conditional molecular generation: Graph Diffusion Transformer
- Pretrained Graph Predictor for one-step reaction prediction: GNN Predictor
- Pretrained Graph Encoder for enhanced molecule understanding: Graph Encoder