Code for How Can LLM Guide RL? A Value-Based Approach.
Authors: Shenao Zhang*, Sirui Zheng*, Shuqi Ke, Zhihan Liu, Wanxin Jin, Jianbo Yuan, Yingxiang Yang, Hongxia Yang, Zhaoran Wang (* indicates equal contribution)
Steps to run our algorithm in the ALFWorld environment.
- Clone the repository:
git clone https://github.com/agentification/Language-Integrated-VI.git
cd Language-Integrated-VI/alfworld
- Create a virtual environment and install the required packages:
pip install -r requirements.txt
- Install the ALFWorld environment. Please refer to https://github.com/alfworld/alfworld.
- Set the OPENAI_API_KEY environment variable to your OpenAI API key:
export OPENAI_API_KEY=<your key>
- Run the algorithm:
./run.sh
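Since the run depends on OPENAI_API_KEY being set, a small pre-flight check can catch a missing key before any API call is made. The following is a minimal sketch; `require_env` is a hypothetical helper and is not part of this repository.

```shell
# Hypothetical helper (not part of the repository): fail early if a required
# environment variable is unset or empty.
require_env() {
    var_name="$1"
    # Indirect lookup via eval keeps this compatible with plain POSIX sh.
    eval "value=\${$var_name:-}"
    if [ -z "$value" ]; then
        echo "error: $var_name is not set" >&2
        return 1
    fi
}

# Example usage (commented out so that sourcing this snippet has no side effects):
# require_env OPENAI_API_KEY && ./run.sh
```

The function only checks that the variable is non-empty; it does not verify that the key is valid.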
Steps to run our algorithm in the InterCode environment.
- Clone the repository, create a virtual environment, and install necessary dependencies:
git clone https://github.com/agentification/Language-Integrated-VI.git
cd Language-Integrated-VI/intercode
conda env create -f environment.yml
conda activate intercode
- Run setup.sh to create the Docker images for the InterCode Bash, SQL, and CTF environments.
- Set the OPENAI_API_KEY environment variable to your OpenAI API key:
export OPENAI_API_KEY=<your key>
- For InterCode-SQL, run
./scripts/expr_slinvit_sql.sh
- For InterCode-Bash, run
./scripts/expr_slinvit_bash.sh
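The two InterCode experiment scripts can take a while, so it may be convenient to keep their output in log files. The sketch below assumes a hypothetical `run_logged` helper and a `logs/` directory; neither is provided by the repository.

```shell
# Hypothetical helper (not part of the repository): run a command and save
# its stdout and stderr to logs/<name>.log, reporting success or failure.
run_logged() {
    name="$1"
    shift
    mkdir -p logs
    if "$@" > "logs/$name.log" 2>&1; then
        echo "$name: ok"
    else
        rc=$?
        echo "$name: failed with exit code $rc (see logs/$name.log)" >&2
        return "$rc"
    fi
}

# Example usage (commented out so that sourcing this snippet runs nothing):
# run_logged sql ./scripts/expr_slinvit_sql.sh && \
#     run_logged bash ./scripts/expr_slinvit_bash.sh
```

Chaining with `&&` stops after the first failing experiment, which avoids spending API calls on a second run when the setup is broken.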
Steps to run our algorithm in the BlocksWorld environment.
- Our experiments are conducted with Vicuna-13B/33B (v1.3). The required packages can be installed with:
pip install -r requirements.txt
- To run the RAP experiments, use a command such as:
CUDA_VISIBLE_DEVICES=0,1,2 nohup python -m torch.distributed.run --master_port 1034 --nproc_per_node 1 run_mcts.py --task mcts --model_name Vicuna --verbose False --data data/blocksworld/step_6.json --max_depth 6 --name m6ct_roll60 --rollouts 60 --model_path lmsys/vicuna-33b-v1.3 --num_gpus 3
- To run the SLINVIT experiments, use a command such as:
CUDA_VISIBLE_DEVICES=3,4,5 nohup python -m torch.distributed.run --master_port 39855 --nproc_per_node 1 run.py \
    --model_name Vicuna \
    --name planning_step6_13b \
    --data data/blocksworld/step_6.json \
    --horizon 6 \
    --search_depth 5 \
    --alpha 0 \
    --sample_per_node 2 \
    --model_path lmsys/vicuna-13b-v1.3 \
    --num_gpus 3 \
    --use_lang_goal
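In both commands above, the value passed to --num_gpus equals the number of devices listed in CUDA_VISIBLE_DEVICES. Assuming the two are meant to stay in sync, the count can be derived rather than typed by hand. This is a minimal sketch; `count_visible_gpus` is a hypothetical helper, not something the repository provides.

```shell
# Hypothetical helper (not part of the repository): count the comma-separated
# device IDs in a CUDA_VISIBLE_DEVICES-style string, e.g. "3,4,5" -> 3.
count_visible_gpus() {
    devices="$1"
    if [ -z "$devices" ]; then
        echo 0
        return 0
    fi
    count=1
    rest="$devices"
    # Strip one "<id>," prefix per iteration until no comma remains.
    while [ "${rest#*,}" != "$rest" ]; do
        rest="${rest#*,}"
        count=$((count + 1))
    done
    echo "$count"
}

# Example usage (commented out so that sourcing this snippet launches nothing):
# export CUDA_VISIBLE_DEVICES=3,4,5
# NUM_GPUS="$(count_visible_gpus "$CUDA_VISIBLE_DEVICES")"   # 3
# ... --num_gpus "$NUM_GPUS"
```

The helper does no validation of the device IDs themselves; it only counts the entries.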
@article{zhang2024can,
title={How Can LLM Guide RL? A Value-Based Approach},
author={Zhang, Shenao and Zheng, Sirui and Ke, Shuqi and Liu, Zhihan and Jin, Wanxin and Yuan, Jianbo and Yang, Yingxiang and Yang, Hongxia and Wang, Zhaoran},
journal={arXiv preprint arXiv:2402.16181},
year={2024}
}