Language-Integrated Value Iteration

Code for the paper "How Can LLM Guide RL? A Value-Based Approach".

Authors: Shenao Zhang*, Sirui Zheng*, Shuqi Ke, Zhihan Liu, Wanxin Jin, Jianbo Yuan, Yingxiang Yang, Hongxia Yang, Zhaoran Wang (* indicates equal contribution)


ALFWorld

Environment setup

  • Clone the repository:
git clone https://github.com/agentification/Language-Integrated-VI.git
cd Language-Integrated-VI/alfworld
  • Create a virtual environment with your preferred tool, then install the required packages and set your OpenAI API key:
pip install -r requirements.txt
export OPENAI_API_KEY=<your key>
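
Because the experiments call the OpenAI API, a missing key only surfaces once a run is under way. A minimal pre-flight sketch is shown below; the function name `check_openai_key` is hypothetical and not part of the repository.

```shell
# Hypothetical pre-flight helper; not part of the repository.
check_openai_key() {
    # Fail with a message on stderr if the variable is unset or empty.
    if [ -z "${OPENAI_API_KEY:-}" ]; then
        echo "OPENAI_API_KEY is not set" >&2
        return 1
    fi
    return 0
}

# Usage (commented out so the snippet has no side effects):
# check_openai_key && ./run.sh
```

Defining the check as a function keeps it reusable for the InterCode experiments, which need the same variable.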

Run the code

./run.sh

InterCode

Steps to run our algorithm in the InterCode environment.

Environment setup

  • Clone the repository, create a virtual environment, and install necessary dependencies:
git clone https://github.com/agentification/Language-Integrated-VI.git
cd Language-Integrated-VI/intercode
conda env create -f environment.yml
conda activate intercode
  • Run setup.sh to build the Docker images for the InterCode Bash, SQL, and CTF environments.

  • Set the OPENAI_API_KEY environment variable to your OpenAI API key:

export OPENAI_API_KEY=<your key>
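
The setup above depends on conda and Docker being installed. As a sketch, a small helper can confirm that both are on the PATH before anything is built; `require_cmd` is a hypothetical name, not something shipped with InterCode.

```shell
# Hypothetical helper; not part of the InterCode setup scripts.
require_cmd() {
    # `command -v` is the POSIX way to test whether a command is on PATH.
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "missing required command: $1" >&2
        return 1
    fi
}

# Usage (commented out): verify the tools first, then run setup.sh.
# require_cmd conda && require_cmd docker
```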

Run the code

  • For InterCode-SQL, run:
./scripts/expr_slinvit_sql.sh
  • For InterCode-Bash, run:
./scripts/expr_slinvit_bash.sh

BlocksWorld

Environment setup

  • Our experiments use Vicuna-13B and Vicuna-33B (v1.3). Install the required packages with:
    pip install -r requirements.txt
    

Run the code

  • To run the RAP experiments, use a command such as:

    CUDA_VISIBLE_DEVICES=0,1,2 nohup python -m torch.distributed.run --master_port 1034 --nproc_per_node 1 run_mcts.py \
    --task mcts \
    --model_name Vicuna \
    --verbose False \
    --data data/blocksworld/step_6.json \
    --max_depth 6 \
    --name m6ct_roll60 \
    --rollouts 60 \
    --model_path lmsys/vicuna-33b-v1.3 \
    --num_gpus 3
  • To run the SLINVIT experiments, use a command such as:

    CUDA_VISIBLE_DEVICES=3,4,5 nohup python -m torch.distributed.run --master_port 39855 --nproc_per_node 1 run.py \
    --model_name Vicuna \
    --name planning_step6_13b \
    --data data/blocksworld/step_6.json \
    --horizon 6 \
    --search_depth 5 \
    --alpha 0 \
    --sample_per_node 2 \
    --model_path lmsys/vicuna-13b-v1.3 \
    --num_gpus 3 \
    --use_lang_goal
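
Both example commands expose three devices through CUDA_VISIBLE_DEVICES and pass --num_gpus 3. Assuming the two values are meant to match (the README does not state this explicitly), the count can be derived from the device list rather than typed twice. The helper name `count_visible_gpus` is hypothetical.

```shell
# Hypothetical helper; assumes --num_gpus should equal the number of
# entries in CUDA_VISIBLE_DEVICES, which the README does not confirm.
count_visible_gpus() {
    list=$1
    if [ -z "$list" ]; then
        echo 0
        return 0
    fi
    # Split the comma-separated list into positional parameters.
    old_ifs=$IFS
    IFS=,
    set -- $list
    IFS=$old_ifs
    echo $#
}

# Usage (commented out):
# export CUDA_VISIBLE_DEVICES=3,4,5
# python ... run.py ... --num_gpus "$(count_visible_gpus "$CUDA_VISIBLE_DEVICES")"
```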

Citation

@article{zhang2024can,
  title={How Can LLM Guide RL? A Value-Based Approach},
  author={Zhang, Shenao and Zheng, Sirui and Ke, Shuqi and Liu, Zhihan and Jin, Wanxin and Yuan, Jianbo and Yang, Yingxiang and Yang, Hongxia and Wang, Zhaoran},
  journal={arXiv preprint arXiv:2402.16181},
  year={2024}
}