
Autonomous Character-Scene Interaction Synthesis from Text Instruction

Teaser: human-scene interaction (HSI) motion synthesis.

This is the official code repository of Autonomous Character-Scene Interaction Synthesis from Text Instruction, published at SIGGRAPH Asia 2024.

LINGO Dataset

Please download the LINGO dataset from Google Drive. The contents at the download link are updated continuously, so re-download when needed to ensure you have the most recent data.
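If you prefer the command line, the third-party gdown package can download Google Drive folders; substitute the actual Drive link for the placeholder below (gdown is not part of this repository):

    pip install gdown
    gdown --folder "<GOOGLE-DRIVE-FOLDER-URL>"  # replace the placeholder with the link above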

Explanation of the files and folders in the LINGO dataset:

  • Scene (folder): This folder contains the occupancy grids for the indoor scenes in the LINGO dataset, indicated by each file name. The scenes are mirrored for augmentation.
  • Scene_vis (folder): This folder contains the occupancy grids for another set of indoor scenes, which we use to test our model and visualize the motions.
  • language_motion_dict (folder): This folder contains wrapped information about each motion segment used to train our model.
  • human_pose.npy: This file contains an (N x 63) array, where each row is the 63-dimensional SMPL-X body_pose parameter for one frame of MoCap data. The data is a concatenation of all motion segments.
  • human_orient.npy: This file contains an (N x 3) array corresponding to the global_orient parameter of SMPL-X.
  • transl_aligned.npy: This file contains an (N x 3) array corresponding to the transl parameter of SMPL-X.
  • human_joints_aligned.npy: This file contains an (N x 28 x 3) array with the 3D locations (y-up) of the selected SMPL-X joints.
  • scene_name.pkl: This file contains an (N, ) list giving the scene name of each frame.
  • start_idx.npy: This file contains an (M x 3) array giving the start frame index of each motion segment.
  • end_idx.npy: This file contains an (M x 3) array giving the end frame index of each motion segment.
  • text_aug.pkl: This file contains an (M, ) list giving the text annotations of each motion segment.
  • left_hand_inter_frame.npy: This file contains an (M, ) array storing the frame IDs where left hand-object contact occurs; entries are -1 for motion segments with no left hand-object contact.
  • right_hand_inter_frame.npy: This file contains an (M, ) array storing the frame IDs where right hand-object contact occurs; entries are -1 for motion segments with no right hand-object contact.
  • clip_features.npy: This file contains the preprocessed CLIP features of the text annotations in the LINGO dataset.
  • text2features_idx.pkl: This file stores a dictionary that maps text annotations to their corresponding CLIP feature vectors.
  • norm_inter_and_loco__16frames.npy: This file contains a (2 x 3) array with the range of joint coordinates along the x, y, and z axes, used for normalizing joint locations.

Note: N represents the total number of frames in the LINGO dataset, while M represents the number of motion segments. This dataset is provided in mirrored form.
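To make the layout concrete, below is a minimal loading sketch in Python. It uses only the file names listed above; the indexing conventions (column 0 of start_idx/end_idx as the frame index, and text2features_idx values as row indices into clip_features.npy) are our assumptions, so verify them against your download.

    import pickle
    import numpy as np

    # Per-frame SMPL-X parameters (N frames in total).
    body_pose = np.load("human_pose.npy")        # (N, 63)
    global_orient = np.load("human_orient.npy")  # (N, 3)
    transl = np.load("transl_aligned.npy")       # (N, 3)

    # Per-segment metadata (M segments in total).
    start_idx = np.load("start_idx.npy")
    end_idx = np.load("end_idx.npy")
    with open("text_aug.pkl", "rb") as f:
        texts = pickle.load(f)

    # Slice out the first motion segment. Assumption: if the index arrays
    # are 2-D, column 0 holds the frame index; adjust if your copy differs.
    start = int(np.asarray(start_idx).reshape(len(start_idx), -1)[0, 0])
    end = int(np.asarray(end_idx).reshape(len(end_idx), -1)[0, 0])
    segment = body_pose[start:end]               # (end - start, 63)
    print(texts[0], segment.shape)

    # Look up the CLIP feature of a text annotation. Assumption: the
    # dictionary maps a text string to a row index into clip_features.npy.
    clip_feats = np.load("clip_features.npy")
    with open("text2features_idx.pkl", "rb") as f:
        text2idx = pickle.load(f)
    feat = clip_feats[text2idx[texts[0]]]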

Human Motion Synthesis in Indoor Scenes

Prerequisites

To run the code, you need to have the following installed:

  • Python 3.8+
  • Required Python packages (specified in requirements.txt)
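A quick way to confirm the interpreter meets the requirement:

    python --version  # should report Python 3.8 or newer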

Installation

  1. Clone the Repository:

    git clone git@github.com:mileret/lingo-release.git
  2. Download Checkpoints, Data, and SMPL-X Models:

    • Download the necessary files and folders from this link.
    • Extract lingo_utils.zip and place the four extracted files and folders (dataset, ckpts, smpl_models, vis.blend) at the root of the project directory (the expected layout is sketched at the end of this list).
  3. Install Python Packages:

    pip install -r requirements.txt
  4. Install Blender:

    Download and install Blender (https://www.blender.org); it is used to prepare model inputs in vis.blend and to visualize the synthesized motions.

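After these steps, the project root should look roughly as follows (layout inferred from the instructions above; code and requirements.txt ship with the repository):

    lingo-release/
    ├── code/
    ├── ckpts/
    ├── dataset/
    ├── smpl_models/
    ├── requirements.txt
    └── vis.blend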
Inference and Visualization

  1. Get Model Input:

    Open vis.blend with Blender. Change the text, start_location, end_goal, and hand_goal. Then run get_input in vis.blend (a headless alternative is sketched after these steps).

  2. Inference:

    To synthesize human motions using our model, run:

    cd code
    python sample_lingo.py
  3. Visualization in Blender:

    Run vis_output in vis.blend.

    The generated human motion will be displayed in Blender.
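Optionally, the embedded scripts can be run without opening the Blender UI. The sketch below assumes get_input is stored as a text datablock inside vis.blend and does not require the interactive viewport; --background and --python-text are standard Blender command-line flags:

    blender vis.blend --background --python-text get_input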

Training

Overview

This section provides instructions for setting up and training our model on the LINGO dataset.

Prerequisites

Before you begin, install the required Python packages:

pip install -r requirements.txt

Model Training

Navigate to the code directory:

cd code

To start training the model, run the training script from the command line:

python train_lingo.py

The training script will automatically load the dataset, set up the model, and begin training using the configurations in the ./code/config folder.
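For long training runs it can help to keep a log of the console output; this is plain shell and not specific to this repository:

    python train_lingo.py 2>&1 | tee train.log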

Citation

@inproceedings{jiang2024autonomous,
  title={Autonomous character-scene interaction synthesis from text instruction},
  author={Jiang, Nan and He, Zimo and Wang, Zi and Li, Hongjie and Chen, Yixin and Huang, Siyuan and Zhu, Yixin},
  booktitle={SIGGRAPH Asia 2024 Conference Papers},
  pages={1--11},
  year={2024}
}
