# Awesome-Vision-Language-Action-Models

🔥 Latest advances on Vision-Language-Action models. (Under construction 🔥)

Embodied intelligence is one of the most important carriers for general artificial intelligence to reach the physical world, with significant implications for our daily lives. Recent years have seen tremendous advances in robotics, and the AI community is increasingly turning its attention to robots. We're excitedly anticipating the emergence of a GPT-like breakthrough in robotics! 🤗

Trending Projects

  • 🤗 LeRobot - Making AI for robotics more accessible with end-to-end learning.

Table of Contents

Milestone Papers

| Date | Keywords | Institute | Paper | Code |
|---|---|---|---|---|
| 2022-12 | Transformer | Google | RT-1: Robotics Transformer for real-world control at scale | google-research/robotics_transformer |
| 2023-03 | General Multimodal | Google | PaLM-E: An Embodied Multimodal Language Model | - |
| 2023-04 | Diffusion | Columbia University | Diffusion Policy: Visuomotor Policy Learning via Action Diffusion | real-stanford/diffusion_policy |
| 2023-04 | Action Chunking | Stanford | Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware (ACT) | tonyzhaozh/act |
| 2023-07 | - | Google | RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | - |
| 2023-10 | Action-in-Video-Out | UC Berkeley | UniSim: Learning Interactive Real-World Simulators | - |
| 2024-03 | Vector Quantization | New York University | Behavior Generation with Latent Actions | jayLEE0301/vq_bet_official |
| 2024-05 | Low-Cost/Design | Google | ALOHA 2: An Enhanced Low-Cost Hardware for Bimanual Teleoperation | tonyzhaozh/aloha |
| 2024-05 | Diffusion | UC Berkeley | Octo: An Open-Source Generalist Robot Policy | octo-models/octo |
| 2024-06 | Open-Source | Google | OpenVLA: An Open-Source Vision-Language-Action Model | openvla/openvla |
| 2024-09 | Heterogeneous | MIT | Heterogeneous Pre-trained Transformers | liruiw/HPT |
| 2024-10 | Bimanual Diffusion | THU | RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation | thu-ml/RoboticsDiffusionTransformer |
| 2024-10 | Video-Language-Action | ByteDance | GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation | - |
| 2024-10 | Flow Matching | Physical Intelligence | π0: A Vision-Language-Action Flow Model for General Robot Control | - |

Datasets & Benchmark

| Date | Keywords | Institute | Paper | Code |
|---|---|---|---|---|
| 2023-06 | Lifelong | UT Austin | Benchmarking Knowledge Transfer for Lifelong Robot Learning | Lifelong-Robot-Learning/LIBERO |
| 2023-10 | OpenX Dataset | - | Open X-Embodiment: Robotic Learning Datasets and RT-X Models | google-deepmind/open_x_embodiment |
| 2024-05 | Real2Sim | UCSD | Evaluating Real-World Robot Manipulation Policies in Simulation | simpler-env/SimplerEnv |
| 2024-08 | Data Mixtures | Stanford | Re-Mix: Optimizing Data Mixtures for Large Scale Imitation Learning | jhejna/remix |

Tutorials and Courses

| Date | Talk |
|---|---|
| 2024-11 | CoRL24-8 From Octo to π₀: How to Train Your Generalist Robot Policy |
| 2024-10 | RDT-1B Talk |
| 2024-08 | OpenVLA: LeRobot Research Presentation #5 by Moo Jin Kim |

Contributing

This is an active repository, and your contributions are always welcome!

The repository is under construction. I will keep some pull requests open when I'm not sure whether they belong in this VLA list; you can vote for them by adding a 👍 reaction.


If you have any questions about this opinionated list, do not hesitate to contact me 📮 [email protected].
