# Awesome-Vision-Language-Action-Models

🔥 Latest advances on Vision-Language-Action models. (Under construction 🔥)

Embodied intelligence is one of the most important carriers for general artificial intelligence to reach the physical world, with significant implications for our daily lives. Recent years have seen tremendous advances in robotics, and the AI community is increasingly turning its attention to robots. We're excitedly anticipating the emergence of a GPT-like breakthrough in robotics! 🤗

Trending Projects

  • 🤗 LeRobot - Making AI for robotics more accessible with end-to-end learning.

Table of Contents

Milestone Papers

| Date | Keywords | Institute | Paper | Code |
|---|---|---|---|---|
| 2022-12 | Transformer | Google | RT-1: Robotics Transformer for real-world control at scale | google-research/robotics_transformer |
| 2023-03 | General Multimodal | Google | PaLM-E: An Embodied Multimodal Language Model | - |
| 2023-04 | Diffusion | Columbia University | Diffusion Policy: Visuomotor Policy Learning via Action Diffusion | real-stanford/diffusion_policy |
| 2023-04 | Action Chunking | Stanford | Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware (ACT) | tonyzhaozh/act |
| 2023-07 | - | Google | RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | - |
| 2023-10 | Action-in-Video-Out | UC Berkeley | UniSim: Learning Interactive Real-World Simulators | - |
| 2024-03 | Vector Quantization | New York University | Behavior Generation with Latent Actions | jayLEE0301/vq_bet_official |
| 2024-05 | Low-Cost/Design | Google | ALOHA 2: An Enhanced Low-Cost Hardware for Bimanual Teleoperation | tonyzhaozh/aloha |
| 2024-05 | Diffusion | UC Berkeley | Octo: An Open-Source Generalist Robot Policy | octo-models/octo |
| 2024-06 | Open-Source | Google | OpenVLA: An Open-Source Vision-Language-Action Model | openvla/openvla |
| 2024-09 | Heterogeneous | MIT | Heterogeneous Pre-trained Transformers | liruiw/HPT |
| 2024-10 | Bimanual Diffusion | THU | RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation | thu-ml/RoboticsDiffusionTransformer |
| 2024-10 | Video-Language-Action | ByteDance | GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation | - |
| 2024-10 | Flow Matching | Physical Intelligence | π0: A Vision-Language-Action Flow Model for General Robot Control | - |

Datasets & Benchmark

| Date | Keywords | Institute | Paper | Code |
|---|---|---|---|---|
| 2023-06 | Lifelong | UT Austin | Benchmarking Knowledge Transfer for Lifelong Robot Learning | Lifelong-Robot-Learning/LIBERO |
| 2023-10 | OpenX Dataset | - | Open X-Embodiment: Robotic Learning Datasets and RT-X Models | google-deepmind/open_x_embodiment |
| 2024-05 | Real2Sim | UCSD | Evaluating Real-World Robot Manipulation Policies in Simulation | simpler-env/SimplerEnv |
| 2024-08 | Data Mixtures | Stanford | Re-Mix: Optimizing Data Mixtures for Large Scale Imitation Learning | jhejna/remix |

Tutorials and Courses

| Date | Talk |
|---|---|
| 2024-11 | CoRL24-8 From Octo to π₀: How to Train Your Generalist Robot Policy |
| 2024-10 | RDT-1B Talk |
| 2024-08 | OpenVLA: LeRobot Research Presentation #5 by Moo Jin Kim |

Contributing

This is an active repository, and your contributions are always welcome!

The repository is under construction. I will keep some pull requests open when I'm not sure whether they belong in this VLA list; you can vote for them by adding a 👍 reaction.


If you have any questions about this opinionated list, do not hesitate to contact me 📮 [email protected].
