v0.3.0
Key Updates
- Continued feature work and improvements on operator kernels and backends, including Apple (Core ML, MPS), Arm, Cadence, Qualcomm, Vulkan, XNNPACK.
- Various improvements to the CMake system and the Android and iOS artifacts build.
- Various improvements to the AOT export path, for example eliminating view_copy and adding a Llama quantizer.
- Introduce dim order for ExecuTorch. With dim order, tensors within a single graph can support multiple memory formats.
- Introduce a new API for registering custom ExecuTorch kernels into ATen.
- Binary size reductions to the portable library via compile-time optimizations.
- Consolidate the tokenizer interface for LLMs.
- Add a Colab notebook that shows the Llama end-to-end (E2E) flow in ExecuTorch.
- Improve C++ test and pytest coverage.
- Deprecate exir.capture in favor of torch.export.
- Update versions for flatbuffers (v24.3.25), flatcc (896db54) and coremltools (8.0b1).
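
To make the dim order item above more concrete: a dim order lists a tensor's dimensions from outermost to innermost in memory, so two tensors with the same shape can use different memory formats. The snippet below is a minimal pure-Python sketch of that idea, not ExecuTorch code. The helper names `strides_from_dim_order` and `dim_order_from_strides` are hypothetical, and the example assumes the usual convention that a contiguous 4-D tensor has dim order (0, 1, 2, 3) while a channels-last tensor has dim order (0, 2, 3, 1).

```python
from typing import Sequence, Tuple


def strides_from_dim_order(shape: Sequence[int], dim_order: Sequence[int]) -> Tuple[int, ...]:
    """Compute element strides for `shape` laid out according to `dim_order`.

    `dim_order` lists dimensions from outermost to innermost in memory.
    (Hypothetical helper for illustration only.)
    """
    strides = [0] * len(shape)
    running = 1
    # Walk from the innermost dimension outwards, accumulating the extent.
    for dim in reversed(dim_order):
        strides[dim] = running
        running *= shape[dim]
    return tuple(strides)


def dim_order_from_strides(strides: Sequence[int]) -> Tuple[int, ...]:
    """Recover a dim order by sorting dimensions by descending stride.

    Ties (e.g. caused by size-1 dimensions) keep their original order,
    so the result is one valid dim order, not necessarily the only one.
    """
    return tuple(sorted(range(len(strides)), key=lambda d: -strides[d]))


shape = (2, 3, 4, 5)  # logical N, C, H, W

contiguous = strides_from_dim_order(shape, (0, 1, 2, 3))
channels_last = strides_from_dim_order(shape, (0, 2, 3, 1))

print(contiguous)     # (60, 20, 5, 1)
print(channels_last)  # (60, 1, 15, 3)
print(dim_order_from_strides(channels_last))  # (0, 2, 3, 1)
```

Both tensors share the logical shape (2, 3, 4, 5); only the dim order, and therefore the strides, differ. This is what allows tensors with different memory formats to appear in the same graph.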
Kudos to the following first-time contributors
Andres Suarez, Ben Rogers, Carlos Fernandez, Catherine Lee, Chakri Uddaraju, Chris Hopman, Chris Thompson, David Lin, Di Xu, Edward Yang, Eric J Nguyen, Erik Lundell, Hardik Sharma, Ignacio Guridi, Jakob Degen, Kaichen Liu, Lunwen He, Masahiro Hiramori, Naman Ahuja, Nathanael See, Nikita Shulga, Richard Zou, Sicheng Jia, Stephen Bochinski, Val Tarasyuk, Will Li, Yanghan Wang, Yipu Miao, Yujie Hui, Yupeng Zhang, Zingo Andersen, Zonglin Peng, salykova
Full Changelog
Please see v0.2.1-rc5...v0.3.0-rc6 for all 735 commits since the previous release.