🧐 Problem Description
Fast-LLM currently lacks support for models built around Mamba 2 blocks, which are a cornerstone of modern hybrid state-space-model (SSM) architectures such as Zamba 2 and NVIDIA's Hymba. These architectures interleave Mamba 2 blocks with transformer layers, offering faster training, lower inference latency, and a smaller compute and memory footprint than traditional transformers. Without Mamba 2 support, Fast-LLM is incompatible with these cutting-edge hybrid architectures and cannot adopt or continually pretrain models like Zamba 2.
💡 Proposed Solution
Implement Mamba 2 blocks in Fast-LLM in a phased approach:

1. **Initial PyTorch Integration**
   - Port a reference implementation of Mamba 2 blocks (e.g., from Zamba 2) into Fast-LLM.
   - Focus on enabling basic training functionality for SSM + transformer hybrids.
   - Prioritize correctness over performance in early iterations (a rough sketch of this step follows the list below).
2. **Support Hybrid Architectures**
   - Enable pretraining and fine-tuning of architectures like Zamba 2, which interleave Mamba 2 blocks with shared transformer blocks (see the second sketch below).
   - Add functionality to load pretrained Zamba 2 models into Fast-LLM for continual pretraining.
3. **Optimization for Performance**
   - Address performance bottlenecks by replacing generic PyTorch implementations with optimized CUDA or Triton kernels.
   - Reuse existing Mamba 2 CUDA kernels where feasible, or reimplement them in Triton for better scalability and flexibility.
4. **Extend to Other Architectures**
   - Evaluate Hymba and other hybrid architectures to determine additional requirements for supporting them.
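To make the first phase concrete, here is a minimal, hedged sketch of what the initial PyTorch integration could look like: wrapping the reference Mamba 2 module from the `mamba_ssm` package (which Zamba 2's implementation builds on) behind a thin layer interface so it behaves like any other sequence-mixing layer. `FastLLMMamba2Layer`, its constructor arguments, and the hidden-states-in/hidden-states-out contract are illustrative assumptions, not existing Fast-LLM APIs.

```python
# A minimal sketch, not Fast-LLM code: `FastLLMMamba2Layer` and its layer contract
# are assumptions for illustration. Requires the reference `mamba_ssm` package and a CUDA device.
import torch
import torch.nn as nn
from mamba_ssm import Mamba2  # reference Mamba 2 block (CUDA/Triton-backed)


class FastLLMMamba2Layer(nn.Module):
    """Pre-norm residual wrapper around the reference Mamba 2 mixer, shaped like a
    transformer layer so it can slot into an existing layer stack."""

    def __init__(self, hidden_size: int, d_state: int = 128, d_conv: int = 4, expand: int = 2):
        super().__init__()
        self.norm = nn.LayerNorm(hidden_size)  # reference models typically use RMSNorm instead
        self.mixer = Mamba2(d_model=hidden_size, d_state=d_state, d_conv=d_conv, expand=expand)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # (batch, seq_len, hidden_size) in and out, matching a transformer layer.
        return hidden_states + self.mixer(self.norm(hidden_states))


# Usage: a single Mamba 2 layer acting as a drop-in sequence-mixing layer.
layer = FastLLMMamba2Layer(hidden_size=1024).to("cuda", dtype=torch.bfloat16)
x = torch.randn(2, 256, 1024, device="cuda", dtype=torch.bfloat16)
assert layer(x).shape == x.shape
```

The point of this phase is only that such a wrapper trains correctly inside Fast-LLM's pipeline; kernel and performance choices are deferred to the optimization phase.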
Progressing in stages lets us land Mamba 2 support incrementally while maintaining stability, performance, and alignment with Fast-LLM's broader architecture.
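As a rough illustration of the second phase, the sketch below builds a Zamba-2-style hybrid stack from an explicit block pattern: mostly Mamba 2 layers, with a single shared transformer block reinserted at fixed intervals. It reuses `FastLLMMamba2Layer` from the previous sketch; the pattern, the sharing interval, and the use of `nn.TransformerEncoderLayer` as a stand-in attention block are assumptions for illustration, not the actual Zamba 2 layout or a Fast-LLM API.

```python
# A rough sketch of a Zamba-2-style hybrid stack (not the real Zamba 2 layout).
# Reuses `FastLLMMamba2Layer` from the previous sketch.
import torch
import torch.nn as nn


class HybridMamba2Stack(nn.Module):
    def __init__(self, hidden_size: int, num_layers: int, shared_block_every: int = 6):
        super().__init__()
        # One shared attention block, reused at fixed intervals (parameter sharing in the
        # spirit of Zamba 2); all other positions are independent Mamba 2 layers.
        self.shared_attention = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=8, batch_first=True, norm_first=True
        )
        self.pattern = [
            "attention" if (i + 1) % shared_block_every == 0 else "mamba2"
            for i in range(num_layers)
        ]
        self.mamba_layers = nn.ModuleDict(
            {str(i): FastLLMMamba2Layer(hidden_size)
             for i, kind in enumerate(self.pattern) if kind == "mamba2"}
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        seq_len = hidden_states.size(1)
        # Boolean causal mask: True marks positions that may not be attended to.
        causal_mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=hidden_states.device),
            diagonal=1,
        )
        for i, kind in enumerate(self.pattern):
            if kind == "mamba2":
                hidden_states = self.mamba_layers[str(i)](hidden_states)
            else:
                hidden_states = self.shared_attention(hidden_states, src_mask=causal_mask)
        return hidden_states
```

Loading a pretrained Zamba 2 checkpoint for continual pretraining would then amount to mapping its per-layer weights onto a stack built from the matching pattern, which is one reason to keep the block pattern explicit in the configuration.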
🔄 Alternatives Considered
Avoiding Fast-LLM for training these models is an option, but it undermines the framework’s role as our go-to tool for pretraining in the Foundation Models Lab. While we may reuse existing implementations if they integrate smoothly, it's more likely we'll eventually develop our own Mamba 2 implementation, similar to how we approached dropless MoE.
📈 Potential Benefits
Mamba 2 blocks bring computational efficiency, enabling faster training and inference than traditional transformers. Supporting these blocks enhances Fast-LLM's capabilities, making it more competitive and unlocking new research opportunities.
📝 Additional Context
Reference Implementations:
Initial Challenges: