- Explore tree sparsity
- Fine-tune the Medusa heads together with the LM head from scratch
- Distill from any model without access to the original training data
- Support batched inference
- Support fine-grained KV cache management
- Optimize the tree-based attention to reduce additional computation (a sketch of the tree attention mask follows this list)
- Improve the acceptance scheme to generate more diverse sequences (the current typical-acceptance rule is sketched after this list)
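
For context on the tree-attention item, here is a minimal sketch of what tree-based attention computes: candidates from the Medusa heads are laid out as a tree, and the attention mask lets each candidate token attend only to its ancestors, so several continuations are verified in a single forward pass. The `parents` encoding and the function name below are assumptions made for this sketch, not the repository's actual data structures.

```python
import torch

def build_tree_attention_mask(parents: list[int]) -> torch.Tensor:
    """Boolean mask where candidate i may attend to candidate j only if
    j is i itself or an ancestor of i in the candidate tree.

    parents[i] is the index of node i's parent; -1 marks a root hanging
    directly off the last verified token.
    """
    n = len(parents)
    mask = torch.zeros(n, n, dtype=torch.bool)
    for i in range(n):
        j = i
        while j != -1:      # walk up to the root, marking every ancestor
            mask[i, j] = True
            j = parents[j]
    return mask

# Two candidate continuations sharing a common first token:
# node 0 -> node 1 and node 0 -> node 2
print(build_tree_attention_mask([-1, 0, 0]))
# tensor([[ True, False, False],
#         [ True,  True, False],
#         [ True, False,  True]])
```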
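
For context on the acceptance-scheme item, the typical-acceptance rule from the Medusa paper accepts a candidate token when its probability under the original model exceeds an entropy-dependent threshold, so more tokens pass in low-entropy (confident) contexts. The function name and the `epsilon`/`delta` hyperparameter names and defaults below are illustrative assumptions for this sketch.

```python
import torch

def typical_accept(logits: torch.Tensor, candidate: int,
                   epsilon: float = 0.09, delta: float = 0.3) -> bool:
    """Accept `candidate` if p(candidate) > min(epsilon, delta * exp(-H(p))),
    where H(p) is the entropy of the original model's next-token distribution."""
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * torch.log(probs.clamp_min(1e-10))).sum()
    threshold = min(epsilon, delta * torch.exp(-entropy).item())
    return probs[candidate].item() > threshold

# A peaked distribution: the dominant token clears the threshold easily.
logits = torch.tensor([3.0, 1.0, 0.5])
print(typical_accept(logits, candidate=0))  # True
```

Because the threshold shrinks as entropy grows, high-entropy contexts accept fewer speculative tokens; relaxing this trade-off is what the roadmap item above targets.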