Accelerating LLM inference with techniques like speculative decoding, quantization, and kernel fusion, focusing on implementing state-of-the-art research papers.

Fast LLM Inference - Optimized Task Plan

I'm implementing acceleration techniques for Large Language Models (LLMs) because I enjoy building these systems myself and love the challenge of bringing research papers into real-world applications.

If there are any techniques you'd like to see implemented or discussed, feel free to reach out. Thanks!

I'm excited to dive deeper into AI research!


Updates Log

2024

  • 2024/12/16: Add the Medusa-1 Training Script v2
  • 2024/12/15: Add the Medusa-1 Training Script
  • 2024/12/12: Update the KV Cache support for Speculative Decoding
  • 2024/12/04: Add the Kangaroo Training Script v2
  • 2024/11/26: Add the Kangaroo Training Script
  • 2024/11/22: Update the Target Model Keep Generation Mechanism experiment
  • 2024/11/18: Update the Self-Speculative Decoding experiment results for google--gemma-2-9b-it
  • 2024/11/12: Reviewing implementation challenges for Self-Speculative Decoding and evaluating model compatibility for improved efficiency.
  • 2024/11/10: Initial setup for Self-Speculative Decoding completed; data pipeline in place for testing draft-and-verify.
  • 2024/11/08: Speculative Decoding successfully implemented. Verified improved inference time with no noticeable accuracy degradation. (A minimal draft-and-verify sketch follows this list.)
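
For readers new to the draft-and-verify idea behind the speculative decoding entries above, here is a minimal, self-contained sketch of the greedy-acceptance variant. It illustrates the general technique, not this repository's implementation: the model pair (gpt2 / gpt2-large), the function name speculative_generate, and the fixed draft length k are assumptions chosen for brevity, and it omits KV caching and sampling-based acceptance.

```python
# Minimal greedy draft-and-verify speculative decoding sketch.
# NOTE: illustrative only -- model names, function name, and the greedy
# acceptance rule are assumptions, not this repository's implementation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any small/large pair sharing a tokenizer works; gpt2 models are used here for brevity.
draft = AutoModelForCausalLM.from_pretrained("gpt2").eval()
target = AutoModelForCausalLM.from_pretrained("gpt2-large").eval()
tokenizer = AutoTokenizer.from_pretrained("gpt2-large")


@torch.no_grad()
def speculative_generate(prompt: str, max_new_tokens: int = 64, k: int = 4) -> str:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    start_len = ids.shape[1]

    while ids.shape[1] - start_len < max_new_tokens:
        # 1) Draft: the small model proposes up to k tokens greedily.
        draft_ids = draft.generate(
            ids,
            max_new_tokens=k,
            do_sample=False,
            pad_token_id=tokenizer.eos_token_id,
        )
        proposed = draft_ids[:, ids.shape[1]:]  # (1, m), m <= k

        # 2) Verify: one forward pass of the target model scores every proposed
        #    position (plus one bonus position) at once.
        logits = target(draft_ids).logits
        target_pred = logits[:, ids.shape[1] - 1 :, :].argmax(dim=-1)  # (1, m + 1)

        # 3) Accept the longest prefix where draft and target agree, then append
        #    the target's own token at the first disagreement (or the bonus token
        #    if everything matched), so each loop emits at least one token.
        n_accept = 0
        for i in range(proposed.shape[1]):
            if proposed[0, i].item() != target_pred[0, i].item():
                break
            n_accept += 1
        ids = torch.cat(
            [ids, proposed[:, :n_accept], target_pred[:, n_accept : n_accept + 1]],
            dim=-1,
        )

    return tokenizer.decode(ids[0, start_len:], skip_special_tokens=True)


print(speculative_generate("Speculative decoding speeds up LLM inference by"))
```

Because the accepted prefix matches the target model's own greedy choices exactly, the output is identical to greedy decoding with the target model alone; the speedup comes from verifying several draft tokens in a single target forward pass.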

Pending Decisions

  • Batched Speculative Decoding
  • Prompt lookup decoding: Determine timeline after reviewing initial implementations (a minimal sketch of the lookup step follows this list).
  • UAG Integration: Assess when to integrate after Medusa and Kangaroo are in place.
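
For the prompt lookup decoding item above, the core idea is to propose draft tokens by copying what followed the most recent matching n-gram earlier in the context, so no draft model is needed; the target model then verifies the copied tokens exactly as in speculative decoding. Below is a minimal sketch of that lookup step; the function name and parameters are illustrative assumptions, not this repository's API.

```python
# Minimal sketch of the draft-token step in prompt lookup decoding.
# NOTE: illustrative only -- the function name and defaults are assumptions.
from typing import List


def prompt_lookup_candidates(
    token_ids: List[int], ngram_size: int = 3, num_draft: int = 5
) -> List[int]:
    """Return up to `num_draft` draft tokens copied from an earlier occurrence
    of the trailing `ngram_size`-gram; empty list if no match is found."""
    if len(token_ids) < ngram_size + 1:
        return []
    tail = token_ids[-ngram_size:]
    # Search backwards so the most recent earlier match wins.
    for start in range(len(token_ids) - ngram_size - 1, -1, -1):
        if token_ids[start : start + ngram_size] == tail:
            continuation = token_ids[start + ngram_size : start + ngram_size + num_draft]
            if continuation:
                return continuation
    return []


# The trailing 3-gram [4, 5, 6] also appears earlier, so the tokens that followed
# it are proposed as the draft for the target model to verify.
print(prompt_lookup_candidates([1, 2, 3, 4, 5, 6, 7, 8, 3, 4, 5, 6], num_draft=2))  # -> [7, 8]
```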

TODO List

November 2024

Additional Enhancements
