LLMServe/LoongServe

 
 

LoongServe

This repository is an implementation of the paper "LoongServe: Efficiently Serving Long-context Large Language Models with Elastic Sequence Parallelism".

To reproduce all the main results in our paper, see the artifact folder and follow the instructions there.

Languages

  • Jupyter Notebook 62.6%
  • Python 34.6%
  • C++ 1.6%
  • Cuda 1.1%
  • Shell 0.1%