Feature/cpp baby llama rework #2903
Conversation
Commit history (squashed; each commit signed off by Shrinath Suresh <[email protected]>):
- …rsion errors.
- Custom preprocess implementation
- Free memory only after the inference is done
- Implement Postprocess
- Setting Fast compiler option
- Reading checkpoint path and tokenizer path from config file using folly
- Removing run.c from cmake
- Replace auto with appropriate data type
- Using smart pointers and initializing the vector with appropriate size upfront
- Using smart pointers
- Directly converting the tensor values to prompt token ids
- Moving run.c and common variables to .cc file
- Moving run.c to a separate folder
- Uncommenting the original run.c main method
- Implemented destructor to free up resources
- Supporting files for unit test
- Processing all the batch inputs
- Setting InferenceMode guard
- Updating InferenceMode to use torch::InferenceMode
- Updating class name to BabyLlamaHandler
- Renaming llm_handler target to babyllama_handler
- Adding dummy pt file
- Typo fix
- Calculate tokens per second for batch input
- Adding README.md for babyllama example
- Fixing out-of-bound mem access in babyllama example
- Move model instance out of ts_backend
- Use shared_ptr<void> for model to detangle from torchscript
- Move BaseHandler to backends/handler
- Move model instance into core
- Remove Torchscript as a backend and implement it as a handler
- Move torchscript test out of backend folder
- Remove dummy.pt in babyllama + update README + move babyllama test to new examples/examples_test.cc file
Commit range: 3064301 to f0bfaf4
const std::string &handler_str = manifest_->GetModel().handler;
std::size_t delimiter_pos = handler_str.find(manifest_->kHandler_Delimiter);
if (delimiter_pos != std::string::npos) {
#ifdef __APPLE__
Will this require separate packaging for TorchServe Mac installables vs Linux version?
We're currently not planning to provide precompiled binaries but will rely on the build.sh script for installation. If we change this in the future, these macros will be resolved by the preprocessor during compilation, and we would require different packages for the different platforms.
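A minimal sketch of how such a platform macro resolves at compile time: exactly one branch survives preprocessing, so a single source tree yields platform-specific binaries. The constant name below is hypothetical, not from this PR.

```cpp
#include <iostream>
#include <string>

// Only one of these definitions survives the preprocessor,
// depending on the platform the backend is compiled for.
#ifdef __APPLE__
const std::string kSharedLibSuffix = ".dylib";  // macOS build
#else
const std::string kSharedLibSuffix = ".so";     // Linux build
#endif

int main() {
  std::cout << "handler suffix: " << kSharedLibSuffix << "\n";
  return 0;
}
```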
We can handle this in a separate PR; filed issue #2908 for tracking.
@mreso Thanks for this PR and the enhancements. For babyllama, do we still need to use the torchscripted option?
Please see a few minor comments inline.
@chauhang The babyllama example uses https://github.com/karpathy/llama2.c for the model execution and does not utilize TorchScript.
Description
This PR is a rebase of #2544, which adds a baby llama example to the cpp backend.
Additionally, it removes the framework-specific backends such as the TorchScriptBackend.
With this PR, no custom backend for frameworks like llama.cpp, vllm, or TorchScript will be necessary.
Instead, the handler .so file can be linked against whichever framework suits the current use case, as sketched below.
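A minimal, self-contained sketch of the mechanism this relies on: the backend loads a handler shared library at runtime, so the handler itself may link against any framework. The factory symbol name is hypothetical; only the library target name comes from this PR.

```cpp
#include <dlfcn.h>
#include <iostream>

int main() {
  // "libbabyllama_handler.so" matches the target renamed in this PR
  // ("Renaming llm_handler target to babyllama_handler"); on macOS the
  // suffix would be ".dylib". Build with -ldl on Linux.
  void* lib = dlopen("./libbabyllama_handler.so", RTLD_LAZY);
  if (lib == nullptr) {
    std::cerr << "dlopen failed: " << dlerror() << "\n";
    return 1;
  }
  // "createHandler" is a hypothetical factory symbol; the real entry
  // point is whatever the backend's handler ABI defines.
  using CreateFn = void* (*)();
  auto create = reinterpret_cast<CreateFn>(dlsym(lib, "createHandler"));
  if (create != nullptr) {
    void* handler = create();  // backend then drives pre/infer/postprocess
    (void)handler;
  }
  dlclose(lib);
  return 0;
}
```

Because the framework dependency lives entirely inside the handler's shared library, the core backend stays framework-agnostic.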
Fixes #(issue)
Type of change
Please delete options that are not relevant.
Feature/Issue validation/testing
Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.
Logs for Test A:
Checklist: