---
title: Deploy Your Own Finetuned Command-R-0824 Model from AWS Marketplace
slug: docs/bring-your-finetuned-models-to-sagemaker
hidden: false
description: >-
  This document provides a guide for bringing your own finetuned models to Amazon SageMaker.
image: ../../../../assets/images/8dbcb80-cohere_meta_image.jpg
keywords: 'Fine-tuned LLMs, Cohere on AWS, language models on AWS, Amazon SageMaker'
createdAt: 'Fri Sep 27 2024'
updatedAt: ''
---

This document shows you how to deploy your own finetuned HuggingFace Command-R model, [CohereForAI/c4ai-command-r-08-2024](https://huggingface.co/CohereForAI/c4ai-command-r-08-2024), using Amazon SageMaker. More specifically, assuming you already have adapter weights or merged weights from your own finetuning of [CohereForAI/c4ai-command-r-08-2024](https://huggingface.co/CohereForAI/c4ai-command-r-08-2024), we will show you how to:

- Merge the adapter weights into the base model's weights, if you bring only the adapter weights;
- Export the merged weights to the TensorRT-LLM inference engine using Amazon SageMaker;
- Deploy the engine as a SageMaker endpoint to serve your business use cases.

You can also find a [companion notebook](https://github.com/cohere-ai/cohere-aws/blob/c946a2c798baf4fb42a3ecfc29f197ba62ee9586/notebooks/sagemaker/Deploy%20your%20own%20finetuned%20command-r-0824.ipynb) with working code samples.

## Prerequisites

- Ensure that the IAM role you use has `AmazonSageMakerFullAccess`.
- To deploy your model successfully, ensure that either:
  - Your IAM role has the following three permissions, and you have authority to make AWS Marketplace subscriptions in the AWS account you use (one way to check these permissions is sketched below):
    - `aws-marketplace:ViewSubscriptions`
    - `aws-marketplace:Unsubscribe`
    - `aws-marketplace:Subscribe`
  - Or, your AWS account already has a subscription to the packages for Cohere Command R Bring Your Own Finetuning. If so, you can skip the "Subscribe to the bring your own finetuning algorithm" step below.
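
If you're unsure whether your role carries these permissions, one way to check is IAM's policy simulator via boto3. This is a minimal sketch, not part of the companion notebook, and the `role_arn` placeholder below is illustrative:

```Python PYTHON
import boto3

# Hypothetical placeholder: the ARN of the IAM role you plan to use
role_arn = "<your_iam_role_arn>"

iam = boto3.client("iam")

# Ask the IAM policy simulator whether the role is allowed each action
response = iam.simulate_principal_policy(
    PolicySourceArn=role_arn,
    ActionNames=[
        "aws-marketplace:ViewSubscriptions",
        "aws-marketplace:Unsubscribe",
        "aws-marketplace:Subscribe",
    ],
)
for result in response["EvaluationResults"]:
    print(result["EvalActionName"], "->", result["EvalDecision"])
```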

**NOTE:** If you're running the companion notebook, be aware that it contains elements that render correctly only in a Jupyter interface, so you should open it from an Amazon SageMaker Notebook Instance or Amazon SageMaker Studio.

## Step 1: Subscribe to the bring your own finetuning algorithm

To subscribe to the algorithm:

- Open the algorithm listing page, Cohere Command R Bring Your Own Finetuning.
- On the AWS Marketplace listing, click the **Continue to Subscribe** button.
- On the **Subscribe to this software** page, review the offer and click **Accept Offer** if you and your organization agree with the EULA, pricing, and support terms. On the **Configure and launch** page, make sure the ARN displayed for your region matches the ARN you will use below.

## Step 2: Preliminary setup

First, let's install the Python packages and import them.

```TEXT
pip install --upgrade cohere-aws
```

```Python PYTHON
import os
import sagemaker as sage

from cohere_aws import Client
from sagemaker.s3 import S3Uploader
```

Make sure you have access to the resources in your AWS account. For example, you can configure an AWS profile with the command `aws configure sso` (see [here](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-sso.html#cli-configure-sso-configure)) and run the command below to set the environment variable `AWS_PROFILE` to your profile name.

```Python PYTHON
# Change "<aws_profile>" to your own AWS profile name
os.environ["AWS_PROFILE"] = "<aws_profile>"
```

Finally, you need to set all the following variables using your own information. Avoid adding a trailing slash to these paths; otherwise, some steps may not work correctly.

```Python PYTHON
# The AWS region
region = "<region>"

# The ARN of the bring your own finetuning algorithm
arn = "<arn>"

# The local directory of your adapter weights. No need to specify this
# if you bring your own merged weights
adapter_weights_dir = "<adapter_weights_dir>"

# The local directory where you want to save the merged weights. Or the
# local directory of your own merged weights, if you bring merged weights
merged_weights_dir = "<merged_weights_dir>"

# The S3 directory where you want to save the merged weights
s3_checkpoint_dir = "<s3_checkpoint_dir>"

# The S3 directory where you want to save the exported TensorRT-LLM engine.
# Make sure you do not reuse the same S3 directory across multiple runs
s3_output_dir = "<s3_output_dir>"

# The name of the export
export_name = "<export_name>"

# The name of the SageMaker endpoint
endpoint_name = "<endpoint_name>"
```
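
As a small optional guard against the trailing-slash issue mentioned above, you can normalize the path variables before using them. This is an illustrative snippet, not part of the companion notebook:

```Python PYTHON
# Strip any accidental trailing slashes from the directory paths
adapter_weights_dir = adapter_weights_dir.rstrip("/")
merged_weights_dir = merged_weights_dir.rstrip("/")
s3_checkpoint_dir = s3_checkpoint_dir.rstrip("/")
s3_output_dir = s3_output_dir.rstrip("/")
```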

## Step 3: Get the merged weights

Assuming you used HuggingFace's [PEFT](https://github.com/huggingface/peft) to finetune [CohereForAI/c4ai-command-r-08-2024](https://huggingface.co/CohereForAI/c4ai-command-r-08-2024) and obtained adapter weights, you can merge your adapter weights into the base model weights to get the merged weights, as shown below. Skip this step if you already have the merged weights.

```Python PYTHON
import torch

from peft import PeftModel
from transformers import CohereForCausalLM


def load_and_merge_model(base_model_name_or_path: str, adapter_weights_dir: str):
    """
    Load the base model and the model finetuned by PEFT, and merge the adapter
    weights into the base weights to get a model with merged weights
    """
    base_model = CohereForCausalLM.from_pretrained(base_model_name_or_path)
    peft_model = PeftModel.from_pretrained(base_model, adapter_weights_dir)
    merged_model = peft_model.merge_and_unload()
    return merged_model


def save_hf_model(output_dir: str, model, tokenizer=None, args=None):
    """
    Save a HuggingFace model (and optionally the tokenizer as well as
    additional args) to a local directory
    """
    os.makedirs(output_dir, exist_ok=True)
    model.save_pretrained(output_dir, state_dict=None, safe_serialization=True)
    if tokenizer is not None:
        tokenizer.save_pretrained(output_dir)
    if args is not None:
        torch.save(args, os.path.join(output_dir, "training_args.bin"))


# Get the merged model from the adapter weights
merged_model = load_and_merge_model(
    "CohereForAI/c4ai-command-r-08-2024", adapter_weights_dir
)

# Save the merged weights to your local directory
save_hf_model(merged_weights_dir, merged_model)
```
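
If your finetuned model also relies on tokenizer files, you can save the tokenizer alongside the merged weights via the `tokenizer` parameter of `save_hf_model` above. A minimal sketch, assuming you used the base model's tokenizer unchanged:

```Python PYTHON
from transformers import AutoTokenizer

# Load the base model's tokenizer and save it next to the merged weights
tokenizer = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-08-2024")
save_hf_model(merged_weights_dir, merged_model, tokenizer=tokenizer)
```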

## Step 4: Upload the merged weights to S3

```Python PYTHON
# Upload the local merged weights to the S3 checkpoint directory
sess = sage.Session()
merged_weights = S3Uploader.upload(
    merged_weights_dir, s3_checkpoint_dir, sagemaker_session=sess
)
print("merged_weights", merged_weights)
```

## Step 5: Export the merged weights to the TensorRT-LLM inference engine

Create a Cohere client and use it to export the merged weights to the TensorRT-LLM inference engine. The exported engine will be stored as a tar file, `{s3_output_dir}/{export_name}.tar.gz`, in S3.

```Python PYTHON
co = Client(region_name=region)
co.export_finetune(
    arn=arn,
    name=export_name,
    s3_checkpoint_dir=s3_checkpoint_dir,
    s3_output_dir=s3_output_dir,
    instance_type="ml.p4de.24xlarge",
    role="ServiceRoleSagemaker",
)
```
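
If you want to confirm that the engine tar file landed in S3 before moving on, one option is to list the output directory with the SageMaker SDK. This is an illustrative check, not part of the official guide, reusing the session from Step 4:

```Python PYTHON
from sagemaker.s3 import S3Downloader

# List the contents of the S3 output directory; the export should appear
# as {export_name}.tar.gz
print(S3Downloader.list(s3_output_dir, sagemaker_session=sess))
```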

## Step 6: Create an endpoint for inference from the exported engine

The Cohere client provides a built-in method to create an endpoint for inference, which will automatically deploy the model from the TensorRT-LLM engine you just exported.

```Python PYTHON
co.create_endpoint(
    arn=arn,
    endpoint_name=endpoint_name,
    s3_models_dir=s3_output_dir,
    recreate=True,
    instance_type="ml.p4de.24xlarge",
    role="ServiceRoleSagemaker",
)
```

## Step 7: Perform real-time inference by calling the endpoint

Now, you can perform real-time inference by calling the endpoint you just deployed.

```Python PYTHON
# If the endpoint is already deployed, you can directly connect to it
co.connect_to_endpoint(endpoint_name=endpoint_name)

message = "Classify the following text as either very negative, negative, neutral, positive or very positive: mr. deeds is , as comedy goes , very silly -- and in the best way."
result = co.chat(message=message)
print(result)
```

You can also evaluate your finetuned model using an evaluation dataset. The following is an example with the ScienceQA dataset:

```Python PYTHON
import json

from tqdm import tqdm

eval_data_path = "<path_to_scienceQA_eval.jsonl>"

# Compare the model's answer to the reference answer for each example
total = 0
correct = 0
with open(eval_data_path) as f:
    for line in tqdm(f.readlines()):
        total += 1
        question_answer_json = json.loads(line)
        question = question_answer_json["messages"][0]["content"]
        answer = question_answer_json["messages"][1]["content"]
        model_ans = co.chat(message=question, temperature=0).text
        if model_ans == answer:
            correct += 1

print(f"Accuracy of finetuned model is {correct / total:.3f}")
```

## Step 8: Delete the endpoint (optional)

After you have finished performing inference, you can delete the deployed endpoint to avoid ongoing charges.

```Python PYTHON
co.delete_endpoint()
co.close()
```

## Step 9: Unsubscribe from the listing (optional)

If you would like to unsubscribe from the model package, follow these steps. Before you cancel the subscription, ensure that you do not have any [deployable models](https://console.aws.amazon.com/sagemaker/home#/models) created from the model package or using the algorithm.

**Note:** You can find this information by looking at the container name associated with the model.

Here's how you do that:

- Navigate to the **Machine Learning** tab on the [Your Software subscriptions](https://aws.amazon.com/marketplace/ai/library?productType=ml&ref_=mlmp_gitdemo_indust) page;
- Locate the listing that you want to cancel the subscription for, and then choose **Cancel subscription**.