Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mentioning LOR. #264

Merged
merged 3 commits into from
Nov 25, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -115,6 +115,12 @@ To access Cohere's models on SageMaker Jumpstart, follow these steps:

If you have any questions about this process, reach out to [email protected].

## Optimize your Inference Latencies

By default, SageMaker endpoints have a random routing strategy. This means that requests coming to the model endpoints are forwarded to the machine learning instances randomly, which can cause latency issues in applications focused on generative AI. In 2023, the SageMaker platform introduced a `RoutingStrategy` parameter allowing you to use the ‘least outstanding requests’ (LOR) approach to routing. With LOR, SageMaker monitors the load of the instances behind your endpoint as well as the models or inference components that are deployed on each instance, then optimally routes requests to the instance that is best suited to serve it.

LOR has shown an improvement in latency under various conditions, and you can find more details [here](https://aws.amazon.com/blogs/machine-learning/minimize-real-time-inference-latency-by-using-amazon-sagemaker-routing-strategies/).
trentfowlercohere marked this conversation as resolved.
Show resolved Hide resolved

## Next Steps

With your selected configuration and Product ARN available, you now have everything you need to integrate with Cohere’s model offerings on SageMaker.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -101,6 +101,27 @@ result = co.chat(message="Write a LinkedIn post about starting a career in tech:
print(result)
```

## Access Via Amazon SageMaker Jumpstart

Cohere's models are also available on Amazon SageMaker Jumpstart, which makes it easy to access the models with just a few clicks.

To access Cohere's models on SageMaker Jumpstart, follow these steps:

- In the AWS Console, go to Amazon SageMaker and click `Studio`.
- Then, click `Open Studio`. If you don't see this option, you first need to create a user profile.
- This will bring you to the SageMaker Studio page. Look for `Prebuilt and automated solutions` and select `JumpStart`.
- A list of models will appear. To look for Cohere models, type "cohere" in the search bar.
- Select any Cohere model and you will find details about the model and links to further resources.
- You can try out the model by going to the `Notebooks` tab, where you can launch the notebook in JupyterLab.

If you have any questions about this process, reach out to [email protected].

## Optimize your Inference Latencies

By default, SageMaker endpoints have a random routing strategy. This means that requests coming to the model endpoints are forwarded to the machine learning instances randomly, which can cause latency issues in applications focused on generative AI. In 2023, the SageMaker platform introduced a `RoutingStrategy` parameter allowing you to use the ‘least outstanding requests’ (LOR) approach to routing. With LOR, SageMaker monitors the load of the instances behind your endpoint as well as the models or inference components that are deployed on each instance, then optimally routes requests to the instance that is best suited to serve it.

LOR has shown an improvement in latency under various conditions, and you can find more details [here](https://aws.amazon.com/blogs/machine-learning/minimize-real-time-inference-latency-by-using-amazon-sagemaker-routing-strategies/).

## Next Steps

With your selected configuration and Product ARN available, you now have everything you need to integrate with Cohere’s model offerings on SageMaker.
Expand Down
Loading