From 7cfde87a6a82fee7c9d93bfc8b7da4aeedd6ea8d Mon Sep 17 00:00:00 2001
From: Trent Fowler
Date: Fri, 22 Nov 2024 11:33:00 -0700
Subject: [PATCH 1/2] Mentioning LOR.

---
 .../cohere-on-aws/amazon-sagemaker-setup-guide.mdx | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fern/pages/deployment-options/cohere-on-aws/amazon-sagemaker-setup-guide.mdx b/fern/pages/deployment-options/cohere-on-aws/amazon-sagemaker-setup-guide.mdx
index 83b0f7d5..5e564e63 100644
--- a/fern/pages/deployment-options/cohere-on-aws/amazon-sagemaker-setup-guide.mdx
+++ b/fern/pages/deployment-options/cohere-on-aws/amazon-sagemaker-setup-guide.mdx
@@ -115,6 +115,9 @@ To access Cohere's models on SageMaker Jumpstart, follow these steps:
 
 If you have any questions about this process, reach out to support@cohere.com.
 
+## Optimize your Inference Latencies
+By default, SageMaker endpoints have a random routing strategy, which can cause latency issues in applications focused on generative AI. As of 2023, the SageMaker platform supports a `RoutingStrategy` parameter that allows you to use the 'least outstanding requests' (LOR) approach to routing. LOR has shown an improvement in latency under various conditions, and you can find more details [here](https://aws.amazon.com/blogs/machine-learning/minimize-real-time-inference-latency-by-using-amazon-sagemaker-routing-strategies/).
+
 ## Next Steps
 
 With your selected configuration and Product ARN available, you now have everything you need to integrate with Cohere’s model offerings on SageMaker.

From fee99d5f6c1eeaffed15b11a08d05826510771ef Mon Sep 17 00:00:00 2001
From: Trent Fowler
Date: Fri, 22 Nov 2024 12:04:17 -0700
Subject: [PATCH 2/2] Updating language and adding it to the v2 docs.

---
 .../amazon-sagemaker-setup-guide.mdx |  5 ++++-
 .../amazon-sagemaker-setup-guide.mdx | 21 +++++++++++++++++++
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/fern/pages/deployment-options/cohere-on-aws/amazon-sagemaker-setup-guide.mdx b/fern/pages/deployment-options/cohere-on-aws/amazon-sagemaker-setup-guide.mdx
index 5e564e63..7252b484 100644
--- a/fern/pages/deployment-options/cohere-on-aws/amazon-sagemaker-setup-guide.mdx
+++ b/fern/pages/deployment-options/cohere-on-aws/amazon-sagemaker-setup-guide.mdx
@@ -116,7 +116,10 @@ To access Cohere's models on SageMaker Jumpstart, follow these steps:
 If you have any questions about this process, reach out to support@cohere.com.
 
 ## Optimize your Inference Latencies
-By default, SageMaker endpoints have a random routing strategy, which can cause latency issues in applications focused on generative AI. As of 2023, the SageMaker platform supports a `RoutingStrategy` parameter that allows you to use the 'least outstanding requests' (LOR) approach to routing. LOR has shown an improvement in latency under various conditions, and you can find more details [here](https://aws.amazon.com/blogs/machine-learning/minimize-real-time-inference-latency-by-using-amazon-sagemaker-routing-strategies/).
+
+By default, SageMaker endpoints have a random routing strategy. This means that requests coming to the model endpoints are forwarded to the machine learning instances randomly, which can cause latency issues in applications focused on generative AI. In 2023, the SageMaker platform introduced a `RoutingStrategy` parameter allowing you to use the ‘least outstanding requests’ (LOR) approach to routing. With LOR, SageMaker monitors the load of the instances behind your endpoint as well as the models or inference components that are deployed on each instance, then optimally routes requests to the instance that is best suited to serve it.
+
+LOR has shown an improvement in latency under various conditions, and you can find more details [here](https://aws.amazon.com/blogs/machine-learning/minimize-real-time-inference-latency-by-using-amazon-sagemaker-routing-strategies/).
 
 ## Next Steps
 
diff --git a/fern/pages/v2/deployment-options/cohere-on-aws/amazon-sagemaker-setup-guide.mdx b/fern/pages/v2/deployment-options/cohere-on-aws/amazon-sagemaker-setup-guide.mdx
index fdbf9ec1..76f97392 100644
--- a/fern/pages/v2/deployment-options/cohere-on-aws/amazon-sagemaker-setup-guide.mdx
+++ b/fern/pages/v2/deployment-options/cohere-on-aws/amazon-sagemaker-setup-guide.mdx
@@ -101,6 +101,27 @@ result = co.chat(message="Write a LinkedIn post about starting a career in tech:
 print(result)
 ```
 
+## Access Via Amazon SageMaker Jumpstart
+
+Cohere's models are also available on Amazon SageMaker Jumpstart, which makes it easy to access the models with just a few clicks.
+
+To access Cohere's models on SageMaker Jumpstart, follow these steps:
+
+- In the AWS Console, go to Amazon SageMaker and click `Studio`.
+- Then, click `Open Studio`. If you don't see this option, you first need to create a user profile.
+- This will bring you to the SageMaker Studio page. Look for `Prebuilt and automated solutions` and select `JumpStart`.
+- A list of models will appear. To look for Cohere models, type "cohere" in the search bar.
+- Select any Cohere model and you will find details about the model and links to further resources.
+- You can try out the model by going to the `Notebooks` tab, where you can launch the notebook in JupyterLab.
+
+If you have any questions about this process, reach out to support@cohere.com.
+
+## Optimize your Inference Latencies
+
+By default, SageMaker endpoints have a random routing strategy. This means that requests coming to the model endpoints are forwarded to the machine learning instances randomly, which can cause latency issues in applications focused on generative AI. In 2023, the SageMaker platform introduced a `RoutingStrategy` parameter allowing you to use the ‘least outstanding requests’ (LOR) approach to routing. With LOR, SageMaker monitors the load of the instances behind your endpoint as well as the models or inference components that are deployed on each instance, then optimally routes requests to the instance that is best suited to serve it.
+
+LOR has shown an improvement in latency under various conditions, and you can find more details [here](https://aws.amazon.com/blogs/machine-learning/minimize-real-time-inference-latency-by-using-amazon-sagemaker-routing-strategies/).
+
 ## Next Steps
 
 With your selected configuration and Product ARN available, you now have everything you need to integrate with Cohere’s model offerings on SageMaker.
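For anyone reviewing this change who wants to try LOR routing, the snippet below is a minimal sketch of how the `RoutingStrategy` parameter described in the new section can be set when creating an endpoint configuration with `boto3`. The model, config, and endpoint names, as well as the instance type and count, are placeholders, and the sketch assumes you have already created a SageMaker model from a Cohere model package you are subscribed to.

```python
import boto3

# Placeholder names -- substitute your own model, config, and endpoint names.
MODEL_NAME = "my-cohere-model"            # an existing SageMaker Model built from a Cohere package
ENDPOINT_CONFIG_NAME = "my-cohere-lor-config"
ENDPOINT_NAME = "my-cohere-endpoint"

sm = boto3.client("sagemaker")

# Opt into 'least outstanding requests' routing instead of the default RANDOM strategy.
sm.create_endpoint_config(
    EndpointConfigName=ENDPOINT_CONFIG_NAME,
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": MODEL_NAME,
            "InstanceType": "ml.g5.xlarge",      # placeholder; use an instance type the model supports
            "InitialInstanceCount": 2,           # LOR only changes behavior with more than one instance
            "RoutingConfig": {"RoutingStrategy": "LEAST_OUTSTANDING_REQUESTS"},
        }
    ],
)

# Create the endpoint from the new configuration (or update an existing endpoint to use it).
sm.create_endpoint(EndpointName=ENDPOINT_NAME, EndpointConfigName=ENDPOINT_CONFIG_NAME)
```

Because the routing strategy lives in the endpoint configuration, an endpoint that already exists keeps random routing until it is updated to a configuration that specifies LOR.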