cohere-ai · trentfowlercohere · Nov 25, 2024 · Nov 22, 2024 · Nov 22, 2024 · Nov 25, 2024
@@ -115,6 +115,12 @@ To access Cohere's models on SageMaker Jumpstart, follow these steps:
 
 If you have any questions about this process, reach out to [email protected].
 
+## Optimize your Inference Latencies
+
+By default, SageMaker endpoints use a random routing strategy: each incoming request is forwarded to a randomly chosen machine learning instance behind the endpoint, which can cause latency issues in generative AI applications. In 2023, SageMaker introduced a `RoutingStrategy` parameter that lets you use ‘least outstanding requests’ (LOR) routing instead. With LOR, SageMaker monitors the load of the instances behind your endpoint, as well as the models or inference components deployed on each instance, and routes each request to the instance best suited to serve it.
+
+LOR has been shown to improve latency under various conditions. For details, see the AWS blog post [Minimize real-time inference latency by using Amazon SageMaker routing strategies](https://aws.amazon.com/blogs/machine-learning/minimize-real-time-inference-latency-by-using-amazon-sagemaker-routing-strategies/).
+
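As a rough illustration of the section above, the following sketch shows one way the routing strategy might be set when creating an endpoint configuration with `boto3`. It assumes the `RoutingConfig` field on a production variant of `create_endpoint_config`; the helper names (`build_endpoint_config`, `create_endpoint_config`), the config and model names, the instance type, and the instance count are placeholders chosen for this example and are not part of the original guide.

```python
from typing import Any

# Routing strategies assumed to be accepted by SageMaker's RoutingConfig.
VALID_STRATEGIES = {"RANDOM", "LEAST_OUTSTANDING_REQUESTS"}


def build_endpoint_config(
    config_name: str,
    model_name: str,
    instance_type: str,
    instance_count: int = 2,
    routing_strategy: str = "LEAST_OUTSTANDING_REQUESTS",
) -> dict:
    """Build the request body for an endpoint config with an explicit routing strategy.

    Hypothetical helper for illustration; all argument values are placeholders.
    """
    if routing_strategy not in VALID_STRATEGIES:
        raise ValueError(f"Unsupported routing strategy: {routing_strategy}")
    request: dict[str, Any] = {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InstanceType": instance_type,
                "InitialInstanceCount": instance_count,
                # Routing only matters when more than one instance serves traffic.
                "RoutingConfig": {"RoutingStrategy": routing_strategy},
            }
        ],
    }
    return request


def create_endpoint_config(request: dict) -> dict:
    """Send the request to SageMaker. Requires boto3 and valid AWS credentials."""
    import boto3  # imported here so the builder above works without boto3 installed

    client = boto3.client("sagemaker")
    return client.create_endpoint_config(**request)


if __name__ == "__main__":
    # Placeholder names; replace with your own model and configuration.
    request = build_endpoint_config(
        config_name="cohere-lor-config",
        model_name="my-cohere-model",
        instance_type="ml.g5.xlarge",
    )
    print(request["ProductionVariants"][0]["RoutingConfig"])
```

Keeping the request builder separate from the AWS call makes the configuration easy to inspect before anything is deployed. An endpoint created from this configuration would then use LOR rather than the default random routing.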
 ## Next Steps
 
 With your selected configuration and Product ARN available, you now have everything you need to integrate with Cohere’s model offerings on SageMaker. 

@@ -101,6 +101,27 @@ result = co.chat(message="Write a LinkedIn post about starting a career in tech:
 print(result)
 ```
 
+## Access Via Amazon SageMaker JumpStart
+
+Cohere's models are also available on Amazon SageMaker JumpStart, which makes it easy to access the models with just a few clicks.
+
+To access Cohere's models on SageMaker JumpStart, follow these steps:
+
+- In the AWS Console, go to Amazon SageMaker and click `Studio`.
+- Then, click `Open Studio`. If you don't see this option, you first need to create a user profile.
+- This will bring you to the SageMaker Studio page. Look for `Prebuilt and automated solutions` and select `JumpStart`.
+- A list of models will appear. To look for Cohere models, type "cohere" in the search bar.
+- Select any Cohere model and you will find details about the model and links to further resources.
+- You can try out the model by going to the `Notebooks` tab, where you can launch the notebook in JupyterLab.
+
+If you have any questions about this process, reach out to [email protected].
+
+## Optimize your Inference Latencies
+
+By default, SageMaker endpoints use a random routing strategy: each incoming request is forwarded to a randomly chosen machine learning instance behind the endpoint, which can cause latency issues in generative AI applications. In 2023, SageMaker introduced a `RoutingStrategy` parameter that lets you use ‘least outstanding requests’ (LOR) routing instead. With LOR, SageMaker monitors the load of the instances behind your endpoint, as well as the models or inference components deployed on each instance, and routes each request to the instance best suited to serve it.
+
+LOR has been shown to improve latency under various conditions. For details, see the AWS blog post [Minimize real-time inference latency by using Amazon SageMaker routing strategies](https://aws.amazon.com/blogs/machine-learning/minimize-real-time-inference-latency-by-using-amazon-sagemaker-routing-strategies/).
+
 ## Next Steps
 
 With your selected configuration and Product ARN available, you now have everything you need to integrate with Cohere’s model offerings on SageMaker.