
Mentioning LOR. #264

Merged: 3 commits merged into main from add-lor on Nov 25, 2024
Conversation

@trentfowlercohere (Contributor) commented on Nov 22, 2024

This PR introduces a new section titled "Optimize your Inference Latencies" in the Amazon SageMaker setup guide. The section addresses latency issues that can arise from SageMaker endpoints' default random routing strategy, particularly in applications centered on generative AI.

It highlights the RoutingStrategy parameter available on the SageMaker platform, which lets users employ the 'least outstanding requests' (LOR) routing approach. This strategy has demonstrated improved latency performance across various scenarios, as referenced in the provided link; a configuration sketch follows the summary below.

  • New Section: Optimize your Inference Latencies
  • New Parameter: RoutingStrategy
  • New Strategy: 'least outstanding requests' (LOR)
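
A minimal sketch, assuming boto3 and an already-created SageMaker model, of how LOR routing might be enabled through the RoutingConfig block of an endpoint configuration. All resource names below are hypothetical placeholders, not values from this PR:

```python
import boto3

sagemaker = boto3.client("sagemaker")

# Create an endpoint configuration that overrides the default RANDOM routing
# with least outstanding requests (LOR) routing.
sagemaker.create_endpoint_config(
    EndpointConfigName="cohere-endpoint-config",  # hypothetical name
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": "cohere-model",          # hypothetical; must already exist
            "InstanceType": "ml.g5.xlarge",       # example instance type
            "InitialInstanceCount": 2,
            # Route each request to the instance with the fewest in-flight requests.
            "RoutingConfig": {"RoutingStrategy": "LEAST_OUTSTANDING_REQUESTS"},
        }
    ],
)

# Deploy the endpoint using that configuration.
sagemaker.create_endpoint(
    EndpointName="cohere-endpoint",               # hypothetical name
    EndpointConfigName="cohere-endpoint-config",
)
```

With more than one instance behind the endpoint, LOR sends each new request to the instance with the fewest outstanding requests, which is where the latency improvement over random routing comes from.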

@trentfowlercohere trentfowlercohere requested a review from a team as a code owner November 22, 2024 18:33
@trentfowlercohere trentfowlercohere merged commit 9152dc1 into main Nov 25, 2024
3 checks passed
@trentfowlercohere trentfowlercohere deleted the add-lor branch November 25, 2024 18:10

3 participants