[RFE] Implement pod prioritization #977

Open

bdunne opened this issue Jun 26, 2023 · 7 comments · May be fixed by #978

Assignees

Labels

enhancement stale

Member

bdunne commented Jun 26, 2023

We have issues with pods being killed and rescheduled in busier environments. Unfortunately, postgres is just as likely to be killed as any worker pod. After a discussion with @Fryguy and @jrafanie, we think the design should be as follows:

1. Add RBAC permissions for the operator to read, list and write priorityClassNames
2. Add 3 items to the CRD for high, medium and low priorityClassName values
3. Assign class name values as follows:
   - If all values are specified in CR, use them
   - If no values are set, detect the cluster default. Set low to cluster default, medium = low + 100, high = medium + 100
4. Validate that values are reasonable:
   - High should not be more than 1,000,000,000 (use CRD JSON schema validation)
   - Error if high, medium & low are out of order (code validation)
   - Warn if low is less than cluster default? Warn if low is less than 0? (code validation)
5. Assign pod priorities:
   - High: postgres, memcached, kafka, httpd
   - Medium: UI & API, orchestrator, maybe operators if possible (may not work if the class names don't exist yet)
   - Low: all other workers
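
As a rough illustration of the value assignment, validation and tier mapping described above, here is a minimal Go sketch. It is not the operator's code: `Priorities`, `ResolvePriorities`, `PriorityFor` and the component name strings are hypothetical, the three CR fields are assumed to carry integer priority values, and two behaviours the proposal leaves open are filled in as assumptions (a partially set CR is rejected, and equal values count as out of order). The ceiling check is repeated in code only to keep the sketch self-contained; the proposal places it in CRD JSON schema validation.

```go
package main

import (
	"errors"
	"fmt"
)

// Values taken from the issue: the ceiling for "high" and the gap between tiers.
const (
	maxHigh int32 = 1_000_000_000
	step    int32 = 100
)

// Priorities is a hypothetical holder for the three resolved tier values.
type Priorities struct {
	Low, Medium, High int32
}

// highTier and mediumTier follow the assignment in the issue; the string keys
// are illustrative component names, not the operator's real identifiers.
var (
	highTier   = map[string]bool{"postgres": true, "memcached": true, "kafka": true, "httpd": true}
	mediumTier = map[string]bool{"ui": true, "api": true, "orchestrator": true}
)

// ResolvePriorities uses the CR values when all three are set and derives them
// from the cluster default when none are. A partially set CR is not covered by
// the issue; rejecting it is an assumption of this sketch.
func ResolvePriorities(low, medium, high *int32, clusterDefault int32) (Priorities, []string, error) {
	var p Priorities
	switch {
	case low != nil && medium != nil && high != nil:
		p = Priorities{Low: *low, Medium: *medium, High: *high}
	case low == nil && medium == nil && high == nil:
		p = Priorities{Low: clusterDefault, Medium: clusterDefault + step, High: clusterDefault + 2*step}
	default:
		return Priorities{}, nil, errors.New("set all three priority values or none")
	}

	// Hard errors: ceiling and ordering. Treating equal values as out of
	// order is an assumption.
	if p.High > maxHigh {
		return Priorities{}, nil, fmt.Errorf("high priority %d exceeds %d", p.High, maxHigh)
	}
	if p.Low >= p.Medium || p.Medium >= p.High {
		return Priorities{}, nil, fmt.Errorf("priorities out of order: %+v", p)
	}

	// Soft checks: the issue leaves these as open questions, so they only warn.
	var warnings []string
	if p.Low < clusterDefault {
		warnings = append(warnings, "low is below the cluster default")
	}
	if p.Low < 0 {
		warnings = append(warnings, "low is negative")
	}
	return p, warnings, nil
}

// PriorityFor returns the tier value for a component; anything not listed as
// high or medium is treated as a low-priority worker.
func PriorityFor(component string, p Priorities) int32 {
	switch {
	case highTier[component]:
		return p.High
	case mediumTier[component]:
		return p.Medium
	default:
		return p.Low
	}
}

func main() {
	// No CR values set, cluster default of 0: tiers are derived in steps of 100.
	p, warnings, err := ResolvePriorities(nil, nil, nil, 0)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(p, warnings, PriorityFor("postgres", p), PriorityFor("generic-worker", p))
}
```

Returning warnings separately from the error keeps the "warn" cases from blocking reconciliation, which matches how the proposal distinguishes them from the ordering error.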

bdunne added the enhancement label

bdunne self-assigned this

Member Author

bdunne commented Jun 26, 2023

@Fryguy @jrafanie throw 🍅 🍅

Fryguy added this to the Quinteros milestone

Member

Fryguy commented Jun 26, 2023

> High should not be more than 1,000,000,000 (use CRD JSON schema validation)

Good call. This keeps us under openshift defaults for critical values

$ oc get priorityclasses
NAME                      VALUE        GLOBAL-DEFAULT   AGE
openshift-user-critical   1000000000   false            89d
system-cluster-critical   2000000000   false            89d
system-node-critical      2000001000   false            89d

bdunne linked a pull request that will close this issue: [WIP] Pod Prioritization #978 (Draft)

miq-bot added the stale label

Member

miq-bot commented Oct 2, 2023

This issue has been automatically marked as stale because it has not been updated for at least 3 months.

If you can still reproduce this issue on the current release or on master, please reply with all of the information you have about it in order to keep the issue open.

Thank you for all your contributions! More information about the ManageIQ triage process can be found in the triage process documentation.

Fryguy removed this from the Quinteros milestone

Fryguy added this to Roadmap

Fryguy moved this to To do in Roadmap
