Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEVPROD-12482 Provide rank value breakdown stats #8485

Merged
merged 11 commits into from
Nov 27, 2024

Conversation

hadjri
Copy link
Contributor

@hadjri hadjri commented Nov 18, 2024

DEVPROD-12482

Description

This PR granularizes the value computed by the queue sorting computation so that each sub-category of its computation can be tracked in Honeycomb. I also did a best-effort attempt to modularize the code and make it more understandable. This is not a functional change to how the sorting values themselves are computed.

With these changes, we can now create dashboards in Honeycomb that analyze how much impact each special case or our queue sorting algorithm is having on the overall state of the queue, and inspect if any of the impacts are vastly over or under-sized.

Testing

Luckily there are already many existing tests (plus a few new ones that I ran before and after this change) that should assert that the ranking algorithm has not changed.

Added new tests asserting that all of the total breakdown values are computed correctly in that their sums can be used to recompute the total sorting value.

Lastly, I also made sure that the Honeycomb dashboards can be created with these new values and that they make sense to interpret.

@hadjri hadjri requested a review from a team November 21, 2024 22:59
Copy link
Contributor

@bynn bynn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an awesome breakdown of what's going on in the planner
I don't have too much to say because this is functionally the same as what already exists except it's easier to read and much better commented.

Only have slight comment comments

// priority value that is used in the queue sorting value equation.
type PriorityBreakdown struct {
// InitialPriorityImpact represents how much of the total priority can be attributed to the
// original priority set on a task by a user.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think it doesn't have to be user set because it could be set by the project yaml

@@ -300,20 +316,19 @@ func (unit *Unit) info() unitInfo {
return info
}

// RankValue returns a point value for the tasks in the unit that can
// sortingValueBreakdown returns a point value for the tasks in the unit that can
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function returns the breakdown now


// computeRankValue computes the custom rank value for this unit, which will later be multiplied with the
// custom priority to compute a final value by which the unit will be sorted on in the queue.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we say 'computed' priority value to get the sorting value just to make it easier to correlate it with the code

// patches that have spent more time in the queue
// should get worked on first (because people are
// waiting on the results), and because FIFO feels
// Give patches a bump, over non-patches. patches that have spent more time in the queue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you didn't write this but can we get rid of the comma and capitalize the "Patches" after the period

@ablack12 ablack12 self-requested a review November 26, 2024 19:49
Copy link
Contributor

@ablack12 ablack12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, commenting but not blocking since I'll be out. Did you test this in staging with currently scheduled tasks by any chance? I'm a little bit wondering if changing cachedValue's format mid-run is going to break backwards compatibility (which could be fine if it just means things need to wait for the next scheduler job to get scheduled, just trying to think through it)

Prepare to be the SME on scheduling logic now that your name is the git blame 😂

MustHaveResults bool `bson:"must_have_results,omitempty" json:"must_have_results,omitempty"`
// SortingValueBreakdown is not persisted to the db, but stored in memory and passed to the task queue document.
// It contains information on what factors led to the overall queue ranking value for the task.
SortingValueBreakdown SortingValueBreakdown `bson:"sorting_value_breakdown" json:"sorting_value_breakdown"`
Copy link
Contributor

@ablack12 ablack12 Nov 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we actually just bson:"-" the bson value to make it more clear that we won't be persisting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point

TaskGroupImpact int64
// GenerateTaskImpact represents how much of the total priority can be attributed to a
// task having a generate.tasks command in it.
GenerateTaskImpact int64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extreme nit -- i think it might be more clear to call this GeneratorTaskImpact -- internally I think we usually refer to the generator task that way, and the generated tasks as generated, so generateTask is introducing a new vocabulary.


// SetSortingValueBreakdown saves a full breakdown which compartmentalizes each factor that played a role in computing the
// overall value used to sort it in the queue, and creates a honeycomb trace with this data to enable dashboards/analysis.
func (t *Task) SetSortingValueBreakdown(ctx context.Context, breakdown SortingValueBreakdown) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extreme nit -- could we call this "SetSortingValueBreakdownAttributes" ? without that, it's more in line with a function name that sets something in the DB, which this function doesn't do

attribute.Float64(fmt.Sprintf("%s.base_priority_pct", priorityBreakdownAttributePrefix), float64(breakdown.PriorityBreakdown.InitialPriorityImpact/breakdown.TotalValue*100)),
attribute.Float64(fmt.Sprintf("%s.task_group_pct", priorityBreakdownAttributePrefix), float64(breakdown.PriorityBreakdown.TaskGroupImpact/breakdown.TotalValue*100)),
attribute.Float64(fmt.Sprintf("%s.generate_task_pct", priorityBreakdownAttributePrefix), float64(breakdown.PriorityBreakdown.GenerateTaskImpact/breakdown.TotalValue*100)),
attribute.Float64(fmt.Sprintf("%s.commit_queue_pct", priorityBreakdownAttributePrefix), float64(breakdown.PriorityBreakdown.CommitQueueImpact/breakdown.TotalValue*100)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it already feels so sus how many values are both in rank and priority haha

Project string `bson:"project" json:"project"`
ExpectedDuration time.Duration `bson:"exp_dur" json:"exp_dur"`
Priority int64 `bson:"priority" json:"priority"`
SortingValueBreakdown task.SortingValueBreakdown `bson:"sorting_value_breakdown" json:"sorting_value_breakdown"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm mildly concerned that for long task queues, passing all this information around with every item is going to slow things down. Can we keep an eye on scheduler job timing after we deploy to see if this has an impact?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ultimately just a collection of integers so I think it should be alright, but yes I'll monitor

// computing this value is (custom_priority * custom_rankValue) + unit_length, where custom_priority
// and custom_rankValue are both derived from specific properties of the unit and various
// scheduler constants.
func (u *unitInfo) value() task.SortingValueBreakdown {
Copy link
Contributor

@ablack12 ablack12 Nov 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with this change but just for my understanding, doesn't this change how the value computation works? It seems like we add length throughout in the original implementation, and here we add it at the end (Although i see we also are using it throughout in this implementation too, i'm just missing where we add it at the end maybe)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I fully understand but the role length plays before and after this PR should be unchanged. Before, length is added to the original priority here, and is part of the various rank value equations (and is also added at the end) here.

In these new changes, length plays the same two roles, here and here, respectively.


return value
// computePriority computes the custom priority value for this unit, which will later be multiplied with the
// custom rank value to compute a final value by which the unit will be sorted on in the queue.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make a note that this destructively updates the passed in breakdown? (same with the other function)


// computeRankValue computes the custom rank value for this unit, which will later be multiplied with the
// custom priority to compute a final value by which the unit will be sorted on in the queue.
func (u *unitInfo) computeRankValue(unitLength int64, breakdown *task.SortingValueBreakdown) int64 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have to pass in unitLength separately, since it just comes from breakdown? If we just want the original unit length (like we update it at some point), we could just set like originalUnitLength = breakdown.TaskGroupLength at the beginning of this function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's true. Updated

@hadjri
Copy link
Contributor Author

hadjri commented Nov 27, 2024

Did you test this in staging with currently scheduled tasks by any chance? I'm a little bit wondering if changing cachedValue's format mid-run is going to break backwards compatibility (which could be fine if it just means things need to wait for the next scheduler job to get scheduled, just trying to think through it)

I just tested it and didn't observe any issues / errors. Once this deploys I'll be monitoring for a bit including the scheduler job.

@hadjri hadjri merged commit c7f9ffc into evergreen-ci:main Nov 27, 2024
8 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants