DEVPROD-12482 Provide rank value breakdown stats #8485

hadjri · 2024-11-18T22:22:40Z

Description

This PR granularizes the value computed by the queue sorting computation so that each sub-category of its computation can be tracked in Honeycomb. I also did a best-effort attempt to modularize the code and make it more understandable. This is not a functional change to how the sorting values themselves are computed.

With these changes, we can now create dashboards in Honeycomb that analyze how much impact each special case or our queue sorting algorithm is having on the overall state of the queue, and inspect if any of the impacts are vastly over or under-sized.

Testing

Luckily there are already many existing tests (plus a few new ones that I ran before and after this change) that should assert that the ranking algorithm has not changed.

Added new tests asserting that all of the total breakdown values are computed correctly in that their sums can be used to recompute the total sorting value.

Lastly, I also made sure that the Honeycomb dashboards can be created with these new values and that they make sense to interpret.

# Conflicts: # model/task_queue.go

bynn

This is an awesome breakdown of what's going on in the planner
I don't have too much to say because this is functionally the same as what already exists except it's easier to read and much better commented.

Only have slight comment comments

bynn · 2024-11-25T23:43:04Z

model/task/task.go

+// priority value that is used in the queue sorting value equation.
+type PriorityBreakdown struct {
+	// InitialPriorityImpact represents how much of the total priority can be attributed to the
+	// original priority set on a task by a user.


nit: I think it doesn't have to be user set because it could be set by the project yaml

bynn · 2024-11-26T00:02:06Z

scheduler/planner.go

@@ -300,20 +316,19 @@ func (unit *Unit) info() unitInfo {
 	return info
 }

-// RankValue returns a point value for the tasks in the unit that can
+// sortingValueBreakdown returns a point value for the tasks in the unit that can


this function returns the breakdown now

bynn · 2024-11-26T00:13:39Z

scheduler/planner.go


+// computeRankValue computes the custom rank value for this unit, which will later be multiplied with the
+// custom priority to compute a final value by which the unit will be sorted on in the queue.


nit: can we say 'computed' priority value to get the sorting value just to make it easier to correlate it with the code

bynn · 2024-11-26T00:14:11Z

scheduler/planner.go

-		// patches that have spent more time in the queue
-		// should get worked on first (because people are
-		// waiting on the results), and because FIFO feels
+		// Give patches a bump, over non-patches. patches that have spent more time in the queue


nit: you didn't write this but can we get rid of the comma and capitalize the "Patches" after the period

ablack12

Overall LGTM, commenting but not blocking since I'll be out. Did you test this in staging with currently scheduled tasks by any chance? I'm a little bit wondering if changing cachedValue's format mid-run is going to break backwards compatibility (which could be fine if it just means things need to wait for the next scheduler job to get scheduled, just trying to think through it)

Prepare to be the SME on scheduling logic now that your name is the git blame 😂

ablack12 · 2024-11-27T15:35:34Z

model/task/task.go

-	MustHaveResults   bool   `bson:"must_have_results,omitempty" json:"must_have_results,omitempty"`
+	// SortingValueBreakdown is not persisted to the db, but stored in memory and passed to the task queue document.
+	// It contains information on what factors led to the overall queue ranking value for the task.
+	SortingValueBreakdown SortingValueBreakdown `bson:"sorting_value_breakdown" json:"sorting_value_breakdown"`


could we actually just bson:"-" the bson value to make it more clear that we won't be persisting?

ablack12 · 2024-11-27T15:37:11Z

model/task/task.go

+	TaskGroupImpact int64
+	// GenerateTaskImpact represents how much of the total priority can be attributed to a
+	// task having a generate.tasks command in it.
+	GenerateTaskImpact int64


extreme nit -- i think it might be more clear to call this GeneratorTaskImpact -- internally I think we usually refer to the generator task that way, and the generated tasks as generated, so generateTask is introducing a new vocabulary.

ablack12 · 2024-11-27T15:38:28Z

model/task/task.go

+
+// SetSortingValueBreakdown saves a full breakdown which compartmentalizes each factor that played a role in computing the
+// overall value used to sort it in the queue, and creates a honeycomb trace with this data to enable dashboards/analysis.
+func (t *Task) SetSortingValueBreakdown(ctx context.Context, breakdown SortingValueBreakdown) {


extreme nit -- could we call this "SetSortingValueBreakdownAttributes" ? without that, it's more in line with a function name that sets something in the DB, which this function doesn't do

ablack12 · 2024-11-27T15:40:56Z

model/task/task.go

+		attribute.Float64(fmt.Sprintf("%s.base_priority_pct", priorityBreakdownAttributePrefix), float64(breakdown.PriorityBreakdown.InitialPriorityImpact/breakdown.TotalValue*100)),
+		attribute.Float64(fmt.Sprintf("%s.task_group_pct", priorityBreakdownAttributePrefix), float64(breakdown.PriorityBreakdown.TaskGroupImpact/breakdown.TotalValue*100)),
+		attribute.Float64(fmt.Sprintf("%s.generate_task_pct", priorityBreakdownAttributePrefix), float64(breakdown.PriorityBreakdown.GenerateTaskImpact/breakdown.TotalValue*100)),
+		attribute.Float64(fmt.Sprintf("%s.commit_queue_pct", priorityBreakdownAttributePrefix), float64(breakdown.PriorityBreakdown.CommitQueueImpact/breakdown.TotalValue*100)),


it already feels so sus how many values are both in rank and priority haha

ablack12 · 2024-11-27T15:41:51Z

model/task_queue.go

+	Project               string                     `bson:"project" json:"project"`
+	ExpectedDuration      time.Duration              `bson:"exp_dur" json:"exp_dur"`
+	Priority              int64                      `bson:"priority" json:"priority"`
+	SortingValueBreakdown task.SortingValueBreakdown `bson:"sorting_value_breakdown" json:"sorting_value_breakdown"`


I'm mildly concerned that for long task queues, passing all this information around with every item is going to slow things down. Can we keep an eye on scheduler job timing after we deploy to see if this has an impact?

It's ultimately just a collection of integers so I think it should be alright, but yes I'll monitor

ablack12 · 2024-11-27T15:47:49Z

scheduler/planner.go

+// computing this value is (custom_priority * custom_rankValue) + unit_length, where custom_priority
+// and custom_rankValue are both derived from specific properties of the unit and various
+// scheduler constants.
+func (u *unitInfo) value() task.SortingValueBreakdown {


I'm fine with this change but just for my understanding, doesn't this change how the value computation works? It seems like we add length throughout in the original implementation, and here we add it at the end (Although i see we also are using it throughout in this implementation too, i'm just missing where we add it at the end maybe)

I'm not sure if I fully understand but the role length plays before and after this PR should be unchanged. Before, length is added to the original priority here, and is part of the various rank value equations (and is also added at the end) here.

In these new changes, length plays the same two roles, here and here, respectively.

ablack12 · 2024-11-27T15:49:02Z

scheduler/planner.go


-	return value
+// computePriority computes the custom priority value for this unit, which will later be multiplied with the
+// custom rank value to compute a final value by which the unit will be sorted on in the queue.


Can we make a note that this destructively updates the passed in breakdown? (same with the other function)

ablack12 · 2024-11-27T15:49:56Z

scheduler/planner.go


+// computeRankValue computes the custom rank value for this unit, which will later be multiplied with the
+// custom priority to compute a final value by which the unit will be sorted on in the queue.
+func (u *unitInfo) computeRankValue(unitLength int64, breakdown *task.SortingValueBreakdown) int64 {


do we have to pass in unitLength separately, since it just comes from breakdown? If we just want the original unit length (like we update it at some point), we could just set like originalUnitLength = breakdown.TaskGroupLength at the beginning of this function

Yes that's true. Updated

hadjri · 2024-11-27T19:14:02Z

Did you test this in staging with currently scheduled tasks by any chance? I'm a little bit wondering if changing cachedValue's format mid-run is going to break backwards compatibility (which could be fine if it just means things need to wait for the next scheduler job to get scheduled, just trying to think through it)

I just tested it and didn't observe any issues / errors. Once this deploys I'll be monitoring for a bit including the scheduler job.

hadjri added 10 commits November 18, 2024 17:22

DEVPROD-12482 Provide rank value breakdown stats

66723aa

Comments

e45f4b1

Merge branch 'main' into DEVPROD-12482

52aaf02

Merge branch 'main' into DEVPROD-12482

149f3a1

# Conflicts: # model/task_queue.go

Remove analysis file

51aa499

DEVPROD-12086 Best-effort refactor

fb16d21

Best-effort refactor

4b09f2c

fix test

2c108c8

Document better and compartmentalize rank vs priority

791c576

Document better and compartmentalize rank vs priority

7d8c4a2

hadjri requested a review from a team November 21, 2024 22:59

bynn approved these changes Nov 26, 2024

View reviewed changes

ablack12 self-requested a review November 26, 2024 19:49

ablack12 reviewed Nov 27, 2024

View reviewed changes

CR

fb7c0f0

hadjri merged commit c7f9ffc into evergreen-ci:main Nov 27, 2024
8 of 10 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DEVPROD-12482 Provide rank value breakdown stats #8485

DEVPROD-12482 Provide rank value breakdown stats #8485

hadjri commented Nov 18, 2024 •

edited

Loading

bynn left a comment

bynn Nov 25, 2024

bynn Nov 26, 2024

bynn Nov 26, 2024

bynn Nov 26, 2024

ablack12 left a comment

ablack12 Nov 27, 2024 •

edited

Loading

hadjri Nov 27, 2024

ablack12 Nov 27, 2024

ablack12 Nov 27, 2024

ablack12 Nov 27, 2024

ablack12 Nov 27, 2024

hadjri Nov 27, 2024

ablack12 Nov 27, 2024 •

edited

Loading

hadjri Nov 27, 2024

ablack12 Nov 27, 2024

ablack12 Nov 27, 2024

hadjri Nov 27, 2024

hadjri commented Nov 27, 2024


		// computeRankValue computes the custom rank value for this unit, which will later be multiplied with the
		// custom priority to compute a final value by which the unit will be sorted on in the queue.

DEVPROD-12482 Provide rank value breakdown stats #8485

DEVPROD-12482 Provide rank value breakdown stats #8485

Conversation

hadjri commented Nov 18, 2024 • edited Loading

Description

Testing

bynn left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ablack12 left a comment

Choose a reason for hiding this comment

ablack12 Nov 27, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ablack12 Nov 27, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hadjri commented Nov 27, 2024

hadjri commented Nov 18, 2024 •

edited

Loading

ablack12 Nov 27, 2024 •

edited

Loading

ablack12 Nov 27, 2024 •

edited

Loading