-
Notifications
You must be signed in to change notification settings - Fork 127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DEVPROD-12482 Provide rank value breakdown stats #8485
Conversation
# Conflicts: # model/task_queue.go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an awesome breakdown of what's going on in the planner
I don't have too much to say because this is functionally the same as what already exists except it's easier to read and much better commented.
Only have slight comment comments
model/task/task.go
Outdated
// priority value that is used in the queue sorting value equation. | ||
type PriorityBreakdown struct { | ||
// InitialPriorityImpact represents how much of the total priority can be attributed to the | ||
// original priority set on a task by a user. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think it doesn't have to be user set because it could be set by the project yaml
scheduler/planner.go
Outdated
@@ -300,20 +316,19 @@ func (unit *Unit) info() unitInfo { | |||
return info | |||
} | |||
|
|||
// RankValue returns a point value for the tasks in the unit that can | |||
// sortingValueBreakdown returns a point value for the tasks in the unit that can |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function returns the breakdown now
scheduler/planner.go
Outdated
|
||
// computeRankValue computes the custom rank value for this unit, which will later be multiplied with the | ||
// custom priority to compute a final value by which the unit will be sorted on in the queue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we say 'computed' priority value to get the sorting value just to make it easier to correlate it with the code
scheduler/planner.go
Outdated
// patches that have spent more time in the queue | ||
// should get worked on first (because people are | ||
// waiting on the results), and because FIFO feels | ||
// Give patches a bump, over non-patches. patches that have spent more time in the queue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you didn't write this but can we get rid of the comma and capitalize the "Patches" after the period
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, commenting but not blocking since I'll be out. Did you test this in staging with currently scheduled tasks by any chance? I'm a little bit wondering if changing cachedValue's format mid-run is going to break backwards compatibility (which could be fine if it just means things need to wait for the next scheduler job to get scheduled, just trying to think through it)
Prepare to be the SME on scheduling logic now that your name is the git blame 😂
model/task/task.go
Outdated
MustHaveResults bool `bson:"must_have_results,omitempty" json:"must_have_results,omitempty"` | ||
// SortingValueBreakdown is not persisted to the db, but stored in memory and passed to the task queue document. | ||
// It contains information on what factors led to the overall queue ranking value for the task. | ||
SortingValueBreakdown SortingValueBreakdown `bson:"sorting_value_breakdown" json:"sorting_value_breakdown"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we actually just bson:"-"
the bson value to make it more clear that we won't be persisting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point
model/task/task.go
Outdated
TaskGroupImpact int64 | ||
// GenerateTaskImpact represents how much of the total priority can be attributed to a | ||
// task having a generate.tasks command in it. | ||
GenerateTaskImpact int64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extreme nit -- i think it might be more clear to call this GeneratorTaskImpact -- internally I think we usually refer to the generator task that way, and the generated tasks as generated, so generateTask is introducing a new vocabulary.
model/task/task.go
Outdated
|
||
// SetSortingValueBreakdown saves a full breakdown which compartmentalizes each factor that played a role in computing the | ||
// overall value used to sort it in the queue, and creates a honeycomb trace with this data to enable dashboards/analysis. | ||
func (t *Task) SetSortingValueBreakdown(ctx context.Context, breakdown SortingValueBreakdown) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extreme nit -- could we call this "SetSortingValueBreakdownAttributes" ? without that, it's more in line with a function name that sets something in the DB, which this function doesn't do
attribute.Float64(fmt.Sprintf("%s.base_priority_pct", priorityBreakdownAttributePrefix), float64(breakdown.PriorityBreakdown.InitialPriorityImpact/breakdown.TotalValue*100)), | ||
attribute.Float64(fmt.Sprintf("%s.task_group_pct", priorityBreakdownAttributePrefix), float64(breakdown.PriorityBreakdown.TaskGroupImpact/breakdown.TotalValue*100)), | ||
attribute.Float64(fmt.Sprintf("%s.generate_task_pct", priorityBreakdownAttributePrefix), float64(breakdown.PriorityBreakdown.GenerateTaskImpact/breakdown.TotalValue*100)), | ||
attribute.Float64(fmt.Sprintf("%s.commit_queue_pct", priorityBreakdownAttributePrefix), float64(breakdown.PriorityBreakdown.CommitQueueImpact/breakdown.TotalValue*100)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it already feels so sus how many values are both in rank and priority haha
Project string `bson:"project" json:"project"` | ||
ExpectedDuration time.Duration `bson:"exp_dur" json:"exp_dur"` | ||
Priority int64 `bson:"priority" json:"priority"` | ||
SortingValueBreakdown task.SortingValueBreakdown `bson:"sorting_value_breakdown" json:"sorting_value_breakdown"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm mildly concerned that for long task queues, passing all this information around with every item is going to slow things down. Can we keep an eye on scheduler job timing after we deploy to see if this has an impact?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's ultimately just a collection of integers so I think it should be alright, but yes I'll monitor
// computing this value is (custom_priority * custom_rankValue) + unit_length, where custom_priority | ||
// and custom_rankValue are both derived from specific properties of the unit and various | ||
// scheduler constants. | ||
func (u *unitInfo) value() task.SortingValueBreakdown { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with this change but just for my understanding, doesn't this change how the value computation works? It seems like we add length throughout in the original implementation, and here we add it at the end (Although i see we also are using it throughout in this implementation too, i'm just missing where we add it at the end maybe)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if I fully understand but the role length plays before and after this PR should be unchanged. Before, length is added to the original priority here, and is part of the various rank value equations (and is also added at the end) here.
In these new changes, length plays the same two roles, here and here, respectively.
scheduler/planner.go
Outdated
|
||
return value | ||
// computePriority computes the custom priority value for this unit, which will later be multiplied with the | ||
// custom rank value to compute a final value by which the unit will be sorted on in the queue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make a note that this destructively updates the passed in breakdown? (same with the other function)
scheduler/planner.go
Outdated
|
||
// computeRankValue computes the custom rank value for this unit, which will later be multiplied with the | ||
// custom priority to compute a final value by which the unit will be sorted on in the queue. | ||
func (u *unitInfo) computeRankValue(unitLength int64, breakdown *task.SortingValueBreakdown) int64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have to pass in unitLength separately, since it just comes from breakdown? If we just want the original unit length (like we update it at some point), we could just set like originalUnitLength = breakdown.TaskGroupLength
at the beginning of this function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's true. Updated
I just tested it and didn't observe any issues / errors. Once this deploys I'll be monitoring for a bit including the scheduler job. |
DEVPROD-12482
Description
This PR granularizes the value computed by the queue sorting computation so that each sub-category of its computation can be tracked in Honeycomb. I also did a best-effort attempt to modularize the code and make it more understandable. This is not a functional change to how the sorting values themselves are computed.
With these changes, we can now create dashboards in Honeycomb that analyze how much impact each special case or our queue sorting algorithm is having on the overall state of the queue, and inspect if any of the impacts are vastly over or under-sized.
Testing
Luckily there are already many existing tests (plus a few new ones that I ran before and after this change) that should assert that the ranking algorithm has not changed.
Added new tests asserting that all of the total breakdown values are computed correctly in that their sums can be used to recompute the total sorting value.
Lastly, I also made sure that the Honeycomb dashboards can be created with these new values and that they make sense to interpret.