Performance in CUE #2857
myitcv announced in Announcements
This discussion is about performance in CUE, and how we, as a project, communicate improvements to CUE's evaluator. In particular, it seeks to address some of the feedback and questions in a recent Slack thread.
Performance and error messages are both critical to CUE's success.
Our communication on performance improvements has not been great. For that we would like to apologize, and commit to using this as an opportunity to do better.
Per @mpvl on Slack:
This discussion is one part of a first step towards us improving our communication on performance matters, and changes to the evaluator more broadly.
Next steps
FAQ
Are people with performance problems in the minority?
CUE being an open source project, it is impossible to answer this question with precise statistics. But as is hopefully clear from our attempt to be more transparent about the issues and the progress we are making on them, the categories of performance problem are well understood. As a reminder: users can register their projects with Unity, our regression and performance testing setup. Unity allows us to catch bugs and performance regressions before a change even lands in the project, and projects in the Unity corpus also give us a much better "target" when it comes to making performance improvements. Please open issues for any performance problems you are experiencing: we can help triage them and take things from there, including adding projects to Unity.
Can performance be The Priority for CUE until it has been fixed?
Performance and error messages are both critical to CUE's success. As @mpvl replied in Slack (quoted above), he has been in focus mode, working on strategies for tackling the different categories of performance issues. So performance has been a key focus, and will remain so.
How can we see this progress on performance over time?
The Evaluator Roadmap project will capture progress on the evaluator more generally: we will use this roadmap as the basis for updates on community calls. The performance umbrella issue will be used to share high-level performance progress. The sub-issues capturing the main categories of performance issue will be updated with any specifics.
Why have we not had a release with any performance improvements? How long will all of this take?
We previously communicated that v0.7.0 was going to be the target release for the first cut of performance improvements. On community calls in the lead-up to the release of v0.7.0 (December 2023) we indicated that we were going to stop targeting v0.7.0 as the performance release, for two reasons: a) the changes were not ready, and b) the reduced release cadence was starting to impact our ability to get changes for modules and the LSP out to users.
The challenge with the current work on performance improvement is that it relies on a rewrite of the evaluator. This rewrite addresses the three major performance issues in one fell swoop, which means that until it is done there is little visible progress. We can already measure big improvements for a subset of CUE as part of the work done so far, but none of that is visible until the new evaluator lands. On the last community call we mentioned we were close to completing the evaluator except for disjunctions; we are now close to completing the implementation of disjunctions for the new evaluator. This does not mean the work will be done, but it will at least allow us to release an experimental version of the new evaluator.
However, we can do much better at communicating progress, specifically the progress we are making towards improving performance in specific configurations. To reiterate the earlier point about Unity: projects that are part of the Unity corpus give us a much better "target" when it comes to making performance improvements, as well as concrete updates for the projects involved on how changes to the evaluator improve their situation.
Why can't any of these optimizations happen on the old evaluator?
This is very much a case of trade-offs. Over the history of CUE, we (and especially @mpvl) have learnt a lot about how not to structure the evaluator. This is driven by experience with real-world configurations, where performance is a key aspect. There is, in effect, a ceiling on the performance improvements that can be made with the current (old) evaluator; put another way, some of these techniques simply cannot be applied to it. With that context in mind, it becomes a judgment call whether the changes that could be retrofitted to the old evaluator are worth it, if doing so detracts from efforts on the new one. It is also worth noting that we had to revert some improvements made to the old evaluator in 2023, because its design caused unintended regressions. Again, we need to do a better job of communicating when we reach these decision points, and how we reach our decisions.
Why is there such a focus on modules and not the evaluator?
As with many engineering projects, it is rarely the case that the more people you throw at a task, the more efficient or effective you become. The same very much applies to the evaluator. Work in this space relies heavily on @mpvl's past experience, and that is not something we can trivially replicate! Instead, we have sought to scale the process of improving CUE by taking non-evaluator work off @mpvl's plate to allow more focus. Whilst performance and error messages are both critical to CUE's success, other parts of the open source project are similarly critical. We hope that by improving our communication on the progress we are making on performance, others can better understand the balance we are aiming to strike between the project's different focus areas. The recent discussion "Improving communication via the CUE open source project" talks more about how we plan to structure and improve this communication.
How can I see where performance problems lie in my CUE code? How can I debug CPU/memory/performance issues?
As part of #2855 we are looking to improve the stats the evaluator collects to help make diagnosis of performance problems easier, for both users and the CUE team. Coupled with real-world configurations as part of Unity, this will make the "loop" of triage, analysis, and fixes for performance problems much tighter. Please subscribe to this issue for more information, or to contribute ideas. We will announce improved stats/logging as part of release notes, but also via the performance umbrella issue.
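In the meantime, when driving CUE via its Go API it is already possible to use Go's standard profiling tools to see where evaluation time goes. Below is a minimal sketch, not an officially supported workflow: the package pattern "./..." and the profile file name cue.prof are placeholders for your own setup.

```go
// Sketch: CPU-profile a CUE load + evaluation using runtime/pprof.
package main

import (
	"log"
	"os"
	"runtime/pprof"

	"cuelang.org/go/cue/cuecontext"
	"cuelang.org/go/cue/load"
)

func main() {
	f, err := os.Create("cue.prof") // placeholder output file name
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Profile the load/build/validate phases we care about.
	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal(err)
	}

	insts := load.Instances([]string{"./..."}, nil)
	ctx := cuecontext.New()
	for _, inst := range insts {
		if inst.Err != nil {
			log.Print(inst.Err)
			continue
		}
		v := ctx.BuildInstance(inst)
		if err := v.Validate(); err != nil {
			log.Print(err)
		}
	}

	// Flush the profile before exiting.
	pprof.StopCPUProfile()
}
```

Running this and then inspecting the result with `go tool pprof cue.prof` shows the hottest functions in the evaluator for your configuration, which can also make performance issue reports much more actionable.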