Performance in CUE #2857
myitcv announced in Announcements
This discussion is about performance in CUE, and how we, as a project, communicate improvements to CUE's evaluator. In particular, it seeks to address some of the feedback and questions in a recent Slack thread.
Performance and error messages are both critical to CUE's success.
Our communication on performance improvements has not been great. For that we would like to apologize, and commit to using this as an opportunity to do better.
Per @mpvl on Slack:
This discussion is one part of a first step towards us improving our communication on performance matters, and changes to the evaluator more broadly.
Next steps
FAQ
Are people with performance problems in the minority?
CUE being an open source project, it is impossible to answer this question with precise statistics. But as is hopefully clear from our attempt to be more transparent about the issues and the progress we are making on them, the categories of performance problem are well understood. As a reminder: users can register their projects with Unity, our regression and performance testing setup. Unity allows us to catch bugs and performance regressions before a change even lands in the project, and projects in the Unity corpus also give us a much better "target" when it comes to making performance improvements. Please open issues for any performance problems you are experiencing: we can help triage them and take things from there, including adding projects to Unity.
Can performance be The Priority for CUE until it has been fixed?
Performance and error messages are both critical to CUE's success. As @mpvl replied in Slack (quoted above), he has been in focus mode, working on strategies for tackling the different categories of performance issues. So performance has been a key focus, and will remain so.
How can we see this progress on performance over time?
The Evaluator Roadmap project will capture progress on the evaluator more generally: we will use this roadmap as the basis for updates on community calls. The performance umbrella issue will be used to share high-level performance progress. The sub-issues capturing the main categories of performance issue will be updated with any specifics.
Why have we not had a release with any performance improvements? How long will all of this take?
We previously communicated that v0.7.0 was going to be the target release for the first cut of performance improvements. On community calls in the lead-up to the release of v0.7.0 (December 2023) we indicated that we were going to stop targeting v0.7.0 as the performance release, for two reasons: a) the changes were not ready, and b) the reduced release cadence was starting to impact our ability to get changes for modules and the LSP out to users.
The challenge with the current work on performance improvement is that it relies on a rewrite of the evaluator. This rewrite addresses the three major performance issues in one fell swoop, which means that until it is done there is little visible progress. We can already measure big improvements for a subset of CUE as part of the work done so far, but none of that is visible until the new evaluator lands. On the last community call we mentioned we were close to completing the evaluator except for disjunctions; we are now close to completing the implementation of disjunctions for the new evaluator. This does not mean the work will be done, but it will at least allow us to release an experimental version of the new evaluator.
However, we can do much better at communicating progress, specifically the progress we are making towards improving performance in specific configurations. To reiterate the earlier point about Unity: projects that are part of the Unity corpus give us a much better "target" when it comes to making performance improvements, as well as concrete updates for the projects involved on how changes to the evaluator improve their situation.
Why can't any of these optimizations happen on the old evaluator?
This is very much a case of trade-offs. Over the history of CUE, we (and especially @mpvl) have learnt a lot about how not to structure the evaluator. This is driven by experience with real-world configurations, where performance is a key aspect. There is, in effect, a ceiling on the performance improvements that can be made with the current (old) evaluator; put another way, some of these techniques simply cannot be applied to it. With that context in mind, it becomes a judgment call whether the changes that could be retrofitted to the old evaluator are worth it, if doing so detracts from efforts on the new one. It is also worth noting that we had to revert some improvements made to the old evaluator in 2023, because its design caused unintended regressions. Again, we need to do a better job of communicating when we reach these decision points, and how we reach our decisions.
Why is there such a focus on modules and not the evaluator?
As with many engineering projects, it is rarely the case that the more people you throw at a task, the more efficient or effective you become. The same very much applies to the evaluator. Work in this space relies heavily on @mpvl's past experience, and that is not something we can trivially replicate! Instead, we have sought to scale the process of improving CUE by taking non-evaluator work off @mpvl's plate to allow more focus. Whilst performance and error messages are both critical to CUE's success, other parts of the open source project are similarly critical. We hope that by improving our communication on the progress we are making on performance, others can better understand the balance we are aiming to strike between the project's different focus areas. The recent discussion "Improving communication via the CUE open source project" talks more about how we plan to structure and improve this communication.
How can I see where performance problems lie in my CUE code? How can I debug CPU/memory/performance issues?
As part of #2855 we are looking to improve the stats the evaluator collects to help make diagnosis of performance problems easier, for both users and the CUE team. Coupled with real-world configurations as part of Unity, this will make the "loop" of triage, analysis, and fixes for performance problems much tighter. Please subscribe to this issue for more information, or to contribute ideas. We will announce improved stats/logging as part of release notes, but also via the performance umbrella issue.
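In the meantime, when driving CUE via its Go API it is already possible to use Go's standard profiling tools to see where evaluation time goes. Below is a minimal sketch, not an officially supported workflow: the package pattern "./..." and the profile file name cue.prof are placeholders for your own setup.

```go
// Sketch: CPU-profile a CUE load + evaluation using runtime/pprof.
package main

import (
	"log"
	"os"
	"runtime/pprof"

	"cuelang.org/go/cue/cuecontext"
	"cuelang.org/go/cue/load"
)

func main() {
	f, err := os.Create("cue.prof") // placeholder output file name
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Profile the load/build/validate phases we care about.
	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal(err)
	}

	insts := load.Instances([]string{"./..."}, nil)
	ctx := cuecontext.New()
	for _, inst := range insts {
		if inst.Err != nil {
			log.Print(inst.Err)
			continue
		}
		v := ctx.BuildInstance(inst)
		if err := v.Validate(); err != nil {
			log.Print(err)
		}
	}

	// Flush the profile before exiting.
	pprof.StopCPUProfile()
}
```

Running this and then inspecting the result with `go tool pprof cue.prof` shows the hottest functions in the evaluator for your configuration, which can also make performance issue reports much more actionable.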