Selectively disable direct-scanout to reduce bandwidth #969
and allow for syncing up to the next frame, that actually uses fewer planes for reduced bandwidth.
Okay, I have also been procrastinating on exactly this topic for the last few weeks... so this is a quick brain dump:
General
What I have come up with so far (partially some PoC impls, but mostly just ideas):
DrmDeviceManager
Using an
Before adding a CRTC we might need to make sure there is no direct scan-out in place. This could be done by
Other
There is still an issue with doing direct scan-out on the primary plane and a second CRTC. It can happen (I have actually seen this) that direct scan-out on CRTC 1 reduces the overall bandwidth requirement, allowing CRTC 2 to use an additional overlay plane. But going back to composition on CRTC 1 might then fail...
The first option might fail in funny ways and the second sounds way easier. The second one could be implemented with something similar to the
Some kind of use_mode wrapper in the device token could use a similar approach: first be optimistic and just call use_mode on the compositor, and when that fails, take a write lock and retry after making sure no CRTC uses direct scan-out. The issue is more or less the same: we might have to redraw all CRTCs (or at least commit some buffer).
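A rough sketch of that optimistic-then-exclusive retry, to make the locking order concrete. Everything here is a hypothetical stand-in and not smithay API: `Device`, `Crtc`, `Outcome`, `ModeRejected` and the fixed `SCANOUT_COST` bandwidth model are assumptions for illustration only, and `test_commit` merely imitates a kernel test commit.

```rust
use std::sync::{Mutex, RwLock};

/// Toy cost model (assumption): direct scan-out adds a fixed bandwidth cost.
pub const SCANOUT_COST: u32 = 10;

/// Hypothetical per-CRTC state; not smithay's `DrmCompositor`.
#[derive(Debug)]
pub struct Crtc {
    pub bandwidth: u32,
    pub direct_scanout: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Optimistic,
    AfterFallback,
}

#[derive(Debug, PartialEq, Eq)]
pub struct ModeRejected;

/// Hypothetical device token: surfaces share `gate`, a modeset retry takes it exclusively.
pub struct Device {
    budget: u32,
    gate: RwLock<()>,
    crtcs: Vec<Mutex<Crtc>>,
}

impl Device {
    pub fn new(budget: u32, crtcs: Vec<Crtc>) -> Self {
        Self {
            budget,
            gate: RwLock::new(()),
            crtcs: crtcs.into_iter().map(Mutex::new).collect(),
        }
    }

    /// Stand-in for a test commit: does the whole device fit into the budget?
    fn test_commit(&self, index: usize, bandwidth: u32) -> bool {
        let total: u32 = self
            .crtcs
            .iter()
            .enumerate()
            .map(|(i, crtc)| {
                let crtc = crtc.lock().unwrap();
                let mode = if i == index { bandwidth } else { crtc.bandwidth };
                mode + if crtc.direct_scanout { SCANOUT_COST } else { 0 }
            })
            .sum();
        total <= self.budget
    }

    pub fn direct_scanout(&self, index: usize) -> bool {
        self.crtcs[index].lock().unwrap().direct_scanout
    }

    pub fn use_mode(&self, index: usize, bandwidth: u32) -> Result<Outcome, ModeRejected> {
        {
            // Optimistic path: only a shared lock, other CRTCs keep running.
            let _shared = self.gate.read().unwrap();
            if self.test_commit(index, bandwidth) {
                self.crtcs[index].lock().unwrap().bandwidth = bandwidth;
                return Ok(Outcome::Optimistic);
            }
        }
        // Fallback: exclusive lock, drop direct scan-out everywhere, retry once.
        let _exclusive = self.gate.write().unwrap();
        for crtc in &self.crtcs {
            crtc.lock().unwrap().direct_scanout = false;
        }
        if self.test_commit(index, bandwidth) {
            self.crtcs[index].lock().unwrap().bandwidth = bandwidth;
            Ok(Outcome::AfterFallback)
        } else {
            Err(ModeRejected)
        }
    }
}
```

In this toy model the test and the state update are two separate steps; with a real atomic commit the kernel performs both at once. The redraw concern from above still applies: after the fallback every CRTC would need a composited frame before the retry can be trusted.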
+1 for extending the
+1
+1
So this is to aid a similar approach to wlroots swapchain helper by forcing a specific modifier set?
When would this be used? To fall back to 8-bit for more bandwidth?
Not sure I get the picture here, is this to build the device-wide test commit?
+1 for the approach of a sharable (assuming
+1
Oh yeah, I forgot about this one...
This could then fail scanout later, so we would essentially need to make two tests, right? I like the idea conceptually, but I also fear this adds too much complexity.
I feel like this might be too limiting. I don't think there is any guarantee of gbm selecting the same modifier as a client using Vulkan/EGL might, so we might leave a bunch of performance on the table here... I guess we could limit the
Yeah, in the first version also probably falling back to
Basically doing the same as
Yeah, we need to test all CRTC at once which can be a bit tricky without accessing internal state of
Not exactly, the idea was to have it once, with only a single place responsible for enabling CRTCs. Adding a CRTC would return an
In my PoC
and
Yeah and I am also not sure this will work on all devices tbh.
Not sure this will work, I have seen CRTCs fail to enable just because of the modifier. Afaik the Intel CCS modifier requires more bandwidth and could again cause this issue when direct scan-out uses something like Y-tiled. But on the other hand
I think I had a typo here in my original response that made this somewhat hard to parse. Let's say we let this happen, and once we fall back to composition we fail because of the modifier of our swapchain. Couldn't we re-create the swapchain with the modifier previously in use when we did direct scanout?
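To make the question concrete, here is a minimal sketch of that fallback order. The function name, the plain `u64` modifier values and the `test_commit` closure are all hypothetical; a real implementation would have to re-allocate the swapchain and run an actual test commit instead.

```rust
/// Pick a modifier for the composition swapchain when leaving direct scan-out.
///
/// `test_commit` is a hypothetical stand-in for "would a commit with a buffer
/// using this modifier succeed in the current device configuration?".
pub fn pick_composition_modifier(
    swapchain_modifier: u64,
    last_scanout_modifier: Option<u64>,
    test_commit: impl Fn(u64) -> bool,
) -> Option<u64> {
    // First choice: keep the swapchain we already have.
    if test_commit(swapchain_modifier) {
        return Some(swapchain_modifier);
    }
    // Second choice: the modifier that was on screen during direct scan-out,
    // since that configuration is known to have been accepted.
    last_scanout_modifier.filter(|m| *m != swapchain_modifier && test_commit(*m))
}
```

Returning `None` means neither modifier passes, in which case the caller would still need some other fallback.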
So this is an attempt at solving a problem that has been bugging me for quite some time, but I am not convinced this is a good implementation...
The problem
So we have run into a few issues with amdgpu cards, where certain modes were not selectable, e.g.:
The underlying problem, as I was able to confirm with drm_info logs and by talking to some AMD engineers, seems to be that we are using overlay planes. Or rather, that overlay planes contribute to bandwidth usage, making new configurations with higher requirements (a faster or higher-resolution mode) potentially impossible without disabling some planes. Unfortunately the DRM API currently has no way to communicate why a commit failed, so we just reject the new mode.
Solution 1: So the initial idea would be to disable planes on a given CRTC before attempting to modeset. `compositor.clear()` seems to be an obvious solution here, given we modeset anyway.
Unfortunately that doesn't work. Roaming planes (i.e. planes that can be attached to multiple CRTCs) exist, and given this runs concurrently, other `DrmCompositor`s might be able to grab the just-released planes before we are done with modesetting. Fun. Also, this problem can apply to enabling completely new outputs.
Solution 2: So we need to synchronize this for the whole device?
Almost works, though `compositor.clear` is pretty ugly here, as we would be blanking all surfaces for [...] but I have also seen another variant of this issue, where DRM leasing fails because we are using too much bandwidth. So now we suddenly need to keep resource usage low while another process is involved, because we have no (good) way to assess how much bandwidth that adds at all. At this point we just want to disable overlay plane usage (though just disabling direct-scanout for now is roughly equivalent) for both the modeset (to avoid flicker) and while a lease is active.
Solution 3: This PR.
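The requirement of keeping direct scan-out off both during a modeset and while a lease is active could be sketched as a device-wide counting inhibitor. This is only an illustration of the idea and not the code in this PR: `ScanoutInhibitor`, `InhibitGuard`, `FramePath` and `choose_path` are hypothetical names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical device-wide switch shared by all surface threads.
#[derive(Clone, Default)]
pub struct ScanoutInhibitor {
    count: Arc<AtomicUsize>,
}

/// While a guard is alive (modeset in progress, lease active, ...),
/// surfaces are expected to composite instead of scanning out directly.
pub struct InhibitGuard {
    count: Arc<AtomicUsize>,
}

impl ScanoutInhibitor {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn inhibit(&self) -> InhibitGuard {
        self.count.fetch_add(1, Ordering::SeqCst);
        InhibitGuard { count: Arc::clone(&self.count) }
    }

    pub fn direct_scanout_allowed(&self) -> bool {
        self.count.load(Ordering::SeqCst) == 0
    }
}

impl Drop for InhibitGuard {
    fn drop(&mut self) {
        // Releasing the last guard re-enables direct scan-out.
        self.count.fetch_sub(1, Ordering::SeqCst);
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum FramePath {
    DirectScanout,
    Composited,
}

/// Called by each surface thread once per frame.
pub fn choose_path(inhibitor: &ScanoutInhibitor, candidate_available: bool) -> FramePath {
    if candidate_available && inhibitor.direct_scanout_allowed() {
        FramePath::DirectScanout
    } else {
        FramePath::Composited
    }
}
```

A counter rather than a boolean lets a modeset and a lease overlap without one re-enabling direct scan-out for the other. What the sketch leaves out is the synchronization step: the modeset would still have to wait until every surface has actually presented a frame that no longer uses the extra planes.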
Solution
Alternatives
Perhaps we should instead synchronize by terminating the surface threads, doing everything on the main thread and restarting the threads. We clearly need more access to the `DrmCompositor`'s state and that is tricky. So instead maybe we should just add a way to destruct a `DrmCompositor` into its `DrmSurface` and make that `Send`. That way we should hopefully still be able to avoid flickering, though we might still need to somehow obtain a fully-composited rendered image to not break the visuals for a frame while disabling planes.
This would also help us solve another issue in the future that wlroots recently tackled:
Unfortunately it looks like some GPUs are so resource-constrained that we need more logic to even enable all possible displays without any planes in use. And we need device-commits to make this work at all, which complicates synchronizing even more and makes temporarily shutting down the threads even more appealing.
That is a much more significant rework of both smithay code and cosmic-comp code though.
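The thread-teardown alternative could look roughly like the sketch below. `Surface` and `Compositor` are toy stand-ins for `DrmSurface` and `DrmCompositor`, and `into_surface`, `spawn_surface_thread` and `disable_overlay_planes` are hypothetical; the point is only that a `Send` surface can be handed back to the main thread through the thread's join handle.

```rust
use std::sync::mpsc;
use std::thread;

/// Toy stand-in for a `DrmSurface`; plain data, so it is `Send`.
#[derive(Debug)]
pub struct Surface {
    pub crtc: u32,
    pub planes_in_use: u32,
    pub frames_presented: u64,
}

/// Toy stand-in for a `DrmCompositor` that owns its surface.
pub struct Compositor {
    surface: Surface,
}

impl Compositor {
    pub fn new(surface: Surface) -> Self {
        Self { surface }
    }

    pub fn render_frame(&mut self) {
        self.surface.frames_presented += 1;
    }

    /// Hypothetical "destruct" step: give the surface back to the caller.
    pub fn into_surface(self) -> Surface {
        self.surface
    }
}

/// Runs a surface thread until `stop` fires, then returns the surface.
pub fn spawn_surface_thread(
    surface: Surface,
    stop: mpsc::Receiver<()>,
) -> thread::JoinHandle<Surface> {
    thread::spawn(move || {
        let mut compositor = Compositor::new(surface);
        loop {
            compositor.render_frame();
            match stop.try_recv() {
                Err(mpsc::TryRecvError::Empty) => thread::yield_now(),
                // Stop requested, or the main thread dropped the sender.
                _ => break,
            }
        }
        compositor.into_surface()
    })
}

/// Main-thread step: with all surfaces collected, reconfigure them together.
pub fn disable_overlay_planes(mut surfaces: Vec<Surface>) -> Vec<Surface> {
    for surface in &mut surfaces {
        surface.planes_in_use = 1; // primary plane only
    }
    surfaces
}
```

After the device-wide step the main thread would spawn the surface threads again with the returned surfaces. The sketch does not address the flicker concern: a real version would need the last composited frame to stay on screen while the planes are being disabled.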
cc @ids1024 for general comments on both the problem and code. @cmeissl for ideas on the problem and the smithay side of things.
Draft while the design isn't worked out. Testing so far has only been done to verify that this doesn't break anything, not to verify that this actually fixes these bugs. (Lacking hardware that reproduces this.)