Replies: 17 comments 2 replies
-
I found a single try_lock call in Firefox, and I was able to easily remove it, but the issue still occurs. I have tested with hex editing out Try that it still works and it's not my fix. |
Beta Was this translation helpful? Give feedback.
-
I keep the changes to windows_sys.lst/rs as small as possible to make rebasing easier in the future. Unused declarations don't make it into the binary.
You could try forcing different Mutex kinds here and see if the behavior changes: I have tested the basic lock detection on Win 98SE (CreateMutex), Win XP SP3 (CriticalSection), Win 11 (SRWLocks) via the sample program insofar that they seem to work fine and lock and unlock properly. There have not been any stress tests or anything, so I can't guarantee that there isn't something wrong.
No idea if it's just firefox, but you could trace the loading/init process and hook the init code or even
Only things that will be added to the statically dllimported (= land in the import table header of the binary) have to be added to Notice how every function (excluding WinSock) I added to |
Beta Was this translation helpful? Give feedback.
-
These seem to conflict, but I guess it works.
Next time I build Rust, I'll try that.
Debug builds don't work too well on Vista, but I figure worst case I can just malform the TryAcquireSRWLock functions by hex editing to not match on 7+ and try that way. |
Beta Was this translation helpful? Give feedback.
-
Let me know if you figure out more! 🙏 |
Beta Was this translation helpful? Give feedback.
-
I force set it to CriticalSection, and I have the issue on both Vista and 10, and hex editing the TryAcquireSRWLock call does not fix it. I'll try Legacy tomorrow, I am too tired right now. |
Beta Was this translation helpful? Give feedback.
-
Legacy mutex's seemed to last longer, but I still got the same problem. The browser works from anywhere from a few seconds to a few minutes before freezing. I did get the funniest error in testing though
|
Beta Was this translation helpful? Give feedback.
-
I wonder if Firefox (probably rightly?) assumes that an RWLock can be read-locked multiple times. The fallbacks to a critical section or mutex of course won't allow that, which might cause a deadlock. In that case Rust9x would need to implement a completely separate fallback RWLock implementation, e.g. based on |
Beta Was this translation helpful? Give feedback.
-
You might be right, I was also experimenting with the Rust used to create Mypal68 (Firefox 68 for XP) and I noticed Semaphores are used in it. If you want to see the patch for it, here it is. It is 1.45.2 though, and I think I would need at minimum 1.66.0 for 115, and 1.76.0 for 128. I originally wanted to make a fork of 1.77.2 since that works for both 115 and 128, but I couldn't figure out how to port the changes to 1.77.2, however I was kinda doing it wrong in the first place so I could probably do it if I tried again. |
Beta Was this translation helpful? Give feedback.
-
Some of the implementations in that commit look super interesting! The RWLock impl there however doesn't seem to use semaphores or allow multiple read locks at the same time either, it also just falls back to std's old |
Beta Was this translation helpful? Give feedback.
-
I also suspect that the freeze is GFX related, and I know modern Firefox uses WebRender, which Mypal68 doesn't have. I would love to try building an older Firefox version that still allowed for disabling WebRender, however finding all the dependencies for that sounds like hell. |
Beta Was this translation helpful? Give feedback.
-
Hmmm, I'm pretty sure my current RwLock fallback implementation is just wrong (and the one in your linked commit too!). This example should panic immmediately, as the fallbacks guard against reentrancy. I've created #33 for it. Not sure when I get to updating rust9x though at the moment. |
Beta Was this translation helpful? Give feedback.
-
I guess I can look at the rwlock implementations in Firefox and see if anything would cause a crash or have this issue. |
Beta Was this translation helpful? Give feedback.
-
I don't know what I am looking for. I've traced it to this file as it's the only WebRender file to use std::sync::RwLock, but I'm not sure what in it would be. https://github.com/mozilla/gecko-dev/blob/release/gfx/wr/wr_glyph_rasterizer/src/rasterizer.rs#L28 |
Beta Was this translation helpful? Give feedback.
-
I mean the RWLock fallback implementations are just incorrect. You could try replacing the Arc<RwLock<..>> with an Arc<Mutex<...>> and test again. If the broken RWLock fallback is the cause, the Arc Mutex variant should also lock up on modern systems. |
Beta Was this translation helpful? Give feedback.
-
Well, yeah, but it still partially works. So I would want to mitigate the issue for now. |
Beta Was this translation helpful? Give feedback.
-
For now I've converted this to a discussion, as the actual issues are tracked in #33 and #34 |
Beta Was this translation helpful? Give feedback.
-
Ok, but I still want to know. Or should I just fake the Try functions with their non-try counterparts? |
Beta Was this translation helpful? Give feedback.
-
I tried to fork Rust9x to completely remove TryAcquireSRWLock and either replace the two functions with non-Try versions or critical sections.
However when I look into the code, it seems it tries to use them first, and if it can't find the functions, it then falls back to critical sections.
Something here is funny though, because in Firefox compiled with rust9x on a system without TryAcquireSRWLock but otherwise has SRW locks, the browser just freezes after a minute. If I hex edit out the Try part, the browser just works fine.
So either the fallback fails and causes the whole thing to freeze, or something fails when it hits critical sections, or something else idk.
If this is only an issue with Firefox, please give me some pointers as to how I could trace the issue, because I literally have no mentions of TryAcquireSRWLock anywhere in the source code of my modified version. This call is only from Rust.
Also idk if intentional, but when replicating this commit 08798a9 I noticed that in c.rs when you add TryEnterCriticalSection, you don't also replicate it in windows_sys.rs. Now I notice the code for the function itself looks the same as any other, so I assumed that if it's there in c.rs, it doesn't need to be in windows_sys.rs. However I also notice you just left all the SRW lock and ConditionVariable functions in windows_sys.rs. I'm not sure if this is a mistake or intentional, but it doesn't look right to me.
Beta Was this translation helpful? Give feedback.
All reactions