-
Notifications
You must be signed in to change notification settings - Fork 151
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Realtime test sometime failing? #2515
Comments
See also this run - python 3.13 + postgres, |
It looks like the test It's likely to do with the sql caching mechanism. I've looked through the stack a bit to try to figure out what code might be non-deterministic. My current hypothesis is based around the fact that we use
So far this is just a theory. I think this scenario should be possible, but I haven't managed locally to generate an If this is the issue, then it should be relatively easy to fix with a slight tweak to the caching strategy (and maybe we could do this anyway). But it would be nice to get to the bottom of this beforehand, as this kind of failure could mask more subtle bugs. If for instance the only difference between the two settings objects were in the comparison levels (or even just the parameter values!), then the 'incorrect' SQL would still execute. If we were only dealing with the |
Note this line from the python docs on
which suggests that this mechanism is at least plausible. How likely it is I don't really have a sense of. |
Okay I believe that this is in fact what is occurring. I have run tests repeatedly on my fork in CI to get a failure, with some additional logging in place. This run has a failure, and you can see in the logs that a settings
For reference here is an equivalent portion of the logs for a non-failing test run - in this case the object gets a distinct
|
Thanks for looking into this, and excellent detective work! I just checked back on this one because I just got the same failure here The reason I used id(settings) was simply that it seems like a low-overhead way of uniquely identifying a settings object. Obviously it didn't work as anticipated but any other mechanism should do. It was just to avoid caching errors in the case where the user is making predictions from two different models - leading to the potential for one model to use the cached sql from the other |
Update on this / to document my thinking:
I've run it with some hooks to confirm that in cases where this issue would have arisen this fixes things. I could merge that in now, and it would solve the intermittent test failures. But as it's not a blocking issue currently, I'm going to explore another option based on hashing the object (snapshot)s. I realise it's probably a fairly niche bug, but I think it would be useful to try and get this right, as there is potential for quite a subtle (and thus hard-to-notice) error here. |
Interesting - the weakref idea is definitely an improvement, and feels like the correct/principled solution for the object case, it's a shame about the I agree that it's worth solving this 'properly' Could we simply set our own uuid
But, I appreciate it's a bit hacky! The loading directly from a path scenario is interesting because it's potentially a common production use case. The simplicity of 'if settings is a string, use it as a key' seems potentially nice - easy to understand and high performance. (apologies for the mistake in the first place - i had assumed that because the id is a big number, the chance of collision was negligible, but I can see now why that's not the case) Or maybe it's just:
That doesn't feel too bad... |
The test
tests/test_realtime.py::test_realtime_cache_different_settings[duckdb]
failed in python 3.9 on this run. Re-running the job allowed the test to pass, which is odd as I can't see where any indeterminacy could occur.Error was: Binder Error: Values list "l" does not have a column named "tf_first_name"
.Full test output
Test summary output
Environment info (3.9.20)
The text was updated successfully, but these errors were encountered: