-
-
Notifications
You must be signed in to change notification settings - Fork 224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
4790 Followup, rethink the optimisations #5011
Comments
IMHO replay is not even a valid performance test case, because it's ultimately uninteresting if that takes a bit longer or not. I certainly didn't really test for that. It's more important that the projections are compact and read queries are fast (and workspace creation obvs)
Or do we have an error in the projection that just didn't manifest so far because we didn't have this key? Because I really do not know how this should be able to exist twice. |
To add some discussion points, as now I got started thinking about all of this, supposedly this is the query to get the parent to a specific child (identified by contentstream, dimensions and aggregate id)
And now I wonder, why do we join hierachy twice here? It should be the same edge just looked at from different sides, it cannot be two separate edges between parent and child, or do I miss something big here? |
Another thing, I didn't see this before but this exists: |
aaand this also assumes the combination is unique: |
I think it would be helpful to figure out what query exactly fails with the primary key applied (or stop replay right after the problematic version without the key) so we can see what hierachy is created, I more an dmore suspect we might actually create invalid data there? |
I'm a bit concerned as well that the replay didn't work with the imho proper index. To me that looks rather like a broken event stream than an issue with the projection. I'd really like to see the hierarchy relations to the failing node aggregate and how they came to be. The hierarchy relation double-check was introduced years ago at a time when the graph projection was maybe less mature than now, we could have a look at that. Especially with all the new test cases implemented since then. As for performance, I agree with @kitsunet; we optimize for read performance here. I'd be up for serious performance testing in the postgres adapter later on, maybe we then can backport some of our insights to the general dbal adapter in a later version. |
I have this primary key currently on my system again and it's fine... It doesn't have much benefit though, so I guess we can just skip it for now and reevaluate this later on. Fact remains if this uniqueness is violated something is wrong in the uderlaying data. |
Jup and i still cannot replay his event stream in the latest beta. So they might just be corrupted. |
Okay as a "hi" from the future. I attempted to revert #5009 again and see if the primary key still causes trouble in the projected from earlier
And - probably due to pruning a lot of useless event streams - it works. So if it makes a difference we can reintroduce this? :) EDIT:
|
In #4790 some optimisations were done.
A part of the change caused a critical bug in the beta8 and slowed down performance enormous while replaying
ContentStreamWasForked
events.This part will be reverted via #5009 as a hotfix.
But some questions are still open:
Bastian and me tested !!! BUGFIX: Fix schema of hierarchy relation table #5009 and compared that to a full revert of !!! TASK: Split dimensionspacepoints into separate table to reduce data duplication #4790 and found evidence (investigated on 20k events with two language dimension values). On the contrary Bastian thinks it might even slow down slightly.
['childnodeanchor', 'contentstreamid', 'dimensionspacepointhash', 'parentnodeanchor']
The text was updated successfully, but these errors were encountered: