Filter replication publication columns #1831
Comments
After discussing, we think the warning does not affect us, since we're impersonating the protocol and we only have one subscriber for one publication anyway.
Short update on issue that I ran into with this: It seems that we cannot use The error that I get if I do is:
We don't need to set
In case of a developer requesting a shape with
One more issue discovered is that a where clause on the publication can only reference columns covered by the replica identity, which means that if the where clause references anything other than a primary key we still need replica identity full:
See docs for more info. We can still configure it on an as-needed basis, but I wanted to flag that there are more cases where it is required.
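The rule above can be sketched as a small check. This is only an illustration in Python, not Electric's code: the function name is hypothetical, and it assumes the caller already knows which columns the where clause references and which columns the table's replica identity covers.

```python
def needs_replica_identity_full(where_columns, replica_identity_columns):
    """Return True if a publication where clause references columns that the
    current replica identity does not cover.

    Both arguments are plain iterables of column names supplied by the caller
    (an assumption of this sketch; no SQL parsing happens here).
    """
    # Any referenced column outside the replica identity means the table
    # would need REPLICA IDENTITY FULL for the filter to be accepted.
    return not set(where_columns) <= set(replica_identity_columns)


# A filter on the primary key alone is covered by the default identity.
pk_only = needs_replica_identity_full(["id"], ["id"])
# A filter that also touches a non-key column is not.
with_status = needs_replica_identity_full(["id", "status"], ["id"])
```

With a check like this, `REPLICA IDENTITY FULL` would only have to be configured for tables where some shape's where clause reaches outside the primary key.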
Closes #1774

This work started out to introduce column filters (see #1831) but hit a roadblock because we use `REPLICA IDENTITY FULL`. However, the work also takes care of cleaning up filters.

- Introduced a single process for updating the publication. We were locking on it before anyway, so we might as well linearise it ourselves.
- The process maintains a reference-counted structure for the filters per relation, including where clauses and filtered columns, in order to produce the correct overall filters per relation.
- Updates to the publication are debounced to allow batching together many shape creations.
- Every update does a complete rewrite of the publication filters so they are kept clean. Also introduced a `remove_shape` call so that if electric is left with no shapes it also has no subscriptions to tables.

## TODOs

- [x] Write tests for `PublicationManager`
- [x] Write procedure for recovering in-memory state from `shape_status.list_shapes` in `recover_shapes`
- [ ] Split where clauses at top-level `AND`s to improve filter optimality (suggested by @icehaunter) [edit: not doing this now, as we can be smart about this and do even more "merging" of where clauses, like `x = 1` and `x = 2` to `x in (1, 2)`; separate PR]
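To make the reference-counting idea concrete, here is a minimal sketch in Python (not the Elixir implementation from the PR). `FilterRegistry`, `RelationFilter`, and `effective_filters` are hypothetical names; only `remove_shape` echoes a name from the description. It assumes each shape contributes an optional column list and an optional where clause, and that a shape without one of them forces that filter off for the whole relation.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class RelationFilter:
    """Reference counts for one relation (hypothetical structure)."""
    shape_count: int = 0
    unfiltered_columns: int = 0  # shapes that want every column
    unfiltered_rows: int = 0     # shapes without a where clause
    columns: Counter = field(default_factory=Counter)
    where_clauses: Counter = field(default_factory=Counter)


class FilterRegistry:
    def __init__(self):
        self._relations = {}

    def add_shape(self, relation, columns=None, where=None):
        self._update(relation, columns, where, +1)

    def remove_shape(self, relation, columns=None, where=None):
        self._update(relation, columns, where, -1)

    def _update(self, relation, columns, where, delta):
        rel = self._relations.setdefault(relation, RelationFilter())
        rel.shape_count += delta
        if columns is None:
            rel.unfiltered_columns += delta
        else:
            for column in columns:
                rel.columns[column] += delta
        if where is None:
            rel.unfiltered_rows += delta
        else:
            rel.where_clauses[where] += delta
        # No shapes left: drop the relation so it leaves the publication.
        if rel.shape_count <= 0:
            del self._relations[relation]

    def effective_filters(self):
        """Map relation -> (columns, where_clauses); None means 'no filter'."""
        result = {}
        for name, rel in self._relations.items():
            columns = None
            if rel.unfiltered_columns == 0:
                columns = sorted(c for c, n in rel.columns.items() if n > 0)
            wheres = None
            if rel.unfiltered_rows == 0:
                wheres = sorted(w for w, n in rel.where_clauses.items() if n > 0)
            result[name] = (columns, wheres)
        return result
```

Because every call to `effective_filters` recomputes the full picture from the counts, a debounced writer could call it once per batch and rewrite the publication from scratch, which is the "complete rewrite" behaviour described above.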
We have discussed removing publication where clause filters, as we've seen that it is actually slower than doing the filtering in Elixir (...maybe not in all cases, we'd have to check), but if we do this then we'd be fine keeping replica identity default. Can you describe the other issue you mentioned in our call?
The other issue with filtering columns, which requires setting Suppose you have a table
Therefore by using
The cost would depend on the number of shapes defined for a table with different column filters. Do we think there could be many? On the other hand, how much would we save in the general case by filtering out columns in the publication? We're entering the territory of fine-grained optimization. It's not clear what is best for the different workloads. I'm keen on having the possibility of filtering columns on the publication, but I wonder how and when is the best way to expose this to developers, if it isn't one-size-fits-all.
Following #1804 and corresponding PR #1829
We want to alter the publication to only send the selected/required columns over the replication stream, which would be the columns selected via the `column` parameter, and potentially also any columns referenced in the `where` clause if specified.

While PG does support partial replication of a table (at least for PG15 and above I think), there are some warnings in the docs:
This implies that even with a single publication and subscription, altering the column lists of tables might lead to errors - this needs to be investigated.
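As a rough illustration of the change being proposed, the sketch below builds an `ALTER PUBLICATION ... SET TABLE` statement with a column list, using the PostgreSQL 15+ syntax. It is written in Python for brevity and is not Electric's code: the function names and the example publication name are hypothetical, and the columns referenced by the where clause are assumed to be provided by the caller rather than parsed out.

```python
def quote_ident(name):
    """Double-quote an SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def publication_columns(selected, where_columns=()):
    """Selected columns plus any where-clause columns, deduplicated in order."""
    seen = []
    for column in list(selected) + list(where_columns):
        if column not in seen:
            seen.append(column)
    return seen


def alter_publication_sql(publication, table, selected, where_columns=()):
    """Render the statement as a string (nothing is executed here).

    Note: SET TABLE replaces the publication's whole table list, so a real
    implementation would have to list every published table in one statement.
    """
    columns = ", ".join(
        quote_ident(c) for c in publication_columns(selected, where_columns)
    )
    return (
        f"ALTER PUBLICATION {quote_ident(publication)} "
        f"SET TABLE {quote_ident(table)} ({columns})"
    )


# Hypothetical names, for illustration only.
statement = alter_publication_sql(
    "example_publication", "items", ["id", "name"], where_columns=["price", "id"]
)
```

Whether altering the column list this way on a live publication triggers the errors the docs warn about is exactly the open question raised here.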