Clean up publication filters when shapes are removed #1774
How much work is it to generate this call? What if 1000s of new shapes are being created every second?
The cleanups in general can be scheduled, but it is important to note that not cleaning them up also means that we're processing data through the replication stream that we don't need, which also takes up resources - delaying a few requests in order to free up resources is probably still a better option, but we'd need to benchmark that. Cleaning it up at any point is probably the first step. We could also avoid calling it whenever a shape is deleted and only call it whenever a shape is created like we do now, and the creation call will clean up what is not used (and perhaps trigger periodic cleanups as well rather than on every shape deletion). For reference, we're already doing at least 2 roundtrips with Postgres whenever a shape is created in order to adjust the publication filter, and that occurs on the critical path, so I doubt the cleanup will be a bottleneck.
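For context on the cost being discussed, the per-shape change is essentially one `ALTER PUBLICATION` round-trip. A minimal sketch, assuming a Postgrex connection and illustrative names (this is not the actual `Electric.Postgres.Configuration` code); row filters on publication tables require PostgreSQL 15+:

```elixir
# Illustrative sketch only - not the actual Electric.Postgres.Configuration
# code. Ignores the case where the table is already in the publication
# (ADD TABLE errors on duplicates) and does no identifier quoting/escaping.
defmodule PerShapeFilterSketch do
  # Shape without a where clause: subscribe to all rows of the table.
  def add_table_for_shape(conn, publication, table, nil) do
    Postgrex.query!(conn, ~s|ALTER PUBLICATION "#{publication}" ADD TABLE #{table}|, [])
  end

  # Shape with a where clause: attach it as a row filter (PostgreSQL 15+).
  def add_table_for_shape(conn, publication, table, where_clause) do
    Postgrex.query!(
      conn,
      ~s|ALTER PUBLICATION "#{publication}" ADD TABLE #{table} WHERE (#{where_clause})|,
      []
    )
  end
end
```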
Addresses #1749

- Makes publication and slot names configurable via a `REPLICATION_STREAM_ID` env variable, which can ultimately be used for multiple Electric deploys
- Quotes all publication and slot names to address potential issues with configurable names (the alternative is to force-downcase them when initialised, to avoid nasty case-sensitivity bugs)
- Waits for a message from `Electric.LockConnection` that the lock is acquired before initialising `ConnectionManager` with the replication stream and shapes.
  - If more than one Electric tries to connect to the same replication slot (with the same `REPLICATION_STREAM_ID`), it makes a blocking query to acquire the lock that resolves once the previous Electric using that slot releases it - this addresses rolling deploys and ensures resources are initialised only once the previous Electric has released them
  - We could potentially switch to `pg_try_advisory_lock`, which is not a blocking query but immediately returns whether the lock could be acquired, and implement retries with backoff - but since using `pg_advisory_lock` simplifies the implementation, I decided to start with that and see what people think.

Things that I still need to address:

- Currently the publication gets altered when a shape is created (adds a table and potentially a row filter) but no cleanup occurs - so the publication can potentially grow to include everything between restarts and deploys even if it is not being used.
  - The way I want to address this is to change `Electric.Postgres.Configuration` to alter the publication based on _all_ active shapes rather than each individual one; in that case every call will update the publication as necessary, and resuming/cleaning can be a matter of calling this every time a shape is deleted and once upon starting (with recovered shapes or no shapes). Can be a separate PR.
  - Created #1774 to address this separately

Co-authored-by: Oleksii Sholik <[email protected]>
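For illustration, a minimal sketch of the blocking versus non-blocking lock acquisition described above; deriving the lock key via `hashtext/1` is an assumption here, not necessarily what `Electric.LockConnection` does:

```elixir
# Illustrative sketch; the key derivation via hashtext/1 is an assumption,
# not necessarily how Electric.LockConnection derives its lock key.
defmodule LockSketch do
  # Blocking variant: the query only returns once the previous holder
  # (e.g. the Electric instance being replaced in a rolling deploy)
  # releases the lock or disconnects.
  def acquire(conn, stream_id) do
    Postgrex.query!(conn, "SELECT pg_advisory_lock(hashtext($1))", [stream_id])
    :ok
  end

  # Non-blocking alternative: returns immediately with true/false, so the
  # caller would have to retry with backoff until it acquires the lock.
  def try_acquire(conn, stream_id) do
    %{rows: [[locked?]]} =
      Postgrex.query!(conn, "SELECT pg_try_advisory_lock(hashtext($1))", [stream_id])

    locked?
  end
end
```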
It's a bit of an orthogonal point to this issue, but we need to re-think how we set up filtering on the replication slot. We cannot set a replication filter per shape - we need to be more selective about the WHERE clauses, because passing a filter for every shape will never work. If we think at a macro level, one Electric instance needs to be able to handle a certain number of rows, no matter how many filters you put into the replication slot. If we get to the point where the max throughput of Electric is not high enough, we need to do shape matching more efficiently.
I'll create an issue for evaluating the performance impact of filtering.
Columns + tables seems sustainable.
Yeah, subscribing to tables we already do, but we should also drop them from the replication slot when the last shape for that table is gone. Columns + tables will not trigger changes so frequently. @msfstef is there a quick route for implementing "last shape for the table is gone"? Otherwise, I think this can be left until we work on #1778
@balegas it would require a bit of rewiring but I think we could do the following:
Should be a relatively simple solution - and we can leverage an ETS table to avoid going through a single process for this. |
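A minimal sketch of what that ETS-based reference counting could look like (module and table names are hypothetical); because ETS updates are atomic, shape processes can call this directly without funnelling through a single server process:

```elixir
# Hypothetical sketch of per-relation reference counting in ETS; a shape
# process would call shape_added/shape_removed around publication changes.
defmodule RelationRefCounts do
  @table :publication_relation_refs

  def init do
    :ets.new(@table, [:named_table, :public, :set])
  end

  # Returns :first_shape when the relation just gained its first shape,
  # i.e. it needs to be added to the publication.
  def shape_added(relation) do
    case :ets.update_counter(@table, relation, {2, 1}, {relation, 0}) do
      1 -> :first_shape
      _ -> :ok
    end
  end

  # Returns :last_shape_gone when no shapes reference the relation any more,
  # i.e. it can be dropped from the publication. Assumes shape_added/1 was
  # called for this relation first.
  def shape_removed(relation) do
    case :ets.update_counter(@table, relation, {2, -1}) do
      0 ->
        :ets.delete(@table, relation)
        :last_shape_gone

      _ ->
        :ok
    end
  end
end
```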
I think that adding the fine-grained filtering to the replication slot remains a bit of a risk (it seems we're abusing it). We should evaluate how adding more filters to the replication slot increases lag/compute cost. Q: do we filter only the columns any shape subscribes to?
Closes #1774

This work started out to introduce column filters (see #1831) but ran into a roadblock because of our use of `REPLICA IDENTITY FULL` - however, the work also takes care of cleaning up filters.

- Introduced a singular process for updating the publication - we were locking on it before anyway, might as well linearise it ourselves.
- The process maintains a reference-counted structure for the filters per relation, including where clauses and filtered columns, in order to produce correct overall filters per relation
- Updates to the publication are debounced to allow batching together many shape creations
- Every update does a complete rewrite of the publication filters so they are kept clean - but also introduces a `remove_shape` call so that if Electric remains with no shapes it should also have no subscriptions to tables.

## TODOs

- [x] Write tests for `PublicationManager`
- [x] Write procedure for recovering in-memory state from `shape_status.list_shapes` in `recover_shapes`
- [ ] Split where clauses at top-level `AND`s to improve filter optimality (suggested by @icehaunter) [edit: not doing this now, as we can be smart about this and do even more "merging" of where clauses like `x = 1` and `x = 2` to `x in (1, 2)` - separate PR]
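To illustrate the debouncing point, here is a minimal sketch of collapsing many shape creations into a single publication rewrite (the module is hypothetical and much simpler than the real `PublicationManager`, which also tracks the reference-counted filters themselves):

```elixir
# Hypothetical sketch of debouncing publication updates; the real
# PublicationManager also maintains reference-counted filters per relation.
defmodule PublicationDebounceSketch do
  use GenServer

  @debounce_ms 50

  # `rewrite_fun` is a zero-arity function that rewrites the publication
  # from the current filter state (e.g. issues the ALTER PUBLICATION).
  def start_link(rewrite_fun) do
    GenServer.start_link(__MODULE__, rewrite_fun, name: __MODULE__)
  end

  # Called on every shape creation/deletion; many calls within the debounce
  # window collapse into a single publication rewrite.
  def request_update, do: GenServer.cast(__MODULE__, :request_update)

  @impl true
  def init(rewrite_fun), do: {:ok, %{rewrite: rewrite_fun, timer: nil}}

  @impl true
  def handle_cast(:request_update, %{timer: nil} = state) do
    {:noreply, %{state | timer: Process.send_after(self(), :rewrite, @debounce_ms)}}
  end

  def handle_cast(:request_update, state), do: {:noreply, state}

  @impl true
  def handle_info(:rewrite, state) do
    # A complete rewrite from current state keeps the publication clean;
    # no incremental per-shape ALTERs need to be tracked or undone.
    state.rewrite.()
    {:noreply, %{state | timer: nil}}
  end
end
```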
Currently the replication publication gets altered when a shape is created (adding a table and potentially a row filter) but no cleanup occurs - so the publication can grow to include everything, even across restarts and deploys, and slow down the system over time.

My suggestion for addressing this is to change `Electric.Postgres.Configuration` to alter the publication based on all active shapes rather than each individual one. That way every call updates the entire publication and its filters as necessary, and resuming/cleaning becomes a matter of calling this every time a shape is deleted and once upon starting (with recovered shapes or no shapes).
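A minimal sketch of that "rewrite from all active shapes" approach, with hypothetical names (`PublicationRewriteSketch` is not the actual `Electric.Postgres.Configuration` API); `SET TABLE` with row filters requires PostgreSQL 15+, and the sketch assumes at least one active shape:

```elixir
# Hypothetical sketch - not the actual Electric.Postgres.Configuration API.
# Assumes `shapes` is non-empty (emptying the publication would need separate
# DROP TABLE handling) and does no identifier quoting/escaping.
defmodule PublicationRewriteSketch do
  # shapes: [%{relation: "public.items", where: "category = 'books'"}, ...]
  # where: nil means the shape needs all rows of the relation.
  def rewrite(conn, publication, shapes) do
    tables =
      shapes
      |> Enum.group_by(& &1.relation, & &1.where)
      |> Enum.map(fn {relation, wheres} ->
        if nil in wheres do
          # Some shape needs all rows, so no row filter for this relation.
          relation
        else
          "#{relation} WHERE (#{Enum.map_join(Enum.uniq(wheres), " OR ", &"(#{&1})")})"
        end
      end)

    # SET TABLE replaces the publication's entire table list in one statement,
    # so tables and row filters belonging to deleted shapes disappear
    # automatically on every call.
    Postgrex.query!(
      conn,
      ~s|ALTER PUBLICATION "#{publication}" SET TABLE #{Enum.join(tables, ", ")}|,
      []
    )
  end
end
```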