-
Notifications
You must be signed in to change notification settings - Fork 175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat (sync-service): add filters to PG publication #1660
Conversation
8d7164d
to
ff94123
Compare
✅ Deploy Preview for electric-next ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
ff94123
to
bb8267f
Compare
Looks great! The main potential problem that comes to mind here is would it get slow for Postgres to apply the where clause if e.g. there were 10k different unique shapes that it's processing. I doubt that'll happen that often and this should go in now as it makes sense but the above concern seems like something we'll want to try benchmarking and perhaps have a work-around for. |
@KyleAMathews Agreed. If i understand correctly, your point here is that the processing is done twice, right? Once by Postgres to check if it passes the publication filter, and then once by Electric which checks the individual filters to match the shapes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Great work
no more just that if the publication where clauses gets really massive because of the number of unique shapes that Electric is processing — does PG start to get slower at sending us operations because it's now a non-trivial task to calculate if it should send us an operation. |
Are we dropping PG14 from our supported matrix? |
Postgres 15 is now two years old and Postgres 17 is about to be released. Given we started again fresh with the re-write, I think it's fair to move up a version. Alternatively can we capability detect / version check when applying this optimisation? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think current code works entirely correctly + I wonder if we need to drop PG14, or if we can make this behavior conditional on the PG version
fa34a71
to
0b02a95
Compare
This could use some double-checking but Postgres is going to be more efficient at filtering than our current implementation in Electric regardless. What I'd be curious to know is how aggressively does Postgres deduplicate multiple similar OR-branches in a boolean expression. I expect it to be quite good at it since the same machinery is in use here as for the general WHERE expressions in SELECTs and other top-level statements. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all, this is a very nice addition. I was expecting we would need to drop down to Postgres row filters for performance gains at some point. Kudos on getting it done so quickly!
Now, thinking about potential implications, we need good test coverage for the case where Postgres converts an UPDATE into an INSERT or a DELETE based on row filter evaluation results. This is almost exactly the same thing we do for move-ins and move-outs, see this source comment.
I'm not expecting any problems for the UPDATE->INSERT conversion. But when an UPDATE is converted into a DELETE, we should double-check that the logical message includes the same fields that we would have chosen had the conversion happened in Electric, namely, the full old tuple and the old primary key (when the UPDATE changes any of the PK columns):
electric/packages/sync-service/lib/electric/replication/changes.ex
Lines 286 to 294 in 23f8f6b
def convert_update(%UpdatedRecord{} = change, to: :deleted_record) do | |
%DeletedRecord{ | |
relation: change.relation, | |
old_record: change.old_record, | |
key: change.old_key || change.key, | |
log_offset: change.log_offset, | |
tags: change.tags | |
} | |
end |
f5c3b07
to
4847896
Compare
04df034
to
307b64b
Compare
✅ Deploy Preview for electric-next ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
great point! Yeah, there's been an enormous amount of effort poured into optimizing this. |
@icehaunter Great catch for If you could review the changes that would be great. |
This PR modifies the sync-service such that it adds the necessary filters to the PG publication based on the shape's where clause. Basically, for every table, we add a filter which is the disjunction of the where clauses of all the shapes that are rooted in this table. For example, if we have a shape on `foo` with where clause `a = 5` and another shape for `foo` with where clause `a = 9` then the publication will have the filter `((a = 5) OR (a = 9))`. As soon as there is a shape that does not have a where clause, the filters for that table is removed from the publication. **Note: this requires at least Postgres 15.**
This PR modifies the sync-service such that it adds the necessary filters to the PG publication based on the shape's where clause. Basically, for every table, we add a filter which is the disjunction of the where clauses of all the shapes that are rooted in this table.
For example, if we have a shape on
foo
with where clausea = 5
and another shape forfoo
with where clausea = 9
then the publication will have the filter((a = 5) OR (a = 9))
.As soon as there is a shape that does not have a where clause, the filters for that table is removed from the publication.
Note: this requires at least Postgres 15.