Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat (sync-service): add filters to PG publication #1660

Merged
merged 9 commits into from
Sep 11, 2024

Conversation

kevin-dp
Copy link
Contributor

@kevin-dp kevin-dp commented Sep 10, 2024

This PR modifies the sync-service such that it adds the necessary filters to the PG publication based on the shape's where clause. Basically, for every table, we add a filter which is the disjunction of the where clauses of all the shapes that are rooted in this table.

For example, if we have a shape on foo with where clause a = 5 and another shape for foo with where clause a = 9 then the publication will have the filter ((a = 5) OR (a = 9)).

As soon as there is a shape that does not have a where clause, the filters for that table is removed from the publication.

Note: this requires at least Postgres 15.

@kevin-dp kevin-dp force-pushed the kevin/where-claused-pubs branch from 8d7164d to ff94123 Compare September 10, 2024 12:21
Copy link

netlify bot commented Sep 10, 2024

Deploy Preview for electric-next ready!

Name Link
🔨 Latest commit 8d7164d
🔍 Latest deploy log https://app.netlify.com/sites/electric-next/deploys/66e039b23c1b3c000853e5c3
😎 Deploy Preview https://deploy-preview-1660--electric-next.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@kevin-dp kevin-dp force-pushed the kevin/where-claused-pubs branch from ff94123 to bb8267f Compare September 10, 2024 13:27
@kevin-dp kevin-dp marked this pull request as ready for review September 10, 2024 13:58
@KyleAMathews
Copy link
Contributor

Looks great!

The main potential problem that comes to mind here is would it get slow for Postgres to apply the where clause if e.g. there were 10k different unique shapes that it's processing.

I doubt that'll happen that often and this should go in now as it makes sense but the above concern seems like something we'll want to try benchmarking and perhaps have a work-around for.

@kevin-dp
Copy link
Contributor Author

@KyleAMathews Agreed. If i understand correctly, your point here is that the processing is done twice, right? Once by Postgres to check if it passes the publication filter, and then once by Electric which checks the individual filters to match the shapes.

Copy link
Contributor

@robacourt robacourt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Great work

@KyleAMathews
Copy link
Contributor

If i understand correctly, your point here is that the processing is done twice, right? Once by Postgres to check if it passes the publication filter, and then once by Electric which checks the individual filters to match the shapes.

no more just that if the publication where clauses gets really massive because of the number of unique shapes that Electric is processing — does PG start to get slower at sending us operations because it's now a non-trivial task to calculate if it should send us an operation.

@icehaunter
Copy link
Contributor

Are we dropping PG14 from our supported matrix?

@thruflo
Copy link
Contributor

thruflo commented Sep 10, 2024

Postgres 15 is now two years old and Postgres 17 is about to be released. Given we started again fresh with the re-write, I think it's fair to move up a version.

Alternatively can we capability detect / version check when applying this optimisation?

Copy link
Contributor

@icehaunter icehaunter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think current code works entirely correctly + I wonder if we need to drop PG14, or if we can make this behavior conditional on the PG version

@kevin-dp kevin-dp force-pushed the kevin/where-claused-pubs branch from fa34a71 to 0b02a95 Compare September 11, 2024 08:17
@alco
Copy link
Member

alco commented Sep 11, 2024

no more just that if the publication where clauses gets really massive because of the number of unique shapes that Electric is processing — does PG start to get slower at sending us operations because it's now a non-trivial task to calculate if it should send us an operation.

This could use some double-checking but Postgres is going to be more efficient at filtering than our current implementation in Electric regardless. What I'd be curious to know is how aggressively does Postgres deduplicate multiple similar OR-branches in a boolean expression. I expect it to be quite good at it since the same machinery is in use here as for the general WHERE expressions in SELECTs and other top-level statements.

Copy link
Member

@alco alco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, this is a very nice addition. I was expecting we would need to drop down to Postgres row filters for performance gains at some point. Kudos on getting it done so quickly!

Now, thinking about potential implications, we need good test coverage for the case where Postgres converts an UPDATE into an INSERT or a DELETE based on row filter evaluation results. This is almost exactly the same thing we do for move-ins and move-outs, see this source comment.

I'm not expecting any problems for the UPDATE->INSERT conversion. But when an UPDATE is converted into a DELETE, we should double-check that the logical message includes the same fields that we would have chosen had the conversion happened in Electric, namely, the full old tuple and the old primary key (when the UPDATE changes any of the PK columns):

def convert_update(%UpdatedRecord{} = change, to: :deleted_record) do
%DeletedRecord{
relation: change.relation,
old_record: change.old_record,
key: change.old_key || change.key,
log_offset: change.log_offset,
tags: change.tags
}
end
.

@kevin-dp kevin-dp force-pushed the kevin/where-claused-pubs branch 2 times, most recently from f5c3b07 to 4847896 Compare September 11, 2024 11:59
@kevin-dp kevin-dp force-pushed the kevin/where-claused-pubs branch from 04df034 to 307b64b Compare September 11, 2024 12:28
Copy link

netlify bot commented Sep 11, 2024

Deploy Preview for electric-next ready!

Name Link
🔨 Latest commit 307b64b
🔍 Latest deploy log https://app.netlify.com/sites/electric-next/deploys/66e18cdfd2873b0008d65e69
😎 Deploy Preview https://deploy-preview-1660--electric-next.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@KyleAMathews
Copy link
Contributor

but Postgres is going to be more efficient at filtering than our current implementation in Electric regardless. What I'd be curious to know is how aggressively does Postgres deduplicate multiple similar OR-branches in a boolean expression. I expect it to be quite good at it since the same machinery is in use here as for the general WHERE expressions in SELECTs and other top-level statements.

great point! Yeah, there's been an enormous amount of effort poured into optimizing this.

@kevin-dp
Copy link
Contributor Author

@icehaunter Great catch for SET overriding other tables. Fixed that!
Also, i changed the publication handling to be conditional on the version of PG and modified CI to run the publication tests with both PG 14 and PG 15 (cc @alco).

If you could review the changes that would be great.

@kevin-dp kevin-dp requested a review from icehaunter September 11, 2024 13:01
@kevin-dp kevin-dp merged commit 5c684bd into main Sep 11, 2024
23 checks passed
@kevin-dp kevin-dp deleted the kevin/where-claused-pubs branch September 11, 2024 14:17
KyleAMathews pushed a commit that referenced this pull request Nov 1, 2024
This PR modifies the sync-service such that it adds the necessary
filters to the PG publication based on the shape's where clause.
Basically, for every table, we add a filter which is the disjunction of
the where clauses of all the shapes that are rooted in this table.

For example, if we have a shape on `foo` with where clause `a = 5` and
another shape for `foo` with where clause `a = 9` then the publication
will have the filter `((a = 5) OR (a = 9))`.

As soon as there is a shape that does not have a where clause, the
filters for that table is removed from the publication.

**Note: this requires at least Postgres 15.**
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants