
feat: Add columns API query parameter to filter table columns #1829

Merged — 19 commits merged into main from msfstef/column-filtering on Oct 14, 2024

Conversation

@msfstef (Contributor) commented Oct 10, 2024

Implements #1804

Overview

  • I've picked columns as the query parameter name: select might confuse people and invite comparisons to SQL SELECTs, and filter is too close to what the where parameter does.
  • The columns are comma-separated column names, parsed at the validation stage.
  • If the parameter is specified, it has to include the primary key columns.
    • We could make it such that the PK is always included, but that feels like it might be confusing - explicit is better?
    • For tables without a primary key, we treat all columns as primary keys - we might want to revisit that w.r.t. this feature.
  • Columns are also validated by ensuring that all specified column names match the ones in the schema.
  • The selected columns are stored in the Shape definition as an array of column names.
    • We could have filtered the table_info data based on the selected columns instead of storing them separately, but that might cause issues with relation change handling, and generally I prefer the shape definition to contain all information related to it (i.e. the schema at the time of creation plus whatever column filters were applied, rather than compacting the two).
  • I modified Shape-related APIs for converting changes etc. to also apply the column filtering - this is all we need to ensure it works correctly for replication stream log entries.
  • In the Snapshotter, I modified the get_column_names method to also apply the filtering if present, which takes care of the snapshot log entries.
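The validation and filtering rules above can be sketched roughly as follows. This is TypeScript for illustration only (the actual sync-service code is Elixir); the function and type names are hypothetical, and quoting/case-folding of identifiers is ignored here.

```typescript
// Simplified sketch of the rules described above: every requested column
// must exist in the schema, and the primary key columns must be included.
type Schema = { columns: string[]; primaryKey: string[] }
type Result = { ok: true } | { ok: false; error: string }

function validateColumns(schema: Schema, requested: string[]): Result {
  const unknown = requested.filter((c) => !schema.columns.includes(c))
  if (unknown.length > 0) {
    return { ok: false, error: `Invalid columns: ${unknown.join(", ")}` }
  }
  const missingPk = schema.primaryKey.filter((c) => !requested.includes(c))
  if (missingPk.length > 0) {
    return { ok: false, error: `Must include primary key columns: ${missingPk.join(", ")}` }
  }
  return { ok: true }
}

// Filtering a log-entry row down to the selected columns:
function filterRow(
  row: Record<string, unknown>,
  columns: string[]
): Record<string, unknown> {
  return Object.fromEntries(
    Object.entries(row).filter(([key]) => columns.includes(key))
  )
}
```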

Other changes

  • I've changed Shapes.new to return errors along with the field they are associated with (:root_table, :where, or :columns) in order to return correct validation errors: all of the stronger validation occurs at cast_root_table when the shape is created and PG is used, but it really validates more than just root_table.
  • I've updated the client to accept a columns argument which is a list of strings
  • I've updated the OpenAPI spec to include the columns parameter
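On the wire, the columns option becomes a comma-separated value on the shape URL. A minimal sketch of that translation (the endpoint path and the other parameter names here are assumptions for illustration; the PR only specifies the columns parameter itself):

```typescript
// Hypothetical helper showing how a columns list maps onto the query string.
function shapeUrl(baseUrl: string, table: string, columns?: string[]): string {
  const params = new URLSearchParams({ table })
  if (columns !== undefined) {
    // Must include the primary key column(s), per the validation rules above.
    params.set("columns", columns.join(","))
  }
  return `${baseUrl}/v1/shape?${params.toString()}`
}
```

Note that URLSearchParams percent-encodes the commas, which the server decodes before the columns parameter is parsed.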

Things to consider

  • How do we want to handle column names that need quoting (special characters)? In my opinion the client needs to type them exactly as they are in Postgres; otherwise they get a 400 validation error back telling them which columns are invalid, so it should be fairly easy to fix.
  • Replication publication filtering of columns is available, but updating it might cause errors according to the PG docs (https://www.postgresql.org/docs/15/logical-replication-col-lists.html#LOGICAL-REPLICATION-COL-LIST-COMBINING), so I'm not sure if that's something we want to consider.
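The quoted-name question above comes down to splitting the columns parameter on commas while respecting PG-style double-quoted identifiers, where a doubled quote ("") inside quotes is an escaped quote. A rough illustrative sketch (TypeScript; the real parsing lives in Elixir, and this version does not track quoted vs. unquoted for case folding):

```typescript
// Split a comma-separated column list, treating "..." as a quoted
// identifier whose commas are literal and whose "" means an escaped ".
function parseColumnsParam(input: string): string[] {
  const out: string[] = []
  let current = ""
  let inQuotes = false
  for (let i = 0; i < input.length; i++) {
    const ch = input[i]
    if (ch === '"') {
      if (inQuotes && input[i + 1] === '"') {
        current += '"' // escaped quote inside a quoted identifier
        i++
      } else {
        inQuotes = !inQuotes // entering or leaving a quoted identifier
      }
    } else if (ch === "," && !inQuotes) {
      out.push(current)
      current = ""
    } else {
      current += ch
    }
  }
  out.push(current)
  return out
}
```

For example, `"fo""o",bar` splits into the identifier fo"o and the identifier bar, rather than breaking at the comma inside the quotes.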

@msfstef msfstef linked an issue Oct 10, 2024 that may be closed by this pull request

netlify bot commented Oct 10, 2024

Deploy Preview for electric-next ready!

🔨 Latest commit: 7b5aa03
🔍 Latest deploy log: https://app.netlify.com/sites/electric-next/deploys/670d22b128596700074e7c28
😎 Deploy Preview: https://deploy-preview-1829--electric-next.netlify.app

@robacourt (Contributor) left a comment
Neat!

@balegas (Contributor) commented Oct 10, 2024

I agree with your choices. The only thing is that we could avoid the errors for PKs, but I prefer erroring.

> How do we want to handle column names with special names (quoted)? In my opinion the client needs to type them exactly as they are on postgres, otherwise they get a 400 validation error back telling them which columns are invalid, so it should be fairly easy to fix.

Please make sure it is the same logic as table names - check with @kevin-dp.

> Replication publication filtering of columns is available

I think this is an important thing to have for the reasons we've discussed. Since we have a single subscriber and we know all the columns Electric needs, maybe PG behaves. Please open a separate ticket so it doesn't block merging this one, but I think this is one thing we want to validate is working.

@kevin-dp (Contributor) left a comment

Some issues here:

  • PG allows any character in column names if you quote them, e.g. CREATE TABLE foo ("a,column""with-weird name" INT);. Currently the parser will misinterpret the comma as a separator, so we should allow column names to be quoted (consistent with table name parsing).
  • The parser is case-sensitive by default. This is inconsistent with schema and table names, which are case-insensitive unless you quote them.

This would be a good moment to implement #1814 such that we have a parse function that we can use for parsing schema identifiers, table identifiers, and column identifiers.
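The case-folding rule described here follows Postgres: unquoted identifiers fold to lower case, quoted ones are taken verbatim with "" unescaped to ". A hedged sketch of such a parse function (TypeScript for illustration; the actual implementation would live in the Elixir identifiers module, and the name parseIdentifier is hypothetical):

```typescript
type ParseResult = { ok: true; name: string } | { ok: false; error: string }

// Normalize a single identifier the way PG does: quoted identifiers are
// verbatim (with "" unescaped), unquoted identifiers are lowercased and
// restricted to letters, digits, underscore, and dollar sign.
function parseIdentifier(raw: string): ParseResult {
  if (raw.length >= 2 && raw.startsWith('"') && raw.endsWith('"')) {
    const inner = raw.slice(1, -1)
    // After removing escaped pairs, no bare quote may remain inside.
    if (/"/.test(inner.replace(/""/g, ""))) {
      return { ok: false, error: `Invalid quoted identifier: ${raw}` }
    }
    return { ok: true, name: inner.replace(/""/g, '"') }
  }
  if (!/^[a-zA-Z_][a-zA-Z0-9_$]*$/.test(raw)) {
    return { ok: false, error: `Invalid unquoted identifier contains special characters: ${raw}` }
  }
  return { ok: true, name: raw.toLowerCase() }
}
```

So Foo and "foo" would refer to the same column, while "Foo" stays case-sensitive - matching how schema and table names are already handled.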

@kevin-dp kevin-dp self-requested a review October 14, 2024 09:44
@kevin-dp (Contributor) left a comment

Well done!
I left some comments mostly asking for minor changes.

packages/sync-service/lib/electric/shapes/shape.ex (outdated, resolved)
packages/sync-service/lib/electric/shapes/shape.ex (outdated, resolved)
packages/typescript-client/src/client.ts (outdated, resolved)
website/electric-api.yaml (outdated, resolved)
@kevin-dp (Contributor) left a comment

Brilliant!
Left a few very minor comments.
Ready to merge for my part 💯

@@ -0,0 +1,50 @@
defmodule Electric.Plug.Utils do
Contributor:

Let's perhaps not introduce several utility files but keep everything in the outer utils.ex file?

@msfstef (Author):

I feel like this is not a general utility like the other ones and is more meant to be part of the Plug code - we already have a very long "serve shape" plug module which defines submodules; the idea is that parsing logic could move to this.

packages/sync-service/lib/electric/plug/utils.ex (outdated, resolved)
packages/sync-service/lib/electric/plug/utils.ex (outdated, resolved)
iex> Electric.Plug.Utils.parse_columns_param(~S|"fo""o",bar|)
{:ok, ["bar", ~S|fo"o|]}
iex> Electric.Plug.Utils.parse_columns_param(~S|"id,"name"|)
{:error, ~S|Invalid unquoted identifier contains special characters: "id|}
Contributor:
why is this saying "unquoted identifier"? The sigil does contain quotes.
The problem should be that the quote inside the quote is not escaped (with a double quote).

@msfstef (Author) commented Oct 14, 2024

The provided column was "id - so I'm treating this as an unquoted identifier that includes a special character (").

If the strategy for splitting on commas were different, it would instead produce id,"name as a quoted identifier and give a different error.

packages/sync-service/lib/electric/postgres/identifiers.ex (outdated, resolved)
packages/sync-service/lib/electric/postgres/identifiers.ex (outdated, resolved)
@msfstef msfstef force-pushed the msfstef/column-filtering branch from 20a81c2 to 7b5aa03 Compare October 14, 2024 13:54
@msfstef msfstef force-pushed the msfstef/column-filtering branch from 5929b95 to adad2d1 Compare October 14, 2024 14:21
@msfstef msfstef merged commit 25c437f into main Oct 14, 2024
24 checks passed
@msfstef msfstef deleted the msfstef/column-filtering branch October 14, 2024 14:47
KyleAMathews pushed a commit that referenced this pull request Nov 1, 2024
Successfully merging this pull request may close these issues.

API parameter to filter columns shape