# feat: Add `columns` API query parameter to filter table columns (#1829)
Neat!
I agree with your choices. The only thing is that we could avoid the errors for missing PKs, but I prefer erroring.
Please make sure it uses the same logic as table names; check with @kevin-dp
I think this is an important thing to have, for the reasons we've discussed. Since we have a single subscriber and we know all the columns Electric needs, maybe PG behaves. Please open a separate ticket so as not to block merging this one, but I think this is something we want to validate works.
Some issues here:
- PG allows any character in column names if you quote them, e.g. `CREATE TABLE foo ("a,column""with-weird name" INT);`. Currently, the parser will misinterpret the comma as a separator, so we should allow column names to be quoted (this is consistent with table name parsing).
- The parser is case-sensitive by default. This is inconsistent with schema and table names, which are case-insensitive unless you quote them.
This would be a good moment to implement #1814 such that we have a parse function that we can use for parsing schema identifiers, table identifiers, and column identifiers.
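As a sketch of the PostgreSQL identifier rules being discussed (unquoted identifiers fold to lower case and may only contain plain characters; quoted identifiers preserve case and may contain anything, with `""` standing for an escaped quote), here is a hypothetical Python helper — `parse_identifier` is illustrative, not the actual Elixir parse function proposed in #1814:

```python
import re

def parse_identifier(raw: str) -> str:
    """Normalize a single PG-style identifier.

    Quoted identifiers keep their exact contents (with "" unescaped to ");
    unquoted ones must be plain and are folded to lower case, mirroring
    PostgreSQL's case-folding rules.
    """
    if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
        inner = raw[1:-1]
        # Inside a quoted identifier, a doubled quote stands for one quote;
        # any remaining lone quote is an escaping error.
        if inner.replace('""', '').count('"') != 0:
            raise ValueError(f"Invalid quoted identifier contains unescaped quote: {raw}")
        return inner.replace('""', '"')
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_$]*", raw):
        raise ValueError(f"Invalid unquoted identifier contains special characters: {raw}")
    # Unquoted identifiers are case-insensitive: PG folds them to lower case.
    return raw.lower()
```

With these rules, `'"a,column""with-weird name"'` parses to `a,column"with-weird name`, `MyColumn` folds to `mycolumn`, and a stray `"id` is rejected as an unquoted identifier with a special character.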
Well done!
I left some comments mostly asking for minor changes.
Brilliant!
Left a few very minor comments.
Ready to merge for my part 💯
@@ -0,0 +1,50 @@
defmodule Electric.Plug.Utils do
Let's perhaps not introduce several utility files but keep everything in the outer `utils.ex` file?
I feel like this is not a general utility like the other ones and is more meant to be part of the Plug code. We already have a very long serve shape plug module which defines submodules; the idea is that the parsing logic could move there.
```elixir
iex> Electric.Plug.Utils.parse_columns_param(~S|"fo""o",bar|)
{:ok, ["bar", ~S|fo"o|]}
iex> Electric.Plug.Utils.parse_columns_param(~S|"id,"name"|)
{:error, ~S|Invalid unquoted identifier contains special characters: "id|}
```
Why is this saying "unquoted identifier"? The sigil does contain quotes.
The problem should be that the quote inside the quotes is not escaped (with a doubled quote).
The provided column was `"id`, so I'm treating it as an unquoted identifier that includes a special character (`"`).
If the strategy for splitting on commas were different, it would produce `id,"name` as a quoted identifier and yield a different error.
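To illustrate the splitting strategy under discussion, here is a simplified Python sketch of a comma splitter that honours double-quoted identifiers (with `""` as an escaped quote); it is not the actual Elixir implementation, and its error behaviour for malformed input differs from the real error messages above:

```python
def parse_columns_param(param: str) -> list[str]:
    """Split a comma-separated column list, treating commas inside
    double-quoted identifiers as literal characters and "" as an
    escaped quote. Simplified sketch: no validation or case folding."""
    cols: list[str] = []
    buf: list[str] = []
    i, n = 0, len(param)
    while i < n:
        ch = param[i]
        if ch == '"':
            # Consume a quoted identifier up to its closing quote.
            i += 1
            while i < n:
                if param[i] == '"':
                    if i + 1 < n and param[i + 1] == '"':
                        buf.append('"')  # "" -> literal quote
                        i += 2
                        continue
                    break  # closing quote found
                buf.append(param[i])
                i += 1
            else:
                raise ValueError("unterminated quoted identifier")
        elif ch == ',':
            cols.append(''.join(buf))
            buf = []
        else:
            buf.append(ch)
        i += 1
    cols.append(''.join(buf))
    return cols
```

Under this scheme, `'"fo""o",bar'` splits into `['fo"o', 'bar']`, matching the happy-path doctest, while a quote left open raises instead of being passed along as an unquoted identifier.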
Implements #1804

### Overview

- I've picked `columns` as the query parameter name, as `select` might confuse people and get compared to SQL `SELECT`s, and `filter` is too close to what the `where` parameter does.
- The columns are comma-separated column names, parsed at the validation stage.
- If the parameter is specified, it *has to include* the primary key columns.
  - We could make it such that the PK is always included, but it feels like that might be confusing - explicit is better?
  - For tables without a primary key, we treat _all_ columns as primary keys - we might want to revisit that w.r.t. this feature.
- Columns are also validated by ensuring that all specified column names match the ones in the schema.
- The selected columns are stored in the `Shape` definition as an array of column names.
  - We had the choice of potentially filtering the `table_info` data based on the selection of columns instead of storing them separately, but this might cause issues with relation change handling, and generally I prefer the definition of the shape to contain all information related to it (i.e. the schema at the time of creation plus what column filters were applied, rather than compacting the two).
- I modified `Shape`-related APIs for converting changes etc. to also apply the column filtering - this is all we need to ensure it works correctly for replication stream log entries.
- In the `Snapshotter`, I modified the `get_column_names` method to also apply the filtering if present, which takes care of the snapshot log entries.

### Other changes

- I've changed `Shapes.new` to return errors along with the field they are associated with (`:root_table`, `:where`, or `:columns`) in order to return correct validation errors, since all of the stronger validation occurs at `cast_root_table` when the shape is created and PG is used, but it really validates more than just `root_table`.
- I've updated the client to accept a `columns` argument, which is a list of strings.
- I've updated the OpenAPI spec to include the `columns` parameter.

### Things to consider

- How do we want to handle column names with special names (quoted)? In my opinion the client needs to type them exactly as they are in Postgres; otherwise they get a 400 validation error back telling them which columns are invalid, so it should be fairly easy to fix.
- Replication publication filtering of columns is available, but [updating it might cause errors](https://www.postgresql.org/docs/15/logical-replication-col-lists.html#LOGICAL-REPLICATION-COL-LIST-COMBINING) according to the PG docs, so I'm not sure if that's something we want to consider.
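The validation rules described above (every requested column must exist in the schema, and the primary key columns must be included) can be sketched in Python; `validate_columns`, `table_columns`, and `pk_columns` are illustrative names, not the actual Electric API:

```python
def validate_columns(requested: list[str], table_columns: set[str],
                     pk_columns: set[str]) -> list[str]:
    """Validate a parsed `columns` parameter against a table's schema.

    Every requested column must exist in the table, and all primary key
    columns must be included explicitly (a table without a PK would pass
    in all of its columns as pk_columns, per the PR description).
    """
    unknown = [c for c in requested if c not in table_columns]
    if unknown:
        raise ValueError(f"unknown columns: {unknown}")
    missing_pk = pk_columns - set(requested)
    if missing_pk:
        raise ValueError(f"must include primary key columns: {sorted(missing_pk)}")
    return requested

# A shape selecting (id, name) from a table with columns id, name, email
# and primary key id passes validation; omitting id would be a 400.
validate_columns(["id", "name"], {"id", "name", "email"}, {"id"})
```

Requesting `["name"]` against the same table would raise, mirroring the decision to error rather than silently add the PK.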