Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't send dynamic map keys to the database #4327

Merged

Conversation

greg-rychlewski
Copy link
Member

@greg-rychlewski greg-rychlewski commented Nov 22, 2023

I was reading over the issue from this morning about setting dynamic map keys (#4326). I think I have a decent way to solve it.

The idea is that any interpolated map key doesn't need to be sent to the database because it won't affect the query:

  • For subqueries/CTEs, only atom map key are allowed and those are never sent to the database
  • For outer queries literals are never sent to the database

This means instead of making map keys query parameters we can simply validate them on the spot and save them into the query struct.

There is a limitation where we can really only interpolate map keys for basic literals like atoms, strings and numbers. Composite literals like tuples and maps would require recursive escaping, which we lose the ability to do because we are at runtime.

I'm also restricting this proposal to just map keys even though I think there might be a case to extend it to anywhere in select. But it is more complicated (i.e. the distinction between outer and inner query becomes important) and more tests break, so I thought I would save it for later.

@greg-rychlewski
Copy link
Member Author

I requested a review because I wasn't sure if this had slipped by you or if you didn't have time for it right now. If the latter please ignore me.

@josevalim
Copy link
Member

Thank you for requesting it. It definitely slipped through. I think we just need to check this is also sane for unions/intersections, if you haven't yet!

@greg-rychlewski greg-rychlewski merged commit dbb7b1b into elixir-ecto:master Dec 6, 2023
6 checks passed
@greg-rychlewski greg-rychlewski deleted the dynamic_map_keys branch December 6, 2023 01:44
@greg-rychlewski
Copy link
Member Author

@fuelen you could try this out to see how it works for you :)

@fuelen
Copy link
Contributor

fuelen commented Dec 6, 2023

Just checked, it works fine. Thank you!

@greg-rychlewski
Copy link
Member Author

@josevalim Actually sorry there is something about combinations I didn't consider. I was hoping to get your opinion.

If there is a combination and the user interpolates a string map key then the string literal will be sent to the adapter. Then it will be up to the adapter to escape string properly to avoid something like this

union_query = from(c in Post, select: %{"id" => c.id})
outer_map_key = "id'; drop table posts; select 'id"
from(c in Post, select: %{^outer_map_key => c.id}, union: ^union_query)

Adapters are suppose to be escaping string literals properly. Otherwise even literals defined by the dev and not the user won't work properly. But it seems extreme to open them to SQL injection from outside users if they fail to do that.

This means if we want to close this hole, we have to continue to have interpolated map keys get changed to query parameters. And for atoms we need to have an escape hatch. There are two mechanisms we have today that kind of fit but not exactly:

  1. unsafe_fragment. This is allowed for hints and insert conflicts to inject sql directly without modification. It's not exactly the same as here where we are injecting a literal that will have some handling in the adapter, i.e. quoting/escaping
  2. literal/1. This is used as an escape hatch inside of fragments that allows the user to inject identifiers directly into the query, i.e. column names, collation names, etc. It's not exactly the same as here because the adapters will see this and quote it as an identifier rather than a literal string.

Based on the above, I can think of a few options:

  1. Create something new, like unsafe_literal/1 and use it here. I don't think we need to change the existing literal/1 because it is treated specially by adapters so it is safe
  2. Say this is ok, the adapter is supposed to be escaping strings properly
  3. Say there is no good way to handle this and completely revert

I am personally leaning towards option 1 but this feels hairy so not super confident. WDYT?

@josevalim
Copy link
Member

The adapters have to escape string literals, no? Otherwise someone could write this: foo == "id'; drop table posts; select 'id" and behave weirdly? Surely it is not an attach vector but still very confusing.

So maybe we are not escaping strings in this particular scenario?

@greg-rychlewski
Copy link
Member Author

Oh yeah sorry what I meant is that adapters still need to escape string literals properly. And the current ones do.

I was asking more from the angle of new adapters that might have mistakes in their implementation at the beginning.

@josevalim
Copy link
Member

Yeah, they have to. Otherwise all kinds of weird stuff break!

@greg-rychlewski
Copy link
Member Author

Ok phew =). Thanks for confirming!

@josevalim
Copy link
Member

@greg-rychlewski are we good to release a new patch version or would you like to leave this particular feature out just in case?

@greg-rychlewski
Copy link
Member Author

I'm good to release. I don't see a release to hold this feature back.

@josevalim
Copy link
Member

@greg-rychlewski I am reconsidering this one. Here we are using ^foo but we are not sending it as parameter to the database, right? Do we do this anywhere else? 🤔 It sounds like an exception to how Ecto works, no?

@josevalim
Copy link
Member

Alternatively, if we don't send it to the database, then we are still not passing it as parameter either. Do we do it anywhere else besides literal? or unsafe_fragment?

@greg-rychlewski
Copy link
Member Author

greg-rychlewski commented Dec 7, 2023

I believe the only two are literal and unsafe_fragment. But they are not really matching this purpose too well.

For most queries map keys won't be sent to the database but if combinations exist then they will. So there is definitely the gap you are saying in that scenario.

I'm pretty pessimistic that we can make combinations work without sending literals to the database. For example:

select "string1" union select "string2"

Something like this would be returned as ["select1", "select1"] from Ecto if the actual values are not sent to the database. Since we don't know where the main query ends/combination query starts in the results from the driver we can't really extract the literals and place them back in the right place after.

Given the above, I can only think of two options. Either scrap this entirely or introduce something like unsafe_literal in selects. Maybe the world unsafe is not needed here. But reusing literal for a different behaviour might be confusing.

@greg-rychlewski
Copy link
Member Author

It might also be possible in the planner to modify this part so that keep_literals? for the key is changed to be always false. I could play around with it to see if that causes issues.

defp collect_kv([{key, value} | elems], fields, from, query, take, keep_literals?, acc) do
    {key, fields, from} = collect_fields(key, fields, from, query, take, keep_literals?, %{})
    {value, fields, from} = collect_fields(value, fields, from, query, take, keep_literals?, %{})
    collect_kv(elems, fields, from, query, take, keep_literals?, [{key, value} | acc])
  end

@greg-rychlewski
Copy link
Member Author

greg-rychlewski commented Dec 7, 2023

No the above will break existing code for combinations. For example:

from c in Comment,
  select: %{"outer_title" => c.title},
  union: ^from(c in Comment, select: %{"inner_title" => c.title})

If the map key is sent to the database then it will return correctly

[%{"outer_title" => "title"},  %{"inner_title" => "title"}]

But if it is not sent to the database it will return this

[%{"outer_title" => "title"},  %{"outer_title" => "title"}]

@greg-rychlewski
Copy link
Member Author

I remember now what I was thinking. Because of literal/unsafe_fragment I was treating the pin operator as signal the user knows they want us to handle dynamic input rather than a signal to use a query parameter. Then depending on where the operator is used we handle it safely.

And because map keys can only ever enter the query as literals that are used to populate their values during postprocessing, it is safe how we are handling it.

I'm not sure how much of a stretch the above is. It makes some kind of sense to me but maybe i'm too tired.

@josevalim
Copy link
Member

Actually, I just realized that field(p, ^foo) is an obvious exception to my comment, so I think we are good to go. Thanks for additional discussion, it is good to double check where it fits in our mental model. :)

@greg-rychlewski
Copy link
Member Author

Ah that is a really good example! Completely forgot about it.

@greg-rychlewski
Copy link
Member Author

selected_as 1 and 2, as well

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants