Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix through association nested preloads #4341

Merged
merged 7 commits into from
Dec 21, 2023

Conversation

greg-rychlewski
Copy link
Member

Closes #4112

I didn't add tests yet because I wanted to see what you thought about the solution presented here and the alternative solution. This one is really tricky.

Background

When using Repo.preload, it is possible for through associations to lose their nested preloads like this

  from c in Comment, 
    join: t in ThroughAssoc,
    on: c.through_id == t.id, 
    join: u in User,
    on: t.user_id == u.id, 
    preload: [through_assoc: {t, [:some_other_assoc, user: u]}]

What happens is this:

  1. The join preloads put User into ThroughAssoc and ThroughAssoc into Comment
  2. The separate query preload puts :some_other_assoc into ThroughAssoc. However, it does this by generating the association chain and then putting it into the through field. This association chain does not have User inside of ThroughAssoc so that nested preload is lost.

Solution

The solution in this PR is pretty simple. If the through association is already loaded then we preload directly into it. The same way we do for embeds. When determining whether to put the association chain into the through field, we simply don't put the association chain into the through field if it has already been loaded. However, the chain is still calculated and put into the "regular" association fields.

The drawback of this method is that we are adding additional queries for the loaded through associations. Taking into account nested associations, this could end up with a decent amount of extra queries.

Alternative Solution

The alternative solution is to not preload the association chain if the through association has been loaded. I tried this out first but it gets quite complex to the point where I'm not sure there is any sane way to do it. There are two main issues:

  1. Different through associations can overlap in their association chains. For example:
  • posts => author => location
  • posts => author => company => department
  • posts => author => company => manager => director

Depending on which of these 3 through fields are already loaded, we would have to keep track of which sub-preloads at each nested layer are to be kept.

  1. Building off of point (1), the chain of preloads is dependent on which struct you are looking at. Since each struct can have different fields loaded/unloaded, not every one will have the same chain of nested preloads. This goes against the current flow in the code which assumes each struct in a give layer has the same sub-preloads.

It also means we have to carry over information several layers deep. For example, a field on post might control whether manager is preloaded. But by the time we get to manager we've lost all the post information because we preload recursively.

@josevalim
Copy link
Member

Awesome. This looks good to me but it feels it could be slightly backwards compatible, right? Previously we were always preloading through (as it force: true) and now it make it behave like (force: false), right?

@greg-rychlewski
Copy link
Member Author

Yeah definitely it changes the behaviour. In addition to the change of force behaviour it also changes some of the nested preload behaviour. For example, if you had this:

schema "posts" do
  has_many :comments_authors, through: [:comments, :authors]
  has_many :comments
end

schema "comments" do
  belongs_to :author
end

schema "authors" do
  has_one :permalink
  has_many :posts
end

post = Repo.preload(post, :comments_authors)
post = Repo.preload(post, [comments_authors: [:permalink], comments: [authors: [:posts]]])

In the second preload the :comments_authors gets expanded to its association chain and :permalink and :posts end up getting put into the authors struct. Since we were forcing an overwrite in the past, :posts would also be nested inside of :comments_authors even though it wasn't explicitly set. But with the change it won't get put there.

Now that you mention forcing, if we go ahead with this then I should also take it into account and allow the overwrite if force is true.

@josevalim
Copy link
Member

Maybe an alternative solution here is to know if the association has been loaded because it happens via Repo.preload or if it happens via query (i.e. preload+join)?

If the preload is filled because of a preload+join, then we can apply the heuristic in this patch, because the user is explicitly saying they want their given struct to win (i.e. it is force: false).

If it is a Repo.preload, then can keep the current semantics of filling in the association chain?

@greg-rychlewski
Copy link
Member Author

That's a really good idea. What do you think about the last commit? If you like it, I can write the integration tests.

@josevalim
Copy link
Member

Wait a second, you mentioned that in this case we will no longer have authors and posts in comments or something.

post = Repo.preload(post, [comments_authors: [:permalink], comments: [authors: [:posts]]])

But wouldn't we be able to know if comments_authors was set before hand (before calling preload) or set by preload itself? In this case, we could:

  1. If it was set before hand, we keep whatever is there, excepting forcing
  2. If it is was not set, we preserve today's behaviour

Would this be able to reduce the backwards incompatible changes without adding a new option?

@greg-rychlewski
Copy link
Member Author

Sorry I probably am not understanding properly. Would this be the same as my original commit or something different?

@josevalim
Copy link
Member

Sorry, I am telling you my conclusions but not my thought process. There are two concerns in this PR, as far as I know:

  1. force_through changing the behaviour from true to false

  2. post = Repo.preload(post, [comments_authors: [:permalink], comments: [authors: [:posts]]]) not getting expanded into the authors struct

What I am saying is that, we only need to change the behaviour of preload: [...] with joins in them, we can keep everything else behaving the exact same. And we can probably do that by passing some additional information here: https://github.com/elixir-ecto/ecto/blob/master/lib/ecto/repo/queryable.ex#L238

@greg-rychlewski
Copy link
Member Author

Oh I see what you mean now. There is a case to change the behaviour for Repo.preload as well. The reason is to keep parity with non-through preloads. For example

post = Repo.preload(post, [non_through_assoc: [:nested_assoc]])
post = Repo.preload(post, :non_through_assoc)

In the case above, post will still keep :nested_assoc. But for the case below it will not because the through field is wiped by the preload chain of "regular" assocs:

post = Repo.preload(post, [through_assoc: [:nested_assoc]])
post = Repo.preload(post, through_assoc)

I don't personally see a reason to expect non-through and through assocs to behave differently here. But to maintain backwards compatibility I thought it might be worth giving them an option instead of outright changing the behaviour.

@greg-rychlewski
Copy link
Member Author

Actually I misspoke a bit for this example:

post = Repo.preload(post, [through_assoc: [:nested_assoc]])
post = Repo.preload(post, through_assoc)

This will still have :nested_assoc. It only won't if you reset the "regular" assoc that is part of the through assoc. So for instance if you have this

schema "posts" do
  has_many :through_assoc, through: [:through1, through2]
  has_one :through1
end

Then if you do this, the nested assoc would disappear:

post = Repo.preload(post, [through_assoc: [:nested_assoc]])
post = Ecto.reset_fields(post, [:through1])
post = Repo.preload(post, through_assoc)

I think what you said makes sense then. As long as you are using Repo.preload normally and not resetting random fields it will behave consistently. I'll change the PR.

@greg-rychlewski
Copy link
Member Author

WDYT?

Copy link
Member

@josevalim josevalim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, I have only some minor nits.

To make sure we are on the same page. This should be fully backwards compatible. The only chance in behaviour is when you preload a through with joins and regular assocs, which was broken anyway (i.e. it was the original bug report)?

@greg-rychlewski
Copy link
Member Author

greg-rychlewski commented Dec 21, 2023

The only chance in behaviour is when you preload a through with joins and regular assocs, which was broken anyway (i.e. it was the original bug report)?

Yeah exactly. The following are backwards compatible:

  • Everything with Repo.preload
  • Everything with Ecto.Query.preload not involving throughs
  • Everything with Ecto.Query.preload involving throughs not used in joins
  • Everything with Ecto.Query.preload involving throughs used in joins as long as they don't have nested separate query preloads

The following is not backwards compatible:

  • Ecto.Query.preload when a through association is used in a join and has a nested separate query preload

In this case, the differences are:

  1. the association chain is no longer preloaded and we simply preload directly onto the loaded through association.
  2. as an additional consequence of 1, if the user goes out of their way to specify both the through association and the individual through components in the preload list,there won't be any merging of the sub-preloads here:
    defp merge_preloads(_preload, {info, _, nil, left}, {info, take, query, right}),

Before I merge, could you confirm ^ is ok with you as well? Thanks!

@josevalim
Copy link
Member

Yup, it is all good to me. Beautifully done!

@greg-rychlewski greg-rychlewski merged commit b26b25b into elixir-ecto:master Dec 21, 2023
6 checks passed
@greg-rychlewski greg-rychlewski deleted the through_preloads branch December 21, 2023 14:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug in has_one-through scenario: preload not set when combining atom and existing join
2 participants