Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent Django Admin default queries on primary media tables in production #4344

Closed
AetherUnbound opened this issue May 16, 2024 · 7 comments · Fixed by #4349
Closed

Prevent Django Admin default queries on primary media tables in production #4344

AetherUnbound opened this issue May 16, 2024 · 7 comments · Fixed by #4349
Assignees
Labels
💻 aspect: code Concerns the software code in the repository 🧰 goal: internal improvement Improvement that benefits maintainers, not users 🟧 priority: high Stalls work on the project or its dependents 🧱 stack: api Related to the Django API

Comments

@AetherUnbound
Copy link
Collaborator

AetherUnbound commented May 16, 2024

Description

On several occasions now, we've run into production resource issues with our database due to Django Admin performing some default query against our (massive) primary media tables (e.g. #970).

Recent changes to the reporting admin (specifically #4254) have added new views that are also selecting the primary media tables in order to generate the UI. These are incredibly useful for local testing, but are actively harmful for production. We should alter the logic for these fields so that the automatic completion only occurs locally and does not occur in production. This will help prevent issues from occurring from simply visiting pages in Django Admin in production.

Going forward (per @sarayourfriend's comment #4344 (comment)), we should create a custom object manager for the media models that ensures the following things:

  • ORDER BY is not used on the base media tables
  • A LIMIT is always defined for queries on the media tables

This would prevent us from erroneously adding naive queries that could impact production in the future, even if we add the functionality for the query in a naive way.

@AetherUnbound AetherUnbound added 💻 aspect: code Concerns the software code in the repository 🟧 priority: high Stalls work on the project or its dependents 🧰 goal: internal improvement Improvement that benefits maintainers, not users 🧱 stack: api Related to the Django API labels May 16, 2024
@openverse-bot openverse-bot moved this to 📋 Backlog in Openverse Backlog May 16, 2024
@AetherUnbound AetherUnbound changed the title Prevent Django Admin default queryies on primary media tables in production Prevent Django Admin default queries on primary media tables in production May 16, 2024
@AetherUnbound AetherUnbound moved this from 📋 Backlog to 📅 To Do in Openverse Backlog May 16, 2024
@AetherUnbound AetherUnbound self-assigned this May 16, 2024
@openverse-bot openverse-bot moved this from 📅 To Do to 🏗 In Progress in Openverse Backlog May 16, 2024
@openverse-bot openverse-bot moved this from 🏗 In Progress to ✅ Done in Openverse Backlog May 17, 2024
@sarayourfriend
Copy link
Collaborator

Reopening to signify my request for clarification about the status of this issue and the solution implemented in #4349 (comment)

@openverse-bot openverse-bot moved this from ✅ Done to 📋 Backlog in Openverse Backlog May 21, 2024
@AetherUnbound
Copy link
Collaborator Author

@sarayourfriend can we close this issue, or did you want to keep it open until we can remove the functions that the associated issue created? I think #4386 maybe have also helped accomplish this.

@sarayourfriend
Copy link
Collaborator

sarayourfriend commented Jun 3, 2024

I want to prevent select * from <media> without a limit. There is literally no use case for it in the API and it is trivial to cause a problem in production by doing that, even without knowing (e.g., Django admin forms).

I think it could be accomplished by creating a custom object manager for the media models that prevents selecting more than the maximum number of rows we need for search.

To me the problem isn't with the implementation of the Django admin forms we have now. The problem is that it's even possible. I want to make it impossible to cause this problem in production, not just something we try to cautiously avoid.

@AetherUnbound
Copy link
Collaborator Author

FWIW I think what's more destructive, even if a limit is provided, is select * from <media> order by <column>, since the order by clause is what causes very heavy operations on the database. Just noting that for the eventual work on this ticket. Agreed that a system-level block for that would be ideal!

@sarayourfriend
Copy link
Collaborator

Sounds good! Whatever the features we shouldn't use on those tables, the object manager can block them. Can you update the issue body with what you think those features that need to be blocked are? Is it just order by?

@krysal krysal assigned krysal and unassigned AetherUnbound Aug 19, 2024
@openverse-bot openverse-bot moved this from 📋 Backlog to 📅 To Do in Openverse Backlog Aug 19, 2024
@krysal
Copy link
Member

krysal commented Nov 18, 2024

I was looking into this and although I found that creating a custom object manager for limiting the number of rows returned works for the initial query, and even though it still returns a QuerySet, we cannot used it.

class MediaManager(models.Manager):
    def get_queryset(self):
        return super().get_queryset()[:5]  # example number

The documentation says:

Further filtering or ordering of a sliced queryset is prohibited due to the ambiguous nature of how that might work.

We have to resort to being alert to possible queries that can get out of control. We also can't remove the sort statements because they are used in admin views. If anyone has any alternatives I'd be happy to hear them. For now, I'm closing this issue.

@krysal krysal closed this as not planned Won't fix, can't repro, duplicate, stale Nov 18, 2024
@openverse-bot openverse-bot moved this from 📅 To Do to 🗑 Discarded in Openverse Backlog Nov 18, 2024
@sarayourfriend
Copy link
Collaborator

@krysal the only alternative I can think of that would accomplish this is to subclass QuerySet and enforce a limit in _fetch_all... or use it to set a custom subclass of ModelIterable onto QuerySet._iterable_class and check that the queryset.query has limits set in ModelIterable.__init__. Neither seem optimal, and I'm not sure what the side effects would be. Without interfering with existing library code, you could also log an error message when no limit is set on those models rather than prevent it altogether, so maintainers can easily identify when it happens. Otherwise, unless there are plans to implement query monitoring to alert on long queries (and maybe auto-kill them), it's probably prudent to put some kind of check in place to prevent it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💻 aspect: code Concerns the software code in the repository 🧰 goal: internal improvement Improvement that benefits maintainers, not users 🟧 priority: high Stalls work on the project or its dependents 🧱 stack: api Related to the Django API
Projects
Status: 🗑 Discarded
Development

Successfully merging a pull request may close this issue.

3 participants