Cache package pages intelligently #514

Open
jelly opened this issue Jul 20, 2024 · 2 comments
Comments

@jelly
Member

jelly commented Jul 20, 2024

Our package pages can be quite slow without a cache because of Package.get_requiredby(). We could cache every package view's metadata with a common cache key, or with package-$pkgname-metadata, and bust the cache once reporead has detected changes.

I don't know how often we update packages, so maybe this isn't practical, or the cache would just be busted all the time...

If we want to cache per package this is tricky, as if I add a new package which depends on my cached package it won't show up. So either we write a smart cache-busting algorithm or we don't.

Also other things influence the package page:

  • flagging
  • adopting a package
  • updating the package
  • getting a new required by
  • reproducibility status (for logged in)
@andrewSC

> If we want to cache per package this is tricky, as if I add a new package which depends on my cached package it won't show up. So either we write a smart cache-busting algorithm or we don't.

I'm trying to think this through (apologies in advance if the logic/wording isn't clear lol).

So if I'm understanding correctly, the concern is if we have a scenario where:

  1. Existing, available package is cached
  2. New package is introduced that the existing cached package should list under "Required By" in the web ui, but doesn't, because it's cached (and hasn't been cache busted yet)

The concern is the existing cached package wouldn't show the new package under "Required By" in the web ui?

Can we just write two functions that:

  1. if a new package is uploaded, checks/gets its dependencies
  2. if the dependent package exists (probably safe to assume?), cache bust it so the next pull/page load from whomever looks at the page shows the new "Required By"'s?

Am I understanding the problem correctly?

@jelly
Member Author

jelly commented Jul 22, 2024

> > If we want to cache per package this is tricky, as if I add a new package which depends on my cached package it won't show up. So either we write a smart cache-busting algorithm or we don't.
>
> I'm trying to think this through (apologies in advance if the logic/wording isn't clear lol).
>
> So if I'm understanding correctly, the concern is if we have a scenario where:
>
> 1. Existing, available package is cached
> 2. New package is introduced that the existing cached package should list under "Required By" in the web ui, but doesn't, because it's cached (and hasn't been cache busted yet)
>
> The concern is the existing cached package wouldn't show the new package under "Required By" in the web ui?

Yes. We have Package.get_requiredby(), which for glibc uses 100 SQL queries, and I am not sure we can optimise that any further. glibc is probably the heaviest one, but other packages also use 10-100 queries for a package view, so caching this information would be beneficial.

> Can we just write two functions that:
>
> 1. if a new package is uploaded, checks/gets its dependencies
> 2. if the dependent package exists (probably safe to assume?), cache bust it so the next pull/page load from whomever looks at the page shows the new "Required By"'s?
>
> Am I understanding the problem correctly?

Yes, the best option would be to cache the metadata of a package, because that only depends on package updates. I tried this before and it was fast, but my cache key was bogus, so other pages showed the wrong information. The cache key for that should be:

$pkgname-$arch-$repo-$pkgver. We have to verify that fetching name/arch/repo doesn't cause too many SQL queries. Having the version in there would nicely invalidate the cache. I hope memcached drops useless caches by default.

Otherwise skip the pkgver.
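
A minimal sketch of that key scheme, assuming a plain dict as a stand-in for the memcached-backed cache. `package_cache_key` and `cached_metadata` are hypothetical names, not existing archweb functions, and the package versions are example values. With the version in the key, an updated package simply misses the old entry; by default memcached evicts least-recently-used entries once it runs out of memory, so stale keys should age out on their own.

```python
def package_cache_key(pkgname, arch, repo, pkgver=None):
    """Build "$pkgname-$arch-$repo[-$pkgver]"; pkgver is optional."""
    parts = [pkgname, arch, repo]
    if pkgver is not None:
        parts.append(pkgver)
    return "-".join(parts)


def cached_metadata(cache, key, compute):
    """Return the cached value for key, computing and storing it on a miss."""
    if key not in cache:
        cache[key] = compute()
    return cache[key]


cache = {}   # stand-in for the real cache backend (assumption)
calls = []   # records how often the expensive lookup actually runs


def expensive_lookup():
    # Placeholder for the expensive Package.get_requiredby() work.
    calls.append(1)
    return ["bash", "coreutils"]


old_key = package_cache_key("glibc", "x86_64", "core", "2.39-1")
new_key = package_cache_key("glibc", "x86_64", "core", "2.40-1")

cached_metadata(cache, old_key, expensive_lookup)  # miss: computes
cached_metadata(cache, old_key, expensive_lookup)  # hit: served from cache
cached_metadata(cache, new_key, expensive_lookup)  # new version: miss again
```

Skipping the pkgver (the fallback above) means the key no longer changes on update, so the entry would have to be deleted explicitly when reporead sees a new version.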

For Django templates we can create a unique cache key and destroy this cache when reading the repository metadata:

https://stackoverflow.com/questions/10778988/how-do-i-delete-a-cached-template-fragment-in-django

As required_by is the reverse of depends/makedepends/checkdepends, we can just destroy the cache when package A updates, then iterate over all of its depends/makedepends/checkdepends and destroy their caches too.

Obviously, reading the repository db will become a bit slower then.
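
Roughly, the busting step could look like the sketch below. It assumes a dict-like cache and the package-$pkgname-metadata key from the first comment; `Pkg`, `metadata_key` and `bust_on_update` are made-up names for illustration, not what reporead does today.

```python
from dataclasses import dataclass, field


@dataclass
class Pkg:
    """Hypothetical stand-in for the fields reporead reads from the repo db."""
    name: str
    depends: list = field(default_factory=list)
    makedepends: list = field(default_factory=list)
    checkdepends: list = field(default_factory=list)


def metadata_key(pkgname):
    return f"package-{pkgname}-metadata"


def bust_on_update(pkg, cache):
    """Drop the cache for pkg and for every package it depends on.

    The dependencies list pkg under "Required By", so their cached
    pages go stale whenever pkg is added or updated.
    """
    names = {pkg.name, *pkg.depends, *pkg.makedepends, *pkg.checkdepends}
    for name in names:
        cache.pop(metadata_key(name), None)  # missing keys are fine
    return names


cache = {metadata_key(n): "cached page data" for n in ("glibc", "bash", "foo", "zlib")}
updated = Pkg("foo", depends=["glibc"], makedepends=["bash"])
busted = bust_on_update(updated, cache)
# Only zlib's entry survives; foo, glibc and bash are recomputed on next view.
```

One case this does not cover is a dependency that was dropped in the update: the old dependency would keep listing the package under "Required By" until its cache is busted some other way. Busting the union of the old and new dependency lists would probably handle that.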
