Disable archive.org queries after timeout #347
Merged
Problem
#346 tried to address the download counter's log spam and downtime by setting an instance variable when a timeout occurred and gating further queries with that variable. This was somewhat effective in that last night's count pass completed, but it still wasted time querying archive.org and spammed Discord.
Cause
After performing a batched query to archive.org, `DownloadCounter` replaces its `InternetArchiveBatchedQuery` object with a fresh one. Before #346, the timeout exception was preventing this from happening, so the same query object got stuffed with all mods over time, `.full()` was always `True`, and we tried to query archive.org for every mod (rather than once per 30), over and over, with an increasingly gigantic query.
The main thing from #346 that was helpful was catching the exception, which allowed the main loop to replace the query object, reducing the queries back to once per 30. But setting `self.connect_timed_out` then had no effect because the query object was discarded immediately afterwards.
Changes
- The `DownloadCounter` class itself catches the timeout exception from archive.org, discards its `ia_query` reference, and stops querying archive.org once this has happened.
- `GraphQLQuery` is renamed `GitHubBatchedQuery` for consistency with the other classes
- `DownloadCounter.GITHUB_PATH_PATTERN` is moved to `GitHubBatchedQuery.PATH_PATTERN`, and so on for the other path pattern variables
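
The first change can be illustrated with a minimal, self-contained sketch. This is not the code from the diff: `ArchiveTimeout`, `flush_ia`, `count`, `attempts`, and the bodies of both classes are hypothetical stand-ins, and the stand-in query always times out so the behaviour is visible without a network. Only the names `DownloadCounter`, `InternetArchiveBatchedQuery`, `ia_query`, and `.full()` come from the description above.

```python
from typing import Dict, List, Optional


class ArchiveTimeout(Exception):
    """Hypothetical stand-in for the timeout raised when archive.org is unreachable."""


class InternetArchiveBatchedQuery:
    """Simplified stand-in: collects identifiers and queries them in one batch."""

    BATCH_SIZE = 30  # matches "once per 30" in the description

    def __init__(self) -> None:
        self.ids: List[str] = []

    def add(self, ident: str) -> None:
        self.ids.append(ident)

    def full(self) -> bool:
        return len(self.ids) >= self.BATCH_SIZE

    def get_result(self) -> Dict[str, int]:
        # The real class would make a network request; this sketch always times out
        raise ArchiveTimeout('archive.org did not respond')


class DownloadCounter:

    def __init__(self) -> None:
        # None means archive.org queries are disabled for the rest of this pass
        self.ia_query: Optional[InternetArchiveBatchedQuery] = InternetArchiveBatchedQuery()
        self.counts: Dict[str, int] = {}
        self.attempts = 0  # hypothetical counter, only here to make the effect observable

    def flush_ia(self) -> None:
        if self.ia_query is None:
            return
        self.attempts += 1
        try:
            self.counts.update(self.ia_query.get_result())
            # Start a fresh batch only after a successful query
            self.ia_query = InternetArchiveBatchedQuery()
        except ArchiveTimeout:
            # Discard the reference so no later mod triggers another query
            self.ia_query = None

    def count(self, idents: List[str]) -> None:
        for ident in idents:
            if self.ia_query is not None:
                self.ia_query.add(ident)
                if self.ia_query.full():
                    self.flush_ia()
        # Send whatever is left in a partially filled batch
        self.flush_ia()


counter = DownloadCounter()
counter.count([f'mod-{i}' for i in range(100)])
print(counter.attempts, counter.ia_query)
```

Because the disabled state lives on the counter rather than on the query object, it survives the point where the query would otherwise be replaced. In this sketch, 100 mods produce exactly one attempted query instead of one per batch.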