Store multiple crawls in a single database #105
Merged
This PR significantly alters the way this package uses a database. Instead of storing each website crawl in a separate SQLite database file, all crawls are now stored in the same Django database, which can be configured to use any database backend Django supports. Database tables are now managed by Django migrations, and a new `Crawl` model tracks the status of past crawls, including whether they succeeded or failed.

This PR also adds Python tests for 100% of testable Python code (excluding only the plugin to the wpull crawler, which is difficult to test without running a real crawl). The package has been migrated to pytest and pytest-cov for simpler testing and coverage checks. Moving forward, PR checks will fail if Python coverage drops below 100%.
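For illustration, a minimal sketch of what a status-tracking model like this might look like (the field names and status choices here are assumptions, not the actual implementation):

```python
# Hypothetical sketch of a crawl status model; actual fields may differ.
from django.db import models


class Crawl(models.Model):
    """Tracks the status of a single website crawl."""

    class Status(models.TextChoices):
        RUNNING = "running"
        SUCCEEDED = "succeeded"
        FAILED = "failed"

    started_at = models.DateTimeField(auto_now_add=True)
    finished_at = models.DateTimeField(null=True, blank=True)
    status = models.CharField(
        max_length=20, choices=Status.choices, default=Status.RUNNING
    )
```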
(As a TODO, a future PR will need to add a management command to clean up old crawls, so the database doesn't grow indefinitely.)
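For reference, such a cleanup command might look roughly like the sketch below; the command name, module path, retention policy, and the `started_at` field are all assumptions borrowed from the hypothetical model above:

```python
# Hypothetical management command sketch; the retention window and
# the Crawl model's import path are assumptions, not the real code.
from datetime import timedelta

from django.core.management.base import BaseCommand
from django.utils import timezone

from crawler.models import Crawl  # assumed app/module path


class Command(BaseCommand):
    help = "Delete crawl records older than a retention window."

    def add_arguments(self, parser):
        parser.add_argument("--days", type=int, default=30)

    def handle(self, *args, **options):
        # Remove crawls that started before the cutoff date.
        cutoff = timezone.now() - timedelta(days=options["days"])
        deleted, _ = Crawl.objects.filter(started_at__lt=cutoff).delete()
        self.stdout.write(f"Deleted {deleted} old crawl record(s).")
```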