
Archive failed jobs to object storage to reduce RDS table size. #263

Open · sharkinsspatial opened this issue Jan 8, 2024 · 0 comments
Labels: enhancement (New feature or request)
Since the project's inception, we have retained all failed jobs in our log database for auditing and potential reprocessing. Rather than keeping them in our active RDS logging instance, we should periodically write these failed jobs to archive storage and remove them from the live instance. Initially, we should use a long-running process (sketched after the list below) which:

  1. Queries failed jobs for a given date, exports those rows as ndjson, and stores them in an S3 bucket under the key structure year/month/date.json.
  2. Deletes all of the corresponding rows from the table.
  3. Initially, this should run in a loop over all dates in a specified range (e.g. Jan–Jul 2022).
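
A minimal sketch of that backfill loop, assuming psycopg2 and boto3. The table and column names (`failed_jobs`, `failed_at`) and the bucket name are hypothetical placeholders, not the project's actual schema:

```python
import json
from datetime import date, timedelta

import boto3
import psycopg2
import psycopg2.extras

BUCKET = "failed-jobs-archive"  # hypothetical bucket name


def archive_day(conn, s3, day: date) -> None:
    """Export one day's failed jobs as ndjson to S3, then delete them."""
    with conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as cur:
        # Hypothetical table/column names; adjust to the real schema.
        cur.execute(
            "SELECT * FROM failed_jobs WHERE failed_at::date = %s", (day,)
        )
        rows = cur.fetchall()
        if not rows:
            return
        # One JSON object per line (ndjson); key follows year/month/date.json.
        body = "\n".join(json.dumps(row, default=str) for row in rows)
        key = f"{day.year}/{day.month:02d}/{day.day:02d}.json"
        s3.put_object(Bucket=BUCKET, Key=key, Body=body.encode())
        # Only delete once the upload has succeeded.
        cur.execute("DELETE FROM failed_jobs WHERE failed_at::date = %s", (day,))
    conn.commit()


def backfill(start: date, end: date) -> None:
    """Run archive_day for every date in [start, end]."""
    conn = psycopg2.connect("")  # empty DSN: connection params from environment
    s3 = boto3.client("s3")
    day = start
    while day <= end:
        archive_day(conn, s3, day)
        day += timedelta(days=1)
    conn.close()


if __name__ == "__main__":
    backfill(date(2022, 1, 1), date(2022, 7, 31))
```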

Once this legacy backfill is complete, a daily cron job should execute the same process for any date more than 2 months old.
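
The daily job then reduces to a single call of the same helper. A sketch, reusing the hypothetical `archive_day` from above and approximating the 2-month cutoff as 60 days:

```python
from datetime import date, timedelta

import boto3
import psycopg2


def daily_archive() -> None:
    """Cron entry point: archive the day that just aged past the retention window."""
    cutoff = date.today() - timedelta(days=60)  # ~2 months; exact policy TBD
    conn = psycopg2.connect("")  # connection params from environment
    archive_day(conn, boto3.client("s3"), cutoff)
    conn.close()
```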

This will give us an auditable archive of failed granules that we can query and reprocess if necessary, while keeping our active production logging RDS instance smaller and easier to manage.
