Add CBS monthly data collection script to crontab #808

Closed
2 of 5 tasks
simonaho opened this issue Dec 26, 2017 · 1 comment

Comments

@simonaho
Collaborator

simonaho commented Dec 26, 2017

Add CBS monthly data collection script to crontab:

  • Add CBS data to an S3 bucket, with a separate directory for each provider code and each year.
  • Import CBS data from email and upload it to AWS. Currently the importmail process runs once a week and uploads the data from the last 2 CBS emails to S3.
  • Trigger loading of CBS data from S3 when new CBS data arrives.
    Command to delete data from a given year onward and reload starting that year, for example 2019:
    python main.py process cbs --path <cbs dir path> --delete_start_date 2019-01-01 --load_start_year=2019
    The CBS parser is in cbs.py.
    Delete only data starting from the year of the newly arrived files.
  • Create a DB table that versions the emails whose data we load into S3, and upload data to S3 only when new data actually arrives.
  • Change the schedule from weekly back to daily (see this PR, which changed it from daily to weekly).
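The reload step in the list could be sketched roughly as below. This is a minimal sketch under stated assumptions, not the project's code: the `<provider_code>/<year>/<filename>` S3 key layout and the helper names `s3_key`, `year_from_key` and `reload_command` are hypothetical, and only the `main.py process cbs` flags come from the command in the task list. It picks the earliest year among the newly arrived files, so only data from that year onward is deleted and reloaded.

```python
from datetime import date
from typing import Iterable, List


def s3_key(provider_code: int, year: int, filename: str) -> str:
    """Hypothetical S3 layout: one prefix per provider code, then one per year."""
    return f"{provider_code}/{year}/{filename}"


def year_from_key(key: str) -> int:
    """Extract the year component from a key built by s3_key()."""
    _provider, year, _filename = key.split("/", 2)
    return int(year)


def reload_command(new_keys: Iterable[str], cbs_dir: str) -> List[str]:
    """Build the reload command for the earliest year among the new files.

    Data from that year onward is deleted and reloaded; older years are untouched.
    Flag names are taken from the command in the issue; the argv shape is an assumption.
    """
    years = [year_from_key(k) for k in new_keys]
    if not years:
        raise ValueError("no new CBS files to load")
    start_year = min(years)
    return [
        "python", "main.py", "process", "cbs",
        "--path", cbs_dir,
        "--delete_start_date", date(start_year, 1, 1).isoformat(),
        "--load_start_year", str(start_year),
    ]


if __name__ == "__main__":
    keys = [s3_key(1, 2019, "a.zip"), s3_key(3, 2020, "b.zip")]
    print(reload_command(keys, "/tmp/cbs"))
```

For the two example keys, the earliest year is 2019, so the command deletes from 2019-01-01 and loads starting 2019. The resulting list could be passed to `subprocess.run` by whatever job detects the new files.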

Note that the CBS processes now live in the anyway-etl repo (see the process repo here).
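The email-versioning task from the list above might look something like the following minimal sketch. It uses an in-memory SQLite database only to stay self-contained (the project's real database may differ), and the table `cbs_email_versions`, its columns, and the helpers `new_emails` and `record_loaded` are all hypothetical. Emails that are already recorded are skipped, which is what would let the import run daily without re-uploading the same data.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema: one row per CBS email whose data was already uploaded to S3.
SCHEMA = """
CREATE TABLE IF NOT EXISTS cbs_email_versions (
    message_id  TEXT PRIMARY KEY,
    received_at TEXT NOT NULL
)
"""

Email = Tuple[str, str]  # (message_id, received_at)


def new_emails(conn: sqlite3.Connection, emails: List[Email]) -> List[Email]:
    """Return only the emails that have not been recorded as loaded yet."""
    seen = {row[0] for row in conn.execute("SELECT message_id FROM cbs_email_versions")}
    return [e for e in emails if e[0] not in seen]


def record_loaded(conn: sqlite3.Connection, emails: List[Email]) -> None:
    """Record emails after their data is uploaded, so later runs skip them."""
    with conn:  # commits the transaction on success
        conn.executemany(
            "INSERT OR IGNORE INTO cbs_email_versions (message_id, received_at) VALUES (?, ?)",
            emails,
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    batch = [("<a@cbs>", "2019-02-01"), ("<b@cbs>", "2019-03-01")]
    todo = new_emails(conn, batch)  # both emails are new on the first run
    # ... upload the data from `todo` to S3 here ...
    record_loaded(conn, todo)
    print(new_emails(conn, batch))  # nothing left to upload on the next run
```

Recording an email only after its upload succeeds means a failed run is simply retried on the next scheduled run.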

@atalyaalon
Collaborator

replaced by #2745
