[Feature] CBS import data to s3 from new emails #2176

Open
atalyaalon opened this issue Apr 23, 2022 · 0 comments
@atalyaalon
Collaborator

atalyaalon commented Apr 23, 2022

Describe the bug
Relevant issue (partially contains this issue): #808

See the following document: https://github.com/hasadna/anyway/blob/dev/docs/Architecture/CBS.md
See part 6:
Data Loading: separate into multiple stages (see "CBS ETL in process refactoring"). Make sure data is not loaded multiple times and that no duplicates are created.

Current flow: email -> s3 -> updated tables
email -> s3: can be scheduled once a week, or even once a day.
s3 -> Data Tables: needs to be scheduled once both the accident type 1 and accident type 3 data for that month are in s3.

Explanation: Currently we pull the data from the last 4 emails and insert it into s3 (after deleting the previous data). We need to pull only the emails we have not yet saved to s3, so we should keep track of the emails we have already read and not re-insert them.

Optional: We can add CBS data versioning in s3. Right now we delete the old data and insert the new data.
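
To illustrate the scheduling condition for the s3 -> Data Tables stage, here is a minimal sketch of a readiness check. It assumes each delivery stored in s3 can be described by a year, a month, and an accident type; the `CbsFile` tuple and `months_ready_for_load` function are hypothetical names for illustration, not existing code in the project.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, NamedTuple


class CbsFile(NamedTuple):
    """One CBS delivery stored in s3 (hypothetical representation)."""
    year: int
    month: int
    accident_type: int  # 1 or 3


# Both accident types must be present before a month is loaded to the tables.
REQUIRED_TYPES = frozenset({1, 3})


def months_ready_for_load(files: Iterable[CbsFile]) -> list[tuple[int, int]]:
    """Return sorted (year, month) pairs for which both accident types exist."""
    types_by_month: dict[tuple[int, int], set[int]] = defaultdict(set)
    for f in files:
        types_by_month[(f.year, f.month)].add(f.accident_type)
    return sorted(
        month for month, types in types_by_month.items()
        if REQUIRED_TYPES <= types
    )


in_s3 = [
    CbsFile(2022, 2, 1),
    CbsFile(2022, 2, 3),
    CbsFile(2022, 3, 1),  # type 3 for March has not arrived yet
]
ready = months_ready_for_load(in_s3)  # only February 2022 qualifies
```

In a real job, the list of files would presumably be derived from the s3 key listing rather than hard-coded.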

Expected behavior
Check the email once a day. When a new email arrives that we have not yet loaded to s3 (perhaps tracked in a data versioning table, as mentioned above), load its data to s3.
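
One way the daily check could avoid re-inserting data is sketched below. It assumes every email has a stable message ID and that the set of already-loaded IDs is persisted somewhere (for example in the versioning table suggested above). `CbsEmail`, `load_new_emails`, and the `upload` callback are hypothetical names; the actual s3 upload and mailbox access are left out on purpose.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class CbsEmail:
    """A CBS email with its attachment payload (hypothetical shape)."""
    message_id: str
    payload: bytes


def load_new_emails(
    emails: Iterable[CbsEmail],
    already_loaded: set[str],
    upload: Callable[[CbsEmail], None],
) -> list[str]:
    """Upload only emails whose message_id is not tracked yet.

    `already_loaded` is updated in place after each successful upload, so
    running the function again with the same emails uploads nothing.
    """
    newly_loaded = []
    for email in emails:
        if email.message_id in already_loaded:
            continue  # skip: this email's data is already in s3
        upload(email)
        already_loaded.add(email.message_id)
        newly_loaded.append(email.message_id)
    return newly_loaded


# Stand-in for the real s3 upload: record what would have been uploaded.
uploaded = []
tracked = {"msg-1"}
inbox = [CbsEmail("msg-1", b"old"), CbsEmail("msg-2", b"new")]

first_run = load_new_emails(inbox, tracked, uploaded.append)
second_run = load_new_emails(inbox, tracked, uploaded.append)
```

Marking an ID as loaded only after `upload` returns means a failed upload is retried on the next scheduled run instead of being silently skipped.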

@atalyaalon atalyaalon added the bug label Apr 23, 2022
@atalyaalon atalyaalon added this to the Future milestone Apr 23, 2022
@ziv17 ziv17 added the prio 1 label Jun 3, 2022