-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Squash messages to one table for slack #349
Conversation
Now all messages are in one table Add replies resource
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good work!
7f4cd66
to
1af7894
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is really high quality code! also thanks for all the tests!
my comments:
- current code to get replies will work with append write disposition. but in practice you'll probably won't see most of the replies - replies can be added to messages after we acquire them. those replies we'll never see.
our solution is to always get last 7 or 14 days of messages by setting the start and end dates, in that case incremental loading is disables and we get the full range.
our code however sets write disposition to append
. maybe it is a good default but imo we should change it to merge when end date is set.
you could write a test for that. try to run the pipeline with an explicit range. current code will duplicate the messages
- this code will have a problem when running on Airflow - similar to the problem we had on sql_database. in the
@dlt.source
function we fetch channels list from slack. on Airflow this will happen every 30 seconds when running the DAG. maybe this is not a big problem? in sql_database it was solved for a case where table names are provided by the user (we are fetching table properties in the resource function - which happens at run time)
yield messages_channel | dlt.transformer( | ||
get_thread_replies, | ||
name=channel_name + "_replies", | ||
table_name=partial(table_name_func, channel_name + "_replies"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
table_name_func
needs type
and/or subtype
to be present. maybe you should add them to the context
of get_pages?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my comments above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! thanks for all the tests!
we should deploy this with our slack to see how our moving merge window works. ping me with the deployment script (maybe we add it to the demo pipeline)
Tell us what you do here
verified source
)