[ELE-47] AWS Glue Integration #177

Open
rajkstats opened this issue Aug 6, 2022 · 6 comments · May be fixed by #1710 or elementary-data/dbt-data-reliability#757

Comments

@rajkstats

rajkstats commented Aug 6, 2022

Requesting integration with Amazon S3 as a data lake

  • Data lake is built upon Amazon S3
  • Most transformations/ETL are done in AWS Glue with Spark, and there is a dbt-glue adapter that supports running dbt against Spark
  • Some of the ETL jobs are orchestrated with Airflow
  • Queries are run against the Glue Data Catalog with Amazon Athena

Want to set up data observability on top of input and output datasets.

ELE-47

@Maayan-s
Contributor

Maayan-s commented Aug 7, 2022

Hi @rajkstats! Thanks for opening the issue!
I'm not familiar with the dbt-glue adapter, so it's hard to assess how many changes such an integration would require.
We recently decided (due to demand from the community) to add a Databricks integration, and to approach it gradually:
Step 1 - add support for uploading dbt artifacts and run results (in the dbt package).
Step 2 - add support in the CLI for Slack alerts and UI generation.
Step 3 - add support for data anomaly detection test (the most complex and platform-specific part of the code right now).

Here is my PR for step 1 for Databricks. As you can see, it actually required pretty minor changes.
If you want to give it a shot with AWS Glue, I would be happy to support you!

@rajkstats
Author

Thanks @Maayan-s for sharing the approach. I will give it a shot and let you know if I need any support. Thanks.

@elongl changed the title from "Add integration with s3" to "Add integration with S3 as a data lake" on Nov 29, 2022
@Hadarsagiv changed the title from "Add integration with S3 as a data lake" to "[ELE-47] Add integration with S3 as a data lake" on Jan 3, 2023
@Hadarsagiv added the Contribution Created by Linear-GitHub Sync label on Jan 3, 2023
@bruno-ribeirodasilva

@rajkstats did you make any progress on this?

@Maayan-s
Contributor

Hi @bruno-ribeirodasilva, I assume that this issue can be re-assigned.
Are you interested in giving it a shot?

@rajkstats
Author

@bruno-ribeirodasilva I haven't been able to pick this up yet, but I still plan to. Feel free to give it a shot, as @Maayan-s suggested.

@nandubatchu

@Maayan-s did this progress? Do we have a way to use elementary with dbt-glue adapter?

5 participants