This project contains the complete implementation of the DDVUG Data Warehouse Challenge 2023 Willibald-Samen using dbt (data build tool).
Our aim is to make it as easy as possible for you to set up the fully functional solution for yourself.
We wrote the following tutorials and documents to guide you through our solution:
Willibald data vault with dbt - 00 - introduction
Short introduction about us and what this is all about.
Willibald data vault with dbt - 01 - installation guidelines using dbt cloud
A detailed step-by-step tutorial for setting up our solution using dbt cloud. Once you have completed it, you will have the fully functional solution up and running in your own Snowflake account. No prior knowledge of dbt is necessary.
Willibald data vault with dbt - 01 - installation guidelines using ubuntu and dbt-core
A detailed step-by-step tutorial for setting up our solution using an Ubuntu virtual machine and dbt-core. Once you have completed it, you will have the fully functional solution up and running in your own Snowflake account. No prior knowledge of dbt is necessary.
Willibald data vault with dbt - 02 - solution overview
In this document we walk through our solution: we describe some basic features of dbt as used in our solution, look at the different layers we set up, and explain why we built them that way.
Willibald data vault with dbt - 03 - the data challenges and how we solved them
A description of all the data challenges present in the data set and of how we solved them.
Willibald data vault with dbt - 04 - overarching functions
Description of all the overarching functions we were required to comment on within the challenge.
Willibald data vault with dbt - 05 - yedi tests and testing in general
How we solved the yedi test challenge and some examples of singular and generic tests.
Willibald data vault with dbt - 06 - closing the gap between business and tech
A description of how we closely integrated this dbt solution with dataspot., a data governance tool. This brings us close to our vision of an ideal data warehouse setup.
- dbt
- dbt cloud: an interesting SaaS solution from dbt Labs
- datavault4dbt from Scalefree
- DDVUG: the German-speaking data vault user group
- DDVUG Data Warehouse automation Challenge on the TDWI in June 2023
- dataspot.
- Link to Video of our TDWI presentation
For those who are already familiar with dbt, here are short installation instructions.
We'd still recommend looking at our documentation regarding the specifics of our solution.
We made the S3 bucket containing the source data publicly available.
If you have any questions or comments, just contact us; we are happy to hear from you.
See Willibald data vault with dbt - 00 - introduction for contact data.
- you need a Snowflake account (a 30-day free trial is available)
- clone the repository and navigate to the project directory
- install Python 3.9
- create a venv and activate it: python -m venv venv (on Linux/macOS, activate with: source venv/bin/activate)
- upgrade pip: python -m pip install --upgrade pip
- install dbt with Snowflake support (1.6.0): pip install -r requirements.txt
- install dependencies: dbt deps (in addition to the package datavault4dbt, we defined our own macros, published as the package datavault_extension; see packages.yml)
- configure your database connection, either in your dbt configuration file or by editing profiles.yml in the source directory (this includes setting three environment variables)
- run dbt build to create your data models and transform your data
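The steps above can be sketched as one small shell script. This is only an illustration under a few assumptions: it is run from the cloned project directory on Linux or macOS, `python` resolves to Python 3.9, and your database connection (including the three environment variables referenced in profiles.yml) is already configured. The `step` helper and the `DRY_RUN` switch are our own additions for this sketch and are not part of the project. By default the script only prints the commands so you can review them first.

```shell
#!/usr/bin/env bash
# Sketch of the quick-install steps; run from the cloned project directory.
# Assumption: the connection in profiles.yml and its three environment
# variables are already set up (their names are defined in profiles.yml).
set -e

# DRY_RUN=1 (default) only prints each step; set DRY_RUN=0 to execute it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print a command and, unless in dry-run mode, run it.
step() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

step python -m venv venv

# Activation has to happen in the current shell, so it bypasses step().
echo "+ . venv/bin/activate"
if [ "$DRY_RUN" = "0" ]; then
  . venv/bin/activate
fi

step python -m pip install --upgrade pip
step pip install -r requirements.txt   # installs dbt with Snowflake support
step dbt deps                          # datavault4dbt and datavault_extension
step dbt build                         # builds the models and runs the tests
```

Saved under a name of your choice, the script prints each command prefixed with `+ ` when started without arguments; starting it with `DRY_RUN=0` in the environment runs the commands for real.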
Please:
- take a look at the naming conventions
- note that our macros are written for Snowflake only
- note that we wrote several objects/macros that datavault4dbt does not support at this time, but may support in the future
- note that the macros we wrote depend in part on the naming conventions we set up
Installation with newer or other versions has not been tested.
We do not expect contributions to this project for now. If you have any suggestions, please contact us.
See Willibald data vault with dbt - 00 - introduction for contact data.
This work is licensed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0/
This project was inspired by the dbt documentation, Scalefree, and the community. We would like to thank the dbt Labs team and Scalefree.
THIS SOFTWARE IS PROVIDED BY THE AUTHOR 'AS IS' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.