- Slides
- Overview of Architecture, Technologies & Pre-Requisites
We suggest watching the videos in the order they appear in this document.
The last video (setting up the environment) is optional, but you can watch it earlier if you have trouble setting up the environment and following along with the videos.
- Why do we need Docker
- Creating a simple "data pipeline" in Docker
- Running Postgres locally with Docker
- Using `pgcli` for connecting to the database
- Exploring the NY Taxi dataset
- Ingesting the data into the database
Tip
If you have problems with `pgcli`, check this video for an alternative way to connect to your database from a Jupyter notebook with pandas.
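As a minimal sketch of the alternative mentioned in the tip, the snippet below queries Postgres from a notebook with pandas and SQLAlchemy instead of `pgcli`. The connection values (`root`, `localhost`, `5432`, `ny_taxi`) and the helper names `build_postgres_url` and `run_query` are assumptions for illustration; substitute whatever you used when starting your Postgres container. It also assumes `pandas`, `sqlalchemy`, and a Postgres driver such as `psycopg2` are installed.

```python
def build_postgres_url(user, password, host, port, db):
    """Build a SQLAlchemy-style connection URL for Postgres."""
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"


def run_query(query, url):
    """Run a SQL query and return the result as a pandas DataFrame."""
    # Imported here so the URL helper above works even without these packages.
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine(url)
    return pd.read_sql(query, con=engine)


# Hypothetical credentials; replace with your own.
url = build_postgres_url("root", "root", "localhost", 5432, "ny_taxi")

# With the database running, a quick connectivity check could be:
# run_query("SELECT 1 AS ok;", url)
```

If the query returns a one-row DataFrame, the connection works and the same `run_query` helper can be used to explore the tables.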
- The pgAdmin tool
- Docker networks
Important
The UI for pgAdmin 4 has changed. Please follow the steps below to create a server:
- After logging in to pgAdmin, right-click Servers in the left sidebar.
- Click on Register.
- Click on Server.
- The remaining steps to create a server are the same as in the videos.
- Converting the Jupyter notebook to a Python script
- Parametrizing the script with argparse
- Dockerizing the ingestion script
- Why do we need Docker-compose
- Docker-compose YAML file
- Running multiple containers with `docker-compose up`
- Adding the Zones table
- Inner joins
- Basic data quality checks
- Left, Right and Outer joins
- Group by
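The "Parametrizing the script with argparse" topic listed above can be sketched as follows. This is an illustration under stated assumptions, not the script from the videos: the argument names (`--user`, `--password`, `--host`, `--port`, `--db`, `--table_name`, `--url`), the defaults, and the example values are all hypothetical.

```python
import argparse


def build_parser():
    """Create a parser for a hypothetical ingestion script."""
    parser = argparse.ArgumentParser(description="Ingest CSV data into Postgres")
    parser.add_argument("--user", required=True, help="Postgres user name")
    parser.add_argument("--password", required=True, help="Postgres password")
    parser.add_argument("--host", default="localhost", help="Postgres host")
    parser.add_argument("--port", type=int, default=5432, help="Postgres port")
    parser.add_argument("--db", required=True, help="database name")
    parser.add_argument("--table_name", required=True, help="destination table")
    parser.add_argument("--url", required=True, help="URL of the CSV file")
    return parser


# Parse an explicit list here; a real script would call parse_args() with no
# arguments so that the values come from the command line.
args = build_parser().parse_args([
    "--user", "root",
    "--password", "root",
    "--db", "ny_taxi",
    "--table_name", "yellow_taxi_trips",
    "--url", "https://example.com/data.csv",
])
print(args.host, args.port, args.table_name)
```

Keeping connection details as arguments rather than hard-coded values means the same script can run unchanged on the host and inside a container, where the host name usually differs.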
Tip
Optional: If you have problems with Docker networking, check the Port Mapping and Networks in Docker video.
- Docker networks
- Port forwarding to the host environment
- Communicating between containers in the network
- `.dockerignore` file
Tip
Optional: If you want to follow the steps from "Ingesting NY Taxi Data to Postgres" through "Running Postgres and pgAdmin with Docker-Compose" using Windows Subsystem for Linux (WSL), please check Docker Module Walk-Through on WSL.
For the course you'll need:
- Python 3 (e.g. installed with Anaconda)
- Google Cloud SDK
- Docker with docker-compose
- Terraform
- Git account
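One quick way to see which of these tools are already installed is to look for their commands on your `PATH`. The sketch below is an assumption-laden convenience check, not part of the course material: the command names are the usual ones for each tool, and `find_tools` is a hypothetical helper. Newer Docker installations may ship Compose as the `docker compose` plugin, in which case `docker-compose` will be reported as not found even though Compose is available.

```python
import shutil

# Command names assumed to correspond to the tools listed above.
REQUIRED_COMMANDS = ["python3", "gcloud", "docker", "docker-compose", "terraform", "git"]


def find_tools(commands):
    """Map each command name to its location on PATH, or None if not found."""
    return {name: shutil.which(name) for name in commands}


for name, path in find_tools(REQUIRED_COMMANDS).items():
    status = path if path else "NOT FOUND"
    print(f"{name:15} {status}")
```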
Note
If you have problems setting up the environment, you can check these videos.
If you already have a working coding environment on your local machine, these videos are optional, and you only need to choose one setup method. If you have time to learn this now, though, it will be helpful if your local environment suddenly stops working one day.
- Generating SSH keys
- Creating a virtual machine on GCP
- Connecting to the VM with SSH
- Installing Anaconda
- Installing Docker
- Creating SSH `config` file
- Accessing the remote machine with VS Code and SSH remote
- Installing docker-compose
- Installing pgcli
- Port-forwarding with VS Code: connecting to pgAdmin and Jupyter from the local computer
- Installing Terraform
- Using `sftp` for putting the credentials to the remote machine
- Shutting down and removing the instance
Did you take notes? You can share them here
- Notes from Alvaro Navas
- Notes from Abd
- Notes from Aaron
- Notes from Faisal
- Michael Harty's Notes
- Blog post from Isaac Kargar
- Handwritten Notes By Mahmoud Zaher
- Notes from Candace Williams
- Notes from Marcos Torregrosa
- Notes from Vincenzo Galante
- Notes from Victor Padilha
- Notes from froukje
- Notes from adamiaonr
- Notes from Xia He-Bleinagel
- Notes from Balaji
- Notes from Erik
- Notes by Alain Boisvert
- Notes on Docker, Docker Compose, and setting up a proper Python environment, by Vera
- Setting up the development environment on Google Virtual Machine, blog post by Aditya Gupta
- Notes from Zharko Cekovski
- 2024 Module-01 Walkthrough video by ellacharmed on YouTube
- 2024 Companion Module Walkthrough slides by ellacharmed
- 2024 Module-01 Environment setup video by ellacharmed on YouTube
- Docker Notes by Linda • Terraform Notes by Linda
- Notes from Hammad Tariq
- Add your notes above this line