These are the prerequisite skills for the course.
- Working experience using SQL is required
- Programming experience is required, preferably using Python
- Data Engineering or Data Warehousing experience is highly desired
- Basic understanding of command-line tools such as PowerShell
- If you are new to Data Engineering, make sure to take our Data Engineering Essentials course.
Let us get an overview of Cloud Platforms. Here are the key benefits they offer.
- Quick Setup
- Marketplace
- Pay-as-you-go
- Scalability
Here are the various Cloud Platforms.
- AWS
- Azure
- GCP
- Oracle
- Rackspace
- DigitalOcean
AWS, Azure and GCP are the top 3 Cloud Platforms.
Let us get an overview of Google Cloud Platform.
- GCP is one of the top 3 Cloud Platforms. The others are AWS and Azure.
- It has a large marketplace of services (both native and third-party).
- We can manage GCP services using the CLI, or using SDKs available for all prominent programming languages such as Python, Java, etc.
Let us make sure we sign up for GCP using a valid email id. Keep in mind that you are eligible for USD 300 in credit, which is valid for 3 months.
- Set up a Google Account using a valid email id.
- Sign up for GCP using the Google Account.
- Get USD 300 in credits, valid for 3 months.
- Set up a Project and review Billing.
Refer to the instructions for setting up the gcloud CLI, which is part of Google Cloud SDK.
Once Google Cloud SDK is set up, we need to make sure it is configured properly.
Here are the commands used for your reference.
gcloud init # Log in via the browser using your GCP account when prompted
gsutil list # Google Cloud SDK will take care of setting up gsutil as well
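Let us also confirm the setup from the terminal. The following is a minimal sketch rather than an official procedure: `check_tool` is a hypothetical helper name introduced here for illustration, and the script assumes a POSIX-compatible shell. On PowerShell, the individual `gcloud` commands can be run directly instead.

```shell
# check_tool is a hypothetical helper: reports whether a command is on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
    return 1
  fi
}

# Only query gcloud when it is actually installed.
if check_tool gcloud; then
  gcloud --version      # installed SDK components and their versions
  gcloud auth list      # accounts that have been logged in
  gcloud config list    # active account, project and other properties
fi

# gsutil ships with Google Cloud SDK, so it should be found as well.
check_tool gsutil || echo "Re-run the SDK installer if gsutil is missing"
```

If `gcloud config list` does not show the expected account or project, running `gcloud init` again lets you pick them.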
GCP provides a robust set of Analytics Services. Here are the most prominent and relevant ones for this course.
- Google Cloud Storage for Data Lake
- Cloud SQL for Relational Database
- Google Cloud Functions for Data Processing
- Dataproc for Big Data Cluster using Hadoop and Spark
- Databricks for Big Data Cluster using Spark
- Google Cloud Composer for Orchestration
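Most of the native services above have a matching `gcloud` command group. Here is a minimal sketch, assuming the SDK is already configured with a default project. The names `run`, `list_services`, `DRY_RUN` and `REGION` are hypothetical and introduced only for illustration, and `us-central1` is an assumed region. Databricks is a third-party service, so it is not covered here. By default the script only prints the commands without executing them.

```shell
DRY_RUN="${DRY_RUN:-1}"            # 1 = only print commands (default), 0 = also execute
REGION="${REGION:-us-central1}"    # assumed region; change to the one you use

# Hypothetical helper: print a command, and execute it only when DRY_RUN=0
# and the tool is installed.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ] && command -v "$1" >/dev/null 2>&1; then
    "$@"
  fi
}

list_services() {
  run gcloud storage buckets list                               # Google Cloud Storage
  run gcloud sql instances list                                 # Cloud SQL
  run gcloud functions list                                     # Cloud Functions
  run gcloud dataproc clusters list --region="$REGION"          # Dataproc
  run gcloud composer environments list --locations="$REGION"   # Cloud Composer
}

list_services
```

Set `DRY_RUN=0` before running the script to actually list the resources. A new project will typically return empty lists, or a prompt to enable the corresponding API.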