Getting Started with Data Engineering on GCP

Pre-requisite Skills for the course

These are the pre-requisite skills for the course.

  • Working experience using SQL is required
  • Programming experience is required, preferably using Python
  • Data Engineering or Data Warehousing experience is highly desired
  • Basic understanding of command line tools such as PowerShell
  • If you are new to Data Engineering, make sure to take our Data Engineering Essentials course.

Overview of Cloud Platforms

Let us get an overview of Cloud Platforms. Here are the key benefits they offer.

  • Quick Setup
  • Marketplace
  • Pay-as-you-go
  • Scalability

Here are the various Cloud Platforms.

  • AWS
  • Azure
  • GCP
  • Oracle
  • Rackspace
  • Digital Ocean

AWS, Azure and GCP are the top 3 Cloud Platforms.

Overview of Google Cloud Platform

Let us get an overview of Google Cloud Platform.

  • GCP is one of the top 3 Cloud Platforms. The others are AWS and Azure.
  • It has a big marketplace of services (both native and 3rd party).
  • We can manage GCP services using the CLI, or using SDKs available for all prominent programming languages such as Python, Java, etc.

Signing up for GCP

Let us make sure we sign up for GCP using a valid email id. Keep in mind that you are eligible for a USD 300 credit which is valid for 3 months.

  • Set up a Google Account using a valid email id.
  • Sign up for GCP using the Google Account.
  • Get USD 300 in credits, valid for 3 months.
  • Set up a Project and review Billing.

Setup Google Cloud SDK

Follow the official instructions for setting up the gcloud CLI.

Configure Google Cloud SDK

Once Google Cloud SDK is set up, we need to make sure it is configured properly.

Here are the commands used for your reference.

gcloud init # Make sure to log in via the browser using your GCP account
gsutil list # Google Cloud SDK takes care of setting up gsutil as well
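
If you prefer to check the setup from Python, here is a minimal sketch that looks for `gcloud` and `gsutil` on the PATH and, if `gcloud` is found, prints the active configuration using `gcloud config list`. The helper names `find_cli_tools` and `run_if_available` are hypothetical and only used for illustration. It assumes the Google Cloud SDK was installed in a way that puts both tools on the PATH.

```python
import shutil
import subprocess


def find_cli_tools(names=("gcloud", "gsutil")):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def run_if_available(command):
    """Run a command and return its stdout, or None if the executable is missing."""
    executable = shutil.which(command[0])
    if executable is None:
        return None
    # Use the resolved path so the same call also works for wrapper scripts.
    result = subprocess.run(
        [executable, *command[1:]],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


if __name__ == "__main__":
    for name, path in find_cli_tools().items():
        print(f"{name}: {path or 'not found on PATH'}")

    # `gcloud config list` shows the active account, project and other properties.
    output = run_if_available(["gcloud", "config", "list"])
    if output is not None:
        print(output)
```

If either tool is reported as not found, revisit the SDK setup before running `gcloud init` again.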

Overview of Analytics Services on GCP

GCP provides a robust set of Analytics Services. Here are the most prominent and relevant ones for this course.

  • Google Cloud Storage for Data Lake
  • Cloud SQL for Relational Database
  • Cloud Functions for Data Processing
  • Dataproc for Big Data Cluster using Hadoop and Spark
  • Databricks for Big Data Cluster using Spark
  • Cloud Composer for Orchestration