Performed statistical analysis on large datasets (more than 1 million entries) from the New York City Taxi and Limousine Commission (TLC) to help managers devise resource-allocation strategies that maximize company profits.
- Q1 Computed the total number of trips and total fares generated from all pickup locations (with Hadoop and Java).
- Q2 Analyzed the popularity of all pickup and drop-off locations (with Spark/Scala on Databricks).
- Q3 Computed the average profit the company makes at each pickup location (with Apache Pig on Hadoop) using Amazon Elastic Compute Cloud (EC2) and Amazon Elastic MapReduce (EMR).
- Q4 Computed the degree differences for locations and the average fare for different passenger counts (with Hadoop) using a Microsoft Azure HDInsight cluster and Azure Blob Storage.
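
The aggregations behind Q1 and Q4 can be illustrated with a minimal plain-Python sketch. This is not the Hadoop code used in the project. The sample records, the function names, and the definition of "degree difference" as pickups minus drop-offs per location are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample records: (pickup_location, dropoff_location, fare).
# Real TLC trip records carry many more fields; these values are made up.
trips = [
    (1, 2, 10.0),
    (1, 3, 7.5),
    (2, 3, 12.0),
    (3, 1, 5.0),
]


def trips_and_fares_by_pickup(records):
    """Return {pickup_location: (trip_count, total_fare)} (Q1-style totals)."""
    totals = defaultdict(lambda: [0, 0.0])
    for pickup, _dropoff, fare in records:
        totals[pickup][0] += 1      # one more trip starting at this location
        totals[pickup][1] += fare   # accumulate fare collected for that trip
    return {loc: (count, fare) for loc, (count, fare) in totals.items()}


def degree_difference(records):
    """Return {location: pickups - dropoffs}.

    Assumption: locations are treated as graph nodes and trips as directed
    edges, so this is out-degree minus in-degree.
    """
    diff = defaultdict(int)
    for pickup, dropoff, _fare in records:
        diff[pickup] += 1    # outgoing edge from the pickup location
        diff[dropoff] -= 1   # incoming edge to the drop-off location
    return dict(diff)


if __name__ == "__main__":
    print(trips_and_fares_by_pickup(trips))
    print(degree_difference(trips))
```

In a MapReduce setting the same logic would typically split into a map step that emits one key-value pair per location and a reduce step that sums the values for each key.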