Statistical analysis, using Hadoop and Spark, of large datasets from the New York City Taxi and Limousine Commission (TLC), intended to help managers devise resource distribution strategies that maximize company profits.

YuanningEric/New-York-City-Taxi-Trip-Analysis

Description

Performed statistical analysis on large datasets (more than 1 million entries) from the New York City Taxi and Limousine Commission (TLC) to help managers devise resource distribution strategies that maximize company profits.

Contents

  • Q1 Computed the total number of trips and total fares generated from all pickup locations (with Hadoop and Java).

  • Q2 Analyzed the popularity of all pickup and drop-off locations (with Spark/Scala) on Databricks.

  • Q3 Computed the average profit the company makes at each pickup location (with Apache Pig on Hadoop) using Amazon Elastic Compute Cloud (EC2) and Amazon Elastic MapReduce (EMR) on AWS.

  • Q4 Computed the degree differences for locations and the average fare for different passenger counts (with Hadoop) on a Microsoft Azure HDInsight cluster with Azure Blob storage.
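
The aggregations listed above can be illustrated with a minimal in-memory sketch in plain Java. This is not the repository's code and it does not use Hadoop, Spark, or Pig; it only shows the shape of the computations on a tiny list. The `Trip` class, its four fields, and all method names are hypothetical, and the real TLC schema has many more columns. "Degree difference" is assumed here to mean the number of trips leaving a location minus the number arriving at it, which the description does not confirm.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TaxiTripSketch {

    // Hypothetical, simplified trip record (not the actual TLC schema).
    public static final class Trip {
        final String pickup;
        final String dropoff;
        final int passengers;
        final double fare;

        public Trip(String pickup, String dropoff, int passengers, double fare) {
            this.pickup = pickup;
            this.dropoff = dropoff;
            this.passengers = passengers;
            this.fare = fare;
        }
    }

    // Q1-style: pickup location -> {trip count, total fare}.
    // Mirrors a MapReduce job that emits (pickup, (1, fare)) and sums per key.
    public static Map<String, double[]> tripsAndFaresByPickup(List<Trip> trips) {
        Map<String, double[]> totals = new TreeMap<>();
        for (Trip t : trips) {
            double[] acc = totals.computeIfAbsent(t.pickup, k -> new double[2]);
            acc[0] += 1;       // trip count
            acc[1] += t.fare;  // total fare
        }
        return totals;
    }

    // Q4-style: degree difference per location.
    // ASSUMPTION: defined as out-degree (pickups) minus in-degree (drop-offs).
    public static Map<String, Integer> degreeDifference(List<Trip> trips) {
        Map<String, Integer> diff = new TreeMap<>();
        for (Trip t : trips) {
            diff.merge(t.pickup, 1, Integer::sum);
            diff.merge(t.dropoff, -1, Integer::sum);
        }
        return diff;
    }

    // Q4-style: average fare for each passenger count.
    public static Map<Integer, Double> averageFareByPassengerCount(List<Trip> trips) {
        Map<Integer, double[]> sums = new TreeMap<>();
        for (Trip t : trips) {
            double[] acc = sums.computeIfAbsent(t.passengers, k -> new double[2]);
            acc[0] += t.fare;  // fare sum
            acc[1] += 1;       // trip count
        }
        Map<Integer, Double> averages = new TreeMap<>();
        sums.forEach((count, acc) -> averages.put(count, acc[0] / acc[1]));
        return averages;
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        List<Trip> sample = List.of(
            new Trip("A", "B", 1, 10.0),
            new Trip("A", "C", 2, 20.0),
            new Trip("B", "A", 1, 5.0));

        tripsAndFaresByPickup(sample).forEach((loc, acc) ->
            System.out.println(loc + ": trips=" + (long) acc[0] + ", fares=" + acc[1]));
        System.out.println("Degree differences: " + degreeDifference(sample));
        System.out.println("Average fares: " + averageFareByPassengerCount(sample));
    }
}
```

On a cluster, each of these loops would typically become a map step that emits a key (location or passenger count) with a partial value and a reduce step that sums the partial values per key.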
