Airline_Data_Analysis

Airline Dataset Analysis using PySpark.

Analyze flight performance data and rank airports using Rank. The analysis uses departure delay data to answer the following questions:

- Determine the number of airports and trips
- Determine the longest delay in this dataset
- Determine the number of delayed vs. on-time / early flights
- Which flights departing SFO are most likely to have significant delays?
- Which destinations tend to have delays?
- Which destinations tend to have significant delays departing from SEA?
- Airport ranking using Rank

Data File: 2015 Flight Delays and Cancellations: https://www.kaggle.com/usdot/flight-delays

Dataset Description:

The dataset consists of 1,048,576 data points and includes the following parameters:

- Flight_Number
- Destination_Delay
- Distance
- Arrival_Delay