Airline Dataset Analysis using PySpark.
This project analyzes flight performance data and ranks airports using Rank. It uses departure delay data to answer the following questions:
- Determine the number of airports and trips
- Determine the longest delay in this dataset
- Determine the number of delayed vs. on-time / early flights
- Which flights departing SFO are most likely to have significant delays?
- Which destinations tend to have delays?
- Which destinations tend to have significant delays departing from SEA?
- Airport ranking using Rank
Data File: 2015 Flight Delays and Cancellations: https://www.kaggle.com/usdot/flight-delays
Dataset Description:
The dataset consists of 1048576 data points, including the following parameters: Flight_Number, Destination_Delay, Distance, Arrival_Delay.