Airports are vital national resources. They play a key role in the transportation of people and goods and in regional, national, and international commerce. As a consequence, flight delays in the United States impose significant costs on airlines, passengers, and society. The annual cost of domestic flight delays to the US economy was estimated at 31 to 40 billion dollars in 2007 (Ball et al. 2010, Joint Economic Committee, US Senate 2008). The impact of flight delays is not only economic but also environmental, owing to the additional CO2 emissions produced when recovering delays. Such high costs motivate the analysis and prediction of air traffic delays and the development of better delay management mechanisms.


The results presented in this preliminary project were obtained using data released by the U.S. Department of Transportation’s (DOT) Bureau of Transportation Statistics (BTS). The dataset provides summary information on the number of on-time, delayed, canceled, and diverted flights and appears in DOT’s monthly Air Travel Consumer Report, published about 30 days after the month’s end.

In our analysis we processed about 12 years of data (36 MB), from June 2003 to August 2015, using the following fields from each record:

  • the total number of flights
  • the number of flights delayed due to the carrier
  • the total delay time of flights delayed due to the carrier
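
The two quantities used in the rest of the analysis, the fraction of delayed flights and the delay per flight, can be derived from these three fields. The following is a minimal sketch, assuming each record is a dictionary with the hypothetical keys `flights`, `carrier_delayed`, and `carrier_delay_time` (illustrative names, not the BTS column names) and assuming the delay per flight is normalised by the total number of flights:

```python
def delay_metrics(record):
    """Return (fraction of carrier-delayed flights, carrier delay time per flight)."""
    flights = record["flights"]
    if flights == 0:
        return None  # no flights in this record, so the metrics are undefined
    fraction_delayed = record["carrier_delayed"] / flights
    delay_per_flight = record["carrier_delay_time"] / flights
    return fraction_delayed, delay_per_flight


# Illustrative values only, not taken from the dataset.
example = {"flights": 200, "carrier_delayed": 10, "carrier_delay_time": 500.0}
print(delay_metrics(example))
```

Records with no flights are skipped rather than counted as zero delay, so that empty carrier and airport combinations do not bias the distributions.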

The first question we would like to address is whether carriers perform differently depending on their spatial networks. Airlines typically optimise their coverage by selecting a few major hubs that handle most of their air traffic. These are the places where flights accumulate delays, so the choice of hubs appears to be strategic for achieving the best performance.

We first plot the distribution of the fraction of delayed flights and the distribution of the delay time per flight for the top 12 carriers (see Carrier). These 12 companies account for more than 84% of air traffic in the USA. We observe that the distributions broadly cluster into two classes, depending on whether the typical value (the most probable event) is very close to the average (as for AA or Atlantic in Carrier) or very far from it (as for Delta and Northwest in the same plot). This highlights the importance of rare events and fluctuations in the statistics of flight delays.
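
The distinction between the two classes can be made concrete by comparing the most probable value of a sample with its mean. The sketch below assumes the typical value is estimated as the centre of the most populated bin of a fixed-width histogram; the function name, the bin width, and the sample values are illustrative and are not necessarily the procedure behind the Carrier plot.

```python
from collections import Counter
from statistics import mean


def typical_vs_mean(values, bin_width):
    """Return (histogram mode, mean) for a sample of delays per flight."""
    # Assign each value to a bin and find the most populated bin.
    bins = Counter(int(v // bin_width) for v in values)
    top_bin, _ = bins.most_common(1)[0]
    mode = (top_bin + 0.5) * bin_width  # centre of the most populated bin
    return mode, mean(values)


# Made-up samples: one concentrated, one with a single rare large delay.
narrow = [2.0, 2.2, 2.4, 2.1, 2.3]
heavy_tail = [2.0, 2.2, 2.4, 2.1, 40.0]

print(typical_vs_mean(narrow, 1.0))      # mode and mean close together
print(typical_vs_mean(heavy_tail, 1.0))  # mean pulled away from the mode
```

In the second sample a single rare event moves the mean far from the typical value while leaving the mode unchanged, which is the signature of the second class of distributions.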


Comparing the carriers, the best performer is Southwest Airlines Co., which has the lowest mean and the lowest maximum delay time:

Carrier                         Mean delay time   Maximum delay time
Southwest Airlines Co.          2.154428           20.33333
Delta Air Lines Inc.            3.136766           49.63636
American Airlines Inc.          3.079371           23.20000
SkyWest Airlines Inc.           4.246181           32.00000
ExpressJet Airlines Inc.        4.589429           96.93750
United Air Lines Inc.           3.715870           91.00000
US Airways Inc.                 3.149458          107.50000
American Eagle Airlines Inc.    4.100867           27.50000
Northwest Airlines Inc.         6.540948          130.00000
Atlantic Southeast Airlines     5.087045           92.00000
Continental Air Lines Inc.      3.071228           57.50000
AirTran Airways Corporation     3.508366          140.00000

Plotting the distribution of flights across airports shows that Southwest Airlines Co. prefers a strategy that is less focused on a single major hub (blue dot in the AirportLoad plot). Its flights are less localised than those of other companies such as Delta Air Lines Inc. and American Airlines Inc. (see green and grey dots in AirportLoad). Distributing the air traffic load more homogeneously appears to pay off in terms of flight delays caused by internal organisation and carrier malfunctions.
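
One way to quantify how localised a carrier's traffic is, offered here as an assumption rather than as the measure behind the AirportLoad plot, is the share of flights handled by its busiest airport together with a Herfindahl-type concentration index (the sum of squared airport shares). The airport codes and flight counts below are invented for illustration.

```python
def concentration(flights_by_airport):
    """Return (largest airport share, Herfindahl index) for one carrier."""
    total = sum(flights_by_airport.values())
    if total == 0:
        raise ValueError("carrier has no flights")
    shares = [n / total for n in flights_by_airport.values()]
    return max(shares), sum(s * s for s in shares)


# Invented counts: one carrier spread evenly, one dominated by a single hub.
spread = {"AAA": 250, "BBB": 250, "CCC": 250, "DDD": 250}
hubbed = {"AAA": 700, "BBB": 100, "CCC": 100, "DDD": 100}

print(concentration(spread))  # lower values: load distributed homogeneously
print(concentration(hubbed))  # higher values: load focused on one hub
```

Both numbers lie between 1/N (perfectly even load over N airports) and 1 (all flights through a single airport), so carriers with networks of different sizes should be compared with some care.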


In addition, we plot the fraction of late flights and the delay per flight for different airports (Airport plot) as a function of their air traffic volume. The plot shows a surprisingly strong asymmetry in delays between hub terminals and smaller ones. We adopt a colour scale in which the lightest colour (white) corresponds to a high fraction of delayed flights and a high average delay time per flight. A priori, one might have guessed that the biggest airports, which carry more traffic, would suffer the largest delays. Instead, the smallest airports accumulate the biggest delays. This may be because carriers tend to optimise operations at their most important hubs.
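
The comparison between hubs and smaller airports can be sketched by splitting airports by traffic volume and aggregating delays within each size class. The sketch assumes per-airport tuples of (total flights, delayed flights, total delay time) and a hypothetical threshold separating large from small airports; neither the threshold nor the numbers come from the dataset.

```python
def delays_by_size(airports, threshold):
    """Aggregate delay metrics for airports below and above a flight-count threshold."""
    # Running totals per class: [flights, delayed flights, delay time].
    totals = {"small": [0, 0, 0.0], "large": [0, 0, 0.0]}
    for flights, delayed, delay_time in airports.values():
        group = "large" if flights >= threshold else "small"
        totals[group][0] += flights
        totals[group][1] += delayed
        totals[group][2] += delay_time
    result = {}
    for group, (flights, delayed, delay_time) in totals.items():
        if flights > 0:  # skip empty classes
            result[group] = (delayed / flights, delay_time / flights)
    return result


# Invented airports: two hubs and two small terminals.
airports = {
    "HUB1": (10000, 400, 20000.0),
    "HUB2": (8000, 320, 16000.0),
    "SML1": (200, 20, 1600.0),
    "SML2": (300, 30, 2400.0),
}
print(delays_by_size(airports, threshold=1000))
```

Each class is reported as a pair (fraction of delayed flights, delay time per flight), the same two quantities encoded by the colour scale.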



Our long-term goal is to build a predictive model of the distribution of delays based on past data, by simulating air traffic between airports and generating delays that match the statistics of past events. This will enable us to predict future delays per airport. Understanding how delays are generated and propagate across the transport network involves considering a large number of variables and elements. The basic elements internal to the system include, of course, the flights, but also passengers, crews, airport operations, etc. Additionally, external factors can affect flight performance, for example weather, labor regulations and strikes, or security threats. The intricacy of all these elements and of the interactions between them clearly qualifies air traffic management (ATM) as an area to be studied in the light of complex systems theory. Complexity is characterised by the emergence of new phenomena as a result of the interactions between the elements of a system. Several tools have been introduced to study and quantify the properties of such emergent behaviours.