Leaving on a jet plane
In a month dominated by bad news stories (an election result that no one predicted, DUP deals and a delayed Queen's Speech, the start of Brexit talks, and record-breaking hot weather), it is easy to forget that just four weeks ago a huge computer system meltdown at BA affected thousands of travellers worldwide.
At 9:30am on Saturday 27th May, one of BA's data centres lost power. BA initially blamed a power surge, but no one else in the area reported a surge. Both the National Grid and the supplier, Scottish and Southern Electricity, said they were not aware of any problems.
Power problems in a data centre shouldn't be a big issue. Data centres typically employ multiple Uninterruptible Power Supplies (UPS), which use the mains power to keep banks of batteries topped up and use the batteries to generate clean, continuous power for the servers. If the mains supply fails completely, the UPS batteries can keep a server running for several minutes. That is long enough to allow an automated, orderly shutdown or, in really big data centres like those BA uses, to give a backup generator time to start up and provide power for as long as there is diesel in the tank. If that isn't enough security, really big systems can be split across multiple data centres so that, rather like an aeroplane with an engine failure, they can keep running even if one data centre goes down completely. BA has three separate data centres.
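The layers of protection described above can be illustrated with a toy model. This is a minimal Python sketch, not a description of BA's actual setup: the class `PowerChain`, the function `service_available` and every figure in them (a ten-minute battery runtime, a 30-second generator start, 48 hours of diesel) are hypothetical, chosen only to show how each layer covers the gap left by the one before it.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PowerChain:
    """Hypothetical model of one data centre's power supply (all figures illustrative)."""
    ups_enabled: bool = True
    ups_runtime_s: float = 600.0           # assumed battery runtime: 10 minutes
    generator_start_s: float = 30.0        # assumed time for the diesel generator to take the load
    diesel_runtime_s: float = 48 * 3600.0  # assumed fuel in the tank: 48 hours

    def servers_stay_up(self, outage_s: float) -> bool:
        """Return True if the servers keep power through a mains outage of outage_s seconds."""
        if outage_s <= 0:
            return True
        if not self.ups_enabled:
            # Nothing bridges the gap: servers lose power the instant the mains drops.
            return False
        if outage_s <= self.ups_runtime_s:
            # The batteries alone ride out the outage.
            return True
        if self.generator_start_s > self.ups_runtime_s:
            # The batteries run flat before the generator can take over.
            return False
        # The generator carries the load until the diesel runs out
        # (ignoring any battery reserve left after that point).
        return outage_s <= self.generator_start_s + self.diesel_runtime_s


def service_available(sites_up: Sequence[bool]) -> bool:
    """A service split across several data centres survives while at least one site is up."""
    return any(sites_up)


if __name__ == "__main__":
    print(PowerChain().servers_stay_up(60))                   # True: batteries cover one minute
    print(PowerChain(ups_enabled=False).servers_stay_up(60))  # False: no UPS, no cover
    print(service_available([False, True, True]))             # True: two sites still running
```

With the UPS switched off, even a one-minute outage takes the servers down in this model, because nothing bridges the gap before the generator can start.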
The BA power failure at Boadicea House lasted approximately one minute, and BA has since said a maintenance contractor "accidentally switched off the UPS", meaning there was no chance of an orderly shutdown. Bad as that was, it wasn't the root cause of the chaos. The BA spokesman said "After a few minutes of this shutdown of power, it was turned back on in an unplanned and uncontrolled fashion, which created physical damage to the system, and significantly exacerbated the problem". This chaotic restart caused data corruption at the other data centres and crashed around two hundred BA systems worldwide.
The consequences of this systems failure were profound. BA staff could no longer take bookings, check in passengers, or issue boarding passes. Passengers who had already checked in and were waiting in departure lounges found they could not board their planes because the departure gates were getting no information. Flights were suspended because the baggage handling systems couldn't get the right cargo onto the right planes, and pilots couldn't get the baggage weight information they need for fuel calculations. Even planes that had left the departure gates and were on their way to the runways had to turn back, because the BA systems could not provide passenger manifests to the destination airports.
In total, 726 flights were cancelled, many more were delayed, and about 75,000 passengers were affected (according to Reuters), or 300,000 (if you read The Sun). It took 48 hours to get the systems working again, 72 hours before everything was fully operational and, according to some reports, about a week to finally get all the luggage to the correct destinations.
The cost to BA was huge. The airline has suffered not just the loss of two days of business, but also the mandatory compensation it owes passengers for delayed and cancelled flights under EU legislation, and the cost of meals and hotel bills for passengers stranded at airports. The bill runs into tens of millions: even conservative estimates put it at upwards of £80m, not counting what BA will lose in future business and reputational damage. And all because of a one-minute power problem at one data centre.
29th June 2017
This article comes from the SKILLZONE email newsletter, published monthly since January 2008 and covering topics related to technology and the internet. All articles and artwork in the SKILLZONE newsletter are original content.