British Airways cancels all flights (27/5/17)

20 posts in this topic

Quote

BA chief executive Alex Cruz said: "Our IT teams are working tirelessly to fix the problems. We believe the root cause was a power supply issue."

 

If they outsourced, it cannot be "our IT teams"...

 

IMHO companies blindly outsource looking at possible short-term savings & ignoring the fact that the outsourced company sticks rigidly to the contract and certainly doesn't "go the extra mile".


Ermmmm  Power supply?

 

You mean a cleaner unplugged the laptop to plug their vacuum cleaner in and forgot to put the lappy back on charge??

 

 


Boy, look at that mess at Heathrow...

 

[image: heathrow.jpg]

 

A mile-high pile of suitcases and thousands of people stranded at an airport?

 

Did that ever happen at BER? Think about it... :-)

 


And they lied to everyone by saying everything would be up and running by 18:00, so thousands stayed who should have left, only exacerbating the problem. What a nightmare.


"everything up and running" does not mean that the back-log is cleared!

A "blip" like this takes at least 24 hours to get sorted out, if not longer...

 

* boy, am I glad I hadn't planned to travel this weekend!!! :D

2 hours ago, HEM said:

 

Of course not - NOTHING has happened at BER...

 

At the terminal, no. However, the runways are used quite often; I landed on one at the beginning of the week and taxied past the BER terminal.


"a power supply issue"?

Do they only have one data centre in India?

If you can bring a company as large as BA down with a power outage then they must have done something terribly wrong.

A power problem can happen anywhere, at any time.

You have to design your system so that this can't cause major problems.


Aren't they planning to do away with control towers at London airports and do all the air traffic control online from some remote place, relying on safe data connections? I wonder whether this really is a good idea.


Like I say, I'm a dinosaur, and I do think there is an over-reliance on IT systems, and that perhaps companies should practise for when things do go wrong, which, with IT, seems to happen too often.

 

After all, is it really that difficult to have a backup in place that can allocate aircraft to routes? People already have their tickets, and there is already historical evidence as to which type of aircraft to use on which route; the only issue would be for those who haven't booked a ticket yet, as they cannot be guaranteed a space. Yes, it would be much slower, but at least there wouldn't be such a backlog.

 

As for IT, sorry to those programmers out there, but you remind me of ancient alchemists: you promise to deliver gold but instead we get shit :P


BA outsourced a lot to TCS -> Tata Consultancy Services.

 

I found this in a forum. Pulling a plug, Indian style:

" A few years ago, I flew to the UK for a meeting for a couple of days with the client manager and his team. A couple of hours after arrival, there was a "major incident", so we were all pulled into the "war room", where there was a conference call going on with the data centre in Bangalore. The database server had stopped most of its communication with the outside world, but since it was still pingable the fail-over detector didn't notice. The client manager told the chap in the data centre "switch off the database server". (Then the fail-over detector wouldn't get a response to ping, and would properly fail over to the backup).

"I can't do that sir".
"Yes you can. It's my responsibility. Now do it."
"I can't do that sir".
"Just unplug it!":
"I can't do that sir".
...
and so on. Eventually the data centre numpty chap cracked, and agreed to switch off the database server. At least, that's what we heard him agree to. The next minute, everything went down.

"What have you done?".
"Just what you told me sir. I've turned the power off."

He'd hit the emergency power off*. Thus neatly turning what should have been a thirty minute outage affecting a few thousand users into over five hours affecting a few tens of thousands of users, as network cards and other components failed at boot up and needed replacing...

The official cause of the issue? Power outage.

*At this point I diplomatically left the room, as it was becoming increasingly difficult to stifle my laughter. "
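The detail worth noting in that story is the failover detector that only pinged the box: a server can answer ICMP all day while the database process on it is wedged. Purely as an illustration (the host, port and probe below are invented, not anyone's real setup), the difference between the two kinds of check looks roughly like this in Python:

import socket
import subprocess

def host_is_pingable(host: str, timeout_s: int = 2) -> bool:
    """ICMP-level check: only proves the network stack answers (Linux-style ping flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def service_answers(host: str, port: int, probe: bytes = b"\n", timeout_s: float = 2.0) -> bool:
    """Application-level check: the service must accept a connection and send something back."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s) as conn:
            conn.settimeout(timeout_s)
            conn.sendall(probe)
            return len(conn.recv(1)) > 0
    except OSError:
        return False

def should_fail_over(host: str, port: int) -> bool:
    # The detector in the story effectively used only host_is_pingable(),
    # so a wedged-but-pingable database never triggered the failover.
    return not service_answers(host, port)

Had the detector used the second kind of check, it would have failed over on its own and nobody would have needed to argue about pulling plugs.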

 

1 hour ago, AnswerToLife42 said:

BA outsourced a lot to TCS -> Tata Consultancy Services.

I found this in a forum. Pulling a plug, Indian style... [full story quoted in the post above]

 

I remember very well the day BA announced they were going to move their IT to India. It was not that long ago.

A lot of talk from BA's side was about how this move was going to cut the costs for the customers.

Now we see.

 

 


As I said / wrote:

 

23 hours ago, HEM said:

IMHO companies blindly outsource looking at possible short-term savings & ignoring the fact that the outsourced company sticks rigidly to the contract and certainly doesn't "go the extra mile".

 

The trouble is, when things go wrong & IT (or whatever) is "bought back", the original experts are usually not re-hired (they are either pissed off or have found employment elsewhere).


Always install your security patches!

 

http://forums.contractoruk.com/general/121716-so-who-did-ba-outsource-their-7.html
"A colleague at client site used to work for BA, so asked around. Allegedly...
TCS were told to apply some security patches - Linux and Windows. They applied them, then shut down and attempted to restart the entire data centre. This immediately caused various components to fail. Memory chips and network cards were particularly affected. Hence "power supply issue". Since TCS were given 24 hours to apply the patches and did it in ten minutes, they're due a bonus*.

* This bit I made up. "
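If that account is anywhere near true, the missing ingredient was a rolling restart: patch and reboot a small batch, wait for it to come back healthy, then move on, so a bad batch halts the rollout instead of taking the whole estate down. A rough sketch of the pattern, with the host names, ssh access and systemd health probe all assumed purely for illustration:

import subprocess
import time

# Invented inventory; in practice this would come from a CMDB or monitoring system.
HOSTS = [f"app{i:02d}.example.internal" for i in range(1, 21)]

def run(host: str, command: str) -> bool:
    """Run a command on a host over ssh; True if it exited cleanly."""
    return subprocess.run(["ssh", host, command]).returncode == 0

def host_healthy(host: str) -> bool:
    # Stand-in health probe; a real one would hit the actual services.
    return run(host, "systemctl is-system-running --quiet")

def rolling_patch(hosts, batch_size=2, retries=30, wait_s=10):
    """Patch and reboot a few hosts at a time, verifying health between batches."""
    for start in range(0, len(hosts), batch_size):
        batch = hosts[start:start + batch_size]
        for host in batch:
            run(host, "sudo apt-get update && sudo apt-get -y upgrade")
            run(host, "sudo systemctl reboot")  # the ssh session drops here; that is expected
        for host in batch:
            for _ in range(retries):
                time.sleep(wait_s)
                if host_healthy(host):
                    break
            else:
                raise RuntimeError(f"{host} did not come back healthy; halting rollout")

Slower than doing the lot in ten minutes, but the blast radius of a bad patch is a couple of hosts, not a data centre.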

 

20 hours ago, AnswerToLife42 said:

"a power supply issue"?

Do they only have one data centre in India?

If you can bring a company as large as BA down with a power outage then they must have done something terribly wrong.

A power problem can happen anywhere, at any time.

You have to design your system so that this can't cause major problems.

 

Restarting a data center is not the same as rebooting your computer at home. A power outage is the worst nightmare when you are running hundreds or thousands of systems 24/7. It can bring down really big companies like Netflix or Google. It can even take parts of the Internet offline.

 

When you run a system 24/7 and then reboot it, a hard disk that was close to dying might not spin up again, even if it was working correctly before the shutdown. Network cards can blow. All kinds of random sh*t happens. Complicated software does not run on a single piece of hardware; there are interdependencies between multiple systems, so bringing everything back online can become a giant headache. Switches and routers might lose some configuration, and it is not trivial to pinpoint the issue.

 

Normally you do planned reboots: you take systems offline and reboot them in a controlled environment, you replace parts if you have to, or you replace the whole system. This is part of the maintenance plan, but a power outage means it happens to all systems at the same time, in an uncontrolled environment. In some companies there is no preventive maintenance at all, imagine that.

 

I've seen situations where just rebooting a switch becomes a week-long problem. And you might have read in the news that once or twice the whole Internet was affected because there was a problematic router somewhere.
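A good illustration of that interdependence problem is how much of the pain is simply working out what order to bring things back in. If the dependencies are written down, the restart order can be computed instead of guessed. A small sketch with made-up service names (Python 3.9+, graphlib is in the standard library):

from graphlib import TopologicalSorter

# Invented dependency map: each service lists what must already be up before it starts.
DEPENDS_ON = {
    "core-network":   [],
    "storage":        ["core-network"],
    "message-broker": ["core-network"],
    "database":       ["core-network", "storage"],
    "booking-app":    ["database", "message-broker"],
    "check-in-app":   ["database", "message-broker"],
}

def startup_order(depends_on: dict) -> list:
    """Return a boot order in which every service comes after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())

print(startup_order(DEPENDS_ON))
# One valid order: core-network, storage, message-broker, database, booking-app, check-in-app

Most places only discover the real version of that map the hard way, during exactly the kind of uncontrolled restart described above.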

18 hours ago, French bean said:

Like I say, I'm a dinosaur, and I do think there is an over-reliance on IT systems... [full post quoted above]

Well, you should thank the hysterics for this over-reliance on IT systems.

 

There is no problem with boarding people using just their e-mails as proof of ticket purchase and issuing hand-written boarding passes. BUT TERRORISTS CAN BOARD THE PLANE ISIS OSAMA BIN LADEN LUKE I AM YOUR FATHER OH MY GOD TERRORISTS EVERYWHERE, which is why this did not happen.

 

Fuck the hysterics who destroyed the comfort of ID-free flying.


Now, I'm not a fan of Ryanair or of O'Leary, but I have to admire him; he is so switched on:

 

"Ryanair's chief marketing officer Kenny Jacobs was on Today to discuss the firm's financial results.

In its statement the airline couldn't resist pointing out that it hadn't subcontracted its IT, unlike British Airways.

"We have three IT locations across Europe and these are owned and managed by Ryanair," Mr Jacobs told Today

"One of them is used at any one time. If we did have an issue which caused an outage in one, automatically one of the other two would kick in and in our 31 year history, thankfully, have never had an issue.

"We take IT very seriously - the important thing here is making sure that we don't have an outage like BA had that caused such disruption to their customers," he added."

 

Now, if Ryanair, who know how to save the last penny and every drop of fuel, have worked out that it's better to have IT in-house, then the bright sparks at BA are actually not that bright.
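What Jacobs is describing is classic active/standby across three sites, and the selection logic itself is not magic. Stripped to the bone it is something like the sketch below; the endpoints are invented, and in reality this would live in the DNS or load-balancing layer rather than in application code:

import socket

# Invented endpoints for illustration; the real topology is not public.
SITES = [
    ("primary",   "dc1.example.net", 443),
    ("standby-1", "dc2.example.net", 443),
    ("standby-2", "dc3.example.net", 443),
]

def site_is_up(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Cheap liveness probe: can we open a TCP connection to the site?"""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False

def active_site(sites=SITES) -> str:
    """Walk the ordered list and use the first site that answers."""
    for name, host, port in sites:
        if site_is_up(host, port):
            return name
    raise RuntimeError("no site reachable")

The hard part is not this loop; it is keeping the standby sites' data close enough to current that cutting over to them is actually safe.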

On 2017-5-29 11:34:15, Krieg said:

 

Restarting a data center is not the same as rebooting your computer at home... [full post quoted above]

 

 

A good description of some of the complexities.

 

A couple of quotes from http://www.theregister.co.uk/ give us some more details:

 

 

Quote

 The CEO has since confirmed the data centre was based in the UK, telling the Graun: "I can confirm that all the parties involved around this particular event have not been involved in any type of outsourcing in any foreign country. They have all been local issues around a local data centre who has been managed and fixed by local resources,” he said.

 

 

 

Quote

 

On Saturday morning around 9:30 there was indeed a power surge that had a catastrophic effect over some communications hardware which eventually affected the messaging across our systems.

Tens of millions of messages every day that are shared across 200 systems across the BA network and it actually affected all of those systems across the network.

Speaking to Channel 4, Cruz said: "We were unable to restore and use some of those backup systems because they themselves could not trust the messaging that had to take place amongst them."
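That last line is the interesting one: the backups "could not trust the messaging". One common way to let systems decide for themselves whether a message stream is trustworthy after an uncontrolled event is to give every message a sequence number and a checksum, so receivers can detect corruption and gaps instead of silently processing rubbish. A minimal sketch of the idea (an illustration only, not BA's actual design, which has not been published):

import hashlib
import json

def wrap(seq: int, payload: dict) -> str:
    """Sender side: attach a sequence number and a checksum to the payload."""
    body = json.dumps({"seq": seq, "payload": payload}, sort_keys=True)
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return json.dumps({"body": body, "sha256": digest})

def unwrap(raw: str, expected_seq: int) -> dict:
    """Receiver side: refuse anything corrupted or out of sequence."""
    outer = json.loads(raw)
    if hashlib.sha256(outer["body"].encode("utf-8")).hexdigest() != outer["sha256"]:
        raise ValueError("checksum mismatch - message corrupted")
    inner = json.loads(outer["body"])
    if inner["seq"] != expected_seq:
        raise ValueError(f"sequence gap: expected {expected_seq}, got {inner['seq']}")
    return inner["payload"]

# The receiver tracks expected_seq itself, so after a power event it can say
# exactly where the stream stopped being trustworthy.
msg = wrap(42, {"flight": "BA0123", "status": "boarding"})
print(unwrap(msg, expected_seq=42))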

 

 

