Saturday, April 20, 2013

Plan B

Do you have a plan B?
Of course you do! You have redundant power, redundant equipment with redundant routing engines on redundant sites and redundant uplinks from different providers on different transport media. So you're covered, and there is no need for you to read this.

As network architects and engineers we want to build resilient solutions that can sustain damage and problems occurring within our known paradigm. We all know the everyday things that can go wrong because we've seen them before. We know we can lose power, equipment can malfunction and cables can be disconnected. We know how frustrating these events can be and the "panic" that can erupt, not to mention the potential financial losses associated with them. We take them into account when we design a solution. This is natural and correct thinking based on experience. But is it a Plan B?

Without neglecting their importance, the resilient solutions are mainly in place to make our (working) lives more normal. We don't like getting a call at 3 o'clock on a Saturday morning, spending the rest of the weekend fixing stuff and then spending all Monday listening to our boss yelling. We see these events from within our own paradigm. We want to be sure that "our stuff" isn't broken, so that we don't have to go out and fix it. This is great, but it isn't a Plan B. A Plan B has to take the perspective of the whole organization. It has to be able to handle disruptions outside of our control.

Staying within our own realm of knowledge and capabilities, what is the worst thing that could happen? What would bring down the entire organization to a point where it might not recover?
If you still come up with "power failure", give it 5 more minutes.

There is no standard answer to this question since all organizations are unique. But I'll try to give you some options to choose from.
- No Internet connection for 2 weeks.
- No access to a certain service for 2 weeks.
- Your customers are not able to get/access your service/product for 2 weeks.
- Your customers are not able to pay for your services for 2 weeks.
- No communication with the outside world for 2 weeks.
- Your non-Internet-connected systems are taken down for 2 weeks.
- A third party gets access to your (non-Internet-connected) systems without your knowledge.

Would any of these be more fatal for your organization than a 3-hour power failure?

"Yes, but that's not going to happen", I can hear you say - really?

Last month there was a so-called "Internet war" going on between CyberBunker and Spamhaus. Allegedly it was a 300 Gbps DDoS attack, even though I haven't seen any evidence to support this. Real or not, this is not an impossible scenario, so have you considered what would happen if your organization's network or services were in harm's way of such an attack for days or weeks?

In 2008 Pakistan Telecom accidentally blocked access to YouTube for most of Asia. The block was intended to be national only but affected two-thirds of the global Internet population. Not being able to access YouTube might be harmless for most organizations, but what if it had been a different service, maybe PayPal? You also need to remember that this was a simple misconfiguration of a router. Imagine someone doing a deliberate attack!

Many countries and regions around the world are dependent on one or two major uplinks. That is, their entire access to the rest of the Internet depends on one or two physical cables or one or two uplink providers. During the Arab Spring we saw several countries having their Internet access to the outside world cut off. This is possible when the uplinks, counted either as physical cables or as providers, are few. You might live in a relatively free country, but you might also be in a geographical region that is serviced by only one or two uplink providers. If those providers fail, you lose your connection. The cause can be a simple misconfiguration by the provider, like removing a route or an ASN.

There have been incidents in the US where people were not able to buy gas at certain stations that had no Internet connection. Not because the pumps at the station needed a connection to pump gas, but because all the payment systems (including cash payment) did not operate, since they relied on an Internet connection.

Some people build secure networks, Europol for instance, that are physically separated from the Internet in order to gain security and avoid the "bad stuff" that takes place on the Internet. This is a false sense of security. If you are running the same protocols that run the Internet, you are vulnerable to the same problems. Your infrastructure doesn't even have to be running TCP/IP to be hit. Stuxnet jumped to SCADA systems that weren't connected to the Internet at all and weren't running TCP/IP.

As the Internet gets more complex, as more services rely on it, and as more governments and other organizations try to break or interfere with the basic protocols running it, the likelihood of events like the ones mentioned above increases.

What can you do?
Make a Plan B!
This isn't something you can do alone. This has to take place with people from your entire organization. And, as with so many things, the process of making the plan is far more important than the actual outcome.
You can't know in advance what might happen or when, but you can identify the most crucial spots in your organization, the ones that can receive damage beyond repair. Simply identifying them and knowing them is the most crucial part of the process.
Once you know them, then you can think about how you can cope, should damage or disruptions ever occur. This doesn't have to be expensive. Look for the simplest and most low-tech or no-tech solution possible.
To get you started: Identify and examine the processes and services your organization relies on. Look around your organization. If there is no Internet connection, will you be able to use the phone? Receive payment from customers? Pay your employees? What happens if someone gains access to non-Internet-connected parts of your infrastructure?
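
If it helps to make the inventory concrete, here is a minimal sketch in Python of what such a list could look like. Everything in it is an assumption for illustration: the process names, the dependencies and the fallbacks are hypothetical, and your own list will look different. The idea is only to write down what each process depends on and then ask which processes have no fallback when one dependency disappears.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Process:
    name: str
    depends_on: frozenset            # things this process cannot run without
    fallback: Optional[str] = None   # low-tech workaround, if one exists


# Hypothetical inventory; replace it with what your organization actually does.
INVENTORY = [
    Process("Take phone calls", frozenset({"internet", "power"})),
    Process("Receive customer payments",
            frozenset({"internet", "payment provider"}),
            fallback="paper invoices"),
    Process("Pay employees", frozenset({"internet", "bank portal"})),
    Process("Ship orders", frozenset({"power"}),
            fallback="manual picking lists"),
]


def impact_of(failure, inventory):
    """Return (covered, uncovered) names of processes that depend on `failure`."""
    hit = [p for p in inventory if failure in p.depends_on]
    covered = [p.name for p in hit if p.fallback is not None]
    uncovered = [p.name for p in hit if p.fallback is None]
    return covered, uncovered


if __name__ == "__main__":
    covered, uncovered = impact_of("internet", INVENTORY)
    print("Has a fallback:", covered)
    print("No Plan B yet:", uncovered)
```

The list of processes without a fallback is where the conversation with the rest of the organization should start. A spreadsheet or a whiteboard does the same job; the tool matters far less than actually having the list.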

Once you have come up with a Plan B - test it! 
I can't stress this enough: if you do not test it, you don't know if it's going to work, and you might be living with a false sense of security. Even if it is as basic as installing a backup power system, pull the plug on a live environment (don't do this in peak business hours) and verify that it works.
