Sunday, December 10, 2017

The Glaring Omission in Your Incident Response Planning

Chances are excellent that your incident response plan has a glaring omission with regard to one of the most critical aspects of success during an incident.

There has been an immense amount of time and treasure expended on what a proper incident response plan should look like.  Just throw “incident response plan” into your favorite search engine and you’ll get pages and pages of content. You’ll see all sorts of advice on how the various steps and phases of an incident response plan should play out, and quite a bit of thought being put into things such as collecting contact information, identifying stakeholders and roles, inventorying the tools to be used, determining secure communication methods (because you’re assuming the baddies got your email servers early and often), and the like.  Great stuff.

Does any of your plan talk about how to take care of your people during a major incident? I’m talking about those incidents that are measured in weeks or months, where it’s an all-hands-to-the-pump, 24/7 response for days or weeks on end.  Once these incidents kick off, it’s too late for the preparation stage.  It’s show time, and there is an immense amount of stress on everyone on the team, whether it’s the CISO who is constantly being asked for updates by senior executives who are seeing their career dissipation lights cranked up to about a quarter million lumens or the lowest-level incident responder who is cranking out digital forensic images or poring over network logs.

An incident response plan for major incident responses isn’t fit for purpose unless it addresses how your incident responder border collies will be fed, watered, and rested. An organization should have a catering plan in place before an incident so that it can start getting a steady stream of food and drink to the people who are going to be putting in an immense number of hours around the clock getting things under control.

If it’s a large organization (or a really nice startup in Palo Alto), chances are excellent that there is already an on-site cafeteria for employees that probably offers on-site catering services.  The incident response plan should specify how to engage those people and who the points of contact are.  You’re also going to want to talk to them before an incident to make sure that you can get food to cover a long-term, around-the-clock response.

If you don’t have anything on-site, you’re going to want to identify several external catering options, understand how to engage them on short notice for an extended response, and find out how scalable their services are, since you might be feeding a very large team.  Their contact information, billing methods, and the like should be part of your incident response plan. You also need to discuss the available menu options with your catering providers before an incident.  Just saying you are going to order a steady stream of pizza from the take-out place down the road for weeks on end isn’t a great option.  You want to give your people healthy options that keep them fueled up, feeling good, and ready to chase bad guys out of your network.

You also want to make sure you are providing your people with a variety of non-caffeinated drink options in addition to the endless gallons of caffeinated sugar water or energy drinks that fuel most major incident responses.  

Keep in mind that you are going to be feeding not only your employees, but also any consultants that parachute in to help you out of your bind.  There is a lot of dietary diversity these days, so you’ll want to make sure you have options for people who need them for medical, religious, or cultural reasons.  Popular options include vegetarian and gluten-free diets, which works out well because you can get fantastic stuff that complies with either that everyone will enjoy.

The other thing that needs to be covered is transportation for your people.  Drowsy driving is a thing and it’s a thing you want nothing to do with during an incident.  Ride-sharing services have made this much easier, especially in major metropolitan areas.  The goal is to make sure you can get your people safely and efficiently back and forth between home (or the hotel rooms they are calling home during the incident) and work. Most of your people will be driving into work, but if they are too tired to drive because they ended up working a day or more in a row without sleep, it’s probably not a great idea to let them drive home, and your plan should address that fact.

Which reminds me of an important point. If you are having people stay up for days on end, you’re very likely understaffed for your incident and you need to fix that quickly or you’re asking for more problems.  My general rule is that I don’t do forensics after ten hours because my chances for mistakes go up dramatically.  I’ve lost count of the number of times that I struggled with something during a forensic exam at the end of a very long day only to solve the issue in the first fifteen minutes of being back in the office after getting some sleep.

As always, the keys to success are people, processes, and tools and your incident planning should reflect that fact. 

3 comments:

  1. "Does any of your plan talk about how to take care of your people during a major incident?"

    They rarely do. As an incident responder, I most often engage with folks who don't have an IR plan, or if they do, it's a dust-covered binder that they point to when a compliance assessor asks.

    I've had a number of incidents over the years where I've gone on-site and the first thing I've recommended...after watching someone try to type a "simple" command or email address several times...is that everyone go home and get some rest.

    It doesn't take someone with my background to recognize...or maybe it does...that if you're dealing with an issue that you don't understand, on a network that you (think you) own (but don't fully understand), with an adversary who's not only operating on his own time frame, but is able to react to stimulus...then the initial anxiety is only going to get worse, and fuel that thought process that "more is better".

    I was once engaged with an IT director who was under considerable pressure from the higher ups to engage in 24x7 ops (while the "higher ups" were nowhere to be found...) and had been doing so for about 10 days. Now, there were no shifts...so the team was getting a bit of rest basically while "no one was looking"...not good. The IT director was downloading images to an ext HDD, and the copy operation (over the internal network) had stabilized at about 8+ hrs remaining. I suggested that at that point, everyone make a hard stop and get some rest...to which the IT dir asked me who'd be analyzing the images while we were resting. Yes...those images...the ones currently being downloaded, that would require just a bit more than 8 hrs to finish.

    Due to fatigue, we ran into other issues...the compromised systems were not "supposed" to be on the internal network, and even with hard data (netflow, logs, etc) to support that, it took them about 2 days to accept it.

    Fear and anxiety quickly lead to fatigue, mistakes, apathy, and burnout. All of these can be avoided with the appropriate instrumentation and visibility into your infrastructure.

  2. Great commentary. That also reminds me that one of the things that needs to be thought about well in advance is how to plug your consultants into your system as seamlessly as possible. Who are they going to be reporting to? *How* will you communicate with them (because you're likely not using your corporate email for the incident response portion of things)? How are they going to get access to your network, whose tools are they using, etc., etc.?

    You don't want your expensive cavalry showing up and then losing several days because they don't have access to your network and there is confusion about which tools are going to be used by whom during the response.
