Developing Incident Simulation for Business Continuity and Cyber Incident Response Plans
An incident simulation recreates a real-world event and is designed to create an operational impact of sufficient magnitude to require invocation of your business continuity or cyber incident response plan. Performing tests of this nature can require that specific resources are available to provide a safe, yet realistic, environment in which to perform the test.
Tests based on an incident simulation and its impacts can highlight previously unknown weaknesses, allowing you to remediate them pre-emptively. Although it requires much more planning work, incident simulation is the most reliable way to build confidence in the effectiveness of your business continuity and cyber incident response plans, and it can provide useful feedback on your intrinsic levels of resilience. An incident simulation puts plans under stress, as they would be in a live situation, rather than evaluating them "passively".
Building blocks of an incident simulation
- Focus on basic key dependencies that will be impacted during the BCP test: for most organisations this will mean premises, people, systems and supply chain
- Establish a stress threshold: This means “how bad will it be, how much will be lost or unavailable, how important are the resources that have become unavailable?”
- Establish a time of occurrence: At what stage of the business cycle will this incident simulation occur? Will it be at a sensitive time (in terms of volumes or financial importance) or during a fairly routine period?
Key considerations for developing an incident simulation
How do we establish the impact?
The objective here is to create an impact of sufficient magnitude that would require the necessary business continuity plans or cyber incident response plans to be invoked. There are several factors to consider to meet this objective:
Scale – how much will be lost or unavailable? An example of this might be: will our business continuity arrangements mitigate staff unavailability of 35%? That’s a significant proportion of the workforce, but that threshold alone might not be enough to warrant invoking the business continuity plan if:
- The staff who are unavailable are not engaged in operationally critical functions
- The incident occurs during an unusually quiet time when the level of absence can be tolerated
Importance of the resources impacted – scale alone might not be sufficient to provide a challenging test. The next step is to think about which resources are impacted. Staying with the staff absence scenario, the majority of the staff who are unavailable will need to come from operationally critical areas of the business. Even if 35% of staff overall are absent, the impact generated by the incident simulation needs to be biased towards critical resources to provide the necessary stress level within the test.
Duration – how long is the impact expected to last? This is something that, unless you have unlimited resources, is difficult to simulate – time is time and can’t be condensed. This aspect of the simulation has to be given in terms of a test “directive”. The incident simulation scenario has to be designed so that participants consider the duration of the incident as part of their response activities.
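The interplay of scale and importance can be sketched in a few lines of Python. This is only an illustration, not a prescribed method: the `Team` record, the `critical` flag, the team names and the 30% invocation threshold for critical functions are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Team:
    name: str
    headcount: int
    absent: int
    critical: bool  # assumption: each team is simply flagged critical or not


def absence_rate(teams):
    """Fraction of staff absent across the given teams (0.0 if no staff)."""
    total = sum(t.headcount for t in teams)
    return sum(t.absent for t in teams) / total if total else 0.0


def should_invoke(teams, critical_threshold=0.30):
    """Hypothetical rule: invoke the plan when absence within critical teams
    reaches the threshold, regardless of the overall absence rate."""
    critical_teams = [t for t in teams if t.critical]
    return absence_rate(critical_teams) >= critical_threshold


# Same overall scale (35 of 100 staff absent), distributed differently.
unbiased = [
    Team("Payments operations", 40, 4, critical=True),
    Team("Marketing", 60, 31, critical=False),
]
biased = [
    Team("Payments operations", 40, 28, critical=True),
    Team("Marketing", 60, 7, critical=False),
]

print(absence_rate(unbiased), should_invoke(unbiased))
print(absence_rate(biased), should_invoke(biased))
```

Both scenarios lose 35% of staff overall, but only the second concentrates the loss in a critical function, so only the second would be expected to force invocation under this assumed rule.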
How should we select impacted resources?
Sometimes it’s more effective, particularly if you are dealing with a large number of individual resources, to use a tool to select them at random (such as people from a personnel list, or entries from a list of file directories and databases). Bear in mind that if you use the random selection approach, there is a chance that the resources made unavailable will not cause the level of impact you were planning for. This can happen if the number of resources selected meets the impact threshold but the resources are not sufficiently critical to cause the desired level of impact. This is termed a “near miss”: an incident occurs, but the overall impact is not sufficient to warrant invoking either the business continuity plan or the cyber incident response plan (or both). A test that results in a near miss wastes both time and opportunity. It’s always best to review the results of the random selection, bias them toward critical activities, and “tweak” them as required to avoid near misses.
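As a rough sketch of such a tool, the Python below draws people at random while weighting critical roles more heavily, then checks the draw for a near miss. The function names, the weighting factor of 3 and the `min_critical` cut-off are hypothetical choices for illustration; a real test would take them from your own impact threshold.

```python
import random


def select_unavailable(staff, n, critical_weight=3.0, seed=None):
    """Pick n people at random without replacement, weighting critical staff
    more heavily. `staff` maps a name to True if the role is critical."""
    rng = random.Random(seed)
    pool = dict(staff)
    chosen = []
    for _ in range(min(n, len(pool))):
        names = list(pool)
        weights = [critical_weight if pool[name] else 1.0 for name in names]
        pick = rng.choices(names, weights=weights, k=1)[0]
        chosen.append(pick)
        del pool[pick]  # remove so nobody is selected twice
    return chosen


def is_near_miss(chosen, staff, min_critical):
    """True when too few critical people were selected to force invocation."""
    return sum(1 for name in chosen if staff[name]) < min_critical


# Illustrative personnel list: 10 critical and 30 non-critical staff.
staff = {f"critical-{i}": True for i in range(10)}
staff.update({f"support-{i}": False for i in range(30)})

selection = select_unavailable(staff, n=14, seed=1)  # 14 of 40 = 35%
if is_near_miss(selection, staff, min_critical=6):
    print("Near miss: review and adjust the selection before the test")
```

Fixing the seed makes the draw repeatable, which helps when the selection has to be reviewed and adjusted before the test is run.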
How will we pace impact escalation?
Pacing the incident simulation is an art rather than a science. All tests are constrained by time, and this is one area where some accommodation has to be made. Many major incidents unfold over a timeframe of days, and sometimes weeks. Clearly, it’s not practical to keep participants involved in a BCP test for that period. The time constraint requires that some thought is given to how quickly participants are advised of the impact (bearing in mind that establishing the overall impact of a real incident could take several hours or days). This can be overcome by creating inserts that provide updates on the overall impact and at the same time inform test participants that several hours have passed since the last update.
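One way to plan this compression is to give each insert a time in the scenario and map it proportionally onto the real test window. The sketch below assumes that approach; the `Insert` record, the `schedule` function and the example messages are hypothetical, and a facilitator may well prefer to place inserts by judgement instead.

```python
from dataclasses import dataclass


@dataclass
class Insert:
    simulated_hour: float  # hours since the incident began, within the scenario
    message: str


def schedule(inserts, window_minutes):
    """Map simulated hours proportionally onto the real delivery window.
    Returns (real_minute, announcement) pairs in delivery order."""
    ordered = sorted(inserts, key=lambda ins: ins.simulated_hour)
    if not ordered:
        return []
    span = ordered[-1].simulated_hour or 1.0  # avoid dividing by zero
    plan = []
    previous = 0.0
    for ins in ordered:
        elapsed = ins.simulated_hour - previous
        real_minute = round(ins.simulated_hour / span * window_minutes)
        plan.append((real_minute, f"[+{elapsed:g}h since last update] {ins.message}"))
        previous = ins.simulated_hour
    return plan


# Illustrative scenario: 24 simulated hours delivered over 120 real minutes.
inserts = [
    Insert(0, "Fire alarm activated at head office"),
    Insert(6, "Building closed by emergency services"),
    Insert(24, "Site declared unavailable for at least a week"),
]
for minute, announcement in schedule(inserts, window_minutes=120):
    print(f"T+{minute:>3} min  {announcement}")
```

Each announcement states how much simulated time has passed, so participants can factor the duration of the incident into their response.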
What is the expected outcome?
During this type of BCP test, the test timeline will include the actions that participants are expected to take. The expected actions should be those outlined in the business continuity or cyber incident response plan, taken in reaction to the impacts and events that emerge during the test. Expected actions will be monitored by the BCP test support team, who follow the test timeline as the incident simulation unfolds. Establishing expected actions in advance forms a major part of the BCP test plan and the BCP test results report.
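A simple way for the support team to record this is to compare each expected action on the timeline with what was observed. The sketch below assumes that each expected action carries a deadline in minutes from the start of the test; that deadline, the action names and the `review_actions` function are illustrative assumptions rather than part of any particular plan.

```python
def review_actions(expected, observed):
    """Compare expected actions on the test timeline with those observed.
    Both arguments map an action name to minutes from test start: `expected`
    holds the deadline, `observed` the time the action was actually taken."""
    results = {}
    for action, deadline in expected.items():
        taken_at = observed.get(action)
        if taken_at is None:
            results[action] = "missed"
        elif taken_at <= deadline:
            results[action] = "on time"
        else:
            results[action] = "late"
    return results


# Illustrative timeline entries and observations.
expected = {
    "Invoke business continuity plan": 15,
    "Notify incident management team": 20,
    "Relocate critical staff": 60,
}
observed = {
    "Invoke business continuity plan": 12,
    "Notify incident management team": 35,
}

for action, status in review_actions(expected, observed).items():
    print(f"{status:8} {action}")
```

The resulting status for each action can feed directly into the test results report.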