Business continuity: things to consider

Let’s continue with business continuity. Today I would like to review some points that I consider essential for a successful outcome when implementing a Business Continuity Plan:

1. The scope

2. Senior management support

3. Investment

4. Setting the objective recovery times

With this very brief introduction, let’s get down to business.

1. The scope

The first aspect we must take into account is the scope of our Business Continuity Plan (BCP), which has two dimensions. On the one hand, in a horizontal sense, we must clearly define which services, activities or processes are going to be included in the BCP, particularly if we also intend to certify the management system against the ISO 22301 standard. Although, from the point of view of the BCP’s usefulness, the most logical choice is to include the entire organization (especially in small companies, where the infrastructure is shared across the whole organization), it is also possible to choose a reduced scope that covers relatively independent elements (a local branch or the company’s headquarters, for example), or simply to include the processes that are known beforehand to be the most critical for the organization’s continuity, such as production or logistics in a more industrial company, or the web portal in a purely e-commerce company.

It goes without saying that if our goal is merely commercial (i.e., to obtain the certification seal to display on the corporate website), the scope, like the rest of the plan, matters little (except that we won’t be allowed to certify just any scope we want).

On the other hand, the scope must be defined in a vertical sense: how far do we want to go in defining and developing the plan? Apart from the ICT infrastructure, are we going to develop the communication plan, the staff replacement plans and the physical infrastructure recovery plans? The answer tells us whether we are talking about a Business Continuity Plan or a Disaster Recovery Plan, the latter being a particular case of the former. Note, however, that in practice, when people talk about Business Continuity they most often mean Disaster Recovery, leaving the other plans (and sometimes some disaster scenarios) out, in many cases because of internal organizational issues.

2. Senior management support

The second point to take into account, as in any management system, is securing the support of senior management. This requirement, which sometimes seems a mere formality, is particularly important in the case of a BCP, since the result of the project will not be just a set of documents, but a set of recovery strategies that will probably require spending money.

Although financial investment is also common when implementing other standards such as ISO 27001, in an ISMS some initiatives can more readily be covered by regulatory and procedural controls, an option that is less often available when defining and implementing recovery strategies, especially when they involve backup sites and infrastructure.

In summary, it is critical that senior management, which is usually the body with the authority to approve investments, is involved in, or at least kept informed of, the whole BCP implementation process.

3. Investment

From the previous point, it is easy to infer that investment is an important aspect of any BCP (of any project, in fact): senior management may be fully involved in the project, but the organization’s economic circumstances may not be ideal for undertaking new projects.

In short, the purpose of a BCP is to assess how prepared an organization is to face potential disasters, and to put in place the means needed to close the gaps where it is not as prepared as it should be, often through initiatives that may require investment in technology, outsourcing of services, personnel, regulatory projects, etc. Since these initiatives are likely to require investment, this should be taken into account from the start of the project.

This naturally raises the following question: does it make sense to implement a BCP if the organization is not willing or able to make additional investments where they are needed to meet business requirements? In my opinion, yes, it does, but let’s take it one step at a time.

First, since the analysis stages of a BCP are aimed at “understanding the organization”, the BCP will always help us identify the critical processes whose continuity requirements are not being adequately met, and at least try to implement some low-cost initiatives that minimize the impact until the more expensive ones can be addressed.

Second, estimating tolerable interruption times involves a subjective component in processes that have no associated direct impact (unlike, for example, an OT system, where the economic impact of a shutdown can be calculated by “counting” the units not produced plus the cost of staff, setting aside vaguer elements such as the impact on the brand, potential undersupply to customers, or the inability to meet orders or projects in progress). In many cases, this subjectivity makes it possible to reasonably stretch the maximum time an activity can be interrupted without endangering the continuity of the organization.
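For the OT case above, the “counting” can be written down directly. The following is a minimal sketch, assuming a constant production rate and a flat hourly staff cost; the type `OTDowntimeProfile`, its fields and the function `direct_impact` are hypothetical names chosen for illustration, and the vaguer elements (brand, undersupply, missed orders) are deliberately left out because they cannot be counted this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OTDowntimeProfile:
    """Hypothetical figures describing one production line."""
    units_per_hour: float       # units that are not produced while the line is stopped
    margin_per_unit: float      # assumed value lost per unit not produced
    idle_staff: int             # staff still being paid while the line is down
    staff_cost_per_hour: float  # assumed hourly cost per person


def direct_impact(profile: OTDowntimeProfile, hours_down: float) -> float:
    """Direct economic impact of a shutdown: lost units plus idle staff cost.

    Brand damage, undersupply and missed orders are NOT included;
    they have to be assessed separately and more subjectively.
    """
    lost_units = profile.units_per_hour * hours_down
    staff_cost = profile.idle_staff * profile.staff_cost_per_hour * hours_down
    return lost_units * profile.margin_per_unit + staff_cost


line = OTDowntimeProfile(units_per_hour=120, margin_per_unit=15.0,
                         idle_staff=10, staff_cost_per_hour=25.0)
print(direct_impact(line, 4))  # 480 units * 15.0 + 10 * 25.0 * 4 = 8200.0
```

Processes without a production counter have no equivalent formula, which is exactly where the subjective estimates, and the elasticity discussed next, come in.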

Let no one be alarmed yet. I understand that saying the impacts (and therefore the times) are elastic may lead someone to think there is no red line which, once crossed, puts the continuity of the organization at risk. In fact there is one, but rather than a line it is usually a wide strip: users commonly set recovery times for their activities that are much stricter than necessary, and that have even been exceeded in the past without noticeable consequences (and, of course, without putting continuity at risk).

This elasticity is what gives us some freedom when selecting the initiatives to implement: by accepting a greater impact, within acceptable margins, we can choose the “cheaper” strategies. In other words, there are many shades of grey between daily backups and real-time data replication, and although the latter may be the ideal option for an organization, replication every 24 hours may be enough to guarantee continuity, albeit with a greater impact in the event of a disaster.
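The trade-off between cost and impact can be made explicit as a simple selection rule: among the candidate strategies, pick the cheapest one whose worst-case data loss still fits within what the activity can tolerate (what is usually called the RPO). This is a sketch under clearly stated assumptions: the `Strategy` type, the `cheapest_acceptable` function and all cost figures are hypothetical, and a real selection would weigh more factors than one cost and one data-loss window.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Strategy:
    name: str
    annual_cost: float      # hypothetical cost figure
    data_loss_hours: float  # worst-case data lost if disaster strikes


def cheapest_acceptable(strategies: List[Strategy],
                        tolerable_loss_hours: float) -> Optional[Strategy]:
    """Return the cheapest strategy whose data loss stays within tolerance.

    Returns None when no candidate is good enough, i.e. the gap
    cannot be closed with the options on the table.
    """
    acceptable = [s for s in strategies if s.data_loss_hours <= tolerable_loss_hours]
    if not acceptable:
        return None
    return min(acceptable, key=lambda s: s.annual_cost)


options = [
    Strategy("daily backup", annual_cost=5_000, data_loss_hours=24),
    Strategy("hourly replication", annual_cost=20_000, data_loss_hours=1),
    Strategy("real-time replication", annual_cost=80_000, data_loss_hours=0),
]
print(cheapest_acceptable(options, tolerable_loss_hours=24).name)  # daily backup
print(cheapest_acceptable(options, tolerable_loss_hours=4).name)   # hourly replication
```

Stretching the tolerable loss from 4 to 24 hours is what lets the cheaper daily backup qualify, which is the elasticity described above expressed in numbers.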

Finally, even when an organization does not have the resources to implement an initiative considered essential for business continuity (and therefore has to accept the risk), developing a BCP can help it prepare for a range of incidents and generally improve its readiness and security.

4. Setting recovery times

To end this post, a key question is who should set the RTO (Recovery Time Objective) and the MTPD (Maximum Tolerable Period of Disruption). Keep in mind that these times will ultimately determine which activities are critical, which recovery strategies should be applied and which projects need to be implemented. Therefore, if the critical activities that come out of the Business Impact Analysis do not match the real ones, whether because the wrong stakeholders were chosen, because the times were skewed by the enthusiasm of those in charge, because of the hierarchical weight of certain departments or because of any other internal political issue, we may find ourselves in the middle of a disaster with nobody paying attention to the critical activities until it is too late.
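Since the MTPD and RTO values end up deciding which activities count as critical, the mechanics can be sketched in a few lines. This is only an illustration under stated assumptions: the `Activity` type, the 24-hour cut-off `CRITICAL_MTPD_HOURS` and the function names are hypothetical, and each organization defines its own criticality threshold in its Business Impact Analysis. The sketch also includes the usual sanity check that an activity’s RTO should not exceed its MTPD.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Activity:
    name: str
    rto_hours: float   # Recovery Time Objective
    mtpd_hours: float  # Maximum Tolerable Period of Disruption


CRITICAL_MTPD_HOURS = 24.0  # hypothetical cut-off, set by each organization


def validate(activity: Activity) -> None:
    """An RTO longer than the MTPD would mean recovering after it is too late."""
    if activity.rto_hours > activity.mtpd_hours:
        raise ValueError(f"{activity.name}: RTO exceeds MTPD")


def critical_activities(activities: List[Activity],
                        cutoff: float = CRITICAL_MTPD_HOURS) -> List[Activity]:
    """Return the activities whose MTPD falls within the cut-off, most urgent first."""
    for activity in activities:
        validate(activity)
    critical = [a for a in activities if a.mtpd_hours <= cutoff]
    return sorted(critical, key=lambda a: a.mtpd_hours)


inventory = [
    Activity("web portal", rto_hours=2, mtpd_hours=4),
    Activity("payroll", rto_hours=48, mtpd_hours=120),
    Activity("logistics", rto_hours=8, mtpd_hours=24),
]
print([a.name for a in critical_activities(inventory)])  # ['web portal', 'logistics']
```

Because the output depends entirely on the MTPD values fed in, a single inflated or deflated estimate silently moves an activity in or out of the critical list, which is why who sets those values matters so much.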

As for who should set these times, the values of MTPD and RTO must be established by the person responsible for each process, since they are the ones who know it best. However, if that is all we take into account, it would not be unusual to end up with fifty processes that cannot be stopped for more than an hour. So what needs to be considered when establishing those times?

Help the user to help you

Explain clearly to the user the purpose of the project and what we need from them, and avoid acronyms (RTO, RPO, MTD, etc.) and technical concepts. In the end, all we need is their estimate of the maximum time activity X can be stopped before serious or irreparable consequences occur. What counts as a serious consequence? It can be many things, and that is where the subjectivity lies, but in some organizations, for example, it may be the need to involve management. The question would then be: how long can activity X be stopped before it becomes necessary to call management?

At this point, it is useful to ask about past incidents that might make them reconsider the tolerable times and probably relax them. However, the aim is not to get them to relax their times for its own sake, but to make those times reflect reality. Unrealistic times can divert contingency efforts toward activities that are not critical, neglecting other critical activities that are also affected, and can unnecessarily increase the pressure on work teams.

Cross-check the data you are provided

Put the times provided into perspective and look for objective points of comparison. It is common to find users who, for no reason other than pride, provide recovery times that are completely out of proportion. I have found users who, with printers available in departments just a few meters away, stated that they could not go more than a couple of hours without a printer, even though they did not make truly critical or intensive use of it. It is therefore necessary to review the times users provide and put them into perspective, detecting those that are out of context and studying whether they are justified or whether a second round of questions with that person is needed.

At this point, it helps to have someone who, without being an expert in every area, knows the organization in some depth. It is also useful for users to be aware that shorter times may require greater investment, which appeals to their sense of responsibility.
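One simple way to find the times that “go out of context” is to compare each answer with the answers other departments gave for a comparable activity or resource, and flag anything far from the group median. The sketch below assumes a symmetric tolerance factor of 4 and a plain dictionary of answers; both the function `flag_outliers` and the factor are hypothetical choices for illustration, and a flag only means “ask again”, not “the answer is wrong”.

```python
from statistics import median
from typing import Dict, List


def flag_outliers(stated_hours: Dict[str, float], factor: float = 4.0) -> List[str]:
    """Return respondents whose stated tolerable time is far from the group median.

    stated_hours maps a respondent or department to the maximum time they say
    a comparable activity or resource (e.g. printing) can be unavailable.
    Answers more than `factor` times below or above the median are flagged
    as candidates for a second round of questions.
    """
    reference = median(stated_hours.values())
    return sorted(
        name for name, hours in stated_hours.items()
        if hours * factor < reference or hours > reference * factor
    )


printing = {"finance": 24, "sales": 48, "hr": 24, "legal": 2, "marketing": 24}
print(flag_outliers(printing))  # ['legal']
```

Flagging in both directions matches the later point that the final times may err by being too strict or too lax; the person who knows the organization in depth then decides whether a flagged answer is justified.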

Share the results with senior management

Finally, once the objective recovery times have been gathered, analyzed and established, they must be shared with senior management (or an equivalent executive body). This helps put them in context and homogenize them, make them as independent as possible of subjective factors, and keep them proportional to the criticality of each process within the organization as a whole.

After these three steps, the resulting times, although they may still err in either direction (i.e., objective recovery times that are too strict or too lax), should be fairly close to reality. Subsequent iterations of the BCP and periodic business continuity tests will allow these times to be adjusted to bring them as close as possible to the needs of the business, maintaining a reasonable balance between the resources needed and the resulting impact.
