Internal Audit of Informatics Disaster Recovery Planning

Internal Audit & Accountability Branch
January 2014

Executive Summary

The objective of the audit was to provide assurance that information technology (IT) information and assets are maintained and safeguarded, through an informatics disaster recovery plan (IDRP), to ensure the continued availability of information technology functions at Citizenship and Immigration Canada (CIC).

Why This Is Important

As an integral part of CIC's service continuity and emergency management planning, the IDRP details the necessary steps for recovering IT assets after an unplanned or significant interruption. The objective of this plan is to ensure the continued delivery of departmental IT services that contribute to the health, safety, economic well-being and security of Canadians.

Key Findings

CIC's Business Continuity Planning Program Standard outlines roles and responsibilities for governance over IDRPs. The requirement for annual approval has not been met since 2008. While parts of CIC's National Headquarters (NHQ) and Regional IDRPs have been updated, a comprehensive review should be done to ensure they align to current priorities and reflect changes to the Department as a result of restructuring. Annual procedures exist to prioritize recovery resources and systems but these do not consider third party requirements. NHQ has developed and communicated an IDRP to restore CIC's critical systems which includes contingency planning but does not outline roles and responsibilities of key emergency respondents. Finally, there is no formal process to maintain and support IDRPs across CIC: IDRPs across headquarters and regions are at various stages of revision.

Recovery Point Objectives, which document management's expectations on information loss and form the basis for back up procedures, have not been defined. Coordination of back-up processes with Shared Services Canada (SSC) is informal, and does not require formal reporting to CIC when not completed.

The last testing of the IDRP at NHQ was performed in 2011 as part of a Business Continuity Plan (BCP) table-top simulation. There is no regular testing of IDRPs done in the regions. Results of IDRP monitoring and reporting are not provided to senior management at CIC. Following any IT incident, an Incident Report is prepared to alert IT management, identify the cause of the event, and identify appropriate follow-up actions required; however, there is no formal process to follow up on incident report recommendations.

There have been no incidents during the period covered by the audit that would indicate unsuccessful IDRP recovery attempts.

Conclusion

We have found that CIC's IDRP does not have sufficient maintenance and oversight to ensure the continued availability of information technology functions in a manner consistent with senior management expectations.

Management has accepted the audit findings and developed an action plan to address the audit recommendations.

Statement of Conformance

The conduct of this engagement conforms with the Internal Auditing Standards for the Government of Canada, as supported by the results of the quality assurance and improvement program. We examined sufficient, relevant evidence to support the conclusions reached.

Gibby Armstrong
Chief Audit Executive
Citizenship and Immigration Canada

Background

Information technology plays a key role in CIC's ability to provide client services around the globe.

An up-to-date IDRP ensures that the resources and information necessary to restore IT assets and services in an emergency will be in place when needed. Disaster recovery planning in the Government of Canada is governed by the Policy on Government Security, Operational Security Standard – Business Continuity Planning Program, and the Emergency Management Act.

The Operational Security Standard – Business Continuity Planning requires departments to establish a Business Continuity Planning Program to provide for the continued availability of services. As part of this program, the standard requires departments to create an Information Management (IM) and IT service continuity plan, also known as an IDRP. This Plan should outline system and resource priorities, detail recovery procedures, and identify information, equipment, and personnel requirements, including roles and responsibilities, necessary to restore critical services to the department.

Governance over CIC's Business Continuity Planning Program has been established in a departmental standard approved by the Deputy Minister. Business Continuity Planning – the human, physical and resource side of recovery at CIC – is managed by Corporate Security, while informatics disaster recovery planning – the information management/information technology (IM/IT) component of recovery – is managed by the Solutions and Information Management Branch (SIMB). Business Continuity Planning should incorporate the needs of all third parties, which can include: other government departments that are reliant on CIC to deliver services; other government departments that provide services to CIC; and external suppliers under contract to provide services to CIC.

In the summer of 2011, network, email, and data centre services which were formerly the responsibility of CIC were transferred to SSC. Key positions and processes identified in the IDRP which were formerly the responsibility of CIC employees were also transferred to SSC.

The majority of CIC's IT applications and infrastructure are located in the National Capital Region. NHQ is responsible for the most critical areas of the Department's informatics disaster recovery planning; however, CIC's regional offices also maintain some applications and infrastructure and are responsible for the creation and maintenance of their own IDRPs.

Audit Objective and Scope

The objective of the audit was to provide assurance that IT information and assets are maintained and safeguarded through an IDRP, to ensure continued availability of information technology functions at CIC.

The scope of the audit included the processes and procedures involved in informatics disaster recovery planning for fiscal year 2012-13. The detailed audit criteria used for the engagement are found in Appendix A. The audit covered activities within the Department that contribute to informatics disaster recovery planning, including coordination with departmental Business Continuity Planning and coordination with third party IM/IT service providers.

Detailed Recommendations and Findings

Finding 1: Governance

A comprehensive update and review of IDRPs is required.

We assessed the extent to which governance has been established over informatics disaster recovery planning. We reviewed the processes in place to prioritize critical IT recovery resources and coordinate recovery requirements. We examined how the IDRPs have been developed and communicated, and how contingencies and required activities had been identified. Finally, we examined how IDRPs are maintained and supported for deployment.

A governance structure should ensure plans are appropriate, comprehensive, and up-to-date. Processes should exist to align resources and recovery efforts within the plan to management priorities and critical business requirements. An IDRP should have clear roles and responsibilities. Procedures within the plan should outline key steps in the recovery process to ensure they are completed and executed correctly. There is generally little notice prior to an emergency situation; IDRPs, as part of emergency planning, must be kept up-to-date to be effective.

The governance framework in place is not being followed.

CIC has developed a Business Continuity Planning Program which identifies governance, roles, responsibilities, requirements and accountabilities for business continuity emergency management at CIC. Two main components of CIC's Business Continuity Planning Program are a Business Continuity Plan (BCP) and an IDRP. The bulk of both BCP and IDRP planning occurs at CIC's national headquarters; however, regional offices are required to develop and maintain their own plans.

According to CIC's business continuity program, NHQ's BCP requires annual approval by the Departmental Security Officer and Deputy Minister. NHQ's IDRP is an annex of the BCP but also requires further annual approval by the Chief Information Officer. While parts of the BCP and IDRP are being updated annually, neither has received formal approval since 2008. It should be noted that a comprehensive update of the BCP has begun in fiscal year 2013/14 to reflect changes and major restructuring within the Department. As the IDRP relies on information in the BCP for system recovery priorities, this comprehensive update will have a significant impact on CIC's IDRP as well. Without a current IDRP at NHQ, CIC cannot ensure that IT systems required for service delivery would be restored in an appropriately prioritized manner following an emergency or crisis.

In CIC's regional offices, there is no process in place for the approval of IDRPs. The majority of regional IDRPs are out of date. Most regions do not have an adequate IDRP in place to ensure the timely resumption of IT services to operations in regional centers.

A process exists to prioritize recovery efforts, resources and systems but does not consider third party requirements.

On an annual basis, CIC Departmental Security ensures Critical Function Description Sheets are completed by all Directors General at CIC. These documents are used to identify critical systems, resources and functions needed for CIC's operations and to develop a list of critical operational resources. Through consultation with various stakeholders, including IT, the list is converted within the BCP into a prioritization of systems and operations for recovery in the event of a disaster. This list of prioritized systems and operations is required to align recovery resources and efforts in the IDRP.

The current process for identifying CIC's critical systems, resources, and functions only considers the needs of CIC. Third parties who, if also affected by a disaster, could impact CIC's recovery operations are not considered in emergency planning. Nor are third parties who could be impacted if CIC's systems and operations were unavailable consulted in the process for system and resource prioritization. As the federal government continues its consolidation of services and systems, and develops more common service providers such as SSC, this single-department approach will no longer be effective. The priorities of third parties must be included in CIC's analysis. Likewise, CIC should ensure that its priorities are considered in third party service provider plans.
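To illustrate how third party requirements could change a recovery ranking, the following is a minimal sketch in Python. It is not CIC's actual process: the system names, the hour values, and the rule of taking the shorter of the CIC and third party maximum allowable downtimes are all assumptions made for illustration.

```python
def prioritize(cic_hours, third_party_hours):
    """Rank systems for recovery, shortest allowable downtime first.

    cic_hours and third_party_hours map a system name to a maximum
    allowable downtime in hours. Taking the stricter (smaller) value
    when both parties depend on a system is an assumed rule.
    """
    effective = dict(cic_hours)
    for system, hours in third_party_hours.items():
        effective[system] = min(hours, effective.get(system, hours))
    # Sort by downtime, then by name so that ties are ordered predictably.
    return sorted(effective, key=lambda s: (effective[s], s))


# Hypothetical inputs, not taken from any Critical Function Description Sheet.
cic = {"case-processing": 24, "email": 72, "reporting": 168}
partners = {"reporting": 12, "partner-feed": 48}

print(prioritize(cic, partners))
```

In this sketch the hypothetical "reporting" system moves from last to first once the third party requirement is considered, and "partner-feed" enters the ranking even though CIC alone did not list it.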

CIC has developed and communicated an IDRP which includes contingency planning but not roles and responsibilities.

The IDRP identifies resources and procedures required to restore systems to “near normal” operation in the event of a disaster or disruption. At NHQ there is a semi-annual process to update appendices and distribute the IDRP to emergency recovery team members. Contingency plans, such as back-up power solutions, have also been put in place to support the IDRPs. A template for developing IDRPs has also been designed and communicated to regional centers.

NHQ’s IDRP identifies individuals to contact in the event of a disaster. Roles and responsibilities for carrying out recovery procedures are not documented in the plan. The plan should be designed so that it could be enacted by anyone called to assist in the event of a disaster. If the individuals identified in the plan are unavailable, it is unclear if and how all necessary procedures would be completed. Some individuals identified in the IDRP no longer work at CIC, as their positions and responsibilities were transferred to SSC. There is an additional risk that gaps exist between the roles and responsibilities CIC expects SSC to perform and what SSC expects to provide as these are not documented. Finally, in the absence of clearly documented roles and responsibilities, it is difficult to ensure that the IDRP is communicated to all required parties and that roles and responsibilities are understood.

There is no formal process to support and maintain IDRPs across CIC.

Technical experts are responsible for updating their areas within the IDRP and updates are coordinated by the IT Services Directorate. At NHQ, individual elements within the IDRP are at various stages of update and a comprehensive review of the IDRP, which would include SSC, has not been completed. While NHQ contact lists are updated regularly, the regional contact lists have not been updated since 2011, before regional restructuring took place. Back-up procedures are included in the IDRP; however, they do not reflect recent changes to servers and devices. Consistent maintenance and support is required to ensure that the IDRP remains relevant.

Most regional IDRPs have not been updated to reflect restructuring, revised systems and processes, and third party service requirements. In addition, there are no clear points of contact for the support and maintenance of IDRPs in the regions.

A comprehensive update and review is required for both NHQ and regional IDRPs to ensure they contain all required, up-to-date information.

Recommendation 1 (Medium Risk):

CIC should formally approve its IDRPs on an annual basis as part of the Business Continuity Planning Program.

Recommendation 2 (Medium Risk):

CIC should consider third party requirements in developing systems and recovery priorities, and request that third parties formally acknowledge their roles and responsibilities related to CIC’s disaster recovery planning.

Recommendation 3 (Medium Risk):

CIC should conduct comprehensive updates and reviews of its IDRPs annually; and clearly define the recovery roles and responsibilities within them.

Finding 2: Controls

Informal processes exist to coordinate recovery efforts.

We examined whether processes were in place to ensure that informatics recovery meets agreed upon management objectives. We expected to find that back-up and recovery processes were clearly linked to management established recovery objectives. We also verified that back-up processes were effective and timely.

Management should establish clear objectives for system recovery and controls should be in place to meet those objectives. Data recovery point objectives will determine the maximum amount of data or information which management is prepared to lose from the time an incident occurs. System back-ups create copies of information that can be used to restore data losses in the case of an incident or interruption.

Recovery Point Objectives have not been defined.

Informatics recovery should be designed to meet agreed upon objectives. Objectives of informatics recovery include the amount of data CIC is willing to lose in the event of a disaster. For example, if a recovery point objective of 48 hours was set, this would mean management was willing to lose up to two days' worth of data from the time of an incident. Appropriate system back-up processes should then be established to meet these recovery point objectives.
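The relationship between a recovery point objective and the timing of back-ups can be shown with a short Python sketch. The function name and the dates are hypothetical and chosen only to mirror the 48-hour example; they do not describe any CIC system or incident.

```python
from datetime import datetime, timedelta


def meets_rpo(last_backup: datetime, incident: datetime, rpo: timedelta) -> bool:
    """Return True if the data lost at the time of the incident is within the RPO.

    Data created after the last successful back-up is assumed lost, so the
    exposure is the time elapsed between that back-up and the incident.
    """
    return incident - last_backup <= rpo


rpo = timedelta(hours=48)                # the 48-hour objective from the example
incident = datetime(2014, 1, 10, 9, 0)   # hypothetical incident time

# Back-up taken 32 hours before the incident: within the objective.
print(meets_rpo(datetime(2014, 1, 9, 1, 0), incident, rpo))

# Back-up taken 80 hours before the incident: the objective is missed.
print(meets_rpo(datetime(2014, 1, 7, 1, 0), incident, rpo))
```

A back-up schedule meets the objective only if the interval between successful back-ups never exceeds the recovery point objective, which is why a failed back-up matters even when the next one succeeds.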

There are no recovery point objectives for CIC’s IT applications. Data recovery point objectives for each business line should be identified and linked to system back-up processes. To date, the identification of data recovery objectives has not been part of CIC’s system development or business continuity planning process. It is critical to document management’s recovery expectations to minimize critical information loss when services are resumed. It is equally important to consider the objectives of third party system users and providers who keep information on CIC’s systems and could potentially lose critical information in the event of a service disruption. Furthermore, recovery point objectives should be documented and agreed upon with third party service providers to ensure that system back-up processes are appropriate.

While service level agreements exist between CIC and most of its third-party service providers, CIC’s recovery point objectives are not documented in these service agreements. CIC must coordinate its recovery point objectives with its partners to ensure that CIC’s business requirements are considered in back-up processes and that critical data will not be lost in an outage. Although outside the scope of this audit, the governance, risk management and controls over service agreements have been noted as systemic issues which will be examined within the scope of a future audit.

Coordination of back-up processes with SSC is informal.

Since 2011, SSC has performed regular full and incremental IT system back-ups for CIC. Audit testing revealed that one of the two monthly back-ups tested was not stored off-site as required. There is no formal reporting to CIC when back-ups are not completed successfully. If system back-ups are not completed, critical data and documents could be lost or unrecoverable following a disaster.
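A formal notification process could be as simple as a daily exception list. The Python sketch below is one possible shape for such a check, not a description of SSC's tooling; the record fields and system names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BackupRun:
    """One back-up run as it might appear in a daily status feed (hypothetical)."""
    system: str
    completed: bool
    stored_offsite: bool


def backups_needing_notification(runs):
    """Return the runs that failed or were not stored off-site as required."""
    return [r for r in runs if not (r.completed and r.stored_offsite)]


# Hypothetical records, not actual audit test results.
runs = [
    BackupRun("system-a", completed=True, stored_offsite=True),
    BackupRun("system-b", completed=True, stored_offsite=False),
    BackupRun("system-c", completed=False, stored_offsite=False),
]

for run in backups_needing_notification(runs):
    print(f"NOTIFY: {run.system} completed={run.completed} offsite={run.stored_offsite}")
```

Treating a completed back-up that was not moved off-site as an exception reflects the audit test result above, where the back-up existed but was not stored as required.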

Recommendation 4 (High Risk):

CIC should determine and document its system recovery point objectives, including those impacted by its third-party system providers and users.

Recommendation 5 (Low Risk):

CIC should request that Shared Services Canada provide formal notification to the Department when systems back-ups have not been successfully completed.

Finding 3: Monitoring and Reporting

Regular testing of the IDRP is not performed.

We expected that testing of the IDRP, including an assessment of its continued relevance, was regularly performed and documented. We assessed whether timely updates were made to the IDRP following testing or deployment to resolve any noted issues. Finally, we expected to see that results of IDRP monitoring were reported to senior management.

Regular testing of the IDRP is required to ensure it can be successfully implemented in case of an incident or emergency. A process should be in place for post-incident review and update of the IDRP to ensure that lessons learned from any testing or required activation of the IDRP are incorporated back into the plan. Testing results should be reported to senior management to ensure that procedures requiring follow-up and any follow-up actions are appropriately addressed. Senior management must be informed of on-going monitoring activities to ensure that the appropriate plans, procedures and resources for recovery are in place.

There was no testing of the IDRP done in 2012 at NHQ or in the regions.

Annual testing of the IDRP is required in CIC’s Business Continuity Program Standard; however, tests are not regularly performed to ascertain that recovery procedures and back-up systems are capable of restoring business requirements. The last testing of the IDRP at NHQ was performed in 2011 as part of a BCP table-top simulation. There is also no regular testing of IDRPs done in the regions. While the BCP exercise did have IT elements, its focus was on the human and physical resources required for recovery and not technical IT recovery requirements. In addition, there is currently no requirement to inform management whether tests have been performed. Without regular testing of the IDRP, CIC cannot ensure that the IDRP is complete and the procedures for service resumption will be effective.

It should also be noted that all testing of the IDRP both at NHQ and in the regions must be coordinated with SSC, once roles and responsibilities of SSC in the IDRP are made clear.
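The annual testing requirement lends itself to a simple tracking check that could feed reporting to management. The Python sketch below assumes a 365-day threshold and uses hypothetical plan names and dates; it is an illustration only and not part of the Business Continuity Program Standard.

```python
from datetime import date
from typing import Optional


def is_test_overdue(last_test: Optional[date], today: date, max_age_days: int = 365) -> bool:
    """Return True if a plan has never been tested or its last test is too old.

    The 365-day default is an assumed reading of "annual testing".
    """
    if last_test is None:
        return True
    return (today - last_test).days > max_age_days


# Hypothetical plans and test dates, for illustration only.
plans = {
    "plan-a": date(2013, 6, 1),
    "plan-b": date(2011, 11, 1),
    "plan-c": None,  # never tested
}
today = date(2014, 1, 15)

overdue = [name for name, last in plans.items() if is_test_overdue(last, today)]
print(overdue)
```

A list like this, produced on a regular schedule, would give management a direct view of which plans have not been tested within the required period.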

After-action reports are created following IT incidents.

Given that no IDRP testing was conducted in 2012, we were unable to assess whether an effective process exists to update plans. Nevertheless, a regular process is in place to report IT incidents regardless of whether or not the IDRP was activated. Following an IT incident, an Incident Report is prepared to alert SIMB management, identify the cause of the event, and identify appropriate follow-up actions required. There is, however, no formal process to follow up on recommendations from IT after-action reports.

There is no regular monitoring of or reporting on IDRP testing to senior management.

Monitoring and reporting on the IDRP to senior management was not completed in 2012. There is no formal reporting to senior management on IDRP testing and follow-up at CIC. As there is no reporting to senior management regarding the results of testing of the IDRP, there is no process to ensure that appropriate resources, controls and redundancies have been put in place to safeguard CIC’s information and system continuity.

Recommendation 6 (Medium Risk):

CIC should ensure annual testing of its IDRPs is coordinated with Shared Services Canada and is completed in accordance with the Business Continuity Planning Program Standard.

Recommendation 7 (Low Risk):

CIC should ensure that testing results and follow-up actions are reported to the CIC Executive Committee and are used to update IDRPs.

Appendix A – Detailed Audit Criteria

The objective was to provide assurance that IT information and assets are maintained and safeguarded, through an IDRP, to ensure the continued availability of information technology functions at CIC.

Lines of Enquiry and Audit Criteria

Governance

Informatics disaster recovery planning is aligned to CIC priorities and roles and responsibilities are clear.

1- IDRP program governance has been established.
2- A process exists to prioritize critical IT resources to recover and coordinate recovery requirements.
3- An IDRP has been developed and communicated, and contingencies and required activities are clear.
4- The IDRP program is maintained and supported for any required deployment.

Controls

Controls are in place to return IT systems to a minimum acceptable level of service within an agreed upon timeframe.

5- Processes are in place to ensure that informatics recovery is designed to meet agreed-upon objectives.
6- Back-up processes are effective and timely.

Monitoring and Reporting

Monitoring and reporting processes are in place and ensure that recovery procedures will be effective.

7- Testing of the IDRP and its continued relevance is performed and documented.
8- Post-resumption processes result in timely updates to the IDRP.
9- Results of IDRP monitoring are reported to senior management.

Appendix B – Management Action Plan

Each recommendation is listed with its risk ranking, management response and action plan, responsible group, and target date.

1. CIC should formally approve its IDRPs on an annual basis as part of the Business Continuity Planning Program.

Medium

Management Response:

Since 1999, the Director, Information Technology Services has reviewed and approved the IDRP. Formal sign off was not documented. The IDRP is a living document and changes are listed in the Change Control Log. Updated versions are circulated to the Technology Recovery Team. Lack of formal approval has not prevented the IDRP from being successfully invoked when required.

Action Plan:

CIC will obtain CIO sign off on the current IDRP and obtain CIO sign off on an annual basis.

  • CIC updated the plan to identify CIC activities and SSC activities for the current IDRP as a result of the transfer of responsibilities to SSC.  (July 2013)
  • The updated IDRP was reviewed and approved by the Director, IT Infrastructure Services and SSC Departmental Lead, SSC (August 7, 2013)
  • CIO sign-off obtained.  (August 9, 2013)
  • Departmental Security Officer to document the BCP cycle and where the IDRP fits into the cycle. (Q3 2013/14)
  • In Q2 of every year, the IDRP is to be signed off by the CIO and included in the BCP as an annex for sign off by DSO and DM. (Q2 2014/15)

The effectiveness of the IDRP is tested in recommendation number 7.

Responsibility: SIMB (with DSO)
Target Date: Q2 2014/15

2. CIC should consider third party requirements in developing systems and recovery priorities, and request that third parties formally acknowledge their roles and responsibilities related to CIC's informatics disaster recovery planning.

Medium

Management Response:

CIC has MOUs, SLAs, business arrangements and contracts with third parties that describe roles and responsibilities, but they are not specifically related to the coordination of disaster recovery activities.
Currently, CIC is fully compliant with the requirements of its contract with the Warehouse facility, which addresses the facility's roles and responsibilities related to disaster recovery activities and is included in the IDRP.

Action Plan:

  • CIC's BCP coordinator will contact the service recipients to obtain system and recovery priorities and complete critical function description sheets for each. This information will be used to develop systems and recovery priorities. (Q1 2014/15)
  • CIC (SIMB and ASA) will meet with SSC and obtain formal acknowledgement for their roles and responsibilities as a service provider as they relate to CIC's disaster recovery plan.  (Q1 2014/15)
  • SIMB will request that CIC be included in SSC's disaster recovery plan. CIC will provide requirements to SSC for input into the SSC IDRP. (Q3 2013/14, going forward this will be done in Q2)
  • SIMB will determine the appropriate governance committee and update the Terms of Reference (TOR) to ensure that review of IDRP continues to occur on an annual basis. (Q3 2013/14).

Responsibility: SIMB (with ASA)
Target Date: Q1 2014/15

3. CIC should conduct comprehensive updates and reviews of its IDRPs annually; and clearly define the recovery roles and responsibilities within them.

Medium

Management Response:

CIC / SIMB conducts comprehensive reviews of the NHQ IDRP on an annual basis. A Change Control Log acts as the formal record of updates and changes to the plan, which often occur several times throughout the year. A comprehensive review of the NHQ IDRP has occurred every year since 1999 with the exception of 2012, due to the impact of DRAP, the SIMB reorganization and the transfer of responsibilities to SSC. The last comprehensive review of the NHQ IDRP was conducted in July 2013. Up until the SIMB reorganization and the transfer of responsibilities to SSC, regions were responsible for their own IDRPs. The resources responsible for maintaining regional IDRPs were transferred to SSC and, as a result, the IDRP will now be centralized and include a regional component.

Action Plan:

  • Going forward, CIC will conduct a comprehensive review on an annual basis starting in Q1 2014/15 and will include regional IDRPs in the review.

Management Response:

The members of the Technology Recovery Team (TRT) carry out the activities in the IDRP. This includes CIC and SSC resources. The TRT has been in existence since 1999 and its members are aware of their responsibilities during a disaster. The roles and responsibilities of the TRT were included in earlier versions of the IDRP; however, they were eventually removed in an effort to keep the plan as short and concise as possible. Updated copies of the IDRP are provided to all TRT members as changes occur, and a laminated card containing pertinent emergency phone numbers for the team is distributed to members, who carry it at all times. In years when the IDRP was not activated by a real-life event, or if major updates were made to the document, a staged event was carried out to ensure TRT members were familiar with their IDRP responsibilities. The current process includes a post-incident assessment and follow-up activities after the event.

Examples of real-life events requiring activation of the IDRP:

  • Haiti Earthquake January 12, 2010
  • Damascus, Syria Evacuation December 15, 2011
  • CIC Network Outage March 29, 2013
  • Alberta Floods June 20, 2013

Action Plan:

  • A new appendix detailing TRT recovery roles and responsibilities will be added to the IDRP. (Q3 2013/14)
  • CIC will continue to make periodic updates as required. (On-going)
  • SIMB will determine the appropriate governance committee and update the Terms of Reference (TOR) to ensure that review of IDRP continues to occur on an annual basis. (Q3 2013/14)

Responsibility: SIMB (ITSD)
Target Date: Q1 2014/15

4. CIC should determine and document its system recovery point objectives, including those impacted by its third-party system providers and users.

High

Management Response:

Criticality rankings for all CIC systems are in place. They are based on the maximum allowable downtime and have been defined, documented and confirmed. This is part of the Critical Function Description Sheet exercise performed annually by ASA.
The Recovery Point Objective for a system is the chronological point, beyond which the loss of vital records, works-in-progress, and / or IT application data may risk the viability of ongoing business operations, or at least result in a material loss to the department.  These have not yet been defined.

Action Plan:

  • As part of the BCP, CIC will define and include system recovery point objectives in the annual Critical Function Description Sheet exercise in the IDRP to identify this threshold.  This includes CIC service recipients. (Q4 2013/14 and on-going)
  • CIC will ensure that service providers are made aware of CIC system Recovery Point Objectives and ensure they are incorporated in their plans. (Q1 2014/15)
  • CIC will ensure that SSC's back-up processes meet Recovery Point Objectives indicated in CIC's BCP. (Q1 2014/15)
  • CIC will present a progress report on system recovery point objectives to senior management. (Q1 2014/15)

Responsibility: SIMB (with ASA)
Target Date: Q1 2014/15

5. CIC should request that Shared Services Canada provide formal notification to the Department when systems back-ups have not been successfully completed.

Low

Management Response:

SSC has confirmed that when a back-up is unsuccessful, it continues to run until a successful back-up is achieved.
Although CIC is currently notified when back-ups do not complete successfully, a formal process has not been put in place to provide the notification.

Action Plan:

  • CIC (SIMB Partnerships) has requested that SSC provide formal notification on a daily basis when back-ups are unsuccessful. (Completed Q2 2013/14)

Responsibility: SIMB (Partnerships)
Target Date: Complete

6. CIC should ensure annual testing of its IDRPs is coordinated with Shared Services Canada and is completed in accordance with the Business Continuity Planning Program Standard.

Medium

Management Response:

The IDRP is tested annually, with the exception of 2012 due to the impact of DRAP, the SIMB reorganization and the transfer of responsibilities to SSC. Although the IDRP was not formally tested in 2012, CIC successfully recovered from IT failures in 2012 and 2013 using the current IDRP (e.g., the June 2013 Calgary floods).

Action Plan:

  • CIC will coordinate annual testing of its IDRPs with SSC in Q3 2013/14 in accordance with the Business Continuity Planning Program Standard.
  • SIMB will determine the appropriate governance committee and update the Terms of Reference (TOR) to ensure that testing of IDRP continues to occur on an annual basis. (Q3 2013/14)

Responsibility: SIMB (ITSD)
Target Date: Starting Q3 2013/14

7. CIC should ensure that testing results and follow-up actions are reported to the CIC Executive Committee and are used to update IDRPs.

Low

Management Response:

Previously, the testing results and follow up actions were reported to the BCP Coordinator.

Action Plan:

  • CIC will report IDRP test results and follow-up actions to EXCOM annually in Q4 as part of the BCP reporting process.
  • CIC will use these results to update the IDRPs. (Q4 2013/14)
  • CIC will inform SSC of the results and initiate joint review and resolution. (Q4 2013/14)

Responsibility: SIMB (ITSD and Partnerships)
Target Date: Q4 2013/14

Appendix C – Links to Applicable Legislation, Policies, Standards and Guidance
