Business Intelligence Research and Development Environment
Privacy Impact Assessment (PIA) summary - Intelligence, Statistics and Data Directorate, Strategy and Integration Branch
Overview & PIA Initiation
Canada Revenue Agency
Government official responsible for the PIA
Assistant Commissioner, Strategy and Integration Branch
Head of the government institution or Delegate for section 10 of the Privacy Act
Name of program or activity of the government institution
Management and Oversight Services
Description of the class of record and personal information bank
All of CRA’s institution-specific PIBs (not including those under internal services).
As part of its 2016 Info Source review process, CRA will ensure that PIBs at the source level reflect the fact that personal information may also be used for data matching, program evaluation (quality assurance and data integrity), strategy development, reporting, and other statistical analysis purposes.
Legal authority for program or activity
- Section 241(4)(d)(ix) of the Income Tax Act
- Section 295(5)(d)(v) of the Excise Tax Act
- Section 211(6)(e)(v) of the Excise Act
Summary of the project / initiative / change
Research and Development (R&D) activities currently taking place within CRA are largely decentralized. At present, the principal way for researchers and analysts to obtain data is to request view access to the ADW and various data marts. This process is managed through the BIDS section of ITB. BIDS develops and maintains the acquisition of data for the R&D environment and supports the tools that clients use to manipulate the extracted data (e.g. eBCI, SAS, SPSS Modeler).
Given that the R&D activities taking place within CRA are largely decentralized, the Agency has identified the necessity to:
- Better deal with needs for data that cross program and organizational boundaries;
- Provide leadership in new types of data usage, and new data directions (e.g. open government); and,
- Provide a coordinated Agency-wide approach to acquiring, using, sharing, managing and publishing data.
To facilitate this, the Agency proposed a coordinated Agency-wide approach with the implementation of a Business Intelligence (BI) appliance intended to support this R&D by providing a central Agency-wide infrastructure for using, sharing and managing data.
Data access approvals to use, share and manage the data will be governed through processes that will be identified in the detailed planning phase and implemented in the execution phase of the BI Renewal project. There will be no R&D on the BI Appliance until the established processes have been implemented. As the BI Renewal project progresses, or as initiatives that focus on R&D are refined, this PIA will be reviewed and updated accordingly, and will support consultations with the OPC of Canada and TBS.
Risk identification and categorization
A) Type of program or activity
Program or activity that does NOT involve a decision about an identifiable individual
Level of risk to privacy: 1
Details: While the Agency uses information to undertake administrative actions on individuals under other programs and activities, personal information accessed and used within the R&D environment does not involve a decision about an identifiable individual. The Agency has been using data for many years to gain insight into its programs and activities, make strategic changes and take appropriate actions. The information derived from the data also supports the fairness and integrity of Canada’s tax system through compliance. To be able to detect, deter and correct the behaviours of taxpayers who are non-compliant, the CRA needs to learn and be proactive about taxpayer behaviours. The R&D environment allows specific employees to undertake the analytics activities to explore and uncover these insights and data relationships. Core R&D activities may include, but are not limited to:
- Research – the Agency Research Plan includes initiatives from multiple branches and can be branch specific or strategic in nature. Research projects are of quantitative or qualitative design and are used to gain a better understanding of taxpayer issues and to design or improve agency and program strategies.
- Analytics – includes descriptive as well as more advanced analytic techniques such as predictive, simulation and optimization. Analytics may be done independently or as part of a research program.
- Data Mining – includes the use of machine learning algorithms to sift through data to discover meaningful correlations, patterns and trends; and
- Data Access and Understanding – these are necessary to initiate and conduct research, analytics and data mining studies.
B) Type of personal information involved and context
Social Insurance Number, medical, financial or other sensitive personal information and/or the context surrounding the personal information is sensitive. Personal information of minors or incompetent individuals or involving a representative acting on behalf of the individual.
Level of risk to privacy: 3
Details: The data used within the R&D environment draws from the data present within the ADW as well as other internal source systems and data marts. Personal information may include sensitive information such as the SIN and financial information of individuals and businesses. According to the Agency's Classification Policy (Chapter 5 of the Finance and Administration Manual, Security Volume "Identifying Protected and Classified Information Assets") this data has been classified at the Protected B level. Agency staff within the R&D environment must adhere to CRA's stringent policies and procedures surrounding privacy and confidentiality, security, and information management.
The data to be housed within the R&D environment on a BI Appliance will consist of copies of the ADW as well as source systems and data marts. The personal information extracted from these sources will be subject to the same considerations as are currently applied to the ADW and source systems/data marts.
C) Program or activity partners and private sector involvement
Within the institution (amongst one or more programs within the same institution)
Level of risk to privacy: 1
Details: The R&D environment includes only those research, analytics, and development activities being performed within the Agency. Only a small number of researchers and analysts from multiple CRA headquarters branches and at least one regional office are involved in R&D activities within the Agency.
The R&D environment will have the potential to contain any and all data collected by CRA taxpayer programs, which may include data from external sources. The acquisition of this external data is governed by information sharing agreements and/or contracts between the respective taxpayer programs and the external data sources.
D) Duration of the program or activity
Level of risk to privacy: 3
Details: The R&D activities within CRA are ongoing, existing activities.
E) Program population
Level of risk to privacy: N/A
Details: N/A – R&D activities within the CRA do not affect individuals for an administrative purpose. However, the program population includes most of the Canadian population (i.e. taxpayers). Business intelligence activities conducted for administrative purposes (e.g. workload selection) will be subject to separate PIAs.
F) Technology & privacy
Does the new or modified program or activity involve the implementation of a new electronic system, software or application program including collaborative software (or groupware) that is implemented to support the program or activity in terms of the creation, collection or handling of personal information?
Risk to privacy: Yes
Does the new or modified program or activity require any modifications to IT legacy systems and/or services?
Risk to privacy: No
The new or modified program or activity involves the implementation of one or more of the following technologies:
Enhanced identification methods - this includes biometric technology (e.g. facial recognition, gait analysis, iris scan, fingerprint analysis, voice print, radio frequency identification (RFID)) as well as easy pass technology, new identification cards including magnetic stripe cards, "smart cards" (i.e. identification cards that are embedded with either an antenna or a contact pad that is connected to a microprocessor and a memory chip or only a memory chip with non-programmable logic).
Risk to privacy: No
Use of Surveillance - this includes surveillance technologies such as audio/video recording devices, thermal imaging, recognition devices, RFID, surreptitious surveillance / interception, computer aided monitoring including audit trails, satellite surveillance, etc.
Risk to privacy: No
Details: The program does not involve the use of surveillance on taxpayers.
However, as part of the CRA security program, CRA employees who will have access to personal information will be monitored through OATS. OATS records information such as the user logon ID, the date and time of logon and logout, the user location, the terminal identity, and the name and ID of client records accessed, including edits or changes made during each user session.
The information is used to verify that only authorized users have accessed personal information and to ensure that access can be linked to specific individuals to support the investigation of suspected or alleged misuse.
Every time CRA employees log in on their computers, a notice pops up requiring employees to acknowledge that they are aware that all access to CRA networks is monitored and that access is on a need-to-know basis. This information is already described in the standard personal information bank Electronic Network Monitoring Logs PSU 905.
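As an illustration only, the kind of audit-trail review described above could be sketched as follows. The record fields mirror the items listed for OATS, but the class name, field names, and the authorization table are hypothetical and are not drawn from the actual OATS schema or implementation.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AccessRecord:
    """One audit-trail entry; fields are hypothetical stand-ins for those listed above."""
    user_id: str
    logon_time: datetime
    location: str
    terminal_id: str
    client_record_id: str
    action: str  # e.g. "view" or "edit"


def find_unauthorized(records, authorized):
    """Return the records whose user is not authorized for the client record accessed.

    `authorized` is an assumed structure: it maps a user ID to the set of
    client record IDs that user may access on a need-to-know basis.
    """
    return [
        r for r in records
        if r.client_record_id not in authorized.get(r.user_id, set())
    ]
```

Because each flagged record carries a user ID, terminal and timestamp, it can be linked to a specific individual for follow-up, which is the purpose stated above.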
Use of automated personal information analysis, personal information matching and knowledge discovery techniques - for the purposes of the Directive on PIA, government institutions are to identify those activities that involve the use of automated technology to analyze, create, compare, identify or extract personal information elements. Such activities would include personal information matching, record linkage, personal information mining, personal information comparison, knowledge discovery, information filtering or analysis. Such activities involve some form of artificial intelligence and/or machine learning to uncover knowledge (intelligence), trends/patterns or to predict behavior.
Risk to privacy: Yes
Details: Automated personal information analysis, personal information matching and knowledge discovery techniques are used by the Agency. The R&D environment employs statistical analysis software (SAS), data mining software (IBM SPSS Modeler), and other advanced BI software tools to explore and visualize data for research, analytics and reporting.
G) Personal information transmission
The personal information is used in a system that has connections to at least one other system.
Level of risk to privacy: 2
Details: Workspace set up for R&D users will:
- Be configured with an exploratory research area with the ability for users to do the following:
  - access all data (for which the user has been approved to use) loaded on a BI appliance
  - manipulate loaded files from source systems, ADW and data marts, and save results to be reused
  - recombine and create new datasets, and save new or combined datasets through an iterative analytical process
  - perform statistical tests
  - create and save outputs of analytical processes and research
  - export or retrieve the results from the appliance to other environments (e.g. eBCI and local networks)
- Be carefully segmented into business areas which will only be accessible to individuals within those business lines with a need to know.
- Include tools for the business user:
  - Use both server and desktop versions of BI tools (IBM SPSS Modeler and SAS) to perform work-related functionality (beyond queries), ensuring that processing happens on the appliance. In the future, the use of desktop versions is likely to be reduced or even eliminated.
- Incorporate data that was obtained through agreement by the program areas from external sources.
- Incorporate self-service data loads:
  - A mechanism will be defined to provide users with a governed ability to load data into their dedicated workspace only. ITB is the only Branch able to load data onto the appliance for general consumption.
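The access rules in the list above can be sketched as simple checks. This is a minimal illustration under stated assumptions: the `User` structure, the function names, and the `"general"` / `"workspace:<user_id>"` target labels are all hypothetical, since the actual governance mechanism is still to be defined.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    """Hypothetical R&D user profile."""
    user_id: str
    branch: str  # e.g. "ITB"
    business_areas: set = field(default_factory=set)     # need-to-know segments
    approved_datasets: set = field(default_factory=set)  # data approved for use


def can_read(user, dataset, business_area):
    """A user may read a dataset only inside a business area they belong to,
    and only if they have been approved to use that dataset."""
    return business_area in user.business_areas and dataset in user.approved_datasets


def can_load(user, target):
    """Self-service loads go to the user's own workspace only;
    loads for general consumption are reserved for ITB."""
    if target == "general":
        return user.branch == "ITB"
    return target == f"workspace:{user.user_id}"
```

For example, an analyst outside ITB would pass `can_load` for their own workspace but not for `"general"` or for a colleague's workspace.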
H) Risk impact to the individual or employee
Details: The sensitivity of information utilized through the R&D environment is considered high (Protected B). Unauthorized use or disclosure of this information could result in loss of privacy, severe personal financial injury and/or embarrassment to the taxpayer.
I) Risk impact to the institution
Details: In the event of a privacy breach, the Agency could suffer damage to its reputation, which in turn could potentially attract negative public interest or criticism. The Agency could also be subject to civil litigation and liability for privacy breaches that result in harm to an individual.