Scientific Advisory Committee on Digital Health Technologies, May 9, 2019, summary of proceedings

Attendance

Committee members

Core members: Joseph Cafazzo (Chair), Aviv Gladman, Kim Hanson, Kendall Ho, Trevor Jamieson, Chris Kamel

Ad hoc members: Syed Sibte Raza Abidi, Jaron Chong, Colleen Flood, Anna Goldenberg, Frank Rudzicz

Regrets: Kumanan Wilson

Presenters

Stakeholder representatives: Alan Coley, John Weigelt, Dana O'Born, Sachin Aggarwal (by teleconference), Peter Kendall, Jo Kennelly, Alex Bilbily

Health Canada: Tyler Dumouchel, Patrick Fandja, Ursula Polack

Observers

Health Canada: Patrick Assouad, David Boudreau, Sarah Chandler, Michelle El-Hage, Ian Glasgow, Reeham Hammouda, Janet Hendry, Shawn Hopewell, Andy Hua, Gregory Jackson, Marc Lamoureux, Matt LeBrun, Justin Peterson, Anu Shukla-Jones, Daniel Yoon

Other: Jeremy Petch (Hamilton Health Sciences), Francis Thibault (National Research Council)

Welcome

Marc Lamoureux, Manager, Digital Health Division, Medical Devices Bureau, welcomed the scientific advisory committee members. He thanked members for their time and for providing advice to Health Canada.

Chair's remarks

Dr. Joseph Cafazzo, Chair, thanked members for participating in the meeting and confirmed acceptance of the agenda. Committee members, presenters and observers introduced themselves and declared any updates to their affiliations and interests (Frank Rudzicz declared affiliations and interests in 2 companies). There were none that restricted SAC-DHT members from participating.

Presentations

There were 4 presentations:

1. Alan Coley, MEDEC, and John Weigelt, Microsoft Canada:

how artificial intelligence (AI) can be used in health care to help create better health outcomes

2. Dana O'Born, Sachin Aggarwal, Peter Kendall, Jo Kennelly, Council of Canadian Innovators:

how their respective companies are using AI
their perspective on how Health Canada should regulate AI

3. Alex Bilbily, 16 Bit:

their experience in developing machine learning algorithms for medical imaging
challenges they have encountered and proposed solutions

4. Tyler Dumouchel, Therapeutic Products Directorate, Patrick Fandja, Marketed Health Products Directorate, and Ursula Polack, Regulatory Operations and Enforcement Branch:

overview of the Canadian regulatory framework for medical devices, which involves:
pre-market licensing
post-market surveillance
compliance and enforcement

Discussion

AI training data

Science question 1:
"A medical device that is licensed by Health Canada allows for distribution to all Canadians, and thus must reflect the entire Canadian population. Therefore, the AI algorithm must be trained, validated and tested using data that are representative of the Canadian population for its specific intended use. Additionally, data are generally cleaned and curated before being used for training, validation and testing.

What are the key factors to consider when ensuring that appropriate data was used for training, validation and testing to support the intended use of the device?
What metrics could be relied upon to ensure that the training data have been correctly characterized, potential biases have been minimized, and the dataset is representative of the intended population in Canada?
What are acceptable approaches that manufacturers can follow to demonstrate adequate data curation?
Is the use of simulated data, poor quality data, or data imputation acceptable in any scenario, or are there fundamental reasons to avoid using these types of data?"

The committee discussed that it can be difficult to create methods that work for the entire Canadian population, and therefore it can be hard for an algorithm to represent the population.

However, it's important for manufacturers to make it clear what data was used to develop the method and in which populations the algorithm is expected to perform well, and to state the contraindications or exclusions. It's also important for manufacturers to put in safeguards to prevent the algorithm from acting on data it has never been trained on, where a reliable prediction or classification cannot be made.

The committee discussed that training sets, parameters and demographics vary widely across different types of AI systems. At a minimum, the committee suggested that Health Canada should request class distribution statistics with regression tests, distribution of outcome variables and post-hoc power analyses. This data will help Health Canada evaluate if the size of the dataset is appropriate and understand how the algorithm was developed. The committee also advocated for testing that simulates the actual use of the device.

When it comes to a population involving different groups, the committee encouraged proportional representation of each group in the training data and in the validation dataset. The committee also stressed the importance of ensuring no data leakage. In applied research, it is commonly known that if a patient exists in both training and testing arms, the effect or diagnostic accuracy may be overestimated. Best practice is to have separate people in both arms. Some editorial boards will also not accept the validity of an algorithm's performance unless an external set is used.

Continual post-market surveillance is important to identify unanticipated local changes that contribute to an algorithm's sudden change in performance. Unlike drugs that are on the market for years, things can change locally (for example, new protocol, new devices or scanners). This can cause a system that was functioning well to change over time.

The committee also discussed the challenge of obtaining clean data, especially in cases where the algorithm relies on patient users to input the data that the system uses. Errors are likely. Rather than using specific parameters to which AI should be programmed, Health Canada should consider regulating the process that manufacturers use to source, describe, train, test and validate data pre- and post-market. This may already be somewhat in place due to the requirement for manufacturers to have quality management systems and conduct risk analysis.

The committee discussed the risks and opportunities associated with simulated data.

Some committee members suggested that this is inappropriate for AI, as it strips AI's ability to identify nonlinear interactions between features, which defeats the purpose of AI. Simulated data also does not account for the huge variability that occurs in real life. Other members felt that simulated data can be beneficial in the case of a class imbalance problem, to prevent the device from falling into majority class classification.

At a minimum, manufacturers should indicate if they use simulated data, provide the provenance of all data and disclose if they have cleaned the data.

In summary, Health Canada should consider requiring that manufacturers document the following:

population on which the algorithm was trained
- For example, the demographics table 1 in most trials describes the characteristics of the study population. Based on this table, a clinician is able to make a decision about whether the algorithm applies to their patient.
data provenance (where the data came from)
reproducibility of the performance
specific protocols for validation and testing

Verification and validation of AI algorithms

Science question 2:
"Verifying and validating that the output of an AI algorithm is correct and appropriate is essential to guarantee that the algorithm reliably and predictably generates the intended result for the intended population.

What are the recommended methodologies and approaches to verify and validate that an AI algorithm generates correct and predictable results that reflect the intended use or expected outcome of the algorithm?
What performance metrics could be used to verify the accuracy of the outcome of a trained AI algorithm?
How can it be verified that manufacturers have chosen the correct algorithm or approach for addressing their clinical requirements?
How can AI algorithms be assured to be generalizable and transferable?"

The committee suggested that it would be very difficult to develop a list of methods to verify or validate any algorithm or performance metric because this would be dependent on the application and type of algorithm. The committee agreed that this is far too complex to define on its own. Any prescriptive requirements for how an algorithm is to be developed would quickly become outdated and challenge Health Canada's ability to stay current. Any algorithm should be validated in its intended setting, and manufacturers should be responsible for setting safety- and efficacy-based performance metrics.

It's important to show that performance can be duplicated. To this end, the committee suggested that Health Canada audit the processes that manufacturers have in place for quality management, validation, biases and limitations.

The use of computer simulation in place of traditional evidence can be beneficial in small sample sizes or where there are numerous possibilities. However, overall, the committee preferred traditional evidence, particularly in early stages. Published resources on validation and verification were cited as references.

Interoperability of AI software

Science question 3:
"Interoperability of AI software with current devices, clinical workflows, computer networks, and data are critical components of integrating AI into hospitals, clinics and point of care or home care settings.

What are the main safety and effectiveness concerns with interoperability of AI software?
What steps should manufacturers take to minimize the potential safety and effectiveness concerns with interoperability?
How should manufacturers mitigate against the potential for an AI algorithm to fail after a change is made to a third party device (for example, a change to a medical image acquisition device that affects the AI algorithm's interpretation of the image)."

The committee discussed that manufacturers should define when there are critical external features that are integral to the good function of the software, to mitigate or prevent issues. For example, if a sensor provides critical input to the algorithm, this should be noted and there should be a mechanism in place to detect a failed sensor and prevent inappropriate outputs.

The committee also discussed the need to have a clear picture of the population the algorithm was trained on (equivalent to the demographics table in a clinical study). This would help clinicians determine if it's appropriate for use on their patient.

AI algorithms could also be able to perform data integrity checks and subsequently provide warnings or interlocks to the user. For example, an algorithm trained on adults should have a mechanism to stop it from producing an output when presented with a pediatric patient.

Depending on the product's risk level, manufacturers may need to ensure a certain level of validation at the point of installation to mitigate issues of interoperability. This local validation would help ensure quality of implementation, especially for higher-risk devices.

Regulating the AI algorithm development process

Science question 4:
"Medical devices are currently regulated on a case-by-case basis using the significant change model and risk-classification rules. Case-by-case regulation may not be ideal for AI algorithms, given that the algorithms may be routinely re-trained and online training (for instance, continuous learning) is possible. Health Canada could continue to regulate these devices on a case-by-case basis or aim to regulate the AI algorithm development process. Furthermore, the current risk-classification rules may not accurately classify the risk of an AI algorithm.

What are the advantages and disadvantages to regulating on a case-by-case basis versus regulating at the development process level?
What should the main requirements be for a manufacturer to achieve approval of a development process?
What are the recommended risk classifications for AI algorithms?
How should regulatory oversight differ for the various risk classification levels?"

The committee agreed that regulating the development process, similar to the US Food and Drug Administration's pre-certification pilot, can have benefits compared to regulating case by case. This approach can provide oversight and scrutiny while allowing some freedom for updates. Improvements should be encouraged without imposing undue burden.

The committee suggested that Health Canada may want to consider piloting lower-risk AI devices, such as those classified as Class II.

The committee discussed the possibility of having a unique, parallel classification system for AI devices. This system could be based, for example, on level of autonomy (whether there's a doctor in the loop) and risk, and could allow Health Canada to add extra controls. The committee also acknowledged that the use of lower-risk decision support devices by doctors may be overseen by the respective provincial college of physicians and surgeons, which regulate professional activity.

Ethical concerns

Science question 5:
"The Health Products and Food Branch's mandate is to manage the health-related risks and benefits of health products and food. In terms of medical device regulation, the focus is on the safety and effectiveness of the devices. Other potential concerns such as patient privacy are not specifically addressed within the mandate. However, the concerns are acknowledged.

What are the main ethical concerns in terms of regulating artificial intelligence in medical devices?
How do these ethical concerns impact the effective regulation of medical devices in terms of safety and effectiveness?"

The committee described that many health equity issues will originate from the quality of data used to train the model. Datasets that are generated tend to focus on a homogenous population that is affluent and living in urban areas (around teaching hospitals or research labs). There may be a risk of exacerbating inequity between groups, because some are not represented in the data and therefore will not have access to the technology. Under-represented populations such as those who are female, children, Indigenous, elderly, in ethnic minorities, gender-diverse, or pregnant and lactating may be excluded or minimized.

However, the committee suggested that AI could potentially aid with health equity issues. For example, certain algorithms could be used as screening tools in rural populations.

The committee also suggested that Health Canada consider the following:

Consent
- applying the Personal Information Protection and Electronic Documents Act
- patients or users being asked to "donate" their data without compensation
- a marketed product continues to collect data in secret
- potential power issues, if patients are worried about being denied best available care if they do not consent to the data terms
- nature of consent forms, which are often so lengthy, detailed and technical that many people just click "agree" without true understanding and consent
Issues of numeracy or technical literacy
- biases that can exist even by recruiting patients in the training datasets or testing with patients who are adept at technology
Creation of a digital divide
- technology gap between those who have access to technologies and sufficient bandwidth to operate them
Data ownership and reciprocity
- use of data is partially justified to support innovation, but there's no reciprocity of data, as hospitals that generate data are no longer able to use it after it's acquired by industry
- the population whose data was used to create the software no longer benefits from the product
- should be a mechanism for data to come back to hospitals
- should be an ethical principle around the meaningful use of data beyond just a single application, with consideration given to requiring manufacturers to disclose how the data will be used
Erosion of trust in institutions and the sharing of data
- vulnerable populations who may wish to hide their data will not be able to reap the benefits of AI, because AI needs that data to learn and improve outcomes
What to approve with respect to funding
- whether "safe enough" is good enough to go to market
- consider conditional approvals that rely on attaining certain outcomes

The committee discussed the possible benefit of establishing AI ethics boards that are distinct from existing research ethics boards (REBs). These AI ethics boards would focus on ethics concerns (such as equity, privacy, consent) specific to AI-enabled medical devices. Existing REBs could also be leveraged rather than creating new bodies.

The challenges associated with this idea include potential lack of resources and talent, especially in smaller institutions.

Monitoring for discrepancies

Science question 6:
"Discrepancies between algorithm output and human clinician decisions are expected in diagnostic software that employs artificial intelligence.

When should a discrepancy between the algorithm's prediction and the clinician be considered a device failure?
Should re-training to improve an algorithm be considered an action to correct the device after being deemed defective?"

The committee discussed that "failure" of a device could present an opportunity for improvement. The performance over time compared to clinical decisions should be considered.

The committee noted that discrepancies can be viewed as a positive signal for change and can be an early indicator that a negative outcome may arise if no change is done to the algorithm. Discrepancies also provide a valuable opportunity for re-training the algorithm.

All AI products should include information about the sensitivity and specificity of their classification function to support clinicians in understanding the potential impacts of a discrepancy. Appropriate labelling should explicitly indicate to the user that discrepancies are to be expected.

Since discrepancies are expected, the onus should be on the manufacturer to specify performance and define the margins or unacceptable levels of discrepancy through risk analysis. Health Canada should consider at what point systemic discrepancies should be reported.

Manufacturers should have to monitor and report on performance, data that exceeds performance tolerance and false positives, and be able to collect data as long as the device is in use.

The committee said that it could be beneficial for clinicians or users to be able to report issues with the AI software directly to the manufacturer through the software itself. From a patient perspective, there should be clear processes in place for when a clinician disagrees with an algorithm and decides to override it.

The committee discussed the need for different requirements depending on the context in which the device is used and its level of autonomy.

Where the AI device is used to augment clinician practice, discrepancies can be expected and the clinician has a chance to disagree with the output. In this case, systemic discrepancies over time matter more in terms of reportability and correction. A decision to not act based on an AI's output is part of patient care, which is under provincial jurisdiction. Accountability would then rest on hospitals to make sure that the AI devices they use in their hospital are safe.

For a patient-facing device, there is no clinician in the loop and there may be a higher degree of reliance on the device by the patient. Individual discrepancies may have a greater impact in this scenario than in a clinical practice environment.

Closing remarks

Dr. Joseph Cafazzo, Chair, and Health Canada thanked the members for their participation. The meeting was adjourned.

Page details

2024-12-12

Language selection

Search