Scientific Advisory Committee Digital Health Technologies, January 10 and 12, 2024, summary of proceedings

Attendance

Committee members

Core members: January 10 - Joseph Cafazzo (Chair), Aviv Gladman, Trevor Jamieson, Kendall Ho, Kim Hanson, Kumanan Wilson, Ross Mitchell

January 12 - Joseph Cafazzo (Chair), Aviv Gladman, Trevor Jamieson, Kendall Ho, Kim Hanson, Kumanan Wilson, Ross Mitchell

Ad hoc members: January 10 - Frank Rudzicz, Jaron Chong

January 12 - Jaron Chong

Regrets: January 10 - Doug Manuel

January 12 - Doug Manuel

Presenters

Medtech: January 10 - Mia Spiegelman (Medtech Canada), Rajeswari Devanathan (TPIreg), Martha De Cunha Maluf-Burgman (Edwards), Cassie Scherer (Medtronic), Tarek Haddad (Medtronic), Seema Vyas (Medtronic), Abhineet Johri (Siemens)

Health Canada: January 10 - Bruce Randall, Sally Prawdzik, Marc Lamoureux, Laetitia Guillemette, Janet Hendry, Tyler Dumouchel, Martina Buljan, Tudor Fodor, Jiwon Hwang, Rosslynn Miller-Lee, Patrick Assouad, Ian Glasgow, Gregory Jackson, Justin Peterson, Colin McCurdy, Nicholas Blackburn

January 12 - Bruce Randall, Marc Lamoureux, Laetitia Guillemette, Janet Hendry, Tyler Dumouchel, Martina Buljan, Tudor Fodor, Jiwon Hwang, Rosslynn Miller-Lee, Ian Glasgow, Gregory Jackson, Justin Peterson, Colin McCurdy, Nicholas Blackburn

Observers

January 10 and 12 - Joanne Kim (Canadian Agency for Drugs and Technologies in Health, or CADTH), Yannick Auclair (Institut national d'excellence en santé et en services sociaux, or INESSS)

Welcome

Bruce Randall, Director General, Medical Devices Directorate (MDD), welcomed committee members and introduced the meeting topics.

After welcoming the newest member, Dr. Ross Mitchell, to the committee, Mr. Randall thanked the committee members for their time and for providing advice to Health Canada.

Chair's remarks

Dr. Joseph Cafazzo, Chair, thanked members for participating in the meeting. He asked committee members to report any changes to their affiliations and interests since their initial declarations. None were identified that restricted SAC-DHT members from participating. He advised members that they would be asked to provide their advice on the topics discussed over the 2 days during the roundtable at the end of the meeting.

Summary and general considerations

Marc Lamoureux, Manager, Digital Health Division, MDD, gave an overview of previous committee meetings and how the committee's advice has been used.

Presentation

There was 1 presentation, given on January 10 by Medtech Canada, a national association representing the medical technology industry.

Medtech Canada talked about current industry practices used to develop machine learning-enabled medical devices.

Medtech Canada answered questions by committee members and then left the meeting.

Discussion

Four topics were discussed at the 2-day meeting:

- generative AI and large language models
- transparency to users of machine learning models
- change management for machine learning models
- monitoring machine learning models

Generative AI and large language models

Science question 1:
"What unique considerations should be taken into account when assessing the safety profile or benefit-risk ratio of devices using generative AI or large language models? What new risks are associated with using these device types in the clinic or at home?"

The committee discussed the differences between traditional machine learning algorithms and generative AI. Members noted that traditional machine learning models have a narrow focus while large language models can have broader implications. A device can also use large language models in the background to generate data or make predictions, without users being aware that such a model is being used.

The committee suggested that large language models could be viewed like operating systems, whose broad abilities may be used for narrower, more specific applications. This has various implications, especially when they are integrated into electronic health records.

Large language models may be hosted by a third party and, unless automatic updates are prevented, could generate unpredictable results.

While a traditional statistical model can be explained to users, with generative AI, users may not understand how the output is achieved. Third-party generative AI models can also degrade over time for a number of reasons. For example, they may be trained on limited data and may not generalize well, or the data used to train the model may no longer be representative of real-world data.

Generative AI products that are currently available may not claim to have a medical purpose, and therefore do not fall under Health Canada's purview. However, they are possibly already being used for such purposes. Due to the ease of access and use of such tools, individuals may be turning to this technology to manage their own health instead of trying to navigate the health care system to see a professional.

The committee expressed concern over the lack of transparency in these models, as training data are often not shared by manufacturers. Lack of transparency makes these models difficult to validate, especially if they are then used by other manufacturers to create a new device. For these reasons, the committee recommended that any regulations around this type of technology require disclosure of data sources to increase transparency.

This discussion led to talking about the challenges associated with cybersecurity and privacy regulations. Although patient privacy should be protected, opportunity is lost when some of the patient data is unavailable for training the models due to privacy concerns. In the context of cybersecurity, there have been anecdotal reports of individual patient data being retrieved from models that had been trained locally (known as "regurgitation"). These reports highlight the need to balance availability of data to inform models with the need to protect patients' sensitive information.

The committee also raised concern about the risks of an AI model fabricating information that is coherent, plausible and convincing, but is nevertheless factually incorrect.

The committee noted the potential benefits of generative AI in health care, particularly for routine administrative tasks and automation.

There are risks associated with not embracing this technology. The committee cautioned against overplaying software bias, since humans are also biased in their decision-making.

The committee noted that AI/machine learning-enabled devices had strong potential to reduce physician errors and to help doctors adhere to practice guidelines and standards of care. To maximize this technology's benefits, the committee recommended a copilot model, where the clinician is required to review the output from devices using generative AI. It was suggested that Health Canada consider authorizing lower-risk products in an early pilot phase. Health Canada may also consider a program similar to the one used by the US Food and Drug Administration (FDA), in which specific tools are fast-tracked to market if they offer significant benefits and no equivalent product is available.

The committee was concerned that over-regulating these technologies would favour larger manufacturers and disadvantage smaller local companies and start-ups. To promote equity in the industry, any regulations should be accompanied by clear guidance in lay language.

Transparency to users of machine learning models

Science question 2:
"What do users want to know about the machine learning models within a medical device? Where is this information best placed and how should it be displayed? When do users want to know this information? How should this information get updated when critical information changes?"

The committee advised that devices should indicate when they employ an AI model so users are aware. Clinician users should also be told the source of the data and whether the algorithm was trained on local data.

The committee recommended that Health Canada mandate the creation of AI model cards. AI model cards would be a permanent and accessible repository of information about the model.

Similar to product monographs for drugs, the cards would present indications, contraindications, limitations of use, potential risks, potential failures, target population (for example, inclusion and exclusion criteria of training data) and known biases. Patient characteristics that were absent from the training data would be described so users are aware of the algorithm's blind spots. Known limits would also be clearly described (for example, which circumstances lead to model failure or hallucinations and should be avoided).

The cards could be updated regularly based on user experience, including user-reported issues and incidents reported in other countries.
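As a purely illustrative sketch, a machine-readable model card along these lines could be represented as a simple data structure. The fields below mirror the elements listed above, but the format and names are assumptions, not a Health Canada specification:

from dataclasses import dataclass, field
from datetime import date

@dataclass
class ModelCard:
    """Hypothetical AI model card, loosely analogous to a drug product monograph."""
    device_name: str
    model_version: str
    indications: list[str]          # intended uses
    contraindications: list[str]    # situations where the device must not be used
    limitations: list[str]          # known failure modes, such as inputs that trigger hallucinations
    training_population: str        # inclusion and exclusion criteria of the training data
    known_biases: list[str]         # patient characteristics absent from the training data
    last_updated: date = field(default_factory=date.today)

# Example entry (all values are invented for illustration):
card = ModelCard(
    device_name="Example triage assistant",
    model_version="2.1.0",
    indications=["adult emergency department triage support"],
    contraindications=["pediatric patients"],
    limitations=["performance degrades on records with missing vital signs"],
    training_population="adults 18 to 80, single urban hospital network",
    known_biases=["rural populations under-represented in training data"],
)

Keeping the card machine-readable would also make the regular updates described above, such as adding newly reported incidents, straightforward to automate.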

The committee discussed the importance of providing meaningful data and outcome measures in a way that users can understand. The target range of expected levels of performance (minimum, maximum, mean, median) and level of uncertainty should be provided. Ideally, there should be a clear threshold for when use of a device should be suspended and a backup plan for when this occurs.
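A minimal sketch of how such a suspension threshold could be checked in software, assuming AUC as the performance metric; the threshold values are illustrative, not recommendations:

def check_performance(observed_auc: float,
                      declared_min_auc: float = 0.85,
                      suspend_below: float = 0.80) -> str:
    """Compare observed performance against the declared range and return an action."""
    if observed_auc < suspend_below:
        return "suspend"  # stop using the device and switch to the backup plan
    if observed_auc < declared_min_auc:
        return "warn"     # performance is outside the declared range; investigate
    return "ok"

print(check_performance(0.78))  # -> suspend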

Pre-determined change control plans should incorporate a challenge component. In other words, the model should regularly be challenged after its training to evaluate how it degrades. There should be frequent feedback about whether the outputs make sense. Such feedback helps to build trust and refine and improve the model. Information about whether these models are calibrated would reassure users of the robustness of the model in complex cases.
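One hypothetical form the challenge component could take is a periodic re-evaluation on a challenge set fixed at training time, flagging degradation beyond a tolerance; the stand-in model and values here are assumptions:

import statistics

def challenge_model(predict, challenge_set, baseline_accuracy, tolerance=0.05):
    """Re-run the model on a fixed challenge set and flag degradation.

    predict: callable mapping an input to a predicted label
    challenge_set: list of (input, expected_label) pairs fixed at training time
    """
    accuracy = statistics.mean([predict(x) == y for x, y in challenge_set])
    degraded = accuracy < baseline_accuracy - tolerance
    return accuracy, degraded

# Trivial stand-in model and challenge set:
model = lambda x: x >= 0.5
cases = [(0.9, True), (0.2, False), (0.6, True), (0.4, False)]
print(challenge_model(model, cases, baseline_accuracy=1.0))  # (1.0, False)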

When a manufacturer stops supporting a specific algorithm or machine learning-enabled medical device, users may not know whom to contact to report incidents or performance degradation. A regulated centralized registry of machine learning-enabled medical devices approved in Canada, or updates to the Medical Device Active Licence Listing, could mitigate this issue.

Change management for machine learning models

Science question 3:
"What critical elements should Health Canada take into account in the assessment of any proposed change and associated change management protocol to ensure that safety and performance is maintained or improved?"

The committee highlighted that any change would require retraining the model on local data, rather than on the original generic dataset. Concerns were raised about the potential for limited machine learning expertise in smaller health care facilities. To resolve this, the committee suggested that third-party assistance to monitor a model's performance should be required. Another option might be to roll out changes first to larger academic medical centres that opt in and have expert AI users on staff for testing, before rolling out updates to all users.

Patches or updates to generative AI models, possibly applied by a remote third party, could lead to unintended consequences or unpredictable performance issues. One committee member said it's important to have fail-safe and contingency plans in place. Members also noted that change management can be challenging for models with higher degrees of automation and of connectivity with other devices in a network. In that context, even small updates to a single device can lead to loss of connection or degraded performance of other connected devices.

The committee discussed the complexity of managing versions and stated a preference for a gradual shift rather than a mandatory change, which could be disruptive. There should be clear protocols based on the level of risk. The committee proposed a tiered reporting system for incidents affecting patient care. Updates to devices should have a "probation period," after which the changes should be reassessed. An emergency protocol should also be in place for serious safety events.
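As a purely hypothetical illustration, a tiered reporting system of the kind proposed could be expressed as a severity-based routing rule; the tiers and their handling are assumptions, not committee recommendations:

from enum import Enum

class Tier(Enum):
    MINOR = 1     # logged locally and reviewed in aggregate
    SERIOUS = 2   # reported to the manufacturer within a set number of days
    CRITICAL = 3  # emergency protocol: immediate report, device use suspended

def route_incident(affected_patient_care: bool, caused_harm: bool) -> Tier:
    """Map an incident to a reporting tier based on its effect on patient care."""
    if caused_harm:
        return Tier.CRITICAL
    if affected_patient_care:
        return Tier.SERIOUS
    return Tier.MINOR

print(route_incident(affected_patient_care=True, caused_harm=False))  # Tier.SERIOUS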

Technology manufacturers must provide support over the entire lifecycle of their AI model. Health Canada could consider having more timely and sensitive reporting requirements beyond existing procedures.

The committee discussed the challenges associated with AI models that can "learn on the job" and change their behaviour over time as more patients are seen. The large amounts of data that these models are trained on make it difficult to analyze the cause of an algorithm issue that led to a poor health outcome or death. Fully automated medical devices that operate without supervision can make it difficult to identify that an adverse event has taken place. Thus, the benefits of automated updates may not outweigh the risks.

The committee also stressed the importance of having a thorough understanding of risk levels for different components in integrated systems. A regulatory approach that addresses interoperability may be needed as medical devices become more numerous and interconnected. From a software engineering perspective, unit tests can be used to verify accuracy, but may not be enough to ensure safe integration into existing systems. The committee emphasized the importance of recognizing greyscales rather than binary outcomes in clinical scenarios. As such, regulatory approaches that use fine-tuning instead of "pass or fail" criteria may be preferable.
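To illustrate the distinction, a unit test can verify a model's accuracy in isolation, while an integration-style check exercises the surrounding pipeline; this sketch uses Python's unittest module with a trivial stand-in model:

import unittest

def model(x):
    """Trivial stand-in for a deployed model's predict function."""
    return x >= 0.5

def pipeline(x):
    """Stand-in for the surrounding system: validate input, then call the model."""
    if x is None:
        raise ValueError("missing input")
    return model(x)

class TestModel(unittest.TestCase):
    def test_accuracy_in_isolation(self):
        # Unit test: the model alone meets an accuracy bar on known cases.
        cases = [(0.9, True), (0.1, False), (0.7, True)]
        correct = sum(model(x) == y for x, y in cases)
        self.assertGreaterEqual(correct / len(cases), 0.9)

    def test_pipeline_rejects_missing_input(self):
        # Integration-style check: bad input is rejected rather than
        # silently passed to the model.
        with self.assertRaises(ValueError):
            pipeline(None)

if __name__ == "__main__":
    unittest.main()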

Rapidly evolving gold standards in software-based medical devices make it difficult to build an evidence base at the same pace. Post-market changes to a device could also occur at the local or individual level. Such changes should align with the expected operation of the algorithm.

While checklists, unit tests and fallback planning can help protect against expected events, a culture of communication and transparency can help protect against unexpected events. As with other health products, post-market monitoring and reporting are critical to highlighting potential adverse events and problems.

Monitoring machine learning models

Science question 4:
"What are some of the benefits, challenges, and best practices for monitoring machine learning models that are in use? How should the manufacturer and user be monitoring performance? How can this information get disseminated to users and other stakeholders? How often?"

The committee highlighted the challenges of relying on end users for feedback and reporting an adverse event. Even the most engaged users often ignore issues unless they are dealing with a disruptive alert. Adverse event reports are often incomplete, sparse or delayed. Manufacturers should have a simple reporting process so end users can identify errors and deviations. A group should be onsite to monitor and recalibrate the system quickly, although this may not be possible for smaller sites or acceptable from a privacy perspective.

The committee debated the risks and benefits of active versus passive surveillance models, with some members advocating for a blend of both. A structured approach could help identify errors and deviations from the standard (for example, manufacturers could set a threshold of error reports that would indicate the need for a formal study). The committee also suggested that manufacturers regularly monitor the algorithm (for example, every year) and engage end users.
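The error-report threshold mentioned above could be as simple as a count over a rolling window; the window length and threshold here are illustrative assumptions:

from datetime import datetime, timedelta

def needs_formal_study(report_times, window_days=90, threshold=10):
    """Flag the need for a formal study when error reports within the
    rolling window reach the manufacturer-set threshold."""
    cutoff = datetime.now() - timedelta(days=window_days)
    recent = [t for t in report_times if t >= cutoff]
    return len(recent) >= threshold

reports = [datetime.now() - timedelta(days=d) for d in range(12)]
print(needs_formal_study(reports))  # True: 12 reports within the last 90 days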

The role of synthetic data in monitoring algorithm performance emerged as a topic of interest. While synthetic data was recognized for its privacy and cost advantages, concerns were expressed about its ability to represent rare events. Despite this interest, the committee stressed that real-world data remains superior. Members also cautioned against relying too much on surrogate variables and stressed the importance of using clinically relevant variables.

A legal perspective on active monitoring was also discussed. The value of cloud deployment for enabling active monitoring, and of shifting to service-oriented AI devices, was stressed. Legal considerations around privacy were explored, with the committee suggesting that Health Canada require active monitoring for some classes of AI-enabled devices. The potential benefit of preventing harm through timely AI predictions was weighed against concerns about patient confidentiality. The need for regulatory guidance to navigate these complexities was highlighted.

Fail-safe mechanisms were identified as a potential safeguard when monitoring performance, especially in the context of algorithmic decision-making. The committee discussed the risks associated with algorithms that may continue to generate outputs even when inputs are inappropriate. Members also stressed the importance of incorporating internal checks and fail-safe protocols to prevent unintended consequences that would otherwise go uncaught.
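A minimal sketch of such an internal check, in which the device declines to produce an output when an input falls outside the ranges it was validated on; the variables and ranges are hypothetical:

def guarded_predict(predict, heart_rate, age):
    """Fail-safe wrapper: return None instead of a prediction when inputs
    fall outside the ranges the model was validated on."""
    if not 20 <= heart_rate <= 250:  # physiologically implausible reading
        return None
    if not 18 <= age <= 90:          # outside the validated population
        return None
    return predict(heart_rate, age)

result = guarded_predict(lambda hr, age: "low risk", heart_rate=400, age=55)
print(result)  # None: the device declines to output rather than guess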

Final discussions focused on the communication and dissemination of information related to errors and updates in AI algorithms. The committee discussed the challenges associated with reaching end users effectively and considered various scenarios and types of users. For example, patient users may be best reached through notifications or forced software upgrades. Electronic health records may be an option for clinician users.

The concept of a permalink was discussed, with members suggesting that having a single point of information on the vendor's site for every licensed medical device would increase the chances of reporting. Manufacturers should consider the level of risk associated with the product in their communication approach.

Closing remarks

Dr. Joseph Cafazzo, Chair, thanked the members for their insights and productive discussions before closing the meeting.
