Early detection and prediction of infectious disease outbreaks


Volume 45-5, May 2, 2019: Climate change and infectious diseases: The solutions


Risk assessment strategies for early detection and prediction of infectious disease outbreaks associated with climate change

EE Rees1, V Ng2, P Gachon3, A Mawudeku4, D McKenney5, J Pedlar5, D Yemshanov5, J Parmely6, J Knox1,2


1 Public Health Risk Sciences Division, National Microbiology Laboratory, Public Health Agency of Canada, St. Hyacinthe, QC

2 Public Health Risk Sciences Division, National Microbiology Laboratory, Public Health Agency of Canada, Guelph, ON

3 Centre pour l’Étude et la Simulation du Climat à l’Échelle Régionale (ESCER), Université du Québec à Montréal (UQAM), Montréal, QC

4 Office of Situational Awareness and Operations, Centre for Emergency Preparedness and Response, Public Health Agency of Canada, Ottawa, ON

5 Natural Resources Canada, Canadian Forest Service, Great Lakes Forestry Centre, Sault Ste. Marie, ON

6 Canadian Wildlife Health Cooperative, University of Guelph, Guelph, ON



Suggested citation

Rees EE, Ng V, Gachon P, Mawudeku A, McKenney D, Pedlar J, Yemshanov D, Parmely J, Knox J. Risk assessment strategies for early detection and prediction of infectious disease outbreaks associated with climate change. Can Commun Dis Rep 2019;45(5):119–26. https://doi.org/10.14745/ccdr.v45i05a02

Keywords: climate change, risk assessment, event-based surveillance systems, artificial intelligence, machine learning, natural language processing, risk modelling


A new generation of surveillance strategies is being developed to help detect emerging infections and to identify the increased risks of infectious disease outbreaks that are expected to occur with climate change. These surveillance strategies include event-based surveillance (EBS) systems and risk modelling. The EBS systems use open-source internet data, such as media reports, official reports, and social media (such as Twitter) to detect evidence of an emerging threat, and can be used in conjunction with conventional surveillance systems to enhance early warning of public health threats. More recently, EBS systems include artificial intelligence applications such as machine learning and natural language processing to increase the speed, capacity and accuracy of filtering, classifying and analysing health-related internet data. Risk modelling uses statistical and mathematical methods to assess the severity of disease emergence and spread given factors about the host (e.g. number of reported cases), pathogen (e.g. pathogenicity) and environment (e.g. climate suitability for reservoir populations). The types of data in these models are expanding to include health-related information from open-source internet data and information on mobility patterns of humans and goods. This information is helping to identify susceptible populations and predict the pathways by which infections might spread into new areas and new countries. As a powerful addition to traditional surveillance strategies that identify what has already happened, it is anticipated that EBS systems and risk modelling will increasingly be used to inform public health actions to prevent, detect and mitigate climate change-driven increases in infectious diseases.


Climate warming trends have been accelerating over the last few decades. The world’s nine warmest years in the time period from 1850 to 2017 have all occurred in the last twelve years, with a total increase of approximately 0.97°C in the average annual air temperature for the time period from 1880 to 2017Footnote 1. This ostensibly small increase in average global temperature is nevertheless responsible for significant changes in the worldwide weather patterns and associated effects on society through sea level rise (and associated erosion) and increased frequency and intensity of flooding, droughts (with associated wildfires and crop failures) and freezing rain eventsFootnote 2. Of particular importance to Canada, climate warming is even more acute at higher latitudes and in the winter monthsFootnote 3. Over the past 70 years, the overall annual average temperature in Canada has increased by 1.8°CFootnote 4, with an average winter temperature increase of 3.4°CFootnote 4. In some areas in the northwest, this increase has been even higher. Because climate change affects not only temperatures but precipitation patterns, Canada is experiencing generally drier conditions in the west and above average precipitation in the eastFootnote 4.

Climate-driven changes to temperature and precipitation are known to affect the risk of infectious disease transmission. Climate change is modifying range distributions of disease vectors (e.g. ticks and mosquitoes) and reservoir populations (e.g. birds, rodents and deer) that participate in the transmission of pathogens from ticks and mosquitoes to humans as climate suitability for vector and reservoir populations changesFootnote 5Footnote 6. For example, the increase in cases of Lyme disease in Canada reflects the northward expansion of the range of the black-legged tick vector, Ixodes scapularis, in the United States (US) and into southern Canada, as climate change has made Canada more conducive to establishing tick populationsFootnote 7Footnote 8. This expansion of the area where the vectors and their reservoirs can thrive means not only an increased risk of sporadic infectious disease but also an increased likelihood that these vectors, and the diseases that they carry, can become endemicFootnote 6Footnote 9Footnote 10Footnote 11.

In addition, climate change is influencing the mobility patterns of people and goods. An increase in “climate refugees”, people displaced when their lives and/or livelihoods are at risk from extreme weather events, is expectedFootnote 11. Refugees, often from geographical areas where infectious diseases are more common and with different vaccination schedules and practices, may inadvertently bring these diseases into CanadaFootnote 12. Tourism is also affected by climate change, as changes in both home and travel destinations influence the push and pull factors motivating people to travel and the potential for disease spreadFootnote 13Footnote 14Footnote 15. Vectors and pathogens can inadvertently be transported through shipments by air, land and seaFootnote 16Footnote 17Footnote 18. Land and sea containers are known to support the invasion of mosquitoes because larvae can develop in trapped standing water, and if no water exists, eggs can withstand desiccation for weeks to monthsFootnote 19Footnote 20. Air travel has also been responsible for travellers carrying infections into new areas. In Canada, returning travellers have brought with them the Zika virus and have also sparked an outbreak of severe acute respiratory syndrome (SARS) coronavirusFootnote 15Footnote 21Footnote 22.

Thus, the increase in infectious diseases expected with climate change poses important public health risks, and work is underway to monitor, assess and predict the impact of these risks. In the past, public health management has depended on notifiable disease reporting surveillance systems to detect outbreaks, monitor disease progression and inform prevention and mitigation policies. However, traditional surveillance systems are typically characterized by delays in the reporting and analysis of the data and the communication of the results.

To address the need for closer to real-time surveillance of emerging issues and earlier insight on potential health impacts, two risk assessment strategies have been, and are being, developed: event-based surveillance (EBS) systems, which increasingly incorporate artificial intelligence; and risk modelling. The objective of this overview is to describe these two risk assessment strategies and how they can inform public health actions to prevent, detect and mitigate climate change-driven increases in infectious diseases.

Event-based surveillance systems

Event-based surveillance systems use a variety of open-source internet data and assessment techniques to identify disease threatsFootnote 23Footnote 24. Typical open-source internet data include online newswires, social media and other internet data streams, in multiple languages, to detect early-warning signals of threats to public health. These systems have proven to be more timely in comparison with conventional surveillance data sources from laboratory results or hospitalsFootnote 25, and can be used in conjunction with conventional surveillance systems to enhance early warning of public health threatsFootnote 26. The more quickly signals from an evolving outbreak are identified, the more quickly the outbreak can be tracked and a public health response can be planned and implementedFootnote 27.

There are three types of EBS systems: moderated; partially moderated; and fully automatedFootnote 28. The level of automation influences how the information flow in EBS systems is managed from the open-source internet data from news aggregators (e.g. Factiva, Google News, Moreover, Baidu), Rich site summary (RSS) and social media feeds from official and unofficial sources (e.g. Twitter for US Centers for Disease Control and general public), and validated official reports (e.g. World Health Organization, US Centers for Disease Control). The Program for Monitoring Emerging Diseases (ProMED) is an example of a moderated system and was at the forefront of EBS development over 25 years agoFootnote 29Footnote 30. ProMED is run by volunteer analysts (who are expert curators) who monitor and choose news articles, validate the content and notify subscribers of noteworthy infectious disease events. Strengths of this system include a low level of noise relative to signal (few irrelevant reports), being open access and having a broad reach. However, volunteers do not cover all populations at risk, volunteer biases can influence the moderation of events and volunteers do not have the resources (nor are they expected) to provide detailed information giving situational awareness for assessing the threat levelFootnote 29.

The Global Public Health Intelligence Network (GPHIN) is a partially moderated system that was developed by the Government of Canada, in collaboration with the World Health Organization, four years after ProMEDFootnote 31Footnote 32Footnote 33. GPHIN access is restricted to agencies with health-related mandates. Artificial intelligence (AI) algorithms in GPHIN automatically process a stream of two to three thousand news articles per day, which are then moderated by 12 expert analysts who identify and issue alerts for threats using tacit contextual information (e.g. historic context, market trends, travel bans and climate anomalies). An example of the usefulness of GPHIN dates back to early 2003 when analysts identified reports from China referring to increased sales of antiviral therapies just before the global onset of the SARS epidemicFootnote 34. Unlike ProMED, GPHIN benefits from multi-staged filtering using AI and trained analysts. Artificial intelligence enables processing of a larger data stream, and analysts have the resources to provide information for situational awareness. Both ProMED and GPHIN can function in multiple languages; however, it is expensive for GPHIN to add other languages because of the cost of hiring analysts with language fluencyFootnote 33.

Fully automated systems include the European Commission’s Medical Information System (MedISys), Pattern-based Understanding and Learning System (PULS) and HealthMap. These systems are open to the public, but also offer restricted-access features to serve the needs of health agencies, such as private discussion forums, increased functionality and data processing of commercial sourcesFootnote 35Footnote 36. Fully automated systems are faster at processing data and less expensive to operate than moderated systems. The main drawback is a higher level of noise relative to signal, meaning that there is an increased risk of identifying false threatsFootnote 37Footnote 38. The EBS systems can be connected in synergistic ways to address this riskFootnote 39. For example, MedISys uses low-noise data from ProMED and GPHIN, and uses more advanced language processing algorithms from PULS. PULS extracts information about events identified in the MedISys stream and then returns these data to MedISysFootnote 36Footnote 40. The different types of EBS systems are summarized in Table 1.

Table 1: Summary of some event-based surveillance systems
Type | Example | Establishment | Public availability
Moderated systemFootnote a of Table 1 | Program for Monitoring Emerging Diseases (ProMED)Footnote 29Footnote 30 | In 1994 as a nonprofit organization | Yes
Partially moderated systemFootnote b of Table 1 | Global Public Health Intelligence Network (GPHIN)Footnote 31Footnote 32Footnote 33 | In 1998 through partnership between the Government of Canada and World Health Organization | No; available to partnered health agencies
Fully automated systemFootnote c of Table 1 | Medical Information System (MedISys)Footnote 36Footnote 41Footnote 42 | In 2004 by the European Commission | Yes
Fully automated systemFootnote c of Table 1 | HealthMapFootnote 35Footnote 38Footnote 40Footnote 43 | In 2006 by Boston Children’s Hospital | Yes
Fully automated systemFootnote c of Table 1 | Pattern-based Understanding and Learning System (PULS)Footnote 36Footnote 44Footnote 45 | In 2007 by the Department of Computer Science, University of Helsinki, Finland | Yes

Artificial intelligence applications

The ability of EBS systems to quickly and accurately detect threats (such as outbreaks of infectious diseases) has been revolutionized by artificial intelligence applications for data processing. Open-source internet data are considered “unstructured” in the sense that news articles, blogs, tweets, etc., provide a narrative describing an event. The text, numbers and dates are not organized in a data model, such as a database, that can be used for automated event detection and risk modelling; therefore, open-source data must be processed to extract and structure information about what happened, where it happened, when it happened and to whom it happened. The EBS systems use natural language processing (NLP) methods to process and understand event narrativesFootnote 46Footnote 47Footnote 48. Natural language processing is a field of research dedicated to understanding human discourseFootnote 49. Early methods include the sublanguage approach, where rules and patterns are used to interpret and classify vocabulary, syntax and semantics of the unstructured narrative. The EBS systems have taxonomies of terms to match predefined terms and their synonyms to those found in the data sources. Much like with a conventional literature search, taxonomic classification of narratives can identify health-related articles by searching for related terms (e.g. human influenza A synonyms include H1N1, swine flu, California flu, human influenza and influenza A)Footnote 50. The sublanguage approach for identifying health-related data in EBS systems is effective but also has drawbacks. Taxonomies are not easily generalizable and must be developed for each disease being monitored and kept up to date as language evolves and new discoveries about diseases are made. In this light, NLP has come to rely heavily on machine learning (ML) methods.
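The taxonomy matching described above can be illustrated with a minimal Python sketch. It is not the implementation of any EBS system named in this article: the `TAXONOMY` dictionary, the function names and the sample sentence are hypothetical, and only the influenza A synonyms are taken from the text.

```python
import re

# Hypothetical mini-taxonomy: one canonical concept mapped to the synonyms
# listed in the text. A real EBS taxonomy would cover many diseases.
TAXONOMY = {
    "human influenza A": [
        "H1N1", "swine flu", "California flu", "human influenza", "influenza A",
    ],
}

def build_patterns(taxonomy):
    """Compile one case-insensitive whole-word pattern per concept."""
    patterns = {}
    for concept, synonyms in taxonomy.items():
        # Longest terms first so "human influenza A" wins over "human influenza".
        terms = sorted(set([concept] + synonyms), key=len, reverse=True)
        alternation = "|".join(re.escape(term) for term in terms)
        patterns[concept] = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    return patterns

def tag_article(text, patterns):
    """Return {concept: [matched terms]} for every concept found in the text."""
    tags = {}
    for concept, pattern in patterns.items():
        matches = [m.group(0) for m in pattern.finditer(text)]
        if matches:
            tags[concept] = matches
    return tags

PATTERNS = build_patterns(TAXONOMY)
print(tag_article("Officials confirm three swine flu cases; H1N1 testing continues.", PATTERNS))
```

The sketch also shows the drawback noted above: an article that uses a term missing from the dictionary is not tagged until someone updates the taxonomy.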

Machine learning is a subset of AI that uses algorithms, such as statistical models, to perform a specific task without using explicit instructions, relying instead on patterns and inference. The EBS systems gather open-source internet data (feeds and web queries) and then filter these data through a combination of the sublanguage approach and ML methods, where the latter is used to perform more complex tasks for analysing syntax, semantics, morphology, pragmatics and discourseFootnote 51. For example, ML methods can be used to determine the difference between non-health related articles (e.g. “Bieber fever”, which refers to avid support for Justin Bieber) and those discussing an infectious disease outbreakFootnote 43Footnote 51Footnote 52. Machine learning methods can also be used to resolve ambiguities in dates and locations, such as distinguishing past from present outbreaks in articles that discuss historical contextFootnote 53Footnote 54. Novel applications for ML methods are also being developed, such as structuring disease case information into epidemiological line lists (a listing of individuals affected by the disease and related information; e.g. health status, sex, location, date of onset, hospitalized) that can be used in outbreak investigations and risk modellingFootnote 55. Once the information from open-source internet data has been processed into a data model, the event can then be reviewed and reported, as appropriate; furthermore, additional data analytics can be performed to communicate the current and predicted impact of the health threat. A summary of information flow from data collection, processing, analytics and reporting for EBS systems is presented in Table 2.
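To make the “Bieber fever” example concrete, the following Python sketch trains a small multinomial naive Bayes classifier with add-one smoothing. Naive Bayes is used here only as a simple stand-in for the ML methods mentioned above; the article does not state which algorithms the EBS systems use. The six training headlines are invented for illustration and are far too few for real use.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token counts
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()

    def fit(self, labelled_docs):
        for text, label in labelled_docs:
            self.doc_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for token in tokenize(text):
                counts[token] += 1
                self.vocab.add(token)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for token in tokenize(text):
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Invented toy headlines, for illustration only.
TRAINING = [
    ("Hospital reports fever and cough cases in outbreak", "health"),
    ("Health officials confirm dengue fever outbreak after rains", "health"),
    ("Clinic sees patients with high fever and rash", "health"),
    ("Bieber fever grips fans ahead of concert tour", "other"),
    ("Fans catch Bieber fever as new album tops charts", "other"),
    ("Concert tickets sell out as fans queue overnight", "other"),
]

model = NaiveBayes().fit(TRAINING)
print(model.predict("Bieber fever sweeps fans at concert"))
print(model.predict("Officials report fever outbreak at hospital"))
```

The word “fever” appears in both classes, so the classifier has to rely on the surrounding words, which is the kind of context a fixed term list cannot capture.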

Table 2: Information flow from open-source internet data in event-based surveillance systems
EBS | Data collection | Data processing | Data analytics | Reporting
Moderated systems | Human analysts search and identify open-source internet data for health-related concern | Human analysts review, filter and designate the threat level of the event | None | Reports on health-related threats are communicated through email and posted on EBS system website
Partially moderated and fully automated systems | Automated feed of open-source internet data | Taxonomic classification and ML algorithms filter and classify events based on their metadata (e.g. type of threat, location and date). ML algorithms score the level of relevancy. In partially moderated systems, highly scored data sources are curated by human analysts | Analytic techniques evolve with time and differ among EBS systems. Current techniques include the following: mapping of geo-tagged events; bar plots showing changes over time to keyword counts, number of identified articles and expected and observed number of disease cases; word clouds showing importance of keyword terms; alert notices given sudden increases to case counts, reliability of sources and/or number of unique sources | Reports on health-related threats are communicated through email and posted on EBS system website and notified to appropriate web application user communities
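
One of the analytic techniques listed in Table 2 is an alert notice given a sudden increase in case counts. The Python sketch below shows one simple way such an alert could work, assuming a moving-baseline rule (flag a week whose count exceeds the mean of the preceding weeks by more than k standard deviations). The rule, the function name `flag_spikes`, the default parameters and the weekly counts are all illustrative assumptions, not a description of how any EBS system named above generates alerts.

```python
from statistics import mean, stdev

def flag_spikes(counts, window=4, k=2.0):
    """Return indices of counts exceeding mean + k * sd of the previous `window` values."""
    alerts = []
    for t in range(window, len(counts)):
        baseline = counts[t - window:t]
        threshold = mean(baseline) + k * stdev(baseline)
        if counts[t] > threshold:
            alerts.append(t)
    return alerts

# Invented weekly article or case counts, for illustration only.
weekly_counts = [3, 4, 3, 5, 4, 4, 15, 5]
print(flag_spikes(weekly_counts))
```

A rule like this is sensitive to the choice of window and k: a short window reacts quickly but raises more false alerts, which is one reason human review remains part of partially moderated systems.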

Risk modelling

An important advancement for risk assessment is increasing the variety of data being used in modelling approaches. Risk modelling in the context of infectious diseases is the process of identifying and characterizing factors in individuals or populations that increase their vulnerability to contracting disease (e.g. age, proximity to outbreak). Statistical inference is a well-grounded and informative risk modelling approach that includes regression analysis. This method is used to determine how risk factors (explanatory variables) are associated with the outcome of interest (e.g. number of reported cases). Regression models, and statistical inference in general, are developing to include information from open-source internet data. An early example was the inclusion of search engine query data from Google Flu Trends as a predictor for the outcome of the number of reported physician visits for flu-like illnessesFootnote 56. The resulting model was then used to predict the number of seasonal influenza cases one to two weeks into the future; however, this approach was not as effective in predicting outbreaks outside of the traditional flu season because of associations being identified with search query trends not related to seasonal influenza (e.g. winter basketball season)Footnote 57. Subsequent work improved the accuracy of predicting seasonal influenza trends by using additional sources of open-source data (e.g. Twitter) and expanding the regression method to benefit from ML algorithms that can find complex associations among the outcome and explanatory variablesFootnote 58. Furthermore, regression modelling for the risk of infection has improved by including, in addition to open-source internet data, additional explanatory variables (e.g. climate and meteorological data from satellite imagery) that account for the presence, movement and distribution of pathogens, vectors, reservoir populations and infected peopleFootnote 59Footnote 60. For example, in China, the expected number of cases of hand, foot and mouth disease in children was best predicted by including data on weekly temperature and precipitation as well as data on hand, foot and mouth disease-related queries from the Chinese Baidu search engineFootnote 61.
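
As a rough illustration of regression with climate and search-query predictors, the Python sketch below fits an ordinary least squares model of weekly case counts on temperature, precipitation and query volume. Ordinary least squares is chosen only for simplicity; it is not the model used in the studies cited above. The data are synthetic, generated from known coefficients so that the fit can be checked, and every name in the block is hypothetical.

```python
def fit_ols(X, y):
    """Ordinary least squares with an intercept, via the normal equations (X'X)b = X'y."""
    rows = [[1.0] + list(r) for r in X]  # prepend intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]

# Synthetic weekly predictors: (temperature in °C, precipitation in mm, query volume).
weeks = [
    (20, 10, 100), (22, 0, 150), (25, 30, 120),
    (18, 5, 80), (27, 12, 200), (24, 20, 160),
]
# Synthetic outcome built from assumed coefficients (intercept 5, then 2, 0.5, 0.1).
cases = [5 + 2 * t + 0.5 * p + 0.1 * q for t, p, q in weeks]

coefficients = fit_ols(weeks, cases)
print([round(c, 3) for c in coefficients])
```

Because the outcome is generated without noise, the fit recovers the assumed coefficients; with real surveillance data the estimates would carry uncertainty, and count outcomes are often modelled with Poisson or negative binomial regression instead.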

Another dominant risk modelling approach is the use of compartmental models to mathematically simulate transmission dynamics of a population; that is, the flow of individuals among health states, such as susceptible (S), infectious (I) and recovered (R). For example, SIR models require defining parameters for the recovery rate (the inverse of the infectious period) and the rate of infectious contacts. It is then possible to estimate whether an infection introduced into a population will cause an epidemic, and to characterize the prevalence of the disease over time. The compartmental modelling approach has more recently developed to simulate transmission dynamics among multiple populations (meta-populations). This requires the inclusion of mobility data to define the rate of individuals moving among populationsFootnote 62. Human mobility at a meta-population level can be considered as the movement of people in a connected network of cities and countries. These data can be obtained from mobile phone call records and air traffic passenger volumesFootnote 63Footnote 64. Through meta-population modelling, it is possible to identify the travel routes through which pathogens may spread or be carried to Canada, as well as to determine the likelihood of these eventsFootnote 65Footnote 66. For example, the Zika virus is estimated to have been introduced into Brazil between August 2013 and April 2014 by infected travellers entering the country at Rio de Janeiro, Brasilia, Fortaleza and/or Salvador; this introduction was followed by epidemics in Haiti, Honduras, Venezuela and then ColombiaFootnote 21.
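
The compartmental approach can be sketched in a few lines of Python. The block below is a minimal deterministic SIR model extended to several populations linked by a mobility matrix, integrated with simple Euler steps. The parameter values (contact rate `beta`, recovery rate `gamma`, per-day movement rates) and population sizes are illustrative assumptions, not estimates from the studies cited above.

```python
def basic_reproduction_number(beta, gamma):
    """R0 for the simple SIR model; an epidemic is expected when R0 > 1."""
    return beta / gamma

def simulate_sir(S, I, R, beta, gamma, mobility, days, dt=0.1):
    """Euler simulation of SIR dynamics in n populations.

    mobility[i][j] is the per-day rate at which individuals move from
    population i to population j. Returns the infectious counts per
    population at every time step.
    """
    n = len(S)
    S, I, R = list(S), list(I), list(R)
    infected_history = [list(I)]

    def net_flow(X):
        # Arrivals from every other population minus departures to them.
        return [
            sum(mobility[j][i] * X[j] - mobility[i][j] * X[i] for j in range(n))
            for i in range(n)
        ]

    for _ in range(round(days / dt)):
        N = [S[i] + I[i] + R[i] for i in range(n)]
        infections = [beta * S[i] * I[i] / N[i] if N[i] > 0 else 0.0 for i in range(n)]
        recoveries = [gamma * I[i] for i in range(n)]
        flow_S, flow_I, flow_R = net_flow(S), net_flow(I), net_flow(R)
        S = [S[i] + dt * (flow_S[i] - infections[i]) for i in range(n)]
        I = [I[i] + dt * (infections[i] - recoveries[i] + flow_I[i]) for i in range(n)]
        R = [R[i] + dt * (recoveries[i] + flow_R[i]) for i in range(n)]
        infected_history.append(list(I))
    return infected_history

# Two hypothetical cities; the infection starts in the first one only.
two_city = simulate_sir(
    S=[999.0, 1000.0], I=[1.0, 0.0], R=[0.0, 0.0],
    beta=0.5, gamma=0.2,
    mobility=[[0.0, 0.01], [0.01, 0.0]],
    days=200,
)
print(basic_reproduction_number(0.5, 0.2))
print(max(step[1] for step in two_city) > 1.0)
```

With a single population, passing `mobility=[[0.0]]` reduces this to the classic SIR model. In the two-city run, the second city has no initial cases and becomes infected only through movement from the first, which is the mechanism meta-population models use to trace possible importation routes.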


There is uncertainty as to how climate change will affect the many factors related to the occurrence and spread of infectious diseases. These factors will undoubtedly include changes to the distributions of vector and reservoir populations, and changes to the mobility of people and goods and potential transport of pathogens, with subsequent impacts on exposure and transmission risks. To monitor infectious disease outbreaks in an effective and timely manner, public health professionals need better access to up-to-date surveillance data. To achieve this, conventionally obtained data, such as those from existing notifiable disease reporting surveillance systems, are increasingly being augmented by EBS systems. The EBS systems are benefiting from ML and NLP methods to more fully exploit the available data; however, challenges remainFootnote 59. There are issues of data sharing and privacy that need to be resolved. For example, at what level can personal data be used and disclosed in the detection of health-related events? Both Google and Twitter provide their data freely to the public, aggregated at best to the week and city level; however, more precise information on the timing and location of the source would enable more comprehensive event detectionFootnote 26. Also, there are differences in where and how people use the internet and social media around the world: there are gaps in internet and mobile phone use in AfricaFootnote 67; Baidu, rather than Google, is the predominant search engine in ChinaFootnote 61; and the propensity of people using Twitter to report illnesses is dependent on age and socioeconomic statusFootnote 68.

Risk modelling provides a means of estimating the health impacts of emerging infectious diseases. Advances in risk modelling approaches include integrating open-source internet and climate data to inform these estimates, and accounting for the role of human mobility in spreading infectious diseases globally. As with EBS systems, risk modelling approaches are limited by the availability of the data that can be obtained. For example, mobile phone call records and air traffic data provide information to the nearest cell phone tower and airport respectively, but more precise location data are available, privacy concerns permitting, through the global positioning system in mobile phones. Information at the individual level could greatly increase our understanding of the factors affecting disease occurrence and pathogen spread, for example, the role of certain individuals in driving the 2003 SARS outbreakFootnote 69.


Changes to vector and reservoir populations and human activity, and their impacts on infectious diseases, are now being monitored by a number of different surveillance and analytical strategies. Event-based surveillance systems use open-source data to gather information relating to infectious diseases. These systems can be moderated, partially moderated or fully automated, and each type of system has advantages and disadvantages. There is a growing trend towards automation because of the ability to process high volumes of data, and the accuracy of ML and NLP methods in identifying events is improving and may one day surpass the ability of human moderators. Risk modelling to understand and predict the health impacts of infectious diseases is commonly performed using statistical inference and compartmental modelling approaches. These methods are advancing the ability to identify populations at risk from emerging diseases, forecast health impacts and determine pathways of disease spread by integrating open-source internet data and human mobility data, along with more traditional variables from climate data and infectious disease outbreak data. The methods we have presented here are promising new developments that will increase our capacity to deal with evolving disease threats as the climate changes. Having more information (and more accurate information) sooner will make it possible for public health professionals to confirm and evaluate potential infectious disease outbreaks faster and thus to develop and commence treatment and other mitigation strategies in a more timely fashion.

Authors’ statement

  • EER — Conceptualization, investigation, writing—original draft, supervision and project administration
  • VN — Investigation, writing—review and editing
  • PG — Writing—review and editing
  • AM — Writing—review and editing
  • DM — Writing—review and editing
  • JP — Writing—review and editing
  • DY — Writing—review and editing
  • JP — Writing—review and editing
  • JK — Investigation, writing—review and editing

Conflict of interest



This work was supported by the Public Health Agency of Canada.
