Natural language processing (NLP): a subfield of artificial intelligence

Challenges and opportunities for public health made possible by advances in natural language processing

Published by: The Public Health Agency of Canada
Issue: Volume 46–6: Artificial intelligence in public health
Date published: June 4, 2020
ISSN: 1481-8531

Overview

Oliver Baclic¹, Matthew Tunis¹, Kelsey Young¹, Coraline Doan², Howard Swerdfeger², Justin Schonfeld³

Affiliations

¹ Centre for Immunization and Respiratory Infectious Disease, Public Health Agency of Canada, Ottawa, ON

² Data, Partnerships and Innovation Hub, Public Health Agency of Canada, Ottawa, ON

³ National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, MB

Correspondence

oliver.baclic@canada.ca, justin.schonfeld@canada.ca

Suggested citation

Baclic O, Tunis M, Young K, Doan C, Swerdfeger H, Schonfeld J. Challenges and opportunities for public health made possible by advances in natural language processing. Can Commun Dis Rep 2020;46(6):161–8. https://doi.org/10.14745/ccdr.v46i06a02

Keywords: natural language processing, NLP, artificial intelligence, machine learning, public health

Abstract

Natural language processing (NLP) is a subfield of artificial intelligence devoted to the understanding and generation of language. Recent advances in NLP technologies are enabling rapid analysis of vast amounts of text, thereby creating opportunities for health research and evidence-informed decision making. The analysis of, and data extraction from, scientific literature, technical reports, health records, social media, surveys, registries and other documents can support core public health functions, including the enhancement of existing surveillance systems (e.g. through faster identification of diseases and risk factors/at-risk populations), disease prevention strategies (e.g. through more efficient evaluation of the safety and effectiveness of interventions) and health promotion efforts (e.g. by providing the ability to obtain expert-level answers to any health-related question). NLP is emerging as an important tool that can assist public health authorities in decreasing the burden of health inequality/inequity in the population. The purpose of this paper is to provide some notable examples of both the potential applications and challenges of NLP use in public health.

Introduction

There is a growing interest in deploying artificial intelligence (AI) strategies to achieve public health outcomes, particularly in response to the global coronavirus disease 2019 (COVID-19) pandemic where novel datasets, surveillance tools and models are emerging very quickly.

The objective of this manuscript is to provide a framework for considering natural language processing (NLP) approaches to public health based on historical applications. This overview includes a brief introduction to AI and NLP, suggests opportunities where NLP can be applied to public health problems and describes the challenges of applying NLP in a public health context. Particular articles were chosen to emphasize the breadth of potential applications for NLP in public health as well as the considerable challenges and risks inherent in incorporating AI/NLP into public health analysis and decision support.

Artificial intelligence and natural language processing

AI research has produced models that can interpret a radiograph^{Footnote 1}^{Footnote 2}, detect irregular heartbeats using a smartwatch^{Footnote 3}, automatically identify reports of infectious disease in the media^{Footnote 4}, ascertain cardiovascular risk factors from retinal images^{Footnote 5} and find new targets for existing medications^{Footnote 6}^{Footnote 7}. The success of these models comes from training on hundreds, thousands and sometimes millions of controlled, labelled and structured data points^{Footnote 8}. The capacity of AI to provide constant, tireless and rapid analyses of data offers the potential to transform society’s approach to promoting health and preventing and managing diseases. AI systems have the potential to “read” and triage all of the approximately 1.3 million research articles indexed by PubMed each year^{Footnote 9}; “examine” comments from 1.5 billion Facebook users or “monitor”, on a daily basis, 500 million tweets of people struggling with mental illness, foodborne illness or the flu^{Footnote 10}^{Footnote 11}; and simultaneously interact with each and every person seeking answers to their health questions, concerns, problems and challenges^{Footnote 12}.

NLP is a subfield of AI that is devoted to developing algorithms and building models capable of using language in the same way humans do^{Footnote 13}. It is routinely used in virtual assistants like “Siri” and “Alexa” or in Google searches and translations. NLP provides the ability to analyze and extract information from unstructured sources, automate question answering and conduct sentiment analysis and text summarization^{Footnote 8}. With natural language (communication) being the primary means of knowledge collection and exchange in public health and medicine, NLP is the key to unlocking the potential of AI in biomedical sciences.

Most modern NLP platforms are built on models refined through machine learning techniques^{Footnote 14}^{Footnote 15}. Machine learning techniques are based on four components: a model; data; a loss function, which is a measure of how well the model fits the data; and an algorithm for training (improving) the model^{Footnote 16}. Recent breakthroughs in these areas have led to vastly improved NLP models that are powered by deep learning, a subfield of machine learning^{Footnote 17}.
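To make the four components concrete, the following minimal Python sketch trains a one-feature logistic regression. It is an illustration only: the toy data, the learning rate and all function names are assumptions made for this example and are not taken from any of the cited systems.

```python
import math

# Data: (feature, label) pairs. Toy values invented for illustration only.
DATA = [(0.0, 0), (1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1), (5.0, 1)]


def model(x, w, b):
    """Model: logistic regression, returns the estimated P(label = 1 | x)."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def loss(w, b, data):
    """Loss function: mean cross-entropy; lower values mean a better fit."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in data:
        p = model(x, w, b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


def train(data, steps=2000, lr=0.1):
    """Training algorithm: batch gradient descent on the cross-entropy loss."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        grad_w = sum((model(x, w, b) - y) * x for x, y in data) / n
        grad_b = sum((model(x, w, b) - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w, b = train(DATA)
print("loss before training:", loss(0.0, 0.0, DATA))
print("loss after training: ", loss(w, b, DATA))
```

Deep learning systems follow the same pattern, but replace the two-parameter model with a neural network and the six data points with very large corpora.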

Innovation in the different types of models, such as recurrent neural network-based models (RNN), convolutional neural network-based models (CNN) and attention-based models, has allowed modern NLP systems to capture and model more complex linguistic relationships and concepts than simple word presence (i.e. keyword search)^{Footnote 18}. This effort has been aided by vector-embedding approaches, which preprocess the data by encoding words as vectors before they are fed into a model. These approaches recognize that words exist in context (e.g. the meanings of “patient,” “shot” and “virus” vary depending on context) and treat them as points in a conceptual space rather than isolated entities. The performance of the models has also been improved by the advent of transfer learning, that is, taking a model trained to perform one task and using it as the starting model for training on a related task. Hardware advancements and increases in freely available annotated datasets have also boosted the performance of NLP models. New evaluation tools and benchmarks, such as GLUE, SuperGLUE and BioASQ, are helping to broaden our understanding of the type and scope of information these new models can capture^{Footnote 19}^{Footnote 20}^{Footnote 21}.
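The idea of treating words as points in a conceptual space can be sketched in a few lines of Python. The three-dimensional vectors below are hand-written and purely hypothetical; real embedding models learn vectors with hundreds of dimensions from large text corpora, and contextual models assign a different vector to each occurrence of a word.

```python
import math

# Hypothetical 3-dimensional embeddings, hand-written for illustration.
# They are NOT the output of any trained model.
EMBEDDINGS = {
    "vaccine": [0.9, 0.1, 0.0],
    "shot": [0.8, 0.2, 0.1],
    "virus": [0.1, 0.9, 0.0],
    "influenza": [0.2, 0.8, 0.1],
    "survey": [0.0, 0.1, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word, embeddings=EMBEDDINGS):
    """Return the other word whose vector is closest to that of `word`."""
    target = embeddings[word]
    candidates = [w for w in embeddings if w != word]
    return max(candidates, key=lambda w: cosine_similarity(target, embeddings[w]))


print(nearest("shot"))
print(nearest("virus"))
```

With these toy vectors, “shot” lies closest to “vaccine” and “virus” lies closest to “influenza”, which is the kind of relationship a keyword search cannot express.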

Opportunities

Public health aims to achieve optimal health outcomes within and across different populations, primarily by developing and implementing interventions that target modifiable causes of poor health^{Footnote 22}^{Footnote 23}^{Footnote 24}^{Footnote 25}^{Footnote 26}. Success depends on the ability to effectively quantify the burden of disease or disease risk factors in the population and subsequently identify groups that are disproportionately affected or at-risk; identify best practices (i.e. optimal prevention or therapeutic strategies); and measure outcomes^{Footnote 27}. This evidence-informed model of decision making is best represented by the PICO concept (patient/problem, intervention/exposure, comparison, outcome). PICO provides an optimal knowledge identification strategy to frame and answer specific clinical or public health questions^{Footnote 28}. Evidence-informed decision making is typically founded on the comprehensive and systematic review and synthesis of data in accordance with the PICO framework elements.
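As a rough illustration of how the PICO elements can frame a literature search, the sketch below stores a question as a small Python data structure and turns it into a generic Boolean search string. The class name, the method and the example question are hypothetical, and the query syntax is not that of any particular bibliographic database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PICOQuestion:
    """One question framed by the four PICO elements."""

    population: str
    intervention: str
    comparison: str
    outcome: str

    def to_boolean_query(self) -> str:
        """Join the non-empty elements with AND, quoting each phrase."""
        parts = [self.population, self.intervention, self.comparison, self.outcome]
        return " AND ".join(f'"{part}"' for part in parts if part)


# Hypothetical example question, invented for illustration.
question = PICOQuestion(
    population="adults 65 years and older",
    intervention="influenza vaccination",
    comparison="no vaccination",
    outcome="hospitalization",
)
print(question.to_boolean_query())
```

An NLP pipeline would go further than exact phrase matching, for example by recognizing synonyms of each element, but the four-field structure stays the same.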

Today, information is being produced and published (e.g. scientific literature, technical reports, health records, social media, surveys, registries and other documents) at unprecedented rates. By providing the ability to rapidly analyze large amounts of unstructured or semistructured text, NLP has opened up immense opportunities for text-based research and evidence-informed decision making^{Footnote 29}^{Footnote 30}^{Footnote 31}^{Footnote 32}^{Footnote 33}^{Footnote 34}. NLP is emerging as a potentially powerful tool for supporting the rapid identification of populations, interventions and outcomes of interest that are required for disease surveillance, disease prevention and health promotion. For example, NLP platforms that are able to detect particular features of individuals (population/problem, e.g. a medical condition or a predisposing biological, behavioural, environmental or socioeconomic risk factor) in unstructured medical records or social media text can be used to enhance existing surveillance systems with real-world evidence. One recent study demonstrated the ability of NLP methods to predict the presence of depression prior to its appearance in the medical record^{Footnote 35}. The ability to conduct real-time text mining of scientific publications for a particular PICO concept gives decision makers the opportunity to rapidly provide recommendations on disease prevention or management that are informed by the most current body of evidence when timely guidance is essential, such as during an outbreak. NLP-powered question-answering platforms and chatbots also carry the potential to improve health promotion activities by engaging individuals and providing personalized support or advice. Table 1 provides examples of potential applications of NLP in public health that have demonstrated at least some success.

Table 1: Examples of existing and potential applications of natural language processing in public health
Type of activity	Public health objective	Example of NLP use
Identification of at-risk populations or conditions of interest	To continuously measure the incidence and prevalence of diseases and disease risk factors (i.e. surveillance)	Analysis of unstructured or semistructured text from electronic health records or social media^{Footnote 36}^{Footnote 37}^{Footnote 38}^{Footnote 39}^{Footnote 40}^{Footnote 41}^{Footnote 42}
Identification of at-risk populations or conditions of interest	To identify vulnerable and at-risk populations	Analysis of risk behaviours using social media^{Footnote 43}^{Footnote 44}^{Footnote 45}
Identification of health interventions	To develop optimal recommendations/interventions	Automated systematic review and analysis of the information contained in scientific publications and unpublished data^{Footnote 46}^{Footnote 47}^{Footnote 48}^{Footnote 49}^{Footnote 50}
Identification of health interventions	To identify best practices	Identification of promising public health interventions through analysis of online grey and peer reviewed literature^{Footnote 51}
Identification of health outcomes using real-world evidence	To evaluate the benefits of health interventions	Analysis of unstructured or semistructured text from electronic health records, online media and publications to determine the impact of public health recommendations and interventions^{Footnote 52}^{Footnote 53}
Identification of health outcomes using real-world evidence	To identify unintended adverse outcomes related to interventions	Analysis of unstructured or semistructured text from electronic health records, social media and publications to identify potential adverse events of interventions^{Footnote 54}^{Footnote 55}^{Footnote 56}^{Footnote 57}^{Footnote 58}
Knowledge generation and translation	To support public health research	Analysis and extraction of information from electronic health records and scientific publications for knowledge generation^{Footnote 59}^{Footnote 60}^{Footnote 61}^{Footnote 62}
Knowledge generation and translation	To support evidence-informed decision making	Use of chatbots, question/answer systems and text summarizers to provide personalized information to individuals seeking advice to improve their health and prevent disease^{Footnote 63}^{Footnote 64}^{Footnote 65}
Environmental scanning and situational awareness	To conduct public health risk assessments and provide situational awareness	Analysis of online content for real-time critical event detection and mitigation^{Footnote 66}^{Footnote 67}^{Footnote 68}^{Footnote 69}^{Footnote 70}
Environmental scanning and situational awareness	To monitor activities that may have an impact on public health decision making	Analysis of decisions of international and national stakeholders^{Footnote 71}
Abbreviation: NLP, natural language processing
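The first row of Table 1, analysis of unstructured text for surveillance, can be illustrated with a deliberately simple rule-based sketch. It is not one of the machine learning methods cited above: the lexicon, the negation cues and the three-token window are all assumptions chosen for this example.

```python
import re

# Hypothetical lexicon; a real system would use a curated vocabulary.
SYMPTOM_TERMS = {"fever", "cough", "vomiting", "diarrhea"}
NEGATION_CUES = {"no", "denies", "without"}


def extract_symptoms(note):
    """Return the symptom terms mentioned in `note` that are not negated.

    A term is treated as negated when a negation cue occurs within the
    three tokens immediately before it (a crude heuristic).
    """
    tokens = re.findall(r"[a-z]+", note.lower())
    found = set()
    for i, token in enumerate(tokens):
        if token in SYMPTOM_TERMS:
            window = tokens[max(0, i - 3):i]
            if not NEGATION_CUES.intersection(window):
                found.add(token)
    return found


print(extract_symptoms("Patient reports fever and cough. Denies vomiting."))
```

The heuristic fails on phrases such as “no fever but cough”, where the cue for “fever” is wrongly applied to “cough”. Errors of this kind are one reason why trained models tend to replace hand-written rules.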

Challenges

Despite the recent advances, barriers to widespread use of NLP technologies remain.

Similar to other AI techniques, NLP is highly dependent on the availability, quality and nature of the training data^{Footnote 72}. Access to, and the availability of, appropriately annotated datasets (to make effective use of supervised or semi-supervised learning) are fundamental for training and implementing robust NLP models. For example, the development and use of algorithms that are able to conduct a systematic synthesis of published research on a particular topic, or to analyze and extract data from electronic health records, require unrestricted access to publisher or primary care/hospital databases. While the number of freely accessible biomedical datasets and pre-trained models has been increasing in recent years, the availability of those dealing with public health concepts remains limited^{Footnote 73}.

The ability to de-bias data (i.e. to inspect, explain and ethically adjust data) represents another major consideration for the training and use of NLP models in public health settings. Failing to account for biases in the development (e.g. data annotation), deployment (e.g. use of pre-trained platforms) and evaluation of NLP models could compromise the model outputs and reinforce existing health inequity^{Footnote 74}. However, it is important to note that even when datasets and evaluations are adjusted for biases, this does not guarantee an equal impact across morally relevant strata. For example, use of health data available through social media platforms must take into account the specific age and socioeconomic groups that use them. A monitoring system trained on data from Facebook is likely to be biased towards health data and linguistic quirks specific to a population older than one trained on data from Snapchat^{Footnote 75}. Recently, many model-agnostic tools have been developed to assess and correct unfairness in machine learning models in accordance with the efforts by the government and academic communities to define unacceptable AI development^{Footnote 76}^{Footnote 77}^{Footnote 78}^{Footnote 79}^{Footnote 80}^{Footnote 81}.
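One simple, model-agnostic check is to compare a model's performance across population strata. The sketch below computes recall (the share of true cases that the model detects) per group and reports the largest gap. The group labels and records are invented for illustration, and the recall gap is only one of many possible fairness measures, not the method of any tool cited above.

```python
from collections import defaultdict


def recall_by_group(records):
    """Compute recall per group.

    `records` is an iterable of (group, true_label, predicted_label)
    tuples, with labels coded as 0 or 1. Groups with no true cases
    are omitted because recall is undefined for them.
    """
    true_positives = defaultdict(int)
    actual_positives = defaultdict(int)
    for group, y_true, y_pred in records:
        if y_true == 1:
            actual_positives[group] += 1
            if y_pred == 1:
                true_positives[group] += 1
    return {g: true_positives[g] / actual_positives[g] for g in actual_positives}


def recall_gap(records):
    """Difference between the best-served and worst-served groups."""
    recalls = recall_by_group(records)
    return max(recalls.values()) - min(recalls.values())


# Hypothetical predictions: four true cases in each age group.
RECORDS = [
    ("18-29", 1, 1), ("18-29", 1, 1), ("18-29", 1, 1), ("18-29", 1, 0),
    ("65+", 1, 1), ("65+", 1, 0), ("65+", 1, 0), ("65+", 1, 0),
    ("18-29", 0, 0), ("65+", 0, 0),
]
print(recall_by_group(RECORDS))
print(recall_gap(RECORDS))
```

In this toy data the model detects three of four cases in the younger group but only one of four in the older group, a disparity that an overall accuracy figure would hide.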

Currently, one of the biggest hurdles for further development of NLP systems in public health is limited data access^{Footnote 82}^{Footnote 83}. Within Canada, health data are generally controlled regionally and, due to security and confidentiality concerns, there is reluctance to provide unhindered access to these systems and their integration with other datasets (e.g. data linkage). There have also been challenges with public perception of privacy and data access. A recent survey of social media users found that the majority considered analysis of their social media data to identify mental health issues “intrusive and exposing” and they would not consent to this^{Footnote 84}.

Before key NLP public health activities can be realized at scale, such as the real-time analysis of national disease trends, jurisdictions will need to jointly determine a reasonable scope and access to public health–relevant data sources (e.g. health record and administrative data). In order to prevent privacy violations and data misuse, future applications of NLP in the analysis of personal health data are contingent on the ability to embed differential privacy into models^{Footnote 85}, both during training and postdeployment. Access to important data is also limited through the current methods for accessing full text publications. Realization of fully automated PICO-specific knowledge extraction and synthesis will require unrestricted access to journal databases or new models of data storage^{Footnote 86}.
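To give a sense of what differential privacy involves, the sketch below applies the Laplace mechanism to a single released count. This is a simpler setting than the one described above, where privacy is embedded in model training, and it is shown only to illustrate the principle of adding calibrated noise. The function names and the example values are assumptions, and the code is not hardened for production use.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise.

    The difference of two independent Exponential(1) draws follows a
    Laplace(0, 1) distribution, which is then rescaled.
    """
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_count(true_count, epsilon, rng=None):
    """Release a count with epsilon-differential privacy.

    Adding or removing one person changes a count by at most 1
    (sensitivity 1), so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical example: 100 records mention a condition of interest.
print(private_count(100, epsilon=1.0, rng=random.Random(0)))
```

A smaller epsilon gives stronger privacy but noisier counts, so the choice of epsilon is a policy decision as much as a technical one.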

Finally, as with any new technology, consideration must be given to the assessment and evaluation of NLP models to ensure that they are working as intended, accounting for bias and keeping pace with society’s changing ethical views^{Footnote 87}. Although today many approaches are posting equivalent or better-than-human scores on textual analysis tasks, it is important not to equate high scores with true language understanding. It is, however, equally important not to view a lack of true language understanding as a lack of usefulness. Models with a “relatively poor” depth of understanding can still be highly effective at information extraction, classification and prediction tasks, particularly with the increasing availability of labelled data.

Natural language processing and coronavirus disease 2019 (COVID-19)

With the emergence of COVID-19, NLP has taken a prominent role in the outbreak response efforts^{Footnote 88}^{Footnote 89}. NLP has been rapidly employed to analyze the vast quantity of textual information that has been made available through unrestricted access to peer-reviewed journals, preprints and digital media^{Footnote 90}. NLP has been widely used to support the medical and scientific communities in finding answers to key research questions, summarizing evidence, answering questions, tracking misinformation and monitoring population sentiment^{Footnote 91}^{Footnote 92}^{Footnote 93}^{Footnote 94}^{Footnote 95}^{Footnote 96}^{Footnote 97}.

Conclusion

NLP is creating extraordinary opportunities to improve evidence-informed decision making in public health. We anticipate that broader applications of NLP will lead to the creation of more efficient surveillance systems that are able to identify diseases and at-risk conditions in real time. Similarly, with an ability to analyze and synthesize large volumes of information almost instantaneously, NLP is expected to facilitate targeted health promotion and disease prevention activities, potentially leading to population-wide disease reduction and greater health equity. However, these opportunities are not without risks: biased models, biased data, loss of data privacy and the need to maintain and update models to reflect the evolving language and context of public communication are all existing challenges that will need to be addressed. We encourage the public health and computer science communities to collaborate in order to mitigate these risks and to ensure that public health practice does not fall behind in these technologies or miss opportunities for health promotion and disease surveillance and prevention in this rapidly evolving landscape.

Authors’ statement

OB — Writing – original draft, review & editing and conceptualization
MT — Writing – original draft, review & editing and conceptualization
KY — Writing – review & editing, and conceptualization
CD — Writing – review & editing
HS — Writing – review & editing
JS — Writing – original draft, review & editing and conceptualization

Conflict of interest

None.

Acknowledgements

We thank J Nash and J Robertson who were kind enough to offer feedback and suggestions.

Funding

This work is supported by the Public Health Agency of Canada. The research undertaken by JS was funded by the Canadian federal government’s Genomic Research and Development Initiative.

References

Footnote 1

Majkowska A, Mittal S, Steiner DF, Reicher JJ, McKinney SM, Duggan GE, Eswaran K, Cameron Chen PH, Liu Y, Kalidindi SR, Ding A, Corrado GS, Tse D, Shetty S. Chest radiograph interpretation with deep learning models: assessment with radiologist-adjudicated reference standards and population-adjusted evaluation. Radiology 2020;294(2):421–31. https://doi.org/10.1148/radiol.2019191293

Footnote 2

Liu X, Faes L, Kale A, Wagner SK, Fu DJ, Bruynseels A, Mahendiran T, Moraes G, Shamdas M, Kern C, Ledsam JR, Schmid MK, Balaskas K, Topol EJ, Bachmann LM, Keane PA, Denniston AK. A comparison of deep learning performance against health care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. Lancet Digital Health 2019. https://doi.org/10.1016/S2589-7500(19)30123-2

Footnote 3

Perez MV, Mahaffey KW, Hedlin H, Rumsfeld JS, Garcia A, Ferris T, Balasubramanian V, Russo AM, Rajmane A, Cheung L, Hung G, Lee J, Kowey P, Talati N, Nag D, Gummidipundi SE, Beatty A, Hills MT, Desai S, Granger CB, Desai M, Turakhia MP; Apple Heart Study Investigators. Large-scale assessment of a smartwatch to identify atrial fibrillation. N Engl J Med 2019;381(20):1909–17. https://doi.org/10.1056/NEJMoa1901183

Footnote 4

Feldman J, Thomas-Bachli A, Forsyth J, Patel ZH, Khan K. Development of a global infectious disease activity database using natural language processing, machine learning, and human expertise. J Am Med Inform Assoc 2019;26(11):1355–9. https://doi.org/10.1093/jamia/ocz112

Footnote 5

Poplin R, Varadarajan AV, Blumer K, Liu Y, McConnell MV, Corrado GS, Peng L, Webster DR. Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning. Nat Biomed Eng 2018;2(3):158–64. https://doi.org/10.1038/s41551-018-0195-0

Footnote 6

Vamathevan J, Clark D, Czodrowski P, Dunham I, Ferran E, Lee G, Li B, Madabhushi A, Shah P, Spitzer M, Zhao S. Applications of machine learning in drug discovery and development. Nat Rev Drug Discov 2019;18(6):463–77. https://doi.org/10.1038/s41573-019-0024-5

Footnote 7

Corsello SM, Nagari RT, Spangler RD, Rossen J, Kocak M, Bryan JG, Humeidi R, Peck D, Wu X, Tang AA, Wang VM, Bender SA, Lemire E, Narayan R, Montgomery P, Ben-David U, Garvie CW, Chen Y, Rees MG, Lyons NJ, McFarland JM, Wong BT, Wang L, Dumont N, O’Hearn PJ, Stefan E, Doench JG, Harrington CN, Greulich H, Meyerson M, Vazquez F, Subramanian A, Roth JA, Bittker JA, Boehm JS, Mader CC, Tsherniak A, Golub TR. Discovering the anticancer potential of non-oncology drugs by systematic viability profiling. Nat Can 2020;1:235–48. https://doi.org/10.1038/s43018-019-0018-6

Footnote 8

Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med 2019;25(1):44–56. https://doi.org/10.1038/s41591-018-0300-7

Footnote 9

MEDLINE Production Statistics. Bethesda (MD): U.S. National Library of Medicine (updated 2019-11-19; accessed 2020-01-27). https://www.nlm.nih.gov/bsd/medline_pubmed_production_stats.html

Footnote 10

Twitter usage statistics. Internet LiveStats.com (updated 2013-08-16; accessed 2020-01-27). https://www.internetlivestats.com/twitter-statistics/

Footnote 11

Searching for health. Google News Lab, Schema; 2017 (accessed 2020-01-27). https://googlenewslab.gistapp.com/searching-for-health

Footnote 12

Friedman C, Elhadad N. Natural language processing in health care and biomedicine. In: Shortliffe E, Cimino J, editors. Biomed Informatics. London: Springer; 2014. https://doi.org/10.1007/978-1-4471-4474-8_8

Footnote 13

Ruder S. NLP-progress. London (UK): Sebastian Ruder (accessed 2020-01-18). https://nlpprogress.com/

Footnote 14

Jurafsky D, Martin JH. Speech and language processing. Stanford (CA): Stanford University; 2019 (updated 2019-11-16; accessed 2020-01-18). https://web.stanford.edu/~jurafsky/slp3/

Footnote 15

Nadkarni PM, Ohno-Machado L, Chapman WW. Natural language processing: an introduction. J Am Med Inform Assoc 2011;18(5):544–51. https://doi.org/10.1136/amiajnl-2011-000464

Footnote 16

Nilsson N. Introduction to machine learning. Stanford (CA): Robotic Library, Department of Computer Science, Stanford University; 1998. http://robotics.stanford.edu/people/nilsson/MLBOOK.pdf

Footnote 17

Zhou M, Duan N, Liu S, Shum HY. Progress in neural NLP: modeling, learning, and reasoning. Engineering 2020;6(3):275–90. https://doi.org/10.1016/j.eng.2019.12.014

Footnote 18

Tang B, Pan Z, Yin K, Khateeb A. Recent advances of deep learning in bioinformatics and computational biology. Front Genet 2019;10:214. https://doi.org/10.3389/fgene.2019.00214

Footnote 19

Hirschberg J, Manning CD. Advances in natural language processing. Science 2015;349(6245):261–6. https://doi.org/10.1126/science.aaa8685

Footnote 20

Wang A, Singh A, Michael J, Hill F, Levy O, Bowman S. GLUE: a multi-task benchmark and analysis platform for natural language understanding. Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. Brussels (BE): 2018 Nov; p. 353–5. https://doi.org/10.18653/v1/W18-5446

Footnote 21

The Big Bad NLP Database. New York (NY): Quantum Stat; 2020 (updated 2020-01-21; accessed 2020-01-27). https://quantumstat.com/dataset/dataset.html

Footnote 22

Jackson B, Huston P. Advancing health equity to improve health: the time is now. Health Promot Chronic Dis Prev Can 2016;36(2):17–20. https://doi.org/10.24095/hpcdp.36.2.01

Footnote 23

Pan American Health Organization. Just societies: health equity and dignified lives. Report of the Commission of the Pan American Health Organization on Equity and Health Inequalities in the Americas. Washington (DC): Pan American Health Organization; (updated 2019-11; accessed 2020-01-18). https://iris.paho.org/handle/10665.2/51571

Footnote 24

Marmot M, Allen J, Goldblatt P, Boyce T, McNeish D, Grady M, Geddes I; The Marmot Review. Fair society, healthy lives: strategic review of health inequalities in England post-2010. UCL Institute of Health Equity. http://www.parliament.uk/documents/fair-society-healthy-lives-full-report.pdf

Footnote 25

Arcaya MC, Arcaya AL, Subramanian SV. Inequalities in health: definitions, concepts, and theories. Glob Health Action 2015;8:27106. https://doi.org/10.3402/gha.v8.27106

Footnote 26

Public Health Agency of Canada. The Chief Public Health Officer’s report on the state of public health in Canada: addressing health inequalities. Ottawa (ON): Public Health Agency of Canada; 2008. Report No.: HP2-10/2008E. https://www.canada.ca/en/public-health/corporate/publications/chief-public-health-officer-reports-state-public-health-canada/report-on-state-public-health-canada-2008.html

Footnote 27

Ndumbe-Eyoh S, Dyck L, Clement C. Common agenda for public health action on health equity. Antigonish (NS): National Collaborating Centre for Determinants of Health, St Francis Xavier University; 2016. http://nccdh.ca/images/uploads/comments/Common_Agenda_EN.pdf

Footnote 28

Alonso-Coello P, Schünemann HJ, Moberg J, Brignardello-Petersen R, Akl EA, Davoli M, Treweek S, Mustafa RA, Rada G, Rosenbaum S, Morelli A, Guyatt GH, Oxman AD; GRADE Working Group. GRADE Evidence to Decision (EtD) frameworks: a systematic and transparent approach to making well informed healthcare choices. 1: Introduction. BMJ 2016;353:i2016. https://doi.org/10.1136/bmj.i2016

Footnote 29

Kim ES, James P, Zevon ES, Trudel-Fitzgerald C, Kubzansky LD, Grodstein F. Social media as an emerging data resource for epidemiologic research: characteristics of social media users and non-users in the Nurses’ Health Study II. Am J Epidemiol 2020;189(2):156–61. https://doi.org/10.1093/aje/kwz224

Footnote 30

Koleck TA, Dreisbach C, Bourne PE, Bakken S. Natural language processing of symptoms documented in free-text narratives of electronic health records: a systematic review. J Am Med Inform Assoc 2019;26(4):364–79. https://doi.org/10.1093/jamia/ocy173

Footnote 31

Marshall IJ, Wallace BC. Toward systematic review automation: a practical guide to using machine learning tools in research synthesis. Syst Rev 2019;8(1):163. https://doi.org/10.1186/s13643-019-1074-9

Footnote 32

Yin Z, Sulieman LM, Malin BA. A systematic literature review of machine learning in online personal health data. J Am Med Inform Assoc 2019;26(6):561–76. https://doi.org/10.1093/jamia/ocz009

Footnote 33

Kreimeyer K, Foster M, Pandey A, Arya N, Halford G, Jones SF, Forshee R, Walderhaug M, Botsis T. Natural language processing systems for capturing and standardizing unstructured clinical information: A systematic review. J Biomed Inform 2017;73:14–29. https://doi.org/10.1016/j.jbi.2017.07.012

Footnote 34

The Office of the National Coordinator for Health Information Technology. Health IT dashboard: Quick stats. Washington (DC): U.S. Department of Health and Human Services. https://dashboard.healthit.gov/quickstats/quickstats.php

Footnote 35

Harris JK, Mansour R, Choucair B, Olson J, Nissen C, Bhatt J; Centers for Disease Control and Prevention. Health department use of social media to identify foodborne illness - Chicago, Illinois, 2013-2014. MMWR Morb Mortal Wkly Rep 2014;63(32):681–5.

Footnote 36

Gesualdo F, Stilo G, Agricola E, Gonfiantini MV, Pandolfi E, Velardi P, Tozzi AE. Influenza-like illness surveillance on Twitter through automated learning of naïve language. PLoS One 2013;8(12):e82489. https://doi.org/10.1371/journal.pone.0082489

Footnote 37

Eichstaedt JC, Smith RJ, Merchant RM, Ungar LH, Crutchley P, Preoţiuc-Pietro D, Asch DA, Schwartz HA. Facebook language predicts depression in medical records. Proc Natl Acad Sci USA 2018;115(44):11203–8. https://doi.org/10.1073/pnas.1802331115

Footnote 38

Șerban O, Thapen N, Maginnis B, Hankin C, Foot V. Real-time processing of social media with SENTINEL: a syndromic surveillance system incorporating deep learning for health classification. Inf Process Manage 2019;56(3):1166–84. https://doi.org/10.1016/j.ipm.2018.04.011

Footnote 39

Edo-Osagie O, Smith G, Lake I, Edeghere O, De La Iglesia B. Twitter mining using semi-supervised classification for relevance filtering in syndromic surveillance. PLoS One 2019;14(7):e0210689. https://doi.org/10.1371/journal.pone.0210689

Footnote 40

Ford E, Carroll JA, Smith HE, Scott D, Cassell JA. Extracting information from the text of electronic medical records to improve case detection: a systematic review. J Am Med Inform Assoc 2016;23(5):1007–15. https://doi.org/10.1093/jamia/ocv180

Footnote 41

Dorr D, Bejan CA, Pizzimenti C, Singh S, Storer M, Quinones A. Identifying patients with significant problems related to social determinants of health with natural language processing. Stud Health Technol Inform 2019;264:1456–7. https://doi.org/10.3233/SHTI190482

Footnote 42

Carrell DS, Cronkite D, Palmer RE, Saunders K, Gross DE, Masters ET, Hylan TR, Von Korff M. Using natural language processing to identify problem usage of prescription opioids. Int J Med Inform 2015;84(12):1057–64. https://doi.org/10.1016/j.ijmedinf.2015.09.002

Footnote 43

Cacheda F, Fernandez D, Novoa FJ, Carneiro V. Early detection of depression: social network analysis and random forest techniques. J Med Internet Res 2019;21(6):e12554. https://doi.org/10.2196/12554

Footnote 44

Conway M, Hu M, Chapman WW. Recent advances in using natural language processing to address public health research questions using social media and consumer generated data. Yearb Med Inform 2019;28(1):208–17. https://doi.org/10.1055/s-0039-1677918

Footnote 45

Coppersmith G, Dredze M, Harman C. Quantifying mental health signals in Twitter. Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: from linguistic signal to clinical reality. Baltimore (MD); 2014 Jun 27. p. 51–60. https://doi.org/10.3115/v1/W14-3207

Footnote 46

Gates A, Guitard S, Pillay J, Elliott SA, Dyson MP, Newton AS, Hartling L. Performance and usability of machine learning for screening in systematic reviews: a comparative evaluation of three tools. Syst Rev 2019;8(1):278. https://doi.org/10.1186/s13643-019-1222-2

Footnote 47

Przybyła P, Soto AJ, Ananiadou S. Identifying personalised treatments and clinical trials for precision medicine using semantic search with Thalia. Manchester (UK): TREC; 2017. https://www.researchgate.net/publication/323629465_Identifying_Personalised_Treatments_and_Clinical_Trials_for_Precision_Medicine_using_Semantic_Search_with_Thalia

Footnote 48

Bannach-Brown A, Przybyła P, Thomas J, Rice AS, Ananiadou S, Liao J, Macleod MR. Machine learning algorithms for systematic review: reducing workload in a preclinical review of animal studies and reducing human screening error. Syst Rev 2019;8(1):23. https://doi.org/10.1186/s13643-019-0942-7

Footnote 49

Norman C, Leeflang M, Spijker R, Kanoulas E, Névéol A. A distantly supervised dataset for automated data extraction from diagnostic studies. ACL Workshop on Biomedical Natural Language Processing, Florence (IT): 2019 Aug. https://doi.org/10.18653/v1/W19-5012

Footnote 50

Tsafnat G, Glasziou P, Karystianis G, Coiera E. Automated screening of research studies for systematic reviews using study characteristics. Syst Rev 2018;7(1):64. https://doi.org/10.1186/s13643-018-0724-7

Footnote 51

Lerner I, Créquit P, Ravaud P, Atal I. Automatic screening using word embeddings achieved high sensitivity and workload reduction for updating living network meta-analyses. J Clin Epidemiol 2019;108:86–94. https://doi.org/10.1016/j.jclinepi.2018.12.001

Footnote 52

Tucker TC, Durbin EB, McDowell JK, Huang B. Unlocking the potential of population-based cancer registries. Cancer 2019;125(21):3729–37. https://doi.org/10.1002/cncr.32355

Footnote 53

Mohammadhassanzadeh H, Sketris I, Traynor R, Alexander S, Winquist B, Stewart SA. Using natural language processing to examine the uptake, content, and readability of media coverage of a pan-Canadian drug safety research project: cross-sectional observational study. JMIR Form Res 2020;4(1):e13296. https://doi.org/10.2196/13296

Footnote 54

Banerji A, Lai KH, Li Y, Saff RR, Camargo CA Jr, Blumenthal KG, Zhou L. Natural language processing combined with ICD-9-CM codes as a novel method to study the epidemiology of allergic drug reactions. J Allergy Clin Immunol Pract 2020;8(3):1032–1038.e1. https://doi.org/10.1016/j.jaip.2019.12.007

Footnote 55

Young IJ, Luz S, Lone N. A systematic review of natural language processing for classification tasks in the field of incident reporting and adverse event analysis. Int J Med Inform 2019;132:103971. https://doi.org/10.1016/j.ijmedinf.2019.103971

Footnote 56

Henry S, Buchan K, Filannino M, Stubbs A, Uzuner O. 2018 n2c2 shared task on adverse drug events and medication extraction in electronic health records. J Am Med Inform Assoc 2020;27(1):3–12. https://doi.org/10.1093/jamia/ocz166

Footnote 57

Fan B, Fan W, Smith C, Garner H. Adverse drug event detection and extraction from open data: a deep learning approach. Inf Process Manage 2020;57(1):102131. https://doi.org/10.1016/j.ipm.2019.102131

Footnote 58

Yu W, Zheng C, Xie F, Chen W, Mercado C, Sy LS, Qian L, Glenn S, Tseng HF, Lee G, Duffy J, McNeil MM, Daley MF, Crane B, McLean HQ, Jackson LA, Jacobsen SJ. The use of natural language processing to identify vaccine-related anaphylaxis at five health care systems in the Vaccine Safety Datalink. Pharmacoepidemiol Drug Saf 2020;29(2):182–8. https://doi.org/10.1002/pds.4919

Footnote 59

Liu F, Weng C, Yu H. Advancing clinical research through natural language processing on electronic health records: traditional machine learning meets deep learning. In: Richesson RL, Andrews JE, editors. Clinical Research Informatics. Springer International Publishing; 2019. p. 357–78. https://link.springer.com/book/10.1007/978-3-319-98779-8

Footnote 60

Chan L, Beers K, Yau AA, Chauhan K, Duffy Á, Chaudhary K, Debnath N, Saha A, Pattharanitima P, Cho J, Kotanko P, Federman A, Coca SG, Van Vleck T, Nadkarni GN. Natural language processing of electronic health records is superior to billing codes to identify symptom burden in hemodialysis patients. Kidney Int 2020;97(2):383–92. https://doi.org/10.1016/j.kint.2019.10.023

Footnote 61

Juhn Y, Liu H. Artificial intelligence approaches using natural language processing to advance EHR-based clinical research. J Allergy Clin Immunol 2020;145(2):463–9. https://doi.org/10.1016/j.jaci.2019.12.897

Footnote 62

Wang Y, Wang L, Rastegar-Mojarad M, Moon S, Shen F, Afzal N, Liu S, Zeng Y, Mehrabi S, Sohn S, Liu H. Clinical information extraction applications: A literature review. J Biomed Inform 2018;77:34–49. https://doi.org/10.1016/j.jbi.2017.11.011

Footnote 63

Laranjo L, Dunn AG, Tong HL, Kocaballi AB, Chen J, Bashir R, Surian D, Gallego B, Magrabi F, Lau AY, Coiera E. Conversational agents in healthcare: a systematic review. J Am Med Inform Assoc 2018;25(9):1248–58. https://doi.org/10.1093/jamia/ocy072

Footnote 64

Head to health. COVID-19 support. Department of Health; Australian Government (accessed 2020-01-27). https://headtohealth.gov.au/sam-the-chatbot

Footnote 65

Pereira J, Díaz Ó. Using health chatbots for behavior change: a mapping study. J Med Syst 2019;43(5):135. https://doi.org/10.1007/s10916-019-1237-1

Footnote 66

Dion M, AbdelMalik P, Mawudeku A. Big Data and the Global Public Health Intelligence Network (GPHIN). Can Commun Dis Rep 2015;41(9):209–14. https://doi.org/10.14745/ccdr.v41i09a02

Footnote 67

Ghosh S, Chakraborty P, Lewis BL, Majumder M, Cohn E, Brownstein JS, Marathe M, Ramakrishnan N. GELL: Automatic extraction of epidemiological line lists from open sources. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2017 Aug 13-17; Halifax (NS): Association for Computing Machinery; 2017. p. 1477–86. https://doi.org/10.1145/3097983.3098073

Footnote 68

Charles-Smith LE, Reynolds TL, Cameron MA, Conway M, Lau EH, Olsen JM, Pavlin JA, Shigematsu M, Streichert LC, Suda KJ, Corley CD. Using social media for actionable disease surveillance and outbreak management: a systematic literature review. PLoS One 2015;10(10):e0139701. https://doi.org/10.1371/journal.pone.0139701

Footnote 69

Jordan S, Hovet S, Fung I, Liang H, Fu KW, Tsz Ho Tse Z. Using Twitter for public health surveillance from monitoring and prediction to public response. Data (Basel) 2018;4(1):6. https://doi.org/10.3390/data4010006

Footnote 70

Abbood A, Ullrich A, Busche R, Ghozzi S. EventEpi—A natural language processing framework for event-based surveillance. medRxiv 2019;19006395. https://doi.org/10.1101/19006395

Footnote 71

Anglin K. Gather-narrow-extract: A framework for studying local policy variation using web-scraping and natural language processing. J Res Educ Eff 2019;12(4):685–706. https://doi.org/10.1080/19345747.2019.1654576

Footnote 72

Tatman R, Conner K. Effects of talker dialect, gender & race on accuracy of Bing speech and YouTube automatic captions. Proc Interspeech 2017;934–8. https://doi.org/10.21437/Interspeech.2017-1746

Footnote 73

Spasic I, Nenadic G. Clinical text data in machine learning: systematic review. JMIR Med Inform 2020;8(3):e17984. https://doi.org/10.2196/17984

Footnote 74

Rajkomar A, Hardt M, Howell MD, Corrado G, Chin MH. Ensuring fairness in machine learning to advance health equity. Ann Intern Med 2018;169(12):866–72. https://doi.org/10.7326/M18-1990

Footnote 75

Gramlich J. 10 facts about Americans and Facebook. Washington (DC): Pew Research Center (accessed 2020-01-27). https://www.pewresearch.org/fact-tank/2019/05/16/facts-about-americans-and-facebook/

Footnote 76

Xu C, Doshi T. Fairness indicators: scalable infrastructure for fair ML systems. Mountain View (CA): Google (accessed 2020-01-27). https://ai.googleblog.com/2019/12/fairness-indicators-scalable.html

Footnote 77

Holstein K, Vaughan JW, Daumé H, Dudík M, Wallach H. Improving fairness in machine learning systems: What do industry practitioners need? CHI ’19: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. 2019 Paper No.: 600. p. 1–16. https://doi.org/10.1145/3290605.3300830

Footnote 78

Wiens J, Price WN 2nd, Sjoding MW. Diagnosing bias in data-driven algorithms for healthcare. Nat Med 2020;26(1):25–6. https://doi.org/10.1038/s41591-019-0726-6

Footnote 79

Chen IY, Joshi S, Ghassemi M. Treating health disparities with artificial intelligence. Nat Med 2020;26(1):16–7. https://doi.org/10.1038/s41591-019-0649-2

Footnote 80

Montreal Declaration for a Responsible Development of Artificial Intelligence. Forum on the Socially Responsible Development of AI; 2017 Nov 2-3; Montréal (QC) (accessed 2020-01-18). https://www.montrealdeclaration-responsibleai.com/the-declaration

Footnote 81

Treasury Board Secretariat. Directive on automated decision-making. Ottawa (ON): Government of Canada (modified 2019-02-05; accessed 2020-01-27). https://www.tbs-sct.gc.ca/pol/doc-eng.aspx?id=32592

Footnote 82

Friedman C, Rindflesch TC, Corn M. Natural language processing: state of the art and prospects for significant progress, a workshop sponsored by the National Library of Medicine. J Biomed Inform 2013;46(5):765–73. https://doi.org/10.1016/j.jbi.2013.06.004

Footnote 83

Sheikhalishahi S, Miotto R, Dudley JT, Lavelli A, Rinaldi F, Osmani V. Natural language processing of clinical notes on chronic diseases: systematic review. JMIR Med Inform 2019;7(2):e12239. https://doi.org/10.2196/12239

Footnote 84

Ford E, Curlewis K, Wongkoblap A, Curcin V. Public opinions on using social media content to identify users with depression and target mental health care advertising: mixed methods survey. JMIR Ment Health 2019;6(11):e12942. https://doi.org/10.2196/12942

Footnote 85

Radebaugh C, Erlingsson U. Introducing TensorFlow Privacy: learning with differential privacy for training data. Medium.com (accessed 2020-01-27). https://medium.com/tensorflow/introducing-tensorflow-privacy-learning-with-differential-privacy-for-training-data-b143c5e801b6

Footnote 86

Penning de Vries BB, van Smeden M, Rosendaal FR, Groenwold RH. Title, abstract, and keyword searching resulted in poor recovery of articles in systematic reviews of epidemiologic practice. J Clin Epidemiol 2020;121:55–61. https://doi.org/10.1016/j.jclinepi.2020.01.009

Footnote 87

Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science 2019;366(6464):447–53. https://doi.org/10.1126/science.aax2342

Footnote 88

Coronavirus tech handbook: natural language processing. https://coronavirustechhandbook.com/nlp

Footnote 89

COVID-19 Open Research Dataset Challenge (CORD-19): An AI challenge with AI2, CZI, MSR, Georgetown, NIH & The White House. San Francisco (CA): kaggle.com (accessed 2020-01-27). https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge

Footnote 90

PubMed Central. Public Health Emergency COVID-19 Initiative. Bethesda (MD): US National Library of Medicine (accessed 2020-01-27). https://www.ncbi.nlm.nih.gov/pmc/about/covid-19/

Footnote 91

Bullock J, Luccioni A, Pham KH, Lam CS, Luengo-Oroz M. Mapping the landscape of artificial intelligence applications against COVID-19. arXiv:2003.11336 [cs.CY]. https://vectorinstitute.ai/wp-content/uploads/2020/03/arxiv-mappingai.pdf

Footnote 92

Allen Institute for Artificial Intelligence (AI2). CORD-19 Explorer: explore the dataset. https://cord-19.apps.allenai.org/

Footnote 93

Chen E, Lerman K, Ferrara E. COVID-19: the first public coronavirus Twitter dataset. Ithaca (NY): Cornell University (accessed 2020-01-27). https://arxiv.org/abs/2003.07372

Footnote 94

LitCovid. Bethesda (MD): U.S. National Library of Medicine (accessed 2020-01-27). https://www.ncbi.nlm.nih.gov/research/coronavirus/

Footnote 95

Coronafiles: Chatbots take strain off Denmark’s emergency helplines (accessed 2020-01-27). https://sifted.eu/articles/coronafiles-chatbots-helplines/

Footnote 96

Kritikos M. At a glance: scientific foresight: What if we could fight coronavirus with artificial intelligence? Scientific Foresight Unit, European Parliament. https://www.europarl.europa.eu/RegData/etudes/ATAG/2020/641538/EPRS_ATA(2020)641538_EN.pdf

Footnote 97

AI Against COVID-19 Canada. About us. Toronto (ON): CIFAR (accessed 2020-01-27). https://ai-against-covid.ca/

This work is licensed under a Creative Commons Attribution 4.0 International License
