NLP Workshop

Use of Natural Language Processing to Extract Information from Clinical Text

June 15, 2017
8:30 am to 5:00 pm (Registration starts at 7:45 am)

Great Room (Building 31, Room 1503B/C)
FDA White Oak Campus
10903 New Hampshire Ave
Silver Spring, MD 20993



The objective of this workshop is to identify current and emerging natural language processing (NLP) efforts being applied to unstructured text such as clinical notes or narratives in electronic health records (EHRs). The workshop will provide insights into the utility and challenges of designing and implementing NLP systems to capture relevant or missing information from clinical notes or text, both for postmarketing safety surveillance and for informing the design and execution of clinical trials for medical products, which include drugs, biologics, and devices. The workshop will include panel discussion sessions to provide stakeholders with a forum to discuss natural language processing with experts in the field.

The workshop will focus on whether NLP can be applied to unstructured text in clinical notes to:

  • Identify indication or reason for medical product use, adverse outcomes or events associated with use of these products, and confounders or personal behaviors that may modify risks associated with use of these products
  • Support protocol design, feasibility, recruitment efforts, and execution of clinical trials


Please click here to view more information about this workshop on the FDA's website.



This workshop is open to the public.  The intended audience includes researchers, industry, public health agencies, the clinical community, and other stakeholders using or interested in using NLP, either by itself or as part of a system, to harness relevant information from unstructured text in clinical notes or narratives in EHRs.

There is no cost to attend, but pre-registration is required. Attendees may participate in person or remotely via webinar (a link will be sent to registrants). Attendees arriving in person should allow sufficient time to clear the security screening process.


Non-FDA attendees and FDA contractors/ORISE fellows: Click here to register.
FDA employees: Click here to register.


Workshop Agenda

The agenda is shown below. Speaker biographies can be downloaded here: Speaker Bios. A PDF version of the agenda will be available soon.

Time Topic Speaker

7:45 am


Registration





8:30 am


Rita Ouellet-Hellstrom, PhD, MPH

8:35 am


Opening Remarks - FDA’s interest in NLP for pharmacoepidemiology and other uses


Robert Ball, MD, MPH

9:00 am


A brief review of recent literature on clinical NLP


Russ Altman, MD, PhD
Stanford University




Russ Altman, MD, PhD
Stanford University

9:30 am


Advance Drug Safety Research with Semantic Analysis of Electronic Health Records

An adverse event is an injury to a patient, and an adverse drug event is an injury to a patient resulting from a medical intervention related to pharmacotherapy. Adverse drug events complicate two million hospital stays annually, are associated with a prolonged hospital stay, account for upwards of two thirds of post-discharge complications, and are a significant contributor to escalating health care costs. The Office of Disease Prevention and Health Promotion has identified adverse drug event prevention as a patient safety priority. Electronic health records (EHRs) contain important adverse drug event-related information, but manual chart review is prohibitively expensive. In contrast, biomedical natural language processing (NLP) provides automated tools that facilitate chart review and can improve patient drug safety surveillance and post-marketing pharmacovigilance through enhanced cost efficiencies and provision of real-time information. In this talk, I will first introduce an expert-annotated EHR corpus we developed. I will then describe several new deep neural network models (e.g., LSTM-CRF and memory-augmented NNs) we developed to build state-of-the-art NLP systems for automated medication and adverse drug event detection from EHR narratives. I will also describe Item Response Theory (IRT) as a new evaluation metric for NLP systems. Unlike the traditional evaluation metrics of recall/precision/F-score, IRT models characteristics of individual data points (called “items”), such as difficulty and discriminatory ability, to estimate ability as a function of the characteristics of correctly answered items. Based on our IRT analysis, we found that deep neural network models exhibit human-like learning processes and intelligence capabilities. Our work is an important step towards adverse drug event (ADE) surveillance and pharmacovigilance.

Hong Yu, PhD
University of Massachusetts Medical School

10:00 am


Novel NLP for longitudinal analysis of patient records identifying medication treatment patterns, side effects, and adherence


Murthy Devarakonda, PhD
IBM Research

10:30 am



10:45 am


Mining the EHR to understand disease, drugs, and adverse events

In the era of Electronic Health Records, it is possible to examine the outcomes of decisions made by doctors during clinical practice to identify patterns of care—generating evidence from the collective experience of patients. We will discuss methods that transform unstructured EHR data into a substrate to discover hidden trends, build predictive models, and drive comparative effectiveness studies in a learning health system.

Nigam Shah, MBBS, PhD
Stanford University

11:15 am


Panel Discussion - Addressing the strengths and limitations of NLP solutions

Panelists: Robert Ball, MD, MPH (FDA/CDER), Isaac Chang, PhD (FDA/CDRH), Murthy Devarakonda, PhD (IBM Research), Rita Ouellet-Hellstrom, PhD, MPH (FDA/CDER), Nigam Shah, MBBS, PhD (Stanford University), Mark Walderhaug, PhD (FDA/CBER), Hong Yu, PhD (UMass)



Russ Altman, MD, PhD
Stanford University

12:30 pm


Lunch (on your own)





Mark Walderhaug, PhD

1:30 pm


Adapting clinical NLP methods for multi-site medical products research

Medical product clinical trials and postmarketing safety surveillance are increasingly coordinated across multiple institutional settings where secondary use of electronic health record (EHR) data makes large-scale ascertainment of outcomes more efficient. Many important outcomes are captured only in unstructured clinical narrative. Harmonizing information extracted from unstructured text in these settings entails challenges similar to those encountered when combining structured EHR data from geographically and institutionally diverse delivery systems. The adage emerging from these efforts, that “all data are local,” is at least as relevant to unstructured clinical data as it is to more widely used structured EHR data. This presentation will describe salient issues confronted when adapting and applying natural language processing (NLP) methods across multiple institutional settings. Seemingly simple tasks, such as assembling complete and representative clinical corpora, can be surprisingly challenging. Idiosyncratic characteristics of clinical documentation, including language usage, document structure, and content, make the application of NLP methods in multi-site settings an endeavor that requires forethought and attention to detail. These and related issues will be illustrated with examples from recent NLP projects in several clinical domains, including a project now underway to extract information about patient-reported medication side effects from clinical progress notes.

David Carrell, PhD
Kaiser Permanente Washington Health Research Institute

2:00 pm


Leveraging NLP and diverse data sources to mine drug repositioning, adverse drug events, and patient-reported medication outcome information

The informatics team at Mayo Clinic has developed a battery of NLP and text mining methods and tools over the past 15 years to facilitate various clinical and biomedical research projects. Observing that individual data sources tend to have biases and yield inconsistent findings, we have utilized diverse data sources and various NLP techniques for mining drug-related information. In this presentation, we will start with an overview of our NLP infrastructure and resources, and then demonstrate three case studies on signal detection for drug repositioning, adverse drug events, and patient-reported outcomes. We will end by sharing several lessons learned on how to leverage NLP and multiple heterogeneous datasets for meaningful data-driven discovery in drug-related research.

Hongfang Liu, PhD
Mayo Clinic

Lixia Yao, PhD
Mayo Clinic

2:30 pm


Flexible NLP for varied applications and data sources, including cohort selection and adverse event coding/validation

There is a wide range of existing applications for clinical NLP. In this talk we will explore some of these, including cohort selection for clinical trials, extraction of features from EHRs to predict clinical risk, coding of data, checking of regulatory submissions, and analysis of patient feedback. We will explore some of the challenges involved in dealing with such varied data sources and how they can be addressed. In particular, we will look at the use of agile text mining to quickly build applications from unannotated data.

David Milward, PhD

3:00 pm



3:15 pm


Lessons learned from NLP implementations at FDA

Mitra Rocca, PhD

3:45 pm


Panel Discussion - Can NLP alone or as part of a system address FDA's and the healthcare community’s needs?

Panelists: David Carrell, PhD (Kaiser Permanente Washington Health Research Institute), Isaac Chang, PhD (FDA/CDRH), Hongfang Liu, PhD (Mayo Clinic), Lixia Yao, PhD (Mayo Clinic), David Milward, PhD (Linguamatics), Rita Ouellet-Hellstrom, PhD, MPH (FDA/CDER), Mitra Rocca, PhD (FDA/CDER)



Mark Walderhaug, PhD


4:45 pm


Closing Remarks


Dragutin Petkovic, PhD
San Francisco State University


Lodging and Transportation

Details on lodging and transportation options can be found on FDA's webpage.


Recording and Speaker Presentations

This public workshop will be recorded. A link to the recording and, with speaker permission, the speaker presentation slides will be available on this website after the event.


Workshop Planning Committee

We thank the following members of the workshop planning committee for their time and expertise:

Donna Blum-Kemelor FDA / OCS / ORSI
Amal Mansuer FDA / OCS / ORSI
Catherine Ng FDA / OCS / ORSI
Audrey Thomas FDA / OCS / ORSI
York Tomita, PhD FDA / OCS / ORSI
Frank Weichold, MD, PhD FDA / OCS / ORSI
Rebekah Zinn, PhD FDA / OCS / ORSI
Ruth Barratt, PhD FDA / CDER
Rita Ouellet-Hellstrom, PhD, MPH FDA / CDER
Mitra Rocca, PhD FDA / CDER
Mark Walderhaug, PhD FDA / CBER
Isaac Chang, PhD FDA / CDRH
Russ Altman, MD, PhD UCSF-Stanford CERSI
Lawrence Lin, PhD UCSF-Stanford CERSI
Dragutin Petkovic, PhD SFSU
Anagha Kulkarni, PhD SFSU


Please contact Lawrence Lin with any questions.