NLP Workshop

Use of Natural Language Processing to Extract Information from Clinical Text

June 15, 2017 | 8:30 am to 5:00 pm

Great Room (Building 31, Room 1503B/C)
FDA White Oak Campus
10903 New Hampshire Ave
Silver Spring, MD 20993


The objective of this workshop is to identify current and emerging natural language processing (NLP) efforts being applied to unstructured text such as clinical notes or narratives in electronic health records (EHRs). The workshop will provide insights into the utility and challenges of designing and implementing NLP systems to capture relevant or missing information from clinical notes or text for postmarketing safety surveillance and for informing the design and execution of clinical trials for medical products, which include drugs, biologics, and devices. The workshop will include panel discussion sessions to provide stakeholders with a forum to discuss natural language processing with experts in the field. You may view information on this workshop on the FDA's website.

The workshop will focus on whether NLP can be applied to unstructured text in clinical notes to:

  • Identify indication or reason for medical product use, adverse outcomes or events associated with use of these products, and confounders or personal behaviors that may modify risks associated with use of these products
  • Support protocol design, feasibility, recruitment efforts and execution of clinical trials

Workshop summary

Workshop Summary (PDF)

Workshop agenda, slides and recording

The agenda is shown below, and slides can be downloaded by clicking on the title of each talk.

Speaker Info (PDF)

NLP Workshop Agenda (PDF)

The workshop recording can be accessed here:

Time Topic Speaker

8:30 am

Welcome (PDF)

Rita Ouellet-Hellstrom, PhD, MPH

8:35 am

FDA’s interest in natural language processing of clinical texts for pharmacovigilance, pharmacoepidemiology and other uses

Robert Ball, MD, MPH

9:00 am

Current trends in clinical NLP literature

In this talk, I will present a quick review of recently published papers in the area of clinical NLP, focusing on topics of particular interest and relevance to this workshop. For each paper, I will present a “mini journal club” in which I will summarize the goal, methods, results and my conclusions about the work.

Russ Altman, MD, PhD
Stanford University





9:30 am

Lessons learned from NLP implementations at FDA

In this talk, I will provide an overview of several ongoing FDA projects leveraging Natural Language Processing (NLP) tools for structuring and standardizing unstructured information. For each project, I will summarize the goals, outcomes and lessons learned.

Mitra Rocca, Dipl. Inform. Med


10:00 am

Novel NLP Methods for Medication-Related Insights from Longitudinal Patient Records

Watson for Patient Records Analytics is an initiative at IBM Research to develop novel natural language processing methods for longitudinal patient records. Our goal is to classify sentences in clinical notes (for example, to identify that a sentence asserts a medication side effect) and to extract the assertion itself. Towards this goal, we have developed several NLP methods customized for clinical notes text and are working on several more. We relate problems with their medication treatments, leveraging an accurate problem list that our automated method generates. Medications extracted from clinical notes are juxtaposed with orders to enable medication reconciliation. A supervised learning method achieves high accuracy in identifying sentences that discuss medication change. Plan sentence extraction has been developed. Work on related methods is underway. This talk will outline our goal, approach, what has been achieved so far, and lessons learned.

Murthy Devarakonda, PhD
IBM Research

10:30 am



10:45 am

Mining the EHR to understand disease, drugs, and adverse events

In the era of Electronic Health Records, it is possible to examine the outcomes of decisions made by doctors during clinical practice to identify patterns of care—generating evidence from the collective experience of patients. We will discuss methods that transform unstructured EHR data into a substrate to discover hidden trends, build predictive models, and drive comparative effectiveness studies in a learning health system.

Nigam Shah, MBBS, PhD
Stanford University

11:15 am

Panel Discussion - Addressing the strengths and limitations of NLP solutions

Panelists: Robert Ball, MD, MPH (FDA/CDER), Isaac Chang, PhD (FDA/CDRH), Murthy Devarakonda, PhD (IBM Research), Rita Ouellet-Hellstrom, PhD, MPH (FDA/CDER), Mitra Rocca, Dipl. Inform. Med (FDA/CDER), Nigam Shah, MBBS, PhD (Stanford University), Mark Walderhaug, PhD (FDA/CBER)


Russ Altman, MD, PhD
Stanford University

12:30 pm





Mark Walderhaug, PhD

1:30 pm

Adapting clinical NLP methods for multi-site medical products research

Medical product clinical trials and postmarketing safety surveillance are increasingly coordinated across multiple institutional settings where secondary use of electronic health record (EHR) data makes large-scale ascertainment of outcomes more efficient. Many important outcomes are captured only in unstructured clinical narrative. Harmonizing information extracted from unstructured text in these settings entails challenges similar to those encountered when combining structured EHR data from geographically and institutionally diverse delivery systems. The adage emerging from these efforts, that “all data are local,” is at least as relevant to unstructured clinical data as it is to more widely used structured EHR data. This presentation will describe salient issues confronted when adapting and applying natural language processing (NLP) methods across multiple institutional settings. Seemingly simple tasks, such as assembling complete and representative clinical corpora, can be surprisingly challenging. Idiosyncratic characteristics of clinical documentation, including language usage, document structure, and content, make the application of NLP methods in multi-site settings an endeavor that requires forethought and attention to detail. These and related issues will be illustrated with examples from recent NLP projects in several clinical domains, including a project now underway to extract information about patient-reported medication side effects from clinical progress notes.

David Carrell, PhD
Kaiser Permanente Washington Health Research Institute

2:00 pm

Advance Drug Safety Research with Semantic Analysis of Electronic Health Records

An adverse event is an injury to a patient, and an adverse drug event is an injury to a patient resulting from a medical intervention related to pharmacotherapy. Adverse drug events complicate two million hospital stays annually, are associated with a prolonged hospital stay, account for upwards of two-thirds of post-discharge complications, and are a significant contributor to escalating health care costs. The Office of Disease Prevention and Health Promotion has identified adverse drug event prevention as a patient safety priority. Electronic health records (EHRs) contain important adverse drug event-related information, and manual chart review is prohibitively expensive. In contrast, biomedical natural language processing (NLP) provides automated tools that facilitate chart review and can improve patient drug safety surveillance and post-marketing pharmacovigilance through enhanced cost efficiencies and provision of real-time information. In this talk, I will first introduce an expert-annotated EHR corpus we developed. I will then describe several new deep neural network models (e.g., LSTM-CRF and memory-augmented NNs) we developed to build state-of-the-art NLP systems for automated medication and adverse drug event detection from EHR narratives. I will also describe Item Response Theory (IRT) as a new evaluation metric for NLP systems. Unlike the traditional evaluation metrics of recall/precision/F-score, IRT models characteristics of individual data points (called “items”) such as difficulty and discriminatory ability to estimate ability as a function of the characteristics of correctly answered items. Based on our IRT analysis, we found that deep neural network models exhibit a human-like learning process and intelligence capabilities. Our work is an important step towards ADE surveillance and pharmacovigilance.

Hong Yu, PhD
University of Massachusetts Medical School


2:30 pm

Flexible NLP for varied applications and data sources, including cohort selection and adverse event coding/validation

There is a wide range of existing applications for clinical NLP. In this talk we will explore some of these, including cohort selection for clinical trials, extraction of features from EHRs to predict clinical risk, coding of data, checking of regulatory submissions, and analysis of patient feedback. We will explore some of the challenges involved in dealing with such varied data sources and how they can be addressed. In particular, we will look at the use of agile text mining to quickly build applications from unannotated data.

David Milward, PhD


3:00 pm



3:15 pm

Leveraging NLP and diverse data sources to mine drug repositioning, adverse drug events, and patient-reported medication outcome information

The informatics team at Mayo Clinic has developed a battery of NLP and text mining methods and tools over the past 15 years to facilitate various clinical and biomedical research projects. Observing that individual data sources tend to have biases and inconsistent findings, we have utilized diverse data sources and various NLP techniques for mining drug-related information. In this presentation, we will start with an overview of our NLP infrastructure and resources, and then demonstrate three case studies on signal detection for drug repositioning, adverse drug events and patient-reported outcomes. We will end by sharing several lessons learned on how to leverage NLP and multiple heterogeneous datasets for meaningful data-driven discovery in drug-related research.

Lixia Yao, PhD
Mayo Clinic

3:45 pm

Panel Discussion - What is the way forward for clinical NLP?

Panelists: Russ Altman, MD, PhD (Stanford University), David Carrell, PhD (Kaiser Permanente Washington Health Research Institute), Isaac Chang, PhD (FDA/CDRH), Hongfang Liu, PhD or Lixia Yao, PhD (Mayo Clinic), David Milward, PhD (Linguamatics), Rita Ouellet-Hellstrom, PhD, MPH (FDA/CDER), Hong Yu, PhD (University of Massachusetts Medical School)


Mark Walderhaug, PhD

4:45 pm

Closing Remarks

Dragutin Petkovic, PhD
San Francisco State University

Workshop planning committee

We thank the following members of the workshop planning committee for their time and expertise:

Donna Blum-Kemelor FDA / OCS / ORSI
Amal Mansuer FDA / OCS / ORSI
Catherine Ng FDA / OCS / ORSI
Audrey Thomas FDA / OCS / ORSI
York Tomita, PhD FDA / OCS / ORSI
Frank Weichold, MD, PhD FDA / OCS / ORSI
Rebekah Zinn, PhD FDA / OCS / ORSI
Ruth Barratt, PhD FDA / CDER
Rita Ouellet-Hellstrom, PhD, MPH FDA / CDER
Mitra Rocca, Dipl. Inform. Med FDA / CDER
Mark Walderhaug, PhD FDA / CBER
Isaac Chang, PhD FDA / CDRH
Russ Altman, MD, PhD UCSF-Stanford CERSI
Lawrence Lin, PhD UCSF-Stanford CERSI
Dragutin Petkovic, PhD SFSU
Anagha Kulkarni, PhD SFSU

Please contact Lawrence Lin at [email protected] with any questions.