Machine Learning Tools Unlock the Most Critical Insights from Unstructured Health Data

December 6, 2019

Article Summary


Patient comments such as “I feel dizzy” or “my stomach hurts” can tell clinicians a lot about an individual’s health, as can additional background, including zip code, employment status, access to transportation, and more. This critical information, however, is captured as free text, or unstructured data, making it impossible for traditional analytics to leverage.

Machine learning tools (e.g., NLP and text mining) help health systems better understand the patient and their circumstances by unlocking valuable insights residing in unstructured data:

1. NLP analyzes large amounts of natural language data for human users.
2. Text mining derives value through the analysis of mass amounts of text (e.g., word frequency, length of words, etc.).

This report is based on a 2018 Healthcare Analytics Summit presentation given by Shaun Grannis, MD, MS, FAAFP, FACMI, Director, Regenstrief Center for Biomedical Informatics; Assoc. Prof., Dept. of Family Medicine, Indiana University School of Medicine, entitled “Real-World Examples of Leveraging NLP, Big Data, and Data Science to Improve Population Health and Individual Care Outcomes.”

Many healthcare leaders operate on the premise that health system caregivers and stakeholders are more effective and better at what they do with the aid of thoughtful IT. This concept drives data analytics and technology integration in healthcare.

But what does thoughtful IT mean? Although many health systems leverage some form of IT, that doesn’t mean it’s the best fit. Thoughtful IT occurs when health systems use the right technology to produce accurate data that supports better patient care and improved outcomes.

Thoughtful IT leverages machine learning to revolutionize the way health systems use data to determine the best course of patient care. Two examples include the following:

  • Natural language processing (NLP): the digitally-enabled ability to analyze large amounts of natural language data for human users.
  • Text mining: deriving value through the analysis of mass amounts of text (e.g., word frequency, length of words, etc.).
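The two text-mining measures named above, word frequency and word length, can be computed with a few lines of standard-library Python. This is a minimal sketch, not the tooling described in the presentation; the function name `text_stats` and the sample notes are illustrative assumptions.

```python
import re
from collections import Counter

def text_stats(notes):
    """Return word frequencies and mean word length across free-text notes."""
    words = []
    for note in notes:
        # Lowercase each note and keep alphabetic tokens (apostrophes allowed).
        words.extend(re.findall(r"[a-z']+", note.lower()))
    freq = Counter(words)
    mean_len = sum(len(w) for w in words) / len(words) if words else 0.0
    return freq, mean_len

# Illustrative sample notes, modeled on the patient comments quoted in the article.
notes = ["I feel dizzy", "My stomach hurts", "I feel dizzy and my stomach hurts"]
freq, mean_len = text_stats(notes)
print(freq["dizzy"], round(mean_len, 2))
```

Real clinical text would need more careful tokenization (abbreviations, misspellings, negation), which is part of what fuller NLP pipelines add.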

With NLP and text mining, healthcare organizations are starting to leverage technology to access the plethora of unstructured patient data available in the EHR (e.g., nursing notes or patient-reported text such as “my stomach hurts”). NLP and text mining can process data traditional analytics cannot, opening up richer, more complex data sources.

Untouched, Unstructured Data Is Key to Comprehensive Care

Traditional analytics typically uses structured data, which consists mainly of claims data and accounts for only approximately 20 percent of all available data. Structured data exists in a specific, consistent format and includes basic information, such as patient demographics, labs, ICD-9 codes, and medications. While this data is easy to access, it is often delayed due to lab processing time. It is also limited because the information is basic in nature, revealing surface-level details about a patient but not valuable details such as socioeconomic information.

To access and understand the remaining 80 percent of unstructured data—including free text data, physician order data, nursing notes, and dictation notes—health systems must rely on NLP. NLP allows organizations to access the complex, richer data sets that are harder to reach because they require sophisticated technology to derive value from the massive amounts of everyday language sitting in the EHR.

Unstructured data, residing in EHRs and elsewhere, contains the deeper, more complex information—such as a patient’s own words to describe symptoms or another provider’s notes. It can provide contextual information (e.g., living conditions, how the patient perceives their illness, information about the patient’s family, etc.), also known as social determinants of health (SDoH).

Text Mining Provides Context Around Patient Care—A Critical Resource for Healthcare Machine Learning Success

Clinicians do their best to capture patient anecdotes and comments in their notes, but these records rarely go beyond the EHR. Free-text inputs (e.g., “I don’t feel good,” “my right side feels numb,” etc.) are commonplace in healthcare encounters and can provide crucial context for clinicians trying to deliver the best care.

Manually collecting and categorizing free-text notes, however, proves challenging and time-consuming for healthcare teams, who already experience burnout related to data collection and input. Text mining, a form of machine learning, can alleviate the burden of manually organizing free-text notes. Text mining algorithms swiftly and accurately classify free-text comments into the right categories (e.g., high priority, low priority, etc.) based on the words the organization chooses to search for. Clinicians can refer to these valuable insights throughout the care process without having to sit down and manually sift through mass amounts of free-text data.
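As a rough illustration of sorting free-text comments into categories based on chosen search words, the sketch below uses plain keyword matching. The keyword lists, the `classify_comment` name, and the "unclassified" fallback are all hypothetical; the article does not describe the actual terms or algorithm.

```python
import re

# Hypothetical keyword lists; an organization would choose its own terms.
PRIORITY_TERMS = {
    "high priority": ["numb", "chest pain", "confusion"],
    "low priority": ["don't feel good", "tired"],
}

def classify_comment(comment):
    """Return the first category whose terms appear in the comment."""
    # Normalize case and curly apostrophes so "don’t" matches "don't".
    text = comment.lower().replace("\u2019", "'")
    # Categories are checked in insertion order, so high priority wins ties.
    for category, terms in PRIORITY_TERMS.items():
        for term in terms:
            # Word boundaries keep "numb" from matching "number".
            if re.search(r"\b" + re.escape(term) + r"\b", text):
                return category
    return "unclassified"

print(classify_comment("My right side feels numb"))
print(classify_comment("I don’t feel good"))
```

Keyword rules like this are easy to audit but miss paraphrases and negation, which is one reason trained classifiers are often preferred for production use.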

Machine Learning Delivers Real Value in the Real World

Even though algorithms don’t deliver results with 100 percent accuracy, they can still be more accurate than humans at certain tasks, and they can aid health systems in areas in which the staff simply lack bandwidth. For example, in 2005, Indiana University Health (IU Health) implemented a machine learning early-warning system to identify unusual trends in the emergency department (ED). Shortly after IU Health implemented the system, it sent an alert to Indiana state health officials (who have access to all state health system data) because it flagged abnormally high numbers of patients coming to the ED complaining of the same symptoms, including dizziness, confusion, and nausea. A health official responded to the alert by notifying the hospitals.

Based on the existing data the health system used, nothing triggered attention outside of the early-warning system. But, after further inquiry into the complaints, IU Health discovered that all the people who complained of these symptoms had something in common—they all lived in the same apartment complex. Later, it was revealed that the heater in the apartment complex was malfunctioning and releasing carbon monoxide into the apartments, causing tenants to get sick.

With machine learning’s ability to dissect, organize, and analyze massive amounts of data at a rapid rate, health systems can focus on responding to alerts and outliers in data (Figure 1), intervene in the prevention stage, and immediately take action to address gaps in care—versus providing care after a patient’s condition has worsened.

Figure 1. Outliers in ED visits.
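One simple way to surface outliers like those in Figure 1 is to compare each day's count of a given complaint with a trailing baseline. The sketch below uses a basic mean-plus-standard-deviation rule; it is an assumption for illustration, not the method the early-warning system used, and the function name, window size, threshold, and counts are all made up.

```python
from statistics import mean, stdev

def flag_outlier_days(daily_counts, baseline_days=7, threshold=3.0):
    """Return indices of days whose count exceeds baseline mean + threshold * stdev."""
    flagged = []
    for i in range(baseline_days, len(daily_counts)):
        baseline = daily_counts[i - baseline_days:i]
        mu, sigma = mean(baseline), stdev(baseline)
        # Use a floor of 1.0 so a nearly flat baseline does not flag tiny changes.
        limit = mu + threshold * max(sigma, 1.0)
        if daily_counts[i] > limit:
            flagged.append(i)
    return flagged

# Illustrative daily counts of ED visits with the same complaint (made-up numbers).
counts = [4, 5, 3, 4, 6, 5, 4, 5, 4, 19, 5]
print(flag_outlier_days(counts))
```

A trailing window adapts to slow drift in visit volume, but a production system would also need to account for seasonality and day-of-week effects.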

Automated Reporting Increases Accuracy, Alleviates Provider Burden

Machine learning can also help organizations alleviate the burden of reporting and increase accuracy. With countless measures to report and constant changes in reporting figures, health systems and clinicians struggle to report accurate information consistently for many reasons:

  • They are overburdened with other tasks.
  • They don’t know what they’re supposed to report, due to frequent reporting changes.
  • They assume someone else is doing it.

Machine learning can automate communication systems, reducing the manual burden of reporting and, in some cases, completely removing it. In one experience, IU Health treated many cancer patients in the Kentucky region. The State of Kentucky requires reporting of cancer diagnoses to the state health department, and the number of cancer incidences at the time of the study seemed too low.

To address the underreporting of cancer diagnoses to the Kentucky State Health Department, IU Health used machine learning, instead of traditional ICD-9 codes, to identify cancer incidence from free-text reports. The machine learning algorithms flagged words that IU Health selected (e.g., “tumor,” “malignant,” etc.). The algorithm proved more accurate than the human flagging system.
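The word-flagging step can be pictured as a scan of each report for the selected terms. In this sketch, "tumor" and "malignant" come from the article, while the other terms, the report texts, and the `flag_reports` helper are hypothetical. It only shows the flagging idea, not the algorithm that was evaluated.

```python
import re

# "tumor" and "malignant" are from the article; the other terms are hypothetical.
CANCER_TERMS = ["tumor", "malignant", "carcinoma", "metastatic"]
PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, CANCER_TERMS)) + r")\b", re.IGNORECASE
)

def flag_reports(reports):
    """Map report ID to the sorted terms found, for reports mentioning any term."""
    flagged = {}
    for report_id, text in reports.items():
        hits = sorted({match.lower() for match in PATTERN.findall(text)})
        if hits:
            flagged[report_id] = hits
    return flagged

# Made-up report snippets for illustration only.
reports = {
    "r1": "Biopsy shows a malignant tumor in the left lobe.",
    "r2": "Routine follow-up; no acute findings.",
    "r3": "Metastatic carcinoma suspected.",
}
print(flag_reports(reports))
```

Exact-word matching like this misses plurals such as "tumors" and cannot tell "no tumor seen" from a positive finding, so flagged reports would still need review or a more capable model.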

Predictive Models Account for Social Determinants of Health

Predictive models, as well as NLP and text mining, also help providers take a holistic approach to care. For example, a patient might receive care in a local clinic for hypertension but may have other unmet needs (e.g., food, housing, and employment). Often these SDoH can influence a patient’s well-being as much as, or more than, medical care. Because the average patient spends less than 1 percent of their life in a clinic, clinicians must understand what goes on in a patient’s life outside of the clinic doors.

To effectively capture SDoH, IU Health built a custom prediction model using environmental data, crime statistics, housing statistics, and socioeconomic status of zip code, combined with clinical data. The predictive model provided the community care nurses in primary care clinics with information about patients’ challenges outside the healthcare setting; the nurses could then identify referrals for nutrition and financial counseling, and more. With machine learning algorithms specific to a healthcare organization’s unique population, providers are empowered to treat patients more comprehensively.
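Before any prediction can happen, community-level data has to be joined to each patient's clinical record, typically by zip code. The sketch below shows only that joining step. Every field name, zip code, and value is invented for illustration, and the article does not describe how the actual model's inputs were assembled.

```python
# All zip codes, field names, and values below are made up for illustration.
ZIP_CONTEXT = {
    "00001": {"median_income": 32000, "crime_rate_per_1k": 58.0},
    "00002": {"median_income": 98000, "crime_rate_per_1k": 12.0},
}

def build_feature_row(patient, zip_context):
    """Merge a patient's clinical fields with their zip code's community fields."""
    # Unknown zip codes contribute no community fields rather than raising an error.
    context = zip_context.get(patient["zip"], {})
    row = {key: value for key, value in patient.items() if key != "zip"}
    row.update(context)
    return row

patient = {"patient_id": "p1", "zip": "00001", "has_hypertension": True}
row = build_feature_row(patient, ZIP_CONTEXT)
print(row)
```

Rows built this way could then feed whatever model an organization trains on its own population; the choice of model is outside what this sketch covers.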

Machine Learning Tools Take Healthcare Delivery to the Next Step

As health data only stands to grow, health systems need tools to tap into this expanding resource to extract better data. Although machine learning isn’t perfect and carries an inevitable margin of error, and the industry still has more questions than answers, machine learning is starting to offer true value in the complex world of healthcare.

As health systems continue to provide quality care on shrinking budgets and reimbursements, data and analytics become critical for success. However, the industry needs more high-quality data to develop better algorithms that answer the right questions. These insights will allow clinicians to focus on patients while relying on the data to make the most informed decisions. Machine learning will never replace the human factor in healthcare, but it can relieve care teams of time-consuming tasks so they can focus on what matters most: delivering care to patients how, when, and where they need it.

Additional Reading

Would you like to learn more about this topic? Here are some articles we suggest:

  1. Using Analytics to Improve Patient Safety Surveillance Makes Patients Safer
  2. Healthcare NLP: Four Essentials to Make the Most of Unstructured Data
  3. Healthcare NLP: The Secret to Unstructured Data’s Full Potential
  4. Machine Learning Improves Accuracy of Risk Predictions and Improves Operational Effectiveness

