Toward Best Practice Reporting: A Natural Language Processor to Identify Semantic Content and Automatically Generate Standardized Knee MRI Reports

SSG08-04

Scientific Formal (Paper) Presentations

Presented on November 30, 2010
Presented as part of SSG08: Informatics (Reporting and Result Communication)

Bao H. Do MD, Presenter: Nothing to Disclose

Sandip Biswal MD, Abstract Co-Author: Co-founder, SiteOne Therapeutics Inc; Tiger Team Member, General Electric Company

Kathryn Jane Stevens MD, Abstract Co-Author: Nothing to Disclose

Daniel L. Rubin MD, Abstract Co-Author: Research grant, General Electric Company

Voice recognition (VR) creates opportunities for real-time structured reporting and live feedback, but structured reporting can distract radiologists and be cumbersome compared with conventional unconstrained dictation. The purpose of this work is to develop and validate a natural language processor (NLP) that identifies semantic content in knee MRI statements from unstructured text and automatically generates full, structured knee MRI reports.

We designed an NLP using the Apache/PHP/MySQL platform. The NLP processes whole knee MRI reports. Using a lexicon of "signals," or regular expressions that specify anatomy, findings, or disease terms, the NLP assigns each sentence to 1 of 8 categories of a standardized knee MRI template: (1) joint/effusion/synovitis/loose bodies, (2) menisci, (3) cruciate ligaments, (4) collateral ligaments, (5) extensor mechanism, (6) cartilage, (7) bone and marrow, and (8) miscellaneous (muscle, tendon, Baker's cyst, etc). Approximately 2000 sentences from 125 knee MRI reports at our institution from 2005 to 2009 were reviewed to generate 59 signals determined by 2 musculoskeletal subspecialists to be specific for the 8 semantic categories. For validation, 25 knee MRI reports from 2005 to 2009 were randomly selected. Reports were pre-processed and converted to a single paragraph of sentences by removing all section headers. Accuracy in semantic assignment was assessed. Sentences containing 2 semantic concepts were assigned to at least 1 of the 2 categories.
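The signal-based assignment described above can be sketched as follows. This is a minimal illustration in Python, not the authors' Apache/PHP/MySQL implementation. The abstract does not list its 59 signals, so the patterns in SIGNALS are hypothetical examples, and the first-match rule and the fallback to the miscellaneous category are assumptions made for this sketch.

```python
import re

# Hypothetical signals for illustration only; the 59 expressions used in
# the study are not given in the abstract.
SIGNALS = {
    "joint/effusion/synovitis/loose bodies": [
        r"\beffusion\b", r"\bsynovitis\b", r"\bloose bod(?:y|ies)\b"],
    "menisci": [r"\bmenisc(?:us|i|al)\b"],
    "cruciate ligaments": [r"\b(?:anterior|posterior) cruciate\b", r"\b[AP]CL\b"],
    "collateral ligaments": [r"\b(?:medial|lateral) collateral\b", r"\b[ML]CL\b"],
    "extensor mechanism": [
        r"\bquadriceps tendon\b", r"\bpatellar tendon\b", r"\bextensor mechanism\b"],
    "cartilage": [r"\bcartilage\b", r"\bchondr\w+"],
    "bone and marrow": [r"\bmarrow\b", r"\bfracture\b", r"\bcontusion\b"],
}
MISC = "miscellaneous"  # assumed fallback when no signal matches


def split_sentences(text):
    """Split a header-free paragraph into sentences at ., ! or ?."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def classify(sentence):
    """Return the first template category whose signal matches the sentence."""
    for category, patterns in SIGNALS.items():
        if any(re.search(p, sentence, re.IGNORECASE) for p in patterns):
            return category
    return MISC


def build_report(text):
    """Group sentences under the 8 template categories, in template order."""
    report = {category: [] for category in [*SIGNALS, MISC]}
    for sentence in split_sentences(text):
        report[classify(sentence)].append(sentence)
    return report


if __name__ == "__main__":
    dictation = ("There is a small joint effusion. "
                 "The medial meniscus demonstrates a horizontal tear. "
                 "The ACL is intact. There is a Baker's cyst.")
    for category, sentences in build_report(dictation).items():
        print(f"{category}: {' '.join(sentences)}")
```

Because each sentence goes to the first matching category, a sentence with 2 semantic concepts lands in exactly 1 of them, which is consistent with the scoring rule stated above but is only one possible way to resolve such sentences.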

The NLP classified 381 sentences into the 8 categories. Ten sentences in 9 reports were inaccurately categorized, for an accuracy of 97% per sentence and 64% per report. The most common sources of classification error were absent lexicon terms and signal non-specificity.
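The two accuracy figures follow from the counts reported here. The short calculation below is a sketch that assumes the per-report denominator is the 25 validation reports and that a report counts as accurate only if none of its sentences was misclassified; the variable names are illustrative.

```python
# Counts taken from the abstract.
total_sentences = 381
misclassified_sentences = 10
total_reports = 25          # assumed denominator: the validation set
reports_with_errors = 9

# Fraction of sentences assigned to a correct category.
sentence_accuracy = (total_sentences - misclassified_sentences) / total_sentences
# Fraction of reports with no misclassified sentence.
report_accuracy = (total_reports - reports_with_errors) / total_reports

print(f"Per-sentence accuracy: {sentence_accuracy:.0%}")  # 371 of 381
print(f"Per-report accuracy: {report_accuracy:.0%}")      # 16 of 25
```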

We have developed a simple rules-based NLP to extract semantic concepts from knee MRI statements to automatically create structured reports. The extensible infrastructure has potential for integration with future RadLex-based best-practice templates.

NLP systems that can stratify semantic content could support transparent, real-time feedback (such as a “missing content” alert), decision support, quality assurance, and data mining.

Do, B, Biswal, S, Stevens, K, Rubin, D. Toward Best Practice Reporting: A Natural Language Processor to Identify Semantic Content and Automatically Generate Standardized Knee MRI Reports. Radiological Society of North America 2010 Scientific Assembly and Annual Meeting, November 28 - December 3, 2010, Chicago, IL. http://archive.rsna.org/2010/9007834.html