Dynamically Switched Custom Language Models for Radiology Speech Recognition

SSG08-07

Scientific Formal (Paper) Presentations

Presented on November 30, 2010
Presented as part of SSG08: Informatics (Reporting and Result Communication)

Naveen Garg MD, Presenter: Patent application, Medical Image Processing Method

John Keith Mukai MD, Abstract Co-Author: Nothing to Disclose

David Joseph Vining MD, Abstract Co-Author: Royalties, Bracco Group, Lake Success, NY; Consultant, sanofi-aventis Group, Bridgewater, NJ

The efficiency of speech recognition (SR) can be improved when the language model is narrowed to a specific radiologic application or to a specific section of a radiology report.

Speech recognition (SR) quality has advanced to the point that many radiologists choose to edit their own reports rather than send them to transcriptionists. However, more than 20% of SR reports contain dictation errors, and radiologists tend to underestimate the frequency of such errors (Quint 2008). Commercial SR systems use radiology-specific language models, but the use of subsets of the more comprehensive language model, tailored to specific parts of a radiology report, has not been demonstrated. We have developed a working prototype of an SR engine that dynamically selects a customized language model depending on the section and context of the radiology report.

We developed a working prototype using Microsoft Speech API SDK (SAPI) 5.1. Twelve hundred anonymized radiology reports were analyzed, and the text from each report was filtered into different sets based on the work type and report section (history, findings, impression) using the AutoHotkey scripting language. The number of 1-grams, 2-grams, and 3-grams for each section type was determined (1-grams: findings 7727, history 2436, impression 4987; 2-grams: findings 45920, history 7893, impression 21492; 3-grams: findings 90179, history 12507, impression 34745). An application was then written using C++ and SAPI that switched between customized language models depending upon the section of the report that was being dictated.
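The section filtering and n-gram counting described above can be illustrated with a minimal Python sketch. It is not the authors' AutoHotkey script. It assumes, hypothetically, that each section starts with an uppercase header such as "FINDINGS:" at the beginning of a line, that the reported figures are counts of distinct n-grams, and that n-grams may span sentence boundaries. The names split_sections, tokenize, and count_distinct_ngrams are illustrative only.

```python
import re
from collections import defaultdict

# Hypothetical section markers; the abstract names the sections but not how they are delimited.
SECTION_HEADERS = ("HISTORY", "FINDINGS", "IMPRESSION")


def split_sections(report):
    """Return {section_name: text} for one report, with lowercase section names."""
    pattern = r"^(%s):" % "|".join(SECTION_HEADERS)
    # With one capture group, re.split yields [preamble, name, body, name, body, ...].
    parts = re.split(pattern, report, flags=re.MULTILINE)
    return {name.lower(): body.strip() for name, body in zip(parts[1::2], parts[2::2])}


def tokenize(text):
    """Lowercase the text and keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def count_distinct_ngrams(reports, max_n=3):
    """Return {section: {n: number of distinct n-grams}} over all reports."""
    seen = defaultdict(lambda: defaultdict(set))
    for report in reports:
        for section, text in split_sections(report).items():
            tokens = tokenize(text)
            for n in range(1, max_n + 1):
                grams = seen[section][n]  # created even if the text is shorter than n
                grams.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {s: {n: len(g) for n, g in by_n.items()} for s, by_n in seen.items()}
```

Calling count_distinct_ngrams on a list of report strings returns one dictionary per section, which could then be compared across sections in the same way as the counts reported above.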

Compiling a customized language model takes significantly longer as the number of n-grams increases. The feasibility of using dynamically switched language models was demonstrated in a working application.
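The idea of switching language models by report section can be pictured with a toy Python sketch. It does not use SAPI and is not the authors' C++ application. It assumes, hypothetically, that each section's model is reduced to a set of allowed words and that speaking a section name triggers the switch; the abstract does not state how the switch is triggered. The class name SectionSwitchingRecognizer and its method hear are illustrative only.

```python
class SectionSwitchingRecognizer:
    """Toy stand-in for an SR engine holding one vocabulary per report section."""

    def __init__(self, models, initial_section):
        self.models = models            # {section name: set of allowed words}
        self.active = initial_section   # section whose model is currently in use

    def hear(self, phrase):
        """Return (active section, words outside the active vocabulary)."""
        key = phrase.strip().lower()
        if key in self.models:
            # Assumed trigger: a spoken section name activates that section's model.
            self.active = key
            return (self.active, [])
        vocabulary = self.models[self.active]
        unknown = [w for w in phrase.lower().split() if w not in vocabulary]
        return (self.active, unknown)
```

In this sketch a phrase that fits the history vocabulary produces out-of-vocabulary words once the findings model is active, which illustrates how a narrowed model constrains what the engine expects in each section.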

Garg, N, Mukai, J, Vining, D. Dynamically Switched Custom Language Models for Radiology Speech Recognition. Radiological Society of North America 2010 Scientific Assembly and Annual Meeting, November 28 - December 3, 2010, Chicago IL. http://archive.rsna.org/2010/9012062.html