Teaching a Machine to Annotate Radiology Text

SSQ11-09


Scientific Papers

Presented on December 4, 2014
Presented as part of SSQ11: Informatics (Results and Reporting)

Eamon Johnson MS, Presenter: Nothing to Disclose

Michael D. Torno DSc, Abstract Co-Author: Nothing to Disclose

William Christopher Baughman MD, Abstract Co-Author: Nothing to Disclose

Expert annotators are critical when new data and metadata are encountered, but their involvement can be reduced when the data conform to prior patterns. This work uses additional metadata already attached to the reports to partially reduce the need for expert involvement in the text annotation process.

Ideally, the data analysis revolution would aid physicians in making diagnoses and offer an automated secondary analysis. Training machines to provide this functionality generally requires data sets vetted by experts as a starting point. However, the lack of annotated text corpora for training computational models is a perennial problem in medical informatics. Creating annotated corpora is costly: informaticists must design annotation schemes and train physicians to apply them, and physicians must invest effort in making the annotations. Even then, expert physicians are not necessarily expert, or willing, annotators. This project analyzes methods for leveraging existing clinical annotations to build richly annotated data sets automatically.

The source data consist of 700,000 diagnostic radiology text reports, each of which contains the physician name, full interpretation text, modality, body area, and an ICD-9-CM code reflecting the initial diagnosis. An NLP pipeline based on cTAKES was used to extract medical concepts from the interpretation text, and the extracted concepts were then correlated with the ICD-9-CM codes assigned to the same reports.
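The abstract does not state which correlation measure was used. As a minimal sketch of the general idea, the Python below assumes each report has already been reduced to an ICD-9-CM code and a set of extracted concepts, and scores every code-concept pair with the phi coefficient (Pearson correlation for two binary variables). The function names, the choice of phi, and the toy records are all illustrative assumptions, not the authors' implementation; in a real pipeline the concept strings would be whatever identifiers the cTAKES-based extraction step emits.

```python
from collections import Counter
from math import sqrt


def code_concept_phi(reports):
    """Score code-concept association with the phi coefficient.

    reports: iterable of (icd9_code, concepts) pairs, one per report.
    Returns {(code, concept): phi} for every pair that co-occurs at least once.
    Hypothetical helper; phi is an assumed stand-in for the unspecified measure.
    """
    reports = list(reports)
    n = len(reports)
    code_n = Counter()     # number of reports carrying each code
    concept_n = Counter()  # number of reports mentioning each concept
    pair_n = Counter()     # number of reports with both code and concept
    for code, concepts in reports:
        code_n[code] += 1
        for concept in set(concepts):
            concept_n[concept] += 1
            pair_n[(code, concept)] += 1

    scores = {}
    for (code, concept), both in pair_n.items():
        with_code = code_n[code]
        with_concept = concept_n[concept]
        denom = sqrt(with_code * (n - with_code)
                     * with_concept * (n - with_concept))
        if denom == 0:
            continue  # code or concept occurs in every report; phi undefined
        scores[(code, concept)] = (n * both - with_code * with_concept) / denom
    return scores


def ranked_concepts(scores, code):
    """Return (concept, phi) pairs for one code, strongest association first."""
    pairs = [(concept, s) for (c, concept), s in scores.items() if c == code]
    return sorted(pairs, key=lambda p: p[1], reverse=True)


# Toy records for illustration only; not drawn from the study data.
toy_reports = [
    ("486", {"infiltrate", "opacity"}),
    ("486", {"infiltrate"}),
    ("786.2", {"opacity"}),
    ("786.2", {"normal"}),
]
toy_scores = code_concept_phi(toy_reports)
toy_ranking = ranked_concepts(toy_scores, "486")
```

In this toy set, "infiltrate" appears in exactly the reports coded "486" and so ranks first for that code, while "opacity" is spread evenly across both codes and scores zero. Ranking concepts per code in this way is one plausible route to the kind of ranked evaluation the abstract describes.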

The output of the analysis is a ranked evaluation of concept correlation with 448 ICD-9-CM codes, with discussion of underlying factors, sources of noise, and sources of bias. For a portion of codes with low-noise and low-bias characteristics, strategies for automatic annotation of records are presented.

Johnson, E, Torno, M, Baughman, W. Teaching a Machine to Annotate Radiology Text. Radiological Society of North America 2014 Scientific Assembly and Annual Meeting, Chicago IL. http://archive.rsna.org/2014/14008184.html