Abstract Archives of the RSNA, 2013
Jean Garcia-Gathright, Presenter: Nothing to Disclose
Corey W. Arnold, Abstract Co-Author: Nothing to Disclose
Alex Anh-Tuan Bui MS, PhD, Abstract Co-Author: Nothing to Disclose
The extraction of specific data elements from unstructured free-text documents is a critical task for a range of clinical and research activities, including data mining and disease registry construction. To enable such applications in imaging-based domains, we have developed a set of natural language processing (NLP) annotators that automatically extract patient characteristics and populate a database with them. The use of this framework is demonstrated for lung cancer screening.
Our input corpus comprises the entire set of medical reports for patients who have undergone a biopsy of an indeterminate lung nodule. We targeted several data elements, including location of tumor, biopsy results, family history of cancer, and smoking history. Extraction performance was evaluated against a manually annotated gold standard of 112 cases. Precision and recall were as high as 95% for certain data elements, such as location of tumor.
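As an illustration of how per-element precision and recall might be computed against a gold standard, the following is a minimal sketch in Python. It is not the evaluation code used in this work: the function name, the dictionary layout keyed by case ID, and the convention that a wrong value counts as both a false positive and a false negative are all assumptions made for the example.

```python
def precision_recall(predicted, gold):
    """Score one data element across cases.

    `predicted` and `gold` are hypothetical dicts mapping a case ID to the
    extracted / annotated value, with None meaning "no value".
    """
    tp = fp = fn = 0
    for case_id in gold.keys() | predicted.keys():
        p = predicted.get(case_id)
        g = gold.get(case_id)
        if p is not None and p == g:
            tp += 1          # extracted value matches the annotation
        else:
            if p is not None:
                fp += 1      # extracted something wrong or spurious
            if g is not None:
                fn += 1      # annotated value was missed or wrong
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Under this convention, a case where neither the annotator nor the extractor reports a value contributes to neither measure.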
An investigation of the input corpus revealed that most of the data elements of interest were found in radiology reports, pathology reports, and oncology consultations. We found that rule-based logic was sufficient for very good annotation performance. Our framework was implemented in Apache UIMA (Unstructured Information Management Architecture) and includes mechanisms for database querying, section detection, and information extraction based on regular expressions.
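The combination of section detection and regular-expression extraction can be sketched as follows. This is a simplified Python illustration, not the UIMA annotators described here: the section header pattern, the two extraction patterns, the field names, and the sample report text are all invented for the example.

```python
import re

# Assumed convention: a section starts with an upper-case header and a colon.
SECTION_HEADER = re.compile(r"^(?P<name>[A-Z][A-Z /]+):\s*", re.MULTILINE)

# Hypothetical extraction rules for two of the targeted data elements.
LOBE = re.compile(r"\b(right|left)\s+(upper|middle|lower)\s+lobe\b", re.IGNORECASE)
PACK_YEARS = re.compile(r"\b(\d+)[\s-]*pack[\s-]*years?\b", re.IGNORECASE)


def split_sections(report):
    """Return a dict mapping each section header to its body text."""
    sections = {}
    matches = list(SECTION_HEADER.finditer(report))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report)
        sections[m.group("name").strip()] = report[m.end():end].strip()
    return sections


def extract(report):
    """Apply each rule only within the section where it is expected to hold."""
    sections = split_sections(report)
    result = {}
    m = LOBE.search(sections.get("FINDINGS", ""))
    if m:
        result["tumor_location"] = f"{m.group(1)} {m.group(2)} lobe".lower()
    m = PACK_YEARS.search(sections.get("HISTORY", ""))
    if m:
        result["pack_years"] = int(m.group(1))
    return result


# Invented sample text, for illustration only.
SAMPLE_REPORT = (
    "HISTORY: 62-year-old with a 30 pack-year smoking history.\n"
    "FINDINGS: A 1.4 cm nodule in the Right Upper Lobe.\n"
    "IMPRESSION: Indeterminate nodule.\n"
)

print(extract(SAMPLE_REPORT))
```

Restricting each rule to a named section is one simple way to reduce false matches, for example a lobe mentioned in a prior-history sentence rather than in the findings.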
The successful implementation of these annotators represents an important step in the analysis of unstructured clinical documents. The rules and regular expressions we have developed can be used to advance structured reporting templates and other free-text-based analyses. Future work also includes the implementation of interactive web-based visualizations of the extracted data to support integrated radiology/pathology reporting and tumor board meetings.
Garcia-Gathright, J, Arnold, C, Bui, A, Automatic Extraction of Patient Characteristics from Clinical Reports. Radiological Society of North America 2013 Scientific Assembly and Annual Meeting, December 1 - December 6, 2013, Chicago, IL.
http://archive.rsna.org/2013/13016206.html