Abstract Archives of the RSNA, 2012
LL-INS-WE3C
Automatic Extraction of BI-RADS Features from Cross-Institution and Cross-Language: Free-Text Mammography Reports
Scientific Informal (Poster) Presentations
Presented on November 28, 2012
Presented as part of LL-INS-WEPM: Informatics Afternoon CME Posters
Houssam Nassif, Presenter: Nothing to Disclose
Terrie Kitchner, Abstract Co-Author: Nothing to Disclose
Filipe Cunha, Abstract Co-Author: Nothing to Disclose
Ines C. Moreira, Abstract Co-Author: Nothing to Disclose
Elizabeth S. Burnside MD, MPH, Abstract Co-Author: Research Grant, Hologic, Inc
The American College of Radiology developed the Breast Imaging Reporting and Data System (BI-RADS) lexicon to standardize mammography findings and reporting. BI-RADS features were established to discriminate between benign and malignant disease and have thus been used to build successful breast cancer risk prediction tools. However, many radiology reports are written as free text, making descriptors difficult to extract and use for individual or population-based risk estimation. Our goal is to develop an automated method for BI-RADS feature extraction from free text, and to test it on multiple free-text mammography databases.
We first developed an algorithm that used pattern matching and regular expressions to extract BI-RADS descriptors from free text. We then established a BI-RADS concept co-occurrence matrix over the training set, and refined our algorithm over multiple iterations based on co-occurrence results and expert input. We implemented trigger-based negation and double-negation detection. We trained our algorithm on a dataset of 146,972 consecutive mammograms from an academic breast imaging practice. We validated our algorithm on two manually annotated test sets: 100 reports from the same academic practice that were not included in training, and 71 reports from a private practice. To test the portability of our method to another language, we used 306 consecutive annotated Portuguese mammograms to similarly construct a Portuguese BI-RADS extractor. On all three sets, the algorithm retrieved true positive and true negative features that the manual annotation missed or misclassified.
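The extraction steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the three descriptor patterns, the negation trigger list, the parity rule for double negation, and the function names `extract` and `cooccurrence` are all assumptions chosen for illustration, and a real BI-RADS lexicon is far larger.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical subset of BI-RADS descriptor patterns (illustration only).
PATTERNS = {
    "mass_margin_spiculated": re.compile(r"\bspiculat(?:ed|ion)\b", re.I),
    "mass_shape_oval": re.compile(r"\boval\b", re.I),
    "calcification_pleomorphic": re.compile(r"\bpleomorphic\b", re.I),
}

# Assumed negation triggers; the abstract does not list the real ones.
NEGATION = re.compile(r"\b(?:no|not|without|cannot|exclude[sd]?)\b", re.I)


def extract(report):
    """Map each matched descriptor to True (asserted) or False (negated)."""
    found = {}
    for sentence in re.split(r"[.;\n]+", report):
        for name, pattern in PATTERNS.items():
            match = pattern.search(sentence)
            if match is None:
                continue
            # Count triggers before the descriptor in the same sentence.
            # An odd count negates it; an even count (double negation,
            # e.g. "cannot exclude") leaves it asserted.
            n_triggers = len(NEGATION.findall(sentence[:match.start()]))
            asserted = n_triggers % 2 == 0
            found[name] = found.get(name, False) or asserted
    return found


def cooccurrence(reports):
    """Count how often pairs of asserted descriptors share a report."""
    counts = Counter()
    for report in reports:
        asserted = sorted(k for k, v in extract(report).items() if v)
        counts.update(combinations(asserted, 2))
    return counts


report = ("There is an oval mass. No spiculated margins. "
          "Cannot exclude pleomorphic calcifications.")
features = extract(report)
pairs = cooccurrence([report])
```

Here `features` marks the oval shape and pleomorphic calcifications as asserted and the spiculated margin as negated. Counting descriptor pairs over a training set gives a simple co-occurrence table of the kind that could flag implausible combinations for expert review.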
The English algorithm achieves 99.1% precision and 98.2% recall on the academic dataset and scores 97.9% precision and 95.9% recall on the private practice dataset. The Portuguese version returns 96.6% precision and 92.6% recall on the Portuguese dataset. Taking into consideration the manual annotation errors, our algorithm performed no worse than a human annotator on all three datasets.
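For reference, precision and recall as reported above can be computed from sets of extracted and annotated features. The sketch below assumes each feature is represented as a `(report_id, feature_name)` pair; that representation, the function name `precision_recall`, and the toy data are hypothetical and do not come from the study.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two sets of (report_id, feature) pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy data, made up for illustration only.
predicted = {(1, "mass_shape_oval"), (1, "mass_margin_spiculated"),
             (2, "mass_shape_oval")}
gold = {(1, "mass_shape_oval"), (2, "mass_shape_oval"),
        (2, "calcification_pleomorphic"), (3, "mass_shape_oval")}

precision, recall = precision_recall(predicted, gold)
```

Precision is the fraction of extracted features that the reference annotation confirms; recall is the fraction of annotated features that the extractor found. If the reference annotation itself contains errors, as the abstract notes, both numbers understate the extractor's true performance.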
Our automated method to extract BI-RADS features from free-text mammography records achieves a performance comparable to manual extraction on cross-institution and cross-language datasets.
Our method for extracting BI-RADS features from free-text mammography reports generalizes across institutions and languages, enabling the incorporation of free-text data into breast cancer risk prediction tools.
Nassif, H, Kitchner, T, Cunha, F, Moreira, I, Burnside, E. Automatic Extraction of BI-RADS Features from Cross-Institution and Cross-Language: Free-Text Mammography Reports. Radiological Society of North America 2012 Scientific Assembly and Annual Meeting, November 25 - November 30, 2012, Chicago, IL.
http://archive.rsna.org/2012/12043569.html