Abstract Archives of the RSNA, 2014
SSJ13-06
Determining Imaging Characteristics of KRAS Oncogene Mutations in Colon Cancer Using Word Frequency and Naive Bayes Analysis of Radiology Reports
Scientific Papers
Presented on December 2, 2014
Presented as part of SSJ13: Informatics (Business Analytics)
Siddharth Govindan MD, Presenter: Nothing to Disclose
Quanzheng Li PhD, Abstract Co-Author: Nothing to Disclose
Suvranu Ganguli MD, Abstract Co-Author: Research Grant, Merit Medical Systems, Inc
Consultant, Boston Scientific Corporation
Thomas Gregory Walker MD, Abstract Co-Author: Nothing to Disclose
Rahmi Oklu MD, PhD, Abstract Co-Author: Nothing to Disclose
To apply word frequency analysis and a naive Bayes classifier to radiology reports in order to extract imaging descriptors that distinguish wild-type colon cancer patients from those with KRAS mutations.
In this IRB-approved study, we compiled a SNaPshot mutation analysis dataset from 457 colon adenocarcinoma patients between March 2009 and December 2012. From this cohort, we analyzed the radiology reports of 299 patients (>32,000 reports) who either were wild type (147 patients) or had a KRAS mutation (152 patients). We wrote a computer program to determine the frequency of words within the wild-type and mutant group radiology reports and, using a naive Bayes classifier, determined the probability of a given word belonging to either group.
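The word-frequency and naive Bayes step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' program: the tokenizer, the add-one (Laplace) smoothing parameter `alpha`, the equal class prior `prior_kras`, and all function names are assumptions introduced here, and the toy reports are invented for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a report and split it into alphabetic word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def word_counts(reports):
    """Count how often each word occurs across a group's reports."""
    counts = Counter()
    for report in reports:
        counts.update(tokenize(report))
    return counts


def word_posteriors(kras_reports, wild_reports, alpha=1.0, prior_kras=0.5):
    """Return P(KRAS group | word) for every word seen in either group.

    Uses Bayes' rule with smoothed per-group word likelihoods.
    alpha and prior_kras are assumptions, not values from the abstract.
    """
    kras = word_counts(kras_reports)
    wild = word_counts(wild_reports)
    vocab = set(kras) | set(wild)

    # Smoothed denominators so unseen words never get zero likelihood.
    kras_total = sum(kras.values()) + alpha * len(vocab)
    wild_total = sum(wild.values()) + alpha * len(vocab)

    posteriors = {}
    for word in vocab:
        joint_kras = (kras[word] + alpha) / kras_total * prior_kras
        joint_wild = (wild[word] + alpha) / wild_total * (1.0 - prior_kras)
        posteriors[word] = joint_kras / (joint_kras + joint_wild)
    return posteriors


# Toy reports, invented purely for illustration.
kras_reports = ["innumerable confluent lesions", "numerous lesions"]
wild_reports = ["few discrete lesions", "no recurrent lesions"]
p_kras = word_posteriors(kras_reports, wild_reports)
```

Under these assumptions, a word that appears more often in one group's reports receives a posterior above 50% for that group, which is the quantity the results below report per word.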
Words with a greater than 50% chance (range 56-58%) of being in the KRAS mutation group and which had the highest absolute probability difference compared to the wild type group included: “several”, “innumerable”, “confluent”, and “numerous.” In contrast, words with a greater than 50% chance (range 58-61%) of being in the wild type group and with the highest absolute probability difference included: “few”, “discrete”, and “[no] recurrent.”
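One way to read the selection rule above is sketched here: keep words whose probability for a group exceeds 50%, then rank them by the absolute difference between the two group probabilities. The function name `rank_words`, the cutoff `k`, and the example probabilities are hypothetical; the numbers are illustrative values only and are not results from the study.

```python
def rank_words(p_kras, k=5):
    """Split words by group and rank each list by absolute probability difference.

    p_kras maps word -> P(KRAS group | word); P(wild type | word) is 1 - p.
    With two groups the absolute difference is |p - (1 - p)| = |2p - 1|.
    """
    def diff(p):
        return abs(2.0 * p - 1.0)

    kras_words = sorted(
        (w for w, p in p_kras.items() if p > 0.5),
        key=lambda w: diff(p_kras[w]),
        reverse=True,
    )[:k]
    wild_words = sorted(
        (w for w, p in p_kras.items() if p < 0.5),
        key=lambda w: diff(p_kras[w]),
        reverse=True,
    )[:k]
    return kras_words, wild_words


# Illustrative probabilities only (not taken from the study).
example = {
    "innumerable": 0.58,
    "several": 0.56,
    "few": 0.39,
    "discrete": 0.42,
    "lesion": 0.50,
}
kras_words, wild_words = rank_words(example)
```

A word sitting at exactly 50%, such as "lesion" in this toy input, lands in neither list because it does not favor either group.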
Words used in radiology reports, which have direct implications for disease course, tumor burden, and therapy, appear with differing frequency in patients with KRAS mutations versus wild-type colon adenocarcinoma. More importantly, the study suggests that there are likely characteristic imaging traits of mutant tumors.
Probabilistic word analysis may be useful in identifying unique characteristics and disease course associated with mutated oncogenes. This type of analysis may be applied to radiology reports as well as other types of clinical notes.
Govindan, S, Li, Q, Ganguli, S, Walker, T, Oklu, R. Determining Imaging Characteristics of KRAS Oncogene Mutations in Colon Cancer Using Word Frequency and Naive Bayes Analysis of Radiology Reports. Radiological Society of North America 2014 Scientific Assembly and Annual Meeting, Chicago, IL.
http://archive.rsna.org/2014/14015516.html