RSNA 2007 

Abstract Archives of the RSNA, 2007


LL-IN6166-R01

The Pilot Study of Character-based Adjacent Probability in Japanese CT Clinical Reports

Scientific Posters

Presented on November 29, 2007
Presented as part of LL-IN-R: Informatics

Participants

Naoki Nishimoto MS, Presenter: Nothing to Disclose
Satoshi Terae MD, Abstract Co-Author: Research grant, Daiichi Sankyo Company, Ltd; Research grant, Eisai Co, Ltd; Research grant, Medical Image Lab, Inc
Masahito Uesugi, Abstract Co-Author: Nothing to Disclose
Takayoshi Terashita, Abstract Co-Author: Nothing to Disclose
Takumi Tanikawa, Abstract Co-Author: Nothing to Disclose
Katsuhiko Ogasawara, Abstract Co-Author: Nothing to Disclose
Akira Endoh, Abstract Co-Author: Nothing to Disclose
Tsunetaro Sakurai, Abstract Co-Author: Nothing to Disclose
et al, Abstract Co-Author: Nothing to Disclose

PURPOSE

Building a medical ontology may contribute to the information retrieval task of extracting diagnosis-supporting information from narrative texts written by experts. In some languages, such as Japanese and Chinese, words are not separated by white space, and the medical domain contains many compound technical terms, so it is difficult for computer programs to parse sentences appropriately. The purpose of this study is to investigate the distribution of transitional probabilities between characters at medical term boundaries within compounds.

METHOD AND MATERIALS

We used 100 Japanese computed tomography (CT) reports randomly selected from 2,000 reports written during July 2005 at Hokkaido University Hospital. Medical terms in the CT reports were identified using the morphological analysis system ChaSen, which is based on a probabilistic language model and was developed by the Matsumoto laboratory at the Nara Institute of Science and Technology. MeSH-based medical terms (51,385 entries), obtained from the Metathesaurus of the UMLS (Unified Medical Language System, 2005AA), were added to ChaSen as a medical dictionary. A radiographer corrected the parsing errors in the result set. We calculated transitional probability as the conditional probability for uni-grams, bi-grams, and tri-grams.
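As an illustration of the quantities described above, the following is a minimal sketch of character-level n-gram counting and a conditional (transitional) probability estimate. It is not the authors' implementation: the function names, the maximum-likelihood estimate, and the toy Latin-alphabet string are assumptions made for the example, and no ChaSen call is shown.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count every character n-gram in a string."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def relative_frequency(text, ngram):
    """Share of all n-grams of the same length that equal `ngram`."""
    counts = ngram_counts(text, len(ngram))
    total = sum(counts.values())
    return counts[ngram] / total if total else 0.0


def transitional_probability(text, context, nxt):
    """Estimate P(nxt | context) by maximum likelihood.

    Assumption: the denominator counts only occurrences of `context`
    that are followed by some character, so the estimates for a given
    context sum to 1.
    """
    longer = ngram_counts(text, len(context) + 1)
    joint = longer[context + nxt]
    ctx = sum(c for g, c in longer.items() if g.startswith(context))
    return joint / ctx if ctx else 0.0


# Toy input (hypothetical, not taken from the CT reports).
sample = "abababac"
p_b_given_a = transitional_probability(sample, "a", "b")     # bi-gram case
p_a_given_ab = transitional_probability(sample, "ab", "a")   # tri-gram case
```

With a context of length 1 this gives a bi-gram transitional probability, and with a context of length 2 a tri-gram one; the same functions work unchanged on Japanese strings because Python strings are indexed by character.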

RESULTS

The number of characters per report was 256.4±13.7, and the numbers of character types and word types were 863 and 1,941, respectively. As an example of an anatomical location, “pulmonary hilum” was parsed as a tri-gram and was counted 74 times (probability 6.54E-3).

CONCLUSION

Retrieving transitional probabilities should help in parsing medical texts correctly. The transitional probabilities may allow us to fix the dictionary size for parsing narrative texts and to develop a medical ontology by using them in a term extraction algorithm. Further work will be required to parse the texts precisely.
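One way transitional probabilities could feed a term extraction step is sketched below: a boundary is placed wherever the probability of the next character given the current one falls below a threshold. This is only an assumed illustration, not the algorithm proposed in the abstract; the function name, the bi-gram model, the threshold value, and the toy corpus are all hypothetical.

```python
from collections import Counter


def segment_by_transition(text, corpus, threshold):
    """Split `text` where P(next char | current char) < threshold.

    Probabilities are maximum-likelihood bi-gram estimates from `corpus`.
    """
    bigrams = Counter(corpus[i:i + 2] for i in range(len(corpus) - 1))
    # Characters that have a successor in the corpus (all but the last one).
    firsts = Counter(corpus[:-1])
    pieces, start = [], 0
    for i in range(len(text) - 1):
        ctx = firsts[text[i]]
        p = bigrams[text[i:i + 2]] / ctx if ctx else 0.0
        if p < threshold:
            # Low transitional probability: treat as a term boundary.
            pieces.append(text[start:i + 1])
            start = i + 1
    pieces.append(text[start:])
    return pieces


# Toy corpus (hypothetical): "abc" recurs, followed by varying characters.
corpus = "abcxabcyabcz"
pieces = segment_by_transition("abcxabc", corpus, threshold=0.5)
```

In this toy corpus the characters inside "abc" always follow one another, while the character after "c" varies, so the only transition below the threshold is the one leaving "abc". A real system would need a much larger corpus and a threshold chosen empirically.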

CLINICAL RELEVANCE/APPLICATION

Parsing narrative texts may contribute to information retrieval tasks that extract information supporting diagnosis from narrative texts written by experts.

Cite This Abstract

Nishimoto, N, Terae, S, Uesugi, M, Terashita, T, Tanikawa, T, Ogasawara, K, Endoh, A, Sakurai, T, et al. The Pilot Study of Character-based Adjacent Probability in Japanese CT Clinical Reports. Radiological Society of North America 2007 Scientific Assembly and Annual Meeting, November 25 - November 30, 2007, Chicago IL. http://archive.rsna.org/2007/5015943.html