Abstract Archives of the RSNA, 2007
LL-IN6166-R01
The Pilot Study of Character-based Adjacent Probability in Japanese CT Clinical Reports
Scientific Posters
Presented on November 29, 2007
Presented as part of LL-IN-R: Informatics
Naoki Nishimoto MS, Presenter: Nothing to Disclose
Satoshi Terae MD, Abstract Co-Author: Research grant, Daiichi Sankyo Company, Ltd
Research grant, Eisai Co, Ltd
Research grant, Medical Image Lab, Inc
Masahito Uesugi, Abstract Co-Author: Nothing to Disclose
Takayoshi Terashita, Abstract Co-Author: Nothing to Disclose
Takumi Tanikawa, Abstract Co-Author: Nothing to Disclose
Katsuhiko Ogasawara, Abstract Co-Author: Nothing to Disclose
Akira Endoh, Abstract Co-Author: Nothing to Disclose
Tsunetaro Sakurai, Abstract Co-Author: Nothing to Disclose
et al, Abstract Co-Author: Nothing to Disclose
et al, Abstract Co-Author: Nothing to Disclose
Building a medical ontology may contribute to information retrieval tasks that extract information supporting diagnosis from narrative texts written by experts. In languages such as Japanese and Chinese, words are not separated by white space, and the medical domain contains many compound technical terms, so it is difficult for computer programs to parse sentences appropriately.
The purpose of this study is to investigate the distribution of transitional probabilities between characters at medical term boundaries within compound words.
We used 100 Japanese computed tomography (CT) reports randomly selected from 2,000 reports written during July 2005 at Hokkaido University Hospital.
Medical terms in the CT reports were identified with the morphological analysis system ChaSen, which is based on a probabilistic language model and was developed by the Matsumoto laboratory at the Nara Institute of Science and Technology. MeSH-based medical terms (51,385 entries), obtained from the UMLS Metathesaurus (Unified Medical Language System, release 2005AA), were added to ChaSen as a medical dictionary. A radiographer corrected the parsing errors in the output. We then computed transitional probabilities as the conditional probabilities of character uni-grams, bi-grams, and tri-grams.
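As a minimal sketch of the n-gram computation described above, assuming plain maximum-likelihood estimates from raw character counts (the abstract does not state the estimator or whether any smoothing was used), the transitional probability of an n-gram is its count divided by the number of times its (n-1)-character prefix is followed by any character; for a uni-gram it reduces to relative frequency. The function names and the toy strings are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def ngram_counts(texts, n):
    """Count character n-grams of length n across all texts."""
    counts = Counter()
    for text in texts:
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def transitional_probability(texts, ngram):
    """MLE of P(last character | preceding characters) for a character n-gram.

    Assumption: no smoothing. For a uni-gram this is the character's
    relative frequency; otherwise count(ngram) / count(prefix + any char).
    """
    n = len(ngram)
    counts = ngram_counts(texts, n)
    if n == 1:
        total = sum(counts.values())
        return counts[ngram] / total if total else 0.0
    # Denominator: occurrences of the prefix that are followed by some character.
    prefix = ngram[:-1]
    prefix_total = sum(c for g, c in counts.items() if g[:-1] == prefix)
    return counts[ngram] / prefix_total if prefix_total else 0.0


if __name__ == "__main__":
    # Hypothetical toy "reports", not data from the study.
    reports = ["右肺門部に腫瘤", "左肺門部", "肺野"]
    for gram in ["肺", "肺門", "肺門部"]:
        print(gram, transitional_probability(reports, gram))
```

Computing the denominator from the n-gram counts themselves, rather than from all prefix occurrences, avoids counting prefixes that end a report and therefore have no following character.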
Each report contained 256.4 ± 13.7 characters, and there were 863 character types and 1,941 word types. As an example of an anatomical location, “pulmonary hilum” was parsed as a tri-gram and occurred 74 times (probability 6.54 × 10⁻³).
Computing transitional probabilities should help medical texts be parsed correctly. Transitional probabilities may also allow us to determine an appropriate dictionary size for parsing narrative texts and, when used in a term extraction algorithm, to develop a medical ontology. Further work will be required to parse the texts precisely.
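One way transitional probabilities could feed a term extraction step is to place a boundary wherever the probability of the next character drops. The sketch below uses a simple fixed-threshold rule on character bi-gram probabilities; this rule, the `threshold` parameter, the function names, and the toy corpus are all hypothetical illustrations, not the authors' algorithm.

```python
from collections import Counter


def bigram_transitions(texts):
    """Estimate P(next char | current char) by maximum likelihood (no smoothing)."""
    pair_counts = Counter()
    left_counts = Counter()
    for text in texts:
        for a, b in zip(text, text[1:]):
            pair_counts[(a, b)] += 1
            left_counts[a] += 1
    return {pair: c / left_counts[pair[0]] for pair, c in pair_counts.items()}


def segment(text, probs, threshold=0.5):
    """Split text wherever the transitional probability falls below threshold.

    Hypothetical rule for illustration. Unseen character pairs are treated
    as probability 0.0 and therefore always start a new segment.
    """
    if not text:
        return []
    segments = [text[0]]
    for a, b in zip(text, text[1:]):
        if probs.get((a, b), 0.0) < threshold:
            segments.append(b)   # low probability: start a new term
        else:
            segments[-1] += b    # high probability: extend the current term
    return segments


if __name__ == "__main__":
    # Hypothetical toy corpus, not data from the study.
    corpus = ["右腎", "右葉", "右肺", "肺門部", "肺門部"]
    probs = bigram_transitions(corpus)
    print(segment("右肺門部", probs))
```

In this toy corpus "右" is followed by three different characters, so P(肺 | 右) is 1/3 and a boundary is placed after "右", while "肺門部" stays together. A real system would need smoothing for unseen pairs and a threshold tuned on annotated reports.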
Parsing narrative texts may contribute to information retrieval tasks that extract information supporting diagnosis from narrative texts written by experts.
Nishimoto, N, Terae, S, Uesugi, M, Terashita, T, Tanikawa, T, Ogasawara, K, Endoh, A, Sakurai, T, et al. The Pilot Study of Character-based Adjacent Probability in Japanese CT Clinical Reports. Radiological Society of North America 2007 Scientific Assembly and Annual Meeting, November 25 - November 30, 2007, Chicago IL.
http://archive.rsna.org/2007/5015943.html