Abstract Archives of the RSNA, 2009
LL-IN2112-B11
Developing Manually Annotated Corpus for Japanese CT Reports Using RadLex
Scientific Posters
Presented on November 29, 2009
Presented as part of LL-IN-B: Informatics
Naoki Nishimoto MS, Presenter: Nothing to Disclose
Satoshi Terae MD, Abstract Co-Author: Research grant, J-MAC SYSTEM, Inc
Research grant, Medical Image Lab
Research grant, FUJIFILM Holdings Corporation
Ayako Yagahara, Abstract Co-Author: Nothing to Disclose
Yuki Yokooka, Abstract Co-Author: Nothing to Disclose
Shintarou Tsuji, Abstract Co-Author: Nothing to Disclose
Masahito Uesugi, Abstract Co-Author: Nothing to Disclose
Katsuhiko Ogasawara PhD, Abstract Co-Author: Nothing to Disclose
et al, Abstract Co-Author: Nothing to Disclose
RadLex is a major effort to define the meaning of medical terms in radiology reports. However, manually mapping conceptual labels to specific terms remains challenging and labor-intensive, in part because few studies have reported a framework for developing annotated corpora in the radiology domain. The purpose of this study is to construct a corpus of CT reports manually annotated with semantic information and to evaluate the statistical characteristics of the semantics in the corpus.
We randomly selected 30 Japanese CT reports from the 1,989 reports written during July 2005 at Hokkaido University Hospital, Japan. A Japanese version of RadLex was not available, and automatic tagging tools did not work on these reports. We therefore selected the 13 top nodes of the RadLex hierarchy and annotated the CT reports manually. We developed an original manual annotation tool for RadLex in Java; it extracts RadLex terms through the API of the ontology editor Protégé 3.2.1. Using this tool, we annotated the words in the CT reports with RadLex terms. The corpus was stored in XML format. The frequencies of tagged semantics and phrases were counted.
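The counting step can be illustrated with a minimal sketch, assuming a simple XML layout. The element name `term`, the attribute `node`, and the English sample words are hypothetical stand-ins; the abstract does not describe the actual schema of the corpus, and the original tool was written in Java rather than Python.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical corpus fragment. The element and attribute names are
# illustrative assumptions, not the schema used in the study.
SAMPLE = """
<report id="example-1">
  <sentence>
    <term node="anatomic entity">liver</term>
    <term node="imaging observation">mass</term>
    <term node="imaging observation characteristic">low density</term>
    <term node="anatomic entity">portal vein</term>
  </sentence>
</report>
""".strip()


def count_top_nodes(xml_text):
    """Count how often each top-node label occurs in one annotated report."""
    root = ET.fromstring(xml_text)
    return Counter(term.get("node") for term in root.iter("term"))


def relative_frequencies(counts):
    """Convert raw counts to percentages of all annotations."""
    total = sum(counts.values())
    return {node: 100.0 * n / total for node, n in counts.items()}


counts = count_top_nodes(SAMPLE)
freqs = relative_frequencies(counts)
for node, n in counts.most_common():
    print(f"{node}: {n} ({freqs[node]:.1f}%)")
```

Summing such per-report counters over all 30 reports would give corpus-level frequencies of the kind reported in this abstract.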
The number of characters in each report was 792±273, and the number of character types was 719. We counted the RadLex top nodes assigned in the Japanese CT reports. The total count of RadLex top-node annotations was 1,129. The most frequent top node was “anatomic entity” (44.8%), which includes terms such as “portal vein,” “liver,” and “pancreas.” The other frequent top nodes in the RadLex hierarchy were “imaging observation” (26.6%) and “imaging observation characteristic” (24.9%).
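As a rough check on these figures, the reported percentages can be converted back to approximate annotation counts. The sketch below uses only the total and percentages stated above; the resulting counts are estimates derived from rounded percentages, not values reported in the abstract.

```python
TOTAL = 1129  # total top-node annotations reported in the abstract

# Percentages as reported for the three most frequent top nodes.
REPORTED_PCT = {
    "anatomic entity": 44.8,
    "imaging observation": 26.6,
    "imaging observation characteristic": 24.9,
}

# Approximate counts, rounded to the nearest whole annotation.
approx_counts = {
    node: round(TOTAL * pct / 100) for node, pct in REPORTED_PCT.items()
}

# Share and approximate count left for the remaining top nodes.
remaining_pct = round(100 - sum(REPORTED_PCT.values()), 1)
remaining_count = TOTAL - sum(approx_counts.values())

for node, n in approx_counts.items():
    print(f"{node}: about {n}")
print(f"remaining top nodes: about {remaining_count} ({remaining_pct}%)")
```

Under these assumptions the three top nodes account for roughly 506, 300, and 281 annotations, leaving about 42 (3.7%) for the other ten top nodes.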
Pakhomov et al. showed that the accuracy of automatic part-of-speech identification increased from 89.79% to 94.69% with a small corpus. The corpus constructed in this study can help to increase the accuracy of automatic taggers in cross-language settings. Our study was limited to the top nodes of RadLex, which are abstract concepts. Further study is required for detailed semantic tagging.
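To put the cited improvement in perspective, the accuracy figures can be restated as error rates. The two accuracy values are those quoted above from Pakhomov et al.; the error-rate framing is a simple arithmetic restatement added here, not a result from either study.

```python
# Accuracy figures as cited in the abstract (percent).
before, after = 89.79, 94.69

# Absolute gain in percentage points.
gain_points = round(after - before, 2)

# The same change expressed as a relative reduction in error rate.
error_before = 100 - before
error_after = 100 - after
relative_error_reduction = round(
    100 * (error_before - error_after) / error_before, 1
)

print(f"gain: {gain_points} percentage points")
print(f"relative error reduction: {relative_error_reduction}%")
```

In these terms, a gain of 4.9 percentage points corresponds to cutting the tagging error rate by about 48%.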
RadLex is a standard terminology; however, a large gap remains between CT reports and RadLex terms. Manual semantic tagging is required for cross-language information retrieval in radiology.
Nishimoto, N, Terae, S, Yagahara, A, Yokooka, Y, Tsuji, S, Uesugi, M, Ogasawara, K, et al. Developing Manually Annotated Corpus for Japanese CT Reports Using RadLex. Radiological Society of North America 2009 Scientific Assembly and Annual Meeting, November 29 - December 4, 2009, Chicago IL.
http://archive.rsna.org/2009/8015211.html