Yan Xu, Kai Hong, Junichi Tsujii, and Eric Chang
14 May 2012
Objective: A system that translates narrative text in the
medical domain into a structured representation is in great
demand. The system described here performs three sub-tasks:
concept extraction, assertion classification, and relation
identification.
Design: The overall system consists of five steps:
(1) pre-processing sentences, (2) marking noun phrases
(NPs) and adjective phrases (APs), (3) extracting
concepts, using a dosage-unit dictionary to dynamically
switch between two models based on Conditional Random
Fields (CRF), (4) classifying assertions by voting among
five classifiers, and (5) identifying relations using
normalized sentences with a set of effective
discriminating features.
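As an illustration of step (4), the sketch below combines the
labels from five classifiers by simple plurality voting. The
abstract does not state the voting scheme or tie-breaking rule,
so both are assumptions here, and the function name
majority_vote and the sample labels are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label among the classifier outputs.

    Assumption: ties are broken in favor of the label that appears
    first in the input list (the source does not specify a rule).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    best = max(counts.values())
    # Scan in input order so that ties resolve deterministically.
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical outputs of five classifiers for one concept.
votes = ["present", "absent", "present", "possible", "present"]
result = majority_vote(votes)
```

With the sample votes above, three of the five classifiers
agree, so that label is the one returned.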
Measurements: Macro-averaged and micro-averaged
precision, recall and F-measure were used to evaluate
results.
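The difference between the two averaging schemes can be
sketched as follows. This is a minimal illustration, not the
evaluation script used in the study: the per-class counts are
made up, and the macro-averaged F-measure is taken here as the
mean of the per-class F-measures, which is one common
convention among several.

```python
def prf(tp, fp, fn):
    """Precision, recall and F-measure from raw counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def micro_average(counts):
    """Pool the counts over all classes, then compute the scores once."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return prf(tp, fp, fn)


def macro_average(counts):
    """Compute the scores per class, then take the unweighted mean."""
    scores = [prf(*c) for c in counts.values()]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))


# Hypothetical (true positive, false positive, false negative) counts.
counts = {
    "problem": (80, 20, 20),
    "treatment": (10, 10, 30),
}
micro = micro_average(counts)
macro = macro_average(counts)
```

Micro-averaging weights every instance equally, so frequent
classes dominate the score; macro-averaging weights every class
equally, so rare classes count as much as frequent ones.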
Results: The performance is competitive with
state-of-the-art systems, with micro-averaged F-measures of
0.8489 for concept extraction, 0.9392 for assertion
classification and 0.7326 for relation identification.
Conclusions: The system exploits an array of common
features and achieves state-of-the-art performance.
Prudent feature engineering sets the foundation of our
system. In concept extraction, we demonstrated that
switching between models, one of which is designed
especially for telegraphic sentences, significantly
improved extraction of the treatment concept. In assertion
classification, a set of features derived from a rule-based
classifier proved effective for classes such as
conditional and possible, which would otherwise suffer
from data scarcity in conventional machine-learning
methods. In relation identification, we use a two-stage
architecture, the second stage of which applies
pairwise classifiers to possible candidate classes. This
architecture significantly improves performance.
In Journal of the American Medical Informatics Association
| Type | Article |