Yan Xu, Jiahua Liu, Jiajun Wu, Yue Wang, and Eric Chang
Objective: To create a highly accurate coreference system in discharge summaries for the fifth i2b2 challenge. The categories consist of person, problem, treatment, test and pronoun.
Design: We developed an integrated coreference system exploiting both document-intrinsic structures and world knowledge. The system is partitioned into three subsystems: a person coreference system based on three person attributes, a problem/treatment/test system based on mapping engines and world knowledge, and a pronoun system based on multi-SVM. The intrinsic latent structure includes three person attributes (patient, relative and hospital personnel) as well as anatomy position, medication, indicator, temporal relation, spatial relation, section, modifier, equipment, operation, and assertion. The world knowledge is extracted from two kinds of external resources: open sources (Wikipedia and WordNet) and resources provided by Microsoft Research Asia (Probase, Evidence and NeedleSeek).
Measurements: Macro-averaged and micro-averaged precision, recall and F-measure under the MUC, BCubed and CEAF metrics were used to evaluate the results.
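As an illustration of how macro- and micro-averaging differ, the following minimal sketch computes BCubed precision, recall and F-measure over a set of documents. It is not the challenge's evaluation script: the function names (`b_cubed_sums`, `macro_micro`), the input layout (each document as a pair of gold and predicted chain lists), the assumption that gold and predicted chains cover the same mentions, and the averaging convention (macro averages per-document scores, micro pools per-mention scores over all documents) are all assumptions made for this example.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def b_cubed_sums(key_chains, response_chains):
    """Sum per-mention BCubed precision and recall for one document.

    Assumes both partitions cover exactly the same mentions
    (singletons are listed as one-element chains).
    Returns (precision_sum, recall_sum, number_of_mentions).
    """
    key_of = {m: frozenset(c) for c in key_chains for m in c}
    resp_of = {m: frozenset(c) for c in response_chains for m in c}
    p_sum = r_sum = 0.0
    for mention, key_chain in key_of.items():
        resp_chain = resp_of[mention]
        overlap = len(key_chain & resp_chain)
        p_sum += overlap / len(resp_chain)  # share of the predicted chain that is correct
        r_sum += overlap / len(key_chain)   # share of the gold chain that was recovered
    return p_sum, r_sum, len(key_of)


def macro_micro(documents):
    """documents: list of (key_chains, response_chains) pairs."""
    per_doc = [b_cubed_sums(key, resp) for key, resp in documents]
    # Macro: score each document separately, then average the scores.
    macro_p = sum(p / n for p, _, n in per_doc) / len(per_doc)
    macro_r = sum(r / n for _, r, n in per_doc) / len(per_doc)
    # Micro: pool all mentions, so longer documents weigh more.
    total = sum(n for _, _, n in per_doc)
    micro_p = sum(p for p, _, _ in per_doc) / total
    micro_r = sum(r for _, r, _ in per_doc) / total
    return {
        "macro": (macro_p, macro_r, f_measure(macro_p, macro_r)),
        "micro": (micro_p, micro_r, f_measure(micro_p, micro_r)),
    }


if __name__ == "__main__":
    docs = [
        ([["a", "b", "c"], ["d"]], [["a", "b"], ["c", "d"]]),
        ([["x", "y"]], [["x", "y"]]),
    ]
    for name, (p, r, f) in macro_micro(docs).items():
        print(f"{name}: P={p:.3f} R={r:.3f} F={f:.3f}")
```

Under this convention the two averages diverge whenever documents differ in size: a long summary with many errors lowers the micro-average more than the macro-average.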
Results: The system achieved an overall micro-averaged F-measure of 0.916, specifically 0.902 for person, 0.876 for problem, 0.847 for treatment, and 0.825 for test in 10-fold cross validation. The results for person are the best among the four categories.
Conclusions: Our system achieved promising performance. This study demonstrates that it is feasible to accomplish the coreference task in discharge summaries.
In Proceedings of the 2011 i2b2/VA/Cincinnati Workshop on Challenges in Natural Language Processing for Clinical Data