Isabelle Stanton, Samuel Ieong, and Nina Mishra
8 July 2014
Circumlocution is when many words are used to describe what could be said with fewer, e.g., “a machine that takes moisture out of the air” instead of “dehumidifier”. Web search is a perfect backdrop for circumlocution where people struggle to name what they seek. In some domains, not knowing the correct term can have a significant impact on the search results that are retrieved. We study the medical domain, where professional medical terms are not commonly known and where the consequence of not knowing the correct term can impact the accuracy of surfaced information, as well as escalation of anxiety, and ultimately the medical care sought. Given a free-form colloquial health search query, our objective is to find the underlying professional medical term. The problem is complicated by the fact that people issue quite varied queries to describe what they have. Machine-learning algorithms can be brought to bear on the problem, but there are two key complexities: creating high-quality training data and identifying predictive features. To our knowledge, no prior work has been able to crack this important problem due to the lack of training data. We give novel solutions and demonstrate their efficacy via extensive experiments, greatly improving over the prior art.
|Published in||SIGIR (Information Retrieval)|
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). Copyright is held by the author/owner(s). Subject to Microsoft Research Open Access Policy.