The same entity is often referred to in a variety of ways. For example, the camera Canon 600d is also referred to as "canon rebel t3i", the celebrity Jennifer Lopez is also referred to as "jlo" and Seattle Tacoma International Airport is also referred to as "sea tac". These are known as synonyms. Without knowledge of synonyms, many applications like e-commerce search will fail to return relevant results. We leverage the data assets amassed by Bing to automatically mine such synonyms.
One of the main insights is to use Bing's query log to mine synonyms. However, simple techniques like using co-click frequencies are not adequate; we developed new features like pseudo-document similarity and context similarity. Furthermore, we leverage other data sources like web lists, web tables and certain text patterns to mine synonyms.
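To make the difference between co-click frequency and a pseudo-document style feature concrete, here is a minimal sketch on a made-up click log. It is an illustration under stated assumptions, not the production pipeline: the log, the URLs, the function names, and the exact scoring formulas (Jaccard overlap for co-clicks; fraction of an entity's clicked documents whose pseudo-document contains every candidate token) are all hypothetical choices for this example.

```python
from collections import defaultdict

# Hypothetical toy click log of (query, clicked_url) pairs; not real Bing data.
CLICK_LOG = [
    ("canon 600d", "A"),
    ("canon 600d", "B"),
    ("canon 600d", "C"),
    ("rebel t3i", "A"),               # tail query with a single click
    ("canon rebel t3i review", "B"),
    ("rebel t3i price", "C"),
    ("nikon d5100", "D"),
]


def clicked_urls(log):
    """Map each query to the set of URLs clicked for it."""
    urls = defaultdict(set)
    for query, url in log:
        urls[query].add(url)
    return urls


def co_click_similarity(log, a, b):
    """Jaccard overlap of the URL sets clicked for queries a and b (assumed formula)."""
    urls = clicked_urls(log)
    union = urls[a] | urls[b]
    return len(urls[a] & urls[b]) / len(union) if union else 0.0


def pseudo_documents(log):
    """Map each URL to the set of tokens of all queries that clicked it."""
    docs = defaultdict(set)
    for query, url in log:
        docs[url].update(query.split())
    return docs


def pseudo_doc_similarity(log, candidate, entity):
    """Fraction of the entity's clicked URLs whose pseudo-document
    contains every token of the candidate (assumed formula)."""
    urls = clicked_urls(log)[entity]
    if not urls:
        return 0.0
    docs = pseudo_documents(log)
    tokens = set(candidate.split())
    hits = sum(1 for url in urls if tokens <= docs[url])
    return hits / len(urls)


if __name__ == "__main__":
    print(co_click_similarity(CLICK_LOG, "canon 600d", "rebel t3i"))
    print(pseudo_doc_similarity(CLICK_LOG, "rebel t3i", "canon 600d"))
```

In this toy log, "rebel t3i" shares only one of the three URLs clicked for "canon 600d", so its co-click overlap is 1/3. Its tokens nevertheless appear in the pseudo-document of every URL clicked for "canon 600d", because longer queries such as "canon rebel t3i review" contribute them, so the pseudo-document score is 1.0. An unrelated query such as "nikon d5100" scores 0 on both.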
- We worked closely with the Bing and Azure DataMarket teams to release the Synonym API so that external developers can integrate synonym knowledge into their applications (e.g., e-commerce search applications). It is part of the Bing Dev Center along with the other Bing APIs.
- We have integrated our synonym technology into Dynamics AX for Retail to power their e-commerce search functionality.
- Synonym technology is also used by Bing Sports to return relevant information when people search for sports teams, players, leagues, etc. For example, when a user searches for "tampabaybucs", Bing Sports uses our synonyms to return an "information card" about the Tampa Bay Buccaneers.
- Synonym technology is used to enhance product search in Bing Shopping.
- Synonym technology is used by Bing Ads for matching keyword queries with advertisements.
- Yeye He, Kaushik Chakrabarti, Tao Cheng, and Tomasz Tylenda, Automatic Discovery of Attribute Synonyms Using Query Logs and Table Corpora, in WWW, World Wide Web Consortium (W3C), April 2016.
- Bilyana Taneva, Tao Cheng, Kaushik Chakrabarti, and Yeye He, Mining Acronym Expansions and Their Meanings Using Query Click Log, in WWW Conference 2013, May 2013.
- Tao Cheng, Kaushik Chakrabarti, Surajit Chaudhuri, Vivek Narasayya, and Manoj Syamala, Data Services for E-tailers Leveraging Web Search Engine Assets, in ICDE Conference, April 2013.
- Kaushik Chakrabarti, Surajit Chaudhuri, Tao Cheng, and Dong Xin, A Framework for Robust Discovery of Entity Synonyms, in SIGKDD, 2012.
- Surajit Chaudhuri, Venkatesh Ganti, and Dong Xin, Exploiting Web Search To Generate Synonyms For Entities, in 18th International World Wide Web Conference, Association for Computing Machinery, Inc., April 2009.
- Surajit Chaudhuri, Venkatesh Ganti, and Dong Xin, Mining Document Collections to Facilitate Accurate Approximate Entity Matching, in VLDB, Very Large Data Bases Endowment Inc., 2009.