Jiang Bian, Bin Gao, and Tie-Yan Liu
Recent years have witnessed increasing efforts to apply deep learning techniques to text mining and natural language processing tasks. The basis of these tasks is to obtain high-quality distributed representations of words, i.e., word embeddings, from large amounts of text data. However, text itself usually contains limited information, which makes it necessary to leverage extra knowledge to understand it. Fortunately, since text is generated by humans, it already contains well-defined morphological and syntactic knowledge; moreover, the large amount of human-generated text on the Web enables the extraction of plentiful semantic knowledge. Thus, novel deep learning algorithms and systems are needed to leverage this knowledge to compute more effective word embeddings. In this paper, we conduct an empirical study on the capacity of leveraging morphological, syntactic, and semantic knowledge to achieve high-quality word embeddings. Our study explores these types of knowledge to define a new basis for word representation, provide additional input information, and serve as auxiliary supervision in deep learning, respectively. Experiments on a popular analogical reasoning task, a word similarity task, and a word completion task all demonstrate that knowledge-powered deep learning can enhance the effectiveness of word embeddings.
The final publication is available at Springer via http://rd.springer.com/