Scalable Semi-Supervised Query Classification Using Matrix Sketching

Young-Bum Kim; Karl Stratos; Ruhi Sarikaya

July 2016

Published by ACL - Association for Computational Linguistics

The enormous scale of unlabeled text available today necessitates scalable schemes for representation learning in language processing. In this paper, we are interested in classifying the intent of a user query. While our labeled data is quite limited, we have access to a virtually unlimited number of unlabeled queries, which can be used to induce useful representations, for instance by principal component analysis (PCA). However, the sheer size of the data makes it prohibitive even to store it in memory, let alone apply conventional batch algorithms. In this work, we apply the recently proposed matrix sketching algorithm (Liberty, 2013) to entirely obviate the scalability problem. This algorithm approximates the data within a specified memory bound while preserving the covariance structure necessary for PCA. Using matrix sketching, we significantly improve user intent classification accuracy by leveraging large amounts of unlabeled queries.
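
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the Frequent Directions procedure from Liberty (2013), not the authors' implementation. The function names `frequent_directions` and `top_directions`, the dense NumPy representation, and the choice of sketch size `ell` are assumptions made for illustration. The sketch keeps an `ell x d` matrix `B` in memory regardless of how many rows are streamed in, and `B.T @ B` approximates the covariance structure `A.T @ A` of the full data.

```python
import numpy as np


def frequent_directions(rows, d, ell):
    """Stream d-dimensional rows into an ell x d sketch B (assumes ell <= d)."""
    assert 2 <= ell <= d, "this sketch assumes 2 <= ell <= d"
    B = np.zeros((ell, d))
    next_zero = 0  # index of the first all-zero row of B
    for a in rows:
        if next_zero == ell:
            # B is full: shrink its singular values so that the bottom
            # rows become zero and can be reused for new data.
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            delta = s[ell // 2] ** 2
            s_shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
            B = s_shrunk[:, None] * Vt
            next_zero = ell // 2  # rows from here on are now zero
        B[next_zero] = a
        next_zero += 1
    return B


def top_directions(B, k):
    """Return the top-k right singular vectors of the sketch (k x d)."""
    _, _, Vt = np.linalg.svd(B, full_matrices=False)
    return Vt[:k]
```

Under these assumptions, a feature vector `x` for a query could be mapped to a `k`-dimensional representation with `top_directions(B, k) @ x`. Conventional PCA also centers the data first; that step is omitted here and would need to be handled separately in a streaming setting.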