Learning with N-Grams: From Massive Scales to Compressed Representations

N-gram models are essential to virtually any kind of text processing; they offer simple baselines that are surprisingly competitive with more complicated “state of the art” techniques. I will present a survey of my work on learning with arbitrarily long N-grams at massive scales. This framework combines fast matrix multiplication with a dual learning paradigm that I am developing to reconcile sparsity-inducing penalties with kernels. The presentation will focus on Dracula, a new form of deep learning based on classical ideas from compression. Dracula is a combinatorial optimization problem, and I will discuss some of its problem structure and use it to visualize its solution surface.
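
For context, a minimal sketch of the kind of N-gram baseline the abstract alludes to: bag-of-n-gram features fed to a sparse linear classifier. The toy data and the use of scikit-learn are assumptions for illustration only and are not part of the talk or the speaker's framework.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy sentiment data (hypothetical, for illustration only).
texts = ["great movie", "terrible plot", "surprisingly good", "not worth watching"]
labels = [1, 0, 1, 0]

# Unigrams through trigrams; allowing longer n-grams quickly blows up the
# feature space, which is where scalable learning methods and compressed
# representations become important.
model = make_pipeline(
    CountVectorizer(ngram_range=(1, 3)),
    LogisticRegression(penalty="l1", solver="liblinear"),  # sparsity-inducing penalty
)
model.fit(texts, labels)
print(model.predict(["good movie"]))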

Speaker Details

Hristo Paskov was born in Bulgaria and grew up in New York. He received a B.S. and M.Eng. in Computer Science from MIT while conducting research at the MIT Datacenter and in Tomaso Poggio’s group (CBCL). He is currently finishing a Ph.D. in Computer Science at Stanford under the advisement of John Mitchell and Trevor Hastie. His research spans machine learning, optimization, and algorithms, with the goal of building large-scale statistical methods and data representations. He is developing a new deep learning paradigm that uses compression to find compact data representations useful for statistical inference. His work has provided state-of-the-art methods for security and natural language processing.

Date:
Speakers:
Hristo Paskov
Affiliation:
Stanford