Galen Andrew
About Me:
I am a Research Software Developer
in the Text Mining, Search and Navigation group at Microsoft Research.
My research interests include machine learning, natural language processing, recommender systems, scientific computing and information retrieval.
I completed my MS in Computer Science at Stanford University, where I was a research assistant for
Chris Manning in
the Stanford NLP Group,
part of the Stanford AI Lab. Before that, I majored in Math at
Reed College.
When I'm not doing research, I enjoy playing and listening to music, dancing, reading, and playing Go. I also love studying foreign languages;
I lived for a while in Russia (St. Petersburg and Moscow) and in China (Beijing).
[Photo] An out-of-date but flattering photo.
Email: my address is my Microsoft username (the one in this page's URL) at microsoft.com.
Publications:
- Scalable training of L1-regularized log-linear models. Galen Andrew and Jianfeng Gao. ICML, 2007.
Introduces the "Orthant-Wise Limited-memory Quasi-Newton" algorithm (OWL-QN), a new method for optimizing an L1-regularized loss that is very efficient, even on problems with millions of parameters. A video of the presentation has been recorded. Source code for OWL-QN, including a standalone trainer for L1-regularized least-squares or logistic regression, is available for download.
- A comparative study of parameter estimation methods for statistical natural language processing. Jianfeng Gao, Galen Andrew, Mark Johnson and Kristina Toutanova. ACL, 2007.
Compares five parameter estimation methods (L2-regularized MaxEnt, L1-regularized MaxEnt, Averaged Perceptron, Boosting and Boosted LASSO) on four models for NLP tasks (parse re-ranking, part-of-speech tagging with an MEMM, Chinese word segmentation with a semi-CRF, and re-ranking for language model adaptation).
- A hybrid Markov/semi-Markov conditional random field for sequence segmentation. Galen Andrew. EMNLP, 2006.
A log-linear model for sequence segmentation that combines the strengths of the Markov CRF and the semi-Markov CRF, achieving record-breaking results on Chinese word segmentation. A slightly improved model is described in the paper "A comparative study..." above.
- Tregex and Tsurgeon: tools for querying and manipulating tree data structures. Roger Levy and Galen Andrew. LREC, 2006.
Describes Tregex and Tsurgeon, tools for searching treebanks for trees that match a given pattern and for performing specified manipulations on the matched trees. See "Software" (below) for download.
- A conditional random field word segmenter for Sighan Bakeoff 2005. Huihsin Tseng, Pichuan Chang, Galen Andrew, Daniel Jurafsky, Christopher D. Manning. Fourth SIGHAN Workshop on Chinese Language Processing.
Describes our champion system from the 2005 SIGHAN Chinese word segmentation bakeoff.
- Verb sense and subcategorization: using joint inference to improve performance on complementary tasks. Galen Andrew, Trond Grenager, Christopher D. Manning. EMNLP, 2004.
A generative model that demonstrates the benefit of solving multiple related problems at once.
- Boosting as a metaphor for algorithm design. Kevin Leyton-Brown, Eugene Nudelman, Galen Andrew, James McFadden, Yoav Shoham. 2003.
We construct a learned model of algorithm run time and demonstrate several applications of it. Most of the material in this full (unpublished) version appeared in two shorter papers: "A portfolio approach to algorithm selection" at IJCAI 2003, and a complementary paper with the original title at Constraint Programming (CP) 2003.
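The semi-Markov models in the EMNLP 2006 segmentation paper score whole candidate words (segments) rather than single characters. As a rough illustration of how a segment-scoring model can be decoded, here is a minimal Python sketch of the segment-level Viterbi recursion, where `best[j]` is the highest total score of any segmentation of the first `j` characters. It is not code from the paper: the names `semi_markov_viterbi` and `toy_score`, the toy English lexicon, and its scores are all hypothetical. A trained (semi-)CRF would instead compute each segment's score from learned feature weights, and a hybrid model's character-level features would also have to be folded into those scores.

```python
from typing import Callable, Dict, List

SegmentScore = Callable[[str, int, int], float]  # score of the segment text[start:end]


def semi_markov_viterbi(text: str, score: SegmentScore, max_len: int = 4) -> List[str]:
    """Return the segmentation of `text` whose segment scores have the highest sum.

    best[j] is the best total score of any segmentation of text[:j];
    back[j] is where the last segment of that best segmentation starts.
    """
    n = len(text)
    best = [float("-inf")] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        # Try every candidate final segment text[start:end] of up to max_len characters.
        for start in range(max(0, end - max_len), end):
            cand = best[start] + score(text, start, end)
            if cand > best[end]:
                best[end] = cand
                back[end] = start
    # Follow back-pointers from the end of the string to recover the segments.
    words: List[str] = []
    end = n
    while end > 0:
        start = back[end]
        words.append(text[start:end])
        end = start
    return words[::-1]


# Hypothetical toy scores: known words get a log-probability-like score,
# and any other segment is heavily penalized per character.
LEXICON: Dict[str, float] = {"the": -1.0, "cat": -2.0, "cats": -6.0, "sat": -2.5, "at": -1.5}


def toy_score(text: str, start: int, end: int) -> float:
    seg = text[start:end]
    return LEXICON.get(seg, -10.0 * len(seg))


print(semi_markov_viterbi("thecatsat", toy_score))
```

The recursion costs O(n · max_len) calls to the scoring function, which is why semi-Markov models usually cap the segment length.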
Tutorials/Notes:
I prepared these notes on quasi-Newton optimization methods for an informal tutorial at MSR.
Software:
Source code for OWL-QN, the L1-regularized optimization algorithm described above, is available for download. You can define your own differentiable loss to optimize with L1 regularization, or use the standalone trainer to train L1-regularized least-squares or logistic regression models.
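For readers curious how OWL-QN works, the following is a minimal pure-Python sketch. It was written for illustration and is not the released source code. It assumes a hypothetical interface in which a user-defined loss is a function returning `(value, gradient)`. The sketch follows the main ideas of the ICML 2007 paper:

- a pseudo-gradient handles the non-differentiable L1 term;
- components of the L-BFGS direction that disagree in sign with the negative pseudo-gradient are zeroed;
- a backtracking line search projects each trial point back onto the current orthant.

The line-search constants, stopping tolerance, and all function names (`owlqn`, `pseudo_gradient`, `lbfgs_direction`, `least_squares_loss`) are illustrative assumptions.

```python
import math
from typing import Callable, List, Tuple

Vec = List[float]
LossFn = Callable[[Vec], Tuple[float, Vec]]  # hypothetical interface: w -> (loss, gradient)

def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))

def sign(v: float) -> int:
    return (v > 0) - (v < 0)

def pseudo_gradient(x: Vec, g: Vec, c: float) -> Vec:
    """Pseudo-gradient of loss(x) + c * ||x||_1, given g = gradient of the loss at x."""
    pg = []
    for xi, gi in zip(x, g):
        if xi != 0:
            pg.append(gi + c * sign(xi))
        elif gi + c < 0:  # increasing x_i from 0 lowers the objective
            pg.append(gi + c)
        elif gi - c > 0:  # decreasing x_i from 0 lowers the objective
            pg.append(gi - c)
        else:  # neither direction helps to first order, so x_i stays at 0
            pg.append(0.0)
    return pg

def lbfgs_direction(pg: Vec, s_hist: List[Vec], y_hist: List[Vec]) -> Vec:
    """L-BFGS two-loop recursion: returns -H @ pg for the implicit inverse Hessian H."""
    q, saved = list(pg), []
    for s, y in zip(reversed(s_hist), reversed(y_hist)):  # newest pair first
        rho = 1.0 / dot(y, s)
        a = rho * dot(s, q)
        saved.append((a, rho))
        q = [qi - a * yi for qi, yi in zip(q, y)]
    if s_hist:  # initial scaling gamma = s'y / y'y from the newest pair
        gamma = dot(s_hist[-1], y_hist[-1]) / dot(y_hist[-1], y_hist[-1])
        q = [gamma * qi for qi in q]
    for (s, y), (a, rho) in zip(zip(s_hist, y_hist), reversed(saved)):  # oldest first
        b = rho * dot(y, q)
        q = [qi + (a - b) * si for qi, si in zip(q, s)]
    return [-qi for qi in q]

def owlqn(loss: LossFn, x0: Vec, c: float, m: int = 10,
          max_iter: int = 200, tol: float = 1e-8) -> Vec:
    """Minimize loss(x) + c * ||x||_1 with an OWL-QN-style iteration (sketch)."""
    x = list(x0)
    f, g = loss(x)
    obj = f + c * sum(abs(v) for v in x)
    s_hist, y_hist = [], []
    for _ in range(max_iter):
        pg = pseudo_gradient(x, g, c)
        if max(abs(p) for p in pg) < tol:
            break
        # Keep only direction components that agree in sign with steepest descent, -pg.
        d = [di if di * pgi < 0 else 0.0 for di, pgi in zip(lbfgs_direction(pg, s_hist, y_hist), pg)]
        # Orthant for this step: sign(x_i), or sign(-pg_i) where x_i == 0.
        orth = [sign(xi) if xi != 0 else -sign(pgi) for xi, pgi in zip(x, pg)]
        t = 1.0
        while True:  # backtracking line search with projection onto the orthant
            x_new = [xi + t * di for xi, di in zip(x, d)]
            x_new = [v if sign(v) == o else 0.0 for v, o in zip(x_new, orth)]
            f_new, g_new = loss(x_new)
            obj_new = f_new + c * sum(abs(v) for v in x_new)
            step = [a - b for a, b in zip(x_new, x)]
            if obj_new <= obj + 1e-4 * dot(pg, step):  # sufficient decrease
                break
            t *= 0.5
            if t < 1e-16:
                return x  # no acceptable step found
        # Curvature pairs use the gradient of the smooth loss only.
        y = [a - b for a, b in zip(g_new, g)]
        if dot(step, y) > 1e-12:
            s_hist.append(step)
            y_hist.append(y)
            if len(s_hist) > m:
                del s_hist[0], y_hist[0]
        x, g, obj = x_new, g_new, obj_new
    return x

def least_squares_loss(rows: List[Vec], targets: Vec) -> LossFn:
    """Hypothetical user-defined loss: 0.5 * mean squared residual of a linear model."""
    n = len(targets)
    def loss(w: Vec) -> Tuple[float, Vec]:
        value, grad = 0.0, [0.0] * len(w)
        for row, target in zip(rows, targets):
            r = dot(w, row) - target  # residual
            value += 0.5 * r * r / n
            for j, xj in enumerate(row):
                grad[j] += r * xj / n
        return value, grad
    return loss

# Orthogonal toy design: the exact L1-regularized minimizer here is [2.0, 0.0].
w = owlqn(least_squares_loss([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5]), [0.0, 0.0], c=0.5)
print(w)
```

Only the loss function changes from problem to problem. In this sketch, supplying a logistic loss that returns its value and gradient would give L1-regularized logistic regression with no other changes.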
At Stanford, I wrote Tregex, a utility for matching patterns in syntax trees, similar to Tgrep-2. It contains essentially the same functionality as Tgrep-2, plus several extremely useful relations for natural language trees, for example "A is the lexical head of B". Because it does not index the treebank, it is somewhat slower than Tgrep-2 when searching large treebanks, but not prohibitively so. It is publicly available (under the GNU GPL) from the Stanford NLP Group website.
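To give a feel for the kind of query Tregex answers, here is a toy Python sketch. It is not Tregex, which is a Java tool with a much richer pattern language. The sketch parses a bracketed tree and finds node pairs that satisfy one of two relations, using symbols borrowed from Tregex syntax: `A < B` (A immediately dominates B) and `A << B` (A dominates B). The `Tree` class and the `parse` and `matches` functions are hypothetical names. The parser assumes well-formed Penn Treebank-style brackets with a label after every opening parenthesis. Like any search without an index, it visits every node of the tree.

```python
from typing import Iterator, List, Tuple


class Tree:
    """A labeled tree node; leaves (words) are nodes with no children."""

    def __init__(self, label: str, children: List["Tree"]):
        self.label = label
        self.children = children

    def subtrees(self) -> Iterator["Tree"]:
        """Yield this node and all of its descendants in pre-order."""
        yield self
        for child in self.children:
            yield from child.subtrees()


def parse(s: str) -> Tree:
    """Parse a bracketed string such as '(NP (DT the) (NN cat))' into a Tree."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node() -> Tree:
        nonlocal pos
        if tokens[pos] != "(":  # a bare token is a leaf (word)
            pos += 1
            return Tree(tokens[pos - 1], [])
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            children.append(node())
        pos += 1  # consume the closing parenthesis
        return Tree(label, children)

    return node()


def matches(tree: Tree, a: str, rel: str, b: str) -> List[Tuple[Tree, Tree]]:
    """Return every (A-node, B-node) pair in `tree` such that A `rel` B holds.

    rel is '<' (A immediately dominates B) or '<<' (A dominates B).
    """
    pairs = []
    for node in tree.subtrees():
        if node.label != a:
            continue
        if rel == "<":
            candidates = node.children
        elif rel == "<<":
            candidates = [d for d in node.subtrees() if d is not node]
        else:
            raise ValueError(f"unsupported relation: {rel!r}")
        pairs.extend((node, c) for c in candidates if c.label == b)
    return pairs


t = parse("(S (NP (DT the) (NN cat)) (VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))))")
for a_node, b_node in matches(t, "VP", "<<", "NN"):
    print(a_node.label, "<<", b_node.label, "->", b_node.children[0].label)
```

Here the only NN dominated by the VP is the one over "mat". A query such as `matches(t, "VP", "<", "NN")` finds nothing, because no NN is an immediate child of the VP.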