My primary research interests involve the development of machine learning and optimization methods for improving information systems, especially algorithms that are robust under uncertainty and that can automatically adapt to users and their information needs.
One area of special interest is the development of robust algorithms for information systems that can effectively balance risk and reward, a research direction that I introduced in my Ph.D. thesis. This work connects information retrieval with portfolio theory and other areas of computational finance to arrive at new models, algorithms, and evaluation methods that account for risk. For example, it shows how the reliability of current query expansion methods can be greatly improved by treating query expansion as a constrained convex optimization problem that accounts for the risk and reward of joint term selection.
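To make the risk-reward idea concrete, here is a minimal illustrative sketch, not the formulation used in the thesis or papers. It assumes each candidate expansion term has an estimated reward and that a covariance matrix captures the risk of choosing terms together, then picks term weights in [0, 1] by maximizing reward minus a risk penalty with projected gradient ascent. The function name `select_expansion_weights`, the `risk_aversion` parameter, and all numbers are hypothetical.

```python
import numpy as np


def select_expansion_weights(reward, covariance, risk_aversion=1.0, steps=2000):
    """Maximize reward.x - (risk_aversion / 2) * x' C x subject to 0 <= x <= 1.

    Assumes `covariance` is symmetric positive semidefinite, so the
    problem is convex. Solved by projected gradient ascent.
    """
    reward = np.asarray(reward, dtype=float)
    covariance = np.asarray(covariance, dtype=float)

    # A step of 1/L is safe, where L bounds the curvature of the objective.
    curvature = risk_aversion * np.linalg.eigvalsh(covariance).max()
    step = 1.0 / max(curvature, 1e-12)

    x = np.zeros_like(reward)
    for _ in range(steps):
        gradient = reward - risk_aversion * (covariance @ x)
        # Project back onto the box constraints after each ascent step.
        x = np.clip(x + step * gradient, 0.0, 1.0)
    return x


if __name__ == "__main__":
    # Made-up example: terms 0 and 1 are rewarding but highly correlated;
    # term 2 is less rewarding but independent and low-variance.
    rewards = [0.9, 0.8, 0.3]
    risk = [[1.0, 0.9, 0.0],
            [0.9, 1.0, 0.0],
            [0.0, 0.0, 0.2]]
    print(select_expansion_weights(rewards, risk, risk_aversion=5.0))
```

In this toy setting, a high `risk_aversion` pushes the weight of the second term toward zero because it is largely redundant with the first, while a very low value selects every term at full weight. Selecting terms jointly, rather than scoring each term in isolation, is what lets the optimizer see that redundancy.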
I'm also interested in large-scale data and text mining, statistical language modeling, natural language processing, educational applications of IR and machine learning such as predicting reading difficulty and computer-assisted language learning, and how the brain acquires language skills.
My Ph.D. is from the School of Computer Science at Carnegie Mellon University, where my advisor was Jamie Callan.
I was a member of the Language Technologies Institute. My undergraduate degree (B.Math.) is from the University of Waterloo. Apparently, I'm not the only one who thinks that CMU and Waterloo are a great combination!
Keynote talk: Enriching the Web with Readability Metadata
Little is currently known about the nature of the Web, its users, and how users interact with content when seen through the lens of text readability. For example, a document isn't relevant to a person's information need - at least, not immediately - if they can't understand it, yet Web search engines have traditionally ignored the problem of finding or providing content at the right level of difficulty as an aspect of relevance. I'll show how computing and applying metadata based on text readability at Web scale - especially in combination with topic metadata - opens up new and sometimes surprising possibilities for enriching our interactions with the Web, from personalizing Web search results to predicting user and site expertise to estimating searcher motivation. I'll also discuss future challenges and opportunities in predicting and improving text readability, particularly in light of the rapidly growing interest in large-scale applications for online education.
I gave the LTI Colloquium at Carnegie Mellon University on Friday April 27.
I gave the CSLP Seminar at Johns Hopkins on Tuesday April 24.
I gave the keynote talk at DDR 2012: Diversity in Document Retrieval, a workshop held on Feb. 12, 2012 in conjunction with WSDM 2012 (ACM International Conference on Web Search and Data Mining). Slides from my talk, "Searching as Investing", should soon be posted on the workshop website.
Publications
2012
L. Wang, P.N. Bennett, K. Collins-Thompson. Robust ranking models via risk-sensitive optimization. To appear, SIGIR 2012.
J. Kim, K. Collins-Thompson, P. N. Bennett, S. Dumais. Characterizing Web Content, User Interests, and Search Behavior by Reading Level and Topic. Proceedings of WSDM 2012. (pdf)
D. Sontag, K. Collins-Thompson, P. N. Bennett, R. W. White, S. Dumais, B. Billerbeck. Probabilistic Models for Personalizing Web Search. Proceedings of WSDM 2012. (pdf)
2011
K. Collins-Thompson, P. N. Bennett, R. W. White, S. de la Chica, D. Sontag. Personalizing Web Search Results by Reading Level. Proceedings of the Twentieth ACM International Conference on Information and Knowledge Management (CIKM 2011). Glasgow, Scotland. Oct. 2011. (pdf)
P. Kidwell, G. Lebanon, K. Collins-Thompson. “Statistical Estimation of Word Acquisition with Application to Readability Prediction.” Journal of the American Statistical Association. 106(493):21-30, 2011. (pdf)
K. Collins-Thompson. "Improving information retrieval with reading level prediction." SIGIR 2011 Workshop on Enriching Information Retrieval. Beijing, July 2011. (pdf)
G. Frishkoff, C. Perfetti, and K. Collins-Thompson. "Predicting robust vocabulary growth from measures of incremental learning". Scientific Studies of Reading, 15(1), 71-91. January 2011.
2010
J. Dillon and K. Collins-Thompson. “A unified optimization framework for robust pseudo-relevance feedback algorithms.” Proceedings of the Nineteenth ACM International Conference on Information and Knowledge Management (CIKM 2010), Toronto, Canada. (CIKM Student Travel Award Paper) (pdf)
M. Heilman, K. Collins-Thompson, M. Eskenazi, A. Juffs, L. Wilson. "Personalization of reading passages improves vocabulary acquisition." International Journal of Artificial Intelligence in Education, 20(1), 2010. [details]
J. Huang, N. Koudas, G. Jones, X. Wu, K. Collins-Thompson, and A. An. (eds.) Proceedings of the Nineteenth ACM International Conference on Information and Knowledge Management (CIKM 2010), ACM Press, New York.
K. Collins-Thompson and J. Dillon. “Controlling the search for expanded query representations by constrained optimization in latent variable space.” SIGIR 2010 Workshop on Query Representation and Understanding. (pdf)
Frishkoff, G. A., Perfetti, C. A., & Collins-Thompson, K. (2010). Lexical quality in the brain: ERP evidence for robust word learning from context. Developmental Neuropsychology, 35(4), 1-28. [details]
M. Sun, G. Lebanon, and K. Collins-Thompson. "Visualizing Differences in Web Search Algorithms using the Expected Weighted Hoeffding Distance". Proceedings of WWW 2010, Raleigh, NC, U.S.A. pg 931-940. (pdf)  [bibtex]
K. Collins-Thompson, P.N. Bennett. "Predicting query performance via classification", Proceedings of ECIR 2010, Milton Keynes, UK. pg 140-152. (pdf)  [bibtex]
2009
K. Collins-Thompson. "Reducing the risk of query expansion via robust constrained optimization". Proceedings of the Eighteenth International Conference on Information and Knowledge Management (CIKM 2009). ACM. Hong Kong. pg. 837-846. (pdf)  [bibtex]
K. Collins-Thompson. "Accounting for stability of retrieval algorithms using risk-reward curves". Proceedings of SIGIR 2009 Workshop on the Future of Evaluation in Information Retrieval, Boston. pg. 27-28. (pdf)
M. Sun, G. Lebanon, and K. Collins-Thompson. Visualizing Spatial Proximity of Search Algorithms, NIPS Workshop on Learning with Ordering. (Poster abstract), 2009. (pdf)
K. Collins-Thompson. "Robust word similarity estimation using perturbation kernels". Proceedings of the International Conference on Theoretical Information Retrieval (ICTIR) 2009, Cambridge, U.K. pg. 265-272. (pdf)  [bibtex]
P. Kidwell, G. Lebanon, K. Collins-Thompson. "Statistical estimation of word acquisition with application to readability prediction". Proceedings of Empirical Methods in Natural Language Processing (EMNLP) 2009, Singapore. (pdf)
K. Collins-Thompson, P. N. Bennett. "Estimating query performance using class predictions". Proceedings of the Thirty-second Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2009), Boston. pg. 672-673. (Poster description) (pdf)  [bibtex]
2008
K. Collins-Thompson. "Estimating robust query models with convex optimization". Advances in Neural Information Processing Systems 21 (NIPS), 2008. pg. 329-336. (pdf)  [bibtex]
K. Collins-Thompson. "Robust model estimation methods for information retrieval". Ph.D. thesis (LTI Technical Report CMU-LTI-08-010) Carnegie Mellon University, 2008.
G. Frishkoff, K. Collins-Thompson, C. Perfetti, J. Callan. Measuring incremental changes in word knowledge: Experimental validation and implications for learning and assessment. Behavior Research Methods, Vol. 40, No. 4. pp. 907-925. (pdf)  [pubmed]
M. Heilman, K. Collins-Thompson and M. Eskenazi. "An analysis of statistical models and features for reading difficulty prediction." ACL 2008 BEA Workshop on Innovative Use of NLP for Building Educational Applications. Columbus, Ohio. (pdf)
2007
K. Collins-Thompson and J. Callan. "Estimation and use of uncertainty in pseudo-relevance feedback." Proceedings of the Thirtieth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2007), Amsterdam. (pdf)  [bibtex]
K. Collins-Thompson and J. Callan. "Automatic and human scoring of word definition responses." Proceedings of the NAACL-HLT 2007 Conference. Rochester, U.S.A. pp. 476-483. (pdf)  [bibtex]
K. Collins-Thompson. Optimization methods for query model estimation: applying portfolio theory to mitigate risk in information retrieval. CMU DIR Group Technical Report 2007-09-03. Abstract
M. Heilman, K. Collins-Thompson, J. Callan and M. Eskenazi. "Combining lexical and grammatical features to improve readability measures for first and second language texts." Proceedings of the NAACL-HLT 2007 Conference. Rochester, U.S.A. pp. 460-467. (pdf)  [bibtex]
2006
M. Heilman, K. Collins-Thompson, J. Callan, and M. Eskenazi. Classroom success of an Intelligent Tutoring System for lexical practice and reading comprehension. Proceedings of Interspeech 2006. Pittsburgh, U.S.A. abstract
A. Juffs, L. Wilson, M. Eskenazi, J. Callan, J. Brown, K. Collins-Thompson, M. Heilman, T. Pelletreau, and J. Sanders. (2006) "Robust learning of vocabulary: investigating the relationship between learner behaviour and the acquisition of vocabulary" (poster). The 40th Annual TESOL Convention and Exhibit (TESOL 2006).
2005
K. Collins-Thompson and J. Callan. Query expansion using random walk models. Proceedings of the Fourteenth International Conference on Information and Knowledge Management (CIKM'05). ACM. Bremen, Germany. (CIKM Student Travel Award Paper) (pdf)  [bibtex]
K. Collins-Thompson, J. Callan. Predicting reading difficulty with statistical language models. Journal of the American Society for Information Science and Technology. Vol. 56, No. 13, 1448-1462.  [bibtex]
K. Collins-Thompson, P. Ogilvie and J. Callan. Initial results with structured queries and language models on half a terabyte of text. Proceedings of TREC 2004, National Institute of Standards and Technology, special publication. (pdf)
2004
K. Collins-Thompson and J. Callan. A language modeling approach to predicting reading difficulty. Proceedings of HLT / NAACL 2004, Boston, USA, May 2004. (pdf)  [bibtex]
K. Collins-Thompson and J. Callan. Information retrieval for language tutoring: an overview of the REAP project (poster description), Proceedings of SIGIR 2004, Sheffield, UK. July 2004. (pdf)  [bibtex]
K. Collins-Thompson, E. Terra, J. Callan, and C. Clarke. The effect of document retrieval quality on factoid question-answering performance (poster description), Proceedings of SIGIR 2004, Sheffield, UK. July 2004. (pdf)  [bibtex]
J. Zhang, A. Toth, K. Collins-Thompson, and A. Black. Prominence prediction for super-sentential prosodic modeling based on a new database, ISCA Synthesis Workshop, Pittsburgh, USA, June 2004.
E. Nyberg, T. Mitamura, J. Callan, J. Carbonell, R. Frederking, K. Collins-Thompson, L. Hiyakumoto, Y. Huang, C. Huttenhower, S. Judy, J. Ko, A. Kupsc, L. V. Lita, V. Pedro, D. Svoboda, and B. Van Durme. (2004.) "The JAVELIN question-answering system at TREC 2003: A multi-strategy approach with dynamic planning." Proceedings of the 2003 Text REtrieval Conference (TREC 2003). National Institute of Standards and Technology, special publication. (pdf)
U.S. Patent 6,735,335. M. Liu, K. Collins-Thompson, D. Lawton. Method and apparatus for discriminating between documents in batch scanned document files. May 2004.
U.S. Patent 6,687,697. K. Collins-Thompson, C. Schweizer. System and method for improved string matching under noisy channel conditions. Feb. 2004.
2003
K. Collins-Thompson, P. Ogilvie, Y. Zhang, and J. Callan. Information filtering, novelty detection, and named-page finding. In Proceedings of the 2002 Text REtrieval Conference (TREC 2002). National Institute of Standards and Technology, special publication. 107 - 118. (pdf)
E. Nyberg, T. Mitamura, J. Carbonell, J. Callan, K. Collins-Thompson, K. Czuba, M. Duggan, L. Hiyakumoto, N. Hu, Y. Huang, J. Ko, L. Lita, S. Murtagh, V. Pedro, D. Svoboda. The JAVELIN Question-Answering System. In Proceedings of TREC 2002. NIST, special publication. 128 - 137.
2002
K. Collins-Thompson, R. Nickolov (2002). A clustering-based algorithm for automatic document separation. Proceedings of the SIGIR 2002 Workshop on Information Retrieval and OCR, Tampere, Finland. (pdf)
2001
K. Collins-Thompson, C. Schweizer and S. T. Dumais (2001). Improved string matching under noisy channel conditions. Proceedings of CIKM 2001. Atlanta, USA. 357-364 (pdf)  [bibtex]
Reviewer, ACM Transactions on Information Systems; IEEE Transactions on Knowledge and Data Engineering; Information Processing and Management; Foundations and Trends in Information Retrieval; Transactions on Audio, Speech, and Language Processing.