Christian König

SENIOR RESEARCHER

Data Management, Exploration and Mining Group
Microsoft Research
One Microsoft Way
Redmond, WA 98052
Tel: (425) 703-5064
Fax: (425) 936-7329
Email: chrisko@microsoft.com

I am currently working as a member of the AutoAdmin and Data Exploration research projects in the Data Management, Exploration and Mining Group at Microsoft Research. I am also a member of the cross-group BLEWS project on blogs and news.

Before joining Microsoft, I completed my Ph.D. in Computer Science at the University of the Saarland, in the Database Research Group of Prof. Gerhard Weikum.

 

News:

  • The proceedings of the Dagstuhl Workshop on Robust Query Processing are now available.
  • The paper b-Bit Minwise Hashing has been invited to CACM as a 'Research Highlight'.

Research Interests

My current research is focused on scalable algorithms for processing and indexing very large data sets in the context of web search and computational advertising. Recent work includes:

  • Service Intelligence: we study the use of statistical techniques for monitoring, tuning and problem diagnostics in large-scale 'Cloud' database instances.
  • b-Bit Minwise Hashing: we proposed a technique for set similarity estimation that improves upon the standard minwise hashing method (as well as sign random projections, Hamming-LSH, etc.) by storing only b bits of each hashed value (e.g., b = 1 or 2); using a novel estimator, we obtain order-of-magnitude reductions in practice in the storage space required for a given level of accuracy. Subsequently, we (a) extended the framework to three-way similarities, and (b) integrated (b-bit) minwise hashing with linear learning algorithms such as linear SVM and logistic regression to solve large-scale, high-dimensional statistical learning tasks. A 3-minute video introduction to the technique is also available.
  • Fast Set Intersection: set intersection is a central operator in IR and data mining. We propose techniques that give novel asymptotic bounds and outperform the state of the art in practice. They are also robust: in the cases where our approach is not the best, its performance is close to that of the best-performing one.
  • Integrating Vertical Content with Web Search: Current search engines surface a plethora of content other than web pages, such as advertisements, news, images, movies, 'answers', etc. Retrieving the appropriate 'vertical' content for a given query is an important research challenge. Recently, we studied frameworks for the detection of query intent, which enable the selection of relevant content types, the integration of news results in web search and the dynamic construction of 'portal' pages for a given query.
  • Improving Retrieval Latency: The perceived latency of search is of critical importance to the overall search experience. We have studied algorithms and index structures aimed at minimizing the worst-case latency when retrieving 'vertical' content, surfacing advertisements in sponsored search or displaying structured data about entities (such as celebrities, products, locations) related to a search query.
  • BLEWS (= blogs + news): In the BLEWS system we studied how to surface blog entries commenting on news stories as part of the news browsing experience. The BLEWS system shows which types of blogs link back to a specific story and how much 'attention' the story is getting, and it allows the user to navigate quickly to the comments themselves. We also studied the distribution of navigational patterns used to access social media content (i.e., what type of content do users typically read in blogs, and how do they get there?).
  • Text classification: here, our work has focused on the scalable and robust extraction/categorization of entities from very large corpora and reducing the human overhead in text classification settings.
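The b-bit minwise hashing idea summarized above can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not the published implementation: seeded 64-bit mixing hashes (SplitMix64) stand in for random permutations, and the estimator is the sparse-data limit of the b-bit estimator, in which two stored values agree with probability roughly R + (1 - R)/2^b for resemblance R. The published estimator adds correction terms that depend on the set sizes relative to the universe. The helper names (`splitmix64`, `make_seeds`, `b_bit_signature`, `estimate_resemblance`) are hypothetical.

```python
import random

MASK64 = (1 << 64) - 1


def splitmix64(x: int) -> int:
    """SplitMix64 finalizer: a bijective 64-bit mixing function."""
    x = (x + 0x9E3779B97F4A7C15) & MASK64
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)


def make_seeds(k: int, seed: int = 0) -> list[int]:
    """One 64-bit seed per simulated permutation.

    Assumption: seeded hashes approximate true random permutations.
    """
    rng = random.Random(seed)
    return [rng.getrandbits(64) for _ in range(k)]


def b_bit_signature(items: set[int], seeds: list[int], b: int) -> list[int]:
    """For each seeded hash, take the minimum hash value over the (non-empty)
    set of non-negative integers below 2**64 and keep only its lowest b bits."""
    low_bits = (1 << b) - 1
    return [min(splitmix64(x ^ s) for x in items) & low_bits for s in seeds]


def estimate_resemblance(sig1: list[int], sig2: list[int], b: int) -> float:
    """Sparse-limit estimator: P(match) ~ R + (1 - R) / 2**b, solved for R."""
    k = len(sig1)
    p_hat = sum(u == v for u, v in zip(sig1, sig2)) / k
    chance = 2.0 ** -b  # probability that two unrelated b-bit values agree
    r_hat = (p_hat - chance) / (1.0 - chance)
    return min(max(r_hat, 0.0), 1.0)  # clamp sampling noise into [0, 1]


if __name__ == "__main__":
    s1 = set(range(0, 200))
    s2 = set(range(100, 300))  # true resemblance = 100 / 300 = 1/3
    seeds = make_seeds(k=1000, seed=42)
    b = 2
    est = estimate_resemblance(b_bit_signature(s1, seeds, b),
                               b_bit_signature(s2, seeds, b), b)
    print(f"estimated resemblance: {est:.3f}")
```

In this sketch each signature entry occupies only b bits instead of a full 64-bit hash value; the price is that chance agreements (probability 2^-b) must be subtracted out, so more hash functions are needed for a given accuracy, while the total storage still shrinks considerably.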

My prior work in the context of the management of databases has focused on a monitoring infrastructure for database servers, the scalable exploration of different database designs and various techniques for result-size estimation.

Publications

2014

Manjula Peiris, James H. Hill, Jorgen Thelin, Sergey Bykov, Gabriel Kliot, and Christian König, PAD: Performance Anomaly Detection in Multi-Server Distributed Systems, in 7th IEEE International Conference on Cloud Computing (IEEE Cloud 2014), June 2014

Nivan Ferreira, Danyel Fisher, and Arnd Christian König, Sample-Oriented Task-Driven Visualizations: Allowing Users to Make Better, More Confident Decisions, in Proceedings of Conference on Human Factors in Computing Systems (CHI 2014), ACM, April 2014

2013

Ping Li, Anshumali Shrivastava, and Arnd Christian König, b-Bit Minwise Hashing in Practice, in Proceedings of the 5th Asia-Pacific Symposium on Internetware, ACM, October 2013

2012

Jiexing Li, Arnd Christian König, Vivek Narasayya, and Surajit Chaudhuri, Robust Estimation of Resource Consumption for SQL Queries using Statistical Techniques, in 38th International Conference on Very Large Data Bases, Very Large Data Bases Endowment Inc., 28 August 2012

Danyel Fisher, Steven M. Drucker, and A. Christian König, Exploratory Visualization Involving Incremental, Approximate Database Queries and Uncertainty, in IEEE Computer Graphics and Applications, IEEE, July 2012

Ping Li, Anshumali Shrivastava, and Arnd Christian König, GPU-Based Minwise Hashing, in 21st International World Wide Web Conference, Association for Computing Machinery, Inc., 16 April 2012

Arnd Christian König, Bolin Ding, Surajit Chaudhuri, and Vivek R. Narasayya, A Statistical Approach Towards Robust Progress Estimation, in Proceedings of the VLDB Endowment, the 38th International Conference on Very Large Data Bases (VLDB 2012), vol. 5, no. 4, pp. 382-393, Very Large Data Bases Endowment Inc., 2012

2011

Ping Li, Anshumali Shrivastava, Joshua Moore, and Arnd Christian König, Hashing Algorithms for Large-Scale Learning, in Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), Neural Information Processing Foundation, 12 December 2011

Ping Li, Anshumali Shrivastava, Joshua Moore, and Arnd Christian König, b-Bit Minwise Hashing for Large-Scale Learning, in Big Learning 2011: NIPS 2011 Workshop on Algorithms, Systems, and Tools for Learning at Scale, Neural Information Processing Foundation, December 2011

Nicolas Bruno, Surajit Chaudhuri, Arnd Christian König, Vivek Narasayya, Ravi Ramamurthy, and Manoj Syamala, AutoAdmin Project at Microsoft Research: Lessons Learned, in Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, IEEE Computer Society, December 2011

Ping Li and Arnd Christian König, Theory and Applications of b-Bit Minwise Hashing, in Communications of the ACM, ACM, August 2011

Fei Wang, Ping Li, Arnd Christian König, and Muting Wan, Improving Clustering by Learning a Bi-Stochastic Data Similarity Matrix, in Knowledge and Information Systems (KAIS), Springer Verlag, August 2011

Klaus Berberich, Arnd Christian König, Dimitrios Lymberopoulos, and Peixiang Zhao, Improving Local Search Ranking through External Logs, in 34th Annual ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2011), ACM, July 2011

Fei Wang, Chenhao Tan, Ping Li, and Arnd Christian König, Efficient Document Clustering via Online Nonnegative Matrix Factorizations, in Eleventh SIAM International Conference on Data Mining, Society for Industrial and Applied Mathematics, 28 April 2011

Goetz Graefe, Harumi Anne Kuno, Arnd Christian König, Volker Markl, and Kai-Uwe Sattler, Dagstuhl Seminar on Robust Query Processing - Summary and Abstracts Collection, Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, Germany, February 2011

Bolin Ding and Arnd Christian König, Fast Set Intersection in Memory, in Proceedings of the VLDB Endowment, the 37th International Conference on Very Large Data Bases (VLDB 2011), vol. 4, no. 4, pp. 255-266, Very Large Data Bases Endowment Inc., 2011

Dimitrios Lymberopoulos, Peixiang Zhao, Arnd Christian König, Klaus Berberich, and Jie Liu, Location-aware Click Prediction in Mobile Local Search, in Conference on Information and Knowledge Management (CIKM), ACM, 2011

2010

Fei Wang, Ping Li, and Arnd Christian König, Learning a Bi-Stochastic Data Similarity Matrix, in The 10th International Conference on Data Mining (ICDM), IEEE, 14 December 2010

Ping Li, Arnd Christian König, and Wenhao Gui, b-Bit Minwise Hashing for Estimating Three-Way Similarities, in Twenty-Fourth Annual Conference on Neural Information Processing Systems (NIPS), 6 December 2010

Sanjay Agrawal, Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti, Arnd Christian König, and Dong Xin, Query Portals: Dynamically Generating Portals for Entity-Oriented Web Queries, in International Conference on Management of Data (SIGMOD 2010), Association for Computing Machinery, Inc., 6 June 2010

Ping Li and Arnd Christian König, b-Bit Minwise Hashing, in Nineteenth International World Wide Web Conference (WWW 2010), Association for Computing Machinery, Inc., 26 April 2010

Venkatesh Ganti, Arnd Christian König, and Xiao Li, Precomputing Search Features for Fast and Accurate Query Classification, in Third ACM International Conference on Web Search and Data Mining (WSDM 2010), Association for Computing Machinery, Inc., 4 February 2010

2009

Arnd Christian König, Michael Gamon, and Qiang Wu, Click-Through Prediction for News Queries, in 32nd Annual ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2009), Association for Computing Machinery, Inc., July 2009

Michael Gamon and Arnd Christian König, Navigation Patterns from and to Social Media, in 3rd AAAI Conference on Weblogs and Social Media (ICWSM 2009), American Association for Artificial Intelligence, May 2009

Sanjay Agrawal, Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti, Arnd Christian König, and Dong Xin, Exploiting Web Search Engines to Search Structured Information, in 18th International World Wide Web Conference (WWW 2009), Association for Computing Machinery, Inc., April 2009

Sanjay Agrawal, Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti, Arnd Christian König, and Dong Xin, Query Portals: Dynamically Generating Portals for Web, in 18th International World Wide Web Conference (WWW 2009), Association for Computing Machinery, Inc., April 2009

Arnd Christian König, Kenneth Church, and Martin Markov, A Data Structure for Sponsored Search, in 25th International Conference on Data Engineering (ICDE), IEEE Computer Society, 29 March 2009

2008

Venkatesh Ganti, Arnd Christian König, and Rares Vernica, Entity Categorization over Large Document Collections, in 14th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD 2008), Association for Computing Machinery, Inc., 24 August 2008

Michael Gamon, Sumit Basu, Dmitriy Belenko, Danyel Fisher, Matthew Hurst, and Arnd Christian König, BLEWS: Using Blogs to Provide Context for News Articles, in 2nd AAAI Conference on Weblogs and Social Media (ICWSM 2008), American Association for Artificial Intelligence, April 2008

2007

Surajit Chaudhuri, Kenneth Church, Arnd Christian König, and Liying Sui, Heavy-Tailed Distributions and Multi-Keyword Queries, in 30th ACM SIGIR International Conference on Research & Development in Information Retrieval, Association for Computing Machinery, Inc., July 2007

2006

Arnd Christian König and Shubha U. Nabar, Scalable Exploration of Physical Database Design, in 22nd International Conference on Data Engineering, IEEE Computer Society, 2006

Arnd Christian König and Eric Brill, Reducing the Human Overhead in Text Categorization, in Proceedings of KDD 2006, Association for Computing Machinery, Inc., January 2006

2004

Surajit Chaudhuri, Arnd Christian König, and Vivek Narasayya, SQLCM: A Continuous Monitoring Framework for Relational Database Engines, in 20th International Conference on Data Engineering, Institute of Electrical and Electronics Engineers, Inc., March 2004

2003

Arnd Christian König and Gerhard Weikum, Automatic Tuning of Data Synopses, in Information Systems, Elsevier, 2003

2002

Arnd Christian König and Gerhard Weikum, A Framework for the Physical Design Problem for Data Synopses, in EDBT 2002 - Advances in Database Technology, Springer-Verlag, 2002

2000

Arnd Christian König and Gerhard Weikum, Auto-Tuned Spline Synopses for Database Statistics Management, in 10th International Conference on Management of Data, 2000

1999

Gerhard Weikum, Arnd Christian König, Achim Kraiss, and Marcus Sinnwell, Towards Self-Tuning Memory Management for Data Servers, in Data Engineering Bulletin 22(2), IEEE Computer Society, 1999

Arnd Christian König and Gerhard Weikum, Combining Histograms and Parametric Curve Fitting for Feedback-Driven Query Result-Size Estimation, in 25th International Conference on Very Large Data Bases, Very Large Data Bases Endowment Inc., 1999

Marcus Sinnwell and Arnd Christian König, Managing Distributed Memory to Meet Multiclass Workload Response Time Goals, in 15th International Conference on Data Engineering, IEEE Computer Society, 1999

Professional Activities

  • ACM International Conference on Management of Data (SIGMOD 2004): Proceedings Chair
  • First M3SN Workshop on Modeling, Managing, and Mining of Evolving Social Networks (co-located with ICDE 2009): Co-Chair
  • 1st USETIM (Using Search Engine Technology for Information Management) Workshop: Keynote Speaker
  • 2010 Dagstuhl Workshop on Robust Query Processing: Co-Organizer
  • 37th International Conference on Very Large Data Bases (VLDB 2011): Local Arrangements Chair

Program Committee Memberships:

  • 10th Conference on Database Systems for Business, Technology, and the Web (BTW 2003): Program Committee member
  • 21st International Conference on Data Engineering (ICDE 2005): Program Committee member
  • 11th International Conference on Database Systems for Advanced Applications (DASFAA 2006): Program Committee member
  • 12th International Conference on Management of Data (COMAD 2005): Industrial Program Committee member
  • 12th International Conference on Database Systems for Advanced Applications (DASFAA 2007): Program Committee member
  • 14th Conference on Database Systems for Business, Technology, and the Web (BTW 2007): Industrial Program Committee member
  • 18th International Conference on Database and Expert Systems Applications (DEXA 2007): Program Committee member
  • 33rd International Conference on Very Large Data Bases (VLDB 2007): Program Committee member
  • International Workshop on Ranking in Databases (DBRank 2007), held in conjunction with ICDE 2007: Program Committee member
  • 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2007): Industrial Program Committee member
  • 13th International Conference on Database Systems for Advanced Applications (DASFAA 2008): Program Committee member
  • 11th International Conference on Extending Database Technology (EDBT 2008): Program Committee member
  • 19th International Conference on Database and Expert Systems Applications (DEXA 2008): Program Committee member
  • Second International Conference on Weblogs and Social Media (ICWSM 2008): Poster/Demo Committee member
  • 15th International Conference on Management of Data (COMAD 2008): Program Committee member
  • 16th Conference on Database Systems for Business, Technology, and the Web (BTW 2009): Industrial Program Committee member
  • Third International Conference on Weblogs and Social Media (ICWSM 2009): Program Committee member
  • NAACL Human Language Technology (NAACL-HLT) Conference 2009: Program Committee Member
  • 2nd M3SN Workshop on Modeling, Managing, and Mining of Evolving Social Networks (co-located with ICDE 2010): Program Committee member
  • 4th International Workshop on Ranking in Databases (DBRank 2010): Program Committee member
  • 4th International Conference on Weblogs and Social Media (ICWSM 2010): Program Committee member
  • NAACL Human Language Technology (NAACL-HLT) Conference 2010: Program Committee Member
  • Fourth Workshop on Enabling Real-Time Business Intelligence (BIRTE 2010): Program Committee Member
  • 1st Workshop on Data Management Issues in Web Syndication Systems (DaMIWS 2010): Program Committee Member
  • 5th International Conference on Weblogs and Social Media (ICWSM 2011): Program Committee Member
  • 1st International Temporal Web Analytics Workshop (TWAW 2011) at WWW 2011: Program Committee Member
  • 5th IEEE International Conference on Semantic Computing (ICSC 2011): Program Committee Member
  • 17th ACM Conference on Knowledge Discovery and Data Mining (SIGKDD 2011): Program Committee Member
  • European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2011): Program Committee Member
  • Fifth Workshop on Enabling Real-Time Business Intelligence (BIRTE 2011): Program Committee Member
  • SIGIR Workshop on Entity-Oriented Search: Program Committee Member
  • 28th International Conference on Data Engineering (ICDE 2012) - Demo Track: Program Committee Member
  • 21st World Wide Web Conference (WWW 2012): Program Committee Member
  • 5th International ACM Conference on Web Search and Data Mining (WSDM 2012): Program Committee Member
  • 26th International Conference on Artificial Intelligence (AAAI 2012): Program Committee Member
  • 2nd International Temporal Web Analytics Workshop (TempWeb 2012) at WWW 2012: Program Committee Member
  • 39th International Conference on Very Large Data Bases (VLDB 2013/PVLDB): Program Committee Member
  • 6th Workshop on Enabling Real-Time Business Intelligence (BIRTE 2012): Program Committee Member
  • Conference on Empirical Methods in Natural Language Processing (EMNLP 2012): Program Committee Member
  • 6th ACM International Conference on Web Search and Data Mining (WSDM 2013): Program Committee Member
  • 2nd International Workshop on Searching and Integrating New Web Data Sources (VLDS 2012): Program Committee Member
  • 16th International Conference on Extending Database Technology (EDBT 2013): Industrial Program Committee Member
  • 1st Joint International Workshop on Entity-oriented and Semantic Search 2012 (JIWES 2012): Program Committee Member
  • 29th International Conference on Data Engineering (ICDE 2013) - Demo Track: Program Committee Member
  • 19th ACM Conference on Knowledge Discovery and Data Mining (SIGKDD 2013): Program Committee Member
  • 2013 IEEE International Conference on Big Data (IEEE Big Data 2013)
  • 40th International Conference on Very Large Data Bases (VLDB 2014/PVLDB): Program Committee Member
  • 30th International Conference on Data Engineering (ICDE 2014): Program Committee Member
  • 17th International Conference on Extending Database Technology (EDBT 2014): Demo Program Committee Member
  • 31st International Conference on Machine Learning (ICML 2014): Program Committee Member
  • 2014 World Wide Web Conference (WWW 2014): Program Committee Member
  • 20th ACM Conference on Knowledge Discovery and Data Mining (SIGKDD 2014): Program Committee Member (Research track)
  • 20th ACM Conference on Knowledge Discovery and Data Mining (SIGKDD 2014): Program Committee Member (Industry and Government track)
  • 8th International Conference on Weblogs and Social Media (ICWSM-14): Program Committee Member
  • The Neural Information Processing Systems Conference 2014 (NIPS-2014): Program Committee Member
  • 8th ACM International Conference on Web Search and Data Mining (WSDM 2015): Program Committee Member

Journal Reviews: DKE, IJHIS, IS, ISI, TIST, TKDE, TODS, VLDB Journal, WWW Journal

External Reviewer for: STOC, STACS, ICML

Conclusion

    When in Rome, burn it.