I am a principal applied scientist in the applied sciences - CISL group of MSR India, Bangalore, and the technical lead for the members of this group. The group consists of two applied scientists and research development engineers. My current research interests are in the areas of large-scale machine learning, numerical optimization, and data mining. There are two broad themes that we focus on now:
- Service Analytics
- Distributed Machine Learning
Service Analytics:
Under this theme, we conduct applied research on developing machine learning and data mining tools that address a broad range of problems and applications. Our application focus is currently on service monitoring and security.
Service monitoring is an important problem to address in order to ensure high-quality service in applications such as distributed compute/storage platform services, whether offered in the cloud or on premises. It involves analysing and deriving insights from different high-volume data sources, such as high-dimensional time series data and service logs. We are developing a generic system that can detect unusual or anomalous patterns, relate them to service-level issues, and so on. In these problems, domain knowledge plays a crucial role, and our framework can take such knowledge into account.
Intrusion detection is an important problem to address in order to ensure secure networks. The problems that we study include analysing user sessions in a large network of machines and how they relate to each other when a hacker moves from one machine to another over a period of time. The scale of the problem is huge: a large company typically sees several billion user sessions every day. We are developing scalable machine learning algorithms that handle such large volumes of data and rank unusual or anomalous sessions and graphs of connected sessions.
Distributed Machine Learning:
Scalable machine learning over big data is an important problem, as the volume of data collected is ever-growing in many different applications. To analyse such data or build classifier models on it quickly, we often require distributed compute/storage environments. One popular distributed environment is Hadoop running on a cluster of commodity machines. In such environments, communication costs can be prohibitively high. Therefore, there is a need to develop efficient algorithms that trade off communication and computation costs. We have developed several algorithms to address these requirements for training linear and non-linear classifier models.
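To make the communication/computation trade-off concrete, here is a minimal toy sketch. It is not one of the algorithms from the papers listed below. It assumes a generic scheme in which each worker runs several gradient steps on its own data shard and the workers then average their weights once per round. All names (`make_shards`, `local_gradient_steps`, `train`) and the noise-free data `y = 3x` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Shard = List[Tuple[float, float]]  # (x, y) pairs held by one worker


def local_gradient_steps(w: float, shard: Shard, lr: float, steps: int) -> float:
    """Run `steps` gradient-descent updates on one worker's shard (no communication)."""
    for _ in range(steps):
        # Gradient of the mean of 0.5 * (w*x - y)^2 over the shard.
        grad = sum((w * x - y) * x for x, y in shard) / len(shard)
        w -= lr * grad
    return w


def train(shards: Sequence[Shard], rounds: int, local_steps: int, lr: float = 0.1) -> float:
    """Each round: every worker computes locally, then the weights are averaged once."""
    w = 0.0
    for _ in range(rounds):  # one averaging step counts as one communication round
        local_weights = [local_gradient_steps(w, s, lr, local_steps) for s in shards]
        w = sum(local_weights) / len(local_weights)
    return w


def make_shards(num_workers: int = 4, points_per_worker: int = 5) -> List[Shard]:
    """Noise-free toy data y = 3x, partitioned across workers (illustrative only)."""
    shards = []
    for k in range(num_workers):
        xs = [0.2 * (k + 1) + 0.1 * i for i in range(points_per_worker)]
        shards.append([(x, 3.0 * x) for x in xs])
    return shards


if __name__ == "__main__":
    shards = make_shards()
    # Same number of communication rounds, different amounts of local computation.
    for local_steps in (1, 10):
        w = train(shards, rounds=5, local_steps=local_steps)
        print(f"local_steps={local_steps:2d}  rounds=5  |w - 3| = {abs(w - 3.0):.4f}")
```

With the number of communication rounds held fixed, doing more local computation per round brings the averaged weight closer to the true value on this toy problem. On real, noisy data the shards disagree about the optimum, so too many local steps can hurt, which is why the trade-off needs careful algorithm design.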
- Dhruv Mahajan, S. Sathiya Keerthi, and Sundararajan Sellamanickam, A distributed block coordinate descent method for training l1 regularized linear classifiers, CoRR, 2014.
- Dhruv Mahajan, S. Sathiya Keerthi, and Sundararajan Sellamanickam, A Distributed Algorithm for Training Nonlinear Kernel Machines, CoRR, 2014.
- P.K. Srijith, Shirish Shevade, and S. Sundararajan, Semi-supervised Gaussian Process Ordinal Regression, European Conference on Machine Learning (ECML), June 2013.
- Kai-Wei Chang, S. Sundararajan, and S. Sathiya Keerthi, Tractable semi-supervised learning of complex structured prediction models, European Conference on Machine Learning (ECML), June 2013.
- Tanuja Ganu, Shirish Shevade, and S. Sundararajan, Sparse Max-Margin Multiclass and Multi-label Classifier Design for Fast Inference, SIAM International Conference on Data Mining (SDM), April 2013.
- Dhruv Mahajan, S. Sathiya Keerthi, Sundararajan Sellamanickam, and Leon Bottou, A Functional Approximation Based Distributed Learning Algorithm, CoRR, 2013.
- Dhruv Mahajan, S. Sathiya Keerthi, Sundararajan Sellamanickam, and Leon Bottou, A Parallel SGD method with Strong Convergence, NIPS 2013 Workshop on Optimization for Machine Learning, 2013.
- Shravan Narayanamurthy, Markus Weimer, Dhruv Mahajan, Tyson Condie, Sundararajan Sellamanickam, and S. Sathiya Keerthi, Towards Resource-Elastic Machine Learning, NIPS 2013 BigLearn Workshop, 2013.
- Dhruv Mahajan, Sundararajan Sellamanickam, Subhajit Sanyal, and Amit Madaan, A Classification Based Framework for Concept Summarization, International Conference on Data Mining (ICDM), December 2012.
- Sathiya Keerthi Selvaraj, Sundararajan Sellamanickam, and Shirish Krishnaj Shevade, Extension of TSVM to Multi-Class and Hierarchical Text Classification Problems With General Losses, in COLING (Posters), 2012.