Sundararajan Sellamanickam

I am a principal applied scientist in the applied sciences - CISL group of MSR India, Bangalore, and the technical lead for the members of this group. The group consists of two applied scientists and research development engineers. My current research interests are in the areas of large-scale machine learning, numerical optimization and data mining. We currently focus on two broad themes:

  • Service Analytics
  • Distributed Machine Learning

Service Analytics:

Under this theme, we conduct applied research on developing machine learning and data mining tools to address a broad range of problems and applications. Our application focus is currently on service monitoring and security.

Service monitoring is essential for ensuring high-quality service in applications such as distributed compute/storage platform services, whether offered in the cloud or on premises. It involves analysing and deriving insights from different high-volume data sources such as high-dimensional time series data and service logs. We are developing a generic system that can detect unusual or anomalous patterns, relate them to service-level issues, and so on. In these problems, domain knowledge plays a crucial role, and our framework can take such knowledge into account.
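To make the idea of flagging anomalous patterns in a time series concrete, here is a minimal sketch in Python. It is not the system described above; it assumes a simple trailing-window z-score rule, and the function name, window size and threshold are all hypothetical choices made for illustration.

```python
from statistics import mean, stdev


def rolling_zscore_anomalies(values, window=5, threshold=3.0):
    """Return indices of points that deviate strongly from recent history.

    A point is flagged when it lies more than `threshold` sample standard
    deviations away from the mean of the preceding `window` points.
    This is an illustrative rule only, not a production detector.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Constant history: the z-score is undefined, so skip the point.
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies
```

For a metric such as `[10, 11, 10, 12, 11, 10, 50, 11, 10, 12]`, only the spike at index 6 is flagged with the default settings. Domain knowledge could enter a rule like this through the choice of window, threshold, or which metrics are monitored together.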

Intrusion detection is essential for keeping networks secure. The problems that we study include analysing user sessions in a large network of machines and how they relate to each other when a hacker moves from one machine to another over a period of time. The scale of the problem is huge: several billion user sessions happen every day in a typical large company. We are developing scalable machine learning algorithms that can handle such large volumes of data and rank unusual or anomalous sessions and graphs of connected sessions.

Distributed Machine Learning:

Scalable machine learning over big data is an important problem, as the volume of data collected is ever-growing in many different applications. To analyse such data or build classifier models on it quickly, we often require distributed compute/storage environments. One popular distributed environment is Hadoop running on a cluster of commodity machines. In such environments, communication costs can be prohibitively high. Therefore, there is a need to develop efficient algorithms that trade off communication and computation costs. We have developed several algorithms to address these requirements for training linear and non-linear classifier models.
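The communication/computation trade-off can be illustrated with a small single-process simulation. The sketch below is not one of the algorithms mentioned above; it assumes a simple scheme in which each worker runs several local SGD passes of logistic regression on its own data shard and the workers' weights are averaged once per round. All names (`local_sgd`, `train_by_averaging`, `make_toy_shards`) and parameter values are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def local_sgd(weights, shard, local_epochs, lr):
    """Computation step: one worker refines a copy of the weights on its shard."""
    w = list(weights)
    for _ in range(local_epochs):
        for x, y in shard:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for j, xj in enumerate(x):
                w[j] -= lr * (p - y) * xj  # gradient of the log loss
    return w


def train_by_averaging(shards, dim, rounds, local_epochs, lr=0.1):
    """Each round costs one exchange of `dim` numbers per worker.

    Raising `local_epochs` does more computation per round, so fewer
    communication rounds are needed for the same number of data passes.
    """
    w = [0.0] * dim
    for _ in range(rounds):
        local = [local_sgd(w, s, local_epochs, lr) for s in shards]
        # Communication step: average the workers' weight vectors.
        w = [sum(ws[j] for ws in local) / len(local) for j in range(dim)]
    return w


def accuracy(w, data):
    """Fraction of examples classified correctly by the linear model."""
    hits = sum((sum(wi * xi for wi, xi in zip(w, x)) > 0) == (y == 1)
               for x, y in data)
    return hits / len(data)


def make_toy_shards(n_workers=4, per_worker=50, seed=0):
    """Toy separable data: label is 1 when a + b > 0 (with a small margin)."""
    rng = random.Random(seed)
    shards = []
    for _ in range(n_workers):
        shard = []
        while len(shard) < per_worker:
            a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
            if abs(a + b) < 0.2:
                continue  # keep a margin so the toy problem is clearly separable
            shard.append(([1.0, a, b], 1 if a + b > 0 else 0))
        shards.append(shard)
    return shards
```

Calling `train_by_averaging(make_toy_shards(), dim=3, rounds=5, local_epochs=5)` communicates five times, whereas `rounds=25, local_epochs=1` performs the same number of data passes with five times as many exchanges. On a real cluster, the right balance would depend on network cost and on how far the shards' local solutions drift apart.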