Paul Mineiro


I'm in the Cloud and Information Services Lab (CISL), headed by Raghu Ramakrishnan. We're so new that we don't have a proper web presence at the time I'm writing this.

CISL researches and develops database systems technology to support the next generation of data-manipulation needs, which among other things includes a major interest of mine: machine learning at scale.

If you enjoy making algorithms fast and scalable (as I do), you have to go where the ability to process more data matters, so naturally I'm interested in big data. It's worth noting, however, that many organizations can benefit from machine learning applied at the “not big” data scale. Big here essentially means “does not fit comfortably on a single machine”, which means that “not big” is getting larger every day.

Historically being fast and scalable has meant linear models and online learning, but that's changing. Right now I'm very interested in random features to incorporate nonlinearity without sacrificing speed.

Recent Activities

  1. Random features for unsupervised learning. This is work in progress, but basically we're just now appreciating the implications for machine learning of the following: SVD has just gotten incredibly computationally inexpensive.
  2. Random features for supervised learning. Again work in progress, but the gist is this: trading optimization for randomization simplifies distributed learning.
  3. Normalized online learning. Stephane Ross, John Langford, and I have been interested in the implications of dimensional correctness for online learning. Theoretically, if an adversary is strengthened by allowing it to choose a fixed scaling of the data prior to the first round, many existing online learning rules have poor regret. In practice, pretending that you are up against such an adversary leads to more robust online learning algorithms. There is still more work to be done here, as our current formulation is somewhat sensitive to outliers.
  4. Using less data. It turns out that if you are going to do empirical risk minimization (ERM) over some hypothesis class H, and somebody hands you any hypothesis h' in H, you can use h' to compress your data prior to doing the ERM without introducing much excess risk. The basic idea is to subsample examples where h' has low loss more aggressively. You may have already done this implicitly (e.g., subsampling the more frequent class in an imbalanced binary classification problem), but Nikos Karampatziakis and I generalized the technique and proved a theorem that says it is a reasonable thing to do.
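
To make item 1 concrete, here is a minimal sketch of why SVD has gotten so cheap: a randomized range-finder in the style of Halko, Martinsson, and Tropp, which reduces a big SVD to a couple of matrix multiplies, a QR, and an SVD of a small matrix. This is an illustrative sketch, not the code we actually use; the oversampling amount is an arbitrary choice.

```python
import numpy as np

def randomized_svd(A, k, oversample=10, seed=0):
    """Approximate rank-k SVD of A via a random range-finder (sketch)."""
    rng = np.random.default_rng(seed)
    # Sketch the range of A with a random projection slightly wider than k.
    Omega = rng.standard_normal((A.shape[1], k + oversample))
    Q, _ = np.linalg.qr(A @ Omega)        # orthonormal basis for the sketch
    # SVD of the small projected matrix, then lift back to the original space.
    Ub, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return Q @ Ub[:, :k], s[:k], Vt[:k, :]
```

The expensive dense SVD is replaced by an SVD of a (k + oversample) x n matrix, and the matrix multiplies distribute easily, which is what makes this attractive at scale.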
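
For item 2, the canonical example of a random feature map is the random Fourier features of Rahimi and Recht, which let a plain linear learner approximate an RBF kernel machine. A minimal sketch, in which the bandwidth `gamma` and feature count `D` are illustrative choices:

```python
import numpy as np

def random_fourier_features(X, D=2000, gamma=1.0, seed=0):
    """Map rows of X into D features whose inner products approximate
    the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D))  # random frequencies
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)                # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)
```

A linear model trained on these features behaves like a kernel machine, but training remains linear-time and embarrassingly parallel, which is exactly the "trading optimization for randomization" point.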
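
A toy illustration of item 3: dividing each per-feature update by (a power of) the largest magnitude seen for that feature makes the predictions invariant to a fixed per-feature rescaling of the data. This is a simplified sketch of the dimensional-correctness idea, not the actual algorithm from our work:

```python
import numpy as np

class ScaleFreeSGD:
    """Online least squares whose predictions are invariant to any fixed
    per-feature rescaling of the inputs (a toy sketch, not the paper's rule)."""
    def __init__(self, d, eta=0.1):
        self.w = np.zeros(d)
        self.eta = eta
        self.s = np.zeros(d)  # running per-feature max |x_i|
    def update(self, x, y):
        self.s = np.maximum(self.s, np.abs(x))
        pred = self.w @ x
        grad = (pred - y) * x                    # squared-loss gradient
        scale = np.where(self.s > 0.0, self.s, 1.0)
        self.w -= self.eta * grad / scale ** 2   # divide out the units of x
        return pred
```

Because the update divides by s_i squared while the prediction multiplies by x_i, rescaling feature i by any constant leaves every prediction unchanged, so the fixed-scaling adversary gains nothing.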
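
A sketch of the compression idea in item 4: keep each example with probability proportional to the pilot hypothesis h's loss on it (with a floor), and importance-weight the survivors so the subsampled ERM objective stays unbiased. The floor and target fraction here are illustrative choices, not tuned values from our work:

```python
import numpy as np

def loss_proportional_subsample(losses, target_frac=0.3, floor=0.1, seed=0):
    """Given a pilot hypothesis's per-example losses, keep each example
    with probability proportional to its (floored) loss, and return kept
    indices plus inverse-probability importance weights, so the weighted
    subsample is an unbiased estimate of the full ERM objective."""
    rng = np.random.default_rng(seed)
    n = len(losses)
    p = np.maximum(losses, floor)               # floor avoids huge weights
    p = np.minimum(1.0, target_frac * n * p / p.sum())
    keep = rng.random(n) < p
    return np.flatnonzero(keep), 1.0 / p[keep]  # indices, importance weights
```

Examples the pilot already fits well get discarded aggressively, and the weights restore unbiasedness: subsampling the frequent class in imbalanced classification is the special case where the pilot is "always predict the frequent class".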

Do you like my writing style?

Perhaps you would like my blog about machine learning and data science.