My research group is engaged in making advances in theoretical machine learning as well as finding new, exciting application areas for machine learning. We believe that the best way to do machine learning research is to look at the interplay between theory and applications. We also strongly believe in the value of communication with domain scientists.

Here are a few research directions that we are pursuing in theoretical work. First, we are always looking to extend statistical learning theory in new directions. Classical statistical learning theory makes a number of simplifying assumptions such as independence and stationarity. We want to understand learning in the presence of dependence and non-stationarity, especially in the context of time series forecasting. Classical theory also assumes that the learning algorithm passively receives a batch of labeled data on a single prediction task. We are looking at active learning algorithms that make their own decision about when to request labels. We are also interested in multitask learning algorithms that use similarity across a number of learning tasks to enable faster learning.

Another limitation of classical theory is that labels are assumed to be very simple and without much internal structure (e.g., they are either binary or real valued). We want to study complex labels spaces (such a bit vectors, real valued vectors, and rankings). When dealing with these spaces, it also becomes important to learn with weak supervision where the label is not fully specified (e.g., the true label may be a full ranking of a large number of items, but the learning algorithm might only see the top few items in the ranking).

Second, we are interested in understanding the power and limitations of online learning algorithms. These are algorithms that process their inputs incrementally. This makes them extremely well suited to big data. We also try to explore the limits of what can be learned without making strong probabilistic assumptions on the data generating mechanism. This area of research has strong and deep connections with information theory, prequential statistics, and game-theoretic probability.

Third, we study sequential decision-making problems where the learning algorithm's decisions impact the future evolution of the environment it is interacting with. A fundamental trade-off here is between exploration, viz. taking actions that have not been tried before, and exploitation, viz. taking actions that have performed well in the past. We try to mathematically understand how best to resolve this dilemma by undertaking a type of formal analysis called regret analysis. That is, we define the regret of a sequential decision-making algorithm as the amount of performance loss due to not knowing the underlying dynamical system and hence the optimal behavior policy in that system. We then use the worst case (over all environments in a class) regret of algorithms as a yardstick to judge how good they are at resolving the explore-exploit dilemma.

Finally, as machine learning continues to directly affect billions of people across the world, there is an increased interest in the theoretical study of properties of learning algorithms such as robustness, safety, fairness, and privacy. Research in these areas is just emerging. Therefore, young (and young-at-heart!) researchers should look here for interesting problems!

My research group is engaged in making advances in theoretical machine learning as well as finding new, exciting application areas for machine learning. We believe that the best way to do machine learning research is to look at the interplay between theory and applications. We also strongly believe in the value of communication with domain scientists.

In terms of applications, I am very excited about the emergence of precision medicine. I am a co-leader of a SAMSI program on precision medicine where we are bringing together people with quantitative backgrounds (e.g., computer scientists, mathematicians, statisticians) with domain experts (e.g., oncologists, mental health experts) to better understand what the real challenges in precision medicine are. I am quite curious about the potential of technology (smartphones, wearables, augmented and virtual reality) in personalizing treatments to the individual characteristics of users. I am part of HeartSteps, a project led by Dr. Predrag Klasnja, whose goal is to increase and maintain physical activity via a smartphone app. I am also part of a MIDAS research project lead by Prof. Srijan Sen that is using smartphone data (self-reported mood, physical activity, sleep) from medical interns to better understand the onset of stress and depression in this population. I am cautiously optimistic that technology can play a constructive role in providing better support to people suffering from various mental illnesses.

A second application area that I am excited about is the use of machine learning in data-enabled computational chemistry. In collaboration with Prof. Paul Zimmerman, we are developing machine learning algorithms whose results are interpretable using chemists' intuition, and that can supplement quantum chemical calculations (that are often computationally expensive) of reaction energies and other quantities of interest. We are also interested in harnessing the power of data residing in existing chemical databases that can contain information about millions of chemical reactions. Still, the chemicals appearing in these databases constitute a only tiny part of what is called the chemical space, viz. the space of molecules and compound that can exist according to chemical principles but perhaps have never been created.