Dr. Ji Zhu has been an integral part of the Department of Statistics at the University of Michigan since 2003. Before entering the field of statistics, Ji studied physics. It wasn’t until he was a Ph.D. student at Cornell University, doing research in experimental condensed matter physics, that he found he enjoyed working with data more than working with big, expensive equipment. While at Cornell, Ji took courses with Persi Diaconis and Susan Holmes, renowned statisticians and mathematicians who encouraged him to switch to statistics. With that support, Ji transferred to Stanford, where he obtained his doctorate in statistics in 2003.

Ji’s area of research is statistical machine learning, which has two main branches: supervised and unsupervised learning. The goal of supervised learning is to use data to build a model for prediction. In contrast, the goal of unsupervised learning is not to build a model for prediction, but rather to discover structure and patterns in the data. “In a sense, unsupervised learning is more challenging than supervised learning because there is no clear goal and nothing for us to predict,” said Ji. “We just look at the data and see if there is anything interesting.”

Currently, Ji is interested in developing statistical machine learning methods for network data. Advances in data collection and social media mean that many applications now record the relationships among units of analysis as network data, often alongside more traditional covariates on each unit. The use of network information in statistical machine learning models has not yet been well studied, and Ji is interested in developing general statistical frameworks that use network information in both supervised and unsupervised learning. For example, when predicting the behavior of an individual in a social network, one can use not only that individual’s covariates but also information from the people in that individual’s network to make a better prediction.
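To make the social network example concrete, here is a minimal Python sketch of one simple way network information could enter a prediction problem: each individual’s covariates are extended with the average covariates of their neighbors, and the combined features could then be passed to any supervised learning method. This is an illustrative assumption, not a description of Ji’s methods; the function name `neighbor_mean_features`, the toy individuals, and the friendship edges are all hypothetical.

```python
from statistics import mean


def neighbor_mean_features(covariates, edges):
    """Append the average covariates of each node's neighbors to its own.

    covariates: dict mapping node -> list of numeric covariates
    edges: iterable of (node, node) pairs, treated as undirected
    Nodes with no neighbors get zeros for the network part (an assumption).
    """
    # Build an undirected adjacency structure from the edge list.
    neighbors = {node: set() for node in covariates}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    n_features = len(next(iter(covariates.values())))
    augmented = {}
    for node, own in covariates.items():
        if neighbors[node]:
            avg = [
                mean(covariates[nb][j] for nb in neighbors[node])
                for j in range(n_features)
            ]
        else:
            avg = [0.0] * n_features
        augmented[node] = list(own) + avg
    return augmented


# Hypothetical toy data: two covariates per individual.
covariates = {
    "ann": [1.0, 0.0],
    "bob": [3.0, 1.0],
    "cat": [5.0, 1.0],
    "dan": [2.0, 0.0],
}
edges = [("ann", "bob"), ("ann", "cat")]

features = neighbor_mean_features(covariates, edges)
print(features["ann"])
```

In this sketch, an individual with no connections simply gets zeros for the network part of the features; a real analysis would need a more careful treatment of isolated individuals and of how much weight neighbors should receive.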

Ji is also interested in applying statistical machine learning to help solve healthcare problems. He works with many clinicians and medical researchers at the University hospital and medical school. In one recent project, he is helping to develop a model for predicting patient readmission. Ji and his colleagues use the ICD (International Classification of Diseases) codes assigned to every patient to predict, at the time of discharge, whether that patient will be readmitted to the hospital.

According to Ji, statistical machine learning is an important area of statistics because it provides a set of very powerful statistical tools that can be applied across many fields: healthcare, business, physics, biology, engineering, medicine, and more. Statistics in general is connected with real-life problems, and statisticians need to work with real-world data to ensure that the methods they develop are useful and relevant. It is beneficial for statisticians to be involved in the entire problem-solving process: understanding the problem in the application domain, identifying the statistical problems involved, providing solutions, explaining the results, and discussing the solutions with domain experts.

Age of first marijuana use shown on a school friendship network. Node size represents the individual's hazard, and node color represents the observed age of first use.