Statistics Department Seminar Series: Ye Tian, PhD Candidate, Department of Statistics, Columbia University

"Transfer and Multi-task Learning: Statistical Insights for Modern Data Challenges"
Tuesday, February 11, 2025
4:00-5:00 PM
411 West Hall
Abstract: Knowledge transfer, a core human ability, has inspired numerous data integration methods in machine learning and statistics. However, data integration faces significant challenges: (1) unknown similarity between data sources; (2) data contamination; (3) high dimensionality; and (4) privacy constraints. This talk addresses these challenges in three parts across different contexts, presenting both innovative statistical methodologies and theoretical insights.

In Part I, I will introduce a transfer learning framework for high-dimensional generalized linear models that combines a pre-trained Lasso with a fine-tuning step. We provide theoretical guarantees for both estimation and inference.
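To fix ideas, the pre-train-then-fine-tune recipe can be sketched in the linear special case: fit a Lasso on the (large) source sample, then fit a sparse correction on the target residuals. This is an illustrative sketch only, not the speaker's method for GLMs; the solver (plain ISTA), the penalty levels, and all function names are assumptions.

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=500):
    """Proximal gradient (ISTA) for (1/2n)||y - X b||^2 + lam ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    L = np.linalg.norm(X, 2) ** 2 / n          # Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        z = beta - grad / L                    # gradient step ...
        beta = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # ... then soft-threshold
    return beta

def transfer_lasso(X_src, y_src, X_tgt, y_tgt, lam_src=0.05, lam_delta=0.05):
    # Step 1: "pre-train" a sparse model on the source data
    w = lasso_ista(X_src, y_src, lam_src)
    # Step 2: "fine-tune" by fitting a sparse correction on the target residuals
    delta = lasso_ista(X_tgt, y_tgt - X_tgt @ w, lam_delta)
    return w + delta
```

When source and target coefficients are close, the correction `delta` is small and sparse, so the target sample only has to estimate a low-complexity object.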

In Part II, I will explore an unsupervised learning setting where task-specific data is generated from a mixture model with heterogeneous mixture proportions. This complements the supervised learning setting discussed in Part I, addressing scenarios where labeled data is unavailable. We propose a federated gradient EM algorithm that is communication-efficient and privacy-preserving, providing estimation error bounds for the mixture model parameters.
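A toy one-dimensional version of this setting: each client holds draws from a two-component unit-variance Gaussian mixture with shared means but its own mixture weight, updates that weight locally, and shares only aggregated sufficient statistics with the server. This sketch is an assumption-laden simplification (it uses full M-step updates rather than the gradient-EM variant, and all names are made up), intended only to show the communication pattern.

```python
import numpy as np

def local_e_step(x, mu1, mu2, pi):
    # responsibilities for component 1 under a two-component
    # unit-variance Gaussian mixture with client-specific weight pi
    p1 = pi * np.exp(-0.5 * (x - mu1) ** 2)
    p2 = (1 - pi) * np.exp(-0.5 * (x - mu2) ** 2)
    return p1 / (p1 + p2)

def federated_em(clients, n_rounds=50):
    mu1, mu2 = -0.5, 0.5            # shared parameters, crude initialization
    pis = [0.5] * len(clients)      # heterogeneous, client-specific weights
    for _ in range(n_rounds):
        s1 = s2 = w1 = w2 = 0.0
        for k, x in enumerate(clients):
            g = local_e_step(x, mu1, mu2, pis[k])
            pis[k] = g.mean()       # weight stays local: never communicated
            # only these four scalars per client cross the network
            s1 += (g * x).sum();       w1 += g.sum()
            s2 += ((1 - g) * x).sum(); w2 += (1 - g).sum()
        mu1, mu2 = s1 / w1, s2 / w2  # server aggregates the M-step
    return mu1, mu2, pis
```

Each round costs a constant number of scalars per client, and raw observations never leave the client, which is the sense in which such schemes are communication-efficient and privacy-friendly.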

In Part III, I will introduce a representation-based multi-task learning framework that generalizes the distance-based similarity notion discussed in Parts I and II. This framework is closely related to modern applications of fine-tuning in image classification and natural language processing. I will discuss how this study enhances our understanding of the effectiveness of fine-tuning and the influence of data contamination on representation multi-task learning.
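The linear skeleton of representation-based multi-task learning is a shared low-dimensional map plus per-task heads, fit by alternating least squares. The sketch below is a generic illustration under that assumption, not the framework of the talk; the vectorization trick vec(X B w) = (wᵀ ⊗ X) vec(B) is what makes the joint refit of B a single least-squares problem.

```python
import numpy as np

def fit_shared_representation(tasks, r, n_iter=50):
    """Alternating least squares for y_t ≈ X_t @ B @ w_t with one
    shared representation B (p x r) and a small head w_t per task."""
    p = tasks[0][0].shape[1]
    rng = np.random.default_rng(0)
    B = np.linalg.qr(rng.normal(size=(p, r)))[0]    # orthonormal init
    for _ in range(n_iter):
        # given B, each task fits its own r-dimensional head
        ws = [np.linalg.lstsq(X @ B, y, rcond=None)[0] for X, y in tasks]
        # given the heads, refit B jointly: vec(X_t B w_t) = (w_t^T kron X_t) vec(B)
        A = np.vstack([np.kron(w, X) for (X, y), w in zip(tasks, ws)])
        yy = np.concatenate([y for _, y in tasks])
        B = np.linalg.lstsq(A, yy, rcond=None)[0].reshape(p, r, order="F")
    # refit the heads once more against the final representation
    ws = [np.linalg.lstsq(X @ B, y, rcond=None)[0] for X, y in tasks]
    return B, ws
```

Fitting a fresh head `w_t` against a frozen `B` for a new task is the linear analogue of fine-tuning a pre-trained representation.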

Finally, I will summarize the talk and briefly introduce my broader research interests. I will also discuss my work on imbalanced classification error control, highlighting its role in improving classification performance for minority groups and its applications in biomedical and business contexts. The three main sections of this talk are based on a series of papers [TF23, TWXF22, TWF24, TGF23] and a short course I co-taught at NESS 2024 [STL24]. The segment on imbalanced classification error control is based on our recent paper [TF24]. More about me and my research can be found at https://yet123.com.
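One elementary way to control a per-class error rate, for flavor: pick the decision threshold from an order statistic of the error-controlled class's scores, so that class's empirical error never exceeds a target level α. This only sketches the thresholding idea behind Neyman-Pearson-style error control; it is not the cost-sensitive algorithm of [TF24], and the names and numbers are illustrative.

```python
import numpy as np

def np_threshold(scores_null, alpha=0.05):
    """Cutoff such that the empirical fraction of "null" (error-
    controlled) class scores exceeding it is at most alpha."""
    s = np.sort(scores_null)
    n = len(s)
    k = int(np.ceil((1 - alpha) * n)) - 1   # index of the cutoff order statistic
    return s[k]                              # at most alpha*n scores lie strictly above
```

Classifying "minority" whenever the score exceeds this cutoff caps the majority-class error at α while leaving the minority-class detection rate to the quality of the score function.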

[TF23] Tian, Y., & Feng, Y. (2023). Transfer Learning under High-dimensional Generalized Linear Models. Journal of the American Statistical Association, 118(544), 2684-2697.
[TWXF22] Tian, Y., Weng, H., Xia, L., & Feng, Y. (2022). Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models. arXiv preprint arXiv:2209.15224.
[TWF24] Tian, Y., Weng, H., & Feng, Y. (2024). Towards the Theory of Unsupervised Federated Learning: Non-asymptotic Analysis of Federated EM Algorithms. ICML 2024.
[TGF23] Tian, Y., Gu, Y., & Feng, Y. (2023). Learning from Similar Linear Representations: Adaptivity, Minimaxity, and Robustness. arXiv preprint arXiv:2303.17765.
[STL24] A (Selective) Introduction to the Statistics Foundations of Transfer Learning. Short course, NESS 2024.
[TF24] Tian, Y., & Feng, Y. (2024). Neyman-Pearson Multi-class Classification via Cost-sensitive Learning. Journal of the American Statistical Association, 1-15.

Speaker bio: Ye Tian is a final-year Ph.D. student in Statistics at Columbia University. His research lies at the intersection of statistics, data science, and machine learning, focusing on three main topics: (1) reliable transfer learning; (2) high-dimensional statistics; and (3) privacy and fairness of learning systems.
Building: West Hall
Event Type: Workshop / Seminar
Tags: seminar
Source: Happening @ Michigan from Department of Statistics, Department of Statistics Seminar Series