This course is open to graduate students and upper-level undergraduates in applied mathematics, bioinformatics, statistics, and engineering who are interested in learning from data. Students with other backgrounds, such as the life sciences, are also welcome, provided they have mathematical maturity. The mathematical content of this course will be linear algebra, multilinear algebra, dynamical systems, and information theory. This content is required to understand some common algorithms in data science. I will start with a very basic introduction to data representation as vectors, matrices, and tensors. Then I will teach geometric methods for dimension reduction, also known as manifold learning (e.g., diffusion maps and t-distributed stochastic neighbor embedding (t-SNE)), and topological data reduction (including an introduction to computational homology groups). I will bring an application-based approach to spectral graph theory, addressing the combinatorial meaning of the eigenvalues and eigenvectors of the matrices associated with a graph, and extensions to hypergraphs via tensors. I will also provide an introduction to the application of dynamical systems theory to data, including dynamic mode decomposition. Real data examples will be given where possible, and I will work with you to write code implementing these algorithms to solve these problems. The methods discussed in this class are demonstrated primarily on biological data but are useful for handling data across many fields. The course features several guest lectures from industry and government.
There is no textbook for this course.
For more information on this course, please visit the Department of Mathematics webpage.