Skip to Content

Search: {{$root.lsaSearchQuery.q}}, Page {{$root.page}}

Dissertation Defense: Link prediction and denoising in networks

Yun-Jhong Wu
Thursday, May 11, 2017
10:00 AM-1:00 PM
438 West Hall Map
Abstract:
Network data represent connections between units of interests, but are often noisy and/or include missing values. This thesis focuses on denoising network data via inferring underlying network structure from an observed noisy realization. The observed network data can be viewed as a single random realization of an unobserved latent structure, and our general approach to estimating this latent structure is based factorizing it into a product of interpretable components, with structural assumptions on the components determined by the nature of the problem.

We first study the problem of predicting links when edge features are available, or node features that can be converted into edge features. We propose a regression-type model to combine information from network structure and edge features. We show that estimating parameters in this model is straightforward and the estimator enjoys excellent theoretical performance guarantees.

Another direction we study is predicting links in time-stamped dynamic networks. A common approach to modeling networks observed over time is aggregating the networks to a few snapshots, which reduces computational complexity, but also loses information. We address this limitation through a dynamic network model based on tensor factorization, which simultaneously captures time trends and the graph structure of dynamic networks without aggregating over time. We develop an efficient algorithm to fit this model and demonstrate the method performs well numerically.

The last contribution of this thesis is link prediction for ego-networks. Ego-networks are constructed by recording all friends of a particular user, or several users, which is widely used in survey-based social data collection. There are many methods for filling in missing data in a matrix when entries are missing independently at random, but here it is more appropriate to assume that whole rows of the matrix are missing (corresponding to users), whereas other rows are observed completely. We develop an approach to estimate missing links in this scenario via subspace estimation, exploiting potential low-rank structure common in networks. We obtain theoretical bounds on the estimator's performance and demonstrate it significantly outperforms many widely used benchmarks in both simulated and real networks.
Building: West Hall
Event Type: Lecture / Discussion
Tags: Dissertation
Source: Happening @ Michigan from Department of Statistics