Skip to Content

Fast Community Detection in Large Sparse Networks

Tuesday, October 29, 2013
12:00 AM

Abstract: Community detection is one of the fundamental problems in
network analysis, with many diverse applications, and a lot of work has
been done on models and algorithms that find communities.   Perhaps the
most commonly used probabilistic model for a network with communities is
the stochastic block model, and many algorithms for fitting it have been
proposed.   Since finding communities involves optimizing over all possible
assignments of discrete labels, most existing algorithms do not scale well
to large networks, and many fail on sparse networks.   In this talk, we
propose a pseudo-likelihood approach  for fitting the stochastic block
model to address these shortcomings.  Pseudo-likelihood is a general
statistical principle that involves trading off some of the model
complexity against computational efficiency.   We also derive a variant
that allows for arbitrary degree distributions in the network, making it
suitable for fitting the more flexible degree-corrected stochastic block
model.  The pseudo-likelihood algorithm scales easily to networks with
millions of nodes, performs well empirically under a range of settings,
including on very sparse networks, and is asymptotically consistent under
reasonable conditions.   If times allows, I will also discuss spectral
clustering with perturbations, a new method of independent interest we use
to initialize pseudo-likelihood, which works well on sparse networks where
regular spectral clustering fails.

Joint work with Arash  Amini, Aiyou Chen, and Peter Bickel.