MS&E334: The Structure of Social Data (Fall 2015)

Johan Ugander, Assistant Professor, MS&E
Email: jugander [at] stanford
Office location: Huang 357
Office Hours: Tu/Th, 4:30pm-5:30pm

Lecture hours: Tu/Th, 3:00pm-4:20pm
Lecture room: Thornton Center (Terman Annex), Rm 110

Course Description

Social networks have a rich history of study across many disciplines. Recent opportunities afforded by large-sale online instrumentation and experimentation have begun to provide a rich view of their structure and role in diverse social and economic domains. This course provides a survey of recent research in the study of social networks and large-scale social and behavioral data. Topics will include network models based on random graphs and their properties; centrality and ranking on graphs; ranking from comparisons; social influence and homophily; experimentation and causal inference on networks; heavy-tailed statistical distributions.

Most important links:

Course syllabus (PDF)
MS&E334 on Canvas

Lecture material

The literature below lays the foundation for the lecture material, though not all papers will be discussed in depth. If you have a focused interests in specific papers, feel free to come discuss them with me during office hours. The reference list will almost certainly be expanded in response to class discussions as the course progresses.

Week 1

Lecture 1: Course overview

An introduction to the course and high-level tour of content and goals.

Slides (PDF, 11MB) from course introduction.

Lecture 2: Graphs and graph properties

A review of graph definitions and properties. Graphical degree sequences.

Literature:

S.L. Hakimi (1962) "On realizability of a set of integers as degrees of the vertices of a linear graph. I", J. SIAM, 10:3, p. 496.
Havel-Hakimi game: [link]
M.E.J. Newman (2003) "The structure and function of complex networks." SIAM Review 45, 167-256.

Week 2

Lecture 3: Configuration Models

Configuration models are uniform distributions over specific spaces of graphs. We discussed the Simple Configuration Model and the Non-Simple Configuration Model, and how to uniformly sample from these two models.

J. Blitzstein, P. Diaconis (2011) "A sequential importance sampling algorithm for generating random graphs with prescribed degrees", Internet Mathematics.

Lecture 4: Erdos-Renyi, Preferential Attachment, Degree Assortativity

Power Law literature:

M. Faloutsos, P. Faloutsos, C. Faloutsos (1999). "On power-law relationships of the internet topology", SIGCOMM.
M. Mitzenmacher (2004), “A Brief History of Generative Models for Power Law and Lognormal Distributions.” Internet Mathematics, vol 1, No. 2, pp. 226-251.
M.E.J. Newman (2005), “Power laws, Pareto distributions and Zipf's law”, Contemporary Physics. [link]
A. Clauset, C. Shalizi, M.E.J. Newman (2009), “Power-law distributions in empirical data“, SIAM Review. [link]

Growth models:

A.-L. Barabasi, R. Albert (1999). "Emergence of scaling in random networks", Science.
D. Callaway, J. Hopcroft, J. Kleinberg, M.E.J. Newman, S. Strogatz (2001). “Are randomly grown graphs really random?” Physical Review E. [link]
D. Liben-Nowell, C. Knipe, C. Coalson (2013). “Indifferent Attachment: The Role of Degree in Ranking Friends”, ASONAM. [link]
J. Bagrow, D. Brockmann (2013) "Natural Emergence of Clusters and Bursts in Network Evolution", Phys Rev X. [link]

Not discussed: Benford’s Law, another "law" that in fact has a rigorous (if and only if) characterization that is directly related to scale-free distributions.

S. Newcomb. (1881), “Note on the frequency of use of the different digits in natural numbers.” Amer. J Math. 4 39-40.
F. Benford (1938), “The law of anomalous numbers.” Proceedings of the American Philosophical Society 78 551-572, 1938.
T.P. Hill (1995), “A statistical derivation of the significant-digit law.” Statistical Science, 10:4, p. 354-363.
Mark Nigrini (2012), “Benford's Law: Applications for forensic accounting, auditing, and fraud detection”, John Wiley & Sons (Book).
Radiolab episode: Numbers (2009) [link]

Week 3

Lecture 5: More graph models

Finishing preferential attachment; Stochastic Block Models; ERGMs. Other models.

SBMs:

P. W. Holland, K. B. Laskey, S. Leinhardt (1983) "Stochastic Blockmodels: First Steps," Social Networks.
B. Karrer, M.E.J. Newman (2011) "Stochastic blockmodels and community structure in networks," Physical Review E.
E. M. Airoldi, D. M. Blei, S. E. Fienberg, E. P. Xing (2008) “Mixed membership stochastic blockmodels,” JMLR.

Planted partition model:

A. Condon, D. Karp (2001) "Algorithms for graph partitioning on the planted partition model", Random Structure and Algorithms. (Extended abstract in RANDOM'99, August, 1999).
F. McSherry (2001) "Spectral Partitioning of Random Graphs", FOCS.

ERGMs:

G. Robins, P. Pattison, Y. Kalish, D. Lusher (2007). "An introduction to exponential random graph models for social networks". Social Networks 29: 173–191.
S. Bhamidi, G. Bresler, A. Sly (2008) "Mixing time of exponential random graphs", FOCS.
S. Chatterjee, P. Diaconis (2013) "Estimating and understanding exponential random graph models", Annals of Statistics.

Other models:

W. Aiello, F. Chung, L. Lu (2000) "A random graph model for massive graphs", STOC/PNAS.
P. D. Hoff, A. E. Raftery, M. S. Handcock. (2002) "Latent Space Approaches to Social Network Analysis." JASA.
S. Lattanzi, D. Sivakumar (2009) “Affiliation Networks.” STOC.
J. Pfeiffer III, S. Moreno, T. La Fond, J. Neville, B. Gallagher. (2014) "Attributed Graph Models: Modeling network structure with correlated attributes." WWW.

Lecture 6: The small-world phenomena (smallness and navigability)

S. Milgram (1967) “The small world problem,” Psychology Today.
J. Travers, S. Milgram (1969) “An Experimental Study of the Small World Problem,” Sociometry.
D. Watts, S. Strogatz (1998) "Collective dynamics of 'small-world' networks", Nature.
J. Kleinberg (2000) "The small-world phenomenon: An algorithmic perspective", STOC.
D. Liben-Nowell, J. Novak, R. Kumar, P. Raghavan, A. Tomkins (2005) "Geographic routing in social networks", PNAS.
L Backstrom, E Sun, C Marlow (2010) "Find me if you can: improving geographical prediction with social and spatial proximity", ICWSM.
Z Yang, W Chen (2015) "A Game Theoretic Model for the Formation of Navigable Small-World Networks", WWW.

Distance distributions:

J. Leskovec, E. Horvitz (2008) "Planetary-Scale Views on an Instant-Messaging Network", Proceedings of WWW.
L. Backstrom, P. Boldi, M. Rosa, J. Ugander, S. Vigna (2012) "Four Degrees of Separation", WebSci.
A. Jacobs, S.F. Way, J. Ugander, A. Clauset (2015) "Assembling thefacebook: Using Heterogeneity to Understand Online Social Network Assembly", WebSci.

Week 4

Lecture 7 & 8 : Graph centrality and ranking

Katz, Bonacich, Eigenvector,PageRank, Betweenness, Harmonic centrality. Personalized variations.

Foundational papers:

L. Katz (1953) “A new status index derived from sociometric analysis.” Psychometrika, 18(1), 39-43.
P. Bonacich (1987) "Power and centrality: A family of measures." American Journal of Sociology, p. 1170-1182.
L. Page, S. Brin, R. Motwani, T. Winograd (1999) “The PageRank citation ranking: Bringing order to the web”, Stanford InfoLab.
A. Ng, A. Zheng, M. Jordan (2001) "Link Analysis, Eigenvectors and Stability", IJCAI. [link]

More recent perspectives:

P. Boldi, S. Vigna (2014), "Axioms for Centrality", Internet Mathematics 10, p. 222-262.
S. Vigna (2015) “A Weighted Correlation Index for Rankings with Ties”, Proceedings of WWW.
T. Martin, X. Zhang, M.E.J. Newman (2014) “Localization and centrality in networks”, Phys Rev E.
D. Gleich (2015) "PageRank Beyond the Web", SIAM Review 57:3, pp. 321-363. [link]

Centrality, personalized:

G. Jeh, J. Widom (2003) "Scaling personalized web search", Proceedings of WWW.
K. Kloster, D.F. Gleich (2014) "Heat kernel based community detection", Proceedings of KDD.

Week 5

Lecture 9: Comparisons and ranking from comparisons

Thurstone and Bradley-Terry models; Elo ratings.

L. L. Thurstone (1927) “A law of comparative judgment,” Psychological Review.
M. Glickman (1999) “Parameter estimation in large dynamic paired comparison experiments,” J Royal Statistical Society C.
R. Herbrich, T. Minka, T. Graepel (2006) "Trueskill™: A Bayesian skill rating system", NIPS.
D. Hunter (2004) "MM Algorithms for Generalized Bradley-Terry Models", Annals of Statistics. [link]
K. Tsukida, M. Gupta (2011) "How to Analyze Paired Comparison Data", University of Washington Tech Report # UW-EE-2011-0004. [link]
S. Negahban, S. Oh, D. Shah (2012) "Rank Centrality: Ranking from Pair-wise Comparisons", arXiv. [link]

Example applications:

D. Shahaf, E. Horvitz, R. Mankoff (2015) “Inside Jokes: Identifying Humorous Cartoon Captions” KDD 2015.

Other methods that seek status embeddings:

B. Ball, M. E. J. Newman (2013) "Friendship networks and social status", Network Science. [paper, Add Health data ("wave I")]

Lecture 10: The friendship paradox

Friendship paradox literature:

S. L. Feld (1991). “Why your friends have more friends than you do.” American Journal of Sociology, p. 1464-1477.
J. Ugander, B. Karrer, L. Backstrom, C. Marlow (2011) “The Anatomy of the Facebook Social Graph,” arXiv.
F. Kooti, N. O. Hodas, K. Lerman (2014) “Network Weirdness: Exploring the Origins of Network Paradoxes”, Proceedings of ICWSM.
S. Lattanzi, Y. Singer (2015). “The Power of Random Neighbors in Social Networks” Proceedings of WSDM.

Applications of the friendship paradox:

R. Pastor-Satorras, A. Vespignani (2002) “Immunization of complex networks.” Phys Rev E; 65: 036104.
N. A. Christakis, J. H. Fowler (2010) “Social network sensors for early detection of contagious outbreaks", PLOS One.
D. A. Kim et al. (2015) “Social network targeting to maximise population behaviour change: a cluster randomised controlled trial”, The Lancet.

Week 6

Lecture 11: Models of social processes: influence and contagion

M. Granovetter (1978) "Threshold Models of Collective Behavior", AJS.
D. McAdam (1986) "Recruitment to high-risk activism: The case of freedom summer", AJS.
J. Kleinberg (2007) "Cascading Behavior in Networks: Algorithmic and Economic Issues" in Algorithmic Game Theory (book chapter) [link]
B. State, L. Adamic (2015) “The Diffusion of Support in an Online Social Movement: Evidence from the Adoption of Equal-Sign Profile Pictures”, CSCW.

Lecture 12: Influence maximization; complex contagion; Homophily and Influence

D Kempe, J Kleinberg, E Tardos (2003) "Maximizing the spread of influence through a social network", Proceedings of KDD.
D. Centola, M. Macy (2007) "Complex contagions and the weakness of long ties", AJS.
D. Centola (2010) "The spread of behavior in an online social network experiment", Science.
D. Centola (2011) "An experimental study of homophily in the adoption of health behavior", Science.
J. Ugander, L. Backstrom, C. Marlow, J. Kleinberg. (2012) "Structural Diversity in Social Contagion", PNAS.
S. Aral, L. Muchnik, A. Sundararajan (2009) "Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks", PNAS.

Week 7

Lecture 13: Causal Inference

C. Manski (1993) "Identification of endogenous social effects: The reflection problem", The Review of Economic Studies.
C. Shalizi, A. Thomas (2011) "Homophily and contagion are generically confounded in observational social network studies", Sociological methods & research.
E. Bakshy, D. Eckles, R. Yan, I. Rosenn. (2012) "Social Influence in Social Advertising: Evidence from Field Experiments," Proceedings of EC.
E. Bakshy, I. Rosenn, C. Marlow, L. Adamic (2012) "The Role of Social Networks in Information Diffusion," Proceedings of WWW.
J. Zhang, D. Brackbill, S. Yang, D. Centola (2015) "Efficacy and causal mechanism of an online social media intervention to increase physical activity: Results of a randomized controlled trial", Preventitive Medicine Reports. [paper, data]

Lecture 14: Causal Inference under Interference

P. Aronow, C Samii (2011) "Estimating average causal effects under interference between units", arXiv.
C. Manski (2013) "Identification of treatment response with social interactions", The Econometrics Journal.
J. Ugander, B. Karrer, L. Backstrom, J. Kleinberg. (2013) "Graph Cluster Randomization: Network Exposure to Multiple Universes", Proceedings of KDD.
D. Eckles, B. Karrer, J. Ugander (2014) "Design and analysis of experiments in networks: Reducing bias from interference", arXiv.
H. Gui, Y. Xu, A. Bhasin, J. Han (2015) "Network A/B Testing: From Sampling to Estimation", Proceedings of WWW.
Y. Xu, N. Chen, A. Fernandez, O. Sinno, A. Bhasin (2015) "From Infrastructure to Culture: A/B Testing Challenges in Large Scale Social Networks", Proceedings of KDD.
D. Walker, L. Muchnik (2015) "Design of Randomized Experiments in Networks", Proceedings of IEEE. [link]

Weeks 8

Lecture 15: More on interference; Clustering

C. Manski (2013) "Identification of treatment response with social interactions", The Econometrics Journal.

Dissecting Papers

During weeks 8 and 9 the course will take on an active discursive style, aiming to synthesize what we've discussed as we dissect the methodologies of recent, complex applied papers.

Lecture 16: Get Out the Vote Experiment

R. M. Bond, C. J. Fariss, J. J. Jones, A. D. I. Kramer, C. Marlow, J. E. Settle, J. H. Fowler (2012), “A 61-million-person experiment in social influence and political mobilization”, Nature. [paper, supplemental information]

Lecture 17: Complex contagion in technology adoption

L. Beaman, A. BenYishay, J. Magruder, A. M. Mobarak (2015) “Can Network Theory-based Targeting Increaes Technology Adaption?”, Working paper. [link]

Lecture 18: A friend-of-random field experiment

D. A. Kim, A. R. Hwong, D. Stafford, D. A. Hughes, A J. O’Malley, J. H. Fowler, N. A. Christakis (2015). “Social network targeting to maximise population behaviour change: a cluster randomised controlled trial”, Lancet. [link]

Break - Week of Thanksgiving

Week 10

In-class presentations of student projects.

Tools and Data

Here are some libraries that might be useful for the problem sets and projects:

NetworkX (Python), graph-tool (Python), SNAP (C++, Python), igraph (R, python, C)

Some data sources: