MS&E334: Topics in Social Data (Spring 2022)

Johan Ugander, Assistant Professor, MS&E
Email: jugander [at] stanford
Office location: Huang 357
Office Hours: by appointment

Lecture hours: Fr 1:30-4:30p
Lecture room: Thornton 110

Course Description

In-depth survey of methods for the analysis of large-scale social and behavioral data. Particular focus on recent developments in preference learning. Connections made to graph-theoretic investigations common in the study of social networks. Topics include discrete choice theory, random utility models, item-response theory, rank aggregation, centrality and ranking on graphs, and random graph models of social networks. Intended for Ph.D. students, but masters students with adequate background and interest in research topics are welcome to apply. Strongly recommended: 200-level courses in stochastic modeling (most specifically, Markov chains), optimization, and machine learning (e.g., MS&E 211, 221, 226, and CS161 or equivalents). Limited enrollment.

Most important links:

MS&E334 on Canvas

Basic overview:
Week 1 (4/1): Graphs and graph properties. Simple graphs, multigraphs, sampling of results in extremal graph theory. Structural empirics. Week 2 (4/8): Random graphs. Configuration model, growth models including preferential attachment, Chung-lu, and others. Introduce block models. Week 3 (4/15): Centrality on graphs, centrality as a ranking problem. Week 4 (4/22): Pairwise comparisons, ranking from pairwise comparisons, choice modelling Week 5 (4/29): Choices in context, choice beyond IIA Week 6 (5/6): Ranked data, permutations, triangle queries, metric learning from comparisons Week 7 (5/13): Block models. Mixed membership, degree-corrected, multilayer, etc. Week 8 (5/20): Trees and diffusion cascade data Week 9 (5/27): Respondent-driven sampling (RDS), network scale-up method Week 10 (6/3): Project Presentations

Some topics may receive a longer or shorter treatments depending on audience interest at the onset of the course. A detailed list of references will be posted on the course homepage as the course progresses.

Lecture material

The literature below lays the foundation for the lecture material, though only a handful of papers will be discussed in depth. If you have a focused interests in specific papers, feel free to come discuss them with me during office hours. The reference list will almost certainly be expanded in response to class discussions as the course progresses.

Week 1

A review of graph definitions and properties. Graph invariants. Graphical degree sequences. Combinatorial constraints on graphs.

Graph structure:

MEJ Newman (2003) "The structure and function of complex networks," SIAM Review 45, 167-256.

Combinatorial constraints:

S.L. Hakimi (1962) "On realizability of a set of integers as degrees of the vertices of a linear graph. I", J. SIAM, 10:3, p. 496.
Havel-Hakimi game: [link]
N Alon, M Krivelevich (2010) "Extremal and Probabilistic Combinatorics", Princeton Companion to Mathematics. [link]
A Razborov (2008) "On the minimal density of triangles in graphs," Combinatorics, Probability and Computing.
J Ugander, L Backstrom, J Kleinberg (2013) "Subgraph Frequencies: Mapping the Empirical and Extremal Geography of Large Graph Collections", WWW. [link]

Week 2

A broad tour of random graph models. Configuration models (uniform distributions over specific spaces of graphs), Preferential Attachment models, power law degree sequences, stochastic block models, ERGMs.
Configuration models:

J. Blitzstein, P. Diaconis (2011) "A sequential importance sampling algorithm for generating random graphs with prescribed degrees", Internet Mathematics.
B. Fosdick, D. Larremore, J. Nishimura, J. Ugander (2018) "Configuring Random Graph Models with Fixed Degree Sequences," SIAM Review 60(2):315–355. [link] [code]

Power Law literature:

M. Faloutsos, P. Faloutsos, C. Faloutsos (1999). "On power-law relationships of the internet topology", SIGCOMM.
M. Mitzenmacher (2004), "A Brief History of Generative Models for Power Law and Lognormal Distributions", Internet Mathematics, vol 1, No. 2, pp. 226-251. [link]
M.E.J. Newman (2005), "Power laws, Pareto distributions and Zipf's law", Contemporary Physics. [link]
A. Clauset, C. Shalizi, M.E.J. Newman (2009), "Power-law distributions in empirical data", SIAM Review. [link]

Other growth models:

A.-L. Barabasi, R. Albert (1999). "Emergence of scaling in random networks", Science.
D. Callaway, J. Hopcroft, J. Kleinberg, M.E.J. Newman, S. Strogatz (2001). “Are randomly grown graphs really random?” Physical Review E. [link]
D. Liben-Nowell, C. Knipe, C. Coalson (2013). “Indifferent Attachment: The Role of Degree in Ranking Friends”, ASONAM. [link]
J. Bagrow, D. Brockmann (2013) "Natural Emergence of Clusters and Bursts in Network Evolution", Phys Rev X. [link]

SBMs:

P. W. Holland, K. B. Laskey, S. Leinhardt (1983) "Stochastic Blockmodels: First Steps," Social Networks.
E. M. Airoldi, D. M. Blei, S. E. Fienberg, E. P. Xing (2008) "Mixed membership stochastic blockmodels," JMLR.
B. Karrer, M.E.J. Newman (2011) "Stochastic blockmodels and community structure in networks," Physical Review E.

Planted partition model:

A. Condon, D. Karp (2001) "Algorithms for graph partitioning on the planted partition model", Random Structure and Algorithms. (Extended abstract in RANDOM'99, August, 1999).
F. McSherry (2001) "Spectral Partitioning of Random Graphs", FOCS.

ERGMs:

G. Robins, P. Pattison, Y. Kalish, D. Lusher (2007). "An introduction to exponential random graph models for social networks". Social Networks 29: 173--191.
S. Bhamidi, G. Bresler, A. Sly (2008) "Mixing time of exponential random graphs", FOCS.
S. Chatterjee, P. Diaconis (2013) "Estimating and understanding exponential random graph models", Annals of Statistics.

Even more models:

W. Aiello, F. Chung, L. Lu (2000) "A random graph model for massive graphs", STOC/PNAS.
P. D. Hoff, A. E. Raftery, M. S. Handcock. (2002) "Latent Space Approaches to Social Network Analysis." JASA.
S. J. Young, E. Scheinerman. (2007) "Random dot product graph models for social networks", International Workshop on Algorithms and Models for the Web-Graph.
A. Athreya et al. (2017) "Statistical inference on random dot product graphs: a survey", arXiv. [link]
S. Lattanzi, D. Sivakumar (2009) "Affiliation Networks." STOC.
J. Pfeiffer III, S. Moreno, T. La Fond, J. Neville, B. Gallagher. (2014) "Attributed Graph Models: Modeling network structure with correlated attributes." WWW.

Week 3

Katz, Bonacich, Eigenvector, PageRank, Betweenness, Harmonic centrality. Personalized variations.

Foundational papers:

L. Katz (1953) "A new status index derived from sociometric analysis." Psychometrika, 18(1), 39-43.
P. Bonacich (1987) "Power and centrality: A family of measures." American Journal of Sociology, p. 1170-1182.
L. Page, S. Brin, R. Motwani, T. Winograd (1999) "The PageRank citation ranking: Bringing order to the web", Stanford InfoLab.
A. Ng, A. Zheng, M. Jordan (2001) "Link Analysis, Eigenvectors and Stability", IJCAI. [link]

More recent perspectives:

P. Boldi, S. Vigna (2014), "Axioms for Centrality", Internet Mathematics 10, p. 222-262.
S. Vigna (2015) "A Weighted Correlation Index for Rankings with Ties", Proceedings of WWW.
T. Martin, X. Zhang, M.E.J. Newman (2014) "Localization and centrality in networks", Phys Rev E.
D. Gleich (2015) "PageRank Beyond the Web", SIAM Review 57:3, pp. 321-363. [link]

Centrality, personalized:

G. Jeh, J. Widom (2003) "Scaling personalized web search", Proceedings of WWW.
K. Kloster, D.F. Gleich (2014) "Heat kernel based community detection", Proceedings of KDD.
I. Kloumann, J. Ugander, J. Kleinberg (2017) "Block Models and Personalized PageRank", PNAS. [link]

Week 4

Thurstone and Bradley-Terry-Luce models; Random Utility Models; Elo ratings; Item-response theory; Markov chain models.

L. L. Thurstone (1927) “A law of comparative judgment,” Psychological Review.
M. Glickman (1999) “Parameter estimation in large dynamic paired comparison experiments,” J Royal Statistical Society C.
R. Herbrich, T. Minka, T. Graepel (2006) "Trueskill: A Bayesian skill rating system", NIPS.
D. Hunter (2004) "MM Algorithms for Generalized Bradley-Terry Models", Annals of Statistics. [link]
K. Tsukida, M. Gupta (2011) "How to Analyze Paired Comparison Data", University of Washington Tech Report # UW-EE-2011-0004. [link]

Markov chain models:

S. Negahban, S. Oh, D. Shah (2012) "Rank Centrality: Ranking from Pair-wise Comparisons," arXiv. [link]
L. Maystre, M. Grossglauser (2015) "Fast and accurate inference of Plackett-Luce models," NIPS. [link]
S. Ragain, J. Ugander (2016) "Pairwise Choice Markov Chains," NIPS. [link]
J. Blanchet, G. Gallego, V. Goyal (2016) "A Markov chain approximation to choice modeling", Operations Research [link]
S. Ieong, N Mishra, O. Sheffet (2012) "Predicting Preference Flips in Commerce Search," ICML. [link]
L. Maystre, M. Grosslgauser (2017) "ChoiceRank: Identifying Preferences from Node Traffic in Networks", ICML. [link]
R. Kumar, A. Tomkins, S. Vassilvitskii, E. Vee (2015) "Inverting a Steady State," WSDM. [link]

Example applications:

Y. Sismanis (2010) "How I won the `Chess Ratings - Elo vs the Rest of the World' Competition," arXiv. [link]
D. Shahaf, E. Horvitz, R. Mankoff (2015) “Inside Jokes: Identifying Humorous Cartoon Captions” KDD. [link]

Other methods that seek status embeddings:

B. Ball, M. E. J. Newman (2013) "Friendship networks and social status", Network Science. [link]

Week 5

The Mallows model, Plackett-Luce, Rank Aggregation, Self-organizing lists

H.D. Block, J. Marschak (1960) "Random orderings and stochastic theories of responses," Contributions to Probability and Statistics.
C. Dwork, R. Kumar, M. Naor, D. Sivakumar (2001) "Rank aggregation methods for the Web", WWW.
M. Braverman, E. Mossel (2009) "Sorting from Noisy Information", arXiv. [link]
F. Chierichetti, A. Dasgupta, R. Kumar, S. Lattanzi (2015) "On Learning Mixture Models for Permutations", ITCS.
T. Qin, X. Geng, T.Y. Liu (2010) "A new probabilistic model for rank aggregation," NIPS.
T. Joachims (2002) "Optimizing Search Engines using Clickthrough Data", KDD.
T.Y. Liu (2009) "Learning to rank for information retrieval", Foundations and Trends in Information Retrieval.

Week 6

Models of social processes: influence and contagion

M. Granovetter (1978) "Threshold Models of Collective Behavior", AJS.
D. McAdam (1986) "Recruitment to high-risk activism: The case of freedom summer", AJS.
J. Kleinberg (2007) "Cascading Behavior in Networks: Algorithmic and Economic Issues" in Algorithmic Game Theory (book chapter) [link]
S. Aral, D. Walker (2012) "Identifying influential and susceptible members of social networks", Science.
B. State, L. Adamic (2015) "The Diffusion of Support in an Online Social Movement: Evidence from the Adoption of Equal-Sign Profile Pictures", CSCW.

D. Kempe, J. Kleinberg, E. Tardos (2003) "Maximizing the spread of influence through a social network", Proceedings of KDD. [2003 KDD version, 2015 journal version]
D. Centola, M. Macy (2007) "Complex contagions and the weakness of long ties", AJS.
D. Centola (2010) "The spread of behavior in an online social network experiment", Science.
D. Centola (2011) "An experimental study of homophily in the adoption of health behavior", Science.
J. Ugander, L. Backstrom, C. Marlow, J. Kleinberg. (2012) "Structural Diversity in Social Contagion", PNAS.
S. Aral, L. Muchnik, A. Sundararajan (2009) "Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks", PNAS.

Week 7

Causal inference under interference.

C. Manski (1993) "Identification of endogenous social effects: The reflection problem", The Review of Economic Studies.
C. Shalizi, A. Thomas (2011) "Homophily and contagion are generically confounded in observational social network studies", Sociological methods & research.
E. Bakshy, D. Eckles, R. Yan, I. Rosenn. (2012) "Social Influence in Social Advertising: Evidence from Field Experiments," Proceedings of EC.
E. Bakshy, I. Rosenn, C. Marlow, L. Adamic (2012) "The Role of Social Networks in Information Diffusion," Proceedings of WWW.
J. Zhang, D. Brackbill, S. Yang, D. Centola (2015) "Efficacy and causal mechanism of an online social media intervention to increase physical activity: Results of a randomized controlled trial", Preventitive Medicine Reports. [paper, data]

P. Aronow, C Samii (2011) "Estimating average causal effects under interference between units", arXiv.
C. Manski (2013) "Identification of treatment response with social interactions", The Econometrics Journal.
J. Ugander, B. Karrer, L. Backstrom, J. Kleinberg. (2013) "Graph Cluster Randomization: Network Exposure to Multiple Universes", Proceedings of KDD.
D. Eckles, B. Karrer, J. Ugander (2014) "Design and analysis of experiments in networks: Reducing bias from interference", arXiv.
H. Gui, Y. Xu, A. Bhasin, J. Han (2015) "Network A/B Testing: From Sampling to Estimation", Proceedings of WWW.
Y. Xu, N. Chen, A. Fernandez, O. Sinno, A. Bhasin (2015) "From Infrastructure to Culture: A/B Testing Challenges in Large Scale Social Networks", Proceedings of KDD.
D. Walker, L. Muchnik (2015) "Design of Randomized Experiments in Networks", Proceedings of IEEE. [link]
M. Saveski, J. Pouget-Abadie, G. Saint-Jacques, W. Duan, S. Ghosh, Y. Xu, E. Airoldi (2017) "Detecting Network Effects: Randomizing Over Randomized Experiments", Proceedings of KDD.

Weeks 8

Friendship paradox, small worlds.

S. L. Feld (1991). “Why your friends have more friends than you do.” American Journal of Sociology, p. 1464-1477.
J. Ugander, B. Karrer, L. Backstrom, C. Marlow (2011) “The Anatomy of the Facebook Social Graph,” arXiv.
F. Kooti, N. O. Hodas, K. Lerman (2014) “Network Weirdness: Exploring the Origins of Network Paradoxes”, Proceedings of ICWSM.
S. Lattanzi, Y. Singer (2015). “The Power of Random Neighbors in Social Networks” Proceedings of WSDM.

Applications of the friendship paradox:

R. Pastor-Satorras, A. Vespignani (2002) “Immunization of complex networks.” Phys Rev E; 65: 036104.
N. A. Christakis, J. H. Fowler (2010) “Social network sensors for early detection of contagious outbreaks", PLOS One.
D. A. Kim et al. (2015) “Social network targeting to maximise population behaviour change: a cluster randomised controlled trial”, The Lancet.

Small worlds:

S. Milgram (1967) “The small world problem,” Psychology Today.
J. Travers, S. Milgram (1969) “An Experimental Study of the Small World Problem,” Sociometry.
D. Watts, S. Strogatz (1998) "Collective dynamics of 'small-world' networks", Nature.
J. Kleinberg (2000) "The small-world phenomenon: An algorithmic perspective", STOC.
D. Liben-Nowell, J. Novak, R. Kumar, P. Raghavan, A. Tomkins (2005) "Geographic routing in social networks", PNAS.
L Backstrom, E Sun, C Marlow (2010) "Find me if you can: improving geographical prediction with social and spatial proximity", ICWSM.
Z Yang, W Chen (2015) "A Game Theoretic Model for the Formation of Navigable Small-World Networks", WWW.

Distance distributions:

J. Leskovec, E. Horvitz (2008) "Planetary-Scale Views on an Instant-Messaging Network", Proceedings of WWW.
L. Backstrom, P. Boldi, M. Rosa, J. Ugander, S. Vigna (2012) "Four Degrees of Separation", WebSci.
A. Jacobs, S.F. Way, J. Ugander, A. Clauset (2015) "Assembling thefacebook: Using Heterogeneity to Understand Online Social Network Assembly", WebSci.

Tools and Data

Here are some libraries that might be useful for the problem sets and projects:

NetworkX (Python), graph-tool (Python), SNAP (C++, Python), igraph (R, python, C)

Some data sources: