MS&E334: Topics in Social Data (Spring 2022)

Johan Ugander, Assistant Professor, MS&E
Email: jugander [at] stanford
Office location: Huang 357
Office Hours: by appointment

Lecture hours: Fr 1:30-4:30p
Lecture room: Thornton 110

Course Description

In-depth survey of methods for the analysis of large-scale social and behavioral data. Particular focus on recent developments in preference learning. Connections made to graph-theoretic investigations common in the study of social networks. Topics include discrete choice theory, random utility models, item-response theory, rank aggregation, centrality and ranking on graphs, and random graph models of social networks. Intended for Ph.D. students, but master's students with adequate background and interest in research topics are welcome to apply. Strongly recommended: 200-level courses in stochastic modeling (most specifically, Markov chains), optimization, and machine learning (e.g., MS&E 211, 221, 226, and CS161 or equivalents). Limited enrollment.

Basic overview:
Week 1 (4/1): Graphs and graph properties. Simple graphs, multigraphs, sampling of results in extremal graph theory. Structural empirics.
Week 2 (4/8): Random graphs. Configuration model, growth models including preferential attachment, Chung-Lu, and others. Introduce block models.
Week 3 (4/15): Centrality on graphs, centrality as a ranking problem.
Week 4 (4/22): Pairwise comparisons, ranking from pairwise comparisons, choice modelling.
Week 5 (4/29): Choices in context, choice beyond IIA.
Week 6 (5/6): Ranked data, permutations, triangle queries, metric learning from comparisons.
Week 7 (5/13): Block models. Mixed membership, degree-corrected, multilayer, etc.
Week 8 (5/20): Trees and diffusion cascade data.
Week 9 (5/27): Respondent-driven sampling (RDS), network scale-up method.
Week 10 (6/3): Project Presentations.

Some topics may receive longer or shorter treatment depending on audience interest at the outset of the course. A detailed list of references will be posted on the course homepage as the course progresses.



Lecture material

The literature below lays the foundation for the lecture material, though only a handful of papers will be discussed in depth. If you have a focused interest in specific papers, feel free to come discuss them with me during office hours. The reference list will almost certainly be expanded in response to class discussions as the course progresses.

Week 1

A review of graph definitions and properties. Graph invariants. Graphical degree sequences. Combinatorial constraints on graphs.

Graph structure:
Combinatorial constraints:
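
As a small illustration of what "graphical degree sequences" means, here is a minimal sketch of the Erdős–Gallai test, which decides whether a list of non-negative integers can be realized as the degree sequence of a simple graph. The function name `is_graphical` is chosen here for illustration and is not taken from the course materials.

```python
def is_graphical(degrees):
    """Erdős–Gallai test: can `degrees` be realized by a simple graph?"""
    d = sorted(degrees, reverse=True)
    # Degrees must be non-negative and sum to an even number.
    if any(x < 0 for x in d) or sum(d) % 2 != 0:
        return False
    n = len(d)
    for k in range(1, n + 1):
        # The k largest degrees can be absorbed by at most k*(k-1) edge ends
        # among themselves plus min(d_i, k) from each remaining vertex.
        lhs = sum(d[:k])
        rhs = k * (k - 1) + sum(min(x, k) for x in d[k:])
        if lhs > rhs:
            return False
    return True


print(is_graphical([3, 3, 3, 3]))  # complete graph on 4 vertices
print(is_graphical([3, 3, 1, 1]))  # fails the k = 2 inequality
```

This version recomputes the sums for every k, so it runs in quadratic time; that is fine for small examples but not tuned for large sequences.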

Week 2

A broad tour of random graph models. Configuration models (uniform distributions over specific spaces of graphs), Preferential Attachment models, power law degree sequences, stochastic block models, ERGMs.
Configuration models:

Power Law literature:
Other growth models:
SBMs:
Planted partition model:
ERGMs:
Even more models:
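
To make the configuration model concrete, the sketch below uses the standard stub-matching construction: each node gets as many "stubs" as its degree, and stubs are paired uniformly at random. Note the assumption: this samples a multigraph that may contain self-loops and parallel edges, so it is not a uniform sample over simple graphs. The function name and the `seed` parameter are illustrative choices.

```python
import random


def configuration_model(degrees, seed=None):
    """Pair up degree stubs uniformly at random; returns a list of edges.

    The result is a multigraph: self-loops and repeated edges can occur.
    """
    if sum(degrees) % 2 != 0:
        raise ValueError("degree sum must be even")
    rng = random.Random(seed)
    # Node v contributes degrees[v] stubs.
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    # Consecutive stubs in the shuffled list form an edge.
    return list(zip(stubs[::2], stubs[1::2]))


edges = configuration_model([3, 2, 2, 1], seed=0)
print(edges)
```

Whatever pairing is drawn, every node ends up with exactly its prescribed degree, provided a self-loop is counted twice.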

Week 3

Katz, Bonacich, Eigenvector, PageRank, Betweenness, Harmonic centrality. Personalized variations.

Foundational papers:
More recent perspectives:
Centrality, personalized:
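
As one example from this week's list, here is a minimal power-iteration sketch of PageRank on a directed graph stored as an adjacency dictionary. The assumptions are mine, not the course's: every node appears as a key of `adj`, teleportation is uniform, and the mass of dangling nodes (no out-links) is spread uniformly. A personalized variation would replace the uniform `1 / n` with a chosen distribution over nodes.

```python
def pagerank(adj, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for PageRank; `adj` maps each node to its out-neighbors."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes with no out-links is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not adj[v])
        new = {v: (1 - alpha) / n + alpha * dangling / n for v in nodes}
        for u in nodes:
            for v in adj[u]:
                new[v] += alpha * rank[u] / len(adj[u])
        # Stop once the L1 change between iterations is below tol.
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


scores = pagerank({"a": ["c"], "b": ["c"], "c": []})
print(max(scores, key=scores.get))  # the node both others point to
```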

Week 4

Thurstone and Bradley-Terry-Luce models; Random Utility Models; Elo ratings; Item-response theory; Markov chain models.

Markov chain models:
Example applications:
Other methods that seek status embeddings:
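
For a concrete feel for the Bradley-Terry-Luce model, the sketch below fits item strengths from pairwise win counts with the classical minorization-maximization (Zermelo-style) update. Under the model, item i beats item j with probability p_i / (p_i + p_j). The assumptions here are illustrative: `wins[i][j]` counts how often i beat j, the diagonal is zero, every item has at least one comparison, and a fixed number of iterations is used in place of a convergence check.

```python
def bradley_terry_mm(wins, iters=200):
    """Estimate Bradley-Terry strengths (normalized to sum to 1) from win counts."""
    n = len(wins)
    p = [1.0 / n] * n
    for _ in range(iters):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i])
            # Each pair (i, j) contributes its number of comparisons,
            # weighted by the current combined strength of the pair.
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            new_p.append(total_wins / denom)
        s = sum(new_p)
        p = [x / s for x in new_p]
    return p


# Item 0 beat item 1 three times and lost once.
print(bradley_terry_mm([[0, 3], [1, 0]]))
```

With only two items, the fitted strength of item 0 equals its empirical win fraction, which is a useful sanity check. With more items, the estimate is only well defined when the comparison graph is strongly connected (every item can be reached from every other along a chain of wins).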

Week 5

The Mallows model, Plackett-Luce, Rank Aggregation, Self-organizing lists

Week 6

Models of social processes: influence and contagion

Week 7

Causal inference under interference.

Week 8

Friendship paradox, small worlds.

Applications of the friendship paradox:
Small worlds:
Distance distributions:
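
The friendship paradox can be checked numerically on any undirected edge list. The sketch below compares the mean degree of a uniformly random node with the mean degree of the node reached by following a uniformly random edge end, which equals (sum of squared degrees) / (sum of degrees). The function name is illustrative, and the edge list is assumed to contain no self-loops and no isolated nodes.

```python
def friendship_paradox(edges):
    """Return (mean degree, mean degree of a node reached via a random edge end)."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    total = sum(degree.values())
    mean_degree = total / len(degree)
    # A node of degree d is reached with probability d / total.
    mean_friend_degree = sum(d * d for d in degree.values()) / total
    return mean_degree, mean_friend_degree


# Star graph: one hub connected to four leaves.
star = [("hub", leaf) for leaf in ("a", "b", "c", "d")]
print(friendship_paradox(star))
```

The second value is never smaller than the first, because their difference is the degree variance divided by the mean degree; the two coincide only when all degrees are equal.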

Tools and Data

Here are some libraries that might be useful for the problem sets and projects:

Some data sources: