MS&E334: Topics in Social Data (Fall 2017)

Johan Ugander, Assistant Professor, MS&E
Email: jugander [at] stanford
Office location: Huang 357
Office Hours: by appointment

Lecture hours: Tu/Th, 3:00pm-4:20pm
Lecture room: Building 380, Room 381U (first floor)

Note: This page is being updated as course material for 2017 is being selected that differs slightly from previous years. The general structure of the course is fixed, with the content of some lectures up for change. This message will be removed when the course content is finalized.

Course Description

This course provides a in-depth survey of methods research for the analysis of large-scale social and behavioral data. There will be a particular focus on recent developments in discrete choice theory and preference learning. Connections will be made to graph-theoretic investigations common in the study of social networks. Topics will include random utility models, item-response theory, ranking and learning to rank, centrality and ranking on graphs, and random graphs. The course is intended for Ph.D. students, but masters students with an interested in research topics are welcome. Recommended: 221, 226, CS161, or equivalents.

Most important links:

2017 course syllabus (PDF)
MS&E334 on Canvas

Lecture material

The literature below lays the foundation for the lecture material, though only a handful of papers will be discussed in depth. If you have a focused interests in specific papers, feel free to come discuss them with me during office hours. The reference list will almost certainly be expanded in response to class discussions as the course progresses.

Week 1

Lecture 1: Course overview (9/26)

An introduction to the course and high-level tour of content and goals.

Lecture 2: Graphs and graph properties (9/28)

A review of graph definitions and properties. Graphical degree sequences. Combinatorial constraints on graphs.

General reference:

M.E.J. Newman (2003) "The structure and function of complex networks." SIAM Review 45, 167-256.

Combinatorial constraints:

S.L. Hakimi (1962) "On realizability of a set of integers as degrees of the vertices of a linear graph. I", J. SIAM, 10:3, p. 496.
Havel-Hakimi game: [link]
N Alon, M Krivelevich (2010) "Extremal and Probabilistic Combinatorics", Princeton Companion to Mathematics. [link]
J Ugander, L Backstrom, J Kleinberg (2013) "Subgraph Frequencies: Mapping the Empirical and Extremal Geography of Large Graph Collections", WWW. [link]

Week 2

Lectures 3 & 4: Random graph models (10/3, 10/5)

A broad tour of random graph models. Configuration models (uniform distributions over specific spaces of graphs), Preferential Attachment models, power law degree sequences, stochastic block models, ERGMs.
Configuration models:

J. Blitzstein, P. Diaconis (2011) "A sequential importance sampling algorithm for generating random graphs with prescribed degrees", Internet Mathematics.
B. Fosdick, D. Larremore, J. Nishimura, J. Ugander (2016) "Configuring Random Graph Models with Fixed Degree Sequences" [link] [code]

Power Law literature:

M. Faloutsos, P. Faloutsos, C. Faloutsos (1999). "On power-law relationships of the internet topology", SIGCOMM.
M. Mitzenmacher (2004), "A Brief History of Generative Models for Power Law and Lognormal Distributions", Internet Mathematics, vol 1, No. 2, pp. 226-251. [link]
M.E.J. Newman (2005), "Power laws, Pareto distributions and Zipf's law", Contemporary Physics. [link]
A. Clauset, C. Shalizi, M.E.J. Newman (2009), "Power-law distributions in empirical data", SIAM Review. [link]

Other growth models:

A.-L. Barabasi, R. Albert (1999). "Emergence of scaling in random networks", Science.
D. Callaway, J. Hopcroft, J. Kleinberg, M.E.J. Newman, S. Strogatz (2001). “Are randomly grown graphs really random?” Physical Review E. [link]
D. Liben-Nowell, C. Knipe, C. Coalson (2013). “Indifferent Attachment: The Role of Degree in Ranking Friends”, ASONAM. [link]
J. Bagrow, D. Brockmann (2013) "Natural Emergence of Clusters and Bursts in Network Evolution", Phys Rev X. [link]

SBMs:

P. W. Holland, K. B. Laskey, S. Leinhardt (1983) "Stochastic Blockmodels: First Steps," Social Networks.
E. M. Airoldi, D. M. Blei, S. E. Fienberg, E. P. Xing (2008) "Mixed membership stochastic blockmodels," JMLR.
B. Karrer, M.E.J. Newman (2011) "Stochastic blockmodels and community structure in networks," Physical Review E.

Planted partition model:

A. Condon, D. Karp (2001) "Algorithms for graph partitioning on the planted partition model", Random Structure and Algorithms. (Extended abstract in RANDOM'99, August, 1999).
F. McSherry (2001) "Spectral Partitioning of Random Graphs", FOCS.

ERGMs:

G. Robins, P. Pattison, Y. Kalish, D. Lusher (2007). "An introduction to exponential random graph models for social networks". Social Networks 29: 173--191.
S. Bhamidi, G. Bresler, A. Sly (2008) "Mixing time of exponential random graphs", FOCS.
S. Chatterjee, P. Diaconis (2013) "Estimating and understanding exponential random graph models", Annals of Statistics.

Even more models:

W. Aiello, F. Chung, L. Lu (2000) "A random graph model for massive graphs", STOC/PNAS.
P. D. Hoff, A. E. Raftery, M. S. Handcock. (2002) "Latent Space Approaches to Social Network Analysis." JASA.
S. J. Young, E. Scheinerman. (2007) "Random dot product graph models for social networks", International Workshop on Algorithms and Models for the Web-Graph.
A. Athreya et al. (2017) "Statistical inference on random dot product graphs: a survey", arXiv. [link]
S. Lattanzi, D. Sivakumar (2009) "Affiliation Networks." STOC.
J. Pfeiffer III, S. Moreno, T. La Fond, J. Neville, B. Gallagher. (2014) "Attributed Graph Models: Modeling network structure with correlated attributes." WWW.

Week 3

Lecture 5 & 6 : Graph centrality and ranking (10/10, 10/12)

Katz, Bonacich, Eigenvector, PageRank, Betweenness, Harmonic centrality. Personalized variations.

Foundational papers:

L. Katz (1953) "A new status index derived from sociometric analysis." Psychometrika, 18(1), 39-43.
P. Bonacich (1987) "Power and centrality: A family of measures." American Journal of Sociology, p. 1170-1182.
L. Page, S. Brin, R. Motwani, T. Winograd (1999) "The PageRank citation ranking: Bringing order to the web", Stanford InfoLab.
A. Ng, A. Zheng, M. Jordan (2001) "Link Analysis, Eigenvectors and Stability", IJCAI. [link]

More recent perspectives:

P. Boldi, S. Vigna (2014), "Axioms for Centrality", Internet Mathematics 10, p. 222-262.
S. Vigna (2015) "A Weighted Correlation Index for Rankings with Ties", Proceedings of WWW.
T. Martin, X. Zhang, M.E.J. Newman (2014) "Localization and centrality in networks", Phys Rev E.
D. Gleich (2015) "PageRank Beyond the Web", SIAM Review 57:3, pp. 321-363. [link]

Centrality, personalized:

G. Jeh, J. Widom (2003) "Scaling personalized web search", Proceedings of WWW.
K. Kloster, D.F. Gleich (2014) "Heat kernel based community detection", Proceedings of KDD.
I. Kloumann, J. Ugander, J. Kleinberg (2017) "Block Models and Personalized PageRank", PNAS. [link]

Week 4

Lectures 7 & 8: Ranking from comparisons and choice modelling (10/17, 10/19)

Thurstone and Bradley-Terry-Luce models; Random Utility Models; Elo ratings; Item-response theory; Markov chain models.

L. L. Thurstone (1927) “A law of comparative judgment,” Psychological Review.
M. Glickman (1999) “Parameter estimation in large dynamic paired comparison experiments,” J Royal Statistical Society C.
R. Herbrich, T. Minka, T. Graepel (2006) "Trueskill: A Bayesian skill rating system", NIPS.
D. Hunter (2004) "MM Algorithms for Generalized Bradley-Terry Models", Annals of Statistics. [link]
K. Tsukida, M. Gupta (2011) "How to Analyze Paired Comparison Data", University of Washington Tech Report # UW-EE-2011-0004. [link]

Markov chain models:

S. Negahban, S. Oh, D. Shah (2012) "Rank Centrality: Ranking from Pair-wise Comparisons," arXiv. [link]
L. Maystre, M. Grossglauser (2015) "Fast and accurate inference of Plackett-Luce models," NIPS. [link]
S. Ragain, J. Ugander (2016) "Pairwise Choice Markov Chains," NIPS. [link]
J. Blanchet, G. Gallego, V. Goyal (2016) "A Markov chain approximation to choice modeling", Operations Research [link]
S. Ieong, N Mishra, O. Sheffet (2012) "Predicting Preference Flips in Commerce Search," ICML. [link]
L. Maystre, M. Grosslgauser (2017) "ChoiceRank: Identifying Preferences from Node Traffic in Networks", ICML. [link]
R. Kumar, A. Tomkins, S. Vassilvitskii, E. Vee (2015) "Inverting a Steady State," WSDM. [link]

Example applications:

Y. Sismanis (2010) "How I won the `Chess Ratings - Elo vs the Rest of the World' Competition," arXiv. [link]
D. Shahaf, E. Horvitz, R. Mankoff (2015) “Inside Jokes: Identifying Humorous Cartoon Captions” KDD. [link]

Other methods that seek status embeddings:

B. Ball, M. E. J. Newman (2013) "Friendship networks and social status", Network Science. [link]

Week 5

Lecture 9: Ranking and permutation data (10/24)

The Mallows model, Plackett-Luce, Rank Aggregation, Self-organizing lists

H.D. Block, J. Marschak (1960) "Random orderings and stochastic theories of responses," Contributions to Probability and Statistics.
C. Dwork, R. Kumar, M. Naor, D. Sivakumar (2001) "Rank aggregation methods for the Web", WWW.
M. Braverman, E. Mossel (2009) "Sorting from Noisy Information", arXiv. [link]
F. Chierichetti, A. Dasgupta, R. Kumar, S. Lattanzi (2015) "On Learning Mixture Models for Permutations", ITCS.
T. Qin, X. Geng, T.Y. Liu (2010) "A new probabilistic model for rank aggregation," NIPS.
T. Joachims (2002) "Optimizing Search Engines using Clickthrough Data", KDD.
T.Y. Liu (2009) "Learning to rank for information retrieval", Foundations and Trends in Information Retrieval.

"Lecture 10": No class, Johan @ MIT (10/26)

Week 6

Lecture 11: Models of social processes: influence and contagion (10/31)

M. Granovetter (1978) "Threshold Models of Collective Behavior", AJS.
D. McAdam (1986) "Recruitment to high-risk activism: The case of freedom summer", AJS.
J. Kleinberg (2007) "Cascading Behavior in Networks: Algorithmic and Economic Issues" in Algorithmic Game Theory (book chapter) [link]
S. Aral, D. Walker (2012) "Identifying influential and susceptible members of social networks", Science.
B. State, L. Adamic (2015) "The Diffusion of Support in an Online Social Movement: Evidence from the Adoption of Equal-Sign Profile Pictures", CSCW.

Lecture 12: Influence maximization; complex contagion; Homophily and Influence (11/2)

D. Kempe, J. Kleinberg, E. Tardos (2003) "Maximizing the spread of influence through a social network", Proceedings of KDD. [2003 KDD version, 2015 journal version]
D. Centola, M. Macy (2007) "Complex contagions and the weakness of long ties", AJS.
D. Centola (2010) "The spread of behavior in an online social network experiment", Science.
D. Centola (2011) "An experimental study of homophily in the adoption of health behavior", Science.
J. Ugander, L. Backstrom, C. Marlow, J. Kleinberg. (2012) "Structural Diversity in Social Contagion", PNAS.
S. Aral, L. Muchnik, A. Sundararajan (2009) "Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks", PNAS.

Week 7

Lecture 13: Causal Inference of Peer Effects (11/7)

C. Manski (1993) "Identification of endogenous social effects: The reflection problem", The Review of Economic Studies.
C. Shalizi, A. Thomas (2011) "Homophily and contagion are generically confounded in observational social network studies", Sociological methods & research.
E. Bakshy, D. Eckles, R. Yan, I. Rosenn. (2012) "Social Influence in Social Advertising: Evidence from Field Experiments," Proceedings of EC.
E. Bakshy, I. Rosenn, C. Marlow, L. Adamic (2012) "The Role of Social Networks in Information Diffusion," Proceedings of WWW.
J. Zhang, D. Brackbill, S. Yang, D. Centola (2015) "Efficacy and causal mechanism of an online social media intervention to increase physical activity: Results of a randomized controlled trial", Preventitive Medicine Reports. [paper, data]

Lecture 14: Causal Inference under Interference (11/9)

P. Aronow, C Samii (2011) "Estimating average causal effects under interference between units", arXiv.
C. Manski (2013) "Identification of treatment response with social interactions", The Econometrics Journal.
J. Ugander, B. Karrer, L. Backstrom, J. Kleinberg. (2013) "Graph Cluster Randomization: Network Exposure to Multiple Universes", Proceedings of KDD.
D. Eckles, B. Karrer, J. Ugander (2014) "Design and analysis of experiments in networks: Reducing bias from interference", arXiv.
H. Gui, Y. Xu, A. Bhasin, J. Han (2015) "Network A/B Testing: From Sampling to Estimation", Proceedings of WWW.
Y. Xu, N. Chen, A. Fernandez, O. Sinno, A. Bhasin (2015) "From Infrastructure to Culture: A/B Testing Challenges in Large Scale Social Networks", Proceedings of KDD.
D. Walker, L. Muchnik (2015) "Design of Randomized Experiments in Networks", Proceedings of IEEE. [link]
M. Saveski, J. Pouget-Abadie, G. Saint-Jacques, W. Duan, S. Ghosh, Y. Xu, E. Airoldi (2017) "Detecting Network Effects: Randomizing Over Randomized Experiments", Proceedings of KDD.

Weeks 8

Lecture 15: The friendship paradox (11/14)

Friendship paradox literature:

S. L. Feld (1991). “Why your friends have more friends than you do.” American Journal of Sociology, p. 1464-1477.
J. Ugander, B. Karrer, L. Backstrom, C. Marlow (2011) “The Anatomy of the Facebook Social Graph,” arXiv.
F. Kooti, N. O. Hodas, K. Lerman (2014) “Network Weirdness: Exploring the Origins of Network Paradoxes”, Proceedings of ICWSM.
S. Lattanzi, Y. Singer (2015). “The Power of Random Neighbors in Social Networks” Proceedings of WSDM.

Applications of the friendship paradox:

R. Pastor-Satorras, A. Vespignani (2002) “Immunization of complex networks.” Phys Rev E; 65: 036104.
N. A. Christakis, J. H. Fowler (2010) “Social network sensors for early detection of contagious outbreaks", PLOS One.
D. A. Kim et al. (2015) “Social network targeting to maximise population behaviour change: a cluster randomised controlled trial”, The Lancet.

Lecture 16: The small-world phenomena (11/16)

S. Milgram (1967) “The small world problem,” Psychology Today.
J. Travers, S. Milgram (1969) “An Experimental Study of the Small World Problem,” Sociometry.
D. Watts, S. Strogatz (1998) "Collective dynamics of 'small-world' networks", Nature.
J. Kleinberg (2000) "The small-world phenomenon: An algorithmic perspective", STOC.
D. Liben-Nowell, J. Novak, R. Kumar, P. Raghavan, A. Tomkins (2005) "Geographic routing in social networks", PNAS.
L Backstrom, E Sun, C Marlow (2010) "Find me if you can: improving geographical prediction with social and spatial proximity", ICWSM.
Z Yang, W Chen (2015) "A Game Theoretic Model for the Formation of Navigable Small-World Networks", WWW.

Distance distributions:

J. Leskovec, E. Horvitz (2008) "Planetary-Scale Views on an Instant-Messaging Network", Proceedings of WWW.
L. Backstrom, P. Boldi, M. Rosa, J. Ugander, S. Vigna (2012) "Four Degrees of Separation", WebSci.
A. Jacobs, S.F. Way, J. Ugander, A. Clauset (2015) "Assembling thefacebook: Using Heterogeneity to Understand Online Social Network Assembly", WebSci.

Break - Week of Thanksgiving

Week 9: Dissecting Papers (11/28, 11/30)

During Week 9 the course will take on an active discursive style, aiming to synthesize what we've discussed as we dissect the methodologies of recent, complex applied papers. We will take a survey during Week 8 to determine the papers we want to discuss. In recent years the following papers have been discussed. We will only do two papers.

E. Bakshy, S. Messing, L. A. Adamic (2015) "Exposure to ideologically diverse news and opinion on Facebook", Science [link]
R. M. Bond, C. J. Fariss, J. J. Jones, A. D. I. Kramer, C. Marlow, J. E. Settle, J. H. Fowler (2012), "A 61-million-person experiment in social influence and political mobilization", Nature. [paper, supplemental information]
L. Beaman, A. BenYishay, J. Magruder, A. M. Mobarak (2015) "Can Network Theory-based Targeting Increaes Technology Adaption?", Working paper. [link]
D. A. Kim, A. R. Hwong, D. Stafford, D. A. Hughes, A J. O'Malley, J. H. Fowler, N. A. Christakis (2015). "Social network targeting to maximise population behaviour change: a cluster randomised controlled trial", Lancet. [link]

Week 10

In-class presentations of student projects.

Tools and Data

Here are some libraries that might be useful for the problem sets and projects:

NetworkX (Python), graph-tool (Python), SNAP (C++, Python), igraph (R, python, C)

Some data sources: