Algorithms for Causal Inference on Networks
PI: Johan Ugander
Assistant Professor,
Management Science & Engineering
School of Engineering
Stanford University
NSF Award #1657104
Activity period: 04/01/2017 - 3/31/2020
Award abstract
Randomized experiments (sometimes called "A/B tests") play increasingly fundamental roles in the design and refinement of modern online systems that drive the internet economy. Common design decisions include: What headline should be used with a news article; What prices should an item be offered at; or How should search results be ranked? However, modern web platforms exist atop strong networks of information flow and social interactions that mar the statistical validity of traditional experimental designs and analyses. This project aims to design graph clustering algorithms that can be used to administer experimental treatments in network-aware randomization designs and yield practically useful results. The project will train new graduate and undergraduate students in cutting-edge data science as they develop and deploy new research algorithms and software for causal inference at the intersection of modern computational and statistical research. The project will ultimately deliver new technical tools that increase the certainty of experiments that feature network interference, helping designers of online systems, both in industry and in government, make better decisions and craft better policies for an increasingly networked world.
This project aims to develop algorithmic tools for causal inference under network interference that go beyond mere proof-of-concepts, placing a specific focus on developments that will make network experimentation tools practical and provide significant and relevant results. The project will develop tools that harness graph metadata for variance reduction, and also develop techniques for running experiments in marketplaces, bipartite graphs rife with interference. In considering graph metadata, the goal is to develop stratified graph sampling methods that balance graph structure directly against latent traits. The deliverable assets of this project, which include algorithmic research as well as software for designing experiments, are intended to benefit computer scientists and statisticians, as well the broader community of researchers in the social and economic sciences that study networks.
Related literature and prior work
As a resource to those interested in the subject of this award, here is an overview of prior work and literature that this grant builds upon.
Interference:
- C. Manski (1993) "Identification of endogenous social effects: The reflection problem", The Review of Economic Studies.
- P. Aronow, C Samii (2011) "Estimating average causal effects under interference between units", arXiv.
- L. Backstrom, J. Kleinberg (2011) "Network bucket testing", WWW.
- L. Katzir, E. Liberty, O. Somekh (2012) "Framework and Algorithms for Network Bucket Testing", WWW.
- C. Manski (2013) "Identification of treatment response with social interactions", The Econometrics Journal.
- J. Ugander, B. Karrer, L. Backstrom, J. Kleinberg. (2013) "Graph Cluster Randomization: Network Exposure to Multiple Universes", Proceedings of KDD.
- D. Eckles, B. Karrer, J. Ugander (2014) "Design and analysis of experiments in networks: Reducing bias from interference", arXiv.
- H. Gui, Y. Xu, A. Bhasin, J. Han (2015) "Network A/B Testing: From Sampling to Estimation", Proceedings of WWW.
- Y. Xu, N. Chen, A. Fernandez, O. Sinno, A. Bhasin (2015) "From Infrastructure to Culture: A/B Testing Challenges in Large Scale Social Networks", Proceedings of KDD.
- J. Zhang, D. Brackbill, S. Yang, D. Centola (2015) "Efficacy and causal mechanism of an online social media intervention to increase physical activity: Results of a randomized controlled trial", Preventitive Medicine Reports. [paper, data]
- D. Walker, L. Muchnik (2015) "Design of Randomized Experiments in Networks", Proceedings of IEEE. [link]
- M. Saveski, J. Pouget-Abadie, G. Saint-Jacques, W. Duan, S. Ghosh, Y. Xu, E. Airoldi (2017) "Detecting Network Effects: Randomizing Over Randomized Experiments", Proceedings of KDD. [link]
Homophily:
- C. Shalizi, A. Thomas (2011) "Homophily and contagion are generically confounded in observational social network studies", Sociological methods & research.
- E. Bakshy, D. Eckles, R. Yan, I. Rosenn. (2012) "Social Influence in Social Advertising: Evidence from Field Experiments," Proceedings of EC.
- E. Bakshy, I. Rosenn, C. Marlow, L. Adamic (2012) "The Role of Social Networks in Information Diffusion," Proceedings of WWW.
Large-scale graph partitioning
- J Ugander, L Backstrom
(2013)
Balanced Label Propagation for Partitioning Massive Graphs
Proc. 6th ACM Int'l Conf. on Web Search and Data Mining (WSDM).
- J Nishimura, J Ugander
(2013)
Restreaming Graph Partitioning: Simple Versatile Algorithms for Advanced Balancing
Proc. 19th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining (KDD).
Seeding experiments:
- D. A. Kim et al. (2015) “Social network targeting to maximise population behaviour change: a cluster randomised controlled trial”, The Lancet.
- J. Cai, A De Janvry, E Sadoulet (2015) "Social networks and the decision to insure", AEJ: Applied Economics.
- L. Beaman, A. BenYishay, J. Magruder, A.M. Mobarak (2018) "Can network theorybased targeting increase technology adoption? Working paper, National Bureau of Economic Research.
- L. Beaman, A. Dillon (2018) "Diffusion of agricultural information within social networks: Evidence on gender inequalities from mali." Journal of Development Economics, 133:147–161.
- E. L. Paluck, H. Shepherd, P. M. Aronow (2016) "Changing climates of conflict: A social network experiment in 56 schools," Proceedings of the National Academy of Sciences, 113(3):566–571.
Publications produced
Interference:
Modeling the joint structure of attributes on networks:
Graph partitioning:
Seeding experiments:
Market experiments:
Software and code produced
All published papers supported by this grant include extensive notebooks documentating all data analysis, where applicable, that are shared as repositories on Github per the data management plan of this award.