April 1, 2016

The Rise of "Big Data" in American Elections

  • Contacting everyone is expensive

  • Not all contact is efficacious (Enos & Hersh, 2015)

  • Unclear who should be contacted and where to reach them

  • Gains from trade between marketing and campaigns (Hersh, 2015)

Sources of Data

  • State Voter Registration Files

  • Marketing Lists (Experian, Infogroup)

  • Contact Dispositions (surveys, canvassing, phone bank, etc.)

  • All rolled together by partisan data vendors (Catalist, TargetSmart, L2, Data Trust, etc.)

Research Questions

  • Who isn't on the lists? Who has bad contact information?

  • Do these unlisted and mislisted people have distinctive characteristics?

  • What are the consequences of their absence from politics?

  • What biases are introduced when drawing only from administrative data?

Survey Data: ANES Face-to-Face

  • Conducted in person by a professional interviewer

  • Sampling frame is active addresses from USPS Delivery Sequence File

  • Respondent selected at random from sampled address

  • Large monetary incentives (~$100) encourage high participation

  • Non-registrants underrepresented in the sample (Jackman & Spahn, 2014)

Matching to Voter Files

  • 2,054 completed pre-election interviews

  • 2,006 respondents gave a full name

  • Also have an address and a birthdate (most of the time)

  • Matched to 3 vendors, 2 of which use commercial databases to supplement lists of registered voters

  • 1,693 of 2,006 matched

  • Weighted match rate: 89%
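
The gap between the raw and weighted match rates can be illustrated with a short sketch. The raw rate follows directly from the counts above (1,693 of 2,006, about 84%); the weighted rate depends on the ANES survey weights, which are not shown in these slides. The function name `weighted_match_rate` and the toy weights below are hypothetical, chosen only for illustration.

```python
def weighted_match_rate(matched, weights):
    """Share of total survey weight carried by matched respondents.

    matched: sequence of booleans, True if the respondent was found by a vendor.
    weights: sequence of non-negative survey weights, one per respondent.
    """
    if len(matched) != len(weights):
        raise ValueError("matched and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    matched_weight = sum(w for m, w in zip(matched, weights) if m)
    return matched_weight / total


# Raw (unweighted) match rate from the counts reported on the slide.
raw_rate = 1693 / 2006

# Toy illustration only (hypothetical weights, not ANES data): when matched
# respondents carry larger weights, the weighted rate exceeds the raw rate.
toy_rate = weighted_match_rate([True, False, True], [1.0, 1.0, 2.0])
```

With equal weights the function reduces to the raw proportion, so a weighted rate above the raw rate indicates that matched respondents carry more weight on average than unmatched ones.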

Four Citizen Types

  • Registered (70%): found registered at sampled address
  • Unregistered (7%): found on commercial list at sampled address
  • Mislisted (12%): found registered or on a commercial list at another address
  • Unlisted (11%): not found
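
The four definitions above amount to a simple decision rule, sketched below. The three boolean inputs and the order in which they are checked are assumptions made for illustration; the slides do not say how a respondent who appears on more than one kind of list was resolved.

```python
from enum import Enum


class CitizenType(Enum):
    REGISTERED = "registered"      # registered at the sampled address
    UNREGISTERED = "unregistered"  # on a commercial list at the sampled address
    MISLISTED = "mislisted"        # registered or commercially listed elsewhere
    UNLISTED = "unlisted"          # not found by any vendor


def classify(registered_at_sampled, commercial_at_sampled, found_elsewhere):
    """Assign one of the four citizen types from three match outcomes.

    Assumed precedence (not stated in the slides): a registration record at
    the sampled address wins, then a commercial record at the sampled address,
    then any record at a different address.
    """
    if registered_at_sampled:
        return CitizenType.REGISTERED
    if commercial_at_sampled:
        return CitizenType.UNREGISTERED
    if found_elsewhere:
        return CitizenType.MISLISTED
    return CitizenType.UNLISTED
```

For example, `classify(False, False, True)` returns `CitizenType.MISLISTED`, and a respondent with no match at all falls through to `CitizenType.UNLISTED`.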


  • Identified 310 interesting ANES questions
  • Performed weighted one-way ANOVA on each question
  • Conducted F-tests of the null hypothesis that citizen type explains no more variation in survey responses than expected by chance
  • Generated 310 \(p\)-values
  • Rejected null hypotheses while controlling the false discovery rate (FDR) at the \(.05\) level
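
The slides do not name the FDR procedure used. As a minimal sketch, the Benjamini-Hochberg step-up procedure is one common way to control the FDR across many tests: sort the \(p\)-values, find the largest rank \(k\) with \(p_{(k)} \le (k/m)\,q\), and reject the \(k\) smallest. The function name `benjamini_hochberg` and the example \(p\)-values are hypothetical and are not results from this study.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected.

    Implements the Benjamini-Hochberg step-up rule at FDR level q.
    The output is in the same order as the input p-values.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank k (1-based) whose p-value is at or below (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank

    # Reject every hypothesis ranked at or below the cutoff, including any
    # that individually missed their own threshold (the "step-up" property).
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for illustration only.
example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
decisions = benjamini_hochberg(example, q=0.05)
```

In this setting the input would be the 310 \(p\)-values from the per-question F-tests, and the output would flag the questions on which the four citizen types differ.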

Citizen Type by Race

Race by Citizen Type

Age, Wealth and Income

Residential Tenure and Income

Interest in Politics

ID Possession

Campaign Contact