Meng X-L. Statistical Paradises and Paradoxes in Big Data (I): Law of Large Populations, Big Data Paradox, and the 2016 US Presidential Election. Annals of Applied Statistics [Internet]. 2018;12(2):685-726.
Statisticians are increasingly posed with thought-provoking and even paradoxical questions, challenging our qualifications for entering the statistical paradises created by Big Data. By developing measures for data quality, this article suggests a framework to address such a question: “Which one should I trust more: a 1% survey with 60% response rate or a self-reported administrative dataset covering 80% of the population?” A 5-element Euler-formula-like identity shows that for any dataset of size $n$, probabilistic or not, the difference between the sample average $\bar{X}_n$ and the population average $\bar{X}_N$ is the product of three terms: (1) a data quality measure, $\rho_{R,X}$, the correlation between $X_j$ and the response/recording indicator $R_j$; (2) a data quantity measure, $\sqrt{(N-n)/n}$, where $N$ is the population size; and (3) a problem difficulty measure, $\sigma_X$, the standard deviation of $X$. This decomposition provides multiple insights: (I) Probabilistic sampling ensures high data quality by controlling $\rho_{R,X}$ at the level of $N^{-1/2}$; (II) When we lose this control, the impact of $N$ is no longer canceled by $\rho_{R,X}$, leading to a Law of Large Populations (LLP), that is, our estimation error, relative to the benchmarking rate $1/\sqrt{n}$, increases with $\sqrt{N}$; and (III) the “bigness” of such Big Data (for population inferences) should be measured by the relative size $f = n/N$, not the absolute size $n$; (IV) When combining data sources for population inferences, those relatively tiny but higher quality ones should be given far more weights than suggested by their sizes. Estimates obtained from the Cooperative Congressional Election Study (CCES) of the 2016 US presidential election suggest a $\rho_{R,X} \approx -0.005$ for self-reporting to vote for Donald Trump. Because of LLP, this seemingly minuscule data defect correlation implies that the simple sample proportion of the self-reported voting preference for Trump from 1% of the US eligible voters, that is, $n \approx 2{,}300{,}000$, has the same mean squared error as the corresponding sample proportion from a genuine simple random sample of size $n \approx 400$, a 99.98% reduction of sample size (and hence our confidence). The CCES data demonstrate LLP vividly: on average, the larger the state’s voter populations, the further away the actual Trump vote shares from the usual 95% confidence intervals based on the sample proportions. This should remind us that, without taking data quality into account, population inferences with Big Data are subject to a Big Data Paradox: the more the data, the surer we fool ourselves.
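The three-term identity and the "size 400" comparison in this abstract can be checked with a short numerical sketch. The code below is an illustration and not taken from the article: the function names, the simulated population, and the logistic self-selection rule are all hypothetical, and the effective-sample-size formula $f / ((1-f)\rho_{R,X}^2)$ is an approximation that ignores the finite-population correction.

```python
import math
import random

def data_defect_terms(x, r):
    """Return (actual_error, rho, quantity, sigma_x) for population values x
    and 0/1 recording indicators r. All moments use the population divisor N."""
    N = len(x)
    n = sum(r)
    mean_x = sum(x) / N
    mean_r = n / N
    sample_mean = sum(xj for xj, rj in zip(x, r) if rj) / n
    cov = sum((xj - mean_x) * (rj - mean_r) for xj, rj in zip(x, r)) / N
    sigma_x = math.sqrt(sum((xj - mean_x) ** 2 for xj in x) / N)
    sigma_r = math.sqrt(mean_r * (1 - mean_r))
    rho = cov / (sigma_x * sigma_r)       # data quality
    quantity = math.sqrt((N - n) / n)     # data quantity
    return sample_mean - mean_x, rho, quantity, sigma_x

def effective_sample_size(f, rho):
    """Approximate simple-random-sample size with the same mean squared error
    (assumption: finite-population correction ignored)."""
    return f / ((1 - f) * rho ** 2)

random.seed(0)
x = [random.gauss(0, 1) for _ in range(10_000)]
# Hypothetical self-selection: larger values are more likely to be recorded.
r = [1 if random.random() < 1 / (1 + math.exp(-xj)) else 0 for xj in x]

error, rho, quantity, sigma_x = data_defect_terms(x, r)
# The identity is algebraic, so it holds exactly up to floating-point error.
assert math.isclose(error, rho * quantity * sigma_x, rel_tol=1e-9)

# Abstract's example: f = 1% of eligible voters, rho of about -0.005.
print(round(effective_sample_size(0.01, -0.005)))  # prints 404
```

With $f = 0.01$ and $\rho_{R,X} = -0.005$ the approximation gives about 404, in line with the abstract's figure of roughly 400.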
Brewer PR, Wilson DC. Wedding Imagery and Public Support for Gay Marriage. Journal of Homosexuality [Internet]. 2016;63(8):1041-1051.
This study uses an experiment embedded in a large, nationally representative survey to test whether exposure to imagery of a gay or lesbian couple’s wedding influences support for gay marriage. It also tests whether any such effects depend on the nature of the image (gay or lesbian couple, kissing or not) and viewer characteristics (sex, age, race, education, religion, and ideology). Results show that exposure to imagery of a gay couple kissing reduced support for gay marriage relative to the baseline. Other image treatments (gay couple not kissing, lesbian couple kissing, lesbian couple not kissing) did not significantly influence opinion.
Baker A. Race, Paternalism, and Foreign Aid: Evidence from U.S. Public Opinion. American Political Science Review [Internet]. 2015;109(1).

Virtually all previous studies of domestic economic redistribution find white Americans to be less enthusiastic about welfare for black recipients than for white recipients. When it comes to foreign aid and international redistribution across racial lines, I argue that prejudice manifests not in an uncharitable, resentful way but in a paternalistic way because intergroup contact is minimal and because of how the media portray black foreigners. Using two survey experiments, I show that white Americans are more favorable toward aid when cued to think of foreign poor of African descent than when cued to think of those of East European descent. This relationship is due not to the greater perceived need of black foreigners but to an underlying racial paternalism that sees them as lacking in human agency. The findings confirm accusations of aid skeptics and hold implications for understanding the roots of paternalistic practices in the foreign aid regime.

Seabrook NR, Dyck JJ, Lascher EL Jr. Do Ballot Initiatives Increase General Political Knowledge? Political Behavior [Internet]. 2014.

Current literature often suggests that more information and choices will enhance citizens’ general political knowledge. Notably, some studies indicate that a greater number of state ballot initiatives raise Americans’ knowledge through increases in motivation and supply of political information. By contrast, we contend that political psychology theory and findings indicate that, at best, more ballot measures will have no effect on knowledge. At worst greater use of direct democracy should make it more costly to learn about institutions of representative government and lessen motivation by overwhelming voters with choices. To test this proposition, we develop a new research design and draw upon data more appropriate to assessing the question at hand. We also make use of a propensity score matching algorithm to assess the balance in the data between initiative state and non-initiative state voters. Controlling for a wide variety of variables, we find that there is no empirical relationship between ballot initiatives and political knowledge. These results add to a growing list of findings which cast serious doubt on the educative potential of direct democracy.

Evans HK, Ensley MJ, Carmines EG. The Enduring Effects of Competitive Elections. Journal of Elections, Public Opinion, and Parties [Internet]. 2014;24(4):455-472.

Research on U.S. congressional elections has placed great emphasis on the role of competitiveness, which is associated with high levels of campaign spending, media coverage, and interest group and party involvement. Competitive campaigns have been found to increase citizens' participation, engagement and learning. However, little is known about whether the effects of competitive campaigns have enduring consequences for citizens' attitudes and behavior. Analyzing a survey of citizens conducted one year after the 2006 congressional elections that includes an oversample of respondents from competitive House races, we examine whether exposure to a competitive House campaign affects voters' political knowledge and political interest as well as their consumption of political news. We find that competitive elections have positive effects that endure for at least a year beyond the campaign season, reinforcing the idea that political competition plays a robust role in American representative democracy.

Dyck JJ, Pearson-Merkowitz S. To Know You is Not Necessarily to Love You: The Partisan Mediators of Intergroup Contact. Political Behavior [Internet]. 2014;36(3):553-580.

We propose the contact–cue interaction approach to studying political contact—that cues from trusted political elites can moderate the effect of contact on the formation of public policy opinions. Allport’s initial formulation of the contact effect noted that it relies on authority support. In a highly polarized political era, authoritative voices for individuals vary based on party identification. Social experiences may affect public policy, but they must also be considered in light of partisan filters. Using data from the 2006 CCES, we examine the manner in which straight respondents with gay family members, friends, co-workers and acquaintances view same-sex marriage policy, finding a strong contact effect among Democrats, but no contact effect among the strongest Republican identifiers. Our data and analyses strongly support the perspective that social interactions (and their effect on policy) are understood through the lens of partisanship and elite cues.