Frequently Asked Questions

Answers to some of the questions users frequently have about the CCES are below (Authors: Brian Schaffner, Shiro Kuriwaki). Please address any other questions to the PIs.


How are respondents recruited into the CCES?

A large portion of the CCES respondents are YouGov panelists. These are people who have made an account with YouGov to receive periodic notifications about new surveys. Others are recruited live from online advertisements or from another survey provider. Therefore, while panelists are prompted to participate in the CCES, they opt in to being a YouGov panelist.

In order to make the sample representative, not all respondents to the CCES questionnaire end up in the final dataset. To read more about the pruning process used to match to the target population, please refer to the guide. 

On what device do respondents take the survey?

The CCES has always been an online survey. The mix of devices has changed each year. In 2018, 35 percent of respondents used a desktop, 56 percent used a smartphone, and 9 percent used a tablet. In contrast, 90 percent used a desktop in 2012.

Are respondents compensated for their responses?

YouGov respondents are compensated with points for taking each survey. Respondents can exchange accumulated points for gift cards and other prizes.

How long is one sitting of a survey?

The pre-election wave of the CCES is designed to take approximately 20 minutes, though this timing varies depending on the respondent. The post-election wave is designed to take 10 minutes. The odd-year CCES surveys are fielded in a single 20-minute wave.

Can one CCES be used as a time series of the election cycle?

We would be cautious of this. The timing of responses (the start and end times of the interviews) is not random. For example, respondents who are easier to reach or are more likely to opt in tend to respond to the CCES first. Survey vendors may also encourage their panels to respond at different times and with different intensities.


Is the CCES representative of all national adults, or only those who vote?

The CCES is designed to be representative of all national adults.

Are weighted samples of states designed to be representative of each state?

Yes. State is one of the variables used to construct the target population.

Are weighted samples of congressional districts (CD) designed to be representative of each CD?


Variables and Questions

How should I define the Latino / Hispanic voters?

There are two operationalizations:

  1. Set those with race == "Hispanic" to 1; set everyone else to 0.

  2. Set those with either race == "Hispanic" or a Hispanic/Latino response to the follow-up ethnicity question to 1; set everyone else to 0 (non-Hispanic).

The follow-up question, "Are you of Spanish, Latino, or Hispanic origin or descent?", is asked only of respondents who do not select "Hispanic" as their race. It is called "hispanic" in most CCES datasets.

In short, the second option will count Hispanic Blacks and Hispanic Whites as Latino/Hispanic. When the CCES weights to population distributions of race, we use the second definition.
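The two operationalizations can be sketched in Python as follows. This is a minimal illustration, not official CCES code. It assumes each respondent is a dictionary with a "race" field and a "hispanic" follow-up field, and that the labels are stored as the strings "Hispanic" and "Yes". Those labels are assumptions; actual datasets may use numeric codes, so check the codebook for the year you are using.

```python
def hispanic_by_race(resp):
    """Definition 1: 1 only if the respondent chose "Hispanic" as their race."""
    return 1 if resp.get("race") == "Hispanic" else 0


def hispanic_by_race_or_ethnicity(resp):
    """Definition 2: 1 if race is "Hispanic" or the follow-up answer is "Yes"."""
    if resp.get("race") == "Hispanic":
        return 1
    # The follow-up is only asked when race is not "Hispanic" (assumed label "Yes").
    return 1 if resp.get("hispanic") == "Yes" else 0


# Hypothetical respondents for illustration only.
respondents = [
    {"race": "Hispanic", "hispanic": None},  # follow-up not asked
    {"race": "Black", "hispanic": "Yes"},
    {"race": "White", "hispanic": "No"},
]

narrow = [hispanic_by_race(r) for r in respondents]
broad = [hispanic_by_race_or_ethnicity(r) for r in respondents]
```

Here the second respondent is counted as Hispanic only under the second definition.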


How does CCES determine the respondent's congressional district?

We use the respondent's full street address if we have it on file and they confirm that they live there. If we do not have their full address, we use their zip code.

Does CCES identify the respondent's state legislative district?

No. The mapping between zip codes and state legislative districts is less reliable than the mapping to congressional districts.

In the dataset, what do all the variations for zipcode (inputzip, regzip, lookupzip) distinguish?

“inputzip” is a zipcode each respondent hand enters for the question of the form, “So that we can ask you about the news and events in your area, in what zip code do you currently reside?”  

Some datasets have a separate field for an individual’s zip code where registered (“regzip”) and the zip code of residence.  The “regzip” is the response to the question “Is [inputzip] the zip code where you are registered to vote?” which includes the option “No (I am registered to vote at this zip code: [User enters zipcode]) ”

“lookupzip” is the zipcode used to look up a voter’s state and Congressional district. It will be the zip code of a person’s residence unless the zip code in which they are registered to vote is different, in which case this will be their registration zip code. In other words, if both “regzip” and “inputzip” are provided and are different, the former is used for “lookupzip”. This variable will not be missing and is used as the main zipcode for, e.g., the cumulative dataset.
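The rule described above for “lookupzip” can be written as a short function. This is a sketch of the stated logic only, not the code used to build the datasets. The function name is hypothetical, and zip codes are assumed to be stored as strings so that leading zeros are preserved.

```python
def lookup_zip(inputzip, regzip=None):
    """Return the zip code used for the state and district lookup.

    The registration zip is used when it is provided and differs from
    the residence zip; otherwise the residence zip is used.
    """
    if regzip and regzip != inputzip:
        return regzip
    return inputzip


# Hypothetical examples.
same_place = lookup_zip("02138")               # no separate registration zip
registered_elsewhere = lookup_zip("02138", "10027")
```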


How does the CCES construct weights?

CCES weighting generally consists of two steps: matching and post-stratification weighting. The approach and algorithm change somewhat across studies. Each even-year guide on Dataverse provides details on the weighting approach taken for that particular survey.

What are the weights that end in “_vv”, and which should I use?

2016 and prior: For election year common contents that include both a weight and a weight variable with the "_vv"  suffix (2012, 2016), the best weights to use for all analyses are the weights with a “_vv” suffix. Those are the weights that were calculated after the vote validation and therefore they are the most accurate. The weight variables that do not include the “_vv” suffix are included only so that scholars can replicate analyses that they may have produced prior to the vote validation. Where those two versions do not exist (2006 - 2010, 2014, 2018-), the weight variables are the weights that have been computed after vote validation. We have seen this is often a source of confusion, and we will employ a clearer naming moving forward.

2018 and onwards: vvweight assigns weights only to active registered voters. "commonweight" in 2018 and onwards are weights for the adult population. See the 2018 guide for details.

The key point is that “_vv” weights are not weights specific to the turnout electorate, especially 2016 and prior. They still apply to voters and non-voters alike, but they are simply called that because they are weights after vote validation has been performed. 

In 2012, 2016, and 2020, there are also weights that include the “_post” suffix or term. We recommend the use of the post-election weights variable any time researchers use variables from the post-election wave of the study in their analyses. The post-election weights help to adjust for attrition between the pre- and post-election waves of the study. 

In the cumulative file, what is "weight_cumulative"?

What is the difference between "weight" and "weight_cumulative" in the cumulative Common Content?

“weight_cumulative” is a simple transformation of the most up-to-date "weight" in each common content that only adjusts for the different sample size in each year. Use "weight_cumulative" for multi-year analyses in which you want each year to be weighted the same (despite their different sample sizes).
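To illustrate what "weighting each year the same" means, the sketch below rescales weights so that every year's weights sum to the same total. This is one possible rescaling chosen for illustration and is not the documented formula behind "weight_cumulative"; the function name and the example rows are hypothetical. Consult the cumulative file's guide for the actual construction.

```python
from collections import defaultdict


def equalize_years(rows):
    """Rescale weights so that each year's weights sum to the same total.

    The common total is the average of the yearly weight sums, so the
    overall sum of weights is unchanged.
    """
    totals = defaultdict(float)
    for row in rows:
        totals[row["year"]] += row["weight"]
    target = sum(totals.values()) / len(totals)
    return [row["weight"] * target / totals[row["year"]] for row in rows]


# Hypothetical rows: 2016 has a larger sample than 2018.
rows = [
    {"year": 2016, "weight": 1.0},
    {"year": 2016, "weight": 1.0},
    {"year": 2016, "weight": 2.0},
    {"year": 2018, "weight": 0.5},
    {"year": 2018, "weight": 1.5},
]
rescaled = equalize_years(rows)
```

After rescaling, the larger year no longer dominates a pooled estimate simply because it has more respondents.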

Validated Vote

How do I distinguish validated voters and non-voters using validated vote?

We have created some guides for earlier datasets to help explain the use of these variables in a bit more detail. In more recent years, this information has been integrated into the guides themselves. We recommend you consult these guides as the validation process and variables change slightly from year to year. However, some brief information follows.

Typically, the main variable of interest will be the variable indicating the vote method used by a respondent. This typically takes on values such as “absentee,” “mail,” “early,” “polling,” and “unknown.” If a respondent has any of these values, they have a validated vote record (“unknown” means that the state did not record what method the individual used to vote, but the individual did vote).

It should be noted that a record may not be matched to the voter file either because the individual is not registered to vote or because of incomplete or inaccurate information that prevented a match. Matches are made only with records for which there is a high level of confidence that the respondent is being assigned to the correct record. However, even by setting a high threshold of confidence, there will still be some false-positives which should be considered when using the validation records. 

For identifying non-voters, the researcher may take several different approaches. These different options are laid out in the guides, but the most common approach is to simply treat all individuals who are not validated voters as non-voters (regardless of whether they were matched to the voter files or not). The justification for this approach is the fact that the most common reason that the voter file firm will not have a record for an individual is because that individual is not registered to vote. Indeed, rates of self-reported non-registration and non-voting are much higher among un-matched respondents than among those for whom there is a match.
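The most common approach described above can be sketched as follows. This is an illustration under stated assumptions, not official CCES code: it assumes the vote-method variable is available as a string (or as None when there is no validated vote record), and it uses the example values listed earlier. The exact variable name and value labels differ by year, so check the relevant guide.

```python
# Example vote-method values; any of these indicates a validated vote record.
VOTED_METHODS = {"absentee", "mail", "early", "polling", "unknown"}


def validated_voter(vote_method):
    """Return 1 for a validated voter, 0 otherwise.

    Respondents with no validated vote record, whether matched to the
    voter file or not, are treated as non-voters.
    """
    if vote_method is None:
        return 0
    return 1 if vote_method.strip().lower() in VOTED_METHODS else 0


# Hypothetical values for illustration.
flags = [validated_voter(v) for v in ["Early", "unknown", None, ""]]
```

Note that "unknown" counts as a vote, since it means the state did not record the method, while a missing value does not.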


Is the CCES a panel?

The main CCES studies are based on different cross-sectional samples in each year. Thus, these do not constitute a panel survey where the same respondents are being re-interviewed year after year. However, the CCES did conduct a panel survey in 2010, 2012, and 2014 and you can find the data for that study here. 

Are the respondents in the 2010-2014 panel the same as those in Common Content each year?

This panel survey was born out of the sample of respondents who took the 2010 common content, but those respondents were reserved for the panel survey in subsequent years. 19,000 of those who are in the 2010 common content dataset were re-interviewed in 2012 and 9,500 of that group were re-interviewed again in 2014. (See the guides for those datasets for more information on how the panel was constructed.) Thus, respondents in the panel datasets will overlap with respondents in the 2010 common content dataset, but they will not overlap with the 2012 and 2014 common content datasets.


Who writes the CCES questions?

The principal investigators write the common content questions in the summer before the election, with input from others. Questions in team modules are written by the owners of the module. YouGov also maintains standard panel questions for all members of their panel. These are often indicated by descriptive variable names (such as “race”, “educ”, and “birthyr”).



Peer Reviewed Documentation


More general discussion of methodology can be found in the following peer-reviewed academic articles.

On online surveys as opposed to phone and mail:

Stephen Ansolabehere and Brian Schaffner. 2014. "Does Survey Mode Still Matter? Findings from a 2010 Multi-Mode Comparison."  Political Analysis. 22(3): 285–303.

On the cooperative structure of the CCES:

Stephen Ansolabehere and Douglas Rivers. 2013. "Cooperative Survey Research." Annual Review of Political Science. 16(1): 307-329.

On the voter validation:

Stephen Ansolabehere and Eitan Hersh. 2012. "Validation: What Big Data Reveal About Survey Misreporting and the Real Electorate." Political Analysis. 20(4): 437-459.