How to Run Surveys:
A guide to creating your own identifying variation and revealing the invisible∗

Stefanie Stantcheva

October 11, 2022

Abstract

Surveys are an essential approach for eliciting otherwise invisible factors such as perceptions, knowledge and beliefs, attitudes, and reasoning. These factors are critical determinants of social, economic, and political outcomes. Surveys are not merely a research tool. They are also not only a way of collecting data. Instead, they involve creating the process that will generate the data. This allows the researcher to create their own identifying and controlled variation. Thanks to the rise of mobile technologies and platforms, surveys offer valuable opportunities to study broadly representative samples or to focus on specific groups. This paper offers guidance on the complete survey process, from the design of the questions and experiments to the recruitment of respondents and the collection of data to the analysis of survey responses. It covers issues related to the sampling process, selection and attrition, attention and carelessness, survey question design and measurement, response biases, and survey experiments.

1 Introduction

Surveys are an invaluable research method. Historically, they have been used to measure important variables, such as unemployment, income, or family composition. Today, there often is high-quality administrative or other big data that we can use for this purpose. However, some things remain invisible in non-survey data: perceptions, knowledge and beliefs, attitudes, and reasoning. These invisible elements are critical determinants of social, economic, and political outcomes.
As economists, we typically tend to prefer “revealed preference” approaches, which involve inferring unobserved components from observed behavior and constraints. These methods are useful and suitable for a wide range of questions. However, when it comes to measuring and identifying the above-mentioned invisible factors, there are many challenges. One could, in principle, specify a complete structural model of beliefs or other invisible factors and use observational or quasi-experimental data on some behaviors to estimate these underlying factors. However, this requires many assumptions and identifying variation that may be absent in the data. For instance, suppose you wanted to measure people’s beliefs about whether a carbon tax would reduce car emissions or whether trade policy would lead to adverse distributional consequences. You would likely have difficulty finding behaviors that allow you to identify these perceptions. There are plenty of other examples of perceptions, beliefs, attitudes, or reasoning that profoundly shape our views on policy and social issues but that we do not necessarily “reveal” with our microeconomic, observed behavior. Surveys are an essential approach for eliciting these intangibles more directly.
Surveys are not merely a research “tool” – they are part of a unique and distinct research process. Furthermore, surveys are not only a way of “collecting data.” Unlike when using observational data, you
∗Stantcheva: Harvard, CEPR, and NBER (e-mail: sstantcheva@fas.harvard.edu). I thank my students and co-authors who have worked with me to develop these methods and have taught me a lot. I thank Alberto Binetti, Martha Fiehn, and Francesco Nuzzi for excellent research assistance.
are the one creating the process that will generate the data. You can therefore create your own controlled and identifying variation. This process presents many opportunities as well as many challenges. Your survey design is an integral part of your research process.
Thanks to the rise of mobile technologies and platforms, online surveys offer valuable opportunities to study broadly representative samples or to focus on specific groups. They are flexible and customizable and can be made appealing and interactive for respondents. They allow researchers to conduct large-scale investigations quickly, sometimes in real time, and to explore new questions. They are indeed a way to engage with people and get a glimpse of their mental processes.
In order to use surveys for economic research in a fruitful way, there are, however, important issues to take into account. This paper provides a practical and complete guide to the whole survey process, from the design of the questions and experiments to the collection of data and recruitment of respondents to the analysis of the survey responses. The goal is to give researchers from many fields and areas practical guidance on leveraging surveys to collect new and valuable data and answer original questions that are challenging to answer with other methods. The examples used in the paper come from a wide range of fields, a testimony to the extensive use of survey methods. Although the paper focuses on written surveys, specifically online ones, many of the concepts and tips apply to surveys more broadly, regardless of the mode.
The paper is organized as follows. Section 2 describes the sampling process and recruiting of respondents. It also discusses how to deal with selection and differential attrition. Section 3 presents methods to foster the attention of respondents and minimize careless answers, as well as methods to screen for inattentive respondents or careless answer patterns. Section 4 dives into the design of survey questions. It covers topics such as general best practices, open-ended questions, closed-ended questions, visual design, measurement issues, monetary incentives and real-stakes questions, and the ordering of questions. Section 5 addresses common biases that threaten the validity of surveys, such as order biases, acquiescence bias, social desirability bias, and experimenter demand effects. Section 6 offers guidance on conducting different types of survey experiments. The Appendix contains useful supplementary materials, including reviews of many papers relevant to each section.

2 Sample

2.1 Types of sample
The first question is what kind of sample you need for your research question. A nationally representative sample can be valuable in many settings, while a more targeted sample, e.g., one obtained by oversampling minorities, specific age groups, restricting to employees or job seekers, etc., may be more appropriate in others.1 A useful notion is “sampling for range” (Small, 2009), i.e., the idea that your sample should be intentionally diverse in terms of conceptually important variables. Appendix A-1.1 reviews different sampling methods.
There are different types of survey channels you could use to build your sample:
i) Nationally representative panels. In the US, examples include the KnowledgePanel,2 NORC’s AmeriSpeak,3 and the Understanding America Study.4
ii) Commercial survey companies which use quota-sampled panels, such as Qualtrics, Dynata, Bilendi, and Prolific Academic.
iii) Commercial survey marketplaces (such as Lucid), which are very similar to commercial survey companies but require more “in-house” and hands-on management of the survey process by the researcher. I will discuss these at the same time as the survey companies.
iv) Convenience samples, which, as the name indicates, are sample populations that are “convenient” for the researcher to access. Examples are university students or conference participants.
v) Online work platforms like Amazon’s Mechanical Turk (MTurk), which are in between convenience samples and quota-sampled panels, given the large pool of respondents.
vi) Targeted groups from specific pools, such as experts, employees at a firm, economists, etc.
vii) Government or institutions’ surveys, e.g., surveys run by Statistics Denmark for matching tax data with survey data, or the Survey of Consumer Expectations.

1Surveys can also be done for firms, instead of individuals or households, as in Link et al. (2022) and Weber et al. (2022).
2 https://www.ipsos.com/en-us/solutions/public-affairs/knowledgepanel
3 https://amerispeak.norc.org/us/en/amerispeak/about-amerispeak/overview.html
4 https://uasdata.usc.edu/index.php
These survey channels differ in the control they give you over the recruiting process of your respondents. Therefore, the advice below tries to distinguish between the cases where you do a survey “in-house,” with complete control over your process, versus using a platform with a given process in place. Appendix A-1.2 provides information pooled from several survey companies’ documentation about their recruitment processes, rewards, and pools of respondents.
Sometimes you may be able to use mixed-method surveys to reach different types of respondents (e.g., online plus phone surveys or online plus door-to-door surveys). For instance, the Understanding America Panel recruits panel members through address-based sampling with paper invitation letters. Individuals who lack internet access are provided with a tablet and an internet connection, which increases the coverage rate of this panel. While mixed methods could introduce discrepancies between respondents who answer through different modes, they could be particularly valuable if you are interested in surveying populations that are less likely to be online, including segments of the population in developing countries. The content and visual design issues discussed in this paper also apply to mixed-method surveys. However, there are specificities to consider if the survey is over the phone or in person.
2.2 Survey errors and selection into online surveys
This section discusses how online survey respondents compare to the target populations, starting with a review of survey errors.
Survey errors and threats to representativeness. Figure 1 illustrates the different stages, from the target population to the final sample, and the errors that occur at each stage. The target population is your population of interest, e.g., all adults 18-64 in the US. The sampling frame or pool of potential respondents represents all the people in the population you can potentially sample and invite to the survey. One bias occurs from coverage error, which is the difference between the potential pool of respondents and the target population. For instance, in online surveys, you will not be able to sample people who are not online. The planned or target sample is all the people you would ideally like to complete your survey. The difference between the planned sample and the sampling frame is due to sampling error, i.e., the fact that you are drawing only a sample from the full sampling frame. Different types of sampling are reviewed in Appendix A-1.1. For instance, probability sampling will lead to random differences between your target sample and the sampling frame. The actual sample or “set of respondents” represents the people who end up taking your survey. Non-response error refers to the differences between the target sample and the actual sample. This error can be due to the respondent not receiving or seeing the invitation, ignoring the invitation, or following up on the invitation but refusing to participate. In general, it is difficult to distinguish between these cases (not just in the case of survey companies). Most of the time, we know little about non-responders, other than information embedded in the sampling frame. Sometimes we do have extra information (e.g., in consecutive waves of a longitudinal survey or from additional data, such as administrative records from which the sample is selected).
We can further distinguish between unit non-response bias (the difference between respondents who start the survey and those in the planned sample) and item non-response (when respondents start the survey but some answers are missing). A special case of item non-response is attrition, the phenomenon of respondents dropping out of the study before completing it. In this case, all items past a specific question are missing. Attrition induces a bias if it is differential, i.e., not random.5 There are ways to minimize non-response bias and attrition bias ex-ante and ways to correct for them ex-post. Conditional on respondents seeing the
5In longitudinal surveys in which respondents are interviewed multiple times, selection into subsequent rounds is typically called “attrition.”
survey invitation, one can expect that a good design of the invitation and landing page, as explained in Section 2.3, minimizes the selection of respondents based on the topic. These errors and biases will greatly depend on the survey channels and methods used. Next, we discuss the typical case of commercial survey companies.
Figure 1: From the target population to the actual sample
Selection in online surveys. What do we know about these survey errors in the case of commercial survey companies and survey marketplaces? Appendix A-1.2 provides information on their recruiting channels, processes, and pools of respondents. The sampling frame is respondents who are in the panels of the company. Table A-1 shows how the pools of respondents of two large survey companies compare to the populations of several countries. The sampling procedure is akin to quota sampling, which makes it difficult to estimate the sampling error and identify the planned sample. Typically, survey companies can target the invitations to background characteristics, and invitations are likely somewhat random, conditional on observed characteristics (see Appendix A-1). When using survey companies, it is not easy to clearly differentiate between sampling error and non-response error. Because it is difficult to track the respondents in each of these stages, we can use the term selection bias to jointly denote the difference between respondents who start the survey and those in the target population.
Online surveys have some key advantages in terms of selection, as compared to in-person, phone, or mail surveys: (i) they give people the flexibility to complete the survey at their convenience, which reduces selection based on who is free to answer during regular work hours or who opens the door or picks up the phone; (ii) the convenience of mobile technologies may entice some people who would otherwise not want to fill out questionnaires or answer questions on the phone to take surveys; (iii) they allow surveyors to reach people that are otherwise hard to reach (e.g., younger respondents, those who often move residences, respondents in remote or rural areas, etc.); (iv) they offer a variety of rewards for taking surveys, which can appeal to a broader group of people (especially when done through survey platforms). Some rewards can appeal to higher-income respondents as well (e.g., points for travel or hotels).6
6While different in their goal, which is typically measurement and provision of statistics, government surveys (done over the phone, by mail, or in person, now with computer-assisted technology) also face selection problems. For instance, Census Bureau (2019) lists hard-to-survey populations, some of which could be significantly easier to reach via online surveys or other types of platforms, particularly people in physically hard-to-reach areas, dense urban areas, temporary situations (e.g., short-term renters), or younger respondents, who still have mobile or internet access. Other target populations are likely challenging to reach through any survey channel, such as people who are migrants or minorities, homeless, in disaster areas, institutionalized, seafarers and fishers, nomadic and transitory, face language barriers, have disabilities preventing them from taking surveys, or have limited connectivity. Some other key surveys also suffer from misrepresentation of some groups, sometimes in a way that
Comparing online samples to nationally representative samples. We compare the characteristics of samples from surveys using online commercial survey platforms to the characteristics of the target population across various papers in Appendix A-1. Table 1 shows that, in the US, across many platforms, online samples can offer a good representation of a broad spectrum of incomes ($25,000 - $100,000). However, like many other survey methods, they are not suitable for reaching the tails of the income distribution (i.e., the very poor or very rich). They tend to skew more educated, white (white and non-Hispanic respondents are typically oversampled whereas Black respondents tend to be undersampled), and somewhat Democratic (at the expense of both Republican and Independent-leaning respondents). Respondents from larger urban areas and urban clusters tend to be overrepresented, whereas those from medium- and small-sized urban and rural areas are often underrepresented. Some papers do use online platforms to successfully replicate studies done on nationally representative or convenience samples (see Berinsky et al. (2012), Heen et al. (2020) and Appendix A-1.3).
In other high-income countries, according to Table 1, the representativity of online samples looks relatively consistent with that in the US. However, in developing or middle-income countries, online samples are not nationally representative. Instead, they could be considered online representative because they represent people who are well-connected to the internet and use mobile technologies.
Papers that match survey data to population-wide administrative data can also provide valuable information on selection into online surveys. For example, a sample recruited by Statistics Denmark looks almost identical to the target population (as in Hvidberg et al. (2021)).7
These comparisons between the samples and the target population rely, by necessity, on observable variables. Non-probability sampling, such as the quota sampling performed by survey companies, carries risks in terms of representativeness. Therefore, it is important to always critically assess your sample in light of your survey method and topic before suggesting that your results generalize to the target population (see Section 2.6).
2.3 Recruiting respondents
When using a more hands-on survey channel, you can directly control the content and format of the initial email or invitation to respondents, the number and timing of reminders, and the rewards system. By contrast, commercial survey companies essentially handle the recruitment process (as explained in Appendix A-1.2). Regardless of the survey channel used, you have complete control over your survey landing page and your survey design. You can check existing papers (including the many referenced in this paper) for examples of recruiting emails and survey landing pages. It is good practice to include screenshots of your consent and landing page in your paper. If you are doing a more hands-on survey, you should also include all recruitment materials.
2.3.1 The survey landing page
The initial recruitment email and landing page of your survey are critical. You need to increase your survey engagement while avoiding selection based on your topic. Below are some general tips.
Reduce the perceived costs of taking the survey from the start by specifying the (ideally short) survey length.
Use simple language and a friendly visual design. Make sure everything is easily readable (on mobile devices, too), which signals to respondents that the rest of your survey will be clear and well-designed.
Do not reveal too much about the identity of the surveyor. There is a tradeoff between revealing more about yourself and your institution and telling respondents just the bare minimum for them to feel confident in taking the survey. Think about the difference between “We are a group of non-partisan academic
is quite different from online samples. For instance, Brehm (1993) shows that in the National Election Studies Survey and the General Social Survey, young and elderly adults, male respondents, and high-income respondents are underrepresented, while people with low education levels are over-represented.
7Other papers that have matched admin data to survey data and find good representativity include Karadja et al. (2017), who use paper mail surveys, and Epper et al. (2020), who invite people through paper mail to take an online survey.
researchers” versus “We are a group of faculty members from the Economics Department at Harvard and Princeton.” On the one hand, revealing more may bias respondents’ perceptions of the survey based on their perception of your institution (and its political leaning). On the other hand, it can provide legitimacy. Some amount of information is often required by IRB, and these requirements can differ by institution. You can ask respondents whether they perceive your institution and survey as biased at the end of the study.
Appear legitimate and trustworthy. (i) Think about the tradeoff between revealing more about your identity and institution versus not. (ii) Provide contact information where respondents can express complaints and issues or provide other feedback. Respondents need to be able to get in touch with you. (iii) Provide information about how the data will be stored and used. IRB will often ask for specific language and a link to their contact and information page. If surveys are conducted outside of the US, there will be specific rules, such as the GDPR in the EU. (iv) Reassure respondents about complete anonymity and confidentiality. Survey companies have rules and agreements for respondents, but it is always good to reiterate that respondents are anonymous and their data is protected.
Provide limited information about the purpose of the study. Some information about the survey is needed, but I would advise against revealing too much about the actual research topic to avoid selection. For instance, “This is a survey for academic research” may be sufficient, and “This is a survey for academic research in social sciences” is probably ok, too. “This is a survey for academic research on immigration,” instead, will likely induce some selection based on the topic. You should never reveal the purpose or intent of the study (“We are interested in how people misperceive immigrants” or “We are interested in how information about immigrants can change people’s perceptions”).
Specify some possible benefits of the survey either for research and society more broadly or for the respondent themselves (e.g., they may learn exciting things and may be able to express their opinion).
Warn against poor response quality. If appropriate for your audience, inform respondents that careless answers may be flagged and their pay may be withheld. Note that in the case of commercial survey companies, there are typically already explicit agreements between respondents and companies on the quality of the survey responses.
There is some tradeoff between getting people interested in your topic and inducing selection bias because of it. Survey companies tend to provide little information about the survey (see Appendix A-1.2). Selection is a more serious issue in some settings than others, so you must assess based on your specific situation. In surveys through commercial survey companies, I try to provide as little information as possible about the topic on the consent page (and in the first few pages of the survey). Instead, I first try to collect basic information on respondents, which will allow me to identify whether there is differential attrition or selection based on the topic. Given the large potential pool of respondents, differential attrition and selection are much greater concerns than getting a large enough sample. By contrast, in a survey done on a high-quality sample with the help of Statistics Denmark (Hvidberg et al., 2021), we already have complete information on anyone in the target population and can quickly check for selection. In this case, we worry less about selection and more about maximizing engagement, since we are interested in getting a large enough and broad sample. In such cases, the tradeoff is in favor of a more informative landing page.
2.3.2 Other elements of the recruiting process
There are additional elements of the recruitment process that you will have to address unless you hire a survey company to do them for you.
Writing an invitation email. This can be personalized to the respondent and incorporate the tips about the survey landing page discussed above.
Sending reminders. You must plan for and send reminder emails to respondents to encourage them to take the survey.
Ensuring that your respondents are legitimate and verified. Survey companies have several layers of verification in place (see Appendix A-1.2). Following the rise in bots, automatic survey-takers, and fraudsters, you will need to (i) employ CAPTCHAs and more sophisticated tasks at the start of the survey,
such as open-ended questions (for which you can check the content) or logical questions; (ii) not share the link publicly and only distribute it through reliable channels; (iii) double down on the data quality checks discussed in Section 3.
Managing incentives and rewards. While survey companies will do this for you, if you are running your survey independently, you will need to set appropriate rewards and ensure you have a way to transfer rewards to respondents. Note also that, typically, respondents that are part of survey panels are, by construction, more likely to respond to surveys than those who have not signed up for surveys. If not using survey companies or panels of respondents, you will need to work hard on recruitment and incentives.
Setting quotas. Although survey companies may do this for you, you can generally impose your quota screening at the start of the survey. This involves asking respondents some screening questions and channeling them out of the survey in case their quota is already full.
2.4 Managing the survey
When administering your survey, you need to carefully monitor the entire process to avoid issues you may not have noticed during the design phase.
Soft-launch the survey. Before launching the full-scale survey, you should run a small-scale version or “soft launch” of the complete survey. This is slightly different from the pre-testing and piloting discussed in Section 4, which is about testing the content and questions. It is about figuring out whether there are technical issues with your survey flow.
Monitor the survey. One advantage of online surveys is that you can monitor the data collection in real-time and adapt to unforeseen circumstances. First, you must pay attention to dropout rates. If you notice respondents dropping out at particular points, you may want to pause the survey and figure out the problem. This will also help you flag potential technical issues you may have missed while testing. Similarly, monitor your quotas. If one quota is filling up too fast, it will be challenging to fill the other groups later on. Finally, regularly check the designated survey email inbox in case respondents have sent emails that flag problems.
Check the data during the collection process. From the earliest responses, you should have a procedure to start checking the validity of answers, tabulating answers, and spotting possible misunderstandings or errors. Also, check that the data you are collecting is being recorded correctly.
2.5 Attrition
Reporting attrition. The level of attrition and its correlation with observable and unobservable characteristics are important issues in a survey. It is good practice to report detailed statistics on attrition for your survey, including i) your total attrition rate with a clear definition (for example, which respondents count as “having started the survey” versus “completed” it? Do you count respondents who failed possible attention checks? Who skipped the basic demographic questions?); ii) your attrition rate at key stages in the survey, such as upon or after learning the topic of the survey, answering socioeconomic questions, seeing an experimental treatment, etc.; and iii) correlations of attrition with respondent characteristics. To be able to test for differential attrition, some background information on the respondent is needed. If there is no outside source for that information (e.g., administrative data), there is a strong rationale for asking socioeconomic and background questions earlier in the survey to see whether respondents are selectively dropping out. There are tradeoffs in this ordering of survey blocks, which I discuss further in Section 4.6.
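To fix ideas, here is a minimal sketch in Python (not the paper’s own code) of how such attrition reporting might be implemented; the data and column names (`last_page`, `completed`, and the demographics) are simulated purely for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical respondent-level data: one row per person who started the survey.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "age": rng.integers(18, 65, n),
    "female": rng.integers(0, 2, n),
    "college": rng.integers(0, 2, n),
    "last_page": rng.integers(1, 21, n),   # last survey page reached (20 = final page)
})
df["completed"] = (df["last_page"] == 20).astype(int)

# i) Overall attrition rate among those who started the survey.
print(f"Attrition rate: {1 - df['completed'].mean():.1%}")

# ii) Attrition at a key stage: share dropping out before, e.g., a treatment shown on page 10.
print(f"Dropped before the treatment page: {(df['last_page'] < 10).mean():.1%}")

# iii) Correlate dropping out with observables (linear probability model, robust SEs).
X = sm.add_constant(df[["age", "female", "college"]])
print(sm.OLS(1 - df["completed"], X).fit(cov_type="HC1").params)
```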
Table 2 gives a sense of the distribution of attrition rates across various papers and platforms. Subject to the caveat that attrition is not defined in the same way across different studies, attrition rates tend to range between 15% and somewhat above 30%, depending on the platform used and the survey length. Patterns of correlation between personal attributes and attrition are not clear-cut and will likely depend on the topic and design of the survey. In a study across 20 countries, Dechezleprêtre et al. (2022) find that women, younger, lower-income, and less-educated respondents are more likely to drop out, but differences in attrition rates
are not large. Table 2 shows that survey length may be correlated with higher attrition. Respondents in the treatment branches of surveys with an experimental component are sometimes more likely to drop out, either because of the added time commitment or based on the topic (which can introduce bias in the treatment effects estimated).
Preventing attrition. The best remedies for attrition are a smooth respondent experience (e.g., pages loading quickly, a clear visual design, well-formulated questions as described in Section 4), a shorter survey, and good incentives (here again, it helps if survey companies have a variety of possible rewards that appeal to a broad range of people, rather than just one type of reward, which may induce selection). It is a good idea to avoid too many attention check questions (see Section 3), personal questions, and complex questions, all of which could irritate respondents. It is also good to be careful about revealing the topic too early on before you know enough about who the respondents are (so that you can check for selective/differential attrition based on the topic).
2.6 Correcting for non-response bias (selection and attrition)
Correcting for non-response biases is essentially a question of how to deal with missing data. Data can be missing for specific entries (item non-response), for all entries (unit non-response), or for all entries after a given point (attrition). It can be missing completely at random, at random conditional on observables, or not at random. The corrections to apply, if any, depend on your goal and the statements you are trying to make. Are you trying to provide descriptive statistics that are supposed to represent the views of the target population? Or to present treatment effects that are generalizable? Data is very rarely missing truly at random, but that does not mean selection is always a serious problem.
In some cases, if your sample looks very close to the target population in terms of observables and if attrition is small and not systematically correlated with observables, it may be best to not attempt any correction. All corrections require some assumptions and can introduce additional noise. Policy views depend on observables like income, age, or political affiliation. The question then is whether policy views are systematically correlated with other characteristics (conditional on these observables) that make people more or less prone to taking surveys or to dropping out of the survey, e.g., upon learning the topic or seeing a treatment based on their policy views. These cannot be checked per se (you can only check selection and attrition based on observables), but you can think about the likelihood of this issue. It is reassuring if the sample is already close to representative of your target population along observable dimensions.
Three proposed solutions involve making adjustments as part of the estimation process (computing means or treatment effects). One involves re-weighting observations according to one of several methods, in order to adjust for non-response or align sample characteristics with the target population (described in Section 2.6.1). Re-weighting typically requires assuming that data is missing at random conditional on observables and may increase your standard errors. Another solution is to explicitly model selection or attrition into the survey, which does not require assuming that selection or attrition are random conditional on observables but requires modeling assumptions and some credible “instrument” for the selection (see Section 2.6.2). A third solution is to bound the effects of interest, rather than provide point estimates (see Section 2.6.3).
A fourth solution is to impute missing data, i.e., directly estimate the value each non-respondent might have reported had they been a respondent, according to one of several approaches (described in Section 2.6.4). Imputations are most often used for item non-response and can be used for one or multiple entries (e.g., one can impute all entries after someone drops out due to attrition, or individual missing entries throughout the survey).
2.6.1 Weighting methods
In this subsection, we consider weighting methods that can be used to deal with two issues: adjusting for unit non-response and aligning sample characteristics with population characteristics (called poststratification weighting). Kalton and Flores-Cervantes (2003) provide an extensive summary of these methods. For more coverage of weighting methods, see Section 3.3 of Little and Rubin (2002). The general strategy in weighting consists in finding responders who are similar to non-responders based on auxiliary data and increasing their weight. Non-longitudinal surveys typically do not contain much auxiliary data (unlike in panel surveys,
where you have information about the respondents from previous rounds). The reweighting methods are similar in the cases of non-response and post-stratification because the purpose is to align the sample data with some external data (in the case of non-response correction, the goal is to align the actual sample with the total targeted sample; in the case of poststratification, the goal is to align it with the population characteristics). Poststratification weighting is covered in more detail in Kalton (1983), Little (1986), and Little (1993). Below, we discuss the examples in the case of aligning sample data with population data, but the methods are readily applied to nonresponse corrections by replacing “population data” with “data for the planned sample” as done in Kalton and Flores-Cervantes (2003).
Cell-weighting. Cell-weighting adjustments involve sorting respondents and non-respondents into cells based on certain characteristics and adjusting the weights of respondents in each cell by a given factor so that the sample totals conform to the population totals on a cell-by-cell basis. You first need some data on the population, cell by cell (which may be difficult to find for some target populations). When the target is the national population, you can use censuses. The assumption needed for the validity of this adjustment is that the people who did not answer the survey are like those who did answer, which means that in any given cell, a random sample of people was invited to participate and a random set of those invited answered the survey.
The choice of variables to construct the poststratification cells can be made by simple methods or sophisticated algorithms (see Kalton and Flores-Cervantes (2003)). Some useful results to keep in mind are, first, that poststratification based on cells that are homogeneous with respect to the outcome of interest reduces both the variance and bias in estimates based on the data (Holt and Smith, 1979). Second, poststratification based on cells that are homogeneous with respect to participation in the survey reduces the bias but may increase the variance (Little, 1986). In a nutshell, this means that you should choose variables that (you think) are good predictors of the survey outcome in the first place and the propensity to respond to the survey in the second place.
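As an illustration of the cell-by-cell adjustment just described, here is a minimal sketch in Python; the cell variables and the population counts are made up for the example, and in practice the population totals would come from a census or another external source.

```python
import pandas as pd

# Hypothetical sample of respondents with two cell-defining variables and an outcome.
sample = pd.DataFrame({
    "female":    [0, 0, 1, 1, 1, 0, 1, 0],
    "age_group": ["18-39", "40-64", "18-39", "18-39", "40-64", "18-39", "40-64", "40-64"],
    "outcome":   [1, 0, 1, 1, 0, 1, 0, 1],
})

# Hypothetical population totals for the same cells (e.g., from a census).
population = pd.DataFrame({
    "female":    [0, 0, 1, 1],
    "age_group": ["18-39", "40-64", "18-39", "40-64"],
    "pop_count": [300, 250, 280, 270],
})

# Cell-weighting: weight = population count / sample count, cell by cell.
cells = ["female", "age_group"]
sample_counts = sample.groupby(cells).size().rename("n_sample").reset_index()
weights = population.merge(sample_counts, on=cells)
weights["weight"] = weights["pop_count"] / weights["n_sample"]
sample = sample.merge(weights[cells + ["weight"]], on=cells)

# Weighted vs. unweighted mean of the outcome.
print("Unweighted mean:", sample["outcome"].mean())
print("Weighted mean:  ",
      (sample["outcome"] * sample["weight"]).sum() / sample["weight"].sum())
```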
Related weighting methods. Related methods include raking (which uses an iterative procedure to make the marginal distributions of the sample, instead of the joint distributions, conform to the population) and generalized regression estimation (which constructs weights based on several variables, including transformed or interacted ones). Cell weighting can lead to unstable weighting adjustments if there are cells with very few respondents. Mixture methods can be used in that case: you can use multilevel regression for small cells before poststratification cell reweighting (see Gelman and Little (1997)).
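For concreteness, here is a small sketch of raking (iterative proportional fitting) in Python; the target margins are hypothetical, and the sketch is meant only to illustrate the logic of matching marginal rather than joint distributions.

```python
import numpy as np
import pandas as pd

# Hypothetical respondents and target population margins (shares for each variable).
df = pd.DataFrame({
    "female":  [0, 0, 1, 1, 1, 0, 1, 0, 1, 1],
    "college": [1, 0, 1, 1, 0, 0, 1, 1, 0, 0],
})
target_margins = {"female": {0: 0.49, 1: 0.51}, "college": {0: 0.60, 1: 0.40}}

N = len(df)
w = np.ones(N)
for _ in range(50):                                   # iterate until the margins converge
    for var, targets in target_margins.items():
        for value, share in targets.items():
            mask = (df[var] == value).to_numpy()
            w[mask] *= (share * N) / w[mask].sum()    # match this margin exactly

df["weight"] = w
# The weighted margins now (approximately) match the target margins for both variables.
print(df.groupby("female")["weight"].sum() / N)
print(df.groupby("college")["weight"].sum() / N)
```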
Logistic regression weighting and inverse probability weighting methods predict the probability of responding or completing the survey based on auxiliary information. They require you to know the pool of respondents (e.g., the characteristics of the full population) so you can estimate a probit of participating/dropping out based on observables (that can be exogenous but also endogenously related to the variables of interest). For a treatment of these methods, see Fitzgerald et al. (1998), Wooldridge (2002a), and Wooldridge (2007). Inverse probability weights are a common approach for dealing with differential attrition (Bailey et al., 2016; Imbens and Wooldridge, 2009). In this application, observations in the treatment and control groups are reweighted to remain comparable to their pre-attrition samples.8
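The following sketch illustrates the inverse probability weighting logic in Python with simulated data, using a logit for survey completion (a probit works analogously); the variable names and the completion model are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 3000
df = pd.DataFrame({
    "age": rng.integers(18, 65, n),
    "college": rng.integers(0, 2, n),
})
# Hypothetical completion process: older, more educated respondents finish more often.
p_complete = 1 / (1 + np.exp(-(-1.0 + 0.03 * df["age"] + 0.5 * df["college"])))
df["completed"] = rng.binomial(1, p_complete)
df["outcome"] = 0.1 * df["age"] + rng.normal(size=n)
df.loc[df["completed"] == 0, "outcome"] = np.nan      # outcome observed only for completers

# Step 1: model the probability of completing the survey as a function of observables.
X = sm.add_constant(df[["age", "college"]])
phat = sm.Logit(df["completed"], X).fit(disp=0).predict(X)

# Step 2: reweight completers by 1 / phat (inverse probability weights).
completers = df[df["completed"] == 1].copy()
completers["ipw"] = 1 / phat[df["completed"] == 1]

print("Unweighted mean outcome:", completers["outcome"].mean())
print("IPW mean outcome:       ",
      np.average(completers["outcome"], weights=completers["ipw"]))
```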
Standard errors. You must account for weighting when computing your standard errors, which most software can do. Weighting can increase your variance by a little or a lot, depending on the adjustment size. If some weight adjustments become too large, there are methods to trim the weights or collapse cells (Kalton and Flores-Cervantes, 2003).
2.6.2 Model-based approaches
Model-based approaches tackle attrition and sample selection parametrically. They explicitly model the selection or attrition process and do not need to assume that they are random, conditional on observables. Typical econometric methods are covered in Chapter 17 of Wooldridge (2002b). They were developed in the
8More generally, most methods in this section can be used for differential attrition too.
context of program evaluation and general non-response (leading to missing data on dependent or explanatory variables) and can also be applied to surveys.
Models like the Heckman (1979) selection model require finding an instrument that affects selection or attrition, but not the outcomes of interest. In the context of surveys, this could be some randomized variation in the survey process, such as the number of times a respondent was contacted or the rewards offered. Such variation is not always available. If you have control over these survey parameters, you can think ahead to their use later during your analysis. Such a model-based correction is proposed in Dutz et al. (2021) and uses variation in participation rates due to randomly assigned incentives and in the timing of reminder emails and text messages. Behaghel et al. (2015) similarly provide a model that mixes the Heckit model and the bounding approach of Lee (2009) (covered below), using the number of prior calls made to each individual before obtaining a response to the survey as a pseudo-instrument for sample selectivity correction models. The method can be applied whenever data collection entails sequential efforts to obtain a response (in their case, trying to call several times in a phone survey or making several visits to the respondent’s home; in the case of online surveys, sending repeated invitations to take the survey) or even gradually offering higher incentives (rewards) to potential respondents.
2.6.3 Bounding methods
Bounding methods are typically non-parametric techniques to provide interval estimates for the effects of interest, relying on relatively few assumptions. Some bounding techniques use imputations (similar to the methods in Section 2.6.4), while others use trimming.
The worst-case approach by Horowitz and Manski (2000) imputes missing information using the minimal and maximal possible values of the outcome variables and bounds population parameters with almost no assumptions except that the variables need to be bounded. These bounds can be wide and non-informative, but are useful benchmarks, especially for binary variables. For estimating treatment effects when there is attrition, Kling et al. (2007) construct bounds using the mean and standard deviation of the observed treatment and control distributions, which leads to tighter bounds than the “worst-case approach.”
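For a bounded outcome (here binary), the worst-case bounds can be computed in a few lines; below is a small sketch with made-up data, imputing missing values at the minimum and maximum of the outcome.

```python
import numpy as np

# Hypothetical binary outcome with missing values coded as np.nan.
y = np.array([1, 0, 1, np.nan, 1, np.nan, 0, 1, np.nan, 0], dtype=float)
observed = ~np.isnan(y)
share_missing = 1 - observed.mean()

# Worst-case (Horowitz-Manski) bounds on the population mean:
# impute all missing values at the minimum (0) or the maximum (1) of the outcome.
lower = np.where(observed, y, 0.0).mean()
upper = np.where(observed, y, 1.0).mean()
print(f"Mean bounded in [{lower:.2f}, {upper:.2f}] with {share_missing:.0%} missing")
```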
Lee bounds. Lee (2009) proposes a method to bound the treatment effect estimates when the control and treatment groups’ attrition is differential. Bounds are estimated by trimming a share of the sample, either from above or below.9 To obtain tighter bounds, lower and upper bounds can be estimated using several (categorical) covariates and trimming the sample by cells instead of overall. Many improvements and refinements for Lee bounds exist. To apply this type of bound, the treatment has to be randomly assigned, and treatment assignment should only be able to affect attrition in one direction (monotonicity assumption).
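The trimming procedure summarized in footnote 9 can be sketched as follows for a randomly assigned binary treatment with simulated, differential attrition; this is an illustration of the basic logic only, without covariate-based tightening or inference.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 4000
df = pd.DataFrame({"treated": rng.integers(0, 2, n)})
# Hypothetical outcome and differential attrition: treated respondents drop out less often.
df["outcome"] = 1.0 * df["treated"] + rng.normal(size=n)
df["completed"] = rng.binomial(1, np.where(df["treated"] == 1, 0.95, 0.80))
df.loc[df["completed"] == 0, "outcome"] = np.nan

# Completion rates by arm; identify the arm with less attrition.
rates = df.groupby("treated")["completed"].mean()
less_attr = int(rates.idxmax())
more_attr = 1 - less_attr
p = (rates[less_attr] - rates[more_attr]) / rates[less_attr]   # trimming fraction

y_less = df.loc[(df["treated"] == less_attr) & (df["completed"] == 1), "outcome"].sort_values()
y_more = df.loc[(df["treated"] == more_attr) & (df["completed"] == 1), "outcome"]
k = int(np.floor(p * len(y_less)))                             # observations to trim

# Trim the lowest (resp. highest) p share of the less-attrition arm and compare means.
mean_trim_low = y_less.iloc[k:].mean()
mean_trim_high = y_less.iloc[:len(y_less) - k].mean()
sign = 1 if less_attr == 1 else -1                             # bounds on (treated - control)
bounds = sorted([sign * (mean_trim_low - y_more.mean()),
                 sign * (mean_trim_high - y_more.mean())])
print(f"Lee bounds on the treatment effect: [{bounds[0]:.2f}, {bounds[1]:.2f}]")
```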
2.6.4 Imputation methods
Imputation methods are non-parametric techniques to fill in missing data.
Hot deck imputations replace missing values with a random draw from some “donor pool.” Donor pools are values of the corresponding variable for responders that are similar, according to some metric, to the respondent with the missing value. For instance, hot decks may be defined by age, race, sex, or finer cells.10 In some cases, the donor is drawn randomly from a set of potential donors (“random hot deck method,” see Andridge and Little (2010)). In other cases, a single donor is drawn based on a distance metric (“deterministic hot deck methods”). With a stretch of terminology, some methods impute summary values, such as a mean over a set of donors.
9The actual method is as follows: 1) Calculate the trimming fraction p defined as the fraction remaining in the group with less attrition minus the fraction remaining in the group with more attrition, scaled by the fraction remaining in the group with less attrition. 2) Drop the lowest p% of outcomes from the group with less attrition. Again, calculate the mean outcomes (descriptive stats or treatment effects) for the trimmed group with less attrition. This is one bound. 3) Repeat step 2) by dropping the highest p% of outcomes from the group with less attrition, which yields another bound.
10The US Census Bureau uses a classic hot deck procedure for item non-response in the Income Supplement of the Current Population Survey (CPS).
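As an illustration of a random hot deck, here is a minimal sketch in Python that imputes a hypothetical income variable within coarse sex-by-age cells; the data and cell definitions are made up for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
df = pd.DataFrame({
    "sex":       ["f", "m", "f", "m", "f", "m", "f", "m", "f", "m"],
    "age_group": ["young", "young", "old", "old", "young", "young", "old", "old", "young", "old"],
    "income":    [35, 42, 55, np.nan, 38, np.nan, 60, 58, np.nan, 52],
})

# Random hot deck within cells: each missing value gets the value of a randomly drawn
# donor from the same sex-by-age-group cell (if a donor exists in that cell).
df["income_imputed"] = df["income"]
for _, idx in df.groupby(["sex", "age_group"]).groups.items():
    cell = df.loc[idx, "income"]
    donors = cell.dropna().to_numpy()
    missing = cell.index[cell.isna()]
    if len(donors) > 0 and len(missing) > 0:
        df.loc[missing, "income_imputed"] = rng.choice(donors, size=len(missing))
print(df)
```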
Regression-based imputations replace missing values with predicted values from a regression of the missing variable on variables observed for the respondent, typically estimated on respondents who do not have this missing variable or who have complete responses. Stochastic regression imputation replaces missing values with a value predicted by regression imputation plus a residual drawn to reflect uncertainty in the prediction.
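A minimal sketch of stochastic regression imputation with simulated data: the imputation regression is fit on complete cases, and each missing value is replaced by the prediction plus a residual drawn from the estimated residual distribution. The variable names are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 500
df = pd.DataFrame({"age": rng.integers(18, 65, n), "college": rng.integers(0, 2, n)})
df["income"] = 20 + 0.5 * df["age"] + 10 * df["college"] + rng.normal(0, 5, n)
df.loc[rng.random(n) < 0.2, "income"] = np.nan            # roughly 20% item non-response

complete = df.dropna(subset=["income"])
missing = df[df["income"].isna()]

# Fit the imputation regression on complete cases.
X_c = sm.add_constant(complete[["age", "college"]])
fit = sm.OLS(complete["income"], X_c).fit()

# Predict for missing cases and add a randomly drawn residual (stochastic imputation).
X_m = sm.add_constant(missing[["age", "college"]], has_constant="add")
pred = fit.predict(X_m)
sigma = np.sqrt(fit.scale)                                # residual standard deviation
df.loc[missing.index, "income"] = pred + rng.normal(0, sigma, len(missing))
print(df["income"].isna().sum(), "missing values remain")
```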
Random hot-deck or stochastic regression imputations align with the guidelines for creating good imputations in Little and Rubin (2002). They suggest that imputations need to be i) conditional on observed variables, to improve precision, reduce bias from non-response, and account for the correlation between missing and observed variables; ii) multivariate, to preserve the relations between missing variables; and iii) randomly drawn from predictive distributions, to account for variability (rather than deterministically set to a value such as a conditional mean).
2.6.5 Best practice tips
The first step in dealing with selection and attrition is accurately reporting them to your readers. For attrition, i) describe your overall rate of attrition; ii) correlate it with observables; and iii) provide the timeline of when people drop out (see Section 2.5). For selection, compare your sample carefully to the target population along as many dimensions as possible. If the characteristics are similar, this is reassuring, although responders may differ from non-responders in other ways that are not measurable (and this is not testable). For item non-response, you can identify specific questions with many missing answers. For example, if you have to use a variable with many missing observations, you need to discuss this more extensively than if the variable has only a few missing responses. Your adjustments or corrective procedure and reporting should depend on the magnitude of the non-response, selection, and attrition problems. It may be worthwhile checking the robustness of your results to various correction methods among those described here.
It is helpful to report your “raw” survey results before any adjustment, either as a benchmark case or in the Appendix. You can acknowledge that the results hold, strictly speaking, just for your sample and may or may not hold for the target population. After you apply one or several correction methods (re-weighting, bounding, imputation, or model-based adjustments), you can report these results (in the main text or Appendix) for comparison with the raw ones. A final tip is to use questions on attitudes, views, or beliefs from existing, high-quality, representative (of your target population) surveys that can serve as benchmarks. You can compare the answers in your study to those in benchmark surveys so that you have an extra validation beyond comparing socioeconomic or demographic characteristics.
3 Managing Respondents’ Attention
Once you have recruited a high-quality sample, the essential asset in your survey is your respondents’ attention. As is the case for many other survey issues, the condition sine qua non in dealing with respondents’ attention or lack thereof is a good survey design. Beyond that, there are some targeted methods, described here.
3.1 Ex ante methods to check for attention
First, you need to collect extensive “meta-data” for your surveys to diagnose issues with attention and carelessness. There are options to do so in survey software such as Qualtrics. They include time spent on each survey screen and the entire survey, number of clicks or scrolling behavior, time of the day the survey was taken, and the device used (e.g., browser versus mobile phone).
One way to identify careless respondents is through “Screeners,” i.e., questions specifically designed to detect inattentive answers. There are different ways of structuring such questions:
• Logical questions require logical reasoning and have a clear, correct answer (e.g., “would you rather eat a fruit or soap?”), as described in Abbey and Meloy (2017). The issue is that there is a clear tradeoff between the subtlety of the question and the existence of an unambiguously correct answer.
• Instructional manipulation checks are questions that look like standard survey questions but instruct the respondent to provide a certain answer. Note, however, that they may affect, rather than measure, the attentiveness of the respondent (Kane and Barabas, 2019). An adapted example from Berinsky et al. (2014) is:
Example: People often consult internet sites to read about breaking news. We want to know which news you trust. We also want to know if you are paying attention, so please select ABC News and Fox News regardless of which sites you use. When there is a big news story, which is the one news website you would visit first? (Please only choose one)
□ New York Times website
□ Huffington Post
□ Washington Post website
□ The Drudge Report
□ Fox News
□ ABC News website
□ The Associated Press (AP) website
□ Reuters website
□ National Public Radio (NPR) website
• Factual manipulation checks are questions with correct answers that are placed after experimental treatments and relate to their content. These questions can either be before or after the measurement of the outcome and can be about treatment-relevant information (which is manipulated across treatment groups, in which case the questions serve as a check of comprehension) or treatment-irrelevant information (not manipulated across treatment groups, in which case they act as attention checks only). Section 6 provides more advice on ordering such questions in a survey experiment.
Overall, while screeners have the advantage of increasing attention and measuring carelessness, they can also annoy respondents and increase attrition rates (Berinsky et al., 2016). If you decide to use screeners, use them sparingly and strategically. For instance, you could consider using them at random points to check for survey fatigue or attention at different points in the survey.
Once you have identified careless respondents, you must decide whether to drop them (threatening external validity) or leave them in (threatening internal validity and increasing noise). There is some evidence that passing screeners correlates with relevant respondent characteristics, and excluding those who fail them may limit generalizability (Berinsky et al., 2014).11
Another possible solution is to try to induce more attention in the first place. Berinsky et al. (2016) study ways to prompt people to pay more attention: i) “training” respondents (letting them answer the attention check question again until they get it right). This increases dropout from the survey, presumably as respondents get annoyed. ii) Warning that the researcher can spot careless answers along the lines of “The researchers check responses carefully to ensure they read the instructions and responded carefully”, which also slightly increases dropout. iii) Thanking respondents: “Thank you very much for participating in our study. We hope that you will pay close attention to the questions on our survey”. This method reduces dropout by a bit. Thus, none of these interventions is really effective. Krosnick and Alwin (1987) suggest reminding a respondent to focus if a question is tricky. Such prompts have to be used sparingly, or they lose their effectiveness. In a nutshell, it is challenging to prompt attention through artificial methods. One of the best bets is good design to avoid squandering precious respondent attention (as explained in Section 4).
3.2 Ex post data quality checks
Instead of inserting ad-hoc questions, you can also verify the respondent’s attention ex-post through various techniques. Once you have identified potentially problematic and careless respondents, you could check the robustness of your results to including these cases versus dropping them. You can also create flags for different degrees of carelessness by applying several checks and identifying “very careless respondents” (e.g., who get flagged in many of the checks) versus “moderately careless” or “mildly careless” ones and checking the robustness of your results to dropping and including these groups.
11That paper finds that, across five different surveys, older respondents are more likely to pass the screener (but this relationship dampens for those older than 60), women are significantly more likely to pass screeners than men, and racial minorities are less likely to pass screeners.
Consistency indices are measures that match items supposed to be highly correlated by design or empirically and check whether they are correlated. Some common techniques are i) Psychometric Synonyms and Antonyms, which are pairs of items that are highly positively correlated (synonyms) or negatively correlated (antonyms). An example of psychometric antonyms would be the answers to the questions “Are you happy?” and “Are you sad?” (Curran, 2016). You can check the within-respondent correlation for these pairs. ii) Odd-Even Consistency checks involve splitting survey questions based on their order of appearance and checking that items that should be correlated are correlated (see Appendix A-2). Consistency indices are mainly useful if your survey includes several questions on the same topic (that we expect to be correlated) that are asked on similar scales. These methods are reviewed in Meade and Craig (2012).
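As an illustration, here is a sketch of a psychometric-antonym index in Python: for each respondent, the answers to the first items of hypothetical antonym pairs are correlated with the answers to their counterparts, and attentive respondents should show strongly negative within-respondent correlations. The items and the share of careless respondents are made up.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
n = 200
# Hypothetical 1-5 scale items; each antonym is the reverse of its counterpart.
latent = rng.integers(1, 6, (n, 3))
df = pd.DataFrame({
    "happy": latent[:, 0],      "sad": 6 - latent[:, 0],
    "optimistic": latent[:, 1], "pessimistic": 6 - latent[:, 1],
    "calm": latent[:, 2],       "anxious": 6 - latent[:, 2],
})
# Make a handful of respondents answer completely at random (careless).
careless = rng.choice(n, size=20, replace=False)
df.iloc[careless] = rng.integers(1, 6, (20, 6))

antonym_pairs = [("happy", "sad"), ("optimistic", "pessimistic"), ("calm", "anxious")]
first = df[[a for a, _ in antonym_pairs]].to_numpy()
second = df[[b for _, b in antonym_pairs]].to_numpy()

def row_corr(x, y):
    """Within-respondent correlation across the antonym pairs (NaN if answers are constant)."""
    x, y = x - x.mean(), y - y.mean()
    denom = np.sqrt((x ** 2).sum() * (y ** 2).sum())
    return np.nan if denom == 0 else (x * y).sum() / denom

df["antonym_index"] = [row_corr(first[i], second[i]) for i in range(n)]
flagged = df["antonym_index"] > -0.5    # weakly negative or positive: possible carelessness
print(f"Flagged {flagged.sum()} of {n} respondents")
```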
Response pattern indices detect patterns in consecutive questions (see Meade and Craig (2012)). i) The LongString measure is the longest series of consecutive questions on a page to which the respondent gave the same answer (e.g., for how many questions in a row a respondent consistently selects the middle option); ii) the Average LongString measure is the average of the LongString variable across all survey pages; and iii) the Max LongString measure is the maximum LongString variable on any of the survey pages. Response pattern indices are only helpful when a relatively long series of questions use the same scale. It is not easy to compare different surveys according to these measures because they depend on the type and position of the questions. These methods are likely to only detect respondents who employ minimum effort strategies such as choosing the same answer repeatedly.
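A sketch of the LongString family of measures on hypothetical same-scale items (two “pages” of eight questions each); the straight-lining respondents are simulated for illustration.

```python
import numpy as np
import pandas as pd

def longstring(row: np.ndarray) -> int:
    """Longest run of identical consecutive answers within one page of items."""
    longest, current = 1, 1
    for prev, cur in zip(row[:-1], row[1:]):
        current = current + 1 if cur == prev else 1
        longest = max(longest, current)
    return longest

# Hypothetical answers (1-5 scale) to two pages of eight questions each.
rng = np.random.default_rng(6)
page1 = rng.integers(1, 6, (100, 8))
page2 = rng.integers(1, 6, (100, 8))
page2[:5, :] = 3                     # five respondents "straight-line" the second page

ls = pd.DataFrame({
    "longstring_page1": [longstring(r) for r in page1],
    "longstring_page2": [longstring(r) for r in page2],
})
ls["avg_longstring"] = ls.mean(axis=1)                                   # Average LongString
ls["max_longstring"] = ls[["longstring_page1", "longstring_page2"]].max(axis=1)  # Max LongString
print(ls.head())
```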
Outlier indices attempt to spot outlier answers. Zijlstra et al. (2011) review six methods for computing outlier statistics (or scores) for each respondent and identifying the level of discordance with other observations in the sample. For better results, these methods typically rely on multiple survey questions at once. One of the most commonly used outlier statistics is the Mahalanobis distance, which computes the distance between an observation and the center of the data, taking into account the correlational structure.
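A sketch of a Mahalanobis-distance outlier score computed over a set of numeric survey items (simulated here); respondents with the largest distances can then be inspected or flagged.

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical numeric answers from 300 respondents to 5 correlated items.
cov = np.full((5, 5), 0.6) + 0.4 * np.eye(5)
answers = rng.multivariate_normal(mean=np.zeros(5), cov=cov, size=300)
answers[:3] *= 4                                    # three respondents with discordant answers

center = answers.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(answers, rowvar=False))
diff = answers - center
# Squared Mahalanobis distance of each respondent from the center of the data.
md2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

# Flag, e.g., the respondents above the 99th percentile of the distance distribution.
flagged = np.where(md2 > np.quantile(md2, 0.99))[0]
print("Flagged respondent indices:", flagged)
```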
Honesty checks or self-reported attention. An additional possibility is to insert a direct question about the respondent’s “honesty,” asking the respondent to evaluate their interest and attention on a single item or the whole survey. These measures correlate with the other attention checks but are not appropriate if the respondent stands to lose from being honest (e.g., if their survey reward is withheld, Meade and Craig (2012)). The example from Meade and Craig (2012) is “Lastly, it is vital to our study that we only include responses from people that devoted their full attention to this study. Otherwise, years of effort (the researchers’ and the time of other participants) could be wasted. You will receive credit for this study no matter what; however, please tell us how much effort you put forth towards this study.” [Almost no effort, Very little effort, Some effort, Quite a bit of effort, A lot of effort] As an example, Alesina et al. (2022) include the following question, which is not placed at the end of the survey, but rather strategically to foster attention in the subsequent questions (regardless of what the respondents answer):
Before proceeding to the next set of questions, we want to ask for your feedback about the responses you provided so far. It is vital to our study that we only include responses from people who devoted their full attention to this study. This will not affect in any way the payment you will receive for taking this survey. In your honest opinion, should we use your responses, or should we discard your responses since you did not devote your full attention to the questions so far?
□ Yes, I have devoted full attention to the questions so far and I think you should use my responses for your study. □ No, I have not devoted full attention to the questions so far and I think you should not use my responses for your study.
Time spent on the survey. You can also decide to discard respondents who spent too little or too much time on the survey as a whole (the cutoff will depend on what you consider to be a reasonable time for a given survey). Indeed, while you should probably allow respondents to interrupt and complete the survey at a later time (because they may otherwise drop out altogether), you need to check the quality of the answers of respondents who took a very long time because they may have been distracted by other things (which could be a problem, especially when estimating treatment effects as in Section 6). It is always worth checking
whether time spent on the survey (and lack of attention or carelessness) is systematically correlated with respondent characteristics.
3.3 Survey fatigue
An important concern in surveys is survey fatigue, i.e., the decay of respondents’ focus and attention over the course of the survey.
Reducing survey fatigue. Good design is particularly critical for reducing survey fatigue. Questions that are inconsistent, vary a lot, and do not have good visual design can impose undue cognitive load on respondents and tire them out more quickly. The length of the survey is, of course, critical. However, there are no hard rules, as a long but interesting and well-designed survey may foster more engagement than a shorter but poorly designed or boring one.
Testing for survey fatigue. To spot survey fatigue in your survey, you can check whether patterns of carelessness such as those described in Section 3.2 increase over the course of the survey. However, this is not always conclusive because the types of questions asked also change over the course of the survey. Stantcheva (2021) suggests a test based on the randomization of survey block order: one can test whether respondents who (randomly) saw a given survey block later in the survey spent less time on it and exhibited more careless answer patterns on questions in that block than respondents who saw that same block earlier on.
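One way to operationalize this test, sketched here under the assumption of a respondent-by-block data set with hypothetical variable names (this is not the paper's exact specification), is to regress time spent and a carelessness indicator on the randomized block position, with block fixed effects and standard errors clustered by respondent.

```python
# Minimal sketch of the block-order fatigue test. Data frame `long` has one
# row per respondent-block with hypothetical columns: respondent_id,
# block_id, block_position (1 = shown first), time_on_block_sec,
# careless_flag (0/1).
import pandas as pd
import statsmodels.formula.api as smf

def fatigue_tests(long: pd.DataFrame):
    """Fatigue would show up as a negative position coefficient for time
    spent and a positive one for carelessness."""
    common = dict(cov_type="cluster", cov_kwds={"groups": long["respondent_id"]})
    time_fit = smf.ols("time_on_block_sec ~ block_position + C(block_id)", data=long).fit(**common)
    careless_fit = smf.ols("careless_flag ~ block_position + C(block_id)", data=long).fit(**common)
    return time_fit, careless_fit

# time_fit, careless_fit = fatigue_tests(long)
# print(time_fit.params["block_position"], careless_fit.params["block_position"])
```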
4 Writing Survey Questions
When you decide to run a survey, you may wish to start writing the questions quickly. However, do not jump into this before a lot of careful thinking. There may be a temptation to think about writing your survey as just the equivalent of “getting the data” in observational empirical work. However, you are the one creating the data here, which gives you many opportunities and presents many challenges. Writing your survey questions is already part of the analysis stage.
You first need to outline very clearly what your research question is. There is no such thing as a “good survey” or a “good question” in an absolute sense (although there are bad surveys and bad questions). A good survey is adapted to your research question. Therefore, when writing survey questions, you must always remember how you will analyze them; the right design will depend on your goal. In this section, I outline some best practices for writing questions, based on the many references cited in this review article and my own experience. Section 4.3 builds extensively on Dillman et al. (2014) and Pew Research Center (nd). Some of the examples there are intentionally adapted to be more suitable for economics surveys; a few are used with only very minor modifications.
4.1 General advice
Types of questions. There are several different types of survey questions. Questions consist of a question stem followed by answer options or entry fields. Closed-ended questions, which typically make up most of the survey questions, have a fixed set of answer options. Closed-ended questions can be nominal, with categories that have no natural ordering (e.g., “What is your marital status?”), or ordinal, with categories that have some ordering (e.g., questions such as “Do you support or oppose a policy?” with answer options ranging from “strongly oppose” to “strongly support,” or questions about frequencies, with answer options ranging from “never” to “always”). Open-ended questions instead have open answer fields of varying lengths and do not constrain respondents to specific answer choices. Hybrid questions are closed-ended questions with open-ended answer choices, such as “Other (please specify): [empty text field].”
Well-designed survey questions allow you to create your own controlled variation. This distinguishes social and economic surveys from other types of surveys. The goal is not only to collect statistics but to understand reasoning, attitudes, and views and to tease out relationships. When you design your questions, you need to keep the concept of ceteris paribus, or “all else equal,” in mind and think of the exercise as creating your own controlled (identifying) variation. Each question needs to ask about only one thing at a time and hold everything else as constant as possible (and respondents need to be aware of that).
For instance, as discussed in Alesina et al. (2018), if you want to understand whether people want to increase spending on a given social program, it is difficult to infer much from answers to a question such as “Do you support or oppose increasing spending on food stamps?” The reason is that this question mixes many different considerations, including i) how much government involvement respondents want; ii) how they think the spending increase will be financed (e.g., will it come at the expense of other programs?); and iii) whether they prefer another, related program (e.g., cash transfers to low-income households). This is why Alesina et al. (2018) split this question into three different questions: one about the preferred scope and involvement of the government, one about how to share a given tax burden, and one about how to allocate a given amount of government funds to several spending categories ranging from infrastructure and defense to social safety net programs.
Note that if you are only interested in treatment effects in survey experiments, you may have a bit more leeway because, presumably, the variations in interpretation of a question will be similar across the treatment and control groups. Even then, I would advise having as precise and clear questions as possible.
Writing precise and clear questions. When writing survey questions, precision and clarity are key. This involves, among other things, avoiding the following types of questions:
Double-barreled questions, i.e., questions that ask about two things simultaneously. This is sometimes grammatically evident (“Do you support or oppose increasing the estate tax and the personal income tax in the top bracket?”) but often more subtle, as explained above, when your question does not hold all other relevant factors constant.
Vague questions. “Do you support or oppose raising taxes on the rich?” may be helpful in some settings, but presumably, it would be more beneficial to specify what you mean by “rich” and which taxes exactly you have in mind.
All-or-nothing questions. These questions are not informative because everyone will tend to respond the same. For instance: “Should we raise taxes to feed starving children?”
Being very specific in your questions avoids ambiguity, which can lead to misinterpretation and heterogeneous interpretations of the question across respondents. These in turn lead to measurement error.
Allow for a respondent to answer that they do not know or are indifferent. There may be respondents who have not given much thought to the issues researchers ask about, especially if these issues relate to broader social or economic phenomena as opposed to respondents’ own lives. Therefore, allowing them to express indifference toward or absence of a strong view on the issue makes sense. It is similarly recommended to let respondents answer that they do not know when asked knowledge-related questions.
Use simple, clear, and neutral language. Using simple, clear, and neutral language involves several elements:
Know your audience. Questions that are easy to answer for one type of audience may be difficult for another. For instance, Alesina et al. (2021) survey both adults and teenagers in their study on racial gaps and adapt the teenagers’ questionnaire to be shorter and to use simpler words.
Do not use jargon or undefined acronyms.
Do not use negative or double negative formulations that are harder to understand. An example of a negative formulation that is mentally burdensome is “Do you favor or oppose not allowing the state to raise state taxes without approval of 60% of voters?” Instead, you could ask “Do you favor or oppose requiring states to have 60% of the approval of voters to raise state taxes?” You should minimize your respondents’ cognitive load, so the answer “yes” should mean yes, and “no” should mean no.
Eliminate all unnecessary words and keep your questions as short as possible. One application of this principle is to include answer options only after the question stem. For instance, a formulation such as “Are you very likely, somewhat likely, somewhat unlikely, or very unlikely to hire a tax preparer next year?” is tiring to read. A better formulation would be “How likely or unlikely are you to hire a tax preparer next year?” with answer options [Very likely, Somewhat likely, Somewhat unlikely, or Very unlikely].
Be careful with sensitive words or words that may be offensive to some people.
Adapt your questions to your respondents. Make sure your questions apply to all respondents in your sample. If they do not, either i) create survey branches based on your respondents’ characteristics or ii) ask contingent questions. For instance, do not assume that a respondent who is currently unemployed has a job to describe. You can still ask about their current job, but add the contingency “if you do not currently have a job, please tell us about your last job.”
Write neutral questions. You should strive to write neutral questions that do not bias the responses.
Phrase questions as actual questions rather than using “Do you agree or disagree with this statement” formulations. In fact, you should strive to avoid Agree/Disagree and True/False questions (see the detailed discussion in Section 5.4).
Avoid leading questions that nudge the respondent in one answer direction. An example of a leading question is “More and more people have come to accept using a tax preparer to reduce one’s tax burden as beneficial. Do you feel that using a tax preparer to reduce your tax burden is beneficial?”
Avoid judgmental and emotionally charged words in your questions.
Do not ask sensitive or private questions unless you must (of course, for research, we sometimes must).
Avoid giving reasons for a given behavior in your question, because the answers will mix what the respondent thinks about the issue you are asking about with what they think about the stated reason. For instance: “Do you support or oppose higher taxes so that children can have a better start in life?” will not lead to informative answers about people’s attitudes toward either taxes or equality of opportunity.
You can consider including a counter-biasing statement to signal neutrality. For instance: “Some people support very low levels of government involvement in the economy, while others support very high levels of government involvement. How much government involvement do you support?”
When asking either/or types of questions, state both the positive and negative sides in the question stem. For instance, instead of asking “Do you favor increasing the tax rate in the top bracket?”, ask “Do you favor or oppose increasing the tax rate in the top bracket?” [Favor, Oppose]. Similarly, instead of asking “How satisfied are you with the overall service you have received from your tax preparer?”, ask “How satisfied or dissatisfied are you with the overall service you have received from your tax preparer?” [Very satisfied, Somewhat satisfied, Somewhat dissatisfied, Very dissatisfied].
Use simple question formats unless the question requires more complexity. Sometimes, you will need to elicit complex perceptions or attitudes, which justifies the use of a more involved question format (see Section 4.4). In general, however, a simpler question design consistent throughout the survey makes the most sense, as it involves a lower cognitive burden for respondents. Some common question design types are:
• Checkbox questions. In these questions, respondents simply select one or multiple answer options by clicking on them.
• Radio button questions. Respondents select a single option by clicking on it.
• Slider questions. Respondents select an answer by moving a slider. The benefit of sliders is that they yield more continuous and perhaps more fine-grained answers than fixed point scales. The dis- advantages are that some of that variation is almost surely noise, that they can take longer to answer than checkbox or radio button questions, and that they may be hard to control precisely, especially on mobile phones. Sliders can be a good and intuitive visual representation if you are truly interested in a continuous variable rather than discrete ordinal categories. If not, a standard question design with radio buttons is recommended.
• Ranking questions. Ranking questions can be cumbersome, and forcing respondents to rank items that are not truly on a unidimensional scale can lead to misleading results, with gaps between items that are not meaningful. They should only be used for actual rankings. For instance: “Which region contributes most to global greenhouse gas emissions? Please rank the regions from the one contributing most to the one contributing least” from Dechezleprêtre et al. (2022).
Do not force responses. In general, you should not force respondents to answer questions unless their responses are needed to screen them at the start of the survey. For most questions, respondents can be branched appropriately even if their response to a question is missing. When you think about the branches in your survey, you should always consider where respondents who leave an item blank should go next. You can, however, “prompt” for responses. For instance, Qualtrics has a pop-up window that appears if respondents try to move to the next survey screen while leaving questions blank, asking them whether they are sure they want to leave some items unanswered.
Provide informative error messages. Your survey should provide informative error messages, i.e., messages that help the respondents recognize the error in their responses. A message such as “your answer is invalid” is not helpful; a message that specifies “please only enter integer numbers” is much more useful.
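Survey platforms typically let you customize these messages. As a language-agnostic illustration of the principle (a hypothetical validation routine, not any platform's API), the point is to tell the respondent exactly what to change:

```python
# Minimal sketch: return a specific, actionable message instead of a
# generic "your answer is invalid". Bounds are illustrative.
def validate_integer_entry(raw: str, low: int = 0, high: int = 100):
    """Return None if the entry is valid, otherwise an informative message."""
    cleaned = raw.strip().replace(",", "")
    if not cleaned.isdigit():
        return "Please enter a whole number using digits only (no text, decimals, or symbols)."
    value = int(cleaned)
    if not low <= value <= high:
        return f"Please enter a number between {low} and {high}."
    return None

# validate_integer_entry("12.5")  -> message about whole numbers
# validate_integer_entry("150")   -> message about the allowed range
# validate_integer_entry("42")    -> None (valid)
```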
Look for precedent. You are often not the first to ask survey questions on a topic. As a starting point, it would be best always to examine the literature and existing surveys for questions that have already been validated and tested. However, the fact that a question has once been used should never be a sufficient reason to use it again and does not guarantee that it will be suitable for your survey.
Pre-test. You must pre-test your questionnaire multiple times. This involves surveying not only “content experts,” i.e., people who are experts on the topic (e.g., your colleagues), but also a wider, non-expert audience. You should ask for feedback on your questionnaire. You can formally test various survey versions and run small-scale pilots on smaller samples from your target population. Pre-testing is particularly valuable when designing new questions on less-explored topics. During pilots, leave ample space for feedback in open-ended text boxes and encourage respondents to give feedback. Money spent on pilots and pre-testing is wisely spent because it can save you more money and disappointment later. Part of the testing is also to check whether your data is being recorded correctly. Further, it is invaluable to do in-depth cognitive interviews with people from your target audience, especially when studying new topics. Cognitive interviews involve having someone take the survey and share their impressions, questions, and reactions to it in real time. They are a complement to experimental pre-testing, not a substitute.
Include feedback questions. Even beyond the pilot and pre-testing rounds, you should always include feedback entry fields at the end of your survey. Some can be more general, e.g., “Do you feel that this survey was left- or right-wing biased or unbiased?” [Left-wing biased, Right-wing biased, Unbiased]. Others can be more targeted. For instance, you may want to elicit whether respondents understood the purpose of your research, which may be problematic in light of social desirability bias or experimenter demand effects (see Section 5).
4.2 Open-ended questions
Purposes of open-ended survey questions. Open-ended questions have many benefits. Thanks to ever-evolving text analysis methods, researchers can easily analyze them. Ferrario and Stantcheva (2022) provide an overview of the use of open-ended questions to elicit people’s concerns in surveys. Open-ended questions have several purposes.
First, they allow researchers to elicit people’s views and concerns on many issues without priming them with a given set of answer options (Ferrario and Stantcheva, 2022). Ferrario and Stantcheva (2022) leverage text analysis methods to study people’s first-order concerns on income taxes and estate taxes, based on answers to open-ended questions. These questions can be very broad (for instance: “When you think about federal personal income taxation and whether the U.S. should have higher or lower federal personal income taxes, what are the main considerations that come to your mind?”), more directed (for instance: “What do you think are the shortcomings of the U.S. federal estate tax?”), or very specific (for instance: “Which groups of people do you think would gain if federal personal income taxes on high earners were increased?”). Appendix Figure A-4 provides an example of how open-ended questions can shed light on the types of words used by respondents and on the different topics that matter to them based on political affiliation. Stantcheva (2022) elicits people’s main considerations about trade policy using a series of open-ended questions.
Second, they are helpful in exploratory work ahead of writing a complete survey. When unsure about relevant factors, a pilot study with open-ended questions can help you determine the answer choices to include in closed-ended questions. It may bring to light unforeseen factors and issues.
Third, they avoid leading and priming respondents when you are unsure of the suitable scales and answer options. We discuss the choice of answer scales in more detail below. In brief, answer scales should ideally approximate the actual distribution of answers in the population to avoid biasing respondents, who may look for clues in the provided options, especially when some mental work is involved in remembering something or thinking about an answer. However, on some issues, you may not know the right scales. Open-ended questions can be beneficial in these cases.
Finally, in the context of information experiments (see Section 6.3), you can use open-ended questions to validate the answers to other questions, such as quantitative ones, and to test whether a treatment changes people’s attention on a topic (as in Bursztyn et al. (2020)).
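As a very rough first pass at the kind of text analysis mentioned above, for instance tabulating the words respondents use in an open-ended answer by political affiliation, one might do something like the following sketch; the column names are hypothetical, and this is not the pipeline used in the cited papers.

```python
# Minimal sketch: most frequent (non-stopword) terms in an open-ended answer,
# split by a grouping variable. Column names are hypothetical.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

def top_words_by_group(df: pd.DataFrame, text_col: str, group_col: str, k: int = 15):
    """Return the k most frequent terms within each group."""
    results = {}
    for group, sub in df.groupby(group_col):
        vec = CountVectorizer(stop_words="english", min_df=2)
        counts = vec.fit_transform(sub[text_col].fillna(""))
        totals = counts.sum(axis=0).A1          # total count per term
        terms = vec.get_feature_names_out()
        top = totals.argsort()[::-1][:k]
        results[group] = list(zip(terms[top], totals[top]))
    return results

# top_words_by_group(responses, "open_income_tax_concerns", "party_affiliation")
```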
Best practices for open-ended questions. Some best practices for open-ended questions are as follows:
Because these questions can be more time-consuming than closed-ended ones, especially when the answers required are longer, you may need to motivate respondents to answer using prompts, such as “This question is very important to understanding tax policy. Please take your time answering it.”
To encourage more extended responses, you can consider adding follow-up questions on the next screen (without overdoing it), such as “Are there any other reasons you can think of?”
Because open-ended questions are more demanding for respondents, you should use them sparingly and, if they are essential for your research, place them earlier on in the survey while respondents are less tired.
Specify what type of answers you are looking for to guide respondents and facilitate your subsequent analysis of the data (“Please spend 1 or 2 minutes”; “Please think of several reasons...”; “Please list any words that come to your mind...”).
Adapt the visual format to the type of answers you need. For instance, if you only need a single answer, provide a single answer box. If you need multiple answers, provide multiple answer entry fields. Make the length and sizes of the entry fields appropriate to the type of answer you need (e.g., if you want respondents to write longer responses, provide ample space). To avoid issues with interpretation and mixing up of units, you can sometimes provide a template below or next to the answer space. For example, if you are looking for a dollar amount, you can put the sign “$” next to the box; if you are looking for a duration, you can add “months” next to the answer box; for a date, you can specify “MM/YYYY”.
4.3 Closed-ended questions
Qualitative versus quantitative questions. There are two general types of closed-ended ordinal questions: those that offer more vague qualitative response options (e.g., never, rarely, sometimes, always) and those that offer response options using a natural metric (e.g., once a week, twice a week, etc.). More generally, a question to ask yourself is whether a qualitative or quantitative answer to a given question is better suited to your research needs. Questions using vague quantifiers as answer options have the disadvantage that they are, by construction, vague and thus can mean different things to different people. At the same time, they are easy to understand. Furthermore, respondents may not be able to precisely assess something in a quantitative manner, so they may make errors, and variations in answers could reflect a lot of noise. We have to be realistic about some issues: it may be impossible to get precise measures of some constructs, and using vague quantifiers may be the best we can do. Still, there are plenty of best practices to minimize response errors and noise in both qualitative and quantitative questions. Overall, qualitative questions can be highly useful complements to quantitative questions. For example, Alesina et al. (2018) use a depiction of a ladder to elicit perceived mobility (see Figure 3). They also ask respondents a corresponding qualitative question: “Do you think the chances that a child from the poorest 100 families will grow up to be among the richest 100 families are:” [Close to zero, Low, Fairly low, Fairly high, High] (Alesina et al., 2018, p. 528). Multiple measurements are critical for especially important variables in your survey.
4.3.1 General advice for closed-ended questions
Use exhaustive answer options that span all possible reasonable answers. The following example does not contain all reasonable answer options. Furthermore, the options are not mutually exclusive, because they mix how and where the respondent learned about the 2017 Tax Cuts and Jobs Act.
From which one of these sources did you first learn about the 2017 tax reform (the Tax Cuts and Jobs Act, or TCJA)?
□ Radio □ Television □ Someone at work □ While at home □ While traveling to work
A better formulation of the question would instead be:
From which one of these sources did you first learn about the 2017 tax reform (the Tax Cuts and Jobs Act, or TCJA)?
□ Radio □ Television □ Internet □ Newspaper □ A person
Where were you when you first heard about it?
□ At work □ At home □ Traveling to work □ Somewhere else
You may also need to include options such as “Don’t know,” “Undecided,” or “Indifferent.” In questions with ordinal scales, there is often a middle option reflecting neutrality or indifference, such as “neither support nor oppose.” In other questions, there is a tradeoff: such answer options may be useful for respondents who genuinely do not have a view or do not know, but they also give respondents an easy way out. On balance, you should think carefully, case by case, about whether it makes sense to have such an option. If you can expect respondents to genuinely not know, you should include such an option.
Furthermore, it may be beneficial to use hybrid question types, which are closed-ended, but also include an “Other” option with an empty text field for the respondent to provide further detail.
Make sure answer categories are non-overlapping. Answer categories should be non-ambiguous, which means avoiding any overlap. Even minor overlaps can be misleading and annoying for the respondents, such as in the question “What should the marginal tax rate for incomes above $1,000,000 be?” with answer options [0% to 20%, 20% to 30%, etc.] instead of a non-overlapping scale such as [0% to 19%, 20% to 29%, etc.].
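If the brackets are generated or checked programmatically, a small sanity check along the following lines (a sketch with illustrative integer brackets, not taken from the text) can catch overlaps and gaps before the survey goes into the field:

```python
# Minimal sketch: verify that integer answer brackets are non-overlapping,
# gap-free, and cover the intended range. Brackets are illustrative.
def check_brackets(brackets, full_range):
    """Each bracket is an inclusive (low, high) pair of integers."""
    brackets = sorted(brackets)
    lo, hi = full_range
    assert brackets[0][0] == lo and brackets[-1][1] == hi, "Brackets do not cover the full range."
    for (a_lo, a_hi), (b_lo, b_hi) in zip(brackets, brackets[1:]):
        assert a_hi < b_lo, f"Brackets ({a_lo}, {a_hi}) and ({b_lo}, {b_hi}) overlap."
        assert b_lo == a_hi + 1, f"Gap between {a_hi} and {b_lo}."

check_brackets([(0, 19), (20, 29), (30, 100)], full_range=(0, 100))    # passes
# check_brackets([(0, 20), (20, 30), (30, 100)], full_range=(0, 100))  # fails: overlap
```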
Use a reasonably small number of answer options. In general, you want to avoid having respondents read through long lists of answer options and should keep the answer options list as short as possible. However, there are exceptions. For instance, when you are asking about demographic or background information, such as ethnicity, gender, etc., you should put as many categories as possible and be very inclusive.
4.3.2 Specific advice for nominal closed-ended questions
Avoid unequal comparisons. It is important that your answer options are comparable. For instance, it can be misleading if you make one answer option sound negative and another one positive.
Which of the following do you think is most responsible for the rise in wealth inequality in the US?
Mixing positive and negative options
□ Bad tax policies □ Technological change □ Greedy corporations □ Decrease in unionization rates
More comparable answer options
□ Tax policies □ Technological change □ Corporate policies □ Decrease in unionization rates
Another possibility, which is lengthier but more neutral, is to ask a question separately for each of the answer options, along the lines of:
To what extent do you feel that tax policies are responsible for the rise in wealth inequality in the US?
□ Completely responsible □ Mostly responsible □ Somewhat responsible □ Not at all responsible
and so forth for the remaining concepts.
Multiple choice questions: “check-all-that-apply” versus “forced-choice.” For questions where there can be multiple answer options selected, you have to decide between using a “forced-choice” or a “check-all-that-apply” format. Forced-choice questions ask item by item and require respondents to judge all items presented independently. Check-all-that-apply formats list all options simultaneously and ask respondents to select some of the items presented (Smyth et al., 2006). Forced-choice questions generally lead to more items being selected and respondents thinking more carefully about the answer options. As discussed below, forced-choice questions will also circumvent the problem of order effects in the answer options, whereby respondents may be tempted just to select one of the first answers and move on without considering every choice. If you can, try to convert your “check-all-that-apply” questions into individual forced-choice questions.
Check-all-that-apply question
Which of the following policies do you think could reduce inequality? Check all that apply.
□ Job re-training programs □ Higher income taxes □ Higher minimum wage □ Free early childhood education □ Anti-trust policies □ Unemployment insurance
Forced-choice question
Do you think each of the following policies could reduce inequality?
Yes No
Job re-training programs □ □
Higher income taxes □ □
Higher minimum wage □ □
Free early childhood education □ □
Anti-trust policies □ □
Unemployment insurance □ □
Order of the response options. As discussed in more detail in Section 5.1, the order in which response options are provided may not be neutral. Respondents may tend to pick the last answer (a “recency effect” most often encountered in phone or face-to-face surveys) or the first answer (a “primacy effect”). In addition to the advice of avoiding long lists of options and using forced-choice instead of select-all-that-apply, it makes sense to randomize the order of answer options for questions that do not have a natural ordering or where the ordering can be inverted.
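Survey platforms such as Qualtrics can randomize answer order natively. As a sketch of the underlying idea (hypothetical option list, not any platform's API), one would shuffle the displayed order per respondent and store it so that primacy and recency effects can be checked afterwards:

```python
# Minimal sketch: per-respondent randomization of answer-option order,
# seeded by respondent ID so the displayed order can be reconstructed later.
import random

OPTIONS = ["Tax policies", "Technological change",
           "Corporate policies", "Decrease in unionization rates"]  # illustrative

def randomized_options(respondent_id: int, options=OPTIONS):
    """Return the options in a reproducible, respondent-specific order."""
    rng = random.Random(respondent_id)
    shuffled = list(options)
    rng.shuffle(shuffled)
    return shuffled

# Record the displayed position of each option alongside the chosen answer;
# then test whether choice probabilities depend on displayed position
# (e.g., regress an indicator for being chosen on displayed position).
```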