Eliciting People’s First-Order Concerns: Text Analysis of Open-Ended Survey Questions

By Beatrice Ferrario and Stefanie Stantcheva∗
Surveys are a key tool for understanding people’s views on public policies. They let us slip into people’s minds and reveal otherwise invisible things such as attitudes, perceptions, reasonings, and beliefs. They can shed light on how people reason about important policies that shape their daily lives, such as health care, taxation, and trade policy. What efficiency and distributional impacts do people have in mind when thinking about these policies? What are their perceived goals and social objectives?
To some extent, we can learn about support for some policies by observing citizens’ political behaviors. Yet, we lack data on their more detailed policy preferences since voting rarely happens on specific and separate issues. Furthermore, it is difficult to infer the reasoning underlying people’s policy views using observational data. Survey methods are thus an invaluable complement to our other research methods.
The backbone of surveys often consists of closed-ended questions that provide a fixed set of answer options. The advantages of these questions are that answer options are standardized and streamlined across respondents and they easily lend themselves to quantitative analysis. However, in some settings, we may prime respondents to think about (and, subsequently, perhaps select) answer options that they would otherwise not have thought about. Conversely, we may omit relevant options that we do not know about. In open-ended questions, respondents are not offered answer options, but rather, an empty text entry field in which they can write freely. Open-ended survey questions can therefore circumvent some of the above-mentioned issues. By being less guided, they may teach us things that we may otherwise have missed and that we may not be used to thinking about as economists. The answers to these open-ended questions can be analyzed using text analysis methods to shed light on the first-order considerations that come to people’s minds without constraining them to think about a limited set of answer options.
∗ Ferrario: Harvard University, 1805 Cambridge Street, Cambridge, MA 02138 (e-mail: beatrice_ferrario@g.harvard.edu); Stantcheva (corresponding author): Harvard University, CEPR, and NBER, 1805 Cambridge Street, Cambridge, MA 02138 (e-mail: sstantcheva@fas.harvard.edu). We thank Chantal Pezold and Martha Fiehn for exceptional research assistance.
This paper illustrates the design and use of open-ended survey questions, focusing on the topics of income and estate taxation.
An abundant literature leverages survey data to explore people’s perceptions and preferences about tax policy and redistribution (Gimpelson and Treisman, 2018; Alesina, Stantcheva and Teso, 2018; Stantcheva, 2021; Fisman et al., 2020; Cruces, Perez-Truglia and Tetaz, 2013; Karadja, Mollerstrom and Seim, 2017; Roth and Wohlfart, 2018; Hvidberg, Kreiner and Stantcheva, 2020). Perceptions (and misperceptions) of tax rates are documented in De Bartolome (1995), Gideon (2017), Ballard and Gupta (2018), Rees-Jones and Taubinsky (2019), Chetty, Friedman and Saez (2013), Feldman, Katuščák and Kawano (2016), and Stantcheva (2021).
Text analysis methods of non-survey data, such as online media and newspaper coverage, have been applied in finance (Antweiler and Frank, 2004), macroeconomics (Baker, Bloom and Davis, 2016), and political economy (Groseclose and Milyo, 2005; Gentzkow and Shapiro, 2010; Tesei, Durante and Pinotti, 2018; Gentzkow, Kelly and Taddy, 2019). Our goal is to apply text analysis methods to data derived from answers to open-ended survey questions. A few papers in political science (Roberts et al., 2014; Brugidou, 2003) leverage open-ended survey questions, and the practice is also starting to spread to economics (Stantcheva, 2020; Houde and Wekhof, 2021).

2 PAPERS AND PROCEEDINGS MONTH YEAR

The data for this paper comes from two surveys on income and estate taxes, conducted in 2019 on 5140 U.S. residents aged 18 to 70. The sample is representative of the U.S. population along the dimensions of gender, age, income, political affiliation, and employment (see Appendix OA-1). Section I presents the application of text analysis to open-ended survey questions. Section II summarizes key results about people’s views on income and estate taxation.
I. Using Open-Ended Survey Questions
A. What Do Open-Ended Questions Measure?
Open-ended questions can go from broad to narrow. Broader open-ended questions are useful to elicit first-order, intrinsic concerns that people have before they are prompted to think of a particular policy aspect with more directed questions. Thus, it makes sense to start by asking people big picture questions such as the “main considerations” that come to their minds when they think about an issue (e.g., the income or estate tax). In our application, we then narrow the focus by asking people what a “good” tax system means to them and what its goals should be, as well as what they perceive to be the main shortcomings of the current U.S. tax system. Finally, one can ask targeted questions, such as about the effects on the U.S. economy and on different groups of people if the policy were changed (e.g., “What would be the effects on the economy if taxes on high earners were raised?”).1 Ideally, open-ended questions should be complemented with closed-ended questions for cross-validation.
It is useful to think about what the answers to open-ended questions capture. The answers of respondents who have not previously thought carefully about the topic may be “gut reactions.” These reactions are informative, as they reflect what a respondent thinks and will keep thinking, absent more learning or targeted reflection. The answers of respondents who have already thought about the topic previously or take time to think about it during the survey before answering may reflect more profound views.2 Either way, answers to open-ended questions capture the first-order considerations that matter to people and the aspects of an issue that are top of mind for them.
1 Appendix Section OA-2 provides all the questions asked.
2 The time spent on each question can be measured, so it is possible to distinguish between these two types of responses.
B. Text Analysis Methods for Open-Ended Questions
Data pre-processing
To prepare the data for text analysis, we first parse the answers to reduce the number of distinct text elements. We remove punctuation, excess spaces, numbers, misspelled words, and so-called “stop words,” which are common words that carry no intrinsic meaning such as “and” or “the.” The remaining words in each answer are then lemmatized to group all inflected forms of a word.3 Words appearing in the question itself or that occur generically in answers can also be removed (e.g., “think,” “believe,” and “should”). Appendix OA-4 describes the data pre-processing in detail. We now briefly present three text analysis methods, with more details in Appendix OA-5.
3 For instance, “policies” becomes “policy,” “were” becomes “be.”
Word Clouds
For each of the methods presented, a decision has to be made on the basic unit of analysis, i.e., the size of word groups that will be considered as a set. “N-grams” are groups of n words. In word clouds, the font size for each n-gram is proportional to its frequency. Word clouds are best used as a first step in visualizing the data and for scanning answers quickly. Their weakness is that they do not account for synonyms. Hence, topics for which there are many possible words to express the same thought may be artificially diluted, while niche topics that feature clear buzzwords may be inflated in importance.
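The pre-processing steps and the n-gram frequencies that drive word cloud font sizes can be sketched as follows. This is a minimal illustration, not the procedure of Appendix OA-4: the stop-word list, the generic-word list, and the lemma table are tiny hypothetical stand-ins, misspelled-word removal is omitted, and a real application would rely on a proper lemmatizer and dictionary.

```python
import re
from collections import Counter

# Illustrative lists only (assumptions, far from complete).
STOP_WORDS = {"and", "the", "a", "an", "of", "to", "is", "are", "on", "for", "in", "it", "i"}
GENERIC_WORDS = {"think", "believe", "should"}  # generic answer words named in the text
DROP = STOP_WORDS | GENERIC_WORDS
LEMMAS = {"policies": "policy", "were": "be", "taxes": "tax"}  # toy lemma table


def preprocess(answer):
    """Lowercase, strip punctuation and numbers, lemmatize, drop stop words."""
    text = re.sub(r"[^a-z\s]", " ", answer.lower())  # keep letters and whitespace only
    tokens = text.split()                             # split() also collapses excess spaces
    tokens = [LEMMAS.get(t, t) for t in tokens]       # dictionary lookup as a stand-in lemmatizer
    return [t for t in tokens if t not in DROP]


def ngrams(tokens, n):
    """Return all groups of n consecutive tokens, joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_frequencies(answers, n=2):
    """Count n-grams over all answers; a word cloud scales font size with these counts."""
    counts = Counter()
    for answer in answers:
        counts.update(ngrams(preprocess(answer), n))
    return counts


# Made-up answers, for illustration only.
answers = [
    "I think the rich should pay their fair share!",
    "Taxes are too high for the middle class.",
    "The middle class pays 30% and the rich use loopholes.",
]
freqs = ngram_frequencies(answers, n=2)
print(freqs.most_common(1))  # [('middle class', 2)]
```

Because n-grams are built after stop words are dropped, some pairs span a removed word (here, "high middle"). Whether to form n-grams before or after filtering is a design choice worth checking in sensitivity analysis.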
Keyness analysis
Keyness analysis is based on a relative frequency analysis that compares the use of n-grams between two groups (a reference and a target/study group). The keyness score of an n-gram is based on the χ² test statistic for the null hypothesis that the propensity to use the n-gram is the same for the reference and target groups. In a nutshell, the keyness score of a term measures how characteristic this term is of the reference group. Words that are common but used about equally by the two groups do not have a high keyness score.
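As a rough sketch of the computation, the χ² statistic for one n-gram can be obtained from a 2×2 table (group by "this n-gram" versus "all other n-grams"). The function names, the absence of a continuity correction, and the sign convention are assumptions made for this illustration: the score is positive when the n-gram is relatively more frequent in the first group passed and negative otherwise. The counts are made up.

```python
def keyness(count_target, total_target, count_reference, total_reference):
    """Signed chi-squared statistic for one n-gram from a 2x2 contingency table."""
    a = count_target                        # occurrences of the n-gram in the target group
    b = total_target - count_target         # all other n-grams in the target group
    c = count_reference                     # occurrences of the n-gram in the reference group
    d = total_reference - count_reference   # all other n-grams in the reference group
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Assumed sign convention: positive if relatively more frequent in the target group.
    sign = 1.0 if a * d >= b * c else -1.0
    return sign * chi2


def keyness_scores(target_counts, reference_counts):
    """Score every n-gram seen in either group, sorted from highest to lowest."""
    total_t = sum(target_counts.values())
    total_r = sum(reference_counts.values())
    terms = set(target_counts) | set(reference_counts)
    scores = {
        term: keyness(target_counts.get(term, 0), total_t,
                      reference_counts.get(term, 0), total_r)
        for term in terms
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up n-gram counts for two groups of respondents.
target = {"fair share": 30, "middle class": 20, "flat tax": 50}
reference = {"fair share": 10, "middle class": 20, "flat tax": 70}
ranked = keyness_scores(target, reference)
print(ranked[0])  # ('fair share', 12.5)
```

In this toy example "middle class" is common in both groups but used equally, so its score is zero, which matches the point that frequent terms need not have high keyness.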
Topic Analysis
The topic analysis is based on a keywords-count model. Topics are defined by sets of keywords. To extract the topics and associated keywords, approaches range from manual to semi-supervised or unsupervised (see Appendix ?? for a summary of some key methods). Many of these methods were developed for longer texts and are not well suited to survey answers, which are shorter. In practice, given the manageable sample sizes, a more guided approach does better. We recommend extracting the “document-term matrix” (the matrix of frequencies of terms in each answer), plotting the distributions of words, and checking many sample answers to better understand how respondents use words. Oftentimes, themes and commonly used words appear quite clearly from the frequency distributions. It is, however, important to do sensitivity checks on the topics delineated and on the keywords included. Other decisions that need to be made (and that warrant sensitivity analysis) include whether a topic that a respondent mentions multiple times is counted only once, and whether to filter out differences in answer lengths across groups by computing topic distributions within groups.
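The guided approach can be illustrated with a small sketch that builds a document-term matrix, counts keyword hits per topic, and computes topic shares within groups. The topics, keywords, group labels, and function names below are hypothetical and chosen only for illustration; the `count_once` flag corresponds to the choice of whether repeated mentions by one respondent count once.

```python
from collections import Counter, defaultdict

# Hypothetical topics and keywords, for illustration only.
TOPICS = {
    "fairness": {"fair", "share", "loophole"},
    "government spending": {"spending", "waste", "debt"},
}


def document_term_matrix(token_lists):
    """Return (vocabulary, rows), where rows[i][j] counts vocabulary[j] in answer i."""
    vocabulary = sorted({t for tokens in token_lists for t in tokens})
    rows = []
    for tokens in token_lists:
        counts = Counter(tokens)
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows


def topic_counts(tokens, topics=TOPICS, count_once=True):
    """Count keyword hits per topic in one answer.

    With count_once=True, a topic mentioned several times by a respondent counts once.
    """
    result = {}
    for topic, keywords in topics.items():
        hits = sum(1 for t in tokens if t in keywords)
        result[topic] = min(hits, 1) if count_once else hits
    return result


def topic_shares_by_group(token_lists, groups, topics=TOPICS, count_once=True):
    """Within each group, return each topic's share of all topic mentions in that group."""
    totals = defaultdict(Counter)
    for tokens, group in zip(token_lists, groups):
        totals[group].update(topic_counts(tokens, topics, count_once))
    shares = {}
    for group, counts in totals.items():
        n_mentions = sum(counts.values())
        shares[group] = {
            topic: (counts[topic] / n_mentions if n_mentions else 0.0) for topic in topics
        }
    return shares


# Made-up, already pre-processed answers and group labels.
token_lists = [
    ["rich", "fair", "share"],
    ["government", "waste", "debt", "waste"],
    ["fair", "tax", "spending"],
]
groups = ["A", "B", "A"]

vocabulary, rows = document_term_matrix(token_lists)
shares = topic_shares_by_group(token_lists, groups)
print(shares["A"])
```

Rerunning `topic_shares_by_group` with `count_once=False`, or with a modified keyword set, is one simple way to carry out the sensitivity checks mentioned above.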
II. Application: How Do People Think About Taxes?
To apply these methods to how people think about income and estate taxation, we focus on answers to the broad question “What are your main considerations?” when thinking about income or estate taxes, respectively. The other open-ended questions are analyzed in Appendix OA-8. Figure 1 shows the word clouds derived from the responses. For the income tax, respondents express disagreement with the current levels of taxes and views on the direction in which to change them (“lower
[Word cloud images omitted. Panels: (a) Income Tax; (b) Estate Tax]
Figure 1. Main Considerations about Income and Estate Taxes
Note: Word clouds based on answers to the open-ended question about respondents’ main considerations about income and estate taxes.