
Eliciting People’s First-Order Concerns: Text Analysis of Open-Ended Survey Questions

https://scholar.harvard.edu/files/stantcheva/files/text_analysis_of_open-ended_questions.pdf

By Beatrice Ferrario and Stefanie Stantcheva∗

Surveys are a key tool for understanding people’s views on public policies. They let us slip into people’s minds and reveal otherwise invisible things such as attitudes, perceptions, reasonings, and beliefs. They can shed light on how people reason about important policies that shape their daily lives, such as health care, taxation, and trade policy. What efficiency and distributional impacts do people have in mind when thinking about these policies? What are their perceived goals and social objectives?

To some extent, we can learn about support for some policies by observing citizens’ political behaviors. Yet, we lack data on their more detailed policy preferences since voting rarely happens on specific and separate issues. Furthermore, it is difficult to infer the reasoning underlying people’s policy views using observational data. Survey methods are thus an invaluable complement to our other research methods.

The backbone of surveys often consists of closed-ended questions that provide a fixed set of answer options. The advantages of these questions are that answer options are standardized and streamlined across respondents and they easily lend themselves to quantitative analysis. However, in some settings, we may prime respondents to think about (and, subsequently, perhaps select) answer options that they would otherwise not have thought about. Conversely, we may omit relevant options that we do not know about. In open-ended questions, respondents are not offered answer options, but rather an empty text entry field in which they can write freely. Open-ended survey questions can therefore circumvent some of the above-mentioned issues. By being less guided, they may teach us things that we may otherwise have missed and that we may not be used to thinking about as economists. The answers to these open-ended questions can be analyzed using text analysis methods to shed light on the first-order considerations that come to people’s minds without constraining them to think about a limited set of answer options.

∗ Ferrario: Harvard University, 1805 Cambridge Street, Cambridge, MA 02138 (e-mail: beatrice_ferrario@g.harvard.edu); Stantcheva (corresponding author): Harvard University, CEPR, and NBER, 1805 Cambridge Street, Cambridge, MA 02138 (e-mail: sstantcheva@fas.harvard.edu). We thank Chantal Pezold and Martha Fiehn for exceptional research assistance.

This paper illustrates the design and use of open-ended survey questions, focusing on the topics of income and estate taxation.

An abundant literature leverages survey data to explore people’s perceptions and preferences about tax policy and redistribution (Gimpelson and Treisman, 2018; Alesina, Stantcheva and Teso, 2018; Stantcheva, 2021; Fisman et al., 2020; Cruces, Perez-Truglia and Tetaz, 2013; Karadja, Mollerstrom and Seim, 2017; Roth and Wohlfart, 2018; Hvidberg, Kreiner and Stantcheva, 2020). Perceptions (and misperceptions) of tax rates are documented in De Bartolome (1995), Gideon (2017), Ballard and Gupta (2018), Rees-Jones and Taubinsky (2019), Chetty, Friedman and Saez (2013), Feldman, Katuščák and Kawano (2016), and Stantcheva (2021).

Text analysis methods of non-survey data, such as online media and newspaper coverage, have been applied in finance (Antweiler and Frank, 2004), macroeconomics (Baker, Bloom and Davis, 2016), and political economy (Groseclose and Milyo, 2005; Gentzkow and Shapiro, 2010; Tesei, Durante and Pinotti, 2018; Gentzkow, Kelly and Taddy, 2019). Our goal is to apply text analysis methods to data derived from answers to open-ended survey questions. A few papers in political science (Roberts et al., 2014; Brugidou, 2003) leverage open-ended survey questions, and the practice is also starting to spread to economics (Stantcheva, 2020; Houde and Wekhof, 2021).

PAPERS AND PROCEEDINGS MONTH YEAR

The data for this paper comes from two surveys on income and estate taxes, conducted in 2019 on 5140 U.S. residents aged 18 to 70. The sample is representative of the U.S. population along the dimensions of gender, age, income, political affiliation, and employment (see Appendix OA-1). Section I presents the application of text analysis to open-ended survey questions. Section II summarizes key results about people’s views on income and estate taxation.

I. Using Open-Ended Survey Questions

A. What Do Open-Ended Questions Measure?

Open-ended questions can go from broad to narrow. Broader open-ended questions are useful to elicit first-order, intrinsic concerns that people have before they are prompted to think of a particular policy aspect with more directed questions. Thus, it makes sense to start by asking people big picture questions such as the “main considerations” that come to their minds when they think about an issue (e.g., the income or estate tax). In our application, we then narrow the focus by asking people what a “good” tax system means to them and what its goals should be, as well as what their main perceived shortcomings of the current U.S. tax system are. Finally, one can ask targeted questions, such as about the effects on the U.S. economy and on different groups of people if the policy were changed (e.g., “What would be the effects on the economy if taxes on high earners were raised?”).¹ Ideally, open-ended questions should be complemented with closed-ended questions for cross-validation.

It is useful to think about what the answers to open-ended questions capture. The answers of respondents who have not previously thought carefully about the topic may be “gut reactions.” These reactions are informative, as they reflect what a respondent thinks and will keep thinking, absent more learning or targeted reflection. The answers of respondents who have already thought about the topic previously or take time to think about it during the survey before answering may reflect more profound views.² Either way, answers to open-ended questions capture the first-order considerations that matter to people and the aspects of an issue that are top of mind for them.

¹ Appendix Section OA-2 provides all the questions asked.

² The time spent on each question can be measured, and thus it is possible to distinguish between these two types of responses.

B. Text Analysis Methods for Open-Ended Questions

Data pre-processing

To prepare the data for text analysis, we first parse the answers to reduce the number of distinct text elements. We remove punctuation, excess spaces, numbers, misspelled words, and so-called “stop words,” which are common words that carry no intrinsic meaning such as “and” or “the.” The remaining words in each answer are then lemmatized to group all inflected forms of a word.³ Words appearing in the question itself or that occur generically in answers can also be removed (e.g., “think,” “believe,” and “should”). Appendix OA-4 describes the data pre-processing in detail. We now briefly present three text analysis methods, with more details in Appendix OA-5.

³ For instance, “policies” becomes “policy,” “were” becomes “be.”

Word Clouds

For each of the methods presented, a decision has to be made on the basic unit of analysis, i.e., the size of word groups that will be considered as a set. “N-grams” are groups of n words. In word clouds, the font size for each n-gram is proportional to its frequency. Word clouds are best used as a first step in visualizing the data and for scanning answers quickly. Their weakness is that they do not account for synonyms. Hence, topics for which there are many possible words to express the same thought may be artificially diluted, while niche topics that feature clear buzzwords may be inflated in importance.
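As a rough illustration of the pre-processing steps and of the n-gram counts that underlie a word cloud, the following Python sketch may help. It is not the pipeline described in Appendix OA-4: the stop-word list, the list of generic words, and the lemma lookup are tiny illustrative stand-ins (a real application would rely on a full stop-word list and a proper lemmatizer), the removal of misspelled words is omitted, and the function names `preprocess` and `ngram_counts` are hypothetical.

```python
import re
from collections import Counter

# Illustrative stand-ins only; real lists would be far longer.
STOP_WORDS = {"i", "and", "the", "a", "an", "of", "to", "on", "in", "is", "are"}
GENERIC_WORDS = {"think", "believe", "should"}  # generic words named in the text
LEMMAS = {"policies": "policy", "were": "be", "taxes": "tax"}  # toy lemma lookup


def preprocess(answer):
    """Turn one free-text answer into a list of cleaned tokens."""
    # Keeping only runs of letters drops punctuation, numbers, and extra spaces.
    tokens = re.findall(r"[a-z]+", answer.lower())
    tokens = [t for t in tokens if t not in STOP_WORDS]
    tokens = [LEMMAS.get(t, t) for t in tokens]  # group inflected forms
    return [t for t in tokens if t not in GENERIC_WORDS]


def ngram_counts(docs, n=1):
    """Count n-grams (groups of n consecutive tokens) across all answers."""
    counts = Counter()
    for doc in docs:
        counts.update(" ".join(doc[i:i + n]) for i in range(len(doc) - n + 1))
    return counts


answers = [
    "Taxes on the rich are too low, and the policies favor the wealthy!!",
    "I think 50 policies were fair",
]
docs = [preprocess(a) for a in answers]
bigrams = ngram_counts(docs, n=2)
```

A word cloud would then scale the font size of each n-gram with its count in `bigrams` (or in the unigram counts obtained with `n=1`).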

Keyness analysis

Keyness analysis is based on a relative frequency analysis that compares the use of n-grams between two groups (a reference and a target/study group). The keyness score of an n-gram is based on the χ² test statistic for the null hypothesis that the propensity to use the n-gram is the same for the reference and target groups. In a nutshell, the keyness score of a term measures how characteristic this term is of the target group relative to the reference group. Words that are common but used about equally by the two groups do not have a high keyness score.
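To make the χ² comparison concrete, here is a minimal Python sketch of a signed keyness score. It rests on several assumptions that are not taken from the paper: frequencies are counted at the token level, the statistic is the plain Pearson χ² for a 2×2 table without continuity correction, and the sign convention (positive when the term is relatively more frequent in the target group) is chosen for illustration. The function name `keyness_scores` is hypothetical.

```python
from collections import Counter


def keyness_scores(target_docs, reference_docs):
    """Signed Pearson chi-squared keyness per term (no continuity correction).

    Each argument is a list of tokenized answers (lists of strings).
    Positive scores: term relatively more frequent in the target group.
    """
    target = Counter(tok for doc in target_docs for tok in doc)
    reference = Counter(tok for doc in reference_docs for tok in doc)
    n_target, n_reference = sum(target.values()), sum(reference.values())

    scores = {}
    for term in set(target) | set(reference):
        a, b = target[term], reference[term]   # uses of the term in each group
        c, d = n_target - a, n_reference - b   # all other tokens in each group
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        if denom == 0:                         # degenerate table, no information
            scores[term] = 0.0
            continue
        chi2 = (a + b + c + d) * (a * d - b * c) ** 2 / denom
        scores[term] = chi2 if a * d > b * c else -chi2
    return scores


target_docs = [["tax", "fair"], ["fair", "share"]]
reference_docs = [["tax", "lower"], ["lower", "tax"]]
scores = keyness_scores(target_docs, reference_docs)
```

In this toy example, "fair" receives a positive score and "lower" a negative one, while a term used at the same rate by both groups would score zero however common it is.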

Topic Analysis

The topic analysis is based on a keywords-count model. Topics are defined by sets of keywords. To extract the topics and associated keywords, approaches range from manual to semi-supervised or unsupervised (see Appendix ?? for a summary of some key methods). Many of these methods were developed for longer texts and are not well suited to survey answers, which are shorter. In practice, given the manageable sample sizes, a more guided approach does better. We recommend extracting the “document-term matrix” (matrix of frequencies of terms in each answer), plotting the distributions of words, and checking many sample answers to better understand how words are used by respondents. Oftentimes, themes and commonly used words appear quite clearly from the frequency distributions. It is, however, important to do sensitivity checks on the topics delineated and on the keywords included. Among other decisions that need to be made (and which warrant sensitivity analysis) are whether to count a topic that is mentioned multiple times by a respondent only once or not, and whether to filter out differences in answer lengths across groups by computing topic distributions within groups.
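The guided approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the topics and keywords in `TOPICS` are made up for the example and are not the ones used in the paper, the answers are assumed to be already pre-processed into token lists, and the function names are hypothetical. The `count_once` flag corresponds to the choice of whether a topic mentioned several times by one respondent is counted only once.

```python
from collections import Counter

# Hypothetical topics and keywords, for illustration only.
TOPICS = {
    "fairness": {"fair", "share", "loophole"},
    "government": {"government", "spending", "waste"},
}


def document_term_matrix(docs):
    """Return the vocabulary and a matrix of term frequencies per answer."""
    vocab = sorted({tok for doc in docs for tok in doc})
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def topic_shares(docs, topics, count_once=True):
    """Share of answers mentioning each topic (count_once=True),
    or mean number of keyword mentions per answer (count_once=False)."""
    mentions = Counter()
    for doc in docs:
        for topic, keywords in topics.items():
            hits = sum(tok in keywords for tok in doc)
            mentions[topic] += min(hits, 1) if count_once else hits
    return {topic: mentions[topic] / len(docs) for topic in topics}


docs = [
    ["fair", "share", "tax"],
    ["government", "waste", "fair"],
    ["lower", "tax"],
]
vocab, dtm = document_term_matrix(docs)
shares_once = topic_shares(docs, TOPICS, count_once=True)
shares_all = topic_shares(docs, TOPICS, count_once=False)
```

Running `topic_shares` separately on each group's answers gives within-group topic distributions, and comparing `shares_once` with `shares_all` is one simple sensitivity check of the kind the text recommends.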

II. Application: How Do People Think About Taxes?

To apply these methods to how people think about income and estate taxation, we focus on answers to the broad question “What are your main considerations?” when thinking about income or estate taxes, respectively. The other open-ended questions are analyzed in Appendix OA-8. Figure 1 shows the word clouds derived from the responses. For the income tax, respondents express disagreement with the current levels of taxes and views on the direction in which to change them (“lower

[Figure 1: word clouds. Panel (a): Income Tax. Panel (b): Estate Tax.]

Figure 1. Main Considerations about Income and Estate Taxes

Note: Word clouds based on answers to open-ended question about respondents’ main considerations about income and estate taxes.
