68th
Asking Critical Questions:
Toward a Sustainable Future for
Public Opinion and Social Research
2013 Conference Abstracts
www.aapor.org
WAPOR
66th Annual Conference
May 14 – 16, 2013
Boston University, Photonic Center
Boston, Massachusetts
AAPOR
68th Annual Conference
May 16 – 19, 2013
Seaport Boston Hotel &
Seaport World Trade Center
Boston, Massachusetts
AAPOR 68th Annual Conference
Thursday, May 16
1:30 p.m. – 3:00 p.m.
AAPOR Concurrent Session A
Innovations in Traditional Questionnaire Evaluation Methods
Getting Your Money’s Worth! Targeting Resources to Make Cognitive Interviews
Most Effective
Jaki McCarthy, National Agricultural Statistics Service
Cognitive interviewing has long been hailed as an effective technique to evaluate and improve
survey questions. However, cognitive interviews are typically resource intensive and thus
conducted on limited sets of questions and with limited sets of respondents. To be most
effective, questions that are most likely to have adverse impacts on data quality should be
targeted. In addition, respondents most likely to exhibit problems with these questions should
likewise be selected for testing. One way to target a subset of questions is to use available
information from previous data collections to identify questions with the greatest number of
quality problems. For example, high edit or item imputation rates, greater numbers of requests
for assistance answering these questions, etc. Once a subset of questions has been identified
as good candidates for cognitive testing, respondents must also be selected. Again, information
from existing data sets can be used to identify characteristics of respondents most likely to
exhibit problems. Data mining techniques, such as classification trees, can be used to
determine the type of respondents most likely to contribute to low quality responses. These
criteria can be used to select respondents for cognitive interviews. In addition, knowing the
pertinent characteristics of these respondents may also suggest useful probes that can be
included in the cognitive interviews. Once questions have been revised based on the cognitive
interviews, the same indicators of quality can be used to measure the improvement in data
collection using the new questions. This approach has been employed in making revisions to questions on the Census of Agriculture; a case study will illustrate how this is an effective use of scarce testing resources.
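As an illustration of the data-mining step described above, the sketch below (Python, with hypothetical file and variable names) fits a small classification tree to identify respondent characteristics associated with low-quality reports; it is a minimal sketch of the general technique, not the Census of Agriculture workflow.
```python
# Sketch: use a classification tree to find respondent characteristics
# associated with low-quality responses (e.g., heavy editing/imputation).
# Column names, the file name, and the quality flag are hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

frame = pd.read_csv("census_of_ag_respondents.csv")   # hypothetical file
features = ["farm_size_acres", "operator_age", "has_livestock", "filed_by_mail"]
X = pd.get_dummies(frame[features], drop_first=True)
y = frame["low_quality_flag"]                          # 1 = heavily edited/imputed

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=50, random_state=0)
tree.fit(X_train, y_train)

# The printed rules describe respondent subgroups most likely to produce
# low-quality data -- candidates for cognitive-interview recruitment.
print(export_text(tree, feature_names=list(X.columns)))
print("holdout accuracy:", tree.score(X_test, y_test))
```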
Conducting Cognitive Interviews Over the Phone: Benefits and Challenges
Harmoni Noel, American Institutes for Research
Cognitive interviews are commonly used in the survey research world as a pre-testing method
to test survey questions before they go into the field. They can also be used to test
comprehension of other printed materials such as fact sheets or research summaries for
clinicians on a variety of treatments or conditions. Cognitive interviews can show how
respondents understand a question and identify potential sources of response error in resulting
survey data. Typically, they are done face-to-face; however, some target populations such as
clinicians or farmers are very difficult to reach in-person and other interview modes such as
telephone interviewing may be more feasible and less costly. An additional benefit of doing
cognitive interviews over the phone would be the ability to generate a nationally representative
sample. To date, little research has examined the effectiveness of conducting cognitive
interviews over the telephone, but others have explored other alternative methods for
conducting cognitive interviews such as self-administered Web surveys with promising success
(Edgar 2012). Many researchers are facing budget and staff time constraints at the same time that respondents are becoming harder to contact in person and demand for larger samples has increased. Alternative approaches for conducting cognitive interviews may be the way of the
future. This paper will present insights into the logistics, benefits and challenges related to
conducting cognitive interviews over the phone. Data will be generated from interviewing people
working on different projects that utilized phone cognitive interviews to learn more about their
experiences with this method. Findings will be based on a qualitative analysis of themes
identified across their experiences. For example, it will explore the challenges related to not
having body language signals as indicators of affect or intention during the cognitive interview
process. In addition, this paper will draw some comparisons to in-person interviews.
Self-Administered Cognitive Interviewing
Jennifer Edgar, U.S. Bureau of Labor Statistics
Cognitive interviewing is traditionally an in-person pretesting method. The interaction between
the interviewer and participant allows for in-depth probing allowing the researcher to use
spontaneous probes designed to elicit explanations of the participant’s response processes.
Cognitive interviewing has been used to explore several stages and potential issues in the
response process, including comprehension, retrieval, judgment and response. Past research
has shown that the goals of cognitive interviewing can be met using an unmoderated format, where participants think aloud and respond to scripted probes without researcher intervention (Edgar, 2011). This approach was found to be promising in terms of the quantity and quality of data collected, as well as potential efficiencies in the costs and time required to collect the data. This study builds on past work, comparing data collected using
traditional cognitive interviewing techniques to data collected using unmoderated interviewing
via the web. The quality of information collected in both modes is compared to determine if
unmoderated cognitive interviewing can capture data equivalent to what was collected in the
traditional lab setting. Specifically, respondent retrieval and comprehension are studied to see if
both aspects of the response process can be understood using information collected online. The
efficiency of the unmoderated method will also be evaluated, in terms of the costs and
resources required to collect and analyze the data.
Using Web Ex to Conduct Usability Testing of an On-Line Survey Instrument
Kristin Stettler, U.S. Census Bureau
Generally, usability interviewing is conducted in-person, to allow the researcher to observe the
interaction between the participant and the instrument, and to conduct in-depth follow-up
probing. The Census Bureau and the National Science Foundation conduct a bi-annual survey
of state government R&D. Given a tight timeframe and respondents who are geographically
scattered, it would have been difficult and costly to conduct an adequate number of usability
tests in person in time for the survey to go into production on schedule. Therefore, we
researched interactive on-line options using web-based conferencing software. Using Web Ex,
we were able to conduct a portion of the usability interviews remotely, where the researchers in
the Washington DC area observed and interacted one-on-one with on-line survey users in
several locations throughout the U.S. This paper presents the reasoning behind our decision to
choose Web Ex, the pros and cons of doing the interviews remotely on-line and suggestions for
others who may be considering usability testing in this manner.
The Web Option in Multi-Mode Surveys
The Effects of Pushing Web in a Mixed-Mode Establishment Data Collection
Chris Ellis, RTI International
Mixed-mode data collection is increasingly becoming a standard in survey research methods,
especially when inclusion of Web-based data collection is anticipated to increase data quality
(de Leeuw 2005; Dillman 2000; Schaefer and Dillman 1998). However, offering the respondent
the choice of mode can lead to unintended results, such as increased complexity or lower
response rates (Medway and Fulton 2012). While “pushing” a particular mode (e.g. Web) may
increase use, it risks lowering overall response rates (Mooney et al. 2012). Thus, there is often a tension concerning if, when, and how to transition ongoing collections with single-mode origins, such as a paper form or questionnaire, to a mixed-mode methodology. The Deaths in Custody Reporting Program (DCRP), a data collection measuring inmate mortality, began in 2000. Authorized by Congress and funded by the Bureau of Justice Statistics (BJS),
the DCRP collects data on the circumstances surrounding deaths occurring in state prisons and
local jails. It is the only national statistical collection that obtains comprehensive information
about deaths in adult correctional facilities. RTI and BJS embedded a methodological
experiment within the 2012 mailing to test the effects of concurrently offering multiple modes,
but with a “push” of the Web option for some respondents. All agencies in the data collection
were offered login credentials and information to utilize the Web option. A treatment of
withholding paper forms provided in prior years was introduced, with a control group receiving
paper forms. Assignment to treatment and control groups considered prior years’ mode
selection. We will examine the results of the experiment, including timing, response rates, data quality measures, and variable costs associated with the subgroups, in the context of a longitudinal establishment study.
Internet Response for the Decennial Census – 2012 National Census Test
Courtney N. Reiser, U.S. Census Bureau
The Census Bureau has already committed to using the Internet as a primary response option
in the 2020 Census. With this commitment in mind, the 2012 National Census Test (NCT) was
developed to research the design and implementation of a secure, user-friendly online survey
instrument. The primary goal of the NCT was to evaluate within-household coverage strategies
for an electronic survey instrument. A secondary goal was to evaluate self-response rates of
various mixed-mode contact strategies. This paper will focus on that secondary goal.
Experimental contact strategies, which build off previous Census and American Community
Survey research, utilize an Internet Push methodology with additional reminders, new
motivational wording, and various timing strategies for the paper questionnaire mailing. Under
the 2012 NCT Internet Push approach, households did not receive a paper questionnaire in the
initial mailing but instead received an instruction card with information on how to provide
responses online. Paper questionnaires were mailed to households who did not respond by a
pre-determined date. This paper examines the proportion of Internet responses and overall self-
response rates, including Internet, telephone, and mail responses for each of six experimental
contact strategies.
Comparing the Effects of Mode Design on Response Rate, Representativeness,
and Cost Per Complete in Mixed-Mode Surveys Conducted in New Jersey
Ryan Tully, Princeton University; Amy Lerman, Princeton University
Through a meta-analysis of recent split design surveys, Medway and Fulton (2012) find that
mixed mode surveys “offering concurrent Web option in mail surveys results in a significant
reduction in the response rate” (p. 10). In 2011, Princeton University fielded three consecutive
surveys among residents of Princeton, NJ using Web-only, concurrent Web and mail, and
sequential Web and mail mode options. These surveys utilized nearly identical survey
instruments as well as similar contact strategies as outlined by Dillman, Smyth, and Christian
(2009). In analyzing the data, we did not find a statistically significant difference in response rates between the Web-only mode option (AAPOR RR3 50.2%) and the concurrent Web and mail option (AAPOR RR3 47.7%). However, we did find that the use of the sequential Web and mail
mode option had a statistically significant higher response rate (AAPOR RR3 57.0%) than the
other mode options. Our study further analyzed the impact of mode design on the
representativeness of the respondent pool, the probability of joining an online panel, and the
overall cost per complete. Our results showed that the use of the sequential Web and mail mode option produced a more representative respondent pool than the other mode options and a greater participation rate in our online panel. Additionally, the study found that the use of
the sequential Web and mail mode design produced substantial savings in the cost per
complete compared to the concurrent Web and mail mode design.
Changing to a Mixed-Mode Design: The Role of Mode in Respondents’ Decisions
About Participation in the Fifth Wave of Understanding Society’s Innovation
Panel
Debbie Collins, NatCen Social Research; Martin Mitchell, NatCen Social Research; Mari
Toomes, NatCen Social Research
Understanding Society is a large panel survey, involving 100,000 individuals living in
households in Great Britain. In 2012, for the first time, a sequential mixed mode approach was
piloted, involving first Web and then face-to-face data collection for non-responders to the Web.
The questionnaire was designed to collect equivalent data in both modes, using a single
instrument. The pilot was undertaken with members of Understanding Society’s Innovation
Panel (IP), who may have taken part in up to four previous waves of data collection, all involving
face-to-face interviews. Panel members were randomly allocated to either a mixed mode or
single mode data collection group, the latter involving only a face-to-face interview. This was
done, in part, to assess the impact of adopting a sequential mixed mode design on response
rates. While the Web response was higher than expected, a statistically significant difference in
response rates between the two groups (mixed mode and single mode) was found, with the
response rate for individuals being lower among the mixed mode group. Moreover, fewer interviews were achieved with all members of the household in the mixed mode group than in the single mode group. To understand more about why these differences occurred, we
undertook qualitative follow up interviews with members of the mixed mode group to answer two
specific questions.
Why were respondents in the mixed mode sample group, who did not respond by
Web, less likely to participate in a face-to-face interview than those in the single
mode group?
Why were members of households where one other person had completed by Web
less willing to take part in the survey, in either mode?
This paper addresses these two questions, presenting findings from the qualitative research and discussing the implications for panel surveys planning to move to a mixed mode design.
Utilizing the Web in a Multi-Mode Survey
Lekha Venkataraman, NORC at the University of Chicago
While there has been increasing interest in Web based surveys, little research exists regarding
how the Web fits into a multi-mode survey and what techniques can be used to increase Web
participation. The presentation will focus on two populations surveyed for the National Survey of
Early Care and Education (NSECE), center-based and home-based providers (~12,000 cases). Both populations had a choice of completion modes (CAPI, CATI, or Web), yet the Web yielded significantly more completes than the other two modes. The NSECE utilized various
mail, phone and email prompting strategies as well as incentive strategies which proved to have
varying levels of success. In this paper we will investigate what led respondents and
interviewers toward Web completion rather than other modes, as well as which prompting
strategies were most likely to result in increased Web participation.
Issues in Landline and Cell Phone Dual Frame
RDD Survey Design
Benefits of a Cell Only Sample for Oversampling Households with Children or
Entire Sample
Marcus Berzofsky, RTI International
The Ohio Medicaid Assessment Study (OMAS) is a large dual frame study designed to develop
key health and health care utilization metrics for families living in the state of Ohio. The OMAS
oversamples families with children, African Americans, and Hispanics while also trying to
achieve accurate county-level estimates. A dual frame (landline/cell) sample was selected, with
75% of the telephone numbers allocated to landline and 25% allocated to cell phone. The
oversample of households with children was recruited from both frames while the oversample of
minorities was from landline numbers only. However, in all cases the cell phone sample
produced a higher proportion of the populations of interest. In this paper, we model the OMAS
field experience to test the hypothesis that an all cell phone sample might produce similar
quality at the same or lower cost than a dual frame design. We examine bias introduced from an
all cell phone sample. We also examine the cost/quality trade-off for achieving the over-sample
goals of families with children and race/ethnicity. The paper concludes by suggesting other cell phone/landline allocation strategies to achieve the goals of OMAS and, by extension, other similar surveys.
Special Considerations for Weighting Local-Area Surveys
Mike Battaglia, Battaglia Consulting Group, LLC
Local-area surveys such as the New York City Community Health Survey (NYC CHS) and the
Los Angeles County Health Survey (LACHS) produce estimates for adults residing in
households in NYC and its five boroughs, and in Los Angeles County, respectively. Both
surveys target specific sample sizes of adults in geographic subareas: 42 United Hospital Fund
(UHF) neighborhoods for NYC CHS and 8 Service Planning Areas for LACHS. A key aspect of
the weighting methodology for local-area surveys is post-stratification to population control totals, such as age, gender, race/ethnicity, education, marital status, and home ownership. Obtaining up-to-date control totals can be challenging when the available population data are for subareas other than those of interest (e.g., ZIP Codes, Census Tracts, and Block Groups). We discuss the strengths and weaknesses of the sources for control totals (the Census Bureau Population Estimates Program, the 2010 Census, the American Community Survey (ACS) tabulation program, and the ACS public-use microdata sample (PUMS)), and describe the construction of subarea control totals for the NYC CHS and LACHS. We then evaluate the
impact of including or excluding adults in non-residential housing such as college dormitories,
prisons and nursing homes. For example, when weighting a Manhattan neighborhood that includes a university, not limiting the population control totals to adults living in households results in 6,782 too many adults age 18-29 after weighting (38,250 instead of 31,468). The
inclusion or exclusion of populations in group quarters should be considered when constructing
demographic control totals, particularly for subarea weighting.
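The group-quarters example above comes down to which control totals the weights are scaled to. A minimal post-stratification sketch is shown below, with hypothetical cell labels and counts (only the 18-29 household total is taken from the abstract); it is an illustration of the adjustment step, not the NYC CHS or LACHS weighting code.
```python
# Sketch: post-stratify survey weights to subarea control totals.
# Cell labels and counts are hypothetical; the point is that the adjustment
# factor (control total / weighted sample total) changes if group-quarters
# adults are left in the control totals.
import pandas as pd

resp = pd.DataFrame({
    "age_group": ["18-29", "18-29", "30-44", "30-44", "45+"],
    "weight":    [120.0, 95.0, 150.0, 140.0, 160.0],
})

# Household-population control totals (group quarters excluded)
controls_hh = {"18-29": 31468, "30-44": 52000, "45+": 61000}

def poststratify(df, controls):
    out = df.copy()
    weighted = out.groupby("age_group")["weight"].transform("sum")
    out["ps_weight"] = out["weight"] * out["age_group"].map(controls) / weighted
    return out

adj = poststratify(resp, controls_hh)
print(adj[["age_group", "weight", "ps_weight"]])

# Using totals that still include dormitory residents (e.g., 38,250 instead
# of 31,468 for ages 18-29) would inflate that cell by 6,782 adults.
```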
Best Weighting Approaches in Dual-frame Phone Survey with Multiple Domains
of Interest
Jamie Ridenhour, RTI International
During the weighting process, dual-frame telephone surveys require a step to account for the fact that dual phone-type users can be selected from either frame. There are several existing methods to achieve this, and which approach is best is often survey specific. We will look at the OMAS, a study with multiple domains and outcomes of interest. To determine which approach was best for OMAS, we computed the weights using four approaches: single-frame estimation, a 50% composite, an optimal composite minimizing the overall unequal weighting effect, and an optimal composite minimizing the design effect for past year's income. We present the impact each approach had on the standard errors of a range of our estimates and discuss which approach we think is best for OMAS and other surveys like it.
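For readers unfamiliar with composite estimation, the sketch below (hypothetical column and file names; not the OMAS code) shows the 50% composite and a simple grid search for an "optimal" composite that minimizes the unequal weighting effect.
```python
# Sketch: composite weighting for a dual-frame (landline/cell) telephone survey.
# Dual users can be reached from either frame, so their weights are composited:
#   landline-frame dual users get lambda * w, cell-frame dual users get (1 - lambda) * w.
# lambda = 0.5 is the "50% composite"; the "optimal" composite below picks the
# lambda that minimizes the unequal weighting effect (1 + CV^2 of the weights).
# Column names are hypothetical.
import numpy as np
import pandas as pd

def composite_weights(df, lam):
    w = df["base_weight"].to_numpy(dtype=float).copy()
    dual = df["phone_status"].eq("dual").to_numpy()
    landline = df["frame"].eq("landline").to_numpy()
    w[dual & landline] *= lam
    w[dual & ~landline] *= (1.0 - lam)
    return w

def unequal_weighting_effect(w):
    return 1.0 + (w.std(ddof=0) / w.mean()) ** 2

df = pd.read_csv("dual_frame_respondents.csv")          # hypothetical file
grid = [i / 100 for i in range(5, 100, 5)]
uwe = {lam: unequal_weighting_effect(composite_weights(df, lam)) for lam in grid}
best_lam = min(uwe, key=uwe.get)
print("50% composite UWE:", uwe[0.50])
print("optimal lambda:", best_lam, "UWE:", uwe[best_lam])
```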
Calculation of Response Rates for Dual-frame RDD Surveys
Robert Montgomery, NORC at the University of Chicago
Dual-frame surveys that combine landline and cell-phone samples have become the standard
for telephone surveys. Although the survey's estimates and weights are calculated from both samples, response rates are usually reported separately. We start by considering the goal of
producing a combined rate and how that may determine the appropriate method. We then
examine different methods for calculating combined response rates and provide some guidance
for when separate and combined rates are appropriate, as well as which method to use when
combined rates are appropriate. We also explore different options depending on whether the
cell-phone design is screening or take-all.
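One commonly discussed way to produce a combined rate, offered here only as an illustration and not as the authors' recommendation, is a weighted average of frame-specific AAPOR rates; the sketch below uses made-up case counts and frame shares.
```python
# Sketch: one simple way to combine frame-specific response rates in a
# dual-frame RDD design -- a weighted average of the landline and cell rates,
# with weights equal to each frame's estimated share of the target population.
# This is only one of several possible definitions; the numbers are made up.
def rr3(i, p, r, nc, o, uh, uo, e):
    """AAPOR RR3: completes over estimated eligible cases."""
    return i / (i + p + r + nc + o + e * (uh + uo))

rr_landline = rr3(i=900, p=40, r=350, nc=200, o=10, uh=500, uo=100, e=0.45)
rr_cell = rr3(i=700, p=30, r=420, nc=380, o=15, uh=900, uo=150, e=0.35)

share_landline = 0.55   # estimated share of the target population covered via the landline frame
share_cell = 0.45
combined = share_landline * rr_landline + share_cell * rr_cell
print(f"landline RR3={rr_landline:.3f}, cell RR3={rr_cell:.3f}, combined={combined:.3f}")
```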
Address-based Sampling (ABS) as an Alternative to RDD: A Test in California
Matt Jans, UCLA
Address-based sampling (ABS) from the USPS Delivery Sequence File (DSF) presents a
sustainable method to overcome historical coverage decreases in landline random digit dial
(RDD) frames, and reduce costs relative to dual-frame cell/landline RDD samples. DSF coverage tends to be better in urban areas than rural areas, yet apartments with multi-unit “drop points” and other living situations in which households are not clearly defined by a single mailing address can be challenges in urban areas. Data collection challenges occur in phone surveys
like the California Health Interview Survey (CHIS) because respondents complete the survey in
a mode other than the one by which they were contacted. We evaluate procedural, cost, and
data quality implications of an ABS protocol in two communities in California (total n=7274
addresses sampled from the DSF). Communities were chosen based on population size/density and percentage of Spanish speakers. The mailing protocol included three 'full-packet' mailings with a reminder postcard between the first and second mailings. Each packet included a single-page, one-sided, 12-item screener questionnaire (in English and Spanish) that
asked for basic health and demographic information, a phone number, and interview language
preference (Spanish or English). A $2 incentive, return envelope, and English and Spanish
versions of the cover letter and FAQ were also included. Households providing a phone number
were called to complete the standard CHIS telephone interview. Households not providing a
phone number were called if one was matched to their address through public records. We
compare ABS responses to CHIS RDD (cell and landline) responses in the same geographic
areas. We also compare respondents who provided a phone number on the screener form and
those for whom we used a matched phone number. We evaluate differences in key health
statistics in addition to response rates and demographics of responding cases.
Minimizing Nonresponse Bias
Evaluation and Use of Commercial Data for Nonresponse Bias Adjustment
Andy Peytchev, RTI International
Response rates have been declining, posing a substantial threat to survey inference due to
nonresponse bias in survey estimates. Concurrently, commercial vendors have been amassing
data on individuals in the country. These data include not only demographic variables, but also
substantive variables that can be similar to the key survey variables. These characteristics
make these data potentially valuable for nonresponse adjustments, but their properties for this
purpose remain unevaluated. Of critical importance are the rate at which these data can be
matched to survey samples, the accuracy of these data, and their relevance in being informative
about nonresponse bias. An additional hindrance is the high expected rate of missing data, which complicates how these data can be incorporated into nonresponse adjustments. We propose and evaluate the use of
multiple imputation as a method that allows for missing auxiliary data and can offer highly
efficient estimates when the auxiliary data are substantially correlated with the key survey
estimates. We augmented a random-digit-dial telephone survey on tobacco use with data from
Experian to evaluate: 1) the match rate of sample members with demographic data from
Experian, 2) the match rate with substantive tobacco use variables from the commercial data, 3)
the accuracy of these data for variables that are available in both survey and commercial data,
4) the impact of the use of these commercial data for nonresponse bias adjustment when
compared to external benchmark estimates, and 5) the use of multiple imputation to provide
efficient use of these data for estimates that are adjusted for nonresponse bias.
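A minimal sketch of the multiple-imputation idea follows, assuming the matched commercial variables are numerically coded and using scikit-learn's iterative imputer with posterior sampling; the file and variable names are hypothetical and this is not the authors' implementation (a full analysis would also pool variances with Rubin's rules).
```python
# Sketch: multiple imputation of commercial auxiliary variables with high
# missingness, then use the completed data in a nonresponse-propensity model.
# Assumes the auxiliary variables are numerically coded; names are hypothetical.
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LogisticRegression

sample = pd.read_csv("rdd_sample_with_commercial_match.csv")  # hypothetical file
aux_cols = ["hh_income_band", "smoker_flag", "age", "home_owner"]
respond = sample["responded"].to_numpy()                      # 1 = completed interview

M = 5  # number of imputed datasets
coefs = []
for m in range(M):
    imputer = IterativeImputer(sample_posterior=True, random_state=m)
    completed = imputer.fit_transform(sample[aux_cols])
    model = LogisticRegression(max_iter=1000).fit(completed, respond)
    coefs.append(model.coef_[0])

# Pool the propensity-model coefficients across imputations (point estimates only).
pooled = np.mean(coefs, axis=0)
print(dict(zip(aux_cols, np.round(pooled, 3))))
```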
Interviewer Observations vs. Commercial Data: Which is Better for Nonresponse
Bias Correction?
Jennifer Sinibaldi, Institute for Employment Research (IAB); Mark Trappmann, Institut für
Arbeitsmarkt- und Berufsforschung (IAB); Frauke Kreuter, University of Maryland JPSM
& IAB; Brady T. West, University of Michigan Institute for Social Research
Survey methodologists are searching for better paradata to use in nonresponse adjustment
models, ultimately hoping to find variables that are highly correlated with both the outcome of
interest and the propensity to respond. This analysis examines the performance of two data
sources that can be used for nonresponse bias correction, interviewer observations and
commercially available auxiliary data. The analysis will determine which data source is more predictive of the survey outcomes and is therefore a better candidate for nonresponse adjustment models. The auxiliary data and paradata examined in this analysis are: 1.
interviewer observations recorded for household income and receipt of unemployment benefits,
and 2. commercial auxiliary data indicating household income and unemployment benefit. The
survey data will provide a gold standard for both income and receipt of benefits. To answer the
research question, separate models will be run for the observations and the auxiliary data,
predicting the gold standard. The model fit will determine which data source shares more
(accurate) information with the true value, making it better for adjustment. In addition to
informing researchers wishing to improve their nonresponse adjustments, the results will benefit
survey managers by providing guidance as to which type of data on which to spend the survey
budget.
Assessing the Reliability of Unit Level Auxiliary Data in RDD Surveys: NHTSA
Distracted Driving Survey
John Boyle, ICF International; Andy Weiss, Abt SRBI; Paul Schroeder, Abt SRBI; Mikelyn
Meyers, Abt SRBI; Kristie Johnson, NHTSA
With declining response rates in population surveys, non-response analysis to evaluate survey
bias becomes increasingly important. In essence, we need to compare the completed sample
with sample units not completed in the survey. Hence, data from auxiliary sources are necessary for the evaluation of non-response bias.
Although exchange-level data derived from the Census are available for all sample units in landline RDD surveys, their usefulness is very limited. More useful unit-level data such as age, education, income, race, ethnicity, household size, and housing tenure are also available, but only for some units. This information is obtained by matching sampled telephone numbers to
other data sources including credit bureaus. Unfortunately, the reliability of this data source has
not been well established.
This paper is based on the 2012 National Survey of Distracted Driving Attitudes and Behaviors
conducted by Abt SRBI for the National Highway Traffic Safety Administration. The survey
includes a total of 6,025 interviews: 3,100 interviews from a national landline RDD sample, an oversample of 782 persons aged 16-34 from the landline sample, and 2,143 interviews from a national cell phone sample. Matched records from the auxiliary database were obtained for 49% of completed interviews and 54% of household contacts not yielding a completed interview. Although almost no auxiliary data are available for the cell phone sample, matched records were found for 75% of the national landline sample.
A relatively high match rate for completes (77%) and non-completes (71%) in the landline
sample, coupled with a relatively high rate of agreement between interview data and the
auxiliary data on a range of key characteristics, suggests that auxiliary data may be useful in
correcting some non-response bias. Indeed, it may permit targeted follow-up efforts, in addition
to sample weighting, to improve estimates.
Responsive Design for Web Panel Data Collection
Annamaria Bianchi, University of Bergamo; Silvia Biffignandi, University of Bergamo
Many surveys today are affected by high nonresponse. This can be a detriment to survey quality since nonresponse causes systematic error (bias) in the estimates. A related problem is the need to reduce survey costs. Given the decreasing trend in response rates and the correspondingly increasing resources needed to achieve preset response rates, taking measures only at the estimation stage is no longer sufficient to overcome these problems. Measures also need to be taken at the data collection stage. In this direction, different forms of responsive design have been proposed (Groves and Heeringa, 2006; Särndal, 2011). The purpose of this paper is to study responsive design in the framework of Web panel data collection. This method of data collection is increasingly widespread for evaluating general population opinion, and it makes available many variables on the participation process. We explore whether this
amount of information could be exploited in the framework of responsive design. We evaluate
as well whether this method improves the estimates in terms of bias reduction and assess the
consequences on the variability of the estimates. The empirical application uses data from two
on-going probability-based household panels: the PAADEL panel (Italian panel for the agro-food
sector) and the LISS panel (Dutch panel managed by CentERdata, Tilburg University). Using
these databases, we artificially reproduce a set of experimental responsive designs based on
alternative interventions in the data collection. Results are analyzed in a comparative way to
evaluate the impact of this approach on the final estimates. Bibliography: Groves, R.M., and
Heeringa, S.G. (2006), Responsive design for household surveys: tools for actively controlling
survey errors and costs. Journal of the Royal Statistical Society: Series A, 169. Särndal, C.E.
(2011), The 2010 Morris Hansen Lecture: Dealing with Survey Nonresponse in Data Collection,
in Estimation. Journal of Official Statistics, 27, 1-21.
Comparative Ethnographic Evaluations of Enumeration Methods Across
Race/Ethnic Groups in the 2010 Census Nonresponse Follow-up and Update
Enumerate Operations
Laurie Schwede, U.S. Census Bureau; Rodney Terry, U.S. Census Bureau; Ryan King,
U.S. Census Bureau; Mandi Martinez, U.S. Census Bureau
Why do minority undercounts persist over censuses, despite efforts to reduce them? We briefly
review past coverage-related ethnographic studies then use a 2010 Census ethnographic
evaluation with a records check to identify possible differences among race/ethnic groups in
factors affecting enumeration methods and possible coverage error. This controlled-comparison
evaluation was done in eight sites targeted to the major race/ethnic groups—American Indian,
Alaska Native, Native Hawaiian and Other Pacific Islander, Asian, African American, non-
Hispanic white, Hispanic, and a general site—in personal-visit 2010 Census Nonresponse
Follow-up and Update Enumerate Operations. In the field sites, eight ethnographers observed
and taped (when permitted) live census interviews, watched for cues of possible coverage error,
and debriefed respondents to decide where to count persons. In the records check, we matched and compared rosters of ethnographer-observed housing units from 1) the observed standard interview and 2) the ethnographers' assessments to 3) special localized final Census Unedited File datasets to identify inconsistencies across records in where to count persons. We identify
qualitative themes crosscutting the ethnographic site reports. We present records check results
and assess whether cases of inconsistencies among rosters and characteristics of affected persons and households differ by race, Hispanic origin, or household type. Some factors that affected
enumeration methods and possibly coverage include: interviewer-respondent interactions,
including question rewording; difficulty in gaining access to respondents; problems in
canvassing and enumerating in rural areas without standard addresses; language issues; and
cultural variations. We also reference selected results from the “Behavior Coding of the 2010
Nonresponse (NRFU) Interview Report” (Childs and Jurgenson 2011) that was based primarily
on analysis of audiotapes collected by the ethnographers in this evaluation. We suggest
improvements for enumeration and coverage and new research.
Cross-National/Cross-Cultural Survey Research
A Session Dedicated to Janet A. Harkness
Playing Soccer with an Accent: Variable Meanings and Analyst Bias
Clifford Young, IPSOS; Darrell Bricker, IPSOS
The total survey error paradigm delineates the many sources of error in surveys. Variable understanding across respondents lowers validity both across individuals within countries and across countries. Error can occur at the design stage, during data collection, and during analysis.
Trends in International Data Collection Quality Monitoring
Beth-Ellen Pennell, Institute for Social Research, University of Michigan
Data collection across countries is especially important given the cross-national variation in languages, cultures, and structures. In addition, differences associated with data collection can compound and lead to artificial differences that reflect variability in reliability and validity rather than true substantive variation across countries. It is important both to optimize comparability by focusing on functional equivalence and to be sure that designs are successfully carried out. Improved data collection quality monitoring can facilitate this goal.
Cross-Cultural Perspectives on Surveys of the U.S. Hispanic Population
Trevor Tompson, Associated Press NORC Center for Public Affairs Research; Paul J.
Lavrakas, Independent Consultant
As the 3MC perspective emphasizes, differences exist not only cross-nationally, but cross-culturally as well. The Hispanic population in the U.S. illustrates that point, with much of this population being recent immigrants and with many having limited English proficiency. Steps for maximizing comparability between the Hispanic and non-Hispanic populations in the U.S. are discussed.
Interviewer Effects on Respondent Processing of Survey Questions, a Cross-
cultural Analysis
Timothy Johnson, University of Illinois at Chicago
In interviewer-administered surveys, data collection is an interaction between interviewers and respondents. When these two participants are from different cultures, communication between them may be hampered and the risk of misunderstandings and measurement error increases. Interviewer effects are always valuable to study, especially in cross-cultural surveys.
Monitoring Local and Regional Developments
Polling in the Midst of a Natural Disaster: The ABC News/Washington Post 2012
Election Tracking Poll and Hurricane Sandy
Gregory Holyk, Langer Research Associates; Damla Ergun, Langer Research Associates;
Gary Langer, Langer Research Associates; Julie Phelan, Langer Research Associates;
Seth Brohinsky, Abt SRBI
Hurricane Sandy made landfall the evening of Monday, Oct. 29, nine days in advance of the
2012 general election. Political pollsters faced two questions: one, whether or not it was
possible to gather reliable regional and national estimates in the storm’s aftermath, and two,
whether or not it was appropriate to call people in the devastated areas of the Northeast.
Judgments differed. The Gallup Organization decided to suspend its daily tracking poll,
declaring that the hurricane “had compromised the ability of a national survey to provide a
nationally representative assessment of the nation’s voting population.” We preferred, instead,
to proceed, and to base our judgment on the data themselves. We polled the night of the
hurricane, and, based on our ongoing assessment of data quality, we continued to poll in the
days of its immediate aftermath and continuously up to Election Day. This paper presents a
close look at how we approached interviewer sensitivity and the validity and reliability of the
estimates obtained by our tracking poll in the midst of a major destabilizing event, and reports
on lessons learned in the process. We examine post-hurricane daily call efficiency, break-offs
and variability in estimates of the key demographics and attitudes nationally, in the Northeast
region, and in the New England and Mid-Atlantic census divisions, compared with these
measures in the 11 nights preceding the hurricane. We conclude not only that it was possible to poll during and after the hurricane in a sensitive and ethical manner, but that our polling produced valid and reliable national and regional estimates of attitudes and maintained an essential flow of information at a time when accurate polling was most needed and in demand.
Tweeting the Chicago Teachers Strike: Using Organic Twitter Data and Sentiment
Analysis to Understand Support on a Local Issue
Nicholas D. Davis, NORC at the University of Chicago; Patrick van Kessel, NORC at the
University of Chicago; Michael Jugovich, NORC at the University of Chicago
The September 2012 Chicago Teachers Union (CTU) strike and the response from Chicago
Public Schools (CPS) were major media events during late summer and early fall 2012. With the
rising popularity of Twitter, both the media and members of the public were able to tweet
information and thoughts about the strike in great numbers. Our research examines tweets sent
during the strike period to explore the use of organic data, as opposed to survey data or other
experimentally designed data, for gauging public sentiment about the strike. Using the Twitter
Search application programming interface (API), NORC collected more than 125,000 strike-
related tweets sent prior to and during the strike period. This presentation will focus on efforts to
clean, deduplicate and process the collected tweets to facilitate their use in analyses of public
perception on a substantive local issue. We employ natural language processing (NLP) and
machine learning techniques for the purposes of conducting sentiment analysis. Using this
information, we assess the relevance, sentiment (positive or negative tone), and position (for or
against the strike) of the tweets and validate our processes using crowd-sourced manual
coding. We conclude the presentation with a discussion of future research options and
opportunities for the use of organic data in public opinion research.
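As a rough illustration of the supervised approach described above, the sketch below trains a sentiment classifier on manually coded tweets using TF-IDF features and validates it against those manual codes; the toy data and pipeline are illustrative only and do not represent NORC's actual NLP system.
```python
# Sketch: supervised sentiment classification of tweets using TF-IDF features
# and logistic regression, trained on manually (crowd-sourced) coded examples.
# This illustrates the general approach, not the authors' actual pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

tweets = [
    "Proud to stand with Chicago teachers today",
    "This strike is hurting kids and families",
    "Support CTU! Fair contracts now",
    "Enough already, end the strike and get back to class",
]
labels = ["pro", "anti", "pro", "anti"]   # crowd-sourced manual codes (toy data)

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),
    LogisticRegression(max_iter=1000),
)
scores = cross_val_score(clf, tweets, labels, cv=2)   # validate against manual coding
print("cross-validated agreement with manual coding:", scores.mean())

clf.fit(tweets, labels)
print(clf.predict(["Teachers deserve better pay and smaller classes"]))
```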
From Red to Blue in the Green Mountain State: Real Change or Stability Against a
Background of National Changes?
Richard L. Clark, Castleton State College; Ryan Flood, Castleton College; James
McCormick, Castleton College
Prior to the 1992 presidential election, Vermont was traditionally a Republican state. From 1854
until 1963, Vermont’s state government had been in Republican control, and Vermont was the
most reliable supporter of Republican presidential candidates, favoring the Republican
candidate in nearly every race from the inception of the Republican Party until 1992, with the sole exception of 1964, when Lyndon Johnson's landslide victory swept Vermont along in its wake. By most measures, Vermont was the most reliably Republican state in the union for a
period of more than 100 years. Today, however, Vermont is perhaps the most reliable
Democratic state in the union. It is the only state where the entire congressional delegation is
comprised of representatives that caucus with Democrats (although Senator Bernie Sanders is
nominally an independent) and the Democratic Party controls both the executive and legislative
branches of state government. It is easy to mark the change from red to blue, with the historic
election of Governor Philip Hoff in 1962 as the first Democrat in that position since 1854. Hoff’s
victory changed Vermont politics and set a path to competitive parties in Vermont. Despite the
fact that we can identify when the change occurred, it has not been well established why the
change occurred. Using public opinion data, Census data, and exit polls, this paper examines
how Vermont became one of the most reliably Democratic states in presidential politics. Using
those data sources, our paper tests the following two hypotheses: H1: Vermonters' political views have remained ideologically stable while the national parties have moved to the right. H2: Vermonters have shifted their views away from the right over the past two generations, aided by an influx of in-migration that has brought more liberal views to Vermont.
A Comparison of Live and Automated Congressional Race Pre-Election Polling
Meghann Crawford, Siena College Research Institute; Don Levy, Siena College Research
Institute; Colin Frederickson, Siena College Research Institute
The Siena College Research Institute (SRI) has for three congressional election cycles
accurately predicted many New York State swing congressional district races. Using live
interviewers, SRI benchmarks the race in September and polls the district a final time within the
last ten days before the election. A likely voter model is used in September and tightened in the
final poll. In the recently completed 2012 election cycle, SRI simultaneously polled four New
York State congressional races, all identified as among the top 75 most contested in the nation
by National Journal’s Hotline, in both September and late October using both live interviewers
and interactive voice response (IVR) software. This paper compares the two sets of polls, live
and IVR at two time points, benchmarking in September and on election eve in late
October/early November. In all cases, raw data are weighted by age, gender and stated party enrollment, and only likely voters are moved through the final screen. Regardless of any debate over
weighting factors, both sets of data are weighted identically and compared not only to each
other but also to the final results. We look at variation across the live and IVR by various
demographics – party, age, gender – and across time points as well as the ultimate predictive
efficiency of live as compared to IVR in these Congressional races.
The Growing Political Might of Ethnic Voters in California Elections
Mark DiCamillo, Field Research Corporation
According to exit polls, Latinos, African-Americans and Asian-Americans comprised about 40%
of California voters in the 2012 elections, a record high proportion. While the demographic
changes taking place have been many years in the making, the 2012 elections may prove to be
a turning point in California politics. My paper will trace the growth of ethnic voters as a share of the state's registered voters. In addition, the paper will document the increasing tendency of California Latino and Asian-American voters to support Democratic candidates and will identify factors behind this change. The paper will draw primarily from the results of recent multi-ethnic Field Polls, conducted in six languages, which over-sampled ethnic voter populations in seven of the ten statewide Field Poll surveys conducted in the 2010 and 2012 election years.
Reluctant Respondents and Data Quality
Using Doorstep Concerns Data to Study the Relationship Between Reluctance
and Measurement Error
Ting Yan, Institute for Social Research, University of Michigan; Shirley Tsai, U.S. Bureau
of Labor Statistics
Are reluctant respondents poor reporters? This is a question that the survey research field has
been trying to answer for decades. Researchers have tried to answer this question from many
different angles and the evidence is mixed. This paper approaches this question using doorstep
concerns data. One type of paradata, doorstep concerns data capture the interactions between interviewers and potential survey respondents during the survey introduction, revealing the concerns sampled members have expressed about the survey request and their reasons for refusing when a refusal occurs. We’ve created two parsimonious measures
that retain the interrelationships inherent in the doorstep concerns data – Perceived Concerns
Index (through principal component analysis) and Reluctance Class (via latent class analysis).
We’ve found that the two measures are effective in characterizing and assessing the level of
reluctance of survey respondents. In this paper, we will investigate the association between the level of reluctance exhibited by survey respondents and the quality of their responses to the survey questions, making use of the two summary measures. We will attempt to provide further empirical results on the question: “Are reluctant respondents poor reporters?”
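A minimal sketch of the principal-component step follows, assuming contact-level binary concern indicators with hypothetical names; the latent-class "Reluctance Class" measure would require a separate mixture-model package and is not shown.
```python
# Sketch: build a one-dimensional "Perceived Concerns Index" from binary
# doorstep-concern indicators via the first principal component.
# Indicator and file names are hypothetical.
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

contacts = pd.read_csv("doorstep_concerns.csv")        # hypothetical file
indicators = ["too_busy", "not_interested", "privacy_concern",
              "too_long", "anti_government", "asks_purpose"]

X = StandardScaler().fit_transform(contacts[indicators])
pca = PCA(n_components=1)
contacts["concerns_index"] = pca.fit_transform(X)[:, 0]

print("variance explained by first component:",
      round(pca.explained_variance_ratio_[0], 3))
print(contacts[["concerns_index"]].describe())
```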
Patterns of CATI Survey Break-off by Item Sensitivity and Respondent
Characteristics
Ayesha De Mond, Mathematica Policy Research
Non-response and break-offs may bias survey findings. Theoretical frameworks for survey
participation suggest the decision to initiate and complete a survey depends on the survey
design and respondent characteristics, as well as psychological and social factors such as the
cognitive demand of information sought, the sensitivity of items and the respondent’s motivation
and interest in completing the survey (Beatty & Herrmann, 2002; Peytchev, 2009).
Understanding determinants of response behaviors is particularly relevant for impact evaluation
studies where differential non-response and break-off rates between treatment and control
groups may compromise the validity of the study. However, literature on break-offs in the
context of program evaluation is scarce. This paper will examine patterns of respondent break-
off in the baseline surveys of the Parents and Children Together (PACT) Evaluation study. The
PACT Evaluation consists of multiple components; here we focus on the experimental impact
evaluation of a subset of Responsible Fatherhood (RF) and Healthy Marriage (HM) federal
grantees undertaken by the Administration for Children and Families (ACF) with assistance from
Mathematica Policy Research. The baseline surveys will gather descriptive information on study
participants to make it possible to identify the characteristics of those who apply for RF and HM
programs. The baseline survey instruments consist of 10 sections with questions tailored to
respondents and rosters of family composition. The instruments collect data on sensitive topics
such as relationship(s) with their child(ren) and partner(s), mental health, fidelity, economic
stability, and experience with the justice system. We will examine the frequency of break-offs in
relation to question content and respondent characteristics. We will explore break-off patterns
and respondents’ reasons for break-off through debriefings with interviewers. We will discuss
findings and implications for survey design, response rates and data quality.
Nonresponse in Recontact Surveys
Besheer Mohamed, Pew Research Center; Greg Smith, Pew Research Center
One common way to identify individuals in hard-to-reach populations for surveys is to recontact
respondents who indicated in previous studies that they are members of the population in
question. For example, recontacting respondents who had identified themselves as Muslims,
Asians or Mormons in surveys of the general public was one key component of the sample
design for the Pew Research Center’s surveys of these low incidence populations. But to what
extent is nonresponse bias a problem in recontact samples of hard to reach populations? This
paper employs logistic regression to compare non-response bias in re-contact samples to bias
in samples acquired through random digit dialing. By analyzing non-response bias in recontact
samples across three Pew Research Center surveys (including surveys of Muslim Americans,
Asian Americans and Mormons), this new study extends and builds upon preliminary analysis
presented at the 2011 AAPOR conference, which focused primarily on analysis of the Muslim
American survey. The results will help researchers better understand both the advantages and
the potential drawbacks in employing recontact sample as a means of surveying hard to reach
populations.
Does Reissuing Unproductive Cases in a Face-to-Face Survey Reduce
Nonresponse Bias? Evidence From the UK Citizenship Survey
John D’Souza, Ipsos MORI; Patten Smith, Ipsos MORI; Kathryn Gallop, Ipsos MORI;
Angela Thompson, Ipsos MORI
It is common practice in UK face-to-face random probability surveys to reissue a subset of
unproductive sample members to another interviewer in order to improve response rates. This
practice is expensive to implement, both because interviewers are paid at higher rates for
covering reissued cases and because interview productivity is considerably lower for reissued
cases. In order to investigate the improvements in accuracy of reissuing cases, we analysed a
variety of key survey variables in the 2009/2010 round of the UK Citizenship Survey. This
survey collected data on a range of issues, including measurements of: attitudes towards
community cohesion, behavioural changes caused by the economic downturn and frequency of
civic participation activities. Measuring the non-response bias of a survey estimate is not usually
possible. However, under the plausible assumption that the full sample is less biased than the first-issue sample, we are able to estimate the difference in bias between estimators based on the two samples. Bootstrapping yields confidence intervals for this difference. The results of our
analysis show that the effects of reissuing were highly question-specific. For most variables, the
estimates obtained from the first-issue sample were not significantly different from those
obtained from the full sample. However, the differences were significant for many of the
variables measuring frequency of civic participation activities. Furthermore, these differences
could not be eliminated by non-response weighting. This implies that, for the variables
measuring activity, reissuing does improve the accuracy of estimates. We discuss implications
for existing survey practice and directions for future research.
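A minimal sketch of the bootstrap comparison described above, assuming a respondent-level file that flags whether each case responded at first issue (file and variable names are hypothetical; the actual analysis also incorporates nonresponse weighting):
```python
# Sketch: bootstrap confidence interval for the difference between a
# full-sample estimate and the estimate based on first-issue responses only.
# Under the assumption that the full sample is less biased, this difference
# approximates the bias reduction gained by reissuing. Names are hypothetical.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
df = pd.read_csv("citizenship_survey.csv")     # hypothetical respondent-level file
y = "civic_participation_freq"                 # a key survey variable

def diff(d):
    first_issue = d["responded_at_first_issue"].astype(bool)
    return d[y].mean() - d.loc[first_issue, y].mean()

B = 2000
boot = np.empty(B)
for b in range(B):
    resampled = df.sample(n=len(df), replace=True,
                          random_state=int(rng.integers(1 << 31)))
    boot[b] = diff(resampled)

lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"estimated difference = {diff(df):.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```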
Impacts of Unit Nonresponse in a Recontact Study of Youth
Jonathan Mendelson, Fors Marsh Group; Luciano Viera, Fors Marsh Group
When propensity to respond to a survey is correlated with key survey variables, nonresponse
bias can occur. One method of assessing nonresponse bias is to compare respondents with
nonrespondents using auxiliary variables from the drawn sample. A limitation of this method is
that many frames have only basic demographic variables, which may be poorly correlated with
response propensity. However, for low incidence and hard-to-reach populations, recontact
studies are a popular option, often utilizing rich sampling frames containing behavioral and
attitudinal variables from previous surveys. This paper assesses the impact of unit nonresponse
in a recontact study of young adults who had recently completed a similar 'seed' study. Both
studies were sponsored by the U.S. Department of Defense; the initial study examined attitudes
and behaviors pertaining to military recruiting, and the recontact study assessed the awareness
of and attitudes toward the Military's advertising campaigns. The seed study consisted of three
iterations of a national mail survey of young adults ages 16 to 24, sampled from an address list
database which covered more than 90% of the target population. Respondents to the seed
study who provided an email address were used as a sampling frame for the recontact study,
which was completed online. Using auxiliary variables from the original frame and from
responses to the seed study, we examine unit nonresponse in the recontact study to assess
differences between respondents and nonrespondents and the impact on key survey estimates.
First, we compare characteristics of respondents and nonrespondents on a variety of
demographic, attitudinal, and behavioral measures. Where characteristics differ significantly
between the two groups, we conduct regression analysis to determine whether these
characteristics also significantly predict responses to survey questions in the recontact study.
After examining the impact of unit nonresponse, we discuss implications for future research.
Methodological Briefs: Mode and Survey Error
Multi-Mode Survey Administration: Does Offering Multiple Modes at Once
Depress Response Rates?
Jocelyn Newsome, Westat; Kerry Levin, Westat; Pat D. Brick, Westat; Patrick Langetieg,
Internal Revenue Service; Melissa Vigil, Internal Revenue Service; Michael Sebastiani,
Internal Revenue Service
As multi-mode surveys become the dominant methodology, questions have emerged about the
optimal way to combine different modes. Is it best to offer all of the modes simultaneously,
allowing respondents to choose their preferred mode of response, or is it best to offer first one
mode and then another consecutively? Studies have shown that offering modes concurrently
can depress response rates, a phenomenon sometimes called the “paradox of choice” (Medway and Fulton 2012; Millar and Dillman 2011). According to this research, when
respondents are provided with a choice of modes, they are less likely to respond by any mode.
Consequently, there has been increased interest in determining how to best offer modes
sequentially in order to increase survey response. For the 2010 IRS Individual Taxpayer Burden
(ITB) Survey, an experiment compared a sequential administration (beginning with a Web
survey) with a single mode, mail-only administration. The mail-only administration resulted in a
higher response rate (44.1%) than an administration that offered first the Web survey and then the mail survey (40.9%). When planning for the 2011 ITB Survey, however, it was not an option
to conduct a mail-only survey given federal government technology requirements. Therefore, it
was decided that the 2011 ITB should follow the successful mail-only administration, with a
simultaneous Web option. In an attempt to avoid the “paradox of choice,” the Web survey was
offered in an understated way. While there has been very low Web survey response, overall
response rates for the 2011 ITB Survey have so far been significantly higher than for the 2010 survey (48.5%). This paper explores the success (and drawbacks) of this type of concurrent
offering. The results of this administration suggest that it is possible to offer modes
simultaneously if one mode is considered the primary mode and other modes are offered less
prominently.
Tablets and Smartphones and Netbooks, Oh My! Effects of Device Type on
Respondent Behavior
Hilary Ross, Fors Marsh Group; Jonathan Mendelson, Fors Marsh Group; Matthew
Lackey, Fors Marsh Group
As the Internet becomes ever more accessible via smartphones, tablets, netbooks, and laptops,
researchers have increasingly less control over how participants complete online surveys.
Although options for online survey takers make these surveys more accessible than ever,
researchers may not reap the benefits of increased accessibility if surveys are not configured to
fit the wide range of devices available. Most current research on mode differences focuses on
comparisons among paper, telephone, and online surveys, treating online surveys as a single
mode. However, with so many devices available to access online surveys, researchers must
consider the possibility that mode differences exist between devices within an online survey.
This study examines respondent behaviors by device in a probability-based online advertising
tracking survey over a one-year period. The survey contains open- and closed-form questions
with a variety of response option scales. Paradata from the survey administrator provides the
browser user-agent tag, used to determine type of survey-taking device, and time to complete at
the item level. This paper will examine the effect of the device on survey taking behaviors such
as item nonresponse, open-ended response length, and straightlining. Implications for future
online survey research will be discussed as well.
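A rough sketch of how device type can be derived from the user-agent paradata is shown below; the patterns are illustrative only, and production work typically relies on a dedicated user-agent parsing library rather than hand-written rules.
```python
# Sketch: rough classification of survey-taking devices from browser
# user-agent strings captured as paradata. The patterns are illustrative
# and not exhaustive.
import re

def classify_device(user_agent: str) -> str:
    ua = user_agent.lower()
    if re.search(r"ipad|tablet|kindle|silk", ua):
        return "tablet"
    if re.search(r"iphone|android.*mobile|windows phone|blackberry", ua):
        return "smartphone"
    return "desktop_or_laptop"

examples = [
    "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26",
    "Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17",
]
for ua in examples:
    print(classify_device(ua), "<-", ua[:40])
```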
Reducing Survey Error in a Mobile Speech-IVR System
Michael Johnston, AT&T Labs Research; Patrick Ehlen, AT&T Labs; Fred Conrad,
University of Michigan; Michael Schober, The New School for Social Research; Chris
Antoun, University of Michigan; Stefanie Fail, The New School for Social Research;
Andrew Hupp, University of Michigan; Lucas Vickers, Parsons, The New School for
Design; Huiying Yan, University of Michigan; Chan Zhang, University of Michigan
Speech recognition systems for various automated tasks and transactions are now widely
deployed. Despite advances, speech recognition is still not perfect, and designers of speech
dialogue systems have various strategies for dealing with the imperfections. Can we live with
this imperfection in speech-IVR survey interfaces? In principle, survey estimates should be
accurate if misrecognition is unbiased—that is, if recognition errors are not systematic. We
argue that the nature of the survey task should lead to different strategies for dealing with
speech recognition error than other speech dialog tasks, which most often are initiated by the
user. In a survey, adopting a high-accuracy dialog strategy with explicit response confirmation
could frustrate respondents and increase break-off rates, while a low-accuracy-tolerant or no-
confirmation strategy may be sufficient as long as the recognition errors are not systematic. In
the current study we examine bias in recognition error in a corpus of 165 interviews on iPhones
that ask numerical, categorical and yes/no questions, in a speech interviewing system designed
specifically for the study. We compare a gold standard of human judges’ interpretation of what
respondents said to the speech dialog system’s interpretation, to examine how spoken dialog
system performance affects survey error for a range of different question types. Although
recognition accuracy (agreement between human and automated judgments) was 94%, the
question is whether the 6% recognition error was biased and in what direction. In particular, we
examine the impact of dialog confirmation strategy on survey error and user satisfaction, and
explore the use of acoustic and language model scores to limit errors. We also discuss which
types of misrecognition were more likely, and what this suggests for the design of a survey
instrument administered by a speech dialog system.
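A minimal sketch of the kind of bias check described above, using hypothetical answers to a single numerical question rather than the study’s corpus: compare the gold-standard human interpretations with the system’s interpretations and see whether the errors shift the estimate.

```python
import numpy as np

# Hypothetical data (not the study's corpus): checking whether recognition
# errors bias a survey estimate for one numerical question.
# gold: human judges' interpretation of what respondents said
# asr:  the spoken dialog system's interpretation of the same answers
gold = np.array([2, 0, 5, 1, 3, 4, 0, 2, 6, 1])
asr  = np.array([2, 0, 5, 1, 3, 4, 0, 2, 6, 2])   # one misrecognition

accuracy = np.mean(gold == asr)      # share of answers recognized correctly
bias = asr.mean() - gold.mean()      # systematic shift introduced by errors

print(f"recognition accuracy: {accuracy:.0%}")
print(f"estimated bias in the mean: {bias:+.2f}")
# If errors were unbiased (not systematic), the bias term would hover near zero
# even when accuracy is below 100%.
```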
Mixed-Mode Data Collection in Health Care: Novel Approaches to Support
Comparative Effectiveness Research
Margaret Good, OptumInsight, Life Sciences; Susan Brenneman, OptumInsight, Life
Sciences
The American Recovery and Reinvestment Act (ARRA) of 2009 provided $1.1 billion in funding
to support comparative effectiveness research (CER). The intent of CER is to compare the
relative effectiveness, benefits and harms of treatment options among different groups of
patients in a “real world” setting. By improving our understanding of what treatments work best
for whom and in what circumstances, CER helps physicians and patients make informed
therapeutic choices. CER demands the development and expansion of a variety of data sources
and methods. Optum has developed novel approaches to support CER by leveraging its
proprietary research database of administrative medical and pharmacy claims from a large U.S.
managed care plan. This database allows for the identification of a targeted study sample and
comprehensive analysis of health care utilization and costs, as well as treatment patterns,
patient health outcomes and clinical characteristics. Limitations of administrative claims data are
well known; in particular, the voice of the patient, the reasons for healthcare decisions and
severity of illness defined by actual clinical lab values and vital signs are not available. In order
to bridge the gaps in data gathered for reimbursement purposes, Optum engages in targeted
primary data collection to obtain patient-reported outcomes via survey and clinical endpoints
such as lab values via medical chart review. These data are combined with administrative
claims data to explore a wide array of research questions, such as the associations of
treatment satisfaction, attitudes and beliefs about medicine and healthcare, and health status
with treatment patterns and healthcare utilization, and the association of severity of illness with
healthcare utilization and costs. These designs provide an efficient and powerful methodology to
conduct CER.
A Matter of Time: The Value and Optimal Timing of Follow-Up Questionnaire
Mailings in a Multi-Mode Survey
Andrea Mayfield, NORC at the University of Chicago; Ashley Amaya, NORC at the
University of Chicago; Kari Carris, NORC at the University of Chicago
Mail surveys remain popular in the United States primarily due to their lower costs relative to
other interview-based methods of data collection. Inclusion of a mail component in a larger,
multimode survey design may be used to increase response rates, obtain the requisite number
of interviews, and contain survey costs. Dillman’s Tailored Design Method provides a framework
for the ideal frequency and timing of follow-up contacts to increase response rates in multimode
surveys that include a mailed, self-administered questionnaire (SAQ) component. As the timing
of mailings has not been tested recently, we seek to examine assumptions about the
effectiveness, efficiency, and optimal timing of follow-up SAQ contacts in a survey of minority
populations. We use data for this analysis from the Racial and Ethnic Approaches to Community
Health Across the U.S. (REACH U.S.) survey, a multi-year project sponsored by the Centers for
Disease Control and Prevention to eliminate health disparities among racial and ethnic minority
populations. REACH U.S. uses a multimode, address-based survey design involving telephone,
mail, and face-to-face interviews. In the latest round (Year 4), the REACH U.S. Survey
incorporated a second SAQ mailing to non-respondents in all communities. The second SAQ
mailing was sent six weeks after the initial mailing, in accordance with Dillman’s Tailored Design
Method. In our analysis, we find significant gains in the response rate by adding a second SAQ
mailing. Additionally, we find that adding a second SAQ mailing is more cost efficient than
additional contacts in other modes to achieve a target number of completed interviews. We also
analyze the optimal time to mail a second SAQ mailing to achieve maximum response at
minimum cost. Lastly, we investigate whether pursuing nonrespondents via multiple contacts
changes key survey estimates and demographics.
Using Multiple Modes in Follow-Up Contacts in Random-Digit Dialing Surveys
Pranesh P. Chowdhury, Centers for Disease Control and Prevention
Recent studies have noted a decline in the response rates of random-digit-dialing (RDD)
surveys. To increase participation and improve representation of the general population, the
Behavioral Risk Factor Surveillance System (BRFSS) piloted several follow-up projects in 2012.
These projects included a mail follow-up study for landline phone numbers in 10 states (CT, KS,
NH, IL, MA, MO, MT, ND, OH, and AR), a Web-based follow-up (WBFU) for landline phone
numbers in 7 states (CT, DE, HI, IA, KY, NE and Washington, DC) and a text invitation to Web
follow-up for cell phone numbers in one state (CT). The purpose of the follow-up pilots was to
test the feasibility of using landline/cell phone nonresponse contacts and their impact on
demographic and health characteristics of respondents. All three pilots followed standardized
protocols using specific non-responding RDD disposition codes to identify potential follow-up
respondents. For landline follow-ups, phone numbers were matched to address and either
entire surveys (for the mail follow-up) or letters with Web site links and login information (for
WBFU) were sent to the household. Cell phone non-respondents with the same RDD
disposition codes were texted and directed to the Web site. Data collection will continue through
December 2012. Results will be presented to illustrate unweighted differences in the
characteristics of respondents of the three follow-up formats, as well as those who responded to
the BRFSS. Preliminary data for the first nine months (N=1,107) from the Web-based follow-up
survey indicate that it can increase the participation of female and Asian non-Hispanic
respondents, as well as those who have college degrees and annual household incomes of
$75,000 or more. Single-adult households are also more likely to participate in the Web-based
follow-up survey.
Where to Start: An Evaluation of Primary Data Collection Modes in an ABS Design
Ashley Amaya, NORC at University of Chicago; Felicia LeClere, NORC at the University of
Chicago; Kari Carris, NORC at the University of Chicago; Youlian Liao, Centers for
Disease Control and Prevention
As multimode address-based sampling becomes increasingly popular, researchers continue to
refine data collection best practices. While much work has been conducted on Web + mail
designs to maximize response rates, researchers have not yet tackled how phone + mail
designs can be optimized. We use data from an experiment conducted on the Racial and Ethnic
Approaches to Community Health Across the U.S. Risk Factor Survey (REACH U.S.) to
evaluate two multimode case flow designs: (1) phone followed by mail (phone-first) and (2) mail
followed by phone (mail-first). We use measures of response rates, cost, timeliness, and data
quality to identify differences across case flow design. Because surveys often differ in terms of
the rarity of the target population, we also examine whether changes in the eligibility rate alter
the choice of optimal case flow. Results suggest that the mail-first design is superior to the
phone-first design on most metrics. Mail-first achieves a higher yield rate at a lower cost with
equivalent data quality compared to phone-first. While the phone-first design initially achieves
more interviews compared to the mail-first design, over time, the mail-first design surpasses it
and obtains the greatest number of interviews.
Thursday, May 16
1:30 p.m. – 3:00 p.m.
Poster Session 1
1. A Comparison Between Screen/Follow Item Format and Yes/No Item Format on a
Multi-Mode Federal Survey
Sarah J. Hernandez, NORC at the University of Chicago; Svetlana N. Arakelyan, NORC
at the University of Chicago; Vincent Welch, NORC at the University of Chicago
Over the last decade, methodological research (Dillman, 2008) has indicated that survey
data quality can be increased if screener/follow questions (e.g., Do you have a disability? If
yes, what type?) are replaced with yes/no questions (e.g., Do you have any of the following
disabilities?). In keeping with this notion and consistent with government survey question
format guidelines, the National Science Foundation’s Survey of Earned Doctorates (SED)—
an annual census of research doctorates awarded by U.S. institutions—changed the format
of two demographic items (ethnicity and disability) from screener/follow to yes/no format.
This work will explore the impact of this change on the responses to these items. To
examine this effect, we will analyze the four most recent rounds of SED data (2008-2011);
two rounds with screener/follow format and two rounds with yes/no format. Considering the
previous research on this effect, we anticipate seeing higher levels of endorsement for both
the presence of disabilities and ethnicity and fewer “other specify” responses on surveys
with the yes/no format. We will concurrently explore whether the mode of administration
(paper versus Web) moderates the effect of the format change. The SED is self-
administered in paper and Web formats. When completed on the Web, the screener and
follow-up items appear on different screens, while on paper they appear on the same page.
Due to this difference, we anticipate that the effect of the format change will be greater for
Web than for paper-and-pencil responses. The implications of these findings for survey
design will be discussed.
2. Survey Weight Calibration With Multiple Imputation for Missing Data
Michael D. Larsen, The George Washington University; Benjamin M. Reist, U.S.
Census Bureau
Multiple imputation (MI) fills in missing information with two or more possible values.
Observed data are used to model relationships among variables. Multiple draws for each
missing value from conditional distributions enable representation of uncertainty. Calibration
estimation in sample surveys adjusts survey weights so that estimated totals match control
total targets. Poststratification and raking are versions of calibration commonly used in
sample surveys and opinion polls. This paper examines calibration used in combination with
MI for missing data. The performance of point estimators and variance estimators for estimated
parameters is studied. The potential for calibration weighting with MI to reduce a source of
bias in MI variance estimation is examined. Methods could apply to both sample survey and
more general study design contexts.
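As an illustration of the calibration step discussed above (a sketch only, not the paper’s estimators; the control totals and data are hypothetical), a minimal raking routine adjusts the weights within one completed, imputed data set so that weighted category counts hit the control totals. With m imputations, the resulting point estimates would then be averaged and their variances combined with Rubin’s rules.

```python
import numpy as np

def rake(weights, factors, targets, iters=50):
    """Iteratively adjust weights so weighted category totals match control totals.

    weights : base survey weights, shape (n,)
    factors : list of categorical arrays, each shape (n,)
    targets : list of dicts mapping category -> control total
    """
    w = weights.astype(float).copy()
    for _ in range(iters):
        for var, target in zip(factors, targets):
            for cat, total in target.items():
                mask = var == cat
                current = w[mask].sum()
                if current > 0:
                    w[mask] *= total / current
    return w

# Toy completed (imputed) data set with two raking dimensions
rng = np.random.default_rng(0)
n = 200
sex = rng.choice(["F", "M"], n)
age = rng.choice(["18-44", "45+"], n)
y = rng.normal(50, 10, n)
base_w = np.ones(n)

w = rake(base_w, [sex, age],
         [{"F": 510, "M": 490}, {"18-44": 450, "45+": 550}])

# One imputation's calibrated estimate; across m imputations the point
# estimates are averaged and variances combined with Rubin's rules.
est = np.sum(w * y) / np.sum(w)
print(round(est, 2))
```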
3. Does Pre-Screening the Sample Improve Response in an Establishment Survey?
Julie A. Pacer, Abt SRBI; Kelly Daley, Abt SRBI; Marci Schalk, Abt SRBI; Jacob A.
Klerman, Abt Associates
Establishment surveys are susceptible to a unique set of challenges compared to household
surveys. Unlike a household survey, an establishment survey collects information from a
representative of a business who speaks for that business, during business hours. While
coverage may not be a problem in a survey of known establishments, nonresponse is a
considerable factor. The sampling frame may lack a contact name at the sampled business
or it may provide outdated contact information. Furthermore, depending on the role of the
intended respondent and size of the business, the intended respondent may not respond to
a survey invitation due to competing work duties. Data from two recent establishment
surveys allow the analysis of a strategy to improve response rates by improving the quality
of the sampling frame. In both studies, a sample verification effort was performed prior to data
collection to identify a particular respondent and to improve efficiency in main data collection. This
research will explore the impact of the sample verification effort on response to the main
survey by comparing outcomes such as level of effort, completion rate, and item
nonresponse and including mode comparisons. The results will inform recommendations for
future use of sample verification. Abt SRBI collaborated with Abt Associates on the Survey
of Homelessness Prevention, sponsored by the U.S. Department of Housing and Urban
Development, to understand the scope of the services being offered by agencies that
received federal Homelessness Prevention and Rapid Rehousing Program funding. In
addition, Abt SRBI with Abt Associates and the U.S. Department of Labor conducted a
survey of employers regarding their use and understanding of the Family and Medical Leave
Act that utilized the Dun & Bradstreet Market Identifiers file as a sampling frame. Sample
verification was performed for each survey to verify the existence of businesses and identify
a respondent.
4. Election Exit Poll Estimation Using Spatiotemporal Statistics
Clint W. Stevenson, Edison Research
There is an expansive amount of literature relating to Election Day forecasting during
presidential elections. Most of the work on this topic relates to a national random sample
and independent samples in key states. National samples provide insight on the nation as a
whole. However, due to the way the Electoral College operates, the state samples are critical
for determining the winner of an election. This paper will examine the 2012 National Election
Pool Exit Poll conducted by Edison Research on Election Day (November 6, 2012) and will
take the spatial information into account. Geostatistical procedures are used to develop a
spatial model of voting patterns using actual vote results as well as demographic and other
information obtained only from the Election Day exit poll. Kriging and co-kriging are used to
improve the spatial estimation of these voting patterns. These results from 2012 are
compared to historical outcomes and exit poll data from the presidential elections in 2004
and 2008. The results presented here will allow for fine-tuning attribute estimation both
nationwide and at a state level.
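Kriging itself is usually fit with dedicated geostatistical software; as an illustrative stand-in (a sketch under stated assumptions, not the authors’ model), Gaussian process regression with an RBF kernel, which is closely related to simple kriging, can smooth a vote-share surface over sampled coordinates. The data below are hypothetical and scikit-learn is assumed to be available.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical sampled locations and vote shares; the GP plays the role of the
# geostatistical model used for spatial smoothing in the spirit of kriging.
rng = np.random.default_rng(1)
coords = rng.uniform(0, 100, size=(40, 2))                   # precinct locations (x, y)
vote_share = 0.5 + 0.003 * coords[:, 0] + rng.normal(0, 0.03, 40)

kernel = RBF(length_scale=20.0) + WhiteKernel(noise_level=1e-3)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(coords, vote_share)

# Predict the surface (with uncertainty) at unsampled locations
grid = np.array([[10.0, 10.0], [50.0, 50.0], [90.0, 90.0]])
pred, sd = gp.predict(grid, return_std=True)
for (x, y_), p, s in zip(grid, pred, sd):
    print(f"({x:.0f},{y_:.0f}): {p:.3f} +/- {2*s:.3f}")
```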
5. Does Persistence in Nonresponse Follow-up Overcome Respondent Reluctance or
Does it Contribute to Nonresponse?
Mary Frances E. Zelenak, U.S. Census Bureau; Brenna Matthews, U.S. Census
Bureau; Mary C. Davis, U.S. Census Bureau; Jennifer G. Tancreto, U.S. Census
Bureau
The American Community Survey (ACS) is an ongoing monthly survey that collects
demographic, housing, and socio-economic data about people and households at
approximately 3.54 million housing unit addresses in the United States each year. Since its
inception, the ACS has collected data using three modes over a three-month period for each
sample panel. In the first month, addresses are contacted by mail and households are
asked to complete and return a paper questionnaire by mail. Beginning with the January
2013 panel, the mail contact will include instructions for an Internet response mode.
Addresses that do not respond during the first month of data collection are contacted by
telephone during the second month and data are collected using a Computer-Assisted
Telephone Interview (CATI). Addresses that do not respond by the end of the second month
are subsampled and contacted using a Computer-Assisted Personal Interview (CAPI).
During the mail contact month, households receive multiple mailing pieces, the number of
which varies depending on whether and when a response is provided. Similarly, multiple
contacts are possible during both the CATI and CAPI follow-up operations. Given that the
CATI and CAPI operations are designed to target nonrespondents, it is very likely that some
addresses are contacted numerous times throughout the three-month data collection period.
These multiple contacts may lead potential respondents to be reluctant or even refuse to
respond to the ACS. Paradata from the CATI and CAPI operations will be used to assess
the success of the current CATI and CAPI procedures in obtaining cooperation from
nonrespondents when reluctance is encountered. Recommendations for possible
improvements to the current procedures and suggestions for future research will be
provided.
6. One Drink or Two: Does Quantity Depicted in an Image Affect Web Survey
Responses?
Nuttirudee Charoenruk, University of Nebraska-Lincoln; Mathew Stange, University of
Nebraska-Lincoln
Researchers sometimes place images in Web surveys to motivate participation or to
illustrate the meaning of a question, but studies indicate that presenting an image affects
responses (e.g., Couper et al. 2007; Couper et al. 2004). To date, this research has
investigated changes in responses based on the type of image presented; for example, how
presenting an image of grocery shopping versus clothing shopping changes reports of
shopping frequency (Couper et al. 2004). Our study extends this research by examining how
quantity depicted in images affects respondent reports. Respondents will be randomly
assigned to receive one of two Web surveys. One version will present pictures of one
cigarette and one alcoholic beverage as illustrations for smoking and drinking behavior
questions. The other version will present images with multiple cigarettes and glasses of
alcoholic beverages for the same questions. We hypothesize that the respondents in the
single cigarette and alcoholic beverage image condition will report consuming fewer
cigarettes and alcoholic beverages compared to respondents in the condition in which many
cigarettes and alcoholic beverages are depicted. Moreover, we hypothesize that the quantity
depicted in the images will affect how respondents consider themselves as heavy versus
light smokers and drinkers. Respondents may compare themselves to the quantity
presented in the image to judge whether they smoke and drink heavily. In addition to
analyzing differences in respondent reports, we will use eye-tracking data to analyze the
time respondents spend looking at the image and the frequency with which they look
between the images and the questions and response options to try to further understand
how images and their content affect survey responses. We will conclude with implications for
the use of images in Web surveys and Web survey design in general.
7. Geographic Accuracy of Cell-Phone RDD Sample Selected by Area Code Versus Wire
Center
Xian Tao, NORC at the University of Chicago; Benjamin Skalland, NORC at the
University of Chicago; David Yankey, National Center for Immunization and
Respiratory Diseases; Jenny V. Jeyarajah, National Center for Immunization and
Respiratory Diseases; Phil Smith, National Center for Immunization and Respiratory
Diseases
The assignment of geographic location to cell-phone numbers at the time of sampling is
often inaccurate. This inaccuracy can lead to increased cost and bias for area-specific
telephone surveys and to increased variance for national telephone surveys with area
stratification (Skalland and Khare, 2012). The assignment of cell-phone numbers to
geographic location can be done either based on the area code of the phone number or
based on the location of the wire-center associated with the phone number. In this paper,
we compare state and local-area geographic inaccuracy rates of cell-phone numbers
assigned to geographic location based on the area code versus the wire center using data
from the National Immunization Survey and the National Immunization Survey – Teen, dual-
frame RDD surveys sponsored by the Centers for Disease Control and Prevention and
fielded by NORC at the University of Chicago. In addition, we present estimates of
demographic differences between respondents with accurate and inaccurate geographic
assignment, first with the assignment based on the area code and then with the assignment
based on the wire center.
8. Hola or Hello? A Priori Assignment of Interview Language Using Demographic Flags
Ying Li, NORC at the University of Chicago
The Racial and Ethnic Approaches to Community Health across the U.S. (REACH U.S.)
Risk Factor Survey is a set of CDC-sponsored community surveys used to evaluate
progress towards eliminating racial and ethnic health disparities. The REACH U.S. survey
targets racial and ethnic minorities in specific geographic areas using an address-based
sampling (ABS) approach and a mixed-mode data collection protocol involving telephone,
mail and face-to-face interviews. Since REACH U.S. surveys racial and ethnic minority
subpopulations, identifying and gaining cooperation from households in which the primary
language is not English is vital, as many of these households may represent less educated,
and recent immigrant populations that may be significantly different from their English-
speaking counterparts. In Phase 3 of the REACH U.S. survey, NORC appended a set of
vendor-provided race/ethnicity flags to the sample frame. This allowed us to test the quality
of these flags, as well as assess the usefulness of this a priori information in making
decisions about how to approach these households. An analysis of Phase 3 data revealed
that these flags are relatively reliable. In Phase 4, the flags were used for a priori specialty-
language interviewer assignment. Flagged cases in communities targeting non-English-
speaking ethnic subgroups were assigned initially to an interviewer who speaks that
language. In this paper, we will analyze the effectiveness of language-based a priori
interviewer assignments. Comparing Phase 4 to Phase 3, we will examine whether the use
of these race/ethnicity flags increased survey participation and/or reduced the time spent
completing the survey. In addition, key performance measures from the REACH U.S. survey
will be analyzed to examine potential interviewer effects or differences in the resulting
sample introduced through the assignment of specialty language interviewers.
9. Evaluation of a Targeted Dual-Frame RDD Sample of Sub-State Populations
Amy Couzens, RTI International
Due to declining coverage of the landline RDD frames, researchers have become
increasingly reliant on dual-frame (cell phone and landline) RDD designs to maintain
complete coverage of the household population. A key challenge facing users of this
approach is achieving geographically accurate coverage of state and sub-state areas
through targeted cell phone sampling. The Aligning Forces for Quality (AF4Q) initiative
works to increase the overall quality of health care, reduce racial and ethnic disparities, and
provide models for national reform through the alignment of efforts to increase public
reporting, consumer engagement and quality improvement within participating communities. The
AF4Q consumer survey seeks to evaluate the effectiveness of the AF4Q initiative and has a
target population residing in fifteen markets across the United States, ranging in size from
single counties to entire states. The overlapping dual-frame design is comprised of both
RDD landline and targeted cell phone samples. This poster presents data describing the
geographic accuracy of the cell phone samples in each market and how the accuracy varied
by market size and characteristics of the population. Based on our findings, we will make
recommendations for future dual-frame RDD studies of small geographic areas.
10. Using Maximum-Difference Scaling to Assess Community Values about Local Water
Resource Management
Tom Eiland, CFM Strategic Communications; Edward P. Johnson, SSI
As a suburban community transitions from an agriculture-based economy to an industrial-
based economy through rapid population growth, some of the core values and attitudes
towards the environment can change. Washington County, Oregon has gone through this
change over the past 20 years. Since 1990, the county’s population has grown by 250,000
people (+70%) with high-tech companies, such as Intel, becoming the largest employers. To
try to meet its new consumers’ needs, Clean Water Services (a local sewer and storm
water service district) wanted to assess the values and priorities for water resource
management in the new community makeup. Working with CFM Strategic Communications
and SSI, the District developed an online panel of approximately 30,000 residents. A total of
1,398 residents participated in an online survey on the relative importance of eight different
uses of water. Instead of using a typical Likert scale where people could answer that all
uses of water were important, a Maximum-Difference Exercise was used to force
respondents to make trade-offs between each use of natural resources. Relative utilities
were then compared overall and by different geographic locations to determine the values of
each potential water use. In particular, the Maximum-Difference technique was extremely
well suited to make respondents choose between categories that they might otherwise take
for granted. As a result, the District collected richer data and was able to make more
informed decisions on how to best allocate resources in a way that best meets the needs of
its changing user population.
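A minimal sketch of how maximum-difference responses translate into relative scores: a simple best-minus-worst count scaled by exposure. Full analyses generally fit a multinomial logit or hierarchical Bayes model, and the water-use items and tasks below are hypothetical, not the District’s data.

```python
from collections import Counter

# Hypothetical MaxDiff tasks: in each task the respondent saw four water uses
# and picked the most and least important.
tasks = [
    {"shown": ["drinking", "irrigation", "recreation", "habitat"], "best": "drinking", "worst": "recreation"},
    {"shown": ["drinking", "industry", "habitat", "flood control"], "best": "habitat", "worst": "industry"},
    {"shown": ["irrigation", "industry", "recreation", "flood control"], "best": "flood control", "worst": "recreation"},
]

best, worst, shown = Counter(), Counter(), Counter()
for t in tasks:
    best[t["best"]] += 1
    worst[t["worst"]] += 1
    shown.update(t["shown"])

# Best-minus-worst count divided by times shown gives a first-pass relative score.
scores = {item: (best[item] - worst[item]) / shown[item] for item in shown}
for item, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{item:15s} {score:+.2f}")
```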
11. Are We Asking the Right Questions? An Exploration Into Crowdsourcing Survey
Questions
Bryan B. Rhodes, RTI International
Survey researchers use several methods to develop research questions and gather
corresponding survey items, particularly for omnibus style surveys. Researchers may
include established survey items, call on expert panels, or conduct focus group or interviews
with particular populations. These methods, however, can be limiting in the number of
voices that are represented. Researchers (or research topics) who may be less established
in a field may not be included. This could mean important gaps in knowledge or new
strands of research are left unconsidered for a survey. One way of incorporating a much
wider range of viewpoints when developing a survey is crowdsourcing. Crowdsourcing is
defined as “a novel method of online, distributed idea generation, problem-solving, and
decision making that involves an open call to a large, often undefined network or community
of people (‘a crowd’), to provide either independent or collaborative contributions to solving a
problem or performing a task” (Dalal et al., 2011). To further explore this possibility, RTI
International hosted a crowdsourced “Research Challenge.” The challenge put a call out to
researchers across a range of fields to submit a short research brief and up to 10 survey
items. The submissions were blindly reviewed by a group of survey experts, and ten
winners were selected to have their survey items fielded. This presentation will give an
overview of how the “Research Challenge” was conducted. In addition, based on
submissions and a survey of participants, the presentation will explore the types of
researchers who entered (and won) the contest, as well as their motivations. The results of
the “Research Challenge” show that important research questions and survey items can
come from a broad spectrum of researchers who might otherwise be overlooked.
12. The Cultural Life-Course of Attitudes Toward New Medical Technologies: A Case
Study of Xenografts
Mariah D. Evans, University of Nevada, Reno; Jonathan Kelley, International Survey
Center
How do people decide whether new medical technologies are good, neutral, or evil? We
explore this question through a case study of attitudes toward a rapidly emerging
biotechnology which potentially could save thousands of lives annually: xenografts
(xenotransplantation). This involves taking a human patient's own cells, modifying them so
they can grow into an organ, for example a heart, and implanting them in an animal fetus
(often a pig), sacrificing the animal after the heart has grown large enough, and
transplanting the heart into the patient. Data are from a nationally representative U.S.
sample survey (N=2069) with reliable multiple-item measurement of key concepts, analyzed
by structural equation methods. The results show that public attitudes are largely positive,
with differences mainly reflecting cultural stances rather than social structure. Consistent
with relational anchoring theories, attitudes are strongly shaped by views on conventional
human-to-human transplants. They are also strongly influenced by scientific knowledge and
by acceptance of a Darwinian worldview. Demographic and religious differences are few.
Extrapolating from these findings, we propose a hypothesis about the cultural life course of
new technologies.
13. The Effect of Incentive Offer Timing on Interview Completion Rates for the General
Social Survey
Beth A. Fisher, NORC at the University of Chicago; Mike Buha, NORC at the
University of Chicago
Much has been written about the use of incentives in survey research, types and timing of
incentives, and how this affects interview participation rates. Prior research suggests that
the use of incentives can improve response rates in most types of surveys. Further, Singer,
Van Hoewyk, and Maher (1998) found that providing incentives does not reduce future
survey participation if incentives are not subsequently provided in panel surveys. However,
feedback from our field staff during the most recent round of the General Social Survey
suggested otherwise, with interviewers stating that offering incentives, particularly to panel
cases that had previously received incentives, was critical to completing interviews. The
2012 General Social Survey began offering incentives to its panel respondents within one
month of the start of the field period, typically after an attempt to obtain the interview without
an incentive had failed. Interviewing staff were authorized to offer a fifty-dollar incentive and,
upon refusal and with project staff approval, to increase the offer in increments of fifty dollars
to a maximum of two hundred dollars or whatever their prior-round incentive had been. All
incentives were conveyed in both direct communication and refusal letters that were mailed
to the respondent. By the end of the field period, all remaining cases, regardless of prior
round incentive amount, were offered a two-hundred dollar incentive if the field manager
thought it was necessary. Our poster will address whether or not these non-random incentive
payments appeared to make a difference in interview completion rates. We will add to this
analysis by examining how timing, differentials in the incentive amount between offers, and
amounts of incentives offered in previous rounds of the General Social Survey could have
affected response rates for panel and address-based respondents.
14. Social Media Usage Among Young Adults: What, How and Why?
Caitlin Krulikowski, Fors Marsh Group; Katie Solook, Fors Marsh Group; Yalcin
Acikgoz, Appalachian State University; Jennifer C. Romano Bergstrom, Fors Marsh
Group; Shawn Bergman, Appalachian State University and Fors Marsh Group
Social media hosts a tremendous amount of data about individuals and, as such, has
emerged as a new avenue for data collection. However, utilizing this resource effectively
requires in-depth knowledge about social media usage. Information about social media user
demographics for various outlets is critical to the extent that a targeted approach for data
collection is needed. While many studies exist that explain the number of people using
social media and the various types of social media people use, few provide specific details
of time spent or behaviors and interactions on social media by different groups of users
(e.g., gender, age, race). In this study, we sought to examine how young adults (ages 16-
24) use social media. Specifically, we explored how much time they spend using various
social media compared to performing other activities (e.g., reading, playing sports), how
much personal information they share on each, and what specifically they do on each social
media (e.g., post, read). To study young adults’ social media behavior, we created a 58-item
pencil-and-paper survey. Question topics included (but were not limited to) Internet and Social
Media Usage (e.g., general activities comparison, interaction on social media), Future Plans
(using social media to get information about future plans), and Current Experiences
(employment, education). A total of 3,743 participants completed the probability-based survey.
The data demonstrate that most young adults use social media as much as or more than they talk on the
phone, play sports, or engage in other activities, and that there are usage differences between sub-populations. In
this talk, we will show the different ways young adults from different groups use various
types of social media and the amount of personal information they share on each. There is
applied value in exposing trends of young adults’ social media behaviors for researchers
interested in optimizing social media usage.
15. An Alternative Approach to Measuring and Describing Trust as a Complex Socio-
Cultural Phenomenon
Anastasia Mirzoyants, InterMedia Survey Institute
This study proposes a statistical model that singles out trust while accounting for a range
of factors that influence interpersonal relationships. Prior empirical studies examined trust
from a qualitative perspective: through the description of participants’ beliefs, experiences
and behaviors (Blomqvist, 1997). As a result, there is no definition of trust agreed upon by
different academic disciplines because most existing theories identify trust by describing its
attributes rather than measuring it directly (Blomqvist, 1997). The researcher uses the Rasch
analysis to design a quantitative instrument that can be used to measure trust. Earlier
attempts to use the Rasch model in social research demonstrated that the Rasch analysis
enables a creation of a rigorous measure useful for focused exploration of complex
phenomena common in various socio-cultural environments (Fisher, 1991; Irwin & Irwin,
2005; Johnson et al., 1995). The proposed measure relies on two theories of trust: first, the
study of Bryk and Schneider (2002), who describe trust as the mutual positive evaluation
between relationship participants along four components: respect, competence, regard for
others, and integrity. The second theory is Lewis and Weigert’s (1985) interpretation of trust
as a tri-level phenomenon, which consists of cognitive, emotional, and behavioral
components. After a series of instrument calibrations, the researcher added one more level
to Lewis and Weigert’s theory, loyalty, thus creating a 4x4 matrix-type measure of trust.
The alternative measure was tested in a pilot study, which demonstrated that the measure
captures the overall structure of trust and can help detect the differences in trust due to
participants’ demographic and/or socio-cultural characteristics, especially in the
environments characterized by the power asymmetry and insufficient sense of belonging.
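For reference, the dichotomous Rasch model underlying this kind of calibration takes the standard form below (the generic textbook formulation, not the author’s specific instrument), where theta_n is person n’s latent trust level and delta_i is the location of trust item i:

```latex
% Dichotomous Rasch model: probability that person n endorses trust item i,
% given the person's latent level \theta_n and the item's location \delta_i.
P(X_{ni} = 1 \mid \theta_n, \delta_i) = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}
```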
16. The Effect of Cognitive Dissonance and Effort Justification on Recruitment into a
Longitudinal Survey Study of Military Families
Hope McMaster, Naval Health Research Center; Kelly Jones, Naval Health Research
Center
Background: Despite substantial improvements to survey design and implementation, there
has been a general decline in health survey participation in recent decades. A frequently
cited barrier to participating is the considerable time and effort it takes to complete a
comprehensive survey of health-related issues. Is it possible that requiring considerable
time and effort of survey responders actually helps in the recruitment of new survey
respondents? In order to address this question, the theory of cognitive dissonance via effort
justification was used as a framework for designing an experiment as part of enrolling
military personnel in a large prospective health study. Methods: The study population
consisted of a random sample of 598 Millennium Cohort Study participants randomly
assigned to participate in either a low-effort task (completing 5 pages of a health survey) or a
high-effort task (completing 24 pages of a health survey) before requesting their spouse’s
contact information, so their spouse could be invited to take a health survey similar to the
one they were taking. Agreeing or not agreeing to the request for their spouse’s contact
information was considered a proxy measure for the participant’s attitude about the health
survey. Logistic regression was performed to investigate the adjusted associations. Results:
A total of 494 (83%) members of the original sample completed the survey. After
adjusting, respondents engaging in the more effortful task (N=258, 31% referred) prior to the
request were more likely to provide their spouse’s contact information than those engaging
in the low effort task (N=236, 3% referred) (adjusted odds ratio 15.6, 95% confidence
interval: 7.2–33.9). Conclusion: These findings suggest that spending considerable time and
effort completing a health survey may actually increase general regard for the survey, and
thus increase the likelihood of agreeing to subsequent recruitment requests.
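Using only the group sizes and referral percentages reported above, a quick unadjusted odds ratio can be checked against the adjusted estimate. This is simple arithmetic on the reported marginals, not the study’s adjusted logistic regression.

```python
# Figures reported in the abstract:
# high-effort arm: n = 258, 31% provided spouse contact information
# low-effort arm:  n = 236, 3% provided spouse contact information
n_hi, p_hi = 258, 0.31
n_lo, p_lo = 236, 0.03

odds_hi = p_hi / (1 - p_hi)
odds_lo = p_lo / (1 - p_lo)
unadjusted_or = odds_hi / odds_lo
print(round(unadjusted_or, 1))   # roughly 14.5, in line with the adjusted OR of 15.6
```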
17. Can’t They or Won’t They Answer Our Questions? The Implications of Satisficing in
Attrition Analysis
Veronica Roth, The Pennsylvania State University; David Johnson, The Pennsylvania
State University
Longitudinal data collection offers researchers the chance to explore change over time and
establish temporal order, a necessary assumption for multivariate analysis (Johnson, 1988).
If attrition is non-random, the sample may yield biased estimates. Satisficing occurs
when respondents do not fully process a question before giving a response, and it may falsely
increase the reliability of answers due to consistent, but not valid, responses. Respondents who
satisfice may be less invested in a survey, or they may have lowered cognitive ability to
answer demanding questions in the survey (Krosnick, 1991). Satisficing has been linked to
non-response in cross-sectional research, due to lowered cognitive ability (Kaminska et al.,
2010). Using the National Survey of Fertility Barriers (NSFB), I will conduct an analysis of
how satisficing may be related to attrition. The NSFB is a nationally representative RDD
telephone survey, initially conducted from 2004 to 2007, that included 4,792 women aged 25-
45, with a three-year follow-up interview of 3,723 respondents (Johnson and White,
2009). Using recency and primacy effects, reliability scores, education and duration of
survey, I will test hypotheses that attrition is related to satisficing due to both cognitive ability
and lowered commitment to the survey. I will then discuss the implications of these findings
in the context of both reducing attrition and detecting bias in the dataset.
Johnson, D. (1988). Panel analysis in family studies. Journal of Marriage and Family, 50(4), 949-955.
Johnson, D. R., & White, L. K. (2009). National Survey of Fertility Barriers [Computer file]. Population Research Institute [distributor], The Pennsylvania State University, University Park, PA.
Kaminska, O., McCutcheon, A. L., & Billiet, J. (2010). Satisficing among reluctant respondents in a cross-national context. Public Opinion Quarterly, 74(5), 956-984.
Krosnick, J. A. (1991). Response strategies for coping with the cognitive demands of attitude measures in surveys. Applied Cognitive Psychology, 5(3), 213-236.
18. Inauthentic Respondent Behavior
Arianne Buckley, Arbitron, Inc.; Will Waldron, Arbitron, Inc.
Arbitron developed an electronic Portable People Meter (PPM) that automatically detects
audio exposure to encoded radio signals. A key feature of Arbitron’s PPM is its capacity to
measure the listening behavior of each household member by issuing each person his or
her own personal meter. The meter also contains a motion detector that allows Arbitron to
determine whether the meter was carried each day. Panelists receive instruction and
coaching to carry their meter, and only their own meter, throughout the day and most
panelists comply very well with these instructions. However, as a quality measure, Arbitron
has developed methods to determine when panelists are non-compliant with these
instructions so they can be coached and/or removed from the sample as indicated. This
study takes a closer look at inauthentic respondent behavior within Arbitron and how it can
relate to other survey researchers. The study will examine the prevalence of this behavior
and the motivations behind it. The analysis will investigate any patterns seen among these
non-compliers in order to help develop strategies for predicting and preventing this behavior
before it occurs.
19. The Interpretation of Aerial Imagery as an Alternative to In-Field Listing for Address
Frame Creation in Rural Environments: A Proposed Methodology With Empirical
Results
Becki Curtis, NORC at the University of Chicago; Ned English, NORC at the University
of Chicago
While it is now possible to use address lists derived from the United States Postal Service
Delivery Sequence file (DSF or CDSF) in urban and suburban areas, rural areas without
city-style delivery may still necessitate in-field listing for address frame creation. Due to the
resource intensive nature of in-field listing, many studies are not able to proceed with frame
construction in rural areas. In an effort to understand the cost-benefit of listing procedures in
rural areas, we have outlined a methodology for using aerial imagery for creating and
validating housing unit lists as an alternative to in-field listing in rural areas. In so doing we
present empirical results from a comparison of aerial imagery to an in-person listing of a
segment in northeastern Montana. The in-person listing for this study was part of the 2011-
12 NORC National Frame listing. The aerial listing was completed blindly without use of the
DSF in order to determine what percentage of housing units could be found remotely. A
similar study was completed by Dreiling et al. (2009) as a part of the National Children’s Study
(NCS), finding that alternative listing procedures were less time-consuming and more cost-
efficient than “on-site” listing methods while still allowing for the identification of a large
percentage of housing units. Likewise, we determined that the aerial listing found
85% of the housing units listed in-person, while in-person listing took four times as long to
complete and cost ten times that of the aerial listing. Our preliminary results indicate that the
use of aerial imagery may be a suitable alternative to in-field listing in certain rural
environments. Dreiling K., Trushenski S., Kayongo-Male D., & Specker B. (2009).
Comparing household listing techniques in a rural midwestern vanguard center of the
National Children's Study. Public Health Nursing, 26(2), 192-201.
20. Sample Responsiveness to Tracking Efforts on the SIF WorkAdvance 18-Month Study
Christy Aroopala, Decision Information Resources, Inc.; Jo Anna Hunter, MDRC; Lee
Robeson, Survey Management Inc.
The success of longitudinal studies is directly tied to the quality of the respondent contact
information in the sample (Laurie et al., 1999). Updated contact information on respondents
prior to study launch can save valuable time and resources during the study and helps
reduce nonresponse. Longitudinal studies with hard-to-reach populations, those that are
mobile and low-income, require special attention to tracking and cohort maintenance since
their contact information changes frequently (Duncan & Kalton, 1987). Recent research has
begun exploring successful strategies for increasing response rates to tracking efforts in
longitudinal and multi-wave studies (McGonagle, Couper, & Schoeni, 2011). This paper
evaluates the tracking efforts implemented in the 18-month pre-launch phase of the SIF
WorkAdvance Study. All 3,400 participants are recruited to the study in monthly cohorts and
are randomly assigned to either program or control groups. Participants are then surveyed
18 months later to evaluate program effectiveness. During the 18-month pre-launch period,
tracking mailings are sent to respondents requesting their updated contact information at 6,
9, 12, and 15 months post-random assignment. These tracking efforts are currently
underway and will continue through September 2014. This paper evaluates sample
responsiveness to date for these three types of mailings: (1) 6-month greeting card with a
magnet, (2) 12-month letter with a perforated bottom to return via Business Reply, and (3) 9-
month & 15-month emails. Sample responsiveness will be evaluated with measures on the
number of voicemail messages, Business Reply returns, and email updates received from
respondents in response to tracking efforts. We also explore whether age or gender impacts
responsiveness to these different types of mailings to assess possibilities for targeted
tracking efforts in future work.
21. A Balancing Act of Politics and Brands: A Look at Corporate Donations to Political
Candidates and the Impact on Attitudes of Corporations, Politicians, and Purchase
Behavior
Whitney O. Walther, University of Minnesota
This past summer, Chick-fil-A was at the center of a political controversy after Dan Cathy, its
chief operating officer, made several public comments opposing same-sex marriage. As a
result, many consumers either boycotted or buycotted the company’s product depending on
their own stance on the issue. Similarly, posts spread via Twitter and Facebook last spring
urging individuals to stop shopping at Urban Outfitters and American Apparel (known for
attracting young, liberal-minded customers) after it was discovered that the CEO of the
companies, Richard Hayne, and his wife donated to the campaign of right-wing Republican
Sen. Rick Santorum and Santorum’s Political Action Committee. The current study uses
balance theory (Heider, 1958) to investigate the interplay of attitudes between brands and
politicians. Using a 3 (corporation supports/opposes/neutral) x 2 (favoring
candidate/opposing candidate) (N = 210) design, this study explores the way in which
individuals attempt to maintain a cognitively balanced relationship between themselves,
corporations that donate to a political candidate, and politicians. It is hypothesized that
one’s level of support for either a corporation or a candidate will determine the way in
which he or she shifts his or her attitudes toward the corporation and candidate after discovering
that donations were made. For example, if one has a high regard for Urban Outfitters, his or her
opinion of Rick Santorum might increase after finding out about the donations made by
Urban Outfitters’ CEO. Alternatively, one who highly opposes Rick Santorum may be less
likely to purchase items from Urban Outfitters after hearing of the donation. Results suggest
general confirmation of balance theory, in that individuals wish to maintain a balanced
relationship between their attitudes of corporate donors and politicians. General attitudes
toward the corporation and candidate, as well as purchasing behaviors, are investigated.
Results have both theoretical and practical implications for political communication research.
22. Designing and Defending Surveys Used in Commercial Litigation
Melissa Pittaoulis, NERA Economic Consulting
Survey evidence has become increasingly important in commercial litigation, particularly in
intellectual property disputes. In this paper, I discuss the challenges of conducting litigation
surveys. One set of challenges is encountered at the design stage. These include drafting
questionnaires on unfamiliar topics, choosing the appropriate data collection mode, creating
any stimuli that need to be tested, and working on tight deadlines. The second set of
challenges is encountered after the survey is completed and the report has been submitted
to the court. Surveys that are proffered as evidence in legal cases are held to a particularly
high level of scrutiny. In most cases, the survey will be critiqued by a survey researcher
hired by the opposing side. In addition, judges have varying experience evaluating surveys,
and some may be quite skeptical of their results. Thus, the survey researcher must be
prepared to defend his or her methodological choices. Relying on the academic survey
research literature is one of the most effective ways to do this. However, the researcher
must be aware of the areas in which the legal precedent on best survey practices and
current academic opinion diverge. One example of this divergence is the use of the “don’t
know” option.
23. Voter Interpretation of Large Numbers in Politics: A Comparison of Data Collected
From In-Person Solicited Surveys and Mechanical Turk
Brian M. Guay, University of Richmond; David Landy, University of Richmond
This poster presents data collected in two replicated experiments; one using Amazon’s
Mechanical Turk and the other using a more traditional in-person solicited survey technique.
Mechanical Turk is becoming increasingly popular in the field of psychology and cognitive
science, though it has been slower to gain popularity in the fields of political science and
public opinion. We explore the slow growth of this trend, while providing an analysis of data
collected in replicate experiments. The experiment run using both types of data collection
methods explores the effectiveness of a number-line intervention on voters’ interpretation of
large numbers, such as a million, billion and trillion. American voters are being increasingly
confronted with numbers of such large magnitude in daily political discussion of the budget,
deficit and debt. Previous research shows that individuals often incorrectly use number
comprehension techniques to estimate the magnitude of these numbers. In this experiment,
participants are asked to rate a series of political scenarios based on real political events
involving large numbers and to then place similar numbers on a number line ranging from
one thousand to one billion. The experimental group is then presented with a similar number
line, but with one million placed at its proper location. Participants are asked to evaluate a
second set of political situations and number lines, thus demonstrating the effect of the
experimental group’s exposure to the intervention task. The data collected using in-person
solicited surveys and Mechanical Turk are presented and analyzed.
24. How Representative are Google Consumer Surveys? Results From an Analysis of
Google Consumer Survey Questions Relative to National-Level Benchmarks With
Different Survey Modes and Sample Characteristics
Parvati Krishnamurty, NORC at the University of Chicago; Erin Tanenbaum, NORC at
the University of Chicago; Michael Stern, NORC at the University of Chicago
The decrease in coverage for traditional random digit dialing (RDD) samples is well
documented (e.g., Blumberg et al. 2011). This decline in landline connections, particularly
for young people, makes coverage especially problematic (Keeter et al. 2007). Although
mobile phones can be added to landline sample frames to increase coverage, this dual
frame approach introduces new challenges, as cell-phone samples are more prone to nonsampling
errors than landline RDD samples and, in the United States, incoming calls are often counted
against the respondent’s minutes (Brick et al. 2011). Non-probability Web-based supplements have
been suggested as a means of reducing problems with RDD coverage and picking up cell-
only households without respondent-side costs. However, three questions need to be
answered. First, do we find more cell-only households among non-probability Web
samples? Second, how do these Web-based results differ from national level random
sample results? Third, how demographically different are these samples from mode varying
probability samples? In this paper, we present an analysis of a series of Google Consumer
Survey questions including home cell-phone usage and compare the results to those from
three national-level random sample surveys, all of which were cited in the AAPOR Cell
Phone Task Force report.
25. Enumerating Households via a Mail Questionnaire
Charles D. Harm, Arbitron, Inc.
Arbitron is moving toward an address-based sampling (ABS) frame, in an effort to reach a
greater proportion of U.S. households. Currently, a mail-based screener questionnaire is
sent to an ABS sample household where a selected address cannot be matched to a
landline phone number. If a respondent reports being cell-phone-only or cell-phone-mainly,
the household is added to a cell-phone frame and used to supplement a 2+ list-assisted
RDD sample. As part of the current screener questionnaire, respondents are asked to
provide information on the demographic composition of their household (e.g., race, age,
language). In order to maintain a representative sample, households are selected to
participate in the Ratings based on their household characteristics. The current
demographic questions are relatively simple. Moving forward, our goal is to collect more
detailed demographic information from households. Enumerating households gives us the
ability to further stratify our sample, and focus our efforts on only attempting to recruit
households that have desired household characteristics. Two approaches to household
age/gender enumeration will be tested. One method involves asking for the “presence of”
household members that fall into a defined age/gender category. The other method involves
asking for the specific number of household members that fall into a defined age/gender
category. Households will be enumerated via a follow-up phone call to assess the accuracy
of the enumeration data collected via the screener questionnaire. Which approach to
household enumeration will provide more accurate data? This presentation will examine the
impact of household enumeration on response rates, and whether data quality is influenced
by household demographics.
26. Alternative Strategies for Linking Longitudinal Survey Data
Aaron M. Pearson, University of Michigan Survey Research Center; Ryan J. Yoder,
University of Michigan Survey Research Center; Lisa S. Holland,
University of Michigan Survey Research Center
As respondent concerns about confidentiality and reluctance to provide identifying
information increase, linking participant responses across multiple questionnaire
administrations can present a challenge for social science researchers. One approach is to
have participants create a self-generated identification code (SGIC) to link questionnaires.
This code is composed of a group of items that are well known to the participant, easily
recalled from memory, and remain stable over time. In this presentation we describe a four-
element SGIC linking strategy consisting of month of birth, day of birth, last initial, and last
four digits of the social security number. We examine the effectiveness of the resulting code
for linking participant questionnaires in two distinct components of a large military study. In
the first component we examine the utility and validity of the linking strategy for respondents
who were asked to complete a questionnaire spanning two administrative sessions,
separated by a day. A unique identification code was assigned to participants to serve as
the primary link between sessions, allowing us to assess the effectiveness of the alternative
SGIC link by calculating match rates across sessions. We also examine the incremental
utility of each SGIC element to successfully match questionnaires. In the second
component, the SGIC became the primary link. This time we were interested in
demonstrating the utility of the SGIC as the only source of information for linking
participants' questionnaires over three time periods relative to their combat deployment. The
first session was conducted prior to deployment, the second took place immediately upon
return (approximately nine months), and the third was administered three months after
return. Again, we assess the effectiveness
of the SGIC by examining match rates across sessions.
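To make the linking mechanics concrete, the sketch below (Python, with hypothetical field
names such as birth_month and ssn_last4 that are not taken from the study) shows one way a
four-element SGIC could be constructed and how match rates might be computed when a
separate primary identifier is available as ground truth.

    # Illustrative sketch only; field names (birth_month, birth_day, last_initial,
    # ssn_last4, primary_id) are hypothetical, not the study's actual variables.
    def build_sgic(record):
        """Concatenate the four SGIC elements into a single code string."""
        return (f"{record['birth_month']:02d}{record['birth_day']:02d}"
                f"{record['last_initial'].upper()}{record['ssn_last4']}")

    def match_rates(session1, session2):
        """Return (share of session-2 cases whose SGIC appears in session 1,
        share of those matches that agree with the primary identifier)."""
        codes1 = {build_sgic(r): r['primary_id'] for r in session1}
        matched = correct = 0
        for r in session2:
            code = build_sgic(r)
            if code in codes1:
                matched += 1
                if codes1[code] == r['primary_id']:
                    correct += 1
        return (matched / len(session2) if session2 else 0.0,
                correct / matched if matched else 0.0)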
27. Investigating the Bias of Alternative Statistical Inference Methods in Sequential
Mixed-Mode Surveys
Zeynep T. Suzer-Gurtekin, ISR - University of Michigan - Program in Survey
Methodology; Steven G. Heeringa, ISR - University of Michigan - Program in Survey
Methodology; Richard Valliant, ISR - University of Michigan - Program in Survey
Methodology
Sequential mixed-mode surveys combine different data collection modes sequentially to
reduce nonresponse bias under certain cost constraints. However, as a result of
nonignorable mode effects, nonrandom mixes of modes may yield estimates of population
quantities such as means, proportions, and totals with unknown bias properties. The assumption of
ignorable mode effects governs the existing inference methods for sequential mixed-mode
surveys. The objective of this paper is to describe and empirically evaluate the proposed
multiple imputation estimation methods that account for both nonresponse and nonrandom
mixtures of modes in a sequential mixed-mode survey. The American Community Survey
(ACS) or the 1973 public-use Current Population Survey and Social Security Records Exact
Match data will be used to conduct empirical and simulation evaluations. The focus of the
empirical evaluations and simulations will be mean family income and health insurance
coverage.
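For context, multiple-imputation estimators of this kind typically combine estimates across
the M imputed data sets using Rubin's combining rules; a standard statement (general, not
specific to this paper) is:

    \bar{Q} = \frac{1}{M}\sum_{m=1}^{M}\hat{Q}_m, \qquad
    T = \bar{U} + \left(1 + \frac{1}{M}\right)B, \qquad
    B = \frac{1}{M-1}\sum_{m=1}^{M}\left(\hat{Q}_m - \bar{Q}\right)^2,

where \hat{Q}_m is the estimate (e.g., a mean or proportion) from imputed data set m,
\bar{U} is the average within-imputation variance, and B is the between-imputation variance.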
28. The Nature and Dynamics of Candidate Trait Impressions
Scott Clifford, Duke Initiative on Survey Methodology; Sunshine Hillygus, Duke
University
Character trait perceptions are important predictors of vote choice, yet we know little about
the formation, nature, and dynamics of candidates’ trait images. Using the AP-Yahoo 12-
wave panel survey, we trace the individual-level evolution of candidate image throughout the
2008 presidential election. We then track evaluations of Obama into the 2010 and 2012
elections. Initial analysis finds substantial partisan polarization in relative trait ratings
throughout the 2008 campaign, a trend which is especially driven by initially undecided
voters. Yet, in spite of the polarization, we find evidence that voters also update their
evaluations as they learn new information. Not only do candidates maintain distinct
character strengths and weaknesses throughout the campaign, even among party
supporters, but individuals make greater distinctions between trait dimensions for any given
candidate. However, once individuals have settled on a preferred candidate, they show
greater consistency in their trait evaluations of that candidate, suggesting a process of
motivated reasoning. Trait evaluations of Obama, however, change again in response to
information learned during his first term in office. Finally, our analysis accounts for panel
attrition and considers its implications for the substantive conclusions.
29. On Factors Affecting the Accuracy of Congressional District Level Polls
Masahiko Aida, Greenberg Quinlan Rosner Research
When it comes to polling accuracy, political polls should be the easiest to evaluate; we can
compare final survey estimates against the actual outcome after Election Day. While it is
relatively easy to evaluate the overall accuracy of each poll, it is very difficult to have a
holistic understanding of the roles played by various features of the poll on its accuracy.
Often each survey has a different mode (e.g., IVR vs. live), a different survey date, a different
treatment of missing data, a different weighting scheme, and different question wordings.
These varied features make an apples-to-apples comparison of surveys quite challenging.
The author has a unique opportunity to shed some light on this situation, as he has access to
the micro-level data of many congressional district polls. Using micro-level data, the author
can standardize certain features of the polls (e.g., using exactly the same treatment of
missing data and identical weighting targets) and evaluate the effects of survey mode and of
timing (e.g., number of days prior to Election Day). The author will use micro-level data from
473 congressional district polls from 2008, 2010 and 2012 to evaluate the impact of the
above factors on the accuracy of the estimates.
30. Evaluating the Effect of Remote vs. In-Person Training Modes on Data Quality
A. Rupa Datta, NORC at the University of Chicago; Micah Sjoblom, NORC at the
University of Chicago; Jill Connelly, NORC at the University of Chicago; Karen
Veldman, NORC at the University of Chicago; Vicki Wilmer, NORC at the University of
Chicago
The National Survey of Early Care and Education (NSECE) is an integrated set of surveys
with households with young children, and institutions and individuals providing care for
young children. This project employed a mixed mode data collection protocol and required
several hundred interviewers, who were responsible for multiple tasks: locating, contacting,
recruiting, and interviewing sampled households and sampled establishments and
individuals providing early care and education. In order to prepare interviewers for the
multiple challenges they would face in the field, a dynamic training effort was essential.
Budget limitations and schedule constraints, however, made it possible to train only a
portion of interviewers in person, so a group of more experienced interviewers were trained
remotely. These two trainings featured similar content, but differed in significant ways,
mainly in how the content was delivered. We will evaluate the relative effectiveness of these
two training modes, including discussing differences in the trainings themselves, and
comparing costs, retention on the project, and other operational factors. The majority of the
effort will be to look at interviewer performance during the field period with a special focus
on data quality issues. The project team developed a set of field interviewer performance
metrics from different data sources to monitor and evaluate performance on a weekly basis.
These metrics were constructed using both paradata (e.g., timing data, records of call
attempts) and questionnaire data (e.g., item non-response rates, quality of verbatim
responses) to create measures of interviewer efficiency and data quality across
approximately 100,000 completed screeners and interviews. These metrics along with other
non-survey-related interviewer characteristics (e.g., interviewer experience, measures of
previous performance, gender, languages spoken) will form the foundation of our analysis of
the training modes.
31. The Process of Turning Audit Trails From a CATI Survey Into Useful Data: Interviewer
Behavior Paradata in the American Time Use Survey
Nicholas Ruther, University of Nebraska – Lincoln; Polly Phipps, U.S. Bureau of Labor
Statistics; Robert Belli, University of Nebraska – Lincoln
In recent years, using paradata as a tool to improve survey methodology has grown
markedly. Audit trails, i.e., supplementary output from a computer-assisted survey program,
catalogue the actions taken during a survey interview, such as keystrokes, time information,
and edit warnings (Couper 2000; Mockovak and Powers 2008; Dahlhamer 2004). By
examining the audit trails, researchers can investigate problems and other issues within the
survey interview in a systematic fashion. The American Time Use Survey (ATUS) is
conducted using the Blaise CATI program, which produces an audit trail for each interview.
This presentation discusses the process of taking an audit trail from its original state to a
final assessable form, the problems that occurred and the solutions, and the information
gleaned from the resulting data. Focusing on the time diary portion of the ATUS, audit trail
text was imported to Microsoft Excel, parsed and tabulated, and subsequently the data were
imported into a SAS statistical program. Many useful indicators for diagnostic analysis of the
instrument and the interviewers were obtained. We were able to calculate and compare
counts of edit warnings, how interviewers interacted with them, and the length of time per
interaction. In the cases examined, interviewers were much more likely to
choose only one of the three options given by edit warnings when prompted. Verbatim
activity entries in the time diary, or activities not assigned a pre-programmed code in the
CATI instrument, were associated with greater use of durations (length of time entered in
minutes) versus stop times (entering specific time of day) for information on time spent on a
diary activity. This information was taken from a sample of 103 audit trails; larger samples in
future research should yield a wealth of new and more specific knowledge.
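As an illustration of the kind of processing described above, the short Python sketch below
tallies edit warnings from audit-trail text; the line format and regular expression are
hypothetical stand-ins, not the actual Blaise audit-trail layout.

    import re
    from collections import Counter

    # Hypothetical audit-trail line format: "HH:MM:SS ... Edit warning: FIELDNAME ..."
    WARNING_RE = re.compile(r"(\d{2}:\d{2}:\d{2}).*Edit warning:\s*(\S+)")

    def tally_edit_warnings(audit_lines):
        """Count edit warnings per field from one interview's audit-trail lines."""
        counts = Counter()
        for line in audit_lines:
            match = WARNING_RE.search(line)
            if match:
                counts[match.group(2)] += 1
        return counts

    # Example usage (hypothetical file name):
    # tally_edit_warnings(open("interview_001.txt", encoding="utf-8"))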
32. Air Pollution vs. Greenhouse Gasses. Government Should Limit the Amount? The
Impact of Question Wording
Volker Huefken, University of Duesseldorf, Institute of Social Sciences
Does “air pollution” seem like a less serious problem than “greenhouse gasses”? Do the
three different wordings affect public support for limiting the amount? In an experiment
embedded in a German national CATI survey, adults were randomly assigned to rate the
seriousness of “air pollution,” “greenhouse gasses,” or “greenhouse gasses that cause
global warming” and their support for the German government limiting the amount. It will
be shown whether the wording effect differs by social insecurity, left-right ideology, and
postmaterialism. Thus, word choice may sometimes affect public perceptions of the
seriousness of climate change and support for mitigation policies, but a single choice of
terminology may not influence all people the same way, making strategic language choices
difficult to implement.
33. Does It Really Make a Fracking Difference?
Robert K. Goidel, Louisiana State University; Michael Climek, Louisiana State
University; Lina Brou, Louisiana State University
One of the great challenges of survey research involves understanding when and how
citizens develop opinions on highly complex and technical issues. How do citizens who often
lack basic information on the political system develop an opinion on something as complex
as energy policy? Narrowing the scope even further, how do citizens form opinions on an
issue like hydraulic fracturing, a controversial and poorly understood technique for drilling for
natural gas? Does the use of the word fracking, a term commonly used for hydraulic
fracturing, affect public support for hydraulic fracturing? And, if so, in what direction? In this
paper, we utilize data from the 2012 and 2013 annual Louisiana Survey conducted by
Louisiana State University’s Public Policy Research Lab to consider whether slight shifts in
question wording affect public perceptions of the safety of hydraulic fracturing and support
for state government action to encourage drilling. Preliminary data from the 2012 Louisiana
Survey indicate that the use of the term hydraulic fracturing or fracking increases the
probability that respondents will say the method is unsafe and reduces support for state
government action to encourage drilling.
34. Survey Research and Social Media Monitoring During the 2012 London Summer
Olympics: A Case Study
Linda Lomelino, Social Science Research Solutions; Melissa Herrmann, Social
Science Research Solutions; Susan Sherr, Social Science Research Solutions; Robyn
Rapoport, Social Science Research Solutions
Social Science Research Solutions (SSRS) and Social Strategy1 (SS1) collaborated on a
pilot study during the three-week period surrounding the 2012 Summer Olympics in order to
address a crucial question: How can social media and traditional survey research work
together to generate insights and quality data? With the increase in Internet access and
social media usage, how can survey research and social media “listening” combine to add
qualitative depth to research findings and increase the value of the data collected? The
primary objective of the pilot study was to test the joint capability of traditional survey
research and social media data collection to observe attitudes and opinions and to better
understand the added value of integrating these two methods into a single research
endeavor. The study involved collection of data regarding respondents’ viewing of the 2012
London Summer Olympics and their attitudes about both the event itself and their
experiences consuming Olympic media content. Data collection occurred simultaneously
through random sample telephone omnibus surveys and Web monitoring from July 18
through August 19, 2012, a week prior to, during, and a week following the 2012 London
Summer Olympics. Looking at all three time periods allowed the research team to examine
whether intended viewership of the event differed significantly from consumers’ actual
viewing behavior. Similarly, collecting social media data during these three time periods
provided a benchmark of the volume and nature of conversations prior to, during, and after
the event. In addition to profiling the people who use social media and the specific social
media outlets that are being used most frequently, this study also demonstrated that, while
there are parallels between data collected through a random sample survey and social
media monitoring, these data also diverge in ways that are both interesting and important to
market researchers and academics.
35. Potential Impact of Modifying the Fielding Time of a Web-Based Survey
Herb M. Baum, Data Recognition Corporation; Anna Chandonnet, Data Recognition
Corporation
As the field of survey research looks for a sustainable future, greater emphasis is being
given to conducting Web-based surveys. However little is known about the pattern of
response for these surveys. The question our presentation will address is whether, in a
Web-based survey of a closed population, the percentage of respondents providing a
positive rating changes with the timing of when the person responds. The United States
Patent and Trademark Office (USPTO), to improve the quality of its work and comply with
the Government Performance and Results Act (GPRA), conducts a Web-based survey of
patent examiners twice a year. The survey is designed to gauge the satisfaction of the
patent examiners with the internal and external factors that impact their ability to provide
high-quality patent examinations. According to Dillman (2009) “The optimal timing sequence
for Web surveys has not, we believe, been determined yet. Moreover the timing will depend
on the nature of the survey and the population being surveyed.” In practice, many Web
surveys are fielded for two weeks with an initial invitation message followed by a reminder
one week later. However, despite our wanting to adhere to that schedule, either of the
following often occurs:
The survey field period is shortened. For example, there is a meeting next week and we
need to close the study early and present the results.
The survey field period is extended. For example, you received a low response rate and feel
that by keeping the study open longer you might increase it to a more respectable level.
We will explore how our results would differ with alternate Web survey field times. This
research is a continuation of work that was presented at a regional evaluation conference in
New Jersey.
36. Looking for Solutions to America’s Energy Problems
Jennifer Benz, Associated Press NORC Center for Public Affairs Research; Matt
Kozey, NORC at the University of Chicago; Trevor Tompson, Associated Press
NORC Center for Public Affairs Research
The U.S. public, politicians, policymakers, and experts alike agree that U.S. energy policy is
an important issue for the country and one where government needs to be part of the
solution. However, as demonstrated in the contentious exchanges on energy policy during
the 2012 presidential debates, consensus on the causes of America’s energy problems and
the appropriate policy solutions breaks down across a partisan divide. The Associated
Press-NORC Center for Public Affairs Research, with funding from the Joyce Foundation,
conducted a nationally representative household survey with 1,008 adults on landline and
cell phones to measure the general public’s opinions about key energy issues in the United
States. Additionally, the survey assessed how the public understands, learns about, and
acts upon energy issues. Using multivariate regression, we find that party identification is a
stronger predictor of opinions on energy issues than demographic and socioeconomic
characteristics. While individuals in both parties agree that energy issues are important at
fairly equal rates, party identification appears to be the strongest influence on perceptions of
the causes of and solutions to this country’s energy problems. As expected, this is
especially clear when looking at the partisan differences on alternative energy sources and
domestic drilling policies as causes of and solutions to the country’s energy issues.
However, among the many partisan divisions, we do find similarities on key attitudes that
have important policy implications. Mainly, we find that the energy industry and utility
companies have the potential to be accepted and trusted actors in policy solutions. The
public believes that the energy industry shares more responsibility for increasing energy
saving in the U.S. than the government or individuals. Additionally, utility companies are the
only source of energy savings information that reaches a majority of the public and is
considered a trusted source across party lines.
37. The Effect of Cell Phones on Uninsured Rates: A Comparison of BRFSS and the
Louisiana Health Insurance Survey Estimates
Ashley Kirzinger, University of Illinois Springfield; Stephen Barnes, Louisiana State
University; Dek Terrell, Louisiana State University; Robert Goidel, Louisiana State
University
In this paper, we investigate how the inclusion of cell phones in statewide samples affects
estimates of uninsured rates in Louisiana. We utilize data from the 2011 Behavioral Risk
Factor Surveillance System Survey (BRFSS) and from the 2011 Louisiana Health Insurance
Survey (LHIS), a 10,000-household survey designed to estimate the number of uninsured
children and adults. Both surveys significantly increased the number of cell-phone
respondents in 2011. In the BRFSS data, the uninsured rate for adults, 18 to 64, increased
from 24.5 percent to 26.8 percent. In the LHIS, uninsured rates for children decreased from
5 percent to 3.5 percent while uninsured rates for adults increased from 20.1 percent to 22.7
percent. The CDC strongly cautions against describing these shifts as trends given the
change in methodology. With this in mind, we ask a slightly different question: What would
the uninsured rates have been without the inclusion of the cell-phone sample?
38. Effects of Response Format on Measurement of Readership
Randall K. Thomas, GfK Custom Research, LLC; Curtiss Cobb, GfK Custom
Research, LLC; Julian Baim, GfK-MRI; Risa Becker, GfK-MRI
Estimating exposure to media sources, such as magazine readership, is a critical function
that determines advertising prices. One method to determine the extent of magazine
readership is through self-report, which can be done in a variety of ways, employing
paper-and-pencil self-administered instruments, human interviewers using show cards, and Web-based
surveys. When presenting a series of targets like magazines that serve as a filter for
subsequent follow-up questions, there are a number of techniques that have been used,
including a multiple response format (‘Select all’), a yes-no grid (requiring a yes or no to
each element), or a card sort task that separates the magazines into piles of ‘yes’ or ‘no’. As
part of an investigation to transition a magazine readership survey to a Web-based mode,
we experimentally investigated alternative response formats to determine readership in the
past 6 months. We converted the traditional human interviewer card sort task into a drag
and drop task whereby magazine titles would be displayed in a single pile and respondents
would drag and drop the magazines into 3 piles – Yes, read; Not sure; No, did not read. The
Yes-No grid also included a middle category ‘Not sure’. The multiple response format
presented magazines with 4 in a row with 4 columns, and a response at the bottom ‘I did not
read any’. Each format had multiple screens to accommodate over 250 magazine titles. We
found that the drag-and-drop format took the longest to complete, while the multiple
response format took the least amount of time. In addition, the drag-and-drop format showed
a 50% higher readership rate than the yes-no grid, and the yes-no grid showed a higher
readership than the multiple response format. We compare our results to those found with
high quality in-person interviews on readership.
39. The New Era of Innovative Incentive Treatments: Efficacy of Grand Prize Sweepstakes
versus Costly Individual Incentives
Ekua Kendall, Arbitron, Inc.
Arbitron developed an electronic meter that automatically detects audio exposure to
encoded radio and TV and other media signals. Panelists are asked to wear their meter
every day from the time they wake up to the time they go to sleep in order to measure their
full media exposure. The meter has a motion detector that allows Arbitron to determine
whether a panelist carried their meter on any given day and the panelist receives monthly
incentives based on their motion data. There is seasonal variance in meter carrying
behavior--with panelists less likely to carry their meter during times when there is likely
variance in their normal daily routine, such as during holiday periods and during the
Summer. Increasing individual incentives during these time periods is very costly. Over the
last 3 years Arbitron has analyzed the efficacy of implementing a grand prize sweepstakes
in place of individualized cash incentives for these seasonal periods. Previously, when the
study was in its infancy, we presented an AAPOR poster that was very well received, with
numerous post-conference follow-ups from attendees. Now, going into year three of
implementation, there are more data for continued analysis of the effectiveness of a
sweepstakes incentive in a panel setting. There were also some lessons learned in running
multiple sweepstakes on a yearly basis. This presentation will reveal performance-related
metrics for diverse demographic groups and for additional sweepstakes methods with
varying prize money amounts and visual promotional materials. This presentation will also
reveal other interesting findings and represents an expansion of our knowledge base in this
area of alternative incentives that anyone interested in this promising area of study will not
want to miss.
40. Analyzing American Trust and Confidence Utilizing a Mixed Mode ABS Nationwide
Survey
Danna Moore, Social and Economic Sciences Research Center; Donald Beck, Booz
Allen Hamilton; Bruce Austin, Social and Economic Sciences Research Center,
Washington State University; Dave Schultz, Social and Economic Sciences Research
Center, Washington State University
An important performance indicator for government is trust and confidence of the American
people. Financial and health care services are both fundamental to the health and well-
being of most Americans and families. The U.S. population has experienced tumultuous
circumstances with the high rates of foreclosures, bank closures and fraud, high financial
service fees, high health care costs and a shifting health care system. There is much
interest in the performance of the health care and financial services sectors as related to
consumer satisfaction and ensuing trust and confidence. This nationwide survey evaluating
trust and confidence in financial services, health care services, and important American
institutions provides a unique opportunity to perform analyses of mixed mode survey results.
This study evaluates the impacts of weighting and nonresponse adjustments for an address-
based sample frame survey. We explore the impacts of weighting and compare these
results across survey modes on key survey measures.
Thursday, May 16, 1:30 p.m. – 3:00 p.m.
AAPOR Demonstration Session #1
PHIT for Duty: Exploring a Mobile Data Collection Framework
Stacey Weger, RTI International; Paul Kizakevich, RTI International; Randy Eckhoff, RTI
International; Yuying Zhang, RTI International; Jennifer Lyden, RTI International;
Vesselina Bakalov, RTI International; Stephanie Bryant, RTI International
PHIT for Duty is an applied research program, developed on behalf of the U.S. Department of
Defense, for prevention of chronic psychological health issues and post-traumatic stress
disorder (PTSD) among troops recently returned from deployment. The Personal Health
Intervention Tool (PHIT™) is an innovative field-deployable self-help system that is intended to
be used for secondary prevention of psychological health problems with early intervention of
PTS symptoms and risk coping behaviors. The goal is to reduce the short-term impact of
traumatic and operational stress exposures, reduce incidence and duration of stress-related
health problems, improve quality of life, and reduce the risks for PTSD and other long-term
stress-related injuries. The PHIT platform combines a smartphone or tablet and optional,
nonintrusive physiological sensors. PHIT for Duty integrates a suite of health assessments with
an intelligent executive program that recommends, tailors, and presents advisories based on
established rules and processes. PHIT provides for collecting information (instruments),
executing application logic (virtual advisor), and displaying output information (activities). It
integrates data ranging from questionnaires to diaries to Bluetooth-linked sensors, including
wireless heart rate, sleep state, and actigraphy sensors. The built-in logic processor executes
custom logic to change the behavior of the application and display custom output using different
forms of media. The PHIT framework is flexibly designed to collect data from different sources,
have runtime intelligence for dynamic analysis, be customizable, work offline, and run on
multiple mobile devices. Data is stored locally in an encrypted database and uploaded
periodically to a project server whenever Wi-Fi is available. User privacy is maintained via multiple
layers and safeguards. This demonstration will provide an overview of the PHIT platform and
PTSD intervention app, discuss lessons learned from beta tests, and demonstrate examples of
unique, new survey capabilities using this mobile data collection platform.
Tablets as Data Entry Interfaces – Solving Data Cleaning and Transcription Issues
During Data Collection
Michael Costello, RTI International
Due to the increasing complexity of survey work, tablets can provide a strong support system for
data enumerators during collection. Software can be written to assist in reminding enumerators
when to skip questions, what kinds of prompts are acceptable to use, or when to abort a survey
due to responses provided. It can also ensure that crucial questions are not accidentally skipped
during collection. For survey administrators, the benefits are even more far reaching: Instant
access to data, metrics on enumerator pacing, instant data entry with no additional wait time,
GPS mapping of dwelling and more. Tangerine, an open source data entry interface developed
by RTI International, is the first tablet based data collection software custom-created to record
student responses on early grade reading or mathematics assessments, yet flexible enough to
capture common survey formats in a range of languages and scripts without requiring
programming expertise. Surveys can be collaboratively designed using a simple Web-based
tool, the Tangerine wizard, similar to Survey Monkey or Google Forms. Tangerine does not
require connectivity during data collection, making it usable in low-resource and low-bandwidth
environments. At the same time, where connectivity is available (e.g., via mobile networks), the
software allows for regular back-up of the data to the central server, which in turn allows for
immediate review and monitoring of data collection progress.
Designing Surveys for Tablets and Smartphones
Sabin Lakhe, U.S. Census Bureau; Elizabeth Nichols, U.S. Census Bureau; Murrey G.
Olmsted, RTI International; Tiffany King, RTI International
Designing surveys for mobile data collection raises a number of programming issues for
researchers. For example, should we design an app that can be downloaded by the user or a
Web application that would be rendered based on the type of mobile device? What are the
benefits or drawbacks of using app-based or Web-based surveys? How should we display
questions to accommodate different screen sizes and formats of devices? Should the user use
the keyboard to enter dates, or should we use a date picker? Should we automatically capitalize the first letter
entered for names and addresses? In preparation for the 2020 Census, the Census Bureau
developed and cognitively tested several Android apps for three different decennial census
forms: a household level questionnaire; a group quarters questionnaire; and a questionnaire
used for people who did not receive the household questionnaire. For this study, we used
Android-based tablet and smartphone devices to conduct the cognitive and usability testing. The
presentation will review the challenges faced by the project team, our findings from two rounds
of interviews, and the design changes that we made as a result of testing. The audience will
have an opportunity to use the questionnaires designed for the tablet and smartphone and
compare the differences themselves. We will also talk about next steps in this
work and how we plan to address some of the challenges of preparing these apps for use in the
field.
Thursday, May 16, 1:30 p.m. – 3:00 p.m.
AAPOR Concurrent Session B
Factors Related to Survey Participation
Social Isolation and Survey Nonresponse: An Empirical Evaluation Using Social
Network Data
Megumi Watanabe, University of Nebraska-Lincoln; Kristen M. Olson, University of
Nebraska-Lincoln; Christina D. Falci, University of Nebraska-Lincoln
Survey researchers have long hypothesized that social isolation negatively affects an
individual’s likelihood of participating in surveys, while social integration increases the likelihood
of survey participation. However, measures of social isolation usually rely on proxies that
measure marginalized groups in the population or isolated groups, such as the elderly and
nonwhites, not on direct measures of social isolation, that is, lack of connectedness to other
persons in general or lack of connection to similar others. We use the 2008 Survey on
Promoting Success among Faculty (AAPOR RR2 63.6%) to examine the relationship between
social isolation and survey participation. This study examines social networks among faculty in
academic departments. In this study, faculty identify the people in their department with whom
they collaborate for research or consider to be friends. Importantly, nonrespondents to the study
can be identified as research collaborators or friends by study respondents. Thus, social
isolation measures are available on both respondents and nonrespondents. Standardized
indegree (the number of connections divided by the department size) is used to measure
‘general social isolation’ in the research exchange and friendship networks. In addition, individuals’
connections to other people in their department with similar group characteristics, or homophily,
can be identified for a measure of ‘group isolation’. Preliminary analyses indicate that
standardized indegree in both the research exchange and friendship networks is positively
associated with survey participation rates. Also, we found that gender homophily in the friendship
network increases the likelihood of survey participation.
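A minimal sketch of the standardized indegree measure described above, assuming a simple
edge-list data layout (the tuple structure and variable names here are hypothetical):

    from collections import Counter

    def standardized_indegree(nominations, dept_size):
        """nominations: iterable of (nominator, nominee) pairs within one department;
        dept_size: number of faculty in that department.
        Returns each person's nomination count divided by the department size."""
        indegree = Counter(nominee for _, nominee in nominations)
        return {person: count / dept_size for person, count in indegree.items()}

    # Example: standardized_indegree([("A", "B"), ("C", "B"), ("A", "C")], dept_size=20)
    # -> {"B": 0.10, "C": 0.05}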
Community Attachment, Social Trust and Nonresponse to a Telephone Survey
Thomas M. Guterbock, Center for Survey Research, University of Virginia; Casey
Eggleston, Center for Survey Research, University of Virginia
In a previously presented paper (Guterbock, Hubbard and Holian 2006), we showed that
community attachment can be a strong predictor of individual and geographic variations in unit
non-response, exceeding indicators of population density and urbanicity in its explanatory
power. (Community attachment can be defined simply as the degree of connection that exists
between an individual or group and its locale.) In the present research, we attempt to replicate
and elaborate this finding using a larger sample and a modified measure of community
attachment. The data are from a 2009 telephone survey of 2,500 adults in the National Capital
Region, concerned with the experiences, attitudes, knowledge and likely behavior of the public
in case of a terrorist attack. The survey used a triple-frame design, but since geographic
information is largely unavailable for the attempted cell phone numbers, the present analysis
uses only landline phone numbers with known addresses. We use Census data, the survey data
aggregated to the ZIP-code level, and individual demographics to identify predictors of
community attachment using multi-level modeling. We then use the ZIP-code level data to
identify the relationship between community attachment and various outcomes: contact,
cooperation, and response rates. We find that community attachment is positively associated
with response rate, operating through its separate multiplicative effects on cooperation and
contact rates. We also consider a potentially important mediator of the relationship between
community attachment and response rates: generalized social trust, a factor not measured in
our earlier work. We discuss some theoretical and practical implications of these findings for
non-response bias, and argue that community attachment and social trust should both be given
more explicit attention in future research on unit non-response and non-response bias.
Survey Topic Saliency: An Examination of Potential Effects and Remedies
Johnny Blair, Abt SRBI; Pat D. Brick, Westat; J. Michael Brick, Westat
In this paper, we review, from a broad perspective, what is known about the potential effects of
survey topic on survey quality, what approaches have been used to mitigate undesirable effects,
and what directions this knowledge and experience suggest for survey design and research.
The conventional wisdom that sample members’ interest in the survey’s topic affects their
participation decisions has motivated the development of at least one theoretical model to
understand topic saliency effects and their interactions with other factors (Groves, Singer and
Corning 2000). Both practical experience and empirical research (e.g. Groves, Presser and
Dipko, 2004) suggest that topic salience can produce differential response rates, with the
potential to introduce bias when sample estimates differ by subgroup.
Researchers have investigated whether these effects may be mitigated by features of the
questionnaire (e.g. Brick et al. 2012) or by survey design. For example, Schwartz et al. (2006)
explored whether topic interest effects could be counterbalanced through the use of sample
quotas and incentives. Beyond unit response, there is evidence that salience at the item level
may affect item nonresponse (Adua and Sharp 2010) and measurement error (Stern, Smyth
and Mendez 2012).
There are several features of a survey that may potentially affect or mitigate topic saliency effects,
including:
Survey sponsorship
Survey design
Data collection mode
Incentives
Questionnaire design features
Questionnaire length
There is no model that includes these multiple factors, nor an empirical study that addresses the
entire range of design characteristics. The synthesis of the research literature and survey
methodology reports that we have undertaken will inform both future research and current
practice.
Partisanship and Nonresponse in Political Polls
Leah M. Christian, Pew Research Center; Michael Dimock, Pew Research Center; Danielle
Gewurz, Pew Research Center; Scott Keeter, Pew Research Center; Jocelyn Kiley, Pew
Research Center; Alec Tyson, Pew Research Center
Nonresponse in social and political surveys continues to grow. Although nonresponse can often
be distributed at random and is not usually a good indicator of bias in a survey, one issue of
particular concern to political pollsters is whether Republicans and Democrats respond to
surveys at similar rates and whether political events may differentially influence motivation to
respond to a survey request. This paper will explore whether there is evidence of differential
nonresponse between Republicans and Democrats in political polls. We will draw on a major
study of survey nonresponse conducted in 2011, as well as data from four surveys conducted
by the Pew Research Center from September through early November 2012 that included more
than 10,000 respondents. Since the actual partisan affiliation of nonrespondents is unknown,
this paper will use two approaches to examine the potential for political nonresponse bias. First,
sample files will be matched to external databases to assess how well partisan affiliation
compares among respondents and nonrespondents. In addition, a geographic analysis will use
county level presidential vote data from the 2012 election to see if people in areas that voted at
higher rates for Mitt Romney (“red counties”) responded at similar levels to people living in
areas that voted at higher rates for Barack Obama (“blue counties”).
Tracking and Re-engaging Respondents for Follow-Up Research: A
Methodological Examination of Two Research Studies
Anna Sandoval, American Institutes for Research; Celeste Stone, American Institutes for
Research
Securing participation in follow-up studies is completely conditional on a study’s ability to
relocate the original participants. Recent advances in technology and the availability of less
expensive methods for relocating sample members have led several recent efforts to revitalize
previously “decommissioned” longitudinal studies, recontact participants for program
evaluations, and reconstitute studies initially designed as cross-sectional studies to allow for the
investigation of complex social phenomena. However, some individuals are harder to relocate
than others, and tracking biases may result if location propensity is systematically related to
outcomes of interest. This paper uses data from two recent studies to examine the effectiveness
of strategies for relocating and reengaging study participants after long periods of noncontact. In
Study 1, researchers sought to relocate participants of a postsecondary program aimed at
increasing the ethnic diversity in the aquatic sciences and who last participated in the program
anywhere from 5-20 years previously. Study 2 is a pilot test assessing the feasibility of finding
and reengaging a nationally representative random subsample of Project Talent participants
(now ages 65-70) who had not been contacted in 37 to 51 years. Both studies used
commercially-available databases for tracking and prepaid incentives. This paper summarizes
(1) the results of the tracking activities, including careful examination of key factors on tracking
success and the effectiveness of low-cost strategies for locating participants, and (2) the utility
of unconditional prepaid incentives on response rates after long periods of noncontact. This
study also explores the types of individuals who are hardest to locate. Results from this paper will
be used to inform others interested in revitalizing studies about possible biases associated with
tracking and reengaging participants after a long hiatus.
Polling Around the World
Outside Looking In: An Examination of the Kaleidoscopic Nature of International
Public Opinion of the United States During the Bush and Obama Presidencies
Natalie Manayeva, University of Tennessee; Alexandra Brewer, University of Tennessee;
Michael Fitzgerald, University of Tennessee
Public opinion polls from around the world demonstrate that during the past decade the United
States’ international image has worsened and approval for U.S. actions has declined
(Fitzpatrick, Kendrick & Fullerton, 2011). Such tendencies are found even in countries that have
traditionally been portrayed as American allies, such as Great Britain and Poland. Strengthening
of anti-American sentiments across the globe presents a variety of negative consequences for
the United States. Negative attitudes towards the country may result in economic and political
losses, and could even cause serious international conflicts (Revel, 2003). Resolving the
problem of rising anti-Americanism and dealing with its negative outcomes requires
understanding of the phenomenon, its origins and mechanics. The origins of anti-American
feelings and attitudes have been studied by scholars in various disciplines and approaches.
Katzenstein and Keohane (2007) distinguished six types of anti-Americanism, which varied in
causes and features. Other scholars (Crockatt, 2003; Meunier, 2005) identified a variety of
historical, cultural, religious, and economic reasons for anti-American attitudes. This study is
designed to explore the multifaceted nature of international public opinion towards the United
States during the Bush and Obama presidencies and to provide possible explanations for the
fluctuations of global attitudes by analyzing domestic and non-domestic factors. Data from
Gallup polls on the attitudes towards the United States will be analyzed in this study. The
ultimate goal of this research is to expand understanding of the linkage between the
international public opinion of the United States and the factors of U.S. domestic and foreign
policy.
When Undecideds Decide It All: The Effect of Unreported Opinions on the Results
of Pre-Election Polls
Mohamed Abouelela, Faculty of Economics and Political Science; Magued Osman, The
Egyptian Center for Public Opinion Research (Baseera)
Pre-election polls were not widely welcomed as a new practice in Egypt. None of the pre-election
polls about the first round of the last Egyptian presidential election (conducted on 23-24 May
2012) succeeded in predicting that Mohamed Morsi (the current Egyptian president) would be
one of the candidates advancing to the second round of the election. Opinion polls were strongly
criticized for being politically biased and unscientific; three presidential candidates filed a case
to ban pre-election opinion polls in Egypt. This paper analyzes the discrepancy between the results of the pre-
election opinion polls conducted by the Egyptian Center for Public Opinion Research (Baseera)
before the first round of the Egyptian presidential elections and the actual results. The analysis
suggests that a key factor in explaining this discrepancy is the characteristics of the undecided
voters. Respondents who preferred Islamic candidates were more likely not to name their preferred
candidate than other respondents. Another factor that affected the quality of the pre-election
polls’ predictions was neglecting trends in voters’ preferences, compounded by a long lag
between the last poll and the election date. Finally, the paper suggests a corrective
procedure that incorporates the characteristics of undecided voters into the prediction of the
election winner(s).
Does Data Collection Method Affect the Results of the Post-Election Polling in
Egypt?
Hanan Girgis, The Egyptian Center for Public Opinion Research (Baseera); Magued I.
Osman, The Egyptian Center for Public Opinion Research (Baseera)
After the Egyptian revolution, public opinion polls became one of the important means of
measuring the political orientations of citizens. For the first time in Egypt, public opinion
centers performed pre-election opinion polls to discover which presidential candidates
Egyptians would vote for. Other centers performed post-election polls to analyze the
characteristics of the voters of each candidate. Some of those centers used phone polls to
collect data and others used face-to-face interviews. A great debate arose about the
appropriateness of using phone polls to collect data in a country such as Egypt, whose
population has relatively disadvantaged demographic characteristics. The new democratic
experience in Egypt has placed public opinion surveys among the most important instruments
for keeping democratic progress on track. Employing public opinion polls to draw the political
map of Egypt is also new. This forces pollsters to test the different methodologies and tools
they use in their data collection. This paper aims to discover whether, in the Egyptian case,
the data collection method affects the reported vote for different candidates and to reveal
whether the effect, if any, occurs in certain population groups or is a random effect. The paper analyzes data collected
by a private independent public opinion research center that collected data on the candidates
for whom Egyptian women voted, using a phone poll and a face-to-face survey. Both the poll
and the survey were performed in the same period using nationally representative samples of
women and both of them collected data on the main characteristics of the respondents.
Indicators of State Legitimacy in Afghanistan
Nina R. Sabarre, D3 Systems; Samuel Solomon, D3 Systems; Timothy Van Blarcom, D3
Systems
State legitimacy is critical for policy implementation in Afghanistan, as its absence requires the
central government to devote resources to maintaining sovereignty against an increasingly bold
and coordinated insurgency rather than effective governance. A state is considered “more
legitimate” the more it is perceived by its citizens as rightfully holding and exercising political
power (Gilley 2006). With legitimacy in hand, the Afghan central government is more likely to
effectively implement policies that Afghans consciously accept. This paper contributes to the
discourse of state legitimacy through a quantitative analysis of variables that influence indicators
of state legitimacy. In April 2012, the Afghan Center for Socio-Economic and Opinion Research
(ACSOR) fielded a survey commissioned by D3 Systems, Inc. among 2,039 individuals across
all 34 provinces in Afghanistan. This survey measured public perceptions of general living
conditions, performance of the central government, reconciliation with the Taliban, and recent
events. Working with 125 different variables, the authors of this paper use logistic regression
models to isolate variables (such as region, security, opinion of the Taliban, income, religion,
and socio-economic status) in order to understand their influence on state legitimacy. Although a
number of variables affect how Afghans perceive the legitimacy of their government, this
analysis concludes that the rating of the security situation is the most powerful factor affecting
perceptions of legitimacy.
South Sudan: Evolving Opinions After a Year of Independence
Brian M. Kirchhoff, D3 Systems; Samantha Chiu, D3 Systems; Matthew Warshaw, D3
Systems
D3 Systems of Vienna, VA fielded surveys in South Sudan in November 2011 and December
2012. These surveys of South Sudan measure public opinion as it relates to the most
important issues facing this new country. This paper analyzes survey results and compares
trends after one year of independence. In 2011, a honeymoon period produced positive
opinions, but a year later opinions on multiple topics had shifted. The research topics include
political stability, hydrocarbon policy, delivery of services and resources to a largely rural
population, the HIV/AIDS epidemic, regional drought and famine, the regional spread of
terrorism and a perennially contentious relationship with Sudan. In addition to improving
understanding on the aforementioned issues, the surveys also capture key demographic
information and include questions that measure media penetration and usage. Due to the low
penetration of phones and Internet throughout the country, the surveys were conducted via face
to face interviewing. Local interviewers were recruited primarily from universities and were
trained for two days prior to commencing field work. The questionnaires were prepared in
English and Arabic. Interviewers were required to be fluent and literate in English and at least
one other language. The wave 1 sample consists of 5 key cities across South Sudan, with a
representative sample of the 18+ population by city, gender and age group. The five cities are
Juba, Malakal, Rumbek, Yambio and Wau. The wave 2 sample was split into urban and rural
subgroups; 500 interviews were conducted in the same five cities that comprise the wave 1 sample
frame and an additional 500 interviews conducted in rural locations surrounding those five cities.
Respondents were selected using a multi-stage random method, from PSU selection (from a
proportional to population list of sampling points), to household selection (random route) and
respondent selection (Kish grid).
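The final within-household step can be illustrated with a simplified sketch: a true Kish grid
assigns each questionnaire a pre-printed selection table, but its effect, choosing one eligible
adult at random from a household roster listed in a fixed order, can be approximated as
follows (the roster fields here are hypothetical).

    import random

    def select_respondent(household_members, rng=random):
        """Simplified Kish-style selection: order eligible adults consistently
        (here by sex, then descending age) and pick one at random."""
        eligible = sorted(
            (m for m in household_members if m["age"] >= 18),
            key=lambda m: (m["sex"], -m["age"]),
        )
        return rng.choice(eligible) if eligible else None

    # Example roster (hypothetical fields):
    # select_respondent([{"name": "A", "age": 40, "sex": "F"},
    #                    {"name": "B", "age": 22, "sex": "M"}])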
Strategies for Increasing Response Rates
Use of Smart Phones/Text Messaging to Increase Response Rates
Piper DuBray, ICF International
INTRODUCTION: Survey response rates have greatly declined in the past decade, causing
researchers to seek new ways to increase participation. The Connecticut Department of Health
(CT DPH) and ICF International conducted two pilot studies in 2012 using text messages to 1)
increase response rates to the Behavioral Risk Factor Surveillance System (BRFSS) cell phone
survey, and 2) increase participation in the BRFSS Non-Response Web Follow-up.
METHOD: To evaluate the impact of an advance text message on survey response, the CT
BRFSS cell phone sample was divided into 3 groups: Group 1 was sent a text asking the
respondent to complete the telephone survey when called, also offering a $10 incentive. Group
2 received the text invitation with no incentive offer, and group 3, the control group, did not
receive a text message. The second pilot consisted of sending BRFSS telephone non-
responders a text message invitation to complete the survey via Web. Non-responders were
divided into 2 groups: Group 1 received 2 text messages inviting them to participate in the Web
survey and offered a $10 incentive for participating. Group 2 was sent the text invitations,
without an incentive.
RESULTS: Early results show that text invitations to the Web survey do not have a significant
effect on response rates. Initial results of advance texts to cell phones show a 2% increase in
the CASRO response rate over the control group, while advance texts with an incentive show a
3% increase in the CASRO rate over the control group. We will conduct further analyses after
all data have been collected to determine whether this increase in response rate is significant.
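For reference, the CASRO response rate cited in these results is conventionally computed
(equivalently to AAPOR's RR3) as:

    \text{RR}_{\text{CASRO}} = \frac{I}{(I + R + NC + O) + e\,U},

where I is the number of completed interviews, R refusals, NC non-contacts, O other eligible
non-interviews, U cases of unknown eligibility, and e the estimated proportion of
unknown-eligibility cases that are actually eligible.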
CONCLUSION: Based on preliminary results, text messages as a tool to increase response had
mixed results. Advance text messages increased participation in a telephone survey, but text
messages to BRFSS non-responders were ineffective in increasing Web survey participation.
The Use of Email, Text Messages, and Facebook to Increase Response Rates Among
Adolescents in a Longitudinal Study
Anna Fleeman, Abt SRBI; Kimberly Francis, Abt Associates; Tiffany Henderson, Abt
SRBI; Michelle Woodford, Abt Associates; Marlena Jani, Abt SRBI
Over the course of two years, more than 1,600 students in grades 7 through 12 were recruited
to take part in a three-year study assessing the effectiveness of a pregnancy prevention
program. As part of the assent process, students were informed about the study and asked for
name, home address, home phone, cell phone, email address, Facebook username, and
permission to text. The study consisted of three 25 minute surveys: a baseline with a $15
incentive, a 12 month follow-up ($25), and a 24 month follow-up ($30). The baseline survey was
administered in-school either online or by paper/pencil, with both follow-up surveys conducted
online. The majority of students were from low-income, minority households; therefore, six
months after the baseline and first follow-up surveys, they were asked to confirm or update their
contact information in an online five-minute tracking survey. Initially, the first tracking survey
promised a $5 incentive; however, due to low response, it was increased to $10, and text
messages were sent as reminders. Additionally, phone calls were added both as a reminder
and as a mode to complete the short tracking survey. For the first follow-up, invitation and
reminder contacts consisted of a minimum of six emails, three text messages, three letters, and
six phone calls, depending on available contact information. To increase response, we decided
to send Facebook messages using the standard publicly available personal page, with privacy
as the utmost concern. Presented results will include response rates and demographics by
contact type and timing. Further, the operational issues related to text and Facebook messaging
will be detailed. The results provide great insight as to the use of social media as well as to the
retention, contact, and response rates of surveys of adolescents.
Will They Answer the Phone If They Know It’s Us? Using Caller ID to Improve
Response Rates
Kathy Ott, National Agricultural Statistics Service; Heather Ridolfo, National Agricultural
Statistics Service; Jeff Boone, National Agricultural Statistics Service; Nancy Dickey,
National Agricultural Statistics Service
Survey response rates have been declining over the last several decades. In terms of telephone
surveys, this decline is often attributed to the wide availability of call screening technologies and
respondents’ reluctance to answer calls from unknown numbers. This has led some to posit that
calling respondents from local area codes (or familiar area codes) and using identifiers that are
both recognizable and trustworthy may improve survey response rates. In fact, anecdotal
evidence within our own agency has suggested that this may be the case; however, research
outside of our agency has produced mixed findings with regard to these claims. At the National
Agricultural Statistics Service, we conducted a series of experiments to determine if the
information presented on caller ID would influence response rates. Specifically, we examined
whether calling respondents using local area codes rather than out-of-state area codes and
different identifiers (i.e., USDA versus Ag Counts) improved response rates. In addition, we
surveyed respondents regarding their use of caller ID and its influence on their decision to
answer our call. In this presentation, we will discuss the findings from this study and their
implications.
Using Qualitative and Quantitative Testing to Improve Hispanic Response Rates
for Online Surveys
Yelena Pens, Arbitron; Robin Gentry, Arbitron
Arbitron Inc., a provider of radio ratings data, conducted a test using a probability-based
address sample to recruit the Hispanic population, aged 13 and older, to complete a one-week
Web-based diary of their radio listening. Since hard-to-reach demographic groups such as the
Hispanic population historically have had lower participation, a qualitative study was conducted
to provide insights into the Hispanic population, and its findings were used to design materials
for a large quantitative study of recruitment into an online survey. In January 2012, Arbitron conducted a series of focus
groups as well as face-to-face interviews with the Hispanic population in three markets. The
purpose of the focus groups was to determine concerns related to the mailing materials. In
particular, materials presented included mailed invitations for the Web-based diary. The face-to-
face interviews were conducted in the form of a usability study in order to provide insight into the
user experience of the Web-based diary. The mailing materials as well as the Web-based diary
were translated into Spanish, so participants could select an English or Spanish
version of the online diary. In October 2012, Arbitron conducted a pilot study of the online diary
for the Hispanic population. The feedback from the qualitative study helped to design advance
notices, mailing invitations, and pre-recorded blast messages for the Web-based diary. The
usability study helped to re-design the Web-based diary that was previously used for a pilot
study of the general population. In this presentation, we will present the results from the
qualitative and quantitative studies. In addition, we will present the optimal strategy for mail-
based recruitment for an online survey of the Hispanic population.
Survey Reminder Method Experiment: An Examination of Cost Efficiency and
Reminder Mode Salience in the 2012 N-MHSS Locator Survey
Matthew G. Anderson, Mathematica Policy Research; Barbara Rogers, Mathematica
Policy Research; Karen CyBulski, Mathematica Policy Research; John Hall, Mathematica
Policy Research; Cathie E. Alderks, SAMHSA; Laura Milazzo-Sayre, SAMHSA
Encouraging survey completion rates in a cost-efficient manner is typically a challenging
endeavor. This paper will use data from the 2012 National Mental Health Services Survey (N-
MHSS) Locator Survey to examine whether one type of respondent reminder is more cost-
efficient than another. The 2012 N-MHSS is sponsored by the federal government’s Substance
Abuse and Mental Health Services Administration and includes 22,455 mental health facilities
across the United States. Data for this survey were collected using the Web mode with
computer-assisted telephone interviewing (CATI) follow-up over a four-month period. In an
experiment with 4,300 randomly selected facilities divided equally between treatment and control
groups, each facility received one of two types of reminder. A specialized reminder letter was mailed first
class to the control group and to nonresponders in the non-experiment sample. The treatment
group received CATI reminder calls, starting on the same day that the letters were mailed. A
two-week field period was established to complete the reminder calls and to allow the letters to
arrive at facilities. Our findings will include an analysis of the percentage of facilities that
completed the survey during or shortly after the reminder period and an examination of facility
characteristics that might affect the completion rate, in addition to analyzing costs. The results of
this experiment will help determine whether a particular reminder method is more efficient, both
in cost measures and completion rate, and can help inform the survey research field of evolving
trends in respondent behavior and reminder mode salience.
The Role of Blogs in Public Opinion Research Dissemination
The Survey Geek
Reg Baker, Market Strategies, Inc.
Reg Baker launched his blog The Survey Geek in 2005 as a way to share news and information
about survey methods with his colleagues at Market Strategies International. As those
colleagues shared posts with clients and others outside the company it morphed from an
internal blog into a public blog. Its content also evolved from a focus on survey methods to
broader commentary on the evolution of new research methods of all kinds. The blog’s original
intent was to educate, and while some posts still have that theme, it more often offers commentary on
how the research industry is changing, whether for good or ill, and is especially
disrespectful of the hype that dominates too much of the so-called “NewMR.” Reg is the former
president and COO of Market Strategies International where he now works as a part-time
consultant.
LoveStats
Annie Pettit, Conversition
Annie Pettit launched her LoveStats blog four years ago after leaving a full-time job to pursue
her own interests. The blog began simply as a way to stay active in the market research arena,
even though she was not part of a global company, but it grew into much more. It became a place
to clarify fuzzy thoughts, disagree vehemently with traditional opinion, pursue hot-headed rants,
share insights into new methodologies, and show others that you can have a little research,
statistics, and baking fun along the way. As the blog became more popular, it led to many
unexpected opportunities that a behind-the-scenes researcher rarely gets to participate in. Annie
is the Chief Research Officer of Conversition Strategies and Vice President, Research
Standards at Research Now, specializing in social media market research, survey research, and
data quality.
SurveyPost
Adam Sage, RTI International
RTI International’s SurveyPost is a blog written by a group of future-oriented researchers in the fields of
survey methodology, health communications, and statistics and informatics. Contributions are
geared toward evaluating and understanding emerging technologies and concepts as sources
of social and behavioral data, and tools for data capture. Emerging from research and
development initiatives in communication platforms, such as Facebook, Twitter, and
smartphones, and concepts such as crowd behavior, SurveyPost is intended to communicate
with and engage the research community in ways that promote the spread of innovative
research on the very platforms we investigate. Recognizing the difficulty in publishing cutting-
edge research that keeps pace with the rate of technological development, SurveyPost
researchers view blogs and other forms of social media as critical mechanisms for promoting
timely discussion of our research to ensure that the state of science is in line with the state of
technology. Adam Sage is the editor of SurveyPost and is a research methodologist at RTI
International.
The Caucus
Marjorie Connelly, The New York Times
The New York Times website has more than 60 blogs dealing with news and politics, business
and finance, technology, culture and media, health and education, style and leisure, sports,
and opinion. Most are group blog sites, written by a mix of staffers and freelancers; others are
blogs by individuals. Marjorie Connelly, as an editor on The Times’ News Surveys and Election
Analysis Desk, works to coordinate the multi-platform coverage of surveys. She has been
contributing to The Caucus blog, which has offered news and analysis about government and
politics, since February 2007. She writes items about Times/CBS News polls, those released
from other organizations, and other survey related news. In addition, Marjorie posts items to the
local City Room blog that concentrates on news about New York City and to blogs from the
business section dealing with personal finance and health care. But it’s not all politics and hard
news: she contributes to the sports blogs dedicated to the Olympic Games, the N.F.L., college
sports and major league baseball. The Caucus and other blogs are useful ways to disseminate
survey results that may not merit a full story but are interesting or entertaining. However,
inclusion on a blog does not preclude an item from appearing in the print version of the paper.
The Times’ own surveys are often teased with partial releases on The Caucus during the
afternoon ahead of the full poll release in the evening. And polls released after The Times’ print
deadlines now have a place to appear.
FreeRangeResearch
Casey L. Tesfaye, American Institute of Physics
Blogs are particularly important in the current research environment. Excitement abounds over
the terabytes of data freely available for analysis online. This has led to a rapid rise in data
science and experimental analytic strategies. The survey community has been understandably
slow to embrace these developments. From a perspective of Total Survey Error, in a field where
a few percentage points can have far-reaching consequences, experimental methodologies
seem downright irresponsible. It is important both to advocate for our abilities and continued
relevance as a field and to carefully examine the strengths and weaknesses of new methods.
Casey uses her blog FreeRangeResearch as a space to experiment with ways in which these
research methodologies can coexist and even learn and gain from each other. As the voice
behind FreeRangeResearch, Casey aims to explore the quickly evolving field of social science
research in a methodologically grounded way. She tries to maintain an up-to-date listing of
relevant blogs and research tools, share high quality articles from a range of disciplines, report
from a range of speakers and events, and explore intersections that she comes across in her
own research.
Kumarrao.net and Survey Practice
Kumar Rao, The Nielsen Company
Kumar started his blog www.kumarrao.net about three years ago as a window into what he calls
his “thinking world.” He saw this as a venue not only to showcase his research activities and
interests, but also to network and connect with like-minded folks who share his research
interests. Folks from various disciplines and countries have contacted Kumar to ask for a copy
of his papers and/or share their opinion about his research. He has also ended up working with
some of them. Kumar feels that, when done right, blogs can serve as a gateway to particular
communities of supporters, learners, and peers. The caveat here is “when is it done right?”
What does that mean? Is it the ability of a blog to differentiate itself from the millions out there?
What does differentiation mean in this context? How can bloggers differentiate their
blogs from the thousands of spam blogs that are out there? Is it the quality or quantity of the
content in the blog, on top of a good advertising strategy, that facilitates differentiation? With
his recent appointment as co-editor of AAPOR’s blog-style publication Survey Practice, Kumar
feels an even stronger sense of social obligation to serve the larger community of survey
and public opinion researchers. A recent AAPORnet post from former Public Opinion Quarterly
editor Peter Miller describes the role of a journal editor, which Kumar believes also applies to
other distributed content sources such as blogs. He wrote that “editors should not give themselves
the license to dictate a journal's content and should be careful stewards rather than egotists with
a grand vision.” Kumar Rao is the director for the Statistical Center of Innovation at The Nielsen
Company where he is responsible for developing new statistical and computation techniques for
online and mobile business initiatives.
Researchscape
Jeffery Henning, Researchscape
Interest in survey research and polls is surging, in part because of the rise of Do-It-Yourself
survey platforms. Many business people are being asked to conduct surveys, despite having no
formal training in the field.
Methodological Briefs: Internet Surveys
The Impact on Web Survey Drop-Out Rates of Page Number Progress Indicators
Used Throughout, Near the End, or Not at All
Jill Walston, American Institutes for Research; Brittany Cunningham, American Institutes
for Research; Rebecca Medway, American Institutes for Research
A common feature of Web-based surveys is a progress indicator letting respondents know how
far along they are in the survey. This information can be in the form of a progress bar that
steadily fills up as the survey is completed or a display of the current item or page number along
with the total number of items or pages. According to Conrad, Tourangeau, & Peytchev (2004),
the use of progress indicators is based on the assumption that respondents will be less likely to
drop out if they see they are making progress. However, there are conflicting results on
progress indicators’ effect on drop-out rates (Callegaro, Yang, Villar, 2011; Conrad, Couper,
Tourangeau, & Peytchev, 2004; Matzat, Snijder, & van der Hurst, 2009). We speculate that a
progress indicator might be most effective at discouraging drop-outs at the end of the survey
when the respondent is close to completion. To investigate this possibility, we administered a
Web survey under three randomly assigned conditions: (1) a page number progress indicator for
all 12 pages of the survey (e.g., “page 1 out of 12 pages”), (2) a page number indicator
appearing only for the last 3 pages of the survey, and (3) no progress indicator. Comparing
drop-outs during the first 9 pages of the survey will evaluate the impact of page numbers vs. no
page numbers. Comparing drop-outs during the last three pages will allow us to consider the
impact of adding the indicator near the end of the survey. The survey is being administered to a
national sample of public school principals and includes questions about Common Core State
Standards. Given the ambiguity that continues to surround the effect of progress indicators, we
anticipate that our results will add an informative perspective on the possible impact of using a
hybrid approach.
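As a minimal sketch of how the drop-out comparison might be carried out (the counts below are hypothetical and are not the study’s data), drop-outs under two conditions can be compared with a simple chi-square test of independence:

from scipy.stats import chi2_contingency

# Hypothetical counts of drop-outs vs. continuations during pages 1-9
# for the "indicator on all pages" and "no indicator" conditions.
table = [
    [38, 412],  # indicator shown throughout: [dropped out, continued]
    [29, 421],  # no indicator:               [dropped out, continued]
]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.3f}")

The same comparison, restricted to the last three pages, would speak to the effect of adding the indicator near the end of the survey.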
Examining the Feasibility of SMS as a Contact Mode for a College Student Survey
Scott D. Crawford, Survey Sciences Group, LLC; Colleen A. McClain, Survey Sciences
Group, LLC; Sara O’Brien, Survey Sciences Group, LLC; Toben F. Nelson, University of
Minnesota
As respondents use mobile devices to take Web surveys at increasing rates, researchers are
finding that related technologies may also be a useful tool for communicating with these
respondents. Recent work surrounding text (SMS) messages as a means of communication
with respondents both at the survey invitation stage (Mavletova & Couper, 2012) and as a data
collection mechanism (Brenner and DeLamater, 2012; Schober et al., 2012) has suggested
promise for the communication method, while at the same time raising questions about optimal
use. With this literature in mind, we focused on the processes of consent, mode of invitation,
and type of URL used (due to space limitations with SMS) as we invited college students at one
Midwestern university to participate in a short, rapid-response survey evaluating alcohol use
over the past month. We will begin by comparing those giving consent to receive SMS
messages (obtained in a baseline survey) with those who did not consent to be contacted in this
way. Then, we will describe the results of a randomly assigned experiment conducted among
1,367 students, in which we varied both communication type (email versus SMS) and URL
composition (short, commercial “tiny URL” service versus full research domain URL). We will
discuss the relationship of these treatments to both data quality indicators and substantive
measures, using baseline and follow-up data in our analysis. Key measures explored will
include response rates, break-off rates, item missing data rates, substantive mental health and
alcohol use measures, and respondents’ self-reported use of technology. Further, we will
address the practical challenges of incorporating short SMS messages into a data collection
protocol focusing on sensitive behaviors, including issues related to message content length,
IRB approval, consent processes, and SMS technology.
The Effectiveness of Mailed Invitations for Web Surveys
Wolfgang Bandilla, GESIS - Leibniz Institute for the Social Sciences; Mick P. Couper,
University of Michigan; Lars Kaczmirek, GESIS - Leibniz Institute for the Social Sciences
E-mail is a common invitation mode for Web surveys. However, there are limitations to
conducting Web surveys of the general population because lists of all Internet users and their e-
mail addresses do not exist, making it impossible to select a random sample of e-mail addresses
(in contrast to RDD for telephone surveys). One solution could be to collect e-mail addresses in
another mode (e.g. via CAPI or CATI interviews). But asking for e-mail addresses may raise
privacy concerns among respondents. We test whether an invitation by a mailed letter could be
an alternative to the common e-mail invitation in a Web survey. In this experiment participants
were recruited with the aid of the German General Social Survey (ALLBUS), a face-to-face
survey using computer assisted personal interviews (CAPI) in private households, conducted in
2012. Among ALLBUS respondents who reported having Internet access at home, we asked a
random third for their e-mail address: 43% provided their e-mail address, while 57% declined to
do so. As a control group two thirds of the Internet users were not asked for their e-mail
address. In a follow-up Web survey, to be conducted in February 2013, the three groups of
Internet users (those who provided an e-mail address, those who were asked but refused to
provide an e-mail address, and those not asked for an e-mail address) will be invited to a Web
survey by a mailed letter. We will examine the response rates to the Web survey among the
three groups, and explore potential demographic and attitudinal differences of respondents,
based on ALLBUS data. Our expectation is that those who provided an e-mail address will be
the most cooperative, while those who were asked but refused will be least likely to respond to
the Web survey.
A Competition Among New Graphical Methods for Eliciting Probability
Distributions
David Rothschild, Microsoft Research
We test eight graphical interfaces that capture probability distributions from non-experts. This
work stands both to improve how surveys elicit expectations from experts and to allow us to
elicit new information from individuals that was previously too complicated to
survey. Traditional methods typically elicit probability distributions by asking for the likelihood of
an outcome in a given range. More modern examples include the “ball and buckets” method of
asking users to fill up buckets that represent each range with 100 balls. The new methods we
propose ask users for up to six data points that define polygon-shaped probability distributions.
For example, participants mark the high, low, and mid-points of a range on a ruler with their
values shown both graphically and numerically. With three points set, a polygon-shaped
probability distribution forms above the ruler. There is no y-axis; instead, the distribution is
broken up into six segments with the probability mass included in each segment indicated. The
user can drag any point around freely before submitting. In various randomly assigned
conditions, we test six progressively more complex methods that build from a simple point estimate to
a multi-sided shape. We compare these methods on three criteria: time of completion, effort,
and accuracy of elicited moments. Faster completion times allow surveyors to either reduce
monetary costs or ask more questions. Reduced effort allows users to focus more on their work,
which both decreases depletion effects (which can impact results) and reduces the cognitive cost
of completing the survey. In an increasingly online and connected world there is a potential
value to guiding non-experts to create more accurate individual-level expectations that can
create more efficient choices. Further, aggregating elicited probability distributions (as opposed
to simple point estimates or confidence ranges) can enhance the usefulness of forecasts for
many stakeholders in many situations.
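A minimal sketch, assuming a simple triangular form for the polygon-shaped distribution and hypothetical elicited values (not taken from the interfaces described above), of how three elicited points can be turned into segment probability masses:

import numpy as np
from scipy.stats import triang

# Hypothetical elicited points: low, most-likely (mid), and high values.
low, mid, high = 10.0, 25.0, 60.0

# scipy's triangular distribution: c is the mode's relative position in [0, 1].
dist = triang(c=(mid - low) / (high - low), loc=low, scale=high - low)

# Probability mass in six equal-width segments between low and high.
edges = np.linspace(low, high, 7)
masses = dist.cdf(edges[1:]) - dist.cdf(edges[:-1])
for left, right, m in zip(edges[:-1], edges[1:], masses):
    print(f"[{left:5.1f}, {right:5.1f}): {m:.2%}")

Displaying the six segment masses back to the respondent, rather than a y-axis, mirrors the kind of feedback the interfaces described above provide.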
Smarter Online Panels for Smartphone Users: Exploring Factors Associated with
Mobile Panel Participation
Lauren A. Walton, The Nielsen Company; Trent D. Buskirk, The Nielsen Company;
Thomas Wells, The Nielsen Company
Smartphones currently account for nearly 50% of all U.S. cell phones and Internet usage on all
mobile devices is projected to surpass that of desktop computers by 2014. Smartphone apps
also continue to rise in popularity and use across mobile platforms. With both Internet and app
availability on smartphone devices, researchers have multiple methods for conducting surveys
via this technology. To date, relatively little research has been published about the theoretical
constructs associated with survey participation on such devices. Recently one study reported a
theoretical model of mobile survey participation that expanded the traditional constructs
associated with online surveys to include enjoyment and engagement. Beyond this work little is
known about specific factors that influence participation in mobile surveys. In this paper we will
investigate practical factors associated with a respondent’s choice to participate in a
hypothetical online smartphone panel where surveys are completed exclusively using mobile
browsers. Specifically, using an online survey administered to a nationally representative
sample of 1,000 smartphone owners, we investigate what influence survey specific factors (e.g.
frequency, length, content) as well as logistical factors (e.g. personal information required, data
consumption limits and GPS tracking) have on panel participation. Using a split-ballot
experiment, respondents were asked to answer questions presented using either a standard 7-
point Likert scale or maximum difference scaling (MaxDiff). Knowing that Likert scale formats
are not optimal for smartphone Internet browsers, we explore whether similar information can be
gleaned from the MaxDiff and Likert scales. Specifically, we compare both the influence
rankings and the degree of item differentiation provided by the two methods in order to
assess whether MaxDiff questions might provide a more reliable
assessment of survey factor influence, and we make the case that this method may be better suited
for influence questions posed on mobile browsers.
Distracted Respondents
Brian F. Schaffner, University of Massachusetts Amherst; Stephen Ansolabehere,
Harvard University
The Internet is becoming an increasingly common mode for conducting survey research. While
academics and practitioners have paid significant attention to evaluating the extent to which
online polls are able to generate representative samples, less work has been conducted
evaluating how the nature of the survey interview differs online. For example, how do
respondents interact with a survey questionnaire that they are free to complete at their own
pace and to what extent does the self-administered nature of online surveys affect survey
responses? In this paper, we investigate this question using a series of large-N online surveys
conducted by YouGov America. At the end of each survey, respondents were asked whether
they had engaged in a number of activities while they were taking the survey. Half of the
respondents to our surveys reported at least one distraction while taking the survey; the most
common distractions included watching television, having a conversation with another adult in
the room, taking a break, answering email, or taking a phone call. We combine answers to this
question with data on how long the respondents took to answer each question in order to
determine when distractions occurred. These data allow us to examine not only when
respondents become distracted, but also whether response patterns are altered after these
distractions. Ultimately, the findings from this study provide an improved understanding of how
to best administer and analyze data from online surveys.
Are Response Rates to a Web-Only Survey Spatially Clustered?
Lee Fiorio, NORC at the University of Chicago; Michael Stern, NORC at the University of
Chicago; Ned English, NORC at the University of Chicago; Ipek Bilgen, NORC at the
University of Chicago; Becki Curtis, NORC at the University of Chicago
Over the past decade, researchers have learned a great deal about the design and
implementation of Web surveys. However, to date, we have virtually no empirical information
about the role that space and place play in influencing the error associated with Web-only surveys.
The two types of error most often discussed when considering Web surveys are coverage and
nonresponse, both of which are typically cited as reasons for low response rates in these
types of surveys. One way to pursue this issue of place is to use Geographic Information
Systems (GIS) to spatially model survey response rates. This will allow us to understand the
impact of location on error in Web surveys. In this paper, we attempt to examine this gap in the
literature by assessing the spatial clustering of response rates to a general population Web-only
survey. The data come from a random, Address-Based Sampling approach using the Delivery
Sequence File (Valassis version) where respondents received a postal letter with a URL. We
calculate response rates at several geographic scales, including county, state, and region, to
determine the extent to which response rates are spatially clustered. While controlling for ACS
demographics, Internet availability, and postal characteristics, we then build a spatial lag model
to measure spatial dependence of response rates observed. Preliminary findings show clusters
of low response rates in the South that cannot be accounted for by other variables in the model.
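The spatial lag model itself would typically be fit with dedicated spatial-econometrics software; as a smaller, self-contained sketch of the clustering question, a global Moran's I for county-level response rates can be computed directly (the rates and the contiguity matrix below are hypothetical):

import numpy as np

# Hypothetical county-level response rates and a contiguity matrix W,
# where W[i, j] > 0 if counties i and j are neighbors.
rates = np.array([0.12, 0.15, 0.11, 0.22, 0.25, 0.24])
W = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)
W = W / W.sum(axis=1, keepdims=True)  # row-standardize the weights

z = rates - rates.mean()
n = len(rates)
# Global Moran's I: (n / sum of weights) * (z' W z) / (z' z)
I = (n / W.sum()) * (z @ W @ z) / (z @ z)
print(f"Moran's I = {I:.3f}  (values well above 0 suggest spatial clustering)")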
Interviewers and Interviewing
Frequentist and Bayesian Approaches for Comparing Interviewer Variance
Components in Two Groups of Survey Interviewers
Brady T. West, Institute for Social Research, University of Michigan; Michael R. Elliott,
Institute for Social Research, University of Michigan
Survey methodologists have long studied the effects of interviewers on the variance of survey
estimates. Statistical models including random interviewer effects are often fitted in such
investigations, and research interest lies in the magnitude of the interviewer variance
component. One question that might arise in methodological investigations is whether or not
different groups of interviewers (e.g., those with prior experience on a given survey vs. new
hires) have significantly different variance components in these models, which could mean, for
example, that certain groups might benefit from additional training (in hopes of minimizing the
mean squared error of survey estimates). Unfortunately, popular frequentist approaches to
making inferences about interviewer variance components in hierarchical generalized linear
models (HGLMs) for non-normal survey variables have several limitations. These include
reliance on asymptotic theory, questionable properties of classical likelihood ratio tests when
pseudo-likelihood methods are used for estimation, and a failure to account for uncertainty in
the estimation of features of prior distributions for model parameters. This paper compares and
contrasts alternative approaches to making inferences about differences in variance
components between two independent groups of survey interviewers. A Bayesian approach is
proposed that circumvents many of the problems associated with alternative frequentist
approaches. The Bayesian approach and alternative frequentist approaches are applied to an
analysis of real survey data collected in the U.S. National Survey of Family Growth (NSFG), and
results suggest that inferences can vary depending on the approach used. Examples of
software code that can be used to implement both approaches in practice will be provided as a
part of the presentation.
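As a rough sketch of the frequentist side of such a comparison, and not the authors' code, one might fit a random-interviewer-intercept model separately within each interviewer group and compare the estimated variance components; a linear mixed model is used here for simplicity, whereas the paper concerns HGLMs for non-normal variables, and the variable and file names are hypothetical:

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical respondent-level data: a continuous outcome y, the interviewer
# who conducted the interview, and the interviewer group (e.g., experienced vs. new hire).
df = pd.read_csv("interviews.csv")  # columns: y, interviewer, group

for group, sub in df.groupby("group"):
    # Random intercept for interviewer; fixed intercept only.
    model = smf.mixedlm("y ~ 1", data=sub, groups="interviewer")
    result = model.fit()
    interviewer_var = float(result.cov_re.iloc[0, 0])
    print(f"{group}: interviewer variance = {interviewer_var:.3f}, "
          f"residual variance = {result.scale:.3f}")

A Bayesian alternative of the kind described above would instead place priors on the two interviewer variance components and compare their posterior distributions directly.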
Interviewer Perceptions and Data Collection Outcomes on a National Multi-Mode
Study
Micah Sjoblom, NORC at the University of Chicago; Vicki Wilmer, NORC at the University
of Chicago; Marietta Bowman, NORC at the University of Chicago; Peter Hepburn, NORC
at the University of Chicago
The National Survey of Early Care and Education (NSECE) employed a multi-mode design
including a national in-person data collection effort. The complexity of managing multiple
combinations of samples, questionnaires and respondent types created greater needs for
customized combinations of paradata and cost management data to steer outreach efforts. To
establish another set of objective information reflective of experiences “on the ground,”
interviewer observations were collected for specific types of cases such as considering eligibility
for unscreened households or gauging whether or not a respondent would complete the
questionnaire based on past contact attempts. For the observations, interviewers were
instructed to complete case reviews for certain types of cases at different stages of data
collection and assign a code that best matches the current status of the case. The observation
process included the evaluation of previous contacts, the determination of the level of difficulty
perceived in achieving case resolution and the identification of barriers to cooperation.
Interviewer observations and perceptions were then used in aggregate to identify patterns and
develop targeted strategies for working particular types of cases. The interviewer observations
captured at numerous points during the course of data collection will allow us to further examine
the possibility of using such information in systematic ways to better target effort. For this
presentation we will evaluate the quality of these interviewer observations by comparing coded
interviewer assessments with the additional effort expended in future contact attempts as well
as the final case status outcomes assigned to these cases at the end of data collection. These
comparisons will be discussed in terms of how effective the initial interviewer observations were
at determining final case level outcomes and the level of agreement between interviewer
observations and finalized case status assigned at the end of data collection.
Factors Influencing the Quality of Interviewers’ Observations of Respondents’
Gender in Telephone Surveys
Susan K. McCulloch, Joint Program in Survey Methodology; Frauke Kreuter, University
of Maryland, JPSM & IAB
According to a 2011 survey, 68% of all U.S. organizations that conduct telephone surveys
collect respondent gender data by requiring interviewers to observe and record whether they
are speaking with a male or a female based solely on the respondent’s voice. These gender
observations are often made early in the survey as part of the introduction and screening
process – thus, providing limited acoustic cues to inform judgments. Researchers rely on these
gender observations to: (1) understand attitudes and behavior; (2) screen for study eligibility; (3)
determine skip patterns; (4) contribute to nonresponse assessment and adjustments; (5) inform
post-stratification weighting; and (6) design experiments. Despite this fundamental role in
research, literature suggests observational data is often flawed. In fact, analysis of the quality of
one firm’s interviewer gender observations found an overall misclassification rate of
approximately 8% (McCulloch et al., 2010), and higher among certain groups such as women
and African-Americans. Given this, can we identify some predictors of observational errors?
Moreover, how can we begin to improve the quality of gender observations in telephone
surveys? The goal of this paper is to identify structural features (such as length of exposure to a
respondent’s voice and the buzzing sound of a centralized phone room) in addition to
interviewer characteristics as predictors of errors in interviewer observations of gender. Utilizing
existing recordings of survey interviews, the experimental research addresses the following
questions: (1) Does allowing more time to disentangle gender cues improve observations?; (2)
Does a noisy phone room contribute to errors in observations?; (3) Are there characteristics of
the interviewer and/or respondent that are significant covariates of error in interviewer
observations? Using recent paradata work and the linguistics literature as a foundation for the
design of this lab experiment, the paper provides information for improving the collection
of observational data.
Shocking Misbehavior by Face-to-Face Interviewers: The 2008 ANES Office
Recognition Questions
Hector Santa Cruz, Stanford University; Jon A. Krosnick, Stanford University
In 2008, for the first time in the study’s history, the American National Election Studies (ANES)
made audio recordings of survey respondents’ answers to four open-ended quiz questions
assessing political knowledge. In the past, interviewers typed transcripts of the answers while
respondents were speaking; however, inspection of these transcripts revealed that interviewers
usually did not follow their instructions to provide literal, word-for-word verbatim transcriptions.
ANES made audio recordings of respondents’ oral answers in 2008, to see whether more exact
transcriptions of respondents’ actual utterances might lead to more reliable and valid coding.
These recordings were invaluable in finding remarkable deviations by interviewers from their
instructions, in many cases invalidating the answers provided by the respondents. Interviewers
both increased and decreased the likelihood of a respondent answering correctly by giving
hints, answers, comments, choices, mispronounced names, and even completely different
names. The Political Psychology Research Group (PPRG) at Stanford worked with the audio
transcriptions of all respondents and coded deviations to determine their frequency and effects.
Frequency measures include how many interviewers deviated and how often. Effect measures include
whether helpful deviations led to correct answers and whether hurtful deviations led to incorrect
answers. Without these audio recordings, we would never have been able to discover these
deviations. Our findings reveal interviewer misbehavior and show how it affects the data that
countless scholars use nationwide. While the costs of survey research have increased, this
study shows that the benefits of increased accuracy justify the additional cost of producing
audio recordings.
Audio-Recording of Verbatim Thinkalouds: A Solution to the Problems of
Interviewer Transcription?
Patrick Sturgis, University of Southampton; Nick Allum, University of Essex; Rebekah
Luff, University of Southampton
Recent attention in the survey methodological literature has turned to the quality of the coding
that has conventionally been applied to verbatim response data. Verbatim responses require
respondents to express their thoughts about some topic or attitudinal object ‘in their own words’.
This type of question has been argued to provide potentially richer data than standard closed-
format response alternatives, because responses are not constrained by the (generally implicit) a priori
framing of an issue by the researcher or question writer. However, the potential benefits of the
verbatim format are often undermined by the quality of the procedures that are used to record
and code them. In this study, we use a split sample design incorporated in a nationally
representative face-to-face survey to assess the effect of audio-recording verbatim responses,
compared to the standard approach of requiring interviewers to type the responses into the
laptop computer as they are enunciated. We compare the responses obtained from each
random half of the sample on a range of different measures of data quality as well as the
distributions obtained when they are coded to the same underlying frame.
Designing Effective Rating Scales
A Comparison of Branched and Unbranched Rating Scales for the Measurement
of Attitudes in Surveys
Emily E. Gilbert, University of Essex
The choice of question response format is an important one and has wide implications for
reliability and validity. One relatively recent innovation has been the use of ‘branched’ formats
for Likert scales. In this format, one first asks the respondent about the direction of their
attitude and then, using a follow-up question, measures the intensity of the attitude (Krosnick
and Berent, 1993). The potential advantage of this method is that it reduces cognitive burden on the
respondent, thereby permitting data of higher quality to be extracted. The potential
disadvantage is in administration time. A key question is whether the potential costs of adopting this
method in face-to-face surveys are justified by any gains in reliability. This paper uses data from wave 3
of the Innovation Panel, a subsample of Understanding Society, a longitudinal panel of 40,000
British households. A split ballot experiment was embedded within the survey, allowing for a
comparison of responses between branched and unbranched versions of the same questions.
In particular, reliability of both versions was assessed, as well as differences in the time taken to
answer the questions in each format. In a total survey costs framework, this allows us to
establish whether any gains in reliability are outweighed by the additional costs incurred because of
extended administration times. Initial findings show evidence of response differences between
branched and unbranched scales, particularly a higher rate of extreme-responding in the
branched format. However, the differences in reliability between the two formats are less clear-
cut. The branched questions took longer to administer than the unbranched versions, potentially
increasing survey costs significantly.
Do Branched Rating Scales Have Better Test-Retest Reliability Than Un-Branched
Scales? Experimental Evidence From a Three-Wave Panel Survey
Nick Allum, University of Essex; Emily Gilbert, University of Essex
The use of ‘branched’ formats for rating scales is becoming more widespread because of a
belief that this format yields data that are more valid and reliable. Using this approach, the
respondent is first asked about the direction of his or her attitude/belief and then, using a
second question, about the intensity of that attitude/belief (Krosnick and Berent, 1993). The
rationale for this procedure is that cognitive burden is reduced, leading to a higher probability of
respondent engagement and superior quality data. Although this approach has been adopted
recently by some major studies, notably the ANES, the empirical evidence for the presumed
advantages is actually quite meagre. Given that using branching may involve trading off
increased interview administration time for enhanced data quality, it is important that the gains
are worthwhile. This paper uses data from an experiment embedded across three waves of a
national face-to-face probability-based panel survey in the UK (the Innovation Panel from the
‘Understanding Society’ Survey). Each respondent was interviewed once per year between
2009 and 2011. We capitalise on this repeated-measures design to fit a series of models which
compare test-retest reliability, and a range of other indices, for branched and un-branched
question forms, using both single items and multi-item scales. We present the results of our
empirical investigation and offer some conclusions about the pros and cons of branching.
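A minimal sketch of the kind of test-retest comparison described, assuming a long-format data layout with hypothetical column names (the authors' models are more elaborate than simple correlations):

import pandas as pd

# Hypothetical long-format panel data: one row per respondent-wave, with the
# experimental question format and the item response; wave is coded 1, 2, 3.
df = pd.read_csv("innovation_panel.csv")  # columns: pid, wave, format, item

wide = df.pivot_table(index=["pid", "format"], columns="wave", values="item").reset_index()

for fmt, sub in wide.groupby("format"):
    r12 = sub[1].corr(sub[2])  # wave 1 vs. wave 2
    r23 = sub[2].corr(sub[3])  # wave 2 vs. wave 3
    print(f"{fmt}: test-retest r(w1,w2) = {r12:.2f}, r(w2,w3) = {r23:.2f}")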
Controlling for a Response Order Effect in Ranking Items Using Latent Class
Choice Modeling
Ingrid Vriens, Tilburg University; John Gelissen, Tilburg University; Guy Moors, Tilburg
University
The ranking approach is an often-used method for measuring human values. It is based on
Rokeach’s idea that ‘a value is an enduring belief that a specific mode of conduct or end-state of
existence is personally preferable to an opposite or converse mode’. The benefit of this method
compared to the rating approach is that it forces respondents to choose between given choice
options, while in a rating task respondents can rate all choice options as equally important. A
disadvantage of the ranking approach is the occurrence of a response order effect. This means
that choice options have a higher probability of being chosen just because of their placing in the
list instead of their actual content. This may be the consequence of satisficing behavior
(meaning that instead of looking for an optimal solution, respondents tend to go for the first
acceptable option they see) and is especially visible in longer lists of items, although previous
research has also shown this effect for questions with only three items. Whereas earlier
studies only detected this effect, we show how to actually control for it. To do this, we use the
latent class factor model (with a specially designed Choice
module that makes it easy to appropriately analyze choice data) and include the response order
effect as an attribute of the choice. We examine the changes in model parameters when the
response order effect is being controlled for or not and specifically whether this changes the
effects of covariates on the content factor. We illustrate our approach with data that were
gathered by implementing a small experiment in the LISS panel research project, which
provides a panel that is based on a representative sample of the Dutch population.
Measurement of Self-Rated Health Among U.S. Hispanic Populations
Mingnan Liu, University of Michigan; Sunghee Lee, University of Michigan
Self-rated health (SRH) is a widely used survey item for monitoring current population health
and predicting its future. While its importance is evident in survey practice and substantive
research, there is no clear principle for its measurement approaches. In fact, SRH is
operationalized in various forms in different surveys. Yet, it appears implicitly assumed in
substantive research that SRH measured in different forms provides equivalent measurement
properties. In this study, we focus on the U.S. Hispanic population and compare measurement
properties of SRH implemented in four different surveys: the Health and Retirement Study
(HRS), the Hispanic Established Populations for the Epidemiologic Studies of the Elderly
(Hispanic-EPESE), the National Health Interview Survey (NHIS) and the National Latino and
Asian American Study (NLAAS). SRH in these surveys differed by response scale (5- versus 4-
point scale), question order (before versus after specific health items) and question content
(overall general health versus a specific domain’s general health). Moreover, the item was
translated differently into Spanish. This study will analyze 1) the distribution of SRH, 2) the well-
known relationship between SRH and specific health conditions and between SRH and health
care utilization, and 3) the utility of SRH for predicting mortality. We will compare these
estimates for Hispanics across all applicable surveys by interview language. We will also use
the non-Hispanic White sample in HRS as our benchmark group in assessing the measurement
properties.
Rating Scale Design in Developing Countries: A Split Ballot Experiment in
Ethiopia
Charles Lau, RTI International; Emilia Peytcheva, RTI International
Due to the growth of cross-cultural surveys, questionnaires developed in the U.S. and Europe
are often translated and used in developing countries in Africa, Asia, and Latin America. These
surveys often include rating scales to measure attitudes. However, there is little empirical
evidence about the reliability and validity of rating scales in developing countries, or research
about the optimal design of these scales. This is problematic because rating scales are likely
understood differently in developing countries due to their cultural and socioeconomic contexts.
To address this gap in our knowledge, we conducted a split ballot experiment in a face-to-face
survey of Ethiopian business owners (n = 608). The survey included 38 agree/disagree
questions about the social and economic context of doing business in Ethiopia. We randomly
assigned one of three rating scale types to each respondent: (1) Verbal scale (e.g., Completely
Disagree, Somewhat Disagree, Neutral, Somewhat Agree, Completely Agree); (2) Numeric
scale (1-5, with verbal labels at the anchors); (3) Branched or unfolding scale that first asked
about direction (Agree, Disagree, Neutral) and then asked about extremity (Completely,
Somewhat). In this paper, we investigate how rating scale design affects key indicators of data
quality. Three findings emerge from our preliminary analysis. First, scale design has a
statistically significant effect on the distribution of responses. Compared to verbal and branched
scales, numeric scales produce significantly greater endorsement of the middle category, but
less endorsement of “agree” responses. Second, branched scales produce the highest levels of
within-individual variance, which suggests that branched scales are best at encouraging
respondents to differentiate among response options. Third, in two independent tests of criterion
validity, numeric scale designs had lower levels of validity compared to the verbal and branched
designs—suggesting that branched and verbal scales produce substantially higher data quality
compared to numeric scales.
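For example, the within-individual variance comparison could be computed along these lines, assuming a long-format file with hypothetical column names (not the authors' code):

import pandas as pd

# Hypothetical long-format data: one row per respondent-item rating, with the
# randomly assigned scale type (verbal, numeric, or branched).
df = pd.read_csv("ethiopia_ratings.csv")  # columns: resp_id, scale_type, item, rating

# Variance of each respondent's ratings across the 38 items,
# then the average of those variances within each scale condition.
within_var = (
    df.groupby(["scale_type", "resp_id"])["rating"].var()
      .groupby(level="scale_type").mean()
)
print(within_var)

Higher average within-respondent variance for the branched condition would correspond to the greater differentiation among response options reported above.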
Partisanship, Democracy and Political Behavior
What’s Wrong With Nevada?: The Persuasive Power of Partisanship
Andrew Smith, UNH Survey Center; Jennifer Dineen, University of Connecticut
2012 pre-election polls routinely showed that concerns about the economy and jobs were the most
important issues facing the public. This led many analysts to believe that for most voters,
economic issues would be central to their vote, much as they were in 1992 and 1980, and that
Barack Obama would be denied reelection like George H. W. Bush and Jimmy Carter. But this
obviously did not happen, which raises the question: why wasn’t it the economy? Frank (2004)
asked “what’s wrong with Kansas” and concluded, in part, that working-class whites voted
Republican despite economic policies that worked against their interests. Obama won in states,
such as Nevada and Florida, that were disproportionately hit by the recession, but whose voters
did not hold his administration responsible or did not factor the economy into their vote.
Previous research has shown that perceptions of the economy are heavily influenced by
partisanship (see Evans and Pickup 2010; Marsh and Tilley 2009; Bartels 2002; Conover, et al.
1987 and Pfeffley 1987), and perhaps partisan factors outweighed economics. This paper looks
to expand this line of research, but examines specific economic consequences of the recession
(loss of a job, problems paying a mortgage, adult children living at home, etc.) in addition to the
attitudinal measures that make up consumer confidence scales. Preliminary findings indicate
that Republicans and Democrats facing similar economic consequences view their situations
quite differently; Republicans believed they were worse off than four years ago while Democrats
believed they were better off. The authors speculate that voters view their economic situation
based on their partisanship, reducing the impact of economic issues at the voting booth.
Types of Moderates and Their Effect on Partisanship and Voting
Natalie M. Jackson, Marist Institute for Public Opinion
We know that many individuals in the American population consider themselves to be
ideologically moderate, and these moderate partisans and moderate independents are often the
swing voters in elections. This paper seeks to understand the people that we group into the
moderate category by proposing a theory of three types of moderates: the know-nothings, those
who completely ignore politics and have no substantive political beliefs; the cross-pressured,
those who report their ideology as moderate because they are torn between conflicting views
(e.g., liberal social views and conservative economic views); and the true moderates, those who
do not want to choose a side and whose political beliefs are in between liberal and conservative
beliefs. Which type of moderate an individual is should determine whether they are movable
partisans (the cross-pressured), independents (the true moderates), or apolitical (the know-
nothings). The types and their differing mechanisms of preference formation will be illustrated
using data from the American National Election Study and from an original national
survey. By classifying moderates in this way and explaining the origins of their opinions, I move
the literature beyond the assumptions that moderates are uniformly uninterested in and
uninformed about politics. They are, in reality, a complex group of individuals, many of whom
will comprise the swing vote in elections.
Satisfaction and Democracy: A Possible Combination?
Mónica Ferrín Pereira, Collegio Carlo Alberto, Torino
Satisfaction with democracy is probably one of the most contested indicators in public opinion
research. Indeed, it is not fully clear that this indicator actually reflects support for democracy, as
is normally assumed. Canache, Mondak and Seligson, for example, arrive at very pessimistic
conclusions, and recommend avoiding its use in research on public opinion on democracy
(Canache, Mondak, Seligson 2001). In spite of the debate over its suitability as an indicator,
satisfaction with democracy continues to be widely used. The vast majority of surveys on public
opinion have incorporated this item in their questionnaire, and there are rich longitudinal data on
levels of satisfaction with democracy in most parts of the world. In light of this, it is pressing to
understand what this item measures. This paper goes precisely to the core of this discussion.
It is an attempt to deal with this classical indicator from a new perspective, which is very much
influenced by psychological and marketing studies. As such, I propose a new reading of
satisfaction, as applied to the concept of democracy. Two main questions are to be answered
through this paper: Is satisfaction with democracy a summary of citizens’ expectations towards
democracy, and evaluations of their democratic systems (as proposed by psychological and
marketing studies)? And, can satisfaction be applied to the concept of democracy?
Consistency of Reports of Party Affiliation and Voting Behaviour—Lessons From
a UK Panel Study
Nick Moon, GfK NOP Social Research; John Burton, ISER, University of Essex
One of the hot topics about polling in the run-up to the 2012 U.S. election was the role of party
identification and its use by pollsters in weighting. One of the main points of debate is the
extent to which party identification is a long-term fundamental belief, or whether it is subject to
quite frequent change, and may even align itself with current voting intention. This paper draws
on data from the British Household Panel Study, a major study that interviewed over 10,000
adults annually for 18 years. Each year people were asked whether they supported a political
party, and how strongly they did so. The paper looks at the extent to which people gave the
same answer each year, whether they made a single switch in allegiance over time, or whether
they moved back and forth between the parties more than once. This will provide solid
information on the stability of this much-used variable. In the second part of the paper we look at
another political variable—reported vote at the last general election. This question was asked at
14 of the 18 waves. The paper looks at the relationship between reported vote and party
identification, thus shedding more light on how respondents perceive the party identification
question, and also at the stability of the reported past vote question. As British general elections
are typically four years apart, respondents answered the question ‘how did you vote in the
general election of XXXX?’ at three or four consecutive waves, with the election getting
progressively more distant in time. There is much debate about how reliable the past vote
question is as a supposedly factual question, and the paper sheds valuable light on whether this
is a stable variable or not.
Friday, May 17
8:00 a.m. – 9:30 a.m.
AAPOR Concurrent Session C
Improving Surveys With Paradata
Paradata and Coverage Error
Stephanie Eckman, Institute for Employment Research
Coverage research involves studying the quality of the frames from which samples are selected,
and the impacts of errors in frames on survey data. Coverage is an understudied area in the
survey methodology literature, due in large part to the difficulty of obtaining the necessary data
about errors on the frame. Fortunately, paradata can in many cases provide the missing data
needed to study coverage. This presentation highlights how paradata can be used to study
coverage in household surveys. It discusses several types of frames, and the studies related to
each type that have made use of paradata. The presentation also suggests additional coverage
research that could be done with paradata.
Paradata and Nonresponse Error
Brady West, Institute for Social Research, University of Michigan
Nonresponse is a ubiquitous feature of almost all surveys, no matter which mode is used for
data collection, whether the sample units are households or establishments, or whether the
survey is mandatory or not. Confronted with this fact, survey researchers search for strategies
to reduce nonresponse rates and to reduce nonresponse bias, or at least to assess the
magnitude of any nonresponse bias in the resulting data. Paradata are now used to support all
of these tasks, either prior to the data collection to develop best strategies based on past
experiences, during data collection using paradata from the ongoing process, or post hoc for
empirically examining the risk of nonresponse bias in survey estimates or for developing
weights or other forms of nonresponse adjustment. Effective design strategies for reducing
nonresponse bias will call for the collection of survey process data from both respondents and
nonrespondents that are correlated with both key survey variables and response propensity.
Survey managers can therefore work to identify features of sample units that can be collected
for respondents and nonrespondents alike which may also be related to key survey variables
and the probability of responding. However, previous studies have suggested that paradata may
be prone to error, and paradata collection strategies that theoretically could reduce
nonresponse bias may be impaired if the collected paradata are of poor quality. Results of
simulation studies designed to examine the effects of varying levels of error in survey paradata
on the effectiveness of post-survey nonresponse adjustments will be discussed.
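A bare-bones sketch of one such paradata-based adjustment, with hypothetical variable and file names: model response propensity from paradata observed for respondents and nonrespondents alike, then weight respondents by the inverse of their estimated propensity.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical sample-level file with paradata observed for all sampled units
# (respondents and nonrespondents) and a 0/1 response indicator.
frame = pd.read_csv("sample_paradata.csv")
# columns: responded, n_contact_attempts, urbanicity, access_problem_observed

propensity_model = smf.logit(
    "responded ~ n_contact_attempts + urbanicity + access_problem_observed",
    data=frame,
).fit()
frame["p_hat"] = propensity_model.predict(frame)

# Nonresponse adjustment factors for respondents only.
respondents = frame[frame["responded"] == 1].copy()
respondents["nr_weight"] = 1.0 / respondents["p_hat"]
print(respondents["nr_weight"].describe())

Errors in the paradata predictors feed directly into the estimated propensities, which is precisely the vulnerability the simulation studies described above are designed to quantify.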
Paradata and Measurement Error
Kristen Olson, University of Nebraska - Lincoln
Paradata for purposes of investigating and understanding measurement error include response
timing, keystrokes, mouse clicks, behavior codes, vocal characteristics, and interviewer
evaluations. This presentation will focus on the analysis of these types of paradata. It will
highlight the specific analytic steps taken and issues to be considered when analyzing paradata
for the purpose of examining measurement error. The presentation will also call attention
to issues related to measurement error in these types of paradata and offer take-
home points for researchers, survey practitioners, supervisors and interviewers.
Paradata in Web Surveys
Mario Callegaro, Google, UK
Survey researchers and methodologists seek to have new and innovative ways of evaluating
the quality of data collected from sample surveys. Paradata, or data collected for free from
computerized survey instruments, have increasingly been used in survey methodological work
for this purpose. In Web surveys, paradata can be collected at a variety of levels, resulting in a
complex, hierarchical data structure. One challenge is that not all off-the-shelf software
programs capture paradata, and thus user-generated programs have been developed to assist
in recording paradata. Further complicating matters is how the data are recorded, ranging from
text or sound files to ready-to-analyze variables. This presentation briefly discusses how
paradata differ by mode and gives guidance on how to turn paradata into an analytic data set.
Paradata to Study Response to Within-Survey Requests
Surveys have evolved significantly from the days when the sole means of data collection consisted of
asking respondents to complete a standard Q-and-A-type questionnaire administered under a
single mode of data collection. Although the traditional questionnaire remains the primary
instrument for data collection in survey research, it is being supplemented with requests to
collect additional data from respondents using less traditional methods. Such requests may
include asking respondents for permission to collect physical or biological measurements
(collectively referred to as “biomeasures”), to access and link administrative records (e.g., Social
Security, Medicare claims) to respondents' survey information, to switch from one mode of data
collection to another, or to complete and mail back a leave-behind questionnaire in a face-to-
face interview, among other requests. Such requests, which are usually made within the survey
interview itself, have spawned new scientific opportunities that allow researchers to answer
important substantive and methodological questions that would be more difficult to answer
otherwise. For a series of requests (administrative data linkage consent, consent to biomeasure
collection, data collection mode switch, and income item response) this presentation will give in-
depth examples of how paradata have been used to study response to each type of within-
survey request. Possible uses of paradata for purposes of identifying potentially reluctant
respondents and implementing intervention strategies aimed at reducing within-survey
nonresponse will be discussed.
Sampling and Data Quality Issues in Internet Surveys
The Performance of Different Calibration Models in Non-Probability Online
Surveys: The Case of the 2012 U.S. Presidential Election
Clifford A. Young, Ipsos Public Affairs
The survey research world is changing. Gold standard methodologies such as the telephone
survey are under increasing pressure due to declining response rates, increased cell phone-
only households, and rising costs. Many have argued that one possible solution to this problem
is the online survey. There is some evidence of this. Indeed, as a class, online polls performed
well in the 2012 U.S. presidential elections. However, online polls have serious critics as well. One criticism is that online surveys potentially suffer from nonignorable error and thus, to be projectable to the population, must employ adjustment, or calibration, models to eliminate their
bias. However, to date, calibration models have often been treated as ‘black boxes’ by the
survey shops that employ them. With this in mind, we ask one simple research question in this
paper: which calibration method performs best when estimating voting intention (VI)? To do this,
we will analyze approximately 160,000 interviews collected for the Reuters-Ipsos presidential
tracking poll between January and November 2012, including primary, state and national races.
The Ipsos poll is a blended online sample where multiple panel and nonpanel sample sources
are combined. Our paper will focus on the performance of different calibration models, including
propensity weighting, the use of demographic and attitudinal variables in post stratification
weights, and weighting strategies at the sample source stage. To measure performance, we will
employ a ‘Mean Square Error’ framework looking at both bias (average absolute difference) and variability of the estimate. Finally, our validating benchmarks will include both final
election results as well as the weekly market averages of VI taken from ‘pollster.com.’ In total,
we will have 51 separate data points to analyze.
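A minimal sketch of this evaluation framework, assuming hypothetical estimates and benchmarks rather than the actual Reuters-Ipsos data, might compute the bias and variability components as follows.

import numpy as np

def evaluate_calibration(estimates, benchmarks):
    """Compare a series of calibrated vote-intention estimates (%) with benchmarks (%)."""
    errors = np.asarray(estimates, dtype=float) - np.asarray(benchmarks, dtype=float)
    return {
        "avg_abs_diff": np.mean(np.abs(errors)),   # bias component
        "error_variance": np.var(errors, ddof=1),  # variability component
        "mse": np.mean(errors ** 2),               # combines both
    }

# Hypothetical weekly estimates from one calibration model vs. weekly benchmarks
print(evaluate_calibration([48.5, 49.2, 50.1, 47.8], [49.0, 49.5, 49.8, 48.6]))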
How Do Different Sampling Techniques Perform in a Web-Only Survey? Results
From a Comparison of a Random Sample Email Blast to an Address-Based
Sampling Approach
Ipek Bilgen, NORC at the University of Chicago; Michael J. Stern, NORC at the University
of Chicago; Kirk M. Wolter, NORC at the University of Chicago
In the late 1990s there was much optimism that Web-based surveys would become the replacement for RDD telephone interviews. For many reasons, Web-only surveys have not taken precedence among survey modes. For one, according to the 2010 Current Population Survey, only about 72% of American households have Internet access. Among these
households, some individuals lack the skills to use it, are uncomfortable with it, or use it
infrequently. Still, among certain segments of the population, Web-surveying has become a
viable part of the lexicon of survey research. As a result, more research is necessary to
understand ways to sample for Web-only surveys and examine the implications of different
sampling strategies on survey estimates. In this paper, we compare two Internet sampling
strategies for a Web-only survey to assess the data quality and cost-efficiency obtained via
each sampling strategy. In the first sampling approach, email addresses were randomly
selected from a vendor’s email address sample frame. We sent the sampled email addresses a
series of survey invitation emails which included the link to our survey. The second sampling
strategy employed an Address Based Sampling (ABS) approach and sampled addresses from
the USPS Delivery Sequence File. We sent the sampled addresses a series of survey invitation
mailings which included the link to our survey, as well as the instructions on how to complete
the survey. We compare respondent demographics and response distributions by sampling
approach and ultimately compare the response distributions obtained via each sampling
approach to a national-level benchmark (e.g. General Social Survey) to assess generalizability.
In addition, we explore the results of these approaches in terms of response rates, the
effectiveness of incentives, and the comparison of weighted response distributions.
Can We Effectively Sample From Social Media Sites? Results From Two Sampling
Experiments
Michael Stern, NORC at the University of Chicago; Kirk Wolter, NORC at the University of
Chicago; Ipek Bilgen, NORC at the University of Chicago
The exponential increase in user generated social media sites, where individuals can share
information about themselves and their opinions, has raised questions about whether we can
use them in a variety of survey capacities. As a result, there is a need to investigate whether
researchers can effectively sample from the social media sites and, if so, what is the quality of
the data produced? In this paper, we attempt to answer these questions by comparing and
evaluating two different opt-in social media sampling experiments. The two sampling methods
involve using advertisements as survey invitations on two separate social media sites:
Facebook and YouTube. In both sampling approaches, respondents click on the invitation
advertisement posted in the banners and side-panels and are taken to our landing page with
information about the 21-item survey of technology use and its entry point. We assess the 1)
data representativeness using the General Social Survey as our national benchmark, 2) time
taken to reach 1000 completes by social media site, and 3) the cost efficiency of these sampling
strategies. In addition, we conduct a series of four incentive experiments to test the
effectiveness of the different quantities of incentives. The incentive experiment is designed to
achieve 100 completes from each of three incentive amounts: $2, $5 and $10. In addition, a
larger treatment is designed to achieve 700 completes and to test the best-value incentive
determined in prior experiments.
How Far Have We Come? The Lingering Digital Divide and Its Impact on the
Representativeness of Internet Surveys
J.M. Dennis, GfK Knowledge Networks; Curtiss Cobb, GfK Knowledge Networks
Even while the Internet has become a popular tool for survey data collection, researchers have
identified a number of potential problems involved in using a Web-based survey. One primary concern was sampling coverage error. For example, only 68% of American households had an
Internet connection in the home as of 2006 (Pew 2012). Today, more than 78% of households
have an Internet connection, but some subgroups of the population such as African Americans
and Latinos are still known to be more likely to be offline than others. This phenomenon is often
referred to as the “digital divide.” Despite the persisting existence of the “digital divide,” the use
of the Internet for survey data collection has grown exponentially. Should survey researchers
still be concerned about sampling coverage issues? This study uses data from GfK’s
KnowledgePanel® to examine whether attitudinal and behavioral differences—those that cannot
be accounted for with post-stratification weighting—between Internet households and non-
Internet households have also persisted over time. KnowledgePanel provides Internet access
and netbook computers to its panelists who live in a household without Internet access. As a
result, all panel members are able to participate in surveys online, minimizing the potential error
resulting from the exclusion of non-Internet users. Using data from 2008 and 2012, for each
year, we compared weighted estimates that include non-Internet households to weighted
estimates without non-Internet households. The analysis reveals that differences still exist
between Internet and non-Internet households for a series of attitudes and behaviors that
cannot be corrected for using post-stratification weighting.
Respondent Validation Phase II
Dinaz Kachhi-Jiwani, United Sample (uSamp); Lisa Wilding-Brown, United Sample
(uSamp)
In recent years, online research has gained acceptance, but questions about data quality continue to surface as technological sophistication helps fraudsters easily bypass quality checks. Prior research by Courtright and Miller (2011) highlighted respondents' unwillingness to share personally identifiable information (PII) and demographic bias as major barriers to performing validation. To that end, this research was conducted in 2012 to identify any change from the previous landscape and to evaluate different techniques introduced by vendors to overcome traditional challenges. We discovered that although the new techniques affect the number of respondents who are validated, they also influence data quality. The demographics of respondents who were not validated were consistent with 2011: they were more likely to be those without a bank account or credit card and less likely to own their own homes.
When we further associated data quality with validation status, we found that respondents who
failed to validate were twice as likely to fail at least one quality check (i.e. straight-line in a grid,
speed through the questionnaire or answer inconsistently). Also, the validation methodologies
and process of conducting validation differs across vendors. Therefore, from a project
management standpoint, it becomes imperative to account for these factors to make sure that
appropriate techniques are adopted and followed by researchers. Key takeaways: the demographic and psychographic make-up of validated and un-validated respondents; validation and its impact on data quality; what validation means for market researchers; and project management implications.
Lessons in Leadership: AAPOR Women Leaders Share
Their Insights
Mollyann Brodie, The Henry J. Kaiser Family Foundation; Courtney Kennedy, Abt SRBI;
Nancy Mathiowetz, University of Wisconsin-Milwaukee; Eileen O’Brien, Energy
Information Administration, U.S. Department of Energy
Across research sectors, there are unique challenges and concerns for women in leadership
roles. Building on the experiences shared in last year’s successful professional development
panel, Considering Changing Sectors in the Research Industry?, this session will continue the
conversation, focusing on women’s leadership in the research industry. This panel, organized
by AAPOR’s Education Committee and moderated by Angie Gels of The Nielsen Company,
brings together a group of AAPOR women leaders. Sharing their real-life experiences, panelists
will discuss their successes and challenges as women in research and help identify
opportunities to improve personal leadership skills and effectiveness. Panelists will also reflect
on changing roles and experiences of women in the research industry. The panel session will
include brief commentary by each panelist and a moderated Q&A session (audience
questions/comments highly encouraged). The session may be of interest to women (and men!)
at all levels of leadership, from informal to manager to CEO. Expect a lively discussion reflecting
the diversity of our membership and their experiences. A number of experienced and willing
panelists have been identified and three to four will be invited to participate in the panel.
From Concepts to Questions
Preparing to Measure Health Coverage in Surveys Post-Reform: Lessons From
Massachusetts
Joanne Pascale, U.S. Census Bureau; Jonathan Rodean, U.S. Census Bureau; Jennifer
Leeman, U.S. Census Bureau; Carol Cosenza, Center for Survey Research, University of
Massachusetts Boston; Alisu Schoua-Glusberg, Research Support Services
The Affordable Care Act (ACA) is expected to be fully implemented in January 2014 and usher
in a series of reforms of the U.S. health care system. One of the most significant components of
the ACA is the “Health Insurance Exchange”—a state-level marketplace of health insurance
options for individuals and small businesses. While these Exchanges are still in development
and states have broad flexibility in designing the programs, it is essential for the federal
government to have a viable methodology in place for measuring health coverage post-reform.
One opportunity for research and development of such a methodology rests in the state of
Massachusetts, which in 2006 passed legislation that includes most of the features of the ACA,
including Exchanges. The Census Bureau teamed with Research Support Services and the
University of Massachusetts to conduct research with Massachusetts residents to explore the
many pathways of enrolling in an Exchange, the language and terminology residents used when
describing their coverage, and ultimately to develop standardized questions for capturing
Exchange participation and subsidization. The project was conducted in three phases: expert
consultation with key individuals with years of experience in measuring health coverage at the
state and federal level (focusing on Massachusetts); focus groups with subgroups for whom the
Exchange was targeted; and cognitive interviews with those same subgroups. Individuals with
coverage through more conventional sources were included in the testing as a control to flag
possible “false positives”—reporting coverage through an Exchange that was actually through
another source. Questions on the Exchange were developed and tested within the context of
both the Current Population Survey and the American Community Survey, thereby providing
some baseline findings for other federal and state surveys that utilize a similar questionnaire
structure as either of these two surveys.
Identifying the Dimensions of Question Sensitivity: A Multidimensional Scaling
Study
Christopher Antoun, Institute for Social Research, University of Michigan
“Sensitive questions” are questions that are likely to be seen as threatening or embarrassing.
They have become more common in national surveys as researchers attempt to monitor sexual
behavior or the use of illicit drugs. Despite the large body of survey research about sensitive
questions, it is still unclear what makes a question sensitive because no standard definition of
“sensitivity” exists. Tourangeau, Rips, and Rasinski (2000) identify three distinct meanings of
sensitivity from the survey literature: intrusiveness, threat of disclosure, and social desirability
concerns. To date, no one has attempted to empirically verify these dimensions.
Multidimensional scaling (MDS) can help by locating stimuli, which are sensitive questions in
our experiment, on a spatial configuration or “dimensional space.” The ordering of the points
along dimensions allows for interpretations about the nature of each dimension. An advantage of
MDS is that the research participants, not researchers, determine the number and kinds of
dimensions present. We conducted an experiment to empirically identify dimensions of sensitive
questions. Approximately 250 participants provided pairwise similarity judgments for 12 types of
sensitive survey questions. Applying MDS to these data yielded three dimensions representing
how participants thought about question sensitivity. The dimensional structure did not match
Tourangeau and colleagues’ formulation precisely. Intrusiveness was the most salient
dimension with questions about taboo topics such as sex at one extreme of the dimensional
space and a question about exercise at the other extreme. Threat of disclosure was the second
most salient dimension with questions about illicit drug use at one extreme of the dimensional
space and a question about racial attitudes at the other extreme. A third dimension improved
the model fit but was not related to social desirability concerns. The results indicate that there
are independent and separable dimensions of question sensitivity that should be further
explored.
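For readers unfamiliar with the technique, a toy Python example of applying MDS to a precomputed dissimilarity matrix follows; the four topics and the dissimilarity values are invented for illustration and do not reproduce the study's stimuli or results.

import numpy as np
from sklearn.manifold import MDS

# Hypothetical mean dissimilarities between four question topics, averaged over
# participants' pairwise similarity judgments (0 = identical, 1 = maximally different)
topics = ["sexual behavior", "drug use", "income", "exercise"]
dissim = np.array([
    [0.00, 0.35, 0.70, 0.90],
    [0.35, 0.00, 0.55, 0.85],
    [0.70, 0.55, 0.00, 0.60],
    [0.90, 0.85, 0.60, 0.00],
])

# Metric MDS recovers a low-dimensional configuration; the ordering of topics
# along each recovered axis is then interpreted substantively (e.g., intrusiveness).
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissim)
for topic, (d1, d2) in zip(topics, coords):
    print(f"{topic:16s} dim1={d1:+.2f}  dim2={d2:+.2f}")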
Finding the Needle: The Challenges of Recruiting Participants for Cognitive
Testing by Coverage Type in an Exchange State
Katherine R. Kenward, Research Support Services, Inc.; Joanne Pascale, U.S. Census
Bureau; Alisu Schoua-Glusberg, Research Support Services, Inc.; Carol Cosenza, Center
for Survey Research, University of Massachusetts Boston
When recruiting for particular respondent types, there are often challenges in finding the right
individuals. Researchers can advertise for a specific characteristic but this is challenging when
the trait is rare in the population or is similar to another unwanted and more common trait.
Screening presents challenges and sometimes yields false negatives or positives and/or primes
respondents for specific traits. In March 2010, the Affordable Care Act was passed and in 2014
Health Insurance Exchanges will be operating in all or most states. In 2006 Massachusetts
passed legislation similar to the ACA. In order to develop standardized questions on Exchange participation prior to 2014, the U.S. Census Bureau undertook, in fall 2011, to
cognitively test question sets in English and Spanish exploring terms and concepts that refer to
coverage through the Exchange in Massachusetts. Finding residents with coverage through the
Exchange became a complex recruitment task because only a small portion of the population is
covered through the Exchange and many participants do not know the specific type or source of
coverage they have. In addition, the agency that administers the Exchange (and has records on
enrollees) had never before allowed researchers access to its covered population. This paper explores the challenges of identifying respondents who cannot accurately answer screening questions about their coverage source; of gaining the cooperation of an agency that holds records on the population of interest; of the limitations and benefits of using such an agency as a source for outreach; and of the creative resources used for recruiting when faced with small populations unaware of their coverage type. In addition, the paper examines the time and effort involved in, and the benefits of, each approach.
The Establishment Survey Response Process and Measurement Error: How and
Why Are They Connected?
Polly Phipps, U.S. Bureau of Labor Statistics; Danna L. Moore, Social and Economic
Sciences Research Center, Washington State
The BLS Survey of Occupational Injuries and Illnesses (SOII) provides a unique opportunity to
study the establishment cognitive response process and measurement error. Recent studies
have cited discrepancies between SOII and State Workers’ Compensation (WC) administrative
claims records to support the assertion that SOII undercounts workforce injuries and illnesses.
To explore reasons for discrepancies, we conducted over 50 qualitative interviews with SOII
respondents from establishments of varying sizes, industries, and magnitude of differences
between SOII and WC data. Our in-person interviews focused on possible errors in
comprehension, retrieval, judgment, and communication associated with the respondent,
records system, and business environment. We address numerous questions, including: across businesses and respondents, which response processes contribute to the differences? Results
suggest that understanding of reporting rules and survey timing play a role in discrepancies. Our
research also suggests that the business environment influences the response process.
An Overarching Process for Enhancing the Validity of Survey Scales
Hunter Gehlbach, Harvard Graduate School of Education
For years researchers across many disciplines have undertaken the formidable challenge of
designing survey scales to assess attitudes, opinions, and behaviors. Correspondingly, scholars
have written much to guide researchers in this undertaking. Yet, much of their guidance focuses
on discrete steps that survey designers might take, especially statistical procedures to be
conducted after pilot data are collected. This paper synthesizes several of these steps into an
overarching process to facilitate the construction of questionnaire scales. Unlike previous
processes, this one front-loads input from other academics and potential respondents in the
item-development and revision phase with the goal of achieving credibility across both
populations. Specifically, the article describes how (a) a literature review and (b) focus group
interview data can be (c) synthesized into a comprehensive list to facilitate (d) the development
of items. Next, survey designers can subject the items to (e) an expert review and (f) cognitive
pretesting before executing a pilot test.
The Role of Literature and Parent Voices in Developing the Child Behaviors Scale
Lauren Capotosto, Harvard Graduate School of Education
As a first step in developing the Child Behaviors scale, we reviewed the child learning-related
behaviors literature in order to define the construct and identify existing measures that could
inform our own questionnaire. Many of the measures specific to children’s learning-related
behaviors require either teacher (e.g., Learning Behaviors Scale; McDermott, Green, Francis, &
Stott, 1999) or student respondents (e.g., Patterns of Adaptive Learning Scales; Midgley,
Maehr, Hruda, Anderman, Anderman, Freeman, et al., 2000). They similarly include items that
reflect positive behaviors that support and negative behaviors that hinder school success.
Second, we conducted focus groups with eight parents who represented a socioeconomically
and racially diverse group in order to examine the extent to which the conceptualization of child
learning-related behaviors in the literature aligned with the way parents conceive of it. We used
a semi-structured interview protocol consisting of five open-ended questions to ascertain how
parents thought about what students broadly, and their children in particular, can do to help or
hinder their school success. Third, we synthesized the literature review with focus group data.
Specifically, we developed a two-column list to compare indicators that emerged from the
literature and focus groups. While there were several commonalities between the ways in which
researchers and parents conceptualized child learning-related behaviors (e.g., both discussed
procrastination as a negative behavior and following directions as a positive behavior), there were also noteworthy differences. For example, whereas the literature refers to willingness to
ask for help as a positive child attribute, several parents mentioned asking for help as a
negative behavior. When probed further, we learned that parents wanted their children to ask for
help only after making an earnest effort to work independently through a challenge. Such
distinctions informed the crafting of items in step 4.
Item Development and Expert Reviews for the Child Behaviors Scale
Sofia Bahena, Harvard Graduate School of Education
Item development and expert reviews were two key steps in the process of developing scales
for the Family-School Relationships survey. First, we used our synthesis of the literature review
and focus groups to develop items for the Child Behaviors scale. The goal was to develop items
that represent indicators integral to the construct, while using vocabulary that is relevant to
potential respondents (parents of school aged children). We relied on known best practices in
survey design to avoid wording bias, ameliorate acquiescence bias, and ensure our items would
pertain to a wide range of parents. We aimed to improve reliability by avoiding reverse scored
items (Benson & Hocevar, 1985; Cacioppo & Berntson, 1994; Swain, et al., 2008) and labeling
answer choices with construct-specific anchors (Tourangeau, Rips, & Rasinski, 2000), and
avoiding agree/disagree statements (Fowler, 2009; Krosnick, 1999). We also tried to capture the
breadth and depth of what children do that is conducive (or hurtful) to their school success in
order to provide face validity to our items. Once developed, the items received several rounds of
feedback from the research team, with a particular eye towards clarity and comprehensiveness.
To address the valence of the items, we separated our scale into two sets of questions: one for
positive behaviors and another for negative behaviors. Next, we reached out to field experts and
asked them to review our items; 9 out of 26 responded. These experts included researchers and
K-12 school leaders. We asked them to rate our items based on clarity, comprehensiveness,
and appropriateness for a broad range of parent groups (e.g., student age, parent education level, parents whose second language is English). Based on their responses, one item about
homework was eliminated because it did not apply to children in the earliest grades, and several
others were modified.
The Role of Cognitive Interviewing and Pilot Testing in the Development of the
Child Behaviors Scale
Beth
Once item development was complete, we subjected our items to a cognitive interviewing
procedure to ensure potential respondents understood the items as we intended (Willis, 2005).
We conducted 40-60 minute one-on-one interviews with parents regarding the Child Behaviors
items (N=5). We first asked parents to restate each question in their own words, using none of
the words from the item itself, and then to “think aloud” as they came to their own answer to the
question. In this presentation, I will describe how these interviews a) provided evidence parents
understood our items, b) led us to revise a small number of items, and c) led us to eliminate a
small number of items from the scale. Given this scale was designed to be deployed in the
context of a larger survey, there was a premium on keeping it as parsimonious as possible.
After making final wording changes based on cognitive pre-testing, we used the SurveyMonkey
website to administer our survey to two separate samples of parents (N=384; N=266) from
SurveyMonkey’s unique national panel. We analyzed the resulting data using confirmatory
factor analysis (CFA) to provide evidence of adequate variability, reliability and the factor
structure of the scale. With the first sample, our child negative behaviors items had strong
internal consistency (alpha = .79), as did our child positive behaviors items (alpha = .82). The fit
was adequate (Kline, 2011) for a two-factor measurement model with separate latent variables
capturing positive and negative behaviors (χ2 = 40.82, p = .03; CFI = .99; RMSEA = .04;
RMSEA 90% CI = .01, .06). Finally, we replicated these results with the second large sample of
parents. These findings provide confidence that the Child Behaviors scale is a valuable tool for
practitioners and researchers alike.
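The internal-consistency figures reported above can be illustrated with a short Python sketch of Cronbach's alpha; the simulated item responses are hypothetical, and the confirmatory factor analysis itself would normally be fit in dedicated SEM software.

import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents x k_items) array of scale items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical 1-5 responses to three positive-behavior items driven by one latent trait
rng = np.random.default_rng(0)
latent = rng.normal(size=300)
items = np.clip(np.round(3 + latent[:, None] + rng.normal(scale=0.8, size=(300, 3))), 1, 5)
print(f"alpha = {cronbach_alpha(items):.2f}")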
Benchmarking Parent Perceptions of Child Behaviors with SurveyMonkey
Philip Garland, SurveyMonkey
At least since No Child Left Behind and continuing with Race to the Top, high stakes education
standards are gaining traction nationally. And while assessors obviously find value in comparing
standardized test scores across schools, the process would benefit from the ability to compare
schools on a broader array of factors. To that end, surveying is one avenue to collect data about
schools. While test questions can be administered in a uniform way quite easily, standardized
survey questions have been elusive to date. It follows then, that the ability to compare survey
data across schools would greatly benefit schools and researchers alike. Schools can ascertain
with much greater certainty what their strengths and weaknesses are if they are able to
compare their own data to a series of comparable schools. Benchmarking data will also allow
researchers to examine between-school variation. To the extent survey administration reflects
the uniformity and regularity of testing, schools ought to be able to understand and explain
which areas are comparative strengths and weaknesses and scholars ought to be able to
understand which areas offer the greatest explanatory power. In turn, this clarity should allow
school leaders to craft plans for improvement with more confidence and allow researchers to
draw broad lessons about school improvement. Of course, such an offering requires critical
mass of usage. Fortunately SurveyMonkey facilitates roughly 70 million parent interviews each
year from more than 80,000 schools. With this scale at hand, schools will be able to compare
themselves at quite granular levels within very small geographies. Moreover, SurveyMonkey is
working with Great Schools to make information available to parents as they select schools for
their children. This presentation will describe this benchmarking project using the Child
Behaviors scale example and will outline the implications for both researchers and school
practitioners.
The 2012 Election: Horserace Polls, Exit Polls and Poll Aggregation
Voter Mobilization Effects of Localized Pre-Election Horserace Polling Information
David L. Vannette, Stanford University; Sean J. Westwood, Stanford University
Candidate performance in pre-election polls dominates media coverage of elections and is
known as the ‘horse race.’ Yet, we have an incomplete understanding of how coverage of the
horserace may influence voter turnout. Prior research has demonstrated that there are
relationships between polling numbers and political behavior, but these results are largely
correlational and there are large gaps in our understanding of the boundaries of these effects.
We approach the question experimentally. The most commonly cited poll effects are
‘bandwagon’ and ‘underdog’ effects; simply being made aware of the preferences of other
people seems to influence some voters to support the candidate or issue that is currently
leading or trailing in the polls. In this paper, we experimentally examine the influence of poll
reports about the state of the horserace in a potential voter’s specific congressional district on
the decision to turn out to vote. Participants were sent three bogus poll reports about the
horserace via e-mail in the two-week period leading up to the 2012 presidential election.
Participants were randomly assigned to treatments where 1) Obama was leading in local polls,
2) Romney was leading in local polls, and 3) where both Romney and Obama were “tied.” An
additional control group completed the pre and post-election surveys but did not receive any e-
mail messages. Nearly one thousand subjects were recruited into the experiment using a
convenience Internet sample. We demonstrate that reports of a Romney advantage, or a tie,
increased voter turnout for Democrats. We provide detailed analysis of the moderating effects of
election salience, partisan strength and media consumption. We plan to validate voter self
reports with voter file data from Catalist. Our results provide evidence on the mobilizing effects
of local polls on potential voters and have broad implications for polling research and political
communication.
Using Non-Probability Online Surveys for Exit Polling: The Case of the 2012 U.S.
Presidential Elections
John P. Vidmar, Ipsos USPA; Darrell Bricker, Ipsos USPA; Cliff Young, Ipsos USPA; Julia
Clark, Ipsos USPA; Alan Roshwalb, Ipsos USPA; Neale El Dash, Ipsos USPA
The exit poll is a staple of the modern American democratic experience. Exit polls have multiple
purposes including an independent check to validate election results, a mechanism to provide
content to media companies during election night, and finally copious voting data for
practitioners and academics alike. However, the traditional U.S. exit poll, conducted by VNS (Voter News Service) in randomly selected polling stations, is increasingly feeling the strain. First, costs are a real concern. Indeed, VNS cut nineteen states from its electoral coverage in 2012, presumably due to cost. Second, early voting in 2012 reached 40%, making in-person exit polling both irrelevant and problematic in the long term. One possible solution for these challenges
would be to conduct an online exit (or election day) poll. Online methodologies, though, have
their serious detractors who cite both their non-probabilistic nature and their coverage and self-
selection bias. Are such criticisms warranted? To answer this question, we will analyze data
from an online exit poll conducted by Ipsos for Thomson Reuters. In total, 42,000 interviews
were conducted nationally. Specifically, we will compare results from the Ipsos exit poll to the
traditional VNS exit poll published by the primary news organizations. As a measure of comparison, we
will look at the average absolute difference (AAD) to gauge relative performance of the online
exit poll with more traditional methods. Finally, our paper will also detail the operational and methodological aspects of the Ipsos online exit poll.
Information Disconnect: Data Aggregators and Media Reporting in the 2012
Presidential Election
Fred Solop, Northern Arizona University; Nancy Wonders, Northern Arizona University
Election 2012 was a $6 billion event in the United States. Media headlines screamed news of
the closeness of the race and the rhetoric of the presidential contest suggested big changes
were coming. The candidates “sparred” at the debates. Reportage about the election featured a
competitive “contest” between President Obama and Governor Romney. Throughout the
campaign, newspapers followed every new poll, particularly those in “battleground states,” and
reported on small differences in the numbers, even if they fell within the margin of error. Voters
were primed to expect a tight race in a polarized election environment. Either candidate could
win. In contrast to the message promoted by media outlets, some data aggregators like Nate
Silver of the 538 blog and Mark Blumenthal of Huffington Post’s Pollster.com were aggregating a wide range of election surveys and telling a different story. Their story was one of stability rather
than rapid change. Obama was favored to win throughout the election season. Nate Silver, in
fact, predicted that Barack Obama had a 92 percent chance of winning the presidential contest
on the eve of the election. Despite contributions by these data aggregators, papers such as The
New York Times still spoke of a race expected to be close and seemed surprised when America
awoke to the news that Barack Obama would be in office for the next four years. This paper
explores the disconnect between what the “data aggregators” were saying and what the media
was telling voters to expect in the 2012 election. We content code and compare messages
coming from both sources of information. After characterizing the extent of the disconnect, we
develop a range of theories to explain this phenomenon. Finally, we offer solutions for how
media can do a better job integrating the science of polling into coverage of future elections for
president.
Using Model-Based Poll Averaging to Evaluate the 2012 Polls and Pollsters
Mark Blumenthal, Huffington Post; Simon Jackman, Stanford University
Much previous research on pre-election poll accuracy, from Mosteller et al. (1949) to Crespi
(1988) to Traugott et al. (2005) has focused on various scores that compare the deviation
between the election results and vote preference as measured on polls conducted at or near the
end of the campaign. The traditional measures of poll accuracy capture both validity (or the
absence of bias) and reliability (or the absence of variance). When scores are calculated from a
single poll or a small number of polls, it can be hard to distinguish between the two sources of
error. Also, by focusing on the performance of the final pre-election poll, these scores can
create unhelpful incentives, which some believe lead to a 'herding' of poll results near election
day (Clinton and Rogers, 2012). Model based poll averaging offers another check on polling
validity, if not reliability, through estimates of pollster ‘house effects,’ the tendency of some
survey houses to produce estimates that are systematically higher or lower for one candidate
than other houses. Once the model is corrected to the final election outcome, these house effect
estimates allow for tests of several hypotheses: Were the polls of 2012 collectively biased
towards one candidate or the other over the final two months of the campaign and not just
during its final week? Did particular sampling methodologies or survey modes exhibit
consistent bias? And were particular survey houses better or worse in 2012?
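As a simplified, non-Bayesian illustration of the house-effect idea (not the model actually used), one can decompose a set of hypothetical poll readings into a common time trend plus pollster-specific offsets with ordinary least squares, as in the Python sketch below.

import numpy as np
import pandas as pd

# Hypothetical national polls: candidate share, field week, and survey house
polls = pd.DataFrame({
    "house": ["A", "B", "A", "C", "B", "C", "A", "B"],
    "week":  [1,   1,   2,   2,   3,   3,   4,   4],
    "share": [49.5, 48.2, 50.1, 48.8, 49.9, 48.5, 50.4, 48.9],
})

# Two-way decomposition: poll reading = common weekly level + house effect.
# Dummy-code weeks and houses (house A as the reference) and solve by least squares.
X = pd.get_dummies(polls[["week", "house"]].astype(str)).drop(columns="house_A").astype(float)
beta, *_ = np.linalg.lstsq(X.to_numpy(), polls["share"].to_numpy(), rcond=None)
for name, b in zip(X.columns, beta):
    print(f"{name:8s} {b:+6.2f}")   # house_B, house_C are effects relative to house A

In the full approach described above, these offsets would additionally be anchored to the final election outcome so that they can be read as estimates of bias rather than merely relative differences.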
Model-Based Poll Averaging Over the 2012 U.S. Presidential Election Campaign
Simon Jackman, Stanford University
During the 2012 U.S. presidential election campaign I developed a poll-averaging model that
produced daily estimates of voting intentions at the national and state levels, published on
Pollster/HuffingtonPost.com. Using over 1200 published polls, my model correctly predicted the
election outcome in every state and Obama’s 332 vote Electoral College tally. I elaborate the
various elements of the model: (1) reliance on historical election returns; (2) corrections for
house effects; (3) correlations among states and national levels; (4) a dynamic model for day-to-
day changes in voting intentions over the campaign. I report estimates of key parameters of the
model (e.g., house effects, the day-to-day rate of change parameter), details as to the
forecasting performance of the model and sensitivity to various model assumptions. Collectively,
the polling industry underestimated Obama’s two-party vote share by about half a percentage
point; I examine the sources of this systematic, collective bias in 2012 election polling. Since the
model produces estimates of trajectories of voting intentions in every state, I also assess the
extent to which ‘set pieces’ of the campaign (the end of the Republican nominating process, the
nominating conventions, the debates) and exogenous events (e.g., Hurricane Sandy) appear
to have moved voting intentions, and variation across states in the magnitude of responses to
these events.
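A bare-bones illustration of the dynamic component, assuming a simple random-walk ('local level') model and invented daily poll readings rather than the 1,200 polls used in the actual analysis, is sketched below in Python.

import numpy as np

def local_level_filter(y, obs_var, state_var, mu0=50.0, p0=25.0):
    """Kalman filter for a random-walk ('local level') model of daily voting intention.
    Entries of y may be np.nan on days with no published poll."""
    mu, p = mu0, p0
    filtered = []
    for obs in y:
        p += state_var                  # random-walk prediction step
        if not np.isnan(obs):           # measurement update when a poll is observed
            gain = p / (p + obs_var)
            mu += gain * (obs - mu)
            p *= 1 - gain
        filtered.append(mu)
    return np.array(filtered)

# Hypothetical daily poll averages for a candidate's two-party share (nan = no poll)
daily = np.array([50.2, np.nan, 49.8, 50.5, np.nan, np.nan, 51.0, 50.7])
print(local_level_filter(daily, obs_var=1.5 ** 2, state_var=0.1 ** 2).round(2))

House effects, correlations among states and the national level, and priors from historical election returns would be layered on top of this core in the full model.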
Methodological Briefs: Cell Phones
Alternative Sample Selection and Data Collection Strategies for Balancing Cell
Phone Response Distribution Across County/Region Level Geographies in a Dual
Frame (Landline/Cell) Telephone Survey
Howard Speizer, RTI, International; Marcus Berzofsky, RTI International; Jamie
Ridenhour, RTI International; Tom Duffy, RTI International; Tim Sahr, Ohio State
University
The challenge of targeting smaller geographic regions in a dual frame (landline/cell) telephone
study increases with the proportion of sample members that are expected to respond by cell
phone. For the Ohio Medicaid Assessment Study (OMAS), completed in October 2012, 25% of
the 22,000 respondents in this state-wide survey completed their interview by cell phone.
Targets for the number of cell phone respondents by county were set by population totals and a
probability-proportionate-to-size cell phone sample was fielded. Although this design worked
well in most regions, some significant under- and over-representation occurred. In this paper,
we examine both sample selection and field data collection protocols to explain these variances.
We then examine various cell phone data augmentation options and sampling strategies, by
modeling performance on the OMAS, to improve geographic targeting without sacrificing quality
objectives. We compare the results of these alternatives and suggest an improved design for
the cell phone portion of a dual frame telephone study for achieving small area targets.
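The probability-proportionate-to-size allocation mentioned above can be sketched as follows; the county names and population totals are placeholders rather than the OMAS design figures, and the with-replacement draw is only an approximation of a production PPS design.

import numpy as np

# Placeholder county population totals used to set PPS selection probabilities
counties = ["Cuyahoga", "Franklin", "Hamilton", "Lucas"]
population = np.array([1_280_000, 1_160_000, 800_000, 440_000], dtype=float)
shares = population / population.sum()

rng = np.random.default_rng(7)
draws = rng.choice(len(counties), size=10_000, p=shares)   # PPS allocation of sampled numbers
for i, county in enumerate(counties):
    print(f"{county:9s} target {shares[i]:5.1%}   sampled {(draws == i).mean():5.1%}")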
Sampling Cell Phones by Rate Center: Efficacy, Coverage and Incidence
David Dutwin, Social Science Research Solutions; David Malarek, MSG
Survey researchers who conduct telephone studies of geographies any smaller than a state
have limited options available for sampling cell phone telephone numbers. Furthermore, there is
presently no way to estimate the incidence one will attain by any of the methods presently
available for localized sampling, nor are there any techniques available to estimate coverage of
the selected target population. This paper first details the options available to researchers of
local telephone studies with regard to cell phone sample selection, and explains why selection
at the level of rate center is superior to other methods. Using a unique dataset that combines
thousands of respondent surveys across the United States with data from the 2010 U.S.
Census, aggregated to the level of rate center, we show the efficacy and potential bias of
utilizing rate center for local sample selection. Finally, we offer a model by which researchers
who utilize rate center can estimate the survey incidence they will attain as well as the
coverage of their target population.
To Call or Mail: Impact of Mailing Surveys Directly to Cell-Phone-Only
Households in an Address-Based Frame
Vrinda Nair, Arbitron Inc.; Robin Gentry, Arbitron Inc.
Arbitron currently uses a mailed screener questionnaire sent to an Address Based sample
(ABS) to recruit the non-landline portion of the population. If a respondent reports being cell
phone only or cell phone mainly, the household is added to a cell-phone frame and used to
supplement a 2+ list assisted RDD sample. This cell-phone sample is called using our CATI
system. To investigate better methods and more cost-effective ways of increasing cell phone
response rates, Arbitron conducted a direct mail study in fall 2012. In our current methodology,
cell phone households who supply a contact phone number at the screener stage are contacted
and asked to participate in a seven-day diary survey of their radio listening. With the direct mail
study, this stage was omitted and radio surveys were mailed directly to the cell phone
households without first gaining consent. How does not offering a chance to refuse survey
participation impact response rates? We will present the return rate results, cost-benefit
analysis, as well as the analysis of the demographics of those that returned the survey to
determine who we brought in with the direct mail as compared to the traditional “call and then
mail” approach.
Understanding Bias in Appended Wireless Billing ZIP Code Data
Tara Merry, Abt SRBI; Andy Weiss, Abt SRBI; Mikelyn Meyers, Abt SRBI; Paul Schroeder,
Abt SRBI; Kristie Johnson, NHTSA
Designing cell phone samples is particularly challenging for small area surveys given the lack of
precise geographic information available. Rate center is currently the best information available
that can be used to target a specific area. While rate center information works fairly well to
target larger geographic areas, it is much less precise for targeting smaller areas such as
individual counties that are served by several rate centers. New post-production processes are
available that append billing ZIP code data to cellular samples. While this information cannot be
used to draw targeted sample, it can be used to stratify the sample prior to fielding. The ability to
append billing ZIP code data to cellular samples has the potential to dramatically improve
geographic targeting precision, thereby increasing efficiency and reducing costs. The
robustness and accuracy of the billing ZIP code data should be evaluated to determine how it
can be used to refine cellular samples without introducing bias or increasing coverage error. We
compare respondent-provided ZIP code data with billing ZIP code data in a population survey of
New York City with similar data from a national survey. Approximately one third (34%) of
sampled cell records in New York City have matched billing ZIP code data compared to 42% in
the national sample. We review variations in the rate at which ZIP code data are matched, the
accuracy of the ZIP code information, and examine which characteristics differ between
matched and unmatched cases across these studies. Results are discussed in the context of
how these data could be used to develop stratified cell samples depending on the geographic
area being targeted, the population of interest, and survey topic.
Cell-Phone Sampling Frames: Effectiveness and Dependability of Recent-Usage
Data
Robert DeHaan, Arbitron Inc.
Arbitron currently uses a mailed screener questionnaire to an Address Based Sample (ABS) to
recruit the non-landline portion of the sample frame. The questionnaire is used to identify cell-
phone only or cell-phone mainly households to be included in a supplemental cell-phone frame.
Recently, innovations have been made that allow for better targeting of cell-phone only and cell-
phone mainly households through the use of data available via various sample vendors. Of
interest to Arbitron in these data are activity indicators, geography, and other auxiliary
information to be used in stratification. To investigate the usability of these newly available data,
this test examines the advantages and disadvantages of switching from our current address-based methodology. We aim to answer: 1) How accurate are the
activity indicators and are they beneficial in increasing response rates and reducing costs? 2)
Can the activity indicators be used in conjunction with respondent reported cell-phone vs.
landline usage to determine differential sampling rates in a dual RDD frame approach? 3) What
effect, if any, will the change in methodology (elimination of the screener questionnaire) have on
proportionality in historically under-represented demographics? We will present results
describing the effectiveness of using usage indicators in sample selection, cost considerations
and comparisons, along with an analysis of demographic proportionality achieved when using the cell-phone frame.
Recent Methodological Updates Adopted for the National Immunization Survey
(NIS)
Vicki Pineau, NORC at the University of Chicago; Robert Montgomery, NORC at the
University of Chicago; Bess Welch, NORC at the University of Chicago; Kirk Wolter,
NORC at the University of Chicago; Stacie Greby, Centers for Disease Control and
Prevention
The National Immunization Survey (NIS), conducted annually since 1984, has been the nation’s
flagship survey for monitoring vaccination coverage among 19-35 month-old children. The NIS
was designed to collect data using a list-assisted RDD survey methodology of households with
landline telephones and a follow-up mail survey of the age-eligible child’s vaccination providers
to collect vaccination histories. Like many other surveys, the NIS is affected by declining
response rates and increasing cell-phone-only use, resulting in high survey costs and serious
concerns about non-representative data. The NIS research strategy in recent years was
focused on assessing the consequences of noncoverage of households with cellular phone
service only, addressing declining response rates in the household interview, and maximizing
survey efficiency. In this paper, we summarize the major changes implemented in the NIS
design for 2011-2012 and the research results that supported adoption of these changes with
the relevant survey results since the changes were made. Expansion of the NIS in 2011 to a
dual RDD landline and cell frame design and implementation of weighting methods to minimize
the MSE of key survey estimates resulted in few changes in official vaccination coverage
estimates. The change in the NIS age definition increased the number of eligible households
and decreased the required number of calls in the household survey, decreasing survey costs
with no substantive effect on vaccination coverage rates. Beginning in 2012, the household survey
questionnaire length was reduced by eliminating non-critical parental vaccination recall content
resulting in higher completion rates, decreasing survey costs and supporting expansion of the
cell phone sample frame. Also for 2012, an optimum dual-frame sample was fielded to minimize
the survey costs subject to geographic reliability constraints. The NIS will evaluate results using
data from the National Health Interview Survey Provider Record Check completed in the same
years.
Cross-Platform Measurement: User Experience With a Smartphone and Web Self-
Reported Data Collection Application
Ana P. Petras, The Nielsen Company; Shu Duan, The Nielsen Company; Oana Dan, The
Nielsen Company
The proliferation and adoption of multiple Internet-access platforms in the U.S., especially
among younger and ethnic populations, has increased the need for research organizations to
provide alternative methods of data collection, such as mobile applications and Web. In 2012,
Nielsen conducted a study in two local demographically diverse market areas in the U.S. to
assess the viability of using a cross-platform mobile and online application (Whatcha Watchin’?)
to capture television viewing. To maximize coverage of smartphone and non-smartphone users,
iOS (iPhone/iPad), Android and online (website) versions of the application were developed and
made available to participants. Here we present the end-to-end study experience through the
eyes of the user and focus on recruitment, app download and general use of this data collection
tool. The user experience was gathered through a follow-up online survey and a set of one-on-one interviews conducted shortly after the end of the data collection period. Survey respondents
included those who accepted, registered into the App/Web and submitted at least one viewing
entry; those who accepted and registered but did not submit any viewing entry; and respondents
who accepted over the phone but did not register into the App/Web. Key findings on the
effectiveness of the recruitment materials, reasons for participation, interaction with the App,
etc. will be presented, followed by a discussion on key topics to include when surveying
respondents on their user experience with cross-platform data collection tools.
The Mechanics of GPS Geo-Location for Mobile Devices: Their Potential for
Measurement Error and Some Illustrative Data
Trashawna Boals, Experian Marketing Services; Max Kilger, Experian Marketing Services
As researchers continue their initial efforts to utilize mobile devices as survey research data
collection instruments, more and more researchers will turn to taking advantage of some of the
special features of these mobile devices as a part of their data collection process. In particular,
the ability to pinpoint the exact geo-location of an individual may contribute information that will
provide additional meaning to data already being collected by means of mobile devices. In this
paper we examine the multiple technical mechanisms involved in geo-location using GPS
services from mobile devices like Smartphones and tablets. In particular we look at the potential
sources for and size of measurement error in each of these strategies — such as the
“transporter effect” — due to a variety of factors. Finally, we examine some real-life geo-location
data collected by a recent passive mobile measurement study to look for evidence of some of
these potential measurement errors as well as provide the reader with some familiarity with
mobile device-based geo-location data.
Assessing the Risk of Nonresponse Bias
Following up on Nonresponse Bias in the American Time Use Survey
Daniel G. Harwell, National Center for Health Statistics
Nonresponse can be a major issue when the topic of interest is directly related to response
propensity. This issue is of particular concern for time use surveys, which measure how people
spend their time. If individuals who are busier fail to respond to surveys due to a lack of time,
this could lead to biased estimates. This has been a particular concern for the American Time
Use Survey (ATUS), which has had a response rate below sixty percent since it began in 2003.
Following up on previous research (Abraham, Maitland, and Bianchi 2006), this study examines
the potential bias created by nonresponse in the 2011 ATUS, with particular emphasis on the
theory that busier respondents are less likely to respond to the ATUS.
Multiple Approaches for Evaluating Nonresponse Bias in a Short-Field-Period
Survey
Robyn Rapoport, Social Science Research Solutions; Paul J. Lavrakas, Independent
Consultant; Eran Ben-Porath, Social Science Research Solutions; Melissa Herrmann,
Social Science Research Solutions
High quality dual frame phone surveys typically employ several strategies to reduce non-
response including: 1) making multiple call attempts to non-responsive numbers; 2) assigning
highly trained interviewers to call back refusals in an effort to convert them into completed
interviews; and 3) varying the time of day and day of the week that call attempts are made.
Surveys that are fielded over short field periods (10 days or less) are limited in their ability to
employ these types of approaches and may be more susceptible to the possible effects of non-
response, including bias. This paper reports on a series of non-response bias studies
embedded in a state-wide survey concerning voting that was fielded over a ten-day period in
October 2012. Since the survey was expected to undergo intensive legal scrutiny regarding its
validity in representing the population, the research team proactively incorporated four methods
to investigate the presence of non-response bias. These included comparisons of data provided
by respondents who completed the survey in initial call attempts to those who completed in later
call attempts and to those who had initially refused participation. The researchers also
compared responders and non-responders using Census data associated with their local zip
codes derived from telephone exchanges. For example, one variable found to significantly
differentiate responders from non-responders was the percentage of the population in the zip
code that was White. In order to gain insight into other types of potential non-response bias, the
team reviewed information about refusals collected, in real-time, from the interviewers using a
Refusal Report Form (cf. Lavrakas, 2010). Given that the survey industry is currently confronting
diminishing response rates, particularly during short field-periods, pursuing a rigorous evaluation
of the impact of non-response is imperative for increasing confidence in the validity of findings
derived from this type of data collection.
An Evaluation of Alternative Indicators for the Risk of Nonresponse Bias for a
Mail Survey With a Nonresponse Follow-Up
Sonja Ziniel, Harvard Medical School; Boston Children’s Hospital; James Wagner,
University of Michigan; Rebecca Hehn, Boston Children’s Hospital; Robert Groves,
Georgetown University; Ingrid Holm, Boston Children’s Hospital
Recent research on nonresponse bias in surveys has included the development of alternative
indicators for the representativeness of survey respondents, such as the R-indicators
(Schouten, Cobben, and Bethlehem 2009). Few empirical studies have been published, and
little is known about the usefulness of these indicators to detect nonresponse bias in survey
statistics under different survey designs. This study evaluates these alternative indicators,
including R-indicators, for a mail survey sent by Boston Children’s Hospital to 7,000 parents
whose children were recently seen at the hospital or any of the clinics affiliated with it. The
survey focused on attitudes about participation in genetic biobanks and the return of genetic
research results. Previous research indicated a high likelihood of nonresponse bias for a
number of statistics from such a survey. After the initial survey and a reminder postcard were
sent, we performed a nonresponse follow-up study for a random sample of the nonrespondents
that included a shorter questionnaire and a $2 bill as an incentive. The highly detailed sampling
frame included demographic and medical condition-related information on the children and
parents. Exploratory analyses will compare nonresponse bias indicators across the two phases
of the survey and assess their ability to detect nonresponse bias using data from the sampling
frame as well as survey statistics.
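For orientation, an unweighted, sample-based version of the R-indicator can be sketched in a few lines of Python; the frame covariates and the propensity model below are hypothetical and omit the design weights and standard-error corrections used in practice.

import numpy as np
from sklearn.linear_model import LogisticRegression

def r_indicator(frame_covariates, responded):
    """Sample-based R-indicator: R = 1 - 2 * SD of estimated response propensities."""
    model = LogisticRegression(max_iter=1000).fit(frame_covariates, responded)
    propensities = model.predict_proba(frame_covariates)[:, 1]
    return 1 - 2 * propensities.std(ddof=1)

# Hypothetical frame covariates (e.g., child age, parent age, condition flag)
rng = np.random.default_rng(1)
X = rng.normal(size=(7000, 3))
responded = rng.random(7000) < 1 / (1 + np.exp(-(-1.0 + 0.6 * X[:, 0])))
print(f"R-indicator = {r_indicator(X, responded):.3f}")   # 1.0 would mean fully representative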
The Effect of Survey Mode on Nonresponse Bias and Measurement Error: A
Validation Approach
Antje Kirchner, Institute for Employment Research; Barbara Felderer, Institute for
Employment Research
In order to obtain unbiased estimates from survey interviews, it is important that the data be of good quality, i.e., that the survey respondents be representative and that the variables be free of
measurement error. Using administrative records and survey data, the main questions we
address concern the differential nonresponse bias between the telephone and the Web mode
and whether these modes lead to differential measurement error. In an experimental setting we
randomly assigned respondents to either phone (n=2,400) or Web mode (n=1,082). Because
the sampled persons were selected from German administrative records, record data are
available for all sample units to study the bias due to nonresponse and measurement error (e.g.
population means and regression parameters). Hence, we can assess the overall nonresponse
bias of the estimates by comparing the statistics from both modes against the known population
value. Similarly, for the respondents, we compare survey values and administrative records at the individual level for selected variables and compute measurement error directly. First, based on
administrative data for respondents and nonrespondents, our paper compares nonresponse
bias for the above statistics in the single telephone mode to those obtained in the Web mode.
Empirical analyses confirm a differential sample composition resulting in systematically different
nonresponse bias between the two modes. Second, we assess the amount of measurement
error for both modes. We conclude with a discussion of whether mode specific differences, with
respect to nonresponse bias and measurement error bias, compensate or reinforce each other
with respect to the total error.
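The two error components described above can be computed straightforwardly once survey responses are linked to administrative records. The sketch below is a minimal, hypothetical illustration of that logic; the column names ("record_value", "survey_value", "responded", "mode") are assumptions for the example, not the study's data structure.
import pandas as pd
def nonresponse_bias(df):
    # Respondent mean of the administrative value minus the full-sample mean (known for all sampled units).
    return df.loc[df["responded"] == 1, "record_value"].mean() - df["record_value"].mean()
def measurement_error(df):
    # Individual-level error for respondents: survey report minus administrative record.
    resp = df[df["responded"] == 1]
    return resp["survey_value"] - resp["record_value"]
# df = pd.read_csv("merged_mode_experiment.csv")   # one row per sampled person
# for mode, grp in df.groupby("mode"):             # e.g., "phone" vs. "web"
#     print(mode, nonresponse_bias(grp), measurement_error(grp).mean())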
Implications of Potential Nonresponse Bias
Ashton Jacobe, Fors Marsh Group
One of the challenges researchers face in data collection is achieving a representative sample
of the population. There will usually be portions of the population who will not respond to the
survey. If the people who do not respond are systematically different from those who respond,
this introduces potential bias to the survey results. Therefore, survey nonresponse is a factor
that plays a significant role in the composition of the resulting sample. Ideally, nonresponse bias
would be measured by comparing the survey responses of those who did not respond with the responses of those who did; however, in most cases this is not possible, so the extent of potential nonresponse bias must be estimated. This paper discusses the
nonresponse bias analysis for a sample of military recruiters used for a quality of life survey.
The analysis used a two-stage process to approximate the differences in survey responses
between sample members who did not complete the survey and those who completed it. The
first stage compared demographic characteristics of respondents and nonrespondents. For
those characteristics found to differ significantly between the two groups, responses to key
survey items were analyzed in stage two to determine if the characteristics that differed between
groups were related to responses to survey items. Results indicated that several key estimates
showed potential bias; this response bias was primarily related to a few demographic
characteristics (race/ethnicity, aptitude test score, and family characteristics). There were also
differences in the amount of potential bias and drivers of the bias among the different Service
subgroups. As a result of this analysis, the data were weighted to adjust for the potential biasing
factors to ensure that all estimates better represent the full population of recruiters. This paper
highlights the importance of determining the extent of potential nonresponse bias in survey data.
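To make the two-stage logic concrete, the sketch below flags demographics that differ between respondents and nonrespondents, then tests whether a key survey item varies across a flagged demographic among respondents. It is a minimal sketch of the general approach, not the paper's code; the function and column names are hypothetical.
import pandas as pd
from scipy.stats import chi2_contingency, ttest_ind
def stage_one(df, demographics, responded="responded", alpha=0.05):
    # Flag demographics whose distributions differ between respondents and nonrespondents.
    flagged = []
    for d in demographics:
        table = pd.crosstab(df[d], df[responded])
        if chi2_contingency(table)[1] < alpha:   # [1] is the p-value
            flagged.append(d)
    return flagged
def stage_two(respondents, flagged_demo, key_item):
    # Among respondents, test whether a key survey item differs across the flagged demographic
    # (first two categories shown; a fuller analysis would handle more groups, e.g., with ANOVA).
    groups = [g[key_item].dropna() for _, g in respondents.groupby(flagged_demo)]
    return ttest_ind(groups[0], groups[1], equal_var=False)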
Culture and Survey Responses
Examining the Role of Culture in Answering Context-General and Context-
Specific Survey Questions
Allyson L. Holbrook, University of Illinois at Chicago; Sharon Shavitt, University of
Illinois; Timothy P. Johnson, University of Illinois at Chicago; Young I. Cho, University of
Wisconsin – Milwaukee; Noel Chavez, University of Illinois at Chicago; Saul Weiner,
University of Illinois at Chicago
Previous research has examined factors that influence responses to context-general (those that
ask about opinions, behaviors, or beliefs that apply across contexts; e.g., general life
satisfaction) and context-specific (those that ask about opinions, behaviors, or beliefs that are
limited to a particular context; e.g., satisfaction with one’s work) questions. For example, the
order of these questions (in particular part-whole pairs of questions) may influence the
distribution of responses to the questions as well as the relationship between the questions
(e.g., Schwarz, Strack, & Mai, 1991). We examine the role that culture may play in influencing
responses to general and context-specific questions by examining several pairs of part-whole or
specific-general question pairs (e.g., opposition to the death penalty for a specific crime and
opposition to the death penalty in general) from a survey of non-Hispanic Whites, African-
Americans, Mexican-Americans, and Korean-Americans. We assess the extent to which culture
influences the process of answering these survey questions by examining order effects,
respondent reactions to the questions (measured via coding of respondents’ behaviors from
recordings of the interviews), and paradata (e.g., response latencies) to test a number of
hypotheses. For example, we expect that members of collectivistic cultures, who have been
shown to think more contextually (Nisbett, 2003), will have more difficulty than individualists in
answering the general items (which are less tied to context) but less difficulty with the specific
items (which are more strongly tied to context). References: Nisbett, Richard E. 2003. The Geography of Thought. New York: The Free Press. Schwarz, Norbert, Fritz Strack, and Hans-
Peter Mai. 1991. “Assimilation and Contrast Effects in Part-Whole Question Sequences: A
Conversational Logic Analysis.” Public Opinion Quarterly 55(1):3-23.
Testing the Veracity of Self-Reported Religious Practice in the Muslim World
Philip Brenner, University of Massachusetts Boston
Survey findings suggest that predominantly Muslim countries are among the most religious in
the world and validate commonly held, but overly simplistic, perceptions of Muslims as
extremely and uniformly religious. Existing research has demonstrated that survey estimates can give a distorted view of actual levels of religious practice; however, it has thus far focused exclusively on traditionally Christian, advanced Western democracies. To address this
oversight, the veracity of self-reported religious practice in the Muslim world is tested using
Pakistan, the Palestinian Territories, and Turkey as cases for study. Comparing estimates of
prayer from conventional surveys with those from time diaries, marginal rates of overreporting
are estimated for each country by sex. The time use measure of prayer is then imputed for the
conventional survey dataset to estimate overreporting at the respondent level and to predict
overreporting using a measure of religious identity importance. Findings suggest that
overreporting of prayer occurs in each country considered, although more consistently for
women than men. These gender differences in data quality are discussed in terms of
public/private religious practice. Moreover, religious identity importance is strongly correlated
with overreporting of prayer, suggesting that a similar mechanism may promote the
measurement error for overreported prayer in the Muslim world and overreported church
attendance in the West.
Estaría Bien Si Le Hago Unas Pocas Preguntas En Ingles? An Experimental
Investigation of Language Effects Among Bilingual Latinos
Nicole R. Buttermore, Social Science Research Solutions; Luis Tipan, Social Science
Research Solutions; Mark Lopez, Pew Hispanic Center; David Dutwin, Social Science
Research Solutions
A growing body of research has documented differences in survey results among Latino
respondents interviewed in Spanish and those interviewed in English and demonstrated that the
failure to offer interviewing in both languages introduces significant bias (e.g., Lee, et al., 2008;
Dutwin et al., 2012). Such language effects might result either from (1) cultural differences in attitudes or (2) differences in the meaning and interpretation of the questions depending on language (Perez, 2011). In an attempt to tease apart these explanations, we embedded an
experiment in a national survey of Latino political attitudes and used random assignment to
control for the effects of acculturation. During the middle of the interview, respondents who
reported fluency in both English and Spanish were randomly assigned to either switch
languages or continue in the language in which they began the interview. In line with previous
research, the results demonstrated significant attitudinal differences between those answering
in English versus Spanish, but there were also differences between those who did and did not
change languages. Notably, of respondents who began the interview in Spanish, those who
switched to English rated their financial situation as more favorable than did those who did not
switch languages. Furthermore, respondents initially interviewed in English rated being
successful in a high paying career as less important when responding in English, compared with
those who switched to Spanish. These results suggest that the language in which a survey is administered has important implications for the way in which respondents think about and respond to the questions.
Assessing the Validity and Reliability of Self-Reported Items on Likelihood of
Migration
Sergio C. Wals, University of Nebraska-Lincoln; Alejandro Moreno, Instituto Tecnológico
Autónomo de México
Immigrants provide a critical test to longstanding theories of attitude formation. These
individuals, after all, import their political attitudes from their countries of origin to their new
homes. Some of these attitudes and beliefs remain consequential for immigrants' civic lives
whereas others are replaced through exposure to the new political system. With international
migration flows on the rise and increased scholarly attention to immigrants’ political attitudes,
beliefs, and behaviors, social sciences are in need of valid and reliable survey instruments to
study these hard-to-reach populations. Panel data collected before and after migration would be ideal; given budgetary and logistical constraints, however, such data are nearly impossible to collect. Building upon social science research on likelihood of migration, we
develop a battery of items to assess the extent to which individuals from one country are likely
to migrate to another one. We contend that a small battery of items can provide researchers
with a theoretically driven alternative for identifying the individuals most likely to migrate from any
given country to another in the near future, which in turn results in a cost-effective opportunity to
gather pre-migration data on populations of interest. Over the past few decades, Mexico has
provided the largest cohort of immigrants to the United States. Therefore, we tested our
likelihood of migration battery on several nationally representative samples of Mexican citizens
living in Mexico to identify those individuals most likely to migrate to the United States in the
following two to three years. Our data analyses assess the validity and reliability of our survey
items. The data collection took place from 2007 to 2012. Our empirical findings strongly suggest
that our likelihood of migration battery is a viable and efficient option to collect valid and reliable
pre-migration data on populations of interest.
A Cross-Cultural Study on Daily Experience of Depression Between Countries in
the Sahel Region and Western Asia
Jinyoung Lee, University of Nebraska - Lincoln
According to the World Health Organization’s World Mental Health Survey Initiative (2011),
richer countries have higher depression rates. Contrary to this finding, the Gallup World Poll has
shown that countries suffering from extreme poverty and/or wars have the highest depression
rates: people in Ethiopia (43.4%), the Palestinian Territories (30.8%), Yemen (28.2%), and Iraq
(26.3%) report the highest rates of depression, while less than 10% of people feel depressed in
most of the developed countries such as Denmark (4.2%), Netherlands (4.8%), Switzerland
(4.8%), and Sweden (4.9%). The remarkable difference between the two results implies that
studies on depression should pay attention to the causes of depression differentiated by the
social context of each country. This study examines daily experience of depression based on
the Gallup World Poll, a multinational probability-based survey. This dataset does not
distinguish clinical depression from momentarily feeling down. This study focuses on the socio-
political conditions that contribute to the public’s feeling of depression. As Diener et al. (2003)
noted, comparative studies on emotional and cognitive aspects are complicated because
cultural variables as well as personalities yield differences in mean level of individual evaluation
of life between countries. Considering this complexity, this study traces fluctuations in
depression rates within countries in the Sahel region or Western Asia, rather than simply asking which country's public is more depressed. A preliminary analysis indicates that
depression rates might reflect major socio-political events such as famine and violence against
the citizens, that is, the depression rates of a country are often influenced by major social
events. Further, different factors such as extreme poverty and fear result in different levels of
depression rates between countries. The result confirms that cross-cultural studies on
depression should meticulously take the unique social context in each country into
consideration.
Friday, May 17
10:00 a.m. – 11:30 a.m.
AAPOR Concurrent Session D
Probability and Non-Probability Samples in Internet Surveys
Understanding Bias in Probability and Non-Probability Samples of a Rare
Population
John Boyle, ICF International; Sarah Ball, Abt Associates; Helen Ding, Chenega
Government Consulting LLC; Gary L. Euler, NCIRD, Centers for Disease Control and
Prevention; Stacie Greby, Centers for Disease Control and Prevention; Faith Lewis, Abt
SRBI
Although probability samples are the preferred source for national data measures, non-
response issues in probability samples have a substantial impact on the cost of surveys among
rare populations. In some rare populations, non-probability samples can produce credible
results that would not otherwise be possible to obtain. CDC conducts surveys of flu vaccination
attitudes and usage among pregnant women in order to monitor health risks in this vulnerable
population. Pregnant women account for 1% of the adult U.S. population. The National Health
Interview Survey (NHIS) identifies pregnant women in its sample; however, the sample size is
small and the data are not available by the start of the next flu season. During the 2010-11 flu
season, CDC conducted nearly 1,500 interviews of pregnant women in the fall and 2,000
interviews in the spring using a large national Internet panel. The fall Internet panel surveys are
launched on or around November 1 and published in early December to provide an early
season estimate of flu vaccination among pregnant women. The final season estimates
measured by the spring Internet panel surveys are published at the start of the following flu
vaccination season. Complete data for pregnant women in the NHIS are not available until near
the end of the following flu vaccination season. In this paper we examine the characteristics of
pregnant women in the Internet sample and compare them to the NHIS to better understand
potential sources of error in both probability and non-probability samples, with consideration for
reasons to choose between a probability and non-probability sample for generating rapid data to
assess public health programs.
A Comparison of Results from Dual Frame RDD Telephone Surveys and Google
Consumer Surveys
Scott Keeter, Pew Research Center; Leah Christian, Pew Research Center; Danielle
Gewurz, Pew Research Center; Michael Dimock, Pew Research Center; Rob Suls, Pew
Research Center; Jon Sadow, Google; Paul McDonald, Google; Brett Slatkin, Google;
Matt Mohebbi, Google
The growth in Internet use has led to the development of new techniques for conducting social
research and measuring people’s behavior and opinion while they are online. One such tool,
Google Consumer Surveys, interviews a sample of Internet users from a diverse group of about
80 publisher sites that allow Google to ask one or two questions of selected visitors as they
seek to view content on the site. Google’s approach results in a nonprobability sample of
Internet users, but is distinct from opt-in surveys in that respondents cannot self-select into the
survey. It is also different from Internet panels that respondents join for an extended period of
time. This paper will summarize a year-long evaluation of how results from Google Consumer
Surveys compare with those from dual frame RDD telephone surveys. Specifically we compare
Google and telephone survey estimates across a wide range of political attitudes and behavior,
domestic and foreign policy opinions, technology use and civic and political engagement. We
also examine how the demographic composition of the samples compares with that of Internet
users in both telephone surveys and the Current Population Survey. In the initial six months of
evaluation, the median difference in point estimates across the 48 substantive and demographic questions tested was 3 percentage points; the mean difference was 6 percentage points. The
paper will offer guidance on how Google Consumer Surveys can be used for different
applications of survey research.
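The benchmarking summary reported above reduces to a simple computation over matched question-level estimates. The sketch below is a hypothetical illustration of that computation; the column names are assumptions, not the paper's dataset.
import pandas as pd
def summarize_differences(est):
    # Absolute difference between the two sources for each matched question,
    # then the median and mean across questions.
    diffs = (est["google_pct"] - est["phone_pct"]).abs()
    return diffs.agg(["median", "mean"])
# est = pd.DataFrame({"question": [...], "google_pct": [...], "phone_pct": [...]})
# print(summarize_differences(est))   # e.g., roughly 3 and 6 points in the initial six months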
A Comparison of a Mailed-in Probability Sample Survey and a Non-Probability
Internet Panel Survey for Assessing Self-Reported Influenza Vaccination Levels
Among Pregnant Women
James Singleton, Centers for Disease Control and Prevention; Helen Ding, Chenega
Goverment Consulting LLC; Stacie Greby, Centers for Disease Control and Prevention;
Gary L. Euler, NCRID, Centers for Disease Control and Prevention; Indu B. Ahluwalia,
NCCDPHP, Centers for Disease Control and Prevention; John Boyle, ICF International
The Centers for Disease Control and Prevention (CDC) conducted an opt-in Internet panel
survey (IPS) to provide timely estimates of mid-season and end-of-season influenza vaccination
coverage among pregnant women. We used the Pregnancy Risk Assessment Monitoring
System (PRAMS), a stratified probability sampling survey, to assess the representativeness of
the pregnant women sample and the validity of influenza vaccination coverage from IPS. The
IPS is an “opt-in” survey of women who were pregnant at any time during August 2010-April 2011.
PRAMS is an ongoing population–based surveillance system collecting data on maternal
experience and behaviors before, during and shortly after pregnancy among women delivering a
live-born infant. For both surveys, we limited the analysis to women pregnant during the peak flu
vaccination period (October 2010-January 2011) residing in 18 states with completed data and
compared both the final weighted distributions of demographic characteristics and influenza vaccination coverage and the unweighted IPS versus base-weighted PRAMS (accounting for probability of selection) distributions of demographic characteristics. Compared to PRAMS, IPS respondents had similar age and marital status distributions but, before final weighting, were more likely to be white (68.1% vs. 58.0%) and less likely to be Hispanic (10.7% vs. 16.5%) or of other racial/ethnic groups (6.0% vs. 12.0%). The higher percentage of women with college education or above in the IPS (44.0% vs. 34.6%) persisted after final weighting (43.6% vs. 31.9%). Overall influenza vaccination coverage from both surveys was similar (50.2% vs. 49.2%), and the estimates by subgroups were similar (within ±5 percentage points) except by race/ethnicity (differences >5 percentage points). While neither survey provides a standard measure of flu vaccination among pregnant women, IPS is able to provide similar vaccination coverage estimates among pregnant women within the flu season for rapid
response, while PRAMS provides detailed state level data for longer term planning. Both
surveys will be continued to assess immunization programs and to ensure that valid, timely data are available for decision making.
Probability vs. Non-Probability Samples: A Comparison of Five Surveys
Johan Martinsson, University of Gothenburg; Stefan Dahlberg, University of Gothenburg;
Sebastian Lundmark, University of Gothenburg
Commercial Internet panels based on non-probability samples have come into wide use, including among academic researchers. But is the quality of such data comparable to that of probability-based samples? Previous studies addressing this issue have mainly focused on the U.S., while this study compares the quality of such Internet panels in a different context: Sweden. Sweden
differs from the U.S. for example by having had very high Internet coverage for a long time,
having smaller socio-economic differences and by having a complete population register that
can be used for random samples. Two non-probability based panels are compared with two
probability based panels and a benchmark telephone survey. Demographics are compared to government records, and attitudes are compared to high-quality, high-response-rate benchmark studies. In order to allow comparisons, five surveys with comparable questions were run
at the same time. Three of the Internet panels provided professional post-stratification weights,
which allow us to compare the accuracy both with and without weights. In contrast to previous
studies, the results indicate a surprising similarity in terms of accuracy between probability
panels and non-probability panels. The reasons for this deviating result and differences between
the United States and Sweden are discussed.
Modeling a Probability Sample? An Evaluation of Sample Matching for an Internet
Measurement Panel
Lukasz Chmura, The Nielsen Company; Douglas Rivers, YouGov; Delia Bailey, YouGov;
Christine Pierce, The Nielsen Company; Scott Bell, The Nielsen Company
The past several years have seen an increase in the usage of samples from opt-in panels.
While these samples are relatively inexpensive compared to more traditional sample designs,
they are subject to unknown biases. A sample matching approach was evaluated as a means of
controlling and reducing this bias. Sample matching allows us to select a subset of the opt-in
sample that is as similar as possible, at a sample unit level, to a probability sample based on
variables common to both. Assuming selection into the sample is independent of the survey
variables conditional upon the matching variables, the matched sample will produce consistent
results. With assistance from YouGov, Nielsen evaluated the effectiveness of the sample
matching process to measure Internet behavioral metrics, including individual site visitation and
Internet usage levels. The American Community Survey (a respondent level, publicly available
probability sample) was used as the target sample for matching, and a matched sample was
selected from the non-probability component of the Nielsen Netview panel (Nielsen’s Internet
audience measurement service, comprised of both probability and non-probability components).
The resulting matched sample was compared to the probability portion of the Netview panel.
The results are encouraging, showing significant reductions in bias among the matched sample.
Specifically, for the Internet behavioral metrics considered, the matched sample generally
produced smaller differences from the probability panel than the full opt-in sample, even after
standard post-stratification weighting was applied. The robustness of the method was also
evaluated, with the matched samples producing relatively stable estimates. While additional
research is necessary to fully optimize this new methodology, the results so far show promise in
producing quality results from an opt-in sample.
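The core of the sample-matching step described above is a nearest-neighbor search from target records to opt-in panelists on shared covariates. The sketch below is a minimal, hypothetical illustration of that general technique, not Nielsen's or YouGov's implementation; the covariate names are assumptions, and categorical covariates would need encoding before use.
import pandas as pd
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler
def matched_sample(target, optin, covariates):
    # For each target (probability-sample) record, pick the most similar opt-in panelist
    # on the shared numeric covariates.
    scaler = StandardScaler().fit(pd.concat([target[covariates], optin[covariates]]))
    nn = NearestNeighbors(n_neighbors=1).fit(scaler.transform(optin[covariates]))
    _, idx = nn.kneighbors(scaler.transform(target[covariates]))
    return optin.iloc[idx.ravel()]
# matched = matched_sample(acs_records, panel_optin, ["age", "household_size", "hours_online"])
# Behavioral metrics for the matched sample can then be compared against the probability panel.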
Question Construction and Data Quality
Impact of Filter Questions on Estimates of Media Consumption
Curtiss Cobb, GfK Knowledge Networks; Danell Godinez, GfK Knowledge Networks;
Randall Thomas, GfK Knowledge Networks; Julian Baim, GfK-MRI; Risa Becker, GfK-MRI
A key choice in the design of Web surveys is whether to avoid posing questions to respondents
that do not apply to them by first asking filter questions. In research on filter questions, there is
some indication that a dichotomous “yes” or “no” response will yield a lower proportion of self-
reported occurrences of behaviors or attitudes than a multi-category scale. For example, in a
number of studies measuring attitudes (e.g. ‘concern’) or self-reports of ‘crime,’ multi-category
formats have been associated with higher self-reported incidence or attitudes than conditions
that filter with yes-no formats (Herrmann, et al., 1998; Hippler and Schwarz, 1989; Knäuper,
1998; Sterngold, et al., 1994). These findings are at odds with the cognitive processes that
survey researchers and psychologists believe that respondents use to answer questions. It is
believed that respondents first determine whether an incident or attitude occurred before trying to map it onto the provided multi-category response scale. This study extends the
research on filter questions by examining their use to measure media consumption, particularly
newspaper readership, radio listening and television viewing. Using a split-ballot design on a
representative sample of 1,000 adults, we randomly assigned half the sample to report their
media consumption over a period of time using a multi-category response, while the other half
of the sample were first asked a filter question before receiving the multi-category response if
eligible. Preliminary findings show that respondents receiving the multi-category response
reported more media consumption than those receiving the filter questions. Additional analysis
will explore differences along demographic lines and seek to relate the findings of non-media
use to satisficing behavior in other parts of the survey instrument.
Response Format Effects in the Measurement of Employment
Sergei Rodkin, Gfk Custom Research, LLC; Randall K. Thomas, Gfk Custom Research,
LLC; Stefan Subias, Gfk Custom Research, LLC; Carolyn Chu, Gfk Custom Research,
LLC
Accurate measurement of employment is essential to track employment trends in a nation, with
the information used to determine the effectiveness of a variety of private and governmental
programs designed to increase employment. Some have noted discrepancies in estimated
employment numbers between the Census and the CPS (Census typically has a lower count of
employed people), most often attributed to differences in interviewing mode, time frame
reference, or sampling frame. Many researchers using paper-pencil or Web-based
questionnaires present a multiple response question (‘Select all that apply’) to assess
employment. However, in a telephone interview, employment is often asked through a series of
yes-no questions, with the interviewer requesting a ‘yes’ or ‘no’ response for each item
presented in sequence (cf. Smyth, Christian, and Dillman, 2008, POQ). In research with self-administered questionnaires, the Yes-No Grid format has been found to yield a higher level of endorsement than the Multiple Response format (Smyth, Dillman, Christian, and Stern, 2006, POQ; Thomas and Klein, 2006, JOS). This paper reports on two
studies – Study 1 was a Web-based study that was conducted across 24 monthly waves with
over 60,000 respondents (18 or older) using an opt-in non-probability panel, balanced
demographically for age, sex, region, education, and income. Respondents were randomly
assigned to one of the 3 employment response scale formats: Multiple Response Format
(MRF); Yes-No Grid (YNG for employment); Single Response Format (SRF). Study 2 was a
Web-based study with over 2700 respondents using a probability-recruited panel (GfK-
Knowledge Networks) with the same conditions used in Study 1. In both studies, endorsement
of every category was highest with the YNG and lowest with the SRF. We will also describe how
these results are related to trend changes across quarters and how they are related to other
work-related variables, including hours worked/week.
Grouped Versus Interleafed Questions and Specific Versus Global Questions to
Improve Accuracy of the Census Questionnaire
Emily Geisen, RTI International; Murrey Olmsted, RTI International; Jennifer H. Childs,
U.S. Census Bureau
To reduce duplication or misreporting on the census, the U.S. Census Bureau includes
questions asking respondents about alternate addresses where household members sometimes
live or stay. However, recent studies based on the 2010 census found evidence of
underreporting these alternate addresses. To improve enumeration and reduce costs
associated with conducting follow-up interviews, the Census Bureau is exploring the use of
computer-assisted interviewing (CAI) for the 2020 census. The use of CAI for the 2020 Census
allows us to explore the use of two different design methods to encourage reporting of alternate
addresses: (1) grouped questions versus interleafed questions, and (2) specific versus global
questions. Research has shown that asking a group of yes/no filter questions before asking
detailed follow-up questions can elicit more “yes” responses compared to interleafed questions,
where follow-up questions come immediately after the filter question (Kreuter, McCulloch, Presser, & Tourangeau, 2011). This idea is incorporated into the 2020 Census by asking
respondents a series of yes/no questions about whether household members live or stay
somewhere else before asking for the more detailed address information, which respondents
may be reluctant to provide. Furthermore, research shows that asking global questions (i.e.,
asking about the household collectively) elicits less detail from respondents than asking specific
questions (i.e., asking about each household member individually). However, using specific
questions can lead to lengthier surveys and increased respondent burden. In this paper, we
examined the qualitative results of these two design methods compared to a control using two
rounds of cognitive and usability testing with approximately 100 participants. We examined
which methods resulted in higher reporting of alternate addresses, and higher reporting of
household members with alternate addresses. In addition, we investigated which method was
associated with more accurate reporting overall, the fewest user errors, and the lowest respondent burden.
Minor Design Changes With Major Impacts: Testing Explicit Versus Implicit Don’t
Know and Refused Response Options in Audio Computer-Assisted Self
Interviewing
James M. Dahlhamer, National Center for Health Statistics; Adena Galinsky, National
Center for Health Statistics; Sarah Joestl, National Center for Health Statistics; Marcie
Cynamon, National Center for Health Statistics; Jennifer Madans, National Center for
Health Statistics; Virginia Cain, National Center for Health Statistics
An ongoing debate among survey researchers focuses on the provision, or not, of an explicit
don’t know response option for questions in self-administered surveys. Some argue that offering
an explicit don’t know option invites satisficing, an “easy way out,” resulting in elevated item
nonresponse. Others counter, arguing that the exclusion of an explicit option represents a form
of coercion, forcing respondents to answer when the required knowledge/experience/opinion
does not exist. To inform this debate, we utilize data from two 2012 field tests evaluating the
feasibility of audio computer-assisted self-interviewing (ACASI) in the National Health Interview
Survey. In the first test, questions on sexual identity, mental and financial health, sleep, and HIV
testing were administered via ACASI to 535 adults. Don’t know and refused options were
provided with each question, and respondents could advance without answering (“explicit”
approach). The second field test involved a split-ballot experiment in which 3,215 adults were
assigned to receive questions using ACASI and 2,237 using computer-assisted personal
interviewing (CAPI). For ACASI, explicit don’t know and refused options were eliminated and a
follow-up item (response options: return to question, don’t know, refused) was presented when
a respondent advanced without answering (“implicit” approach). We assessed the ACASI
design changes by comparing item nonresponse rates between ACASI cases from the two field
tests. Where significant bivariate results emerged, the impact of screen design was tested in a
multivariate setting, controlling for sociodemographic characteristics such as age, sex, and
education. We also assessed mode differences by comparing item nonresponse rates between
CAPI and ACASI cases from the second test. Here again, significant bivariate results were
followed by multivariate analyses controlling for sociodemographic measures. Preliminary
results suggest a considerable advantage to the implicit approach. We conclude by discussing
the implications of our results for self-administered questionnaire design.
Seymour Sudman Student Paper Award Winner
Measure for Measure: An Experimental Test of Online Political Media Exposure
Andrew Guess, Columbia University
It is well known that existing measures of self-reported political media exposure are potentially
unreliable. Various studies have explored the causes of such measurement error, such as social
desirability bias, and have tested proxies, such as political knowledge. However, lacking an
objective baseline, investigations of this sort still rely solely on survey responses. By focusing
specifically on recent Internet activity, this paper's methodology estimates individuals' actual
consumption of political media. Using an experiment embedded within an online survey, I test
two different measures of media exposure and compare them to the estimated actual exposure.
I find that open-ended prompts produce generally more accurate measures of recent exposure
to online media compared to multiple-choice questions offering a list of different political news
outlets, which tend to produce overreporting.
Interviewing Methods and Survey Outcomes
Rapport, Sensitivity, and Proxy Reporting: Questions About End-of-Life Planning
and Interviewer-Respondent Interaction
Dana Garbarski, University of Wisconsin-Madison; Nora Cate Schaeffer, University of
Wisconsin-Madison; Jennifer Dykema, University of Wisconsin Survey Center
“Rapport” is a vague concept that has been used to refer to a wide range of features of
interaction. Rapport is sometimes assumed to be a positive feature of the interaction, referring
to a situated sense of affiliation between interactional partners, comfort, willingness to disclose,
motivation to please, empathy, or sharing (Goudy and Potter, 1976). Rapport may benefit
response quality by increasing respondent motivation, but could also harm data quality.
Questions about one’s end-of-life treatment planning and preferences are potentially sensitive
and interactionally delicate for both interviewers and respondents, creating a unique opportunity
to study the development and maintenance of interactional rapport. We propose to consider the
various meanings and dimensions of rapport and consider what their interactional expressions
might be both for the interviewer and the respondent in this context. This study examines
transcripts of the end-of-life section of the 2004 wave of the Wisconsin Longitudinal Study in
order to examine the conversational practices through which interviewers and respondents
negotiate sensitive topics and answers in terms of the signals that respondents give about
sensitive answers and how interviewers signal these questions are delicate, ask these
questions, and follow-up respondent answers. A coding scheme is developed to examine the
features of the interviewer-respondent interaction, including behaviors associated with rapport,
sensitivity, and motivation as outlined above, as well as behaviors previously identified as
indicating potential problems in the response process such as markers of uncertainty (see, e.g.,
Garbarski, Schaeffer, and Dykema, 2011). These coded features of interviewer-respondent
interaction will be examined for their associations with two criteria: participation in a subsequent
wave of the Wisconsin Longitudinal Study, and, for respondents who are married, concordance
with spouses in reports of spouses’ end of life treatment preferences.
Measuring Conversational Interviewing and Its Impact on Data Quality in the
American Time Use Survey
Scott Fricker, U.S. Bureau of Labor Statistics; Morgan Earp, U.S. Bureau of Labor
Statistics; Jennifer Edgar, U.S. Bureau of Labor Statistics; Polly Phipps, U.S. Bureau of
Labor Statistics; Stephanie Denton, U.S. Bureau of Labor Statistics
In the American Time Use Survey (ATUS), interviewers use a set of scripted open-ended
questions to walk respondents chronologically through the prior 24-hour day, collecting activities
and details about each activity reported. The interview is designed to be administered using
conversational interviewing, a method thought to put the respondent at ease and provide
interviewers with the freedom to collect data in the best possible way. Conversational
interviewing is hypothesized to improve respondent understanding of questions and concepts as
interviewer and respondent converse and collaborate on meaning. In the ATUS, conversational
interviewing also is thought to improve recall by allowing interviewers to ask open-ended
questions to assist respondents in reconstructing their day in a way that is meaningful to them
rather than following a set script and sequence. Previous research has explored the use of
conversational interviewing in the ATUS and found that although some conversational
interviewing methods are used, they are not used consistently across all interviewers,
respondents, or even within an interview. The impact of conversational interviewing techniques
on data quality also was found to be inconsistent. In this paper, we use 100 behavior-coded
transcripts of ATUS interviews to further explore the use and scope of conversational
interviewing and the impact on data quality. We identify important components of conversational
interviewing, based on interviewer behaviors and respondent-interviewer interactions, and
develop a scale to measure how conversational the interview is. New measures of data quality,
including the adequacy of respondent answers, number and type of respondent activities,
missing activities, and interview length are explored and multivariate analysis techniques are
utilized to better understand the complex relationship between interviewer and respondent
behaviors, as well as the quality of the data collected.
Predicting the Occurrence of Respondent Retrieval Strategies in Calendar
Interviewing: The Quality of Retrospective Reports
Robert F. Belli, University of Nebraska – Lincoln; L.D. Miller, University of Nebraska
Lincoln; Leen Kiat Soh, University of Nebraska – Lincoln; Tarek Al Baghal, University of
Nebraska – Lincoln
Calendar based survey interviewing methods have been predicted to enhance the quality of
retrospective reports by encouraging the use of thematic and temporal retrieval cues that reside
in autobiographical memory. These cues—measured as observable verbal behaviors—exist as
parallel (using a contemporaneous event from the respondent’s past to cue an event in a
different theme) and sequential (using an event to remember what happened earlier or later
within the same theme) interviewer probes, and respondent parallel and sequential retrieval
strategies. Previous research has shown that retrieval behaviors are associated with better
retrospective reporting data quality when the respondents’ histories are more complex. The
current study focused on discovering patterns of interviewer verbal behaviors that predict the
occurrence of respondent parallel retrieval strategies, and whether these patterns are
associated with data quality. Data are derived from the interviews of 153 respondents of the
Panel Study of Income Dynamics (PSID) who were interviewed about their life course histories.
For every respondent turn of speech, the occurrence or nonoccurrence of a respondent parallel
retrieval strategy was evenly sampled. Verbal behaviors of immediately preceding interviewer
and respondent turns of speech were assessed in terms of their co-occurrence with parallel
retrievals using a decision tree data mining algorithm called C4.5. We discover seven patterns
of preceding behaviors that have the most impact on encouraging respondent parallel retrieval
strategies. We assessed the association between the occurrences per interview of each of
these patterns on response accuracy in reports of employment as determined by comparing
calendar responses with responses collected in prior waves of the PSID. Interviewer sequential
probing, when followed by a respondent parallel, was associated with greater accuracy, but
interviewer parallel probing was not. For some patterns that involved sequential probing, greater
accuracy was only observed with respondents who had complicated employment histories.
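The pattern-discovery step described above can be illustrated as a decision-tree classification of respondent turns on coded features of the preceding turns. The study used C4.5; the sketch below substitutes scikit-learn's DecisionTreeClassifier (CART) as a stand-in, and the feature names are hypothetical.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text
# turns = pd.read_csv("coded_turns.csv")   # one row per respondent turn, with 0/1 behavior codes
features = ["prev_int_parallel_probe", "prev_int_sequential_probe", "prev_resp_sequential"]
# X, y = turns[features], turns["parallel_retrieval"]
# tree = DecisionTreeClassifier(max_depth=4, min_samples_leaf=20).fit(X, y)
# print(export_text(tree, feature_names=features))   # inspect the learned rule patterns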
Linking Interview Context, Interviewer Behavior and Data Quality
Aaron Maitland, Westat; Wendy Hicks, Westat
Interviewers play an important role in gaining the cooperation of survey respondents and
administering questions. Several studies have explored the relationship between interviewer
behavior and different sources of survey error, but little is known about the mechanisms by which interviewers affect error. A study by Olson and Peytchev (2007) adds some insight into interviewer effects. The authors found that as interviewers conduct more interviews, the
length of the interview decreases and the interviewers perceive the respondents as less
interested. While we would anticipate that interviewers improve their skill in navigating and
administering an instrument over repeated administrations, the change in interviewers’
perception of respondents’ interest in the study may not be independent of the faster
administration and may actually be more reflective of the interviewers’ own attitude. We build on
these findings and make use of Computer Audio Recorded Interviewing (CARI) and coding
analysis to further understand the mechanisms by which interviewers' behaviors may affect error.
Using the National Health and Aging Trends Study (NHATS), we link behavior coding analysis
with contact history data and interviewer characteristics to create a context in which we examine
the relationship between interviewer behavior and data quality. In the analysis, we construct a
‘case difficulty’ variable based on the contact history data and compare interviewer behaviors
between the more difficult and less difficult cases. In addition, we account for interviewer
productivity as a variable related to interviewer behaviors. In a preliminary analysis, we found
that interviewers differ in how well they follow the standardized interviewing protocol between
difficult and less difficult cases, depending on their overall productivity. In this paper, we look at
whether there are differences in data quality as measured by item nonresponse, interview
length and the consistency of survey responses when interviewers deviate from protocol.
Hello? Is Better Than Hello: Effects of Greetings on Participation in Survey
Invitations
José R. Benkí, University of Michigan; Jessica Broome, University of Michigan; Frederick
Conrad, University of Michigan; Robert Groves, Georgetown University; Frauke Kreuter,
University of Maryland JPSM & IAB
Potential respondents to telephone survey interviews rapidly decide whether or not to
participate, with most refusals occurring within 30 seconds of answering the phone. Given the
speed of this decision, it is likely that the initial verbal interactions between the interviewer and
the “answerer” play an important role in the answerer’s decision to participate. The present
study focuses on the acoustic properties of “Hello” greetings by interviewers and telephone
answerers at the beginning of survey invitations, and the relationship of these properties to the
outcome of specific telephone survey invitations: agree-to-participate, scheduled-callback, and
refusal. These relationships are explored in a corpus of 1380 audio-recorded contacts, sampled
from five studies conducted by the University of Michigan Survey Research Center. Half of the
contacts contain “hello” greetings suitable for acoustic analysis, including pitch measurement.
Following Schegloff (1998), who documents how high-pitched greetings in telephone
conversations signal enthusiasm, recognition, and friendliness, we hypothesize that contacts
containing high-pitched “hello”s are more likely to lead to agreement or scheduled-callback
instead of refusal. Greetings resulting in refusal contained average pitch rises of 18% above
baseline pitch level for both answerers and interviewers. Contacts resulting in agreement or
scheduled-callback contained greetings with higher pitch rises, 22% for answerers and 26% for
interviewers. The significantly higher interviewer pitch rises in nonrefusals suggest that the
positive attributes conveyed through high-pitched greetings promote participation. A second
analysis considered the interaction between answerer and the interviewer greeting intonation.
Consistent with the previous result, the highest rate of agreement occurs for contacts in which
both actors produce a greater-than-average pitch rise. The lowest rate of agreement occurs for
contacts in which the answerer greeting contains a pitch rise, but the subsequent interviewer
greeting had flat intonation, suggesting that interviewer failure to reciprocate an enthusiastic and
friendly greeting can be particularly harmful to participation.
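The pitch-rise measure reported above reduces to a percent increase of the greeting's peak fundamental frequency over the speaker's baseline pitch. The sketch below is a minimal, hypothetical illustration assuming a precomputed F0 contour from a pitch tracker; it is not the authors' acoustic pipeline.
import numpy as np
def pitch_rise_pct(f0_greeting, baseline_hz):
    # Percent by which the greeting's peak fundamental frequency exceeds the speaker's baseline.
    f0 = f0_greeting[~np.isnan(f0_greeting)]   # drop unvoiced frames
    return 100.0 * (f0.max() - baseline_hz) / baseline_hz
# A greeting peaking at 236 Hz against a 200 Hz baseline gives an 18% rise,
# in the range reported above for contacts that ended in refusal.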
Decision-Making in the 2012 Election
Validating Likely Voter Measures in 2012 Pre-Election Polling
Jocelyn Kiley, Pew Research Center; Scott Keeter, Pew Research Center; Matt Frei, Pew
Research Center; Seth Motel, Pew Research Center; Leah M. Christian, Pew Research
Center; Michael Dimock, Pew Research Center; Michael P. McDonald, George Mason
University; Matthew Berent, Matt Berent Consulting; Jon Krosnick, Stanford University
One of the key challenges in pre-election surveys is determining the likely electorate.
Substantial research has shown that people over-report their intention to vote (Holbrook and
Krosnick 2010, McDonald 2011) so pollsters have developed various methods of identifying
which survey respondents are likely to vote and which are not. However, the accuracy of these
methods has rarely been validated using actual voter records, aside from Paul Perry's path-
breaking work in this area in the 1960s, and a Pew Research Center study in a local mayoral
race in the late 1990s (Perry 1970 and Dimock, et al. 2001). This paper will present preliminary
analysis on the effectiveness of various likely voter measures using Pew Research’s 2012 final
pre-election poll of 3,815 adults and a post-election survey of a sample of registered voters from
the same survey, in which voter records will be used to distinguish between voters and
nonvoters. The analysis will explore how various survey questions designed to identify likely
voters correlate with actual turnout. The paper also will explore the relatively new issue of how
best to handle respondents who say they have “already voted”, who constituted only a very
small proportion in past years. In particular, it will examine the extent to which over-reporting of
voting occurs among those who report having already voted.
The Impact of the Presidential Debates on Undecided and Persuadable Voters
Curtiss Cobb, GfK Knowledge Networks; Charles DiSogra, Abt SRBI; Jordon Peugh, GfK
Knowledge Networks; Sarah Dutton, CBS; Anthony Salvanto, CBS
While politicians and pundits heralded Gov. Mitt Romney’s performance in the first debate of the
2012 presidential campaign as a game-changer, and included the subsequent narrowing of
support between the candidates in public opinion polls as evidence, political scientists were
warning that it was all likely hype. Past research on debates has found little in the way of direct effects on candidate support, finding instead that debates mainly reinforce partisanship (Hillygus & Jackman 2003; Kenski & Jamieson 2006). Moreover, debate “effects” are in part mediated through the post-debate political conversation (Brubaker & Hanson 2009). This two-wave study examines
how the 2012 presidential debates and the subsequent post-debate conversation altered
undecided and persuadable voters’ perceptions of the candidates and their ultimate vote choice.
Using GfK’s probability based Internet panel, KnowledgePanel®, a group of undecided and
persuadable voters were identified prior to each presidential debate and asked to complete a
post-debate questionnaire in the hour immediately following the debate. Respondents were
asked about their impressions of the debate performance of each candidate and for whom they
planned to vote and why. Respondents were then re-interviewed on Election Day, along with
decided voters and undecided voters who did not watch the debates. They were asked again about their impressions of the candidates' debate performances and whom they voted for and
why, along with questions about media consumption, political interest and political knowledge.
Differences between the three groups are being analyzed. The analysis will show: (1) whether
undecided debate watchers' votes differed from those of undecided non-watchers and already-decided voters; (2) whether a re-evaluation of debate performance occurred in the days between the
debate and the election, and if so whether it relates to media consumption; (3) and whether
political interest and political knowledge are moderating variables.
The RAND Continuous 2012 Presidential Election Poll
Tania L. Gutsche, RAND Corporation; Arie Kapteyn, RAND Corporation; Erik Meijer,
RAND Corporation; Bas Weerman, RAND Corporation
The RAND Continuous 2012 Presidential Election Poll (CPEP) was conducted within the
American Life Panel, which is an Internet panel recruited through traditional probability sampling
to ensure representativeness. The CPEP differs from other polls in that it asks the same
respondents repeatedly about their voting preferences. Thus, it leads to more stable outcomes, and changes are due to individuals changing their minds rather than to random sampling fluctuations. The CPEP is also different because it asks respondents to state their preferences
for a candidate and the likelihood that they will vote in probabilistic terms (percent chance).
Moreover, we asked the panel members after the election whether they had voted and who they
had voted for, so we can study the predictive power both within sample and out of sample (the
national results). The CPEP appears to have predicted well. Our final prediction of the
difference in popular vote between Obama and Romney differed by less than 0.7 percentage points
from the final tally. The probabilistic questions, even months before the election, were strongly
related to individuals' actual voting behavior. Our approach allows us to gain insight into the stability of voting preferences and the effect of events on individual preferences; for example, we see
that changes in intention to vote play an important role in predicted vote shares for the
candidates, while various shifts can be related clearly to major events. The American Life Panel
has a wealth of background characteristics which can be related to voting preferences.
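As an illustration of how probabilistic responses might be turned into a predicted vote share, the sketch below weights each respondent's stated percent chance of supporting a candidate by their stated percent chance of voting. This is one plausible estimator for illustration only, not RAND's exact method; the column names are hypothetical.
import pandas as pd
def predicted_share(df, cand_col, weight_col="panel_weight"):
    # Weight each respondent by panel weight times stated percent chance of voting,
    # then average the stated percent chance of supporting the candidate.
    w = df[weight_col] * df["p_vote"] / 100.0
    return (w * df[cand_col] / 100.0).sum() / w.sum()
# share_obama = predicted_share(panel, "p_obama")
# share_romney = predicted_share(panel, "p_romney")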
Survey Research as a Campaign Tool: Turnout Effects of Survey Respondents
David M. Margolis, Greenberg Quinlan Rosner Research
Researchers and political operatives alike are concerned with determining which campaign methods and tactics are best for boosting voter turnout. Whether advertising, door-to-door canvassing, telephone calls, or direct mail, many methods of traditional voter contact have been tested in a growing experimental literature on the effectiveness of voter mobilization efforts. In the case of political campaigns, one of the most common methods of conducting research on voter behavior, the telephone survey, shares its mode with one of the most common methods of promoting voter mobilization. Given the similarity of the mode (in fact, many voter mobilization call scripts
adopt a format of an opinion survey), the effect that responding to a public opinion survey has
on the likelihood that the respondent will turn out to vote should be evaluated. The author will
implement a vote propensity score matching process to evaluate the effect that taking a political
survey has on voter turnout likelihood. This quasi-experimental design will compare similar
individuals in the treatment group (survey respondents) with non-treated individuals (sample
members who did not receive a contact), and assess the voting behavior of both groups. Similar
individuals will be identified using a listed sample where all potential respondents were assigned
a modeled vote propensity score, and the average treatment effect can be analyzed with a
paired difference test.
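The design described above can be sketched as a one-to-one propensity-score match followed by a paired comparison of turnout. The code below is a minimal, hypothetical illustration of that general technique, not the author's implementation; column names are assumptions.
import pandas as pd
from scipy.stats import ttest_rel
from sklearn.neighbors import NearestNeighbors
def matched_turnout_effect(df, score="vote_propensity_score"):
    # Match each survey respondent (treated) to the non-contacted registrant (control)
    # with the closest modeled vote propensity, then compare turnout with a paired test.
    treated = df[df["took_survey"] == 1]
    control = df[df["took_survey"] == 0]
    nn = NearestNeighbors(n_neighbors=1).fit(control[[score]])
    _, idx = nn.kneighbors(treated[[score]])
    matched = control.iloc[idx.ravel()]
    effect = treated["voted"].mean() - matched["voted"].mean()
    return effect, ttest_rel(treated["voted"].to_numpy(), matched["voted"].to_numpy())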
The Influence of Social Desirability in the Rise of Political Independents
Samara Klar, Northwestern University; Yanna Krupnikov, Northwestern University
Over the last several decades, survey researchers have seen a consistent change in political
party identification among the American public. Today, when asked by researchers and
pollsters, a plurality of Americans identify themselves as Independent, as opposed to
committing to one of the two parties. This evident detachment from partisanship brings with it
scholarly concerns for citizens’ engagement with the party system. Using a series of survey
experiments, we demonstrate that these shifts are not, in fact, indicative of a genuine decline of
partisanship, but rather a function of heretofore undetected social desirability pressures. We
show that this tendency to identify as Independent is particularly likely to be triggered by media
coverage focusing on the importance of undecided and Independent voters, as well as coverage
emphasizing bickering between the two parties. Our two-wave panel experiment allows us to
measure individual partisan preferences prior to stimulus exposure, strengthening the case that
media coverage increases the social desirability of identifying as an Independent. Our follow-up
study demonstrates the implications of this finding for both survey research and for political
participation more broadly.
Questionnaire Translation:
Janet Harkness’ Contributions, Legacy, and Beyond
This is one of two sessions to review and honor the contributions of Janet Harkness to the field
of survey translation, adaptation, and questionnaire development in multilingual and
multicultural surveys. All of Harkness’ contributions have advanced our understanding of
language and cultural issues in conducting surveys across language, cultures and regions.
Where is the impact of her work most visible and what significance does this have on
measurement in multicultural, multilingual surveys? How has this led to improvements and what
are the areas where her legacy has the potential to largely improve procedures? Villar and
Schoua-Glusberg’s paper will focus on this overview of Harkness’ work and contributions.
Methods to improve equivalence in measurement tools across languages and cultures were a central focus of Harkness’ work. When is translation not enough to produce equivalent measures for comparative studies, so that adaptation is required? What are the limits of adaptation to still
maintain comparability? Behr addresses these issues and focuses on the need to strive for a
common understanding of adaptation and its applicability. Harkness strived to advance the field
of survey translation by conducting and promoting basic methodological research to uncover the
strengths and weaknesses of different approaches to translation and translation assessment.
The other two papers in this session are examples of such research. Dörer examines the effects of advance translation of selected items in the ESS and how it made it possible to uncover issues in the original items, leading to changes in the English version. This is precisely the flexibility that Harkness hoped questionnaires would have: the ability to incorporate changes needed to reach better cross-national equivalence. In Schoua-Glusberg’s paper, an experiment on the translation of a survey measure into Polish provides evidence on what back-translation and the committee approach each contribute to translation assessment, and on their respective strengths and weaknesses.
Overview of Janet Harkness’ Work and Contributions to the Field: Where Did She
Lead Us To and Where We Are Now
Ana Villar, Research Fellow; Alisú Schoua-Glusberg, Research Support Services
This paper aims to present and evaluate the contributions of Janet Harkness to the field of
multinational, multilingual, and multicultural (3M) surveys. Her contributions have advanced our
understanding of language and cultural issues in conducting 3M surveys and her legacy has the
potential to largely improve procedures used to implement them. We will start by reviewing the
main areas in which the impact of Harkness’ work is visible and the significance this has on
measurement and comparability. As leader of the ESS Translation Task Force and of the
Translation and Questionnaire Design Group of the ISSP, she helped set up translation
procedures, stressing the importance of incorporating cross-cultural input at the questionnaire
design stage to increase the chances of translations resulting in comparable instruments. She
proposed a number of techniques (e.g., advance translation) and translation models (TRAPD)
that are currently used in important international projects as well as large national projects with
federal and international funding. She advocated the use of pretesting methods to assess
translation and challenged existing views on answer scale construction. Some of Harkness’
contributions, however, have not yet had the impact on survey implementation that they may
one day have. Many studies still follow a strategy where the source questionnaire is developed
and finalized without taking into consideration input from the other cultures or languages into
which the questionnaires will be translated. Even worse, in smaller national projects that add a minority
language as a last step before fieldwork, the resulting translation often is not done in time to go
through appropriate assessment procedures, and thus its quality is very often unacceptable. We
have a long road ahead before we reach, as a field, the understanding and the rigorous
methodology that Harkness envisioned. This paper will try to suggest ways to help us get there.
On the Different Uses and Users of the Term Adaptation
Dorothee Behr, GESIS – Leibniz Institute for the Social Sciences
Transferring a questionnaire from one language and culture into another language and culture
calls for translation and/or adaptation of the questionnaire. Whether translation or adaptation is
required or referred to depends on various factors, among which: 1) the goal of the research
(e.g., comparative), 2) the original design of the source questionnaire and, thus, its
transferability to other languages and cultures (e.g., source questionnaire was designed with
only one culture in mind), 3) the (linguistic) unit referred to (e.g., word vs. sentence), 4) the
discipline – including its terminology – to which a researcher belongs (e.g., psychology,
translation studies), or 5) personal views on what adaptation and translation involve. Firmly
embracing Janet Harkness’ work on adaptation (e.g., 2010), this presentation will look into the
different uses and users of the term adaptation, in contrast to the term translation. This study
shall encourage, in the long term, the use of a consistent terminology. A consistent
understanding of what translation and adaptation involve is essential given the widespread use
of cross-national research data, the different analysis techniques that can go with
translation/adaptation, and the impact that different understandings of translation/adaptation
have on the actual “translation” process. In the short or medium term, the aim is to raise a
greater awareness of how the term adaptation, in contrast to translation, is used by different
researchers. Also, a greater debate shall be encouraged on what kind of changes in translation
are possible, or even required, to produce an equivalent questionnaire in comparative research.
References: Harkness, Janet. (2010). VII. Adaptation of Survey Instruments. Guidelines for Best
Practice in Cross-Cultural Surveys. Ann Arbor, MI: Survey Research Center, Institute for Social
Research, University of Michigan.
Enhancing the Translatability of the Source Questionnaire in the European Social
Survey (ESS): Does Advance Translation Help?
Brita Dorer, GESIS – Leibniz Institute for the Social Sciences
To assure comparability of measurement across countries, the quality of questionnaire
translation into the target languages in cross-cultural surveys is of utmost importance. Not only
translation procedures and guidelines, but also the source questionnaire has an impact on the
quality of the resulting translations. Starting from this, Janet Harkness developed the idea of
performing ex-ante translations of a pre-final version of the source questionnaire, to be used as
a ‘problem-spotting tool’ in order to improve the translatability of source questionnaires. The
European Social Survey (ESS) was the first major social sciences survey to apply systematic
‘advance translations’: In its 5th and 6th round (2010 and 2011), advance translations were
carried out in order to get input from people with different cultural and linguistic backgrounds.
The participating teams were asked to comment primarily on translation-related problems, from
linguistic or grammar issues to wording, meaning or intercultural aspects. The advance
translations led to numerous suggestions for modifications in the final English source
questionnaire in both rounds. At least one third of the advance-translated items were modified
as a result, either by amending the wording of the source text to facilitate translation, or by
adding footnotes to clarify words or expressions. This paper will evaluate the methodology
applied in both rounds and describe the changes made to the final source questions following
advance translations. To assess the usefulness of this method empirically, tests using Think-Aloud
Protocols (TAPs) will be conducted in which pre- and post-advance-translation versions of questions
are translated (aloud) into German and French, to evaluate whether translation of the pre-advance
version is more problematic than translation of the post-advance version. Then, respondents in both target
languages will be asked to think aloud while answering these questions, to evaluate potential
translation effects on question processing.
Adapting Translation of the American Community Survey in Chinese and Korean
Mandy Sha, RTI International; Hyunjoo Park, RTI International; Yuling Pan, U.S. Census
Bureau
Chinese and Korean are among the top five non-English languages for which the U.S. Census
Bureau provided language assistance during the 2010 Census. However, questionnaire
translation in these two languages is less studied compared to Spanish translation. This paper
fills this gap by investigating unique challenges of questionnaire translation in these two
languages and by providing comprehensive guidelines for translating questionnaires in Chinese
and Korean. Based on results from cognitive pretesting with monolingual Chinese and Korean
speakers in the United States, this paper highlights several important steps that were taken to
adapt the translation to ensure functional equivalency in the translated questionnaire. This
paper is based on a study that the Census Bureau undertook to translate the American
Community Survey (ACS) in Chinese and Korean, in the form of a self-administered language
assistance guide (LAG). Using findings from the ACS LAG study, we will discuss several issues:
1) adapting unique Chinese and Korean linguistic practices (e.g. inability to adopt common
formatting stimuli in English language questionnaires such as all caps, because the Chinese
and Korean writing systems are not alphabet based); 2) adopting linguistic rules (e.g. use of
Hancha-rooted words and phonetic expressions in Korean); 3) adding pragmatic contextual
considerations (e.g. cultural expectation when asking the marital status question); and 4)
choosing appropriate translated words to reflect the immigrant experience (e.g. questions
concerning migration). We will also discuss some translation difficulties that simply cannot be
“fixed” within the parameters of the translation and must be addressed at the source language
questionnaire level, which echoes Harkness (2003). This paper is of interest to questionnaire
designers who survey non-English speakers in the United States. And our recommendations
have methodological implications for translating questionnaires for other Asian languages and
cultures, as well as languages that use non-Roman letters.
Translation Versus Adaptation: Translating U.S. Educational Level Survey
Questions into Spanish
Patricia Goerman, U.S. Census Bureau; Leticia Fernández; Rosanna Quiroz, RTI
International
Various studies have shown the difficulty of translating concepts related to country-specific
programs for use in surveys. Questions about educational attainment are an example of a
concept that is very difficult to translate for use with respondents with different national origins.
This is particularly the case for Spanish-speaking respondents in the United States, who come
from a variety of different countries where educational systems are different not only from the
U.S. system but from each other as well. This paper presents results from the cognitive testing
of the Spanish translation of educational level questions in the U.S. Census Bureau’s American
Community Survey (ACS). Two iterative rounds of cognitive testing were conducted on a series
of educational level questions with 46 Spanish-speaking respondents from 11 different
countries. We found that Spanish speakers interpreted many of the educational level categories
differently from what was intended. For example, Mexican-origin respondents interpreted
“escuela secundaria,” the original translation used for “high school,” to correspond to nine years
of schooling, while in the U.S. completing high school corresponds to 12 years of schooling.
Similarly, while the translation for “bachelor’s degree” or “bachiller universitario,” was interpreted
appropriately by Puerto Rican Spanish speakers, this was not the case among respondents
from Argentina, Mexico, Colombia and Nicaragua. In these Latin American countries the term
“bachillerato” is used to describe either junior high school or high school. Both of these
translations could result in upward biases in reports of immigrant educational levels since both
misinterpretations involve respondents reporting lower levels of education as higher ones. We
discuss various approaches taken to deal with the comprehension differences and the extent to
which these were successful. The paper concludes with a discussion of implications for
translation and testing of educational levels and other country specific programs, and provides
recommendations for future research.
The Origins and Development of Survey Research
The Origins and Development of Cross-National Survey Research: The Diffusion
of an Innovation
Tom W. Smith, NORC at the University of Chicago
This paper examines the rise and diffusion of survey research from the 1930s to the 1960s. It
covers 1) the emergence of cross-national, survey research including the role of early
adopters—Gallup, the National Opinion Research Center (NORC), other survey-research
organizations, and Public Opinion Quarterly; 2) the initial diffusion of survey research by Gallup,
International Research Associates, Inc., and others, 3) foundational survey-research meetings
and associations, 4) the impact of World War II, 5) the role of the United Nations and other
international organizations including its collaboration with the World Association for Public
Opinion Research, 6) the first comparative surveys, 7) the contributions of international
exchanges and immigrations, 8) changing developments in the 1950s and 1960s, including the
role of American influence and center/periphery diffusion, and 9) impediments to development.
A History of Survey Research and Its Professional Associations
Michael Mokrzycki, Mike Mokrzycki Survey Research Services
The development of survey research is viewed primarily through the development of AAPOR,
starting with the Central City Conference in 1946.
Early Studies of Political Behavior in the United States
Michael W. Traugott, University of Michigan
Political polling has been central to the development of survey research and promotion of its
adoption in the United States and in other countries. This presentation focuses on the distinctive
roles of pollsters, academics, and news organizations and their involvement with political polling
as a critical element in the development of survey research in the United States. This is a story
of institutional conflicts and research design differences, and the ways they affected the
advancement of knowledge about polling methodology as well as our understanding of political
behavior. It also explains a series of paradigmatic shifts in models explaining how and why
people vote. Across 75 years of development, relations between academic and commercial
pollsters have waxed and waned. In contemporary polling, academics continue to provide most
of the methodological development, quickly adopted by commercial pollsters.
A History of Survey Research at NORC
Norman Bradburn, Department of Psychology, NORC at the University of Chicago;
James A. Davis, Department of Psychology, NORC at the University of Chicago
The National Opinion Research Center (now just NORC) was founded by Harry Field at the
University of Denver in 1941. Breaking from the commercial orientation of industry founders
Gallup, Roper, and Crossley, the National Opinion Research Center aimed to do survey
research in the public domain and to serve the social science community. In 1946, Field
organized the first survey research conference at Central City, which led to the founding of
AAPOR.
Comparing Early Survey Research Methodologies in Mexico in the 1940s
Alejandro Moreno, Instituto Tecnológico Autónomo de México
In this paper I compare the development of public opinion research in Mexico during World War
II in different areas: media polls, academic research, and policy-oriented surveys. The latter two
include the works by Laszlo Radvanyi at the National University's Scientific Institute of Public
Opinion, as well as various works sponsored and conducted by the U.S. State Department.
Early polls developed by media outlets both in Mexico City and in Monterrey illustrate a
continuous measurement of public opinion, with ad-hoc methodologies that were still far from
proper probability sampling and questionnaire design, but that gave voice to politically excluded
segments of the population, such as women, who were not granted the right to vote until 1953. For
the academic research efforts, I analyze some of the contents published in the International
Journal of Attitude and Opinion Research, edited by professor Radvanyi in Mexico City and
published for the first time in 1947. The word “encuesta,” used interchangeably in Spanish for polls
and surveys, appeared in the 1940s in academic books that relied on interviews with experts
rather than a broader public, but still referred to a process of interviewing “some” sample. In
addition to the methodologies, this paper pays special attention to the social groups that were
used not only in the stratification of samples but also in the analyses of poll results, providing
insight into the ways in which Mexican society was conceived during those years.
Maximizing Response Through Optimal Contact Strategies
Number of Mail and Phone Contact Attempts to Complete Physician Surveys
Julie C. Linville, SRA International; Eric Jamoom, National Center for Health Statistics;
Paul C. Beatty, National Center for Health Statistics; Nicholas A. Holt, SRA International
The National Center for Health Statistics has conducted the Electronic Health Records
Supplement (EHRS) of the National Ambulatory Medical Care Survey (NAMCS) annually by
mail since 2008. The EHRS asks physicians about their use of electronic health records
(EHRs), with 10,302 physicians surveyed each year in 2010, 2011 and 2012. A sample of 5,232
respondents to the 2011 EHRS were impaneled for the subsequent Physician Workflow Study
(PWS), a three-year longitudinal study designed to obtain additional information from physicians
regarding the impacts and barriers to adopting EHR systems. The PWS was administered using
a modification of methods proposed by Dillman (2007), with up to three mailings of the
questionnaire, a reminder postcard sent after the first mailing, and telephone follow-up of non-
respondents. Two versions of the PWS were developed, one for EHR adopters and another for
EHR non-adopters. Although the sampled physician was the intended respondent, proxy
responses from staff members in the practice were accepted when necessary. This paper will
examine the relationship between contact history and response for three years of the EHRS
(2010-2012) and the first two years of the longitudinal PWS (2011-2012). We will explore the
response yield and efficiency from each wave of contact. We will also analyze differences
across surveys, across adopter and non-adopter strata, across physician specialties and across
other practice characteristics. In addition, we will investigate the effects of contact history,
physician specialty, and adopter status on the prevalence of proxy response. Finally, we
will consider the implications of these findings toward the most efficient approaches for
maximizing responses from the sampled physicians.
Issues in Contacting and Engaging SNAP Recipients in a Longitudinal Survey
Crystal MacAllum, Westat; Suzanne McNutt, Westat; Adam Chu, Westat; Susan Bartlett,
Abt Associates; Kelly Kinnison, USDA Food and Nutrition Service
The Supplemental Nutrition Assistance Program (SNAP) provides nutritional foods to low-
income families. The Food and Nutrition Service (FNS) in the U.S. Department of Agriculture
administers the program. The FNS Healthy Incentives Pilot (HIP) Evaluation assessed the
impact of giving a financial incentive for the purchase of fruits and vegetables on recipients’ diet.
A random sample of 2,538 SNAP recipients in one U.S. county received the incentive while a
comparable sample of 2,538 in the same county did not. The study attempted to contact and
engage sampled SNAP recipients in three rounds of interviews over a 16-month period, spaced
approximately every six months. SNAP recipients are a mobile population that is hard to reach
and engage in research; therefore, obtaining and retaining a sample large enough to achieve
adequate power to detect a change in diet at each round of interviews was a challenge for the
study. This paper presents the strategies employed to contact and engage this population,
including telephone data collection with in-person field follow-up for non-respondents and those
with missing telephone numbers; progressively greater incentives over rounds of data collection;
additional incentives if respondents used their cell phones; and the use of iPads to manage the
interplay between cases in the field and those in the telephone center. The study was
successful in increasing the proportion of telephone respondents over waves: In Round 1 58%
of interviews were field completes and 42% telephone completes; by Round 3, only 26% were
field completes while 74% were telephone completes.
Improving Response and Operational Efficiency Under the Constraints of Time-
Sensitive Program Evaluation
Andy Weiss, Abt SRBI; Rhoda Cohen, Mathematica Policy Research; Faith Lewis, Abt
SRBI
Surveys to evaluate government benefit programs are constrained by the administrative
structure of those programs. Issues like time-sensitive administration and access to program
participants limit survey design choices. This can be an especially complicated problem for
locally administered programs. Flexibility in adapting survey design to local conditions holds
promise for improving response rates and other quality measures. We conducted an evaluation
of the USDA’s Summer Electronic Benefit Transfer for Children (SEBTC) program, which
provides a supplemental nutrition benefit to households with school-aged children during the
summer. In 2011, the sample included 5 sites and interviews with 5,000 households before the
school year ended and again in the summer. In the 2nd year, the evaluation entailed collecting
data from 14 sites and interviews with 27,000 households during each wave. All interviews were
completed over the telephone. Respondents were contacted through mailed distribution of a toll-free
call-in number, outbound phone calls, and in-person interviewers who went to respondent homes
and initiated a call to our call center. The random assignment study assessed the impact of
SEBTC on children’s food security and other nutrition-related measures. By implementing a
wide range of methods, from providing technical assistance to help school districts consent
households to using a case-level customized calling algorithm, we improved the response rate from
65% in the summer of 2011 to 80% in the summer of 2012.
Setting Expectations for Managing Interviewer Performance
Barbara C. O’Hare, U.S. Census Bureau; Tamara S. Adams, U.S. Census Bureau; Chandra
Erdman, U.S. Census Bureau; James B. Lawrence, U.S. Census Bureau
Differential survey response across subpopulations and geographic areas is well documented in
the survey literature. The current challenges of survey administration require setting realistic
expectations of survey response rates, particularly in assessing survey progress and deciding
where to direct data collection effort. This paper discusses the development and implementation
of standardized field interviewer performance standards based on statistical variation in
demographic and socio-economic characteristics of neighborhood (block group) areas. The
result of this effort is a set of standards driven by the neighborhood characteristics of the
interviewer’s cases, rather than the overall response rate of the county where the interviewer
primarily works. We defined the new field CAPI interviewer response rate standards through
statistical analysis of census and American Community Survey block group data to identify the
best predictors of census and survey response, to cluster neighborhoods, and to determine the
optimal number of performance strata. (Note: the analysis is described in detail in an abstract
submitted by Erdman, Adams, and Lawrence). After establishing new strata that reflect
variations in response rate, we addressed the practical issues of implementing new standards
on which interviewers are evaluated, including: 1) The process of establishing a distribution of
expected response rates within each stratum used to define five performance levels, adjusting
for small caseloads. 2) The collaborative effort between the field operations staff and the
statistical analysts to set the boundaries of the performance levels. 3) The challenges of
presenting the statistical analyses to field operations staff and addressing practical concerns
about face validity and about making clear how the standards were set, in light of interviewer and
personnel policy considerations. The experiences presented here can be of value to other survey
organizations setting survey performance expectations, in that they highlight the practical
operational challenges of implementing statistically derived standards.
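As an illustration only, and not the Census Bureau’s actual procedure, the sketch below shows how block groups might be clustered into performance strata from a handful of ACS-style predictors of response propensity. The column names, the toy data, and the choice of KMeans with three strata are assumptions introduced for this example.

```python
# Hypothetical sketch: group block groups into performance strata using
# ACS-style predictors of response propensity. Column names, data values,
# and the number of strata are illustrative assumptions.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

block_groups = pd.DataFrame({
    "pct_renter":      [0.20, 0.65, 0.45, 0.80, 0.10, 0.55],
    "pct_age_65_plus": [0.25, 0.08, 0.15, 0.05, 0.30, 0.12],
    "median_income_k": [48.0, 31.0, 55.0, 27.0, 72.0, 39.0],
})

# Standardize so no single predictor dominates the distance metric.
X = StandardScaler().fit_transform(block_groups)

# Assign each block group to one of three performance strata; interviewer
# response-rate expectations would then be set within each stratum.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
block_groups["stratum"] = kmeans.fit_predict(X)
print(block_groups)
```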
First Contact Strategies for Web Surveys: Is a Phone Call or a Letter the More
Effective Introduction?
Jill Connelly, NORC at the University of Chicago; Micah Sjoblom, NORC at the University
of Chicago; A. Rupa Datta, NORC at the University of Chicago; Peter Hepburn, NORC at
the University of Chicago
The objective of the National Survey of Early Care and Education (NSECE) is to document the
nation’s current use and availability of early care and education, and to deepen our
understanding of the extent to which families’ needs and preferences coordinate well with
providers’ offerings and constraints. The NSECE included a survey of home-based child care
providers who were licensed or otherwise registered with state agencies. The survey included
Web data collection, with phone or in-person follow up as needed. Individuals who provide care
to children in a home-based setting tend to be older or lower-income or in other demographic
subgroups that have lower Internet usage rates. In order to encourage participation by Web, a
$35 gift card was offered for completing the interview online. We had phone numbers, but no
mailing or email addresses for sampled individuals. We designed an experiment with 1,300
providers to test whether it would be more efficient to 1) send a letter or email as a first contact
based on locating efforts that did not involve personal contact with the respondent, or 2) first make
a phone call to gain cooperation and introduce the study, and then request mailing or email
information to send the Web survey request. Our evaluation includes comparisons of effort
required, success rates in reaching respondents through initial contact attempts, cooperation
with the initial request, and final cooperation rates.
Incentives and Survey Response
Survey Incentive Fees, Data Quality, Nonresponse, and Survey Administration
Jesse Bricker, Federal Reserve Board of Governors
This paper uses both the 2007-2009 Survey of Consumer Finances (SCF) panel and the 2007
and 2010 SCF cross sections to investigate whether monetary incentives help data quality,
conditional on responding to the survey, and help reduce time in the field. The 2007 SCF had a
base incentive of $20, though many needed $50 before responding; the base incentive in 2009
was $50. The first component of this paper compares the response rates and data quality of
two groups: those that received a $50 incentive in both waves and those that received a base $20
incentive in the first wave and the $50 base incentive in the second. Data quality is measured by
the use of verifying documents and the precision of responses. The second component of this
paper uses two levels of variation to investigate the impact of incentives on time that field staff
spends in the field and the total number of times that field staff contacted potential respondents.
First, we use variation in base incentive across the 2007 and 2010 SCF cross sections, as the
base 2010 incentive was also increased to $50. Second, there is also variation across sampling
regions, as cost of living varies across rural and urban areas. Conditioning on detailed local
information and focusing on sampling areas that were used in both the 2007 and the 2010
surveys will allow us to analyze the costs and benefits of a larger survey incentive on time spent
in the field collecting data.
Timing of Nonparticipation in an Online Panel: The Effect of Incentive Strategies
Salima Douhou, CentERdata, Tilburg University; Annette Scherpenzeel, CentERdata,
Tilburg University
Nonresponse in (online) panel surveys is problematic since it may lead to a bias. An important
measure to secure respondent cooperation is the use of monetary incentives. An experiment
was carried out in the LISS panel (Longitudinal Internet Studies for the Social Sciences, an
online panel based on a true probability sample of households) in 2007 to determine the optimal
recruitment strategy for a new online household panel (see Scherpenzeel and Toepoel, 2012).
The monetary incentives varied during the recruitment. The incentives were either promised or
prepaid and the amount varied (10, 20 or 50 euros). More than 500 respondents were randomly
selected in the different incentive conditions. The prepaid incentives were quite effective in
increasing the recruitment rates for the panel. However, a question often posed is how these
incentives affect the long term participation in the panel. Are the respondents who were
recruited with the help of the high incentives not dropping out faster than respondents who
participate with intrinsic motivation? The purpose of this paper is to find out which incentive
strategy is efficient for long term participation of respondents, five years after the recruitment.
Efficiency here implies low recruitment costs combined with a high response rate after entrance into
the panel. This paper takes a different approach to modeling the time-to-event of nonparticipation:
survival analysis, where the event is nonparticipation. This method has two important
advantages: 1) it incorporates the timing of the event and 2) it allows for censoring. This research
will provide new evidence on the timing of nonparticipation and the influence of different
incentive strategies on this timing. The paper will present the willingness of respondents to
participate for a long term in the panel for different incentive strategies.
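A minimal sketch of the survival-analysis setup described above, using Kaplan-Meier estimation grouped by incentive condition. The data values, column names, and the use of the lifelines package are assumptions for illustration, not the LISS panel’s actual data or code.

```python
# Illustrative sketch: time-to-nonparticipation by recruitment incentive,
# with censoring for panel members who are still active. All values are
# made up; the real analysis would use the panel's own records.
import pandas as pd
from lifelines import KaplanMeierFitter

panel = pd.DataFrame({
    "months_in_panel": [60, 14, 35, 60, 8, 52, 60, 22, 41, 60],
    "dropped_out":     [0, 1, 1, 0, 1, 1, 0, 1, 1, 0],  # 0 = still active (censored)
    "incentive_eur":   [10, 50, 10, 20, 50, 20, 10, 50, 20, 50],
})

kmf = KaplanMeierFitter()
for amount, grp in panel.groupby("incentive_eur"):
    # Fit the survivor function for this incentive condition.
    kmf.fit(grp["months_in_panel"], event_observed=grp["dropped_out"],
            label=f"{amount} euro incentive")
    print(amount, "euro: median time to nonparticipation =",
          kmf.median_survival_time_)
```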
Nonresponse and Nonresponse Bias in a Probability-Based Internet Panel: The
Effect of (Un)conditional Cash Incentives
Annelies Blom, University of Mannheim; Ulrich Krieger, University of Mannheim
The German Internet Panel (GIP) is a new large-scale online panel based on a probability
sample of individuals living within households in Germany. In 2012 households were
approached offline, with a short face-to-face interview. Subsequently, all household members
were invited to complete the bi-monthly GIP questionnaires. To minimize non-coverage bias,
households without access to the Internet were provided with the necessary hardware and/or a
broadband Internet connection. Recruitment into the GIP consisted of various stages: the face-
to-face household interview, mailed invitations to the online survey, reminder letters, a phone
follow-up, and final mailed reminders. During the face-to-face phase we conducted an experiment
with €5 unconditional vs. €10 conditional household incentives. In addition, an experiment with
€5 unconditional personal incentives was conducted during the first reminder.
We examine the question of whether a carefully recruited, probability-based online panel can be
representative of the general population and is thus suitable for social and economic research.
The models presented analyze the processes leading to participation and associated biases in
the sample. The various stages of recruitment into the GIP are assessed, as well as the effects
of the two incentives experiments.
The Effect of Prepaid Incentives on Responses to Sensitive Questions in a Mail
Survey
Rebecca Medway, American Institutes for Research
Researchers have expressed concern that offering monetary incentives in surveys may have
unintended effects on responses to survey questions. The current literature exploring the effect
of incentives on response distributions finds limited support for this fear. However, when
researchers have investigated the impact of incentives on survey responses, they typically have
analyzed all of the survey items as one group. It is possible that the incentive effect varies
depending on item characteristics, and that the decision to analyze all of the items at once
masks significant differences for subgroups of items. In particular, responses to sensitive items
appear to be subject to situational factors and survey design features; as a result, these items
may be more susceptible than non-sensitive ones to incentive effects. Furthermore, research
repeatedly shows that respondents misreport for sensitive items, so it would be useful to know
whether incentives affect how honestly respondents answer such items. This paper explores the
effect that offering a prepaid cash incentive had on self-presentation concerns and responses to
sensitive questions in a mail survey of registered voters. As compared to a control group that
did not receive an incentive, respondents who received $5 reported a significantly greater
number of highly sensitive, undesirable attitudes and behaviors. The incentive had no effect on
responses to less sensitive items, suggesting that item sensitivity may play a role in the
magnitude of the incentive effect. For three voting items where validation data was available,
the incentive resulted in a general pattern of reduced nonresponse bias and increased
measurement bias; however these effects generally were not significant. The effect of the
incentive did not vary significantly by respondent characteristics.
Effective e-incentive for Online Study: Comparing Branded e-Gift Card and Virtual
Cash Card
Teresa (Ye) Jin, The Nielsen Company; Shu Duan, The Nielsen Company; Jennie Lai, The
Nielsen Company; Michael W. Link, The Nielsen Company
Given the continued growth of young adults with Internet access, the incentive method should
complement the survey mode, especially for online studies with repeated measures. Past empirical
research has examined potential online incentive methods such as vouchers, lotteries or donations,
eGift cards, and virtual incentives, and studied their effect on response rates. In an effort to gain
greater cooperation among hard-to-reach cohorts (in particular, young adults), Nielsen
will administer an online study using a Web-based application to collect media consumption
behavior. The research objectives are two-fold: 1) to test the effectiveness of incentive methods
(choice vs. no-choice) and 2) e-incentive options (branded e-gift card incentive vs. virtual cash
card incentive). The address-based sample will be randomly assigned to three conditions:
branded eGift card (i.e., Amazon.com Gift Card), virtual cash Visa card, or a choice between
the two options listed above. The qualified respondent in the household will
be asked to participate in the one-week online study. This research paper will evaluate
cooperation rate by key demographic characteristics and compliance during the data collection
period for each incentive condition. The research findings will advance the body of knowledge
on the most effective incentive method and option to gain cooperation of the hard-to-reach
cohort in the digital age of online usage.
Friday, May 17
1:45 p.m. – 3:15 p.m.
AAPOR Concurrent Session E
Developments in the Design and Implementation of Web Surveys
The Effect of Compressing Questionnaire Length on Data Quality
Jessica LeBlanc, Center for Survey Research at University of Massachusetts Boston;
Carol Cosenza, Center for Survey Research at University of Massachusetts Boston
Consumer Assessment of Healthcare Providers and Systems (CAHPS®) instrument guidelines
recommend formatting ordinal response categories vertically. However, in an effort to create
questionnaires with fewer pages, some users have formatted the response options horizontally.
Frequently, when there are too many answer categories to fit on one horizontal line, CAHPS
users format responses horizontally over multiple rows. This formatting may lead respondents to
search for an appropriate response in the scale. As part of a survey of adult patients from a
university-based health system (n=2100), a methodological experiment was implemented, with
respondents randomized to receive one of three versions of the questionnaire. Version A
maintained closer compliance with CAHPS guidelines, containing mostly vertical scales, with
horizontal scales used only when they fit onto a single line. Versions B and C both contained
response scales with multiple columns and rows. In version B, ordinal response options were
listed horizontally in two rows (read from left to right, top-bottom) and in version C, response
options were listed vertically in two columns (read from top to bottom, left-right). Analysis of this
data will focus on differences among versions A, B, and C in survey response rates, mean
scores for single items, and item non-response. Particular attention will be paid to differences
between items that ask respondents to report on frequency of events (e.g. number of doctor
visits) or demographics (e.g., age) and items that ask respondents to assess experiences using
adjectival scales (e.g., never, sometimes, usually, always), which may be more difficult for
respondents to choose when the presentation of the ordinal responses is disrupted. This test
used the CAHPS® Clinician & Group Patient-Centered Medical Home adult questionnaire and
utilized a standard 3-contact mailing protocol. It was funded by the Agency for Healthcare
Research and Quality. Data collection was completed in 2011.
Evaluating Interactive Feedback in Computer-Assisted Self-Interviewing (CASI)
Margaret L. Hudson, University of Michigan; Andrew L. Hupp, University of Michigan;
Chan Zhang, University of Michigan; Heather M. Schroeder, University of Michigan
A long-standing concern with self-interviewing methods is that respondents may lack the
motivation to spend effort in completing the survey, which can lead to satisficing and
compromised data quality. Recently researchers have started to explore the use of interactive
feedback in computer-assisted self-interviewing (CASI) whereby respondents are prompted if
satisficing behaviors are detected (e.g., respondents receive messages saying they are going
too fast when their response time is quicker than a certain threshold). In particular, a small
number of studies, mostly using online panels, have shown that such interactive feedback can
effectively reduce targeted undesirable behaviors in Web surveys without a substantial increase
in break-offs. While these findings are promising, it is not clear if the same success would be
observed with other survey populations who may not be as motivated to complete surveys as
panel respondents. Even more importantly, little is known as to whether this type of interactive
feedback in self-administered surveys could affect perceived privacy and thus, introduce social
desirability bias in answers to sensitive questions. We will report findings from a CASI survey of
mental health risk and resilience among Soldiers new to the U.S. Army. Response speed
prompts were implemented in response to concerns about satisficing behavior. The speed
prompts were introduced approximately one quarter of the way through the study. Since the
monthly samples are independent and representative, a natural pre/post comparison is
possible. Survey data will be compared before and after the implementation to evaluate whether
these prompts can effectively influence response time and improve response quality (based on
indicators such as item nonresponse, straightlining, and acquiescence). We will also assess if
the use of prompts could backfire – i.e., producing more break-offs and fewer reports of socially
undesirable answers, given the survey is voluntary and contains many sensitive questions (e.g.,
suicidal ideation).
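A minimal sketch, under assumed thresholds and wording, of the kind of response-speed prompt logic described above. The item thresholds, message text, and function name are hypothetical and not the instrument’s actual implementation.

```python
# Hypothetical speed-prompt check: flag answers that come back faster than a
# per-item minimum plausible reading time. Thresholds and wording are assumed.
from typing import Optional

MIN_PLAUSIBLE_MS = {"q_sleep": 2500, "q_mood": 4000}  # illustrative thresholds

def speed_prompt(item_id: str, response_time_ms: int) -> Optional[str]:
    """Return prompt text if the response arrived suspiciously quickly."""
    threshold = MIN_PLAUSIBLE_MS.get(item_id)
    if threshold is not None and response_time_ms < threshold:
        return ("You answered very quickly. Please make sure you have read "
                "the question carefully before continuing.")
    return None  # response time looks plausible; no prompt shown

print(speed_prompt("q_mood", 1200))   # triggers the prompt
print(speed_prompt("q_sleep", 6000))  # no prompt
```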
Are You Seeing What I am Seeing? Exploring Response Option Visual Design
Effects With Eye-Tracking
Amanda Libman, University of Nebraska – Lincoln; Jolene D. Smyth, University of
Nebraska – Lincoln; Kristen Olson, University of Nebraska – Lincoln
Since the late 1990s theory drawn from the vision sciences and Gestalt psychology has guided
the visual design of questionnaires. A considerable amount of research has been conducted
that shows that altering questionnaire visual design can change response distributions and data
quality (Dillman, Smyth, and Christian 2009; Jenkins and Dillman 1995; Tourangeau, Couper, &
Conrad 2004). However, this research is limited in what it can tell us about how different visual
designs influence responses. In other words, the evidence for how visual design matters is
largely circumstantial. Eye-tracking technology gives us the opportunity to overcome this
challenge. A handful of studies have used eye tracking to better understand how respondents
see and process a questionnaire (Galesic et al. 2008; Lenzner, Kaczmirek, and Galesic 2011). In
this paper, we will explore how visual design in response options assists respondents in
processing survey questions. Specifically, we will analyze eye-tracking data to examine the
effects of Web survey response option experiments that include symbolic language, grid
response options and the use of single and double columns. Preliminary evidence from the lab
shows that the addition of smiley faces to a Likert scale causes respondents to slow down when
processing the given response options. By observing how respondents actually view the
different versions of the questionnaire and visual aids, this study will contribute to our
understanding of how and why visual design influences responses and will shed light on best
practices for questionnaire design.
Classifying Mouse Movements to Predict Respondent Difficulty
Rachel Horwitz, U.S. Census Bureau
A goal of the survey interview is to collect reliable and valid data. Achieving this goal is often
difficult because respondents may not understand what is being asked of them. In traditional
interviewer-administered survey modes, interviewers can pick up on signs of confusion and
difficulty answering a question from the respondent’s speech patterns, expressions, or response
times. In self-administered surveys, however, identifying confused respondents has previously
not been possible. The introduction of Web surveys provides an interactive environment with a
vast amount of data that researchers can collect in real time. Using these data, it may be
possible to determine when respondents are having difficulty answering a question, much like in
an interviewer-administered survey. Using Web browsing and education research as a basis,
this paper identifies 11 unique movements that respondents make with the mouse cursor while
answering survey questions. Through an exploratory analysis, we hypothesized which of these
movements are related to difficulty answering survey questions. Then, using scenarios to
manipulate question difficulty and asking participants to rate the difficulty of each question, we
were able to test our hypotheses to determine which movements are related to difficulty and
which are general movements people make when interacting with a computer. Finally, this
paper proposes a model that can be used to predict, in real time, when a respondent is having
difficulty answering a survey question. We find that not only are certain mouse movements
highly predictive of difficulty, but they are more predictive than response times, which have been
used to predict difficulty in the past. This information can be used to provide real-time help to
confused participants or it can act as a diagnostic tool to identify confusing questions.
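A small sketch, under simulated data, of the modeling idea: compare how well mouse-movement features versus response time alone predict reported difficulty. The feature names, the simulated relationship, and the use of logistic regression are assumptions for illustration, not the paper’s actual model.

```python
# Illustrative comparison: predict a binary "difficult" rating from simulated
# mouse-movement features vs. from response time alone. All data are simulated.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 300
hovers         = rng.poisson(2, n)          # hypothetical feature: option hovers
direction_chgs = rng.poisson(3, n)          # hypothetical feature: cursor reversals
response_time  = rng.normal(12.0, 4.0, n)   # seconds

# Simulate difficulty as loosely driven by the mouse features.
difficult = (hovers + direction_chgs + rng.normal(0, 2, n) > 6).astype(int)

X_mouse = np.column_stack([hovers, direction_chgs])
X_time  = response_time.reshape(-1, 1)

for name, X in [("mouse features", X_mouse), ("response time only", X_time)]:
    acc = cross_val_score(LogisticRegression(), X, difficult, cv=5).mean()
    print(f"{name}: mean cross-validated accuracy = {acc:.2f}")
```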
Dynamic Visual Design for List-Style Open-Ended Questions in Web Surveys
Marek Fuchs, Darmstadt University of Technology
Several studies have demonstrated that respondents react to the size and design of the answer
field offered with open-ended questions in Web surveys. Larger answer boxes seem to pose an
additional burden and yield fewer answers and higher rates of item nonresponse as compared
to smaller answer boxes. At the same time larger answer boxes work as a stimulus that
increases the length of the response provided by those respondents who actually answer the
question. Similar findings have been demonstrated for list-style open-ended questions where
respondents are supposed to type short responses (e.g., names of countries, cities, or brand
names). In this paper we evaluate a method optimizing the extent of the answer to list-style
open-ended questions without increasing item nonresponse. We use a dynamic screen design
where respondents were initially exposed to one fixed answer box. If respondents entered a
response into an initially visible answer box, a second answer box appeared. If they again
entered a response a third box appeared (and so on). In a randomized field-experimental study
embedded in a large scale survey (n=6,100) we tested several question versions combining
various numbers of fixed and dynamic answer boxes in a between-subjects design. Results
indicated that the optimal design consisted of three initially visible (fixed) answer boxes, with
further answer boxes provided dynamically if respondents wanted to give additional answers.
Findings are discussed in light of the impact of the dynamic visual design on the question
answer process.
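A minimal sketch of the dynamic-box rule described above: a fixed number of boxes is shown initially, and one further box appears each time the respondent fills the last visible one. The function name, the cap on boxes, and the default of three fixed boxes are illustrative assumptions.

```python
# Hypothetical helper deciding how many answer boxes to display, given the
# entries typed so far. Parameters are assumptions for illustration.
def visible_boxes(entries, fixed=3, maximum=12):
    """Return the number of answer boxes to render.

    Start with `fixed` boxes; once all visible boxes are filled, reveal one
    more, up to `maximum`.
    """
    filled = sum(1 for e in entries if e.strip())
    return min(max(fixed, filled + 1), maximum)

print(visible_boxes([]))                           # 3 fixed boxes to start
print(visible_boxes(["France", "Italy", "Peru"]))  # all filled -> 4 boxes
print(visible_boxes(["France", "", ""]))           # still 3 boxes
```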
Question Order and Context Effects
Question Order Effects on Estimates of the Size and Characteristics of Religious
Groups
Gregory A. Smith, Pew Research Center; Besheer Mohamed, Pew Research Center;
Jessica Hamar Martinez, Pew Research Center
Religious identification is a key variable for understanding opinion on many topics (including
politics and elections) and a multifaceted, complex concept. It is an indicator of the religious
groups with which one identifies and one’s religious beliefs. But it can also tap ethnic or cultural
attachments even in the absence of any ongoing religious commitment. Depending on their
goals, researchers may be interested in one or another of these aspects of religious
identification. Some will only be interested in those who currently identify religiously with a
group. Others will be interested in the broader group of those who identify in some way with a
group (e.g., by virtue of their upbringing or family background) even if they do not currently think
of themselves as Catholic or Jewish or Mormon (for example) in religious terms. We report the
results of experiments in which we varied the wording and order of questions about religious
affiliation. We show that the wording and order of religious affiliation questions can have a
substantial impact on estimates of the size of religious groups. And we assess the degree to
which varying approaches to question wording and order can produce different estimates of the
religious attributes and demographic characteristics of religious groups. We discuss the ways in
which the specifics of how one asks about religion can shape the results one obtains. The
findings of our study are consequential not just for scholars of American religion, but also for the
many social and political researchers who include religious identification in their surveys or as a
variable in their studies.
Context Effects in Candidate Favorability Ratings: Lessons From the 2012
Elections
Eran Ben-Porath, Social Science Research Solutions; Damla Ergun, Langer Research
Associates; Gregory Holyk, Langer Research Associates; Gary Langer, Langer Research
Associates; Jon Cohen, Capital Insight/Washington Post Media
This study builds on context effects theory to test the impact of question order during an
ongoing favorability measurement of presidential candidates. Throughout the primaries and the
campaign leading up to the 2012 presidential election, respondents were asked how favorable
they felt toward various candidates. Our findings indicate that when respondents were asked
about Mitt Romney after Barack Obama, Romney was consistently rated more unfavorably than
when his name came before Obama’s. When respondents were asked about Romney prior to
Obama, the share of “Don’t Know” responses to Romney was significantly higher as was his
relative popularity among those with an expressed opinion. Order-effects for Obama’s
favorability ratings were not apparent. A similar pattern occurred in other comparisons as well.
For example, when asked about Rick Santorum prior to Mitt Romney, there were significantly
more “Don’t Know” responses for Santorum than when respondents were asked first about
Romney. Both favorable and unfavorable responses to Santorum were higher when Romney
was the first candidate mentioned. These findings illustrate how asking about the more familiar
candidates first provides context for the lesser-known candidates. Consistent with previous
research on context effects, order effects were strongest among respondents with less
education, lending further support to the idea that the better-known candidate provides a context
in which assessments are made. The results suggest that question order variations could result
in measurement of two different types of attitudes about lesser-known candidates: one is the
“true” attitude in so far as it can be measured, while the other is the attitude relative to the
better-known candidate. This, in turn, raises questions as to whether one order or another better
approximates the actual construct of favorability, and when rotating question order is
appropriate, given specific research objectives.
Interaction Between Question Context Effects and Linguistic Backgrounds
Sunghee Lee, University of Michigan; Norbert Schwarz, University of Michigan
Despite a lack of theory, question context effects are among the most frequently examined
measurement errors. Based on social cognition and communication theories and the notion of
high vs. low context culture, we hypothesized 1) interactions among textual, cultural, and
external question contexts. We chose the self-rated health (SRH) question, a popular survey
item believed to be immune to context effects, and further hypothesized 2) larger context effects
for Spanish speakers (and Hispanics) than English speakers (and non-Hispanics). We
conducted two sets of experiments in a multilingual survey. A subset of respondents was
randomly assigned to different textual contexts of SRH by varying its order in a questionnaire.
The results supported the hypotheses. English-speaking respondents’ reports on SRH were
consistent across all textual contexts, but simple changes in the textual contexts produced
dramatically different reports by Spanish-speaking respondents. Specifically, Spanish speakers
reported substantially better health when SRH was asked after specific health condition
questions than before any health-related questions. Because language is a proxy for culture,
this demonstrated an interaction between textual and cultural contexts. Furthermore, among
Spanish speakers, the textual context effects were larger for females and older respondents and
differed by comorbidity status, illustrating an interaction among three types of contexts.
Implications are twofold. First, context effect patterns observed in one culture do not necessarily
apply to another culture. Second, even within the same culture, context effects vary by
respondents’ characteristics. Hence, context effects studied with a homogeneous group should
not be assumed to hold in cross-cultural studies.
Some Informal Experiments on the Effects of Questionnaire Design Changes on
Item Nonresponse
Christine Kudisch, Experian Marketing Services; Josephine Leonard, Experian Marketing
Services; Max Kilger, Experian Marketing Services; Charlie Palit, University of
Wisconsin-Madison
For decades, Experian Simmons has conducted a national survey of U.S. consumers, reporting
data annually on a sample of approximately 25,000 adults age 18+. The mail survey instrument
is particularly large in scope and widely varied in topical content as well as broadly diverse in
the types of question formats used to measure those topics. This has provided us with
opportunities to empirically investigate the effect of different ways of asking questions and their
impact on item nonresponse. As a result we have collected an interesting set of examples
illustrating how changes in question wording and position can affect item non-response. This
presentation will present and discuss some common features of specific question formats that
affect item non-response. We also present the results of informal experiments aimed at
reducing item non-response bias through modifications to the question format as well as
question positioning changes. In addition we examine aspects of differential non-response to
questions by ethnicity in an exploration of some specific question formats that differ in terms of
non-response by Hispanic, non-Hispanic status and by language preference among Hispanics.
Are Question Context Effects Partially A Function of Forced Choice Questions?
David Moore, University of New Hampshire
Crucial to a sustainable future for public policy polls is whether they provide meaningful
assessments of public opinion. Polls on even the same subject, however, often produce
contradictory results, which are explained by attributing the differences to question wording or
question context effects. This paper reports on two different representative surveys that show 1)
a particular response order effect and, separately, 2) a particular question wording effect, were
not present among people with “intense opinions,” though the two effects were found for the
overall samples. These results suggest that if polls were measuring “meaningful” (i.e., intensely
held) opinions, some (or many) of the contradictory results produced by polls would disappear.
Background: Many polls ask public policy questions that pressure respondents to produce an
opinion, even if they don’t have one. The result: Typically more than 9 in 10 Americans appear
to have a meaningful opinion about virtually all issues. Separately, polls on the same subject
often produce startlingly different results. In May 2011, five polling organizations all asked about
bringing home troops from Afghanistan in the wake of Osama bin Laden’s death. Two polls
showed strong majorities in favor, two showed about an evenly divided public, and one found
strong opposition. A frequent explanation for such contradictory findings is that small differences
in question wording and question order could produce major differences in results. But maybe
that’s at least partly because we include people who really don’t have opinions, but are
pressured to respond anyway, and who are therefore particularly susceptible to small
differences in question wording and question order. The experiments reported in this paper
suggest that notion has merit. Both the response order effect and question wording effect were
minimized (or eliminated) when only people with intense opinions were analyzed.
Multi-cultural and Multi-Lingual Survey Research
A Comparison of Hispanic Households That Were Identified by Hispanic Surname
to Those That Were Not
Dan Estersohn, Arbitron Inc.; Kelly Dixon, Arbitron Inc.; Mike Kwanisai, Arbitron Inc.; Al
Tupek, Arbitron Inc.
In partnership with our sampling vendor (SSI) Arbitron has been investigating potential uses for
sample that has been appended with demographic data. The most useful attribute discovered
so far has been Hispanic household identification. SSI’s identification is based upon matching a
householder’s last name to a Hispanic surname list. SSI’s list is based on the Census Bureau’s
Hispanic surnames list which has been in use (with modifications) for over 50 years. The
surname matching is not a perfect identification method. Some households are incorrectly
tagged as Hispanic while other Hispanic households are not identified. We propose to
investigate whether the correctly tagged Hispanic households are demographically or
geographically different from the Hispanic households that were not identified as Hispanic.
Differences between the two groups might suggest differential contact strategies such as
materials, incentives, or interviewer language. Arbitron’s respondent procedures are used to
identify the actual Hispanic households. Among the Arbitron-collected variables for the
comparison will be age, household size, number of persons, presence of children, the presence
of non-Hispanic persons in each group of households, and language spoken most often at
home. A test for spatial clustering of each group will also be performed. If one or both of the
groups are spatially clustered then an analysis of neighborhood Census variables can also be
undertaken. Among the neighborhood-level (census tracts) Census/ACS variables that can be
used are the Hispanic percent of the population, native vs. foreign-born, “linguistic Isolation,”
educational attainment and median income.
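A small sketch, under made-up data, of the basic comparison the paper proposes: crosstabulating the vendor-appended surname flag against the survey-identified Hispanic status and computing the share of Hispanic households the surname list misses. Column names and values are illustrative assumptions.

```python
# Illustrative comparison of the appended surname flag with survey-identified
# Hispanic status. Data values and column names are made up for this sketch.
import pandas as pd

households = pd.DataFrame({
    "surname_flag":    [1, 1, 0, 0, 1, 0, 1, 0, 0, 1],  # vendor-appended flag
    "survey_hispanic": [1, 0, 1, 0, 1, 0, 1, 0, 1, 1],  # from respondent procedures
})

# Crosstab of the two indicators.
print(pd.crosstab(households["surname_flag"], households["survey_hispanic"],
                  rownames=["surname match"], colnames=["survey Hispanic"]))

# Share of survey-identified Hispanic households missed by the surname list.
hispanic = households[households["survey_hispanic"] == 1]
print("missed by surname list:", round(1 - hispanic["surname_flag"].mean(), 2))
```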
Survey Error and Survey Costs of Interviews Using Real-Time Interpreters
Stephen Immerwahr, New York City Department of Health and Mental Hygiene; Tara
Merry, Abt SRBI
Real-time interpreter services can be used to include linguistically isolated respondents in
telephone surveys, but the inherently unstandardized nature of these interviews raises serious
concerns about measurement error. Despite calls for evaluation, published analyses of survey
error and costs associated with real-time interpretation are rare. (Hu et al. 2010 and Link et al.
2009 are two recent articles of note.) Using data from a computer-assisted telephone
interviewing survey conducted in New York City between September 2008 and February 2009,
we compare survey error and cost for 82 interviews conducted in multiple languages by
interviewers aided by a commercial telephone interpreter service with 7472 standardized
interviews conducted in English or by bilingual interviewers in Spanish, Russian, and two
Chinese dialects. We report mean item nonresponse, average relative variance across
continuous variables, nonresponse to specific health conditions and behaviors, and within-
survey break-offs (starting but not completing the interview). We compare direct costs
(translating the survey into Spanish, Russian, Chinese scripts) and operation costs (calling and
interviewing, line costs, and live interpreter service fees). Overall differences in item
nonresponse were small but were substantial for some individual survey measures. For
example, when asked for the number of opposite-sex sexual partners in the past 12 months,
18% of real-time interpreter interviews resulted in ‘don't know’ or ‘refused’ responses compared
to 6% overall. Within-survey break-offs were also higher (25% vs. 11% overall). The cost-per-
complete for real-time interpreter interviews was $470: more than nine times the cost of
interviews in English or Spanish, and roughly four times that of Russian or Chinese language
interviews. The challenge facing survey researchers using real-time interpreters is to balance the reduction in bias gained by including this population against the potential measurement error and greater cost.
Resolving Multilingual Issues in Survey Development: Experiences From a
Translation Workshop
Stephanie Beauvais, Westat; Jocelyn Newsome, Westat; Martha Stapleton, Westat; Kerry
Levin, Westat; Salma Shariff-Marco, Cancer Prevention Institute of California; Nancy
Breen, National Cancer Institute; Gordon Willis, National Cancer Institute
As surveys are increasingly administered in multiple languages, researchers must consider both
language and culture during translation (Harkness et al. 2010). In an innovative approach to
survey translation, Westat and NCI recently held a workshop that tackled multi-lingual issues
across multiple languages simultaneously. Its purpose was to address previously identified
problems with the Spanish- and Asian-language (Mandarin, Cantonese, Vietnamese, and
Korean) versions of the California Health Interview Survey (CHIS) Discrimination Module (DM)
(Shariff-Marco et al. 2009). Earlier behavior coding efforts identified a dissonance between the
translations and the original intent of the English-language items (Levin et al., 2010). These
translation “mismatches” were the focus of the workshop. Since both cultural and linguistic
issues were to be addressed in the workshop, the project team sought “culture brokers,” rather
than translators, for each language. Culture broker is an anthropological term referring to
someone who mediates and facilitates understanding between cultures (Jezewski & Sotnik,
2001). Each language team was comprised of individuals who had experience with survey
research, were knowledgeable about the culture and language of the target group, and were
able to think critically and collaborate with others. The translation workshop was designed to
focus on conceptual equivalence rather than exact word-for-word translation. The findings from
the workshop identified four primary areas where translations were problematic. First, it was
often difficult to capture nuances of an English idiom in translation. Second, there were
instances when simply no lexical counterpart existed in other languages. Third, response scales
suffered in translation. Finally, certain survey conventions posed unexpected problems. In this
paper, we discuss our experiences developing the translation workshop and finding the culture
brokers. We also discuss workshop dynamics and the team’s resolutions for the problematic
translations. We conclude by proposing areas for future research in multicultural and
multilingual survey development.
Are Latin Americans as Courteous as People Say? Survey Experiment Evidence
on “Courtesy Bias” From Five Countries
David Crow, Centro de Investigacion y Docencia Economicas (CIDE); Gerardo
Maldonado, Centro de Investigacion y Docencia Economicas (CIDE)
Courtesy bias, the tendency of respondents to give polite or agreeable answers rather than candid ones, may be especially pronounced in Latin America for two reasons. First, given the relative recency of survey
research in Latin America, potential respondents may be more willing to participate in surveys
and more civil when they do. Second, given the formality and hospitality that characterizes
interpersonal communication in Latin America, respondents may be reluctant to give
unvarnished answers, preferring to put matters in the best light possible. Does courtesy bias
exist in Latin America? We attempt to answer this question by means of survey experiments
conducted in five Latin American countries (Brazil, Colombia, Ecuador, Mexico, and Uruguay). If
Latin Americans are susceptible to courtesy bias, we should observe it in all countries—though,
given intraregional cultural heterogeneity, to varying degrees. The survey experiment consists of
splitting national samples and providing each half-sample with a different response set for each
of four batteries of questions (on institutional trust, government performance evaluations,
attitudes on immigration policy, and support for liberal intraregional trade and migration policy).
The first response set was a four-point scale common in cross-national research (‘a lot,’
‘somewhat,’ ‘a little’ and ‘not at all’) and the second, a seven-point response scale. The seven-
point scale’s neutral midpoint and more graduated response options, in theory, give
respondents greater opportunity to nuance their ratings. We expect that courtesy bias will result
in higher means on the four-point scale than on the seven-point scale (after rescaling both from
0 to 1). Statistically indistinguishable means imply the absence of courtesy bias.
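As a rough illustration of the comparison described above (not the authors' code or data), the following sketch rescales simulated four-point and seven-point responses to the 0-1 range and compares half-sample means with a two-sample t-test.

```python
# Minimal sketch with invented data: rescale split-ballot items to 0-1 and
# test whether the 4-point half-sample mean exceeds the 7-point one.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
four_pt = rng.integers(1, 5, size=500)    # half-sample A: responses coded 1..4
seven_pt = rng.integers(1, 8, size=500)   # half-sample B: responses coded 1..7

def rescale(x, k):
    """Map a 1..k response scale onto the 0..1 range."""
    return (x - 1) / (k - 1)

a, b = rescale(four_pt, 4), rescale(seven_pt, 7)
t, p = stats.ttest_ind(a, b, equal_var=False)
print(f"mean(4-pt)={a.mean():.3f}  mean(7-pt)={b.mean():.3f}  t={t:.2f}  p={p:.3f}")
```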
Respondent Difficulty in Cognitive Interviews: From Findings of Chinese and
Korean Cognitive Interviews
Hyunjoo Park, RTI International; Mandy Sha, RTI International; Murrey Olmsted, RTI
International
Cognitive interviewing has been widely used as a tool for pretesting and improving
questionnaires. As noted by Willis (2005), respondent recruitment determines the feasibility of a
cognitive interview study and the selected sample should attempt to cover a cross-section of the
population that is being studied. However, the practical side of conducting cognitive interviews
has received little empirical attention—in particular, not much is known about how to optimize
respondent selection. Literature has shown that level of education and age affect one’s
cognitive ability, including recall and verbal fluency; these are all skills required for being a
“good” cognitive interview respondent. Following this logic, the information produced from each
cognitive interview and its utility may vary. Using interview data and associated paradata (e.g.,
interview length) from 258 non-English language (Chinese and Korean) cognitive interviews
from the American Community Survey (ACS), this paper identifies indicators of respondent
difficulty and examines how those indicators are related to the outcome of cognitive interviews.
First, we identify variables likely to represent respondent difficulty and establish a profile of
participants who experience higher response difficulty with interview probes and the cognitive
interview setting, based on the interviewers’ ratings. We hypothesize that, compared to those with lower response difficulty, who better understand the cognitive interview task, respondents with higher response difficulty will identify a larger number of issues in cognitive interviews, as well as different types of issues (i.e., issues indicating valid questionnaire problems vs. issues prone to user error). Finally, we provide practical guidelines about whom to include as
research participants in non-English cognitive interviews. Our recommendations may also be
applicable to English cognitive interviews.
Improving Response Rates in Establishment Surveys:
Results From Controlled Experiments
Evaluating the Effectiveness of Two Strategies to Improve Telephone Survey
Response Rates of Employers
Jeremy Pickreign, NORC at the University of Chicago; Heidi Whitmore, NORC at the
University of Chicago
This paper is an update to work initially presented at the ICES IV conference in Montreal,
Quebec, on June 13, 2012. The overall response rate for the California Employer Health
Benefits Survey has been hovering between 35 percent and 40 percent since 2004. The
response rate varies considerably by certain characteristics, however. For example, the
response rate for non-panel firms in 2010 was 24 percent and for firms with 3-49 workers was
27 percent. In contrast, the response rate for panel firms was 61 percent. This study examines
two strategies for improving the response rate in surveys of employers by targeting those with
the lowest response rate: the smallest non-panel firms. The two strategies included: 1) mailing a
personalized advance letter, and 2) offering financial incentives. We pre-called 1,024 non-panel
firms with 3-49 workers for the 2011 survey. We sent a personalized advance letter to 513 firms
successfully contacted. Simultaneously, we randomly assigned these 1,024 firms into three
incentive groups: firms sent a $20 incentive with the initial mailing; firms promised $20 upon
completion of the survey; and a control group receiving no incentive. Firms sent a personalized
advance letter had a significantly higher response rate than those sent a generic advance letter
(31.0 percent vs. 18.3 percent, p<0.001). Firms that were sent a financial incentive with the initial mailing (22.0 percent vs. 28.1 percent, p=0.209) or that were promised $20 upon completion of the survey (30.0 percent vs. 28.1 percent, p=0.707) did not have significantly different response rates from firms receiving no incentive. This lack of significance is supported via logistic
regression analysis (new work conducted following the ICES IV conference). Sending a
personalized advance letter has a significant impact on improving the overall response rate, while offering incentives does not.
The Effect of Non-Monetary Incentives in a Longitudinal Physician Survey
Paul Beatty, National Center for Health Statistics; Eric Jamoom, National Center for
Health Statistics
Physicians are often reluctant survey respondents because they are busy and receive many
survey requests. The use of incentives appears to be an attractive option for boosting physician
response rates, but in some past studies, relatively large incentives (e.g., $50) have failed to
make a difference—possibly because physicians see such incentives as inadequate for “buying”
their time. Token incentives may be more effective because they invoke norms of social
exchange. In at least one previous study, pens proved to be effective incentives in a general
population survey. In this study, we explored whether good-quality pens improved response
rates to a mail survey of physicians. We conducted an experiment using pen incentives in the
second wave of a three-year longitudinal mail survey, the Physician Workflow Study. This
survey, conducted by mail with telephone follow-up for those who do not initially respond,
explores physician attitudes and experiences regarding the use of electronic health records
(EHRs). The sample was stratified into “adopters” and “non-adopters” of EHRs, then sub-
stratified based on their response to the first wave of the survey (early mail respondents, late
mail respondents, telephone respondents, and non-respondents). Half of each stratum received
a pen incentive with the initial survey mailing. Overall, the response rate for those who received
pens was 4% higher than for those who did not receive one. Most of the effect was realized in early mail responses. While the boost was statistically significant, its main practical benefit is that it reduced the need for expensive telephone follow-up. Additional analyses will explore whether the effect of
the pen varied based upon responses to the prior survey wave and by EHR adoption
experience (as physicians who adopted EHRs might have been more engaged in the survey
topic), and whether the pen affected response quality and rates of item-missing data.
Evaluating the Effect of a Non-Monetary Incentive in a Nationally Representative
Mixed-Mode Establishment Survey
Manisha Sengupta, National Center for Health Statistics; Lauren Harris-Kojetin, National
Center for Health Statistics; Melissa Hobbs, RTI International; Angela Greene, RTI
International
In 2012, the National Center for Health Statistics (NCHS) launched its new strategy for obtaining
nationally representative statistical information about the supply and use of the major types of
long-term care providers in the United States—the National Study of Long-Term Care Providers
(NSLTCP). NSLTCP represents a substantial redesign, including replacing in-person data
collection with less expensive mail, Web and telephone modes. When using in-person data
collection over the past couple of decades to survey a variety of long-term care providers
(assisted living communities, nursing homes, home health and hospice agencies), NCHS
experienced decreasing response rates, from highs in the 90-percent range to lows in the 70-percent range. Because of
concerns about decreasing response rate trends and achieving adequate response rates when
transitioning from in-person data collection to modes that have traditionally produced relatively
lower response rates, NCHS embedded experiments into its 2012 national data collection effort.
This presentation focuses on a randomized experiment to test the effect of a non-financial
incentive. The base protocol included mail and Web choice options with computer-assisted
telephone interviewing (CATI) follow-up for non-respondents. The contacts included an advance
letter, first questionnaire mailing, thank you/reminder letter, second and third questionnaire
mailings, and CATI. For this experiment, treatment cases were offered a tailored report showing
their responses compared to all responses, if they participated. We hypothesize that, compared to the control group, the treatment group will have a higher response rate both prior to CATI and at study end, as well as lower nonresponse bias, for both provider types. Results and implications
for the protocol for the next wave will be discussed.
Examining the Effects of Interventions to Obtain Participation via Less Expensive
Modes: Results from Experiments in a Nationally Representative Mixed-Mode
Establishment Survey
Lauren Harris-Kojetin, National Center for Health Statistics; Manisha Sengupta, National
Center for Health Statistics; Melissa Hobbs, RTI International; Angela Greene, RTI
International
Decreasing response rates and increasing data collection costs are enduring survey challenges.
This presentation reports results from two randomized experiments embedded within two
nationally representative establishment surveys, one of 5,000 adult day services centers and
the other of 11,700 assisted living communities, both part of the 2012 National Study of Long-
Term Care Providers (NSLTCP) sponsored by the National Center for Health Statistics.
NSLTCP includes substantially redesigned surveys that changed from in-person to less
expensive data collection modes. The main rationale for both experiments is to examine
whether survey respondents can be encouraged to participate using less expensive modes. The
base protocol for each survey included advance letter, first questionnaire mailing, thank
you/reminder letter, second and third questionnaire mailings, then computer-assisted telephone
interviewing (CATI) for non-respondents; both mail and Web options were provided in all
questionnaire mailings. In the “drive to the Web” experiment, treatment cases were provided
only the Web option until the third questionnaire mailing, when they were also given the mail
option. Among cases in the “explicit forewarning of non-response follow-up by telephone”
experimental group, the thank you/reminder letter stated that if they did not respond via Web or
mail by a specific date they would be called to complete the questionnaire by telephone. The
premise is that the respondents may prefer to complete the survey on their own schedule, which
they can do more readily by mail or Web than by telephone. We hypothesize for both
experiments that compared to the control group, the treatment group will have a higher
response rate prior to CATI and at the end of the field period. For the Web experiment, we
expect a higher response rate by Web compared to the control. Nonresponse bias and item-
missing rates will be examined. Results of both experiments and implications will be discussed.
Cell Phone Samples:
Effort, Outcomes and Costs
Home Is Where the Cooperation Is: The Association Between Interview Location
and Cooperation Among Cell-Phone Users
Christopher D. Ward, NORC at the University of Chicago; Becky Reimer, NORC at the
University of Chicago; Meena Khare, National Center for Health Statistics; Carla Black,
National Center for Immunization and Respiratory Diseases
Interviewing respondents on cell-phones can be particularly challenging to survey researchers.
While the proportions of cell-only and cell-mainly households are rising in the United States,
cell-phone samples often have lower response rates than landline samples and must therefore
sacrifice cost, timeliness, or both. In this paper, we examine the relationship between interview
location status (whether the respondent is using a landline, a cell phone at home, or a cell phone away from home) and the likelihood of responding. We do so by addressing two questions: first, does cooperation vary by interview location? Second, do respondent characteristics interact with interview location in predicting cooperation? We examine these questions using data from the
National Immunization Survey, a national, dual-frame RDD survey sponsored by the Centers for
Disease Control and Prevention. We use a logistic regression model to investigate factors that
predict differences in cooperation and likelihood of break-off among cell-at-home, cell-away, and
landline respondents. Preliminary results suggest that time of interview is a strong predictor of
likelihood to complete the interview and likelihood of agreeing to release child’s healthcare
records; evening cell-at-home respondents are much more likely to cooperate than daytime cell-
away respondents. Likewise, mothers are more likely to complete the interview and agree to the
release of the child’s healthcare records than are fathers. Many of these predictors interact
significantly with cell-location status. This research provides insight into the behavior of cell-
phone respondents and the conditions under which they may be most likely to respond. Given
the differences in cooperation among cell-at-home, cell-away, and landline respondents, we will
also discuss implications for data quality and limitations of the analysis.
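The sketch below illustrates the general form of such a cooperation model; the variable names and synthetic data are hypothetical stand-ins, not the National Immunization Survey analysis file.

```python
# Hypothetical variable names and synthetic data: cooperation modeled as a
# function of interview location, time of day, and respondent sex, with a
# location-by-time interaction.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({
    "location": rng.choice(["landline", "cell_home", "cell_away"], n),
    "time_of_day": rng.choice(["daytime", "evening"], n),
    "resp_sex": rng.choice(["female", "male"], n),
})
# Synthetic outcome: evening cell-at-home calls cooperate more often.
p = 0.3 + 0.2 * ((df.location == "cell_home") & (df.time_of_day == "evening"))
df["completed"] = rng.binomial(1, p)

model = smf.logit("completed ~ C(location) * C(time_of_day) + C(resp_sex)", data=df)
print(model.fit(disp=0).summary())
```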
The Cell Effect in Inbound Calling Behavior and Methods for Maximizing
Outcomes
Jenny Kelly, NORC at the University of Chicago; Becky Reimer, NORC at the University
of Chicago; Trevor Tompson, NORC at the University of Chicago; Jennifer Benz, NORC at
the University of Chicago
Recent NORC surveys have shown a marked increase in inbound calls that correlates with
an increasing proportion of cell phone sample in RDD studies. The behavior of cell phone
respondents is likely to differ from that of landline respondents for several reasons. First, cell
phones are unlikely to be listed numbers. Cell users are therefore more likely to expect phone
calls from personal or business contacts, and a missed call may be viewed as a missed social or business opportunity. Second, cell phones have advanced functionality over many landline phones, making it easier to respond to calls, often in a single keystroke. While these factors
increase the likelihood of cell phone users to place an inbound call, their expectations of what
will occur during the resulting connection may differ from those of landline users. Understanding how we
can best operate within their expectations is critical to obtaining high response rates with cell
sample. Using data collected from AP-NORC Center RDD surveys on inbound calls, we
analyzed inbound calling patterns to determine the impact of cell sample and methods to
maximize good outcomes from inbound calls. Our results confirmed that cell phone respondents
were significantly more likely than landline respondents to place inbound calls. Furthermore, we
were able to increase positive outcomes when calls were answered immediately by a live
interviewer. We also found that inbound cell callers seem mainly interested in simply finding out
who called them. As such, we reduced the proportion of hang-ups among respondents waiting to be connected by eliminating the automated greeting, which had allowed respondents to learn the call was about a survey and hang up before reaching an interviewer. The results suggest that a
blended inbound/outbound system can maximize positive outcomes for inbound calls while
achieving staffing efficiencies.
Cell Phone Costs Revisited: Understanding Cost and Productivity Ratios in Dual-
Frame Telephone Surveys
Thomas M. Guterbock, Center for Survey Research, University of Virginia; Andy
Peytchev, RTI International; Deborah L. Rexrode, Center for Survey Research, University
of Virginia
An earlier review of a convenience sample of studies found that the per-interview cost of random-digit-dialed (RDD) cell-phone interviewing is on average higher than the cost of landline RDD interviewing.
However, the ratio of cell-phone to landline interviewing costs varies widely across studies and
organizations, and may have changed over time. There is reason to believe that the cost ratios
for dual-frame phone surveys reported in the 2009 AAPOR Cell-phone Task Force report have
become more favorable since those data were assembled. This is especially the case since
sampling companies have been improving their frames, data-collection methods have
increasingly been tailored to cell-phone samples, and sampling companies have started to
provide cell-phone samples with appended information on activity status and ZIP code location
of some numbers. The cost ratios also likely vary substantially across data collection designs,
such as geographic targeting and screening criteria, and may even vary across sample vendors.
We are currently gathering detailed cost and productivity information on a widely inclusive
sample of dual-frame telephone studies conducted recently by survey research organizations.
We expect relative hourly productivity to depend on whether or not dual-phone users are
screened out, the type of cell-phone sample used, the specificity of sample geography,
variations in working number density on the cell-phone and landline sides, amount and form of
any incentives, variations in interview length, the specificity of screening criteria used in the
study, differences in incidence rates, and the efficiency of different dialing technologies. Building
on the prior work of the 2009 Task Force, our analysis updates estimates and develops
modeling strategies that will allow practitioners to predict more closely the cost of cell-phone
calling in future studies by: (1) updating the cost ratio information, (2) expanding the number of
surveys and organizations providing input, and (3) identifying the cost ratio drivers.
The Unusual Suspect: Call Protocol and Bias in the 2012 NHTSA Distracted
Driving Cell Phone Sample
Paul Schroeder, Abt SRBI; Mikleyn Meyers, Abt SRBI; Brian Meekins, U.S. Bureau of
Labor Statistics; Kristie Johnson, National Highway Transportation and Safety
Administration
To date, the majority of research examining samples of numbers from cell phone and mixed-use
exchanges has been limited to response rates, cost estimates, and the reduction in overall bias
when combined with landline samples. In the current study we thoroughly examine the call
history information in the cell phone sample of a national distracted driving study conducted in
2012. The survey employed a partial overlapping dual frame sample design of households with
landline telephones as well as households that relied on cell phones, and collected data from
interviews with drivers age 16 and older. The cell phone sample contained 2,143 completed
interviews with respondents who were classified as residing in cell-only or cell-mostly
households. We review residential penetration; the number of call attempts; the incidence of
callbacks, refusals and break-offs; the length of the interview; the time of day of the attempt; and
the overall pattern of calling. This allows us to more directly address effort, nonresponse, and
bias within the cell phone sample, examining whether increased effort on cell numbers is worth
the reduction in bias that may be obtained. We also employ a new technique which allows us to
control for multiple factors and isolate the effect each call attempt has on bias. In addition, our
analysis provides recommendations about the calling protocol for cell phones that may lead to
increased efficiency when dialing cell phone samples.
A Comparison of Bloomberg Consumer Comfort Index Data in Landline-Only vs.
Mixed-Frame Telephone Samples
Julie Phelan, Langer Research Associates; Gary Langer, Langer Research Associates
The changed nature of telephone use in the United States has raised a quandary for survey
research projects that focus on long-term trends in public attitudes. Orthodoxy holds that
methodology should be kept constant to preserve these time-trends. Yet changes in access to
the sampled population argue for methodological adjustments to preserve coverage. These
considerations are acute for ongoing measurements of consumer sentiment, the three longest-
standing of which are the monthly University of Michigan/Reuters Index of Consumer Sentiment,
The Conference Board’s monthly Consumer Confidence Index and the weekly Bloomberg
Consumer Comfort Index. The Michigan survey appears to continue as a landline-only sample.
The Conference Board in 2011 switched from a mail-in to an Internet-based approach, with
admitted impacts on the utility of its time-trend data. Our paper reports on a switch of the third
survey, the Bloomberg CCI, from a landline-only to a landline-plus-cell-phone sample in 2012.
The change to the Bloomberg CCI was made in recognition of the fact that the proportion of Americans living in cell-phone-only (CPO) households has reached more than a third of the population. Most survey research firms, finding this level of noncoverage intolerable, have switched to dual-frame designs combining landline and cell-phone samples. Given the
fundamental value of the Bloomberg CCI’s long-term trend, we first conducted an extensive test
of the change. From January to March 2012 we supplemented the usual weekly landline CCI
(total n = 2750) with a weekly cell-phone sample (n = 1882). We examined the landline and
dual-frame estimates for the weekly index overall, among demographic groups and on individual
questions; and assessed the quality of the two samples by comparing their unweighted
demographic compositions and design effects from weighting. Apparent differences were tested
using overlapping sample t-tests.
Public Opinion on Current Political
and Social Issues
Public Support of the Military: Influence of Personal Experience and Perceived
Media Coverage on Attitudes Toward the U.S. Army, 2010-2012
Julie L. Andsager, The Everett Group; H. A. White, The Everett Group; Robert P. Daves,
The Everett Group; Stephen E. Everett, The Everett Group
Public trust and support of the military are crucial as engines of both funding and policy for U.S.
military endeavors. This study examines the public’s view of the U.S. Army, “through thick and
thin,” over 11 waves of survey data tracking attitudes quarterly from 2010 through the third
quarter of 2012. Data reported comprise responses from 1,950 randomly sampled U.S. adults
participating in RDD-based telephone surveys. Nineteen evaluative items regarding Army
performance and traits produced two factors (principal-components, varimax rotation), Army
professionalism (alpha = .92) and Army responsibility (alpha = .88). Army professionalism
comprised items related to training and technological capabilities. Army responsibility included
items regarding the care of soldiers, “Wounded Warriors,” veterans and Army families, etc.
Hierarchical regression analyses indicated that education and age were negatively related to the
attitude that the Army acts responsibly toward its soldiers, their families, and its responsibilities
to the nation. Personal experience (including family members currently serving, personal
service, and parents’ military service) was unrelated to perceptions of Army responsibility.
Favorable news coverage was significantly, positively predictive of responsibility, but advertising
was not. (Adjusted R2 for full model = .25.) For Army professionalism, age, parents’ service,
and confidence in the Army were positively related to perceptions of professionalism. Favorable
evaluations of news coverage and advertising were also positively related to perceptions of
professionalism. (Adjusted R2 = .35.) An examination of evaluations over time for the two
indexes indicates some fluctuation, explored in more detail in this paper. This study suggests
that, despite polls indicating a decline in news credibility over the last 40 years, news coverage
of the Army positively predicts performance evaluations in two aspects. Analysis of comments
on news coverage from respondents is included, indicating potential directions for Pentagon
Public Affairs professionals attempting to frame news coverage of the Army.
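For readers unfamiliar with the reliability statistic reported above, the following sketch computes Cronbach's alpha for a synthetic battery of items; it does not use the study's Army attitude items.

```python
# Synthetic data; the Army attitude items themselves are not reproduced here.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of numeric ratings."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(2)
latent = rng.normal(size=(300, 1))                 # one common factor
items = latent + 0.6 * rng.normal(size=(300, 8))   # eight correlated items
print(f"alpha = {cronbach_alpha(items):.2f}")
```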
PAPOR Student Paper Award Winner
Too Many Immigrants? Examining Alternative Forms of Immigrant Population
Innumeracy
Daniel Herda, University of California - Davis
The tendency to over-estimate immigrant population sizes has garnered considerable scholarly
attention for its potential link to anti-immigrant policy support. However, this existing innumeracy
research has neglected other forms of ignorance, namely under-estimation and non-response.
Using the 2002 European Social Survey, the current study examines the full scope of
innumeracy for the first time. Results indicate that under-estimation and non-response occur
commonly across 21 countries and that over-estimation is far from ubiquitous. Non-responders
in particular are found to represent a distinct innumeracy form associated with low cognitive
availability and high negative affect. Multilevel models indicate that under-estimation associates
with greater opposition to anti-immigrant policy, while over-estimation and non-response
associate with greater support. These associations are largely explained by affective factors. However, significant under- and over-estimation coefficients remain net of controls, suggesting
that innumeracy may be more important than initially thought. Overall, the results highlight the
multifaceted character of innumeracy.
Missed Opportunities in HIV Testing: Health Care Providers Ignore
Recommendations and Ignore Seniors
Micheline Blum, Baruch College School of Public Affairs, City University of New York;
Douglas Muzzio, Baruch College School of Public Affairs, City University of New York
'Missed Opportunities' reports on a survey conducted by Baruch College Survey Research for
the NYC Department of Health and Mental Hygiene. The study surveyed 2473 adult New
Yorkers from June to August, 2011 on a variety of HIV-related and sexual behavior questions. A
2012 AAPOR paper reported some preliminary findings from this study. “Missed Opportunities”
reports on analysis, data and policy implications NOT presented in 2012. Background: In 2006,
the CDC recommended routine HIV screening for individuals aged 13-64 in healthcare settings.
A 2010 New York State (NYS) law mandated the offer of an HIV test to all patients aged 13-64
receiving hospital and primary care services. New Yorkers aged 65+ accounted for 2% of new
HIV diagnoses in 2010. However, 47% of them were diagnosed late in the course of infection,
more than double the rate of the general NYC population (22%). Findings: 1. Health care
providers are ignoring both the NYS law and the CDC recommendation to offer an HIV test to all
patients aged 13-64 (93% not offered test) with profound health consequences; 2. Seniors (65-
74) are particularly apt to have never been tested for HIV and to not be offered a test by health
care providers, despite the fact that one in three aged 65-74 is sexually active. Consequences
and Policy Recommendations: If NYC health care providers adhered to CDC recommendations
and NYS law, nearly a million New Yorkers 18-64 would have been tested for HIV for the first
time, leading to discovery of about 6500 with previously undiagnosed HIV infection. Among
those aged 65-74, an estimated 200,000+ would be tested for the first time, again uncovering
previously undiagnosed HIV infection and decreasing the number of late HIV diagnoses.
What Explains California’s Passage of Proposition 30: Fear of Education Cuts,
Gubernatorial Approval, Political Trust, or Tax Preferences?
Dean E. Bonner, PPIC
With automatic trigger cuts—mostly to K–12 education—looming, Californians went to the polls
on November 6 and passed Proposition 30, which increases taxes on the wealthy for seven
years and the state sales tax by 1/4 cent for four years. Given the central role that Governor
Brown played in the initiative campaign, this paper analyzes the role that gubernatorial
approval—in comparison to opposition to automatic education cuts, political trust, and tax
preferences—played in support for Proposition 30. Utilizing pre-election data of those most
likely to vote in the general election, preliminary results indicate that tax preferences exert the
most leverage on Proposition 30 support. Further analysis will examine the interplay between
support for Proposition 30 and support for Proposition 38, another measure to fund education
that relied on increasing income taxes on most Californians based on a sliding scale. Analyzing
what types of people supported both propositions will provide a broader understanding of
support for tax increases in California.
Racial Resentment, Belief in Rumors about Barack Obama, and Racial and Ethnic
Identities
Michael W. Traugott, University of Michigan; Ashley E. Jardina, University of Michigan
Barack Obama has been the focus of innumerable rumors about his citizenship and religion,
and recent research has shown that racial resentment plays an important role in explaining
these views. These analyses have been based almost entirely upon white survey respondents
because the measurements of these concepts were made in single cross-sectional surveys, and
the numbers of nonwhite respondents were too small for analysis. In this analysis, we employ a
unique multi-survey data collection that allows pooling of respondents to support separate
analyses of Black and Hispanic respondents in addition to whites. As a result, we compare
models explaining these beliefs among different segments of the population and discuss why
and how they differ. This analysis is complicated by lower acceptance of rumors about Obama
among black respondents.
Reaching and Estimating Small or
Specialized Populations
Dynamic Averaging: A Modified Time Series Approach to Improve Estimates for
Smaller Demographic Groups
Kelly Dixon, Arbitron; Al Tupek, Arbitron; Richard Griffiths, Arbitron; Wolfgang Jank,
College of Business, University of South Florida
Arbitron is a media research company that produces quarterly or semi-annual estimates of radio
listening. The surveys are designed to produce a certain standard error for estimates for the
radio-market at an aggregate level. However, our primary customers are radio stations that
target specific demographic sub-groups of the market (for example, Black males 18-34). The standard errors on these smaller sub-group estimates of radio listening are large, which makes it difficult to see a trend in the estimates. Our research goal is to improve the reliability of estimates for smaller sub-groups and geographies while retaining trends and other real changes in the
more reliable aggregate estimates. We also need a solution that will allow for sub-groups to add
up to the aggregates. Our proposed solution, dynamic averaging, achieves a smoother time
series for the sub-group estimates, which should give customers a better view of long-term
trends in their estimates. We estimate the reliability improvement to be equivalent to that of a sample two to three times as large.
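Arbitron's dynamic averaging procedure is not spelled out in the abstract. Purely as an illustration of the general idea (smoothing noisy sub-group series while forcing them to sum to the more reliable aggregate), the following sketch applies exponential smoothing followed by a benchmarking step to invented data.

```python
# Invented data; this is NOT Arbitron's proprietary method, only a generic
# illustration: exponentially smooth noisy sub-group series, then benchmark
# them so they add up to the more reliable aggregate each period.
import numpy as np

rng = np.random.default_rng(3)
T, G = 12, 4                                    # 12 periods, 4 demographic sub-groups
true_shares = np.array([0.4, 0.3, 0.2, 0.1])
aggregate = 100 + rng.normal(0, 2, T)           # relatively stable aggregate estimate
subgroups = aggregate[:, None] * true_shares + rng.normal(0, 8, (T, G))

alpha = 0.3                                     # smoothing weight on the current period
smoothed = np.empty_like(subgroups)
smoothed[0] = subgroups[0]
for t in range(1, T):
    smoothed[t] = alpha * subgroups[t] + (1 - alpha) * smoothed[t - 1]

# Benchmarking step: rescale each period so sub-groups sum to the aggregate.
smoothed *= (aggregate / smoothed.sum(axis=1))[:, None]
print(np.allclose(smoothed.sum(axis=1), aggregate))   # True
```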
Small Area Estimation of a Rare Population Incidence
Stanislav Kolenikov, Abt SRBI; Benjamin Phillips, Abt SRBI
Abt SRBI is conducting a large-scale CATI Survey of American Jews. Identifying this rare
population with incidence of about 1.9% across the nation involves enormous screening effort.
As Kalton & Anderson (1986) suggested, stratifying this population into different levels of
incidence brings efficiency gains in both costs and accuracy. Thus to inform stratification and
coverage decisions for the sample design, reliable estimates of Jewish incidence were needed.
We developed a small area mixed effects model that combined information from a number of
sources. The unit-level dependent variable, 0/1 indicator of being Jewish by religion (JBR), as
well as the sampling weights, came from a merge file of national studies conducted by Pew
Research Center. The area (county) level data came from: 1) ICPSR County Characteristics
data set; 2) a commercially acquired list of synagogues (geocoded by Abt SRBI GIS unit); 3) a
list of Jewish educational organizations (JData.com at Brandeis University); and 4) incidence of
Jewish names by a commercial sample provider. We demonstrate the steps of fitting the
multilevel mixed effects logistic regression model, obtaining the empirical best predictions (EBP;
Jiang and Lahiri 2001), and using the EBPs to delineate the strata by incidence and estimate
undercoverage of the survey. We also provide qualitative comparisons of our SAE estimates
with alternative estimates, such as direct estimates based on the screeners for the survey, the
direct estimates based on merge file only, and incidence of Jewish names only.
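The authors fit a unit-level multilevel mixed-effects logistic model and compute empirical best predictions. As a much simpler stand-in that conveys the same idea of borrowing strength across areas, the sketch below applies beta-binomial empirical Bayes shrinkage to invented county-level screener counts.

```python
# Synthetic data and a deliberately simplified estimator: shrink noisy
# county-level incidence toward the pooled national rate with a crude
# method-of-moments beta-binomial empirical Bayes step.
import numpy as np

rng = np.random.default_rng(4)
n_counties = 200
screened = rng.integers(20, 400, n_counties)     # screener interviews per county
true_p = rng.beta(2, 100, n_counties)            # true county-level incidence
jbr = rng.binomial(screened, true_p)             # Jewish-by-religion identifications

p_hat = jbr / screened
p_bar = jbr.sum() / screened.sum()               # pooled incidence

# Crude between-county variance estimate (floored to stay positive).
var_between = max(p_hat.var(ddof=1) - (p_bar * (1 - p_bar) / screened).mean(), 1e-8)
prior_strength = p_bar * (1 - p_bar) / var_between   # roughly alpha + beta of the prior

# Small counties are pulled toward the pooled rate; large counties much less.
p_eb = (jbr + prior_strength * p_bar) / (screened + prior_strength)
print(np.round(p_eb[:5], 4))
```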
Efficient Sampling Designs for Rare Populations
Benjamin Phillips, Abt SRBI; Stanislav Kolenikov, Abt SRBI
Abt SRBI is conducting a national dual frame RDD Survey of American Jews. As a rare
population (c. 1.9% of adults), a very large proportion of the study budget is spent identifying
eligible households, necessitating the development of an efficient sample. The methods we
develop apply to other rare populations. Study objectives included minimizing sampling error for
estimates of total Jewish population incidence and for estimates of characteristics of the overall
Jewish, Orthodox, and Russian Jewish populations. Given different geographic distributions of
these groups, targeting these populations had different implications for sample design. Design
choices included sample size, allocation to strata, landline/cell allocation, and degree of
undercoverage. Using small area estimates of Jewish incidence at the county level (Kolenikov
and Phillips, submitted) we developed an optimal allocation using the Excel 2010 nonlinear
solver and the study design/budget spreadsheet. Study objectives—measured as effective
sample sizes for the above subpopulations of interest—were weighted by importance and
combined using the Cobb-Douglas function. Design effects were calculated by simulating
survey dispositions based on a “donor” survey expected to have similar outcome rates. This
allowed us to include the impact of landline/cell phone RDD allocation on design effects,
including frame integration and adjustment to NHIS coverage estimates. We were also able to
estimate design effects for subpopulations of interest. It was determined that a smaller than
initially planned sample yielded a greater effective sample size, with the reduction in sample
size allowing the use of a more efficient design under the study budget. We illustrate the effect
on sample design of varying the importance of study objectives, coverage of the Jewish
population, and enforcing a minimum sample size constraint. We compare design projections to
study results and, in addition, compare study design to field results, such as design effect and
screening rate.
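As an illustration of an importance-weighted Cobb-Douglas allocation under a budget constraint (the study's actual strata, costs, and design-effect simulation are not reproduced), a minimal sketch with invented numbers might look like this:

```python
# Invented numbers only: an importance-weighted Cobb-Douglas allocation under
# a budget constraint, solved with scipy's SLSQP optimizer.
import numpy as np
from scipy.optimize import minimize

cost_per_screener = np.array([40.0, 60.0, 90.0, 150.0])   # by sampling stratum
incidence = np.array([                                     # rows: subpopulations
    [0.30, 0.10, 0.040, 0.010],    # all Jewish households
    [0.06, 0.02, 0.005, 0.001],    # Orthodox
    [0.04, 0.01, 0.003, 0.001],    # Russian-speaking
])
deff = np.array([1.4, 1.6, 1.8])         # assumed design effects by subpopulation
importance = np.array([0.5, 0.3, 0.2])   # objective weights (sum to 1)
budget = 500_000.0

def neg_utility(n):                       # n = screeners fielded per stratum
    eff_n = (incidence @ n) / deff        # crude effective sample sizes
    return -np.prod(eff_n ** importance)  # Cobb-Douglas objective (negated)

res = minimize(
    neg_utility,
    x0=np.full(4, 1000.0),
    bounds=[(0.0, None)] * 4,
    constraints=[{"type": "ineq", "fun": lambda n: budget - cost_per_screener @ n}],
    method="SLSQP",
)
print(res.x.round(0), -res.fun)
```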
Sampling “Hidden” Populations in Developing Countries: An Application of
Respondent-Driven Sampling (RDS) in Ethiopia
Charles Q. Lau, RTI International; Georgiy Bobashev, RTI International; Burton Levine,
RTI International
In developing countries, surveys often attempt to sample populations not readily accessible to
researchers such as drug users, sex workers, or entrepreneurs. For example, smaller
businesses oftentimes lack a fixed address or operate out of plain sight, meaning that sampling
frames for these businesses are typically unavailable or incomplete. As a result, typical methods
such as sampling from telephone directories suffer from substantial coverage errors. One
potential solution is respondent-driven sampling (RDS), a method similar to chain-referral or
“snowball” sampling but with strict controls over the number of people a single subject can
recruit. Mathematical theory behind RDS allows one to adjust for known biases in convenience
sampling, such as network homophily and the tendency for individuals from larger networks to
be overrepresented. RDS has been used productively in studies of many hidden populations,
but has not been applied to businesses. To address this gap in our understanding, we used
RDS in a survey of businesses in the capital city of Ethiopia, Addis Ababa. Eligible participants
were owners of small businesses (3-99 employees) in the manufacturing, service, or trade
sectors. We recruited 24 initial respondents non-randomly. These initial respondents (and all
subsequent respondents) were provided incentives to recruit up to 3 additional respondents.
After 11 waves of recruitment over six weeks, we achieved our target sample size of 608. Our
paper reports the statistical properties of our sample and critically evaluates the assumptions
underlying the RDS approach. Preliminary analysis shows the sample compositions converged
and reached equilibrium which, according to the theory, indicates representativeness of the
population. The sample composition also closely tracked to the estimated population
composition, suggesting that RDS could be used as a reliable but also a cheaper alternative to
probability sampling. We also discuss the levels of homophily and random recruitment of the
network members.
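The abstract does not specify which RDS estimator was used; the sketch below shows the commonly used RDS-II (Volz-Heckathorn) style weighting, in which each respondent is weighted inversely to reported network size, applied to synthetic data.

```python
# Synthetic data, not the Addis Ababa study file: an RDS-II style estimate
# weights each recruit inversely to reported network size to offset the
# over-sampling of well-connected business owners.
import numpy as np

rng = np.random.default_rng(5)
n = 608
degree = rng.integers(2, 60, n)             # reported number of eligible contacts
manufacturing = rng.binomial(1, 0.35, n)    # trait of interest (0/1)

w = 1.0 / degree                            # inclusion probability taken as proportional to degree
rds_ii = np.sum(w * manufacturing) / np.sum(w)
print(f"naive proportion = {manufacturing.mean():.3f}, RDS-II estimate = {rds_ii:.3f}")
```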
Issue Publics in Nanotechnology in the New Media Environment
Doo-Hun Choi, University of Wisconsin – Madison; Michael Cacciatore, University of
Wisconsin – Madison; Young Mie Kim, University of Wisconsin – Madison; Dietram
Scheufele, University of Wisconsin – Madison; Michael Xenos, University of Wisconsin
Madison; Dominique Brossard, University of Wisconsin – Madison; Elizabeth Corley,
Arizona State University
While the public has low levels of interest in and understanding of nanotechnology (Scheufele & Lewenstein, 2005), issue publics, subsets of the population who are passionately concerned with a specific science issue (Kim, 2009), exert significant influence on public opinion formation and science policy decisions. The concrete attitude structures and issue specializations of issue publics indicate a high level of attitude extremity. Given their preference for selective attention to information and the availability of diverse and specialized information, the nanotechnology issue publics will likely use the Internet to strengthen their attitudes.
Analyzing nationally representative online survey data (N = 2,805) collected by
KnowledgeNetworks, this study explores the predictors of attitude formation among the
nanotechnology issue publics. This study also explores how these factors interact in
determining attitude extremity toward nanotechnology in the new media environment. Our
findings showed that “nanotechnology issue publics” tend to use science media more attentively
and have a higher level of nanotechnology knowledge than non-issue publics. The issue publics
were more extreme in their attitudes toward this emerging technology. We also found that the
Internet contributes to an increase in attitude extremity among issue publics. More importantly,
exploring how the issue publics form their attitude extremity, the study found that issue publics relied on their schema for science and technology, a specific cognitive structure toward a certain issue, rather than on political ideology, a set of basic beliefs about the political world. Given opportunities for selective exposure online and their interpretative schemas, the issue publics can be expected to become polarized and extreme when forming their attitudes toward nanotechnology. In this sense, as informal opinion leaders, the issue publics have the potential to influence public opinion formation toward nanotechnology, mirroring the extreme divisions within the issue publics.
Monitoring Interviewer Behavior
Detecting Poorly Conducted Interviews
Joerg Blasius, University of Bonn
In their recent book, Blasius and Thiessen (2012) introduced several screening methods to
assess the quality and validity of survey data. They characterized the survey interview context
as one in which task simplification, time and effort minimization, and cost reduction strategies by
respondents, interviewers, and research institutes resulted in poor data quality. In this
presentation, we concentrate on the quality of the interviewers, identifying patterns that help to
assess how carefully and thoroughly they conduct their interviews. We illustrate our ideas using
the German General Social Survey 2008 in which we detect clusters of interviewer-specific
response combinations whose frequency of occurrence defies the odds to such an extent that
we suspect interviewer fraud to be the cause of some of them. Using two of the screening
methods proposed by Blasius and Thiessen (2012) we find a substantial number of interviewers
who simplified their tasks in a manner that reduced their interviewing time and effort but
increased their “measurement error”. Blasius, Jörg and Victor Thiessen (2012): Assessing the
Quality of Survey Data. London: Sage.
Interviewer Affect and CARI Effects: Lessons in Implementation and the Effects of
CARI on a Large-scale Longitudinal Study
Ryan A. Hubbard, Westat
This paper summarizes the evolution of Computer Audio-Recorded Interviewing (CARI) on
Medicare Current Beneficiary Survey (MCBS). CARI is integrated into a number of studies due
to its value as a tool for interview validation, assessment of interviewer performance, evaluation
of data quality and question assessment. While these uses of CARI have been documented
(Biemer, 2000, 2003; Herget, 2001; Thissen, 2007, 2009; Fisher, 2012; Kinsey, 2012; Hicks,
2008, 2012), CARI implementation on MCBS offers a new perspective on the effect CARI can
have on operations when implemented on an ongoing longitudinal study with a rotating-panel
design. MCBS implemented CARI with experienced interviewers and longitudinal study
respondents accustomed to an interview in which audio recording never occurred. Interviewers
voiced concern that introducing CARI would negatively affect rapport with respondents. In fact,
the project initially experienced a high rate of refusal to record concentrated among a subset of
experienced interviewers. Efforts were made to improve consent during the field period, and the
refusal to record rate dropped to expected levels. MCBS later introduced new interviewers and
respondents who had not been conditioned regarding CARI. The comparison of these
combinations contributes to a better understanding of CARI effects. Both interviewers and
respondents conditioned to a non-recorded interview are expected to produce higher rates of
recording refusal, but the key to this effect lies in the interaction. While implementing CARI on
an established study offered insight into consent, MCBS study design also allows for a data
quality impact assessment. A 10-minute increase in average interview length on MCBS was
attributable to CARI and the enforcement of proper interviewing technique (fewer shortcuts,
reading full question text). This practice should lead to better data quality. The paper analyzes
the effect of CARI on the quality of event reporting through a comparison of respondent-
reported medical events to insurance statements.
Variability in Error Detection Among Telephone Monitors
Douglas B. Currivan, RTI International; Derek Stone, RTI International; Curry Spain, RTI
International; Nicole Tate, RTI International
Standardized methods and tools for monitoring telephone interviewers are important for
ensuring survey data meet high quality standards. In order to effectively limit the risk of
interviewer behaviors biasing or adding variance to survey estimates, the quality monitoring
process requires accurate and consistent detection of interviewer errors. To this end, RTI has
developed a standardized, mode-independent interview quality monitoring evaluation system,
QUEST. This system supports evaluation of interviewing quality through both live monitoring
and review of digitally-recorded sessions. QUEST allows telephone interviewing behaviors to be
evaluated using a common set of quality metrics that are stored in a single shared database.
These metrics are based on objective indicators of specific interviewer behaviors, including
definitions and concrete examples for each behavior, as opposed to more subjective ratings or
impressions of interviewing quality. The primary hypothesis of our research is that the
standardized, objective approach followed in QUEST will produce minimal variation across
monitors in their detection of interviewer errors and other unacceptable behaviors. Two primary
sources of data are used to investigate variability in the rates at which monitors detect
interviewer errors: comparison of error detection rates across monitors from monthly monitoring
results and examination of the results of blind scoring by monitors of a set of 10 selected
interviewing scenarios. Comparisons of error detection rates across monitors include both
overall errors detected across sessions and errors detected for specific interviewing skill areas.
In addition, this analysis examines whether scoring across monitors varies when factors such as
interviewing shifts or monitor experience levels are considered. Based on the results of the
comparisons of monthly monitor scores and blind scoring of interviewing scenarios, this
presentation discusses the implications of the observed levels of monitor scoring variability in
general and disagreements on specific scenarios for accurate and consistent detection of
interviewing errors.
A Field Experiment Using GPS Devices to Monitor Interviewer Travel Behavior
Kristen Olson, University of Nebraska-Lincoln; James Wagner, University of Michigan
Survey organizations rely on interviewers to make informed and efficient decisions about their
efforts in the field, including which houses they approach to knock on doors, make
appointments, and obtain interviews (Groves and Couper 1998). Previous evidence suggests
that inefficient decisions about where to travel can have deleterious effects on response rates
(Wagner and Olson 2011). To date, however, there is no systematic evaluation of how
interviewers make travel decisions in real time. This paper presents initial findings from a field
experiment and a survey of interviewers in a face to face survey, the National Survey of Family
Growth. NSFG interviewers were equipped with GPS-enabled smartphones. In the first quarter,
a random half of the interviewers were asked to record their travel behavior via a GPS logging
app in the smartphone; the second group recorded their travel behavior during the second
quarter of data collection. All interviewers were asked to record their travel for subsequent
quarters. We evaluate interviewer compliance with the GPS request, the quality of the recorded
GPS points, the correspondence between the GPS points and the attempts recorded in the call
records, and provide an overview of the interviewers’ travel behavior. We also report results
from a survey of the NSFG interviewers about the smartphone and GPS logging app. Initial
results indicate that 68% of the first quarter interviewer-days, 54% of second quarter
interviewer-days, and 57% of third and subsequent quarter days had GPS data recorded.
Results from the interviewer survey indicate that an interviewer’s failure to have travel behavior
recorded resulted largely from technical problems (e.g., forgetting to turn the phone on), not
from discomfort with having movements tracked via the GPS device. Implications for future use
of GPS devices to monitor interviewer travel behavior will be discussed.
Friday, May 17
3:15 p.m. – 4:15 p.m.
Poster Session 2
1. Trends in Cell Phone Calling Outcomes: BRFSS 2008-2011
Carol Pierannunzi, Centers for Disease Control and Prevention; Machell Town,
Centers for Disease Control and Prevention; Simone Salandy, Northrup Grumman
Contractor for CDC; Lina Balluz, Centers for Disease Control and Prevention
In 2011, the Behavioral Risk Factor Surveillance System (BRFSS) released both landline
and cell phone data for public use for the first time. However, the BRFSS has collected cell
data since 2008 as part of a large pilot study. This study examines the calling outcomes for
2.7 million cell phone numbers included in the BRFSS samples from 2008-2011. Trends in final dispositions are examined over time for the aggregated state samples and for selected individual states with large cell phone samples. Patterns of response rates, refusal rates,
contact rates, out of sample numbers, terminations and partial completes are illustrated.
Demographic characteristics of respondents who completed the screening questions are
also included. Four year trend lines are produced for interim calling outcomes resulting in
completed interviews as well as for calls which result in refusals or cut-offs. Results indicate
that although terminations, break-offs, partial completes and refusal after determination of
eligibility are relatively small percentages of the sample, the proportion of these outcomes is
increasing over time. When taken as a percentage of the sample which resulted in contact
with potential respondents, these trends in unsuccessful cell phone outcomes are more
pronounced. The BRFSS is currently conducting new pilot studies to determine the
feasibility of other modes of data collection to counteract these trends.
2. Nonresponse Reasons Among Survey Participants in the Gulf Arab Countries: The Case of Qatar
Elmogiera Elawad, Social and Economic Survey Research Institute, Qatar University;
Mohamed Ahmed Bala Agied, Social and Economic Survey Research Institute, Qatar
University
Choosing appropriate interviewing times, using both male and female interviewers to respect social customs, and translating questionnaires into different languages are all procedures intended to improve or maintain the response rate in Qatari household surveys; nevertheless, that rate has clearly declined. In each field survey conducted by the Social and Economic Survey Research Institute (SESRI) at Qatar University, we asked selected participants who refused to take part about their reasons for refusing; some of them, of course, declined to answer even this question, but others responded. In this paper we seek to understand the causes of nonresponse among Qataris and among expatriates living in Qatar by studying the answers given by non-respondents in SESRI surveys conducted in 2011-2012, identifying the reasons for nonresponse and the attitudes of Qataris and expatriates toward participation in field surveys.
3. Internet Versus Mail: A Comparison of Data Quality Indicators
Jenifer G. Tancreto, U.S. Census Bureau; Rachel Horwitz, U.S. Census Bureau; Mary
Davis, U.S. Census Bureau; Mary Frances Zelenak, U.S. Census Bureau
In April 2011, we conducted a test to evaluate the feasibility of providing an Internet
response mode to households selected for the American Community Survey (ACS). The
main purpose of this test was to determine the best methods for informing people about the
Internet response option and encouraging them to respond. Results suggested that
providing a sequential mode offering, starting with Internet followed by a paper
questionnaire, maintained or increased response rates while driving over 50 percent of self-
response to the Internet (Tancreto et al., 2012). This study analyzes data collected in that
test, as well as supplemental data collected as part of a reinterview, to examine the quality
of the data collected on the Internet compared to the quality of mail response data. This
analysis will help determine whether the Internet provides data of comparable quality to
mail. Specifically, we used the following data quality indicators: the number of outliers and the percentage of rounded values for some numeric income fields; the correlation between
certain related measures; and measures of response error generated from comparing data
from the original interview to a reinterview. We attempted to control for known demographic
differences between mail and Internet respondents using propensity weights so we could
measure true mode differences. Overall results suggest that the Internet data appear to be of comparable quality to the mail data.
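
To illustrate the kind of propensity adjustment described above, the following is a minimal Python sketch of weighting Internet and mail respondents by the inverse of an estimated mode propensity. The file and column names (mode, age, education, hh_size) are hypothetical and not taken from the ACS test, and this is not the authors' exact specification.

    # Minimal sketch: propensity weights to balance mail vs. Internet respondents
    # before comparing data quality indicators. All names below are illustrative.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    df = pd.read_csv("acs_test_respondents.csv")                # hypothetical input file
    X = pd.get_dummies(df[["age", "education", "hh_size"]], drop_first=True)
    y = (df["mode"] == "internet").astype(int)                  # 1 = Internet, 0 = mail

    p_internet = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]

    # Inverse-propensity weights make the two mode groups resemble each other
    # on the modeled demographics before quality metrics are compared.
    df["prop_weight"] = y / p_internet + (1 - y) / (1 - p_internet)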
4. Reducing Erroneous Enumerations in the Decennial Census Group Quarters
Populations While Potentially Reducing Follow-Up Costs
Geoffrey Jackson, U.S. Census Bureau
The foundation of the decennial census is to successfully count each person once, only
once, and in the correct place. Sometimes people live or stay in more than one place and
their lengths of residency make it difficult to ascertain which place is correct. The Census
Bureau has a rule on where people should be counted; how the rule is applied depends on whether
the person lives in a housing unit or group quarters (living quarters such as college
dormitories, prisons, etc.) and where they were living on April 1, 2010. The respondent is not
always aware of how the rule applies to their situation. Research has shown that some
people living in group quarters tend to also be counted at a housing unit. In the 2010
Census, resolving this person duplication required that 1) a person indicate on the housing unit questionnaire that he or she lived or stayed at another address, and 2) that housing unit be re-contacted and the duplicated person removed during the costly Coverage Followup Operation. During the 2010 Census, an experimental questionnaire was tested for
people living in group quarters. The experimental group quarters questionnaire asked all respondents whether they had another address where they stayed besides the group quarters. The traditional group quarters questionnaire asks for another address only if the respondent indicated they lived or stayed somewhere else most of the time. This paper will analyze the number of people who provided an address on both group quarters questionnaires and whether they were found to be counted at those addresses. The paper will show that the person
duplication between a group quarters and housing unit can be resolved without any costly
follow-up interviews by using the collected address on the questionnaire in conjunction with
the results of the person duplication matching.
5. Attempting to Reduce Respondent Burden in Complex Listing Tasks
Lauren A. Walton, The Nielsen Company; Anh Thu Burks, The Nielsen Company;
Christine Pierce, The Nielsen Company
In order to gain survey participation, researchers try to make the benefits of participating outweigh its drawbacks by offering incentives, shorter questionnaires, or interesting survey topics. Each survey is unique in its level of potential respondent burden
moderated by the questions asked, the survey format, and the level of cognitive effort
required by a respondent to complete the survey. When designing a survey, one major
consideration to gaining cooperation is the amount of time and effort required of a
participant to complete the survey. This paper tackles the respondent burden associated with a complex, knowledge-based listing task that can be arduous to complete. An experiment was conducted using a paper-and-pencil survey in which respondents were asked a series of demographic questions, followed by a complex knowledge-based listing task (i.e., respondents can provide hundreds of specific pieces of information), and finished by completing an event history calendar of a specific activity. More specifically, the
experiment manipulated the listing task that a respondent was asked to complete in the
survey. A proportion of respondents were randomly selected to provide a detailed list of
information while others in the sample were assigned to provide the minimal amount of data
required in order to reduce respondent burden associated with the listing task. Preliminary
results indicate a highly significant difference in favor of the reduced listing task in the
number of households that returned a useable survey (18.5% vs. 17.9%). Results from this
July 2012 test suggest that reducing respondent burden in challenging surveys benefits both respondents and research organizations.
6. Predicting Biases Due to the Use of Lottery Incentives in Surveys
David Fan, University of Minnesota; Joe Murphy, RTI International; Susan Mitchell,
RTI International; Ken Blake, Middle Tennessee State University
The goal of a survey is to obtain a set of responses from a representative sample of a target
population. Typically defined, representativeness means the characteristics of the sample
will, on average, match the target population. In other words, the survey methodology must
be independent of the responses sought. For example, the telephone method is commonly
used for political polls under the assumption that the responses are independent of phone
usage. However, the same phone poll would not be used to determine why the respondent
does not use the telephone. The reason is the complete correlation and lack of
independence between the phone non-usage question and the survey mode. The phone
poll example shows how error may lead to representative responses to some questions but
not others. This paper explores a similar inquiry, but about bias due to choice of incentive
type. Specifically, do lottery or drawing-type incentives lead to biased data for certain types
of questions? To identify the potential effect, we included a question about the preference
for a lottery incentive on two separate surveys using either a fixed payment incentive or no
incentive. We asked whether the respondents would prefer a drawing to a fixed payment.
We scored for the independence of lottery response from responses to other survey
questions. Responses correlated with a lottery preference should not be used in surveys
with a lottery incentive because the lottery is likely to bias the results. This paper is a
demonstration project for identifying potentially problematic questions on surveys. The long
term goal is to encourage survey researchers to routinely add simple methodological
questions like this to surveys. A database could be constructed for the types of responses
that are correlated with various survey designs and hence be problematic using the
corresponding methods.
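
One conventional way to score the independence the authors describe is a chi-square test of association between the lottery-preference item and each other survey question. The Python sketch below uses hypothetical variable names (prefers_lottery, q_smoking_status) and is not the authors' scoring procedure.

    # Sketch: flag survey items whose answers are associated with lottery preference.
    import pandas as pd
    from scipy.stats import chi2_contingency

    df = pd.read_csv("survey_responses.csv")                    # hypothetical input file
    table = pd.crosstab(df["prefers_lottery"], df["q_smoking_status"])
    chi2, p_value, dof, expected = chi2_contingency(table)

    # A small p-value marks an item that may be biased when a lottery incentive is used.
    print(f"chi2={chi2:.2f}, p={p_value:.4f}")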
7. Tell Me the Truth: The Response Validity of College Student Populations
Cole Napper, RTI International; Tilman Sheets, Louisiana Tech University
According to Peterson’s (2001) meta-analysis, a considerable proportion of research in the
social sciences has been conducted using American college sophomores as participants;
also known as the “science of sophomores” (Gordon et al., 1986). Although some
researchers support the notion that undergraduate students can be representative
populations for generalization to non-student populations (Highhouse & Gillespie, 2009), this
assertion should be evaluated in the context of whether participants’ motivation is to satisfice or to provide accurate responses (Krosnick, 1999). Fan (2006) states that about half of what participants report on self-report questionnaires is inaccurate. This is a troubling finding
for social scientists, and should prompt researchers to assess the quality of their data before
they expand upon their research conclusions. This research study was conducted to assess
response validity of an undergraduate student population. An experimental design utilizing
deception was used to elicit truthful responses on the effort and motivation of students
completing a long self-report questionnaire. The purpose was to examine if undergraduates’
responses in survey research are dishonest, involve little or no effort by the participant, and
if participants intentionally provide inaccurate responses. After finishing a cumbersome 300-
item scale, participants completed a response validity scale (RVS) which indicated the level
of effort they exerted and whether they intentionally provided inaccurate responses to the
self-report questionnaire. However, while participants completed the RVS, they were told
they were being monitored for lie detection (i.e., inactive eye-tracking and EEG hardware
were used to create a ruse that untruthful responses were monitored). The results examined
the validity of using long psychological measures (i.e., 300 items) on college student
populations. Also, student responses to the RVS are discussed, as well as the relationships
between those students who failed the validity check and the students who admitted to
intentionally providing inaccurate responses.
8. Utilizing GIS Data to Enhance Survey Data
Christine Cowles, Abt SRBI; Mark Morgan, Abt SRBI
Researchers have an increasing number of non-survey data resources available and it is
essential that the survey research community is proactive in incorporating this added value
in their study designs. This methodological brief will examine the use of geocoding and
appended geographical statistics in the analysis of how one’s neighborhood can affect one’s mental health. The aim of the research is to understand the role of neighborhood
environment on physical and mental health to encourage policy choices that improve the
opportunity for aging residents to avoid or minimize depression and its effects on quality of
life. The data are collected in a three-year cohort survey conducted among 3,500 older residents living in New York City.
9. The Impact of Climate Change Issue in the 2012 U.S. Presidential Election
Bo MacInnis, Stanford University; Jon A. Krosnick, Stanford University; Jon Cohen,
Capital Insight/Washington Post Media; Clifford Young, Ipsos
Long-held theories of voting behavior posit that voters evaluate political candidates on the basis of their positions on issues, yet these theories have received little empirical confirmation in the general population and only limited support among members of the public who attach high personal importance to the issue. National surveys show that large majorities of Americans believe in climate change and want government action to reduce future climate change, and that the climate change issue public is sizeable, suggesting climate change would be an important factor in the 2012 election. However, counterarguments exist: one is that other issues, such as the economy, seemed more important to the electorate, possibly diminishing the importance of climate change; the other is that the candidates were nearly silent on climate change, tending toward a null effect of issue voting. Based on data from nationally representative surveys in September 2011 and June 2012, this research employed well-established political science methodologies built around measures of issue congruence. In the first study, in which respondents chose whom they would vote for if the election were held between President Obama and one named Republican candidate, we exploited cross-candidate and cross-respondent variation in climate change stances as the source of identification, and found that Americans
were more likely to vote for a candidate, Democratic or Republican, whose belief matched
their own than to vote for a candidate whose belief differed from their own on climate
change. The second study found that greater relative proximity to Mr. Obama on climate
change than to Mr. Romney increased the likelihood of voting for him instead of for Mr.
Romney. While issue voting was found to be present among the general population in both
studies, it was moderated by attitude strength and personal importance, consistent with
issue voting theories.
10. A Framework and Usage Model of Social Media for Young Adults
Jennifer C. Romano Bergstrom, Fors Marsh Group; Caitlin Krulikowski, Fors Marsh
Group; Ricky Carroll, Appalachian State University; Kara Marsh, Fors Marsh Group;
Joseph N. Luchman, Fors Marsh Group; Katie Helland, Joint Advertising, Market
Research & Studies (JAMRS); Megan Fischer, Fors Marsh Group
The use of social media has grown immensely over the past decade, with technological and
Internet innovations like Facebook, Twitter, and YouTube achieving massive adoption in a
few years. Increasing numbers of young adults are using social media, and many
companies and organizations are using social media to reach out to youths. However, it is unclear to what extent organizations can apply the same strategies across products, services, and industries when introducing social media into their marketing
strategy. Moreover, it is unclear whether social media is the most viable or effective
marketing platform to reach out to all young adults. We reviewed existing social media
literature (e.g., popular press, academic journals). Our review reveals that little guidance
exists on how and why youths use social media. Our review was used to build an organizing
framework of social media usage that was subsequently tested using a national probability-
based pencil-and-paper survey (N = 3,743) and a follow-up Web-based survey that the
original respondents were invited to complete (N = 1,686). Data for the original survey were
analyzed using a finite mixture model approach to uncover underlying “classes” of social
media user profiles. Data for the follow-up survey were analyzed using multidimensional
scaling to uncover the underlying framework across myriad social media channels. We
present the resulting two-dimensional framework model and the usage model, which
demonstrates the way young adults use each type of social media (e.g., “pushing”
information; “pulling” information). Our results suggest successful social media strategies
depend on the function of the social media channel and the marketing objective. Most importantly, our study provides critical information on users' motives for using social media, which is essential for effective targeting. We conclude with recommendations
for organizations seeking to use social media for marketing efforts.
11. Surveywalls: A Breakthrough for Survey Customers or DIY Run Amok?
Tom Wells, The Nielsen Company; Elizabeth Dean, RTI International; Kumar Rao, The
Nielsen Company; Joe Murphy, RTI International; David Roe, RTI International
Online surveys continue to transform how survey research is conducted, not just in terms of
the capabilities they offer, but also how online surveys are designed. Several companies
have recently entered the survey research field with a new type of platform, offering
researchers a do-it-yourself (DIY), cost-effective approach to surveying thousands of people
online. Respondents to DIY surveys are recruited from an online panel of Internet users or
by using a variety of online recruitment methods, including banner advertisements, email
campaigns, and search campaigns (i.e., search engine generated links). A new recruitment
approach for conducting DIY surveys has been gaining traction -- a “surveywall” that first
intercepts Internet users attempting to access restricted/paid content from a participating
website then solicits them to participate in a very brief survey (1-2 questions). Users are
sampled in real-time and, in exchange for their survey participation, are given access to the
paid content. Proponents of this DIY approach argue that by reducing survey burden, and
simultaneously providing more meaningful incentives (i.e., access to content), survey results
are as accurate as those derived from probability-based online panels. In this study, we test the feasibility and performance of an intercept-type DIY survey relative to a probability-based
online panel, a traditional opt-in online panel, and online populations recruited through two
popular social media platforms using a common questionnaire. We provide an independent
assessment, useful to those studying and contemplating using such a system. We compare
responses from all platforms to demographic and behavioral benchmarks, using the average
percentage point absolute error across all the questions in the survey, as done by
McDonald, Mohebbi, and Slatkin (2012) and Yeager, Krosnick, et al. (2011) in their
comparative research on survey accuracy. We discuss the findings from the study and
conclude with a call and recommendations for further research on this topic.
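
The accuracy metric cited above (the average percentage-point absolute error against external benchmarks) is straightforward to compute; a minimal Python sketch follows. The benchmark items and values are placeholders, not figures from this study.

    # Sketch: average absolute error, in percentage points, of survey estimates
    # relative to external benchmarks. Values below are placeholders only.
    benchmarks = {"owns_home": 65.4, "smokes": 19.0, "has_passport": 37.0}
    estimates  = {"owns_home": 61.0, "smokes": 22.5, "has_passport": 41.2}

    errors = [abs(estimates[k] - benchmarks[k]) for k in benchmarks]
    avg_abs_error = sum(errors) / len(errors)
    print(f"Average absolute error: {avg_abs_error:.1f} percentage points")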
12. Does Classroom Observer Reliability Differ By Content or Approach To Data
Collection?
Harshini K. Shah, Mathematica Policy Research ; Jillian Stein, Mathematica Policy
Research; Katherine M. Burnett, Mathematica Policy Research; Tim Bruursema,
Mathematica Policy Research
The use of classroom observations (CO) has become increasingly common in large-scale
education studies assessing teaching effectiveness and in state accountability systems.
COs capture the actual experiences of students in classrooms rather than the intended
instruction that is often captured through teacher reports. COs require observers to record
behaviors in the classroom and can range from structured checklists to more qualitative
descriptions of behavior. A major consideration when using COs is observer reliability.
Although it is well understood that low observer reliability has an adverse effect on the
quality of data, surprisingly little research has compared the reliability of different
approaches to reporting live CO data. Our paper draws from a large-scale study that used a
CO tool in first and second grade classrooms. The tool used in this study contains a
combination of items that involve 1) discrete behavior coding and 2) global ratings of
classroom processes. The same observer completed both types of ratings for each CO and
observers collected data in classrooms in more than one region. We compare the reliability
of these two approaches by decomposing the variance in each type of rating across regions,
classrooms, and observers to determine how much of the variance in scores was
attributable to the observer and how this varies by observation approach (behavior coding
versus global ratings). Because the content of items differs both within and across the two
approaches, we also examine the extent to which content influences reliability. Our findings
contribute to the field’s understanding of issues surrounding the selection and development
of observational tools and facilitate the collection of higher quality data when using COs.
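
As one illustration of the variance decomposition described above, the Python sketch below fits a mixed model with variance components for classrooms and observers within regions. The data layout and variable names (rating, region, classroom, observer) are hypothetical, and the authors do not specify this software or this exact model.

    # Sketch: decompose rating variance into region, classroom, and observer components.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("observation_ratings.csv")        # hypothetical: one row per rating
    vc = {"classroom": "0 + C(classroom)", "observer": "0 + C(observer)"}
    model = smf.mixedlm("rating ~ 1", data=df, groups=df["region"], vc_formula=vc)
    result = model.fit()

    # The estimated variance components show how much of the score variation
    # is attributable to observers versus classrooms within regions.
    print(result.summary())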
13. An Application of Network Analysis for Mapping the Structure and Evolution of an e-
Journal
Kumar Rao, The Nielsen Company; Kirby Goidel, Louisiana State University; Ashley
Kirzinger, University of Illinois Springfield; John M. Kennedy, Indiana University
Over the past decade, the changing landscape of scholarly publishing has transformed the way journal information is disseminated. While most journals nowadays offer digitized versions of their articles for access on the Web, some have gone a step further by publishing articles online only, rather than in print. One such electronic journal, or e-journal, is Survey
Practice (SP). Established in 2008, the mission of SP is to provide current information on
issues in survey research and public opinion that is useful to survey and public opinion
practitioners, new survey researchers, and those interested in survey and polling methods.
The articles in Survey Practice emphasize useful and practical information designed to
enhance survey quality by providing a forum to share a) advances in practical survey
methods, b) current information on conditions affecting survey research, and c) interesting
features about surveys and people who work in survey research. In this study, we use
sophisticated network analysis techniques to map the structure and evolution of SP over a
four-year time frame. In this effort, we formulate and investigate the following research
questions: How has the collective scholarly knowledge of SP grown over time? Are there
any trends that are shaping the overall knowledgebase in SP and is this in line with SP’s
mission? What major areas and topics of survey and public opinion research have been
addressed in SP, and how have they evolved across years and how are they interlinked to
one another? Have certain segments of published articles (such as sponsored/funded research) in SP evolved differently from the overall journal when it comes to their intersection with certain popular research themes and topics? The answers to these questions would provide
a basis to study the impact of SP in promoting research on issues in survey research and
public opinion.
14. Who Knows: Question Format, Don’t Know Discouragement, and Estimates of
Political Knowledge as a Dependent and Independent Variable
Joshua Robison, Northwestern University, Political Science Department
Political knowledge is a key dependent and independent variable. However, there is
considerable debate concerning how best to measure this concept, particularly as it relates
to open-ended versus multiple choice formatted questions and the encouragement of don’t
know responses. I report data from a survey experiment contained on the ANES’ EGSS3
survey, conducted in December 2011, which sheds further light on these questions. The
evidence supports three key findings. First, discouraging don’t know responses via an introductory text does not substantially increase estimates of political knowledge. Second,
opting for a multiple-choice format rather than open-ended questions has a substantial
impact on estimates of how knowledgeable the mass public is and who is knowledgeable.
Purported knowledge gaps based upon political interest, education, and race are all
ameliorated when using a multiple-choice knowledge question rather than an open-ended
one. Third, these design elements influence the purported relationship between knowledge
and stereotype holding. While knowledge is negatively related to stereotype holding
against African-Americans, Hispanics and Muslims when an open-ended format is used, this
relationship is null when estimates of knowledge from multiple-choice questions are used
instead. The results reported in this article are thus highly relevant for the measurement and
use of a critical variable in political analyses (knowledge about politics).
15. The Results of Usability Testing of a New Online Consumer Expenditure Web Diary
Kathleen T. Ashenfelter, U.S. Census Bureau; Marylisa Gareau, U.S. Census Bureau
Two rounds of usability testing were conducted on a prototype version of the Consumer
Expenditure (CE) Web Diary Survey from January-July 2012. The CE diary examines the
buying habits of people in the United States and the products and services that are bought
by people in this country. Respondents to the CE Web Diary in the field complete the diary
for two weeks for all of the items for which they spend money. In Round 1, three treatment
groups were based on the way the expenditures were organized when presented to the
participants. Group 1 had the expenditures organized by day and person, Group 2 had the
items organized according to the type of purchase, and Group 3 had the items with no
organization. We hypothesized that Group 2 would be the most efficient and satisfactory
way to organize the expenses and that Group 3’s lack of structure would make it the least
popular. The results showed that Group 2 was the most preferred organization style and
Group 3 the least preferred, as predicted. Participants also had trouble with categorizing
some items and over-reported alcohol purchases. In Round 2, participants were given
“receipts” of the “purchases” to enter instead of narrative lists. Each participant was
randomly assigned to one of three sorting conditions: Group 1 was instructed to sort the
receipts in any way they liked, Group 2 was instructed to sort the receipts into the 4 CE
Diary categories, and Group 3 was given no instructions. The results showed that many
Group 1 and Group 3 participants ended up sorting receipts into the Group 2 categories
after learning of the diary format. Participants still had trouble categorizing some purchases and continued to over-report alcoholic beverages. There were some significant effects by age in
both rounds that will be discussed.
16. Did the First Presidential Debate Really Matter? Evidence From the 2012 NORC
Presidential Election Study
Rene Bautista, NORC at the University of Chicago; Tricia McCarthy, NORC at the
University of Chicago; Kirk Wolter, NORC at the University of Chicago
One of the most salient events in the recent presidential election campaign, where the
candidates discussed public policy issues, was the first presidential debate, held in Denver
on October 3, 2012. Political and media analysts suggested that the candidates’
performance in this debate had an effect on vote choice; however, little evidence was
presented to support the argument. NORC at the University of Chicago conducted a pre-
and post-election nationally representative survey of public opinion, with a focus on public
policy issues, including the economy and the Affordable Care Act. The pre-election portion
was fielded between September 24 and October 19. The fact that the debate took place in
the middle of the fieldwork period represents a natural experiment that creates an
opportunity to examine the impact of the debate on voter preference. Using multivariate
regression techniques, this paper will use ‘day of interview’ as an instrumental variable to
determine a potential effect of the debate, controlling for demographic characteristics, public
policy knowledge on health issues, economic evaluation, and party identification.
Additionally, NORC will collect actual vote choice among respondents who provided
consent. Regression models will incorporate such data to the extent possible.
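
A simplified reduced-form version of the design described above can be expressed as a regression of candidate preference on a post-debate indicator derived from the day of interview, plus controls. The Python sketch below uses hypothetical variable names and is not the authors' instrumental-variable specification.

    # Sketch: does stated candidate preference shift after October 3, 2012,
    # controlling for demographics and party identification? Names are illustrative.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("norc_preelection.csv")                    # hypothetical input file
    df["post_debate"] = (pd.to_datetime(df["interview_date"]) >= "2012-10-03").astype(int)

    model = smf.logit("prefers_romney ~ post_debate + age + C(education) + C(party_id)",
                      data=df)
    print(model.fit().summary())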
17. Social Network Analysis and Survey Response: How Facebook Data Can Supplement
Survey Data
Adam Sage, RTI International
Social networking sites like Facebook and Twitter have resulted in the emergence of a type
of data that is under-explored in the field of public opinion and survey research. Social network data consists of objects (typically people or groups) and the ties between the objects (e.g., relationships or transactions). Previously, obtaining these data to conduct
thorough social network analysis was often impractically time consuming and costly. But
increases in the ability to efficiently access such data have raised the potential for
investigating new methods of analyses that may supplement current survey data, or
otherwise fill holes in extant research where traditional analysis is limited. Questions such as how objective measures of one’s network differ from self-reported measures of such relationships, or how information flows and one’s social context influence individual perceptions and thus survey results, are just a few examples of how survey and public opinion researchers might find value in social media and other Web 2.0 concepts. This
paper demonstrates 1) how Facebook user data can be obtained through an application and
utilized to reconstruct social networks, 2) how similar data scaled to a user’s entire network
can be analyzed to understand the formation of opinions, attitudes, and behaviors, and 3)
how social network analysis of data native to social networking sites (e.g., a Facebook
friendship) can enhance the interpretation and precision of such data when used to
supplement survey data. Specifically, I describe an approach to developing a Facebook
application that obtains friendship data from users, processes for obtaining a user’s entire
Facebook friendship network, and how I analyzed my personal social network on Facebook
to produce measures. I then discuss how social network analysis techniques, such as
cluster analysis and clique identification, can be used to supplement and provide precision
to survey data.
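
For readers unfamiliar with these techniques, the Python sketch below shows clique identification and community (cluster) detection on a friendship edge list using networkx. The input file is hypothetical, and the paper's own application pipeline is not reproduced here.

    # Sketch: clique identification and community detection on a friendship network.
    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities

    G = nx.read_edgelist("friendship_edges.txt")         # hypothetical file of friend pairs

    cliques = [c for c in nx.find_cliques(G) if len(c) >= 4]    # tightly knit friend groups
    communities = greedy_modularity_communities(G)              # broader clusters of friends

    print(f"{len(cliques)} cliques of size 4+, {len(communities)} communities")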
18. Numbers, Numbers on the Dial, Which is the Fairest One on File? Cell or Landline?
Home or Work? Findings from an ABS Longitudinal Study
Anna Fleeman, Abt SRBI; Tiffany Henderson, Abt SRBI; Patricia Vanderwolf, Abt
SRBI; Kenneth J. Ruggiero, Medical University of South Carolina
Abt SRBI used an address-based-sampling (ABS) frame to select more than 200,000
addresses for a project fielding from 2011 through 2013. The use of an ABS was a result of
landline RDD coverage issues and the need to target precise geographies. After addresses
were selected, phone numbers were appended, if a match was available. Addresses able to
be matched to a phone number were called, asked phone status (e.g., cell-phone-only or
dual user), and screened for eligibility (12 to 17 in HH). Addresses unable to be matched to
a phone number were sent a letter and questionnaire screening for presence of a 12- to 17-
year-old and requesting contact information; two phone numbers were elicited. All available
or provided phone numbers were dialed; if households were eligible, baseline phone
interviews were conducted. Four and twelve months after completing the baseline,
households were then re-contacted for follow-up phone interviews. If needed, all available
numbers were called to maximize contact and response. We speculate that as cell-phone-
only households are increasing, so too are those providing a “work phone,” which is typically
a landline, on which to contact them. Presented findings will include the percentage of “work
phones” provided, along with working number rates and response rates by matched phone
status and type of phone (cell or landline, work or home) for all contacts during baseline and both follow-up interviews. As the reliance on ABS increases in the survey research field, knowing the best phone number on which to reach respondents is of critical importance. Further, the results provide insight into the retention, contact, and response rates of studies relying on an ABS frame.
19. Early Grade Reading Assessment – Using Tablet Technology and Efficient Survey
Methodology in Developing Nations
Karol Krotki, RTI International; Michael Costello, RTI International
Implementing a standardized national survey in many countries poses challenges, not the
least of which is how to incorporate national and sub-national cultural differences into the
survey design framework. RTI International has developed a procedure for designing and
implementing such a survey, EGRA/EGMA, Early Grade Reading/Mathematics Assessment
to measure changes in education attainment across time and to compare countries and
subpopulations within countries. The process also collects contextual demographic,
socioeconomic, and education data to help in the analysis. In this presentation we describe
how RTI has streamlined the sampling, data collection, data processing, and analysis to
make each iteration efficient in its implementation and effective in producing the desired
results. The data collection is carried out on tablets enabling standardized assessment,
automated data correction, and speedy and error-free data transfer to a centralized server.
This approach is now being widely used in many international education project evaluations
and demonstrates how technology and good design can facilitate survey research in even
very challenging circumstances.
20. Online Panels: Recruitment Based on “Hot Topics” – What are the Consequences?
Maria Andreasson, University of Gothenburg; Johan Martinsson, University of
Gothenburg
Cost-efficient and representative recruitment to online panels is a persistent challenge for
commercial enterprises and academic research alike. One method that is sometimes used is to highlight that the panel or the survey in question concerns a “hot topic” that most people are likely to find involving. This method can be exploited with both probability-based and opt-in recruitment. This study compares the consequences of “hot topic” recruitment for opt-in recruitment and for probability-based recruitment. During the summer of 2012, four different surveys were fielded by the University of Gothenburg concerning a local “hot topic”: the introduction of congestion charges around the city of Gothenburg. In total, four different surveys are compared: one from a pop-up ad on
the major local daily newspaper website concerning the congestion charges, one survey to
an opt-in sample from a general recruitment to the University of Gothenburg online panel,
one probability sample from a postal invitation highlighting the issue of the congestion
charges, and finally one probability sample from a general postal invitation to participate in
an online panel. The outcomes that are compared include: recruitment rates, cost-efficiency,
demographic and attitudinal representativeness. Special attention is paid to the hypothesis that “hot topic” recruitment might help recruit those who are normally not interested in social or political issues, which might improve sample representativeness.
21. Relative Exposure: A Field Experiment Exploring the Influence of Public Opinion
Polling Data on Voter Preference
Heather Knappen, Rochester Institute of Technology
This poster will present original research from a field experiment conducted on New York’s
25th Congressional District race from July-August, 2012. A random sample of 200 registered
voters was invited to participate in an experiment to determine whether exposure to public
opinion polling data influences a voter’s preference for the candidate leading in a poll. The
experiment was conducted in three stages; first, each voter was called with a telephone poll
to establish a baseline level of support for each congressional candidate. Next, half of the
sample received one mailing and one robo-call with opinion polling data that showed one of
the candidates clearly leading the other (59%-41%). A second telephone poll was used to
determine whether voters who received the polling data were more inclined to support the
candidate leading in the poll compared to voters who did not receive the polling data. The
results from this experiment address critical questions about the influence of opinion polling
data on voter preference. Although previous experiments have also addressed these
questions, many of these studies have been concentrated in the laboratory. The benefit of
conducting a field experiment is to provide a more realistic assessment of the influence of
opinion polling data on voter preference within the context of a live campaign. Finally, this
field experiment is one of the first to employ registration-based sampling for telephone
survey research. The poster presentation will discuss several potential benefits of this
sampling technique. Examples include improvements to polling analysis since voter files
provide detailed demographic and voting histories, making it possible to more accurately
identify “likely voters”. As the AAPOR community pursues a more sustainable future for
public opinion research, this field experiment provides a good case study for the use of this
sampling technique.
22. How Spending Money Can Save You Money: The Impact of Incentives on Speed of
Response
Jennifer E. O’Brien, Westat
The effects of incentives on various aspects of survey administration continue to be an active area of research. Features of the incentive such as the type (monetary vs. nonmonetary),
amount (for monetary incentives), and the timing of the offer (prepayment vs. promised)
have resulted in a few well-documented effects. Decades of research have demonstrated
that, all else being equal, incentives increase participation rates and reduce refusal rates,
cash incentives are more effective than non-cash incentives, prepayment of incentives is
more effective than promised payment, the impact of incentives is greater in surveys with
few, if any, other reasons to participate, and large incentives are not needed to recruit lower-
income respondents (Singer & Bossarte, 2006). Another, less well-documented, impact of
the use of incentives concerns its influence on the speed of response. A handful of studies
have observed that not only do incentives increase response rate, they also increase the
speed of response (Czepiec, Landers, Hopkins, & Young, 1998; Gajraj, Faria, & Dickinson,
1990; Goldenberg, McGrath, & Tan, 2009; Hansen, 1980; Shettle & Mooney, 1999; Singer,
Van Hoewyk, & Maher, 2000.) The results of the present study replicate this observation in a
multi-mode, nationally representative household survey. In addition, we observed that the
small monetary incentive used in this study not only resulted in faster submission of
completed surveys but faster refusals as well. Thus, cases that were resolved quickly (whether completes or refusals) were removed from follow-up efforts, resulting in savings to the study. In addition to presenting descriptive statistics, we will also present a
cost savings analysis.
23. Well, Not Well, or Not Well at All? Evaluating American Community Survey (ACS) Data
on School-Age Children Who Speak English With Difficulty
Angelina N. Kewal Ramani, American Institutes for Research; Amber Noel, American
Institutes for Research
In 2010, approximately 22 percent of school-age children spoke a language other than
English at home. As the U.S. population becomes more diverse, collecting accurate data on
language use and language ability is increasingly important. Several studies report on the
language proficiency of school-age children. The American Community Survey (ACS)
reports on children who speak a language other than English at home and how well these
children speak English. The U.S. Department of Education reports on the number of English Language Learners (ELL) in public elementary and secondary schools. These two sources paint different pictures of school-age children with English language difficulties.
Over the past five years, the percentage of school-age children who speak English with
difficulty has remained steady (ACS), while the percentage of ELL students has consistently
increased (Department of Education). This paper will examine these puzzling findings and
evaluate the reliability of ACS language ability estimates for school-age children. The ACS
includes a three-part question on language use and ability. Respondents who speak a
language other than English at home are asked to assess how well they (or their children)
speak English, either “very well,” “well,” “not well,” or “not at all.” Generally, respondents who
report speaking English less than “very well” are considered to have some difficulty
speaking English. The Department of Education’s Common Core of Data (CCD) identifies
ELL students based on limited English language ability; therefore, these students speak
English with some difficulty. In addition, the Department of Education Office for Civil Rights
(OCR) collects more detailed information on ELL students including gender and
race/ethnicity. ACS estimates for 2006 and 2011 will be compared with CCD and OCR data.
Analyses will be conducted by gender and race/ethnicity. The research will reveal whether
the ACS provides reliable estimates of language ability.
24. Page Reduction Experiment with Diverse Populations
Stephanie Lloyd, Center for Survey Research, University of Massachusetts Boston;
Carol Cosenza, Center for Survey Research, University of Massachusetts Boston;
Lee Hargraves, Center for Survey Research, University of Massachusetts Boston
Although it is increasingly common for health care organizations to survey their patients to
assess the patient-centered care they provide, there is consistent pressure to minimize
survey costs. Given the increasing printing and postage expenses associated with mailing
paper questionnaires, one proposed way to lessen cost burdens is to minimize the number
of pages in self-administered mail questionnaires, often by compressing text and formatting.
Building on a previous experiment, the current project tested a CAHPS® Clinician & Group
(CG-CAHPS) questionnaire formatted to reduce its length from 12 to 4 pages to examine
effects on data quality with different sample groups. The two groups in this experiment that were administered test questionnaires were: (1) Spanish speakers, i.e., sampled health plan members who requested Spanish materials, and (2) adults who were asked to respond about a sampled child. The survey, of which this study was a part, was funded by
the Agency for Healthcare Research and Quality (AHRQ). All samples were drawn from a
Medicaid population who were randomized to self-administer either the 12-page standard
(Spanish and Child) or one of the test versions (n=500). Both the Spanish and Child test
questionnaires were 4-page versions of the standard, using CAHPS guidelines, with the
introduction and instructions at the top of the first page. A standard 3-contact mail
administration protocol was followed. The current paper seeks to understand to what extent
the cost savings associated with reducing the number of pages has an adverse effect on the
quality of the resulting data. Response rates, item nonresponse, substantive differences in
answers between horizontal and vertical presentation of response alternatives, and mean
CAHPS composite and rating measures will be compared across study arms. This study
was in the field until September 2012, and the data will be analyzed by early 2013.
25. Putting a Little Religion Into Volunteer Activity
Robert K. Goidel, Louisiana State University; Belinda Davis, Louisiana State
University
This paper began as a puzzle. Why did our state-level estimates of volunteer activity in Louisiana differ so dramatically from CPS estimates? According to the CPS, only 1 in 5 Louisiana adults (20.9 percent) engage in volunteer activity. Our state-level estimates from the 2012 Louisiana Volunteer Study (LVS), in contrast, place the number at just under half of
all Louisiana adults. To understand the nature of these differences, we conducted a survey
experiment in which respondents were asked either the CPS versions of the volunteer
questions or the questions we have routinely asked. In the first part of the experiment, we
tested the effect of including church as one of the organizations included in our standard
volunteer question (listed below).
VOL-1A: Have you done any volunteer activities in the last 12 months? I'm
asking about activities for which you were not paid, except perhaps expenses,
that you did in your neighborhood, or in other neighborhoods, at schools,
churches, or for a volunteer organization.
VOL-1B: Have you done any volunteer activities in the last 12 months? I'm
asking about activities for which you were not paid, except perhaps expenses,
that you did in your neighborhood, or in other neighborhoods, at schools, or for a
volunteer organization.
In the second part of the experiment, we directly compared the CPS question wording to the
LVS wording. The preliminary results indicate: 1) We estimate higher rates for volunteer
activity even when we use the CPS question wording. This likely reflects other differences in
terms of survey context, e.g., the LVS introduction cues respondents into the focus of the survey. It may also reflect differences in response rates or some combination of nonresponse and subject matter; 2) Adding churches to the list of organizations in the LVS question
significantly increases the volunteer rate.
26. First Contact Strategies for Web Surveys: Is a Phone Call or a Letter the More
Effective Introduction?
Jill Connelly, NORC at the University of Chicago; Micah Sjoblom, NORC at the
University of Chicago; A. Rupa Datta, NORC at the University of Chicago; Peter
Hepburn, NORC at the University of Chicago
The objective of the National Survey of Early Care and Education (NSECE) is to document
the nation’s current use and availability of early care and education, and to deepen our
understanding of the extent to which families’ needs and preferences coordinate well with
providers’ offerings and constraints. The NSECE included a survey of home-based child
care providers who were licensed or otherwise registered with state agencies. The survey
included Web data collection, with phone or in-person follow up as needed. Individuals who
provide care to children in a home-based setting tend to be older or lower-income or in other
demographic subgroups that have lower Internet usage rates. In order to encourage
participation by Web, a $35 gift card was offered for completing the interview online. We had
phone numbers, but no mailing or e-mail addresses for sampled individuals. We designed
an experiment with 1,300 providers to test whether it would be more efficient to 1) send a
letter or e-mail as a first contact based on locating efforts that didn’t involve personal contact
with the respondent, or 2) make a cooperation-gaining phone call first to introduce the study
and then request mailing or e-mail information to send the Web survey request. Our
evaluation includes comparisons of effort required, success rates in reaching respondents
through initial contact attempts, cooperation with the initial request, and final cooperation
rates.
27. How Did the 2012 U.S. Presidential Campaign Season Affect Media Consumption and
Behavior?
Daniel Hutchison, Arbitron, Inc.
U.S. Presidential Election campaign seasons have several key media events including both
Television and Radio coverage of the National Conventions and the Presidential debates.
News broadcasts during the weeks leading up to the election carry coverage of campaign
efforts and culminate with Election Day coverage. Additionally, the overall environment of
the 2012 campaign season was shaped by a high utilization of political advertising, often
negative, throughout the nation. Specific states and markets received varying levels of this
advertising. Specifically, metros including districts with races for seats in the House of
Representatives and the Senate, perceived to be of “High-Value” by the national political
parties, attracted a higher level of advertising support from the parties themselves and from
independent Political Advocacy Coalitions as well. This led to an intensity of campaign-
related advertising that many perceived to be excessive. Arbitron PPM is a system that passively collects Radio and Television media use over time among an ongoing panel of respondents. This system replaced the traditional paper Radio and Television self-report diaries previously used in the 47 top U.S. metros. This paper will explore the effect of specific media events and news broadcasts and, to a lesser degree, of political advertising on Radio listening and Television viewing behavior. Results will include comparisons of overall
listening and viewing before and following the campaign season across the 47 metros
measured by Arbitron’s PPM system. Analyses of listening and viewing will be presented by
radio station format and for stations carrying the special media events overall and by age,
gender, and racial groups. Reviewing these results will add to our knowledge of the potential
impact of the media on public opinion during this election season.
28. Crowd Coding: Increasing the Time and Cost Efficiency of Common Research Tasks
Michael Jugovich, NORC at the University of Chicago; Patrick Van Kessel, NORC at
the University of Chicago
In recent years, many online crowdsourcing platforms have been developed and now
provide organizations with the opportunity to outsource simple yet labor-intensive tasks to a
large pool of individuals around the world. With the advent of services such as Amazon
Mechanical Turk, which offers the ability to easily create Human Intelligence Tasks for
distribution across a user base active both during and after normal business hours,
researchers now have the ability to leverage crowdsourcing technology to alleviate some of
the costs associated with straightforward coding tasks traditionally allocated to in-house
resources. Seizing this opportunity, NORC has developed a “crowd coding” software
package that allows researchers to quickly deploy custom assignments to Mechanical Turk,
with applications for research projects involving not only traditional designed data but
organic data as well. Examples include sentiment and relevancy analysis of social media
data, and the rapid and inexpensive construction of context-specific training datasets for
machine learning algorithms to be deployed on Big Data collections. This presentation will
focus on a series of case studies that explore the effectiveness of crowd coding compared
to traditional manual coding, measured across three dimensions: time, cost, and accuracy. It
will conclude with a discussion of the pros, cons, and potential future applications of the
technology.
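
One simple way to operationalize the accuracy comparison described above is to aggregate multiple crowd workers' codes by majority vote and measure agreement with in-house codes. The Python sketch below uses hypothetical file and column names; it does not represent NORC's crowd coding software.

    # Sketch: majority-vote aggregation of crowd codes and agreement with in-house coding.
    import pandas as pd
    from sklearn.metrics import cohen_kappa_score

    crowd = pd.read_csv("mturk_codes.csv")        # hypothetical columns: item_id, worker_id, code
    gold = pd.read_csv("inhouse_codes.csv")       # hypothetical columns: item_id, code

    majority = (crowd.groupby("item_id")["code"]
                     .agg(lambda s: s.mode().iloc[0])           # modal (majority) code per item
                     .rename("crowd_code")
                     .reset_index())
    merged = gold.merge(majority, on="item_id")

    accuracy = (merged["code"] == merged["crowd_code"]).mean()
    kappa = cohen_kappa_score(merged["code"], merged["crowd_code"])
    print(f"agreement={accuracy:.2%}, kappa={kappa:.2f}")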
29. Use of Paradata to Predict Participation in a Randomized Control Trial Intervention
Harmoni Noel, American Institutes for Research; Simone Robers, American Institutes
for Research; Grace Wang, American Institutes for Research; Alex Ortiz, American
Institutes for Research; Amy Windham, American Institutes for Research; Steven
Garfinkel, American Institutes for Research; Kristin Carmen, American Institutes for
Research
Paradata is increasingly being used to monitor response rates, conduct respondent
validation, evaluate interviewer performance, and determine cost efficiencies in the survey
administration process (Kreuter, Couper, & Lyberg, 2010). Paradata has also been used in
an adaptive design framework to tailor interventions to a subgroup of the sample to achieve
higher response rates (Couper & Wagner, 2011). This paper uses data from a randomized
control trial study with a pre/post survey intervention design to examine the use of paradata
such as pre-survey completion time as an indicator of likelihood to participate in the
intervention. The authors hypothesize that participants with shorter response times are less
likely to attend. If this hypothesis is supported, it suggests that survey completion could be
used in multi-stage research to tailor follow-up strategies to increase participation in
subsequent stages. In this study 1,747 participants were randomized into four experimental
conditions or the control group across four locations. The four experimental conditions
represent four methods for conducting public deliberations. These deliberations are
designed to obtain informed perspectives on complex topics similar to those that arise
frequently with respect to healthcare and health research decision making. The four
methods have varying levels of respondent burden, vary between in-person and online
formats and have varying attendance rates. Participants were recruited into the study before
pre-survey administration which led to a high overall response rate of 94%. First, the authors
will conduct a non-response bias analysis to compare intervention response rates by
respondent characteristics (race/ethnicity, gender, age, occupation, and level of education),
recruitment location, and experimental method to see if response propensity varies by these
subgroups. Second, the authors will examine whether pre-survey completion time is related
to subsequent participation in the intervention and whether different variables interact with
the potential effect of completion time.
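
The hypothesis above lends itself to a logistic regression of intervention attendance on pre-survey completion time, allowing the effect to vary by deliberation method. The Python sketch below uses hypothetical variable names and is not the authors' final model.

    # Sketch: does pre-survey completion time predict attendance at the deliberation?
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("rct_paradata.csv")          # hypothetical: one row per randomized participant
    model = smf.logit("attended ~ completion_minutes * C(method) + C(location) + age + C(education)",
                      data=df)
    print(model.fit().summary())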
30. Designing Questions to Measure Number of Sex Partners Among At-Risk Youths in
ACASI (Audio Computer-Assisted Self-Interviewing)
Kerryann DiLoreto, University of Wisconsin Survey Center; Jennifer Dykema,
University of Wisconsin Survey Center; Jessica Price, University of Wisconsin Survey
Center; Nora Cate Schaeffer, University of Wisconsin Survey Center
A central concern for questionnaire designers is how to design questions to accurately
measure the frequency of sensitive behaviors. For interviewer-administered surveys, past
research indicates that higher reports of sensitive behaviors may be obtained using: open
(versus closed) questions (e.g., Blair et al. 1977) and audio computer-assisted self-
interviewing (ACASI) (versus interviewer administration) (e.g., Turner et al. 1998). Little
research, however, explores how differences in question wording affect responses using
ACASI. We implemented an experiment using ACASI in which respondents reported the
total number of sex partners they had in their lifetimes and the last year using one of three
question formats: (1) closed-low frequency (from Add Health) that used the categories 0, 1-
2, 3-4, or 5 or more partners; (2) closed-high frequency that used the categories 0, 1, 2, 3-4,
5-6, 7-8, 9-10, or 11 or more partners; and (3) open-total frequency that allowed
respondents to enter a value for the total number of partners. Data are provided by the
Midwest Young Adult Study, a longitudinal in-person study of young adults transitioning out
of foster care. Current data are from Wave 5 (2010-2011) in which 82% of the baseline
sample (n = 590) were re-interviewed. This hard-to-reach population is characterized by
high engagement in behaviors with negative consequences and low literacy levels. While we
find no differences among the question formats in reporting about sex partners in the past
year, the open-total format is associated with lower reporting among men and higher
reporting among women for lifetime partners, and less missing data than the closed formats.
These results are consistent with research assessing the quality of reporting about sexual
partners which finds that men overreport and women underreport sex partners (Laumann et
al. 1991), and add to the body of research that recommends using open questions to
measure sensitive behaviors.
31. Household Composition and Child Wellbeing: Using Quantitative Data to Construct
Narratives to Inform a Research Agenda
Catherine C. Haggerty, NORC at the University of Chicago; Kate Bachtell, NORC at the
University of Chicago; Nola duToit, NORC at the University of Chicago; Ned English,
NORC at the University of Chicago
Due to the deinstitutionalization of marriage, high levels of divorce, and an increased
acceptance of cohabitation and single parenthood, there is a changing array of families in
American households (Stacey 1996, Thistle 2006). Du Toit et al. (2011) used data from two
waves of the Making Connections Survey, a study of disadvantaged urban communities, to
examine different types of households, the extent of change in household composition, and
differences in the effect of various household structures on a variety of economic measures
of child wellbeing. They observed large proportions of households that do not fit the
traditional nuclear family model and are not accounted for in conventional family studies.
These non-traditional households differ along several measures of economic wellbeing.
Changes in the composition of these different households further impact their economic
stability and, therefore, child wellbeing. Building on this quantitative research and using
quantitative data, we used a grounded qualitative approach to develop case studies of four
types of households: two-parent, single-parent, non-parent, and extended-family households
to further explore the characteristics of distinct household types, how they changed over
time, and how their unique qualities impacted child wellbeing. This methods brief presents
the process of developing these case studies to further explore the characteristics of these
distinct household types which informed the next steps in our research agenda.
32. Oversampling Young Adults on Cell Phones
Randal ZuWallack, Abt SRBI; Thomas Duffy, RTI International; Matthew Denker, Abt
SRBI
Young adults are often a key research group in public health and public safety surveys.
Many research organizations, such as the National Highway Traffic Safety Administration,
conduct surveys with oversamples of this age cohort to ensure sufficient data to analyze
driving behaviors and attitudes. In a recent national survey, nearly 40% of the cell phone
interviews were with respondents under age 35; the same survey yielded young adults less
than 10% of the time on landlines. It is clear that cell phones are an efficient method for
increasing the sample size for young adults. We conducted a cost-benefit analysis to
determine the best sampling design when young adults are a subpopulation of interest.
Optimal allocations that account for landline and cell cost differentials are not optimal for
reaching this population because the costs will favor the landline sample, resulting in an
undersample of young adults. We compare costs and benefits for three dual-frame designs: 1) one based on the overall optimal allocation, 2) one based on a screening oversample of young adults, and 3) one with a higher allocation to cell phones. All designs are based on a fixed
cost and compared on the overall sample size, the sample size of young adults, and the
resulting design effects.
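The core trade-off can be illustrated with a minimal Python sketch that spends a fixed budget across the two frames and counts the expected young-adult completes. The budget, per-complete costs, and allocation shares below are assumptions for illustration only (the 40% and 10% young-adult yields come from the abstract), and the sketch contrasts only a cost-driven allocation with a higher cell allocation; a screening oversample would require additional screening-cost logic.

# Illustrative only: budget, costs, and allocation shares are assumed values,
# not figures reported by the authors.
BUDGET = 100_000          # fixed data-collection budget (assumed)
COST_LANDLINE = 35.0      # cost per completed landline interview (assumed)
COST_CELL = 55.0          # cost per completed cell interview (assumed)
YOUNG_LANDLINE = 0.10     # share of landline completes under age 35 (from abstract)
YOUNG_CELL = 0.40         # share of cell completes under age 35 (from abstract)

def evaluate(share_of_budget_to_cell, label):
    """Spend the fixed budget with a given share allocated to the cell frame."""
    n_cell = BUDGET * share_of_budget_to_cell / COST_CELL
    n_landline = BUDGET * (1 - share_of_budget_to_cell) / COST_LANDLINE
    n_young = YOUNG_CELL * n_cell + YOUNG_LANDLINE * n_landline
    print(f"{label}: total completes = {n_cell + n_landline:5.0f}, "
          f"young-adult completes = {n_young:4.0f}")

evaluate(0.30, "cost-driven allocation")
evaluate(0.70, "higher cell allocation")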
33. Those are the Breaks: Incumbents, Challengers and the Distribution of Unallocated
Votes in Pre-Election Polls
Christopher P. Borick, Muhlenberg College Institute of Public Opinion; David G.
Wegge, St. Norbert College
In almost every case, pre-election polls contain a portion of voters who identify themselves
as “undecided” in terms of their candidate preferences in an upcoming election. In 2012, for
example, about 7% of pre-election poll respondents in Senate and gubernatorial election
polls conducted in the week before the election identified themselves as undecided in terms
of their voting plans. Of course on Election Day those undecided voters either select a
candidate or decide not to vote at all, leaving no voters unallocated in the final results. So
how do the undecided voters in pre-election polls break in terms of their ultimate decision?
For many years there was evidence that most unallocated votes broke towards challengers
in statewide races. However in the last decade it appears that there has been an increasing
share of unallocated votes being captured by incumbents seeking reelection. In this paper
we examine the unallocated vote in the 2012 election and the role that incumbency, party
affiliation and other candidate characteristics played in terms of the distribution of the
unallocated voters in the final results of Senate, House and gubernatorial races.
34. God, Money, Politics & Science: The Role of Religion, Conservative Economic and
Liberal Social Attitudes on Perception of Science in the Last Weeks of the 2012 U.S.
Presidential Election
Kristin Runge, University of Wisconsin – Madison
This study uses a two-wave panel design to examine the effects of perceptual filters in
predicting science-related opinion and media use during the weeks immediately prior to and
after the 2012 U.S. Presidential election. The first wave was conducted in the two weeks
prior to the first candidate debates (September 25, 2012 through October 8, 2012), and the
second wave was conducted after the election (field dates November 14, 2012 - November
21, 2012). A total of 1,401 respondents were segmented into 4 attitudinal clusters based on
religiosity, liberal/conservative economic attitudes and liberal/conservative social attitudes.
In our preliminary analysis of the first panel wave, we find that respondents clustered into
one of four segments: 1) high religiosity with conservative ideologies, 2) high religiosity with
liberal or moderate ideologies, 3) low religiosity with conservative ideologies, and 4) low
religiosity with liberal or moderate ideologies. Initial analysis indicates that response to “How
much guidance does religion provide in your everyday life?” is the strongest determinant of
cluster membership among the attitudinal bases variables. After controlling for demographic
characteristics, cluster membership predicts a number of items including likelihood of voting
for President Obama or Governor Romney, media habits, support for federal funding of
science, support for free market regulation of nanotechnology, benefit perception of
nanotechnology, synthetic biology and stem cell research, as well as trust in university
scientists, corporations, environmental organizations and religious institutions. Final analysis
will show how panel members voted and determine if attitudes and behaviors changed
during the final weeks of the election. Implications of results for media effects, science
communication and political communication research will be discussed.
35. Public Sentiments Online: New Tools of Measurement Combining Human- and
Computer-Based Coding
Leona Yi-Fan Su, University of Wisconsin – Madison; Xuan Liang, University of
Wisconsin – Madison; Nan Li, University of Wisconsin – Madison; Dietram A.
Scheufele, University of Wisconsin – Madison; Dominique Brossard, University of
Wisconsin – Madison; Michael Xenos, University of Wisconsin – Madison
The Internet provides researchers with a wide variety of tools for tapping opinion
expressions on Web-based platforms. This study uses a new content analysis method for
tapping opinion expressions in online Big Data environments. Based on a carefully
constructed keyword search about scientific topics, a series of Twitter posts are first
randomly pulled from publicly-available Twitter accounts. The selected content is interpreted
and analyzed by trained coders and then translated into appropriate categories.
Computational software (Crimson Hexagon) then extracts the linguistic patterns from the
coded examples and uses the resulting algorithms to track these patterns in every captured
tweet. In our method, human coders no longer serve as text-level analysts; instead, we
capitalize on human coders to extract sentiments and latent meanings from the
Tweets (equivalent to building a codebook in traditional content analysis) and use the
resulting algorithms to guide the computer-based analysis. In other words, computer
algorithms inductively determine the patterns of underlying content identified by human
coders, and then apply the learned patterns for large scale data processing. Our study also
provides empirical verification that this method can accurately analyze defined
communication content, sentiment and topics from large-scale datasets. Using nuclear
energy as one exemplar, we track public opinion expressed on Twitter before and after the
Fukushima Daiichi disaster in order to examine if variations in our sentiment coding reflect
changes consistent with these external influences. We also compare nuclear energy to other
scientific issues to demonstrate that our method accurately tracks public
sentiment across issues without introducing false positives or other biases. Our results
suggest that this method works well in capitalizing on the strengths of human coding in
terms of preserving sentiment validity while relying on computer-based coding to reliably
process large-scale data of online opinion expression (e.g., millions of Twitter posts).
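As a rough open-source analogue of this workflow (the Crimson Hexagon algorithm itself is proprietary and not described here), the sketch below trains a scikit-learn text classifier on a handful of invented, human-coded tweets and then applies it to an uncoded tweet; all tweets, labels, and parameters are illustrative assumptions.

# A sketch of the human-coded-training / machine-scaled-coding idea using
# scikit-learn; the tweets and labels are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Step 1: human coders assign sentiment categories to sampled tweets
# (the equivalent of building a codebook in traditional content analysis).
coded_tweets = [
    "Nuclear power is the cleanest option we have",
    "Fukushima shows nuclear energy is simply too risky",
    "Regulators reviewed the new reactor design today",
]
coded_labels = ["positive", "negative", "neutral"]

# Step 2: the algorithm learns linguistic patterns from the coded examples.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(coded_tweets, coded_labels)

# Step 3: the learned patterns are applied to every captured tweet at scale.
print(model.predict(["nuclear energy does not feel safe after Fukushima"]))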
36. Turnout Validation of Survey Respondents in New Jersey
Ryan Tully, Princeton University; Amy Lerman, Princeton University
It is commonly observed that self-reported voter turnout in surveys is substantially higher
than actual turnout. In previous studies, researchers have attempted to use government
records to validate self-reported voter turnout among individual survey respondents. More
recently, Berent, Krosnick, and Lupia (2011) attempted to validate self-reported voter turnout
among participants in the 2008 American National Election Study (ANES) using government
records. Their study found that the “success” of turnout exercises in previous studies may be
due to an inherent bias that “people who choose to participate in surveys also choose to
participate in elections at a higher rate than people who do not participate in surveys” (p. 8).
Our study expands on this initial finding by conducting a turnout validation exercise for a
series of surveys conducted in central New Jersey in 2011. In our analysis, we compared
respondents based on various aspects of survey participation, including those who
volunteered to participate in an online panel or volunteered personal information with those
who did not. Overall, we found that respondents who opted into the online panel or
volunteered personal information were significantly more likely to accurately report voter
turnout than those who did not. Furthermore, we also found that various demographic
characteristics, such as age, race, educational attainment, and income, correlated with
significant differences in the accuracy of self-reported voter turnout among survey
respondents.
37. Who is Really Ahead in Election Polls? Practical Guidance on Assessing the Gap
Between Two Candidates
Kien Le, Social and Economic Survey Research Institute, Qatar University; Abdoulaye
Diop, Social and Economic Survey Research Institute, Qatar University; Darwish
Alemadi, Social and Economic Survey Research Institute, Qatar University
In election poll results, the proportions favoring candidates and the survey sampling error
are usually reported. However, it is hard to assess if the gap between any two candidates is
statistically significant or not based on this information. This note provides an alternative
measurement of sampling error for this assessment purpose. We detail the calculation steps
in STATA and SPSS programs to handle polls based on simple random sampling and also
polls based on more complicated designs.
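For the simple-random-sampling case, the alternative error measure amounts to a standard error for the gap itself, which accounts for the negative covariance between the two candidates' shares. The sketch below shows the arithmetic in Python rather than the Stata/SPSS steps the authors detail, and the poll figures are invented for illustration.

# Minimal sketch: margin of error on the gap between two candidates under SRS.
# The sample size and proportions are assumed, not from any actual poll.
import math

n = 1000        # completed interviews (assumed)
p1 = 0.48       # proportion favoring candidate A (assumed)
p2 = 0.44       # proportion favoring candidate B (assumed)

# The two proportions come from the same multinomial sample, so
# Var(p1 - p2) = [p1(1 - p1) + p2(1 - p2) + 2*p1*p2] / n
se_gap = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2) + 2 * p1 * p2) / n)
moe_gap = 1.96 * se_gap

print(f"gap = {p1 - p2:.3f}, 95% margin of error on the gap = {moe_gap:.3f}")
# The gap is statistically significant at the 5% level only if it exceeds moe_gap.
# For complex designs, se_gap would be inflated by the square root of the design effect.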
38. Are Declining Response Rates Only a Symptom of a Bigger Problem?: Assessing
Trends in Survey Response Quality Between 2005 and 2013
Curtiss Cobb, GfK Knowledge Networks
In May 2012, Pew Research shocked the research community by publicly stating an already
widely known fact—there has been a general decline in response rates that is evident
across nearly all types of surveys. Pew offered as an example that its typical telephone
survey response rate in 1997 was 36% and is just 9% today. At the same time, greater effort
and expense are required to achieve even the diminished response rates of today. These
challenges have led many within and outside the survey research community to question
whether surveys are still providing accurate information. Non-response is only one way that
the quality of surveys may have changed over time. Unit non-response may merely be a
harbinger of the declining quality of responses even among those that do respond.
Satisficing behavior and item non-response may be increasing over time as well. Alternatively,
it is possible that non-responders were the “bad” respondents in previous years, leaving
only those who optimize their responses, so that measurable satisficing behavior has
decreased. Of course, measurement error and non-response may be completely unrelated.
This study seeks to examine trends in data quality to determine whether response quality is
also changing over time. We use 7 years of profile data from GfK’s probability-based
Internet panel, KnowledgePanel®. We test whether item non-response, breakoffs, straight-
lining, speeding, and other satisficing and sub-optimal response behaviors are increasing,
decreasing, or remaining constant over time. We explore these trends in general and within
demographic groups.
39. Measuring Parental Engagement With Children’s Schools
Beth Schueler, Harvard Graduate School of Education
Researchers have repeatedly demonstrated that parental engagement with schools is
associated with positive educational and social outcomes for children (Walker, Wilkins,
Dallaire, Sandler, & Hoover-Dempsey, 2005). However, to accurately measure parent
engagement, new tools are needed that take advantage of best practices in survey design.
This poster outlines the development of a survey scale to assess parent perceptions of their
engagement with their child’s school. We employed Gehlbach and Brinkworth’s (2011) 6-
step process for scale development that front-loads input from academics and potential
respondents during item-development to establish evidence for validity with regard to both
populations. First, we conducted a literature review to define the construct and identify
potential indicators. Second, we conducted open-ended interviews with diverse groups of
parents to learn how they conceptualized school engagement. Third, we systematically
compared literature review and interview results, noting distinctions in the language
academics and parents used. These findings informed our item phrasing. Fourth, we crafted
preliminary items that reflected key factors of engagement. Fifth, we subjected our items to
an expert review process regarding the relevance, comprehensiveness, developmental
appropriateness, and cultural appropriateness of our items (Rubio, Berg-Weger, Tebb, Lee,
& Rauch, 2003). Sixth, to ensure that parents understood our items as intended, we
conducted a “cognitive pretesting” procedure. We asked parents to rephrase the questions
in their own words and think aloud when answering the questions and then edited some
items for clarity. Finally, we conducted three studies with large national samples of parents
(n=385; n=253; n=531) to gather evidence of reliability and convergent/discriminant validity.
Through confirmatory factor analysis we identified a theoretically grounded factor structure
that fit the data well. The poster will describe the implications of our process for scale validity
and ways researchers and Pre-K – 12 schools can use the scale to aid school improvement
efforts.
40. The Case for Town Hall Debates: The Effects of the Press and Public Agendas on
Voter Acquisition of Campaign Knowledge
Jason Turcotte, Louisiana State University
An uninformed and unmotivated electorate has plagued American democracy for decades.
Americans know very little about their public officials and their stances on issues. With the
media devoting more attention to negativity, tactics, attack ads and horserace coverage,
voters have fewer avenues for learning about the candidates and substantive issues.
Political debates are one of those avenues, and perhaps serve as the only remaining
campaign event maintaining a mass audience. Using the 2008 National Annenberg Election
Survey data, this paper explores the relationship between exposure to the 2008 U. S.
presidential debates and political knowledge. More specifically, this project examines whether
survey data can reveal a more nuanced understanding of debate effects by extending
previous scholarship to account for differences in effects across formats. I hypothesize that
exposure to all three general election debates holds a positive relationship with political
knowledge but, also, that town hall debates – debates in which the electorate has a hand
in shaping the debate agenda – foster greater knowledge gains than traditional media-
moderated debate formats where the press serves as sole gatekeeper of the discourse.
After controlling for a number of other variables known to influence political learning and
political knowledge, I find support for both hypotheses. As criticisms of debate formats and
moderating grew even louder in the 2012 U. S. presidential debates, these findings hold
numerous implications for democratic process and offer some preliminary evidence that
more participatory debate formats may improve political knowledge.
41. Blogging Nanotechnology: Public Discourse Around Emerging Technologies in the
Blogosphere
Xuan Liang, University of Wisconsin – Madison
Communication environments in this information age are experiencing rapid changes and
the Internet emerges as one of the dominant channels for science information (“Science and
Engineering Indicators,” 2012). New media forms, such as blogs, forums and podcasts, can
serve as public spaces for audiences to share knowledge, develop ideas about science, and
interact with scientists in a timely fashion (Birch & Weitkamp 2010). This raises important
questions about the types of user-generated information and opinions surrounding emerging
technologies, such as nanotechnology, that audiences may encounter in blogs. In order to
explore the landscape of blog traffic about nanotechnology, we use computational
linguistic software to analyze a census of all English-language nanotechnology-related blog
posts generated between January 1, 2009, and October 31, 2012. Results of content
analysis and sentiment analysis on a total of 680,790 related posts show that
nanotechnology is depicted comprehensively and in a comparatively positive light in the
blogosphere. Overall, most of the blog posts presented information about nanotech related
consumer products, followed by discussion about business, national security, medicine,
EHS (Environmental Health and Safety), basic research and energy. Thirty-six percent of
blog posts expressed optimistic opinions, 32% expressed neutral opinions and 32%
expressed pessimistic opinions. Interestingly, we found that scientists’ latest research was
reflected in the perceivable fluctuations of some topics covered in the blog posts. Our results
have significant implications for the understanding of the open discourse of nanotechnology
in the blogosphere, and more importantly, how new media on the Internet reflects and
shapes public opinion of this emerging technology.
42. Is Deliberative Science Possible? Examining the Links Between Informational
Factors, Scientific Knowledge, and Attitude Extremity
Nan Li, University of Wisconsin – Madison; Dominique Brossard, University of
Wisconsin – Madison
In the past decades, U.S. citizens have increasingly been asked to engage in the decision-
making process related to science policy with a high level of public interest at stake. Those
who hold strong opinions about the issues at hand are more likely to participate in public
discussions and to express themselves openly. Studies have shown that the strength of
individual attitudes can be influenced by a variety of factors, including the heterogeneity of
networks and the nature of the information environment one is constantly exposed to. In this
study, we examine whether and how people’s attention to news and entertainment content
on mass media may influence the extremity of their attitudes toward the issue of nuclear
power. In addition, we test whether interactive online communication makes people develop
strong opinions on this issue. Using data from a nationwide online survey carried out in 2010
(N = 1,138), this study finds that higher levels of attention to news in newspapers and
television, as well as more frequent interpersonal talk, make people develop more extreme
attitudes toward nuclear power. The relationships between media use, interpersonal talk,
and attitude extremity, are mediated by the level of factual knowledge about this issue. In
contrast, interactive online communication is not significantly related to attitude extremity.
Results hence suggest an absence of the so-called “echo-chamber” effects of the Internet
regarding controversial scientific issues. In fact, people tend to develop extreme attitudes
toward nuclear power based on the knowledge that is obtained either from interpersonal talk
or newspaper and television news.
Friday, May 17
3:15 p.m. – 4:15 p.m.
AAPOR Demonstration Session #2
Mathematica’s Survey E-Tool: Assisting Third-Party Data Collection
Kristina P. Rall, Mathematica Policy Research
With the rapid advancement of data collection technology, including Web-based instruments
and handheld devices, it’s easy to lose sight of the continuing need by some organizations to
collect data the “old-fashioned” way, using paper questionnaires or basic Windows applications.
A growing number of such organizations are seeking technical assistance to conduct their own
surveys rather than contracting them out. However, they may lack funding for devices such as
laptops, tablets, and smartphones and may not have Internet access in locations where the
survey is conducted, limiting their ability to use the most modern survey modes. The Centers for
Medicare & Medicaid Services (CMS) faced this constraint in administering its Money Follows
the Person (MFP) project. MFP provides demonstration grants to 44 states to help them reform
their financing and service designs for long-term health care, the ultimate goal being to measure
the costs and benefits of transitioning some Medicare patients from institutional to community
care settings. The Quality of Life (QoL) survey is currently being conducted to evaluate MFP; in
administering the QoL, participating states follow patients from one care setting to another,
surveying them at several times in multiple locations. To help states gather and submit data for
QoL, Mathematica Policy Research developed the “Survey E-Tool.” This user-friendly database
streamlines the process of manually entering data collected on hardcopy forms and enables
states to transmit data through Gentran when an Internet connection is available. The tool is
programmed to account for different versions of Access used by state offices. Without this
standard system, data would be submitted as Excel files with no consistent layout, necessitating
additional time and expense merging files to conduct analysis.
Colectica for Microsoft Excel: Increasing Transparency Using Open Standards
Dan Smith, Colectica
Colectica is a suite of modern metadata management software that is used to document public
opinion and survey research methodologies and data. This demonstration will introduce the new
Colectica for Microsoft Excel software, a free tool to document statistical data using open
standards. There is often inadequate transparency of research methods when results of opinion
polls and behavioral science research are disseminated. Colectica allows organizations to
increase their openness and credibility through standardized documentation of their data
collection, research process and resulting data. The software implements leading open
standards including the Data Documentation Initiative (DDI) Lifecycle version 3 and ISO 11179.
Using this software allows survey organizations to both better educate survey sponsors and the
public on their methodology and increases the organization’s reputation for performing credible
scientific research. The free Colectica for Excel tool allows researchers to document their data
directly in Microsoft Excel. Variables, Code Lists, and the datasets can be globally identified and
described in a standard format. Data can also be directly imported and documented from SPSS
and Stata files. The standardized metadata is stored within the Excel files so it will be available
to anyone receiving the documented dataset. Code books can also be customized and
generated by the tool and output in PDF, Word, HTML, and XSL-FO formats.
Roper Center: Archiving Services and Access Tools
Lois Timms-Ferrara, Roper Center for Public Opinion Research
Marc Maynard, Roper Center for Public Opinion Research
Founded at about the same time as AAPOR, the Roper Center archives are now the largest and
most comprehensive archives of public opinion data. The Center’s role in the data life cycle is
one of preserving survey data entrusted to its care in perpetuity and making these data
available via intuitive access tools. Preserving data for long term access requires vigilant review
of all data and documentation, standardization of data formats, and ongoing attentiveness to
aging and new technologies impacting those data formats. This year, new procedures have
been adopted to clearly identify specific features of the data coming into the Center that mirror
those of AAPOR’s Transparency Initiative. By more completely documenting the details of
publicly released survey data at the point of ingest, the objectives of the TI and the Center to
better inform poll consumers may be achieved. Come and see how this new process works.
This spring the Roper Center released a set of enhanced services impacting access to some
20,000 U.S. and international survey datasets archived at the Center, as well as iPOLL, a
database of more than 600,000 U.S. questions and responses asked over the last 75 years.
This demonstration will review recently improved data discovery and analysis tools that support
the utilization of public opinion surveys. Survey practitioners engaged in questionnaire design,
comparative research, and analysis of all types of survey data will discover the value of this
collection unlocked by these advanced features. Bring your research questions to this session!
Friday, May 17
4:15 p.m. – 5:45 p.m.
AAPOR Concurrent Session F
Questionnaire Design and Data
Quality
Associations Between Interactional Indicators of Problematic Questions and
Systems for Coding Question Characteristics
Jennifer Dykema, University of Wisconsin Survey Center; Nora C. Schaeffer, University
of Wisconsin Survey Center; Dana Garbarski, Center for Women’s Health and Health
Disparities Research
Writing survey questions requires attention to the conceptual and operational definitions of
survey concepts as well as to the technical issues that arise in composing items. These
technical issues are examined in a body of research that considers how characteristics of
questions (e.g., the number of categories to include in a rating scale) affect responses, their
distributions and associations with other variables, and their validity and reliability. While the
analysis of the properties of questions has led to the development of several ad hoc and formal
systems for coding characteristics (e.g., Problem Classification Coding System (CCS) (Forsyth
et al. 2004), Question Appraisal System (Willis 2005), and Question Understanding Aid (QUAID)
(Graesser et al.)), these systems vary considerably in the assumptions that underlie which
characteristics they identify as problematic, which characteristics are compared, and how
dependencies among characteristics are taken into account when writing questions. Our paper
has several goals. First we review and synthesize the literature on question characteristics and
the systems for coding characteristics. Second, we analyze the administration of questions
about physical and mental health from 350 digitally recorded and transcribed interviews with
older adults in the Wisconsin Longitudinal Study. Interviewer-respondent interaction has been
coded in Sequence Viewer and we have also coded the questions’ characteristics using several
different coding schemes. We identify interactional behaviors that have been associated with
poorer data quality and use multi-level models to determine which coding systems are best at
predicting problematic outcomes, including interviewers misreading questions and respondents
expressing uncertainty and requesting clarification. Our analysis adds to the small but growing
body of research concerning the effects of question characteristics on interaction and data
quality. Our results have implications for designing questions and interviewing procedures with
an emphasis on health surveys of older adults.
Interaction Between Questionnaire Design and Interviewer Performance
Pat D. Brick, Westat; Catherine Billington, Westat; Sarah Dipko, Westat; J. Michael Brick,
Westat
There is a large literature devoted to structuring and controlling the behavior of the interviewer in
telephone and in-person surveys with the goal of improving the quality of the data collected.
This literature discusses such topics as interviewer error and interviewer effects. The behavior
of the interviewer is the focus of these studies and interviewer behavior is treated as an
exogenous variable. The interviewer effects are often described as increasing the variance of
the estimate rather than causing biases because, in the models, interviewer effects are
assumed to have an expected value of zero across interviews (O'Muircheartaigh and
Campanelli 1998). These studies are helpful, but treat interviewers in isolation from other features
of the survey. We suggest that this approach is incomplete because in many cases, the
behavior of the interviewer is a function of characteristics of the questions being administered.
We suggest that some survey questions may generate greater interviewer effects than others
due to the way survey questions are constructed. Our research links the questionnaire design
characteristics to the interviewer effects. We begin our investigation by conducting an expert
review on questionnaire items in a CATI survey. Items are classified as either having potential
problems or being well-constructed. As a complement to the expert review, we examine and
tabulate the comments entered during the interview for all items. Our analysis assesses whether
problematic questions generate more comment entries than well-designed questions. The
second part of the analysis deals with interviewer effects linked to the questionnaire design
characteristics. The goal is to determine which questionnaire items experienced greater and
lesser interviewer effects. Ultimately, we seek to evaluate the hypothesis that interviewer effects
are at least to some extent a function of questionnaire design characteristics and that crafting
high quality survey questions is the best way to control interviewer behavior.
An Examination of the Relationship Between Pretest Method Results and Data
Quality
Aaron Maitland, Westat
Many research studies collect data through survey questionnaires. In order to enhance the
validity of the findings from these studies, it is important for the studies to employ questions that
minimize measurement error. A diverse range of question evaluation methods are available for
detecting measurement error in survey questions. Ex-ante question evaluation methods are
relatively inexpensive, because they do not require any data collection from actual survey
respondents. Other methods require data collection from respondents either in the laboratory or
in the field setting. A major gap in the literature is the general lack of evidence that the problems
identified by these methods are actually problems as assessed by traditional quality standards
such as reliability or validity. Although one would expect these methods to identify questions
that produce low quality data, behavior coding is the only technique in the literature that has
been consistently shown to predict the reliability and validity of survey questions (Dykema,
Lepkowski, and Blixt 1997; Hess, Singer, and Bushery 1999). This paper addresses the
important gap in the literature about whether the problems identified by question evaluation
methods lead to lower quality data. The research in this paper investigates how effectively these
methods predict the reliability of survey questions as measured by test-retest correlations
obtained from repeated measurements of sample respondents. The study uses question
evaluation results from a few ex-ante methods such as expert review and QUAID, laboratory
methods such as cognitive interviewing, and field methods such as behavior coding and
response latency to predict the reliability of survey questions. In addition, the study evaluates
how the results from question evaluation methods relate to other data quality indicators such as
item missing data.
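As a point of reference for the reliability criterion used here, a test-retest correlation is simply the correlation between answers to the same item at two administrations. The short Python sketch below computes one from invented response vectors; the data and item are hypothetical.

# Minimal sketch of a test-retest reliability estimate for a single question.
# The response vectors are invented for illustration.
import numpy as np

wave1 = np.array([3, 5, 2, 4, 1, 5, 3, 2])   # answers at first administration (assumed)
wave2 = np.array([3, 4, 2, 5, 1, 5, 2, 2])   # answers to the same item at retest (assumed)

test_retest_r = np.corrcoef(wave1, wave2)[0, 1]
print(f"test-retest reliability (Pearson r) = {test_retest_r:.2f}")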
Can Google Consumer Surveys Help Pre-Test Alternative Versions of a Survey
Question?: A Comparison of Results from Cognitive Interviews and Google
Consumer Surveys on Alternate Forms of Two Questions
Michael Stern, NORC at the University of Chicago; Vincent Welch, NORC at the University
of Chicago
During the 1990s cognitive interviewing in its various incarnations (e.g., concurrent think-aloud,
retrospective think-aloud, focus group discussions, probes, and memory cues) became the
primary means for evaluating questions (see Lessler and Forsyth, 1995; Conrad and Blair,
1996; Willis and Schechter, 1997; Tourangeau, Rips, and Rasinski, 2000). By examining the
cognitive processes respondents went through while interacting with a survey question, survey
methodologists uncovered how small manipulations in the wording of questions influenced
respondents’ answers. Another way researchers have historically uncovered the effects of
question wording is through experimental field tests where several versions of a question are
randomly assigned to a subsample of respondents. Over the past decade, researchers have
embedded the bulk of these experiments in Web surveys among undergraduate students due to
the affordability of implementing experimental designs in this mode and the technological acuity
of college students. Still, if a researcher wanted to assess a single item among a large
heterogeneous audience, their options were limited. Google non-probability Consumer Surveys
may provide a solution. However, two questions remain to be answered. First, how do results
from these non-probability surveys compare to those from proven cognitive interview
techniques? Relatedly, what is the value added by conducting such experiments? In this paper,
we compare results from alternate forms of two questions that were tested at NORC at the
University of Chicago with 2-waves of cognitive interviews and with Google Consumer surveys
(N=4,000) to answer these questions. The results suggest that the Google Consumer Surveys
data do complement the findings from cognitive interviews and that the inferred, weighted
demographic data are useful for certain types of studies.
An Empirical Test of the Effectiveness of Cognitive Testing in Improving Question
Wording
Martha Stapleton, Westat; Jeffrey Kerwin, Westat; Jennifer Crafts, Westat; Jasmine Folz,
Westat
Cognitive interviewing has become an accepted survey industry best practice due to the face
validity of basing question revisions on feedback elicited from representatives of the survey
target population. Despite expectations that such revisions improve data quality and reduce
burden, little empirical evidence supports the effectiveness of cognitive interviewing (Willis,
2000; Willis and Schechter, 1997; Forsyth, et al., 2004). Our experiment focused on questions
with comprehension problems. We provide evidence that issues revealed and addressed
through cognitive interviewing are associated with improved survey outcomes, including data
quality and response burden. We conducted a between-subjects experiment with approximately
20 items organized as: 1) control set (“original” questions -- before cognitive testing), and 2)
experimental set (same questions modified on the basis of cognitive test results). CATI
interviews were administered to a sample of 200 U.S. general population, English-speaking
adults. Through random assignment, each respondent received a different mix of both control
and experimental questions. At the interview end, respondents were asked to explain the
meaning of two questions in their own words so that we could judge whether their responses to
those questions were “accurate” (consistent with the question intent). Interviews were timed and
recorded for later behavior coding. Participants received a $10 incentive. We examined missing
data to evaluate whether the cognitively tested questions resulted in fewer “don’t know” / “can’t
answer” responses, compared to the “original” questions. We compared time to respond to
evaluate whether the cognitively tested questions required a lower average time per recorded
answer. We compared the follow-up probes to the survey questions to evaluate whether
responses to the cognitively tested questions appeared more accurate than responses to the
original questions. Future behavior coding analysis will compare control versus experimental
groups on requests for repeats and clarifications and the match between respondent answers
and response categories.
Methodological Briefs: Combating
Nonresponse
The Impact of Incentives in a National RDD Survey
Kelly Daley, Abt SRBI
While there is considerable empirical support for pre-paid monetary incentives (Church 1992;
Singer et al. 2000), the benefits from post-paid incentives – particularly in RDD surveys – are
less clear (Singer et al. 2000; Gelman et al. 2003). Designing an effective post-paid incentive is
particularly challenging when the survey features both a screener and an extended interview.
Prior research suggests that incentives offered for the extended interview may be more cost-
effective than incentives offered for the screener (Arbitron 2003; Cantor et al. 1998; Kropf et al.
2000; Singer et al. 2000), but gaining participation at the initial stage is often the most
challenging component of RDD surveys. The 2012 Family and Medical Leave Survey of
Employees conducted for the U.S. Department of Labor features both of these challenges in
providing incentives: (i) addresses were not available for most sample members, which ruled
out pre-incentives; and (ii) the instrument featured both a screener and an extended interview.
The Employee Survey is a national dual frame RDD survey of adults employed in the last 12
months. Adults who needed or took family/medical leave in the 12 months prior to the interview
were oversampled and administered an interview roughly twice the length of the interview for
respondents who did not need or take such leave. This presentation describes results from a
randomized experiment to assess the impact of a post-paid incentive on cooperation rates, data
quality and cost per completed interview. Special focus is given to the effect of the incentive on
cooperation among cases receiving the longer questionnaire and cases in which the
the screener respondent was not the adult selected for the extended interview.
Using the iPad as a Prize-Based Incentive to Boost Response Rates: A Case
Study at Brigham Young University
Richard McClendon, Brigham Young University; Danny Olsen, Brigham Young University
In 2009, Dillman, Smyth, and Christian downplayed the use of prize drawing incentives for Web-
based surveys and instead concluded that, like mail and telephone surveys, the most effective
way to increase response rates in Web-based surveys is to use postal mail to deliver an
invitation and prepaid cash incentive (pp. 274-275). However, for many public, marketing, and
social researchers, this approach is not only cost-prohibitive but also runs counter to the initial
purposes of using the Internet in the first place: the reduction in time and ease of use. Further,
when it comes to the advancement and public use of technology, data from 2009 already feel
a century behind. Thus, the purpose of this paper is to revisit the
question of lottery- or prize-based drawings, particularly in light of using new technological
devices as incentives; in our case—the iPad3. During 2011 and 2012, the Office of Assessment
and Analysis at Brigham Young University sent out several surveys to both students and alumni
that included an iPad drawing as an incentive. Data gathered from these surveys clearly show a
significant increase in response rates for both students and alumni. Some of these increases
have ranged from 8% to 13%. Given these favorable increases compared to the relatively low
cost of offering an iPad in a drawing, we feel this simple application would represent an
attractive solution to maintaining a sustainable cost/benefit trajectory for future research and
polling among other institutions. We will present further details surrounding this research
including a discussion of the demographic characteristics that identify who is more or less likely
to respond to a survey that includes a drawing for an iPad.
Tracking Children Across Key Transitions Using Data from Multiple Informants:
Lessons Learned from the Head Start Family and Child Experiences Survey
Annalee Kelly, Mathematica Policy Research; Marcia Comly Rigby, Mathematica Policy
Research
Longitudinal studies of young children often focus on key transitions, such as the transition to
school, with the goal of estimating characteristics of children before and after these transitions.
The accuracy of these estimates depends in part on having high response rates across data
collection waves, and expert tracking of study respondents is imperative. In addition, new
sources of data are often needed as children transition from one program to another. The Head
Start Family and Child Experiences Survey (FACES) is a national, longitudinal, descriptive study
of children and families served by Head Start. It follows a national sample of children from Head
Start entry through program participation and to the end of kindergarten. An accurate
accounting of study children before, during, and after each round of data collection is necessary
in order to guarantee the integrity of the sample. When children leave Head Start, their
kindergarten teachers become an important source of information on their kindergarten
programs and any difficulties they might be having at school. For the most recent FACES
cohort, FACES 2009, Mathematica tracked a sample of low-income, preschool-aged children
from Head Start entry through the end of kindergarten. Schools and teachers were identified,
located and confirmed, despite challenges that included contacting parents in hard-to-locate
populations and identifying, introducing the study to, and gaining cooperation from kindergarten
principals and teachers previously unconnected to the study. In a 16-week period, we identified
96 percent of children’s kindergarten schools and 93 percent of their teachers. This
methodological brief examines how we were successful in mitigating these challenges by using
data from multiple informants (parents during interviews at the end of Head Start and again in
kindergarten, Head Start programs, and elementary schools), and examines how useful each
was in providing verifiable information to help locate children.
When is Enough Enough? Deciding the Optimal Number of Contacts for a Multi-
Mode Survey
Kerry Levin, Westat; Jocelyn Newsome, Westat; Pat D. Brick, Westat; Brenda Schafer,
Internal Revenue Service; Ron Hodge, Internal Revenue Service; Patrick Langetieg,
Internal Revenue Service
A variety of factors can improve survey response rates, including incentives, a credible sponsor,
and a brief, easy-to-complete survey. In addition, the number and form of contacts during
survey administration can significantly influence response rates. The universally accepted
procedures for conducting mixed mode surveys are based on variants of Dillman’s Tailored
Design Method (TDM) (Dillman, Smyth, and Christian 2009). The classic TDM approach
advocates contacting respondents 4 or 5 times, where each successive contact is different from
the preceding contact. It has been empirically demonstrated that each additional contact will
result in an increase in the overall response rate (Hassol et al. 2003, Rookey et al. 2012). When
plotted as a curve against level of effort or cost, response rates move incrementally towards an
asymptote or a plateau. However, in actual survey practice, we rarely observe a plateauing of
the response rate. For many reasons, including budgetary and time constraints, more than 5
contacts is typically not an option in “real world” survey practice. As a result, there is minimal
evidence in the literature concerning the optimum number of times a respondent to a survey
should be contacted in order to increase response rates and still be cost efficient. In other
words, when is enough, enough? In this paper, we explore adding a sixth contact to the IRS
Individual Taxpayer Burden (ITB) Survey, which includes a sample of taxpayers across the
United States. The 2010 ITB Survey, which had five contacts, never reached a plateauing of
response rate. Given that, a sixth contact was added to investigate whether this plateauing
effect would be observed. We use three critical measures to determine the success of this sixth
contact: the cost per additional complete, respondent feedback collected via a toll-free helpline,
and response rate analysis at each of the phases of contact.
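The first of these measures, cost per additional complete, is the marginal cost of each successive contact divided by the completes that contact alone generates. The Python sketch below illustrates the arithmetic with invented counts and costs; these are not ITB Survey results.

# Illustrative cost-per-additional-complete calculation for successive contacts.
# All counts and costs below are assumptions for the sketch.
contacts = [1, 2, 3, 4, 5, 6]
new_completes = [900, 450, 260, 150, 90, 55]          # completes attributable to each contact (assumed)
contact_cost = [9000, 6000, 6000, 4000, 4000, 4500]   # mailing/processing cost of each contact (assumed)

for c, n_new, cost in zip(contacts, new_completes, contact_cost):
    print(f"contact {c}: {cost / n_new:6.2f} per additional complete")
# An additional contact is worthwhile only while this marginal cost stays acceptable
# relative to the value of the added response rate.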
Incentives and Early-Life Civic Engagement as a Mediating Factor in a Study After
50 Years
Ashley Kaiser, American Institutes for Research; Danielle Battle, American Institutes for
Research; Jizhi Zhang, American Institutes for Research
Civic engagement has been considered as one of the factors that influence individuals’ survey
participation (Singer et al. 1999). According to Groves et al. (2000), when sample members with
higher levels of civic engagement were offered incentives for survey participation there was no
impact on cooperation, but when sample members with low levels of civic engagement were
offered the same incentive, there was a positive effect on response. This paper examines the
combined influences of individuals’ civic engagement and cash incentives on survey
participation. Utilizing the nationally representative data of Project Talent, this study explores the
extent to which incentives affect sample members’ response propensity, controlling for their
early-life civic engagement. Project Talent (PT), a longitudinal study started in 1960, collected
extensive cognitive, personality and background information from 440,000 9th-12th graders.
Fifty years after the base-year data collection, a pilot test of one percent of the original sample
was conducted in 2011. The pilot test involved an incentive experiment in which one-third of
sample members received no cash incentive, one-third received a noncontingent $2 bill, and
one-third received a noncontingent $20 check. Civic engagement was measured through respondents’
clubs/organization participation and political involvement in high school in 1960. This paper will
use the items measuring civic engagement to examine their effect on response to a follow-up
survey 50 years later. The findings of this paper will enable researchers to better understand the
relationship between early-life civic engagement and survey response propensity and to determine
whether high civic engagement in early life motivates sample members to participate in surveys
later in life, regardless of monetary incentives.
Responsive Design Features and Respondent Cooperation in the Health and
Retirement Study
Piotr Dworak, University of Michigan; Heidi Guyer, University of Michigan
Responsive design relies on observation of data collection progress and application of
measures to increase cooperation thereby reducing non-response (Groves and Heeringa 2006).
However, responsive design may have different implications for cross-sectional and longitudinal
studies. Over the past waves, the Health and Retirement Study (HRS) operations team has
implemented many targeted strategies to reduce the data collection timeline and secure
cooperation of late respondents. The goal of this analysis is to assess the effect of the
responsive design features on instantaneous (within-wave) progress and their impact on
longitudinal cooperation. Examples of such interventions include: reducing the length of the
baseline interview, an experimental design randomizing wave 2 respondents into higher and
lower incentive conditions (in some cases higher and lower than their wave 1 incentive),
targeted “end-game” mailings, interviewer bonuses to prioritize different types of cases, “kept-
appointment” incentives and emphasis on contacting respondents around the holidays. The goal
of the analysis is to comparatively assess these interventions and their impact on within-wave and
longitudinal cooperation. The analysis will further the understanding of responsive design
strategies and their application to longitudinal studies.
Video Effects on Panelist Co-operation: Arbitron Installation Video
Kate T. Williams, Arbitron
Arbitron’s Portable People Meter (PPM) is a device that automatically detects an individual’s
exposure to encoded media and transmits the data to Arbitron for reporting. Households are
recruited into a two-year panel, and their members are asked to wear the PPM from the time
they rise in the morning until they retire at night. In order to comply successfully, the household
must first install the PPM equipment. Due to rising costs associated with recruiting households,
Arbitron is exploring methods to improve panelist installation rate. Across 2012, Arbitron
conducted a series of trials with newly recruited PPM households to examine the effect of an
email containing an offer to view an installation video. The installation video provided guidance
on how to set up PPM equipment, and it was available on a website that panelists could access
only through the email link. In this experiment, approximately half of the newly recruited
households received the email with a link to the installation video; the other half received a
similar email without such a link. Differences in the graphical content and the subject lines of the
emails were also tested. Analyses of households’ behavior after receiving the email focus on
installation success rate, and also include the efficacy of different email communications.
Innovative Measurement of Public
Opinion
140 Characters or Less to Shape Public Opinion: Methodological and Theoretical
Improvements on the Use of Twitter to Measure Public Attitudes
Anna Novikova, Knox College
How can social media complement traditional surveys in assessing public opinion? While
telephone surveys constrict responses (and introduce bias), moving from asking to listening
allows us an unfiltered look at public opinion. Translating open-ended opinions into useful
figures, however, is a challenge. Using a corpus of English language tweets from a 1% sample
stream of public Twitter posts collected in the two months prior to the 2012 presidential election,
I assess the validity of using Twitter as a forecasting tool. A machine learning algorithm trained
on hand-coded data is used to measure sentiment (i.e. positive and negative emotions)
expressed in the Twitter data. I aggregate these sentiment scores and compare them to public
polling data within the same time frame. I build upon previous research in this area by using
more sophisticated classification techniques, rather than either naïve volume counts or list-
based classification. In this way, what is being said about a candidate is captured, rather than
how often a candidate is mentioned. I hypothesize that opinions expressed by Twitter users,
who are more educated and more informed than the general public, will be more responsive to
day-to-day events in the course of the campaign. Changes in sentiment among these users,
then, should be a leading indicator for movement in public polls. This research contributes to the
development of social media analysis as a supplement to traditional public opinion polling.
Understanding Elections: Voter Intentions, Expectations, and Forecasts
David Rothschild, Microsoft Research
Using a unique dataset from YouGov/Xbox Polls we explore the relationship between
respondents’ intentions and expectations. During the 2012 election Xbox conducted roughly
750,000 interviews with 350,000 respondents. These respondents answered questions about
their candidate support and engagement in the election, as well as their expectation of who
would win, who their social network supported, and who the media was projecting to win. Cross-
sectional analysis of how intentions relates to expectations explains the underlying structure of
how respondents view the election. Panel analysis of how intentions and expectations move
during the election cycle provides new insight into the bandwagon effect of expectations
on intentions. Finally, we show how aggregations of respondents’ expectations accurately
predict both national and state-by-state elections.
Wanted: Young Adults 18-35 – Leveraging Smartphone Applications for Repeated
Measures of This Elusive Cohort
Shu Duan, The Nielsen Company
Growing smartphone penetration has offered survey researchers a new mode for reaching
young adults. Recent research (Pew Internet Project, Sep 2012) shows that 2 out of 3 adults
under age 30 own a smartphone, which reveals strong potential for using smartphones to reach
this younger cohort that is usually hard to reach by traditional survey methods. Past studies
have focused on specific areas of mobile research such as cell phone frames, survey design for
mobile browsers, and survey administration via text messages. A research gap remains
concerning effective mobile research methodology for targeting young adults ages 18 to 35.
Nielsen will be conducting a pilot on crowdsourcing from a mobile panel to collect media
consumption data through a smartphone application, with an emphasis on researching
respondents under age 35. Specifically, we will study 1) respondent cooperation through
crowdsourcing from a mobile panel; 2) app usability optimized for survey data collection; and 3)
survey compliance in reporting media consumption in a smartphone app. This study will share
what we learn about the end-to-end methodology of leveraging a mobile panel of smartphone
users to gain cooperation from young adults ages 18-35 and adapting smartphone features for
respondent engagement to maximize participation throughout the data collection period.
Enhancing Usability and Data Quality
Usability of App Features and Tutorials
Kelly L. Bristol, The Nielsen Company; Jennie Lai, The Nielsen Company; Michael W.
Link, The Nielsen Company
A critical question about the sustainable future of survey research is how to design an effective
user experience for electronic data collection tools. Usability research defines a well-designed
user experience as being easy to use, quick to complete, memorable, with minimal errors and
well-liked by users. Optimizing the user experience reduces respondent burden and can
significantly improve data quality. Developing a user centered design is particularly important for
long term panel and diary studies where respondents must interact with the data collection
instrument frequently for an extended period of time. Findings reported here assess the usability
of features in a mobile and Web app for a two week diary study of television viewing conducted
by Nielsen in August of 2012. Within the application there are four primary modules – enter
viewing, check entries, messages and badges. In addition, there is a tutorial feature for each
module and the home page. Usability of the application modules is measured on several
different metrics depending on the module purpose and depth of features. Ease of use and
quickness are evaluated by time-to-complete surveys on the first use compared to the average
completion time for the overall study. The effectiveness of the tutorial feature is also evaluated
through two forms of comparison: 1) pre and post tutorial usage of features, 2) tutorial versus
non-tutorial user survey completion times and feature usage. Likability measures are
supplemented from a post-study survey. This research provides insight into developing effective
user experience design for a self-reporting electronic data collection tool, and the effectiveness
of app tutorials on optimizing user experience.
From 1.0 to 2.0: Lessons Learned of Mobile Application Design for Effective
Respondent Engagement
Jennie W. Lai, The Nielsen Company; Kelly Bristol, The Nielsen Company; Michael W.
Link, The Nielsen Company; Shu Duan, The Nielsen Company
The continued surge of smartphone ownership and mobile application (app) usage has opened
doors for survey researchers to reach young adults and ethnic minorities. Using mobile apps
as the data collection tool, with the versatility of their features, allows for new respondent
engagement techniques unavailable in traditional modes of data collection. Both user
interface and user experience design of the mobile app are the core tools for user engagement
and the key to encourage compliance throughout the data collection period. Mobile app features
such as dynamic tutorial for survey instructions, in-app notification for customized respondent
communication, deployment of badges as incentive for survey compliance, social sharing
through Facebook posting, etc. are the tools designed to keep respondents engaged for
repeated measures. Nielsen has conducted two pilots in January and August of 2012 to capture
media usage behavior through two comparable versions of a mobile application. The latest mobile
app study was launched in two markets using a dual telephone frame sample for recruitment, and
respondents participated for a two-week collection period. The first pilot yielded insightful
learning on the effectiveness of the aforementioned mobile app features for respondent
engagement and significant app enhancements were made for the second pilot. This research
paper will discuss the lessons learned of the app features from the first pilot and compare the
results of the upgraded features in the second pilot. The findings of these research studies will
inform which mobile app features hold promise for respondent engagement targeted for
repeated measures of longer term panel studies.
Can Embedded Help Text Links in Web Survey Items Improve Data Quality?
Natasha Janson, RTI International; Christopher Bennett, RTI International; Lesa Caves,
RTI International; Melissa Cominole, RTI International; Bryan Shepherd, RTI International;
Jennifer Wine, RTI International
Self-administered surveys often include text that is separate from survey items and serves to
provide respondents with standardized definitions and clarifications for nuanced items. For
Web-based surveys, this information can be presented in a variety of formats, including “Help”
buttons leading to external Websites or popup windows. More information is needed to evaluate
the extent to which these various formats for accessing help text actually encourage its use and
whether the use of help text has any effect on the responses provided. Embedded help text
links were evaluated in two large postsecondary surveys. Help text in these surveys has
historically been accessible via a “Help” button on each survey form, and has generally
exhibited very low usage rates among self-administered respondents (typically about one
percent). To make the help text feature more salient for self-administered respondents, key
words were hyperlinked so that respondents could click on the linked words and access the help
text for that form, just as if they had clicked the “Help” button. The content was the same
regardless of how help text was accessed. The embedded help text links were used only on
selected survey items, while all forms displayed the “Help” button at the bottom of the form.
Preliminary results show the use of help text increased significantly on screens with embedded
links versus screens with only the separate “Help” button. Implications for survey timing and
response distributions will be discussed. Study findings indicate that the way in which help text
is presented has implications for Web survey administration and data quality.
Grid Formats, Data Quality, and Mobile Device Use: A Questionnaire Design
Approach
Colleen A. McClain, Survey Sciences Group, LLC; Scott D. Crawford, Survey Sciences Group, LLC
Grids have been the subject of significant research as a frequently used—but often
problematic—way to present multiple questions in a shared layout, particularly within Web-
based surveys. Respondents’ increasing use of mobile devices underscores
the need to reexamine design standards for grids and questionnaires that will now be seen on a
variety of screen types. While recent work has begun to explore the relationship between device
use, data quality (McClain, Crawford, & Dugan, 2012; Saunders et al., 2012), and substantive
responses (Mavletova & Couper, 2012), considerable practical concerns remain in conducting
surveys that have been optimized for larger screens. Drawing upon recent literature and
paradata that we have collected, we propose a combined layout and questionnaire design
approach to confronting these challenges, acknowledging that while refining the layout and user
design of grids can impact data quality (Couper et al., 2013) and aid mobile navigation, an
additional challenge lies in designing questionnaires that are clear, cohesive, and adaptable to
the smaller screen space available on mobile devices. To better understand interactions
between device use and data quality measures in a grid-heavy setting, we reviewed respondent
behavior and characteristics of grids from multi-year administrations of 11 Web surveys with
college student populations, spanning several hundred thousand respondents. We focused our
exploration on key contextual characteristics of grids that may influence data quality and
exacerbate burden, such as questionnaire position/context, grid length and density, scale
design, sensitivity of content, and presence of validations. Specifically, we investigate the
relationship between several of these characteristics and mobile respondents’ tendency to
straightline, as a potential indicator of satisficing (Krosnick, 1991); to break off; and to yield
higher rates of item-missing data. Our presentation will highlight key findings from this analysis
and discuss implications for questionnaire design that considers the mobile space.
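As a rough illustration of the straightlining indicator discussed above, the sketch below flags respondents who give an identical answer to every item in a grid and compares rates by device type; the data are simulated and the column names are hypothetical, not drawn from the surveys analyzed in the paper.

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
grid_items = [f"q{i}" for i in range(1, 9)]      # eight grid items on a 5-point scale (simulated)
df = pd.DataFrame(rng.integers(1, 6, size=(n, len(grid_items))), columns=grid_items)
df["mobile"] = rng.integers(0, 2, size=n).astype(bool)

# A respondent "straightlines" when every grid item receives the same answer.
df["straightline"] = df[grid_items].nunique(axis=1).eq(1)

# Compare straightlining rates by device type as one potential satisficing indicator.
print(df.groupby("mobile")["straightline"].mean())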
Examination of Question Complexity Through Paradata
Rebecca J. Powell, University of Nebraska-Lincoln; Ana Lucia Cordova Cazar, University
of Nebraska-Lincoln; Jinyoung Lee, University of Nebraska-Lincoln
Question complexity in surveys should be at a level where all respondents can understand what
the question is asking (Dillman et al. 2009; Groves et al. 2009). Therefore, in practice,
researchers aim to create questions that are no higher than an eighth grade reading level. While
this gives a quantitative measure for the overall question, there can still be qualitative aspects of
a question that make it complex even when the reading level is below eighth grade. For
example, a question can be phrased such that it is below an eighth grade reading level, but the
ambiguity of the words in the question can lead to a complex question. Programs like QUAID
help to point out these challenging words and phrases, which can lead to difficulty with the
response process. When respondents have difficulty with any phase of the response process, it
can have adverse effects on data quality. One way to test the effects on data quality is through
paradata. Specifically, paradata allow us to record the frequency of answer changes to questions, as well as back-ups, where respondents answer another question before returning to change their answer to a previous question. This study uses the Internet component of the Gallup Panel to
develop a question complexity index from QUAID information, the question reading level and
word count. These are then examined to better understand the relationship between question
complexity and the frequency of answer changes and back-ups per question. Preliminary
findings show a 0.36 correlation between the reading level and the average number of answer
changes but a 0.53 correlation between the word count and the average number of answer
changes. Increased answer changes can result in measurement error if respondents are unsure
of their answers to questions.
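As an illustration of the kind of complexity-versus-paradata comparison described above, the sketch below correlates word count and reading level with per-question answer changes; all values and names are hypothetical, and the reading levels are assumed to be computed elsewhere (e.g., by a readability formula or a QUAID-style tool).

import pandas as pd

questions = pd.DataFrame({
    "question_text": [
        "How satisfied are you with your current health insurance plan?",
        "In the past 12 months, did you or anyone in your household receive any benefits?",
        "What is your age?",
    ],
    "reading_level": [7.2, 8.9, 2.1],      # hypothetical grade-level scores, computed elsewhere
    "answer_changes": [0.41, 0.87, 0.05],  # hypothetical mean answer changes per respondent
})

questions["word_count"] = questions["question_text"].str.split().str.len()

# Pearson correlations analogous to the reported 0.36 (reading level) and 0.53 (word count).
print(questions[["reading_level", "word_count", "answer_changes"]].corr()["answer_changes"])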
Using Mail to Improve the
Effectiveness of Web and Telephone
Data Collection for Address-Based
Samples of the General Public
Using Visual Design to Aid Within-Household Selection in Mail Surveys: Does it
Lead to Accurate Selection and Representative Samples?
Mathew S. Stange, University of Nebraska
Research examining the next and last birthday methods of within-household selection in mail surveys finds few differences in sample composition between the two methods, but finds that both methods yield samples unrepresentative of certain demographic groups (e.g., Battaglia et al. 2008). Yet
other research shows that accurate selection of respondents remains a problem for within-
household selection in mail surveys (e.g., Olson & Smyth forthcoming), with inaccuracy rates
ranging from a small percent to over 30% (e.g., Battaglia et al. 2008; Schnell 2007). Because
interviewers are not present, mail surveys require a different approach to motivate within-
household selection and to aid households in selecting the correct household member. Visual
design is one possible way to help. In this study, we examine the use of a calendar placed on a
survey’s cover letter to help households select the correct household member with the next
birthday. Including a calendar adds emphasis to the task and may aid households in selecting
the correct respondent. Data come from the 2012 Nebraska Annual Social Indicators Survey
(NASIS; n=959, AAPOR RR1 26.6%) – an omnibus mail survey of Nebraskans. Half of sampled
households received a cover letter with the calendar and the other half received a cover letter
without the calendar. We examine the resulting sample composition and use a household roster
included in all the surveys to evaluate the accuracy of selecting the household member with the
next birthday. Preliminary analyses indicate that the response rate did not differ significantly
between the treatments (26.5% with calendar; 26.7% without calendar) and the sample had
similar representation on education levels. We also examine whether the calendar increased
accuracy of within-household selections, using the 92% of the sample who completed at least
some information in the roster. We conclude with implications for within-household selection
methods in mail surveys.
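The response-rate comparison described above can be tested with a simple two-proportion chi-square test, sketched below in Python; the cell counts are hypothetical values chosen only to mimic rates of roughly 26.5% and 26.7%, not the actual NASIS counts.

from scipy.stats import chi2_contingency

# Rows: with calendar, without calendar; columns: responded, did not respond (hypothetical counts).
table = [[480, 1332],
         [484, 1328]]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.3f}, p={p:.3f}")   # a large p-value indicates no detectable difference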
Effects of Survey Sponsorship on Internet and Mail Response: Using Address-
Based Sampling
Michelle L. Edwards, Washington State University
Scholars have shown that the combined use of token cash incentives with an initial withholding
of a mail response alternative can increase Internet response rates significantly in regional and
state-level surveys using address-based sampling. However, the effectiveness of this model has
declined when university sponsors have surveyed residents in distant states. While
nonresponse rates are not necessarily predictive of nonresponse bias, attitudes toward a
survey’s sponsoring organization may influence both response rates and nonresponse bias. To
test the effects of survey sponsorship by a local (in-state) university sponsor versus a distant
(out-of-state) university sponsor on response rates, we conducted an experiment in spring 2012
with an address-based sample of Washington and Nebraska residents. We found that
sponsorship had a significant effect on final response rates in both states, with in-state
sponsorship significantly improving response for both the mail-only and the two Web+mail (initial Web request with a mail questionnaire offered in the fourth and final contact) treatment groups. For the two Web+mail groups, we also found that local sponsorship increased the risk of responding by
Web (relative to not responding), but not the risk of responding by mail (relative to not
responding). In examining the representativeness of the resulting samples, we found that our
survey respondents were both generally older and more highly educated than state-level
estimates from the Gallup Poll and American Community Survey. In Nebraska, a Republican-
leaning state, distant-sponsored surveys obtained a lower percentage of Republicans than
local-sponsored surveys. In Washington, a Democrat-leaning state, local-sponsored surveys
obtained a lower percentage of Republicans than distant-sponsored surveys. This research
suggests that recent public opinion findings demonstrating declining public trust in science
among conservatives (but not other groups) may have important consequences for university-
sponsored survey research.
Sample Performance and Cost in a Two-stage ABS Design with Telephone
Interviewing
W. Sherman Edwards, Westat
Random-digit-dial (RDD) surveys long provided an effective, lower-cost alternative to face-to-
face surveys for general population research. With declining response rates and an increasing
proportion of cell-phone-only households, both the effectiveness and cost of RDD surveys have
become less attractive. Address-based sampling (ABS) is becoming a preferred approach in
many cases, but there is no consensus as yet on the optimal data collection mode or mix of
modes, particularly for surveys requiring within-household sampling and/or an interviewer-
mediated questionnaire. Brick et al. (2011) describe a successful two-stage mail ABS design,
where the first stage determines household eligibility and provides information needed for
within-household sampling, and the second stage collects more detailed information about
sampled individuals. Two-stage designs have also incorporated telephone interviewing at the
second stage. This paper will present the results of a pilot two-stage ABS design for a
companion survey to the National Crime Victimization Survey to support local area estimates,
with an initial mail contact and telephone interviewing. The pilot incorporates a split-sample
experiment. In one treatment, only addresses without an associated telephone number were
sent the mail instrument, with the objective of obtaining a telephone number. In the other
treatment, all sampled addresses were sent the mail instrument, which also included questions
to allow stratification of the sample by likelihood of having experienced a crime. In both
treatments, telephone interviews were attempted with all households for which a telephone
number was obtained. The analysis will compare sample performance and per-case cost
between the two treatments and with the likely sample performance and cost of an RDD survey
to accomplish the same objectives. Since the NCVS estimates both prevalence and
characteristics of relatively rare events (crimes), a large sample is required. Therefore, we will
calculate both cost per completed interview and cost per completed interview with reported victimization.
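The cost metrics proposed above amount to simple arithmetic, sketched here with entirely hypothetical figures (the pilot's actual costs and victimization rates are not reported in the abstract).

def cost_per_complete(total_cost: float, completes: int) -> float:
    """Data collection cost divided by the number of completed interviews."""
    return total_cost / completes

total_cost = 250_000.0        # hypothetical total data collection cost for one treatment
completes = 2_000             # hypothetical number of completed telephone interviews
victimization_rate = 0.12     # hypothetical share of completes reporting a victimization

print(cost_per_complete(total_cost, completes))                            # cost per completed interview
print(cost_per_complete(total_cost, int(completes * victimization_rate)))  # cost per complete with reported victimization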
Is Pushing the General Public to the Web in Address-Based Samples Cost
Effective?
Virginia M. Lesser, Oregon State University Department of Statistics
Interest in using mail contact in address-based samples of the general public to encourage
responses over the Internet is considerable. However, several studies have shown that it is
necessary to also use mail questionnaires in order to obtain responses from households with
quite different demographics than those who will respond by Web (e.g. Messer and Dillman,
Public Opinion Quarterly, 2011). That study also shows that “pushing” some respondents to the
Web may actually increase total survey costs on a per-respondent basis while reducing overall response rates, without providing a demonstrable improvement in household representation. The
expected savings from questionnaire mailing and processing costs did not offset the set-up and
implementation costs. In this paper we examine results from two experiments conducted on
address-based samples in Oregon during 2010 and 2012. The response rates and cost effects of two approaches, 1) Web+mail (withholding mail in early contacts) and 2) offering a choice of Web or mail, were compared against a mail-only control. For each approach and year we systematically examine response rates, costs for each survey mode, and demographic representation with regard to age, gender, and employment. Thus, we reexamine the question of whether including a Web response option is cost effective when administered in a somewhat different way than that used by Messer and Dillman.
Using GIS to Target Address-Based Samples of Households for a Web (vs. Mail)
Response: Evidence from Three Web+Mail Surveys in Washington State
Benjamin L. Messer, Washington State University
Address-based sampling enables researchers to use geographic information systems (GIS) to
analyze the social, demographic, and other characteristics of the communities in which sampled
households are located. Increasingly, research is finding that these methods are important for
survey designs in which households can be targeted for response to different survey modes in
advance of the data collection period. However, little is known about which community
characteristics are important for predicting what households have the highest propensities for
responding to a Web (vs. mail) survey. Previous research has identified a number of individual
and household characteristics that are important for predicting Web response, including
household Internet access, socioeconomic status, and age, but less attention has been directed
toward the community-level. The purpose of this paper is to report on the geographic bases of
Web and mail survey response to statewide surveys, identify those characteristics that are most
salient for targeting households to respond via the Web, and to offer suggestions on which
Web+mail methods may be the most effective in different types of communities. We use existing data from address-based samples for three general public Web+mail surveys conducted in Washington State between 2008 and 2011, matched in GIS with data from the Census and the American Community Survey. Analyses are currently being conducted and results will be available
in the next few weeks.
Public Opinion and the Environment
The Weathering of Skepticism: An Examination of American Views on the
Existence of Climate Change
Christopher P. Borick, Muhlenberg College Institute of Public Opinion; Barry G. Rabe,
University of Michigan
The period between 2008 and 2012 was one of significant shifts in American public opinion
regarding climate change. Between 2008 and 2010 an increasing number of Americans
indicated skepticism that global warming was occurring. This trend reversed between 2010 and 2012, with most public opinion research finding that levels of acceptance regarding the existence of global warming have returned to levels observed in 2008. Numerous studies have
identified factors such as changing economic conditions, media framing and variations in
weather as the determinants of the shifts in American public perceptions about climate change.
In this study we examine the role that individual perceptions about weather have had on their
beliefs regarding the planet’s climate. In particular we look at the personal experiences that
Americans have had with conditions such as severe droughts, hurricanes and heat waves, and
how those experiences have diminished skepticism regarding global warming. The study
includes results from 9 iterations of the National Survey of American Public Opinion on Climate
Change (NSAPOCC) between 2008 and 2012, including rounds conducted just before and after Hurricane Sandy’s landfall in late October of 2012.
Global Warming Attitudes Among Local News Viewers and Non-Viewers; Media
Market Comparative Analysis and Change Over Time
Amy Simon, Goodwin Simon Strategic Research; Leora Lawton, Tech Society Research;
UC Berkeley, Berkeley Population Center; Adam D. Probolsky, Probolsky Research LLC;
Paul A. Hanle, Climate Central
In light of AAPOR’s conference theme “Toward a Sustainable Future for Public Opinion and
Social Research” we submit a paper looking at views and attitudes about global warming and
climate change. This paper reports on the findings of two surveys measuring attitudes and
views towards global warming in three media markets, comparatively, as well as over time. In
February 2012, we completed a benchmark telephone survey with n=6,089 completed
interviews using live interviewers in three media markets (DMAs): Denver, Colorado; Terre
Haute, Indiana; and Dallas, Texas. We conducted approximately 1,000 interviews in each
market among adults who watched a local news station at selected evening viewing times that
included the weather report, with a focus on a different network affiliate in each market. For a
control group, we also conducted 1,000 interviews in each market among adults who either did
not watch the targeted local news station or watched the station at different times than the
select evening viewing times. The RDD sample included landlines and cell phones. In our
benchmark survey, we found that while six in ten respondents think global warming is
happening, just over four in ten are concerned about its impact on the world today. By
combining an attitudinal survey with media consumption, we were able to show that the source
of information about global warming as well as religious and political ideological positions are
strongly associated with attitudes about global warming, and that these positions are
independent of educational attainment. In February 2013, one year later, we will conduct the
survey in the same markets to measure any change in attitudes over time among the viewer
and non-viewer populations. We will also investigate whether watching certain weather
newscasts has a quantifiable impact on views of global warming.
Polls, Publics and Pipelines: Mapping Public Opinion Toward the Keystone XL
Pipeline in the United States and the Northern Gateway Pipeline in Canada
Timothy B. Gravelle, PriceMetrix Inc.
The politics of oil pipelines has been especially prominent in recent years in North America. In
the American case, debates about economic benefits, energy security and environmental
impact have been provoked by the then-proposed (and now vetoed) Keystone XL pipeline
intended to take bitumen from northern Alberta in Canada to refineries on the Gulf of Mexico in
Texas. In the Canadian case, similar debates have been provoked by the proposed Northern
Gateway Pipeline from northern Alberta westward to ports in British Columbia. Drawing on data
from recent probability-based surveys in the U.S. (by the Pew Research Center) and Canada
(by Ekos Research Associates), this paper asks a series of questions comparing the two cases.
What levels of support for (and opposition to) the two pipelines exist? What are the roles of
political factors (such as party identification), economic attitudes and proximity to the proposed
pipeline routes in shaping attitudes? And how do political and economic factors (on the one
hand) and proximity to the pipelines (on the other) interact? In asking these questions, the paper
sets out to build on the growing body of literature highlighting the geospatial determinants of
policy attitudes.
Emphasis Framing and Americans’ Perception of Scientific Consensus:
Scientists Agree on “Climate Change” but not on “Global Warming”
Jonathon P. Schuldt, Cornell University; Sungjong Roh, Cornell University; Norbert
Schwarz, University of Michigan
Whether or not citizens perceive a scientific consensus on global climate change has emerged
as an important factor in public opinion regarding climate policy (Weingart, Engels, &
Pansegrau, 2000; Kahan, Jenkins-Smith, & Braman, 2011). However, little is known about the
situational factors that might influence this perception. Building on recent research (Schuldt,
Konrath, & Schwarz, 2011), we explore whether a seemingly trivial wording change can
influence perceptions of scientific consensus, namely, whether the issue is framed in terms of
“global warming” or “climate change” in the survey question. In a nationally representative
survey experiment (N = 2041) fielded August 25–September 5, 2012, respondents reported on
their own as well as scientists’ beliefs about the existence of global climate change, worded
either in terms of global warming or climate change. Replicating a previous observation (Schuldt
et al., 2011) with a representative sample, Republicans (but not Democrats) reported
significantly lower existence beliefs when asked about “global warming” as compared to “climate
change.” Going beyond their own beliefs, respondents overall were less likely to perceive
scientific consensus when the issue was framed in terms of global warming. Thus, the influence
of these emphasis frames, which are commonly used interchangeably in public discourse,
extends beyond personal beliefs and affects citizens’ perceptions of the positions of scientific
experts. Discussion focuses on theoretical and practical implications of this subtle but
overlooked factor in science communication, survey design, and public opinion about climate.
Global Warming, Geo-Engineering and Human Happiness: Survey Based
Estimates of Worldwide Gains and Losses in North and South, Winter and
Summer
Jonathan Kelley, International Survey Center and University of Nevada, Reno
This paper provides quantitative estimates of the consequences of global warming for human
happiness (well-being, utility, life satisfaction). Data are from a representative national sample of
the U.S. (N=2295), together with standard NOAA data on climate worldwide on a half-degree
latitude/longitude grid. Regression estimates show that a century of warming at currently
expected rates will increase Americans’ satisfaction with winter weather in northern and mid-
latitude states but decrease their satisfaction with summer weather in all states. The gain is
equivalent to that which would come from an increase in income of around 8% in northern
states and a loss of 5% in southern states—huge figures, dwarfing most other consequences of
climate change. Assuming people in other nations evaluate temperatures the same way as
Americans, global warming is likely to be beneficial in higher latitudes (Canada, northern
Europe, north China, Korea, Argentina, New Zealand) and bad near the equator (Mexico,
Central America, Brazil, sub-Saharan Africa, India, south China, south-east Asia). The potential
for North-South conflict is clear. Moreover, choice in these matters may not lie with western
nations: In the absence of geo-engineering, the continued expansion of coal fired power plants
is likely to be a benefit to the north Chinese and possibly to China as a whole. If so, and if the
large and rapidly growing Chinese economy pursues its own self-interest, that alone could lead
to global warming regardless of what policies western nations pursue at home. Geo-engineering
techniques (such as atmospheric sulfur injections) might perhaps reduce these conflicts by
cooling lands near the equator while letting temperatures rise at higher latitudes. Indeed geo-
engineering might be a worldwide benefit if it could selectively cool summer temperatures in
middle and lower latitudes while letting winter temperatures rise at middle and higher latitudes.
Panel Recruitment, Attrition and Data
Quality I
Predicting Survey Breakoff in Internet Survey Panels
Tarek Al Baghal, University of Nebraska - Lincoln; Allan L. McCutcheon, University of
Nebraska - Lincoln; Davit Tsabutashvili, University of Nebraska - Lincoln
Survey breakoff – when respondents discontinue their participation before completing the
questionnaire – has attracted a growing amount of interest and attention (see, e.g., Peytchev
2009). The increased interest in breakoff involving Internet survey respondents has been
accelerated by the relatively recent availability of paradata collection methods for Web surveys.
In addition to respondent and survey design characteristics, it is now relatively easy to obtain
data such as the amount of time taken per survey item (response latency), number of response
changes to questions, time of day when the survey breakoff occurs, as well as a number of
other factors that can be evaluated as contributors to survey breakoff. The proposed study
examines data from monthly waves of the Internet component of the Gallup Panel, a multi-mode
(mail and Web) panel of American households. In addition to standard demographic respondent
characteristics and survey design factors (e.g., question complexity, topic, number of questions
on the page, length of survey), the analysis will include a variety of respondent self-reports on Internet sophistication, as well as paradata, to explore factors related to survey breakoff. Preliminary analysis indicates that while long-term panel members are less likely to break off, there appears to be a clear and persistent pattern with respect to response latency: as respondents approach breaking off their survey participation, they tend to slow down in their response time (increased response latency). The study will explore the
potential use of such predictive models for survey breakoff in designing possible
responsive/adaptive design (Groves and Heeringa 2006) interventions for Internet surveys that
may prove useful in averting, or delaying, Internet survey breakoff.
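A minimal sketch of the predictive-modeling idea, not the Gallup Panel analysis itself: logistic regression of breakoff on a respondent's within-survey latency trend and panel tenure, using simulated data and hypothetical variable names.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5000
latency_slope = rng.normal(0, 1, n)            # within-survey trend in item response times (simulated)
panel_tenure_months = rng.integers(1, 60, n)   # simulated length of panel membership

# Simulate breakoff so that slowing down raises the risk and longer tenure lowers it.
logit = -2.0 + 0.8 * latency_slope - 0.02 * panel_tenure_months
breakoff = rng.random(n) < 1 / (1 + np.exp(-logit))

X = np.column_stack([latency_slope, panel_tenure_months])
model = LogisticRegression().fit(X, breakoff)
print(dict(zip(["latency_slope", "panel_tenure_months"], model.coef_[0])))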
Innovative Retention Methods in Panel Research: Can SmartPhones Improve
Long-Term Panel Participation?
James J. Dayton, ICF; Andrew Dyer, ICF
Minimizing participant attrition is vital to the success of longitudinal panel research. One example of a longitudinal panel study conducted by ICF is the National Recreational Boating
Survey (NRBS), sponsored by the U. S. Coast Guard to ensure that the public has safe, secure,
and enjoyable recreational boating experiences. Specifically, the NRBS Program enables the
Coast Guard to better identify safety priorities and coordinate and focus research efforts. The
project features several components, one of which is the “Trip Panel.” The Trip Panel is
designed to capture actual exposure to recreational boating. This panel was recruited via dual-
frame, dual-mode (Random Digit Dial telephone and mail) and has been in place for over a
year. Respondent contact information includes e-mail address, mailing address, and telephone.
In many cases, the provided contact number is a mobile device. This presentation will explore
ICF researchers’ quest to improve panel retention through the introduction of a smartphone
application that engages respondents in between survey waves by allowing them to
communicate changes in contact information and even provide survey responses via
smartphone rather than via the Web or traditional telephone. Active panelists who provide cell
phone contact information will be randomly assigned to receive standard retention
communications via mail, phone and e-mail (control) or alternate retention communications via a
smartphone application and text message/SMS (treatment). The communications application for
the treatment group includes study updates, various interactive communications, and mini-
surveys. ICF researchers will analyze the differences in control and treatment panel retention
over a six-month period. We will also survey panelists’ willingness to sign on for another annual
wave of the panel as well as their overall satisfaction with panel participation as an indicator of
long-term continued participation.
Probability Based Postal Recruitment into Longitudinal Online Panels: The
Effects of Personalization and Incentives
Johan Martinsson, University of Gothenburg
This study examines the feasibility of probability based recruitment into longitudinal on-line
panels through postal invitations. The study explores the effect of three factors: personalization,
incentives and reminders. Further, the study uses a factorial design allowing us to explore
interactions between for example incentives and personalization. The aim of the study is to find
the most cost-efficient way to recruit a reasonably representative probability sample. Since this large-scale study involves as many as 29,000 postcards sent to a probability sample of the Swedish population drawn from the national population register, we are able to analyze not only the effect of personalization, incentives, and reminders on the recruitment rate, but also their effect on the actual demographic and attitudinal representativeness of those recruited from different
kinds of postal invitations. Further, due to the excellent Swedish population register we also
have access to register data on marital status, age, sex, children, citizenship, country of origin
and more for all individuals included in our random samples, and not only in the aggregate. This
allows us to carefully check which demographic groups respond more strongly (or more weakly) to the
factors examined in this study. All in all, three main outcomes are examined: recruitment rates,
representativeness, and the cost of recruitment. Finally, we also examine the long-term effects of the different recruitment approaches once respondents participate in their first large-scale survey, approximately one month after the initial recruitment survey.
Acquiescence to False Preload Information When Using Dependent Interviewing
Johannes Eggs, Institute for Employment Research; Annette Jäckle, Institute for Social
and Economic Research
With Proactive Dependent Interviewing (PDI), respondents are reminded of the answer to a
survey question they gave in a previous interview. The previous information is used to verify
whether the respondent’s status has changed, or as a starting point for asking about events
since the previous interview. In either case, concern is frequently voiced that measurement error
from the previous wave will be carried forward into future waves of the survey. In this paper we
use data from the panel survey “Labour Market and Social Security” (PASS), linked to individual
administrative records, to examine possible causes of acquiescence to false preload information. During the interviews for wave 4 of PASS, the preload on welfare receipt was generated incorrectly for a subgroup of 393 respondents, who were therefore asked questions with false preload information. Only some of these respondents contradicted the false preload. However, the
error allows us to exploit a rare research opportunity to address the following questions: 1) To
what extent do respondents confirm previous information when that is false? 2) How much of
the apparent false confirmation is in fact due to false reporting at the previous wave of the
survey? 3) To what extent is the false confirmation carried forward into the next wave of the
survey? 4) To what extent can the acquiescence be explained by personal traits, response
strategies, response difficulty, or interviewer characteristics?
How am I Doing? The Effects of Gamification and Social Sharing on User
Engagement
Oana M. Dan, The Nielsen Company; Jennie W. Lai, The Nielsen Company
Gaming mechanics and concepts (“gamification”), as well as virtual “sharing” within social
networks, are emerging tools to increase participation in surveys and especially to maintain
cooperation in longitudinal studies. As customizable and personalized devices germane to
respondents’ environment and lifestyle, mobile devices have greatly facilitated the development
of interactive measurement instruments that are able to challenge respondents, to evaluate and
reward their behavior, and to broadcast it to others in real time. However, the mechanisms
underlying the effects of gamification and social sharing on respondent engagement have not
been fully unpacked. These mechanisms may be active (extroverted interaction or competition
with other participants) or reflexive (introverted evaluation of one’s own performance). This
paper assesses these two mechanisms, relying on data from a 6-week study of an innovative
mobile application to measure media consumption behavior. The iPhone application allowed
users to record what they watched on TV, to earn badges and “ranks” based on their
engagement with the app’s various features, and to share their accomplishments with other
users. Mixed-effects panel models show that self-evaluation (checking how one is doing) and
positive reinforcement from others increase engagement, whereas extroverted competitive
interactions (sharing one’s performance with other users) decrease it. These results are
significant in both groups of study participants: one that was gradually exposed to the
gamification and social sharing features; and the other exposed to the full-featured app from the
beginning. Gamification and social sharing have stronger positive effects for those who were
gradually exposed to these features, showing that these effects are independent of other
factors, and that they could be explained in part by the novelty of these features. This suggests
that gamification and social sharing are effective and self-sustaining (hence, cost-efficient)
incentives in panel studies, especially if they promote self-evaluation and keep the study
exciting.
Evaluating Address-Based Samples I
The Implications of Excluding Inactive Mailing Addresses From ABS Frames
Rachel Harter, RTI International; Bonnie Shook-Sa, RTI International; Joseph McMichael,
RTI International; Jamie Ridenhour, RTI International
Unoccupied addresses in address-based sampling (ABS) frames lead to inefficiencies in data
collection and increased data collection costs. Some studies remove addresses flagged as
vacant or new construction to improve efficiency and reduce data collection costs. However,
housing units that are vacant or under construction in the frame have the potential to become
occupied and part of the eligible population for the survey. The longer the time lag between
frame construction and data collection, the greater the risk that the flags are outdated. Thus
there are tradeoffs between ABS sample frame coverage of the U.S. housing unit population
and the efficiency of data collection, with the element of time shifting the balance. This paper
explores the tradeoffs in the context of the U.S.P.S. Computerized Delivery Sequence file
(CDS), which is often used as an address frame for surveys and whose coverage of the housing
unit population has been researched. Sometimes the CDS is supplemented with traditional field
enumeration or ABS frame supplementation methods such as CHUM to improve coverage,
especially in areas that do not have city-style addresses. Recently the No-Stat file (NS)
containing drop units, throwbacks, and addresses on contract carrier routes not receiving mail
has been made publicly available, and it, too, has been used to supplement the CDS. This
paper examines vacancy and new construction status in the CDS/NS files, the typical durations
for housing units being flagged as vacant or new, the clustering of flagged addresses within
geographies and within buildings, and the extent to which addresses move from the NS to the
CDS file, or vice versa. With this information, survey designers can make a more informed
decision whether to supplement the active housing units in the CDS/NS files with those flagged
as vacant or under construction.
The Trajectory of the USPS DSF: Change in National Coverage for In-Person
Interviewing 2000-2010
Colm O’Muircheartaigh, NORC at the University of Chicago; Ned English, NORC at the
University of Chicago
Our continuing research program at NORC indicates that the proportion of the USA that
requires in-field listing has changed substantially over the past decade, shrinking from 28% to
15% of the population; the United States Postal Service (Computerized) Delivery Sequence File
((C)DSF) provides a preferable alternative from a cost and efficiency perspective for the rest of
the population. We use data from the NORC National Master Sample in both the 2000 and 2010
decades, which has listings for national surveys across environments and geographies in the
USA, to show the depth and breadth of changes to the DSF over the past decade.
Improvements in the CDSF have not been evenly distributed across the population, however,
with some areas remaining static since 2002 and others that formerly required in-field listing
now suitable for using the CDSF. Our paper examines the kinds of places that experienced the
most change in CDSF coverage during the period in which the list underwent the most research
with respect to surveys, i.e., the 2002-2012 decade. We will describe which micropolitan
statistical areas have improved faster than average, and what the structural implications of such
changes might be. Multimode Address-Based Sampling (ABS) also requires standardized
addresses, which are often not available for sparsely populated areas and for undifferentiated
apartment addresses within buildings. By examining the trajectory of change, we predict the
future requirements for in-person surveys and for multimode ABS.
Building a More Powerful Model to Predict Areas Where USPS-Based Address
Lists May Be Used in Place of Traditional Listing
Frost A. Hubbard, Survey Research Center, University of Michigan; James R. Wagner,
Survey Research Center, University of Michigan; Haoyu Gu, Survey Research Center,
University of Michigan; Wen Chang, Survey Research Center, University of Michigan
Traditional field listing is an expensive method for obtaining high levels of coverage on area
probability studies. Over the past decade, many studies have shown how using the U.S. Postal
Service Delivery Sequence File (DSF) as a sampling frame for area segments, typically clusters
of Census blocks, can greatly reduce costs while maintaining relatively high levels of coverage.
In general, rural areas have lower levels of coverage than suburban or urban areas. However,
this generalization is not uniformly true. Brick and colleagues (2011) devised a model, incorporating many other predictors, that improved the prediction of areas likely to be well covered by the DSF. Their prediction model was built using mainly American Community
Survey data, on a relatively small scale and not using a nationally representative sample of area
segments. Since new data are available from the 2010 Census, and since the National Survey
of Family Growth (NSFG) uses a nationally representative sample of area segments in which
the DSF listings are reviewed for correctness, we have the basis to develop an improved model.
We will use Census 2010 variables, variables from the Census hard to count data file, and data
on the DSF as predictors. Results from an experiment using this model in production will be
presented.
Growing Survey Response Rates on Trees: Evaluation of Response Propensity
Models Based on Logistic Regression Models and Random Forests Using Block-
Group Information Appended to an ABS Sampling Frame
Trent D. Buskirk, The Nielsen Company; Anh Thu Burks, The Nielsen Company; Brady T.
West, Institute for Social Research, University of Michigan
Address based sampling (ABS) enables survey researchers and statisticians to append a vast
array of ancillary information to the sampling frame at the block-group level for virtually every
sampling unit. Information such as median household income, percentage of renters, or
percentage of householders over 55 can be used a priori as part of the sampling design or post-
sampling to either improve the survey recruitment processes or serve as the basis for
nonresponse adjustments. In this presentation we report the results of a study aimed at
evaluating the use of a series of variables available both at the block-group and ZIP-code+4
levels from both the 2000 Census and other commercial sources to estimate response
propensities for a national media diary survey (MDS). The MDS sample consisted of over
650,000 addresses randomly selected from a national ABS sampling frame. The response
propensity models were constructed from a catalogue of over 50 ancillary variables using both
random forests and logistic regression models incorporating principal components for reduction
of the ancillary data. These methods will be compared to a basic response propensity model
derived using logistic regression from household predictors including age and Hispanic
indicators. We first compare the internal validity of these models, derived using a series of
cross-validation techniques including bootstrap resampling and a test-retest hold out sample.
We also present estimates of temporal validity based on application of these models to a
second sample from the same calendar year, and estimates of external validity based on
application of these methods to a separate and subsequent media diary national sample.
Finally, we will discuss how the results of this research can be used to tailor recruitment
strategies based on the optimal prediction models.
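The two modeling strategies named above can be sketched as follows; this is an illustrative comparison on simulated block-group covariates, not the MDS analysis, and the sample size, number of predictors, and tuning choices are assumptions.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(2)
n, p = 3000, 50                                  # addresses by ancillary block-group variables (simulated)
X = rng.normal(size=(n, p))
responded = rng.random(n) < 1 / (1 + np.exp(-X[:, :5].sum(axis=1) / 5))

# Logistic regression on principal components versus a random forest, compared by cross-validated AUC.
pca_logit = make_pipeline(PCA(n_components=10), LogisticRegression(max_iter=1000))
forest = RandomForestClassifier(n_estimators=200, random_state=0)

for name, model in [("PCA + logistic", pca_logit), ("random forest", forest)]:
    auc = cross_val_score(model, X, responded, cv=5, scoring="roc_auc").mean()
    print(name, round(auc, 3))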
Cashing in on ABS GOLD? Exploring the Utility of ABS Frame Appended
Auxiliary Data for Potential Nonresponse Bias Assessment and Adjustment
Anh Thu Burks, The Nielsen Company; Lauren Walton, The Nielsen Company; Trent
Buskirk, The Nielsen Company; Michael W. Link, The Nielsen Company
Address based sampling (ABS) is a viable sampling methodology due to its near universal
coverage of residential households with latest numbers placing coverage at 95% of households
(Link and Lai 2011; AAPOR Cell Phone Task Force, 2010). The frame itself provides an
alternative sampling solution for coverage issues related to cell phone only homes and hard to
reach demographic subgroups (i.e., 18- to 34-year-olds, blacks and Hispanics). Moreover, ABS frame data are rich and provide options for stratification, oversampling and nonresponse adjustments that extend well beyond what is available for RDD sampling designs. In this paper
we present results from a mixed-mode sample survey from an ABS frame that employed
vigorous nonresponse follow-up protocols. All randomly selected households were mailed a
survey and a subset of nonresponding households received a follow-up in-person survey
attempting to gain participation. Here we assess nonresponse biases for both a continuous
measure of media consumption and a binary measure of media access by comparing
responses on these outcomes between responding and nonresponding households. We will
explore characteristics of responding and nonresponding households that are based on both
standard survey household demographic variables as well as ABS auxiliary variables that are
measured at the block-group level. We will further assess the degree to which these variables are
related to the survey outcomes and determine the degree to which nonresponse biases can be
mitigated using propensity models based on a combination of survey demographic and ABS
frame variables. Specifically we will assess the utility of ABS frame auxiliary variables in
mitigating nonresponse biases by comparing nonresponse adjusted estimates based on both
logistic and random forest propensity models derived using only collected survey demographics
as well as those based on both survey demographic and ABS frame variables.
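A hedged sketch of the adjustment idea described above, using simulated data rather than Nielsen's: inverse response-propensity weights built from frame auxiliary variables, applied to a media-consumption outcome observed only for respondents.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 4000
income_pct = rng.random(n)        # simulated block-group auxiliary variable from the ABS frame
renter_pct = rng.random(n)        # simulated block-group auxiliary variable
responded = rng.random(n) < 0.2 + 0.4 * income_pct
media_hours = 10 + 5 * income_pct + rng.normal(0, 2, n)   # outcome, observed for respondents only

X = np.column_stack([income_pct, renter_pct])
propensity = LogisticRegression().fit(X, responded).predict_proba(X)[:, 1]

weights = 1.0 / propensity[responded]
unadjusted = media_hours[responded].mean()
adjusted = np.average(media_hours[responded], weights=weights)
print(round(unadjusted, 2), round(adjusted, 2), round(media_hours.mean(), 2))   # compare with full-sample mean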
Saturday, May 18
8:00 a.m. – 9:30 a.m.
AAPOR Concurrent Session G
Advances in the Use of Paradata
A Glimpse Inside the Mind of a Respondent: Using Paradata to Improve Online
Surveys
Travis Pape, U.S. Census Bureau
Traditional quality measures of survey instruments include item nonresponse and survey
completion time. In interviewer-administered modes, quality measures sometimes include
interviewer observations of respondent utterances or facial expressions. These results are often
subjective and cannot describe the reasons behind respondents’ answer choices or their
experiences with the survey. Use of paradata from Internet instruments allows us to get an
objective view of the entire survey experience from initial login to final submission. As part of the
2012 National Census Test, the Census Bureau captured paradata from every page of the
online instrument, along with respondent answers. These paradata provide rich data related to
respondent interaction with the census Internet questionnaire such as break-off rates, help link
access, answer changes, and completion times. These data help researchers key in on items
that are problematic from a user perspective in a way that is not possible with traditional data
analyses, such as response rates. Paradata results allow researchers to focus instrument
improvement efforts on items that are known to be problematic for a respondent in a very
specific way. This paper will use paradata results from the 2012 National Census Test to identify
potential issues that can be resolved for future online instruments and to highlight design
features that worked well.
Use of Paradata to Evaluate Medical Expenditure Panel Survey Data and
Operations
Lisa B. Mirel, Agency for Healthcare Research and Quality; Steven R. Machlin, Agency for Healthcare Research and Quality
The use of paradata in survey research has become increasingly valuable in recent years to
facilitate monitoring of survey operations and improve data quality. Paradata consists of
information about the data collection process in a survey, including interviewer observations,
interview language, computer generated time variables for questionnaire sections and
numerous other variables. One survey that uses paradata to monitor survey operations and
explore improvements in data quality is the Medical Expenditure Panel Survey Household
Component (MEPS-HC). The MEPS-HC is a complex multi-stage nationally representative
sample of the U. S. civilian noninstitutionalized population with an overlapping panel design.
Each year a new sample is drawn as a subsample of households that participated in the prior
year's National Health Interview Survey (NHIS) (conducted by the National Center for Health
Statistics). Data are collected in the MEPS-HC through a series of five CAPI interviews that
cumulatively cover a two year period on a variety of health related issues including health
conditions, use of medical care services, charges and payments, and access to care. There is a
wealth of MEPS-HC paradata associated with the multiple MEPS-HC interviews and additional
paradata information can be obtained by linking to the NHIS. Selected paradata are routinely
used to improve non-response adjustments to MEPS-HC survey weights and have been used
for a responsive design pilot study. This paper describes an ongoing evaluation of the
association between paradata measures and data quality in the MEPS-HC. In particular, the
current evaluation uses descriptive statistics and multivariable modeling to evaluate areas of
improvement in the collection of reported health care utilization in the MEPS-HC. The results
are interpreted in the context of strengths and limitations of using paradata for improving data
quality and monitoring survey processes.
Using Audit Trail Data for Interviewer Data Quality Management
Haoyu Gu, University of Michigan; Nicole Kirgis, University of Michigan
Audit trail data, the record of actions and entries on computers by computer users, have been
collected in many studies using Computer-Assisted Personal Interviewing (CAPI). Audit trail
data collected during the National Survey of Family Growth (NSFG) include a record of every
key stroke and the time spent between key strokes while interviewers conduct CAPI interviews.
Using these data, a data quality dashboard was created in order to monitor data quality at the
interviewer level. Indicators include the average time spent on survey questions, the frequency
of using help screens, recording remarks, checking errors, backing up in the interview, and the
frequency of “don’t know” and “refuse” responses. Principle component analysis (PCA) is used
to investigate the relationship between the elements of the interview process. Three factors
identified from PCA are included in the dashboard. Two examples will be presented in this
paper, showing that by using this data monitoring technique, interviewers with quality concerns
can be effectively identified, and changes in the performance of problematic interviewers after intervention can be monitored.
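A rough sketch of the dashboard construction, with simulated indicators rather than NSFG audit trail data: per-interviewer paradata measures are standardized and summarized with principal component analysis, and extreme factor scores flag interviewers for review.

import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
indicators = pd.DataFrame({
    "mean_seconds_per_item": rng.normal(8, 2, 340),
    "help_screen_rate": rng.random(340) * 0.05,
    "remark_rate": rng.random(340) * 0.10,
    "backup_rate": rng.random(340) * 0.08,
    "dont_know_refuse_rate": rng.random(340) * 0.06,
}, index=[f"interviewer_{i}" for i in range(340)])

scores = PCA(n_components=3).fit_transform(StandardScaler().fit_transform(indicators))
dashboard = pd.DataFrame(scores, index=indicators.index, columns=["factor1", "factor2", "factor3"])

# Interviewers with extreme scores on the first factor can be flagged for closer review.
print(dashboard["factor1"].abs().nlargest(5))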
Examining Response Time Outliers Through Paradata in Online Panel Surveys
Jinyoung Lee, University of Nebraska - Lincoln; Tarek Al Baghal, University of Nebraska -
Lincoln
As nonresponse rates and costs of traditional data collection modes increase, more people are
becoming interested in Web surveys as an alternative. Although there are great concerns about
coverage errors in Web surveys, the simultaneous advantages of Web surveys—timeliness,
cost-saving, various design options, and applicability to mixed modes—make them attractive
survey modes. This study focuses on response time using paradata and survey responses from
the Internet component of the Gallup Panel. Usually, response time is highly skewed. For
example, while the average total response time for a Gallup Panel survey in June was 295.15
seconds, the maximum total response time was 4561.24 seconds. To handle outliers with very
long response times, Yan and Tourangeau (2008) replaced observations beyond the upper one
percentile with the ninety-ninth percentile value and observations below the lower one percentile
with the first percentile value, respectively. This study, however, focuses on the outliers
themselves, especially those with extremely long response times. Outliers are potentially
important because they provide cues to identify respondent behavior and response patterns. In
a preliminary analysis, cutting outliers with long response times at certain points excluded nearly
one-third of the participants who broke off from the analysis. Also, there were significant
differences in the percentages of item nonresponse between outliers and non-outliers. Despite
their importance, outliers tend to be excluded from the analysis because of their great leverage on the overall results. Instead of discussing the optimal cutoff points for outliers, this study aims
to examine the features of outliers in online panel surveys and suggests that outliers with long
response latencies be investigated for researchers to understand respondent behavior and
improve data quality. Exploring response time outliers through paradata may show us a novel
way to approach various issues concerning Web surveys.
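The contrast drawn above between trimming and studying outliers can be illustrated with a short sketch on simulated response times; the cutoffs follow the 1st/99th-percentile rule attributed to Yan and Tourangeau (2008), and all numbers are made up.

import numpy as np

rng = np.random.default_rng(5)
response_time = rng.lognormal(mean=5.5, sigma=0.6, size=2000)   # right-skewed times in seconds (simulated)

low, high = np.percentile(response_time, [1, 99])

winsorized = np.clip(response_time, low, high)    # replace extreme values with the percentile values
outlier_flag = response_time > high               # alternatively, keep extremes but mark them for study

print(round(response_time.mean(), 1), round(winsorized.mean(), 1), int(outlier_flag.sum()))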
What Can Paradata Tell Us About Multi-Establishment Business Reporting?
Eric B. Fink, U.S. Census Bureau
Paradata are increasingly used to understand respondent behavior and survey outcomes. In
this paper, we use paradata to examine multi-establishment business reporting patterns for the
2011 Annual Survey of Manufactures. The ASM offers two main reporting options: paper and
electronic. All Business enterprises are mailed a form, but are encouraged to report
electronically. Electronic reporting occurs via the downloadable reporting software used by
multi-establishment businesses called Surveyor. Enterprises that do not respond initially are
subject to nonresponse follow-up. The ASM nonresponse follow-up includes up to four
subsequent mailings to the initial mailing and, for select enterprises, analyst phone calls. We
combine Surveyor, 2007 Economic Census data, and other ASM paradata for our analysis.
Based on our findings, we discuss ideas for adapting the survey during data collection to bring
down costs while maintaining or improving data quality.
Adaptive Design at the Census Bureau
Adaptive Design at the Census Bureau—A New Way of Doing Business
Peter V. Miller, U.S. Census Bureau
The Census Bureau has made a significant investment in adaptive design, a strategy for more
efficient management of survey data collection. The Bureau is engaged in developing capabilities
for employing adaptive design in all of its censuses and surveys. This panel illustrates a range
of efforts in progress. First, we provide an overview of the projects directed by the newly formed
Center for Adaptive Design, which include research on adaptive design components, IT system
design and outreach and education. Then we offer two papers that detail efforts to develop and
validate paradata resources essential to putting adaptive design into practice. One paper
concerns data quality of contact information recorded by interviewers in a number of Census
CAPI surveys. This information is used to measure the level of effort expended in attempting to
interview each case and in estimating the propensity of each case to respond. The second
paradata paper details developmental work on an instrument that supplements contact data with
interviewer observations of household characteristics. This information may refine estimates of
response propensity and offer a means to adjust for nonresponse bias for cases that are not
interviewed. The fourth paper describes the process of integrating paradata resources and
survey response data to create a set of timely survey metrics. We detail how effort and cost
information is combined with response propensity and key survey estimates in a single display
in near real time to allow survey managers to track survey progress and execute adaptive
design interventions. Finally, we illustrate an application of adaptive design interventions in the
National Survey of College Graduates. The test involves both continuous monitoring of key
survey indicators and mode switching to increase the likelihood of response in a shorter field
period.
An Investigation of Quality of the Contact History Instrument
Dawn V. Nelson, U.S. Census Bureau; Julia Coombs, U.S. Census Bureau
The Contact History Instrument (CHI) is a standalone Blaise application housed in the Census
Bureau’s computer-assisted personal interview (CAPI) Case Management system. Beginning in
January 2004, Field Representatives (FRs) have used the CHI application to record details
about contacts and contact attempts on the National Health Interview Survey (NHIS). Today, all
ongoing and some periodic Census CAPI surveys have embraced the CHI. Survey managers in
the field and at headquarters rely on CHI data for daily monitoring of survey progress and
quality control. Researchers have used CHI data for a wide range of analyses including survey
cooperation and nonresponse, optimization of field operations, and effectiveness of respondent
incentives. Furthermore, the CHI is an important paradata source for the Census Bureau’s
adaptive design efforts. Given the wide acceptance and use of the CHI and its importance in the
Bureau’s adaptive design initiative, it is critical that the CHI data be fit for the uses to which they are
put. In this paper, we discuss a recent multi-survey evaluation of the CHI in terms of
completeness, reliability, and validity. We identify weaknesses and strengths of the CHI data,
and describe our planned research efforts for improving CHI data quality. We end with
recommendations for others using similar interviewer-created paradata.
Interviewers as Respondents: Assessing the Usefulness of Neighborhood and
Sample-unit Interviewer Observations
Rachael Walsh, U.S. Census Bureau; Nancy Bates, U.S. Census Bureau
Interviewer observations have recently gained attention in the survey methods literature as a
way to enhance both the data collection process and the quality of the data. Adaptive survey
design can potentially benefit from visual information collected by interviewers to provide
contextual data about interviewer assignment areas. Survey managers can use this information
to manage cases better through response propensity models. When they are correlated with
both response propensity and the survey variables of interest, interviewer observations can
reduce nonresponse bias through post-survey adjustments. This paper includes an assessment
of interviewer observations and the potential of these observations for use in adaptive survey
design. The 2012 Survey of Income and Program Participation-Event History Calendar (SIPP-
EHC) field test included interviewer observations of 3,582 sample units collected by 340
interviewers. Observations included 17 different characteristics of the sample unit and
surrounding neighborhood. In this paper, we address the following research questions:
How successful were interviewers in collecting the observations?
Are observations predictive of final survey outcomes?
Are observations correlated with key survey estimates like employment, participation
in social welfare and social insurance programs, health insurance coverage, and
poverty?
Does usefulness of observations vary by neighborhood versus sample-unit level
observations?
Do observations have added value beyond the usual contact history data (e.g.
doorstep concerns, number of attempts, mode of attempts)?
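To make the predictive-power question concrete, the following is a minimal, hypothetical sketch (simulated data only, not SIPP-EHC observations) of how interviewer observations might be related to a final response outcome with a logistic regression; the five binary observation flags and their effects are invented for illustration.

```python
# Hypothetical sketch: do interviewer observations predict the final
# response outcome? All data below are simulated for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 3582  # number of sample units in the field test (from the abstract)

# Simulated 0/1 observation flags (e.g., "unit appears occupied").
obs = rng.integers(0, 2, size=(n, 5))

# Simulated final outcome: 1 = completed interview, 0 = nonresponse.
true_effects = np.array([0.4, 0.2, 0.0, -0.3, 0.1])
responded = rng.binomial(1, 1 / (1 + np.exp(-(obs @ true_effects - 0.2))))

model = LogisticRegression().fit(obs, responded)
pred = model.predict_proba(obs)[:, 1]
print("estimated coefficients:", model.coef_.round(2))
print("in-sample AUC:", round(roc_auc_score(responded, pred), 3))
```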
Developing Survey Metrics for Adaptive Design
Barbara O’Hare, U.S. Census Bureau
Adaptive survey design is based on interventions during data collection to achieve strategic
survey goals. Intervening in data collection requires access to metrics that integrate paradata
and response data. This paper discusses the development of survey metrics and a dashboard
display for the 2013 American Housing Survey conducted by the Census Bureau. We will
discuss the decision process to identify the key metrics and the configuration of a dashboard to
display them in near real-time. This work involves consultation with the Census survey manager
and the sponsor to determine which survey response variables to track daily. It entails tracking
case completion and response rate to measure survey progress. It also involves the
construction of effort and cost metrics to assess continuously the expense associated with
progress. Finally, the construction of survey metrics includes measuring the propensity of open
cases to respond to further contact attempts. The combination of survey response, case completion, cost and effort, and response propensity measures allows the survey manager to adjust field efforts to optimize data quality while containing costs. The dashboard is dependent
on an integrated system of paradata and reporting capabilities. Data from several Census
Bureau systems (e.g., Field, Payroll) need to be assembled and converted for use as survey
metrics. A Unified Tracking System (UTS) in the Bureau has made survey process data from
these different systems accessible to a range of survey stakeholders. We discuss the process of
refining information provided through the UTS for particular survey requirements.
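As a loose illustration of the kind of roll-up described above (and not the Census Bureau’s Unified Tracking System or dashboard), the sketch below aggregates invented case-level paradata into daily progress, effort, and cost-type metrics; every field name and value is hypothetical.

```python
# Illustrative only: rolling up hypothetical daily paradata into
# dashboard-style metrics (progress, effort, mean propensity of open cases).
import pandas as pd

paradata = pd.DataFrame({
    "day": ["2013-05-01", "2013-05-01", "2013-05-02", "2013-05-02"],
    "attempts": [1, 2, 1, 3],
    "interviewer_hours": [0.5, 1.0, 0.4, 1.5],
    "completed": [0, 1, 1, 0],
    "est_propensity": [0.35, 0.60, 0.55, 0.20],
})

daily = paradata.groupby("day").agg(
    completes=("completed", "sum"),
    cases_worked=("completed", "size"),
    total_attempts=("attempts", "sum"),
    hours=("interviewer_hours", "sum"),
    mean_propensity=("est_propensity", "mean"),
)
daily["response_rate"] = daily["completes"] / daily["cases_worked"]
daily["hours_per_complete"] = daily["hours"] / daily["completes"].where(daily["completes"] > 0)
print(daily)
```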
2013 National Survey of College Graduates: A Practice-Based Investigation of
Adaptive Design
John Finamore, U.S. Census Bureau
The goals of adaptive design are to attain high-quality survey estimates in less time and at less
cost than traditionally executed survey operations. The National Survey of College Graduates
(NSCG) will be fielded from February to July of 2013 and will investigate several facets of
adaptive design in order to achieve these goals. First, daily processing (editing, imputation,
weighting) is operationally expected to reduce the overall time from the beginning of data
collection until the final delivery of data and estimates. In addition to operational efficiencies,
daily processing will allow the survey team to monitor several quality measures throughout data
collection, including R-indicators, benchmarking, stability of estimates, and response
propensities by mode. Adaptive design techniques will be directly employed in a mode-switching
experiment, where data quality measures will be examined on a weekly basis, and cases will be
switched between modes, or put on hold entirely. This experiment is an attempt to allocate
resources more efficiently in order to maximize survey quality while minimizing wasted funds
and effort. The NSCG uses the American Community Survey (ACS) as its sampling frame and
so has a large quantity of data from which to construct propensity models and calculate
expected frame totals. For the 2013 NSCG, propensity models calculated using 2010 NSCG
data will be applied to 2013 NSCG data for initial locating and response propensity estimation.
Those models will be updated with respondent data from 2013 so that adaptive design
decisions employ the most up-to-date models available. Daily processing will use respondent
data to calculate weighted estimates of frame variables for comparison with expected estimates
from the ACS for benchmarking purposes. This talk will discuss the components of adaptive
design that NSCG will implement in the 2013 survey, and present examples of data quality
measures using 2010 NSCG retrospective data.
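As a minimal sketch of the benchmarking step mentioned above, the snippet below compares a weighted respondent estimate of a frame variable against an assumed expected value from the sampling frame; the weights, the variable, and the 0.62 benchmark are invented and do not reflect NSCG or ACS figures.

```python
# Illustrative benchmarking check: weighted respondent estimate vs. an
# assumed frame (ACS) expected value. All numbers are fabricated.
import numpy as np

weights = np.array([120.0, 95.0, 210.0, 150.0, 80.0])   # respondent weights
frame_var = np.array([1, 0, 1, 1, 0])                    # e.g., 1 = holds a bachelor's degree

weighted_est = np.average(frame_var, weights=weights)
acs_expected = 0.62                                      # hypothetical benchmark

print(f"weighted estimate: {weighted_est:.3f}")
print(f"deviation from benchmark: {weighted_est - acs_expected:+.3f}")
```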
Surveying Families and Households
Concordance of Information Collected from Both Members of Low-Income
Couples
Daniel J. Friend, Mathematica Policy Research; Amber Tomas, Mathematica Policy
Research; M. Robin Dion, Mathematica Policy Research; Debra Wright, Mathematica
Policy Research; Robert Wood, Mathematica Policy Research
Low-income families and couples are often the target of federal policies and programs,
particularly social service programs. As part of the evaluations of these programs, researchers
collect background data which is used for several important purposes such as 1) describing the
characteristics of participants, 2) controlling variables in impact analyses, and 3) analyzing
impacts on subgroups. Although data is frequently collected from both members of couples, little
is known about how often partners agree on basic household demographics (e.g., income), or
how their perspectives on more subjective measures, such as relationship quality, may differ.
Although research exists on the level of agreement between proxies and respondents, little
research has been conducted on agreement between partners within a couple. Given that
analysis may focus on only one partner, it is important that we understand how often couples
agree or disagree on this basic important information. To shed light on this question, we will
analyze data from three studies involving low-income families funded by the Administration for
Children and Families, including the Building Strong Families project (a national evaluation of
healthy relationship programs involving 4,700 couples), the Couples Decision-Making project (a
multi-method study examining decision-making in 46 low-income couples), and the Creating
Healthy Relationships project (an evaluation of an intimate partner violence prevention program
including 115 couples). We will examine demographic variables (e.g., family structure, income)
and relationship variables (e.g., status, quality) and compute a couples’ agreement score indicating the degree to which the couples agree on these variables. Additional data sources from these studies (i.e., observational data) will be used in regression analyses to explore potential explanations for discordance. Finally, we will discuss the findings’ implications and applications for future data collection and analysis of families and couples, including guidance on determining the best respondent.
“S/he Said What!”: The Challenge of Interviewing Both Partners About a
Relationship
Jennifer Satorius, NORC at the University of Chicago; Colm O’Muircheartaigh, University
of Chicago; Angela Jaszczak, NORC at the University of Chicago; Stephen Smith, NORC
at the University of Chicago
The National Social Life, Health, and Aging Project is a longitudinal study designed to explore
the role of social support and personal relationships in healthy aging. Each wave of multi-mode
data collection combines in-home CAPI interviews with the collection of a wide range of
biomeasures. Wave 1 was conducted in 2005-2006 with a nationally representative sample of
more than 3,000 older adults. Wave 2 was conducted in 2010-2011. To understand from the
perspective of both partners the role intimate relationships play in respondents’ health, Wave 2
interviewed the cohabitating spouses and romantic partners (partners) of our primary
respondents (primes) in addition to interviewing the primes. Given the inclusion of questions
regarding health behaviors and relationship quality, we wanted to assess whether the
introduction of partner interviews might discourage response from the primes or introduce a bias
in their responses. An experiment was designed in Wave 2 to assess the impact of the change
in methodology. Primes were assigned to one of three experimental conditions: 1) Primes were
informed in advance of the Wave 2 interview that the partner would be approached for interview;
2) Primes were informed at the end of the Wave 2 interview that the partner would be
approached for interview; 3) No request was made for a partner interview. The results of this
experiment will inform future decisions on the design of surveys involving data about
partnerships. The design permits the assessment of three effects which will be presented at the
conference: first, whether the introduction of the partner interviews affects the data from the
primes or their response rates; second, whether the timing of the request has a differential
impact; and third, whether the responding partners themselves can provide an unbiased
estimate of population values.
Validation of Teacher Report as a Methodology for Collecting Information on
Student’s Cognitive Knowledge and Skills
Kristin Flanagan, American Institutes for Research; Cameron McPhee, American Institutes
for Research
The Early Childhood Longitudinal Study, Kindergarten Class of 2010-11 (ECLS-K:2011),
sponsored by the National Center for Education Statistics (NCES) within the U. S. Department
of Education, is a nationally representative study of children in kindergarten during the 2010-11
school year. The ECLS-K:2011 will follow these children throughout their elementary school
years, culminating data collection in the spring of 2016 when the majority are in fifth grade.
During the kindergarten year collection, the ECLS-K:2011 collected information about children’s
reading, mathematics, and science knowledge and skills both through direct assessment of the
child and through teacher report, allowing for a unique opportunity to check the validity of
teacher-reported data. Over 18,000 children participated in their kindergarten year, from diverse
socioeconomic and racial/ethnic backgrounds, in both public and private kindergarten programs.
This study will 1) explore the validity of teacher report of children’s reading, mathematics, and
science knowledge and skills by comparing teacher reports to direct child assessment data, 2)
explore the variation in validity by characteristics of teachers and classrooms, such as teacher education, experience, certification, approaches to instruction (e.g., use of whole-group versus small-group instruction; use of ability groups), and classroom characteristics, such as class size and racial/ethnic composition; and 3) explore the variation in validity by characteristics of the child, such as socioeconomic status, gender, and race/ethnicity. Studies of children’s growth and development often rely on methodologies in which the teacher provides information on children’s knowledge and skills and direct assessments are not included. A study such as this
one will provide information to researchers on the validity of teacher report of such information,
exploring the possibility of variation of validity by teacher, classroom, and child characteristics.
Understanding the sources of validity differences can help researchers interpret survey results
as well as design surveys that minimize this variation.
Maintaining Sensitivity to Socio-Cultural Differences in Survey Instruments for
Heterogeneous Samples
Rebecca Weiner, Mathematica Policy Research
The past several decades have witnessed sweeping changes in the family and left many U.S.
children without the support or involvement of their fathers. In response, the federal government
created the Responsible Fatherhood (RF) grant programs. To better understand the
effectiveness of such programs, Mathematica Policy Research is assisting the Administration for
Children and Families (ACF) with the Parents and Children Together (PACT) Evaluation, a
study of a subset of RF federal grantees. African American and Hispanic fathers will comprise a
large proportion of PACT’s impact study sample. This paper discusses the design of the
baseline survey instrument and the use of pretesting techniques to achieve a culturally relevant
instrument that is nimble enough to capture the complex family structures of diverse program
participants. We drew on several national surveys of similar populations, and consulted with
nationally recognized experts in research, practice and policy as we designed the instruments.
We conducted cognitive interviews with African American and Hispanic fathers reporting
different family configurations, including married fathers and non-residential fathers who had
children with multiple partners, to assess question response and sources of response error.
Results from the cognitive interviews suggested respondents inaccurately interpreted several
key items, particularly those related to men’s mental health and wellbeing, and highlighted
issues that warranted adjusting the instrument. In response to the cognitive interview results, we
modified item sequencing and wording and included a different mental health scale (PHQ-8),
which respondents more easily understood in subsequent pretests. We will discuss the pretest
process, the findings for the baseline instrument and implications for future survey research with
similar populations.
Potential Explanations for the High Net Undercount Rate of Young Children in the
U.S. Decennial Census
William P. O’Hare, U.S. Census Bureau; Eric Jensen, U.S. Census Bureau; Barbara
O’Hare, U.S. Census Bureau
The Census Bureau’s Demographic Analysis (DA) found a net undercount rate of 4.6 percent
for children age 0 to 4 in the 2010 Census, higher than any other age group. In addition, the net
undercount rate for young children has increased substantially since the 1980 Census. This
paper presents three possible explanations as to why young children have a high net
undercount rate in the Census and discusses the implications for data collection. One factor
which may account for the difference between the DA counts and the Census may be the
population estimation technique for children ages 0-4, where net international migration of
young children is underestimated. The second set of ideas is related to the Census data
collection instrument and processing which may result in under-enumeration of young children.
The third category of ideas is related to the households and living arrangements of young
children and the extent to which young children are over-represented in hard-to-count places
and households. Each proposed cause is described, currently available data are used to assess
the ideas, and additional data are proposed to better assess each idea or set of ideas. The
undercount of very young children in the U.S. Census has received relatively little attention in
the professional literature, yet there are substantial implications beyond the decennial census.
For example, weighting of survey results often relies on census data. In addition, data collection
procedures for capturing accurate counts of very young children in the census can apply to
survey data collection. This paper furthers the discussion of this important issue.
Cell Phone Sampling
Improving the Reliability of Survey Items to Assess Telephone Status in RDD
Surveys
Vincent E. Welch, NORC at the University of Chicago
The reliability and validity of random digit dial (RDD) landline telephone surveying in the United
States has been threatened in the past decade by concerns about possible noncoverage bias
linked, in part, to a growing number of households giving up their landline telephone and
embracing a wireless only lifestyle (AAPOR, 2010). Since the beginning of the last decade,
survey researchers have recognized the need to address the mobile phone population in order
to ensure full coverage of the population of U.S. households (Blumberg and Luke, 2012).
However, the reliability and validity of the items that assess telephone status have not been
established (AAPOR, 2010). Over the past year, NORC has conducted a series of qualitative
and quantitative research studies aimed at filling in this vital gap in knowledge. Researchers at
NORC conducted focus groups and cognitive interviews with dual-phone (i.e., landline and
wireless) users to assess the understandability of the current telephone status items in use in
many surveys, such as the National Health Interview Survey, California Health Interview Survey,
National Immunization Survey, and multiple surveys conducted by Gallup. In-depth probing
revealed substantial threats to reliability associated with the wording of telephone status items
and response scaling. We found that by altering the wording of the items and the response
scaling, we could increase the reliability of responses substantially. Results of a preliminary test
of the new scaling will be discussed.
Cell-Phone-Only Voters in 2012 National and State Exit Polls
Michael Mokrzycki, Mokrzycki Survey Research Solutions
Courtney Kennedy, Abt SRBI
The November 2012 U.S. exit polls included a question on voters' telephone status not only on
the national questionnaire—as in elections dating to 2004—but in surveys in 12 states with high
rates of early or absentee voting. Nationally and in the aggregate of the 12 states, one-third of
voters were cell-only. By state, however, this proportion ranged from 50% in Arizona to 17% in
New Jersey; these estimates among voters correlate highly with National Center for Health
Statistics modeled state-level estimates of wireless-only incidence for all adults. In many states
presidential vote preference differed starkly between cell-only voters (typically more likely to be
Obama voters) and those with landlines, but there were exceptions, including the primary and
general election battleground New Hampshire. In seven of the 12 states and the national exit
poll, there were supplementary dual-frame telephone polls to reach early or absentee voters;
cell/landline status and other survey estimates for those respondents will be compared with
those for Election Day voters. Characteristics of cell-only and other voters will be compared with
data from past national exit polls. Implications for future pre-election surveys and exit polls will
be discussed.
The Use of Billing Zip Code and Recent Activity Flags in Cellular Telephone
Samples
David Dutwin, Social Science Research Solutions
David Malarek, MSG
Major sampling companies have recently begun to offer the appending of billing zip codes and
recent activity flags to cellular telephone samples. This study considers the utility of these data
by investigating first what percent of cellular telephone numbers receive the billing zip flag,
and then, through the use of a large scale national study, by measuring the percent of cell
owners whose billing zip actually matches the self-reported zip of their household. Differences
by geography, demographics, and characteristics of the zip codes themselves are analyzed to
assess the degree of bias inherent in utilizing only sample records that have a billing zip flag, and then only respondents who actually qualify for a study by reporting that they in fact live in the zip code(s) targeted by such sample. This paper also considers the distribution of respondents who
have a billing zip flag but do not live in their billing zip, and measures the increase in coverage
that can be attained by casting a wider net to nearby zip codes, outside of the study target
geography. With regard to recent activity flags, we report on the differential telephone
dispositions of sample in a large national study by whether each sample record has any recent activity at all, by the prepaid phone flag, and by the recency-of-use measure. Implications for
cellular telephone sampling are considered.
Adjustments for Missing Cell Phone Only Respondents in Repeated Cross-
Sectional RDD Surveys
Burton Levine, RTI International
Until 2011, the Behavioral Risk Factor Surveillance System (BRFSS), a repeated cross-
sectional random digit dial survey, only utilized a landline telephone frame. In 2011, the BRFSS
frame was supplemented with a frame of cell phones. To the extent that health behaviors such
as smoking are correlated with cell phone only status, the landline-only sample results in
noncoverage bias. Trends in health behaviors are confounded with trends in telephone usage
and the 2011 sample design change. Specifically, based on BRFSS data, many states saw a
downward trend in smoking rates between 2005 and 2010 that reversed in 2011 when the cell
phone frame was added. We present methodology to account for the pre-2011 coverage error
and the resulting coverage bias. We impute the missing cell phone-only subjects from pre-2011
data with the 2011 cell phone-only respondents. We then reapply the poststratification and
recalculate the smoking rates at each time interval. As a result of this procedure the 2005-2010
smoking rates increased, but not uniformly—the later the year, the more coverage error, and
therefore the greater the increase in the adjusted smoking prevalence. In some states, before
the adjustment, the 2011 smoking rate was the highest for all years between 2005 and 2011;
but after the adjustment, 2011 had the lowest smoking rate. This methodology is generalizable
to other outcomes that are correlated with cell phone only status in other repeated cross-
sectional RDD surveys that added a cell phone component.
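A rough sketch of the adjustment idea follows, assuming a simplified donor-based imputation of the missing cell-phone-only stratum and an unweighted prevalence; the actual methodology reapplies the BRFSS poststratification, and every number below is fabricated.

```python
# Simplified illustration: impute a missing cell-phone-only stratum from a
# later year's cell-only respondents, then recompute smoking prevalence.
import numpy as np

rng = np.random.default_rng(1)

landline_smoker = rng.binomial(1, 0.18, size=800)        # earlier-year landline sample
donor_cell_only_smoker = rng.binomial(1, 0.28, size=400) # 2011 cell-only donor pool

share_cell_only = 0.25  # assumed share of the population that was cell-only (uncovered)
n_impute = int(len(landline_smoker) * share_cell_only / (1 - share_cell_only))
imputed = rng.choice(donor_cell_only_smoker, size=n_impute, replace=True)

before = landline_smoker.mean()
after = np.concatenate([landline_smoker, imputed]).mean()
print(f"prevalence before adjustment: {before:.3f}")
print(f"prevalence after adjustment:  {after:.3f}")
```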
Methodological Briefs:
Survey Measurement
Improving the Measurement of Big 5 Personality Traits in a Brief Survey
Instrument
Matthew DeBell, Stanford University; Ted Brader, University of Michigan; Simon
Jackman, Stanford University; Catherine Wilson, Stanford University
The 'Big Five' personality traits are the subject of a huge literature in psychology. Part of this
literature employs extensive multi-item scales whose length normally precludes their inclusion
on representative sample surveys. The Ten Item Personality Inventory (TIPI) has made Big 5
measurement practical in more settings, including representative surveys. However, TIPI's
agree-disagree question format invites acquiescence bias. In this paper we report the results of
an attempt to improve personality measurement by rewriting the questions to fix the
acquiescence problem. We compare the canonical version to an edited version and assess the
quality of the resulting data (from a survey conducted by the American National Election Studies
in 2012) on several dimensions: completion time, item nonresponse, paired item reliability, and
construct validity. We also compare results from both measures in tests of hypotheses about
personality's relationship to political attitudes and behavior. We find that completion time and
item nonresponse rates are comparable, while reliability and construct validity for the revised
TIPI are as good or better than the canonical version by most measures. The results show how
better personality data can be obtained at no additional cost by optimizing questionnaire design.
A Comparative Look at Measures of Socioeconomic Status and How Well They
Predict Academic Achievement
David Miller, American Institutes for Research; Saida Mamedova, American Institutes for
Research
Socioeconomic status (SES) generally refers to the social standing or class of an individual or
group based on economic and social factors. When studies refer to SES levels (low, high, etc.),
people may assume that a common definition or measure has been employed. This analysis will
examine specific SES measures used across several education studies, including national
household surveys, national longitudinal studies, and national and international assessments.
Some education studies, such as the Early Childhood Longitudinal Study (ECLS) and Education
Longitudinal Study (ELS), produce a composite SES measure based on parents’ education,
parents’ occupation, and household income as reported by a parent of each student. However,
in the Program for International Student Assessment (PISA), a composite SES measure is
constructed based on student-reported information from the 15-year-olds participating in the
study. It is composed of several variables: the International Socio-Economic Index of
Occupational Status (ISEI); the highest level of education of the student’s parents, converted
into years of schooling; the PISA index of family wealth; the PISA index of home educational
resources; and the PISA index of possessions related to “classical” culture in the home. In the
Trends in International Mathematics and Science Study (TIMSS), 4th-graders’ reports of how many books they have at home are often used as a proxy for SES, and the percentage of
students in a public school eligible for free or reduced-price lunch has often been used in
studies as a proxy for school-level SES. In this analysis, we will describe SES measures used
across major national and international education studies. Using regression analyses, we will
examine how variation in student achievement within a given study differs if alternative SES
measures are applied. The study aims to better understand the implications of different
definitions and measurement of SES, especially as related to student achievement.
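For illustration only, the sketch below regresses a simulated achievement score on two alternative SES proxies and compares the variance each explains; the variables, effect sizes, and data are invented and are not drawn from the studies named above.

```python
# Illustrative comparison of explanatory power across alternative SES proxies.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n = 1000
parent_educ = rng.normal(14, 2, n)       # years of schooling (simulated)
books_at_home = rng.poisson(60, n)       # TIMSS-style proxy (simulated)
achievement = 20 + 3 * parent_educ + 0.05 * books_at_home + rng.normal(0, 10, n)

for name, x in [("parent education", parent_educ), ("books at home", books_at_home)]:
    X = x.reshape(-1, 1)
    r2 = LinearRegression().fit(X, achievement).score(X, achievement)
    print(f"R^2 using {name}: {r2:.3f}")
```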
Applying “Best Practices” to Questionnaire Design
Darin Harm, Arbitron
Arbitron uses a short questionnaire as the first step of a multi-mode data collection process
(mailed screener, phone diary placement and mailed diary package) for recruiting the non-
landline portion of the population. If a respondent reports being cell phone only or cell phone mainly, the household is added to a cell-phone frame that is used to supplement a 2+ list-assisted RDD sample. Improving the response rate for the questionnaire is critical to improving
the overall response rate of the ABS frame sample since the overall response rate for the ABS
sample can never be higher than the return rate to the initial questionnaire. In the summer of
2012, Arbitron redesigned the questionnaire. The goal of this redesign was to apply best
practices in questionnaire design to increase response rates while maintaining data quality.
Several modifications were made to the current questionnaire, including making the survey
materials more “official”, limiting response modes, and improving visual flow. The redesigned
questionnaire will be tested in the winter of 2012. Arbitron’s current questionnaire will be used
as the control. Since multiple changes to the questionnaire are being tested simultaneously, it
will not be possible to pinpoint the impact of a specific change. However, our goal is to compare
the effectiveness of our current questionnaire to the overall effectiveness of a questionnaire that
has been redesigned based on “best practices” in questionnaire design. This presentation will
examine the impact of the redesigned questionnaire on response rates, data quality, and
demographic representation of respondents.
Examining Errors in Medicaid Reporting Across Four National Surveys: ACS,
CPS, MEPS, and NHIS
Kathleen T. Call, University of Minnesota, SHADAC; Michel Boudreaux, University of
Minnesota, SHADAC; Joanna Turner, University of Minnesota, SHADAC; Brett Fried,
University of Minnesota, SHADAC
Surveys provide the only source of estimates for the distribution of health insurance in the
population, representing a critical source for evaluating the impact of the Patient Protection and
Affordable Care Act (ACA). However, measuring health insurance coverage is challenging and
virtually every survey is said to undercount Medicaid enrollment. In surveys such as the National
Health Interview Survey (NHIS), Medical Expenditure Panel Survey (MEPS) and the Current
Population Survey (CPS), Medicaid enrollment counts are always lower than counts available
from enrollment data. If enrollees do not report Medicaid, estimates of other coverage or being
uninsured will be biased upwards and Medicaid estimates will be biased downwards. If critical
questions about the Medicaid undercount are not addressed, public trust (e.g., fiscal and
legislative analysts) in health insurance information from surveys will erode and the impact of
the ACA will be difficult to evaluate. We extend work from the SNACC team, a multi-phase
project examining the Medicaid undercount in federal surveys, to the American Community
Survey (ACS). We use linked 2008 ACS data (the first year health insurance variables were available) and 2008 monthly Medicaid Statistical Information System (MSIS) data to examine the
extent to which Medicaid enrollment is misreported. We compare the magnitude of the
undercount and factors associated with misreporting in the ACS to other federal surveys (CPS,
MEPS, NHIS). From previous research we know that measuring health coverage is prone to
some level of error and is worse in surveys with extended recall periods; yet bias to uninsurance
estimates is minimal. This work provides the first look at the Medicaid undercount in the ACS, a
survey that allows us to explore accuracy of Medicaid reporting by survey mode, and is part of a
research agenda to further explore patterns of misreporting and the effect on coverage
estimates.
Reliability of Parent-Reported Age of Diagnosis for Children with Autism
Stephen J. Blumberg, National Center for Health Statistics; Matthew D. Bramlett, National
Center for Health Statistics; Heather M. Morrison, NORC at the University of Chicago;
Alicia M. Frasier, NORC at the University of Chicago; Michael D. Kogan, Maternal and
Child Health Bureau
Early identification of autism spectrum disorder (ASD) is an important first step toward making
sure that children with ASD and their families are able to access and benefit from early
intervention services. Parent surveys could be used to evaluate progress in reducing the age
when children with ASD are first diagnosed. Concerns have been raised, however, about
parents’ ability to accurately recall this information. We used data from two surveys to evaluate
the reliability of parent report. Parents of school-aged (6 to 17 years) children with ASD were
identified during the 2009-2010 National Survey of Children with Special Health Care Needs,
and these parents were recontacted (on average, 9 months later) for the Survey of Pathways to
Diagnosis and Services. Both surveys were conducted by the National Center for Health
Statistics as part of the State and Local Area Integrated Telephone Survey, and both asked
“How old was the child when a doctor or other health care provider first told you that [he/she]
had autism or ASD?” The responses across surveys for 1,341 children were highly correlated
(Pearson r = 0.85) but did not match exactly for nearly half the children (47%). For many (19%
of the total), the reported age of diagnosis differed by two years or more between surveys.
Differences of this magnitude were more likely for adolescents aged 12 to 17 years (risk ratio =
1.52), for children living in poverty (RR = 1.41), for children whose parents have no more than a
high school education (RR = 1.48), and for children ever diagnosed with attention-deficit/
hyperactivity disorder, depression, anxiety problems, and/or behavioral/conduct problems (RR =
1.54). Children with 3 or 4 of these emotional or behavioral conditions were more likely to have
discrepant parental reports than children with only 1 or 2 conditions (RR = 1.51).
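For readers who want the reliability measures spelled out, here is a small sketch, using simulated reports, of how the Pearson correlation, the share of two-plus-year discrepancies, and a subgroup risk ratio could be computed; none of the numbers reproduce the survey data.

```python
# Illustrative computation of test-retest reliability measures on simulated data.
import numpy as np

rng = np.random.default_rng(3)
n = 1341
report_t1 = rng.integers(1, 8, size=n)  # parent-reported age at first ASD diagnosis, survey 1
shift = rng.choice([-2, -1, 0, 1, 2], size=n, p=[0.05, 0.15, 0.55, 0.15, 0.10])
report_t2 = np.clip(report_t1 + shift, 0, None)  # report in the follow-up survey

r = np.corrcoef(report_t1, report_t2)[0, 1]
discrepant = np.abs(report_t1 - report_t2) >= 2
group = rng.integers(0, 2, size=n)  # e.g., 1 = adolescent, 0 = younger child (simulated)

risk_ratio = discrepant[group == 1].mean() / discrepant[group == 0].mean()
print(f"Pearson r = {r:.2f}; share with 2+ year discrepancy = {discrepant.mean():.2f}")
print(f"risk ratio (group 1 vs. group 0) = {risk_ratio:.2f}")
```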
Interpreting Feeling Thermometers Using Demographic Models
Quinn Albaugh, McGill University; Stuart Soroka, McGill University
Public opinion surveys often rely on feeling thermometers—questions that ask respondents to
quantify their feelings towards politicians, political parties, institutions, and social groups on a
rating scale (typically from 0 to 100). In theory, these questions provide easily interpretable
interval-level scores for survey researchers, but a number of studies suggest systematic
differences in the ways in which respondents assign scores. Differing levels and variances in
thermometer ratings then make it extremely difficult to interpret each respondent’s ratings, particularly with regard to whether a rating is positive or negative. A 60 for one respondent is, in
short, not the same thing as a 60 for another respondent. This study suggests that we can
overcome these obstacles to some extent by developing models that predict respondents’
scores (almost entirely ignoring the object being rated) based on their demographic characteristics, and then evaluating ratings based on their deviation from the respondent’s
predicted values. In short, predicted means provide an estimate of what a neutral point is likely
to be, given a respondent’s demographic characteristics; we are thus able to capture and then
account for a good degree of heterogeneity across individuals. Analyses are based on millions
of thermometer scores gathered in the American National Election Studies and U.S. General
Social Surveys over the past three decades. Results suggest a means of interpreting
thermometer scores that is significantly different from what is typical in the literature, and point
towards significant gains in the value of thermometer scores for a wide range of analyses. In sum,
the analysis provides a useful approach that can help researchers interpret thermometer scores
in light of individual and group differences.
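A minimal sketch of the modeling idea, assuming a simple linear model of thermometer scores on demographics: the fitted value stands in for a respondent-specific neutral point and the residual for the rating relative to it. The covariates and data below are fabricated.

```python
# Illustrative respondent-calibrated thermometer ratings via regression residuals.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
n = 5000
demographics = np.column_stack([
    rng.integers(18, 90, n),   # age
    rng.integers(0, 2, n),     # gender indicator
    rng.integers(0, 5, n),     # education category
])
scores = np.clip(rng.normal(55 + 0.1 * demographics[:, 0], 20), 0, 100)

model = LinearRegression().fit(demographics, scores)
baseline = model.predict(demographics)      # respondent-specific expected ("neutral") score
relative_rating = scores - baseline         # positive = warmer than expected

print("mean predicted baseline:", round(baseline.mean(), 1))
print("example relative ratings:", np.round(relative_rating[:5], 1))
```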
Maximizing the Accuracy of Final Pre-Election Polls Predicting the Outcomes of
Races for the U.S. Senate, House of Representatives, Governorships and the
Presidency: A Meta-Analysis
Samuel L. Storey, Stanford University
With the wide variance in polling data collected from the most recent 2012 election cycle, many
observers have wondered what makes one pre-election poll more accurate than another. Polls
that ask similar questions sometimes yield widely different results, and as of yet no one has
undertaken a quantitative analysis to determine why. This study fills this void by taking a holistic
look at every public pre-election poll taken to predict elections for the House of Representatives,
Senate, Governorships, and the Presidency in the days preceding the 2008, 2010, and 2012
elections. We conducted a regression analysis of key variables that characterize a poll,
specifically its distance from Election Day, methodology, pollster identity, partisan affiliation,
geographic location, time in field, and sample size. We determined that common impressions do
not always hold true for poll accuracy. For example, data from 2008 and 2010 reveal that while
companies that use automated polling techniques frequently claim their polls are superior,
human interviews actually tend to yield more accurate results. Additionally, larger sample sizes,
closer proximity to Election Day, and larger constituencies are all characteristics of more
accurate predictive polling. Currently, we are continuing to collate poll results from the 2012
election to affirm these results. Using the study’s results, future pollsters will be able to construct
wiser and more prudent methodologies that will provide more precise polls, and constituents will
be able to distinguish which companies create the most reliable data to make more informed
decisions about the status of an election.
How Does This Look Over There?: Two Experiments in Formatting
Carol Cosenza, Center for Survey Research/UMass Boston; Stephanie Lloyd, Center for
Survey Research/UMass Boston; Lee Hargraves, Center for Survey Research/UMass
Boston
The Consumer Assessment of Healthcare Providers and Systems (CAHPS®) instruments are
usually self-administered, which means that HOW the pages look and are formatted can
influence data quality. As part of two different field tests, several experiments of alternative
formatting were undertaken. The surveys were funded by the Agency for Healthcare Research
and Quality. Experiment 1: Formatting a 0-10 scale. This test, conducted in a university-based
health system, used the CAHPS® Clinician & Group (CG-CAHPS) Patient Centered Medical
Home adult questionnaire. Different formats for the 0-10 provider rating were tested, including a
vertical format (CAHPS® standard) and several horizontal formats (altering placement of check
boxes, numeric responses, and text anchors). Experiment 2: Skip patterns and placement of check
boxes. Sometimes respondents are instructed to skip over questions based on their answers to
screening questions. When respondents make errors at screening questions, valuable data can
be lost. When surveys are coded, we sometimes observe that respondents correctly follow the
skip instructions, but fail to check any response box in the screening question itself. To test
whether placement of the check boxes makes a difference in skip compliance, a small sample
(n=500) was sent questionnaires in which the check boxes were placed to the right of the
response options (and directly before skip instructions). We compare the data from this test
group with those who received the standard version (check boxes left of the response options).
This experiment, conducted in a Medicaid population, used the CG-CAHPS Adult questionnaire.
Both experiments used a standard 3-contact mailing protocol. The analysis plan for Experiment
1 is to compare differences in means and item non-response. For Experiment 2, we compare
rates of errors of omission, where respondents skip questions they should answer, and errors of
commission, where questions that should be skipped are answered.
Public Opinion and Health Policy
Public Opinion and Health Policy at the State Level
Claudia Deane, Kaiser Family Foundation; Bianca DiJulio, Kaiser Family Foundation;
Mollyann Brodie, Kaiser Family Foundation; Sarah Cho, Kaiser Family Foundation
With the re-election of President Barack Obama, threats to repeal the Affordable Care Act
(ACA) have quieted and attention has turned to the states as they face many implementation
decisions and challenges. States vary widely in their willingness to expand their Medicaid
programs and develop exchanges in which the public can shop for health insurance. Public
opinion of the law also varies greatly by state and, as is true nationally, is largely entrenched in
partisan leanings. The Kaiser Family Foundation has been in the field monthly with an in-depth,
national sample survey of Americans’ views of the Affordable Care Act, and using this rich base
of data we dig deeper into the national results and analyze opinion of the law at the state level.
We explore which states have the most positive and negative views of the law, see how this corresponds with their 2012 vote choice and their progress in implementing the ACA, and compare it to a number of state-level benchmarks of whom the law is intended to assist, namely the states’ shares of residents on Medicaid, uninsured, and living in poverty. As demonstrated by national
data, opinion is often ideologically based and this paper explores where opinion and state policy
come together or diverge.
Re-Examining Self-Interest as a Predictor of Policy Attitudes Towards Public
Health Policy
Stephanie Morain, Harvard University
To what extent are the policy preferences of Americans shaped by self-interest? A substantial
body of empirical scholarship in political science and public opinion suggests that self-interest
has minimal explanatory power in explaining public attitudes. Among the most frequently cited
exceptions to this thesis are attitudes towards smoking policy, with smoking status repeatedly
demonstrated as a significant predictor of support. However, the studies most frequently cited
as evidence of this exception are three decades old. Given dramatic changes in tobacco control
policy, these prior studies may not accurately reflect current attitudes. Further, the role of self-
interest in shaping preferences toward other public health challenges, including obesity, remains
woefully underexplored. Using data from a 2011 online survey of a nationally representative
sample of 1817 American adults using KnowledgePanel®, Knowledge Networks’ (KN) national
probability-based Web panel, I replicate and extend prior inquiries into the influence of self-
interest upon respondent views towards public health policy. I begin by examining whether
smoking status influences respondent views towards legal strategies to reduce tobacco use. I
then examine whether and how self-interest may also influence respondent views towards legal
strategies to address obesity. Consistent with earlier studies, I find current smokers are less
likely to support tobacco control measures. However, I find that former smokers are also
significantly less likely to support such measures. With respect to obesity policy, I find that body
weight does not predict support for policies aimed at shaping the food environment, but does
predict support for “individually punitive” policies such as insurance premiums for obesity status
and restrictions on the use of food stamps for the purchase of “junk foods.” I propose these
results complicate prior explanations of self-interest as a driver of policy preferences, and
suggest the need to revisit the role of self-interest in attitudes towards public health policy.
Attitudes and Preferences Toward Health Care and Their Symmetry with Health
Insurance Coverage and Medical Expenditure Behaviors
Steven B. Cohen, Agency for Healthcare Research and Quality
Health insurance helps individuals receive timely access to medical care and protects them
against the risk of expensive and unanticipated medical events. In addition to the
socioeconomic profiles that distinguish individuals with coverage from those who are uninsured,
attitudes regarding the need for and value of health insurance coverage may also affect
coverage decisions. Given the potential for individuals’ health care preferences to influence
health behaviors, it is important to measure the population’s attitudes towards health insurance
coverage and to examine the persistence of these attitudes over time. Individual opinions and attitudes towards health care may also influence decisions about the use of health care services and associated medical expenditure behavior. This study
provides a detailed investigation of the degree of alignment over time in health care attitudes
regarding the need and value of health insurance coverage based on national data from the
Medical Expenditure Panel Survey (MEPS), sponsored by the Agency for Healthcare Research
and Quality. Attention is also given to the alignment and associations revealed between the
degree of concordance in health care preferences and the persistence in individual coverage
and expenditure patterns over time. The utility of these preference measures as significant
predictors that serve to identify individuals with persistently high levels of medical expenditures
over time is also assessed.
Public Opinion on Medicare Reform
Becky Hanna, Kaiser Family Foundation; Liz Hamel, Kaiser Family Foundation; Sarah
Cho, Kaiser Family Foundation; Mollyann Brodie, Kaiser Family Foundation
With Medicare spending expected to rise as a share of the federal budget and the nation’s
economy, policymakers are challenged to find ways to reduce the future growth in Medicare
spending, while preserving the quality and affordability of care, and assuring fair payments to
plans and providers. The Kaiser Family Foundation has been tracking Americans’ views on
health policy topics, including Medicare and Medicare reform proposals, through monthly,
nationally representative surveys. In the context of Medicare policy proposals and ongoing
budget discussions, this paper explores the public’s views of the Medicare program and their
reactions to policy proposals to change the program, such as raising the age of eligibility, means
testing, and the premium support plan put forth by former vice presidential candidate Paul Ryan
and others. Special focus is given to seniors, the current beneficiaries of the program, and the
partisan divides that often pervade public opinion on health policy. As the nation faces
significant fiscal challenges and policymakers explore ways to reduce the national debt,
understanding public opinion on Medicare is crucial as budget debates move forward.
The Effect of Question Wording on Preferences for Prenatal Genetic Testing and
Abortion
Eleanor Singer, University of Michigan; Mick P. Couper, University of Michigan
At intervals since 1990, the General Social Survey (GSS) has asked a series of four questions
inquiring into knowledge of genetic testing and attitudes toward prenatal testing and abortion,
most recently in 2010. Preferences for prenatal testing for genetic defects are relatively stable
over this time period, with almost two thirds of respondents expressing a preference for such
testing. Preferences for abortion in case of fetal defect, on the other hand, showed a decline,
from 41.1% in 1990 and 41.7% in 1996 to 28.7% in 2004 and 31% in 2010. From 1990 through
2010, the questions about prenatal testing and abortion were framed in terms of 'baby'—for
example, 'Today, tests are being developed that make it possible to detect serious genetic
defects before a baby is born. But so far, it is impossible either to treat or to correct most of
them. If you/your partner were pregnant, would you want (her) to have a test to find out if the
baby has any serious defects?' After the 2010 results were released, some researchers
questioned whether the answers might have been different had the questions been framed in
terms of 'fetus' rather than 'baby.' The word 'baby' had been chosen on the assumption that
'fetus' would be less familiar to respondents, and would therefore lead to more Don't Knows and
No Answers. But in the current climate, it seemed possible that the word 'fetus' would carry a
more abstract, impersonal meaning and therefore lead to more frequent expressions of
preferences for prenatal testing and abortion. To resolve this issue and provide guidance for
future administration of these questions in the GSS, we designed a question-wording
experiment fielded by TESS. The data have been collected and analyzed and we propose to
describe the results of the experiment at the 2013 conference.
Who Consents?...Especially When
Linkage or Biological Data are
Involved
I Think I’ll Pass on That...: Analyzing Differences Between Respondents Who
Allow and Reject Consent Requests in the 2006 HRS
Bradley Parsell, NORC at the University of Chicago
The Health and Retirement Study (HRS) is a longitudinal panel study supported by the National
Institute on Aging and the Social Security Administration that surveys a representative sample
of more than 26,000 Americans over the age of 50 every two years. The HRS explores the
changes in labor force participation and the health transitions that individuals undergo toward
the end of their work lives and in the years that follow. Beginning in 2006, some study
participants were asked to consent to a series of physical measurements (height, weight, etc.)
and biological collections (saliva and blood). Additionally, a subset of the respondents was
asked for their social security numbers for record linkage. To varying degrees, respondents
decline to participate in these activities or give information. Similar to the notion that
nonresponse introduces potential bias in survey estimates, nonconsent could potentially lead to
bias in the data collected. Using the public data files from the 2006 HRS, we compare
demographic variables and key survey estimates for respondents who did and did not consent
to the various collections and record linkage request. For each of the different consent requests,
we find that the populations of respondents who decline consent are significantly different from
those who provide consent. The differences between these respondents may indicate that bias
was introduced into the data collected through the activities requiring additional respondent
consent. Further, we analyze respondent characteristics and survey measures that may
influence a respondent’s propensity to give their consent to a given request.
Obtaining Administrative Record Linkage Consent by Mail: Impact of a Sensitive
Request on Survey Cooperation Rates and Nonresponse Bias
Celeste Stone, American Institutes for Research; Harmoni Noel, American Institutes for
Research; David Weir, University of Michigan
With response rates declining (Groves 2011), researchers are turning to administrative records
as an alternate method for collecting rich and comprehensive data from study participants, while
also reducing respondent burden and survey costs. Such linkage requests typically require
obtaining consent from study participants for sensitive, personally identifiable information (PII)
(i.e., Social Security number [SSN]) (Sakshaug et al. 2012). For this reason, these studies
generally use interviewer-administered modes, where interviewers can build rapport and
address respondents’ concerns. Mail is an attractive, cheaper alternative mode. However, little
is known about the feasibility of using a mail survey to make such linkage requests and collect
the required PII (Fulton 2012). This paper reports the findings from a study testing the feasibility
of using a mail survey to obtain participants’ authorization to release their Social Security
Administration (SSA) records for survey research. A subsample of 4,879 Project Talent
longitudinal study participants who had not been contacted in 37 to 50 years were randomly
assigned to either a questionnaire only condition or an experimental condition that included a
simultaneous request to link to their SSA data by signing a form and providing the requested PII
(SSN, DOB, name, signature). The SSA condition also included a three-level prepaid incentive
experiment ($2; $20; no incentive). This paper will 1) evaluate the impact of the consent and PII
request on questionnaire cooperation rates, 2) assess the extent to which any negative effects
are mitigated by offering incentives, and 3) examine the sample characteristics (sex, race,
personality, aptitude) associated with higher propensities to consent to the request for PII.
Preliminary results indicate that the SSA request depressed questionnaire cooperation rates by
at least 10 percentage points. However, even after the lengthy period of noncontact and with no
incentive offered, at least 20% of those asked consented to the mail-based SSA linkage
request.
Examination of Item- and Unit-Nonresponse in Population-Based Social Surveys
That Seek to Collect Biological Marker Samples From Respondents
Michael Lawrence, GfK Knowledge Networks; Curtiss Cobb, GfK Knowledge Networks
A large and growing number of population-based social surveys desire to collect biological
markers (e.g., saliva, dried blood, nails, cheek swabs, skin, hair) to investigate the role of
biology in social behaviors and processes. Part of the growth in interest is due to recent
methodological developments that have greatly reduced the financial and administrative costs
of collecting biological markers, including the ability of respondents to provide samples by mail.
At the same time, requests for biological markers from respondents heighten concerns over
privacy and may encourage systematic nonresponse (both unit- and item-nonresponse) that can
bias results obtained from studies. However, for most studies, determining whether systematic
nonresponse occurred is difficult because it is not possible to know anything about those
individuals who choose not to participate in a study; other studies have only limited
demographic information from which to understand non-response. In this study, we use GfK’s
probability-based Internet panel, KnowledgePanel®, and its extensive profile information on
panelists to examine demographic, attitudinal and behavioral differences among three groups of
respondents that were invited to participate in two population-based survey studies of adults
(18+) requesting bio-marker samples: 1) individuals that completed the survey and provided a
bio-marker sample (full completes); 2) individuals that completed the survey but failed to provide a bio-marker sample (item-nonresponse); and 3) individuals who failed to complete the survey
in its entirety (unit-nonresponse). Understanding the characteristic differences between these
three groups can be used to correct for nonresponse bias. Initial results find that while education, an interest in politics, and participation in community groups correlate positively with consenting to provide a biosample, conservative ideology is negatively correlated with consent.
Once consent is obtained, failure to provide a biosample appears to be mostly random and not
systematically related to demographics, attitudes or behaviors.
Interviewers’ Influence on Consent to the Collection of Biomarkers
Julie Korbmacher, Max Planck Institute for Social Law and Social Policy
This paper examines the determinants of consent to the collection of biomarkers in SHARE with
special regard to the role of the interviewer administering the collection. The Survey of Health,
Ageing and Retirement in Europe (SHARE) expanded measurements of objective health by
collecting a battery of innovative biomarkers. As a pilot study, a new module was implemented in
the fourth wave of the German SHARE Study which included the collection of dried blood spots.
For this measurement the ethics review board requires the respondents’ written consent. The
interviewer plays an important role in the collection of biomarkers: (s)he is not only responsible
for explaining the measurements and reassuring respondents, but is also the one conducting all
measurements. Especially in the case of dried blood spots, a high level of interviewer skill and respondent trust in the interviewer’s abilities is necessary. As the interviewer plays such a
crucial role in the collection, we examine their influence in this work. Information on them can be
drawn from the 2011 interviewer survey of the German SHARE interviewers. This PAPI
questionnaire was administered during field training and includes information on general
attitudes towards surveys as well as some questions on interviewers’ attitudes, experiences,
and expectations with regard to the collection of biomarkers. Effects of these variables on the
consent rates of the interviewers will be investigated. The design of the pilot study also allows
for a comparison of respondents from the panel and the refresher sample. Given that
consenting to the collection of biomarkers may require a lot of trust in the interviewer, we expect
respondents from the panel who had the chance to build up trust during previous SHARE
interviews to be more willing to consent than the respondents from the refresher sample.
Placement, Wording, and Interviewers: Identifying Correlates of Consent to Link
Survey and Administrative Data
Joseph W. Sakshaug, Institute for Employment Research; Valerie Tutz, Institute for
Employment Research; Frauke Kreuter, University of Maryland JPSM & IAB
Data linkage is becoming more important as survey budgets are tightening while at the same
time demands for more statistical information are rising. Not all respondents consent to linking
their survey answers to administrative records, threatening inferences made from linked data
sets. So far, several studies have identified respondent-level attributes that are correlated with
the likelihood of providing consent (e.g., age, education), but these factors are outside the
control of the survey designer. In the present study three factors that are under the control of the
survey designer are evaluated to assess whether they impact respondents’ likelihood of linkage
consent: 1) the wording of the consent question; 2) the placement of the consent question; and
3) interviewer attributes (e.g., attitudes toward data sharing and consent, experience,
expectations). Data from an experiment were used to assess the impact of the first two and data
from an interviewer survey that was administered prior to the start of data collection are used to
examine the third. The results show that in a telephone setting: 1) indicating time savings in the
wording of the consent question had no effect on the consent rate; 2) placement of the consent
question at the beginning of the questionnaire achieved a higher consent rate than at the end; and 3) interviewers who themselves would be willing to consent to data linkage requests were
more likely to obtain linkage consent from respondents.
DC-AAPOR Student Paper Award Winner
Descriptive Analysis of Influences on Consent to Administrative Record Linkage
Jenna Fulton, Joint Program in Survey Methodology, University of Maryland
Surveys increasingly request respondents’ consent to link survey responses with administrative
records. Such linked data can enhance the utility of both the survey and administrative data, yet
in most cases, this linkage is contingent upon respondents’ consent. With evidence of declining
consent rates, there is a growing need to understand factors associated with consent to record
linkage. This research investigates the relationship between consent rates and design
characteristics of the survey and consent request, drawing upon all available consent rates from
surveys conducted in the U.S. with consent requests. There are three components to this
research. We first assess whether rates of consent to record linkage have declined overall. The
second and third objectives of this research overlap: we describe several characteristics of
surveys that request consent to record linkage, and examine these characteristics as potential
sources of variation in consent rates. We selected attributes of the survey and consent request
that vary across surveys in the target population, for which sufficient information was available
in the methodological documentation, and for which we predicted an influence on consent rates.
These include survey mode, sponsor, and response rate; whether consent is requested orally or
in writing; whether the request takes an explicit or opt-out approach; the topic of the records
requested; and any personally identifying information requested to facilitate record linkage. The
results of this study suggest that consent rates are declining over time, and that some
characteristics of the survey and consent request are associated with variations in consent
rates, including survey mode, administrative record topic, personal identifier requested, and
whether the consent request takes an explicit or opt-out approach.
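To make the kind of comparison described above concrete, the sketch below relates survey-level consent rates to a few design characteristics with a weighted regression. It is purely illustrative and not drawn from the study: the data, the column names (consent_rate, mode, opt_out, n_respondents), and the choice of weighted least squares are all assumptions.

```python
# Illustrative sketch only: relate survey-level consent rates to design
# characteristics, weighting each survey by its number of respondents.
# All data and column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

surveys = pd.DataFrame({
    "consent_rate":  [0.82, 0.74, 0.61, 0.55, 0.68, 0.59, 0.77, 0.48],
    "mode":          ["face-to-face", "face-to-face", "telephone", "web",
                      "telephone", "web", "face-to-face", "web"],
    "opt_out":       [0, 0, 0, 1, 0, 0, 1, 1],   # 1 = opt-out consent, 0 = explicit
    "n_respondents": [3000, 1500, 2200, 900, 1800, 1200, 2600, 700],
})

# Weighted least squares: larger surveys carry more weight in the comparison.
model = smf.wls(
    "consent_rate ~ C(mode) + opt_out",
    data=surveys,
    weights=surveys["n_respondents"],
).fit()
print(model.params)
```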
Evaluating Address-Based Samples II
Measurement Consequences of Mode Switching in Multi-Mode ABS Surveys:
Experiments in Case Flow Design
Jennifer Vanicek, NORC at the University of Chicago; Felicia LeClere, NORC at the
University of Chicago; Ashley Amaya, NORC at the University of Chicago; Kari Carris,
NORC at the University of Chicago
Multi-mode surveys, among other strategies, have recently been offered as a solution to a
decreasing willingness to respond to social and economic surveys and as a companion to
Address Based Sampling (ABS) as a method for improving population coverage and increasing
response rates. The improvement, however, may come at the cost of measurement error
introduced by asking questions using different methods. Responsive mode switching is vital to
achieving coverage and response rate gains yet it is not clear if starting and ending modes and
switching rules impact key survey statistics by introducing additional bias (LeClere, et al., 2012).
In this study, we use a case flow experiment designed to assess the efficiency and performance
of a mail-first multi-mode ABS design from Phase 4 (November 2011 – September 2012) of
CDC’s Racial and Ethnic Approaches to Community Health Across the U. S. Risk Factor Survey
(REACH U. S.) to disentangle mode effects from population composition. The experiment was
fielded in six of the 28 REACH U. S. communities. Selected sample lines were randomly
assigned to a phone-first or mail-first condition. An attempt was made to match each sample
address to a telephone number. Only cases that were matched to a telephone number in these
six communities were retained for the experiment. We will examine the impact of non-response
and starting and ending mode on estimates for key health statistics from the six experimental
communities to assess whether differences in responsive design choices have an impact on
estimation. Initial results from the experiment suggest that starting with a mail-first case flow
design and completing data collection by mail interact to generate higher estimates of current
smoking rates. Other health related variables, such as BMI, cholesterol and diabetes diagnoses,
as well as diet show limited variation by case flow and ending mode.
The Geographic Nature of Responses to a Web Survey: How Respondents and
Their Sentiments Are Subject to Spatial Bias in a Survey of Technology Usage
Ned English, NORC at the University of Chicago; Lee Fiorio, NORC at the University of
Chicago; Michael J. Stern, NORC at the University of Chicago; Becki Curtis, NORC at the
University of Chicago; Ipek Bilgen, NORC at the University of Chicago
Web surveys have been seen in recent years as a convenient lower-cost alternative to other
modes, without the coverage drawbacks of random-digit dial telephone surveys. At issue is
what degree of coverage error might be inherent to Web surveys, and how the kinds of people
who respond to Web surveys may differ from typical respondents to other modes as well as the
population at large, thus risking bias. We build on the paper by Fiorio et al.
(2012) by using geographic information systems (GIS) and geostatistical models to examine the
spatial nature of bias in respondents to a Web survey, and the subsequent impact on reported
sentiments. Our research shows that there is a distinct and clustered nature to the
demographics of Web respondents, as influenced by linguistic isolation, race/ethnicity, and
percent households below poverty. In addition, we quantify the spatial bias present in
questionnaire data in a survey of technology usage patterns by comparing items from Web
respondents to the same on the in-person General Social Survey (GSS). Our paper shows that
there is a spatial clustering inherent to Web respondents that is not present to the same degree
in other modes, which implies a new category of coverage bias that has not yet been addressed
in the literature. Our research is important to survey research in general because it
demonstrates the use of spatial modeling to explore and quantify a new and emerging issue in
the field, this being bias inherent in the rapidly-emerging Web mode.
Rural Route Where?: An Examination of Coverage Issues Associated with the
U.S. Census Bureau’s National Address List
Kathleen Kephart, U.S. Census Bureau
The U.S. Census Bureau’s Master Address File (MAF) is a national address list that is used for
numerous surveys, as well as the decennial census. In order to create an address frame for
sampling, or an address list for the decennial census, an extract of the MAF is generated using
criteria to determine address validity. One year before the 2010 Census, the U.S. Census
Bureau conducted one of the largest dependent address listing operations in the world, utilizing
an extract of the MAF. As the first major field operation of the 2010 Census, it was important to
provide an accurate address inventory for the census enumeration operations. An accurate
inventory reduces census costs and lessens the risk of either omissions from the census or an
over-count. We will present added-in-error and deleted-in-error rates for later census
operations, as well as the initial canvassing. In addition to presenting the results of 2010
Census operations we explore the characteristics and demographics of areas in the U.S. with
poor address coverage. The majority of blocks in the U.S. only required validation and no
actions by listers (Boies, 2012). Poor coverage is defined by areas that required a large number
of added or deleted addresses from the existing inventory. Another component of our research
is addresses with ambiguous statuses; for instance, records that were deleted by one 2010
Census operation and re-added by a later one, or records that were found on the ground and existed
on the MAF, but failed to meet the criteria to be included on the MAF extract. In order to focus
limited resources on addresses and geographic areas that require field work, research that
allows us to identify potential errors is key.
Improving the Efficiency of Address-Based Frames With the No-Stat File
Bonnie E. Shook-Sa, RTI International
Address-Based Sampling (ABS) frames are typically based on the Computerized Delivery
Sequence (CDS) file, which the United States Postal Service (USPS) makes available through
licensing agreements with qualified vendors. Research based on the CDS file has found the
coverage of ABS frames for in-person surveys to be sufficient in urban areas but problematic in
rural areas. Because of low rural coverage, researchers often resort to hybrid sampling frames
based on both ABS and traditional field enumeration (FE). With a hybrid frame, areas where
ABS coverage is expected to be sufficient are allocated to ABS while areas where poor ABS
coverage is anticipated are allocated to FE. The more areas that are allocated to the ABS
portion of the hybrid frame, the greater the cost savings. Since 2009, the USPS has made
available the No-Stat file, a supplement to the CDS file that contains approximately 8 million
predominately rural addresses not found on the CDS file. Previous research indicates that
supplementing the CDS file with the No-Stat file could be a cost-effective strategy for improving
rural ABS coverage for in-person surveys (Shook-Sa et al. 2012). Although the overall coverage
gains provided by the No-Stat file are modest, No-Stat addresses are clustered in relatively
small geographic areas. This clustered aspect of No-Stat addresses means that they could
significantly improve ABS coverage in some localized areas. In a hybrid frame design, these
coverage improvements could move areas that otherwise would rely on FE to the ABS portion
of the frame, which would lower field costs. This research measures the efficiencies that are
gained by including the No-Stat file in a hybrid frame design.
Too Many Older Homes in Your Sample? Disproportionately Sampling AOH 55+
Addresses from an Address Based Sampling Frame to Improve Sample
Representation
Lawnzetta Yancey, The Nielsen Company; Lukasz Chmura, The Nielsen Company; Scott
Bell, The Nielsen Company
Historically, the Nielsen TV diary sample has over-represented older households. To address
the representation of younger households, we have oversampled households with an age of head
of household (AOH) under 35; however, we still under-represented households with AOH
35-49. One of the benefits of the address based sampling frame is the presence of AOH
indicators on a portion of the addresses. In particular, the AOH 55+ age indicator has a 92%
accuracy rate for the 55+ age group. In an effort to improve the demographic representation of
completed diary homes, Nielsen implemented disproportionate sampling of addresses with the
AOH 55+ indicator starting with the November 2011 sample. This paper will review the data
used to make the decision to exclude addresses with an AOH 55+ indicator from a portion of the
diary sample, and it will show whether we achieved the expected benefits within a recent TV diary
survey.
Saturday, May 18
10:00 a.m. – 11:30 a.m.
AAPOR Concurrent Session H
Survey Mode and Survey Error
Assessments of Survey Accuracy Through a Multi-Mode National Field
Experiment
Bo MacInnis, Stanford University; Jon A. Krosnick, Stanford University
Several mode studies have assessed the accuracy of telephone and Internet surveys of
probability samples and Internet surveys of non-probability samples (Yeager et al. 2011; Chang
and Krosnick 2010; Pasek and Krosnick 2010), yielding a general finding that probability-sample
surveys are more accurate than non-probability sample surveys. Some claim, however, that this
accuracy gap may be narrowed or closed by recent developments in sampling for non-probability
samples. To supplement this literature and account for newer modes and methodologies, we
conducted a large-scale mode comparison study in 2012 with a number of leading online panels
participating. The study involved administering the identical questionnaire
via RDD telephone calls to a national sample of cell phones and land lines, via the Internet with
multiple probability samples, and via the Internet with multiple non-probability samples of
respondents. The questionnaire included measures of a range of political opinions with a focus
on climate change. Simultaneous data collection through multiple modes allowed us to explore
the similarity of the measurements made using the various methodologies and to assess
whether the methodologies differed in the degree to which they yielded accurate measurements
of the American adult population. National benchmarks of known high accuracy were used to
assess the accuracy of data collected. We examined differences between data collection modes
in terms of the distributions of political opinions, the relations between opinions, and the
relations of opinions with demographics. We also investigated whether the data collection
streams differed in the extent to which survey satisficing manifested as well as in the
magnitudes of question wording and question order effects. We also explored whether statistical
weighting improves the accuracy of the various datasets, and whether response rate affects
accuracy by comparing cases that completed the questionnaire early vs. late in the data
collection period.
Web Versus Outbound: A Mode Face-Off Following the Presidential Debate
Jenny Marlar, Gallup
Unique events, such as a presidential debate or natural disaster, present researchers with an
opportunity, perhaps even responsibility, to capture public opinion. Understanding attitudes
immediately following these types of events can inform policy or courses of action. However,
conducting surveys in a narrow window of time is challenging and costly, especially using
traditional outbound methods. This study compares an outbound and Web study and draws
conclusions about the costs and benefits of each. Gallup interviewed respondents following the
Presidential debate on October 22, 2012, either via outbound or Web. Outbound telephone
respondents were recruited from a nightly tracking study prior to the debate and agreed to be
called back immediately following the debate. Web respondents were randomly selected from
the Gallup panel, a probability based panel of more than 50,000 members who agree to
complete several surveys per month. Respondents were notified ahead of time that they would
be asked to participate in a survey following the debate. Web respondents were randomly
assigned to receive the survey at one of three points in time: as the debate concluded, one day
after the debate, or three days after the debate. The results will be used to analyze several
research questions. First, are the Web and outbound components significantly different in terms
of response rates, respondents’ demographics, and overall results, and does weighting
effectively minimize any of these differences between modes? Second, does a Web survey
appear to be effective for collecting opinions at a specific point in time? Paradata will be used to
explore whether respondents complete the survey at the requested time and if users on mobile
devices were more compliant. Finally, results from the three time periods will be analyzed to see
if opinion changes over time and to evaluate the benefit of conducting surveys under tight time
constraints.
Estimating Measurement Effects of Survey Modes From Between and Within
Subject Designs
Thomas Klausch, Utrecht University; Joop Hox, Utrecht University; Barry Schouten,
Statistics Netherlands
Measurement effects are a major problem in mixed-mode surveys suggesting that the same
respondent potentially provides different answers under different modes. Mixed-mode
researchers therefore often need to know the average size of measurement effects (AME) for
the questions of their interest. The present paper discusses estimation of AME using two
different data collection approaches: a between subject and a within subject (repeated
measures) design. Real-world data from an experiment with N=8,800 subjects in The
Netherlands are presented. In the ‘between design’, subjects were randomly allocated to one
mode only (Face-to-Face, Telephone, Mail, or Web). In the ‘within design’ subjects were first
allocated as in the ‘between design’ and subsequently re-approached after some weeks in a
reference mode (Face-to-Face) repeating a large number of questions. Unit nonresponse in
both designs represents a threat to full randomization and thus to unbiased estimation of the
AME, if confounders relate to the selection mechanism into mode conditions and the outcome
variable. Statistical adjustment of missing data is a possible solution to this problem, but it is
based on assumptions. Adjustment in ‘between designs’ assumes that the selection mechanism
is ignorable given auxiliary variables. This is often contestable in practice, because some
important confounders might not be observed. An advantage of ‘within designs’ is that it is more
plausible to ignore the selection mechanism when conditioning on the repeated measurements.
Time-related changes in outcomes between measurement occasions are therefore not problematic,
because these can be controlled for using subjects who are allocated to the
reference mode on both occasions (i.e., Face-to-Face). However, 'within designs' need to
assume that measurements can be taken independently across time. We compare AME
estimates from both designs for questions from the Dutch Crime Victimization Survey applying
regression adjustment with propensity score strata as covariates or propensity score weighting.
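As a rough illustration of the weighting idea mentioned above (not the authors' estimator or data), the sketch below simulates a between-subjects mode comparison with compositional confounding, estimates response propensities with a logistic regression, and recovers the measurement effect with inverse-propensity weights.

```python
# Minimal sketch, assuming simulated data: inverse-propensity weighting to adjust
# a between-subjects mode comparison for observed covariates before estimating
# the average measurement effect (AME). Covariates, outcome, and effect size are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
age = rng.normal(50, 15, n)
educ = rng.integers(1, 5, n).astype(float)

# Mode assignment (1 = web, 0 = face-to-face reference) depends on covariates.
p_web = 1 / (1 + np.exp(-(-2.0 + 0.02 * age + 0.3 * educ)))
web = rng.binomial(1, p_web)

# Outcome with a true measurement effect of +0.3 under web.
y = 0.05 * age + 0.2 * educ + 0.3 * web + rng.normal(0, 1, n)

X = np.column_stack([age, educ])
ps = LogisticRegression(max_iter=1000).fit(X, web).predict_proba(X)[:, 1]

# Weight web cases by 1/ps and reference cases by 1/(1 - ps), then compare means.
ame = (np.average(y[web == 1], weights=1 / ps[web == 1])
       - np.average(y[web == 0], weights=1 / (1 - ps[web == 0])))
print(f"Weighted estimate of the measurement effect: {ame:.3f}")
```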
Asking Questions on Sexual Identity, Financial Well-Being, Sleep, and HIV
Testing in the National Health Interview Survey: Exploring Mode Effects
Adena Galinsky, National Center for Health Statistics; James Dahlhamer, National Center
for Health Statistics; Sarah Joestl, National Center for Health Statistics; Marcie Cynamon,
National Center for Health Statistics; Jennifer Madans, National Center for Health
Statistics; Virginia Cain, National Center for Health Statistics
In recent decades research has demonstrated that audio computer-assisted self-interviewing
(ACASI) yields greater reporting of socially undesirable behaviors compared to paper-and-pencil
questionnaires and various other forms of computerized interviewing (e.g., computer assisted
personal interviewing (CAPI)). The bulk of this research, however, has focused on risky sexual
behaviors, sexual abuse, and drug and alcohol use. Less is known about mode effects with
potentially sensitive topics such as sexual identity or sexual orientation. Over the past year,
three field tests were conducted with the National Health Interview Survey (NHIS), a face-to-
face, household health survey, to assess the feasibility of 1) asking questions on sexual identity,
and 2) administering these and other potentially sensitive items in ACASI. This paper utilizes
data collected during the third field test to explore possible mode effects on estimates of sexual
minority status, financial well-being, sleep, and HIV testing. The third field test included a split-
ballot experiment in which 3,215 adults were assigned to receive the questions using ACASI
and 2,237 to receive them using CAPI. Preliminary results revealed no significant differences in
prevalence estimates of sexual minority status by mode of administration, while estimates of
HIV testing were higher using ACASI than using CAPI. In addition, preliminary estimates of the
average hours of sleep in a 24-hour period revealed a shift toward shorter sleep durations in
ACASI. Where significant bivariate results emerged, we attempted to diminish or eliminate
mode effects in a series of multivariate analyses, controlling for sociodemographic
characteristics such as age, sex, race/ethnicity, and education. We discuss the implications of
our results for mode choices when administering questions on sexual identity and mental health,
and for prior NHIS CAPI-based estimates of sleep and HIV testing.
Changing of the Guard: Effects of Different Self-Administered Survey Modes on
Sensitive Questions
Frances M. Barlas, ICF International; Wm. B. Higgins, ICF International; Jacqueline
Pflieger, ICF International; Randall K. Thomas, GfK Knowledge Networks; Diana Jeffery,
Department of Defense; Mark Mattiko, U.S. Coast Guard
Compared to self-administered questionnaires, socially desirable responses are more likely
found with interviewer-administered questionnaires. However, less is known about differences in
social desirability bias between different modes of self-administration. This study compared the
results for sensitive questions when asked on a paper-pencil questionnaire versus in a Web-
based survey. Personnel at selected military installations were randomly assigned to either the
paper-pencil or the Web administration. The paper-pencil survey was administered in a group
setting, with an interviewer present to distribute and collect the surveys while the online survey
was individually-administered at respondents’ convenience. All respondents, regardless of
mode, were assured anonymity. The surveys were conducted as part of the Health Related
Behaviors Survey of Military Personnel, conducted every three years by the Department of
Defense and the United States Coast Guard. The largest survey on service members’
behavioral health, it asks about a number of activities that can have serious consequences for
military careers such as substance use and mental health indicators, as well as a number of
highly sensitive topics, including for the first time Coast Guard members’ sexuality. Overall, the
paper-pencil survey showed fewer drop-offs. After controlling for demographic differences and
differences in Internet accessibility and use, in the online survey we found lower prevalence
estimates of unhealthy or illicit activities, such as heavy drinking or drinking and driving, and
higher estimates of socially desirable attitudes and behaviors, such as exercise and safety,
compared to the group-administered, paper-pencil surveys. Because these results run contrary to
the hypothesis that the online administration would be associated with greater reporting of
undesirable behaviors, we consider the possibility that respondents to the online survey had
concerns about anonymity.
Quality of Measurement
Building an Archive of Reliability of Survey Questions
Duane Alwin, Pennsylvania State University
This paper presents a design for a public archive of measures of data quality for the typical
kinds of information gathering approaches used in survey research. A progress report is
presented concerning an ongoing project that is focused on developing a data base of estimates
of the reliability of survey measures. Based on nearly 900 individual measures from several
large panel survey data sets based on representative samples of the U.S. population, including
measures from the National Election Studies, the General Social Surveys, the Health and
Retirement Study, and others, this paper reports on the success of the development of a data
base for common survey questions implemented in actual surveys. The paper discusses
problems in creating a data base containing estimates of question-specific reliability, along with
detailed coding of attributes of the questions (e.g. content, response formats, question length,
etc.), which can be used to evaluate the optimal properties of survey questions with respect to
levels of measurement error. The approach advanced can be used within a meta-analytic
framework for assessing the relative quality of measures, and can be used to improve the
quality of inferences from survey data. Preliminary evidence is presented from this data base
regarding patterns of variation in levels of measurement error linked to survey
content, source of information, survey context, and attributes of questions (question form,
number of response categories, labeling response options, explicit Don’t Know options, and
question length) as a way of demonstrating the utility of the approach. Given that survey
measurement is a key ingredient in the majority of social science research, the broader impact
of the present project lies in its contribution to the uses of virtually all types of survey data, which
can be evaluated in terms of the results of this study.
Can We Have Confidence in Consumer Confidence? Assessing the Temporal
Comparability of the Consumer Sentiment Index
Dmitriy Poznyak, Mathematica Policy Research; George F. Bishop, University of
Cincinnati
Along with the Conference Board’s Consumer Confidence Index, the University of Michigan’s
Index of Consumer Sentiment (ICS) has long been regarded as a reliable and valid measure of
public opinion on economic conditions in the country. Not only is it considered an essential
forecasting tool; the outcomes it produces are a vital force in the movement of U.S. and global
stock markets. ICS has also become a central variable in explanatory models of political
attitudes and behavior, particularly in time-series models of presidential approval. Given the
importance of this subjective indicator, it is surprising that its psychometric properties,
particularly its temporal comparability, have not been established. In the absence of such an
assessment, it cannot be determined whether temporal change observed in consumer
confidence is due to a true change in the construct or to methodological changes in its latent
factor structure—thus a measurement artifact. Using multigroup confirmatory factor analysis we
decompose the Index to analyze the pattern of survey responses to the five questions—each
with a 12-month horizon—used to measure Consumer Confidence since 1972. The results
confirm that the ICS has the same overall temporal factorial structure. However, only partial
equivalence can be established for the Index, indicating that the measurement error associated
with repeated measurements over time is not random. We demonstrate that the meaning and
interpretation of some of the items, especially personal economic evaluations, varies
significantly over time. At the same time, respondents’ sociotropic evaluations of the economy
remain temporally invariant. Further analysis of trends in response patterns to the personal
economic items shows that respondents systematically interpret them in conceptually different
ways in times of crisis vs. economic stability. Our analysis raises questions about the temporal
comparability of the ICS and suggests that its partial measurement equivalence must be taken
into account in deciding whether we can have confidence in consumer confidence.
A Versatile Tool? Applying the Cross-National Error Source Typology (CNEST) to
Triangulated Pre-Test Data
Rory Fitzgerald, City University London; Lizzy Gatrell, City University London; Yvette
Prestage, City University London
There are certain error sources that are unique to measurement via cross-national
questionnaires, or which occur less frequently in single nation studies. Tools that help to identify
these errors assist the cross-national survey researcher in producing a higher quality
questionnaire in the source language and also facilitate translation. This paper evaluates the
Cross-national Error Source Typology (CNEST), which was developed as a tool for improving
the effectiveness of cross-national questionnaire design (Fitzgerald et al., 2009). The CNEST
has already proved useful when applied to cognitive interviewing data (Fitzgerald et al., 2011).
This paper assesses the consistency and versatility of the tool by applying it to triangulated
cross-national pre-test data of a module on ‘understandings and evaluations of democracy’ from
Round 6 of the European Social Survey (ESS). Quantitative data from a face-to-face pilot in
Russia and the UK are triangulated with qualitative feedback from interviewers and respondent
debriefs in both countries. The CNEST is applied to these pre-test findings to identify and
categorise sources of error in the questions, and to develop improved questions or drop a
concept from the module where appropriate. This paper highlights the benefits and challenges
that accompany the use of multiple pre-testing tools simultaneously.
Does End-User Experience With Government Reforms Diffuse to General Public
Opinion? Two Parallel Quasi-Experiments in Colombia
Clifford Zinnes, NORC at the University of Chicago; Christopher Nicoletti, NORC at the
University of Chicago
This study conducts a quasi-experimental impact evaluation to examine several questions
regarding the effect of providing citizens an office of transparency and accountability (e.g.,
accessing freedom-of-information services, lodging complaints of corruption) on public opinion
in Latin America on democracy and governance in general and on the quality of government in
particular. First, is there an influence on the opinions of end-users of this service regarding their
confidence in democracy and governance and does the answer depend on socio-economic or
demographic characteristics? Second, how quickly and to what extent is there diffusion of the
opinions of direct users of the service to public opinion in general? Are the former’s opinions
good predictors of the latter’s eventual opinions? Third, how does one quantitatively measure
such opinions? For this purpose, a household-panel public-opinion survey and two end-user
cross-sectional exit surveys are administered in treated and comparison municipalities in
Colombia at a common baseline in 2010 and endline in 2012. Multiple applications of
propensity-score matching are then carried out, and difference-in-differences impacts and attribution
equations are estimated. A novel quantitative indicator design approach is developed and tested to
capture inherently qualitative opinions. The findings include strong positive improvements, as a
result of the new office, in the opinions of direct users on a range of beliefs concerning
democracy and the acceptable ways of exercising it, but slow diffusion of these changed opinions
to the general public over the evaluation period. These results, therefore, may
serve as a warning of the limited utility of conducting household public-opinion surveys in the
short term when gauging the effect of government reform.
Informed Computerized Adaptive Testing: Using Prior Knowledge to Improve
Dynamic Surveys
Josh Cutler, Duke University; Jacob M. Montgomery, Washington University in St. Louis
Survey researchers avoid using large multi-item scales to measure latent traits due to both the
financial costs and the risk of driving up non-response rates. Typically, investigators select a
subset of available scale items rather than asking the full battery. Reduced batteries, however,
can sharply reduce measurement precision and introduce bias. In this paper, we evaluate how
computerized adaptive testing (CAT) within a Bayesian framework can (a) minimize the number
of questions each respondent must answer as well as (b) seamlessly incorporate prior
knowledge about respondents into the survey procedure all while maximizing measurement
precision and accuracy. CAT algorithms respond to individuals’ previous answers to select
subsequent questions that most efficiently reveal respondents’ position on a latent dimension.
Latent traits of interest may include individuals’ political knowledge, healthy eating habits, or
propensity to vote. Utilizing information gleaned from prior responses on a respondent’s
questionnaire or on a previous panel wave, we demonstrate how, through informative priors, we
can increase measurement precision relative to alternative methods. Using simulations, convenience
samples, and a national probability sample, we demonstrate the advantages of using prior
information in a CAT algorithm by testing multiple priors and showing that, in most cases, we
can achieve greater accuracy and precision when compared to a static battery or a naïve CAT
algorithm. We demonstrate how this approach can be used as a dynamic and theoretically
motivated way to reduce the size of commonly used batteries (e.g., the big five and need for
cognition inventories). We conclude by noting how this method could be extended to include
information about respondents gleaned from public records such as voter files. This may
facilitate the use of shorter questionnaires to achieve the same levels of measurement quality in
a wide array of domains.
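The following sketch illustrates the general logic of Bayesian adaptive item selection with an informative prior; it is not the authors' algorithm. The 2PL item bank, the prior centered on a hypothetical earlier-wave prediction, and the simulated respondent are all invented.

```python
# Minimal sketch of Bayesian computerized adaptive testing with an informative
# prior (all item parameters and the respondent are simulated, not real data).
import numpy as np

rng = np.random.default_rng(2)
a = rng.uniform(0.8, 2.0, 30)     # 2PL discriminations
b = rng.normal(0.0, 1.0, 30)      # 2PL difficulties
theta_true = 0.7                  # simulated respondent's latent trait

grid = np.linspace(-4, 4, 161)
# Informative prior, e.g. centered on a prediction from a previous panel wave.
posterior = np.exp(-0.5 * ((grid - 0.5) / 0.8) ** 2)
posterior /= posterior.sum()

def p_correct(theta, j):
    """2PL probability of a positive response to item j at trait level theta."""
    return 1.0 / (1.0 + np.exp(-a[j] * (theta - b[j])))

asked = []
for _ in range(8):                                    # administer 8 adaptive items
    theta_hat = np.sum(grid * posterior)              # current posterior mean
    p = p_correct(theta_hat, np.arange(len(a)))
    info = a ** 2 * p * (1 - p)                       # Fisher information at theta_hat
    info[asked] = -np.inf                             # never repeat an item
    j = int(np.argmax(info))
    asked.append(j)
    resp = rng.random() < p_correct(theta_true, j)    # simulated answer
    likelihood = p_correct(grid, j) if resp else 1 - p_correct(grid, j)
    posterior *= likelihood
    posterior /= posterior.sum()

print(f"Trait estimate after {len(asked)} items: {np.sum(grid * posterior):.2f}")
```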
Unlocking the Potential of Conjoint
Analysis/Discrete Choice Modeling
and MaxDiff Scaling in Public Opinion
and Survey Research
Motivating Consumers to Participate in Wellness Programs
Lisa Weber-Raley, Mathew Greenwald & Associates
Health care policymakers are always seeking ways to improve the quality of health care in this
country, while keeping it affordable and accessible. One important strategy is to encourage
healthy lifestyles that will help to prevent and/or manage chronic conditions through a wide
range of initiatives, loosely termed wellness programs. Today, many wellness programs are
sponsored by large employers, but other types of community organizations and health care
institutions also offer these types of support services. National health care reform legislation
also has promised funding for small employers to offer wellness programs. One key challenge
with wellness programs is motivating consumers to participate, so they can have support to
maintain or improve their health status, and therefore, lower health care costs. Many large
employers, the current leaders in implementing wellness programs, try to encourage
participation among their employees by offering incentives. However, there is limited knowledge
about what wellness program features effectively motivate participation, what amount or type of
incentive spurs participation, or whether features and incentives work the same across different
types of wellness programs. Much of the existing research on motivating consumers to
participate in wellness programs is case-study based. Understanding the “return on investment”
in wellness programs on a broader scale is crucial in order for more public health agencies to
justify these offerings. We used discrete choice modeling to study three specific types of
wellness programs to identify the optimal feature design to make them appealing to the
audiences they target. The programs tested were: Biometric Screening, Exercise, and Health
Coaching programs. We conducted an online survey of 1,200 employed Americans ages 21 to 65,
where each respondent was asked about one of the three programs. Respondents who were
selected for the Health Coaching program had to have a chronic health condition or a BMI of 30
or higher.
Message Testing in an Environmental Context
Barry T. Radler, University of Wisconsin-Madison
Aquatic Invasive Species (AIS) can cause significant ecological and economic harm to lakes
and other water bodies. One of the primary ways AIS spread is by 'hitching' rides with anglers,
boaters and other recreational enthusiasts. Although the behaviors these water users need to
adopt to prevent AIS “hitchhiking” are fairly simple, behavior change cannot be achieved without
carefully planned communication efforts. Various communication strategies have been
implemented to increase awareness of the AIS problem and encourage behavior change. Many of
these strategies have been successful; however, it is not clear what components of current AIS-
prevention campaign efforts are having the most impact. Also, additional communication efforts
are needed to influence individuals who are still not practicing AIS-prevention behaviors.
Although social marketing and behavioral theories are frequently used in health
communications, little research has applied these theories within an environmental context. As
AIS spread is directly linked with specific behaviors, this project presents a unique opportunity to
test the effectiveness of key concepts from social and behavioral theories. It is important to
evaluate prototypes of different creative strategies to determine if campaign materials are
optimally designed to influence attitudes and subsequent behavioral intentions among an
identified target audience. We used an online survey that exposed 1,000 individuals from
Wisconsin who boated, fished, or recreated on a body of water in the last year to a number of
distinct stimuli. Using a discrete choice task with a split-sample design, half the respondents
(randomly selected) completed a choice task of AIS materials, while the other half (holdout
sample) evaluated the AIS materials using traditional Likert-type response scaling. The holdout
sample served as an external validity check of the conjoint model. Multiple dependent measures
were employed and the survey also contained behavioral, attitudinal, and knowledge measures
regarding AIS that could be used for segmentation analyses.
To Complete by Smartphone or by Tablet or by Computer or by Paper & Pencil
That is the Question: Exploring Factors Associated with Respondent Mode
Choice for Multi-Mode Surveys
Trent D. Buskirk, The Nielsen Company
Today more than ever before researchers have an unprecedented opportunity to administer
surveys via a vast collection of Internet-enabled devices including smartphones, tablets/e-
readers, netbooks and desktop and laptop computers. Currently in the U.S., smartphones
account for nearly 50% of all cell phones and roughly 20% of households own some type of
tablet device. As these penetration levels rise, survey researchers have more viable modes for
online survey administration and respondents have more choices with which to complete online
surveys. Currently, there is relatively little published research comparing response rates and
potential mode effects for surveys completed via these new modes. Even more elusive is the
literature that explores the relationship between survey factors, such as recruitment
methodology and questionnaire content, and the respondent’s choice of completion mode. In
this paper we present the results of a conjoint analysis administered to a national probability
sample of 1,000 smartphone, tablet, and personal computer owners aimed at investigating the
relationship between five survey attributes (i.e. topic, sponsor, length, incentive amount and
delivery type) and a respondent’s choice of completion mode (i.e. smartphone, tablet, personal
computer and paper and pencil). We will also investigate whether various technology-related
variables (i.e. Internet usage and prior survey experience by device) as well as demographic
variables (e.g. age, race, education) might explain the latent structure of the derived mode
choice utilities. Finally, we present the results of an experiment randomizing half of the
respondents to complete the entire conjoint exercise and the remaining half to complete a
subset of the conjoint questions. We present external validity measures derived by comparing
the modeled preferences of respondents assigned to the full conjoint exercise to the observed
preferences from those assigned only to the subset questions.
Price and Preference Sizing for a Consumer Service
Mario Callegaro, Google UK
We will field a Choice Based Conjoint survey for an online consumer service. Goals of the
project are to establish: 1) Interest and tradeoffs among performance features of the service, 2)
Brand value, 3) Price sensitivity, 4) Preference share vs. competing services. Additionally, this
project will address questions of conjoint analysis replicability, internal reliability, and external
validity by comparing results to a previous sample, to current market share, and to a real, inline
indicator of interest presented at the end of the survey. The sample will comprise approximately
N=1650 adult, online, general consumer respondents in the U.S. We will include approximately
K=20 demographic, attitudinal, and behavioral survey items in addition to the conjoint, for a total
survey time of 10-15 minutes. Previous work suggests that conjoint analysis is internally reliable
but may need per-project assessment of external validity (Chapman et al., 2009*). We will
compare the results here to: 1) Results from a previous CBC study that assessed a partially
overlapping set of attributes/levels, 2) Actual current market share in both a regional area and
national area. We believe these comparisons will be of particular interest to the survey analyst
community.
*C. N. Chapman, J. Alford, and E. Love (2009). Exploring the Reliability and Validity of
Conjoint Analysis Studies. Presented at the 2009 Advanced Research Techniques Forum,
Whistler, BC, June 2009.
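For readers less familiar with choice-based conjoint, the sketch below shows the workhorse estimation step, a conditional (multinomial) logit fit by maximum likelihood, on simulated choice tasks. The attribute coding, data, and part-worths are hypothetical and are not taken from the study described above.

```python
# Illustrative conditional-logit estimation for choice-based conjoint data.
# Choice tasks, attributes, and part-worth values are simulated, not real.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n_tasks, n_alts, n_attrs = 500, 3, 4
X = rng.integers(0, 2, size=(n_tasks, n_alts, n_attrs)).astype(float)  # dummy-coded attributes
true_beta = np.array([1.0, -0.5, 0.8, 0.3])

# Simulate choices from the logit model: P(alt) proportional to exp(utility).
util = X @ true_beta
prob = np.exp(util) / np.exp(util).sum(axis=1, keepdims=True)
choice = np.array([rng.choice(n_alts, p=p) for p in prob])

def neg_log_lik(beta):
    u = X @ beta
    log_p = u - np.log(np.exp(u).sum(axis=1, keepdims=True))
    return -log_p[np.arange(n_tasks), choice].sum()

fit = minimize(neg_log_lik, x0=np.zeros(n_attrs), method="BFGS")
print("Estimated part-worths:", np.round(fit.x, 2))
print("True part-worths:     ", true_beta)
```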
State of the Art: Past, Present and Future of the
Survey Profession
Old and New Survey-Research Paradigms
Tom W. Smith, NORC at the University of Chicago
A paradigm shift occurred almost 80 years ago in the mid-1930s when Gallup, Roper, Crossley,
and a handful of other innovators pioneered the public opinion poll (Brick, 2011; Converse,
1987; Groves, 2011b). Prior to the advent of polling, politicians, journalists, social scientists and
others had turned to various sources to measure public opinion and other aspects of society.
These included tracking election returns, the outcomes of referenda, crowd counts, straw polls,
compilations of editorials and news articles collected by such publications as Public Opinion
(taken over by Literary Digest in 1906), studies of letters to the editor, and, as George Gallup
(1957) noted in 1957, such other evidence as “letters to congressmen, the lobbying of pressure
groups, and the reports of political henchmen…” These alternatives were supplanted by the
polls, and soon public opinion and poll results came to be considered almost synonymous with
one another. The advent of polling was a complete game changer. As Elmo Wilson (1947), a
researcher at Roper and other organizations, remarked in 1947, “25 years ago the possibility of
measuring public opinion with any degree of precision was at least as remote from public
consciousness as the atomic bomb.” Now a rising chorus asserts that polls are passé, an
increasingly antiquated relic of the last century. They claim that public opinion, consumer
behaviors, and other socio-political outcomes can be better measured (less expensively, more
quickly, more easily) by the analysis of Internet usage in general and of social media in
particular, by the data mining of administrative databases (including the merging of disparate
information sources through such techniques as data fusion), or by a combination of these two
alternatives to traditional surveys. The promise and pitfalls of this new proposed paradigm are
considered.
The Evolution of Presidential Polling
Robert M. Eisinger, Savannah College of Art and Design
Interest in presidential polling continues to grow. What are presidential polls? How are they
conducted? Why and when? 2013 marks the 10-year anniversary of the publication of The Evolution of
Presidential Polling (Cambridge University Press). The 2012 election and related media coverage
underscore the interest in polls and the continued tension between presidential leadership
and responsiveness to public opinion. This proposed panel explores the past, present and
future of presidential polling, with the goal of educating attendees and exploring new theories
about how polls are conducted and why.
Self-Reported Participation in Research Practices Among Survey Methodology
Researchers
Kelly Perez-Vergara, Independent Consultant; Caroline Smith, Dana-Farber Cancer
Institute; Carol Lowenstein, Dana-Farber Cancer Institute; Al Ozonoff, Boston Children’s
Hospital; Yolanda Martins, Boston Children’s Hospital
In recent years, the issues of accountability, transparency and ethical conduct in scientific
research have received widespread media attention. However, the ethical “grey-zone” of
research practices is widely exploited, as demonstrated in John et al.’s (2012) report that more
than 63% of investigators admitted they had failed to report all of a study’s dependent measures
in a published paper and over 45% admitted that in a paper they had "selectively reported
studies that 'worked.'" We do not know of any published studies in the area of survey
methodology that quantify how often researchers employ various research methodologies or the
implications of their use, when conducting research about survey methods. In order to assess
the use of and beliefs about various research methodologies and practices that may be utilized
while conducting survey methodology research, 483 men and women, identified through
systematic Web searches as survey researchers, were invited to participate in a Web survey.
The survey included 14 items assessing demographic variables, 10 items related to use of and
belief about methodological designs used in survey methodology research and 15 items on
beliefs about ethical conduct of research. Results will be discussed in terms of the potential
ethical implications and the American Association for Public Opinion Research’s commitment to
transparent survey research methodologies.
Transparency in the 2012 Pre-Election Polls
Stephanie Calvano, Marist Institute for Public Opinion; Daniela Charter, Marist Institute
for Public Opinion; Michael Conte, Marist Institute for Public Opinion; Natalie Jackson,
Marist Institute for Public Opinion; Susan McCulloch, Marist Institute for Public Opinion
Are pollsters providing enough information? In 2009, AAPOR began the Transparency Initiative,
designed to “encourage routine disclosure of methodological information from polls and surveys
whose findings are released to the public.” The Initiative is applicable to all types of polling and
survey data, but perhaps the highest volume of publicly released survey findings occurs prior to
U.S. Presidential elections, making these polls a particular focus of scrutiny. Some of the firms
that release pre-election polling numbers are members of AAPOR and have signed on to
support the Transparency Initiative, and some are not part of the Transparency Initiative or
AAPOR. The variety of firms that produce and release pre-election polls provides an ideal
opportunity to evaluate the transparency of various organizations, and how easily their
methodological information is accessed as well as what information is provided. In this meta-
analysis, we will review the polls reported by Real Clear Politics in the months before the 2012
presidential election, plus any others used by poll aggregating models, and determine two
things: 1) how much effort is required to access their methodological information online, and 2)
how much information is provided in their methodology statement. In an atmosphere in which
pre-election polls are under heavy attack, methodological transparency is of utmost importance.
We will report how methodologically transparent public polls were during the general election of
the 2012 presidential campaign. Data will be aggregated by type of organization, for example
AAPOR members and those who have signed on to the Transparency Initiative vs. non-member
organizations, academic vs. non-academic organizations, and partisan vs. non-partisan polling
organizations. The research design will include ease of accessibility to information by both
experienced researchers and evaluators with little or no research background.
Trust in Statistics and Statistical Use of
Administrative Records
A Multi-Method Analysis of Measurement Error Using a Measure of the Public’s
Trust of Official Statistics in the United States
Morgan Earp, U.S. Bureau of Labor Statistics
In an effort to explore the public’s trust of official statistics in the United States and attitudes
towards the use of administrative records, the Census Bureau collaborated with several federal
statistical agencies to develop a measure of trust in statistical products, trust in statistical
agencies, and attitudes towards use of administrative records. This measure is being used in a
telephone survey to monitor the public’s trust level and assess the impact on attitudes towards
use of administrative records. During the construct refinement and item development phase, we
consulted international models of trust of official statistics (Brackfield, 2011; UK Office for
National Statistics, 2006 & 2007). Prior to pretesting, cognitive interviews and expert reviews
were used to assess and improve items. Pretesting was done in three phases, allowing us time
to assess and address measurement error between administrations. During pretesting, we used
random probes to assess item performance and we used confirmatory factor analysis (CFA) to
evaluate item misfit (error variance) within factors. Since pretesting was completed, we have
continued using the prior methods as well as Item Response Theory (IRT) to evaluate items.
While the results from each analysis are correlated to varying extents, it appears that each tool
taps into a unique aspect of measurement error, and that no single tool provides a complete
assessment. While the results from some tools are weakly correlated with item nonresponse,
the results from other tools are strongly correlated with item nonresponse. This paper focuses
on the relationship between the various diagnostic tools used to assess measurement error and
the relationship between measurement error and item nonresponse. We will present the
theoretical model we developed, the methods used to detect measurement error, and our
analysis of the relationship between item nonresponse and measurement error.
Monitoring and Detecting Shocks that Influence Change in Public Trust Towards
the Federal Statistical System
Melissa A. Mitchell, USDA/NASS
Beginning in 2011, several federal statistical agencies partnered to develop a measure of trust
in official statistics and monitor public opinion on the use of administrative data for statistical
purposes. Using the Fellegi model of trust of official statistics as a starting point, Earp and
colleagues (2011) identified factors related to trust and public perception of the Federal
Statistical System. These factors are: trust in statistical products (accuracy, relevance, and
credibility), trust in statistical institutions (integrity, confidentiality, transparency, and impartiality),
and trust in official statistics. It is hypothesized that these factors may influence attitudes
towards the use of administrative records for statistical purposes. Considering trust and
perception can change over time and could be influenced by many different, external events, we
planned to study these factors over time to see if trust and perceptions towards the statistical
system and opinions towards the use of administrative records are changing, and, if so, what
influences their change. Using time series techniques, we examine these factors over time. We
are interested in external events that may occur that cause a “shock” to the system. A shock is
defined by its location in time and its magnitude. It can have both an immediate impact as well
as a long-lasting impact. Shocks are reflected by the residuals (error terms) once an adequate
model is fit. Part of this study is hypothesis driven; for instance, events that make an impact in
the media, such as former General Electric CEO Jack Welch questioning the validity of the
unemployment rate, or the presidential election, may impact trust and perception. In addition to
hypothesis driven inspection, we also employ a retrospective approach where we look for
changes in opinion and see if we can determine events in the media that may have coincided
with the change in opinion.
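A minimal sketch of the retrospective shock-detection idea, under invented data: fit a simple time-series model to a weekly trust indicator and flag dates whose residuals exceed a threshold. The series, the AR(1) specification, and the 3-standard-deviation cutoff are assumptions for illustration.

```python
# Hedged sketch: flag candidate "shocks" as unusually large residuals from a
# simple time-series model of a (simulated) weekly trust indicator.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(3)
dates = pd.date_range("2012-01-01", periods=80, freq="W")
trust = 3.5 + np.cumsum(rng.normal(0, 0.03, 80))   # slowly drifting mean trust score
trust[45] -= 0.6                                   # an injected one-week shock
series = pd.Series(trust, index=dates)

fit = ARIMA(series, order=(1, 0, 0)).fit()         # AR(1) as a simple baseline model
resid = fit.resid
threshold = 3 * resid.std()
print("Candidate shock dates:")
print(resid[resid.abs() > threshold])
```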
To Share or Not to Share? Understanding Respondents’ Privacy and
Confidentiality Concerns Regarding Administrative Records Usage
Michelle Smirnova, U.S. Census Bureau
The U.S. Census Bureau is investigating the use of administrative records, which could create
unease if respondents believed that the agency was treating their personal data inappropriately.
Although privacy and confidentiality are protected by different laws, the two concepts are often
conflated in respondents’ minds. This creates a problem in measuring these concerns and
designing effective communication strategies to address them. Accordingly, the Census Bureau
collaborated with other agencies to conduct focus groups and cognitive interviews to design a
questionnaire that would measure privacy and confidentiality concerns of respondents regarding
the use of administrative records for statistical purposes. In a series of three focus groups and
85 cognitive interviews, we explored respondents’ concerns with the use of administrative
records data, which allowed us to formulate and refine survey questions that measured the
constructs as intended. We found that respondents do not have consistent opinions regarding
data sharing; rather their reactions depended largely on two factors. The first was data-specific:
if respondents believed that data collected by another agency was accurate, beneficial to
society, or cost-effective, they had favorable attitudes. The second factor was agency-specific:
respondents tended to divide organizations into two categories: benign information-gathering
agencies, whose use of data is perceived to have either a positive or neutral effect on the
respondent, versus sanctioning agencies, whose data use is associated with negative
consequences. For this second group of agencies, even if data were perceived as accurate,
beneficial or cost effective, respondents were opposed to another agency sharing their personal
information with this perceived-as-threatening organization. This research enabled us to
separate privacy and confidentiality concerns, utilizing the results to design more precise survey
questions and to craft messages that future public relations communications campaigns could
use to allay respondents’ concerns about privacy and confidentiality with regard to
administrative record usage.
Predicting Attitudes Towards the Use of Administrative Records
Ryan King, U.S. Census Bureau
In reaction to declining response rates and increased operational costs, the Federal Statistical
System is carefully examining the possibility of using administrative records to supplement
current survey practices. To do this, we need to understand what the public’s reaction may be
and what concerns the public may have if this is undertaken. An interagency team developed a
series of questions that are asked at the end of an ongoing nightly telephone survey. The
survey is being fielded from January 2012 to September 2013 and completes interviews with
about 200 nationally representative respondents most nights. Respondents are asked a number
of questions regarding their attitudes towards and knowledge about the Federal Statistical
System, as well as questions about their attitudes and knowledge of the potential use of
administrative records data for statistical purposes. Building on past research in this area,
through the nightly survey, we have examined various ways of measuring and influencing
opinions towards the use of administrative records. This paper explores overall attitudes
towards administrative records use and compares whether mentioning different social benefits
(such as saving money or time), using different data sources (such as government, commercial,
or health records), and different federal agencies requesting use of the record may produce
different results. In addition, we show how respondents of different demographic groups and of
different mindsets may have different attitudes towards the use of administrative records
depending on how the use is framed. We also show how this line of research can be used to
help frame the public discussion of the use of administrative records for statistical purposes.
Mixed Topics in Questionnaire Design I
Estimation of Expected Academic Engagement Behaviors: The Use of Vague
Quantifiers Versus Tallied Responses
James Cole, Indiana University; Alex McCormick, Indiana University
This study sheds light on a rarely explored topic in survey research: do different behavior
estimation procedures for past and expected behaviors produce different results? This study is
based on prior research regarding the importance of academic expectations, estimation of
behavior frequency (e.g., Schaeffer & Presser, 2003), and the use of vague quantifiers in survey
research (e.g., Wright, Gaskell, & O'Muircheartaigh, 1994). Data for this study are from the 2010
administration of the Beginning College Survey of Student Engagement. Responses from more
than 28,000 first-year students enrolled at 68 institutions were included in this analysis. Items
from the core survey were repeated at the end of the Web version of the survey. Respondents
were reminded of their original response to the item (core survey items are presented with
vague quantifiers: very often, often, sometimes, and never) and were then asked to again
estimate their behavior by tallying or counting their behaviors. One of the general findings is that
the magnitude (effect size) of the differences for the vague estimations was much larger than for
the tallied estimations. This means that those doing “gap analysis,” where data are used to
identify areas where student expectations are not met, may want to consider whether the results
are more an artifact of the response format than any real difference in behavior frequency. This
study also found that tallied estimates associated with vague quantifiers are not necessarily
stable. For instance, asking questions in class “very often” in high school corresponded with a
tallied count of 23 times per week, whereas expecting to ask questions “very often” during the
first year of college corresponded with a mean of 16 times per
week (d_pooled = .550). Full results will be presented and implications for survey research
discussed.
Numeric Estimation and Response Options: An Examination of the Measurement
Properties of Numeric and Vague Quantifier Responses
Tarek Al Baghal, University of Nebraska - Lincoln
Many survey questions ask respondents to provide responses that contain quantitative
information. These questions are often asked requiring open ended numeric responses, while
others have been asked using vague quantifier scales. Generally, survey researchers have
argued against the use of vague quantifier scales. However, no study has compared accuracy
between vague quantifiers and numeric open ended responses. This study is the first to do so,
using a unique data set created through an experiment. In the experiment, 124 participants
studied lists of paired words in a 2 (context: same; different) x 6 (frequency of target word
presentation: 0, 2, 4, 8, 12, 16) x 2 (response form: open-ended numeric; vague scale) factorial
design, with the context and form factors manipulated between subjects and the frequency
factor manipulated within subjects. The context factor had two conditions: a same-context
condition, in which the same context word was paired with each presentation of the target word,
and a different-context condition, in which a different context word was paired with each
presentation of the target word. The other between-subjects factor was
response form, where participants responded to a recall test using either vague quantifiers or
numeric open ended responses. Translations of vague quantifiers were taken and used in
accuracy tests. Finally, a numeracy test was administered to collect information about
respondent numeracy. Different accuracy measures were estimated and analyzed including
relative accuracy, bias in estimation, and signed and absolute differences. Results show context
memory did not have a significant effect. Numeracy has an effect, but not always in the same
direction, depending on form and context. Actual frequency had a significant effect on accuracy,
but did not interact with other variables. Importantly, response form does not always have an
impact on accuracy, but when it does, vague quantifiers tend to improve accuracy.
Including Covariates in a Factor Mixture Model Intended to Detect Differences in
Vague Quantifier Interpretation
Jamie L. Griffin, Mathematica Policy Research
Survey respondents are commonly asked to provide vaguely quantified estimates of behavioral
frequency (e.g., never, sometimes, often, very often). Researchers interested in placing
respondents on a latent behavioral frequency continuum based on a set of related items often
assume that the interpretation of these vague quantifiers is identical across respondents—for
example, that all respondents interpret sometimes as 1 to 2 times or very often as 5 or more
times. If this assumption is incorrect, detected differences on a latent factor estimated from
these frequency reports (e.g., student engagement) might reflect differences in interpretation
rather than true differences on the factor. Several studies investigating the interpretation of
vague quantifiers have demonstrated that individual variability is not necessarily random; rather,
the variability tends to be associated with demographic or social characteristics (for example,
education, age, race, social class). Griffin (2012) described the use of a factor mixture model to
detect latent “interpretation” classes; that is, unobserved groups of respondents for whom
interpretation is consistent. There was, however, wide variation in the ability of the model to
correctly predict vaguely quantified responses; thus, the extracted latent classes may not
necessarily differentiate respondents according to their interpretation of vague quantifiers. The
present paper investigates whether the model’s performance is improved by including
covariates representing social referent groups (e.g., gender, class rank, race). Using data from
an experiment embedded in the 2006 National Survey of Student Engagement in which 8,174
students reported frequencies of several student engagement behaviors in both numerically and
vaguely quantified terms, we first outline how to include covariates in a factor mixture model
estimated on vaguely quantified frequency reports intended to detect differences in the
interpretation of vague quantifiers. Second, we evaluate the model’s performance by comparing
the numeric frequencies estimated from the model to those directly reported by the students.
Validating Sensitive Questions in Labor Market Surveys: A Comparison of Survey
and Register Data
Antje Kirchner, Institute for Employment Research (IAB)
The randomized response technique (RRT) is one of the most popular and best investigated so-
called “dejeopardizing techniques,” a class of data collection strategies for eliciting sensitive
information. This paper explores the RRT as a means to improve the quality of data about
sensitive labor market topics, such as receipt of basic income support. In a 2010 telephone
survey (n=3,211), we experimentally tested two techniques for asking such sensitive questions:
direct questioning and the randomized response technique. First, we compare the percent of
socially undesirable responses (indication of transfer payments, i.e. receipt of basic income
support) across the two techniques. In addition, because the sampled persons were selected
from German administrative records, we know (in the aggregate) the percent of respondents
who have received transfer payments and thus the percent who should have reported receipt.
Thus we can also validate the reported percent from each method against the known true rate
for the responding cases, hence assessing the bias of our estimates. Such administrative record
data are quite rare in the literature on sensitive questions and give us a unique opportunity to
evaluate the “more is better” assumption that is so often invoked in this literature. Because we
can also assess the amount of non-compliance with the RRT instructions for this item,
multivariate analyses provide insights into how the RRT functions in specific sub-populations.
Thus this paper offers insights into a variety of practical and theoretical factors contributing to
successful implementation of the RRT in labor market surveys.
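For readers unfamiliar with the estimator, a minimal sketch may help; the abstract does not state
which RRT variant was fielded, so the classic Warner (1965) design is used here purely as an
illustration. Each respondent answers the sensitive statement with probability $p$ and its
negation with probability $1-p$; writing $\lambda$ for the probability of a “yes” answer and
$\pi$ for the true prevalence,

\[
\lambda = p\,\pi + (1-p)(1-\pi),
\qquad
\hat{\pi} \;=\; \frac{\hat{\lambda} - (1-p)}{2p - 1}, \quad p \neq 0.5,
\]

so the survey-based $\hat{\pi}$ under either questioning technique can be compared directly
against the benchmark rate known from the administrative records.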
Are Readability Formulas Valid Tools to Assess Survey Question Difficulty?
Timo Lenzner, GESIS - Leibniz Institute for the Social Sciences
Readability formulas, such as the Flesch Reading Ease formula (Flesch, 1948), the Flesch-
Kincaid Grade Level index (Flesch, 1979), and the Gunning Fog index (Gunning, 1952), are
often considered to be objective measures of language complexity. Not surprisingly, survey
researchers have frequently used readability scores as indicators of question difficulty (e.g.,
Converse, 1976; Ganassali, 2008; Harmon, 2001; Holbrook et al., 2006), and some have even
suggested applying the formulas during the questionnaire design phase to identify problematic
items and to assist survey designers in revising these questions (e.g., Velez & Ashworth, 2007).
At the same time, the formulas have faced severe criticism, in particular for being mostly based
on only two variables (word length and sentence length) which may not be very good predictors
of language difficulty (e.g., Oakland & Lane, 2004). The present study examines whether the
three readability formulas presented above reliably identify problematic survey questions.
Readability scores were calculated for a large number of question pairs, each including a
problematic (e.g., syntactically complex, vague, etc.) and an improved version of the question.
The question pairs came from two different sources: (1) existing literature on survey design
(e.g., Fowler, 1992; Fowler & Consenza, 2008) and (2) the Q-BANK database (NCHS). The
analyses revealed that the readability formulas often favoured the problematic over the
improved question version. On average, the success rate of the formulas in identifying the
difficult questions was below 50 percent. Reasons for this poor performance as well as
implications for the use of readability formulas in survey research are discussed.
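For reference, the three formulas named above depend only on average sentence length (ASL,
words per sentence) and word-length proxies; the standard published forms (constants may
differ slightly across software implementations) are

\[
\text{FRE} = 206.835 - 1.015\,\text{ASL} - 84.6\,\text{ASW}, \qquad
\text{FKGL} = 0.39\,\text{ASL} + 11.8\,\text{ASW} - 15.59,
\]
\[
\text{Fog} = 0.4\left(\text{ASL} + 100\,\frac{\text{complex words}}{\text{words}}\right),
\]

where ASW is the average number of syllables per word and “complex words” are those with
three or more syllables. This narrow input set is precisely the limitation the study documents.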
Implementing a Responsive Design:
Moving From the Theoretical to the Practical
Using Predicted Response Propensities for Bias Reduction
Dan Pratt, RTI International; Melissa Cominole, RTI International; Jeff Rosen, RTI
International; Bryan Shepherd, Abt SRBI; Peter Siegel, RTI International; David Wilson,
University of Delaware; Jennifer Wine, RTI International
How response rates are increased during nonresponse follow-up can affect the amount of
nonresponse bias evident in survey estimates. A common approach has been to simply
maximize response rates by targeting sample members who are most likely to be interviewed.
However, since nonresponse bias is a function of the association between the likelihood to be
interviewed (response propensity) and the survey variable of interest, interviewing the easiest
cases during nonresponse follow-up may not reduce bias (e.g., Curtin, et al., 2000; Keeter,
Miller, Kohut, Groves, and Presser, 2000). In fact, nonresponse bias may actually increase
when nonresponse follow-up efforts target likely respondents (Merkle and Edelman, 2009). This
paper reports the results from three national field test studies which tested whether or not
encouraging participation among low propensity (low likelihood) cases can be a practical and
effective method to improve overall survey estimates. Various sources of information were used
to evaluate propensity: paradata from early interview attempts, demographic and substantive
survey data from prior survey waves, and administrative data. The likelihood of any sample
member becoming a nonrespondent was estimated prior to data collection and, for those
sample cases least likely to respond, a different survey protocol was employed to gain
cooperation. The approach rested on the assumption that low propensity cases, which are
frequently excluded due to nonresponse, were fundamentally different from responding cases,
and their inclusion would reduce bias in key survey estimates.
Comparative Evaluation of Metrics for Tracking and Assessing Nonresponse Bias
Peter Siegel, RTI International; Bryan Shepherd, Abt SRBI; Melissa Cominole, RTI
International
Recent work regarding survey error has helped clarify the effects of nonresponse on survey
estimates. One of the key findings from this new literature is that response rates are not good
predictors of bias. In other words, increases in response rates do not necessarily decrease bias
in estimates, a finding that stands counter to the prevailing mindset of many survey researchers.
Rather, this research illustrates that a key factor in determining the level of nonresponse bias is
the covariance between response propensity and response values within the population of
interest. In cases where the response value for a survey item varies with propensity to respond
to that survey item, nonresponse can create bias in estimates. In cases where this relationship
does not exist, or is obscured by other relationships or survey errors, bias due to nonresponse
may be absent or hidden. These insights refine our understanding of the roots of nonresponse
bias, but in doing so complicate what was once a straightforward recommendation for
minimizing nonresponse bias—increase response rates. Fortunately, although still in the early
stages, research on new metrics for tracking and assessing nonresponse bias has begun. In
this manuscript we consider the impact of monitoring and responding to two specific metrics,
the Mahalanobis distance and the R-indicator, in the context of a responsive data collection design
aimed at reducing nonresponse bias in key survey outcomes. We do this via simulations based
on data collected by a large, nationally representative panel survey. We find that these metrics
can be useful in gauging potential contributions to nonresponse bias, each with its own pros and
cons, and present recommendations for real-world implementations based on the simulations.
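As a brief sketch for readers unfamiliar with the second metric (ignoring design weights), the
R-indicator summarizes how variable the estimated response propensities $\hat{\rho}_i$ are
across a sample of size $N$:

\[
R(\hat{\rho}) \;=\; 1 - 2\,S(\hat{\rho}),
\qquad
S(\hat{\rho}) \;=\; \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\bigl(\hat{\rho}_i - \bar{\rho}\bigr)^{2}},
\]

so values near 1 indicate balanced (representative) response and values near 0 indicate strongly
unbalanced response. The Mahalanobis distance used alongside it is sketched after the
following abstract.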
Using Mahalanobis Distance Measures for Bias Reduction
Melissa Cominole, RTI International; Dan Pratt, RTI International; Bryan Shepherd, Abt
SRBI; Peter Siegel, RTI International; David Wilson, University of Delaware; Jennifer
Wine, RTI International
Building upon the results of the experiments and simulation studies discussed in other papers in
the panel, our most recent data collections incorporated the Mahalanobis distance measure into
a responsive design intended to reduce nonresponse bias among high-distance cases, or those
nonrespondents most unlike those who have already responded. We will describe the designs
of three studies, each with a unique approach adapted to its population. Each study began with
an early response phase during which sample members were invited to complete a self-
administered Web interview. After the initial early response phase, outbound telephone
prompting and production telephone interviewing began. Each study identified a series of time
points, after the early response and initial outbound calling phases, during which the
Mahalanobis distance values were calculated for all remaining nonrespondents, so that cases
above a certain threshold could be targeted for specialized protocols. The timing and nature of
interventions varied according to the specific needs of each study. The particular design used in
each study will be described and preliminary results will be presented. Issues related to practical
implementation within constrained budgets and schedules will be discussed.
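A minimal sketch of the flagging step, assuming numpy and hypothetical array names (this is
illustrative only, not the authors’ production code):

import numpy as np

def mahalanobis_flags(respondents, nonrespondents, threshold):
    """Flag nonrespondents whose Mahalanobis distance from the respondent
    centroid exceeds a chosen threshold (hypothetical helper, for illustration)."""
    mu = respondents.mean(axis=0)            # centroid of cases that have already responded
    cov = np.cov(respondents, rowvar=False)  # covariance of respondent covariates
    cov_inv = np.linalg.pinv(cov)            # pseudo-inverse guards against a singular covariance
    diffs = nonrespondents - mu
    # squared Mahalanobis distance for each remaining nonrespondent
    d2 = np.einsum('ij,jk,ik->i', diffs, cov_inv, diffs)
    return np.sqrt(d2) > threshold           # True = candidate for the specialized protocol

Cases returning True would be the “high-distance” nonrespondents routed to the specialized
protocols at each review point.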
Using Propensity Models During Data Collection for Responsive Designs: Issues
with Estimation
James Wagner, University of Michigan; Frost Hubbard, University of Michigan
Responsive designs often use response propensities estimated during data collection. These
estimated propensities may be used either for monitoring or for making decisions about the next
action to take on each case. A problem with estimating response propensities in this way is that
the data are not fully observed until the end of the study. The data about future effort and
response are “missing.” This missingness may or may not bias the resulting estimates. This
presentation reviews situations under which this missingness may lead to bias, discusses
approaches to estimation that may minimize the risk of bias, and gives several examples that
evaluate the impact of this missingness on estimates and actions taken as a result of these
estimates.
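To make the estimation problem concrete, here is a minimal sketch assuming scikit-learn and
hypothetical variable names; the presentation itself does not prescribe a particular model, and
the closing comment marks the partial-observation issue discussed above.

from sklearn.linear_model import LogisticRegression

def estimate_propensities(X_paradata, responded_so_far):
    """Estimate response propensities part-way through data collection.
    X_paradata:       cases x features (e.g., call attempts, ever-contacted flag, frame data).
    responded_so_far: 1/0 indicator of response status as of today."""
    model = LogisticRegression(max_iter=1000)
    model.fit(X_paradata, responded_so_far)
    # Active cases that will eventually respond are still coded 0 at this point,
    # which is the "missing future data" problem the presentation examines.
    return model.predict_proba(X_paradata)[:, 1]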
Does Balancing Survey Response Reduce Nonresponse Bias?
Barry Schouten, Statistics Netherlands
Recently, various indicators have been proposed as indirect measures of nonresponse error in
surveys. The indicators employ available auxiliary variables in order to detect nonrepresentative
or unbalanced response. They may be used as quality objective functions in responsive and
adaptive survey designs. In such designs different population subgroups receive different
treatments. The natural question is whether the decrease in nonresponse bias caused by these
designs could also be achieved by nonresponse adjustment methods that employ the same
auxiliary variables. In this paper, we discuss this important question. We provide theoretical and
empirical considerations on the role of both the survey design and nonresponse adjustment
methods to make response representative or balanced. The empirical considerations are
supported by a wide range of household and business surveys.
Economic Issues and Attitudes
Media, Public Opinion and Economic News Coverage
Stuart Soroka, McGill University; Dominik Stecula, University of British Columbia;
Christopher Wlezien, Temple University
Public reactions to the economy have political consequences. Support for governments and
policies follows economic trends, for instance. But past work shows that media coverage of the
economy matters to public attitudes, above and beyond the economy itself; and that coverage is
biased, driven by organizational factors, news norms and audience interests. This paper
examines one new aspect of the media-public-economy relationship: the tendency for both
media and public opinion to react mainly to changes in the economy, conditional on levels. That
is, media and the public react not so much to high unemployment itself as to an increase in the
rate, and coverage is conditional on current unemployment levels. This pattern comports
nicely with research on voter behavior and elections, which shows that economic change, not
the level of the economy, is what matters; it also makes more understandable the somewhat
surprising finding of positive coverage in the midst of the Great Recession. Results indicate that
the model applies at other times and in other places—they are based on a content analysis of
150,000 news stories (over 20 years) in the U.S., UK and Canada, analyzed alongside
commercial polling data on economic sentiment. The paper considers implications of the media-
economy relationship for economic sentiment and government support.
Economic Mobility and Public Opinion
Catherine Wilson, American National Election Studies
How does economic mobility relate to political attitudes and behavior? The American National
Election Studies recently fielded a new set of questions about respondents' current, past, and
anticipated future prosperity. We use these data to investigate two general research questions:
How does past experience transitioning among low, middle, and upper incomes relate to
political opinions? And how do expectations about one's chances of being poor, comfortable, or
wealthy in the future relate to those opinions? We investigate these 'pocketbook' considerations
of economic mobility as they relate to perceptions of the economy, blame attribution for poor
economic performance, presidential approval, party identification, policy preferences, and
presidential candidate preferences. We show how those who have experienced or who
anticipate an upward trajectory to their financial well-being differ from those whose experiences
or prospects have not been, or do not seem, as favorable, and we characterize the relative
magnitude of these effects compared to other attitudinal and demographic variables.
Who Counts as White Working-Class? A Proposal for a New Approach
Daniel Cox, Public Religion Research Institute; Juhem Navarro-Rivera, Public Religion
Research Institute; Robert P. Jones, Public Religion Research Institute
The influence of the white working-class on American culture and politics is difficult to overstate.
Although arguably facing a decline in political clout, white working class Americans still retain an
outsized influence in many important battleground states. Their support was pivotal for Obama
in states like Michigan, Ohio and Pennsylvania. Yet despite this, there has been a glaring lack of
consensus about the best approach to measuring this important group. In different works, the
white working-class are defined in terms of income (Bartels 2008; McCarty, Poole, and
Rosenthal 2006), occupation (Edsall 2007; Spitzer 2012), or some combination of these
(McTague 2012; Teixeira and Abramowitz 2008). These different definitions often lead us to
draw different conclusions about the political attitudes and behavior of white working-class
Americans. In this paper we compare several different definitions of the white working-class to a
new definition developed from an original large (n=3,000) national survey of Americans. We
define the white working-class using a combination of race, education, and an occupational
proxy (people who are paid hourly or by the job). This definition is parsimonious and replicable
and better captures the complex social, economic, and political realities of this oft-mentioned but
often misunderstood group of Americans. This new approach provides a more complex picture
of the working class in terms of their politics, economic outlook, and cultural traits.
The Employment Outlook of Low-Wage Workers in America
Trevor Tompson, AP-NORC Center for Public Affairs Research; Jennifer Benz, AP-NORC
Center for Public Affairs Research
With a sluggish economy and shrinking middle class, there are many reasons to be concerned
about the current status and future opportunities of lower-wage earners in America. The
Associated Press-NORC Center for Public Affairs Research, with funding from the Joyce
Foundation, conducted a representative, multimode survey of 1,606 lower-wage workers in
America to measure their opinions about their economic outlook, working conditions, and
opportunities for advancement. The survey targeted employed individuals in jobs that pay less
than $35,000 per year. Findings from the survey reveal that lower-wage workers in America
are struggling to get ahead, both inside and outside the workplace. Compared to the general
population, more lower-wage workers feel that the country is headed in the wrong direction.
Three-quarters of lower-wage workers report being worse off than they were four years ago and
report worrying a great deal about many aspects of their personal financial situation. Inside the
workplace, pessimism cuts across numerous aspects of their employment outlook, with
majorities seeing little opportunity for promotion or little chance that their current job will help
them advance their long-term career goals. The data also reveal that this general and job-related
pessimism is especially high among white low-wage workers (even when controlling for other
political, social, and demographic factors). In spite of a pessimistic outlook, lower-wage workers
are generally satisfied with their jobs and working conditions, and a majority feel that their
employer values them for the work they do. Findings do show that job training may be one
solution to overcoming pessimism and feelings of being stuck in a dead-end job: a majority of
workers who have participated in employer-sponsored job training programs and benefits
report that training and education are important for moving ahead in their careers.
Seeing Red: The Politics of Regulations
Debbie Borie-Holtz, Rutgers University; Stuart Shapiro, Rutgers University; Michael
Wong, Rutgers University
The role of regulation has been a central point of the recent presidential campaign and several
gubernatorial contests. Regulations have been criticized as 'killing jobs' and hurting the
economy while their defenders point to the benefits of a strong regulatory regime. Claims on
either side of the debate are backed with limited evidence. While this is not unusual for claims in
the political arena, academic examination of the effects of regulation has also been limited. In a
unique dataset of environmental regulations, we examined whether regulatory burden hurts the
economy or the business climate in five contiguous Midwestern states over the past decade.
While the empirical data suggest the answer is no, we then conducted a random survey of
business leaders in these states to assess whether regulatory criticism has any standing within
the regulated community. At the outset, we use a list-experiment technique to measure whether regulations are
considered a major problem among businesses, particularly given the attention paid to the
policy issue by candidates and elected officials alike. While other national surveys suggest
otherwise, we drill down further to see if regulatory burden is a 'real' or 'perceived' threat to
businesses. If real, we attempt to determine if the threat is different among certain sectors of
businesses, company size or gross revenues. If perceived, we look for reasons to explain in
what ways regulations are considered harmful or economically threatening to business
managers and owners.
Saturday, May 18
1:15 p.m. – 2:15 p.m.
Poster Session 3
1. Watch Your Language!: The Impact of the Survey Language on Bilingual Hispanics’
Response Process
Meryem Ay, University of Nebraska – Lincoln; Wendy Gross, GfK Knowledge
Networks; Curtis Cobb, GfK Knowledge Networks; Randall Thomas, GfK Knowledge
Networks
Cross-cultural studies have been the focus of researchers creating and analyzing globally
comparative data. However, the construct validity of the questions across cultures is a
concern for data quality. Survey questions should have the same meaning and sentence
structures across the languages. Current efforts are inclined to standardize survey questions
across multiple languages, yet the impact of language itself as a potential confound remains
untested. Linguists debate if language shapes individuals’ thoughts and judgments or if a
universal language of thinking exists (Whorf & Carrol, 1956; Chomsky, 1976). Given that
increasing numbers of surveys are conducted in multiple languages, the debate among
linguists introduces practical, yet untested, concerns. If the language that people use affects
their thoughts and judgments, inter-language differences in multiple language surveys
reflect not only true cross-cultural differences in attitudes and behaviors but also differences
created by language itself. It is hypothesized that completing a survey in a specific language
will prime respondents into thinking in a culture-specific way. Bilingual Hispanics are ideal
subjects for this study because they bridge non-Hispanic and Hispanic cultures and can
communicate in both English and Spanish. A Web-based survey
experiment was conducted on a sample of 620 respondents from GfK’s KnowledgePanel
Latino to determine how the survey language would influence responses. Bilingual
Hispanics were randomly assigned to receive questions in English (n=156) or Spanish
(n=155) along with two control groups: English-only Hispanics (n=154) and Spanish-only
Hispanics (n=155). Differences in acculturation between the two bilingual groups were
examined to ensure randomization occurred properly. Even after controlling for demographic
differences, preliminary analysis indicated differences between the two bilingual groups on
topics related to self-efficacy. The differences are evidence of language priming and have
potential implications for the data quality of multi-language, multi-cultural, and cross-national
survey work.
2. Movers and Shakers: Discrepancies Between Cell Phone Area Codes and Respondent
Area Code Locations in RDD Samples
Carol Pierannunzi, Centers for Disease Control and Prevention; Machell Town,
Centers for Disease Control and Prevention; Lina Balluz, Centers for Disease Control
and Prevention; William Garvin, Centers for Disease Control and Prevention; Mansour
Fahimi, Marketing Systems Group; David Malarek, Marketing Systems Group; Ashley
Hyon, Marketing Systems Group
Invariably, a portion of all cellular RDD sample telephone numbers reach individuals who
reside outside of the area in which they are expected to reside, a discrepancy that only widens
as the target geography becomes smaller. In 2011, the Behavioral Risk Factor
Surveillance System (BRFSS) found that on average about 8% of cellular calls reach
individuals who reside outside of their sample states. The extent of this discrepancy ranged
from a low of 4% in Mississippi to a high of 48% in the District of Columbia. By appending a
variety of ancillary data about the location and demographic composition of rate centers
associated with each cellular number for the 2011 BRFSS, this research attempts to
quantify and explain some of the observed discrepancies. Using multivariate analysis
techniques, rate center characteristics are identified that may predict which numbers within a
sample are more/less likely to reach respondents outside of their sample states.
3. Improving the Quality of Proxy Reports
Jennifer Edgar, U.S. Bureau of Labor Statistics
The Consumer Expenditure Quarterly Interview Survey (CEQ) asks one respondent to
report expenditures made by an entire household. The CEQ has long identified this type of
proxy reporting as a potentially significant source of underreporting. There are two likely
reasons for these omissions: knowledge and recall. Lack of knowledge, stemming from the
fact that participants may not know about all purchases by other household members,
cannot be corrected through revisions to survey questions. The second reason, that
participants may forget to consider other household members, may be addressable
through the survey design. A small lab study (n=20) explored the feasibility and
effectiveness of collecting information about each household member at the beginning of the
study, and using that information to add prompts in relevant sections of the survey. The
study found this approach to be effective. All participants were able to provide information
specific to other household members upfront, and after hearing the prompts they reported an
average of $182 in additional expenditures, a 6 percent increase in overall reporting.
This presentation will explain the method used and give an overview of the results.
4. Multi-Method Pretesting of Multilingual Survey Items
Cynthia Helba, Westat; Gina Shkodriani, Westat; Jasmine Folz, Westat; Martha
Stapleton, Westat; Gordon Willis, National Cancer Institute
Cognitive interviewing and behavior coding are methods often used for pretesting survey
questions prior to administration (e.g., Fowler & Cannell, 1996; DeMaio & Rothgeb, 1996;
Willis, 2005). In single-language surveys, the two methods are frequently used together
because they have different strengths (e.g., Census Bureau, 1995; Willis, DeMaio & Harris-
Kojetin, 1999). A few studies also have reported on use of multiple methods to pretest
translated questions (e.g., Napoles-Springer et al., 2006). Many of these studies use the
number of problems identified through each method as a way to compare the testing
methods (e.g., Presser and Blair, 1994); our study compares the specific types of problems
identified by the two methods. This project used cognitive interviewing and behavior coding
to pretest Chinese (Mandarin and Cantonese), Korean, and Vietnamese translations of the
U.S. Tobacco Use Special Cessation Supplement to the Current Population Survey. We
used 11 items translated into four languages. These 11 items were not revised between
cognitive interviewing and behavior coding. Some had problems identified in cognitive
interviewing and others did not. This project was an unusual opportunity to compare the
types of problems identified using exactly the same items rather than items that had been
revised between the cognitive interviewing and the behavior coding. Our analysis begins to
assess whether multi-method pretesting can make a substantial contribution to multilingual
survey development. We determine the types of problems identified at each stage of the
pretesting using an established coding system for identification of questionnaire problems
for the entire group of respondents. We then consider if the identified types of problems
varied between language groups. Because the team also debriefed the behavior coders
after the pretesting was completed, we will describe how these debriefings allowed the
coders to act as “cultural interpreters” and further inform recommendations for revising the
questions.
5. Targeted Data Collection Efforts for NASS’s Quarterly Agricultural Survey Based on
Nonresponse Classification Tree Models
Kathy Ott, National Agricultural Statistics Service; Melissa Mitchell, National
Agricultural Statistics Service
In order to combat rising nonresponse rates, the National Agricultural Statistics Service
(NASS) developed nonresponse propensity models to identify potential nonrespondents
prior to data collection for the Quarterly Agricultural Survey. Classification tree models based
on auxiliary data were used to rank-order operations by their likelihood of being a
nonrespondent. Earlier models were developed at the national level, and NASS has now
shifted focus to state level models. These scores can be used to enhance data collection
efforts. Using the nonresponse prediction for each operation, targeted data collection
methods were developed and tested to determine if specific methods directed at operations
that had a low propensity to respond would increase response for the survey. The ability of
the model to predict nonresponse propensity, as well as results from the targeted data
collection methods, will be discussed.
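As an illustration of the general approach only (not NASS’s actual models; the tuning values and
names below are assumptions), a classification tree fit to auxiliary data and response outcomes
from a prior cycle can score and rank the upcoming sample:

from sklearn.tree import DecisionTreeClassifier

def rank_by_nonresponse_risk(X_prior, was_nonrespondent, X_upcoming):
    """Score and rank-order operations by predicted nonresponse propensity."""
    tree = DecisionTreeClassifier(max_depth=5, min_samples_leaf=50)  # illustrative settings
    tree.fit(X_prior, was_nonrespondent)           # auxiliary data and outcomes from a prior cycle
    scores = tree.predict_proba(X_upcoming)[:, 1]  # predicted probability of nonresponse
    return scores.argsort()[::-1]                  # indices from highest to lowest predicted risk

Operations near the top of such a ranking would then receive the targeted data collection
methods described above.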
6. Identifying and Addressing Response Inconsistency
Ashton Jacobe, Fors Marsh Group; Sarah Keaton, Fors Marsh Group; Luciano Viera,
Fors Marsh Group
Measurement error occurs when a respondent’s answer is inaccurate or imprecise. One
common manifestation of measurement error is response inconsistency, where respondents
provide survey responses that seem incompatible with or contradict their other responses.
Response inconsistency is thought to occur when questionnaires are completed without full
comprehension of the items. It is particularly problematic in self-administered surveys, due
to limitations of implementing active follow-up and/or clarification strategies such as
additional definitions, examples, or additional instructions. Survey research typically
examines survey design features that contribute to response inconsistency, such as mode
of administration, question type and wording, and interview setting. However, much less
research has focused on identifying the contribution of respondents themselves to these
errors. Despite efforts to develop instructions that communicate clear expectations and
motivate high quality responses from all participants, these manipulations typically vary in
their effectiveness in reducing response inconsistency across respondent types. This is
consistent with past studies that have found relationships between respondent demographic
characteristics (e.g., age, gender) and the quality of survey responses. Additional research is
needed that takes a more holistic approach to response inconsistency, examining both survey
design- and person-level causes as well as how they should be addressed. To this end, the
present investigation uses multiple study examples to examine: 1) Methods for identifying
response inconsistency; 2)
Survey design features that lead to greater response inconsistency; 3) Types of
respondents that are more likely to respond inconsistently; and 4) Measures taken to reduce
the impact of response inconsistency on measurement error. Results and implications for
existing survey practice along with directions for future research will be discussed.
7. Controlling for Acquiescence in Comparative Cross-National Research: The
Importance of Using Measurement Equivalent Country Clusters
Eva van Vlimmeren, Tilburg University; Guy Moors, Tilburg University
This paper addresses the situation in which an acquiescence response behavior has
differential impact on cross-national differences in attitudes depending on the type of culture
to which a national culture can be allocated. Acquiescence is a tendency among certain
respondents to agree with question items irrespective of the content of the items and is
generally recognized as a source that might bias cross-cultural comparisons. Because
cross-cultural research frequently faces problems of measurement invariance, i.e., the
comparability of measurement models across cultures, we grouped countries according to their
homogeneity in measurement. Using data from the 2008 European Values Study, we
demonstrate that the correlation structure of a set of conceptually balanced items defines
several clusters of countries that are internally homogeneous but can be externally quite
diverse. Interestingly, the different clusters display a distinct reaction to controlling for
acquiescence response style (ARS) in the model. For instance, in the Western European cluster country
differences in attitudes did not substantially change when controlling for acquiescence,
whereas in the other clusters changes were pronounced. Our findings have important
implications for comparative research in the sense that even a response style factor such as
acquiescence can have a distinct meaning across cultures with various impacts on how it
disturbs the measurement model. It also demonstrates that clustering countries according to
their similarity in correlations within a given set of items might be a tool to identify
measurement equivalent sets of countries in which comparative research is possible. Some
practical guidelines as well as implications for further research are presented as well.
8. A Practical Approach for Identifying Engagement-Level Segments and Developing
Differentiated Acquisition and Retention Strategies
Jack Fentress, Data Recognition Corporation; Herbert Baum, Data Recognition
Corporation; Colleen Rasinowich, Data Recognition Corporation
Whether considering products, candidates or policies, individuals make choices. As
researchers, our goal is to develop strategies that best influence individual choice and our
offer’s share of preference. Depending on the context, preference share may be translated as
market share, votes or policy endorsement, but the analytics are similar. This presentation
will provide an overview of our use of choice models to increase preference share. Standard
analytic approaches tend to analyze populations in aggregate and identify universal drivers.
Similar to the work of Fred Reichheld and the use of Net Promoter Score (NPS) in the
commercial sector, we advocate the establishment of segments, differentiated by
quantifiable levels of preference. The definition of these groups is flexible, but needs to
segment respondents along a continuum of preference. Utilizing choice models and
respondent-level analytics, we address two issues central to preference studies. The first is
retention and how one can best strengthen preference among current
supporters/customers. The second is acquisition or how one can best capture new
supporters/customers. It’s a balancing act between the retention of supporters and the need
to revise that offer to acquire new supporters. The delineation of these segments and
accounting for their unique requirements is recommended. We will present two analytic
approaches that effectively address acquisition and retention strategies. Relative Strength of
Preference (RSP) scorecards are effective for identifying acquisition strategies. Quantifying
respondent-level gaps on key items results in the identification of those items that are most
impactful for achieving overall preference. For the identification of retention strategies,
Power/Penalty and Reward Quadrant Maps are highly effective. Utilizing logistic
regressions, these maps identify which items have the greatest upside (reward), downside
(penalty) or both (power). Our intent is to provide participants with several analytic
approaches used in the commercial sector that have viable application for policy analysis.
9. Measuring Messy Concepts Without Creating Messy Questionnaires: The Case of
Gender
Alian Kasabian, University of Nebraska-Lincoln
Researchers are often interested in the impact of gender on their variables of interest, yet
use measures of sex category in their analyses. Sex and gender scholars are highly critical
of this practice, due to the range of gender behaviors and experiences that are unrelated to
biological sex as labeled at birth and because there is growing visibility of people who do not
identify as male or female. Yet for most surveyors, categorizing people as male or female is
the most practical option because other gender measures tend to be very lengthy (as with
psychological scales) or are better suited to qualitative work. To make real gains in
incorporating gender into our understanding of the social world, researchers need a more
nuanced and informative measure of gender that is not overly burdensome for respondents
and does not require inordinate amounts of space in a questionnaire. In this paper, I present
one such measure. In 2011, the Nebraska Annual Social Indicator Survey of residents aged
19 and older (n=906, AAPOR RR1=36.3%) provided respondents with a visual analog scale
(VAS) labeled “completely feminine” on one end, and “completely masculine” on the other.
Respondents were asked to place themselves, their spouse/partner (if applicable), and
society’s ideal woman and ideal man on the scale. Thus, the scale provides an interval level
measure of gender identity. Preliminary analyses indicate that respondents in the middle of
the scale rate themselves significantly differently than their more feminine and masculine
counterparts on a number of attitudinal measures (competence, political leanings, feminist
identification, etc.), suggesting that the commonly used sex category measures are missing
important variation. Additional analyses will assess the predictive validity of this gender
measure. The paper will also discuss the difficulties of using a VAS for this construct.
10. Nonresponse Bias Analysis in a Cohort Study Incorporating Genetic Data
Daniel Loew, Abt SRBI; Mark Morgan, Abt SRBI
Post-Traumatic Stress Disorder (PTSD) is a mental health condition that afflicts many of the
soldiers returning from service in Afghanistan and Iraq. Risk and resilience factors for PTSD
are not well understood. Longitudinal research is being conducted to study the mental health
trajectory of soldiers who have been deployed to combat situations and those who have not.
It is critical to the interpretation of the results that study attrition is minimized and that bias
over time is identified and adjusted for. The Ohio National Guard (ONG) cohort consists of
~3,000 members of the Ohio Army National Guard interviewed annually by telephone. Each
member was also invited to submit a saliva sample for genetic analysis. The key questions
that we will address in this methodological brief are: Are soldiers with more severe traumas
more likely or less likely to continue participating? Are soldiers with less difficult service
experiences more or less likely to continue? How do these potential biases affect our ability
to identify the factors that prevent or promote the development of PTSD and other mental
health problems? This methodological brief will examine the factors that are associated with
attrition for the survey and participation decisions regarding the optional genetics study.
11. Four Experiments for the 2011 Diary of Consumer Payment Choice
Kevin M. Foster, Federal Reserve Bank of Boston
The Diary of Consumer Payment Choice (DCPC) is a new data product from the Federal
Reserve Banks of Boston, Richmond and San Francisco. In 2010 and 2011, we conducted
two pilot diaries, in which diarists reported all transactions (purchases and bill payments)
and cash management activity over a three-day period. Respondents recorded their activity
in a paper diary and then reported the results in a nightly online survey, which included
additional questions. To prepare for the full implementation of the DCPC in 2012, we
conducted four experiments concerning key survey methodology issues in the diary
program: 1) Does using mixed modes affect the number of transactions reported? We asked
some diarists to mail back their paper diary for an additional incentive. 2) Do new or
experienced diarists report larger numbers of transactions? We feared that experienced
diarists may suffer from diary fatigue or conditioned underreporting. 3) Do diarists who take
the associated survey before their assigned diary period report different numbers of
transactions than those who take the survey after their diary period? In the 2010 pilot study,
we insisted that all diarists take the survey first. 4) Does having extra 'lead time' affect the
number of transactions and the amount of cash reported? Diarists receive their diary packet
one, two, or three days ahead of their assigned diary start date based on the day of the
week of the start date. The answer to each of these questions is 'No'. These results have the
potential to save money (fewer incentives paid) and administrative effort (no need to remind
diarists to take the survey first). In addition, the experimental outcomes show that we are not
biasing our results by including both new and experienced diarists, nor by changing the lead
time on receiving the diary packet.
12. Authorizing Health Record Linkage in Survey Research
Mindy Hu, Mathematica Policy Research; Ronghua (Cathy) Lu, Mathematica Policy
Research; Anna Situ, Mathematica Policy Research
Linking administrative and survey data is becoming increasingly popular in health services
research. Linking survey and medical claims data enables researchers to examine the
interactions between disability, chronic disease, health care use, cost, and patient
experiences with the health care system. Evidence suggests that participant
characteristics—such as age, health status, and health care use—influence the likelihood to
authorize data linkage; however, results are mixed regarding the most important variables
and the direction of the effects (Beebe et al. 2011; Dunn et al. 2004; Harris et al. 2005;
Huang et al. 2007; Knies et al. 2012). The enactment of the Health Insurance Portability and
Accountability Act (HIPAA) of 1996 could help explain these mixed results. In the United
States, the HIPAA Privacy Rule imposes requirements on obtaining authorization that could
affect rates of authorization. Few population-based studies have examined the interplay of
participant characteristics and authorization to link data in the context of HIPAA regulation.
The 2012 Autoworker Health Care Survey is a self-administered mail survey of
approximately 13,000 active and retired autoworkers and their spouses/partners. The survey
consists of 1) a health questionnaire and 2) a request for written authorization (which meets
HIPAA regulations) enabling researchers to link survey responses to medical claims data.
Mathematica Policy Research conducted the survey for the National Institute for Health
Care Reform. This paper will examine the influence of self-reported health, health care use,
and demographic characteristics on rates of authorization to link survey data to medical
claims data. We will use logistic regression to examine associations between individual
characteristics and authorization outcome. We will also examine potential bias due to
differences in authorizers and non-authorizers and discuss the resulting implications for
survey design.
13. Can a Verbal Prompt About Importance Reduce Item Nonresponse for Demographic
Items?
Glenn D. Israel, University of Florida
Conventional wisdom and practice lead to placing demographic items at the end of a
questionnaire. The thinking behind this practice is that these items are less important than
topically-salient items for most surveys, so higher item non-response can be tolerated for
demographic questions. A recent study by Teclaw, Price and Osatuke (2012) turned this logic
on its head, finding that item response rates for demographic items placed at the beginning of a
questionnaire were higher than for the same set of items placed at the end of the survey. This
finding raises the question of whether there are other equally effective approaches to
stimulating high item response rates for demographic questions. This study experimentally
tests whether a verbal prompt about the importance of answering the demographic
questions improves item response rates (relative to the version without the prompt) when
the items are placed at the end of the survey. Data from a customer satisfaction survey of
Cooperative Extension Service clients are used to address the research question. The
mixed-mode survey data included both Web and mail survey responses. Overall, the item
response rate was no higher for the questionnaire with the verbal prompt than the one
without it. In addition, item response rates were not different for either the mail or Web
responses (although the latter showed a higher item response rate with the prompt, it was
not statistically significant). Based on these results, it does not appear that a verbal prompt
about importance is a viable strategy for reducing item non-response of demographic items.
14. An Experiment to Improve Spanish Language Response Rates to a Mail
Questionnaire
Andrew Caporaso, Westat; David Cantor, Westat; Aaron Maitland, Westat; Bradford
Hesse, National Cancer Institute
The Health Information National Trends Survey (HINTS) is a national health communication
mail survey sponsored by the National Cancer Institute (NCI). In the first cycle of HINTS 4,
non-responding households were mailed both an English and a Spanish questionnaire in the
second mailing if their address was linked to a Hispanic surname and/or was in a
linguistically isolated (LI) area as indicated on the frame. This strategy yielded a sample
which was 8.5% Hispanic, which was significantly lower than ACS figures. Compared to
prior telephone versions of HINTS, significantly fewer surveys were completed in Spanish.
Since cycle 1, Brick et al. (2012) have reported on a different mailing procedure that was
tested with a short screening survey on education. This test found significantly more returns
of Spanish language surveys, as well as more Hispanic respondents, when compared to the
cycle 1 HINTS procedure. The purpose of this paper is to test whether these results
generalize to HINTS, which is a long survey (about 20 pages) on a topic that is less salient
than that tested by Brick et al. The paper will report on the results of an experiment that was
carried out in cycle 2 of HINTS 4 which compared two different mailing methods intended to
reach more Spanish speakers and Hispanics. In the first condition, based on Brick et al.,
about 2,000 respondents were sent both a Spanish and English questionnaire in all
mailings. In the second condition, about 10,000 were sent a Spanish and English
questionnaire in all mailings only if the household was linked to a Hispanic surname and/or
LI area. The presentation will report on the results of the experiment with respect to the
number of Spanish language returns, the percentage of respondents identifying as Hispanic
and overall response rates.
15. All in the Family? Who Do Respondents Include When Responding to Telephone
Status Items
Josiane Bechara, NORC at the University of Chicago; Vincent Welch, NORC at the
University of Chicago
The benchmark study for telephone status in U.S. households is the National Health
Interview Survey (NHIS) published by the National Center for Health Statistics. The NHIS is
an area probability survey where data are collected face-to-face in an interview that lasts for
nearly an hour. Telephone status on this survey (i.e., wireless-only, wireless-mostly,
landline-only) is established through responses to a survey item that asks ‘Of all the
telephone calls that you or your family receives are…’ In the context of the NHIS interview,
researchers believe that respondents clearly understand what the term ‘family’ should
include (See Blumberg and Luke, 2012). This item has been employed in a number of
studies that are conducted over the telephone. It is not clear that respondents in the
telephone setting understand the term ‘family’ in the same way that NHIS respondents do.
The current research explores telephone respondents’ understanding of the term ‘family’ in
this telephone status option. We employed in-depth probing in a cognitive interview setting
in order to understand the level of agreement between respondents’ household rosters and
the set of individuals whom they included in their ‘family’ when responding to this item. We
found that respondents made errors of inclusion and exclusion in their ‘family’ composition.
Replacing the word ‘family’ with the ‘household’ dramatically reduced the number of errors
and led to increased reliability. Further probing revealed that respondents’ self-generated
definition of ‘household’ was also in line with Blumberg and Luke’s (2012) intended
meaning. Implications for future dual-frame RDD studies are discussed.
16. The Expansion of Survey Research into Educational Strategy Consulting: An Example
of How Universities Can Increase Retention Rates With the Use of Surveys and
Personality Tests
Thomas Lamatsch, Monmouth University; Tyler Breder, Monmouth University;
Andrew Bell, Monmouth University
In order for survey research to have a sustainable future it is important to branch out and
cooperate more closely with other fields and break into new areas. While survey research
organization have done large scale studies of education systems for decades they are
mostly absent in the field of education consulting which is dominated by MBAs and
researchers with education degrees although we should play a more serious role. One of
the major problems universities struggle with today is retention and survey research could
assist in that issue similar to the way Gallup assists their clients in picking employees who
are the right fit for their companies. This paper will, however, turn the premise around and
not look for the right student to fit the university but the right fit in terms of approaches to
teaching geared for their students. Universities have long acknowledged that few students
leave because they struggle academically; instead they leave because it is not what they
expected. This study will test if schools could increase retention rates by offering more
flexible programs and closer advising to students based on students’ characters and
temperaments. Companies worldwide use the Myers-Briggs Type Indicator (MBTI) to create
the ideal atmosphere for their employees to succeed. This study will conduct a survey of
500 randomly selected students who will answer questions modeled after the MBTI, as well
as questions about how happy they are with their choice of college and about their preferred
form of “education delivery,” i.e. lectures, seminars, independent studies, online classes, etc.
The results can then be used to create simple tests that advisers can use not only to guide
students academically but also to advise them on the types of classes in which they are most
likely to succeed.
17. Immigration à la GCC: Support and Opposition to the Kafala System in Qatar
Abdoulaye Diop, Social and Economic Survey Research Institute, Qatar University;
Trevor Johnston, University of Michigan; Kien T. Le, Social and Economic Survey
Research Institute, Qatar University; John L. Holmes, Social and Economic Survey
Research Institute, Qatar University
Since the 1950s, immigration in the Gulf Cooperation Council (GCC) countries has been
uniquely governed by the Kafala or sponsorship system. The Kafala provides the legal basis
for the residency and employment of migrant white-collar and blue-collar workers in these
countries. Today, despite growing criticism from human rights organizations, little effort has
been made to ameliorate the difficult working and employment conditions of these migrant
workers in the GCC countries. While the existing literature is abundant at the country-level,
combining macro analysis and ethnographic narratives to describe the abuses and human
costs, we know little about public opinion towards the Kafala system. Capturing this public
opinion is critical to understanding the GCC countries’ failure to enact vital reforms. In this
paper, we study this issue using data from two nationally representative surveys in Qatar.
We begin by exploring the native Qataris’ attitudes towards migrant workers in general and
the determinants of support for or opposition to reform. Drawing on a survey experiment, we
then exploit a matching design to evaluate the effects of priming and prejudice on support for
reform of the Kafala. Finally, we draw some conclusions about the results with respect to the
future outlook of the region.
18. Evaluations on a New Methodology of the Turkish Consumer Survey
Türknur Hamsici Brand, Central Bank of Turkey; Ece Oral, Central Bank of Turkey
This study investigates the methodology of the redesigned monthly Turkish consumer confidence survey conducted by the Turkish Statistical Institute and the Central Bank of Turkey to calculate the consumer confidence index for Turkey. Since its start in December 2003, the Turkish Consumer Survey has been collected face-to-face, annexed to the Labor Force Survey panel design. In addition to the data collection method, the redesigned survey implements a different sampling method. The updated survey has recently undergone a twelve-month pilot period. We compare the old and new surveys from a design-based perspective and evaluate reasons for possible measurement errors and biases. Consumer surveys typically include questions used to form consumer confidence indices and to support cross-country comparisons via standardized questionnaires and indices. The redesigned Turkish survey meets the European Commission's quality dimensions required for future approval. Consumer confidence indices are used as economic indicators for forecasting
household consumption expenditure, consumer behavior in general, and the country’s
economic situation. The data are also used in political decision-making processes. In this
regard, consumer confidence surveys are useful and widely accepted tools for gathering
information about common people’s expectations over time (Ludvigson, 2004). Some
indicators derived from the Turkish consumer confidence survey are used in macroeconomic analysis and forecasting. Given this significance, the choice of survey methodology is central to producing reliable results for the economic and political environment. The redesigned survey is expected to improve the value of the Turkish consumer confidence index as a macroeconomic indicator.
19. In Search of More Granular Likely-Voter Models for Low-Turnout Elections: The Case of the 2012 Florida and Ohio Primary Elections
Clifford Young, Ipsos Public Affairs; Neale El-Dash, Ipsos Public Affairs
Most public polls use some derivation of the old 'Gallup' Likely-Voter model which typically
includes 5 or 6 items summated into an index. Likely voters (LV), then, are defined by a “cut
point” which typically corresponds to the historical turnout rate in that given election.
Because of the coarse nature of the index, there are two potential problems: 1) it may be impossible to obtain an LV cut that approximates the expected turnout; 2) the predicted turnout at two consecutive LV cut points can be very different, not allowing the researcher to examine what happens in between these cut points. These problems are especially acute
in low turnout elections, such as primaries. In the specific case of the 2012 Ohio and Florida
primaries, we confronted these issues. First, the top 25% of declared likely voters tend to be
clumped together in the top box of the scale. In a low-turnout election where only about 15% of the electorate votes, this inability to discriminate is a serious handicap. The second problem is
that a 25% turnout has a decidedly different partisan makeup than a 15% one. With these
challenges in mind, we employed estimated probabilities of voting, using logistic regression
as a function of past behavior, intended future behavior, and degree of partisanship. Our
model provides two advantages. First, we were able to discriminate voters in one-percent
intervals from 0 to 100%. Second, by employing political variables, our model captured the
partisan nature of primary elections. Our paper will compare the performance of the traditional summated-index likely-voter approach with our logistic regression method. We
will analyze approximately 13,000 interviews collected for the Reuters-Ipsos 2012 primary
polls in OH and FL. To measure performance, we will employ the Average Absolute
Difference between the survey estimates and election results.
20. The Effectiveness of Follow-Up Interviews in Reducing Item Nonresponse Bias in Mail
Surveys
Sandra L. Clark, U.S. Census Bureau; Deborah H. Griffin, U.S. Census Bureau
Research has demonstrated that survey managers need to consider factors other than
response rates when assessing survey quality. When considering nonresponse, quality is a
consequence of the adjustments that a survey makes and the similarity of survey
nonrespondents and respondents, more than the level of nonresponse. While much of the
research in this area has focused on unit or survey nonresponse, item nonresponse involves
parallel concepts and concerns. We generally assume that a low level of item imputation is a
good predictor of the quality of survey estimates. This paper assesses whether efforts to reduce
levels of item nonresponse in the American Community Survey (ACS) are successful in
reducing nonresponse bias. The ACS achieves high levels of item response because data
collection includes special efforts to follow up on incomplete responses. Evaluations have
demonstrated the effectiveness of this follow up effort in reducing the national-level item
imputation rates. These evaluations have not assessed the reduction in nonresponse bias
that the ACS achieves by converting a subset of item-level nonresponses to responses.
Recent analysis of this follow up operation provides us with important information about our
ability to obtain responses for items that respondents left blank on ACS mail-returned
questionnaires. Using data from the 2010 American Community Survey, this research
identifies the specific items that follow up efforts are successful in converting and those that
once left blank stay blank. In addition, to assess nonresponse bias reduction, this paper
compares the values of the originally missing, converted responses to the values reported
without follow up. By closely examining the ACS’s mail return follow-up operation, this
project will broaden our knowledge of item nonresponse bias in mail surveys and help us
define the items that benefit most from follow up efforts.
21. Conducting “Issues” Surveys Using Automated (IVR) Polls: The Case of the National
Leadership Index
Seth A. Rosenthal, DataDoc Research Consultants; Owen Andrews, Center for Public
Leadership, Harvard Kennedy School
The use of automated (IVR) polling methods to conduct issues-based surveys is
controversial. Issues-based survey questions are often more complex than the candidate-
choice questions typical of IVR polls. Some critics suggest that only live-caller interviews can
provide valid assessments of public opinion on complex issues. We evaluated the validity
and effectiveness of IVR-based issues polling using data from the National Leadership
Index (NLI). The NLI is an annual survey in the U.S. of public opinion toward the nation’s
leaders. It is conducted by the Center for Public Leadership at the Harvard Kennedy School
in collaboration with Merriman River Group. It assesses and indexes opinions about national
leadership across 13 key sectors of public life. From 2005-2010, the NLI was conducted as
a live-caller survey. In 2010, we tested a pilot IVR version of the NLI, which allowed for
direct comparison of the two methods. Since 2011, the NLI has been conducted as an IVR
survey. Overall, our data indicate a nearly seamless transition from live-caller to IVR
methods. Two areas, however, merit closer examination. First, there was an increase in
endorsement of extreme responses in the IVR version, particularly on the most divisive
questions. However, the mean responses for these questions were not affected. This may
indicate that the increase in extreme responses accurately reflected respondent opinion
after the moderating effect of a live interviewer was removed. Second, the percent of “not
sure” responses increased marginally throughout the survey. This was likely due to the
inclusion of an explicit “not sure” option for each question, necessitated by the IVR
methodology. However, some argue that including “not sure” anchors generally increases
the external validity of public opinion surveys. Overall, results for the IVR version of the National Leadership Index suggest that IVR can compare favorably with live-caller methods for
conducting issues-based surveys.
22. Is Interactive Voice Response a Viable Mode of Data Collection?
Adam Gluck, Arbitron
Arbitron uses a panel-based methodology to collect radio listening data, and produce media
ratings in various markets around the country. The method for collecting this data is the
Portable People Meter, a cell phone sized device that passively measures exposure to
encoded audio in media. As each individual meter is carried by a unique panelist, we can
associate the media that the PPM detects to the panelist who is wearing it, thus creating an
electronic log of their listening. From that we can estimate who was listening to radio. After
panelists leave a panel, we occasionally re-contact them to gather additional information via
surveys. During the fourth quarter of 2012 and the first quarter of 2013, Arbitron will conduct
one such brief survey. Households will first be surveyed via phone, with a brief survey
administered automatically via Interactive Voice Response (IVR). Households will be sent one follow-up IVR call as well, and they may call back a special number to complete the IVR survey at their leisure. The survey will consist of two questions. In this paper, we will
seek answers to the following questions: 1. What type of response rate does an IVR survey
yield? 2. What are the characteristics of responding vs. non-responding households, with
regard to phone type (cell vs. landline), income level, size, and presence of children?
Additionally, we will also present information about the legal and logistical challenges of
administering an IVR survey.
23. The Effectiveness of Forgiving Introductions and Response Options for Reducing
Social Desirability Biases in Reports of Health-Related Behaviors
Hanyu Sun, Joint Program in Survey Methodology; Rebecca Medway, American
Institutes for Research
As the obesity epidemic continues to rage, it is becoming increasingly important to collect
accurate information about people’s health-related behaviors. Unfortunately, it can be
difficult to get survey respondents to provide truthful responses about these topics. One
method researchers have proposed as a way to reduce such social desirability biases is
adding a forgiving introduction to the question stem. It is hypothesized that forgiving
introductions reduce both the intrusiveness of the question and respondents’ concerns
about the negative consequences of giving a truthful response. However, the few
experimental studies that have tested their effectiveness have produced mixed results. One
explanation for these mixed results is that many studies utilize vague introductions that
respondents do not find very convincing (e.g., “Some people want to exercise, but they just can’t find the time”). We hypothesized that offering concrete, scientific statements would be a more effective approach (e.g., “A recent study conducted by the Centers for Disease Control and Prevention indicates that almost one-third of adults do not exercise on a regular basis”).
Additionally, the previous studies rarely experimentally manipulated both forgiving
introductions and forgiving response options simultaneously. Finally, most existing studies
have focused on reports of voting history and sexual behavior; the effectiveness of forgiving
introductions and response options on reports of other health-related behaviors has not yet
been investigated. To better determine whether, and when, forgiving introductions and
response options are effective, we included a 5-item 3×3 question wording experiment in a
national probability-based Web survey. The experiment varied both the authoritativeness of
the forgiving introduction (authoritative scientific introduction vs. vague non-scientific
introduction vs. no introduction) and the use of forgiving response options (forgiving
response options first vs. forgiving options last vs. no forgiving options). This presentation reports the results of the experiment.
24. Reaching Respondents Using an Address-Based Frame: Does a Nonreturned Mail
Questionnaire Really Mean “No”?
Marla D. Cralley, Arbitron
Over the past ten years researchers have witnessed decreasing coverage and efficiency in
traditional landline RDD samples. To address this, Arbitron conducted experiments and began using cell-only and cell-mostly samples to supplement the traditional RDD samples. Finally, during 2011, Arbitron moved to a total address-based sample frame in the 47 top media metros currently measured by Arbitron’s PPM service. The Arbitron PPM service passively collects radio and television media usage among an ongoing panel of respondents. This system replaced the traditional paper radio and television self-report diaries previously used in these markets. Address-based sampling requires researchers to
employ differing modes to contact potential respondent households effectively and
economically. Arbitron uses an initial phone contact to reach residential addresses where
Arbitron’s sample vendor is able to match to a phone number using secondary databases.
Selected addresses where a phone match is unavailable are initially contacted using a
screener questionnaire. This questionnaire is designed to confirm the address reached and
collect demographic information and a telephone number. Selected households returning
usable questionnaires are then contacted using the provided phone number for panel
recruitment. Attempts to recruit a sub-sample of households that do not return usable
questionnaires are made in person by Arbitron field representatives. This paper compares
panel recruitment agree rates for households returning the initial mail questionnaire to those
who did not return the screener. Recruited households will also be compared based on
household demographics and quality of panel participation. This analysis will evaluate the
benefit of making additional efforts to contact households not returning mail questionnaires.
25. Motivated Conservationism: Contingent Effects of “One Health” Framing on
Conservation Behavior
Sungjong Roh, Cornell University; Katherine A. McComas, Cornell University; Dan
Decker, Cornell University; Laura Rickard, SUNY-ESF
Recent years have seen growing attention to communicating about the interconnectedness
of human, environmental, and animal health. Our research on “One Health” messages
examines how framing wildlife diseases as not only resulting from wildlife behavior but also
due to human and environmental factors might influence conservation behaviors that seek to protect the natural environment (see Karesh & Cook, 2005, for a review). Yet recent work on framing effects suggests potential boomerang effects (Chong & Druckman, 2007), in which
individuals who receive information opposing their belief system may not simply resist
challenges to their views but instead strengthen their original, opposing position (e.g.,
Gollust, Lantz, & Ubel, 2009; Peffley & Hurwitz, 2007). Building on research (Kahan,
Jenkins-Smith, & Braman, 2011) into the boomerang effects of message framing caused by
individuals’ ideology-protective cognition (i.e., cultural cognition; DiMaggio, 1997; Douglas &
Wildavsky, 1982), we investigate how message effects of a One Health frame and its
counter-frame (i.e., blame wildlife behavior only) vary by citizens’ cultural values
(Hierarchical-Individualists vs. Hierarchical-Communitarians vs. Egalitarian-Individualists vs.
Egalitarian-Communitarians). We report on a Web experiment of N = 550 Americans who
reported intentions to engage in conservation behaviors. Results varied markedly by frames
and individuals’ cultural cognitions. Specifically, among Egalitarian Individualists, the One
Health frame showed a boomerang effect: it reduced intentions to engage in conservation
behaviors compared to a control group, which did not read a message; however, the counter
frame, which blamed wildlife behavior, led Hierarchical Communitarians to express greater
intentions to engage in conservation behaviors compared to the control group. Our
discussion focuses on theoretical and practical implications of the efficacy of One Health
framing in messages seeking to increase conservation behaviors among a diverse public
audience.
26. Vacant Housing Units and Other Out-of-Scopes Identified Across Data Collection
Years of the General Social Survey (GSS)
Jodie A. Daquilanea, NORC at the University of Chicago; Katherine Dekker, NORC at
the University of Chicago; Lauren Doerr, NORC at the University of Chicago; Ned
English, NORC at the University of Chicago
The General Social Survey (GSS) provides a suitable environment in which to explore
trends in vacancy and housing unit eligibility rates, as it has been conducted as a nationally-
representative household sample over the past decades. The GSS, sponsored primarily by
the National Science Foundation, biennially collects cross-sectional and panel data on the
attitudes, experiences, and demographic characteristics of residents throughout the United
States. The cross-sectional sample uses an address frame based on the United States
Postal Service Delivery Sequence File (DSF), as enhanced through supplemental listing
conducted by NORC staff prior to the start of data collection. NORC updates its national
sampling frame for the cross-section component of the GSS every ten years in rural areas
based on the decennial Census. Field interviewers visit cross-sectional housing units during
data collection, and so determine their eligibility. Sampled housing units may then be identified as vacant, or as not housing units at all, and therefore out of scope. The 2004 and
2012 rounds of the GSS used newly-updated sampling frames, based on the newest
released Census data. For this paper we will track trends in vacancy rates and housing unit
eligibility rates across multiple years going back to 2000. Observed vacancy rates may
increase as the sample frame ages. Further, vacancy rates in later years may have been
affected by the 2008-09 economic recession; the GSS provides an environment in which we
can observe these trends over two-year intervals. For rural areas that required in-person
listing, we will also compare vacancy rates reported by the Census with vacancy rates
calculated through GSS fielding. These findings will add to the body of knowledge about the
effect of recency of updates in a study’s sample frame upon vacancy rates calculated in its
subsequent fielding.
27. Comparisons of Online Recruitment Strategies: Craigslist, Facebook, Google Ads
and Amazon’s Mechanical Turk
Christopher Antoun, University of Michigan; Chan Zhang, University of Michigan;
Frederick G. Conrad, University of Michigan; Michael F. Schober, The New School for
Social Research
Methods such as posting flyers in public places, placing print ads in newspapers and
magazines, and posting online classified ads on Craigslist have been widely used to recruit
research subjects. Recently, the rise of social media Websites (e.g., Facebook) and online
services such as Google Ads and Amazon’s Mechanical Turk (MTurk) offer new
opportunities for researchers to recruit study participants. Although researchers have started
to use these emerging methods, little is known about how they perform in terms of cost
efficiency and, more importantly, the type of people that they ultimately recruit. Here, we
report findings about the performance of four online sources for recruiting participants, in our
case, iPhone users: Craigslist, Facebook, Google Ads and MTurk. First, we compare the
cost and participant demographics associated with different recruiting sources. Next, we
evaluate whether people recruited from different sources behaved differently in our screener
survey (a brief online questionnaire to collect participants’ demographic information and to
verify they are actually iPhone users). The findings reveal very different performance
between two types of online recruitment strategies: those that “pull-in” online users actively
looking for paid work (e.g., MTurk workers and Craigslist users) and those that “push-out” a
recruiting ad to online users engaged in other, unrelated online activities (e.g., Google ads
and Facebook). We find that (1) the pull-in recruiting strategy was more cost efficient (more
respondents per dollar) than the push-out approach; (2) participants from the two pull-in
sites (Craigslist and MTurk) were predominantly young, presumably because those sites’ users are relatively young; and (3) the two push-out recruiting sources, in contrast, seemed to have reached a more diverse user base. In addition, the pull-in strategy brought in participants who seemed more committed to the task and more willing to disclose personal information in the interview than respondents attracted through push-out techniques.
28. Continuous Survey Improvement: Modeling Nonresponse in Real-Time to Optimize
Sampling and Contact Procedures
Andrew Therriault, Lightbox Analytics
Disposition data are regularly used for post-survey adjustment, most commonly to reweight for representativeness, but a more proactive approach offers a chance to address these issues in real time. We present an original method, 'continuous survey improvement,' for using disposition data from surveys still in the field. Our technique is based on modeling
nonresponse to initial survey attempts as a product of the various data available, including
completed surveys, call metadata, and characteristics of the target population. Through the
use of Random Forests, Lasso models, and other data mining tools, we can not only
pinpoint which segments of the population are being missed, but also identify how best to
correct the problem with changes in sampling or contact procedures. By addressing
problems during the survey rather than afterward, the ultimate goal is to reach a truly representative set of respondents rather than settling for weighted approximations. While our method is most obviously applicable to long-term or repeated surveys (e.g., tracking polls, unemployment surveys), the same process could be applied in the course of one-off
surveys as well.
29. The Effect of Stamped Return Envelopes on Re-Mailing to Non-Respondents
Scott A. McInerney, Center for Survey Research
Although it is more economical for researchers to use business reply return envelopes when
sending out mail questionnaires, evidence has shown that stamped return envelopes
improve response rates by several percentage points. This has been shown for initial survey
mailings by Dillman and others; however, to date there is no published research addressing
the effect of stamped return envelopes on response rates for second round mailings to non-
respondents. Our experiment was designed to see if the benefit would persist in the second
round. As part of the Indiana/Texas Tobacco Study at the Center for Survey Research at
the University of Massachusetts Boston, paper questionnaires were used to reach an address-based sample (ABS) without any listed phone number. After an initial mailing to 4,000
sample members, each including a $1.00 incentive and a stamped return envelope, followed
by a reminder postcard, we still had 2,630 non-respondents. For the re-mail of the survey
instrument, we randomly assigned half the non-respondents to a stamped return envelope
condition, and half to a business reply envelope condition. No incentive was included in the
re-mailing. Comparing both groups, results show no significant difference between the rates
of return (9.8% vs. 9.3%). As previous research indicates, stamped return envelopes may boost the response rate for the initial mailing; however, they do not seem to improve the rate of return for the re-mail of potentially more resistant non-respondents. This
research was funded by the National Cancer Institute, Grant #5R01CA151384.
30. Polling Post-Superstorm Sandy: Understanding the Social and Political Aftermath of
the Hurricane in New Jersey
David Redlawsk, Rutgers University; Ashley Koning, Rutgers University; Elizabeth
Kantor, Rutgers University; Caitlin Sullivan, Rutgers University
The entire Northeast and especially New Jersey suffered severe damage and loss from
Superstorm Sandy in October 2012. Rendering many regions powerless and devastated
and hitting soon before the election, the storm had serious social and political consequences
for countless citizens – as well as implications for polling and the field of public opinion in the
last days of presidential campaigning. In the storm’s aftermath a few weeks later as New
Jersey slowly began to return to a “new normal,” the Rutgers-Eagleton Poll carefully
captured citizens’ opinions in a Sandy-focused post-election survey on how the storm
affected them both personally and politically. In terms of personal ramifications, this analysis
looks at whether New Jerseyans were affected by Superstorm Sandy, were forced to evacuate, sustained property or other damage, and suffered power outages. It also assesses
interaction with and opinions on FEMA, the Red Cross, and citizens’ electric companies, as
well as the state’s overall level of preparation. Politically, we investigate how Sandy
impacted New Jersey voters on Election Day, whether it swayed their vote, how they viewed
Governor Chris Christie’s and other political figures’ handling of the crisis, and what they
thought of the highly publicized bipartisan visit between the governor and President Obama
after the storm. This analysis provides a look into New Jersey opinions in the weeks following Sandy, broken down by standard demographics such as income, race, and region, as well as by cell/landline telephone contact and day of interview.
31. Barking up the Right Tree: Surveys to Target and Analyze Animal Health
Danna L. Moore, Social and Economic Sciences Research Center, Washington State;
Thom Allen, Social and Economic Sciences Research Center, Washington State;
Rose Krebill-Prather, Social and Economic Sciences Research Center, Washington
State
A significant issue for many animal health researchers is defining and obtaining information from the very specific subgroup of the human population closely associated with an animal population that is at elevated risk of injury or illness or that has special nutritional or performance requirements. This research discusses locating a hard-to-reach group, owners of agility dogs, and defining measurements of nutrition and health, incidence of injury and illness, and animal health practices. We evaluate the incidence of one specific feeding practice that is closely connected with infectious disease transmission between dogs and humans. This study examines the problem of a population
within a population. A targeted large convenience sample, a general population survey, and
social network recruitment are used to study incidence and to comparatively study this
problem. New social media are used as an optional innovative framework for sampling,
targeting, and evaluating complex health problems where the contactable population holds
key information related to a subpopulation of interest.
32. Combining Local and National Cross-Survey Data to Estimate the Prevalence and
Characteristics of Low Incidence Religious Groups in the New York Metropolitan Area
Daniel Parmer, Cohen Center for Modern Jewish Studies
One of the defining characteristics of the United States is its religious diversity and the
traditions of civic involvement and service of many of the religious communities. However,
the separation of church and state precludes the U.S. government from collecting data on
the religious identification of citizens. An important source of estimates of the religious
composition of the U.S. is surveys, such as the American Religious Identification Survey as
well as surveys commissioned by specific religious denominations. Single surveys as
sources of estimation are problematic. Many include too few respondents to be able to
describe reliably the low-incidence religious groups (those ranging from 1% to 10% of the
population). Moreover, any individual survey contains systematic errors that arise from
questionnaire construction, sampling, sponsorship, and “house” effects. This study seeks to
overcome these challenges through the development of cross-survey analytic techniques
that are similar in approach to standard meta-analyses. We have compiled data across more
than 50 independent surveys of the New York metropolitan adult household population.
Each survey was designed to provide a representative sample and each contained
questions about current religious affiliation. Multilevel and advanced Bayesian techniques
were employed to account for within survey clustering and to develop estimates of smaller
groups, such as Jewish, Mormon and Muslim, as well as larger groups such as Catholic.
Estimates were post-stratified across surveys on basic demographics such as age, sex, race
and educational attainment. In addition, adjustments were made for the over- or under-
representation of metropolitan areas across the sample of surveys. The results from this
analysis expand on prior research by combining national and local data sources to estimate
the prevalence and characteristics of low incidence religious groups at the metropolitan
level.
33. Commemoration Matters: The Anniversaries of 9/11 and Woodstock
Amy Corning, University of Michigan
We investigate the effect of anniversary commemorations of September 11 and Woodstock
on the American public’s collective memory or collective knowledge of each event. We are
able to examine both the eighth and the tenth anniversary commemorations of the
September 11 attacks (in 2009 and 2011), as well as the fortieth anniversary of the 1969
Woodstock Festival (in 2009). In an initial step, we used media analysis to identify the timing
of commemorative activity surrounding the anniversaries. Our second step was to draw on
data from surveys whose fieldwork dates corresponded to the anniversary periods, in order
to compare respondents’ memory and knowledge of the events before, during, and after the
commemorations. Our evidence shows that the percentage of Americans who consider 9/11
an “especially important” event is related to commemorative activity, and we likewise find
that greater knowledge about the Woodstock festival is associated with commemoration of
that event. In addition, the impact of commemoration on knowledge of Woodstock was
greatest among those with lower levels of education. For memory of 9/11, we found that
commemoration’s effects were stronger for blacks than for whites, suggesting that
commemoration may enhance the salience of national, as opposed to racial, identity. These
findings offer insights into the educative and evocative roles of commemoration.
34. The Prevalence and Impact of Self-Selection Bias and Panel Conditioning on Smoker
Studies Using Established Internet Panels
J.M. Dennis, GfK Knowledge Networks; Curtiss Cobb, GfK Knowledge Networks;
Michael Lawrence, GfK Knowledge Networks; Jordon Peugh, GfK Knowledge
Networks
Given their many advantages (see Couper 2008; Fricker 2002; Chang & Krosnick 2009), it is
not surprising that there has been increasing use of established Internet panels for
household and individual level data collection. Internet panels are, however, susceptible to
two potential drawbacks: self-selection bias and panel conditioning effects. Self-selection
bias is a form of non-response and can occur if panelists non-randomly fail to participate in
assigned studies or fail to answer specific questions within a study. Panel conditioning can
occur if panelists’ responses in a study are influenced by participation in prior studies, such
that panelists’ answers differ systematically from those of individuals not on the panel. Self-
selection bias and panel conditioning effects may be particularly likely to occur for
individuals asked to complete many surveys on the same topic while a part of the Internet
panel, such as what occurs with smokers and public health smoking studies. This study
investigates the prevalence and impact of these biases on three smoking-related public
health studies conducted using GfK’s KnowledgePanel®, a probability-based Internet panel
representative of the U.S. general population. Outcomes examined include measures of
knowledge, behavior and attitudes and are estimated from selection models to disentangle
conditioning from non-response. Initial findings suggest that while many questions related to
attitudes, behaviors and knowledge are repeated across most smoking-related studies,
exposure to prior smoking surveys was only weakly correlated with respondent answers in two out
of the three smoking studies examined. For example, panel conditioning effects were
estimated to increase the prevalence of having ever tried to quit smoking from 62% to
63.2% (+1.2 points). Not surprisingly, willingness to participate in early studies, regardless of
topic, is related to the likelihood of completing another smoking study. These results are reassuring, indicating that panel participation minimally impacts respondents’ reported attitudes and behaviors.
35. Voter Identification: Towards A Statistical Likely Voter Model
Jonathan Robison, Greenberg Quinlan Rosner Research; Masahiko Aida, Greenberg
Quinlan Rosner Research
There has been much controversy in political punditry on the criterion for assessing whether
a respondent will be a likely voter in an election. As is commonly known, the likely voter
models many political public opinion researchers use are not statistical in nature; rather, they are decision rules meant to define a universe that, a priori, researchers believe will constitute
the electorate. Recent scholarship has made substantive critiques of likely voter models that
use variables such as enthusiasm and political knowledge, and proposed differing methods
for resolving biases that likely voter screens introduce. With declining response rates to
surveys, developing an empirically rigorous and statistically grounded likely voter model will
go a long way towards improving accuracy and limiting bias in results. Today, pollsters using likely voter models rarely go back to validate their effectiveness, relying on gut instinct rather than hard data. Because the existing literature in this area is relatively sparse and relies on older, less extensive data and less rigorous predictive methods, we believe we can make both a scholarly and a practical contribution to this area of research. Using sophisticated
predictive modeling techniques, we intend to create a weighted algorithm to assess the
likelihood a registered voter will vote, using data from a national survey to create a statistical
decision rule that will provide researchers with a dynamic, rather than an ad hoc method to
create a likely voter universe. Additionally, the novel dataset the authors assembled for
analyzing likely voter screens includes evaluations by calling-house professionals instructed to rate the likelihood that a respondent will turn out to vote on Election Day. Utilizing this novel survey question and survey micro-data, we plan to find an optimal
likely voter screen.
36. Analyses of a Frame Based Telephone Survey in Mainland China
Shishi Chen, The University of Hong Kong
Fixed lines and mobile phones are widely used for national telephone surveys, and there are many studies of fixed-line and mobile phone survey methodology and of comparisons between telephone surveys and other survey modes. This paper builds upon a valuable opportunity for methodological work on fixed-line and mobile phone surveys in Mainland China: a follow-up survey interviewing respondents from a prior face-to-face survey, an innovative design. Understanding the challenges in fixed-line and mobile phone
surveys in Mainland China is a very topical issue in the field of survey research and the
results can be used to study survey errors and contribute to that literature as well as to
improve the quality of survey fieldwork procedures. A database with telephone contact
information for 4041 individuals was obtained from a household survey in Mainland China,
for which the Social Sciences Research Centre of the University of Hong Kong was
commissioned to conduct a follow-up telephone survey of the same individuals. The
households were sampled randomly for the first wave national face-to-face survey and the
individuals are respondents who left their telephone numbers after the face-to-face survey
and accepted in principle a follow-up interview within two weeks. This paper analyzes the
quality of the face-to-face database and the outcomes of the follow-up telephone survey. As
the demographics of respondents and non-respondents were known from the database,
studies of the influence of day, time, household demographics and individual demographics
on the first and second contact attempt outcomes are undertaken using logistic regression.
The findings point to an effective calling design for improving telephone survey fieldwork strategy and contribute valuable information for further studies in Mainland China. The
impact of the interviewers’ language skills on survey cooperation rate is also discussed.
37. Debating Tweets: An Analysis of Policy Choices on Twitter During the Dutch Pre-
Election Debates
Bengü Hosch-Dayican, University of Twente; Kees Aarts, University of Twente
To what extent can social media be a relevant data source for the study of political
representation? The present paper aims at providing some building blocks for answering
this question, using data collected on Twitter during a 2012 election campaign. A commonly
used measure of the quality of political representation is the congruence between policy
preferences of the electorate and their representatives. Recent research has demonstrated,
however, that measures of ideological and issue proximity between voters and parties
based on survey data and content analyses of party programmes lead to contradictory
findings on party representativeness (Thomassen 2012). This suggests that traditional
methods of analyzing issue congruence should be accompanied by more comprehensive
data. We therefore aim in this paper to assess the potential of politically relevant
discussions or sequences in social media as a novel instrument to explore the extent of
congruence between issue preferences of political elites and citizens. Our setting is formed
by the six planned election debates broadcast on TV and radio leading up to the Dutch
Parliamentary elections of September 12, 2012. The policy positions of party leaders will be
captured by transcribing and coding the debates according to a predefined scheme.
Furthermore, we will monitor citizens’ attitudes on policy issues addressed by the candidates
using Twitter messages sent during these debates. We will use software which mines
Tweets containing a selected set of hashtags and assesses the sentiment expressed in
them (Pang & Lee 2008). Through the simultaneous measurement of policy positions on
both the mass and the elite side, it will be possible to capture position distances on current issues
around general elections. Moreover, applying the same measurement to six consecutive
debates allows us to trace how issue discrepancies between citizens and parties develop on
these topics in the last three weeks before the elections.
38. The Effect of Attempting to Recruit Respondents to a Web-Based Diary on Overall
Response Rate
Michelle A. Cantave, Arbitron, Inc.; Robin Gentry, Arbitron, Inc.
Arbitron Inc., a provider of radio ratings data, conducted a test using a probability based
address sample to recruit the general population, aged 13 and older, to complete a one
week diary of their radio listening. Traditionally, Arbitron uses a hybrid frame, which includes address-matched and unmatched RDD sample, cell phone households, and no-phone households, to recruit households for our one-week diary. For the cases for which we have an address (matched RDD, cell phone, and no-phone households), we send a pre-alert letter before we call to attempt to recruit the household. Upon recruiting a household, we send a follow-up letter and then the diaries, with follow-up phone calls for households with phone numbers. In this test we attempted to recruit households from a
phone matched address based sample using both our traditional recruitment practices
(control group) as well as trying to recruit the household by sending them an invitation to
complete the diary by going online (test group). For those households sent the online diary
invitation, we followed up with nonresponding households and attempted to recruit them via
our standard methodology. In this presentation, we will report the effects of first attempting
to recruit the respondents to the Web based survey on the response rate (test vs. control
group) as well as comparing the results from the phone matched address based sample to
those recruited from the address matched RDD sample (control group vs. standard
production).
39. Measuring Patient Health Behavior: Information Sharing With Healthcare Providers
Tammy J. Payton, National Marrow Donor Program; Heather K. Moore, National
Marrow Donor Program; Jaime M. Preussler, National Marrow Donor Program;
Viengneesee Thao, National Marrow Donor Program; Michelle J. Kolb, National
Marrow Donor Program; Navneet S. Majhail, National Marrow Donor Program;
Elizabeth A. Murphy, National Marrow Donor Program; Ellen M. Denzen, National
Marrow Donor Program
Bone marrow and cord blood transplant (transplant) is a potentially curative, but complex
and resource-intense therapy for patients with blood cancers as well as other genetic and
immune disorders. It is estimated that there are currently 100,000 transplant survivors in the
United States and this number is expected to grow two- to three-fold by 2020. Studies to
date have shown that the quality of survivorship care is frequently suboptimal, and as a
result, survivors are often lost to systematic follow-up within the healthcare system. The
literature also suggests that the majority of cancer patients rarely or never discuss
information they find important with their provider. As such, patient-focused post-transplant
care guides were developed to facilitate follow-up care, especially the transition of care from
transplant specialist to local physician, and to promote patient-provider information sharing.
To evaluate the effectiveness of the guides overall, and specifically in addressing this issue,
a longitudinal, repeated-measures survey was administered at 6, 12 and 24 months post-
transplant to a nationally representative cohort of transplant recipients. The challenge was to
measure patients’ information sharing experiences in a single question with a focus on
minimizing respondent burden. We will describe survey instrument pilot results and question
design as well as characterize differences between patient groups who do and who do not
share information with their providers. These results can be used to improve the precision of
information sharing measures and identify communication barriers. Addressing these
barriers may ultimately improve patient-provider decision making and patient satisfaction.
40. Using Focus Groups to Develop and Understand Survey Questions
Kinsey Gimbel, Fors Marsh Group; Katherine Ely, Fors Marsh Group; Bryan Wiggins,
Fors Marsh Group; Jennifer Romano Bergstrom, Fors Marsh Group
Although often viewed as a qualitative data collection tool, focus groups can be a powerful
tool in the survey development process. While cognitive interviewing is a more traditional
way of testing survey questions, focus groups can also be structured so as to guide and
assist in the development of both high-level survey topics and specific questions. This can
be done before, during, and after survey development:
- Before work begins on survey design, focus groups can help researchers identify key concepts and topics to include in the study, and help spot subjects that will not be profitable survey areas.
- During survey development, focus groups can be used to evaluate prospective survey questions, identify possible response options, and refine question wording.
- After data collection is complete, focus groups can be used to better understand survey findings.
This paper will use examples from a series of focus group projects conducted for the
Department of Defense during 2011 and 2012 to illustrate how focus groups can be used at
each point in the survey process to improve survey materials and better understand survey
findings. Areas of discussion will include specific questions and activities used during
groups to solicit responses, examples of how questions were modified based on group
feedback, and how focus group discussions can be used to expand on concepts used in
survey questions.
41. Effects of Displaying Videos on Measurement in a Web Survey
Jonathan Mendelson, Fors Marsh Group; Jennifer L. Gibson, Fors Marsh Group;
Jennifer Romano Bergstrom, Fors Marsh Group
Advertisers often use videos in online surveys to assess the effectiveness of advertisements.
While this allows marketers to test immediate reactions to videos, technical issues and lack
of high-speed Internet access can introduce issues of generalizability and of comparability
with alternate methodologies. Despite increased interest in embedding rich media in
surveys, there is little published research on the implications for survey measurement. In a
probability-based online advertising tracking survey, respondents were asked two sets of
advertising recall questions. First, they were asked if they had seen advertisements for the
Military or for any of its specific Services. Next, depending on whether respondents could
successfully view a test video, respondents were shown videos or images of several specific
advertisements and asked if they had seen them. Respondents who had seen the
advertisements or who were shown videos were asked about their reactions to the ads.
Time spent per survey page and the randomized presentation order of the advertisements
were recorded. Our research examines the effects of using video stimuli on measurement.
First, we use logistic regression to predict whether respondents could view videos, based on
demographics; differences would indicate potential bias in studies solely using a video-
based methodology. Second, we examine differences in ad recall based on whether
respondents were shown images or videos, using demographics and the first set of recall
questions to attempt to control for the possible confound between respondent selection into
the video condition and respondent ability to view videos. Third, we use regression methods
to predict whether respondents who were shown videos viewed the entire advertisements,
based on demographics and presentation order. Fourth, we examine response
differentiation and the selection of 'not sure' options in the ad reaction questions among
respondents who were shown videos, based on demographics, whether respondents
viewed the entire videos, and presentation order.
Saturday, May 18
1:15 p.m. – 2:15 p.m.
AAPOR Demonstration Session #3
Simulating the Effect of Follow-Up Survey Response Rates on Program
Outcomes
Rebecca Lien, Professional Data Analysts, Inc.
Using data from three tobacco cessation phone counseling programs (quitlines), we simulate
program outcomes at lower survey response rate levels than what was achieved. Program
outcomes explored include quit status and program satisfaction measured seven months after
program registration. The quitline field has adopted a target of 50% for follow-up surveys, yet
the majority of U.S. quitlines do not achieve this target. We conducted the simulation as a tool to
discuss the importance of survey response rates to the quitline community. We collected intake
and 7-month follow-up data for quitlines in the three states: Minnesota (n=1,287); Florida
(n=3,430); and Hawaii (n=1,203). The survey response rates for the three quitline case studies
ranged from 48% to 64%. Using the number of days from the first survey attempt to the survey completion date, we calculate response rates and outcome measures for each day of the survey
period. We then graph the outcomes as a function of the calculated survey response rate to
show what quit rates and satisfaction measures would be for the same group of participants at
lower survey response rates. We find the quit rate outcome is influenced more by the survey
response rate than the satisfaction outcome in the three case studies. The simulation is
straightforward and generalizable to other fields with follow-up surveys. The graphs were a
useful tool for discussing non-response bias with the quitline community.
A Demonstration of the University of Michigan Survey Research Center’s
Electronic Listing Program
Frost A. Hubbard, Survey Research Center, University of Michigan; Jennifer Kelley,
Survey Research Center, University of Michigan; Jeffrey Smith, Survey Research Center,
University of Michigan; Xuetao Zhang, Survey Research Center, University of Michigan
In 2006, the University of Michigan’s Survey Research Center developed the Electronic Listing
Program (ELP), which enables us to do traditional and dependent listing electronically. As
defined by Eckman (2010), dependent listing occurs when field staff are given a list of
addresses for a specific geographic area and asked to update the list based on what they find in
the area in person. Traditional listing occurs when no address list is given to the field staff in
advance and the staff member must create the entire list of addresses in the area. Doing both
types of listing electronically has greatly reduced our listing processing error and processing
costs. Since the inception of the ELP, we have continually revised the software to achieve three
objectives. First, we made it as easy as possible for our field staff to rearrange the addresses on
the list and to put them in “walking-sort” order as defined by Kish (1965). Second, we improved
the quality of our listed addresses to reduce returned mail and improved our ability to match our listed
addresses to commercial databases from vendors such as Marketing Systems Group and
Acxiom. We accomplished this by parsing the addresses into the seven unique fields as defined
by the USPS (e.g. housing unit number, street suffix). Finally, to have a clear sense of how
many addresses were added, deleted or modified by field staff during the listing procedure in
each geographic area, for each individual address, the ELP now transmits indicators of whether
the address has been added, deleted, or modified to our master listing database. With these
indicators, we have data which will help us more accurately predict the areas in the future where
we can forego dependent listing and select addresses directly from the USPS Delivery
Sequence File.
Demonstration of an Integrated Respondent Management and Data Collection
Tool for Mixed-Mode (Phone/Web/Mail) Surveys
Harlan Luxenberg, Professional Data Analysts, Inc.; Julie Rainey, Professional Data
Analysts, Inc.
Many research and evaluation firms are recognizing the importance of collecting data through
multiple modes in order to increase response rates and reach a more diverse pool of
respondents. Firms often use Microsoft Excel for managing contact lists across modes or a
pricier CATI system, which may or may not meet all of their needs. Neither of these options fit the needs of our evaluation organization, so we built our own tool based on our
experience and survey methodology. This demonstration will showcase Synchronized
SurveyTM, a tool that Professional Data Analysts, Inc. developed specifically for mixed mode
data collection and has been using for the last four years. This software provides a central
management interface for managing contacts across modes, tools for sending emails,
processing mail merges, updating contact lists, tracking attempts and response, and entering
mail surveys through an automated, dual-data entry system. Telephone interviewers have
access to a secure interface which allows them to use caller lists to select which cases to
attempt by viewing the complete annotated call history. An easy-to-use form leads them through the survey. Non-respondents can be automatically flagged to receive an additional mode and removed from all modes once they complete the survey in any one mode. In addition, this software
integrates with LimeSurvey, an open-source online surveying tool, so online surveys can be
created in LimeSurvey, but managed through Synchronized Survey. A comprehensive reporting
system shows real-time response and other metrics necessary for tracking multiple surveys and
surveyors. The system is built using ASP.net technology and a SQL Server database. Surveys
require a certain amount of programming to meet the needs of each project. This demonstration
will showcase a recent tri-mode survey conducted using Synchronized Survey and LimeSurvey
so others can learn what a homegrown, Mixed Mode survey application looks like.
RDC-in-RDC: A New Approach to International Data Sharing
Stefan Bender, Institute for Employment Research; Daniela E. Hochfellner, Institute for
Employment Research at the University of Michigan; Margaret Levenstein, University of
Michigan
International and comparative analysis is often difficult given the existing restrictions on access
to non-public micro data. In most cases researchers are required to undertake a costly research
stay at a foreign RDC to access the necessary data. In order to improve data accessibility for
international researchers, the Research Data Center of the German Federal Employment
Agency (BA) at the Institute for Employment Research (IAB) in Nuremberg and the Michigan
Center on the Demography of Aging (MICDA) have launched a new initiative in international
data sharing, RDC-in-RDC. The RDC-in-RDC enables access to restricted German social
security data stored on a secure server in Nuremberg from designated institutions with
comparable standards but other locations. This is the first time that confidential German micro
data have been made accessible to researchers outside of Germany. Researchers can apply to work with data on individuals, households, and establishments. The data contain daily
information on the employment and unemployment history of the individuals, occupations and
education, wages, and benefits, as well as job search activities and job training schemes as
covered by the German social security system. In all data sources it is possible to link
individuals and households to establishments. Furthermore, access is granted to IAB surveys
which also can be linked to the administrative records of the respondents and metadata like
data on interviewers or non-response. This kind of data access is important for social science
in many ways. Globalization requires research of transnational topics, such as economic crises,
migration and health. Moreover, the various linkage possibilities can be used to gain new
insights into survey methodology. The paper contains a brief description of the RDC-in-RDC concept and its technical implementation. It provides an overview of the available data sources
regarding comparative research topics and research on survey methodology.
Saturday, May 18
2:15 p.m. – 3:45 p.m.
AAPOR Concurrent Session I
Response Rates and Data Quality in Multi-Mode Surveys
Changing Horses Midstream? Mode Supplement Quasi-Experiment and
Response Rates
Rumel Mahmood, Center for Survey Research; Mary Ellen Colten, Center for Survey
Research; Jack Fowler, Center for Survey Research; Carol Cosenza, Center for Survey
Research
Declining response rates over the past few decades for Random Digit Dial (RDD) samples, the traditional workhorse in survey research, have led to some consternation among survey researchers and, as a result, to helpful suggestions for improving response rates. For a survey on rationing in Medicare and high health care costs, carried out for the University of Pennsylvania Medical School by the Center for Survey Research at the University of Massachusetts Boston from May to August 2012, we adopted many of these best practices at the outset for our
telephone survey: implementing a multi-frame sample (n=2800) with RDD (800), list (1400), and
cell phone (600) components; sending pre-notification letters to those respondents for whom
addresses were available (1568); and including a small monetary incentive ($2) with the
advance letters. (Since our survey was on Medicare, rationing, and health care costs, we sought
to speak with a member of the household over the age of 40.) Despite these measures, our
response rate was lower than expected. We decided to send a printed questionnaire to
respondents for whom we had addresses but were unable to reach over the telephone or who
refused the telephone interview. With the paper instrument we sent a letter tailored to the non-
response type and a further incentive ($10). Of the initial 200 surveys we mailed to non-
respondents, we received 122 completed surveys (61%). After such a high yield, we mailed a
printed questionnaire to the remaining non-respondents in our sample. In total, we obtained 388
telephone interviews and 503 completed mail surveys, for a final response rate of 50% (AAPOR
4). In this paper we present differences in the characteristics of those who responded via the
two modes and some of the substantive differences that resulted from adding the mail
responses to those from the telephone interviews.
Differential Incentives in a Dual Mode Survey of Health Care Providers
Brian Roff, Mathematica Policy Research; Kirsten A. Barrett, Mathematica Policy
Research
Health care providers respond to surveys at very low rates. Mail surveys are commonly used
when surveying physicians and similar health care professionals. Increasingly, surveys are
being administered by Web or by mail with a Web option. The Web offers an opportunity for
data to be collected more efficiently – data entry costs are reduced, data quality is improved,
and respondent burden is reduced. Prior research on the dual mode mail/Web approach has
focused on response rates, with mixed results (Schneider et al. 2005; Friese et al. 2010;
McFarlane et al., 2009). Little research exists on the role incentives play in mode choice,
especially when the incentive favors a certain mode. The use of differential incentives in dual-
mode mail/Web surveys to encourage Web response in particular has not been examined in the
physician population, although it has been studied in surveys of recent college graduates
(Mooney et al., 2012). Mathematica Policy Research conducted a dual-mode mail/Web survey
of a nationally representative sample of 5,000 health care workers providing care to patients
with HIV/AIDS. To control survey costs while at the same time encouraging response, we
offered a $20 pre-paid incentive and a differential post-pay incentive that favored Web survey
completion. Those responding via mail received an additional $20 while those responding via
Web received an additional $40. Since we did not have email addresses for sample members, precluding
an email invitation, we hypothesized that 60 to 70 percent of the responding clinicians would
complete the survey by mail. However, only one third did—two-thirds responded by Web. In this
paper, we will: 1) explain the rationale for using a differential incentive as a means to encourage
mode selection, 2) describe differences between Web and mail survey responders, and 3)
provide suggestions for improving dual-mode surveys and incentive structures in the future.
Suppressing Survey Response: Further Evidence to Not Use Web Instruction
Cards
Orin T. Puniello, Bloustein Center for Survey Research, Rutgers University; Marc D.
Weiner, Bloustein Center for Survey Research, Rutgers University; Robert B. Noland,
Alan M. Voorhees Transportation Center
By way of a survey research experiment, Messer and Dillman (2011) theorized that an
illustrated, explanatory “Web card” would increase Web response rates when stimulating survey
participation via postal mail. While those authors found no such effect, the Web cards in their
experiment were generic for all respondents. As personalization of a survey invitation tends to
increase response, we theorized that personalization of the Web card would increase its
efficacy. We embedded an experiment in an Internet survey driven by address-based sampling
mail contacting. The sample (N=8,000), geographically centered around eight train-stations,
was divided into three categories: no Web card; generic Web card; and, personalized Web card,
i.e., preprinted with the respondent’s Internet survey passcode. Hypothesizing no effect for the
“no card” and “generic card” respondents, we anticipated a response rate boost for the
personalized Web cards. We found no effect in the proportion of Internet and mail response;
however, while we found no effect on overall survey response in the “no card” and “personalized
card” categories, we found a noticeable response suppression effect in the “generic card”
category (N=6,938; chi2=4.74, p=0.029). An inferential logit model controlled for 1) whether the
invitation letter, per se, was personalized; 2) nature of the housing unit; and 3) geography. We
found, as now expected from the bivariate analysis, no effect from the personalized card
(OR=1.01; p=0.858). All of the other controls were statistically significant and performed as
expected (e.g., for a personalized invitation letter, OR=2.65; p=0.000). The important
empirical finding is that even under these controlled conditions, the generic Web card still
suppressed survey response (OR=0.87; p=0.050). The instructive lessons for survey
researchers are not to waste valuable survey resources on Web cards, whether personalized or
not, and to recognize that there is a demonstrable risk that using Web cards may actually suppress survey
response.
Approaches to Collecting Data Using Interactive Voice Response (IVR) for
Address-Based Samples
Douglas Williams, Westat; David Cantor, Westat; Shannan Catalano, Bureau of Justice
Statistics
Investigation concerning the use of Interactive Voice Response (IVR) as a data collection tool is
not new. The advantages offered by IVR data collection include an increased sense of privacy to
encourage the reporting of sensitive behaviors, standardized interviewing, computer assistance
to accommodate complex skip patterns, and reduced costs. For household surveys the
traditional protocol for connecting respondents is for an interviewer to contact the respondent
and transfer to the IVR system (Gribble et al., 2000). A concern with this approach is the potential
for respondents to drop out during the transfer. This has been found to be as high as 30 percent
(Tourangeau, 2004). The involvement of an interviewer can offset the potential cost savings of
IVR, and the high drop-off rate can counteract the reduced biases gained from increased
privacy. The rise of address-based sampling approaches (ABS) affords the opportunity to invite
participants through mail contact, maximizing cost efficiency and avoiding drop offs due to
system transfers. The paper reports on the results of a field test conducted for the Bureau of
Justice Statistics in 2012 which examined the feasibility of using IVR to administer the National
Crime Victimization Survey (NCVS). The NCVS is a two-stage victimization survey which, in its
present form, requires complex skip patterns that cannot be accommodated on a mail paper
survey. In this test households were randomly assigned to either CATI Only, CATI with transfer
to IVR (CATI-to-IVR), or Mail invitation to call the IVR system (IVR Only). This paper will
compare the response rates from these different approaches, as well as the types of
respondents that responded. Overall, the response rates for the IVR Only are equivalent to
CATI Only and higher than CATI-to-IVR. The presentation will provide detail on these results,
including characteristics of the respondents to each of the different modes.
AAPOR Updates: Reports From The Transparency Initiative
and Non-Probability Task Force
This session will present a report on progress for two AAPOR initiatives: the Transparency
Initiative and the Non-Probability Sampling Task Force. It will provide AAPOR members with an
opportunity to engage in discussion and dialogue with members of these two groups.
Transparency Initiative Coordinating Committee Report
Timothy Johnson, University of Illinois at Chicago
Non-Probability Task Force Report
Reg Baker, Market Strategies, Inc.; J. Michael Brick, Westat
Social Attitudes: Race, Gender and Generations
Measuring Anti-Black Racism in the U.S.
Tobias H. Stark, Stanford University; Josh Pasek, University of Michigan; Trevor
Tompson, Associated Press-NORC Center for Public Affairs Research; Jon A. Krosnick,
Stanford University
Especially in light of President Obama’s recent election campaign, interest among
social scientists in racial prejudice remains as high as ever. However, four years after the
election of the first Black president, we have not reached agreement on how survey researchers
should measure racism. Some scientists prefer measuring racial stereotypes, others focus on
affective measures of prejudice, some address the issue with implicit measures, and another
group of scientists focuses on measures of “new racism” such as symbolic racism. In fact, the field
has evolved into camps that seem to doubt the validity of the others’ approaches. We try to build
bridges across these camps by understanding the relations between the different types of
racism measures. Multitrait-Multimethod models are applied to data from three recent
representative U.S. national surveys that included the most commonly used measures of
racism. We assess similarities and differences in how the measures associate with each other
as well as with various predictors and outcomes of racism. We discuss advantages and
limitations of the different racial prejudice measures and propose guidelines for future research
on racism.
Integration and Segregation in 21st Century Schools: Voter Conflicts Over
Equality, Local Control, and Community
Rachel L. Moskowitz, Northwestern University
This paper explores the competing meanings of equality, local control, and community for voters
in the context of a local school referendum in Evanston, IL. In March 2012, residents voted on a
ballot referendum that would levy taxes earmarked for building a new school in the 5th ward of
Evanston. This ward is a historically black neighborhood that has not had a neighborhood
school since racial integration of the school district in the late 1960s. Notions of equality and
community control were at the heart of the Evanston referendum debate on building this new
neighborhood school; providing equal access for all neighborhoods to local community schools
was pitted against maintaining city-wide racial integration of schools. This original survey
experiment of the election explores how important factors, such as race and group identity,
affect individuals’ preferences for equality and community control both in the abstract and in
these specific circumstances. The role information played in this preference formation is also
seriously considered in this paper.
A Failure to Engage? An Examination of the Political Life of Generation X
Jon D. Miller, International Center for the Advancement of Scientific Literacy
In recent years and in recent campaigns, political analysts have asked whether the 80 million
young adults who comprise Generation X have become or will become active participants in the
American political system. Some journalistic characterizations of Generation X have painted
them as “slackers” who are often disengaged from the political system, in contrast to the more
activist young adults who led the civil rights and anti-war movements of the 1960s and 1970s.
The 26-year record of the Longitudinal Study of American Youth (LSAY) provides a strong
empirical base for examining and testing the idea that most Generation X young adults (born
between 1961 and 1981) have failed to engage with the political system. The LSAY is a national
longitudinal study that was initiated in 1987 and continues to collect new information from the
same 5,000 respondents each year. The participants in the LSAY represent the center of the
age range for Generation X. Parallel to Jennings and Niemi’s longitudinal study of high school
seniors in 1965, the LSAY has collected a wide array of political socialization and participation
data over the last 26 years. A comparison of the patterns found in these two studies will provide
empirical evidence about the engagement of the young adults in Generation X and a
comparison with the preceding generation of young Americans. Although data from the 2012
election are still being collected, the data from preceding decades will show that the level of
political engagement by Generation X young adults has been higher than that of preceding
generations. A set of two-group structural equation models will be used to validate this claim,
but the results will be presented in a format that will be accessible to AAPOR attendees with
and without prior training or experience with statistical models.
Framing the “War on Women”: A Survey Experiment on the Effects of Partisan
Framing on Issue Perception and Vote Choice
Ashley A. Koning, Rutgers, The State University of New Jersey; David P. Redlawsk,
Rutgers, The State University of New Jersey
Women voters were at the forefront of the 2012 election, and women’s issues continually made
headlines. These stories became part of an overarching assertion that a “war on women” was
being waged. The Democratic Party originated the “war on women” frame to specifically attack
Republican stances and legislation on reproductive health, contraception, and rape. The
Republicans soon countered, however, by framing the “war on women” as an economic one.
Republicans argued that the war was actually being waged by President Obama’s
administration, which caused women to suffer most in terms of jobs, unemployment, and
poverty rates. The “war on women” thus became an enduring part of the campaign and a
symbol for the battle over women voters. But which party had the more effective “war on
women” frame? We know who won the election and who women voters favored, but how did
these frames—the Democrats’ health-based one and the Republicans’ economic-based one—
affect perceptions of the “war on women” and individuals’ ultimate vote? This paper explores the
“war on women” rhetoric by employing the two differing partisan frames through a survey
experiment design. We test each frame’s influence on whether voters perceived the “war on
women” as real or myth, which party they thought was most responsible for waging it, and if it
had any influence on voting. We argue that while voters and women overall will be more likely to
believe, and be more influenced by, the Democrats’ frame, Republicans (and men) will show greater support for
the “war on women” in their own partisan framing. This research follows the framing literature by
showing how different frames can differently affect subsequent perceptions and opinions, and
adds the assertion that partisans may be more susceptible to issues they would not
traditionally support when framed within their own values and arguments.
Changes in Gender Beliefs in the U.S. from 1977 to 2010: Results from the
General Social Surveys
Duane Alwin, Pennsylvania State University; Paula Tufis, University of Bucharest;
Kristen Lee, University of Buffalo
This research examines secular change in gender beliefs from 1977 to 2010 using state-level
GSS data. Processes of change in gender beliefs are found to vary across three historically
relevant time periods and across segments of the population defined by religion, gender and
region of the country. While there has been considerable growth across time in all groups in
support of egalitarian gender beliefs, men tend to lag behind women in support of women’s work
roles. In a decomposition analysis, we find that the dramatic rate of intra-cohort change in
beliefs reported from 1977 to 1985 declines in later periods for both women and men. Our
findings are consistent with the claim that an anti-feminist backlash emerged in the mid-1980s
and a period of stagnation in the growth of egalitarian beliefs predominated through the 1990s
and the early 21st century. Both religion and region influence the nature of gender beliefs, with
distinct patterns being independently shown by both sets of factors. Regional differences reveal
patterns consistent with state endorsement of the 1972 Equal Rights Amendment. Regional
composition with respect to religious adherents accounts for some, but not all, of the differences
between regions, and generally both religion and region contribute independently to levels of
gender beliefs. There are very few statistical interactions between the components of secular
change and regional and religious variation, suggesting that components of change throughout
the periods studied are relatively immune to the level differences in beliefs due to regional and
religious variation. Change components among women do not depend upon religion or regional
categories. We conclude that analyzing change in different historical periods and geographic
regions and within different segments of the population defined by gender and religion sheds
new light on the processes of gender belief change in the U.S. since the 1970s.
Satisficing and Cognitive Shortcuts
The Relations Among Different Cognitive Shortcuts in Surveys
Roger Tourangeau, Westat; Rebecca Medway, University of Maryland; Stanley Presser,
University of Maryland
This paper examines the issue of whether some respondents are consistently “bad”
respondents, who use a variety of methods to get through a questionnaire quickly and provide
data of dubious value. We examine a wide range of cognitive shortcuts, including choosing the
first and last response options, yea-saying, giving don’t know and no opinion responses, non-
differentiation among answers to similar questions, reporting numerical answers as round
values, and selecting status quo responses. Some of these are forms of survey satisficing but
others are not. The data include responses from national face-to-face, telephone, and Web
surveys. Across all three modes, we find little evidence that respondents who exhibit a high rate
of shortcuts in the first half of a questionnaire also exhibit a high rate in the second half. In
addition, we find weak correlations among the various forms of shortcutting. It could be that
respondents have preferred strategies for coping with the demands of survey questions, with some
preferring DK responses, others non-differentiation, and still others yea-saying. Another
possibility is that item characteristics (which affect how interesting and difficult an item is for
different respondents) play a more important role in determining the level of shortcutting than
respondent characteristics. A final possibility is that these shortcuts do not represent a single
phenomenon, but are at best loosely related strategies for dealing with survey questions. We do
not find consistent relations between any respondent variables (such as educational attainment)
and any of our measures of the use of shortcuts.
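As an illustration of how shortcut indicators of the kind discussed above are often operationalized, the following minimal Python sketch flags two of them: non-differentiation (identical ratings across a grid) and round-value reporting. It is not taken from the paper; the function and variable names are our own.

from statistics import pstdev

def is_non_differentiated(grid_answers):
    # Flag straight-lining: zero variance across the ratings given to a battery of grid items.
    return pstdev(grid_answers) == 0

def is_round_value(numeric_answer, base=10):
    # Flag a numerical report given as a round value (a multiple of `base`).
    return numeric_answer % base == 0

# Example: a respondent gives every grid item the same rating and reports a round "40".
print(is_non_differentiated([3, 3, 3, 3, 3]))  # True
print(is_round_value(40))                      # True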
Mindful Responding to Questions: The Dangers of Survey Satisficing
David L. Vannette, Stanford University; Jon A. Krosnick, Stanford University
Respondent satisficing during surveys is a significant concern for researchers because of the
implications for data quality. As such, considerable efforts have been made to measure,
understand, and reduce survey satisficing since the early 1990s. Recently, promising new areas
of research on survey satisficing have emerged; one of these is the application of the
psychological concept of mindfulness to further understand the cognitive processes implicated
when a respondent is satisficing. To achieve high levels of data quality, researchers strive to
induce respondents to engage in an optimal process for answering survey questions (Krosnick,
1991). This optimizing process refers to a respondent attending to the question at hand and
then proceeding through the process of interpreting the meaning of the question, searching their
memory for all relevant information, integrating that information into summary judgments,
mapping those judgments into the required response format, and then reporting their response
(Tourangeau, Rips, and Rasinski, 2000). Satisficing occurs when respondents deviate from this
cognitively demanding process and provide answers that they deem to be satisfactory.
Mindfulness during the survey response process refers to one possible mechanism through
which respondents may be able to apply the mental control necessary to exert the considerable
cognitive effort required to optimize their responses to the questions being asked. This is
contrasted with mindlessness in survey responding where a respondent does not exert sufficient
mental control to optimize the survey response process; this may be a pathway to satisficing
behaviors and the associated low-quality responses to survey questions. In this paper, we seek
to integrate the existing psychological research on mindfulness and survey satisficing to further
develop our understanding of the implications of mindfulness and mindlessness for the survey
response process. We will also make suggestions for best practices in survey design in order to
elicit mindful responses to survey questions.
Effects of Respondent Reluctance, Mode, and Technical Difficulties on Straight-
Lining and Refusals in a Mixed-Mode Survey
Jennifer L. Gibson, Fors Marsh Group; Jonathan Mendelson, Fors Marsh Group
This study extends past research examining the effect on data quality of experiencing technical
difficulties with a survey. Past research finds that straight-lining, which can indicate satisficing, is
predicted by respondent reluctance and mode. Given the popularity of mixed-mode surveys with
a Web option, it is important to understand whether these method factors affect data quality. We
examined straight-lining in a quality-of-life survey of military recruiters offered in Web and paper
modes. Straight-lining was evaluated as a function of survey reluctance, mode, and whether a
respondent experienced technical difficulties. Common difficulties were trouble logging into the
survey and security restrictions on personal computers. Of the 3,957 participants, most
responded via the Web (77%) and did not experience technical difficulties (95%). Moderated
multiple regressions will be estimated to describe the association of survey reluctance, mode,
and technical difficulties with three measures of satisficing behavior: straight-lining, endorsing
“n/a” or “don’t know,” and refusals. Results will indicate whether respondents taking more or
less time to return a completed survey, using different modes, or encountering technical
difficulties are more likely to engage in different forms of satisficing. Interaction results will indicate
whether certain combinations (e.g., Web respondents who encounter technical difficulties and
take longer to respond) exacerbate indicators of potential satisficing.
Use of Drag-and-Drop Rating Scales in Web Surveys and Its Effect on Survey
Reports and Data Quality
Tanja Kunz, Darmstadt University of Technology
In Web surveys, rating scales measuring respondents’ attitudes and behaviors by means of a
series of related statements are commonly presented in grid formats. Besides benefits from
using grid questions displaying multiple items neatly arranged and easy to complete on a single
screen, grid formats often evoke satisficing behavior as respondents rush through a list of serial
items quickly. This, in turn, might come at the expense of processing each item carefully, resulting,
among other things, in less differentiated answers compared to using grids with fewer items or
single-item per screen formats. The present experiment is designed to gain a better
understanding of how respondents answer rating scale questions and how the quality of
rating scale answers can be influenced by different kinds of grid formats. For that purpose, two
types of drag-and-drop rating scales are developed with the aim of retaining the benefits of a grid
format while preventing satisficing behaviors, by either 1) dragging answer
options horizontally arranged in the top row to the question items in the first column, or 2)
dragging question items stacked in the top row to answer options in the first column. A 3 x 5
factorial design is implemented in a randomized field experimental Web survey conducted
among university applicants (n=6000) with varying numbers of items (6, 10, and 16) presented in
drag-and-drop formats or standard grids. Rating scale formats are examined in terms of
response distribution and indicators of data quality (item nonresponse, nondifferentiation,
acquiescence and extremity bias). Results indicate that while all rating scale formats yield
comparable substantive responses, drag-and-drop rating scales encourage higher item
differentiation. However, results concerning other indicators of data quality are mixed; these are
discussed within the scope of the cognitive question-answer process.
MAPOR Student Paper Award Winner
Speeding and Non-Differentiation in Web Surveys: Evidence of Correlation and
Strategies for Reduction
Chan Zhang, University of Michigan
The interactivity of the Web can be harnessed to improve online response quality. A small body
of research has begun to explore interactive prompts to reduce respondent satisficing, i.e.,
providing adequate but not optimal answers. For example, in our earlier work, speeding
(responding very quickly) is reduced with an interactive, textual prompt when responses are
very fast (< 1/3 second per word). These and other studies have focused on one satisficing
behavior, although it is likely that respondents who engage in one satisficing behavior engage in other
such behaviors while completing the questionnaire. In fact, emerging evidence suggests a
strong correlation between two well-known satisficing behaviors in Web surveys—speeding and
non-differentiation (giving very similar ratings in grid questions). Given that both speeding and
non-differentiation are prominent satisficing behaviors, which one should be addressed through
prompting and does prompting one behavior over the other differently impact data quality? We
tested this in an experiment using a probability-based online panel. We compare two types of
prompts in a series of grid questions, one targeting only speeding and the other only non-
differentiation (we also include a control condition of no prompt). We find that prompting either
speeding or non-differentiation can curtail both behaviors on grid questions. This reflects the
inherent correlation of these two satisficing behaviors, and more importantly, suggests that both
prompts indeed lead to more thoughtful answers (in contrast to the two types of prompts having only
parallel effects, in which speeding prompts would reduce only speeding and vice versa). In addition,
both prompts seem to enhance the quality of answers to questions other than grid questions, suggesting
potentially broad effects on respondent performance. We will also report evidence about the
impact of prompts on respondents’ behaviors in subsequent surveys of this panel, and whether
any carry-over effects differ between the two types of prompts.
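The speeding threshold mentioned above (responses faster than roughly 1/3 second per word) can be expressed as a simple rule. The Python sketch below is illustrative only; it assumes a per-word timing threshold, and the names are hypothetical rather than taken from the study's instrument.

SECONDS_PER_WORD_THRESHOLD = 1 / 3

def is_speeding(response_seconds, question_text):
    # A response counts as "speeding" when it arrives faster than the per-word
    # threshold implied by the length of the question text.
    n_words = len(question_text.split())
    return response_seconds < n_words * SECONDS_PER_WORD_THRESHOLD

# A 15-word item answered in 3 seconds falls below the 5-second threshold and
# could trigger an interactive prompt.
print(is_speeding(3.0, " ".join(["word"] * 15)))  # True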
Mode Choice, Respondent Engagement and Data Quality
Accessibility or Simplicity? How Respondents Engage With a Multiportal (Mobile,
Tablet, Online) Methodology for Data Collection
Michael W. Link, The Nielsen Company; Jennie Lai, The Nielsen Company; Kelly Bristol,
The Nielsen Company
While “choice” may be good for consumers, it is unclear whether mode “choice” helps or
hurts in our efforts to collect data from respondents. Moreover, mobile technologies add a
number of new dimensions to computer-assisted interviewing, including potential changes in
location, the ability to communicate more readily with respondents (via triggered pop-up
messages or IMS), and the potential to move from device to device throughout the day. We
examine the impact of mode choice on respondents’ willingness to keep a two-week activity
diary. Utilizing a “multi-portal” approach (i.e., smartphone, tablet and traditional online), we
selected approximately 400 respondents in two cities utilizing a dual-frame (landline/cellphone)
sample. Respondents could provide their information throughout the day in any location using a
smartphone, tablet, online access or any combination of these. Those without one or more of
these devices in their homes were deemed “out of scope” for the study. The study highlights
several important findings: 1) despite having access to multiple ways of entering information, the
vast majority of respondents utilized only one; 2) traditional online access was the preferred
mode of entry over mobile devices; and 3) there were significant differences in terms of age
(over 50 years versus 50 years and under) in respondents’ willingness (or ability) to use these
electronic modes to keep a multi-week activity diary. The findings highlight many of the
opportunities and challenges with utilizing some of the new technologies—singularly or in
concert—as data collection modes.
Online Survey Participation via Mobile Devices: Findings From Seven Access
Panel Studies
Michael Bosnjak, GESIS-Leibniz Institute for the Social Sciences; Teresio Poggio, Free
University of Bozen-Bolzano; Frederik Funke, LINK Institute
The diffusion of mobile devices such as tablet computers and smartphones enabling
respondents to participate in self-administered online surveys creates new challenges for survey
methodology in terms of measurement (e.g., equivalence of mobile versus traditional online
instruments) and nonresponse issues (e.g., response patterns among mobile participants in
comparison to desktop-based respondents). By merging available data from several online
access panel studies conducted between March and May 2012 in Germany, we have
addressed four nonresponse-related research questions. First, how large is the share of mobile
participants when conducting online panel surveys overall? Second, how can the propensity to
choose mobile modes be explained? Third, do mobile participants differ on participation
parameters, such as the number of completed questions, and the length of entries to open-
ended questions? Fourth, does mobile participation change as more advanced technological
features (such as Flash technology) are being embedded? The results to be presented show
that 1) a considerable share of online panel members did participate using a mobile device, 2)
that the propensity for choosing mobile devices to participate in online surveys is a function of
age and gender (younger subjects and males are more likely to participate in this way than
older subjects and women), 3) mobile respondents did not substantially differ from
traditional online survey respondents on an array of participation rate indicators. However, 4) when
Flash technology was used, mobile participants showed extraordinarily high dropout rates (about
twice the drop-out rate observed with traditional
computers). Implications for survey methodology will be discussed, along with avenues for
future research.
Mode Choice on an iPhone Increases Survey Data Quality
Frederick G. Conrad, University of Michigan; Michael F. Schober, The New School for
Social Research; Chan Zhang, University of Michigan; Huiying G. Yan, University of
Michigan; Lucas Vickers, The New School for Social Research; Michael Johnston, AT&T;
Andrew G. Hupp, University of Michigan; Lloyd Hemingway, University of Michigan;
Stefanie Fail, The New School for Social Research; Patrick Ehlen, AT&T; Christopher
Antoun, University of Michigan
We now commonly choose the mode through which we communicate. For example, if
immediate feedback is needed, a phone call makes sense; otherwise, an email message is fine.
Similarly, if a written record of the exchange is desirable, email or text is appropriate; otherwise,
phone is better. Smartphones and tablets make mode choice particularly easy and routine: the
options can be selected from a single device with one finger movement or voice command. Can
this kind of mode choice add value to the survey enterprise by, for example, increasing
respondents’ commitment to the task when answering in a mode they have chosen? We
conducted an experiment to explore how mode choice affects data quality, completion and
satisfaction. 1268 iPhone users were contacted on their iPhones by either a human or
automated interviewer via voice or SMS text. This created four modes: Human Voice, Human
Text, Automated Voice, and Automated Text. In half of the initial contacts, respondents were
able to choose their interview mode (which could be the contact mode); in the remaining half the
mode was simply assigned. Overall, more than half the mode choices involved a mode switch.
But just being able to choose (whether switching or not) improved data quality: when
respondents chose the interview mode, there was less satisficing (rounded numerical answers
and non-differentiation) than when the mode was assigned. There was a small loss of
participants at the point the choice was made but those who began the interview in a mode they
chose were more likely to complete it than respondents interviewed in an assigned mode.
Finally, those who chose their interview mode were more satisfied with the experience than
those who were interviewed in an assigned mode. The results point to clear benefits from mode
choice and the importance of further exploration.
Comparing Tablet, Computer, and Smartphone Survey Administrations
Tom Wells, The Nielsen Company; Justin Bailey, The NPD Group; Michael W. Link, The
Nielsen Company
“Survey respondents are increasingly attempting to take surveys on their mobile devices,
whether researchers intend for this or not” (Cazes et al., 2011, p. 2). Approximately 50% of U.S.
adults own a smartphone (Nielsen 2012; Smith 2012) and approximately 20% of U.S. adults
own a tablet (Rainie 2012). These trends have serious implications for online surveys,
especially for those not optimized for mobile devices. In this paper, we present results from
tablet, computer, and smartphone administrations of a survey. Our main focus is on surveys
taken with tablets and whether tablet survey administration is comparable to computer survey
administration. There is currently very little research on tablet administration of online surveys;
however, with tablet ownership on the rise, understanding the effects of this survey mode will
become increasingly important. In this study, we fielded a survey to a large, national
sample of online panelists, who are also smartphone users. For the mode effect research being
conducted, panelists were randomly assigned to a mobile app version or an online computer
version of the survey. However, among the 711 respondents completing the online survey, 128
completed the survey using a smartphone mobile Web browser and 33 completed the survey
using a tablet. We analyze three measures of survey taking behavior—breakoff rates, survey
completion times, and item-missing data—among tablet respondents, computer respondents,
and smartphone respondents (both mobile app and mobile Web respondents). Based on our
analysis, tablet survey administration appears to be comparable to computer survey
administration. Across each measure, differences in survey taking behaviors were small and not
statistically significant. At the same time, with two of the measures—breakoff rates and survey
completion time—we consistently uncovered differences between smartphone administration
and computer administration, with differences being more pronounced among smartphone
mobile Web respondents.
Mobile Browser Web Surveys: Testing Response Rates, Data Quality and Best
Practices
Kyley McGeeney, Gallup; Jenny Marlar, Gallup
The rapidly changing technological landscape of the United States has important implications
for survey researchers. The challenges are well known for outbound telephone surveys, but to a
lesser degree for Web-based surveys. According to estimates by the Pew Internet and
American Life project, 55% of cellphone owners access the Internet via their phone, and for
many Americans a cellular device is their only Internet connection. Mobile devices provide
instant connectivity, allowing respondents to take surveys at any time of day, no matter where
they are located, which is an exciting prospect. However, very little research exists to date about
surveys completed via smartphones and other mobile devices. It is unknown if surveys
designed to be compatible with mobile Web browsers increase response rates, or if
respondents who respond via a mobile browser are demographically different than desktop
respondents. Further, it is unknown if best practices for the design of desktop based Web
surveys translate to mobile based surveys. The present study was conducted using the Gallup
Panel, a probability based panel of over 50,000 members who complete studies via the Web,
mail, or telephone. Panel members were randomly assigned to one of 12 treatment groups that
compare three different modes (traditional Web only, traditional Web plus mobile browser
compatible, and outbound), two treatments for length, and two treatments for question layout.
Closed and open-ended questions were tested. Paradata, such as user agent string, time per
survey, breakoffs, and answer changes, were also recorded as part of the study. The results will
be analyzed to better understand how mobile compatible surveys affect response rates, the
representativeness of the sample, and data quality. The authors will draw conclusions about the
costs and benefits of mobile compatible surveys, and make suggestions for best practices.
Research on Behavioral and Time-Use Diaries
Augmenting Paper Diaries With Phone and Web Data Retrieval: Is it Effective?
Laurie Wargelin, Abt SRBI; Jason Minser, Abt SRBI; Zachary Homer, Abt SRBI; Anna
Fleeman, Abt SRBI; Randal ZuWallack, Abt SRBI
From the 1960s to the 1990s, most Household Travel Surveys (HTS) were conducted entirely
by self-administered pen and paper diaries sent via USPS mail. Starting in the 1990s and into
present day, researchers have augmented the paper diaries with phone and Web technologies
for HTS data retrieval. These electronic programs provide the advantages of offering
sophisticated geocoding capabilities, in-program data checking, and monitoring for valid
responses. Some researchers have speculated that the advent of advanced technologies will
make the pen and paper retrieval method obsolete. However, since the introduction of multi-
method retrieval options, only 15-25% of travel diaries have been completed by Web while
recent evidence indicates that less than 25% of diaries are reported by phone. A majority of
travel diaries are still returned by mail, as evidenced in the recently completed Metropolitan
Council HTS (Greater Minneapolis), an interim report for the Southern California Association of
Governments (SCAG) HTS Augment Survey, and the Pretest from the Delaware Valley
Regional Planning Commission (DVRPC) HTS. This phenomenon may be explained by: 1)
limited access to electronic methods; 2) advanced modeling requirements that have greatly increased
respondent burden, making telephone-based reporting cumbersome; and/or 3) thoughtful
development of paper diaries, relying on years of survey research, which may prove more appealing to
respondents. Our research will explore the variations in travel reporting for each retrieval
method in three distinct regions of the United States – Northeast, Midwest, and West – and
analyze any underlying socio-demographics related to retrieval method. In addition to
documenting the socio-demographics by the three methods, this paper will explore the quality of
data collected by each retrieval method. The findings provide great insight as to whether having
options is effective and efficient for surveys.
Comparison of Instantaneous Mobile Time Use Data Collection Methods to
Traditional Time Diary Methods
Pat Graham, GfK Knowledge Networks
Time use studies frequently make use of recall time diaries, which require respondents to recall
all of their activities for a period of time (usually the 24 hours of a single day). While time diaries
are considered a tried and true method for studying time use, there is ample literature
documenting survey error and trade-offs with this approach (National Academy of Sciences
2000; Phipps & Vernon 2009; Robinson 1999). For example, time diaries elicit relatively low
response rates that vary systematically along demographic lines, rely on recall information that
is often incomplete, and are known to under-report secondary activities. One potential solution
to these issues of data quality has been to make use of recent enhancements in the quality,
management and technology of “mobile” surveys to collect several instantaneous
measurements from respondents throughout the day. Respondents can be “pinged” at pre-set
times to record information about what they are doing, where they are, who they are with, and
their thoughts and feelings. If they fail to respond to the first “ping,” then they can be reminded
again with another “ping.” Surveys conducted through “mobile” devices, however, are not
without their limitations, mostly related to screen size and usability. Moreover, nothing is known
empirically about how frequently to “ping” respondents to maximize data quality. This study
compares data collected over three 24-hour periods of time (including Super Bowl Sunday)
using a traditional time diary recorded at the end of each day and instantaneous measurements
made throughout the day using mobile technology. The two modes of data collection will be
evaluated based upon the non-response, number of primary and secondary activities reported,
number of individuals present with the respondent, completeness of responses and the
concurrent validity between measurements. Within the “mobile” collection mode, we will also
examine how the number of “pings” impacts data quality.
Examining the Relationship Between Error and Behavior in the American Time
Use Survey Using Audit Trail Paradata
Nicholas Ruther, University of Nebraska – Lincoln; Tarek Al Baghal, University of
Nebraska – Lincoln; Adam Eck, University of Nebraska – Lincoln; Leonard C. Stuart,
University of Nebraska – Lincoln; A. L. Phillips, University of Nebraska – Lincoln; Robert
Belli, University of Nebraska – Lincoln; Leen-Kiat Soh, University of Nebraska - Lincoln
Audit trails, usage information produced during a computer-assisted survey, are a form of
paradata that allows researchers to examine how an instrument is used by interviewers or
respondents in the course of an interview. This research uses audit trails and survey responses
from the American Time Use Survey (ATUS) to examine the relationship between the audit trail
paradata and potential errors in the ATUS. Previous research has related
a much more limited source of paradata to issues such as data quality and survey breakoff
(Gutierrez et al. 2011, Peytchev 2009). Research has also identified a number of potential
errors in time diaries and specifically in the ATUS, such as missing key daily events (such as
sleeping, eating, and grooming), providing consistently rounded answers to the duration of
activities, and having memory gaps where some part of the recall period cannot be remembered
(Fricker 2007, Phillips et al. 2012). The current set of audit trail paradata provides useful
but infrequently available data such as timing data (e.g., data entry timing, length of interview),
key stroke data, the number of programmed prompts indicating data warnings, and how the
data was reported and entered (such as using a precoded response option versus verbatim
responses). These data are used, in combination with other potentially important variables such
as indicators of cognitive ability and demographics, to predict the likelihood and amount of error
observed in the ATUS using the various indicators. Initial findings show the importance of audit
trail paradata in understanding error. For example, more verbatim entries used by a respondent
are associated with higher rates of missing key daily events compared against pre-coded
responses. Activity entry editing, on the other hand, is associated with less overall presence of
this error, indicating a potential interviewer-respondent interaction in correcting errors.
What Are You Doing Now?: Audit Trails, Activity Level Responses and Error in
the American Time Use Survey
Tarek Al Baghal, University of Nebraska – Lincoln; Lynn Phillips, University of Nebraska
– Lincoln; Nicholas Ruther, University of Nebraska – Lincoln; Robert F. Belli, University
of Nebraska – Lincoln; Leonard Stuart, University of Nebraska – Lincoln; Adam Eck,
University of Nebraska – Lincoln; Leenkiat Soh, University of Nebraska – Lincoln
The American Time Use Survey (ATUS) is a time use diary where respondents report all
activities they performed in a given day. The granular (activity level) data it provides sheds light
not only on time use, but also potentially on memory and survey response processes. For
example, activity-level data identifies when, in remembering the past day, errors occur, which
may assist in the study of memory structure and cues used for recall. Using a unique data set
combining ATUS public use and audit trail data, this research examines activity level data to
answer questions such as how people recall the length of time of different types of activities,
how recall affects errors, and the impact of respondent-level characteristics (e.g., cognitive
ability) on activity-level reports. Initial results show that durations (e.g., doing an activity for 45
minutes) are reported for shorter activities, whereas start and stop times (e.g. completing an
activity at 4 p.m.) are used for longer activities. Interestingly, the majority (76.9%) of reported
gaps in memory were given as start and stop times, but errors of vagueness were more often reported
as durations (69.2%). Further, the majority (61.2%) of memory gaps occurred during “off-peak”
hours, outside of the standard working hours of 9-5, whereas the reverse was true for vague
reports; 77% of these errors occurred during standard work hours. The effect of respondent
characteristics will be examined using hierarchical linear modeling. The results of this study
shed light on memory and survey response processes, with implications for survey design,
particularly for time diaries.
Troubles With Time-Use: Examining Potential Indicators of Error in the American
Time Use Survey
Andrea Lynn Phillips, University of Nebraska – Lincoln; Tarek Al Baghal, University of
Nebraska – Lincoln; Robert Belli, University of Nebraska – Lincoln
This study explores six potential indicators of measurement error in the American Time Use
Survey (ATUS), for the purpose of analyzing satisficing behavior in time-diary research.
Possible reasons for satisficing behavior include respondents’ busyness, their levels of social
capital, their cognitive sophistication, and the difficulty of retrieving the information requested.
This analysis builds on the research of Fricker (2007), who identified three “missing data”
indicators of error in the ATUS: whether the respondent failed to report eating, sleeping, or
“personal grooming” in the day in question. This paper conducts more detailed analysis of these
indicators than has been previously done, and also examines an indicator of rounding of time
spent on activities, an indicator of errors in travel reports, and the presence of “memory gaps”
reported by respondents. Regression and structural equation modeling are used to identify the
impact of demographic and other descriptive variables on error indicators. Direct and indirect
effects of these variables on error indicators are found, but these effects are not consistent
across indicators. For instance, hours worked and race are positively correlated with the
likelihood of missing sleeping and missing eating, but are negatively correlated with missing
grooming. Age, education, race, and sex are also found to have significant indirect impacts on
the likelihood of rounding. In contrast to previous assumptions made in the literature, this study
indicates that oft-used error indicators in the ATUS do not measure a single latent construct of
satisficing behavior. However, cognitive ability and the difficulty of retrieval are identified as
important factors influencing satisficing in the ATUS.
Mixed Topics in Questionnaire Design II
Determining Optimal Recall Period Length for Surveys of Payment Instrument
Use in the Past
Marcin Hitczenko, Federal Reserve Bank of Boston
With the increasing ability to store and manipulate large amounts of information, we are
increasingly learning about the world by gathering and analyzing data. While advances in
technology have made it easier to collect this data accurately and often instantaneously, a great
deal of research, especially in the social sciences, continues to rely on surveys. Much work has
been done documenting that surveys often lead to inconsistent or erroneous responses. For this
reason, it is fundamental to understand how the data collection process interacts with the
cognitive process to affect the responses. In this work, we focus on the effect of the length of
the recall period in surveys that ask individuals to aggregate past behavior for a specific
timeframe. We limit ourselves to data collected by RAND and the Consumer Payment Research
Center at the Boston Federal Reserve regarding reported number of uses of four different
payment instruments within a year, month, week, and day of the survey. This data consistently
shows that the average reported daily usage decreases as the length of the recall period
increases. This well-known phenomenon introduces a tradeoff between the benefit of sampling
more days and the potential bias introduced by memory decay as the recall period increases.
We propose a general form for a stochastic model mapping the actual number of payment
instrument uses to the value reported, as a function of the recall period length. We fit the models
by utilizing data from the Diary of Consumer Payment Choice, also of the CPRC, that tracks
individuals’ payment behavior for three consecutive days. We then use the results to determine
the optimal recall period length for each instrument, defined to be that which minimizes the
mean-square error of estimates. Implications for other types of data are discussed.
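The abstract does not state the functional form of the stochastic model; as a rough illustration (notation ours, not the authors'), the tradeoff it describes can be written as a mean-square-error criterion for an estimator \hat{\mu}_L of average daily use based on a recall period of length L, where a longer period samples more days (lower variance) but suffers more memory decay (higher bias):

\[
\operatorname{MSE}(\hat{\mu}_L) = \operatorname{Bias}(\hat{\mu}_L)^2 + \operatorname{Var}(\hat{\mu}_L),
\qquad
L^{*} = \arg\min_{L} \operatorname{MSE}(\hat{\mu}_L).
\]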
Mechanisms of Reporting to Dependent Questions in Panel Surveys
Stephanie Eckman, Institute for Employment Research; Annette Jaeckle, Institute for
Employment Research
Panel surveys are used to measure change over time, but previous research has shown that
simply asking the same questions of the same respondents in repeated interviews leads to
overreporting of change. With proactive dependent interviewing, responses from the previous
interview are preloaded into the questionnaire, and respondents are reminded of this
information before being asked about their current situation. Existing research has shown that
dependent interviewing techniques can reduce spurious change in wave-to-wave reports and
thus improve the quality of estimates from longitudinal data. However, the literature provides
little guidance on how such questions should be worded. After reminding a respondent of her
report in the last wave (“Last time we interviewed you, you said that you were not employed”),
we might ask: “Is that still the case?”; “Has that changed?”; “Is that still the case or has that
changed?”; or we might ask the original question again: “What is your current labour market
activity?”. In this study we present experimental evidence from a longitudinal telephone survey
in Germany (n=1500) in which we experimentally manipulated the wording of the dependent
questions and contrasted them with independent questions. We report differences in the
responses collected by the different question types. Due to the concern that respondents may
falsely confirm previous information as still applying, leading to underreporting of change in
dependent interviewing, we also test hypotheses about how respondents answer such
questions. In these tests, we focus on the roles played by personality, deliberate misreporting to
shorten the interview, least effort strategies and cognitive ability in the response process to
dependent questions. The paper provides evidence-based guidance on questionnaire design for
panel surveys.
Is Time on Our Side? Decomposing Survey Length on the Health and Retirement
Study
Piotr Dworak, Institute for Social Research, University of Michigan; Heidi Guyer, Institute for Social Research, University of Michigan
The effects of questionnaire length on respondent burden and response rates have been
studied over the years. However, less attention is paid to what factors, other than content, may
explain the variation in survey length and which factors have a positive versus a negative
impact on the interview experience. This analysis explores a rich set of paradata from the
Health and Retirement Study to develop a more holistic view of the survey length and its impact
on respondent cooperation. The Health and Retirement Study (HRS) administers computer-
assisted in-person and phone interviews to over 20,000 participants every two years. The
questionnaire covers a wide range of topics and has grown in size and complexity since 1992.
In 2010, the average interview length was 153 minutes for an in-person interview with physical
measures and biomarkers and 86 minutes for interviews completed by telephone. Currently, the
HRS survey length is analyzed using section-level timings but recent developments allow
controlling for the objective length – the number of fields encountered during the interview.
There is preliminary evidence that after controlling for the objective length, other factors related
to respondent characteristics (age, gender, education, employment), interviewer characteristics
(age, gender, performance, and tenure), and other characteristics related to study design affect
the length of the interview. Based on the preliminary findings this analysis aims not only to
discern the key predictors of the survey length but to estimate their relative contribution, which
in turn may inform length-reduction initiatives and more refined data collection cost-models. In
addition, capitalizing on the HRS longitudinal design, we will investigate the influence of the
interview length on cross-wave participation.
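As a minimal sketch of the kind of decomposition described above (not the HRS analysis itself), interview length can be regressed on the objective length plus respondent and interviewer characteristics. The file and column names below are hypothetical.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical interview-level paradata: one row per completed interview.
df = pd.read_csv("interview_timings.csv")

# Regress total interview minutes on the objective length (number of fields
# encountered) and on respondent and interviewer characteristics.
model = smf.ols(
    "interview_minutes ~ n_fields + r_age + C(r_gender) + r_education"
    " + iwer_tenure + C(mode)",
    data=df,
).fit()
print(model.summary())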
Building a History: Collecting Comprehensive Employment Data in a Web-Based,
Multi-Mode Survey
Melissa Cominole, RTI International; Chris Bennet, RTI International; Lesa Caves, RTI
International
Event history analysis is an increasingly common technique used by social scientists to analyze
change over time. Conducting such analyses often relies on the availability of historical
information provided by survey respondents, for whom it may be challenging to recall events
that occurred months or years in the past. As a result, when developing a survey that collects
information conducive to such an event history format, there are several competing survey
design priorities to consider. One goal may be to collect enough data to meet the analytic needs
of a diverse set of data users. An additional goal may be to provide sufficient response options
for respondents to easily and accurately convey their experiences over broad spans of time. Yet
another goal may be to provide a suite of features (e.g., event history calendar, validations,
cross-checks) that minimize recall error and encourage similar experiences across modes. Still
another goal may be to ensure that the survey is conducted as efficiently as possible in order to
minimize the response burden for respondents. For this large, nationally representative
longitudinal study of recent college graduates, it was necessary to balance these competing
priorities when developing items designed to elicit a history of employment, unemployment, and
job search activities in the four years after college graduation. Here we examine the impact of
this balance using such metrics as survey timing data, item-level nonresponse, comparisons to
responses from an earlier wave of the survey, and comparable estimates from benchmark data
sources. We will offer suggestions for survey designers based on lessons learned during the
design and implementation process.
Using Visual Design Theory to Improve Skip Instructions: An Experimental Test
Nicole Gohring, University of Nebraska – Lincoln; Jolene Smyth, University of Nebraska
– Lincoln
With the emergence of Address Based Sampling (ABS) and the availability of the Computerized
Delivery Sequence File, researchers are increasingly utilizing mail surveys. However, one
drawback of the mail mode is that respondents have to navigate their own way through a mail
survey without interviewer or computer assistance. Thus, a common challenge in questionnaire
design is determining how best to provide skip instructions. Previous research has identified
design strategies that decrease the frequency of skip errors, but even in the most effective
treatments nearly 20 percent of respondents still make navigational errors (Redline et al. 2003).
In this paper we report the results of a skip instruction experiment conducted in the 2012
Nebraska Annual Social Indicators Survey (NASIS; n=954; AAPOR RR1 = 27.2%). Two
versions of the NASIS questionnaire were created drawing heavily on current visual design
theory. The first contained conventionally designed skip instructions in which the response
option that triggered the skip was followed by a right hand arrow and a verbal instruction to “Go
to question #”. The second version also used a right hand arrow and identical verbal instruction
on the response option that triggered the skip, but included up to three design alterations that
we hypothesize will increase the effectiveness of the skip instruction. These included 1) the
addition of a right hand arrow connecting the response options that did not trigger a skip to their
follow-up questions, 2) indentation of the immediate follow-up questions to create hierarchical
subgrouping, and 3) where necessary for 1 and 2, reordering the response options in the
originating question. Preliminary results support our hypotheses in showing that Version 2 led to
significant decreases in skip errors. In addition to reporting results, the paper will discuss best
practices for designing skip instructions based on current evidence and visual design theory.
Panel Recruitment, Attrition and Data Quality II
After Your Interviewer Looks Under the Couch: Strategies for Handling Attrition in
Twin Studies
Christopher Ojeda, The Pennsylvania State University; Veronica Roth, The Pennsylvania
State University; Eric Plutzer, The Pennsylvania State University
Twin studies have proliferated in the social sciences, revealing that behaviors such as voter
turnout (Fowler et al. 2008), general ideology (Alford et al. 2005), and many specific political
attitudes (Hatemi et al. 2011) are heritable. These estimates of heritability are derived from the
analysis of complex and frequently longitudinal surveys, but they almost never account for key
features of survey design. Most important, twin analyses ignore the differential probability of
answering questions, thereby increasing the risk of biased estimates. We examine panel
attrition as one potential source of bias in the estimation of genetic and environmental
influences. Using the National Longitudinal Study of Adolescent Health data (Add Health), we
consider if and how panel attrition affects the estimates of genetic and environmental influence
on voting behavior. To do so, we proceed in two steps. First, we explain how attrition biases
estimates in a twin study and then propose strategies for reducing the bias. Second, we
demonstrate evidence of attrition in Add Health and then compare three methods for mitigating
bias due to attrition: complete case analysis, inverse probability weighting, and multiple
imputation. In our analyses, we use the first wave (N = 1,974 sibling pairs) and third wave (N =
1,456 sibling pairs) of Add Health, conducted in 1996-1997 and 2001-2002, respectively.
Finally, we discuss the strengths and weaknesses of these methods and how each may impact
estimates of voting behavior. We believe this study represents a critical first step in ensuring
that biosocial models produce accurate estimates of political behaviors and attitudes, rather
than estimates that may be artifacts of the data collection process.
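For readers unfamiliar with inverse probability weighting, one of the three adjustments compared
here, a minimal sketch follows; the file and variable names are hypothetical stand-ins rather than
Add Health release names.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical wave-1 file with a flag for retention at wave 3; names are illustrative.
    w1 = pd.read_csv("addhealth_wave1_twins.csv")

    # Model the probability of remaining in the panel at wave 3 from wave-1 covariates.
    retention = smf.logit(
        "retained_w3 ~ age + C(sex) + C(race) + parent_education", data=w1
    ).fit()
    w1["p_retained"] = retention.predict(w1)

    # Weight retained cases by the inverse of their estimated retention probability.
    retained = w1[w1["retained_w3"] == 1].copy()
    retained["ipw"] = 1.0 / retained["p_retained"]

    # Complete-case vs. inverse-probability-weighted estimate of a wave-3 outcome (turnout).
    print("complete case:", retained["voted_w3"].mean())
    print("IPW adjusted: ", np.average(retained["voted_w3"], weights=retained["ipw"]))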
Panel Attrition: Separating Stayers, Sleepers and Other Types of Drop-Out in an
Internet Panel
Peter Lugtig, Department of Methods and Statistics - Utrecht University
Attrition is the process of respondents dropping out in a panel study. Errors resulting from
attrition decrease statistical power and can potentially bias estimates derived from survey data.
As panels are increasingly being used in the social sciences as a source of empirical data, a
good understanding of the determinants and consequences of attrition is important for all social
scientists who make use of panel study data. In many panel surveys, the process of attrition is
more subtle than being either in or out of the study. Respondents often miss one or more
waves but return afterward, or they start off responding infrequently and participate more
often later in the course of the study. Using current models, it is difficult to incorporate such non-
monotone attrition patterns in analyses of attrition. Non-monotone attrition is common in long-
running panels, or panels that collect data frequently. In order to separate different groups of
respondents that each follow a distinct process of attrition, a Latent Class model is used. This
allows the separation of different groups of respondents, that each follow a different and distinct
process of attrition. Using background characteristics for a panel survey of 8000 respondents
who were recruited using a probability-based method into the Web-based LISS panel, I show
that respondents who loyally participate in every wave (stayers) are, for example, older and more
conscientious than attriters, while infrequent respondents (lurkers) are younger and less
educated. We can link these characteristics to attrition theories, and show that our findings can
be related to theories on panel participation and reasons for dropout. I conclude by showing
how each class contributes to attrition bias on voting behavior, and discuss ways to use attrition
models to improve the panel survey process.
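The latent class idea can be sketched with a generic EM routine for wave-participation patterns,
shown below on simulated data rather than the LISS panel itself; the choice of three classes is
purely illustrative.

    import numpy as np

    rng = np.random.default_rng(0)

    # Simulated participation matrix: rows = panel members, columns = waves (1 = responded).
    Y = rng.binomial(1, 0.7, size=(8000, 12))

    K = 3                                           # illustrative: stayers, lurkers, attriters
    n, waves = Y.shape
    pi = np.full(K, 1.0 / K)                        # class shares
    theta = rng.uniform(0.3, 0.9, size=(K, waves))  # P(respond in wave j | class k)

    for _ in range(200):                            # EM iterations
        # E-step: posterior class membership probabilities for every panel member
        loglik = (Y[:, None, :] * np.log(theta)[None] +
                  (1 - Y[:, None, :]) * np.log(1 - theta)[None]).sum(axis=2)
        logpost = np.log(pi)[None] + loglik
        logpost -= logpost.max(axis=1, keepdims=True)
        post = np.exp(logpost)
        post /= post.sum(axis=1, keepdims=True)

        # M-step: update class shares and wave-specific response probabilities
        pi = post.mean(axis=0)
        theta = (post.T @ Y) / post.sum(axis=0)[:, None]
        theta = theta.clip(1e-6, 1 - 1e-6)

    print("estimated class shares:", pi.round(3))
    print("response-probability profiles by class:\n", theta.round(2))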
Panel Attrition and Weighting Adjustments for the ANES Time Series
Matthew DeBell, Stanford University
The American National Election Studies (ANES) Time Series surveys have been conducted
during every presidential election since 1948 and are among the most widely used datasets in
political science. The ANES interviews respondents before each election and interviews the
same respondents post-election, with some losses due to attrition. Attrition in panels typically is
not random and typically contributes to survey error. However, the ANES has never produced a
formal analysis of the effects of attrition on the Time Series sample, nor has ANES developed
weights to adjust explicitly for attrition effects. In this paper we analyze attrition in the ANES
2008 Time Series study, assess the effects of that attrition on the accuracy of the survey's
estimates, and implement weighting adjustments for attrition bias. We then assess post-
adjustment accuracy and examine the effects of these adjustments on voter turnout and
candidate choice models. Implications for adaptive design are considered, in which the quality
of the post-election sample could be improved by targeting reinterview efforts on respondents
whose likely attrition would be most harmful to the quality of the sample. We conclude with
recommendations for procedures aimed at the prevention, measurement, and correction of
ANES panel attrition bias in the future.
Retention and Attrition: A Comparison Across Ethnic Groups
Jennifer Parker, RAND Corporation; Kirsten Becker, RAND Corporation; Benjamin
Karney, UCLA
In a longitudinal study on marital satisfaction, we focused on low income couples of differing
ethnicities. Couples in which both partners identified as Hispanic or Latino were asked to sub-
categorize themselves as Puerto Rican, Cuban/Cuban American, Dominican, Mexican/
Mexican-American, Central American, South American, Other Latin American, Other
Hispanic/Latino or Mixed Hispanic. Information was also gathered on the couples’ ages,
preferred language (English or Spanish), country of origin and parents’ country of origin. We will
explore differences in retention amongst Hispanic participants relating to Hispanic/Latino sub-
categories, age, preferred language and country of origin. We will also compare retained
couples to those not retained using the same demographic points, and discuss the reasons for
attrition amongst couples not retained. Lastly, we will report preliminary findings in differences in
retention and reasons for attrition between Hispanic/ Latino couples, African-American couples
and white couples.
Re-Interview Bias in Panel Surveys: Results from a Seven-Wave Randomized
Experiment
Sebastian Lundmark, Gothenburg University; Mikael Gilljam, Gothenburg University
Comparing panel samples and refreshment samples, previous studies have found significant re-
interview effects on people’s knowledge. Participation in previous panel waves tends to produce
more knowledgeable respondents. However, in studies of people’s beliefs, attitudes and voting
intentions, only minor re-interview effects have been detected (Das, Toepel & Soest 2011;
Lazarsfeld 1944). Most of these studies have used three or fewer panel-waves, and none of
them have used a randomized experiment design. This study rectifies these shortcomings by
using a seven-wave panel together with a randomized experiment design. With this different
and more ambitious approach, we are able to study re-interview effects on beliefs, attitudes and
voting intentions with a relatively large number of waves, and with randomized gaps. More
specifically, the design consists of one group of respondents receiving five waves and two gaps,
one group receiving six waves and one gap, and one group receiving all seven waves. In
addition, we also compare these groups with two refreshment samples of new panelists (one
probability sample, and one non-probability sample, however not randomized). Preliminary
findings show that the number of waves a respondent is subjected to affects their responses to
questions on attitudes, beliefs and voting intentions. The results indicate that professionalized
panelists and overestimated response stability are a non-negligible problem in panel surveys.
Sunday, May 19
8:30 a.m. – 10:00 a.m.
AAPOR Concurrent Session J
Reliability and Validity of Measurement
Parent and Teacher Ratings of Children’s Approaches to Learning and Behavior:
Do They Align and Are They Reliable?
Ashley Kopack Klein, Mathematica Policy Research; Lizabeth Malone, Mathematica
Policy Research
Studies of young children often rely on indirect or proxy reports of children’s behavior given the
lack of direct measures and the costs of administration. To verify the data, multiple reporters are
often asked to report on the same child. However, there are not clear standards for deciding
which reporter to use when ratings vary and using all or randomly picking one reporter may
confound measurement (Kraemer et al. 2003). Our study aims to answer three questions: 1)
What is the reliability of parent, teacher, and assessor ratings of children’s behavior? 2) How
similar are parent, teacher, and assessor ratings? 3) How do reporter ratings compare to a
direct measure of children’s behavior? We use data from the Head Start Family and Child
Experiences Survey (FACES) 2009 to answer these questions. FACES includes a nationally
representative sample of 3,349 children and uses multiple methods to collect data on children
from several sources. We focus on 1,000 children who entered Head Start at age four in fall
2009. Parent, teacher, and assessor reports of children’s behaviors are used to construct two
rating scales–approaches to learning/social skills and problem behaviors–which overlap with
three domains of executive functioning: working memory, inhibitory control, and attention. We
use Cronbach’s alpha to examine the internal consistency reliability of the scales. We examine
the correlations and net difference rate between parent, teacher, and assessor ratings and
children’s performance on the executive functioning Pencil Tapping Task (Smith-Donald et al.
2007). We explore differences by child and family characteristics. This study contributes to
decisions surrounding the use of multiple reporters by comparing indirect ratings across
reporters and linking them to a direct measure of the same construct. As surveys often have
limited resources, we discuss the utility and/or added value of multiple reporters to measure
children’s behaviors.
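Because the reliability analysis rests on Cronbach's alpha, a short reference implementation may
help; the ratings below are simulated, not FACES data.

    import numpy as np

    def cronbach_alpha(items):
        """items: respondents x items matrix of ratings on one scale."""
        items = np.asarray(items, dtype=float)
        k = items.shape[1]
        item_variances = items.var(axis=0, ddof=1).sum()
        total_variance = items.sum(axis=1).var(ddof=1)
        return k / (k - 1) * (1 - item_variances / total_variance)

    # Simulated ratings with a common component so the six items are positively correlated.
    rng = np.random.default_rng(1)
    latent = rng.normal(size=(1000, 1))
    ratings = latent + rng.normal(scale=1.0, size=(1000, 6))
    print(round(cronbach_alpha(ratings), 3))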
Proxy Reports of Children’s General Health Status and the Role of Reporting Bias
in the Association Between Child and Maternal Health
Dana Garbarski, University of Wisconsin-Madison
Child health is an important covariate of a variety of individual and familial health and
socioeconomic outcomes as well as an important outcome on its own. Given that mothers often
report children’s health status in large scale survey collection efforts, it is essential to gain an
understanding of how and the extent to which mothers’ and children’s reports of children’s
general health status differ as well as account for the ways in which the association of child
health with other familial outcomes of interest may be subject to the common method bias of
being reported by the same person. Using data from the first wave of the National Longitudinal
Study of Youth 1997 cohort (ages 12 to 17), the analysis demonstrates moderate concordance
between mothers’ and children’s reports of children’s general health status. The analysis also
demonstrates that additional measures of child health and sociodemographic covariates such
as children’s age, race or ethnicity, and household wealth have stronger relationships with
mothers’ compared to children’s reports of children’s general health status. Finally, it appears
that maternal reporting bias may lead to overestimation of the relationship between child health
and other maternal-reported outcomes. Using maternal health as the criterion of interest, this
analysis incorporates interaction effects to examine whether the statistical effect of child health
on maternal health is greater when child health is reported by the mother compared to when the
child reports it. This method gives researchers some idea about how much their results may be
influenced by the common method bias of being reported by the same person based on a few
assumptions, and is easier to incorporate than some of the more complicated methods for
dealing with common method biases.
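The interaction approach can be sketched as follows, assuming a hypothetical wide file that contains
both the mother's and the child's report of child health; all file and variable names are illustrative
stand-ins for the NLSY97 measures.

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical wide file with both reports of the child's general health.
    nlsy = pd.read_csv("nlsy97_wave1_health.csv")

    # Stack the mother's and the child's report, flagging the reporter, so the
    # child_health x mother_report interaction tests whether the association with
    # maternal health is stronger when the mother supplies both reports.
    long = pd.concat([
        nlsy.assign(child_health=nlsy["child_health_mother"], mother_report=1),
        nlsy.assign(child_health=nlsy["child_health_child"], mother_report=0),
    ])
    model = smf.ols(
        "maternal_health ~ child_health * mother_report + child_age + C(child_race) + wealth",
        data=long,
    ).fit()  # clustered standard errors by family would be natural, since each family appears twice
    print(model.params[["child_health", "child_health:mother_report"]])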
Differences Between Self-Reported and Actual Income: An Analysis of Low-
Income Households Seeking Housing Assistance
Ahuva Jacobowitz, NYC Department of Housing Preservation and Development;
Elyzabeth Gaumer, NYC Department of Housing Preservation and Development
Socioeconomic status is a key predictor in a wide range of disciplines and research questions.
Researchers often rely on self-report income data; however, the validity of self-report answers is
less well understood. In particular, capturing accurate income data using self-report can pose a
difficult challenge as it is often considered a sensitive topic by respondents. To assess the
validity of self-report income, we will conduct a comparison of household and individual income
across different modes of data collection as part of a larger survey effort of applicants to New
York City affordable housing. The population applying to affordable housing in New York City,
and therefore the population for survey participants, is a near-poor, working population. They do
not always have a traditional source of income, but rather work multiple jobs, have seasonal
employment, or are self-employed, leading to both difficulties in calculating an annual income
and potential error in reporting. We collect income data from two sources. The first is self-report
household and individual level income listed on a household’s housing application. The second
is household and individual level verified income using pay stubs, tax returns, and employer
verification among other sources as part of the verification process to determine eligibility. We
will compare these self-report data against the verified income and analyze for reporting bias
(n=1,000). Furthermore, since this is part of a larger data collection effort of a self-administered
questionnaire that asks about other household information, we will do further analysis to look at
trends in reporting across other variables of interest including race, education, neighborhood,
and household composition. Since an error in self-report income could mean the difference
between being determined eligible or ineligible for an affordable housing unit, this analysis has
the potential to impact policies and interventions to help individuals more accurately report their
income.
Measurement Error in Diabetes Patient Profiles: Demographic Differences
Between Diagnosed and Undiagnosed Diabetics in a Large Nationally
Representative Sample of Adults 25-34
Anna Bellatorre, University of Nebraska-Lincoln; Patrick Habecker, University of
Nebraska-Lincoln
A wide body of literature exists documenting the rise in obesity in the United States in the past
two decades. However, relatively little attention has been paid to the rise in co-morbid
conditions such as diabetes, particularly undiagnosed diabetes in young adults. Existing
information from BRFSS records indicates that the number of states with rates of diagnosed
diabetes for all adults exceeding 9% of the population increased from zero in 1990 to fifteen in
2010; however, no information exists for undiagnosed diabetes prevalence over that same time
period. Using a nationally representative sample of young adults aged 25-34 from the National
Longitudinal Study of Adolescent Health (Add Health), we evaluate the measurement error in
demographics related to diabetes among this cohort. Using this data, we find that 59.4% of
diabetes cases are undiagnosed among this cohort. Moreover, we find that significant bias
exists in estimates related to race, gender, and overall health despite equivalent utilization of
healthcare and insurance coverage when diagnosed diabetes is used as a measure for diabetes
as opposed to using glycated hemoglobin (HbA1c) levels exceeding 6.5% to measure
diabetes prevalence. We seek to use this adjusted profile of what diabetes looks like in young
adults to inform the medical community on how best to catch cases of diabetes that would
otherwise go undetected if the current profile were used to diagnose diabetes in young adults.
Further, we seek to use these data to inform large national studies utilizing hemoglobin A1c about
the importance of preventing race- and gender-related nonresponse in biomarker data collection.
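At its core, the misclassification comparison is a cross-tabulation of the biomarker definition
against self-reported diagnosis, as in the toy example below; the records are invented for
illustration and are not Add Health values.

    import pandas as pd

    # Invented example records: HbA1c in percent and a self-reported diagnosis flag.
    df = pd.DataFrame({
        "hba1c":     [5.4, 6.8, 7.2, 5.9, 6.6, 5.2],
        "diagnosed": [0,   0,   1,   0,   0,   0],
    })

    df["diabetic_biomarker"] = df["hba1c"] >= 6.5     # A1c definition of diabetes
    cases = df[df["diabetic_biomarker"]]
    undiagnosed_share = 1 - cases["diagnosed"].mean()
    print(f"biomarker-defined cases that are undiagnosed: {undiagnosed_share:.1%}")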
Who Has What Information About Others: Proxy Reporting, Knowledge and
Willingness
Katherine R. Kenward, Research Support Services, Inc.; Alisu Schoua-Glusberg,
Research Support Services, Inc.; Eleanor R. Gerber, Research Support Services, Inc.;
Patricia L. Goerman, U.S. Census Bureau; Elizabeth M. Nichols, U.S. Census Bureau;
Murrey G. Olmstead, RTI International
The U.S. Census and other surveys typically collect data from households by asking a single
household respondent to provide information about others who live in the dwelling. This method
of enumeration assumes that the household respondent can act as an accurate proxy for all
other household members and that he or she is willing to share information about all household
members. This paper explores the cognitive strategies that people use when they are unaware
or uncertain of the information they are being asked to provide as proxies and the extent to
which it is possible to determine the quality of proxy responses in an actual enumeration. We
also explore the reported willingness and/or barriers that exist when reporting for others in the
household, especially those unrelated to the proxy. To explore these issues, we use data from
cognitive interviews conducted with Census Bureau questions asking respondents about
alternate addresses where household members may live or stay, such as former addresses,
seasonal homes, or relatives’ homes. We report what respondents think about responding for
themselves, their family members, and those living at the same address who are unrelated or
only tenuously attached to the household. We also describe strategies that can be used to
determine the likelihood that the data are accurate and complete; we also identify alternative
data collection strategies that may be warranted for households that include roommates,
boarders, or tenuously attached household occupants. Finally, the implications of the findings
for the U.S. Census and other household surveys will be discussed.
Polling and Political Attitudes
Payoff at the Polls: An Investment Theory of Internal Political Efficacy
Tim Vercellotti, Western New England University
Research has found that voting for a winning candidate increases one’s feelings of external
political efficacy (the sense that government is responsive to one’s needs). But little is known
about the relationship between other forms of political activism on behalf of a winning candidate
and internal efficacy (the sense that one can have an effect on politics). This research seeks to
address that gap in the literature by proposing and testing an investment theory of internal
political efficacy. Political scientists have speculated that internal efficacy is psychologically
grounded in an individual’s self-esteem and ego, and is therefore relatively stable and difficult to
alter. I hypothesize that forms of campaign activity that require a greater investment of oneself,
such as volunteering for a campaign, attending a political event or events, or urging others to
support a candidate, are more likely to achieve the difficult task of increasing one’s sense of
internal efficacy when a voter’s preferred candidate wins. Activities that require less of a
personal investment, such as voting for a winning candidate, are less likely to alter feelings of
internal efficacy. I test these hypotheses using a panel survey of Massachusetts voters
interviewed before and after the November 2010 election for governor, as well as American
National Election Study data for the same period. Controlling for existing levels of internal
efficacy before the election, I find that high-investment activities are associated with increased
levels of internal efficacy after the election, while low-investment activities are not. I also find
that this is true for supporters of winning and losing candidates, suggesting that it is
participation, and not the outcome, that makes the difference. Still, these results suggest that
one’s sense of internal efficacy is less fixed than previously thought, and that internal efficacy
may be subject to change under certain circumstances.
MAPOR Student Paper Award Winner
The Influence of Competing Identity Appeals on Voter Participation
Samara Klar, Northwestern University; Spencer Piston, University of Michigan
Political rhetoric frequently targets specific identity groups in order to garner support from group
members. Each year, pollsters and researchers note important voting blocs that emerge from
such group-based appeals. A particularly effective tactic for increasing a demographic group's
participation is to instill group members with a sense of anger. However, demographics illustrate
that Americans are more likely than ever to identify with more than one identity group at a
time—and, often, these groups may align with competing sides of a policy debate. The effect of
targeting two competing identity groups on an individual's political participation is yet unknown.
We administer a unique survey experiment to illustrate that political rhetoric targeting two
competing identities actually causes group members to decrease their political participation,
particularly with respect to one important activity: donating money. The results have implications
for how political rhetoric may affect participation among highly coveted voters.
The 2012 Election: A Different Kind of Country
Gary Langer, Langer Research Associates; Julie Phelan, Langer Research Associates;
Greg Holyk, Langer Research Associates; Damla Ergun, Langer Research Associates
“Protest or transformation?” was the title of our AAPOR presentation on the 2008 presidential
election. Four years later, pre-election surveys and exit poll results in the 2012 contest point in
the latter direction, underscoring demographic and related attitudinal changes that hold out the
prospect of fundamental and potentially long-term changes in the nation’s political equation.
Using 2012 results and previous decades of ABC News/Washington Post surveys and network
exit polls, we will present a portrait of the forces at play in the latest contest for the White House,
exploring preferences in partisanship, ideology and the role of government; views of the
competing candidates and their policies; and the demographic shifts that informed the vote.
Elements of the race we’ll trace include Mitt Romney’s starting position as the least personally
popular major-party candidate in data at least since 1984, Barack Obama’s largely successful
framing in the summer season, Romney’s transformation after the first debate and his advance
in mid-October assessments, followed by a resurgence for Obama as the race drew to its close.
We’ll present data showing the pre-election contest, by two standards of measure, as the
closest either since 1960 or since the dawn of probability-based pre-election polling in
1936. Substantive topics of discussion will include the role of the economy and of the
candidates’ economic empathy, including regression modeling identifying the strongest
predictors of vote preference. We’ll also discuss record-setting or record-matching levels of
polarization among groups (including men vs. women; young voters vs. seniors; and racial,
partisan and ideological groups); and we’ll compare national exit poll and pre-election poll
results.
The Impact of Political Sponsorship on Response to Political Surveys
Roger Tourangeau, Westat; Hanyu Sun, University of Maryland; Stanley Presser,
University of Maryland
This talk presents the results from three experiments, exploring when and how the organization
identified as sponsoring a survey affects who cooperates with the survey and the answers they
provide. In the first experiment, a sample of people registered to vote in Maryland was randomly
assigned to one of three conditions: a survey about politics was identified as being done by 1)
researchers at the University of Maryland; 2) the Campus Republicans at the University of
Maryland; or 3) the Campus Democrats at the University of Maryland. To our surprise, we
observed neither the nonresponse bias nor measurement bias that we believe most survey
researchers would have predicted. That is, registered Democrats, Republicans, and
Independents responded at essentially the same rate to the three conditions and gave
essentially the same answers across the conditions. (We conducted half the experiment using
mailed questionnaires and half using telephone interviews.) It is possible that the University
connection in all three conditions undercut the partisan cue, but it is also possible the
conventional wisdom about this kind of effect might be in need of revision—a possibility
supported by the fact that, so far as we know, there have been no prior experimental
demonstrations of a political sponsorship effect in the U.S. To fill this gap, we conducted two
more experiments in the context of actual political polls done just prior to the 2012 election. In
two state polls, conducted by telephone, half the cases were told that the poll was being done
“on behalf of Democratic candidates” and the remaining cases were not told this. We should
have the results in the next few weeks.
The Influence of Core Political Values on Attitudes Towards Contentious Science
Patrick Sturgis, University of Southampton; Nick Allum, University of Essex; Ian Brunton-
Smith, University of Surrey
Science and technology (S+T) are increasingly entering the public sphere as politically
contested phenomena. In the USA, partisanship is now an important predictor of attitudes
towards stem cell research, global warming, evolution, and other areas of scientific research. In this
paper we develop this line of research to consider the influence of left/right political orientation
and libertarian/authoritarian values on a particularly contentious area of research: biotechnology
and genomics. Using data from the British Social Attitudes Survey, we test the hypothesis that
conservative economic values are associated with support for genomics research while social
conservatism constrains support and that both aspects of political values condition the way that
citizens select and deploy information that amplifies conflict. We present the results of our
analysis and derive some conclusions about how citizens make judgments about S+T that are
consistent with their existing political predispositions.
Cell Phone Samples: Coverage and Weighting
Finding the Optimal Allocation of Sample Sizes in Dual Frame RDD Telephone
Surveys
Haci Akcin, CDC/OSELS/PHSPO; Denise Bradford, Northrop Grumman
Random-digit dialing (RDD) telephone surveys have long been used to capture data about a
target population. To maintain survey coverage and validity, surveys have had to add cellular
telephone households to their samples. The Behavioral Risk Factor Surveillance System
(BRFSS), for example, one of the largest state-based RDD telephone surveys, began
conducting a large pilot study to collect cell phone data in 2008. In 2011, landline and cell phone
data were combined and released for public use. Optimal allocation of samples in dual-frame
(cell and landline) telephone surveys, however, is still not well defined. In this study, we
examined data from the 2011 BRFSS with different characteristics: landline only, combined data
with current allocation, and combined data with proposed optimal allocation. The study
determines whether there is a cost-effective and optimal sample design feasible for dual-frame
RDD telephone surveys.
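As a rough stand-in for the optimization the authors evaluate, a textbook cost-adjusted Neyman
allocation illustrates the trade-off; the frame shares, standard deviations, unit costs, and budget
below are invented.

    import numpy as np

    # Invented inputs: relative frame sizes, outcome standard deviations, and unit costs.
    N = np.array([0.55, 0.45])        # landline-reachable vs. cell-reachable population shares
    S = np.array([0.48, 0.50])        # standard deviation of the key estimate in each frame
    c = np.array([30.0, 55.0])        # data collection cost per completed interview
    budget = 500_000.0

    # Cost-adjusted Neyman allocation: n_h proportional to N_h * S_h / sqrt(c_h),
    # scaled so the total variable cost equals the budget.
    share = N * S / np.sqrt(c)
    n = budget * share / (N * S * np.sqrt(c)).sum()
    print("completes by frame:", n.round(0), "| total cost:", (c * n).sum())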
Attempting to Boost RDD Cell Sample Productivity by Identifying Non-Working
Numbers Prior to Dialing
Missy Mosher, SSI; Jonathan Best, Princeton Survey Research Associates International
To mitigate rising coverage bias from cell-only households, telephone studies of the general
population are including a significant cell phone component in their design. Federal law prohibits
phone rooms from using predictive dialers to call cell phone sample. Consequently, data
collection costs are high as interviewers spend significant time manually dialing non-working
cellular numbers. These costs create a demand for wireless sample that is screened for non-
working numbers before it reaches the interviewers. SSI, in conjunction with Neustar
Information Services, has developed a method for identifying non-working numbers and
numbers that are likely to be non-working in RDD cell samples. Starting with a randomly
generated EPSEM wireless RDD sample, the numbers are matched against an extensive caller
ID network where telephone activity levels are tracked. Numbers with low activity levels are
identified and can be excluded prior to dialing thus increasing the working phone rate of the
sample. Specifics of this process will be discussed. Additionally, the authors will analyze the
accuracy of the coding and if using it can increase phone room productivity. The extent of
potential non-coverage bias introduced by excluding cell phone numbers with low activity levels
will also be explored. The information provided is essential to researchers making an informed
decision on whether to screen their wireless sample.
Modeling Phone Usage to Weight Dual Frame Samples
Kristie M. Healey, ICF International; William Robb, ICF International; Naomi Freedner-
Maguire, ICF International; Kurt Peters, ICF International
The use of a dual frame design for telephone-based surveys is increasing, and in some ways
has become the new standard, due to the increasing use of cell phones and the corresponding
decrease in landline-only households. With these designs, telephone numbers are sampled
from two frames, one representing landline telephone numbers and one representing mobile
telephone numbers. There is significant overlap of the two frames. That is, respondents who
use both landlines and cell phones could potentially be selected through either frame. Proper
weighting of the data takes this overlap into account. Combining data from the two samples
without adjusting for frame overlap will result in biased estimates. To make such an adjustment,
we need data on telephone usage to identify dual users—those that use both types of telephone
service—in each response group. Ideally, it is best to find out during interviewing whether
respondents are dual users, cell only, or landline only. This paper evaluates an option for
making the weight adjustment for dual frames when self-reported phone usage is not available.
We used demographics from an existing dual frame survey to model the probability of dual
phone usage separately for landline and cell data. This model was then applied to a dual frame
survey where self-reported information about phone usage was not available. We compared
weighted estimates for key survey findings using two sets of weights: those that included no
dual-frame adjustment and those that adjusted for predicted phone usage. Finally, we applied
the same model to a third dual-frame study and compared estimates from three sets of weights:
adjusted based on known phone usage, those adjusted based on modeled phone usage, and
not adjusted at all.
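A minimal sketch of the modeling step might look like the following, assuming hypothetical file and
variable names and an even 0.5 compositing factor for dual users, which is an assumption of the
sketch rather than the authors' rule.

    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    # Survey A reports phone usage; survey B does not. Files and variable names are illustrative.
    a = pd.read_csv("dual_frame_survey_with_usage.csv")    # includes dual_user (0/1)
    b = pd.read_csv("dual_frame_survey_no_usage.csv")      # includes base_weight

    features = ["age", "female", "own_home", "hh_size", "metro"]
    clf = LogisticRegression(max_iter=1000).fit(a[features], a["dual_user"])
    b["p_dual"] = clf.predict_proba(b[features])[:, 1]

    # With an even 0.5 compositing factor for dual users (an assumption of this sketch),
    # the expected overlap adjustment in either frame is 1 - 0.5 * P(dual user).
    b["adjusted_weight"] = b["base_weight"] * (1 - 0.5 * b["p_dual"])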
Estimation and Prediction of the Landline and Cell-Phone Incidence for Local
Areas
Stanislav Kolenikov, Abt SRBI; Randal ZuWallack, Abt SRBI
Researchers designing dual frame samples must determine how to optimally allocate the
sample across frames. This requires accurate cost information and population proportions; the
latter is the subject of this paper. Overestimation of the cell-only population will result in an
allocation that unnecessarily increases the project cost; underestimation will result in an
allocation that produces higher sampling variability. National and regional estimates have been
released biannually since 2004 based on data from the National Health Interview Survey (NHIS)
(Blumberg et al., 2012a). Researchers have developed small area estimation models to
estimate sub-regional estimates (Battaglia et al., 2010; Blumberg et al., 2011 and 2012). A
limitation of these estimates is the lag time of about 10-12 months after data collection,
compounded by additional lead time of several months to several years between the sample
design and the field period (e.g., to accommodate OMB or a long-term contract). We advance
the current research in the area of cell-only prediction utilizing an alternative small area
approach that combines demographic data and telecommunications trends—such as the total
number of landline access points, cell phone subscriptions, and the number of ported numbers.
A multinomial logistic regression is formulated on NHIS data, where the response variable is the
(three-category) phone usage and the explanatory variables are based on the household
demographics. The model coefficients are then applied to ACS data, and state-level predictions are
obtained. Finally, the joint generalized method of moments objective function is formulated as a
quadratic form in the multinomial score equations and the discrepancies of the model prediction
from FCC counts. Thus the model respects both the small area demographic profile and the
administrative records. We demonstrate how the model based on NHIS 2009–2011 performs in
predicting the usage rates in 2012, and provide our predictions for 2013.
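The prediction step (without the generalized-method-of-moments calibration to FCC counts) can be
sketched as below; the extracts and variable names are hypothetical, not NHIS or ACS codebook items.

    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    # Hypothetical person-level extracts with matching demographic variables.
    nhis = pd.read_csv("nhis_phone_usage.csv")   # phone_status: landline_only / dual / cell_only
    acs = pd.read_csv("acs_persons.csv")         # same demographics plus state and person weight

    features = ["age", "female", "renter", "hh_size", "income_decile"]
    model = LogisticRegression(max_iter=1000)    # fits a multinomial logit for the 3 categories
    model.fit(nhis[features], nhis["phone_status"])

    # Predicted class probabilities for ACS records, aggregated to weighted state-level rates.
    probs = pd.DataFrame(model.predict_proba(acs[features]),
                         columns=model.classes_, index=acs.index)
    state_rates = (probs.mul(acs["weight"], axis=0).groupby(acs["state"]).sum()
                   .div(acs.groupby("state")["weight"].sum(), axis=0))
    print(state_rates.round(3).head())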
Impact of Weighting Methods on Tobacco Use Estimates from a Dual-Frame RDD
Survey
S. Sean Hu, Centers for Disease Control and Prevention; Burton Levine, RTI
International; Shanta Dube, Centers for Disease Control and Prevention
Differences in estimates of tobacco use among adults have been observed among the major
surveillance systems including National Health Interview Survey (NHIS), Behavioral Risk Factor
Surveillance System (BRFSS), and National Adult Tobacco Survey (NATS). Sampling variance is
the least likely reason for these observed differences, and therefore the differences in
estimates are likely due to bias. For RDD dual-frame telephone surveys such as the BRFSS
and NATS, low response rates and differential response rates across subgroups may increase
bias in estimates of population parameters. To reduce potential nonresponse bias in RDD dual
frame surveys, poststratification is used, which constrains the sum of the weights to equal
external population totals based on combinations of geography, phone usage, age category,
gender, and race/ethnicity category. However, constraining the weights to this set of population
distributions does not effectively compensate for nonresponse bias. The purpose of the current
study is to explore the combinations of characteristics to constrain population totals in the NATS
weighting procedure for effectively compensating for nonresponse bias. Using data from the
2009-2010 NATS, we identified the variables that are most correlated with current tobacco use
and response propensity. Then, we use raking and model-based poststratification procedures to
constrain the sum of the weights to distributions of these variables attained through external
data sources. Using the NHIS as a benchmark, since it has a relatively high response rate, we
compare the smoking rates nationally and by state to determine which combination of
constraints results in the least bias.
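Raking, or iterative proportional fitting, is central to the comparison above; a generic sketch follows,
and the control totals in the commented example are invented purely for illustration.

    import numpy as np
    import pandas as pd

    def rake(df, weights, margins, max_iter=100, tol=1e-8):
        """Iterative proportional fitting: rescale weights so weighted category totals
        match the supplied control totals, cycling over one raking variable at a time."""
        w = np.asarray(weights, dtype=float).copy()
        for _ in range(max_iter):
            biggest_change = 0.0
            for var, targets in margins.items():
                current = pd.Series(w).groupby(df[var].to_numpy()).sum()
                factors = pd.Series(targets) / current
                new_w = w * df[var].map(factors).to_numpy()
                biggest_change = max(biggest_change, np.abs(new_w - w).max())
                w = new_w
            if biggest_change < tol:
                break
        return w

    # Illustrative use with invented control totals by phone usage and age group:
    # resp = pd.read_csv("nats_respondents.csv")
    # resp["raked_weight"] = rake(
    #     resp, resp["design_weight"],
    #     margins={"phone_usage": {"landline_only": 38e6, "dual": 120e6, "cell_only": 77e6},
    #              "age_group": {"18-34": 72e6, "35-54": 83e6, "55+": 80e6}})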
Sampling, Response Propensity and Weighting
Consumer File Ancillary Data and Nonresponse Adjustment: Assessing the
Consistency of Estimates Across Weighting Strategies
Josh Pasek, University of Michigan; Curtiss Cobb, GfK Knowledge Networks; J. Michael
Dennis, GfK Knowledge Networks
The increasing availability of auxiliary data sources that can be linked to data at the household
level provides survey researchers with new sources of information about sampled units. Data
sources such as consumer file ancillary data, paradata, and even social media data allow
practitioners to assess differential characteristics of respondents and nonrespondents. What is
unclear, however, is how effectively each of these new sources of information can account for
differences between individuals who do and do not respond to our primary data collection
efforts. The current study compares the results of weighting techniques using consumer file
ancillary data with those of more traditional corrections. Using a unique dataset collected by GfK
where consumer file ancillary data was appended to all households in an address-based
sample, we explore point estimates and relations between variables under a variety of weighting
techniques. Specifically, we compare raking to CPS marginals, propensity score weights to the
CPS, propensity score weights to the sample using the ancillary data, and multiple imputation to
the ancillary data as means to derive estimates and relations linking a number of political
variables to one another. We discuss the assumptions behind each of these corrective
techniques as well as the implications of the differences observed.
Improving Data Collection Procedures Using Prediction Methods
Julia Lee, University of Michigan
Data collection procedures that are implemented under a conventional survey design may incur
differential nonresponse among subjects with different characteristics. This differential
nonresponse could lead to biased survey inferences. Responsive design, an alternative design
strategy, monitors and uses process data ('paradata') to alter the design during the course of
data collection. The process data guides data collection decisions and prioritizes subjects
meeting certain criteria to improve both survey cost efficiency and the representativeness of the
respondent pool. Under the responsive design framework, this research describes a model-
based strategy that combines prediction and balancing using benchmark information from a
high quality survey to improve sampling and data collection of a 'current survey' consisting of
multi-phase data collection. Models predicting sample characteristics from frame and contextual
information are fitted to data from the benchmark survey (such as ACS), which shares the same
frame and contextual information as the current survey of interest. The fitted models are used to
predict sample characteristics for the 'current survey' to guide sampling decisions aimed at
obtaining samples that better represent the targeted population. The proposed method is
illustrated using two large government surveys, treating one as the benchmark survey and one
as the 'current survey'. Analysis of the observed data from the benchmark and 'current survey'
suggests that respondent distributions of the current survey are different from those of the
benchmark. The simulated survey using the proposed method obtains
respondents who better represent the target population. In addition, the inferences based on the
observed current survey have larger estimated standard errors than those based on the
proposed strategy. This proposal provides a framework for a stochastic data collection strategy
that aims to simultaneously attenuate nonresponse bias and increase inference precision, while
maintaining the same budget and timeliness of a conventional survey.
Will Snowball Sampling Leave Your Data in the Cold?
Kristin Cavallaro, SSI
As online research becomes more integrated into the everyday methodology of industries
across the board, we find the need to target very specific groups of people. Whether we
need to target people who use a specific brand of antiperspirant or those with a rare form of
cancer, some rare populations can be almost impossible to find on an online access panel.
While the use of additional sample sources increases the feasibility for some of these projects,
there are still valuable untapped resources that could make a world of difference in the success
of a project. The great advantage we have in the struggle to find these rare populations is that
people with similar lifestyles or experiences tend to cluster—often sharing similar beliefs or
banding together based on a commonality such as a disease, an interest in the same model car,
or being alumni of the same college. The practice of “snowball sampling” (identifying one person
who fits the profile and asking that person to “spread the word” within their community) has
been a technique criticized by some, who have feared it will introduce unacceptable biases. But
with average project incidences continuing to fall, it may be time to take another look. SSI will
conduct side-by-side tests to compare data from snowball samples to both online access panels
and intercept samples. SSI will also test to find the optimal combination of sources and sample
types (panel, snowball, river, etc.) yielding the most sound data available from an online sample
frame. Topics for this test will include consumer goods, healthcare, known offline benchmarks
and more. The findings will help researchers in all industries create methodologically sound
sampling plans as they have in the past with the possible introduction of a broader reach made
possible by the use of snowball sampling.
Difficulty in Capturing Minority Populations in RDD Survey Through a Landline
Oversample
Timothy R. Sahr, Ohio Colleges of Medicine Government Resource Center; Bo Lu, The
Ohio State University; Marcus Berzofsky, RTI International; Amy Ferketich, The Ohio
State University; Jamie Ridenhour, RTI International; Thomas Duffy, RTI International
Often surveys are interested in oversampling certain minority populations in order to increase
the precision of the estimates for those sub-populations. In a telephone survey this can be done
with a landline frame by either targeting phone exchanges in certain Census tracts with higher
concentrations of the sub-population of interest or by using listed samples of phone numbers
with surnames in the population of interest (for ethnic targeting). However, as more individuals
in these targeted sub-populations (e.g., young adults, African-Americans, Hispanics) move to
cell phone-only use, landline-based oversample strategies become less effective. The
2012 Ohio Medicaid Assessment Survey (OMAS) oversampled ethnic minorities in Ohio using
both approaches – African-Americans through Census tract targeting and Asians and Hispanics
through list samples of surnames. In this paper, we describe the results of our experiences and
offer possible suggestions on how to improve the efficiency of the oversample, considering the
impact that increased cell phone sampling may have on geographic targeted landline
oversampling (e.g., metropolitan area African-American density sampling).
Methodological Briefs: Questionnaire Design
How Open Are We to the Open-Ended Questions?
Saida Mamedova, American Institutes for Research
The literature on political polling and other opinion-related surveys includes a large body of knowledge on
open- vs. close-ended questions. These surveys are often telephone RDD or in-person
interviews or, more recently, Web-based surveys. High item non-response has been one of the
major reasons for surveys to avoid open-ended questions whenever possible. Even the open-
ended questions that are limited to filling out a text field have been known to have high non-
response. In 2011, the National Household Education Survey (NHES) administered a mail survey
field test with embedded experiments on open- vs. close-ended questions. The respondents to the
survey were first recruited by filling out a screener questionnaire. After an eligible child was
selected from the information in the screener, a more extensive topical questionnaire was sent.
The follow-up survey asked parents about their child’s education and the parental care and
family involvement in the child’s development. Embedded in the design were questions that
were asked in one form as an open-ended question and in another form as a close-ended
question. The two forms were tested experimentally. One such question was on how many
times a child was read to in the past week: one set of parents received an answer option in a
write-in form and another set of parents received an answer option in the form of categories. In
this paper, we will explore the response rates for these open-ended vs. close-ended option
items. Our hypothesis is that the open-ended items are skipped more often than the close-
ended items. We will use logistic regression to estimate the likelihood of response for one type
of question vs. the other, controlling for other factors that may affect the response. This study
will add to the literature on open- vs. close-ended questions as it relates to mail household
surveys.
Navigating Complexity in PAPI: Improving Questionnaire Comprehension on a
Multi-National Media Trend Survey
Darby Steiger, Gallup; Kersten Weisbach, Deutsche Welle; Leah Ermarth, Broadcasting
Board of Governors
Members of the Conference of International Broadcasters’ Audience Research (CIBAR)
developed a core media consumption questionnaire in 2010 to ensure consistent and accurate
measurement of key performance indicators in the context of growing competition in local media
markets and at the same time ever-tighter budgets for public broadcasting. Compared with the
previous International Audience Research Program (IARP) questionnaire, the CIBAR core
questionnaire was designed to be shorter and tighter, and hence, better suited to the changing
research environment of declining response rates, growing interview costs and weary
respondents. In 2012, Gallup conducted a redesign of the instrument to further refine the
usability of the instrument for interviewers and data entry staff in the more than 50 countries
where the survey is administered by paper and pencil face-to-face interviewing. This paper will
present lessons learned from two companion efforts: 1) a qualitative and quantitative study
conducted by Deutsche Welle to test and compare the CIBAR core with the former IARP
questionnaire and 2) a review of navigational improvements made to the instrument in 2012 by
Gallup that have addressed many of the challenges identified in the IARP and original CIBAR
core questionnaires. The results of this study will shed light on key challenges in implementing
face-to-face paper and pencil surveys that involve complex skip patterns, multiple response
items, and recall items.
Measuring Happiness: Evaluating Life Satisfaction Versus the State of the World
Jason Husser, Elon University; Kenneth E. Fernandez, Elon University
The social scientific study of happiness has grown increasingly prominent. For instance, Federal
Reserve Chairman Bernanke recently called on scholars to create better measures of well-
being. We evaluate a common question designed to measure happiness: “Taken all together,
how would you say things are these days--would you say that you are very happy, pretty happy,
or not too happy?” Through two representative survey experiments, we show that the question
is fundamentally flawed. Rather than measuring satisfaction with one’s life, the oft-cited
happiness question actually measures satisfaction with the state of the world, politically and
economically. We suggest a simple correction of the question to better measure personal
happiness.
Investigating the Effects of Questionnaire Design and Question Characteristics
on Respondent Fatigue
Frida Vernersdotter, The SOM Institute, University of Gothenburg; Elias Markstedt, The
SOM Institute, University of Gothenburg; Jonas Hägglund, The SOM Institute, University
of Gothenburg
Overwhelming respondents with attitude questions: looking for the contextual factors in questionnaires
that lead to breakoff. Survey noncompletion, or breakoff, is often overlooked in the discussion on
survey response rates as a proxy for data quality (Peytchev 2009). The contexts in which
breakoffs occur have not been thoroughly investigated. In this study we investigate how the
composition of question types affects breakoff propensity in the case of self-administered mail
surveys. We examine the effects of questionnaire design, in particular frequency and
concentration of attitude questions, on breakoffs. We draw on 26 years of consecutive self-
administered mail surveys in Sweden, conducted by the SOM Institute at the University of
Gothenburg, with a total of 73,000 respondents and 43 different questionnaires. The SOM
surveys cover a wide range of topics in society, media and politics and are used for academic
research on attitudes, values, self-reported behavior, and socio-economic status. The
questionnaires are on average 22 pages long and have mean response rates of 55 percent
(RR1) and 58 percent (RR2). For each of the questionnaires we identify the breakoff patterns in
order to determine what questionnaire design and question features have caused them.
Investigating Signs of Interview Fatigue: Decreased Reporting of Category
Expenditures
Brett E. McBride, U.S. Bureau of Labor Statistics
The survey design involving a screener or filter question followed by a series of more detailed
questions is used in many surveys, including the Behavioral Risk Factor Surveillance System
and the National Crime Victimization Survey. Some research has suggested that respondents
learn that reporting a certain answer to a screener question will extend the interview through a
series of follow-up questions and thus will alter their responses in a way that avoids the follow-
up questions (Kessler et al., 1998). Additionally, a change in screener question response
patterns over the course of an interview may reflect the cumulative cognitive burden that arises
from a long interview. Whether due to respondent learning or fatigue, measurement error may
be introduced into survey estimates. In the Consumer Expenditure Quarterly Interview Survey
(CEQ), screener questions ask whether respondents have expenditures in various item
categories over the course of an interview that lasts on average 56 minutes. Past research has
found evidence of panel conditioning in responses to a screener question in one section of the
CEQ (Shields & To, 2005). This research seeks to address whether there is a shift in responses
to screener questions over the course of the interview and what may account for this pattern.
The data examined comes from the wave one interview of the 2011 CEQ. Patterns of reporting
expenditures are examined in the responses to screener questions asked of all respondents. In
interviews involving a noticeable reduction in expenditure reporting, this research will identify
whether measures indicating respondent reluctance or survey burden appear to be associated
with the reduction. This paper will seek to disentangle the effects of decreased reporting and
survey characteristics. Findings from this research will suggest whether new screener question
formats or a reduction in survey length are warranted to confront decreased reporting of
expenditure categories.
Measuring Issue Attitudes: Open Versus Closed Questions Redux
David RePass, University of Connecticut
For decades, social scientists and polling practitioners have debated the relative advantages
and disadvantages of using open-ended versus fixed-choice questions to measure issue
attitudes. In this paper, an extensive amount of survey data is examined in search of a definitive
answer. First, let us postulate that if a person has an attitude, it will influence behavior. Indeed,
many definitions of attitude include behavior as a component. This study tests the hypothesis
that responses to fixed-choice issue questions are measuring issue attitudes and therefore
should be related to voting behavior. Every one of the 208 issues asked in National Election
Studies since 1960 was correlated with vote (while controlling for attitudes toward the
candidates and party identification). However, in only 17 of the 208 tests did issue position
correlate significantly with vote. Thus, the null hypothesis was confirmed; fixed-choice issue
questions do not measure attitudes. When the open-ended most important problem (MIP)
question was tested in all elections since 1960, the issue attitudes ascertained by this measure
were strongly related to vote, as strongly related to vote as party identification. Next, let us
hypothesize that if a person has no attitude toward an issue, he or she will respond to a fixed-
choice issue question in an inconsistent or random manner. The amount of such 'flip-flopping'
can be observed by using panel studies. The National Election Studies have conducted a
number of panel studies over the past six decades. The author has developed a new measure
that can estimate the amount of turnover in panel data. Using this measure, in 21 out of 25
fixed-choice issue questions asked in these panel studies, 56 to 77 percent of responses were
inconsistent or random. The paper will also critique a number of studies that have examined
fixed-choice versus open-ended methods of measuring issue attitudes.
Using Motivating Prompts to Increase Responses to Open-ended Questions in
Mixed-mode Surveys: Where Should the Prompt Be Placed and to What Effect?
Glenn Israel, University of Florida
Getting respondents to provide high quality information to open-ended questions in self-
administered surveys is a challenge. The evidence shows visual and verbal design elements
play a role in response behavior. Regarding visual design, creating an “optimal” size answer
space contributes to higher item response and longer answers in mail and Web surveys (Israel,
2010; Smyth et al., 2009). Likewise, including motivating information in the question stem was
shown to improve response quality in Web surveys (Smyth et al., 2009). Finally, mode impacts
responses, with Web surveys eliciting longer answers than mail surveys. Given interest in
mixed-mode surveys, I explore the effect of adding a motivating prompt to open-ended
questions to assess impacts on item response rate and response length for mail and Web
modes. Further, I test whether placing the prompt at the beginning or end of the question affects
responses. Data from a survey of Cooperative Extension Service clients are used for the study.
The importance prompt increased the item response rate for the question about improving
Extension’s services, but it had no effect on the description question asking clients how they got
information, how they used it, and the result. In addition, the importance prompt increased the item
response rate for mail surveys but not for Web surveys. I also found that the importance prompt
increased the number of words in answers provided by respondents for the improvement
question over having no prompt. This effect occurred for the prompt placed either at the
beginning of the question or at the end. The importance prompt did not affect response length
for the description question. Web responses were longer than mail, independent of the prompt
for both questions. The findings suggest there is some benefit to using a motivating prompt but
it is unclear when and why it will be helpful.
The Influence of Answer Box Format, Personal Topic Interest, and Respondent
Characteristics on Response Behavior in Open-ended Questions
Florian Keusch, University of Michigan
Previous research showed that the visual design of answer fields for open-ended questions in
self-administered surveys influences response behavior depending on the type of response that
is collected (Couper et al. 2011). For narrative responses, larger answer fields produce longer,
more elaborated responses (Christian & Dillman 2004; Israel 2010; Stern et al., 2007),
especially with less motivated respondents (Smyth et al., 2009). Questions that ask for
frequencies and numeric responses seem to be less influenced by the answer space provided
(Couper et al., 2011; Fuchs, 2009). Until now, no study has looked at the influence of the visual
design of answer boxes in open-ended questions that ask respondents to list all known items of
a specific category. Additionally, there is only limited research looking at the influence of
personal topic interest on response behavior in open-ended questions (Holland & Christian,
2009). This paper looks at differences in response behavior (number of items named, item
omission, response latency, and response order) between formats that provide the respondent
with one large answer box or ten small answer boxes when asked for unaided brand
awareness. In three experiments embedded in Web surveys, respondents from a non-
probability online panel were randomly assigned to one of two question formats asking for
unaided brand awareness of insurance brands (Experiment 1), airlines (Experiment 2), and car tires
(Experiment 3). In two of the three experiments personal interest in the topic of the survey could
be controlled for. The results of this study show that in two of the three studies the number of brands named is significantly higher when ten small answer boxes are presented, indicating that respondents infer from the answer box format what the questionnaire designer expects from
them. Personal topic interest and demographic characteristics of the respondents seem to play
only a minor role.
International Public Opinion
The Americas Barometer: Public Opinion on Democracy and Governance Across
the Western Hemisphere
Keith Neuman, The Environics Institute for Survey Research; Mitchell Seligson,
Vanderbilt University
The Americas Barometer (www.AmericasBarometer.org) is a multi-country public opinion survey
on democracy, governance and political engagement in the Americas, conducted every two
years by a consortium of academic and think tank partners in the hemisphere under the general
coordination of the Latin American Public Opinion Project (LAPOP) at Vanderbilt University. The
Americas Barometer was first conducted in 11 countries in 2004, and most recently in 26
countries in 2012. It is the most expansive international survey project in the Western
Hemisphere. In each country, the survey is conducted with a representative sample of voting-
age adults, in all cases stratified by major regions in the country and in some cases including
oversamples to provide for more in-depth analysis of groups (e.g., Afro-Colombians) or regions
(e.g., internally displaced persons camps in Haiti). Surveys are conducted face-to-face with
respondents in their households, except in the USA and Canada where surveys are conducted
online using established Internet panels. This research represents a unique body of public
opinion data that is used extensively by academic researchers, governments, and organizations
such as USAID, the World Bank, the Organization of American States, the Inter-American
Development Bank and the United Nations Development Programme. The initial impetus for the
Americas Barometer was to chart the evolution of democracy and civil institutions in Latin
America and the Caribbean, but the issues covered are increasingly relevant to all countries
faced with mounting challenges of governance, crime, corruption, political and civic engagement
in the 21st century. This paper will introduce the Americas Barometer to AAPOR. It will provide
a brief overview of this project as a unique case study of an ongoing multi-country collaborative
project, and present selected findings from the 2012 survey with an emphasis on U.S. public
opinion in terms of trends over the decade and comparisons with Canada, Mexico, Latin
America and the Caribbean.
When are Politicians Responsive to Public Opinion? Results from a Scenario-
Based Survey of 3,000 Swedish Politicians
Patrik Öhberg, Université de Montréal
In representative democratic states, responsiveness is a core value. No matter how fine-tuned
formal political rights or political institutions are, representative democracy does not function
well without responsiveness. On a general level, the notion of responsiveness has to do with the
connection between public opinion and public policy. Standpoints, priorities and values among
voters are supposed to leave their mark on outputs from the political system. However,
responsiveness is one of the most blurry notions within representative democratic theory and
we need better tools to understand why politicians are responsive to public opinion in some
situations, but not in others. In this paper, we try to contribute to the literature on responsiveness
by asking politicians themselves under what circumstances policy decisions should be affected
by shifts in public opinion. More specifically, this paper is the first to present the Panel of Politicians, conducted at the University of Gothenburg, Sweden, to an international audience. The panel includes almost 3,000 politicians from the local, regional and national levels. For example, 25 per cent of the country’s MPs participate in the surveys. Given that Sweden has a bit over 30,000 politicians, the number of participants in the panel is noteworthy. By presenting different
scenarios where public opinion differs from the standpoint of the politician, we hope to identify
mechanisms behind responsive behaviour. We vary the following mechanisms that can be
assumed to affect responsiveness to public opinion: a) personal self-interest, b) policy area and
c) different periods of the electoral cycle.
Social Media and Revolutions in Arab Nations: The Impact of Facebook on the
Arab Spring
Muteb S. Alhammash, Kingdom of Saudi Arabia
It has been more than a year since the world watched the revolutions that shook the Middle
East, the revolutions also known as the Arab Spring. There has been extensive material written
about the internal factors (corruption, greed, nepotism, despotism) which led to the revolutions
in Tunisia, Egypt, Yemen, Syria and Libya and there has been some material written about
external factors. This paper explores the connection between the Arab countries that revolted
and the use of social media sites, specifically Facebook, which acted as a “voice” for the people.
It is hypothesized that Facebook had an impact on the revolutions, an impact that continues
today. In addition to data from recent studies, this paper presents a survey that attempts to gather data from a pool of Arab citizens and to understand respondents’ experiences with social media and revolution, and their perceptions of each. Key words: Arab
Spring, Revolution, Social Media, Facebook, Tunisia, Libya, Egypt, Yemen, and Syria.
Interviewer Effects in the Arab Gulf: Lessons from Bahrain and Qatar
Justin Gengler, Social and Economic Survey Research Institute, Qatar University
Although the Arab world is experiencing a critical transition in the availability of systematic and
objective public opinion data, researchers continue to rely on techniques developed in non-Arab
societies to evaluate overall survey quality and estimate the total survey error. Interviewers are
one of the sources of measurement error in surveys, and researchers have invested significant
resources to create methods for detecting and reducing those errors. There are a handful of
studies on interviewer effects in surveys conducted in the Middle East and North Africa, yet
none examines how the ethnicity or nationality of an interviewer influences respondent answers
to sensitive survey questions. Furthermore, no study of interviewer effects of any type has been
conducted in the Gulf region, where the outwardly-observable categories of ethnicity and
nationality retain special social and political salience. This study asks whether and why
interviewer nationality and ethnicity affect responses to questions about political attitudes and
behavior. Using data from the 2009 Arab Barometer survey conducted in Bahrain and two
nationally-representative surveys conducted in Qatar in 2010 and 2013, the study finds strong
evidence that the ethnicity and nationality of interviewers affect responses to a variety of attitudinal questions related to sensitive social and political topics.
Freedom is in the Eye of the Beholder: Examining Perceptions of Media Freedom
in China
Kay Ricci, University of Nebraska – Lincoln; Quan Zhou, University of Nebraska – Lincoln
Characterized by its stringent censorship practices and historical adherence to a “dominance
model of media” (McQuail 2005), the government of China faces new challenges with respect to
the Internet’s growing penetration of its population. According to the China Internet Network
Information Center (CNNIC), the number of Internet users in China has grown dramatically from 58 million in 2002 to 538 million in June 2012. Although the Chinese government has attempted to tighten its
control over the Internet, this network remains relatively unrestricted when compared to other
media. Thus, the Internet has fostered the rise of a public sphere that encourages interactions
among its citizens (Yang 2003). This is in stark contrast to the traditional one-way
communication in which the public only accepts the views disseminated by the government.
Previous research has addressed the media’s effects on people’s confidence and trust of the
political system (Chen and Shi 2001). This paper examines the Chinese public’s attitudes
toward the media itself. Using data from the 2010 Gallup World Poll, a multinational probability-
based survey, this paper examines the impact of critical factors, such as Internet access,
education, confidence in institutions, and sector of employment, on the public’s perceptions of
media freedom in China. Preliminary analyses suggest that a higher proportion of individuals
whose homes lack Internet access believe that the Chinese media has “a lot of freedom”
(75.9%), compared to those who report having Internet access (63.1%). Additionally, individuals
with lower levels of education are more likely than those with higher education to think the
media enjoys “a lot of freedom” (78.5% vs. 47.6%). Given that there is still a great deal of
potential growth in Internet usage and changes in the educational system, our findings shed
light on the development of China’s civil society and the changing attitudes of its people.
Investigating Challenges of Internet Surveys for
Public Health Programs and Policies:
From Neighborhood to Nation
The Triple Constraints of Health and Behavioral Surveys: Cost, Quality, and Time
Carol Crawford, Centers for Disease Control and Prevention
Survey methodologists have always had to balance competing demands of lower costs, higher
quality (coverage and non-response), and more timely data. The need to do so has become
imperative and will continue to become more so in the face of austere budgets. Most door-to-
door face-to-face surveys using multi-stage address-based samples, still considered the gold
standard, gave way to random digit dialed (RDD) phone surveys because of cost and time. Now
RDD phone surveys are facing considerable challenges. The population coverage rates are
being eroded by wireless-only households, portable telephone numbers, telecommunication
technology barriers (e.g., call-forwarding, call-blocking and pager connections), increased
refusal rates and privacy concerns. While substantial research continues to alleviate many of
these problems, the costs associated with RDD surveys remain high and the response rates are
typically low. Moreover, the time from design to data release takes two years for most federal
and state government surveys (e.g. National Health Interview Survey, Behavioral Risk Factor
Surveillance System, and the California Health Interview Survey), making timeliness of the data
less than optimal for efficient and effective public health programs and policy prioritization and
evaluation. Different sampling frames, modes and analytical methods that may overcome these
challenges and assist state public health professionals to continue to collect affordable quality
and timely data that are representative of their respective populations are being evaluated.
Novel approaches to health and behavioral surveillance include single and blended non-
probability opt-in panels, and new statistical estimation methods. This presentation covers some
of the novel approaches and preliminary results from pilot studies being conducted by the
Division of Behavioral Surveillance, Centers for Disease Control and Prevention in wide-ranging
public-private collaborations with states, academic researchers, and private companies.
Statistical Adjustments for Internet Opt-in Panel Surveys
Sunghee Lee, University of Michigan
The data needs for producing population estimates for various subgroups at varying geographic
levels in a timely manner are on the rise. Because it is difficult to satisfy those needs with
traditional probability samples due to their high resource requirements, survey practice has
turned to data collection using Internet opt-in panels. This practice, however, does not provide
data with desirable unbiased properties due to the nonprobabilistic nature of the sample, yet it has
outpaced the effort to understand the errors and to develop statistical methods to correct for
them. In this study, we will use data from a Centers for Disease Control and Prevention (CDC)-funded study of health-related quality of life and well-being measurement that included the ten-
item measure from the Patient-Reported Outcomes Measurement Information System
(PROMIS). These data were collected using an Internet opt-in sample that simulated census
demographic distributions of the U.S. general population. PROMIS items, including the self-
reported health item, have been part of well-established probability-based CDC national
surveys. The analysis will focus on the comparison of these items across data sources. The
comparison includes three types of statistics: 1) point estimates of the common variables for the
general population, 2) point estimates for the population subgroups (e.g., gender, age,
race/ethnicity, education, geography), and 3) relationships across variables through regression
modeling. The data will be used with and without weights in these comparisons. We will also
discuss how such data may be blended with probability sample data. The findings from this
study will strengthen the empirical evidence base for understanding and using Internet opt-in panel data.
Internet Opt-In Panels Assessing Political Effects on Health Care
Stephen Ansolabehere, Harvard University
This paper analyzes standard measures of reported health effects from the 2012 Cooperative
Congressional Election Study (CCES) and the 2012 NORC Election Study and compares them
with survey results from prior national health surveys. The 2012 CCES consists of a 55,000-person sample of on-line respondents and was conducted by YouGov. The 2012 NORC study
was conducted by NORC and consists of 2,000 random digit-dial phone responses from a
mixed land-line and cell phone sample frame. I compare the national results from these two
samples with prior national health surveys to gauge possible mode effects. I further study the
variation on means, standard deviations, and correlations across states in the 2012 CCES. Prior
research comparing on-line, phone, and mail surveys by Ansolabehere and Schaffner (2009)
found no significant mode differences for health and political questions and demographics.
Identifying Sample Source of Sufficient Quantity, Availability, and Consistency to
Meet Local Public Health Needs
Stephen Gittelman, Marketing, Inc.
There is an increasing interest in timely county- and community-level data to track health status,
health behaviors, and health care access. Currently, the best reliable estimates come from
aggregating three or more years of state health surveys using random digit dialed (RDD) dual-
frame telephone surveys. RDD surveys face decreasing response rates, increasing costs, and
infeasibility at the county and community levels. Surveying over 3,000 counties in the United States would require a budget beyond that available in these difficult times. New survey and analytical
methods that provide reliable estimates that meet local public health needs are needed. This
study represents an initial effort at online data collection to address these challenges. The
online double opt-in panels have stood as the stalwart of sourcing for market research, but the demands of the public health community for granularity and feasibility may outstrip the capabilities of current panels, estimated at 8 million members. First, a sample source of sufficient quantity, availability, and consistency has to be identified. Second, a criterion by which the sample frame is to be engaged has to be determined. Third, because health-related information is correlated with demography, the variables that are not so constrained must be identified, and appropriate behavioral controls to balance the sample frame must be considered; no obvious covariance with the test variables needs to be demonstrated. This is an ongoing study in which preliminary data from four states will be available by the end of the first quarter of 2013. This presentation will
present the results of this study to date and provide recommendations to support additional
efforts in this area moving forward.
Cross-section vs. Panel Estimates of Vote Intention During an Election Campaign
Doug Rivers, Stanford University and YouGov USA
Analyzing change in voter preferences using repeated cross-sections depends upon stable
sample composition. Researchers routinely weight samples to control for demographic variation,
but have been reluctant to use attitudinal data for sample balancing, due to the lack of reliable
benchmarks. In a panel design, however, it is feasible to correct for selection bias in multiple
waves using baseline demographic and attitudinal data. Selection and weighting methods are
described and evaluated using data from the 2012 U.S. elections.
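As a rough illustration of the kind of selection correction described above, the sketch below re-weights a later panel wave by inverse retention propensities estimated from baseline data. The data, variable names, and model specification are invented for illustration; this is not the author's actual selection and weighting procedure.

```python
# Minimal sketch: re-weight a later panel wave back to the baseline sample
# using inverse probabilities of retention estimated from baseline data.
# Variable names (party_id, interest, age, educ, responded_w2) are assumed
# for illustration; they are not taken from the paper.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
baseline = pd.DataFrame({
    "age": rng.integers(18, 90, n),
    "educ": rng.integers(1, 6, n),
    "party_id": rng.integers(1, 8, n),   # baseline attitudinal measure
    "interest": rng.integers(1, 5, n),   # baseline political interest
})
# Simulated retention: older, more interested respondents stay in the panel.
p_stay = 1 / (1 + np.exp(-(-1.0 + 0.02 * baseline.age + 0.4 * baseline.interest)))
baseline["responded_w2"] = rng.binomial(1, p_stay)

# Step 1: model retention as a function of baseline demographics and attitudes.
X = baseline[["age", "educ", "party_id", "interest"]]
model = LogisticRegression(max_iter=1000).fit(X, baseline["responded_w2"])

# Step 2: weight wave-2 respondents by the inverse of their retention propensity
# so the retained sample matches the baseline composition.
baseline["p_hat"] = model.predict_proba(X)[:, 1]
wave2 = baseline[baseline.responded_w2 == 1].copy()
wave2["attrition_weight"] = 1.0 / wave2["p_hat"]
wave2["attrition_weight"] *= len(wave2) / wave2["attrition_weight"].sum()  # normalize
print(wave2["attrition_weight"].describe())
```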
Item Nonresponse: Prediction and Compensation
Predicting Item Nonresponse in a Recontact Study of Youth
Jennifer L. Gibson, Fors Marsh Group; Ashley A. Barbee, Fors Marsh Group; Luke Viera,
Fors Marsh Group
Past research indicates that respondents may ‘satisfice,’ or conserve time and energy and yet
produce an answer that seems good enough. This behavior, which is driven by respondent
motivation, task difficulty, and cognitive ability, is likely to affect data completeness and quality
and by extension the validity of study conclusions. Given the potential impact on study data, it is
important to understand the various predictors of such behaviors when developing and
analyzing survey results. The goal of this study is to examine predictors of two measures of data
quality (item nonresponse and underreporting) for respondents of a recontact (advertising
tracking) study of young adults who had completed a previous survey also regarding military
recruiting. The advertising tracking survey follows an interleafed design. Each filter item
indicating whether a respondent recalled seeing the target advertisement is followed by more
detailed items only for respondents who answered a filter item affirmatively. Because
respondents tend to learn that negative responses to filter questions help them complete the
survey more quickly, some will begin to underreport recall as measured by the filter items. We
examine underreporting and item nonresponse as functions of motivation and past behavior.
Indicators of motivation assessed on the seed survey include demographic and attitudinal items
related to interest in military service (i.e., relevance of the survey topic). Underreporting and
item nonresponse on the advertising tracking survey are predicted based on these measures of
motivation and item nonresponse on the seed survey.
Adjust Survey Response Distributions Using Multiple Imputation: A Simulation
with External Validation
Frank C. Liu, Institute of Political Science, National Sun Yat-Sen University; Yu-Sung Su,
Department of Political Science, Tsinghua University
One commonly acknowledged challenge in polls or surveys is item non-response, i.e., a
significant proportion of respondents conceal their preferences about particular questions. This
paper presents how multiple imputation (MI) techniques are applied to the reconstruction of vote
choice distribution in telephone and face-to-face survey samples. Given previous studies about
using this method in adjusting vote share information drawn from pre-election survey/poll data,
this paper gives more attention to external validity of this method. Using survey data sets
collected in Taiwan in early 2013, the authors take two steps to study the utility of this method. First, they randomly remove a proportion (about one-third to one-half) of the values in a variable with few or no missing values. Second, the respondents whose values were removed are recontacted, and their actual responses are compared against the “guesses” generated by MI. The paper reports on and assesses the utility of applying MI to point-estimation adjustment.
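A minimal sketch of the general MI-and-pooling workflow referenced above is given below, assuming a simulated vote-choice variable with artificially deleted values; the imputation engine (scikit-learn's IterativeImputer) and all variable names are illustrative stand-ins, not the authors' procedure.

```python
# Minimal sketch of multiple imputation (MI) with pooling by Rubin's rules.
# The imputation engine and variable names are illustrative stand-ins only.
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({
    "age": rng.integers(20, 80, n).astype(float),
    "educ": rng.integers(1, 6, n).astype(float),
    "vote_candidate_a": rng.binomial(1, 0.5, n).astype(float),
})
# Artificially delete ~40% of the vote-choice values to mimic item nonresponse.
mask = rng.random(n) < 0.4
df.loc[mask, "vote_candidate_a"] = np.nan

m = 10                      # number of imputed datasets
estimates, variances = [], []
for i in range(m):
    imputer = IterativeImputer(sample_posterior=True, random_state=i)
    completed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
    share = completed["vote_candidate_a"].clip(0, 1).round().mean()  # vote share
    estimates.append(share)
    variances.append(share * (1 - share) / n)  # within-imputation variance

# Rubin's rules: combine point estimates and within/between imputation variance.
q_bar = np.mean(estimates)
w_bar = np.mean(variances)
b = np.var(estimates, ddof=1)
total_var = w_bar + (1 + 1 / m) * b
print(f"pooled vote share = {q_bar:.3f}, SE = {np.sqrt(total_var):.3f}")
```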
Reduction of Item Nonresponse Bias by Accommodating Unequal Selection
Probability in Multiple Imputation: Applications on Income Data in BRFSS and
NHIS
Hanzhi Zhou, Institute for Social Research, University of Michigan
Income-related health inequality has been of special interest to researchers and agencies who
conduct health surveys. However, the disproportionately high item nonresponse rates on
income questions relative to other survey questions usually hinder such investigations. Although
multiple imputation (MI) has been adopted by survey researchers to deal with missing data,
there is inconsistency between the MI theory and its applications in practice. On one hand, data
production in the public health and social science research is often based on complex sample
surveys. On the other hand, existing software packages and procedures typically do not
incorporate complex sample design features in the imputation process. Failure to account for
design features can introduce severe bias in final estimates and hence invalid inference. In this
paper, we apply a two-step MI method that we previously developed to two large public health surveys. Under the method, the complex features of the survey design (including
weights, clustering and stratification) is fully accounted for at the first step through a synthetic
data generation procedure; conventional parametric MI for missing data is performed at the
second step using readily available imputation software designed for an SRS sample. Data
users need only to apply simple unweighted estimation methods to the imputed datasets. Using
survey data from the Behavior Risk Factor Surveillance System (BRFSS) and National Health
Interview Survey (NHIS), we evaluated the performance of our method in comparison with
existing MI techniques. Extensive analyses are conducted on the income variable and related
health measures for full-sample as well as domain estimation. The new method results in
significant reduction in the bias, particularly in the presence of model misspecification or
informative sampling.
Using Paradata, Questionnaire Characteristics and Respondent Characteristics to
Examine Item Nonresponse
Ana Lucia Cordova Cazar, Gallup Research Center, University of Nebraska – Lincoln;
Rebecca J. Powell, Gallup Research Center, University of Nebraska – Lincoln
Because inaccurate data have little use, data accuracy is one of the main dimensions of survey
quality (Biemer and Lyberg, 2003). Paradata, data about the data collection process, can shed
light on ways to enhance data accuracy by allowing one to investigate factors that may lead to
difficulties in the response process. When the response process is difficult for respondents, item
nonresponse may occur. Item nonresponse is problematic not only because it has the potential
to affect data accuracy, but also because it may create analytical difficulties as both effective
sample size and statistical power are reduced (Beatty and Hermann, 2002). Cognitive
processes that underlie a respondent’s decision to give an answer have received substantial
attention (De Leeuw et al., 2003). The majority of these studies, however, have not used
paradata to investigate item nonresponse. This study aims to fill that gap. Paradata and survey
responses collected from the Internet component of the Gallup Panel are used to examine the
extent to which characteristics such as the time spent filling out a questionnaire, the
questionnaire’s topic, the respondent’s level of interest in the survey, and the respondent’s
demographic characteristics influence whether the respondent completes the entire
questionnaire. A two-part multivariate model will be used to predict whether a respondent gave
an answer to every question in the survey, and if not, to identify the factors affecting the
proportion of item nonresponse. In a sample of 17,045 respondents who answered a first
questionnaire on media usage and a second questionnaire on world affairs two months later,
preliminary analyses indicate that respondents’ characteristics such as age and education are
significant predictors of item nonresponse, and that these variables interact with survey topic
and time devoted to answering the questionnaire.
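The two-part modeling strategy described above can be sketched roughly as follows: a logistic model for whether any item was skipped, then a second model for the proportion skipped among respondents with at least one skip. The simulated data, predictors, and specification are assumptions for illustration only.

```python
# Minimal sketch of a two-part model for item nonresponse: part 1 models
# whether any item was skipped; part 2 models the proportion skipped among
# respondents with at least one skip. Variable names are illustrative.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 3000
df = pd.DataFrame({
    "age": rng.integers(18, 85, n),
    "educ": rng.integers(1, 6, n),
    "minutes": rng.gamma(4.0, 3.0, n),   # time spent on the questionnaire
    "interest": rng.integers(1, 5, n),
})
lin = -2.0 + 0.02 * df.age - 0.3 * df.interest - 0.02 * df.minutes
p_any_skip = 1 / (1 + np.exp(-lin))
df["any_skip"] = rng.binomial(1, p_any_skip)
df["prop_skipped"] = np.where(df.any_skip == 1, rng.beta(2, 10, n), 0.0)

X = sm.add_constant(df[["age", "educ", "minutes", "interest"]])

# Part 1: logistic regression for any item nonresponse.
part1 = sm.Logit(df["any_skip"], X).fit(disp=0)

# Part 2: model the (logit of the) skipped proportion among those who skipped.
skippers = df[df.any_skip == 1]
Xs = sm.add_constant(skippers[["age", "educ", "minutes", "interest"]])
y = np.log(skippers.prop_skipped / (1 - skippers.prop_skipped))
part2 = sm.OLS(y, Xs).fit()

print(part1.params, part2.params, sep="\n")
```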
Eliminate Item Non-Response: The Effect of Forcing Respondents to Answer in
Web Surveys
Laura Leach, Graduate Management Admission Council
In Web-based surveys, a forced response to questions can be a solution to item non-response.
This method may come with costs, however, that could affect the quality of responses and
respondent drop-off rate. The Graduate Management Admission Council© (GMAC) conducted a
Web-based survey with 4,135 motivated graduate business school alumni. GMAC investigated
the impact that forced-response items had on respondent drop-off and qualitative differences in
item answers. The survey used a random split-sample: Half of the respondents were forced to
answer all survey questions. The other half was allowed to move to the next question without
answering the current question. For the latter group, a request-response prompt notified
respondents of an unanswered question and asked whether they would like to continue with or
without answering the item. The survey comprised 49 questions and had an average
completion time of 20 minutes. No differences were found between the forced-response and the
request-response conditions with respect to respondent drop-off. In addition, no differences
were found in the attitudinal nature of the response items. The forced-response and request-
response designs had no impact on the response to categorical items regardless of placement
in the survey. There was a marginal impact on items of personal sensitivity, such as
compensation; however, this was not true for all finance-related questions. A motivated and
interested population of graduate management alumni completed a lengthy questionnaire
without regard to the treatment in this study. Furthermore, the content of responses was not
impacted by forced or requested item conditions, and the only hesitancy was to reveal sensitive
information, which is a common survey respondent issue.
Sunday, May 19
10:15 a.m. – 11:45 a.m.
AAPOR Concurrent Session K
Toward the Surveys of the Future
Envisioning the “Survey” of the Future: The Role of Smartphones and Tablets in
Face-to-Face Interviewing
Robert Manchin, Gallup Europe; Femke De Keulenaer, Gallup Europe
The focus of this paper is on the role that technology can play in advancing the practice of face-
to-face interviewing. More specifically, we will illustrate new ways of using smartphones and
tablets during all stages of the data collection process, going beyond solely using these devices
as a means to record respondents’ responses. We will start by discussing how an application
designed for use on a smartphone or tablet can help to construct an area sampling frame and
draw a random sample. This application randomly selects one or more square-shaped areas
(PSUs) in a municipality and samples a pre-defined number of points/locations in each PSU;
each point is then “reverse geocoded” into an address and uploaded to the interviewer’s device.
At this point, the device becomes a direct assistant to the interviewer; e.g. interviewers can use
built-in maps to navigate to the exact location/building that they have to locate. The second part
will address issues related to collecting paradata and interviewer quality control. All aspects of
the interviewer’s task – locating addresses, completing contact forms, randomly selecting a
respondent in each household etc. – can run via an application on the interviewer’s smartphone
or tablet. In other words, a large amount of paradata will be (automatically) collected and will be
almost instantaneously accessible to the fieldwork managers, offering new possibilities for
responsive survey design and interviewer quality control (e.g. via a built-in GPS locator and time
stamps automatically attached to each step). In the final part of the paper, we will illustrate new
ways to enrich survey data not only with location-related context data (e.g. using geo-location
technology to link geo-spatial crime data to survey data), but also with “non-survey” data
collected via the interviewer’s smartphone or tablet (e.g. measurements of air quality).
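A minimal sketch of the PSU-and-point sampling step described above appears below. The bounding box, PSU size, and sample counts are invented, and the reverse-geocoding step is left as a placeholder rather than a call to any particular geocoding service.

```python
# Minimal sketch of the area-sampling step: select square PSUs inside a
# municipality's bounding box, then sample a fixed number of points per PSU.
# The bounding box, PSU size, and counts are invented for illustration;
# reverse geocoding of each point into an address is left as a placeholder.
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical municipality bounding box (degrees latitude/longitude).
LAT_MIN, LAT_MAX = 50.80, 50.90
LON_MIN, LON_MAX = 4.30, 4.45
PSU_SIDE_DEG = 0.01      # side length of each square PSU (~1 km in latitude)
N_PSU = 5                # number of PSUs to select
POINTS_PER_PSU = 8       # sample points per PSU

def sample_psus(n_psu):
    """Draw PSU centroids uniformly, keeping the squares inside the box."""
    lats = rng.uniform(LAT_MIN + PSU_SIDE_DEG / 2, LAT_MAX - PSU_SIDE_DEG / 2, n_psu)
    lons = rng.uniform(LON_MIN + PSU_SIDE_DEG / 2, LON_MAX - PSU_SIDE_DEG / 2, n_psu)
    return list(zip(lats, lons))

def sample_points(centroid, k):
    """Sample k points uniformly inside a square PSU centred on `centroid`."""
    lat_c, lon_c = centroid
    lats = rng.uniform(lat_c - PSU_SIDE_DEG / 2, lat_c + PSU_SIDE_DEG / 2, k)
    lons = rng.uniform(lon_c - PSU_SIDE_DEG / 2, lon_c + PSU_SIDE_DEG / 2, k)
    return list(zip(lats, lons))

for psu_id, centroid in enumerate(sample_psus(N_PSU), start=1):
    for lat, lon in sample_points(centroid, POINTS_PER_PSU):
        # In the field system each point would be reverse geocoded into an
        # address and pushed to the interviewer's device; omitted here.
        print(f"PSU {psu_id}: candidate point at ({lat:.5f}, {lon:.5f})")
```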
Conversational Interaction and Survey Data Quality in SMS Text Interviews
Michael F. Schober, The New School for Social Research; Frederick G. Conrad,
University of Michigan
Christopher Antoun, University of Michigan; Alison W. Bowers, University of Michigan;
Andrew L. Hupp, University of Michigan; Huiying Yan, University of Michigan
As people increasingly adopt SMS text messaging for communicating in their daily lives, texting
becomes a potentially important way to interact with survey respondents, who may expect that
they can communicate with survey researchers as they communicate with others. Thus far our
evidence from analyses of 642 iPhone interviews suggests that text interviewing can lead to
higher quality data (less satisficing, more disclosure) than voice interviews on the same device,
whether the questions are asked by an interviewer or an automated system. Respondents also
report high satisfaction with text interviews, with many reporting that text is more convenient
because they can continue with other activities while responding. Here we report analyses of
how text interviews differed from voice interviews in our corpus. Text interviews took more than
twice as long, but the amount of time between turns (text messages) was large, and the total
number of turns was two thirds as many as in voice interviews. As in our voice interviews, text
interviews with human interviewers involved a small but significantly greater number of turns
than text interviews with automated systems, not only because respondents engaged in “small talk” with human interviewers but because they requested clarification and help with the survey
task more often than with the automated text interviewer. Respondents were more likely to type
out full response options (as opposed to equally acceptable single character responses) with a
human text interviewer. Analyses of the content and format of text interchanges compared to
voice interchanges demonstrate both potential improvements in data quality and ease for
respondents, but also pitfalls and challenges that a more asynchronous mode brings. The
“anytime anywhere” qualities of text interviewing may reduce pressure to answer quickly,
allowing respondents to answer more thoughtfully and to consult records even if they are mobile
or multitasking.
Piloting a Mobile Data Collection Application: SurveyPulse™, by RTI International
David J. Roe, RTI International; Michael Keating, RTI International; Yuying Zhang, RTI
International
The landscape of survey research continues to change with the evolution of mobile technologies
and increased accessibility of Smartphones and tablet PCs. Both the adoption and the
computing power of these devices are on the rise, providing users with increased exposure to
information and opportunities to interact on a personal device. As a result, researchers must
adapt to changing communication patterns and habits, and it is becoming more important than
ever to explore the best methods for incorporating mobile data collection into survey research.
While Smartphone survey applications (apps) have the potential to offer researchers a robust set of features (instant data capture, real-time insights, location data, multimedia access including video and cameras, and better respondent communication tools such as push notifications, email, and SMS text), deciding what to implement and how to implement it can be a serious challenge. Many things must be taken into consideration, from
building a custom app to buying into a panel using an already developed app, to data security,
to the provision of user support. Further, applying best research practices for sampling,
recruiting, coverage, and maintaining a panel of users must also be part of the exploration. This
presentation focuses on the development and pilot testing of SurveyPulse™ by RTI International, from the decision to build a custom app to recruiting and maintaining a panel of users. SurveyPulse™ is a mobile application designed to deliver surveys to users across multiple devices, platforms, and operating systems, including tablets, and to collect data in real time.
Included is a discussion of app development and distribution, recruiting, data collection
operations, data quality, user engagement and respondent communication. Also included is a
discussion of plans for next steps, future research and expansion of this data capture method.
The iPad® Computer-Assisted Personal Interview System: A Revolution for In-Person Data Capture?
Heather Driscoll, ICF International; James Dayton, ICF International; Autumn Foushee,
ICF International
In-person interviewing has long utilized paper-and-pencil surveys as the data collection mode
for observational studies. At a time of increased scrutiny from the public and rising costs,
electronic data collection devices are dramatically changing the landscape of these types of
studies. ICF has conducted several pilot studies using our iPad® Computer-Assisted Personal Interview system (iCAPI) since 2010 and found that it allowed for more efficient data collection,
monitoring, cleaning, and analysis. Most recently, ICF conducted a study of the economic
impact of Pennsylvania’s water trails on the state’s economy. This was our first complete
implementation of our newly developed iCAPI. Our interviewers surveyed visitors to water trails
(rivers that have been designated as a recreational water trail because they are important
corridors between specific locations) at hundreds of boat and kayak launch sites during the
summer of 2012. Through record heat waves, intense thunderstorms, and unpredictable site
conditions, our interviewers successfully collected expenditure data from roughly 400 water trail
visitors, using the iCAPI. Our most recent work in Pennsylvania confirmed and expanded on
what we learned in our pilots, addressing questions such as: how easily can interviewers pick up the iCAPI system; how effective are the GPS and map capabilities; how do the iPads perform over weeks of data collection; and is the iCAPI system greener, faster, better and cheaper? We were surprised by some in-field scenarios that were resolved with iCAPI;
however, is the iCAPI system the perfect, sustainable intercept solution? Our paper will explore
the advantages and limitations, as well as our ideas for refining the next iteration of applications
for our iCAPI.
New Approaches to the Study of Attitude Formation and
Political Behavior
A Multi-Survey, Multi-Methodological Assessment of Perception of Need and
Quality of Life: Opinion Polling for the Common Good
Don Levy, Siena Research Institute
While public opinion polling is central to pre-election analysis, the sustainability of our craft may
hinge on the degree to which we contribute to ongoing efforts to promote and enhance the
common good. Locally, it may matter more to citizens to garner a clear understanding of shared
need than what the projected vote totals may be in the next election. This paper discusses a
methodological triangulation study measuring the perception of need in one northeastern county
that includes a major urban center as well as variation in respondent quality of life and a ranking
of governmental services. We conducted three surveys: two RDD surveys (landline and cell) of the general public and one administered via mail, phone and Web among service providers across non-profits, educators,
public officials and clergy. Survey questions included multiple quality of life indicators,
perceptions of need across multiple areas, and opinion questions pertaining to root causes of
enduring societal problems and appropriate collective future directions. Data from the three
surveys – 623 respondents to the Quality of Life survey, 1306 to the Community Needs
Assessment and 391 to the multi-methodological service provider survey – were analytically
merged with available secondary data and presented to the public not only through a report, publication in a local newspaper, and a video on YouTube, but also in three well-
advertised public forums attended by over 200 residents. Using multiple surveys and methods,
the variation in the public’s perception of life in the county, the influence of respondents’ social location on those perceptions, and the perceived quality of local services were measured and reported to officials, service
providers and the public. By making the data available in multiple forms and actively inviting
comment and interactive discussion, the research stimulated collective response including the
formation of an information and capacity sharing cooperative among local non-profits.
The Storm of the Century: Assessing the Effects of a Natural Disaster on
Electoral Behavior and Attitudes
Krista Jenkins, Fairleigh Dickinson University; Dan Cassino, Fairleigh Dickinson
University; Peter Woolley, Fairleigh Dickinson University
In October 2012, Hurricane Sandy hit the eastern seaboard and brought to an abrupt end
attempts to conduct a pre-election survey in the days leading up to the presidential election.
Almost two million New Jerseyans were without power, thousands were displaced, and
telephone service (both cell and landline) was rendered inoperable for a large proportion of
households. Prudence dictated that interviewing be suspended as even those residents who
might have been reachable via phone struggled to recover from the storm. In short, the
widespread nature of non-coverage was an insurmountable challenge to ongoing pre-election
polling. However, rather than abandon the survey, our research design morphed into a panel
study, whereby we recontacted the 400+ registered voters who were interviewed in the days
preceding the hurricane’s arrival. When power and phone services were widely restored we
resumed the study and emerged with a unique data set from which to assess individual level
effects of a natural disaster on electoral behavior and attitudes. Thus we revisited questions
concerning an individual’s voting intention, candidate preferences for both president and U.S.
Senate, public questions that touched on referenda for higher education bonds, judicial fringe
benefits, and favorability of key national and state political actors. The data and paper address
several questions including “Given the opportunity to look presidential in non-partisan settings,
do natural disasters increase the prospects for incumbent presidents?”, “Do natural disasters
heighten, diminish, or have no effect on one’s likelihood of voting?”, and “Are attitudinal and
behavioral changes dependent on the degree of loss one experiences as the result of a natural
disaster?” These questions, although basic, are rarely addressed given the infrequency with
which natural disasters are so closely timed with elections.
Bayesian Estimation and the 2012 Presidential Election Exit Poll
Clint W. Stevenson, Edison Research
Election exit polling provides a unique opportunity to collect vote results as well as other
information on the voting population immediately after a voter casts their vote on Election Day.
Due to the nature of elections there is a significant amount of prior information available for each
state, county, and precinct. This provides an excellent opportunity to apply Bayesian estimation
to the exit poll data. Traditionally, exit polling is analyzed using frequentist approaches (e.g.
hypothesis testing). This paper will discuss Bayesian approaches and how exit poll data can be
analyzed and updated beginning at the start of Election Day until polls close. After polling
locations close they often make the actual vote count available. All of these data (including all
prior knowledge) can be combined to develop a Bayesian model to estimate the Election Night
results quickly and accurately.
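As a rough illustration of the kind of Bayesian updating described above, the sketch below uses a conjugate beta-binomial model for a single precinct's two-candidate share; the prior, the hourly tallies, and the reported counts are invented numbers, and the paper's actual model is not specified here.

```python
# Minimal beta-binomial sketch of a Bayesian update for one precinct's
# two-candidate vote share. The prior (from past election results) and the
# hourly exit poll tallies are invented numbers for illustration only.
import numpy as np
from scipy import stats

# Prior: in past elections this precinct gave candidate A about 55% of the
# vote; encode that as a Beta(55, 45) prior (roughly 100 "prior ballots").
alpha, beta = 55.0, 45.0

# Hourly exit poll tallies: (respondents for A, respondents for B).
hourly_tallies = [(12, 9), (18, 15), (22, 14), (17, 16)]

for hour, (a_votes, b_votes) in enumerate(hourly_tallies, start=1):
    alpha += a_votes          # conjugate update: add A responses to alpha
    beta += b_votes           # and B responses to beta
    post = stats.beta(alpha, beta)
    lo, hi = post.ppf([0.025, 0.975])
    print(f"after hour {hour}: mean share for A = {post.mean():.3f} "
          f"(95% interval {lo:.3f} to {hi:.3f})")

# After polls close, reported precinct counts can be folded in the same way
# (here given large weight because they are near-complete counts).
reported_a, reported_b = 412, 371
alpha += reported_a
beta += reported_b
print(f"final posterior mean share for A = {alpha / (alpha + beta):.3f}")
```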
Preference-Based Measures of Media Exposure
Thomas J. Leeper, Aarhus University
Media exposure is among the most important constructs in political behavioral research, yet
agreement on operationalization is lacking. Beyond susceptibilities to various biases, standard
measures of exposure gloss over differences in informational content received by individuals
who report similar levels of exposure. Prominent alternative approaches propose to measure
exposure through specific news stories or particular news programs. For use in most research,
however, both approaches are burdensome on researchers and respondents and lack
robustness across temporal, political, and media environments. Analyses of nationally
representative Pew Research Center surveys from 1996 to 2008 and a large, online panel
survey indicate that a preference-based measure of news-following offers a viable alternative
that is more robust, strongly predicts variations in all extant exposure metrics, and is reliable at
the individual and aggregate levels. These findings have implications for the conceptualization
and measurement of media exposure and normative implications related to citizen awareness.
Separating Political Attitude Change from Attitude Uncertainty: (In)Consistency
Experiments of the ESS Panel Component
Sedef Turper, University of Twente; Kees Aarts, University of Twente; Minna van Gerven,
University of Twente
Given its vital role in explaining causal mechanisms, change has always been of great interest to scholars. Scholarly attention to tracing and explaining changes in the attitudes and behavioral patterns of diverse populations has paved the way for many large-scale cross-sectional time-series data collection projects in the social sciences. However, while
repeated cross-sectional surveys provide data about aggregate level trends, the evidence they
provide about micro-level processes underlying these macro changes is indirect. Thus, the
knowledge that standard cross-sectional studies can provide is destined to be incomplete in the
absence of more direct evidence about micro-processes. This paper attempts to shed light on micro-level political attitude change processes through (in)consistency confrontation experiments conducted as part of the Panel Component of the European Social Survey. In these experiments, a subset of the panel respondents are confronted with their responses from
the previous wave, irrespective of whether they offered a consistent or an inconsistent answer.
The design of the experiments allows us not only to systematically analyze the micro-level
processes underlying political attitude change, but also to differentiate between genuine attitude
change and attitude uncertainty. We first present the extent to which attitude uncertainty and susceptibility to attitude change differ by level of education, political interest, and attitude strength, using four-wave panel data representative of the Dutch population over age 16. Second, we further investigate the nature of the observed political attitude change among groups differing in education level, political interest and attitude strength through the examination of
(in)consistency experiments. Analysis of the experimental data provides us with better
understanding of attitude change at the micro-level and also with direct evidence needed to
complement the statistical inferences on separation of attitude change from measurement error.
Investigating the Effectiveness of Incentives
Interviewer Attitudes and the Effectiveness of Monetary Incentives
Ulrich Krieger, German Internet Panel
Studies have shown that interviewer characteristics, such as race, ethnicity, and gender, can have a
negative effect on data quality (Singer, Frankel, & Glassman, 1983; Catania, et al., 1996; Davis
et al., 2010; O’Muircheartaigh & Campanelli, 1998). While much is known about how interviewer
characteristics affect a survey respondent’s answers, little is known about the measurement
error effect due to interviewer attitudes on the survey topic. The known studies that investigate interviewer attitudes focus on attitudes toward their job, such as satisfaction and performance, and the effect these have on production rates or data quality (Singer, Frankel, & Glassman, 1983; Hox, de
Leeuw, & Kreft, 1991). There are no known studies that examine if interviewer attitudes on the
survey topic have an impact on survey respondents’ answers (i.e., measurement error). Using
data from the National Survey of Family Growth, a national face-to-face survey that has both
interviewer-administered (CAPI) and self-administered (ACASI) components, and interviewer
characteristic data (i.e. demographics and attitudes), this study examines discrepancies
between respondent answers from CAPI to ACASI on sensitive items (e.g. on number of sexual
partners and abortions) and how interviewer attitudes on sexual behaviors and other
demographics (e.g. age, religion) may relate to those discrepancies. The main hypothesis
guiding this investigation is that interviewer attitudes about the survey topic, particularly
sensitive topics, might unwittingly be transmitted to respondents and influence respondent
answers for sensitive questions. Preliminary analysis shows that interviewer attitudes on sexual
behaviors are correlated with respondent answer discrepancies from CAPI to ACASI, for both
number of lifetime sexual partners and number of abortions. Further investigation is warranted
and those results will also be reported.
The Influence of Respondent Incentives on Item Nonresponse and Measurement
Error in a Web Survey
Barbara Felderer, Institute for Employment Research; Frauke Kreuter, University of
Maryland JPSM & IAB; Joachim Winter, University of Munich
Even though a sampled person may agree to participate in a survey, she may not provide
answers to all of the questions asked or might not answer questions correctly. This may lead to
seriously biased estimates. It is well known that incentives can effectively be used to decrease
unit nonresponse. The question we are analyzing here is whether incentives are able to
decrease item nonresponse and measurement error as well. To study the effect of incentives on
item nonresponse and measurement error, an experiment was conducted with participants of a
Web survey. In addition to an incentive for participation, an extra prepaid incentive ranging from
0.50 Euro to 4.50 Euro was given to some respondents towards the end of the questionnaire in
the form of an Amazon-voucher. At the same time, respondents were requested to think hard
about the answers to the next questions and be as precise as possible. In this experiment there
are two reference groups: one group received the request but no incentive and the other did not
receive any request or incentive. The questions within the incentive experiment contain
knowledge questions, recall questions referring to different time periods, and questions about
subjective expectations. We approach our research questions in three steps: Our first analysis
focuses on the effect of incentives on the proportion of “don’t know” and “no answer” responses. In a
second step, we look at the amount of rounding and heaping as an indicator for measurement
error. In the third step, we examine measurement error directly for two variables (income,
unemployment benefit recipiency) by linking the survey data to German administrative records
and computing the difference between survey response and administrative records.
Comparisons across the different incentive groups will allow for an assessment of the
effectiveness of incentives on item nonresponse and measurement error.
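Two of the indicators described above, the share of heaped (rounded) reports and the survey-minus-record difference after linkage, can be computed along the following lines; all values in the sketch are simulated and the rounding threshold is an illustrative assumption.

```python
# Minimal sketch of two of the indicators described above: the share of
# "heaped" (rounded) income reports, and survey-minus-record differences
# after linkage to administrative data. All values here are simulated.
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
n = 1000
admin_income = rng.gamma(9.0, 300.0, n)                 # "true" monthly income
survey_income = admin_income + rng.normal(0, 150, n)    # noisy survey report
rounders = rng.random(n) < 0.35                          # some respondents round
survey_income[rounders] = np.round(survey_income[rounders], -2)  # nearest 100

df = pd.DataFrame({"survey": survey_income, "admin": admin_income})

# Heaping indicator: share of reports that are exact multiples of 100.
heaped = (np.mod(df["survey"], 100) == 0).mean()

# Direct measurement error after record linkage.
df["error"] = df["survey"] - df["admin"]
print(f"share of heaped reports: {heaped:.2%}")
print(f"mean signed error: {df['error'].mean():.1f}, "
      f"mean absolute error: {df['error'].abs().mean():.1f}")
```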
Improving Panel Maintenance Success on a Longitudinal Study
Tiffany L. Mattox, RTI International; Jennifer L. Domico, RTI International; Daniel J. Pratt,
RTI International
Minimizing sample member attrition is vital to the success of longitudinal research. Key steps in
this effort include periodically locating the sample members and confirming or updating their
contact information. RTI International is conducting the third follow-up survey for the Education
Longitudinal Study of 2002 (ELS:2002), conducted for the National Center for Education
Statistics, U.S. Department of Education. The study follows high school students over time to
determine how their high school experiences influence their lives as they continue on to
postsecondary education, the workforce, and family formation. Sample members were originally
surveyed as 10th grade students in 2002 (base year) and/or as 12th grade students in 2004
(first follow-up). The second follow-up was conducted in 2006. Panel maintenance activities
were then performed at multiple points prior to the third follow-up. Given that the previous
follow-up interview was conducted 6 years prior, we anticipated challenges in locating sample
members for the third follow-up full scale data collection in 2012. Thus, we conducted an
experiment with the third follow-up field test sample to determine whether offering $10 to sample
members – if the sample member or a parent updated or confirmed contact information on file
for the ELS:2002 sample member – would increase panel-maintenance participation. The
significant positive outcome of the experiment led us to extend this $10 panel-maintenance-
participation offer to the entire full-scale sample during the panel maintenance effort prior to the
start of third follow-up full-scale data collection. In this paper we provide results from the field
test panel-maintenance experiment and examine the panel-maintenance response from the full-
scale sample prior to the third follow-up full-scale data collection. In addition, we examine the
third follow-up full-scale survey response status of the panel maintenance respondents to gauge
the ultimate success of these efforts.
50 Years Later: Do Respondents Who Remember the Initial Survey Provide Higher
Quality Responses to a Follow-Up Survey?
Danielle K. Battle, American Institutes for Research; Rebecca Medway, American
Institutes for Research
Groves, Presser, and Dipko (2004) found that people predisposed to be interested in a
particular survey topic were more likely to participate in a survey on that topic. Studies focusing
on topic interest have looked at its effect on cooperation with a survey request, but little
research has evaluated the effect of topic interest on response quality. We hypothesize that
those for whom the topic is highly salient are more highly engaged and thus put more effort into
responding to survey questions. This paper presents results from the 2011-12 Project Talent
Follow-up Pilot Study, which assesses the feasibility of reengaging a representative random
subsample of the initial 1960 Project Talent participants. The initial Project Talent survey was a
large-scale longitudinal study that collected extensive cognitive, personality and background
information from 440,000 9th-12th graders in 1960. In the 2011-12 follow-up, participants were
asked whether they remembered participating in the 1960 Project Talent study (about 60% did); we use
the response to this item as a measure of topic interest. The follow-up also included a prepaid
incentive experiment where participants were randomly assigned to receive no incentive, $2, or
$20. This paper examines recall of the initial 1960 Project Talent study among the 2011-12
Follow-Up Pilot Study respondents and determines whether recall is predictive of response
quality. It also looks at whether offering an incentive reduces any differences in response quality
between those who do and do not recall the initial study. Response quality outcomes include
item nonresponse, amount of time spent completing the questionnaire, straight-lining/non-differentiation of responses, round values, and consistency of responses to personality measures
across the 1960 and 2011 collections.
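Two of the listed response quality outcomes, non-differentiation (straight-lining) across a grid of scale items and the share of round numeric answers, can be computed roughly as sketched below; the data and cutoffs are simulated placeholders, not the study's measures.

```python
# Minimal sketch of two response quality indicators mentioned above:
# non-differentiation (straight-lining) across a grid of scale items, and the
# share of round numeric answers. The data here are simulated placeholders.
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
n = 500
grid = pd.DataFrame(
    rng.integers(1, 6, size=(n, 8)),                 # eight 1-5 personality items
    columns=[f"item_{i}" for i in range(1, 9)],
)
# Make some respondents straight-line by copying one value across the grid.
liners = rng.random(n) < 0.15
grid.loc[liners, :] = np.tile(
    grid.loc[liners, "item_1"].to_numpy().reshape(-1, 1), (1, 8)
)

# Non-differentiation: standard deviation of each respondent's grid answers
# (zero means pure straight-lining).
nondiff = grid.std(axis=1)
straightliners = (nondiff == 0).mean()

# Round-value indicator for an open numeric item (e.g., hours per week).
hours = rng.integers(0, 60, n)
round_share = (hours % 5 == 0).mean()

print(f"share of straight-liners: {straightliners:.2%}")
print(f"share of round (multiple-of-5) hour reports: {round_share:.2%}")
```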
Aspiring for More than Crumbs: The Impact of Incentives on Girl Scout Response
Rates
Debra Dodson, Girl Scout Research Institute, Girl Scouts of the USA; Meredith Reid
Sarkees, Girl Scout Research Institute, Girl Scouts of the USA; Cathy VonFange,
Abt/SRBI
Youth development organizations are increasingly wedged between funders, who want empirical evidence of program effectiveness, on the one hand, and, on the other, a society increasingly unwilling to provide that evidence, particularly when the data sought are from minors. This challenge requires navigating not only the typical response rate
challenges faced in survey research but also the additional complication of gaining parental
consent in order to even contact those members under the age of 13. This paper draws on a
summer 2012 study of Girl Scout members in 10 local councils to assess the relative
effectiveness of a variety of strategies (virtual vs. traditional incentives; membership oriented vs.
non-membership oriented incentives; and small rewards vs. chances to win larger prizes). The
analysis explores the effectiveness of those incentives on willingness of parents to register girls,
willingness of girls to respond, and the impact of incentives on representativeness of the
respondents. The results can help us better understand the strategies for increasing accuracy of
the data used to drive data-driven philanthropy.
Assessing Data Quality
Assessing the Quality of Survey Data Through Streamlined Data Processing
Donsig Jang, Mathematica Policy Research; Amy Beyler, Mathematica Policy Research;
Alicia Haelen, Mathematica Policy Research; Flora F. Lan, National Center for Science
and Engineering Statistics (NCSES)
Federal statistical agencies are continuously striving to provide high-quality survey data in a
timely manner. Adaptive survey design (Groves and Heeringa 2006) is one method they are
using to help achieve this goal. This type of design draws on several data sources, such as
paradata, frame data, and processing data, in real time to help staff allocate resources
effectively during data collection and make informed decisions about the closeout. The
technological advancements that make adaptive survey design possible also make it possible to
streamline data processing. Survey-management systems can now link data sources in real
time, allowing statisticians to conduct editing, imputation, and weighting during data collection.
Researchers can even monitor key survey variables during data collection. (These measures,
along with R-indicators and response rates, can serve as indicators of survey bias.) Combining
adaptive survey design with this streamlined process not only allows us to assess data quality
and bias during data collection, but it also expedites data processing because it enables us to
put all data-processing systems in place by the end of the collection period. The development of
this process was motivated by the National Science Foundation (NSF). In conducting the
National Survey of Recent College Graduates for NSF, we replaced the customary sequential
approach to data processing with this integrated approach. This allowed us to test our data-
processing procedures, including key SAS programs for autocoding, computer edits, and
imputation. We produced and examined real-time quality measures, bias indicators, and
paradata, and then assembled a comprehensive quality profile and assessed nonresponse bias.
Monitoring the data enabled us to correct problems as they arose. We will present our data-
processing framework, the measures we monitored during data collection, and the benefits and
challenges of adopting this process.
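[Editor's note: the R-indicators mentioned parenthetically above can be computed from estimated response propensities as R = 1 - 2*S(rho), where S(rho) is the standard deviation of the propensities. The sketch below is a generic illustration of that monitoring calculation with hypothetical variable names, not the production code used for the NSF survey.]

import numpy as np
from sklearn.linear_model import LogisticRegression

def r_indicator(frame_covariates, responded):
    # Representativeness (R-) indicator: R = 1 - 2 * sd(estimated propensity).
    # frame_covariates: frame/paradata covariates for all sampled cases.
    # responded: 0/1 flags for which cases have responded so far.
    model = LogisticRegression(max_iter=1000).fit(frame_covariates, responded)
    propensities = model.predict_proba(frame_covariates)[:, 1]
    return 1.0 - 2.0 * float(np.std(propensities, ddof=1))

# Hypothetical mid-collection check on a simulated sample of 5,000 cases
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 3))            # simulated frame covariates
resp = rng.binomial(1, 0.4, size=5000)    # simulated response flags so far
print(r_indicator(X, resp))               # values near 1 suggest representative response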
Toward a Standard Toolkit for Comparing Samples: Point Estimates, Relations
Between Variables and Trends Over Time
Josh Pasek, University of Michigan
The proliferation of both new methods for collecting data and novel analytical tools for
translating between respondents and the population present exciting possibilities for public
opinion research. But for researchers interested in understanding the population, these new
opportunities may be accompanied by inferential pitfalls. Researchers need to identify the
circumstances under which non-probability surveys, corporate data, and social media data can
yield valuable insights and when these sources might instead lead to erroneous conclusions.
Similarly, corrective tools such as raking, calibration, and matching have the potential to
ameliorate some sources of survey error, but may be unable to adjust for other systematic
biases. For survey researchers to fully utilize diverse sources of data to make conclusions about
the population, they need to be able to assess how the conclusions from diverse data sources
compare to one another. In particular, we need to know the circumstances under which the
conclusions reached from these newer tools mirror those of more traditional analyses. In this
paper I present a new toolkit for comparing the inferences derived from different sources of data
and weighting strategies. Programmed as a freely available R package, the toolkit represents a
standardized system for comparing the inferences derived from different datasets regarding
point estimates, relations between variables, and trends over time. To illustrate the features of this new software, the paper presents the results of a novel analysis of 16 weeks of comparable data from one probability RDD telephone data stream and one opt-in non-probability Internet data stream, both collected in the run-up to the 2004 U.S. Presidential election. The results
show both the potential for a standardized comparison toolkit as well as the differences that can
be observed across differing types of inferences.
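[Editor's note: the core comparison the toolkit performs can be illustrated briefly. The abstract describes a freely available R package; the code below is not that package but a hypothetical Python sketch of one of its comparisons, a weighted point estimate with an approximate design-based standard error computed for two data sources so their inferences can be set side by side.]

import numpy as np

def weighted_estimate(values, weights):
    # Weighted mean and an approximate (with-replacement) standard error.
    values, weights = np.asarray(values, float), np.asarray(weights, float)
    mean = np.average(values, weights=weights)
    n = len(values)
    resid = weights * (values - mean)
    se = np.sqrt(n / (n - 1) * np.sum(resid**2)) / np.sum(weights)
    return mean, se

# Hypothetical candidate-approval indicators from two data streams
rng = np.random.default_rng(1)
rdd_y, rdd_w = rng.binomial(1, 0.52, 800), rng.uniform(0.5, 2.0, 800)
web_y, web_w = rng.binomial(1, 0.48, 800), rng.uniform(0.5, 2.0, 800)
for label, (est, se) in [("RDD phone", weighted_estimate(rdd_y, rdd_w)),
                         ("Opt-in web", weighted_estimate(web_y, web_w))]:
    print(f"{label}: {est:.3f} (SE {se:.3f})")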
Controlling Survey Response Bias with Range Regression Techniques
John Tuhao Chen, Bowling Green State University; Yuanting Zhang, U.S. Food and Drug
Administration
Response bias arises when the respondent provides inaccurate information, possibly due to a
leading survey question or social desirability bias. There is a lack of innovative methodologies
that systematically deal with response bias. In this paper, we propose a new method called the
range regression to analyze a dataset containing several waves of Health and Diet surveys
(HDS) conducted by the U.S. Food and Drug Administration between 1982 and 2008. Range regression recently emerged in studies of vascular surgery procedures, relating the amount of treated clots to post-thrombotic syndrome in patients with deep vein thrombosis. Intrinsically,
range regression consists of stratification of respondents with similar ranges, followed by
identification of a measure that bundles subject variability within each stratum. Since the sample
mean is an asymptotically unbiased estimate of the population mean, range regression
essentially models the trend of conditional expected value of the response as a function of
ranges of explanatory variables. By controlling the strata, the method retains the key source of variation and reduces confounding effects and survey bias that interfere with the main explanatory variables. Using the FDA’s HDS, we hypothesize that survey response bias may partially obscure the association between BMI (body mass index, a function of body weight and height) and food label use. Thus, we sort BMI into ranges and plot the mean responses across those ranges to examine the relationship between BMI and consumer behavior regarding food label use. After applying the range regression technique, the associations between public perceptions of diet and nutrition and BMI range stand out clearly. Results of the new methods
are compared with conventional approaches for model plausibility, goodness of fit, efficiency,
and power performance.
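[Editor's note: the mechanics described above, stratify an explanatory variable into ranges, compute the mean response within each stratum, and model the trend in those conditional means, can be sketched as follows. This is a hypothetical illustration on simulated data, not the authors' implementation; the BMI cut points and variable names are assumptions.]

import numpy as np

# Simulated data standing in for the HDS: BMI and a 0/1 food-label-use indicator
rng = np.random.default_rng(2)
bmi = rng.normal(27, 5, 2000).clip(16, 45)
label_use = rng.binomial(1, 1 / (1 + np.exp(0.08 * (bmi - 27))))  # assumed relationship

# Step 1: stratify respondents into BMI ranges (hypothetical cut points)
edges = np.array([0, 18.5, 25, 30, 35, np.inf])
strata = np.digitize(bmi, edges) - 1

# Step 2: conditional mean response within each stratum
midpoints, means = [], []
for s in range(len(edges) - 1):
    in_stratum = strata == s
    if in_stratum.any():
        midpoints.append(bmi[in_stratum].mean())
        means.append(label_use[in_stratum].mean())

# Step 3: model the trend of conditional means across ranges (simple linear fit)
slope, intercept = np.polyfit(midpoints, means, deg=1)
print(f"Trend in label use across BMI ranges: slope = {slope:.4f}")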
Effects of Self-Awareness on Disclosure During Skype Survey Interviews
Shelley Feuer, The New School for Social Research; Michael Schober, The New School
for Social Research
As people increasingly communicate via video using software like Skype and FaceTime, new
opportunities for survey interviewing are emerging. But little is known about how video-mediated interviewing affects data quality, respondent satisfaction, and interviewer rapport. On the one hand, video mediation might increase rapport with interviewers without the intimidation that can occur face to face; on the other hand, it may reduce respondents’ sense of privacy, and thus
reduce disclosure of socially undesirable behaviors. The current study explores how one
prominent default feature in current video technologies—the “self-view,” a video image of
oneself in the corner of the screen—affects survey respondents’ levels of disclosure and
feelings of comfort. In a laboratory experiment, 85 respondents engaged in a live real-time
survey interview conducted over Skype, with the interviewer and respondent in separate
locations. Respondents answered 42 questions from major U.S. surveys, selected because they
might show mode effects related to socially undesirable responding, either with the default video
image of themselves in the corner of the screen (“self-view”) or without the image (“no self-
view”). Results suggest, perhaps counterintuitively, that the self-view reduces sensitivity and
social desirability effects, allowing respondents to answer more comfortably and presumably
more accurately. For instance, when asked about alcohol consumption, respondents in the self-
view condition reported more frequent and greater alcohol consumption, and a (presumably
more accurate) decreased likelihood of having been tested for HIV. In post-interview questions,
respondents in the no-self-view condition reported a greater sense of co-presence with the
interviewer and less comfort answering many of the sensitive questions. They also rated the
interview as more sensitive than those in the self-view condition. Although the causal
mechanisms are unclear, perhaps a self-view allows video-mediated survey respondents to feel
comfortable enough about their self-display to promote disclosure, or distracts them enough to
reduce defensive self-monitoring.
Methodological Briefs: Maximizing Response and Response Quality
The Effect of Differential Mailing Methodologies on Response Rates: Testing
Advanced Notices, Pre-Recorded Messages and Personalized Address Labels
Yelena Pens, Arbitron; Michelle Cantave, Arbitron; Robin Gentry, Arbitron
Arbitron Inc., a provider of radio ratings data, conducted a test using a probability based
address sample to recruit the general population, aged 13 and older, to complete a one week
Web-based diary of their radio listening. Since Web-based surveys historically have had lower
response rates, there were several treatments in place to increase the response rate. In order to
find the optimal mailing strategy for recruitment, the mailing experiment included treatments
such as alternative advance notices, pre-recorded telephone messages, and personalization.
The initial invitation to participate in a one week Web-based diary included a box mailer with a
monetary incentive. From previous testing, the box mailer provided the highest response rate as
compared to any reminder mailings. Thus, advance notices and pre-recorded messages emphasizing the arrival of the box mailer were the focus of the study. Three different package designs and messages were tested: a postcard, an invitation note card, and a self-mailer postcard. In addition, two different pre-recorded telephone messages were tested, including an advance notice message communicating that the box mailer was on the way and a
reminder message stating the box mailer had recently arrived in the mail. Finally, Arbitron
previously conducted two studies testing how response rates and deliverability are affected by the use of a generic salutation versus a personalized name. The results of those studies were mixed, so a follow-up study was conducted that included either a name on the letters or a
generic “City Name Area Household” greeting. In this presentation, we will present the results
from the Web-diary initiatives. We will also determine the combined impact of the non-
deliverable rate and response rate of the personalized letters. Finally, we will present the
optimal mailing strategy for mail-based recruitment for an online survey.
New versus Old Technologies: An Examination of Usability and Cognitive Issues
Across Modes Among Respondents with Varying Education Levels
Elizabeth M. Nichols, U.S. Census Bureau; Patricia L. Goerman, U.S. Census Bureau;
Nathan Jurgenson, U.S. Census Bureau; Tiffany King, RTI International; Murrey Olmsted,
RTI International; Jennifer H. Childs, U.S. Census Bureau
It has often been speculated that respondents who have lower levels of education may have
trouble completing automated government forms. However, recent data show that cell and smartphone usage is growing in this demographic (Woelfer et al., 2011; Rice et al., 2011; Woelfer and Hendry, 2009). With cell phone usage, and smartphone usage in particular, becoming nearly ubiquitous among young people and minorities, there is the potential to use this technology to reach those with low education, who are often highly mobile and might otherwise not be included. However, little is known about the success and problems
encountered in attempting to administer government forms via smartphones and tablets—in
particular with those who are of differing educational levels. This paper presents qualitative
evidence from 160 cognitive interviews completed with individuals who completed paper or
automated versions of draft U.S. 2020 Census forms. The paper examines whether there are
differences in the number and types of usability and cognitive problems found in cognitive
interviews by education for paper and automated forms, and seeks to identify whether data collection using automated mobile forms would help reach those who have lower levels of education.
Converting Nonrespondents to Late Respondents: The Impact of Automated
Phone Reminder in an RDD Landline Survey
Robin Gentry, Arbitron; Vrinda Nair, Arbitron
The Arbitron Syndicated Radio Survey uses a two-stage methodology whereby an RDD sample
is contacted via telephone and all household members aged twelve or older are asked to
participate in a seven-day radio listening diary for a specific “ratings” week. Unfortunately,
roughly 40% of households who agree to participate in the Radio Survey during the phone call
fail to return any diaries. Relatively little is known about why these households do not return
their surveys. In spring 2012, Arbitron fielded a study in which non-returning households were sent an automated phone message approximately nine days after the end of their diary-keeping week, reminding them to return their completed Radio Survey. We will present the return rate results and a cost-benefit analysis, as well as an analysis of the demographics of those who returned a diary, to determine whom the additional automated phone reminder brought in.
Instances when the late respondents picked up their phones to receive the live automated
phone message were compared to when the automated message was left on a voice mail to
determine if there was a difference in sample performance.
Factors Influencing Survey Participation Rates on an Online, Probability-Based
Research Panel
Dawn Wiest, American College of Physicians
In May 2011, the American College of Physicians (ACP), a membership organization of
physicians who specialize in internal medicine, established a probability-based, invitation-only
research panel to learn more about the needs and interests of members. After three waves of
invitations, 952 ACP members had joined. In summer 2012, a process of “panel hygiene” was
initiated with the goal of clearing the panel of non-participants and replacing them with a new
round of invitees. Analysis revealed that 30% of panelists had completed no surveys or only one
since joining. Brief surveys were sent to these panelists asking if they wished to remain on the
panel. Panelists who did not respond to this survey and those who responded “no” were
dropped from the panel. Beginning in October 2012, invitations to join the panel were sent to a
new sample of ACP members. This five-minute presentation is based on an analysis of one
year of panel participation data and highlights findings regarding participation rates and panelist
retention. Over the course of one year, seventeen surveys were sent to panelists. Participation
rates were influenced less by demographic factors, such as age, gender, or career stage, than
by how soon after joining the panel panelists received their first survey. Forty percent of
panelists who received their first survey over two weeks after joining completed no surveys in a
year, compared to fourteen percent of those who received their first survey within ten days. The
findings underscore the importance of minimizing the time between when a panelist joins a
panel and when s/he receives the first survey. Additionally, the analysis reveals that, as a mechanism for engaging panelists, “quick polls” and other low-incentive opportunities are no replacement for surveys offering higher-value rewards. Recommendations based on the findings
are discussed.
When We Do Not Know the Difference – the Level of DK in Different Question
Formats and Different Modes
Steve Schwarzer, TNS Opinion; Eva Zeglovits, University of Vienna; Dylan S. Connor,
University of California (UCLA)
The level of don’t know (DK) responses recorded in surveys is affected by both social desirability (SD) and satisficing (SC). Both SD and SC are known to be sensitive to survey mode and can inflate the rate of non-committal responses. It is assumed that Web surveys mitigate interviewer effects, and thus social desirability. However, this benefit is double-edged, as Web surveys also tend to exhibit higher levels of don’t know responses. This mechanism of survey design is poorly understood, and there is little practical guidance available on reducing mode effects that tend to increase the level of don’t know selection. Our first research question addresses the
level of don’t know responses in Web surveys. We investigate how different presentations of
don’t know answers in this mode affect the number of respondents selecting those options. As
many studies are now fielded in a multi-mode manner, inconsistency in don’t knows between modes introduces noise into the data. As such, our second objective takes a comparative approach to modes, analyzing the differences in outcomes between online and telephone surveys.
To answer these questions we deployed a survey experiment, administered online in four
countries (n=1000). So far, most studies have used data from lagged surveys; in our case, the telephone benchmark surveys (n=1000) were conducted concurrently. The paper will focus on examining whether different question designs result in different outcomes in the level of don’t know within the same mode. Furthermore, we will show which question formats limit the differences between modes—online and telephone surveying. Finally, as this research is based on a multi-country survey, we will test whether different formats work differently across
countries. The paper will conclude with how researchers can successfully bridge modes in order
to limit the “questionnaire design mode effect” on the answering behavior of respondents.
Data Quality in a Multi-Mode Self-Administered Study of Mental Health
Andrew L. Hupp, University of Michigan; Margaret L. Hudson, University of Michigan;
Heather M. Schroeder, University of Michigan
This study examines important dimensions of data quality from a mental health study of soldiers
in the U.S. Army. One component of this study involves a cross-sectional survey in which a
global, representative sample of active duty soldiers is interviewed. Soldiers completed either a
computerized or paper self-administered interview in a group session, depending on their duty location. Each group session is overseen by staff trained by an academic research organization.
We will examine data quality using the following metrics: unit non-response (consent) rates,
item non-response rates, a measure of satisficing (straight-lining) in responses to grid formatted
questions, rates of endorsement of sensitive items, and questionnaire completion rates. This
paper focuses on two aspects of the survey that may affect these measures of data quality. The
first aspect examined is the impact of mode of administration. We hypothesize that data quality
is improved when the survey is self-administered via computer rather than paper. The second
aspect examined is the effect of the group administration agent (field staff v. Army). We
hypothesize that data quality is affected by the presence of a homophilistic agent. In this case,
the homophilistic characteristic of interest is being a member of the military. The agent is
dressed similarly (Army uniform) to the participants. The agent may be perceived as an
authoritative figure since they may have a higher rank than some of those being asked to
participate. This could have an effect on perceived privacy and confidentiality by the participant,
leading to higher compliance in completing the survey request while at the same time
contributing to lower data quality through higher item nonresponse, more satisficing and less
endorsement of sensitive mental health items.
Using Registry Information to Adjust for Non-response Bias in a Diabetes Patient
Survey
Jiaquan Fan, Mayo Clinic
Objective: To evaluate nonresponse bias in a mail survey of diabetes patients and assess a weighting method designed to adjust for the non-response bias using information obtained from a diabetes registry. Study Design and Setting: Patients from a diabetes registry including 34 Midwestern clinics were randomly selected to participate in a mail survey; 2,055 patients responded (response rate 43%). Analyses examined demographics, current smoking status, and health outcomes (blood pressure, HbA1c, and low-density lipoprotein [LDL]) from the diabetes registry, seeking differences between responders and non-responders. A logistic regression model was developed to identify significant factors related to non-response, and a weighting method was designed to adjust for non-response bias. Results: Non-response bias is present in the survey. Responders tended to be older, nonsmokers, and healthier. Age, current smoking status, blood pressure, and LDL were identified as significantly related to non-response. After imputation for missing values, these four variables were used to form weighting cells to create weights for non-response adjustment. This method compares favorably with non-response adjustment weighting that uses only demographic variables. Conclusions: Leverage-saliency theory suggests that topic is a large motivator of response. In practice, few studies have
frame data with which to conduct nonresponse bias analyses and weighting adjustments. When
frames do have information on both respondents and non-respondents, it is typically only
demographic variables, and it is not clear how well adjustments made on demographic variables actually correct for observed bias in health-related surveys. Using the rich information in the registry database for this survey, we demonstrate that non-response in health-related surveys is likely related to health outcomes, and that registry data with rich health-related information can be used to obtain a better non-response adjustment than demographic variables alone.
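[Editor's note: the weighting-cell adjustment described above can be sketched briefly: respondents within each cell receive the inverse of the cell's response rate as a non-response adjustment factor. The code below is a hypothetical illustration with invented cell definitions, not the Mayo Clinic analysis.]

import pandas as pd

def weighting_cell_adjustment(frame):
    # `frame` has one row per sampled patient, a 0/1 `responded` column, and a
    # `cell` column formed from registry variables (e.g., age group x smoking
    # status x blood pressure x LDL category). Respondents get the inverse of
    # their cell's response rate as a non-response adjustment factor.
    cell_rates = frame.groupby("cell")["responded"].transform("mean")
    frame = frame.copy()
    frame["nr_weight"] = 1.0 / cell_rates
    return frame.loc[frame["responded"] == 1]

# Hypothetical example with two weighting cells
sample = pd.DataFrame({
    "cell": ["older_nonsmoker"] * 4 + ["younger_smoker"] * 4,
    "responded": [1, 1, 1, 0, 1, 0, 0, 0],
})
print(weighting_cell_adjustment(sample)[["cell", "nr_weight"]])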
Issues Related to Recruiting and Screening
Empirical Assessment of Respondent Driven Sampling
Zeynep T. Suzer-Gurtekin, University of Michigan; Sunghee Lee, University of Michigan;
James Wagner, University of Michigan
Challenges of scientific data collection with rare and hidden populations are well understood.
Sampling such groups using traditional probability methods is highly costly and almost
impractical. To address this sampling issue, several methods that utilize the social networks of these populations, including respondent-driven sampling (RDS), have been suggested as alternatives. RDS stems from the reasonable assumption that, although hidden in the general
population from outsiders’ viewpoint, some hidden population units are linked to other units of
the same population, forming some type of networks. Once a few members of the target rare
population are contacted typically through convenience sampling, those members are
interviewed as first-wave participants (seeds) and their social networks are exploited to recruit
the next wave of participants. Unlike traditional sampling, these seeds are asked to play the role of recruiters; they recruit those who qualify for the study from their individual networks. After
the second wave of data collection, this new set of participants recruits the next wave of
participants. Recruitment waves continue until the desired sample size is achieved. Under a set
of strong, yet often untestable, assumptions, RDS claims to produce memoryless Markov chains
of data points leading to unbiased estimates. In this paper, we use data from the Sexual
Acquisition and Transmission of HIV Cooperative Agreement (SATHCAP) that collected data
from the HIV risk groups using RDS. We examine how well the assumptions are reflected in the
data collection, focusing on the memoryless chain assumption and the complete response
assumption. The examination is done with respect to estimation and sampling productivity. We
also compare different estimators suggested in the literature to test their performance.
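[Editor's note: to make the estimation step concrete, the sketch below shows one widely used RDS estimator, the degree-weighted (Volz-Heckathorn, RDS-II) mean, which weights each participant by the inverse of his or her reported network size. It is a generic illustration, not the SATHCAP analysis; variable names are hypothetical, and the paper compares several estimators beyond this one.]

import numpy as np

def rds_ii_estimate(outcomes, degrees):
    # Degree-weighted (RDS-II / Volz-Heckathorn) prevalence estimate.
    # outcomes: 0/1 indicator of the trait of interest for each participant.
    # degrees: self-reported personal network sizes (must be positive).
    y = np.asarray(outcomes, float)
    d = np.asarray(degrees, float)
    weights = 1.0 / d            # inclusion assumed roughly proportional to degree
    return float(np.sum(weights * y) / np.sum(weights))

# Hypothetical recruitment chain: trait indicators and reported network sizes
print(rds_ii_estimate([1, 0, 1, 1, 0, 0], [20, 5, 8, 40, 10, 3]))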
Recruiting Participants into a Probability-Based Panel Using Interactive Voice
Response Methods: The Canadian Experience
Frank L. Graves, EKOS Research Associates; Timothy B. Gravelle, PriceMetrix Inc.
Significant research on recruiting participants into probability-based research panels has been
undertaken in recent years. In particular, research has focused on finding optimal recruiting
processes and assessing the representativeness of samples recruited using different methods --
landline random-digit dial (RDD), dual-frame (landline RDD plus cell phone) and address-based
sampling (ABS). To date, little work has been done to evaluate the relative efficacy of interactive
voice response (IVR) methods, in part due to regulations in the United States preventing IVR
dialers from calling cell phones and the bias that would presumably result from using IVR
methods to call landline RDD sample only. This paper presents experiences and findings from
the use of IVR to recruit into a probability-based panel in Canada, where both landline and cell
phone numbers may be called using IVR.
Benefits and Drawbacks of a Multistage Screening Effort for Surveying Rare
Populations
Heather M. Morrison, NORC at the University of Chicago; Alicia M. Frasier, NORC at the
University of Chicago; Stephen J. Blumberg, National Center for Health Statistics;
Matthew D. Bramlett, National Center for Health Statistics
Conducting scientifically rigorous surveys of rare populations can be cost-prohibitive because
obtaining a sufficient sample of eligible respondents via probability sampling requires a
significant screening effort. As a result, surveys of rare populations are sometimes undertaken
using convenience samples that minimize the screening effort but come at the cost of scientific
rigor. Recent survey work undertaken through the State and Local Integrated Telephone Survey
(SLAITS) mechanism of the National Center for Health Statistics, however, demonstrates that it is
possible to control screening costs while maintaining the statistical properties of a probability
design. SLAITS’ multi-stage approach screens for rare populations via one or more parent
surveys: the National Survey of Children with Special Health Care Needs and the National
Survey of Children’s Health – both conducted on behalf of the Maternal and Child Health
Bureau. These national surveys use the National Immunization Survey sampling frame to
screen approximately six million telephone lines for eligible households yearly, resulting in a rich
sample of certain rare populations. Once identified, these targeted rare populations participate
in the salient follow-up survey. We have successfully employed this screening methodology to
identify and interview nationally representative samples of adoptive parents in the National
Survey of Adoptive Parents and the National Survey of Adoptive Parents of Children with
Special Health Care Needs and more recently for parents of children with autism, intellectual
disability, or developmental delay in the Survey of Pathways to Diagnosis and Services. These
surveys would not be feasible without this multi-stage screening mechanism. There are,
however, drawbacks to this approach. While observed cooperation rates are high for the salient
survey, response rates must be calculated accounting for response at all survey stages
including screening. We examine the benefits and drawbacks of interviewing rare populations
using this methodology, including assessing survey cost, response rates, and sampling
alternatives.
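[Editor's note: because the topical interview is conducted only among screener respondents, the overall response rate multiplies across stages. A rough illustration, ignoring unknown-eligibility adjustments and using hypothetical rates, is:]

RR_{\mathrm{overall}} \approx RR_{\mathrm{screener}} \times RR_{\mathrm{topical}}, \quad \text{e.g., } 0.60 \times 0.80 = 0.48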
Assessing Methods of Recruitment for a Cell Phone Survey Panel: An Experiment
Conducted in 2011 in Mexico City
Yamil Nares, University of Essex; Rene Bautista, NORC at the University of Chicago
This paper presents the results of an experiment conducted with cell phones in Mexico City
between July and August of 2011. The study was conducted by the public opinion firm Defoe,
Experts on Social Reporting and consisted of a three-wave survey. In the first wave, a
household survey of one hundred cases was conducted face to face, as a baseline study.
These selected respondents were provided with free pre-paid cell phones in exchange for their
continued participation in subsequent waves, which were planned to be conducted over the said
cell phones during the following week. The pool of selected respondents was randomly divided
into two groups. Fifty of the respondents were handed a letter stating the purpose and objectives of the study. The other fifty were asked to voluntarily sign a contract in order to encourage commitment and participation over the next two waves. In both conditions (letter and contract), cell phones were credited with 15 dollars in advance. Participants were told that they could keep the cell phone equipment upon completion of the two-wave study; that is,
by the end of the week. This paper will discuss the impact of using signed contracts (compared
to letters only) on survey participation. Aspects such as interviewer characteristics, fieldwork
data, and other relevant information will be included in the analysis.
Strategies for Recruiting Respondents for Exploratory Interviews to Aid
Questionnaire Development
Herman Alvarado, U.S. Census Bureau
In a recent collaborative effort between the National Science Foundation and the U.S. Census
Bureau, a sample of U.S. companies was contacted to understand the role of innovation in their
business practices and decision-making, to assess the feasibility of developing survey questions
to measure private-sector innovation. This type of exploratory research is often foreign to
potential research participants, and may even be viewed with suspicion. Thoughtful, concise
and persuasive appeals are often necessary to find, contact, and obtain cooperation from
appropriate people within companies. In order to interview appropriate company personnel, i.e.,
those with both broad and deep knowledge of their companies, we decided to make the initial
requests to company executives and ask for their assistance. In order to reach company
executives, an initial mail contact strategy was used. An official letter explaining the purpose of
the study, requesting their participation, and providing the researchers’ contact information was
sent to more than 120 companies in several U.S. cities. We took steps to ensure the letters
would be perceived as legitimate and important, and would get the attention of gatekeepers
responsible for filtering executives’ mail, including personalizing the letters and sending them via
2-day priority mail. We conducted telephone follow-ups with those companies who did not
initially respond to the letters. In our presentation we will discuss recruiting strategies and
methods, as contacts with often busy and skeptical company representatives, especially
gatekeepers, present narrow windows of opportunity to convey the nature of the request for the
interview. We will also make recommendations for overcoming some of the obstacles we
encountered.
Multi-Mode Surveys
Evaluation of a Sequential Mixed-Mode Design Experiment with Physicians on
Response Rates, Costs, and Response Bias
Emily Geisen, RTI International; Murrey Olmsted, RTI International; Joe Murphy, RTI
International; Marshica Stanley, RTI International
While Web surveys are generally less expensive than data collection by mail, they have not
been shown to be successful at achieving high response rates with physicians. In comparisons
of single-mode physician surveys, Web surveys typically have lower response rates than other
modes (Van Geest, 2007). Similarly, research on concurrent mixed-mode surveys with
physicians has found that the use of a Web option does not increase survey responses
compared to mail alone (McFarlane, 2009). However, a recent meta-analysis of mixed-mode
general population surveys found that offering sequential mixed modes (offering only one mode at a time) rather than concurrent mixed modes (offering more than one mode at the same time) can yield higher response rates (Fulton, 2012). Our study evaluated a sequential mixed-
mode design experiment conducted on a nationally representative sample of 4,700 board-
certified physicians. Recent research shows that physicians are adopting mobile
devices such as smartphones and tablets at increasing rates. Therefore the Web survey was
optimized so that it could be completed on mobile devices as well as computers. Half of the
sample received an initial paper survey via mail followed by up to three mail-only nonresponse
follow-ups. The other half of the sample received an initial survey invitation via email with up to
two email reminders. Nonresponders to the Web survey were then sent up to three paper
survey follow-ups. The three paper survey follow-ups were identical in both groups. In this
paper, we compare the effect of the two mixed-mode designs on response rates, overall costs,
and costs per complete. In addition, we examined mode differences and potential effects of
response bias between the two groups. This work has implications for researchers designing
studies with physicians to find an optimum balance between costs and response rates.
Facing Their Fears: Examining the Impact of Audio Computer-Assisted Self
Interviewing on Population Prevalence of Self-Reported Non-Specific
Psychological Distress
Sarah S. Joestl, National Center for Health Statistics; James Dahlhamer, National Center
for Health Statistics; Adena Galinsky, National Center for Health Statistics; Marcie
Cynamon, National Center for Health Statistics; Virginia Cain, National Center for Health
Statistics; Jennifer Madans, National Center for Health Statistics
Despite steady growth in psychiatric epidemiological research, population-based prevalence
estimates of serious mental illness remain the gold standard for both research and policy.
Recognizing this need, the National Center for Health Statistics (NCHS) during its redesign in
1997 added the K6 scale, a validated six-item screening tool for identifying non-specific
psychological distress, to its National Health Interview Survey (NHIS). However, concerns
around stigma and discrimination may disincentivize people living with mental illness from
reporting psychiatric symptoms in a face-to-face or telephone interview setting. In order to
assess the possibility of underreporting (and hence bias of national estimates) of this and other
health information, NCHS between August and mid-October 2012 carried out a feasibility study
on the use of Audio Computer-Assisted Self Interview (ACASI) for a subset of NHIS questions
deemed sensitive in nature. In this paper, we used data from that field test to compare
prevalence, item non-response, and breakoff rates for each of the K6 items and the overall
scale between the 3,215 adults who received the questions via ACASI and the 2,237 adults who
completed them via Computer-Assisted Personal Interview (CAPI). We further contrasted CAPI
field test estimates with those from the 2012 NHIS production survey to allow examination of
potential context effects from changes in item placement within the survey. Where significant
bivariate results emerged, we examined them in multivariate models to identify potential
sociodemographic respondent characteristics underlying any observed mode effects. Results
from this examination will not only inform mode choice for future surveys with a mental health
component, but will also provide insight on whether prior-year NHIS estimates of non-specific
psychological distress could be improved to account for context effects due to question
placement.
Alone in a Group: Comparison of Effects of a Group-Administered Paper-Pencil
Survey Versus an Individually-Administered Web-Based Survey on Perceptions of
Culture, Peer Pressures and Stigma
William B. Higgins, ICF International; Frances M. Barlas, ICF International; Jacqueline
Pflieger, ICF International; Randall K. Thomas, GfK Custom Research North America;
Diana Jeffery, Tricare Management Activity; Mark J. Mattiko, United States Coast Guard
While research has found that the presence of an interviewer can influence respondents’
answers to questions, less attention has focused on the potential impact that other respondents
may have on survey responses as might occur in group-administered settings. In assessing
topics related to group culture and peer-pressure, the presence or absence of other group
members when completing the survey may influence responses. Such influences may be
stronger in a tight-knit group like the United States military where unit cohesion and trust are
critical to mission success. In this study, survey responses to items concerning group culture
and influence when asked on a paper-pencil, group-administered survey were compared with
responses on an individually-administered, online survey. The Department of Defense and U.S.
Coast Guard authorized the 2011 Health Related Behaviors Survey to explore the prevalence of
a number of behavioral health issues including the military culture of substance use, the
presence of peer pressure to use substances, and the stigma associated with receiving mental
health services. Personnel from a few key military installations from the Army, Navy, Marines,
and the Coast Guard were randomly assigned to one of the administration modes. Respondents
were assured anonymity for each mode. Group-administered paper-pencil survey respondents
indicated greater stigma of receiving mental health care and a stronger military culture of
substance use than did respondents in the Web-based mode.
The Effect of Survey Mode on Socially Undesirable Responses to Open Ended
Questions: Online vs. Paper Instruments
Eric Hedberg, NORC at the University of Chicago; Gabriel Ceasar, Arizona State
University; Danielle Wallace, Arizona State University
A chief concern of survey research is that respondents give socially desirable answers instead
of actual beliefs. However, it is possible that this tendency is mitigated by survey mode. In this
paper we evaluate open-ended responses to a photographic stimulus that asked 1,056 students
in a criminal justice program to evaluate neighborhood conditions. This photograph presents a
street corner with a brick building, a van marked with spray paint, and a religious mural. We
expect responses to this photograph to contain references to race, ethnicity, and class.
However, we examine the difference in how race, ethnicity, and class, were depicted by
respondents across two modes: paper surveys (46.6 percent of responses) and Web surveys
(53.3 percent). We mark each response for various socially undesirable responses ranging from
impolite language to disparaging stereotypes. We then use an item response theory (IRT)
model to estimate the impact of survey mode on the propensity for such offenses by estimating a multi-level logistic regression model. Using a means-as-outcome model and cross-level interactions with survey mode, we estimate how mode affects not only the general propensity for social undesirability but also the different aspects of socially undesirable answers. Preliminary results suggest that while mentions of race or ethnicity do not vary by mode, responses from the Web interface are more likely to contain socially undesirable answers. For example, we found no difference between modes for mentions of
minority populations, but online surveys were 88 percent more likely to use the word “ghetto.”
We then consider what these results suggest for quantitative research. We conclude that online
surveys are more likely to elicit visceral responses, and that analyses on mixed mode data
collection should include survey mode as a control when examining mean differences on
various scales.
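[Editor's note: a stripped-down analogue of the mode comparison above can be sketched as a logistic regression of a socially-undesirable-language flag on an indicator for Web administration, with standard errors clustered by respondent. This is a simplified, hypothetical illustration (the authors describe a multi-level IRT formulation); the data frame and column names are invented.]

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per coded offense category per respondent
rng = np.random.default_rng(3)
n = 1056
df = pd.DataFrame({
    "respondent": np.repeat(np.arange(n), 3),
    "web_mode": np.repeat(rng.binomial(1, 0.53, n), 3),   # ~53% Web responses
})
# Simulate a flag for socially undesirable language, more likely on the Web
p = 0.10 + 0.08 * df["web_mode"]
df["undesirable"] = rng.binomial(1, p.to_numpy())

# Logistic regression of the flag on mode, clustering on respondent
model = smf.logit("undesirable ~ web_mode", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["respondent"]})
print(model.summary().tables[1])
print("Odds ratio for Web mode:", np.exp(model.params["web_mode"]))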
Mode Effects in a National Establishment Survey
Kelly Daley, Abt SRBI; Ben Phillips, Abt SRBI
Surveys of establishments often require the reporting of administrative or historical data, which
can be difficult or burdensome to complete by telephone. Offering survey respondents multiple
modes of reporting can make the task easier by allowing respondents flexibility in the time,
location and pace at which they complete the survey. Presumably, this flexibility would increase
response rates, produce higher quality data and potentially reduce survey administrative cost.
The 2012 Family and Medical Leave Worksite Survey was a sequential multi-mode (Web and
CATI) survey of 1,812 U.S. business establishments. A major design difference between the
2012 survey and earlier administrations is that the 2012 survey allowed respondents to
complete the survey on the Web. The field period for the 2012 survey was March through June,
2012. A total of 634 interviews were completed on the Web and 1,178 interviews were
completed by Computer Assisted Telephone Interviewing (CATI). The target population
consisted of all private-sector business establishments excluding self-employed businesses
without employees, government entities, and quasi-government entities. Provision of the Web
option in 2012 was expected to bolster both the overall response rate and the item response
rate on several key variables related to the administration of FMLA at the sampled
establishment site. This paper explores several aspects related to survey administration mode
in the 2012 FMLA Worksite survey. We compare item response rates to administrative data
questions between the 2000 and 2012 surveys. We examine mode effects using matching
models for causal effects due to the non-ignorable relationship between respondent
characteristics and completion of the survey in the telephone or self-administered mode. The potential reduction in the bias of estimates due to differing sample composition under a high-response-rate scenario is estimated net of the estimated mode effects.
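[Editor's note: one way to read "matching models for causal effects" here is propensity-score matching of Web and CATI completes on establishment characteristics before comparing outcomes. The sketch below is a generic, hypothetical illustration of that idea using nearest-neighbor matching on an estimated propensity score; it is not the estimator the authors used, and all variable names are invented.]

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def matched_mode_effect(X, web, outcome):
    # Estimate a Web-vs-CATI difference after 1:1 propensity-score matching.
    # X: establishment characteristics; web: 1 if completed on the Web;
    # outcome: a survey measure (e.g., an item response indicator).
    ps = LogisticRegression(max_iter=1000).fit(X, web).predict_proba(X)[:, 1]
    web_idx, cati_idx = np.where(web == 1)[0], np.where(web == 0)[0]
    nn = NearestNeighbors(n_neighbors=1).fit(ps[cati_idx].reshape(-1, 1))
    _, match = nn.kneighbors(ps[web_idx].reshape(-1, 1))
    matched_cati = cati_idx[match.ravel()]
    return float(np.mean(outcome[web_idx]) - np.mean(outcome[matched_cati]))

# Hypothetical data: 1,812 establishments with 4 characteristics
rng = np.random.default_rng(4)
X = rng.normal(size=(1812, 4))
web = rng.binomial(1, 0.35, 1812)
y = rng.binomial(1, 0.5 + 0.05 * web)     # simulated item response indicator
print(matched_mode_effect(X, web, y))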
Applications of Social Media to Surveys and Pretesting
Social Media vs. Online Classified Advertisements: Does Where We Advertise for
Cognitive Interviews Matter?
Brian Head, RTI International; Elizabeth Dean, RTI International; Timothy Flanigan, RTI
International; Jodi E. Swicegood, RTI International; Michael Keating, RTI International
Technologies have advanced over the past decade, and the ways in which people access information have evolved with those advancements. These changes have created new
opportunities to recruit questionnaire evaluation study participants (e.g., cognitive interview
participants) that may address some concerns with the use of one of the most common
recruitment methods in use today—online classified ads (e.g., advertisements on
Craigslist.com). Potential issues with online classified ads include: the recent decline, based on
anecdotal observation, in the number of responses to these ads; limited demographic diversity;
an inability to target specific populations; concerns about the development of a class of
“professional participants” who use the ads to seek out study participation for additional income;
and the infeasibility of recruiting geographically dispersed samples. We hypothesize that
advertising on social media may help address these concerns. We use recruitment data from
two cognitive interviewing studies with distinct populations—virtual world users and adults near
retirement age—to test this theory. In both studies we ran advertisements on Facebook and
Craigslist to recruit potential study participants. Each ad included a link to a Web-administered
screening survey. The screening surveys included questions about demographic information
and other information used to determine study eligibility. We will present data showing
differences in 1) demographic diversity of participant pools drawn from the two recruitment
methods; 2) the size of and speed at which pools are drawn; and 3) the feasibility of recruiting a
geographically diverse population. Findings from this study may be useful to researchers concerned with 1) the effects of having homogeneous pools from which to draw questionnaire evaluation participants; 2) the effect professional participants may have on cognitive interviewing data; and 3) recruiting a geographically dispersed pool of potential study participants.
Cognitive Interviewing in Online Modes: a Comparison of Data Collected in
Second Life and Skype
Jodi E. Swicegood, RTI International; Brian Head, RTI International; Elizabeth Dean, RTI
International; Michael Keating, RTI International
Cognitive interviewing can identify potential errors in a survey prior to a large data collection effort, allowing researchers to effectively pretest a draft survey instrument. Digital technologies
afford researchers the opportunity to overcome geographic and logistical limitations of
conducting these interviews with a diverse sample. The convenience of interviewing participants
online includes reduced travel time and the ability to schedule interviews outside of normal
business hours, reducing participant burden with certain populations including online users. The
Second Life population was of interest to researchers in this study. Second Life is a virtual world
where users self-represent through avatars. Purposes of play include socializing, entertainment
and education. New technologies such as the virtual world Second Life and the voice-over-
internet software Skype were utilized to conduct cognitive interviews pretesting a draft
instrument on virtual world avatar similarity. A series of questions asked participants to describe
several physical and personality characteristics of both themselves and their avatars. The goal
of this questionnaire was to determine the extent to which SL users viewed their avatars as
similar to their real life counterparts. Interviews were conducted in three modes: Second Life,
Skype and face-to-face. To determine the feasibility of conducting cognitive interviews digitally,
analyses were conducted to compare data quality across each mode; analyses identified the
number, type and severity of errors detected. Preliminary findings suggest that interviews
conducted in Skype and Second Life yield, on average, the same number of errors. Comparison
data are presented from all three modes. Second Life and Skype can be used to conduct
cognitive interviews with a sample of online participants, though each mode has its own considerations and limitations for study design and implementation. These implications are
discussed and recommendations explored for researchers interested in other digital cognitive
interviewing modes.
Latent Characteristic Extraction from Twitter Data: Toward Weighting Social
Media Data to Make Inferences to the General Public
Martin Barron, NORC at the University of Chicago
Twitter is a social media service where users post short, 140-character public messages. In the U.S. alone, Twitter currently has over 100 million users, who each day post over 400 million “tweets.” This continuous stream of data has been mined by researchers to measure a variety of
behaviors and opinions such as influenza outbreaks, drug use, and a host of other topics of
interest to survey researchers and their clients. However, significant questions remain regarding
the generalizability of these findings beyond the particular universe of Twitter users. Twitter
users represent a self-selected cross-section of the U.S. population–a cross-section that is
younger, more African American, and less rural than the overall U.S. population. One possible
approach to drawing inferences from Twitter data to a larger population involves weighting the
data drawn from Twitter users to known demographic distributions among the general
population. Unfortunately, almost no demographic data are available on Twitter users. This
paper describes a method of assigning demographic characteristics to Twitter users as a first
step towards weighting data mined from Twitter to U.S. population control totals. I discuss a
methodology for extracting latent characteristics (such as sex, race, and age) based on Twitter
behavior. Starting with a hand-coded training dataset, I use machine learning techniques to
build models classifying users on each demographic characteristic of interest. I show that, given
a robust training dataset, many demographic characteristics can be assigned with relatively high
levels of accuracy. Using these classification models, I then explore weighting election projections based on Twitter data to determine whether the weighted projections yield more accurate predictions than the unweighted ones.
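[Editor's note: the classification step described above, training supervised models on a hand-coded set of users and then predicting demographic categories for the rest, can be sketched roughly as follows. This is a generic, hypothetical illustration (text features plus logistic regression), not the models or features actually used in the paper.]

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hand-coded training data: concatenated tweet text per user and a coded label
# (here, a hypothetical age-group code); real features could also include
# profile fields, posting times, and follower networks.
train_text = ["filing for social security next month ...",
              "studying for midterms all week ...",
              "grandkids visiting this weekend ...",
              "campus career fair and finals ..."]
train_label = ["65+", "18-29", "65+", "18-29"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(train_text, train_label)

# Assign a predicted demographic characteristic to an uncoded user
print(clf.predict(["registering for spring semester classes ..."]))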
Capabilities and Considerations for Using Facebook in Survey Research
Kim Mook, Mathematica Policy Research; Sean Harrington, Mathematica Policy
Research; Amanda Skaff, Mathematica Policy Research
In an era of declining survey response rates and unreliable locating methods, social media
provides important new opportunities for respondent outreach. With over 900 million users
worldwide, including over 150 million in the United States, Facebook warrants particular
attention as a tool for improving sample member contact. This paper discusses the potential
capabilities and concerns that survey researchers must consider when exploring ways to
incorporate this widely used platform into data collection and respondent locating efforts. We
detail the demographics of Facebook’s most common users, as well as the benefits and
drawbacks of contacting potential sample members on the site. We also describe Facebook’s
current outreach capacities, including the differences in information dissemination and direct
communication capabilities between a Page, profile and group, and the important privacy issues
that circumscribe interaction on the site. Finally, we provide a brief case study of the preliminary
stages of Facebook use on the Evaluation of the YouthBuild Program. We detail the benefits of
using Facebook to locate and contact this study’s sample members, who are generally young,
low-income, highly mobile, and often maintain social media accounts as their most permanent
method of contact. In addition to these outreach strategies, we describe the development of
tools to track social media interactions, as well as paradata possibilities for future exploration.
Though these social media efforts are ongoing, our progress to date suggests that Facebook
can be a critical tool in establishing connections with difficult to reach sample members, and can
provide otherwise inaccessible contact information to locators in addition to serving as a
communication platform.
Dangerous Disconnects? How Public Discourse About Nanotechnology is
Missing the Point
Sara K. Yeo, University of Wisconsin - Madison; Dominique Brossard, University of
Wisconsin - Madison; Dietram A. Scheufele, University of Wisconsin – Madison; Michael
A. Xenos, University of Wisconsin - Madison
In general, scientists tend to be more optimistic about technologies, such as nuclear power and
biotechnology, and perceive fewer risks than lay audiences (Savadori et al., 2004; Sjöberg,
1999). However, there is evidence that this trend is reversed for environment, health, and safety
(EHS) risks of nanotechnology (Scheufele et al., 2007), with scientists calling attention to the
potential seriousness of these negative effects. Although nanotechnologies are gaining consumer uses, with over 1,300 products now available worldwide (The Project on Emerging
Technologies, 2011), the potential deleterious effects of nanoparticles on EHS have been
gaining attention among scientists and regulators (Holgate, 2010; Marambio-Jones and Hoek,
2010). The extent to which these discussions have reached broader segments of the American
population is an empirical question. In this study, we explore public discourse about nanotechnology
using the micro-blogging social media platform Twitter. Online media are rapidly becoming an
important source of information for science and technology for lay audiences (National Science
Board, 2012). Twitter is one of the most prolific outlets for public discourse as it is an ideal
medium for information distribution and discussion. For example, on the night of the 2012 U.S.
Presidential election, 31 million tweets were posted, with the highest tweeting rate (327,452
tweets per minute) occurring when media networks announced Obama’s reelection (Sharp,
2012). In the present study, we performed opinion mining with the software ForSight to
characterize 1,557,325 nano-related tweets posted between September 1, 2010 and August 31,
2012. The topics analyzed included business, national security, consumer products,
medicine/health, EHS, basic research, and energy, the domains of nanotechnology-related research and development receiving the most investment. We found that discussions about consumer products
and national security dominate public discussions about nanotechnology, while EHS was the
least discussed. Implications of this disconnect between expert and public discourses are
discussed.