Evaluation/Survey Design


The Basics

Dillman's Principles for Writing Survey Questions

  1. Choose simple over specialized words.
  2. Concise: Choose as few words as possible to pose the question.
  3. Grammatical: Use complete sentences to ask questions.
  4. Avoid vague qualifiers when precise estimates can be obtained.
  5. Use equal numbers of positive and negative categories for scalar questions.
  6. Put "Undecided" at the end of the scale to distinguish it from "neutral".
  7. Avoid bias from unequal comparisons.
  8. State both sides of attitudinal scales in the question stems.
  9. Eliminate check-all-that-apply question formats to reduce primacy effects.
  10. Develop response categories that are mutually exclusive.
  11. Use cognitive techniques to improve recall.
  12. Provide appropriate time "referents". (J: does this mean denotation or reference? What are you denoting?)
  13. Be sure each question is technically accurate.
  14. Choose question wordings that allow essential comparisons to be made with previously collected data.
  15. Avoid asking respondents to say yes in order to mean no.
  16. Avoid double-barreled questions.
  17. Soften the impact of potentially objectionable questions.
  18. Avoid asking respondents to make unnecessary calculations.

Choose simple over specialized words.

Examples:

In the following list, the words on the right would be better than those on the left.

  • exhausted → tired
  • candid → honest
  • top priority → most important
  • leisure → free time
  • employment → work
  • courageous → brave
  • rectify → correct (verb)

Concise: Choose as few words as possible to pose the question

Remove redundancy.

Grammatical: Use complete sentences to ask questions

age → "What is your age?"
height → "What is your height?"

This is survey canon; not sure why. (J: less ambiguous?)

Avoid vague qualifiers when precise estimates can be obtained

"Graduate 'whizz kids' are pampered at Sellafield" (Lee, 1993. Work & Stress)

"the working environment is sufficiently enlightened." (Gonca and Salim, 2009. Ege Academic Review"

The qualifiers in these questions are "whizz kids" and "enlightened".

Avoid specificity that exceeds the respondent's potential for having an accurate ready-made answer

"How many days have you worked this season when you were injured or ill?" (Arcury et al., 2012, American Journal of Public Health)

If you ask questions people probably won't be able to answer, you'll reduce your credibility, frustrate the participant, and likely won't get accurate answers. It's like asking people about jelly-bean jars at a fair. (J: Is the average of answers about jelly-beans in a jar usually accurate, or is it only when you average the answers of experts that the average answer outperforms individual answers?)

How often does something happen? How novel is it?

If you ask about behavior in the past week or past month, don't ask,

"How many times this month did you brush your teeth?" ––––> "Think about the last full month: How many times did you do this?"

Question: If you ask about too short a time period, could you get error that you don't want? What do you think of asking "in a typical week"?


Use equal numbers of positive and negative categories for scalar questions

"Would you say that in general your health is" ...? ––––> "In general, my health is..." (Scale from Good <–––––> Very Good"

Question: Is it very important to always have a neutral response available? The answer to this depends on NOIR: Nominal, Ordinal, Interval, Ratio (link to Wikipedia).

One argument for having a neutral response is that without it, your information may no longer be interval-level. A scale that goes "Strongly Disagree – Disagree – Agree – Strongly Agree" assumes that the distance between "Disagree" and "Agree" is the same as the distance between "Agree" and "Strongly Agree".
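The equal-distance assumption can be made concrete: treating a Likert scale as interval data means assigning evenly spaced numeric codes, so the gap between any two adjacent labels is taken to be the same. A minimal sketch; the labels and numeric codes here are illustrative, not from the original.

```python
# Coding a 4-point Likert scale without a neutral midpoint.
# Treating it as interval data assumes adjacent codes are equidistant:
# Disagree (2) to Agree (3) is taken to equal Agree (3) to Strongly Agree (4).
LIKERT_CODES = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Agree": 3,
    "Strongly Agree": 4,
}

def mean_score(responses):
    """Average numeric score; only meaningful if the interval assumption holds."""
    codes = [LIKERT_CODES[r] for r in responses]
    return sum(codes) / len(codes)

responses = ["Agree", "Strongly Agree", "Disagree", "Agree"]
print(mean_score(responses))  # 3.0
```

If the scale is only ordinal (the distances are not really equal), the mean computed this way is not strictly justified; that is the statistical cost being discussed.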

When you don't want the neutral option:
  1. You want to force people to know which side of the scale they are on.

If you force the choice, you lose the ability to run the statistics that many people run on their surveys.



  1. Put "Undecided" at the end of the scale to distinguish it from "neutral"
  2. Avoid bias from unequal comparisons
  3. State both sides of attitudinal scales in the question stems
  4. Eliminate check-all-that-aply question formats to reduce primacy effects
  5. Develop response categories that are mutually exclusive
  6. Use cognitive techniques to improve recall
  7. Provide appropriate time "referents" (J: does the mean denotation or reference"? What are you denoting? )
  8. Be sure each question is technically accurate.
  9. Choose question wordings that allow essential comparisons to be made with previously collected data.

Avoid asking respondents to say yes in order to mean no.

Also, "Avoid negatively worded "Strongly Disagree/Strongly Agree" items.

Keep everything worded the same way. There’s a false belief that if you ask kids, “I love drugs and I hate drugs” that all you have to do is flip the scales and they’ll be the same.

If you run a factor analysis, nine times out of ten they will not "hang together".
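Before checking whether items "hang together" (via factor analysis or a reliability coefficient), negatively worded items are usually reverse-scored so that all items point the same direction. A minimal sketch with made-up item wordings and responses; the 1–5 scale range is an assumption.

```python
# Reverse-scoring a negatively worded item on a 1-5 scale:
# new = (scale_min + scale_max) - old, so 5 -> 1, 4 -> 2, and so on.
def reverse_score(value, scale_min=1, scale_max=5):
    return (scale_min + scale_max) - value

# Hypothetical responses to "I love my job" (positively worded) and
# "I dread going to work" (negatively worded, must be flipped):
love_job = [5, 4, 2, 5]
dread_work = [1, 2, 4, 1]  # consistent respondents answer oppositely

dread_work_recoded = [reverse_score(v) for v in dread_work]
print(dread_work_recoded)  # [5, 4, 2, 5] -- now aligned with love_job
```

Note the recoding only fixes the direction of the numbers; as the notes say, reversed wording can still change how respondents interpret the item, which is why such items often fail to load on the same factor.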

  1. Avoid double-barreled questions.
  2. Soften the impact of potentially objectionable questions.
  3. Avoid asking respondents to make unnecessary calculations.

Questions with a negative tone are not necessarily bad. Double negatives ARE bad. And "not doing a good job" is in danger of being read as "doing a good job".

Use specific attitudes if you want to predict specific behaviors

If you ask about attitudes towards organ donation, it doesn't necessarily predict what people will do when you then ask them to sign up.

So, "what are your attitudes towards disneyland" ---> "What is your attitude towards going to disneyland this year?"

If you want to predict what people will do, you want to be as accurate as possible.

Be sure there is equal distance between points on interval scales

"To what extent do you feel your coworkers"

Distinguish "neutral", "Don't know", and "not applicable".[edit | edit source]

"The Yankees are a good team" <– What if I know nothing about baseball? I assume I'm expected to put "Neutral". Should the person who knows nothing about it, and the person who knows a lot, be the same data point? No. Those are very different things. So one way to get around it is to include a "not applicable" option.

There are people who are ambivalent (not neutral), and people to whom the question doesn't apply. Distinguishing them can make a huge difference in the data.

Let me give you some caveats. When we have small samples, there are times that we will actually accept having the neutral and the not-applicable be on the same data point. Because if someone chooses "not applicable", you can't use that data in your analysis. (J: but you could collapse that later? because what about meta-evaluation?)

If you give them the option, it is amazing how it invites people not to think: "It's easy, I don't have to think!" You clean your data by putting it in, but you also get people saying "not applicable" when it probably was applicable to them.

On the other hand, you might lose about 30% of your sample, but be left with good, clean data.

Question: Would "decline to answer" be like "not applicable"? – A: Once you put in "decline to answer", people start thinking they have the right to decline to answer and feel powerful. – On phone surveys we'll have a code for "decline to answer", but we don't give them the option often.

Q: Would you front-load your survey with qualifying questions so you don't have to have a "not applicable"? A: Here's a problem: if there is a financial incentive to do the survey, people have a reason to lie and say that they qualify. We have a survey up now where you must be a college student; and then at the end of the survey...

You can choose to throw people out at the end. You don't want people to be too excited to do your survey because of extrinsic rewards, because they will lie to receive the reward.

Spread out your response set to obtain more variance (except in low-education samples)

One question we often get: On Likert scales, how many items should we have?




Pilot Testing: Don't trust your own judgement!

There was a 100-question survey that the school administration thought was way too long for their 7th grade students. The students, however, completed it in 6 minutes and didn't find it too long.

Don't trust your own judgement unless you're part of the target population; and even then, don't trust your judgement. Pilot test it.

Cognitive

Process Model of Answering a Survey Question

  1. Interpret
  2. Recall (J: process? Which can include recall, but can also use other operations like comparison, extrapolation, etc.?)
  3. Format answers
  4. Report

Does the respondent understand the question in the same way that the researcher wanted it to be understood? (J: Does this mean that there should be more redundancy then? It seems to be a balance between being concise and providing extra detail. . .)

Response alternatives will influence your answers

Most of what we just studied was about how to write items well. But that doesn’t take into account the order effects.

Back-translation is the gold standard when dealing with translation.

Order matters

The questions that come before influence your responses to the questions that come later.

You always have to think, "What am I activating?" What thoughts am I putting into people's heads?

Then put things in an order that minimizes order effects. If you are nervous, you can pilot test it, vary the order, and see how the answers come out each way.

Range Effect – the range of responses I give you is going to influence your answer. It can triple the number of times you say something exists. Open-ended questions are hard to answer.

Frequency Effect – You get nervous if it's the same answer over and over again. If you ask about a bunch of behaviors that are pretty low-occurring, after a while, people will...

One way around this is counter-balancing: So that there is a different question order each time someone gets a survey.
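The counterbalancing described above can be implemented by randomizing question order per respondent (for small question sets, rotating through all orderings systematically is another option). A minimal sketch; the question texts and the per-respondent seeding scheme are illustrative assumptions.

```python
import random

QUESTIONS = [
    "How often do you exercise?",
    "How often do you eat fast food?",
    "How satisfied are you with your health?",
]

def questionnaire_for(respondent_id):
    """Return the questions in a shuffled order seeded by respondent ID,
    so each respondent's order is reproducible but varies across respondents."""
    order = QUESTIONS[:]  # copy; never mutate the master list
    random.Random(respondent_id).shuffle(order)
    return order

print(questionnaire_for(1))
print(questionnaire_for(2))  # likely a different order
```

Seeding by respondent ID means you can later recover exactly which order each person saw, which is useful when testing whether order affected the answers.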

Resources

Sudman, Bradburn, and Schwartz have written some of the best resources on surveys.

As people go on in surveys, they will report doing something less and less.

These researchers argue that context effects operate at each stage of answering a question.

Context Effects at the Comprehension Stage

For example, what happens if a respondent encounters an ambiguous question?

When participants don't have a lot of knowledge, their attitudes may be weak, and the weaker the attitudes, the more they will be moved by context effects.

If they're knowledgeable, it doesn't matter as much. Psychologically, with an ambiguous question, you project, and your mood will affect you more.

Context Effects at the Judgement Stage

If you say, "Should a student caught plagiarizing get a second chance?" and then ask "Should a professor caught plagiarizing get a second chance?" you'll be more likely to get a "yes".

Formatting Stage: You can anchor the norm; ask a bunch of questions about professors doing the most evil things possible, then add something that's evil but not as evil (like plagiarism), and by comparison it will seem less serious.

Effect of rank ordering on subsequent ratings: If you go from positive to negative, the responses will be more favorable than if you go from negative to positive.

Most order effects you can work out through logic if you look at them. Inclusion/exclusion effects are one example.

Assimilation Effects: if respondents take the information that comes before and let it influence their answers, that is considered an assimilation effect.

If you want true data, start with general questions, then go specific. If you ask something very specific and you activate that, then unless you take steps otherwise, that question will have a greater effect on the later answer than it would otherwise.

If you go from a specific question to a general question, it's almost the same as saying "with ___ in mind, what do you think about..."

Contrast Effects

Subtraction-Based contrast effect:

Comparison-based contrast effect:

Dillman's Concerns:

Should people in Alabama be ready to accept...

Making a lot of specific things active will generally lead to an overall positive bias.

Implications for Questionnaire Construction

Keep in mind what mood people are going to be in when you are doing a survey.

  1. The content of the preceding question determines the information that becomes temporarily accessible in memory
  2. The number of preceding questions is important
  3. The generality of the target question is important.

A very general question is like an ink-blot.

(J: What about surveys that are designed so that there is a narrative to them, to increase energy and motivation in the survey respondent?)

Glossary

Exhaustion: "The totality of strain on a person"

– You could have a really long survey that energizes people, and they're more excited at the end of it than they were when they started.
– On the other hand, you could have a 2-question survey where...