Help with Chapter 5 end-of-chapter exercises

ANSWERS TO CHAPTER 5 EXERCISES

1. Why is bias considered more serious than random error?

Hints: They are both errors, but how do they differ? Which one introduces systematic error that could bias a study’s results? Which one averages out to zero? Which one poisons validity and which one dilutes it? The answers to these and other questions are on pages 147-150 of your text.

2. What are the two primary types of subject bias?

The two types of subject bias are social desirability bias and obeying demand characteristics.

What are the differences between these two sources?

With social desirability, participants try to make themselves look good. With obeying demand characteristics, participants try to make the researcher look good by giving the researcher the results that will support the hypothesis. Note that making responses anonymous would be much more likely to reduce social desirability bias than to reduce obeying demand characteristics.

3. Suppose a “social intelligence” test in a popular magazine had high internal consistency. What would that mean?

Hints: If a participant missed one question, would that person tend to miss most of the other questions? If a person got a certain question right, would that person tend to get most of the other questions right? If each question were considered a judge of social intelligence, would the “judges” agree with each other? The answers to these and other questions are on pages 171-173.

Why would you still want to see whether the test had discriminant validity?

Hint: What other than social intelligence might the test be measuring? If you can’t think of anything, re-read pages 180-183.

How would you do a study to determine whether the test had discriminant validity?

Hints: What tests besides the “social intelligence” test would you administer? What would it mean if scores on the social intelligence test correlated highly with scores on those other tests? What would it mean if scores on the social intelligence test did not correlate highly with scores on the other tests? If you need more help, re-read pages 180-183.

4. Given that IQ tests are not perfectly reliable, why would it be irresponsible to tell someone his or her score on an IQ test?

People tend to think that their IQ is exactly the same as their test score. Thus, if told their IQ score was 97, they would tend to think of their IQ as being exactly 97. However, scores contain random error, so someone who scored 97 one day might score 105 the next. Note that most tests are not perfectly reliable, although many are quite reliable. Even a quite reliable test, however, may give an individual noticeably different scores from one occasion to the next.
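To put a number on how far a single score might wander, classical test theory uses the standard error of measurement, SEM = SD × √(1 − reliability). The sketch below assumes an IQ-style scale with a standard deviation of 15 and a reliability of .90; both values are illustrative assumptions, not figures from the text. Centering a ±1.96 SEM band on the observed score is a common simplification, not the only way to build such an interval.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def score_band(observed: float, sd: float, reliability: float,
               z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% band (z = 1.96) centered on the observed score."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem


# Illustrative assumptions: SD = 15, reliability = .90, observed score = 97.
sem = standard_error_of_measurement(15, 0.90)
low, high = score_band(97, 15, 0.90)
print(f"SEM = {sem:.2f}; band for a score of 97: {low:.1f} to {high:.1f}")
```

With these assumed values the SEM is about 4.7 points and the band runs from roughly 88 to 106, so a retest score of 105 would be entirely unremarkable. Telling someone "your IQ is 97" overstates the precision of even a fairly reliable test.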

5. What is content validity?

Hint: Content validity is defined here and explained on page 176.

How does it differ from internal consistency?

Hint: Which one requires statistical evidence such as average inter-item correlation, Cronbach’s alpha, odd-even correlations, and Kuder-Richardson coefficients to establish that all the items on a scale seem to be measuring the same thing? Which one requires expert judgment that the items are consistent with the concept’s definition? Which one is concerned with showing that the items represent a fair sample of the construct’s key aspects, and which one is concerned with showing that all the items are measuring one thing? If you are not sure about your answers to these questions, refer to pages 176-179 of the text.

For what measures is it most important?

Hint: See paragraph 5 on page 176.

6. Swann and Rentfrow (2001) wanted to develop a test “that measures the extent to which people respond to others quickly and effusively.” In their view, high scorers would tend to blurt out their thoughts to others immediately and low scorers would be slow to respond.

a. How would you use the known-groups technique to get evidence of your measure’s construct validity?

You could see whether car salespeople scored higher than librarians.
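A known-groups comparison boils down to checking whether the group expected to score higher actually does. The sketch below computes Welch’s t statistic for the difference between two group means. The scores are invented purely to show the arithmetic, and turning t into a p-value (which needs the t distribution) is left to a statistics package.

```python
import math
import statistics


def welch_t(group_a: list[float], group_b: list[float]) -> float:
    """Welch's t: mean difference divided by its unpooled standard error."""
    se = math.sqrt(statistics.variance(group_a) / len(group_a)
                   + statistics.variance(group_b) / len(group_b))
    return (statistics.fmean(group_a) - statistics.fmean(group_b)) / se


# Hypothetical scale scores (not real data): higher = responds more quickly and effusively.
salespeople = [32, 35, 29, 38, 34, 31]
librarians = [24, 27, 22, 29, 25, 26]
t = welch_t(salespeople, librarians)
print(f"Welch's t = {t:.2f}")
```

A large positive t, as with these invented numbers, is the pattern that would support the measure’s construct validity; a t near zero, or a negative t, would count against it.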

b. What measures would you correlate with your scale to make the case for your measure’s discriminant validity?

Extraversion and social desirability.

Why?

Extraversion: Your claim is that your measure is doing something other than measuring outgoingness. Social desirability: It is usually good to show that you are not just measuring a response bias.

In what range would the correlation coefficients between those measures and your measure have to be to provide evidence of discriminant validity? Why?

For extraversion, you would be satisfied with a correlation between .3 and .7. You expect that the trait would be related to extraversion. Thus, you would expect your measure to correlate with a measure of extraversion, but you would certainly want it to be below .8—otherwise, it may just be a measure of extraversion. For social desirability, you would like a correlation around 0 (in the -.2 to +.2 range) because you do not think that your trait is related to social desirability. If your trait is not related to social desirability, your measure of that trait should not be related to social desirability.

c. To provide evidence of convergent validity, you could correlate scores on your measure with a behavior typical of people who blurt out their thoughts. What behavior would you choose? Why?

Interrupting others, talking during movies, or responding to rude behavior, because people who blurt out their thoughts might not be able to stop themselves from doing these things.

7. A researcher wants to measure "aggressive tendencies." The researcher is considering two choices: a paper and pencil test of aggressive impulses or observation of actual aggression.

a. What problems might there be with observing participants' aggressive behavior?

Hints: In Box 5.3, consider points 1b and 7. To see how to solve these problems, refer to Table 5.1.

b. What would probably be the most serious threat to the validity of a paper-and-pencil test of aggression?

Hint: See pages 155-160.

What information about the test would suggest that the test is a good instrument?

Hint: See p. 184: Both Figure 5.7 and Table 5.4 are helpful.

8. Think of a construct that you would like to measure.

a. Name that construct.

No one right answer.

b. Define that construct.

The definition should be drawn from a dictionary, a psychological dictionary, or a theory.

c. Locate two published measures of that concept (see Web Appendix B).

No one right answer.

d. Develop a measure of that construct.

e. What could you do to improve or evaluate your measure’s reliability?

· use machines to record behavior

· simplify the observer's task

· train and motivate observers

· provide clear-cut guidelines on scoring

· re-check observers' ratings

· standardize the way the measure is administered

· calculate a test-retest reliability coefficient

f. If you had a year to try to validate your measure, how would you go about it? (Hint: Refer to the different kinds of validities discussed in this chapter.)

Validation strategies would include

· Assessing measure's reliability

· Assessing convergent validity

· Assessing discriminant validity

· Assessing content validity

g. How vulnerable is your measure to subject and observer bias? Why? Can you change your measure to make it more resistant to these threats?

To make the measure less vulnerable to subject bias

Prevent participants from knowing what behavior is being observed by

· observing them in a “non-research” setting

· using unobtrusive observation

· using unobtrusive measures

· using unexpected measures

Prevent participants from knowing what concept you are trying to measure by

· using disguised measures

· overwhelming participants with measures

Use behaviors that participants won't readily change by using

· physiological measures

· important behavior

To make the measure less vulnerable to observer bias

· Don't use human observers—use machines instead.

· If you must use human observers, keep them “blind” so that they do not know what results are expected

· Reduce memory biases by permanently recording the behavior

· Re-check observers' ratings

· Clearly define the rating categories

· Train and motivate raters

· Use only the raters who were successful during training

9. What problems do you see with measuring "athletic ability" as 40-yard dash speed? What steps would you take to improve this measure? (Hint: Think about solving the problems of bias and reliability.)

Hints:

Is there more to athletic ability than 40-yard dash speed? If so, what?
What problems might result from hand-timing 40-yard dashes? Could equipment be used to reduce this human error?
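Beyond switching to electronic timing, another way to address reliability is to time several dashes and average them. If the trials can be treated as parallel measurements (an assumption, since fatigue or practice effects could violate it), the Spearman-Brown formula predicts the reliability of a k-trial average from the reliability r of a single trial: r_k = k·r / (1 + (k − 1)·r). The single-trial reliability of .60 below is an invented value.

```python
def spearman_brown(r_single: float, k: int) -> float:
    """Predicted reliability of the mean of k parallel trials."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * r_single / (1 + (k - 1) * r_single)


# Hypothetical single-trial reliability for a hand-timed 40-yard dash.
R_SINGLE = 0.60
for k in (1, 2, 3, 5):
    print(f"{k} trial(s): predicted reliability = {spearman_brown(R_SINGLE, k):.2f}")
```

Under these assumptions reliability climbs from .60 for one trial to about .88 for five, with smaller gains from each extra trial. Averaging does nothing about bias, though: a timer who consistently starts the watch late shifts every trial in the same direction.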

10. Think of a factor that you would like to manipulate.

a. Define this factor as specifically as you can.

No one correct answer.

b. Find one example of this factor being manipulated in a published study. Write down the reference citation for that source.

No one correct answer.

c. Would you use an environmental or instructional manipulation? Why?

No one correct answer.

d. How would you manipulate that factor? Why?

Answer should focus on

· standardization

· reducing experimenter bias

· reducing subject biases, including the use of a placebo treatment

· consistency with theoretical definitions of the construct

· evidence that the manipulation is effective, such as the results of manipulation checks from other studies

e. How could you perform a manipulation check on the factor you want to manipulate? Would it be useful to perform a manipulation check? Why or why not?

There is no one answer to how to perform the manipulation check. However, there are clearer answers to the next two questions. Generally, it is a good idea to perform a manipulation check because you should not simply assume that participants interpreted the manipulation the way you intended. The manipulation check provides evidence that the treatment is valid (if it is) and may tell you where your study went wrong (if the treatment manipulation is not valid). Thus, if the study doesn't support the hypothesis, the manipulation check may help you determine whether it was the hypothesis or the manipulation that was faulty.
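One way to quantify a manipulation check is to compare the check ratings across conditions: if the manipulation worked, the groups should differ in the intended direction. The sketch below computes Cohen’s d with a pooled standard deviation. The "time pressure" manipulation and the 1-7 ratings are hypothetical, chosen only to illustrate the calculation.

```python
import math
import statistics


def cohens_d(treatment: list[float], control: list[float]) -> float:
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    pooled_var = ((n1 - 1) * statistics.variance(treatment)
                  + (n2 - 1) * statistics.variance(control)) / (n1 + n2 - 2)
    return (statistics.fmean(treatment) - statistics.fmean(control)) / math.sqrt(pooled_var)


# Hypothetical answers to "How rushed did you feel?" (1 = not at all, 7 = extremely).
high_time_pressure = [6, 5, 7, 6, 5]
low_time_pressure = [3, 4, 2, 3, 4]
print(f"manipulation check d = {cohens_d(high_time_pressure, low_time_pressure):.2f}")
```

A large d in the expected direction is evidence that participants experienced the manipulation as intended. If the main results were null and d were also near zero, the manipulation, not necessarily the hypothesis, would be the first suspect.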