Research Design Explained (7th edition)
1.
2.
Operational definitions help psychology
a.
be
objective because they provide objective definitions of terms.
b.
make
testable statements because they allow us to define our concepts in specific
and observable terms.
c.
be
public because operational definitions define terms in publicly observable
ways. These “recipes” can be shared.
d.
be
productive because the operational definition provides a recipe that other
scientists can follow to repeat and build on the original researcher’s
study.
3.
4.
Match the following to the qualities of
science.
_a__ testable a.
learning from mistakes
_e__ skeptical b.
“show us the evidence”
_b__ open-minded c.
avoid bias
_c__ objective d.
publishing studies
_d__ public e.
question authority
_f__ productive f.
science works
5.
6. The characteristic of science that is threatened is
finding general rules. One implication is that psychology’s inability to
predict
7.
8. Astrology fails on all three counts.
a. It often makes statements that are too vague to be
testable.
b. It is not productive: It hasn’t changed in 2,000
years.
c. Astrologers do not seek objective evidence of their
accuracy. They prefer to talk only about their successes (perhaps because their
track record is not very good).
9.
10. Psychoanalysis is attacked for
a.
not being testable because it makes after-the-fact interpretations rather than
predictions
b.
not producing observable evidence (if the unconscious can’t be observed),
which, in turn, leads to it being untestable as well
as unproductive.
c.
not being productive–if the effectiveness of psychoanalysis has not
improved
1.
2.
Match the threat to the
type of validity
_a_
construct validity |
a. poor
measure |
_c_
external validity |
b. treatment and no treatment groups were unequal before the study
began |
_b_
internal validity
|
c.
small, biased sample of participants |
3.
4.
You could argue that it is unethical to give
patients an unproven treatment. People
are coming to (and paying) the
therapist for help. If the therapist's “help” has not been tested,
it may produce real harm--or prevent the patient from getting treatment that
has been proven to be effective. Some would argue that if you are using an
unproven treatment, you should at least tell clients that the treatment is
unproven and you should not charge them for such treatments. You could argue
that it is ethical to withhold a treatment that is believed to work because you
do not yet know if it works or not.
That is, you should not give the treatment until you are sure that it
does indeed work.
5.
6.
a.
To make an informed decision about the ethics
of a research project, we must be able to weigh the pros and the cons. If we
cannot accurately assess the harm that might result from a study, we cannot
make an intelligent decision about whether a study should be done. Thus, we may approve research that
causes great harm. It could be argued that, rather than taking that risk, we
should not do research.
b.
Because we are so ignorant of human
behavior, we need to do research so we can become less ignorant. Our ignorance and lack of understanding
result in harm every single day. We have a duty to become less ignorant, more
helpful, and more understanding of ourselves and of others.
c.
The following principles appear to have
been violated.
i.
Participants did not fully understand the
amount of shock that they would be asked to administer.
ii.
Participants did not feel free to quit
the study at any point.
iii.
Milgram
did not seek permission from an institutional review board (such committees did not
exist back then).
d.
The following principles appear to have
been violated.
i.
The “prisoners” didn't feel
free to quit the study at any point.
ii.
The investigators failed to anticipate
all possible risks to the participants.
1.
2.
According to dissonance theory, the
following factors might moderate the effects of writing a counterattitudinal
essay: the extent to which you believe
that others will know you wrote the essay, the extent to which you believe that
your essay will influence others, the extent to which you believe that you
voluntarily chose to write the essay.
3.
4.
Find a research article that tests a
hypothesis derived from theory.
Give the citation for the article and describe the main findings.
No
set answer.
5.
6.
Design a study to improve the construct
validity of the study reported in Appendix D.
No
set answer.
7.
8.
The study reported in Appendix B finds a
relationship between variables.
Design a study to map out the functional relationship between those two
variables.
No
set answer.
9.
10.
In terms of the null hypothesis, what's
wrong with the following research conclusions:
a.
There is no difference in outcome among
the different psychological therapies.
b.
Viewing television violence is not
related to aggression.
c.
There are no gender differences in
emotional responsiveness.
All these conclusions are based on
assuming that the null hypothesis can be proven. It cannot. Failing to find a
difference or an effect does not mean there is no
difference or no effect. It just means
you failed to find a difference.
There are no set answers to any of these
exercises.
1.
2.
List four basic tactics for reducing the
possibility of subject bias.
The
four basic tactics are:
a. Not letting participants know that you
are observing them.
b. Letting participants know that you are
observing them, but not letting them know what particular behavior you are
observing.
c. Letting participants know what behavior
you are interested in, but not letting them know what the behavior really
measures.
d. Choosing a response that most
participants couldn't or wouldn't change
3.
4.
What is discriminant
validity? Why is it necessary?
Discriminant validity involves showing that your
measure does not correlate too highly with something it shouldn’t. Discriminant validity and convergent validity work
together. Convergent validity is necessary to know that your measure correlates
with what it should. Discriminant validity is
necessary to show that your measure does not correlate too highly with what it should
not. To take but one example, suppose that you had a measure of a
person’s weight that was based on the length of their arms. Such a
measure would correlate with other measures of weight (convergent validity),
but it would also correlate even more highly with other measures of height.
Thus, the measure would not have discriminant
validity relative to height. In short, you could easily have an invalid measure
that correlates with what it’s supposed to measure (has convergent validity),
but correlates even more highly with a related construct (lacks discriminant validity).
5.
6.
What is content validity? For what measures is it most important?
Content
validity is the extent to which a measure represents a balanced and adequate sampling
of relevant dimensions, knowledge, and skills. It is most important for
classroom tests and other tests of knowledge and skill.
7.
8.
Think of a construct that you would like
to measure.
a.
Name that construct—No one right answer
b.
Define that construct
Definition
should be drawn from a dictionary, psychological dictionary, or theory
c.
Locate two published measures of that
concept (see Web Appendix B).
No
one right answer.
d.
Develop a measure of that construct.
e.
What could you do to improve or evaluate
your measure’s reliability?
· use machines to record behavior
· simplify the observer's task
· train and motivate observers
· provide clear-cut guidelines on scoring
· re-check observer's ratings
· standardize the way the measure is administered
· calculate a test-retest reliability coefficient
f. If
you had a year to try to validate your measure, how would you go about it?
(Hint: Refer to the different kinds of validities discussed in this chapter.)
Validation
strategies would include
· Assessing measure's reliability
· Assessing convergent validity
· Assessing discriminant validity
· Assessing content validity
g. How
vulnerable is your measure to subject and observer bias? Why? Can you change
your measure to make it more resistant to these threats?
To
make the measure less vulnerable to subject bias
Prevent participants from knowing what
behavior is being observed by
· observing them in a “non-research” setting
· using unobtrusive observation
· using unobtrusive measures
· using unexpected measures
Prevent
participants from knowing what concept you are trying to measure by
· using disguised measures
· overwhelming participants with measures
Use behaviors that participants won't
readily change by using
· physiological measures
· important behavior
To make the measure less vulnerable to observer bias
· Don't use human observers; use machines instead.
· If you must use human observers, make them “blind.”
· Reduce memory biases by permanently recording the behavior
· Re-check observer's ratings
· Clearly define the rating categories
· Train and motivate raters
· Use only the raters who were successful during training
9.
10. Think
of a factor that you would like to manipulate.
a.
Define this factor as specifically as you
can.
No one correct answer.
b.
Find one example of this factor being
manipulated in a published study. Write down the reference citation for that
source.
No one correct answer.
c.
How would you manipulate that factor?
Why?
Answer should focus on
· standardization
· reducing experimenter bias
· reducing subject biases, including the use of a placebo treatment
· consistency with theoretical definitions of the construct
· evidence that the manipulation is effective, such as the results of manipulation checks from other studies
d.
How could you perform a manipulation check
on the factor you want to manipulate? Would it be useful to perform a
manipulation check? Why or why not?
There is no
one answer to how to perform the manipulation check. However, there are clearer
answers to the next two questions. Generally, it is a good idea to perform a
manipulation check because one should not simply assume that a manipulation was
interpreted the way that we wanted it to be interpreted. The manipulation check
provides evidence that the treatment is valid (if it is) and may tell you where
your study went wrong (if the treatment manipulation is not valid). Thus, if
the study doesn't support the hypothesis, the manipulation check may help in
determining whether it was the hypothesis or the manipulation that was faulty.
1.
2.
List the scales of measurement in order
from least to most accurate and informative.
Nominal, ordinal, interval, and ratio.
3.
4.
Assume that facial tension is a measure
of thinking.
a.
How would you measure facial tension?
Facial tension could be measured as the number of lines a
person gets on his/her face during times of stress or by measuring electrical
activity of facial muscles.
b.
What scale of measurement is it on? Why?
You might assume that it is an ordinal scale (more tension
means more thinking). Certainly, it would not be safe to assume that you had a
ratio scale (twice as much tension means twice as much thinking) and it would
probably not be safe to assume that you had an interval scale.
c.
How sensitive do you think this measure
should be? Why?
The measure of lines on the face might not be sensitive
(there would be a small range of scores and some random observer error).
However, the measure of electrical activity in the facial muscles might be
extremely sensitive.
5.
6.
In an ideal world, car gas gauges would
be on what scale of measurement? Why?
Ratio. You would want the best measurement possible. It
would be nice to know that if you registered having half a tank you really had
half a tank.
7.
8.
Find or invent a measure.
a.
Describe the measure.
No one correct answer.
b.
Discuss how you could improve its
sensitivity.
No one correct answer.
c.
What kind of data (nominal, ordinal,
interval, or ratio), do you think that measure would produce? Why?
No one correct answer.
1.
2.
Steinberg & Dornbusch
(1991) also reported that the correlation between hours of employment and
interest in school was statistically significant. Specifically, they reported
that r(3,989)= -.06, p<.001. [Note that the r (3,989) means that they had
3,989 participants in their study.] Interpret this finding.
The more hours students worked, the less likely they were to
be interested in school. However, this effect was extremely small and is only
significant because the researchers, by using almost 4,000 participants, had an
extremely powerful design. The effect is so small that for practical purposes
it is meaningless. Put another way, the coefficient of determination is only
.0036, meaning that the relationship explains almost none (0.0036 is not that
far from 0.00) of the variation in interest in school.
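The arithmetic behind the coefficient of determination can be checked with a short sketch. The function name is illustrative only; the value of r is the one reported in the exercise.

```python
def coefficient_of_determination(r: float) -> float:
    """Return r squared: the proportion of variance the correlation accounts for."""
    return r ** 2


r = -0.06  # correlation between hours of employment and interest in school
r_squared = coefficient_of_determination(r)

# A tiny r squared means the relationship explains very little of the variation,
# even though p < .001 with nearly 4,000 participants.
print(f"r = {r:+.2f} -> r^2 = {r_squared:.4f} ({r_squared:.2%} of the variance)")
```

Squaring removes the sign, so the same proportion would be obtained for r = +.06.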
3.
4.
In the same study, sex was coded as 1=
male, 2= female. The correlation between sex and aerobic fitness was -.58, which was statistically significant at
the p<.01 level.
a.
In this study, were men or women more
fit?
Men were more aerobically fit.
b.
What would the correlation have been if
sex had been coded as 1= female and 2= male?
+.58.
c.
From the information we have given you, can
you conclude that one gender tends to be more aerobically fit than the other?
Why or why not?
No, because you do not know if the sample of men and women
was a representative random sample of all men and women.
5.
6.
A physician looked at 26 instances of crib
death in a certain town. The physician found that some of these deaths were due
to parents suffocating their children. As a result, the physician concluded
that most crib deaths in this country are not due to problems in brain
development, but to parental abuse and neglect. What problems do you have with
the physician's conclusions?
First, the physician generalized from a small and limited
sample to the entire country. Second, the physician made an inference about the
percentage of instances in the larger population (that most crib deaths are due to
parental neglect) without doing any statistical test of this assertion.
1-3.
Open-ended.
4.
Question Format | Advantages | Disadvantages
Nominal-dichotomous | • Easy to answer • Easily and objectively scored • High reliability | • Participants may dislike • Participants’ viewpoints may not be represented • Provides only nominal data • Deprives study of power because (1) measure is insensitive and (2) may require use of less powerful statistical techniques
Likert-type | • Easy to answer • Easily and objectively scored • High reliability • Sensitive • Provides interval data • Can be analyzed with powerful statistical tests • Potential for summating scores | • Participants may resist fixed-response format
Open-ended | • Allows participants freedom to respond as they choose • Good for exploratory research | • Time-consuming to answer • Time-consuming to score • Hard to score objectively
5-8.
Open-ended.
9.
10.
Why might having participants sign
informed consent forms make the study less ethical?
If the survey is anonymous, innocuous, and doesn’t elicit
sensitive information from participants, informed consent is not required.
Filling out the informed consent form might make the study less ethical by
making the participants’ involvement in the study less confidential and
more time-consuming without providing any benefits to the participant.
1.
2. In all of the following cases, the researcher
wants to make cause-effect statements. What threats to internal validity is the
researcher apparently overlooking?
a. Employees are interviewed on job satisfaction. Bosses undergo a three-week training program. When employees are re-interviewed, dissatisfaction seems to be even higher. Therefore, the researcher concludes that the training program caused further employee dissatisfaction.
History--other
events besides the training program have happened in the past three weeks. For
example, layoffs or salary cuts could have occurred.
Instrumentation--The
interviewer may have developed more rapport and been more direct in the second
interview. Thus, the second time around, the measure was not the same and not
administered the same way.
b. After completing a voluntary workshop on
improving the company's image, workers are surveyed. Workers who attended the
workshop are now more committed
than those in the "no-treatment" group who did not attend the
workshop. Researcher's conclusion: The workshop made workers more committed.
Obvious
selection problem--even before the workshop, volunteers were probably more
committed than non-volunteers.
c. After a 6-month training program, employee
productivity improves. Conclusion: Training program caused increased
productivity.
Maturation:
New workers might have naturally improved their skills
over
that period.
History:
Other events (a new incentive system, a better supervisor, better technology)
that happened over the last six months could be responsible for the rise in
productivity.
Regression:
Would be a likely problem if training was instituted because productivity was
at an all time low.
Mortality:
Poorer workers may have left the company.
d. Morale is at an all-time low. As a result,
the company hires a "humor consultant." A month later, workers are
surveyed and morale has improved. Conclusion: The consultant improved morale.
Regression
is the most likely suspect.
Also likely are:
Mortality
(unhappy people leaving)
History (management making
other changes)
e.
Two groups of workers are matched on commitment to the company. One group is asked
to attend a two-week workshop on improving the company’s image, the other
is the no-treatment group. Workers who complete the workshop are more committed
than those in the “no-treatment” group. Researcher’s
conclusion: The workshop made workers more committed.
Selection
(not all workers who are asked will go) and mortality (people dropping out) are
prime suspects.
3.
4. How could a quack psychologist or doctor take
advantage of regression toward the mean to make it appear that certain phony
treatments actually worked?
If the quack takes people who are feeling
unusually bad, those people will tend to improve on their own. That is, they
will naturally rebound to their normal levels of health or happiness and the
quack can take the credit.
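A small simulation can make this rebound concrete. It is only a sketch under stated assumptions: well-being is modeled as a stable "true" level plus independent day-to-day noise, and the means, standard deviations, and 10% cutoff are arbitrary illustrative choices, not values from the text.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is repeatable


def simulate_regression(n_people=10_000, worst_fraction=0.10):
    """Return (mean time-1 score, mean time-2 score) for the people who scored worst at time 1.

    Each observed score = stable true well-being + independent random noise.
    No treatment of any kind is applied between the two measurements.
    """
    true_scores = [random.gauss(50, 10) for _ in range(n_people)]
    time1 = [t + random.gauss(0, 10) for t in true_scores]
    time2 = [t + random.gauss(0, 10) for t in true_scores]

    # The "quack" recruits only the people who feel unusually bad at time 1.
    cutoff = sorted(time1)[int(n_people * worst_fraction)]
    selected = [i for i in range(n_people) if time1[i] <= cutoff]

    return mean(time1[i] for i in selected), mean(time2[i] for i in selected)


before, after = simulate_regression()
print(f"Selected group, time 1: {before:.1f}; time 2 (no treatment): {after:.1f}")
```

In this sketch the selected group scores higher at time 2 even though nothing was done to them, which is the improvement a quack could claim credit for.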
5.
6. Suppose a memory researcher administers a
memory test to a group of residents at a nursing home. He finds a group of
grade school students that score the same as the older patients on the memory pretest. He then administers
an experimental memory drug to the older patients. A year later, he gives both
groups a posttest.
If the researcher finds that the older patients
now have a worse memory than the grade school students, what can the researcher
conclude? Why?
Nothing--the results could be due to a
selection by maturation interaction due to the school children's memories
improving and the older patients' memories naturally staying the same or
declining slightly.
If the researcher finds that the older patients
now have a better memory than the grade school students, what can the
researcher conclude? Why?
The researcher might have an easier time
concluding that the drug improves memory because the difference is opposite of
what would be expected on the basis of selection by maturation interactions. However,
history effects are still possible (if other interventions are going on at the
nursing home) and regression might be possible (if the children selected had
unusually high scores for their grade level).
7.
8. What is the difference between testing and
instrumentation?
The difference between testing and
instrumentation is that in testing participants may remember things from the
previous test and therefore score higher, whereas in instrumentation, the
actual measuring instrument changes or the way it is administered changes.
9.
10. What is the difference between internal and
external validity?
Internal
validity
refers to whether you can make the statement that, in a given study, with these
participants, the treatment caused an effect.
External
validity
refers to whether you can generalize
what you discovered in a particular study to other people, situations, and
times.
1.
2. Participants are randomly assigned to meditation or no meditation
condition. The meditation group meditates three times a week. The meditation
group reports being significantly more relaxed than the no meditation group.
a.
Why might the results of this experiment be less clear-cut than they may first
appear?
There
is a construct validity problem. The meditation group may feel more relaxed
because of a placebo effect or they may simply report being more relaxed
because they think that is what the experimenter wants them to say. In
addition, it may be that the tense people dropped out of the experimental group
because they were unable or unwilling to keep to the schedule of meditating
three times a week.
b.
How would you improve this experiment?
The
experiment could be improved by improving the control group. For example, the
control group might be assigned to keep to a schedule where they would listen
to classical music three times a week. Alternatively, they might be asked to
keep to a schedule where they would have “quiet time” three times a
week.
3.
4.
A training program significantly improves worker performance. What should you
know before advising a company to invest in such a training program?
You
should know how big the difference
was. A statistically significant difference may not be big enough to be worth
paying for.
5.
6.
Students were randomly assigned to two different strategies of studying for an
exam. One group used visual imagery, the other group was told to study their
normal way.
The
visual imagery group scored 88% on the test as compared to 76% for the
control group. This difference was not significant.
a.
What, if anything, can the experimenter conclude?
Nothing--null
results are inconclusive.
b.
If the difference had been significant, what would you have concluded? What
changes in the study would have made it easier to be sure of your conclusions?
Imagery
seems to improve recall. We would be more confident of our conclusions if they
hadn't used an “empty control group.” Ideally, the control group
would have gotten some placebo-type treatment (a lecture on the importance of
studying).
c.
"To be sure that they are studying the way they should, why don't you have the imagery
people form one study group and have the control group form another study
group." Is this good advice? Why or why not?
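This is bad advice because that would mean violating independence: members of a study group influence one another, so their scores would no longer be independent observations.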
d.
"Just get a random sample of students who typically use imagery and
compare them to a sample of students who don't use imagery. That will do the
same thing as random assignment" Is this good advice? Why or why not?
This is bad advice. Random sampling is very different from random
assignment. People who typically use imagery may differ from people who don't
typically use imagery in a wide variety of ways. They are probably more visual
thinkers and may do better in art, architecture, geometry, and chemistry than
people who do not typically use imagery.
7.
8.
Gerald's dependent measure is the order in which people turned in their exam
(1st, 2nd, 3rd, etc.). Can Gerald use a t
test on this data? Why or why not? What would you advise Gerald to do in future
studies?
Gerald
should not use a t test because he
has ordinal data. Because he has
ordinal data, computing means for the control group and the experimental group
(a first step in doing a t test)
would be misleading. Next time, Gerald should record at what time people turned
in their exam. Then, Gerald would have data that were at least interval.
9.
10.
Are the results of experiment A or
experiment B more likely to be significant? Why?
EXPERIMENT A: CONTROL GROUP | EXPERIMENT A: EXPERIMENTAL GROUP | EXPERIMENT B: CONTROL GROUP | EXPERIMENT B: EXPERIMENTAL GROUP
3 | 4 | 3 | 4
4 | 5 | 4 | 5
5 | 6 | 5 | 6
 | | 3 | 4
 | | 4 | 5
 | | 5 | 6
 | | 3 | 4
 | | 4 | 5
 | | 5 | 6
Experiment
B’s results are more likely to be statistically significant because it
studied more participants. Having more participants allows random error more
opportunities to balance out. Consequently, with more participants, a moderate
difference between the groups is less likely to be due to chance alone. When we
do the calculations, we find that for Experiment A t = 1.225, which is not significant, and that for Experiment B, t = 2.449, which is significant.
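The two t values can be reproduced with a short sketch of the pooled-variance independent-groups t test. The function name is illustrative; the scores are those in the table for Experiments A and B.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group1, group2):
    """Return (t, df) for a pooled-variance independent-groups t test."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group2) - mean(group1)) / standard_error, df


control_a, experimental_a = [3, 4, 5], [4, 5, 6]
# Experiment B contains the same scores three times over (9 participants per group).
control_b, experimental_b = control_a * 3, experimental_a * 3

for label, control, experimental in [("A", control_a, experimental_a),
                                     ("B", control_b, experimental_b)]:
    t, df = independent_t(control, experimental)
    print(f"Experiment {label}: t({df}) = {t:.3f}")
```

The mean difference is 1 point in both experiments; only the standard error shrinks as the sample grows. For reference, the two-tailed .05 critical values of t are 2.776 for 4 degrees of freedom and 2.120 for 16.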
1.
2.
Suppose people living in homes for the elderly were randomly assigned to two
groups: a no treatment group and a
transcendental meditation (TM) group.
Transcendental meditation involves more than sitting with eyes
closed. The technique involves both
a "mantra, or meaningless sound selected for its value in facilitating or
settling down process and a specific procedure for using it mentally without
effort again to facilitate transcending" (Alexander, Langer, Newman,
Chandler, & Davies, 1989).
Thus, the TM group was given instruction in how to perform the
technique, then "they met with their instructors 1/2 hour each week to
verify that they were meditating correctly and regularly. They were to practice their program 20
minutes twice daily (morning and afternoon) sitting comfortably in their own
room with eyes closed and using a timepiece to ensure correct length of
practice." (Alexander, Langer,
Newman,
a.
Could the researcher conclude that it was the transcendental meditation that
caused the effect?
No,
because the control group was an empty control group.
b.
What besides the specific aspects of TM could cause the difference between the
two groups?
The
extra attention the TM group received, the structure of a routine that was
imposed on the TM group, as well as the fact that those who weren't able to
learn the TM technique or who didn't continue to apply the technique would be
dropped from the study. Thus, people may be dropping out of the experimental
group, but not out of the control group.
c.
What control groups would you add?
A
group that had to undergo some training (e.g., critical thinking) and would
have to practice what they had learned twice a day and meet with their
instructors once a week.
d.
Suppose you added these control groups and then got a significant F for the treatment variable? What could
you conclude? Why?
Conclusion:
That at least one of the groups differs from the others. In other words, at
least one of the treatments had an effect. However, we would not be able to say
which groups differed from each other until we did a post hoc test.
3.
4.
Assume a researcher is looking at the relationship between caffeine consumption
and sense of humor.
a.
How many levels of caffeine should the researcher use? Why?
At
least three because the relationship might be nonlinear. For example, people
might have little sense of humor with no caffeine (they're not awake) and
little with an extreme amount of caffeine (they are too hyped up and
irritable), but a good sense of humor under moderate levels of caffeine. Using
three or more levels of caffeine
would allow us to detect some nonlinear trends and help us make predictions
about the effects of levels of caffeine that we had not directly tested.
b.
What levels would you choose? Why?
Three
to four levels. A no caffeine group, a low caffeine group, a moderate caffeine
group, and a high caffeine group. Make sure that the amounts of caffeine are
evenly spaced (e.g., 0 mg, 20 mg, 40 mg, 60 mg) so that trend analyses can be
performed.
c.
If a graph of the data suggests a curvilinear relationship, can the researcher assume
that the functional relationship between the independent and dependent variable
is curvilinear? Why or why not?
No—the
researcher must do a post hoc trend analysis to make sure the observed pattern is
reliable.
d. Suppose the researcher used the
following four levels of caffeine: 0 mg., 20 mg., 25 mg., 26 mg. Can the
researcher do a trend analysis? Why or why not?
No—the levels are not evenly or proportionately
spaced.
e.
Suppose the researcher ranked participants
based on their sense of humor. That is, the person that laughed least got a
score of "1", the person who laughed second least got a
"2", etc. Can the
researcher use this data to do a trend analysis? Why or why not?
No—you
need at least interval scale measurement to do a trend analysis. Ranked data is
only ordinal.
f.
If a researcher used 4 levels of caffeine, how many trends can the researcher
look for?
3 (one less than the number of levels)
What is the
treatment's degrees of freedom?
3 (also one less than the number of
levels)
g.
If the researcher used 3 levels of caffeine and 30 participants, what are the
degrees of freedom for the treatment?
2
the degrees of freedom
for the error term?
27
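The degrees-of-freedom answers in parts f and g follow two simple rules, sketched below. The function name is illustrative and not taken from the text.

```python
def one_way_anova_df(levels: int, participants: int):
    """Return (treatment df, error df) for a one-way between-subjects ANOVA."""
    treatment_df = levels - 1           # one less than the number of levels
    error_df = participants - levels    # participants minus number of groups
    return treatment_df, error_df


# Part g: 3 levels of caffeine and 30 participants.
print(one_way_anova_df(levels=3, participants=30))
```

The treatment degrees of freedom also equal the number of trends that can be tested, which is why 4 levels allow 3 trends.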
h.
Suppose the F is 3.34. Referring to the degrees of freedom you
obtained in your answer to "g" (above) and to Table E-3, are the results
statistically significant?
No--if
the significance rule is that p <
.05
Can the researcher
look for linear and quadratic trends?
No—if
the results are not statistically significant, then the researcher cannot look
for trends.
5.
6.
A friend gives you the following Fs
and significance levels. On what basis would you want these Fs (or significance levels) re-checked?
a.
F (2, 63)=.10, not significant
Even
when the treatment has no effect, F's
rarely tend to be zero. Instead, they are usually closer to 1.00. After all, if
there is no treatment effect, then, at a conceptual level, you are dividing an
estimate of error variance by another estimate of the same error variance.
Dividing anything by itself should result in a number close to 1.
b.
F (3, 85) = -1.70, not significant
F’s can’t be negative. You are
dividing a squared term by another squared term.
c.
F (1, 120)= 52.8, not significant
Such
a large F with so many degrees of freedom would have to be significant. Indeed,
according to the F table in Appendix
E, the critical value of F(1,120) is
3.92.
d.
F (5, 70) = 1.00, significant
F's close to one are rarely significant. An F of one is expected even when there is
absolutely no effect. Indeed, the lowest critical value of F on the entire F table
in Appendix E is 1.46, and that's for an F with 30 and an infinite number of degrees of freedom.
7.
8.
Complete the following table.
Source of variance (SV) | Sum of squares (SS) | Degrees of freedom (df) | Mean square (MS) | F
Treatment (T) | 50 | 5 | 10 | 2.5
Error (E) | 100 | 25 | 4 |
Total | 150 | 30 | |
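The completed entries can be checked from the sums of squares and degrees of freedom. This is a minimal sketch; the function name and dictionary keys are illustrative only.

```python
def complete_one_way_table(ss_treatment, df_treatment, ss_error, df_error):
    """Fill in the mean squares, F, and totals of a one-way ANOVA summary table."""
    ms_treatment = ss_treatment / df_treatment  # MS = SS / df
    ms_error = ss_error / df_error
    return {
        "MS treatment": ms_treatment,
        "MS error": ms_error,
        "F": ms_treatment / ms_error,           # F = MS treatment / MS error
        "SS total": ss_treatment + ss_error,
        "df total": df_treatment + df_error,
    }


table = complete_one_way_table(ss_treatment=50, df_treatment=5,
                               ss_error=100, df_error=25)
for name, value in table.items():
    print(f"{name}: {value}")
```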
1.
2. Can you have an interaction without a
main effect?
Yes. Having a main
effect has no impact on whether you will have an interaction.
3.
4. Describe the pattern of results in the
following table in terms of main effects and interactions. Assume that all
differences are statistically significant.
Attitude Change
Rate of Speech | Low Status Speaker | High Status Speaker
Slow | 10 | 15
Fast | 20 | 25
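Main effect for rate of speech, main effect for status, and no
interaction: the high-status advantage is 5 points at both the slow rate (15 vs. 10) and the fast rate (25 vs. 20).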
5.
6.
The following table is an ANOVA summary table of a study looking at the effects
of similarity and attractiveness on liking. Complete the table. Then, answer
these three questions.
a.
How many participants were used in the study?
60
b.
How many levels of similarity were used?
2
c.
How many levels of attractiveness were used?
3
SV | SS | df | MS | F
Similarity (S) | 10 | 1 | 10 | 1
Attractiveness (A) | 40 | 2 | 20 | 2
S X A interaction | 400 | 2 | 200 | 20
Error | 540 | 54 | 10 |
Total | 990 | 59 | |
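The completed table and the three answers can be cross-checked with the sketch below. The variable names are illustrative; the SS and df values are those in the table.

```python
rows = {  # source: (SS, df)
    "Similarity (S)": (10, 1),
    "Attractiveness (A)": (40, 2),
    "S X A interaction": (400, 2),
    "Error": (540, 54),
}

# Mean square = SS / df; each F = that effect's MS divided by the error MS.
ms = {source: ss / df for source, (ss, df) in rows.items()}
f_ratios = {source: ms[source] / ms["Error"] for source in rows if source != "Error"}

total_ss = sum(ss for ss, _ in rows.values())
total_df = sum(df for _, df in rows.values())

participants = total_df + 1                               # total df = N - 1
similarity_levels = rows["Similarity (S)"][1] + 1         # df = levels - 1
attractiveness_levels = rows["Attractiveness (A)"][1] + 1

print(f_ratios)
print(total_ss, total_df, participants, similarity_levels, attractiveness_levels)
```

As a consistency check, the interaction degrees of freedom equal the product of the two main-effect degrees of freedom (1 x 2 = 2).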
7.
8. A lab experiment on motivation yielded the following results:
Group | Productivity
No financial bonus, no encouragement | 25%
No financial bonus, encouragement | 90%
Financial bonus, no encouragement | 90%
Financial bonus, encouragement | 90%
a.
Make a 2 X 2 table of these data.
 | No encouragement | Encouragement
No financial bonus | 25 | 90
Bonus | 90 | 90
b.
Graph these data.
c.
Describe the results in terms of main
effects and interactions.
Bonus
main effect; Encouragement main effect; Interaction between bonus and
encouragement.
d.
What is your interpretation of the
findings?
One
interpretation is that you can use either bonuses or encouragement, but there
is no need to do both. However, it is possible that this ordinal interaction is
due to a ceiling effect and that a better measure of productivity might find
that encouragement combined with bonuses is better than either encouragement or
bonuses alone.
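The pattern in the 2 X 2 table can be summarized numerically: each main effect is a difference between marginal means, and the interaction is the difference between the two simple effects. The following sketch only describes the pattern in the cell means; it does not test significance. The function name `two_by_two_effects` and its argument names are hypothetical.

```python
def two_by_two_effects(a1b1, a1b2, a2b1, a2b2):
    """Describe a 2 X 2 table of cell means.

    a1/a2 are the two levels of factor A; b1/b2 are the two levels of factor B.
    """
    main_a = (a2b1 + a2b2) / 2 - (a1b1 + a1b2) / 2   # difference between row means
    main_b = (a1b2 + a2b2) / 2 - (a1b1 + a2b1) / 2   # difference between column means
    interaction = (a2b2 - a2b1) - (a1b2 - a1b1)      # difference between simple effects of B
    return main_a, main_b, interaction


# A = financial bonus (no, yes); B = encouragement (no, yes)
bonus_effect, encouragement_effect, interaction = two_by_two_effects(25, 90, 90, 90)
print("bonus main effect:", bonus_effect)
print("encouragement main effect:", encouragement_effect)
print("interaction contrast:", interaction)
```

Here encouragement raises productivity by 65 points without a bonus but by 0 points with one, so the interaction contrast is far from zero. An interaction contrast of zero would mean the lines in the graph are parallel.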
9.
10.
Suppose a researcher wanted to know whether lecturing was more effective than
group discussion for teaching basic facts.
Therefore, the researcher did a study and obtained the following
results:
Source of Variance | SS | df | MS | F |
Teaching (T) | 10 | 1 | 10 | 5 |
Introversion/Extroversion (I) | 20 | 1 | 20 | 10 |
T X I interaction | 50 | 1 | 50 | 25 |
Error | 100 | 50 | 2 | |
a.
What does the interaction seem to indicate?
The
effectiveness of the different teaching styles is different, depending on
whether introverts or extroverts are being taught. Without seeing the means it
is dangerous to speculate, but if one had to guess, one might say that
introverts responded better to the lecture method whereas extroverts responded
better to the group discussion method.
b.
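Even if there had been no interaction between teaching and introversion/extroversion, would
there be any value in including the introversion-extroversion variable?
Explain.
Yes,
because we would know whether the effectiveness of a teaching style was
moderated by introversion/extroversion. Finding no interaction would suggest that the teaching effect generalizes to both introverts and extroverts.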
c.
What, if anything, can you conclude about the effects of introversion on
learning?
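Nothing about cause and effect: introversion
is not an experimental factor. Because it was measured rather than manipulated, its main effect cannot show that introversion itself affects learning.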
1.
2.
A researcher uses a simple
between-subjects experiment involving ten participants to examine the effects
of memory strategy (repetition versus imagery) on memory.
a.
Do you think the researcher will find a
significant effect? Why or why not?
No—too
few participants to have any power.
b.
What design would you recommend?
A
counterbalanced design so that the researcher could have the power of a
within-subjects design and yet control for order effects.
c.
If the researcher had used a matched
pairs study involving 10 participants, would the study have more power? Why?
How many degrees of freedom would the researcher have? What type of matching
task would you suggest? Why?
Yes—the
design should have more power because random error due to individual
differences would be reduced, thereby making the treatment effect easier to
detect.
Only
4 (one less than the number of
pairs).
A
reliable, sensitive, valid memory test that would be similar to the memory test
used in the real study. Ideally, we would use a test that correlated highly
with the real measure. We would use such a task because we do not have to worry
about deception and because it is most likely to give us accurately matched
pairs (a real concern when we only
have five pairs).
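The degrees-of-freedom cost of matching can be made concrete. The sketch below assumes both designs would be analyzed with a two-group t test (independent groups for the simple between-subjects experiment, dependent groups for matched pairs); the function names are hypothetical.

```python
def df_between_subjects(n_participants):
    """Independent-groups t test with two groups: df = N - 2."""
    return n_participants - 2


def df_matched_pairs(n_participants):
    """Dependent-groups t test: df = number of pairs - 1."""
    n_pairs = n_participants // 2
    return n_pairs - 1


simple_df = df_between_subjects(10)
matched_df = df_matched_pairs(10)
print("simple between-subjects df:", simple_df)
print("matched-pairs df:", matched_df)
```

Matching halves the degrees of freedom here, so it pays off only if the matching variable removes enough random error to offset the larger critical value.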
3.
4.
What problems would there be in using a
within-subjects design to study the "humor-perseverance" study
(discussed in question 3)? Would a counterbalanced design solve these problems?
Participants would probably figure out what the study was
about, thus hurting construct validity. Also, participants might be more
frustrated during the second exposure to the frustrating task (a practice
effect). In addition, there might be an interesting carry-over effect of humor
for participants receiving the humor/no-humor sequence: Irritability in the
“no humor” condition might be due to “coming down” from
laughing (if one buys opponent process theory). A counterbalanced design would not
completely solve these problems. However, it might balance out order effects and
allow you to measure them. Thus, the design might let you know whether these factors were
problems.
5.
6.
Two researchers hypothesize that spatial
problems will be solved more quickly when the problems are presented to
participants' left visual fields than when stimuli are presented to
participants' right visual fields (because messages seen in the left visual
field go directly to the right brain which is often assumed to be better at
processing spatial information).
Conversely, they believe verbal tasks will be performed more quickly
when stimuli are presented to participants' right visual fields than when the
tasks are presented to participants' left visual fields. What design would you recommend? Why?
A within-subjects design or a counterbalanced design because
o the differences looked for are probably fractions of seconds, so you need a
powerful design that will reduce error variance and allow you to get many
observations.
o the hypothesis is not so intuitive that participants are likely to guess it and
play along. Therefore, sensitization is not a big problem.
o a few warm-up trials could minimize practice effects and keeping the study short
would minimize fatigue effects (especially since the task is so simple). In
addition, we could use counterbalancing to balance out practice and fatigue
effects.
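Counterbalancing amounts to making sure every possible order of conditions is used equally often, with participants assigned to orders at random. The following is a minimal sketch under that assumption; the function name `counterbalance`, the participant numbers, and the fixed seed are all hypothetical choices made for illustration.

```python
import itertools
import random


def counterbalance(participants, conditions, seed=0):
    """Randomly assign participants to condition orders, using each order equally often.

    Orders are used exactly equally often only when the number of
    participants is a multiple of the number of possible orders.
    """
    orders = list(itertools.permutations(conditions))  # every possible sequence
    rng = random.Random(seed)                          # seeded so the example is repeatable
    shuffled = list(participants)
    rng.shuffle(shuffled)                              # random assignment to orders
    return {person: orders[i % len(orders)] for i, person in enumerate(shuffled)}


assignment = counterbalance(range(1, 9), ["left visual field", "right visual field"])
for person in sorted(assignment):
    print(person, assignment[person])
```

With two conditions there are only two orders, so half the participants get each. With more conditions the number of orders grows quickly, which is one reason researchers sometimes use only a subset of orders.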
7.
8.
You want to determine whether caffeine, a
snack, or a brief walk has the most beneficial effect on mood. What design would
you use? Why? How?
A
between-subjects design would probably be best to avoid problems with (a) the
order effects that would affect within-subjects designs and (b) catching on to
the hypothesis (sensitization) that
would affect both matched-pairs and within-subjects designs. You would carry out
this design simply by randomly assigning participants to the caffeine, snack, and walk groups. If you did not
want to use a pure between-subjects design, you could use a mixed design in
which the within-subjects variable would be before vs. after the treatment and
the between-subjects variable would be caffeine vs. snack vs. the brief walk.
In that case, you would be looking for a significant interaction between trials
(before or after) and the treatment variable.
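Random assignment to the three treatment groups could be sketched as follows. The sample size of 30, the fixed seed, and the function name `randomly_assign` are assumptions made only for the example.

```python
import random


def randomly_assign(participants, groups, seed=0):
    """Shuffle participants, then deal them out to groups in turn."""
    rng = random.Random(seed)        # seeded so the example is repeatable
    shuffled = list(participants)
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for i, person in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(person)
    return assignment


mood_groups = randomly_assign(range(1, 31), ["caffeine", "snack", "brief walk"])
for group, members in mood_groups.items():
    print(group, sorted(members))
```

Dealing shuffled participants out in turn keeps the group sizes equal, while the shuffle ensures that which participant lands in which group is left to chance.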
9.
10.
A researcher wants to know
whether music lessons increase scores on IQ subtests and whether music lessons
have more of an effect on some subtests (e.g., more of an effect on math than
on vocabulary) than others.
a.
Would you make music lessons a between or
within subjects factor? Why?
Between-subjects.
It varies between-subjects in real life and there might be substantial
carryover effects.
b.
Would you make subtests a between or
within subjects factor? Why?
A
within-subjects factor. There is little concern about order effects and it
would give the study much more power.
c.
If the researcher did an analysis of
variance (ANOVA) on the data, the researcher would obtain three effects. Name
those three effects.
A
between-subjects main effect for music lessons, a within-subjects main effect
for subtests, and an interaction between subtests and music lessons.
d.
What effect would the researcher look for
to determine whether music lessons increase scores on IQ subtests?
The
between-subjects main effect of music lessons.
e.
What effect would the researcher look for to
determine whether music lessons have more of an effect on math subtests than on
vocabulary subtests?
The
interaction between music lessons and subtests.
1.
2.
If the study does not manipulate the treatment, which requirement of
establishing causality will be difficult to meet?
Temporal precedence
3.
4. Compare and
contrast how single-subject experiments and randomized experiments account for
non-treatment factors.
Single-n experiments | Randomized experiments |
1. Eliminate between-subject variables by studying a single subject. | 1. Use independent random assignment to be sure that irrelevant variables vary randomly rather than systematically. |
2. Control relevant environmental factors and demonstrate control of extraneous variables by establishing a stable baseline. | 2. Use tests of statistical significance to see if it is unlikely that random factors could account for the differences. |
5.
6.
How do the A-B design and the pretest-posttest design differ in terms of
a.
Procedure?
The
pretest-posttest design uses more participants, does not attempt to develop a
stable baseline, and usually exerts less control over non-treatment variables.
b.
Internal validity?
Because the pretest-posttest researcher has not established a
stable baseline and does not exert as much control over extraneous variables,
the pretest-posttest design has less internal validity than the A-B design.
7.
8.
Design a quasi-experiment that looks at the effects of a course on simulating
parenthood, including an assignment that involves taking care of an egg, on
changing the expectations of junior-high school students about parenting. What
kind of design would you use? Why?
A
randomized experiment would probably be the best choice because it (a)
is feasible and (b) would have internal validity. The next best choice would
probably be a time-series design with a control group because the control group
might be able to rule out some of the history effects. A time-series design
without a control group would be better than a pretest-posttest design because
it could better estimate the effects of maturation. However, a pretest-posttest
design would be better than a nonequivalent control group design because the
nonequivalent control group is so vulnerable to selection.
9.
10.
According to one study, holding students back a grade harmed students. The
evidence: students who had been held back a grade did much worse in school than
students who had not been held back.
a. Does this evidence
prove that holding students back harms their performance? Why or why not?
No—there
is a strong possibility that those who were held back differ in certain ways
from those who were not held back.
b.
If you were a researcher hired by the Dept. of Education to test the assertion
that holding students back harms them, which of the designs in this chapter
would you use? Why?
A time-series design would be inadequate because changes in an outcome such as dropping
out could reflect some historical force (e.g., better employment opportunities). A
nonequivalent control group would not be adequate because the groups are
different to begin with. Therefore, you should use a two-group time-series
design. To make your “held back” group and “not held
back” group as equivalent as possible, you might
o attempt to match on key variables, such as IQ and attendance.
o hope that you could find a district where students were held back according to some
rule (scored below 50% on a standardized test). Then, you might compare those
who were just above the cut-off (50-51%) to those who were just below (49-50%).
o hope that different districts had different cut-off points so that you could compare
50% scorers who were held back against 50% scorers who advanced.
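The cut-off comparison in the second suggestion could be sketched as below. Everything here is hypothetical: the function name `compare_near_cutoff`, the one-point band, and the records, which are made-up numbers included only to show how the function is called, not real results.

```python
def compare_near_cutoff(students, cutoff=50.0, band=1.0):
    """Compare later outcomes for students just below vs. just above a cut-off.

    students: list of (test_score, later_outcome) pairs.
    Students scoring below the cut-off are assumed to have been held back.
    """
    held_back = [outcome for score, outcome in students
                 if cutoff - band <= score < cutoff]
    advanced = [outcome for score, outcome in students
                if cutoff <= score <= cutoff + band]

    def mean(values):
        return sum(values) / len(values) if values else None

    return mean(held_back), mean(advanced)


# Made-up records purely to illustrate the call; not real data.
toy_records = [(49.2, 71), (49.8, 69), (50.1, 70), (50.9, 74), (35.0, 50), (80.0, 90)]
held_back_mean, advanced_mean = compare_near_cutoff(toy_records)
print("just below cut-off (held back):", held_back_mean)
print("just above cut-off (advanced):", advanced_mean)
```

Students far from the cut-off (the 35 and 80 scorers in the toy records) are excluded because they are the ones most likely to differ from each other before anyone is held back.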