Description of Experiment and Data
Singmann and Klauer (2011) were interested in whether conditional
reasoning can be explained by a single process or whether
multiple processes are necessary to explain it. To provide evidence for
multiple processes we aimed to establish a double dissociation of two
variables: instruction type and problem type. Instruction type was
manipulated between-subjects: one group of participants received
deductive instructions (i.e., to treat the premises as given and only
draw necessary conclusions) and a second group of participants received
probabilistic instructions (i.e., to reason as in an everyday situation;
we called this “inductive instruction” in the manuscript). Problem type
consisted of two orthogonally crossed variables that were
manipulated within-subjects: validity of the problem (formally valid or
formally invalid) and plausibility of the problem (inferences that were
consistent with the background knowledge versus inferences that were
inconsistent with the background knowledge). The critical comparison
across the two conditions was between problems that were valid and
implausible and problems that were invalid and plausible. For example,
the following problem was invalid and plausible:
If a person is wet, then the person fell into a swimming pool.
A person fell into a swimming pool.
How valid is the conclusion/How likely is it that the person is wet?
For those problems we predicted that under deductive instructions
responses should be lower (as the conclusion does not necessarily follow
from the premises) than under probabilistic instructions. For valid
but implausible problems, such as the following example, we predicted the
opposite pattern:
If a person is wet, then the person fell into a swimming pool.
A person is wet.
How valid is the conclusion/How likely is it that the person fell into a
swimming pool?
Our study also included valid and plausible and invalid and
implausible problems.
In contrast to the analysis reported in the manuscript, we initially
do not separate the analysis into affirmation and denial problems, but
first report an analysis on the full set of inferences, MP, MT, AC, and
DA, where MP and MT are valid and AC and DA invalid. We report a
reanalysis of our Experiment 1 only. Note that the factor
plausibility is not present in the original manuscript;
there it results from a combination of other factors.
 
Data and R Preparation
We begin by loading the packages we will be using throughout.
library("afex")     # needed for ANOVA functions.
library("emmeans")  # emmeans must now be loaded explicitly for follow-up tests.
library("multcomp") # for advanced control for multiple testing/Type 1 errors.
library("ggplot2")  # for customizing plots.
data(sk2011.1)
str(sk2011.1)
## 'data.frame':    640 obs. of  9 variables:
##  $ id          : Factor w/ 40 levels "8","9","10","12",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ instruction : Factor w/ 2 levels "deductive","probabilistic": 2 2 2 2 2 2 2 2 2 2 ...
##  $ plausibility: Factor w/ 2 levels "plausible","implausible": 1 2 2 1 2 1 1 2 1 2 ...
##  $ inference   : Factor w/ 4 levels "MP","MT","AC",..: 4 2 1 3 4 2 1 3 4 2 ...
##  $ validity    : Factor w/ 2 levels "valid","invalid": 2 1 1 2 2 1 1 2 2 1 ...
##  $ what        : Factor w/ 2 levels "affirmation",..: 2 2 1 1 2 2 1 1 2 2 ...
##  $ type        : Factor w/ 2 levels "original","reversed": 2 2 2 2 1 1 1 1 2 2 ...
##  $ response    : int  100 60 94 70 100 99 98 49 82 50 ...
##  $ content     : Factor w/ 4 levels "C1","C2","C3",..: 1 1 1 1 2 2 2 2 3 3 ...
An important feature in the data is that each participant provided
two responses for each cell of the design (the content is different for
each of those, each participant saw all four contents). These two data
points will be aggregated automatically by afex.
with(sk2011.1, table(inference, id, plausibility))
## , , plausibility = plausible
## 
##          id
## inference 8 9 10 12 13 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
##        MP 2 2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
##        MT 2 2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
##        AC 2 2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
##        DA 2 2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
##          id
## inference 37 38 39 40 41 42 43 44 46 47 48 49 50
##        MP  2  2  2  2  2  2  2  2  2  2  2  2  2
##        MT  2  2  2  2  2  2  2  2  2  2  2  2  2
##        AC  2  2  2  2  2  2  2  2  2  2  2  2  2
##        DA  2  2  2  2  2  2  2  2  2  2  2  2  2
## 
## , , plausibility = implausible
## 
##          id
## inference 8 9 10 12 13 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
##        MP 2 2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
##        MT 2 2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
##        AC 2 2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
##        DA 2 2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
##          id
## inference 37 38 39 40 41 42 43 44 46 47 48 49 50
##        MP  2  2  2  2  2  2  2  2  2  2  2  2  2
##        MT  2  2  2  2  2  2  2  2  2  2  2  2  2
##        AC  2  2  2  2  2  2  2  2  2  2  2  2  2
##        DA  2  2  2  2  2  2  2  2  2  2  2  2  2
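The aggregation step can be illustrated in base R. The following is a minimal sketch on a made-up toy data set (the object names toy and cell_means and all response values are hypothetical); it only mirrors the idea of averaging the two responses per design cell and is not the code afex runs internally.

```r
# Toy data (hypothetical values): 2 participants x 2 inferences,
# with two responses per participant-by-inference cell.
toy <- data.frame(
  id        = factor(rep(c("p1", "p2"), each = 4)),
  inference = factor(rep(rep(c("MP", "AC"), each = 2), times = 2)),
  response  = c(100, 90, 60, 40, 80, 70, 50, 70)
)

# Average the two responses within each cell, giving one value per cell.
cell_means <- aggregate(response ~ id + inference, data = toy, FUN = mean)
cell_means
```

After aggregation each participant contributes exactly one value per cell, which is the structure a repeated-measures ANOVA expects.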
 
ANOVA
To get the full ANOVA table for the model, we simply pass the data to
aov_ez using the design as described above. We save the
returned object for further analysis.
a1 <- aov_ez("id", "response", sk2011.1, between = "instruction", 
       within = c("inference", "plausibility"))
## Warning: More than one observation per design cell, aggregating data using `fun_aggregate = mean`.
## To turn off this warning, pass `fun_aggregate = mean` explicitly.
## Contrasts set to contr.sum for the following variables: instruction
a1 # the default print method prints a data.frame produced by nice 
## Anova Table (Type 3 tests)
## 
## Response: response
##                               Effect           df     MSE         F  ges p.value
## 1                        instruction        1, 38 2027.42      0.31 .003    .583
## 2                          inference 2.66, 101.12  959.12   5.81 ** .063    .002
## 3              instruction:inference 2.66, 101.12  959.12   6.00 ** .065    .001
## 4                       plausibility        1, 38  468.82 34.23 *** .068   <.001
## 5           instruction:plausibility        1, 38  468.82  10.67 ** .022    .002
## 6             inference:plausibility  2.29, 87.11  318.91    2.87 + .009    .055
## 7 instruction:inference:plausibility  2.29, 87.11  318.91    3.98 * .013    .018
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1
## 
## Sphericity correction method: GG
The equivalent calls (i.e., producing exactly the same output) of the
other two ANOVA functions, aov_car and aov_4, are
shown below.
aov_car(response ~ instruction + Error(id/inference*plausibility), sk2011.1)
aov_4(response ~ instruction + (inference*plausibility|id), sk2011.1)
As mentioned before, the two responses per cell of the design and
participants are aggregated for the analysis as indicated by the warning
message. Furthermore, the degrees of freedom are Greenhouse-Geisser
corrected per default for all effects involving inference,
as inference is a within-subject factor with more than two
levels (i.e., MP, MT, AC, & DA). In line with our expectations, the
three-way interaction is significant.
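The corrected degrees of freedom in the table can be traced back with a few lines of arithmetic. This sketch assumes the usual rule that the Greenhouse-Geisser correction multiplies both the numerator and the denominator degrees of freedom by the same factor epsilon; epsilon is recovered here from the printed denominator df rather than estimated from the data, and all variable names are illustrative.

```r
n_subjects <- 40  # participants (levels of id)
n_groups   <- 2   # levels of the between-subjects factor instruction
n_levels   <- 4   # levels of the within-subjects factor inference

# Uncorrected degrees of freedom for the inference effect
df_num <- n_levels - 1                            # 3
df_den <- (n_levels - 1) * (n_subjects - n_groups)  # 114

# Epsilon implied by the printed (corrected) denominator df of 101.12
eps <- 101.12 / df_den

# Applying the same epsilon to the numerator df gives the printed 2.66
round(c(epsilon = eps, corrected_df_num = eps * df_num), 3)
```

An epsilon of 1 would indicate no correction; here it is roughly 0.89, so the correction for inference is modest.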
The table printed per default for afex_aov objects
(produced by nice) can also be formatted using
knitr:
knitr::kable(nice(a1))
| Effect | df | MSE | F | ges | p.value | 
| instruction | 1, 38 | 2027.42 | 0.31 | .003 | .583 | 
| inference | 2.66, 101.12 | 959.12 | 5.81 ** | .063 | .002 | 
| instruction:inference | 2.66, 101.12 | 959.12 | 6.00 ** | .065 | .001 | 
| plausibility | 1, 38 | 468.82 | 34.23 *** | .068 | <.001 | 
| instruction:plausibility | 1, 38 | 468.82 | 10.67 ** | .022 | .002 | 
| inference:plausibility | 2.29, 87.11 | 318.91 | 2.87 + | .009 | .055 | 
| instruction:inference:plausibility | 2.29, 87.11 | 318.91 | 3.98 * | .013 | .018 | 
Alternatively, the anova method for
afex_aov objects returns a data.frame of class
anova that can be passed to, for example,
xtable for nice formatting:
print(xtable::xtable(anova(a1), digits = c(rep(2, 5), 3, 4)), type = "html")
|  | num Df | den Df | MSE | F | ges | Pr(>F) | 
| instruction | 1.00 | 38.00 | 2027.42 | 0.31 | 0.003 | 0.5830 | 
| inference | 2.66 | 101.12 | 959.12 | 5.81 | 0.063 | 0.0016 | 
| instruction:inference | 2.66 | 101.12 | 959.12 | 6.00 | 0.065 | 0.0013 | 
| plausibility | 1.00 | 38.00 | 468.82 | 34.23 | 0.068 | 0.0000 | 
| instruction:plausibility | 1.00 | 38.00 | 468.82 | 10.67 | 0.022 | 0.0023 | 
| inference:plausibility | 2.29 | 87.11 | 318.91 | 2.87 | 0.009 | 0.0551 | 
| instruction:inference:plausibility | 2.29 | 87.11 | 318.91 | 3.98 | 0.013 | 0.0177 | 
 
Post-Hoc Contrasts and Plotting
To further analyze the data we need to pass the ANOVA object to the package
emmeans, which offers great functionality for both
plotting and contrasts of all kinds. A lot of information on
emmeans can be obtained in its vignettes and
FAQ.
emmeans can work with afex_aov objects
directly as afex comes with the necessary methods for
the generic functions defined in emmeans. When using the
default multivariate option for follow-up tests,
emmeans uses the ANOVA model estimated via base R’s
lm method (which in the case of a multivariate response is
an object of class c("mlm", "lm")). afex also
supports a univariate model (i.e.,
emmeans_model = "univariate", which requires that
include_aov = TRUE in the ANOVA call) in which case
emmeans uses the object created by base R’s
aov function (this was the previous default but is not
recommended as it does not handle unbalanced data well).
Some First Contrasts
Main Effects Only
This object can now be passed to emmeans, for example to
obtain the marginal means of the four inferences:
m1 <- emmeans(a1, ~ inference)
m1
##  inference emmean   SE df lower.CL upper.CL
##  MP          87.5 1.80 38     83.9     91.2
##  MT          76.7 4.06 38     68.5     84.9
##  AC          69.4 4.77 38     59.8     79.1
##  DA          83.0 3.84 38     75.2     90.7
## 
## Results are averaged over the levels of: instruction, plausibility 
## Confidence level used: 0.95
This object can now also be used to compare whether or not there are
differences between the levels of the factor:
pairs(m1)
##  contrast estimate   SE df t.ratio p.value
##  MP - MT     10.83 4.33 38   2.501  0.0759
##  MP - AC     18.10 5.02 38   3.607  0.0047
##  MP - DA      4.56 4.20 38   1.086  0.7002
##  MT - AC      7.27 3.98 38   1.825  0.2778
##  MT - DA     -6.28 4.70 38  -1.334  0.5473
##  AC - DA    -13.54 5.30 38  -2.556  0.0672
## 
## Results are averaged over the levels of: instruction, plausibility 
## P value adjustment: tukey method for comparing a family of 4 estimates
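The Tukey-adjusted p-values printed above can be approximately reproduced in base R. The sketch below assumes that the adjustment is the upper tail of the studentized range distribution evaluated at sqrt(2) times the absolute t-ratio, for a family of 4 means and 38 degrees of freedom. The t-ratios are copied from the printed output, so small deviations due to rounding are to be expected.

```r
# t-ratios copied from the pairwise comparisons of the four inferences
t_ratio <- c("MP - MT" = 2.501, "MP - AC" = 3.607, "MP - DA" = 1.086,
             "MT - AC" = 1.825, "MT - DA" = -1.334, "AC - DA" = -2.556)

# Studentized range statistic q = sqrt(2) * |t|; upper tail gives the adjusted p
p_tukey <- ptukey(sqrt(2) * abs(t_ratio), nmeans = 4, df = 38,
                  lower.tail = FALSE)
round(p_tukey, 4)
```

Because the family size enters through nmeans, the same t-ratio receives a larger adjusted p-value when more means are compared at once.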
To obtain more powerful p-value adjustments, we can furthermore pass
it to multcomp (Bretz, Hothorn, & Westfall, 2011):
summary(as.glht(pairs(m1)), test=adjusted("free"))
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Linear Hypotheses:
##              Estimate Std. Error t value Pr(>|t|)   
## MP - MT == 0   10.831      4.331   2.501   0.0587 . 
## MP - AC == 0   18.100      5.018   3.607   0.0049 **
## MP - DA == 0    4.556      4.196   1.086   0.3135   
## MT - AC == 0    7.269      3.984   1.825   0.1941   
## MT - DA == 0   -6.275      4.703  -1.334   0.3135   
## AC - DA == 0  -13.544      5.299  -2.556   0.0587 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- free method)
 
A Simple Interaction
We could now also be interested in the marginal means of the
inferences across the two instruction types. emmeans offers
two ways to do so. The first splits the contrasts across levels of the
factor using the by argument.
m2 <- emmeans(a1, "inference", by = "instruction")
## equal: emmeans(a1, ~ inference|instruction)
m2
## instruction = deductive:
##  inference emmean   SE df lower.CL upper.CL
##  MP          97.3 2.54 38     92.1    102.4
##  MT          70.4 5.75 38     58.8     82.0
##  AC          61.5 6.75 38     47.8     75.1
##  DA          81.8 5.43 38     70.8     92.8
## 
## instruction = probabilistic:
##  inference emmean   SE df lower.CL upper.CL
##  MP          77.7 2.54 38     72.6     82.9
##  MT          83.0 5.75 38     71.3     94.6
##  AC          77.3 6.75 38     63.7     91.0
##  DA          84.1 5.43 38     73.1     95.1
## 
## Results are averaged over the levels of: plausibility 
## Confidence level used: 0.95
Consequently, tests are also only performed within each level of the
by factor:
pairs(m2)
## instruction = deductive:
##  contrast estimate   SE df t.ratio p.value
##  MP - MT     26.89 6.13 38   4.389  0.0005
##  MP - AC     35.80 7.10 38   5.045  0.0001
##  MP - DA     15.47 5.93 38   2.608  0.0599
##  MT - AC      8.91 5.63 38   1.582  0.4007
##  MT - DA    -11.41 6.65 38  -1.716  0.3297
##  AC - DA    -20.32 7.49 38  -2.712  0.0471
## 
## instruction = probabilistic:
##  contrast estimate   SE df t.ratio p.value
##  MP - MT     -5.22 6.13 38  -0.853  0.8287
##  MP - AC      0.40 7.10 38   0.056  0.9999
##  MP - DA     -6.36 5.93 38  -1.072  0.7084
##  MT - AC      5.62 5.63 38   0.998  0.7512
##  MT - DA     -1.14 6.65 38  -0.171  0.9982
##  AC - DA     -6.76 7.49 38  -0.902  0.8036
## 
## Results are averaged over the levels of: plausibility 
## P value adjustment: tukey method for comparing a family of 4 estimates
The second version considers all factor levels together.
Consequently, the number of pairwise comparisons is a lot larger:
m3 <- emmeans(a1, c("inference", "instruction"))
## equal: emmeans(a1, ~inference*instruction)
m3
##  inference instruction   emmean   SE df lower.CL upper.CL
##  MP        deductive       97.3 2.54 38     92.1    102.4
##  MT        deductive       70.4 5.75 38     58.8     82.0
##  AC        deductive       61.5 6.75 38     47.8     75.1
##  DA        deductive       81.8 5.43 38     70.8     92.8
##  MP        probabilistic   77.7 2.54 38     72.6     82.9
##  MT        probabilistic   83.0 5.75 38     71.3     94.6
##  AC        probabilistic   77.3 6.75 38     63.7     91.0
##  DA        probabilistic   84.1 5.43 38     73.1     95.1
## 
## Results are averaged over the levels of: plausibility 
## Confidence level used: 0.95
pairs(m3)
##  contrast                            estimate   SE df t.ratio p.value
##  MP deductive - MT deductive            26.89 6.13 38   4.389  0.0020
##  MP deductive - AC deductive            35.80 7.10 38   5.045  0.0003
##  MP deductive - DA deductive            15.47 5.93 38   2.608  0.1848
##  MP deductive - MP probabilistic        19.55 3.59 38   5.439  0.0001
##  MP deductive - MT probabilistic        14.32 6.29 38   2.279  0.3310
##  MP deductive - AC probabilistic        19.95 7.21 38   2.767  0.1342
##  MP deductive - DA probabilistic        13.19 5.99 38   2.201  0.3741
##  MT deductive - AC deductive             8.91 5.63 38   1.582  0.7577
##  MT deductive - DA deductive           -11.41 6.65 38  -1.716  0.6772
##  MT deductive - MP probabilistic        -7.34 6.29 38  -1.167  0.9363
##  MT deductive - MT probabilistic       -12.56 8.13 38  -1.545  0.7783
##  MT deductive - AC probabilistic        -6.94 8.86 38  -0.783  0.9931
##  MT deductive - DA probabilistic       -13.70 7.91 38  -1.733  0.6666
##  AC deductive - DA deductive           -20.32 7.49 38  -2.712  0.1501
##  AC deductive - MP probabilistic       -16.25 7.21 38  -2.254  0.3446
##  AC deductive - MT probabilistic       -21.48 8.86 38  -2.423  0.2600
##  AC deductive - AC probabilistic       -15.85 9.54 38  -1.661  0.7111
##  AC deductive - DA probabilistic       -22.61 8.66 38  -2.611  0.1834
##  DA deductive - MP probabilistic         4.08 5.99 38   0.680  0.9971
##  DA deductive - MT probabilistic        -1.15 7.91 38  -0.145  1.0000
##  DA deductive - AC probabilistic         4.47 8.66 38   0.517  0.9995
##  DA deductive - DA probabilistic        -2.29 7.68 38  -0.298  1.0000
##  MP probabilistic - MT probabilistic    -5.22 6.13 38  -0.853  0.9885
##  MP probabilistic - AC probabilistic     0.40 7.10 38   0.056  1.0000
##  MP probabilistic - DA probabilistic    -6.36 5.93 38  -1.072  0.9588
##  MT probabilistic - AC probabilistic     5.62 5.63 38   0.998  0.9719
##  MT probabilistic - DA probabilistic    -1.14 6.65 38  -0.171  1.0000
##  AC probabilistic - DA probabilistic    -6.76 7.49 38  -0.902  0.9840
## 
## Results are averaged over the levels of: plausibility 
## P value adjustment: tukey method for comparing a family of 8 estimates
 
Running Custom Contrasts
Objects returned from emmeans can also be used to test
specific contrasts. For this, we can simply create a list, where each
element corresponds to one contrast. A contrast is defined as a vector
of constants on the reference grid (i.e., the object returned from
emmeans, here m3). For example, we might be
interested in whether there is a difference between the valid and
invalid inferences in each of the two conditions.
c1 <- list(
  v_i.ded = c(0.5, 0.5, -0.5, -0.5, 0, 0, 0, 0),
  v_i.prob = c(0, 0, 0, 0, 0.5, 0.5, -0.5, -0.5)
  )
contrast(m3, c1, adjust = "holm")
##  contrast estimate   SE df t.ratio p.value
##  v_i.ded    12.194 4.12 38   2.960  0.0105
##  v_i.prob   -0.369 4.12 38  -0.090  0.9291
## 
## Results are averaged over the levels of: plausibility 
## P value adjustment: holm method for 2 tests
summary(as.glht(contrast(m3, c1)), test = adjusted("free"))
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Linear Hypotheses:
##               Estimate Std. Error t value Pr(>|t|)  
## v_i.ded == 0   12.1937     4.1190    2.96   0.0105 *
## v_i.prob == 0  -0.3687     4.1190   -0.09   0.9291  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- free method)
The results are in line with expectations.
Responses are larger for valid than invalid problems in the deductive,
but not the probabilistic condition.
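To make explicit what a contrast vector does, the estimates can be recomputed by hand as weighted sums of the estimated marginal means. This sketch copies the rounded means from the printed m3 grid (the vector name emm and its element names are illustrative), so the results only match the reported estimates up to rounding.

```r
# Estimated marginal means in the row order of the m3 reference grid
emm <- c(MP.ded = 97.3, MT.ded = 70.4, AC.ded = 61.5, DA.ded = 81.8,
         MP.prob = 77.7, MT.prob = 83.0, AC.prob = 77.3, DA.prob = 84.1)

# Same contrast weights as above: valid (MP, MT) minus invalid (AC, DA)
c1 <- list(
  v_i.ded  = c(0.5, 0.5, -0.5, -0.5, 0, 0, 0, 0),
  v_i.prob = c(0, 0, 0, 0, 0.5, 0.5, -0.5, -0.5)
)

# Each estimate is the sum of weight times mean over the grid rows
est <- sapply(c1, function(w) sum(w * emm))
round(est, 2)
```

This gives about 12.2 and -0.35, close to the reported 12.194 and -0.369. The weights of each contrast sum to zero, and their order must follow the row order of the reference grid.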
 
 
Plotting
Since version 0.22, afex comes with its own
plotting function based on ggplot2, afex_plot,
which works directly with afex_aov objects.
As said initially, we are interested in the three-way interaction of
instruction, inference, and plausibility. As we saw
above, this interaction was significant. Consequently, we are interested
in plotting this interaction.
Basic Plots
For afex_plot, we need to specify the
x-factor(s), which determine which factor-levels or
combinations of factor-levels are plotted on the x-axis. We can also
define trace factor(s), which determine which factor levels
are connected by lines. Finally, we can also define panel
factor(s), which determine if the plot is split into subplots.
afex_plot then plots the estimated marginal means obtained
from emmeans, confidence intervals, and the raw data in the
background. Note that the raw data in the background are per default
drawn using an alpha blending of .5 (i.e., 50% semi-transparency). Thus,
where several points lie directly on top of each other, the resulting
point appears noticeably darker.
afex_plot(a1, x = "inference", trace = "instruction", panel = "plausibility")
## Warning: Panel(s) show a mixed within-between-design.
## Error bars do not allow comparisons across all means.
## Suppress error bars with: error = "none"

In the default settings, the error bars show 95%-confidence intervals
based on the standard error of the underlying model (i.e., the
lm model in the present case). In the present case, in
which each subplot (defined by x- and
trace-factor) shows a combination of a within-subjects
factor (i.e., inference) and a between-subjects factor (i.e.,
instruction), this is not optimal. The error bars
only allow assessing differences regarding the between-subjects factor
(i.e., across the lines), but not differences regarding the
within-subjects factor (i.e., within one line). This is also indicated
by a warning.
An alternative would be within-subject confidence intervals:
afex_plot(a1, x = "inference", trace = "instruction", panel = "plausibility", 
          error = "within")
## Warning: Panel(s) show a mixed within-between-design.
## Error bars do not allow comparisons across all means.
## Suppress error bars with: error = "none"

However, those only allow inferences regarding the within-subjects
factors and not regarding the between-subjects factor. So the same
warning is emitted again.
A further alternative is to suppress the error bars altogether. This
is the approach used in our original paper and probably a good idea in
general when figures show both between- and within-subjects factors
within the same panel. The presence of the raw data in the background
still provides a visual depiction of the variability of the data.
afex_plot(a1, x = "inference", trace = "instruction", panel = "plausibility", 
          error = "none")

 
Customizing Plots
afex_plot allows customizing the plot in a number of
different ways. For example, we can easily change the aesthetic mapping
associated with the trace factor. So instead of using
linetype and shape of the symbols, we can use color. Furthermore, we can
change the graphical element used for plotting the data points in the
background. For example, instead of plotting the raw data, we can
replace this with a boxplot. Finally, we can also make both the points
showing the means and the lines connecting the means larger.
p1 <- afex_plot(a1, x = "inference", trace = "instruction", 
                panel = "plausibility", error = "none", 
                mapping = c("color", "fill"), 
                data_geom = geom_boxplot, data_arg = list(width = 0.4), 
                point_arg = list(size = 1.5), line_arg = list(linewidth = 1))
p1

Note that afex_plot returns a ggplot2 plot
object which can be used for further customization. For example, one can
easily change the theme to something that does not have a
grey background:

We can also set the theme globally for the remainder of the
R session.
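Both steps can be sketched as follows. The theme used in the original is not shown here, so theme_bw() is an assumption chosen only because it has a white background. The plot p_demo is a hypothetical stand-in built from made-up data; in the analysis above one would use p1 instead.

```r
library("ggplot2")

# Stand-in plot (hypothetical data); replace p_demo with p1 in the analysis
p_demo <- ggplot(data.frame(x = 1:3, y = c(2, 4, 3)), aes(x, y)) +
  geom_point()

# Change the theme of a single plot by adding a theme to the plot object
p_light <- p_demo + theme_bw()

# Set the theme globally for the remainder of the session;
# theme_set() returns the previously active theme, kept here for restoring
old_theme <- theme_set(theme_bw())
```

Calling theme_set(old_theme) later would restore the previous default.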
The full set of customizations provided by afex_plot is
beyond the scope of this vignette. The examples on the help page at
?afex_plot provide a good overview.
 
 
 
Replicate Analysis from Singmann and Klauer (2011)
However, the plots shown so far are not particularly helpful with
respect to the research question. Next, we fit a new ANOVA model in
which we separate the data into affirmation and denial inferences. This
was also done in the original manuscript. We then plot the data a second
time.
a2 <- aov_ez("id", "response", sk2011.1, between = "instruction", 
       within = c("validity", "plausibility", "what"))
## Warning: More than one observation per design cell, aggregating data using `fun_aggregate = mean`.
## To turn off this warning, pass `fun_aggregate = mean` explicitly.
## Contrasts set to contr.sum for the following variables: instruction
a2
## Anova Table (Type 3 tests)
## 
## Response: response
##                                    Effect    df     MSE         F   ges p.value
## 1                             instruction 1, 38 2027.42      0.31  .003    .583
## 2                                validity 1, 38  678.65    4.12 *  .013    .049
## 3                    instruction:validity 1, 38  678.65    4.65 *  .014    .037
## 4                            plausibility 1, 38  468.82 34.23 ***  .068   <.001
## 5                instruction:plausibility 1, 38  468.82  10.67 **  .022    .002
## 6                                    what 1, 38  660.52      0.22 <.001    .640
## 7                        instruction:what 1, 38  660.52      2.60  .008    .115
## 8                   validity:plausibility 1, 38  371.87      0.14 <.001    .715
## 9       instruction:validity:plausibility 1, 38  371.87    4.78 *  .008    .035
## 10                          validity:what 1, 38 1213.14   9.80 **  .051    .003
## 11              instruction:validity:what 1, 38 1213.14   8.60 **  .045    .006
## 12                      plausibility:what 1, 38  204.54   9.97 **  .009    .003
## 13          instruction:plausibility:what 1, 38  204.54    5.23 *  .005    .028
## 14             validity:plausibility:what 1, 38  154.62      0.03 <.001    .855
## 15 instruction:validity:plausibility:what 1, 38  154.62      0.42 <.001    .521
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1
Then we plot the data from this ANOVA. Because each panel would again
show a mixed design, we suppress the error bars.
afex_plot(a2, x = c("plausibility", "validity"), 
          trace = "instruction", panel = "what", 
          error = "none")

We see the critical and predicted cross-over interaction in the left
of the two panels. For implausible but valid problems deductive
responses are larger than probabilistic responses. The opposite is true
for plausible but invalid problems. We now test these differences at
each of the four x-axis ticks in each panel using custom contrasts
(diff_1 to diff_4). Furthermore, we test for a
validity effect and plausibility effect in both conditions.
(m4 <- emmeans(a2, ~instruction+plausibility+validity|what))
## what = affirmation:
##  instruction   plausibility validity emmean   SE df lower.CL upper.CL
##  deductive     plausible    valid      99.5 1.16 38     97.1    101.8
##  probabilistic plausible    valid      95.3 1.16 38     93.0     97.6
##  deductive     implausible  valid      95.1 5.01 38     85.0    105.2
##  probabilistic implausible  valid      60.2 5.01 38     50.0     70.3
##  deductive     plausible    invalid    67.0 6.95 38     52.9     81.0
##  probabilistic plausible    invalid    90.5 6.95 38     76.5    104.6
##  deductive     implausible  invalid    56.0 7.97 38     39.9     72.2
##  probabilistic implausible  invalid    64.1 7.97 38     48.0     80.3
## 
## what = denial:
##  instruction   plausibility validity emmean   SE df lower.CL upper.CL
##  deductive     plausible    valid      70.5 6.18 38     58.0     83.1
##  probabilistic plausible    valid      93.0 6.18 38     80.5    105.5
##  deductive     implausible  valid      70.2 6.36 38     57.4     83.1
##  probabilistic implausible  valid      73.0 6.36 38     60.1     85.8
##  deductive     plausible    invalid    86.5 5.32 38     75.8     97.3
##  probabilistic plausible    invalid    87.5 5.32 38     76.7     98.2
##  deductive     implausible  invalid    77.1 6.62 38     63.7     90.5
##  probabilistic implausible  invalid    80.8 6.62 38     67.4     94.1
## 
## Confidence level used: 0.95
c2 <- list(
  diff_1 = c(1, -1, 0, 0, 0, 0, 0, 0),
  diff_2 = c(0, 0, 1, -1, 0, 0, 0, 0),
  diff_3 = c(0, 0, 0, 0,  1, -1, 0, 0),
  diff_4 = c(0, 0, 0, 0,  0, 0, 1, -1),
  val_ded  = c(0.5, 0, 0.5, 0, -0.5, 0, -0.5, 0),
  val_prob = c(0, 0.5, 0, 0.5, 0, -0.5, 0, -0.5),
  plau_ded   = c(0.5, 0, -0.5, 0, -0.5, 0, 0.5, 0),
  plau_prob  = c(0, 0.5, 0, -0.5, 0, 0.5, 0, -0.5)
  )
contrast(m4, c2, adjust = "holm")
## what = affirmation:
##  contrast  estimate    SE df t.ratio p.value
##  diff_1       4.175  1.64 38   2.543  0.0759
##  diff_2      34.925  7.08 38   4.931  0.0001
##  diff_3     -23.600  9.83 38  -2.401  0.0854
##  diff_4      -8.100 11.28 38  -0.718  0.9538
##  val_ded     35.800  7.10 38   5.045  0.0001
##  val_prob     0.400  7.10 38   0.056  0.9553
##  plau_ded    -3.275  3.07 38  -1.068  0.8761
##  plau_prob   30.775  4.99 38   6.164  <.0001
## 
## what = denial:
##  contrast  estimate    SE df t.ratio p.value
##  diff_1     -22.425  8.74 38  -2.565  0.1007
##  diff_2      -2.700  8.99 38  -0.300  1.0000
##  diff_3      -0.925  7.52 38  -0.123  1.0000
##  diff_4      -3.650  9.36 38  -0.390  1.0000
##  val_ded    -11.412  6.65 38  -1.716  0.5658
##  val_prob    -1.137  6.65 38  -0.171  1.0000
##  plau_ded    -4.562  4.11 38  -1.109  1.0000
##  plau_prob   13.363  2.96 38   4.519  0.0005
## 
## P value adjustment: holm method for 8 tests
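The Holm adjustment for the affirmation problems can be approximately reproduced in base R. This sketch assumes two-sided t-tests with 38 degrees of freedom and uses the t-ratios copied from the printed output, so the results may deviate slightly from the table because of rounding.

```r
# t-ratios for the affirmation problems, copied from the output above
t_ratio <- c(diff_1 = 2.543, diff_2 = 4.931, diff_3 = -2.401,
             diff_4 = -0.718, val_ded = 5.045, val_prob = 0.056,
             plau_ded = -1.068, plau_prob = 6.164)

# Unadjusted two-sided p-values from the t distribution
p_raw <- 2 * pt(-abs(t_ratio), df = 38)

# Holm: the smallest p-value is multiplied by 8, the next by 7, and so on,
# with monotonicity enforced and values capped at 1
p_holm <- p.adjust(p_raw, method = "holm")
round(p_holm, 4)
```

Note that the adjustment is applied separately within each level of what, which is why each family contains 8 rather than 16 tests.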
We can also pass these tests to multcomp which gives us
more powerful Type 1 error corrections.
summary(as.glht(contrast(m4, c2)), test = adjusted("free"))
## Warning in tmp$pfunction("adjusted", ...): Completion with error > abseps
## Warning in tmp$pfunction("adjusted", ...): Completion with error > abseps
## Warning in tmp$pfunction("adjusted", ...): Completion with error > abseps
## Warning in tmp$pfunction("adjusted", ...): Completion with error > abseps
## Warning in tmp$pfunction("adjusted", ...): Completion with error > abseps
## Warning in tmp$pfunction("adjusted", ...): Completion with error > abseps
## Warning in tmp$pfunction("adjusted", ...): Completion with error > abseps
## Warning in tmp$pfunction("adjusted", ...): Completion with error > abseps
## Warning in tmp$pfunction("adjusted", ...): Completion with error > abseps
## $`what = affirmation`
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Linear Hypotheses:
##                Estimate Std. Error t value Pr(>|t|)    
## diff_1 == 0       4.175      1.641   2.543   0.0648 .  
## diff_2 == 0      34.925      7.082   4.931 7.75e-05 ***
## diff_3 == 0     -23.600      9.830  -2.401   0.0708 .  
## diff_4 == 0      -8.100     11.275  -0.718   0.6882    
## val_ded == 0     35.800      7.096   5.045 6.44e-05 ***
## val_prob == 0     0.400      7.096   0.056   0.9553    
## plau_ded == 0    -3.275      3.065  -1.068   0.6036    
## plau_prob == 0   30.775      4.992   6.164 1.75e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- free method)
## 
## 
## $`what = denial`
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Linear Hypotheses:
##                Estimate Std. Error t value Pr(>|t|)    
## diff_1 == 0     -22.425      8.742  -2.565 0.082209 .  
## diff_2 == 0      -2.700      8.987  -0.300 0.984915    
## diff_3 == 0      -0.925      7.522  -0.123 0.984915    
## diff_4 == 0      -3.650      9.358  -0.390 0.984915    
## val_ded == 0    -11.412      6.651  -1.716 0.379871    
## val_prob == 0    -1.137      6.651  -0.171 0.984915    
## plau_ded == 0    -4.562      4.115  -1.109 0.725817    
## plau_prob == 0   13.363      2.957   4.519 0.000424 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- free method)
Unfortunately, in the present case this function throws several
warnings. Nevertheless, the p-values from both methods are very similar
and agree on whether they are below or above .05. Because of the
warnings it seems advisable to use the p-values provided by
emmeans directly rather than those from
multcomp.
The pattern for the affirmation problems is in line with the
expectations: We find the predicted differences between the instruction
types for valid and implausible (diff_2) and invalid and
plausible (diff_3) and the predicted non-differences for
the other two problems (diff_1 and diff_4).
Furthermore, we find a validity effect in the deductive but not in the
probabilistic condition. Likewise, we find a plausibility effect in the
probabilistic but not in the deductive condition.