diff --git a/01-pvalue_files/figure-pdf/fig-fig131-1.pdf b/01-pvalue_files/figure-pdf/fig-fig131-1.pdf
index 7e00432..7a3d4d3 100644
Binary files a/01-pvalue_files/figure-pdf/fig-fig131-1.pdf and b/01-pvalue_files/figure-pdf/fig-fig131-1.pdf differ
diff --git a/01-pvalue_files/figure-pdf/fig-fig132-1.pdf b/01-pvalue_files/figure-pdf/fig-fig132-1.pdf
index 33ce960..313d9ad 100644
Binary files a/01-pvalue_files/figure-pdf/fig-fig132-1.pdf and b/01-pvalue_files/figure-pdf/fig-fig132-1.pdf differ
diff --git a/01-pvalue_files/figure-pdf/fig-fig134-1.pdf b/01-pvalue_files/figure-pdf/fig-fig134-1.pdf
index 36a2471..e5dd12d 100644
Binary files a/01-pvalue_files/figure-pdf/fig-fig134-1.pdf and b/01-pvalue_files/figure-pdf/fig-fig134-1.pdf differ
diff --git a/01-pvalue_files/figure-pdf/fig-fig135-1.pdf b/01-pvalue_files/figure-pdf/fig-fig135-1.pdf
index 144e805..cedb9a1 100644
Binary files a/01-pvalue_files/figure-pdf/fig-fig135-1.pdf and b/01-pvalue_files/figure-pdf/fig-fig135-1.pdf differ
diff --git a/01-pvalue_files/figure-pdf/fig-fig136-1.pdf b/01-pvalue_files/figure-pdf/fig-fig136-1.pdf
index bdcdcfc..a647eb8 100644
Binary files a/01-pvalue_files/figure-pdf/fig-fig136-1.pdf and b/01-pvalue_files/figure-pdf/fig-fig136-1.pdf differ
diff --git a/01-pvalue_files/figure-pdf/fig-fig137-1.pdf b/01-pvalue_files/figure-pdf/fig-fig137-1.pdf
index ac9a099..6e16bfd 100644
Binary files a/01-pvalue_files/figure-pdf/fig-fig137-1.pdf and b/01-pvalue_files/figure-pdf/fig-fig137-1.pdf differ
diff --git a/01-pvalue_files/figure-pdf/fig-fig138-1.pdf b/01-pvalue_files/figure-pdf/fig-fig138-1.pdf
index 929e8a7..02bc8cd 100644
Binary files a/01-pvalue_files/figure-pdf/fig-fig138-1.pdf and b/01-pvalue_files/figure-pdf/fig-fig138-1.pdf differ
diff --git a/01-pvalue_files/figure-pdf/fig-paradox-1.pdf b/01-pvalue_files/figure-pdf/fig-paradox-1.pdf
index 2dc8497..3afdc8c 100644
Binary files a/01-pvalue_files/figure-pdf/fig-paradox-1.pdf and b/01-pvalue_files/figure-pdf/fig-paradox-1.pdf differ
diff --git a/01-pvalue_files/figure-pdf/fig-pdft-1.pdf b/01-pvalue_files/figure-pdf/fig-pdft-1.pdf
index e907bec..bae7c80 100644
Binary files a/01-pvalue_files/figure-pdf/fig-pdft-1.pdf and b/01-pvalue_files/figure-pdf/fig-pdft-1.pdf differ
diff --git a/01-pvalue_files/figure-pdf/fig-tdist-1.pdf b/01-pvalue_files/figure-pdf/fig-tdist-1.pdf
index a4f6a4e..bcaf020 100644
Binary files a/01-pvalue_files/figure-pdf/fig-tdist-1.pdf and b/01-pvalue_files/figure-pdf/fig-tdist-1.pdf differ
diff --git a/01-pvalue_files/figure-pdf/q1-1.pdf b/01-pvalue_files/figure-pdf/q1-1.pdf
index 7d5a574..b2aa35c 100644
Binary files a/01-pvalue_files/figure-pdf/q1-1.pdf and b/01-pvalue_files/figure-pdf/q1-1.pdf differ
diff --git a/01-pvalue_files/figure-pdf/unnamed-chunk-3-1.pdf b/01-pvalue_files/figure-pdf/unnamed-chunk-3-1.pdf
index 4080478..5a428ad 100644
Binary files a/01-pvalue_files/figure-pdf/unnamed-chunk-3-1.pdf and b/01-pvalue_files/figure-pdf/unnamed-chunk-3-1.pdf differ
diff --git a/02-errorcontrol_files/figure-pdf/fig-minerror-1.pdf b/02-errorcontrol_files/figure-pdf/fig-minerror-1.pdf
index 8038601..f20fcc0 100644
Binary files a/02-errorcontrol_files/figure-pdf/fig-minerror-1.pdf and b/02-errorcontrol_files/figure-pdf/fig-minerror-1.pdf differ
diff --git a/02-errorcontrol_files/figure-pdf/justifyalpha1-1.pdf b/02-errorcontrol_files/figure-pdf/justifyalpha1-1.pdf
index 8038601..84f159a 100644
Binary files a/02-errorcontrol_files/figure-pdf/justifyalpha1-1.pdf and b/02-errorcontrol_files/figure-pdf/justifyalpha1-1.pdf differ
diff --git a/07-CI.qmd b/07-CI.qmd
index ad22c23..7c32202 100644
--- a/07-CI.qmd
+++ b/07-CI.qmd
@@ -94,10 +94,8 @@ In order to maintain the direct relationship between a confidence interval and a
 To maintain a direct relationship between an *F*-test and its confidence interval, a 90% CI for effect sizes from an *F*-test should be provided. The reason for this is explained by [Karl Wuensch](https://web.archive.org/web/20140104080701/http://core.ecu.edu/psyc/wuenschk/docs30/CI-Eta2-Alpha.doc). Where Cohen’s *d* can take both positive and negative values, r² or η² are squared, and can therefore only take positive values. This is related to the fact that *F*-tests (as commonly used in ANOVA) are one-sided. If you calculate a 95% CI, you can get situations where the confidence interval includes 0, but the test reveals a statistical difference with a *p* < .05 (for a more mathematical explanation, see @steiger_beyond_2004). This means that a 95% CI around Cohen's *d* in an independent *t*-test equals a 90% CI around η² for exactly the same test performed as an ANOVA. As a final detail, because eta-squared cannot be smaller than zero, the lower bound for the confidence interval cannot be smaller than 0. This means that a confidence interval for an effect that is not statistically different from 0 has to start at 0. You report such a CI as 90% CI [.00; .XX] where the XX is the upper limit of the CI.
-Confidence intervals are often used in forest plots that communicate the results from a meta-analysis. In the plot below, we see 4 rows. Each row shows the effect size estimate from one study (in Hedges’ *g*). For example, study 1 yielded an effect size estimate of 0.44, with a confidence interval around the effect size from 0.08 to 0.8. The horizontal black line, similarly to the visualization we played around with before, is the width of the confidence interval. When it does not touch the effect size 0 (indicated by a black vertical dotted line) the effect is statistically significant.
-```{r fig-meta, echo=FALSE}
-#| fig-cap: "Meta-analysis of 4 studies."
+```{r metaexample, echo=FALSE}
 set.seed(2238)
@@ -118,11 +116,18 @@ for (i in 1:nSims) { # for each simulated study
 }
 result <- metafor::rma(yi, vi, data = metadata, method = "FE")
+
+```
+
+Confidence intervals are often used in forest plots that communicate the results from a meta-analysis. In the plot below, we see 4 rows. Each row shows the effect size estimate from one study (in Hedges’ *g*). For example, study 1 yielded an effect size estimate of 0.53, with a confidence interval around the effect size from 0.12 to 0.94. The horizontal black line, similarly to the visualization we played around with before, is the width of the confidence interval. When it does not touch the effect size 0 (indicated by a black vertical dotted line) the effect is statistically significant.
+
+```{r fig-meta, echo=FALSE}
+#| fig-cap: "Meta-analysis of 4 studies."
+
 par(mar=c(4,0,4,0))
 par(bg = backgroundcolor)
 metafor::forest(result, top = 0)
 title("Forest plot for a simulated meta-analysis")
-
 ```
 We can see, based on the fact that the confidence intervals do not overlap with 0, that studies 1 and 3 were statistically significant. The diamond shape named the FE model (Fixed Effect model) is the meta-analytic effect size. Instead of using a black horizontal line, the upper limit and lower limit of the confidence interval are indicated by the left and right points of the diamond, and the center of the diamond is the meta-analytic effect size estimate.
 A meta-analysis calculates the effect size by combining and weighing all studies. The confidence interval for a meta-analytic effect size estimate is always narrower than that for a single study, because of the combined sample size of all studies included in the meta-analysis.
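The reorganized chunk fits a fixed effect meta-analysis with `metafor::rma()` and draws the forest plot with `metafor::forest()`. A minimal self-contained sketch of the same approach is shown below; the `yi` (Hedges' g) and `vi` (sampling variance) values are illustrative stand-ins for the values that the chunk simulates.

```r
library(metafor)

# Illustrative effect sizes (Hedges' g) and sampling variances for 4 studies;
# the chapter's chunk simulates these values instead of fixing them.
metadata <- data.frame(
  yi = c(0.53, 0.21, 0.60, 0.14),
  vi = c(0.043, 0.041, 0.045, 0.042)
)

# Fixed effect ("FE") meta-analysis, as in the chunk above.
result <- rma(yi, vi, data = metadata, method = "FE")

# Forest plot: one row per study with its CI, and a diamond for the FE estimate.
par(mar = c(4, 0, 4, 0))
forest(result, top = 0)
title("Forest plot for a simulated meta-analysis")
```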
diff --git a/08-samplesizejustification_files/figure-pdf/fig-followupbias-1.pdf b/08-samplesizejustification_files/figure-pdf/fig-followupbias-1.pdf
index 5c75797..726d7b3 100644
Binary files a/08-samplesizejustification_files/figure-pdf/fig-followupbias-1.pdf and b/08-samplesizejustification_files/figure-pdf/fig-followupbias-1.pdf differ
diff --git a/08-samplesizejustification_files/figure-pdf/fig-noncentralt-1.pdf b/08-samplesizejustification_files/figure-pdf/fig-noncentralt-1.pdf
index 0a2e19a..1194415 100644
Binary files a/08-samplesizejustification_files/figure-pdf/fig-noncentralt-1.pdf and b/08-samplesizejustification_files/figure-pdf/fig-noncentralt-1.pdf differ
diff --git a/08-samplesizejustification_files/figure-pdf/fig-obs-power-plot-2-1.pdf b/08-samplesizejustification_files/figure-pdf/fig-obs-power-plot-2-1.pdf
index abeca4e..800d992 100644
Binary files a/08-samplesizejustification_files/figure-pdf/fig-obs-power-plot-2-1.pdf and b/08-samplesizejustification_files/figure-pdf/fig-obs-power-plot-2-1.pdf differ
diff --git a/08-samplesizejustification_files/figure-pdf/fig-plot-1-1.pdf b/08-samplesizejustification_files/figure-pdf/fig-plot-1-1.pdf
index c599f1b..12a9cad 100644
Binary files a/08-samplesizejustification_files/figure-pdf/fig-plot-1-1.pdf and b/08-samplesizejustification_files/figure-pdf/fig-plot-1-1.pdf differ
diff --git a/08-samplesizejustification_files/figure-pdf/fig-plot-4-1.pdf b/08-samplesizejustification_files/figure-pdf/fig-plot-4-1.pdf
index b95d556..0759bc1 100644
Binary files a/08-samplesizejustification_files/figure-pdf/fig-plot-4-1.pdf and b/08-samplesizejustification_files/figure-pdf/fig-plot-4-1.pdf differ
diff --git a/08-samplesizejustification_files/figure-pdf/fig-power-2-1.pdf b/08-samplesizejustification_files/figure-pdf/fig-power-2-1.pdf
index ed70f2c..a2b0f98 100644
Binary files a/08-samplesizejustification_files/figure-pdf/fig-power-2-1.pdf and b/08-samplesizejustification_files/figure-pdf/fig-power-2-1.pdf differ
diff --git a/08-samplesizejustification_files/figure-pdf/fig-power-3-1.pdf b/08-samplesizejustification_files/figure-pdf/fig-power-3-1.pdf
index a0a73d1..e08a67d 100644
Binary files a/08-samplesizejustification_files/figure-pdf/fig-power-3-1.pdf and b/08-samplesizejustification_files/figure-pdf/fig-power-3-1.pdf differ
diff --git a/08-samplesizejustification_files/figure-pdf/fig-power-effect1-1.pdf b/08-samplesizejustification_files/figure-pdf/fig-power-effect1-1.pdf
index 3abddea..ac4a8a5 100644
Binary files a/08-samplesizejustification_files/figure-pdf/fig-power-effect1-1.pdf and b/08-samplesizejustification_files/figure-pdf/fig-power-effect1-1.pdf differ
diff --git a/09-equivalencetest_files/figure-pdf/fig-ciequivalence1-1.pdf b/09-equivalencetest_files/figure-pdf/fig-ciequivalence1-1.pdf
index d13bbc5..a5138f1 100644
Binary files a/09-equivalencetest_files/figure-pdf/fig-ciequivalence1-1.pdf and b/09-equivalencetest_files/figure-pdf/fig-ciequivalence1-1.pdf differ
diff --git a/09-equivalencetest_files/figure-pdf/fig-ciequivalence2-1.pdf b/09-equivalencetest_files/figure-pdf/fig-ciequivalence2-1.pdf
index ec0ad78..49b0d16 100644
Binary files a/09-equivalencetest_files/figure-pdf/fig-ciequivalence2-1.pdf and b/09-equivalencetest_files/figure-pdf/fig-ciequivalence2-1.pdf differ
diff --git a/09-equivalencetest_files/figure-pdf/fig-intervaltest-1.pdf b/09-equivalencetest_files/figure-pdf/fig-intervaltest-1.pdf
index e23a720..0c95536 100644
Binary files a/09-equivalencetest_files/figure-pdf/fig-intervaltest-1.pdf and b/09-equivalencetest_files/figure-pdf/fig-intervaltest-1.pdf differ
diff --git a/09-equivalencetest_files/figure-pdf/fig-tdistequivalence-1.pdf b/09-equivalencetest_files/figure-pdf/fig-tdistequivalence-1.pdf
index 7144398..364e1a7 100644
Binary files a/09-equivalencetest_files/figure-pdf/fig-tdistequivalence-1.pdf and b/09-equivalencetest_files/figure-pdf/fig-tdistequivalence-1.pdf differ
diff --git a/09-equivalencetest_files/figure-pdf/fig-tmet-1.pdf b/09-equivalencetest_files/figure-pdf/fig-tmet-1.pdf
index b5df9e5..ab9af27 100644
Binary files a/09-equivalencetest_files/figure-pdf/fig-tmet-1.pdf and b/09-equivalencetest_files/figure-pdf/fig-tmet-1.pdf differ
diff --git a/10-sequential_files/figure-pdf/fig-boundplot1-1.pdf b/10-sequential_files/figure-pdf/fig-boundplot1-1.pdf
index 2621d45..01f04c3 100644
Binary files a/10-sequential_files/figure-pdf/fig-boundplot1-1.pdf and b/10-sequential_files/figure-pdf/fig-boundplot1-1.pdf differ
diff --git a/10-sequential_files/figure-pdf/fig-comparison-1.pdf b/10-sequential_files/figure-pdf/fig-comparison-1.pdf
index aa52246..c997efb 100644
Binary files a/10-sequential_files/figure-pdf/fig-comparison-1.pdf and b/10-sequential_files/figure-pdf/fig-comparison-1.pdf differ
diff --git a/10-sequential_files/figure-pdf/fig-fourspendingfunctions-1.pdf b/10-sequential_files/figure-pdf/fig-fourspendingfunctions-1.pdf
index e802628..3f76342 100644
Binary files a/10-sequential_files/figure-pdf/fig-fourspendingfunctions-1.pdf and b/10-sequential_files/figure-pdf/fig-fourspendingfunctions-1.pdf differ
diff --git a/10-sequential_files/figure-pdf/fig-futility1-1.pdf b/10-sequential_files/figure-pdf/fig-futility1-1.pdf
index cbac575..2db6be3 100644
Binary files a/10-sequential_files/figure-pdf/fig-futility1-1.pdf and b/10-sequential_files/figure-pdf/fig-futility1-1.pdf differ
diff --git a/10-sequential_files/figure-pdf/fig-futility2-1.pdf b/10-sequential_files/figure-pdf/fig-futility2-1.pdf
index f485f0a..d2e0321 100644
Binary files a/10-sequential_files/figure-pdf/fig-futility2-1.pdf and b/10-sequential_files/figure-pdf/fig-futility2-1.pdf differ
diff --git a/10-sequential_files/figure-pdf/fig-futilityq13-1.pdf b/10-sequential_files/figure-pdf/fig-futilityq13-1.pdf
index b7d7606..2d5aebb 100644
Binary files a/10-sequential_files/figure-pdf/fig-futilityq13-1.pdf and b/10-sequential_files/figure-pdf/fig-futilityq13-1.pdf differ
diff --git a/10-sequential_files/figure-pdf/fig-powerseq-1.pdf b/10-sequential_files/figure-pdf/fig-powerseq-1.pdf
index 1561e94..f98b0e0 100644
Binary files a/10-sequential_files/figure-pdf/fig-powerseq-1.pdf and b/10-sequential_files/figure-pdf/fig-powerseq-1.pdf differ
diff --git a/10-sequential_files/figure-pdf/fig-powerseq2-1.pdf b/10-sequential_files/figure-pdf/fig-powerseq2-1.pdf
index 97a7ae6..c258f23 100644
Binary files a/10-sequential_files/figure-pdf/fig-powerseq2-1.pdf and b/10-sequential_files/figure-pdf/fig-powerseq2-1.pdf differ
diff --git a/12-bias_files/figure-pdf/fig-carterbias-1.pdf b/12-bias_files/figure-pdf/fig-carterbias-1.pdf
index 659b7ac..3732472 100644
Binary files a/12-bias_files/figure-pdf/fig-carterbias-1.pdf and b/12-bias_files/figure-pdf/fig-carterbias-1.pdf differ
diff --git a/12-bias_files/figure-pdf/fig-funnel1-1.pdf b/12-bias_files/figure-pdf/fig-funnel1-1.pdf
index 3cfd8e6..3a5ea8e 100644
Binary files a/12-bias_files/figure-pdf/fig-funnel1-1.pdf and b/12-bias_files/figure-pdf/fig-funnel1-1.pdf differ
diff --git a/12-bias_files/figure-pdf/fig-funnel2-1.pdf b/12-bias_files/figure-pdf/fig-funnel2-1.pdf
index 7f15495..5568f46 100644
Binary files a/12-bias_files/figure-pdf/fig-funnel2-1.pdf and b/12-bias_files/figure-pdf/fig-funnel2-1.pdf differ
diff --git a/12-bias_files/figure-pdf/fig-petpeese-1.pdf b/12-bias_files/figure-pdf/fig-petpeese-1.pdf
index a66e796..e6579f2 100644
Binary files a/12-bias_files/figure-pdf/fig-petpeese-1.pdf and b/12-bias_files/figure-pdf/fig-petpeese-1.pdf differ
diff --git a/12-bias_files/figure-pdf/fig-petpeeseq4-1.pdf b/12-bias_files/figure-pdf/fig-petpeeseq4-1.pdf
index 8a7728b..b86c9f7 100644
Binary files a/12-bias_files/figure-pdf/fig-petpeeseq4-1.pdf and b/12-bias_files/figure-pdf/fig-petpeeseq4-1.pdf differ
diff --git a/12-bias_files/figure-pdf/fig-trimfill1-1.pdf b/12-bias_files/figure-pdf/fig-trimfill1-1.pdf
index d0ee5b2..5c25cf6 100644
Binary files a/12-bias_files/figure-pdf/fig-trimfill1-1.pdf and b/12-bias_files/figure-pdf/fig-trimfill1-1.pdf differ
diff --git a/12-bias_files/figure-pdf/fig-twoforestplot-1.pdf b/12-bias_files/figure-pdf/fig-twoforestplot-1.pdf
index cb3caba..a1db5c1 100644
Binary files a/12-bias_files/figure-pdf/fig-twoforestplot-1.pdf and b/12-bias_files/figure-pdf/fig-twoforestplot-1.pdf differ
diff --git a/12-bias_files/figure-pdf/metasimq2-1.pdf b/12-bias_files/figure-pdf/metasimq2-1.pdf
index f9eb572..95d23f1 100644
Binary files a/12-bias_files/figure-pdf/metasimq2-1.pdf and b/12-bias_files/figure-pdf/metasimq2-1.pdf differ
diff --git a/13-prereg_files/figure-epub/unnamed-chunk-1-1.png b/13-prereg_files/figure-epub/unnamed-chunk-1-1.png
deleted file mode 100644
index 6254897..0000000
Binary files a/13-prereg_files/figure-epub/unnamed-chunk-1-1.png and /dev/null differ
diff --git a/docs/03-likelihoods.html b/docs/03-likelihoods.html
index 19d9700..9528a81 100644
--- a/docs/03-likelihoods.html
+++ b/docs/03-likelihoods.html
@@ -458,8 +458,8 @@

3.4.1 Questions about likelihoods

Q1: Let’s assume that you flip what you believe to be a fair coin. What is the binomial probability of observing 8 heads out of 10 coin flips, when p = 0.5? (You can use the functions in the chapter, or compute it by hand).
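The calculation uses the binomial probability mass function, which in R is the `dbinom()` function used throughout the chapter:

```r
# Binomial probability of observing 8 heads in 10 flips of a fair coin (p = 0.5)
dbinom(x = 8, size = 10, prob = 0.5)
```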


Q2: The likelihood curve rises and falls, except in the extreme cases where 0 heads or only heads are observed. Copy the code below (remember that you can click the ‘clipboard’ icon on the top right of the code section to copy all the code to your clipboard), and plot the likelihood curves for 0 heads (x <- 0) out of 10 flips (n <- 10) by running the script. What does the likelihood curve look like?

@@ -483,8 +483,8 @@

title(paste("Likelihood Ratio H0/H1:", round(dbinom(x, n, H0) / dbinom(x, n, H1), digits = 2), " Likelihood Ratio H1/H0:", round(dbinom(x, n, H1) / dbinom(x, n, H0), digits = 2)))


Q3: Get a coin out of your pocket or purse. Flip it 13 times, and count the number of heads. Using the code above, calculate the likelihood of your observed results under the hypothesis that your coin is fair, compared to the hypothesis that the coin is not fair. Set the number of successes (x) to the number of heads you observed. Change \(H_1\) to the proportion of heads you have observed (or leave it at 0 if you didn’t observe any heads at all!). You can just use 4/13, or enter 0.3038. Leave \(H_0\) at 0.5. Run the script to calculate the likelihood ratio. What is the likelihood ratio of a fair compared to a non-fair coin (or \(H_0\)/\(H_1\)) that flips heads as often as you have observed, based on the observed data? Round your answer to 2 digits after the decimal.
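The same `dbinom()` ratio used in the script above gives the likelihood ratio directly. A sketch of the calculation, assuming (for illustration) 4 heads were observed in 13 flips:

```r
x <- 4       # observed number of heads
n <- 13      # number of flips
H0 <- 0.5    # fair coin
H1 <- x / n  # observed proportion of heads

# Likelihood ratio of H0 versus H1 for the observed data
dbinom(x, n, H0) / dbinom(x, n, H1)
```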

@@ -497,8 +497,8 @@


Q7: When comparing two hypotheses (p = X vs p = Y), a likelihood ratio of:


@@ -506,21 +506,21 @@


A Shiny app to perform the calculations is available here.

Q8: Which statement is correct when you perform 3 studies?


Q9: Sometimes in a set of three studies, you’ll find a significant effect in one study, but there is no effect in the other two related studies. Assume the two related studies were not exactly the same in every way (e.g., you changed the manipulation, or the procedure, or some of the questions). It could be that the two other studies did not work because of minor differences that had some effect that you do not fully understand yet. Or it could be that the single significant result was a Type 1 error, and \(H_0\) was true in all three studies. Which statement below is correct, assuming a 5% Type 1 error rate and 80% power?


The idea that most studies have 80% power is slightly optimistic. Examine the correct answer to the previous question across a range of power values (e.g., 50% power, and 30% power).

Q10: Several papers suggest it is a reasonable assumption that the power in the psychological literature might be around 50%. Set the number of studies to 4, the number of successes also to 4, and the assumed power slider to 50%, and look at the table at the bottom of the app. How likely is it that you will observe 4 significant results in 4 studies, assuming there is a true effect?


Imagine you perform 4 studies, and 3 show a significant result. Change these numbers in the online app. Leave the power at 50%. The output in the text tells you:

@@ -529,16 +529,16 @@


These calculations show that, assuming you have observed three significant results out of four studies, and assuming each study had 50% power, you are 526 times more likely to have observed these data when the alternative hypothesis is true, than when the null hypothesis is true. In other words, you are 526 times more likely to find a significant effect in three out of four studies when you have 50% power, than to find three Type 1 errors in a set of four studies.
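The 526 figure follows directly from two binomial probabilities: the probability of 3 significant results in 4 studies when power is 50%, versus the probability of 3 significant results in 4 studies when only Type 1 errors (5%) can occur. A quick check in R:

```r
# Probability of 3 significant results in 4 studies...
p_H1 <- dbinom(3, size = 4, prob = 0.50)  # ...when each study has 50% power
p_H0 <- dbinom(3, size = 4, prob = 0.05)  # ...when only 5% Type 1 errors occur

p_H1 / p_H0  # roughly 526
```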

Q11: Maybe you don’t think 50% power is a reasonable assumption. How low can the power be (rounded to 2 digits), for the likelihood ratio to remain higher than 32 in favor of \(H_1\) when observing 3 out of 4 significant results?


The main take-home message of these calculations is to understand that 1) mixed results are supposed to happen, and 2) mixed results can contain strong evidence for a true effect, across a wide range of plausible power values. The app also tells you how much evidence, in a rough dichotomous way, you can expect. This is useful for our educational goal. But when you want to evaluate results from multiple studies, the formal way to do so is by performing a meta-analysis.

The above calculations make a very important assumption, namely that the Type 1 error rate is controlled at 5%. If you try out many different tests in each study, and only report the result that yielded p < 0.05, these calculations no longer hold.

Q12: Go back to the default settings of 2 out of 3 significant results, but now set the Type 1 error rate to 20%, to reflect a modest amount of p-hacking. Under these circumstances, what is the highest likelihood in favor of \(H_1\) you can get if you explore all possible values for the true power?


As the scenario above shows, p-hacking makes studies extremely uninformative. If you inflate the error rate, you quickly destroy the evidence in the data. You can no longer determine whether the data are more likely when there is no effect, than when there is an effect. Sometimes researchers complain that people who worry about p-hacking and try to promote better Type 1 error control are missing the point, and that other things (better measurement, better theory, etc.) are more important. I fully agree that these aspects of scientific research are at least as important as better error control. But better measures and theories will require decades of work. Better error control could be accomplished today, if researchers would stop inflating their error rates by flexibly analyzing their data. And as this assignment shows, inflated rates of false positives very quickly make it difficult to learn what is true from the data we collect. Because of the relative ease with which this part of scientific research can be improved, and because we can achieve this today (and not in a decade), I think it is worth stressing the importance of error control, and publishing more realistic-looking sets of studies.

diff --git a/docs/04-bayes.html b/docs/04-bayes.html
index 9d376f1..5e59742 100644
--- a/docs/04-bayes.html
+++ b/docs/04-bayes.html
@@ -541,14 +541,14 @@

Q1: The true believer had a prior of Beta(1,0.5). After observing 10 heads out of 20 coin flips, what is the posterior distribution, given that \(\alpha\) = \(\alpha\) + x and \(\beta\) = \(\beta\) + n – x?


Q2: The extreme skeptic had a prior of Beta(100,100). After observing 50 heads out of 100 coin flips, what is the posterior distribution, given that \(\alpha\) = \(\alpha\) + x and \(\beta\) = \(\beta\) + n – x?


Copy the R script below into R. This script requires 5 input parameters (identical to the Bayes Factor calculator website used above). These are the hypothesis you want to examine (e.g., when evaluating whether a coin is fair, p = 0.5), the total number of trials (e.g., 20 flips), the number of successes (e.g., 10 heads), and the \(\alpha\) and \(\beta\) values for the Beta distribution for the prior (e.g., \(\alpha\) = 1 and \(\beta\) = 1 for a uniform prior). Run the script. It will calculate the Bayes Factor, and plot the prior (grey), likelihood (dashed blue), and posterior (black).
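A compact sketch of the same calculation is shown below, using conjugate Beta-Binomial updating and computing the Bayes factor as the ratio of posterior to prior density at the hypothesized value (the Savage-Dickey density ratio). The variable names and plotting details are illustrative and may differ from the script in the chapter.

```r
H0     <- 0.5  # hypothesized probability of heads
n      <- 20   # total number of flips
x      <- 10   # number of heads
aprior <- 1    # alpha of the Beta prior
bprior <- 1    # beta of the Beta prior

# Conjugate updating: the posterior is Beta(aprior + x, bprior + n - x)
apost <- aprior + x
bpost <- bprior + n - x

theta      <- seq(0, 1, 0.001)
prior      <- dbeta(theta, aprior, bprior)
likelihood <- dbeta(theta, x + 1, n - x + 1)  # likelihood, scaled to integrate to 1
posterior  <- dbeta(theta, apost, bpost)

plot(theta, posterior, type = "l", lwd = 2,
     ylim = c(0, max(c(prior, likelihood, posterior))),
     xlab = "p", ylab = "Density")
lines(theta, prior, col = "grey", lwd = 2)                  # prior (grey)
lines(theta, likelihood, lty = 2, col = "dodgerblue", lwd = 2)  # likelihood (dashed blue)

# Bayes factor for H0 relative to the prior-specified alternative:
# the ratio of posterior to prior density at H0 (values > 1 favor H0).
BF01 <- dbeta(H0, apost, bpost) / dbeta(H0, aprior, bprior)
BF01
```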

@@ -588,14 +588,14 @@

We see that for the newborn baby, p = 0.5 has become more probable, but so has p = 0.4.

Q3: Change the hypothesis in the first line from 0.5 to 0.675, and run the script. If you were testing the idea that this coin returns 67.5% heads, which statement is true?


Q4: Change the hypothesis in the first line back to 0.5. Let’s look at the increase in the belief of the hypothesis p = 0.5 for the extreme skeptic after 10 heads out of 20 coin flips. Change the \(\alpha\) for the prior in line 4 to 100 and the \(\beta\) for the prior in line 5 to 100. Run the script. Compare the figure from R to the increase in belief for the newborn baby. Which statement is true?


Copy the R script below and run it. The script will plot the mean for the posterior when 10 heads out of 20 coin flips are observed, given a uniform prior (as in Figure 4.6). The script will also use the ‘binom’ package to calculate the posterior mean, credible interval, and highest density interval. The highest density interval is an alternative to the credible interval.
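A minimal sketch of these posterior summaries with the binom package is shown below; the prior shape arguments correspond to the uniform Beta(1,1) prior used here, and the exact arguments in the chapter's script may differ.

```r
library(binom)

# Posterior mean and central 95% credible interval for 10 heads in 20 flips,
# with a uniform Beta(1,1) prior.
binom.bayes(x = 10, n = 20, type = "central", prior.shape1 = 1, prior.shape2 = 1)

# Highest density interval as an alternative to the central credible interval.
binom.bayes(x = 10, n = 20, type = "highest", prior.shape1 = 1, prior.shape2 = 1)
```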

@@ -697,8 +697,8 @@

The posterior mean is identical to the Frequentist mean, but this is only the case when the mean of the prior equals the mean of the likelihood.

Q5: Assume the outcome of 20 coin flips had been 18 heads. Change x to 18 in line 2 and run the script. Remember that the mean of the prior Beta(1,1) distribution is \(\alpha\) / (\(\alpha\) + \(\beta\)), or 1/(1+1) = 0.5. The Frequentist mean is simply x/n, or 18/20=0.9. Which statement is true?


Q6: What is, today, your best estimate of the probability that the sun will rise tomorrow? Assume you were born with a uniform Beta(1,1) prior. The sun can either rise, or not. Assume you have seen the sun rise every day since you were born, which means there has been a continuous string of successes for every day you have been alive. It is OK to estimate the days you have been alive by just multiplying your age by 365 days. What is your best estimate of the probability that the sun will rise tomorrow?

diff --git a/docs/05-questions.html b/docs/05-questions.html
index ed40e10..529a942 100644
--- a/docs/05-questions.html
+++ b/docs/05-questions.html
@@ -547,7 +547,7 @@

\(<\) .05). American Psychologist, 49(12), 997–1003. https://doi.org/10.1037/0003-066X.49.12.997

-Colling, L. J., Szűcs, D., De Marco, D., Cipora, K., Ulrich, R., Nuerk, H.-C., Soltanlou, M., Bryce, D., Chen, S.-C., Schroeder, P. A., Henare, D. T., Chrystall, C. K., Corballis, P. M., Ansari, D., Goffin, C., Sokolowski, H. M., Hancock, P. J. B., Millen, A. E., Langton, S. R. H., … McShane, B. B. (2020). Registered Replication Report on Fischer, Castel, Dodd, and Pratt (2003). Advances in Methods and Practices in Psychological Science, 3(2), 143–162. https://doi.org/10.1177/2515245920903079 +Colling, L. J., Szcs, D., De Marco, D., Cipora, K., Ulrich, R., Nuerk, H.-C., Soltanlou, M., Bryce, D., Chen, S.-C., Schroeder, P. A., Henare, D. T., Chrystall, C. K., Corballis, P. M., Ansari, D., Goffin, C., Sokolowski, H. M., Hancock, P. J. B., Millen, A. E., Langton, S. R. H., … McShane, B. B. (2020). Registered Replication Report on Fischer, Castel, Dodd, and Pratt (2003). Advances in Methods and Practices in Psychological Science, 3(2), 143–162. https://doi.org/10.1177/2515245920903079
de Groot, A. D. (1969). Methodology (Vol. 6). Mouton & Co.
diff --git a/docs/06-effectsize.html b/docs/06-effectsize.html
index 29e6ad4..4c64e8b 100644
--- a/docs/06-effectsize.html
+++ b/docs/06-effectsize.html
@@ -551,76 +551,76 @@

Q1: One of the largest effect sizes in the meta-meta analysis by Richard and colleagues from 2003 is that people are likely to perform an action if they feel positively about the action and believe it is common. Such an effect is (with all due respect to all of the researchers who contributed to this meta-analysis) somewhat trivial. Even so, the correlation was r = .66, which equals a Cohen’s d of 1.76. What, according to the online app at https://rpsychologist.com/cohend/, is the probability of superiority for an effect of this size?


Q2: Cohen’s d is to ______ as eta-squared is to ________


Q3: A correlation of r = 1.2 is:


Q4: Let’s assume the difference between two means we observe is 1, and the pooled standard deviation is also 1. If we simulate a large number of studies with those values, what, on average, happens to the t-value and Cohen’s d, as a function of the sample size in these simulations?


Q5: Go to http://rpsychologist.com/d3/correlation/ to look at a good visualization of the proportion of variance that is explained by group membership, and the relationship between r and \(r^2\). Look at the scatterplot and the shared variance for an effect size of r = .21 (Richard et al., 2003). Given that r = 0.21 was their estimate of the median effect size in psychological research (not corrected for bias), how much variance in the data do variables in psychology on average explain?


Q6: By default, the sample size for the online correlation visualization linked to above is 50. Click on the cogwheel to access the settings, change the sample size to 500, and click the button ‘New Sample’. What happens?


Q7: In an old paper you find a statistical result reported as t(36) = 2.14, p < 0.05 for an independent t-test without a reported effect size. Using the online MOTE app https://doomlab.shinyapps.io/mote/ (choose “Independent t -t” from the Mean Differences dropdown menu) or the MOTE R function d.ind.t.t, what is the Cohen’s d effect size for this effect, given 38 participants (e.g., 19 in each group, leading to N – 2 = 36 degrees of freedom) and an alpha level of 0.05?


Q8: In an old paper you find a statistical result from a 2x3 between-subjects ANOVA reported as F(2, 122) = 4.13, p < 0.05, without a reported effect size. Using the online MOTE app https://doomlab.shinyapps.io/mote/ (choose Eta – F from the Variance Overlap dropdown menu) or the MOTE R function eta.F, what is the effect size expressed as partial eta-squared?


Q9: You realize that computing omega-squared corrects for some of the bias in eta-squared. For the old paper with F(2, 122) = 4.13, p < 0.05, and using the online MOTE app https://doomlab.shinyapps.io/mote/ (choose Omega – F from the Variance Overlap dropdown menu) or the MOTE R function omega.F, what is the effect size in partial omega-squared? HINT: The total sample size is the \(df_{error} + k\), where k is the number of groups (which is 6 for the 2x3 ANOVA).
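For reference, both variance-based effect sizes can also be computed directly from the F-statistic and its degrees of freedom. The sketch below uses standard conversion formulas under these assumptions (the MOTE app additionally returns confidence intervals, so its output is richer):

```r
F_value <- 4.13
df_n <- 2          # degrees of freedom of the effect
df_d <- 122        # error degrees of freedom
N    <- df_d + 6   # total sample size: df_error + number of groups (hint in Q9)

# Partial eta-squared from F
eta_p2 <- (F_value * df_n) / (F_value * df_n + df_d)

# Partial omega-squared from F (corrects some of the bias in eta-squared)
omega_p2 <- (df_n * (F_value - 1)) / (df_n * (F_value - 1) + N)

eta_p2
omega_p2
```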


Q10: Several times in this chapter the effect size Cohen’s d was converted to r, or vice versa. We can use the effectsize R package (that can also be used to compute effect sizes when you analyze your data in R) to convert the median r = 0.21 observed in Richard and colleagues’ meta-meta-analysis to d: effectsize::r_to_d(0.21) which (assuming equal sample sizes per condition) yields d = 0.43. Which Cohen’s d corresponds to an r = 0.1?
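The conversion used here (for equal group sizes) is d = 2r / sqrt(1 − r²); a quick check in R reproduces the value reported above:

```r
library(effectsize)

r_to_d(0.21)                 # ~0.43, as reported above
2 * 0.21 / sqrt(1 - 0.21^2)  # the same conversion written out by hand
```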


Q11: It can be useful to convert effect sizes to r when performing a meta-analysis where not all effect sizes that are included are based on mean differences. Using the d_to_r() function in the effectsize package, what does a d = 0.8 correspond to (again assuming equal sample sizes per condition)?


Q12: From questions 10 and 11 you might have noticed something peculiar. The benchmarks typically used for ‘small’, ‘medium’, and ‘large’ effects for Cohen’s d are d = 0.2, d = 0.5, and d = 0.8, and for a correlation are r = 0.1, r = 0.3, and r = 0.5. Using the d_to_r() function in the effectsize package, check to see whether the benchmarks for a ‘large’ effect size correspond between d and r.

As McGrath & Meyer (2006) write: “Many users of Cohen’s (1988) benchmarks seem unaware that those for the correlation coefficient and d are not strictly equivalent, because Cohen’s generally cited benchmarks for the correlation were intended for the infrequently used biserial correlation rather than for the point biserial.”

Download the paper by McGrath and Meyer, 2006 (you can find links to the pdf here), and on page 390, right column, read which solution the authors prefer.

diff --git a/docs/07-CI.html b/docs/07-CI.html
index 72d4702..2fac405 100644
--- a/docs/07-CI.html
+++ b/docs/07-CI.html
@@ -372,7 +372,7 @@

There is a direct relationship between the CI around an effect size and statistical significance of a null-hypothesis significance test. For example, if an effect is statistically significant (p < 0.05) in a two-sided independent t-test with an alpha of .05, the 95% CI for the mean difference between the two groups will not include zero. Confidence intervals are sometimes said to be more informative than p-values, because they not only provide information about whether an effect is statistically significant (i.e., when the confidence interval does not overlap with the value representing the null hypothesis), but also communicate the precision of the effect size estimate. This is true, but as mentioned in the chapter on p-values it is still recommended to add exact p-values, which facilitates the re-use of results for secondary analyses (Appelbaum et al., 2018), and allows other researchers to compare the p-value to an alpha level they would have preferred to use (Lehmann & Romano, 2005).

In order to maintain the direct relationship between a confidence interval and a p-value it is necessary to adjust the confidence interval level whenever the alpha level is adjusted. For example, if an alpha level of 5% is corrected for three comparisons to 0.05/3 = 0.0167, the corresponding confidence interval would be a 1 - 0.0167 = 0.9833 confidence interval. Similarly, if a p-value is computed for a one-sided t-test, there is only an upper or lower limit of the interval, and the other end of the interval ranges to −∞ or ∞.
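In R this simply means passing the adjusted level to the `conf.level` argument, and for a one-sided test the reported interval has one infinite limit. A small sketch with simulated data (the variable names and effect size are illustrative):

```r
set.seed(1)
x <- rnorm(30)             # simulated control group
y <- rnorm(30, mean = 0.5) # simulated treatment group

# Alpha corrected for three comparisons: report a 1 - 0.05/3 (98.33%) CI
t.test(x, y, conf.level = 1 - 0.05 / 3)$conf.int

# One-sided test: the interval runs to -Inf (or Inf) on one side
t.test(x, y, alternative = "less", conf.level = 0.95)$conf.int
```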

To maintain a direct relationship between an F-test and its confidence interval, a 90% CI for effect sizes from an F-test should be provided. The reason for this is explained by Karl Wuensch. Where Cohen’s d can take both positive and negative values, r² or η² are squared, and can therefore only take positive values. This is related to the fact that F-tests (as commonly used in ANOVA) are one-sided. If you calculate a 95% CI, you can get situations where the confidence interval includes 0, but the test reveals a statistical difference with a p < .05 (for a more mathematical explanation, see Steiger (2004)). This means that a 95% CI around Cohen’s d in an independent t-test equals a 90% CI around η² for exactly the same test performed as an ANOVA. As a final detail, because eta-squared cannot be smaller than zero, the lower bound for the confidence interval cannot be smaller than 0. This means that a confidence interval for an effect that is not statistically different from 0 has to start at 0. You report such a CI as 90% CI [.00; .XX] where the XX is the upper limit of the CI.

-Confidence intervals are often used in forest plots that communicate the results from a meta-analysis. In the plot below, we see 4 rows. Each row shows the effect size estimate from one study (in Hedges’ g). For example, study 1 yielded an effect size estimate of 0.44, with a confidence interval around the effect size from 0.08 to 0.8. The horizontal black line, similarly to the visualization we played around with before, is the width of the confidence interval. When it does not touch the effect size 0 (indicated by a black vertical dotted line) the effect is statistically significant.
+Confidence intervals are often used in forest plots that communicate the results from a meta-analysis. In the plot below, we see 4 rows. Each row shows the effect size estimate from one study (in Hedges’ g). For example, study 1 yielded an effect size estimate of 0.53, with a confidence interval around the effect size from 0.12 to 0.94. The horizontal black line, similarly to the visualization we played around with before, is the width of the confidence interval. When it does not touch the effect size 0 (indicated by a black vertical dotted line) the effect is statistically significant.

diff --git a/docs/10-sequential.html b/docs/10-sequential.html
index 550ad0c..3c2938b 100644
--- a/docs/10-sequential.html
+++ b/docs/10-sequential.html
@@ -847,17 +847,17 @@

-[PROGRESS] Stage results calculated [0.0352 secs] 
-[PROGRESS] Conditional power calculated [0.0277 secs] 
-[PROGRESS] Conditional rejection probabilities (CRP) calculated [0.001 secs] 
-[PROGRESS] Repeated confidence interval of stage 1 calculated [0.6816 secs] 
-[PROGRESS] Repeated confidence interval of stage 2 calculated [0.6956 secs] 
-[PROGRESS] Repeated confidence interval calculated [1.38 secs] 
-[PROGRESS] Repeated p-values of stage 1 calculated [0.2672 secs] 
-[PROGRESS] Repeated p-values of stage 2 calculated [0.2507 secs] 
-[PROGRESS] Repeated p-values calculated [0.5192 secs] 
-[PROGRESS] Final p-value calculated [0.0013 secs] 
-[PROGRESS] Final confidence interval calculated [0.0645 secs] 
+[PROGRESS] Stage results calculated [0.0434 secs] 
+[PROGRESS] Conditional power calculated [0.0327 secs] 
+[PROGRESS] Conditional rejection probabilities (CRP) calculated [0.0012 secs] 
+[PROGRESS] Repeated confidence interval of stage 1 calculated [0.7686 secs] 
+[PROGRESS] Repeated confidence interval of stage 2 calculated [0.7027 secs] 
+[PROGRESS] Repeated confidence interval calculated [1.47 secs] 
+[PROGRESS] Repeated p-values of stage 1 calculated [0.254 secs] 
+[PROGRESS] Repeated p-values of stage 2 calculated [0.2679 secs] 
+[PROGRESS] Repeated p-values calculated [0.5231 secs] 
+[PROGRESS] Final p-value calculated [0.0015 secs] 
+[PROGRESS] Final confidence interval calculated [0.0804 secs] 
diff --git a/docs/Improving-Your-Statistical-Inferences.epub b/docs/Improving-Your-Statistical-Inferences.epub
index ecf9e14..166bfd6 100644
Binary files a/docs/Improving-Your-Statistical-Inferences.epub and b/docs/Improving-Your-Statistical-Inferences.epub differ
diff --git a/docs/Improving-Your-Statistical-Inferences.pdf b/docs/Improving-Your-Statistical-Inferences.pdf
index 91b94de..7bdfd94 100644
Binary files a/docs/Improving-Your-Statistical-Inferences.pdf and b/docs/Improving-Your-Statistical-Inferences.pdf differ
diff --git a/docs/changelog.html b/docs/changelog.html
index 6d46f53..ad9be77 100644
--- a/docs/changelog.html
+++ b/docs/changelog.html
@@ -254,8 +254,8 @@

Change Log

The current version of this textbook is 1.4.3.

-This version has been compiled on January 10, 2024.
-This version was generated from Git commit #5d126ce3. All version controlled changes can be found on GitHub.
+This version has been compiled on February 01, 2024.
+This version was generated from Git commit #bb858739. All version controlled changes can be found on GitHub.

This page documents the changes to the textbook that were more substantial than fixing a typo.

Updates

January 10, 2024:

diff --git a/docs/references.html b/docs/references.html
index 5726051..cb9d5af 100644
--- a/docs/references.html
+++ b/docs/references.html
@@ -757,7 +757,7 @@

References

Behaviour, 6(12), 1731–1742. https://doi.org/10.1038/s41562-022-01458-9
-Colling, L. J., Szűcs, D., De Marco, D., Cipora, K., Ulrich, R., Nuerk, +Colling, L. J., Szcs, D., De Marco, D., Cipora, K., Ulrich, R., Nuerk, H.-C., Soltanlou, M., Bryce, D., Chen, S.-C., Schroeder, P. A., Henare, D. T., Chrystall, C. K., Corballis, P. M., Ansari, D., Goffin, C., Sokolowski, H. M., Hancock, P. J. B., Millen, A. E., Langton, S. R. H., diff --git a/docs/search.json b/docs/search.json index e19b49d..4afcf0f 100644 --- a/docs/search.json +++ b/docs/search.json @@ -270,7 +270,7 @@ "href": "05-questions.html#sec-verisimilitude", "title": "5  Asking Statistical Questions", "section": "\n5.13 Verisimilitude and Progress in Science", - "text": "5.13 Verisimilitude and Progress in Science\n\nIt makes a fellow cheery To be cooking up a theory; And it needn’t make him blue That it’s not exactly true If at least it’s getting neary. Verisimilitude — Meehl, 11/7/1988\n\nDoes science offer a way to learn what is true about our world? According to the perspective in philosophy of science known as scientific realism, the answer is ‘yes’. Scientific realism is the idea that successful scientific theories that have made novel predictions give us a good reason to believe these theories make statements about the world that are at least partially true. Known as the no miracle argument, only realism can explain the success of science, which consists of repeatedly making successful predictions (Duhem, 1954), without requiring us to believe in miracles.\nNot everyone thinks that it matters whether scientific theories make true statements about the world, as scientific realists do. Laudan (1981) argues against scientific realism based on a pessimistic meta-induction: If theories that were deemed successful in the past turn out to be false, then we can reasonably expect all our current successful theories to be false as well. Van Fraassen (1980) believes it is sufficient for a theory to be ‘empirically adequate’, and make true predictions about things we can observe, irrespective of whether these predictions are derived from a theory that describes how the unobservable world is in reality. This viewpoint is known as constructive empiricism. As Van Fraassen summarizes the constructive empiricist perspective (1980, p.12): “Science aims to give us theories which are empirically adequate; and acceptance of a theory involves as belief only that it is empirically adequate”.\nThe idea that we should ‘believe’ scientific hypotheses is not something scientific realists can get behind. Either they think theories make true statements about things in the world, but we will have to remain completely agnostic about when they do (Feyerabend, 1993), or they think that corroborating novel and risky predictions makes it reasonable to believe that a theory has some ‘truth-likeness’, or verisimilitude. The concept of verisimilitude is based on the intuition that a theory is closer to a true statement when the theory allows us to make more true predictions, and less false predictions. When data is in line with predictions, a theory gains verisimilitude, when data are not in line with predictions, a theory loses verisimilitude (Meehl, 1978). Popper clearly intended verisimilitude to be different from belief (Niiniluoto, 1998). Importantly, verisimilitude refers to how close a theory is to the truth, which makes it an ontological, not epistemological question. 
That is, verisimilitude is a function of the degree to which a theory is similar to the truth, but it is not a function of the degree of belief in, or the evidence for, a theory (Meehl, 1978, 1990a). It is also not necessary for a scientific realist that we ever know what is true – we just need to be of the opinion that we can move closer to the truth (known as comparative scientific realism, Kuipers (2016)).\nAttempts to formalize verisimilitude have been a challenge, and from the perspective of an empirical scientist, the abstract nature of this ongoing discussion does not really make me optimistic it will be extremely useful in everyday practice. On a more intuitive level, verisimilitude can be regarded as the extent to which a theory makes the most correct (and least incorrect) statements about specific features in the world. One way to think about this is using the ‘possible worlds’ approach (Niiniluoto, 1999), where for each basic state of the world one can predict, there is a possible world that contains each unique combination of states.\nFor example, consider the experiments by Stroop (1935), where color related words (e.g., RED, BLUE) are printed either in congruent colors (i.e., the word RED in red ink) or incongruent colors (i.e., the word RED in blue ink). We might have a very simple theory predicting that people automatically process irrelevant information in a task. When we do two versions of a Stroop experiment, one where people are asked to read the words, and one where people are asked to name the colors, this simple theory would predict slower responses on incongruent trials, compared to congruent trials. A slightly more advanced theory predicts that congruency effects are dependent upon the salience of the word dimension and color dimension (Melara & Algom, 2003). Because in the standard Stroop experiment the word dimension is much more salient in both tasks than the color dimension, this theory predicts slower responses on incongruent trials, but only in the color naming condition. We have four possible worlds, two of which represent predictions from either of the two theories, and two that are not in line with either theory.\n\n\n\nResponses Color Naming\nResponses Word Naming\n\n\n\nWorld 1\nSlower\nSlower\n\n\nWorld 2\nSlower\nNot Slower\n\n\nWorld 3\nNot Slower\nSlower\n\n\nWorld 4\nNot Slower\nNot Slower\n\n\n\nMeehl (1990b) discusses a ‘box score’ of the number of successfully predicted features, which he acknowledges is too simplistic. No widely accepted formalized measure of verisimilitude is available to express the similarity between the successfully predicted features by a theory, although several proposals have been put forward (Cevolani et al., 2011; Niiniluoto, 1998; Oddie, 2013). However, even if formal measures of verisimilitude are not available, it remains a useful concept to describe theories that are assumed to be closer to the truth because they make novel predictions (Psillos, 1999).\n\n\n\n\n\nAltoè, G., Bertoldo, G., Zandonella Callegher, C., Toffalini, E., Calcagnì, A., Finos, L., & Pastore, M. (2020). Enhancing Statistical Inference in Psychological Research via Prospective and Retrospective Design Analysis. Frontiers in Psychology, 10.\n\n\nBaguley, T. (2012). Serious stats: A guide to advanced statistics for the behavioral sciences. Palgrave Macmillan.\n\n\nCevolani, G., Crupi, V., & Festa, R. (2011). Verisimilitude and belief change for conjunctive theories. Erkenntnis, 75(2), 183.\n\n\nCho, H.-C., & Abe, S. (2013). 
Is two-tailed testing for directional research hypotheses tests legitimate? Journal of Business Research, 66(9), 1261–1266. https://doi.org/10.1016/j.jbusres.2012.02.023\n\n\nCohen, J. (1994). The earth is round (p \\(<\\) .05). American Psychologist, 49(12), 997–1003. https://doi.org/10.1037/0003-066X.49.12.997\n\n\nColling, L. J., Szűcs, D., De Marco, D., Cipora, K., Ulrich, R., Nuerk, H.-C., Soltanlou, M., Bryce, D., Chen, S.-C., Schroeder, P. A., Henare, D. T., Chrystall, C. K., Corballis, P. M., Ansari, D., Goffin, C., Sokolowski, H. M., Hancock, P. J. B., Millen, A. E., Langton, S. R. H., … McShane, B. B. (2020). Registered Replication Report on Fischer, Castel, Dodd, and Pratt (2003). Advances in Methods and Practices in Psychological Science, 3(2), 143–162. https://doi.org/10.1177/2515245920903079\n\n\nde Groot, A. D. (1969). Methodology (Vol. 6). Mouton & Co.\n\n\nDongen, N. N. N. van, Doorn, J. B. van, Gronau, Q. F., Ravenzwaaij, D. van, Hoekstra, R., Haucke, M. N., Lakens, D., Hennig, C., Morey, R. D., Homer, S., Gelman, A., Sprenger, J., & Wagenmakers, E.-J. (2019). Multiple Perspectives on Inference for Two Simple Statistical Scenarios. The American Statistician, 73(sup1), 328–339. https://doi.org/10.1080/00031305.2019.1565553\n\n\nDubin, R. (1969). Theory building. Free Press.\n\n\nDuhem, P. (1954). The aim and structure of physical theory. Princeton University Press.\n\n\nFerguson, C. J., & Heene, M. (2021). Providing a lower-bound estimate for psychology’s “crud factor”: The case of aggression. Professional Psychology: Research and Practice, 52(6), 620–626. https://doi.org/http://dx.doi.org/10.1037/pro0000386\n\n\nFeyerabend, P. (1993). Against method (3rd ed). Verso.\n\n\nFeynman, R. P. (1974). Cargo cult science. Engineering and Science, 37(7), 10–13.\n\n\nFiedler, K. (2004). Tools, toys, truisms, and theories: Some thoughts on the creative cycle of theory formation. Personality and Social Psychology Review, 8(2), 123–131. https://doi.org/10.1207/s15327957pspr0802_5\n\n\nGelman, A., & Carlin, J. (2014). Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors. Perspectives on Psychological Science, 9(6), 641–651.\n\n\nGerring, J. (2012). Mere Description. British Journal of Political Science, 42(4), 721–746. https://doi.org/10.1017/S0007123412000130\n\n\nHagger, M. S., Chatzisarantis, N. L. D., Alberts, H., Anggono, C. O., Batailler, C., Birt, A. R., Brand, R., Brandt, M. J., Brewer, G., Bruyneel, S., Calvillo, D. P., Campbell, W. K., Cannon, P. R., Carlucci, M., Carruth, N. P., Cheung, T., Crowell, A., De Ridder, D. T. D., Dewitte, S., … Zwienenberg, M. (2016). A Multilab Preregistered Replication of the Ego-Depletion Effect. Perspectives on Psychological Science, 11(4), 546–573. https://doi.org/10.1177/1745691616652873\n\n\nHand, D. J. (1994). Deconstructing Statistical Questions. Journal of the Royal Statistical Society. Series A (Statistics in Society), 157(3), 317–356. https://doi.org/10.2307/2983526\n\n\nHempel, C. G. (1966). Philosophy of natural science (Nachdr.). Prentice-Hall.\n\n\nJeffreys, H. (1939). Theory of probability (1st ed). Oxford University Press.\n\n\nJones, L. V. (1952). Test of hypotheses: One-sided vs. Two-sided alternatives. Psychological Bulletin, 49(1), 43–46. https://doi.org/http://dx.doi.org/10.1037/h0056832\n\n\nKaiser, H. F. (1960). Directional statistical decisions. Psychological Review, 67(3), 160–167. https://doi.org/10.1037/h0047595\n\n\nKenett, R. S., Shmueli, G., & Kenett, R. (2016). 
Information Quality: The Potential of Data and Analytics to Generate Knowledge (1st edition). Wiley.\n\n\nKerr, N. L. (1998). HARKing: Hypothesizing After the Results are Known. Personality and Social Psychology Review, 2(3), 196–217. https://doi.org/10.1207/s15327957pspr0203_4\n\n\nKuhn, T. S. (1962). The Structure of Scientific Revolutions. University of Chicago Press.\n\n\nKuipers, T. A. F. (2016). Models, postulates, and generalized nomic truth approximation. Synthese, 193(10), 3057–3077. https://doi.org/10.1007/s11229-015-0916-9\n\n\nLakatos, I. (1978). The methodology of scientific research programmes: Volume 1: Philosophical papers. Cambridge University Press.\n\n\nLakens, D. (2019). The value of preregistration for psychological science: A conceptual analysis. Japanese Psychological Review, 62(3), 221–230. https://doi.org/10.24602/sjpr.62.3_221\n\n\nLakens, D. (2021). The practical alternative to the p value is the correctly used p value. Perspectives on Psychological Science, 16(3), 639–648. https://doi.org/10.1177/1745691620958012\n\n\nLakens, D., McLatchie, N., Isager, P. M., Scheel, A. M., & Dienes, Z. (2020). Improving Inferences About Null Effects With Bayes Factors and Equivalence Tests. The Journals of Gerontology: Series B, 75(1), 45–57. https://doi.org/10.1093/geronb/gby065\n\n\nLaudan, L. (1981). Science and Hypothesis. Springer Netherlands. https://doi.org/10.1007/978-94-015-7288-0\n\n\nLaudan, L. (1986). Science and Values: The Aims of Science and Their Role in Scientific Debate.\n\n\nLuttrell, A., Petty, R. E., & Xu, M. (2017). Replicating and fixing failed replications: The case of need for cognition and argument quality. Journal of Experimental Social Psychology, 69, 178–183. https://doi.org/10.1016/j.jesp.2016.09.006\n\n\nMayo, D. G. (2018). Statistical inference as severe testing: How to get beyond the statistics wars. Cambridge University Press.\n\n\nMcCarthy, R. J., Skowronski, J. J., Verschuere, B., Meijer, E. H., Jim, A., Hoogesteyn, K., Orthey, R., Acar, O. A., Aczel, B., Bakos, B. E., Barbosa, F., Baskin, E., Bègue, L., Ben-Shakhar, G., Birt, A. R., Blatz, L., Charman, S. D., Claesen, A., Clay, S. L., … Yıldız, E. (2018). Registered Replication Report on Srull and Wyer (1979). Advances in Methods and Practices in Psychological Science, 1(3), 321–336. https://doi.org/10.1177/2515245918777487\n\n\nMcGuire, W. J. (2004). A Perspectivist Approach to Theory Construction. Personality and Social Psychology Review, 8(2), 173–182. https://doi.org/10.1207/s15327957pspr0802_11\n\n\nMeehl, P. E. (1967). Theory-testing in psychology and physics: A methodological paradox. Philosophy of Science, 103–115. https://www.jstor.org/stable/186099\n\n\nMeehl, P. E. (1978). Theoretical Risks and Tabular Asterisks: Sir Karl, Sir Ronald, and the Slow Progress of Soft Psychology. Journal of Consulting and Clinical Psychology, 46(4), 806–834. https://doi.org/10.1037/0022-006X.46.4.806\n\n\nMeehl, P. E. (1990a). Appraising and amending theories: The strategy of Lakatosian defense and two principles that warrant it. Psychological Inquiry, 1(2), 108–141. https://doi.org/10.1207/s15327965pli0102_1\n\n\nMeehl, P. E. (1990b). Why Summaries of Research on Psychological Theories are Often Uninterpretable: Psychological Reports, 66(1), 195–244. https://doi.org/10.2466/pr0.1990.66.1.195\n\n\nMeehl, P. E. (2004). Cliometric metatheory III: Peircean consensus, verisimilitude and asymptotic method. The British Journal for the Philosophy of Science, 55(4), 615–643.\n\n\nMelara, R. D., & Algom, D. 
(2003). Driven by information: A tectonic theory of Stroop effects. Psychological Review, 110(3), 422–471. https://doi.org/10.1037/0033-295X.110.3.422\n\n\nMellers, B., Hertwig, R., & Kahneman, D. (2001). Do frequency representations eliminate conjunction effects? An exercise in adversarial collaboration. Psychological Science, 12(4), 269–275. https://doi.org/10.1111/1467-9280.00350\n\n\nMorey, R. D., Kaschak, M. P., Díez-Álamo, A. M., Glenberg, A. M., Zwaan, R. A., Lakens, D., Ibáñez, A., García, A., Gianelli, C., Jones, J. L., Madden, J., Alifano, F., Bergen, B., Bloxsom, N. G., Bub, D. N., Cai, Z. G., Chartier, C. R., Chatterjee, A., Conwell, E., … Ziv-Crispel, N. (2021). A pre-registered, multi-lab non-replication of the action-sentence compatibility effect (ACE). Psychonomic Bulletin & Review. https://doi.org/10.3758/s13423-021-01927-8\n\n\nNiiniluoto, I. (1998). Verisimilitude: The Third Period. The British Journal for the Philosophy of Science, 49, 1–29.\n\n\nNiiniluoto, I. (1999). Critical Scientific Realism. Oxford University Press.\n\n\nO’Donnell, M., Nelson, L. D., Ackermann, E., Aczel, B., Akhtar, A., Aldrovandi, S., Alshaif, N., Andringa, R., Aveyard, M., Babincak, P., Balatekin, N., Baldwin, S. A., Banik, G., Baskin, E., Bell, R., Białobrzeska, O., Birt, A. R., Boot, W. R., Braithwaite, S. R., … Zrubka, M. (2018). Registered Replication Report: Dijksterhuis and van Knippenberg (1998). Perspectives on Psychological Science, 13(2), 268–294. https://doi.org/10.1177/1745691618755704\n\n\nOddie, G. (2013). The content, consequence and likeness approaches to verisimilitude: Compatibility, trivialization, and underdetermination. Synthese, 190(9), 1647–1687. https://doi.org/10.1007/s11229-011-9930-8\n\n\nOrben, A., & Lakens, D. (2020). Crud (Re)Defined. Advances in Methods and Practices in Psychological Science, 3(2), 238–247. https://doi.org/10.1177/2515245920917961\n\n\nPlatt, J. R. (1964). Strong Inference: Certain systematic methods of scientific thinking may produce much more rapid progress than others. Science, 146(3642), 347–353. https://doi.org/10.1126/science.146.3642.347\n\n\nPopper, K. R. (2002). The logic of scientific discovery. Routledge.\n\n\nPsillos, S. (1999). Scientific realism: How science tracks truth. Routledge.\n\n\nRoyall, R. (1997). Statistical Evidence: A Likelihood Paradigm. Chapman and Hall/CRC.\n\n\nScheel, A. M., Tiokhin, L., Isager, P. M., & Lakens, D. (2021). Why Hypothesis Testers Should Spend Less Time Testing Hypotheses. Perspectives on Psychological Science, 16(4), 744–755. https://doi.org/10.1177/1745691620966795\n\n\nSchulz, K. F., & Grimes, D. A. (2005). Sample size calculations in randomised trials: Mandatory and mystical. The Lancet, 365(9467), 1348–1353. https://doi.org/10.1016/S0140-6736(05)61034-3\n\n\nShmueli, G. (2010). To explain or to predict? Statistical Science, 25(3), 289–310.\n\n\nStroebe, W., & Strack, F. (2014). The Alleged Crisis and the Illusion of Exact Replication. Perspectives on Psychological Science, 9(1), 59–71. https://doi.org/10.1177/1745691613514450\n\n\nStroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18(6), 643–662.\n\n\nTendeiro, J. N., & Kiers, H. A. L. (2019). A review of issues about null hypothesis Bayesian testing. Psychological Methods. https://doi.org/10.1037/met0000221\n\n\nUygun Tunç, D., & Tunç, M. N. (2022). A Falsificationist Treatment of Auxiliary Hypotheses in Social and Behavioral Sciences: Systematic Replications Framework. Meta-Psychology. 
https://doi.org/10.31234/osf.io/pdm7y\n\n\nVan Fraassen, B. C. (1980). The scientific image. Clarendon Press ; Oxford University Press.\n\n\nVerschuere, B., Meijer, E. H., Jim, A., Hoogesteyn, K., Orthey, R., McCarthy, R. J., Skowronski, J. J., Acar, O. A., Aczel, B., Bakos, B. E., Barbosa, F., Baskin, E., Bègue, L., Ben-Shakhar, G., Birt, A. R., Blatz, L., Charman, S. D., Claesen, A., Clay, S. L., … Yıldız, E. (2018). Registered Replication Report on Mazar, Amir, and Ariely (2008). Advances in Methods and Practices in Psychological Science, 1(3), 299–317. https://doi.org/10.1177/2515245918781032\n\n\nWagenmakers, E.-J., Beek, T., Dijkhoff, L., Gronau, Q. F., Acosta, A., Adams, R. B., Albohn, D. N., Allard, E. S., Benning, S. D., Blouin-Hudon, E.-M., Bulnes, L. C., Caldwell, T. L., Calin-Jageman, R. J., Capaldi, C. A., Carfagno, N. S., Chasten, K. T., Cleeremans, A., Connell, L., DeCicco, J. M., … Zwaan, R. A. (2016). Registered Replication Report: Strack, Martin, & Stepper (1988). Perspectives on Psychological Science, 11(6), 917–928. https://doi.org/10.1177/1745691616674458\n\n\nWynants, L., Calster, B. V., Collins, G. S., Riley, R. D., Heinze, G., Schuit, E., Bonten, M. M. J., Dahly, D. L., Damen, J. A., Debray, T. P. A., Jong, V. M. T. de, Vos, M. D., Dhiman, P., Haller, M. C., Harhay, M. O., Henckaerts, L., Heus, P., Kammer, M., Kreuzberger, N., … Smeden, M. van. (2020). Prediction models for diagnosis and prognosis of covid-19: Systematic review and critical appraisal. BMJ, 369, m1328. https://doi.org/10.1136/bmj.m1328\n\n\nYarkoni, T., & Westfall, J. (2017). Choosing Prediction Over Explanation in Psychology: Lessons From Machine Learning. Perspectives on Psychological Science, 12(6), 1100–1122. https://doi.org/10.1177/1745691617693393" + "text": "5.13 Verisimilitude and Progress in Science\n\nIt makes a fellow cheery To be cooking up a theory; And it needn’t make him blue That it’s not exactly true If at least it’s getting neary. Verisimilitude — Meehl, 11/7/1988\n\nDoes science offer a way to learn what is true about our world? According to the perspective in philosophy of science known as scientific realism, the answer is ‘yes’. Scientific realism is the idea that successful scientific theories that have made novel predictions give us a good reason to believe these theories make statements about the world that are at least partially true. Known as the no miracle argument, only realism can explain the success of science, which consists of repeatedly making successful predictions (Duhem, 1954), without requiring us to believe in miracles.\nNot everyone thinks that it matters whether scientific theories make true statements about the world, as scientific realists do. Laudan (1981) argues against scientific realism based on a pessimistic meta-induction: If theories that were deemed successful in the past turn out to be false, then we can reasonably expect all our current successful theories to be false as well. Van Fraassen (1980) believes it is sufficient for a theory to be ‘empirically adequate’, and make true predictions about things we can observe, irrespective of whether these predictions are derived from a theory that describes how the unobservable world is in reality. This viewpoint is known as constructive empiricism. 
As Van Fraassen summarizes the constructive empiricist perspective (1980, p.12): “Science aims to give us theories which are empirically adequate; and acceptance of a theory involves as belief only that it is empirically adequate”.\nThe idea that we should ‘believe’ scientific hypotheses is not something scientific realists can get behind. Either they think theories make true statements about things in the world, but we will have to remain completely agnostic about when they do (Feyerabend, 1993), or they think that corroborating novel and risky predictions makes it reasonable to believe that a theory has some ‘truth-likeness’, or verisimilitude. The concept of verisimilitude is based on the intuition that a theory is closer to a true statement when the theory allows us to make more true predictions, and less false predictions. When data is in line with predictions, a theory gains verisimilitude, when data are not in line with predictions, a theory loses verisimilitude (Meehl, 1978). Popper clearly intended verisimilitude to be different from belief (Niiniluoto, 1998). Importantly, verisimilitude refers to how close a theory is to the truth, which makes it an ontological, not epistemological question. That is, verisimilitude is a function of the degree to which a theory is similar to the truth, but it is not a function of the degree of belief in, or the evidence for, a theory (Meehl, 1978, 1990a). It is also not necessary for a scientific realist that we ever know what is true – we just need to be of the opinion that we can move closer to the truth (known as comparative scientific realism, Kuipers (2016)).\nAttempts to formalize verisimilitude have been a challenge, and from the perspective of an empirical scientist, the abstract nature of this ongoing discussion does not really make me optimistic it will be extremely useful in everyday practice. On a more intuitive level, verisimilitude can be regarded as the extent to which a theory makes the most correct (and least incorrect) statements about specific features in the world. One way to think about this is using the ‘possible worlds’ approach (Niiniluoto, 1999), where for each basic state of the world one can predict, there is a possible world that contains each unique combination of states.\nFor example, consider the experiments by Stroop (1935), where color related words (e.g., RED, BLUE) are printed either in congruent colors (i.e., the word RED in red ink) or incongruent colors (i.e., the word RED in blue ink). We might have a very simple theory predicting that people automatically process irrelevant information in a task. When we do two versions of a Stroop experiment, one where people are asked to read the words, and one where people are asked to name the colors, this simple theory would predict slower responses on incongruent trials, compared to congruent trials. A slightly more advanced theory predicts that congruency effects are dependent upon the salience of the word dimension and color dimension (Melara & Algom, 2003). Because in the standard Stroop experiment the word dimension is much more salient in both tasks than the color dimension, this theory predicts slower responses on incongruent trials, but only in the color naming condition. 
We have four possible worlds, two of which represent predictions from either of the two theories, and two that are not in line with either theory.\n\n\n\nResponses Color Naming\nResponses Word Naming\n\n\n\nWorld 1\nSlower\nSlower\n\n\nWorld 2\nSlower\nNot Slower\n\n\nWorld 3\nNot Slower\nSlower\n\n\nWorld 4\nNot Slower\nNot Slower\n\n\n\nMeehl (1990b) discusses a ‘box score’ of the number of successfully predicted features, which he acknowledges is too simplistic. No widely accepted formalized measure of verisimilitude is available to express the similarity between the successfully predicted features by a theory, although several proposals have been put forward (Cevolani et al., 2011; Niiniluoto, 1998; Oddie, 2013). However, even if formal measures of verisimilitude are not available, it remains a useful concept to describe theories that are assumed to be closer to the truth because they make novel predictions (Psillos, 1999).\n\n\n\n\n\nAltoè, G., Bertoldo, G., Zandonella Callegher, C., Toffalini, E., Calcagnì, A., Finos, L., & Pastore, M. (2020). Enhancing Statistical Inference in Psychological Research via Prospective and Retrospective Design Analysis. Frontiers in Psychology, 10.\n\n\nBaguley, T. (2012). Serious stats: A guide to advanced statistics for the behavioral sciences. Palgrave Macmillan.\n\n\nCevolani, G., Crupi, V., & Festa, R. (2011). Verisimilitude and belief change for conjunctive theories. Erkenntnis, 75(2), 183.\n\n\nCho, H.-C., & Abe, S. (2013). Is two-tailed testing for directional research hypotheses tests legitimate? Journal of Business Research, 66(9), 1261–1266. https://doi.org/10.1016/j.jbusres.2012.02.023\n\n\nCohen, J. (1994). The earth is round (p \\(<\\) .05). American Psychologist, 49(12), 997–1003. https://doi.org/10.1037/0003-066X.49.12.997\n\n\nColling, L. J., Szcs, D., De Marco, D., Cipora, K., Ulrich, R., Nuerk, H.-C., Soltanlou, M., Bryce, D., Chen, S.-C., Schroeder, P. A., Henare, D. T., Chrystall, C. K., Corballis, P. M., Ansari, D., Goffin, C., Sokolowski, H. M., Hancock, P. J. B., Millen, A. E., Langton, S. R. H., … McShane, B. B. (2020). Registered Replication Report on Fischer, Castel, Dodd, and Pratt (2003). Advances in Methods and Practices in Psychological Science, 3(2), 143–162. https://doi.org/10.1177/2515245920903079\n\n\nde Groot, A. D. (1969). Methodology (Vol. 6). Mouton & Co.\n\n\nDongen, N. N. N. van, Doorn, J. B. van, Gronau, Q. F., Ravenzwaaij, D. van, Hoekstra, R., Haucke, M. N., Lakens, D., Hennig, C., Morey, R. D., Homer, S., Gelman, A., Sprenger, J., & Wagenmakers, E.-J. (2019). Multiple Perspectives on Inference for Two Simple Statistical Scenarios. The American Statistician, 73(sup1), 328–339. https://doi.org/10.1080/00031305.2019.1565553\n\n\nDubin, R. (1969). Theory building. Free Press.\n\n\nDuhem, P. (1954). The aim and structure of physical theory. Princeton University Press.\n\n\nFerguson, C. J., & Heene, M. (2021). Providing a lower-bound estimate for psychology’s “crud factor”: The case of aggression. Professional Psychology: Research and Practice, 52(6), 620–626. https://doi.org/http://dx.doi.org/10.1037/pro0000386\n\n\nFeyerabend, P. (1993). Against method (3rd ed). Verso.\n\n\nFeynman, R. P. (1974). Cargo cult science. Engineering and Science, 37(7), 10–13.\n\n\nFiedler, K. (2004). Tools, toys, truisms, and theories: Some thoughts on the creative cycle of theory formation. Personality and Social Psychology Review, 8(2), 123–131. https://doi.org/10.1207/s15327957pspr0802_5\n\n\nGelman, A., & Carlin, J. (2014). 
Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors. Perspectives on Psychological Science, 9(6), 641–651.\n\n\nGerring, J. (2012). Mere Description. British Journal of Political Science, 42(4), 721–746. https://doi.org/10.1017/S0007123412000130\n\n\nHagger, M. S., Chatzisarantis, N. L. D., Alberts, H., Anggono, C. O., Batailler, C., Birt, A. R., Brand, R., Brandt, M. J., Brewer, G., Bruyneel, S., Calvillo, D. P., Campbell, W. K., Cannon, P. R., Carlucci, M., Carruth, N. P., Cheung, T., Crowell, A., De Ridder, D. T. D., Dewitte, S., … Zwienenberg, M. (2016). A Multilab Preregistered Replication of the Ego-Depletion Effect. Perspectives on Psychological Science, 11(4), 546–573. https://doi.org/10.1177/1745691616652873\n\n\nHand, D. J. (1994). Deconstructing Statistical Questions. Journal of the Royal Statistical Society. Series A (Statistics in Society), 157(3), 317–356. https://doi.org/10.2307/2983526\n\n\nHempel, C. G. (1966). Philosophy of natural science (Nachdr.). Prentice-Hall.\n\n\nJeffreys, H. (1939). Theory of probability (1st ed). Oxford University Press.\n\n\nJones, L. V. (1952). Test of hypotheses: One-sided vs. Two-sided alternatives. Psychological Bulletin, 49(1), 43–46. https://doi.org/http://dx.doi.org/10.1037/h0056832\n\n\nKaiser, H. F. (1960). Directional statistical decisions. Psychological Review, 67(3), 160–167. https://doi.org/10.1037/h0047595\n\n\nKenett, R. S., Shmueli, G., & Kenett, R. (2016). Information Quality: The Potential of Data and Analytics to Generate Knowledge (1st edition). Wiley.\n\n\nKerr, N. L. (1998). HARKing: Hypothesizing After the Results are Known. Personality and Social Psychology Review, 2(3), 196–217. https://doi.org/10.1207/s15327957pspr0203_4\n\n\nKuhn, T. S. (1962). The Structure of Scientific Revolutions. University of Chicago Press.\n\n\nKuipers, T. A. F. (2016). Models, postulates, and generalized nomic truth approximation. Synthese, 193(10), 3057–3077. https://doi.org/10.1007/s11229-015-0916-9\n\n\nLakatos, I. (1978). The methodology of scientific research programmes: Volume 1: Philosophical papers. Cambridge University Press.\n\n\nLakens, D. (2019). The value of preregistration for psychological science: A conceptual analysis. Japanese Psychological Review, 62(3), 221–230. https://doi.org/10.24602/sjpr.62.3_221\n\n\nLakens, D. (2021). The practical alternative to the p value is the correctly used p value. Perspectives on Psychological Science, 16(3), 639–648. https://doi.org/10.1177/1745691620958012\n\n\nLakens, D., McLatchie, N., Isager, P. M., Scheel, A. M., & Dienes, Z. (2020). Improving Inferences About Null Effects With Bayes Factors and Equivalence Tests. The Journals of Gerontology: Series B, 75(1), 45–57. https://doi.org/10.1093/geronb/gby065\n\n\nLaudan, L. (1981). Science and Hypothesis. Springer Netherlands. https://doi.org/10.1007/978-94-015-7288-0\n\n\nLaudan, L. (1986). Science and Values: The Aims of Science and Their Role in Scientific Debate.\n\n\nLuttrell, A., Petty, R. E., & Xu, M. (2017). Replicating and fixing failed replications: The case of need for cognition and argument quality. Journal of Experimental Social Psychology, 69, 178–183. https://doi.org/10.1016/j.jesp.2016.09.006\n\n\nMayo, D. G. (2018). Statistical inference as severe testing: How to get beyond the statistics wars. Cambridge University Press.\n\n\nMcCarthy, R. J., Skowronski, J. J., Verschuere, B., Meijer, E. H., Jim, A., Hoogesteyn, K., Orthey, R., Acar, O. A., Aczel, B., Bakos, B. 
E., Barbosa, F., Baskin, E., Bègue, L., Ben-Shakhar, G., Birt, A. R., Blatz, L., Charman, S. D., Claesen, A., Clay, S. L., … Yıldız, E. (2018). Registered Replication Report on Srull and Wyer (1979). Advances in Methods and Practices in Psychological Science, 1(3), 321–336. https://doi.org/10.1177/2515245918777487\n\n\nMcGuire, W. J. (2004). A Perspectivist Approach to Theory Construction. Personality and Social Psychology Review, 8(2), 173–182. https://doi.org/10.1207/s15327957pspr0802_11\n\n\nMeehl, P. E. (1967). Theory-testing in psychology and physics: A methodological paradox. Philosophy of Science, 103–115. https://www.jstor.org/stable/186099\n\n\nMeehl, P. E. (1978). Theoretical Risks and Tabular Asterisks: Sir Karl, Sir Ronald, and the Slow Progress of Soft Psychology. Journal of Consulting and Clinical Psychology, 46(4), 806–834. https://doi.org/10.1037/0022-006X.46.4.806\n\n\nMeehl, P. E. (1990a). Appraising and amending theories: The strategy of Lakatosian defense and two principles that warrant it. Psychological Inquiry, 1(2), 108–141. https://doi.org/10.1207/s15327965pli0102_1\n\n\nMeehl, P. E. (1990b). Why Summaries of Research on Psychological Theories are Often Uninterpretable: Psychological Reports, 66(1), 195–244. https://doi.org/10.2466/pr0.1990.66.1.195\n\n\nMeehl, P. E. (2004). Cliometric metatheory III: Peircean consensus, verisimilitude and asymptotic method. The British Journal for the Philosophy of Science, 55(4), 615–643.\n\n\nMelara, R. D., & Algom, D. (2003). Driven by information: A tectonic theory of Stroop effects. Psychological Review, 110(3), 422–471. https://doi.org/10.1037/0033-295X.110.3.422\n\n\nMellers, B., Hertwig, R., & Kahneman, D. (2001). Do frequency representations eliminate conjunction effects? An exercise in adversarial collaboration. Psychological Science, 12(4), 269–275. https://doi.org/10.1111/1467-9280.00350\n\n\nMorey, R. D., Kaschak, M. P., Díez-Álamo, A. M., Glenberg, A. M., Zwaan, R. A., Lakens, D., Ibáñez, A., García, A., Gianelli, C., Jones, J. L., Madden, J., Alifano, F., Bergen, B., Bloxsom, N. G., Bub, D. N., Cai, Z. G., Chartier, C. R., Chatterjee, A., Conwell, E., … Ziv-Crispel, N. (2021). A pre-registered, multi-lab non-replication of the action-sentence compatibility effect (ACE). Psychonomic Bulletin & Review. https://doi.org/10.3758/s13423-021-01927-8\n\n\nNiiniluoto, I. (1998). Verisimilitude: The Third Period. The British Journal for the Philosophy of Science, 49, 1–29.\n\n\nNiiniluoto, I. (1999). Critical Scientific Realism. Oxford University Press.\n\n\nO’Donnell, M., Nelson, L. D., Ackermann, E., Aczel, B., Akhtar, A., Aldrovandi, S., Alshaif, N., Andringa, R., Aveyard, M., Babincak, P., Balatekin, N., Baldwin, S. A., Banik, G., Baskin, E., Bell, R., Białobrzeska, O., Birt, A. R., Boot, W. R., Braithwaite, S. R., … Zrubka, M. (2018). Registered Replication Report: Dijksterhuis and van Knippenberg (1998). Perspectives on Psychological Science, 13(2), 268–294. https://doi.org/10.1177/1745691618755704\n\n\nOddie, G. (2013). The content, consequence and likeness approaches to verisimilitude: Compatibility, trivialization, and underdetermination. Synthese, 190(9), 1647–1687. https://doi.org/10.1007/s11229-011-9930-8\n\n\nOrben, A., & Lakens, D. (2020). Crud (Re)Defined. Advances in Methods and Practices in Psychological Science, 3(2), 238–247. https://doi.org/10.1177/2515245920917961\n\n\nPlatt, J. R. (1964). Strong Inference: Certain systematic methods of scientific thinking may produce much more rapid progress than others. 
Science, 146(3642), 347–353. https://doi.org/10.1126/science.146.3642.347\n\n\nPopper, K. R. (2002). The logic of scientific discovery. Routledge.\n\n\nPsillos, S. (1999). Scientific realism: How science tracks truth. Routledge.\n\n\nRoyall, R. (1997). Statistical Evidence: A Likelihood Paradigm. Chapman and Hall/CRC.\n\n\nScheel, A. M., Tiokhin, L., Isager, P. M., & Lakens, D. (2021). Why Hypothesis Testers Should Spend Less Time Testing Hypotheses. Perspectives on Psychological Science, 16(4), 744–755. https://doi.org/10.1177/1745691620966795\n\n\nSchulz, K. F., & Grimes, D. A. (2005). Sample size calculations in randomised trials: Mandatory and mystical. The Lancet, 365(9467), 1348–1353. https://doi.org/10.1016/S0140-6736(05)61034-3\n\n\nShmueli, G. (2010). To explain or to predict? Statistical Science, 25(3), 289–310.\n\n\nStroebe, W., & Strack, F. (2014). The Alleged Crisis and the Illusion of Exact Replication. Perspectives on Psychological Science, 9(1), 59–71. https://doi.org/10.1177/1745691613514450\n\n\nStroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18(6), 643–662.\n\n\nTendeiro, J. N., & Kiers, H. A. L. (2019). A review of issues about null hypothesis Bayesian testing. Psychological Methods. https://doi.org/10.1037/met0000221\n\n\nUygun Tunç, D., & Tunç, M. N. (2022). A Falsificationist Treatment of Auxiliary Hypotheses in Social and Behavioral Sciences: Systematic Replications Framework. Meta-Psychology. https://doi.org/10.31234/osf.io/pdm7y\n\n\nVan Fraassen, B. C. (1980). The scientific image. Clarendon Press ; Oxford University Press.\n\n\nVerschuere, B., Meijer, E. H., Jim, A., Hoogesteyn, K., Orthey, R., McCarthy, R. J., Skowronski, J. J., Acar, O. A., Aczel, B., Bakos, B. E., Barbosa, F., Baskin, E., Bègue, L., Ben-Shakhar, G., Birt, A. R., Blatz, L., Charman, S. D., Claesen, A., Clay, S. L., … Yıldız, E. (2018). Registered Replication Report on Mazar, Amir, and Ariely (2008). Advances in Methods and Practices in Psychological Science, 1(3), 299–317. https://doi.org/10.1177/2515245918781032\n\n\nWagenmakers, E.-J., Beek, T., Dijkhoff, L., Gronau, Q. F., Acosta, A., Adams, R. B., Albohn, D. N., Allard, E. S., Benning, S. D., Blouin-Hudon, E.-M., Bulnes, L. C., Caldwell, T. L., Calin-Jageman, R. J., Capaldi, C. A., Carfagno, N. S., Chasten, K. T., Cleeremans, A., Connell, L., DeCicco, J. M., … Zwaan, R. A. (2016). Registered Replication Report: Strack, Martin, & Stepper (1988). Perspectives on Psychological Science, 11(6), 917–928. https://doi.org/10.1177/1745691616674458\n\n\nWynants, L., Calster, B. V., Collins, G. S., Riley, R. D., Heinze, G., Schuit, E., Bonten, M. M. J., Dahly, D. L., Damen, J. A., Debray, T. P. A., Jong, V. M. T. de, Vos, M. D., Dhiman, P., Haller, M. C., Harhay, M. O., Henckaerts, L., Heus, P., Kammer, M., Kreuzberger, N., … Smeden, M. van. (2020). Prediction models for diagnosis and prognosis of covid-19: Systematic review and critical appraisal. BMJ, 369, m1328. https://doi.org/10.1136/bmj.m1328\n\n\nYarkoni, T., & Westfall, J. (2017). Choosing Prediction Over Explanation in Psychology: Lessons From Machine Learning. Perspectives on Psychological Science, 12(6), 1100–1122. 
https://doi.org/10.1177/1745691617693393" }, { "objectID": "06-effectsize.html#effect-sizes", @@ -375,7 +375,7 @@ "href": "07-CI.html#sec-relatCIp", "title": "7  Confidence Intervals", "section": "\n7.4 The relation between confidence intervals and p-values", - "text": "7.4 The relation between confidence intervals and p-values\nThere is a direct relationship between the CI around an effect size and statistical significance of a null-hypothesis significance test. For example, if an effect is statistically significant (p < 0.05) in a two-sided independent t-test with an alpha of .05, the 95% CI for the mean difference between the two groups will not include zero. Confidence intervals are sometimes said to be more informative than p-values, because they not only provide information about whether an effect is statistically significant (i.e., when the confidence interval does not overlap with the value representing the null hypothesis), but also communicate the precision of the effect size estimate. This is true, but as mentioned in the chapter on p-values it is still recommended to add exact p-values, which facilitates the re-use of results for secondary analyses (Appelbaum et al., 2018), and allows other researchers to compare the p-value to an alpha level they would have preferred to use (Lehmann & Romano, 2005).\nIn order to maintain the direct relationship between a confidence interval and a p-value it is necessary to adjust the confidence interval level whenever the alpha level is adjusted. For example, if an alpha level of 5% is corrected for three comparisons to 0.05/3 - 0.0167, the corresponding confidence interval would be a 1 - 0.0167 = 0.9833 confidence interval. Similarly, if a p-value is computed for a one-sided t-test, there is only an upper or lower limit of the interval, and the other end of the interval ranges to −∞ or ∞.\nTo maintain a direct relationship between an F-test and its confidence interval, a 90% CI for effect sizes from an F-test should be provided. The reason for this is explained by Karl Wuensch. Where Cohen’s d can take both positive and negative values, r² or η² are squared, and can therefore only take positive values. This is related to the fact that F-tests (as commonly used in ANOVA) are one-sided. If you calculate a 95% CI, you can get situations where the confidence interval includes 0, but the test reveals a statistical difference with a p < .05 (for a more mathematical explanation, see Steiger (2004)). This means that a 95% CI around Cohen’s d in an independent t-test equals a 90% CI around η² for exactly the same test performed as an ANOVA. As a final detail, because eta-squared cannot be smaller than zero, the lower bound for the confidence interval cannot be smaller than 0. This means that a confidence interval for an effect that is not statistically different from 0 has to start at 0. You report such a CI as 90% CI [.00; .XX] where the XX is the upper limit of the CI.\nConfidence intervals are often used in forest plots that communicate the results from a meta-analysis. In the plot below, we see 4 rows. Each row shows the effect size estimate from one study (in Hedges’ g). For example, study 1 yielded an effect size estimate of 0.44, with a confidence interval around the effect size from 0.08 to 0.8. The horizontal black line, similarly to the visualization we played around with before, is the width of the confidence interval. 
When it does not touch the effect size 0 (indicated by a black vertical dotted line) the effect is statistically significant.\n\n\n\n\nFigure 7.4: Meta-analysis of 4 studies.\n\n\n\n\nWe can see, based on the fact that the confidence intervals do not overlap with 0, that studies 1 and 3 were statistically significant. The diamond shape named the FE model (Fixed Effect model) is the meta-analytic effect size. Instead of using a black horizontal line, the upper limit and lower limit of the confidence interval are indicated by the left and right points of the diamond, and the center of the diamond is the meta-analytic effect size estimate. A meta-analysis calculates the effect size by combining and weighing all studies. The confidence interval for a meta-analytic effect size estimate is always narrower than that for a single study, because of the combined sample size of all studies included in the meta-analysis.\nIn the preceding section, we focused on examining whether the confidence interval overlapped with 0. This is a confidence interval approach to a null-hypothesis significance test. Even though we are not computing a p-value, we can directly see from the confidence interval whether p < \\(\\alpha\\). The confidence interval approach to hypothesis testing makes it quite intuitive to think about performing tests against non-zero null hypotheses (Bauer & Kieser, 1996). For example, we could test whether we can reject an effect of 0.5 by examining if the 95% confidence interval does not overlap with 0.5. We can test whether an effect is smaller that 0.5 by examining if the 95% confidence interval falls completely below 0.5. We will see that this leads to a logical extension of null-hypothesis testing where, instead of testing to reject an effect of 0, we can test whether we can reject other effects of interest in range predictions and equivalence tests." + "text": "7.4 The relation between confidence intervals and p-values\nThere is a direct relationship between the CI around an effect size and statistical significance of a null-hypothesis significance test. For example, if an effect is statistically significant (p < 0.05) in a two-sided independent t-test with an alpha of .05, the 95% CI for the mean difference between the two groups will not include zero. Confidence intervals are sometimes said to be more informative than p-values, because they not only provide information about whether an effect is statistically significant (i.e., when the confidence interval does not overlap with the value representing the null hypothesis), but also communicate the precision of the effect size estimate. This is true, but as mentioned in the chapter on p-values it is still recommended to add exact p-values, which facilitates the re-use of results for secondary analyses (Appelbaum et al., 2018), and allows other researchers to compare the p-value to an alpha level they would have preferred to use (Lehmann & Romano, 2005).\nIn order to maintain the direct relationship between a confidence interval and a p-value it is necessary to adjust the confidence interval level whenever the alpha level is adjusted. For example, if an alpha level of 5% is corrected for three comparisons to 0.05/3 = 0.0167, the corresponding confidence interval would be a 1 - 0.0167 = 0.9833 confidence interval.
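As a minimal illustrative sketch of that correspondence (not part of the original text; the simulated data, seed, sample sizes, and variable names are purely assumptions for the example), the confidence level passed to base R's t.test() can simply be set to 1 minus the adjusted alpha level:

```r
# Hedged illustration with simulated data: a Bonferroni-style adjustment of the
# alpha level (0.05/3) and the matching confidence level for the interval.
set.seed(123)
group1 <- rnorm(50, mean = 0.4, sd = 1)  # hypothetical scores, group 1
group2 <- rnorm(50, mean = 0.0, sd = 1)  # hypothetical scores, group 2

alpha_adjusted <- 0.05 / 3               # adjusted alpha for three comparisons, ~0.0167

# The reported interval is now a 1 - 0.0167 = 0.9833 (98.33%) confidence interval,
# so significance at the adjusted alpha level can still be read off the CI.
t.test(group1, group2, conf.level = 1 - alpha_adjusted)
```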
Similarly, if a p-value is computed for a one-sided t-test, there is only an upper or lower limit of the interval, and the other end of the interval ranges to −∞ or ∞.\nTo maintain a direct relationship between an F-test and its confidence interval, a 90% CI for effect sizes from an F-test should be provided. The reason for this is explained by Karl Wuensch. Where Cohen’s d can take both positive and negative values, r² or η² are squared, and can therefore only take positive values. This is related to the fact that F-tests (as commonly used in ANOVA) are one-sided. If you calculate a 95% CI, you can get situations where the confidence interval includes 0, but the test reveals a statistical difference with a p < .05 (for a more mathematical explanation, see Steiger (2004)). This means that a 95% CI around Cohen’s d in an independent t-test equals a 90% CI around η² for exactly the same test performed as an ANOVA. As a final detail, because eta-squared cannot be smaller than zero, the lower bound for the confidence interval cannot be smaller than 0. This means that a confidence interval for an effect that is not statistically different from 0 has to start at 0. You report such a CI as 90% CI [.00; .XX] where the XX is the upper limit of the CI.\nConfidence intervals are often used in forest plots that communicate the results from a meta-analysis. In the plot below, we see 4 rows. Each row shows the effect size estimate from one study (in Hedges’ g). For example, study 1 yielded an effect size estimate of 0.53, with a confidence interval around the effect size from 0.12 to 0.94. The horizontal black line, similarly to the visualization we played around with before, is the width of the confidence interval. When it does not touch the effect size 0 (indicated by a black vertical dotted line) the effect is statistically significant.\n\n\n\n\nFigure 7.4: Meta-analysis of 4 studies.\n\n\n\n\nWe can see, based on the fact that the confidence intervals do not overlap with 0, that studies 1 and 3 were statistically significant. The diamond shape named the FE model (Fixed Effect model) is the meta-analytic effect size. Instead of using a black horizontal line, the upper limit and lower limit of the confidence interval are indicated by the left and right points of the diamond, and the center of the diamond is the meta-analytic effect size estimate. A meta-analysis calculates the effect size by combining and weighing all studies. The confidence interval for a meta-analytic effect size estimate is always narrower than that for a single study, because of the combined sample size of all studies included in the meta-analysis.\nIn the preceding section, we focused on examining whether the confidence interval overlapped with 0. This is a confidence interval approach to a null-hypothesis significance test. Even though we are not computing a p-value, we can directly see from the confidence interval whether p < \\(\\alpha\\). The confidence interval approach to hypothesis testing makes it quite intuitive to think about performing tests against non-zero null hypotheses (Bauer & Kieser, 1996). For example, we could test whether we can reject an effect of 0.5 by examining if the 95% confidence interval does not overlap with 0.5. We can test whether an effect is smaller than 0.5 by examining if the 95% confidence interval falls completely below 0.5.
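A small sketch of this confidence-interval approach to non-zero null hypotheses follows (not taken from the original text; the simulated difference scores, seed, and object names are assumptions made only for illustration):

```r
# Hedged illustration: reject an effect of 0.5 when the 95% CI excludes 0.5;
# conclude the effect is smaller than 0.5 when the entire 95% CI lies below 0.5.
set.seed(456)
diff_scores <- rnorm(40, mean = 0.1, sd = 1)      # hypothetical standardized differences

ci <- t.test(diff_scores, conf.level = 0.95)$conf.int
ci

ci[1] > 0.5 | ci[2] < 0.5   # TRUE if 0.5 lies outside the CI (an effect of 0.5 is rejected)
ci[2] < 0.5                 # TRUE if the whole CI falls below 0.5 (effect smaller than 0.5)
```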
We will see that this leads to a logical extension of null-hypothesis testing where, instead of testing to reject an effect of 0, we can test whether we can reject other effects of interest in range predictions and equivalence tests." }, { "objectID": "07-CI.html#the-standard-error-and-95-confidence-intervals", @@ -774,7 +774,7 @@ "href": "10-sequential.html#reporting-the-results-of-a-sequential-analysis", "title": "10  Sequential Analysis", "section": "\n10.8 Reporting the results of a sequential analysis", - "text": "10.8 Reporting the results of a sequential analysis\nGroup sequential designs have been developed to efficiently test hypotheses using the Neyman-Pearson approach for statistical inference, where the goal is to decide how to act, while controlling error rates in the long run. Group sequential designs do not have the goal to quantify the strength of evidence, or provide accurate estimates of the effect size (Proschan et al., 2006). Nevertheless, after having reached a conclusion about whether a hypothesis can be rejected or not, researchers will often want to also interpret the effect size estimate when reporting results.\nA challenge when interpreting the observed effect size in sequential designs is that whenever a study is stopped early when \\(H_0\\) is rejected, there is a risk that the data analysis was stopped because, due to random variation, a large effect size was observed at the time of the interim analysis. This means that the observed effect size at these interim analyses over-estimates the true effect size. As Schönbrodt et al. (2017) show, a meta-analysis of studies that used sequential designs will yield an accurate effect size, because studies that stop early have smaller sample sizes, and are weighted less, which is compensated by the smaller effect size estimates in those sequential studies that reach the final look, and are weighted more because of their larger sample size. However, researchers might want to interpret effect sizes from single studies before a meta-analysis can be performed, and in this case, reporting an adjusted effect size estimate can be useful. Although sequential analysis software only allows one to compute adjusted effect size estimates for certain statistical tests, we recommend reporting both the adjusted effect size where possible, and to always also report the unadjusted effect size estimate for future meta-analyses.\nA similar issue is at play when reporting p values and confidence intervals. When a sequential design is used, the distribution of a p value that does not account for the sequential nature of the design is no longer uniform when \\(H_0\\) is true. A p value is the probability of observing a result at least as extreme as the result that was observed, given that \\(H_0\\) is true. It is no longer straightforward to determine what ‘at least as extreme’ means a sequential design (Cook, 2002). The most widely recommended procedure to determine what “at least as extreme” means is to order the outcomes of a series of sequential analyses in terms of the look at which the study was stopped, where earlier stopping is more extreme than later stopping, and where studies with higher z values are more extreme, when different studies are stopped at the same time (Proschan et al., 2006). This is referred to as stagewise ordering, which treats rejections at earlier looks as stronger evidence against \\(H_0\\) than rejections later in the study (Wassmer & Brannath, 2016). 
Given the direct relationship between a p value and a confidence interval, confidence intervals for sequential designs have also been developed.\nReporting adjusted p values and confidence intervals, however, might be criticized. After a sequential design, a correct interpretation from a Neyman-Pearson framework is to conclude that \\(H_0\\) is rejected, the alternative hypothesis is rejected, or that the results are inconclusive. The reason that adjusted p values are reported after sequential designs is to allow readers to interpret them as a measure of evidence. Dupont (1983) provides good arguments to doubt that adjusted p values provide a valid measure of the strength of evidence. Furthermore, a strict interpretation of the Neyman-Pearson approach to statistical inferences also provides an argument against interpreting p values as measures of evidence (Lakens, 2022). Therefore, it is recommended, if researchers are interested in communicating the evidence in the data for \\(H_0\\) relative to the alternative hypothesis, to report likelihoods or Bayes factors, which can always be reported and interpreted after the data collection has been completed. Reporting the unadjusted p-value in relation to the alpha level communicates the basis to reject hypotheses, although it might be important for researchers performing a meta-analysis based on p-values (e.g., a p-curve or z-curve analysis, as explained in the chapter on bias detection) that these are sequential p-values. Adjusted confidence intervals are useful tools to evaluate the observed effect estimate relative to its variability at an interim or the final look at the data. Note that the adjusted parameter estimates are only available in statistical software for a few commonly used designs in pharmaceutical trials, such as comparisons of mean differences between groups, or survuval analysis.\nBelow, we see the same sequential design we started with, with 2 looks and a Pocock-type alpha spending function. After completing the study with the planned sample size of 95 participants per condition (where we collect 48 participants at look 1, and the remaining 47 at look 2), we can now enter the observed data using the function getDataset. 
The means and standard deviations are entered for each stage, so at the second look, only the data from the second 95 participants in each condition are used to compute the means (1.51 and 1.01) and standard deviations (1.03 and 0.96).\n\ndesign <- getDesignGroupSequential(\n kMax = 2,\n typeOfDesign = \"asP\",\n sided = 2,\n alpha = 0.05,\n beta = 0.1\n)\n\ndataMeans <- getDataset(\n n1 = c(48, 47), \n n2 = c(48, 47), \n means1 = c(1.12, 1.51), # for directional test, means 1 > means 2\n means2 = c(1.03, 1.01),\n stDevs1 = c(0.98, 1.03), \n stDevs2 = c(1.06, 0.96)\n )\n\nres <- getAnalysisResults(\n design, \n equalVariances = TRUE,\n dataInput = dataMeans\n )\n\nprint(summary(res))\n\n\n\n[PROGRESS] Stage results calculated [0.0352 secs] \n[PROGRESS] Conditional power calculated [0.0277 secs] \n[PROGRESS] Conditional rejection probabilities (CRP) calculated [0.001 secs] \n[PROGRESS] Repeated confidence interval of stage 1 calculated [0.6816 secs] \n[PROGRESS] Repeated confidence interval of stage 2 calculated [0.6956 secs] \n[PROGRESS] Repeated confidence interval calculated [1.38 secs] \n[PROGRESS] Repeated p-values of stage 1 calculated [0.2672 secs] \n[PROGRESS] Repeated p-values of stage 2 calculated [0.2507 secs] \n[PROGRESS] Repeated p-values calculated [0.5192 secs] \n[PROGRESS] Final p-value calculated [0.0013 secs] \n[PROGRESS] Final confidence interval calculated [0.0645 secs] \n\n\n\n\nAnalysis results for a continuous endpoint\n\nSequential analysis with 2 looks (group sequential design).\nThe results were calculated using a two-sample t-test (two-sided, alpha = 0.05), \nequal variances option.\nH0: mu(1) - mu(2) = 0 against H1: mu(1) - mu(2) != 0.\n\nStage 1 2 \nFixed weight 0.5 1 \nEfficacy boundary (z-value scale) 2.157 2.201 \nCumulative alpha spent 0.0310 0.0500 \nStage level 0.0155 0.0139 \nCumulative effect size 0.090 0.293 \nCumulative (pooled) standard deviation 1.021 1.013 \nOverall test statistic 0.432 1.993 \nOverall p-value 0.3334 0.0238 \nTest action continue accept \nConditional rejection probability 0.0073 \n95% repeated confidence interval [-0.366; 0.546] [-0.033; 0.619]\nRepeated p-value >0.5 0.0819 \nFinal p-value 0.0666 \nFinal confidence interval [-0.020; 0.573]\nMedian unbiased estimate 0.281 \n\n-----\n\nAnalysis results (means of 2 groups, group sequential design):\n\nDesign parameters:\n Information rates : 0.500, 1.000 \n Critical values : 2.157, 2.201 \n Futility bounds (non-binding) : -Inf \n Cumulative alpha spending : 0.03101, 0.05000 \n Local one-sided significance levels : 0.01550, 0.01387 \n Significance level : 0.0500 \n Test : two-sided \n\nUser defined parameters: not available\n\nDefault parameters:\n Normal approximation : FALSE \n Direction upper : TRUE \n Theta H0 : 0 \n Equal variances : TRUE \n\nStage results:\n Cumulative effect sizes : 0.0900, 0.2928 \n Cumulative (pooled) standard deviations : 1.021, 1.013 \n Stage-wise test statistics : 0.432, 2.435 \n Stage-wise p-values : 0.333390, 0.008421 \n Overall test statistics : 0.432, 1.993 \n Overall p-values : 0.33339, 0.02384 \n\nAnalysis results:\n Assumed standard deviation : 1.013 \n Actions : continue, accept \n Conditional rejection probability : 0.007317, NA \n Conditional power : NA, NA \n Repeated confidence intervals (lower) : -0.36630, -0.03306 \n Repeated confidence intervals (upper) : 0.5463, 0.6187 \n Repeated p-values : >0.5, 0.08195 \n Final stage : 2 \n Final p-value : NA, 0.06662 \n Final CIs (lower) : NA, -0.02007 \n Final CIs (upper) : NA, 0.5734 \n Median unbiased 
estimate : NA, 0.2814 \n\n\nImagine we have performed a study planned to have at most 2 equally spaced looks at the data, where we perform a two-sided test with an alpha of 0.05, and we use a Pocock type alpha spending function, and we observe mean differences between the two conditions at the last look. Based on a Pocock-like alpha spending function with two equally spaced looks the alpha level for a two-sided t-test is 0.003051, and 0.0490. We can thus reject \\(H_0\\) after look 2. But we would also like to report an effect size, and adjusted p values and confidence intervals.\nThe results show that the action after look 1 was to continue data collection, and that we could reject \\(H_0\\) at the second look. The unadjusted mean difference is provided in the row “Overall effect size” and at the final look this was 0.293. The adjusted mean difference is provided in the row “Median unbiased estimate” and is lower, and the adjusted confidence interval is in the row “Final confidence interval”, giving the result 0.281, 95% CI [-0.02, 0.573].\nThe unadjusted p values for a one-sided test are reported in the row “Overall p-value”. The actual p values for our two-sided test would be twice as large, so 0.6668, 0.0477. The adjusted p-value at the final look is provided in the row “Final p-value” and it is 0.06662." + "text": "10.8 Reporting the results of a sequential analysis\nGroup sequential designs have been developed to efficiently test hypotheses using the Neyman-Pearson approach for statistical inference, where the goal is to decide how to act, while controlling error rates in the long run. Group sequential designs do not have the goal to quantify the strength of evidence, or provide accurate estimates of the effect size (Proschan et al., 2006). Nevertheless, after having reached a conclusion about whether a hypothesis can be rejected or not, researchers will often want to also interpret the effect size estimate when reporting results.\nA challenge when interpreting the observed effect size in sequential designs is that whenever a study is stopped early when \\(H_0\\) is rejected, there is a risk that the data analysis was stopped because, due to random variation, a large effect size was observed at the time of the interim analysis. This means that the observed effect size at these interim analyses over-estimates the true effect size. As Schönbrodt et al. (2017) show, a meta-analysis of studies that used sequential designs will yield an accurate effect size, because studies that stop early have smaller sample sizes, and are weighted less, which is compensated by the smaller effect size estimates in those sequential studies that reach the final look, and are weighted more because of their larger sample size. However, researchers might want to interpret effect sizes from single studies before a meta-analysis can be performed, and in this case, reporting an adjusted effect size estimate can be useful. Although sequential analysis software only allows one to compute adjusted effect size estimates for certain statistical tests, we recommend reporting both the adjusted effect size where possible, and to always also report the unadjusted effect size estimate for future meta-analyses.\nA similar issue is at play when reporting p values and confidence intervals. When a sequential design is used, the distribution of a p value that does not account for the sequential nature of the design is no longer uniform when \\(H_0\\) is true. 
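To make the claim about non-uniform p values concrete, here is a brief simulation sketch (not part of the original text; the number of simulations, sample sizes, seed, and variable names are assumptions chosen only for illustration): with one uncorrected interim look, the proportion of p < .05 results when H0 is true rises above the nominal 5%, which is only possible because the unadjusted p value is no longer uniformly distributed under H0.

```r
# Hedged illustration: Type 1 error inflation from one uncorrected interim analysis under H0.
set.seed(789)
n_sims <- 10000
n_interim <- 50                       # participants per group at the interim look
n_total <- 100                        # participants per group at the final look
significant <- logical(n_sims)

for (i in seq_len(n_sims)) {
  x <- rnorm(n_total)                 # group 1, true effect is 0
  y <- rnorm(n_total)                 # group 2, true effect is 0
  p1 <- t.test(x[1:n_interim], y[1:n_interim])$p.value  # look 1, no correction
  p2 <- t.test(x, y)$p.value                            # look 2, no correction
  significant[i] <- (p1 < 0.05) || (p2 < 0.05)          # claim significance at either look
}

mean(significant)   # typically around .08 in this setup, above the nominal .05
```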
A p value is the probability of observing a result at least as extreme as the result that was observed, given that \\(H_0\\) is true. It is no longer straightforward to determine what ‘at least as extreme’ means in a sequential design (Cook, 2002). The most widely recommended procedure to determine what “at least as extreme” means is to order the outcomes of a series of sequential analyses in terms of the look at which the study was stopped, where earlier stopping is more extreme than later stopping, and where studies with higher z values are more extreme, when different studies are stopped at the same time (Proschan et al., 2006). This is referred to as stagewise ordering, which treats rejections at earlier looks as stronger evidence against \\(H_0\\) than rejections later in the study (Wassmer & Brannath, 2016). Given the direct relationship between a p value and a confidence interval, confidence intervals for sequential designs have also been developed.\nReporting adjusted p values and confidence intervals, however, might be criticized. After a sequential design, a correct interpretation from a Neyman-Pearson framework is to conclude that \\(H_0\\) is rejected, the alternative hypothesis is rejected, or that the results are inconclusive. The reason that adjusted p values are reported after sequential designs is to allow readers to interpret them as a measure of evidence. Dupont (1983) provides good arguments to doubt that adjusted p values provide a valid measure of the strength of evidence. Furthermore, a strict interpretation of the Neyman-Pearson approach to statistical inferences also provides an argument against interpreting p values as measures of evidence (Lakens, 2022). Therefore, it is recommended, if researchers are interested in communicating the evidence in the data for \\(H_0\\) relative to the alternative hypothesis, to report likelihoods or Bayes factors, which can always be reported and interpreted after the data collection has been completed. Reporting the unadjusted p-value in relation to the alpha level communicates the basis to reject hypotheses, although it might be important for researchers performing a meta-analysis based on p-values (e.g., a p-curve or z-curve analysis, as explained in the chapter on bias detection) that these are sequential p-values. Adjusted confidence intervals are useful tools to evaluate the observed effect estimate relative to its variability at an interim or the final look at the data. Note that the adjusted parameter estimates are only available in statistical software for a few commonly used designs in pharmaceutical trials, such as comparisons of mean differences between groups, or survival analysis.\nBelow, we see the same sequential design we started with, with 2 looks and a Pocock-type alpha spending function. After completing the study with the planned sample size of 95 participants per condition (where we collect 48 participants at look 1, and the remaining 47 at look 2), we can now enter the observed data using the function getDataset.
The means and standard deviations are entered for each stage, so at the second look, only the data from the second 95 participants in each condition are used to compute the means (1.51 and 1.01) and standard deviations (1.03 and 0.96).\n\ndesign <- getDesignGroupSequential(\n kMax = 2,\n typeOfDesign = \"asP\",\n sided = 2,\n alpha = 0.05,\n beta = 0.1\n)\n\ndataMeans <- getDataset(\n n1 = c(48, 47), \n n2 = c(48, 47), \n means1 = c(1.12, 1.51), # for directional test, means 1 > means 2\n means2 = c(1.03, 1.01),\n stDevs1 = c(0.98, 1.03), \n stDevs2 = c(1.06, 0.96)\n )\n\nres <- getAnalysisResults(\n design, \n equalVariances = TRUE,\n dataInput = dataMeans\n )\n\nprint(summary(res))\n\n\n\n[PROGRESS] Stage results calculated [0.0434 secs] \n[PROGRESS] Conditional power calculated [0.0327 secs] \n[PROGRESS] Conditional rejection probabilities (CRP) calculated [0.0012 secs] \n[PROGRESS] Repeated confidence interval of stage 1 calculated [0.7686 secs] \n[PROGRESS] Repeated confidence interval of stage 2 calculated [0.7027 secs] \n[PROGRESS] Repeated confidence interval calculated [1.47 secs] \n[PROGRESS] Repeated p-values of stage 1 calculated [0.254 secs] \n[PROGRESS] Repeated p-values of stage 2 calculated [0.2679 secs] \n[PROGRESS] Repeated p-values calculated [0.5231 secs] \n[PROGRESS] Final p-value calculated [0.0015 secs] \n[PROGRESS] Final confidence interval calculated [0.0804 secs] \n\n\n\n\nAnalysis results for a continuous endpoint\n\nSequential analysis with 2 looks (group sequential design).\nThe results were calculated using a two-sample t-test (two-sided, alpha = 0.05), \nequal variances option.\nH0: mu(1) - mu(2) = 0 against H1: mu(1) - mu(2) != 0.\n\nStage 1 2 \nFixed weight 0.5 1 \nEfficacy boundary (z-value scale) 2.157 2.201 \nCumulative alpha spent 0.0310 0.0500 \nStage level 0.0155 0.0139 \nCumulative effect size 0.090 0.293 \nCumulative (pooled) standard deviation 1.021 1.013 \nOverall test statistic 0.432 1.993 \nOverall p-value 0.3334 0.0238 \nTest action continue accept \nConditional rejection probability 0.0073 \n95% repeated confidence interval [-0.366; 0.546] [-0.033; 0.619]\nRepeated p-value >0.5 0.0819 \nFinal p-value 0.0666 \nFinal confidence interval [-0.020; 0.573]\nMedian unbiased estimate 0.281 \n\n-----\n\nAnalysis results (means of 2 groups, group sequential design):\n\nDesign parameters:\n Information rates : 0.500, 1.000 \n Critical values : 2.157, 2.201 \n Futility bounds (non-binding) : -Inf \n Cumulative alpha spending : 0.03101, 0.05000 \n Local one-sided significance levels : 0.01550, 0.01387 \n Significance level : 0.0500 \n Test : two-sided \n\nUser defined parameters: not available\n\nDefault parameters:\n Normal approximation : FALSE \n Direction upper : TRUE \n Theta H0 : 0 \n Equal variances : TRUE \n\nStage results:\n Cumulative effect sizes : 0.0900, 0.2928 \n Cumulative (pooled) standard deviations : 1.021, 1.013 \n Stage-wise test statistics : 0.432, 2.435 \n Stage-wise p-values : 0.333390, 0.008421 \n Overall test statistics : 0.432, 1.993 \n Overall p-values : 0.33339, 0.02384 \n\nAnalysis results:\n Assumed standard deviation : 1.013 \n Actions : continue, accept \n Conditional rejection probability : 0.007317, NA \n Conditional power : NA, NA \n Repeated confidence intervals (lower) : -0.36630, -0.03306 \n Repeated confidence intervals (upper) : 0.5463, 0.6187 \n Repeated p-values : >0.5, 0.08195 \n Final stage : 2 \n Final p-value : NA, 0.06662 \n Final CIs (lower) : NA, -0.02007 \n Final CIs (upper) : NA, 0.5734 \n Median unbiased 
estimate : NA, 0.2814 \n\n\nImagine we have performed a study planned to have at most 2 equally spaced looks at the data, where we perform a two-sided test with an alpha of 0.05, and we use a Pocock type alpha spending function, and we observe mean differences between the two conditions at the last look. Based on a Pocock-like alpha spending function with two equally spaced looks the alpha level for a two-sided t-test is 0.003051, and 0.0490. We can thus reject \\(H_0\\) after look 2. But we would also like to report an effect size, and adjusted p values and confidence intervals.\nThe results show that the action after look 1 was to continue data collection, and that we could reject \\(H_0\\) at the second look. The unadjusted mean difference is provided in the row “Overall effect size” and at the final look this was 0.293. The adjusted mean difference is provided in the row “Median unbiased estimate” and is lower, and the adjusted confidence interval is in the row “Final confidence interval”, giving the result 0.281, 95% CI [-0.02, 0.573].\nThe unadjusted p values for a one-sided test are reported in the row “Overall p-value”. The actual p values for our two-sided test would be twice as large, so 0.6668, 0.0477. The adjusted p-value at the final look is provided in the row “Final p-value” and it is 0.06662." }, { "objectID": "10-sequential.html#test-yourself", @@ -1103,13 +1103,13 @@ "href": "references.html", "title": "References", "section": "", - "text": "Abelson, P. (2003). The Value of Life and\nHealth for Public Policy. Economic\nRecord, 79, S2–S13. https://doi.org/10.1111/1475-4932.00087\n\n\nAberson, C. L. (2019). Applied Power Analysis for the\nBehavioral Sciences (2nd ed.). Routledge.\n\n\nAert, R. C. M. van, & Assen, M. A. L. M. van. (2018). Correcting\nfor Publication Bias in a Meta-Analysis with\nthe P-uniform* Method.\nMetaArXiv. https://doi.org/10.31222/osf.io/zqjr9\n\n\nAgnoli, F., Wicherts, J. M., Veldkamp, C. L. S., Albiero, P., &\nCubelli, R. (2017). Questionable research practices among italian\nresearch psychologists. PLOS ONE, 12(3), e0172792. https://doi.org/10.1371/journal.pone.0172792\n\n\nAkker, O. van den, Bakker, M., Assen, M. A. L. M. van, Pennington, C.\nR., Verweij, L., Elsherif, M., Claesen, A., Gaillard, S. D. M., Yeung,\nS. K., Frankenberger, J.-L., Krautter, K., Cockcroft, J. P., Kreuer, K.\nS., Evans, T. R., Heppel, F., Schoch, S. F., Korbmacher, M., Yamada, Y.,\nAlbayrak-Aydemir, N., … Wicherts, J. (2023). The effectiveness of\npreregistration in psychology: Assessing preregistration\nstrictness and preregistration-study consistency.\nMetaArXiv. https://doi.org/10.31222/osf.io/h8xjw\n\n\nAlbers, C. J., Kiers, H. A. L., & Ravenzwaaij, D. van. (2018).\nCredible Confidence: A Pragmatic View on the\nFrequentist vs Bayesian Debate. Collabra:\nPsychology, 4(1), 31. https://doi.org/10.1525/collabra.149\n\n\nAlbers, C. J., & Lakens, D. (2018). When power analyses based on\npilot data are biased: Inaccurate effect size estimators\nand follow-up bias. Journal of Experimental Social Psychology,\n74, 187–195. https://doi.org/10.1016/j.jesp.2017.09.004\n\n\nAldrich, J. (1997). R.A. Fisher and the making\nof maximum likelihood 1912-1922. Statistical Science,\n12(3), 162–176. https://doi.org/10.1214/ss/1030037906\n\n\nAllison, D. B., Allison, R. L., Faith, M. S., Paultre, F., &\nPi-Sunyer, F. X. (1997). Power and money: Designing\nstatistically powerful studies while minimizing financial costs.\nPsychological Methods, 2(1), 20–33. 
Appleton-Century-Crofts.\n\n\nRoss-Hellauer, T., Deppe, A., & Schmidt, B. (2017). Survey on open\npeer review: Attitudes and experience amongst editors,\nauthors and reviewers. PLOS ONE, 12(12), e0189311. https://doi.org/10.1371/journal.pone.0189311\n\n\nRouder, J. N. (2014). Optional stopping: No problem for\nBayesians. Psychonomic Bulletin & Review,\n21(2), 301–308.\n\n\nRouder, J. N., Haaf, J. M., & Snyder, H. K. (2019). Minimizing\nMistakes in Psychological Science.\nAdvances in Methods and Practices in Psychological Science,\n2(1), 3–11. https://doi.org/10.1177/2515245918801915\n\n\nRouder, J. N., Speckman, P. L., Sun, D., Morey, R. D., & Iverson, G.\n(2009). Bayesian t tests for accepting and rejecting the null\nhypothesis. Psychonomic Bulletin & Review, 16(2),\n225–237. https://doi.org/10.3758/PBR.16.2.225\n\n\nRoyall, R. (1997). Statistical Evidence: A\nLikelihood Paradigm. Chapman and Hall/CRC.\n\n\nRozeboom, W. W. (1960). The fallacy of the null-hypothesis significance\ntest. Psychological Bulletin, 57(5), 416–428. https://doi.org/10.1037/h0042040\n\n\nRücker, G., Schwarzer, G., Carpenter, J. R., & Schumacher, M.\n(2008). Undue reliance on I(2) in assessing heterogeneity\nmay mislead. BMC Medical Research Methodology, 8, 79.\nhttps://doi.org/10.1186/1471-2288-8-79\n\n\nSarafoglou, A., Kovacs, M., Bakos, B., Wagenmakers, E.-J., & Aczel,\nB. (2022). A survey on how preregistration affects the research\nworkflow: Better science but more work. Royal Society Open\nScience, 9(7), 211997. https://doi.org/10.1098/rsos.211997\n\n\nScheel, A. M., Schijen, M. R. M. J., & Lakens, D. (2021). An\nExcess of Positive Results:\nComparing the Standard Psychology Literature With\nRegistered Reports. Advances in Methods and Practices in\nPsychological Science, 4(2), 25152459211007467. https://doi.org/10.1177/25152459211007467\n\n\nScheel, A. M., Tiokhin, L., Isager, P. M., & Lakens, D. (2021). Why\nHypothesis Testers Should Spend Less Time Testing\nHypotheses. Perspectives on Psychological Science,\n16(4), 744–755. https://doi.org/10.1177/1745691620966795\n\n\nSchimmack, U. (2012). The ironic effect of significant results on the\ncredibility of multiple-study articles. Psychological Methods,\n17(4), 551–566. https://doi.org/10.1037/a0029487\n\n\nSchnuerch, M., & Erdfelder, E. (2020). Controlling decision errors\nwith minimal costs: The sequential probability ratio t\ntest. Psychological Methods, 25(2), 206–226. https://doi.org/10.1037/met0000234\n\n\nSchoemann, A. M., Boulton, A. J., & Short, S. D. (2017). Determining\nPower and Sample Size for Simple\nand Complex Mediation Models. Social Psychological and\nPersonality Science, 8(4), 379–386. https://doi.org/10.1177/1948550617715068\n\n\nSchoenegger, P., & Pils, R. (2023). Social sciences in crisis: On\nthe proposed elimination of the discussion section. Synthese,\n202(2), 54. https://doi.org/10.1007/s11229-023-04267-3\n\n\nSchönbrodt, F. D., Wagenmakers, E.-J., Zehetleitner, M., & Perugini,\nM. (2017). Sequential hypothesis testing with Bayes\nfactors: Efficiently testing mean differences.\nPsychological Methods, 22(2), 322–339. https://doi.org/10.1037/MET0000061\n\n\nSchuirmann, D. J. (1987). A comparison of the two one-sided tests\nprocedure and the power approach for assessing the equivalence of\naverage bioavailability. Journal of Pharmacokinetics and\nBiopharmaceutics, 15(6), 657–680.\n\n\nSchulz, K. F., & Grimes, D. A. (2005). Sample size calculations in\nrandomised trials: Mandatory and mystical. The Lancet,\n365(9467), 1348–1353. 
https://doi.org/10.1016/S0140-6736(05)61034-3\n\n\nSchumi, J., & Wittes, J. T. (2011). Through the looking glass:\nUnderstanding non-inferiority. Trials, 12(1), 106. https://doi.org/10.1186/1745-6215-12-106\n\n\nSchweder, T., & Hjort, N. L. (2016). Confidence,\nLikelihood, Probability: Statistical\nInference with Confidence Distributions.\nCambridge University Press. https://doi.org/10.1017/CBO9781139046671\n\n\nScull, A. (2023). Rosenhan revisited: Successful scientific fraud.\nHistory of Psychiatry, 0957154X221150878. https://doi.org/10.1177/0957154X221150878\n\n\nSeaman, M. A., & Serlin, R. C. (1998). Equivalence confidence\nintervals for two-group comparisons of means. Psychological\nMethods, 3(4), 403–411. https://doi.org/http://dx.doi.org.dianus.libr.tue.nl/10.1037/1082-989X.3.4.403\n\n\nSedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical\npower have an effect on the power of studies? Psychological\nBulletin, 105(2), 309–316. https://doi.org/10.1037/0033-2909.105.2.309\n\n\nShadish, W. R., Cook, T. D., & Campbell, D. T. (2001).\nExperimental and quasi-experimental designs for generalized causal\ninference. Houghton Mifflin.\n\n\nShmueli, G. (2010). To explain or to predict? Statistical\nScience, 25(3), 289–310.\n\n\nSimmons, J. P., Nelson, L. D., & Simonsohn, U. (2011).\nFalse-Positive Psychology: Undisclosed\nFlexibility in Data Collection and Analysis\nAllows Presenting Anything as Significant.\nPsychological Science, 22(11), 1359–1366. https://doi.org/10.1177/0956797611417632\n\n\nSimmons, J. P., Nelson, L. D., & Simonsohn, U. (2013). Life\nafter P-Hacking.\n\n\nSimonsohn, U. (2015). Small telescopes: Detectability and\nthe evaluation of replication results. Psychological Science,\n26(5), 559–569. https://doi.org/10.1177/0956797614567341\n\n\nSimonsohn, U., Nelson, L. D., & Simmons, J. P. (2014). P-curve:\nA key to the file-drawer. Journal of Experimental\nPsychology: General, 143(2), 534.\n\n\nSmart, R. G. (1964). The importance of negative results in psychological\nresearch. Canadian Psychologist / Psychologie Canadienne,\n5a(4), 225–232. https://doi.org/10.1037/h0083036\n\n\nSmithson, M. (2003). Confidence intervals. Sage\nPublications.\n\n\nSotola, L. K. (2022). Garbage In, Garbage Out?\nEvaluating the Evidentiary Value of Published Meta-analyses Using Z-Curve Analysis.\nCollabra: Psychology, 8(1), 32571. https://doi.org/10.1525/collabra.32571\n\n\nSpanos, A. (1999). Probability theory and statistical inference:\nEconometric modeling with observational data. Cambridge\nUniversity Press.\n\n\nSpanos, A. (2013). Who should be afraid of the\nJeffreys-Lindley paradox? Philosophy of Science,\n80(1), 73–93. https://doi.org/10.1086/668875\n\n\nSpellman, B. A. (2015). A Short (Personal)\nFuture History of Revolution 2.0.\nPerspectives on Psychological Science, 10(6), 886–899.\nhttps://doi.org/10.1177/1745691615609918\n\n\nSpiegelhalter, D. (2019). The Art of\nStatistics: How to Learn from\nData (Illustrated edition). Basic Books.\n\n\nSpiegelhalter, D. J., Freedman, L. S., & Blackburn, P. R. (1986).\nMonitoring clinical trials: Conditional or predictive power?\nControlled Clinical Trials, 7(1), 8–17. https://doi.org/10.1016/0197-2456(86)90003-6\n\n\nStanley, T. D., & Doucouliagos, H. (2014). Meta-regression\napproximations to reduce publication selection bias. Research\nSynthesis Methods, 5(1), 60–78. https://doi.org/10.1002/jrsm.1095\n\n\nStanley, T. D., Doucouliagos, H., & Ioannidis, J. P. A. 
(2017).\nFinding the power to reduce publication bias: Finding the\npower to reduce publication bias. Statistics in Medicine. https://doi.org/10.1002/sim.7228\n\n\nSteiger, J. H. (2004). Beyond the F Test: Effect Size\nConfidence Intervals and Tests of Close\nFit in the Analysis of Variance and\nContrast Analysis. Psychological Methods,\n9(2), 164–182. https://doi.org/10.1037/1082-989X.9.2.164\n\n\nSterling, T. D. (1959). Publication Decisions and\nTheir Possible Effects on Inferences Drawn\nfrom Tests of Significance–Or Vice Versa.\nJournal of the American Statistical Association,\n54(285), 30–34. https://doi.org/10.2307/2282137\n\n\nStewart, L. A., & Tierney, J. F. (2002). To IPD or not\nto IPD?: Advantages and\nDisadvantages of Systematic Reviews Using Individual\nPatient Data. Evaluation & the Health Professions,\n25(1), 76–97. https://doi.org/10.1177/0163278702025001006\n\n\nStodden, V., Seiler, J., & Ma, Z. (2018). An empirical analysis of\njournal policy effectiveness for computational reproducibility.\nProceedings of the National Academy of Sciences,\n115(11), 2584–2589. https://doi.org/10.1073/pnas.1708290115\n\n\nStrand, J. F. (2023). Error tight: Exercises for lab groups\nto prevent research mistakes. Psychological Methods, No\nPagination Specified–No Pagination Specified. https://doi.org/10.1037/met0000547\n\n\nStroebe, W., & Strack, F. (2014). The Alleged Crisis\nand the Illusion of Exact Replication.\nPerspectives on Psychological Science, 9(1), 59–71. https://doi.org/10.1177/1745691613514450\n\n\nStroop, J. R. (1935). Studies of interference in serial verbal\nreactions. Journal of Experimental Psychology, 18(6),\n643–662.\n\n\nSwift, J. K., Link to external site, this link will open in a new\nwindow, Christopherson, C. D., Link to external site, this link will\nopen in a new window, Bird, M. O., Link to external site, this link will\nopen in a new window, Zöld, A., Link to external site, this link will\nopen in a new window, Goode, J., & Link to external site, this link\nwill open in a new window. (2022). Questionable research practices among\nfaculty and students in APA-accredited\nclinical and counseling psychology doctoral programs. Training and\nEducation in Professional Psychology, 16(3), 299–305. https://doi.org/10.1037/tep0000322\n\n\nTaper, M. L., & Lele, S. R. (2011). Philosophy of\nStatistics. In P. S. Bandyophadhyay & M. R. Forster\n(Eds.), Evidence, evidence functions, and error probabilities\n(pp. 513–531). Elsevier, USA.\n\n\nTaylor, D. J., & Muller, K. E. (1996). Bias in linear model power\nand sample size calculation due to estimating noncentrality.\nCommunications in Statistics-Theory and Methods,\n25(7), 1595–1610. https://doi.org/10.1080/03610929608831787\n\n\nTeare, M. D., Dimairo, M., Shephard, N., Hayman, A., Whitehead, A.,\n& Walters, S. J. (2014). Sample size requirements to estimate key\ndesign parameters from external pilot randomised controlled trials: A\nsimulation study. Trials, 15(1), 264. https://doi.org/10.1186/1745-6215-15-264\n\n\nTendeiro, J. N., & Kiers, H. A. L. (2019). A review of issues about\nnull hypothesis Bayesian testing. Psychological\nMethods. https://doi.org/10.1037/met0000221\n\n\nter Schure, J., & Grünwald, P. D. (2019). Accumulation\nBias in Meta-Analysis: The Need\nto Consider Time in Error Control.\narXiv:1905.13494 [Math, Stat]. https://arxiv.org/abs/1905.13494\n\n\nTerrin, N., Schmid, C. H., Lau, J., & Olkin, I. (2003). Adjusting\nfor publication bias in the presence of heterogeneity. 
Statistics in\nMedicine, 22(13), 2113–2126. https://doi.org/10.1002/sim.1461\n\n\nThompson, B. (2007). Effect sizes, confidence intervals, and confidence\nintervals for effect sizes. Psychology in the Schools,\n44(5), 423–432. https://doi.org/10.1002/pits.20234\n\n\nTversky, A. (1977). Features of similarity. Psychological\nReview, 84(4), 327–352. https://doi.org/10.1037/0033-295X.84.4.327\n\n\nTversky, A., & Kahneman, D. (1971). Belief in the law of small\nnumbers. Psychological Bulletin, 76(2), 105–110. https://doi.org/10.1037/h0031322\n\n\nUlrich, R., & Miller, J. (2018). Some properties of p-curves, with\nan application to gradual publication bias. Psychological\nMethods, 23(3), 546–560. https://doi.org/10.1037/met0000125\n\n\nUygun Tunç, D., & Tunç, M. N. (2022). A Falsificationist\nTreatment of Auxiliary Hypotheses in\nSocial and Behavioral Sciences:\nSystematic Replications Framework.\nMeta-Psychology. https://doi.org/10.31234/osf.io/pdm7y\n\n\nUygun Tunç, D., Tunç, M. N., & Lakens, D. (2023). The epistemic and\npragmatic function of dichotomous claims based on statistical hypothesis\ntests. Theory & Psychology, 09593543231160112. https://doi.org/10.1177/09593543231160112\n\n\nValentine, J. C., Pigott, T. D., & Rothstein, H. R. (2010). How\nMany Studies Do You Need?: A Primer on\nStatistical Power for Meta-Analysis.\nJournal of Educational and Behavioral Statistics,\n35(2), 215–247. https://doi.org/10.3102/1076998609346961\n\n\nvan de Schoot, R., Winter, S. D., Griffioen, E., Grimmelikhuijsen, S.,\nArts, I., Veen, D., Grandfield, E. M., & Tummers, L. G. (2021). The\nUse of Questionable Research Practices to\nSurvive in Academia Examined With Expert\nElicitation, Prior-Data Conflicts, Bayes\nFactors for Replication Effects, and the Bayes\nTruth Serum. Frontiers in Psychology, 12.\n\n\nvan de Schoot, R., Winter, S. D., Ryan, O., Zondervan-Zwijnenburg, M.,\n& Depaoli, S. (2017). A systematic review of Bayesian\narticles in psychology: The last 25 years.\nPsychological Methods, 22(2), 217–239. https://doi.org/10.1037/met0000100\n\n\nVan Fraassen, B. C. (1980). The scientific image.\nClarendon Press ; Oxford University Press.\n\n\nvan ’t Veer, A. E., & Giner-Sorolla, R. (2016). Pre-registration in\nsocial psychologyA discussion and suggested\ntemplate. Journal of Experimental Social Psychology,\n67, 2–12. https://doi.org/10.1016/j.jesp.2016.03.004\n\n\nVarkey, B. (2021). Principles of Clinical Ethics and\nTheir Application to Practice. Medical\nPrinciples and Practice: International Journal of the Kuwait University,\nHealth Science Centre, 30(1), 17–28. https://doi.org/10.1159/000509119\n\n\nVazire, S. (2017). Quality Uncertainty Erodes Trust in\nScience. Collabra: Psychology, 3(1), 1.\nhttps://doi.org/10.1525/collabra.74\n\n\nVazire, S., & Holcombe, A. O. (2022). Where Are the\nSelf-Correcting Mechanisms in Science?\nReview of General Psychology, 26(2), 212–223. https://doi.org/10.1177/10892680211033912\n\n\nVerschuere, B., Meijer, E. H., Jim, A., Hoogesteyn, K., Orthey, R.,\nMcCarthy, R. J., Skowronski, J. J., Acar, O. A., Aczel, B., Bakos, B.\nE., Barbosa, F., Baskin, E., Bègue, L., Ben-Shakhar, G., Birt, A. R.,\nBlatz, L., Charman, S. D., Claesen, A., Clay, S. L., … Yıldız, E.\n(2018). Registered Replication Report on\nMazar, Amir, and Ariely (2008).\nAdvances in Methods and Practices in Psychological Science,\n1(3), 299–317. https://doi.org/10.1177/2515245918781032\n\n\nViamonte, S. M., Ball, K. K., & Kilgore, M. (2006). 
A\nCost-Benefit Analysis of Risk-Reduction Strategies\nTargeted at Older Drivers. Traffic Injury\nPrevention, 7(4), 352–359. https://doi.org/10.1080/15389580600791362\n\n\nViechtbauer, W. (2010). Conducting meta-analyses in R with\nthe metafor package. J Stat Softw, 36(3), 1–48.\nhttps://doi.org/http://dx.doi.org/10.18637/jss.v036.i03\n\n\nVohs, K. D., Schmeichel, B. J., Lohmann, S., Gronau, Q. F., Finley, A.\nJ., Ainsworth, S. E., Alquist, J. L., Baker, M. D., Brizi, A., Bunyi,\nA., Butschek, G. J., Campbell, C., Capaldi, J., Cau, C., Chambers, H.,\nChatzisarantis, N. L. D., Christensen, W. J., Clay, S. L., Curtis, J., …\nAlbarracín, D. (2021). A Multisite Preregistered Paradigmatic\nTest of the Ego-Depletion Effect. Psychological\nScience, 32(10), 1566–1581. https://doi.org/10.1177/0956797621989733\n\n\nVosgerau, J., Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2019).\n99% impossible: A valid, or falsifiable, internal\nmeta-analysis. Journal of Experimental Psychology. General,\n148(9), 1628–1639. https://doi.org/10.1037/xge0000663\n\n\nVuorre, M., & Curley, J. P. (2018). Curating Research\nAssets: A Tutorial on the Git Version Control\nSystem. Advances in Methods and Practices in Psychological\nScience, 1(2), 219–236. https://doi.org/10.1177/2515245918754826\n\n\nWacholder, S., Chanock, S., Garcia-Closas, M., El ghormli, L., &\nRothman, N. (2004). Assessing the Probability That a\nPositive Report is False: An\nApproach for Molecular Epidemiology Studies.\nJNCI Journal of the National Cancer Institute, 96(6),\n434–442. https://doi.org/10.1093/jnci/djh075\n\n\nWagenmakers, E.-J. (2007). A practical solution to the pervasive\nproblems of p values. Psychonomic Bulletin & Review,\n14(5), 779–804. https://doi.org/10.3758/BF03194105\n\n\nWagenmakers, E.-J., Beek, T., Dijkhoff, L., Gronau, Q. F., Acosta, A.,\nAdams, R. B., Albohn, D. N., Allard, E. S., Benning, S. D.,\nBlouin-Hudon, E.-M., Bulnes, L. C., Caldwell, T. L., Calin-Jageman, R.\nJ., Capaldi, C. A., Carfagno, N. S., Chasten, K. T., Cleeremans, A.,\nConnell, L., DeCicco, J. M., … Zwaan, R. A. (2016). Registered\nReplication Report: Strack,\nMartin, & Stepper (1988). Perspectives\non Psychological Science, 11(6), 917–928. https://doi.org/10.1177/1745691616674458\n\n\nWagenmakers, E.-J., Wetzels, R., Borsboom, D., & van der Maas, H. L.\nJ. (2011). Why psychologists must change the way they analyze their\ndata: The case of psi: Comment on Bem (2011). Journal\nof Personality and Social Psychology, 100(3), 426–432. https://doi.org/10.1037/a0022790\n\n\nWald, A. (1945). Sequential tests of statistical hypotheses. The\nAnnals of Mathematical Statistics, 16(2), 117–186.\nhttps://doi.org/https://www.jstor.org/stable/2240273\n\n\nWaldron, S., & Allen, C. (2022). Not all pre-registrations are\nequal. Neuropsychopharmacology, 47(13), 2181–2183. https://doi.org/10.1038/s41386-022-01418-x\n\n\nWang, B., Zhou, Z., Wang, H., Tu, X. M., & Feng, C. (2019). The\np-value and model specification in statistics. General\nPsychiatry, 32(3), e100081. https://doi.org/10.1136/gpsych-2019-100081\n\n\nWason, P. C. (1960). On the failure to eliminate hypotheses in a\nconceptual task. Quarterly Journal of Experimental Psychology,\n12(3), 129–140. https://doi.org/10.1080/17470216008416717\n\n\nWassmer, G., & Brannath, W. (2016). Group\nSequential and Confirmatory Adaptive Designs\nin Clinical Trials. Springer International\nPublishing. https://doi.org/10.1007/978-3-319-32562-0\n\n\nWeinshall-Margel, K., & Shapard, J. (2011). 
Overlooked factors in\nthe analysis of parole decisions. Proceedings of the National\nAcademy of Sciences, 108(42), E833–E833. https://doi.org/10.1073/pnas.1110910108\n\n\nWellek, S. (2010). Testing statistical hypotheses of equivalence and\nnoninferiority (2nd ed). CRC Press.\n\n\nWestberg, M. (1985). Combining Independent Statistical\nTests. Journal of the Royal Statistical Society. Series D\n(The Statistician), 34(3), 287–296. https://doi.org/10.2307/2987655\n\n\nWestfall, J., Kenny, D. A., & Judd, C. M. (2014). Statistical power\nand optimal design in experiments in which samples of participants\nrespond to samples of stimuli. Journal of Experimental Psychology:\nGeneral, 143(5), 2020–2045. https://doi.org/10.1037/xge0000014\n\n\nWestlake, W. J. (1972). Use of Confidence Intervals in\nAnalysis of Comparative Bioavailability\nTrials. Journal of Pharmaceutical Sciences,\n61(8), 1340–1341. https://doi.org/10.1002/JPS.2600610845\n\n\nWhitney, S. N. (2016). Balanced Ethics Review.\nSpringer International Publishing. https://doi.org/10.1007/978-3-319-20705-6\n\n\nWicherts, J. M. (2011). Psychology must learn a lesson from fraud case.\nNature, 480(7375), 7–7. https://doi.org/10.1038/480007a\n\n\nWicherts, J. M., Veldkamp, C. L. S., Augusteijn, H. E. M., Bakker, M.,\nAert, V., M, R. C., Assen, V., & M, M. A. L. (2016). Degrees of\nFreedom in Planning, Running,\nAnalyzing, and Reporting Psychological\nStudies: A Checklist to Avoid\np-Hacking. Frontiers in Psychology, 7. https://doi.org/10.3389/fpsyg.2016.01832\n\n\nWiebels, K., & Moreau, D. (2021). Leveraging Containers\nfor Reproducible Psychological Research. Advances in\nMethods and Practices in Psychological Science, 4(2),\n25152459211017853. https://doi.org/10.1177/25152459211017853\n\n\nWigboldus, D. H. J., & Dotsch, R. (2016). Encourage\nPlaying with Data and Discourage\nQuestionable Reporting Practices. Psychometrika,\n81(1), 27–32. https://doi.org/10.1007/s11336-015-9445-1\n\n\nWilliams, R. H., Zimmerman, D. W., & Zumbo, B. D. (1995). Impact of\nMeasurement Error on Statistical Power:\nReview of an Old Paradox. The Journal of\nExperimental Education, 63(4), 363–370. https://doi.org/10.1080/00220973.1995.9943470\n\n\nWilson, E. C. F. (2015). A Practical Guide to\nValue of Information Analysis.\nPharmacoEconomics, 33(2), 105–121. https://doi.org/10.1007/s40273-014-0219-x\n\n\nWilson VanVoorhis, C. R., & Morgan, B. L. (2007). Understanding\npower and rules of thumb for determining sample sizes. Tutorials in\nQuantitative Methods for Psychology, 3(2), 43–50. https://doi.org/10.20982/tqmp.03.2.p043\n\n\nWiner, B. J. (1962). Statistical principles in experimental\ndesign. New York : McGraw-Hill.\n\n\nWingen, T., Berkessel, J. B., & Englich, B. (2020). No\nReplication, No Trust? How Low\nReplicability Influences Trust in Psychology.\nSocial Psychological and Personality Science, 11(4),\n454–463. https://doi.org/10.1177/1948550619877412\n\n\nWiseman, R., Watt, C., & Kornbrot, D. (2019). Registered reports: An\nearly example and analysis. PeerJ, 7, e6232. https://doi.org/10.7717/peerj.6232\n\n\nWittes, J., & Brittain, E. (1990). The role of internal pilot\nstudies in increasing the efficiency of clinical trials. Statistics\nin Medicine, 9(1-2), 65–72. https://doi.org/10.1002/sim.4780090113\n\n\nWong, T. K., Kiers, H., & Tendeiro, J. (2022). On the\nPotential Mismatch Between the Function of the\nBayes Factor and Researchers’\nExpectations. Collabra: Psychology, 8(1),\n36357. https://doi.org/10.1525/collabra.36357\n\n\nWynants, L., Calster, B. 
V., Collins, G. S., Riley, R. D., Heinze, G.,\nSchuit, E., Bonten, M. M. J., Dahly, D. L., Damen, J. A., Debray, T. P.\nA., Jong, V. M. T. de, Vos, M. D., Dhiman, P., Haller, M. C., Harhay, M.\nO., Henckaerts, L., Heus, P., Kammer, M., Kreuzberger, N., … Smeden, M.\nvan. (2020). Prediction models for diagnosis and prognosis of covid-19:\nSystematic review and critical appraisal. BMJ, 369,\nm1328. https://doi.org/10.1136/bmj.m1328\n\n\nYarkoni, T., & Westfall, J. (2017). Choosing Prediction Over\nExplanation in Psychology: Lessons From\nMachine Learning. Perspectives on Psychological Science,\n12(6), 1100–1122. https://doi.org/10.1177/1745691617693393\n\n\nYuan, K.-H., & Maxwell, S. (2005). On the Post Hoc\nPower in Testing Mean Differences. Journal of\nEducational and Behavioral Statistics, 30(2), 141–167. https://doi.org/10.3102/10769986030002141\n\n\nZabell, S. L. (1992). R. A. Fisher and\nFiducial Argument. Statistical Science,\n7(3), 369–387. https://doi.org/10.1214/ss/1177011233\n\n\nZenko, M. (2015). Red Team: How to\nSucceed By Thinking Like the Enemy (1st\nedition). Basic Books.\n\n\nZumbo, B. D., & Hubley, A. M. (1998). A note on misconceptions\nconcerning prospective and retrospective power. Journal of the Royal\nStatistical Society: Series D (The Statistician), 47(2),\n385–388. https://doi.org/10.1111/1467-9884.00139" + "text": "Abelson, P. (2003). The Value of Life and\nHealth for Public Policy. Economic\nRecord, 79, S2–S13. https://doi.org/10.1111/1475-4932.00087\n\n\nAberson, C. L. (2019). Applied Power Analysis for the\nBehavioral Sciences (2nd ed.). Routledge.\n\n\nAert, R. C. M. van, & Assen, M. A. L. M. van. (2018). Correcting\nfor Publication Bias in a Meta-Analysis with\nthe P-uniform* Method.\nMetaArXiv. https://doi.org/10.31222/osf.io/zqjr9\n\n\nAgnoli, F., Wicherts, J. M., Veldkamp, C. L. S., Albiero, P., &\nCubelli, R. (2017). Questionable research practices among italian\nresearch psychologists. PLOS ONE, 12(3), e0172792. https://doi.org/10.1371/journal.pone.0172792\n\n\nAkker, O. van den, Bakker, M., Assen, M. A. L. M. van, Pennington, C.\nR., Verweij, L., Elsherif, M., Claesen, A., Gaillard, S. D. M., Yeung,\nS. K., Frankenberger, J.-L., Krautter, K., Cockcroft, J. P., Kreuer, K.\nS., Evans, T. R., Heppel, F., Schoch, S. F., Korbmacher, M., Yamada, Y.,\nAlbayrak-Aydemir, N., … Wicherts, J. (2023). The effectiveness of\npreregistration in psychology: Assessing preregistration\nstrictness and preregistration-study consistency.\nMetaArXiv. https://doi.org/10.31222/osf.io/h8xjw\n\n\nAlbers, C. J., Kiers, H. A. L., & Ravenzwaaij, D. van. (2018).\nCredible Confidence: A Pragmatic View on the\nFrequentist vs Bayesian Debate. Collabra:\nPsychology, 4(1), 31. https://doi.org/10.1525/collabra.149\n\n\nAlbers, C. J., & Lakens, D. (2018). When power analyses based on\npilot data are biased: Inaccurate effect size estimators\nand follow-up bias. Journal of Experimental Social Psychology,\n74, 187–195. https://doi.org/10.1016/j.jesp.2017.09.004\n\n\nAldrich, J. (1997). R.A. Fisher and the making\nof maximum likelihood 1912-1922. Statistical Science,\n12(3), 162–176. https://doi.org/10.1214/ss/1030037906\n\n\nAllison, D. B., Allison, R. L., Faith, M. S., Paultre, F., &\nPi-Sunyer, F. X. (1997). Power and money: Designing\nstatistically powerful studies while minimizing financial costs.\nPsychological Methods, 2(1), 20–33. https://doi.org/10.1037/1082-989X.2.1.20\n\n\nAltman, D. G., & Bland, J. M. (1995). 
Statistics notes:\nAbsence of evidence is not evidence of absence.\nBMJ, 311(7003), 485. https://doi.org/10.1136/bmj.311.7003.485\n\n\nAltoè, G., Bertoldo, G., Zandonella Callegher, C., Toffalini, E.,\nCalcagnì, A., Finos, L., & Pastore, M. (2020). Enhancing\nStatistical Inference in Psychological\nResearch via Prospective and Retrospective\nDesign Analysis. Frontiers in Psychology, 10.\n\n\nAnderson, M. S., Martinson, B. C., & De Vries, R. (2007). Normative\ndissonance in science: Results from a national survey of\nUS scientists. Journal of Empirical Research on Human\nResearch Ethics, 2(4), 3–14.\n\n\nAnderson, M. S., Ronning, E. A., De Vries, R., & Martinson, B. C.\n(2007). The perverse effects of competition on scientists’ work and\nrelationships. Science and Engineering Ethics, 13(4),\n437–461.\n\n\nAnderson, S. F., Kelley, K., & Maxwell, S. E. (2017). Sample-size\nplanning for more accurate statistical power: A method\nadjusting sample effect sizes for publication bias and uncertainty.\nPsychological Science, 28(11), 1547–1562. https://doi.org/10.1177/0956797617723724\n\n\nAnderson, S. F., & Maxwell, S. E. (2016). There’s more than one way\nto conduct a replication study: Beyond statistical\nsignificance. Psychological Methods, 21(1), 1–12. https://doi.org/10.1037/met0000051\n\n\nAnvari, F., Kievit, R., Lakens, D., Pennington, C. R., Przybylski, A.\nK., Tiokhin, L., Wiernik, B. M., & Orben, A. (2021). Not all effects\nare indispensable: Psychological science requires\nverifiable lines of reasoning for whether an effect matters.\nPerspectives on Psychological Science. https://doi.org/10.31234/osf.io/g3vtr\n\n\nAnvari, F., & Lakens, D. (2018). The replicability crisis and public\ntrust in psychological science. Comprehensive Results in Social\nPsychology, 3(3), 266–286. https://doi.org/10.1080/23743603.2019.1684822\n\n\nAnvari, F., & Lakens, D. (2021). Using anchor-based methods to\ndetermine the smallest effect size of interest. Journal of\nExperimental Social Psychology, 96, 104159. https://doi.org/10.1016/j.jesp.2021.104159\n\n\nAppelbaum, M., Cooper, H., Kline, R. B., Mayo-Wilson, E., Nezu, A. M.,\n& Rao, S. M. (2018). Journal article reporting standards for\nquantitative research in psychology: The APA Publications\nand Communications Board task force report. American\nPsychologist, 73(1), 3. https://doi.org/10.1037/amp0000191\n\n\nArmitage, P., McPherson, C. K., & Rowe, B. C. (1969). Repeated\nsignificance tests on accumulating data. Journal of the Royal\nStatistical Society: Series A (General), 132(2), 235–244.\n\n\nArslan, R. C. (2019). How to Automatically Document Data\nWith the codebook Package to Facilitate Data\nReuse. Advances in Methods and Practices in Psychological\nScience, 2515245919838783. https://doi.org/10.1177/2515245919838783\n\n\nAzrin, N. H., Holz, W., Ulrich, R., & Goldiamond, I. (1961). The\ncontrol of the content of conversation through reinforcement.\nJournal of the Experimental Analysis of Behavior, 4,\n25–30. https://doi.org/10.1901/jeab.1961.4-25\n\n\nBabbage, C. (1830). Reflections on the Decline of\nScience in England: And on\nSome of Its Causes. B.\nFellowes.\n\n\nBacchetti, P. (2010). Current sample size conventions:\nFlaws, harms, and alternatives. BMC Medicine,\n8(1), 17. https://doi.org/10.1186/1741-7015-8-17\n\n\nBaguley, T. (2004). Understanding statistical power in the context of\napplied research. Applied Ergonomics, 35(2), 73–80. https://doi.org/10.1016/j.apergo.2004.01.002\n\n\nBaguley, T. (2009). 
Standardized or simple effect size:\nWhat should be reported? British Journal of\nPsychology, 100(3), 603–617. https://doi.org/10.1348/000712608X377117\n\n\nBaguley, T. (2012). Serious stats: A guide to advanced statistics\nfor the behavioral sciences. Palgrave Macmillan.\n\n\nBakan, D. (1966). The test of significance in psychological research.\nPsychological Bulletin, 66(6), 423–437. https://doi.org/10.1037/h0020412\n\n\nBakan, D. (1967). On method: Toward a reconstruction of\npsychological investigation. San Francisco,\nJossey-Bass.\n\n\nBakker, B. N., Kokil, J., Dörr, T., Fasching, N., & Lelkes, Y.\n(2021). Questionable and Open Research Practices:\nAttitudes and Perceptions among\nQuantitative Communication Researchers. Journal of\nCommunication, 71(5), 715–738. https://doi.org/10.1093/joc/jqab031\n\n\nBall, K., Berch, D. B., Helmers, K. F., Jobe, J. B., Leveck, M. D.,\nMarsiske, M., Morris, J. N., Rebok, G. W., Smith, D. M., &\nTennstedt, S. L. (2002). Effects of cognitive training interventions\nwith older adults: A randomized controlled trial. Jama,\n288(18), 2271–2281.\n\n\nBarber, T. X. (1976). Pitfalls in Human Research:\nTen Pivotal Points. Pergamon Press.\n\n\nBartoš, F., & Schimmack, U. (2020). Z-Curve.2.0:\nEstimating Replication Rates and Discovery\nRates. https://doi.org/10.31234/osf.io/urgtn\n\n\nBauer, P., & Kieser, M. (1996). A unifying approach for confidence\nintervals and testing of equivalence and difference.\nBiometrika, 83(4), 934–937.\n\n\nBausell, R. B., & Li, Y.-F. (2002). Power Analysis\nfor Experimental Research: A Practical Guide\nfor the Biological, Medical and Social\nSciences (1st edition). Cambridge University\nPress.\n\n\nBeck, W. S. (1957). Modern Science and the nature of\nlife (First Edition). Harcourt, Brace.\n\n\nBecker, B. J. (2005). Failsafe N or File-Drawer\nNumber. In Publication Bias in\nMeta-Analysis (pp. 111–125). John Wiley &\nSons, Ltd. https://doi.org/10.1002/0470870168.ch7\n\n\nBem, D. J. (2011). Feeling the future: Experimental evidence for\nanomalous retroactive influences on cognition and affect. Journal of\nPersonality and Social Psychology, 100(3), 407–425. https://doi.org/10.1037/a0021524\n\n\nBem, D. J., Utts, J., & Johnson, W. O. (2011). Must psychologists\nchange the way they analyze their data? Journal of Personality and\nSocial Psychology, 101(4), 716–719. https://doi.org/10.1037/a0024777\n\n\nBender, R., & Lange, S. (2001). Adjusting for multiple\ntestingwhen and how? Journal of Clinical\nEpidemiology, 54(4), 343–349.\n\n\nBenjamini, Y. (2016). It’s Not the p-values’\nFault. The American Statistician: Supplemental Material\nto the ASA Statement on P-Values and Statistical Significance,\n70, 1–2.\n\n\nBenjamini, Y., & Hochberg, Y. (1995). Controlling the false\ndiscovery rate: A practical and powerful approach to multiple testing.\nJournal of the Royal Statistical Society. Series B\n(Methodological), 289–300. https://www.jstor.org/stable/2346101\n\n\nBen-Shachar, M. S., Lüdecke, D., & Makowski, D. (2020). Effectsize:\nEstimation of Effect Size Indices and\nStandardized Parameters. Journal of Open Source\nSoftware, 5(56), 2815. https://doi.org/10.21105/joss.02815\n\n\nBerger, J. O., & Bayarri, M. J. (2004). The Interplay\nof Bayesian and Frequentist Analysis.\nStatistical Science, 19(1), 58–80. https://doi.org/10.1214/088342304000000116\n\n\nBerkeley, G. (1735). 
A defence of free-thinking in mathematics, in\nanswer to a pamphlet of Philalethes Cantabrigiensis\nentitled Geometry No Friend to Infidelity.\nAlso an appendix concerning mr. Walton’s\nVindication of the principles of fluxions against the\nobjections contained in The analyst. By the\nauthor of The minute philosopher (Vol. 3).\n\n\nBird, S. B., & Sivilotti, M. L. A. (2008). Self-plagiarism,\nrecycling fraud, and the intent to mislead. Journal of Medical\nToxicology, 4(2), 69–70. https://doi.org/10.1007/BF03160957\n\n\nBishop, D. V. M. (2018). Fallibility in Science:\nResponding to Errors in the Work\nof Oneself and Others. Advances in Methods\nand Practices in Psychological Science, 2515245918776632. https://doi.org/10.1177/2515245918776632\n\n\nBland, M. (2015). An introduction to medical statistics (Fourth\nedition). Oxford University Press.\n\n\nBorenstein, M. (Ed.). (2009). Introduction to meta-analysis.\nJohn Wiley & Sons.\n\n\nBosco, F. A., Aguinis, H., Singh, K., Field, J. G., & Pierce, C. A.\n(2015). Correlational effect size benchmarks. The Journal of Applied\nPsychology, 100(2), 431–449. https://doi.org/10.1037/a0038047\n\n\nBozarth, J. D., & Roberts, R. R. (1972). Signifying significant\nsignificance. American Psychologist, 27(8), 774.\n\n\nBretz, F., Hothorn, T., & Westfall, P. H. (2011). Multiple\ncomparisons using R. CRC Press.\n\n\nBross, I. D. (1971). Critical levels, statistical language and\nscientific inference. In Foundations of statistical inference\n(pp. 500–513). Holt, Rinehart and Winston.\n\n\nBrown, G. W. (1983). Errors, Types I and II.\nAmerican Journal of Diseases of Children, 137(6),\n586–591. https://doi.org/10.1001/archpedi.1983.02140320062014\n\n\nBrown, N. J. L., & Heathers, J. A. J. (2017). The GRIM\nTest: A Simple Technique Detects Numerous Anomalies\nin the Reporting of Results in\nPsychology. Social Psychological and Personality\nScience, 8(4), 363–369. https://doi.org/10.1177/1948550616673876\n\n\nBrunner, J., & Schimmack, U. (2020). Estimating Population\nMean Power Under Conditions of Heterogeneity and\nSelection for Significance.\nMeta-Psychology, 4. https://doi.org/10.15626/MP.2018.874\n\n\nBryan, C. J., Tipton, E., & Yeager, D. S. (2021). Behavioural\nscience is unlikely to change the world without a heterogeneity\nrevolution. Nature Human Behaviour, 1–10. https://doi.org/10.1038/s41562-021-01143-3\n\n\nBrysbaert, M. (2019). How many participants do we have to include in\nproperly powered experiments? A tutorial of power analysis\nwith reference tables. Journal of Cognition, 2(1), 16.\nhttps://doi.org/10.5334/joc.72\n\n\nBrysbaert, M., & Stevens, M. (2018). Power Analysis and\nEffect Size in Mixed Effects Models: A\nTutorial. Journal of Cognition, 1(1). https://doi.org/10.5334/joc.10\n\n\nBuchanan, E. M., Scofield, J., & Valentine, K. D. (2017).\nMOTE: Effect Size and Confidence\nInterval Calculator.\n\n\nBulus, M., & Dong, N. (2021). Bound Constrained\nOptimization of Sample Sizes Subject to\nMonetary Restrictions in Planning Multilevel\nRandomized Trials and Regression Discontinuity\nStudies. The Journal of Experimental Education,\n89(2), 379–401. https://doi.org/10.1080/00220973.2019.1636197\n\n\nBurriss, R. P., Troscianko, J., Lovell, P. G., Fulford, A. J. C.,\nStevens, M., Quigley, R., Payne, J., Saxton, T. K., & Rowland, H. M.\n(2015). Changes in women’s facial skin color over the ovulatory cycle\nare not detectable by the human visual system. PLOS ONE,\n10(7), e0130093. https://doi.org/10.1371/journal.pone.0130093\n\n\nButton, K. 
S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint,\nJ., Robinson, E. S. J., & Munafò, M. R. (2013). Power failure: Why\nsmall sample size undermines the reliability of neuroscience. Nature\nReviews Neuroscience, 14(5), 365–376. https://doi.org/10.1038/nrn3475\n\n\nButton, K. S., Kounali, D., Thomas, L., Wiles, N. J., Peters, T. J.,\nWelton, N. J., Ades, A. E., & Lewis, G. (2015). Minimal clinically\nimportant difference on the Beck Depression Inventory -\nII according to the patient’s perspective.\nPsychological Medicine, 45(15), 3269–3279. https://doi.org/10.1017/S0033291715001270\n\n\nCaplan, A. L. (2021). How Should We Regard Information\nGathered in Nazi Experiments? AMA Journal of\nEthics, 23(1), 55–58. https://doi.org/10.1001/amajethics.2021.55\n\n\nCarter, E. C., & McCullough, M. E. (2014). Publication bias and the\nlimited strength model of self-control: Has the evidence for ego\ndepletion been overestimated? Frontiers in Psychology,\n5. https://doi.org/10.3389/fpsyg.2014.00823\n\n\nCarter, E. C., Schönbrodt, F. D., Gervais, W. M., & Hilgard, J.\n(2019). Correcting for Bias in Psychology:\nA Comparison of Meta-Analytic Methods.\nAdvances in Methods and Practices in Psychological Science,\n2(2), 115–144. https://doi.org/10.1177/2515245919847196\n\n\nCascio, W. F., & Zedeck, S. (1983). Open a New Window\nin Rational Research Planning: Adjust Alpha to\nMaximize Statistical Power. Personnel Psychology,\n36(3), 517–526. https://doi.org/10.1111/j.1744-6570.1983.tb02233.x\n\n\nCeci, S. J., & Bjork, R. A. (2000). Psychological\nScience in the Public Interest: The\nCase for Juried Analyses. Psychological\nScience, 11(3), 177–178. https://doi.org/10.1111/1467-9280.00237\n\n\nCevolani, G., Crupi, V., & Festa, R. (2011). Verisimilitude and\nbelief change for conjunctive theories. Erkenntnis,\n75(2), 183.\n\n\nChalmers, I., & Glasziou, P. (2009). Avoidable waste in the\nproduction and reporting of research evidence. The Lancet,\n374(9683), 86–89.\n\n\nChamberlin, T. C. (1890). The Method of Multiple\nWorking Hypotheses. Science, ns-15(366), 92–96.\nhttps://doi.org/10.1126/science.ns-15.366.92\n\n\nChambers, C. D., & Tzavella, L. (2022). The past, present and future\nof Registered Reports. Nature Human Behaviour,\n6(1), 29–42. https://doi.org/10.1038/s41562-021-01193-7\n\n\nChang, H. (2022). Realism for Realistic People: A\nNew Pragmatist Philosophy of Science.\nCambridge University Press. https://doi.org/10.1017/9781108635738\n\n\nChang, M. (2016). Adaptive Design Theory and\nImplementation Using SAS and R (2nd\nedition). Chapman and Hall/CRC.\n\n\nChatziathanasiou, K. (2022). Beware the Lure of\nNarratives: “Hungry Judges”\nShould not Motivate the Use of\n“Artificial Intelligence” in\nLaw ({{SSRN Scholarly Paper}} ID 4011603).\nSocial Science Research Network. https://doi.org/10.2139/ssrn.4011603\n\n\nChin, J. M., Pickett, J. T., Vazire, S., & Holcombe, A. O. (2021).\nQuestionable Research Practices and Open\nScience in Quantitative Criminology. Journal of\nQuantitative Criminology. https://doi.org/10.1007/s10940-021-09525-6\n\n\nCho, H.-C., & Abe, S. (2013). Is two-tailed testing for directional\nresearch hypotheses tests legitimate? Journal of Business\nResearch, 66(9), 1261–1266. https://doi.org/10.1016/j.jbusres.2012.02.023\n\n\nCohen, J. (1988). Statistical power analysis for the behavioral\nsciences (2nd ed). L. Erlbaum Associates.\n\n\nCohen, J. (1990). Things I have learned (so far).\nAmerican Psychologist, 45(12), 1304–1312. 
https://doi.org/10.1037/0003-066X.45.12.1304\n\n\nCohen, J. (1994). The earth is round (p < .05). American\nPsychologist, 49(12), 997–1003. https://doi.org/10.1037/0003-066X.49.12.997\n\n\nColes, N. A., March, D. S., Marmolejo-Ramos, F., Larsen, J. T., Arinze,\nN. C., Ndukaihe, I. L. G., Willis, M. L., Foroni, F., Reggev, N.,\nMokady, A., Forscher, P. S., Hunter, J. F., Kaminski, G., Yüvrük, E.,\nKapucu, A., Nagy, T., Hajdu, N., Tejada, J., Freitag, R. M. K., …\nLiuzza, M. T. (2022). A multi-lab test of the facial feedback hypothesis\nby the Many Smiles Collaboration. Nature Human\nBehaviour, 6(12), 1731–1742. https://doi.org/10.1038/s41562-022-01458-9\n\n\nColling, L. J., Szcs, D., De Marco, D., Cipora, K., Ulrich, R., Nuerk,\nH.-C., Soltanlou, M., Bryce, D., Chen, S.-C., Schroeder, P. A., Henare,\nD. T., Chrystall, C. K., Corballis, P. M., Ansari, D., Goffin, C.,\nSokolowski, H. M., Hancock, P. J. B., Millen, A. E., Langton, S. R. H.,\n… McShane, B. B. (2020). Registered Replication Report on\nFischer, Castel, Dodd, and\nPratt (2003). Advances in Methods and Practices in\nPsychological Science, 3(2), 143–162. https://doi.org/10.1177/2515245920903079\n\n\nColquhoun, D. (2019). The False Positive Risk: A\nProposal Concerning What to Do About\np-Values. The American Statistician,\n73(sup1), 192–201. https://doi.org/10.1080/00031305.2018.1529622\n\n\nCook, J., Hislop, J., Adewuyi, T., Harrild, K., Altman, D., Ramsay, C.,\nFraser, C., Buckley, B., Fayers, P., Harvey, I., Briggs, A., Norrie, J.,\nFergusson, D., Ford, I., & Vale, L. (2014). Assessing methods to\nspecify the target difference for a randomised controlled trial:\nDELTA (Difference ELicitation in\nTriAls) review. Health Technology Assessment,\n18(28). https://doi.org/10.3310/hta18280\n\n\nCook, T. D. (2002). P-Value Adjustment in Sequential\nClinical Trials. Biometrics, 58(4), 1005–1011.\n\n\nCooper, H. (2020). Reporting quantitative research in psychology:\nHow to meet APA Style Journal Article Reporting\nStandards (2nd ed.). American Psychological\nAssociation. https://doi.org/10.1037/0000178-000\n\n\nCooper, H. M., Hedges, L. V., & Valentine, J. C. (Eds.). (2009).\nThe handbook of research synthesis and meta-analysis (2nd ed).\nRussell Sage Foundation.\n\n\nCopay, A. G., Subach, B. R., Glassman, S. D., Polly, D. W., &\nSchuler, T. C. (2007). Understanding the minimum clinically important\ndifference: A review of concepts and methods. The Spine\nJournal, 7(5), 541–546. https://doi.org/10.1016/j.spinee.2007.01.008\n\n\nCorneille, O., Havemann, J., Henderson, E. L., IJzerman, H., Hussey, I.,\nOrban de Xivry, J.-J., Jussim, L., Holmes, N. P., Pilacinski, A.,\nBeffara, B., Carroll, H., Outa, N. O., Lush, P., & Lotter, L. D.\n(2023). Beware “persuasive communication devices” when\nwriting and reading scientific articles. eLife, 12,\ne88654. https://doi.org/10.7554/eLife.88654\n\n\nCorrell, J., Mellinger, C., McClelland, G. H., & Judd, C. M. (2020).\nAvoid Cohen’s “Small,”\n“Medium,” and\n“Large” for Power Analysis.\nTrends in Cognitive Sciences, 24(3), 200–207. https://doi.org/10.1016/j.tics.2019.12.009\n\n\nCousineau, D., & Chiasson, F. (2019). Superb:\nComputes standard error and confidence interval of means\nunder various designs and sampling schemes [Manual].\n\n\nCowles, M., & Davis, C. (1982). On the origins of the. 05 level of\nstatistical significance. American Psychologist,\n37(5), 553.\n\n\nCox, D. R. (1958). Some Problems Connected with\nStatistical Inference. Annals of Mathematical\nStatistics, 29(2), 357–372. 
https://doi.org/10.1214/aoms/1177706618\n\n\nCribbie, R. A., Gruman, J. A., & Arpin-Cribbie, C. A. (2004).\nRecommendations for applying tests of equivalence. Journal of\nClinical Psychology, 60(1), 1–10.\n\n\nCrusius, J., Gonzalez, M. F., Lange, J., & Cohen-Charash, Y. (2020).\nEnvy: An Adversarial Review and Comparison of\nTwo Competing Views. Emotion Review,\n12(1), 3–21. https://doi.org/10.1177/1754073919873131\n\n\nCrüwell, S., Apthorp, D., Baker, B. J., Colling, L., Elson, M., Geiger,\nS. J., Lobentanzer, S., Monéger, J., Patterson, A., Schwarzkopf, D. S.,\nZaneva, M., & Brown, N. J. L. (2023). What’s in a\nBadge? A Computational Reproducibility\nInvestigation of the Open Data Badge Policy in\nOne Issue of Psychological Science.\nPsychological Science, 09567976221140828. https://doi.org/10.1177/09567976221140828\n\n\nCumming, G. (2008). Replication and p\nIntervals: p Values\nPredict the Future Only Vaguely, but\nConfidence Intervals Do Much Better. Perspectives on\nPsychological Science, 3(4), 286–300. https://doi.org/10.1111/j.1745-6924.2008.00079.x\n\n\nCumming, G. (2013). Understanding the new statistics:\nEffect sizes, confidence intervals, and meta-analysis.\nRoutledge.\n\n\nCumming, G. (2014). The New Statistics: Why\nand How. Psychological Science, 25(1),\n7–29. https://doi.org/10.1177/0956797613504966\n\n\nCumming, G., & Calin-Jageman, R. (2016). Introduction to the\nNew Statistics: Estimation, Open\nScience, and Beyond. Routledge.\n\n\nCumming, G., & Maillardet, R. (2006). Confidence intervals and\nreplication: Where will the next mean fall?\nPsychological Methods, 11(3), 217–227. https://doi.org/10.1037/1082-989X.11.3.217\n\n\nDanziger, S., Levav, J., & Avnaim-Pesso, L. (2011). Extraneous\nfactors in judicial decisions. Proceedings of the National Academy\nof Sciences, 108(17), 6889–6892. https://doi.org/10.1073/PNAS.1018033108\n\n\nde Groot, A. D. (1969). Methodology (Vol. 6). Mouton\n& Co.\n\n\nde Heide, R., & Grünwald, P. D. (2017). Why optional stopping is a\nproblem for Bayesians. arXiv:1708.08278 [Math,\nStat]. https://arxiv.org/abs/1708.08278\n\n\nDeBruine, L. M., & Barr, D. J. (2021). Understanding\nMixed-Effects Models Through Data Simulation. Advances\nin Methods and Practices in Psychological Science, 4(1),\n2515245920965119. https://doi.org/10.1177/2515245920965119\n\n\nDelacre, M., Lakens, D., Ley, C., Liu, L., & Leys, C. (2021).\nWhy Hedges’ g*s based on the non-pooled standard\ndeviation should be reported with Welch’s t-test.\nPsyArXiv. https://doi.org/10.31234/osf.io/tu6mp\n\n\nDelacre, M., Lakens, D., & Leys, C. (2017). Why Psychologists\nShould by Default Use Welch’s\nt-test Instead of\nStudent’s t-test. International\nReview of Social Psychology, 30(1). https://doi.org/10.5334/irsp.82\n\n\nDetsky, A. S. (1990). Using cost-effectiveness analysis to improve the\nefficiency of allocating funds to clinical trials. Statistics in\nMedicine, 9(1-2), 173–184. https://doi.org/10.1002/sim.4780090124\n\n\nDienes, Z. (2008). Understanding psychology as a science:\nAn introduction to scientific and statistical\ninference. Palgrave Macmillan.\n\n\nDienes, Z. (2014). Using Bayes to get the most out of\nnon-significant results. Frontiers in Psychology, 5.\nhttps://doi.org/10.3389/fpsyg.2014.00781\n\n\nDmitrienko, A., & D’Agostino Sr, R. (2013). Traditional multiplicity\nadjustment methods in clinical trials. Statistics in Medicine,\n32(29), 5172–5218. https://doi.org/10.1002/sim.5990\n\n\nDodge, H. F., & Romig, H. G. (1929). A Method of\nSampling Inspection. 
Bell System Technical\nJournal, 8(4), 613–631. https://doi.org/10.1002/j.1538-7305.1929.tb01240.x\n\n\nDongen, N. N. N. van, Doorn, J. B. van, Gronau, Q. F., Ravenzwaaij, D.\nvan, Hoekstra, R., Haucke, M. N., Lakens, D., Hennig, C., Morey, R. D.,\nHomer, S., Gelman, A., Sprenger, J., & Wagenmakers, E.-J. (2019).\nMultiple Perspectives on Inference for\nTwo Simple Statistical Scenarios. The American\nStatistician, 73(sup1), 328–339. https://doi.org/10.1080/00031305.2019.1565553\n\n\nDouglas, H. E. (2009). Science, policy, and the value-free\nideal. University of Pittsburgh Press.\n\n\nDubin, R. (1969). Theory building. Free Press.\n\n\nDuhem, P. (1954). The aim and structure of physical theory.\nPrinceton University Press.\n\n\nDupont, W. D. (1983). Sequential stopping rules and sequentially\nadjusted P values: Does one require the other?\nControlled Clinical Trials, 4(1), 3–10. https://doi.org/10.1016/S0197-2456(83)80003-8\n\n\nDuyx, B., Urlings, M. J. E., Swaen, G. M. H., Bouter, L. M., &\nZeegers, M. P. (2017). Scientific citations favor positive results: A\nsystematic review and meta-analysis. Journal of Clinical\nEpidemiology, 88, 92–101. https://doi.org/10.1016/j.jclinepi.2017.06.002\n\n\nEbersole, C. R., Atherton, O. E., Belanger, A. L., Skulborstad, H. M.,\nAllen, J. M., Banks, J. B., Baranski, E., Bernstein, M. J., Bonfiglio,\nD. B. V., Boucher, L., Brown, E. R., Budiman, N. I., Cairo, A. H.,\nCapaldi, C. A., Chartier, C. R., Chung, J. M., Cicero, D. C., Coleman,\nJ. A., Conway, J. G., … Nosek, B. A. (2016). Many Labs 3:\nEvaluating participant pool quality across the academic\nsemester via replication. Journal of Experimental Social\nPsychology, 67, 68–82. https://doi.org/10.1016/j.jesp.2015.10.012\n\n\nEckermann, S., Karnon, J., & Willan, A. R. (2010). The\nValue of Value of Information.\nPharmacoEconomics, 28(9), 699–709. https://doi.org/10.2165/11537370-000000000-00000\n\n\nEdwards, M. A., & Roy, S. (2017). Academic Research in\nthe 21st Century: Maintaining Scientific\nIntegrity in a Climate of Perverse\nIncentives and Hypercompetition. Environmental\nEngineering Science, 34(1), 51–61. https://doi.org/10.1089/ees.2016.0223\n\n\nElson, M., Mohseni, M. R., Breuer, J., Scharkow, M., & Quandt, T.\n(2014). Press CRTT to measure aggressive behavior: The\nunstandardized use of the competitive reaction time task in aggression\nresearch. Psychological Assessment, 26(2), 419–432. https://doi.org/10.1037/a0035569\n\n\nErdfelder, E., Faul, F., & Buchner, A. (1996). GPOWER:\nA general power analysis program. Behavior Research\nMethods, Instruments, & Computers, 28(1), 1–11. https://doi.org/10.3758/BF03203630\n\n\nEysenck, H. J. (1978). An exercise in mega-silliness. American\nPsychologist, 33(5), 517–517. https://doi.org/10.1037/0003-066X.33.5.517.a\n\n\nFanelli, D. (2010). “Positive” Results\nIncrease Down the Hierarchy of the\nSciences. PLoS ONE, 5(4). https://doi.org/10.1371/journal.pone.0010068\n\n\nFaul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007).\nGPower 3: A flexible statistical power\nanalysis program for the social, behavioral, and biomedical sciences.\nBehavior Research Methods, 39(2), 175–191. https://doi.org/10.3758/BF03193146\n\n\nFerguson, C. J. (2014). Comment: Why meta-analyses rarely\nresolve ideological debates. Emotion Review, 6(3),\n251–252.\n\n\nFerguson, C. J., & Heene, M. (2012). A vast graveyard of undead\ntheories publication bias and psychological science’s aversion to the\nnull. Perspectives on Psychological Science, 7(6),\n555–561.\n\n\nFerguson, C. 
J., & Heene, M. (2021). Providing a lower-bound\nestimate for psychology’s “crud factor”: The\ncase of aggression. Professional Psychology: Research and\nPractice, 52(6), 620–626. https://doi.org/http://dx.doi.org/10.1037/pro0000386\n\n\nFerguson, C., Marcus, A., & Oransky, I. (2014). Publishing:\nThe peer-review scam. Nature, 515(7528),\n480–482. https://doi.org/10.1038/515480a\n\n\nFerron, J., & Onghena, P. (1996). The Power of\nRandomization Tests for Single-Case Phase\nDesigns. The Journal of Experimental Education,\n64(3), 231–239. https://doi.org/10.1080/00220973.1996.9943805\n\n\nFeyerabend, P. (1993). Against method (3rd ed).\nVerso.\n\n\nFeynman, R. P. (1974). Cargo cult science. Engineering and\nScience, 37(7), 10–13.\n\n\nFiedler, K. (2004). Tools, toys, truisms, and theories:\nSome thoughts on the creative cycle of theory formation.\nPersonality and Social Psychology Review, 8(2),\n123–131. https://doi.org/10.1207/s15327957pspr0802_5\n\n\nFiedler, K., & Schwarz, N. (2016). Questionable Research\nPractices Revisited. Social Psychological and Personality\nScience, 7(1), 45–52. https://doi.org/10.1177/1948550615612150\n\n\nField, S. A., Tyre, A. J., Jonzén, N., Rhodes, J. R., & Possingham,\nH. P. (2004). Minimizing the cost of environmental management decisions\nby optimizing statistical thresholds. Ecology Letters,\n7(8), 669–675. https://doi.org/10.1111/j.1461-0248.2004.00625.x\n\n\nFisher, Ronald Aylmer. (1935). The design of experiments.\nOliver And Boyd; Edinburgh; London.\n\n\nFisher, Ronald A. (1936). Has Mendel’s work been\nrediscovered? Annals of Science, 1(2), 115–137.\n\n\nFisher, Ronald A. (1956). Statistical methods and scientific\ninference: Vol. viii. Hafner Publishing Co.\n\n\nFraley, R. C., & Vazire, S. (2014). The N-Pact Factor:\nEvaluating the Quality of Empirical\nJournals with Respect to Sample Size\nand Statistical Power. PLOS ONE, 9(10),\ne109019. https://doi.org/10.1371/journal.pone.0109019\n\n\nFrancis, G. (2014). The frequency of excess success for articles in\nPsychological Science. Psychonomic Bulletin &\nReview, 21(5), 1180–1187. https://doi.org/10.3758/s13423-014-0601-x\n\n\nFrancis, G. (2016). Equivalent statistics and data interpretation.\nBehavior Research Methods, 1–15. https://doi.org/10.3758/s13428-016-0812-3\n\n\nFranco, A., Malhotra, N., & Simonovits, G. (2014). Publication bias\nin the social sciences: Unlocking the file drawer.\nScience, 345(6203), 1502–1505. https://doi.org/10.1126/SCIENCE.1255484\n\n\nFrankenhuis, W. E., Panchanathan, K., & Smaldino, P. E. (2022).\nStrategic ambiguity in the social sciences. Social Psychological\nBulletin.\n\n\nFraser, H., Parker, T., Nakagawa, S., Barnett, A., & Fidler, F.\n(2018). Questionable research practices in ecology and evolution.\nPLOS ONE, 13(7), e0200303. https://doi.org/10.1371/journal.pone.0200303\n\n\nFreiman, J. A., Chalmers, T. C., Smith, H., & Kuebler, R. R. (1978).\nThe importance of beta, the type II error and sample size\nin the design and interpretation of the randomized control trial.\nSurvey of 71 \"negative\" trials. The New England Journal\nof Medicine, 299(13), 690–694. https://doi.org/10.1056/NEJM197809282991304\n\n\nFrick, R. W. (1996). The appropriate use of null hypothesis testing.\nPsychological Methods, 1(4), 379–390. https://doi.org/10.1037/1082-989X.1.4.379\n\n\nFricker, R. D., Burke, K., Han, X., & Woodall, W. H. (2019).\nAssessing the Statistical Analyses Used in\nBasic and Applied Social Psychology After\nTheir p-Value Ban. 
The American\nStatistician, 73(sup1), 374–384. https://doi.org/10.1080/00031305.2018.1537892\n\n\nFried, B. J., Boers, M., & Baker, P. R. (1993). A method for\nachieving consensus on rheumatoid arthritis outcome measures: The\nOMERACT conference process. The Journal of\nRheumatology, 20(3), 548–551.\n\n\nFriede, T., & Kieser, M. (2006). Sample size recalculation in\ninternal pilot study designs: A review. Biometrical Journal: Journal\nof Mathematical Methods in Biosciences, 48(4), 537–555. https://doi.org/10.1002/bimj.200510238\n\n\nFriedlander, F. (1964). Type I and Type II\nBias. American Psychologist, 19(3), 198–199. https://doi.org/10.1037/h0038977\n\n\nFugard, A. J. B., & Potts, H. W. W. (2015). Supporting thinking on\nsample sizes for thematic analyses: A quantitative tool.\nInternational Journal of Social Research Methodology,\n18(6), 669–684. https://doi.org/10.1080/13645579.2015.1005453\n\n\nFunder, D. C., & Ozer, D. J. (2019). Evaluating effect size in\npsychological research: Sense and nonsense. Advances in\nMethods and Practices in Psychological Science, 2(2),\n156–168. https://doi.org/10.1177/2515245919847202\n\n\nGannon, M. A., de Bragança Pereira, C. A., & Polpo, A. (2019).\nBlending Bayesian and Classical Tools to\nDefine Optimal Sample-Size-Dependent Significance Levels.\nThe American Statistician, 73(sup1), 213–222. https://doi.org/10.1080/00031305.2018.1518268\n\n\nGelman, A., & Carlin, J. (2014). Beyond Power\nCalculations: Assessing Type S (Sign)\nand Type M (Magnitude) Errors.\nPerspectives on Psychological Science, 9(6), 641–651.\n\n\nGerring, J. (2012). Mere Description. British Journal\nof Political Science, 42(4), 721–746. https://doi.org/10.1017/S0007123412000130\n\n\nGillon, R. (1994). Medical ethics: Four principles plus attention to\nscope. BMJ, 309(6948), 184. https://doi.org/10.1136/bmj.309.6948.184\n\n\nGlöckner, A. (2016). The irrational hungry judge effect revisited:\nSimulations reveal that the magnitude of the effect is\noverestimated. Judgment and Decision Making, 11(6),\n601–610.\n\n\nGlover, S., & Dixon, P. (2004). Likelihood ratios: A\nsimple and flexible statistic for empirical psychologists.\nPsychonomic Bulletin & Review, 11(5), 791–806.\n\n\nGoldacre, B., DeVito, N. J., Heneghan, C., Irving, F., Bacon, S.,\nFleminger, J., & Curtis, H. (2018). Compliance with requirement to\nreport results on the EU Clinical Trials Register: Cohort\nstudy and web resource. BMJ, 362, k3218. https://doi.org/10.1136/bmj.k3218\n\n\nGood, I. J. (1992). The Bayes/Non-Bayes\ncompromise: A brief review. Journal of the American\nStatistical Association, 87(419), 597–606. https://doi.org/10.2307/2290192\n\n\nGoodyear-Smith, F. A., van Driel, M. L., Arroll, B., & Del Mar, C.\n(2012). Analysis of decisions made in meta-analyses of depression\nscreening and the risk of confirmation bias: A case study.\nBMC Medical Research Methodology, 12, 76. https://doi.org/10.1186/1471-2288-12-76\n\n\nGopalakrishna, G., Riet, G. ter, Vink, G., Stoop, I., Wicherts, J. M.,\n& Bouter, L. M. (2022). Prevalence of questionable research\npractices, research misconduct and their potential explanatory factors:\nA survey among academic researchers in The\nNetherlands. PLOS ONE, 17(2), e0263023. https://doi.org/10.1371/journal.pone.0263023\n\n\nGosset, W. S. (1904). The Application of the\n\"Law of Error\" to the Work of the\nBrewery (1 vol 8; pp. 3–16). Arthur Guinness\n& Son, Ltd.\n\n\nGreen, P., & MacLeod, C. J. (2016). 
SIMR: An\nR package for power analysis of generalized linear mixed\nmodels by simulation. Methods in Ecology and Evolution,\n7(4), 493–498. https://doi.org/10.1111/2041-210X.12504\n\n\nGreen, S. B. (1991). How Many Subjects Does It Take To Do A\nRegression Analysis. Multivariate Behavioral Research,\n26(3), 499–510. https://doi.org/10.1207/s15327906mbr2603_7\n\n\nGreenwald, A. G. (1975). Consequences of prejudice against the null\nhypothesis. Psychological Bulletin, 82(1), 1–20.\n\n\nGrünwald, P., de Heide, R., & Koolen, W. (2019). Safe\nTesting. arXiv:1906.07801 [Cs, Math, Stat]. https://arxiv.org/abs/1906.07801\n\n\nGupta, S. K. (2011). Intention-to-treat concept: A review.\nPerspectives in Clinical Research, 2(3), 109–112. https://doi.org/10.4103/2229-3485.83221\n\n\nHacking, I. (1965). Logic of Statistical\nInference. Cambridge University Press.\n\n\nHagger, M. S., Chatzisarantis, N. L. D., Alberts, H., Anggono, C. O.,\nBatailler, C., Birt, A. R., Brand, R., Brandt, M. J., Brewer, G.,\nBruyneel, S., Calvillo, D. P., Campbell, W. K., Cannon, P. R., Carlucci,\nM., Carruth, N. P., Cheung, T., Crowell, A., De Ridder, D. T. D.,\nDewitte, S., … Zwienenberg, M. (2016). A Multilab Preregistered\nReplication of the Ego-Depletion Effect.\nPerspectives on Psychological Science, 11(4), 546–573.\nhttps://doi.org/10.1177/1745691616652873\n\n\nHallahan, M., & Rosenthal, R. (1996). Statistical power:\nConcepts, procedures, and applications. Behaviour\nResearch and Therapy, 34(5), 489–499. https://doi.org/10.1016/0005-7967(95)00082-8\n\n\nHallinan, D., Boehm, F., Külpmann, A., & Elson, M. (2023).\nInformation Provision for Informed Consent\nProcedures in Psychological Research Under the\nGeneral Data Protection Regulation: A Practical\nGuide. Advances in Methods and Practices in Psychological\nScience, 6(1), 25152459231151944. https://doi.org/10.1177/25152459231151944\n\n\nHalpern, J., Brown Jr, B. W., & Hornberger, J. (2001). The sample\nsize for a clinical trial: A Bayesian decision theoretic\napproach. Statistics in Medicine, 20(6), 841–858. https://doi.org/10.1002/sim.703\n\n\nHalpern, S. D., Karlawish, J. H., & Berlin, J. A. (2002). The\ncontinuing unethical conduct of underpowered clinical trials.\nJama, 288(3), 358–362. https://doi.org/doi:10.1001/jama.288.3.358\n\n\nHand, D. J. (1994). Deconstructing Statistical Questions.\nJournal of the Royal Statistical Society. Series A (Statistics in\nSociety), 157(3), 317–356. https://doi.org/10.2307/2983526\n\n\nHardwicke, T. E., Mathur, M. B., MacDonald, K., Nilsonne, G., Banks, G.\nC., Kidwell, M. C., Mohr, A. H., Clayton, E., Yoon, E. J., Tessler, M.\nH., Lenne, R. L., Altman, S., Long, B., & Frank, M. C. (2018). Data\navailability, reusability, and analytic reproducibility: Evaluating the\nimpact of a mandatory open data policy at the journal\nCognition. Open Science, 5(8), 180448. https://doi.org/10.1098/rsos.180448\n\n\nHarms, C., & Lakens, D. (2018). Making ’null effects’ informative:\nStatistical techniques and inferential frameworks. Journal of\nClinical and Translational Research, 3, 382–393. https://doi.org/10.18053/jctres.03.2017S2.007\n\n\nHarrer, M., Cuijpers, P., Furukawa, T. A., & Ebert, D. D. (2021).\nDoing Meta-Analysis with R: A\nHands-On Guide. Chapman and Hall/CRC. https://doi.org/10.1201/9781003107347\n\n\nHauck, D. W. W., & Anderson, S. (1984). A new statistical procedure\nfor testing equivalence in two-group comparative bioavailability trials.\nJournal of Pharmacokinetics and Biopharmaceutics,\n12(1), 83–91. 
https://doi.org/10.1007/BF01063612\n\n\nHedges, L. V., & Pigott, T. D. (2001). The power of statistical\ntests in meta-analysis. Psychological Methods, 6(3),\n203–217. https://doi.org/10.1037/1082-989X.6.3.203\n\n\nHempel, C. G. (1966). Philosophy of natural science (Nachdr.).\nPrentice-Hall.\n\n\nHilgard, J. (2021). Maximal positive controls: A method for\nestimating the largest plausible effect size. Journal of\nExperimental Social Psychology, 93. https://doi.org/10.1016/j.jesp.2020.104082\n\n\nHill, C. J., Bloom, H. S., Black, A. R., & Lipsey, M. W. (2008).\nEmpirical Benchmarks for Interpreting Effect\nSizes in Research. Child Development\nPerspectives, 2(3), 172–177. https://doi.org/10.1111/j.1750-8606.2008.00061.x\n\n\nHodges, J. L., & Lehmann, E. L. (1954). Testing the\nApproximate Validity of Statistical\nHypotheses. Journal of the Royal Statistical Society. Series\nB (Methodological), 16(2), 261–268. https://doi.org/10.1111/j.2517-6161.1954.tb00169.x\n\n\nHoenig, J. M., & Heisey, D. M. (2001). The abuse of power: The\npervasive fallacy of power calculations for data analysis. The\nAmerican Statistician, 55(1), 19–24. https://doi.org/10.1198/000313001300339897\n\n\nHuedo-Medina, T. B., Sánchez-Meca, J., Marín-Martínez, F., &\nBotella, J. (2006). Assessing heterogeneity in meta-analysis:\nQ statistic or I$2̂$ index? Psychological\nMethods, 11(2), 193.\n\n\nHung, H. M. J., O’Neill, R. T., Bauer, P., & Kohne, K. (1997). The\nBehavior of the P-Value When the\nAlternative Hypothesis is True.\nBiometrics, 53(1), 11–22. https://doi.org/10.2307/2533093\n\n\nHunt, K. (1975). Do we really need more replications? Psychological\nReports, 36(2), 587–593.\n\n\nHyde, J. S., Lindberg, S. M., Linn, M. C., Ellis, A. B., & Williams,\nC. C. (2008). Gender Similarities Characterize Math\nPerformance. Science, 321(5888), 494–495. https://doi.org/10.1126/science.1160364\n\n\nIoannidis, J. P. A. (2005). Why Most Published Research Findings\nAre False. PLoS Medicine, 2(8), e124. https://doi.org/10.1371/journal.pmed.0020124\n\n\nIoannidis, J. P. A., & Trikalinos, T. A. (2007). An exploratory test\nfor an excess of significant findings. Clinical Trials,\n4(3), 245–253. https://doi.org/10.1177/1740774507079441\n\n\nIyengar, S., & Greenhouse, J. B. (1988). Selection\nModels and the File Drawer Problem.\nStatistical Science, 3(1), 109–117. https://www.jstor.org/stable/2245925\n\n\nJaeschke, R., Singer, J., & Guyatt, G. H. (1989). Measurement of\nhealth status: Ascertaining the minimal clinically\nimportant difference. Controlled Clinical Trials,\n10(4), 407–415. https://doi.org/10.1016/0197-2456(89)90005-6\n\n\nJeffreys, H. (1939). Theory of probability (1st ed).\nOxford University Press.\n\n\nJennison, C., & Turnbull, B. W. (2000). Group sequential methods\nwith applications to clinical trials. Chapman &\nHall/CRC.\n\n\nJohansson, T. (2011). Hail the impossible: P-values, evidence, and\nlikelihood. Scandinavian Journal of Psychology, 52(2),\n113–125. https://doi.org/10.1111/j.1467-9450.2010.00852.x\n\n\nJohn, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the\nprevalence of questionable research practices with incentives for truth\ntelling. Psychological Science, 23(5), 524–532. https://doi.org/10.1177/0956797611430953\n\n\nJohnson, V. E. (2013). Revised standards for statistical evidence.\nProceedings of the National Academy of Sciences,\n110(48), 19313–19317. https://doi.org/10.1073/pnas.1313476110\n\n\nJones, L. V. (1952). Test of hypotheses: One-sided vs. Two-sided\nalternatives. 
Psychological Bulletin, 49(1), 43–46.\nhttps://doi.org/http://dx.doi.org/10.1037/h0056832\n\n\nJostmann, N. B., Lakens, D., & Schubert, T. W. (2009). Weight as an\nEmbodiment of Importance. Psychological\nScience, 20(9), 1169–1174. https://doi.org/10.1111/j.1467-9280.2009.02426.x\n\n\nJostmann, N. B., Lakens, D., & Schubert, T. W. (2016). A short\nhistory of the weight-importance effect and a recommendation for\npre-testing: Commentary on Ebersole et al.\n(2016). Journal of Experimental Social Psychology, 67,\n93–94. https://doi.org/10.1016/j.jesp.2015.12.001\n\n\nJulious, S. A. (2004). Sample sizes for clinical trials with normal\ndata. Statistics in Medicine, 23(12), 1921–1986. https://doi.org/10.1002/sim.1783\n\n\nJunk, T., & Lyons, L. (2020). Reproducibility and\nReplication of Experimental Particle Physics\nResults. Harvard Data Science Review, 2(4). https://doi.org/10.1162/99608f92.250f995b\n\n\nKaiser, H. F. (1960). Directional statistical decisions.\nPsychological Review, 67(3), 160–167. https://doi.org/10.1037/h0047595\n\n\nKaplan, R. M., & Irvin, V. L. (2015). Likelihood of Null\nEffects of Large NHLBI Clinical Trials Has Increased\nover Time. PLOS ONE, 10(8), e0132382. https://doi.org/10.1371/journal.pone.0132382\n\n\nKass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of\nthe American Statistical Association, 90(430), 773–795. https://doi.org/10.1080/01621459.1995.10476572\n\n\nKeefe, R. S. E., Kraemer, H. C., Epstein, R. S., Frank, E., Haynes, G.,\nLaughren, T. P., Mcnulty, J., Reed, S. D., Sanchez, J., & Leon, A.\nC. (2013). Defining a\nClinically Meaningful Effect for the Design\nand Interpretation of Randomized Controlled\nTrials. Innovations in Clinical Neuroscience,\n10(5-6 Suppl A), 4S–19S.\n\n\nKelley, K. (2007). Confidence Intervals for\nStandardized Effect Sizes: Theory,\nApplication, and Implementation. Journal\nof Statistical Software, 20(8). https://doi.org/10.18637/JSS.V020.I08\n\n\nKelley, K., & Preacher, K. J. (2012). On effect size.\nPsychological Methods, 17(2), 137–152. https://doi.org/10.1037/a0028086\n\n\nKelley, K., & Rausch, J. R. (2006). Sample size planning for the\nstandardized mean difference: Accuracy in parameter estimation via\nnarrow confidence intervals. Psychological Methods,\n11(4), 363–385. https://doi.org/10.1037\n\n\nKelter, R. (2021). Analysis of type I and II\nerror rates of Bayesian and frequentist parametric and\nnonparametric two-sample hypothesis tests under preliminary assessment\nof normality. Computational Statistics, 36(2),\n1263–1288. https://doi.org/10.1007/s00180-020-01034-7\n\n\nKenett, R. S., Shmueli, G., & Kenett, R. (2016). Information\nQuality: The Potential of Data\nand Analytics to Generate Knowledge (1st\nedition). Wiley.\n\n\nKennedy-Shaffer, L. (2019). Before p < 0.05 to Beyond p\n< 0.05: Using\nHistory to Contextualize p-Values and\nSignificance Testing. The American Statistician,\n73(sup1), 82–90. https://doi.org/10.1080/00031305.2018.1537891\n\n\nKenny, D. A., & Judd, C. M. (2019). The unappreciated heterogeneity\nof effect sizes: Implications for power, precision,\nplanning of research, and replication. Psychological Methods,\n24(5), 578–589. https://doi.org/10.1037/met0000209\n\n\nKeppel, G. (1991). Design and analysis: A researcher’s\nhandbook, 3rd ed (pp. xiii, 594). Prentice-Hall, Inc.\n\n\nKerr, N. L. (1998). HARKing: Hypothesizing\nAfter the Results are Known.\nPersonality and Social Psychology Review, 2(3),\n196–217. https://doi.org/10.1207/s15327957pspr0203_4\n\n\nKing, M. T. (2011). 
A point of minimal important difference\n(MID): A critique of terminology and methods. Expert\nReview of Pharmacoeconomics & Outcomes Research,\n11(2), 171–184. https://doi.org/10.1586/erp.11.9\n\n\nKish, L. (1959). Some Statistical Problems in\nResearch Design. American Sociological Review,\n24(3), 328–338. https://doi.org/10.2307/2089381\n\n\nKish, L. (1965). Survey Sampling.\nWiley.\n\n\nKomić, D., Marušić, S. L., & Marušić, A. (2015). Research\nIntegrity and Research Ethics in\nProfessional Codes of Ethics:\nSurvey of Terminology Used by\nProfessional Organizations across Research\nDisciplines. PLOS ONE, 10(7), e0133662. https://doi.org/10.1371/journal.pone.0133662\n\n\nKraft, M. A. (2020). Interpreting effect sizes of education\ninterventions. Educational Researcher, 49(4), 241–253.\nhttps://doi.org/10.3102/0013189X20912798\n\n\nKruschke, J. K. (2011). Bayesian assessment of null values via parameter\nestimation and model comparison. Perspectives on Psychological\nScience, 6(3), 299–312.\n\n\nKruschke, J. K. (2013). Bayesian estimation supersedes the t test.\nJournal of Experimental Psychology: General, 142(2),\n573–603. https://doi.org/10.1037/a0029146\n\n\nKruschke, J. K. (2014). Doing Bayesian Data Analysis,\nSecond Edition: A Tutorial with\nR, JAGS, and Stan (2\nedition). Academic Press.\n\n\nKruschke, J. K. (2018). Rejecting or Accepting Parameter\nValues in Bayesian Estimation. Advances in\nMethods and Practices in Psychological Science, 1(2),\n270–280. https://doi.org/10.1177/2515245918771304\n\n\nKruschke, J. K., & Liddell, T. M. (2017). The Bayesian New\nStatistics: Hypothesis testing, estimation,\nmeta-analysis, and power analysis from a Bayesian\nperspective. Psychonomic Bulletin & Review. https://doi.org/10.3758/s13423-016-1221-4\n\n\nKuhn, T. S. (1962). The Structure of Scientific\nRevolutions. University of Chicago Press.\n\n\nKuipers, T. A. F. (2016). Models, postulates, and generalized nomic\ntruth approximation. Synthese, 193(10), 3057–3077. https://doi.org/10.1007/s11229-015-0916-9\n\n\nLakatos, I. (1978). The methodology of scientific research\nprogrammes: Volume 1: Philosophical\npapers. Cambridge University Press.\n\n\nLakens, Daniël. (2013). Calculating and reporting effect sizes to\nfacilitate cumulative science: A practical primer for t-tests and\nANOVAs. Frontiers in Psychology, 4. https://doi.org/10.3389/fpsyg.2013.00863\n\n\nLakens, Daniël. (2014). Performing high-powered studies efficiently with\nsequential analyses: Sequential analyses. European\nJournal of Social Psychology, 44(7), 701–710. https://doi.org/10.1002/ejsp.2023\n\n\nLakens, Daniël. (2017). Equivalence Tests: A\nPractical Primer for t Tests,\nCorrelations, and Meta-Analyses. Social\nPsychological and Personality Science, 8(4), 355–362. https://doi.org/10.1177/1948550617697177\n\n\nLakens, Daniël. (2019). The value of preregistration for psychological\nscience: A conceptual analysis. Japanese Psychological\nReview, 62(3), 221–230. https://doi.org/10.24602/sjpr.62.3_221\n\n\nLakens, Daniël. (2020). Pandemic researchers recruit your\nown best critics. Nature, 581(7807), 121–121. https://doi.org/10.1038/d41586-020-01392-8\n\n\nLakens, Daniël. (2021). The practical alternative to the p value is the\ncorrectly used p value. Perspectives on Psychological Science,\n16(3), 639–648. https://doi.org/10.1177/1745691620958012\n\n\nLakens, Daniël. (2022a). Sample Size Justification.\nCollabra: Psychology. https://doi.org/10.31234/osf.io/9d3yf\n\n\nLakens, Daniël. (2022b). 
Why P values are not measures of\nevidence. Trends in Ecology & Evolution, 37(4),\n289–290. https://doi.org/10.1016/j.tree.2021.12.006\n\n\nLakens, Daniël. (2023). Is my study useless? Why\nresearchers need methodological review boards. Nature,\n613(7942), 9–9. https://doi.org/10.1038/d41586-022-04504-8\n\n\nLakens, Daniel. (2023). When and How to\nDeviate from a Preregistration.\nPsyArXiv. https://doi.org/10.31234/osf.io/ha29k\n\n\nLakens, Daniël, Adolfi, F. G., Albers, C. J., Anvari, F., Apps, M. A.\nJ., Argamon, S. E., Baguley, T., Becker, R. B., Benning, S. D.,\nBradford, D. E., Buchanan, E. M., Caldwell, A. R., Calster, B.,\nCarlsson, R., Chen, S.-C., Chung, B., Colling, L. J., Collins, G. S.,\nCrook, Z., … Zwaan, R. A. (2018). Justify your alpha. Nature Human\nBehaviour, 2, 168–171. https://doi.org/10.1038/s41562-018-0311-x\n\n\nLakens, Daniël, & Caldwell, A. R. (2021). Simulation-Based\nPower Analysis for Factorial Analysis of\nVariance Designs. Advances in Methods and Practices in\nPsychological Science, 4(1). https://doi.org/10.1177/2515245920951503\n\n\nLakens, Daniël, & DeBruine, L. (2020). Improving\nTransparency, Falsifiability, and\nRigour by Making Hypothesis Tests Machine\nReadable. https://doi.org/10.31234/osf.io/5xcda\n\n\nLakens, Daniël, & Etz, A. J. (2017). Too True to be\nBad: When Sets of Studies With\nSignificant and Nonsignificant Findings Are Probably\nTrue. Social Psychological and Personality Science,\n8(8), 875–881. https://doi.org/10.1177/1948550617693058\n\n\nLakens, Daniël, Hilgard, J., & Staaks, J. (2016). On the\nreproducibility of meta-analyses: Six practical recommendations. BMC\nPsychology, 4, 24. https://doi.org/10.1186/s40359-016-0126-3\n\n\nLakens, Daniël, McLatchie, N., Isager, P. M., Scheel, A. M., &\nDienes, Z. (2020). Improving Inferences About Null Effects With\nBayes Factors and Equivalence Tests. The\nJournals of Gerontology: Series B, 75(1), 45–57. https://doi.org/10.1093/geronb/gby065\n\n\nLakens, Daniël, Scheel, A. M., & Isager, P. M. (2018). Equivalence\ntesting for psychological research: A tutorial.\nAdvances in Methods and Practices in Psychological Science,\n1(2), 259–269. https://doi.org/10.1177/2515245918770963\n\n\nLan, K. K. G., & DeMets, D. L. (1983). Discrete Sequential\nBoundaries for Clinical Trials. Biometrika,\n70(3), 659. https://doi.org/10.2307/2336502\n\n\nLangmuir, I., & Hall, R. N. (1989). Pathological\nScience. Physics Today, 42(10), 36–48. https://doi.org/10.1063/1.881205\n\n\nLatan, H., Chiappetta Jabbour, C. J., Lopes de Sousa Jabbour, A. B.,\n& Ali, M. (2021). Crossing the Red Line?\nEmpirical Evidence and Useful Recommendations\non Questionable Research Practices among Business\nScholars. Journal of Business Ethics, 1–21. https://doi.org/10.1007/s10551-021-04961-7\n\n\nLaudan, L. (1981). Science and Hypothesis.\nSpringer Netherlands. https://doi.org/10.1007/978-94-015-7288-0\n\n\nLaudan, L. (1986). Science and Values: The\nAims of Science and Their Role in\nScientific Debate.\n\n\nLawrence, J. M., Meyerowitz-Katz, G., Heathers, J. A. J., Brown, N. J.\nL., & Sheldrick, K. A. (2021). The lesson of ivermectin:\nMeta-analyses based on summary data alone are inherently unreliable.\nNature Medicine, 27(11), 1853–1854. https://doi.org/10.1038/s41591-021-01535-y\n\n\nLeamer, E. E. (1978). Specification Searches: Ad\nHoc Inference with Nonexperimental Data (1\nedition). Wiley.\n\n\nLehmann, E. L., & Romano, J. P. (2005). Testing statistical\nhypotheses (3rd ed). Springer.\n\n\nLenth, R. V. (2001). 
Some practical guidelines for effective sample size\ndetermination. The American Statistician, 55(3),\n187–193. https://doi.org/10.1198/000313001317098149\n\n\nLenth, R. V. (2007). Post hoc power: Tables and commentary. Iowa\nCity: Department of Statistics and Actuarial Science, University of\nIowa.\n\n\nLeon, A. C., Davis, L. L., & Kraemer, H. C. (2011). The\nRole and Interpretation of Pilot\nStudies in Clinical Research. Journal of\nPsychiatric Research, 45(5), 626–629. https://doi.org/10.1016/j.jpsychires.2010.10.008\n\n\nLetrud, K., & Hernes, S. (2019). Affirmative citation bias in\nscientific myth debunking: A three-in-one case study.\nPLOS ONE, 14(9), e0222213. https://doi.org/10.1371/journal.pone.0222213\n\n\nLeung, P. T. M., Macdonald, E. M., Stanbrook, M. B., Dhalla, I. A.,\n& Juurlink, D. N. (2017). A 1980 Letter on the\nRisk of Opioid Addiction. New England\nJournal of Medicine, 376(22), 2194–2195. https://doi.org/10.1056/NEJMc1700150\n\n\nLevine, T. R., Weber, R., Park, H. S., & Hullett, C. R. (2008). A\ncommunication researchers’ guide to null hypothesis significance testing\nand alternatives. Human Communication Research, 34(2),\n188–209.\n\n\nLeys, C., Delacre, M., Mora, Y. L., Lakens, D., & Ley, C. (2019).\nHow to Classify, Detect, and Manage\nUnivariate and Multivariate Outliers, With\nEmphasis on Pre-Registration. International\nReview of Social Psychology, 32(1), 5. https://doi.org/10.5334/irsp.289\n\n\nLinden, A. H., & Hönekopp, J. (2021). Heterogeneity of\nResearch Results: A New Perspective From Which\nto Assess and Promote Progress in\nPsychological Science. Perspectives on Psychological\nScience, 16(2), 358–376. https://doi.org/10.1177/1745691620964193\n\n\nLindley, D. V. (1957). A statistical paradox. Biometrika,\n44(1/2), 187–192.\n\n\nLindsay, D. S. (2015). Replication in Psychological\nScience. Psychological Science, 26(12),\n1827–1832. https://doi.org/10.1177/0956797615616374\n\n\nLongino, H. E. (1990). Science as Social Knowledge:\nValues and Objectivity in Scientific\nInquiry. Princeton University Press.\n\n\nLouis, T. A., & Zeger, S. L. (2009). Effective communication of\nstandard errors and confidence intervals. Biostatistics,\n10(1), 1–2. https://doi.org/10.1093/biostatistics/kxn014\n\n\nLovakov, A., & Agadullina, E. R. (2021). Empirically derived\nguidelines for effect size interpretation in social psychology.\nEuropean Journal of Social Psychology, 51(3), 485–504.\nhttps://doi.org/10.1002/ejsp.2752\n\n\nLubin, A. (1957). Replicability as a publication criterion. American\nPsychologist, 12, 519–520. https://doi.org/10.1037/h0039746\n\n\nLuttrell, A., Petty, R. E., & Xu, M. (2017). Replicating and fixing\nfailed replications: The case of need for cognition and\nargument quality. Journal of Experimental Social Psychology,\n69, 178–183. https://doi.org/10.1016/j.jesp.2016.09.006\n\n\nLyons, I. M., Nuerk, H.-C., & Ansari, D. (2015). Rethinking the\nimplications of numerical ratio effects for understanding the\ndevelopment of representational precision and numerical processing\nacross formats. Journal of Experimental Psychology: General,\n144(5), 1021–1035. https://doi.org/10.1037/xge0000094\n\n\nMacCoun, R., & Perlmutter, S. (2015). Blind analysis:\nHide results to seek the truth. Nature,\n526(7572), 187–189. https://doi.org/10.1038/526187a\n\n\nMack, R. W. (1951). The Need for Replication\nResearch in Sociology. American Sociological\nReview, 16(1), 93–94. https://doi.org/10.2307/2087978\n\n\nMahoney, M. J. (1979). 
Psychology of the scientist: An\nevaluative review. Social Studies of Science, 9(3),\n349–375. https://doi.org/10.1177/030631277900900304\n\n\nMaier, M., & Lakens, D. (2022). Justify your alpha: A\nprimer on two practical approaches. Advances in Methods and\nPractices in Psychological Science. https://doi.org/10.31234/osf.io/ts4r6\n\n\nMakel, M. C., Hodges, J., Cook, B. G., & Plucker, J. A. (2021). Both\nQuestionable and Open Research Practices Are\nPrevalent in Education Research. Educational\nResearcher, 50(8), 493–504. https://doi.org/10.3102/0013189X211001356\n\n\nMarshall, B., Cardon, P., Poddar, A., & Fontenot, R. (2013). Does\nSample Size Matter in Qualitative Research?:\nA Review of Qualitative Interviews in is\nResearch. Journal of Computer Information Systems,\n54(1), 11–22. https://doi.org/10.1080/08874417.2013.11645667\n\n\nMaxwell, S. E., & Delaney, H. D. (2004). Designing experiments\nand analyzing data: A model comparison perspective (2nd ed).\nLawrence Erlbaum Associates.\n\n\nMaxwell, S. E., Delaney, H. D., & Kelley, K. (2017). Designing\nExperiments and Analyzing Data: A Model\nComparison Perspective, Third Edition (3\nedition). Routledge.\n\n\nMaxwell, S. E., & Kelley, K. (2011). Ethics and sample size\nplanning. In Handbook of ethics in quantitative methodology\n(pp. 179–204). Routledge.\n\n\nMaxwell, S. E., Kelley, K., & Rausch, J. R. (2008). Sample\nSize Planning for Statistical Power and\nAccuracy in Parameter Estimation. Annual\nReview of Psychology, 59(1), 537–563. https://doi.org/10.1146/annurev.psych.59.103006.093735\n\n\nMayo, D. G. (1996). Error and the growth of experimental\nknowledge. University of Chicago Press.\n\n\nMayo, D. G. (2018). Statistical inference as severe testing: How to\nget beyond the statistics wars. Cambridge University\nPress.\n\n\nMayo, D. G., & Spanos, A. (2011). Error statistics. Philosophy\nof Statistics, 7, 152–198.\n\n\nMazzolari, R., Porcelli, S., Bishop, D. J., & Lakens, D. (2022).\nMyths and methodologies: The use of equivalence and\nnon-inferiority tests for interventional studies in exercise physiology\nand sport science. Experimental Physiology, 107(3),\n201–212. https://doi.org/10.1113/EP090171\n\n\nMcCarthy, R. J., Skowronski, J. J., Verschuere, B., Meijer, E. H., Jim,\nA., Hoogesteyn, K., Orthey, R., Acar, O. A., Aczel, B., Bakos, B. E.,\nBarbosa, F., Baskin, E., Bègue, L., Ben-Shakhar, G., Birt, A. R., Blatz,\nL., Charman, S. D., Claesen, A., Clay, S. L., … Yıldız, E. (2018).\nRegistered Replication Report on Srull and\nWyer (1979). Advances in Methods and Practices in\nPsychological Science, 1(3), 321–336. https://doi.org/10.1177/2515245918777487\n\n\nMcElreath, R. (2016). Statistical Rethinking: A\nBayesian Course with Examples in R and\nStan (Vol. 122). CRC Press.\n\n\nMcGrath, R. E., & Meyer, G. J. (2006). When effect sizes disagree:\nThe case of r and d. Psychological Methods,\n11(4), 386–401. https://doi.org/10.1037/1082-989X.11.4.386\n\n\nMcGraw, K. O., & Wong, S. P. (1992). A common language effect size\nstatistic. Psychological Bulletin, 111(2), 361–365. https://doi.org/10.1037/0033-2909.111.2.361\n\n\nMcGuire, W. J. (2004). A Perspectivist Approach to\nTheory Construction. Personality and Social Psychology\nReview, 8(2), 173–182. https://doi.org/10.1207/s15327957pspr0802_11\n\n\nMcIntosh, R. D., & Rittmo, J. Ö. (2021). Power calculations in\nsingle-case neuropsychology: A practical primer.\nCortex, 135, 146–158. https://doi.org/10.1016/j.cortex.2020.11.005\n\n\nMeehl, P. E. (1967). 
Theory-testing in psychology and physics:\nA methodological paradox. Philosophy of Science,\n103–115. https://www.jstor.org/stable/186099\n\n\nMeehl, P. E. (1978). Theoretical Risks and Tabular\nAsterisks: Sir Karl, Sir Ronald, and\nthe Slow Progress of Soft Psychology.\nJournal of Consulting and Clinical Psychology, 46(4),\n806–834. https://doi.org/10.1037/0022-006X.46.4.806\n\n\nMeehl, P. E. (1990a). Appraising and amending theories: The\nstrategy of Lakatosian defense and two principles that\nwarrant it. Psychological Inquiry, 1(2), 108–141. https://doi.org/10.1207/s15327965pli0102_1\n\n\nMeehl, P. E. (1990b). Why Summaries of\nResearch on Psychological Theories are\nOften Uninterpretable: Psychological Reports,\n66(1), 195–244. https://doi.org/10.2466/pr0.1990.66.1.195\n\n\nMeehl, P. E. (2004). Cliometric metatheory III:\nPeircean consensus, verisimilitude and asymptotic method.\nThe British Journal for the Philosophy of Science,\n55(4), 615–643.\n\n\nMelara, R. D., & Algom, D. (2003). Driven by information:\nA tectonic theory of Stroop effects.\nPsychological Review, 110(3), 422–471. https://doi.org/10.1037/0033-295X.110.3.422\n\n\nMellers, B., Hertwig, R., & Kahneman, D. (2001). Do frequency\nrepresentations eliminate conjunction effects? An exercise\nin adversarial collaboration. Psychological Science,\n12(4), 269–275. https://doi.org/10.1111/1467-9280.00350\n\n\nMerton, R. K. (1942). A Note on Science and\nDemocracy. Journal of Legal and Political\nSociology, 1, 115–126.\n\n\nMeyners, M. (2012). Equivalence tests A\nreview. Food Quality and Preference, 26(2), 231–245.\nhttps://doi.org/10.1016/j.foodqual.2012.05.003\n\n\nMeyvis, T., & Van Osselaer, S. M. J. (2018). Increasing the\nPower of Your Study by Increasing\nthe Effect Size. Journal of Consumer Research,\n44(5), 1157–1173. https://doi.org/10.1093/jcr/ucx110\n\n\nMillar, R. B. (2011). Maximum likelihood estimation and inference:\nWith examples in R, SAS, and\nADMB. Wiley.\n\n\nMiller, J. (2009). What is the probability of replicating a\nstatistically significant effect? Psychonomic Bulletin &\nReview, 16(4), 617–640. https://doi.org/10.3758/PBR.16.4.617\n\n\nMiller, J., & Ulrich, R. (2019). The quest for an optimal alpha.\nPLOS ONE, 14(1), e0208631. https://doi.org/10.1371/journal.pone.0208631\n\n\nMitroff, I. I. (1974). Norms and Counter-Norms in a\nSelect Group of the Apollo Moon Scientists:\nA Case Study of the Ambivalence of\nScientists. American Sociological Review,\n39(4), 579–595. https://doi.org/10.2307/2094423\n\n\nMoe, K. (1984). Should the Nazi Research Data Be Cited?\nThe Hastings Center Report, 14(6), 5–7. https://doi.org/10.2307/3561733\n\n\nMoran, C., Link to external site, this link will open in a new window,\nRichard, A., Link to external site, this link will open in a new window,\nWilson, K., Twomey, R., Link to external site, this link will open in a\nnew window, Coroiu, A., & Link to external site, this link will open\nin a new window. (2022). I know it’s bad, but I have been\npressured into it: Questionable research practices among\npsychology students in Canada. Canadian\nPsychology/Psychologie Canadienne. https://doi.org/10.1037/cap0000326\n\n\nMorey, R. D. (2020). Power and precision [Blog].\nhttps://medium.com/@richarddmorey/power-and-precision-47f644ddea5e.\n\n\nMorey, R. D., Hoekstra, R., Rouder, J. N., Lee, M. D., &\nWagenmakers, E.-J. (2016). The fallacy of placing confidence in\nconfidence intervals. Psychonomic Bulletin & Review,\n23(1), 103–123.\n\n\nMorey, R. D., Kaschak, M. P., Díez-Álamo, A. 
M., Glenberg, A. M., Zwaan,\nR. A., Lakens, D., Ibáñez, A., García, A., Gianelli, C., Jones, J. L.,\nMadden, J., Alifano, F., Bergen, B., Bloxsom, N. G., Bub, D. N., Cai, Z.\nG., Chartier, C. R., Chatterjee, A., Conwell, E., … Ziv-Crispel, N.\n(2021). A pre-registered, multi-lab non-replication of the\naction-sentence compatibility effect (ACE). Psychonomic\nBulletin & Review. https://doi.org/10.3758/s13423-021-01927-8\n\n\nMorris, T. P., White, I. R., & Crowther, M. J. (2019). Using\nsimulation studies to evaluate statistical methods. Statistics in\nMedicine, 38(11), 2074–2102. https://doi.org/10.1002/sim.8086\n\n\nMorse, J. M. (1995). The Significance of\nSaturation. Qualitative Health Research,\n5(2), 147–149. https://doi.org/10.1177/104973239500500201\n\n\nMoscovici, S. (1972). Society and theory in social psychology. In\nContext of social psychology (pp. 17–81).\n\n\nMoshontz, H., Campbell, L., Ebersole, C. R., IJzerman, H., Urry, H. L.,\nForscher, P. S., Grahe, J. E., McCarthy, R. J., Musser, E. D., &\nAntfolk, J. (2018). The Psychological Science Accelerator:\nAdvancing psychology through a distributed collaborative\nnetwork. Advances in Methods and Practices in Psychological\nScience, 1(4), 501–515. https://doi.org/10.1177/2515245918797607\n\n\nMotyl, M., Demos, A. P., Carsel, T. S., Hanson, B. E., Melton, Z. J.,\nMueller, A. B., Prims, J. P., Sun, J., Washburn, A. N., Wong, K. M.,\nYantis, C., & Skitka, L. J. (2017). The state of social and\npersonality science: Rotten to the core, not so bad,\ngetting better, or getting worse? Journal of Personality and Social\nPsychology, 113, 34–58. https://doi.org/10.1037/pspa0000084\n\n\nMrozek, J. R., & Taylor, L. O. (2002). What determines the value of\nlife? A meta-analysis. Journal of Policy Analysis and\nManagement, 21(2), 253–270. https://doi.org/10.1002/pam.10026\n\n\nMudge, J. F., Baker, L. F., Edge, C. B., & Houlahan, J. E. (2012).\nSetting an Optimal α That Minimizes\nErrors in Null Hypothesis Significance Tests.\nPLOS ONE, 7(2), e32734. https://doi.org/10.1371/journal.pone.0032734\n\n\nMullan, F., & Jacoby, I. (1985). The town meeting for technology:\nThe maturation of consensus conferences. JAMA,\n254(8), 1068–1072. https://doi.org/10.1001/jama.1985.03360080080035\n\n\nMulligan, A., Hall, L., & Raphael, E. (2013). Peer review in a\nchanging world: An international study measuring the\nattitudes of researchers. Journal of the American Society for\nInformation Science and Technology, 64(1), 132–161. https://doi.org/10.1002/asi.22798\n\n\nMurphy, K. R., & Myors, B. (1999). Testing the hypothesis that\ntreatments have negligible effects: Minimum-effect tests in the general linear model.\nJournal of Applied Psychology, 84(2), 234–248. https://doi.org/10.1037/0021-9010.84.2.234\n\n\nMurphy, K. R., Myors, B., & Wolach, A. H. (2014). Statistical\npower analysis: A simple and general model for traditional and modern\nhypothesis tests (Fourth edition). Routledge, Taylor &\nFrancis Group.\n\n\nNational Academy of Sciences, National Academy of Engineering, &\nInstitute of Medicine. (2009). On being a scientist: A\nguide to responsible conduct in research: Third\nedition. The National Academies Press. https://doi.org/10.17226/12192\n\n\nNeher, A. (1967). Probability Pyramiding, Research\nError and the Need for Independent\nReplication. The Psychological Record, 17(2),\n257–262. https://doi.org/10.1007/BF03393713\n\n\nNemeth, C., Brown, K., & Rogers, J. (2001). Devil’s advocate versus\nauthentic dissent: Stimulating quantity and quality. 
European\nJournal of Social Psychology, 31(6), 707–720. https://doi.org/10.1002/ejsp.58\n\n\nNeyman, J. (1957). \"Inductive Behavior\" as a Basic\nConcept of Philosophy of Science.\nRevue de l’Institut International de Statistique / Review of the\nInternational Statistical Institute, 25(1/3), 7. https://doi.org/10.2307/1401671\n\n\nNeyman, J., & Pearson, E. S. (1933). On the problem of the most\nefficient tests of statistical hypotheses. Philosophical\nTransactions of the Royal Society of London A: Mathematical, Physical\nand Engineering Sciences, 231(694-706), 289–337. https://doi.org/10.1098/rsta.1933.0009\n\n\nNickerson, R. S. (1998). Confirmation bias: A ubiquitous\nphenomenon in many guises. Review of General Psychology,\n2(2), 175–220.\n\n\nNickerson, R. S. (2000). Null hypothesis significance testing:\nA review of an old and continuing controversy.\nPsychological Methods, 5(2), 241–301. https://doi.org/10.1037//1082-989X.5.2.241\n\n\nNiiniluoto, I. (1998). Verisimilitude: The Third Period.\nThe British Journal for the Philosophy of Science, 49,\n1–29.\n\n\nNiiniluoto, I. (1999). Critical Scientific\nRealism. Oxford University Press.\n\n\nNorman, G. R., Sloan, J. A., & Wyrwich, K. W. (2004). The truly\nremarkable universality of half a standard deviation: Confirmation\nthrough another look. Expert Review of Pharmacoeconomics &\nOutcomes Research, 4(5), 581–585.\n\n\nNosek, B. A., & Lakens, D. (2014). Registered reports:\nA method to increase the credibility of published results.\nSocial Psychology, 45(3), 137–141. https://doi.org/10.1027/1864-9335/a000192\n\n\nNuijten, M. B., Hartgerink, C. H. J., van Assen, M. A. L. M., Epskamp,\nS., & Wicherts, J. M. (2015). The prevalence of statistical\nreporting errors in psychology (19852013). Behavior\nResearch Methods. https://doi.org/10.3758/s13428-015-0664-2\n\n\nNuijten, M. B., & Wicherts, J. (2023). The effectiveness of\nimplementing statcheck in the peer review process to avoid statistical\nreporting errors. PsyArXiv. https://doi.org/10.31234/osf.io/bxau9\n\n\nNunnally, J. (1960). The place of statistics in psychology.\nEducational and Psychological Measurement, 20(4),\n641–650. https://doi.org/10.1177/001316446002000401\n\n\nO’Donnell, M., Nelson, L. D., Ackermann, E., Aczel, B., Akhtar, A.,\nAldrovandi, S., Alshaif, N., Andringa, R., Aveyard, M., Babincak, P.,\nBalatekin, N., Baldwin, S. A., Banik, G., Baskin, E., Bell, R.,\nBiałobrzeska, O., Birt, A. R., Boot, W. R., Braithwaite, S. R., …\nZrubka, M. (2018). Registered Replication Report:\nDijksterhuis and van Knippenberg (1998).\nPerspectives on Psychological Science, 13(2), 268–294.\nhttps://doi.org/10.1177/1745691618755704\n\n\nObels, P., Lakens, D., Coles, N. A., Gottfried, J., & Green, S. A.\n(2020). Analysis of Open Data and Computational\nReproducibility in Registered Reports in\nPsychology. Advances in Methods and Practices in\nPsychological Science, 3(2), 229–237. https://doi.org/10.1177/2515245920918872\n\n\nOddie, G. (2013). The content, consequence and likeness approaches to\nverisimilitude: Compatibility, trivialization, and underdetermination.\nSynthese, 190(9), 1647–1687. https://doi.org/10.1007/s11229-011-9930-8\n\n\nOkada, K. (2013). Is Omega Squared Less Biased? A\nComparison of Three Major Effect Size Indices\nin One-Way Anova. Behaviormetrika, 40(2),\n129–147. https://doi.org/10.2333/bhmk.40.129\n\n\nOlejnik, S., & Algina, J. (2003). 
Generalized Eta and\nOmega Squared Statistics: Measures of\nEffect Size for Some Common Research Designs.\nPsychological Methods, 8(4), 434–447. https://doi.org/10.1037/1082-989X.8.4.434\n\n\nOlsson-Collentine, A., Wicherts, J. M., & van Assen, M. A. L. M.\n(2020). Heterogeneity in direct replications in psychology and its\nassociation with effect size. Psychological Bulletin,\n146(10), 922–940. https://doi.org/10.1037/bul0000294\n\n\nOpen Science Collaboration. (2015). Estimating the reproducibility of\npsychological science. Science, 349(6251),\naac4716–aac4716. https://doi.org/10.1126/science.aac4716\n\n\nOrben, A., & Lakens, D. (2020). Crud\n(Re)Defined. Advances in Methods and\nPractices in Psychological Science, 3(2), 238–247. https://doi.org/10.1177/2515245920917961\n\n\nParker, R. A., & Berman, N. G. (2003). Sample Size.\nThe American Statistician, 57(3), 166–170. https://doi.org/10.1198/0003130031919\n\n\nParkhurst, D. F. (2001). Statistical significance tests:\nEquivalence and reverse tests should reduce\nmisinterpretation. Bioscience, 51(12), 1051–1057. https://doi.org/10.1641/0006-3568(2001)051[1051:SSTEAR]2.0.CO;2\n\n\nParsons, S., Kruijt, A.-W., & Fox, E. (2019). Psychological\nScience Needs a Standard Practice of\nReporting the Reliability of\nCognitive-Behavioral Measurements. Advances in Methods\nand Practices in Psychological Science, 2(4), 378–395. https://doi.org/10.1177/2515245919879695\n\n\nPawitan, Y. (2001). In all likelihood: Statistical modelling and\ninference using likelihood. Clarendon Press ; Oxford\nUniversity Press.\n\n\nPemberton, M., Hall, S., Moskovitz, C., & Anson, C. M. (2019). Text\nrecycling: Views of North American journal\neditors from an interview-based study. Learned Publishing,\n32(4), 355–366. https://doi.org/10.1002/leap.1259\n\n\nPerneger, T. V. (1998). What’s wrong with Bonferroni\nadjustments. Bmj, 316(7139), 1236–1238.\n\n\nPerugini, M., Gallucci, M., & Costantini, G. (2014). Safeguard power\nas a protection against imprecise power estimates. Perspectives on\nPsychological Science, 9(3), 319–332. https://doi.org/10.1177/1745691614528519\n\n\nPerugini, M., Gallucci, M., & Costantini, G. (2018). A\nPractical Primer To Power Analysis for Simple\nExperimental Designs. International Review of Social\nPsychology, 31(1), 20. https://doi.org/10.5334/irsp.181\n\n\nPeters, J. L., Sutton, A. J., Jones, D. R., Abrams, K. R., &\nRushton, L. (2007). Performance of the trim and fill method in the\npresence of publication bias and between-study heterogeneity.\nStatistics in Medicine, 26(25), 4544–4562. https://doi.org/10.1002/sim.2889\n\n\nPhillips, B. M., Hunt, J. W., Anderson, B. S., Puckett, H. M., Fairey,\nR., Wilson, C. J., & Tjeerdema, R. (2001). Statistical significance\nof sediment toxicity test results: Threshold values derived\nby the detectable significance approach. Environmental Toxicology\nand Chemistry, 20(2), 371–373. https://doi.org/10.1002/etc.5620200218\n\n\nPickett, J. T., & Roche, S. P. (2017). Questionable,\nObjectionable or Criminal? Public\nOpinion on Data Fraud and Selective\nReporting in Science. Science and Engineering\nEthics, 1–21. https://doi.org/10.1007/s11948-017-9886-2\n\n\nPlatt, J. R. (1964). Strong Inference: Certain\nsystematic methods of scientific thinking may produce much more rapid\nprogress than others. Science, 146(3642), 347–353. https://doi.org/10.1126/science.146.3642.347\n\n\nPocock, S. J. (1977). Group sequential methods in the design and\nanalysis of clinical trials. Biometrika, 64(2),\n191–199. 
https://doi.org/10.1093/biomet/64.2.191\n\n\nPolanin, J. R., Hennessy, E. A., & Tsuji, S. (2020). Transparency\nand Reproducibility of Meta-Analyses in\nPsychology: A Meta-Review. Perspectives on\nPsychological Science, 15(4), 1026–1041. https://doi.org/10.1177/1745691620906416\n\n\nPopper, K. R. (2002). The logic of scientific\ndiscovery. Routledge.\n\n\nPrimbs, M., Pennington, C. R., Lakens, D., Silan, M. A., Lieck, D. S.\nN., Forscher, P., Buchanan, E. M., & Westwood, S. J. (2022). Are\nSmall Effects the Indispensable Foundation for\na Cumulative Psychological Science? A Reply to\nGötz et al.\n(2022). Perspectives on Psychological Science. https://doi.org/10.31234/osf.io/6s8bj\n\n\nProschan, M. A. (2005). Two-Stage Sample Size Re-Estimation\nBased on a Nuisance Parameter: A\nReview. Journal of Biopharmaceutical Statistics,\n15(4), 559–574. https://doi.org/10.1081/BIP-200062852\n\n\nProschan, M. A., Lan, K. K. G., & Wittes, J. T. (2006).\nStatistical monitoring of clinical trials: A unified approach.\nSpringer.\n\n\nPsillos, S. (1999). Scientific realism: How science tracks\ntruth. Routledge.\n\n\nQuertemont, E. (2011). How to Statistically Show the\nAbsence of an Effect. Psychologica\nBelgica, 51(2), 109–127. https://doi.org/10.5334/pb-51-2-109\n\n\nRabelo, A. L. A., Farias, J. E. M., Sarmet, M. M., Joaquim, T. C. R.,\nHoersting, R. C., Victorino, L., Modesto, J. G. N., & Pilati, R.\n(2020). Questionable research practices among Brazilian\npsychological researchers: Results from a replication study\nand an international comparison. International Journal of\nPsychology, 55(4), 674–683. https://doi.org/10.1002/ijop.12632\n\n\nRadick, G. (2022). Mendel the fraud? A social history of\ntruth in genetics. Studies in History and Philosophy of\nScience, 93, 39–46. https://doi.org/10.1016/j.shpsa.2021.12.012\n\n\nReif, F. (1961). The Competitive World of the Pure\nScientist. Science, 134(3494), 1957–1962. https://doi.org/10.1126/science.134.3494.1957\n\n\nRice, W. R., & Gaines, S. D. (1994). ’Heads I win,\ntails you lose’: Testing directional alternative hypotheses in\necological and evolutionary research. Trends in Ecology &\nEvolution, 9(6), 235–237. https://doi.org/10.1016/0169-5347(94)90258-5\n\n\nRichard, F. D., Bond, C. F., & Stokes-Zoota, J. J. (2003). One\nHundred Years of Social Psychology Quantitatively\nDescribed. Review of General Psychology, 7(4),\n331–363. https://doi.org/10.1037/1089-2680.7.4.331\n\n\nRichardson, J. T. E. (2011). Eta squared and partial eta squared as\nmeasures of effect size in educational research. Educational\nResearch Review, 6(2), 135–147. https://doi.org/10.1016/j.edurev.2010.12.001\n\n\nRijnsoever, F. J. van. (2017). (I Can’t Get\nNo) Saturation: A simulation and\nguidelines for sample sizes in qualitative research. PLOS ONE,\n12(7), e0181689. https://doi.org/10.1371/journal.pone.0181689\n\n\nRogers, J. L., Howard, K. I., & Vessey, J. T. (1993). Using\nsignificance tests to evaluate equivalence between two experimental\ngroups. Psychological Bulletin, 113(3), 553–565.\nhttps://doi.org/http://dx.doi.org/10.1037/0033-2909.113.3.553\n\n\nRogers, S. (1992). How a publicity blitz created the myth of subliminal\nadvertising. Public Relations Quarterly, 37(4), 12.\n\n\nRopovik, I., Adamkovic, M., & Greger, D. (2021). Neglect of\npublication bias compromises meta-analyses of educational research.\nPLOS ONE, 16(6), e0252415. https://doi.org/10.1371/journal.pone.0252415\n\n\nRosenthal, R. (1966). Experimenter effects in behavioral\nresearch. 
Appleton-Century-Crofts.\n\n\nRoss-Hellauer, T., Deppe, A., & Schmidt, B. (2017). Survey on open\npeer review: Attitudes and experience amongst editors,\nauthors and reviewers. PLOS ONE, 12(12), e0189311. https://doi.org/10.1371/journal.pone.0189311\n\n\nRouder, J. N. (2014). Optional stopping: No problem for\nBayesians. Psychonomic Bulletin & Review,\n21(2), 301–308.\n\n\nRouder, J. N., Haaf, J. M., & Snyder, H. K. (2019). Minimizing\nMistakes in Psychological Science.\nAdvances in Methods and Practices in Psychological Science,\n2(1), 3–11. https://doi.org/10.1177/2515245918801915\n\n\nRouder, J. N., Speckman, P. L., Sun, D., Morey, R. D., & Iverson, G.\n(2009). Bayesian t tests for accepting and rejecting the null\nhypothesis. Psychonomic Bulletin & Review, 16(2),\n225–237. https://doi.org/10.3758/PBR.16.2.225\n\n\nRoyall, R. (1997). Statistical Evidence: A\nLikelihood Paradigm. Chapman and Hall/CRC.\n\n\nRozeboom, W. W. (1960). The fallacy of the null-hypothesis significance\ntest. Psychological Bulletin, 57(5), 416–428. https://doi.org/10.1037/h0042040\n\n\nRücker, G., Schwarzer, G., Carpenter, J. R., & Schumacher, M.\n(2008). Undue reliance on I(2) in assessing heterogeneity\nmay mislead. BMC Medical Research Methodology, 8, 79.\nhttps://doi.org/10.1186/1471-2288-8-79\n\n\nSarafoglou, A., Kovacs, M., Bakos, B., Wagenmakers, E.-J., & Aczel,\nB. (2022). A survey on how preregistration affects the research\nworkflow: Better science but more work. Royal Society Open\nScience, 9(7), 211997. https://doi.org/10.1098/rsos.211997\n\n\nScheel, A. M., Schijen, M. R. M. J., & Lakens, D. (2021). An\nExcess of Positive Results:\nComparing the Standard Psychology Literature With\nRegistered Reports. Advances in Methods and Practices in\nPsychological Science, 4(2), 25152459211007467. https://doi.org/10.1177/25152459211007467\n\n\nScheel, A. M., Tiokhin, L., Isager, P. M., & Lakens, D. (2021). Why\nHypothesis Testers Should Spend Less Time Testing\nHypotheses. Perspectives on Psychological Science,\n16(4), 744–755. https://doi.org/10.1177/1745691620966795\n\n\nSchimmack, U. (2012). The ironic effect of significant results on the\ncredibility of multiple-study articles. Psychological Methods,\n17(4), 551–566. https://doi.org/10.1037/a0029487\n\n\nSchnuerch, M., & Erdfelder, E. (2020). Controlling decision errors\nwith minimal costs: The sequential probability ratio t\ntest. Psychological Methods, 25(2), 206–226. https://doi.org/10.1037/met0000234\n\n\nSchoemann, A. M., Boulton, A. J., & Short, S. D. (2017). Determining\nPower and Sample Size for Simple\nand Complex Mediation Models. Social Psychological and\nPersonality Science, 8(4), 379–386. https://doi.org/10.1177/1948550617715068\n\n\nSchoenegger, P., & Pils, R. (2023). Social sciences in crisis: On\nthe proposed elimination of the discussion section. Synthese,\n202(2), 54. https://doi.org/10.1007/s11229-023-04267-3\n\n\nSchönbrodt, F. D., Wagenmakers, E.-J., Zehetleitner, M., & Perugini,\nM. (2017). Sequential hypothesis testing with Bayes\nfactors: Efficiently testing mean differences.\nPsychological Methods, 22(2), 322–339. https://doi.org/10.1037/MET0000061\n\n\nSchuirmann, D. J. (1987). A comparison of the two one-sided tests\nprocedure and the power approach for assessing the equivalence of\naverage bioavailability. Journal of Pharmacokinetics and\nBiopharmaceutics, 15(6), 657–680.\n\n\nSchulz, K. F., & Grimes, D. A. (2005). Sample size calculations in\nrandomised trials: Mandatory and mystical. The Lancet,\n365(9467), 1348–1353. 
https://doi.org/10.1016/S0140-6736(05)61034-3\n\n\nSchumi, J., & Wittes, J. T. (2011). Through the looking glass:\nUnderstanding non-inferiority. Trials, 12(1), 106. https://doi.org/10.1186/1745-6215-12-106\n\n\nSchweder, T., & Hjort, N. L. (2016). Confidence,\nLikelihood, Probability: Statistical\nInference with Confidence Distributions.\nCambridge University Press. https://doi.org/10.1017/CBO9781139046671\n\n\nScull, A. (2023). Rosenhan revisited: Successful scientific fraud.\nHistory of Psychiatry, 0957154X221150878. https://doi.org/10.1177/0957154X221150878\n\n\nSeaman, M. A., & Serlin, R. C. (1998). Equivalence confidence\nintervals for two-group comparisons of means. Psychological\nMethods, 3(4), 403–411. https://doi.org/http://dx.doi.org.dianus.libr.tue.nl/10.1037/1082-989X.3.4.403\n\n\nSedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical\npower have an effect on the power of studies? Psychological\nBulletin, 105(2), 309–316. https://doi.org/10.1037/0033-2909.105.2.309\n\n\nShadish, W. R., Cook, T. D., & Campbell, D. T. (2001).\nExperimental and quasi-experimental designs for generalized causal\ninference. Houghton Mifflin.\n\n\nShmueli, G. (2010). To explain or to predict? Statistical\nScience, 25(3), 289–310.\n\n\nSimmons, J. P., Nelson, L. D., & Simonsohn, U. (2011).\nFalse-Positive Psychology: Undisclosed\nFlexibility in Data Collection and Analysis\nAllows Presenting Anything as Significant.\nPsychological Science, 22(11), 1359–1366. https://doi.org/10.1177/0956797611417632\n\n\nSimmons, J. P., Nelson, L. D., & Simonsohn, U. (2013). Life\nafter P-Hacking.\n\n\nSimonsohn, U. (2015). Small telescopes: Detectability and\nthe evaluation of replication results. Psychological Science,\n26(5), 559–569. https://doi.org/10.1177/0956797614567341\n\n\nSimonsohn, U., Nelson, L. D., & Simmons, J. P. (2014). P-curve:\nA key to the file-drawer. Journal of Experimental\nPsychology: General, 143(2), 534.\n\n\nSmart, R. G. (1964). The importance of negative results in psychological\nresearch. Canadian Psychologist / Psychologie Canadienne,\n5a(4), 225–232. https://doi.org/10.1037/h0083036\n\n\nSmithson, M. (2003). Confidence intervals. Sage\nPublications.\n\n\nSotola, L. K. (2022). Garbage In, Garbage Out?\nEvaluating the Evidentiary Value of Published Meta-analyses Using Z-Curve Analysis.\nCollabra: Psychology, 8(1), 32571. https://doi.org/10.1525/collabra.32571\n\n\nSpanos, A. (1999). Probability theory and statistical inference:\nEconometric modeling with observational data. Cambridge\nUniversity Press.\n\n\nSpanos, A. (2013). Who should be afraid of the\nJeffreys-Lindley paradox? Philosophy of Science,\n80(1), 73–93. https://doi.org/10.1086/668875\n\n\nSpellman, B. A. (2015). A Short (Personal)\nFuture History of Revolution 2.0.\nPerspectives on Psychological Science, 10(6), 886–899.\nhttps://doi.org/10.1177/1745691615609918\n\n\nSpiegelhalter, D. (2019). The Art of\nStatistics: How to Learn from\nData (Illustrated edition). Basic Books.\n\n\nSpiegelhalter, D. J., Freedman, L. S., & Blackburn, P. R. (1986).\nMonitoring clinical trials: Conditional or predictive power?\nControlled Clinical Trials, 7(1), 8–17. https://doi.org/10.1016/0197-2456(86)90003-6\n\n\nStanley, T. D., & Doucouliagos, H. (2014). Meta-regression\napproximations to reduce publication selection bias. Research\nSynthesis Methods, 5(1), 60–78. https://doi.org/10.1002/jrsm.1095\n\n\nStanley, T. D., Doucouliagos, H., & Ioannidis, J. P. A. 
(2017).\nFinding the power to reduce publication bias: Finding the\npower to reduce publication bias. Statistics in Medicine. https://doi.org/10.1002/sim.7228\n\n\nSteiger, J. H. (2004). Beyond the F Test: Effect Size\nConfidence Intervals and Tests of Close\nFit in the Analysis of Variance and\nContrast Analysis. Psychological Methods,\n9(2), 164–182. https://doi.org/10.1037/1082-989X.9.2.164\n\n\nSterling, T. D. (1959). Publication Decisions and\nTheir Possible Effects on Inferences Drawn\nfrom Tests of Significance–Or Vice Versa.\nJournal of the American Statistical Association,\n54(285), 30–34. https://doi.org/10.2307/2282137\n\n\nStewart, L. A., & Tierney, J. F. (2002). To IPD or not\nto IPD?: Advantages and\nDisadvantages of Systematic Reviews Using Individual\nPatient Data. Evaluation & the Health Professions,\n25(1), 76–97. https://doi.org/10.1177/0163278702025001006\n\n\nStodden, V., Seiler, J., & Ma, Z. (2018). An empirical analysis of\njournal policy effectiveness for computational reproducibility.\nProceedings of the National Academy of Sciences,\n115(11), 2584–2589. https://doi.org/10.1073/pnas.1708290115\n\n\nStrand, J. F. (2023). Error tight: Exercises for lab groups\nto prevent research mistakes. Psychological Methods, No\nPagination Specified–No Pagination Specified. https://doi.org/10.1037/met0000547\n\n\nStroebe, W., & Strack, F. (2014). The Alleged Crisis\nand the Illusion of Exact Replication.\nPerspectives on Psychological Science, 9(1), 59–71. https://doi.org/10.1177/1745691613514450\n\n\nStroop, J. R. (1935). Studies of interference in serial verbal\nreactions. Journal of Experimental Psychology, 18(6),\n643–662.\n\n\nSwift, J. K., Link to external site, this link will open in a new\nwindow, Christopherson, C. D., Link to external site, this link will\nopen in a new window, Bird, M. O., Link to external site, this link will\nopen in a new window, Zöld, A., Link to external site, this link will\nopen in a new window, Goode, J., & Link to external site, this link\nwill open in a new window. (2022). Questionable research practices among\nfaculty and students in APA-accredited\nclinical and counseling psychology doctoral programs. Training and\nEducation in Professional Psychology, 16(3), 299–305. https://doi.org/10.1037/tep0000322\n\n\nTaper, M. L., & Lele, S. R. (2011). Philosophy of\nStatistics. In P. S. Bandyophadhyay & M. R. Forster\n(Eds.), Evidence, evidence functions, and error probabilities\n(pp. 513–531). Elsevier, USA.\n\n\nTaylor, D. J., & Muller, K. E. (1996). Bias in linear model power\nand sample size calculation due to estimating noncentrality.\nCommunications in Statistics-Theory and Methods,\n25(7), 1595–1610. https://doi.org/10.1080/03610929608831787\n\n\nTeare, M. D., Dimairo, M., Shephard, N., Hayman, A., Whitehead, A.,\n& Walters, S. J. (2014). Sample size requirements to estimate key\ndesign parameters from external pilot randomised controlled trials: A\nsimulation study. Trials, 15(1), 264. https://doi.org/10.1186/1745-6215-15-264\n\n\nTendeiro, J. N., & Kiers, H. A. L. (2019). A review of issues about\nnull hypothesis Bayesian testing. Psychological\nMethods. https://doi.org/10.1037/met0000221\n\n\nter Schure, J., & Grünwald, P. D. (2019). Accumulation\nBias in Meta-Analysis: The Need\nto Consider Time in Error Control.\narXiv:1905.13494 [Math, Stat]. https://arxiv.org/abs/1905.13494\n\n\nTerrin, N., Schmid, C. H., Lau, J., & Olkin, I. (2003). Adjusting\nfor publication bias in the presence of heterogeneity. 
Statistics in\nMedicine, 22(13), 2113–2126. https://doi.org/10.1002/sim.1461\n\n\nThompson, B. (2007). Effect sizes, confidence intervals, and confidence\nintervals for effect sizes. Psychology in the Schools,\n44(5), 423–432. https://doi.org/10.1002/pits.20234\n\n\nTversky, A. (1977). Features of similarity. Psychological\nReview, 84(4), 327–352. https://doi.org/10.1037/0033-295X.84.4.327\n\n\nTversky, A., & Kahneman, D. (1971). Belief in the law of small\nnumbers. Psychological Bulletin, 76(2), 105–110. https://doi.org/10.1037/h0031322\n\n\nUlrich, R., & Miller, J. (2018). Some properties of p-curves, with\nan application to gradual publication bias. Psychological\nMethods, 23(3), 546–560. https://doi.org/10.1037/met0000125\n\n\nUygun Tunç, D., & Tunç, M. N. (2022). A Falsificationist\nTreatment of Auxiliary Hypotheses in\nSocial and Behavioral Sciences:\nSystematic Replications Framework.\nMeta-Psychology. https://doi.org/10.31234/osf.io/pdm7y\n\n\nUygun Tunç, D., Tunç, M. N., & Lakens, D. (2023). The epistemic and\npragmatic function of dichotomous claims based on statistical hypothesis\ntests. Theory & Psychology, 09593543231160112. https://doi.org/10.1177/09593543231160112\n\n\nValentine, J. C., Pigott, T. D., & Rothstein, H. R. (2010). How\nMany Studies Do You Need?: A Primer on\nStatistical Power for Meta-Analysis.\nJournal of Educational and Behavioral Statistics,\n35(2), 215–247. https://doi.org/10.3102/1076998609346961\n\n\nvan de Schoot, R., Winter, S. D., Griffioen, E., Grimmelikhuijsen, S.,\nArts, I., Veen, D., Grandfield, E. M., & Tummers, L. G. (2021). The\nUse of Questionable Research Practices to\nSurvive in Academia Examined With Expert\nElicitation, Prior-Data Conflicts, Bayes\nFactors for Replication Effects, and the Bayes\nTruth Serum. Frontiers in Psychology, 12.\n\n\nvan de Schoot, R., Winter, S. D., Ryan, O., Zondervan-Zwijnenburg, M.,\n& Depaoli, S. (2017). A systematic review of Bayesian\narticles in psychology: The last 25 years.\nPsychological Methods, 22(2), 217–239. https://doi.org/10.1037/met0000100\n\n\nVan Fraassen, B. C. (1980). The scientific image.\nClarendon Press ; Oxford University Press.\n\n\nvan ’t Veer, A. E., & Giner-Sorolla, R. (2016). Pre-registration in\nsocial psychologyA discussion and suggested\ntemplate. Journal of Experimental Social Psychology,\n67, 2–12. https://doi.org/10.1016/j.jesp.2016.03.004\n\n\nVarkey, B. (2021). Principles of Clinical Ethics and\nTheir Application to Practice. Medical\nPrinciples and Practice: International Journal of the Kuwait University,\nHealth Science Centre, 30(1), 17–28. https://doi.org/10.1159/000509119\n\n\nVazire, S. (2017). Quality Uncertainty Erodes Trust in\nScience. Collabra: Psychology, 3(1), 1.\nhttps://doi.org/10.1525/collabra.74\n\n\nVazire, S., & Holcombe, A. O. (2022). Where Are the\nSelf-Correcting Mechanisms in Science?\nReview of General Psychology, 26(2), 212–223. https://doi.org/10.1177/10892680211033912\n\n\nVerschuere, B., Meijer, E. H., Jim, A., Hoogesteyn, K., Orthey, R.,\nMcCarthy, R. J., Skowronski, J. J., Acar, O. A., Aczel, B., Bakos, B.\nE., Barbosa, F., Baskin, E., Bègue, L., Ben-Shakhar, G., Birt, A. R.,\nBlatz, L., Charman, S. D., Claesen, A., Clay, S. L., … Yıldız, E.\n(2018). Registered Replication Report on\nMazar, Amir, and Ariely (2008).\nAdvances in Methods and Practices in Psychological Science,\n1(3), 299–317. https://doi.org/10.1177/2515245918781032\n\n\nViamonte, S. M., Ball, K. K., & Kilgore, M. (2006). 
A\nCost-Benefit Analysis of Risk-Reduction Strategies\nTargeted at Older Drivers. Traffic Injury\nPrevention, 7(4), 352–359. https://doi.org/10.1080/15389580600791362\n\n\nViechtbauer, W. (2010). Conducting meta-analyses in R with\nthe metafor package. Journal of Statistical Software, 36(3), 1–48.\nhttps://doi.org/10.18637/jss.v036.i03\n\n\nVohs, K. D., Schmeichel, B. J., Lohmann, S., Gronau, Q. F., Finley, A.\nJ., Ainsworth, S. E., Alquist, J. L., Baker, M. D., Brizi, A., Bunyi,\nA., Butschek, G. J., Campbell, C., Capaldi, J., Cau, C., Chambers, H.,\nChatzisarantis, N. L. D., Christensen, W. J., Clay, S. L., Curtis, J., …\nAlbarracín, D. (2021). A Multisite Preregistered Paradigmatic\nTest of the Ego-Depletion Effect. Psychological\nScience, 32(10), 1566–1581. https://doi.org/10.1177/0956797621989733\n\n\nVosgerau, J., Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2019).\n99% impossible: A valid, or falsifiable, internal\nmeta-analysis. Journal of Experimental Psychology: General,\n148(9), 1628–1639. https://doi.org/10.1037/xge0000663\n\n\nVuorre, M., & Curley, J. P. (2018). Curating Research\nAssets: A Tutorial on the Git Version Control\nSystem. Advances in Methods and Practices in Psychological\nScience, 1(2), 219–236. https://doi.org/10.1177/2515245918754826\n\n\nWacholder, S., Chanock, S., Garcia-Closas, M., El ghormli, L., &\nRothman, N. (2004). Assessing the Probability That a\nPositive Report is False: An\nApproach for Molecular Epidemiology Studies.\nJNCI Journal of the National Cancer Institute, 96(6),\n434–442. https://doi.org/10.1093/jnci/djh075\n\n\nWagenmakers, E.-J. (2007). A practical solution to the pervasive\nproblems of p values. Psychonomic Bulletin & Review,\n14(5), 779–804. https://doi.org/10.3758/BF03194105\n\n\nWagenmakers, E.-J., Beek, T., Dijkhoff, L., Gronau, Q. F., Acosta, A.,\nAdams, R. B., Albohn, D. N., Allard, E. S., Benning, S. D.,\nBlouin-Hudon, E.-M., Bulnes, L. C., Caldwell, T. L., Calin-Jageman, R.\nJ., Capaldi, C. A., Carfagno, N. S., Chasten, K. T., Cleeremans, A.,\nConnell, L., DeCicco, J. M., … Zwaan, R. A. (2016). Registered\nReplication Report: Strack,\nMartin, & Stepper (1988). Perspectives\non Psychological Science, 11(6), 917–928. https://doi.org/10.1177/1745691616674458\n\n\nWagenmakers, E.-J., Wetzels, R., Borsboom, D., & van der Maas, H. L.\nJ. (2011). Why psychologists must change the way they analyze their\ndata: The case of psi: Comment on Bem (2011). Journal\nof Personality and Social Psychology, 100(3), 426–432. https://doi.org/10.1037/a0022790\n\n\nWald, A. (1945). Sequential tests of statistical hypotheses. The\nAnnals of Mathematical Statistics, 16(2), 117–186.\nhttps://www.jstor.org/stable/2240273\n\n\nWaldron, S., & Allen, C. (2022). Not all pre-registrations are\nequal. Neuropsychopharmacology, 47(13), 2181–2183. https://doi.org/10.1038/s41386-022-01418-x\n\n\nWang, B., Zhou, Z., Wang, H., Tu, X. M., & Feng, C. (2019). The\np-value and model specification in statistics. General\nPsychiatry, 32(3), e100081. https://doi.org/10.1136/gpsych-2019-100081\n\n\nWason, P. C. (1960). On the failure to eliminate hypotheses in a\nconceptual task. Quarterly Journal of Experimental Psychology,\n12(3), 129–140. https://doi.org/10.1080/17470216008416717\n\n\nWassmer, G., & Brannath, W. (2016). Group\nSequential and Confirmatory Adaptive Designs\nin Clinical Trials. Springer International\nPublishing. https://doi.org/10.1007/978-3-319-32562-0\n\n\nWeinshall-Margel, K., & Shapard, J. (2011). 
Overlooked factors in\nthe analysis of parole decisions. Proceedings of the National\nAcademy of Sciences, 108(42), E833–E833. https://doi.org/10.1073/pnas.1110910108\n\n\nWellek, S. (2010). Testing statistical hypotheses of equivalence and\nnoninferiority (2nd ed.). CRC Press.\n\n\nWestberg, M. (1985). Combining Independent Statistical\nTests. Journal of the Royal Statistical Society. Series D\n(The Statistician), 34(3), 287–296. https://doi.org/10.2307/2987655\n\n\nWestfall, J., Kenny, D. A., & Judd, C. M. (2014). Statistical power\nand optimal design in experiments in which samples of participants\nrespond to samples of stimuli. Journal of Experimental Psychology:\nGeneral, 143(5), 2020–2045. https://doi.org/10.1037/xge0000014\n\n\nWestlake, W. J. (1972). Use of Confidence Intervals in\nAnalysis of Comparative Bioavailability\nTrials. Journal of Pharmaceutical Sciences,\n61(8), 1340–1341. https://doi.org/10.1002/JPS.2600610845\n\n\nWhitney, S. N. (2016). Balanced Ethics Review.\nSpringer International Publishing. https://doi.org/10.1007/978-3-319-20705-6\n\n\nWicherts, J. M. (2011). Psychology must learn a lesson from fraud case.\nNature, 480(7375), 7–7. https://doi.org/10.1038/480007a\n\n\nWicherts, J. M., Veldkamp, C. L. S., Augusteijn, H. E. M., Bakker, M.,\nvan Aert, R. C. M., & van Assen, M. A. L. M. (2016). Degrees of\nFreedom in Planning, Running,\nAnalyzing, and Reporting Psychological\nStudies: A Checklist to Avoid\np-Hacking. Frontiers in Psychology, 7. https://doi.org/10.3389/fpsyg.2016.01832\n\n\nWiebels, K., & Moreau, D. (2021). Leveraging Containers\nfor Reproducible Psychological Research. Advances in\nMethods and Practices in Psychological Science, 4(2),\n25152459211017853. https://doi.org/10.1177/25152459211017853\n\n\nWigboldus, D. H. J., & Dotsch, R. (2016). Encourage\nPlaying with Data and Discourage\nQuestionable Reporting Practices. Psychometrika,\n81(1), 27–32. https://doi.org/10.1007/s11336-015-9445-1\n\n\nWilliams, R. H., Zimmerman, D. W., & Zumbo, B. D. (1995). Impact of\nMeasurement Error on Statistical Power:\nReview of an Old Paradox. The Journal of\nExperimental Education, 63(4), 363–370. https://doi.org/10.1080/00220973.1995.9943470\n\n\nWilson, E. C. F. (2015). A Practical Guide to\nValue of Information Analysis.\nPharmacoEconomics, 33(2), 105–121. https://doi.org/10.1007/s40273-014-0219-x\n\n\nWilson VanVoorhis, C. R., & Morgan, B. L. (2007). Understanding\npower and rules of thumb for determining sample sizes. Tutorials in\nQuantitative Methods for Psychology, 3(2), 43–50. https://doi.org/10.20982/tqmp.03.2.p043\n\n\nWiner, B. J. (1962). Statistical principles in experimental\ndesign. New York: McGraw-Hill.\n\n\nWingen, T., Berkessel, J. B., & Englich, B. (2020). No\nReplication, No Trust? How Low\nReplicability Influences Trust in Psychology.\nSocial Psychological and Personality Science, 11(4),\n454–463. https://doi.org/10.1177/1948550619877412\n\n\nWiseman, R., Watt, C., & Kornbrot, D. (2019). Registered reports: An\nearly example and analysis. PeerJ, 7, e6232. https://doi.org/10.7717/peerj.6232\n\n\nWittes, J., & Brittain, E. (1990). The role of internal pilot\nstudies in increasing the efficiency of clinical trials. Statistics\nin Medicine, 9(1-2), 65–72. https://doi.org/10.1002/sim.4780090113\n\n\nWong, T. K., Kiers, H., & Tendeiro, J. (2022). On the\nPotential Mismatch Between the Function of the\nBayes Factor and Researchers’\nExpectations. Collabra: Psychology, 8(1),\n36357. https://doi.org/10.1525/collabra.36357\n\n\nWynants, L., Calster, B. 
V., Collins, G. S., Riley, R. D., Heinze, G.,\nSchuit, E., Bonten, M. M. J., Dahly, D. L., Damen, J. A., Debray, T. P.\nA., Jong, V. M. T. de, Vos, M. D., Dhiman, P., Haller, M. C., Harhay, M.\nO., Henckaerts, L., Heus, P., Kammer, M., Kreuzberger, N., … Smeden, M.\nvan. (2020). Prediction models for diagnosis and prognosis of covid-19:\nSystematic review and critical appraisal. BMJ, 369,\nm1328. https://doi.org/10.1136/bmj.m1328\n\n\nYarkoni, T., & Westfall, J. (2017). Choosing Prediction Over\nExplanation in Psychology: Lessons From\nMachine Learning. Perspectives on Psychological Science,\n12(6), 1100–1122. https://doi.org/10.1177/1745691617693393\n\n\nYuan, K.-H., & Maxwell, S. (2005). On the Post Hoc\nPower in Testing Mean Differences. Journal of\nEducational and Behavioral Statistics, 30(2), 141–167. https://doi.org/10.3102/10769986030002141\n\n\nZabell, S. L. (1992). R. A. Fisher and\nFiducial Argument. Statistical Science,\n7(3), 369–387. https://doi.org/10.1214/ss/1177011233\n\n\nZenko, M. (2015). Red Team: How to\nSucceed By Thinking Like the Enemy (1st\nedition). Basic Books.\n\n\nZumbo, B. D., & Hubley, A. M. (1998). A note on misconceptions\nconcerning prospective and retrospective power. Journal of the Royal\nStatistical Society: Series D (The Statistician), 47(2),\n385–388. https://doi.org/10.1111/1467-9884.00139" }, { "objectID": "changelog.html", "href": "changelog.html", "title": "Change Log", "section": "", - "text": "The current version of this textbook is 1.4.3.\nThis version has been compiled on January 10, 2024.\nThis version was generated from Git commit #5d126ce3. All version controlled changes can be found on GitHub.\nThis page documents the changes to the textbook that were more substantial than fixing a typo.\nUpdates\nJanuary 10, 2024:\nAdded the section ‘Deviating from a Preregistration’ in CH 13\nOctober 15, 2023:\nIncorporated extensive edits by Nick Brown in CH 4-6.\nSeptember 6, 2023:\nIncorporated extensive edits by Nick Brown in CH 1-3.\nAugust 27, 2023:\nAdd CH 16 on confirmation bias and organized skepticism. Add Bakan 1967 quote to CH 13.\nAugust 12, 2023:\nAdded section on why standardized effect sizes hinder the interpretation of effect sizes in CH 6. Added Spanos 1999 to CH 1. Split up the correct interpretation of p values for significant and non-significant results CH 1. Added new Statcheck study CH 12. Added Platt quote CH 5.\nJuly 21, 2023:\nAdded “Why Effect Sizes Selected for Significance are Inflated” section to CH 6, moved main part of “The Minimal Statistically Detectable Effect” from CH 8 to CH 6, replaced Greek characters by latex, added sentence bias is expected for papers that depend on main hypothesis test in CH 12.\nJuly 13, 2023:\nUpdated Open Questions in CH 1, 2, 3, 4, 6, 7, 8 and 9. Added a figure illustrating how confidence intervals become more narrow as N increases in CH 7.\nJuly 7, 2023:\nAdded this change log page.\nJune 12, 2023:\nAdded an updated figure from Carter & McCullough, 2014, in the chapter in bias detection, now generated from the raw data.\nMay 5, 2023:\nAdded the option to download a PDF and epub version of the book.\nMarch 19, 2023:\nUpdated CH 5 with new sections on falsification, severity, and risky predictions, and a new final section on verisimilitude.\nMarch 3, 2023:\nUpdated book to Quarto. Added webexercises to all chapters.\nFebruary 27, 2023:\nAdded a section “Dealing with Inconsistencies in Science” to CH 5.\nOctober 4, 2022:\nAdded CH 15 on research integrity." 
+ "text": "The current version of this textbook is 1.4.3.\nThis version has been compiled on February 01, 2024.\nThis version was generated from Git commit #bb858739. All version controlled changes can be found on GitHub.\nThis page documents the changes to the textbook that were more substantial than fixing a typo.\nUpdates\nJanuary 10, 2024:\nAdded the section ‘Deviating from a Preregistration’ in CH 13\nOctober 15, 2023:\nIncorporated extensive edits by Nick Brown in CH 4-6.\nSeptember 6, 2023:\nIncorporated extensive edits by Nick Brown in CH 1-3.\nAugust 27, 2023:\nAdd CH 16 on confirmation bias and organized skepticism. Add Bakan 1967 quote to CH 13.\nAugust 12, 2023:\nAdded section on why standardized effect sizes hinder the interpretation of effect sizes in CH 6. Added Spanos 1999 to CH 1. Split up the correct interpretation of p values for significant and non-significant results CH 1. Added new Statcheck study CH 12. Added Platt quote CH 5.\nJuly 21, 2023:\nAdded “Why Effect Sizes Selected for Significance are Inflated” section to CH 6, moved main part of “The Minimal Statistically Detectable Effect” from CH 8 to CH 6, replaced Greek characters by latex, added sentence bias is expected for papers that depend on main hypothesis test in CH 12.\nJuly 13, 2023:\nUpdated Open Questions in CH 1, 2, 3, 4, 6, 7, 8 and 9. Added a figure illustrating how confidence intervals become more narrow as N increases in CH 7.\nJuly 7, 2023:\nAdded this change log page.\nJune 12, 2023:\nAdded an updated figure from Carter & McCullough, 2014, in the chapter in bias detection, now generated from the raw data.\nMay 5, 2023:\nAdded the option to download a PDF and epub version of the book.\nMarch 19, 2023:\nUpdated CH 5 with new sections on falsification, severity, and risky predictions, and a new final section on verisimilitude.\nMarch 3, 2023:\nUpdated book to Quarto. Added webexercises to all chapters.\nFebruary 27, 2023:\nAdded a section “Dealing with Inconsistencies in Science” to CH 5.\nOctober 4, 2022:\nAdded CH 15 on research integrity." } ] \ No newline at end of file diff --git a/include/book.bib b/include/book.bib index d04193b..6c9eafd 100644 --- a/include/book.bib +++ b/include/book.bib @@ -151,7 +151,7 @@ @article{altman_statistics_1995 issn = {0959-8138, 1468-5833}, doi = {10.1136/bmj.311.7003.485}, urldate = {2018-02-23}, - abstract = {The non-equivalence of statistical significance and clinical importance has long been recognised, but this error of interpretation remains common. Although a significant result in a large study may sometimes not be clinically important, a far greater problem arises from misinterpretation of non-significant findings. By convention a P value greater than 5\% (P{$>$}0.05) is called ``not significant.'' Randomised controlled clinical trials that do not show a significant difference between the treatments being compared are often called ``negative.'' This term wrongly implies that the study has shown that there is no difference, whereas usually all that has been shown is an absence of evidence of a difference. These are quite different statements. The sample size of controlled trials is generally inadequate, with a consequent lack of power to detect real, and clinically worthwhile, differences in treatment. Freiman et al1 found that only {\ldots}}, + abstract = {The non-equivalence of statistical significance and clinical importance has long been recognised, but this error of interpretation remains common. 
Although a significant result in a large study may sometimes not be clinically important, a far greater problem arises from misinterpretation of non-significant findings. By convention a P value greater than 5\% (P{$>$}0.05) is called ``not significant.'' Randomised controlled clinical trials that do not show a significant difference between the treatments being compared are often called ``negative.'' This term wrongly implies that the study has shown that there is no difference, whereas usually all that has been shown is an absence of evidence of a difference. These are quite different statements. The sample size of controlled trials is generally inadequate, with a consequent lack of power to detect real, and clinically worthwhile, differences in treatment. Freiman et al1 found that only {\dots}}, copyright = {{\textcopyright} 1995 BMJ Publishing Group Ltd.}, langid = {english}, pmid = {7647644} @@ -1225,7 +1225,7 @@ @article{coles_multi-lab_2022 @article{colling_registered_2020, title = {Registered {{Replication Report}} on {{Fischer}}, {{Castel}}, {{Dodd}}, and {{Pratt}} (2003)}, - author = {Colling, Lincoln J. and Sz{\H u}cs, D{\'e}nes and De Marco, Damiano and Cipora, Krzysztof and Ulrich, Rolf and Nuerk, Hans-Christoph and Soltanlou, Mojtaba and Bryce, Donna and Chen, Sau-Chin and Schroeder, Philipp Alexander and Henare, Dion T. and Chrystall, Christine K. and Corballis, Paul M. and Ansari, Daniel and Goffin, Celia and Sokolowski, H. Moriah and Hancock, Peter J. B. and Millen, Ailsa E. and Langton, Stephen R. H. and Holmes, Kevin J. and Saviano, Mark S. and Tummino, Tia A. and Lindemann, Oliver and Zwaan, Rolf A. and Lukavsk{\'y}, Ji{\v r}{\'i} and Beckov{\'a}, Ad{\'e}la and Vranka, Marek A. and Cutini, Simone and Mammarella, Irene Cristina and Mulatti, Claudio and Bell, Raoul and Buchner, Axel and Mieth, Laura and R{\"o}er, Jan Philipp and Klein, Elise and Huber, Stefan and Moeller, Korbinian and Ocampo, Brenda and Lupi{\'a}{\~n}ez, Juan and {Ortiz-Tudela}, Javier and {de la Fuente}, Juanma and Santiago, Julio and Ouellet, Marc and Hubbard, Edward M. and Toomarian, Elizabeth Y. and Job, Remo and Treccani, Barbara and McShane, Blakeley B.}, + author = {Colling, Lincoln J. and Sz{\Hu}cs, D{\'e}nes and De Marco, Damiano and Cipora, Krzysztof and Ulrich, Rolf and Nuerk, Hans-Christoph and Soltanlou, Mojtaba and Bryce, Donna and Chen, Sau-Chin and Schroeder, Philipp Alexander and Henare, Dion T. and Chrystall, Christine K. and Corballis, Paul M. and Ansari, Daniel and Goffin, Celia and Sokolowski, H. Moriah and Hancock, Peter J. B. and Millen, Ailsa E. and Langton, Stephen R. H. and Holmes, Kevin J. and Saviano, Mark S. and Tummino, Tia A. and Lindemann, Oliver and Zwaan, Rolf A. and Lukavsk{\'y}, Ji{\v r}{\'i} and Beckov{\'a}, Ad{\'e}la and Vranka, Marek A. and Cutini, Simone and Mammarella, Irene Cristina and Mulatti, Claudio and Bell, Raoul and Buchner, Axel and Mieth, Laura and R{\"o}er, Jan Philipp and Klein, Elise and Huber, Stefan and Moeller, Korbinian and Ocampo, Brenda and Lupi{\'a}{\~n}ez, Juan and {Ortiz-Tudela}, Javier and {de la Fuente}, Juanma and Santiago, Julio and Ouellet, Marc and Hubbard, Edward M. and Toomarian, Elizabeth Y. 
and Job, Remo and Treccani, Barbara and McShane, Blakeley B.}, year = {2020}, month = jun, journal = {Advances in Methods and Practices in Psychological Science}, @@ -1875,7 +1875,7 @@ @article{fanelli_how_2009 issn = {1932-6203}, doi = {10.1371/journal.pone.0005738}, urldate = {2022-03-15}, - abstract = {The frequency with which scientists fabricate and falsify data, or commit other forms of scientific misconduct is a matter of controversy. Many surveys have asked scientists directly whether they have committed or know of a colleague who committed research misconduct, but their results appeared difficult to compare and synthesize. This is the first meta-analysis of these surveys. To standardize outcomes, the number of respondents who recalled at least one incident of misconduct was calculated for each question, and the analysis was limited to behaviours that distort scientific knowledge: fabrication, falsification, ``cooking'' of data, etc{\ldots} Survey questions on plagiarism and other forms of professional misconduct were excluded. The final sample consisted of 21 surveys that were included in the systematic review, and 18 in the meta-analysis. A pooled weighted average of 1.97\% (N = 7, 95\%CI: 0.86{\textendash}4.45) of scientists admitted to have fabricated, falsified or modified data or results at least once {\textendash}a serious form of misconduct by any standard{\textendash} and up to 33.7\% admitted other questionable research practices. In surveys asking about the behaviour of colleagues, admission rates were 14.12\% (N = 12, 95\% CI: 9.91{\textendash}19.72) for falsification, and up to 72\% for other questionable research practices. Meta-regression showed that self reports surveys, surveys using the words ``falsification'' or ``fabrication'', and mailed surveys yielded lower percentages of misconduct. When these factors were controlled for, misconduct was reported more frequently by medical/pharmacological researchers than others. Considering that these surveys ask sensitive questions and have other limitations, it appears likely that this is a conservative estimate of the true prevalence of scientific misconduct.}, + abstract = {The frequency with which scientists fabricate and falsify data, or commit other forms of scientific misconduct is a matter of controversy. Many surveys have asked scientists directly whether they have committed or know of a colleague who committed research misconduct, but their results appeared difficult to compare and synthesize. This is the first meta-analysis of these surveys. To standardize outcomes, the number of respondents who recalled at least one incident of misconduct was calculated for each question, and the analysis was limited to behaviours that distort scientific knowledge: fabrication, falsification, ``cooking'' of data, etc{\dots} Survey questions on plagiarism and other forms of professional misconduct were excluded. The final sample consisted of 21 surveys that were included in the systematic review, and 18 in the meta-analysis. A pooled weighted average of 1.97\% (N = 7, 95\%CI: 0.86{\textendash}4.45) of scientists admitted to have fabricated, falsified or modified data or results at least once {\textendash}a serious form of misconduct by any standard{\textendash} and up to 33.7\% admitted other questionable research practices. In surveys asking about the behaviour of colleagues, admission rates were 14.12\% (N = 12, 95\% CI: 9.91{\textendash}19.72) for falsification, and up to 72\% for other questionable research practices. 
Meta-regression showed that self reports surveys, surveys using the words ``falsification'' or ``fabrication'', and mailed surveys yielded lower percentages of misconduct. When these factors were controlled for, misconduct was reported more frequently by medical/pharmacological researchers than others. Considering that these surveys ask sensitive questions and have other limitations, it appears likely that this is a conservative estimate of the true prevalence of scientific misconduct.}, langid = {english}, keywords = {Deception,Medical journals,Medicine and health sciences,Metaanalysis,Scientific misconduct,Scientists,Social research,Surveys} } @@ -2413,7 +2413,7 @@ @article{gillon_medical_1994 issn = {0959-8138, 1468-5833}, doi = {10.1136/bmj.309.6948.184}, urldate = {2022-09-27}, - abstract = {The ``four principles plus scope'' approach provides a simple, accessible, and culturally neutral approach to thinking about ethical issues in health care. The approach, developed in the United States, is based on four common, basic prima facie moral commitments - respect for autonomy, beneficence, non-maleficence, and justice - plus concern for their scope of application. It offers a common, basic moral analytical framework and a common, basic moral language. Although they do not provide ordered rules, these principles can help doctors and other health care workers to make decisions when reflecting on moral issues that arise at work. Nine years ago the BMJ allowed me to introduce to its readers1 an approach to medical ethics developed by the Americans Beauchamp and Childress,2 which is based on four prima facie moral principles and attention to these principles' scope of application. Since then I have often been asked for a summary of this approach by doctors and other health care workers who find it helpful for organising their thoughts about medical ethics. This paper, based on the preface of a large multiauthor textbook on medical ethics,3 offers a brief account of this ``four principles plus scope'' approach. The four principles plus scope approach claims that whatever our personal philosophy, politics, religion, moral theory, or life stance, we will find no difficulty in committing ourselves to four prima facie moral principles plus a reflective concern about their scope of application. Moreover, these four principles, plus attention to their scope of application, encompass most of the moral issues that arise in health care. The four prima facie principles are respect for autonomy, beneficence, non-maleficence, and justice. ``Prima facie,'' a term introduced by the English philosopher W D Ross, means that the principle is binding unless it conflicts with another moral principle {\ldots}}, + abstract = {The ``four principles plus scope'' approach provides a simple, accessible, and culturally neutral approach to thinking about ethical issues in health care. The approach, developed in the United States, is based on four common, basic prima facie moral commitments - respect for autonomy, beneficence, non-maleficence, and justice - plus concern for their scope of application. It offers a common, basic moral analytical framework and a common, basic moral language. Although they do not provide ordered rules, these principles can help doctors and other health care workers to make decisions when reflecting on moral issues that arise at work. 
Nine years ago the BMJ allowed me to introduce to its readers1 an approach to medical ethics developed by the Americans Beauchamp and Childress,2 which is based on four prima facie moral principles and attention to these principles' scope of application. Since then I have often been asked for a summary of this approach by doctors and other health care workers who find it helpful for organising their thoughts about medical ethics. This paper, based on the preface of a large multiauthor textbook on medical ethics,3 offers a brief account of this ``four principles plus scope'' approach. The four principles plus scope approach claims that whatever our personal philosophy, politics, religion, moral theory, or life stance, we will find no difficulty in committing ourselves to four prima facie moral principles plus a reflective concern about their scope of application. Moreover, these four principles, plus attention to their scope of application, encompass most of the moral issues that arise in health care. The four prima facie principles are respect for autonomy, beneficence, non-maleficence, and justice. ``Prima facie,'' a term introduced by the English philosopher W D Ross, means that the principle is binding unless it conflicts with another moral principle {\dots}}, chapter = {Education and debate}, copyright = {{\textcopyright} 1994 BMJ Publishing Group Ltd.}, langid = {english}, @@ -3720,7 +3720,7 @@ @article{lakens_is_2023 @article{lakens_justify_2018, title = {Justify Your Alpha}, - author = {Lakens, Dani{\"e}l and Adolfi, Federico G. and Albers, Casper J. and Anvari, Farid and Apps, Matthew A. J. and Argamon, Shlomo E. and Baguley, Thom and Becker, Raymond B. and Benning, Stephen D. and Bradford, Daniel E. and Buchanan, Erin M. and Caldwell, Aaron R. and Calster, Ben and Carlsson, Rickard and Chen, Sau-Chin and Chung, Bryan and Colling, Lincoln J. and Collins, Gary S. and Crook, Zander and Cross, Emily S. and Daniels, Sameera and Danielsson, Henrik and DeBruine, Lisa and Dunleavy, Daniel J. and Earp, Brian D. and Feist, Michele I. and Ferrell, Jason D. and Field, James G. and Fox, Nicholas W. and Friesen, Amanda and Gomes, Caio and {Gonzalez-Marquez}, Monica and Grange, James A. and Grieve, Andrew P. and Guggenberger, Robert and Grist, James and Harmelen, Anne-Laura and Hasselman, Fred and Hochard, Kevin D. and Hoffarth, Mark R. and Holmes, Nicholas P. and Ingre, Michael and Isager, Peder M. and Isotalus, Hanna K. and Johansson, Christer and Juszczyk, Konrad and Kenny, David A. and Khalil, Ahmed A. and Konat, Barbara and Lao, Junpeng and Larsen, Erik Gahner and Lodder, Gerine M. A. and Lukavsk{\'y}, Ji{\v r}{\'i} and Madan, Christopher R. and Manheim, David and Martin, Stephen R. and Martin, Andrea E. and Mayo, Deborah G. and McCarthy, Randy J. and McConway, Kevin and McFarland, Colin and Nio, Amanda Q. X. and Nilsonne, Gustav and Oliveira, Cilene Lino and Xivry, Jean-Jacques Orban and Parsons, Sam and Pfuhl, Gerit and Quinn, Kimberly A. and Sakon, John J. and Saribay, S. Adil and Schneider, Iris K. and Selvaraju, Manojkumar and Sjoerds, Zsuzsika and Smith, Samuel G. and Smits, Tim and Spies, Jeffrey R. and Sreekumar, Vishnu and Steltenpohl, Crystal N. and Stenhouse, Neil and {\'S}wi{\k{a}}tkowski, Wojciech and Vadillo, Miguel A. and Assen, Marcel A. L. M. and Williams, Matt N. and Williams, Samantha E. and Williams, Donald R. and Yarkoni, Tal and Ziano, Ignazio and Zwaan, Rolf A.}, + author = {Lakens, Dani{\"e}l and Adolfi, Federico G. and Albers, Casper J. 
and Anvari, Farid and Apps, Matthew A. J. and Argamon, Shlomo E. and Baguley, Thom and Becker, Raymond B. and Benning, Stephen D. and Bradford, Daniel E. and Buchanan, Erin M. and Caldwell, Aaron R. and Calster, Ben and Carlsson, Rickard and Chen, Sau-Chin and Chung, Bryan and Colling, Lincoln J. and Collins, Gary S. and Crook, Zander and Cross, Emily S. and Daniels, Sameera and Danielsson, Henrik and DeBruine, Lisa and Dunleavy, Daniel J. and Earp, Brian D. and Feist, Michele I. and Ferrell, Jason D. and Field, James G. and Fox, Nicholas W. and Friesen, Amanda and Gomes, Caio and {Gonzalez-Marquez}, Monica and Grange, James A. and Grieve, Andrew P. and Guggenberger, Robert and Grist, James and Harmelen, Anne-Laura and Hasselman, Fred and Hochard, Kevin D. and Hoffarth, Mark R. and Holmes, Nicholas P. and Ingre, Michael and Isager, Peder M. and Isotalus, Hanna K. and Johansson, Christer and Juszczyk, Konrad and Kenny, David A. and Khalil, Ahmed A. and Konat, Barbara and Lao, Junpeng and Larsen, Erik Gahner and Lodder, Gerine M. A. and Lukavsk{\'y}, Ji{\v r}{\'i} and Madan, Christopher R. and Manheim, David and Martin, Stephen R. and Martin, Andrea E. and Mayo, Deborah G. and McCarthy, Randy J. and McConway, Kevin and McFarland, Colin and Nio, Amanda Q. X. and Nilsonne, Gustav and Oliveira, Cilene Lino and Xivry, Jean-Jacques Orban and Parsons, Sam and Pfuhl, Gerit and Quinn, Kimberly A. and Sakon, John J. and Saribay, S. Adil and Schneider, Iris K. and Selvaraju, Manojkumar and Sjoerds, Zsuzsika and Smith, Samuel G. and Smits, Tim and Spies, Jeffrey R. and Sreekumar, Vishnu and Steltenpohl, Crystal N. and Stenhouse, Neil and {\'S}wi{\k a}tkowski, Wojciech and Vadillo, Miguel A. and Assen, Marcel A. L. M. and Williams, Matt N. and Williams, Samantha E. and Williams, Donald R. and Yarkoni, Tal and Ziano, Ignazio and Zwaan, Rolf A.}, year = {2018}, month = feb, journal = {Nature Human Behaviour}, @@ -3822,7 +3822,7 @@ @article{lakens_simulation-based_2021 issn = {2515-2459}, doi = {10.1177/2515245920951503}, urldate = {2021-03-23}, - abstract = {Researchers often rely on analysis of variance (ANOVA) when they report results of experiments. To ensure that a study is adequately powered to yield informative results with an ANOVA, researchers can perform an a priori power analysis. However, power analysis for factorial ANOVA designs is often a challenge. Current software solutions do not allow power analyses for complex designs with several within-participants factors. Moreover, power analyses often need {$\eta$}2{$\mathsl{p}\eta$}p2{$<$}math display="inline" id="math1-2515245920951503" overflow="scroll" altimg="eq-00001.gif"{$><$}mrow{$><$}msubsup{$><$}mi mathvariant="normal"{$>\eta<$}/mi{$><$}mi{$>$}p{$<$}/mi{$><$}mn{$>$}2{$<$}/mn{$><$}/msubsup{$><$}/mrow{$><$}/math{$>$} or Cohen's f as input, but these effect sizes are not intuitive and do not generalize to different experimental designs. We have created the R package Superpower and online Shiny apps to enable researchers without extensive programming experience to perform simulation-based power analysis for ANOVA designs of up to three within- or between-participants factors. Predicted effects are entered by specifying means, standard deviations, and, for within-participants factors, the correlations. The simulation provides the statistical power for all ANOVA main effects, interactions, and individual comparisons. 
The software can plot power across a range of sample sizes, can control for multiple comparisons, and can compute power when the homogeneity or sphericity assumption is violated. This Tutorial demonstrates how to perform a priori power analysis to design informative studies for main effects, interactions, and individual comparisons and highlights important factors that determine the statistical power for factorial ANOVA designs.}, + abstract = {Researchers often rely on analysis of variance (ANOVA) when they report results of experiments. To ensure that a study is adequately powered to yield informative results with an ANOVA, researchers can perform an a priori power analysis. However, power analysis for factorial ANOVA designs is often a challenge. Current software solutions do not allow power analyses for complex designs with several within-participants factors. Moreover, power analyses often need {$\eta$}2{$p\eta$}p2{$<$}math display="inline" id="math1-2515245920951503" overflow="scroll" altimg="eq-00001.gif"{$><$}mrow{$><$}msubsup{$><$}mi mathvariant="normal"{$>\eta<$}/mi{$><$}mi{$>$}p{$<$}/mi{$><$}mn{$>$}2{$<$}/mn{$><$}/msubsup{$><$}/mrow{$><$}/math{$>$} or Cohen's f as input, but these effect sizes are not intuitive and do not generalize to different experimental designs. We have created the R package Superpower and online Shiny apps to enable researchers without extensive programming experience to perform simulation-based power analysis for ANOVA designs of up to three within- or between-participants factors. Predicted effects are entered by specifying means, standard deviations, and, for within-participants factors, the correlations. The simulation provides the statistical power for all ANOVA main effects, interactions, and individual comparisons. The software can plot power across a range of sample sizes, can control for multiple comparisons, and can compute power when the homogeneity or sphericity assumption is violated. This Tutorial demonstrates how to perform a priori power analysis to design informative studies for main effects, interactions, and individual comparisons and highlights important factors that determine the statistical power for factorial ANOVA designs.}, langid = {english}, keywords = {ANOVA,hypothesis test,open materials,power analysis,sample-size justification} } @@ -7165,7 +7165,7 @@ @article{weinshall-margel_overlooked_2011 issn = {0027-8424, 1091-6490}, doi = {10.1073/pnas.1110910108}, urldate = {2022-02-14}, - abstract = {Danziger et al. (1) concluded that meal breaks taken by Israeli parole boards influence the boards' decisions. This conclusion depends on the order of cases being random or at least exogenous to the timing of meal breaks. We examined data provided by the authors and obtained additional data from 12 hearing days ( n = 227 decisions).* We also interviewed three attorneys, a parole panel judge, and five personnel at Israeli Prison Services and Court Management, learning that case ordering is not random and that several factors contribute to the downward trend in prisoner success between meal breaks. The most important is that the board tries to complete all cases from one prison before it takes a break and to start with another {\ldots} [{$\carriagereturn$}][1]1To whom correspondence should be addressed. E-mail: johnshapard\{at\}gmail.com. [1]: \#xref-corresp-1-1}, + abstract = {Danziger et al. (1) concluded that meal breaks taken by Israeli parole boards influence the boards' decisions. 
This conclusion depends on the order of cases being random or at least exogenous to the timing of meal breaks. We examined data provided by the authors and obtained additional data from 12 hearing days ( n = 227 decisions).* We also interviewed three attorneys, a parole panel judge, and five personnel at Israeli Prison Services and Court Management, learning that case ordering is not random and that several factors contribute to the downward trend in prisoner success between meal breaks. The most important is that the board tries to complete all cases from one prison before it takes a break and to start with another {\dots} [↵][1]1To whom correspondence should be addressed. E-mail: johnshapard\{at\}gmail.com. [1]: \#xref-corresp-1-1}, chapter = {Letter}, langid = {english}, pmid = {21987788}