# Publication Bias
![](_figs/publicationbias.jpg)
In the last chapters, we have shown you how to **pool effects** in meta-analysis, choose the **right pooling model**, assess the **heterogeneity** of your effect estimate, and determine sources of heterogeneity through **outlier, influence, and subgroup analyses**.
Nevertheless, even the most thoroughly conducted meta-analysis can only work with the **study material at hand**. An issue commonly discussed in research, however, is the **file-drawer** or **publication bias** problem, which states that a study with a **high effect size is more likely to be published** than a study with a low effect size [@rothstein2006publication].
Such **missing studies** with low effect sizes, it is assumed, never get published and therefore cannot be integrated into our meta-analysis. This leads to **publication bias**: the pooled effect we estimate in our meta-analysis might be higher than the **true effect size** because we could not consider the missing studies with lower effects, simply because they were never published.
Although this practice is gradually changing [@nelson2018psychology], whether a study is published or not heavily depends on the **statistical significance** ($p<0.05$) of its results [@dickersin2005publication]. For any sample size, a result is more likely to become **statistically significant** if its **effect size is high**. This is particularly true for **small studies**, for which very large effect sizes are needed to attain a statistically significant result.
In the [following chapter](#smallstudyeffects), we will describe the **idea behind statistical models for publication bias** in further depth. We call these concepts and methods **small sample bias methods**, as it is small studies that they mostly focus on. These methods assume that publication bias is primarily driven by **effect size**, and that researchers **immediately put a study in the file drawer once its results are not significant**.
Recently, it has been argued that these **assumptions may not be true**, and that publication bias is mostly caused by **significance levels** and **p-hacking** [@simonsohn2014p]. An alternative method called **P-Curve** has therefore been suggested to examine **publication bias**. We will present this method in the [last chapter](#pcurve) of this section.
```{block,type='rmdinfo'}
**Which method should I use for my meta-analysis?**
While recent research suggests that the conventional **small sample bias methods** may have substantial **limitations**, and that **P-Curve** may be able to estimate the true effect with less bias [@simonsohn2014p;@simonsohn2015better;@simonsohn2014pb], please note that both methods are based on different **theoretical assumptions** about the origin of publication bias. As we cannot ultimately decide which assumption is the **"true"** one in a specific research field, and, in practice, **the true effect is unknown when doing a meta-analysis**, we argue that you may use **both methods** and compare results as **sensitivity analyses** [@harrer2019internet].
**P-Curve** was developed with **full-blown experimental psychological research in mind**, in which researchers often have **high degrees of "researcher freedom"** [@simmons2011false] in deleting outliers and performing statistical tests on their data.
We argue that this looks slightly different for **clinical psychology** and the medical field, where researchers conduct **randomized controlled trials** with a clear **primary outcome**: the difference between the control and the intervention group after the treatment. While it is also true for **medicine and clinical psychology that statistical significance plays an important role**, the **effect size** of an intervention is often of greater interest, as **treatments are often compared in terms of their treatment effects** in this field. Furthermore, best practice for randomized controlled trials is to perform **intention-to-treat** analyses, in which all data collected in a trial has to be considered, giving researchers less room to "play around" with their data and perform p-hacking. While we certainly do not want to insinuate that **outcome research in clinical psychology** is free from p-hacking and bad data analysis practices, this should be seen as a **caveat**: the assumptions of the small sample bias methods may be more adequate for clinical psychology than for other fields within psychology, especially when **the risk of bias of each study is also taken into account**.
Facing this uncertainty, we think that conducting both analyses and reporting them in our research paper may be the most adequate approach until meta-scientific research gives us more certainty about which **assumption actually best reflects the field of clinical psychology**.
```
## Small Sample Bias Methods {#smallstudyeffects}
The **small sample bias methods** we present here have been the conventional approach for many years. Various methods to assess and control for publication bias have been developed over time, but we will focus only on the most important ones here.
```{block,type='rmdinfo'}
**The model behind small sample bias methods**
According to Borenstein et al. [@borenstein2011], the models behind the most common small sample bias methods have these core **assumptions**:
1. Because they involve a large commitment of resources and time, **large studies are likely to get published**, whether the results are significant or not.
2. Moderately sized studies are at **greater risk of not being published**, but with a moderate sample size even moderately sized effects are likely to become significant, which means that only some studies will be missing.
3. Small studies are **at greatest risk** for being non-significant, and thus being missing. Only small studies with a very large effect size become significant, and will be found in the published literature.
In accordance with these assumptions, the methods we present here particularly focus **on small studies with small effect sizes, and whether they are missing**.
```
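To build some intuition for these assumptions, the following purely illustrative simulation (not part of the original analysis; all numbers are arbitrary assumptions) generates studies around a small true effect, lets mainly the significant ones get "published", and then draws the resulting funnel plot using the `funnel()` function introduced in the next section:
```{r, eval=FALSE}
library(meta)
set.seed(123)
k      <- 200                                # number of simulated studies
n      <- sample(20:300, k, replace = TRUE)  # per-group sample size of each study
d.true <- 0.2                                # assumed small "true" effect (SMD)
se     <- sqrt(2/n + d.true^2/(4*n))         # approximate standard error of an SMD
d      <- rnorm(k, mean = d.true, sd = se)   # observed effect sizes
p      <- 2 * pnorm(-abs(d/se))              # two-sided p-values
pub    <- p < 0.05 | runif(k) < 0.3          # significant results (plus some others) get "published"
m.sim  <- metagen(TE = d[pub], seTE = se[pub], sm = "SMD")
funnel(m.sim, xlab = "Simulated SMD")        # asymmetry emerges from selective publication
```
Because mostly the small studies with large (and therefore significant) effects survive the "publication" filter, the bottom-left corner of the simulated funnel plot ends up empty, which is exactly the pattern the assumptions above describe.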
### Funnel plots
The best way to visualize whether **small studies with small effect sizes are missing** is through **funnel plots**.
We can generate a funnel plot for our `m.hksj` meta-analysis output using the `funnel()` function in `meta`.
```{r,echo=FALSE,message=FALSE}
library(meta)
load("_data/Meta_Analysis_Data.RData")
madata <- Meta_Analysis_Data
m.hksj <- metagen(TE,
                  seTE,
                  data = madata,
                  studlab = paste(Author),
                  comb.fixed = FALSE,
                  comb.random = TRUE,
                  method.tau = "SJ",
                  hakn = TRUE,
                  prediction = TRUE,
                  sm = "SMD")
```
```{r}
funnel(m.hksj,xlab = "Hedges' g")
```
The **funnel plot** basically consists of a **funnel** and two **axes**: the y-axis shows the **Standard Error** $SE$ of each study, with larger studies (which have a smaller $SE$) plotted at the top of the y-axis; the x-axis shows the **effect size** of each study.
Given our assumptions, and in the case when there is **no publication bias**, all studies would lie **symmetrically around our pooled effect size (the dashed line)** within the shape of the funnel. When **publication bias is present**, we would expect the funnel to look asymmetrical, because only the small studies with a large effect size get published, **while small studies without a significant, large effect are missing**.
We see from the plot that in the case of our meta-analysis `m.hksj`, the latter is probably true. The plot is highly asymmetrical, with precisely the small studies with low effect sizes missing in the bottom-left corner of our plot.
We can also display the name of each study using the `studlab` parameter.
```{r}
funnel(m.hksj,xlab = "g",studlab = TRUE)
```
Here, we see that the asymmetry is primarily driven by **three studies with high effects but small sample sizes** in the bottom-right corner. Interestingly, two of these studies are the ones we also detected in our [outlier](#outliers) and [influence](#influenceanalyses) analyses.
An even better way to inspect the funnel plot is through **contour-enhanced funnel plots**, which help to distinguish publication bias from other forms of asymmetry [@peters2008contour]. Contour-enhanced funnel plots include colors signifying the significance level into which the effect size of each study falls. We can plot such funnels using this code:
```{r,message=FALSE,warning=FALSE,eval=FALSE}
funnel(m.hksj, xlab = "Hedges' g",
       contour = c(0.95, 0.975, 0.99),
       col.contour = c("darkblue", "blue", "lightblue"))
legend(1.4, 0, c("p < 0.05", "p < 0.025", "p < 0.01"),
       bty = "n", fill = c("darkblue", "blue", "lightblue"))
```
```{r, echo=FALSE, fig.width=7}
library(png)
library(grid)
img <- readPNG("_figs/funnel.PNG")
grid.raster(img)
```
We can see in the plot that while **some studies have statistically significant effect sizes** (blue background), others do not (white background). We also see that while the moderately sized studies are a mix of significant and non-significant results (with slightly more being significant), the asymmetry is much more pronounced among the small studies. This gives us a hint that publication bias might indeed be present in our analysis.
### Testing for funnel plot asymmetry using Egger's test
**Egger's test of the intercept** [@egger1997bias] quantifies the funnel plot asymmetry and performs a statistical test.
We have prepared a function called `eggers.test` for you, which allows you to perform Egger's test of the intercept in *R*. The function is a wrapper for the `metabias` function in `meta`. The `eggers.test` function is part of the [`dmetar`](#dmetar) package. If you have the package installed already, you have to load it into your library first.
```{r, eval=FALSE}
library(dmetar)
```
If you don't want to use the `dmetar` package, you can find the source code for this function [here](https://raw.githubusercontent.com/MathiasHarrer/dmetar/master/R/eggers.test.R). In this case, *R* doesn't know this function yet, so we have to let *R* learn it by **copying and pasting** the code **in its entirety** into the **console** in the bottom left pane of RStudio, and then hit **Enter ⏎**. The function requires the `meta` package to work.
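Alternatively, instead of copying and pasting, you can usually read the function straight from that URL with *R*'s `source()` function (a minimal sketch; this assumes an active internet connection):
```{r, eval=FALSE}
# Load the eggers.test() function directly from the raw GitHub file
source("https://raw.githubusercontent.com/MathiasHarrer/dmetar/master/R/eggers.test.R")
```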
```{r, echo=FALSE}
source("dmetar/eggers.test.R")
```
Now we can use the `eggers.test` function. We only have to specify our meta-analysis output `m.hksj` as the parameter `x` the function should use.
```{r}
eggers.test(x = m.hksj)
```
The function returns the `intercept` along with its confidence interval. We can see that the `p-value` of Egger's test is **significant** ($p<0.05$), which means that there is substantial asymmetry in the funnel plot. This asymmetry could have been caused by publication bias.
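Because `eggers.test` is only a wrapper, you can also call the underlying `metabias` function from `meta` directly. The following is a minimal sketch of such a call; `method.bias = "linreg"` requests the classic Egger regression test.
```{r, eval=FALSE}
# Run Egger's regression test directly via meta's metabias() function,
# which eggers.test() wraps
metabias(m.hksj, method.bias = "linreg")
```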
### Duval & Tweedie's trim-and-fill procedure {#dant}
**Duval & Tweedie's trim-and-fill procedure** [@duval2000trim] is also based on the funnel plot and its symmetry/asymmetry. When **Egger's test is significant**, we can use this method to estimate what **the actual effect size would be had the "missing" small studies been published**. The procedure **imputes** missing studies into the funnel plot until symmetry is reached again.
```{block,type='rmdinfo'}
**The trim-and-fill procedure includes the following five steps** [@schwarzer2015meta]:
1. Estimating the number of studies in the outlying (right) part of the funnel plot
2. Removing (trimming) these effect sizes and pooling the results with the remaining effect sizes
3. This pooled effect is then taken as the center of all effect sizes
4. For each trimmed/removed study, an additional study is imputed, mirroring the effect of the study on the left side of the funnel plot
5. Pooling the results with the imputed studies and the trimmed studies included
```
The **trim-and-fill procedure** can be performed using the `trimfill` function in `meta`, specifying our meta-analysis output.
```{r}
trimfill(m.hksj)
```
We see that the procedure identified and trimmed **eight studies** (`with 8 added studies`). The overall effect estimated by the procedure is $g = 0.34$.
Let's compare this to our initial results.
```{r}
m.hksj$TE.random
```
The initial pooled effect size was $g = 0.59$, which is substantially larger than the bias-corrected effect. In our case, if we assume that **publication bias** was a problem in the analysis, the **trim-and-fill procedure** suggests that our initial results were **overestimated** due to publication bias, and that the "true" effect when controlling for selective publication might be $g = 0.34$ rather than $g = 0.59$.
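If you want to see both numbers side by side, a minimal sketch like the following (relying on the `TE.random` slot that `meta` stores in both objects) can be used:
```{r, eval=FALSE}
# Compare the original pooled effect with the trim-and-fill-corrected estimate
round(c(original = m.hksj$TE.random,
        trimfill = trimfill(m.hksj)$TE.random), 2)
```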
If we store the results of the `trimfill` function in an **object**, we can also create **funnel plots including the imputed studies**.
```{r}
m.hksj.trimfill<-trimfill(m.hksj)
funnel(m.hksj.trimfill,xlab = "Hedges' g")
```
## *P*-Curve {#pcurve}
In the last chapter, we showed how you can apply **Egger's test of the intercept** and **Duval & Tweedie's trim-and-fill procedure**, and inspect **funnel plots** in *R*.
As we have mentioned before, recent research has shown that the assumptions of the small sample bias methods may be inaccurate in many cases. The **Duval & Tweedie trim-and-fill procedure** in particular has been shown to be prone to providing **inaccurate effect size estimates** [@simonsohn2014pb].
**$P$-curve analysis** has been proposed as an alternative way to assess publication bias and estimate the true effect behind our collected data. $P$-curve assumes that publication bias is not primarily generated because researchers do not publish non-significant results, but because they "play" around with their data (e.g., selectively removing outliers, choosing different outcomes, controlling for different variables) until a non-significant finding becomes significant. This (bad) practice is called **$p$-hacking**, and has been shown to be very frequent among researchers [@head2015extent].
<br></br>
```{block,type='rmdinfo'}
**The idea behind $P$-Curve**
<iframe width="700" height="415" src="https://www.youtube.com/embed/V7pvYLZkcK4" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe>
```
<br></br>
### Performing a *p*-curve analysis
To conduct a $p$-curve analysis, you can use the `pcurve` function we prepared for you. This function is part of the [`dmetar`](#dmetar) package. If you have the package installed already, you have to load it into your library first.
```{r, eval=FALSE}
library(dmetar)
```
If you don't want to use the `dmetar` package, you can find the source code for this function [here](https://raw.githubusercontent.com/MathiasHarrer/dmetar/master/R/pcurve2.R). In this case, *R* doesn't know this function yet, so we have to let *R* learn it by **copying and pasting** the code **in its entirety** into the **console** in the bottom left pane of RStudio, and then hit **Enter ⏎**. The function requires the `stringr` and `poibin` packages to work.
**For this function, the following parameters need to be specified:**
```{r,echo=FALSE}
i <- c("x", "effect.estimation", "N", "dmin", "dmax")
ii <- c("The meta-analysis results object generated by meta functions.",
        "Logical. Should the true effect size underlying the p-curve be estimated? If set to TRUE, a vector containing the total sample size for each study must be provided for N. FALSE by default.",
        "A numeric vector of same length as the number of effect sizes included in x specifying the total sample size N corresponding to each effect. Only needed if effect.estimation = TRUE.",
        "If effect.estimation = TRUE: lower limit for the effect size (d) space in which the true effect size should be searched. Must be greater or equal to 0. Default is 0.",
        "If effect.estimation = TRUE: upper limit for the effect size (d) space in which the true effect size should be searched. Must be greater than 0. Default is 1.")
ms <- data.frame(i, ii)
colnames(ms) <- c("Parameter", "Description")
knitr::kable(ms)
```
First, let's use the `pcurve` function with `effect.estimation` set to `FALSE`. As this is the default, we only have to plug the `m.hksj` meta-analysis object into the function to generate the $p$-curve.
```{r, echo=FALSE, message=FALSE, error=FALSE}
source("dmetar/pcurve2.R")
library(poibin)
library(stringr)
library(dmetar)
```
```{r, fig.width=7, fig.height=7, fig.align="center", eval=FALSE}
pcurve(m.hksj)
```
```{r, fig.width=7, fig.height=7, fig.align="center", echo=FALSE}
dmetar::pcurve(m.hksj)
```
**The function produces a large output, so let us go through it one by one:**
* **P-Curve plot**. The figure shows the $p$-curve for your results (in blue). On the bottom, you can also find the number of effect sizes with $p<0.05$ which were included in the analysis. Results of the Right-Skewness and Flatness test are also displayed (see Results).
* **P-curve analysis**. This section of the output provides general details about the studies used for the $p$-curve analysis, such as the total number of studies in the meta-analysis, the number of significant effect sizes used for the $p$-curve analysis, and the number of studies with effect sizes for which $p<0.025$ (the so-called "half-curve").
* **Results**. This section displays the results of the Right-Skewness test and the Flatness test. The Right-Skewness test analyzes whether the $p$-curve resulting from your data is significantly right-skewed, which would indicate that there is a "true" effect behind your data. The Flatness test analyzes whether the $p$-curve is flat, which could indicate that the power is insufficient, or that there is no "true" effect behind your data. For both tests, results are reported for the full $p$-curve (all values for which $p<0.05$) and for the half $p$-curve (all values for which $p<0.025$).
* **Power Estimate**. This line displays the estimated power of the studies in your analysis, and the confidence interval.
* **Evidential value**. In terms of interpretation, this section is the most important one of the output. It shows if $P$-curve estimates that evidential value (a "true" effect) is present in the analysis or not. This interpretation is done automatically based on the values of the Flatness and Right-Skewness test (you can read the [documentation](https://dmetar.protectlab.org/reference/pcurve.html) of the function for more information on how this is done). There are two types of information provided: ($i$) if evidential value is present, or ($ii$) if it is absent or inadequate. This may look a little peculiar at first, because we would expect a simple yes/no interpretation concerning the evidential value of our analysis. However, it is possible that both `Evidential value present` and `Evidential value absent/inadequate` result in a `no` decision. This basically means that while the presence of evidential value could not be ascertained, it could also not be verified that evidential value is indeed absent or inadequate (for example because a very small effect exists).
For our `m.hksj` object, we see that 11 studies were included in the analysis, of which 10 had a $p$-value lower than 0.025. We also see that the power of the analysis was 84% (95%CI: 62.7%-94.6%).
We are provided with the interpretation that **evidential value is present**, and that evidential value is **not absent or inadequate**. This means that $P$-curve estimates that there is a "true" effect size behind our findings, and that the results are not the product of publication bias and $p$-hacking alone.
<br><br>
### Estimating the "true" effect
The `pcurve` function also allows you to obtain $P$-curve's estimate of the "true" effect size underlying your data (much like the Duval & Tweedie trim-and-fill procedure we described [before](#dant)). However, there is one important piece of information we need to do this: the **total sample size of each study**. Thankfully, this information is usually reported in most research publications.
Let's assume we have stored a variable called `N.m.hksj` in *R*, which contains the total sample size of each study in `m.hksj` (in the same order as the studies in `m.hksj`). Let's have a look at `N.m.hksj` first.
```{r, echo=FALSE}
N.m.hksj = c(105, 161, 60, 37, 141, 82, 97, 61, 200, 79, 124, 25, 166, 59, 201, 95, 166, 144)
```
```{r}
N.m.hksj
```
With this information, we can extend the `pcurve` call from before with some additional arguments. First, we have to set `effect.estimation` to `TRUE` to estimate the effect size. Second, we have to provide the function with the study sample sizes `N.m.hksj`. Lastly, we can specify the range of effect sizes in which the function should search for the true effect (expressed as Cohen's $d$) through `dmin` and `dmax`. We will search for the effect between $d=0.0$ and $d=1.0$. The function returns the same output as before, and one additional plot:
```{r, eval=FALSE}
pcurve(m.hksj, effect.estimation = TRUE, N = N.m.hksj, dmin = 0, dmax = 1)
```
```{r, echo=FALSE, fig.width=6,fig.height=6, fig.align="center"}
library(png)
library(grid)
img <- readPNG("_figs/pcurve_es.png")
grid.raster(img)
```
As can be seen in the plot, the function provides an effect estimate of $d=0.48$. This effect mirrors the one we found for `m.hksj` when using the fixed-effect model, while the effect size for the random-effects model ($g=0.59$) was somewhat higher.
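If you want to compare this estimate with the pooled effects stored in the `meta` object itself, a minimal sketch using the `TE.fixed` and `TE.random` slots looks like this:
```{r, eval=FALSE}
# Pooled effects stored in the meta object, for comparison with the p-curve estimate
round(c(fixed.effect   = m.hksj$TE.fixed,
        random.effects = m.hksj$TE.random), 2)
```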
```{block,type='rmdachtung'}
It should be noted that this chapter should only be seen as **an introduction to $P$-curve**, and should not be seen as comprehensive. Simonsohn et al. [@simonsohn2015better] also stress that $P$-curve should only be used for **outcome data which was actually of interest to the authors of the specific article**, because those are the ones likely to get $p$-hacked. They also ask meta-researchers to provide a detailed table in which the reported results for each outcome used in the $P$-curve analysis are documented (a guide can be found [here](http://p-curve.com/guide.pdf)).
It has also been shown that $P$-curve's effect estimates are **not robust** when the heterogeneity of a meta-analysis is high ($I^2$ > 50%). Van Aert et al. [@van2016conducting] propose not to determine the "true" effect using $P$-curve when heterogeneity is high (defined as $I^2$ > 50%). The `pcurve` function therefore automatically prints a warning message at the end of the output if a true effect is estimated and heterogeneity is considerable. A possible **solution** for this problem might be to reduce the overall heterogeneity through outlier removal, or to conduct the $p$-curve analysis in more homogeneous subgroups.
```
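To check whether this caveat applies to your own data before interpreting the $P$-curve effect estimate, you can inspect the heterogeneity stored in your `meta` object; a minimal sketch (in `meta` objects, `I2` is stored as a proportion, e.g., 0.50 = 50%):
```{r, eval=FALSE}
# Inspect between-study heterogeneity before relying on the p-curve estimate
m.hksj$I2
```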