forked from AuthorCarpentry/DTSTUDENT2018
-
Notifications
You must be signed in to change notification settings - Fork 1
/
DMP_final.Rmd
289 lines (176 loc) · 14.9 KB
/
DMP_final.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
---
title: "Data Management Plan for the Study _Do Reputable Open Access Journals Require Open Data Sharing?_"
author:
- Principal Investigator - Gail Clement, `r params$institution`
- Co-Investigator - Thomas Morell, Caltech
date: "`r format(Sys.Date(), '%A, %B %d, %Y')`"
output:
html_document:
code_folding: hide
css: custom.css
number_sections: yes
toc: yes
toc_depth: 2
toc_float: yes
theme: readable
highlight: kate
word_document:
toc: yes
toc_depth: '2'
params:
institution:
choices:
- International Centre for Theoretical Physics
- Trieste Polytechnic Institute
- CODATA-RDA Summer School
- Elsevier Inc
- California Institute of Technology
input: select
label: 'Institution:'
value: California Institute of Technology
bibliography:
- oajournals.bib
- packages.bib
---
*****
```{r setup, include=FALSE, echo = FALSE}
# Load packages used in this exercise
library(tidyverse)
library(rmarkdown)
library(DT)
```
```{r global_options, include=FALSE}
# Set global options for chunk options
knitr::opts_chunk$set(echo=TRUE, warning=FALSE, message=FALSE, cache=FALSE)
```
```{r add_dataset}
# Add the dataset of DOAJ Seal journals
doaj_seal <- read_csv("data/DOAJ_Seal.csv")
```
# Administrative Data
## ID
*Not Applicable*
## Funder
This research is being submitted for funding to the [J. Bohannon Foundation](http://www.johnbohannon.org/).
## Grant Reference Number
*Not Applicable* (Proposal in preparation)
## Project Name
***Do Reputable Open Access Journals Require Open Data Sharing? ***
## Project Description
This study analyzes the submission requirements of the most reputable open access journals to determine the prevalence and characteristics of data sharing policies. This question is an important one for 21^st^ century authors and readers because open data sharing is seen as a key component of open and more trusted scientific record.
According to the Research Data Alliance (RDA) [Data Policy Standardisation and Implementation group](https://www.rd-alliance.org/groups/data-policy-standardisation-and-implementation):
>"the prevalence of research data policies from institutions and research funders is increasing, so publishers and editors are paying more attention to standardisation and the wider adoption of **data sharing policies**."
This study investigates whether the **most reputable Open Access journals** have data sharing polices and the characteristics of those policies. These policies require authors, in some fashion, to openly disseminate the data and software underlying their published articles.
Our investigation builds on the recent work of @Castro_2017 who assessed the prevalence and characteristics of data sharing policies from randomly-selected, English-language, open access journals. Their findings reveal that only a small minority of these journals have data sharing policies. These findings -- which are consistent with those of other studies [see for example, @Vasilevsky_2017] -- may be skewed because of the authors' rules of inclusion and exclusion ^[in particular, the choice to include open access journals merely because of their use of the Open Journal Systems (OJS) hosting platform; the choice to exclude non_English language journals].
In this study, we will include only the most reputable open access journals in our assessment of journal sharing policies, regardless of language. We will analyze all journals that have attained the Seal of Approval from the [_**Directory of Open Access Journals, DOAJ**_](http://doaj.org) (shown below). We will apply the same coding framework devised by @Castro_2017 to the DOAJ Seal journals. We contend that a more rigorously screened population of open access journals, regardless of language, will yield a more accurate and reproducible set of findings than those published from @Castro_2017.
![DOAJ Seal of Approval](images/doaj_seal_logo.png)
**DOAJ Seal** journals are considered the most reputable because they:
>"achieve a high level of openness, adhere to Best Practice and high publishing standards.The Seal is awarded to a journal that fulfills a set of criteria related to accessibility, openness, discoverability, reuse and author rights. It acts as a signal to readers and authors that the journal has generous use and reuse terms, author rights and adheres to the highest level of 'openness'. " ^[DOAJ selection for Seal Approval is explained in the FAQ at <https://doaj.org/faq#seal>]
Moreover, the DOAJ Seal journals do include over 200 non-English language journals that merit analysis in this study. Excluding these from the analysis represents cultural bias that undermines reliable research. The following plot of DOAJ Seal Journals by Country indicates the problem.
```{r plot_country, echo= FALSE, fig.width=10,fig.height=11}
doaj <- read.csv(file = "data/DOAJ_Seal.csv", header = TRUE)
ggplot(data=doaj, aes(doaj$PubCountry, fill=doaj$PubCountry)) + stat_count() + labs(x="Country", y="Count") + theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1))
```
Finally, our research group has determined that following is true when it comes to reputable open access journals.
```{r equations, child = "equations-child.Rmd"}
```
## Researcher
Gail Clement, Principal Investigator, `r params$institution`
## Researcher ID
ORCID: 0000-0001-5494-4806
## Date of First Version
`r format(Sys.Date()-60, '%A, %B %d, %Y')`
## Date of Last Update
`r format(Sys.Date(), '%A, %B %d, %Y')`
## Related Policies
a. All original data, code, or reports produced as part of this project are owned by `r params$institution` per its institutional intellectual property policy.
b. The J. Bohannan Foundation adheres to the open access and data sharing policies of the [Gates Foundation](https://www.gatesfoundation.org/How-We-Work/General-Information/Open-Access-Policy)
*****
# Data Collection
## Existing Data Being Reused
This study relies on the _DOAJ Journal Metadata_ available as a `csv` file download from the [DOAJ website](https://doaj.org/faq#metadata). The `csv` file is updated every 30 minutes.
This data will be read into the open source tool [`OpenRefine`](openrefine.org) to filter it for those journals awarded the DOAJ Seal, and to remove unneeded columns containing the web addresses for journal policies around plagiarism, submission fees, and other urls not related to this study.
The filtered version of the data set will be exported as a new file named `doaj_seal.csv` for importing into analytical software.
A sample of the `doaj_seal.csv` data set is shown below. The complete data set is available in searchable and broweseable format as [Annex 10.1](#annex-table) at the end of this document.
```{r}
knitr::kable(head(doaj_seal, 4), caption = 'A Table of the first 4 rows of the DOAJ Seal data.')
```
## Data being collected
The `doaj_seal.csv` data set currently includes `r nrow(doaj_seal)` reputable open access journals that we will investigate in this study. The `doaj_seal.csv` data set will be copied and enhanced with additional columns, resulting in the processed data set, `doaj_seal_enhanced.csv`. The following columns will be added to `doaj_seal.csv`, in conformance with the Coding Framework of @Castro_2017.
- 'Data Policy' (Boolean)
+ Yes | No
- 'Data Sharing Policy' (Factor)
+ No mention
+ Implied
+ Mentioned
+ Explicitly encouraged
+ Required, but not explicitly tied to editorial decisions
+ Required as a condition of publication
- 'Data Citation Policy' (Factor)
+ No mention
+ Implied
+ Explicitly encouraged
Investigators will examine the websites of each journal listed in the `doaj_seal_enhanced.csv` file to determine whether the data sharing policy is included in the Instructions to Authors. The Coding Framework published by @Castro_2017 will be applied.
### Data file formats and standards
All data retrieved for the DOAJ Seal of Approval are downloaded and stored in the open `common separated value` (`csv`) file format.
All data policies culled from the web sites of the DOAJ seal journals will be saved as `.txt` files. The data generated by applying Castro et al's [-@Castro_2017] Coding Framework will be stored in `csv` file format.
Analysis, visualization, and summarization of the study's findings will be performed in the open source software `R` and `RStudio` using the `tidyverse` package [@R-tidyverse]. Reports produced from the study will be also be created in `RStudio` using the open source text format `Rmarkdown` and output to `HTML` documents, slides, and `MS Word` documents for submission to funders or publishers [@Xie_2018].
All files associated with the project will be maintained under the `Git` version control system and made openly available for download from the Principal Investigator's `GitHub` repository.
### Expected outputs of the project
|Output #|Digital Output|Type|Format,Duration,Size|Planned access|
|:------:|:-------|:-----------|:----------|:-------------|
|1|doaj_seal.csv|raw data set downloaded from DOAJ site| CSV file, plain text format, 2.7 MB | Not retained since duplicative |
|2| doaj_seal_enhanced.csv | enhanced data set with new data | CSV file, plain text format, 2.7 MB | GitHub repository; Zenodo repository |
|3| Data set documentation | json metadata file | plain text file, .1 MB | Datacite registry database; GitHub public repository; Zenodo repository |
|4| Data Processing steps | R scripts and comments | R Notebook file, 1 MB | GitHub public repository |
|5| Data Visualizations | R scripts and documentation; Plots | R Notebook file, .png image files 4 MB | GitHub public repository |
|6| Journal article | Rendered report | RMarkdown, 9 MB| Publisher website, community preprint server|
```{r plot_license, echo= FALSE, fig.width=10,fig.height=11}
doaj <- read_csv(file = "data/DOAJ_Seal.csv")
ggplot(data=doaj, aes(doaj$JnlLicense, fill=doaj$JnlLicense)) + stat_count() + labs(x="License", y="Count", title = "DOAJ Seal by Country")
```
# Documentation and Metadata
The journal metadata contained in the project's data sets comes directly from the _Directory of Open Access Journals_.
The final outputs from the project will be documented in metadata files according to the DataCite DOI registration agency -- see the [DataCite Metadata Schema 4.1] (https://schema.datacite.org/) for specific details. By following this standard metadata format, other researchers (and computers) will be able to find, access, and reuse the outputs from this project by searching the DataCite metadatabase.
*****
# Ethics and Legal Compliance
* No additional ethical or privacy issues arise in this study because both the _DOAJ data_, and the information about data policies for any published journal, are publicly posted online.
* The data provided about journals awarded the *Directory of Open Access Journals* Seal of Approval is distributed under a CC BY-SA license. This license requires that reusers of the data share their derivative data set under the same license. Therefore, the output of this research will be disseminated under the CC BY-SA license. This license adheres to the *Principles and Guidelines of the Research Data Alliance Legal Interoperability Group*, which recommends the use of Creative Commons Attribution licenses to allow the broadest sharing of data while guaranteeing attribution to the data provider.^[https://www.rd-alliance.org/rda-codata-legal-interoperability-research-data-principles-and-implementation-guidelines-now]
*****
# Storage and Backup
During the active phase of the project data will be stored on and backed up to the Research Data Storage Facility (RDSF) at `r params$institution`. This facility represents 2 million pounds of digital resilient storage, with ongoing capital investment. The RDSF is overseen by a steering group of senior research and support staff, which includes the PVC Research. Backup procedures are robust (overnight backup, copies held remotely on tape) and secured access is in place
# Selection and long-term preservation
After the project conclusion, all research outputs will be assigned Digital Object Identifiers by DataCite to serve as persistent identifiers and to always resolve to the landing page for the project files (metadata, data, code, and reports).
Additionally, the Research Data Storage Facility (RDSF) at `r params$institution` will undertake to preserve each of the digital outputs mentioned above, for three years beyond the end of the project. Refreshment of storage media will be scheduled as required
The final complete data set `doaj_seal_enhanced` and all published outputs produced based on this data set will be issued Digital Object Identifiers and stored on `GitHub` and `Zenodo`. All draft versions of project outputs will also be publicly available in Git Hub with unique hashes distinguishing each version that precedes the final, released versions represented in the published DMP.
****
# Data Sharing
The data and metadata will be made openly accessible under a CC-BY SA license in the CERN-maintained open access repository [Zenodo](https://zenodo.org/), an open dependable home for the long-tail of science, enabling researchers to share and preserve any research outputs in any size, any format and from any science. ^[Zenodo policies are available online at http://about.zenodo.org/policies/]
*****
# Responsibilities and Resources
The Principal Investigator is responsible for implementing the Data Management Plan and ensuring it is reviewed and revised as necessary. (S)he will be responsible for all data collection and recording; for data analysis and visualization; and for maintain all files under version control using git and GitHub.
The Data Management Specialist assigned to the project as an in-kind contribution from the `r params$institution` Library will be responsible for creating the DataCite metadata documentation for all outputs and ensuring timely DOI registration of each final output. (s)he will also deposit all final outputs to the Zenodo repository and update metadata associated with the DOI as necessary.
# Annexes {-}
## Complete dataset `doaj_seal.csv` {#annex-table}
```{r data-table}
doaj_seal %>%
datatable(rownames = FALSE,
colnames = c("Title", "Publisher", "Country", "Fees", "Waivers", "Identifiers", "Start Year", "Language(s)", "Review Process", "Plagiarism check", "Time to Press", "License", "Author owns Copyright", "DOAJ Seal"),
class = "cell-border stripe",
caption = "Journals with DOAJ Seal",
filter = list(position = "bottom"),
extensions = 'Buttons', options = list(dom = 'Bfrtip',
buttons = c('colvis', 'csv', 'pdf'))
)
```
## Principal Investigator's BioSketch
_This is auto-populated from your ORCID profile using the @R-rorcid `rorcid` package._
```{r orcid-bio, echo= FALSE}
library ('rorcid')
library('httpuv')
token <- orcid_auth(scope = "/authenticate", reauth = TRUE, redirect_uri = getOption("rorcid.redirect_uri"))
res <- orcid_bio(orcid = "0000-0001-5494-4806")
bio <- res$'0000-0001-5494-4806'$'content'
```
`r bio`
# References