updated readmes
Giovanni1085 committed Dec 20, 2023
1 parent 0259e58 commit 4d3258a
Showing 3 changed files with 16 additions and 51 deletions.
58 changes: 12 additions & 46 deletions README.md
@@ -1,8 +1,8 @@
# das-public
# PLOS Open Science Indicators

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.10256816.svg)](https://doi.org/10.5281/zenodo.10256816)

PLOS recently published an innovative [dataset of Open Science Indicators (OSI)](https://doi.org/10.6084/m9.figshare.21687686.v4), covering its entire collection plus a comparison dataset from PubMed. Here we use OSI version 5, which contains approximately 124,000 PMC and PLOS articles (of which 103,000 are from PLOS). The OSI primarily provides indicators on the sharing of research data (in particular, data shared in data repositories), the sharing of code, and the posting of preprints.
PLOS recently published an innovative [dataset of Open Science Indicators (OSI)](https://doi.org/10.6084/m9.figshare.21687686.v5), covering its entire collection plus a comparison dataset from PubMed. Here we use [OSI version 5](https://plos.figshare.com/articles/dataset/PLOS_Open_Science_Indicators/21687686/5), which contains approximately 124,000 PMC and PLOS articles. The OSI primarily provides indicators on the sharing of research data (in particular, data shared in data repositories), the sharing of code, and the posting of preprints.

The [Media Engineering Institute (MEI)](https://heig-vd.ch/en/research/mei) has been involved in collecting data from the PubMed Open Access collection to equip the OSI dataset with citation data (article level) and h-index data (author level), in preparation for further analysis. The data collection pipeline is adapted from the previous work on Data Availability Statements, described below.
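
As an illustration of the author-level indicator mentioned above, the following is a minimal sketch of how an h-index can be derived from per-article citation counts. It is not taken from this repository's pipeline: the function name `h_index` and the example counts are assumptions made for illustration only.

```python
from typing import Iterable


def h_index(citation_counts: Iterable[int]) -> int:
    """Return the largest h such that h articles have at least h citations each."""
    # Rank articles from most to least cited.
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, citations in enumerate(ranked, start=1):
        if citations >= rank:
            h = rank  # the top `rank` articles all have >= rank citations
        else:
            break
    return h


# Hypothetical author with five articles (illustrative numbers, not real data).
example_counts = [10, 8, 5, 4, 3]
print(h_index(example_counts))  # prints 4
```

Four of the five example articles have at least four citations each, but there are not five articles with at least five citations, so the h-index is 4.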

@@ -19,56 +19,22 @@ The [Media Engineering Institute (MEI)](https://heig-vd.ch/en/research/mei) has
* To validate the code, please refer to the [testing procedure](test.md).
* The final result can be found in [dataset/exports/export_plos.csv.zip](dataset/exports/export_plos.csv.zip).

# Original work
This repository is a fork of previous work that can be found here:

[![DOI](https://zenodo.org/badge/180121200.svg)](https://zenodo.org/badge/latestdoi/180121200)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/alan-turing-institute/das-public/master?filepath=notebooks%2FDescriptiveFigures.ipynb)
## Modelling and analysis

## Previous publications
The original code is mentioned in the following papers:
The code and data for the modelling and analysis can be found in the [analysis folder](analysis).

* 📃 Preprint: https://arxiv.org/abs/1907.02565.
* 📝 Peer-reviewed publication: https://doi.org/10.1371/journal.pone.0230416

Blogs and talks:
* "A selfish reason to share research data": https://www.turing.ac.uk/blog/selfish-reason-share-research-data
## Original work

## Code and data
This repository is a fork of previous work that can be found here:

* See the [dataset folder](dataset) to create a dataset for analysis from the [PubMed Central OA collection](https://www.ncbi.nlm.nih.gov/pmc/tools/openftlist).
* See the [notebooks](notebooks) and [scripts](scripts) folders to replicate Figure 2 (shown below) and have a descriptive overview of the dataset.
* See the [analysis folder](analysis) to replicate analytical results from the paper. The [dataset analysed in the paper](analysis/dataset/export_full.csv.zip) is provided, so that the two replication steps can be done independently.
* The [figures](figures) and [resources](resources) folders contain supporting files.
* [![DOI](https://zenodo.org/badge/180121200.svg)](https://zenodo.org/badge/latestdoi/180121200)
* [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/alan-turing-institute/das-public/master?filepath=notebooks%2FDescriptiveFigures.ipynb)

![](figures/Figure2.png)
The original code is mentioned in the following papers:

## Report issues
* 📃 Preprint: https://arxiv.org/abs/1907.02565.
* 📝 Peer-reviewed publication: https://doi.org/10.1371/journal.pone.0230416.

Please add an issue or notify the authors should you find any error to correct or improvements to make.
Well-documented pull requests are particularly appreciated.

## How to cite

> Colavizza, G., Hrynaszkiewicz, I., Staden, I., Whitaker, K., & McGillivray, B. (2020). The citation advantage of linking publications to research data. PLOS ONE, 15(4), e0230416. https://doi.org/10.1371/journal.pone.0230416
```
@article{Colavizza_Hrynaszkiewicz_Staden_Whitaker_McGillivray_2020,
title = {The citation advantage of linking publications to research data},
volume = {15},
url = {http://dx.doi.org/10.1371/journal.pone.0230416},
DOI = {10.1371/journal.pone.0230416},
number = {4},
journal = {PLOS ONE},
publisher = {Public Library of Science (PLoS)},
author = {Colavizza, Giovanni and
Hrynaszkiewicz, Iain and
Staden, Isla and
Whitaker, Kirstie and
McGillivray, Barbara},
editor = {Wicherts, Jelte M.},
year = {2020},
month = {Apr},
pages = {e0230416}
}
```
Well-documented pull requests are particularly appreciated.
8 changes: 4 additions & 4 deletions analysis/README.md
@@ -5,10 +5,10 @@ This folder contains R code to replicate the analyses done in the paper. It furt
* [R models](r_models.R): code to replicate the modelling analyses.
* [R descriptive](descriptive.R): code to replicate the descriptive analyses.
* [dataset/compressed](dataset/compressed/): folder containing zipped copies of the datasets used for analysis, created as per the instructions in [R models](r_models.R). Unzip them into the /dataset folder to reproduce our results.
- *DATASET.csv*: contains the complete data frame used for modelling.
- *df_OSI.csv*: contains the OSI data frame, consolidated from version 5.2.
- *df_OSI_classes_top.csv*: contains the ANZSRC FoR Division as dummy variables.
  - *export_plos.csv*: contains citation counts (as in the *dataset/exports* folder).
- `DATASET.csv`: contains the complete data frame used for modelling.
- `df_OSI.csv`: contains the OSI data frame, consolidated from version 5.
- `df_OSI_classes_top.csv`: contains the ANZSRC FoR Division as dummy variables.
  - `export_plos.csv`: contains citation counts (as in the `dataset/exports` folder).
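
To make the "dummy variables" layout concrete, here is a minimal sketch of how a categorical field such as the ANZSRC FoR Division can be expanded into one 0/1 column per category. The helper `to_dummies` and the division labels are assumptions for illustration; the actual file was produced by the R code in this folder and may be organised differently.

```python
from typing import Dict, List


def to_dummies(labels: List[str]) -> List[Dict[str, int]]:
    """Expand each label into a row with one 0/1 indicator per distinct category."""
    categories = sorted(set(labels))
    return [{c: int(label == c) for c in categories} for label in labels]


# Illustrative division labels for three articles (not taken from the dataset).
divisions = ["Biological Sciences", "Health Sciences", "Biological Sciences"]
rows = to_dummies(divisions)
for row in rows:
    print(row)
```

Each row carries exactly one 1, in the column of the article's own division.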

## Instructions

1 change: 0 additions & 1 deletion install.R
@@ -2,7 +2,6 @@ install.packages("ggplot2")
install.packages("VGAM")
install.packages("nnet")
install.packages("gamlss")
install.packages("DMwR")
install.packages("MASS")
install.packages("stargazer")
install.packages("dplyr")
