updated readmes

MediaComem · Dec 20, 2023 · 4d3258a
1 parent 0259e58
commit 4d3258a

Showing 3 changed files with 16 additions and 51 deletions.
diff --git a/README.md b/README.md
@@ -1,8 +1,8 @@
-# das-public
+# PLOS Open Science Indicators
 
 [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.10256816.svg)](https://doi.org/10.5281/zenodo.10256816)
 
-PLOS recently published an innovative [dataset of Open Science Indicators (OSI)](https://doi.org/10.6084/m9.figshare.21687686.v4), focused on its entire collection plus a comparison dataset from PubMed. We use here the OSI version 5, containing approximately 124000 PMC and PLOS articles (of which 103000 are from PLOS). The OSI is primarily concerned with indicators on: sharing of research data, in particular, data shared in data repositories; sharing of code; and posting of preprints.
+PLOS recently published an innovative [dataset of Open Science Indicators (OSI)](https://doi.org/10.6084/m9.figshare.21687686.v5), focused on its entire collection plus a comparison dataset from PubMed. We use here the [OSI version 5](https://plos.figshare.com/articles/dataset/PLOS_Open_Science_Indicators/21687686/5), containing approximately 124000 PMC and PLOS articles. The OSI is primarily concerned with indicators on: sharing of research data, in particular, data shared in data repositories; sharing of code; and posting of preprints.
 
 The [Media Engineering Institute (MEI)](https://heig-vd.ch/en/research/mei) has been involved in collecting data from the PubMed Open Access collection to equip the OSI dataset with citation data (article) and h-index data (author level), in preparation for further analysis. The data collection pipeline has been adapted following the process described in the previous work on Data Availability Statements, described below.
 
@@ -19,56 +19,22 @@ The [Media Engineering Institute (MEI)](https://heig-vd.ch/en/research/mei) has
 * To validate the code, please refer to the [testing procedure](test.md).
 * The final result can be found in [dataset/exports/export_plos.csv.zip](dataset/exports/export_plos.csv.zip).
 
-# Original work
-This repository is a fork of previous work that can be found here:
-
-[![DOI](https://zenodo.org/badge/180121200.svg)](https://zenodo.org/badge/latestdoi/180121200)
-[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/alan-turing-institute/das-public/master?filepath=notebooks%2FDescriptiveFigures.ipynb)
+## Modelling and analysis
 
-## Previous publications
-The original code is mentioned in the following papers:
+The code and data for the modelling and analysis can be found in the [analysis folder](analysis).
 
-* 📃 Preprint: https://arxiv.org/abs/1907.02565.
-* 📝 Peer-reviewed publication: https://doi.org/10.1371/journal.pone.0230416
 
-Blogs and talks:
-* "A selfish reason to share research data": https://www.turing.ac.uk/blog/selfish-reason-share-research-data
+## Original work
 
-## Code and data
+This repository is a fork of previous work that can be found here:
 
-* See the [dataset folder](dataset) to create a dataset for analysis from the [PubMed Central OA collection](https://www.ncbi.nlm.nih.gov/pmc/tools/openftlist).
-* See the [notebooks](notebooks) and [scripts](scripts) folders to replicate Figure 2 (shown below) and have a descriptive overview of the dataset.
-* See the [analysis folder](analysis) to replicate analytical results from the paper. The [dataset analysed in the paper](analysis/dataset/export_full.csv.zip) is provided, so that the two replication steps can be done independently.
-* The [figures](figures) and [resources](resources) folders contain supporting files.
+* [![DOI](https://zenodo.org/badge/180121200.svg)](https://zenodo.org/badge/latestdoi/180121200)
+* [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/alan-turing-institute/das-public/master?filepath=notebooks%2FDescriptiveFigures.ipynb)
 
-![](figures/Figure2.png)
+The original code is mentioned in the following papers:
 
-## Report issues
+* 📃 Preprint: https://arxiv.org/abs/1907.02565.
+* 📝 Peer-reviewed publication: https://doi.org/10.1371/journal.pone.0230416.
 
 Please add an issue or notify the authors should you find any error to correct or improvements to make.
-Well-documented pull requests are particularly appreciated.
-
-## How to cite
-
-> Colavizza, G., Hrynaszkiewicz, I., Staden, I., Whitaker, K., & McGillivray, B. (2020). The citation advantage of linking publications to research data. PLOS ONE, 15(4), e0230416. https://doi.org/10.1371/journal.pone.0230416
-
-```
-@article{Colavizza_Hrynaszkiewicz_Staden_Whitaker_McGillivray_2020,
-  title =     {The citation advantage of linking publications to research data},
-  volume =    {15},
-  url =       {http://dx.doi.org/10.1371/journal.pone.0230416},
-  DOI =       {10.1371/journal.pone.0230416},
-  number =    {4},
-  journal =   {PLOS ONE},
-  publisher = {Public Library of Science (PLoS)},
-  author =    {Colavizza, Giovanni and
-               Hrynaszkiewicz, Iain and 
-               Staden, Isla and 
-               Whitaker, Kirstie and 
-               McGillivray, Barbara},
-  editor =    {Wicherts, Jelte M.Editor},
-  year =     {2020},
-  month =    {Apr},
-  pages =    {e0230416}
-  }
-```
+Well-documented pull requests are particularly appreciated.
diff --git a/analysis/README.md b/analysis/README.md
@@ -5,10 +5,10 @@ This folder contains R code to replicate the analyses done in the paper. It furt
 * [R models](r_models.R): code to replicate the modelling analyses.
 * [R descriptive](descriptive.R): code to replicate the descriptive analyses.
 * [dataset/compressed](dataset/compressed/): folder containing the zipped copy of the datasets used for analysis, created as per instructions in [R models](r_models.R). Please unzip them in the /dataset folder for reproducing our results.
-	- *DATASET.csv*: contains the complete data frame used for modelling. 
-	- *df_OSI.csv*: contains the OSI data frame, consolidated from version 5.2.
-	- *df_OSI_classes_top.csv*: contains the ANZSRC FoR Division as dummy variables.
-	- *export_plos.csv*: contains citation counts (as in the *datasets/exports* folder).
+	- `DATASET.csv`: contains the complete data frame used for modelling. 
+	- `df_OSI.csv`: contains the OSI data frame, consolidated from version 5.
+	- `df_OSI_classes_top.csv`: contains the ANZSRC FoR Division as dummy variables.
+	- `export_plos.csv`: contains citation counts (as in the `datasets/exports` folder).
 
 ## Instructions
 

diff --git a/install.R b/install.R
@@ -2,7 +2,6 @@ install.packages("ggplot2")
 install.packages("VGAM")
 install.packages("nnet")
 install.packages("gamlss")
-install.packages("DMwR")
 install.packages("MASS")
 install.packages("stargazer")
 install.packages("dplyr")