prep for 0.3.0 submission to CRAN #74

Merged
merged 23 commits into from
Jan 18, 2021
ee57d28
starting cran check
wibeasley Jan 17, 2021
1ce6bd7
updating links
wibeasley Jan 17, 2021
b4d6647
bump version
wibeasley Jan 17, 2021
e227c79
lazyData
wibeasley Jan 17, 2021
58505e1
rmarkdown dependency
wibeasley Jan 17, 2021
987c884
split comparisons into pieces
wibeasley Jan 18, 2021
86bc7a5
non-latin character for Kukoc
wibeasley Jan 18, 2021
86fa3c1
fix vignette link
wibeasley Jan 18, 2021
e4e9c8e
bump version
wibeasley Jan 18, 2021
ab7591a
Merge branch 'dev' of https://github.com/IQSS/dataverse-client-r into…
wibeasley Jan 18, 2021
7c4112b
Merge branch 'master' into dev
wibeasley Jan 18, 2021
89b3b3a
small updates to docs
wibeasley Jan 18, 2021
696f28d
update cran comments
wibeasley Jan 18, 2021
6f257b5
update news file
wibeasley Jan 18, 2021
538fd15
Fix typo
kuriwaki Jan 18, 2021
50afad3
Move around documentation for get_*
kuriwaki Jan 18, 2021
cb79851
Reorder. The first example is not about rds, so we shouldn't name it …
kuriwaki Jan 18, 2021
342a494
readr is already in Imports (and we use readr::read_tsv all the time)…
kuriwaki Jan 18, 2021
ef8f454
Reorder authors roughly by amount of edits? (cc: @wibeasley)
kuriwaki Jan 18, 2021
c9025d7
As you know dvn is removed from CRAN (although that link still is liv…
kuriwaki Jan 18, 2021
1c5d6e8
Dataverse moved to v5 in late 2020. All our new tests are tested agai…
kuriwaki Jan 18, 2021
ce36291
turn off some examples for CRAN submission
wibeasley Jan 18, 2021
e2012c7
merging changes from @kuriwaki
wibeasley Jan 18, 2021
1 change: 1 addition & 0 deletions .Rbuildignore
@@ -19,3 +19,4 @@ man-roxygen/*
^_pkgdown\.yml$
^docs$
^pkgdown$
^cran-comments\.md$
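
Reviewer note: `R CMD build` reads each `.Rbuildignore` line as a Perl-like regular expression and matches it case-insensitively against paths relative to the package root. A minimal sketch of what the new entry would and would not exclude, using base R's `grepl()` as a stand-in for the build tool's matching (the candidate file names are made up for illustration):

```r
# Pattern added to .Rbuildignore in this PR (backslash doubled for an R string)
pattern <- "^cran-comments\\.md$"

# Illustrative paths, relative to the package root (hypothetical examples)
candidates <- c(
  "cran-comments.md",
  "CRAN-comments.md",
  "docs/cran-comments.md",
  "cran-comments.md.bak"
)

# Approximate the case-insensitive, Perl-like matching used at build time
matched <- grepl(pattern, candidates, perl = TRUE, ignore.case = TRUE)
print(setNames(matched, candidates))
```

The anchors `^` and `$` keep the rule from excluding similarly named files in subdirectories or with extra suffixes.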
22 changes: 12 additions & 10 deletions DESCRIPTION
@@ -1,5 +1,5 @@
Package: dataverse
Version: 0.2.1.9002
Version: 0.3.0
Title: Client for Dataverse 4 Repositories
Authors@R: c(
person(
@@ -14,18 +14,18 @@ Authors@R: c(
email = "[email protected]",
comment = c(ORCID = "0000-0003-4097-6326")
),
person(
"Philip", "Durbin",
role = c("aut"),
email = "[email protected]",
comment = c(ORCID = "0000-0002-9528-9470")
),
person(
"Shiro", "Kuriwaki",
role = c("aut"),
email = "[email protected]",
comment = c(ORCID = "0000-0002-5687-2647")
),
person(
"Philip", "Durbin",
role = c("aut"),
email = "[email protected]",
comment = c(ORCID = "0000-0002-9528-9470")
),
person(
"Sebastian", "Karcher",
role=c("aut"),
@@ -49,13 +49,15 @@ Suggests:
haven,
knitr,
purrr,
rmarkdown,
testthat,
UNF,
yaml
Description: Provides access to Dataverse version 4 APIs <https://dataverse.org/>,
enabling data search, retrieval, and deposit. For Dataverse versions <= 4.0,
use the deprecated 'dvn' package <https://cran.r-project.org/package=dvn>.
Description: Provides access to Dataverse APIs <https://dataverse.org/> (versions 4-5),
enabling data search, retrieval, and deposit. For Dataverse versions <= 3.0,
use the archived 'dvn' package <https://cran.r-project.org/package=dvn>.
License: GPL-2
LazyData: true
URL: https://github.com/iqss/dataverse-client-r
BugReports: https://github.com/iqss/dataverse-client-r/issues
VignetteBuilder: knitr
8 changes: 7 additions & 1 deletion NEWS.md
@@ -1,4 +1,10 @@
# CHANGES TO dataverse 0.2.2 (upcoming)
# CHANGES TO dataverse 0.3.0

New Methods

* Add new `get_dataframe_*()` methods (#48, #66)

Small updates

* Make filter queries (fq) work in `dataverse_search` (#36 @adam3smith)
* Update maintainer to Will Beasley ([email protected]) (#38)
45 changes: 30 additions & 15 deletions R/get_dataframe.R
@@ -1,53 +1,56 @@
#' Get file from dataverse and convert it into a dataframe or tibble
#' Download dataverse file as a dataframe
#'
#' `get_dataframe_by_id`, if you know the numeric ID of the dataset, or instead
#' `get_dataframe_by_name` if you know the filename and doi. The dataset
#' Use `get_dataframe_by_name` if you know the name of the datafile and the DOI
#' of the dataset. Use `get_dataframe_by_doi` if you know the DOI of the datafile
#' itself. Use `get_dataframe_by_id` if you know the numeric ID of the
#' datafile.
#'
#' @rdname get_dataframe
#'
#' @param filename The name of the file of interest, with file extension, for example
#' `"roster-bulls-1996.tab"`.
#' @param .f The function to used for reading in the raw dataset. This user
#' must choose the appropriate function: for example if the target is a .rds
#' file, then `.f` should be `readRDS` or `readr::read_`rds`.
#' file, then `.f` should be `readRDS` or `readr::read_rds`.
#' @param original A logical, defaulting to TRUE. Whether to read the ingested,
#' archival version of the dataset if one exists. The archival versions are tab-delimited
#' archival version of the datafile if one exists. The archival versions are tab-delimited
#' `.tab` files so if `original = FALSE`, `.f` is set to `readr::read_tsv`.
#' If functions to read the original version is available, then `original = TRUE`
#' with a specified `.f` is better.
#'
#' @inheritDotParams get_file
#'
#' @examples
#'
#' # Retrieve data.frame from dataverse DOI and file name
#' df_from_rds_ingested <-
#' df_tab <-
#' get_dataframe_by_name(
#' filename = "roster-bulls-1996.tab",
#' dataset = "doi:10.70122/FK2/HXJVJU",
#' server = "demo.dataverse.org"
#' )
#'
#' # Retrieve the same data.frame from dataverse + file DOI
#' df_from_rds_ingested_by_doi <-
#' # Retrieve the same file from file DOI
#' df_tab <-
#' get_dataframe_by_doi(
#' filedoi = "10.70122/FK2/HXJVJU/SA3Z2V",
#' server = "demo.dataverse.org"
#' )
#'
#' # Do not run when submitting to CRAN, because the whole
#' # example sometimes takes longer than 10 sec.
#' \dontrun{
#' # Retrieve ingested file originally a Stata dta
#' df_from_stata_ingested <-
#' get_dataframe_by_name(
#' filename = "nlsw88.tab",
#' dataset = "doi:10.70122/FK2/PPIAXE",
#' server = "demo.dataverse.org"
#' )
#'
#' )
#'
#' # To use the original file version, or for non-ingested data,
#' # please specify `original = TRUE` and specify a function in .f.
#'
#' # A data.frame is still returned, but the
# Rds files are not ingested so original = TRUE and .f is required.
#' if (requireNamespace("readr", quietly = TRUE)) {
#' df_from_rds_original <-
#' get_dataframe_by_name(
@@ -56,19 +59,31 @@
#' server = "demo.dataverse.org",
#' original = TRUE,
#' .f = readr::read_rds
#' )
#' )
#' }
#'
#' # Get Stata file as original
#' if (requireNamespace("haven", quietly = TRUE)) {
#' df_from_stata_original <-
#' df_stata_original <-
#' get_dataframe_by_name(
#' filename = "nlsw88.tab",
#' dataset = "doi:10.70122/FK2/PPIAXE",
#' server = "demo.dataverse.org",
#' original = TRUE,
#' .f = haven::read_dta
#' )
#' )
#' }
#'
#' # Stata file as ingested file (less information than original)
#' df_stata_ingested <-
#' get_dataframe_by_name(
#' filename = "nlsw88.tab",
#' dataset = "doi:10.70122/FK2/PPIAXE",
#' server = "demo.dataverse.org"
#' )
#'
#' }
#'
#' @export
get_dataframe_by_name <- function (
filename,
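
Reviewer note: the `original` / `.f` contract described in the roxygen block above can be sketched offline. `read_downloaded()` below is a hypothetical helper written for this note, not the package's internal code; the error on `original = TRUE` without `.f` is an assumption drawn from the documentation's statement that `.f` is required in that case, and base R's `read.delim()` stands in for `readr::read_tsv`.

```r
# Hypothetical helper illustrating the documented `original` / `.f` contract.
# This is NOT the implementation used by get_dataframe_by_name().
read_downloaded <- function(path, original = FALSE, .f = NULL) {
  if (is.null(.f)) {
    if (original) {
      # Assumption: an original (non-ingested) file needs an explicit reader
      stop("`original = TRUE` requires a reader function in `.f`.")
    }
    # Ingested files are tab-delimited; read.delim() stands in for readr::read_tsv
    .f <- function(p) utils::read.delim(p, stringsAsFactors = FALSE)
  }
  .f(path)
}

# Local tab-delimited file standing in for an ingested `.tab` download
tab_file <- tempfile(fileext = ".tab")
writeLines(c("name\tnumber", "Jordan\t23", "Pippen\t33"), tab_file)

# Default path: no `.f`, so the tab-delimited reader is used
roster <- read_downloaded(tab_file)

# Requesting the original without a reader is rejected in this sketch
err <- tryCatch(
  read_downloaded(tab_file, original = TRUE),
  error = function(e) conditionMessage(e)
)

unlink(tab_file)
```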
19 changes: 10 additions & 9 deletions R/get_file.R
@@ -1,20 +1,21 @@
#' @rdname files
#'
#' @title Download File
#' @title Download dataverse file as a raw binary
#'
#' @description Download Dataverse File(s). `get_file` is a general wrapper,
#' and can take either dataverse objects, file IDs, or a filename and dataverse.
#' @description Download Dataverse File(s). `get_file_*`
#' functions return a raw binary file, which cannot be readily analyzed in R.
#' To use the objects as dataframes, see the `get_dataset_*` functions at
#' \link{get_dataset} instead.
#'
#' @details This function provides access to data files from a Dataverse entry.
#' `get_file` is a general wrapper,
#' and can take either dataverse objects, file IDs, or a filename and dataverse.
#' Internally, all functions download each file by `get_file_by_id`.
#' `get_file_by_name` is a shorthand for running `get_file` by
#' specifying a file name (`filename`) and dataset (`dataset`).
#' `get_file_by_doi` obtains a file by its file DOI, bypassing the
#' `dataset` argument.
#'
#' Internally, all functions download each file by `get_file_by_id`. `get_file_*`
#' functions return a raw binary file, which cannot be readily analyzed in R.
#' To use the objects as dataframes, see the `get_dataset_*` functions at \link{get_dataset}
#'
#' @details This function provides access to data files from a Dataverse entry.
#'
#' @param file An integer specifying a file identifier; or a vector of integers
#' specifying file identifiers; or, if used with the prefix \code{"doi:"}, a
#' character with the file-specific DOI; or, if used without the prefix, a
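
Reviewer note: since the `get_file_*` functions are documented as returning a raw binary, a short sketch of turning such a raw vector into a data frame may help readers of the new description. No server is contacted here; `raw_bytes` is a locally built stand-in for a downloaded file, and the roster values are illustrative.

```r
# Stand-in for the raw vector that a get_file_*() call would return
raw_bytes <- charToRaw("name\tnumber\nJordan\t23\nPippen\t33\n")

# Write the bytes to a temporary file, then parse it with a reader that suits the format
tmp <- tempfile(fileext = ".tab")
writeBin(raw_bytes, tmp)
roster <- utils::read.delim(tmp, stringsAsFactors = FALSE)
unlink(tmp)

str(roster)
```

The same write-then-read pattern applies to other formats by swapping in a suitable reader, for example `readRDS` for `.rds` files.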
8 changes: 4 additions & 4 deletions README.Rmd
@@ -1,5 +1,5 @@
---
title: "R Client for Dataverse 4 Repositories"
title: "R Client for Dataverse Repositories"
output: github_document
---

@@ -11,9 +11,9 @@ Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu")

[![CRAN Version](https://www.r-pkg.org/badges/version/dataverse)](https://cran.r-project.org/package=dataverse) ![Downloads](https://cranlogs.r-pkg.org/badges/dataverse) [![Travis-CI Build Status](https://travis-ci.org/IQSS/dataverse-client-r.png?branch=master)](https://travis-ci.org/IQSS/dataverse-client-r) [![codecov.io](https://codecov.io/github/IQSS/dataverse-client-r/coverage.svg?branch=master)](https://codecov.io/github/IQSS/dataverse-client-r?branch=master)

[![Dataverse Project logo](https://dataverse.org/files/dataverseorg/files/dataverse_project_logo-hp.png "Dataverse Project")](https://dataverse.org)
[![Dataverse Project logo](https://dataverse.org/files/dataverseorg/files/dataverse_project_logo-hp.png)](https://dataverse.org)

The **dataverse** package provides access to [Dataverse 4](https://dataverse.org/) APIs, enabling data search, retrieval, and deposit, thus allowing R users to integrate public data sharing into the reproducible research workflow. **dataverse** is the next-generation iteration of [the **dvn** package](https://cran.r-project.org/package=dvn), which works with Dataverse 3 ("Dataverse Network") applications. **dataverse** includes numerous improvements for data search, retrieval, and deposit, including use of the (currently in development) **sword** package for data deposit and the **UNF** package for data fingerprinting.
The **dataverse** package provides access to [Dataverse](https://dataverse.org/) APIs (versions 4-5), enabling data search, retrieval, and deposit, thus allowing R users to integrate public data sharing into the reproducible research workflow. **dataverse** is the next-generation iteration of [the **dvn** package](https://cran.r-project.org/package=dvn), which works with Dataverse 3 ("Dataverse Network") applications. **dataverse** includes numerous improvements for data search, retrieval, and deposit, including use of the (currently in development) **sword** package for data deposit and the **UNF** package for data fingerprinting.

### Getting Started

@@ -32,7 +32,7 @@ library("dataverse")

#### Keys

Some features of the Dataverse 4 API are public and require no authentication. This means in many cases you can search for and retrieve data without a Dataverse account for that a specific Dataverse installation. But, other features require a Dataverse account for the specific server installation of the Dataverse software, and an API key linked to that account. Instructions for obtaining an account and setting up an API key are available in the [Dataverse User Guide](https://guides.dataverse.org/en/latest/user/account.html). (Note: if your key is compromised, it can be regenerated to preserve security.) Once you have an API key, this should be stored as an environment variable called `DATAVERSE_KEY`. It can be set within R using:
Some features of the Dataverse API are public and require no authentication. This means in many cases you can search for and retrieve data without a Dataverse account for that a specific Dataverse installation. But, other features require a Dataverse account for the specific server installation of the Dataverse software, and an API key linked to that account. Instructions for obtaining an account and setting up an API key are available in the [Dataverse User Guide](https://guides.dataverse.org/en/latest/user/account.html). (Note: if your key is compromised, it can be regenerated to preserve security.) Once you have an API key, this should be stored as an environment variable called `DATAVERSE_KEY`. It can be set within R using:

``` r
Sys.setenv("DATAVERSE_KEY" = "examplekey12345")
52 changes: 26 additions & 26 deletions README.md
@@ -1,4 +1,4 @@
R Client for Dataverse 4 Repositories
R Client for Dataverse Repositories
================

[![CRAN
@@ -9,14 +9,13 @@ Status](https://travis-ci.org/IQSS/dataverse-client-r.png?branch=master)](https:
[![codecov.io](https://codecov.io/github/IQSS/dataverse-client-r/coverage.svg?branch=master)](https://codecov.io/github/IQSS/dataverse-client-r?branch=master)

[![Dataverse Project
logo](https://dataverse.org/files/dataverseorg/files/dataverse_project_logo-hp.png
"Dataverse Project")](https://dataverse.org)
logo](https://dataverse.org/files/dataverseorg/files/dataverse_project_logo-hp.png)](https://dataverse.org)

The **dataverse** package provides access to
[Dataverse 4](https://dataverse.org/) APIs, enabling data search,
retrieval, and deposit, thus allowing R users to integrate public data
sharing into the reproducible research workflow. **dataverse** is the
next-generation iteration of [the **dvn**
[Dataverse](https://dataverse.org/) APIs (versions 4-5), enabling data
search, retrieval, and deposit, thus allowing R users to integrate
public data sharing into the reproducible research workflow.
**dataverse** is the next-generation iteration of [the **dvn**
package](https://cran.r-project.org/package=dvn), which works with
Dataverse 3 (“Dataverse Network”) applications. **dataverse** includes
numerous improvements for data search, retrieval, and deposit, including
@@ -35,7 +34,7 @@ library("dataverse")

#### Keys

Some features of the Dataverse 4 API are public and require no
Some features of the Dataverse API are public and require no
authentication. This means in many cases you can search for and retrieve
data without a Dataverse account for that a specific Dataverse
installation. But, other features require a Dataverse account for the
@@ -53,12 +52,13 @@ Sys.setenv("DATAVERSE_KEY" = "examplekey12345")

#### Server

Because [there are many Dataverse installations](https://dataverse.org/),
all functions in the R client require specifying what server
installation you are interacting with. This can be set by default with
an environment variable, `DATAVERSE_SERVER`. This should be the
Dataverse server, without the “https” prefix or the “/api” URL path,
etc. For example, the Harvard Dataverse can be used by setting:
Because [there are many Dataverse
installations](https://dataverse.org/), all functions in the R client
require specifying what server installation you are interacting with.
This can be set by default with an environment variable,
`DATAVERSE_SERVER`. This should be the Dataverse server, without the
“https” prefix or the “/api” URL path, etc. For example, the Harvard
Dataverse can be used by setting:

``` r
Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu")
@@ -99,7 +99,7 @@ nlsw <-

## Downloading ingested version of data with readr::read_tsv. To download the original version and remove this message, set original = TRUE.

##
##
## ── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────
## cols(
## idcode = col_double(),
@@ -164,7 +164,8 @@ nlsw_original <-
)
```

Note that even though the file prefix is “.tab”, we use `read_dta`.
Note that even though the file prefix is “.tab”, we use
`haven::read_dta`.

Of course, when the dataset is not ingested (such as a Rds file), users
would always need to specify an `.f` argument for the specific file.
@@ -183,7 +184,7 @@ class(nlsw_tsv$race) # tab ingested version only has numeric data
attr(nlsw_original$race, "labels") # original dta has value labels
```

## white black other
## white black other
## 1 2 3

#### Reading a dataset as a binary file.
@@ -220,7 +221,7 @@ get_dataset(
)
```

## Dataset (182162):
## Dataset (182162):
## Version: 1.1, RELEASED
## Release Date: 2020-12-30T00:00:24Z
## License: CC0
@@ -256,13 +257,12 @@ subsequent pages, specify `start`.

### Data Archiving

Dataverse provides two - basically unrelated - workflows for managing
(adding, documenting, and publishing) datasets. The first is built on
[SWORD v2.0](http://swordapp.org/sword-v2/). This means that to create a
new dataset listing, you will have to first initialize a dataset entry with
some metadata, add one or more files to the dataset, and then publish
it. This looks something like the following:

Dataverse provides two - basically unrelated - workflows for managing
(adding, documenting, and publishing) datasets. The first is built on
[SWORD v2.0](http://swordapp.org/sword-v2/). This means that to create a
new dataset listing, you will have to first initialize a dataset entry
with some metadata, add one or more files to the dataset, and then
publish it. This looks something like the following:

``` r
# retrieve your service document
@@ -324,6 +324,6 @@ Scott Chamberlain’s [oai](https://cran.r-project.org/package=oai), which
offer metadata download from any web repository that is compliant with
the [Open Archives Initiative](http://www.openarchives.org/) standards.
Additionally, [rdryad](https://cran.r-project.org/package=rdryad) uses
OAIHarvester to interface with [Dryad](http://datadryad.org/). The
OAIHarvester to interface with [Dryad](https://datadryad.org/stash). The
[rfigshare](https://cran.r-project.org/package=rfigshare) package works
in a similar spirit to **dataverse** with <https://figshare.com/>.
27 changes: 27 additions & 0 deletions cran-comments.md
@@ -0,0 +1,27 @@
Description
-----------------------------------------------

This submission includes new features and updates to stay compliant with R checks.

A second change is that I am now the package maintainer, taking over from Thomas J. Leeper ([email protected]). See https://github.com/IQSS/dataverse-client-r/issues/42 and https://github.com/IQSS/dataverse-client-r/issues/21.

The first submission on Jan 17/18 was rejected because the CRAN check had three notes that one documentation example exceeded 10 seconds. In response, I've added a `dontrun{}` block on most of that example.

Thank you for taking the time to review my submission, and please tell me if there's something else I should do for CRAN. -Will Beasley


Test environments
-----------------------------------------------

1. Local Ubuntu, R 4.0.3
1. Local Win10, R 4.0.3 Patched
1. [r-hub](https://builder.r-hub.io/status/dataverse_0.3.0.tar.gz-905624c45a92467eb688858acab1a13)
1. [win-builder](https://win-builder.r-project.org/xYyWrC1uFjXH), development version.
1. [Travis CI](https://travis-ci.org/github/IQSS/dataverse-client-r), Ubuntu 18.04 LTS


R CMD check results
-----------------------------------------------

* No ERRORs or WARNINGs on any builds.
* One NOTE about the new package maintainer
2 changes: 1 addition & 1 deletion docs/404.html

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion docs/ISSUE_TEMPLATE.html

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
