fix conduct path and https
Myfanwy committed Dec 13, 2022
1 parent 12f63ec commit 08937e1
Showing 4 changed files with 17 additions and 15 deletions.
1 change: 1 addition & 0 deletions .Rbuildignore
@@ -11,3 +11,4 @@
^CRAN-RELEASE$
^\.github$
^CODE_OF_CONDUCT\.md$
^CRAN-SUBMISSION$
10 changes: 5 additions & 5 deletions R/gutenberg_download.R
@@ -232,7 +232,7 @@ gutenberg_strip <- function(text) {
#'
#' Get the recommended mirror for Gutenberg files by accessing
#' the wget harvest path, which is
#' \url{http://www.gutenberg.org/robot/harvest?filetypes[]=txt}.
#' \url{https://www.gutenberg.org/robot/harvest?filetypes[]=txt}.
#' Also sets the global \code{gutenberg_mirror} options.
#'
#' @param verbose Whether to show messages about the Project Gutenberg
@@ -253,10 +253,10 @@ gutenberg_get_mirror <- function(verbose = TRUE) {
if (verbose) {
message(
"Determining mirror for Project Gutenberg from ",
"http://www.gutenberg.org/robot/harvest"
"https://www.gutenberg.org/robot/harvest"
)
}
wget_url <- "http://www.gutenberg.org/robot/harvest?filetypes[]=txt"
wget_url <- "https://www.gutenberg.org/robot/harvest?filetypes[]=txt"
lines <- readr::read_lines(wget_url)
a <- lines[stringr::str_detect(lines, stringr::fixed("<a href="))][1]
mirror_full_url <- stringr::str_match(a, "href=\"(.*?)\"")[2]
@@ -265,10 +265,10 @@ gutenberg_get_mirror <- function(verbose = TRUE) {
parsed <- urltools::url_parse(mirror_full_url)
mirror <- glue::glue("{parsed$scheme}://{parsed$domain}")

if (mirror == "http://www.gutenberg.lib.md.us") { # nocov start
if (mirror == "https://www.gutenberg.lib.md.us") { # nocov start
# this mirror is broken (PG has been contacted)
# for now, replace:
mirror <- "http://aleph.gutenberg.org"
mirror <- "https://aleph.gutenberg.org"
} # nocov end

if (verbose) {
8 changes: 4 additions & 4 deletions README.Rmd
@@ -23,7 +23,7 @@ gutenbergr: R package to search and download public domain texts from Project Gu

<!-- badges: start -->
[![Build Status](https://travis-ci.org/ropensci/gutenbergr.svg?branch=master)](https://travis-ci.org/ropensci/gutenbergr)
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/gutenbergr)]( https://CRAN.R-project.org/package=gutenbergr)
[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/gutenbergr)]( https://CRAN.R-project.org/package=gutenbergr)
[![Build status](https://ci.appveyor.com/api/projects/status/lqb7hngtj5epsmd1?svg=true)](https://ci.appveyor.com/project/ropensci/gutenbergr-dujv9)
[![Coverage Status](https://img.shields.io/codecov/c/github/ropensci/gutenbergr/master.svg)](https://codecov.io/github/ropensci/gutenbergr?branch=master)
[![rOpenSci peer-review](https://badges.ropensci.org/41_status.svg)](https://github.com/ropensci/software-review/issues/41)
@@ -118,14 +118,14 @@ See the [data-raw](https://github.com/ropensci/gutenbergr/tree/master/data-raw)

Yes! The package respects [these rules](https://www.gutenberg.org/policy/robot_access.html) and complies to the best of our ability. Namely:

* Project Gutenberg allows wget to harvest Project Gutenberg using [this list of links](http://www.gutenberg.org/robot/harvest?filetypes[]=html). The gutenbergr package visits that page once to find the recommended mirror for the user's location.
* We retrieve the book text directly from that mirror using links in the same format. For example, Frankenstein (book 84) is retrieved from `http://www.gutenberg.lib.md.us/8/84/84.zip`.
* Project Gutenberg allows wget to harvest Project Gutenberg using [this list of links](https://www.gutenberg.org/robot/harvest?filetypes[]=html). The gutenbergr package visits that page once to find the recommended mirror for the user's location.
* We retrieve the book text directly from that mirror using links in the same format. For example, Frankenstein (book 84) is retrieved from `https://www.gutenberg.lib.md.us/8/84/84.zip`.
* We retrieve the .zip file rather than txt to minimize bandwidth on the mirror.

Still, this package is *not* the right way to download the entire Project Gutenberg corpus (or all from a particular language). For that, follow [their recommendation](https://www.gutenberg.org/policy/robot_access.html) to use wget or set up a mirror. This package is recommended for downloading a single work, or works for a particular author or topic.

### Code of Conduct

This project is released with a [Contributor Code of Conduct](https://github.com/ropensci/gutenbergr/blob/master/CONDUCT.md). By participating in this project you agree to abide by its terms.
Please note that the gutenbergr project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/1/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.

[![ropensci\_footer](https://ropensci.org/public_images/github_footer.png)](https://ropensci.org/)
13 changes: 7 additions & 6 deletions README.md
@@ -10,7 +10,7 @@

[![Build
Status](https://travis-ci.org/ropensci/gutenbergr.svg?branch=master)](https://travis-ci.org/ropensci/gutenbergr)
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/gutenbergr)](https://CRAN.R-project.org/package=gutenbergr)
[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/gutenbergr)](https://CRAN.R-project.org/package=gutenbergr)
[![Build
status](https://ci.appveyor.com/api/projects/status/lqb7hngtj5epsmd1?svg=true)](https://ci.appveyor.com/project/ropensci/gutenbergr-dujv9)
[![Coverage
@@ -219,12 +219,12 @@ to the best of our ability. Namely:

- Project Gutenberg allows wget to harvest Project Gutenberg using [this
list of
links](http://www.gutenberg.org/robot/harvest?filetypes%5B%5D=html).
links](https://www.gutenberg.org/robot/harvest?filetypes%5B%5D=html).
The gutenbergr package visits that page once to find the recommended
mirror for the user’s location.
- We retrieve the book text directly from that mirror using links in the
same format. For example, Frankenstein (book 84) is retrieved from
`http://www.gutenberg.lib.md.us/8/84/84.zip`.
`https://www.gutenberg.lib.md.us/8/84/84.zip`.
- We retrieve the .zip file rather than txt to minimize bandwidth on the
mirror.

@@ -237,8 +237,9 @@ a single work, or works for a particular author or topic.

### Code of Conduct

This project is released with a [Contributor Code of
Conduct](https://github.com/ropensci/gutenbergr/blob/master/CONDUCT.md).
By participating in this project you agree to abide by its terms.
Please note that the gutenbergr project is released with a [Contributor
Code of
Conduct](https://contributor-covenant.org/version/2/1/CODE_OF_CONDUCT.html).
By contributing to this project, you agree to abide by its terms.

[![ropensci_footer](https://ropensci.org/public_images/github_footer.png)](https://ropensci.org/)
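
The README text in this diff says book text is fetched from the mirror "using links in the same format", with Frankenstein (book 84) at `https://www.gutenberg.lib.md.us/8/84/84.zip`. A minimal base-R sketch of that URL layout follows. The helper name `book_zip_url` is hypothetical and not part of gutenbergr, and the rule it encodes (one directory per digit except the last, with `0` used for single-digit IDs) is an assumption generalized from the single example in the README, not something this commit confirms.

```r
# Hypothetical helper (not part of gutenbergr): build the .zip URL for a
# book ID on a given mirror, following the layout of the README example.
book_zip_url <- function(mirror, id) {
  # Format as a plain integer string so large IDs never print as "1e+05".
  id <- sprintf("%d", as.integer(id))

  if (nchar(id) == 1) {
    # Assumption: single-digit IDs are stored under a "0" directory.
    prefix <- "0"
  } else {
    # Every digit except the last becomes its own directory level.
    digits <- strsplit(substr(id, 1, nchar(id) - 1), "")[[1]]
    prefix <- paste(digits, collapse = "/")
  }

  paste(mirror, prefix, id, paste0(id, ".zip"), sep = "/")
}

# Reproduces the README example for Frankenstein (book 84).
frankenstein_url <- book_zip_url("https://www.gutenberg.lib.md.us", 84)
print(frankenstein_url)
```

Under these assumptions the call reproduces the URL quoted in the README. The mirror root is passed in as an argument so the sketch stays independent of whichever mirror `gutenberg_get_mirror()` returns.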
