Commit
Update zip/txt retrieval (#71)
jrdnbradford authored Sep 2, 2024
1 parent 2a73b8e commit ee02847
Showing 2 changed files with 3 additions and 3 deletions.
2 changes: 1 addition & 1 deletion README.Rmd
@@ -119,7 +119,7 @@ Yes! The package respects [these rules](https://www.gutenberg.org/policy/robot_a

* Project Gutenberg allows wget to harvest Project Gutenberg using [this list of links](https://www.gutenberg.org/robot/harvest?filetypes[]=html). The gutenbergr package visits that page once to find the recommended mirror for the user's location.
* We retrieve the book text directly from that mirror using links in the same format. For example, Frankenstein (book 84) is retrieved from `https://www.gutenberg.lib.md.us/8/84/84.zip`.
-* We retrieve the .zip file rather than txt to minimize bandwidth on the mirror.
+* We give priority to retrieving the `.zip` file to minimize bandwidth on the mirror. `.txt` files are only retrieved if there is no `.zip`.

Still, this package is *not* the right way to download the entire Project Gutenberg corpus (or all from a particular language). For that, follow [their recommendation](https://www.gutenberg.org/policy/robot_access.html) to use wget or set up a mirror. This package is recommended for downloading a single work, or works for a particular author or topic.

4 changes: 2 additions & 2 deletions README.md
@@ -230,8 +230,8 @@ to the best of our ability. Namely:
- We retrieve the book text directly from that mirror using links in the
same format. For example, Frankenstein (book 84) is retrieved from
`https://www.gutenberg.lib.md.us/8/84/84.zip`.
-- We retrieve the .zip file rather than txt to minimize bandwidth on the
-mirror.
+- We give priority to retrieving the `.zip` file to minimize bandwidth
+on the mirror. `.txt` files are only retrieved if there is no `.zip`.

Still, this package is *not* the right way to download the entire
Project Gutenberg corpus (or all from a particular language). For that,
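
The retrieval order described in the updated bullet (try `.zip` first, fall back to `.txt` only when no `.zip` exists) can be sketched as follows. This is an illustrative Python sketch, not the package's R implementation. The function names `book_dir`, `candidate_urls`, and `choose_url` are hypothetical, and the directory rule for ids other than 84 is an assumption generalized from the single example in the README.

```python
from typing import Callable, List, Optional


def book_dir(book_id: int) -> str:
    """Directory for a book on a mirror, e.g. 84 -> '8/84'.

    Assumption: every digit except the last becomes a path segment, and
    single-digit ids live under '0'. Only the book 84 case is confirmed
    by the README text in this commit.
    """
    digits = str(book_id)
    prefix = "/".join(digits[:-1]) or "0"
    return f"{prefix}/{digits}"


def candidate_urls(mirror: str, book_id: int) -> List[str]:
    """Candidate download URLs in priority order: .zip first, then .txt."""
    base = f"{mirror.rstrip('/')}/{book_dir(book_id)}/{book_id}"
    return [f"{base}.zip", f"{base}.txt"]


def choose_url(urls: List[str], exists: Callable[[str], bool]) -> Optional[str]:
    """Return the first URL for which exists() is true, or None if none match.

    `exists` is supplied by the caller (for example an HTTP HEAD check),
    so this sketch itself performs no network requests.
    """
    for url in urls:
        if exists(url):
            return url
    return None
```

Passing the existence check in as a callable keeps the priority logic separate from any network access, so the ordering can be exercised without contacting a mirror.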
