
Commit

Add gutenberg_get_all_mirrors
jrdnbradford committed Aug 31, 2024
1 parent 84acc41 commit 07d3976
Showing 7 changed files with 110 additions and 0 deletions.
1 change: 1 addition & 0 deletions NAMESPACE
@@ -1,6 +1,7 @@
# Generated by roxygen2: do not edit by hand

export(gutenberg_download)
export(gutenberg_get_all_mirrors)
export(gutenberg_get_mirror)
export(gutenberg_strip)
export(gutenberg_works)
2 changes: 2 additions & 0 deletions NEWS.md
@@ -1,5 +1,7 @@
# gutenbergr (development version)

* `gutenberg_get_all_mirrors()` has been added to retrieve mirror data (@jrdnbradford, #58)

# gutenbergr 0.2.4

* Update data scraping process to use R end-to-end (@jonthegeek, #36).
42 changes: 42 additions & 0 deletions R/gutenberg_mirrors.R
@@ -44,3 +44,45 @@ gutenberg_get_mirror <- function(verbose = TRUE) {
  options(gutenberg_mirror = mirror)
  return(mirror)
}


#' Get all mirror data from Project Gutenberg
#'
#' Get all the mirror data from \url{https://www.gutenberg.org/MIRRORS.ALL}
#'
#' @return A tbl_df of Project Gutenberg mirrors and related data
#' \describe{
#'
#' \item{continent}{Continent where the mirror is located}
#'
#' \item{nation}{Nation where the mirror is located}
#'
#' \item{location}{Location of the mirror}
#'
#' \item{provider}{Provider of the mirror}
#'
#' \item{url}{URL of the mirror}
#'
#' \item{note}{Special notes}
#' }
#' @examplesIf interactive()
#'
#' gutenberg_get_all_mirrors()
#'
#' @export
gutenberg_get_all_mirrors <- function() {
  mirrors_url <- "https://www.gutenberg.org/MIRRORS.ALL"
  mirrors_md <- read_url(mirrors_url)
  tmp <- tempfile(fileext = ".md")
  writeLines(mirrors_md, tmp)
  mirrors <- suppressWarnings(
    readr::read_delim(
      tmp,
      delim = "|",
      trim_ws = TRUE
    ) |>
      dplyr::slice(2:(dplyr::n() - 1))
  )

  return(mirrors)
}
34 changes: 34 additions & 0 deletions man/gutenberg_get_all_mirrors.Rd

Some generated files are not rendered by default.

21 changes: 21 additions & 0 deletions tests/testthat/fixtures/MIRRORS-ALL
@@ -0,0 +1,21 @@
continent | nation | location | provider | url | note
---------------+---------------+---------------------+----------------------------------------------+------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------
Europe | Great Britain | Kent | UK Mirror Service | http://www.mirrorservice.org/sites/ftp.ibiblio.org/pub/docs/books/gutenberg/ |
Europe | Great Britain | Kent | UK Mirror Service | ftp://ftp.mirrorservice.org/sites/ftp.ibiblio.org/pub/docs/books/gutenberg/ |
Europe | Great Britain | Kent | UK Mirror Service | rsync://rsync.mirrorservice.org/gutenberg/ |
Europe | Portugal | Braga | Universidade do Minho | http://eremita.di.uminho.pt/gutenberg/ |
Europe | Portugal | Braga | Universidade do Minho | ftp://eremita.di.uminho.pt/pub/gutenberg/ |
North America | Canada | Waterloo | University of Waterloo Computer Science Club | http://mirror.csclub.uwaterloo.ca/gutenberg/ |
North America | United States | Buffalo, NY | Jake Nabasny | https://gutenberg.nabasny.com/ |
North America | United States | Chapel Hill | iBiblio | https://www.gutenberg.org/dirs/ | Main Project Gutenberg Collection Site
North America | United States | Chapel Hill | iBiblio | ftp://ftp.ibiblio.org/pub/docs/books/gutenberg/ | Main Project Gutenberg FTP Site.
North America | United States | Pikeville, Kentucky | SandyRiver.NET | https://mirror2.sandyriver.net/pub/gutenberg | High speed mirror on a 10Gb network connection. Also available by http, and by rsync to rsync://mirror2.sandyriver.net/pub/gutenberg
North America | United States | Salt Lake City | Xmission ISP - FTP | ftp://mirrors.xmission.com/gutenberg/ |
North America | United States | Salt Lake City | Xmission ISP - HTTP | http://mirrors.xmission.com/gutenberg/ |
North America | United States | San Diego | Project Gutenberg | ftp://gutenberg.pglaf.org | High-speed mirror. Includes cache/generated files (epub, mobi, etc.).
North America | United States | San Diego | Project Gutenberg | https://aleph.gutenberg.org/ | High-speed mirror. Includes cache/generated files (epub, mobi, etc.). Also available via rsync and ftp.
North America | United States | San Diego | Project Gutenberg | https://gutenberg.pglaf.org/ | High-speed mirror. Includes cache/generated files (epub, mobi, etc.).
North America | United States | San Diego | Project Gutenberg | gopher://gopher.pglaf.org/ | Gopher server.
North America | United States | San Diego | Project Gutenberg | rsync://gutenberg.pglaf.org/gutenberg | High-speed mirror. Includes cache/generated files (epub, mobi, etc.).
(17 rows)

1 change: 1 addition & 0 deletions tests/testthat/fixtures/create_fixtures.R
@@ -16,3 +16,4 @@ dl_fixture("https://www.gutenberg.org/cache/epub/68283/pg68283.txt")
dl_fixture("https://www.gutenberg.org/robot/harvest?filetypes[]=txt")
dl_fixture("http://aleph.gutenberg.org/1/0/105/105-0.zip")
dl_fixture("http://aleph.gutenberg.org/1/0/109/109.zip")
dl_fixture("https://www.gutenberg.org/MIRRORS.ALL")
9 changes: 9 additions & 0 deletions tests/testthat/test-gutenberg_mirrors.R
@@ -29,3 +29,12 @@ test_that("gutenberg_get_mirror uses existing option", {
    gutenberg_get_mirror(), "mirror"
  )
})

test_that("gutenberg_get_all_mirrors works", {
  local_dl_and_read()
  mirrors <- gutenberg_get_all_mirrors()
  expect_true(inherits(mirrors, "data.frame"))
  expect_true(inherits(mirrors, "tbl_df"))
  expect_equal(ncol(mirrors), 6)
  expect_true(nrow(mirrors) > 10)
})
