diff --git a/docs/404.html b/docs/404.html
index cf54aa6..e1d7e9e 100644
--- a/docs/404.html
+++ b/docs/404.html
@@ -33,7 +33,7 @@
 readtext
-0.82
+0.91
diff --git a/docs/CONDUCT.html b/docs/CONDUCT.html
index 5c2a260..5cc474a 100644
--- a/docs/CONDUCT.html
+++ b/docs/CONDUCT.html
@@ -17,7 +17,7 @@
 readtext
-0.82
+0.91
diff --git a/docs/articles/index.html b/docs/articles/index.html
index 3a02456..bec1c14 100644
--- a/docs/articles/index.html
+++ b/docs/articles/index.html
@@ -17,7 +17,7 @@
 readtext
-0.82
+0.91
diff --git a/docs/articles/readtext_vignette.html b/docs/articles/readtext_vignette.html
index 6b70e4a..389bf8d 100644
--- a/docs/articles/readtext_vignette.html
+++ b/docs/articles/readtext_vignette.html
@@ -34,7 +34,7 @@
 readtext
-0.82
+0.91
@@ -386,16 +386,16 @@

3. Inter-operability with quanteda

     summary(corpus_csv, 5)
 }
 ## Loading required package: quanteda
-## Package version: 3.2.4
+## Package version: 3.3.1
 ## Unicode version: 14.0
 ## ICU version: 71.1
 ## Parallel computing: 10 of 10 threads used.
 ## See https://quanteda.io for tutorials and examples.
 ## 
 ## Attaching package: 'quanteda'
-## The following objects are masked from 'package:readtext':
+## The following object is masked from 'package:readtext':
 ## 
-##     docnames, docvars, texts
+##     texts
 ## Corpus consisting of 5 documents, showing 5 documents:
 ## 
 ##  Text Types Tokens Sentences Year President FirstName
@@ -569,8 +569,8 @@

4.2 Read files with different encod

 ## Corpus consisting of 36 documents, showing 5 documents:
 ## 
 ##  Text                               Types Tokens Sentences document
-##  IndianTreaty_English_UTF-16LE.txt    617   2577       152 IndianTreaty
-##  IndianTreaty_English_UTF-8-BOM.txt   645   3092       150 IndianTreaty
+##  IndianTreaty_English_UTF-16LE.txt    618   2577       152 IndianTreaty
+##  IndianTreaty_English_UTF-8-BOM.txt   647   3085       150 IndianTreaty
 ##  UDHR_Arabic_ISO-8859-6.txt           753   1555        86 UDHR
 ##  UDHR_Arabic_UTF-8.txt                753   1555        86 UDHR
 ##  UDHR_Arabic_WINDOWS-1256.txt         753   1555        86 UDHR
diff --git a/docs/authors.html b/docs/authors.html
index f802d0e..4d18651 100644
--- a/docs/authors.html
+++ b/docs/authors.html
@@ -17,7 +17,7 @@
 readtext
-0.82
+0.91
@@ -81,15 +81,15 @@

Citation

-Benoit K, Obeng A (2023).
+Benoit K, Obeng A (2024).
 readtext: Import and Handling for Plain and Formatted Text Files.
-R package version 0.82, https://github.com/quanteda/readtext.
+R package version 0.91, https://github.com/quanteda/readtext.

@Manual{,
   title = {readtext: Import and Handling for Plain and Formatted Text Files},
   author = {Kenneth Benoit and Adam Obeng},
-  year = {2023},
-  note = {R package version 0.82},
+  year = {2024},
+  note = {R package version 0.91},
   url = {https://github.com/quanteda/readtext},
 }
diff --git a/docs/index.html b/docs/index.html
index 6310db5..282a2bb 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -36,7 +36,7 @@
 readtext
-0.82
+0.91
@@ -94,11 +94,11 @@

How to Install

From GitHub, if you want the latest development version.

 # devtools package required to install readtext from GitHub 
-devtools::install_github("quanteda/readtext") 
+remotes::install_github("quanteda/readtext")

Linux note: There are a couple of dependencies that may not be available on Linux systems. On Debian/Ubuntu, try installing these packages by running these commands at the command line:

-
sudo apt-get install libpoppler-cpp-dev   # for antiword
+
sudo apt-get install libpoppler-cpp-dev   # for antiword

Demonstration: Reading one or more text files
@@ -106,8 +106,7 @@

Demonstration: Reading one

readtext supports plain text files (.txt), data in some form of JavaScript Object Notation (.json), comma- or tab-separated values (.csv, .tab, .tsv), XML documents (.xml), as well as PDF, Microsoft Word formatted files and other document formats (.pdf, .doc, .docx, .odt, .rtf). readtext also handles multiple files and file types using, for instance, a “glob” expression, files from a URL, or an archive file (.zip, .tar, .tar.gz, .tar.bz).

The file formats are determined automatically by the filename extensions. If a file has no extension, or its extension is unknown, readtext will assume that it is plain text. The following command, for instance, will load all of the files from the subdirectory txt/UDHR/:

-require(readtext)
-## Loading required package: readtext
+library("readtext")
 # get the data directory from readtext
 DATA_DIR <- system.file("extdata/", package = "readtext")
 
@@ -149,16 +148,16 @@ 

With quanteda

readtext was originally developed in early versions of the quanteda package for the quantitative analysis of textual data. Because quanteda’s corpus constructor recognizes the data.frame format returned by readtext(), it can construct a corpus directly from a readtext object, preserving all docvars and other meta-data.

 library("quanteda")
-## Package version: 3.2.4
+## Package version: 3.3.1
 ## Unicode version: 14.0
 ## ICU version: 71.1
 ## Parallel computing: 10 of 10 threads used.
 ## See https://quanteda.io for tutorials and examples.
 ## 
 ## Attaching package: 'quanteda'
-## The following objects are masked from 'package:readtext':
+## The following object is masked from 'package:readtext':
 ## 
-##     docnames, docvars, texts
+##     texts
 # read in comma-separated values with readtext
 rt_csv <- readtext(paste0(DATA_DIR, "/csv/inaugCorpus.csv"), text_field = "texts")
 # create quanteda corpus
@@ -221,7 +220,7 @@ 

Developers

Dev status

  • CRAN Version
  • -
  • +
  • Downloads
  • Total Downloads
  • R-CMD-check
  •
diff --git a/docs/news/index.html b/docs/news/index.html
index da3011f..7598958 100644
--- a/docs/news/index.html
+++ b/docs/news/index.html
@@ -17,7 +17,7 @@
 readtext
-0.82
+0.91
@@ -48,14 +48,22 @@

Changelog

- -
  • Moves some quanteda functions to this package: docvars(), docnames(), texts() + +
    • Completes changes for compatibility with quanteda 4.0.
    • +
+
+ +
+
+ +
  • Moves some quanteda functions to this package: docvars(), docnames(), texts()
  • Updates print method to use pillar instead of tibble
  • Modernizes some of the testthat syntax.
- +
  • Fixed a problem in the examples breaking CRAN checks on Solaris.
  • Changed documentation to markdown.
diff --git a/docs/pkgdown.yml b/docs/pkgdown.yml
index be286e8..8a2d3bc 100644
--- a/docs/pkgdown.yml
+++ b/docs/pkgdown.yml
@@ -1,9 +1,9 @@
-pandoc: 2.19.2
+pandoc: 3.1.1
 pkgdown: 2.0.7
 pkgdown_sha: ~
 articles:
   readtext_vignette: readtext_vignette.html
-last_built: 2023-04-06T07:32Z
+last_built: 2024-02-23T05:03Z
 urls:
   reference: http://readtext.quanteda.io/reference
   article: http://readtext.quanteda.io/articles
diff --git a/docs/reference/add_docid.html b/docs/reference/add_docid.html
index 7f08a07..def3dea 100644
--- a/docs/reference/add_docid.html
+++ b/docs/reference/add_docid.html
@@ -17,7 +17,7 @@
 readtext
-0.82
+0.91

@@ -69,10 +69,6 @@

Arguments

docid_field

numeric or character; indicate position of a text column in x

- -
impute_types
-

logical; if TRUE, set types of variables automatically

-
diff --git a/docs/reference/as.character.readtext.html b/docs/reference/as.character.readtext.html
index 4acd3ea..ac8447a 100644
--- a/docs/reference/as.character.readtext.html
+++ b/docs/reference/as.character.readtext.html
@@ -18,7 +18,7 @@
 readtext
-0.82
+0.91
diff --git a/docs/reference/basename_unique.html b/docs/reference/basename_unique.html
index 6d42af4..5632bea 100644
--- a/docs/reference/basename_unique.html
+++ b/docs/reference/basename_unique.html
@@ -17,7 +17,7 @@
 readtext
-0.82
+0.91
diff --git a/docs/reference/cache_remote.html b/docs/reference/cache_remote.html
index e823599..c96ae38 100644
--- a/docs/reference/cache_remote.html
+++ b/docs/reference/cache_remote.html
@@ -17,7 +17,7 @@
 readtext
-0.82
+0.91
diff --git a/docs/reference/data_char_encodedtexts.html b/docs/reference/data_char_encodedtexts.html
index 0a36dad..cef81f9 100644
--- a/docs/reference/data_char_encodedtexts.html
+++ b/docs/reference/data_char_encodedtexts.html
@@ -18,7 +18,7 @@
 readtext
-0.82
+0.91
diff --git a/docs/reference/data_files_encodedtexts.html b/docs/reference/data_files_encodedtexts.html
index 8e123f0..730fdd8 100644
--- a/docs/reference/data_files_encodedtexts.html
+++ b/docs/reference/data_files_encodedtexts.html
@@ -19,7 +19,7 @@
 readtext
-0.82
+0.91
@@ -69,18 +69,18 @@

Examples

 FILEDIR <- tempdir()
 unzip(system.file("extdata", "data_files_encodedtexts.zip", package = "readtext"), exdir = FILEDIR)
-#> Error in dir.create(exdir, showWarnings = FALSE, recursive = TRUE): object 'FILEDIR' not found
+#> Error in eval(expr, envir, enclos): object 'FILEDIR' not found
 # get encoding from filename
 filenames <- list.files(FILEDIR, "\\.txt$")
-#> Error in list.files(FILEDIR, "\\.txt$"): object 'FILEDIR' not found
+#> Error in eval(expr, envir, enclos): object 'FILEDIR' not found
 # strip the extension
 filenames <- gsub(".txt$", "", filenames)
-#> Error in is.factor(x): object 'filenames' not found
+#> Error in eval(expr, envir, enclos): object 'filenames' not found
 parts <- strsplit(filenames, "_")
-#> Error in strsplit(filenames, "_"): object 'filenames' not found
+#> Error in eval(expr, envir, enclos): object 'filenames' not found
 fileencodings <- sapply(parts, "[", 3)
-#> Error in lapply(X = X, FUN = FUN, ...): object 'parts' not found
+#> Error in eval(expr, envir, enclos): object 'parts' not found
 fileencodings
 #> Error in eval(expr, envir, enclos): object 'fileencodings' not found
@@ -88,54 +88,54 @@

Examples

 cat("Encoding conversions not available for this platform:")
 #> Encoding conversions not available for this platform:
 notAvailableIndex <- which(!(fileencodings %in% iconvlist()))
-#> Error in fileencodings %in% iconvlist(): object 'fileencodings' not found
+#> Error in eval(expr, envir, enclos): object 'fileencodings' not found
 fileencodings[notAvailableIndex]
 #> Error in eval(expr, envir, enclos): object 'fileencodings' not found
 # try readtext
 require(quanteda)
 #> Loading required package: quanteda
-#> Package version: 3.2.4
+#> Package version: 3.3.1
 #> Unicode version: 14.0
 #> ICU version: 71.1
 #> Parallel computing: 10 of 10 threads used.
 #> See https://quanteda.io for tutorials and examples.
 #> 
 #> Attaching package: ‘quanteda’
-#> The following objects are masked from ‘package:readtext’:
+#> The following object is masked from ‘package:readtext’:
 #> 
-#>     docnames, docvars, texts
+#>     texts
 txts <- readtext(paste0(FILEDIR, "/", "*.txt"))
-#> Error in paste0(FILEDIR, "/", "*.txt"): object 'FILEDIR' not found
+#> Error in eval(expr, envir, enclos): object 'FILEDIR' not found
 substring(texts(txts)[1], 1, 80) # gibberish
-#> Error in texts(txts): object 'txts' not found
+#> Error in eval(expr, envir, enclos): object 'txts' not found
 substring(texts(txts)[4], 1, 80) # hex
-#> Error in texts(txts): object 'txts' not found
+#> Error in eval(expr, envir, enclos): object 'txts' not found
 substring(texts(txts)[40], 1, 80) # hex
-#> Error in texts(txts): object 'txts' not found
+#> Error in eval(expr, envir, enclos): object 'txts' not found
 # read them in again
 txts <- readtext(paste0(FILEDIR, "/", "*.txt"), encoding = fileencodings)
-#> Error in paste0(FILEDIR, "/", "*.txt"): object 'FILEDIR' not found
+#> Error in eval(expr, envir, enclos): object 'FILEDIR' not found
 substring(texts(txts)[1], 1, 80) # English
-#> Error in texts(txts): object 'txts' not found
+#> Error in eval(expr, envir, enclos): object 'txts' not found
 substring(texts(txts)[4], 1, 80) # Arabic, looking good
-#> Error in texts(txts): object 'txts' not found
+#> Error in eval(expr, envir, enclos): object 'txts' not found
 substring(texts(txts)[40], 1, 80) # Cyrillic, looking good
-#> Error in texts(txts): object 'txts' not found
+#> Error in eval(expr, envir, enclos): object 'txts' not found
 substring(texts(txts)[7], 1, 80) # Chinese, looking good
-#> Error in texts(txts): object 'txts' not found
+#> Error in eval(expr, envir, enclos): object 'txts' not found
 substring(texts(txts)[26], 1, 80) # Hindi, looking good
-#> Error in texts(txts): object 'txts' not found
+#> Error in eval(expr, envir, enclos): object 'txts' not found
 txts <- readtext(paste0(FILEDIR, "/", "*.txt"), encoding = fileencodings,
                  docvarsfrom = "filenames",
                  docvarnames = c("document", "language", "inputEncoding"))
-#> Error in paste0(FILEDIR, "/", "*.txt"): object 'FILEDIR' not found
+#> Error in eval(expr, envir, enclos): object 'FILEDIR' not found
 encodingCorpus <- corpus(txts, source = "Created by encoding-tests.R")
-#> Error in corpus(txts, source = "Created by encoding-tests.R"): object 'txts' not found
+#> Error in eval(expr, envir, enclos): object 'txts' not found
 summary(encodingCorpus)
-#> Error in summary(encodingCorpus): object 'encodingCorpus' not found
+#> Error in eval(expr, envir, enclos): object 'encodingCorpus' not found
diff --git a/docs/reference/encoding.html b/docs/reference/encoding.html
index 3ee6655..a013ade 100644
--- a/docs/reference/encoding.html
+++ b/docs/reference/encoding.html
@@ -21,7 +21,7 @@
 readtext
-0.82
+0.91
diff --git a/docs/reference/get_nexis_html.html b/docs/reference/get_nexis_html.html
index 7718873..6acfe77 100644
--- a/docs/reference/get_nexis_html.html
+++ b/docs/reference/get_nexis_html.html
@@ -18,7 +18,7 @@
 readtext
-0.82
+0.91
diff --git a/docs/reference/get_temp.html b/docs/reference/get_temp.html
index f732d86..d85f864 100644
--- a/docs/reference/get_temp.html
+++ b/docs/reference/get_temp.html
@@ -17,7 +17,7 @@
 readtext
-0.82
+0.91
@@ -73,7 +73,7 @@

Arguments

seed
-a seed value for digest::digest. If codeNULL, a random
+a seed value for digest::digest. If NULL, a random
 value will be used.

diff --git a/docs/reference/impute_types.html b/docs/reference/impute_types.html
index 61e7062..f92725d 100644
--- a/docs/reference/impute_types.html
+++ b/docs/reference/impute_types.html
@@ -18,7 +18,7 @@
 readtext
-0.82
+0.91
diff --git a/docs/reference/index.html b/docs/reference/index.html
index 2ec4760..fde3109 100644
--- a/docs/reference/index.html
+++ b/docs/reference/index.html
@@ -17,7 +17,7 @@
 readtext
-0.82
+0.91
@@ -62,14 +62,6 @@

All functions

data_files_encodedtexts

a .zip file of texts containing a variety of differently encoded texts

- -

docnames()

- -

Extract document names from a readtext object

- -

docvars()

- -

Extract document variables from a readtext object

encoding()

diff --git a/docs/reference/print.readtext.html b/docs/reference/print.readtext.html
index 568cb8b..ac15dfd 100644
--- a/docs/reference/print.readtext.html
+++ b/docs/reference/print.readtext.html
@@ -17,7 +17,7 @@
 readtext
-0.82
+0.91
diff --git a/docs/reference/readtext-package.html b/docs/reference/readtext-package.html
index 6c1eb54..643f6d3 100644
--- a/docs/reference/readtext-package.html
+++ b/docs/reference/readtext-package.html
@@ -19,7 +19,7 @@
 readtext
-0.82
+0.91
@@ -76,6 +76,12 @@

Package options

readtext().

+

Author

Ken Benoit, Adam Obeng, and Paul Nulty

diff --git a/docs/reference/readtext.html b/docs/reference/readtext.html
index 78f9fbc..d884f0c 100644
--- a/docs/reference/readtext.html
+++ b/docs/reference/readtext.html
@@ -20,7 +20,7 @@
 readtext
-0.82
+0.91
diff --git a/docs/reference/readtext_options.html b/docs/reference/readtext_options.html
index ce7b796..9c72443 100644
--- a/docs/reference/readtext_options.html
+++ b/docs/reference/readtext_options.html
@@ -17,7 +17,7 @@
 readtext
-0.82
+0.91
diff --git a/docs/reference/sort_fields.html b/docs/reference/sort_fields.html
index b137338..44115fd 100644
--- a/docs/reference/sort_fields.html
+++ b/docs/reference/sort_fields.html
@@ -17,7 +17,7 @@
 readtext
-0.82
+0.91
diff --git a/docs/reference/texts.html b/docs/reference/texts.html
index 4cc03c4..6f4560b 100644
--- a/docs/reference/texts.html
+++ b/docs/reference/texts.html
@@ -17,7 +17,7 @@
 readtext
-0.82
+0.91
@@ -68,11 +68,6 @@

Arguments

...

not used

- -
spacer
-when concatenating texts by using groups, this will be the
-spacing added between texts. (Default is two spaces.)

-

Value