diff --git a/docs/404.html b/docs/404.html
index cf54aa6..e1d7e9e 100644
--- a/docs/404.html
+++ b/docs/404.html
@@ -33,7 +33,7 @@
diff --git a/docs/CONDUCT.html b/docs/CONDUCT.html
index 5c2a260..5cc474a 100644
--- a/docs/CONDUCT.html
+++ b/docs/CONDUCT.html
@@ -17,7 +17,7 @@
diff --git a/docs/articles/index.html b/docs/articles/index.html
index 3a02456..bec1c14 100644
--- a/docs/articles/index.html
+++ b/docs/articles/index.html
@@ -17,7 +17,7 @@
diff --git a/docs/articles/readtext_vignette.html b/docs/articles/readtext_vignette.html
index 6b70e4a..389bf8d 100644
--- a/docs/articles/readtext_vignette.html
+++ b/docs/articles/readtext_vignette.html
@@ -34,7 +34,7 @@
@@ -386,16 +386,16 @@
-Benoit K, Obeng A (2023).
+Benoit K, Obeng A (2024).
readtext: Import and Handling for Plain and Formatted Text Files.
-R package version 0.82, https://github.com/quanteda/readtext.
+R package version 0.91, https://github.com/quanteda/readtext.
@Manual{,
  title = {readtext: Import and Handling for Plain and Formatted Text Files},
  author = {Kenneth Benoit and Adam Obeng},
-  year = {2023},
-  note = {R package version 0.82},
+  year = {2024},
+  note = {R package version 0.91},
  url = {https://github.com/quanteda/readtext},
}
diff --git a/docs/index.html b/docs/index.html
index 6310db5..282a2bb 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -36,7 +36,7 @@
@@ -94,11 +94,11 @@
# devtools package required to install readtext from GitHub
-devtools::install_github("quanteda/readtext")
Linux note: There are a couple of dependencies that may not be available on Linux systems. On Debian/Ubuntu, try installing these packages by running these commands at the command line:
-sudo apt-get install libpoppler-cpp-dev # for antiword
readtext supports plain text files (.txt), data in some form of JavaScript Object Notation (.json), comma- or tab-separated values (.csv, .tab, .tsv), XML documents (.xml), as well as PDF, Microsoft Word formatted files and other document formats (.pdf, .doc, .docx, .odt, .rtf). readtext also handles multiple files and file types using, for instance, a “glob” expression, files from a URL, or an archive file (.zip, .tar, .tar.gz, .tar.bz).
The file formats are determined automatically by the filename extensions. If a file has no extension or its extension is unknown, readtext will assume that it is plain text. The following command, for instance, will load all of the files from the subdirectory txt/UDHR/:
-require(readtext)
-## Loading required package: readtext
+library("readtext")
# get the data directory from readtext
DATA_DIR <- system.file("extdata/", package = "readtext")
@@ -149,16 +148,16 @@ With quanteda
readtext was originally developed in early versions of the quanteda package for the quantitative analysis of textual data. Because quanteda’s corpus constructor recognizes the data.frame format returned by readtext(), it can construct a corpus directly from a readtext object, preserving all docvars and other meta-data.
library("quanteda")
-## Package version: 3.2.4
+## Package version: 3.3.1
## Unicode version: 14.0
## ICU version: 71.1
## Parallel computing: 10 of 10 threads used.
## See https://quanteda.io for tutorials and examples.
##
## Attaching package: 'quanteda'
-## The following objects are masked from 'package:readtext':
+## The following object is masked from 'package:readtext':
##
-## docnames, docvars, texts
+## texts
# read in comma-separated values with readtext
rt_csv <- readtext(paste0(DATA_DIR, "/csv/inaugCorpus.csv"), text_field = "texts")
# create quanteda corpus
@@ -221,7 +220,7 @@ Developers
Dev status
@@ -48,14 +48,22 @@ Changelog
docvars(), docnames(), texts()
+docvars(), docnames(), texts().
docvars(), docnames(), texts()
numeric or character; indicate position of a text column in x
logical; if TRUE, set types of variables automatically
-a seed value for digest::digest. If codeNULL, a random
+a seed value for digest::digest. If NULL, a random
value will be used.
data_files_encodedtexts
a .zip file of texts containing a variety of differently encoded texts
Extract document names from a readtext object
Extract document variables from a readtext object
readtext().
+ Useful links:
not used
when concatenating texts by using groups, this will be the
-spacing added between texts. (Default is two spaces.)