update documentation
David Muhr committed Dec 14, 2017
1 parent 751c71e commit 59767e1
Showing 4 changed files with 31 additions and 9 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,7 +1,7 @@
Package: stopwords
Type: Package
Title: Stopword lists in R
Version: 0.9
Version: 0.9.0
Authors@R: c(person("Kenneth", "Benoit", email = "[email protected]", role = "aut"),
person("David", "Muhr", email = "[email protected]", role = c("aut", "cre")),
person("Kohei", "Watanabe", email = "[email protected]", role = "aut"))
19 changes: 15 additions & 4 deletions README.Rmd
@@ -18,7 +18,7 @@ knitr::opts_chunk$set(
[![Downloads](https://cranlogs.r-pkg.org/badges/stopwords)](https://CRAN.R-project.org/package=stopwords)
[![Total Downloads](https://cranlogs.r-pkg.org/badges/grand-total/stopwords?color=orange)](https://CRAN.R-project.org/package=stopwords)

R package providing "one-stop shopping" for stopword lists in R, for multiple languages and sources. No longer should text analysis or NLP packages bake in their own stopword lists or functions, since this package can accommodate them all, and is easily extended.
R package providing "one-stop shopping" for stopword lists in R, for multiple languages and sources. No longer should text analysis or NLP packages bake in their own stopword lists or functions, since this package can accommodate them all, and is easily extended.

Created by [David Muhr](https://github.com/davnn), and extended in cooperation with [Kenneth Benoit](https://github.com/kbenoit) and [Kohei Watanabe](https://github.com/koheiw).

@@ -42,13 +42,24 @@ head(stopwords::stopwords("de", source = "stopwords-iso"), 20)
```

For compatibility with the former `quanteda::stopwords()`:

```{r}
head(stopwords::stopwords("german"), 20)
```

Explore sources and languages:

```{r}
# list all sources
stopwords::stopwords_getsources()
# list languages for a specific source
stopwords::stopwords_getlanguages("snowball")
```

## Languages available

The following coverage of languages is currently available, by source. Note that the inclusiveness of the stopword lists will vary by source, and the numebr of languages covered by a stopword list does not necessarily mean that the source is better than one with more limited coverage. (There may be many reasons to prefer the "snowball" source over the "stopwords-iso" source, for instance.)
The following coverage of languages is currently available, by source. Note that the inclusiveness of the stopword lists will vary by source, and the number of languages covered by a stopword list does not necessarily mean that the source is better than one with more limited coverage. (There may be many reasons to prefer the default "snowball" source over the "stopwords-iso" source, for instance.)
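
To make the point about inclusiveness concrete, the following is a minimal sketch that counts the German stopwords returned by two of the sources named above. It assumes the `stopwords` package is installed (the block is skipped otherwise); the variable names `sources` and `sizes` are illustrative only, and the counts will depend on the installed version.

```r
# Sketch: compare how many German stopwords each source provides.
# Assumption: the stopwords package is installed; otherwise nothing runs.
if (requireNamespace("stopwords", quietly = TRUE)) {
  sources <- c("snowball", "stopwords-iso")

  # One call per source; vapply keeps the source names on the result.
  sizes <- vapply(
    sources,
    function(src) length(stopwords::stopwords("de", source = src)),
    integer(1)
  )

  print(sizes)
}
```

A larger count only means a more inclusive list; whether that suits a given analysis is a separate question.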

The following languages are currently available:

@@ -111,15 +122,15 @@ The following languages are currently available:
| Vietnamese | vi || | | |
| Yoruba | yo || | | |
| Zulu | zu || | | |

## Contributing

Additional sources can be defined and contributed by adding new data objects, as follows:

1. **Data object**. Create a named list of characters, in UTF-8 format, consisting of the stopwords for each language. The [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) language code will form the name of the list element, and the values of each element will be the character vector of stopwords for literal matches. The data object should follow the package naming convention, and be called `data_stopwords_newsource`, where `newsource` is replaced by the name of the new source.
1. **Data object**. Create a named list of characters, in UTF-8 format, consisting of the stopwords for each language. The [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) language code will form the name of the list element, and the values of each element will be the character vector of stopwords for literal matches. The data object should follow the package naming convention, and be called `data_stopwords_newsource`, where `newsource` is replaced by the name of the new source.

2. **Documentation**. The new source should be clearly documented, especially the source from which it was taken.
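
The data object described in step 1 could look roughly like the sketch below. The source name `newsource` and the words in each vector are placeholders for illustration, not a real stopword list, and the checks are only one possible way a contributor might validate the structure before submitting.

```r
# Hypothetical new source: a named list with one character vector per language.
# The names are ISO 639-1 codes; the words are placeholders for illustration.
data_stopwords_newsource <- list(
  en = c("a", "an", "the"),
  de = c("der", "die", "das")
)

# Make sure every vector is stored as UTF-8.
data_stopwords_newsource <- lapply(data_stopwords_newsource, enc2utf8)

# Structural checks: named list, two-letter names, character vectors only.
stopifnot(
  is.list(data_stopwords_newsource),
  !is.null(names(data_stopwords_newsource)),
  all(nchar(names(data_stopwords_newsource)) == 2),
  all(vapply(data_stopwords_newsource, is.character, logical(1)))
)
```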


## License

This package as well as the source repositories are licensed under MIT.
15 changes: 14 additions & 1 deletion README.md
@@ -45,10 +45,23 @@ head(stopwords::stopwords("german"), 20)
## [15] "anderer" "anderes" "anderm" "andern" "anderr" "anders"
```

Explore sources and languages:

``` r
# list all sources
stopwords::stopwords_getsources()
## [1] "snowball" "stopwords-iso" "misc" "smart"

# list languages for a specific source
stopwords::stopwords_getlanguages("snowball")
## [1] "da" "de" "en" "es" "fi" "fr" "hu" "ir" "it" "nl" "no" "pt" "ro" "ru"
## [15] "sv"
```
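
The two lookup functions can be combined to check whether a language is covered before requesting its list. The helper `has_stopwords()` below is hypothetical and not part of the package; it is a sketch that assumes the `stopwords` package is installed.

```r
# Hypothetical helper (not part of the stopwords package):
# TRUE if `source` exists and lists `language`, FALSE otherwise.
has_stopwords <- function(language, source = "snowball") {
  source %in% stopwords::stopwords_getsources() &&
    language %in% stopwords::stopwords_getlanguages(source)
}

# Only run the lookup when the package is available.
if (requireNamespace("stopwords", quietly = TRUE)) {
  if (has_stopwords("de", source = "snowball")) {
    print(head(stopwords::stopwords("de", source = "snowball")))
  }
}
```

Checking the source first means an unknown source name yields `FALSE` without ever being passed to `stopwords_getlanguages()`.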

Languages available
-------------------

The following coverage of languages is currently available, by source. Note that the inclusiveness of the stopword lists will vary by source, and the numebr of languages covered by a stopword list does not necessarily mean that the source is better than one with more limited coverage. (There may be many reasons to prefer the "snowball" source over the "stopwords-iso" source, for instance.)
The following coverage of languages is currently available, by source. Note that the inclusiveness of the stopword lists will vary by source, and the number of languages covered by a stopword list does not necessarily mean that the source is better than one with more limited coverage. (There may be many reasons to prefer the default "snowball" source over the "stopwords-iso" source, for instance.)

The following languages are currently available:

4 changes: 1 addition & 3 deletions cran-comments.md
@@ -7,8 +7,6 @@

0 errors | 0 warnings | 0 note

* This is a new release.

## Reverse dependencies

This is a new release, so there are no reverse dependencies.
No reverse dependencies.
