---
output: github_document
---
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "##",
fig.path = "man/images/"
)
```
```{r echo=FALSE, results="hide", message=FALSE}
library("badger")
```
# nsyllable
<!-- badges: start -->
[![CRAN Version](https://www.r-pkg.org/badges/version/nsyllable)](https://CRAN.R-project.org/package=nsyllable)
`r badge_devel("quanteda/nsyllable", "royalblue")`
[![Downloads](https://cranlogs.r-pkg.org/badges/nsyllable)](https://CRAN.R-project.org/package=nsyllable)
[![Total Downloads](https://cranlogs.r-pkg.org/badges/grand-total/nsyllable?color=orange)](https://CRAN.R-project.org/package=nsyllable)
[![R build status](https://github.com/quanteda/nsyllable/workflows/R-CMD-check/badge.svg)](https://github.com/quanteda/nsyllable/actions)
[![codecov](https://codecov.io/gh/quanteda/nsyllable/branch/master/graph/badge.svg)](https://app.codecov.io/gh/quanteda/nsyllable)
<!-- badges: end -->
## About
Counts syllables in character vectors. For English, it looks up syllable counts in the [Carnegie Mellon University Pronouncing
Dictionary](https://github.com/cmusphinx/cmudict). For words not found there, it guesses the number of
syllables as the number of vowel sequences. User-supplied
syllable dictionaries are also supported.
We hope to add lookup tables for additional languages in the future.
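The vowel-sequence fallback described above can be approximated in a few lines of base R. This is a rough sketch, not nsyllable's own code. It assumes that a, e, i, o, u and y count as vowels and that every word has at least one syllable, and `count_vowel_sequences()` is a hypothetical helper name.

```r
# Hypothetical sketch of a vowel-sequence syllable guess (not nsyllable's code).
# Assumption: vowels are a, e, i, o, u and y; matching is case-insensitive.
count_vowel_sequences <- function(x) {
  # Locate every maximal run of one or more vowels in each word
  runs <- gregexpr("[aeiouy]+", tolower(x))
  # gregexpr() reports -1 when nothing matches, so count only positive positions
  counts <- vapply(runs, function(m) sum(m > 0), integer(1))
  # Assumption: treat every word as having at least one syllable
  pmax(counts, 1L)
}

count_vowel_sequences(c("testing", "noel", "film"))
# 2 1 1
```

Here "noel" comes out as one syllable, which shows why this kind of heuristic is best used only for words missing from a pronouncing dictionary.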
## How to Install
From CRAN:
```{r eval = FALSE}
install.packages("nsyllable")
```
From GitHub:
```{r eval = FALSE}
# the remotes package is required to install nsyllable from GitHub
remotes::install_github("quanteda/nsyllable")
```
## Usage
`nsyllable()` counts the syllables in each element of a character vector and returns an integer vector of syllable counts. If `use.names = TRUE`, the output vector is named. The default (and currently the only) language implemented is English.
```{r}
library("nsyllable")
charvec <- c("testing", "Aachen", "supercalifragilisticexpialidocious")
nsyllable(charvec)
nsyllable(charvec, use.names = TRUE)
```
User-supplied dictionaries can also be used, and these override the `language` argument. In the example below, "excellent" is still counted correctly, not because it was found in the English dictionary but because its vowel sequences were counted. The vowel-sequence heuristic gets "noel" wrong, however.
```{r}
nsyllable(c("excellent", "noel", "film"), use.names = TRUE)
# redefine "film" as two syllables, as it is pronounced in parts of Ireland
mydict <- c("film" = 2L)
# "film" is looked up in mydict; "excellent" and "noel" fall back to the vowel count
nsyllable(c("excellent", "noel", "film"), syllable_dictionary = mydict, use.names = TRUE)
```
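The override behaviour can be pictured as a lookup in a named integer vector, followed by a vowel-sequence guess for any word that is missing. The sketch below is illustrative only: `lookup_then_guess()` is a hypothetical function, not part of nsyllable's API, and it assumes the same vowel set (a, e, i, o, u, y) for the fallback.

```r
# Hypothetical sketch: user dictionary lookup with a vowel-count fallback.
# `dict` is assumed to be a named integer vector with lowercase names.
lookup_then_guess <- function(x, dict) {
  words <- tolower(x)
  # Vowel-sequence guess for every word (assumed vowels: a, e, i, o, u, y)
  out <- vapply(gregexpr("[aeiouy]+", words),
                function(m) sum(m > 0), integer(1))
  # Subsetting by a missing name yields NA, which marks a dictionary miss
  found <- unname(dict[words])
  hit <- !is.na(found)
  # Dictionary entries take precedence over the guess
  out[hit] <- found[hit]
  names(out) <- x
  out
}

lookup_then_guess(c("excellent", "noel", "film"), c(film = 2L))
# excellent = 3, noel = 1, film = 2
```

As in the `nsyllable()` example, "film" takes its count from the user dictionary while "excellent" and "noel" rely on the vowel count.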
To skip the English dictionary and count only vowel sequences, set `syllable_dictionary` to `NULL`. This is likely to be a good approximation for many Western languages.
```{r}
nsyllable(c("Dies", "ist", "eine", "Demonstration"), syllable_dictionary = NULL,
use.names = TRUE)
```