I noticed that textstat_lexdiv produces different results depending on whether a tokens or a dfm object is passed to the function. When I calculate the TTR by hand (for example), the figures match the output of textstat_lexdiv with a dfm exactly, but differ from the output of the function with a tokens object. Why is this? Is this behavior expected? It is not clear to me from the source code.
Reproducible code
Please paste minimal code that reproduces the bug. If possible, please upload the data file as .rds.
library(quanteda)   # in quanteda >= 3.0, textstat_lexdiv() is in the quanteda.textstats package
library(magrittr)   # provides %>%
data(data_corpus_inaugural)
reagan_corpus <- corpus_subset(data_corpus_inaugural, Year == 1981 | Year == 1985)
reagan_tokens <- tokens(reagan_corpus, remove_punct = TRUE, remove_numbers = FALSE,
                        remove_symbols = FALSE)
dfm <- dfm(reagan_tokens, tolower = FALSE)
# --- - - - textstat_lexdiv with a dfm:
dfm %>% textstat_lexdiv(measure = c("TTR", "R"),
                        remove_numbers = F, remove_punct = T,
                        remove_symbols = F, remove_hyphens = FALSE)
# --- - - - Versus textstat_lexdiv with a tokens object:
reagan_tokens %>% textstat_lexdiv(measure = c("TTR", "R"),
                                  remove_numbers = F, remove_punct = T,
                                  remove_symbols = F, remove_hyphens = FALSE)
# --- - - - by hand:
ntype(dfm) / ntoken(dfm) # this is the same as textstat_lexdiv with a dfm
ntype(reagan_tokens) / ntoken(reagan_tokens) # this is the same as textstat_lexdiv with a dfm
Expected behavior
I would expect both methods to return the same estimates for the TTR.
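To make the by-hand calculation concrete, here is a minimal base-R sketch of TTR (types / tokens) and of "R" (types / sqrt(tokens), Guiraud's root TTR). The token vector is made up for illustration and is not taken from the inaugural corpus, and `ttr()` and `root_ttr()` are hypothetical helpers, not quanteda functions. The sketch also shows that case folding alone changes the type count. Whether something like that explains the difference between the tokens and dfm methods is only a guess on my part.

```r
# Toy tokens (made up for illustration; not from the inaugural corpus)
toks <- c("The", "people", "and", "the", "government", "of", "the", "People")

# Type-token ratio: number of distinct types divided by number of tokens
ttr <- function(x) length(unique(x)) / length(x)

# Root TTR ("R"): number of types divided by the square root of the token count
root_ttr <- function(x) length(unique(x)) / sqrt(length(x))

ttr_cased   <- ttr(toks)           # case-sensitive: "The" and "the" are distinct types
ttr_lowered <- ttr(tolower(toks))  # case-folded: they collapse into a single type

print(c(cased = ttr_cased, lowered = ttr_lowered))
print(c(cased = root_ttr(toks), lowered = root_ttr(tolower(toks))))
```

If the two textstat_lexdiv methods apply different preprocessing before counting types, a gap of this kind would appear even though the formula itself is identical.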
## System information
Please run sessionInfo() and paste the output.