textplot_wordcloud() with comparison = TRUE does not split max.words evenly by groups #21

p-dre · 2022-12-08T10:11:19Z

Describe the bug

I would like to create a word cloud in which the most frequent words are displayed by group. Unfortunately, the maximum number of words is not split evenly between the groups, as the documentation states: "The maximum frequency will be split evenly across categories when comparison = TRUE." I could not figure out what the split is based on, but it seems to depend on the relative importance of the words within each group.
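One possible explanation, offered as an assumption that nothing in this thread confirms: if the comparison mode follows the approach of wordcloud::comparison.cloud(), each word is assigned to the single group in which its rate deviates most from the mean rate across groups, and the top max_words words are then chosen by that deviation regardless of group. The base R sketch below reproduces only that logic on a small hypothetical count matrix (the names counts, owner and score are illustrative, not taken from the package) and shows how the budget can end up unevenly distributed.

```r
# Hypothetical group-by-word counts; NOT taken from the Bundestag corpus.
counts <- matrix(
  c(40, 30, 10, 10, 10,
    10, 10, 30, 28, 22,
    10, 10, 22, 30, 28),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), paste0("w", 1:5))
)
max_words <- 3  # total word budget, as in the max_words argument

# Relative frequency of each word within its group
rates <- counts / rowSums(counts)

# Deviation of each group's rate from the mean rate of that word
deviation <- sweep(rates, 2, colMeans(rates))

# Each word belongs to the group where it is most over-represented
owner <- rownames(counts)[apply(deviation, 2, which.max)]
names(owner) <- colnames(counts)
score <- apply(deviation, 2, max)

# Assumed selection rule: take the top max_words overall, ignoring group
top <- names(sort(score, decreasing = TRUE))[seq_len(min(max_words, length(score)))]
words_per_group <- table(factor(owner[top], levels = rownames(counts)))
print(words_per_group)
```

With this toy input, group A receives two of the three words, B one and C none, which resembles the uneven split described above.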

Reproducible code

Dataset: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/L4OAKN

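library(quanteda)
library(quanteda.textplots)
library(dplyr)  # provides the %>% pipe

Bundestagganz <- readRDS("Corp_Bundestag_V2.rds")
# Keep only rows dated before 2018-01-01
datenneu1 <- subset(Bundestagganz, date < "2018-01-01")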

datenneu1 %>% 
  corpus %>%  
  tokens(remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>% 
  dfm(verbose = FALSE) %>%  
  dfm_group(groups = party) %>% 
  quanteda.textplots::textplot_wordcloud(comparison = TRUE, max.words = 10,title.size = 1)

datenneu1 %>% 
  corpus %>% 
  tokens(remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>% 
  dfm(verbose = FALSE) %>%
  dfm_group(groups = party) %>% 
  quanteda.textplots::textplot_wordcloud(comparison = TRUE, max_words = 1000 , min_size = 0.1, max_size = 1)

Expected behavior

My expectation is that max.words describes either the maximum number of words per group, or the maximum number of words in total, split evenly between the groups.
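To make the second reading concrete, here is a minimal base R sketch of an even split: the total budget is divided by the number of groups and each group contributes its own most frequent words. The count matrix and the names per_group and top_by_group are hypothetical, and this illustrates the expectation only, not how quanteda.textplots selects words.

```r
# Hypothetical group-by-word counts; NOT taken from the Bundestag corpus.
counts <- matrix(
  c(40, 30, 10, 10, 10,
    10, 10, 30, 28, 22,
    10, 10, 22, 30, 28),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), paste0("w", 1:5))
)
max_words <- 3  # total word budget

# Split the budget evenly across groups (integer division)
per_group <- max_words %/% nrow(counts)

# For each group, keep its per_group most frequent words
groups <- rownames(counts)
top_by_group <- lapply(
  setNames(groups, groups),
  function(g) names(sort(counts[g, ], decreasing = TRUE))[seq_len(per_group)]
)
print(top_by_group)
```

Under this rule every group is represented by the same number of words, whatever its relative weight in the corpus.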

System information

sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.utf8  LC_CTYPE=German_Germany.utf8    LC_MONETARY=German_Germany.utf8 LC_NUMERIC=C                    LC_TIME=German_Germany.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] word2vec_0.3.4            udpipe_0.8.9              wordcloud_2.6             RColorBrewer_1.1-3        reshape2_1.4.4            quanteda.textplots_0.94.1 forcats_0.5.1             stringr_1.4.0            
 [9] purrr_0.3.4               readr_2.1.2               tidyr_1.2.0               tibble_3.1.7              ggplot2_3.3.6             tidyverse_1.3.2           tidytext_0.3.4            dplyr_1.0.10             
[17] ROCR_1.0-11               quanteda_3.2.1           

loaded via a namespace (and not attached):
 [1] httr_1.4.3          jsonlite_1.8.0      modelr_0.1.8        RcppParallel_5.1.5  assertthat_0.2.1    googlesheets4_1.0.0 cellranger_1.1.0    yaml_2.3.5          pillar_1.7.0        backports_1.4.1     lattice_0.20-45    
[12] glue_1.6.2          digest_0.6.29       rvest_1.0.2         colorspace_2.0-3    htmltools_0.5.2     Matrix_1.5-1        plyr_1.8.7          pkgconfig_2.0.3     broom_0.8.0         haven_2.5.0         scales_1.2.0       
[23] tzdb_0.3.0          googledrive_2.0.0   generics_0.1.2      ellipsis_0.3.2      withr_2.5.0         cli_3.3.0           magrittr_2.0.3      crayon_1.5.1        readxl_1.4.0        evaluate_0.15       stopwords_2.3      
[34] tokenizers_0.2.1    janeaustenr_0.1.5   fs_1.5.2            fansi_1.0.3         SnowballC_0.7.0     xml2_1.3.3          data.table_1.14.2   tools_4.2.2         hms_1.1.1           gargle_1.2.0        lifecycle_1.0.1    
[45] munsell_0.5.0       reprex_2.0.1        compiler_4.2.2      rlang_1.0.2         grid_4.2.2          rstudioapi_0.13     rmarkdown_2.13      gtable_0.3.0        DBI_1.1.2           R6_2.5.1            lubridate_1.8.0    
[56] knitr_1.38          fastmap_1.1.0       utf8_1.2.2          fastmatch_1.1-3     stringi_1.7.6       Rcpp_1.0.8.3        vctrs_0.4.1         dbplyr_2.1.1        tidyselect_1.1.2    xfun_0.30

stuckel · 2023-03-02T15:33:42Z

I had exactly the same issue and would also appreciate a solution.

kbenoit transferred this issue from quanteda/quanteda Mar 30, 2023
