Update text-mining-youtube-comments.md
hawc2 authored Feb 7, 2024
1 parent 97b9396 commit dd2de08
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion en/drafts/originals/text-mining-youtube-comments.md
@@ -225,7 +225,7 @@ all_data$text <- all_data$commentText %>%

Using the `stringr` package from the tidyverse, together with the `stringi` package, the following code further cleans the text data. This additional pre-processing step filters out numeric digits, punctuation, emojis, links, mentions, and comments with fewer than 10 total words. In addition, the following code removes duplicate comments and places the cleaned data into a column titled "uniqueWords."

- Note you can also clean the data using the `quanteda` R package at a later stage of this lesson, but we recommend `stringr` and `stringi` - especially if you want to export cleaned data in a user-readable format, such as if you're performing other analytics outside the Wordfish modeling described below.
+ Note you can also clean the data using the `quanteda` R package at a later stage of this lesson, but we recommend `stringr` and `stringi` - especially if you want to export cleaned data in a user-readable format, such as if you're performing other analytics outside the Wordfish modeling described below. For more guidance on using the `quanteda` package, see the University of Virginia Library's useful overview of its functionalities, ["A Beginner's Guide to Text Analysis with quanteda"](https://library.virginia.edu/data/articles/a-beginners-guide-to-text-analysis-with-quanteda).
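
The cleaning steps described above can be sketched as follows. This is a minimal, self-contained illustration using `stringr`, `stringi`, and `dplyr`; the toy `all_data` data frame and its values are hypothetical stand-ins for the lesson's scraped comments, not the lesson's actual code.

```r
library(dplyr)
library(stringr)
library(stringi)

# Toy data standing in for scraped YouTube comments (hypothetical values)
all_data <- data.frame(
  text = c(
    "Check this out http://example.com @user 123 one two three four five six seven eight nine!!",
    "too short 123",
    "Check this out http://example.com @user 123 one two three four five six seven eight nine!!"
  ),
  stringsAsFactors = FALSE
)

# Strip links, mentions, digits, punctuation, and emoji-like symbols
all_data$text <- all_data$text %>%
  str_remove_all("http\\S+") %>%            # links
  str_remove_all("@\\S+") %>%               # mentions
  str_remove_all("[0-9]+") %>%              # numeric digits
  str_remove_all("[[:punct:]]") %>%         # punctuation
  stri_replace_all_regex("\\p{So}", "") %>% # emojis / other symbols
  str_squish()                              # collapse leftover whitespace

# Drop duplicate comments and comments with fewer than 10 words,
# then store the cleaned text in a "uniqueWords" column
all_data <- all_data %>%
  distinct(text, .keep_all = TRUE) %>%
  filter(str_count(text, "\\S+") >= 10) %>%
  mutate(uniqueWords = text)
```

After these steps the toy data frame keeps a single row, since the third comment duplicates the first and the second is under the 10-word threshold.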

```
all_data$text <- all_data$text %>%
```
