nounphrase_consolidate() gives unexpected results #207

Open
rjake opened this issue Jun 23, 2021 · 0 comments
rjake commented Jun 23, 2021

spacyr has been great to learn! Thanks for making this available. I might be doing something wrong, but nounphrase_consolidate() sometimes gives me unexpected results. It seems to depend on how much data spacyr has to process: in the results below, sometimes processing more rows gives the expected result (orange) and sometimes processing fewer rows does (green). I hope you can shed some light on this for me.

library(tidyverse)
library(spacyr)

odd_words <- "right|most|paw|clear"

# `sentences` is the example character vector exported by stringr
df <-
  tibble(
    doc_id = seq_along(sentences),
    text = tolower(sentences)
  )
  
all_rows <-
  df |> 
  spacy_parse(nounphrase = TRUE) |> 
  nounphrase_consolidate()


filtered_rows <-
  df |>
  filter(str_detect(text, odd_words)) |> # <--- this line is different 
  spacy_parse(nounphrase = TRUE) |> 
  nounphrase_consolidate()


filtered_rows |> 
  select(doc_id:token) |> 
  left_join(
    select(all_rows, doc_id, sentence_id, token_id, token2 = token),
    by = c("doc_id", "sentence_id", "token_id") # same keys the implicit join used
  ) |> 
  filter(token != token2)

[image: screenshot of the results, with the expected results highlighted in orange and green]
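One way to narrow this down might be to check whether the two runs already disagree before nounphrase_consolidate() is called. The sketch below is only an illustration under stated assumptions: `diff_tokens()` is a hypothetical helper (not part of spacyr), it uses dplyr only, and the toy tibbles stand in for two parses of the same document.

```r
library(dplyr)

# Hypothetical helper (not part of spacyr): return the rows where two
# token-level tables disagree on `col`, matched on the token keys.
diff_tokens <- function(a, b, col = "token",
                        keys = c("doc_id", "sentence_id", "token_id")) {
  a_sel <- select(a, all_of(keys), value_a = all_of(col))
  b_sel <- select(b, all_of(keys), value_b = all_of(col))
  inner_join(a_sel, b_sel, by = keys) |>
    filter(value_a != value_b)
}

# Toy data standing in for two runs over the same document (made up for
# illustration, not real spacyr output).
run_a <- tibble(
  doc_id = "1", sentence_id = 1L, token_id = 1:3,
  token = c("the", "right_paw", "moved")
)
run_b <- tibble(
  doc_id = "1", sentence_id = 1L, token_id = 1:3,
  token = c("the", "right", "moved")
)

mismatches <- diff_tokens(run_a, run_b)
print(mismatches)
```

With the objects from the code above, the same helper could presumably be applied to the output of `spacy_parse(nounphrase = TRUE)` for the full and filtered data, using `col = "nounphrase"`, to see whether the difference is already present in the parse or only appears after consolidation.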

@kbenoit kbenoit self-assigned this Sep 1, 2022