nounphrase_consolidate() gives unexpected results #207

rjake · 2021-06-23T01:51:15Z

spacyr has been great to learn! Thanks for making this available. I might be doing something wrong, but nounphrase_consolidate() sometimes gives me unexpected results. It seems to depend on how much data spacyr has to process. In the results below, sometimes processing more rows gives the expected results (orange) and sometimes processing fewer rows gives the expected results (green). I hope you can shed some light on this for me.

library(tidyverse)
library(spacyr)

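# Regex used to select a subset of the sentences
odd_words <- "right|most|paw|clear"

# `sentences` is the example character vector exported by stringr
df <- 
  tibble(
    doc_id = seq_along(sentences),
    text = tolower(sentences)
  )

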
all_rows <-
  df |> 
  spacy_parse(nounphrase = TRUE) |> 
  nounphrase_consolidate()


filtered_rows <-
  df |>
  filter(str_detect(text, odd_words)) |> # <--- this line is different 
  spacy_parse(nounphrase = TRUE) |> 
  nounphrase_consolidate()


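# Compare the two runs token by token and keep only the rows that differ
filtered_rows |> 
  select(doc_id:token) |> 
  left_join(
    select(all_rows, doc_id, sentence_id, token_id, token2 = token),
    by = c("doc_id", "sentence_id", "token_id")
  ) |> 
  filter(token != token2)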

kbenoit self-assigned this Sep 1, 2022
