reenable non-alphabetic stripping by default #46
Hi, thanks for the report. The problem here is caused by a very unfortunate interaction of two features, as follows. (Here, I refer to the specific example of the blacklisted term from your report.)
There are a couple of ways we could go about addressing this: (a) remove the transformer that strips non-alphabetic characters from the default set of transformers, or (b) a deeper fix inside the matcher itself. Option B is evidently the more 'correct' fix, but it requires some rearchitecting of Obscenity's internals. So that leaves Option A. This fix is something you can do in your own code today: provide a custom set of transformers that does not include the non-alphabetic stripping transformer (see the sketch below). Thanks again for the report and reproduction, and apologies for not being able to address this more promptly.
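For reference, here is a rough sketch of that workaround. It assumes the current TypeScript API (RegExpMatcher, englishDataset, and the individual transformer factories); the exact export names may differ between versions, so treat it as illustrative rather than a drop-in snippet. The key point is simply that the transformer which skips non-alphabetic characters is left out of the blacklist transformer list.

```ts
import {
  RegExpMatcher,
  englishDataset,
  resolveConfusablesTransformer,
  resolveLeetSpeakTransformer,
  toAsciiLowerCaseTransformer,
  collapseDuplicatesTransformer,
} from 'obscenity';

// Build a matcher from the English preset, but with a hand-rolled transformer
// list that deliberately omits the transformer stripping non-alphabetic
// characters, so spaces and punctuation are preserved during matching.
const matcher = new RegExpMatcher({
  ...englishDataset.build(),
  blacklistMatcherTransformers: [
    resolveConfusablesTransformer(),
    resolveLeetSpeakTransformer(),
    toAsciiLowerCaseTransformer(),
    // The real preset also passes custom per-character thresholds here;
    // they are omitted from this sketch for brevity.
    collapseDuplicatesTransformer({ defaultThreshold: 1 }),
  ],
  whitelistMatcherTransformers: [toAsciiLowerCaseTransformer()],
});

console.log(matcher.hasMatch('some input text'));
```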
For #23, #46. BREAKING CHANGE: Using the default English preset, Obscenity will no longer strip non-alphabetic characters from the input text before matching. This addresses a class of egregious false negatives in previous versions (see #23), but introduces a regression where cases such as 'f u c k' (with the space) will no longer be detected by default. We expect to provide a more comprehensive fix in the next minor release. If desired, it remains possible to revert to the previous behavior by providing a custom set of transformers to the matcher.
I thought about this issue some more and released v0.2.0 with Option A (disabling the problematic transformation by default). As outlined above, this avoids the false negatives seen in your report, at the cost of regressing in cases such as 'f u c k' (with the spaces).
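For anyone who wants the previous behavior back on v0.2.0+, a sketch of the opt-in follows. It assumes the exports englishRecommendedTransformers and skipNonAlphabeticTransformer exist under those names in the installed version, and that appending the transformer to the end of the list is acceptable; check the preset source for the exact ordering used before v0.2.0.

```ts
import {
  RegExpMatcher,
  englishDataset,
  englishRecommendedTransformers,
  skipNonAlphabeticTransformer,
} from 'obscenity';

// Start from the recommended English transformers and add back the transformer
// that skips non-alphabetic characters. This restores the pre-v0.2.0 behavior:
// spaced-out text like 'f u c k' matches again, at the cost of the false
// negatives described in #23.
const matcher = new RegExpMatcher({
  ...englishDataset.build(),
  blacklistMatcherTransformers: [
    ...englishRecommendedTransformers.blacklistMatcherTransformers,
    skipNonAlphabeticTransformer(),
  ],
  whitelistMatcherTransformers:
    englishRecommendedTransformers.whitelistMatcherTransformers,
});

console.log(matcher.hasMatch('f u c k')); // should report a match with the extra transformer
```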
Thanks for the detailed response. I tried running the code again and unfortunately there are still some words not being censored, and I'm not sure why.
As seen above, the following words are still failing to be censored:
I just realised the reason those 4 aren't working is because of the
This can be fixed by either
Is it possible to make the non-alphabetic stripping the default again? Skipping non-alphabetic characters is a very nice feature to have, and, as you say, without it we suffer "some undesirable degradation in match quality". There's also the issue of multi-word phrases.
It's possible to do this, but we would still run into issues with not matching on certain inputs. Re: the issues with multi-word phrases:
What are the options to handle cases like these?
Expected behavior
Inputting all of the EnglishProfaneWord values, I expected all of them to be censored.
Actual behavior
As you can see on this CodePen, not all of the words get censored.
Minimal reproducible example
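(The original CodePen code isn't reproduced in this thread; the sketch below only illustrates the kind of check being described, using the documented matcher and censor API with a placeholder word list in place of the full EnglishProfaneWord set.)

```ts
import {
  RegExpMatcher,
  TextCensor,
  englishDataset,
  englishRecommendedTransformers,
} from 'obscenity';

const matcher = new RegExpMatcher({
  ...englishDataset.build(),
  ...englishRecommendedTransformers,
});
const censor = new TextCensor();

// Placeholder list; the actual report ran every EnglishProfaneWord through this check.
const words = ['example1', 'example2'];

for (const word of words) {
  const censored = censor.applyTo(word, matcher.getAllMatches(word));
  if (censored === word) {
    console.log(`Not censored: ${word}`);
  }
}
```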
Steps to reproduce
Additional context
Here is a less minimal CodePen with an input and output textarea: https://codepen.io/HatScripts/pen/NWJxEKW
Node.js version
N/A
Obscenity version
v0.1.4
Priority
Terms