obscenity

Table of contents

Enumerations

Classes

Interfaces

Type Aliases

Variables

Functions

Type Aliases

CensorContext

Ƭ CensorContext: MatchPayload & { input: string ; overlapsAtEnd: boolean ; overlapsAtStart: boolean }

Context passed to [[TextCensorStrategy | text censoring strategies]].
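As an illustrative sketch (not a built-in strategy; assumes the usual imports from obscenity), a strategy can read these fields together with the fields inherited from MatchPayload:

// Keep the first matched character unless this match overlaps the previous one
// at its start, and pad the rest with asterisks.
const keepFirstChar: TextCensorStrategy = (ctx) => {
	const kept = ctx.overlapsAtStart ? '' : ctx.input[ctx.startIndex];
	return kept + '*'.repeat(ctx.matchLength - kept.length);
};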

Defined in

src/censor/TextCensor.ts:104


CharacterMapping

Ƭ CharacterMapping: Map<string, string> | Record<string, string>

Maps characters to other characters. The key of the map/object should be the transformed character, while the value should be a set of characters that map to the transformed character.
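For illustration (the characters chosen here are arbitrary), both forms below map '@' and '4' to 'a':

const asRecord: CharacterMapping = { a: '@4' };
const asMap: CharacterMapping = new Map([['a', '@4']]);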

Defined in

src/transformer/remap-characters/index.ts:60


EnglishProfaneWord

Ƭ EnglishProfaneWord: "abeed" | "abo" | "africoon" | "anal" | "anus" | "arabush" | "arse" | "ass" | "bastard" | "bestiality" | "bitch" | "blowjob" | "bollocks" | "boob" | "boonga" | "buttplug" | "chingchong" | "chink" | "cock" | "cuck" | "cum" | "cunt" | "deepthroat" | "dick" | "dildo" | "doggystyle" | "double penetration" | "dyke" | "ejaculate" | "fag" | "felch" | "fellatio" | "finger bang" | "fisting" | "fuck" | "gangbang" | "handjob" | "hentai" | "hooker" | "incest" | "jerk off" | "jizz" | "kike" | "lubejob" | "masturbate" | "negro" | "nigger" | "orgasm" | "orgy" | "penis" | "piss" | "porn" | "prick" | "pussy" | "rape" | "retard" | "scat" | "semen" | "sex" | "shit" | "slut" | "spastic" | "tit" | "tranny" | "turd" | "twat" | "vagina" | "wank" | "whore"

All the profane words that are included in the [[englishDataset | english dataset]] by default.

Defined in

src/preset/english.ts:416


MatchPayloadWithPhraseMetadata

Ƭ MatchPayloadWithPhraseMetadata<MetadataType>: MatchPayload & { phraseMetadata?: MetadataType }

Extends the default match payload by adding phrase metadata.

Type parameters

| Name |
| :--- |
| MetadataType |
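A sketch of how phrase metadata ends up on matches (assuming the english dataset and the DataSet's getPayloadWithPhraseMetadata method):

const matcher = new RegExpMatcher({ ...englishDataset.build(), ...englishRecommendedTransformers });
for (const match of matcher.getAllMatches('you are a fuck')) {
	const withMetadata = englishDataset.getPayloadWithPhraseMetadata(match);
	console.log(withMetadata.phraseMetadata?.originalWord); // 'fuck'
}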

Defined in

src/dataset/DataSet.ts:190


Node

Ƭ Node: LiteralNode | OptionalNode | WildcardNode

All the possible kinds of nodes.

Defined in

src/pattern/Nodes.ts:24


TextCensorStrategy

Ƭ TextCensorStrategy: (ctx: CensorContext) => string

Type declaration

▸ (ctx): string

A text censoring strategy, which receives a [[CensorContext]] and returns a replacement string.

Parameters
| Name | Type |
| :--- | :--- |
| ctx | CensorContext |
Returns

string
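As a sketch of wiring a custom strategy into the censor (the strategy shown is hypothetical, used here with the english preset):

import { RegExpMatcher, TextCensor, englishDataset, englishRecommendedTransformers, type TextCensorStrategy } from 'obscenity';

// Replace each matched region with '#' repeated to the length of the match.
const hashes: TextCensorStrategy = (ctx) => '#'.repeat(ctx.matchLength);
const censor = new TextCensor().setStrategy(hashes);
const matcher = new RegExpMatcher({ ...englishDataset.build(), ...englishRecommendedTransformers });
const input = 'fuck you';
console.log(censor.applyTo(input, matcher.getAllMatches(input))); // '#### you'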

Defined in

src/censor/TextCensor.ts:99

Variables

englishDataset

Const englishDataset: DataSet<{ originalWord: EnglishProfaneWord }>

A dataset of profane English words.

Example

const matcher = new RegExpMatcher({
	...englishDataset.build(),
	...englishRecommendedTransformers,
});

Example

// Extending the data-set by adding a new word and removing an existing one.
const myDataset = new DataSet()
	.addAll(englishDataset)
	.removePhrasesIf((phrase) => phrase.metadata.originalWord === 'vagina')
	.addPhrase((phrase) => phrase.addPattern(pattern`|balls|`));

Copyright

The words are taken from the cuss project, with some modifications.

(The MIT License)

Copyright (c) 2016 Titus Wormer <[email protected]>

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
'Software'), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Defined in

src/preset/english.ts:103


englishRecommendedBlacklistMatcherTransformers

Const englishRecommendedBlacklistMatcherTransformers: (SimpleTransformerContainer | StatefulTransformerContainer)[]

A set of transformers to be used when matching blacklisted patterns with the [[englishDataset | english word dataset]].

Defined in

src/preset/english.ts:13


englishRecommendedTransformers

Const englishRecommendedTransformers: Pick<RegExpMatcherOptions, "blacklistMatcherTransformers" | "whitelistMatcherTransformers">

Recommended transformers to be used with the [[englishDataset | english word dataset]] and the [[RegExpMatcher]].
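As a sketch, spreading this object is equivalent to passing the two recommended transformer arrays individually:

const matcher = new RegExpMatcher({
	...englishDataset.build(),
	blacklistMatcherTransformers: englishRecommendedBlacklistMatcherTransformers,
	whitelistMatcherTransformers: englishRecommendedWhitelistMatcherTransformers,
});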

Defined in

src/preset/english.ts:48


englishRecommendedWhitelistMatcherTransformers

Const englishRecommendedWhitelistMatcherTransformers: (SimpleTransformerContainer | StatefulTransformerContainer)[]

A set of transformers to be used when matching whitelisted terms with the [[englishDataset | english word dataset]].

Defined in

src/preset/english.ts:36

Functions

assignIncrementingIds

assignIncrementingIds(patterns): BlacklistedTerm[]

Assigns incrementing IDs to the patterns provided, starting with 0. It is useful if you have a list of patterns to match against but don't care about identifying which pattern matched.

Example

const matcher = new RegExpMatcher({
 ...,
 blacklistedTerms: assignIncrementingIds([
     pattern`f?uck`,
     pattern`|shit|`,
 ]),
});

Parameters

| Name | Type | Description |
| :--- | :--- | :--- |
| patterns | ParsedPattern[] | List of parsed patterns. |

Returns

BlacklistedTerm[]

A list of blacklisted terms with valid IDs which can then be passed to the [[RegExpMatcher]].

Defined in

src/matcher/BlacklistedTerm.ts:37


asteriskCensorStrategy

asteriskCensorStrategy(): TextCensorStrategy

A text censoring strategy that generates strings made up of asterisks (*).

Example

const strategy = asteriskCensorStrategy();
const censor = new TextCensor().setStrategy(strategy);
// Before: 'fuck you'
// After: '**** you'

Returns

TextCensorStrategy

A [[TextCensorStrategy]] for use with the [[TextCensor]].

Defined in

src/censor/BuiltinStrategies.ts:71


collapseDuplicatesTransformer

collapseDuplicatesTransformer(options?): StatefulTransformerContainer

Creates a transformer that collapses duplicate characters. This is useful for detecting variants of patterns in which a character is repeated to bypass detection.

As an example, the pattern hi does not match hhiii by default, as the frequency of the characters does not match. With this transformer, hhiii would become hi, and would therefore match the pattern.

Application order

It is recommended that this transformer be applied after all other transformers. Using it before other transformers may have the effect of not catching duplicates of certain characters that were originally different but became the same after a series of transformations.

Warning

This transformer should be used with caution: while it can make certain patterns match text that wouldn't have been matched before, it can also go the other way. For example, the pattern hello clearly matches hello, but with this transformer, by default, hello becomes helo, which does not match. In such cases, the customThresholds option can be used to allow two ls in a row, leaving hello unchanged.

Example

// Collapse runs of the same character.
const transformer = collapseDuplicatesTransformer();
const matcher = new RegExpMatcher({ ..., blacklistMatcherTransformers: [transformer] });

Example

// Collapse runs of characters other than 'a'.
const transformer = collapseDuplicatesTransformer({ customThresholds: new Map([['a', Infinity]]) });
const matcher = new RegExpMatcher({ ..., blacklistMatcherTransformers: [transformer] });
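Following on from the warning above, a sketch of using customThresholds to allow up to two consecutive ls so that hello is left unchanged (assuming the threshold value is the maximum permitted run length):

const transformer = collapseDuplicatesTransformer({ customThresholds: new Map([['l', 2]]) });
const matcher = new RegExpMatcher({
	blacklistedTerms: assignIncrementingIds([pattern`hello`]),
	blacklistMatcherTransformers: [transformer],
});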

Parameters

| Name | Type | Description |
| :--- | :--- | :--- |
| options | CollapseDuplicatesTransformerOptions | Options for the transformer. |

Returns

StatefulTransformerContainer

A container holding the transformer, which can then be passed to the [[RegExpMatcher]].

Defined in

src/transformer/collapse-duplicates/index.ts:46


compareMatchByPositionAndId

compareMatchByPositionAndId(a, b): 0 | 1 | -1

Compares two match payloads.

  • If the first match payload's start index is less than the second's, -1 is returned; if the second's start index is less than the first's, 1 is returned.
  • Otherwise, if the first match payload's end index is less than the second's, -1 is returned; if the second's end index is less than the first's, 1 is returned.
  • Otherwise, if the first match payload's term ID is less than the second's, -1 is returned; if the term IDs are equal, 0 is returned; otherwise, 1 is returned.
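A minimal usage sketch: sorting the matches returned by a matcher into position order (matcher setup as in the englishDataset example):

const matcher = new RegExpMatcher({ ...englishDataset.build(), ...englishRecommendedTransformers });
const sorted = matcher.getAllMatches('fuck that shit').sort(compareMatchByPositionAndId);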

Parameters

| Name | Type | Description |
| :--- | :--- | :--- |
| a | MatchPayload | First match payload. |
| b | MatchPayload | Second match payload. |

Returns

0 | 1 | -1

The result of the comparison: -1 if the first should sort lower than the second, 0 if they are the same, and 1 if the second should sort lower than the first.

Defined in

src/matcher/MatchPayload.ts:57


fixedCharCensorStrategy

fixedCharCensorStrategy(char): TextCensorStrategy

A text censoring strategy that generates replacement strings that are made up of the character given, repeated as many times as needed.

Example

const strategy = fixedCharCensorStrategy('*');
const censor = new TextCensor().setStrategy(strategy);
// Before: 'fuck you'
// After: '**** you'.

Parameters

| Name | Type | Description |
| :--- | :--- | :--- |
| char | string | String that represents the code point which should be used when generating the replacement string. Must be exactly one code point in length. |

Returns

TextCensorStrategy

A [[TextCensorStrategy]] for use with the [[TextCensor]].

Defined in

src/censor/BuiltinStrategies.ts:134


fixedPhraseCensorStrategy

fixedPhraseCensorStrategy(phrase): TextCensorStrategy

A text censoring strategy that returns a fixed string.

Example

// The replacement phrase '' effectively removes all matched regions
// from the string.
const strategy = fixedPhraseCensorStrategy('');
const censor = new TextCensor().setStrategy(strategy);
// Before: 'fuck you'
// After: ' you'

Example

const strategy = fixedPhraseCensorStrategy('fudge');
const censor = new TextCensor().setStrategy(strategy);
// Before: 'fuck you'
// After: 'fudge you'

Parameters

| Name | Type | Description |
| :--- | :--- | :--- |
| phrase | string | Replacement phrase to use. |

Returns

TextCensorStrategy

A [[TextCensorStrategy]] for use with the [[TextCensor]].

Defined in

src/censor/BuiltinStrategies.ts:115


grawlixCensorStrategy

grawlixCensorStrategy(): TextCensorStrategy

A text censoring strategy that generates grawlix, i.e. strings that contain the characters %, @, $, &, and *.

Example

const strategy = grawlixCensorStrategy();
const censor = new TextCensor().setStrategy(strategy);
// Before: 'fuck you'
// After: '%@&* you'

Returns

TextCensorStrategy

A [[TextCensorStrategy]] for use with the [[TextCensor]].

Defined in

src/censor/BuiltinStrategies.ts:89


keepEndCensorStrategy

keepEndCensorStrategy(baseStrategy): TextCensorStrategy

A text censoring strategy that extends another strategy, adding the last character matched at the end of the generated string.

Example

const strategy = keepEndCensorStrategy(asteriskCensorStrategy());
const censor = new TextCensor().setStrategy(strategy);
// Before: 'fuck you'
// After: '***k you'

Parameters

| Name | Type | Description |
| :--- | :--- | :--- |
| baseStrategy | TextCensorStrategy | Strategy to extend. It will be used to produce the start of the generated string. |

Returns

TextCensorStrategy

A [[TextCensorStrategy]] for use with the [[TextCensor]].

Defined in

src/censor/BuiltinStrategies.ts:51


keepStartCensorStrategy

keepStartCensorStrategy(baseStrategy): TextCensorStrategy

A text censoring strategy that extends another strategy, adding the first character matched at the start of the generated string.

Example

const strategy = keepStartCensorStrategy(grawlixCensorStrategy());
const censor = new TextCensor().setStrategy(strategy);
// Before: 'fuck you'
// After: 'f$@* you'

Example

// Since keepEndCensorStrategy() returns another text censoring strategy, you can use it
// as the base strategy to pass to keepStartCensorStrategy().
const strategy = keepStartCensorStrategy(keepEndCensorStrategy(asteriskCensorStrategy()));
const censor = new TextCensor().setStrategy(strategy);
// Before: 'fuck you'
// After: 'f**k you'

Parameters

| Name | Type | Description |
| :--- | :--- | :--- |
| baseStrategy | TextCensorStrategy | Strategy to extend. It will be used to produce the end of the generated string. |

Returns

TextCensorStrategy

A [[TextCensorStrategy]] for use with the [[TextCensor]].

Defined in

src/censor/BuiltinStrategies.ts:28


parseRawPattern

parseRawPattern(pattern): ParsedPattern

Parses a string as a pattern directly.

Note

It is recommended to use the [[pattern | pattern template tag]] instead of this function for literal patterns (i.e. ones without dynamic content).
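For illustration, a sketch of parsing a pattern held in an ordinary string at runtime, where the template tag is less convenient (the pattern shown is hypothetical):

const rawInput = '|sh[i]t|'; // e.g. read from configuration
const matcher = new RegExpMatcher({
	blacklistedTerms: assignIncrementingIds([parseRawPattern(rawInput)]),
	...englishRecommendedTransformers,
});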

Throws

[[ParserError]] if a syntactical error was detected while parsing the pattern.

Parameters

| Name | Type | Description |
| :--- | :--- | :--- |
| pattern | string | The string to parse. |

Returns

ParsedPattern

The parsed pattern, which can then be used with the [[RegExpMatcher]].

Defined in

src/pattern/Pattern.ts:130


pattern

pattern(strings, ...expressions): ParsedPattern

Parses a pattern, which matches a set of strings; see the Syntax section for details. This function is intended to be called as a template tag.

Syntax

Generally speaking, in patterns, characters are interpreted literally. That is, they match exactly what they are: a matches an a, b matches a b, ; matches a ;, and so on.

However, there are several constructs that have special meaning:

  • [expr] matches either the empty string or expr (an optional expression). expr may be a sequence of literal characters or a wildcard (see below).

  • ? matches any character (a wildcard).

  • A | at the start or end of the pattern asserts position at a word boundary (a word boundary assertion). If | is at the start, it ensures that the match either starts at the start of the string or has a non-word character preceding it; if it is at the end, it ensures that the match either ends at the end of the string or has a non-word character following it.

    A word character is a lower-case or upper-case ASCII alphabet character or an ASCII digit.

  • In a literal, a backslash may be used to escape one of the meta-characters mentioned above so that it does match literally: \[ matches [, and does not mark the start of an optional expression.

    Note about escapes

    As this function operates on raw strings, double-escaping backslashes is not necessary:

    // Use this:
    const parsed = pattern`hello \[`;
    // Don't use this:
    const parsed = pattern`hello \\[`;

Examples

  • baz matches baz exactly.

  • b\[ar matches b[ar exactly.

  • d?ude matches d, then any character, then ude. All of the following strings are matched by this pattern:

    • dyude
    • d;ude
    • d!ude
  • h[?]ello matches either h, then any character, then ello, or the literal string hello. The set of strings it matches is equal to the union of the set of strings that the two patterns hello and h?ello match. All of the following strings are matched by this pattern:

    • hello
    • h!ello
    • h;ello
  • |foobar| asserts position at a word boundary, matches the literal string foobar, and asserts position at a word boundary:

    • foobar matches, as the start and end of string count as word boundaries;
    • yofoobar does not match, as f is immediately preceded by a word character;
    • hello foobar bye matches, as f is immediately preceded by a non-word character, and r is immediately followed by a non-word character.

Grammar

Pattern  ::= '|'? Atom* '|'?
Atom     ::= Literal | Wildcard | Optional
Optional ::= '[' Literal | Wildcard ']'
Literal  ::= (NON_SPECIAL | '\' SUPPORTS_ESCAPING)+

NON_SPECIAL       ::= _any character other than '\', '?', '[', ']', or '|'_
SUPPORTS_ESCAPING ::= '\' | '[' | ']' | '?' | '|'

Example

const parsed = pattern`hello?`; // match "hello", then any character

Example

const parsed = pattern`w[o]rld`; // match "wrld" or "world"

Example

const parsed = pattern`my initials are \[??\]`; // match "my initials are [", then any two characters, then a "]"

Throws

[[ParserError]] if a syntactical error was detected while parsing the pattern.

See

[[parseRawPattern]] if you want to parse a string into a pattern without using a template tag.

Parameters

| Name | Type |
| :--- | :--- |
| strings | TemplateStringsArray |
| ...expressions | unknown[] |

Returns

ParsedPattern

The parsed pattern, which can then be used with the [[RegExpMatcher]].

Defined in

src/pattern/Pattern.ts:106


randomCharFromSetCensorStrategy

randomCharFromSetCensorStrategy(charset): TextCensorStrategy

A text censoring strategy that generates replacement strings made up of random characters from the set of characters provided.

Example

const strategy = randomCharFromSetCensorStrategy('$#!');
const censor = new TextCensor().setStrategy(strategy);
// Before: 'fuck you!'
// After: '!##$ you!'

Parameters

| Name | Type | Description |
| :--- | :--- | :--- |
| charset | string | Set of characters from which the replacement string should be constructed. Must not be empty. |

Returns

TextCensorStrategy

A [[TextCensorStrategy]] for use with the [[TextCensor]].

Defined in

src/censor/BuiltinStrategies.ts:155


remapCharactersTransformer

remapCharactersTransformer(mapping): SimpleTransformerContainer

Maps certain characters to other characters, leaving other characters unchanged.

Application order

It is recommended that this transformer be applied near the start of the transformer chain.

Example

// Transform 'a' to 'b'.
const transformer = remapCharactersTransformer({ 'b': 'a' });
const matcher = new RegExpMatcher({ ..., blacklistMatcherTransformers: [transformer] });

Example

// Transform '🅱️' to 'b', and use a map instead of an object as the argument.
const transformer = remapCharactersTransformer(new Map([['b', '🅱️']]));
const matcher = new RegExpMatcher({ ..., blacklistMatcherTransformers: [transformer] });

Example

// Transform '🇴' and '0' to 'o'.
const transformer = remapCharactersTransformer({ o: '🇴0' });
const matcher = new RegExpMatcher({ ..., blacklistMatcherTransformers: [transformer] });

See

  • [[resolveConfusablesTransformer| Transformer that handles confusable Unicode characters]]
  • [[resolveLeetSpeakTransformer | Transformer that handles leet-speak]]

Parameters

| Name | Type | Description |
| :--- | :--- | :--- |
| mapping | CharacterMapping | A map/object mapping certain characters to others. |

Returns

SimpleTransformerContainer

A container holding the transformer, which can then be passed to the [[RegExpMatcher]].

Defined in

src/transformer/remap-characters/index.ts:38


resolveConfusablesTransformer

resolveConfusablesTransformer(): SimpleTransformerContainer

Creates a transformer that maps confusable Unicode characters to their normalized equivalent. For example, various visually confusable variants of the digit 1 become a plain 1 when using this transformer.

Application order

It is recommended that this transformer be applied near the start of the transformer chain.

Example

const transformer = resolveConfusablesTransformer();
const matcher = new RegExpMatcher({ ..., blacklistMatcherTransformers: [transformer] });

Returns

SimpleTransformerContainer

A container holding the transformer, which can then be passed to the [[RegExpMatcher]].

Defined in

src/transformer/resolve-confusables/index.ts:22


resolveLeetSpeakTransformer

resolveLeetSpeakTransformer(): SimpleTransformerContainer

Creates a transformer that maps leet-speak characters to their normalized equivalent. For example, $ becomes s when using this transformer.

Application order

It is recommended that this transformer be applied near the start of the transformer chain, but after similar transformers that map characters to other characters, such as the [[resolveConfusablesTransformer | transformer that resolves confusable Unicode characters]].

Example

const transformer = resolveLeetSpeakTransformer();
const matcher = new RegExpMatcher({ ..., blacklistMatcherTransformers: [transformer] });

Returns

SimpleTransformerContainer

A container holding the transformer, which can then be passed to the [[RegExpMatcher]].

Defined in

src/transformer/resolve-leetspeak/index.ts:23


skipNonAlphabeticTransformer

skipNonAlphabeticTransformer(): SimpleTransformerContainer

Creates a transformer that skips non-alphabetic characters (a-z, A-Z). This is useful when matching text against patterns that consist solely of alphabetic characters (the pattern hello does not match h.e.l.l.o by default, but does with this transformer).

Warning

This transformation is not part of the default set of transformations, as there are some known rough edges with false negatives; see #23 and #46 on the GitHub issue tracker.

Application order

It is recommended that this transformer be applied near the end of the transformer chain, if at all.

Example

const transformer = skipNonAlphabeticTransformer();
const matcher = new RegExpMatcher({ ..., blacklistMatcherTransformers: [transformer] });

Returns

SimpleTransformerContainer

A container holding the transformer, which can then be passed to the [[RegExpMatcher]].

Defined in

src/transformer/skip-non-alphabetic/index.ts:31


toAsciiLowerCaseTransformer

toAsciiLowerCaseTransformer(): SimpleTransformerContainer

Creates a transformer that changes all ASCII alphabet characters to lower-case, leaving other characters unchanged.

Application order

It is recommended that this transformer be applied near the end of the transformer chain. Using it before other transformers may render its changes useless, as transformers applied afterwards may produce characters of varying case.
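For consistency with the other transformer entries, a usage sketch:

const transformer = toAsciiLowerCaseTransformer();
const matcher = new RegExpMatcher({
	...englishDataset.build(),
	blacklistMatcherTransformers: [transformer],
});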

Returns

SimpleTransformerContainer

A container holding the transformer, which can then be passed to the [[RegExpMatcher]].

Defined in

src/transformer/to-ascii-lowercase/index.ts:18