NLP-Filter

Experimenting with an offensive speech filter, using synthetic data (generated from templates).

GOAL

Trying out various simple classification algorithms.

Working language: German.

NOTE 1: since all the data is generated from templates, there are clear patterns, and none of the "noise" encountered in the wild.
NOTE 2: it might be interesting to extend this using (in order of complexity) a) word embeddings or b) language models, and then test its performance on a publicly available dataset.
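
To make the setup concrete, here is a minimal sketch of how template-generated German data could feed one simple classifier (a bag-of-words Naive Bayes with add-one smoothing). It is not the code of this repository: the templates, the word lists, and the function names generate, tokenize, train and predict are all hypothetical and chosen only for illustration. It uses only the Python standard library.

```python
import math
import random
from collections import Counter

# Hypothetical templates and word lists, for illustration only;
# they are assumptions, not the data used in this repository.
TEMPLATES = ["Du bist ein {w}.", "Er ist wirklich ein {w}!", "Was für ein {w}."]
OFFENSIVE = ["Idiot", "Trottel", "Dummkopf"]
NEUTRAL = ["Freund", "Lehrer", "Nachbar"]


def generate(n, seed=0):
    """Return n (sentence, label) pairs; label 1 = offensive, 0 = neutral."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.choice([0, 1])
        word = rng.choice(OFFENSIVE if label else NEUTRAL)
        data.append((rng.choice(TEMPLATES).format(w=word), label))
    return data


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    tokens = (t.strip(".,!?").lower() for t in text.split())
    return [t for t in tokens if t]


def train(data):
    """Count tokens per class and documents per class."""
    counts = {0: Counter(), 1: Counter()}
    docs = Counter()
    for text, label in data:
        docs[label] += 1
        counts[label].update(tokenize(text))
    vocab = set(counts[0]) | set(counts[1])
    return counts, docs, vocab


def predict(model, text):
    """Return the class with the higher smoothed log-probability."""
    counts, docs, vocab = model
    total_docs = sum(docs.values())
    scores = {}
    for label in (0, 1):
        # Smoothed log prior, so a class absent from training never gives log(0).
        score = math.log((docs[label] + 1) / (total_docs + 2))
        denom = sum(counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # tokens never seen in training are ignored
                score += math.log((counts[label][tok] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


if __name__ == "__main__":
    model = train(generate(200))
    for sentence in ["Du bist ein Idiot.", "Du bist ein Freund."]:
        print(sentence, "->", predict(model, sentence))
```

Because the training and test sentences would come from the same templates, any accuracy measured this way is likely to be optimistic, which is the limitation NOTE 1 describes.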