Experimenting with an offensive speech filter, using synthetic data (generated from templates).
Trying out various simple classification algorithms.
Working language: German.
- NOTE 1: since all the data is generated from templates, it follows clear patterns and has none of the "noise" encountered in the wild.
- NOTE 2: it might be interesting to extend this using (in order of complexity) a) word embeddings or b) language models, and then test its performance on a publicly available dataset.
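
To make the setup concrete, here is a minimal sketch of the idea: sentences are generated by filling German templates with words from two lists, and a small bag-of-words naive Bayes classifier is trained on them. The templates, the word lists, and the names `generate`, `tokenize` and `NaiveBayes` are all hypothetical and chosen for illustration; they are not the actual templates or classifiers used in this experiment, and naive Bayes stands in for "a simple classification algorithm" in general.

```python
import math
import random
from collections import Counter

# Hypothetical templates and word lists, invented for illustration only.
TEMPLATES = ["Du bist ein {}.", "Was für ein {}!", "Er ist wirklich ein {}."]
OFFENSIVE = ["Idiot", "Trottel", "Dummkopf"]
NEUTRAL = ["Freund", "Lehrer", "Nachbar"]


def generate(n, seed=0):
    """Return n (sentence, label) pairs; label 1 = offensive, 0 = neutral."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        word = rng.choice(OFFENSIVE if label else NEUTRAL)
        data.append((rng.choice(TEMPLATES).format(word), label))
    return data


def tokenize(text):
    """Split on whitespace, strip basic punctuation, lowercase."""
    return [t.strip(".,!?").lower() for t in text.split()]


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over word counts."""

    def fit(self, data):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.class_counts = Counter()
        for text, label in data:
            self.class_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label in (0, 1):
            total = sum(self.word_counts[label].values())
            # Smoothed log prior for the class.
            score = math.log((self.class_counts[label] + 1) / (total_docs + 2))
            for tok in tokenize(text):
                if tok in self.vocab:  # words never seen in training are skipped
                    count = self.word_counts[label][tok]
                    score += math.log((count + 1) / (total + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


train = generate(200, seed=0)
test = generate(50, seed=1)
model = NaiveBayes().fit(train)
accuracy = sum(model.predict(t) == y for t, y in test) / len(test)
print(f"accuracy on held-out synthetic data: {accuracy:.2f}")
```

Because the held-out sentences come from the same templates and word lists as the training data, the score printed here says little about real-world performance, which is the limitation NOTE 1 describes. Words outside the training vocabulary are ignored entirely, so evaluating on a public dataset as suggested in NOTE 2 would likely need a richer representation.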