This repository contains the code for the paper *Respectful or Toxic? Using Zero-Shot Learning with Language Models to Detect Hate Speech*, accepted at the 7th Workshop on Online Abuse and Harms (WOAH) at ACL 2023.
Flor Miriam Plaza-del-Arco • Debora Nozza • Dirk Hovy
The code builds on HuggingFace, so our license is the MIT license.
Restrictions may apply to the data used by the models (which are derived from existing datasets) or to Twitter (the main data source). We refer users to the original licenses accompanying each dataset and to the Twitter regulations.
To use Encoder LMs, you can import the `prompting` module from `encoder_lms`:
```python
from encoder_lms import prompting

prompt_template = "This text is"
verb_h = "toxic"       # verbalizer for the hate speech class
verb_nh = "respectful" # verbalizer for the non-hate speech class

enc_lms = prompting("deberta-base")  # Available models: roberta-base, roberta-large, bert, deberta-base, deberta-large, xlm-roberta

# The input can be a dataframe, a text, or a list of texts
enc_lms.predict(prompt_template, verb_h, verb_nh, ["Shut your dumbass up bitch we all know you a hoe", "My lovely cat"])

>> ["hate", "non-hate"]
```
To use Instruction fine-tuned LMs, you can import the `prompting` module from `instruction_fine_tuned_lms`:
```python
from instruction_fine_tuned_lms import prompting

prompt_template = "Classify this text as hate or non-hate. Text:"
output_indicator = "Answer:"

inst_lms = prompting("flant5")  # Available models: flant5, mt0

# The input can be a dataframe, a text, or a list of texts
inst_lms.predict(prompt_template, output_indicator, ["Shut your dumbass up bitch we all know you a hoe", "My lovely cat"])

>> ["hate", "non-hate"]
```
Note: The hate speech examples above are sourced from a hate speech corpus and were not created by the authors of this repository.