Reading Notes | Improving Generalization of Hate Speech Detection Systems to Novel Target Groups via Domain Adaptation

[Semantic Scholar] – [Code] – [Tweet] – [Video] – [Website] – [Slide]

Change Logs:

  • 2023-09-11: First draft. This paper appears at WOAH ’22.

The paper studies generalization to novel hate target groups on the single HateXplain dataset; the authors do so by comparing four existing methods: (1) Unsupervised Domain Adaptation (UDA; this method is also used in paper [1]), (2) MixUp regularization, (3) curriculum labeling, and (4) DANN.

The paper also considers the back-translation approach (specifically with the language pairs (en, fr), (en, de), and (en, es)) for data augmentation.
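Back-translation paraphrases a sentence by translating it into a pivot language and back. A minimal sketch of this augmentation loop is below; the `translate` callable is a hypothetical stand-in (in practice one would plug in an MT model, e.g., a Helsinki-NLP/opus-mt checkpoint via the transformers library):

```python
def back_translate(text, translate, src="en", pivot="fr"):
    """Translate `text` into a pivot language and back to obtain a paraphrase.

    `translate` is a hypothetical stand-in with signature
    translate(text, src=..., tgt=...) -> str.
    """
    pivot_text = translate(text, src=src, tgt=pivot)
    return translate(pivot_text, src=pivot, tgt=src)


def augment(dataset, translate, pivots=("fr", "de", "es")):
    """Return the original (text, label) pairs plus one back-translated
    paraphrase per pivot language, keeping the original label."""
    augmented = list(dataset)
    for text, label in dataset:
        for pivot in pivots:
            augmented.append((back_translate(text, translate, pivot=pivot), label))
    return augmented
```

With the three pivot languages used in the paper, each example yields three extra training instances, roughly quadrupling the dataset.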

Experiments

  • Zero: Directly apply a model trained on \mathcal{D}_A to a new domain \mathcal{D}_B.
  • Zero+: Augmenting \mathcal{D}_A using back-translation.
  • ZeroB+: Applying back-translation-based data augmentation while ensuring that each batch is class-balanced.
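The class-balanced batching in ZeroB+ can be sketched as follows. This is a minimal illustration, assuming sampling with replacement for minority classes is acceptable; the paper does not specify the exact sampler:

```python
import random
from collections import defaultdict

def class_balanced_batches(examples, batch_size, seed=0):
    """Yield batches with an equal number of examples per class.

    `examples` is a list of (text, label) pairs; `batch_size` should be a
    multiple of the number of classes. Minority classes are oversampled
    with replacement so every batch is balanced.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for ex in examples:
        by_class[ex[1]].append(ex)
    classes = sorted(by_class)
    per_class = batch_size // len(classes)
    # Stop once the largest class is exhausted.
    n_batches = max(len(v) for v in by_class.values()) // per_class
    for _ in range(n_batches):
        batch = []
        for c in classes:
            batch.extend(rng.choices(by_class[c], k=per_class))
        rng.shuffle(batch)
        yield batch
```

An equivalent effect can also be obtained with per-class sampling weights (e.g., PyTorch's `WeightedRandomSampler`), but the explicit grouping above makes the per-batch balance guarantee easy to see.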

Reference

  1. Unsupervised Domain Adaptation in Cross-corpora Abusive Language Detection (Bose et al., SocialNLP 2021): This paper considers the setting of training on dataset \mathcal{D}_A and testing on another dataset \mathcal{D}_B, where A, B are HateEval, Waseem, and Davidson, resulting in 6 pairs. They use several existing methods to improve the test scores on \mathcal{D}_B.
  2. [2012.10289] HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection (Mathew et al., AAAI 2021): At the time, this was the only dataset providing target-group annotations for both hateful and non-hateful content.
  3. Data augmentation can happen in symbol space (via rules, word replacement through BERT, or text-generation models) or in feature space. However, the main paper chooses back-translation for data augmentation.

    Here are two libraries on data augmentation in NLP: