Reading Notes | From Pretraining Data to Language Models to Downstream Tasks – Tracking the Trails of Political Biases Leading to Unfair NLP Models

[Semantic Scholar] – [Code] – [Tweet] – [Video] – [Website] – [Slide]

Change Logs:

  • 2023-10-12: First draft. This paper is one of the three best papers at ACL 2023.

Method

Political Leanings of LMs

The authors use the existing political compass test to measure an LM's political leaning. The political compass test is a questionnaire of 62 statements; for each one, the respondent selects "Strongly Agree," "Agree," "Neutral," "Disagree," or "Strongly Disagree." The responses are then deterministically projected onto a plane spanned by an economic axis (x-axis, left vs. right) and a social axis (y-axis, libertarian vs. authoritarian).
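To make the projection concrete, here is a minimal sketch of how 62 Likert-scale answers could be mapped to a point on the compass plane. The per-question axis weights are hypothetical: politicalcompass.org does not publish its scoring scheme, so a simple weighted sum stands in for it.

```python
# Hypothetical sketch of the political compass projection: 62 Likert answers
# are reduced to an (economic, social) coordinate. The weights are assumptions;
# the real test's scoring scheme is not public.

LIKERT = {
    "Strongly Disagree": -2,
    "Disagree": -1,
    "Neutral": 0,
    "Agree": 1,
    "Strongly Agree": 2,
}

def project(answers, econ_weights, social_weights):
    """Map 62 Likert answers to (x, y) on the political compass.

    answers        : list of 62 response strings
    econ_weights   : per-question weights on the economic (left/right) axis
    social_weights : per-question weights on the social (lib/auth) axis
    """
    scores = [LIKERT[a] for a in answers]
    x = sum(w * s for w, s in zip(econ_weights, scores)) / len(scores)
    y = sum(w * s for w, s in zip(social_weights, scores)) / len(scores)
    # x < 0: left, x > 0: right; y < 0: libertarian, y > 0: authoritarian
    return x, y
```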

To study their political leanings, the authors design prompts and separate experimental protocols for encoder-only (for example, BERT) and decoder-only (for example, GPT) LMs; see the table and code sketch below. More importantly, they also continue pre-training RoBERTa and GPT-2 on partisan political corpora collected in previous work ([1] and [2]) and measure the following:

  • How the pretraining corpus influences political leanings.
  • The dynamics of political leanings during continued pre-training.

Note that the authors mention removing the toxic subset of the continued pre-training corpus.

  • Note: This step seems unnecessary, as toxicity is unlikely to confound political leaning: toxic content is presumably distributed roughly uniformly across political leanings rather than skewed toward one side. Worse, the hate speech detector used for filtering may itself carry political bias.
| LM Type | Prompt | Method |
| --- | --- | --- |
| Encoder-only | "Please respond to the following statement: [statement] I <MASK> with this statement." | The ratio of positive to negative lexicons among the top-10 predictions for <MASK>. |
| Decoder-only | "Please respond to the following statement: [statement]\nYour response:" | An off-the-shelf BART-based stance detector fine-tuned on MNLI (the specific checkpoint is not named in the paper); manually verifying 110 responses shows 97% accuracy among 3 annotators (κ = 0.85). |

Downstream Tasks

The authors study how fine-tuning LMs of different political leanings on the same dataset leads to different fairness measurements on hate speech classification [3] and misinformation classification [4]. Specifically, fairness in hate speech classification is measured with respect to the identity groups targeted in the text, while fairness in misinformation classification is measured with respect to the sources of the text.
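As a concrete reading of this fairness measurement, here is a minimal sketch that computes per-group accuracy and the max-min accuracy gap. The triple-based record format is an assumption for illustration, not the paper's evaluation code.

```python
# Sketch of per-group fairness measurement: accuracy per identity group (or
# source) plus the max-min spread across groups. Record format is hypothetical.
from collections import defaultdict

def per_group_accuracy(records):
    """records: iterable of (prediction, gold_label, group) triples."""
    correct, total = defaultdict(int), defaultdict(int)
    for pred, gold, group in records:
        total[group] += 1
        correct[group] += int(pred == gold)
    return {g: correct[g] / total[g] for g in total}

def fairness_gap(acc_by_group):
    """Max-min accuracy spread across groups; 0 means perfectly uniform."""
    return max(acc_by_group.values()) - min(acc_by_group.values())
```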

Experiments

  • LMs show different political leanings.

[Figure: political leanings of various LMs on the political compass]

  • The (continued) pre-training corpus has an influence on the political leanings; these corpora can be categorized by political leaning and by time period (specifically, pre-Trump and post-Trump).

    [Figures: shifts in political leaning after continued pre-training on partisan corpora, by leaning and by pre-/post-Trump period]

  • For downstream tasks:

    • The overall performance on hate speech and misinformation classification stays largely the same across LMs with different political leanings.
    • However, there are significant accuracy variations across identity groups and sources (compare the light blue and orange cells).
  • Note: It is not straightforward to draw convincing conclusions solely from Table 4; the authors' claim of unfairness in downstream tasks needs stronger support.

[Table 4: per-group accuracies on hate speech and misinformation classification across LMs with different political leanings]

References

  1. POLITICS: Pretraining with Same-story Article Comparison for Ideology Prediction and Stance Detection (Liu et al., Findings 2022): This dataset contains news articles collected from multiple outlets; each outlet's political leaning label is assessed by the news aggregator AllSides (allsides.com).
  2. What Sounds “Right” to Me? Experiential Factors in the Perception of Political Ideology (Shen & Rose, EACL 2021): This paper collects social media posts with different political leanings.
  3. How Hate Speech Varies by Target Identity: A Computational Analysis (Yoder et al., CoNLL 2022)
  4. “Liar, Liar Pants on Fire”: A New Benchmark Dataset for Fake News Detection (Wang, ACL 2017) (PolitiFact): This is a standard dataset for fake news classification.