What’s Covered?
This paper offers one of the most methodical investigations yet into how political leanings in training data shape the behavior of LMs. It introduces a two-step process to track political bias:
- Measuring LM ideology using the Political Compass framework, which places models on social and economic axes based on their responses to 62 statements.
- Tracing downstream effects on high-stakes tasks like hate speech and misinformation detection.
The authors first place 14 major models (BERT, RoBERTa, GPT-2, GPT-3, GPT-4, etc.) on the compass using both probing and generation methods. They then further pretrain RoBERTa and GPT-2 on partisan corpora (left, center, and right, drawn from both Reddit and news domains) and show that models pretrained on partisan content behave differently on downstream tasks, even when the fine-tuning data remains unchanged.
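As a rough illustration of the compass-scoring step (not the authors' actual code; the statements, the keyword stance classifier, and the agree-to-axis mapping below are simplified stand-ins for the paper's 62 prompts and response classifier), the idea can be sketched as:

```python
# Toy sketch: prompt an LM with ideological statements, classify each
# response as agree/disagree, and aggregate into (economic, social)
# coordinates. Statements and weights here are illustrative placeholders.

# Each statement is tagged with the axis it loads on and the direction an
# "agree" response pushes the score (+1 = economically right / socially
# authoritarian, -1 = the opposite pole).
STATEMENTS = [
    ("The freer the market, the freer the people.", "economic", +1),
    ("Businesses that mislead the public should be penalised.", "economic", -1),
    ("Traditional values must be upheld by law.", "social", +1),
]

def classify_stance(response: str) -> int:
    """Crude keyword stance detector standing in for a trained classifier.
    Checks disagreement first, since 'disagree' contains 'agree'."""
    text = response.lower()
    if any(w in text for w in ("disagree", "oppose")):
        return -1
    if any(w in text for w in ("agree", "support")):
        return 1
    return 0  # abstain / unclear response

def compass_position(responses):
    """Aggregate per-statement stances into {axis: score in [-1, 1]}."""
    scores = {"economic": [], "social": []}
    for (statement, axis, direction), response in zip(STATEMENTS, responses):
        scores[axis].append(direction * classify_stance(response))
    return {axis: sum(v) / max(len(v), 1) for axis, v in scores.items()}
```

Feeding in one LM response per statement, e.g. `compass_position(["I strongly agree.", "I disagree with this.", "I disagree."])`, yields a point on the two axes that can be compared across models or pretraining corpora.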
Key findings include:
- BERT variants skew more socially conservative, while GPT models lean more libertarian
- Models are more polarized on social issues than on economic ones
- Social media pretraining shifts LMs more on social values; news shifts economic values more
- Post-Trump data produced stronger polarization in LMs
- Even small shifts in pretraining content lead to measurable differences in how hate speech and misinformation are detected
The study also evaluates a partisan-ensemble strategy that combines LMs pretrained on different ideological corpora, showing it can improve both fairness and task performance.
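One simple way to realize a partisan ensemble is a vote over classifiers pretrained on different ideological corpora; the sketch below uses plain majority voting with hypothetical model names and labels, which is an assumption for illustration rather than the paper's exact aggregation scheme:

```python
# Minimal partisan-ensemble sketch: combine hate-speech/misinformation
# predictions from left-, center-, and right-pretrained models so that no
# single ideological leaning dominates the decision. The model names and
# labels are hypothetical placeholders for real classifier outputs.
from collections import Counter

def partisan_ensemble_vote(predictions: dict) -> str:
    """Return the majority label across the partisan models' predictions."""
    counts = Counter(predictions.values())
    label, _ = counts.most_common(1)[0]
    return label

# Example: two of three partisan models flag the post as hateful.
preds = {"left_lm": "hateful", "center_lm": "hateful", "right_lm": "benign"}
```

Here `partisan_ensemble_vote(preds)` returns `"hateful"`; the design intuition is that disagreements driven by a single leaning are outvoted by the other perspectives.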
💡 Why it matters?
This paper tackles the gap between technical neutrality and political impact. It shows that models can internalize subtle ideological skew—even from “clean” or mainstream sources—and that this bias directly affects how identity groups and political narratives are treated. If left unmonitored, such bias could fuel algorithmic unfairness, erode public trust, and reinforce social inequalities.
What’s Missing?
The work centers on U.S.-based political ideologies, so its compass may not translate globally. It also leans heavily on the Political Compass test, a tool criticized for ambiguity and Western-centric framing. Mitigation is not explored much beyond ensembling and targeted pretraining; other promising techniques (such as adversarial training or contrastive debiasing) are left untested. Finally, real-world deployment contexts (moderation pipelines, journalistic platforms) are out of scope, though they would benefit most from these insights.
Best For:
- Developers building moderation or misinformation systems
- Researchers exploring social bias and fairness in NLP
- Policy experts working on AI accountability
- Governance teams auditing model behavior
- Anyone designing foundation models intended for sensitive tasks
Source Details:
Citation:
Feng, S., Park, C. Y., Liu, Y., & Tsvetkov, Y. (2023). From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, 11737–11762.
Authors & Credentials:
- Shangbin Feng (University of Washington): Works on responsible and interpretable NLP
- Chan Young Park (Carnegie Mellon University): Researcher in fairness and language model behavior
- Yuhan Liu (Xi’an Jiaotong University): Focus on political NLP
- Yulia Tsvetkov (University of Washington): Leading voice in social bias, multilingual NLP, and model fairness
Backed by DARPA and the NSF, this team brings a solid mix of computational rigor and political theory grounding. The code and data for this study are open-sourced: https://github.com/BunsenFeng/PoliLean