Citation

LLMs and the generation of moderate speech

Author:: de Keulenaar, Emillie
Year:: 2025

Since their public release, large language models (LLMs) have been scrutinized for how they manage speech deemed harmful or controversial. This paper considers instead the discursive techniques they use to generate “moderate speech” — that is, refusals, defusal, and other rhetorical strategies used to answer prompts situated along the boundaries of acceptability. It compares the outputs of eight LLMs (GPT-3.5 turbo, GPT-4, Claude 3 Haiku, Claude 3 Sonnet, Claude 3.5, Llama 2 7B, Llama 3 7B, and Mistral) when prompted with 526 top controversial questions from various subreddits. It finds that LLMs tend to answer prompts with high moderation scores — i.e., high “risk” — using refusals or normative language, but respond to prompts with high controverscy scores in more inconsistent tones. This distinction reflects a deeper tension between anticipation (risk mitigation) and deliberation (controversy negotiation) in value alignment processes. By tracing these patterns, the paper situates LLMs within a longer history of speech moderation and asks what role they may play in the ongoing negotiation of public speech norms.

See citation in ‘MediaWell Citations Library’ Zotero library

Go to citation source