Citation

When collaboration fails: persuasion-driven adversarial influence in multi-agent large language model debate

Authors:
Kraidia, Insaf; Qaddara, Iyas; Almutairi, Alhanof; Alzaben, Nada; Belhouari, Samir Brahim
Publication:
Scientific Reports
Year:
2026

Recent developments have made Large Language Model (LLM) multi-agent systems a promising paradigm for enhancing reasoning through collaborative debate and collective deliberation. Prior work has demonstrated that coordinated LLM agents tend to outperform single models in accuracy, robustness, and reasoning depth. These benefits, however, rest on a rarely questioned assumption: that all agents act honestly. In this paper we subvert this assumption by identifying a critical weakness: persuasion-induced adversarial influence in LLM-to-LLM debate. We show that a single strategically designed adversarial agent can significantly sway group outcomes through coherent, confident, and misleading arguments, rather than through classical prompt-injection or token-level attacks. Experimental results suggest that such an agent can lower the system’s overall accuracy by 10–40% while increasing consensus on incorrect answers by more than 30%. We conceptualize persuasion as an adversarial vector and demonstrate that inference-time enhancement techniques, such as Best-of-N optimization and Retrieval-Augmented Generation (RAG), can unintentionally amplify these attacks by increasing the perceived credibility of flawed arguments, even when retrieval quality is low. Our results show that neither increasing the number of agents or debate rounds nor simple prompt-based defenses reliably mitigate adversarial persuasion. These findings demand a fundamental rethinking of trust, coordination, and robustness assumptions when deploying multi-agent LLM systems.
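
To make the threat model concrete, here is a minimal, hypothetical sketch of the dynamic the abstract describes: honest agents in a majority-vote debate alongside one adversary that argues consistently and confidently for a wrong answer. This is not the authors' implementation; all names, probabilities (`p_correct`, `sway`), and the voting rule are illustrative assumptions.

```python
import random

# Hypothetical toy simulation (not the paper's code): honest agents debate
# by majority vote while one adversarial agent never concedes, illustrating
# how a single persistent persuader can pull group consensus off the truth.

CORRECT, WRONG = "A", "B"

def debate(n_honest=4, rounds=3, p_correct=0.7, sway=0.3, seed=0):
    rng = random.Random(seed)
    # Honest agents draft independent initial answers (assumed accuracy p_correct).
    answers = [CORRECT if rng.random() < p_correct else WRONG
               for _ in range(n_honest)]
    answers.append(WRONG)  # the adversary always asserts the wrong answer
    for _ in range(rounds):
        majority = max(set(answers), key=answers.count)
        for i in range(n_honest):  # the adversary (last slot) never updates
            # With probability `sway`, an honest agent defers to the
            # current majority instead of keeping its own answer.
            if rng.random() < sway:
                answers[i] = majority
    return max(set(answers), key=answers.count) == CORRECT

if __name__ == "__main__":
    trials = 1000
    acc = sum(debate(seed=s) for s in range(trials)) / trials
    print(f"group accuracy with one adversary: {acc:.2f}")
```

Even this crude model shows why adding agents or rounds need not help: extra rounds give the unwavering adversary more opportunities to anchor the majority, which is consistent with the paper's finding that scale alone is not a defense.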