Red-teaming is a method in which people, traditionally security engineers inside a company, interact with a system and try to make it produce undesired outcomes. The goal is to identify ways the system does not work as intended, and then to find fixes for those failures.
Increasingly, red-teaming is being put forward as a response to concerns about artificial intelligence: a way to pressure-test AI systems and identify potential harms. What does that mean in practice? What can red-teaming do, and what are its limits? Answering those questions is the subject of this policy brief.
