News Item

Language models might be able to self-correct biases—if you ask them | MIT Technology Review

By Niall Firth on March 20, 2023

Large language models are infamous for spewing toxic biases, thanks to the reams of awful human-produced content they get trained on.

But if the models are large enough, and humans have helped train them, then they may be able to self-correct for some of these biases. Remarkably, all we have to do is ask.

That’s the finding of an experiment out of AI lab Anthropic, described in a non-peer-reviewed paper, which analyzed large language models that had been trained using reinforcement learning from human feedback (RLHF), a technique that gets humans to steer the AI model toward more desirable answers.

Source: Language models might be able to self-correct biases—if you ask them | MIT Technology Review

Artificial Intelligence, Data, Digital Culture, Research Methods, Technology

Suggested News Items

‘Clog the lines’: Internet trolls deliberately disrupted the Iowa caucuses hotline for reporting results | NBC

‘Disinformation’ Is The Word Of The Year — And A Sign Of What’s To Come | NPR

‘Don’t fuel the fire’: disinformation experts on how Biden should deal with Trump’s election lies | The Guardian