A significant body of research is dedicated to developing language models that can detect various types of online abuse, for example, hate speech, cyberbullying. However, there is a disconnect between platform policies, which often consider the author’s intention as a criterion for content moderation, and the current capabilities of detection models, which typically lack efforts