Enhancing health misinformation detection: A multidimensional feature framework incorporating linguistic strategies

Health misinformation constitutes a recurrent theme in the context of public health. Fighting health misinformation involves the timely detection of misleading information. However, detecting health misinformation is particularly challenging, as the content is often intentionally crafted. Current studies frequently fail to fully extract effective features of health misinformation due to a lack of in-depth insight into textual language use. Deceivers may use the right words and linguistic tricks to influence users. To address this issue, this study aims to analyze health misinformation from a language use perspective. Based on the Comprehensive Information Theory, this study proposes a multidimensional feature framework that incorporates syntactic features (such as text length, lexical density, and POS ratio) and semantic features (including sentence, main linguistic strategies, and sub-strategies). Among these features, analyzing linguistic strategies presents challenges in terms of context understanding, language understanding, and language reasoning. To address these challenges, we introduce a novel annotation method focused on linguistic strategy analysis. This method employs a directed content analysis approach guided by Aristotle’s rhetorical theory, followed by a summative content analysis to establish a keyword list for each linguistic strategy. Finally, we develop a LinguisticStrategy2Vec method to extract semantic information. The experimental analysis of the real-world dataset demonstrates that our proposed detection framework achieves an F1 score of 0.831 and an AUC score of 0.883, with an enhanced F1 score of 0.899 specifically on false health information dataset. These results demonstrate the effectiveness of our detection method, highlighting its capability for advanced health misinformation detection by incorporating multidimensional features. The findings confirm the robust stability of linguistic strategies in detecting health misinformation. Moreover, this study has methodological significance in annotating and extracting sentiment features in health misinformation detection, and in enhancing the natural language understanding and semantic reasoning of deep learning models.