Analysis and Detection of Health-Related Misinformation on Chinese Social Media

Liu, Yue; Yu, Ke; Wu, Xiaofei; Qing, Linbo; Peng, Yonghong
IEEE Access

With the development of the mobile Internet, e-health has become increasingly connected with people’s daily lives. However, health information on the Internet is heavily contaminated by misinformation, a problem that is especially serious for the elderly. It is therefore necessary to analyze the characteristics of health-related misinformation on the Internet and to design automated detection tools. In this study, we focus on analyzing the common characteristics of reliable and unreliable health-related information on Chinese online social media, and on exploring possible detection methods using machine learning algorithms. We first collect a dataset of health-related articles from multiple Chinese online social media sites, comprising 2,296 reliable and 2,085 unreliable articles. We then analyze their differences with respect to writing style, text topic, and feature distribution, using both intuitive and statistical analysis. We also manually select 104 linguistic and statistical features that are useful for machine learning classifiers. Lastly, we propose a Health-related Misinformation Detection framework (HMD) that includes a feature-based method and a text-based method for detecting unreliable health-related information. Experiments verify the performance of the proposed HMD method.
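
To give a rough sense of what hand-crafted statistical features for a feature-based classifier might look like, the following is a minimal sketch only. The abstract does not list the 104 selected features, so the four features here, the function name `extract_features`, and the example sentence are all illustrative assumptions rather than the authors' actual feature set or implementation.

```python
from typing import Dict

# Hypothetical punctuation sets (ASCII and full-width forms); these are
# assumptions for illustration, not features named in the paper.
EXCLAMATIONS = "!！"
QUESTIONS = "?？"


def extract_features(text: str) -> Dict[str, float]:
    """Compute a few simple character-level statistics for one article."""
    n = len(text)
    if n == 0:
        # Avoid division by zero for empty input.
        return {
            "length": 0.0,
            "exclamation_ratio": 0.0,
            "question_ratio": 0.0,
            "digit_ratio": 0.0,
        }
    return {
        "length": float(n),
        "exclamation_ratio": sum(c in EXCLAMATIONS for c in text) / n,
        "question_ratio": sum(c in QUESTIONS for c in text) / n,
        "digit_ratio": sum(c.isdigit() for c in text) / n,
    }


if __name__ == "__main__":
    # Made-up sensational headline, used only to exercise the function.
    print(extract_features("震惊！！吃这个能治百病！"))
```

Feature dictionaries of this kind could then be converted to vectors and passed to any standard classifier; which classifier and which features perform well is an empirical question addressed by the paper itself, not by this sketch.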