Tackling misinformation: What researchers could do with social media data

Social media platforms rarely provide data to misinformation researchers. This is problematic as platforms
play a major role in the diffusion and amplification of mis- and disinformation narratives. Scientists are
often left working with partial or biased data and must rush to archive relevant data as soon as it appears
on the platforms, before it is suddenly and permanently removed by deplatforming operations.
Alternatively, scientists have conducted off-platform laboratory research that approximates social media
use. While this can provide useful insights, this approach can have severely limited external validity
(though see Munger, 2017; Pennycook et al., 2020). For researchers in the field of misinformation,
emphasizing the necessity of establishing better collaborations with social media platforms has become routine. In-lab studies and off-platform investigations can only take us so far. Increased data access would
enable researchers to perform studies on a broader scale, allow for improved characterization of
misinformation in real-world contexts, and facilitate the testing of interventions to prevent the spread of
misinformation. The current paper highlights 15 opinions from researchers detailing these possibilities
and describes research that could hypothetically be conducted if social media data were more readily
available. Our findings as scientists are only as good as the datasets at our disposal, and with the current
misinformation crisis, it is urgent that we have access to real-world data from the platforms where
misinformation is wreaking the most havoc.
While new collaborative efforts are gradually emerging (e.g., Clegg, 2020; Mervis, 2020), they remain
scarce and unevenly distributed across research communities and disciplines. Platforms periodically fund
research initiatives on mis- and disinformation, but these rarely include increased access to data and
algorithmic models. Most importantly, in these kinds of collaborations, intellectual freedom is easily
limited by the fact that the overarching scope of the research is not defined by the researchers, but by
the platforms themselves. In the rare cases where data sharing is a possibility, negotiations have been slow for
several reasons, including platforms’ concerns over protecting their brands and reputation, and ethical
and legal issues of privacy and data security on a grand scale (Bechmann & Kim, 2020; Olteanu et al.,
2019). However, these barriers are not insurmountable (Moreno et al., 2013; Lazer et al., 2020). For
instance, establishing a mechanism by which users can actively consent to various research studies, and
potentially offering to make the data available to the participants themselves, would be a significant step
forward (Donovan, 2020).
We invited misinformation researchers to write a 250-word commentary about the research that they
would hypothetically conduct if they had access to consenting participants’ social media data. The
excerpts below provide concrete examples of studies that misinformation researchers could conduct, if
the community had better access to platforms’ data and processes. Based on the contents of the
submissions, we have grouped these brief excerpts into five areas that could be improved, and conclude
with an excerpt regarding the importance of data sharing:
1. measurement and design,
2. who engages with misinformation and why,
3. unique datasets with increased validity,
4. disinformation campaigns,
5. interventions, and
6. the importance of data sharing.
While these excerpts are not comprehensive and may not be representative of the field as a whole,
our hope is that this multi-authored piece will further the conversation regarding the establishment of
more evenly distributed collaborations between researchers and platforms. Despite the challenges, on
the other side of these negotiations is a vast array of potential discoveries that are needed by both the
nascent field of misinformation research and society.