Welcome to MediaWell’s Dataset Library, a curated collection of tools, guides, and resources that support evidence-based solutions to digital mis- and disinformation. Our library is made up of three sections. Datasets & Data Repositories offers access to raw data and lists each resource’s update status (active/archived), data type, time period, and access type (open/restricted). Social Media APIs Guides & Toolkits helps researchers navigate the shifting landscape of requesting data from social media platforms. Peer Resources & Partner Organizations lists cutting-edge labs, centers, and think tanks working at the intersection of democracy and technology. We regularly update this page to add new resources and to reflect changes to those already listed. If you would like to recommend a resource for our Dataset Library, let us know.
Datasets & Data Repositories
Datasets & Data Repositories | Status | Data Type | Time Period | Access |
---|---|---|---|---|
Ad Observatory Database of political advertising across Meta platforms, including Facebook and Instagram, from January 1 – December 31, 2022. Originally maintained by NYU Cybersecurity for Democracy (C4D). Searchable by topic, keyword, sponsor, language (English and Spanish), and region, with analyses of spending, messaging trends, impressions, partisan lean, and audience demographics. | Archived | Social media | 2022 | Open |
Afrobarometer Non-partisan research network that conducts public opinion surveys across 30 countries in Africa on social, political, and economic issues. Special topics include technology and digital infrastructure, and institutions and governance. | Active | Surveys | 1999–present | Open |
American National Election Studies (ANES) 2020–2022 Social Media Study Large-scale, three-wave panel survey of social media users. The first two waves were carried out before and after the 2020 U.S. presidential election, and the third after the 2022 midterm elections. The questionnaire covers a variety of topics, including trust, political knowledge and misinformation, and policy issues. | Active | Surveys | 2020–2022 | Open; must apply for restricted Facebook dataset |
AmericasBarometer Comparative public opinion surveys in 34 countries across South, Central, and North America on a variety of topics, including trust in institutions, democratic values, and state capacity. Hosted by LAPOP at Vanderbilt University. | Active | Surveys | 2004–present | Open |
Arab Barometer Non-partisan research network that conducts public opinion surveys across the Middle East and North Africa on social, political, and economic attitudes. Focus topics include COVID-19, governance, media, political institutions, democracy, gender issues, the environment, and more. | Active | Surveys | 2006–present | Open |
Data for Good at Meta Meta’s data and resource hub that aims to support researchers and organizations addressing crises around the world. Offers publicly available datasets on a variety of population, infrastructure, economic, mobility, social connection, health, and climate-related topics, including the Social Connectedness Index and the Facebook Population During Crisis dataset. | Active | Various | Various | Open |
Dataset of COVID-Related Misinformation Videos and their Spread on Social Media Dataset of 8,122 COVID-related YouTube videos that circulated on social media between November 2019 and June 2020 and were taken down for containing false information. Contains video metadata and social media engagement statistics. | Archived | Social media | November 2019–June 2020 | Open |
Digital Governance Projects Database Catalog of all World Bank-funded digital governance (DG) and information and communication technology (ICT) projects since 1995, broken down by project summary, cost, duration, country, GovTech focus area, and more. The latest version (2022) contains details of 1,449 projects across 147 countries. | Active | Various | 1995–present | Open |
DocNow Tweet Catalog Curated list of 143 publicly available Twitter datasets on topics such as the 2020 U.S. presidential election, COVID-19, and hate groups. | Archived | Social media | Various | Open |
Documenting Hate News Index Searchable database of news articles about hate crime across the United States, including harassment, intimidation, cyberbullying and online trolling, vandalism, and violent crime. Hosted by Google News Lab and ProPublica. | Archived | News | August 2017–December 2019 | Open |
ElectionRumors2022 Dataset of election rumors on Twitter (now X) during the 2022 U.S. midterm elections. Contains information on 1.81 million Twitter posts related to 135 distinct rumors, with mixed-methods analyses of specific cases. | Archived | Social media | September–December 2022 | Open |
ESOC COVID-19 Misinformation Dataset Database of COVID misinformation shared on social media and media outlets around the world in 2020. The final report counts 5,613 distinct misinformation stories. Data includes direct links and a breakdown by language, region, title, narrative, claim, distribution channel, audience, keywords, and misinformation type. Collected and coded by the Empirical Studies of Conflict (ESOC) Project at Princeton University. | Archived | Various | January–December 2020 | Open |
EU Data Portal Contains datasets on a wide variety of subjects in the European Union, including public opinion surveys, economic development measures, and government spending. Recent releases include the 2022–2023 TechSonar Report. | Active | Various | Various | Open |
Eurobarometer Series of public opinion surveys across Europe. Includes the Standard Eurobarometer and flash surveys on special themes, like Digital Society & Technology. Recent releases include the Media & News Survey (2023) and surveys on perceptions of cyberskills and cybersecurity and on the impact of digitization on EU citizens’ daily lives. | Active | Surveys | 1974–present | Open |
European Social Survey Cross-national survey of attitudes, beliefs, and behavior in over thirty countries across Europe. Conducted every two years with “core” and rotating sections, including questions on trust, immigration, civic involvement, and digital social contacts. | Active | Surveys | 2002–present | Open |
EUvsDisinfo Multilingual database of pro-Kremlin disinformation news stories, searchable by country, language, date, and tags such as the invasion of Ukraine, U.S. presence in Europe, and the European elections. Hosted by the European Union’s East StratCom Task Force. Contains more than 17,300 distinct disinformation cases as of August 2024. | Active | News | 2015–present | Open |
Facebook Political Ad Collector Searchable database of targeted political ads on Facebook, collected with a browser plugin created by the NYU Online Political Transparency Project in partnership with ProPublica’s Electionland Project. Filter by city, state, political affiliation, age, and gender to view ads targeted at specific audiences. (Note: not searchable by race due to Facebook’s parameters.) | Archived | Social media | August 2018–July 2020 | Open |
French Political Trust Barometer (Le Baromètre de la confiance politique) Benchmark survey data on trust in politics among the French population since 2009. Results are released in French, with select reports available in English. Led by the CEVIPOF research lab at Sciences Po – Paris. | Active | Surveys | 2009–present | |
Harvard Dataverse A free, open-source repository for researchers across disciplines to publish and share datasets. Examples include replication data for studies on the impact of belief in false claims, voter fraud disinformation campaigns, and social media trolling. | Active | Various | Various | Open |
ICPSR (Inter-university Consortium for Political and Social Research) A repository for social science datasets, including polls and surveys by organizations, individual researchers, and government entities. Examples include datasets on social media echo chambers, religion and misinformation, and fact-checking of COVID-19 misinformation among college students. | Active | Various | Various | Various |
Information Laundering Cycle (ILC) Document Database A comprehensive document database for examining attacks against disinformation researchers and institutions. Contains 162 sets of primary source documents, totaling more than 2,000 pages of publicly available material, largely made up of emails, text messages, and other written communications between researchers, social media platforms, and government agencies. | Active | Various | 2022–2023 | Open |
Latinobarómetro Annual public opinion survey across Latin America on a variety of social, economic, and political issues. Recent versions include questions on AI, automation, trust in institutions, and digital communication. | Active | Surveys | 1995–present | Open |
MetaHate: A Dataset for Unifying Efforts on Hate Speech Detection A collection of 36 hate speech datasets across mainstream and niche social media platforms. Hosted by the Information Retrieval Lab at the University of A Coruña. Contains 1.2 million social media posts as of August 2024. | Active | Various | 2016–present | Open |
Misinformation Amplification Tracking Dashboard Tracker of how various mis- and disinformation narratives are amplified across social media, how platforms respond, and the extent to which the platforms themselves amplify or incentivize the spread of false information. Hosted by the Integrity Institute as part of the Elections Integrity Program. | Active | Social media | Various | Open (aggregate results) |
Pew Research Center Nonpartisan “fact tank” that collects and analyzes data on public attitudes in the United States and around the world. Ongoing research topics with available datasets include Public Trust in Government and Local News Dynamics. | Active | Surveys | Various | Open |
Policy Tracker An interactive tool from Tech Policy Press that tracks laws and regulations, along with government investigations and litigation, that shape the rules and accountability for tech companies. Searchable by topic, government, and policy type; includes details on government, date initiated, current status, and last estimated update. | Active | Policy | 2017–present | Open |
Political Deepfakes Incidents Database (PDID) Database of politically salient deepfake incidents. Includes image and video deepfake content, metadata, and descriptors drawn from political science, public policy, and misinformation research, with the goal of documenting trends in the use of generative AI for political disinformation. | Active | Various | 2018–present | Open |
Politwoops Tracks deleted tweets by public officials in the United States, including officials holding office at the time of posting and candidates running for office. Hosted by the Sunlight Foundation from 2012 to 2015 and by ProPublica from 2016 to 2023. Archived due to Twitter/X API changes. | Archived | Social media | 2012–2023 | Open |
SOMAR – Social Media Archive at ICPSR Repository for datasets collected for social media research. Data collected from major platforms (Twitter/X, Facebook, Instagram, Reddit, and YouTube) across a variety of topics, including political communication, information networks, and online behavior. | Active | Various | Various | Open |
Technology Policy Tracker A data aggregation initiative by Cambridge Local First, Tech Policy Press, and the Integrity Institute that aims to provide a comprehensive view of major technology policy and legislation in the United States (at the federal and state levels) and internationally. | Active | Policy | 2023 | Open |
The Accountability Project (TAP) Collection of public datasets by and about local, state, and federal government agencies in the United States. Contains datasets on campaign spending, employee salaries, voter registration, land ownership, businesses, medical facilities, government contracts, emergency funds, and more. | Active | Various | 1978–present | Open |
The Platform Governance Archive Dataset of major social media platforms and their content moderation and legal policies. V1 includes data for Facebook, Instagram, Twitter, and YouTube from 2005 to 2021. V2 covers 2022 onward, with data from 14 additional platforms. Hosted by the Platform Governance, Media, and Technology Lab at the University of Bremen. | Active | Policy | 2005–present | Open |
TruthSeeker One of the largest ground-truth datasets of real and fake news content on social media, designed to support deep learning-based detection models and clustering-based event detection. Includes bot, credibility, and influence scores. Led by the Canadian Institute for Cybersecurity at the University of New Brunswick. | Active | Social media | 2023–present | Open |
UNdata Official statistics and datasets collected by various United Nations agencies and partner organizations. Includes the World Telecommunication/ICT Indicators Database, World Development Indicators, and the UIS Data Centre on Education, Culture and Communication, and Science and Technology. | Active | Various | Various | Open |
Wellcome Global Monitor Series of global public opinion surveys around science and health across 140 countries. Contains questions about trust in science, trust in doctors and nurses, confidence in public health officials, attitudes towards vaccines, intersections with gender and religion, and perceptions of the future. | Archived | Surveys | 2018–2020 | Open |
Social Media APIs Guides & Toolkits
American University Social Media API Guide Up-to-date resource for accessing data from major social media platforms, including Twitter (X), Instagram, Facebook, Reddit, YouTube, and TikTok. Includes guides to each platform’s APIs, step-by-step instructions for gaining access, what each API does (or does not) allow, and third-party tools for access and visualization. Maintained by American University.
Fighting Disinformation Online Part of the RAND Corporation’s Countering Truth Decay Initiative, the Fighting Disinformation Online project catalogs online tools developed by nonprofits and civil society organizations to counter online disinformation. The project aims to assist media consumers and to inform funders and developers about which tools already exist and where further development is needed. Tools are organized by category, including bot/spam detection, codes and standards, credibility scoring, disinformation tracking, verification and fact-checking, and whitelisting.
Hoaxy A web-based tool created by the Observatory on Social Media (OSoMe) at Indiana University that visualizes the spread of articles online. Tracks the sharing of links to stories from low-credibility sources and independent fact-checking organizations.
Meta Content Library and API The Meta Content Library and Content Library API provide comprehensive access to the full public content archive from Facebook and Instagram, as well as select data from Threads. As of October 2024, individuals can apply for access to the tools through the Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan.
Social Media Research Toolkit Curated set of 50+ social media research tools by the Social Media Lab at Toronto Metropolitan University. Updated annually, with a breakdown of each tool’s access type, the platforms it covers, and whether coding experience is required.
SOMAR Data Applications Platform The Social Media Archive (SOMAR) Data Applications Platform is an online system that streamlines finding and applying for social media data disseminated by SOMAR and its partners via the virtual data enclave (VDE) or controlled download, including the Meta Content Library and Content Library API and the Meta Ad Targeting Dataset. Hosted by the Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan Institute for Social Research (ISR). Research affiliation required.
Top FIBers A dashboard by the Observatory on Social Media (OSoMe) at Indiana University that tracks and reports the top ten superspreaders of low-credibility information on Twitter and Facebook each month.
University of Michigan Social Media Research Guide Step-by-step, regularly updated resource for navigating social media data, including how to request access across different platforms, how to use APIs, what data is or is not publicly available, where to find existing datasets, sources of U.S. government social media data, and tools for analysis.
Peer Resources & Partner Organizations
- Center for Critical Race and Digital Studies
- Center for Information, Technology, and Public Life at UNC-Chapel Hill
- Center for an Informed Public at the University of Washington
- Center for Media Engagement at the University of Texas at Austin
- Center for Social Media and Politics at New York University
- Credibility Coalition
- Cyber Policy Center at Stanford University
- Data Fluencies at Simon Fraser University
- Data & Society
- Digital Democracies Institute at Simon Fraser University
- Digital Forensic Research Lab at the Atlantic Council
- Fighting Disinformation Online at the RAND Corporation
- Full Fact
- Global Disinformation Lab at the University of Texas at Austin
- Hacks/Hackers
- Institute for Data, Democracy, and Politics at George Washington University
- The Integrity Institute
- Journalism & Media at Pew Research Center
- Knight Center for Journalism in the Americas at the University of Texas at Austin
- Lazer Lab at Northeastern University
- National Institute for Civil Discourse at the University of Arizona
- Nieman Lab at Harvard University
- Observatory on Social Media at Indiana University
- Oxford Internet Institute
- Poynter
- Program on Democracy and the Internet at Stanford University
- Shorenstein Center at Harvard University