
Dataset Library

Welcome to MediaWell’s Dataset Library, a curated collection of tools, guides, and resources to support evidence-based solutions to digital mis- and disinformation. Our library is made up of three sections: Datasets & Data Repositories offers access to raw data, and includes details on the update status (active/archived), data type, time period, and access type (open/restricted) of each resource; Social Media APIs Guides & Toolkits helps navigate the shifting landscape of requesting data from social media platforms; and Peer Resources & Partner Organizations provides a list of cutting-edge labs, centers, and think tanks working at the intersections of democracy and technology. We regularly update this page to add new resources or reflect changes in those already listed. If you want to recommend a resource for our Dataset Library, let us know here.

Datasets & Data Repositories

Datasets & Data Repositories | Status | Data Type | Time Period | Access
Ad Observatory
Database of political advertising across Meta platforms, including Facebook and Instagram, from January 1 – December 31, 2022. Originally maintained by NYU Cybersecurity for Democracy (C4D). Searchable by topic, keywords, sponsors, language (English and Spanish), and region, with analyses of spending, messaging trends, impressions, partisan lean, and audience demographics.
Status: Archived | Data Type: Social media | Time Period: 2022 | Access: Open
Afrobarometer
Non-partisan research network that conducts public opinion surveys across 30 countries in Africa on social, political, and economic issues. Special topics include technology and digital infrastructure, and institutions and governance.
Status: Active | Data Type: Surveys | Time Period: 1999–present | Access: Open
American National Election Studies (ANES) 2020-2022 Social Media Study
Large-scale, three-wave panel survey of social media users. The first two waves were carried out before and after the 2020 U.S. presidential election, with a third wave after the 2022 midterm elections. Questionnaire covers a variety of topics, including trust, political knowledge and misinformation, and policy issues.
Status: Active | Data Type: Surveys | Time Period: 2020–2022 | Access: Open; must apply for restricted Facebook dataset
AmericasBarometer
Comparative public opinion surveys in 34 countries across South, Central, and North America on a variety of topics, including trust in institutions, democratic values, and state capacity. Hosted by LAPOP at Vanderbilt University.
Status: Active | Data Type: Surveys | Time Period: 2004–present | Access: Open
Arab Barometer
Non-partisan research network that conducts public opinion surveys across the Middle East and North Africa on social, political, and economic attitudes. Focus topics include COVID-19, governance, media, political institutions, democracy, gender issues, the environment, and more.
Status: Active | Data Type: Surveys | Time Period: 2006–present | Access: Open
Data for Good at Meta
Meta’s data and resource hub that aims to support research and organizations addressing crises around the world. Offers publicly-available datasets on a variety of population, infrastructure, economic, mobility, social connection, health, and climate-based topics, including the Social Connectedness Index and the Facebook Population During Crisis dataset.
Status: Active | Data Type: Various | Time Period: Various | Access: Open
Dataset of COVID-Related Misinformation Videos and their Spread on Social Media
Dataset of 8,122 COVID-related YouTube videos that circulated on social media between November 2019 and June 2020, and were taken down for containing false information. Contains video metadata and social media engagement statistics.
Status: Archived | Data Type: Social media | Time Period: November 2019–June 2020 | Access: Open
Digital Governance Projects Database
Database of all World Bank-funded digital governance (DG) and information and communication technology (ICT) projects since 1995, broken down by project summary, cost, duration, country, GovTech focus area, and more. The latest version (2022) contains details of 1,449 projects across 147 countries.
Status: Active | Data Type: Various | Time Period: 1995–present | Access: Open
DocNow Tweet Catalog
Curated list of 143 publicly-available Twitter datasets across a variety of topics. Topics include the 2020 U.S. presidential election, COVID-19, and hate groups.
Status: Archived | Data Type: Social media | Time Period: Various | Access: Open
Documenting Hate News Index
Searchable database of news articles about hate crime across the United States, including harassment, intimidation, cyberbullying and online trolling, vandalism, and violent crime. Hosted by Google News Lab and ProPublica.
Status: Archived | Data Type: News | Time Period: August 2017–December 2019 | Access: Open
ElectionRumors2022
Dataset of election rumors on Twitter (now X) during the 2022 U.S. midterm elections. Contains information on 1.81 million Twitter posts around 135 distinct rumors spread during the 2022 midterm season, with mixed-methods analyses of specific cases.
Status: Archived | Data Type: Social media | Time Period: September–December 2022 | Access: Open
ESOC COVID-19 Misinformation Dataset
Database of COVID misinformation shared on social media and media outlets around the world in 2020. The final report counts 5,613 distinct misinformation stories. Data includes direct links and a breakdown of language, region, title, narrative, claim, distribution channel, audience, keywords, and misinformation type. Collected and coded by the Empirical Studies of Conflict at Princeton University.
Status: Archived | Data Type: Various | Time Period: January–December 2020 | Access: Open
EU Data Portal
Contains datasets on a wide variety of subjects in the European Union, including public opinion surveys, economic development measures, and government spending. Recent releases include the 2022 – 2023 TechSonar Report.
Status: Active | Data Type: Various | Time Period: Various | Access: Open
Eurobarometer
Series of public opinion surveys across Europe. Includes the Standard Eurobarometer and flash surveys on special themes, like Digital Society & Technology. Recent releases include the Media & News Survey (2023), perceptions of cyberskills and cybersecurity, and the impact of digitization on EU citizens’ daily lives.
Status: Active | Data Type: Surveys | Time Period: 1974–present | Access: Open
European Social Survey
Cross-national survey of attitudes, beliefs, and behavior in over thirty countries across Europe. Conducted every two years with “core” and rotating sections, including questions on trust, immigration, civic involvement, and digital social contacts.
Status: Active | Data Type: Surveys | Time Period: 2002–present | Access: Open
EUvsDisinfo
Searchable multilingual database of pro-Kremlin disinformation news stories. Searchable by country, language, date, and tags, including the invasion of Ukraine, U.S. presence in Europe, the European elections, and more. Hosted by the European Union and the East Stratcom Task Force. Contains 17,300+ distinct disinformation cases as of August 2024.
Status: Active | Data Type: News | Time Period: 2015–present | Access: Open
Facebook Political Ad Collector
Searchable database of targeted political ads on Facebook, collected with a browser plugin created by the NYU Online Political Transparency Project in partnership with ProPublica’s Electionland Project. View ads targeted for specific audiences by filtering by city, state, political affiliation, age, and gender. (Note: Not searchable by race due to Facebook parameters.)
Status: Archived | Data Type: Social media | Time Period: August 2018–July 2020 | Access: Open
French Political Trust Barometer (Le Baromètre de la confiance politique)
Benchmark survey data of trust in politics among the French population since 2009. Results released in French, with select reports available in English. Led by the CEVIPOF research lab at Sciences Po – Paris.
Status: Active | Data Type: Surveys | Time Period: 2009–present
Harvard Dataverse
A free, open-source repository for researchers across disciplines to publish and share datasets. Examples include replication data for studies on the impact of belief in false claims, voter fraud disinformation campaigns, and social media trolling.
Status: Active | Data Type: Various | Time Period: Various | Access: Open
ICPSR (Inter-University Consortium for Political and Social Research)
A repository for social science datasets, including polls and surveys by organizations, individual researchers, and government entities. Examples include datasets on social media echo chambers, religion and misinformation, and fact-checking COVID-19 misinformation among college students.
Status: Active | Data Type: Various | Time Period: Various | Access: Various
Information Laundering Cycle (ILC) Document Database
A comprehensive document database to examine attacks against disinformation researchers and institutions. Contains 162 sets of primary source documents, totaling more than 2,000 pages of publicly available material, largely made up of emails, text messages, and other written communications between researchers, social media platforms, and government agencies.
Status: Active | Data Type: Various | Time Period: 2022–2023 | Access: Open
Latinobarómetro
Annual public opinion survey across Latin America on a variety of social, economic, and political issues. Recent versions include questions on AI, automation, trust in institutions, and digital communication.
Status: Active | Data Type: Surveys | Time Period: 1995–present | Access: Open
MetaHate: A Dataset for Unifying Efforts on Hate Speech Detection
A collection of 36 hate speech datasets across mainstream and niche social media platforms. Hosted by the Information Retrieval Lab at the University of A Coruña. Contains 1.2 million social media posts as of August 2024.
Status: Active | Data Type: Various | Time Period: 2016–present | Access: Open
Misinformation Amplification Tracking Dashboard
Tracker of how various mis- and disinformation narratives are amplified across social media, how platforms respond, and the extent to which they amplify or incentivize the spread of false information. Hosted by the Integrity Institute as part of the Elections Integrity Program.
Status: Active | Data Type: Social media | Time Period: Various | Access: Open (aggregate results)
Pew Research Center
Nonpartisan “fact tank” that collects and analyzes data on public attitudes in the United States and around the world. Ongoing research topics with available datasets include Public Trust in Government and Local News Dynamics.
Status: Active | Data Type: Surveys | Time Period: Various | Access: Open
Policy Tracker
An interactive tracker by Tech Policy Press of laws and regulations, along with government investigations and litigation, that shape the rules and accountability for tech companies. Searchable by topic, government, and policy type; includes details on government, date initiated, current status, and last estimated update.
Status: Active | Data Type: Policy | Time Period: 2017–present | Access: Open
Political Deepfakes Incidents Database (PDID)
Database of politically-salient deepfake incidents. Includes deepfake content for images and videos, metadata, and descriptors drawn from political science, public policy, and misinformation research, with the goal of documenting trends around the use of generative AI for political disinformation.
Status: Active | Data Type: Various | Time Period: 2018–present | Access: Open
Politwoops
Tracks deleted tweets by public officials in the United States, including those elected at the time of posting and candidates running for office. Hosted by the Sunlight Foundation from 2012 – 2015 and ProPublica from 2016 – 2023. Archived due to Twitter/X API changes.
Status: Archived | Data Type: Social media | Time Period: 2012–2023 | Access: Open
SOMAR – Social Media Archive at ICPSR
Repository for datasets collected for social media research. Data collected from major platforms (Twitter/X, Facebook, Instagram, Reddit, and YouTube) across a variety of topics, including political communication, information networks, and online behavior.
Status: Active | Data Type: Various | Time Period: Various | Access: Open
Technology Policy Tracker
A data aggregation initiative by Cambridge Local First, Tech Policy Press, and Integrity Institute. Aims to provide a comprehensive view of major technology policy and legislation across the United States (on a federal and state level) and internationally.
Status: Active | Data Type: Policy | Time Period: 2023 | Access: Open
The Accountability Project (TAP)
Collection of public datasets by and about local, state, and federal government agencies in the United States. Contains datasets on campaign spending, employee salaries, voter registration, land ownership, businesses, medical facilities, government contracts, emergency funds, and more.
Status: Active | Data Type: Various | Time Period: 1978–present | Access: Open
The Platform Governance Archive
Dataset of major social media platforms and their content moderation and legal policies. V1 includes data for Facebook, Instagram, Twitter, and YouTube from 2005–2021. V2 covers 2022 onwards with data from 14 additional platforms. Hosted by the Platform Governance, Media, and Technology Lab at the University of Bremen.
Status: Active | Data Type: Policy | Time Period: 2005–present | Access: Open
TruthSeeker
One of the largest ground truth fake news datasets for real and fake news content on social media, with the goal of establishing deep learning-based detection models and clustering-based event detection. Includes bot, credibility, and influence scores. Led by the Canadian Institute for Cybersecurity at the University of New Brunswick.
Status: Active | Data Type: Social media | Time Period: 2023–present | Access: Open
UNdata
Official statistics and datasets collected by various United Nations agencies and partner organizations. Includes the World Telecommunication/ICT Indicators Database, World Development Indicators, and the UIS Data Centre on Education, Culture and Communication, and Science and Technology.
Status: Active | Data Type: Various | Time Period: Various | Access: Open
Wellcome Global Monitor
Series of global public opinion surveys around science and health across 140 countries. Contains questions about trust in science, trust in doctors and nurses, confidence in public health officials, attitudes towards vaccines, intersections with gender and religion, and perceptions of the future.
Status: Archived | Data Type: Surveys | Time Period: 2018–2020 | Access: Open

Social Media APIs Guides & Toolkits

American University Social Media API Guide
Up-to-date resource for accessing data across major social media platforms, including Twitter (X), Instagram, Facebook, Reddit, YouTube, and TikTok. Includes guides for using each platform’s APIs, step-by-step instructions on how to gain access, what each does (or does not) allow, and third-party tools for access and visualization. Maintained by American University.
Fighting Disinformation Online
Part of the RAND Corporation’s Countering Truth Decay Initiative, the Fighting Disinformation Online project curates a universe of online tools developed by nonprofits and civil society organizations to target online disinformation. The project aims to assist media consumers as well as inform funders and developers about what tools already exist, and where further developments are needed. Organized by category, including bot/spam detection, codes and standards, credibility scoring, disinformation tracking, verification and fact-checking, and whitelisting.
Hoaxy
A web-based tool that visualizes the spread of articles online, created by the Observatory on Social Media (OSoMe) at Indiana University. Tracks the sharing of links to stories from low-credibility sources and independent fact-checking organizations.
Meta Content Library and API
The Meta Content Library and Content Library API provide comprehensive access to the full public content archive from Facebook and Instagram, as well as select data from Threads. As of October 2024, individuals can apply for access to the tools through the Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan.
Social Media Research Toolkit
Curated set of 50+ social media research tools by the Social Media Lab at Toronto Metropolitan University. Updated annually, with a breakdown of each access type, the platforms covered, and whether coding experience is required.
SOMAR Data Applications Platform
The Social Media Archive (SOMAR) Data Applications platform is an online system that streamlines the process of finding and applying for social media data disseminated via the virtual data enclave (VDE) or controlled download by SOMAR and its partners, including the Meta Content Library and Content Library API and the Meta Ad Targeting Dataset. Hosted by the Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan Institute for Social Research (ISR). Research affiliation required.
Top FIBers
A dashboard by the Observatory on Social Media (OSoMe) at Indiana University for tracking and reporting the top ten superspreaders of low-credibility information on Twitter and Facebook each month.
University of Michigan Social Media Research Guide
Step-by-step, updated resource for navigating social media data, including how to request access across different platforms, how to use APIs, what data is or is not publicly available, where to find existing datasets, sources of U.S. government social media, and tools for analysis.

Peer Resources & Partner Organizations