People populate the web with content relevant to their lives, content that millions of others rely on for information and guidance. However, the web is not a perfect representation of lived experience: some topics appear online in greater proportion than their true incidence in the population, while others are deflated. This paper presents a large-scale data collection study of this phenomenon. We collect roughly 200,000 webpages covering 21 topics of interest, and then compare each topic’s popularity on the web to its incidence in representative national surveys, which we treat as ground truth. We find that rare experiences are inflated on the web (by a median of 7x), while common experiences are deflated (by a median of 0.7x). We call this phenomenon novelty bias.
