Sam Gilbert, Affiliated Researcher at the Bennett Institute, explains how internet search data is being used in responses to the Covid-19 pandemic, and what search datasets and tools are available to researchers.
Search data during previous infectious disease outbreaks
“Infodemiology” – the analysis of user-generated internet data to inform public health policy – has come a long way in the 12 years since Google Flu Trends was launched.
When I reviewed the literature with SAGE in 2019, I found 265 peer-reviewed papers in academic journals that draw on aggregated data about individuals’ Google searches. Researchers have used search data to track the spread of Ebola during the 2014 epidemic in West Africa, the Zika virus during the 2016 outbreak in Brazil, and West Nile fever over an eleven-year period in Italy.
Even in countries with low levels of internet penetration, analysing search data has given health agencies important clues about the public’s information needs. By highlighting gaps between expert guidance on disease suppression and what ordinary people think and do, search data can help develop effective communication strategies. During the Zika outbreak, search data suggested there was a disproportionate focus in people’s searches on microcephaly, when in fact fever, skin rash, and conjunctivitis were much more common symptoms of infection. It also showed that people were not searching for information about practical actions they could take to reduce their risk of catching the virus, such as avoiding areas of standing water where mosquitoes breed.
Search data and the Covid-19 crisis
So how is search data being used during the current coronavirus pandemic? Researchers have demonstrated that Google searches for Covid-19 symptoms can track the spread of the disease in advance of official statistics – a technique referred to as “nowcasting”. A model built by UCL computer scientist Bill Lampos and team shows that Google searches predict Covid case volumes up to 14 days ahead. Among the most predictive are searches for anosmia – the loss of smell.
This symptom was the focus of a paper by British ENT doctors at Guy’s and St Thomas’ Hospital, Abigail Walker, Claire Hopkins and Pavol Surda, who were the first to bring it to widespread attention. Their Google Trends-based analysis showed that searches for information relating to smell loss were strongly correlated with the onset of Covid-19 infection in Italy, Spain, the UK, the USA, Germany, France, Iran and the Netherlands. Apparent to Lampos’s team as early as mid-March, the association of anosmia with Covid-19 was not acknowledged by Public Health England until 18th May.
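To make the idea of searches leading case counts concrete, here is a minimal sketch in Python. It is not the Lampos model or the Walker, Hopkins and Surda analysis: it simply shifts a daily search series forward by a number of days and checks which shift correlates best with a daily case series. The function names (`pearson`, `best_lead`) and the two series are invented for illustration; the numbers are synthetic, with cases built to follow searches by exactly three days.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def best_lead(searches, cases, max_lead):
    """Return (lead, r): the number of days by which searches best
    anticipate cases, and the correlation at that lead.
    Assumes both series are daily and of equal length."""
    best = None
    for lead in range(max_lead + 1):
        # Pair search volume on day t with case counts on day t + lead.
        xs = searches[: len(searches) - lead]
        ys = cases[lead:]
        r = pearson(xs, ys)
        if best is None or r > best[1]:
            best = (lead, r)
    return best


# Synthetic data for illustration only (not real search or case figures).
searches = [1, 2, 4, 7, 11, 9, 6, 4, 3, 2, 1, 1, 2, 3]
# Cases are constructed to trail searches by three days.
cases = [10, 10, 10] + [10 * s for s in searches[:-3]]

lead, r = best_lead(searches, cases, max_lead=7)
print(f"Best lead: {lead} days (r = {r:.3f})")
```

With real data the correlation would be far noisier, and a raw correlation like this one does nothing to separate genuine symptom searches from searches prompted by news coverage.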
Anosmia search is also the focus of interactive tools developed by the search marketer Patrick Berlinquette. By bidding on keywords relating to loss of smell in Google Ads auctions, Berlinquette is able to surface much more granular data than is available through Google Trends. Plotting the location of searches on a map builds up a picture of emerging hotspots. In the US, this may give public health agencies early warning of new Covid-19 outbreaks as states ease lockdown restrictions.
The potential is even greater in contexts where there is less testing capacity, or where official statistics are unreliable. In Tanzania, for example, anosmia search data suggests there are many times more Covid cases than the 509 that have been reported, substantiating on-the-ground reports of overflowing hospitals and night burials. As Berlinquette’s tools are built using Google Data Studio, all his data is freely available to download by right-clicking within the report.
A common objection to these uses of symptom search data is the biasing effect of media reports. When the news is dominated by coronavirus stories, how can we be sure that symptom search is not being driven by curiosity and hypochondria? There are more and less sophisticated ways of dealing with this: the Lampos model uses autoregression to adjust for the effect of media, while Berlinquette simply excludes broad search queries, as well as ones containing coronavirus or Covid keywords. Searches for “I can’t smell,” “lost my sense of smell,” and “when you can’t smell” are in scope; searches for “anosmia” and “loss of smell coronavirus” are not.
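The simpler of these two approaches, excluding queries by keyword, can be sketched in a few lines of Python. This is an illustration of the idea rather than Berlinquette’s actual configuration: the two exclusion lists and the function name `in_scope` are assumptions, populated only with the examples given above.

```python
# Hypothetical exclusion lists, based only on the examples in the text.
EXCLUDED_KEYWORDS = ("coronavirus", "covid")
BROAD_QUERIES = {"anosmia", "loss of smell"}


def in_scope(query: str) -> bool:
    """Return True if a search query should be counted as a symptom search."""
    # Normalise case and collapse repeated whitespace before matching.
    normalised = " ".join(query.lower().split())
    # Drop anything mentioning the pandemic by name, since it is more
    # likely to reflect news-driven curiosity than first-hand symptoms.
    if any(keyword in normalised for keyword in EXCLUDED_KEYWORDS):
        return False
    # Drop broad, generic terms that are exactly on the exclusion list.
    return normalised not in BROAD_QUERIES


queries = [
    "I can’t smell",
    "lost my sense of smell",
    "when you can’t smell",
    "anosmia",
    "loss of smell coronavirus",
]
kept = [q for q in queries if in_scope(q)]
print(kept)
```

The design choice is to favour first-person, conversational phrasing, on the assumption that someone typing “I can’t smell” is more likely to be describing their own experience than someone typing a clinical term they have just read in a headline.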
Another objection is that Google Flu Trends, the first and best-known infodemiological tool, stopped working after three years, failing to predict the peak of the 2013 flu season. However, the most helpful conclusion to draw is not that search data analysis is unreliable, but that it’s a complement to other methods and not a replacement for them. Anosmia search may not be the best basis for a predictive Covid-19 model in three years’ time, but that is hardly a reason not to make use of it now, when the need to close data gaps and triangulate other sources is acute, and the stakes exceptionally high.
Search data also has something to say about the human experience of the pandemic. Looking at 95 country cases, the University of Copenhagen’s Jeanet Bentzen finds that the crisis has increased Google searches for prayer by 50%, to their highest level ever recorded. Search data can also provide insights into the coronavirus “infodemic”, by allowing us to carry out empirical tests on the pervasiveness of misinformation. Analysing the Bing Coronavirus Query Set – a dataset of aggregated and de-identified searches with coronavirus-related intent – suggests survey findings about support for conspiracy theories may be less alarming than they first appear.
Most importantly, however, search data reveals the everyday questions which members of the public need answers to. Start typing “Can I…” into Google, and autocomplete offers you a window onto our collective concerns. In the UK at the time of writing these include “Can I visit my parents?”, “Can I go to the beach?”, and “Can I get tested for coronavirus?”. Government websites would do well to address these explicitly.
Bringing search data into your own research
As well as being a fascinating source of insight, search data is easy to get to grips with. I wrote an introductory guide to using it in social science research in a previous SAGE Ocean post, and coronavirus-related search datasets and academic papers are collected in the infodemiology section of the Coronavirus Tech Handbook.
As I hope the latter half of this blog makes clear, the applications of search data to the current crisis extend well beyond epidemiology: economists could use it to monitor and forecast consumer demand at a time when many of the data sources statistics agencies rely on are compromised; sociologists could combine it with survey data to investigate the effects of lockdown on mental wellbeing. There has never been a better moment to explore it.
Biography
Sam is an entrepreneur and researcher working at the intersection of politics and technology. An expert in data-driven marketing, Sam was Employee No.1 and Chief Marketing Officer at Bought...