What Can Digital Disease Detection Learn from (an External Revision to) Google Flu Trends?

      Background

      Google Flu Trends (GFT) claimed to generate real-time, valid predictions of population influenza-like illness (ILI) using search queries, earning acclaim and prompting replication across public health. However, recent studies have questioned the validity of GFT.

      Purpose

      To propose an alternative methodology that better realizes the potential of GFT, with collateral value for digital disease detection broadly.

      Methods

      Our alternative method, developed in 2013, automatically selects specific queries to monitor and autonomously updates the model each week as new information about CDC-reported ILI becomes available. Model performance was judged by root mean squared errors (RMSEs) and Pearson correlations comparing predicted ILI (the proportion of patient visits indicative of ILI) with subsequently observed ILI.
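
      The two performance measures named above can be illustrated with a minimal sketch. It assumes weekly predicted and observed ILI are available as equal-length lists of proportions; the function names and the sample values are hypothetical and are not taken from the study.

```python
import math


def rmse(predicted, observed):
    """Root mean squared error between predicted and observed ILI."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson_r(predicted, observed):
    """Pearson correlation between predicted and observed ILI."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    # Sums of cross-products and squared deviations about the means.
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


if __name__ == "__main__":
    # Made-up weekly ILI proportions, for illustration only.
    predicted_ili = [0.020, 0.031, 0.045, 0.061, 0.050]
    observed_ili = [0.018, 0.030, 0.049, 0.058, 0.047]
    print(f"RMSE: {rmse(predicted_ili, observed_ili):.4f}")
    print(f"Pearson r: {pearson_r(predicted_ili, observed_ili):.3f}")
```

      RMSE is expressed in the same units as ILI, so it describes the typical size of a weekly error, whereas the correlation only describes how closely predictions track the rise and fall of observed ILI. A model can therefore correlate highly with observed ILI and still have a large RMSE.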

      Results

      During the height of the H1N1 pandemic (August 2 to December 22, 2009) and the 2012–2013 season (September 30, 2012, to April 12, 2013), GFT’s predictions had RMSEs of 0.023 and 0.022 (for example, if GFT predicted an ILI of 0.061 in a given week, that prediction would be expected to err by about 0.023) and correlations of r=0.916 and 0.927. Our alternative method had RMSEs of 0.006 and 0.009, and correlations of r=0.961 and 0.919 for the same periods. Critically, during these important periods, the alternative method yielded more accurate ILI predictions every week, and was typically more accurate during other influenza seasons.

      Conclusions

      GFT may be inaccurate, but improved methodologic underpinnings can yield accurate predictions. Applying similar methods elsewhere can improve digital disease detection, with broader transparency, improved accuracy, and real-world public health impacts.
