
Denominator Issues for Personally Generated Data in Population Health Monitoring

  • Rumi Chunara
    Department of Computer Science and Engineering, New York University Tandon School of Engineering, Brooklyn, New York
    College of Global Public Health, New York University, New York, New York
  • Lauren E. Wisk
    Division of Adolescent/Young Adult Medicine, Boston Children’s Hospital, Boston, Massachusetts
    Department of Pediatrics, Harvard Medical School, Harvard University, Boston, Massachusetts
  • Elissa R. Weitzman
    Division of Adolescent/Young Adult Medicine, Boston Children’s Hospital, Boston, Massachusetts
    Department of Pediatrics, Harvard Medical School, Harvard University, Boston, Massachusetts
    Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts

Address correspondence to: Rumi Chunara, PhD, Department of Computer Science and Engineering, New York University Tandon School of Engineering, 2 Metrotech Center, 10th Floor, 10.007, Brooklyn, NY 11201

Open Access. Published: December 22, 2016. DOI: https://doi.org/10.1016/j.amepre.2016.10.038

      Introduction

      Widespread use of Internet and mobile technologies provides opportunities to gather health-related information to complement data generated through traditional healthcare and public health systems. These personally generated data (PGD) are increasingly viewed as informative of the patient experience of conditions, symptoms, treatments, and side effects (Lavallee et al.).
      Behavior, sentiment, and disease patterns can be discerned from mining unstructured PGD in text, image, or metadata form, and from analyzing PGD collected via structured, opt-in, and web-enabled platforms and devices, including wearables (Chunara, Smolinski, and Brownstein; Ayers, Althouse, and Dredze).
      Models that employ PGD from distributed cohorts are increasingly used to measure public health outcomes; moreover, PGD collection forms the centerpiece of important new federal investments in personalized medicine that seek to energize vast cohorts to donate data via apps and devices (Collins and Varmus).
      PGD offer the opportunity to inform gap areas of health research through high-resolution views into spatial, temporal, or demographic features. However, when PGD are used to answer epidemiologic questions, it is not always clear what constitutes the population at risk (PAR), or the denominator, challenging researchers’ abilities to make inferences, draw comparisons, and evaluate change. Because of this, initial PGD studies have tended toward numerator-only investigations (Chan et al.; Chunara, Bouton, et al.; Ginsberg et al.); however, the field is advancing. This report summarizes issues related to specifying the PAR and denominator metrics when using PGD for health research and outlines approaches for resolving these issues through design and analytic strategies.

      Challenges in Specifying the Population at Risk

      Individuals capable of acquiring a disease constitute the PAR (Figure 1). Accurate knowledge of the PAR is a fundamental requirement in preventive medicine and public health, and is necessary for measuring effects and gauging impact. Challenges in assessing the PAR have been deliberated even for traditional healthcare-based data sources (Krogh-Jensen); when the PAR is unobservable, different proxy metrics are often used (Schlaud et al.).
      The exact nature of what constitutes a useful and valid choice for defining the PAR depends on the research focus and is itself a source of difficulty (Krogh-Jensen; Schlaud et al.). All analyses drawing on chart reviews, electronic medical records, and telephone surveys must address the extent to which the sample represents the source population (i.e., generalizability). Even when samples are drawn directly from the PAR, the dynamic nature of human populations and subtleties of selection, differential non-response, and attrition may introduce variability and affect the validity of inferences. Anticipating and mitigating these issues is vital for investigations drawing on traditional healthcare-based data and on PGD. As with randomized controlled trials, whose findings may not generalize because of strict eligibility criteria and participant self-selection into research (CDC), PGD may not fully represent patient or community populations. Moreover, once individuals contribute data, bias may be introduced through imperfect methods of observation and engagement. Researchers wishing to use PGD will need to acknowledge and, when possible, measure how characteristics of a sample differ from the PAR (i.e., issues relevant to external validity). Three challenges to using PGD stand out and are elaborated here: sample representativeness and selection bias, poorly characterized reference populations, and spatiotemporally inconsistent denominators (Table 1 and Appendix Table 1, available online).
      Figure 1. An overview of challenges in assessing the population at risk when working with both healthcare-based data and personally generated data.
      Table 1. Approaches to Resolving Denominator Challenges in PGD

      Challenge type: Sample representativeness and selection bias
        Illustrative case(s): Use of social media data for health monitoring where data are collected via keyword search
        Corrective approaches (via design or analytic strategy):
        • Select out a subsample that represents the PAR using probability sampling or other matching methods
        • Conduct sensitivity analyses varying the denominator
        • Control for potential confounders at the appropriate scale
        Practical examples:
        • Propensity score matching used to evaluate the effect of information on Twitter users (Rehman, Liu, and Chunara)
        • Assess the effect of participatory symptom contribution frequency on population burden estimates (Chunara, Goldstein, et al.)
        • SES controlled for at the ZIP code level when assessing the relationship between online check-ins and obesity levels (Bai, Chunara, and Varshney)

      Challenge type: Poorly characterized reference populations
        Illustrative case(s): Use of anonymized data as an indicator of a health phenomenon where the data do not include demographic or sampling details, such as Internet search data from Google Trends
        Corrective approaches (via design or analytic strategy):
        • Focus on concurrent validation against criterion standards, with the understanding that criteria are rarely available to compare against
        • Utilize multiple datasets (including a criterion standard, when available) to establish concurrent validity
        • Use Bayesian evidence synthesis to improve model performance
        • Use supervised learning and some ground-truth data to perform inference of latent characteristics

      Challenge type: Spatiotemporally inconsistent denominator
        Illustrative case(s): Data donation of steps or symptoms by members of an opt-in, distributed, online health community
        Corrective approaches (via design or analytic strategy):
        • Engage a cohort for cross-sectional or prospective evaluation where spatial and temporal frames can be identified
        • Partner with PGD sources/platforms to elucidate denominator features
        • Impute missing or outlier data

      CDC, Centers for Disease Control and Prevention; PAR, population at risk; PGD, personally generated data.
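
      To make one of the corrective approaches in Table 1 concrete, the following minimal sketch illustrates how propensity score matching can balance exposed and comparison users on observed covariates before estimating an effect. The data, covariates, and exposure below are synthetic placeholders; the sketch does not reproduce the analysis in the cited Twitter study.

```python
# Minimal propensity score matching sketch on synthetic data; it does not
# reproduce the cited Twitter analysis.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "age": rng.integers(18, 65, n),       # hypothetical covariates
    "followers": rng.poisson(300, n),
    "urban": rng.integers(0, 2, n),
})
# Hypothetical exposure (e.g., saw health information online), made to depend
# on covariates so that a crude exposed-vs-unexposed comparison is confounded.
logit = -2 + 0.02 * df["age"] + 0.5 * df["urban"]
df["exposed"] = rng.random(n) < 1 / (1 + np.exp(-logit))
df["outcome"] = 0.3 * df["exposed"] + 0.01 * df["age"] + rng.normal(0, 1, n)

covars = ["age", "followers", "urban"]
ps_model = LogisticRegression(max_iter=1000).fit(df[covars], df["exposed"])
df["pscore"] = ps_model.predict_proba(df[covars])[:, 1]

treated = df[df["exposed"]]
control = df[~df["exposed"]]
# 1:1 nearest-neighbor matching on the propensity score (with replacement).
nn = NearestNeighbors(n_neighbors=1).fit(control[["pscore"]])
_, idx = nn.kneighbors(treated[["pscore"]])
matched_control = control.iloc[idx.ravel()]

att = treated["outcome"].mean() - matched_control["outcome"].mean()
print(f"Estimated effect among the exposed (ATT): {att:.3f}")
```

      In practice, covariate balance should be checked after matching, and estimates compared across matching specifications as a form of sensitivity analysis.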

      Sample Representativeness and Selection Bias

      Selection biases arise when the sample does not accurately represent an underlying population; they are among the most common threats to validity when making inferences (Appendix Table 1, available online). In healthcare-based studies, different types of settings, recruitment procedures, and inclusion and exclusion criteria must be carefully considered to ensure valid estimates and extrapolation of findings. For example, access bias affected the Behavioral Risk Factor Surveillance System (BRFSS), a U.S. survey that collects data via random-digit-dial telephone interviews: relying only on landline-based sampling excluded important segments of the U.S. population and resulted in measurable biases for many key health indicators (Hu et al.). Consequently, the BRFSS shifted its sampling approach to additionally incorporate cell phones, yielding more valid, reliable, and representative measures.
      Many types of selection bias also apply to PGD (Appendix Table 1, available online). Access bias, for example, is a concern when disparities in Internet access affect the representativeness of PGD. Efforts to close the Internet access gap have largely succeeded, with only 15% of the U.S. population offline in the past year, yet variability and inequalities persist. For instance, among teens, Internet platform preferences vary by household income (Pew Research Center). Among adults, health literacy is highly correlated with health information seeking on the Internet and with self-rated health (Kutner et al.). Hence, individuals who passively source or actively contribute data via online platforms may vary by income and be more literate and healthy than the general population. Access differences can also reflect group preferences, norms, technology diffusion, and sharing patterns. For example, in the U.S., 40% of African Americans aged 18–29 years who use the Internet say that they use Twitter, compared with 28% of their white counterparts (Smith).
      Across multiple influenza-focused participatory surveillance systems, women have higher participation levels than men, with peak levels among those aged 30–60 years (Bajardi et al.; Smolinski et al.). Likewise, in participatory diabetes surveillance within an online diabetes community, early adopters of an app that enabled PGD sharing for population health monitoring reported greater diabetes control than did later adopters, and preference toward openness in data sharing was greater among individuals with better diabetes control (Weitzman, Adida, et al.). Because technology diffusion, preferences, and norms vary by age, income, and gender, observations gleaned passively from processing Internet search or microblogging data (as done for sentiment analysis or outbreak detection) may be affected by differential selection stemming from the composition of groups using one versus another platform or tool (Pew Research Center).
      Bias may also be present in active surveillance, which relies on the planned donation of information by people using Internet or mobile tools. For example, in participatory influenza reporting systems, individuals are more likely to sign up when their symptoms are worse, potentially skewing incidence estimates (Kjelsø et al.). Technology standards, protections on data use, and controls over secondary data use can also affect representativeness. To address these issues, investigators might preselect a subsample of users that represents the PAR using probability sampling, conduct sensitivity analyses to ascertain how sample composition affects observations, or integrate data gleaned from multiple platforms (Table 1).
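
      As a minimal sketch of the sensitivity-analysis idea above, the following example post-stratifies a hypothetical opt-in PGD sample toward an assumed reference (PAR) age distribution and then recomputes the estimate under different assumptions about which users belong in the denominator. All strata, counts, and proportions are invented for illustration.

```python
# Sketch: post-stratification of a hypothetical opt-in PGD sample toward an
# assumed reference (PAR) age distribution, with a simple sensitivity analysis
# over the denominator. Strata, counts, and proportions are invented.
import pandas as pd

sample = pd.DataFrame({
    "stratum": ["18-29", "30-44", "45-64", "65+"],
    "n_users": [5000, 3000, 1500, 500],   # opt-in PGD participants
    "n_cases": [400, 180, 75, 20],        # e.g., symptom reports
})
sample["rate"] = sample["n_cases"] / sample["n_users"]

# Assumed PAR proportions for the same strata (illustrative, not census data).
par_weights = pd.Series([0.21, 0.25, 0.33, 0.21], index=sample["stratum"])

crude = sample["n_cases"].sum() / sample["n_users"].sum()
post_stratified = (sample.set_index("stratum")["rate"] * par_weights).sum()
print(f"Crude PGD estimate:       {crude:.3%}")
print(f"Post-stratified estimate: {post_stratified:.3%}")

# Sensitivity analysis: shrink the denominator to "active" users only and
# observe how the estimate moves.
for active_fraction in (1.0, 0.8, 0.6):
    rates = sample["n_cases"] / (sample["n_users"] * active_fraction)
    estimate = (rates.values * par_weights.values).sum()
    print(f"Denominator = {active_fraction:.0%} of users: {estimate:.3%}")
```

      If a stratum of the PAR is entirely absent from the sample, no reweighting can recover it; that gap itself is worth reporting.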

      Poorly Characterized Reference Populations

      Even if recruitment, access, and other factors are accounted for, there can still be multiple options for the PAR. The complexity of choosing a reference population using healthcare-based data has been discussed, for example, when assessing cohort patterns for cerebral palsy (CP) (Paneth et al.). For a congenital disorder such as CP, in which newborns and infants might logically be considered to constitute the PAR, the birth cohort from which a case arises is generally used as the denominator. However, poorly defined or tracked birth cohorts may necessitate the use of other denominators, such as all children aged <2 years residing in a particular area. Yet, if families of children with CP move and cluster around health centers best suited to caring for children with CP, prevalence estimates might skew higher; alternatively, such health centers might be more adept at screening and diagnosis, leading to a diagnostic access bias. Ultimately, statistics calculated with different denominators answer slightly different questions, and researchers must select the most relevant estimate for their objectives.
      Similarly, when using PGD, there will be multiple options for specifying the reference population. For example, investigators may elect to use all search queries, all geotagged queries, or all queries by week, and these choices may yield different estimates (Santillana, Zhang, et al.; Morstatter et al.). Accordingly, researchers will need to select and justify their choice of a reference population based on their aims. Investigators can evaluate their choice of denominator by comparing estimates that reflect different underlying PARs, or they can benchmark PGD measures against criterion standards (assessing concurrent validity), as has been done for influenza surveillance when anonymous Internet search activity data are modeled against healthcare measures (Santillana, Nguyen, et al.). Yet, criterion standards are not always available, particularly because PGD are often used to fill existing data gaps. Other approaches, including triangulating PGD sources, may improve estimates; for example, investigators might compare influenza symptom data derived from Twitter and from a participatory surveillance system, where both sources cover the same geographic area and time period. Participatory methods may also be informative even where extrapolation to a PAR and formulation of a denominator are not feasible, when information about a health phenomenon is otherwise missing. Such numerator-only estimates gleaned through voluntary reporting may shed light on emerging phenomena without a practical reference population for benchmarking change (Table 1). In this case, transparency around measurement and caveats on inference are especially important.
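
      The following sketch illustrates the concurrent-validity check described above: weekly PGD-derived estimates computed under two candidate denominator definitions (all queries versus geotagged queries) are each compared with a criterion standard, such as healthcare-based surveillance, using Pearson correlation. All series here are synthetic placeholders.

```python
# Sketch: concurrent-validity check of PGD-derived weekly estimates against a
# criterion standard, repeated for two candidate denominator definitions.
# All series below are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(1)
weeks = 52
# Stand-in for a healthcare-based criterion standard (e.g., clinical surveillance).
criterion = 2 + np.sin(np.linspace(0, 2 * np.pi, weeks)) + rng.normal(0, 0.1, weeks)

# Candidate PGD signals normalized by different denominators:
#   (a) symptom queries / all queries, (b) symptom queries / geotagged queries.
pgd_all_queries = criterion + rng.normal(0, 0.3, weeks)
pgd_geotagged = criterion + rng.normal(0, 0.6, weeks)

for name, series in [("all queries", pgd_all_queries),
                     ("geotagged queries", pgd_geotagged)]:
    r = np.corrcoef(criterion, series)[0, 1]
    print(f"Denominator = {name:18s} Pearson r vs. criterion: {r:.2f}")
```

      Higher agreement under one denominator definition would support that choice, though, as noted above, a criterion standard is often unavailable precisely where PGD are most needed.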

      Spatiotemporally Inconsistent Denominator

      In healthcare-based data, prevalence-incidence bias, diagnostic vogue bias, and noncontemporaneous control bias are all known contributors to a denominator that varies over time or by place (Appendix Table 1, available online). In traditional longitudinal cohort studies, inclusion time is demarcated by directly observed benchmarks (e.g., date of diagnosis or treatment). For PGD, the absence of structured reporting formats or observation periods and the influence of exogenous factors, such as tool availability, create subtle spatiotemporal issues. Fixing a known and appropriate spatiotemporal data frame is vital because the underlying, relevant PAR can be dynamic, which affects interpretability, reproducibility, and prediction. For example, inclusion time is conditioned on the availability of search engines, operating systems, and technology standards. Shifts in app or device popularity and in privacy policies may affect data availability and reporting. Views into these factors may be obscured when data are proprietary, resulting in transparency concerns similar to those observed in pharmaceutical trials (Sykes). Although some statistics on Internet and mobile tool use exist, they are not necessarily reproducible or consistent over time, providing only static views into dynamic patterns.
      Other exogenous factors can also cause spatiotemporal shifts. News regarding an outbreak can generate “spurious spikes” in incidence time series that are attributable to increased discussion and awareness rather than to the experience of actual symptoms (Chan et al.). This can affect the validity and stability of disease incidence estimates generated from close monitoring of Internet search activity patterns, a common indicator of infectious disease activity (Chan et al.; Ginsberg et al.). These spikes may not occur reproducibly in every situation; monitoring of malaria trends via Google search queries in Thailand did not reveal spurious spikes, possibly because of low media activity in the area. Variations in annual influenza-related search incidence estimates drawn from Google search queries reflect the lack of structure in PGD and underscore the importance of understanding how the dynamic nature of population reporting behaviors may affect the denominator. In addition, PGD may be influenced by the condition of participants, analogous to the healthy participant effect. For example, associations between self-reported health status and willingness to share personal health information via web-based data sharing have been noted (Weitzman, Kelemen, et al.).
      Temporally inconsistent denominators can be addressed by using data only from individuals with acceptable participation levels over a particular time period, as done to estimate incidence from participatory influenza reports (Chunara, Goldstein, et al.); however, efforts must be made to ensure that unmeasured confounding is not introduced. Spurious spikes have been distinguished from true increases in illness burden by identifying the effects of media or other exogenous factors unrelated to actual disease dynamics; comparing observed changes with plausible growth rates of disease allows investigators to distinguish excess variation over time (Chan et al.). Methods to reduce these spikes can be developed after or simultaneously with data collection, as has been done in monitoring dengue trends from Internet search query data and influenza-like illness trends from participatory reports (Chan et al.; Smolinski et al.).
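
      The sketch below combines two of the strategies described in this section, using invented participatory reporting data: the denominator is restricted to consistently reporting participants, and weeks whose relative increase exceeds an assumed plausible epidemic growth rate are flagged as possible media-driven spurious spikes. The threshold and data are illustrative only.

```python
# Sketch: consistent-participant denominator plus spurious-spike flagging,
# on invented participatory reporting data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
weeks, users = 20, 500
week_col = np.repeat(np.arange(weeks), users)
# Baseline 10% weekly symptom probability, with an artificial surge in week 10
# standing in for a media-driven reporting spike.
p_symptom = np.where(week_col == 10, 0.3, 0.1)
reports = pd.DataFrame({
    "week": week_col,
    "user_id": np.tile(np.arange(users), weeks),
    "reported": rng.random(weeks * users) < 0.7,        # did the user report this week?
    "symptomatic": rng.random(weeks * users) < p_symptom,
})

# 1) Restrict the denominator to users who reported in at least 80% of weeks.
per_user = reports.groupby("user_id")["reported"].mean()
consistent = per_user[per_user >= 0.8].index
panel = reports[reports["user_id"].isin(consistent) & reports["reported"]]
weekly = panel.groupby("week")["symptomatic"].mean()

# 2) Flag weeks whose relative increase exceeds an assumed plausible growth
#    rate (the 50% threshold is illustrative, not an epidemiologic constant).
max_weekly_growth = 0.5
growth = weekly.pct_change()
flagged = growth[growth > max_weekly_growth]
print("Weeks flagged as possible spurious spikes:", list(flagged.index))
```

      In practice, the growth threshold would be informed by the known transmission dynamics of the disease in question, and flagged weeks would be examined alongside media activity before being corrected or excluded.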

      Conclusions

      Challenges common to both healthcare-based data and PGD pertain to knowledge about the PAR and specification of a denominator. All surveillance data face challenges that may yield samples that do not represent the population from which they are drawn. Excitement over the potential for PGD to accelerate knowledge and inform prevention can obscure the significance of these challenges; simply put, extra care is needed to define the PAR and a denominator when using PGD. The commercial nature of many novel data types amplifies these challenges. Ambitious new initiatives, such as NIH’s call to create a vast volunteer cohort that will donate data to advance precision medicine, will require exquisite attention to how PGD are invoked, sampled, compiled, and used (Collins and Varmus). Similar attention is vital to optimizing use of PGD distilled from web platforms, search activity, and other sources. Overall, for both healthcare-based data and PGD, there are no “one-size-fits-all” solutions. For PGD, as with other forms of data, concerns about measurement, inference, and extrapolation are mitigated when the PAR is carefully specified, limitations are addressed forthrightly, and confirmatory investigations are undertaken where feasible.

      Acknowledgments

      This work was supported by grant R21 AA023901-01 from the National Institute on Alcohol Abuse and Alcoholism at NIH (ERW and RC) and grant IIS-1343968 from the National Science Foundation (RC). Rumi Chunara, PhD, and Elissa R. Weitzman, ScD, MSc, conceived the paper; Rumi Chunara, PhD, Elissa R. Weitzman, ScD, MSc, and Lauren E. Wisk, PhD, wrote the paper. The authors thank Melanie Kenney for help with assembling relevant literature.
      No financial disclosures were reported by the authors of this paper.

      Appendix A. Supplementary material

      References

      Lavallee DC, Chenok KE, Love RM, et al. Incorporating patient-reported outcomes into health care to engage patients and enhance care. Health Aff (Millwood). 2016;35:575-582. https://doi.org/10.1377/hlthaff.2015.1362
      Chunara R, Smolinski MS, Brownstein JS. Why we need crowdsourced data in infectious disease surveillance. Curr Infect Dis Rep. 2013;15:316-319. https://doi.org/10.1007/s11908-013-0341-5
      Ayers JW, Althouse BM, Dredze M. Could behavioral medicine lead the web data revolution? JAMA. 2014;311:1399-1400. https://doi.org/10.1001/jama.2014.1505
      Collins FS, Varmus H. A new initiative on precision medicine. N Engl J Med. 2015;372:793-795. https://doi.org/10.1056/NEJMp1500523
      Chan EH, Sahai V, Conrad C, Brownstein JS. Using web search query data to monitor dengue epidemics: a new model for neglected tropical disease surveillance. PLoS Negl Trop Dis. 2011;5:e1206. https://doi.org/10.1371/journal.pntd.0001206
      Chunara R, Bouton L, Ayers JW, Brownstein JS. Assessing the online social environment for surveillance of obesity prevalence. PLoS One. 2013;8:e61373. https://doi.org/10.1371/journal.pone.0061373
      Ginsberg J, Mohebbi MH, Patel RS, et al. Detecting influenza epidemics using search engine query data. Nature. 2009;457:1012-1014. https://doi.org/10.1038/nature07634
      Krogh-Jensen P. The denominator problem. Scand J Prim Health Care. 1983;1:53. https://doi.org/10.3109/02813438309034933
      Schlaud M, Brenner M, Hoopmann M, Schwartz F. Approaches to the denominator in practice-based epidemiology: a critical overview. J Epidemiol Community Health. 1998;52:13S-19S.
      CDC. Principles of Epidemiology in Public Health Practice, Third Edition: An Introduction to Applied Epidemiology and Biostatistics. Atlanta, GA: CDC; 2012.
      Rehman N, Liu J, Chunara R. Using propensity score matching to understand the relationship between online health information sources and vaccination sentiment. In: Proceedings of the Association for the Advancement of Artificial Intelligence Spring Symposia; March 23-25, 2016; Stanford University, Stanford, CA. Palo Alto, CA: Association for the Advancement of Artificial Intelligence; 2016.
      Chunara R, Goldstein E, Patterson-Lomba O, Brownstein JS. Estimating influenza attack rates in the United States using a participatory cohort. Sci Rep. 2015;5:9540. https://doi.org/10.1038/srep09540
      Bai H, Chunara R, Varshney LR. Social capital deserts: obesity surveillance using a location-based social network. In: Proceedings of the Data for Good Exchange (D4GX); September 28, 2015; New York, NY. https://dl.dropboxusercontent.com/u/389406195/web-Papers/PH_Varshney_55.pdf. Accessed November 20, 2016.
      Santillana M, Nguyen AT, Dredze M, et al. Combining search, social media, and traditional data sources to improve influenza surveillance. PLoS Comput Biol. 2015;11:e1004513. https://doi.org/10.1371/journal.pcbi.1004513
      Santillana M, Zhang DW, Althouse BM, Ayers JW. What can digital disease detection learn from (an external revision to) Google Flu Trends? Am J Prev Med. 2014;47:341-347. https://doi.org/10.1016/j.amepre.2014.05.020
      Presanis AM, De Angelis D, Hagy A, et al. The severity of pandemic H1N1 influenza in the United States, from April to July 2009: a Bayesian analysis. PLoS Med. 2009;6:e1000207. https://doi.org/10.1371/journal.pmed.1000207
      Rao D, Yarowsky D, Shreevats A, Gupta M. Classifying latent user attributes in Twitter. In: Proceedings of the 2nd International Workshop on Search and Mining User-Generated Contents; October 30, 2010; Toronto, ON, Canada. New York, NY: Association for Computing Machinery; 2010:37-44. https://doi.org/10.1145/1871985.1871993
      Weitzman ER, Adida B, Kelemen S, Mandl KD. Sharing data for public health research by members of an international online diabetes social network. PLoS One. 2011;6:e19256. https://doi.org/10.1371/journal.pone.0019256
      Hu SS, Balluz L, Battaglia MP, Frankel MR. Improving public health surveillance using a dual-frame survey of landline and cell phone numbers. Am J Epidemiol. 2011;173:703-711. https://doi.org/10.1093/aje/kwq442
      Pew Research Center. Social media update; 2014. www.pewinternet.org/files/2015/01/PI_SocialMediaUpdate20144.pdf. Accessed December 1, 2016.
      Kutner M, Greenburg E, Jin Y, Paulsen C. The Health Literacy of America’s Adults: Results From the 2003 National Assessment of Adult Literacy. NCES 2006-483. Washington, DC: National Center for Education Statistics; 2006.
      Smith A. African Americans and Technology Use: A Demographic Portrait. Washington, DC: Pew Research Center; 2014.
      Bajardi P, Vespignani A, Funk S, et al. Determinants of follow-up participation in the internet-based European influenza surveillance platform Influenzanet. J Med Internet Res. 2014;16:e78. https://doi.org/10.2196/jmir.3010
      Smolinski MS, Crawley AW, Baltrusaitis K, et al. Flu Near You: crowdsourced symptom reporting spanning 2 influenza seasons. Am J Public Health. 2015;105:2124-2130. https://doi.org/10.2105/AJPH.2015.302696
      Kjelsø C, Galle M, Bang H, Ethelberg S, Krause TG. Influmeter—an online tool for self-reporting of influenza-like illness in Denmark. Infect Dis (Lond). 2016;48:322-327. https://doi.org/10.3109/23744235.2015.1122224
      Paneth N, Hong T, Korzeniewski S. The descriptive epidemiology of cerebral palsy. Clin Perinatol. 2006;33:251-267. https://doi.org/10.1016/j.clp.2006.03.011
      Morstatter F, Pfeffer J, Liu H, Carley KM. Is the sample good enough? Comparing data from Twitter’s streaming API with Twitter’s Firehose. In: Proceedings of the Seventh International Association for the Advancement of Artificial Intelligence Conference on Weblogs and Social Media; July 8-12, 2013; Cambridge, MA. Palo Alto, CA: Association for the Advancement of Artificial Intelligence; 2013.
      Sykes R. Being a modern pharmaceutical company: involves making information available on clinical trial programmes. BMJ. 1998;317:1172. https://doi.org/10.1136/bmj.317.7167.1172
      Weitzman ER, Kelemen S, Kaci L, Mandl KD. Willingness to share personal health record data for care improvement and public health: a survey of experienced personal health record users. BMC Med Inform Decis Mak. 2012;12:39. https://doi.org/10.1186/1472-6947-12-39