A Practical Approach for Content Mining of Tweets


Use of data generated through social media for health studies is gradually increasing. Twitter is a short-text message system launched in 2006; as of 2012 it had more than 100 million users generating over 300 million Tweets every day. Tweets may offer real-world insights that can inform efforts to promote healthy behaviors. The purposes of this paper are to describe a practical approach to analyzing Tweet contents and to illustrate an application of the approach to the topic of physical activity. The approach includes five steps: (1) selecting keywords to gather an initial set of Tweets to analyze; (2) importing data; (3) preparing data; (4) analyzing data (topic, sentiment, and ecologic context); and (5) interpreting data. The steps are implemented using tools that are publicly available, free of charge, and designed for use by researchers with limited programming skills. Content mining of Tweets can contribute to addressing challenges in health behavior research.
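The five steps can be illustrated with a minimal, standard-library Python sketch. It is not the authors' toolchain, which relies on publicly available tools not named here. Everything specific in it is a hypothetical assumption chosen for illustration: the sample Tweets, the keyword list, the stopword list, and the tiny sentiment lexicon. Topic analysis is reduced to term frequencies, sentiment to a positive-minus-negative word count, and ecologic context to a single feature, the time of day a Tweet was posted.

```python
import json
import re
from collections import Counter
from datetime import datetime

# Step 1: keywords for gathering Tweets (hypothetical physical-activity terms).
KEYWORDS = {"exercise", "workout", "gym", "running", "walk"}

# Step 2: import. Inline JSON stands in for an export from a collection tool
# (these records are made up for illustration, not data from the paper).
RAW = """[
  {"text": "Great morning workout at the gym! http://t.co/x", "created_at": "2012-06-01T07:10:00"},
  {"text": "@amy I hate running in this heat", "created_at": "2012-06-01T15:30:00"},
  {"text": "Watching TV all day", "created_at": "2012-06-01T20:00:00"},
  {"text": "Evening walk with the dog, love it", "created_at": "2012-06-01T19:45:00"}
]"""

# Hypothetical stopword list and toy sentiment lexicon (not validated resources).
STOPWORDS = {"a", "an", "the", "at", "in", "this", "with", "i", "it", "all", "of", "and", "to"}
POSITIVE = {"great", "love", "fun", "happy"}
NEGATIVE = {"hate", "tired", "sore", "boring"}


def import_tweets(raw):
    """Step 2: parse JSON records into (text, timestamp) pairs."""
    return [(r["text"], datetime.fromisoformat(r["created_at"])) for r in json.loads(raw)]


def prepare(text):
    """Step 3: lowercase, strip URLs and @mentions, tokenize, drop stopwords."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return [t for t in re.findall(r"[a-z']+", text) if t not in STOPWORDS]


def sentiment(tokens):
    """Step 4b: lexicon score = number of positive minus negative tokens."""
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def time_of_day(ts):
    """Step 4c: one simple ecologic-context feature, the hour bucket."""
    if 5 <= ts.hour < 12:
        return "morning"
    if 12 <= ts.hour < 17:
        return "afternoon"
    if 17 <= ts.hour < 22:
        return "evening"
    return "night"


def analyze(raw):
    tweets = import_tweets(raw)
    prepared = [(prepare(text), ts) for text, ts in tweets]
    # Step 1 applied offline as a filter: keep Tweets containing any keyword.
    relevant = [(toks, ts) for toks, ts in prepared if KEYWORDS & set(toks)]
    # Step 4a: term frequencies as a crude stand-in for topic analysis.
    topics = Counter(t for toks, _ in relevant for t in toks)
    return {
        "n_relevant": len(relevant),
        "top_terms": topics.most_common(3),
        "sentiment": [sentiment(toks) for toks, _ in relevant],
        "context": Counter(time_of_day(ts) for _, ts in relevant),
    }


if __name__ == "__main__":
    # Step 5: interpretation is left to the researcher; print a summary.
    for key, value in analyze(RAW).items():
        print(key, value)
```

In this sketch, three of the four made-up Tweets mention a keyword. Their sentiment scores are 1, -1, and 1, and they fall into the morning, afternoon, and evening buckets. A real study would gather Tweets through a search or streaming tool in step 1 rather than filtering a fixed sample, and would substitute validated stopword lists, stemming, and sentiment resources for the toy sets used here.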


Published in American Journal of Preventive Medicine.