A Machine Learning Approach to Identify NIH-Funded Applied Prevention Research

Published: October 25, 2018


      To fulfill its mission, the NIH Office of Disease Prevention systematically monitors NIH investments in applied prevention research. Specifically, the Office focuses on research in humans involving primary and secondary prevention, and prevention-related methods. Currently, the NIH uses the Research, Condition, and Disease Categorization system to report agency funding in prevention research. However, this system defines prevention research broadly to include primary and secondary prevention, studies on prevention methods, and basic and preclinical studies for prevention. A new methodology was needed to quantify NIH funding in applied prevention research.


      A novel machine learning approach was developed and evaluated for its ability to characterize NIH-funded applied prevention research during fiscal years 2012–2015. The sensitivity, specificity, positive predictive value, accuracy, and F1 score of the machine learning method; the Research, Condition, and Disease Categorization system; and a combined approach were estimated. Analyses were completed during June–August 2017.
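The five measures named above can all be derived from a two-by-two confusion matrix that compares a classifier's labels against a reference standard. The following is a minimal sketch of those standard definitions, not the authors' implementation: the names `ConfusionCounts` and `classification_metrics` are hypothetical, and the example counts are invented purely for illustration and do not come from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing predicted labels with a reference standard."""
    tp: int  # applied prevention grants correctly flagged
    fp: int  # other grants incorrectly flagged
    tn: int  # other grants correctly left unflagged
    fn: int  # applied prevention grants that were missed


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(c: ConfusionCounts) -> dict:
    """Compute the standard binary classification measures."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)   # recall
    specificity = _ratio(c.tn, c.tn + c.fp)
    ppv = _ratio(c.tp, c.tp + c.fp)           # precision
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    # F1 is the harmonic mean of PPV and sensitivity.
    f1 = _ratio(2 * ppv * sensitivity, ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "accuracy": accuracy,
        "f1": f1,
    }


if __name__ == "__main__":
    # Invented counts for illustration only (not data from the article).
    example = ConfusionCounts(tp=80, fp=20, tn=880, fn=20)
    for name, value in classification_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

Because F1 combines PPV and sensitivity and ignores true negatives, it is less flattered than accuracy when the category of interest is a small share of all grants, which is one reason it is a common headline measure for this kind of classification task.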


      Because the machine learning method was trained to recognize applied prevention research, it identified applied prevention grants more accurately (F1 = 72.7%) than either the Research, Condition, and Disease Categorization system (F1 = 54.4%) or a combined approach (F1 = 63.5%); p<0.001.


      This analysis demonstrated the use of machine learning as an efficient method to classify NIH-funded research grants in disease prevention.

