Use of Machine Learning to Determine the Information Value of a BMI Screening Program

Open Access. Published: January 19, 2021. DOI: https://doi.org/10.1016/j.amepre.2020.10.016

      Introduction

      Childhood obesity continues to be a significant public health issue in the U.S. and is associated with short- and long-term adverse health outcomes. A number of states have implemented school-based BMI screening programs. However, these programs have been criticized for not being effective in improving students’ BMI or reducing childhood obesity. One potential benefit, however, of screening programs is the identification of younger children at risk of obesity as they age.

      Methods

      This study used a unique panel data set from the BMI screening program for public school children in the state of Arkansas, collected from the 2003–2004 through the 2018–2019 academic years and analyzed in 2020. Machine learning algorithms were applied to understand the informational value of BMI screening. Specifically, this study evaluated the importance of BMI information during kindergarten to the accurate prediction of childhood obesity by the 4th grade.

      Results

      Kindergarten BMI z-score is the most important predictor of obesity by the 4th grade and is much more important to prediction than sociodemographic and socioeconomic variables that would otherwise be available to policymakers in the absence of the screening program. Including the kindergarten BMI z-score of students in the model meaningfully increases the accuracy of the prediction.

      Conclusions

      Data from the Arkansas BMI screening program greatly improve the ability to identify children at greatest risk of future obesity to the extent that better prediction can be translated into more effective policy and better health outcomes. This is a heretofore unexamined benefit of school-based BMI screening.

      INTRODUCTION

      Estimates from the 2015–2016 National Health and Nutrition Examination Survey indicate that 18.5% of children and adolescents in the U.S. are obese.
      • Hales CM
      • Fryar CD
      • Carroll MD
      • Freedman DS
      • Ogden CL
      Trends in obesity and severe obesity prevalence in U.S. youth and adults by sex and age, 2007-2008 to 2015-2016.
      Obesity during childhood is associated with both short-term health consequences (e.g., psychological consequences, cardiovascular risk factors in childhood) and long-term consequences such as worse social and economic outcomes, adult morbidity, and risk of premature mortality.
      • Biro FM
      • Wien M
      Childhood obesity and adult morbidities.
      ,
      • Reilly JJ
      • Methven E
      • McDowell ZC
      • et al.
      Health consequences of obesity.
      Moreover, children with obesity are more likely to also become adults with obesity.
      • Serdula MK
      • Ivery D
      • Coates RJ
      • Freedman DS
      • Williamson DF
      • Byers T
      Do obese children become obese adults? A review of the literature.
      Obesity is a primary driver of rising healthcare costs. Recent studies find that the economic impact of obesity in the U.S. could be as high as $147 billion a year for adults, $14.3 billion a year for children, and $17.5 billion for children and adolescents aged 11–17 years (i.e., >10% of all medical spending).
      • Hammond RA
      • Levine R
      The economic impact of obesity in the United States.
      ,
      • Finkelstein EA
      • Trogdon JG
      • Cohen JW
      • Dietz W
      Annual medical spending attributable to obesity: payer-and service-specific estimates.
      Other studies also show that a 1-unit increase in BMI translates to a 1.9% increase in median medical expenditure.
      • Pronk NP
      • Goodman MJ
      • O'Connor PJ
      • Martinson BC
      Relationship between modifiable health risks and short-term health care charges.
      ,
      • Wolf AM
      Economic outcomes of the obese patient.
      In 1998, the U.S. Government declared childhood obesity to be an epidemic. In 2001, the U.S. Surgeon General issued a call for action to encourage specific actions regarding this public health issue.
      • Ikeda JP
      • Crawford PB
      • Woodward-Lopez G
      BMI screening in schools: helpful or harmful.
      In response to this call, Arkansas implemented a BMI surveillance and screening program through the state's public schools (Act 1220 of 2003, HB 1583). Act 1220 was the first legislation requiring public schools to measure the BMI of all students and to send confidential letters, known as Child Health Reports, to parents/guardians.
      • Raczynski JM
      • Thompson JW
      • Phillips MM
      • Ryan KW
      • Cleveland HW
      Arkansas Act 1220 of 2003 to reduce childhood obesity: its implementation and impact on child and adolescent body mass index.
      ,
      • Gee KA
      School-based body mass index screening and parental notification in late adolescence: evidence from Arkansas's Act 1220.
      Since then, many other states have implemented some variation of this legislation in the form of surveillance or screening programs. Surveillance programs focus on the aggregate levels of obesity and identify the percentage of the students in the school or school district who are underweight, healthy weight, overweight, or obese. Screening programs provide parents with information on their child's BMI category. At least 25 states have since passed variations of this legislation requiring public schools to measure students’ BMI, and a number of these states also require public schools to provide health reports to parents/guardians.
      • Thompson HR
      • Madsen KA
      The report card on BMI report cards.
      ,
      • Ruggieri DG
      • Bass SB
      A comprehensive review of school-based body mass index screening programs and their implications for school health: do the controversies accurately reflect the research? [published correction appears in J Sch Health. 2015;85(6):411].
      Such BMI screening programs could reduce childhood obesity by raising awareness among parents of children with unhealthy weight status and thereby enabling parenting practices that are conducive to healthy body weight. However, evidence on the effectiveness of screening programs is mixed. Almond et al.
      • Almond D
      • Lee A
      • Schwartz AE
      Impacts of classifying New York City students as overweight.
      used a regression discontinuity design to evaluate the impact of overweight reports on children enrolled in the New York City public schools. They obtained precise but small estimates indicating that children labeled as overweight in 1 period were not meaningfully more likely to have lower BMI or weight in the subsequent year than children labeled as normal weight. Prina and Royer
      • Prina S
      • Royer H
      The importance of parental knowledge: evidence from weight report cards in Mexico.
      conducted an experimental screening program in Mexico and found that BMI reports effectively transmitted obesity information to parents but did not meaningfully alter parental behaviors. One explanation for these null findings may lie in an emerging body of literature questioning whether accurate perception of weight status actually protects against future weight gain.
      • Sonneville KR
      • Thurston IB
      • Milliren CE
      • Kamody RC
      • Gooding HC
      • Richmond TK
      Helpful or harmful? Prospective association between weight misperception and weight gain among overweight and obese adolescents and young adults.
      • Robinson E
      • Sutin AR
      • Daly M
      Self-perceived overweight, weight loss attempts, and weight gain: evidence from two large, longitudinal cohorts.
      • Robinson E
      • Sutin AR
      Parental perception of weight status and weight gain across childhood.
      Although the underlying mechanisms are unclear, self-perception of being overweight is associated with increased future weight gain in adolescents and young adults.
      • Sonneville KR
      • Thurston IB
      • Milliren CE
      • Kamody RC
      • Gooding HC
      • Richmond TK
      Helpful or harmful? Prospective association between weight misperception and weight gain among overweight and obese adolescents and young adults.
      ,
      • Robinson E
      • Sutin AR
      • Daly M
      Self-perceived overweight, weight loss attempts, and weight gain: evidence from two large, longitudinal cohorts.
      Similarly, increases in weight have also been observed among younger children whose parents perceived them to be overweight.
      • Robinson E
      • Sutin AR
      Parental perception of weight status and weight gain across childhood.
      Overall, the utility of BMI screening programs remains contested. Parent focus groups formed to evaluate the content of parental notification letters from the Massachusetts program raised concerns regarding the novelty of the information provided and whether BMI was a valid metric to determine a healthy weight.
      • Moyer LJ
      • Carbone ET
      • Anliker JA
      • Goff SL
      The Massachusetts BMI letter: a qualitative study of responses from parents of obese children.
      Some have raised concerns about unintended consequences such as stigmatization and body dissatisfaction.
      • Ikeda JP
      • Crawford PB
      • Woodward-Lopez G
      BMI screening in schools: helpful or harmful.
      However, 1 study evaluated weight-based teasing before and 2 years after implementation of school-based BMI screening in Arkansas and found no increases overall or among adolescents who were overweight or obese.
      • Krukowski RA
      • West DS
      • Siddiqui NJ
      • Bursac Z
      • Phillips MM
      • Raczynski JM
      No change in weight-based teasing when school-based obesity policies are implemented.
      There is a need for more work in this area.
      One issue neglected in the existing literature is the informational value of BMI screening programs. These provide longitudinal data that facilitate a better understanding of childhood obesity, its causes, and the effectiveness of efforts to promote healthier childhood body weight.
      • Thompson JW
      • Card-Higginson P
      Arkansas’ experience: statewide surveillance and parental information on the child obesity epidemic.
      The potential to better identify those at risk of future obesity is an important consideration that has not factored into earlier criticisms of BMI screening programs.
      • Ikeda JP
      • Crawford PB
      • Woodward-Lopez G
      BMI screening in schools: helpful or harmful.
      ,
      • Evans EW
      • Sonneville KR
      BMI report cards: will they pass or fail in the fight against pediatric obesity?.
      ,
      • Soto C
      • White JH
      School health initiatives and childhood obesity: BMI screening and reporting.
      Obesity rates tend to increase from early childhood to preadolescence.
      • Hales CM
      • Fryar CD
      • Carroll MD
      • Freedman DS
      • Ogden CL
      Trends in obesity and severe obesity prevalence in U.S. youth and adults by sex and age, 2007-2008 to 2015-2016.
      If the availability of BMI information early in elementary school meaningfully improves the ability to identify children who are at greatest risk of becoming obese, this may amplify obesity prevention efforts by better reaching the children who have a high likelihood of becoming obese. This may be a heretofore overlooked merit of early BMI screening programs in public schools.
      The purpose of this study is to assess the informational value of the Arkansas BMI screening program, the nation's first and longest-running program. No other study has examined the potential informational value of BMI screening programs to identify children at risk of becoming obese. This study employs several machine learning algorithms to identify the importance of BMI information during kindergarten (typically aged 5–6 years) for predicting which children are most likely to be obese by the 4th grade (typically aged 9–10 years). Specifically, this study asks whether the availability of BMI information during kindergarten meaningfully improves the prediction of obesity in the 4th grade beyond predictors that could otherwise be observed in the absence of the screening program.

      METHODS

      Study Population

      The BMI panel data used in this study reflect the population of Arkansas public school children beginning kindergarten in academic years 2003–2004 through 2014–2015. Children in these cohorts attended 4th grade in academic years 2007–2008 through 2018–2019, respectively. The use of these data was reviewed by the University of Arkansas IRB (protocol number 14-07-026) and was determined to meet Exemption 4 for “research involving the collection or study of existing data or specimens if publicly available or information recorded such that subjects cannot be identified.”

      Measures

      This study used data from students observed both in kindergarten and 4th grade. BMI was calculated as (weight in pounds) ÷ (height in inches)² × 703. BMI measures were converted to age- and sex-specific z-scores according to the Centers for Disease Control and Prevention guidelines. Obesity was defined as BMI ≥95th percentile. The outcome variable was the obesity status in the 4th grade. This outcome was predicted using the kindergarten BMI z-score and several other individual and neighborhood measures (called features in machine learning parlance). These included the child's race, ethnicity, school meal status (whether the child qualified for free or reduced-price school meals), language spoken at home, grade in school, school of attendance, and Census block group of residence.
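      The BMI formula and the obesity cutoff described above can be sketched as follows. This is a minimal illustration, not the study's code: the function names and the example child are hypothetical, and the conversion from BMI to an age- and sex-specific percentile is assumed to come from the CDC reference tables, which are not reproduced here.

```python
def bmi_from_imperial(weight_lb: float, height_in: float) -> float:
    """BMI from pounds and inches: weight / height^2 * 703."""
    if weight_lb <= 0 or height_in <= 0:
        raise ValueError("weight and height must be positive")
    return weight_lb / height_in ** 2 * 703


def is_obese(bmi_percentile: float) -> bool:
    """Obesity as defined in the text: BMI at or above the 95th percentile.

    The percentile itself must be looked up from age- and sex-specific
    reference data (assumed to be available elsewhere).
    """
    return bmi_percentile >= 95.0


# Hypothetical child weighing 50 lb at a height of 45 in.
example_bmi = bmi_from_imperial(50, 45)
print(round(example_bmi, 2))
```

      The factor 703 converts pounds per square inch to the metric kg/m² scale, so the result is comparable with BMI computed from metric measurements.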
      Because neighborhood SES is associated with excess weight gain in childhood,
      • Kim Y
      • Landgraf A
      • Colabianchi N
      Living in high-SES neighborhoods is protective against obesity among higher-income children but not low-income children: results from the Healthy Communities Study.
      Census block group–level measures of income; poverty; racial and ethnic composition; educational attainment; housing; and family structure were also included as features in the prediction models. Because the data for the study span a prolonged period, block group–level Census data were taken from Summary File 3 of the 2000 Census of Population and from various releases of the American Community Survey (ACS) 5-year estimates. The Census block group is the smallest unit for which aggregate socioeconomic measures are provided by the U.S. Census. There are 2,147 Census block groups in Arkansas. All monetary measures from the Census or the ACS were adjusted for inflation to reflect the 2010 purchasing power of the U.S. dollar.
      Finally, this study characterized the commercial food environment by measuring less-healthy food retailers comprising fast-food restaurants, convenience stores, and low-cost variety (dollar) stores. Historical data on store locations were from the ReferenceUSA Database. Counts of unhealthy food stores by year were determined by adding up the number of fast-food restaurants, convenience stores, and low-cost variety (dollar) stores within 1 and 10 miles of the child's Census block centroid for urban and rural blocks, respectively.
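      The store-count measure can be illustrated with a short sketch. The text does not state how distance was measured, so great-circle (haversine) distance is an assumption here, and the function names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # approximate mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def count_unhealthy_stores(centroid, stores, is_urban):
    """Count stores within 1 mile (urban) or 10 miles (rural) of a block centroid."""
    radius = 1.0 if is_urban else 10.0
    lat0, lon0 = centroid
    return sum(
        1 for lat, lon in stores
        if haversine_miles(lat0, lon0, lat, lon) <= radius
    )


# Hypothetical centroid and store locations (latitude, longitude).
centroid = (34.75, -92.29)
stores = [(34.75, -92.29), (34.76, -92.29), (34.80, -92.29), (35.25, -92.29)]
print(count_unhealthy_stores(centroid, stores, is_urban=True))
print(count_unhealthy_stores(centroid, stores, is_urban=False))
```

      The same set of stores yields a different count depending on the urban or rural classification of the block, which is why the radius is chosen per block rather than globally.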
      The Appendix (available online) provides additional information on the assembly of the analysis data set. This contains a sequential description of data preprocessing steps and a concordance between the academic years of kindergarten cohorts and the various releases of the ACS.

      Statistical Analysis

      This study assessed the performance of several algorithms to predict obesity in 4th grade on the basis of children's kindergarten characteristics. Specifically, predictions were assessed on the basis of 4 machine learning algorithms: a decision tree, a logistic regression, an artificial neural network, and a random forest. Previous studies provide a detailed review of these methods and the contrasts among them.
      • Kotsiantis SB
      Supervised machine learning: a review of classification techniques.
      • Dreiseitl S
      • Ohno-Machado L
      Logistic regression and artificial neural network classification models: a methodology review.
      • Caruana R
      • Niculescu-Mizil A
      An empirical comparison of supervised learning algorithms.
      • DeGregory KW
      • Kuiper P
      • DeSilvio T
      • et al.
      A review of machine learning in obesity.
      • Saritas MM
      • Yasar A
      Performance analysis of ANN and naive Bayes classification algorithm for data classification.
      One or a combination of these methods has been applied in a wide range of fields such as bioinformatics, economics, and ecology.
      • Larivière B
      • Van den Poel D
      Predicting customer retention and profitability by using random forests and regression forests techniques.
      • Prasad AM
      • Iverson LR
      • Liaw A
      Newer classification and regression tree techniques: bagging and random forests for ecological prediction.
      • Jiang H
      • Deng Y
      • Chen HS
      • et al.
      Joint analysis of two microarray gene-expression data sets to select lung adenocarcinoma marker genes.
      • Buckinx W
      • Van den Poel D
      Customer base analysis: partial defection of behaviourally loyal clients in a non-contractual FMCG retail setting.
      • Mullainathan S
      • Spiess J
      Machine learning: an applied econometric approach.
      • Dugan TM
      • Mukhopadhyay S
      • Carroll A
      • Downs S
      Machine learning techniques for prediction of early childhood obesity.
      • Ahmad LG
      • Eshlaghy A
      • Poorebrahimi A
      • Ebrahimi M
      • Razavi A
      Using three machine learning techniques for predicting breast cancer recurrence.
      In the prediction models, this study randomly selected one third of the data set to serve as the testing set for the purpose of out-of-sample prediction. The remaining two thirds were used as the training set.
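      A random one-third holdout of this kind can be sketched as below. The seed, the function name, and the rounding of the test-set size are assumptions made for illustration; the study itself performed the split in R.

```python
import random


def train_test_split_thirds(records, seed=0):
    """Randomly hold out one third of the records for out-of-sample testing."""
    rng = random.Random(seed)  # fixed seed (hypothetical) for reproducibility
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = len(shuffled) // 3
    test = shuffled[:n_test]
    train = shuffled[n_test:]
    return train, test


# Row indices standing in for the 244,053 children in the analysis sample.
train, test = train_test_split_thirds(range(244_053))
print(len(train), len(test))
```

      Because the test rows never enter model fitting or parameter tuning, performance measured on them approximates how the model would behave on children it has not seen.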
      The performance of each method was assessed with standard metrics, including accuracy, sensitivity, specificity, and the area under the receiver operating characteristic (ROC) curve. Accuracy was assessed using the following equation:
      $$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$


      where TP, TN, FP, and FN are the counts of true positives, true negatives, false positives, and false negatives, respectively. Sensitivity is the ratio of TP to the sum of TP and FN. In this case, with obesity as the positive class, sensitivity measured the proportion of obese 4th graders who were correctly classified. Specificity was defined as the ratio of TN to the sum of FP and TN, which is the proportion of non-obese students who were correctly classified. These 2 measures indicate how aggressive or conservative the algorithms were in classifying subjects.
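      The three count-based metrics can be computed directly from a confusion matrix, as in the sketch below. Obesity is treated as the positive class (an assumption consistent with the reported results, where accuracy equals the prevalence-weighted mix of sensitivity and specificity), and the counts are hypothetical.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of positives correctly flagged
        "specificity": tn / (tn + fp),  # share of negatives correctly cleared
    }


# Hypothetical test set: 100 obese children (60 detected) and
# 300 non-obese children (285 correctly classified).
metrics = classification_metrics(tp=60, tn=285, fp=15, fn=40)
print(metrics)
```

      With an imbalanced outcome, accuracy is dominated by the larger class, so a model can score well on accuracy and specificity while still missing a sizable share of the children who become obese.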
      Statistical analysis was conducted with R, version 4.0.0. Within the R software environment, the caret package, version 6.0−86, was used to train and test the logistic regression, decision tree, and neural network algorithms.
      • Kuhn M
      Building predictive models in R using the caret package.
      A 10-fold cross-validation was used on the training data set to estimate the optimal complexity parameter in the decision tree algorithm and the optimal decay and size parameters for the artificial neural network. The randomForest package, version 4.6–14,
      • Liaw A
      • Wiener M
      Classification and regression by randomForest.
      was used to implement the random forest algorithm.
      • Liaw A
      • Wiener M
      • Breiman L
      • Cutler A
      Package “Randomforest.”.
      For the random forest, the number of trees matters. When models with different numbers of trees produced the same results, the smallest number was chosen to avoid the computational cost of a larger forest. The performance of the forest does not necessarily become significantly better as the number of trees grows.
      • Breiman L
      Out-of-bag estimation.
      In this study, the lower bound on the number of trees was 25. The number of trees was then increased in 5 steps of 100 up to an upper bound of 525.
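      The search over forest sizes and the tie-breaking rule can be sketched as follows. The helper names and the scores are hypothetical; only the grid bounds and the step size come from the text.

```python
def tree_grid(lower=25, upper=525, step=100):
    """Candidate forest sizes from the lower to the upper bound, inclusive."""
    return list(range(lower, upper + 1, step))


def smallest_best(scores, tol=0.0):
    """Pick the smallest forest whose score is within `tol` of the best score."""
    best = max(scores.values())
    return min(n for n, s in scores.items() if best - s <= tol)


print(tree_grid())

# Hypothetical validation scores keyed by number of trees.
scores = {25: 0.860, 125: 0.865, 225: 0.865, 325: 0.865, 425: 0.865, 525: 0.865}
print(smallest_best(scores))
```

      Preferring the smallest forest among equally performing candidates keeps training and prediction cheaper without giving up accuracy.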

      RESULTS

      Table 1 shows the summary statistics for the analysis sample. The average BMI z-score in kindergarten was 0.61. Obesity rates in the analysis sample increased from 16% when children were in kindergarten to 24% by the 4th grade. Although not shown in the table, a cross tabulation was run on the obesity indicator in kindergarten and 4th grade. Of the 38,958 kindergartners classified as obese, 32,580 (84%) continued to be classified as obese in the 4th grade. Of the 205,095 non-obese kindergartners, 26,060 (13%) had developed obesity by the 4th grade. Because kindergarten BMI information is being used to predict 4th-grade obesity status, the study could only include children with valid BMI information in both kindergarten and 4th grade. The Appendix (available online) provides additional details on children with incomplete information who were not included in the study.
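      The percentages in the cross tabulation can be checked from the reported counts. The variable names below are illustrative; the counts are taken from the paragraph above.

```python
# Counts reported in the text.
obese_k = 38_958                 # obese in kindergarten
obese_k_to_obese_4 = 32_580      # of those, obese in 4th grade
non_obese_k = 205_095            # non-obese in kindergarten
non_obese_k_to_obese_4 = 26_060  # of those, obese in 4th grade

total = obese_k + non_obese_k    # matches N=244,053 in Table 1

persistence = obese_k_to_obese_4 / obese_k
incidence = non_obese_k_to_obese_4 / non_obese_k
prevalence_k = obese_k / total
prevalence_4 = (obese_k_to_obese_4 + non_obese_k_to_obese_4) / total

print(f"persistence: {persistence:.1%}")     # rounds to 84%
print(f"incidence:   {incidence:.1%}")       # rounds to 13%
print(f"kindergarten prevalence: {prevalence_k:.1%}")  # rounds to 16%
print(f"4th-grade prevalence:    {prevalence_4:.1%}")  # rounds to 24%
```

      The two kindergarten groups sum to the full analysis sample, and the implied prevalence figures agree with the obesity indicators reported in Table 1.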
      Table 1. Descriptive Statistics for Study Data Set (N=244,053)

      Measure                                     Mean (SD)
      Student age, months (a)                     71.28 (5.00)
      Spanish at home (b)                         0.09 (0.28)
      Other language at home (b)                  0.01 (0.11)
      Free school meals (b)                       0.50 (0.50)
      Reduced-price school meals (b)              0.10 (0.30)
      African American (b)                        0.22 (0.41)
      Hispanic (b)                                0.11 (0.32)
      Asian (b)                                   0.02 (0.12)
      Other race (b)                              0.02 (0.13)
      Female (b)                                  0.49 (0.50)
      Population African American (c)             0.17 (0.26)
      Population native (c)                       0.01 (0.02)
      Population Asian (c)                        0.01 (0.03)
      Population other race (c)                   0.05 (0.08)
      Population Hispanic (c)                     0.07 (0.12)
      Single female HH (c)                        0.27 (0.23)
      Less than high school (c)                   0.20 (0.12)
      Some college (c)                            0.27 (0.09)
      College degree or higher (c)                0.18 (0.14)
      Limited English (c)                         0.02 (0.06)
      Median HH income (constant 2010 U.S.$)      41,681.33 (17,831.52)
      Mother in labor force (c)                   0.66 (0.19)
      No vehicle access (c)                       0.07 (0.08)
      Poverty (c)                                 0.19 (0.14)
      Median home value (constant 2010 U.S.$)     102,908.70 (55,255.69)
      Vacant housing units (c)                    0.12 (0.09)
      Urban Census block (b)                      0.65 (0.48)
      Unhealthy food stores (count)               22.16 (22.32)
      Kindergarten BMI (z-score)                  0.61 (1.06)
      Obesity indicator, kindergarten (b)         0.16 (0.37)
      Obesity indicator, 4th grade (b)            0.24 (0.43)

      (a) Age on date of kindergarten BMI measurement.
      (b) Binary indicator variable for the child in question.
      (c) Proportion of population, HHs, or housing units within the child's Census block group of residence during his or her kindergarten year.
      HH, household.
      The logistic regression, random forest, and neural network algorithms performed similarly in terms of accuracy, sensitivity, and specificity (Table 2) and in terms of area under the curve (AUC) values (Figure 1). The 95% CIs around the AUC overlap among these 3 algorithms. The decision tree showed lower performance, with an AUC value that was statistically lower than the AUC from each of the other algorithms. Nevertheless, the performance of the decision tree algorithm was close to that of the others. The overall conclusion is that the ability to predict obesity by the 4th grade was robust across the machine learning algorithms and the logistic regression with these data.
      Table 2. Out-of-Sample Performance by Prediction Algorithm

      Performance measure   Decision tree   Logistic regression   Artificial neural network   Random forest
      Accuracy              0.835           0.869                 0.869                       0.865
      Sensitivity           0.608           0.621                 0.623                       0.620
      Specificity           0.906           0.947                 0.947                       0.942
      Figure 1. ROC curves.
      Note: The AUC with 95% CIs for different prediction algorithms is indicated for each algorithm.
      AUC, area under the curve; ROC, receiver operating characteristic.
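      For readers less familiar with the AUC, the sketch below computes it as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, counting ties as one half. This pairwise form is quadratic in sample size and is meant only as an illustration; the labels and scores are hypothetical, and the study's own AUC values and CIs were computed in R.

```python
def auc_from_scores(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical obesity labels (1 = obese) and predicted probabilities.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]
print(auc_from_scores(labels, scores))
```

      An AUC of 0.5 corresponds to ranking cases no better than chance, and 1.0 to a perfect ranking, which is the scale on which the curves in Figure 1 are compared.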
      As noted above, 1 of the criticisms of school-based BMI screening programs is that they are not effective in improving students’ BMI or reducing childhood obesity
      • Ikeda JP
      • Crawford PB
      • Woodward-Lopez G
      BMI screening in schools: helpful or harmful.
      ,
      • Evans EW
      • Sonneville KR
      BMI report cards: will they pass or fail in the fight against pediatric obesity?.
      ,
      • Soto C
      • White JH
      School health initiatives and childhood obesity: BMI screening and reporting.
      and that they might have potential harms related to weight stigmatization.
      • Ruggieri DG
      • Bass SB
      A comprehensive review of school-based body mass index screening programs and their implications for school health: do the controversies accurately reflect the research? [published correction appears in J Sch Health. 2015;85(6):411].
      ,
      • Dietz WH
      • Story MT
      • Leviton LC
      Issues and implications of screening, surveillance, and reporting of children's BMI.
      Overlooked in previous studies is the potential for BMI screening programs to identify at-risk children. For this purpose, the contribution and importance of the kindergarten z-score to prediction accuracy were assessed. When BMI z-score was excluded as a feature, prediction performance dropped markedly (AUC=51.2%, 95% CI=51.0, 51.4 in the random forest algorithm). By contrast, when all features but the kindergarten z-score were excluded, performance fell but by much less (AUC=78.2%, 95% CI=77.9, 78.6 in the logistic regression algorithm).
      Kindergarten BMI z-score ranks first in importance and by a wide margin regardless of the algorithm (Figure 2). Importance weights can be dependent on the method and features of the data. In this analysis, variable importance is based on standardized coefficients for the logistic regression,
      • Gelman A
      Scaling regression inputs by dividing by two standard deviations.
      weighted variables for neural network,
      • Gevrey M
      • Dimopoulos I
      • Lek S
      Review and comparison of methods to study the contribution of variables in artificial neural network models.
      and node strength for tree-based algorithms (random forest and decision tree). All measures of importance were scaled to have a maximum value of 100. Figure 2 shows that the importance rankings of other features are highly sensitive to the algorithm used. Still, the dominance of kindergarten BMI as an important predictor across methods is noteworthy given that other features have already been identified in previous studies to be associated with BMI levels. For example, parents’ educational level,
      • Lamerz A
      • Kuepper-Nybelen J
      • Wehle C
      • et al.
      Social class, parental education, and obesity prevalence in a study of six-year-old children in Germany.
      • Nagel G
      • Wabitsch M
      • Galm C
      • et al.
      Determinants of obesity in the Ulm Research on Metabolism, Exercise and Lifestyle in Children (URMEL-ICE).
      • Scheinker D
      • Valencia A
      • Rodriguez F
      Identification of factors associated with variation in U.S. county-level obesity prevalence rates using epidemiologic vs machine learning models.
      income level (Scheinker et al., 2019; Anderson et al., 2003; Hofferth and Curtin, 2005; Kimm et al., 1996; Strauss and Knight, 1999; Klein-Platat et al., 2003; Dubois et al., 2006), child age (Hargreaves et al., 2013), and sex (Ahn et al., 2008; Singh et al., 2008) have all been associated with obesity risk.
      Figure 2. Variable importance by prediction algorithm.
      Note: To facilitate comparison, importance is scaled as a percentage of the most important variable in each algorithm. Importance rank by algorithm is shown to the right of the bars. Only variables ranking among the top 5 in importance for any algorithm are shown. provides the variable units.
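      The scaling described in the Figure 2 note can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `scale_importance`, the variable names, and the raw scores are all hypothetical, and the values are not study results.

```python
def scale_importance(raw):
    """Scale raw importance scores to a percentage of the largest score and rank them."""
    top = max(raw.values())
    if top <= 0:
        raise ValueError("at least one importance score must be positive")
    # Express each score as a percentage of the most important variable.
    scaled = {name: 100.0 * value / top for name, value in raw.items()}
    # Rank 1 is the most important variable for this algorithm.
    ordered = sorted(scaled, key=scaled.get, reverse=True)
    ranks = {name: position for position, name in enumerate(ordered, start=1)}
    return scaled, ranks


# Hypothetical raw scores for one algorithm (illustrative values only).
raw_scores = {"kindergarten_bmi_z": 0.50, "income_level": 0.10, "child_age": 0.05}
scaled, ranks = scale_importance(raw_scores)
```

      Because each algorithm is scaled against its own top variable, the percentages are comparable within an algorithm but only the ranks are directly comparable across algorithms.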
      One issue with the above analysis is class imbalance: only about 16% of the sample fell in the obese category. One method to address this issue is downsampling the majority class in the training set (Dubey et al., 2014; He and Garcia, 2009; Krawczyk, 2016; Japkowicz, 2000; Liu et al., 2009). Specifically, the authors downsampled the majority class to achieve balanced samples on the basis of the proportion of students who were obese. As shown in the Appendix (available online), AUC and sensitivity improved, but overall findings from the balanced samples were similar. Kindergarten BMI z-score continued to be the most important feature across all algorithms assessed in the balanced samples.
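      As a rough illustration of majority-class downsampling, the sketch below balances a toy training set using only the Python standard library. It is not the authors' implementation (the reference list suggests the analysis was run in R); the function name `downsample_majority`, the field names, and the synthetic rows are assumptions made for this example.

```python
import random


def downsample_majority(rows, label_key, seed=0):
    """Randomly drop majority-class rows so both classes have equal counts."""
    positives = [r for r in rows if r[label_key] == 1]
    negatives = [r for r in rows if r[label_key] == 0]
    # The shorter list is the minority class; the longer one is downsampled.
    minority, majority = sorted([positives, negatives], key=len)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    kept = rng.sample(majority, len(minority))  # sampling without replacement
    balanced = minority + kept
    rng.shuffle(balanced)
    return balanced


# Synthetic training set: 16 obese (label 1) and 84 non-obese (label 0) rows,
# mirroring the roughly 16% share mentioned in the text.
train = [{"id": i, "obese": 1 if i < 16 else 0} for i in range(100)]
balanced = downsample_majority(train, "obese")
```

      Only the training set is balanced this way; a held-out test set would keep its original class proportions so that reported accuracy reflects the real population.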

      DISCUSSION

      Childhood BMI screening programs have been adopted by many states but have been criticized by some for the potential to stigmatize heavier children, limited data on their effectiveness in reducing obesity, and the cost of running the programs. However, a potential benefit of these programs, which has been overlooked in previous studies, is the value of BMI information during early childhood to predict the likelihood of obesity later in life. The findings indicate that data from the Arkansas BMI screening program greatly improve the ability to identify children at greatest risk of future obesity. Models that included kindergarten BMI z-score predicted obesity much more accurately than models that did not. Across all considered algorithms, the importance of BMI screening information (i.e., kindergarten BMI z-score) greatly exceeded that of any other demographic or socioeconomic measure that could otherwise be used to identify at-risk children.
      Although this study finds that kindergarten BMI is a strong predictor of 4th-grade obesity status, the ways to use information from BMI screening programs to better target childhood obesity interventions warrant careful investigation. In this sample, 84% of children with obesity in the 4th grade had already developed obesity by kindergarten. As an alternative to the approach used in this study, future research could focus on predicting obesity in 4th grade among the subpopulation of kindergarteners with a healthy weight status. The authors’ rationale for focusing on the entire population of kindergartners is 2-fold. First, there is a need to help children with an unhealthy weight status achieve a healthy weight as they grow. Second, targeting specific at-risk individuals is problematic given the aforementioned concerns involving privacy and unintentional stigmatization of children who are obese or at risk of becoming obese.
      Schools may be a more feasible venue for interventions. Children receive up to 58% of daily energy intake at schools (Cullen and Chen, 2017). Ongoing interventions such as the Supplemental Nutrition Assistance Program Education (SNAP-Ed) (U.S. Department of Agriculture, 2016) and the Fresh Fruit and Vegetable Program (FFVP) (U.S. Department of Agriculture, 2010) already target all children in a school and thereby avoid stigmatizing any child or group of children. A school's eligibility for these interventions is determined by free and reduced-price lunch participation rates within the school, which is essentially an income criterion because children from lower-income families qualify for free or reduced-price school meals. These findings suggest that the baseline kindergarten BMI information of children enrolled in schools could serve as an additional criterion that may amplify program effectiveness in preventing obesity or helping children with unhealthy weight status grow into a healthy weight as they age.
      For both SNAP-Ed and FFVP, more schools are eligible than participate. Thus, baseline BMI information at kindergarten could be used to guide outreach efforts aimed at helping eligible nonparticipating schools with large numbers of high-risk children apply and become active in these programs. This may be a way to ensure these programs reach children at the greatest risk without any changes to existing program eligibility requirements. More generally, information from the BMI screening program could become an integral tool for new or ongoing community-based efforts to target neighborhood or school-based interventions to reach children at greatest risk of obesity.
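      One way such outreach prioritization might look in practice is sketched below. Everything here is hypothetical: the `School` record, its field names, the `outreach_priority` function, and the example schools are invented for illustration, and "high risk" is assumed to mean kindergartners flagged by a prediction model using screening data. The sketch works at the school level only, consistent with the concern about targeting individual children.

```python
from dataclasses import dataclass


@dataclass
class School:
    name: str
    eligible: bool                 # meets existing program eligibility rules
    participating: bool            # already enrolled in the program
    kindergartners_screened: int   # children with a valid kindergarten BMI
    kindergartners_high_risk: int  # children flagged as high risk (assumed definition)


def outreach_priority(schools, min_screened=1):
    """Return eligible, nonparticipating schools ordered by count of high-risk kindergartners."""
    candidates = [
        s for s in schools
        if s.eligible and not s.participating and s.kindergartners_screened >= min_screened
    ]
    return sorted(candidates, key=lambda s: s.kindergartners_high_risk, reverse=True)


# Made-up schools; none of these figures come from the study.
schools = [
    School("A", True, False, 80, 12),
    School("B", True, True, 60, 20),
    School("C", False, False, 90, 25),
    School("D", True, False, 70, 18),
]
priority = [s.name for s in outreach_priority(schools)]
```

      Eligibility rules are left untouched in this sketch; screening data only change the order in which already-eligible schools are contacted.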

      Limitations

      The study has several limitations. First, the prediction methods used required valid BMI measurements in both kindergarten and 4th grade. Thus, the study excluded children who were missing BMI measurements in 1 or both years. As shown in the Appendix (available online), children with valid measures in kindergarten but with missing or invalid measures in 4th grade had higher BMIs and a higher prevalence of obesity. Second, data for the study were solely from Arkansas, and there is a need to assess whether findings in this study are similar when data from other geographic contexts are used. Finally, this study has shown only that information from a school-based BMI screening program greatly improves the prediction accuracy of later obesity; whether better prediction can be translated into more effective policy and better health outcomes remains to be assessed.

      CONCLUSIONS

      The baseline kindergarten BMI information of children enrolled in schools could be an additional criterion to target specific at-risk children who are obese or at risk of becoming obese and may help children with an unhealthy weight status achieve a healthy weight as they grow. The information can also boost the effectiveness of school-based programs in preventing obesity or helping children with unhealthy weight status. For these reasons, the ability of BMI screening programs to identify children at greatest risk of becoming obese is an important but neglected dimension that should be used in evaluating their overall utility.

      ACKNOWLEDGMENTS

      The content of this manuscript is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.
      The research reported in this publication was supported in part by the National Institute of General Medical Sciences of the NIH under Award Number P20GM109096.
      No financial disclosures were reported by the authors of this paper.

      Appendix. SUPPLEMENTAL MATERIAL

      REFERENCES

        • Hales CM, Fryar CD, Carroll MD, Freedman DS, Ogden CL. Trends in obesity and severe obesity prevalence in U.S. youth and adults by sex and age, 2007-2008 to 2015-2016. JAMA. 2018; 319: 1723-1725. https://doi.org/10.1001/jama.2018.3060
        • Biro FM, Wien M. Childhood obesity and adult morbidities. Am J Clin Nutr. 2010; 91: 1499S-1505S. https://doi.org/10.3945/ajcn.2010.28701B
        • Reilly JJ, Methven E, McDowell ZC, et al. Health consequences of obesity. Arch Dis Child. 2003; 88: 748-752. https://doi.org/10.1136/adc.88.9.748
        • Serdula MK, Ivery D, Coates RJ, Freedman DS, Williamson DF, Byers T. Do obese children become obese adults? A review of the literature. Prev Med. 1993; 22: 167-177. https://doi.org/10.1006/pmed.1993.1014
        • Hammond RA, Levine R. The economic impact of obesity in the United States. Diabetes Metab Syndr Obes. 2010; 3: 285-295. https://doi.org/10.2147/DMSOTT.S7384
        • Finkelstein EA, Trogdon JG, Cohen JW, Dietz W. Annual medical spending attributable to obesity: payer-and service-specific estimates. Health Aff (Millwood). 2009; 28: w822-w831. https://doi.org/10.1377/hlthaff.28.5.w822
        • Pronk NP, Goodman MJ, O'Connor PJ, Martinson BC. Relationship between modifiable health risks and short-term health care charges. JAMA. 1999; 282: 2235-2239. https://doi.org/10.1001/jama.282.23.2235
        • Wolf AM. Economic outcomes of the obese patient. Obes Res. 2002; 10: 58S-62S. https://doi.org/10.1038/oby.2002.191
        • Ikeda JP, Crawford PB, Woodward-Lopez G. BMI screening in schools: helpful or harmful. Health Educ Res. 2006; 21: 761-769. https://doi.org/10.1093/her/cyl144
        • Raczynski JM, Thompson JW, Phillips MM, Ryan KW, Cleveland HW. Arkansas Act 1220 of 2003 to reduce childhood obesity: its implementation and impact on child and adolescent body mass index. J Public Health Policy. 2009; 30: S124-S140. https://doi.org/10.1057/jphp.2008.54
        • Gee KA. School-based body mass index screening and parental notification in late adolescence: evidence from Arkansas's Act 1220. J Adolesc Health. 2015; 57: 270-276. https://doi.org/10.1016/j.jadohealth.2015.05.007
        • Thompson HR, Madsen KA. The report card on BMI report cards. Curr Obes Rep. 2017; 6: 163-167. https://doi.org/10.1007/s13679-017-0259-6
        • Ruggieri DG, Bass SB. A comprehensive review of school-based body mass index screening programs and their implications for school health: do the controversies accurately reflect the research? [published correction appears in J Sch Health. 2015;85(6):411]. J Sch Health. 2015; 85: 61-72. https://doi.org/10.1111/josh.12222
        • Almond D, Lee A, Schwartz AE. Impacts of classifying New York City students as overweight. Proc Natl Acad Sci U S A. 2016; 113: 3488-3491. https://doi.org/10.1073/pnas.1518443113
        • Prina S, Royer H. The importance of parental knowledge: evidence from weight report cards in Mexico. J Health Econ. 2014; 37: 232-247. https://doi.org/10.1016/j.jhealeco.2014.07.001
        • Sonneville KR, Thurston IB, Milliren CE, Kamody RC, Gooding HC, Richmond TK. Helpful or harmful? Prospective association between weight misperception and weight gain among overweight and obese adolescents and young adults. Int J Obes (Lond). 2016; 40: 328-332. https://doi.org/10.1038/ijo.2015.166
        • Robinson E, Sutin AR, Daly M. Self-perceived overweight, weight loss attempts, and weight gain: evidence from two large, longitudinal cohorts. Health Psychol. 2018; 37: 940-947. https://doi.org/10.1037/hea0000659
        • Robinson E, Sutin AR. Parental perception of weight status and weight gain across childhood. Pediatrics. 2016; 137: e20153957. https://doi.org/10.1542/peds.2015-3957
        • Moyer LJ, Carbone ET, Anliker JA, Goff SL. The Massachusetts BMI letter: a qualitative study of responses from parents of obese children. Patient Educ Couns. 2014; 94: 210-217. https://doi.org/10.1016/j.pec.2013.10.016
        • Krukowski RA, West DS, Siddiqui NJ, Bursac Z, Phillips MM, Raczynski JM. No change in weight-based teasing when school-based obesity policies are implemented. Arch Pediatr Adolesc Med. 2008; 162: 936-942. https://doi.org/10.1001/archpedi.162.10.936
        • Thompson JW, Card-Higginson P. Arkansas’ experience: statewide surveillance and parental information on the child obesity epidemic. Pediatrics. 2009; 124: S73-S82. https://doi.org/10.1542/peds.2008-3586J
        • Evans EW, Sonneville KR. BMI report cards: will they pass or fail in the fight against pediatric obesity? Curr Opin Pediatr. 2009; 21: 431-436. https://doi.org/10.1097/MOP.0b013e32832ce04c
        • Soto C, White JH. School health initiatives and childhood obesity: BMI screening and reporting. Policy Polit Nurs Pract. 2010; 11: 108-114. https://doi.org/10.1177/1527154410374218
        • Kim Y, Landgraf A, Colabianchi N. Living in high-SES neighborhoods is protective against obesity among higher-income children but not low-income children: results from the Healthy Communities Study. J Urban Health. 2020; 97: 175-190. https://doi.org/10.1007/s11524-020-00427-9
        • Kotsiantis SB. Supervised machine learning: a review of classification techniques. In: Maglogiannis I, Karpouzis K, Wallace M, Soldatos J, eds. Emerging Artificial Intelligence Applications in Computer Engineering. 160. IOS Press, Amsterdam, Netherlands; 2007: 3-24
        • Dreiseitl S, Ohno-Machado L. Logistic regression and artificial neural network classification models: a methodology review. J Biomed Inform. 2002; 35: 352-359. https://doi.org/10.1016/s1532-0464(03)00034-0
        • Caruana R, Niculescu-Mizil A. An empirical comparison of supervised learning algorithms. In: Proceedings of the 23rd International Conference on Machine Learning; 2006 June 25–29, Pittsburgh, Pennsylvania. Association for Computing Machinery, New York; 2006. https://doi.org/10.1145/1143844.1143865
        • DeGregory KW, Kuiper P, DeSilvio T, et al. A review of machine learning in obesity. Obes Rev. 2018; 19: 668-685. https://doi.org/10.1111/obr.12667
        • Saritas MM, Yasar A. Performance analysis of ANN and naive Bayes classification algorithm for data classification. Int J Intell Sys Appl Eng. 2019; 7: 88-91. https://doi.org/10.18201/ijisae.2019252786
        • Larivière B, Van den Poel D. Predicting customer retention and profitability by using random forests and regression forests techniques. Expert Syst Appl. 2005; 29: 472-484. https://doi.org/10.1016/j.eswa.2005.04.043
        • Prasad AM, Iverson LR, Liaw A. Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems. 2006; 9: 181-199. https://doi.org/10.1007/s10021-005-0054-1
        • Jiang H, Deng Y, Chen HS, et al. Joint analysis of two microarray gene-expression data sets to select lung adenocarcinoma marker genes. BMC Bioinformatics. 2004; 5: 81. https://doi.org/10.1186/1471-2105-5-81
        • Buckinx W, Van den Poel D. Customer base analysis: partial defection of behaviourally loyal clients in a non-contractual FMCG retail setting. Eur J Oper Res. 2005; 164: 252-268. https://doi.org/10.1016/j.ejor.2003.12.010
        • Mullainathan S, Spiess J. Machine learning: an applied econometric approach. J Econ Perspect. 2017; 31: 87-106. https://doi.org/10.1257/jep.31.2.87
        • Dugan TM, Mukhopadhyay S, Carroll A, Downs S. Machine learning techniques for prediction of early childhood obesity. Appl Clin Inform. 2015; 6: 506-520. https://doi.org/10.4338/ACI-2015-03-RA-0036
        • Ahmad LG, Eshlaghy A, Poorebrahimi A, Ebrahimi M, Razavi A. Using three machine learning techniques for predicting breast cancer recurrence. J Health Med Inform. 2013; 4: 3. https://doi.org/10.4172/2157-7420.1000124
        • Kuhn M. Building predictive models in R using the caret package. J Stat Softw. 2008; 28: 1-26. https://doi.org/10.18637/jss.v028.i05
        • Liaw A, Wiener M. Classification and regression by randomForest. R News. 2002; 2: 18-22
        • Liaw A, Wiener M, Breiman L, Cutler A. Package “randomForest.” Department of Statistics, University of California, Berkeley, CA; 2015
        • Breiman L. Out-of-bag estimation. Department of Statistics, University of California, Berkeley, CA; 1996
        • Dietz WH, Story MT, Leviton LC. Issues and implications of screening, surveillance, and reporting of children's BMI. Pediatrics. 2009; 124: S98-S101. https://doi.org/10.1542/peds.2008-3586M
        • Gelman A. Scaling regression inputs by dividing by two standard deviations. Stat Med. 2008; 27: 2865-2873. https://doi.org/10.1002/sim.3107
        • Gevrey M, Dimopoulos I, Lek S. Review and comparison of methods to study the contribution of variables in artificial neural network models. Ecol Modell. 2003; 160: 249-264. https://doi.org/10.1016/S0304-3800(02)00257-0
        • Lamerz A, Kuepper-Nybelen J, Wehle C, et al. Social class, parental education, and obesity prevalence in a study of six-year-old children in Germany. Int J Obes. 2005; 29: 373-380. https://doi.org/10.1038/sj.ijo.0802914
        • Nagel G, Wabitsch M, Galm C, et al. Determinants of obesity in the Ulm Research on Metabolism, Exercise and Lifestyle in Children (URMEL-ICE). Eur J Pediatr. 2009; 168: 1259-1267. https://doi.org/10.1007/s00431-009-1016-y
        • Scheinker D, Valencia A, Rodriguez F. Identification of factors associated with variation in U.S. county-level obesity prevalence rates using epidemiologic vs machine learning models. JAMA Netw Open. 2019; 2: e192884. https://doi.org/10.1001/jamanetworkopen.2019.2884
        • Anderson PM, Butcher KF, Levine PB. Maternal employment and overweight children. J Health Econ. 2003; 22: 477-504. https://doi.org/10.1016/S0167-6296(03)00022-5
        • Hofferth SL, Curtin S. Poverty, food programs, and childhood obesity. J Policy Anal Manage. 2005; 24: 703-726. https://doi.org/10.1002/pam.20134
        • Kimm SY, Obarzanek E, Barton BA, et al. Race, socioeconomic status, and obesity in 9- to 10-year-old girls: the NHLBI Growth and Health Study. Ann Epidemiol. 1996; 6: 266-275. https://doi.org/10.1016/s1047-2797(96)00056-7
        • Strauss RS, Knight J. Influence of the home environment on the development of obesity in children. Pediatrics. 1999; 103: e85. https://doi.org/10.1542/peds.103.6.e85
        • Klein-Platat C, Wagner A, Haan MC, Arveiler D, Schlienger JL, Simon C. Prevalence and sociodemographic determinants of overweight in young French adolescents. Diabetes Metab Res Rev. 2003; 19: 153-158. https://doi.org/10.1002/dmrr.368
        • Dubois L, Girard M, Potvin Kent MP. Breakfast eating and overweight in a pre-school population: is there a link? Public Health Nutr. 2006; 9: 436-442. https://doi.org/10.1079/phn2005867
        • Hargreaves DS, Djafari Marbini AD, Viner RM. Inequality trends in health and future health risk among English children and young people, 1999-2009. Arch Dis Child. 2013; 98: 850-855. https://doi.org/10.1136/archdischild-2012-303403
        • Ahn MK, Juon HS, Gittelsohn J. Association of race/ethnicity, socioeconomic status, acculturation, and environmental factors with risk of overweight among adolescents in California, 2003. Prev Chronic Dis. 2008; 5: A75. https://www.cdc.gov/pcd/issues/2008/Jul/07_0152.htm. Accessed November 10, 2020
        • Singh GK, Kogan MD, Van Dyck PC, Siahpush M. Racial/ethnic, socioeconomic, and behavioral determinants of childhood and adolescent obesity in the United States: analyzing independent and joint associations. Ann Epidemiol. 2008; 18: 682-695. https://doi.org/10.1016/j.annepidem.2008.05.001
        • Dubey R, Zhou J, Wang Y, Thompson PM, Ye J; Alzheimer's Disease Neuroimaging Initiative. Analysis of sampling techniques for imbalanced data: an n = 648 ADNI study. Neuroimage. 2014; 87: 220-241. https://doi.org/10.1016/j.neuroimage.2013.10.005
        • He H, Garcia EA. Learning from imbalanced data. IEEE Trans Knowl Data Eng. 2009; 21: 1263-1284. https://doi.org/10.1109/TKDE.2008.239
        • Krawczyk B. Learning from imbalanced data: open challenges and future directions. Prog Artif Intell. 2016; 5: 221-232. https://doi.org/10.1007/s13748-016-0094-0
        • Japkowicz N. Learning from imbalanced data sets: a comparison of various strategies. AAAI Technical Report WS-00-05. 2000; 68: 10-15
        • Liu XY, Wu J, Zhou ZH. Exploratory undersampling for class-imbalance learning. IEEE Trans Syst Man Cybern B Cybern. 2009; 39: 539-550. https://doi.org/10.1109/TSMCB.2008.2007853
        • Cullen KW, Chen TA. The contribution of the USDA school breakfast and lunch program meals to student daily dietary intake. Prev Med Rep. 2017; 5: 82-85. https://doi.org/10.1016/j.pmedr.2016.11.016
        • U.S. Department of Agriculture, Supplemental Nutrition Assistance Program. Supplemental Nutrition Assistance Program education (SNAP-ed) factsheet. U.S. Department of Agriculture, Supplemental Nutrition Assistance Program, Washington, DC; 2016 (Published August)
        • U.S. Department of Agriculture, Food and Nutrition Service. Fresh fruit and vegetable program: a handbook for schools. U.S. Department of Agriculture, Food and Nutrition Service, Alexandria, VA; 2010 (Published December)