Explainable predictive models of short stature and exploration of related environmental growth factors: a case-control study
BMC Endocrine Disorders volume 25, Article number: 129 (2025)
Abstract
Background
Short stature is a prevalent pediatric endocrine disorder for which early detection and prediction are pivotal for improving treatment outcomes. However, existing diagnostic criteria often lack the necessary sensitivity and specificity because of the complex etiology of the disorder. Hence, this study aims to employ machine learning techniques to develop an interpretable predictive model for normal-variant short stature and to explore how growth environments influence its development.
Methods
We conducted a case‒control study including 100 patients with normal-variant short stature who were age-matched with 200 normal controls from the Endocrinology Department of Nanjing Children’s Hospital from April to September 2021. Parental surveys were conducted to gather information on the children involved. We assessed 33 readily accessible medical characteristics and used conditional logistic regression to explore how growth environments influence the onset of normal-variant short stature. Additionally, we evaluated the performance of nine machine learning algorithms to determine the optimal model. The Shapley additive explanation (SHAP) method was subsequently employed to rank factor importance and refine the final model.
Results
In the multivariate logistic regression analysis, children’s weight (OR = 0.92, 95% CI: 0.86, 0.99), maternal height (OR = 0.79, 95% CI: 0.72, 0.87), paternal height (OR = 0.83, 95% CI: 0.75, 0.91), sufficient nighttime sleep duration (OR = 0.48, 95% CI: 0.26, 0.89), and outdoor activity time exceeding three hours (OR = 0.02, 95% CI: 0.00, 0.66) were identified as protective factors for normal-variant short stature. This study revealed that parental height, caregiver education, and children’s weight significantly influenced the prediction of normal-variant short stature risk, and both the random forest model and gradient boosting machine model exhibited the best discriminatory ability among the 9 machine learning models.
Conclusions
This study revealed a close correlation between environmental growth factors and the occurrence of normal-variant short stature, particularly anthropometric characteristics. The random forest model and gradient boosting machine model performed exceptionally well, demonstrating their potential for clinical applications. These findings provide theoretical support for clinical identification and preventive measures for short stature.
Introduction
Short stature refers to a height below the third percentile (P3) of the growth curve for healthy children of the same race, age, and sex in a comparable environment, or more than two standard deviations (SDs) below the mean [1]. Surveys indicate that 90% of children with short stature experience various degrees of inferiority, introversion, depression, and other behavioral or psychological disorders [2,3,4]. These issues can adversely affect their academic performance, employment opportunities, and marital prospects [5]. Negative personality traits have also been reported in people with short stature. They are susceptible to physical weakness, reduced activity levels, poor concentration, memory decline, and poor learning efficiency, all of which are worsened by fatigue and can in turn aggravate negative emotions [6]. The challenges that individuals with short stature face in their professional and daily lives also carry social costs that society must acknowledge. Short stature therefore represents a significant risk to both physical and mental health, as well as to social stability and harmony.
Short stature is the result of various etiological factors, which can be categorized into normal variations and pathological causes. Normal-variant short stature (NVSS) encompasses familial short stature (FSS) and constitutional growth delay (CGD), while pathological causes are further divided into endocrine disorders, clinically defined syndromes, chronic illnesses, and metabolic diseases [7]. Recently, clinicians have classified NVSS as idiopathic short stature (ISS), a diagnosis made after evidence of systemic, endocrine, nutritional, or chromosomal abnormalities is excluded [8, 9]. Short stature currently affects approximately 3% of Chinese children: about 8 million children in China have short stature, and this number increases by roughly 161,000 every year [10]. Most research on children with short stature, both in China and abroad, focuses on etiology, diagnostic methods, drug treatment, and safety. The influences of diet, sleep, psychology, and other factors on height have received less attention. Studies have shown that sleep disorders cause physical and psychological developmental disorders in children [11], that ISS harms children’s psychological behavior [12], and that a poor diet can contribute to the development of nutritional diseases, such as growth retardation [13]. Factors such as the growth environment [14, 15] and parental height also play significant roles [16, 17]. However, research on the impacts of sleep, diet, and behavior on growth and development is limited. Early screening, diagnosis, and treatment of children with short stature are of paramount importance.
The etiology of NVSS is highly complex, posing challenges for both diagnosis and treatment, and misdiagnoses are frequent even among endocrinologists [18]. In underdeveloped and remote regions with limited healthcare resources, missed, misdiagnosed, or mistreated cases are more common. In the era of big data, hospital electronic medical record (EMR) systems serve as crucial medical data repositories, garnering increasing attention for their role in assisting diagnostic processes [19]. Processing and analyzing this wealth of medical data can significantly improve decision-making support for healthcare professionals. In recent years, mining medical data to derive valuable insights into prevention, diagnosis, and treatment has become imperative. Machine learning (ML) methods based on EMRs have garnered recognition and attention from clinical practitioners [20,21,22]. Leveraging EMRs to extract expert knowledge and establish computer-assisted diagnostic systems not only optimizes the utilization of medical records but also increases healthcare professionals’ efficiency, thereby alleviating their workload. Moreover, these systems support diagnostics in remote areas, enabling timely detection at the grassroots level and early prevention and treatment to halt further progression.
This study aims to develop an explainable predictive model for the normal-variant short stature of children via machine learning on the basis of previous diagnostic cases. By analyzing patient characteristics, physical examinations, lifestyle habits, and other indicators, the model seeks to predict NVSS. The SHapley additive explanations (SHAP) method was used to clarify the machine learning model [23]. Although individuals diagnosed with NVSS were included, the key aspect of the machine learning model is to extract and explain the factors influencing the differences between this group and the normal population. By leveraging rich clinical data, targeted statistical analyses will explore the combined impacts of sleep, diet, and behavior on children with NVSS. The findings from this research will provide scientific evidence to support the diagnosis and prevention of normal-variant short stature.
Materials and methods
Participants and study design
From April to September 2021, children with normal-variant short stature were selected from the Endocrinology Department of Nanjing Children’s Hospital of Nanjing Medical University in Nanjing, China. The control group consisted of age-matched normal children without NVSS who attended the hospital for routine medical care.
The diagnostic process for NVSS involves several steps to ensure accurate assessment and exclusion of other potential pathological causes. First, the physician takes a detailed history and reviews the child’s growth and development history to establish whether the growth pattern is normal. Next, the physician performs a physical examination and compares the data with the “Height and weight standardized growth charts for Chinese children and adolescents aged 0 to 18 years” [24]. This comparison helps identify whether the child’s height falls within the normal range of growth for their age and gender. Additionally, X-ray examinations are conducted to assess bone age, determining whether skeletal development aligns with chronological age. After excluding other potential causes of short stature, the physician performs a comprehensive hormonal evaluation, including the assessment of growth hormone levels, along with other routine tests such as CBC, CMP, inflammatory markers, karyotype analysis, and genetic testing to ensure there are no underlying endocrine disorders, nutritional deficiencies, or chronic diseases. Combining all collected information with the insights of pediatric endocrinology specialists, the physician arrives at a diagnosis that differentiates between “normal variants” and pathological short stature. Finally, children who do not meet the established diagnostic criteria, as well as those with missing data, are excluded, ensuring that only children with normal-variant short stature are included in the case group.
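The growth-chart comparison in this process can be illustrated with a small sketch. It converts a measured height into a standard deviation score and applies the two cutoffs quoted in the Introduction (below P3, or more than two SDs below the mean). The reference mean and SD used here are hypothetical placeholders, not values from the Chinese growth charts [24], and the sketch assumes heights are normally distributed at a given age and sex, which real charts only approximate.

```python
from statistics import NormalDist


def height_sds(height_cm, ref_mean_cm, ref_sd_cm):
    """Standard deviation score of a height against an age- and sex-specific reference."""
    return (height_cm - ref_mean_cm) / ref_sd_cm


def is_short_stature(height_cm, ref_mean_cm, ref_sd_cm):
    """Apply both cutoffs from the definition: below P3, or more than 2 SDs below the mean."""
    z = height_sds(height_cm, ref_mean_cm, ref_sd_cm)
    # z-score of the third percentile under the normality assumption (about -1.88),
    # so the P3 cutoff is the less strict of the two criteria.
    p3_cutoff = NormalDist().inv_cdf(0.03)
    return z < p3_cutoff or z < -2.0


# Hypothetical reference values for one age/sex group (assumption, not from [24]).
REF_MEAN_CM, REF_SD_CM = 128.0, 5.5

z = height_sds(116.0, REF_MEAN_CM, REF_SD_CM)
print(round(z, 2), is_short_stature(116.0, REF_MEAN_CM, REF_SD_CM))
```

A score like this only flags a child for further work-up; the bone age assessment, hormonal evaluation, and exclusion of pathological causes described above are still needed before NVSS is diagnosed.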
Following age matching, this case‒control study ultimately included 300 participants at a case-to-control ratio of 1:2 (100 patients with NVSS and 200 controls), all of whom were term infants at birth. This study followed the principles outlined in the Helsinki Declaration and received approval from the Institutional Review Board of Nanjing Children’s Hospital. We obtained written informed consent from the legal guardians of each child participating in this study.
Data collection
Through face-to-face interviews with the children’s caregivers, information about the study participants was collected via a structured questionnaire. The questionnaire comprised five sections: (1) General information about the children, including gender, age, height, race/ethnicity, place of residence, parity, preterm birth, delivery method, and caregiver’s educational background. (2) Anthropometric measurements, including child height and weight, parents’ heights and weights, and history of early puberty. The children were weighed while lightly dressed and barefoot. (3) Dietary habits, including feeding method after birth, consumption of fried foods, barbecued foods, foreign fast foods, and soft drinks, vitamin D supplementation, and the material of the children’s bowls. (4) Sleep habits, including bedtime, the average hours of sleep per day, the hours of sleep during the day (from 07:00 to 19:00) and at night (from 19:00 to 07:00), the number of night wakings, the time spent tucking in, and snoring. (5) Parenting habits, including the time the child spent on screen devices, the duration and intensity of outdoor activities, and secondhand smoke exposure. The structured questionnaire was designed on the basis of existing relevant studies to capture potential factors influencing child health and development [14,15,16,17]. Before data collection commenced, researchers underwent training to ensure the quality of the clinical research practices. The data were entered by professional staff with uniform coding, with one person entering the data and another verifying them. The questionnaire was pre-tested in a small sample to identify any ambiguities or biases, and adjustments were made accordingly.
Although age and height are essential characteristics of children, they are strongly correlated with NVSS status, which can lead to inaccurate model estimation and unstable coefficients [25]; therefore, these variables were not used in model development or logistic regression.
Statistical analysis
We used SPSS statistical software 23.0 to analyze the data collected from the questionnaires to explore the etiology of NVSS. Frequencies and percentages were used for categorical variables, and means ± SDs were used for continuous variables. To assess the statistically significant differences between cases and controls, we used Pearson’s chi-square test (χ2) or Fisher’s exact test for categorical variables and Student’s t test for continuous variables. Univariate and multivariate conditional logistic regression analyses were performed to assess the effects of environmental factors and genetic and socioeconomic factors on the NVSS of children. In the univariate conditional logistic regression analysis, all variables that showed significant differences (at the P < 0.05 level) between the case and control groups were included, and only those variables that remained statistically significant (at the P < 0.05 level) were retained in the multivariate conditional logistic regression.
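To make the conditional logistic regression step more concrete, the sketch below writes out the conditional likelihood for matched sets with a single covariate and maximizes it numerically. It is an illustration only: the analysis in this study was run in SPSS, and the function names, the ternary-search optimizer, and the toy data (maternal height in cm, centered at 160) are all assumptions made for this example.

```python
import math


def conditional_log_likelihood(beta, matched_sets):
    """Sum over matched sets of log[exp(beta * x_case) / sum_j exp(beta * x_j)].

    Each matched set is (x_case, [x_control, ...]) for one covariate x.
    """
    total = 0.0
    for x_case, x_controls in matched_sets:
        xs = [x_case] + list(x_controls)
        m = max(beta * x for x in xs)  # subtract the max for numerical stability
        log_denominator = m + math.log(sum(math.exp(beta * x - m) for x in xs))
        total += beta * x_case - log_denominator
    return total


def fit_beta(matched_sets, lo=-5.0, hi=5.0, iterations=200):
    """Maximize the concave log-likelihood with a ternary search on [lo, hi]."""
    for _ in range(iterations):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if conditional_log_likelihood(a, matched_sets) < conditional_log_likelihood(b, matched_sets):
            lo = a
        else:
            hi = b
    return (lo + hi) / 2


# Toy 1:2 matched sets (hypothetical data): (case value, [control values]).
toy_sets = [(-4, [1, 3]), (-2, [0, 2]), (1, [-1, 4]),
            (-3, [2, -2]), (0, [3, 1]), (2, [-1, 1])]

beta_hat = fit_beta(toy_sets)
print("beta:", round(beta_hat, 3), "OR per unit:", round(math.exp(beta_hat), 3))
```

Because each case is compared only with the controls in its own matched set, the matching variable (age here) drops out of the likelihood, which is why a conditional rather than an ordinary logistic model suits a matched design.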
A predictive model was constructed to predict NVSS using the aforementioned 33 collected factors. Nine machine learning models were employed: logistic regression (LR), k-nearest neighbors (KNN), decision tree (DT), naive Bayes (NB), random forest (RF), multinomial naive Bayes (MNB), extreme gradient boosting (XGBoost), support vector machine (SVM), and gradient boosting machine (GBM). These models were chosen for their diverse methodologies and strengths in handling various aspects of the dataset; details can be found in Supplementary Table 1. We constructed the RF model via the randomForest package (version 4.6-14) in R (version 4.0.1). The area under the receiver operating characteristic (ROC) curve, known as the AUC, was used to evaluate the discriminatory ability of these models. The ROC curve represents the trade-off between sensitivity (true positive rate) and the false positive rate (1 − specificity) across thresholds. The AUC quantifies the model’s ability to discriminate between positive and negative classes, with values ranging from 0.5 (no discrimination) to 1.0 (perfect discrimination) [26]. After the optimal model was identified, SHAP values were used to assist in factor selection by ranking the importance of the 33 factors. This process reduced the features to a final set of 20, which were chosen on the basis of their significance across the models [27]. SHAP quantifies how each factor contributes to the model’s output, providing a consistent framework to identify the top 20 most impactful features across different models and ensuring that the final predictive model is both robust and interpretable in predicting NVSS. This process was conducted via Python version 3.6.5.
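As an illustration of how the ROC curve and AUC described above are obtained from predicted scores, the following minimal sketch sweeps every threshold, records the false positive rate (1 − specificity) and true positive rate (sensitivity), and integrates with the trapezoidal rule. The labels and scores are made-up values, and the function names are hypothetical; this is not the evaluation code used in the study.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs over all thresholds."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # A sample is predicted positive when its score reaches the threshold.
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
        points.append((fp / negatives, tp / positives))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up example: 1 = NVSS case, 0 = control; scores are predicted risks.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]

print(auc_trapezoid(roc_points(labels, scores)))
```

In this toy example, 15 of the 16 possible case-control pairs are ranked correctly, so the AUC is 15/16 = 0.9375, which matches the interpretation of the AUC as the probability that a randomly chosen case receives a higher score than a randomly chosen control.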
Results
Comparison of essential characteristics between the Short-Stature and control groups
This case‒control study included 100 children with NVSS and 200 normal children (Table 1). The proportions of boys were 54% and 56%, with mean ages of 8.29 ± 2.53 years and 7.96 ± 0.50 years in the case and control groups, respectively. Sex (P = 0.74) and age (P = 0.21) did not differ between the two groups. However, there were differences in place of residence (P < 0.01) and caregiver education (P < 0.01) between the two groups.
Comparison of the anthropometric measurements between the Short-Stature and control groups
The case and control groups presented differences in terms of child weight (t = -4.60, P < 0.01), maternal height (t = -10.17, P < 0.01), maternal weight (t = -3.17, P < 0.01), paternal height (t = -9.73, P < 0.01), and paternal weight (t = -4.95, P < 0.01) (Table 2).
Comparison of the dietary habits between the Short-Stature and control groups
The comparison of dietary habits between the two groups revealed differences in the consumption of barbecued food (χ2 = 5.61, P = 0.02) and foreign fast food (χ2 = 7.61, P = 0.01) between the case and control groups (Table 3).
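For readers less familiar with the χ2 statistics reported in these comparisons, the sketch below computes Pearson's chi-square for a 2 × 2 table and its P value. The cell counts are hypothetical and are not taken from Table 3; the P value formula applies only to one degree of freedom, which is an assumption about the table layout.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows = cases/controls, columns = eats the food yes/no.
statistic, p_value = chi_square_2x2(40, 60, 110, 90)
print(round(statistic, 2), round(p_value, 4))
```

Under the same one-degree-of-freedom assumption, a statistic of 5.61 corresponds to a P value of about 0.018, which is consistent with the rounded P = 0.02 reported for barbecued food.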
Comparison of sleep habits between the Short-Stature and control groups
Table 4 presents the differences in sleep habits between the short stature group and the control group, with a difference in the frequency of nocturnal awakening (P < 0.01, Fisher’s exact test).
Comparison of the parenting habits between the Short-Stature and control groups
Table 5 shows the comparison of parenting habits between the two groups, revealing a difference in the time children spent outdoors (χ2 = 11.17, P = 0.01).
Logistic regression
To reduce the possibility of missing potentially important variables, we incorporated variables that revealed statistically significant differences between cases and controls at P < 0.01 into the logistic regression analysis. Certain variables, although not reaching the level of significance in the univariate analysis, may be significant in the multivariate analysis, and to avoid missing suspicious factors, we also included all variables in the multivariate logistic regression.
In the univariate conditional logistic regression analysis (Table 6), child weight was associated with a reduced risk of NVSS, with an odds ratio (OR) of 0.92 (95% CI: 0.88, 0.96), indicating that each unit increase in child weight corresponds to an 8% decrease in the odds of NVSS. Maternal height and weight also showed significant associations with NVSS, with ORs of 0.77 (95% CI: 0.72, 0.83) and 0.95 (95% CI: 0.92, 0.98), respectively. Thus, each unit increase in maternal height is linked to a 23% reduction in the odds of NVSS, while each unit increase in maternal weight corresponds to a 5% decrease in the odds. Similarly, paternal height exhibited an OR of 0.77 (95% CI: 0.72, 0.82), reinforcing the genetic influence on short stature, with each unit increase linked to a 23% decrease in odds. Paternal weight, while showing a smaller effect (OR = 0.94, 95% CI: 0.91, 0.96), still revealed a significant association, with each unit increase corresponding to a 6% reduction in the odds of NVSS. Furthermore, eating barbecued food (OR = 0.56, 95% CI: 0.34, 0.91) was associated with a 44% decrease in the odds of NVSS compared with not eating it. Similarly, the intake of foreign fast food was associated with a reduced risk of NVSS, with an OR of 0.48 (95% CI: 0.28, 0.81), corresponding to 52% lower odds of NVSS. Children’s time spent on screen devices was also significantly associated with NVSS, with those spending less than 30 min showing an OR of 2.75 (95% CI: 1.02, 7.40), those spending 1–2 h showing an OR of 2.93 (95% CI: 1.06, 8.11), and those spending more than 2 h having an OR of 6.44 (95% CI: 1.63, 25.51). That is, spending less than 30 min and 1–2 h on screen devices was associated with 2.75 and 2.93 times the odds of NVSS, respectively, compared with the reference group.
Similarly, spending more than 2 h corresponded to 6.44 times the odds of NVSS compared with children who do not engage in such screen time. This suggests a link between increased screen time and a higher risk of NVSS, although the confidence intervals are wide. Additionally, time spent outdoors was associated with lower odds of NVSS: children engaging in outdoor activities for less than 1 h had an OR of 0.33 (95% CI: 0.16, 0.69), those spending 1–3 h had an OR of 0.34 (95% CI: 0.16, 0.74), and those spending more than 3 h had an OR of 0.17 (95% CI: 0.03, 0.90). These ORs correspond to 67%, 66%, and 83% lower odds for less than 1 h, 1–3 h, and more than 3 h of outdoor activity, respectively, suggesting that increased outdoor activity is linked to a lower risk of NVSS.
In the multivariable conditional logistic regression analysis (Table 6), child weight (OR = 0.92, 95% CI: 0.86, 0.99) again showed an 8% reduction in odds with each unit increase, which is clinically relevant for monitoring children’s growth. Maternal height (OR = 0.79, 95% CI: 0.72, 0.87) was associated with 21% lower odds with each unit increase, suggesting that taller mothers may contribute to better growth outcomes for their children. Paternal height (OR = 0.83, 95% CI: 0.75, 0.91) retained a similar interpretation as in the univariate analysis, emphasizing the consistent influence of parental height. Sufficient nighttime sleep duration (OR = 0.48, 95% CI: 0.26, 0.89) was also protective, suggesting that children who get sufficient sleep have better growth outcomes than those who do not. Additionally, more than 3 h of outdoor activity per day (OR = 0.02, 95% CI: 0.00, 0.66) was associated with a substantial reduction in the odds of NVSS, emphasizing the importance of physical activity in child development.
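The percentage statements in this section follow directly from the odds ratios. The short sketch below shows the arithmetic: the percentage change in odds is (OR − 1) × 100, the OR for a k-unit difference is OR raised to the power k, and the standard error of the log-odds coefficient can be recovered from a 95% CI. The helper names are hypothetical, and the measurement unit (assumed here to be 1 cm for height) is not stated explicitly in the text.

```python
import math


def percent_change_in_odds(odds_ratio):
    """Percentage change in the odds per one-unit increase in the predictor."""
    return (odds_ratio - 1.0) * 100.0


def odds_ratio_for_units(odds_ratio, units):
    """Odds ratio for a difference of `units` in the predictor (multiplicative on the odds scale)."""
    return odds_ratio ** units


def standard_error_from_ci(lower, upper, z=1.96):
    """Standard error of the log odds ratio recovered from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Multivariable estimates quoted in the text.
print(round(percent_change_in_odds(0.92)))         # child weight
print(round(percent_change_in_odds(0.79)))         # maternal height
print(round(odds_ratio_for_units(0.79, 5), 3))     # maternal height, 5-unit difference
print(round(standard_error_from_ci(0.72, 0.87), 3))
```

Because odds ratios multiply, a per-unit OR of 0.79 compounds to roughly 0.31 over a five-unit difference in maternal height, that is, about 69% lower odds rather than 5 × 21%.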
Model selection and explanation
The collected data were used to develop nine machine learning models to predict the onset of NVSS. Each model was trained and evaluated via a stratified k-fold cross-validation approach to ensure robustness and mitigate overfitting. The performance of each model was assessed via the AUC. The RF and GBM models emerged as the top predictors, each achieving an AUC of 0.95. They were closely followed by the XGBoost model with an AUC of 0.93 and the KNN model with an AUC of 0.92 (Fig. 1). The SHAP plot (Fig. 2) was used to interpret the output of the RF model, illustrating the contribution of each feature to the model’s predictions. In the SHAP plot, the values on the x-axis represent the SHAP values, which quantify the impact of each feature on the predicted risk of NVSS. A positive SHAP value pushes the prediction toward NVSS, a negative value pushes it away, and a larger absolute value indicates a greater contribution. The color coding shows the value of the feature itself: red indicates high feature values and blue indicates low feature values, so the direction of a feature’s effect is read from where the red and blue dots fall along the x-axis. The plot revealed that paternal height was the strongest predictor of the risk of short stature, followed by maternal height, caregiver education, and child weight. This visualization not only highlights the most influential variables but also provides insights into their respective effects.
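The stratified k-fold procedure mentioned above can be sketched as follows. The idea is to split each class separately so that every fold keeps the 1:2 case-to-control ratio. The number of folds is not reported in the text, so k = 5 is an assumption, as are the function name and the random seed.

```python
import random


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) with class proportions preserved in each fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    for test_fold in folds:
        test = sorted(test_fold)
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        yield train, test


# 100 cases (1) and 200 controls (0), as in the study; k = 5 is an assumption.
labels = [1] * 100 + [0] * 200
for train, test in stratified_k_fold(labels, k=5):
    cases_in_test = sum(labels[i] for i in test)
    print(len(train), len(test), cases_in_test)
```

With these numbers, every fold holds out 60 children (20 cases and 40 controls) and trains on the remaining 240. For matched data, one might additionally keep each matched set within a single fold; that refinement is not shown here.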
Fig. 1 Performance of machine learning models to predict normal-variant short stature. This figure shows the receiver operating characteristic (ROC) curves for the machine learning models used to predict normal-variant short stature. The ROC curve plots the true positive rate (sensitivity) against the false positive rate (1 − specificity) at various threshold settings. The area under the curve (AUC) for each model is also provided, indicating the model’s overall performance. Higher AUC values represent better model performance.
Fig. 2 SHAP plot of the importance of factors. This figure displays the SHAP (SHapley Additive exPlanations) plot, which illustrates the importance of various factors in the machine learning model’s predictions. Each dot represents a SHAP value for a particular feature for a specific observation. The color of each dot indicates the feature’s value (e.g., high or low). Features are ranked by their importance, with the most important features at the top. This plot helps in understanding how each feature contributes to the model’s output.
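To show what a single SHAP value represents, the sketch below computes exact Shapley values for one child under a small hand-written risk score. It averages each feature's marginal contribution over all feature orderings, with absent features set to a baseline value. The risk function, its coefficients, and the baseline are hypothetical and are not the fitted random forest from this study; the SHAP library itself uses faster, model-specific algorithms rather than this brute-force enumeration.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values: mean marginal contribution over all feature orderings.

    Features not yet added to the coalition keep their baseline value.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for ordering in orderings:
        current = dict(baseline)
        previous_output = predict(current)
        for name in ordering:
            current[name] = instance[name]
            output = predict(current)
            totals[name] += output - previous_output
            previous_output = output
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_risk(x):
    """Hypothetical risk score (assumption), with an interaction when both parents are short."""
    score = 0.5
    score -= 0.02 * (x["paternal_height"] - 170)
    score -= 0.02 * (x["maternal_height"] - 158)
    score -= 0.01 * (x["child_weight"] - 25)
    if x["paternal_height"] < 165 and x["maternal_height"] < 155:
        score += 0.1
    return score


baseline = {"paternal_height": 170, "maternal_height": 158, "child_weight": 25}
child = {"paternal_height": 162, "maternal_height": 152, "child_weight": 22}

phi = shapley_values(toy_risk, child, baseline)
for name, value in phi.items():
    print(name, round(value, 3))
```

The values sum to the difference between the child's score and the baseline score, and the interaction term is split equally between the two parental heights. Each dot in a SHAP summary plot is one such per-child, per-feature value.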
Discussion
In this case‒control study, we used a machine learning approach to identify the optimal model for predicting NVSS, explained it through a SHAP plot, and investigated the influence of environmental growth factors on NVSS. We found that the random forest model and gradient boosting machine model were the most effective predictors and that characteristics such as parental height, caregiver education, and child weight significantly impacted the prediction of NVSS. Multivariate conditional logistic regression revealed that child weight, maternal height, paternal height, sufficient nighttime sleep duration, and children’s outdoor activity time of more than 3 h per day were protective factors related to a lower risk of developing NVSS.
In the present study, child weight and parental height were protective factors associated with a reduced risk of childhood short stature in both univariate and multivariate regression analyses. A study by Li et al. revealed that birth weight and a family history of short stature were significantly associated with an increased risk for the development of childhood short stature [28]. A study by Huang et al. reported similar findings [5]. Previous studies have demonstrated that genetics significantly influences the growth and development of children and that individuals with short stature in their immediate family are much more likely to develop this disorder than those without a family history of short stature [29, 30]. While early menarche in mothers has been associated with certain growth patterns in offspring, it is essential to acknowledge the complexity of the relationship between early puberty and child growth. Early menarche in mothers may reduce the occurrence of childhood stunting due to favorable genetic backgrounds and superior living conditions [31,32,33,34]. Some studies indicate that early reproductive maturity can lead to an initial increase in height during childhood, followed by an accelerated growth spurt. However, this can often result in a premature cessation of growth, ultimately leading to a shorter adult stature. For example, research has shown that early menarche is linked to shorter adult height due to this pattern of growth dynamics [35, 36]. Moreover, the assertion that early menarche reflects favorable growing conditions is contentious and requires careful consideration of socioeconomic and demographic factors. In particular, populations such as Black/African Americans and Hispanics, who traditionally face significant health disparities and socioeconomic challenges, tend to experience an earlier onset of puberty compared to non-Hispanic whites.
This relationship complicates the notion that early menarche is solely indicative of a favorable environment, as these groups often have less access to quality healthcare, nutrition, and supportive resources that are critical for optimal growth and development. Therefore, when evaluating the impact of early menarche on growth and development, it is crucial to consider these broader contextual factors, as they can significantly influence both the timing of puberty and the overall growth trajectory of children. A nuanced understanding of these relationships can lead to more informed clinical practices aimed at supporting healthy growth among diverse populations.
Aside from genetic factors, changes in the environment and lifestyle may also play a significant role in the growth and development of children. Recent trends indicate a significant increase in beverage consumption among children, which may contribute to rising obesity rates. From 1977-78 to 2003-06, average daily energy intake from beverages rose by 203 kcal, while portion sizes increased by 97 g. Although overall food energy consumption has also increased, the steady rise in beverage consumption, particularly of beverages with high energy density, suggests that these drinks may play a crucial role in the obesity epidemic [37]. A parallel meta-analysis of cohort studies found that children with high intake levels of sugar-sweetened beverages had a 55% increased risk of being overweight or obese compared to those with lower intake levels (95% CI 32%-82%) [38]. This elevated risk is concerning because obesity can disrupt the endocrine system, particularly leading to imbalances in insulin and growth hormone secretion [39]. Such hormonal imbalances can adversely affect the normal development of bones and soft tissues [40]. We also found that regular outdoor activity was associated with a lower risk of childhood NVSS, possibly because outdoor activity helps promote healthy bone growth and increased bone density [41]. First, exercise stimulates the secretion of growth hormone and supports vertical bone growth, thereby promoting children’s height development [42, 43]. Second, outdoor activity increases children’s exposure to sunlight, which helps them synthesize vitamin D, which is essential for calcium absorption and mineralization in bones [44, 45]. In addition, outdoor activities promote the health of the cardiovascular system [46] and improve blood circulation throughout the body [47, 48], which facilitates the delivery and absorption of various nutrients and enhances the body’s metabolic level [49].
Research shows that obese children have an average whole-body bone mineral content (BMC) of 500 g, significantly higher than the 400 g observed in normal-weight children (p < 0.01) [50]. Additionally, lumbar BMC values are 300 g for obese children compared to 250 g for normal-weight children (p < 0.05) [51], further supporting the association between increased body fat and elevated BMC. This indicates that obese children bear greater skeletal loads due to weight gain, which may restrict bone growth velocity and lead to skeletal deformities [52, 53]. Furthermore, obesity is often accompanied by chronic low-grade inflammation, characterized by elevated pro-inflammatory markers, which exacerbates insulin resistance and the development of metabolic syndrome [54,55,56]. This inflammation may also interfere with growth hormone signaling and inhibit osteoblast function, further affecting skeletal health [57, 58]. Therefore, appropriate dietary management and regular physical activity are crucial for preventing short stature.
Adequate nocturnal sleep also plays a crucial role in supporting growth-related endocrine regulation, metabolic homeostasis, and inflammatory modulation, thus reducing the risk of short stature. Growth hormone (GH) secretion, which predominantly occurs during slow-wave sleep (SWS), is essential for stimulating insulin-like growth factor 1 (IGF-1) production and promoting linear growth [11, 59, 60]. Sleep deprivation disrupts glucose metabolism, increases cortisol levels, and alters leptin/ghrelin ratios, diverting energy from growth processes [61,62,63]. Additionally, prolonged sleep deficiency elevates pro-inflammatory cytokines, potentially suppressing growth plate activity [64, 65]. Sleep also affects the hypothalamic-pituitary-gonadal axis, influencing pubertal timing and growth plate closure [66, 67].
Machine learning techniques are powerful computational approaches for handling complex and extensive data and are capable of managing highly variable datasets and comprehending intricate relationships between variables in a flexible and trainable manner [68]. Among the nine ML models, the random forest model and gradient boosting machine model achieved the highest AUC value. RF integrates a set of decision trees through majority voting and is widely recognized as an effective classification model [69]. While incorporating additional factors can provide richer information for the predictive model, introducing too many features—especially those without direct causal relationships to the outcome—can lead to overfitting. Overfitting occurs when the model becomes overly tuned to the noise in the training data, which in turn diminishes its ability to generalize to new data. To mitigate this risk, we employed the SHAP method to help select the most relevant factors [27]. The RF model serves as a straightforward and convenient machine learning predictive tool aimed at enhancing clinical decision-making for children with NVSS. ML technology has often been criticized as a “black box,” offering limited insight into how predictions are generated [70]. This concern may lead clinicians to hesitate in adopting such models for medical decisions owing to the opacity of their predictions. To mitigate this concern, we utilized the SHAP method to improve interpretability. SHAP allows us to explain how the model generates predictions by identifying the contribution of each feature for individual cases, thus making the model more transparent and clinically relevant. However, further work is needed to fully describe the model-building process and the role that SHAP plays in enhancing clinical applicability.
This case‒control study identified predictors of short stature, emphasizing the significant influence of parental height on child height, as shown in the nomogram prediction model. Childhood height growth involves both genetic and environmental factors. The heritability of height ranged from 0.75 to 0.98 in an analysis of 6,752 individuals across 2,508 families [29]. Genetic factors significantly influence a child’s height, with the probability of NVSS decreasing as parental height increases. Therefore, providing early screening, regular follow-up, and timely treatment for children with shorter parents is crucial. Moreover, this study corroborates previous research indicating that environmental growth factors independently contribute to short stature onset in children [71,72,73,74]. Caregiver education level was moderately yet significantly linked to the risk of short stature in children across all the nomogram prediction models. In a longitudinal cohort of 10,127 children evaluated by Ghajar et al. [75], higher caregiver education levels were associated with higher height-for-age z scores (HAZ), which aligns with findings from previous studies [5, 76]. One hypothesis is that a family structure characterized by higher levels of parental education, higher family income, and higher socioeconomic status is associated with better nutritional status for children.
In this study, our prediction model aims to identify factors associated with normal-variant short stature, particularly focusing on the risk of growth deceleration. While our analysis is cross-sectional and examines data at a single point in time, we intend for the model to serve as a tool for identifying children who may be at risk of falling off their growth trajectory in the future. By assessing key indicators such as parental height, children’s weight, and environmental factors, the model can provide insights into potential growth patterns. Although we do not have longitudinal data in this study, the factors identified can help clinicians recognize children who may require closer monitoring for growth issues, thereby facilitating timely treatment if growth deceleration is observed over time.
The strength of this study lies in the matching of cases and controls on basic characteristics, including gender, age, ethnicity, parity, preterm birth, and method of delivery, which ensures consistency in these characteristics and reduces the impact of confounding variables and selection bias. Additionally, researchers underwent training before data collection to ensure high-quality clinical research practices. Comprehensive information, including basic information, anthropometric characteristics, diet, sleep patterns, and behavior, was collected. This study identified the effects of multiple environmental growth factors on NVSS. More importantly, given the complex etiology of short stature, this paper explores an explainable disease prediction model built with machine learning methods, which provides clinicians with a reliable theoretical basis for diagnosing and treating NVSS. Our model can assist clinicians in differentiating between cases that may require intervention and those that are consistent with normal growth patterns based on genetic potential.
This study also has several limitations. (1) The regional sampling source may not be nationally representative. Owing to the limitations of the sample, the case and control groups were imbalanced in place of residence and caregiver education level; further sampling across regions and ethnicities is needed to ensure the accuracy and generalizability of the findings. (2) Most information, such as anthropometric measurements, children’s eating and sleeping habits, and parenting habits, was obtained through interview questionnaires rather than direct measurement, potentially introducing recall bias. (3) The sample size was small, and separate analyses were not conducted by gender; a larger study is needed to obtain a comprehensive understanding of children’s growth and development history and to maximize the discovery of potential disease factors. (4) This case‒control study cannot clarify causal relationships; a large cohort study is needed to comprehensively investigate the factors affecting NVSS.
Conclusions
Our findings suggest that child weight, parental height, sufficient nighttime sleep duration, and outdoor activity of more than 3 h per day are protective factors against normal-variant short stature in children. The random forest and gradient boosting machine models showed the best discriminatory ability among the nine machine learning models, demonstrating their potential for clinical applications. Parental height, caregiver education, and child weight were the most important predictors. These findings can help in the development of public health strategies to prevent and manage the onset of normal-variant short stature in children.
Data availability
The original data for this study were obtained from electronic medical records at the hospital and through conversations with the parents of children receiving treatment, rather than from third-party sources. We have attached the collected data as supplementary material (Dataset-file).
References
Rani D, et al. Short stature. In: StatPearls. Treasure Island (FL): StatPearls Publishing; 2024.
Ranabothu S, Kaskel FJ. Validation of automated Greulich-Pyle bone age determination in children with chronic renal failure? Pediatr Nephrol. 2015;30(7):1051–2.
Murano MC, Feldt MM, Lantos JD. Parental concerns on short stature: A 15-Year Follow-Up. J Pediatr. 2020;220:237–40.
Hoover-Fong J, et al. Blood pressure in adults with short stature skeletal dysplasias. Am J Med Genet A. 2020;182(1):150–61.
Huang S, et al. Analysis of risk factors and construction of a prediction model for short stature in children. Front Pediatr. 2022;10:1006011.
Hoyer-Kuhn H, et al. Comparison of DXA scans and conventional X-rays for spine morphometry and bone age determination in children. J Clin Densitom. 2016;19(2):208–15.
Ranke MB. The Kabi pharmacia international growth study: aetiology classification list with comments. Acta Paediatr Scand Suppl. 1991;379:87–92.
Song KC, et al. Etiologies and characteristics of children with chief complaint of short stature. Ann Pediatr Endocrinol Metab. 2015;20(1):34–9.
Saengkaew T, McNeil E, Jaruratanasirikul S. Etiologies of short stature in a pediatric endocrine clinic in Southern Thailand. J Pediatr Endocrinol Metab. 2017;30(12):1265–70.
Wang Q, et al. The epidemic characteristics of short stature in school students. Ital J Pediatr. 2015;41:99.
El Halal CDS, Nunes ML. Sleep and weight-height development. J Pediatr (Rio J). 2019;95(Suppl 1):2–9.
Noeker M. Management of idiopathic short stature: psychological endpoints, assessment strategies and cognitive-behavioral intervention. Horm Res. 2009;71(Suppl 1):75–81.
Di Renzo L, Gualtieri P, De Lorenzo A. Diet, nutrition and chronic degenerative diseases. Nutrients. 2021;13(4).
Senbanjo IO, et al. Prevalence of and risk factors for stunting among school children and adolescents in Abeokuta, Southwest Nigeria. J Health Popul Nutr. 2011;29(4):364–70.
Li Z, et al. Factors associated with child stunting, wasting, and underweight in 35 Low- and Middle-Income countries. JAMA Netw Open. 2020;3(4):e203386.
Lin YJ, et al. Genetic architecture associated with familial short stature. J Clin Endocrinol Metab. 2020;105(6).
Wu S. Role of medical IoT-based bone age determination in the diagnosis and clinical treatment of dwarfism disease monitoring. Contrast Media Mol Imaging. 2022;2022:7247932.
Martin DD, Schittenhelm J, Thodberg HH. Validation of adult height prediction based on automated bone age determination in the Paris longitudinal study of healthy children. Pediatr Radiol. 2016;46(2):263–9.
Miller DD, Brown EW. Artificial intelligence in medical practice: the question to the answer? Am J Med. 2018;131(2):129–33.
Koyner JL, et al. Development of a multicenter Ward-Based AKI prediction model. Clin J Am Soc Nephrol. 2016;11(11):1935–43.
Churpek MM, et al. Internal and external validation of a machine learning risk score for acute kidney injury. JAMA Netw Open. 2020;3(8):e2012892.
Gao W, et al. Prediction of acute kidney injury in ICU with gradient boosting decision tree algorithms. Comput Biol Med. 2022;140:105097.
Lundberg S, Lee S. A unified approach to interpreting model predictions. In: 31st Conference on Neural Information Processing Systems (NIPS 2017). CA, USA; 2017.
Li H, et al. [Height and weight standardized growth charts for Chinese children and adolescents aged 0 to 18 years]. Zhonghua Er Ke Za Zhi. 2009;47(7):487–92.
Fredriks AM, et al. Nationwide age references for sitting height, leg length, and sitting height/height ratio, and their diagnostic value for disproportionate growth disorders. Arch Dis Child. 2005;90(8):807–12.
Nahm FS. Receiver operating characteristic curve: overview and practical use for clinicians. Korean J Anesthesiol. 2022;75(1):25–36.
Lee BK, et al. A principal odor map unifies diverse tasks in olfactory perception. Science. 2023;381(6661):999–1006.
Li Y, et al. Analysis of prevalence, influencing factors, and countermeasures of short stature in children and adolescents aged 6–14 in Furong District, Changsha City, in 2020. Evid Based Complement Alternat Med. 2021;2021:3933854.
Wu X, et al. Combined analysis of genomewide scans for adult height: results from the NHLBI family blood pressure program. Eur J Hum Genet. 2003;11(3):271–4.
Wu S, et al. A retrospective analysis of patients with short stature in the South of China between 2007 and 2015. Biomed Res Int. 2018;2018:5732694.
Prendergast AJ, Humphrey JH. The stunting syndrome in developing countries. Paediatr Int Child Health. 2014;34(4):250–65.
Hoddinott J, et al. Adult consequences of growth failure in early childhood. Am J Clin Nutr. 2013;98(5):1170–8.
Black RE, et al. Maternal and child undernutrition and overweight in low-income and middle-income countries. Lancet. 2013;382(9890):427–51.
Christian P, et al. Nutrition and maternal, neonatal, and child health. Semin Perinatol. 2015;39(5):361–72.
Perry JR, et al. Parent-of-origin-specific allelic associations among 106 genomic loci for age at menarche. Nature. 2014;514(7520):92–7.
Day FR, et al. Puberty timing associated with diabetes, cardiovascular disease and also diverse health outcomes in men and women: the UK biobank study. Sci Rep. 2015;5:11208.
Duffey KJ, Popkin BM. Energy density, portion size, and eating occasions: contributions to increased energy intake in the United States, 1977–2006. PLoS Med. 2011;8(6):e1001050.
Hu FB. Resolved: there is sufficient scientific evidence that decreasing sugar-sweetened beverage consumption will reduce the prevalence of obesity and obesity-related diseases. Obes Rev. 2013;14(8):606–19.
Braun JM. Early-life exposure to EDCs: role in childhood obesity and neurodevelopment. Nat Rev Endocrinol. 2017;13(3):161–73.
Zhu K, et al. Associations between body mass index, lean and fat body mass and bone mineral density in middle-aged Australians: the Busselton healthy ageing study. Bone. 2015;74:146–52.
Weaver CM, et al. The National Osteoporosis Foundation’s position statement on peak bone mass development and lifestyle factors: a systematic review and implementation recommendations. Osteoporos Int. 2016;27(4):1281–386.
Bidlingmaier M, et al. Reference intervals for insulin-like growth factor-1 (IGF-I) from birth to senescence: results from a multicenter study using a new automated chemiluminescence IGF-I immunoassay conforming to recent international recommendations. J Clin Endocrinol Metab. 2014;99(5):1712–21.
Poh BK, et al. Nutritional status and dietary intakes of children aged 6 months to 12 years: findings of the nutrition survey of Malaysian children (SEANUTS Malaysia). Br J Nutr. 2013;110(Suppl 3):S21–35.
Vimaleswaran KS, et al. Causal relationship between obesity and vitamin D status: bi-directional Mendelian randomization analysis of multiple cohorts. PLoS Med. 2013;10(2):e1001383.
de la Guía-Galipienso F, et al. Vitamin D and cardiovascular health. Clin Nutr. 2021;40(5):2946–57.
Lee IM, et al. Effect of physical inactivity on major non-communicable diseases worldwide: an analysis of burden of disease and life expectancy. Lancet. 2012;380(9838):219–29.
Green DJ, et al. Vascular adaptation to exercise in humans: role of hemodynamic stimuli. Physiol Rev. 2017;97(2):495–528.
Bailey DP, Locke CD. Breaking up prolonged sitting with light-intensity walking improves postprandial glycemia, but breaking up sitting with standing does not. J Sci Med Sport. 2015;18(3):294–8.
Pedersen BK, Saltin B. Exercise as medicine - evidence for prescribing exercise as therapy in 26 different chronic diseases. Scand J Med Sci Sports. 2015;25(Suppl 3):1–72.
Mosca LN, et al. Excess body fat negatively affects bone mass in adolescents. Nutrition. 2014;30(7–8):847–52.
Arabi A, et al. Sex differences in the effect of body-composition variables on bone mass in healthy children and adolescents. Am J Clin Nutr. 2004;80(5):1428–35.
Dimitri P, Wales JK, Bishop N. Fat and bone in children: differential effects of obesity on bone size and mass according to fracture history. J Bone Min Res. 2010;25(3):527–36.
Sioen I, et al. Associations between body composition and bone health in children and adolescents: A systematic review. Calcif Tissue Int. 2016;99(6):557–77.
Reilly SM, Saltiel AR. Adapting to obesity with adipose tissue inflammation. Nat Rev Endocrinol. 2017;13(11):633–43.
Gregor MF, Hotamisligil GS. Inflammatory mechanisms in obesity. Annu Rev Immunol. 2011;29:415–45.
Esser N, et al. Inflammation as a link between obesity, metabolic syndrome and type 2 diabetes. Diabetes Res Clin Pract. 2014;105(2):141–50.
Cao JJ. Effects of obesity on bone metabolism. J Orthop Surg Res. 2011;6:30.
Karsenty G, Ferron M. The contribution of bone to whole-organism physiology. Nature. 2012;481(7381):314–20.
Jenni OG, et al. Sleep duration from ages 1 to 10 years: variability and stability in comparison with growth. Pediatrics. 2007;120(4):e769–76.
Lampl M, Johnson ML. Infant growth in length follows prolonged sleep and increased naps. Sleep. 2011;34(5):641–50.
Huang JY, et al. Association of sleep patterns and respiratory disturbance index with physiological parameters in pediatric patients with self-perceived short stature. Diagnostics (Basel). 2024;14(15).
Benedict C, et al. Acute sleep deprivation reduces energy expenditure in healthy men. Am J Clin Nutr. 2011;93(6):1229–36.
Chen X, Beydoun MA, Wang Y. Is sleep duration associated with childhood obesity? A systematic review and meta-analysis. Obes (Silver Spring). 2008;16(2):265–74.
Gómez-González B, et al. Role of sleep in the regulation of the immune system and the pituitary hormones. Ann N Y Acad Sci. 2012;1261:97–106.
Taveras EM, et al. Chronic sleep curtailment and adiposity. Pediatrics. 2014;133(6):1013–22.
Berberoğlu M. Precocious puberty and normal variant puberty: definition, etiology, diagnosis and current management. J Clin Res Pediatr Endocrinol. 2009;1(4):164–74.
Jessen E, et al. Sleep timing in patients with precocious and delayed pubertal development. Clocks Sleep. 2019;1(1):140–50.
Jordan MI, Mitchell TM. Machine learning: trends, perspectives, and prospects. Science. 2015;349(6245):255–60.
Buri M, Hothorn T. Model-based random forests for ordinal regression. Int J Biostat. 2020.
Azodi CB, Tang J, Shiu SH. Opening the black box: interpretable machine learning for geneticists. Trends Genet. 2020;36(6):442–55.
Mendez N, et al. Ethnicity and income impact on BMI and stature of school children living in urban southern Mexico. J Biosoc Sci. 2016;48(2):143–57.
Grunauer M, Jorge AAL. Genetic short stature. Growth Horm IGF Res. 2018;38:29–33.
Lee WS, et al. Parental concern of feeding difficulty predicts poor growth status in their child. Pediatr Neonatol. 2019;60(6):676–83.
Zapata ME, Bibiloni MD, Tur JA. Prevalence of overweight, obesity, abdominal-obesity and short stature of adult population of Rosario, Argentina. Nutr Hosp. 2016;33(5):580.
Davallow Ghajar L, DeBoer MD. Environmental and birth characteristics as predictors of short stature in early childhood. Acta Paediatr. 2019;108(5):954–60.
Hancock C, Bettiol S, Smith L. Socioeconomic variation in height: analysis of National child measurement programme data for England. Arch Dis Child. 2016;101(5):422–6.
Acknowledgements
Not applicable.
Funding
This research was funded by the Nanjing Liuhe District Health and Technology Development Project (Project No. LHYB2024026).
Author information
Contributions
All the authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by Jiani Liu, Xin Zhang, and Wei Li. The first draft of the manuscript was written by Jiani Liu. Francis Manyori Bigambo, Xu Wang, Dandan Wang, and Beibei Teng commented on previous versions of the manuscript. All the authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Informed consent for participation was obtained from all participants in the study, and the research received approval from the Institutional Review Board of Nanjing Children’s Hospital.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Clinical trial number
Not applicable.
Footnotes
Not applicable.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Liu, J., Zhang, X., Li, W. et al. Explainable predictive models of short stature and exploration of related environmental growth factors: a case-control study. BMC Endocr Disord 25, 129 (2025). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12902-025-01936-x
DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12902-025-01936-x