Development and validation of inpatient mortality prediction models for patients with hyperglycemic crisis using machine learning approaches

Abstract

Background

Hyperglycemic crisis is one of the most common and severe complications of diabetes mellitus and is associated with a high mortality rate. Emergency admissions due to hyperglycemic crisis remain prevalent and challenging. This study aimed to develop and validate predictive models for in-hospital mortality risk among patients with hyperglycemic crisis admitted to the emergency department using various machine learning (ML) methods.

Methods

A multi-center retrospective study was conducted across six large general adult hospitals in Chongqing, western China. Patients diagnosed with hyperglycemic crisis were identified using an electronic medical record (EMR) database. Demographics, comorbidities, clinical characteristics, laboratory results, complications, and therapeutic interventions were extracted from the medical records to construct the prognostic prediction model. Seven machine learning algorithms, including support vector machines (SVM), random forest (RF), recursive partitioning and regression trees (RPART), extreme gradient boosting with dart booster (XGBoost), multivariate adaptive regression splines (MARS), neural network (NNET), and adaptive boost (AdaBoost) were compared with logistic regression (LR) for predicting the risk of in-hospital mortality in patients with hyperglycemic crisis. Stratified random sampling was used to split the data into training (80%) and validation (20%) sets. Ten-fold cross validation was performed on the training set to optimize model hyperparameters. The sensitivity, specificity, positive and negative predictive values, area under the curve (AUC) and accuracy of all models were computed for comparative analysis.

Results

A total of 1668 patients were eligible for the present study. The in-hospital mortality rate was 7.3% (121/1668). In the training set, feature importance scores were calculated for each of the eight models, and the top 10 significant features were identified. In the validation set, all models except MARS demonstrated good predictive capability, with AUC values exceeding 0.9 and F1 scores between 0.632 and 0.81. Six of the seven machine learning models outperformed the reference logistic regression model; the exception was the MARS model. RPART, RF, and SVM achieved the best performance (AUC values of 0.970, 0.968 and 0.968, and F1 scores of 0.652, 0.762 and 0.762, respectively). Feature importance analysis identified novel predictors including mechanical ventilation, age, Charlson Comorbidity Index, blood gas index, first 24-hour insulin dosage, and first 24-hour fluid intake.

Conclusion

Most machine learning algorithms, with the exception of the MARS model, exhibited excellent performance in predicting in-hospital mortality among patients with hyperglycemic crisis, and the best-performing model was RPART. The algorithms identified overlapping but different sets of up to 10 predictors. Early identification of high-risk patients using these models could support clinical decision-making and potentially improve the prognosis of hyperglycemic crisis patients.

Clinical trial number

Not applicable.

Background

Diabetes mellitus (DM) is among the most prevalent chronic diseases worldwide, affecting approximately 537 million individuals today. This number is projected to rise to 700 million by 2045, posing significant challenges to global health systems. Diabetes not only leads to substantial morbidity and mortality, with over 4 million deaths annually, but also imposes a considerable burden on individuals, societies, and national economies. In China, the number of people with diabetes was reported to be 141 million in 2021 and is expected to increase to 174 million by 2045. Notably, 51.7% of individuals with diabetes in China remain undiagnosed [1]. Hyperglycemic crisis (HC) is one of the most severe acute metabolic complications of diabetes and encompasses diabetic ketoacidosis (DKA), hyperosmolar hyperglycemic state (HHS) and DKA combined with HHS (DKA-HHS) [2-4]. DKA and HHS share similar pathophysiological mechanisms, though with some distinctions. The underlying mechanisms of HHS are not as thoroughly understood [5, 6]. DKA is a complex metabolic disorder primarily caused by either an absolute or relative deficiency in insulin, accompanied by elevated levels of catecholamines, cortisol, glucagon, and growth hormone [5, 7]. Hyperglycemia in DKA is driven by three main processes: increased gluconeogenesis, enhanced glycogenolysis, and reduced glucose utilization by peripheral tissues. The insulin deficiency and heightened counterregulatory hormones in DKA also promote lipolysis, leading to the release of free fatty acids from adipose tissue into the bloodstream. These fatty acids are then converted into ketones by the liver. The resulting surge in free fatty acids and ketones exacerbates hyperglycemia by inducing insulin resistance, ultimately leading to ketonemia and metabolic acidosis [8].
DKA, characterized by hyperglycemia (> 250 mg/dL), metabolic acidosis and increased blood ketone concentration, is more common among young individuals with type 1 diabetes mellitus (T1DM). Conversely, HHS is defined by severe hyperglycemia (> 600 mg/dL), hyperosmolarity and dehydration, without ketoacidosis, and it predominantly affects older patients with type 2 diabetes mellitus (T2DM) [8]. Although DKA occurs more commonly in patients with T1DM, the cumulative number of cases of DKA reported in patients with T2DM represents at least one-third of all cases [9]. Hyperglycemic crises often present abruptly and progress rapidly, requiring immediate medical attention. Most patients attend the emergency department (ED) for medical care [10], reflecting the acute and critical nature of these conditions. Studies analyzing trends over time, particularly from 2006 to 2017, have reported persistently high ED attendance rates for hyperglycemia in countries such as the United States and Italy [11, 12]. Without timely and effective treatment, hyperglycemic crises can result in severe complications, including organ failure, coma, cerebral edema, and even death. Additionally, patients may face an elevated risk of recurrent hyperglycemic episodes in the future [13].

Therefore, emergency physicians and nurses play a crucial role in managing patients with hyperglycemic crisis. Despite advancements in treatment techniques, particularly in developed countries, mortality rates remain alarmingly high, exceeding 10% in some developing regions [14, 15]. Mortality in patients with HHS is reported to be between 5% and 16%, which is around 10 times higher than that of patients with DKA [16]. In China, the mortality rate for hyperglycemic crisis has been reported at 10.8% [17]. Although several studies have reported risk factors affecting death in patients with hyperglycemic crisis, most of them were single-center with small sample sizes [18-21]. Owing to the small number of patients, a limited study population, and a high risk of bias in these studies, the ability to predict mortality remains unknown.

Machine learning (ML) is now widely used in disease prediction. It is a relatively new discipline of artificial intelligence that can be applied to large datasets of multidimensional variables to explore nonlinear relationships between clinical indicators and clinical outcomes and to predict those outcomes. Its goal is to design and develop algorithms that allow computers to improve their performance in data processing. This process includes analyzing past experience to find practical and useful laws and patterns that humans may overlook. The development of automatic models, for example extracting rules and patterns from large datasets, is the central focus of machine learning research [22]. Much effort has been put into developing models to predict the risk of mortality for patients with hyperglycemic crisis. Most prediction tools developed in previous studies rely on generalized linear models, such as logistic regression and Cox proportional hazard models [3, 17, 23, 24]. However, with the rapid advancement of information technology, the emergence of high-dimensional and nonlinear data poses significant challenges to these traditional models. Machine learning offers a robust and innovative approach to analyzing complex medical data, enabling the creation of more accurate predictive models. Given that emergency clinicians are often the first to encounter patients with hyperglycemic crises, early and accurate prognostic prediction is critical. Such predictions can enable timely medical interventions, optimize resource allocation, and improve survival outcomes. Accordingly, this study aims to apply various ML algorithms to identify risk factors for mortality in hyperglycemic crises, develop predictive models, and validate these models through cross-validation. The findings are expected to provide valuable references and guidance for clinicians managing these life-threatening conditions.

Materials and methods

This study was an observational investigation based on electronic medical records (EMRs).

It was conducted in accordance with the principles outlined in the Declaration of Helsinki, and was approved by the Institutional Ethical Review Board of the First Affiliated Hospital of Chongqing Medical University (approval number: 2022-K212). A waiver of informed consent was granted due to the anonymous nature of the data used in the analysis. The study retrospectively included all patients presenting with emergency hyperglycemic crises, with data collected from six tertiary general hospitals affiliated with Chongqing Medical University. The data were obtained from the Intelligent Medical Data (IMD) platform, maintained by the Chongqing Medical University Data Science Academy. The flow chart of this study design is shown in Fig. 1. All statistical analyses were performed using the open-source R software (version 4.1.3, R Foundation for Statistical Computing). A two-sided P-value of less than 0.05 was considered statistically significant. Additionally, mortality prediction models for patients with hyperglycemic crises were developed utilizing the caret package (version 6.0–92) within the R programming environment.

Fig. 1 The flow chart of this study design

IMD platform and participants

The IMD platform serves as a centralized system for collecting patients’ data from participating hospitals. From the IMD platform, we extracted data on patients admitted with hyperglycemic crisis, including diabetic ketoacidosis (DKA), hyperosmolar hyperglycemic state (HHS), and diabetic ketoacidosis combined with hyperosmolar hyperglycemic state (DKA-HHS), over a 6-year period from January 1, 2015, to December 30, 2020. Data were collected through retrospective medical record review and submitted using a standardized data collection tool. The extracted information encompassed patient diagnoses, laboratory indices, comorbidities, procedures, medications, and clinical outcomes. Patient encounters were initially identified based on the International Classification of Diseases, 10th revision (ICD-10) codes E14.001, E14.002, E14.101, E14.102, and E14.103. Inclusion criteria were as follows: (1) the admission or discharge diagnosis was DKA, HHS, or DKA-HHS, confirmed by clinical manifestation and laboratory examination; (2) age ≥ 14; (3) admission to the hospital through the emergency department. Exclusion criteria were: (1) cardiopulmonary resuscitation (CPR) prior to emergency admission, because CPR can significantly alter a patient’s clinical condition and subsequent outcomes, introducing heterogeneity that may confound the study results; (2) other hyperglycemia states such as stress hyperglycemia; (3) other ketosis states such as alcoholic and starvation ketosis; (4) other metabolic acidosis states; (5) gestational diabetes mellitus; (6) cases with more than 30% of medical record data missing. A total of 1668 patients diagnosed with HC between January 2015 and December 2020 met the eligibility criteria and were included in the subsequent analysis.

Feature inclusion and data preprocessing

In selecting features, we incorporated two sets of variables to construct our machine learning models. The first set comprised variables from a previous study: the existing mortality prediction model for hyperglycemic crisis, derived using logistic regression by Pasquel et al. [3], used four variables (hypoglycemia, hypokalemia, acute kidney injury, and combined DKA and HHS) to predict mortality. We therefore included these four variables in our study. The second set was an expanded variable collection based on clinical practice, including all additional variables that would be accessible to clinicians at the time of hospital presentation for hyperglycemic crises. For this set, we consulted experts from the departments of Emergency Medicine, Endocrinology, and Critical Care Medicine to identify potential factors that might influence the prognosis of patients experiencing hyperglycemic crises. Based on clinical experience and the relevant recent literature on the etiology, pathology and treatment of hyperglycemic crisis, and in combination with the inclusion and exclusion criteria, a final set of 26 variables was included in our study.
These variables include patient demographics (age, sex, body mass index (BMI), type of hyperglycemic crisis and course of diabetes), comorbidities (infection, multiple system organ failure (MSOF) and Charlson Comorbidity Index (CCI)), complications (hypoglycemia, hypokalemia and acute kidney injury (AKI)), procedures (first 24 h insulin dosage, which refers to the total amount of insulin administered during the initial 24 h after hospital admission; first 24 h infusion volume, which refers to the total volume of intravenous fluids administered during the first 24 h of treatment; mechanical ventilation; and length of stay) and selected laboratory values (blood glucose on admission, HbA1c, pH, actual base excess (ABE), actual bicarbonate (AB), anion gap (AG), serum creatinine, serum sodium, serum potassium, and effective plasma osmotic pressure (EPOP)). Most of the variables, including demographic information and laboratory test results, were recorded within 24 h of the patient’s admission. Some variables, including AKI, length of stay, hypoglycemia, and hypokalemia, were collected dynamically during hospitalization because of their clinical relevance, data availability and practical significance. To ensure robust modeling, we included variables present in at least 90% of the patient records, resulting in a selection of 17 continuous variables and 9 categorical variables. The characteristics of these variables are detailed in Table 1. Missing data were addressed using multiple imputation performed with the R package mice (version 3.14). This method employs a Markov Chain Monte Carlo (MCMC) approach to predict and replace missing values.
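The 90% completeness rule described above can be illustrated with a minimal sketch. It is written in Python rather than the R environment used in the study, and the records, the variable names and the helper functions `completeness` and `select_variables` are hypothetical; it only shows the filtering rule, not the authors' extraction pipeline.

```python
# Hypothetical toy records; None marks a value missing from the medical record.
records = [
    {
        "age": 50 + i,
        "ph": None if i == 0 else 7.2,      # missing in 1 of 10 records
        "hba1c": None if i < 3 else 9.5,    # missing in 3 of 10 records
    }
    for i in range(10)
]


def completeness(records, name):
    """Fraction of records in which the variable `name` is recorded."""
    present = sum(1 for record in records if record.get(name) is not None)
    return present / len(records)


def select_variables(records, names, threshold=0.90):
    """Keep variables recorded in at least `threshold` of the records."""
    return [name for name in names if completeness(records, name) >= threshold]


kept = select_variables(records, ["age", "ph", "hba1c"])
print(kept)
```

In this toy example the variable recorded in only 70% of records is dropped before any imputation step, while the two variables at or above the threshold are retained.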

Table 1 Characteristics of included variables

The primary outcome of this study was all-cause mortality among patients with hyperglycemic crises during hospitalization.

Data cleaning and feature engineering

After performing multiple imputation to address missing values, we proceeded with data cleaning, split the data into a training set and a validation set, and carried out variable selection. This process and the subsequent machine learning algorithms were completed using the caret package of R. First, we used the createDataPartition function to randomly split the hyperglycemic crisis dataset into the training set (80%) and the internal validation set (20%), deleted near-zero variance and zero variance variables using the nearZeroVar function, and normalized the dataset using the preProcess function. The createDataPartition function in the caret package divides a dataset into subsets such as training and validation sets while preserving the distribution of the target variable across the subsets. This is particularly valuable in classification tasks to maintain class proportions (stratified sampling) and in regression tasks to ensure a balanced distribution. For the training set, a combination of random undersampling and the synthetic minority oversampling technique (SMOTE) was employed to address the class imbalance between positive and negative samples. Second, the recursive feature elimination (RFE) algorithm based on SHapley Additive exPlanations values was performed to screen out key features. Each algorithm used a different method to identify the most important features. The varImp function in the caret package was applied to extract the important features for each algorithm. The data pre-processing workflow is displayed in Fig. 2. Differences in covariates between the training and validation samples were tested using the t-test or its nonparametric equivalent for continuous variables and the chi-squared test for categorical or nominal variables.
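The idea of a stratified 80/20 split can be sketched as follows. This is an illustrative Python stand-in, not the caret createDataPartition implementation used in the study: the function `stratified_split`, the seed and the rounding rule are assumptions, and only the outcome counts (121 deaths among 1668 patients) come from the text.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.8, seed=0):
    """Split record indices so each outcome class keeps roughly the same share."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, validation = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                        # random order within the class
        cut = round(train_fraction * len(indices))  # assumed rounding rule
        train.extend(indices[:cut])
        validation.extend(indices[cut:])
    return sorted(train), sorted(validation)


# 1 = died in hospital, 0 = survived (counts taken from the text).
labels = [1] * 121 + [0] * 1547
train_idx, valid_idx = stratified_split(labels)

train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
valid_rate = sum(labels[i] for i in valid_idx) / len(valid_idx)
print(len(train_idx), len(valid_idx), round(train_rate, 3), round(valid_rate, 3))
```

Because the split is made within each outcome class, the mortality rate in both subsets stays close to the overall 7.3%, which is the property that matters for an imbalanced outcome. The exact membership of each subset depends on the random seed and will not reproduce the study's partition.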

Fig. 2 Data pre-processing workflow of machine learning

Machine learning algorithms

Eight ML algorithms from the caret package of R were used in the current study: logistic regression (LR) (method = ‘glm’), support vector machines (SVM) with radial basis function kernel (method = ‘svmRadial’), random forest (RF) (method = ‘rf’), recursive partitioning and regression trees (RPART) (method = ‘rpart’), extreme gradient boosting with dart booster (XGBoost) (method = ‘xgbDART’), multivariate adaptive regression splines (MARS) (method = ‘earth’), neural network (NNET) (method = ‘nnet’), and adaptive boost (AdaBoost) (method = ‘adaboost’).

Logistic regression is a simple and effective model for analyzing binary response data in medical studies. It uses odds instead of risk in its link function, making interpretation straightforward. This model is known for its ease of computation, making it a preferred choice among generalized linear models [25]. A support vector machine is a robust classifier that constructs a boundary between two classes, facilitating label predictions based on feature vectors [26, 27]. A random forest is a powerful ensemble classifier made up of individual decision trees trained on various subsets of the data. Each tree in the forest works with a limited set of samples (chosen with replacement), and for every split in the tree, a random subset of features is considered [28]. RPART is a type of binary tree used for classification or regression tasks. It searches over all possible splits and selects the covariate and split point that give the greatest reduction in node impurity. XGBoost models make predictions using a series of decision trees, representing an interpretable model. This model incorporates a measure of how much model accuracy is improved by the addition of a given variable, with a higher gain value implying greater importance in generating a prediction [29]. MARS is an adaptive regression procedure well suited to problems with a large number of predictor variables. A MARS model is constructed using a subset of all possible linear spline functions [30]. A neural network, like the human brain, connects layers of nodes (neurons) to model an output [31]. AdaBoost is a widely used implementation of boosting and is favored for its accuracy, ease of deployment and fast training time [32]. It uses shallow decision trees as the base classifiers. Readers are referred elsewhere for details on these methods.

For these models, training in the caret package automatically creates a grid of tuning parameters, which was evaluated using 10-fold cross-validation repeated three times. All parameters were left at the caret package defaults.
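The resampling scheme described above (10-fold cross-validation repeated three times) can be sketched as an index generator. This is a minimal Python illustration under stated assumptions: the function `repeated_kfold` is hypothetical, the folds are not stratified by outcome, and it does not reproduce caret's internal fold construction. The training set size of 1335 is taken from the text.

```python
import random


def repeated_kfold(n_samples, n_folds=10, n_repeats=3, seed=0):
    """Yield (repeat, fold, fit_indices, held_out_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)                    # a fresh random ordering per repeat
        for fold in range(n_folds):
            held_out = order[fold::n_folds]   # every n_folds-th record of the shuffle
            held_out_set = set(held_out)
            fit = [i for i in order if i not in held_out_set]
            yield repeat, fold, fit, held_out


splits = list(repeated_kfold(1335))
print(len(splits))  # number of model fits per candidate hyperparameter setting
```

Each candidate hyperparameter setting is fitted once per split and its performance averaged over the held-out parts, so every record in the training set is held out exactly once within each repeat.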

Model performance measures

We evaluated the performance of each model by calculating the following metrics: (i) AUC, a widely used metric for binary classification problems that describes the model’s ability to distinguish between the positive and negative classes. A higher AUC indicates better discriminative performance. (ii) Sensitivity, also referred to as the true positive rate or recall, which is the proportion of deceased hyperglycemic crisis patients who were correctly classified out of all deceased patients. In essence, sensitivity describes the probability that the model predicts a case as “deceased”, given that the patient is truly deceased. (iii) Specificity, also known as the true negative rate, which is the proportion of surviving patients correctly classified by the model out of all surviving patients in the dataset. (iv) Accuracy, which takes into consideration both the sensitivity and specificity of the model and describes the proportion of all cases that were correctly classified. It provides an overall measure of the model’s performance across all classes. (v) F1 score, which is the harmonic mean of precision and recall (sensitivity), offering a balanced evaluation of these two metrics, particularly in cases of imbalanced datasets [33]. In addition to these primary metrics, other evaluation indices were considered, including positive predictive value (PPV), negative predictive value (NPV), and the kappa value. These supplementary measures provide further insights into the models’ agreement and predictive capabilities.
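The threshold-based metrics listed above all derive from the four cells of a confusion matrix, as the following minimal Python sketch shows. The function `classification_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they are not results from this study, and the sketch assumes every denominator is non-zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute confusion-matrix metrics, with 'deceased' as the positive class."""
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)   # deceased patients correctly flagged
    specificity = tn / (tn + fp)   # survivors correctly classified
    ppv = tp / (tp + fp)           # precision
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / total
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and sensitivity
    # Cohen's kappa: agreement beyond what the marginal totals would give by chance.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1,
        "kappa": kappa,
    }


# Hypothetical counts: 10 deaths and 90 survivors in an illustrative validation set.
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=88)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With a rare outcome, accuracy is dominated by the survivors, which is why the F1 score and kappa are reported alongside it.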

Results

Summary of patients’ characteristics

A total of 1668 hyperglycemic crisis patients were eligible for the present study. The mortality rate during hospitalization was 7.3% (121/1668). Among these patients, 1335 (80%) and 333 (20%) patients were allocated to the training and validation datasets, respectively. Baseline characteristics, including age, sex, length of stay, duration of diabetes, type of diabetes, type of hyperglycemic crisis, treatment procedures, comorbidities, and blood gas results were compared between the two groups. No statistically significant differences were observed between the training and validation datasets across these parameters. This finding highlights the robustness of the random sampling method and ensures the comparability of the two cohorts. The detailed results of the baseline characteristics are presented in Table 2.

Table 2 Comparison of baseline characteristics between training dataset and validation dataset

Variable importance

In the training set, we calculated the variable importance of each predictor for all eight models. The variable importance was ranked, and up to 10 important predictors (mechanical ventilation, hypoglycemia, length of stay, first 24 h insulin dosage, first 24 h infusion volume, AG, AB, pH, age, and CCI) are shown for each of the eight models in supplementary Fig. 1 to Fig. 8. The ranking of these predictors differed slightly between models, and some of them are established risk predictors for hyperglycemic crisis patients. Mechanical ventilation was ranked as the most important predictor in seven models, the exception being the NNET model. For the MARS and LR models, mechanical ventilation, CCI, hypoglycemia and age were important predictors. The SVM, RPART, RF, NNET and AdaBoost models consistently ranked first 24 h insulin dosage and first 24 h infusion volume among the top five predictors. Interestingly, the MARS, SVM and RF models identified actual bicarbonate (AB) as the least important predictor, NNET and RPART identified CCI as the least important predictor, and the LR and AdaBoost models identified length of stay as the least important predictor.

Model performance

Table 3 presents a summary of the performance metrics for the eight ML algorithms in predicting mortality among patients with hyperglycemic crisis. According to the results, seven models showed good discrimination ability, with an AUC above 0.9 and an F1 score between 0.632 and 0.81. The AdaBoost model achieved the highest F1 score (0.81). In contrast, the MARS model exhibited moderate discrimination ability, with an AUC of 0.861 and an F1 score of 0.7. Among these models, the logistic regression model obtained the lowest sensitivity (0.545), whereas the XGBoost model obtained the highest sensitivity (0.818). The sensitivities of the remaining six models were moderate (0.636–0.818), and the specificities were high (0.971–0.99). Notably, the AdaBoost model achieved the highest positive predictive value (0.850), while the XGBoost model demonstrated the highest negative predictive value (0.987). ROC curves showing the performance of the eight models in predicting inpatient mortality in patients with hyperglycemic crisis, together with a comparative analysis of the AUC values (95% confidence interval) for each model using the logistic regression model as a reference, are depicted in Fig. 3.
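For readers less familiar with the AUC, it can be read as the probability that a randomly chosen deceased patient receives a higher predicted risk than a randomly chosen survivor. The minimal Python sketch below computes it by direct pairwise comparison; the function `auc` and the scores are hypothetical toy values, not model outputs from this study, and the confidence intervals and model comparisons reported here would require additional methods not shown.

```python
def auc(scores, labels):
    """AUC as the share of (deceased, survivor) pairs ranked correctly; ties count half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted risks; 1 = died in hospital, 0 = survived.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(round(auc(scores, labels), 3))
```

Here one deceased patient (risk 0.35) is ranked below one survivor (risk 0.6), so eight of the nine pairs are ordered correctly. Unlike sensitivity or the F1 score, this quantity does not depend on the classification threshold.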

Table 3 Performance of different machine learning models for prediction of hyperglycemic crisis outcome
Fig. 3 Validated discrimination for in-hospital mortality in eight models

Discussion

Hyperglycemic crisis is a life-threatening acute complication in patients with diabetes mellitus. Therefore, early identification of risk factors affecting the prognosis of hyperglycemic crisis patients, timely medical intervention and appropriate care are crucial for reducing the mortality rate. Machine learning models have the potential to assist clinicians in initiating resuscitation at the earliest stage and optimizing the allocation of healthcare resources.

In this study of 1,668 patients diagnosed with hyperglycemic crisis during an emergency visit, we developed models that achieved good predictive performance using data routinely collected during emergency care and subsequent treatment and drawn from a big data platform. The predictive factors identified in our analysis are closely associated with patient outcomes and are readily accessible in most cases of hyperglycemic crisis, thereby enhancing their clinical utility. To the best of our knowledge, this is the first study to employ multiple machine learning approaches with a comprehensive set of predictors to forecast the prognosis of patients with hyperglycemic crisis. This approach represents a significant advancement in the use of big data and machine learning for critical care in diabetes-related emergencies.

This study utilized eight machine learning algorithms to develop models capable of accurately predicting mortality risk in patients with hyperglycemic crisis. The selected variables are clinically relevant to prognosis and available in most HC patients. Although LR is often regarded as the most appropriate model for predicting complications associated with diabetes mellitus [34], it may not be the optimal choice for predicting in-hospital mortality in patients with hyperglycemic crisis. Our findings show that, when using the LR model as a reference, all machine learning models, with the exception of the MARS model, outperformed the LR model in predicting in-hospital all-cause mortality among patients with hyperglycemic crisis.

Previous studies have demonstrated that mortality in patients with hyperglycemic crisis is associated with age, level of consciousness upon admission, pH and plasma osmolality levels [4, 35, 36]. In addition to these factors, we found that actual bicarbonate and anion gap levels were also associated with prognosis in patients with hyperglycemic crisis. Regarding effective plasma osmolality and serum creatinine, a previous study [17] conducted on patients with HHS noted that both values were higher in deceased patients than in survivors, suggesting that these variables may serve as indicators of poor prognosis in this patient population. However, these two indicators were not selected into the final prediction models in this study. This may be related to the fact that we included not only HHS patients but also DKA patients. Consequently, further research is required to better understand the relationship between these indicators and patient outcomes across different types of hyperglycemic crises.

In addition, this study is the first to identify mechanical ventilation as a predictor of mortality in patients experiencing hyperglycemic crises. Previous studies have reported that 30-day mortality in critically ill patients receiving mechanical ventilation in the intensive care unit (ICU) was 3.3 times higher than that in patients not receiving mechanical ventilation [37]. Despite this evidence, no previous studies have specifically investigated the prognostic value of mechanical ventilation in the context of hyperglycemic crises. This finding is new but not surprising, as receiving mechanical ventilation means that patients are more severely ill and also prone to complications such as ventilator-associated pneumonia and ventilator-induced lung injury [38].

In our study, all machine learning models consistently identified the first 24-hour infusion volume and first 24-hour insulin dosage as prognostic predictors for patients with hyperglycemic crisis. This may be explained by the prominent clinical manifestations of hyperglycemic crisis, namely severe dehydration and hyperglycemia. The primary treatments are rehydration and insulin therapy [13, 39], and the amount of intravenous fluids and the insulin dose administered in the first 24 hours may, to some extent, reflect the severity of the condition. Furthermore, consistent with the findings of Pasquel et al. [3], our study revealed that the occurrence of hypoglycemia and hypokalemia during hospitalization significantly affected the prognosis of patients with hyperglycemic crisis. Insulin therapy makes patients prone to complications such as hypoglycemia and hypokalemia [40-42], which can be life-threatening if left untreated. This serves as a reminder to clinicians that, in addition to aggressive intravenous rehydration and insulin therapy, blood potassium and blood glucose levels should be dynamically assessed and treatment regimens adjusted as needed.

Our study revealed that length of stay (LOS) is an important indicator in predicting in-hospital mortality of patients with hyperglycemic crisis. LOS often correlates with the complexity of the clinical case, the severity of illness, and the response to treatment. When the number of patients with a prolonged length of stay exceeds the patient-handling capacity of standard medical services and personnel, clinical practitioners cannot respond appropriately to emergency cases. Prolonged length of stay lowers performance in managing new emergency cases and increases the risk of delayed treatment, mortality, morbidity, and patient complaints [43].

Studies have demonstrated that prolonged hospital stays are frequently associated with higher morbidity, comorbid conditions, or delayed recovery. Liu et al. used machine learning to predict in-hospital mortality in elderly patients with heart failure combined with hypertension, and their results demonstrated that LOS was the most strongly related factor [43]. Dilek reported that LOS ≥ 4 days was an independent risk factor for in-hospital mortality [44]. Arnold et al. [45] reported that LOS was directly associated with the risk of mortality from pneumonia among elderly patients. For dynamic prediction models, LOS can be considered a time-varying variable for continuous monitoring. As a patient’s LOS increases, the model can iteratively update this variable and reassess the patient’s risk of mortality based on the newly updated data. This approach leverages the temporal changes in LOS to enhance the model’s capacity for dynamic monitoring of patient prognosis over time.

Our study has several limitations. First, because all data were retrospectively collected from the Intelligent Medical Dataset platform of Chongqing Medical University, selection bias may be present. However, the data were collected from six separate medical centers, and the sample size was large enough to allow internal validation. We did not carry out external validation, which we plan to pursue in future work. Second, prediction models with many predictors are difficult to implement in emergency clinical practice. The variables used as inputs to the machine learning algorithms were those that are typically obtainable or evaluated in most cases. However, predictions may be affected slightly by which variables are available, and the models may need to be adjusted for variable availability when they are incorporated into practice. We suggest that future work include a larger, multi-center, prospective study to further validate our results. Third, our dataset is class-imbalanced, which may affect the models’ performance and generalizability. To mitigate this, we employed undersampling, class-weight adjustments, or synthetic data generation techniques such as SMOTE during the training phase. Although appropriate methods were used to address class imbalance, the problem remains relatively common in predictive modeling studies that use large datasets.
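As an illustration of two of the imbalance-handling ideas mentioned above, the following minimal sketch computes class weights and performs random undersampling in plain Python. It is not the study's actual pipeline: the weighting rule shown is the common "balanced" heuristic, n_samples / (n_classes × class_count), which is an assumption because the paper does not state which scheme was used. The function names are hypothetical, and the labels are synthetic placeholders built only from the cohort-level counts reported in the abstract (121 deaths among 1668 patients).

```python
from collections import Counter
import random


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Assumption: the common "balanced" heuristic; the paper does not
    specify its weighting scheme.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows from larger classes until all classes match the smallest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(group) for group in by_class.values())
    sampled = []
    for label, group in by_class.items():
        for row in rng.sample(group, target):
            sampled.append((row, label))
    rng.shuffle(sampled)
    return [row for row, _ in sampled], [label for _, label in sampled]


# Synthetic labels from the reported counts: 1 = died in hospital, 0 = survived.
labels = [1] * 121 + [0] * (1668 - 121)
rows = list(range(len(labels)))  # placeholder row identifiers, not patient data

weights = balanced_class_weights(labels)
print({cls: round(w, 3) for cls, w in weights.items()})

under_rows, under_labels = random_undersample(rows, labels)
print(Counter(under_labels))
```

Under this heuristic the minority class (deaths) receives a weight roughly 13 times that of the majority class, which mirrors the roughly 1:13 class ratio in the cohort. In practice such weights or resampling would be applied to the training split only, so that the validation set keeps the original class distribution.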

Conclusion

Hyperglycemic crisis represents a significant cause of inpatient mortality among diabetic patients. These patients frequently present to emergency departments and require immediate evaluation and treatment. In this study, we developed and validated predictive models using machine learning algorithms to estimate the risk of mortality in patients experiencing hyperglycemic crisis. These models rely on clinical indicators commonly available during emergency admission and hospitalization, offering potential benefits for clinical decision-making and prognostic assessment.

The early identification of mortality risk in patients with hyperglycemic crisis is critical for enabling clinicians to implement timely and appropriate medical interventions. Such measures not only conserve medical resources but also improve patient survival outcomes. As demonstrated by the results of our study, machine learning provides a promising alternative to traditional methods for mortality risk prediction in this population.

Data availability

The data that support the findings of this study are available from the authors but restrictions apply to the availability of these data, which were used under license from Chongqing Medical University Data Science Academy for the current study, and so are not publicly available. Data are, however, available from the authors upon reasonable request.

References

  1. IDF Diabetes Atlas. 2021| IDF Diabetes Atlas [Internet]. [cited 2024 Dec 21]. Available from: https://diabetesatlas.org/atlas/tenth-edition/

  2. Canarie MF, Bogue CW, Banasiak KJ, Weinzimer SA, Tamborlane WV. Decompensated hyperglycemic hyperosmolarity without significant ketoacidosis in the adolescent and young adult population. J Pediatr Endocrinol Metab JPEM. 2007;20(10):1115–24.

  3. Pasquel FJ, Tsegka K, Wang H, Cardona S, Galindo RJ, Fayfman M, et al. Clinical outcomes in patients with isolated or combined Diabetic Ketoacidosis and Hyperosmolar Hyperglycemic State: a Retrospective, Hospital-based Cohort Study. Diabetes Care. 2020;43(2):349–57.

  4. Stoner GD. Hyperosmolar Hyperglycemic State. Am Fam Physician. 2017;96(11):729–36.

  5. Kitabchi AE, Umpierrez GE, Murphy MB, Barrett EJ, Kreisberg RA, Malone JI, et al. Management of hyperglycemic crises in patients with diabetes. Diabetes Care. 2001;24(1):131–53.

  6. Goyal A, Mathew UE, Golla KK, Mannar V, Kubihal S, Gupta Y, et al. A practical guidance on the use of intravenous insulin infusion for management of inpatient hyperglycemia: intravenous insulin infusion for management of Inpatient Hyperglycemia. Diabetes Metab Syndr. 2021;15(5):102244.

  7. Kitabchi AE, Umpierrez GE, Murphy MB, Kreisberg RA. Hyperglycemic crises in adult patients with diabetes: a consensus statement from the American Diabetes Association. Diabetes Care. 2006;29(12):2739–48.

  8. Fayfman M, Pasquel FJ, Umpierrez GE. Management of hyperglycemic crises: Diabetic Ketoacidosis and Hyperglycemic Hyperosmolar State. Med Clin North Am. 2017;101(3):587–606.

  9. Wang ZH, Kihl-Selstam E, Eriksson JW. Ketoacidosis occurs in both type 1 and type 2 diabetes–a population-based study from Northern Sweden. Diabet Med J Br Diabet Assoc. 2008;25(7):867–70.

  10. Yan JW, Gushulak KM, Columbus MP, van Aarsen K, Hamelin AL, Wells GA, et al. Risk factors for recurrent emergency department visits for hyperglycemia in patients with diabetes mellitus. Int J Emerg Med. 2017;10(1):23.

  11. Wang J, Geiss LS, Williams DE, Gregg EW. Trends in Emergency Department Visit Rates for Hypoglycemia and Hyperglycemic Crisis among adults with diabetes, United States, 2006–2011. PLoS ONE. 2015;10(8):e0134917.

  12. Andreano A, Bosio M, Russo AG. Emergency attendance for acute hyper- and hypoglycaemia in the adult diabetic population of the metropolitan area of Milan: quantifying the phenomenon and studying its predictors. BMC Endocr Disord. 2020;20(1):72.

  13. Muneer M, Akbar I. Acute Metabolic emergencies in Diabetes: DKA, HHS and EDKA. Adv Exp Med Biol. 2021;1307:85–114.

  14. Savage MW, Dhatariya KK, Kilvert A, Rayman G, Rees JaE, Courtney CH, et al. Joint British Diabetes societies guideline for the management of diabetic ketoacidosis. Diabet Med J Br Diabet Assoc. 2011;28(5):508–15.

  15. Otieno CF, Kayima JK, Omonge EO, Oyoo GO. Diabetic ketoacidosis: risk factors, mechanisms and management strategies in sub-saharan Africa: a review. East Afr Med J. 2005;82(12 Suppl):S197–203.

  16. Bhowmick SK, Levens KL, Rettig KR. Hyperosmolar hyperglycemic crisis: an acute life-threatening event in children and adolescents with type 2 diabetes mellitus. Endocr Pract off J Am Coll Endocrinol Am Assoc Clin Endocrinol. 2005;11(1):23–9.

  17. Wu XY, She DM, Wang F, Guo G, Li R, Fang P, et al. Clinical profiles, outcomes and risk factors among type 2 diabetic inpatients with diabetic ketoacidosis and hyperglycemic hyperosmolar state: a hospital-based analysis over a 6-year period. BMC Endocr Disord. 2020;20(1):182.

  18. Siregar NN, Soewondo P, Subekti I, Muhadi M. Seventy-two hour mortality prediction model in patients with Diabetic Ketoacidosis: a retrospective cohort study. J ASEAN Fed Endocr Soc. 2018;33(2):124–9.

  19. Ahuja W, Kumar N, Kumar S, Rizwan A. Precipitating risk factors, clinical presentation, and Outcome of Diabetic Ketoacidosis in patients with type 1 diabetes. Cureus. 2019;11(5):e4789.

  20. Singh H, Saroch A, Pannu AK, Sachin HJ, Sharma N, Dutta P. Clinical and biochemical profile, precipitants and prognostic factors of diabetic ketoacidosis: a retrospective study from a tertiary care center of north India. Diabetes Metab Syndr. 2019;13(4):2357–60.

  21. Guo YW, Wu TE, Chen HS. Prognostic factors of mortality among patients with severe hyperglycemia. Am J Manag Care. 2015;21(1):e9–22.

  22. Golpour P, Ghayour-Mobarhan M, Saki A, Esmaily H, Taghipour A, Tajfard M, et al. Comparison of support Vector Machine, Naïve Bayes and Logistic Regression for assessing the necessity for coronary angiography. Int J Environ Res Public Health. 2020;17(18):6449.

  23. Ekpebegh CO, Longo-Mbenza BBIS, Nge AO. A clinical score to predict survival from hyperglycemic crisis following general medical wards admission in a resource constrained setting. Int J Diabetes Dev Ctries. 2012;32(1):7–13.

  24. Huang CC, Weng SF, Tsai KT, Chen PJ, Lin HJ, Wang JJ, et al. Long-term mortality risk after Hyperglycemic Crisis episodes in geriatric patients with diabetes: a National Population-based Cohort Study. Diabetes Care. 2015;38(5):746–51.

  25. Stoltzfus JC. Logistic regression: a brief primer. Acad Emerg Med off J Soc Acad Emerg Med. 2011;18(10):1099–104.

  26. Noble WS. What is a support vector machine? Nat Biotechnol. 2006;24(12):1565–7.

  27. Acharya TD, Subedi A, Lee DH. Evaluation of Machine Learning Algorithms for Surface Water Extraction in a landsat 8 scene of Nepal. Sensors. 2019;19(12):2769.

  28. Breiman L. Random forests. Mach Learn. 2001;45(1):5–32.

  29. Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining [Internet]. New York, NY, USA: Association for Computing Machinery; 2016 [cited 2024 Dec 21]. pp. 785–94. (KDD ’16). Available from: https://doiorg.publicaciones.saludcastillayleon.es/10.1145/2939672.2939785

  30. Hastie T, Tibshirani R. The Elements of Statistical Learning: Data Mining, Inference, and Prediction [Internet]. New York, NY, USA: Springer; 2016. 745 p. Available from: http://link.springer.com/book/10.1007/978-0-387-848

  31. Heaton JT. Introduction to Neural Networks with Java. Heaton Res Inc [Internet]. 2005 [cited 2024 Dec 21]; Available from: http://www.semanticscholar.org/paper/ba8c8c80dfaf3f5558b59dbb728d9249036864a2

  32. Walker KW, Jiang Z. Application of adaptive boosting (AdaBoost) in demand-driven acquisition (DDA) prediction: a machine-learning approach. J Acad Librariansh. 2019;45(3):203–12.

  33. Shin D, Lee KJ, Adeluwa T, Hur J. Machine learning-based predictive modeling of Postpartum Depression. J Clin Med. 2020;9(9):2899.

  34. Dagliati A, Marini S, Sacchi L, Cogni G, Teliti M, Tibollo V, et al. Machine learning methods to Predict Diabetes complications. J Diabetes Sci Technol. 2018;12(2):295–302.

  35. Nyenwe EA, Kitabchi AE. The evolution of diabetic ketoacidosis: an update of its etiology, pathogenesis and management. Metabolism. 2016;65(4):507–21.

  36. Barski L, Nevzorov R, Rabaev E, Jotkowitz A, Harman-Boehm I, Zektser M, et al. Diabetic ketoacidosis: clinical characteristics, precipitating factors and outcomes of care. Isr Med Assoc J IMAJ. 2012;14(5):299–303.

  37. Ferrante LE, Pisani MA, Murphy TE, Gahbauer EA, Leo-Summers LS, Gill TM. Functional trajectories among older persons before and after critical illness. JAMA Intern Med. 2015;175(4):523–9.

  38. Goligher EC, Ferguson ND, Brochard LJ. Clinical challenges in mechanical ventilation. Lancet Lond Engl. 2016;387(10030):1856–66.

  39. Karslioglu French E, Donihi AC, Korytkowski MT. Diabetic ketoacidosis and hyperosmolar hyperglycemic syndrome: review of acute decompensated diabetes in adult patients. BMJ. 2019;365:l1114.

  40. Dhatariya K, Nunney I, Iceton G. Institutional factors in the management of adults with diabetic ketoacidosis in the UK: results of a national survey. Diabet Med J Br Diabet Assoc. 2016;33(2):269–70.

  41. Karajgikar ND, Manroa P, Acharya R, Codario RA, Reider JA, Donihi AC, et al. Addressing pitfalls in Management of Diabetic Ketoacidosis with a standardized protocol. Endocr Pract off J Am Coll Endocrinol Am Assoc Clin Endocrinol. 2019;25(5):407–12.

  42. Ullal J, Aloi JA, Reyes-Umpierrez D, Pasquel FJ, McFarland R, Rabinovich M. Comparison of Computer-Guided Versus Standard Insulin Infusion Regimens in Patients With Diabetic Ketoacidosis. J Diabetes Sci Technol.

  43. Al-Qahtani S, Alsultan A, Haddad S, et al. The association of duration of boarding in the emergency room and the outcome of patients admitted to the intensive care unit. BMC Emerg Med. 2017;17(1):34.

  43. Liu X, Xie Z, Zhang Y, et al. Machine learning for predicting in-hospital mortality in elderly patients with heart failure combined with hypertension: a multicenter retrospective study. Cardiovasc Diabetol. 2024;23:407.

  44. Dülger D, Albuz Ö. Risk indices that predict in-hospital mortality of elderly patients. Turk J Med Sci 2020 Jun. 2020;23(4):969–77.

  45. Arnold FW, Reyes Vega AM, Salunkhe V, et al. Older adults hospitalized for Pneumonia in the United States: incidence, epidemiology, and outcomes. J Am Geriatr Soc. 2020;68(5):1007–14.

Acknowledgements

We thank Chongqing Medical University Data Science Academy for providing the data.

Funding

KZ received funding from Chongqing Medical University through the Chongqing Medical University Intelligent Medical Projects (grant number ZHYX202119). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. This study was also supported by the 2024 First-class Disciplines - Nursing Discipline Construction Fund (project code 03010205040301).

Author information

Contributions

MG and KZ made substantial contributions to the conception, software, medical guidance and design of the work. RH drafted and revised the work, and analyzed the data and interpreted the results. HL provided acquisition, analysis and interpretation of data. All authors edited the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Kebiao Zhang or Manping Gu.

Ethics declarations

Ethics approval and consent to participate

This study was approved by the Institutional Ethical Review Board of the First Affiliated Hospital of Chongqing Medical University (approval number: 2022-K212) with a waiver of informed consent due to the anonymous nature of the data.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

He, R., Zhang, K., Li, H. et al. Development and validation of inpatient mortality prediction models for patients with hyperglycemic crisis using machine learning approaches. BMC Endocr Disord 25, 86 (2025). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12902-025-01873-9

  • DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12902-025-01873-9

Keywords