# A machine learning pipeline for early AKI detection using the MIMIC-IV critical care database
## Clinical motivation and study design
Acute kidney injury (AKI) is a common and serious complication in mechanically ventilated ICU patients, associated with increased mortality, prolonged hospital stays, and long-term renal impairment. Early identification of patients at risk enables timely interventions including fluid management, nephrotoxin avoidance, and early renal consultation.
This project builds a complete predictive modeling pipeline using the MIMIC-IV v3.1 critical care database. The outcome of interest is AKI Stage 2 or higher (KDIGO criteria) within 7 days of mechanical ventilation initiation.
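As a rough illustration of the outcome definition, the creatinine-based KDIGO staging logic can be sketched as follows (the function name and inputs are hypothetical; the real implementation lives in the cohort-construction notebook, and the 0.3 mg/dL rise-within-48h rule for Stage 1 is omitted here for brevity):

```r
# Stage AKI from peak serum creatinine (scr) and baseline creatinine,
# both in mg/dL, following the KDIGO creatinine ratio criteria.
# Simplified sketch: the absolute-rise rule for Stage 1 and the
# acute-rise qualifier for the SCr >= 4.0 criterion are not shown.
kdigo_stage <- function(scr, baseline) {
  ratio <- scr / baseline
  dplyr::case_when(
    ratio >= 3 | scr >= 4.0 ~ 3L,  # Stage 3: >=3x baseline or SCr >= 4.0
    ratio >= 2              ~ 2L,  # Stage 2: 2.0-2.9x baseline
    ratio >= 1.5            ~ 1L,  # Stage 1: 1.5-1.9x baseline
    TRUE                    ~ 0L   # no AKI by creatinine criteria
  )
}

# Outcome flag used in this project: Stage 2+ within the 7-day window,
# e.g. kdigo_stage(peak_scr_7d, baseline_scr) >= 2
```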
- **Data:** MIMIC-IV v3.1 — 330,000+ hospital admissions from Beth Israel Deaconess Medical Center (2008–2022)
- **Outcome:** AKI Stage 2+ within 7 days of intubation, defined by KDIGO creatinine criteria with imputed baseline
- **Models:** Logistic regression, LASSO, random forest, XGBoost, and SVM with MICE imputation across 5 imputed datasets
- **Pipeline:** Reproducible, end-to-end workflow from raw EHR data to model interpretation
The analysis is organized as five sequential R Markdown notebooks:

| Notebook | Focus | Description |
|---|---|---|
| `01_cohort_construction.Rmd` | R, Clinical | Identify mechanically ventilated patients (≥24h), apply clinical exclusion criteria (ESRD, elective surgery, pediatric), define AKI outcomes using KDIGO staging with CKD-EPI imputed baseline creatinine. |
| `02_feature_engineering.Rmd` | R | Extract 74 candidate predictors from labs (28 categories), vitals, vasopressors, fluid balance, and ICD-coded comorbidities. Derive BMI, P/F ratio, SOFA components, and driving pressure. |
| `03_imputation.Rmd` | Statistics | Characterize missingness patterns across all features. Apply MICE (Multivariate Imputation by Chained Equations) to generate 5 complete datasets with principled uncertainty propagation. |
| `04_machine_learning.Rmd` | ML | Train and evaluate logistic regression, LASSO, random forest, XGBoost, and SVM using tidymodels. Pool predictions across all 5 imputed datasets following Rubin's rules. |
| `05_feature_analysis.Rmd` | ML | SHAP-based global and local feature importance for the best-performing model. Identify which clinical variables drive AKI risk predictions. |

## Systematic exclusion criteria following CONSORT guidelines
Starting from all ICU admissions with mechanical ventilation ≥24 hours, sequential exclusion criteria were applied to arrive at a clinically homogeneous cohort of 10,085 stays suitable for AKI prediction modeling.
## Patient characteristics stratified by 7-day AKI outcome (post-MICE, Imputation 1)
| Characteristic | Overall (N=10,085) | No AKI Progression (n=8,534) | AKI Stage 2+ (n=1,551) | p-value |
|---|---|---|---|---|
| Age, years (median [IQR]) | 65.0 [53.0, 75.0] | 64.0 [53.0, 75.0] | 67.0 [56.0, 77.0] | <0.001 |
| Sex — Female | 4,175 (41.4%) | 3,542 (41.5%) | 633 (40.8%) | 0.630 |
| Sex — Male | 5,910 (58.6%) | 4,992 (58.5%) | 918 (59.2%) | |

*Full demographics table (57 characteristics) available in the complete analysis report.*
## Comparing five ML approaches with pooled predictions across 5 MICE imputations
| Model | AUROC | AUPRC |
|---|---|---|
| Logistic Regression | 0.8009 | 0.4434 |
| XGBoost | 0.8003 | 0.4363 |
| LASSO | 0.8000 | 0.4434 |
| Random Forest | 0.7923 | 0.4282 |
| SVM | 0.7855 | 0.4195 |

*Pooled predictions across 5 MICE imputations. Models ordered by descending AUROC.*
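For predictions on the probability scale, pooling across imputations reduces to averaging the per-imputation predicted probabilities before computing metrics. A sketch with yardstick (object and column names are hypothetical; `preds` is assumed to hold one prediction column per imputed dataset):

```r
library(dplyr)
library(yardstick)

# preds: one row per held-out stay, with columns .pred_1 ... .pred_5
# (predicted AKI probability from the model fit on each imputed dataset)
# and the observed outcome as a two-level factor.
pooled <- preds |>
  mutate(.pred_pooled = rowMeans(across(.pred_1:.pred_5)))

# Discrimination metrics on the pooled probabilities
roc_auc(pooled, truth = outcome, .pred_pooled, event_level = "second")
pr_auc(pooled,  truth = outcome, .pred_pooled, event_level = "second")
```

Averaging probabilities is the natural analogue of Rubin's rules for point predictions; pooling of coefficient estimates and their variances applies at the model level rather than the prediction level.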
| Model | Threshold | Sensitivity | Specificity | Youden’s J |
|---|---|---|---|---|
| Logistic Regression | 0.1600 | 71.4% | 75.3% | 0.4666 |
| XGBoost | 0.1300 | 78.5% | 69.4% | 0.4782 |
| LASSO | 0.1700 | 68.8% | 77.4% | 0.4626 |
| Random Forest | 0.1700 | 74.3% | 70.9% | 0.4522 |
| SVM | 0.0500 | 100.0% | 0.0% | 0.0000 |

*Thresholds derived by maximizing Youden’s J = sensitivity + specificity − 1 on the held-out test set.*
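Threshold selection by Youden's J can be sketched directly from an ROC curve, since `yardstick::roc_curve()` returns sensitivity and specificity for every candidate cutpoint (object and column names here are hypothetical):

```r
library(dplyr)
library(yardstick)

# test_preds: held-out predictions with the observed outcome factor and a
# pooled probability column. roc_curve() yields one row per candidate
# .threshold; Youden's J = sensitivity + specificity - 1.
best <- roc_curve(test_preds, truth = outcome, .pred_pooled,
                  event_level = "second") |>
  mutate(j = sensitivity + specificity - 1) |>
  slice_max(j, n = 1)

best$.threshold  # operating threshold of the kind reported above
```

A degenerate result like the SVM row (100% sensitivity, 0% specificity, J = 0) indicates the chosen cutpoint fell below every predicted probability, so no threshold separated the classes.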
## SHAP-based interpretation of the XGBoost model
SHAP (SHapley Additive exPlanations) values provide both global feature rankings and patient-level explanations for each prediction. This helps clinicians understand which variables drive AKI risk predictions and build trust in model outputs.
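With shapviz (one of the listed packages), the SHAP workflow for an xgboost model typically looks like the following sketch (object names such as `xgb_fit` and `X_test` are placeholders, not the notebook's actual variables):

```r
library(shapviz)
library(xgboost)

# X_test: numeric feature matrix for the held-out set (placeholder name).
# shapviz() computes SHAP values from a fitted xgboost booster.
shp <- shapviz(xgb_fit, X_pred = as.matrix(X_test))

sv_importance(shp, kind = "beeswarm")  # global ranking of AKI risk drivers
sv_waterfall(shp, row_id = 1)          # local explanation for one patient
```

The beeswarm plot gives the global feature ranking; the waterfall plot decomposes a single patient's predicted risk into per-feature contributions.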
## Assessing reliability of predicted probabilities
| Model | Brier Score |
|---|---|
| Logistic Regression | 0.1070 |
| LASSO | 0.1071 |
| XGBoost | 0.1075 |
| Random Forest | 0.1088 |
| SVM | 0.1304 |

*Brier score = mean((predicted_prob − observed_outcome)²). Lower is better. Pooled across 5 MICE imputations.*
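The Brier score itself is a one-liner; a sketch with hypothetical vectors `p` (pooled predicted probability) and `y` (observed outcome coded 0/1):

```r
# Mean squared difference between predicted probability and outcome.
# 0 is perfect; a constant 0.5 prediction scores 0.25.
brier <- mean((p - y)^2)
```

Note that the Brier score rewards calibration as well as discrimination, which is why the well-calibrated logistic models edge out the tree ensembles here despite similar AUROC.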
## Fully reproducible pipeline
Full analysis pipeline available on GitHub. All scripts are documented R Markdown files that can be rendered with knitr.
This project uses MIMIC-IV v3.1. Access requires PhysioNet credentialing and a signed data use agreement. No patient data is included in this repository.
A Shiny app for interactive cohort exploration is included in the repository (shiny_app/). Clone the repo and run locally with shiny::runApp("shiny_app/shiny_app").
**Requirements:** R ≥ 4.5 • **Key packages:** tidyverse, tidymodels, xgboost, mice, shapviz, teal, shiny
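Assuming all listed packages are available from CRAN, a one-shot setup might look like:

```r
# Install the packages used across the five notebooks and the Shiny app
install.packages(c(
  "tidyverse", "tidymodels", "xgboost", "mice",
  "shapviz", "teal", "shiny"
))
```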