BACKGROUND
Diabetic kidney disease (DKD) is a common and serious complication of diabetes mellitus (DM). More sensitive methods for the early prediction of DKD are urgently needed. This study aimed to develop DKD risk prediction models based on machine learning algorithms (MLAs) in patients with type 2 DM (T2DM).

METHODS
The electronic health records of 12,190 T2DM patients with 3-year follow-up were extracted, and the dataset was divided into training and testing datasets in a 4:1 ratio. Risk variables for DKD development were ranked and selected to establish forecasting models. Model performance was then evaluated on the testing dataset using sensitivity, specificity, positive predictive value, negative predictive value, accuracy, and F1 score. Accuracy was used to select the optimal model.

RESULTS
Based on the variable importance ranking from the random forest package, age, urinary albumin-to-creatinine ratio, serum cystatin C, estimated glomerular filtration rate, and neutrophil percentage were selected as predictors of DKD onset. Among the seven forecasting models constructed with MLAs, the Light Gradient Boosting Machine (LightGBM) model had the highest accuracy, indicating that the LightGBM algorithm might perform best for predicting the 3-year risk of DKD onset.

CONCLUSIONS
Our study could provide powerful tools for early DKD risk prediction, which might help optimize intervention strategies and improve renal prognosis in T2DM patients.
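The evaluation protocol described in the abstract (a 4:1 train/test split, confusion-matrix indexes, and selection of the model with the highest accuracy) can be illustrated with a minimal Python sketch. It assumes binary labels (1 = DKD onset within 3 years, 0 = no DKD) and a simple random split; the helper names `split_4_to_1`, `evaluate`, and `select_by_accuracy` are hypothetical and are not taken from the study, which does not report its code.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_4_to_1(records: Sequence, seed: int = 0) -> Tuple[List, List]:
    """Hypothetical helper: shuffle records, then split them 4:1 into training/testing sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5  # first 80% for training, remaining 20% for testing
    return shuffled[:cut], shuffled[cut:]


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute the six indexes named in the abstract from a binary confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # DKD correctly predicted
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-DKD correctly predicted
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed DKD cases

    def ratio(num: int, den: int) -> float:
        # Return NaN instead of raising when a denominator is zero.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, len(pairs)),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),  # harmonic mean of PPV and sensitivity
    }


def select_by_accuracy(y_true: Sequence[int], predictions: Dict[str, Sequence[int]]) -> str:
    """Hypothetical helper: return the model name whose test-set predictions have the highest accuracy."""
    return max(predictions, key=lambda name: evaluate(y_true, predictions[name])["accuracy"])
```

For example, `split_4_to_1(range(12190))` yields 9,752 training and 2,438 testing records, and `evaluate([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 1, 0])` gives a sensitivity of 2/3, a specificity of 0.8, and an accuracy of 0.75. A purely random split is an assumption here; a stratified split by outcome would be a common alternative when DKD onset is relatively rare.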