Objective To evaluate the predictive performance of three machine learning methods, namely support vector machine (SVM), K-nearest neighbor (KNN) and decision tree, for the daily number of new patients with ischemic stroke in Chengdu. Methods The daily numbers of new ischemic stroke patients from January 1st, 2019 to March 28th, 2021 were extracted from the Third People’s Hospital of Chengdu. Meteorological and air quality data for Chengdu over the same period were obtained from China Weather Network. Correlation analysis, multinomial logistic regression, and principal component analysis were used to explore the factors influencing the level of the daily number of new ischemic stroke patients in this hospital. Then, using R 4.1.2 software, the data were randomly divided in a ratio of 7∶3 (70% into the training set and 30% into the validation set), which were used respectively to train and validate the three machine learning methods (SVM, KNN and decision tree), with a logistic regression model as the benchmark. The F1 score, the area under the receiver operating characteristic curve (AUC) and the accuracy of each model were calculated. The data division, training and validation were repeated three times, and the average F1 scores, AUCs and accuracies over the three repetitions were used to compare the predictive performance of the four models. Results Ranked by accuracy from high to low, the four models were SVM (88.9%), logistic regression model (87.5%), decision tree (85.9%), and KNN (85.1%); ranked by F1 score, they were SVM (66.9%), KNN (62.7%), decision tree (59.1%), and logistic regression model (57.7%); ranked by AUC, they were SVM (88.5%), logistic regression model (87.7%), KNN (84.7%), and decision tree (71.5%). Conclusion SVM predicted better than the traditional logistic regression model and the other two machine learning models.
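The evaluation procedure described in the Methods (a random 7∶3 split, then accuracy, F1 score and AUC averaged over three repetitions) can be illustrated with a minimal sketch. This is not the study's code: the study used R 4.1.2, whereas the sketch uses only the Python standard library. It assumes a binary outcome (a high-count day versus any other day), uses a small hand-written KNN scorer as a stand-in for the four models, and runs on synthetic data. All function names (`split_70_30`, `knn_score`, `repeated_holdout`, `make_synthetic_rows`) are hypothetical.

```python
import random
from statistics import mean


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores, positive=1):
    """AUC as the probability that a positive case outscores a negative one
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def split_70_30(rows, rng):
    """Shuffle a copy of the rows and cut it into 70% training, 30% validation."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def knn_score(train, x, k=5):
    """Stand-in model: share of positive labels among the k nearest training rows."""
    nearest = sorted(
        train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x))
    )[:k]
    return sum(label for _, label in nearest) / len(nearest)


def repeated_holdout(rows, repeats=3, seed=0, k=5, threshold=0.5):
    """Repeat split/train/validate and return mean (accuracy, F1, AUC)."""
    rng = random.Random(seed)
    results = []
    for _ in range(repeats):
        train, valid = split_70_30(rows, rng)
        y_true = [label for _, label in valid]
        scores = [knn_score(train, x, k) for x, _ in valid]
        y_pred = [int(s >= threshold) for s in scores]
        results.append(
            (accuracy(y_true, y_pred), f1_score(y_true, y_pred), auc(y_true, scores))
        )
    return tuple(mean(column) for column in zip(*results))


def make_synthetic_rows(n=200, seed=1):
    """Synthetic data only (not the study's): two made-up features, ~30% positives."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = int(rng.random() < 0.3)
        features = (rng.gauss(1.5 * label, 1.0), rng.gauss(1.5 * label, 1.0))
        rows.append((features, label))
    return rows


if __name__ == "__main__":
    acc, f1, area = repeated_holdout(make_synthetic_rows())
    print(f"accuracy={acc:.3f}  F1={f1:.3f}  AUC={area:.3f}")
```

Reporting all three metrics matters when high-count days are a minority: accuracy can stay high for a model that rarely predicts the positive class, while the F1 score falls. That is consistent with the logistic regression model ranking second by accuracy but last by F1 score in the Results.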
With the increasing availability of clinical and biomedical big data, machine learning is being widely used in research, where it integrates various types of information to predict individual health outcomes. However, deficiencies in the reporting of key information have gradually emerged, including data bias, model fairness across different groups, and problems with data quality and applicability. Maintaining predictive accuracy and interpretability in real-world clinical settings is a further challenge. Together, these issues make it harder to apply prediction models safely and effectively in clinical practice. To address these problems, TRIPOD+AI (transparent reporting of a multivariable prediction model for individual prognosis or diagnosis+artificial intelligence), which builds on TRIPOD, introduces a reporting standard for machine learning models that aims to improve transparency, reproducibility, and health equity, and thereby the quality of machine learning model applications. As research on machine learning-based prediction models is increasing rapidly, we provide examples and interpretations to help domestic readers better understand and apply TRIPOD+AI, and we hope this will support researchers in improving the quality of their reports.