Construction of a Diagnosis and Prediction Model for Cerebral Infarction Based on Machine Learning Feature Screening

Yuan Lei (袁磊); Liu Yang (刘洋); Wang Ming (王明)

Submitted: 2025-07-15  Revised: 2025-09-02

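Cite this article: Yuan Lei, Liu Yang, Wang Ming. Construction of a Diagnosis and Prediction Model for Cerebral Infarction Based on Machine Learning Feature Screening[J]. 医学研究杂志, 2026, 55(1): 127-133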

DOI： 10.11969/j.issn.1673-548X.2026.01.022

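Author	Affiliation
Yuan Lei (袁磊)	First Department of Neurology, Nanyang Second People's Hospital, 473000
Liu Yang (刘洋)	Third Department of Neurology, Nanyang Second People's Hospital, 473000
Wang Ming (王明)	Third Department of Neurology, Nanyang Second People's Hospital, 473000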

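Funding: Science and Technology Research Program of Nanyang City, Henan Province (23KJGG162); Scientific Research Project of Nanyang Second People's Hospital (YJLX-2025-39)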

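Chinese abstract: Objective To construct a prediction model for the early, accurate diagnosis and prognosis of cerebral infarction based on machine learning-assisted feature screening. Methods A total of 140 patients with cerebral infarction admitted to Nanyang Second People's Hospital from February 2023 to February 2025 were included in the cerebral infarction group, and 210 healthy individuals who underwent physical examinations at the authors' hospital during the same period were selected as the non-cerebral infarction group. The 350 study subjects were divided into a training set (n=210) and a validation set (n=140) at a ratio of 6:4. Baseline data of the study subjects were observed; risk variables significantly associated with the diagnosis of cerebral infarction were analyzed, and a nomogram prediction model was constructed and validated; the predictive value of the model for patient prognosis was analyzed. Results The two groups differed significantly in age, body mass index, smoking history, hypertension, diabetes, carotid plaque, systolic blood pressure, diastolic blood pressure, blood glucose, total cholesterol, triglycerides, urea, serum creatinine, total bilirubin, alanine aminotransferase, homocysteine, National Institutes of Health Stroke Scale (NIHSS) score, and modified Rankin Scale score (P<0.05). Logistic regression analysis showed that age, smoking history, blood glucose, serum creatinine, homocysteine, and NIHSS score were risk variables significantly associated with the diagnosis of cerebral infarction (P<0.05). Based on the Boruta algorithm, four variables (blood glucose, serum creatinine, homocysteine, and NIHSS score) were finally selected for inclusion in the model. The Hosmer-Lemeshow goodness-of-fit test gave χ²=0.101, P>0.05, and the agreement between the model's predicted probabilities and the observed event rates was high. In both the training set and the validation set, principal component analysis showed that the low-risk and high-risk groups were relatively discrete. Compared with the low-risk group, the high-risk group had higher blood glucose, serum creatinine, homocysteine, and NIHSS scores (P<0.05). Conclusion The nomogram prediction model built on blood glucose, serum creatinine, homocysteine, and NIHSS score can assist well in evaluating the diagnosis of cerebral infarction and has some predictive value for patient prognosis.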

Chinese keywords: Cerebral infarction; Diagnosis; Prognosis; Nomogram prediction model
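The 6:4 division of the 350 subjects into training and validation sets can be illustrated with a short sketch. The abstract does not say whether the split was stratified by group, so the within-group (stratified) shuffling below is an assumption, as are the function name `split_6_4`, the seed, and the 0/1 label coding. It only shows that a 6:4 ratio applied to 140 cases and 210 controls reproduces the reported set sizes.

```python
import random


def split_6_4(labels, train_fraction=0.6, seed=0):
    """Split subject indices into training and validation sets.

    Assumption (not stated in the abstract): the split is done
    separately within each group so both sets keep the same case mix.
    """
    rng = random.Random(seed)
    train, valid = [], []
    for group in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == group]
        rng.shuffle(idx)
        n_train = round(len(idx) * train_fraction)
        train.extend(idx[:n_train])
        valid.extend(idx[n_train:])
    return sorted(train), sorted(valid)


# 140 cerebral infarction subjects (coded 1) and 210 healthy controls (coded 0)
labels = [1] * 140 + [0] * 210
train_idx, valid_idx = split_6_4(labels)
print(len(train_idx), len(valid_idx))  # 210 140
```

Under this assumption the training set holds 84 cases and 126 controls, and the validation set holds 56 cases and 84 controls.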

Construction of a Diagnosis and Prediction Model for Cerebral Infarction Based on Machine Learning Feature Screening

Abstract: Objective To construct a prediction model for the early, accurate diagnosis and prognosis of cerebral infarction based on machine learning-assisted feature screening. Methods A total of 140 patients with cerebral infarction who were admitted to Nanyang Second People's Hospital from February 2023 to February 2025 were included in the cerebral infarction group, and 210 healthy people who underwent physical examinations in our hospital during the same period were selected as the non-cerebral infarction group. The 350 study subjects were divided into a training set (n=210) and a validation set (n=140) at a ratio of 6:4. The baseline data of the study subjects were observed; the risk variables significantly associated with the diagnosis of cerebral infarction were analyzed, and a nomogram prediction model was constructed and validated; the predictive value of the model for patient prognosis was analyzed. Results There were statistically significant differences between the two groups in age, body mass index, smoking history, hypertension, diabetes, carotid plaque, systolic blood pressure, diastolic blood pressure, blood glucose, total cholesterol, triglycerides, urea, serum creatinine, total bilirubin, alanine aminotransferase, homocysteine, National Institutes of Health Stroke Scale (NIHSS) score, and modified Rankin Scale score (P<0.05). Logistic regression analysis showed that age, smoking history, blood glucose, serum creatinine, homocysteine, and NIHSS score were risk variables significantly associated with the diagnosis of cerebral infarction (P<0.05). Based on the Boruta algorithm, four variables (blood glucose, serum creatinine, homocysteine, and NIHSS score) were finally selected for inclusion in the model. The Hosmer-Lemeshow goodness-of-fit test gave χ²=0.101, P>0.05, and the agreement between the predicted probability of the model and the actual event rate was high. Principal component analysis showed that the low-risk and high-risk groups were relatively discrete in both the training set and the validation set. Compared with the low-risk group, the high-risk group had higher blood glucose, serum creatinine, homocysteine, and NIHSS scores (P<0.05). Conclusion The nomogram prediction model constructed from blood glucose, serum creatinine, homocysteine, and NIHSS score can assist well in evaluating the diagnosis of cerebral infarction and has some predictive value for patient prognosis.

Keywords: Cerebral infarction; Diagnosis; Prognosis; Nomogram prediction model
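The Hosmer-Lemeshow goodness-of-fit test reported in the abstract compares observed and expected event counts across groups of subjects ordered by predicted probability. The following is a minimal sketch of that general procedure, not the authors' implementation: the number of groups (10 by convention), the degrees of freedom (groups minus 2), the helper names `chi2_sf` and `hosmer_lemeshow`, and the toy data are all assumptions for illustration. It uses only the standard library and assumes predicted probabilities lie strictly between 0 and 1.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable, computed from the
    series expansion of the regularized lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= z / (a + n)
        total += term
    p_lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - p_lower)


def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Return (chi-square statistic, p-value) for binary outcomes y_true
    and predicted probabilities y_prob (each strictly between 0 and 1)."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        # Contributions from events and from non-events in this group
        stat += (observed - expected) ** 2 / expected
        stat += (observed - expected) ** 2 / (size - expected)
    return stat, chi2_sf(stat, groups - 2)


# Toy, perfectly calibrated data (hypothetical, not from the study):
# four risk levels, ten subjects each, event counts matching the risk.
probs, outcomes = [], []
for p, events in [(0.2, 2), (0.4, 4), (0.6, 6), (0.8, 8)]:
    probs += [p] * 10
    outcomes += [1] * events + [0] * (10 - events)

stat, p_value = hosmer_lemeshow(outcomes, probs, groups=4)
print(stat, p_value)
```

A small statistic with P>0.05, as in the abstract, means the test found no significant gap between predicted and observed event rates; it does not by itself establish good discrimination.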
