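| DONG Zexin, AN Yi, DONG Qi, ZHANG Chenchen, SUN Sijia. Interpretable analysis of rice cadmium prediction based on ensemble learning[J]. Journal of Agro-Environment Science, 2025, 44(11): 2757-2763. |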
| Interpretable analysis of rice cadmium prediction based on ensemble learning |
| Submitted: 2025-02-06 |
| DOI:10.11654/jaes.2025-0101 |
| Keywords: rice cadmium; machine learning; ensemble learning; SHAP analysis; interaction analysis |
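| Funding: Development and Application of Equipment for Multi-Indicator Comprehensive Diagnosis and Intelligent Decision-Making on Farmland Soil Obstacle Factors (24ZYCGYS00910); Screening of Key Factors Influencing the Spatial Distribution and Speciation Changes of Cadmium Pollution in Paddy Fields of Southern China (2201120603010240000) |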
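| Author | Affiliation | E-mail |
| --- | --- | --- |
| DONG Zexin | Agro-Environmental Protection Institute, Ministry of Agriculture and Rural Affairs, Tianjin 300191; Xiangtan Comprehensive Experimental Station, Chinese Academy of Agricultural Sciences, Xiangtan, Hunan 411100 | |
| AN Yi | Agro-Environmental Protection Institute, Ministry of Agriculture and Rural Affairs, Tianjin 300191; Xiangtan Comprehensive Experimental Station, Chinese Academy of Agricultural Sciences, Xiangtan, Hunan 411100 | simon8601@126.com |
| DONG Qi | Agro-Environmental Protection Institute, Ministry of Agriculture and Rural Affairs, Tianjin 300191; Xiangtan Comprehensive Experimental Station, Chinese Academy of Agricultural Sciences, Xiangtan, Hunan 411100 | |
| ZHANG Chenchen | Agro-Environmental Protection Institute, Ministry of Agriculture and Rural Affairs, Tianjin 300191; Xiangtan Comprehensive Experimental Station, Chinese Academy of Agricultural Sciences, Xiangtan, Hunan 411100 | |
| SUN Sijia | Agro-Environmental Protection Institute, Ministry of Agriculture and Rural Affairs, Tianjin 300191; Xiangtan Comprehensive Experimental Station, Chinese Academy of Agricultural Sciences, Xiangtan, Hunan 411100; College of Resources and Environment, Northeast Agricultural University, Harbin 150030 | |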
| Abstract: |
| This study focuses on the major rice-producing provinces of southern China, from which 303 paired soil-rice samples were collected. To address the poor generalization of traditional methods on large-scale, heterogeneous data, an innovative framework of “multi-scale feature engineering, hybrid evaluation strategy, and interpretability verification” was proposed. First, eight core indicators were selected by variance filtering and recursive feature elimination (RFE). Second, the extreme gradient boosting (XGB) algorithm was employed, and its classification performance was compared with that of other models, including random forest (RF) and support vector machine (SVM). Finally, Shapley additive explanations (SHAP) were introduced to interpret the model's decision-making mechanism, revealing feature interaction effects and non-linear threshold patterns. The final model achieved an accuracy of 84.1% and an F1 score, the harmonic mean of precision and recall, of 85.3%. These results demonstrate that ensemble learning can provide a scientific basis and technical support for the accurate prediction of cadmium content in rice. |
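The abstract names two feature-selection steps, variance filtering followed by recursive feature elimination (RFE), but does not state the variance threshold or the estimator used to rank features. The following is a minimal, dependency-free Python sketch of that two-stage idea under stated assumptions: the threshold is hypothetical, ordinary least-squares coefficient magnitudes on standardized features stand in for the unknown importance score, and synthetic data replace the 303 field samples. It is not the authors' implementation; in practice one would more likely use library tools such as scikit-learn's `VarianceThreshold` and `RFE`.

```python
import random
import statistics


def variance_filter(columns, threshold):
    """Return indices of feature columns whose population variance exceeds `threshold`."""
    return [j for j, col in enumerate(columns) if statistics.pvariance(col) > threshold]


def standardize(col):
    """Center a column and scale it to unit population standard deviation."""
    mu, sd = statistics.fmean(col), statistics.pstdev(col)
    return [(v - mu) / sd for v in col]


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def ols_importance(columns, y):
    """Assumed stand-in importance: |coefficient| of each standardized feature in a least-squares fit."""
    z = [standardize(col) for col in columns]
    y_mean = statistics.fmean(y)
    yc = [v - y_mean for v in y]
    p = len(z)
    xtx = [[sum(a * b for a, b in zip(z[i], z[j])) for j in range(p)] for i in range(p)]
    xty = [sum(a * b for a, b in zip(z[i], yc)) for i in range(p)]
    return [abs(c) for c in solve(xtx, xty)]


def rfe(columns, y, n_features):
    """Recursive feature elimination: refit, drop the least important feature, repeat."""
    kept = list(range(len(columns)))
    while len(kept) > n_features:
        importance = ols_importance([columns[j] for j in kept], y)
        kept.pop(importance.index(min(importance)))
    return kept


# Synthetic data (not the study's samples): y depends on features 0-2 only,
# features 3-4 are pure noise, and feature 5 is nearly constant.
rng = random.Random(0)
n = 200
cols = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(5)]
cols.append([1.0 + rng.uniform(-1e-3, 1e-3) for _ in range(n)])
y = [3 * cols[0][i] + 2 * cols[1][i] - 2 * cols[2][i] + rng.gauss(0, 0.1) for i in range(n)]

survivors = variance_filter(cols, threshold=1e-2)  # hypothetical threshold
picked = rfe([cols[j] for j in survivors], y, n_features=3)  # the study kept 8
selected = [survivors[j] for j in picked]
print(selected)  # [0, 1, 2]
```

The variance filter removes the near-constant column cheaply before any model is fitted; RFE then refits after every removal, so each feature's importance is judged in the context of the features that remain rather than in isolation.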
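SHAP attributes a model's prediction to its input features using Shapley values from cooperative game theory. As a minimal illustration of that quantity, the sketch below computes exact Shapley values by enumerating every feature subset for a hypothetical three-feature model with one interaction term, replacing "absent" features with a single hypothetical baseline. It is not how the shap library handles tree ensembles such as XGB, which rely on faster model-specific algorithms, and the toy model and numbers are not taken from the study.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x), with absent features set to baseline values."""
    n = len(x)

    def value(coalition):
        # Features in the coalition take their observed value; the rest take the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical toy model: additive effects of features 0 and 1 plus a 1 x 2 interaction.
def toy_model(z):
    return 2 * z[0] + z[1] + 3 * z[1] * z[2]


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print([round(p, 6) for p in phi])  # [2.0, 2.5, 1.5]
```

The interaction term contributes 3 at this input and is split equally between features 1 and 2, while the attributions sum to f(x) - f(baseline) = 6. This additivity is what allows SHAP summaries to decompose a prediction feature by feature and to separate main effects from interaction effects.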
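The reported accuracy and F1 score are standard confusion-matrix metrics for a binary classifier. The sketch below shows how they are computed; the ten labels are hypothetical (for example, 1 might mark grain Cd above whatever threshold the study used, which the abstract does not state) and do not reproduce the paper's results.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for binary labels, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for ten samples (not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
m = binary_metrics(y_true, y_pred)
print({k: round(v, 3) for k, v in m.items()})
# {'accuracy': 0.9, 'precision': 1.0, 'recall': 0.8, 'f1': 0.889}
```

Because F1 ignores true negatives, it can sit above or below accuracy depending on how errors are distributed between the classes, which is why the two figures are reported together.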