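BI Weidong, DING Changfeng, ZHOU Zhigao, WANG Xingxiang. Prediction of cadmium bioconcentration factor for peanuts based on machine-learning methods[J]. Journal of Agro-Environment Science, 2024, 43(6): 1230-1238.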
Prediction of cadmium bioconcentration factor for peanuts based on machine-learning methods
Submitted: 2023-12-18
DOI: 10.11654/jaes.2023-1084
Keywords: soil; peanut; cadmium; random forest; prediction model
Funding: China Agriculture Research System (CARS-13); National Natural Science Foundation of China (42077151)
Abstract:
In this study, 100 paired soil and peanut samples were collected from the main peanut-producing areas of 14 provinces in China. The cadmium (Cd) contamination characteristics of the soil-peanut system and the soil physicochemical properties were analyzed. Machine-learning models were then built to predict the Cd bioconcentration factor (BCF) of peanuts and to identify the main factors influencing Cd accumulation in peanuts. The soils sampled nationwide were predominantly acidic, and 60% of them had pH < 6.50. The mean Cd content of peanut kernels was 0.27 mg·kg⁻¹, and the mean BCF was 2.42. Random forest models were built on the national dataset and on separate datasets for the northern and southern producing areas (R² = 0.930–0.966). Each of these models predicted peanut Cd BCF markedly better than the corresponding multiple linear regression model (R² = 0.471–0.657). The random forest analysis also showed that the most important feature variables differed between regions. In the northern producing areas, the most important predictors of peanut Cd BCF were soil free manganese oxide content, free iron oxide content, and pH. In the southern producing areas, the most important predictors were the soil contents of free manganese oxide, clay, free iron oxide, and organic matter. Compared with traditional multiple linear regression, random forest models therefore predicted the Cd BCF of peanuts more accurately. This offers a new perspective and approach for predicting Cd transfer in soil-peanut systems at large field scales.
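The abstract does not report the modelling details. The sketch below is therefore only a minimal illustration of the kind of comparison it describes, and it rests on several assumptions. It assumes scikit-learn; the study's software is not stated. It assumes BCF is the usual ratio of kernel Cd to soil Cd. It also assumes R² is computed on a held-out split; the abstract does not say whether R² was computed on training, test or cross-validated data. The feature names, the synthetic data generator, the 70/30 split and `n_estimators=500` are all hypothetical choices, and the synthetic numbers are made up, not the survey data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical feature set, loosely mirroring the soil properties named in the abstract.
FEATURES = ["pH", "free_Mn_oxide", "free_Fe_oxide", "clay", "organic_matter"]


def bioconcentration_factor(kernel_cd, soil_cd):
    """Assumed definition: BCF = Cd in peanut kernel / Cd in soil (same units, e.g. mg/kg)."""
    return np.asarray(kernel_cd, dtype=float) / np.asarray(soil_cd, dtype=float)


def make_synthetic_samples(n=100, seed=0):
    """Made-up soil/peanut data for illustration only; NOT the study's measurements."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([
        rng.uniform(4.5, 8.5, n),   # pH
        rng.uniform(0.1, 1.5, n),   # free Mn oxide (illustrative units)
        rng.uniform(5.0, 40.0, n),  # free Fe oxide
        rng.uniform(5.0, 45.0, n),  # clay
        rng.uniform(5.0, 40.0, n),  # organic matter
    ])
    soil_cd = rng.uniform(0.05, 0.40, n)
    # Arbitrary nonlinear "true" BCF, so a linear model cannot fit it exactly.
    true_bcf = np.exp(1.0 - 0.5 * (X[:, 0] - 5.0) + np.sin(3.0 * X[:, 1]) - 0.02 * X[:, 2])
    kernel_cd = soil_cd * true_bcf * rng.lognormal(0.0, 0.1, n)  # multiplicative noise
    return X, soil_cd, kernel_cd


def compare_models(X, bcf, seed=0):
    """Fit MLR and a random forest on a training split; report held-out R² and RF importances."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, bcf, test_size=0.3, random_state=seed)
    mlr = LinearRegression().fit(X_tr, y_tr)
    rf = RandomForestRegressor(n_estimators=500, random_state=seed).fit(X_tr, y_tr)
    # Impurity-based importances, sorted from most to least important.
    ranking = sorted(zip(FEATURES, rf.feature_importances_), key=lambda kv: kv[1], reverse=True)
    return {
        "r2_mlr": r2_score(y_te, mlr.predict(X_te)),
        "r2_rf": r2_score(y_te, rf.predict(X_te)),
        "importance": ranking,
    }


if __name__ == "__main__":
    X, soil_cd, kernel_cd = make_synthetic_samples()
    bcf = bioconcentration_factor(kernel_cd, soil_cd)
    result = compare_models(X, bcf)
    print(f"MLR R2 = {result['r2_mlr']:.3f}, RF R2 = {result['r2_rf']:.3f}")
    for name, score in result["importance"]:
        print(f"{name:>15s}: {score:.3f}")
```

Fitting the same features with both a linear model and a random forest shows the contrast the abstract reports: the forest can capture nonlinear and interacting soil effects that a single linear equation misses. To mirror the regional analysis, the same comparison could be repeated on separate northern and southern subsets. Impurity-based importances can favour some variables over others, so permutation importance is a common cross-check.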