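| Jiao Yangqing, Zhang Shiwen, Yan Fang, Wang Shengtao, Zhao Baoyu. Spatial prediction of soil organic matter based on feature screening and random forests [J]. Journal of Agro-Environment Science (农业环境科学学报), 2025, 44(11): 2864-2874. |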
| Spatial prediction of soil organic matter based on feature screening and random forests (基于特征筛选与随机森林的土壤有机质空间预测) |
| Submitted: 2025-06-16 |
| DOI:10.11654/jaes.2025-0557 |
| Keywords: soil organic matter; feature screening; random forest; spatial prediction |
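| Funding: Third Xinjiang Scientific Expedition Program (2021xjkk0200601); Post Expert Project on Soil Evaluation and Quality Improvement, Grain Crops Innovation Team (BAIC02-2024); National Key Research and Development Program (2020YFC1908601) |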
| Abstract: |
| To address the negative effect of redundant feature variables on random forest models, this study combined and optimized different types of environmental variables to examine how they affect random forest prediction of the spatial distribution of soil organic matter. Topographic, climatic, vegetation, soil-property, and anthropogenic factors were permuted and combined to build random forest prediction models of soil organic matter from different sets of environmental variables. Spearman correlation analysis and importance analysis were then used to select the optimal set of environmental variables. The results showed that the random forest model using soil properties and anthropogenic factors as input variables performed comparatively well, with a root mean square error of 4.387 g·kg⁻¹ and a coefficient of determination of 0.802; prediction accuracy was lowest when climatic factors were the sole input, with a coefficient of determination of 0.747; after feature screening removed redundant variables, the random forest model performed best, with a root mean square error of 2.785 g·kg⁻¹ and a coefficient of determination of 0.911; and the correlation and importance analyses indicated that topography was the main factor influencing soil organic matter in the study area. The random forest model predicted more accurately than the conventional ordinary kriging, regression kriging, and geographically weighted regression kriging models. Feature-screened environmental variables effectively improved the accuracy of the random forest model: even with the reduced variable set, using only elevation, slope, soil parent material, and mean annual precipitation, the prediction accuracy for soil organic matter content in the study area exceeded 0.8. |
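The screening and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only a Spearman-based redundancy filter and the two reported accuracy metrics (root mean square error and coefficient of determination), and it leaves out the random forest importance analysis. The function names, the thresholds `min_target_rho` and `max_pair_rho`, and the toy data are all assumptions made for illustration.

```python
import math
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def screen_features(features, target, min_target_rho=0.3, max_pair_rho=0.8):
    """Keep covariates related to the target and drop redundant ones.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    score = {name: abs(spearman(col, target)) for name, col in features.items()}
    kept = []
    for name in sorted(score, key=score.get, reverse=True):
        if score[name] < min_target_rho:
            continue  # too weakly related to the target
        if all(abs(spearman(features[name], features[k])) < max_pair_rho
               for k in kept):
            kept.append(name)  # not redundant with anything already kept
    return kept


def rmse(observed, predicted):
    """Root mean square error, in the units of the observations."""
    return math.sqrt(mean((o - p) ** 2 for o, p in zip(observed, predicted)))


def r2(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    m = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - m) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Invented toy data, not taken from the study.
som = [12.0, 15.5, 18.2, 22.1, 25.3, 30.8]  # soil organic matter, g/kg
covariates = {
    "elevation": [210, 340, 400, 520, 610, 780],
    "elevation_copy": [211, 339, 402, 521, 612, 779],  # redundant duplicate
    "noise": [5, 1, 4, 2, 6, 3],  # nearly unrelated to the target
}
print(screen_features(covariates, som))
```

In a fuller workflow the retained covariates would then feed a random forest regressor, with `rmse` and `r2` computed on held-out samples to compare variable sets.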