Predictive modelling on meteorological factors and wine grape metabolome using machine learning

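    • Abstract: To investigate how meteorological factors influence the wine grape metabolome and to predict metabolite accumulation accurately, this study proposes IDBO-XGBoost, an ensemble prediction method that combines an improved dung beetle optimizer (IDBO) with the eXtreme gradient boosting (XGBoost) model and predicts the wine grape metabolome from meteorological factors. First, IDBO is used to optimize the hyperparameters of the XGBoost model and improve its performance. Second, SHAP (Shapley additive explanations) interpretability analysis is used to uncover how meteorological factors influence grape metabolites. The experimental results show that the proposed model achieves higher prediction accuracy than the XGBoost model on multiple wine grape metabolite datasets: the coefficient of determination (R²) increases by 8.5% on average, while the mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE) decrease by 9.4%, 7.7%, and 12.1% on average, respectively. The method can accurately predict wine grape metabolites from meteorological factors and has important application value in precision agriculture.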
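The four evaluation metrics named above have standard definitions. The sketch below computes them with NumPy under those usual formulas; the function name `regression_metrics` and the sample values are illustrative assumptions and do not come from the study.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return R2, MAE, RMSE and MAPE (in percent) under their standard definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    ss_res = np.sum(err ** 2)                       # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": np.mean(np.abs(err)),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "MAPE": 100.0 * np.mean(np.abs(err / y_true)),  # assumes no zero targets
    }

# Hypothetical measured vs. predicted metabolite contents (arbitrary units).
metrics = regression_metrics([2.0, 4.0, 6.0, 8.0], [2.5, 3.5, 6.5, 7.5])
```

A higher R² and lower MAE, RMSE, and MAPE indicate a better fit, which is the direction of the improvements reported for IDBO-XGBoost over the baseline.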

       

      Abstract: Climate strongly affects the growth, development, and quality formation of wine grapes, particularly in conventional viticulture. Meteorological factors, including temperature, solar radiation, and precipitation, play a pivotal role in the physiological and metabolic processes of grape berries. They also directly influence the accumulation of key secondary metabolites, such as flavonoids and aroma compounds, and thereby shape wine flavor, aroma, and overall quality. Under global warming and an increasing frequency of extreme weather events, accurate and reliable predictive models are needed to clarify the relationships between meteorological parameters and grape metabolic responses, and to support adaptive cultivation and precision viticulture. However, existing prediction models are limited by hyperparameter sensitivity, weak generalization, and a tendency to become trapped in local optima. In this study, a domain-specific dataset was constructed from the meteorological indicators and metabolomic profiles of four wine grape varieties over multiple developmental stages. A forecasting framework, IDBO-XGBoost, was proposed that combines an improved dung beetle optimizer (IDBO) with the eXtreme gradient boosting (XGBoost) algorithm. The IDBO algorithm incorporated two enhancements: an osprey global exploration strategy that strengthens population diversity and reduces premature convergence, and an adaptive t-distribution mutation operator that balances global exploration and local exploitation during the iterative process. These enhancements significantly improved the optimization efficiency, convergence speed, and solution quality. In extensive validation experiments on nine benchmark test functions, the IDBO outperformed the standard dung beetle optimizer in both precision and stability.
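The abstract does not give the exact form of the adaptive t-distribution mutation operator. The sketch below assumes one formulation common in the swarm-optimization literature, x' = x + x · t(iter), where the degrees of freedom of the Student's t-distribution equal the current iteration number. The function names, the greedy selection step, and the sphere test function are illustrative assumptions. The osprey exploration strategy and the dung beetle update rules are not reproduced here.

```python
import numpy as np

def sphere(x):
    """Sphere benchmark function; global minimum 0 at the origin."""
    return float(np.sum(x ** 2))

def t_mutation_step(population, fitness, objective, iteration, rng, lower, upper):
    """Apply x' = x + x * t(iteration) to each individual; keep a mutant only if it is better."""
    new_pop = population.copy()
    new_fit = fitness.copy()
    for i, x in enumerate(population):
        # Assumed schedule: degrees of freedom = iteration count, so the
        # step is heavy-tailed early (exploration) and close to Gaussian
        # later (exploitation).
        step = rng.standard_t(df=iteration, size=x.shape)
        candidate = np.clip(x + x * step, lower, upper)
        f = objective(candidate)
        if f < new_fit[i]:  # greedy selection
            new_pop[i], new_fit[i] = candidate, f
    return new_pop, new_fit

rng = np.random.default_rng(0)
pop = rng.uniform(-5.0, 5.0, size=(20, 4))
fit = np.array([sphere(x) for x in pop])
initial_best = fit.min()
for it in range(1, 51):
    pop, fit = t_mutation_step(pop, fit, sphere, it, rng, -5.0, 5.0)
final_best = fit.min()
```

Because of the greedy selection, the best fitness in the population can never get worse from one iteration to the next. In a hyperparameter search, the objective would be a validation error of the model instead of the sphere function.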
When applied to predict the accumulation of 12 key metabolite groups in wine grapes, including flavonols, flavanols, free and bound forms of terpenoids, norisoprenoids, and carbonyl compounds, the IDBO-XGBoost model showed superior predictive performance on all datasets. Compared with the baseline XGBoost model, the coefficient of determination (R²) increased by 8.5% on average, while the mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE) decreased by 9.4%, 7.7%, and 12.1% on average, respectively, indicating significantly enhanced prediction accuracy and robustness. Furthermore, SHAP (Shapley Additive exPlanations) interpretability analysis was used to quantify the contribution of each meteorological feature and to explore the mechanisms by which climatic variables influence metabolic outputs. For instance, moderate temperatures (DT20-25) were positively correlated with flavonol accumulation, whereas high temperatures (DT40) exhibited an inhibitory effect. Sunshine duration and effective accumulated temperature had divergent effects on free and bound terpenoids, suggesting enzyme-mediated metabolic shifts. Additionally, the effects of some factors, such as precipitation and temperature ranges, were variety-specific, indicating that environmental responses depend on genotype. The study thus provides an intelligent computational framework for accurately predicting wine grape metabolic traits under varying climatic conditions and clarifies the ecophysiological mechanisms that govern grape quality. Interpretable machine learning can bridge the gap between data-driven modeling and biological interpretation in vineyard management decisions.
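SHAP attributes a prediction to individual features using Shapley values. The sketch below computes exact Shapley values for a small hypothetical model by enumerating feature coalitions, with absent features set to a baseline value. It does not use the SHAP library, the toy model and inputs are assumptions made for illustration, and the study's actual SHAP configuration is not stated in the abstract.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value.
    Cost grows as 2**n, so this only suits a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi

# Hypothetical toy response: feature 0 helps, feature 1 hurts,
# and features 0 and 2 interact.
def toy_model(f):
    return 2.0 * f[0] - 3.0 * f[1] + f[0] * f[2]

x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

The attributions sum to the difference between the prediction and the baseline prediction, and the interaction term is split equally between the two features involved. A positive value marks a feature that pushes the prediction up, which is how a positive association such as that of DT20-25 with flavonol accumulation would appear in a SHAP analysis.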
These findings have substantial practical significance for mitigating the impacts of climate on grape cultivation and for improving resource use efficiency in a sustainable wine industry.

       
