    Zhang Cong, Zhou Weifeng, Tang Fenghua, Shi Yongchuang, Fan Wei. Forecasting models for yellowfin tuna fishing ground in the central and western Pacific based on machine learning[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2022, 38(15): 330-338. DOI: 10.11975/j.issn.1002-6819.2022.15.036


    Forecasting models for yellowfin tuna fishing ground in the central and western Pacific based on machine learning

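    • Abstract (translated from Chinese): To provide accurate fishing-ground forecasts for yellowfin tuna in the central and western Pacific, this study used fishery data from 43 distant-water longline fishing vessels of China Aquatic Group operating in the central and western Pacific (0°-30°S; 110°E-170°W) from 2008 to 2019. After variance inflation factor screening and normalization, 35 features in total, covering spatiotemporal factors, marine environmental factors and large-scale climate data, were selected to build XGBRF, a model that combines random forest with the extreme gradient boosting decision tree, and five-fold cross-validation was used to determine its optimal parameters. Logistic regression, classification and regression tree, K-nearest neighbors, adaptive boosting, gradient boosting decision tree, extreme gradient boosting decision tree and random forest models served as baselines, giving eight yellowfin tuna fishing-ground prediction models that were then compared. The results showed that XGBRF predicted yellowfin tuna fishing grounds in the central and western Pacific better than the other models: its accuracy, fishing-ground recall, fishing-ground F1 score, non-fishing-ground precision and area under the curve (AUC) were all the highest, at 75.39%, 87.36%, 82.64%, 66.32% and 79.48%, respectively, and its receiver operating characteristic (ROC) curve lay closer to the upper-left corner. Sea surface temperature was the most important environmental factor affecting the distribution of yellowfin tuna fishing grounds in the central and western Pacific, followed in order by the temperature of the 300 m layer, the salinity of the 50 m layer, chlorophyll a concentration, the Southern Oscillation Index and sea surface salinity; the spatiotemporal factors and the remaining large-scale climate factors had relatively little influence. The fishing grounds predicted by the XGBRF model were broadly consistent with the actual fishing areas. The XGBRF ensemble model therefore performs well in forecasting yellowfin tuna fishing grounds in the central and western Pacific and can serve as a reference for fishing-ground forecasting.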
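The fishing-ground recall, fishing-ground F1 score, non-fishing-ground precision and accuracy used to compare the eight models are standard binary-classification quantities computed from a confusion matrix. The following is a minimal pure-Python sketch, assuming fishing ground is coded as 1 and non-fishing ground as 0; the coding, the function name `fishing_ground_metrics` and the toy labels are illustrative assumptions, not the authors' code or data.

```python
from typing import Dict, Sequence


def fishing_ground_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Binary metrics, assuming 1 = fishing ground and 0 = non-fishing ground."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fishing ground, predicted as such
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-fishing ground, predicted as such
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarm
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed fishing ground

    def ratio(num: float, den: float) -> float:
        # Treat 0/0 as 0.0 so a class absent from the predictions does not raise
        return num / den if den else 0.0

    fg_precision = ratio(tp, tp + fp)
    fg_recall = ratio(tp, tp + fn)  # share of true fishing grounds that were found
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "fg_recall": fg_recall,
        "fg_f1": ratio(2 * fg_precision * fg_recall, fg_precision + fg_recall),
        # Precision of the non-fishing-ground class: how many predicted 0s are truly 0
        "nfg_precision": ratio(tn, tn + fn),
    }


# Toy labels for illustration only (not data from the study)
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
print(fishing_ground_metrics(y_true, y_pred))
```

For these toy labels tp = 4, fn = 1, tn = 2 and fp = 1, so accuracy is 6/8 = 0.75, fishing-ground recall is 4/5 = 0.8 and non-fishing-ground precision is 2/3. Reporting recall for one class and precision for the other separates how many real fishing grounds are found from how reliable a "non-fishing ground" prediction is.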

       

      Abstract: Accurate forecasts of yellowfin tuna fishing grounds in the western and central Pacific are of great value to the fishery. In recent years, however, large volumes of fishery data and high feature dimensionality have made various classification models prone to over-fitting. Random forest, which integrates trees in parallel, is expected to complement the strong performance of the extreme gradient boosting decision tree (XGBoost). In this study, a hybrid ensemble model, XGBoost with Random Forest (XGBRF), was proposed that combines random forest with the extreme gradient boosting decision tree. Fishery production data were collected from 43 distant-water longline fishing vessels of China Aquatic Group operating in the western and central Pacific (0°-30°S; 110°E-170°W) from 2008 to 2019, including catch amount, operation date, and operation latitude and longitude. The fishery data were combined with environmental variables, including chlorophyll a concentration, eddy kinetic energy, sea surface height anomaly, and the temperature and salinity of the water layers from 0 to 500 m. The original data set comprised 36 variables, including large-scale climate indices such as the Southern Oscillation Index (SOI), the Arctic Oscillation Index (AOI), the Pacific Decadal Oscillation Index (PDOI) and the North Pacific Gyre Oscillation Index (NPGOI). After variance inflation factor screening and normalization, the data set was divided into a training set (80%) and a test set (20%). The training set was used to train eight models: classification and regression tree, logistic regression, k-nearest neighbors, adaptive boosting, gradient boosting decision tree, XGBoost, random forest, and XGBRF, with five-fold cross-validation used to determine the optimal parameters of each model. Finally, the predictions on the test set were overlaid on the actual fishing grounds to verify the model.
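The preprocessing steps described above (variance inflation factor screening, normalization, an 80%/20% train/test split and five-fold cross-validation) can be sketched in plain Python. This is a minimal illustration under stated assumptions: the VIF threshold of 10, min-max scaling, the random seed, the tiny ridge term and all function and variable names are hypothetical, since the abstract does not state the threshold or the normalization method used.

```python
import random
from typing import List, Sequence, Tuple


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def vif(columns: Sequence[Sequence[float]], j: int, ridge: float = 1e-8) -> float:
    """VIF_j = 1 / (1 - R_j^2), R_j^2 from regressing column j on the others plus an intercept."""
    y = list(columns[j])
    others = [c for k, c in enumerate(columns) if k != j]
    n = len(y)
    x = [[1.0] + [c[i] for c in others] for i in range(n)]
    p = len(x[0])
    # Normal equations; the tiny (assumed) ridge term keeps them solvable if the other columns are collinear
    xtx = [[sum(row[a] * row[b] for row in x) + (ridge if a == b else 0.0) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(x, y)) for a in range(p)]
    beta = _solve(xtx, xty)
    mean_y = sum(y) / n
    ss_res = sum((yi - sum(bk * xk for bk, xk in zip(beta, row))) ** 2 for row, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)  # assumes column j is not constant
    r2 = 1.0 - ss_res / ss_tot
    return float("inf") if r2 >= 1.0 - 1e-12 else 1.0 / (1.0 - r2)


def vif_screen(names: Sequence[str], columns: Sequence[Sequence[float]],
               threshold: float = 10.0) -> Tuple[List[str], List[List[float]]]:
    """Repeatedly drop the variable with the largest VIF until every VIF <= threshold."""
    names, cols = list(names), [list(c) for c in columns]
    while len(cols) > 1:
        scores = [vif(cols, j) for j in range(len(cols))]
        worst = max(range(len(scores)), key=scores.__getitem__)
        if scores[worst] <= threshold:
            break
        del names[worst], cols[worst]
    return names, cols


def min_max(column: Sequence[float]) -> List[float]:
    """Scale a column to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in column]


def split_and_folds(n: int, train_frac: float = 0.8, k: int = 5, seed: int = 0):
    """Shuffle row indices, split them 80/20 into train/test, and cut train into k CV folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_frac * n))
    train, test = idx[:cut], idx[cut:]
    folds = [train[f::k] for f in range(k)]  # each fold serves once as the validation set
    return train, test, folds


# Toy columns for illustration only: "sst_x2" is an exact linear copy of "sst"
sst = [20.0, 22.0, 25.0, 27.0, 29.0, 24.0, 26.0, 28.0]
t300 = [10.0, 9.0, 12.0, 11.0, 13.0, 10.0, 14.0, 12.0]
sst_x2 = [2.0 * v + 1.0 for v in sst]
kept, cols = vif_screen(["sst", "t300", "sst_x2"], [sst, t300, sst_x2])
print(kept, [min_max(c) for c in cols])
train, test, folds = split_and_folds(100)
print(len(train), len(test), [len(f) for f in folds])
```

Removing the single worst variable per round, rather than every variable above the threshold at once, matters because VIFs change after each removal: in the toy data only one of the two perfectly collinear copies is dropped, and the remaining pair has a VIF of about 2.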
The experimental results showed that: 1) There was a significant correlation between catch per unit effort (CPUE) and the various variables, and variance inflation factor screening greatly reduced the collinearity among the variables. 2) The XGBRF hybrid ensemble model significantly improved on the performance of the XGBoost and RF models, achieving the highest accuracy (75.39%) and area under the curve (AUC, 79.48%). Its receiver operating characteristic (ROC) curve lay closest to the upper-left corner, indicating the best forecasting performance among the models compared. 3) Sea surface temperature was the most important factor governing the distribution of yellowfin tuna fishing grounds, with an importance of 7.573%, closely followed by the temperature of the 300 m layer (7.369%). The salinity of the 50 m layer, the SOI, chlorophyll a concentration and sea surface salinity also had a large influence, whereas the large-scale climate factors other than the SOI had relatively little. 4) The fishing grounds predicted by the XGBRF model deviated only slightly from the actual fishing grounds, indicating high accuracy and reliability of the prediction. Overall, the XGBRF ensemble model performed best in forecasting yellowfin tuna fishing grounds in the western and central Pacific, and the findings can serve as a reference for fishing-ground forecasting.
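The ROC curve and AUC reported above are computed from the predicted fishing-ground probabilities on the test set. The following is a minimal pure-Python sketch; the function names, the toy labels and scores, and the choice to treat tied scores as a single threshold step are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs from sweeping the threshold down through the predicted scores."""
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos  # assumes both classes are present
    pairs = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        # Samples sharing one score are consumed together, so ties form a single diagonal step
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC polyline by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy test-set labels and predicted fishing-ground probabilities (illustrative only)
labels = [1, 1, 0, 1, 0, 0]
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, probs)
print(curve, auc(curve))
```

In the toy example 8 of the 9 fishing-ground/non-fishing-ground pairs are ranked correctly, so the AUC is 8/9, about 0.889. A curve that rises steeply near FPR = 0, hugging the upper-left corner, encloses more area, which is why a curve closer to that corner indicates better discrimination.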

       
