Implementation and testing of a crop growth model algorithm in a Hadoop-based cloud environment

赵青松; 陈 林; 孙 波; 朱 艳; 姜海燕

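Summary: To speed up crop growth model computation, this paper proposes an implementation scheme for crop growth model algorithms in a cloud environment. It analyzes the data dependencies between the crop growth model and its sub-models, as well as the characteristics of different parallel computing methods. A processing scheme for crop growth models in a cloud environment is designed on top of Hadoop, the open-source cloud computing infrastructure software. With the wheat growth model WheatGrow as the test object, the effectiveness of the scheme is verified in a real cloud environment. The results show that, for problems with complex data dependencies such as crop growth models, a data-parallel computing method should be used when there are many regional data points; moreover, the more regional data points there are and the more compute nodes join the computation, the more clearly the scalability of MapReduce in parallel computing shows. The study can serve as a reference for advancing crop growth models and digital agriculture.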

Abstract: As the inputs to a crop growth model grow to cover data from multiple sites, weather, and soil, and especially when massive regional data must be processed, the response time of the model gets longer. Considering the large amount of weather data, this paper selected a cloud computing parallel computation scheme and proposed a cloud-based crop growth model algorithm to improve the parallel computation speed and response time of the model. First, the authors analyzed the crop growth model and the data dependence relationships among its sub-models, and then summarized different parallel computation schemes. From a system constitution perspective, the crop growth model included a model description, model structure, model algorithm, and forcing data. The complex data dependence relations between sub-models, and among computing units within a sub-model, comprised independence, synchronous dependency, self-reliance, and interdependency. Parallel computation was grouped into data-intensive computing and computing-intensive computing, according to the characteristics of the calculation. The former was suitable for computation tasks with large amounts of data and simple computing relations, while the latter was suitable for tasks with small amounts of data and complex computing relations. Second, a crop growth model scheme for cloud computing was designed on the basis of Hadoop, an open-source cloud computing infrastructure. The MapReduce parallel computation scheme assumed that the computing tasks of all sub-models at one regional point of the same crop were treated as a single computing job, and that a number of computing nodes together completed the crop growth computation for multiple regional points.
Hence, the granularity of MapReduce parallel computation was one crop at one regional point, and a crop growth model computing task could be broken down into multiple sub-tasks that executed on different nodes in parallel. The object-oriented approach was employed to design different sub-m. Third, taking WheatGrow, a wheat growth model from the National Engineering and Technology Center for Information Agriculture, as the test target, the effectiveness of this scheme was verified in a real cloud computing environment. Taking the development stage sub-model as an example, a comparison of data-intensive and computing-intensive parallel computation methods showed that the data-intensive method performed better. Therefore, when dealing with a crop growth model that has complex data dependence relations, the data-intensive parallel computation method is the more reasonable choice when there are many regional data points. The more regional data points there were and the more computing nodes were added, the more clearly the scalability of MapReduce showed. When the number of regional data points was fixed, the measured program runtime fell below the proportional line and its increasing tendency gradually diminished, which also showed that MapReduce scales well. Hadoop was not suitable for processing small amounts of data, and a pseudo-distributed environment was not suitable for the calculation, although it was convenient for program development. Finally, the authors suggested that this work offers some guidance for regional applications of crop growth models, could help increase both the production and the income of regional crops, and provides a reference for promoting the development of crop growth models and digital agriculture, with broad application prospects.
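The regional-point granularity described in the abstract can be illustrated with a minimal, single-process sketch in Python. It only imitates the map, shuffle, and reduce phases in memory and does not use Hadoop itself. The record layout, the function names (`map_record`, `shuffle`, `reduce_point`, `run_job`), and the toy thermal-time calculation are all assumptions made for illustration; they are not the WheatGrow development stage sub-model or the authors' implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical input record: (regional point id, day of year, mean temperature in deg C).
WeatherRecord = Tuple[str, int, float]

# Assumed base temperature for the toy thermal-time model (not a WheatGrow parameter).
BASE_TEMP_C = 0.0


def map_record(record: WeatherRecord) -> Iterator[Tuple[str, Tuple[int, float]]]:
    """Map phase: key every weather record by its regional point."""
    point_id, day, temp = record
    yield point_id, (day, temp)


def shuffle(pairs: Iterable[Tuple[str, Tuple[int, float]]]) -> Dict[str, List[Tuple[int, float]]]:
    """Shuffle phase: group all values that share a regional point key."""
    groups: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_point(point_id: str, values: List[Tuple[int, float]]) -> Tuple[str, float]:
    """Reduce phase: run the whole (toy) model for one regional point.

    Days are processed in order because each day's state depends on the
    previous day, so the dependency stays inside a single reduce call.
    """
    thermal_time = 0.0
    for _day, temp in sorted(values):
        thermal_time += max(temp - BASE_TEMP_C, 0.0)
    return point_id, thermal_time


def run_job(records: Iterable[WeatherRecord]) -> Dict[str, float]:
    """Chain map, shuffle, and reduce; each point could run on a different node."""
    pairs = (pair for record in records for pair in map_record(record))
    return dict(reduce_point(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    sample = [("A", 1, 5.0), ("B", 1, -2.0), ("A", 2, 7.5), ("B", 2, 3.0)]
    print(run_job(sample))
```

Because each reduce call touches only one regional point, points share no state, which is what would let a real cluster spread them across nodes; the day-to-day dependency within a point is handled sequentially inside the reduce step.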

Algorithm implementation and testing of a crop growth model based on Hadoop cloud computing