Construction of an efficient wheat breeding information exchange system based on artificial intelligence

杨民安; 孙雨; 王凤超; 杨晶; 陈进

doi:10.11975/j.issn.1002-6819.202310107

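Abstract

Wheat is one of the most important food resources for human society, so building an efficient breeding information exchange platform based on artificial intelligence is of strategic value for high-quality, high-yield wheat cultivation. The key to building such a platform is the accurate identification and classification of core data. This study therefore proposes a Naive Bayes-AdaBoost strategy, applies it to the classification and recognition of wheat breeding information, and uses it to build the exchange platform. In this strategy, AdaBoost iterates over weak Naive Bayes classifiers to form a strong classifier, while also filtering and optimizing the core vocabulary, with the aim of improving classification and recognition accuracy. The results show that the method reaches a recognition accuracy of 99.2%, 12.2 percentage points higher than the traditional Naive Bayes method; under the same conditions, the Naive Bayes, decision tree, and support vector machine methods reach accuracies of 87.0%, 86.6%, and 85.6%, respectively. These results indicate that the proposed method has considerable potential in scenarios that involve the classification and recognition of complex data.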
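The combination described above, AdaBoost iterating over weak Naive Bayes classifiers, can be illustrated with a minimal sketch. This is not the authors' implementation: the SAMME form of the AdaBoost update, the way sample weights enter the Naive Bayes counts, the smoothing value, the function names (`train_nb`, `predict_nb`, `adaboost_nb`, `predict`), and the toy tokens and labels are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, weights, vocab, smoothing=1.0):
    """Fit multinomial Naive Bayes on weighted documents (weights come from AdaBoost)."""
    scale = len(docs) / sum(weights)  # uniform weights then reproduce ordinary counts
    prior, counts = Counter(), defaultdict(Counter)
    for doc, y, w in zip(docs, labels, weights):
        prior[y] += w * scale
        for tok in doc:
            counts[y][tok] += w * scale
    model = {}
    for c in prior:
        denom = sum(counts[c].values()) + smoothing * len(vocab)
        log_lik = {t: math.log((counts[c][t] + smoothing) / denom) for t in vocab}
        model[c] = (math.log(prior[c] / len(docs)), log_lik)
    return model

def predict_nb(model, doc):
    """Return the class with the highest log posterior; unseen tokens are skipped."""
    scores = {c: lp + sum(ll[t] for t in doc if t in ll)
              for c, (lp, ll) in model.items()}
    return max(scores, key=scores.get)

def adaboost_nb(docs, labels, rounds=10):
    """AdaBoost (SAMME variant, an assumption) over weighted Naive Bayes learners.

    Assumes at least two classes are present in `labels`.
    """
    vocab = {t for d in docs for t in d}
    n, k = len(docs), len(set(labels))
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        model = train_nb(docs, labels, weights, vocab)
        miss = [predict_nb(model, d) != y for d, y in zip(docs, labels)]
        err = sum(w for w, m in zip(weights, miss) if m)
        if err >= 1 - 1 / k:          # no better than chance: stop boosting
            break
        err = max(err, 1e-10)         # avoid division by zero for a perfect learner
        alpha = math.log((1 - err) / err) + math.log(k - 1)
        ensemble.append((alpha, model))
        if not any(miss):             # nothing left to re-weight
            break
        # Increase the weight of misclassified documents, then renormalise.
        weights = [w * math.exp(alpha) if m else w for w, m in zip(weights, miss)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, doc):
    """Weighted vote of all weak learners."""
    votes = defaultdict(float)
    for alpha, model in ensemble:
        votes[predict_nb(model, doc)] += alpha
    return max(votes, key=votes.get)

# Hypothetical toy corpus: tokenised breeding records with made-up labels.
docs = [
    ["rust", "resistant", "leaf"],
    ["stripe", "rust", "resistant"],
    ["powdery", "mildew", "resistant"],
    ["grain", "yield", "high"],
    ["thousand", "kernel", "weight", "yield"],
    ["spike", "number", "yield", "high"],
]
labels = ["disease", "disease", "disease", "yield", "yield", "yield"]
ensemble = adaboost_nb(docs, labels)
print(predict(ensemble, ["rust", "leaf"]), predict(ensemble, ["high", "yield"]))
```

On this small, cleanly separable toy set the first Naive Bayes model already classifies every training record correctly, so the loop stops after one round. The re-weighting step only comes into play on harder data where early learners make mistakes.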

Abstract: High-yield, superior-quality wheat varieties can contribute substantially to meeting the growing demand for food security. Artificial intelligence (AI) can be integrated into efficient breeding programs, and AI-driven information exchange platforms can facilitate the collection, analysis, and dissemination of breeding data for genetic improvement. However, such platforms hinge on the accurate identification and classification of core data, which is often challenging because breeding information is complex and diverse. In this study, a Naive Bayes-AdaBoost strategy was introduced to construct the wheat breeding interaction system. Naive Bayes, a probabilistic machine learning method, served as the foundation. AdaBoost was then integrated to enhance classification performance on the datasets. AdaBoost functions as a meta-learning algorithm that iteratively boosts the performance of weak classifiers (in this case, Naive Bayes) by focusing on misclassified data points. The resulting strong classifier significantly improved accuracy. Additionally, vocabulary optimization was incorporated: the core vocabulary was filtered and refined for data classification, reducing noise and irrelevant information to further enhance recognition accuracy. The efficacy of the Naive Bayes-AdaBoost strategy was evaluated against traditional machine learning methods, including standalone Naive Bayes, decision tree, and support vector machine. The results showed that the strategy achieved 99.2% accuracy in classifying and recognizing wheat breeding information, a 12.2 percentage point improvement over conventional Naive Bayes. Such highly accurate classification and recognition supports the construction of efficient and reliable wheat breeding information platforms.
These platforms are potentially valuable tools for data-driven decision-making in the development of superior wheat varieties for global food security. In addition, a real-world case study in a wheat breeding project was conducted to verify the practical applicability of the strategy. New wheat varieties with enhanced disease resistance and yield potential were developed, and a dataset comprising genotypic, phenotypic, environmental, and historical breeding records was collected. The Naive Bayes-AdaBoost strategy classified these data efficiently, revealing key genes and genomic regions associated with the targeted traits. This informed the selection of optimal parents and breeding strategies, promoting the development of superior wheat varieties. Furthermore, user-friendly software was developed to integrate the strategy into an intuitive interface. Breeding data can be uploaded, managed, and analyzed without specialized knowledge of machine learning, and the data visualization and analysis features help identify potential breeding targets and optimize breeding strategies. The usefulness of the strategy extends beyond its immediate application in wheat breeding, indicating potential applicability to the classification of other complex data. For example, accurate identification and classification of plant diseases is important for effective disease management; here the Naive Bayes-AdaBoost strategy can be employed to analyze images of diseased plants for automated diagnosis from visual symptoms. Similarly, in precision agriculture, the improved strategy can be used to classify soil types in satellite imagery for crop health monitoring and resource allocation based on real-time data analysis. In conclusion, the Naive Bayes-AdaBoost strategy can provide a robust solution for the efficient classification and recognition of wheat breeding information. Its high accuracy, adaptability, and user-friendliness can be expected to serve data-driven breeding programs. Moreover, its versatility extends to other domains that involve complex data classification.
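The abstract states that the core vocabulary is filtered and refined to reduce noise, but does not say how. One possible reading is sketched below as an illustration only: keep a token if it occurs in enough documents and those documents mostly share one class. The selection rule, the thresholds `min_df` and `min_purity`, the function name `select_core_vocab`, and the toy records are all assumptions, not the method used in the study.

```python
from collections import Counter, defaultdict

def select_core_vocab(docs, labels, min_df=2, min_purity=0.8):
    """Keep tokens that are frequent enough and concentrated in one class.

    min_df     : minimum number of documents a token must appear in (assumed value)
    min_purity : minimum share of those documents belonging to the token's
                 dominant class (assumed value)
    """
    df = Counter()                      # document frequency per token
    df_by_class = defaultdict(Counter)  # document frequency per token and class
    for doc, y in zip(docs, labels):
        for tok in set(doc):            # count each token once per document
            df[tok] += 1
            df_by_class[tok][y] += 1
    core = set()
    for tok, n in df.items():
        purity = max(df_by_class[tok].values()) / n
        if n >= min_df and purity >= min_purity:
            core.add(tok)
    return core

# Hypothetical toy records; "wheat" appears in every class and carries no signal.
docs = [
    ["wheat", "stripe", "rust", "resistant"],
    ["wheat", "powdery", "mildew", "resistant"],
    ["wheat", "rust", "leaf"],
    ["wheat", "grain", "yield", "high"],
    ["wheat", "kernel", "yield"],
    ["wheat", "spike", "yield", "high"],
]
labels = ["disease", "disease", "disease", "yield", "yield", "yield"]

core = select_core_vocab(docs, labels)
print(sorted(core))  # ['high', 'resistant', 'rust', 'yield']
```

In this toy run the ubiquitous token "wheat" is dropped because it is split evenly across classes, and tokens seen in only one record are dropped as too rare. A classifier would then be trained on the remaining core tokens only.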
