Abstract:
High-yield, high-quality wheat varieties can contribute greatly to meeting the escalating demand for food security. Artificial intelligence (AI) can be integrated into breeding programs to make them more efficient, and AI-driven information exchange platforms can facilitate the collection, analysis, and dissemination of breeding data for genetic improvement. However, such platforms hinge on the accurate identification and classification of core data, which is challenging because breeding information is complex and diverse. In this study, a Naive Bayes-AdaBoost strategy was introduced to construct a wheat breeding interaction system. Naive Bayes, a probabilistic machine learning algorithm, served as the base classifier. AdaBoost was then integrated to enhance classification performance. As a meta-learning algorithm, AdaBoost iteratively boosts the performance of weak classifiers (in this case, Naive Bayes) by focusing each round on the data points that were misclassified in the previous one, and then combines them into a strong classifier with significantly higher accuracy. Additionally, vocabulary optimization was incorporated: the core vocabulary used for data classification was filtered and refined to reduce noise and irrelevant information, further enhancing recognition accuracy. The Naive Bayes-AdaBoost strategy was compared with traditional machine learning methods, including standalone Naive Bayes, decision tree, and support vector machine. The results demonstrated that the strategy achieved 99.2% accuracy in classifying and recognizing wheat breeding information, an improvement of 12.2 percentage points over conventional Naive Bayes. Such highly accurate data classification and recognition makes it possible to construct efficient and reliable wheat breeding information platforms.
These platforms are potentially valuable tools for data-driven decision-making in the development of superior wheat varieties for global food security. A real-world case study was also conducted to verify the practical application of the strategy in a wheat breeding project developing new wheat varieties with enhanced disease resistance and yield potential. A dataset was collected comprising genotypic, phenotypic, and environmental data together with historical breeding records. The Naive Bayes-AdaBoost strategy classified this dataset efficiently, revealing key genes and genomic regions associated with the targeted traits, which in turn informed optimal parent selection and breeding strategies for superior wheat varieties. Furthermore, user-friendly software was developed to integrate the strategy into an intuitive interface, so that breeding data can be easily uploaded, managed, and analyzed without specialized knowledge of machine learning. Data visualization and analysis features help to identify potential breeding targets and optimize breeding strategies. The strategy's usefulness extends beyond its immediate application in wheat breeding to other complex data classification tasks. For example, accurate identification and classification of plant diseases is needed for effective disease management, and the Naive Bayes-AdaBoost strategy could be employed to analyze images of diseased plants for automated diagnosis based on visual symptoms. Similarly, in precision agriculture, the strategy could be used to classify soil types in satellite imagery, supporting crop health monitoring and resource allocation through real-time data analysis. In conclusion, the Naive Bayes-AdaBoost strategy provides a robust solution for the efficient classification and recognition of wheat breeding information. Its high accuracy, adaptability, and user-friendliness are expected to support data-driven breeding programs.
Moreover, the strategy's versatility means it can be extended to other domains that require complex data classification.
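
As a minimal sketch of how the three components described above could fit together (vocabulary filtering, a Naive Bayes weak learner, and AdaBoost reweighting of misclassified samples), the following pure-Python example may help. It uses the multiclass SAMME variant of AdaBoost as a stand-in. The toy records, the labels, the filtering thresholds, and all function and class names are illustrative assumptions, not the data or implementation used in the study.

```python
import math
from collections import Counter, defaultdict


def select_vocabulary(docs, min_doc_freq=2, max_doc_ratio=0.6):
    """Keep tokens that are neither too rare nor too common (assumed filter rule)."""
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    n = len(docs)
    return {t for t, f in doc_freq.items() if f >= min_doc_freq and f / n <= max_doc_ratio}


class WeightedNaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing and per-sample weights."""

    def fit(self, docs, labels, weights, vocab, smoothing=1.0):
        self.vocab = vocab
        self.classes = sorted(set(labels))
        scale = len(docs) / sum(weights)  # rescale so the weights sum to len(docs)
        class_mass = defaultdict(float)
        token_mass = {c: defaultdict(float) for c in self.classes}
        for doc, y, w in zip(docs, labels, weights):
            class_mass[y] += w * scale
            for tok in doc:
                if tok in vocab:
                    token_mass[y][tok] += w * scale
        total = sum(class_mass.values())
        self.log_prior = {c: math.log(class_mass[c] / total) for c in self.classes}
        self.log_like = {}
        for c in self.classes:
            denom = sum(token_mass[c].values()) + smoothing * len(vocab)
            self.log_like[c] = {
                t: math.log((token_mass[c][t] + smoothing) / denom) for t in vocab
            }
        return self

    def predict(self, doc):
        def score(c):
            return self.log_prior[c] + sum(self.log_like[c][t] for t in doc if t in self.vocab)
        return max(self.classes, key=score)


def adaboost_fit(docs, labels, vocab, rounds=10):
    """SAMME AdaBoost over Naive Bayes; assumes at least two distinct labels."""
    n, k = len(docs), len(set(labels))
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        model = WeightedNaiveBayes().fit(docs, labels, weights, vocab)
        wrong = [model.predict(d) != y for d, y in zip(docs, labels)]
        err = sum(w for w, bad in zip(weights, wrong) if bad)
        if err >= 1 - 1.0 / k:  # weak learner is no better than chance
            if not ensemble:
                ensemble.append((1.0, model))  # keep one model so prediction works
            break
        err = max(err, 1e-10)  # avoid log(0) when the learner is perfect
        alpha = math.log((1 - err) / err) + math.log(k - 1)
        ensemble.append((alpha, model))
        if err <= 1e-10:  # nothing left to correct
            break
        # Increase the weight of misclassified samples, then renormalize.
        weights = [w * math.exp(alpha) if bad else w for w, bad in zip(weights, wrong)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def adaboost_predict(ensemble, doc):
    """Weighted vote of all weak learners."""
    votes = defaultdict(float)
    for alpha, model in ensemble:
        votes[model.predict(doc)] += alpha
    return max(votes, key=votes.get)


# Hypothetical tokenized breeding records with two illustrative categories.
docs = [
    ["rust", "resistance", "gene", "the"],
    ["powdery", "mildew", "resistance", "the"],
    ["rust", "gene", "marker"],
    ["grain", "yield", "kernel", "the"],
    ["yield", "spike", "kernel"],
    ["grain", "spike", "weight", "the"],
]
labels = ["disease", "disease", "disease", "yield", "yield", "yield"]

vocab = select_vocabulary(docs)
ensemble = adaboost_fit(docs, labels, vocab)
print(adaboost_predict(ensemble, ["rust", "resistance", "marker"]))
print(adaboost_predict(ensemble, ["kernel", "yield"]))
```

On this toy data the first Naive Bayes learner already separates the two categories, so boosting stops after one round; with noisier records, later rounds would concentrate on the samples that earlier learners misclassified. In practice, a library implementation of Naive Bayes and AdaBoost would likely be preferable to this hand-written sketch.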