Prediction of cereal protein function based on multilayer functional structures
Abstract
Cereals are a valuable source of healthy and sustainable protein. Food innovation in cereal proteins is increasingly directed toward more sustainable food systems and healthy diets, which requires a more precise understanding of the functions of cereal proteins. The application of cereal proteins has also contributed greatly to present-day genomics and food science. In this study, a function prediction method based on a multilayer functional structure was proposed for cereal proteins, so that functional proteins can be selected more conveniently and accurately. A large-scale protein interaction network was constructed from indica rice, japonica rice, wheat, maize, and soybean data. First, candidate functions of unknown proteins were explored through the interactions between the functional features of protein clusters and the unknown proteins. Second, new protein weights, a semantic similarity measure, and functional hierarchy weights were defined to determine the possible functions of each protein. Finally, a scoring mechanism was used to determine the functions of the grain proteins. The results show that the method predicted cereal protein functions well, with a precision of about 77% for exact function prediction and up to 92% for fuzzy function prediction using retraceability. The method helps determine the functional range of unknown proteins with high prediction efficiency. Precision varied considerably across hierarchy levels, averaging 92% at level 1, 85% at level 2, and 69% at level 4. More importantly, the average precision across all six levels of FunCat was close to 80%. The multilayer functional structure was also applied to sets of unknown proteins of different sizes.
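The abstract does not give the exact weighting or scoring formulas, so the following is only a minimal sketch of the general idea under stated assumptions: an unknown protein is scored against FunCat categories by a weighted vote of its annotated interaction partners, and each vote is also propagated to the ancestor categories so that every level of the hierarchy receives a score. The function names `ancestors` and `score_functions`, the plain edge-weight vote, and the normalisation by total edge weight are hypothetical simplifications; they stand in for, and do not reproduce, the paper's protein weights, semantic similarity, and functional hierarchy weights. FunCat identifiers are assumed to be dot-separated codes such as `01.01.03`.

```python
from collections import defaultdict


def ancestors(funcat_id):
    """Return a FunCat ID followed by all of its ancestors, deepest first.

    Assumes dot-separated IDs, e.g. '01.01.03' -> ['01.01.03', '01.01', '01'].
    """
    parts = funcat_id.split(".")
    return [".".join(parts[:i]) for i in range(len(parts), 0, -1)]


def score_functions(unknown, edges, annotations):
    """Hypothetical neighbour-vote scoring of FunCat categories for one protein.

    edges: {(protein_a, protein_b): weight} for an undirected interaction network
    annotations: {protein: set of FunCat IDs} for proteins of known function
    Returns {FunCat ID: score in [0, 1]}.
    """
    scores = defaultdict(float)
    total = 0.0
    for (a, b), weight in edges.items():
        if unknown not in (a, b):
            continue  # edge does not touch the unknown protein
        neighbour = b if a == unknown else a
        total += weight
        # Expand the neighbour's labels to include every ancestor, using a set
        # so a parent shared by two child labels is counted once per neighbour.
        labels = set()
        for fid in annotations.get(neighbour, ()):
            labels.update(ancestors(fid))
        for label in labels:
            scores[label] += weight
    if total == 0:
        return {}
    # Normalise by the unknown protein's total edge weight.
    return {label: s / total for label, s in scores.items()}


edges = {("P0", "P1"): 0.9, ("P0", "P2"): 0.6, ("P2", "P3"): 0.4}
annotations = {"P1": {"01.01.03"}, "P2": {"01.01.05", "11.02"}}
for label, s in sorted(score_functions("P0", edges, annotations).items()):
    print(label, round(s, 2))
```

In this toy network, the categories `01` and `01.01` receive the highest score because both neighbours of `P0` support them, while the deeper categories are each supported by only one neighbour. This illustrates why precision tends to be higher at shallow FunCat levels than at deep ones.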
The precision of the prediction was 76% for 50 unknown proteins, 72% for 100, and 66% for 200; that is, precision did not drop sharply as the number of unknown proteins increased substantially. This suggests that the method remains effective even for large-scale sets of unknown proteins. The method was compared with recent algorithms, namely FUNPRED_SEQSIN, DAC (Diffusion Alignment Coefficient), and PILL (Predict protein function using Incomplete hierarchical LabeLs), and performed significantly better in terms of precision, recall, and F-measure. The experimental results show that: 1) the method can predict functions hierarchically, particularly for proteins with a specified function or for functions at specified functional levels; 2) the average precision of cereal protein function prediction in the first four layers of FunCat (Functional Catalogue) can exceed 80%, and the method can even predict functions at the fifth and sixth layers; 3) the retraceable nature of the hierarchy allows functions with low prediction scores to be traced back to higher-level functions, which reduces false positives and improves overall prediction accuracy. These findings can also provide a useful reference for protein function prediction in the food industry.
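The retraceability in point 3 and the precision, recall, and F-measure comparison can be illustrated with another hypothetical sketch. Here `retrace` walks a predicted FunCat category up the hierarchy until its score reaches a threshold (an assumed rule; the abstract does not state the paper's actual back-off criterion), and `precision_recall_f` computes the standard set-based metrics for one protein. Whether a retraced parent-level prediction counts as correct in the paper's evaluation is not stated, so the example simply compares sets of category IDs.

```python
from typing import Dict, Optional, Set, Tuple


def retrace(funcat_id: str, scores: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the deepest category on the path from funcat_id to the root whose
    score reaches threshold, or None if no category on that path qualifies.

    Hypothetical back-off rule: drop the last dot-separated segment (move one
    FunCat level up) until the score is high enough.
    """
    parts = funcat_id.split(".")
    while parts:
        candidate = ".".join(parts)
        if scores.get(candidate, 0.0) >= threshold:
            return candidate
        parts.pop()  # move to the parent category
    return None


def precision_recall_f(predicted: Set[str], true: Set[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall and F-measure for one protein."""
    tp = len(predicted & true)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(true) if true else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


scores = {"01.01.03": 0.3, "01.01": 0.8, "01": 1.0}
print(retrace("01.01.03", scores, threshold=0.5))  # -> 01.01
print(precision_recall_f({"01.01", "11"}, {"01.01", "01.01.03", "20"}))
```

With a threshold of 0.5, the weakly supported level-3 category `01.01.03` is traced back to its level-2 parent `01.01`. This trades some specificity for fewer false positives, which is the effect described in point 3.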