Prediction of cereal protein function based on multilayer functional structures

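    • Summary: To help researchers select functional proteins more conveniently and accurately, and to develop cereal-based functional foods more efficiently, this study proposes a cereal protein function prediction method based on a multilayer functional structure. First, a large-scale interaction network is built jointly from data on several cereals, and the candidate functions of unknown proteins are explored through the interactions between the functional features of clusters and the unknown proteins. Second, new protein weights, semantic similarity, and functional level weights are defined to determine the functions a protein may have. Finally, a scoring mechanism assists in deciding the predicted cereal protein functions. The experimental results show that the predicted functions are hierarchical and that proteins with a specified function can be obtained. The average accuracy of cereal protein function prediction in the first two layers of FunCat (Functional Catalogue) exceeds 85%, and functions at the fifth and sixth layers can also be predicted. The retraceability of the hierarchy returns poorly predicted functions to their upper-level functions, which reduces the probability of false positives and improves the overall prediction accuracy of the algorithm. The results can serve as a reference for the development of functional foods and drugs.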

       

      Abstract: Cereals are a valuable source of healthy and sustainable protein, and innovation in cereal protein foods supports the transition to more sustainable food systems and healthier diets. This requires a more precise understanding of the functions of cereal proteins, whose applications already contribute substantially to genomics and food science. This study proposes a function prediction method for cereal proteins based on a multilayer functional structure, so that functional proteins can be selected more conveniently and accurately. A large-scale interaction network was constructed from indica rice, japonica rice, wheat, maize, and soybean data. First, the candidate functions of unknown proteins were explored through the interactions between the functional features of clusters and the unknown proteins. Second, new protein weights, semantic similarity, and functional hierarchy weights were defined to determine the functions a protein may have. Finally, a scoring mechanism was used to decide the predicted cereal protein functions. The method achieved a precision of about 77% for exact protein function prediction and up to 92% for fuzzy protein function prediction using retraceability, which helps determine the functional range of unknown proteins efficiently. Precision varied considerably across levels, averaging 92% at level 1, 85% at level 2, and 69% at level 4; the average precision across all six levels of FunCat (Functional Catalogue) was close to 80%. The method was also evaluated on sets of unknown proteins of different sizes: precision was 76% for 50 unknown proteins, 72% for 100, and 66% for 200. Precision therefore did not drop sharply as the number of proteins to be predicted increased, which suggests that the method remains effective for large sets of unknown proteins. In a comparison with recent algorithms, namely FUNPRED_SEQSIN, DAC (Diffusion Alignment Coefficient), and PILL (Predict protein function using Incomplete hierarchical LabeLs), the proposed method performed significantly better in terms of precision, recall, and F-measure. The experimental results show that 1) the predicted functions are hierarchical, so proteins with a specified function, or the functions of a protein at specified functional levels, can be obtained; 2) the average precision of cereal protein function prediction in the first four layers of FunCat exceeds 80%, and functions at the fifth and sixth layers can also be predicted; 3) the retraceability of the hierarchy allows poorly predicted functions to be returned to their higher-level functions, which reduces the probability of false positives and improves the overall prediction accuracy. The findings can serve as a reference for protein function prediction in the food industry.
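
The retraceability idea in point 3) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the scoring formula, so the scores, the threshold, and the rule of summing low-scoring child evidence into the parent are all assumptions made here for illustration. The only property taken from the source is that FunCat is hierarchical, so a dotted code such as `02.13.03` has the parent `02.13`. The function names `parent`, `level`, and `back_off` are hypothetical.

```python
from typing import Dict, Optional


def parent(code: str) -> Optional[str]:
    """Return the parent of a dotted hierarchical code, or None at level 1."""
    head, sep, _ = code.rpartition(".")
    return head if sep else None


def level(code: str) -> int:
    """Depth of a dotted code: '02' is level 1, '02.13.03' is level 3."""
    return code.count(".") + 1


def back_off(scores: Dict[str, float], threshold: float) -> Dict[str, float]:
    """Keep confident predictions and fold weak ones into their parents.

    Assumption (not from the source): a weak child's score is added to
    its parent's score, and a level-1 code that is still below the
    threshold is discarded.
    """
    pending = dict(scores)
    kept: Dict[str, float] = {}
    deepest = max(map(level, pending), default=0)
    # Visit the deepest level first so evidence moves up one level at a time.
    for lvl in range(deepest, 0, -1):
        for code in [c for c in pending if level(c) == lvl]:
            score = pending.pop(code)
            if score >= threshold:
                kept[code] = score
                continue
            up = parent(code)
            if up is not None:
                pending[up] = pending.get(up, 0.0) + score
    return kept


if __name__ == "__main__":
    # Illustrative codes and scores only; they are not results from the study.
    candidate_scores = {
        "01.03.16": 0.9,
        "02.13.03": 0.25,
        "02.13.05": 0.5,
        "10.01": 0.2,
    }
    print(back_off(candidate_scores, threshold=0.6))
```

In this toy input, the two weak level-3 predictions under `02.13` are reported as one level-2 prediction, while the confident `01.03.16` keeps its full depth and the weak `10.01` branch is dropped. This mirrors the trade-off described in the abstract: a less specific answer in exchange for fewer false positives.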

       
