Article Details


An Integrated Prediction Method for Identifying Protein-Protein Interactions

[Vol. 17, Issue 4]

Author(s):

Chang Xu, Limin Jiang, Zehua Zhang, Xuyao Yu, Renhai Chen* and Junhai Xu*

Pages 271-286 (16)

Abstract:


Background: Protein-protein interactions (PPIs) play a key role in various biological processes. Many methods have been developed to predict protein-protein interactions and protein interaction networks. However, many existing methods are limited because they rely on a large number of homologous proteins and interaction marks.

Methods: In this paper, we propose a novel integrated learning approach (RF-Ada-DF) with a sequence-based feature representation for identifying protein-protein interactions. Our method first constructs a sequence-based feature vector to represent each pair of proteins, using Multivariate Mutual Information (MMI) and Normalized Moreau-Broto Autocorrelation (NMBAC). We then feed the 638-dimensional features into an integrated learning model that classifies pairs as interacting or non-interacting. This integrated model embeds Random Forest in the AdaBoost framework, combining weak classifiers into a single strong classifier. We also employ double fault detection to suppress over-adaptation (overfitting) during training.
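As a rough illustration of the autocorrelation part of the feature construction, the sketch below computes a plain Moreau-Broto autocorrelation for one property along a sequence and concatenates the vectors of two proteins. It is not the authors' implementation: the function names, the `max_lag` parameter, the concatenation step, and the two-letter `toy_scale` are assumptions made for this example, and the property scale is assumed to be already normalized. The MMI features and the full 638-dimensional layout are not reproduced here.

```python
from typing import Dict, List


def moreau_broto_autocorrelation(
    sequence: str,
    property_scale: Dict[str, float],
    max_lag: int,
) -> List[float]:
    """Return one autocorrelation value per lag in 1..max_lag.

    For each lag, the value is the mean of P(i) * P(i + lag) over all
    positions i, where P maps a residue to its (pre-normalized) property.
    """
    values = [property_scale[residue] for residue in sequence]
    n = len(values)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")

    features = []
    for lag in range(1, max_lag + 1):
        total = sum(values[i] * values[i + lag] for i in range(n - lag))
        features.append(total / (n - lag))
    return features


def pair_features(
    seq_a: str,
    seq_b: str,
    property_scale: Dict[str, float],
    max_lag: int,
) -> List[float]:
    """Represent a protein pair by concatenating both vectors (an assumption)."""
    return (
        moreau_broto_autocorrelation(seq_a, property_scale, max_lag)
        + moreau_broto_autocorrelation(seq_b, property_scale, max_lag)
    )


# Placeholder scale for a two-letter alphabet; NOT real physicochemical data.
toy_scale = {"A": 1.0, "C": -1.0}

single = moreau_broto_autocorrelation("ACAC", toy_scale, max_lag=2)
pair = pair_features("ACAC", "AACC", toy_scale, max_lag=2)
```

In practice one such vector would be computed per physicochemical property and the results joined, so the feature length grows with both the number of properties and the maximum lag.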

Results: To evaluate the performance of our method, we conduct several comprehensive tests of PPI prediction. On the H. pylori dataset, our method achieves 88.16% accuracy and 87.68% sensitivity, an accuracy increase of 0.57%. On the S. cerevisiae dataset, our method achieves 95.77% accuracy and 93.36% sensitivity, an accuracy increase of 0.76%. On the Human dataset, our method achieves 98.16% accuracy and 96.80% sensitivity, an accuracy increase of 0.6%. Experiments show that our method achieves better results than other leading methods for sequence-based PPI prediction. The datasets and code are available at https://github.com/guofei-tju/RF-Ada-DF.git
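For readers less familiar with the two reported metrics, the following minimal sketch shows how accuracy and sensitivity are conventionally computed from confusion-matrix counts. The function names and the counts are illustrative assumptions and are not taken from the paper's experiments.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all pairs (interacting or not) that are classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of truly interacting pairs that are predicted as interacting."""
    return tp / (tp + fn)


# Illustrative counts only; they do not correspond to any dataset in the paper.
example_accuracy = accuracy(tp=90, tn=80, fp=20, fn=10)
example_sensitivity = sensitivity(tp=90, fn=10)
```

With these made-up counts, 170 of 200 pairs are classified correctly and 90 of 100 interacting pairs are recovered.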

Keywords:

Protein-protein interaction, multivariate mutual information, random forest, AdaBoost framework, double fault detection, sensitivity.

Affiliation:

College of Intelligence and Computing, Tianjin University, Tianjin; College of Intelligence and Computing, Tianjin University, Tianjin; College of Intelligence and Computing, Tianjin University, Tianjin; Department of Radiotherapy, Tianjin Medical University, Cancer Institute and Hospital, Tianjin; College of Intelligence and Computing, Tianjin University, Tianjin; College of Intelligence and Computing, Tianjin University, Tianjin


