Jun Zhang and Bin Liu* Pages 363-373 (11)
Background: DNA-binding proteins are vital cellular components, and their identification is important for the understanding of biological processes. Traditional methods for the prediction of protein function are both time-consuming and expensive. With the development of bioinformatics, a large amount of protein sequence information is available to researchers, necessitating the development of an efficient predictor for identification of DNA-binding proteins based on the protein-sequence information.
Objective: To make better use of protein sequence information and further improve the accuracy of DNA-binding protein recognition, we designed a new predictor that identifies DNA-binding proteins using a voting strategy.
Method: We employed two feature extraction methods for DNA-binding protein identification: Physicochemical Distance Transformation (PDT) and PDT-profile. Two predictors (iDNA-Prot-PDT and iDNA-Prot-PDT-Profile) were then built on these two feature extraction methods. To further improve prediction quality, the two predictors were combined through a voting strategy (iDNA-Prot-Vote).
Results: Experimental results on a benchmark dataset and an independent dataset showed that our methods outperformed other state-of-the-art methods.
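The two ideas in the Method paragraph can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes that a PDT-style feature is the mean squared difference of one physicochemical index between residues a fixed distance apart, and that voting is a weighted average of the two predictors' probabilities compared against a threshold. The function names `pdt_features` and `vote`, the use of the Kyte-Doolittle hydropathy scale as the only index, the equal weights, and the 0.5 threshold are all illustrative assumptions.

```python
from typing import Dict, List, Sequence

# Kyte-Doolittle hydropathy values, used here only as a stand-in for one
# physicochemical index (assumption: the paper's indices are not given here).
KD: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def pdt_features(seq: str, index: Dict[str, float], max_dist: int) -> List[float]:
    """Return one feature per distance d = 1..max_dist.

    Each feature is the mean squared difference of the index value between
    residues that are d positions apart (assumed PDT-style definition).
    """
    values = [index[residue] for residue in seq]
    features = []
    for d in range(1, max_dist + 1):
        pairs = len(values) - d
        if pairs <= 0:
            # Sequence too short for this distance: emit a neutral value.
            features.append(0.0)
            continue
        total = sum((values[i] - values[i + d]) ** 2 for i in range(pairs))
        features.append(total / pairs)
    return features


def vote(probabilities: Sequence[float],
         weights: Sequence[float],
         threshold: float = 0.5) -> bool:
    """Combine per-predictor probabilities by weighted average.

    Returns True (predicted DNA-binding) when the averaged score reaches the
    threshold. Weights and threshold are hypothetical tuning parameters.
    """
    if len(probabilities) != len(weights):
        raise ValueError("probabilities and weights must have equal length")
    score = sum(p * w for p, w in zip(probabilities, weights)) / sum(weights)
    return score >= threshold


if __name__ == "__main__":
    # Fixed-length vector regardless of sequence length.
    print(pdt_features("MKRAVLG", KD, max_dist=3))
    # Hypothetical outputs of the two predictors for one protein.
    print(vote([0.9, 0.3], weights=[1.0, 1.0]))
```

In this sketch each predictor would be trained on its own fixed-length feature vector, and only their output probabilities meet in `vote`, which keeps the two feature spaces independent.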
Conclusion: These results indicate that the proposed methods are useful for DNA-binding protein identification, which would promote the development of protein sequence analysis.
Keywords: DNA-binding protein identification, physicochemical distance transformation, frequency profile, ensemble learning, vector, threshold.
School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055