
Article Details


Identifying the Characteristics of the Hypusination Sites Using SMOTE and SVM Algorithm with Feature Selection

Vol. 15, Issue 2

Author(s):

XiJun Sun, JiaRui Li, Lei Gu, ShaoPeng Wang, YuHang Zhang, Tao Huang and Yu-Dong Cai*

Pages 111-118 (8)

Abstract:


Background: Hypusination is a unique modification of a lysine residue in eukaryotic translation initiation factor 5A (eIF5A); it is essential and highly conserved across all eukaryotes. However, the mechanism by which this particular hypusination site is recognized remains unclear. In this study, we made a first attempt to uncover the characteristics of hypusination sites using computational methods.

Method: Hypusination sites retrieved from the UniProt database, either validated by experiments or predicted through sequence similarity, were selected for investigation. Each site was encoded as a peptide segment containing the modification site and its flanking residues. Four types of features were extracted from these peptide segments. Because hypusination sites are far fewer than non-hypusination sites, the synthetic minority over-sampling technique (SMOTE) was applied to balance the dataset. Feature selection methods, including maximum relevance minimum redundancy (mRMR) and incremental feature selection (IFS), were then used to analyze the four types of features and to build an optimal classifier with a support vector machine (SVM) as the prediction engine.
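The balancing step can be illustrated with a minimal sketch of SMOTE in plain Python. It is for illustration only and is not the authors' implementation; the function name `smote` and its parameters `n_new`, `k` and `seed` are hypothetical. SMOTE creates each synthetic minority sample by picking a minority sample, choosing one of its k nearest minority neighbours, and interpolating at a random point on the segment between the two; k = 5 is a commonly used default.

```python
import math
import random
from typing import List, Sequence


def _euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority: List[List[float]], n_new: int, k: int = 5,
          seed: int = 0) -> List[List[float]]:
    """Hypothetical SMOTE sketch: return n_new synthetic minority samples."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Precompute the k nearest minority neighbours of every minority sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: _euclidean(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))   # random minority sample
        j = rng.choice(neighbours[i])      # one of its k nearest neighbours
        gap = rng.random()                 # interpolation factor in [0, 1)
        x, nb = minority[i], minority[j]
        # New point lies on the segment between x and its neighbour.
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic
```

For example, `smote([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], 4, k=2)` returns four new points, each lying on an edge of the triangle spanned by the three input samples. Interpolating between real minority samples, rather than duplicating them, is what distinguishes SMOTE from plain random oversampling.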

Results: The optimal SVM classifier, built on four amino acid features, yielded a perfect Matthews correlation coefficient (MCC) of 1.000 on both the training and testing sets, indicating that these four features are hypusination-specific characteristics.
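For reference, MCC is computed from confusion-matrix counts as MCC = (TP·TN − FP·FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)). The short sketch below (the function name `mcc` is hypothetical, and returning 0 when the denominator is zero is one common convention) makes explicit that a value of 1.000 is reached only when there are no false positives and no false negatives.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention: MCC is undefined here, so report 0.
        return 0.0
    return (tp * tn - fp * fn) / denom
```

For example, `mcc(10, 90, 0, 0)` returns 1.0 (perfect prediction), `mcc(5, 5, 5, 5)` returns 0.0 (no better than chance), and `mcc(0, 0, 10, 90)` returns -1.0 (every prediction wrong).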

Conclusions: As pioneering work, our analysis provides insights that improve the understanding of hypusination mechanisms.

Keywords:

Hypusination, SMOTE, mRMR, machine learning, feature selection, SMO.

Affiliation:

College of Life Science, Shanghai University, Shanghai, 200444; College of Life Science, Shanghai University, Shanghai, 200444; Boston Children's Hospital & Cell Biology Department, Harvard Medical School, Boston, MA 02215; College of Life Science, Shanghai University, Shanghai, 200444; Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031; Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031; College of Life Science, Shanghai University, Shanghai, 200444
