


Identifying protein subcellular location with embedding features learned from networks

Author(s):

Hongwei Liu, Bin Hu, Lei Chen* and Lin Lu

Pages 1-17 (17)

Abstract:


Background: Identifying the subcellular location of a protein is an important problem because a protein's location is closely related to its function. Biological experiments are the fundamental way to determine these locations; however, such experiments are costly and time-consuming. An alternative is to design effective computational methods.

Objective: Several computational methods have been proposed for this task to date. However, they mainly adopt features derived from the proteins themselves. Meanwhile, with the development of network techniques, several embedding algorithms have been proposed that encode the nodes of a network into feature vectors. Such algorithms bridge networks and traditional classification algorithms, and thus provide a new way to build models for predicting protein subcellular location.
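DeepWalk and Node2vec turn a network into feature vectors by first sampling random walks over the nodes and then training a skip-gram model on those walks, treating each walk like a sentence. The sketch below illustrates only the walk-sampling step, assuming an undirected, unweighted protein network stored as adjacency sets. The protein names P1 to P5 are hypothetical toy data, the skip-gram training step is omitted, and the weighted choice is drawn directly with `random.choices` instead of the alias sampling used in the original Node2vec implementation. With the Node2vec return parameter `p` and in-out parameter `q` both set to 1, the walk reduces to DeepWalk's uniform walk.

```python
import random
from typing import Dict, List, Set, Tuple

# Assumption: undirected, unweighted network stored as adjacency sets.
Graph = Dict[str, Set[str]]


def build_graph(edges: List[Tuple[str, str]]) -> Graph:
    """Build symmetric adjacency sets from an undirected edge list."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def biased_walk(graph: Graph, start: str, length: int,
                p: float, q: float, rng: random.Random) -> List[str]:
    """One Node2vec-style second-order walk; p = q = 1 gives a uniform (DeepWalk-style) walk."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(graph[cur])  # sorted so a fixed seed is reproducible
        if not neighbors:
            break  # isolated node: stop the walk early
        if len(walk) == 1:
            walk.append(rng.choice(neighbors))  # first step has no previous node
            continue
        prev = walk[-2]
        weights = []
        for x in neighbors:
            if x == prev:
                weights.append(1.0 / p)  # return parameter: step back to prev
            elif x in graph[prev]:
                weights.append(1.0)      # x is also a neighbor of prev
            else:
                weights.append(1.0 / q)  # in-out parameter: move away from prev
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def generate_walks(graph: Graph, num_walks: int, length: int,
                   p: float = 1.0, q: float = 1.0, seed: int = 0) -> List[List[str]]:
    """Start num_walks walks from every node, shuffling the start order each round."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    walks = []
    for _ in range(num_walks):
        order = nodes[:]
        rng.shuffle(order)
        for node in order:
            walks.append(biased_walk(graph, node, length, p, q, rng))
    return walks


# Hypothetical toy protein-protein interaction network.
ppi = build_graph([("P1", "P2"), ("P2", "P3"), ("P3", "P1"), ("P3", "P4"), ("P4", "P5")])
walks = generate_walks(ppi, num_walks=2, length=6, p=1.0, q=0.5)
for w in walks[:3]:
    print(" ".join(w))
```

The resulting walks would then be passed to a skip-gram model so that proteins appearing in similar walk contexts receive similar vectors; Mashup instead derives its features from random-walk-with-restart diffusion states across one or more networks, which this sketch does not cover.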

Method: In this study, we analyzed the features produced by three network embedding algorithms (DeepWalk, Node2vec and Mashup) applied to one or multiple protein networks. The obtained features were fed into one of two machine learning algorithms (support vector machine or random forest) to construct the prediction models, and all constructed models were evaluated with cross-validation.
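The evaluation step described above, in which embedding features go into a classifier that is scored by cross-validation, can be sketched as a k-fold loop. To stay dependency-free, this sketch swaps in a nearest-centroid classifier as a stand-in for the support vector machine or random forest used in the study, and it reports plain overall accuracy, which is an assumption rather than the study's stated metric. The 2-D vectors and the location labels are made-up toy data, not embeddings from the study.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List

Vector = List[float]


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle sample indices and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_centroids(X: List[Vector], y: List[str]) -> Dict[str, Vector]:
    """Stand-in classifier (NOT SVM or random forest): mean embedding per location class."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = defaultdict(int)
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        for j, v in enumerate(x):
            sums[label][j] += v
        counts[label] += 1
    return {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}


def predict(centroids: Dict[str, Vector], x: Vector) -> str:
    """Assign x to the class whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda lab: math.dist(x, centroids[lab]))


def cross_validate(X: List[Vector], y: List[str], k: int = 5, seed: int = 0) -> float:
    """Hold out each fold once, train on the rest, and return overall accuracy."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        centroids = fit_centroids([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(centroids, X[i]) == y[i] for i in test_idx)
    return correct / len(X)


# Made-up 2-D "embeddings" for ten proteins with two hypothetical location labels.
X = [[0.1, 0.2], [0.0, 0.3], [0.2, 0.1], [0.3, 0.0], [0.1, 0.1],
     [5.0, 5.1], [4.9, 5.2], [5.2, 4.8], [5.1, 5.0], [4.8, 4.9]]
y = ["nucleus"] * 5 + ["cytoplasm"] * 5
acc = cross_validate(X, y, k=5)
print(f"5-fold accuracy: {acc:.2f}")
```

Because every sample lands in exactly one held-out fold, the score reflects predictions on data the classifier never saw during training. In practice one would replace `fit_centroids` and `predict` with an actual SVM or random forest (for example, scikit-learn's `SVC` or `RandomForestClassifier`) while keeping the fold loop unchanged.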

Results: Cross-validation showed that the embedding features yielded by Mashup on multiple networks were highly informative for predicting protein subcellular location. The model based on these features was superior to some classic models.

Conclusion: Embedding features yielded by a proper and powerful network embedding algorithm are effective for building models that predict protein subcellular location, and they provide new pipelines for building more efficient models.

Keywords:

Protein subcellular location prediction, network embedding algorithm, DeepWalk, Node2vec, Mashup, machine learning algorithm, support vector machine, random forest.

Affiliation:

College of Information Engineering, Shanghai Maritime University, Shanghai;
State Key Laboratory of Livestock and Poultry Breeding, Guangdong Public Laboratory of Animal Breeding and Nutrition, Guangdong Provincial Key Laboratory of Animal Breeding and Nutrition, Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou 510640;
College of Information Engineering, Shanghai Maritime University, Shanghai;
Department of Radiology, Columbia University Medical Center, New York


