
Prediction and classification of the lipid droplet-surface proteins

Published: 2012-09-15  Views: 1362  Downloads: 614
Authors: DONG Qiongye, WANG Xiaowo, LIU Pingsheng, ZHANG Qiwei
Affiliations: Key Laboratory of Bioinformatics, Ministry of Education, Tsinghua University; Bioinformatics Division, Tsinghua National Laboratory for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University; Institute of Biophysics, Chinese Academy of Sciences
Abstract: Proteins associated with the surface of lipid droplets were analyzed using machine learning. First, the amino acid sequences of six known classes of lipid droplet surface proteins, including Perilipin, ADRP, and Oleosin, were collected from the NCBI database. Second, feature vectors were constructed from the amino acid composition of each sequence and from its pseudo-amino acid composition, which reflects the physicochemical properties of the residues. A binary classifier based on a support vector machine (SVM) was then trained to distinguish lipid droplet surface proteins from proteins located in other organelles, and the F-score method was used to select informative features and reduce the dimensionality of the high-dimensional feature vectors. Six-fold cross-validation showed that the classifier achieved a mean F-value of 0.842.
Keywords: biomathematics; lipid droplet surface protein; support vector machine; feature selection
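
The feature steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' code, and it makes several assumptions: the F-score is taken to be the two-class definition commonly paired with SVMs (between-class spread over within-class sample variance), the reported F-value is taken to be the harmonic mean of precision and recall, and the function names `amino_acid_composition`, `f_score`, `rank_features`, and `f_value` are hypothetical. The abstract itself states none of these formulas, and the pseudo-amino acid composition and the SVM training are left out.

```python
from statistics import mean, variance

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid (20 values)."""
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)  # non-standard letters such as X are ignored
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]


def f_score(pos_values, neg_values):
    """Two-class F-score of one feature (assumed definition).

    Numerator: squared distance of each class mean from the overall mean.
    Denominator: sum of the two sample variances (n - 1 normalisation),
    so each class needs at least two samples.
    """
    overall = mean(list(pos_values) + list(neg_values))
    between = (mean(pos_values) - overall) ** 2 + (mean(neg_values) - overall) ** 2
    within = variance(pos_values) + variance(neg_values)
    if within == 0:
        # No spread inside either class: perfectly separated or constant.
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(pos_vectors, neg_vectors):
    """Return (indices sorted by descending F-score, list of scores)."""
    n_features = len(pos_vectors[0])
    scores = [
        f_score([v[i] for v in pos_vectors], [v[i] for v in neg_vectors])
        for i in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda i: scores[i], reverse=True)
    return order, scores


def f_value(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall (assumed meaning of F-value)."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)
```

In a pipeline of this shape, one would compute a feature vector per sequence, keep only the top-ranked indices from `rank_features`, and pass the reduced vectors to an SVM evaluated by cross-validation; how many features the authors retained is not stated in the abstract.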
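Issue: No. 17, September 2012
Citation: DONG Qiongye, WANG Xiaowo, LIU Pingsheng, et al. Prediction and classification of the lipid droplet-surface proteins[J]. Highlights of Sciencepaper Online (中国科技论文在线精品论文), 2012, 5(17): 1667-1673.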
 