Staff View: Computational Technique for an Efficient Classification of Protein Sequences With Distance-Based Sequence Encoding Algorithm

Machine learning is being implemented in bioinformatics and computational biology to solve challenging problems that emerge in the analysis and modeling of biological data such as DNA, RNA, and protein. The major problems in classifying protein sequences into existing families/superfamilies are the foll...

Main Authors:	Iqbal, M.J., Faye, I., Said, A.M.D., Samir, B.B.
Format:	Article
Institution:	Universiti Teknologi Petronas
Record Id / ISBN-0:	utp-eprints.19628 /
Published:	Blackwell Publishing Inc. 2017
Online Access:	https://www.scopus.com/inward/record.uri?eid=2-s2.0-84945256270&doi=10.1111%2fcoin.12069&partnerID=40&md5=adcc0812e793bce19eaea8e92af9991d http://eprints.utp.edu.my/19628/

id	utp-eprints.19628
recordtype	eprints
spelling	utp-eprints.19628 2018-04-20T07:19:25Z Computational Technique for an Efficient Classification of Protein Sequences With Distance-Based Sequence Encoding Algorithm Iqbal, M.J. Faye, I. Said, A.M.D. Samir, B.B. Machine learning is being implemented in bioinformatics and computational biology to solve challenging problems that emerge in the analysis and modeling of biological data such as DNA, RNA, and protein. The major problems in classifying protein sequences into existing families/superfamilies are the following: the selection of a suitable sequence encoding method, the extraction of an optimized subset of features that possesses significant discriminatory information, and the adaptation of an appropriate learning algorithm that classifies protein sequences with higher classification accuracy. The accurate classification of protein sequences would be helpful in determining the structure and function of novel protein sequences. In this article, we have proposed a distance-based sequence encoding algorithm that captures the sequence's statistical characteristics along with amino acid sequence order information. A statistical metric-based feature selection algorithm is then adopted to identify the reduced set of features to represent the original feature space. The performance of the proposed technique is validated using some of the best performing classifiers implemented previously for protein sequence classification. An average classification accuracy of 92% was achieved on the yeast protein sequence data set downloaded from the benchmark UniProtKB database. © 2015 Wiley Periodicals, Inc. Blackwell Publishing Inc. 2017 Article PeerReviewed https://www.scopus.com/inward/record.uri?eid=2-s2.0-84945256270&doi=10.1111%2fcoin.12069&partnerID=40&md5=adcc0812e793bce19eaea8e92af9991d Iqbal, M.J. and Faye, I. and Said, A.M.D. and Samir, B.B. (2017) Computational Technique for an Efficient Classification of Protein Sequences With Distance-Based Sequence Encoding Algorithm. Computational Intelligence, 33 (1). pp. 32-55. http://eprints.utp.edu.my/19628/
institution	Universiti Teknologi Petronas
collection	UTP Institutional Repository
description	Machine learning is being implemented in bioinformatics and computational biology to solve challenging problems that emerge in the analysis and modeling of biological data such as DNA, RNA, and protein. The major problems in classifying protein sequences into existing families/superfamilies are the following: the selection of a suitable sequence encoding method, the extraction of an optimized subset of features that possesses significant discriminatory information, and the adaptation of an appropriate learning algorithm that classifies protein sequences with higher classification accuracy. The accurate classification of protein sequences would be helpful in determining the structure and function of novel protein sequences. In this article, we have proposed a distance-based sequence encoding algorithm that captures the sequence's statistical characteristics along with amino acid sequence order information. A statistical metric-based feature selection algorithm is then adopted to identify the reduced set of features to represent the original feature space. The performance of the proposed technique is validated using some of the best performing classifiers implemented previously for protein sequence classification. An average classification accuracy of 92% was achieved on the yeast protein sequence data set downloaded from the benchmark UniProtKB database. © 2015 Wiley Periodicals, Inc.
format	Article
author	Iqbal, M.J. Faye, I. Said, A.M.D. Samir, B.B.
spellingShingle	Iqbal, M.J. Faye, I. Said, A.M.D. Samir, B.B. Computational Technique for an Efficient Classification of Protein Sequences With Distance-Based Sequence Encoding Algorithm
author_sort	Iqbal, M.J.
title	Computational Technique for an Efficient Classification of Protein Sequences With Distance-Based Sequence Encoding Algorithm
title_short	Computational Technique for an Efficient Classification of Protein Sequences With Distance-Based Sequence Encoding Algorithm
title_full	Computational Technique for an Efficient Classification of Protein Sequences With Distance-Based Sequence Encoding Algorithm
title_fullStr	Computational Technique for an Efficient Classification of Protein Sequences With Distance-Based Sequence Encoding Algorithm
title_full_unstemmed	Computational Technique for an Efficient Classification of Protein Sequences With Distance-Based Sequence Encoding Algorithm
title_sort	computational technique for an efficient classification of protein sequences with distance-based sequence encoding algorithm
publisher	Blackwell Publishing Inc.
publishDate	2017
url	https://www.scopus.com/inward/record.uri?eid=2-s2.0-84945256270&doi=10.1111%2fcoin.12069&partnerID=40&md5=adcc0812e793bce19eaea8e92af9991d http://eprints.utp.edu.my/19628/
_version_	1741196237422460928
score	11.62408
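
Illustrative note: the abstract above describes an encoding that combines a sequence's statistical characteristics with amino acid order information, but this record does not give the algorithm itself. The sketch below is only one possible reading, not the authors' method. It assumes, hypothetically, that each of the 20 standard amino acids contributes two features: its relative frequency, and the mean gap between its consecutive occurrences divided by the sequence length. The function name `distance_encode` and the feature layout are invented for illustration.

```python
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def distance_encode(sequence):
    """Return a 40-value feature vector for one protein sequence.

    Hypothetical encoding (an assumption, not the published algorithm):
    for each amino acid, emit its relative frequency followed by the mean
    distance between consecutive occurrences, normalized by sequence length.
    """
    sequence = sequence.upper()
    length = len(sequence)
    if length == 0:
        raise ValueError("sequence must not be empty")

    # Record the positions at which each residue occurs.
    positions = defaultdict(list)
    for index, residue in enumerate(sequence):
        positions[residue].append(index)

    features = []
    for residue in AMINO_ACIDS:
        hits = positions.get(residue, [])
        frequency = len(hits) / length
        if len(hits) > 1:
            gaps = [b - a for a, b in zip(hits, hits[1:])]
            mean_gap = sum(gaps) / len(gaps) / length
        else:
            mean_gap = 0.0  # no gap is defined for zero or one occurrence
        features.extend([frequency, mean_gap])
    return features


if __name__ == "__main__":
    vector = distance_encode("MKVLAAGIVGL")
    print(len(vector), vector[:4])
```

A fixed-length vector of this kind could then be passed to a feature selection step and a classifier, as the abstract outlines; which selection metric and classifiers the article uses is not stated in this record.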

Similar Items