On Utilizing Optimal and Information Theoretic Syntactic Modeling for Peptide Classification

Aygün, Eser; Oommen, B. John; Cataltepe, Zehra

doi:10.1007/978-3-642-04031-3_3

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5780)

Included in the following conference series:

IAPR International Conference on Pattern Recognition in Bioinformatics

Abstract

Syntactic methods in pattern recognition have been used extensively in bioinformatics, in particular in the analysis of gene and protein expressions and in the recognition and classification of bio-sequences. These methods are almost universally distance-based. This paper concerns the use of an Optimal and Information Theoretic (OIT) probabilistic model [11] to classify peptides using the information residing in their syntactic representations. Such classification has traditionally relied on the edit distances computed when peptides are compared. We advocate instead that the differences between the compared strings can be modeled as a mutation process consisting of random Substitutions, Insertions and Deletions (SID) obeying the OIT model. We show that the probability measure obtained from the OIT model can be treated as a sequence similarity metric, on which a Support Vector Machine (SVM)-based peptide classifier, referred to as OIT_SVM, can be built.
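To make the idea of a probability-based similarity concrete, the following Python sketch scores a pair of strings by the probability that one is mutated into the other through random substitutions, insertions and deletions. It is not the OIT model of [11]: it is a deliberately simplified, memoryless SID channel, and the function name `sid_probability` and the parameters `p_ins`, `p_del` and `p_match` are hypothetical choices made only for illustration.

```python
def sid_probability(x, y, alphabet, p_ins=0.05, p_del=0.05, p_match=0.9):
    """P(y | x) under a simplified memoryless SID channel (illustrative only).

    Assumed generative process (NOT the OIT model itself):
      * before each input symbol, and after the last one, the channel
        inserts a uniformly drawn symbol with probability p_ins and repeats;
      * otherwise it consumes the next input symbol, deleting it with
        probability p_del or substituting it according to sub(a, b).
    The alphabet must contain at least two symbols.
    """
    k = len(alphabet)
    n, m = len(x), len(y)

    def sub(a, b):
        # Keep the symbol with probability p_match; otherwise choose one
        # of the other k - 1 symbols uniformly.
        return p_match if a == b else (1.0 - p_match) / (k - 1)

    # f[i][j]: probability of having consumed x[:i] and emitted y[:j],
    # summed over every sequence of edit operations that does so.
    f = [[0.0] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            total = 0.0
            if j > 0:  # insertion of y[j-1]
                total += f[i][j - 1] * p_ins / k
            if i > 0:  # deletion of x[i-1]
                total += f[i - 1][j] * (1.0 - p_ins) * p_del
            if i > 0 and j > 0:  # substitution x[i-1] -> y[j-1]
                total += (f[i - 1][j - 1] * (1.0 - p_ins) * (1.0 - p_del)
                          * sub(x[i - 1], y[j - 1]))
            f[i][j] = total
    # Final factor: the channel stops inserting once the input is exhausted.
    return f[n][m] * (1.0 - p_ins)
```

Because the dynamic program sums over all edit paths rather than keeping only the cheapest one, the result is a probability rather than a distance. A value of this kind, or its logarithm, is the sort of quantity that could be passed to an SVM as a pairwise similarity; how OIT_SVM actually does this is described in the full chapter, not here.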

The classifier has been tested with eight different “substitution” matrices and on two data sets, namely the HIV-1 Protease Cleavage sites and the T-cell Epitopes. The results show that the OIT model performs significantly better than a classifier that uses the Needleman-Wunsch sequence alignment score, and better than the peptide classification methods previously evaluated on the same two data sets.
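For readers unfamiliar with the baseline, the Needleman-Wunsch score is the best total score over all global alignments of two sequences, computed by dynamic programming. The sketch below is a minimal Python version under stated assumptions: a linear gap penalty and a default match/mismatch scoring of +1/-1. The `score` callback is a hypothetical stand-in for a lookup in a substitution matrix, and none of these settings are taken from the paper.

```python
def needleman_wunsch_score(a, b, score=None, gap=-2):
    """Global alignment score of sequences a and b (linear gap penalty).

    `score(p, q)` returns the substitution score for aligning residue p
    with residue q; by default +1 for a match and -1 for a mismatch.
    """
    if score is None:
        score = lambda p, q: 1 if p == q else -1

    # prev[j]: best score for aligning the processed prefix of a with b[:j].
    # Row 0 aligns the empty prefix of a, so it consists only of gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            curr[j] = max(
                prev[j - 1] + score(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                prev[j] + gap,                            # a[i-1] against a gap
                curr[j - 1] + gap,                        # b[j-1] against a gap
            )
        prev = curr
    return prev[len(b)]
```

With the default scoring, `needleman_wunsch_score("GATTACA", "GATTACA")` returns 7, one point per matched residue. Unlike a probabilistic measure that sums over all edit paths, this score reflects only the single best alignment.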

References

1. Aygün, E., Oommen, B.J., Cataltepe, Z.: Peptide Classification Using Optimal and Information Theoretic Syntactic Modeling (submitted for publication)
2. Bucher, P., Hofmann, K.: A sequence similarity search algorithm based on a probabilistic interpretation of an alignment scoring system. In: Proceedings of the Conference on Intelligent Systems for Molecular Biology, pp. 44–51 (1996)
3. Cai, Y.D., Chou, K.C.: Artificial neural network model for predicting HIV protease cleavage sites in protein. Advances in Engineering Software 29(2), 119–128 (1998)
4. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines (2001), http://www.csie.ntu.edu.tw/~cjlin/libsvm
5. Dayhoff, M., Schwartz, R., Orcutt, B.: A model of evolutionary change in proteins. Atlas of Protein Sequence and Structure 5(suppl. 3), 345–352 (1978)
6. Duin, R.P.W., Juszczak, P., Paclik, P., Pekalska, E., de Ridder, D., Tax, D.M.J.: PRTools, a Matlab Toolbox for Pattern Recognition. Delft University of Technology (2004)
7. Guide, M.R.: The MathWorks. Inc., Natick, MA (1998)
8. Kim, H., Zhang, Y., Heo, Y.S., Oh, H.B., Chen, S.S.: Specificity rule discovery in HIV-1 protease cleavage site analysis. Computational Biology and Chemistry 32(1), 71–78 (2008)
9. Liao, L., Noble, W.S.: Combining Pairwise Sequence Similarity and Support Vector Machines for Detecting Remote Protein Evolutionary and Structural Relationships. Journal of Computational Biology 10(6), 857–868 (2003)
10. Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48(3), 443–453 (1970)
11. Oommen, B.J., Kashyap, R.L.: A formal theory for optimal and information theoretic syntactic pattern recognition. Pattern Recognition 31(8), 1159–1177 (1998)
12. Tatusova, T.A., Madden, T.L.: BLAST 2 Sequences, a new tool for comparing protein and nucleotide sequences. FEMS Microbiology Letters 174(2), 247–250 (1999)
13. Thomson, R., Hodgman, T.C., Yang, Z.R., Doyle, A.K.: Characterizing proteolytic cleavage site activity using bio-basis function neural networks. Bioinformatics 19(14), 1741–1747 (2003)
14. Trudgian, D.C., Yang, Z.R.: Substitution Matrix Optimisation for Peptide Classification. In: Marchiori, E., Moore, J.H., Rajapakse, J.C. (eds.) EvoBIO 2007. LNCS, vol. 4447, pp. 291–300. Springer, Heidelberg (2007)
15. Zhao, Y., Pinilla, C., Valmori, D., Martin, R., Simon, R.: Application of support vector machines for T-cell epitopes prediction. Bioinformatics 19(15), 1978–1984 (2003)

Author information

Authors and Affiliations

Department of Computer Eng., Istanbul Technical University, Istanbul, Turkey
Eser Aygün & Zehra Cataltepe
School of Computer Science, Carleton University, Ottawa, Canada, K1S 5B6
B. John Oommen
Adjunct Professor at the University of Agder in Grimstad, Norway
B. John Oommen

Editor information

Editors and Affiliations

Department of Automatic Control and Systems Engineering, University of Sheffield, Mappin Street, S1 3JD, Sheffield, UK
Visakan Kadirkamanathan
Department of Computer Science and Department of Chemical and Process Engineering, University of Sheffield, Mappin Street, S1 3JD, Sheffield, UK
Guido Sanguinetti
University of Glasgow, Department of Computing Science, Sir Alwyn Williams Building, Lilybank Gardens, Glasgow, G12 8QQ, UK, and, University of Glasgow, Department of Statistics, 14 University Gardens, Glasgow, G12 8QQ, UK
Mark Girolami
School of Electronics and Computer Science, University of Southampton, SO17 1BJ, Southampton, UK
Mahesan Niranjan
Department of Chemical and Process Engineering, University of Sheffield, Mappin Street, S1 3JD, Sheffield, UK
Josselin Noirel

Cite this paper

Aygün, E., Oommen, B.J., Cataltepe, Z. (2009). On Utilizing Optimal and Information Theoretic Syntactic Modeling for Peptide Classification. In: Kadirkamanathan, V., Sanguinetti, G., Girolami, M., Niranjan, M., Noirel, J. (eds) Pattern Recognition in Bioinformatics. PRIB 2009. Lecture Notes in Computer Science, vol 5780. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04031-3_3

DOI: https://doi.org/10.1007/978-3-642-04031-3_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04030-6
Online ISBN: 978-3-642-04031-3
eBook Packages: Computer Science, Computer Science (R0)
