We discuss and experimentally compare several alternative classification algorithms for biological sequences. The methods presented in this chapter are all based on some form of statistical learning, ranging from support vector machines with string kernels to nearest-neighbour classifiers that use biologically motivated distances. We report an extensive empirical comparison on the problem of protein subcellular localization.
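To make the idea of a string kernel concrete, here is a minimal sketch of the k-mer spectrum kernel, one well-known kernel for protein sequences. The abstract does not say which kernels or distances the chapter actually evaluates, so this is an illustration only. The function names, the default k=3, and the kernel-induced distance used for the nearest-neighbour step are all assumptions; the distance in particular is a stand-in for the biologically motivated distances the abstract mentions.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every contiguous substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=3):
    """Inner product of the k-mer count vectors of sequences a and b."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # Iterate over the smaller table; k-mers absent from the other count as 0.
    if len(cb) < len(ca):
        ca, cb = cb, ca
    return sum(n * cb[kmer] for kmer, n in ca.items())


def kernel_distance_sq(a, b, k=3):
    """Squared distance induced by the kernel (illustrative stand-in only)."""
    return (spectrum_kernel(a, a, k)
            - 2 * spectrum_kernel(a, b, k)
            + spectrum_kernel(b, b, k))


def nearest_neighbour(query, labelled, k=3):
    """Return the label of the training sequence closest to query.

    labelled is a list of (sequence, label) pairs.
    """
    best = min(labelled, key=lambda pair: kernel_distance_sq(query, pair[0], k))
    return best[1]
```

The same kernel function could be supplied to a support vector machine as a precomputed Gram matrix, while the nearest-neighbour rule needs only the pairwise distances.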
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Costa, F., Menchetti, S., Frasconi, P. (2007). Comparing Sequence Classification Algorithms for Protein Subcellular Localization. In: Hammer, B., Hitzler, P. (eds) Perspectives of Neural-Symbolic Integration. Studies in Computational Intelligence, vol 77. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73954-8_2
DOI: https://doi.org/10.1007/978-3-540-73954-8_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73953-1
Online ISBN: 978-3-540-73954-8
eBook Packages: Engineering, Engineering (R0)