Skip to main content

Accuracy of String Kernels for Protein Sequence Classification

  • Conference paper
Pattern Recognition and Data Mining (ICAPR 2005)

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 3686))

Included in the following conference series:

Abstract

Determining protein sequence similarity is an important task for protein classification and homology detection. Typically this may be done using sequence alignment algorithms, yet fast and accurate alignment-free kernel based classifiers exist. Viewing sequences as a “bag of words”, we test a simple weighted string kernel, investigating the effects of k-mer length, sequence length and choice of weighting. We also extend the kernel to operate on the k-mer frequency representation of a sequence rather than the “bag of words” representation.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Miller, C.J., Attwood, T.K.: Psst..the probabilistic sequence search tool. In: 2nd IEEE International Symposium on Bioinformatics and Bioengineering, pp. 33–40. IEEE Press, Washington (2001), ISBN 0769514235

    Chapter  Google Scholar 

  2. Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic local alignment search tool. Journal of Molecular Biology 215, 403–410 (1990)

    Google Scholar 

  3. Pearson, W.R.: Rapid and sensitive sequence comparison with fastp and fasta. Methods Enzymol 183, 63–98 (1990)

    Article  Google Scholar 

  4. Krogh, A., Brown, M., Mian, I., Sjolander, K., Haussler, D.: Hidden markov models in computational biology: Applications to protein modelling. Journal of Molecular Biology 235, 1501–1531 (1994)

    Article  Google Scholar 

  5. Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J.: Gapped BLAST and PSI–BLAST: A new generation of protein database search programs. Nucleic Acids Research 25, 3389–3402 (1997)

    Article  Google Scholar 

  6. Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge (2004)

    Google Scholar 

  7. Leslie, C., Eskin, E., Weston, J., Noble, W.S.: The spectrum kernel: a string kernel for svm protein classification. In: Proceedings of the Pacific Symposium on Biocomputing, pp. 564–575. World Scientific, Singapore (2002)

    Google Scholar 

  8. Leslie, C., Eskin, E., Weston, J., Noble, W.S.: Mismatch string kernels for svm protein classification. In: Advances in Neural Information Processing Systems (2002)

    Google Scholar 

  9. Attwood, T.K., Bradley, P., Flower, D.R., Gaulton, A., Maudling, N., Mitchell, A., Moulton, G., Nordle, A., Paine, K., Taylor, P., Uddin, A., Zygouri, C.: PRINTS and its automatic supplement, prePRINTS. Nucleic Acids Research 31(1), 400–402 (2003)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Spalding, J.D., Hoyle, D.C. (2005). Accuracy of String Kernels for Protein Sequence Classification. In: Singh, S., Singh, M., Apte, C., Perner, P. (eds) Pattern Recognition and Data Mining. ICAPR 2005. Lecture Notes in Computer Science, vol 3686. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551188_49

Download citation

  • DOI: https://doi.org/10.1007/11551188_49

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28757-5

  • Online ISBN: 978-3-540-28758-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics