Skip to main content

A New Family of String Classifiers Based on Local Relatedness

  • Conference paper
Discovery Science (DS 2006)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 4265))

Included in the following conference series:

  • 1214 Accesses

Abstract

This paper introduces a new family of string classifiers based on local relatedness. We use three types of local relatedness measurements, namely, longest common substrings (LCStr’s), longest common subsequences (LCSeq’s), and window-accumulated longest common subsequences (wLCSeq’s). We show that finding the optimal classier for given two sets of strings (the positive set and the negative set), is NP-hard for all of the above measurements. In order to achieve practically efficient algorithms for finding the best classifier, we investigate pruning heuristics and fast string matching techniques based on the properties of the local relatedness measurements.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Arimura, H., Wataki, A., Fujino, R., Arikawa, S.: A fast algorithm for discovering optimal string patterns in large text databases. In: Richter, M.M., Smith, C.H., Wiehagen, R., Zeugmann, T. (eds.) ALT 1998. LNCS (LNAI), vol. 1501, pp. 247–261. Springer, Heidelberg (1998)

    Chapter  Google Scholar 

  2. Hirao, M., Inenaga, S., Shinohara, A., Takeda, M., Arikawa, S.: A practical algorithm to find the best episode patterns. In: Jantke, K.P., Shinohara, A. (eds.) DS 2001. LNCS (LNAI), vol. 2226, pp. 435–440. Springer, Heidelberg (2001)

    Chapter  Google Scholar 

  3. Hirao, M., Hoshino, H., Shinohara, A., Takeda, M., Arikawa, S.: A practical algorithm to find the best subsequence patterns. Theoretical Computer Science 292(2), 465–479 (2002)

    Article  MathSciNet  Google Scholar 

  4. Shinohara, A., Takeda, M., Arikawa, S., Hirao, M., Hoshino, H., Inenaga, S.: Finding best patterns practically. In: Arikawa, S., Shinohara, A. (eds.) Progress in Discovery Science. LNCS (LNAI), vol. 2281, pp. 307–317. Springer, Heidelberg (2002)

    Chapter  Google Scholar 

  5. Inenaga, S., Bannai, H., Shinohara, A., Takeda, M., Arikawa, S.: Discovering best variable-length-don’t-care patterns. In: Lange, S., Satoh, K., Smith, C.H. (eds.) DS 2002. LNCS (LNAI), vol. 2534, pp. 86–97. Springer, Heidelberg (2002)

    Chapter  Google Scholar 

  6. Shinozaki, D., Akutsu, T., Maruyama, O.: Finding optimal degenerate patterns in DNA sequences. Bioinformatics 19, ii206–ii214 (2003)

    Google Scholar 

  7. Lanctot, J.K., Li, M., Ma, B., Wang, S., Zhang, L.: Distinguishing string selection problems. Information and Computation 185, 41–55 (2003)

    Article  MATH  MathSciNet  Google Scholar 

  8. Takeda, M., Inenaga, S., Bannai, H., Shinohara, A., Arikawa, S.: Discovering most classificatory patterns for very expressive pattern classes. In: Grieser, G., Tanaka, Y., Yamamoto, A. (eds.) DS 2003. LNCS (LNAI), vol. 2843, pp. 486–493. Springer, Heidelberg (2003)

    Chapter  Google Scholar 

  9. Bannai, H., Hyyrö, H., Shinohara, A., Takeda, M., Nakai, K., Miyano, S.: Finding optimal pairs of patterns. In: Jonassen, I., Kim, J. (eds.) WABI 2004. LNCS (LNBI), vol. 3240, pp. 450–462. Springer, Heidelberg (2004)

    Chapter  Google Scholar 

  10. Bannai, H., Hyyrö, H., Shinohara, A., Takeda, M., Nakai, K., Miyano, S.: An O(N 2) algorithm for discovering optimal Boolean pattern pairs. IEEE/ACM Transactions on Computational Biology and Bioinformatics 1(4), 159–170 (2004) (special issue for selected papers of WABI 2004).

    Article  Google Scholar 

  11. Inenaga, S., Bannai, H., Hyyrö, H., Shinohara, A., Takeda, M., Nakai, K., Miyano, S.: Finding optimal pairs of cooperative and competing patterns with bounded distance. In: Suzuki, E., Arikawa, S. (eds.) DS 2004. LNCS (LNAI), vol. 3245, pp. 32–46. Springer, Heidelberg (2004)

    Chapter  Google Scholar 

  12. Crochemore, M., Rytter, W.: Text Algorithms. Oxford University Press, Oxford (1994)

    MATH  Google Scholar 

  13. Landau, G.M., Myers, E.W., Ziv-Ukelson, M.: Two algorithms for LCS consecutive suffix alignment. In: Sahinalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109, pp. 173–193. Springer, Heidelberg (2004)

    Chapter  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Higa, Y., Inenaga, S., Bannai, H., Takeda, M. (2006). A New Family of String Classifiers Based on Local Relatedness. In: Todorovski, L., Lavrač, N., Jantke, K.P. (eds) Discovery Science. DS 2006. Lecture Notes in Computer Science(), vol 4265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11893318_14

Download citation

  • DOI: https://doi.org/10.1007/11893318_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-46491-4

  • Online ISBN: 978-3-540-46493-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics