Efficient Discovery of Proximity Patterns with Suffix Arrays (Extended Abstract)

Arimura, Hiroki; Asaka, Hiroki; Sakamoto, Hiroshi; Arikawa, Setsuo

doi:10.1007/3-540-48194-X_14

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2089)

Included in the following conference series:

Annual Symposium on Combinatorial Pattern Matching

Abstract

We describe an efficient implementation of a text mining algorithm for discovering a class of simple string patterns. The implementation uses an index structure for pattern discovery, called the virtual suffix tree, that is built on top of the suffix array. The resulting algorithm is simple and runs fast in practice compared with the previous implementation based on the suffix tree.
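
The abstract does not spell out how the virtual suffix tree is built, so the following is only a generic sketch of the underlying idea: the internal nodes of a suffix tree can be enumerated from a suffix array together with its longest-common-prefix (LCP) array, without materializing the tree. The function names (build_suffix_array, build_lcp, lcp_intervals, repeated_substrings) are hypothetical and are not taken from the paper. The suffix array is built by naive sorting for brevity, and the LCP array uses the well-known linear-time scan usually attributed to Kasai et al.

```python
def build_suffix_array(text):
    # Naive construction by sorting suffixes; adequate for a sketch,
    # too slow for large texts.
    return sorted(range(len(text)), key=lambda i: text[i:])


def build_lcp(text, sa):
    # lcp[r] = length of the longest common prefix of the suffixes
    # at ranks r-1 and r in the suffix array (lcp[0] = 0).
    n = len(text)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


def lcp_intervals(text, sa, lcp):
    # Enumerate (substring, left_rank, right_rank) for every internal node
    # of the implicit suffix tree whose string depth is greater than zero.
    n = len(sa)
    result = []
    stack = [(0, 0)]  # (string depth, left boundary); the root stays at the bottom
    for r in range(1, n + 1):
        cur = lcp[r] if r < n else 0  # sentinel 0 closes all open intervals
        left = r - 1
        while stack[-1][0] > cur:
            depth, lb = stack.pop()
            result.append((text[sa[lb]:sa[lb] + depth], lb, r - 1))
            left = lb
        if stack[-1][0] < cur:
            stack.append((cur, left))
    return result


def repeated_substrings(text):
    # Map each branching repeated substring to its number of occurrences.
    sa = build_suffix_array(text)
    lcp = build_lcp(text, sa)
    return {s: rb - lb + 1 for s, lb, rb in lcp_intervals(text, sa, lcp)}
```

Only substrings that correspond to branching nodes are reported. For the text "banana", for example, "an" is omitted because every occurrence of it extends to "ana", which is reported instead. Each interval of ranks also gives direct access to the occurrence positions through the suffix array, which is the kind of information a pattern discovery algorithm would need from the index.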

Author information

Authors and Affiliations

Department of Informatics, Kyushu University, Fukuoka, 812-8581, Japan
Hiroki Arimura, Hiroki Asaka, Hiroshi Sakamoto & Setsuo Arikawa
PRESTO, Japan Science and Technology Corporation, Japan
Hiroki Arimura

Editor information

Editors and Affiliations

Department of Computer Science, Bar-Ilan University, 52900, Ramat-Gan, Israel; Atlanta, Georgia, 30332-0280, USA
Amihood Amir

About this paper

Cite this paper

Arimura, H., Asaka, H., Sakamoto, H., Arikawa, S. (2001). Efficient Discovery of Proximity Patterns with Suffix Arrays (Extended Abstract). In: Amir, A. (eds) Combinatorial Pattern Matching. CPM 2001. Lecture Notes in Computer Science, vol 2089. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48194-X_14

DOI: https://doi.org/10.1007/3-540-48194-X_14
Published: 13 June 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42271-6
Online ISBN: 978-3-540-48194-2
eBook Packages: Springer Book Archive
