Prefix-Suffix Trees: A Novel Scheme for Compact Representation of Large Datasets

Pai, Radhika M.; Ananthanarayana, V. S

doi:10.1007/978-3-540-77046-6_40

Radhika M. Pai¹ &
V. S Ananthanarayana²

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4815)

Included in the following conference series:

International Conference on Pattern Recognition and Machine Intelligence

Abstract

An important goal in data mining is to generate an abstraction of the data. Such an abstraction reduces the time and space requirements of the overall decision-making process. It is also important that the abstraction be generated from the data in a small number of scans. In this paper we propose a novel scheme, called Prefix-Suffix trees, for the compact storage of patterns in data mining; it forms an abstraction of the patterns and is generated from the data in a single scan. This abstraction takes less space and hence provides compact storage of the patterns. We further propose a clustering algorithm based on this storage and show experimentally that it reduces both space and time requirements. This is established on large datasets of handwritten numerals, namely the OCR, MNIST and USPS datasets. The proposed algorithm is compared with other similar algorithms, which establishes the efficacy of our scheme.
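The abstract does not spell out how a Prefix-Suffix tree is built, so what follows is only a minimal Python sketch of the general idea it alludes to: storing patterns compactly, in a single scan, by letting patterns that begin with the same features share tree nodes. It assumes each pattern is a binary feature vector (for instance a thresholded handwritten-digit image) encoded as the sorted indices of its 1-valued features. The names `Node`, `PrefixTree`, `build` and `patterns` are hypothetical; the suffix-sharing part of the authors' scheme and their clustering algorithm are not modeled.

```python
# Hypothetical sketch: prefix sharing only; NOT the authors' Prefix-Suffix tree.
from dataclasses import dataclass, field


@dataclass
class Node:
    """One tree node: a feature index shared by every pattern passing through it."""
    feature: int
    count: int = 0                                  # patterns passing through this node
    ends: int = 0                                   # patterns ending exactly at this node
    children: dict = field(default_factory=dict)    # feature index -> child Node


class PrefixTree:
    """Count-annotated prefix tree over patterns given as feature-index tuples."""

    def __init__(self):
        self.root = Node(feature=-1)    # sentinel root, stores no feature
        self.num_nodes = 0              # non-root nodes actually allocated

    def insert(self, pattern):
        node = self.root
        node.count += 1
        for f in sorted(pattern):       # canonical order so equal prefixes are shared
            child = node.children.get(f)
            if child is None:           # only unseen prefixes cost new storage
                child = Node(feature=f)
                node.children[f] = child
                self.num_nodes += 1
            child.count += 1
            node = child
        node.ends += 1                  # marks where this pattern stops

    def patterns(self):
        """Yield every stored pattern (with multiplicity) by walking the tree."""
        stack = [(self.root, ())]
        while stack:
            node, prefix = stack.pop()
            for _ in range(node.ends):
                yield prefix
            for f, child in node.children.items():
                stack.append((child, prefix + (f,)))


def build(patterns):
    """Single scan: each pattern is read exactly once and merged into the tree."""
    tree = PrefixTree()
    for p in patterns:
        tree.insert(p)
    return tree


if __name__ == "__main__":
    # Toy binary patterns, given as the indices of their 1-valued features.
    data = [(0, 1, 2, 5), (0, 1, 2, 7), (0, 1, 3), (0, 1, 2, 5)]
    tree = build(data)
    raw = sum(len(p) for p in data)
    print(f"{raw} feature occurrences stored in {tree.num_nodes} nodes")
    # prints: 15 feature occurrences stored in 6 nodes
```

On this toy input, 15 feature occurrences collapse into 6 nodes, while the per-node counts and end markers still allow the original patterns to be recovered, so in this sketch the tree acts as a compact abstraction that loses no data. How much is saved depends entirely on how many leading features the patterns have in common.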

Author information

Authors and Affiliations

Radhika M. Pai: Manipal Institute of Technology, Manipal
V. S Ananthanarayana: National Institute of Technology Karnataka, Surathkal

Editor information

Ashish Ghosh, Rajat K. De, Sankar K. Pal

About this paper

Cite this paper

Pai, R.M., Ananthanarayana, V.S. (2007). Prefix-Suffix Trees: A Novel Scheme for Compact Representation of Large Datasets. In: Ghosh, A., De, R.K., Pal, S.K. (eds) Pattern Recognition and Machine Intelligence. PReMI 2007. Lecture Notes in Computer Science, vol 4815. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77046-6_40

DOI: https://doi.org/10.1007/978-3-540-77046-6_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77045-9
Online ISBN: 978-3-540-77046-6
eBook Packages: Computer Science, Computer Science (R0)