Abstract
Efficiency in data-mining algorithms can be achieved by identifying representative prototypes or representative features and basing exploratory study only on those subsets. It is interesting to examine whether both can be achieved simultaneously through lossy compression and efficient clustering algorithms on large datasets; we study this aspect in the present chapter. We further examine whether there is a preferred ordering of the two activities; specifically, we compare clustering followed by compression with compression followed by clustering. We provide a detailed discussion of background material, including definitions of various terms and parameters and the choice of thresholds for reducing the number of patterns and features. We study eight lossy-compression scenarios and demonstrate that the compressed information they yield provides better classification accuracy than the original dataset. To this end, we implement the proposed scheme on two large datasets, one with binary-valued features and the other with floating-point-valued features. At the end of the chapter, we provide bibliographic notes and a list of references.
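As a rough illustration of the kind of scheme the abstract describes, the sketch below applies single-pass leader clustering to reduce the number of patterns (prototype selection) and a frequency threshold to reduce the number of features, in the order "clustering followed by feature compression". The tiny binary dataset, the distance and support thresholds, and the leader algorithm itself are illustrative assumptions, not the chapter's actual procedure or data.

```python
# Illustrative sketch only: the dataset and thresholds below are invented
# for demonstration; the chapter evaluates eight such compression scenarios.

def hamming(a, b):
    """Hamming distance between two equal-length binary patterns."""
    return sum(x != y for x, y in zip(a, b))

def leader_prototypes(patterns, dist_threshold):
    """Single-pass leader clustering: a pattern becomes a new prototype
    (leader) only if it is farther than dist_threshold from every
    current leader; otherwise it is absorbed by an existing cluster."""
    leaders = []
    for p in patterns:
        if all(hamming(p, lead) > dist_threshold for lead in leaders):
            leaders.append(p)
    return leaders

def frequent_features(patterns, min_support):
    """Lossy feature compression: keep only the feature positions whose
    value is 1 in at least min_support patterns."""
    counts = [sum(p[j] for p in patterns) for j in range(len(patterns[0]))]
    return [j for j, c in enumerate(counts) if c >= min_support]

# Toy binary-valued dataset (4 patterns, 5 features) -- an assumption.
data = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 0, 0],
]

protos = leader_prototypes(data, dist_threshold=1)   # pattern reduction
feats = frequent_features(data, min_support=2)       # feature reduction
compressed = [[p[j] for j in feats] for p in protos]

print(len(protos), feats, compressed)
```

With these thresholds the four patterns collapse to two prototypes and the five features to three, and any subsequent classifier (e.g. nearest neighbour) would operate only on the compressed representation; reversing the order of the two steps gives the alternative "compression followed by clustering" scenario discussed in the chapter.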
Copyright information
© 2013 Springer-Verlag London
Cite this chapter
Ravindra Babu, T., Narasimha Murty, M., Subrahmanya, S.V. (2013). Data Compaction Through Simultaneous Selection of Prototypes and Features. In: Compression Schemes for Mining Large Datasets. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-5607-9_5
Publisher Name: Springer, London
Print ISBN: 978-1-4471-5606-2
Online ISBN: 978-1-4471-5607-9