Instance Selection Based on Hypertuples

Wang, Hui

doi:10.1007/978-1-4757-3359-4_14

Part of the book series: The Springer International Series in Engineering and Computer Science (SECS, volume 608)

Abstract

Instance selection aims to find a representative portion of the data that serves the same purpose as the whole. In this chapter we propose a novel procedure for instance selection based on hypertuples, a generalization of traditional database tuples. The procedure has two tasks: building a model and selecting instances based on the model. For the first task, we propose to merge data tuples while ensuring that certain criteria are satisfied. This merge operation results in a set of hypertuples which, under certain conditions, serves as a model of the original data. We identify two types of criteria for instance selection: preserving classification structures and maximizing the density of hypertuples. For the first criterion we propose a formalism that leads to a unique solution, the least E-set. We then propose algorithms for finding this unique solution and for efficiently finding a compromise solution. For the second criterion we propose a new measure of density, which is normalized and quantized, and which applies to both numerical and categorical data. Using this measure, we then propose a hill-climbing algorithm that efficiently finds a quasi-optimal ("quasi-densest") set of hypertuples.
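To make the merge step concrete, here is a minimal sketch, not the chapter's algorithm. It assumes a hypertuple is a tuple of value sets (one set per attribute), that merging is a component-wise union, and that "preserving classification structures" can be approximated by rejecting any merge whose result would cover a training tuple of another class. The names `to_hypertuple`, `merge`, `covers`, and `greedy_merge` are hypothetical, and the greedy order is an illustrative choice that makes no claim to produce the least E-set.

```python
from typing import FrozenSet, List, Sequence, Tuple

# Assumption: a hypertuple holds one set of admissible values per attribute.
Hypertuple = Tuple[FrozenSet, ...]


def to_hypertuple(row: tuple) -> Hypertuple:
    """Lift a plain tuple to a hypertuple of singleton sets."""
    return tuple(frozenset([v]) for v in row)


def merge(a: Hypertuple, b: Hypertuple) -> Hypertuple:
    """Component-wise union (the assumed merge operation)."""
    return tuple(x | y for x, y in zip(a, b))


def covers(h: Hypertuple, row: tuple) -> bool:
    """True if every attribute value of `row` lies in the matching set of `h`."""
    return all(v in s for v, s in zip(row, h))


def greedy_merge(rows: Sequence[tuple], labels: Sequence) -> List[tuple]:
    """Merge same-class hypertuples until no further merge is allowed.

    A merge is rejected if the result would cover a row of another class.
    """
    model = [(to_hypertuple(r), y) for r, y in zip(rows, labels)]
    changed = True
    while changed:
        changed = False
        for i in range(len(model)):
            for j in range(i + 1, len(model)):
                (hi, yi), (hj, yj) = model[i], model[j]
                if yi != yj:
                    continue
                cand = merge(hi, hj)
                if any(covers(cand, r) for r, y in zip(rows, labels) if y != yi):
                    continue
                # Accept the merge, then restart the scan on the smaller model.
                model[i] = (cand, yi)
                del model[j]
                changed = True
                break
            if changed:
                break
    return model


rows = [("red", "small"), ("red", "large"), ("blue", "small")]
labels = ["a", "a", "b"]
for hypertuple, label in greedy_merge(rows, labels):
    print(label, [sorted(s) for s in hypertuple])
```

The sketch handles categorical values only; numerical attributes would more naturally be represented as intervals, and a density criterion would replace or complement the class-consistency check used here.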

Having a model of the data, we can generate a set of representative instances, which is the second task of the procedure. We propose to calculate the centers of the hypertuples in the model and take these centers as the representative instances of the original data. To use the selected instances for classification, we propose a nearest neighbor (NN) approach.
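The second task can be illustrated with a short sketch under simplifying assumptions that the abstract does not state: all attributes are numerical, each hypertuple component is a closed interval `(lo, hi)`, the center is the interval midpoint, and the NN step is plain 1-NN with Euclidean distance. The function names `center` and `nearest_label` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


def center(hypertuple: Sequence[Interval]) -> Tuple[float, ...]:
    """Midpoint of each interval (an assumed definition of 'center')."""
    return tuple((lo + hi) / 2 for lo, hi in hypertuple)


def nearest_label(query: Sequence[float], instances: List[tuple]):
    """Return the label of the selected instance closest to `query` (1-NN)."""
    point, label = min(instances, key=lambda inst: math.dist(query, inst[0]))
    return label


# A toy model: two labelled hypertuples over two numerical attributes.
model = [
    (((0.0, 2.0), (0.0, 4.0)), "a"),
    (((10.0, 12.0), (10.0, 10.0)), "b"),
]

# Selected instances are the labelled centers of the hypertuples.
instances = [(center(h), label) for h, label in model]

print(instances)
print(nearest_label((2.0, 2.0), instances))
print(nearest_label((9.0, 9.0), instances))
```

With this representation the selected set has one instance per hypertuple, so its size is bounded by the size of the model rather than by the size of the original data.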

Experiments using real-world public data show that the selected instances are representative and that, when used with the proposed NN classifier, they even outperform C5 in some cases.

Author information

Authors and Affiliations

School of Information and Software Engineering, University of Ulster, Newtownabbey, BT37 0QB, N. Ireland
Hui Wang

Editor information

Editors and Affiliations

Arizona State University, USA
Huan Liu
Osaka University, Japan
Hiroshi Motoda

About this chapter

Cite this chapter

Wang, H. (2001). Instance Selection Based on Hypertuples. In: Liu, H., Motoda, H. (eds) Instance Selection and Construction for Data Mining. The Springer International Series in Engineering and Computer Science, vol 608. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-3359-4_14

DOI: https://doi.org/10.1007/978-1-4757-3359-4_14
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-4861-8
Online ISBN: 978-1-4757-3359-4
eBook Packages: Springer Book Archive
