
Part of the book series: The Springer International Series in Engineering and Computer Science (SECS, volume 608)


Abstract

Instance selection searches for a representative portion of a data set that can serve the same purpose as the whole. In this chapter we propose a novel procedure for instance selection based on hypertuples, a generalization of traditional database tuples. The procedure has two tasks: building a model, and selecting instances based on that model. For the first task, we propose to merge data tuples as long as certain criteria remain satisfied. The merge operation yields a set of hypertuples which, under certain conditions, serves as a model of the original data. We identify two types of criteria for instance selection: preserving classification structures and maximizing the density of hypertuples. For the first criterion we propose a formalism that leads to a unique solution, the least E-set. We then propose algorithms for finding this unique solution and for finding a compromise solution efficiently. For the second criterion we propose a new measure of density, which is normalized and quantized and which applies to both numerical and categorical data. Using this measure, we then propose a hill-climbing algorithm that can efficiently find a quasi-optimal ("quasi-densest") set of hypertuples.
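The merge step can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the chapter's formalism: a hypertuple is represented as a tuple whose components are closed intervals (numerical attributes) or sets of values (categorical attributes), merging takes the componentwise interval hull or set union, and the criterion is a simple class-consistency check. The function names `lift`, `merge`, `covers` and `greedy_merge` are hypothetical, and the greedy loop is a stand-in, not the least E-set algorithm or the hill-climbing algorithm proposed in the chapter.

```python
def lift(row):
    """Turn a plain tuple into a hypertuple (assumed representation)."""
    return tuple(
        (float(v), float(v)) if isinstance(v, (int, float)) else frozenset([v])
        for v in row
    )


def merge(a, b):
    """Componentwise merge: set union for categorical, interval hull for numerical."""
    out = []
    for x, y in zip(a, b):
        if isinstance(x, frozenset):
            out.append(x | y)
        else:
            out.append((min(x[0], y[0]), max(x[1], y[1])))
    return tuple(out)


def covers(h, row):
    """True if every value of the plain tuple falls inside the hypertuple."""
    for comp, v in zip(h, row):
        if isinstance(comp, frozenset):
            if v not in comp:
                return False
        elif not (comp[0] <= v <= comp[1]):
            return False
    return True


def greedy_merge(rows, labels):
    """Merge same-class hypertuples while the result covers no row of another class.

    Assumption: this class-consistency test only stands in for the chapter's
    'preserving classification structures' criterion.
    """
    model = [(lift(r), y) for r, y in zip(rows, labels)]
    changed = True
    while changed:
        changed = False
        for i in range(len(model)):
            for j in range(i + 1, len(model)):
                (hi, yi), (hj, yj) = model[i], model[j]
                if yi != yj:
                    continue
                m = merge(hi, hj)
                if any(covers(m, r) for r, y in zip(rows, labels) if y != yi):
                    continue  # merging would swallow a tuple of another class
                model[i] = (m, yi)
                del model[j]
                changed = True
                break
            if changed:
                break
    return model


rows = [(1.0, "a"), (2.0, "a"), (5.0, "b"), (6.0, "b")]
labels = ["x", "x", "y", "y"]
model = greedy_merge(rows, labels)
```

In this toy data the four tuples collapse into two hypertuples, one per class. A merge is refused whenever the merged hypertuple would cover a tuple of a different class.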

Given a model of the data, we can carry out the second task of the procedure: generating a set of representative instances. We propose to calculate the centers of the hypertuples in the model and to take these centers as the representative instances of the original data. To use the selected instances for classification, we propose a nearest neighbor (NN) approach.
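The second task can be sketched in a few lines of Python. The sketch assumes purely numerical attributes, takes the center of a hypertuple to be the midpoint of each interval, and classifies with a plain 1-NN rule under Euclidean distance. These are illustrative assumptions: the chapter's own definition of a center (in particular for categorical attributes) and its NN classifier may differ, and the names `center` and `nearest_label` are hypothetical.

```python
import math


def center(hypertuple):
    """Midpoint of each interval (assumed definition of a hypertuple's center)."""
    return tuple((lo + hi) / 2.0 for lo, hi in hypertuple)


def nearest_label(query, instances):
    """Label of the selected instance closest to the query (1-NN, Euclidean)."""
    best = min(instances, key=lambda inst: math.dist(query, inst[0]))
    return best[1]


# A toy model: two labelled hypertuples over two numerical attributes.
model = [
    (((1.0, 3.0), (0.0, 2.0)), "x"),
    (((6.0, 8.0), (5.0, 7.0)), "y"),
]

# The selected instances are the centers, each keeping its hypertuple's label.
instances = [(center(h), y) for h, y in model]

prediction = nearest_label((2.5, 1.5), instances)
```

Here the two hypertuples reduce to two labelled points, and a query is assigned the label of whichever center lies nearest.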

Experiments on public real-world data show that the selected instances are representative and that, when used with the proposed NN classifier, they outperform C5 in some cases.




© 2001 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Wang, H. (2001). Instance Selection Based on Hypertuples. In: Liu, H., Motoda, H. (eds) Instance Selection and Construction for Data Mining. The Springer International Series in Engineering and Computer Science, vol 608. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-3359-4_14


  • DOI: https://doi.org/10.1007/978-1-4757-3359-4_14

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-4861-8

  • Online ISBN: 978-1-4757-3359-4

  • eBook Packages: Springer Book Archive
