Rule Induction, Missing Attribute Values and Discretization

Grzymala-Busse, Jerzy W.

doi:10.1007/978-1-4614-1800-9_170

Article Outline

Glossary

Definition of the Subject

Introduction

Discretization

LEM2 Algorithm

Inconsistent Data

Missing Attribute Values

MLEM2

Classification System

Validation

Future Directions

Bibliography

Abbreviations

Discretization:

Discretization is the process of converting numerical attributes into symbolic ones by splitting the domain of each numerical attribute into intervals. Discretization is usually conducted before the main process of rule induction, but in some rule induction algorithms, e.g., in MLEM2 (Modified LEM2), rules are induced concurrently with discretization.
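As an illustration of splitting a numerical domain into intervals, the following minimal sketch uses equal-width intervals. Equal-width splitting is only one simple scheme chosen for the example; it is not claimed to be the method used in LERS or MLEM2. The function names, the interval label format, and the temperature values are hypothetical.

```python
from bisect import bisect_right
from typing import List


def equal_width_cutpoints(values: List[float], k: int) -> List[float]:
    """Return the k - 1 cutpoints that split [min, max] into k equal intervals."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + i * step for i in range(1, k)]


def to_interval(value: float, cutpoints: List[float], lo: float, hi: float) -> str:
    """Map a numerical value to a symbolic interval label such as '36..38'."""
    bounds = [lo] + cutpoints + [hi]
    # bisect_right places a value equal to a cutpoint in the upper interval.
    idx = bisect_right(cutpoints, value)
    return f"{bounds[idx]:g}..{bounds[idx + 1]:g}"


# Hypothetical body temperatures (assumed data, for illustration only).
temperatures = [36.0, 37.0, 38.0, 39.0, 40.0]
cuts = equal_width_cutpoints(temperatures, 2)
symbolic = [to_interval(t, cuts, min(temperatures), max(temperatures))
            for t in temperatures]
```

After this step each numerical value is replaced by the label of the interval containing it, so a rule induction algorithm that expects symbolic attributes can be applied.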

LEM2 algorithm:

LEM2 (Learning from Examples Module, version 2) is the basic rule induction algorithm of the machine learning/data mining system LERS. LEM2, first implemented in 1990, uses the idea of a local covering to induce a minimal set of minimal rules describing all concepts in the data.
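The idea of covering a concept with a small set of minimal rules can be sketched as a greedy search over attribute-value pairs. The code below is a simplified illustration in the spirit of a local covering, not the LERS implementation: the selection heuristic (largest overlap with the uncovered cases, then smallest block), the data layout, and all names are assumptions, and the sketch requires consistent data with no missing values.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (attribute, value)


def block_of(blocks: Dict[Pair, Set[int]], pairs: List[Pair], n: int) -> Set[int]:
    """Cases matched by all pairs at once (intersection of their blocks)."""
    result = set(range(n))
    for p in pairs:
        result &= blocks[p]
    return result


def lem2(cases: List[Dict[str, str]], concept: Set[int]) -> List[List[Pair]]:
    """Greedy covering of `concept` (a set of case indices) by condition lists."""
    n = len(cases)
    all_pairs = sorted({(a, v) for c in cases for a, v in c.items()})
    blocks = {p: {i for i, c in enumerate(cases) if c[p[0]] == p[1]}
              for p in all_pairs}
    covering: List[List[Pair]] = []
    goal = set(concept)
    while goal:
        conds: List[Pair] = []
        covered, g = set(range(n)), set(goal)
        # Add conditions until the rule matches only cases from the concept.
        while not conds or not covered <= concept:
            candidates = [p for p in all_pairs
                          if p not in conds and blocks[p] & g]
            if not candidates:
                raise ValueError("concept is not definable (inconsistent data)")
            best = min(candidates,
                       key=lambda p: (-len(blocks[p] & g), len(blocks[p])))
            conds.append(best)
            covered &= blocks[best]
            g &= blocks[best]
        # Drop conditions that are not needed (keeps each rule minimal).
        for p in list(conds):
            rest = [q for q in conds if q != p]
            if rest and block_of(blocks, rest, n) <= concept:
                conds = rest
        covering.append(conds)
        goal = concept - set().union(*(block_of(blocks, t, n) for t in covering))
    # Drop rules that are not needed (keeps the rule set minimal).
    for t in list(covering):
        rest = [s for s in covering if s is not t]
        if rest and set().union(*(block_of(blocks, s, n) for s in rest)) == concept:
            covering = rest
    return covering


# Hypothetical data set (assumed for illustration); cases 0, 1 and 5 have flu.
CASES = [
    {"Temperature": "high", "Headache": "yes", "Nausea": "no"},
    {"Temperature": "very_high", "Headache": "yes", "Nausea": "yes"},
    {"Temperature": "high", "Headache": "no", "Nausea": "no"},
    {"Temperature": "normal", "Headache": "no", "Nausea": "no"},
    {"Temperature": "normal", "Headache": "yes", "Nausea": "no"},
    {"Temperature": "very_high", "Headache": "no", "Nausea": "yes"},
]
FLU_RULES = lem2(CASES, {0, 1, 5})
```

Each inner list in the result is the condition part of one rule for the chosen concept. On inconsistent data the inner loop can run out of candidate pairs, which is why the sketch raises an error instead of looping.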

LERS machine learning/data mining system:

LERS (Learning from Examples based on Rough Sets) is a rule induction system created at the University of Kansas. Its first implementation was written in Franz Lisp in 1988. This first version of LERS had only one algorithm, called LEM1 (Learning from Examples Module, version 1), to induce all rules from input data.

Missing attribute values:

Missing attribute values frequently affect real-life data. Some attribute values are lost (e.g., erased); others are “do not care” conditions (such attribute values were irrelevant for classification of the case). In most existing machine learning/data mining systems, some method of handling missing attribute values is applied before the main process of rule induction. However, in MLEM2, rule induction and handling of missing attribute values are conducted at the same time.
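The two kinds of missing values can be treated differently when grouping cases by attribute value. The sketch below follows one convention found in the rough-set literature on incomplete data, stated here as an assumption because the glossary does not spell it out: a case with a lost value is placed in no block of that attribute, while a case with a “do not care” condition is placed in every block of that attribute. The symbols `?` and `*` and all names are likewise assumptions.

```python
from typing import Dict, List, Set

LOST = "?"         # assumed marker for a lost (e.g., erased) value
DO_NOT_CARE = "*"  # assumed marker for a "do not care" condition


def attribute_value_blocks(cases: List[Dict[str, str]],
                           attribute: str) -> Dict[str, Set[int]]:
    """Group case indices by the value of `attribute`, honoring missing values."""
    specified = sorted({c[attribute] for c in cases
                        if c[attribute] not in (LOST, DO_NOT_CARE)})
    blocks: Dict[str, Set[int]] = {v: set() for v in specified}
    for i, case in enumerate(cases):
        value = case[attribute]
        if value == LOST:
            continue  # a lost value matches no specified value
        if value == DO_NOT_CARE:
            for v in specified:
                blocks[v].add(i)  # "do not care" matches every specified value
        else:
            blocks[value].add(i)
    return blocks


# Hypothetical cases (assumed data, for illustration only).
incomplete = [
    {"Temperature": "high"},
    {"Temperature": "?"},
    {"Temperature": "*"},
    {"Temperature": "normal"},
]
temperature_blocks = attribute_value_blocks(incomplete, "Temperature")
```

Because the missing values are resolved while the blocks are built, no separate preprocessing pass over the data is needed before rules are induced from those blocks.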

Rule induction:

Rule induction is understood here as an instance of supervised learning. Rule induction is one of the basic processes of acquiring knowledge (knowledge extraction) in the form of rule sets from raw data. This process is widely used in machine learning (data mining). A data set contains cases (examples) characterized by attribute values and classified as members of concepts by an expert. Rules are expressions of the following format:

if condition₁ and condition₂ and … and conditionₙ then decision.
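A rule of this format can be represented as a list of attribute-value conditions plus a decision. The sketch below is a minimal illustration under stated assumptions: the `Rule` class, the `(attribute, value)` encoding of conditions, and the first-match `classify` helper are hypothetical and do not describe the classification system used in LERS.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    conditions: Tuple[Tuple[str, str], ...]  # (attribute, value) pairs
    decision: Tuple[str, str]                # (decision attribute, concept)

    def matches(self, case: Dict[str, str]) -> bool:
        """A case matches when every condition of the rule holds for it."""
        return all(case.get(a) == v for a, v in self.conditions)

    def __str__(self) -> str:
        lhs = " and ".join(f"({a}, {v})" for a, v in self.conditions)
        return f"if {lhs} then ({self.decision[0]}, {self.decision[1]})"


def classify(rules: List[Rule], case: Dict[str, str]) -> Optional[str]:
    """Return the concept of the first matching rule, or None if none matches."""
    for rule in rules:
        if rule.matches(case):
            return rule.decision[1]
    return None


# Hypothetical rule and cases (assumed data, for illustration only).
flu_rule = Rule((("Temperature", "high"), ("Headache", "yes")), ("Flu", "yes"))
patient = {"Temperature": "high", "Headache": "yes", "Nausea": "no"}
healthy = {"Temperature": "normal", "Headache": "no", "Nausea": "no"}
```

Calling `str(flu_rule)` renders the rule in the if-then format shown above, and `classify([flu_rule], patient)` returns the decision value of the rule that matches.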


Author information

Authors and Affiliations

Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, USA
Jerzy W. Grzymala-Busse
Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
Jerzy W. Grzymala-Busse

Editor information

Editors and Affiliations

RAMTECH LIMITED, 122 Escalle Lane, Larkspur, CA, 94939, USA
Robert A. Meyers, Ph.D. (Editor-in-Chief)

About this entry

Cite this entry

Grzymala-Busse, J.W. (2012). Rule Induction, Missing Attribute Values and Discretization. In: Meyers, R. (eds) Computational Complexity. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1800-9_170

DOI: https://doi.org/10.1007/978-1-4614-1800-9_170
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-1799-6
Online ISBN: 978-1-4614-1800-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering
