Experiments on Data with Three Interpretations of Missing Attribute Values—A Rough Set Approach

  • Conference paper
Intelligent Information Processing and Web Mining

Part of the book series: Advances in Soft Computing (AINSC, volume 35)

Abstract

In this paper we distinguish three types of missing attribute values: lost values (e.g., erased values), “do not care” conditions (attribute values that were irrelevant for classifying a case), and attribute-concept values (“do not care” conditions restricted to a specific concept). As is known, subset and concept approximations should be used for knowledge acquisition from incomplete data sets. We report results of experiments on seven well-known incomplete data sets using nine strategies: interpreting missing attribute values in three different ways and using lower and upper, subset and concept approximations (subset lower approximations are identical to concept lower approximations, which leaves three distinct approximations per interpretation). Additionally, cases with more than approximately 70% of missing attribute values were removed from the original data sets, and all nine strategies were then applied to the reduced data sets. Our conclusions are that any two of our nine strategies are incomparable in terms of error rates (5% significance level, two-tailed test). However, for some data sets, removing cases with an excessive number of missing attribute values improves the error rate.
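The three interpretations and the subset and concept approximations named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the symbols `?`, `*`, and `-`, the five-case table, and all function names are assumptions made for illustration, and the definitions of characteristic sets follow the commonly cited rough set formulation for incomplete data rather than anything stated on this page.

```python
from typing import Dict, List, Set

# Assumed encoding: "?" lost value, "*" "do not care", "-" attribute-concept value.
LOST, DONT_CARE, ATTR_CONCEPT = "?", "*", "-"
MISSING = {LOST, DONT_CARE, ATTR_CONCEPT}
Case = Dict[str, str]

# Hypothetical toy table, not taken from the paper's data sets.
ATTRS, DECISION = ["Temp", "Headache"], "Flu"
CASES: List[Case] = [
    {"Temp": "high", "Headache": "-", "Flu": "yes"},
    {"Temp": "high", "Headache": "yes", "Flu": "yes"},
    {"Temp": "*", "Headache": "no", "Flu": "no"},
    {"Temp": "normal", "Headache": "?", "Flu": "no"},
    {"Temp": "normal", "Headache": "yes", "Flu": "yes"},
]

def concept_values(cases: List[Case], a: str, d: str) -> Set[str]:
    """Specified values of attribute a among cases whose decision is d."""
    return {c[a] for c in cases if c[DECISION] == d and c[a] not in MISSING}

def blocks(cases: List[Case]) -> Dict[tuple, Set[int]]:
    """Attribute-value blocks: lost values join no block, "do not care"
    joins every block of the attribute, attribute-concept values join the
    blocks of values seen in the same concept."""
    blk: Dict[tuple, Set[int]] = {}
    for a in ATTRS:
        values = {c[a] for c in cases if c[a] not in MISSING}
        for v in values:
            blk[(a, v)] = {i for i, c in enumerate(cases) if c[a] == v}
        for i, c in enumerate(cases):
            if c[a] == DONT_CARE:
                for v in values:
                    blk[(a, v)].add(i)
            elif c[a] == ATTR_CONCEPT:
                for v in concept_values(cases, a, c[DECISION]):
                    blk[(a, v)].add(i)
    return blk

def characteristic_sets(cases: List[Case]) -> List[Set[int]]:
    """K(x): intersection over attributes of the cases compatible with x."""
    blk, result = blocks(cases), []
    for c in cases:
        k = set(range(len(cases)))
        for a in ATTRS:
            v = c[a]
            if v in (LOST, DONT_CARE):
                continue  # no restriction from this attribute
            if v == ATTR_CONCEPT:
                vals = concept_values(cases, a, c[DECISION])
                if vals:
                    k &= set().union(*(blk[(a, w)] for w in vals))
                continue
            k &= blk[(a, v)]
        result.append(k)
    return result

def approximations(X: Set[int], K: List[Set[int]]) -> Dict[str, Set[int]]:
    """Subset versions range over all cases, concept versions over X only."""
    everyone = range(len(K))
    union = lambda sets: set().union(*sets)
    return {
        "subset_lower": union(K[x] for x in everyone if K[x] <= X),
        "concept_lower": union(K[x] for x in X if K[x] <= X),
        "subset_upper": union(K[x] for x in everyone if K[x] & X),
        "concept_upper": union(K[x] for x in X),
    }

def drop_sparse_cases(cases: List[Case], threshold: float = 0.7) -> List[Case]:
    """Remove cases whose fraction of missing attribute values exceeds threshold."""
    return [c for c in cases
            if sum(c[a] in MISSING for a in ATTRS) / len(ATTRS) <= threshold]

K = characteristic_sets(CASES)
for d in ("yes", "no"):
    X = {i for i, c in enumerate(CASES) if c[DECISION] == d}
    print(d, approximations(X, K))
```

On this toy table the subset and concept lower approximations coincide for both concepts, while for the concept `Flu = yes` the subset upper approximation is strictly larger than the concept upper approximation. The `threshold` default of 0.7 mirrors the "approximately 70%" cutoff mentioned in the abstract; the exact rule used in the experiments is not given here.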




Copyright information

© 2006 Springer

About this paper

Cite this paper

Grzymała-Busse, J.W., Santoso, S. (2006). Experiments on Data with Three Interpretations of Missing Attribute Values—A Rough Set Approach. In: Kłopotek, M.A., Wierzchoń, S.T., Trojanowski, K. (eds) Intelligent Information Processing and Web Mining. Advances in Soft Computing, vol 35. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-33521-8_14

  • DOI: https://doi.org/10.1007/3-540-33521-8_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33520-7

  • Online ISBN: 978-3-540-33521-4

  • eBook Packages: Engineering, Engineering (R0)
