Studies on the Necessary Data Size for Rule Induction by STRIM

Kato, Yuichi; Saeki, Tetsuro; Mizuno, Shoutarou

doi:10.1007/978-3-642-41299-8_20

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8171)

Included in the following conference series:

International Conference on Rough Sets and Knowledge Technology

Abstract

STRIM (Statistical Test Rule Induction Method) has been proposed as a method for effectively inducing if-then rules from a decision table, which is regarded as a sample set obtained from the population of interest. Its usefulness has been confirmed by a simulation experiment in which the rules are specified in advance, and by comparison with conventional methods. However, there remains scope for further study. One aspect that needs examination is determining, through simulation experiments, the dataset size needed to induce the true rules, since finding statistically significant rules is the core of the method. This paper examines the theoretical dataset size that STRIM needs in order to induce the true rules with probability w [%], as a function of the rule length, and confirms the validity of the analysis by a simulation experiment at rule length 2. The results provide useful guidelines for analyzing real-world datasets.
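
The link between detection probability, rule length, and dataset size can be illustrated with a generic power calculation. The sketch below is not the derivation used in the paper, which is not reproduced on this page. It assumes a one-sided normal-approximation test of a binomial proportion, and it assumes that every condition attribute takes its values uniformly and independently, so that a condition part of length L is matched by roughly a fraction 1/k^L of the objects. The function names and the parameters `p0`, `p1`, `alpha`, `power`, `values_per_attribute`, and `rule_length` are all hypothetical.

```python
import math
from statistics import NormalDist


def matched_objects_needed(p0: float, p1: float, alpha: float, power: float) -> int:
    """Objects matching a rule's condition part that are needed so that a
    one-sided test of H0: p = p0 rejects with probability >= power when the
    true proportion is p1 (normal approximation; assumption, not the paper's
    formula)."""
    if not (0.0 < p0 < p1 < 1.0):
        raise ValueError("require 0 < p0 < p1 < 1")
    if not (0.0 < alpha < 1.0 and 0.0 < power < 1.0):
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)        # quantile for the target power
    numerator = (z_alpha * math.sqrt(p0 * (1.0 - p0))
                 + z_power * math.sqrt(p1 * (1.0 - p1)))
    return math.ceil((numerator / (p1 - p0)) ** 2)


def table_size_needed(n_matched: int, values_per_attribute: int, rule_length: int) -> int:
    """Scale the matched-object count up to a whole decision table, assuming
    each condition attribute is uniform and independent (illustrative only)."""
    return n_matched * values_per_attribute ** rule_length


if __name__ == "__main__":
    # Hypothetical setting: 6 decision values (p0 = 1/6 under H0),
    # a true rule that holds for 30% of matching objects, rule length 2.
    n = matched_objects_needed(p0=1 / 6, p1=0.3, alpha=0.01, power=0.9)
    print(n, table_size_needed(n, values_per_attribute=6, rule_length=2))
```

Under these assumptions the required table size grows geometrically with the rule length, because each additional condition attribute divides the number of matching objects by the number of values that attribute can take.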

Author information

Authors and Affiliations

Shimane University, 1060 Nishikawatsu-cho, Matsue city, Shimane, 690-8504, Japan
Yuichi Kato & Shoutarou Mizuno
Yamaguchi University, 2-16-1 Tokiwadai, Ube city, Yamaguchi, 755-8611, Japan
Tetsuro Saeki

Editor information

Editors and Affiliations

Saint Mary’s University, B3H 3C3, Halifax, NS, Canada
Pawan Lingras
Maria Curie-Skłodowska University, Lublin, Poland
Marcin Wolski
University of Granada, Spain
Chris Cornelis
Indian Statistical Institute, 700108, Kolkata, India
Sushmita Mitra
University of Warsaw, 02-097, Warsaw, Poland
Piotr Wasilewski

Cite this paper

Kato, Y., Saeki, T., Mizuno, S. (2013). Studies on the Necessary Data Size for Rule Induction by STRIM. In: Lingras, P., Wolski, M., Cornelis, C., Mitra, S., Wasilewski, P. (eds) Rough Sets and Knowledge Technology. RSKT 2013. Lecture Notes in Computer Science, vol 8171. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41299-8_20

DOI: https://doi.org/10.1007/978-3-642-41299-8_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41298-1
Online ISBN: 978-3-642-41299-8
eBook Packages: Computer Science, Computer Science (R0)