Data Mining: A Probabilistic Rough Set Approach

  • Ning Zhong
  • Ju-Zhen Dong
  • Setsuo Ohsuga
Part of the Studies in Fuzziness and Soft Computing book series (STUDFUZZ, volume 19)

Abstract

This paper introduces a new approach for mining if-then rules in databases with uncertainty and incompleteness. The approach combines the Generalization Distribution Table (GDT) with the rough set methodology. A GDT is a table that represents the probabilistic relationships between concepts and instances over discrete domains. By using a GDT as a hypothesis search space and combining it with the rough set methodology, noise and unseen instances can be handled, biases can be flexibly selected, background knowledge can be used to constrain rule generation, and if-then rules with strengths can be effectively acquired from large, complex databases in an incremental, bottom-up mode. In this paper, we focus on the basic concepts and an implementation of our methodology.
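
To make the idea of a GDT more concrete, the following is a minimal Python sketch of such a table over two small discrete attributes. It is an illustration under stated assumptions, not the authors' implementation: the function name `build_gdt`, the `*` wildcard notation, the example domains, and the equiprobable prior over the instances covered by each generalization are all assumptions introduced here; the abstract itself does not specify them.

```python
from itertools import product

WILDCARD = "*"  # assumed notation for "any value of this attribute"


def build_gdt(domains):
    """Return {generalization: {instance: probability}} for discrete domains.

    domains: dict mapping an attribute name to its list of values
    (a hypothetical input format chosen for this sketch).
    """
    names = list(domains)
    # Possible instances: every combination of concrete attribute values.
    instances = list(product(*(domains[n] for n in names)))
    # Candidate tuples: every combination of values with wildcards allowed.
    candidates = product(*([*domains[n], WILDCARD] for n in names))

    table = {}
    for gen in candidates:
        if WILDCARD not in gen:
            continue  # fully specified tuples are instances, not generalizations
        covered = [
            inst for inst in instances
            if all(g == WILDCARD or g == v for g, v in zip(gen, inst))
        ]
        # Assumed prior: each covered instance is equally likely.
        table[gen] = {inst: 1.0 / len(covered) for inst in covered}
    return table


# Hypothetical example: attribute a has 2 values, attribute b has 3.
domains = {"a": ["a0", "a1"], "b": ["b0", "b1", "b2"]}
gdt = build_gdt(domains)
```

In this sketch each row of the table is a generalization such as `("a0", "*")`, and each entry gives the probability of a concrete instance under that generalization. A different prior could be substituted by changing the single line that assigns the probabilities.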

Keywords

Decision Table; Decision Attribute; Concept Description; Prior Probability Distribution; Noise Rate

Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Ning Zhong
  • Ju-Zhen Dong
    • 1
  • Setsuo Ohsuga
    • 2
  1. Department of Computer Science and Systems Engineering, Faculty of Engineering, Yamaguchi University, Ube 755, Japan
  2. Department of Information and Computer Science, School of Science and Engineering, Waseda University, Tokyo 169, Japan
