Statistical Techniques for Rough Set Data Analysis

  • Günther Gediga
  • Ivo Düntsch
Part of the Studies in Fuzziness and Soft Computing book series (STUDFUZZ, volume 56)


Concept forming and classification in the absence of complete or certain information has been a major concern of artificial intelligence for some time. Traditional “hard” data analysis based on statistical models is in many cases not equipped to deal with uncertainty, relativity, or non-monotonic processes. Even the recently popular “soft” computing approach with its principal components “... fuzzy logic, neural network theory, and probabilistic reasoning” [16] uses quite hard parameters outside the observed phenomena, e.g. representation and distribution assumptions, prior probabilities, beliefs, or membership degrees, the origin of which is not always clear; one should not forget that the results of these methods are only valid up to the (stated or unstated) model assumptions. The question arises whether there is a step in the modelling process which is informative for the researcher and, at the same time, does not require additional assumptions about the data. To make this clearer, we follow [9] in assuming that a data model consists of
  1. A domain D of interest.

  2. An empirical system E, which consists of a body of data and relations among the data, and a mapping e : D → E, called operationalisation.

  3. A (structural or numerical) model M, and a mapping m : E → M, called representation.
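
The three components can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration and does not come from the chapter: the object names, the observed values, and the choice of a structural model that groups objects with identical records. The sketch only shows how the operationalisation e and the representation m compose, with M built from E alone.

```python
from collections import defaultdict

# Hypothetical domain D: four objects of interest (names are illustrative).
D = ["obj1", "obj2", "obj3", "obj4"]

# Hypothetical observations; the attribute values are made up for this sketch.
OBSERVATIONS = {
    "obj1": ("high", "yes"),
    "obj2": ("high", "yes"),
    "obj3": ("low", "no"),
    "obj4": ("low", "yes"),
}


def e(x):
    """Operationalisation e : D -> E, mapping an object to its data record."""
    return OBSERVATIONS[x]


def m(records):
    """Representation m : E -> M.

    Assumed structural model: objects with identical records are grouped
    into one class. No parameters outside the data are used.
    """
    classes = defaultdict(list)
    for name, record in records.items():
        classes[record].append(name)
    return sorted(classes.values())


E = {x: e(x) for x in D}  # empirical system: the body of data
M = m(E)                  # model derived from E only
print(M)
```

In this toy case the model depends on nothing but the recorded data, which is the kind of modelling step the paragraph above asks about.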



Keywords: Akaike Information Criterion, Approximation Quality, Decision Attribute, Rule System, Deterministic Rule




  1. Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Petrov, B. N. and Csáki, F., editors, Second International Symposium on Information Theory, pages 267–281, Budapest. Akadémiai Kiadó. Reprinted in Kotz, S. and Johnson, N. L., editors (1992), Breakthroughs in Statistics, volume I, pages 599–624, New York. Springer.
  2. Browne, C., Düntsch, I., and Gediga, G. (1998). IRIS revisited: A comparison of discriminant and enhanced rough set data analysis. In Polkowski, L. and Skowron, A., editors, Rough sets in knowledge discovery, Vol. 2, pages 345–368, Heidelberg. Physica-Verlag.
  3. Düntsch, I. and Gediga, G. (1997a). The rough set engine GROBIAN. In Sydow, A., editor, Proc. 15th IMACS World Congress, Berlin, volume 4, pages 613–618, Berlin. Wissenschaft und Technik Verlag.
  4. Düntsch, I. and Gediga, G. (1997b). Statistical evaluation of rough set dependency analysis. International Journal of Human-Computer Studies, 46:589–604.
  5. Düntsch, I. and Gediga, G. (1998). Uncertainty measures of rough set prediction. Artificial Intelligence, 106(1):77–107.
  6. Düntsch, I. and Gediga, G. (2000). Rough set data analysis. In Encyclopedia of Computer Science and Technology. Marcel Dekker. To appear (Tech. report version available at e-mail).
  7. Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Ann. Eugen., 7:179–188.
  8. Gediga, G. and Düntsch, I. (1999). Probabilistic granule analysis. Draft paper.
  9. Gigerenzer, G. (1981). Messung und Modellbildung in der Psychologie. Birkhäuser, Basel.
  10. Pawlak, Z. (1982). Rough sets. Internat. J. Comput. Inform. Sci., 11:341–356.
  11. Pawlak, Z. (1991). Rough sets: Theoretical aspects of reasoning about data, volume 9 of System Theory, Knowledge Engineering and Problem Solving. Kluwer, Dordrecht.
  12. Quinlan, R. (1996). Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research, 4:77–90.
  13. Rissanen, J. (1978). Modeling by the shortest data description. Automatica, 14:465–471.
  14. Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6:461–464.
  15. Wald, A. (1947). Sequential Analysis. Wiley, New York.
  16. Zadeh, L. A. (1994). What is BISC? e-mail:, University of California.
  17. Ziarko, W. (1993). Variable precision rough set model. Journal of Computer and System Sciences, 46.

Copyright information

© Physica-Verlag Heidelberg 2000

Authors and Affiliations

  • Günther Gediga (1)
  • Ivo Düntsch (2)
  1. FB Psychologie / Methodenlehre, Universität Osnabrück, Osnabrück, Germany
  2. School of Information and Software Engineering, University of Ulster, Newtownabbey, N. Ireland
