Data Mining: Trends in Research and Development

  • Chapter
Rough Sets and Data Mining

Abstract

Data mining is an interdisciplinary research area that draws on database systems, machine learning, intelligent information systems, statistics, and expert systems. It has evolved into an important and active area of research because of the theoretical challenges and practical applications associated with discovering (or extracting) interesting and previously unknown knowledge from very large real-world databases. Many aspects of data mining have been investigated in several related fields, but the problem is unique enough that these studies need to be extended to take into account the nature of the contents of real-world databases. In this chapter, we discuss the theory and foundational issues in data mining, describe data mining methods and algorithms, and review data mining applications. Since a major focus of this book is on rough sets and their applications to database mining, one full section is devoted to summarizing the state of rough sets as related to data mining of real-world databases. More importantly, we provide evidence showing that the theory of rough sets constitutes a sound basis for data mining applications.


Copyright information

© 1997 Kluwer Academic Publishers

About this chapter

Cite this chapter

Deogun, J.S., Raghavan, V.V., Sarkar, A., Sever, H. (1997). Data Mining: Trends in Research and Development. In: Rough Sets and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-1461-5_2

  • DOI: https://doi.org/10.1007/978-1-4613-1461-5_2

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4612-8637-0

  • Online ISBN: 978-1-4613-1461-5

  • eBook Packages: Springer Book Archive
