Data Mining: Trends in Research and Development

Deogun, Jitender S.; Raghavan, Vijay V.; Sarkar, Amartya; Sever, Hayri

doi:10.1007/978-1-4613-1461-5_2

Abstract

Data mining is an interdisciplinary research area spanning several disciplines, including database systems, machine learning, intelligent information systems, statistics, and expert systems. It has evolved into an important and active area of research because of the theoretical challenges and practical applications associated with discovering (or extracting) interesting and previously unknown knowledge from very large real-world databases. Many aspects of data mining have been investigated in several related fields, but the problem is unique enough that there is a great need to extend these studies to account for the nature of the contents of real-world databases. In this chapter, we discuss the theory and foundational issues in data mining, describe data mining methods and algorithms, and review data mining applications. Since a major focus of this book is on rough sets and their applications to database mining, one full section is devoted to summarizing the state of rough sets as related to data mining of real-world databases. More importantly, we provide evidence showing that the theory of rough sets constitutes a sound basis for data mining applications.

Author information

Authors and Affiliations

The Department of Computer Science and Engineering, University of Nebraska-Lincoln, 68588, Lincoln, NE, USA
Jitender S. Deogun
The Center for Advanced Computer Studies, University of Southwestern Louisiana, 70504, Lafayette, LA, USA
Vijay V. Raghavan & Amartya Sarkar
The Department of Computer Science, Hacettepe University, Beytepe, 06532, Ankara, Turkey
Hayri Sever

About this chapter

Cite this chapter

Deogun, J.S., Raghavan, V.V., Sarkar, A., Sever, H. (1997). Data Mining: Trends in Research and Development. In: Rough Sets and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-1461-5_2

DOI: https://doi.org/10.1007/978-1-4613-1461-5_2
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4612-8637-0
Online ISBN: 978-1-4613-1461-5
eBook Packages: Springer Book Archive