Applications of Fuzzy and Rough Set Theory in Data Mining

Part of the book series: Studies in Computational Intelligence (SCI, volume 225)

Abstract

The explosion of very large databases has created extraordinary opportunities for monitoring, analyzing, and predicting global economic, geographical, demographic, medical, political, and other processes. Statistical analysis and data mining techniques have emerged for these purposes. Data mining is the process of discovering previously unknown but potentially useful patterns, rules, or associations in huge quantities of data. It can be performed on many kinds of data repositories, such as relational databases, data warehouses, transactional databases, sequence databases, spatial databases, spatio-temporal databases, and text databases. Typically, data mining functionalities are classified into two categories: descriptive and predictive. Descriptive mining tasks characterize the general properties of the data in a database, while predictive mining tasks perform inference on the current data in order to make predictions about the future.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Li, D., Deogun, J.S. (2009). Applications of Fuzzy and Rough Set Theory in Data Mining. In: Zakrzewska, D., Menasalvas, E., Byczkowska-Lipinska, L. (eds) Methods and Supporting Technologies for Data Analysis. Studies in Computational Intelligence, vol 225. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02196-1_4

  • DOI: https://doi.org/10.1007/978-3-642-02196-1_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02195-4

  • Online ISBN: 978-3-642-02196-1

  • eBook Packages: Engineering, Engineering (R0)
