Applications of Fuzzy and Rough Set Theory in Data Mining

  • Dan Li
  • Jitender S. Deogun
Chapter
Part of the Studies in Computational Intelligence book series (SCI, volume 225)

Abstract

The explosion of very large databases has created extraordinary opportunities for monitoring, analyzing, and predicting economic, geographical, demographic, medical, political, and other processes worldwide. Statistical analysis and data mining techniques have emerged for these purposes. Data mining is the process of discovering previously unknown but potentially useful patterns, rules, or associations from huge quantities of data. It can be performed on many kinds of data repositories, such as relational databases, data warehouses, transactional databases, sequence databases, spatial databases, spatio-temporal databases, and text databases. Typically, data mining functionalities fall into two categories: descriptive and predictive. Descriptive mining tasks characterize the general properties of the data in a database, while predictive mining tasks perform inference on the current data in order to make predictions about the future.

Keywords

Root Mean Square Error, Data Mining, Association Rule, Intrusion Detection, Anomaly Detection
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Dan Li (1)
  • Jitender S. Deogun (2)
  1. Department of Computer Science, Northern Arizona University, Flagstaff
  2. Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln
