Applications of Fuzzy and Rough Set Theory in Data Mining

Li, Dan; Deogun, Jitender S.

doi:10.1007/978-3-642-02196-1_4

Chapter

Part of the book series: Studies in Computational Intelligence ((SCI,volume 225))

Abstract

The explosion of very large databases has created extraordinary opportunities for monitoring, analyzing, and predicting global economic, geographical, demographic, medical, political, and other processes in the world. Statistical analysis and data mining techniques have emerged for these purposes. Data mining is the process of discovering previously unknown but potentially useful patterns, rules, or associations from huge quantities of data. Data mining can be performed on different data repositories such as relational databases, data warehouses, transactional databases, sequence databases, spatial databases, spatio-temporal databases, and text databases. Typically, data mining functionalities can be classified into two categories: descriptive and predictive. Descriptive mining tasks aim at characterizing the general properties of the data in the databases, while predictive mining tasks perform inference on the current data in order to make predictions about the future.

Author information

Authors and Affiliations

Department of Computer Science, Northern Arizona University, Flagstaff, AZ 86011-5600
Dan Li
Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588-0115
Jitender S. Deogun

Editor information

Editors and Affiliations

Institute of Computer Science, Technical University of Lodz, Wolczanska 215, 90-924, Lodz, Poland
Danuta Zakrzewska & Liliana Byczkowska-Lipinska
Facultad de Informatica, Universidad Politecnica de Madrid, Campus de Montegancedo s/n, 28660, Boadilla del Monte, Madrid, Spain
Ernestina Menasalvas

About this chapter

Cite this chapter

Li, D., Deogun, J.S. (2009). Applications of Fuzzy and Rough Set Theory in Data Mining. In: Zakrzewska, D., Menasalvas, E., Byczkowska-Lipinska, L. (eds) Methods and Supporting Technologies for Data Analysis. Studies in Computational Intelligence, vol 225. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02196-1_4

DOI: https://doi.org/10.1007/978-3-642-02196-1_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02195-4
Online ISBN: 978-3-642-02196-1
eBook Packages: Engineering, Engineering (R0)
