Irrelevant Feature and Rule Removal for Structural Associative Classification Using Structure-Preserving Flat Representation

Shaharanee, Izwan Nizal Mohd; Hadzic, Fedja

doi:10.1007/978-3-662-45620-0_10

Part of the book series: Studies in Computational Intelligence (SCI, volume 584)

Abstract

Practical applications of association rule mining often suffer from an overwhelming number of generated rules, many of which are not interesting or useful for the application in question. Removing irrelevant features, and rules composed of irrelevant features, can significantly improve overall performance. Many statistical and constraint-based measures are used to discard unnecessary and irrelevant features and rules when the data is vectorial or tabular. In contrast, the use of such measures is limited in the tree-structured data domain, because the structural aspects are not easily incorporated. In this chapter, we explore the use of a feature subset selection measure, as well as a number of common statistical interestingness measures, via a recently proposed structure-preserving flat representation for tree-structured data such as XML. Feature subset selection is applied prior to association rule generation. Once the initial set of rules is obtained, irrelevant rules are identified as those containing attributes that were not found to be statistically significant for the classification task. The experiments are performed on real-world web access trees and a property management dataset. The results indicate that where the dataset has a more standard structure, a large number of insignificant rules are discarded and accuracy increases. However, where the tree instances vary greatly in structure and in label distribution among nodes, many rules are removed and accuracy increases, but the coverage rate of the rule set drops significantly.
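The two-stage idea in the abstract (select features first, then discard rules that use unselected attributes) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not name the selection measure, so the symmetrical tau criterion is used as one possible choice; the attribute names `X0`..`X2`, the toy rows, the threshold of 0.1, and the helper names `select_features` and `filter_rules` are all hypothetical. Each row stands in for one tree instance after it has been mapped to a flat record.

```python
from collections import Counter


def symmetrical_tau(xs, ys):
    """Symmetrical tau association between two categorical sequences.

    Returns a value in [0, 1]; 0 means no predictive association,
    1 means each variable perfectly predicts the other.
    """
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    sum_px2 = sum((c / n) ** 2 for c in px.values())
    sum_py2 = sum((c / n) ** 2 for c in py.values())
    # sum over cells of P(x, y)^2 / P(x), and of P(x, y)^2 / P(y)
    y_given_x = sum((c / n) ** 2 / (px[x] / n) for (x, _), c in joint.items())
    x_given_y = sum((c / n) ** 2 / (py[y] / n) for (_, y), c in joint.items())
    denom = 2 - sum_px2 - sum_py2
    if denom == 0:  # both variables are constant
        return 0.0
    return (y_given_x + x_given_y - sum_px2 - sum_py2) / denom


def select_features(rows, class_key, threshold):
    """Keep attributes whose association with the class exceeds the threshold."""
    labels = [row[class_key] for row in rows]
    attributes = [k for k in rows[0] if k != class_key]
    return {
        a for a in attributes
        if symmetrical_tau([row[a] for row in rows], labels) > threshold
    }


def filter_rules(rules, relevant):
    """Drop rules whose antecedent mentions any attribute outside `relevant`."""
    return [(ante, cls) for ante, cls in rules if set(ante) <= relevant]


# Hypothetical flat records: one row per tree, one column per node position.
rows = [
    {"X0": "home", "X1": "products", "X2": "a", "class": "buy"},
    {"X0": "home", "X1": "products", "X2": "b", "class": "buy"},
    {"X0": "home", "X1": "about", "X2": "a", "class": "leave"},
    {"X0": "home", "X1": "about", "X2": "b", "class": "leave"},
]
# Hypothetical rules of the form (antecedent, predicted class).
rules = [
    ({"X1": "products"}, "buy"),
    ({"X1": "products", "X2": "a"}, "buy"),
    ({"X0": "home"}, "buy"),
]

relevant = select_features(rows, "class", threshold=0.1)
kept = filter_rules(rules, relevant)
print(sorted(relevant), kept)
```

In this toy data only `X1` is associated with the class, so the two rules that mention `X0` or `X2` are dropped. A stricter or looser threshold trades rule-set size against coverage, which is the tension the abstract reports for structurally varied trees.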

Author information

Authors and Affiliations

School of Quantitative Sciences, Universiti Utara Malaysia, Sintok, Malaysia
Izwan Nizal Mohd Shaharanee
Department of Computing, Curtin University, Perth, Australia
Fedja Hadzic

Corresponding author

Correspondence to Izwan Nizal Mohd Shaharanee.

Editor information

Editors and Affiliations

Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Urszula Stańczyk
Mawson Lakes Campus, Faculty of Education, Science, Technology and Mathematics, University of Canberra, Canberra, Australia, and University of South Australia, Adelaide, Australia
Lakhmi C. Jain

About this chapter

Cite this chapter

Shaharanee, I.N.M., Hadzic, F. (2015). Irrelevant Feature and Rule Removal for Structural Associative Classification Using Structure-Preserving Flat Representation. In: Stańczyk, U., Jain, L. (eds) Feature Selection for Data and Pattern Recognition. Studies in Computational Intelligence, vol 584. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45620-0_10

DOI: https://doi.org/10.1007/978-3-662-45620-0_10
Published: 31 December 2014
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-45619-4
Online ISBN: 978-3-662-45620-0
eBook Packages: Engineering, Engineering (R0)
