Selecting Relevant Association Rules From Imperfect Data

L’Héritier, Cécile; Harispe, Sébastien; Imoussaten, Abdelhak; Dusserre, Gilles; Roig, Benoît

doi:10.1007/978-3-030-35514-2_9

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11940)

Included in the following conference series:

International Conference on Scalable Uncertainty Management

Abstract

Association Rule Mining (ARM) in the context of imperfect data (e.g. imprecise data) has received little attention so far, despite the prevalence of such data in a wide range of real-world applications. In this work, we present an ARM approach that can handle imprecise data and derive imprecise rules. Based on evidence theory and Multiple Criteria Decision Analysis, the proposed approach relies on a selection procedure that identifies the most relevant rules while taking into account information characterizing their interestingness. We present the measures of interestingness defined for comparing rules, as well as the selection procedure. We also show how a priori knowledge about attribute values, organized in domain taxonomies, can be used (i) to ease the mining process and (ii) to help identify relevant rules for a domain of interest. Our approach is illustrated with a simplified but concrete case study related to the analysis of humanitarian projects.
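The abstract describes mining rules from imprecise data with evidence theory, and note 3 refers to quantities such as \(Bel(A \times B)\) and \(Bel(B \mid A)\). The minimal sketch below shows one common way to compute such quantities. It rests on three assumptions that may not match the paper's definitions:

- each record's imprecise values form a single focal set \(X \times Y\) with mass \(1/n\);
- the support of \(A \rightarrow B\) is \(Bel(A \times B)\);
- the confidence uses Fagin–Halpern conditioning, \(Bel(B \mid A) = Bel(A \cap B) / (Bel(A \cap B) + Pl(A \cap \overline{B}))\).

The record layout, the attribute names and the data are hypothetical.

```python
# Hypothetical imprecise records: each attribute holds the *set* of values still
# possible for that record (a singleton means the value is known precisely).
# Assumption: each record is a categorical mass function with one focal set X x Y,
# and the database-level mass gives each record weight 1/n.

def bel_product(records, attr_a, a, attr_b, b):
    """Bel(A x B): share of records whose focal set X x Y lies inside A x B."""
    hits = sum(1 for r in records if r[attr_a] <= a and r[attr_b] <= b)
    return hits / len(records)

def pl_product(records, attr_a, a, attr_b, b):
    """Pl(A x B): share of records whose focal set X x Y intersects A x B."""
    hits = sum(1 for r in records if r[attr_a] & a and r[attr_b] & b)
    return hits / len(records)

def rule_measures(records, attr_a, a, attr_b, b, domain_b):
    """Support Bel(A x B) and Fagin-Halpern confidence Bel(B | A) of rule A -> B."""
    support = bel_product(records, attr_a, a, attr_b, b)
    pl_counter = pl_product(records, attr_a, a, attr_b, domain_b - b)  # Pl(A x not-B)
    denom = support + pl_counter
    confidence = support / denom if denom > 0 else None  # undefined when denom is 0
    return support, confidence

records = [
    {"sector": frozenset({"water"}), "outcome": frozenset({"success"})},
    {"sector": frozenset({"water", "health"}), "outcome": frozenset({"success"})},
    {"sector": frozenset({"water"}), "outcome": frozenset({"success", "failure"})},
    {"sector": frozenset({"health"}), "outcome": frozenset({"failure"})},
]
support, confidence = rule_measures(
    records, "sector", frozenset({"water"}),
    "outcome", frozenset({"success"}), frozenset({"success", "failure"}))
print(support, confidence)  # 0.25 0.5
```

Only the first record certainly supports the rule. The third record is counted against the confidence through the plausibility term, because its outcome may be a failure.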

Notes

1.
Note that the simplification of the mining process here refers to a reduction of complexity in terms of the number of rules analysed, i.e. the size of the search space. Algorithmic contributions, and therefore complexity analyses of efficient implementations of the proposed approach, are left for future work.
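To illustrate how a taxonomy shrinks the number of rules analysed, the minimal sketch below assumes that each side of a rule may only use value sets that correspond to taxonomy concepts, i.e. the set of leaves a concept subsumes. Without a taxonomy, any non-empty subset of leaf values would be allowed. The paper's exact restriction may differ, and the taxonomy and its labels are hypothetical.

```python
# Hypothetical taxonomy of project sectors: concept -> direct sub-concepts.
SECTOR_TAXONOMY = {
    "any_sector": ["wash", "health"],
    "wash": ["water", "sanitation", "hygiene"],
    "health": ["primary_care", "nutrition"],
    "water": [], "sanitation": [], "hygiene": [],
    "primary_care": [], "nutrition": [],
}

def leaves_under(taxonomy, concept):
    """Leaf values subsumed by `concept` (a leaf subsumes only itself)."""
    children = taxonomy[concept]
    if not children:
        return frozenset({concept})
    return frozenset().union(*(leaves_under(taxonomy, c) for c in children))

def concept_value_sets(taxonomy):
    """Distinct value sets available when imprecision follows the taxonomy."""
    return {leaves_under(taxonomy, c) for c in taxonomy}

n_leaves = sum(1 for kids in SECTOR_TAXONOMY.values() if not kids)
unrestricted = 2 ** n_leaves - 1  # every non-empty subset of leaf values
restricted = len(concept_value_sets(SECTOR_TAXONOMY))
print(unrestricted, restricted)  # 31 8
```

For rules \(A \rightarrow B\) over two such attributes the gap compounds: 31 × 31 = 961 candidate pairs against 8 × 8 = 64.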
2.
Indeed, all the measures used in our approach take values in the interval [0, 1]; a measure k to be minimized can therefore be turned into a measure to be maximized by considering \(1-g_k(r)\) instead of \(g_k(r)\).
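The conversion above makes every criterion one to maximize, which is what outranking methods such as Electre I expect. The minimal sketch below does two things:

- it flips the minimized measures using \(1-g_k(r)\);
- it applies the textbook Electre I test, where rule a outranks rule b when the concordance is at least \(\hat{c}\) and the discordance is at most \(\hat{d}\).

It then keeps the rules that no other rule outranks. This is a simplification of Electre I kernel selection, not the paper's procedure. The rule scores, the equal weights and the thresholds are hypothetical. Discordance is left unnormalized on the assumption that all measures lie in [0, 1].

```python
def to_maximize(scores, minimize):
    """Replace g_k(r) by 1 - g_k(r) for every criterion index k in `minimize`."""
    return {r: [1.0 - v if k in minimize else v for k, v in enumerate(vals)]
            for r, vals in scores.items()}

def outranks(a, b, weights, c_min, d_max):
    """Electre I test 'a outranks b' on [0, 1]-valued criteria to maximize."""
    concordance = sum(w for w, ga, gb in zip(weights, a, b) if ga >= gb) / sum(weights)
    discordance = max(max(0.0, gb - ga) for ga, gb in zip(a, b))  # scale of 1 assumed
    return concordance >= c_min and discordance <= d_max

def select(scores, weights, c_min=0.6, d_max=0.25):
    """Keep rules outranked by no other rule (simplified Electre I kernel)."""
    return {r for r in scores
            if not any(outranks(scores[o], scores[r], weights, c_min, d_max)
                       for o in scores if o != r)}

# Hypothetical scores: criteria 0 and 1 are maximized, criterion 2 is minimized.
raw = {"r1": [0.6, 0.8, 0.2], "r2": [0.4, 0.7, 0.5], "r3": [0.7, 0.5, 0.1]}
scores = to_maximize(raw, minimize={2})
print(sorted(select(scores, weights=[1.0, 1.0, 1.0])))  # ['r1', 'r3']
```

Here r3 has enough concordance to outrank r1. Its discordance on the second criterion (0.3) exceeds \(\hat{d}\), however, so both rules are kept as incomparable. r2 is outranked by r1 and removed.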
3.
Evaluating the support and confidence of \(\overline{A} \rightarrow B\) and \(\overline{A} \rightarrow \overline{B}\) can lead to undefined values. For example, when evaluating \(\overline{A} \rightarrow B\), we have \(Bel(\overline{A} \times B) = 0\) when \(\overline{A}\) has never been observed, which leaves \(Bel(B | \overline{A})\) undefined. However, pruning using dominance and Electre I requires the same measures to be defined for all rules. Undefined values are thus substituted by an arbitrary value that neither favors nor penalizes the evaluation of the rule \(A \rightarrow B\): the median of \(Bel(\overline{A} \times B)\) (resp. \(Bel(\overline{A} \times \overline{B})\)) has been chosen. Note that \(A \rightarrow \overline{B}\) is not concerned, since evaluating \(A \rightarrow B\) implies that there is evidence on A.
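This sketch assumes one reading of the substitution above: an undefined measure is replaced by the median of that measure over the rules for which it is defined. With every rule fully scored, it can be pruned by dominance. The sketch imputes with `statistics.median` and then keeps the Pareto non-dominated rules. The scores are hypothetical, and both criteria are assumed to be already oriented for maximization.

```python
from statistics import median

def impute_undefined(scores):
    """Replace None by the median of that criterion over rules where it is defined.

    Raises statistics.StatisticsError if a criterion is undefined for every rule.
    """
    n_criteria = len(next(iter(scores.values())))
    medians = [median(v[k] for v in scores.values() if v[k] is not None)
               for k in range(n_criteria)]
    return {r: [medians[k] if x is None else x for k, x in enumerate(vals)]
            for r, vals in scores.items()}

def dominates(a, b):
    """Pareto dominance: a is at least as good everywhere and strictly better once."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def non_dominated(scores):
    """Rules that no other rule dominates."""
    return {r for r in scores
            if not any(dominates(scores[o], scores[r]) for o in scores if o != r)}

# Hypothetical scores; r2's second measure is undefined (None).
scores = {"r1": [0.8, 0.25], "r2": [0.6, None], "r3": [0.5, 0.75], "r4": [0.4, 0.2]}
imputed = impute_undefined(scores)
print(imputed["r2"])                   # [0.6, 0.25]
print(sorted(non_dominated(imputed)))  # ['r1', 'r3']
```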

Author information

Authors and Affiliations

LGI2P, IMT Mines Ales, Univ Montpellier, Alès, France
Cécile L’Héritier, Sébastien Harispe, Abdelhak Imoussaten & Gilles Dusserre
EA7352 CHROME, Université de Nîmes, Nîmes, France
Cécile L’Héritier & Benoît Roig

Corresponding author

Correspondence to Cécile L’Héritier.

Editor information

Editors and Affiliations

Institut Supérieur de Gestion de Tunis, Bouchoucha, Tunisia
Nahla Ben Amor
University of Technology of Compiègne, Compiègne, France
Benjamin Quost
University of Luxembourg, Esch-Sur-Alzette, Luxembourg
Martin Theobald

About this paper

Cite this paper

L’Héritier, C., Harispe, S., Imoussaten, A., Dusserre, G., Roig, B. (2019). Selecting Relevant Association Rules From Imperfect Data. In: Ben Amor, N., Quost, B., Theobald, M. (eds) Scalable Uncertainty Management. SUM 2019. Lecture Notes in Computer Science, vol 11940. Springer, Cham. https://doi.org/10.1007/978-3-030-35514-2_9

DOI: https://doi.org/10.1007/978-3-030-35514-2_9
Published: 02 December 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-35513-5
Online ISBN: 978-3-030-35514-2
eBook Packages: Computer Science, Computer Science (R0)
