Frequent Pattern Outlier Detection Without Exhaustive Mining

Giacometti, Arnaud; Soulet, Arnaud

doi:10.1007/978-3-319-31750-2_16

Arnaud Giacometti¹⁹ &
Arnaud Soulet¹⁹

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9652))

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

3174 Accesses
8 Citations

Abstract

Outlier detection consists in detecting anomalous observations from data. During the past decade, pattern-based outlier detection methods have proposed to mine all frequent patterns in order to compute the outlier factor of each transaction. This approach remains too expensive despite recent progress in pattern mining field. In this paper, we provide exact and approximate methods for calculating the frequent pattern outlier factor (FPOF) without extracting any pattern or by extracting a small sample. We propose an algorithm that returns the exact FPOF without mining any pattern. Surprisingly, it works in polynomial time on the size of the dataset. We also present an approximate method where the end-user controls the maximum error on the estimated FPOF. Experiments show the interest of both methods for very large datasets where exhaustive mining fails to provide the exact solution. The accuracy of our approximate method outperforms the baseline approach for a same budget in time or number of patterns.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
It is the proportion of pairs of transactions which would be ranked similarly with the approximate FPOF and with the true FPOF (see Sect. 6 for a formal definition).

References

Hawkins, D.M.: Identification of Outliers, vol. 11. Springer, The Netherlands (1980)
Book MATH Google Scholar
He, Z., Xu, X., Huang, Z.J., Deng, S.: FP-outlier: frequent pattern based outlier detection. Comput. Sci. Inf. Syst. 2(1), 103–118 (2005)
Article Google Scholar
Otey, M.E., Ghoting, A., Parthasarathy, S.: Fast distributed outlier detection in mixed-attribute data sets. Data Min. Knowl. Discovery 12(2–3), 203–228 (2006)
Article MathSciNet Google Scholar
Koufakou, A., Secretan, J., Georgiopoulos, M.: Non-derivable itemsets for fast outlier detection in large high-dimensional categorical data. Knowl. Inf. Syst. 29(3), 697–725 (2011)
Article Google Scholar
Knobbe, A., Crémilleux, B., Fürnkranz, J., Scholz, M.: From local patterns to global models: the lego approach to data mining. In: From Local Patterns to Global Models: Proceedings of the ECML PKDD 2008 Workshop, pp. 1–16 (2008)
Google Scholar
Liu, B., Hsu, W., Ma, Y.: Integrating classification and association rule mining. In: International Conference on Knowledge Discovery and Data mining (1998)
Google Scholar
Liu, Q., Dong, G.: CPCQ: contrast pattern based clustering quality index for categorical data. Pattern Recogn. 45(4), 1739–1748 (2012)
Article Google Scholar
Chaoji, V., Hasan, M.A., Salem, S., Besson, J., Zaki, M.J.: ORIGAMI: a novel and effective approach for mining representative orthogonal graph patterns. Stat. Anal. Data Min. 1(2), 67–84 (2008)
Article MathSciNet Google Scholar
Boley, M., Lucchese, C., Paurat, D., Gärtner, T.: Direct local pattern sampling by efficient two-step random procedures. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 582–590 (2011)
Google Scholar
van Leeuwen, M.: Interactive data exploration using pattern mining. In: Jurisica, I., Holzinger, A. (eds.) Knowledge Discovery and Data Mining. LNCS, vol. 8401, pp. 169–182. Springer, Heidelberg (2014)
Google Scholar
Agrawal, R., Srikant, R., et al.: Fast algorithms for mining association rules. In: International conference on Very Large Data Bases, vol. 1215, pp. 487–499 (1994)
Google Scholar
Giacometti, A., Li, D.H., Soulet, A.: Balancing the analysis of frequent patterns. In: Tseng, V.S., Ho, T.B., Zhou, Z.-H., Chen, A.L.P., Kao, H.-Y. (eds.) PAKDD 2014, Part I. LNCS, vol. 8443, pp. 53–64. Springer, Heidelberg (2014)
Chapter Google Scholar

Download references

Acknowledgements

This work has been partially supported by the Prefute project, PEPS 2015, CNRS.

Author information

Authors and Affiliations

Université François Rabelais Tours, LI EA 6300, 3 Place Jean Jaurès, 41029, Blois, France
Arnaud Giacometti & Arnaud Soulet

Authors

Arnaud Giacometti
View author publications
You can also search for this author in PubMed Google Scholar
Arnaud Soulet
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Arnaud Giacometti .

Editor information

Editors and Affiliations

The University of Melbourne, Melbourne, Victoria, Australia
James Bailey
The University of Texas at Dallas, Richardson, Texas, USA
Latifur Khan
Osaka University, Osaka, Japan
Takashi Washio
University of Auckland, Auckland, New Zealand
Gill Dobbie
Shenzhen University, Shenzhen, China
Joshua Zhexue Huang
Massey University, Auckland, New Zealand
Ruili Wang

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Giacometti, A., Soulet, A. (2016). Frequent Pattern Outlier Detection Without Exhaustive Mining. In: Bailey, J., Khan, L., Washio, T., Dobbie, G., Huang, J., Wang, R. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2016. Lecture Notes in Computer Science(), vol 9652. Springer, Cham. https://doi.org/10.1007/978-3-319-31750-2_16

Download citation

DOI: https://doi.org/10.1007/978-3-319-31750-2_16
Published: 12 April 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31749-6
Online ISBN: 978-3-319-31750-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics