Significant Frequent Item Sets Via Pattern Spectrum Filtering

Borgelt, Christian; Picado-Muiño, David

doi:10.1007/978-3-319-26986-3_4


Part of the book series: Studies in Fuzziness and Soft Computing ((STUDFUZZ,volume 335))

Abstract

Frequent item set mining often suffers from the problem that the number of frequent item sets can be huge, even if the output is restricted to closed or maximal item sets: in some cases the output can even exceed the size of the transaction database being analyzed. To overcome this problem, several approaches have been suggested that reduce the output by statistical assessment, so that only significant frequent item sets (or association rules derived from them) are reported. In this paper we propose a new method along these lines, which combines data randomization with so-called pattern spectrum filtering, a technique developed for neural spike train analysis. The former serves to implicitly represent the null hypothesis of independent items, while the latter helps to cope with the multiple testing problem that results from statistically evaluating the found patterns.
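
The combination described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation, and every detail below is an assumption made for illustration: the brute-force miner `mine`, the surrogate generator `surrogate` (which places each item in randomly chosen transactions while keeping its support, as one simple way to represent independent items), the use of (size, support) pairs as pattern signatures, and the rule of discarding every pattern whose signature also shows up in at least one surrogate data set.

```python
import random
from collections import Counter
from itertools import combinations


def mine(transactions, min_support, min_size=2):
    """Brute-force miner (toy data only): map each item set to its support."""
    counts = Counter()
    for t in transactions:
        items = sorted(t)
        for k in range(min_size, len(items) + 1):
            counts.update(frozenset(c) for c in combinations(items, k))
    return {s: n for s, n in counts.items() if n >= min_support}


def surrogate(transactions, rng):
    """Assumed null model: each item keeps its support but is placed in
    randomly chosen transactions, independently of all other items."""
    n = len(transactions)
    item_support = Counter(i for t in transactions for i in t)
    result = [set() for _ in range(n)]
    for item, supp in sorted(item_support.items()):
        for idx in rng.sample(range(n), supp):
            result[idx].add(item)
    return result


def spectrum(patterns):
    """Pattern spectrum, here taken to be the set of (size, support) pairs."""
    return {(len(s), n) for s, n in patterns.items()}


def significant_item_sets(transactions, min_support, n_surrogates=100, seed=0):
    """Keep patterns whose signature never occurs in any surrogate data set."""
    rng = random.Random(seed)
    chance = set()
    for _ in range(n_surrogates):
        chance |= spectrum(mine(surrogate(transactions, rng), min_support))
    found = mine(transactions, min_support)
    return {s: n for s, n in found.items() if (len(s), n) not in chance}


# Toy database: items a, b, c are planted together; d, e, f act as noise.
data = [{"a", "b", "c"}] * 10 + [{"d"}, {"e"}, {"d", "e"}, {"f"}] * 3
significant = significant_item_sets(data, min_support=2)
for item_set, support in sorted(significant.items(), key=lambda p: -p[1]):
    print(sorted(item_set), support)
```

Because the filter works on signatures rather than on individual item sets, the number of decisions grows with the number of distinct (size, support) pairs and not with the number of patterns found, which is how this kind of filtering eases the multiple testing problem.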

Author information

Authors and Affiliations

European Centre for Soft Computing, Gonzalo Gutiérrez Quirós s/n, 33600, Mieres, Spain
Christian Borgelt & David Picado-Muiño

Corresponding author

Correspondence to Christian Borgelt.

Editor information

Editors and Affiliations

School of Business and Management, Lappeenranta University of Technology, Lappeenranta, Finland
Mikael Collan
Università di Trento, Trento, Italy
Mario Fedrizzi
Polish Academy of Sciences, Systems Research Institute, Warsaw, Poland
Janusz Kacprzyk

About this chapter

Cite this chapter

Borgelt, C., Picado-Muiño, D. (2016). Significant Frequent Item Sets Via Pattern Spectrum Filtering. In: Collan, M., Fedrizzi, M., Kacprzyk, J. (eds) Fuzzy Technology. Studies in Fuzziness and Soft Computing, vol 335. Springer, Cham. https://doi.org/10.1007/978-3-319-26986-3_4

DOI: https://doi.org/10.1007/978-3-319-26986-3_4
Published: 07 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26984-9
Online ISBN: 978-3-319-26986-3
eBook Packages: Engineering, Engineering (R0)
