Abstract
Frequent item set mining often suffers from the problem that the number of frequent item sets can be huge, even when they are restricted to closed or maximal item sets: in some cases the output can even exceed the size of the transaction database being analyzed. To overcome this problem, several approaches have been suggested that reduce the output by statistical assessment, so that only significant frequent item sets (or association rules derived from them) are reported. In this paper we propose a new method along these lines, which combines data randomization with so-called pattern spectrum filtering, as developed for neural spike train analysis. The former serves to implicitly represent the null hypothesis of independent items, while the latter helps to cope with the multiple testing problem that results from statistically evaluating the found patterns.
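The abstract does not spell out the procedure, so the following Python sketch is only one plausible reading of how data randomization and pattern spectrum filtering can be combined, not the authors' implementation. It rests on three assumptions: surrogate data sets are generated by shuffling each item's occurrences independently over the transactions (which preserves item frequencies and mimics independent items), the pattern spectrum is the set of signatures (size, support) of item sets found in any surrogate, and an item set from the original data is kept only if its signature never occurs in that spectrum. All function names are hypothetical, and a brute-force miner over all item sets stands in for a real closed item set miner.

```python
import random
from itertools import combinations


def frequent_item_sets(transactions, min_support, max_size=4):
    """Brute-force miner (illustration only): {item set: support}, sizes 2..max_size."""
    items = sorted({i for t in transactions for i in t})
    found = {}
    for size in range(2, max_size + 1):
        for cand in combinations(items, size):
            cand = frozenset(cand)
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                found[cand] = support
    return found


def randomize(transactions, rng):
    """Assumed surrogate scheme: place each item into as many randomly chosen
    transactions as it occurred in originally, independently of other items."""
    n = len(transactions)
    surrogate = [set() for _ in range(n)]
    for item in sorted({i for t in transactions for i in t}):
        count = sum(1 for t in transactions if item in t)
        for idx in rng.sample(range(n), count):
            surrogate[idx].add(item)
    return surrogate


def pattern_spectrum(transactions, min_support, n_surrogates, rng, max_size=4):
    """Collect all signatures (size, support) seen in any surrogate data set."""
    spectrum = set()
    for _ in range(n_surrogates):
        surrogate = randomize(transactions, rng)
        mined = frequent_item_sets(surrogate, min_support, max_size)
        spectrum.update((len(s), supp) for s, supp in mined.items())
    return spectrum


def significant_item_sets(transactions, min_support,
                          n_surrogates=100, seed=0, max_size=4):
    """Keep only item sets whose signature never occurred under randomization."""
    rng = random.Random(seed)
    spectrum = pattern_spectrum(transactions, min_support,
                                n_surrogates, rng, max_size)
    found = frequent_item_sets(transactions, min_support, max_size)
    return {s: supp for s, supp in found.items()
            if (len(s), supp) not in spectrum}


if __name__ == "__main__":
    # Toy data: items a, b, c always occur together; d occurs elsewhere.
    data = [{"a", "b", "c"}] * 20 + [{"d"}] * 20
    for item_set, supp in significant_item_sets(data, min_support=5).items():
        print(sorted(item_set), supp)
```

Because each signature is tested once rather than each item set, the number of comparisons stays small, which is how this kind of filtering sidesteps a per-pattern multiple testing correction. The number of surrogates and the minimum support are free parameters in this sketch.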
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Borgelt, C., Picado-Muiño, D. (2016). Significant Frequent Item Sets Via Pattern Spectrum Filtering. In: Collan, M., Fedrizzi, M., Kacprzyk, J. (eds) Fuzzy Technology. Studies in Fuzziness and Soft Computing, vol 335. Springer, Cham. https://doi.org/10.1007/978-3-319-26986-3_4
DOI: https://doi.org/10.1007/978-3-319-26986-3_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26984-9
Online ISBN: 978-3-319-26986-3
eBook Packages: Engineering, Engineering (R0)