Abstract
Frequent pattern mining has broad applications, including clustering, classification, software bug detection, recommendations, and a wide variety of other problems. In fact, unlike other major data mining problems such as outlier analysis and classification, the greatest utility of frequent pattern mining is as an intermediate tool that provides pattern-centered insights for a variety of problems. In this chapter, we study a wide variety of applications of frequent pattern mining. The purpose of this chapter is not to describe every possible application in detail, but to give the reader an overview of what is possible with methods such as frequent pattern mining.
Notes
- 1. Many other kinds of methods, such as Markov Models [55], are used to solve this problem.
References
C. C. Aggarwal. On Effective Classification of Strings with Wavelets, ACM KDD Conference, 2002.
C. C. Aggarwal. Managing and Mining Sensor Data, Springer, 2013.
C. C. Aggarwal, T. Abdelzaher. Social Sensing, Managing and Mining Sensor Data, Springer, 2013.
C. C. Aggarwal, C. K. Reddy. Data Clustering: Algorithms and Applications, CRC Press, 2013.
C. C. Aggarwal, and H. Wang. Managing and Mining Graph Data, Springer, 2010.
C. Aggarwal and P. Yu. On Effective Conceptual Indexing and Similarity Search in Text Data, ICDM Conference, 2001.
C. Aggarwal and C. Zhai. Mining Text Data, Springer, 2012.
C. C. Aggarwal, J. Wolf, P. Yu. A New Method for Similarity Indexing of Market Basket Data, ACM SIGMOD Conference, 1999.
C. Aggarwal, C. Procopiuc, and P. Yu. Finding Localized Associations in Market Basket Data, IEEE Transactions on Knowledge and Data Engineering, 14(1), pp. 51–62, 2002.
C. C. Aggarwal, N. Ta, J. Wang, J. Feng, M. Zaki. Xproj: A framework for projected structural clustering of XML documents, ACM KDD Conference, 2007.
R. Agrawal, and R. Srikant. Fast Algorithms for Mining Association Rules in Large Databases, VLDB Conference, pp. 487–499, 1994.
R. Agrawal, and R. Srikant. Mining Sequential Patterns, ICDE Conference, 1995.
R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. ACM SIGMOD Conference, 1993.
R. Agrawal, J. Gehrke, D. Gunopulos, P. Raghavan. Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications, ACM SIGMOD Conference, 1998.
R. Agrawal, D. Gunopulos, and F. Leymann. Mining Process Models from Workflow Logs, Springer, 1998.
R. Agarwal, C. C. Aggarwal, and V. V. V. Prasad. A Tree Projection Algorithm for Generation of Frequent Itemsets, JPDC Journal, 2001.
L. Akoglu, H. Tong, J. Vreeken, and C. Faloutsos. Fast and Reliable Anomaly Detection in Categorical Data, CIKM Conference, 2012.
K. Ali, K. Manganaris, R. Srikant. Partial Classification using Association Rules, KDD Conference, 1997.
P. Anick, and S. Tipirneni. The Paraphrase Search Assistant: Terminological Feedback for Iterative Information Seeking, ACM SIGIR, 1999.
M.-L. Antonie, O. Zaiane, and A. Coman. Application of Data Mining Techniques for Medical Image Classification, Second International Workshop on Multimedia Data Mining at KDD, 2001.
F. Beil, M. Ester, and X. Xu. Frequent Term-based Text Clustering, ACM KDD Conference, 2002.
M. Benkert, J. Gudmundsson, F. Hubner, and T. Wolle. Reporting flock patterns, COMGEO, 2008.
C. Bettini, X. S. Wang, S. Jajodia, and J. L. Lin. Discovering Frequent Event Patterns with Multiple Granularities in Time Sequences, IEEE Transactions on Knowledge and Data Engineering, 10(2), pp. 222–237, 1998.
H. Bohm and G. Schneider. Virtual Screening for Bioactive Molecules. Wiley-VCH, 2000.
C. Borgelt, M. Berthold. Mining molecular fragments: finding relevant substructures of molecules. ICDM Conference, 2002.
G. Buehrer, and K. Chellapilla. A Scalable Pattern Mining Approach to Web Graph Compression with Communities. WSDM Conference, 2009.
T. Calders, and B. Goethals. Mining all non-derivable frequent itemsets, Principles of Data Mining and Knowledge Discovery, pp. 1–42, 2002.
H. Cao, N. Mamoulis, D. W. Cheung. Mining Frequent Spatiotemporal Sequential Patterns, ICDM Conference, 2005.
W. Cavnar, and J. Trenkle. N-Gram based Text Categorization, Proceedings of SDAIR, pp. 161–174, 1994.
M. S. Chen, J. S. Park, and P. S. Yu. Efficient data mining for path traversal patterns, IEEE Transactions on Knowledge and Data Engineering, 10(2), pp. 209–221, 1998.
C. Chen, X. Yan, F. Zhu, and J. Han. gapprox: Mining Frequent Approximate Patterns from a Massive Network, ICDM Conference, 2007.
C. Cheng, A. Fu, Y. Zhang. Entropy-based Subspace Clustering for Mining Numerical Data, ACM KDD Conference, 1999.
H. Cheng, X. Yan, J. Han, and C.-W. Hsu. Discriminative Frequent Pattern Analysis for Effective Classification, ICDE Conference, 2007.
G. Cong, A. Tung, X. Xu, F. Pan, and J. Yang. FARMER: Finding Interesting Rule Groups in Microarray Data Sets, ACM SIGMOD Conference, 2004.
G. Cong, K.-L. Tan, A. K. H. Tung, X. Xu. Mining Top-k covering Rule Groups for Gene Expression Data. ACM SIGMOD Conference, 2005.
R. Cooley, B. Mobasher, and J. Srivastava. Web mining: Information and pattern discovery on the world wide web. Ninth International Conference on Tools with Artificial Intelligence, 1997.
R. Cooley, B. Mobasher, and J. Srivastava. Data preparation for mining world wide web browsing patterns. Knowledge and Information Systems, 1(1), pp. 5–32, 1999.
B. Cule, N. Tatti, and B. Goethals. MARBLES: Mining Association Rules Buried in Long Event Sequences. SDM Conference, 2002.
B. Cule, B. Goethals, S. Tassenoy and S. Verboven. Mining Train Delays. Proc. 10th International Symposium on Intelligent Data Analysis (IDA 2011), 2011.
L. Dehaspe, H. Toivonen, and R. King. Finding Frequent Substructures in Chemical Compounds. ACM KDD Conference, 1998.
M. Deshpande, M. Kuramochi, N. Wale, and G. Karypis. Frequent substructure-based approaches for classifying chemical compounds. IEEE TKDE., 17(8), pp. 1036–1050, 2005.
A. Don, E. Zheleva, M. Gregory, S. Tarkan, L. Auvil, T. Clement, B. Shneiderman, C. Plaisant. Discovering Interesting Usage Patterns in Text Collections: Integrating Text Mining with Visualization, CIKM Conference, 2007.
G. Dong, and J. Li. Efficient Mining of Emerging Patterns: Discovering Trends and Differences, ACM KDD Conference, 1999.
F. Eichinger, D. Nauck, and F. Klawonn. Sequence Mining for Customer Behaviour Predictions in Telecommunications, Workshop on Practical Data Mining: Applications, Experiences and Challenges, 2006.
F. Eichinger, K. Bohm and M. Huber. Mining Edge-Weighted Call Graphs to Localize Software Bugs, Machine Learning and Knowledge Discovery in Databases, Springer, 2008.
M. Eirinaki, M. Vazirgiannis. Web mining for web personalization. ACM Transactions on Internet Technology, 3: pp. 1–27, 2003.
M. Ester, H.-P. Kriegel, and J. Sander. Spatial Data Mining: A Database Approach, Advances in Spatial Databases, pp. 47–66, Springer, 1997.
V. Estivill-Castro, and A. T. Murray. Discovering Associations in Spatial Data: An Efficient Medoid-Based Approach, DMKD Workshop, 1998.
W. Fan, K. Zhang, H. Cheng, J. Gao, X. Yan, J. Han, P. Yu, and P. Verscheure. Direct Mining of Discriminative and Essential Frequent Patterns via Model-based Search Tree, ACM KDD Conference, 2008.
A. Frank, and A. Asuncion. UCI Machine Learning Repository, Irvine, CA: University of California, School of Information and Computer Science, 2010. http://archive.ics.uci.edu/ml.
B. Fung, K. Wang, and M. Ester. Hierarchical Document Clustering using Frequent Itemsets, SDM Conference, 2003.
J. Gudmundsson, M. van Kreveld, B. Speckmann. Efficient detection of motion patterns in spatiotemporal data sets, GIS, 2004.
J. Gudmundsson, M. van Kreveld. Computing Longest Duration Flocks in Trajectory Data, GIS, 2006.
M. Gupta, J. Gao, Y. Sun, and J. Han. Community Trend Outlier Detection Using Soft Temporal Pattern Mining. ECML/PKDD Conference, 2012.
R. Gwadera, M. J. Atallah, and W. Szpankowski. Markov Models for Identification of Significant Episodes, SDM Conference, 2005.
R. Gwadera, M. J. Atallah, and W. Szpankowski. Reliable detection of episodes in event sequences. Knowledge and Information Systems, 7(4), pp. 415–437, 2005.
J. Han, K. Koperski, and N. Stefanovic. GeoMiner: a system prototype for spatial data mining. ACM SIGMOD Record 26(2), pp. 553–556, 1997.
J. Han, G. Dong, and Y. Yin. Efficient Mining of Partial Periodic Patterns in Time Series Database, ICDE Conference, 1999.
J. Han, J. Pei, and Y. Yin. Mining Frequent Patterns without Candidate Generation, ACM SIGMOD Conference, 2000.
J. Han, H. Cheng, D. Xin, and X. Yan. Frequent Pattern Mining: Current Status and Future Directions, Data Mining and Knowledge Discovery, 15(1), pp. 55–86, 2007.
K. Hashimoto, I. Takigawa, M. Shiga, M. Kanehisa, and H. Mamitsuka. Mining significant tree patterns in carbohydrate sugar chains. Bioinformatics, 24(16), pp. 116–7, 2008.
Z. He, S. Deng, and X. Xu. Outlier Detection Integrating Semantic Knowledge. Web Age Information Management (WAIM), 2002.
Z. He, X. Xu, J. Huang, and S. Deng. FP-Outlier: Frequent Pattern-based Outlier Detection, COMSIS, 2(1), 2005.
J. Hellerstein, S. Ma, and C.-S. Perng. Discovering Actionable Patterns in Event Data, IBM Systems Journal, 41(3), pp. 475–493, 2002.
R. Ivancy, and I. Vajk. Frequent Pattern Mining in Web Log Data, Acta Polytechnica Hungarica, 3(1), pp. 77–90, 2006.
P.-S. Kam, A. W.-C. Fu. Discovering Temporal Patterns for Interval-based Events, Springer, Berlin, 2000.
K. Koperski, and J. Han. Discovery of Spatial Association Rules in Geographic Information Databases, Advances in Spatial Databases, 1995.
K. Koperski, J. Adhikary, and J. Han. Spatial Data Mining: Progress and Challenges Survey Paper, ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 1996.
R. Kosala, H. Blockeel. Web Mining Research: A Survey. ACM SIGKDD Explorations, 2000.
M. Kuramochi and G. Karypis. Frequent Subgraph Discovery, ICDM Conference, 2001.
S. Le, J. Owens, R. Nussinov, J. Chen, B. Shapiro, and J. Maizel. RNA secondary structures: comparison and determination of frequently recurring substructures by consensus. Bioinformatics, 5(3), pp. 205–210, 1989.
A. R. Leach and V. J. Gillet. An Introduction to Chemoinformatics. Springer, 2003.
W. Lee, and S. Stolfo. Data Mining Approaches for Intrusion Detection. Proceedings of the 7th USENIX Security Symposium, 1998.
W. Lee, S. Stolfo, and P. Chan. Learning Patterns from Unix Execution Traces for Intrusion Detection, AAAI workshop on AI methods in Fraud and Risk Management, 1997.
W. Lee, S. Stolfo, and K. Mok. A Data Mining Framework for Building Intrusion Detection Models, IEEE Symposium on Security and Privacy, 1999.
Z. Li, Y. Zhou. PR-Miner: Automatically extracting implicit programming rules and detecting violations in large software code, ACM SIGSOFT Symposium on Foundations of Software Engineering, 2005.
W. Li, J. Han, and J. Pei. CMAR: Accurate and Efficient Classification based on Multiple Association Rules, ICDM Conference, 2001.
Z. Li, S. Lu, S. Myagmar, and Y. Zhou. CP-Miner: A Tool for Finding Copy-Paste and Related Bugs in Operating System Code, Symposium on Operating Systems Design and Implementation, 2004.
Z. Li, Z. Chen, S. M. Srinivasan, Y. Zhou. C-Miner: Mining Block Correlations in Storage Systems, USENIX Conference on File and Storage System Technologies, 2004.
T. Li, F. Liang, S. Ma, and W. Peng. An Integrated Framework on Mining Log Files for Computing System Management. ACM KDD Conference, 2005.
X. Li, J. Han, and S. Kim. Motion-alert: Automatic Anomaly Detection in Massive Moving Objects, IEEE Conference in Intelligence and Security Informatics, 2006.
X. Li, J. Han, S. Kim, and H. Gonzalez. ROAM: Rule- and Motif-based Anomaly Detection in Massive Moving Object Data Sets, SDM Conference, 2007.
Y. Li, S. Chung, and J. Holt. Text document clustering based on frequent word meaning sequences. Data and Knowledge Engineering, 64(1), pp. 381–404, 2008.
Z. Li, B. Ding, J. Han, R. Kays. Swarm: Mining Relaxed Temporal Object Moving Clusters, VLDB Conference, 2010.
D. Lin, and P. Pantel. DIRT: Discovery of Inference Rules from Text, ACM KDD Conference, 2001.
J. Lin, E. Keogh, S. Lonardi, and B. Y.-C. Chiu. A Symbolic Representation of Time Series, with Implications for Streaming Algorithms. DMKD Workshop, 2003.
B. Liu, W. Hsu, Y. Ma. Integrating Classification and Association Rule Mining, ACM KDD Conference, 1998.
C. Liu, X. Yan, H. Lu, J. Han, and P. S. Yu. Mining Behavior Graphs for “backtrace” of non-crashing bugs, SDM Conference, 2005.
H. Liu, J. Han, D. Xin, and Z. Shao. Mining frequent patterns on very high dimensional data: a top-down row enumeration approach. SDM Conference, 2006.
D. Lo, H. Cheng, J. Han, S.-C. Khoo, and C. Sun. Classification of Software Behaviors for Failure Detection: A Discriminative Pattern Mining Approach, ACM KDD Conference, 2009.
A. Lopes, R. Pinho, F. Paulovich, and R. Minghim. Visual Text Mining using Association Rules, Computers and Graphics, 31(3), pp. 316–326, 2007.
S. Ma, and J. Hellerstein. Mining Partially Periodic Event Patterns with Unknown Periods, IEEE International Conference on Data Engineering, 2001.
S. Madeira, and A. Oliveira. Biclustering Algorithms for Biological Data Analysis; A Survey, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 1(1), pp. 24–45, 2004.
H. Mannila, and H. Toivonen. Discovering Generalized Episodes using Minimal Occurrences, KDD Conference, 1996.
H. Mannila, H. Toivonen, and A. I. Verkamo. Discovering Frequent Episodes in Sequences, ACM KDD Conference, 1995.
T. Margush, F. McMorris. Consensus-trees. Bulletin of Mathematical Biology, 43(2), pp. 239–244, 1981.
H. J. Miller, and J. Han. Geographic Data Mining and Knowledge Discovery. CRC Press, 2003.
J. Mitchell, J. Cheng, and K. Collins. A box H/ACA small nucleolar RNA-like domain at the human telomerase RNA end. Molecular and cellular biology, 19(1), pp. 567–576, 1999.
S. Mitra, and H. Banka. Multi-objective Evolutionary Biclustering of Gene Expression Data, Pattern Recognition, 39(12), pp. 2464–2477, 2006.
B. Mobasher, R. Cooley, and J. Srivastava. Automatic personalization based on Web usage mining, Communications of the ACM, 43(8), pp. 142–151, 2000.
B. Mobasher, H. Dai, T. Luo, and M. Nakagawa. Effective personalization based on association rule discovery from web usage data, Proceedings of the 3rd international workshop on Web information and data, 2001.
A. Nanopoulos, and Y. Manolopoulos. Finding generalized path patterns for web log data mining, Lecture notes in computer science, pp. 215–228, 2000.
A. Nanopoulos, and Y. Manolopoulos. Efficient similarity search for market basket data, VLDB Journal, 11(2), 2002.
F. Pan, G. Cong, A. Tung, J. Yang, and M. Zaki. CARPENTER: Finding closed patterns in long biological datasets. ACM KDD Conference, 2003.
F. Pan, A. K. H. Tung, G. Cong, X. Xu. COBBLER: Combining Column and Row Enumeration for Closed Pattern Discovery. SSDBM, 2004.
L. Parsons, E. Haque, and H. Liu. Subspace Clustering for High Dimensional Data: A Review, ACM SIGKDD Explorations, 6(1), pp. 90–105, 2004.
J. Pei, and J. Han. Can we push more constraints into frequent pattern mining? ACM KDD Conference, 2000.
J. Pei, J. Han, B. Mortazavi-Asl and H. Zhu. Mining access patterns efficiently from web logs. PAKDD, 2000.
J. Pei, J. Han, and L. V. S. Lakshmanan. Mining Frequent Patterns with Convertible Constraints in Large Databases, ICDE Conference, 2001.
J. Punin, M. Krishnamoorthy, M. Zaki. Web usage mining: languages and algorithms. Springer-Verlag, 2001.
I. Rigoutsos, and A. Floratos. Combinatorial Pattern Discovery in Biological Sequences: The TEIRESIAS algorithm, Bioinformatics, 14(1), pp. 55–67, 1998.
B. Shapiro, and K. Zhang. Comparing multiple RNA secondary structures using tree comparisons. Bioinformatics, 6(4), pp. 309–318, 1990.
D. Shasha, J. Wang, and S. Zhang. Unordered tree mining with applications to phylogeny. ICDE Conference, pp. 708–719, 2004.
P. Shenoy, J. Haritsa, S. Sudarshan, G. Bhalotia, M. Bawa, D. Shah. Turbo-charging Vertical Mining of Large Databases. ACM SIGMOD Conference, pp. 22–33, 2000.
A. Siebes, J. Vreeken, and M. van Leeuwen. Itemsets that Compress, SIAM Conference on Data Mining, 2006.
K. Smets and J. Vreeken. The Odd One Out: Identifying and Characterising Anomalies, SIAM Conference on Data Mining, 2011.
A. Srinivasan, R. King, S. Muggleton, and M. J. E. Sternberg. Carcinogenesis predictions using ILP. Workshop on Inductive Logic Programming, Vol. 1297, pp. 273–287, 1997.
A. Srinivasan, R. King, S. Muggleton, and M. J. E. Sternberg. The predictive toxicology evaluation challenge, IJCAI, 1997.
J. Srivastava, R. Cooley, M. Deshpande, and P. N. Tan. Web usage mining: Discovery and applications of usage patterns from web data. ACM SIGKDD Explorations Newsletter, 1(2), pp. 12–23, 2000.
C. Stockham, L. Wang, T. Warnow. Statistically based postprocessing of phylogenetic analysis by clustering. Bioinformatics, 18(3), pp. 465–469, 2002.
R. Vilalta, and S. Ma. Predicting Rare Events in Temporal Domains, ICDM Conference, 2002.
J. Wang, G. Karypis. HARMONY: Efficiently Mining the Best Rules for Classification. SDM Conference, 2005.
J. T.-L. Wang, G.-W. Chirn, T. G. Marr, B. Shapiro, D. Shasha, and K. Zhang. Combinatorial Pattern Discovery for Scientific Data: Some Preliminary Results, ACM SIGMOD Record, 23(2), pp. 115–125, 1994.
K. Wang, C. Xu, and B. Liu. Clustering Transactions using Large Items, CIKM Conference, 1999.
J. Wang, D. Shasha, and B. Shapiro. Pattern Discovery in Biomolecular Data: Tools, Techniques, and Applications. Oxford University Press, 1999.
H. Wang, W. Wang, J. Yang, and P. S. Yu. Clustering by pattern similarity in large data sets, ACM SIGMOD Conference, 2002.
K. Wang, Y. Xu, and J. X. Yu. Scalable Sequential Pattern Mining for Biological Sequences, ACM KDD Conference, 2004.
P. C. Wong, P. Whitney, and J. Thomas. Visualizing Association Rules for Text Mining, InfoVis, 1999.
Y. Xiao, M. Dunham. Efficient mining of traversal patterns, Data and Knowledge Engineering, 39(2), pp. 191–214, 2001.
Z. Xing, J. Pei, and E. Keogh. A Brief Survey on Sequence Classification, ACM SIGKDD Explorations, 12(1), 2010.
H. Xiong, S. Shekhar, Y. Huang, V. Kumar, X. Ma, J. Yoo. A framework for discovering co-location patterns in data sets with extended spatial objects, SDM Conference, pp. 78–89, 2004.
X. Yan, P. S. Yu, and J. Han. Graph indexing: A frequent structure-based approach. ACM SIGMOD Conference, 2004.
X. Yan, P. S. Yu, and J. Han. Substructure similarity search in graph databases. ACM SIGMOD Conference, 2005.
X. Yan, F. Zhu, J. Han, and P. S. Yu. Searching substructures with superimposed distance, ICDE Conference, 2006.
Y. Yang, and B. Padmanabhan. GHIC: A Hierarchical Pattern-based Clustering for Grouping Web Transactions, IEEE TKDE, 17(9), pp. 1300–1304, 2005.
J. Yang, and W. Wang. CLUSEQ: Efficient and Effective Sequence Clustering. ICDE Conference, 2003.
M. Yiu, and N. Mamoulis. Frequent-pattern based iterated projected clustering, ICDM Conference, 2003.
O. Zaiane, M. Xin, and J. Han. Discovering Web Access Patterns and Trends by applying OLAP and Data Mining Technology on Web Logs. Research and Technology Advances in Digital Libraries, pp. 19–29. 1998.
M. Zaki. Efficiently mining frequent trees in a forest: Algorithms and applications. IEEE Transactions on Knowledge and Data Engineering, 17(8), pp. 1021–1035, 2005.
M. Zaki, C. Aggarwal. XRules: An Effective Classifier for XML Data, ACM KDD Conference, 2003.
M. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. New Algorithms for Fast Discovery of Association Rules. KDD Conference, pp. 283–286, 1997.
S. Zhang, T. Wang. Discovering Frequent Agreement Subtrees from Phylogenetic Data. IEEE Transactions on Knowledge and Data Engineering, 20(1), pp. 68–82, 2008.
X. Zhang, N. Mamoulis, D. W. Cheung, Y. Shou. Fast mining of spatial collocations. ACM KDD Conference, pp. 384–393, 2004.
W. Zhou, H. Liu, and H. Cheng. Mining closed episodes from event sequences efficiently, PAKDD Conference, 2010.
http://www.cs.cmu.edu/~wcohen/#sw.
http://cgi.csc.liv.ac.uk/~frans/KDD/Software/FPgrowth/fpGrowth.html.
http://www.oracle.com/technetwork/database/options/advanced-analytics/odm/index.html.
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Aggarwal, C. (2014). Applications of Frequent Pattern Mining. In: Aggarwal, C., Han, J. (eds) Frequent Pattern Mining. Springer, Cham. https://doi.org/10.1007/978-3-319-07821-2_18
DOI: https://doi.org/10.1007/978-3-319-07821-2_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07820-5
Online ISBN: 978-3-319-07821-2
eBook Packages: Computer Science, Computer Science (R0)