Abstract
Most pattern recognition approaches look for patterns in data represented as independent entities described by attributes. However, the relationships between entities are as important, if not more important, to the recognition of accurate and meaningful patterns. In this chapter we describe an approach to discovering patterns in relational data represented as a graph. Our approach is based on the minimum description length (MDL) principle [28], which measures how well various patterns compress the original database. This approach is implemented in the SUBDUE system. We begin with a discussion of related work. We then describe graph-based discovery, the main discovery algorithm, and the polynomially-constrained inexact graph matching algorithm at the heart of the discovery process. Next, we describe how this technique can also be used for clustering and concept learning. We illustrate the utility of the approach by applying the clustering and concept learning techniques to DNA and WWW data.
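The idea of scoring a pattern by how well it compresses the graph can be sketched as follows. This is a minimal illustration under stated assumptions, not the encoding used in the chapter: graph size is approximated as the number of vertices plus the number of edges rather than a description length in bits, instances of the pattern are assumed not to overlap, and the names `GraphSize`, `compressed_size`, and `compression_value` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphSize:
    """Crude size proxy for a graph: vertex and edge counts (assumption, not bits)."""
    vertices: int
    edges: int

    @property
    def total(self) -> int:
        return self.vertices + self.edges


def compressed_size(graph: GraphSize, sub: GraphSize, instances: int) -> GraphSize:
    """Size of the graph after each non-overlapping instance of `sub`
    is collapsed into a single placeholder vertex."""
    # Each instance loses all but one of its vertices and all internal edges.
    v = graph.vertices - instances * (sub.vertices - 1)
    e = graph.edges - instances * sub.edges
    if v < 0 or e < 0:
        raise ValueError("instance count is inconsistent with the graph size")
    return GraphSize(v, e)


def compression_value(graph: GraphSize, sub: GraphSize, instances: int) -> float:
    """Ratio of the original size to (pattern size + compressed graph size).
    Values above 1 mean the pattern shortens the overall description."""
    remaining = compressed_size(graph, sub, instances)
    return graph.total / (sub.total + remaining.total)


# Hypothetical graph with 20 vertices and 25 edges.
g = GraphSize(vertices=20, edges=25)
triangle = GraphSize(vertices=3, edges=3)    # found 4 times
single_edge = GraphSize(vertices=2, edges=1) # found 5 times

triangle_score = compression_value(g, triangle, instances=4)
edge_score = compression_value(g, single_edge, instances=5)
print(f"triangle: {triangle_score:.3f}, single edge: {edge_score:.3f}")
```

Under this proxy the triangle scores higher than the single edge even though it occurs less often, because each of its instances removes more of the graph. A discovery procedure in this spirit would prefer the pattern with the larger value.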
References
G. H. Ball. Classification analysis. Technical Report SRI Project 5533, Stanford Research Institute, 1971.
J. M. Barnard. Substructure searching methods: Old and new. Journal of Chemical Information and Computer Sciences, 33: 532–538, 1993.
C. L. Blake and C. J. Merz. UCI repository of machine learning databases, 1998.
H. Bunke and B. T. Messmer. A new algorithm for efficient subgraph matching. In G. Vernazza, A. N. Venetsanopoulos, and C. Braccini, editors, Image Processing: Theory and Applications, pages 303–307. Elsevier Science Publishers, 1993.
R. M. Cameron-Jones and J. R. Quinlan. Efficient top-down induction of logic programs. SIGART Bulletin, 5 (1): 33–42, 1994.
P. Cheeseman and J. Stutz. Bayesian classification (AutoClass): Theory and results. In U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, chapter 6, pages 153–180. MIT Press, 1996.
D. Conklin, S. Fortier, J. Glasgow, and F. Allen. Discovery of spatial concepts in crystallographic databases. In Proceedings of the ML92 Workshop on Machine Discovery, pages 111–116, 1992.
D. Conklin and J. Glasgow. Spatial analogy and subsumption. In Proceedings of the Ninth International Conference on Machine Learning, pages 111–116, 1992.
D. J. Cook and L. B. Holder. Substructure discovery using minimum description length and background knowledge. Journal of Artificial Intelligence Research, 1: 231–255, 1994.
D. J. Cook and L. B. Holder. Graph-based data mining. IEEE Intelligent Systems, 15 (2): 32–41, 2000.
D. J. Cook, L. B. Holder, and S. Djoko. Knowledge discovery from structural data. Journal of Intelligent Information Systems, 5 (3): 229–245, 1995.
D. J. Cook, L. B. Holder, and S. Djoko. Scalable discovery of informative structural concepts using domain knowledge. IEEE Expert, 11 (5): 59–68, 1996.
D. J. Cook, L. B. Holder, G. Galal, and R. K. Maglothin. Approaches to parallel graph-based knowledge discovery. Journal of Parallel and Distributed Computing, 61 (3): 427–446, 2001.
L. Dehaspe, H. Toivonen, and R. D. King. Finding frequent substructures in chemical compounds. In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, pages 30–36, 1998.
S. Dzeroski. Inductive logic programming and knowledge discovery in databases. In U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, chapter 5, pages 117–152. MIT Press, 1996.
J. T. Favata. Offline general handwritten word recognition using an approximate beam matching algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23 (9), 2001.
D. H. Fisher. Knowledge acquisition via incremental conceptual clustering. Machine Learning, 2 (2): 139–172, 1987.
J. Gonzalez. Empirical and Theoretical Analysis of Relational Concept Learning Using a Graph-Based Representation. PhD thesis, Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX, Aug. 2001.
P. Jappy and R. Nock. PAC learning conceptual graphs. In Proceedings of the Sixth International Conference on Conceptual Structures, 1998.
R. Levinson. A self-organizing retrieval system for graphs. In Proceedings of the Fourth National Conference on Artificial Intelligence, pages 203–206, 1984.
M. Liquiere and J. Sallantin. Structural machine learning with Galois lattice and graphs. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 305–313, 1998.
J. Llados, E. Marti, and J. J. Villanueva. Symbol recognition by error-tolerant subgraph matching between region adjacency graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23 (10): 1137–1143, 2001.
B. Luo and E. R. Hancock. Structural graph matching using the EM algorithm and singular value decomposition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23 (10): 1120–1136, 2001.
B. T. Messmer and H. Bunke. A new algorithm for error-tolerant sub-graph isomorphism. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20 (5): 493–504, 1998.
S. Muggleton. Inverse entailment and Progol. New Generation Computing, 13: 245–286, 1995.
R. Myers, R. C. Wilson, and E. R. Hancock. Bayesian graph edit distance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22 (6): 628–635, 2000.
L. De Raedt and M. Bruynooghe. A theory of clausal discovery. In Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence, pages 1058–1063, 1993.
J. Rissanen. Stochastic Complexity in Statistical Inquiry. World Scientific Publishing Company, 1989.
J. Segen. Learning graph models of shape. In Proceedings of the Fifth International Conference on Machine Learning, pages 29–35, 1988.
J. Segen. Graph clustering and model learning by data compression. In Proceedings of the Seventh International Conference on Machine Learning, pages 93–101, 1990.
J. F. Sowa. Conceptual Structures: Information in Mind and Machine. Addison Wesley, 1984.
S. Su, D. J. Cook, and L. B. Holder. Applications of knowledge discovery to molecular biology: Identifying structural regularities in proteins. In Proceedings of the Pacific Symposium on Biocomputing, pages 190–201, 1999.
K. Thompson and P. Langley. Concept formation in structured domains. In D. H. Fisher and M. Pazzani, editors, Concept Formation: Knowledge and Experience in Unsupervised Learning, chapter 5. Morgan Kaufmann Publishers, 1991.
J. T. L. Wang, B. A. Shapiro, D. Shasha, K. Zhang, and K. M. Currey. An algorithm for finding the largest approximately common substructures of two trees. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20 (8): 889–895, 1998.
P. H. Winston. Learning structural descriptions from examples. In P. H. Winston, editor, The Psychology of Computer Vision, pages 157–210. McGraw-Hill, 1975.
P. H. Winston. Artificial Intelligence. Addison Wesley, 2nd edition, 1994.
K. Yoshida, H. Motoda, and N. Indurkhya. Unifying learning methods by colored digraphs. In Proceedings of the Learning and Knowledge Acquisition Workshop at IJCAI-93, 1993.
Copyright information
© 2003 Kluwer Academic Publishers
About this chapter
Cite this chapter
Holder, L., Cook, D., Gonzalez, J., Jonyer, I. (2003). Structural Pattern Recognition in Graphs. In: Chen, D., Cheng, X. (eds) Pattern Recognition and String Matching. Combinatorial Optimization, vol 13. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-0231-5_10
DOI: https://doi.org/10.1007/978-1-4613-0231-5_10
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-7952-2
Online ISBN: 978-1-4613-0231-5
eBook Packages: Springer Book Archive