Pruning Relations for Substructure Discovery of Multi-relational Databases

Guo, Hongyu; Viktor, Herna L.; Paquet, Eric

doi:10.1007/978-3-540-74976-9_47

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4702)

Included in the following conference series:

European Conference on Principles of Data Mining and Knowledge Discovery

Abstract

Multi-relational data mining methods discover patterns across multiple interlinked tables (relations) in a relational database. In many large organizations, such a multi-relational database spans numerous departments and subdivisions, which are involved in different aspects of the enterprise such as customer profiling, fraud detection, inventory management, and financial management. In multi-relational classification, these subdivisions will have different interests in the data, so there is a need to explore various subsets of relevant relations that have high utility with respect to the target class. This paper presents a novel approach for pruning the uninteresting relations of a relational database whose relations come from such different parties and span many classification tasks. We aim to create a pruned structure while minimizing the loss in predictive performance of the final classification model. Our method identifies a set of strongly uncorrelated subgraphs to use for training and discards all others. The experiments demonstrate that our strategy significantly reduces the size of the relational schema without sacrificing predictive accuracy.
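The abstract does not spell out how the subgraphs are scored, so the following is only an illustrative sketch of the general idea of keeping subgraphs that are informative about the target class but weakly correlated with one another. It assumes a correlation-based subset merit of the form k·r̄_cs / sqrt(k + k(k−1)·r̄_ss), a greedy forward search, and precomputed correlation values. The function names (`subset_merit`, `select_subgraphs`), the subgraph names, and all numbers are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import sqrt


def subset_merit(subset, class_corr, pair_corr):
    """Assumed merit: k * mean(class corr) / sqrt(k + k(k-1) * mean(pair corr))."""
    k = len(subset)
    if k == 0:
        return 0.0
    mean_class = sum(class_corr[s] for s in subset) / k
    # pair_corr is keyed by alphabetically sorted name pairs
    pairs = list(combinations(sorted(subset), 2))
    mean_pair = sum(pair_corr[p] for p in pairs) / len(pairs) if pairs else 0.0
    return k * mean_class / sqrt(k + k * (k - 1) * mean_pair)


def select_subgraphs(class_corr, pair_corr):
    """Greedy forward selection; stops once no addition improves the merit."""
    selected, best = [], 0.0
    remaining = set(class_corr)
    while remaining:
        candidate, candidate_merit = None, best
        for s in sorted(remaining):
            m = subset_merit(selected + [s], class_corr, pair_corr)
            if m > candidate_merit:
                candidate, candidate_merit = s, m
        if candidate is None:
            break  # every remaining subgraph would lower the merit: prune them
        selected.append(candidate)
        remaining.remove(candidate)
        best = candidate_merit
    return selected


# Hypothetical correlations of three subgraphs with the target class ...
class_corr = {"account": 0.60, "order": 0.55, "client": 0.50}
# ... and with each other (keys are alphabetically sorted pairs).
pair_corr = {
    ("account", "client"): 0.10,
    ("account", "order"): 0.90,
    ("client", "order"): 0.10,
}

kept = select_subgraphs(class_corr, pair_corr)
pruned = sorted(set(class_corr) - set(kept))
print("kept:", kept, "pruned:", pruned)
```

In this toy input the "order" subgraph is almost as predictive as "account" but nearly redundant with it, so under the assumed merit it is the one discarded, while the less predictive but largely uncorrelated "client" subgraph is kept.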


Author information

Authors and Affiliations

School of Information Technology & Engineering, University of Ottawa, Canada
Hongyu Guo & Herna L. Viktor
National Research Council of Canada, Ottawa, Canada
Eric Paquet

Editor information

Joost N. Kok, Jacek Koronacki, Ramon Lopez de Mantaras, Stan Matwin, Dunja Mladenič, Andrzej Skowron

Cite this paper

Guo, H., Viktor, H.L., Paquet, E. (2007). Pruning Relations for Substructure Discovery of Multi-relational Databases. In: Kok, J.N., Koronacki, J., Lopez de Mantaras, R., Matwin, S., Mladenič, D., Skowron, A. (eds) Knowledge Discovery in Databases: PKDD 2007. Lecture Notes in Computer Science, vol 4702. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74976-9_47

DOI: https://doi.org/10.1007/978-3-540-74976-9_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74975-2
Online ISBN: 978-3-540-74976-9
eBook Packages: Computer Science, Computer Science (R0)
