Sequential Pattern Mining in Multi-relational Datasets

Ferreira, Carlos Abreu; Gama, João; Costa, Vítor Santos

doi:10.1007/978-3-642-14264-2_13

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5988)

Included in the following conference series:

Conference of the Spanish Association for Artificial Intelligence

Abstract

We present a framework for mining sequential temporal patterns from multi-relational databases. To exploit logic-relational information without resorting to aggregation, we convert the multi-relational dataset into what we call a multi-sequence database. Each example in the multi-relational target table is encoded as a sequence that combines intra-table and inter-table relational temporal information. This lets us find heterogeneous temporal patterns with standard sequence miners. Our framework is motivated by the strong results achieved by previous propositionalization strategies. We follow a pipelined approach. First, we use a sequence miner to find frequent sequences in the multi-sequence database. Next, we select the most interesting patterns, namely those that are discriminative and class-correlated, and use them to augment the representational space of the examples. In the final step, we build a classifier model by giving the enlarged target table as input to a classification algorithm. We evaluate this work on a motivating application, the hepatitis multi-relational dataset, and demonstrate the effectiveness of our methodology by addressing two problems on that dataset.
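
The pipeline described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy tables, the event names, the brute-force miner (a stand-in for a real sequence miner such as PrefixSpan), and the support-gap criterion used to pick discriminative patterns.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data (not from the paper): a target table mapping each
# example id to its class, plus two related tables of (id, time, event) rows.
target = {1: "pos", 2: "pos", 3: "neg", 4: "neg"}
exams = [(1, 1, "GOT_high"), (1, 3, "GPT_high"), (2, 2, "GOT_high"),
         (2, 5, "GPT_high"), (3, 1, "GOT_normal"), (4, 2, "GOT_high")]
treatments = [(1, 2, "interferon"), (2, 4, "interferon"),
              (3, 2, "none"), (4, 3, "none")]

def build_sequences(target, *tables):
    """Merge the events of all related tables into one time-ordered sequence per example."""
    events = {pid: [] for pid in target}
    for table in tables:
        for pid, time, event in table:
            events[pid].append((time, event))
    return {pid: [e for _, e in sorted(evts)] for pid, evts in events.items()}

def is_subsequence(pattern, seq):
    """True if the items of pattern occur in seq in the same order (gaps allowed)."""
    it = iter(seq)
    return all(item in it for item in pattern)

def frequent_patterns(seqs, min_support, max_len=2):
    """Brute-force stand-in for a sequence miner: count each candidate once per sequence."""
    counts = Counter()
    for seq in seqs.values():
        candidates = set()
        for n in range(1, max_len + 1):
            candidates.update(combinations(seq, n))  # order-preserving subsequences
        counts.update(candidates)
    return {p: c for p, c in counts.items() if c >= min_support}

def select_discriminative(patterns, seqs, target, min_gap):
    """Keep patterns whose relative support differs between classes by at least min_gap."""
    classes = sorted(set(target.values()))
    keep = []
    for p in patterns:
        rates = []
        for c in classes:
            members = [pid for pid in target if target[pid] == c]
            rates.append(sum(is_subsequence(p, seqs[pid]) for pid in members) / len(members))
        if max(rates) - min(rates) >= min_gap:
            keep.append(p)
    return sorted(keep)

seqs = build_sequences(target, exams, treatments)
frequent = frequent_patterns(seqs, min_support=2)
selected = select_discriminative(frequent, seqs, target, min_gap=1.0)

# Enlarged target table: one boolean column per selected pattern.
features = {pid: [int(is_subsequence(p, seqs[pid])) for p in selected] for pid in target}
for pid, row in features.items():
    print(pid, target[pid], row)
```

The rows of `features`, together with the class labels, are what would be passed to any off-the-shelf classifier in the final step. The sketch enumerates candidates exhaustively, which is only workable for short toy sequences.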

Author information

Authors and Affiliations

LIAAD-INESC LA, Portugal
Carlos Abreu Ferreira & João Gama
CRACS-INESC LA, University of Porto, Portugal
Carlos Abreu Ferreira & Vítor Santos Costa
ISEP-Institute of Engineering of Porto, Portugal
Carlos Abreu Ferreira

Editor information

Editors and Affiliations

IIIA - CSIC, Campus UAB s/n, 08193, Bellaterra, Spain
Pedro Meseguer
Dpto. Lenguajes y Ciencias de la Computación, Universidad de Málaga, Campus de Teatinos, 29071, Málaga, Spain
Lawrence Mandow
Dpto. Lenguajes y Sistemas Informáticos, ETS Ingeniería Informática, University of Seville, Av. Reina Mercedes S/N, 41012, Sevilla, Spain
Rafael M. Gasca

Cite this paper

Ferreira, C.A., Gama, J., Costa, V.S. (2010). Sequential Pattern Mining in Multi-relational Datasets. In: Meseguer, P., Mandow, L., Gasca, R.M. (eds) Current Topics in Artificial Intelligence. CAEPIA 2009. Lecture Notes in Computer Science, vol 5988. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14264-2_13

DOI: https://doi.org/10.1007/978-3-642-14264-2_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14263-5
Online ISBN: 978-3-642-14264-2
eBook Packages: Computer Science, Computer Science (R0)
