Contrasting Sequence Groups by Emerging Sequences

Deng, Kang; Zaïane, Osmar R.

doi:10.1007/978-3-642-04747-3_29

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5808)

Included in the following conference series:

International Conference on Discovery Science

Abstract

Group comparison is a fundamental task in many scientific endeavours and is also the basis of any classifier. Contrast sets and emerging patterns contrast groups of categorical data, but comparing groups of sequence data is also relevant in many applications. We define Emerging Sequences (ESs) as subsequences that are frequent in the sequences of one group and less frequent in the sequences of another, and that therefore distinguish, or contrast, sequences of different classes. Distinguishing sequence classes raises two challenges: ESs are not trivial to extract efficiently, and only exact matches of sequences are considered. We address these problems with a suffix-tree-based framework and a similar matching mechanism, and we propose a classifier based on Emerging Sequences. Evaluated against two learning algorithms based on frequent subsequences and exact-matching subsequences, our model outperforms the baseline approaches by up to 20% in prediction accuracy in experiments on two datasets.
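The definition of an Emerging Sequence can be illustrated with a small sketch. This is not the authors' suffix-tree framework. It is a brute-force check under stated assumptions: subsequences may contain gaps, "frequent" means a support of at least `min_support` in the target group, and "less frequent" means the support ratio between the two groups is at least `min_growth`. The function names and both thresholds are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def is_subsequence(pattern: str, sequence: str) -> bool:
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `symbol in it` advances the iterator, so symbols must appear in order.
    return all(symbol in it for symbol in pattern)


def support(pattern: str, group: Sequence[str]) -> float:
    """Fraction of sequences in `group` that contain `pattern`."""
    if not group:
        return 0.0
    return sum(is_subsequence(pattern, s) for s in group) / len(group)


def is_emerging(
    pattern: str,
    target: Sequence[str],
    other: Sequence[str],
    min_support: float = 0.5,  # assumed threshold, not from the paper
    min_growth: float = 2.0,   # assumed threshold, not from the paper
) -> bool:
    """Check whether `pattern` is frequent in `target` and rarer in `other`."""
    sup_target = support(pattern, target)
    sup_other = support(pattern, other)
    if sup_target < min_support:
        return False  # not frequent enough in the target group
    if sup_other == 0.0:
        return True   # present in the target group, absent from the other
    return sup_target / sup_other >= min_growth


if __name__ == "__main__":
    target = ["abcd", "abd", "xabz"]
    other = ["ba", "cab", "dcb"]
    print(is_emerging("ab", target, other))
    print(is_emerging("b", target, other))
```

In this toy data, "ab" occurs in every target sequence but in only one of the three other sequences, so it contrasts the groups, while "b" occurs everywhere and does not. Enumerating all candidate subsequences this way is expensive, which is the efficiency problem the abstract refers to.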

Author information

Authors and Affiliations

Department of Computing Science, University of Alberta, Edmonton, Alberta, T6G 2E8
Kang Deng & Osmar R. Zaïane

Editor information

Editors and Affiliations

Faculty of Economics; Rua Dr. Roberto Frias, University of Porto, 4200-465, Porto, Portugal
João Gama
DCC-FC, Universidade do Porto, Portugal
Vítor Santos Costa
LIACC/FEP, Universidade do Porto, Portugal
Alípio Mário Jorge
LIAAD-INESC Porto L.A./Faculty of Economics, University of Porto, Rua de Ceuta, 118-6, 4050-190, Porto, Portugal
Pavel B. Brazdil

About this paper

Cite this paper

Deng, K., Zaïane, O.R. (2009). Contrasting Sequence Groups by Emerging Sequences. In: Gama, J., Costa, V.S., Jorge, A.M., Brazdil, P.B. (eds) Discovery Science. DS 2009. Lecture Notes in Computer Science, vol 5808. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04747-3_29

DOI: https://doi.org/10.1007/978-3-642-04747-3_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04746-6
Online ISBN: 978-3-642-04747-3
eBook Packages: Computer Science, Computer Science (R0)
