Contrasting Sequence Groups by Emerging Sequences

  • Conference paper
Discovery Science (DS 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5808)

Abstract

Group comparison is a fundamental task in many scientific endeavours and is also the basis of any classifier. Contrast sets and emerging patterns capture differences between groups of categorical data, yet comparing groups of sequence data is equally relevant in many applications. We define Emerging Sequences (ESs) as subsequences that are frequent in the sequences of one group and less frequent in the sequences of another, and that therefore distinguish or contrast sequences of different classes. Distinguishing sequence classes poses two challenges: extracting ESs efficiently is not trivial, and only exact matches of sequences are considered. We address these problems with a suffix tree-based framework and a similarity-based matching mechanism, and we propose a classifier based on Emerging Sequences. In experiments on two datasets, evaluated against two learning algorithms based on frequent subsequences and exact-matching subsequences, our model outperforms the baseline approaches by up to 20% in prediction accuracy.
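
To make the definition of an Emerging Sequence concrete, the following is a minimal sketch and not the authors' suffix tree-based method. It assumes that a subsequence preserves order but may skip symbols, that support is the fraction of sequences in a group containing the pattern, and that "frequent in one group and less frequent in another" is tested with a minimum support and a minimum support ratio. The names `is_subsequence`, `support`, `is_emerging`, `min_support` and `min_growth`, the threshold values, and the toy data are all hypothetical and do not come from the paper.

```python
def is_subsequence(pattern, sequence):
    """Return True if pattern occurs in sequence in order (gaps allowed).

    Assumption: gaps are permitted; a contiguous-match definition would
    only change this helper.
    """
    it = iter(sequence)
    # `symbol in it` advances the iterator, so order is enforced.
    return all(symbol in it for symbol in pattern)


def support(pattern, group):
    """Fraction of sequences in group that contain pattern."""
    if not group:
        return 0.0
    return sum(is_subsequence(pattern, s) for s in group) / len(group)


def is_emerging(pattern, target, background, min_support=0.5, min_growth=2.0):
    """Hypothetical ES test: frequent in target, much rarer in background."""
    sup_target = support(pattern, target)
    sup_background = support(pattern, background)
    if sup_target < min_support:
        return False
    if sup_background == 0.0:
        # Absent from the background group: treated as maximally contrasting.
        return True
    return sup_target / sup_background >= min_growth


# Toy data, invented for illustration only.
positive = ["ACGT", "AGGT", "ACT"]
negative = ["TGCA", "ACGG", "TTAG"]

print(is_emerging("AT", positive, negative))  # pattern occurs only in positive
print(is_emerging("AG", positive, negative))  # equally common in both groups
```

In this toy setting "AT" appears in every positive sequence and in no negative one, so it passes the test, while "AG" has the same support in both groups and does not. Enumerating candidate patterns efficiently is the hard part that the paper addresses and is deliberately left out of this sketch.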


Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Deng, K., Zaïane, O.R. (2009). Contrasting Sequence Groups by Emerging Sequences. In: Gama, J., Costa, V.S., Jorge, A.M., Brazdil, P.B. (eds) Discovery Science. DS 2009. Lecture Notes in Computer Science, vol 5808. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04747-3_29

  • DOI: https://doi.org/10.1007/978-3-642-04747-3_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04746-6

  • Online ISBN: 978-3-642-04747-3

  • eBook Packages: Computer Science, Computer Science (R0)
