Skip to main content

Similarity-Based Clustering of Sequences Using Hidden Markov Models

  • Conference paper
  • First Online:
Machine Learning and Data Mining in Pattern Recognition (MLDM 2003)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 2734))

Abstract

Hidden Markov models constitute a widely employed tool for sequential data modelling; nevertheless, their use in the clustering context has been poorly investigated. In this paper a novel scheme for HMM-based sequential data clustering is proposed, inspired on the similarity-based paradigm recently introduced in the supervised learning context. With this approach, a new representation space is built, in which each object is described by the vector of its similarities with respect to a predeterminate set of other objects. These similarities are determined using hidden Markov models. Clustering is then performed in such a space. By way of this, the difficult problem of clustering of sequences is thus transposed to a more manageable format, the clustering of points (vectors of features). Experimental evaluation on synthetic and real data shows that the proposed approach largely outperforms standard HMM clustering schemes.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Jain, A., Dubes, R.: Algorithms for clustering data. Prentice Hall (1988)

    Google Scholar 

  2. Rabiner, L.: A tutorial on Hidden Markov Models and selected applications in speech recognition. Proc. of IEEE 77 (1989) 257–286

    Article  Google Scholar 

  3. Jain, A., Zongker, D.: Representation and recognition of handwritten digits using deformable templates. IEEE Trans. Pattern Analysis and Machine Intelligence 19 (1997) 1386–1391

    Article  Google Scholar 

  4. Graepel, T., Herbrich, R., Bollmann-Sdorra, P., Obermayer, K.: Classification on pairwise proximity data. In M. Kearns, S. Solla D.C., ed.: Advances in Neural Information Processing. Volume 11., MIT Press (1999)

    Google Scholar 

  5. Jacobs, D., Weinshall, D.: Classification with nonmetric distances: Image retrieval and class representation. IEEE Trans. Pattern Analysis and Machine Intelligence 22 (2000) 583–600

    Article  Google Scholar 

  6. Pekalska, E., Duin, R.: Automatic pattern recognition by similarity representations. Electronics Letters 37 (2001) 159–160

    Article  Google Scholar 

  7. Pekalska, E., Paclik, P., Duin, R.: A generalized kernel approach to dissimilaritybased classification. Journal of Machine Learning Research 2 (2002) 175–211

    Article  MATH  MathSciNet  Google Scholar 

  8. Pekalska, E., Duin, R.: Dissimilarity representations allow for building good classifiers. Pattern Recognition Letters 23 (2002) 943–956

    Article  MATH  Google Scholar 

  9. Smyth, P.: Clustering sequences with hidden Markov models. In Mozer, M., Jordan, M., Petsche, T., eds.: Advances in Neural Information Processing. Volume 9., MIT Press (1997)

    Google Scholar 

  10. Panuccio, A., Bicego, M., Murino, V.: A Hidden Markov Model-based approach to sequential data clustering. In Caelli, T., Amin, A., Duin, R., Kamel, M., de Ridder, D., eds.: Structural, Syntactic and Statistical Pattern Recognition. LNCS 2396, Springer (2002) 734–742

    Chapter  Google Scholar 

  11. Rabiner, L., Lee, C., Juang, B., Wilpon, J.: HMM clustering for connected word recognition. In: Proc. of IEEE ICASSP. (1989) 405–408

    Google Scholar 

  12. Lee, K.: Context-dependent phonetic hidden Markov models for speakerindependent continuous speech recognition. IEEE Transactions on Acoustics, Speech and Signal Processing 38 (1990) 599–609

    Article  Google Scholar 

  13. Kosaka, T., Matsunaga, S., Kuraoka, M.: Speaker-independent phone modeling based on speaker-dependent hmm’s composition and clustering. In: Int. Proc. on Acoustics, Speech, and Signal Processing. Volume 1. (1995) 441–444

    Google Scholar 

  14. Bahlmann, C., Burkhardt, H.: Measuring hmm similarity with the bayes probability of error and its application to online handwriting recognition. In: Proc. Int. Conf. Document Analysis and Recognition. (2001) 406–411

    Google Scholar 

  15. Cadez, I., Gaffney, S., Smyth, P.: A general probabilistic framework for clustering individuals. In: Proc. of ACM SIGKDD 2000. (2000)

    Google Scholar 

  16. Law, M., Kwok, J.: Rival penalized competitive learning for model-based sequence. In: Proc. Int. Conf. Pattern Recognition. Volume 2. (2000) 195–198

    Google Scholar 

  17. Xu, L., Krzyzak, A., Oja, E.: Rival penalized competitive learning for clustering analysis, RBF nets, and curve detection. IEEE Trans. on Neural Networks 4 (1993) 636–648

    Article  Google Scholar 

  18. Li, C.: A Bayesian Approach to Temporal Data Clustering using Hidden Markov Model Methodology. PhD thesis, Vanderbilt University (2000)

    Google Scholar 

  19. Li, C., Biswas, G.: Clustering sequence data using hidden Markov model representation. In: Proc. of SPIE’99 Conf. on Data Mining and Knowledge Discovery: Theory, Tools, and Technology. (1999) 14–21

    Google Scholar 

  20. Li, C., Biswas, G.: A bayesian approach to temporal data clustering using hidden Markov models. In: Proc. Int. Conf. on Machine Learning. (2000) 543–550

    Google Scholar 

  21. Li, C., Biswas, G.: Applying the Hidden Markov Model methodology for unsupervised learning of temporal data. Int. Journal of Knowledge-based Intelligent Engineering Systems 6 (2002) 152–160

    Google Scholar 

  22. Li, C., Biswas, G., Dale, M., Dale, P.: Matryoshka: A HMM based temporal data clustering methodology for modeling system dynamics. Intelligent Data Analysis Journal in press (2002)

    Google Scholar 

  23. Schwarz, G.: Estimating the dimension of a model. The Annals of Statistics 6 (1978) 461–464

    Article  MATH  MathSciNet  Google Scholar 

  24. Stolcke, A., Omohundro, S.: Hidden Markov Model induction by Bayesian model merging. In Hanson, S., Cowan, J., Giles, C., eds.: Advances in Neural Information Processing Systems. Volume 5., Morgan Kaufmann, San Mateo, CA (1993) 11–18

    Google Scholar 

  25. Cheeseman, P., Stutz, J.: Bayesian classification (autoclass): Theory and results. In: Advances in Knowledge discovery and data mining. (1996) 153–180

    Google Scholar 

  26. Vapnik, V.: Statistical Learning Theory. John Wiley, New York (1998)

    MATH  Google Scholar 

  27. Dubnov, S., El-Yaniv, R., Gdalyahu, Y., Schneidman, E., Tishby, N., Yona, G.: A new nonparametric pairwise clustering algorithm based on iterative estimation of distance profiles. Machine Learning 47 (2002) 35–61

    Article  MATH  Google Scholar 

  28. Theodoridis, S., Koutroumbas, K.: Pattern Recognition. Academic Press (1999)

    Google Scholar 

  29. Bicego, M., Murino, V.: Investigating Hidden Markov Models’ capabilities in 2D shape classification. Submitted for publication (2002)

    Google Scholar 

  30. Sebastian, T., Klein, P., Kimia, B.: Recognition of shapes by editing Shock Graphs. In: Proc. Int Conf. Computer Vision. (2001) 755–762

    Google Scholar 

  31. Bicego, M., Murino, V., Figueiredo, M.: Similarity-based classification of sequences using hidden Markov models (2002) Submitted for publication.

    Google Scholar 

  32. Law, M., A.K. Jain, Figueiredo, M.: Feature selection in mixture-based clustering. In: Neural Information Processing Systems — NIPS’2002, Vancouver (2002)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bicego, M., Murino, V., Figueiredo, M.A.T. (2003). Similarity-Based Clustering of Sequences Using Hidden Markov Models. In: Perner, P., Rosenfeld, A. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2003. Lecture Notes in Computer Science, vol 2734. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45065-3_8

Download citation

  • DOI: https://doi.org/10.1007/3-540-45065-3_8

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40504-7

  • Online ISBN: 978-3-540-45065-8

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics