Similarity-Based Clustering of Sequences Using Hidden Markov Models

Bicego, Manuele; Murino, Vittorio; Figueiredo, Mário A. T.

doi:10.1007/3-540-45065-3_8

Manuele Bicego⁵,
Vittorio Murino⁵ &
Mário A. T. Figueiredo⁶

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 2734))

Included in the following conference series:

International Workshop on Machine Learning and Data Mining in Pattern Recognition

1691 Accesses
39 Citations

Abstract

Hidden Markov models constitute a widely employed tool for sequential data modelling; nevertheless, their use in the clustering context has been poorly investigated. In this paper a novel scheme for HMM-based sequential data clustering is proposed, inspired on the similarity-based paradigm recently introduced in the supervised learning context. With this approach, a new representation space is built, in which each object is described by the vector of its similarities with respect to a predeterminate set of other objects. These similarities are determined using hidden Markov models. Clustering is then performed in such a space. By way of this, the difficult problem of clustering of sequences is thus transposed to a more manageable format, the clustering of points (vectors of features). Experimental evaluation on synthetic and real data shows that the proposed approach largely outperforms standard HMM clustering schemes.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Jain, A., Dubes, R.: Algorithms for clustering data. Prentice Hall (1988)
Google Scholar
Rabiner, L.: A tutorial on Hidden Markov Models and selected applications in speech recognition. Proc. of IEEE 77 (1989) 257–286
Article Google Scholar
Jain, A., Zongker, D.: Representation and recognition of handwritten digits using deformable templates. IEEE Trans. Pattern Analysis and Machine Intelligence 19 (1997) 1386–1391
Article Google Scholar
Graepel, T., Herbrich, R., Bollmann-Sdorra, P., Obermayer, K.: Classification on pairwise proximity data. In M. Kearns, S. Solla D.C., ed.: Advances in Neural Information Processing. Volume 11., MIT Press (1999)
Google Scholar
Jacobs, D., Weinshall, D.: Classification with nonmetric distances: Image retrieval and class representation. IEEE Trans. Pattern Analysis and Machine Intelligence 22 (2000) 583–600
Article Google Scholar
Pekalska, E., Duin, R.: Automatic pattern recognition by similarity representations. Electronics Letters 37 (2001) 159–160
Article Google Scholar
Pekalska, E., Paclik, P., Duin, R.: A generalized kernel approach to dissimilaritybased classification. Journal of Machine Learning Research 2 (2002) 175–211
Article MATH MathSciNet Google Scholar
Pekalska, E., Duin, R.: Dissimilarity representations allow for building good classifiers. Pattern Recognition Letters 23 (2002) 943–956
Article MATH Google Scholar
Smyth, P.: Clustering sequences with hidden Markov models. In Mozer, M., Jordan, M., Petsche, T., eds.: Advances in Neural Information Processing. Volume 9., MIT Press (1997)
Google Scholar
Panuccio, A., Bicego, M., Murino, V.: A Hidden Markov Model-based approach to sequential data clustering. In Caelli, T., Amin, A., Duin, R., Kamel, M., de Ridder, D., eds.: Structural, Syntactic and Statistical Pattern Recognition. LNCS 2396, Springer (2002) 734–742
Chapter Google Scholar
Rabiner, L., Lee, C., Juang, B., Wilpon, J.: HMM clustering for connected word recognition. In: Proc. of IEEE ICASSP. (1989) 405–408
Google Scholar
Lee, K.: Context-dependent phonetic hidden Markov models for speakerindependent continuous speech recognition. IEEE Transactions on Acoustics, Speech and Signal Processing 38 (1990) 599–609
Article Google Scholar
Kosaka, T., Matsunaga, S., Kuraoka, M.: Speaker-independent phone modeling based on speaker-dependent hmm’s composition and clustering. In: Int. Proc. on Acoustics, Speech, and Signal Processing. Volume 1. (1995) 441–444
Google Scholar
Bahlmann, C., Burkhardt, H.: Measuring hmm similarity with the bayes probability of error and its application to online handwriting recognition. In: Proc. Int. Conf. Document Analysis and Recognition. (2001) 406–411
Google Scholar
Cadez, I., Gaffney, S., Smyth, P.: A general probabilistic framework for clustering individuals. In: Proc. of ACM SIGKDD 2000. (2000)
Google Scholar
Law, M., Kwok, J.: Rival penalized competitive learning for model-based sequence. In: Proc. Int. Conf. Pattern Recognition. Volume 2. (2000) 195–198
Google Scholar
Xu, L., Krzyzak, A., Oja, E.: Rival penalized competitive learning for clustering analysis, RBF nets, and curve detection. IEEE Trans. on Neural Networks 4 (1993) 636–648
Article Google Scholar
Li, C.: A Bayesian Approach to Temporal Data Clustering using Hidden Markov Model Methodology. PhD thesis, Vanderbilt University (2000)
Google Scholar
Li, C., Biswas, G.: Clustering sequence data using hidden Markov model representation. In: Proc. of SPIE’99 Conf. on Data Mining and Knowledge Discovery: Theory, Tools, and Technology. (1999) 14–21
Google Scholar
Li, C., Biswas, G.: A bayesian approach to temporal data clustering using hidden Markov models. In: Proc. Int. Conf. on Machine Learning. (2000) 543–550
Google Scholar
Li, C., Biswas, G.: Applying the Hidden Markov Model methodology for unsupervised learning of temporal data. Int. Journal of Knowledge-based Intelligent Engineering Systems 6 (2002) 152–160
Google Scholar
Li, C., Biswas, G., Dale, M., Dale, P.: Matryoshka: A HMM based temporal data clustering methodology for modeling system dynamics. Intelligent Data Analysis Journal in press (2002)
Google Scholar
Schwarz, G.: Estimating the dimension of a model. The Annals of Statistics 6 (1978) 461–464
Article MATH MathSciNet Google Scholar
Stolcke, A., Omohundro, S.: Hidden Markov Model induction by Bayesian model merging. In Hanson, S., Cowan, J., Giles, C., eds.: Advances in Neural Information Processing Systems. Volume 5., Morgan Kaufmann, San Mateo, CA (1993) 11–18
Google Scholar
Cheeseman, P., Stutz, J.: Bayesian classification (autoclass): Theory and results. In: Advances in Knowledge discovery and data mining. (1996) 153–180
Google Scholar
Vapnik, V.: Statistical Learning Theory. John Wiley, New York (1998)
MATH Google Scholar
Dubnov, S., El-Yaniv, R., Gdalyahu, Y., Schneidman, E., Tishby, N., Yona, G.: A new nonparametric pairwise clustering algorithm based on iterative estimation of distance profiles. Machine Learning 47 (2002) 35–61
Article MATH Google Scholar
Theodoridis, S., Koutroumbas, K.: Pattern Recognition. Academic Press (1999)
Google Scholar
Bicego, M., Murino, V.: Investigating Hidden Markov Models’ capabilities in 2D shape classification. Submitted for publication (2002)
Google Scholar
Sebastian, T., Klein, P., Kimia, B.: Recognition of shapes by editing Shock Graphs. In: Proc. Int Conf. Computer Vision. (2001) 755–762
Google Scholar
Bicego, M., Murino, V., Figueiredo, M.: Similarity-based classification of sequences using hidden Markov models (2002) Submitted for publication.
Google Scholar
Law, M., A.K. Jain, Figueiredo, M.: Feature selection in mixture-based clustering. In: Neural Information Processing Systems — NIPS’2002, Vancouver (2002)
Google Scholar

Download references

Author information

Authors and Affiliations

Dipartimento di Informatica, Università di Verona, Ca’ Vignal 2, Strada Le Grazie 15, 37134, Verona, Italy
Manuele Bicego & Vittorio Murino
Instituto de Telecomunicações, Instituto Superior Técnico, 1049-001, Lisboa, Portugal
Mário A. T. Figueiredo

Authors

Manuele Bicego
View author publications
You can also search for this author in PubMed Google Scholar
Vittorio Murino
View author publications
You can also search for this author in PubMed Google Scholar
Mário A. T. Figueiredo
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Institute of Computer Vision and Applied Computer Sciences, Arndtstr. 4, 04275, Leipzig, Germany
Petra Perner
Center for Automation Research, University of Maryland, College Park, Maryland, 20742-3275, USA
Azriel Rosenfeld

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Bicego, M., Murino, V., Figueiredo, M.A.T. (2003). Similarity-Based Clustering of Sequences Using Hidden Markov Models. In: Perner, P., Rosenfeld, A. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2003. Lecture Notes in Computer Science, vol 2734. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45065-3_8

Download citation

DOI: https://doi.org/10.1007/3-540-45065-3_8
Published: 24 June 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40504-7
Online ISBN: 978-3-540-45065-8
eBook Packages: Springer Book Archive

Publish with us

Policies and ethics