Part of the book series: Advanced Information and Knowledge Processing ((AI&KP))

This chapter presents speech and handwriting recognition, two major applications of the Markovian models described in Chapter 10. The goal is not only to present some of the most widely investigated applications in the literature, but also to show how the same machine learning techniques can be applied to recognize apparently different data such as handwritten word images and speech recordings. In fact, the only differences between handwriting and speech recognition systems concern the so-called front-end, i.e. the low-level processing steps dealing directly with the raw data (see Section 12.2 for more details). Once the raw data have been converted into sequences of vectors, the same recognition approach, based on hidden Markov models and N-grams, is applied to both problems and no further domain-specific knowledge is needed. The possibility of dealing with different data using the same approach is one of the main advantages of machine learning: it makes it possible to work on a wide spectrum of problems even in the absence of deep problem-specific knowledge.
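The split between a data-specific front-end and a shared recognizer can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the chapter: the two toy front-ends (`speech_front_end`, `handwriting_front_end`), their features, and the model parameters are hypothetical, and the N-gram language model is left out. The only point is that both front-ends emit sequences of vectors, so one HMM scoring routine (the forward algorithm with diagonal Gaussian emissions) can serve both.

```python
import math

def speech_front_end(samples, frame_len=4):
    """Toy speech front-end: one [log-energy, zero-crossing rate] vector per frame."""
    vectors = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
        vectors.append([math.log(energy + 1e-10), crossings / (frame_len - 1)])
    return vectors

def handwriting_front_end(image, window_width=2):
    """Toy handwriting front-end: one [ink density, vertical centre] vector per window."""
    height, width = len(image), len(image[0])
    vectors = []
    for left in range(0, width - window_width + 1, window_width):
        ink_rows = [r for r in range(height)
                    for c in range(left, left + window_width) if image[r][c]]
        density = len(ink_rows) / (height * window_width)
        centre = (sum(ink_rows) / len(ink_rows)) / (height - 1) if ink_rows else 0.5
        vectors.append([density, centre])
    return vectors

def safe_log(p):
    """Logarithm that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf

def log_sum_exp(values):
    """Numerically stable log of a sum of exponentials."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def gaussian_log_density(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - mu) ** 2 / v)
               for xi, mu, v in zip(x, mean, var))

def hmm_log_likelihood(observations, log_start, log_trans, means, variances):
    """Forward algorithm in the log domain; knows nothing about the data source."""
    states = range(len(log_start))
    alpha = [log_start[s] + gaussian_log_density(observations[0], means[s], variances[s])
             for s in states]
    for x in observations[1:]:
        alpha = [log_sum_exp([alpha[p] + log_trans[p][s] for p in states])
                 + gaussian_log_density(x, means[s], variances[s])
                 for s in states]
    return log_sum_exp(alpha)

# Hypothetical two-state left-to-right model with hand-picked parameters.
log_start = [safe_log(1.0), safe_log(0.0)]
log_trans = [[safe_log(0.6), safe_log(0.4)],
             [safe_log(0.0), safe_log(1.0)]]
means = [[0.0, 0.3], [0.5, 0.6]]
variances = [[1.0, 1.0], [1.0, 1.0]]

# Two different kinds of raw data, each reduced to a sequence of 2-D vectors.
speech_vectors = speech_front_end([0.1, -0.2, 0.3, -0.1, 0.5, 0.4, -0.6, 0.2])
image = [[0, 1, 1, 0],
         [1, 1, 0, 0],
         [0, 0, 1, 1]]
writing_vectors = handwriting_front_end(image)

# The same scoring routine handles both sequences.
speech_score = hmm_log_likelihood(speech_vectors, log_start, log_trans, means, variances)
writing_score = hmm_log_likelihood(writing_vectors, log_start, log_trans, means, variances)
print(speech_score, writing_score)
```

In a real system the parameters would be trained separately for each task and each class to be recognized, and the features would be far richer. The sketch reuses one hand-set model purely to keep the example short.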

Copyright information

© 2008 Springer

About this chapter

Cite this chapter

(2008). Speech and Handwriting Recognition. In: Machine Learning for Audio, Image and Video Analysis. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/978-1-84800-007-0_12

  • DOI: https://doi.org/10.1007/978-1-84800-007-0_12

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84800-006-3

  • Online ISBN: 978-1-84800-007-0

  • eBook Packages: Computer Science, Computer Science (R0)
