Speech and Handwriting Recognition

Camastra, Francesco; Vinciarelli, Alessandro

doi:10.1007/978-1-4471-6735-8_12

Part of the book series: Advanced Information and Knowledge Processing (AI&KP)

Abstract

What the reader should know to understand this chapter:
• Hidden Markov models (Chap. 10).
• Language models (Chap. 10).
• Bayes decision theory (Chap. 3).

Notes

1. At the time this book is being written, the package can be downloaded at http://htk.eng.cam.ac.uk.
2. In this case, the decoding takes into account the fact that each sample corresponds to a single word and does not try to align the data with more than one word. This avoids the deletion and insertion errors explained later in the chapter.
3. Call routing is the problem of automatically finding an operator capable of addressing the needs expressed by a person contacting a call center. In this case, a perfect transcription is not necessary; it is enough to recognize the few keywords that identify the user's needs and the right operator.
4. The data is publicly available and can be downloaded from the following FTP address: ftp.eng.cam.ac.uk/pub/data.
5. At the time this book is being written, the proceedings are available online at http://nist.trec.gov.

Author information

Authors and Affiliations

Department of Science and Technology, Parthenope University of Naples, Naples, Italy
Francesco Camastra
School of Computing Science and the Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, UK
Alessandro Vinciarelli

Corresponding author

Correspondence to Francesco Camastra.

About this chapter

Cite this chapter

Camastra, F., Vinciarelli, A. (2015). Speech and Handwriting Recognition. In: Machine Learning for Audio, Image and Video Analysis. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/978-1-4471-6735-8_12

DOI: https://doi.org/10.1007/978-1-4471-6735-8_12
Published: 22 July 2015
Publisher Name: Springer, London
Print ISBN: 978-1-4471-6734-1
Online ISBN: 978-1-4471-6735-8
eBook Packages: Computer Science, Computer Science (R0)