Prototypes and Demonstrators

  • Alejandro Héctor Toselli
  • Enrique Vidal
  • Francisco Casacuberta


This chapter presents several fully working prototypes and demonstrators of multimodal interactive pattern recognition applications. These systems serve as validating examples of the approaches proposed and described throughout this book. In particular, they are designed to enable true human–computer interaction on selected tasks.

We first describe the different protocols that were tested, namely Passive Left-to-Right, Passive Desultory, and Active. Each demonstrator is then described in enough detail to give the reader an overview of the underlying technologies. The prototypes covered in this chapter address transcription of text images (IHT, GIDOC), machine translation (IMT), speech transcription (IST), text generation (ITG), and image retrieval (RISE). For most of these prototypes, evaluation measures are reported that quantify the reduction in user effort achieved by the end of the process. Finally, some of these demonstrators have web-based versions, whose addresses are included so that the reader can try out the implemented applications.


Keywords: Application Program Interface, Text Line, User Feedback, Parse Tree, Late Fusion



Copyright information

© Springer-Verlag London Limited 2011

Authors and Affiliations

  • Alejandro Héctor Toselli (1)
  • Enrique Vidal (1)
  • Francisco Casacuberta (1)
  1. Instituto Tecnológico de Informática, Universidad Politécnica de Valencia, Valencia, Spain
