Data Driven Approaches to Speech and Language Processing

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3445)

Abstract

Speech and language processing systems can be categorised according to whether they rely on predefined linguistic information and rules, or are data-driven and therefore exploit machine learning techniques to automatically extract and process relevant units of information, which are then indexed and retrieved as appropriate. As an example, most state-of-the-art automatic speech processing systems rely on a representation based on predefined phonetic symbols. The use of language-dependent representations, whilst linguistically intuitive, has several drawbacks, notably limited portability across languages and long development time. Therefore, in this article, we review and present our recent experiments exploiting the idea inherent in the ALISP (Automatic Language Independent Speech Processing) approach, with particular respect to speech processing, where the intermediate representation between the acoustic and linguistic levels is automatically inferred from speech data. We then present prospective directions in which the ALISP principles could be exploited in different domains such as audio, speech, text, image and video processing.

References

  1. Abe, M., Nakamura, S., Shikano, K., Kuwabara, H.: Voice Conversion Through Vector Quantization. In: Proceedings ICASSP, New York, pp. 565–568 (1988)

    Google Scholar 

  2. Aho, A.V.: Data Structures and Algorithms. Addison-Wesley, Reading (1983)

    MATH  Google Scholar 

  3. Ahlbom, G., Bimbot, F., Chollet, G.: Modeling Spectral Speech Transitions using Temporal Decomposition Techniques. In: Proceedings IEEE ICASSP, Dallas, pp. 13–16 (1987)

    Google Scholar 

  4. Aleksic, P., Williams, J., Katsaggelos, A.: Speech-To-Video Synthesis Using MPEG-4 Compliant Visual Features. IEEE Trans. Circuits and Systems for Video Technology 14(5), 682–692 (2004)

    Article  Google Scholar 

  5. Ammicht, E., Gorin A.L., Alonso T.: Knowledge Collection for Natural Spoken Dialog Systems. In: Proceedings EUROSPEECH, Budapest, Hungary (1999).

    Google Scholar 

  6. Atal B.: Efficient Coding of LPC Parameters by Temporal Decomposition. In: Proceedings ICASSP, pp. 81–84 (1983)

    Google Scholar 

  7. Baudoin, G., Cernocky, J., Chollet, G.: Quantization of Spectral Sequences using Variable Length Spectral Segments for Speech Coding at Very Low Bit Rate. In: Proceedings EUROSPEECH, Rhodes, pp. 1295–1298 (1997)

    Google Scholar 

  8. Baudoin, G., Cernocky, J., Gournay, P., Chollet, G.: Codage de la parole à bas et très bas débit. Annales des télécommunications 55, 462–482 (2000)

    Google Scholar 

  9. Baudoin, G., Cernocky, J., El Chami, F., Charbit, M., Chollet, G., Petrovska- Delacretaz, D.: Advances in Very Low Bit Rate Speech Coding using Recognition and Synthesis Techniques. In: Proceedings of the 5th Text, Speech and Dialog Workshop, Brno, pp. 269–276. Czech Republic (2002) ISBN 3-540-44129-8

    Google Scholar 

  10. Bayer, R., Unterauer, K.: Prefix B-Trees. ACM Transactions on Database Systems 2(1), 11–26 (1977)

    Article  Google Scholar 

  11. Berger, A., Brown, P., Della Pietra, S., Della Pietra, V., Gillett, J., Lafferty, J., Mercer, R., Printz, H., Ures, L.: The Candide System for Machine Translation. In: Proceedings of the ARPA Workshop on Human Language Technology (1994)

    Google Scholar 

  12. Bimbot, F., Chollet, G., Deleglise, P., Montacié, C.: Temporal Decomposition and Acoustic-Phonetic decoding of Speech. In: Proceedings IEEE ICASSP, New York, pp. 445–448 (1988)

    Google Scholar 

  13. Bimbot, F., Deleglise, P., Chollet, G.: Speech Synthesis by Structured Segments using Temporal Decomposition. In: Proceedings EUROSPEECH, Paris, pp. 183–186 (1989)

    Google Scholar 

  14. Bimbot, F., Pieraccini, R., Levin, E., Atal, B.: Variable Length Sequence Modelling: Multigrams. IEEE Signal Processing Letters 2(6), 111–113 (1995)

    Article  Google Scholar 

  15. Black, E., Jelinek, F., Lafferty, J.D., Magerman, D.M., Mercer, R.L., Roukos, S.: Towards History-Based Grammars: Using Richer Models for Probabilistic Parsing. In: Proceedings DARPA Speech and Natural Language Workshop, Harriman, NY, pp. 134–139 (1992)

    Google Scholar 

  16. Black, A., Brown, R.D., Frederking, R., Singh, R., Moody, J., Steinbrecher, E.: TONGUES: Rapid Development of a Speech-to-Speech Translation System. In: Proceedings of HLT 2002: Second International Conference on Human Language Technology Research, San Diego, CA , pp. 24–27 (2002)

    Google Scholar 

  17. Blouet, R., Mokbel, C., Mokbel, H., Sanchez-Soto, E., Chollet, G., Greige, H.: BECARS: A Free Software for Speaker Verification. In: Proceedings ODYSSEY 2004 - The Speaker and Language Recognition Workshop, Toledo, Spain, pp. 145–148 (2004)

    Google Scholar 

  18. Bregler, C., Covell, M., Slaney, M.: Video Rewrite: Driving Visual Speech with Audio. In: Proceedings ACM SIGGRAPH 1997 (1997)

    Google Scholar 

  19. Brown, P.F., Della Pietre, S.A., Della Pietra, V.J., Mercer, R.: Word-Sense Disambiguation using Statistical Methods. In: Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics, Berkeley, CA, pp. 264–270 (1991)

    Google Scholar 

  20. Brown, P.F., Cocke, J., Della Pietra, S.A., Della Pietra, V.J., Jelinek, F., Mercer, R., Roossin, P.: A Statistical Approach to Language Translation. In: Coling Budapest: Proceedings of the 12th International Conference on Computational Linguistics, Budapest, Hungary, pp. 71–77 (1998)

    Google Scholar 

  21. Brown, P.F., Cocke, J., Della Pietra, S.A., Della Pietra, V.J., Jelinek, F., Lafferty, J., Mercer, R.L., Roossin, P.S.: A Statistical Approach to Machine Translation. Computational Linguistics 16, 79–85 (1990)

    Google Scholar 

  22. Brown, P.F., Della Pietra, S.A., Della Pietra, V.J., Mercer, R.L.: The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics 19, 263–311 (1993)

    Google Scholar 

  23. Brown, R.D.: Example-Based Machine Translation in the PANGLOSS System. In: COLING 1996: The 16th International Conference on Computational Linguistics, Copenhagen, Denmark, pp. 169–174 (1996)

    Google Scholar 

  24. Brown, R.D.: Automated Dictionary Extraction for Knowledge-Free Example- Based Translation. In: Proceedings of the 7th International Conference on Theoretical and Methodological Issues in Machine Translation, Santa Fe, New Mexico, pp. 111–118 (1997)

    Google Scholar 

  25. Brown, R.D., Frederking, R.E.: Applying Statistical Language Modelling to Symbolic Machine Translation. In: Proceedings of the Sixth International Conference on Theoretical and Methodological Issues in Machine Translation, Leuven, Belgium, pp. 354–372 (1995)

    Google Scholar 

  26. Cappe, O., Stylianou, Y., Moulines, E.: Statistical Methods For Voice Quality Transformation. In: Proceedings of EUROSPEECH 1995, Madrid, Spain, pp. 447–450 (1995)

    Google Scholar 

  27. Carpenter, G., Grossberg, S.: A Massively Parallel Architecture for a Self- Organizing Neural Pattern Recognition Machine. Proceedings of Computer Vision, Graphics and Image Processing 37, 54–115 (1987)

    Article  Google Scholar 

  28. Casacuberta, F., Vidal, E., Vilar, J.-M.: Architectures for Speech-to-Speech Translation using Finite-State Models. In: Proceedings of the Workshop on Speech-to- Speech Translation: Algorithms and Systems, Philadelphia, pp. 39–44 (2002)

    Google Scholar 

  29. Cernocky, J., Baudoin, G., Chollet, G.: Speech Spectrum Representation and Coding using Multigrams with Distance. In: Proceedings IEEE ICASSP, Munich, pp. 1343–1346 (1997)

    Google Scholar 

  30. Cernocky, J., Baudoin, G., Chollet, G.: Segmental Vocoder - Going Beyond the Phonetic Approach. In: Proceedings IEEE ICASSP, Seattle, pp. 605–608 (1998) ISBN 0-7803-4428-6

    Google Scholar 

  31. Cernocky, J., Baudoin, G., Chollet, G.: Very Low Bit Rate Segmental Speech Coding using Automatically Derived Units. In: Proceedings RADIOELEKTRONIKA, Brno, Czech Republic, pp. 224–227 (1998) ISBN 80-214-0983-5

    Google Scholar 

  32. Cernocky, J., Petrovska-Delacretaz, D., Pigeon, S., Verlinde, P., Chollet, G.: A Segmental Approach to Text-Independent Speaker Verification. In: Proceedings EUROSPEECH, Budapest, vol. 5, pp. 2203–2206 (1999)

    Google Scholar 

  33. Cernocky, J., Kopecek I., Baudoin, G., Chollet, G.: Very Low Bit Rate Speech Coding: Comparison of Data-Driven Units with Syllable Segments. In: Proceedings of the Text, Speech and Dialog Workshop, Pilsen, Czech Republic, pp. 257–262 (1999) ISBN 3-540- 66494-7

    Google Scholar 

  34. Cernocky, J., Baudoin, G., Petrovska-Delacretaz, D., Chollet, G.: Vers une analyse acoustico-phonétique de la parole indépendante de la langue, basée sur ALISP. Revue Parole 17, 191–226 (2001) ISSN 1373-1955

    Google Scholar 

  35. Charniak, E.: Statistical Language Learning. MIT Press, Cambridge (1993)

    Google Scholar 

  36. Charniak, E.: Statistical Parsing with a Context-Free Grammar and Word Statistics. In: Proceedings of the 14th National Conference on Artificial Intelligence (AAAI 1997), Menlo Park, CA, pp. 598–603 (1997)

    Google Scholar 

  37. Chollet, G., Galliano, J.-F., Lefevre, J.-P., Viara, E.: On the Generation and Use of a Segment Dictionary for Speech Coding, Synthesis and Recognition. In: Proceedings IEEE ICASSP, Boston, pp. 1328–1331 (1983)

    Google Scholar 

  38. Chollet, G., Grenier, Y., Marcus, S.: Segmentation and Non-Stationary Modeling of Speech. In: Proceedings EUSIPCO, The Hague (1986)

    Google Scholar 

  39. Chollet, G., Cernocky, J., Constantinescu, A., Deligne, S., Bimbot, F.: Toward ALISP: Automatic Language Independent Speech Processing. In: Ponting, K., Moore, R. (eds.) Computational Models for Speech Pattern Processing, pp. 375–387. Springer, Heidelberg (1999) ISBN 3-540-65478-X

    Google Scholar 

  40. Chollet, G., Cernocky, J., Gravier, G., Hennebert, J., Petrovska-Delacretaz, D., Yvon, F.: Toward Fully Automatic Speech Processing Techniques for Interactive Voice Servers. In: Chollet, G., Di Benedetto, M.-G., Esposito, A., Marinaro, M. (eds.) Speech Processing, Recognition and Artificial Neural Networks, Springer, Heidelberg (1999)

    Google Scholar 

  41. Chollet, G., Cernocky, J., Baudoin, G.: Unsupervised Learning for Very Low Bit Rate Coding. In: Proceedings of SCI-ISAS 2000, Orlando (2000)

    Google Scholar 

  42. Chu-Carroll, J., Carpenter, B.: Vector-based Natural Language Call Routing. Computational Linguistcs 25(3), 361–388 (1999)

    Google Scholar 

  43. Church, K.: A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text. In: Proceedings Second Conference on Applied Natural Language Processing, ACL, Austin, Texas, pp. 136–143 (1988)

    Google Scholar 

  44. Collins, B., Cunningham, P.: Adaptation Guided Retrieval in EBMT: A Case- Based Approach to Machine Translation. In: Smith, I., Faltings, B.V. (eds.) EWCBR 1996. LNCS, vol. 1168, pp. 91–104. Springer, Heidelberg (1996)

    Chapter  Google Scholar 

  45. Cutting, D., Pedersen, J.: Optimizations for Dynamic Inverted Index Maintenance. In: Proceedings 13th International Conference on Research and Development in Information Retrieval, Brussels, Belgium, pp. 405–411 (1990)

    Google Scholar 

  46. Cutting, D., Kupiec, J., Pedersen, J., Sibun, P.: A Practical Part-of-Speech Tagger. In: Third Conference on Applied Natural Language Processing, Trento, Italy, pp. 133–140 (1992)

    Google Scholar 

  47. Daelemans, W., Zavrel, J., Berck, S.: MBT: A Memory Based Part of Speech Tagger-Generator. In: Proceedings of the 4th Workshop on Very Large Corpora, Copenhagen, Denmark, pp. 14–27 (1996)

    Google Scholar 

  48. Dagan, I., Perreira, F., Lee, L.: Similarity Based Estimation ofWord Co-occurence Probabilities. In: Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico, pp. 272–278 (1994)

    Google Scholar 

  49. Damper, R.I. (ed.): Data Driven Techniques in Speech Synthesis. Kluwer, Dordrecht (2001)

    Google Scholar 

  50. Deligne, S., Bimbot, F.: Language Modeling by Variable Length Sequences: Theoretical Formulation and Evaluation of Multigrams. In: Proceedings ICASSP, Munich, pp. 1731–1734 (1997)

    Google Scholar 

  51. Deligne, S., Bimbot, F.: Inference of Variable-length Linguistic and Acoustic Units by Multigrams. Speech Communication 23, 223–241 (1997)

    Article  Google Scholar 

  52. Deligne, S., Yvon, F., Bimbot, F.: Introducing Statistical Dependencies and Structural Constraints in Variable-Length Sequence Models. In: Proceedings of the 3rd International Colloquium on Grammatical Inference: Learning Syntax from Sentences, Montpellier, France, pp. 156–167 (1996)

    Google Scholar 

  53. Doddington, G., Martin, A., Przybocki, M., Reynolds, D.: The NIST Speaker Recognition Evaluation - Overview, Methodology, Systems, Results, Perspectives. Speech Communications 31(2-3), 225–254 (2000)

    Article  Google Scholar 

  54. Dorr, B. J., Jordan, P. W., Benoit, J. W.: A Survey of Current Paradigms in Machine Translation. Technical Report: LAMP-TR-027, UMIACS-TR-98-72, CSTR- 3961, University of Maryland, College Park (December 1998)

    Google Scholar 

  55. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. John Wiley and Sons, Chichester (2001)

    MATH  Google Scholar 

  56. Du Jeu, C., Charbit, M., Chollet, G.: Very Low Rate Speech Compression by Indexation of Polyphones. In: Proceedings of EUROSPEECH, Geneva, pp. 1085–1088 (2003)

    Google Scholar 

  57. Eatock, J.P., Mason, J.S.: A Quantitative Assessment of the Relative Speaker Discriminant Properties of Phonemes. In: Proceedings ICASSP, vol. 1, pp. 133–136 (1994)

    Google Scholar 

  58. El Hannani, A., Petrovska-Delacretaz, D., Chollet, G.: Linear and Non-linear Fusion of ALISP- and GMM-Based Systems for Text-Independent Speaker Verification. In: Proceedings of ISCA Workshop: A Speaker Odyssey, Toledo, Spain, pp. 111–116 (2004)

    Google Scholar 

  59. Farinas, J., Obrecht, R.A.: Modélisation phonotactique de grandes classes phonétiques en vue d’une approche différenciée en identification automatique des langues. In: Proceedings 18ème colloque GRETSI sur le traitement du signal et des images, Toulouse, France (2001)

    Google Scholar 

  60. Frakes, W.B., Baeza-Yates, R.: Information Retrieval: Data Structures and Algorithms. Prentice-Hall, Englewood Cliffs (1992)

    Google Scholar 

  61. Fukunaga, K.: Statistical Pattern Recognition, 2nd edn. Academic Press, London (1990)

    MATH  Google Scholar 

  62. Gailly, J.-L., Nelson, M.: The Data Compression Book. John Wiley and Sons, Chichester (1995)

    Google Scholar 

  63. Gale, W., Church, K.W., Yarowsky, D.: Work on Statistical Methods for Word Sense Disambiguation. In: Proceedings of the AAAI Fall Symposium: Probabilistic Approaches to Natural Language, Cambridge, MA, pp. 54–60 (1992)

    Google Scholar 

  64. Gonnet, G.H., Baeza-Yates, R.: Handbook of Algorithms and Data Structures, 2nd edn. Addison-Wesley, Reading (1991)

    Google Scholar 

  65. Gorin, A.L., Petrovska-Delacrétaz, D., Riccardi, G., Wright, J.H.: Learning Spoken Language without Transcriptions. In: Proceedings IEEE Workshop on Automatic Speech Recognition and Understanding (1999)

    Google Scholar 

  66. Gorin, A.L.: How I Help You? Speech Communication 23, 113–127 (1997)

    Article  Google Scholar 

  67. Gorin, A.L.: On Automated Language Acquisition. Journal of the Acoustical Society of America JASA 97(6), 3441–3461 (1995)

    Article  Google Scholar 

  68. Gorin, A.L., Levinson, S., Sankar, A.: An Experiment in Spoken Language Acquisition. Proceedings IEEE Transactions on Speech and Audio 2, 224–240 (1994)

    Article  Google Scholar 

  69. Haines, D., Croft, W.B.: Relevance Feedback and Inference Networks. In: Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval, Pittsburg, Penn, pp. 2–11 (1993)

    Google Scholar 

  70. Hankerson, D., Harris, G.A., Johnson, P.D.: Introduction to Information Theory and Data Compression. CRC Press, Boca Raton (2003)

    Book  MATH  Google Scholar 

  71. Harbeck, S., Ohler, U.: Multigrams for Language Identification. In: Proceedings EUROSPEECH, Budapest, Hungary (1999)

    Google Scholar 

  72. Harman, D., Baeza-Yates, R., Fox, E., Lee, W.: Inverted Files. In: Frakes, W.B., Baeza-Yates, R. (eds.) Information Retrieval: Data Structures and Algorithms, Prentice Hall, Englewood Cliffs (1992)

    Google Scholar 

  73. Ho, Y.: Application of Minimal Perfect Hashing in Main Memory Indexing. MITLCS-TM-508 (1994)

    Google Scholar 

  74. Jensen, F.V.: Bayesian Networks and Decision Graphs. Springer (2001)

    Google Scholar 

  75. Jelinek, F.: Self-Organized Language Modeling for Speech Recognition. In: Waibel, A., Lee, K.F. (eds.) Readings in Speech Recognition, pp. 450–506. Morgan Kaufmann Publishers, San Mateo (1990)

    Google Scholar 

  76. Jelinek, F.: Statistical Methods for Speech Recognition. MIT Press, Cambridge (1999)

    Google Scholar 

  77. Kain, A., Macon, M.W.: Spectral Voice Conversion for Text to Speech Synthesis. In: Proceedings ICASSP 88, New York, vol. 1, pp. 285–288 (1998)

    Google Scholar 

  78. Kain, A., Macon, M.W.: Design and Evaluation of a Voice Conversion Algorithm Based on Spectral Envelope Mapping and Residual Prediction. In: Proceedingsd ICASSP 2001, Salt Lake City, USA (2001)

    Google Scholar 

  79. Kaji, H., Kida, Y., Morimoto, Y.: Learning Translation Templates from Bilingual Text. In: Proceedings of the 14th Conference on Computational Linguistics, Nantes, France, vol. 2, pp. 672–678 (1992)

    Google Scholar 

  80. Karam, W., Mokbel, C., Aversano, G., Pelachaud, C., Chollet, G.: An Audiovisual Imposture Scenario by Talking Face Animation. In: Chollet, G., Esposito, A., Faundez, M., Marinaro, M. (eds.) Nonlinear Speech Processing: Algorithms and Analysis, Springer, Heidelberg (2005) (in this volume)

    Google Scholar 

  81. Knuth, D.E.: The Art of Computer Programming. Addison Wesley, Reading (1973)

    Google Scholar 

  82. Kohonen, T.: Self Organizing Maps. Springer, Heidelberg (1995)

    Google Scholar 

  83. Koza, J.R.: Genetic Programming. MIT Press, Cambridge (1992)

    MATH  Google Scholar 

  84. Kuo, H.-K.J., Lee, C.-H.: A Portability Study on Natural Language Call Steering. In: Proceedings EUROSPEECH, Aalborg, Denmark (2001)

    Google Scholar 

  85. Lamel, L.F, Gauvain, J.-L., Eskénazi, M.: BREF, A Large Vocabulary Spoken Corpus for French. In: Proceedings of the European Conference on Speech Technology, EUROSPEECH, pp. 505–508 (1991)

    Google Scholar 

  86. Laroche, J., Stylianou, Y., Moulines, E.: HNM: A Simple, Efficient Harmonic Plus Noise Model for Speech. In: Proceedings of IEEE ASSPWorkshop on Applications of Signal Processing to Audio and Acoustics (1993)

    Google Scholar 

  87. Lee, K.-S., Cox, R.V.: A Segmental Speech Coder Based on a Concatenative TTS. Speech Communication 38(1), 89–100 (2002)

    Article  MATH  MathSciNet  Google Scholar 

  88. Levenshtein, V.I.: Binary Codes Capable of Correcting Deletions, Insertions and Reversals. Cybernetics and Control Theory 10, 707–710 (1966)

    MathSciNet  Google Scholar 

  89. Levin, L., Lavie, A., Woszczyna, M., Gates, D., Gavaldá, M., Koll, D., Waibel, A.: The Janus-III Translation System: Speech-to-Speech Translation in Multiple Domains. Machine Translation Archive 15(1-2), 3–25 (2000)

    Article  MATH  Google Scholar 

  90. Lloyd-Thomas, H., Parris, E., Wright, J.W.: Recurrent Substrings and Data Fusion for Language Recognition. In: Proceedings ICSLP, Sydney, Australia (1998)

    Google Scholar 

  91. Lowrance, R., Wagner, R.A.: An Extension of the String-to-String Correction Problem. Journal of the Association of Computing Machinery 22(2), 177–183 (1975)

    MATH  MathSciNet  Google Scholar 

  92. Manning, C.D., Schutze, H.: Foundations of Statistical Natural Language Processing. MIT Press, Cambridge (1999)

    MATH  Google Scholar 

  93. Martin, A., Przybocki, M.: The NIST Speaker Recognition Evaluations: 1996-2001. In: Proceedings Odyssey 2001, Crete, Greece, pp. 39–42 (2001)

    Google Scholar 

  94. Marcu, D., Wong, W.: A Phrase-Based, Joint Probability Model for Statistical Machine Translation. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, Philadelphia, PA, pp. 133–139 (2002)

    Google Scholar 

  95. Mc-Tait, K.: Translation Patterns, Linguistic Knowledge and Complexity in an Approach to EBMT. In: Carl, M., Way, A. (eds.) Recent Advances in Example-Based Machine Translation, Kluwer Academic Press, Amsterdam (2003)

    Google Scholar 

  96. McTait, K.: Translation Pattern Extraction and Recombination for Example- Based Machine Translation. Ph.D. Thesis, University of Manchester Institute of Science and Technology, Manchester, UK (2001)

    Google Scholar 

  97. McTait, K., Trujillo, A.: A Language-Neutral Sparse-Data Algorithm for Extracting Translation Patterns. In: Proceedings of the 8th International Conference on Theoretical and Methodological Issues in Machine Translation TMI 1999, Chester, UK, pp. 98–108 (1999)

    Google Scholar 

  98. McTait, K., Olohan, M., Trujillo, A.: A Building Blocks Approach to Translation Memory. In: Proceedings of the 21st ASLIB International Conference on Translating and the Computer, London, UK (1999)

    Google Scholar 

  99. Melamed, I.D.: A Word-To-Word Model of Translation Equivalence. In: 35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics, Madrid, Spain, pp. 490–497 (1997)

    Google Scholar 

  100. Merialdo, B.: Tagging English Text with a Probabilistic Model. Computational Linguistics 20(2), 155–172 (1994)

    Google Scholar 

  101. Metze, F., McDonough, J., Soltau, H., Waibel, A., Lavie, A., Burger, S., Langley, C., Levin, L., Schultz, T., Pianesi, F., Cattoni, R., Lazzari, G., Mana, N., Pianta, E.: The NESPOLE! Speech-to-Speech Translation System. In: Proceedings of HLT 2002 Human Language Technology Conference, San Diego, CA (2002)

    Google Scholar 

  102. Mitchell, T.M.: Machine Learning. McGraw-Hill, New York (1997)

    MATH  Google Scholar 

  103. Mitchell, T.M.: Machine Learning and Data Mining. Communications of the ACM 42(11), 30–36 (1999)

    Article  Google Scholar 

  104. Morimoto, T., Takezawa, T., Yato, F., Sagayama, S., Tashiro, M., Nagata, M., Kurematsu, A.: ATR’s Speech Translation System: ASURA. In: Proceedings EUROSPEECH 1993, pp. 1291–1295 (1993)

    Google Scholar 

  105. Nagao, M.: A Framework of a Mechanical Translation between Japenese and English by Analogy Principle. In: Elithorn, A., Banerji, R. (eds.) Artificial and Human Intelligence., pp. 173–180. NATO Publications (1984)

    Google Scholar 

  106. Nakamura, S.: Fusion of Audio-Visual Information for Integrated Speech Processing. In: Bigun, J., Smeraldi, F. (eds.) AVBPA 2001. LNCS, vol. 2091, pp. 127–143. Springer, Heidelberg (2001)

    Chapter  Google Scholar 

  107. Navrátil, J.: Spoken Language Recognition: A Step Towards Multilinguality. IEEE Trans. Audio and Speech Processing 9(6), 678–685 (2001)

    Article  Google Scholar 

  108. Nevill-Manning, C.G.: Inferring Sequential Structure. PhD Thesis, Univ. of Waikato (1996)

    Google Scholar 

  109. Nirenburg, S., Beale, S., Domashnev, C.: A Full-Text Experiment in Example- Based Machine Translation. In: Proceedings of the International Conference on New Methods in Language Processing (NeMLaP), Manchester, UK, pp. 78–87 (1994)

    Google Scholar 

  110. Nirenburg, S., Domashnev, C., Grannes, D.J.: Two Approaches to Matching in Example-Based Machine Translation. In: Proceedings of the Fifth International Conference on Theoretical and Methodological Issues in Machine Translation, TMI 1993: MT in the Next Generation, Kyoto, Japan, pp. 47–57 (1993)

    Google Scholar 

  111. Olivier, D.C.: Stochastic Grammars and Language Acquisition Mechanism. Ph.D. Thesis, Harvard University (1968)

    Google Scholar 

  112. Pasquariello, S., Pelachaud, C.: Greta: A Simple Facial Animation Engine. In: 6th Online World Conference on Soft Computing in Industrial Applications, Session on Soft Computing for Intelligent 3D Agents (September 2001)

    Google Scholar 

  113. Perrot, P., Aversano, G., Chollet, G., Charbit, M.: Voice Forgery Using ALISP: Indexation in a Client Memory. To appear in proc. of ICASSP 2005

    Google Scholar 

  114. Petrovska-Delacrétaz, D., Černocký, J., Hennebert, J., Chollet, G.: Text-Independent Speaker Verification Using Automatically Labeled Acoustic Segments. In: ICLSP, Sydney, Australia (1998)

    Google Scholar 

  115. Petrovska-Delacretaz, D., Cernocky, J., Hennebert, J., Chollet, G.: Segmental Approaches to Automatic Speaker Verification. Digital Signal Processing: A Review Journal 10(1/2/3), 198–212 (2000)

    Article  Google Scholar 

  116. Petrovska-Delacrétaz, D., Gorin, A.L.,Wright, J.H., Riccardi G.: Detecting Acoustic Morphemes in Lattices for Spoken Language Understanding. In: Proceedings ICSLP, Beijing, China (2000)

    Google Scholar 

  117. Petrovska-Delacretaz, D., Gorin, A.L., Riccardi, G., Wright, J.H.: Detecting Acoustic Morphemes in Lattices for Spoken Language Understanding. In: Proceedings of ICASSP, Beijing, China (2000)

    Google Scholar 

  118. Petrovska-Delacretaz, D., Chollet, G.: Searching Through a Speech Memory for Efficient Coding, Recognition and Synthesis. In: Braun, A., Masthoff, H. (eds.) Phonetics and its Applications, pp. 453–464. Franz Steiner Verlag, Stuttgart (2002) ISBN 8094-5

    Google Scholar 

  119. Petrovska-Delacretaz, D., Abalo, M., El Hannani, A., Chollet, G.: Data-Driven Speech Segmentation for Speaker Verification and Language Identification. In: Proceedings of NOLISP, Le Croisic (2003)

    Google Scholar 

  120. Petrovska-Delacretaz, D., El Hannani, A., Chollet, G.: Searching through a Speech Memory for Text-Independent Speaker Verification. In: Kittler, J., Nixon, M.S. (eds.) AVBPA 2003. LNCS, vol. 2688, p. 84. Springer, Heidelberg (2003)

    Google Scholar 

  121. Pighin, F., Szeliski, R., Salesin, D.: Modeling and Animating Realistic Faces from Images. International Journal of Computer Vision 50(2), 143–169 (2002)

    Article  MATH  Google Scholar 

  122. Planas, E., Furuse, O.: Formalizing Translation Memory. In: Carl, M., Way, A. (eds.) Recent Advances in Example-Based Machine Translation., Kluwer Academic Press, Amsterdam (2003)

    Google Scholar 

  123. Prudon, R., d’Alessandro, C.: A Selection/Concatenation Text-to-Speech Synthesis System: Database Development, System Design, Comparative Evaluation. In: Proceedings of the 4th Speech Synthesis Workshop, Pitlochy, Scotland (2001)

    Google Scholar 

  124. Przybocki, M., Martin, A.: NIST’s Assessment of Text Independent Speaker Recognition Performance 2002. In: The Advent of Biometrics on the Internet, A COST 275 Workshop in Rome, Italy, November 7-8 (2002)

    Google Scholar 

  125. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA (1993)

    Google Scholar 

  126. Reynolds, D.A., Quatieri, T.F., Dunn, R.B.: Speaker Verification Using Adapted Gaussian Mixture Models. DSP, Special Issue on the NIST 1999 Evaluations 10(1-3), 19–41 (2000)

    Google Scholar 

  127. Ribeiro, C.M., Trancoso, I.M.: Improving Speaker Recognisability in Phonetic Vocoders. In: Proceedings of ICSLP, Sydney (1998)

    Google Scholar 

  128. Ribeiro, C.M., Trancoso, I.M.: Phonetic Vocoder Assessment. In: Proceedings ICSLP, Beijing, vol. 3, pp. 830–833 (2000)

    Google Scholar 

  129. Roy, D.: Learning Words from Sights and Sounds: A Computational Model. Ph.D. Thesis, MIT (1999)

    Google Scholar 

  130. Sadler, V., Vendelmans, R.: Pilot Implementation of a Bilingual Knowledge Bank. In: Proceedings of the 13th International Conference on Computational Linguistics, Helsinki, vol. 3, pp. 449–451 (1990)

    Google Scholar 

  131. Salton, G., McGill, M.S.: Introduction to Modern Information Retrieval. McGraw-Hill, New York (1983)

    MATH  Google Scholar 

  132. Sayood, K.: Introduction to Data Compression. Morgan Kaufmann, San Francisco (2000)

    Google Scholar 

  133. Shiraki, Y., Honda, M.: LPC Speech Coding based on VLSQ. Proceedings IEEE Trans. on ASSP 3(9) (1988)

    Google Scholar 

  134. Schroeter, J., Graf, H.P., Beutnagel, M., Cosatto, E., Syrdal, A., Conkie, A., Stylianou, Y.: Multimodal Speech Synthesis. In: Proceedings IEEE International Conference on Multimedia and Expo., NY, pp. 571–578 (2000)

    Google Scholar 

  135. Simard, P.Y., Le Cun, Y., Denker, J.S.: Memory Based Character Recognition using a Transformation Invariant Metric. In: Proceedings of ICPR, Jerusalem, pp. 262–267 (1994)

    Google Scholar 

  136. Simard, M., Langlais, P.: Sub-Sentential Exploitation of Translation Memories. In: MT Summit VIII: Machine Translation in the Information Age, Santiago de Compostela, Spain, pp. 335–339 (2001)

    Google Scholar 

  137. Simons, A., Cox, S.: Generation of Mouth Shapes for a Synthetic Talking Head. Proceedings Inst. Acoust. 12, 475–482 (1990)

    Google Scholar 

  138. Smith, T.C., Witten, I.H.: Learning Language using Genetic Algorithms. In: Wermter, S., Riloff, E., Scheler, G. (eds.) Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing, pp. 132–145. Springer, NY (1996)

    Google Scholar 

  139. Somers, H., McLean, I., Jones, D.: Experiments in Multilingual Example-Based Generation. In: Proceedings CSNLP 1994: 3rd Conference on the Cognitive Science of Natural Language Processing, Dublin, Ireland

    Google Scholar 

  140. Stolcke, A.: An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities. Computational Linguistics 21(2), 165–201 (1995)

    MathSciNet  Google Scholar 

  141. Stylianou, Y., Cappé, O., Moulines, E.: Statistical Methods for Voice Quality Transformation. In: Proceedings of EUROSPEECH, Madrid, pp. 447–450 (1995)

    Google Scholar 

  142. Stylianou, Y., Cappé, O., Moulines, E.: Continuous Probabilistic Transform for Voice Conversion. Proceedings IEEE Transactions on SAP 6(2), 131–142 (1998)

    Google Scholar 

  143. Suhm, B., Geutner, P., Kemp, T., Lavie, A., Mayfield, L., McNair, A.E., Rogina, I., Schultz, T., Sloboda, T., Ward, W., Woszczyna, M., Waibel, A.: JANUS: Towards Multilingual Spoken Language Translation. In: Proceedings ARPA Spoken Language Technology Workshop, Austin, TX (1995)

    Google Scholar 

  144. Sumita, E., Tsutsumi, Y.: A Translation Aid System Using Flexible Text Retrieval Based on Syntax-Matching. In: TMI 1988 Proceedings Supplement, Pittsburgh (1988) (pages not numbered)

    Google Scholar 

  145. Tamura, M., Masuko, T., Kobayashi, T., Tokuda, K.: Visual Speech Synthesis Based on Parameter Generation from HMM: Speech-Driven and Text-and-Speech- Driven Approaches. In: Proceedings Auditory-Visual Speech Processing (1998)

  146. Thomas, H.L., Parris, E., Wright, J.: Recurrent Substrings and Data Fusion for Language Recognition. In: Proceedings ICASSP 2000, Istanbul, Turkey, vol. 2, pp. 169–173 (2000)

  147. Tomokiyo, M., Chollet, G.: A Proposal to Represent Speech Control Mechanisms within the Universal Networking Digital Language. In: Proceedings of the International Conference on the Convergence of Knowledge, Culture, Language and Information Technologies, Alexandria, Egypt (2003)

  148. Turcato, D.: Automatically Creating Bilingual Lexicons for Machine Translation from Bilingual Text. In: Proceedings COLING-ACL 1998. 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Montreal, Canada, pp. 1299–1305 (1998)

  149. Utsuro, T., Matsumoto, Y., Nagao, M.: Lexical Knowledge Acquisition from Bilingual Corpora. In: Proceedings of the fifteenth [sic] International Conference on Computational Linguistics, COLING 1992, Nantes, France, pp. 581–587 (1992)

  150. Valbret, H., Moulines, E., Tubach, J.-P.: Voice Transformation using PSOLA Technique. In: Proceedings ICASSP 1992, vol. 1, pp. 145–148 (1992)

  151. Valiant, L.G.: A Theory of the Learnable. Communications of the ACM 27(11), 1134–1142 (1984)

  152. Vogel, S., Och, F.J., Tillmann, C., Nießen, S., Sawaf, H., Ney, H.: Statistical Methods for Machine Translation. In: Wahlster, W. (ed.) Verbmobil: Foundations of Speech-to-Speech Translation, Springer, Berlin (2000)

  153. Wahlster, W.: First Results of Verbmobil: Translation Assistance for Spontaneous Dialogues. In: Proceedings ATR International Workshop on Speech Translation, Kyoto, Japan (1993)

  154. Waibel, A., Finke, M., Gates, D., Gavaldà, M., Kemp, T., Lavie, A., Maier, M., Mayfield, M., McNair, A., Rogina, I., Shima, K., Sloboda, T., Woszczyna, M., Zhan, P., Zeppenfeld, T.: Janus II - Advances in Spontaneous Speech Translation. In: International Conference on Acoustics, Speech and Signal Processing, Atlanta, Georgia (1996)

  155. Waibel, A., Jain, A.M., McNair, A.E., Saito, H., Hauptmann, A.G., Tebelskis, J.: JANUS: A Speech-To-Speech Translation System Using Connectionist and Symbolic Processing Strategies. In: ICASSP 1991, Toronto, Canada, vol. 2, pp. 793–796 (1991)

  156. Wang, Y.-Y., Waibel, A.: Modeling with Structures in Statistical Machine Translation. In: Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Montreal, Canada, pp. 1357–1363 (1998)

  157. Wang, Y., Waibel, A.: Decoding Algorithm in Statistical Machine Translation. In: Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics ACL/EACL 1997, Madrid, Spain, pp. 366–372 (1997)

  158. Watanabe, H.: A Method for Extracting Translation Patterns from Translation Examples. In: Proceedings of the 5th International Conference on Theoretical and Methodological Issues in Machine Translation (TMI 1993): MT in the Next Generation, Kyoto, Japan, pp. 292–301 (1993)

  159. Williams, J., Katsaggelos, A.: An HMM-Based Speech-to-Video Synthesizer. IEEE Transactions on Neural Networks 13(4), 900–915 (2002)

  160. Witten, I.H., Moffat, A., Bell, T.C.: Managing Gigabytes: Compressing and Indexing Documents and Images, 2nd edn. Morgan Kaufmann, San Francisco (1999)

  161. Yamamoto, E., Nakamura, S., Shikano, K.: Lip Movement Synthesis from Speech Based on Hidden Markov Models. Speech Communication 26(1–2), 105–115 (1998)

  162. Yi, J., Glass, J.: Information-Theoretic Criteria for Unit Selection Synthesis. In: Proceedings of ICSLP, Denver, Colorado, pp. 2617–2620 (2002)

  163. Yvon, F.: Paradigmatic Cascades: A Linguistically Sound Model of Pronunciation by Analogy. In: Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics, Somerset, NJ, pp. 428–435 (1997)

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chollet, G., McTait, K., Petrovska-Delacrétaz, D. (2005). Data Driven Approaches to Speech and Language Processing. In: Chollet, G., Esposito, A., Faundez-Zanuy, M., Marinaro, M. (eds) Nonlinear Speech Modeling and Applications. NN 2004. Lecture Notes in Computer Science, vol 3445. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11520153_8

  • DOI: https://doi.org/10.1007/11520153_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-27441-4

  • Online ISBN: 978-3-540-31886-6

  • eBook Packages: Computer Science, Computer Science (R0)