Spoken Dialogue Application in Space: The Clarissa Procedure Browser

Abstract

Anyone who has seen more than three science-fiction films will probably be able to suggest potential uses for spoken dialogue systems in space. Until recently, however, NASA and other space agencies have shown a surprising lack of interest in attempting to make this dream a reality, and it is only in the last few years that any serious work has been carried out. The present chapter describes Clarissa, an experimental voice-enabled system developed at NASA Ames Research Center during a 3-year project starting in early 2002, which enables astronauts to navigate complex procedures using only spoken input and output. Clarissa was successfully tested on the International Space Station (ISS) on June 27, 2005, and is, to the best of our knowledge, the first spoken dialogue application in space.

Notes

  1. In this connection, we would particularly like to mention T.J. Creamer and Mike Fincke.

  2. Note that the grammar’s “logical forms” and the dialogue manager’s “dialogue moves” are not the same.

  3. For long side-conversations, the user has the option of using the “suspend” command (cf. Section 2.1) to pause recognition.


Acknowledgments

Work at ICSI, UCSC and RIACS was supported by NASA Ames Research Center internal funding. Work at XRCE was partly supported by the IST Programme of the European Community, under the PASCAL Network of Excellence, IST-2002-506778. Several people not credited here as co-authors also contributed to the implementation of the Clarissa system: among these, we would particularly like to mention John Dowding, Susana Early, Claire Castillo, Amy Fischer and Vladimir Tkachenko. This publication only reflects the authors’ views.

Author information

Correspondence to Manny Rayner.

Appendix: Detailed Results for System Performance

This appendix provides detailed performance results justifying the claims made in the main body of the chapter. We divide it into two parts: the first concerns the recognition task and the second the accept/reject task.

12.1.1 The Recognition Task

Table 12.7 presents the results of experiments contrasting speech understanding performance for the Regulus-based recogniser and the class N-gram recogniser, using several different sets of Alterf features (cf. Section 4.3). For completeness, we also present results for simulated perfect recognition, i.e. using the reference transcriptions. We used six different sets of Alterf features (a sketch of how such features might be extracted follows the list):

  • N-grams: N-gram features only.

  • LF: Logical-form-based patterns only.

  • String: String-based patterns only.

  • String + LF: Both string-based and logical-form-based patterns.

  • String + N-grams: Both string-based and N-gram features.

  • String + LF + N-grams: All types of features.
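
To make these configurations concrete, here is a minimal Python sketch of one way the three feature types might be extracted and pooled for a single utterance. The pattern inventories and helper names (STRING_PATTERNS, LF_PATTERNS, extract_features) are illustrative assumptions, not Alterf's actual implementation.

    import re

    # Illustrative stand-ins for the hand-coded pattern inventories
    STRING_PATTERNS = {
        "next_step": re.compile(r"\b(next|go on)\b"),
        "set_alarm": re.compile(r"\bset (an? )?alarm\b"),
    }
    LF_PATTERNS = {
        "imperative_go": lambda lf: lf.get("mood") == "imp" and lf.get("verb") == "go",
    }

    def extract_features(utterance, logical_form,
                         use_string=True, use_lf=True, use_ngrams=False):
        """Pool the selected feature types into one flat list, mirroring
        the String / LF / N-gram configurations of Table 12.7."""
        words = utterance.lower().split()
        feats = []
        if use_string:
            feats += ["str:" + name for name, pat in STRING_PATTERNS.items()
                      if pat.search(utterance.lower())]
        if use_lf:
            feats += ["lf:" + name for name, test in LF_PATTERNS.items()
                      if test(logical_form)]
        if use_ngrams:
            feats += ["1g:" + w for w in words]
            feats += ["2g:" + a + "_" + b for a, b in zip(words, words[1:])]
        return feats

    # "String + LF" configuration on a toy utterance and logical form
    print(extract_features("go to the next step", {"mood": "imp", "verb": "go"}))
    # -> ['str:next_step', 'lf:imperative_go']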

Table 12.7 Speech understanding performance for 8158 test sentences recorded during development, on 13 different configurations of the system

The general significance of these results is discussed at the end of Section 12.4. It is interesting to note that the combination of logical-form-based features and string-based features outperforms logical-form-based features alone (rows G-4 and G-2). Although the difference is small (6.0% versus 6.3%), a pairwise comparison shows that it is significant at the 1% level according to the McNemar sign test. There is no clear evidence that N-gram features are very useful. This supports the standard folklore result that semantic understanding components for command and control applications are more appropriately implemented using hand-coded phrase-spotting patterns than general associational learning techniques.
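
For readers who want to reproduce this style of pairwise comparison, the following minimal sketch applies an exact McNemar sign test to two systems' per-utterance correctness judgements. The discordant counts are placeholders, not the figures behind Table 12.7.

    from math import comb

    def mcnemar_sign_test(b, c):
        """Exact two-sided McNemar (sign) test.
        b = utterances system 1 got right and system 2 got wrong;
        c = the reverse. Under H0, the discordant outcomes among
        n = b + c utterances are Binomial(n, 0.5)."""
        n, k = b + c, min(b, c)
        tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
        return min(1.0, 2 * tail)

    # Hypothetical discordant counts over the 8158 test sentences
    print(mcnemar_sign_test(b=180, c=230))  # ~0.015: significant at the 5% level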

Table 12.8 presents a breakdown of speech understanding performance, by utterance length, for the best GLM-based and SLM-based versions of the system. There are two main points to note here. First, speech understanding performance remains respectable even for the longer utterances; second, the performance of the GLM-based version is consistently better than that of the SLM-based version for all utterance lengths.

Table 12.8 Speech understanding performance, broken down by utterance length, for the best GLM-based and SLM-based versions of the system (cf. Table 12.7). Results are omitted for the small group of utterances of length 10 or more

12.1.2 The Accept/Reject Task

Table 12.9 presents detailed results for the experiments on response filtering described in Section 12.5. All conclusions were confirmed by hypothesis testing, using the Wilcoxon rank test, at the 5% significance level. In the remainder of this section, we assess the impact made by individual techniques.

Table 12.9 Performance on accept/reject classification and the top-level speech understanding task, on 12 different configurations of the system
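
The kind of paired hypothesis testing mentioned above can be reproduced along the following lines. This sketch uses scipy.stats.wilcoxon (the signed-rank variant) on hypothetical paired error rates for two configurations; the exact pairing scheme and per-condition figures from the experiments are not reproduced here.

    from scipy.stats import wilcoxon

    # Hypothetical paired task-error rates (%) for two configurations,
    # e.g. per test session; not the real figures behind Table 12.9.
    config_a = [5.1, 5.8, 4.9, 6.2, 5.4, 5.0, 5.7, 5.3, 6.0, 4.8]
    config_b = [6.7, 7.2, 6.5, 7.6, 6.9, 6.4, 7.3, 6.8, 7.5, 6.2]

    stat, p = wilcoxon(config_a, config_b)
    print(f"W = {stat}, p = {p:.4f}")  # p < 0.05: difference is significant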

12.1.3 Kernel Types

Quadratic kernels performed better than linear ones (around 25% relative improvement in classification error); the advantage is less marked on the task metric (only a 3–9% relative improvement). Though small, the difference is statistically significant. This suggests that information useful for filtering lies, at least partially, in the co-occurrences of groups of words, rather than just in isolated words.
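
As a rough illustration of this comparison, the sketch below contrasts a linear and a degree-2 polynomial SVM kernel using scikit-learn on synthetic data; the original experiments used SVMlight on real recogniser output, so the numbers are not comparable to Table 12.9. A degree-2 kernel with non-zero coef0 implicitly includes products of feature pairs, i.e. word co-occurrences.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    # Synthetic stand-in for bag-of-words utterance vectors
    X, y = make_classification(n_samples=600, n_features=50,
                               n_informative=10, random_state=0)

    for name, clf in [("linear", SVC(kernel="linear")),
                      ("quadratic", SVC(kernel="poly", degree=2, coef0=1.0))]:
        err = 1 - cross_val_score(clf, X, y, cv=5).mean()
        print(f"{name} kernel: cross-validated classification error = {err:.3f}")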

12.1.4 Asymmetric Error Costs

We next consider the effect of methods designed to take account of asymmetric error costs (cf. Section 12.5). Comparing GQ-1 (no treatment of asymmetric error costs) with GQ-2 (intrinsic SVM optimisation using the j-parameter) and GQ-3 (calibration), we see that both methods produce a significant improvement in performance. On the u2 loss function that both methods aim to minimise, we attain a 9% relative improvement when using calibration and 6% when using intrinsic SVM optimisation; on the task metric, these gains are reduced to 5% (relative) for calibration and only 2% for intrinsic SVM optimisation, though both are still statistically significant. Error rates on individual classes show that, as intended, both methods move errors from false accepts (classes B and C) to the less dangerous false rejects (class A). In particular, the calibration method reduces the false accept rate on cross-talk and out-of-domain utterances from 6.8% on GQ-1 to 4.7% on GQ-3 (31% relative), at the cost of an increase from 2.7% to 4.3% in the false reject rate for correctly recognised utterances.
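
The two strategies can be sketched as follows, assuming scikit-learn in place of SVMlight: class weighting at training time plays the role of the j-parameter, and probability calibration followed by an asymmetric decision threshold approximates the calibration method. The 3:1 cost ratio is an illustrative assumption, not the ratio used in the experiments.

    import numpy as np
    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    # Synthetic accept(1)/reject(0) data standing in for recogniser output
    X, y = make_classification(n_samples=2000, n_features=30,
                               weights=[0.3, 0.7], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # (a) Cost weighting at training time, analogous to SVMlight's j-parameter:
    # make a false accept (predicting 1 for a true 0) cost three times
    # as much as a false reject.
    weighted = SVC(kernel="poly", degree=2, class_weight={0: 3.0, 1: 1.0})
    weighted.fit(X_tr, y_tr)

    # (b) Calibration: map SVM scores to probabilities, then accept only
    # when the Bayes-optimal threshold for 3:1 costs is cleared
    # (accept iff 3 * P(reject) < P(accept), i.e. P(accept) > 0.75).
    calibrated = CalibratedClassifierCV(SVC(kernel="poly", degree=2), cv=3)
    calibrated.fit(X_tr, y_tr)
    p_accept = calibrated.predict_proba(X_te)[:, 1]
    accept = p_accept > 3.0 / (3.0 + 1.0)

    fa = accept[y_te == 0].mean()     # false accept rate on true rejects
    fr = (~accept)[y_te == 1].mean()  # false reject rate on true accepts
    print(f"false accepts: {fa:.3f}, false rejects: {fr:.3f}")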

12.1.5 Recognition Methods

Using the confidence threshold method, there is a large difference in performance between the GLM-based GT-1 and the SLM-based ST-1: in particular, the false accept rate for cross-talk and out-of-domain utterances is nearly twice as high (16.5% versus 8.9%) for the SLM-based recogniser. This supports the folklore result that GLM-based recognisers give better performance on the accept/reject task.

When using the SVM-based methods, however, the best GLM-based configuration (GQ-3) performs about as well as the best SLM-based configuration (SQ-1) in terms of average classification error, with both systems scoring about 5.5%. GQ-3 does perform considerably better than SQ-1 in terms of task error (5.4% versus 6.9%, or 21% relative), but this is due to better performance on the speech recognition and semantic interpretation tasks. Our conclusion here is that GLM-based recognisers do not necessarily offer superior performance to SLM-based ones on the accept/reject task, if a more sophisticated method than a simple confidence threshold is used.
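
For contrast, the simple confidence-threshold baseline can be sketched in a few lines, assuming a recogniser that returns a numeric confidence score for the top hypothesis; the threshold value here is illustrative, not the one used in GT-1 or ST-1.

    def accept_hypothesis(confidence, threshold=45.0):
        """Baseline accept/reject filter: accept the top recognition
        hypothesis iff its confidence score clears a fixed threshold."""
        return confidence >= threshold

    print(accept_hypothesis(62.0))  # True  -> pass result to the dialogue manager
    print(accept_hypothesis(31.0))  # False -> reject and ask the user to repeat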


Copyright information

© 2010 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Rayner, M., Hockey, B.A., Renders, J.-M., Chatzichrisafis, N., Farrell, K. (2010). Spoken Dialogue Application in Space: The Clarissa Procedure Browser. In: Chen, F., Jokinen, K. (eds) Speech Technology. Springer, New York, NY. https://doi.org/10.1007/978-0-387-73819-2_12

  • DOI: https://doi.org/10.1007/978-0-387-73819-2_12
  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-0-387-73818-5

  • Online ISBN: 978-0-387-73819-2
