Abstract
Anyone who has seen more than three science-fiction films will probably be able to suggest potential uses for spoken dialogue systems in space. Until recently, however, NASA and other space agencies have shown a surprising lack of interest in attempting to make this dream a reality and it is only in the last few years that any serious work has been carried out. The present chapter describes Clarissa, an experimental voice-enabled system developed at NASA Ames Research Center during a 3-year project starting in early 2002, which enables astronauts to navigate complex procedures using only spoken input and output. Clarissa was successfully tested on the International Space Station (ISS) on June 27, 2005, and is, to the best of our knowledge, the first spoken dialogue application in space.
Notes
1. In this connection, we would particularly like to mention T.J. Creamer and Mike Fincke.
2. Note that the grammar’s “logical forms” and the dialogue manager’s “dialogue moves” are not the same.
3. For long side-conversations, the user has the option of using the “suspend” command (cf. Section 2.1) to pause recognition.
Acknowledgments
Work at ICSI, UCSC and RIACS was supported by NASA Ames Research Center internal funding. Work at XRCE was partly supported by the IST Programme of the European Community, under the PASCAL Network of Excellence, IST-2002-506778. Several people not credited here as co-authors also contributed to the implementation of the Clarissa system: among these, we would particularly like to mention John Dowding, Susana Early, Claire Castillo, Amy Fischer and Vladimir Tkachenko. This publication only reflects the authors’ views.
Appendix: Detailed Results for System Performance
This appendix provides detailed performance results justifying the claims made in the main body of the chapter. We divide it into two parts: the first concerns the recognition task and the second the accept/reject task.
12.1.1 The Recognition Task
Table 12.7 presents the results of experiments contrasting speech understanding performance for the Regulus-based recogniser and the class N-gram recogniser, using several different sets of Alterf features (cf. Section 4.3). For completeness, we also present results for simulated perfect recognition, i.e. using the reference transcriptions. We used six different sets of Alterf features:
- N-grams: N-gram features only.
- LF: Logical-form-based patterns only.
- String: String-based patterns only.
- String + LF: Both string-based and logical-form-based patterns.
- String + N-grams: Both string-based and N-gram features.
- String + LF + N-grams: All types of features.
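To make the distinction between feature types concrete, the following is a minimal sketch of what string-based phrase-spotting might look like; the patterns and dialogue-move names here are hypothetical examples, not Clarissa's actual Alterf patterns.

```python
import re

# Hypothetical string-based patterns mapping surface phrases to dialogue moves
# (illustrative only; not the patterns used in the Clarissa system)
STRING_PATTERNS = [
    (re.compile(r"\bgo to step (\d+)\b"), "goto"),
    (re.compile(r"\bnext step\b"), "next"),
    (re.compile(r"\bset (?:the )?alarm\b"), "set_alarm"),
]

def spot_moves(utterance):
    """Phrase-spotting: return the dialogue-move features fired by
    string-based patterns on the recognised utterance."""
    moves = []
    for pattern, move in STRING_PATTERNS:
        if pattern.search(utterance.lower()):
            moves.append(move)
    return moves
```

In a combined configuration such as String + LF, features of this kind would simply be pooled with the logical-form-based ones before classification.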
The general significance of these results is discussed at the end of Section 12.4. It is interesting to note that the combination of logical-form-based and string-based features outperforms logical-form-based features alone (rows G-4 and G-2). Although the difference is small (6.0% versus 6.3%), a pairwise comparison shows that it is significant at the 1% level according to the McNemar sign test. The N-gram features, by contrast, show no clear benefit. This supports the standard folklore result that semantic understanding components for command and control applications are better implemented using hand-coded phrase-spotting patterns than general associational learning techniques.
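The pairwise comparison mentioned above can be sketched as follows: the McNemar sign test looks only at the utterances on which the two systems disagree, and asks whether the split between them could plausibly be a 50/50 coin flip. This is a generic implementation, not the chapter's evaluation code.

```python
from math import comb

def mcnemar_sign_test(errors_a, errors_b):
    """Exact two-sided McNemar sign test on paired per-utterance error
    indicators (True = that system got the utterance wrong).
    Only the discordant pairs contribute to the statistic."""
    b = sum(1 for ea, eb in zip(errors_a, errors_b) if ea and not eb)
    c = sum(1 for ea, eb in zip(errors_a, errors_b) if eb and not ea)
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Exact binomial tail with p = 0.5, doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(0, k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

A p-value below 0.01 on such paired error lists is what "significant at the 1% level" means in the comparison above.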
Table 12.8 presents a breakdown of speech understanding performance, by utterance length, for the best GLM-based and SLM-based versions of the system. There are two main points to note here. First, speech understanding performance remains respectable even for the longer utterances; second, the performance of the GLM-based version is consistently better than that of the SLM-based version for all utterance lengths.
12.1.2 The Accept/Reject Task
Table 12.9 presents detailed results for the experiments on response filtering described in Section 12.5. All conclusions were confirmed by hypothesis testing, using the Wilcoxon rank test, at the 5% significance level. In the remainder of this section, we assess the impact made by individual techniques.
12.1.3 Kernel Types
Quadratic kernels performed better than linear ones (around 25% relative improvement in classification error); the advantage is less marked on the task metric (only 3–9% relative improvement). Though small, the difference is statistically significant. This suggests that information useful for filtering lies, at least partially, in co-occurrences of groups of words rather than in isolated words alone.
12.1.4 Asymmetric Error Costs
We next consider the effect of methods designed to take account of asymmetric error costs (cf. Section 12.5). Comparing GQ-1 (no treatment of asymmetric error costs) with GQ-2 (intrinsic SVM optimisation using the j-parameter) and GQ-3 (calibration), we see that both methods produce a significant improvement in performance. On the u2 loss function that both methods aim to minimise, we obtain a 9% relative improvement using calibration and 6% using intrinsic SVM optimisation; on the task metric, these gains are reduced to 5% (relative) for calibration and only 2% for intrinsic SVM optimisation, though both are still statistically significant. Error rates on individual classes show that, as intended, both methods shift errors from false accepts (classes B and C) to the less dangerous false rejects (class A). In particular, calibration reduces the false accept rate on cross-talk and out-of-domain utterances from 6.8% on GQ-1 to 4.7% on GQ-3 (31% relative), at the cost of an increase from 2.7% to 4.3% in the false reject rate for correctly recognised utterances.
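The logic behind the calibration method can be sketched as a cost-sensitive decision rule: once classifier scores are calibrated into probabilities, the Bayes-optimal accept/reject boundary follows directly from the ratio of the two error costs. The cost values below are illustrative placeholders, not the chapter's actual task-metric weights.

```python
def accept_threshold(cost_false_accept, cost_false_reject):
    """Bayes-optimal acceptance threshold on a calibrated probability
    that the recognition result is correct: accept exactly when the
    expected cost of accepting is below that of rejecting."""
    return cost_false_accept / (cost_false_accept + cost_false_reject)

def filter_response(p_correct, cost_false_accept=2.0, cost_false_reject=1.0):
    # Hypothetical costs: a false accept is twice as harmful as a false reject,
    # so the threshold moves above 0.5 and more utterances are rejected.
    return p_correct > accept_threshold(cost_false_accept, cost_false_reject)
```

Raising the false-accept cost raises the threshold, which is precisely the trade observed above: fewer false accepts on cross-talk at the price of more false rejects on correctly recognised utterances.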
12.1.5 Recognition Methods
Using the confidence threshold method, there was a large difference in performance between the GLM-based GT-1 and the SLM-based ST-1. In particular, the false accept rate for cross-talk and out-of-domain utterances is nearly twice as high (16.5% versus 8.9%) for the SLM-based recogniser. This supports the folklore result that GLM-based recognisers give better performance on the accept/reject task.
When using the SVM-based methods, however, the best GLM-based configuration (GQ-3) performs about as well as the best SLM-based configuration (SQ-1) in terms of average classification error, with both systems scoring about 5.5%. GQ-3 does perform considerably better than SQ-1 in terms of task error (5.4% versus 6.9%, or 21% relative), but this is due to better performance on the speech recognition and semantic interpretation tasks. Our conclusion here is that GLM-based recognisers do not necessarily offer superior performance to SLM-based ones on the accept/reject task, if a more sophisticated method than a simple confidence threshold is used.
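The contrast between the two filtering approaches can be sketched as follows: the baseline thresholds a single recogniser confidence score, while the SVM-based methods combine several signals into one decision function. The feature set, weights, and threshold here are hypothetical, standing in for what training would produce.

```python
def confidence_filter(confidence, threshold=0.85):
    """Baseline: accept purely on a recogniser confidence score."""
    return confidence >= threshold

def svm_style_filter(features, weights, bias):
    """Sketch of the richer accept/reject decision: a linear decision
    function over multiple features (e.g. confidence, utterance length,
    an in-grammar flag), as a trained SVM would supply. Weights and bias
    are hypothetical; a real system learns them from labelled data."""
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return score > 0
```

Because the decision surface depends on more than raw confidence, an SLM-based recogniser with weaker confidence scores can still achieve competitive filtering, which matches the conclusion above.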
Copyright information

© 2010 Springer Science+Business Media, LLC

Cite this chapter

Rayner, M., Hockey, B.A., Renders, J.M., Chatzichrisafis, N., Farrell, K. (2010). Spoken Dialogue Application in Space: The Clarissa Procedure Browser. In: Chen, F., Jokinen, K. (eds) Speech Technology. Springer, New York, NY. https://doi.org/10.1007/978-0-387-73819-2_12

Print ISBN: 978-0-387-73818-5

Online ISBN: 978-0-387-73819-2