
Spontaneous speech and opinion detection: mining call-centre transcripts

  • Original Paper
  • Language Resources and Evaluation

Abstract

Opinion mining on conversational telephone speech faces two challenges: the robustness of speech transcription and the relevance of opinion models. Both are critical in an industrial context such as marketing. This paper addresses the two issues jointly by analyzing the influence of speech transcription errors on the detection of opinions and business concepts. We present both modules: the speech transcription system, which is a successful adaptation of a conversational speech transcription system to call-centre data, and the information extraction module, which is based on semantic modeling of business concepts, opinions and sentiments with complex linguistic rules. Three models of opinions are implemented, based on discourse theory, appraisal theory and marketers’ expertise, respectively. The influence of speech recognition errors on the information extraction module is evaluated by comparing its outputs on manual versus automatic transcripts. The F-scores obtained are 0.79 for business concept detection, 0.74 for opinion detection and 0.67 for the extraction of relations between opinions and their targets. These results and the in-depth analysis of the errors show the feasibility of opinion detection based on complex rules on call-centre transcripts.
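
As an illustration of the evaluation described in the abstract, the sketch below computes precision, recall and F-score for annotations extracted from an automatic transcript, taking the annotations extracted from the manual transcript as the reference. It is a minimal sketch rather than the paper's actual protocol: the exact-match criterion, the `f_score` helper and the sample call identifiers and labels are all assumptions made for illustration.

```python
from collections import Counter


def f_score(reference, hypothesis):
    """Return (precision, recall, F1) between two lists of annotations.

    Annotations are compared by exact match (an assumed criterion);
    duplicates are counted as separate items.
    """
    ref, hyp = Counter(reference), Counter(hypothesis)
    matched = sum((ref & hyp).values())  # multiset intersection
    precision = matched / sum(hyp.values()) if hyp else 0.0
    recall = matched / sum(ref.values()) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical annotations: (call identifier, detected label).
manual = [
    ("call_1", "BILL"),
    ("call_1", "OPINION_NEG"),
    ("call_2", "CONTRACT"),
    ("call_2", "OPINION_POS"),
]
automatic = [
    ("call_1", "BILL"),
    ("call_2", "CONTRACT"),
    ("call_2", "OPINION_NEG"),  # polarity changed by a recognition error
]

precision, recall, f1 = f_score(manual, automatic)
print(f"P={precision:.2f} R={recall:.2f} F={f1:.2f}")
```

In this toy case two of the three automatic annotations match the four reference annotations, so a missed opinion and a flipped polarity both lower the score. A real evaluation would also need a rule for matching annotation spans whose word boundaries differ between the two transcripts.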

Notes

  1. http://www.edf.com/the-edf-group-42667.html.

  2. VoxFactory is a project of Cap Digital, the French business cluster for digital content in Paris and the Ile de France region (http://www.capdigital.com/vox-factory/).

  3. http://www.vocapia.com/.

  4. http://www.limsi.fr/index.en.html.

  5. http://www.temis.com/.

  6. http://www.sinequa.com/.

  7. http://www.xrce.xerox.com/Research-Development/Historical-projects/XeLDA.

  8. XCAS is a format defined by the Apache UIMA™ (Unstructured Information Management Applications) project: http://uima.apache.org/.

  9. Appreciation and judgment have been merged because this distinction is not suited to call-centre data.

  10. http://www.vecsys.fr/.

  11. Commission Nationale de l’Informatique et des Libertés. The CNIL’s general mission consists in ensuring that the development of information technology remains at the service of citizens and does not breach human identity, human rights, privacy or personal or public liberties.

  12. This strategy for backchannel annotation was chosen with a view to detecting overlapping speech segments.

  13. Simple Metadata Annotation Specification, Version 6.2—February 3, 2004 Linguistic Data Consortium www.ldc.upenn.edu/Projects/MDESimpleMDE.

  14. “You have less consumed than you has been debited”.

  15. “I have lived in Beauvoisin for thirty years”.

  16. “I have lived at your neighbours for thirty years”.

  17. Long sequences of errors could be explained by bad acoustic conditions (phone noise, saturation), poor articulation, the presence of disfluencies, etc.

References

  • Adda-Decker, M., & Lamel, L. (2005). Do speech recognizers prefer female speakers? In Proceedings of the 9th international conference on speech communication and technology (Interspeech’05) (pp. 2205–2208).

  • Adda-Decker, M., Habert, B., Barras, C., Adda, G., & Boula De Mareuil, P. (2003). A disfluency study for cleaning spontaneous speech automatic transcripts and improving speech language models. In Proceedings of disfluency in spontaneous speech workshop (pp. 67–70), Göteborg, Sweden.

  • Allauzen, A. (2007). Error detection in confusion network. In Proceedings of the 8th annual conference of the international speech communication association (Interspeech’07) (pp. 1749–1752), Antwerp, Belgium.

  • Appelt, D., Hobbs, J., Bear, J., Israel, D., Kameyama, M., & Tyson, M. (1993). FASTUS: A finite state processor for information extraction from real-world text. In Proceedings of the international joint conference on artificial intelligence (pp. 1172–1178), Chambéry, France.

  • Barras, C., Geoffrois, E., Wu, Z., & Liberman, M. (2000). Transcriber: Development and use of a tool for assisting speech corpora production. Speech Communication (special issue on speech annotation and corpus tools), 33(1–2).

  • Barras, C., Zhu, X., Meignier, S., & Gauvain, J.-L. (2006). Multi-stage speaker diarization of broadcast news. IEEE Transactions on Audio, Speech and Language Processing, 14(5), 1505–1512.

  • Baumann, T., Buß, O., Atterer, M., & Schlangen, D. (2009). Evaluating the potential utility of ASR N-best lists for incremental spoken dialogue systems. In Proceedings of interspeech (pp. 1031–1034), Brighton.

  • Benveniste, E. (1970). L’appareil formel de l’énonciation. In Problèmes de linguistique générale II (pp. 79–88). Gallimard, 1974.

  • Blanche-Benveniste, C. (1990). Le français parlé : Etudes grammaticales. Paris: Didier-Erudition.

  • Bloom, K., Garg, N., & Argamon, S. (2007). Extracting appraisal expressions. In Proceedings of HLT-NAACL (pp. 308–315).

  • Bozzi, L., Suignard, P., & Waast-Richard, C. (2009). Segmentation et classification non supervisée de conversations téléphoniques automatiquement retranscrites. In Proceedings of TALN (Traitement Automatique des Langues Naturelles), July 2009.

  • Brill, E. (1995). Transformation-based error-driven learning and natural language processing: A case study in part of speech tagging. Computational Linguistics, 21(4), 1–37.

  • Cailliau, F., & Cavet, A. (2010). Analyse des sentiments et transcription automatique: modélisation du déroulement de conversations téléphoniques. Revue Traitement Automatique des Langues, 51(3), 131–154.

  • Cailliau, F., & Giraudel, A. (2008). Enhanced search and navigation on conversational speech. In Proceedings of SIGIR workshop of searching spontaneous conversational speech, Singapore.

  • Charaudeau, P. (1992). Grammaire du sens et de l’expression. Paris: Hachette Education.

  • Clavel, C., & Richard, G. (2011). Recognition of acoustic emotion. In C. Pelachaud (Ed.), Emotional interaction system, John Wiley.

  • Dubreil, E., Vernier, M., Monceaux, L., & Daille, B. (2008). Annotating opinion—evaluation of blogs. In Proceedings of the LREC workshop on sentiment analysis: metaphor, ontology and terminology (EMOT-08), Marrakech, 2008.

  • Danesi, C., & Clavel, C. (2010). Impact of spontaneous speech features on business concept detection: A study of call-centre data. In Proceedings of the ACM multimedia workshop on searching spontaneous conversational speech, Firenze, Italy.

  • Dave, K., Lawrence, S., & Pennock, D. (2003). Mining the peanut gallery: opinion extraction and semantic classification of product reviews. In Proceedings of the twelfth international world wide web conference (pp. 519–528).

  • Devillers, L., Vaudable, C., & Chastagnol, C. (2010). Real-life emotion-related states detection in call centers: A cross-corpora study. In Proceedings of interspeech (pp. 2350–2353), Makuhari, Japan.

  • Galliano, S., Gravier, G., & Chaubard, L. (2009). The ESTER2 evaluation campaign for the rich transcription of French radio broadcasts. In Proceedings of interspeech (pp. 2583–2586), Brighton.

  • Garnier-Rizet, M., Adda, G., Cailliau, F., Gauvain, J.-L., Guillemin-Lanne, S., & Lamel, L. (2008). CallSurf: Automatic transcription, indexing and structuration of call center conversational speech for knowledge extraction and query by content. In Proceedings of the sixth international language resources and evaluation (LREC) European Language Resources Association (ELRA) (pp. 2623–2628), Marrakech, Morocco.

  • Garofolo, J., Auzanne, C., & Voorhees, E. (1999). The TREC spoken document retrieval track: A success story. In Proceedings of text retrieval conference (TREC) (Vol. 8, pp. 16–19).

  • Gauvain, J.-L., Lamel, L., & Adda, G. (1998). Partitioning and transcription of broadcast news data. In International conference on speech and language processing (Vol. 4, pp. 1335–1338), Sydney, Australia.

  • Gauvain, J. L., Lamel, L., Schwenk, H., Adda, G., Chen, L., & Lefevre, F. (2003). Conversational telephone speech recognition. In Proceedings of ICASSP (pp. 212–215), Hong Kong, 2003.

  • Gillick, L., Ito, Y., & Young, J. (1997). A probabilistic approach to confidence estimation and evaluation. In Proceedings of the IEEE international conference on acoustics, speech, and signal processing (Vol. 2, pp. 879–882), Munich, Germany.

  • Godfrey, J. J., Holliman, E. C., & McDaniel, J. (1992). Switchboard: Telephone speech corpus for research and development. In Proceedings of ICASSP (Vol. 1, pp. 517–520), San Francisco.

  • Goldwater, S., Jurafsky, D., & Manning, C. D. (2010). Which words are hard to recognize? Prosodic, lexical, and disfluency factors that increase speech recognition error rates. Speech Communication, 52(3), 181–200.

  • Hain, T., Woodland, P. C., Evermann, G., & Povey, D. (2000). The CU-HTK Hub5e transcription system. In Proceedings of NIST speech transcription workshop, College Park, MD.

  • Halliday, M. (1994). Introduction to functional grammar (2nd ed.). London: Edward Arnold.

  • Hardy, H., Bierman, A., Bryce Inouye, R., Mckenzie, A., Strzalkowski, T., Ursu, C., et al. (2006). The Amities system: Data-driven techniques for automated dialogue. Speech Communication (issue Spoken Language Understanding in Conversational Systems), 48(3–4), 354–373.

  • Hazen, T., Burianek, T., Polifroni, J., & Seneff, S. (2000). Recognition confidence scoring for use in speech understanding systems. In Proceedings of the ISCA ASR2000 tutorial and research workshop (pp. 213–220). Paris.

  • Kneser, R., & Ney, H. (1995). Improved backing-off for m-gram language modeling. In Proceedings of the IEEE international conference on acoustics, speech, and signal processing (Vol. 1, p. 181).

  • Ljolje, A., Hindle, D., Riley, M., & Sproat, R. (2000). The AT&T LVCSR-2000 system. In Proceedings of NIST speech transcription workshop, College Park, MD.

  • Martin, J. R., & White, P. R. R. (2005). The language of evaluation, appraisal in English. London, New York: Palgrave Macmillan.

  • Matsoukas, S., Colthurst, T., Kimball, O., Solomonoff, A., Richardson, F., Quillen, C., et al. (2002). The 2001 Byblos English large vocabulary conversational speech recognition system. In Proceedings of ICASSP (Vol. 1, pp. 721–724).

  • Olsson, J. S., Wintrode, J., & Lee, M. (2007). Fast unconstrained audio search in numerous human languages. In Proceedings of the IEEE international conference on acoustics, speech, and signal processing (Vol. 4, pp. 77–80), Honolulu, Hawaii.

  • Ostendorf, M. (2009). Transcribing human-directed speech for spoken language processing. In Proceedings of interspeech (pp. 21–26), Brighton, United Kingdom.

  • Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. In Foundations and trends in information retrieval (Vol. 2(1–2), pp. 1–135). Hanover, USA: Now Publishers Inc.

  • Pang, B., Lee, L., & Vaithyanathan, S. (2002). Thumbs up?: Sentiment classification using machine learning techniques. In Proceedings of the ACL-02 conference on empirical methods in natural language processing (Vol. 10, pp. 79–86), Morristown, NJ.

  • Park, Y., Patwardhan, S., Visweswariah, K., & Gates, S. C. (2008). An empirical analysis of word error rate and keyword error rate. In Proceedings of Interspeech (pp. 2070–2073).

  • Paumier, S. (2002). Manuel d’utilisation d’Unitex, Université de Marne-la-Vallée. http://www-lipn.univ-paris13.fr/~rozenknop/Cours/MICR_REI/Seance2/ManuelUnitex1.2.pdf. Accessed 31 Jan 2012.

  • Poibeau, T. (2002). Extraction d’information à base de connaissances hybrides, PhD Thesis, Université Paris-Nord, March 2002.

  • Quirk, R., Greenbaum, S., Leech, G., & Svartvik, J. (1985). A comprehensive grammar of the English language. Harlow: Longman.

  • Riloff, E., & Wiebe, J. (2003). Learning extraction patterns for subjective expressions. In Proceedings of the conference on empirical methods in natural language processing, theoretical issues in natural language processing. Association for Computational Linguistics (Vol. 10, pp. 105–112), Morristown, NJ.

  • Shriberg, E. (1994). Preliminaries to a theory of speech disfluencies. PhD Thesis, University of California, Berkeley.

  • Silberztein, M. (1994). Dictionnaires électroniques et analyse automatique de textes: le système INTEX. Paris: Masson.

  • Stolcke, A., Bratt, H., Butzberger, J., Franco, H., Rao Gadde, V. R., Plauche, M., et al. (2000). The SRI March 2000 Hub-5 Conversational speech transcription system. In Proceedings of NIST Speech Transcription Workshop, College Park, MD.

  • Tang, H., Tan, S., & Cheng, X. (2009). A survey on sentiment detection of reviews. Expert Systems with Applications, 36(7), 10760–10773.

  • Ten Bosch, L., & Boves, L. (2004). Survey of spontaneous speech phenomena in a multimodal dialogue system and some implications for ASR. In Proceedings of Interspeech (pp. 1505–1508), Korea.

  • Turney, P. D. (2002). Thumbs up or thumbs down?: Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th annual meeting on association for computational linguistics (pp. 417–424), Morristown, NJ.

  • Turney, P. D., & Littman, M. L. (2003). Measuring praise and criticism: Inference of semantic orientation from association. ACM Transactions on Information Systems, 21(4), 315–346.

  • Whitelaw, C., Garg, N., & Argamon, S. (2005). Using appraisal groups for sentiment analysis. In Proceedings of the 14th ACM international conference on information and knowledge management (CIKM) (pp. 625–631), Bremen, Germany.

  • Wiebe, J. (2000). Learning subjective adjectives from corpora. In Proceedings of the seventeenth national conference on artificial intelligence and twelfth conference on innovative applications of artificial intelligence (pp. 735–740).

  • Wiebe, J., & Riloff, E. (2005). Creating subjective and objective sentence classifiers from unannotated texts. In Proceedings of the 6th international conference on computational linguistics and intelligent text processing (CICLing-05), invited paper, Springer LNCS (Vol. 3406). Berlin: Springer.

Acknowledgments

This work was partly financed by CAP DIGITAL, the Business Cluster for digital content through the VoxFactory project.

Author information

Corresponding author

Correspondence to Chloé Clavel.

About this article

Cite this article

Clavel, C., Adda, G., Cailliau, F. et al. Spontaneous speech and opinion detection: mining call-centre transcripts. Lang Resources & Evaluation 47, 1089–1125 (2013). https://doi.org/10.1007/s10579-013-9224-5

  • DOI: https://doi.org/10.1007/s10579-013-9224-5
