Abstract
This contribution presents a novel speaker segmentation system designed to improve the safety of voice communication in air traffic control. In addition to using the aircraft identification tag to assign speaker turns on the shared communication channel to aircraft, speaker verification is investigated as an add-on attribute to raise the security level of air traffic control. The verification task is performed by training universal background models and speaker-dependent models based on the Gaussian mixture model approach. The feature extraction and normalization units are optimized specifically to cope with the restricted channel bandwidth and very short speaker turns. To enhance the robustness of the verification system, a cross-verification unit is additionally applied. The system is tested on the SPEECHDAT-AT and WSJ0 databases to demonstrate its performance.
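The verification step named in the abstract, scoring a speaker turn against a speaker-dependent Gaussian mixture model and a universal background model, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy two-component models, the two-dimensional feature vectors, and the zero decision threshold are all assumptions made for illustration, and the feature extraction, normalization, and cross-verification units are not modelled.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, gmm):
    """Log-likelihood of x under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp keeps the sum numerically stable
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def llr_score(frames, speaker_gmm, ubm):
    """Average per-frame log-likelihood ratio: speaker model vs. background model."""
    total = sum(
        gmm_log_likelihood(x, speaker_gmm) - gmm_log_likelihood(x, ubm)
        for x in frames
    )
    return total / len(frames)

def verify(frames, speaker_gmm, ubm, threshold=0.0):
    """Accept the claimed identity if the score reaches the (assumed) threshold."""
    return llr_score(frames, speaker_gmm, ubm) >= threshold

# Hypothetical toy models: a broad background model and a narrow speaker model.
ubm = [(0.5, [0.0, 0.0], [4.0, 4.0]), (0.5, [3.0, 3.0], [4.0, 4.0])]
speaker = [(0.5, [1.0, 1.0], [0.5, 0.5]), (0.5, [1.5, 0.5], [0.5, 0.5])]

# A short "speaker turn" of three feature frames close to the speaker model.
turn = [[1.1, 0.9], [1.4, 0.6], [0.9, 1.2]]
print(verify(turn, speaker, ubm))
```

Averaging the ratio over frames keeps the score comparable across turns of different length, which matters when speaker turns are very short. In practice the threshold would be tuned on held-out data.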
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Neffe, M., Van Pham, T., Hering, H., Kubin, G. (2007). Speaker Segmentation for Air Traffic Control. In: Müller, C. (ed.) Speaker Classification II. Lecture Notes in Computer Science, vol 4441. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74122-0_15
DOI: https://doi.org/10.1007/978-3-540-74122-0_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74121-3
Online ISBN: 978-3-540-74122-0
eBook Packages: Computer Science, Computer Science (R0)