Challenges in Audio Processing of Terrorist-Related Data

  • Conference paper
MultiMedia Modeling (MMM 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11296)

Abstract

Much information in multimedia data related to terrorist activity can be extracted from the audio content. Our work in ongoing projects aims to provide a complete description of the audio portion of multimedia documents. This information can be derived from diarization, classification of acoustic events, language and speaker segmentation and clustering, as well as automatic transcription of the speech portions. An important consideration is ensuring that the audio processing technologies are well suited to the types of data of interest to law enforcement agencies. While language identification and speech recognition may be considered ‘mature technologies’, our experience is that even state-of-the-art systems require customisation and enhancements to address the challenges of terrorist-related audio documents.

This work was partially financed by the Horizon 2020 project DANTE - Detecting and analysing terrorist-related online contents and financing activities and the French National Agency for Research as part of the SALSA project (Speech and Language technologies for Security Applications) under grant ANR-14-CE28-0021.

Author information

Corresponding author

Correspondence to Jodie Gauvain.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Gauvain, J. et al. (2019). Challenges in Audio Processing of Terrorist-Related Data. In: Kompatsiaris, I., Huet, B., Mezaris, V., Gurrin, C., Cheng, WH., Vrochidis, S. (eds) MultiMedia Modeling. MMM 2019. Lecture Notes in Computer Science, vol 11296. Springer, Cham. https://doi.org/10.1007/978-3-030-05716-9_7

  • DOI: https://doi.org/10.1007/978-3-030-05716-9_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-05715-2

  • Online ISBN: 978-3-030-05716-9

  • eBook Packages: Computer Science, Computer Science (R0)
