Multimedia Analysis in Police–Citizen Communication: Supporting Daily Policing Tasks

Chapter in Social Media Strategy in Policing

Abstract

This chapter describes an approach to improved multimedia analysis as part of an ICT-based tool for community policing. It includes technology for the automatic processing of audio, image and video content sent as evidence by citizens to the police. In addition to the technical details of its development, results of its performance in initial pilots simulating near-real crime situations are presented and discussed.


Notes

  1. The Google Nexus 5X was selected as a reference device for the INSPEC2T project; it includes an 8 MP camera with 1.4 µm pixels and a 4.0 mm focal length.
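As a rough illustration of what these camera parameters imply for evidence imagery, the pixel pitch and focal length determine the field of view and how much of the scene a single pixel covers at a given distance. The sketch below assumes a 3264 × 2448 (4:3) sensor layout, which is consistent with 8 MP but is not stated in the chapter; the helper name is illustrative only.

```python
import math

# Reference camera from the INSPEC2T note: 8 MP, 1.4 um pixel pitch, 4.0 mm focal length.
# A 3264 x 2448 (4:3) sensor layout is an assumption consistent with 8 MP.
PIXEL_PITCH_M = 1.4e-6
FOCAL_LENGTH_M = 4.0e-3
H_PIXELS = 3264

# Sensor width follows from pixel count times pixel pitch (~4.57 mm here).
sensor_width_m = H_PIXELS * PIXEL_PITCH_M

# Horizontal field of view via the pinhole-camera model.
hfov_deg = math.degrees(2 * math.atan(sensor_width_m / (2 * FOCAL_LENGTH_M)))

def ground_sample_mm(distance_m: float) -> float:
    """Approximate scene size (mm) covered by one pixel at the given distance."""
    return PIXEL_PITCH_M * distance_m / FOCAL_LENGTH_M * 1000

print(f"Horizontal FOV: {hfov_deg:.1f} deg")          # roughly 60 degrees
print(f"Pixel footprint at 5 m: {ground_sample_mm(5):.2f} mm")
```

Under these assumptions, a face 5 m from the camera is sampled at under 2 mm per pixel, which bounds the level of detail that forensic enhancement of citizen-submitted footage can recover.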


Acknowledgements

This work has been supported by the EU project INSPEC2T under the H2020-FCT-2014 programme (GA 653749).

Corresponding author

Correspondence to Peter Leškovský.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Leškovský, P. et al. (2019). Multimedia Analysis in Police–Citizen Communication: Supporting Daily Policing Tasks. In: Akhgar, B., Bayerl, P.S., Leventakis, G. (eds) Social Media Strategy in Policing. Security Informatics and Law Enforcement. Springer, Cham. https://doi.org/10.1007/978-3-030-22002-0_13
