Abstract
This chapter describes an approach to improved multimedia analysis as part of an ICT-based tool for community policing. It covers technologies for the automatic processing of audio, image and video content sent as evidence by citizens to the police. In addition to the technical details of their development, the performance of these technologies in initial pilots simulating near-real crime situations is presented and discussed.
Notes
1. The Google Nexus 5X was selected as the reference device for the INSPEC2T project; it includes an 8 MP camera with 1.4 µm pixels and a 4.0 mm focal length (a rough resolution estimate based on these figures is sketched below).
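The figures in Note 1 can be turned into a rough estimate of what the reference camera resolves at typical evidence-capture distances. The following Python sketch is illustrative only: the assumed 3264 × 2448 sensor layout and the pinhole-camera model are our own assumptions, not stated in the chapter.

```python
import math

# Rough sketch based on Note 1: Google Nexus 5X rear camera,
# 1.4 micron pixels, 4.0 mm focal length.
# Assumptions (not from the chapter): 8 MP sensor in 4:3 format
# (3264 x 2448 pixels) and an ideal pinhole model.

PIXEL_PITCH_M = 1.4e-6       # 1.4 micron pixel pitch (Note 1)
FOCAL_LENGTH_M = 4.0e-3      # 4.0 mm focal length (Note 1)
H_PIXELS, V_PIXELS = 3264, 2448  # assumed 8 MP resolution

# Physical sensor width implied by pixel count and pitch (~4.57 mm).
sensor_width_m = H_PIXELS * PIXEL_PITCH_M

# Horizontal field of view of the pinhole model.
hfov_deg = math.degrees(2 * math.atan(sensor_width_m / (2 * FOCAL_LENGTH_M)))

def ground_sample_distance(distance_m: float) -> float:
    """Approximate scene size (in metres) covered by one pixel at a given distance."""
    return PIXEL_PITCH_M * distance_m / FOCAL_LENGTH_M

if __name__ == "__main__":
    print(f"Horizontal FoV: {hfov_deg:.1f} degrees")
    for d in (2.0, 5.0, 10.0):
        mm_per_pixel = ground_sample_distance(d) * 1000
        print(f"At {d:>4.1f} m, one pixel covers ~{mm_per_pixel:.2f} mm")
```

Under these assumptions the camera sees roughly a 60° horizontal field of view and one pixel covers about 1.75 mm at 5 m, which gives a feel for the image detail available to the face, person and scene analysis modules discussed in the chapter.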
Acknowledgements
This work has been supported by the EU project INSPEC2T under the H2020-FCT-2014 programme (GA 653749).
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Leškovský, P. et al. (2019). Multimedia Analysis in Police–Citizen Communication: Supporting Daily Policing Tasks. In: Akhgar, B., Bayerl, P.S., Leventakis, G. (eds) Social Media Strategy in Policing. Security Informatics and Law Enforcement. Springer, Cham. https://doi.org/10.1007/978-3-030-22002-0_13
DOI: https://doi.org/10.1007/978-3-030-22002-0_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-22001-3
Online ISBN: 978-3-030-22002-0
eBook Packages: Literature, Cultural and Media Studies (R0)