Smart surveillance system for real-time multi-person multi-camera tracking at the edge

Abstract

In this work, we present an end-to-end multi-person multi-camera tracking (MPMCT) surveillance system and implement it on an edge analytics platform for real-time performance. The proposed MPMCT framework is both privacy-aware and scalable, supporting an on-edge processing pipeline that consists of person detection, tracking, and robust person re-identification. A large, realistic dataset has been created to train and evaluate the surveillance system, which has been employed to track people inside the institute campus throughout the entire day. Appropriate deep-learning algorithms and real-time implementation strategies are used to realize the MPMCT system on the NVIDIA Jetson TX2 embedded platform. The proposed system achieves an IDF1 score of 90.97 on our dataset and outperforms current state-of-the-art real-time algorithms. The person detection algorithm runs at up to 30 FPS, and the re-identification algorithm has an average latency of 90 ms.
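For readers unfamiliar with the IDF1 metric quoted above, the following is a minimal sketch of how the score is conventionally computed from identity-level counts: IDF1 = 2·IDTP / (2·IDTP + IDFP + IDFN). The function name and the example counts are hypothetical and chosen only to illustrate the arithmetic; they are not taken from the paper's dataset or evaluation code.

```python
def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Return IDF1 as a percentage.

    idtp: identity true positives (detections matched to the correct identity)
    idfp: identity false positives
    idfn: identity false negatives
    """
    denom = 2 * idtp + idfp + idfn
    if denom == 0:
        # No detections and no ground truth: define the score as 0.
        return 0.0
    return 100.0 * 2 * idtp / denom


# Hypothetical counts, used only to show the calculation.
score = idf1(idtp=900, idfp=100, idfn=100)
print(f"IDF1 = {score:.2f}")  # prints "IDF1 = 90.00"
```

Because IDF1 is the harmonic mean of identity precision and identity recall, it penalizes identity switches across cameras more directly than detection-level metrics do, which is why it is a common choice for multi-camera tracking evaluation.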

Author information

Corresponding author

Correspondence to Bipin Gaikwad.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Gaikwad, B., Karmakar, A. Smart surveillance system for real-time multi-person multi-camera tracking at the edge. J Real-Time Image Proc (2021). https://doi.org/10.1007/s11554-020-01066-8

Keywords

  • Smart surveillance
  • MPMCT
  • Edge analytics
  • Person detection
  • Person re-identification
  • Deep learning
  • YOLOv5
  • Resnet50-mid