Human Centered Scene Understanding Based on 3D Long-Term Tracking Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9117)

Abstract

Scene understanding approaches are mainly based on geometric information and do not consider human behavior. This paper introduces a novel human-centric scene understanding approach based on long-term tracking information. The tracking information is filtered and clustered, and areas offering meaningful functionalities for humans are modeled using kernel density estimation. This allows walking and sitting areas within an indoor scene to be modeled without any geometric information: the approach relies solely on continuous, noisy tracking data acquired from a 3D sensor that monitors the scene from a bird’s-eye view. The approach is evaluated on three datasets from two application domains (home and office environments), containing more than 180 days of tracking data.
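
The modeling step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the height-threshold filter, the fixed bandwidth, the synthetic tracks, and the names `split_by_height` and `gaussian_kde_2d` are all assumptions made for illustration, and the clustering step is omitted. The sketch only shows how a Gaussian kernel density estimate over floor-plane positions yields separate density maps for sitting and walking areas.

```python
import math
import random


def split_by_height(track, sitting_max_height):
    """Hypothetical filter: split (x, y, height) samples into walking and
    sitting floor positions using a simple height threshold (an assumption,
    not taken from the paper)."""
    walking, sitting = [], []
    for x, y, h in track:
        (sitting if h <= sitting_max_height else walking).append((x, y))
    return walking, sitting


def gaussian_kde_2d(samples, bandwidth):
    """Return a function that estimates the density at (x, y) from 2D samples
    using an isotropic Gaussian kernel with a fixed bandwidth."""
    if not samples:
        raise ValueError("need at least one sample")
    norm = 1.0 / (len(samples) * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        total = 0.0
        for sx, sy in samples:
            d2 = (x - sx) ** 2 + (y - sy) ** 2
            total += math.exp(-d2 / (2.0 * bandwidth ** 2))
        return norm * total

    return density


# Synthetic tracking data (assumed): a seat near (1, 1) and a corridor at y = 3.
rng = random.Random(0)
track = [(rng.gauss(1.0, 0.2), rng.gauss(1.0, 0.2), rng.gauss(0.9, 0.05))
         for _ in range(200)]
track += [(rng.uniform(2.0, 5.0), rng.gauss(3.0, 0.2), rng.gauss(1.7, 0.05))
          for _ in range(200)]

walking, sitting = split_by_height(track, sitting_max_height=1.2)
sit_density = gaussian_kde_2d(sitting, bandwidth=0.3)
walk_density = gaussian_kde_2d(walking, bandwidth=0.3)
```

Evaluating `sit_density` and `walk_density` on a grid over the floor plane would give one density map per functionality. In practice the bandwidth would have to be chosen to match the noise level of the tracker.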

Notes

  1. Since Primesense no longer supports the OpenNI project, the authors would like to stress that the proposed approach is fully independent of third-party companies. Hence, other depth cameras and tracking algorithms can be used to obtain the long-term tracking data.

  2. http://tracking-dataset.planinc.eu.

Acknowledgement

This work is supported by the EU and national funding organisations of EU member states (AAL 2013-6-063).

Author information

Corresponding author

Correspondence to Rainer Planinc.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Planinc, R., Kampel, M. (2015). Human Centered Scene Understanding Based on 3D Long-Term Tracking Data. In: Paredes, R., Cardoso, J., Pardo, X. (eds) Pattern Recognition and Image Analysis. IbPRIA 2015. Lecture Notes in Computer Science, vol 9117. Springer, Cham. https://doi.org/10.1007/978-3-319-19390-8_11

  • DOI: https://doi.org/10.1007/978-3-319-19390-8_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-19389-2

  • Online ISBN: 978-3-319-19390-8

  • eBook Packages: Computer Science, Computer Science (R0)
