
Passive Scene Recognition

Indoor Scene Recognition by 3-D Object Search

Part of the book series: Springer Tracts in Advanced Robotics (STAR, volume 135)


Abstract

This chapter gives a detailed technical presentation of our contributions related to Passive Scene Recognition. These include the learning of Trees of Implicit Shape Models as well as carrying out scene recognition on the basis of these classifiers.


Notes

1. The object trajectories are visualized as colored line strips between coordinate frames that indicate the recorded poses.

2. The overall confidence of a scene model is visualized by a sphere above the reference object. Its color changes from red to green with increasing confidence. Fulfilled relations are displayed as lines whose colors express to which ISM they belong. The degree to which an individual relation is fulfilled is indicated by the color of a small pyramid at that end of the relation. In order to reduce the overlap between the visualizations of scene models and real objects, transparent grey cylinders are placed between the relations and the objects.

3. The overall confidence of a scene model from our hierarchical representation is visualized by a sphere at its top. In 3 and 4 in Fig. 3.3, the visible models consist of two layers. At the top, there is an ISM that connects ISMs from the underlying layer in the tree and summarizes their recognition results. By doing so, it also represents additional spatial relations that are not yet covered by the ISMs below it. The individual confidences of the results are represented by smaller spheres. All relations of an ISM that are fulfilled are visualized by lines of the same color.

4. An example object configuration from the demonstration is visualized in transparent blue to ensure the clarity of the picture.

5. In Sect. 1.1.2, we combined the two tags \(c\) and \(d\) into a single name tag for the sake of simplicity. Differentiating between object class \(c\) and identifier \(d\) allows, for example, expressing that two objects are identical with respect to their shapes but differ in their colors.

6. Since scene category models and relation topologies contain objects \(o\) and spatial relations \(\mathbf R \), we chose to define both with the same identifiers.

7. We visualize relation topologies as undirected graphs, e.g. in Fig. 1.6.

8. Throughout this thesis, we use mathematical set theory instead of composite data types for the sake of compactness.

9. We derive our method with homogeneous transforms [5] for better readability but implement it by using pairs of position vectors and quaternions [5] in order to optimize the sizes of ISM tables (see the pose-conversion sketch after these notes).

10. It is to be noted that this reference is an imaginary object, existing separately from all the real objects in the scene category.

11. Throughout this thesis, we use mathematical set theory instead of composite data types for the sake of compactness. For instance, the union operator in Algorithm 2 has time complexity \(\mathcal {O}(1)\) in our notation.

12. These transforms are located at the head of lines which represent relative reference poses \(\mathbf T _{oF}\) within an ISM table and start at input object pose \(\mathbf T \). The input poses are drawn as spheres but are partly occluded by the three-dimensional models of the objects to which they belong.

13. This is the reason why we do not represent vote combination \(\mathbf {{v}}_\mathbf{S }\) as a subset of \(\mathbf B _\mathbf{S }\).

14. Local search methods are not used, in order to avoid issues related to local optima [18, p. 669]. We only present design decisions that reduce problem complexity, omitting canonical optimization strategies such as parallelization for the sake of clarity.

15. A sufficiently small bucket size \(s\) should be selected in order to prevent vote combinations of different configurations from falling into the same bucket (see the bucket-index sketch after these notes).

16. The reference pose hypotheses in the votes are visualized by the shapes of the objects that cast the votes.

17. This sphere is the circumscribed sphere of the bucket, the three-dimensional analog to the circumscribed circle [1, p. 137]. Its diameter ensures that this sphere is enclosed in the neighborhood of the bucket, no matter the location of its center within the bucket (a worked diameter calculation follows these notes).

18. Without loss of generality, vote \(\mathbf v _{C}\) is part of every vote combination that is deduced from the sphere, centered at \(\mathbf T _{F}(\mathbf v _{C})\).

19. The vote \(\mathbf v _{j}\) that each object \(o_{j}\) in the scene category instance contributes is depicted as a thick black line. \(\mathbf T _{F}(\mathbf v _{C})\) is the pose hypothesis of the center \(\mathbf v _{C}\) of the sphere that contains this vote combination. All deviations between \(\mathbf T _{F}(\mathbf v _{C})\) and all other votes in the accumulator are visualized as thin black lines. The confidence of the contributed scene category instance is expressed by the coloring of the sphere, ranging from red to green with increasing confidence.

20. We refer to this distance as the (variable) length of a spatial relation.

21. Nevertheless, \(\mathbf T _{F}(\mathbf v _{j}),\mathbf T _{F}(\mathbf v _{C})\) are still passed to Algorithm 6 as both are necessary to calculate \(\mathbf T , \mathbf T _{p}\).

22. We employ the same visualization for drawing perfectly matching poses with the corresponding relative poses \(\{\mathbf {T}_{Fo}\}\) that we used for visualizing reference pose votes.

23. The aspect of the individual areas that is related to positions is depicted as transparent spheres whose centers are the poses from 1 in Fig. 3.9. In addition, we randomly sample 6-DoF object poses within the acceptance areas and visualize them as smaller, opaque spheres that represent their positions and to which we attach cones. The symmetry axes of the cones correspond to an axis of the coordinate frames of the sampled poses, and the radius of the cones stands for the maximum angle deviation.

24. This is especially the case when relying on small bucket sizes.

25. Both rating variants fulfill their respective roles, especially since a meaningful, precise rating presupposes accurate demonstration data. Besides, fulfilling the given thresholds already represents a perfect result according to the semantics of some applications.

26. Since the possible values of position deviations are not limited by any upper bound, we opted for a function that is defined on the open interval [1, p. 3] \((-\infty , \infty )\).

27. The domain of this function, the closed interval [1, p. 3] \([{0}^\circ ,{180}^\circ ]\), matches the range of values of \(w_{o}(\mathbf T ,\mathbf T _{p})\).

28. Since \(o_{M}\) may not be unique in the input topology, \(H_{D}(\{o\},\{\mathbf {R}\})\) is a multi-valued function.

29. In particular, the first part of ISM tree generation differs from the second in that it operates solely on abstract entities with no notion of the spatial characteristics of the relations of scene category \(\mathbf S \).

30. Contrary to the definition of reference objects in Sect. 3.4, such scene reference objects \(o_{F}\) do not stand for real objects among the set \(\{o\}\) an ISM is supposed to connect to each other. They are imaginary objects that exist outside the scope of an individual ISM and are deduced from recognition results \(\mathbf I _{m}\).

31. We define the height of a vertex in a search tree as increasing from 0 at the root towards the leaves of the search tree.

32. Individual ISMs are visualized in analogy to Fig. 3.12.

33. In star topologies, multiple objects may have a maximal degree in terms of the number of relations they participate in. The center of such topologies may not be identical with the object in the respective topology that minimizes the height function.

34. Searching for the center \(o_{M}(\Sigma _{\sigma }(j))\) of topology \(\Sigma _{\sigma }(j)\) instead of object \(o_{H}\) may yield suboptimal tree heights. Again, this is a consequence of the possibility that topology \(\Sigma _{\sigma }(j)\) may contain multiple objects of maximum degree.

35. An ISM tree can be stored as a set of tables in a relational database (a hypothetical schema sketch follows these notes).

36. Input configuration \(\{i\}\) is extended in the course of scene recognition.

37. Instead of passing scene category \(\mathbf S \) to Algorithm 2, we pass ISMs \(m\) as parameters, since all ISMs in a tree refer to the same scene category. Consequently, ISM tables no longer have to be loaded within Algorithm 2.

38. If object \(o\) participates in several relations in input topology \(\Sigma _{\nu }\), these relations are distributed among different star topologies \(\Sigma _{\sigma }(j)\) if \(o\) is not selected as a center \(o_{M}(\Sigma _{\sigma }(j)) = o\) during the partitioning of \(\Sigma _{\nu }\).

39. In contrast to scene category instances \(\mathbf I _\mathbf{S }\), identifiers \(\mathbf I _{m_{k}}\) of recognition results are assigned the identifier of the ISM \(m_{k}\) from which they originate.

40. Scene labels \(z_{m}\) of non-root ISMs \(\{m\} \setminus m_{R}\) are created by appending “_sub” postfixes to the identifier \(z\) of the overall scene category.

41. Spheres that belong to estimates \(\mathbf E (o)\) for the same object \(o\) are visualized in the same color.

42. \(b_{m}\) corresponds to the number of input objects that may cast votes in ISM \(m\). \(a_{m}\) corresponds to the number of table entries per object in the same ISM.

43. This assumption refers to the employed value \(\epsilon = 0\) for acceptance threshold \(\epsilon \).

44. For the sake of simplicity, this option is omitted in Algorithm 2.

45. This option is omitted in Algorithm 2, too.

46. The ISM tree-generation algorithms require a connected relation topology \(\Sigma _{\nu }\) and a set of demonstrated object trajectories \(\{\mathbf {J}(o)\}\) as input.

47. In this thesis, we set the threshold for accepting an input configuration \(\{i\}\) as a valid instance of a scene category to 1.

48. ISM trees that are based on star topologies \(\Sigma _{\sigma }\) are made up of just a single ISM \(\{m_{\sigma }\} = m_{\sigma }\).

49. The mixed confidence of the scene category instance that is depicted in 4 in Fig. 3.14 is made visible by the yellow coloring of the sphere above the reference of the root of ISM tree \(\{m_{\mu }\}\).

50. In this thesis, connected relation topologies are visualized as connected labeled graphs.

51. In the context of Eq. 3.16, a false positive is considered detected when the return value of Algorithm 12 is not empty.

52. Even though we rate ISM trees \(\{m\}\), we assign the values of objective function \(r()\) to their topologies \(\Sigma \).

53. The optimized topologies \(\Sigma _{o}\) for scene category “Setting—Ready for Breakfast” that are presented in this section have been selected on the basis of the weights \(\omega _{F} : \omega _{D} = 3 : 1\).

54. Star topologies are among those topologies that contain the minimum number of spatial relations acceptable for connected topologies.

55. Our approach to visualizing ISM trees in this section differs from the one in Sect. 3.5.2. Leaves in trees are discerned by their shapes instead of names written into circles. ISM placeholders and the internal vertices to which they transfer scene reference objects are merged into cloud-shaped symbols.

56. \(n\) is the number of objects in the scene category and \(l\) is the length of the trajectories.

57. Since references \(o_{F}\) are chosen at random, they potentially change each time a configuration \(\{i_{p}\}\) is generated.

58. Star topologies for that scene category contain \(|\{\mathbf {R}\}| = 7\) relations. \(|\{\mathbf {R}_{\mu }\}| = 28\) relations exist in the corresponding complete topology (see the worked counts after these notes).

59. In addition, an option exists for disabling both the remove and the exchange operation.

60. The rating of a topology is visualized by means of a color interval from red to green where red stands for the highest rating encountered during optimization and green represents the lowest value.

61. The exact value of the rating of a topology stands in the middle row of the label of its circle.

62. In the lowermost row of the label of each circle, the exact value of numFPs\(()\) can be found on the left and the one of avgDur\(()\) on the right.

63. The amount of time avgDur\(()\) returns for a topology is proportional to the diameter of its circle.

64. In hill-climbing, current and optimized topology always coincide.

65. The visualization used in Figs. 3.22 and 3.23 is identical to that in Fig. 3.20.

66. When using simulated annealing, the successor function has three operations at its disposal. Both remove and exchange operations are required for removing inappropriate relations in some current topologies, resulting from the initially large tolerance of simulated annealing (a hill-climbing sketch that uses these operations follows these notes).

67. Calling the successor function repeatedly for the same current topology makes sense because this function always returns just a randomly composed, limited subset of the successors that it is able to generate.

68. Another reason for the superiority of hill-climbing in this example is its capability of restricting itself to the add operation when using the successor function, instead of having to rely on all three operations as simulated annealing does.

69. Given a scene category, we assign performances of ISM trees to the topologies from which they originate.

70. We regard both optimized topologies and their ISM trees as direct results of relation topology selection.

71. Please note that object sets \(\{o\}\) consist of objects \(o\) and not of object estimates \(\mathbf E (o)\).

72. Scene variations differ in terms of the object poses that can be found in them. Scene alternatives, by contrast, consist of different objects.
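The following sketches expand on some of the notes above. For note 9, a minimal, illustrative conversion between a homogeneous transform and a position-quaternion pair is shown below; it assumes NumPy and SciPy, and the function names are invented for this example rather than taken from the thesis software.

```python
# Minimal sketch (not the thesis implementation): a 6-DoF pose stored as a
# position vector plus a unit quaternion (7 floats) instead of a full 4x4
# homogeneous transform (16 floats), which shrinks ISM table entries.
import numpy as np
from scipy.spatial.transform import Rotation


def to_pos_quat(T):
    """Split a 4x4 homogeneous transform into (position, quaternion [x, y, z, w])."""
    return T[:3, 3].copy(), Rotation.from_matrix(T[:3, :3]).as_quat()


def to_homogeneous(position, quaternion):
    """Rebuild the 4x4 homogeneous transform from a (position, quaternion) pair."""
    T = np.eye(4)
    T[:3, :3] = Rotation.from_quat(quaternion).as_matrix()
    T[:3, 3] = position
    return T
```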
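For note 15, a hypothetical binning step illustrates how the bucket size \(s\) controls whether nearby reference-pose positions share a bucket; the function name bucket_index is invented for this sketch and does not reproduce the actual accumulator code.

```python
# Hypothetical binning step for the voting accumulator: positions that fall into
# the same cubic bucket of edge length s are grouped into one vote combination.
import numpy as np


def bucket_index(position, s):
    """Map a 3-D position to the integer index of the cubic bucket containing it."""
    return tuple(int(i) for i in np.floor(np.asarray(position) / s))


print(bucket_index([0.10, 0.20, 0.05], s=0.1))   # (1, 2, 0)
print(bucket_index([0.19, 0.20, 0.05], s=0.1))   # (1, 2, 0) -> same bucket
print(bucket_index([0.19, 0.20, 0.05], s=0.05))  # (3, 4, 1) -> separate buckets
```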
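For note 17, assuming cubic buckets of edge length \(s\), the circumscribed sphere has the cube's space diagonal as its diameter:

\[
d = \sqrt{s^{2} + s^{2} + s^{2}} = s\sqrt{3}.
\]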
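For note 35, a purely hypothetical relational schema indicates how an ISM tree could be persisted; all table and column names are invented for this sketch and do not reproduce the actual database layout of the thesis software.

```python
# Purely illustrative sketch for note 35: persisting an ISM tree as relational
# tables. Table and column names are hypothetical.
import sqlite3

conn = sqlite3.connect("ism_tree_example.sqlite")
conn.executescript("""
CREATE TABLE IF NOT EXISTS isms (
    ism_id      INTEGER PRIMARY KEY,
    scene_label TEXT NOT NULL,                    -- e.g. a '_sub' label for non-root ISMs, cf. note 40
    parent_id   INTEGER REFERENCES isms(ism_id)   -- NULL for the root ISM
);
CREATE TABLE IF NOT EXISTS table_entries (
    entry_id     INTEGER PRIMARY KEY,
    ism_id       INTEGER NOT NULL REFERENCES isms(ism_id),
    object_class TEXT NOT NULL,                   -- tag c
    object_id    TEXT NOT NULL,                   -- tag d
    -- relative pose stored as position vector plus quaternion, cf. note 9:
    px REAL, py REAL, pz REAL,
    qx REAL, qy REAL, qz REAL, qw REAL
);
""")
conn.commit()
conn.close()
```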
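The relation counts in note 58 follow from elementary graph counting for \(n\) objects: a star topology contains \(n - 1\) relations and the complete topology \(n(n-1)/2\). The quoted values correspond to \(n = 8\):

\[
|\{\mathbf {R}\}| = n - 1 = 7, \qquad |\{\mathbf {R}_{\mu }\}| = \frac{n(n-1)}{2} = \frac{8 \cdot 7}{2} = 28.
\]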
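For notes 60 to 68, the following schematic hill-climbing loop over relation topologies uses a successor function built from the add, remove and exchange operations. It is a sketch under stated assumptions, not the thesis implementation: it assumes, as the color scale in note 60 suggests, that lower ratings are better; rate_topology and all other names are placeholders, and connectivity checks of candidate topologies are omitted.

```python
# Schematic sketch: hill-climbing over relation topologies, each modeled as a
# frozenset of relations (object pairs). Connectivity checks are omitted.
import itertools
import random


def successors(topology, all_relations, k=10):
    """Generate up to k random successors via the operations named in note 66:
    add a relation, remove one, or exchange one against another."""
    absent, present = list(all_relations - topology), list(topology)
    candidates = [topology | {r} for r in absent]                 # add
    if len(present) > 1:
        candidates += [topology - {r} for r in present]           # remove
    candidates += [(topology - {out}) | {new}                     # exchange
                   for out, new in itertools.product(present, absent)]
    random.shuffle(candidates)
    return candidates[:k]   # only a limited random subset, cf. note 67


def hill_climb(start, all_relations, rate_topology, steps=100):
    """Keep the best-rated topology seen so far; here the current and the
    optimized topology therefore always coincide (note 64)."""
    current = start
    for _ in range(steps):
        rated = [(rate_topology(t), t) for t in successors(current, all_relations)]
        if not rated:
            break
        best_rating, best = min(rated, key=lambda pair: pair[0])
        if best_rating >= rate_topology(current):
            break   # no improving successor found: stop at this local optimum
        current = best
    return current
```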

References

  1. Bronshtein, I., Semendyayev, K., Musiol, G., Muehlig, H.: Handbook of Mathematics, 5th edn. Springer, Berlin (2007)

  2. Carneiro, G., Lowe, D.: Sparse flexible models of local features. In: European Conference on Computer Vision, pp. 29–43. Springer, Berlin (2006)

  3. Crandall, D., Felzenszwalb, P., Huttenlocher, D.: Spatial priors for part-based recognition using statistical models. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), vol. 1, pp. 10–17. IEEE (2005)

  4. Fawcett, T.: An introduction to ROC analysis. Pattern Recognit. Lett. 27(8), 861–874 (2006)

  5. Funda, J., Taylor, R.H., Paul, R.P.: On homogeneous transforms, quaternions, and computational efficiency. IEEE Trans. Robot. Autom. 6(3), 382–388 (1990)

  6. Grauman, K., Leibe, B.: Visual object recognition. Synth. Lect. Artif. Intell. Mach. Learn. 5(2), 1–181 (2011)

  7. Horstmann, M.G.: Tischgedeck und Tafelarrangement — herr mika TAFELKULTUR. http://herr-mika.tafelkultur.eu/?page_id=58. Accessed 15 May 2017

  8. Illingworth, J., Kittler, J.: A survey of the Hough transform. Comput. Vis. Graph. Image Process. 44(1), 87–116 (1988)

  9. Jäkel, R., Meißner, P., Schmidt-Rohr, S., Dillmann, R.: Distributed generalization of learned planning models in robot programming by demonstration. In: 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4633–4638. IEEE (2011)

  10. Jäkel, R., Schmidt-Rohr, S.R., Rühl, S.W., Kasper, A., Xue, Z., Dillmann, R.: Learning of planning models for dexterous manipulation based on human demonstrations. Int. J. Soc. Robot. 1–12 (2012)

  11. Kenwright, B.: Dual-quaternions, from classical mechanics to computer graphics and beyond (2012)

  12. Leibe, B., Leonardis, A., Schiele, B.: Robust object detection with interleaved categorization and segmentation. Int. J. Comput. Vis. 77(1–3), 259–289 (2008)

  13. Mehlhaus, J.: Komparative Analyse ausgewählter Algorithmen zur kombinatorischen Optimierung der räumlichen Relationen in hierarchischen Implicit Shape Models. Bachelor's thesis, Advisor: P. Meißner, Reviewer: R. Dillmann, Karlsruhe Institute of Technology (2016)

  14. Meißner, P., Hanselmann, F., Jäkel, R., Schmidt-Rohr, S., Dillmann, R.: Automated selection of spatial object relations for modeling and recognizing indoor scenes with hierarchical implicit shape models. In: 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4257–4264. IEEE (2015)

  15. Meißner, P., Reckling, R., Jäkel, R., Schmidt-Rohr, S., Dillmann, R.: Recognizing scenes with hierarchical implicit shape models based on spatial object relations for programming by demonstration. In: 2013 16th International Conference on Advanced Robotics (ICAR), pp. 1–6. IEEE (2013)

  16. Mitchell, T.M.: Machine Learning, International edn. McGraw-Hill, New York (1997)

  17. Navarro, G.: A guided tour to approximate string matching. ACM Comput. Surv. 33(1), 31–88 (2001)

  18. Russell, S.J., Norvig, P.: Artificial Intelligence: A Modern Approach, Third International edn. Prentice Hall Press, Upper Saddle River (2010)

  19. Samal, A., Iyengar, P.A.: Automatic recognition and analysis of human faces and facial expressions: a survey. Pattern Recognit. 25(1), 65–77 (1992)

  20. Siciliano, B., Khatib, O.: Springer Handbook of Robotics. Springer Science + Business Media, Berlin (2008)

  21. Sloane, N.J.A.: Number of connected labeled graphs with n nodes — The On-Line Encyclopedia of Integer Sequences. http://oeis.org/A001187. Accessed 02 June 2017

  22. Weisstein, E.W.: Connected graph — A Wolfram Web Resource. http://mathworld.wolfram.com/ConnectedGraph.html. Accessed 18 Feb 2017

  23. Weisstein, E.W.: Square pyramidal number — A Wolfram Web Resource. http://mathworld.wolfram.com/SquarePyramidalNumber.html. Accessed 30 June 2017


Author information


Corresponding author

Correspondence to Pascal Meißner.


Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Meißner, P. (2020). Passive Scene Recognition. In: Indoor Scene Recognition by 3-D Object Search. Springer Tracts in Advanced Robotics, vol 135. Springer, Cham. https://doi.org/10.1007/978-3-030-31852-9_3
