Abstract
While relational representations were popular in early work on syntactic and structural pattern recognition, they are rarely used in contemporary approaches to computer vision because of their purely symbolic nature. Recent progress and successes in combining statistical learning principles with relational representations motivate us to reinvestigate the use of such representations. More specifically, we show that statistical relational learning can be successfully used for hierarchical image understanding. We employ kLog, a new logical and relational language for learning with kernels, to detect objects at different levels in the hierarchy. The key advantage of kLog is that both appearance features and rich contextual dependencies between parts in a scene can be integrated in a principled and interpretable way to obtain a qualitative representation of the problem. At each layer, qualitative spatial structures of parts in images are detected, classified, and then employed one layer up the hierarchy to obtain higher-level semantic structures. We apply a four-layer hierarchy to street view images and successfully detect corners, windows, doors, and individual houses.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Antanas, L., Frasconi, P., Costa, F., Tuytelaars, T., De Raedt, L. (2012). A Relational Kernel-Based Framework for Hierarchical Image Understanding. In: Gimel’farb, G., et al. Structural, Syntactic, and Statistical Pattern Recognition. SSPR /SPR 2012. Lecture Notes in Computer Science, vol 7626. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34166-3_19
DOI: https://doi.org/10.1007/978-3-642-34166-3_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34165-6
Online ISBN: 978-3-642-34166-3
eBook Packages: Computer Science, Computer Science (R0)