Abstract
Automatic annotation of image contents can be performed more efficiently if it is supported by reliable segmentation algorithms that extract, as accurately as possible, areas with a certain level of semantic uniformity on top of the default pictorial uniformity of the regions they produce. The results should also be insensitive to noise, textures, and other effects that typically distort such uniformities. This chapter discusses a segmentation technique based on SIMSER (scale-insensitive maximally stable extremal regions) features, which are a generalization of the popular MSER features. Promising conformity (at least in selected applications) of such segmentation results with semantic image interpretation is shown. Additionally, the approach has a relatively low computational complexity (\(O(n \log n)\) or \(O(n \log n \log\log n)\), where n is the image resolution), which makes it potentially useful in real-time applications and/or in low-cost mobile devices. First, the chapter presents the fundamentals of the SIMSER detector (and of the original MSER detector) in gray-level images. Then, relations between semantics-based image annotation and SIMSER features are investigated and illustrated by extensive experiments (including color images, which are the main area of interest).
References
Hanbury, A.: A survey of methods for image annotation. J. Vis. Lang. Comput. 19, 617–627 (2008)
Liu, Y., Zhang, D., Lu, G., Ma, W.-Y.: A survey of content-based image retrieval with high-level semantics. Pattern Recogn. 40, 262–282 (2007)
Viola, P., Jones, M.J.: Robust real-time face detection. Int. J. Comput. Vis. 57, 137–154 (2004)
Zhou, B., Lapedriza, A., Xiao, J., Torralba, A., Oliva, A.: Learning deep features for scene recognition using places database. In: Ghahramani, Z., et al. (eds.) Advances in Neural Information Processing Systems 27 (NIPS 2014), pp. 487–495. Curran Associates, Inc. (2014)
Pal, N.R., Pal, S.K.: A review on image segmentation techniques. Pattern Recogn. 26, 1277–1294 (1993)
Zaitoun, N.M., Aqel, M.J.: Survey on image segmentation techniques. Procedia Comput. Sci. 65, 797–806 (2015)
Śluzek, A.: Local Detection and Identification of Visual Data: Selected Techniques and Applications. LAP, Saarbrücken (2013)
Belaid, L.J., Mourou, W.: Image segmentation: a watershed transformation algorithm. Image Anal. Stereol. 28(2), 93–102 (2009)
Thoma, M.: A survey of semantic segmentation. https://arxiv.org/pdf/1602.06541. Accessed 27 Apr 2017
Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide baseline stereo from maximally stable extremal regions. In: Proceedings of British Machine Vision Conference BMVC 2002, pp. 384–393 (2002)
Wu, Zh., Ke, Q., Isard, M., Sun, J.: Bundling features for large scale partial-duplicate web image search. In: Proceedings of 2009 IEEE Conference Computer Vision & Pattern Recognition CVPR 2009, vol. 1, pp. 25–32 (2009)
Donoser, M., Bischof, H.: Efficient maximally stable extremal region (MSER) tracking. In: Proceedings of 2006 IEEE Conference Computer Vision & Pattern Recognition CVPR 2006, vol. 1, pp. 553–560 (2006)
Gómez, L., Karatzas, D.: MSER-based real-time text detection and tracking. In: Proceedings of 22nd International Conference on Pattern Recognition ICPR 2014, pp. 3110–3115 (2014)
Nistér, D., Stewénius, H.: Linear time maximally stable extremal regions. In: Proceedings of 10th European Conference ECCV 2008, vol. 2, pp. 183–196 (2008)
Salahat, E., Saleh, H., Sluzek, A., Al-Qutayri, M., Mohammad, B., Elnaggar, M.: Architecture and method for real-time parallel detection and extraction of maximally stable extremal regions (MSERs). US Patent 9,311,555, 12 Apr 2016
Salahat, E., Saleh, H., Sluzek, A., Al-Qutayri, M., Mohammad, B., Elnaggar, M.: Hardware architecture for real-time extraction of maximally stable extremal regions (MSERs). US Patent 9,489,578, 8 Nov 2016
Mikolajczyk, K., Schmid, C.: A performance evaluation of local descriptors. IEEE Trans. PAMI 27, 1615–1630 (2005)
Forssén, P.-E., Lowe, D.G.: Shape descriptors for maximally stable extremal regions. In: Proceedings of 11th IEEE International Conference on Computer Vision ICCV 2007, pp. 1–8 (2007)
Kimmel, R., Zhang, C., Bronstein, A.M., Bronstein, M.M.: Are MSER features really interesting? IEEE Trans. PAMI 33, 2316–2320 (2011)
Martins, P., Carvalho, P., Gatta, C.: On the completeness of feature-driven maximally stable extremal regions. Pattern Recogn. Lett. 74, 9–16 (2016)
Śluzek, A.: Improving performances of MSER features in matching and retrieval tasks. In: Proceedings of 14th European Conference ECCV 2016, LNCS vol. 9915, pp. 759–770 (2016)
Śluzek, A., Saleh, H.: Algorithmic foundations for hardware implementation of scale-insensitive MSER features. In: Proceedings of 59th International Midwest Symposium Circuits & Systems MWSCAS 2016, pp. 1–4 (2016)
Donoser, M., Bischof, H., Wiltsche, M.: Color blob segmentation by MSER analysis. In: Proceedings of IEEE International Conference on Image Processing ICIP 2006, pp. 757–760 (2006)
Gui, Y., Zhang, X., Shang, Y.: SAR image segmentation using MSER and improved spectral clustering. EURASIP J. Adv. Sig. Process. 83 (2012)
Oh, I.S., Lee, J., Majumder, A.: Multi-scale image segmentation using MSER. In: Proceedings of 15th International Conference CAIP 2013, vol. II, pp. 201–208 (2013)
Wang, G., Gao, K., Zhang, Y., Li, J.: Efficient perceptual region detector based on object boundary. In: Proceedings of 22nd International Conference on Multimedia Modeling MMM 2016, vol. II, pp. 68–78 (2016)
Li, H., Cai, J., Nguyen, T.N.A., Zheng, J.: A benchmark for semantic image segmentation. In: Proceedings of IEEE International Conference Multimedia and Expo ICME 2013 (2013)
Lindeberg, T.: Feature detection with automatic scale selection. Int. J. Comput. Vis. 30, 77–116 (1998)
Śluzek, A.: MSER and SIMSER regions: A link between local features and image segmentation. In: Proceedings of International Conference on Computer Graphics & Digital Image Processing CGDIP 2017, Article 15 (2017)
Acknowledgements
Some results presented in this paper have been supported by the ATIC-SRC Center within Energy Efficient Electronic Systems contract 2013-HJ-2440 for the task A Low-Power System-on-Chip Detector and Descriptor of Visual Keypoints for Video Surveillance Applications.
Appendix
The Appendix contains details of the computational steps in SIMSER detection, focusing on prospective hardware or hardware-supported implementations. However, such details cannot be fully explained without insight into the detection of MSER features. Thus, the included information (a summary of the results presented in [22]) covers the most important facts about architectures used in both MSER and SIMSER detection, as well as architectures proposed specifically for SIMSER detection.
Detection of local minima in the threshold space
At each threshold level, the binary image of \(M \times N\) size is represented by three data structures:
- Seed matrix of regions SM (of the same size as the image) with the initial content \(SM_{i,j}= M\times (i-1)+j\), i.e. each pixel is a seed for itself. After processing, \(SM_{i,j}=K\), where K indicates the initial pixel (seed) of the region to which the (i, j) pixel belongs.
- Region Size matrix RS (of the same size) specifying the size of the region to which each (i, j) pixel belongs. Initially, \(RS_{i,j}=1\), i.e. each pixel is a separate region of unit size.
- Map-of-regions array, which for each image region lists its seed, its binary color and its size.
A small binary image and the final contents of its SM and RS matrices are shown in Fig. 3.8, while its Map-of-regions is given in Table 3.2.
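The three structures can be illustrated with a minimal Python sketch. It assumes 4-connectivity, row-by-row seed numbering starting at 1, and a dictionary standing in for the Map-of-regions array; the function name `label_binary_image` and the flood-fill strategy are illustrative choices, not the hardware architecture summarized from [22].

```python
from collections import deque


def label_binary_image(img):
    """Build SM, RS and a map of regions for one binary image.

    img is a list of rows holding 0/1 values. Seeds are numbered row by
    row starting from 1 and 4-connectivity is used (both are assumptions).
    """
    rows, cols = len(img), len(img[0])
    # Initial state: every pixel is its own seed and a region of size 1.
    SM = [[r * cols + c + 1 for c in range(cols)] for r in range(rows)]
    RS = [[1] * cols for _ in range(rows)]
    visited = [[False] * cols for _ in range(rows)]
    regions = {}  # seed -> (binary color, size)

    for r in range(rows):
        for c in range(cols):
            if visited[r][c]:
                continue
            # First pixel of a region in raster order becomes its seed.
            seed, color = SM[r][c], img[r][c]
            members, queue = [], deque([(r, c)])
            visited[r][c] = True
            while queue:  # flood fill over same-colored neighbours
                y, x = queue.popleft()
                members.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not visited[ny][nx]
                            and img[ny][nx] == color):
                        visited[ny][nx] = True
                        queue.append((ny, nx))
            # Write the final seed and region size into every member pixel.
            for y, x in members:
                SM[y][x] = seed
                RS[y][x] = len(members)
            regions[seed] = (color, len(members))
    return SM, RS, regions
```

For the 3×3 image `[[0, 0, 1], [0, 1, 1], [0, 0, 0]]`, the sketch returns SM = `[[1, 1, 3], [1, 3, 3], [1, 1, 1]]`, RS = `[[6, 6, 3], [6, 3, 3], [6, 6, 6]]` and the map `{1: (0, 6), 3: (1, 3)}`.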
Given such representations for the sequence of binary regions over three neighboring threshold levels (note that such regions are always nested), the local minima of the \(q_Q\) (see Eq. 3.2) and \(qt_Q\) (see Eq. 3.4) growth-rate functions can be straightforwardly identified. In other words, MSER regions can be detected, or SIMSER candidates (i.e. the regions which satisfy the local minimum criterion in the threshold space) can be pre-selected.
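As a rough illustration of the local-minimum test, the sketch below works on the sizes of one chain of nested regions over consecutive thresholds. Eq. 3.2 is not reproduced in this section, so the growth rate used here, \((|Q_{t+1}| - |Q_{t-1}|)/|Q_t|\), is an assumption borrowed from the conventional MSER formulation; the function names are hypothetical.

```python
def growth_rates(sizes):
    """Assumed growth rate q[t] = (sizes[t+1] - sizes[t-1]) / sizes[t].

    sizes[t] is the pixel count of the nested region at threshold index t.
    Only interior threshold levels have both neighbours, so only they
    receive a value.
    """
    return {t: (sizes[t + 1] - sizes[t - 1]) / sizes[t]
            for t in range(1, len(sizes) - 1)}


def local_minima(q):
    """Threshold indices where q is strictly below both of its neighbours."""
    return [t for t in sorted(q)
            if t - 1 in q and t + 1 in q
            and q[t] < q[t - 1] and q[t] < q[t + 1]]


# Sizes of one nested region chain; the region barely grows around index 3.
sizes = [10, 20, 40, 42, 44, 80, 160]
stable = local_minima(growth_rates(sizes))  # [3]
```

Since the sizes are read directly from the RS matrix or the Map-of-regions, the test needs only a few subtractions, divisions and comparisons per region.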
Detection of local minima in the scale space
To identify SIMSER blobs, the regions pre-selected as local minima in the threshold space should also be confirmed as local minima in the scale space, i.e. as minima of the second growth-rate function \(qs_Q\) (see Eq. 3.5). To verify this, two operations are needed:
- The original input image should be repeatedly processed by a smoothing filter. This is just a convolution with the filter kernel, i.e. an operation which can be straightforwardly implemented in hardware. Its computational complexity is O(n).
- The correspondences between binary regions in the neighboring scales should be established and, based on that, the values of the \(qs_Q\) growth rate evaluated. This is not a straightforward operation because binary regions over a sequence of scales often do not nest (a simple example is shown in Fig. 3.9).
To solve this problem, the following pseudocode is proposed (a less efficient variant, which nevertheless clearly shows the O(n) complexity of the algorithm, was given in [21]):
Evaluation of \(qs_Q\) growth-rate function
The scheme takes two binary images (at the same threshold but at neighboring scales), their SM and RS matrices, and their maps-of-regions (see above). For each binary region at the current scale, the identifier of the corresponding next-scale region is found, and the value of the growth-rate function \(qs_Q\) is evaluated. The changes of \(qs_Q\) can therefore be tracked over the scales, and its local minima can be easily found.
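A minimal Python sketch of such a scheme is given below, under two explicit assumptions. First, the corresponding next-scale region is taken to be the same-colored region sharing the most pixels with the current one. Second, because Eq. 3.5 is not reproduced in this section, the ratio of the symmetric difference to the current region size is used as a hypothetical stand-in for \(qs_Q\). Neither choice should be read as the chapter's definition; the function names are illustrative.

```python
from collections import Counter


def match_regions(sm_cur, sm_next, img_cur, img_next):
    """Map each current-scale seed to (next-scale seed, shared pixel count).

    A single pass over the pixels counts overlaps between regions of the
    same binary color; the largest overlap wins (an assumed rule).
    """
    overlap = Counter()
    for r, row in enumerate(sm_cur):
        for c, seed in enumerate(row):
            if img_cur[r][c] == img_next[r][c]:
                overlap[(seed, sm_next[r][c])] += 1
    best = {}
    for (cur, nxt), shared in overlap.items():
        if cur not in best or shared > best[cur][1]:
            best[cur] = (nxt, shared)
    return best


def scale_growth(best, regions_cur, regions_next):
    """Hypothetical stand-in for qs_Q: |symmetric difference| / |current|.

    regions_* map a seed to (binary color, size), like a map-of-regions.
    """
    rates = {}
    for cur, (nxt, shared) in best.items():
        size_cur, size_next = regions_cur[cur][1], regions_next[nxt][1]
        rates[cur] = (size_cur + size_next - 2 * shared) / size_cur
    return rates


# Two 2x3 binary images at neighboring scales with their SM matrices.
img_cur = [[1, 1, 0], [1, 0, 0]]
sm_cur = [[1, 1, 3], [1, 3, 3]]
regions_cur = {1: (1, 3), 3: (0, 3)}
img_next = [[1, 1, 1], [1, 0, 0]]
sm_next = [[1, 1, 1], [1, 5, 5]]
regions_next = {1: (1, 4), 5: (0, 2)}

best = match_regions(sm_cur, sm_next, img_cur, img_next)
rates = scale_growth(best, regions_cur, regions_next)
```

A region with no same-colored overlap at the next scale simply receives no entry in `best`, which corresponds to a region that vanishes under smoothing. Each pixel is visited once, so the cost of the pass stays linear in the number of pixels.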
In this way, all operations needed to identify SIMSER features are completed.
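The repeated smoothing that produces the sequence of scales can be sketched in the same spirit. A 3×3 binomial kernel and replicated image borders are assumed purely for illustration, since the filter is not specified in this section; `smooth` and `scale_sequence` are hypothetical names.

```python
# Assumed 3x3 binomial kernel; its weights sum to 1.
BINOMIAL_3X3 = [[1 / 16, 2 / 16, 1 / 16],
                [2 / 16, 4 / 16, 2 / 16],
                [1 / 16, 2 / 16, 1 / 16]]


def smooth(img, kernel=BINOMIAL_3X3):
    """Filter img (list of rows) with a small square symmetric kernel.

    Pixels outside the image are replaced by the nearest border pixel.
    """
    rows, cols = len(img), len(img[0])
    k = len(kernel) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for dy in range(-k, k + 1):
                for dx in range(-k, k + 1):
                    y = min(max(r + dy, 0), rows - 1)  # clamp to the border
                    x = min(max(c + dx, 0), cols - 1)
                    acc += kernel[dy + k][dx + k] * img[y][x]
            out[r][c] = acc
    return out


def scale_sequence(img, levels, kernel=BINOMIAL_3X3):
    """Return [img, smoothed once, smoothed twice, ...] with levels + 1 items."""
    scales = [img]
    for _ in range(levels):
        scales.append(smooth(scales[-1], kernel))
    return scales
```

For a kernel of fixed size, every pass touches each pixel a constant number of times, which is consistent with the O(n) cost stated for the smoothing step.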
As an example, a pair of binary images from two neighboring scales is shown in Fig. 3.10, and the corresponding results of the above operations are included in Table 3.3. In this example, Region 4 has the best chance of being a local minimum (it has the smallest value of \(qs_Q\)). To confirm that, however, the similarly computed values of \(qs_Q\) for Region C (which corresponds to Region 4 in the next scale) and for the corresponding region in the previous scale should be larger (Fig. 3.11).
Altogether, it can be concluded that the SIMSER detection architecture is a relatively simple expansion of the MSER detection architecture, so that a hardware implementation of the SIMSER detector is a feasible task.
Copyright information
© 2018 Springer International Publishing AG
About this chapter
Cite this chapter
Śluzek, A. (2018). Scale-Insensitive MSER Features: A Promising Tool for Meaningful Segmentation of Images. In: Kwaśnicka, H., Jain, L. (eds) Bridging the Semantic Gap in Image and Video Analysis. Intelligent Systems Reference Library, vol 145. Springer, Cham. https://doi.org/10.1007/978-3-319-73891-8_3
DOI: https://doi.org/10.1007/978-3-319-73891-8_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73890-1
Online ISBN: 978-3-319-73891-8
eBook Packages: Engineering, Engineering (R0)