Skyline-based dissimilarity of images

  • Nikolaos Georgiadis
  • Eleftherios Tiakas
  • Yannis Manolopoulos
  • Apostolos N. Papadopoulos


Large image collections are used in many modern applications. In this paper, we aim to capture the intrinsic dissimilarities of image descriptors in large image collections, i.e., to detect dissimilar (diverse) images without defining an explicit similarity or distance measure. To this end, we apply skyline processing techniques to large image databases, operating on their high-dimensional descriptor vectors. The novelty of the proposed methodology lies in combining skyline techniques with state-of-the-art hashing schemes, which enable effective data partitioning and indexing in secondary memory and thereby support large image databases. The approach is evaluated experimentally on three real-world image datasets. The results demonstrate that images lying on the skyline have significantly different characteristics, which depend on the type of descriptor. Thus, these skyline items may be used as seeds for clustering in large image databases. In addition, we observe that skyline processing with hash-based indexing structures is significantly faster than index-free skyline computation and also more efficient than skyline computation with hierarchical indexing structures. Overall, the proposed approach is both efficient (in terms of runtime) and effective (with respect to image diversity), and can therefore serve as a basis for more complex data mining tasks such as clustering.
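To make the notion of a skyline over descriptor vectors concrete, the following is a minimal sketch of index-free skyline computation, the slow baseline the abstract compares against. It is not the authors' implementation. It assumes that smaller values are preferred in every dimension, and the names `dominates`, `skyline`, and the toy `descriptors` list are hypothetical, introduced only for illustration.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a dominates b.

    Assumption: smaller is better in every dimension. a dominates b when
    a is no worse than b everywhere and strictly better somewhere.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points: List[Sequence[float]]) -> List[int]:
    """Return the indices of points not dominated by any other point.

    Naive quadratic scan without any index or hashing; suitable only
    for small inputs.
    """
    result = []
    for i, p in enumerate(points):
        dominated = any(
            dominates(q, p) for j, q in enumerate(points) if j != i
        )
        if not dominated:
            result.append(i)
    return result


# Toy 3-dimensional "descriptors" (hypothetical values, not real image data).
descriptors = [
    (0.1, 0.9, 0.5),
    (0.4, 0.4, 0.4),
    (0.5, 0.5, 0.5),  # dominated by (0.4, 0.4, 0.4)
    (0.9, 0.1, 0.6),
]

print(skyline(descriptors))  # prints [0, 1, 3]
```

The quadratic pairwise comparison in this sketch is what makes index-free computation expensive on large, high-dimensional collections. According to the abstract, partitioning and indexing the data with hashing schemes is what reduces that cost.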


Keywords: Image databases, Image descriptors, Skyline algorithms, Hashing techniques




Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Informatics, Aristotle University, Thessaloniki, Greece
  2. Faculty of Pure and Applied Sciences, Open University of Cyprus, Nicosia, Cyprus
