Near Real-Time Searchable Analytics for Images

Searchable Storage in Cloud Computing

Abstract

The explosive growth in data volume and complexity is driving an increasing need for semantic queries. A semantic query can be interpreted as correlation-aware retrieval that may return approximate results. Existing cloud storage systems generally fail to offer adequate support for such queries. Because the value of data depends heavily on how efficiently semantic search can be carried out in (near-) real time, large fractions of data lose much or all of their value as they become stale. To address this problem, we propose FAST, a near real-time and cost-effective methodology for semantic queries. The idea behind FAST is to explore and exploit the semantic correlation within and among datasets via correlation-aware hashing and manageable flat-structured addressing, which significantly reduces processing latency while incurring an acceptably small loss of search accuracy. The near real-time property of FAST enables rapid identification of correlated files and significantly narrows the scope of data to be processed. FAST supports several types of data analytics, which can be implemented in existing searchable storage systems. We present a real-world use case in which children reported missing in an extremely crowded environment (e.g., a highly popular scenic spot on a peak tourist day) are identified in a timely fashion by analyzing 60 million images using FAST. FAST is further improved by a semantic-aware namespace that provides dynamic and adaptive namespace management for ultra-large storage systems. Extensive experimental results demonstrate the efficiency and efficacy of FAST (© 2016 IEEE. Reprinted, with permission, from Ref. [1].).
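
As a rough illustration of how correlation-aware hashing and flat-structured addressing can fit together, the sketch below hashes feature vectors so that similar vectors tend to land in the same bucket of a single flat table. The abstract does not specify the hash functions or data layout, so everything here is an assumption: sign-random-projection locality-sensitive hashing stands in for the correlation-aware hash, a plain dictionary stands in for the flat addressing scheme, and the names `make_hyperplanes`, `lsh_signature`, and `FlatIndex` are hypothetical.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplane normals (Gaussian) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_signature(vector, hyperplanes):
    """Sign-random-projection hash: one bit per hyperplane.

    Vectors separated by a small angle agree on most bits, so they are
    likely (not guaranteed) to share a signature.
    """
    bits = 0
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vector))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits


class FlatIndex:
    """Flat (non-hierarchical) table mapping a signature to item ids.

    Assumption: a dict of lists is only a stand-in for whatever flat
    addressing structure a real system would use.
    """

    def __init__(self, num_bits, dim, seed=0):
        self.hyperplanes = make_hyperplanes(num_bits, dim, seed)
        self.buckets = defaultdict(list)

    def insert(self, item_id, vector):
        self.buckets[lsh_signature(vector, self.hyperplanes)].append(item_id)

    def query(self, vector):
        """Return candidate ids in the query's bucket (approximate result)."""
        signature = lsh_signature(vector, self.hyperplanes)
        return list(self.buckets.get(signature, []))
```

A lookup touches one bucket rather than scanning the whole dataset, which is the sense in which hashing narrows the scope of data to be processed. The price is approximation: similar items can occasionally fall into different buckets and be missed.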

References

  1. Y. Hua, H. Jiang, D. Feng, Real-time semantic search using approximate methodology for large-scale storage systems. IEEE Trans. Parallel Distrib. Syst. (TPDS) 27(4), 1212–1225 (2016)

  2. M. Armbrust, A. Fox, R. Griffith, A.D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, M. Zaharia, A view of cloud computing. Commun. ACM 53(4), 50–58 (2010)

  3. A. Marathe, R. Harris, D.K. Lowenthal, B.R. de Supinski, B. Rountree, M. Schulz, X. Yuan, A comparative study of high-performance computing on the cloud, in Proceedings of HPDC (2013)

  4. P. Nath, B. Urgaonkar, A. Sivasubramaniam, Evaluating the usefulness of content addressable storage for high-performance data intensive applications, in Proceedings of HPDC (2008)

  5. Gartner, Inc., Forecast: consumer digital storage needs, 2010–2016 (2012)

  6. Storage Newsletter, 7% of consumer content in cloud storage in 2011, 36% in 2016 (2012)

  7. J. Gantz, D. Reinsel, The digital universe in 2020: big data, bigger digital shadows, and biggest growth in the far east, in International Data Corporation (IDC) iView, Dec 2012

  8. Y. Hua, W. He, X. Liu, D. Feng, SmartEye: real-time and efficient cloud image sharing for disaster environments, in Proceedings of INFOCOM (2015)

  9. Y. Ke, R. Sukthankar, PCA-SIFT: a more distinctive representation for local image descriptors, in Proceedings of CVPR (2004)

  10. Y. Ke, R. Sukthankar, L. Huston, Efficient near-duplicate detection and sub-image retrieval, in Proceedings of ACM Multimedia (2004)

  11. J. Liu, Z. Huang, H.T. Shen, H. Cheng, Y. Chen, Presenting diverse location views with real-time near-duplicate photo elimination, in Proceedings of ICDE (2013)

  12. D. Zhan, H. Jiang, S.C. Seth, CLU: co-optimizing locality and utility in thread-aware capacity management for shared last level caches. IEEE Trans. Comput. 63(7), 1656–1667 (2014)

  13. P. Indyk, R. Motwani, Approximate nearest neighbors: towards removing the curse of dimensionality, in Proceedings of STOC (1998), pp. 604–613

  14. R. Pagh, F. Rodler, Cuckoo hashing, in Proceedings of ESA (2001), pp. 121–133

  15. Y. Hua, H. Jiang, Y. Zhu, D. Feng, L. Xu, SANE: semantic-aware namespace in ultra-large-scale file systems. IEEE Trans. Parallel Distrib. Syst. (TPDS) 25(5), 1328–1338 (2014)

  16. Changewave Research. http://www.changewaveresearch.com (2011)

  17. X. Tan, S. Chen, Z.-H. Zhou, F. Zhang, Face recognition from a single image per person: a survey. Pattern Recognit. 39(9), 1725–1745 (2006)

  18. T. Ahonen, A. Hadid, M. Pietikainen, Face description with local binary patterns: application to face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 28(12), 2037–2041 (2006)

  19. X. Tan, B. Triggs, Enhanced local texture feature sets for face recognition under difficult lighting conditions. IEEE Trans. Image Process. 19(6), 1635–1650 (2010)

  20. J. Wright, A.Y. Yang, A. Ganesh, S.S. Sastry, Y. Ma, Robust face recognition via sparse representation. IEEE Trans. Pattern Anal. Mach. Intell. 31(2), 210–227 (2009)

  21. Y. Hua, X. Liu, Scheduling heterogeneous flows with delay-aware deduplication for avionics applications. IEEE Trans. Parallel Distrib. Syst. 23(9), 1790–1802 (2012)

  22. A.W. Leung, M. Shao, T. Bisson, S. Pasupathy, E.L. Miller, Spyglass: fast, scalable metadata search for large-scale storage systems, in Proceedings of FAST (2009)

  23. Y. Hua, H. Jiang, Y. Zhu, D. Feng, L. Tian, SmartStore: a new metadata organization paradigm with semantic-awareness for next-generation file systems, in Proceedings of SC (2009)

  24. E. Riedel, M. Kallahalla, R. Swaminathan, A framework for evaluating storage system security, in Proceedings of FAST (2002), pp. 15–30

  25. S. Kavalanekar, B. Worthington, Q. Zhang, V. Sharda, Characterization of storage workload traces from production Windows servers, in Proceedings of the IEEE International Symposium on Workload Characterization (IISWC) (2008)

  26. D. Ellard, J. Ledlie, P. Malkani, M. Seltzer, Passive NFS tracing of email and research workloads, in Proceedings of FAST (2003), pp. 203–216

  27. J.L. Hellerstein, Google cluster data. http://googleresearch.blogspot.com/2010/01/google-cluster-data.html, Jan 2010

  28. D. Lowe, Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)

  29. B. Bloom, Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13(7), 422–426 (1970)

  30. A. Andoni, P. Indyk, Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM 51(1), 117–122 (2008)

  31. Q. Lv, W. Josephson, Z. Wang, M. Charikar, K. Li, Multi-probe LSH: efficient indexing for high-dimensional similarity search, in Proceedings of VLDB (2007), pp. 950–961

  32. B. Debnath, S. Sengupta, J. Li, ChunkStash: speeding up inline storage deduplication using flash memory, in Proceedings of USENIX ATC (2010)

  33. FUSE. http://fuse.sourceforge.net/

  34. Y. Hua, H. Jiang, D. Feng, FAST: near real-time searchable data analytics for the cloud, in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC) (2014)

  35. D. Lowe, Object recognition from local scale-invariant features, in Proceedings of IEEE ICCV (1999)

  36. A. Gionis, P. Indyk, R. Motwani, Similarity search in high dimensions via hashing, in Proceedings of VLDB (1999), pp. 518–529

  37. M. Datar, N. Immorlica, P. Indyk, V. Mirrokni, Locality-sensitive hashing scheme based on p-stable distributions, in Proceedings of the Annual Symposium on Computational Geometry (2004)

  38. Y. Tao, K. Yi, C. Sheng, P. Kalnis, Quality and efficiency in high-dimensional nearest neighbor search, in Proceedings of SIGMOD (2009)

  39. A. Guttman, R-trees: a dynamic index structure for spatial searching, in Proceedings of ACM SIGMOD (1984), pp. 47–57

  40. Y. Liu, L. Guo, F. Li, S. Chen, An empirical evaluation of battery power consumption for streaming data transmission to mobile devices, in Proceedings of Multimedia (2011), pp. 473–482

  41. Monsoon Power Monitor. http://www.msoon.com (2012)

  42. A. Viswanathan, A. Hussain, J. Mirkovic, S. Schwab, J. Wroclawski, A semantic framework for data analysis in networked systems, in Proceedings of NSDI (2011)

  43. S. Lakshminarasimhan, J. Jenkins, I. Arkatkar, Z. Gong, H. Kolla, S.-H. Ku, S. Ethier, J. Chen, C.-S. Chang, S. Klasky et al., ISABELA-QA: query-driven analytics with ISABELA-compressed extreme-scale scientific data, in Proceedings of SC (2011)

  44. M. Mihailescu, G. Soundararajan, C. Amza, MixApart: decoupled analytics for shared storage systems, in Proceedings of FAST (2013)

  45. J.C. Bennett, H. Abbasi, P.-T. Bremer, R. Grout, A. Gyulassy, T. Jin, S. Klasky, H. Kolla, M. Parashar, V. Pascucci et al., Combining in-situ and in-transit processing to enable extreme-scale scientific analysis, in Proceedings of SC (2012)

  46. H. Huang, N. Zhang, W. Wang, G. Das, A. Szalay, Just-in-time analytics on large file systems, in Proceedings of FAST (2011)

  47. S. Deerwester, S. Dumais, G. Furnas, T. Landauer, R. Harshman, Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 41(6), 391–407 (1990)

  48. C. Papadimitriou, P. Raghavan, H. Tamaki, S. Vempala, Latent semantic indexing: a probabilistic analysis. J. Comput. Syst. Sci. 61(2), 217–235 (2000)

  49. S. Weil, S.A. Brandt, E.L. Miller, D.D.E. Long, C. Maltzahn, Ceph: a scalable, high-performance distributed file system, in Proceedings of OSDI (2006)

  50. C. Maltzahn, E. Molina-Estolano, A. Khurana, A.J. Nelson, S.A. Brandt, S. Weil, Ceph as a scalable alternative to the Hadoop distributed file system, in ;login: The USENIX Magazine, August 2010

  51. J. Chou, K. Wu, O. Rubel, M. Howison, J. Qiang, B. Austin, E.W. Bethel, R.D. Ryne, A. Shoshani et al., Parallel index and query for large scale data analysis, in Proceedings of SC (2011)

  52. Y. Hua, B. Xiao, B. Veeravalli, D. Feng, Locality-sensitive bloom filter for approximate membership query. IEEE Trans. Comput. 61(6), 817–830 (2012)

  53. J.B. Buck, N. Watkins, J. LeFevre, K. Ioannidou, C. Maltzahn, N. Polyzotis, S. Brandt, SciHadoop: array-based query processing in Hadoop, in Proceedings of SC (2011)

  54. B. Zhu, K. Li, H. Patterson, Avoiding the disk bottleneck in the data domain deduplication file system, in Proceedings of FAST (2008)

  55. D. Bhagwat, K. Eshghi, D. Long, M. Lillibridge, Extreme binning: scalable, parallel deduplication for chunk-based file backup, in Proceedings of IEEE MASCOTS (2009)

  56. W. Xia, H. Jiang, D. Feng, Y. Hua, SiLo: a similarity-locality based near-exact deduplication scheme with low RAM overhead and high throughput, in Proceedings of USENIX ATC (2011)

  57. W. Dong, F. Douglis, K. Li, H. Patterson, S. Reddy, P. Shilane, Tradeoffs in scalable data routing for deduplication clusters, in Proceedings of FAST (2011)

  58. M. Lillibridge, K. Eshghi, D. Bhagwat, V. Deolalikar, G. Trezise, P. Camble, Sparse indexing: large scale, inline deduplication using sampling and locality, in Proceedings of FAST (2009)

  59. A. Muthitacharoen, B. Chen, D. Mazieres, A low-bandwidth network file system, in Proceedings of SOSP (2001)

  60. D. Meister, J. Kaiser, A. Brinkmann, T. Cortes, M. Kuhn, J. Kunkel, A study on data deduplication in HPC storage systems, in Proceedings of SC (2012)

  61. B. Aggarwal, A. Akella, A. Anand, A. Balachandran, P. Chitnis, C. Muthukrishnan, R. Ramjee, G. Varghese, EndRE: an end-system redundancy elimination service for enterprises, in Proceedings of NSDI (2010)

Author information

Correspondence to Yu Hua.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Hua, Y., Liu, X. (2019). Near Real-Time Searchable Analytics for Images. In: Searchable Storage in Cloud Computing. Springer, Singapore. https://doi.org/10.1007/978-981-13-2721-6_6

  • DOI: https://doi.org/10.1007/978-981-13-2721-6_6

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-2720-9

  • Online ISBN: 978-981-13-2721-6

  • eBook Packages: Computer Science, Computer Science (R0)
