Near Real-Time Searchable Analytics for Images

Hua, Yu; Liu, Xue

doi:10.1007/978-981-13-2721-6_6

Abstract

The explosive growth in data volume and complexity is driving an increasing need for semantic queries. Semantic queries can be interpreted as correlation-aware retrieval that may return approximate results. Existing cloud storage systems largely fail to offer adequate support for such queries. Because the value of data depends heavily on how efficiently semantic search can be carried out on it in (near) real time, large fractions of data lose their value, or have it significantly reduced, as they become stale. To address this problem, we propose FAST, a near real-time and cost-effective methodology based on semantic queries. The idea behind FAST is to explore and exploit the semantic correlation within and among datasets via correlation-aware hashing and manageable flat-structured addressing, which significantly reduces processing latency while incurring an acceptably small loss of data-search accuracy. The near real-time property of FAST enables rapid identification of correlated files and significantly narrows the scope of data to be processed. FAST supports several types of data analytics, which can be implemented in existing searchable storage systems. We present a real-world use case in which FAST analyzes 60 million images to identify, in a timely fashion, children reported missing in an extremely crowded environment (e.g., a highly popular scenic spot on a peak tourist day). FAST is further improved by a semantic-aware namespace that provides dynamic and adaptive namespace management for ultra-large storage systems. Extensive experimental results demonstrate the efficiency and efficacy of FAST. (© 2016 IEEE. Reprinted, with permission, from Ref. [1].)
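The abstract names correlation-aware hashing and flat-structured addressing without spelling out their mechanics. As a rough illustration only, the Python sketch below assumes that correlation awareness is approximated with Euclidean locality-sensitive hashing based on p-stable distributions (the family described by Datar et al. and built on the work of Indyk and Motwani, both in the reference list), and that flat-structured addressing is modeled as plain in-memory hash tables. The class name `CorrelationAwareIndex` and the parameters `k`, `num_tables`, and `w` are hypothetical, and this is not FAST's actual implementation. It shows how a query inspects only the items that share a bucket with it instead of scanning the whole dataset.

```python
import math
import random
from collections import defaultdict


class CorrelationAwareIndex:
    """Hypothetical sketch: p-stable (Euclidean) LSH over flat hash buckets.

    Not FAST's implementation; k, num_tables and w are illustrative knobs.
    """

    def __init__(self, dim, k=4, num_tables=6, w=4.0, seed=0):
        rng = random.Random(seed)
        self.w = w
        # Each table combines k functions h(v) = floor((a . v + b) / w), where
        # a is drawn from a Gaussian (a 2-stable distribution) and b ~ U[0, w).
        self.funcs = [
            [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
             for _ in range(k)]
            for _ in range(num_tables)
        ]
        # Flat addressing (assumed): one dict per table, bucket key -> item ids.
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.vectors = {}

    def _key(self, table_idx, v):
        # Concatenate the k hash values of one table into a bucket key.
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
            for a, b in self.funcs[table_idx]
        )

    def insert(self, item_id, v):
        self.vectors[item_id] = v
        for t, table in enumerate(self.tables):
            table[self._key(t, v)].append(item_id)

    def query(self, v, top=3):
        # Only items that collide with v in at least one table are examined,
        # which is how hashing narrows the scope of data to be processed.
        candidates = set()
        for t, table in enumerate(self.tables):
            candidates.update(table.get(self._key(t, v), []))
        ranked = sorted(candidates, key=lambda i: math.dist(v, self.vectors[i]))
        results = [(i, math.dist(v, self.vectors[i])) for i in ranked[:top]]
        return results, len(candidates)


if __name__ == "__main__":
    # Hypothetical demo: 1,000 random 8-dimensional "feature vectors".
    rng = random.Random(42)
    index = CorrelationAwareIndex(dim=8)
    for n in range(1000):
        index.insert(f"img{n}", [rng.uniform(0.0, 10.0) for _ in range(8)])
    results, examined = index.query(index.vectors["img7"])
    # A stored vector always collides with itself, so it ranks first at 0.0.
    print(results[0], "candidates examined:", examined)
```

In this sketch, a larger `k` makes buckets more selective (fewer candidates, more missed neighbors), while more tables raise recall at the cost of memory. That is one way to picture the latency-for-accuracy trade-off the abstract describes; the actual tuning used by FAST is not given here.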

References

[1] Y. Hua, H. Jiang, D. Feng, Real-time semantic search using approximate methodology for large-scale storage systems. IEEE Trans. Parallel Distrib. Syst. (TPDS) 27(4), 1212–1225 (2016)
[2] M. Armbrust, A. Fox, R. Griffith, A.D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, M. Zaharia, A view of cloud computing. Commun. ACM 53(4), 50–58 (2010)
[3] A. Marathe, R. Harris, D.K. Lowenthal, B.R. de Supinski, B. Rountree, M. Schulz, X. Yuan, A comparative study of high-performance computing on the cloud, in Proceedings of HPDC (2013)
[4] P. Nath, B. Urgaonkar, A. Sivasubramaniam, Evaluating the usefulness of content addressable storage for high-performance data intensive applications, in Proceedings of HPDC (2008)
[5] Gartner, Inc., Forecast: consumer digital storage needs, 2010–2016 (2012)
[6] Storage Newsletter, 7% of consumer content in cloud storage in 2011, 36% in 2016 (2012)
[7] J. Gantz, D. Reinsel, The digital universe in 2020: big data, bigger digital shadows, and biggest growth in the far east. International Data Corporation (IDC) iView (Dec 2012)
[8] Y. Hua, W. He, X. Liu, D. Feng, SmartEye: real-time and efficient cloud image sharing for disaster environments, in Proceedings of INFOCOM (2015)
[9] Y. Ke, R. Sukthankar, PCA-SIFT: a more distinctive representation for local image descriptors, in Proceedings of CVPR (2004)
[10] Y. Ke, R. Sukthankar, L. Huston, Efficient near-duplicate detection and sub-image retrieval, in Proceedings of ACM Multimedia (2004)
[11] J. Liu, Z. Huang, H.T. Shen, H. Cheng, Y. Chen, Presenting diverse location views with real-time near-duplicate photo elimination, in Proceedings of ICDE (2013)
[12] D. Zhan, H. Jiang, S.C. Seth, CLU: co-optimizing locality and utility in thread-aware capacity management for shared last level caches. IEEE Trans. Comput. 63(7), 1656–1667 (2014)
[13] P. Indyk, R. Motwani, Approximate nearest neighbors: towards removing the curse of dimensionality, in Proceedings of STOC (1998), pp. 604–613
[14] R. Pagh, F. Rodler, Cuckoo hashing, in Proceedings of ESA (2001), pp. 121–133
[15] Y. Hua, H. Jiang, Y. Zhu, D. Feng, L. Xu, SANE: semantic-aware namespace in ultra-large-scale file systems. IEEE Trans. Parallel Distrib. Syst. (TPDS) 25(5), 1328–1338 (2014)
[16] Changewave Research. http://www.changewaveresearch.com (2011)
[17] X. Tan, S. Chen, Z.-H. Zhou, F. Zhang, Face recognition from a single image per person: a survey. Pattern Recognit. 39(9), 1725–1745 (2006)
[18] T. Ahonen, A. Hadid, M. Pietikainen, Face description with local binary patterns: application to face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 28(12), 2037–2041 (2006)
[19] X. Tan, B. Triggs, Enhanced local texture feature sets for face recognition under difficult lighting conditions. IEEE Trans. Image Process. 19(6), 1635–1650 (2010)
[20] J. Wright, A.Y. Yang, A. Ganesh, S.S. Sastry, Y. Ma, Robust face recognition via sparse representation. IEEE Trans. Pattern Anal. Mach. Intell. 31(2), 210–227 (2009)
[21] Y. Hua, X. Liu, Scheduling heterogeneous flows with delay-aware deduplication for avionics applications. IEEE Trans. Parallel Distrib. Syst. 23(9), 1790–1802 (2012)
[22] A.W. Leung, M. Shao, T. Bisson, S. Pasupathy, E.L. Miller, Spyglass: fast, scalable metadata search for large-scale storage systems, in Proceedings of FAST (2009)
[23] Y. Hua, H. Jiang, Y. Zhu, D. Feng, L. Tian, SmartStore: a new metadata organization paradigm with semantic-awareness for next-generation file systems, in Proceedings of SC (2009)
[24] E. Riedel, M. Kallahalla, R. Swaminathan, A framework for evaluating storage system security, in Proceedings of FAST (2002), pp. 15–30
[25] S. Kavalanekar, B. Worthington, Q. Zhang, V. Sharda, Characterization of storage workload traces from production Windows servers, in Proceedings of the IEEE International Symposium on Workload Characterization (IISWC) (2008)
[26] D. Ellard, J. Ledlie, P. Malkani, M. Seltzer, Passive NFS tracing of email and research workloads, in Proceedings of FAST (2003), pp. 203–216
[27] J.L. Hellerstein, Google cluster data. http://googleresearch.blogspot.com/2010/01/google-cluster-data.html (Jan 2010)
[28] D. Lowe, Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)
[29] B. Bloom, Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13(7), 422–426 (1970)
[30] A. Andoni, P. Indyk, Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM 51(1), 117–122 (2008)
[31] Q. Lv, W. Josephson, Z. Wang, M. Charikar, K. Li, Multi-probe LSH: efficient indexing for high-dimensional similarity search, in Proceedings of VLDB (2007), pp. 950–961
[32] B. Debnath, S. Sengupta, J. Li, ChunkStash: speeding up inline storage deduplication using flash memory, in Proceedings of USENIX ATC (2010)
[33] FUSE. http://fuse.sourceforge.net/
[34] Y. Hua, H. Jiang, D. Feng, FAST: near real-time searchable data analytics for the cloud, in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC) (2014)
[35] D. Lowe, Object recognition from local scale-invariant features, in Proceedings of IEEE ICCV (1999)
[36] A. Gionis, P. Indyk, R. Motwani, Similarity search in high dimensions via hashing, in Proceedings of VLDB (1999), pp. 518–529
[37] M. Datar, N. Immorlica, P. Indyk, V. Mirrokni, Locality-sensitive hashing scheme based on p-stable distributions, in Proceedings of the Annual Symposium on Computational Geometry (2004)
[38] Y. Tao, K. Yi, C. Sheng, P. Kalnis, Quality and efficiency in high-dimensional nearest neighbor search, in Proceedings of SIGMOD (2009)
[39] A. Guttman, R-trees: a dynamic index structure for spatial searching, in Proceedings of ACM SIGMOD (1984), pp. 47–57
[40] Y. Liu, L. Guo, F. Li, S. Chen, An empirical evaluation of battery power consumption for streaming data transmission to mobile devices, in Proceedings of ACM Multimedia (2011), pp. 473–482
[41] Monsoon Power Monitor. http://www.msoon.com (2012)
[42] A. Viswanathan, A. Hussain, J. Mirkovic, S. Schwab, J. Wroclawski, A semantic framework for data analysis in networked systems, in Proceedings of NSDI (2011)
[43] S. Lakshminarasimhan, J. Jenkins, I. Arkatkar, Z. Gong, H. Kolla, S.-H. Ku, S. Ethier, J. Chen, C.-S. Chang, S. Klasky et al., ISABELA-QA: query-driven analytics with ISABELA-compressed extreme-scale scientific data, in Proceedings of SC (2011)
[44] M. Mihailescu, G. Soundararajan, C. Amza, MixApart: decoupled analytics for shared storage systems, in Proceedings of FAST (2013)
[45] J.C. Bennett, H. Abbasi, P.-T. Bremer, R. Grout, A. Gyulassy, T. Jin, S. Klasky, H. Kolla, M. Parashar, V. Pascucci et al., Combining in-situ and in-transit processing to enable extreme-scale scientific analysis, in Proceedings of SC (2012)
[46] H. Huang, N. Zhang, W. Wang, G. Das, A. Szalay, Just-in-time analytics on large file systems, in Proceedings of FAST (2011)
[47] S. Deerwester, S. Dumais, G. Furnas, T. Landauer, R. Harshman, Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 41(6), 391–407 (1990)
[48] C. Papadimitriou, P. Raghavan, H. Tamaki, S. Vempala, Latent semantic indexing: a probabilistic analysis. J. Comput. Syst. Sci. 61(2), 217–235 (2000)
[49] S. Weil, S.A. Brandt, E.L. Miller, D.D.E. Long, C. Maltzahn, Ceph: a scalable, high-performance distributed file system, in Proceedings of OSDI (2006)
[50] C. Maltzahn, E. Molina-Estolano, A. Khurana, A.J. Nelson, S.A. Brandt, S. Weil, Ceph as a scalable alternative to the Hadoop distributed file system. ;login: The USENIX Magazine (Aug 2010)
[51] J. Chou, K. Wu, O. Rubel, M. Howison, J. Qiang, B. Austin, E.W. Bethel, R.D. Ryne, A. Shoshani et al., Parallel index and query for large scale data analysis, in Proceedings of SC (2011)
[52] Y. Hua, B. Xiao, B. Veeravalli, D. Feng, Locality-sensitive bloom filter for approximate membership query. IEEE Trans. Comput. 61(6), 817–830 (2012)
[53] J.B. Buck, N. Watkins, J. LeFevre, K. Ioannidou, C. Maltzahn, N. Polyzotis, S. Brandt, SciHadoop: array-based query processing in Hadoop, in Proceedings of SC (2011)
[54] B. Zhu, K. Li, H. Patterson, Avoiding the disk bottleneck in the data domain deduplication file system, in Proceedings of FAST (2008)
[55] D. Bhagwat, K. Eshghi, D. Long, M. Lillibridge, Extreme binning: scalable, parallel deduplication for chunk-based file backup, in Proceedings of IEEE MASCOTS (2009)
[56] W. Xia, H. Jiang, D. Feng, Y. Hua, SiLo: a similarity-locality based near-exact deduplication scheme with low RAM overhead and high throughput, in Proceedings of USENIX ATC (2011)
[57] W. Dong, F. Douglis, K. Li, H. Patterson, S. Reddy, P. Shilane, Tradeoffs in scalable data routing for deduplication clusters, in Proceedings of FAST (2011)
[58] M. Lillibridge, K. Eshghi, D. Bhagwat, V. Deolalikar, G. Trezise, P. Camble, Sparse indexing: large scale, inline deduplication using sampling and locality, in Proceedings of FAST (2009)
[59] A. Muthitacharoen, B. Chen, D. Mazieres, A low-bandwidth network file system, in Proceedings of SOSP (2001)
[60] D. Meister, J. Kaiser, A. Brinkmann, T. Cortes, M. Kuhn, J. Kunkel, A study on data deduplication in HPC storage systems, in Proceedings of SC (2012)
[61] B. Aggarwal, A. Akella, A. Anand, A. Balachandran, P. Chitnis, C. Muthukrishnan, R. Ramjee, G. Varghese, EndRE: an end-system redundancy elimination service for enterprises, in Proceedings of NSDI (2010)

Author information

Authors and Affiliations

Huazhong University of Science and Technology, Wuhan, Hubei, China
Yu Hua
McGill University, Montreal, QC, Canada
Xue Liu

Corresponding author

Correspondence to Yu Hua.

About this chapter

Cite this chapter

Hua, Y., Liu, X. (2019). Near Real-Time Searchable Analytics for Images. In: Searchable Storage in Cloud Computing. Springer, Singapore. https://doi.org/10.1007/978-981-13-2721-6_6

DOI: https://doi.org/10.1007/978-981-13-2721-6_6
Published: 09 February 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-2720-9
Online ISBN: 978-981-13-2721-6
eBook Packages: Computer Science, Computer Science (R0)
