Abstract
Internet users’ access pattern for objects has been observed to follow Zipf’s law. The preference for network resource is showing strong influence on real-time lookup performance in large-scale distributed systems. In order to guarantee search response rate with limited memory space, we develop a new object indexing and locating algorithm called Bloom filter Arrays based on Zipf’s-distributed user Preference (ZPBA). The algorithm uses a compact data structure to achieve high accuracy in item lookup. We give the theoretical analysis of ZPBA and then conduct experiments with one million item corpus and 100,000 queries to validate our design. Comparison shows that our solution can be 77% more space efficient than traditional bloom filter based index approaches for applications of concentrated user access preference. The algorithm demonstrates practical application potential in fault tolerant large-scale distributed indexing and item lookup.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Chierichetti, F., Kumar, R., Raghavan, P.: Compressed Web Indexes. In: 18th International Conference on World Wide Web, pp. 451–460. Association for Computing Machinery, New York (2009)
Sato, I., Nakagawa, H.: Topic Models with Power-law using Pitman-Yor Process. In: 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 673–682. Association for Computing Machinery, New York (2010)
Zipf, G.K.: The Psychobiology of Language. Houghton-Mifflin, Boston (1935)
Breslau, L., Pei, C., Li, F., Phillips, G., Shenker, S.: Web Caching and Zipf-like Distributions: Evidence and Implications. In: IEEE Annual Joint Conference of the IEEE Computer and Communications Societies, pp. 126–134. IEEE Press, New York (1999)
Rodriguez, P., Spanner, C., Biersack, E.W.: Analysis of Web Caching Architectures: Hierarchical and Distributed Caching. IEEE/ACM Transactions on Networking 9(4), 404–418 (2001)
Kotera, I., Egawa, R., Takizawa, H., Kobayashi, H.: Modeling of Cache Access Behavior Based on Zipf’s Law. In: 9th Workshop on Memory Performance: Dealing with Applications, Systems and Architecture, pp. 9–15. Association for Computing Machinery, New York (2008)
Jelenković, P.R., Kang, X.: Characterizing the Miss Sequence of the LRU Cache. ACM SIGMETRICS Performance Evaluation Review 36(2), 119–121 (2008)
Zorich, V.A., Cooke, R.: Mathematical Analysis. Springer, Berlin (2004)
Ozsu, M.T., Valduriez, P.: Principles of Distributed Database Systems, 3rd edn. Springer, Berlin (2011)
Stoica, I., Morris, R., Karger, D., Kaashoek, M., Balakrishnan, H.: Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications. In: 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, pp. 149–160. Association for Computing Machinery, New York (2001)
Ratnasamy, S., Francis, P., Handley, M., Karp, R., Shenker, S.: A Scalable Content-Addressable Network. In: 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, pp. 161–172. Association for Computing Machinery, New York (2001)
Weil, S.A., Pollack, K.T., Brandt, S.A., Miller, E.L.: Dynamic Metadata Management for Petabyte-Scale File Systems. In: 2004 ACM/IEEE Conference on Supercomputing, p. 4. IEEE Computer Society, Washington, DC (2004)
Bloom, B.: Space/time Trade-offs in Hash Coding with Allowable Errors. Communications of the ACM 13(7), 422–426 (1970)
Tarkoma, S.C., Rothenberg, E., Lagerspetz, E.: Theory and Practice of Bloom Filters for Distributed Systems. IEEE Communications Surveys & Tutorials 14(1), 131–155 (2004)
Wang, Z., Luo, T.-J.: Intelligent Video Content Routing in a Direct Access Network. In: 3rd Symposium on Web Society, pp. 147–152. IEEE Press, New York (2011)
Fan, L., Cao, P., Almeida, J., Broder, A.Z.: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol. IEEE/ACM Transactions on Networking 8(3), 281–293 (2000)
Bonomi, F., Mitzenmacher, M., Panigrahy, R., Singh, S., Varghese, G.: An Improved Construction for Counting Bloom Filters. In: Azar, Y., Erlebach, T. (eds.) ESA 2006. LNCS, vol. 4168, pp. 684–695. Springer, Heidelberg (2006)
Ficara, D., Giordano, S., Procissi, G., Vitucci, F.: Multilayer Compressed Counting Bloom Filters. In: 27th Annual Joint Conference of the IEEE Computer and Communications Societies, pp. 311–315. IEEE Press, New York (2008)
Song, H., Dharmapurikar, S., Turner, J., Lockwood, J.: Fast Hash Table Lookup using Extended Bloom Filter: an Aid to Network Processing. In: 2005 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, pp. 181–192. Association for Computing Machinery, New York (2005)
Bonomi, F., Mitzenmacher, M., Panigrah, R., Singh, S., Varghese, G.: Beyond Bloom Filters: from Approximate Membership Checks to Approximate State Machines. In: 2006 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, pp. 315–326. Association for Computing Machinery, New York (2006)
Sung, M., Xu, J., Li, J., Li, L.: Large-scale IP Traceback in Highspeed Internet: Practical Techniques and Information-theoretic Foundation. IEEE/ACM Transaction on Networking 16(6), 1253–1266 (2008)
Jokela, P., Zahemszky, A., Esteve, C., Arianfar, S., Nikander, P.: LIPSIN: Line Speed Publish/Subscribe Inter-Networking. In: 2009 ACM SIGCOMM 2009 Conference on Data Communication, pp. 195–206. Association for Computing Machinery, New York (2009)
Zhu, Y., Jiang, H., Wang, J., Xian, F.: HBA: Distributed Metadata Management for Large Cluster-Based Storage Systems. IEEE Transactions on Parallel and Distributed Systems 19(6), 750–763 (2008)
Wang, Z., Luo, T.: Optimizing Hash Function Number for BF-Based Object Locating Algorithm. In: Tan, Y., Shi, Y., Ji, Z. (eds.) ICSI 2012, Part II. LNCS, vol. 7332, pp. 543–552. Springer, Heidelberg (2012)
Bruck, J., Gao, J., Jiang, A.: Weighted Bloom Filter. In: 2006 IEEE International Symposium on Information Theory, pp. 2304–2308. IEEE Press, New York (2006)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, Z., Luo, T., Xu, Y., Cheng, F., Zhang, X., Wang, X. (2013). A Fast Indexing Algorithm Optimization with User Behavior Pattern. In: Zu, Q., Hu, B., Elçi, A. (eds) Pervasive Computing and the Networked World. ICPCA/SWS 2012. Lecture Notes in Computer Science, vol 7719. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37015-1_52
Download citation
DOI: https://doi.org/10.1007/978-3-642-37015-1_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37014-4
Online ISBN: 978-3-642-37015-1
eBook Packages: Computer ScienceComputer Science (R0)