Abstract
This paper proposes an efficient and effective "Locality Preserving Mapping" scheme that allows text databases representatives to be mapped onto a global information retrieval system such as Peer-to-Peer Information Retrieval Systems (P2PIR). The proposed approach depends on using Locality Sensitive Hash functions (LSH), and approximate min-wise independent permutations to achieve such task. Experimental evaluation over real data, along with comparison between different proposed schemes (with different parameters) will be presented in order to show the performance advantages of such schemes.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Bawa, M., Condie, T., Ganesan, P.: LSH forest: self tuning indexes for similarity search. In: Proceedings of the 14th International on World Wide Web (WWW 2005), New York, NY, USA, pp. 651–660 (2005)
Bhattacharya, I., Kashyap, S.R., Parthasarathy, S.: Similarity Searching in Peer-to-Peer Databases. In: The 25th IEEE International Conference on Distributed Computing Systems (ICDCS 2005), Columbus, OH, pp. 329–338 (2005)
Charikar, M.S.: Similarity estimation techniques from rounding algorithms. In: The 34th ACM Symposium on Theory of Computing, pp. 380–388 (2002)
Datar, M., Immorlica, N., Indyk, P., Mirrokni, V.S.: Locality-Sensitive Hashing Scheme Based on p-Stable Distributions. In: The Twentieth Annual Symposium on Computational Geometry (SCG 2004), Brooklyn, New York, USA, pp. 253–262 (2004)
Gupta, A., Agrawal, D., Abbadi, A.E.: Approximate Range Selection Queries in Peer-to-Peer Systems. In: The CIDR Conference, pp. 254–273 (2003)
Cai, D., He, X., Han, J.: Document Clustering Using Locality Preserving Indexing. IEEE Transactions on Knowledge and Data Engineering 17(12), 1624–1637 (2005)
Mokbel, M.F., Aref, W.G., Grama, A.: Spectral LPM: An Optimal Locality-Preserving Mapping using the Spectral (not Fractal) Order. In: The 19th International Conference on Data Engineering (ICDE 2003), pp. 699–701 (2003)
Sagan, H.: Space-Filling Curves. Springer, Berlin (1994)
Indyk, P., Motwani, R.: Approximate Nearest Neighbors: towards Removing the Curse of Dimensionality. In: The Symp. Theory of Computing, pp. 604–613 (1998)
Indyk, P.: Nearest neighbors in High-Dimensional Spaces. In: CRC Handbook of Discrete and Computational Geometry. CRC, Boca Raton (2003)
Motwani, R., Naor, A., Panigrahy, R.: Lower bounds on Locality Sensitive Hashing. In: The ACM Twenty-Second Annual Symposium on Computational Geometry SCG 2006, Sedona, Arizona, USA, pp. 154–157 (2006)
Lv, Q., Josephson, W., Wang, Z., Charikar, M., Li, K.: Efficient Filtering with Sketches in the Ferret Toolkit. In: The 8th ACM International Workshop on Multimedia Information Retrieval (MIR 2006), Santa Barbara, California, USA, pp. 279–288 (2006)
Qamra, A., Meng, Y., Chang, E.Y.: Enhanced Perceptual Distance Functions and Indexing for Image Replica Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(3), 379–391 (2005)
Broder, A.Z., Charikar, M., Frieze, A.M., Mitzenmacher, M.: Min-Wise Independent Permutations. Journal of Computer and System Sciences 60, 630–699 (2000)
Broder, A.Z.: On the resemblance and containment of documents. In: Proceedings of Compression and Complexity of Sequences, p. 21 (1997)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hassan, M., Hasan, Y. (2010). Locality Preserving Scheme of Text Databases Representative in Distributed Information Retrieval Systems. In: Zavoral, F., Yaghob, J., Pichappan, P., El-Qawasmeh, E. (eds) Networked Digital Technologies. NDT 2010. Communications in Computer and Information Science, vol 88. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14306-9_17
Download citation
DOI: https://doi.org/10.1007/978-3-642-14306-9_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14305-2
Online ISBN: 978-3-642-14306-9
eBook Packages: Computer ScienceComputer Science (R0)