Abstract
Many applications, such as Twitter, Yelp, and Facebook, produce documents that are tagged with geolocations. For example, when a user posts a tweet, it is tagged with the user’s location (inferred from the user’s IP address or mobile GPS). These locations, however, are computed with inherent uncertainty. In such scenarios, it is desirable to support search queries that take into account both text relevance and location proximity. In this paper, we study the problem of text retrieval queries on probabilistic spatial data. We consider top-(\(c\), \(k\)) queries to capture the semantics of both textual relevance and probabilistic location proximity. A top-(\(c\), \(k\)) query returns the \(k\) tuples that have the highest probability of being in the top-\(c\) query results under the possible-world semantics. We propose a framework to answer such queries. Our framework integrates two components: scoring textual similarity based on the query text and the document text; and calculating top-\(c\) confidence based on the probability of the document falling within the query region. We develop an IRTree-based Incremental Scoring Approach (ISA) that returns an iterator over tuples in decreasing order of text similarity. Our parameterized probabilistic ranking algorithm, \(PRank^c\), consumes the output of ISA interactively and calculates the top-\(c\) confidence of these tuples in linear time. We also provide a heuristic optimization that terminates the \(PRank^c\) algorithm earlier without compromising result quality. We conduct experiments on real data to show the efficiency of this framework.
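To make the top-(\(c\), \(k\)) semantics concrete, the following is a minimal sketch and not the authors' \(PRank^c\) algorithm. It assumes that tuples arrive in decreasing order of text similarity (as an ISA-style iterator would supply them), that each tuple carries a single probability `p` of falling within the query region, and that these events are independent. The function names `top_c_confidence` and `top_c_k` are hypothetical. Under these assumptions, a tuple is in the top-\(c\) of a possible world exactly when it is in the region and fewer than \(c\) higher-ranked tuples are.

```python
from typing import Iterable, List, Tuple


def top_c_confidence(ranked: Iterable[Tuple[str, float]], c: int) -> List[Tuple[str, float]]:
    """Return (doc_id, top-c confidence) for each input tuple.

    `ranked` yields (doc_id, p) in decreasing text-similarity order, where p is
    the assumed probability that the document lies in the query region.
    Independence between documents is an assumption of this sketch.
    """
    # dist[j] = probability that exactly j of the tuples seen so far are in
    # the region, tracked only for j < c.
    dist = [1.0] + [0.0] * (c - 1)
    result = []
    for doc_id, p in ranked:
        # In the top-c iff this tuple is in the region and fewer than c
        # higher-ranked tuples are in the region.
        result.append((doc_id, p * sum(dist)))
        # Fold this tuple into the running distribution.
        updated = [0.0] * c
        for j in range(c):
            updated[j] = dist[j] * (1.0 - p)
            if j > 0:
                updated[j] += dist[j - 1] * p
        dist = updated
    return result


def top_c_k(ranked: Iterable[Tuple[str, float]], c: int, k: int) -> List[Tuple[str, float]]:
    """Return the k tuples with the highest top-c confidence."""
    scored = top_c_confidence(ranked, c)
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


if __name__ == "__main__":
    docs = [("a", 0.5), ("b", 0.5), ("c", 1.0)]  # illustrative data only
    print(top_c_k(docs, c=2, k=1))
```

Each tuple costs \(O(c)\) work in this sketch, so for a fixed \(c\) the pass is linear in the number of tuples consumed.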
Acknowledgements
The work in this paper was supported by National Science Foundation grants IIS-1017990 and IIS-09168724.
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gao, F., Jain, R., Prabhakar, S., Si, L. (2014). ProbKS: Keyword Search on Probabilistic Spatial Data. In: Han, WS., Lee, M., Muliantara, A., Sanjaya, N., Thalheim, B., Zhou, S. (eds) Database Systems for Advanced Applications. DASFAA 2014. Lecture Notes in Computer Science, vol 8505. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43984-5_30
DOI: https://doi.org/10.1007/978-3-662-43984-5_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-43983-8
Online ISBN: 978-3-662-43984-5
eBook Packages: Computer Science, Computer Science (R0)