Abstract
Search log k-anonymization relies on eliminating infrequent queries under exact (or nearly exact) matching conditions, which usually results in large data loss and impaired utility. We present a more flexible, semantic approach to k-anonymity that consists of three steps: query concept mining, automatic query expansion, and affinity assessment of expanded queries. Many infrequent queries can be seen as refinements of a more general, frequent query. Building on this observation, we first model query concepts as probabilistically weighted n-grams and extract them from the search log data. Then, after expanding the original log queries with their weighted concepts, we find all the k-affine expanded queries under a given affinity threshold Θ, modeled as a generalized k-core of the graph of Θ-affine queries. Experiments on the AOL data set show that this approach achieves privacy levels comparable to those of plain k-anonymity while greatly reducing data loss.
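The abstract does not specify the affinity measure or the vertex property function of the generalized k-core. The Python sketch below illustrates only the last two steps, under explicit assumptions:

- Affinity is taken to be the cosine similarity of concept-weight vectors.
- A query stays in the core if its own log count, plus the counts of its Θ-affine neighbours still in the core, is at least k.

The concept weights and counts are hand-written toy values, not mined from any log. The helper names `cosine`, `affinity_graph` and `generalized_k_core` are hypothetical and do not come from the paper.

```python
from math import sqrt

# Hypothetical input: each log query mapped to toy, hand-written concept weights.
# In the paper these weights come from mined n-gram concepts; here they are made up.
expanded: dict[str, dict[str, float]] = {
    "cheap flights rome": {"flights": 0.6, "rome": 0.4},
    "cheap flights to rome in may": {"flights": 0.5, "rome": 0.3, "may": 0.2},
    "rome flights": {"flights": 0.5, "rome": 0.5},
    "john smith 12 elm street": {"john smith": 0.7, "elm street": 0.3},
}
# Hypothetical occurrence counts of each query in the log.
freq: dict[str, int] = {
    "cheap flights rome": 3,
    "cheap flights to rome in may": 1,
    "rome flights": 1,
    "john smith 12 elm street": 1,
}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Assumed affinity: cosine similarity of two sparse concept-weight vectors."""
    dot = sum(w * b[c] for c, w in a.items() if c in b)
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def affinity_graph(vectors: dict[str, dict[str, float]], theta: float) -> dict[str, set[str]]:
    """Link every pair of distinct queries whose affinity is at least theta."""
    queries = list(vectors)
    adj: dict[str, set[str]] = {q: set() for q in queries}
    for i, q1 in enumerate(queries):
        for q2 in queries[i + 1:]:  # O(n^2) pairs: acceptable for a toy example only
            if cosine(vectors[q1], vectors[q2]) >= theta:
                adj[q1].add(q2)
                adj[q2].add(q1)
    return adj


def generalized_k_core(adj: dict[str, set[str]], counts: dict[str, int], k: int) -> set[str]:
    """Repeatedly remove queries whose support (own count plus counts of
    neighbours still alive) is below k. With non-negative counts this support
    is monotone, so the surviving set does not depend on removal order."""
    alive = set(adj)
    support = {q: counts[q] + sum(counts[n] for n in adj[q]) for q in alive}
    stack = [q for q in alive if support[q] < k]
    while stack:
        q = stack.pop()
        if q not in alive:
            continue  # a query can be pushed more than once; skip if already removed
        alive.remove(q)
        for n in adj[q]:
            if n in alive:
                support[n] -= counts[q]  # n loses q's contribution
                if support[n] < k:
                    stack.append(n)
    return alive


adj = affinity_graph(expanded, theta=0.9)
core = generalized_k_core(adj, freq, k=5)
print(sorted(core))
# ['cheap flights rome', 'cheap flights to rome in may', 'rome flights']
```

A query's support is never smaller than its own count. As a result, every query that exact-match k-anonymity would keep also survives in this sketch.

With k = 5, exact matching would keep none of the toy queries. Here, the three Θ-affine travel queries pool their counts and survive, while the isolated, identifying query is removed. The removal loop follows the usual peeling procedure for generalized cores with a monotone property function. The actual property function used in the paper may differ.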
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Carpineto, C., Romano, G. (2013). Semantic Search Log k-Anonymization with Generalized k-Cores of Query Concept Graph. In: Serdyukov, P., et al. Advances in Information Retrieval. ECIR 2013. Lecture Notes in Computer Science, vol 7814. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36973-5_10
DOI: https://doi.org/10.1007/978-3-642-36973-5_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36972-8
Online ISBN: 978-3-642-36973-5
eBook Packages: Computer Science, Computer Science (R0)