Abstract
Large document collections are hard to explore when the user expresses her information need with only a few keywords. The ambiguous intents behind such short queries often lead to long-winded query sessions with many reformulations. To alleviate this problem, we propose the novel concept of semantic aspects (e.g., \({\langle }\{\textsf {michael\text {-}phelps}\}, \{\textsf {athens, beijing, london}\}, [2004,2016] \rangle \) for the ambiguous query ) and present the xFactor algorithm, which generates them from annotations in documents. Semantic aspects lift document contents into a meaningful structured representation, allowing the user to sift through many documents without reading their contents. They are derived from semantic annotations in the documents: temporal, geographic, and named entity annotations. We evaluate our approach on a novel testbed of over 5,000 aspects on Web-scale document collections amounting to more than 450 million documents. Our results show that the xFactor algorithm finds relevant aspects for highly ambiguous queries.
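To make the structure of a semantic aspect concrete, the example triple from the abstract can be written down as a small data structure. The following is a minimal sketch only: the class name `SemanticAspect`, the method `covers`, the matching rule, and the three toy documents are hypothetical illustrations, and the sketch does not reproduce the xFactor algorithm.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class SemanticAspect:
    """Hypothetical container: a set of entities, a set of locations, a year interval."""
    entities: FrozenSet[str]
    locations: FrozenSet[str]
    interval: Tuple[int, int]  # inclusive [begin, end], in years

    def covers(self, doc_entities, doc_locations, doc_years):
        """Return True if a document's annotations fall within this aspect.

        Assumed rule (not taken from the paper): the document mentions at least
        one aspect entity, at least one aspect location, and at least one year
        inside the interval.
        """
        begin, end = self.interval
        return (
            bool(self.entities & set(doc_entities))
            and bool(self.locations & set(doc_locations))
            and any(begin <= year <= end for year in doc_years)
        )


# The example aspect given in the abstract.
aspect = SemanticAspect(
    entities=frozenset({"michael-phelps"}),
    locations=frozenset({"athens", "beijing", "london"}),
    interval=(2004, 2016),
)

# Made-up document annotations: (entities, locations, years).
docs = {
    "d1": (["michael-phelps"], ["beijing"], [2008]),
    "d2": (["michael-phelps"], ["rio"], [2016]),
    "d3": (["usain-bolt"], ["london"], [2012]),
}

matching = [doc_id for doc_id, ann in docs.items() if aspect.covers(*ann)]
print(matching)
```

Under this assumed rule only `d1` matches, which illustrates how a single aspect could group documents by their annotations without anyone reading their text.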
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Gupta, D., Berberich, K., Strötgen, J., Zeinalipour-Yazti, D. (2019). Generating Semantic Aspects for Queries. In: Hitzler, P., et al. The Semantic Web. ESWC 2019. Lecture Notes in Computer Science, vol 11503. Springer, Cham. https://doi.org/10.1007/978-3-030-21348-0_11
DOI: https://doi.org/10.1007/978-3-030-21348-0_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-21347-3
Online ISBN: 978-3-030-21348-0
eBook Packages: Computer Science, Computer Science (R0)