A Framework for Evaluating Snippet Generation for Dataset Search
Abstract
Reusing existing datasets is of considerable significance to researchers and developers. Dataset search engines help a user find relevant datasets for reuse. They can present a snippet for each retrieved dataset to explain its relevance to the user’s data needs. This emerging problem of snippet generation for dataset search has not received much research attention. To provide a basis for future research, we introduce a framework for quantitatively evaluating the quality of a dataset snippet. The proposed metrics assess the extent to which a snippet matches the query intent and covers the main content of the dataset. To establish a baseline, we adapt four state-of-the-art methods from related fields to our problem, and perform an empirical evaluation based on real-world datasets and queries. We also conduct a user study to verify our findings. The results demonstrate the effectiveness of our evaluation framework, and suggest directions for future research.
Keywords
Snippet generation, Dataset search, Evaluation metric
Notes
Acknowledgements
This work was supported in part by the National Key R&D Program of China under Grant 2018YFB1005100, in part by the NSFC under Grant 61572247, and in part by the SIRIUS Centre, Norwegian Research Council project number 237898. Cheng was funded by the Six Talent Peaks Program of Jiangsu Province under Grant RJFW-011.