Abstract
Semistructured data is of increasing importance in many application domains, and one of its core use cases is representing documents. Consequently, effectively retrieving information from semistructured documents is an important problem that has been studied by both the information retrieval (IR) and the database (DB) communities. Comparing the large number of retrieval models and systems is a non-trivial task, and established benchmark initiatives such as TREC, which focus on unstructured documents, are not appropriate for it. This chapter gives an overview of semistructured data in general and of INEX, the Initiative for the Evaluation of XML Retrieval, focusing on its most prominent track, the Adhoc Search Track.
References
[1] Amer-Yahia, S., Lalmas, M.: XML search: languages, INEX and scoring. SIGMOD Record 35(4), 16–23 (2006)
[2] Arvola, P., Geva, S., Kamps, J., Schenkel, R., Trotman, A., Vainio, J.: Overview of the INEX 2010 ad hoc track. In: Geva, et al. (eds.) [14], pp. 1–32 (2010)
[3] Arvola, P., Kekäläinen, J., Junkkari, M.: Expected reading effort in focused retrieval evaluation. Inf. Retr. 13(5), 460–484 (2010)
[4] Case, P., Dyck, M., Holstege, M., Amer-Yahia, S., Botev, C., Buxton, S., Doerre, J., Melton, J., Rys, M., Shanmugasundaram, J.: XQuery and XPath full text 1.0 (2011), http://www.w3.org/TR/xpath-full-text-10/
[5] Chappell, T., Geva, S.: Overview of the INEX 2012 relevance feedback track. In: Forner, et al. (eds.) [9] (2012)
[6] Demartini, G., Iofciu, T., de Vries, A.P.: Overview of the INEX 2009 entity ranking track. In: Geva, S., Kamps, J., Trotman, A. (eds.) INEX 2009. LNCS, vol. 6203, pp. 254–264. Springer, Heidelberg (2010)
[7] Denoyer, L., Gallinari, P.: The Wikipedia XML corpus. In: Fuhr, et al. (eds.) [12], pp. 12–19
[8] Fetahu, B., Schenkel, R.: Retrieval evaluation on focused tasks. In: Hersh, W.R., Callan, J., Maarek, Y., Sanderson, M. (eds.) SIGIR, pp. 1135–1136. ACM (2012)
[9] Forner, P., Karlgren, J., Womser-Hacker, C. (eds.): CLEF 2012 Evaluation Labs and Workshop, Online Working Notes, Rome, Italy, September 17–20 (2012)
[10] Frommholz, I., Larson, R.R.: The heterogeneous collection track at INEX 2006. In: Fuhr, et al. (eds.) [12], pp. 312–317
[11] Fuhr, N., Kamps, J., Lalmas, M., Trotman, A. (eds.): INEX 2007. LNCS, vol. 4862. Springer, Heidelberg (2008)
[12] Fuhr, N., Lalmas, M., Trotman, A. (eds.): INEX 2006. LNCS, vol. 4518. Springer, Heidelberg (2007)
[13] (Sandy) Gao, S., Sperberg-McQueen, C.M., Thompson, H.S.: W3C XML schema definition language (XSD) 1.1 part 1: Structures (2012), http://www.w3.org/TR/xmlschema11-1/
[14] Geva, S., Kamps, J., Schenkel, R., Trotman, A. (eds.): INEX 2010. LNCS, vol. 6932. Springer, Heidelberg (2011)
[15] Gövert, N., Fuhr, N., Lalmas, M., Kazai, G.: Evaluating the effectiveness of content-oriented XML retrieval methods. Inf. Retr. 9(6), 699–722 (2006)
[16] Kamps, J., Lalmas, M., Larsen, B.: Evaluation in context. In: Agosti, M., Borbinha, J., Kapidakis, S., Papatheodorou, C., Tsakonas, G. (eds.) ECDL 2009. LNCS, vol. 5714, pp. 339–351. Springer, Heidelberg (2009)
[17] Kamps, J., Pehcevski, J., Kazai, G., Lalmas, M., Robertson, S.: INEX 2007 evaluation measures. In: Fuhr, et al. (eds.) [11], pp. 24–33 (2007)
[18] Kazai, G., Doucet, A.: Overview of the INEX 2007 book search track (BookSearch’07). In: Fuhr, et al. (eds.) [11], pp. 148–161
[19] Kazai, G., Gövert, N., Lalmas, M., Fuhr, N.: The INEX evaluation initiative. In: Blanken, H.M., Grabs, T., Schek, H.-J., Schenkel, R., Weikum, G. (eds.) Intelligent Search on XML Data. LNCS, vol. 2818, pp. 279–293. Springer, Heidelberg (2003)
[20] Kazai, G., Kamps, J., Koolen, M., Milic-Frayling, N.: Crowdsourcing for book search evaluation: impact of hit design on comparative system ranking. In: Ma, W.-Y., Nie, J.-Y., Baeza-Yates, R.A., Chua, T.-S., Croft, W.B. (eds.) SIGIR, pp. 205–214. ACM (2011)
[21] Kazai, G., Lalmas, M.: Extended cumulated gain measures for the evaluation of content-oriented XML retrieval. ACM Trans. Inf. Syst. 24(4), 503–542 (2006)
[22] Kazai, G., Lalmas, M., de Vries, A.P.: The overlap problem in content-oriented XML retrieval evaluation. In: Sanderson, M., Järvelin, K., Allan, J., Bruza, P. (eds.) SIGIR, pp. 72–79. ACM (2004)
[23] Kazai, G., Lalmas, M., Fuhr, N., Gövert, N.: A report on the first year of the INitiative for the Evaluation of XML retrieval. JASIST 55(6), 551–556 (2004)
[24] Koolen, M., Kazai, G., Kamps, J., Preminger, M., Doucet, A., Landoni, M.: Overview of the INEX 2012 social book search track. In: Forner, et al. (eds.) [9]
[25] Lalmas, M., Tombros, A.: Evaluating XML retrieval effectiveness at INEX. SIGIR Forum 41(1), 40–57 (2007)
[26] Nordlie, R., Pharo, N.: Seven years of INEX interactive retrieval experiments – lessons and challenges. In: Catarci, T., Forner, P., Hiemstra, D., Peñas, A., Santucci, G. (eds.) CLEF 2012. LNCS, vol. 7488, pp. 13–23. Springer, Heidelberg (2012)
[27] O’Keefe, R.A., Trotman, A.: The simplest query language that could possibly work. In: Proceedings of the 2nd INEX Workshop, pp. 167–174 (2003)
[28] Pal, S., Mitra, M., Kamps, J.: Evaluation effort, reliability and reusability in XML retrieval. JASIST 62(2), 375–394 (2011)
[29] Pehcevski, J., Piwowarski, B.: Evaluation metrics for structured text retrieval. In: Liu, L., Tamer Özsu, M. (eds.) Encyclopedia of Database Systems, pp. 1015–1024. Springer US (2009)
[30] Peterson, D., (Sandy) Gao, S., Malhotra, A., Sperberg-McQueen, C.M., Thompson, H.S.: W3C XML schema definition language (XSD) 1.1 part 2: Datatypes (2012), http://www.w3.org/TR/xmlschema11-2/
[31] Piwowarski, B.: EPRUM metrics and INEX 2005. In: Fuhr, N., Lalmas, M., Malik, S., Kazai, G. (eds.) INEX 2005. LNCS, vol. 3977, pp. 30–42. Springer, Heidelberg (2006)
[32] Piwowarski, B., Trotman, A., Lalmas, M.: Sound and complete relevance assessment for XML retrieval. ACM Trans. Inf. Syst. 27(1) (2008)
[33] SanJuan, E., Bellot, P., Moriceau, V., Tannier, X.: Overview of the INEX 2010 question answering track (QA@INEX). In: Geva, et al. (eds.) [14], pp. 269–281 (2010)
[34] SanJuan, E., Moriceau, V., Tannier, X., Bellot, P., Mothe, J.: Overview of the INEX 2012 tweet contextualization track. In: Forner, et al. (eds.) [9]
[35] Schenkel, R., Suchanek, F.M., Kasneci, G.: YAWN: A semantically annotated Wikipedia XML corpus. In: Kemper, A., Schöning, H., Rose, T., Jarke, M., Seidl, T., Quix, C., Brochhaus, C. (eds.) BTW. LNI, vol. 103, pp. 277–291. GI (2007)
[36] Thom, J.A., Wu, C.: Overview of the INEX 2010 web service discovery track. In: Geva, et al. (eds.) [14], pp. 332–335
[37] Trappett, M., Geva, S., Trotman, A., Scholer, F., Sanderson, M.: Overview of the INEX 2012 snippet retrieval track. In: Forner, et al. (eds.) [9]
[38] Trotman, A., Alexander, D., Geva, S.: Overview of the INEX 2010 link the wiki track. In: Geva, et al. (eds.) [14], pp. 241–249
[39] Trotman, A., Lalmas, M.: Why structural hints in queries do not help XML-retrieval. In: Efthimiadis, E.N., Dumais, S.T., Hawking, D., Järvelin, K. (eds.) SIGIR, pp. 711–712. ACM (2006)
[40] Trotman, A., Sigurbjörnsson, B.: Narrowed Extended XPath I (NEXI). In: Fuhr, N., Lalmas, M., Malik, S., Szlávik, Z. (eds.) INEX 2004. LNCS, vol. 3493, pp. 16–40. Springer, Heidelberg (2005)
[41] Tsikrika, T., Kludas, J.: Overview of the WikipediaMM Task at ImageCLEF 2008. In: Peters, C., et al. (eds.) CLEF 2008. LNCS, vol. 5706, pp. 539–550. Springer, Heidelberg (2009)
[42] Tsikrika, T., Westerveld, T.: The INEX 2007 multimedia track. In: Fuhr, et al. (eds.) [11], pp. 440–453
[43] De Vries, C.M., Nayak, R., Kutty, S., Geva, S., Tagarelli, A.: Overview of the INEX 2010 XML mining track: Clustering and classification of XML documents. In: Geva, et al. (eds.) [14], pp. 363–376
[44] Wang, Q., Kamps, J., Camps, G.R., Marx, M., Schuth, A., Theobald, M., Gurajada, S., Mishra, A.: Overview of the INEX 2012 linked data track. In: Forner, et al. (eds.) [9]
[45] Wang, Q., Ramírez, G., Marx, M., Theobald, M., Kamps, J.: Overview of the INEX 2011 data-centric track. In: Geva, S., Kamps, J., Schenkel, R. (eds.) INEX 2011. LNCS, vol. 7424, pp. 118–137. Springer, Heidelberg (2012)
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Schenkel, R. (2014). Semistructured Data Search Evaluation. In: Ferro, N. (ed.) Bridging Between Information Retrieval and Databases. PROMISE 2013. Lecture Notes in Computer Science, vol 8173. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54798-0_7
DOI: https://doi.org/10.1007/978-3-642-54798-0_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54797-3
Online ISBN: 978-3-642-54798-0
eBook Packages: Computer Science, Computer Science (R0)