Abstract
This paper presents a structure we call the XML Wavelet Tree (XWT), which represents any XML document in a compressed and self-indexed form. Any query or procedure that could be performed over the original document can therefore be performed more efficiently over the XWT representation, because it is shorter and has some indexing properties. In particular, XWT answers XPath queries more efficiently than the uncompressed version of the document does. XWT is also competitive with inverted indexes built over the XML document when both structures use the same space.
Funded in part by MEC grant TIN2006-15071-C03-03, for the Spanish group; and for the third author by Fondecyt grant 1-080019 (Chile).
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Brisaboa, N.R., Cerdeira-Pena, A., Navarro, G. (2009). A Compressed Self-indexed Representation of XML Documents. In: Agosti, M., Borbinha, J., Kapidakis, S., Papatheodorou, C., Tsakonas, G. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2009. Lecture Notes in Computer Science, vol 5714. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04346-8_27
DOI: https://doi.org/10.1007/978-3-642-04346-8_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04345-1
Online ISBN: 978-3-642-04346-8
eBook Packages: Computer Science, Computer Science (R0)