A Compressed Self-indexed Representation of XML Documents

  • Conference paper
Research and Advanced Technology for Digital Libraries (ECDL 2009)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 5714))

Abstract

This paper presents a structure, which we call the XML Wavelet Tree (XWT), that represents any XML document in a compressed and self-indexed form. Any query or procedure that could be performed over the original document can therefore be performed more efficiently over the XWT representation, because it is shorter and has some indexing properties. In particular, XWT answers XPath queries more efficiently than the uncompressed version of the documents does. XWT is also competitive with inverted indexes built over the XML document when both structures use the same space.

Funded in part by MEC grant TIN2006-15071-C03-03, for the Spanish group; and for the third author by Fondecyt grant 1-080019 (Chile).
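
As a rough illustration of what "self-indexed" means in the abstract, the sketch below builds a generic binary wavelet tree over a sequence of tokens. It is not the XWT itself, whose layout the abstract does not describe. The class name `WaveletTree`, the methods `access` and `rank`, and the sample tokens are all assumptions made for this example. The bitmaps are stored as plain Python prefix-count lists for readability, so the sketch shows the indexing behaviour only and is not compressed.

```python
class WaveletTree:
    """Generic binary wavelet tree over a sequence of sortable tokens.

    Illustrative only: a hypothetical stand-in, not the paper's XWT.
    """

    def __init__(self, seq, alphabet=None):
        if alphabet is None:
            alphabet = sorted(set(seq))
        self.alphabet = alphabet
        self.left = self.right = None
        if len(alphabet) <= 1:
            return  # leaf: every position holds the same token
        mid = len(alphabet) // 2
        self.left_syms = set(alphabet[:mid])
        # ones[i] = number of tokens in seq[:i] routed to the right child
        self.ones = [0]
        for s in seq:
            self.ones.append(self.ones[-1] + (0 if s in self.left_syms else 1))
        self.left = WaveletTree(
            [s for s in seq if s in self.left_syms], alphabet[:mid])
        self.right = WaveletTree(
            [s for s in seq if s not in self.left_syms], alphabet[mid:])

    def access(self, i):
        """Return the token at 0-based position i, without the original text."""
        if self.left is None:
            return self.alphabet[0]
        ones_before = self.ones[i]
        if self.ones[i + 1] - ones_before:  # bit i is 1: go right
            return self.right.access(ones_before)
        return self.left.access(i - ones_before)

    def rank(self, sym, i):
        """Count occurrences of sym among the first i tokens."""
        if self.left is None:
            return i if self.alphabet and self.alphabet[0] == sym else 0
        if sym in self.left_syms:
            return self.left.rank(sym, i - self.ones[i])
        return self.right.rank(sym, self.ones[i])


# Hypothetical token stream: tags and words of a tiny XML fragment.
tokens = ["<a>", "xml", "tree", "xml", "</a>", "xml"]
wt = WaveletTree(tokens)
print(wt.access(2))       # recovers a token from the tree alone
print(wt.rank("xml", 6))  # counts occurrences without scanning the text
```

The tree replaces the token sequence (`access` recovers any position) and also answers counting queries by descending one root-to-leaf path, which is the sense in which such a structure is both a representation and an index. A practical implementation would use succinct bitmaps with constant-time rank support instead of Python lists.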

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Brisaboa, N.R., Cerdeira-Pena, A., Navarro, G. (2009). A Compressed Self-indexed Representation of XML Documents. In: Agosti, M., Borbinha, J., Kapidakis, S., Papatheodorou, C., Tsakonas, G. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2009. Lecture Notes in Computer Science, vol 5714. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04346-8_27

  • DOI: https://doi.org/10.1007/978-3-642-04346-8_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04345-1

  • Online ISBN: 978-3-642-04346-8

  • eBook Packages: Computer Science (R0)
