Tree String Path Subsequences Automaton and Its Use for Indexing XML Documents

  • Conference paper
Languages, Applications and Technologies (SLATE 2015)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 563)

Abstract

The theory of indexing texts is well researched, but the same does not hold for indexing other data structures, such as trees. In this paper, a simple method is presented for indexing a tree for subsequences of its string paths by means of a finite automaton. The use of the index is demonstrated on indexing XML documents for queries inspired by the XPath descendant-or-self axis. Given a subject tree \(\mathcal{T}\) with n nodes, the tree is preprocessed and an index, which is a directed acyclic subsequence graph for a set of strings, is constructed. The searching phase uses the index, reads an input string path subsequence \(\mathcal{Q}\) of size m, derived from a specific XPath query, and computes the list of positions of all occurrences of \(\mathcal{Q}\) in the tree \(\mathcal{T}\). The searching is performed in time \(\mathcal{O}(m)\) and does not depend on n. Although the number of distinct valid queries is \(\mathcal{O}(2^n)\), the size of the index is \(\mathcal{O}(h^k)\), where h is the height of the tree \(\mathcal{T}\) and k is the number of its leaves. Moreover, we discuss why, when a common XML document is indexed, the size of the index is even smaller, namely \(\mathcal{O}(h \cdot 2^k)\).
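The abstract describes the index only at a high level. The Python sketch below illustrates the general idea under our own assumptions and is not the authors' construction: the tree is given as nested `(label, children)` tuples, positions are preorder node numbers, and each automaton state is the set of nodes at which the last query label can be matched. A subsequence of a root-to-leaf string path is exactly a chain of ancestor/descendant nodes, so reading a label moves to every node with that label strictly below a node of the current state. States are produced by a naive subset construction, which makes no attempt to reach the size bounds stated above. The names `number_nodes`, `build_index`, `search` and `ROOT` are hypothetical.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical input format for this sketch: a node is (label, [child nodes]).
Node = Tuple[str, list]

ROOT = -1  # virtual position "above" the tree root (assumption of this sketch)


def number_nodes(tree: Node) -> Tuple[List[str], List[List[int]]]:
    """Number nodes in preorder; return each node's label and proper descendants."""
    labels: List[str] = []
    descendants: List[List[int]] = []

    def visit(node: Node) -> List[int]:
        label, children = node
        node_id = len(labels)
        labels.append(label)
        descendants.append([])
        below: List[int] = []
        for child in children:
            below.append(len(labels))  # the child's preorder id
            below.extend(visit(child))
        descendants[node_id] = below
        return below

    visit(tree)
    return labels, descendants


def build_index(tree: Node) -> Tuple[Dict[Tuple[int, str], int], List[FrozenSet[int]]]:
    """Naive subset construction over tree positions.

    A state is the set of nodes at which the last read label can be matched;
    reading label a moves to every node labelled a strictly below one of them.
    """
    labels, descendants = number_nodes(tree)
    alphabet = sorted(set(labels))
    start: FrozenSet[int] = frozenset([ROOT])
    state_id: Dict[FrozenSet[int], int] = {start: 0}
    states: List[FrozenSet[int]] = [start]  # state id -> set of tree positions
    delta: Dict[Tuple[int, str], int] = {}  # (state id, label) -> state id
    queue = deque([start])
    while queue:
        state = queue.popleft()
        below = set()
        for u in state:
            below.update(range(len(labels)) if u == ROOT else descendants[u])
        for a in alphabet:
            target = frozenset(v for v in below if labels[v] == a)
            if not target:
                continue  # no transition: the query cannot be extended by a
            if target not in state_id:
                state_id[target] = len(states)
                states.append(target)
                queue.append(target)
            delta[(state_id[state], a)] = state_id[target]
    return delta, states


def search(index, query: List[str]) -> List[int]:
    """Follow one transition per query label; return preorder ids of all matches."""
    delta, states = index
    q = 0
    for symbol in query:
        nxt = delta.get((q, symbol))
        if nxt is None:
            return []
        q = nxt
    return sorted(states[q]) if query else []


# Preorder ids: a=0, b=1, c=2, b=3, c=4, c=5, b=6
doc: Node = ("a", [
    ("b", [("c", []), ("b", [("c", [])])]),
    ("c", [("b", [])]),
])
index = build_index(doc)
print(search(index, ["b", "c"]))  # [2, 4], in the spirit of //b//c
```

Each query label costs one dictionary lookup on integer state ids, so the search loop runs in time proportional to m; only reporting (and, here, sorting) the answer depends on the number of matches.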

J. Janoušek: This research has been partially supported by the Czech Science Foundation (GAČR) as project No. GA-13-03253S and by the Technology Agency of the Czech Republic (TAČR) as project No. TA03010964 in the \(\alpha \) programme.

Author information

Correspondence to Eliška Šestáková.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Šestáková, E., Janoušek, J. (2015). Tree String Path Subsequences Automaton and Its Use for Indexing XML Documents. In: Sierra-Rodríguez, JL., Leal, JP., Simões, A. (eds) Languages, Applications and Technologies. SLATE 2015. Communications in Computer and Information Science, vol 563. Springer, Cham. https://doi.org/10.1007/978-3-319-27653-3_17

  • DOI: https://doi.org/10.1007/978-3-319-27653-3_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27652-6

  • Online ISBN: 978-3-319-27653-3

  • eBook Packages: Computer Science, Computer Science (R0)
