Abstract
Pattern matching mechanisms based on regular expressions feature in a number of recent languages for processing XML. The flexibility of these mechanisms demands novel approaches to the familiar problem of pattern-match compilation: how to minimize the number of tests performed during pattern matching while keeping the size of the output code small.
We describe semantic compilation methods in which we use the schema of the value flowing into a pattern matching expression to generate efficient target code. We start by discussing a pragmatic algorithm currently used in the Xtatic compiler and report some preliminary performance results. For a more fundamental analysis, we define an optimality criterion of “no useless tests” and show that Xtatic’s algorithm does not satisfy it. We constructively demonstrate that the problem of generating optimal pattern matching code is decidable for finite (non-recursive) patterns.
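The idea of using the input schema to avoid tests can be illustrated with a minimal sketch. This is not the algorithm from the paper or from the Xtatic compiler. It assumes two hypothetical patterns, a[b[]] and a[c[]], and a hypothetical input schema a[b[]] | a[c[]]. The Node class, the CountingMatcher helper, and both matcher functions are invented here for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A toy XML element: a tag plus a list of child elements."""
    tag: str
    children: list = field(default_factory=list)


class CountingMatcher:
    """Counts tag tests so the two strategies can be compared."""

    def __init__(self):
        self.tests = 0

    def has_tag(self, node, tag):
        self.tests += 1
        return node.tag == tag


def match_naive(m, v):
    # Pattern 1: a[b[]]    Pattern 2: a[c[]]
    # With no type information, each clause re-checks every tag.
    if m.has_tag(v, "a") and len(v.children) == 1 and m.has_tag(v.children[0], "b"):
        return 1
    if m.has_tag(v, "a") and len(v.children) == 1 and m.has_tag(v.children[0], "c"):
        return 2
    return None


def match_with_schema(m, v):
    # Assumed input schema: a[b[]] | a[c[]]
    # The root is known to be "a" with exactly one child tagged "b" or "c",
    # so a single test on the child's tag decides which clause matches.
    return 1 if m.has_tag(v.children[0], "b") else 2


value = Node("a", [Node("c")])

naive = CountingMatcher()
naive_result = match_naive(naive, value)

typed = CountingMatcher()
typed_result = match_with_schema(typed, value)

print(naive_result, naive.tests)
print(typed_result, typed.tests)
```

On the input a[c[]], both matchers select the second clause, but the naive matcher performs four tag tests while the schema-aware one performs one. The tests that the schema makes redundant are the kind that a "no useless tests" criterion is meant to rule out.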
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Levin, M.Y., Pierce, B.C. (2005). Type-Based Optimization for Regular Patterns. In: Bierman, G., Koch, C. (eds) Database Programming Languages. DBPL 2005. Lecture Notes in Computer Science, vol 3774. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11601524_12
DOI: https://doi.org/10.1007/11601524_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30951-2
Online ISBN: 978-3-540-31445-5
eBook Packages: Computer Science, Computer Science (R0)