Abstract
Schema inference is an essential task in database management and can be reduced to learning regular expressions from finite sets of positive samples. In this paper, we extend single-occurrence regular expressions (SOREs) to single-occurrence regular expressions with unorder (uSOREs), and give an inference algorithm for uSOREs. First, we present an unorder-countable finite automaton (uCFA). Then, we construct a uCFA that recognizes the given finite sample. Next, the uCFA runs on the given finite sample to count, for every possibly repeated matching, the number of occurrences of the subexpressions (connectable via unorder). Finally, we transform the uCFA into a uSORE according to these counting results. Experimental results demonstrate that, for larger samples, our algorithm can efficiently infer a uSORE with better generalization ability.
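As a rough illustration of what "unorder" and the counting step mean, the following minimal Python sketch handles only the simplest case: a block of distinct symbols that may appear in any order, possibly repeated. It is not the uCFA-based algorithm of the paper; the function name `count_unordered_repetitions` and the toy sample are assumptions made purely for illustration.

```python
from collections import Counter


def count_unordered_repetitions(word, symbols):
    """Count how many consecutive blocks of `word` are permutations of `symbols`.

    Returns None when `word` cannot be split into such blocks.
    Hypothetical helper, not taken from the paper.
    """
    block = len(symbols)
    if block == 0 or len(word) % block != 0:
        return None
    target = Counter(symbols)
    count = 0
    for start in range(0, len(word), block):
        # Each block must contain exactly the given symbols, in any order.
        if Counter(word[start:start + block]) != target:
            return None
        count += 1
    return count


# Toy sample: every word is a repetition of the unordered block {a, b}.
sample = ["ab", "ba", "abba", "baab"]
counts = [count_unordered_repetitions(w, "ab") for w in sample]

# The smallest and largest observed repetition counts bound the repetition
# range that an inferred expression would need to allow for this sample.
lower, upper = min(counts), max(counts)
```

In this sketch, the observed counts over the sample play the role of the counting results from which repetition bounds could be read off; the automaton construction and the transformation into a uSORE are not modeled here.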
Work supported by National Natural Science Foundation of China under Grant No. 61872339.
Notes
- 1. For instance, \(Perm(\{1,2,3\}) = \{\{1,2,3\},\{1,3,2\},\{2,1,3\},\{2,3,1\},\{3,1,2\},\{3,2,1\}\}\).
- 2.
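The permutation set in note 1 can be reproduced with a short Python sketch. The helper name `perm` is an assumption chosen to mirror the notation \(Perm\); it simply wraps the standard-library function `itertools.permutations`, and each ordering is returned as a tuple.

```python
from itertools import permutations


def perm(symbols):
    """Return all orderings of `symbols` as a list of tuples.

    Hypothetical helper mirroring the Perm notation of note 1.
    """
    return list(permutations(symbols))


result = perm([1, 2, 3])
for ordering in result:
    print(ordering)
```

For a set of n distinct symbols there are n! orderings, so the example yields six tuples.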
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, X., Chen, H. (2020). Inferring Deterministic Regular Expression with Unorder. In: Chatzigeorgiou, A., et al. SOFSEM 2020: Theory and Practice of Computer Science. SOFSEM 2020. Lecture Notes in Computer Science, vol 12011. Springer, Cham. https://doi.org/10.1007/978-3-030-38919-2_27
DOI: https://doi.org/10.1007/978-3-030-38919-2_27
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-38918-5
Online ISBN: 978-3-030-38919-2
eBook Packages: Computer Science (R0)