In the context of database systems, data generation refers to the creation of synthetic data sets that can be used to populate a database. For relational database systems, tuples are generated based on the definition of one or several tables, as well as constraints (e.g., the cardinality of an attribute and the distribution of its values). For XML databases, documents are generated based on a schema as well as constraints (e.g., cardinality constraints over XPath queries). For graph databases, many algorithms have been devised for generating graphs with given properties (e.g., diameter or density).
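As a minimal sketch of the relational case, the Python snippet below generates tuples for a single table from a per-attribute specification. The specification format (`cardinality`, `distribution`), the helper names, and the Zipf-like weighting are illustrative assumptions, not part of any particular generator described in the literature. Here the cardinality is treated as the size of the domain that values are drawn from, so a small sample may contain fewer distinct values than the stated cardinality.

```python
import random


def zipf_weights(n, s=1.0):
    """Return unnormalized Zipf-like weights 1/rank^s for ranks 1..n (assumed skew model)."""
    return [1.0 / (rank ** s) for rank in range(1, n + 1)]


def generate_tuples(num_rows, columns, seed=0):
    """Generate num_rows tuples, one value per column, following each column's spec.

    columns maps an attribute name to a dict with:
      - "cardinality": size of the integer domain 0..cardinality-1
      - "distribution": "uniform" or "zipf" (hypothetical labels for this sketch)
    """
    rng = random.Random(seed)  # fixed seed so the data set is reproducible

    # Precompute the domain and sampling weights of every attribute once.
    prepared = []
    for spec in columns.values():
        domain = list(range(spec["cardinality"]))
        if spec.get("distribution") == "zipf":
            weights = zipf_weights(len(domain))
        else:
            weights = None  # random.choices treats None as uniform
        prepared.append((domain, weights))

    return [
        tuple(rng.choices(domain, weights=weights, k=1)[0]
              for domain, weights in prepared)
        for _ in range(num_rows)
    ]


# Hypothetical table definition: two attributes with different constraints.
schema = {
    "customer_id": {"cardinality": 50, "distribution": "zipf"},
    "region": {"cardinality": 4, "distribution": "uniform"},
}
rows = generate_tuples(1000, schema)
```

Seeding the random number generator keeps the generated data set reproducible, which is usually desirable when synthetic data is used for testing or benchmarking.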
At its core, data generation amounts to generating basic combinatorial patterns. As Donald Knuth explains in his fascicle on “Generating all n-tuples,” the problem is to devise algorithms that systematically traverse a combinatorial space of possibilities.
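To make the idea of systematic traversal concrete, the following Python sketch enumerates every n-tuple over given finite domains by counting in a mixed-radix number system. It is an illustration in the spirit of that approach, not a transcription of any specific algorithm from Knuth's text; the function name `all_tuples` and the `radices` parameter are assumptions made for this example.

```python
def all_tuples(radices):
    """Yield every tuple (a_0, ..., a_{n-1}) with 0 <= a_j < radices[j].

    Tuples are visited in lexicographic order, like an odometer whose
    j-th wheel has radices[j] positions.
    """
    n = len(radices)
    if any(m <= 0 for m in radices):
        return  # an empty domain means there are no tuples at all
    a = [0] * n
    while True:
        yield tuple(a)
        # Find the rightmost position that can still be incremented,
        # resetting every "full" position to 0 on the way (the carry).
        j = n - 1
        while j >= 0 and a[j] == radices[j] - 1:
            a[j] = 0
            j -= 1
        if j < 0:
            return  # carried past the leftmost position: traversal complete
        a[j] += 1


# Example: a two-attribute space with 2 and 3 possible values respectively.
pairs = list(all_tuples([2, 3]))
```

For `radices = [2, 3]` the traversal visits six tuples, one for each combination of the two attributes.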