Data Generation

Synonyms

Tuple generation

Definition

In the context of database systems, data generation refers to the creation of synthetic data sets that can be used to populate a database. For relational database systems, tuples are generated based on the definition of one or several tables, as well as constraints (e.g., the cardinality of an attribute and the distribution of its values). For XML databases, documents are generated based on a schema as well as constraints (e.g., cardinality constraints over XPath queries). For graph databases, many algorithms have been devised for generating graphs with given properties (e.g., diameter or density).
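As a rough illustration of the relational case, the sketch below generates tuples for a single table from a per-attribute cardinality and value distribution. The function name `generate_tuples`, the schema layout, and the attribute names are hypothetical choices made for this example, not something the entry prescribes.

```python
import random
from collections import Counter

def generate_tuples(schema, num_rows, seed=0):
    """Generate num_rows synthetic tuples for one table.

    schema maps each attribute name to (cardinality, weights):
      cardinality: number of distinct values the attribute may take
      weights: None for a uniform distribution, otherwise a list of
               relative frequencies, one per distinct value
    """
    rng = random.Random(seed)  # seeded so the data set is reproducible
    rows = []
    for _ in range(num_rows):
        row = {}
        for attr, (cardinality, weights) in schema.items():
            # Values are drawn from the integer domain 0 .. cardinality-1.
            row[attr] = rng.choices(range(cardinality), weights=weights, k=1)[0]
        rows.append(row)
    return rows

# Hypothetical table: a near-unique key and a skewed categorical attribute.
schema = {
    "customer_id": (1000, None),
    "region": (4, [0.7, 0.1, 0.1, 0.1]),
}
rows = generate_tuples(schema, num_rows=10_000)
region_counts = Counter(r["region"] for r in rows)
```

Each attribute is sampled independently here; constraints that span several attributes or tables (such as foreign keys) would need additional machinery that this sketch does not attempt.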

Scientific Fundamentals

At its core, data generation amounts to generating basic combinatorial patterns. As Donald Knuth explains in his fascicle on “Generating all n-tuples,” the problem is to devise algorithms that systematically traverse a combinatorial space of possibilities.
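To make the idea of systematically traversing a combinatorial space concrete, here is a minimal sketch that enumerates every n-tuple over finite domains by counting like an odometer (a mixed-radix counter). The name `all_tuples` and the integer encoding of domain values are assumptions for illustration; this is not necessarily the formulation Knuth gives.

```python
def all_tuples(radices):
    """Yield every tuple (a_0, ..., a_{n-1}) with 0 <= a_j < radices[j],
    in lexicographic order."""
    n = len(radices)
    if any(m <= 0 for m in radices):
        return  # an empty domain means the space contains no tuples
    a = [0] * n
    while True:
        yield tuple(a)
        # Find the rightmost position that can still be incremented,
        # resetting exhausted positions to 0 along the way.
        j = n - 1
        while j >= 0 and a[j] == radices[j] - 1:
            a[j] = 0
            j -= 1
        if j < 0:
            return  # every position wrapped around: traversal complete
        a[j] += 1

# Two attributes with 2 and 3 distinct values give 2 * 3 = 6 tuples.
tuples = list(all_tuples([2, 3]))
```

The size of the space is the product of the domain sizes, so exhaustive traversal is only practical for small domains; the generator yields tuples one at a time so that a caller can stop early.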

The first issue is to determine the nature of that space. It is constrained by...

Recommended Reading

  1. Bruno N, Chaudhuri S. Flexible database generators. In: Proceedings of the 31st International Conference on Very Large Data Bases; 2005. p. 1097–107.

  2. Arasu A, Kaushik R, Li J. Data generation using declarative constraints. In: Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data; 2011. p. 685–96. https://doi.org/10.1145/1989323.1989395.

  3. Knuth DE. The art of computer programming, volume 4, fascicle 3: generating all combinations and partitions. Upper Saddle River: Addison-Wesley Professional; 2005.

  4. Olston C, Chopra S, Srivastava U. Generating example data for dataflow programs. In: Binnig C, Dageville B, editors. Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data; 2009. p. 245–56. https://doi.org/10.1145/1559845.1559873.

Author information

Correspondence to Philippe Bonnet.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

Cite this entry

Bonnet, P., Shasha, D. (2018). Data Generation. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_80799
