
A Parallel General-Purpose Synthetic Data Generator¹

Chapter in Data Engineering

Abstract

The IT industry needs synthetic data generation tools for a number of applications, including (but not limited to):

  • Regression testing. Repeatedly generate the same large data set for testing enterprise applications, and allow the data set to be removed between regression tests.
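Regression testing depends on repeatability: the same seed should yield the same data set, even when generation is spread across parallel workers. The Python sketch below illustrates one common way to get that property. It is not the authors' generator; the row schema, the names `generate_chunk` and `generate_table`, and the per-chunk seeding scheme are hypothetical choices made for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical row schema for illustration only: (row_id, customer_name, amount)
FIRST_NAMES = ["Ada", "Grace", "Alan", "Edsger", "Barbara"]


def generate_chunk(seed: int, chunk_index: int, chunk_size: int) -> list[tuple[int, str, float]]:
    """Generate one chunk of rows from an RNG derived only from (seed, chunk_index)."""
    # A per-chunk RNG (seeded with a string, which Python seeds deterministically)
    # makes each chunk independent of which worker happens to produce it.
    rng = random.Random(f"{seed}:{chunk_index}")
    start = chunk_index * chunk_size
    rows = []
    for row_id in range(start, start + chunk_size):
        name = rng.choice(FIRST_NAMES)
        amount = round(rng.uniform(1.0, 1000.0), 2)
        rows.append((row_id, name, amount))
    return rows


def generate_table(seed: int, num_chunks: int, chunk_size: int, workers: int) -> list[tuple[int, str, float]]:
    """Generate num_chunks * chunk_size rows in parallel, returned in chunk order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, so rows come back by chunk index.
        chunks = pool.map(lambda i: generate_chunk(seed, i, chunk_size), range(num_chunks))
        return [row for chunk in chunks for row in chunk]


if __name__ == "__main__":
    one_worker = generate_table(seed=42, num_chunks=8, chunk_size=1000, workers=1)
    four_workers = generate_table(seed=42, num_chunks=8, chunk_size=1000, workers=4)
    # Same seed, different worker counts: the generated data set is identical.
    print(one_worker == four_workers)
```

Because each chunk's random stream depends only on the base seed and the chunk index, changing the number of workers does not change the output, while changing the seed does. Rerunning a regression test with the same seed therefore regenerates the same data set after it has been removed.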

¹ This chapter is adapted from an article of the same name published by the authors in the March 2007 issue of the ACM SIGMOD Record.


Notes

  1. The actual values could change with a different random seed.



Copyright information

© 2009 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Hoag, J.E., Thompson, C.W. (2009). A Parallel General-Purpose Synthetic Data Generator. In: Chan, Y., Talburt, J., Talley, T. (eds) Data Engineering. International Series in Operations Research & Management Science, vol 132. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-0176-7_6

  • DOI: https://doi.org/10.1007/978-1-4419-0176-7_6

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-0175-0

  • Online ISBN: 978-1-4419-0176-7

  • eBook Packages: Computer Science, Computer Science (R0)
