Abstract
The IT industry needs synthetic data generation tools for a number of applications including (but not limited to):
Regression testing. Repeatedly generate the same large data set for testing enterprise applications. Allow the data set to be removed between regression tests.
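The regression-testing use case above depends on being able to regenerate an identical data set on demand. The following is a minimal sketch of that idea, assuming a seeded pseudo-random generator; the function name `generate_rows`, the three-column row layout, and the seed values are hypothetical illustrations and are not taken from the chapter's generator.

```python
import random


def generate_rows(seed, count):
    """Yield `count` synthetic (id, name, balance) rows determined entirely by `seed`."""
    # A private Random instance keeps the sequence independent of global random state.
    rng = random.Random(seed)
    for row_id in range(count):
        # Hypothetical columns: a zero-padded user name and a balance in integer cents.
        name = "user" + str(rng.randrange(10**6)).zfill(6)
        balance = rng.randrange(0, 1_000_000)
        yield (row_id, name, balance)


# Two runs with the same seed reproduce the same data set, so the data can be
# discarded after a regression test and regenerated for the next one.
first_run = list(generate_rows(seed=42, count=1000))
second_run = list(generate_rows(seed=42, count=1000))

# A different seed gives a different data set of the same shape.
other_seed = list(generate_rows(seed=43, count=1000))
```

Because the rows are a pure function of the seed and the row count, nothing needs to be stored between test runs except those two parameters.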
1. This chapter is adapted from an article of the same name published by the authors in the March 2007 issue of the ACM SIGMOD Record.
Notes
1. The actual values could change with a different random seed.
Copyright information
© 2009 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Hoag, J.E., Thompson, C.W. (2009). A Parallel General-Purpose Synthetic Data Generator. In: Chan, Y., Talburt, J., Talley, T. (eds) Data Engineering. International Series in Operations Research & Management Science, vol 132. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-0176-7_6
DOI: https://doi.org/10.1007/978-1-4419-0176-7_6
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-0175-0
Online ISBN: 978-1-4419-0176-7
eBook Packages: Computer Science, Computer Science (R0)