Abstract
Even today, the conventional wisdom is that storing data in main memory is more expensive than storing it on disk. While this is true for the price per byte, the picture looks different for the price per bandwidth. For data-driven applications with high throughput demands, I/O bandwidth can easily become the major bottleneck. Comparing the costs of different storage types for a given bandwidth requirement shows that the old wisdom of inexpensive disks and expensive main memory is no longer valid in every case. The higher the bandwidth requirements become, the more cost-efficient main memory is. And all of a sudden: main memory is less expensive than disk.
In this paper, we show that database workloads for the next generation of enterprise systems have vastly increased bandwidth requirements. These new requirements favor in-memory systems, as they are less expensive when operational costs are taken into account. We discuss mixed enterprise workloads in comparison to traditional transactional workloads and show with a cost evaluation that main memory systems can incur a lower total cost of ownership than their disk-based counterparts.
Notes
1. Intel Xeon E7-4890v2 Benchmark – URL: http://www.intel.com/content/www/us/en/benchmarks/server/xeon-e7-v2/xeon-e7-v2-4s-stream.html.
2. iotop – URL: http://guichaz.free.fr/iotop/.
3. Uptime Institute 2012 Data Center Survey – URL: http://uptimeinstitute.com/2012-survey-results.
References
Boissier, M., Krueger, J., Wust, J., Plattner, H.: An integrated data management for enterprise systems. In: ICEIS 2014 - Proceedings of the 16th International Conference on Enterprise Information Systems, vol. 3, 27–30 April, pp. 410–418, Lisbon, Portugal (2014)
Cole, R., Funke, F., Giakoumakis, L., Guy, W., Kemper, A., Krompass, S., Kuno, H.A., Nambiar, R.O., Neumann, T., Poess, M., Sattler, K.-U., Seibold, M., Simon, E., Waas, F.: The mixed workload ch-benchmark. In: DBTest, p. 8. ACM (2011)
Difallah, D.E., Pavlo, A., Curino, C., Cudré-Mauroux, P.: OLTP-bench: an extensible testbed for benchmarking relational databases. PVLDB 7(4), 277–288 (2013)
Färber, F., May, N., Lehner, W., Große, P., Müller, I., Rauhe, H., Dees, J.: The SAP HANA database - an architecture overview. IEEE Data Eng. Bull. 35(1), 28–33 (2012)
Grund, M., Krueger, J., Plattner, H., Zeier, A., Cudré-Mauroux, P., Madden, S.: HYRISE - a main memory hybrid storage engine. PVLDB 4(2), 105–116 (2010)
H-Store Documentation: MapReduce Transactions. http://hstore.cs.brown.edu/documentation/deployment/mapreduce/
Harizopoulos, S., Abadi, D.J., Madden, S., Stonebraker, M.: OLTP through the looking glass, and what we found there. In: SIGMOD Conference, pp. 981–992. ACM (2008)
Idreos, S., Groffen, F., Nes, N., Manegold, S., Mullender, K.S., Kersten, M.L.: MonetDB: two decades of research in column-oriented database architectures. IEEE Data Eng. Bull. 35(1), 40–45 (2012)
Kemper, A., Neumann, T., Finis, J., Funke, F., Leis, V., Muehe, H., Muehlbauer, T., Roediger, W.: Processing in the hybrid OLTP & OLAP main-memory database system hyper. IEEE Data Eng. Bull. 36(2), 41–47 (2013)
Larson, P., Clinciu, C., Fraser, C., Hanson, E.N., Mokhtar, M., Nowakiewicz, M., Papadimos, V., Price, S.L., Rangarajan, S., Rusanu, R., Saubhasik, M.: Enhancements to SQL server column stores. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, June 22–27, pp. 1159–1168, New York (2013)
Malladi, K.T., Lee, B.C., Nothaft, F.A., Kozyrakis, C., Periyathambi, K., Horowitz, M.: Towards energy-proportional datacenter memory with mobile dram. In: SIGARCH Computer Architecture News, vol. 40(3), pp. 37–48 (2012)
Plattner, H.: The impact of columnar in-memory databases on enterprise systems. PVLDB 7(13), 1722–1729 (2014)
Raman, V., Attaluri, G.K., Barber, R., Chainani, N., Kalmuk, D., KulandaiSamy, V., Leenstra, J., Lightstone, S., Liu, S., Lohman, G.M., Malkemus, T., Müller, R., Pandis, I., Schiefer, B., Sharpe, D., Sidle, R., Storm, A.J., Zhang, L.: DB2 with BLU acceleration: so much more than just a column store. PVLDB 6(11), 1080–1091 (2013)
Rowstron, A., Narayanan, D., Donnelly, A., O’Shea, G., Douglas, A.: Nobody ever got fired for using hadoop on a cluster. In: Proceedings of the 1st International Workshop on Hot Topics in Cloud Data Processing, HotCDP 2012, pp. 2:1–2:5. ACM, New York (2012)
Shute, J., Vingralek, R., Samwel, B., Handy, B., Whipkey, C., Rollins, E., Oancea, M., Littlefield, K., Menestrina, D., Ellner, S., Cieslewicz, J., Rae, I., Stancescu, T., Apte, H.: F1: a distributed SQL database that scales. PVLDB 6(11), 1068–1079 (2013)
Sizing Guide for Single Click Configurations of Oracle's MySQL on Sun Fire x86 Servers. www.oracle.com/technetwork/server-storage/sun-x86/documentation/o11-133-single-click-sizing-mysql-521534.pdf
Zilio, D.C., Rao, J., Lightstone, S., Lohman, G.M., Storm, A.J., Garcia-Arellano, C., Fadden, S.: DB2 design advisor: integrated automatic physical database design. In: VLDB, pp. 1087–1097. Morgan Kaufmann (2004)
Appendix
7.1 Execution of CH-benCHmark Queries
The following adaptations were made to run the CH-benCHmark queries:
- when needed, the extract function (e.g., EXTRACT(YEAR FROM o_entry_d)) has been replaced by the year function (e.g., YEAR(o_entry_d))
- for MySQL and PostgreSQL, query 15 has been modified to use a view instead of using SQL’s HAVING clause (code provided by the OLTP-Bench framework)
- when needed, aliases have been resolved in case they are not supported in aggregations
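The first adaptation in the list above can be sketched as a simple textual rewrite. This is only an illustration: the text does not say how the rewrite was actually performed, the function name rewrite_extract_year and the sample query are hypothetical, and the pattern assumes the argument is a plain (optionally qualified) column name rather than a nested expression.

```python
import re

# Matches EXTRACT(YEAR FROM <column>) where <column> is a plain, optionally
# qualified identifier. Nested expressions are deliberately out of scope.
_EXTRACT_YEAR = re.compile(
    r"EXTRACT\s*\(\s*YEAR\s+FROM\s+([A-Za-z_][\w.]*)\s*\)",
    re.IGNORECASE,
)


def rewrite_extract_year(sql: str) -> str:
    """Replace every EXTRACT(YEAR FROM col) with YEAR(col)."""
    return _EXTRACT_YEAR.sub(r"YEAR(\1)", sql)


# Illustrative query only; it is not one of the CH-benCHmark queries.
query = (
    "SELECT EXTRACT(YEAR FROM o_entry_d) AS l_year, COUNT(*) "
    "FROM orders GROUP BY EXTRACT(YEAR FROM o_entry_d)"
)
print(rewrite_extract_year(query))
```

A regular expression is sufficient here because the benchmark queries are a small, fixed set; a general-purpose tool would more likely need a SQL parser.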
We set the maximum execution time to 12 h per query. Queries that exceed this limit are excluded from our results even though they are executable; given their long execution time, we assume that they do not terminate.
7.2 TCO Calculations
The following section lists the components for an assumed bandwidth requirement of 40 GB/s. The prices were obtained from the official websites of hardware vendors and do not include any discounts. Energy costs are calculated from the technical specifications of the hardware. Cooling costs are calculated using an assumed Power Usage Effectiveness (PUE) of 1.8, according to the Uptime Institute 2012 Data Center Survey (see Note 3). The cost of energy is $0.276 per kWh. Both energy and cooling costs are calculated for a timespan of three years.
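The energy and cooling terms can be sketched as follows. The price per kWh, the PUE, and the three-year timespan come from the text; everything else is an assumption made for illustration: continuous operation at the specified power draw, the whole PUE overhead (PUE minus 1) being attributed to cooling, and the 1,000 W example value, which is hypothetical and not taken from any of the configurations.

```python
HOURS_PER_YEAR = 365 * 24

PRICE_PER_KWH = 0.276  # USD per kWh, from the text
PUE = 1.8              # Power Usage Effectiveness, from the text
YEARS = 3              # timespan, from the text


def energy_cost(power_watts: float, years: int = YEARS,
                price_per_kwh: float = PRICE_PER_KWH) -> float:
    """Energy cost in USD, assuming the hardware runs continuously
    at the given power draw (an assumption of this sketch)."""
    kwh = power_watts / 1000.0 * HOURS_PER_YEAR * years
    return kwh * price_per_kwh


def cooling_cost(power_watts: float, pue: float = PUE, years: int = YEARS,
                 price_per_kwh: float = PRICE_PER_KWH) -> float:
    """Cooling cost in USD, assuming the facility overhead implied by the
    PUE (pue - 1 times the IT energy) is entirely cooling."""
    return energy_cost(power_watts, years, price_per_kwh) * (pue - 1.0)


# Hypothetical power draw of 1,000 W for one server node.
example_watts = 1000.0
print(f"energy:  ${energy_cost(example_watts):,.2f}")
print(f"cooling: ${cooling_cost(example_watts):,.2f}")
```

Under these assumptions, cooling adds 80 % on top of the energy bill, so operational costs grow linearly with the number of powered components.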
For the hard disk and solid state disk based systems, each node is a four-processor server (4 \(\times \) Intel Xeon E7-4850v2 12C/24T 2.3 GHz 24 MB) with an estimated price of $30,000. For both configurations, the size of main memory is set to \({\sim }10\,\%\) of the database volume (i.e., 50 GB for the 500 GB data set).
All following exemplary calculations do not include costs for high availability.
HDD-Based System. The HDD-based system adapts to higher bandwidth requirements by adding direct-attached storage (DAS) units. In this calculation, each node has eight SAS slots. Each DAS unit is connected to two SAS slots, is assumed to provide the maximal theoretical throughput of 6 GB/s, and consists of 96 disks (10K rpm, enterprise grade) to provide that bandwidth. It is possible to reach 6 GB/s with fewer 15K rpm disks, but a configuration with 10K rpm disks is more price-efficient.
Since two SAS slots are used to connect each DAS unit, each server node can connect to a maximum of four DAS units, resulting in a peak bandwidth of 24 GB/s. Consequently, any bandwidth higher than 24 GB/s requires an additional server node.
The hardware setup for the 40 GB/s configuration and its TCO calculation are listed in Sect. 7.2 (Table 2).
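The scaling rule for the HDD-based system can be written down as a short calculation. The constants are the ones stated in the text; the function name hdd_setup and the returned dictionary layout are hypothetical, and the resulting counts are derived from the stated limits rather than copied from Table 2.

```python
import math

DAS_BANDWIDTH_GBPS = 6   # assumed maximal throughput per DAS unit (from the text)
DISKS_PER_DAS = 96       # 10K rpm disks per DAS unit (from the text)
SAS_SLOTS_PER_NODE = 8   # from the text
SAS_SLOTS_PER_DAS = 2    # from the text


def hdd_setup(required_gbps: float) -> dict:
    """Number of DAS units, server nodes, and disks needed to reach
    the required bandwidth under the limits stated above."""
    das_units = math.ceil(required_gbps / DAS_BANDWIDTH_GBPS)
    das_per_node = SAS_SLOTS_PER_NODE // SAS_SLOTS_PER_DAS  # four units per node
    nodes = math.ceil(das_units / das_per_node)
    return {
        "das_units": das_units,
        "nodes": nodes,
        "disks": das_units * DISKS_PER_DAS,
    }


print(hdd_setup(24))  # fits on a single node
print(hdd_setup(40))  # exceeds 24 GB/s, so a second node is required
```

Because each step rounds up, costs grow in steps: a requirement just above 24 GB/s already pays for a second server node.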
SSD-Based System. The SSD-based system uses PCIe-connected solid state disks. Recent Intel Xeon CPUs have up to 32 directly connected PCIe lanes per socket. Consequently, we assume a theoretical setup of up to eight PCIe-connected SSDs per server node.
For our calculations, we use a PCIe SSD that provides a peak read bandwidth of 3 GB/s and has a capacity of 1 TB. Faster SSDs are currently available (up to 6 GB/s), but they are more than three times as expensive. We also calculated prices for another PCIe SSD vendor whose drives are almost a factor of two less expensive in their smallest size of 350 GB. We did not include these calculations here, as these drives are currently not available. However, even using these drives, the 40 GB/s configuration is still more expensive than its main memory-based counterpart (Table 3).
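For the SSD-based system, a sketch of the same kind also shows how much capacity is bought as a side effect of the bandwidth requirement. The per-drive bandwidth, drive capacity, and drives-per-node limit are taken from the text; the function name ssd_setup is hypothetical, and the sketch assumes that read bandwidth adds up linearly across drives.

```python
import math

SSD_READ_GBPS = 3    # peak read bandwidth per PCIe SSD (from the text)
SSD_CAPACITY_TB = 1  # capacity per PCIe SSD (from the text)
SSDS_PER_NODE = 8    # assumed maximum per server node (from the text)


def ssd_setup(required_gbps: float) -> dict:
    """Number of SSDs and server nodes needed for the required bandwidth,
    plus the total capacity that comes with that many drives."""
    ssds = math.ceil(required_gbps / SSD_READ_GBPS)
    nodes = math.ceil(ssds / SSDS_PER_NODE)
    return {
        "ssds": ssds,
        "nodes": nodes,
        "capacity_tb": ssds * SSD_CAPACITY_TB,
    }


print(ssd_setup(40))
```

Under these assumptions, the number of drives is dictated by bandwidth rather than by capacity: the drives needed for 40 GB/s offer far more space than the 500 GB data set mentioned above requires.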
Main Memory-Based System. The main memory-based server is equipped with Intel’s latest Xeon E7 CPU. A server with four CPUs (Intel Xeon E7-4890v2 15C/30T 2.8 GHz 37 MB) costs \({\sim }\$63,000\). The costs include a 600 GB enterprise-grade HDD for persistence (Table 4).
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this paper
Boissier, M., Meyer, C., Uflacker, M., Tinnefeld, C. (2015). And All of a Sudden: Main Memory Is Less Expensive Than Disk. In: Rabl, T., Sachs, K., Poess, M., Baru, C., Jacobson, HA. (eds) Big Data Benchmarking. WBDB 2014. Lecture Notes in Computer Science, vol 8991. Springer, Cham. https://doi.org/10.1007/978-3-319-20233-4_12
DOI: https://doi.org/10.1007/978-3-319-20233-4_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20232-7
Online ISBN: 978-3-319-20233-4
eBook Packages: Computer Science (R0)