Abstract
With the rapid development of computing and Internet technologies, the volume of data in every sector is growing sharply, and large amounts of high-dimensional big data are accumulating, such as network transaction data, user review data, and multimedia data. The storage structure of high-dimensional big data is a critical factor that fundamentally affects processing performance. However, because of the very high dimensionality of such data, existing storage techniques such as row-store and column-store are not well suited to high-dimensional, large-scale data. In this paper, we therefore present an efficient storage structure for high-dimensional big data based on US-ELM (unsupervised extreme learning machine): the High-dimensional Big Data File, named HB-File, which is a hybrid storage model combining row-store and column-store. Extensive experiments show the effectiveness of HB-File for storing high-dimensional big data.
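To make the row-store, column-store, and hybrid distinction concrete, the following is a minimal sketch of a generic hybrid layout: rows are first split into horizontal groups, and each group is then stored column by column. It is not the HB-File format itself; the abstract does not describe how HB-File lays out data or how US-ELM is applied, so the function names and the `rows_per_group` parameter are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

Row = Sequence[float]


def to_row_store(rows: Sequence[Row]) -> List[List[float]]:
    """Row-store: each record is kept contiguously."""
    return [list(r) for r in rows]


def to_column_store(rows: Sequence[Row]) -> List[List[float]]:
    """Column-store: each dimension (column) is kept contiguously."""
    return [list(col) for col in zip(*rows)]


def to_hybrid_store(rows: Sequence[Row], rows_per_group: int) -> List[List[List[float]]]:
    """Generic hybrid layout (an assumption, not HB-File): split the rows
    into groups, then store each group in column-major order."""
    if rows_per_group <= 0:
        raise ValueError("rows_per_group must be positive")
    groups = []
    for start in range(0, len(rows), rows_per_group):
        chunk = rows[start:start + rows_per_group]
        groups.append([list(col) for col in zip(*chunk)])
    return groups


def read_column(groups: List[List[List[float]]], col_index: int) -> List[float]:
    """Read one dimension across all row groups without touching the others."""
    values: List[float] = []
    for group in groups:
        values.extend(group[col_index])
    return values


rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
groups = to_hybrid_store(rows, rows_per_group=2)
second_dimension = read_column(groups, 1)
```

In this sketch a query that needs only a few dimensions reads only those column slices within each group, while all values of a given record still sit in the same group. That locality trade-off is the usual motivation for hybrid layouts.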
Acknowledgments
This work is supported by the National Natural Science Foundation of China (Nos. 61472169, 61472069, 61502215, 61472089), the Science Research Normal Fund of Liaoning Province Education Department (No. L2015193), the Young Research Foundation of Liaoning University (No. LDQN201438), and the Doctoral Scientific Research Start Foundation of Liaoning Province 2015.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Ding, L., Liu, Y., Song, B., Xin, J. (2016). An Efficient High-Dimensional Big Data Storage Structure Based on US-ELM. In: Cao, J., Mao, K., Wu, J., Lendasse, A. (eds) Proceedings of ELM-2015 Volume 1. Proceedings in Adaptation, Learning and Optimization, vol 6. Springer, Cham. https://doi.org/10.1007/978-3-319-28397-5_38
DOI: https://doi.org/10.1007/978-3-319-28397-5_38
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28396-8
Online ISBN: 978-3-319-28397-5
eBook Packages: Engineering, Engineering (R0)