An Efficient High-Dimensional Big Data Storage Structure Based on US-ELM

  • Conference paper
Proceedings of ELM-2015 Volume 1

Part of the book series: Proceedings in Adaptation, Learning and Optimization (PALO, volume 6)

Abstract

With the rapid development of computer and Internet technologies, the amount of data across all sectors has increased sharply, and large volumes of high-dimensional big data have accumulated, such as network transaction data, user review data, and multimedia data. The storage structure of high-dimensional big data is a critical factor that fundamentally affects processing performance. However, because of the very high dimensionality of such data, existing storage techniques such as row-store and column-store are not well suited to high-dimensional, large-scale data. Therefore, in this paper we present HB-File (High-dimensional Big Data File), an efficient storage structure for high-dimensional big data based on US-ELM. HB-File is a hybrid storage model that combines row-store and column-store. Extensive experiments demonstrate the effectiveness of HB-File for storing high-dimensional big data.
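
The abstract describes HB-File only as a hybrid of row-store and column-store. As a rough illustration of what such a hybrid generally looks like (in the spirit of RCFile-style layouts), the Python sketch below splits records into horizontal row groups and, inside each row group, stores caller-supplied groups of dimensions column by column. The names `build_hybrid_layout`, `read_columns`, `row_group_size`, and `column_groups` are hypothetical. The sketch assumes the column groups are given in advance and does not model how HB-File uses US-ELM, its on-disk format, or any compression.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical types for illustration only; HB-File's real format is not described here.
Row = Sequence[float]
Block = List[List[float]]  # one column group inside one row group, stored column by column


def build_hybrid_layout(
    rows: List[Row], row_group_size: int, column_groups: List[List[int]]
) -> List[List[Block]]:
    """Partition rows horizontally into row groups, then store each column
    group of every row group contiguously in column-major order."""
    if row_group_size <= 0:
        raise ValueError("row_group_size must be positive")
    num_cols = len(rows[0]) if rows else 0
    flat = sorted(c for group in column_groups for c in group)
    if rows and flat != list(range(num_cols)):
        raise ValueError("column_groups must cover every column exactly once")
    layout: List[List[Block]] = []
    for start in range(0, len(rows), row_group_size):
        chunk = rows[start:start + row_group_size]  # row-store level: a horizontal slice
        # Column-store level: each attribute of a group becomes a contiguous list.
        blocks = [[[row[c] for row in chunk] for c in group] for group in column_groups]
        layout.append(blocks)
    return layout


def read_columns(
    layout: List[List[Block]], column_groups: List[List[int]], wanted: List[int]
) -> Tuple[Dict[int, List[float]], List[int]]:
    """Project a subset of columns, reading only the column groups that hold them.
    Returns the projected columns and the sorted indices of the groups touched."""
    # Map each column id to (column-group index, position inside that group).
    where = {c: (g, i) for g, group in enumerate(column_groups) for i, c in enumerate(group)}
    touched = sorted({where[c][0] for c in wanted})
    result: Dict[int, List[float]] = {c: [] for c in wanted}
    for blocks in layout:  # one iteration per row group
        for c in wanted:
            g, i = where[c]
            result[c].extend(blocks[g][i])
    return result, touched
```

With four dimensions split into the groups `[[0, 1], [2, 3]]`, a projection on dimensions 0 and 1 touches only column group 0. Grouping dimensions that are accessed together is the usual motivation for this kind of hybrid, while reconstructing a full row still stays local to a single row group.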

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Nos. 61472169, 61472069, 61502215, 61472089), the Science Research Normal Fund of Liaoning Province Education Department (No. L2015193), the Young Research Foundation of Liaoning University (No. LDQN201438), and the Doctoral Scientific Research Start Foundation of Liaoning Province 2015.

Author information

Correspondence to Linlin Ding.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Ding, L., Liu, Y., Song, B., Xin, J. (2016). An Efficient High-Dimensional Big Data Storage Structure Based on US-ELM. In: Cao, J., Mao, K., Wu, J., Lendasse, A. (eds) Proceedings of ELM-2015 Volume 1. Proceedings in Adaptation, Learning and Optimization, vol 6. Springer, Cham. https://doi.org/10.1007/978-3-319-28397-5_38

  • DOI: https://doi.org/10.1007/978-3-319-28397-5_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-28396-8

  • Online ISBN: 978-3-319-28397-5

  • eBook Packages: Engineering, Engineering (R0)
