An Efficient High-Dimensional Big Data Storage Structure Based on US-ELM

Ding, Linlin; Liu, Yu; Song, Baoyan; Xin, Junchang

doi:10.1007/978-3-319-28397-5_38

Part of the book series: Proceedings in Adaptation, Learning and Optimization (PALO, volume 6)

Abstract

With the rapid development of computer and Internet technologies, the amount of data in all walks of life has increased sharply, and large volumes of high-dimensional big data have accumulated, such as network transaction data, user review data, and multimedia data. The storage structure of high-dimensional big data is a critical factor that fundamentally affects processing performance. However, because of the very high dimensionality of such data, existing storage techniques such as row-store and column-store are not well suited to high-dimensional, large-scale data. In this paper, we therefore present an efficient high-dimensional big data storage structure based on US-ELM, named HB-File (High-dimensional Big Data File), which is a hybrid storage model of row-store and column-store. Through extensive experiments, we show the effectiveness of HB-File for storing high-dimensional big data.
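
To make the idea of a hybrid of row-store and column-store concrete, the following is a minimal Python sketch, not the HB-File format itself, whose internal layout the abstract does not describe. It assumes records are first split into fixed-size row groups and that, inside each row group, dimensions belonging to the same column group are stored column-wise. The names `to_hybrid_layout`, `read_columns`, `row_group_size`, and `column_groups` are hypothetical, and the column groups are supplied by hand here; how US-ELM contributes to the layout in the paper is not stated in the abstract.

```python
from typing import Dict, List, Sequence

Record = Sequence[float]
# One row group maps a column-group name to its columns, each stored as a list of values.
RowGroup = Dict[str, List[List[float]]]


def to_hybrid_layout(
    records: List[Record],
    row_group_size: int,
    column_groups: Dict[str, List[int]],
) -> List[RowGroup]:
    """Split records into row groups (row-store aspect) and store each
    column group column-wise inside every row group (column-store aspect)."""
    layout: List[RowGroup] = []
    for start in range(0, len(records), row_group_size):
        rows = records[start:start + row_group_size]
        group: RowGroup = {}
        for name, cols in column_groups.items():
            # One contiguous value list per column in this column group.
            group[name] = [[row[c] for row in rows] for c in cols]
        layout.append(group)
    return layout


def read_columns(
    layout: List[RowGroup],
    column_groups: Dict[str, List[int]],
    group_name: str,
) -> Dict[int, List[float]]:
    """Read a single column group across all row groups, leaving the
    other column groups untouched."""
    cols = column_groups[group_name]
    out: Dict[int, List[float]] = {c: [] for c in cols}
    for group in layout:
        for c, values in zip(cols, group[group_name]):
            out[c].extend(values)
    return out


# Hypothetical toy data: three records with four dimensions each.
records = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
]
# Hand-picked grouping of dimensions (an assumption for illustration only).
column_groups = {"g0": [0, 2], "g1": [1, 3]}

layout = to_hybrid_layout(records, row_group_size=2, column_groups=column_groups)
result = read_columns(layout, column_groups, "g0")
```

In this sketch, a query that touches only dimensions 0 and 2 reads the `g0` entry of each row group and skips `g1` entirely, which is the general benefit a hybrid layout aims for when records have many dimensions.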

Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. 61472169, 61472069, 61502215, 61472089), the Science Research Normal Fund of Liaoning Province Education Department (No. L2015193), the Young Research Foundation of Liaoning University (No. LDQN201438), and the Doctoral Scientific Research Start Foundation of Liaoning Province 2015.

Author information

Authors and Affiliations

School of Information, Liaoning University, Shenyang, 110036, Liaoning, China
Linlin Ding, Yu Liu & Baoyan Song
College of Information Science & Engineering, Northeastern University, Shenyang, 110819, Liaoning, China
Junchang Xin

Corresponding author

Correspondence to Linlin Ding.

Editor information

Editors and Affiliations

Institute of Information and Contro, Hangzhou Dianzi University, Zhejiang, China
Jiuwen Cao
Nanyang Technological University, Singapore, Singapore
Kezhi Mao
ECE, University of Windsor, Windsor, Ontario, Canada
Jonathan Wu
Department of Mechanical and Industrial Engineering, University of Iowa, Iowa City, Iowa, USA
Amaury Lendasse

About this paper

Cite this paper

Ding, L., Liu, Y., Song, B., Xin, J. (2016). An Efficient High-Dimensional Big Data Storage Structure Based on US-ELM. In: Cao, J., Mao, K., Wu, J., Lendasse, A. (eds) Proceedings of ELM-2015 Volume 1. Proceedings in Adaptation, Learning and Optimization, vol 6. Springer, Cham. https://doi.org/10.1007/978-3-319-28397-5_38

DOI: https://doi.org/10.1007/978-3-319-28397-5_38
Published: 01 January 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28396-8
Online ISBN: 978-3-319-28397-5
eBook Packages: Engineering, Engineering (R0)
