
A Data Structure to Speed-Up Machine Learning Algorithms on Massive Datasets

  • Conference paper
Hybrid Artificial Intelligent Systems (HAIS 2016)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9648))

Abstract

Processing data quickly and efficiently is an important capability in machine learning, especially given the ever-growing volume of stored data. This exponential growth in data size has hampered traditional techniques for data analysis and processing, giving rise to a new set of methodologies under the term Big Data. Many efficient machine learning algorithms have been proposed to address time and main-memory requirements. Nevertheless, processing can still be demanding when the number of features or records is extremely high. The goal of this paper is not to propose new efficient algorithms but a new data structure that can be used by a variety of existing algorithms without modifying their original schemata. Moreover, the proposed data structure enables sparse datasets to be massively reduced, efficiently transforming the data input into a new, compact representation. The results demonstrate that the proposed data structure is highly promising, reducing the amount of storage required and improving query performance.
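The abstract does not detail the proposed structure, so as an illustration only, the following sketch shows one common way a sparse dataset can be massively reduced while keeping per-record access fast: a CSR-style (compressed sparse row) layout that stores only non-zero entries. The class name `SparseDataset` and its methods are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: the paper's actual data structure is not
# described here. This shows a CSR-style layout, a standard way to
# shrink sparse datasets while preserving fast row-level queries.

class SparseDataset:
    """Stores only the non-zero entries of a dense record matrix."""

    def __init__(self, dense_rows):
        self.n_cols = len(dense_rows[0]) if dense_rows else 0
        self.values = []    # non-zero values, row by row
        self.col_idx = []   # column index of each stored value
        self.row_ptr = [0]  # start offset of each row within `values`
        for row in dense_rows:
            for j, v in enumerate(row):
                if v != 0:
                    self.values.append(v)
                    self.col_idx.append(j)
            self.row_ptr.append(len(self.values))

    def stored_entries(self):
        """Number of cells actually kept in memory."""
        return len(self.values)

    def get_row(self, i):
        """Reconstruct record i as a dense list."""
        dense = [0] * self.n_cols
        for k in range(self.row_ptr[i], self.row_ptr[i + 1]):
            dense[self.col_idx[k]] = self.values[k]
        return dense


rows = [
    [0, 0, 3, 0, 0, 0],
    [1, 0, 0, 0, 0, 2],
    [0, 0, 0, 0, 0, 0],
]
ds = SparseDataset(rows)
print(ds.stored_entries())  # 3 stored values instead of 18 dense cells
print(ds.get_row(1))        # [1, 0, 0, 0, 0, 2]
```

Because an existing algorithm only needs `get_row` to see the same dense records as before, such a structure can, in principle, be dropped underneath it without modifying its original schema, which matches the design goal stated in the abstract.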



Notes

  1. Information about the API is available at: http://www.uco.es/grupos/kdis/kdiswiki/SpeedingUpML.

  2. Due to space limitations, a table with the results for each dataset is available at: http://www.uco.es/grupos/kdis/kdiswiki/SpeedingUpML.


Acknowledgments

This research was supported by the Spanish Ministry of Economy and Competitiveness, project TIN-2014-55252-P, and by FEDER funds.

Author information

Correspondence to Sebastián Ventura.


Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Padillo, F., Luna, J.M., Cano, A., Ventura, S. (2016). A Data Structure to Speed-Up Machine Learning Algorithms on Massive Datasets. In: Martínez-Álvarez, F., Troncoso, A., Quintián, H., Corchado, E. (eds) Hybrid Artificial Intelligent Systems. HAIS 2016. Lecture Notes in Computer Science(), vol 9648. Springer, Cham. https://doi.org/10.1007/978-3-319-32034-2_31

  • DOI: https://doi.org/10.1007/978-3-319-32034-2_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-32033-5

  • Online ISBN: 978-3-319-32034-2

  • eBook Packages: Computer Science, Computer Science (R0)
