Skip to main content

Learned Data Structures

  • Chapter
  • First Online:
Recent Trends in Learning From Data

Part of the book series: Studies in Computational Intelligence ((SCI,volume 896))

Abstract

Very recently, the unexpected combination of data structures and machine learning has led to the development of a new area of research, called learned data structures. Their distinguishing trait is the ability to reveal and exploit patterns and trends in the input data for achieving more efficiency in time and space, compared to previously known data structures. The goal of this chapter is to provide the first comprehensive survey of these results and to stimulate further research in this promising area.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 119.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 159.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD 159.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    We assume that universe \(\mathcal {U}\) is a range of reals because of the arithmetic operations required by the models. This works for any kind of keys that can be mapped to reals by preserving their order, such as integers or strings.

  2. 2.

    Private correspondence with the authors clarified that only linear models were used in the hierarchy due to their superior performance.

  3. 3.

    It has to be noted that a B+-tree supports also insertions and deletions, which require a more complex structure that is both slower and more space-inefficient than other static tree-based indexes.

  4. 4.

    The authors suggest organising the buffer “like a hash table”. Even though hashing simplifies buffer modifications, we lose the ability to do predecessor or range queries efficiently.

  5. 5.

    The authors do not discuss the details of this step, but we assume that it is performed in \(O(\log \tilde{m})\) time via a binary search on the \(\tilde{m}\) sorted key-model pairs.

  6. 6.

    We remark that MurmurHash is a known function which can be implemented in a few dozen lines of C code.

  7. 7.

    It has to be noted that for 5K elements a Bloom filter of \({\approx }6\) KB and \(k=7\) hash functions would have sufficed to achieve 1% false positives. Similarly, a Bloom filter of \({\approx }9\) KB and \(k=10\) hash functions would have sufficed to achieve 0.1% false positives.

References

  1. Ao, N., Zhang, F., Wu, D., Stones, D.S., Wang, G., Liu, X., Liu, J., Lin, S.: Efficient parallel lists intersection and index compression algorithms using graphics processing units. Proc. VLDB Endow. 4(8), 470–481 (2011)

    Article  Google Scholar 

  2. Bender, M.A., Farach-Colton, M., Mosteiro, M.A.: Insertion sort is \(o(n \log n)\). Theory Comput. Syst. 39(3), 391–397 (2006)

    Article  MathSciNet  Google Scholar 

  3. Bender, M.A., Hu, H.: An adaptive packed-memory array. ACM Trans. Database Syst. 32(4) (2007)

    Google Scholar 

  4. Bentley, J.L., Yao, A.C.C.: An almost optimal algorithm for unbounded searching. Inf. Process. Lett. 5(3), 82–87 (1976)

    Article  MathSciNet  Google Scholar 

  5. Broder, A., Mitzenmacher, M.: Network applications of Bloom filters: a survey. Internet Math. 1(4), 485–509 (2004)

    Article  MathSciNet  Google Scholar 

  6. Coraddu, A., Oneto, L., Baldi, F., Anguita, D.: Vessels fuel consumption forecast and trim optimisation: a data analytics perspective. Ocean. Eng. 130, 351–370 (2017)

    Article  Google Scholar 

  7. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 3rd edn. MIT Press, Cambridge (2009)

    MATH  Google Scholar 

  8. Cormode, G., Garofalakis, M., Haas, P.J., Jermaine, C.: Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches, vol. 4, No. 1–3, pp. 1–294. Foundations and Trends in Databases (2011)

    Google Scholar 

  9. Cormode, G., Muthukrishnan, S.: Summarizing and mining skewed data streams. In: Proceedings of the International Conference on Data Mining, SDM, pp. 44–55. SIAM (2005)

    Google Scholar 

  10. Dai, Z., Shrivastava, A.: Adaptive learned Bloom filter (Ada-BF): Efficient utilization of the classifier. arXiv:1910.09131 (2019)

  11. Ding, J., Minhas, U.F., Zhang, H., Li, Y., Wang, C., Chandramouli, B., Gehrke, J., Kossmann, D., Lomet, D.: ALEX: An updatable adaptive learned index. arXiv:1905.08898 (2019)

  12. Ferragina, P., Vinciguerra, G.: The PGM-index: a fully dynamic compressed learned index with provable worst-case bounds. PVLDB J. 13(8) (2020). ISSN 2150-8097

    Google Scholar 

  13. Gaede, V., Günther, O.: Multidimensional access methods. ACM Comput. Surv. 30(2), 170–231 (1998)

    Article  Google Scholar 

  14. Galakatos, A., Markovitch, M., Binnig, C., Fonseca, R., Kraska, T.: FITing-Tree: a data-aware index structure. In: Proceedings of the International Conference on Management of Data, SIGMOD, pp. 1189–1206. ACM, New York, NY, USA (2019)

    Google Scholar 

  15. Hadian, A., Heinis, T.: Considerations for handling updates in learned index structures. In: Proceedings of the Second International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, aiDM, pp. 3:1–3:4. ACM, New York, NY, USA (2019)

    Google Scholar 

  16. Hashemi, M., Swersky, K., Smith, J.A., Ayers, G., Litz, H., Chang, J., Kozyrakis, C., Ranganathan, P.: Learning memory access patterns. In: ICML, Proceedings of Machine Learning Research, vol. 80, pp. 1924–1933. PMLR (2018)

    Google Scholar 

  17. Hastie, T., Tibshirani, R., Friedman, J.H.: The elements of statistical learning: data mining, inference, and prediction, 2nd edn., Springer Series in Statistics, Springer (2009)

    Google Scholar 

  18. Hsu, C.Y., Indyk, P., Katabi, D., Vakilian, A.: Learning-based frequency estimation algorithms. In: International Conference on Learning Representations, ICLR (2019)

    Google Scholar 

  19. Idreos, S., Zoumpatianos, K., Chatterjee, S., Qin, W., Wasay, A., Hentschel, B., Kester, M., Dayan, N., Guo, D., Kang, M., Sun, Y.: Learning data structure alchemy. Bull. IEEE Comput. Soc. Tech. Comm. Data Eng. 42(2), 46–57 (2019)

    Google Scholar 

  20. Idreos, S., Zoumpatianos, K., Hentschel, B., Kester, M.S., Guo, D.: The data calculator: Data structure design and cost synthesis from first principles and learned cost models. In: Proceedings of the International Conference on Management of Data, SIGMOD, pp. 535–550. ACM, New York, NY, USA (2018)

    Google Scholar 

  21. Kipf, A., Kipf, T., Radke, B., Leis, V., Boncz, P.A., Kemper, A.: Learned cardinalities: estimating correlated joins with deep learning. In: CIDR. www.cidrdb.org (2019)

  22. Kraska, T., Alizadeh, M., Beutel, A., Chi, E.H., Kristo, A., Leclerc, G., Madden, S., Mao, H., Nathan, V.: SageDB: a learned database system. In: CIDR. www.cidrdb.org (2019)

  23. Kraska, T., Beutel, A., Chi, E.H., Dean, J., Polyzotis, N.: The case for learned index structures. In: Proceedings of the International Conference on Management of Data, SIGMOD, pp. 489–504. ACM, New York, NY, USA (2018)

    Google Scholar 

  24. Li, P., Hua, Y., Zuo, P., Jia, J.: A scalable learned index scheme in storage systems. arXiv:1905.06256 (2019)

  25. Li, X., Li, J., Wang, X.: ASLM: Adaptive single layer model for learned index. In: Li, G., Yang, J., Gama, J., Natwichai, J., Tong, Y. (eds.) Database Systems for Advanced Applications, pp. 80–95. Springer International Publishing, Cham (2019)

    Google Scholar 

  26. Macke, S., Beutel, A., Kraska, T., Sathiamoorthy, M., Cheng, D.Z., Chi, E.H.: Lifting the curse of multidimensional data with learned existence indexes. In: Workshop on ML for Systems at NeurIPS (2018)

    Google Scholar 

  27. Mehta, D.P., Sahni, S. (eds.): Handbook of Data Structures and Applications, 2nd edn. Chapman and Hall/CRC, Boca Raton (2018)

    MATH  Google Scholar 

  28. Mitzenmacher, M.: A model for learned Bloom filters and optimizing by sandwiching. In: 32nd Conference on Neural Information Processing Systems, NeurIPS (2018)

    Google Scholar 

  29. Navarro, G.: Compact data structures: a practical approach. Cambridge University Press, New York (2016)

    Book  Google Scholar 

  30. Oneto, L., Ridella, S., Anguita, D.: Tikhonov, ivanov and morozov regularization for support vector machine learning. Mach. Learn. 103(1), 103–136 (2015)

    Article  MathSciNet  Google Scholar 

  31. O’Rourke, J.: An on-line algorithm for fitting straight lines between data ranges. Commun. ACM 24(9), 574–578 (1981)

    Article  Google Scholar 

  32. Petrov, A.: Algorithms behind modern storage systems. Commun. ACM 61(8), 38–44 (2018)

    Article  Google Scholar 

  33. Qu, W., Wang, X., Li, J., Li, X.: Hybrid indexes by exploring traditional B-tree and linear regression. In: Proceedings of the 16th International Conference on Web Information Systems and Applications, WISA, pp. 601–613. Springer International Publishing, Cham (2019)

    Google Scholar 

  34. Rae, J., Bartunov, S., Lillicrap, T.: Meta-learning neural Bloom filters. In: Proceedings of the 36th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 97, pp. 5271–5280. PMLR, Long Beach, California, USA (2019)

    Google Scholar 

  35. Vahdat, M., Oneto, L., Anguita, D., Funk, M., Rauterberg, M.: A learning analytics approach to correlate the academic achievements of students with interaction data from an educational simulator. In: European Conference on Technology Enhanced Learning (2015)

    Google Scholar 

  36. Vinciguerra, G., Ferragina, P., Miccinesi, M.: Superseding traditional indexes by orchestrating learning and geometry. arXiv:1903.00507 (2019)

  37. Vitter, J.S.: External memory algorithms and data structures: dealing with massive data. ACM Comput. Surv. 33(2), 209–271 (2001)

    Article  Google Scholar 

  38. Wang, H., Fu, X., Xu, J., Lu, H.: Learned index for spatial queries. In: Proceedings of the 20th International Conference on Mobile Data Management, MDM, pp. 569–574. IEEE (2019)

    Google Scholar 

  39. Wu, Y., Yu, J., Tian, Y., Sidle, R., Barber, R.: Designing succinct secondary indexing mechanism by exploiting column correlations. In: Proceedings of the International Conference on Management of Data, SIGMOD, pp. 1223–1240. ACM, New York, NY, USA (2019)

    Google Scholar 

  40. Xiang, W., Zhang, H., Cui, R., Chu, X., Li, K., Zhou, W.: Pavo: A RNN-based learned inverted index, supervised or unsupervised? IEEE Access 7, 293–303 (2019)

    Article  Google Scholar 

  41. Zhang, M., Wang, H., Li, J., Gao, H.: Learned sketches for frequency estimation. Inf. Sci. 507, 365–385 (2020)

    Article  MathSciNet  Google Scholar 

  42. Zhang, Y., Huang, Y.: “Learned” operating systems. SIGOPS Oper. Syst. Rev. 53(1), 40–45 (2019)

    Article  Google Scholar 

Download references

Acknowledgements

Part of this work has been supported by the Italian MIUR PRIN project “Multicriteria Data Structures and Algorithms: from compressed to learned indexes, and beyond” (Prot. 2017WR7SHH), by Regione Toscana (under POR FSE 2014/2020), and by the European Integrated Infrastructure for Social Mining and Big Data Analytics (SoBigData++, Grant Agreement # 871042).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Giorgio Vinciguerra .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Check for updates. Verify currency and authenticity via CrossMark

Cite this chapter

Ferragina, P., Vinciguerra, G. (2020). Learned Data Structures. In: Oneto, L., Navarin, N., Sperduti, A., Anguita, D. (eds) Recent Trends in Learning From Data. Studies in Computational Intelligence, vol 896. Springer, Cham. https://doi.org/10.1007/978-3-030-43883-8_2

Download citation

Publish with us

Policies and ethics