Abstract
In this article, we show how machine learning methods, namely random forests and convolutional neural networks, can be used to predict file lifetimes from their absolute paths with high reliability in an HPC filesystem context. The lifetime of a file is defined here as the time between its creation and the last time it is read. Such results can be applied to the design of smart data placement policies, especially for hierarchical storage systems.
Change history
08 January 2020
In the original version of this LNCS volume, four papers were erroneously released as open access papers. This has been corrected to only two papers – papers 5 and 7.
Notes
- 1. Least Recently Used.
- 2.
- 3. Term Frequency - Inverse Document Frequency: in this case, the frequency of an n-gram in a path multiplied by the logarithm of the inverse of the frequency of this n-gram in the whole corpus of paths.
- 4. eXpose: A Character-Level Convolutional Neural Network with Embeddings for Detecting Malicious URLs, File Paths and Registry Keys, Joshua Saxe and Konstantin Berlin.
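The TF-IDF weighting mentioned in note 3 can be illustrated with a minimal sketch. It uses the standard form of the measure (term frequency multiplied by the logarithm of the inverse document frequency) over character n-grams of paths. The function names, the choice of n = 3, and the example paths are assumptions made for illustration; they are not taken from the paper, which may use a different variant or a library implementation.

```python
import math
from collections import Counter


def char_ngrams(path, n=3):
    """Return the list of overlapping character n-grams in a path."""
    return [path[i:i + n] for i in range(len(path) - n + 1)]


def tfidf_vectors(paths, n=3):
    """Compute one {n-gram: weight} dict per path.

    tf  = occurrences of the n-gram in the path / n-grams in the path
    idf = log(number of paths / number of paths containing the n-gram)
    """
    grams_per_path = [char_ngrams(p, n) for p in paths]

    # Document frequency: in how many paths does each n-gram appear?
    doc_freq = Counter()
    for grams in grams_per_path:
        doc_freq.update(set(grams))

    vectors = []
    for grams in grams_per_path:
        counts = Counter(grams)
        total = len(grams)
        vectors.append({
            g: (c / total) * math.log(len(paths) / doc_freq[g])
            for g, c in counts.items()
        })
    return vectors


# Hypothetical paths, made up for illustration only.
paths = [
    "/scratch/alice/run1/out.h5",
    "/scratch/alice/run2/out.h5",
    "/home/bob/notes.txt",
]
vectors = tfidf_vectors(paths)
```

In this sketch, an n-gram shared by the two `/scratch` paths receives a lower inverse document frequency than one that appears in a single path, and an n-gram present in every path would receive a weight of zero.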
References
scikit-learn. https://scikit-learn.org
TensorFlow. https://www.tensorflow.org
Abeywardana, S.: Deep quantile regression (2018). https://towardsdatascience.com/deep-quantile-regression-c85481548b5a
Conneau, A., Schwenk, H., Barrault, L., Lecun, Y.: Very deep convolutional networks for text classification. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, Valencia, Spain, pp. 1107–1116. Association for Computational Linguistics, April 2017. https://www.aclweb.org/anthology/E17-1104
Leibovici, T.: Robinhood policy engine. https://github.com/cea-hpc/robinhood
Saxe, J., Berlin, K.: eXpose: a character-level convolutional neural network with embeddings for detecting malicious URLs, file paths and registry keys. CoRR abs/1702.08568 (2017). http://arxiv.org/abs/1702.08568
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556 (2014). https://arxiv.org/abs/1409.1556
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Monjalet, F., Leibovici, T. (2019). Predicting File Lifetimes with Machine Learning. In: Weiland, M., Juckeland, G., Alam, S., Jagode, H. (eds) High Performance Computing. ISC High Performance 2019. Lecture Notes in Computer Science, vol 11887. Springer, Cham. https://doi.org/10.1007/978-3-030-34356-9_23
DOI: https://doi.org/10.1007/978-3-030-34356-9_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34355-2
Online ISBN: 978-3-030-34356-9
eBook Packages: Computer Science, Computer Science (R0)