The Journal of Supercomputing, Volume 74, Issue 6, pp 2870–2902

APS: adaptable prefetching scheme to different running environments for concurrent read streams in distributed file systems

  • Sangmin Lee
  • Soon J. Hyun
  • Hong-Yeon Kim
  • Young-Kyun Kim


Distributed file systems (DFSs) are widely used in many areas. A key issue is providing high performance for concurrent read streams (i.e., multiple series of sequential reads issued by concurrent processes). Although many studies have addressed this problem in local file systems (LFSs), little research has examined concurrent read streams in DFSs under different running environments (i.e., different types of storage devices and various network delays). Moreover, most existing DFSs perform sharply worse than an LFS such as EXT4. To achieve high performance for concurrent read streams, this study introduces a populating effect, which keeps subsequent read requests flowing to a storage server, and proposes an adaptable prefetching scheme (APS) that obtains this effect across different running environments. APS thereby resolves all the problems that we identified as dramatically degrading performance in existing DFSs. Evaluation on three types of storage devices and under various network delays shows that our prefetching scheme (1) achieves almost the same performance as an LFS on an individual server and (2) minimizes the performance degradation of random reads.
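To make the idea concrete, the following is a minimal, hypothetical sketch of the kind of per-stream adaptive prefetching the abstract describes: a detector that grows its prefetch window on sequential hits (so subsequent requests keep "populating" the storage server despite network delay) and shrinks it on random accesses. It is an illustration only, not the paper's actual APS algorithm; all class, method, and parameter names here are assumptions.

```python
class StreamPrefetcher:
    """Toy per-stream prefetcher (illustrative; names are hypothetical).

    Tracks one read stream in block units, doubling the prefetch window on
    sequential hits (up to a cap) and halving it on random accesses so that
    random reads are not penalized by over-aggressive prefetching.
    """

    def __init__(self, init_window=4, max_window=64):
        self.next_expected = None    # next block we expect if access is sequential
        self.window = init_window    # current prefetch depth, in blocks
        self.max_window = max_window # cap, e.g. tuned per device/network delay

    def on_read(self, block):
        """Record a read of `block`; return the blocks to prefetch next."""
        if self.next_expected is None:
            prefetch = []            # first access: just start tracking
        elif block == self.next_expected:
            # Sequential hit: widen the window so requests stay queued at
            # the server even under long network round-trip times.
            self.window = min(self.window * 2, self.max_window)
            prefetch = list(range(block + 1, block + 1 + self.window))
        else:
            # Random access: shrink the window back toward zero prefetch.
            self.window = max(self.window // 2, 1)
            prefetch = []
        self.next_expected = block + 1
        return prefetch
```

For example, after reads of blocks 0, 1, 2 the window grows 4 → 8 → 16 and blocks 3–18 are prefetched, while a jump to block 100 yields no prefetch and halves the window back to 8.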


Distributed file system · Concurrent read streams · Data prefetching · Device type · Network delay



This work was supported by Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. R0126-15-1082, Management of Developing ICBMS (IoT, Cloud, Bigdata, Mobile, Security) Core Technologies and Development of Exascale Cloud Storage Technology).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. School of Computing, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
  2. High Performance Computing Research Group, Electronics and Telecommunications Research Institute (ETRI), Daejeon, Korea
