Cluster Computing, Volume 17, Issue 2, pp 593–604

FastStor: improving the performance of a large scale hybrid storage system via caching and prefetching

  • Ziliang Zong
  • Ribel Fares
  • Brian Romoser
  • Joal Wood
Original Paper


Storing enormous amounts of data on hybrid storage systems has become a widely accepted solution for today’s production-level applications seeking to trade off performance against cost. However, how to improve the performance of large-scale storage systems with hybrid components (e.g., solid state disks, hard drives, and tapes) and complicated user behaviors has not been fully explored. In this paper, we conduct an in-depth case study, which we call FastStor, on designing a high-performance hybrid storage system to support one of the world’s largest satellite image distribution systems, operated by the U.S. Geological Survey (USGS) Earth Resources Observation and Science (EROS) center. We demonstrate how to combine conventional caching policies with innovative current-popularity-oriented and user-specific prefetching algorithms to improve the performance of the EROS system. We evaluate the effectiveness of our proposed solution using over 5 million real-world user download requests provided by EROS. Our experimental results show that, using the Least Recently Used (LRU) caching policy alone, we are able to achieve an overall hit ratio of 64% on a 100 TB FTP server farm composed of Solid State Disks (SSDs) and 70% on a 200 TB one. The hit ratio can be further improved to 70% (for 100 TB SSDs) and 76% (for 200 TB SSDs) if intelligent prefetching algorithms are used together with LRU.
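To make the hit ratio metric concrete, the following is a minimal sketch of how an LRU cache with a byte budget could be replayed against a request trace. It is not the FastStor implementation: the class name `LRUCache`, the function `hit_ratio`, the trace format, and the toy data are all assumptions introduced here for illustration, and the sketch omits prefetching entirely.

```python
from collections import OrderedDict


class LRUCache:
    """Size-bounded LRU cache keyed by file id (illustrative, hypothetical)."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.entries = OrderedDict()  # file_id -> size, least recent first

    def access(self, file_id, size_bytes):
        """Return True on a hit; on a miss, admit the file and evict as needed."""
        if file_id in self.entries:
            self.entries.move_to_end(file_id)  # mark as most recently used
            return True
        if size_bytes > self.capacity_bytes:
            return False  # larger than the whole cache, so never admitted
        # Evict least recently used files until the new one fits.
        while self.used_bytes + size_bytes > self.capacity_bytes:
            _, evicted_size = self.entries.popitem(last=False)
            self.used_bytes -= evicted_size
        self.entries[file_id] = size_bytes
        self.used_bytes += size_bytes
        return False


def hit_ratio(trace, capacity_bytes):
    """Replay a list of (file_id, size_bytes) requests; return hits / requests."""
    cache = LRUCache(capacity_bytes)
    hits = sum(cache.access(file_id, size) for file_id, size in trace)
    return hits / len(trace) if trace else 0.0


# Hypothetical toy trace: file ids with sizes in arbitrary units.
toy_trace = [("A", 1), ("B", 1), ("A", 1), ("C", 1), ("B", 1), ("A", 1)]
print(hit_ratio(toy_trace, capacity_bytes=2))
print(hit_ratio(toy_trace, capacity_bytes=3))
```

On a real trace, one would run the same replay at each cache size under study and compare the resulting ratios; the toy trace above only illustrates that a larger budget cannot lower the LRU hit ratio.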


Hybrid storage systems, Big data, Performance, Caching, Prefetching



The authors sincerely appreciate the comments and feedback from the anonymous reviewers. The work reported in this paper is supported by the U.S. National Science Foundation under Grants No. CNS-0915762 and CNS-1212535. We also gratefully acknowledge the support from the U.S. Geological Survey (USGS) Earth Resources Observation and Science (EROS) center.



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Ziliang Zong (1)
  • Ribel Fares (1)
  • Brian Romoser (1)
  • Joal Wood (1)
  1. Department of Computer Science, Texas State University, San Marcos, USA
