Automated Hot_Text and Huge_Pages: An Easy-to-Adopt Solution Towards High Performing Services

  • Zhenyun Zhuang
  • Mark Santaniello
  • Shumin Zhao
  • Bikash Sharma
  • Rajit Kambo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11512)


Performance optimizations of large-scale services can yield significant gains in service efficiency and performance. CPU is one of the most common performance bottlenecks, so improving CPU performance has been the focus of many optimization efforts. In particular, reducing iTLB (instruction TLB) miss rates can greatly improve CPU performance and speed up services.

At Facebook, we have reduced CPU usage by applying a solution that first identifies the hot-text of a (software) binary and then places the binary on huge pages (i.e., memory pages of 2 MB or larger). The solution is wrapped in an automated framework, enabling service owners to adopt it effortlessly. Our framework has been applied to many services at Facebook, and this paper shares our experiences and findings.
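As a rough illustration of why page size matters for iTLB coverage, the sketch below counts how many pages (and therefore how many TLB entries) are needed to map a hot-text region. The 8 MiB hot-text size and the helper name `pages_needed` are hypothetical and chosen only for illustration; they are not figures or code from this paper. The page sizes assume 4 KiB regular pages and the 2 MB huge pages mentioned above.

```python
# Illustrative arithmetic only: the hot-text size below is an assumed value,
# not a measurement reported in the paper.

REGULAR_PAGE = 4 * 1024          # 4 KiB regular page (assumed baseline)
HUGE_PAGE = 2 * 1024 * 1024      # 2 MiB huge page


def pages_needed(region_bytes: int, page_bytes: int) -> int:
    """Return the number of pages (one TLB entry each) needed to map a region."""
    # Integer ceiling division: a partially filled page still needs an entry.
    return (region_bytes + page_bytes - 1) // page_bytes


hot_text_bytes = 8 * 1024 * 1024  # hypothetical 8 MiB of hot text

regular_entries = pages_needed(hot_text_bytes, REGULAR_PAGE)
huge_entries = pages_needed(hot_text_bytes, HUGE_PAGE)

print(f"4 KiB pages: {regular_entries} entries")
print(f"2 MiB pages: {huge_entries} entries")
```

Under these assumptions the same region needs far fewer translations with huge pages, which is the intuition behind placing hot text on them.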


Keywords: Huge pages, Hot-text, Performance, iTLB miss



The solution presented in this paper involves many people's efforts, including new services and feature enhancements to existing services. In particular, we thank Guilherme Ottoni and Bert Maher for working on HFSort, Mark Williams for implementing hugify_self(), Denis Sheahan and Pallab Bhattacharya for substantiating a generic library, and Mirek Klimos for the support that allows automated refreshing of profiling data.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Zhenyun Zhuang (1)
  • Mark Santaniello (1)
  • Shumin Zhao (1)
  • Bikash Sharma (1)
  • Rajit Kambo (1)

  1. Facebook, Inc., Menlo Park, USA
