Improving Latency Tolerance of Network Processors Through Simultaneous Multithreading

  • Bo Liang
  • Hong An
  • Fang Lu
  • Rui Guo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3756)


Existing multithreaded network processor architectures, built from multiple processing engines (PEs), rely on blocked multithreading, which executes instructions from different user-defined threads in the same PE pipeline in an explicit, interleaved fashion. Multiple PEs, each of which is a multithreaded processor core, process several packets in parallel to hide long memory access latency. Most of these designs are optimized for throughput, mainly in the data plane. In future network workloads, the boundary between the data plane and the control plane will blur, so PEs will be required to provide not only wire-speed packet forwarding in the data plane but also increasingly intelligent and complex packet processing functions in the control plane. In this paper, we analyze the potential of simultaneous multithreading (SMT) to tolerate short latencies when used in out-of-order, dynamically scheduled PE cores. We show that 2- to 4-issue SMT provides excellent tolerance of short memory and branch latencies, yielding higher instruction throughput as well as much simpler structures.
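As a rough illustration of the contrast drawn above between blocked multithreading and SMT, the following toy Python model counts the cycles needed to drain a set of threads under each policy. It is not the simulator or methodology used in the paper. The function name `run`, the encoding of each instruction as a stall count, the fixed-priority thread selection, and the assumption that a thread issues at most one instruction per cycle are all simplifications introduced here for illustration.

```python
from typing import List


def run(threads: List[List[int]], issue_width: int, smt: bool) -> int:
    """Return the number of cycles needed to finish every thread.

    Each thread is a list of instructions; an instruction is encoded as the
    number of extra cycles its thread must wait after issuing it
    (0 = no stall, e.g. 3 = a short memory or branch latency).
    Hypothetical toy model: at most one instruction per thread per cycle.
    """
    n = len(threads)
    pc = [0] * n        # index of the next instruction of each thread
    ready_at = [0] * n  # first cycle at which each thread may issue again
    current = 0         # thread that owns the pipeline (blocked mode only)
    cycle = 0

    while any(pc[t] < len(threads[t]) for t in range(n)):
        runnable = [t for t in range(n)
                    if pc[t] < len(threads[t]) and ready_at[t] <= cycle]
        if smt:
            # SMT: fill up to issue_width slots from different ready threads.
            chosen = runnable[:issue_width]
        else:
            # Blocked multithreading: keep the current thread until it
            # stalls or finishes, then switch to another ready thread.
            if current not in runnable and runnable:
                current = runnable[0]
            chosen = [current] if current in runnable else []

        for t in chosen:
            stall = threads[t][pc[t]]
            pc[t] += 1
            ready_at[t] = cycle + 1 + stall
        cycle += 1

    return cycle


if __name__ == "__main__":
    # Four identical threads; the third instruction of each stalls 3 cycles.
    workload = [[0, 0, 3, 0] for _ in range(4)]
    print("blocked:", run(workload, issue_width=1, smt=False))
    for width in (2, 4):
        print(f"SMT {width}-issue:", run(workload, issue_width=width, smt=True))
```

In this model the blocked policy ignores `issue_width`, since only one thread occupies the pipeline at a time, while the SMT policy uses otherwise idle issue slots for other ready threads. With a single thread the two policies behave identically, so any difference comes purely from overlapping stalls across threads.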


Branch Latency; Benchmark Suite; Network Address Translation; Network Processor; Branch Prediction
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Bo Liang (1)
  • Hong An (1, 2)
  • Fang Lu (1)
  • Rui Guo (1)
  1. Department of Computer Science and Technology, University of Science and Technology of China, Hefei, China
  2. Computer Architecture Laboratory, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
