Application-Specific Accelerators for Communications

Abstract

The complexity of computation-intensive digital signal processing algorithms is exceeding the processing capabilities of general-purpose digital signal processors (DSPs). In some of these applications, DSP hardware accelerators have been widely used to off-load a variety of algorithms from the main DSP host, including FFT, FIR/IIR filters, multiple-input multiple-output (MIMO) detectors, and decoders for error-correcting codes (Viterbi, Turbo, LDPC). Given power and cost considerations, simply implementing these computationally complex parallel algorithms on a high-speed general-purpose DSP is not very efficient. However, not all DSP algorithms are appropriate for off-loading to a hardware accelerator. First, the algorithms should have data-parallel computations and repeated operations that are amenable to hardware implementation. Second, they should have a deterministic dataflow graph that maps to parallel datapaths. The accelerators that we consider are mostly coarse-grained, which lets them handle streaming data transfer well and achieve both high performance and low power. In this chapter, we focus on some of the basic and advanced digital signal processing algorithms for communications and cover major examples of DSP accelerators for communications.

Acknowledgements

The authors at Rice University would like to thank Nokia, Nokia Siemens Networks (NSN), Xilinx, and the US National Science Foundation (under grants CCF-0541363, CNS-0551692, CNS-0619767, EECS-0925942, and CNS-0923479) for their support of this work.

Corresponding author

Correspondence to Yang Sun.

Copyright information

© 2013 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Sun, Y., Amiri, K., Brogioli, M., Cavallaro, J.R. (2013). Application-Specific Accelerators for Communications. In: Bhattacharyya, S., Deprettere, E., Leupers, R., Takala, J. (eds) Handbook of Signal Processing Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6859-2_23

  • DOI: https://doi.org/10.1007/978-1-4614-6859-2_23

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-6858-5

  • Online ISBN: 978-1-4614-6859-2

  • eBook Packages: Engineering, Engineering (R0)
