Abstract
For computation-intensive digital signal processing algorithms, complexity is exceeding the processing capabilities of general-purpose digital signal processors (DSPs). In some of these applications, DSP hardware accelerators have been widely used to off-load a variety of algorithms from the main DSP host, including FFT, FIR/IIR filters, multiple-input multiple-output (MIMO) detectors, and error-correction code decoders (Viterbi, turbo, LDPC). Given power and cost considerations, simply implementing these computationally complex parallel algorithms on a high-speed general-purpose DSP is not very efficient. However, not all DSP algorithms are appropriate for off-loading to a hardware accelerator. First, the algorithms should have data-parallel computations and repeated operations that are amenable to hardware implementation. Second, they should have a deterministic dataflow graph that maps to parallel datapaths. The accelerators that we consider are mostly coarse-grained, to better handle streaming data transfer and achieve both high performance and low power. In this chapter, we focus on some of the basic and advanced digital signal processing algorithms for communications and cover major examples of DSP accelerators for communications.
Acknowledgements
The authors at Rice University would like to thank Nokia, Nokia Siemens Networks (NSN), Xilinx, and the US National Science Foundation (under grants CCF-0541363, CNS-0551692, CNS-0619767, EECS-0925942 and CNS-0923479) for their support of this work.
Copyright information
© 2013 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Sun, Y., Amiri, K., Brogioli, M., Cavallaro, J.R. (2013). Application-Specific Accelerators for Communications. In: Bhattacharyya, S., Deprettere, E., Leupers, R., Takala, J. (eds) Handbook of Signal Processing Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6859-2_23
DOI: https://doi.org/10.1007/978-1-4614-6859-2_23
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-6858-5
Online ISBN: 978-1-4614-6859-2
eBook Packages: Engineering, Engineering (R0)