Implementation on Parallel Architectures

  • Richard Tolimieri
  • Myoung An
  • Chao Lu
Part of the Signal Processing and Digital Filtering book series (SIGNAL PROCESS)


In this chapter, we consider issues that arise in the parallel implementation of several multidimensional DFT (MDFT) algorithms on a broadcast-mode multiprocessor machine. Such machines typically consist of a collection of homogeneous processing elements (nodes) and an interconnection network of regular topology for interprocessor communication. The node processors are connected externally to a host by a single I/O channel, through which all data loads and unloads are carried out (see Figure 11.1).


Keywords: Discrete Fourier Transform; Processing Element; Hybrid Algorithm; Parallel Implementation; Parallel Architecture





Copyright information

© Springer Science+Business Media New York 1997

Authors and Affiliations

  • Richard Tolimieri (1)
  • Myoung An (2)
  • Chao Lu (3)

  1. Department of Electrical Engineering, City College of CUNY, New York, USA
  2. A.J. Devaney Associates, Allston, USA
  3. Department of Computer and Information Sciences, Towson State University, Towson, USA