
Heterogeneous Computing in Economics: A Simplified Approach

Published in Computational Economics.

Abstract

This paper shows the potential of heterogeneous computing in solving dynamic equilibrium models in economics. We illustrate the power and simplicity of C++ Accelerated Massive Parallelism (C++ AMP), recently introduced by Microsoft. Starting from the same exercise as Aldrich et al. (J Econ Dyn Control 35:386–393, 2011), we document a speed gain together with a simplified programming style that naturally enables parallelization.


Notes

  1. “A computational unit could be a general-purpose processor (GPP) (including but not limited to a multi-core central processing unit (CPU)), a special-purpose processor (i.e. a digital signal processor (DSP) or graphics processing unit (GPU)), a co-processor, or custom acceleration logic (an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA)). In general, a heterogeneous computing platform consists of processors with different instruction set architectures (ISAs)” (Source: http://en.wikipedia.org/wiki/Heterogeneous_computing).

  2. See http://www.gregcons.com/CppAmp/.

  3. While Thrust, “a parallel algorithms library which resembles the C++ Standard Template Library (STL),” improves upon the low-level approach of C for CUDA, it still suffers from the same vendor lock-in.

  4. For instance, a programmer using C for CUDA has to remember to deallocate the previously allocated memory—failure to do so may result in memory leaks; in contrast, C++ AMP follows the common C++ approach of relying on the Resource Acquisition Is Initialization (RAII) technique for automatic resource management, see Stroustrup (2000, Sect. 14.4).

  5. http://blogs.msdn.com/b/nativeconcurrency/archive/2012/04/11/c-amp-for-the-cuda-programmer.aspx.

  6. Shared memory access ensures that upon termination of this step each processor can read \(V^{i+1}\).

  7. The source code for our application is available from the authors upon request and will be made available under an open-source license.

  8. In a previous version of the paper we also find that the single-precision CUDA program, in this particular case, is slightly faster than the single-precision C++ AMP program.

  9. In a previous version of the paper, working in single precision, we also find that the C++ AMP program in this case is more than five times faster than the CUDA one.

  10. For more details, see http://blogs.msdn.com/b/nativeconcurrency/archive/2012/02/07/double-precision-support-in-c-amp.aspx.

References

  • Aldrich, E. M., Fernández-Villaverde, J., & Gallant, A. R. (2011). Tapping the supercomputer under your desk: Solving dynamic equilibrium models with graphics processors. Journal of Economic Dynamics and Control, 35, 386–393.

  • Boyd, C. (2008). Data-parallel computing. Queue, 6, 30–39.

  • Creel, M. (2005). User-friendly parallel computations with econometric examples. Computational Economics, 26, 107–128.

  • Creel, M., & Goffe, W. L. (2008). Multi-core CPUs, clusters, and grid computing: A tutorial. Computational Economics, 32, 353–382.

  • Flynn, M. (1972). Some computer organizations and their effectiveness. IEEE Transactions on Computers, 21, 948–960.

  • Heer, B., & Maussner, A. (2005). Dynamic general equilibrium modelling: Computational methods and applications. Berlin: Springer.

  • ISO. (2011). ISO/IEC 14882:2011 Information technology—Programming languages—C++. Geneva: International Organization for Standardization.

  • Morozov, S., & Mathur, S. (2012). Massively parallel computation using graphics processors with application to optimal experimentation in dynamic control. Computational Economics, 21, 151–182.

  • Stroustrup, B. (2000). The C++ programming language (3rd ed.). Boston, MA: Addison-Wesley Longman.

  • Tauchen, G. (1986). Finite state Markov-chain approximations to univariate and vector autoregressions. Economics Letters, 20, 177–181.


Acknowledgments

We are grateful to an anonymous Referee and the Editor for their helpful comments and suggestions. We also thank the participants at the Norges Bank research seminar held in Oslo and at the CFE’12 Conference held in Oviedo. We acknowledge financial support from the Center for Research in Econometric Analysis of Time Series (CREATES), funded by the Danish National Research Foundation.

Author information

Correspondence to Stefano Grassi.

Appendix: The C++ AMP Programming Style

To show the similarities and differences between standard C++ (ISO 2011) and C++ AMP, we provide as an example the code to obtain a vector containing the element-by-element squares of the numbers in a given input vector (summing the elements of the output vector would give the sum of squares, often useful in econometric applications). Let us take the vector \(Vector = \{1, 2, 3, 4, 5\}\) and calculate the corresponding vector of squares \(Squares = \{1, 4, 9, 16, 25\}\). Listing 3 reports the C++ and the C++ AMP code. We define three vectors: hostVector is the input vector, while hostSquares_usingCPU and hostSquares_usingGPU are the output vectors (both stored in the host’s memory), obtained using the host (CPU) and the device (GPU), respectively. The functions hostSquares and deviceSquares perform the squaring on the host and the device, respectively. Finally, the results are compared.

Note that from the point of view of the client code (here: function main), the interface (and, consequently, the use) of hostSquares is identical to that of deviceSquares. This illustrates the fact that C++ AMP can be used to incrementally introduce parallelism to an existing C++ code base—whenever necessary and without breaking changes.

The only differences are in the internal implementation details of both functions—while hostSquares uses standard C++ constructs, deviceSquares uses parallel constructs from the concurrency namespace (made available via the inclusion of the amp.h header).

Note that data-parallel applications are an especially good fit for GPU computing. In terms of Flynn’s taxonomy (Flynn 1972), this can be related to Single Instruction, Multiple Data (SIMD)—or, more precisely, to its more general category, SPMD (single program, multiple data). In terms of our example, the single program (here: multiply the values in the input vector by themselves, storing the squared values thus obtained in the output vector) is applied to multiple data (the elements of the input vector), where the lack of data dependence between the distinct elements of the input vector allows the work to be spread across multiple threads.

Listing 3: Standard C++ and C++ AMP implementations of the squaring example.

Cite this article

Dziubinski, M.P., Grassi, S. Heterogeneous Computing in Economics: A Simplified Approach. Comput Econ 43, 485–495 (2014). https://doi.org/10.1007/s10614-013-9362-2
