Abstract
Hardware and parallel implementations can substantially speed up machine learning algorithms and thereby broaden their range of applications. In this chapter, we first introduce various circuit realizations of popular neural network learning methods. We then introduce their parallel implementations on graphics processing units (GPUs), systolic arrays of processors, and parallel computers.
Copyright information
© 2019 Springer-Verlag London Ltd., part of Springer Nature
About this chapter
Cite this chapter
Du, KL., Swamy, M.N.S. (2019). Neural Network Circuits and Parallel Implementations. In: Neural Networks and Statistical Learning. Springer, London. https://doi.org/10.1007/978-1-4471-7452-3_28
DOI: https://doi.org/10.1007/978-1-4471-7452-3_28
Publisher Name: Springer, London
Print ISBN: 978-1-4471-7451-6
Online ISBN: 978-1-4471-7452-3
eBook Packages: Mathematics and Statistics (R0)