Hardware-Assisted Context Management for Accelerator Virtualization: A Case Study with RSA

Gao, Ying; Sherwood, Timothy

doi:10.1007/978-3-319-30695-7_6

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9637)

Included in the following conference series:

International Conference on Architecture of Computing Systems

Abstract

The advantages of virtualization, including the ability to migrate, schedule, and manage software processes, continue to drive the demand for hardware and software support. However, the packaging of software state required by virtualization is in direct conflict with the trend toward accelerator-rich architectures where state is distributed between the processor and a set of heterogeneous devices – a problem that is particularly acute in the mobile SoC market. Virtualizing such systems requires that the VMM explicitly manage the internal state of all of the accelerators over which a process’s computation may be spread. Public-key crypto engines are particularly problematic because of both the sensitivity of the information that they carry and the long compute times required to complete a single task.

In this paper, we examine a set of hardware design approaches to public-key crypto accelerator virtualization and study the trade-off between sharing granularity and management overhead in time and space. Based on observations made during the design of several such systems, we propose a hybrid local-remote scheduling approach that promotes more intelligent decisions during hardware context switches and enables quick and safe state packaging. We find that performance can vary significantly among the examined approaches, and that our new design, with explicit accelerator support for state management and a modicum of scheduling flexibility, can allow highly contended resources to be efficiently shared with only moderate increases in area and power consumption.
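
To make the notion of "state packaging" at a context switch concrete, the following is a minimal software sketch and not the authors' hardware design. It assumes a plain right-to-left square-and-multiply modular exponentiation (the core RSA operation) that can be paused after a fixed number of exponent bits, saved, scrubbed, and later resumed. The names `ModExpContext`, `start`, `run`, `scrub`, and `interleave` are hypothetical and chosen only for illustration; the round-robin policy stands in for whatever scheduler a real system would use.

```python
from dataclasses import dataclass, replace


@dataclass
class ModExpContext:
    """Hypothetical packaged state of an in-flight modular exponentiation."""
    acc: int   # running result
    base: int  # current repeated square of the base
    exp: int   # exponent bits not yet consumed
    mod: int   # modulus


def start(base: int, exp: int, mod: int) -> ModExpContext:
    """Create the initial context for computing base**exp % mod."""
    return ModExpContext(acc=1 % mod, base=base % mod, exp=exp, mod=mod)


def run(ctx: ModExpContext, max_steps: int) -> bool:
    """Consume at most max_steps exponent bits; return True when finished."""
    steps = 0
    while ctx.exp > 0 and steps < max_steps:
        if ctx.exp & 1:
            ctx.acc = (ctx.acc * ctx.base) % ctx.mod
        ctx.base = (ctx.base * ctx.base) % ctx.mod
        ctx.exp >>= 1
        steps += 1
    return ctx.exp == 0


def scrub(ctx: ModExpContext) -> None:
    """Clear the engine registers so the next job cannot read old state."""
    ctx.acc = ctx.base = ctx.exp = ctx.mod = 0


def interleave(jobs, quantum: int):
    """Share one simulated engine among jobs, quantum exponent bits per turn."""
    saved = [start(*job) for job in jobs]
    results = [None] * len(jobs)
    active = list(range(len(jobs)))
    while active:
        for i in list(active):
            engine = replace(saved[i])      # restore packaged state
            if run(engine, quantum):
                results[i] = engine.acc     # job finished
                active.remove(i)
            else:
                saved[i] = replace(engine)  # package state for later
            scrub(engine)                   # wipe before the next job
    return results


if __name__ == "__main__":
    print(interleave([(4, 13, 497), (7, 560, 561)], quantum=3))
```

The sketch shows the two costs the abstract names: a finer `quantum` means more save and restore operations (time overhead), and each paused job needs its own saved context (space overhead). The scrub step reflects the sensitivity of the key material held in the engine.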

Author information

Authors and Affiliations

University of California, Santa Barbara, CA, 93106, USA
Ying Gao & Timothy Sherwood

Corresponding author

Correspondence to Ying Gao.

Editor information

Editors and Affiliations

Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany
Frank Hannig
Faculty of Engineering (FEUP), University of Porto, Porto, Portugal
João M. P. Cardoso
Universität zu Lübeck, Lübeck, Germany
Thilo Pionteck
Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany
Dietmar Fey
Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany
Wolfgang Schröder-Preikschat
Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany
Jürgen Teich

About this paper

Cite this paper

Gao, Y., Sherwood, T. (2016). Hardware-Assisted Context Management for Accelerator Virtualization: A Case Study with RSA. In: Hannig, F., Cardoso, J.M.P., Pionteck, T., Fey, D., Schröder-Preikschat, W., Teich, J. (eds) Architecture of Computing Systems – ARCS 2016. ARCS 2016. Lecture Notes in Computer Science, vol 9637. Springer, Cham. https://doi.org/10.1007/978-3-319-30695-7_6

DOI: https://doi.org/10.1007/978-3-319-30695-7_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30694-0
Online ISBN: 978-3-319-30695-7
eBook Packages: Computer Science; Computer Science (R0)
