Abstract
The input/output memory management unit (IOMMU) was recently introduced into mainstream computer architecture when both Intel and AMD added IOMMUs to their chipsets. An IOMMU provides memory protection from I/O devices by enabling system software to control which areas of physical memory an I/O device may access. However, this protection adds direct memory access (DMA) overhead, because every address a device uses for DMA must be resolved and validated.
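To make the resolve-and-validate step concrete, here is a minimal Python sketch of what an IOMMU does on each DMA access. It is a toy model under stated assumptions: 4 KiB pages, a flat dictionary standing in for the real multi-level I/O page table, and hypothetical names (`ToyIommu`, `IoPte`, `IommuFault`) that do not correspond to the Intel VT-d or AMD IOMMU data structures.

```python
from dataclasses import dataclass

PAGE_SHIFT = 12                  # assumption: 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


@dataclass(frozen=True)
class IoPte:
    """Hypothetical I/O page-table entry: target frame plus access rights."""
    phys_page: int
    readable: bool
    writable: bool


class IommuFault(Exception):
    """Raised when a device DMA targets an unmapped or protected address."""


class ToyIommu:
    def __init__(self):
        # Flat dict (IOVA page -> IoPte) standing in for a multi-level table.
        self.page_table = {}

    def map(self, iova_page, phys_page, readable=True, writable=True):
        # System software grants the device access to one physical page.
        self.page_table[iova_page] = IoPte(phys_page, readable, writable)

    def unmap(self, iova_page):
        # Revoking the mapping makes further DMA to this page fault.
        self.page_table.pop(iova_page, None)

    def translate(self, iova, is_write):
        # Address resolution: look up the entry for the IOVA's page.
        pte = self.page_table.get(iova >> PAGE_SHIFT)
        if pte is None:
            raise IommuFault(f"unmapped IOVA {iova:#x}")
        # Validation: check the requested access against the entry's rights.
        if (is_write and not pte.writable) or (not is_write and not pte.readable):
            raise IommuFault(f"permission denied for IOVA {iova:#x}")
        # Keep the page offset, replace the page number.
        return (pte.phys_page << PAGE_SHIFT) | (iova & PAGE_MASK)


iommu = ToyIommu()
iommu.map(0x10, 0x8A3, writable=False)  # device may only read this page
print(hex(iommu.translate(0x10123, is_write=False)))  # 0x8a3123
try:
    iommu.translate(0x10123, is_write=True)
except IommuFault as err:
    print("blocked:", err)  # blocked: permission denied for IOVA 0x10123
```

Every call to `translate` in this model performs a page-table lookup; a real IOMMU walks several table levels in memory, which is the per-DMA cost that an IOTLB is meant to hide.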
IOMMUs include an input/output translation lookaside buffer (IOTLB) to speed up address resolution, but every IOTLB miss still substantially increases DMA latency and degrades the performance of DMA-intensive workloads. In this paper we first demonstrate the potential negative impact of IOTLB misses on workload performance. We then propose both system software and hardware enhancements to reduce the IOTLB miss rate and accelerate address resolution. These enhancements can reduce the IOTLB miss rate by over 60% for common I/O-intensive workloads.
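How often the IOTLB misses depends heavily on the device's access pattern. The toy simulation below assumes a small fully associative IOTLB with LRU replacement and a device that cycles through a ring of eight single-page DMA buffers, and counts hits and misses. The class name, sizes, and the `walk` callback are illustrative assumptions; they are not the geometry or replacement policy of any real IOTLB, and not the enhancements proposed in the paper.

```python
from collections import OrderedDict


class ToyIotlb:
    """Hypothetical fully associative IOTLB with LRU replacement."""

    def __init__(self, entries):
        self.entries = entries
        self.cache = OrderedDict()  # IOVA page -> physical page, LRU first
        self.hits = 0
        self.misses = 0

    def lookup(self, iova_page, walk):
        if iova_page in self.cache:
            self.cache.move_to_end(iova_page)  # mark as most recently used
            self.hits += 1
            return self.cache[iova_page]
        self.misses += 1
        phys_page = walk(iova_page)            # slow path: page-table walk
        self.cache[iova_page] = phys_page
        if len(self.cache) > self.entries:
            self.cache.popitem(last=False)     # evict least recently used
        return phys_page

    def invalidate(self, iova_page):
        # Unmapping a page must also drop its cached translation.
        self.cache.pop(iova_page, None)

    @property
    def miss_rate(self):
        total = self.hits + self.misses
        return self.misses / total if total else 0.0


def replay(trace, entries):
    tlb = ToyIotlb(entries)
    for page in trace:
        # Hypothetical page-table walk: any fixed mapping works for counting.
        tlb.lookup(page, walk=lambda p: p + 0x1000)
    return tlb.miss_rate


ring = list(range(8)) * 10  # device cycles through 8 buffer pages, 10 laps
print(replay(ring, entries=4))  # 1.0
print(replay(ring, entries=8))  # 0.1
```

With four entries, the cyclic ring access evicts each page just before it is needed again, so every lookup misses; with eight entries, only the eight cold misses of the first lap remain. This is the well-known weakness of LRU under cyclic access, and it shows why ring-based DMA workloads can be sensitive to IOTLB capacity and mapping strategy.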
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Amit, N., Ben-Yehuda, M., Yassour, BA. (2011). IOMMU: Strategies for Mitigating the IOTLB Bottleneck. In: Varbanescu, A.L., Molnos, A., van Nieuwpoort, R. (eds) Computer Architecture. ISCA 2010. Lecture Notes in Computer Science, vol 6161. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24322-6_22
DOI: https://doi.org/10.1007/978-3-642-24322-6_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24321-9
Online ISBN: 978-3-642-24322-6
eBook Packages: Computer Science, Computer Science (R0)