path: root/include/hw/i386/intel_iommu.h
Age | Commit message | Author | Files | Lines
2025-01-15 | tests/qtest: Add intel-iommu test | Zhenzhong Duan | 1 | -0/+1
Add the framework to test the intel-iommu device. Currently it only tests cap/ecap bit correctness when x-flts=on in scalable mode, as well as cap/ecap bit consistency before and after system reset. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Acked-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20241212083757.605022-21-zhenzhong.duan@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
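A minimal qtest sketch of the kind of check this framework enables (this is not the actual tests/qtest code; the DMAR MMIO base 0xfed90000 and the CAP/ECAP offsets 0x08/0x10 are taken from the Q35 layout and the VT-d spec and are assumptions here):

    #include "qemu/osdep.h"
    #include "libqtest.h"

    /* Read CAP/ECAP through the DMAR MMIO window and sanity-check them. */
    static void test_intel_iommu_cap_ecap(void)
    {
        QTestState *qts = qtest_init("-machine q35 -device intel-iommu");
        uint64_t cap  = qtest_readq(qts, 0xfed90000 + 0x08);  /* CAP_REG  */
        uint64_t ecap = qtest_readq(qts, 0xfed90000 + 0x10);  /* ECAP_REG */

        g_assert_cmpuint(cap, !=, 0);
        g_assert_cmpuint(ecap, !=, 0);
        qtest_quit(qts);
    }

    int main(int argc, char **argv)
    {
        g_test_init(&argc, &argv, NULL);
        qtest_add_func("/q35/intel-iommu/cap-ecap", test_intel_iommu_cap_ecap);
        return g_test_run();
    }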
2025-01-15 | intel_iommu: Introduce a property to control FS1GP cap bit setting | Zhenzhong Duan | 1 | -0/+1
This gives the user flexibility to turn off FS1GP for debugging purposes. It is also useful for the future nesting feature. When the host IOMMU doesn't support FS1GP but the vIOMMU does, the nested page table on the host side works after turning FS1GP off in the vIOMMU. This property has no effect when the vIOMMU is in legacy mode or x-flts=off in scalable mode. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20241212083757.605022-20-zhenzhong.duan@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
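The shape of such a property, as a sketch only (the field name fs1gp, the property name, and the VTD_CAP_FS1GP bit name are illustrative, not necessarily what the patch uses):

    /* Advertise first-stage 1GB page support only when the user left it on. */
    static Property vtd_properties[] = {
        DEFINE_PROP_BOOL("fs1gp", IntelIOMMUState, fs1gp, true),
        DEFINE_PROP_END_OF_LIST(),
    };

    static void vtd_cap_init(IntelIOMMUState *s)
    {
        if (s->flts && s->fs1gp) {
            s->cap |= VTD_CAP_FS1GP;   /* assumed name for the FS1GP cap bit */
        }
    }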
2025-01-15 | intel_iommu: Set default aw_bits to 48 starting from QEMU 9.2 | Zhenzhong Duan | 1 | -1/+1
According to VTD spec, stage-1 page table could support 4-level and 5-level paging. However, 5-level paging translation emulation is unsupported yet. That means the only supported value for aw_bits is 48. So default aw_bits to 48 when stage-1 translation is turned on. For legacy and scalable modes, 48 is the default choice for modern OS when both 48 and 39 are supported. So it makes sense to set default to 48 for these two modes too starting from QEMU 9.2. Use pc_compat_9_1 to handle the compatibility for machines before 9.2. Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Clément Mathieu--Drif<clement.mathieu--drif@eviden.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20241212083757.605022-17-zhenzhong.duan@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
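The compat mechanism this relies on looks roughly like the following (a sketch; the exact entry lives in the pc compat tables and the property name "aw-bits" is an assumption):

    /* Machines older than 9.2 keep the previous default of 39 bits. */
    GlobalProperty pc_compat_9_1[] = {
        { "intel-iommu", "aw-bits", "39" },
        /* ... other 9.1 compat properties ... */
    };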
2025-01-15 | intel_iommu: Flush stage-1 cache in iotlb invalidation | Zhenzhong Duan | 1 | -0/+1
According to the spec, for Page-Selective-within-Domain Invalidation (11b): 1. IOTLB entries caching second-stage mappings (PGTT=010b) or pass-through (PGTT=100b) mappings associated with the specified domain-id and the input-address range are invalidated. 2. IOTLB entries caching first-stage (PGTT=001b) or nested (PGTT=011b) mappings associated with the specified domain-id are invalidated. So per the spec definition, Page-Selective-within-Domain Invalidation needs to flush cached first-stage and nested IOTLB entries as well. We don't support nesting yet, and pass-through mappings are never cached, so the only entries in the iotlb cache are first-stage and second-stage mappings. Add a tag pgtt in VTDIOTLBEntry to mark the PGTT type of the mapping and invalidate entries based on the PGTT type. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Message-Id: <20241212083757.605022-11-zhenzhong.duan@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
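A sketch of the direction of the change (field and constant names are illustrative): each cached entry remembers which PGTT produced it, and the page-selective flush consults that tag.

    typedef struct VTDIOTLBEntry {
        uint64_t gfn;
        uint16_t domain_id;
        uint64_t pte;
        uint64_t mask;
        uint8_t  access_flags;
        uint8_t  pgtt;                      /* PGTT of the cached mapping */
    } VTDIOTLBEntry;

    /* Does a Page-Selective-within-Domain Invalidation hit this entry? */
    static bool vtd_psi_hits(VTDIOTLBEntry *e, uint16_t did,
                             uint64_t addr, uint8_t am)
    {
        if (e->domain_id != did) {
            return false;
        }
        /* First-stage entries are flushed domain-wide; second-stage entries
         * only when they fall inside the invalidated address range. */
        if (e->pgtt == VTD_SM_PASID_ENTRY_FLT) {
            return true;
        }
        return ((e->gfn ^ (addr >> VTD_PAGE_SHIFT_4K)) &
                ~((1ULL << am) - 1)) == 0;
    }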
2025-01-15 | intel_iommu: Rename slpte to pte | Yi Liu | 1 | -1/+1
Because we will support both FST (a.k.a. FLT) and SST (a.k.a. SLT) translation, rename variables and functions from slpte to pte whenever possible. Those that are SST-only are renamed with an sl_ prefix. Signed-off-by: Yi Liu <yi.l.liu@intel.com> Co-developed-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com> Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Message-Id: <20241212083757.605022-6-zhenzhong.duan@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-01-15 | intel_iommu: Add a placeholder variable for scalable mode stage-1 translation | Zhenzhong Duan | 1 | -0/+1
Add a new element flts in IntelIOMMUState to mark stage-1 translation support in scalable mode; this element will eventually be exposed as the intel_iommu property x-flts. For now, it's only a placeholder, used for the address width compatibility check and to block host device passthrough until nesting is supported. Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Message-Id: <20241212083757.605022-4-zhenzhong.duan@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-11-04 | intel_iommu: Introduce property "stale-tm" to control Transient Mapping (TM) field | Zhenzhong Duan | 1 | -0/+3
The VT-d spec removed the Transient Mapping (TM) field from second-level page tables and has treated the field as Reserved(0) since revision 3.2. Changing the field to reserved(0) would break backward compatibility, so introduce a property "stale-tm" to allow the user to control the setting. Use pc_compat_9_1 to handle the compatibility for machines before 9.2, which allowed the guest to set the field. Starting from 9.2, this field is reserved(0) by default to match the spec. Of course, the user can force it on the command line. This doesn't impact the function of the vIOMMU as there was no logic to emulate Transient Mapping. Suggested-by: Yi Liu <yi.l.liu@intel.com> Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Message-Id: <20241028022514.806657-1-zhenzhong.duan@intel.com> Reviewed-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-06-24 | intel_iommu: Implement [set|unset]_iommu_device() callbacks | Yi Liu | 1 | -0/+2
Implement the [set|unset]_iommu_device() callbacks in the Intel vIOMMU. In the set call, we take a reference of the HostIOMMUDevice and store it in a hash table indexed by PCI BDF. Note this BDF index is the device's real BDF, not the aliased one, which differs from the index of VTDAddressSpace. There can be multiple assigned devices under the same virtual iommu group sharing the same VTDAddressSpace, but each has its own HostIOMMUDevice. Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
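A sketch of the set path with illustrative names (the hash table, key struct and error message are assumptions, not the literal patch):

    struct vtd_hiod_key {
        PCIBus *bus;
        uint8_t devfn;        /* the device's real BDF, not the alias */
    };

    static bool vtd_dev_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
                                         HostIOMMUDevice *hiod, Error **errp)
    {
        IntelIOMMUState *s = opaque;
        struct vtd_hiod_key key = { .bus = bus, .devfn = devfn };

        if (g_hash_table_lookup(s->vtd_host_iommu_dev, &key)) {
            error_setg(errp, "Host IOMMU device already attached to this BDF");
            return false;
        }
        object_ref(hiod);                       /* take a reference */
        g_hash_table_insert(s->vtd_host_iommu_dev,
                            g_memdup2(&key, sizeof(key)), hiod);
        return true;
    }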
2023-08-03 | hw/i386/intel_iommu: Fix endianness problems related to VTD_IR_TableEntry | Thomas Huth | 1 | -24/+26
The code already tries to do some endianness handling here, but currently fails badly: - While it already swaps the data when logging errors / tracing, it fails to byteswap the value before e.g. accessing entry->irte.present - entry->irte.source_id is swapped with le32_to_cpu(), though this is a 16-bit value - The whole union is apparently supposed to be swapped via the 64-bit data[2] array, but the struct is a mixture between 32 bit values (the first 8 bytes) and 64 bit values (the second 8 bytes), so this cannot work as expected. Fix it by converting the struct to two proper 64-bit bitfields, and by swapping the values only once for everybody right after reading the data from memory. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20230802135723.178083-3-thuth@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>
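The shape of the fix, sketched with the little-endian field order (the real header reverses the bitfield order for big-endian hosts under an #if; field names are close to, but not guaranteed to match, the actual definition):

    typedef union VTD_IR_TableEntry {
        struct {
            uint64_t present:1, fault_disable:1, dest_mode:1, redir_hint:1,
                     trigger_mode:1, delivery_mode:3, avail:4, reserved_0:3,
                     irte_mode:1, vector:8, reserved_1:8, dest_id:32;
            uint64_t source_id:16, sid_q:2, sid_vtype:2, reserved_2:44;
        } irte;
        uint64_t data[2];       /* the two 64-bit words as read from memory */
    } VTD_IR_TableEntry;

    /* Swap exactly once, right after reading the entry from guest memory;
     * all later field accesses then need no per-field conversion. */
    entry->data[0] = le64_to_cpu(entry->data[0]);
    entry->data[1] = le64_to_cpu(entry->data[1]);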
2023-01-27 | intel-iommu: Document iova_tree | Peter Xu | 1 | -1/+37
It seems not super clear on when iova_tree is used, and why. Add a rich comment above iova_tree to track why we needed the iova_tree, and when we need it. Also comment for the map/unmap messages, on how they're used and implications (e.g. unmap can be larger than the mapped ranges). Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20230109193727.1360190-1-peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-11-07 | intel-iommu: PASID support | Jason Wang | 1 | -1/+6
This patch introduces ECAP_PASID via "x-pasid-mode". Based on the existing support for scalable mode, we need to implement the following missing parts: 1) tag VTDAddressSpace with PASID and support IOMMU/DMA translation with PASID 2) tag IOTLB with PASID 3) PASID cache and its flush 4) PASID based IOTLB invalidation For simplicity the PASID cache is not implemented, so we can simply implement the PASID cache flush as a no-op and leave it to be implemented in the future. For PASID based IOTLB invalidation, since we don't have L1 stage support yet, the PASID based IOTLB invalidation is not implemented. For PASID based device IOTLB invalidation, it requires support for vhost, so we forbid enabling device IOTLB when PASID is enabled for now. That work could be done in the future. Note that although PASID based IOMMU translation is ready, no device can issue PASID DMA right now. In this case, PCI_NO_PASID is used as the PASID to identify the address space without PASID. vtd_find_add_as() has been extended to provision an address space with PASID, which could be utilized by a future extension of the PCI core to allow the device model to use PASID based DMA translation. This feature would be useful for: 1) prototyping PASID support for devices like virtio 2) future vPASID work 3) future PRS and vSVA work Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20221028061436.30093-5-jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
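A sketch of the extended lookup (names assumed): the per-device address-space table is now keyed by (bus, devfn, pasid), and PCI_NO_PASID stands for DMA that carries no PASID.

    VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus,
                                     int devfn, uint32_t pasid)
    {
        struct vtd_as_key key = { .bus = bus, .devfn = devfn, .pasid = pasid };
        VTDAddressSpace *vtd_as = g_hash_table_lookup(s->vtd_address_spaces, &key);

        if (!vtd_as) {
            vtd_as = g_new0(VTDAddressSpace, 1);
            vtd_as->bus = bus;
            vtd_as->devfn = devfn;
            vtd_as->pasid = pasid;      /* PCI_NO_PASID for plain DMA */
            /* ... create the IOMMU memory region and AddressSpace here ... */
            g_hash_table_insert(s->vtd_address_spaces,
                                g_memdup2(&key, sizeof(key)), vtd_as);
        }
        return vtd_as;
    }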
2022-11-07 | intel-iommu: drop VTDBus | Jason Wang | 1 | -9/+2
We introduced the VTDBus structure as an intermediate step for searching the address space. This works well with SID based matching/lookup. But when we want to support SID plus PASID based address space lookup, this intermediate step turns out to be a burden. So the patch simply drops the VTDBus structure and uses the PCIBus and devfn as the key for the g_hash_table(). This simplifies the code and the future PASID extension. To avoid becoming slower for existing vtd_find_as_from_bus_num() callers, a vtd_as cache indexed by the bus number is introduced to store the most recent search result of a vtd_as belonging to a specific bus. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20221028061436.30093-3-jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com>
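A sketch of the key handling after VTDBus is gone (illustrative names; at this point the key is just bus plus devfn, since the PASID member only arrives with the PASID support commit above):

    struct vtd_as_key {
        PCIBus *bus;
        uint8_t devfn;
    };

    static guint vtd_as_hash(gconstpointer v)
    {
        const struct vtd_as_key *key = v;
        return (guint)(uintptr_t)key->bus ^ key->devfn;
    }

    static gboolean vtd_as_equal(gconstpointer v1, gconstpointer v2)
    {
        const struct vtd_as_key *k1 = v1, *k2 = v2;
        return k1->bus == k2->bus && k1->devfn == k2->devfn;
    }

    /* Keep vtd_find_as_from_bus_num() fast: remember the last vtd_as seen
     * for each bus number. */
    VTDAddressSpace *vtd_as_cache[PCI_BUS_MAX];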
2022-05-16 | intel_iommu: Support IR-only mode without DMA translation | David Woodhouse | 1 | -0/+1
By setting none of the SAGAW bits we can indicate to a guest that DMA translation isn't supported. Tested by booting Windows 10, as well as Linux guests with the fix at https://git.kernel.org/torvalds/c/c40aaaac10 Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Peter Xu <peterx@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220314142544.150555-2-dwmw2@infradead.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-04-06 | Replace config-time define HOST_WORDS_BIGENDIAN | Marc-André Lureau | 1 | -3/+3
Replace a config-time define with a compile-time condition define (compatible with clang and gcc) that must be declared prior to its usage. This avoids having a global configure-time define, and also prevents bad usage if the config header wasn't included before. This can help make some code independent of QEMU too. gcc supports __BYTE_ORDER__ from about 4.6 and clang from 3.2. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> [ For the s390x parts I'm involved in ] Acked-by: Halil Pasic <pasic@linux.ibm.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20220323155743.1585078-7-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
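The replacement pattern is essentially this (a sketch of the idea; the actual definition lives in QEMU's compiler/osdep headers):

    #include <stdint.h>

    /* Derive the endianness at compile time from the compiler itself. */
    #if !defined(__BYTE_ORDER__) || !defined(__ORDER_BIG_ENDIAN__)
    #error "the compiler does not provide __BYTE_ORDER__"
    #endif

    #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    #define HOST_BIG_ENDIAN 1
    #else
    #define HOST_BIG_ENDIAN 0
    #endif

    /* Usage in a header such as intel_iommu.h: */
    struct example_bits {
    #if HOST_BIG_ENDIAN
        uint32_t high:16, low:16;
    #else
        uint32_t low:16, high:16;
    #endif
    };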
2022-03-06 | intel_iommu: support snoop control | Jason Wang | 1 | -0/+1
SC is required for some kernel features like vhost-vDPA. So this patch implements the basic SC feature. The idea is pretty simple: for software emulated DMA it is always coherent, so in this case we can simply advertise the ECAP_SC bit. For VFIO and vhost, things will be much more complicated, so this patch simply fails the IOMMU notifier registration. In the future, we may want to have a dedicated notifier flag or similar mechanism to demonstrate the coherency, so VFIO could advertise that if it has VFIO_DMA_CC_IOMMU; for the vhost kernel backend we don't need that since it's a software backend. Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220214060346.72455-1-jasowang@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-09-18 | Use OBJECT_DECLARE_SIMPLE_TYPE when possible | Eduardo Habkost | 1 | -3/+1
This converts existing DECLARE_INSTANCE_CHECKER usage to OBJECT_DECLARE_SIMPLE_TYPE when possible. $ ./scripts/codeconverter/converter.py -i \ --pattern=AddObjectDeclareSimpleType $(git grep -l '' -- '*.[ch]') Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Acked-by: Paul Durrant <paul@xen.org> Message-Id: <20200916182519.415636-6-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
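For intel_iommu.h the conversion has this shape (a sketch; the exact surrounding lines may differ):

    /* Before: separate typedef plus instance checker. */
    typedef struct IntelIOMMUState IntelIOMMUState;
    DECLARE_INSTANCE_CHECKER(IntelIOMMUState, INTEL_IOMMU_DEVICE,
                             TYPE_INTEL_IOMMU_DEVICE)

    /* After: one macro declares the typedef and the INTEL_IOMMU_DEVICE()
     * cast checker; "SIMPLE" means there is no class struct to declare. */
    OBJECT_DECLARE_SIMPLE_TYPE(IntelIOMMUState, INTEL_IOMMU_DEVICE)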
2020-09-09 | Use DECLARE_*CHECKER* macros | Eduardo Habkost | 1 | -2/+2
Generated using: $ ./scripts/codeconverter/converter.py -i \ --pattern=TypeCheckMacro $(git grep -l '' -- '*.[ch]') Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20200831210740.126168-12-ehabkost@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20200831210740.126168-13-ehabkost@redhat.com> Message-Id: <20200831210740.126168-14-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-09-09 | Move QOM typedefs and add missing includes | Eduardo Habkost | 1 | -1/+2
Some typedefs and macros are defined after the type check macros. This makes it difficult to automatically replace their definitions with OBJECT_DECLARE_TYPE. Patch generated using: $ ./scripts/codeconverter/converter.py -i \ --pattern=QOMStructTypedefSplit $(git grep -l '' -- '*.[ch]') which will split "typedef struct { ... } TypedefName" declarations. Followed by: $ ./scripts/codeconverter/converter.py -i --pattern=MoveSymbols \ $(git grep -l '' -- '*.[ch]') which will: - move the typedefs and #defines above the type check macros - add missing #include "qom/object.h" lines if necessary Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20200831210740.126168-9-ehabkost@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20200831210740.126168-10-ehabkost@redhat.com> Message-Id: <20200831210740.126168-11-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-03-16 | misc: Replace zero-length arrays with flexible array member (automatic) | Philippe Mathieu-Daudé | 1 | -1/+2
Description copied from Linux kernel commit from Gustavo A. R. Silva (see [3]): --v-- description start --v-- The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member [1], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced [2] to the Linux codebase from now on. --^-- description end --^-- Do similar housekeeping in the QEMU codebase (which uses C99 since commit 7be41675f7cb). All these instances of code were found with the help of the following Coccinelle script: @@ identifier s, m, a; type t, T; @@ struct s { ... t m; - T a[0]; + T a[]; }; @@ identifier s, m, a; type t, T; @@ struct s { ... t m; - T a[0]; + T a[]; } QEMU_PACKED; [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=76497732932f [3] https://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux.git/commit/?id=17642a2fbd2c1 Inspired-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-09 | hw/i386/intel_iommu: Remove unused includes | Philippe Mathieu-Daudé | 1 | -4/+0
intel_iommu.h does not use any of these includes, remove them. Acked-by: John Snow <jsnow@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20200228114649.12818-7-philmd@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2019-08-16 | Include hw/qdev-properties.h less | Markus Armbruster | 1 | -1/+1
In my "build everything" tree, changing hw/qdev-properties.h triggers a recompile of some 2700 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). Many places including hw/qdev-properties.h (directly or via hw/qdev.h) actually need only hw/qdev-core.h. Include hw/qdev-core.h there instead. hw/qdev.h is actually pointless: all it does is include hw/qdev-core.h and hw/qdev-properties.h, which in turn includes hw/qdev-core.h. Replace the remaining uses of hw/qdev.h by hw/qdev-properties.h. While there, delete a few superfluous inclusions of hw/qdev-core.h. Touching hw/qdev-properties.h now recompiles some 1200 objects. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Daniel P. Berrangé" <berrange@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20190812052359.30071-22-armbru@redhat.com>
2019-04-02 | intel_iommu: Drop extended root field | Peter Xu | 1 | -1/+0
VTD_RTADDR_RTT is dropped even by the VT-d spec, so QEMU should probably do the same thing (after all we never really implemented it). Since we've had a field for that in the migration stream, to keep compatibility we need to fill the hole up. Please refer to VT-d spec 10.4.6. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20190329061422.7926-3-peterx@redhat.com> Reviewed-by: Liu, Yi L <yi.l.liu@intel.com> Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-03-20 | intel-iommu: optimize nodmar memory regions | Peter Xu | 1 | -2/+5
Previously we had per-device system memory aliases when DMAR is disabled by the system. This slows the system down if there are lots of devices, especially when DMAR is disabled, because each of the aliased system address spaces will contain O(N) slots, and rendering such N address spaces will be O(N^2) complexity. This patch introduces a shared nodmar memory region, and for each device we only create an alias to the shared memory region. With the aliasing, the QEMU memory core API will be able to detect when devices are sharing the same address space (which is the nodmar address space) when rendering the FlatViews, and the total number of FlatViews can be dramatically reduced when there are a lot of devices. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20190313094323.18263-1-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
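A sketch of the aliasing scheme (names assumed; the real code organizes the regions slightly differently): the IOMMU keeps one shared region for the no-DMAR case and each device only aliases that shared region.

    /* Once, at IOMMU realize time: one shared region backed by system memory. */
    memory_region_init_alias(&s->mr_nodmar, OBJECT(s), "vtd-nodmar",
                             get_system_memory(), 0,
                             memory_region_size(get_system_memory()));

    /* Per device: alias the shared nodmar region instead of system memory,
     * so all devices with DMAR disabled resolve to the same FlatView. */
    memory_region_init_alias(&vtd_as->nodmar, OBJECT(s), "vtd-nodmar-alias",
                             &s->mr_nodmar, 0,
                             memory_region_size(&s->mr_nodmar));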
2019-03-12 | intel_iommu: add scalable-mode option to make scalable mode work | Yi Sun | 1 | -1/+2
This patch adds an option to give the user the flexibility to expose Scalable Mode to the guest. The user can expose Scalable Mode to the guest with a config like: "-device intel-iommu,caching-mode=on,scalable-mode=on" The Linux iommu driver has supported scalable mode. Please refer to the patch set below: https://www.spinics.net/lists/kernel/msg2985279.html Signed-off-by: Liu, Yi L <yi.l.liu@intel.com> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com> Message-Id: <1551753295-30167-4-git-send-email-yi.y.sun@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-03-12 | intel_iommu: add 256 bits qi_desc support | Liu, Yi L | 1 | -0/+1
Per Intel(R) VT-d 3.0, the qi_desc is 256 bits in Scalable Mode. This patch adds emulation of the 256-bit qi_desc. Signed-off-by: Liu, Yi L <yi.l.liu@intel.com> [Yi Sun is co-developer to rebase and refine the patch.] Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <1551753295-30167-3-git-send-email-yi.y.sun@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-03-12 | intel_iommu: scalable mode emulation | Liu, Yi L | 1 | -2/+22
The Intel(R) VT-d 3.0 spec introduces scalable mode address translation to replace extended context mode. This patch extends the current emulator to support Scalable Mode, which includes the root table, context table and new pasid table format changes. Now intel_iommu emulates both legacy mode and scalable mode (with the legacy-equivalent capability set). The key points are below: 1. Extend root table operations to support both legacy mode and scalable mode. 2. Extend context table operations to support both legacy mode and scalable mode. 3. Add pasid table operations to support scalable mode. Signed-off-by: Liu, Yi L <yi.l.liu@intel.com> [Yi Sun is co-developer to contribute much to refine the whole commit.] Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com> Message-Id: <1551753295-30167-2-git-send-email-yi.y.sun@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>
2018-12-19 | intel_iommu: dma read/write draining support | Peter Xu | 1 | -0/+1
Supporting DMA read/write draining is easy for the existing VT-d emulation since the emulation itself does not have any request queue, so we don't need to do anything to flush an uncommitted queue. What we need to do is declare the support. These capabilities are required to pass the Windows SVVP test program. It is verified that with the parameters "x-aw-bits=48,caching-mode=off" we can pass the Windows SVVP test with this patch applied. Otherwise we'll fail with: IOMMU[0] - DWD (DMA write draining) not supported IOMMU[0] - DWD (DMA read draining) not supported Segment 0 has no DMA remapping capable IOMMU units However since these bits are not declared as supported for QEMU<=3.1, we'll need a compatibility bit for it, and we turn this on by default only for QEMU>=4.0. Please refer to VT-d spec 6.5.4 for more information. CC: Yu Wang <wyu@redhat.com> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1654550 Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-11-05 | x86_iommu: move vtd_generate_msi_message in common file | Singh, Brijesh | 1 | -59/+0
The vtd_generate_msi_message() in intel-iommu is used to construct an MSI message from an IRQ. A similar function will be needed when we add interrupt remapping support in amd-iommu. Move the function into a common file to avoid code duplication, and rename it to x86_iommu_irq_to_msi_message(). There are no logic changes in the code flow. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Suggested-by: Peter Xu <peterx@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: Tom Lendacky <Thomas.Lendacky@amd.com> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-05-23 | intel-iommu: rework the page walk logic | Peter Xu | 1 | -0/+2
This patch fixes a potential small window that the DMA page table might be incomplete or invalid when the guest sends domain/context invalidations to a device. This can cause random DMA errors for assigned devices. This is a major change to the VT-d shadow page walking logic. It includes but is not limited to: - For each VTDAddressSpace, now we maintain what IOVA ranges we have mapped and what we have not. With that information, now we only send MAP or UNMAP when necessary. Say, we don't send MAP notifies if we know we have already mapped the range, meanwhile we don't send UNMAP notifies if we know we never mapped the range at all. - Introduce vtd_sync_shadow_page_table[_range] APIs so that we can call in any places to resync the shadow page table for a device. - When we receive domain/context invalidation, we should not really run the replay logic, instead we use the new sync shadow page table API to resync the whole shadow page table without unmapping the whole region. After this change, we'll only do the page walk once for each domain invalidations (before this, it can be multiple, depending on number of notifiers per address space). While at it, the page walking logic is also refactored to be simpler. CC: QEMU Stable <qemu-stable@nongnu.org> Reported-by: Jintack Lim <jintack@cs.columbia.edu> Tested-by: Jintack Lim <jintack@cs.columbia.edu> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-05-23 | intel-iommu: only do page walk for MAP notifiers | Peter Xu | 1 | -0/+2
For UNMAP-only IOMMU notifiers, we don't need to walk the page tables. Speed up that procedure by skipping the page table walk. That should boost performance for UNMAP-only notifiers like vhost. CC: QEMU Stable <qemu-stable@nongnu.org> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-05-23 | intel-iommu: add iommu lock | Peter Xu | 1 | -0/+6
SECURITY IMPLICATION: this patch fixes a potential race when multiple threads access the IOMMU IOTLB cache. Add a per-iommu big lock to protect IOMMU status. Currently the only thing to be protected is the IOTLB/context cache, since that can be accessed even without BQL, e.g., in IO dataplane. Note that we don't need to protect device page tables since that's fully controlled by the guest kernel. However there is still possibility that malicious drivers will program the device to not obey the rule. In that case QEMU can't really do anything useful, instead the guest itself will be responsible for all uncertainties. CC: QEMU Stable <qemu-stable@nongnu.org> Reported-by: Fam Zheng <famz@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-05-23 | intel-iommu: remove IntelIOMMUNotifierNode | Peter Xu | 1 | -7/+2
That struct is not really necessary. Remove the node struct and put the list entry directly into VTDAddressSpace. It simplifies the code a lot. While at it, rename the old notifiers_list into vtd_as_with_notifiers. CC: QEMU Stable <qemu-stable@nongnu.org> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-01-18 | intel-iommu: Extend address width to 48 bits | Prasad Singamsetty | 1 | -0/+1
The current implementation of Intel IOMMU code only supports 39 bits iova address width. This patch provides a new parameter (x-aw-bits) for intel-iommu to extend its address width to 48 bits but keeping the default the same (39 bits). The reason for not changing the default is to avoid potential compatibility problems with live migration of intel-iommu enabled QEMU guest. The only valid values for 'x-aw-bits' parameter are 39 and 48. After enabling larger address width (48), we should be able to map larger iova addresses in the guest. For example, a QEMU guest that is configured with large memory ( >=1TB ). To check whether 48 bits aw is enabled, we can grep in the guest dmesg output with line: "DMAR: Host address width 48". Signed-off-by: Prasad Singamsetty <prasad.singamsety@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-01-18 | intel-iommu: Redefine macros to enable supporting 48 bit address width | Prasad Singamsetty | 1 | -2/+4
The current implementation of the Intel IOMMU code only supports a 39-bit host/iova address width, so a number of macros use hard-coded values based on that. This patch redefines them so they can be used with variable address widths. This patch doesn't add any new functionality but enables adding support for a 48-bit address width. Signed-off-by: Prasad Singamsetty <prasad.singamsety@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-08-02 | intel_iommu: use access_flags for iotlb | Peter Xu | 1 | -2/+1
It was cached by read/write separately. Let's merge them. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
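The merged field is just the combination QEMU's memory API already defines; a sketch of the usage with assumed variable names:

    /* IOMMU_ACCESS_FLAG(r, w) from the memory API combines IOMMU_RO/IOMMU_WO
     * into one value, replacing the separate read/write booleans. */
    iotlb_entry->access_flags = IOMMU_ACCESS_FLAG(read_cur, write_cur);

    /* When filling in an IOMMUTLBEntry on a cache hit: */
    entry.perm = iotlb_entry->access_flags;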
2017-07-14 | memory/iommu: introduce IOMMUMemoryRegionClass | Alexey Kardashevskiy | 1 | -1/+2
This finishes QOM'fication of IOMMUMemoryRegion by introducing a IOMMUMemoryRegionClass. This also provides a fastpath analog for IOMMU_MEMORY_REGION_GET_CLASS(). This makes IOMMUMemoryRegion an abstract class. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170711035620.4232-3-aik@ozlabs.ru> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-07-14 | memory/iommu: QOM'fy IOMMU MemoryRegion | Alexey Kardashevskiy | 1 | -1/+1
This defines a new QOM object - IOMMUMemoryRegion - with MemoryRegion as a parent. This moves IOMMU-related fields from MR to IOMMU MR. However, to avoid dynamic QOM casting in the fast path (address_space_translate, etc), this adds an @is_iommu boolean flag to MR and provides a new helper to do a simple cast to IOMMU MR - memory_region_get_iommu. The flag is set in the instance init callback. This defines memory_region_is_iommu as memory_region_get_iommu()!=NULL. This switches MemoryRegion to IOMMUMemoryRegion in most places except the ones where MemoryRegion may be an alias. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20170711035620.4232-2-aik@ozlabs.ru> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-04-20 | intel_iommu: enable remote IOTLB | Peter Xu | 1 | -0/+8
This patch is based on Aviv Ben-David (<bd.aviv@gmail.com>)'s patch upstream: "IOMMU: enable intel_iommu map and unmap notifiers" https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg01453.html However I removed/fixed some content and added my own code. Instead of translate()-ing every page for iotlb invalidations (which is slower), we walk the pages when needed and notify in a hook function. This patch enables vfio devices for VT-d emulation. And, since we already have vhost DMAR support via device-iotlb, a natural benefit that this patch brings is that vt-d enabled vhost can live even without ATS capability now, though more tests are needed. Signed-off-by: Aviv Ben-David <bdaviv@cs.technion.ac.il> Reviewed-by: Jason Wang <jasowang@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: "Michael S. Tsirkin" <mst@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1491562755-23867-10-git-send-email-peterx@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-04-20 | intel_iommu: allow dynamic switch of IOMMU region | Peter Xu | 1 | -0/+2
This is preparation work to finally enable dynamic switching ON/OFF for VT-d protection. The old VT-d code uses a static IOMMU address space, and that won't satisfy vfio-pci device listeners. Let me explain. vfio-pci devices depend on the memory region listener and IOMMU replay mechanism to make sure the device mapping is coherent with the guest even if there are domain switches. And there are two kinds of domain switches: (1) switch from domain A -> B (2) switch from domain A -> no domain (e.g., turn DMAR off) Case (1) is handled by the context entry invalidation handling in the VT-d replay logic. What the replay function should do here is to replay the existing page mappings in domain B. However for case (2), we don't want to replay any domain mappings - we just need the default GPA->HPA mappings (the address_space_memory mapping). And this patch helps in case (2) to build up the mapping automatically by leveraging the vfio-pci memory listeners. Another important thing that this patch does is to separate IR (Interrupt Remapping) from DMAR (DMA Remapping). The IR region should not depend on the DMAR region (like before this patch). It should be a standalone region, and it should be able to be activated without DMAR (which is a common behavior of the Linux kernel - by default it enables IR while DMAR is disabled). Reviewed-by: Jason Wang <jasowang@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: "Michael S. Tsirkin" <mst@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1491562755-23867-9-git-send-email-peterx@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
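A sketch of the switching mechanism with assumed field names: both subregions always exist, and flipping DMAR on or off just toggles which one is enabled.

    static void vtd_switch_address_space(VTDAddressSpace *as, bool dmar_enabled)
    {
        /* The translating IOMMU region handles DMA when DMAR is on ... */
        memory_region_set_enabled(&as->iommu, dmar_enabled);
        /* ... otherwise the plain GPA alias takes over (the standalone IR
         * region stays active either way). */
        memory_region_set_enabled(&as->sys_alias, !dmar_enabled);
    }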
2017-02-17 | intel_iommu: add "caching-mode" option | Aviv Ben-David | 1 | -0/+2
This capability asks the guest to invalidate cache before each map operation. We can use this invalidation to trap map operations in the hypervisor. Signed-off-by: Aviv Ben-David <bd.aviv@gmail.com> [peterx: using "caching-mode" instead of "cache-mode" to align with spec] [peterx: re-write the subject to make it short and clear] Reviewed-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Aviv Ben-David <bd.aviv@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-11-15 | intel_iommu: fix several incorrect endianness and bit fields | Peter Xu | 1 | -5/+4
Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-10-17 | intel_iommu: reject broken EIM | Radim Krčmář | 1 | -0/+1
Cluster x2APIC cannot work without KVM's x2apic API when the maximal APIC ID is greater than 8, and only KVM's LAPIC can support x2APIC, so we forbid other APICs and also the old KVM case with fewer than 9, to simplify the code. There is no point in enabling EIM in forbidden APICs, so we keep it enabled only for the KVM APIC; unconditionally, because making the option depend on the KVM version would be a maintenance burden. Old QEMUs would enable eim whenever intremap was on, which would trick guests into thinking that they can enable cluster x2APIC even if any interrupt destination would get clamped to 8 bits. Depending on your configuration, QEMU could notice that the destination LAPIC is not present and report it with a very non-obvious: KVM: injection failed, MSI lost (Operation not permitted) Or the guest could say something about unexpected interrupts, because clamping leads to aliasing, so interrupts were being delivered to incorrect VCPUs. KVM_X2APIC_API is the feature that allows us to enable EIM for KVM. QEMU 2.7 allowed EIM whenever interrupt remapping was enabled. In order to keep backward compatibility, we again allow guests to misbehave in non-obvious ways, and make it the default for old machine types. A user can enable the buggy mode with "x-buggy-eim=on". Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17 | intel_iommu: add OnOffAuto intr_eim as "eim" property | Radim Krčmář | 1 | -0/+1
The default (auto) emulates the current behavior. A user can now control EIM like -device intel-iommu,intremap=on,eim=off Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-07-21 | intel_iommu: avoid unnamed fields | Michael S. Tsirkin | 1 | -4/+4
Also avoid unnamed fields for portability. Also, rename VTD_IRTE to VTD_IR_TableEntry for coding style compliance. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-07-21 | intel_iommu: add SID validation for IR | Peter Xu | 1 | -0/+17
This patch enables SID validation. Invalid interrupts will be dropped. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-07-21 | intel_iommu: Add support for Extended Interrupt Mode | Jan Kiszka | 1 | -0/+1
As neither QEMU nor KVM support more than 255 CPUs so far, this is simple: we only need to switch the destination ID translation in vtd_remap_irq_get if EIME is set. Once CFI support is there, it will have to take EIM into account as well. So far, nothing to do for this. This patch allows to use x2APIC in split irqchip mode of KVM. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> [use le32_to_cpu() to retrieve dest_id] Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-07-21 | intel_iommu: add support for split irqchip | Peter Xu | 1 | -0/+1
In split irqchip mode, the IOAPIC works in user space, and we only update kernel irq routes when an entry changes. When IR is enabled, we directly update the kernel with translated messages. It works just like a kernel cache for the remapping entries. Since KVM irqfd is using kernel gsi routes to deliver interrupts, as long as we can support split irqchip, we will support irqfd as well. Also, since kernel gsi routes will cache translated interrupts, irqfd delivery will not suffer any performance impact due to IR. And, since we support irqfd, vhost devices will be able to work seamlessly with IR now. Logically this should cover both the vhost-net and vhost-user cases. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [move trace-events lines into target-i386/trace-events] Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-07-20 | intel_iommu: Add support for PCI MSI remap | Peter Xu | 1 | -0/+66
This patch enables interrupt remapping for PCI devices. To play the trick, one memory region "iommu_ir" is added as a child region of the original iommu memory region, covering the range 0xfeeXXXXX (which is the address range for the APIC). All writes to this range are taken as MSIs, and translation is carried out only when IR is enabled. Idea suggested by Paolo Bonzini. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
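A sketch of how the carve-out works (constants per the APIC/VT-d layout, names assumed): the IR window is a higher-priority subregion of the per-device DMAR region, so MSI writes are intercepted before normal DMA translation.

    #define VTD_INTERRUPT_ADDR_FIRST 0xfee00000ULL
    #define VTD_INTERRUPT_ADDR_SIZE  0x100000ULL   /* 0xfee00000..0xfeefffff */

    /* Register the MSI/IR window on top of the device's iommu region. */
    memory_region_init_io(&vtd_as->iommu_ir, OBJECT(s), &vtd_mem_ir_ops, s,
                          "intel_iommu_ir", VTD_INTERRUPT_ADDR_SIZE);
    memory_region_add_subregion_overlap(&vtd_as->iommu,
                                        VTD_INTERRUPT_ADDR_FIRST,
                                        &vtd_as->iommu_ir, 1);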
2016-07-20 | intel_iommu: define several structs for IOMMU IR | Peter Xu | 1 | -0/+74
Several data structs are defined to better support the rest of the patches: IRTE to parse remapping table entries, and IOAPIC/MSI related structure bits to parse interrupt entries to be filled in by guest kernel. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-07-20 | intel_iommu: define interrupt remap table addr register | Peter Xu | 1 | -0/+5
Define the Interrupt Remap Table Address register to store the IR table pointer. Also, do proper handling on global command register writes to store the table pointer and its size. One more debug flag "DEBUG_IR" is added for interrupt remapping. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>