aboutsummaryrefslogtreecommitdiff
path: root/hw
AgeCommit message (Collapse)AuthorFilesLines
2018-05-31xen-hvm: stop faking I/O to access PCI config spacePaul Durrant2-20/+84
This patch removes the current hackery where IOREQ_TYPE_PCI_CONFIG requests are handled by faking PIO to 0xcf8 and 0xcfc and replaces it with direct calls to pci_host_config_read/write_common(). Doing so necessitates mapping BDFs to PCIDevices but maintaining a simple QLIST in xen_device_realize/unrealize() will suffice. NOTE: whilst config space accesses are currently limited to PCI_CONFIG_SPACE_SIZE, this patch paves the way to increasing the limit to PCIE_CONFIG_SPACE_SIZE when Xen gains the ability to emulate MCFG table accesses. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com> Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
2018-05-31xen-hvm: try to use xenforeignmemory_map_resource() to map ioreq pagesPaul Durrant2-15/+52
Xen 4.11 has a new API to directly map guest resources. Among the resources that can be mapped using this API are ioreq pages. This patch modifies QEMU to attempt to use the new API should it exist, falling back to the previous mechanism if it is unavailable. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com> Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
2018-05-31xen/hvm: correct reporting of modified memory under physmap during migrationIgor Druzhinin2-19/+20
When global_log_dirty is enabled VRAM modification tracking never worked correctly. The address that is passed to xen_hvm_modified_memory() is not the effective PFN but RAM block address which is not the same for VRAM. We need to make a translation for this address into PFN using physmap. Since there is no way to access physmap properly inside xen_hvm_modified_memory() let's make it a global structure. Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com> Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
2018-05-31KVM: GIC: Fix memory leak due to calling kvm_init_irq_routing twiceShannon Zhao2-2/+0
kvm_irqchip_create called by kvm_init will call kvm_init_irq_routing to initialize global capability variables. If we call kvm_init_irq_routing in GIC realize function, previous allocated memory will leak. Fix this by deleting the unnecessary call. Signed-off-by: Shannon Zhao <zhaoshenglong@huawei.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Message-id: 1527750994-14360-1-git-send-email-zhaoshenglong@huawei.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-05-31ARM: ACPI: Fix use-after-free due to memory reallocShannon Zhao1-5/+15
acpi_data_push uses g_array_set_size to resize the memory size. If there is no enough contiguous memory, the address will be changed. So previous pointer could not be used any more. It must update the pointer and use the new one. Also, previous codes wrongly use le32 conversion of iort->node_offset for subsequent computations that will result incorrect value if host is not litlle endian. So use the non-converted one instead. Signed-off-by: Shannon Zhao <zhaoshenglong@huawei.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Message-id: 1527663951-14552-1-git-send-email-zhaoshenglong@huawei.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-05-31Make address_space_get_iotlb_entry() take a MemTxAttrs argumentPeter Maydell1-1/+2
As part of plumbing MemTxAttrs down to the IOMMU translate method, add MemTxAttrs as an argument to address_space_get_iotlb_entry(). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20180521140402.23318-12-peter.maydell@linaro.org
2018-05-31Make MemoryRegion valid.accepts callback take a MemTxAttrs argumentPeter Maydell4-7/+14
As part of plumbing MemTxAttrs down to the IOMMU translate method, add MemTxAttrs as an argument to the MemoryRegion valid.accepts callback. We'll need this for subpage_accepts(). We could take the approach we used with the read and write callbacks and add new a new _with_attrs version, but since there are so few implementations of the accepts hook we just change them all. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20180521140402.23318-9-peter.maydell@linaro.org
2018-05-31Make memory_region_access_valid() take a MemTxAttrs argumentPeter Maydell1-1/+2
As part of plumbing MemTxAttrs down to the IOMMU translate method, add MemTxAttrs as an argument to memory_region_access_valid(). Its callers either have an attrs value to hand, or don't care and can use MEMTXATTRS_UNSPECIFIED. The callsite in flatview_access_valid() is part of a recursive loop flatview_access_valid() -> memory_region_access_valid() -> subpage_accepts() -> flatview_access_valid(); we make it pass MEMTXATTRS_UNSPECIFIED for now, until the next several commits have plumbed an attrs parameter through the rest of the loop and we can add an attrs parameter to flatview_access_valid(). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20180521140402.23318-8-peter.maydell@linaro.org
2018-05-31Make address_space_translate{, _cached}() take a MemTxAttrs argumentPeter Maydell1-1/+2
As part of plumbing MemTxAttrs down to the IOMMU translate method, add MemTxAttrs as an argument to address_space_translate() and address_space_translate_cached(). Callers either have an attrs value to hand, or don't care and can use MEMTXATTRS_UNSPECIFIED. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20180521140402.23318-4-peter.maydell@linaro.org
2018-05-31xlnx-zdma: Correct mem leaks and memset to zero on desc unaligned errorsFrancisco Iglesias1-3/+7
Coverity found that the string return by 'object_get_canonical_path' was not being freed at two locations in the model (CID 1391294 and CID 1391293) and also that a memset was being called with a value greater than the max of a byte on the second argument (CID 1391286). This patch corrects this by adding the freeing of the strings and also changing to memset to zero instead on descriptor unaligned errors. Signed-off-by: Francisco Iglesias <frasse.iglesias@gmail.com> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20180528184859.3530-1-frasse.iglesias@gmail.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-05-31arm: fix qemu crash on startup with -bios optionIgor Mammedov1-9/+9
When QEMU is started with following CLI -machine virt,gic-version=3,accel=kvm -cpu host -bios AAVMF_CODE.fd it crashes with abort at accel/kvm/kvm-all.c:2164: KVM_SET_DEVICE_ATTR failed: Group 6 attr 0x000000000000c665: Invalid argument Which is caused by implicit dependency of kvm_arm_gicv3_reset() on arm_gicv3_icc_reset() where the later is called by CPU reset reset callback. However commit: 3b77f6c arm/boot: split load_dtb() from arm_load_kernel() broke CPU reset callback registration in case arm_load_kernel() ... if (!info->kernel_filename || info->firmware_loaded) branch is taken, i.e. it's sufficient to provide a firmware or do not provide kernel on CLI to skip cpu reset callback registration, where before offending commit the callback has been registered unconditionally. Fix it by registering the callback right at the beginning of arm_load_kernel() unconditionally instead of doing it at the end. NOTE: we probably should eliminate that dependency anyways as well as separate arch CPU reset parts from arm_load_kernel() into CPU itself, but that refactoring that I probably would have to do anyways later for CPU hotplug to work. Reported-by: Auger Eric <eric.auger@redhat.com> Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Message-id: 1527070950-208350-1-git-send-email-imammedo@redhat.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-05-31arm_gicv3_kvm: increase clroffset accordinglyShannon Zhao1-0/+1
It forgot to increase clroffset during the loop. So it only clear the first 4 bytes. Fixes: 367b9f527becdd20ddf116e17a3c0c2bbc486920 Cc: qemu-stable@nongnu.org Signed-off-by: Shannon Zhao <zhaoshenglong@huawei.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Message-id: 1527047633-12368-1-git-send-email-zhaoshenglong@huawei.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-05-31hw/intc/arm_gicv3: Fix APxR<n> register dispatchingJan Kiszka1-6/+6
There was a nasty flip in identifying which register group an access is targeting. The issue caused spuriously raised priorities of the guest when handing CPUs over in the Jailhouse hypervisor. Cc: qemu-stable@nongnu.org Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Message-id: 28b927d3-da58-bce4-cc13-bfec7f9b1cb9@siemens.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-05-30numa: postpone options post-processing till machine_run_board_init()Igor Mammedov1-2/+3
in preparation for numa options to being handled via QMP before machine_run_board_init(), move final numa configuration checks and processing to machine_run_board_init() so it could take into account both CLI (via parse_numa_opts()) and QMP input Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <1525423069-61903-2-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-05-29ppc: Rename 2.13 machines to 3.0Peter Maydell1-7/+7
Rename the 2.13 machines to match the number we're going to use for the next release. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-id: 20180522104000.9044-5-peter.maydell@linaro.org
2018-05-29hw/s390x: Rename 2.13 machines to 3.0Peter Maydell1-5/+5
Rename the 2.13 machines to match the number we're going to use for the next release. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-id: 20180522104000.9044-4-peter.maydell@linaro.org
2018-05-29hw/i386: Rename 2.13 machine types to 3.0Peter Maydell2-8/+8
Rename the 2.13 machine types to match what we're going to use as our next release number. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Message-id: 20180522104000.9044-3-peter.maydell@linaro.org
2018-05-29Merge remote-tracking branch ↵Peter Maydell3-14/+370
'remotes/stefanberger/tags/pull-tpm-2018-05-23-4' into staging Merge tpm 2018/05/23 v4 # gpg: Signature made Sat 26 May 2018 03:52:12 BST # gpg: using RSA key 75AD65802A0B4211 # gpg: Good signature from "Stefan Berger <stefanb@linux.vnet.ibm.com>" # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: B818 B9CA DF90 89C2 D5CE C66B 75AD 6580 2A0B 4211 * remotes/stefanberger/tags/pull-tpm-2018-05-23-4: test: Add test cases that use the external swtpm with CRB interface docs: tpm: add VM save/restore example and troubleshooting guide tpm: extend TPM TIS with state migration support tpm: extend TPM emulator with state migration support Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-05-24Merge remote-tracking branch 'remotes/kraxel/tags/vga-20180524-pull-request' ↵Peter Maydell5-53/+388
into staging vga: catch depth 0 hw/display: add new bochs-display device some cleanups. # gpg: Signature made Thu 24 May 2018 16:45:46 BST # gpg: using RSA key 4CB6D8EED3E87138 # gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>" # gpg: aka "Gerd Hoffmann <gerd@kraxel.org>" # gpg: aka "Gerd Hoffmann (private) <kraxel@gmail.com>" # Primary key fingerprint: A032 8CFF B93A 17A7 9901 FE7D 4CB6 D8EE D3E8 7138 * remotes/kraxel/tags/vga-20180524-pull-request: MAINTAINERS: add vga entries bochs-display: add pcie support bochs-display: add dirty tracking support hw/display: add new bochs-display device vga-pci: use PCI_VGA_MMIO_SIZE vga: move bochs vbe defines to header file vga: catch depth 0 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-05-24tpm: extend TPM TIS with state migration supportStefan Berger2-2/+51
Extend the TPM TIS interface with state migration support. We need to synchronize with the backend thread to make sure that a command being processed by the external TPM emulator has completed and its response been received. Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2018-05-24tpm: extend TPM emulator with state migration supportStefan Berger2-12/+319
Extend the TPM emulator backend device with state migration support. The external TPM emulator 'swtpm' provides a protocol over its control channel to retrieve its state blobs. We implement functions for getting and setting the different state blobs. In case the setting of the state blobs fails, we return a negative errno code to fail the start of the VM. Since we have an external TPM emulator, we need to make sure that we do not migrate the state for as long as it is busy processing a request. We need to wait for notification that the request has completed processing. Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2018-05-24Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into stagingPeter Maydell11-113/+399
pc, pci, virtio, vhost: fixes, features Beginning of merging vDPA, new PCI ID, a new virtio balloon stat, intel iommu rework fixing a couple of security problems (no CVEs yet), fixes all over the place. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Wed 23 May 2018 15:41:32 BST # gpg: using RSA key 281F0DB8D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: (28 commits) intel-iommu: rework the page walk logic util: implement simple iova tree intel-iommu: trace domain id during page walk intel-iommu: pass in address space when page walk intel-iommu: introduce vtd_page_walk_info intel-iommu: only do page walk for MAP notifiers intel-iommu: add iommu lock intel-iommu: remove IntelIOMMUNotifierNode intel-iommu: send PSI always even if across PDEs nvdimm: fix typo in label-size definition contrib/vhost-user-blk: enable protocol feature for vhost-user-blk hw/virtio: Fix brace Werror with clang 6.0.0 libvhost-user: Send messages with no data vhost-user+postcopy: Use qemu_set_nonblock virtio: support setting memory region based host notifier vhost-user: support receiving file descriptors in slave_read vhost-user: add Net prefix to internal state structure linux-headers: add kvm header for mips linux-headers: add unistd.h on all arches update-linux-headers.sh: unistd.h, kvm consistency ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-05-24Merge remote-tracking branch ↵Peter Maydell10-617/+377
'remotes/sstabellini-http/tags/xen-20180522-tag' into staging Xen 2018/05/22 # gpg: Signature made Tue 22 May 2018 19:44:06 BST # gpg: using RSA key 894F8F4870E1AE90 # gpg: Good signature from "Stefano Stabellini <stefano.stabellini@eu.citrix.com>" # gpg: aka "Stefano Stabellini <sstabellini@kernel.org>" # Primary key fingerprint: D04E 33AB A51F 67BA 07D3 0AEA 894F 8F48 70E1 AE90 * remotes/sstabellini-http/tags/xen-20180522-tag: xen_disk: be consistent with use of xendev and blkdev->xendev xen_disk: use a single entry iovec xen_backend: make the xen_feature_grant_copy flag private xen_disk: remove use of grant map/unmap xen_backend: add an emulation of grant copy xen: remove other open-coded use of libxengnttab xen_disk: remove open-coded use of libxengnttab xen_backend: add grant table helpers xen: add a meaningful declaration of grant_copy_segment into xen_common.h checkpatch: generalize xen handle matching in the list of types xen-hvm: create separate function for ioreq server initialization xen_pt: Present the size of 64 bit BARs correctly configure: Add explanation for --enable-xen-pci-passthrough xen/pt: use address_space_memory object for memory region hooks xen-pvdevice: Introduce a simplistic xen-pvdevice save state Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-05-24bochs-display: add pcie supportGerd Hoffmann1-0/+8
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Message-id: 20180522165058.15404-6-kraxel@redhat.com
2018-05-24bochs-display: add dirty tracking supportGerd Hoffmann1-2/+32
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-id: 20180522165058.15404-5-kraxel@redhat.com
2018-05-24hw/display: add new bochs-display deviceGerd Hoffmann2-0/+326
After writing up the virtual mdev device emulating a display supporting the bochs vbe dispi interface (mbochs.ko) and seeing how simple it actually is I've figured that would be useful for qemu too. So, here it is, -device bochs-display. It is basically -device VGA without legacy vga emulation. PCI bar 0 is the framebuffer, PCI bar 2 is mmio with the registers. The vga registers are simply not there though, neither in the legacy ioport location nor in the mmio bar. Consequently it is PCI class DISPLAY_OTHER not DISPLAY_VGA. So there is no text mode emulation, no weird video modes (planar, 256color palette), no memory window at 0xa0000. Just a linear framebuffer in the pci memory bar. And the amount of code to emulate this (and therefore the attack surface) is an order of magnitude smaller when compared to vga emulation. Compatibility wise it works with OVMF (latest git master). The bochs-drm.ko linux kernel module can handle it just fine too. So UEFI guests should not see any functional difference to VGA. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Tested-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-id: 20180522165058.15404-4-kraxel@redhat.com
2018-05-24vga-pci: use PCI_VGA_MMIO_SIZEGerd Hoffmann1-2/+4
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20180522165058.15404-3-kraxel@redhat.com
2018-05-24vga: move bochs vbe defines to header fileGerd Hoffmann2-46/+2
Create a new header file, move the bochs vbe dispi interface defines to it, so they can be used outside vga code. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-id: 20180522165058.15404-2-kraxel@redhat.com
2018-05-24vga: catch depth 0Gerd Hoffmann1-5/+18
depth == 0 is used to indicate 256 color modes. Our region calculation goes wrong in that case. So detect that and just take the safe code path we already have for the wraparound case. While being at it also catch depth == 15 (where our region size calculation goes wrong too). And make the comment more verbose, explaining what is going on here. Without this windows guest install might trigger an assert due to trying to check dirty bitmap outside the snapshot region. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1575541 Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Message-id: 20180514103117.21059-1-kraxel@redhat.com
2018-05-23intel-iommu: rework the page walk logicPeter Xu2-59/+157
This patch fixes a potential small window that the DMA page table might be incomplete or invalid when the guest sends domain/context invalidations to a device. This can cause random DMA errors for assigned devices. This is a major change to the VT-d shadow page walking logic. It includes but is not limited to: - For each VTDAddressSpace, now we maintain what IOVA ranges we have mapped and what we have not. With that information, now we only send MAP or UNMAP when necessary. Say, we don't send MAP notifies if we know we have already mapped the range, meanwhile we don't send UNMAP notifies if we know we never mapped the range at all. - Introduce vtd_sync_shadow_page_table[_range] APIs so that we can call in any places to resync the shadow page table for a device. - When we receive domain/context invalidation, we should not really run the replay logic, instead we use the new sync shadow page table API to resync the whole shadow page table without unmapping the whole region. After this change, we'll only do the page walk once for each domain invalidations (before this, it can be multiple, depending on number of notifiers per address space). While at it, the page walking logic is also refactored to be simpler. CC: QEMU Stable <qemu-stable@nongnu.org> Reported-by: Jintack Lim <jintack@cs.columbia.edu> Tested-by: Jintack Lim <jintack@cs.columbia.edu> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-05-23intel-iommu: trace domain id during page walkPeter Xu2-7/+11
This patch only modifies the trace points. Previously we were tracing page walk levels. They are redundant since we have page mask (size) already. Now we trace something much more useful which is the domain ID of the page walking. That can be very useful when we trace more than one devices on the same system, so that we can know which map is for which domain. CC: QEMU Stable <qemu-stable@nongnu.org> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-05-23intel-iommu: pass in address space when page walkPeter Xu1-0/+4
We pass in the VTDAddressSpace too. It'll be used in the follow up patches. CC: QEMU Stable <qemu-stable@nongnu.org> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-05-23intel-iommu: introduce vtd_page_walk_infoPeter Xu1-32/+52
During the recursive page walking of IOVA page tables, some stack variables are constant variables and never changed during the whole page walking procedure. Isolate them into a struct so that we don't need to pass those contants down the stack every time and multiple times. CC: QEMU Stable <qemu-stable@nongnu.org> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-05-23intel-iommu: only do page walk for MAP notifiersPeter Xu1-5/+39
For UNMAP-only IOMMU notifiers, we don't need to walk the page tables. Fasten that procedure by skipping the page table walk. That should boost performance for UNMAP-only notifiers like vhost. CC: QEMU Stable <qemu-stable@nongnu.org> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-05-23intel-iommu: add iommu lockPeter Xu1-9/+47
SECURITY IMPLICATION: this patch fixes a potential race when multiple threads access the IOMMU IOTLB cache. Add a per-iommu big lock to protect IOMMU status. Currently the only thing to be protected is the IOTLB/context cache, since that can be accessed even without BQL, e.g., in IO dataplane. Note that we don't need to protect device page tables since that's fully controlled by the guest kernel. However there is still possibility that malicious drivers will program the device to not obey the rule. In that case QEMU can't really do anything useful, instead the guest itself will be responsible for all uncertainties. CC: QEMU Stable <qemu-stable@nongnu.org> Reported-by: Fam Zheng <famz@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-05-23intel-iommu: remove IntelIOMMUNotifierNodePeter Xu1-30/+11
That is not really necessary. Removing that node struct and put the list entry directly into VTDAddressSpace. It simplfies the code a lot. Since at it, rename the old notifiers_list into vtd_as_with_notifiers. CC: QEMU Stable <qemu-stable@nongnu.org> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-05-23intel-iommu: send PSI always even if across PDEsPeter Xu1-12/+30
SECURITY IMPLICATION: without this patch, any guest with both assigned device and a vIOMMU might encounter stale IO page mappings even if guest has already unmapped the page, which may lead to guest memory corruption. The stale mappings will only be limited to the guest's own memory range, so it should not affect the host memory or other guests on the host. During IOVA page table walking, there is a special case when the PSI covers one whole PDE (Page Directory Entry, which contains 512 Page Table Entries) or more. In the past, we skip that entry and we don't notify the IOMMU notifiers. This is not correct. We should send UNMAP notification to registered UNMAP notifiers in this case. For UNMAP only notifiers, this might cause IOTLBs cached in the devices even if they were already invalid. For MAP/UNMAP notifiers like vfio-pci, this will cause stale page mappings. This special case doesn't trigger often, but it is very easy to be triggered by nested device assignments, since in that case we'll possibly map the whole L2 guest RAM region into the device's IOVA address space (several GBs at least), which is far bigger than normal kernel driver usages of the device (tens of MBs normally). Without this patch applied to L1 QEMU, nested device assignment to L2 guests will dump some errors like: qemu-system-x86_64: VFIO_MAP_DMA: -17 qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000, 0x7f89a920d000) = -17 (File exists) CC: QEMU Stable <qemu-stable@nongnu.org> Acked-by: Jason Wang <jasowang@redhat.com> [peterx: rewrite the commit message] Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-05-23nvdimm: fix typo in label-size definitionRoss Zwisler1-1/+1
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Fixes: commit da6789c27c2e ("nvdimm: add a macro for property "label-size"") Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Cc: Haozhong Zhang <haozhong.zhang@intel.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-05-23hw/virtio: Fix brace Werror with clang 6.0.0Richard Henderson1-1/+1
The warning is hw/virtio/vhost-user.c:1319:26: error: suggest braces around initialization of subobject [-Werror,-Wmissing-braces] VhostUserMsg msg = { 0 }; ^ {} While the original code is correct, and technically exactly correct as per ISO C89, both GCC and Clang support plain empty set of braces as an extension. Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-05-23vhost-user+postcopy: Use qemu_set_nonblockDr. David Alan Gilbert1-1/+1
Use qemu_set_nonblock rather than a simple fcntl; cleaner and I have no reason to change other flags. Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-05-23virtio: support setting memory region based host notifierTiwei Bie2-0/+35
This patch introduces the support for setting memory region based host notifiers for virtio device. This is helpful when using a hardware accelerator for a virtio device, because hardware heavily depends on the notification, this will allow the guest driver in the VM to notify the hardware directly. Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-05-23vhost-user: support receiving file descriptors in slave_readTiwei Bie1-1/+40
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-05-23x86/cpu: use standard-headers/asm-x86.kvm_para.hMichael S. Tsirkin1-1/+1
Switch to the header we imported from Linux, this allows us to drop a hack in kvm_i386.h. More code will be dropped in the next patch. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-05-23vhost: add trace for IOTLB missPeter Xu2-0/+8
Add some trace points for IOTLB translation for vhost. After vhost-user is setup, the only IO path that QEMU will participate should be the IOMMU translation, so it'll be good we can track this with explicit timestamps when needed to see how long time we take to do the translation, and whether there's anything stuck inside. It might be useful for triaging vhost-user problems. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-05-23virtio-balloon: add hugetlb page allocation countsJonathan Helman1-0/+2
qemu should read and report hugetlb page allocation counts exported in the following kernel patch: commit 4c3ca37c4a4394978fd0f005625f6064ed2b9a64 Author: Jonathan Helman <jonathan.helman@oracle.com> Date: Mon Mar 19 11:00:35 2018 -0700 virtio_balloon: export hugetlb page allocation counts Export the number of successful and failed hugetlb page allocations via the virtio balloon driver. These 2 counts come directly from the vm_events HTLB_BUDDY_PGALLOC and HTLB_BUDDY_PGALLOC_FAIL. Signed-off-by: Jonathan Helman <jonathan.helman@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com>
2018-05-23hw/pci-host/q35: Replace hardcoded value with macroZihan Yang1-6/+11
During smram region initialization some addresses are hardcoded, replace them with macro to be more clear to readers. Previous patch forgets about one value and exceeds the line limit of 90 characters. The v2 breaks a few long lines Signed-off-by: Zihan Yang <whois.zihan.yang@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-05-22xen_disk: be consistent with use of xendev and blkdev->xendevPaul Durrant1-44/+46
Certain functions in xen_disk are called with a pointer to xendev (struct XenDevice *). They then use container_of() to acces the surrounding blkdev (struct XenBlkDev) but then in various places use &blkdev->xendev when use of the original xendev pointer is shorter to express and clearly equivalent. This patch is a purely cosmetic patch which makes sure there is a xendev pointer on stack for any function where the pointer is need on multiple occasions modified those functions to use it consistently. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com> Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
2018-05-22xen_disk: use a single entry iovecPaul Durrant1-55/+21
Since xen_disk now always copies data to and from a guest there is no need to maintain a vector entry corresponding to every page of a request. This means there is less per-request state to maintain so the ioreq structure can shrink significantly. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Anthony Perard <anthony.perard@citrix.com> Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
2018-05-22xen_backend: make the xen_feature_grant_copy flag privatePaul Durrant1-1/+1
There is no longer any use of this flag outside of the xen_backend code. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Anthony Perard <anthony.perard@citrix.com> Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
2018-05-22xen_disk: remove use of grant map/unmapPaul Durrant1-327/+25
Now that the (native or emulated) xen_be_copy_grant_refs() helper is always available, the xen_disk code can be significantly simplified by removing direct use of grant map and unmap operations. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Anthony Perard <anthony.perard@citrix.com> Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>