aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2018-06-04migration: discard non-migratable RAMBlocksCédric Le Goater1-0/+4
On the POWER9 processor, the XIVE interrupt controller can control interrupt sources using MMIO to trigger events, to EOI or to turn off the sources. Priority management and interrupt acknowledgment is also controlled by MMIO in the presenter sub-engine. These MMIO regions are exposed to guests in QEMU with a set of 'ram device' memory mappings, similarly to VFIO, and the VMAs are populated dynamically with the appropriate pages using a fault handler. But, these regions are an issue for migration. We need to discard the associated RAMBlocks from the RAM state on the source VM and let the destination VM rebuild the memory mappings on the new host in the post_load() operation just before resuming the system. To achieve this goal, the following introduces a new RAMBlock flag RAM_MIGRATABLE which is updated in the vmstate_register_ram() and vmstate_unregister_ram() routines. This flag is then used by the migration to identify RAMBlocks to discard on the source. Some checks are also performed on the destination to make sure nothing invalid was sent. This change impacts the boston, malta and jazz mips boards for which migration compatibility is broken. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-06-04migration: introduce decompress-error-checkXiao Guangrong1-1/+6
QEMU 3.0 enables strict check for compression & decompression to make the migration more robust, that depends on the source to fix the internal design which triggers the unexpected error conditions To make it work for migrating old version QEMU to 2.13 QEMU, we introduce this parameter to disable the error check on the destination which is the default behavior of the machine type which is older than 2.13, alternately, the strict check can be enabled explicitly as followings: -M pc-q35-2.11 -global migration.decompress-error-check=true Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-06-01sandbox: disable -sandbox if CONFIG_SECCOMP undefinedYi Min Zhao1-1/+2
If CONFIG_SECCOMP is undefined, the option 'elevatedprivileges' remains compiled. This would make libvirt set the corresponding capability and then trigger failure during guest startup. This patch moves the code regarding seccomp command line options to qemu-seccomp.c file and wraps qemu_opts_foreach finding sandbox option with CONFIG_SECCOMP. Because parse_sandbox() is moved into qemu-seccomp.c file, change seccomp_start() to static function. Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Tested-by: Ján Tomko <jtomko@redhat.com> Acked-by: Eduardo Otubo <otubo@redhat.com>
2018-05-31xen-hvm: try to use xenforeignmemory_map_resource() to map ioreq pagesPaul Durrant1-0/+16
Xen 4.11 has a new API to directly map guest resources. Among the resources that can be mapped using this API are ioreq pages. This patch modifies QEMU to attempt to use the new API should it exist, falling back to the previous mechanism if it is unavailable. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com> Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
2018-05-31xen/hvm: correct reporting of modified memory under physmap during migrationIgor Druzhinin1-3/+2
When global_log_dirty is enabled VRAM modification tracking never worked correctly. The address that is passed to xen_hvm_modified_memory() is not the effective PFN but RAM block address which is not the same for VRAM. We need to make a translation for this address into PFN using physmap. Since there is no way to access physmap properly inside xen_hvm_modified_memory() let's make it a global structure. Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com> Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
2018-05-31vmstate.h: Provide VMSTATE_BOOL_SUB_ARRAYPeter Maydell1-0/+3
Provide a VMSTATE_BOOL_SUB_ARRAY to go with VMSTATE_UINT8_SUB_ARRAY and friends. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20180521140402.23318-23-peter.maydell@linaro.org
2018-05-31Make address_space_get_iotlb_entry() take a MemTxAttrs argumentPeter Maydell1-1/+1
As part of plumbing MemTxAttrs down to the IOMMU translate method, add MemTxAttrs as an argument to address_space_get_iotlb_entry(). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20180521140402.23318-12-peter.maydell@linaro.org
2018-05-31Make flatview_translate() take a MemTxAttrs argumentPeter Maydell1-3/+4
As part of plumbing MemTxAttrs down to the IOMMU translate method, add MemTxAttrs as an argument to flatview_translate(); all its callers now have attrs available. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20180521140402.23318-11-peter.maydell@linaro.org
2018-05-31Make MemoryRegion valid.accepts callback take a MemTxAttrs argumentPeter Maydell1-1/+2
As part of plumbing MemTxAttrs down to the IOMMU translate method, add MemTxAttrs as an argument to the MemoryRegion valid.accepts callback. We'll need this for subpage_accepts(). We could take the approach we used with the read and write callbacks and add new a new _with_attrs version, but since there are so few implementations of the accepts hook we just change them all. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20180521140402.23318-9-peter.maydell@linaro.org
2018-05-31Make memory_region_access_valid() take a MemTxAttrs argumentPeter Maydell1-1/+2
As part of plumbing MemTxAttrs down to the IOMMU translate method, add MemTxAttrs as an argument to memory_region_access_valid(). Its callers either have an attrs value to hand, or don't care and can use MEMTXATTRS_UNSPECIFIED. The callsite in flatview_access_valid() is part of a recursive loop flatview_access_valid() -> memory_region_access_valid() -> subpage_accepts() -> flatview_access_valid(); we make it pass MEMTXATTRS_UNSPECIFIED for now, until the next several commits have plumbed an attrs parameter through the rest of the loop and we can add an attrs parameter to flatview_access_valid(). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20180521140402.23318-8-peter.maydell@linaro.org
2018-05-31Make address_space_access_valid() take a MemTxAttrs argumentPeter Maydell2-2/+5
As part of plumbing MemTxAttrs down to the IOMMU translate method, add MemTxAttrs as an argument to address_space_access_valid(). Its callers either have an attrs value to hand, or don't care and can use MEMTXATTRS_UNSPECIFIED. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20180521140402.23318-6-peter.maydell@linaro.org
2018-05-31Make address_space_map() take a MemTxAttrs argumentPeter Maydell2-2/+4
As part of plumbing MemTxAttrs down to the IOMMU translate method, add MemTxAttrs as an argument to address_space_map(). Its callers either have an attrs value to hand, or don't care and can use MEMTXATTRS_UNSPECIFIED. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20180521140402.23318-5-peter.maydell@linaro.org
2018-05-31Make address_space_translate{, _cached}() take a MemTxAttrs argumentPeter Maydell1-1/+3
As part of plumbing MemTxAttrs down to the IOMMU translate method, add MemTxAttrs as an argument to address_space_translate() and address_space_translate_cached(). Callers either have an attrs value to hand, or don't care and can use MEMTXATTRS_UNSPECIFIED. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20180521140402.23318-4-peter.maydell@linaro.org
2018-05-31Make tb_invalidate_phys_addr() take a MemTxAttrs argumentPeter Maydell1-2/+3
As part of plumbing MemTxAttrs down to the IOMMU translate method, add MemTxAttrs as an argument to tb_invalidate_phys_addr(). Its callers either have an attrs value to hand, or don't care and can use MEMTXATTRS_UNSPECIFIED. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20180521140402.23318-3-peter.maydell@linaro.org
2018-05-31memory.h: Improve IOMMU related documentationPeter Maydell1-10/+95
Add more detail to the documentation for memory_region_init_iommu() and other IOMMU-related functions and data structures. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Message-id: 20180521140402.23318-2-peter.maydell@linaro.org
2018-05-31tcg: Fix helper function vs host abi for float16Richard Henderson1-1/+1
Depending on the host abi, float16, aka uint16_t, values are passed and returned either zero-extended in the host register or with garbage at the top of the host register. The tcg code generator has so far been assuming garbage, as that matches the x86 abi, but this is incorrect for other host abis. Further, target/arm has so far been assuming zero-extended results, so that it may store the 16-bit value into a 32-bit slot with the high 16-bits already clear. Rectify both problems by mapping "f16" in the helper definition to uint32_t instead of (a typedef for) uint16_t. This forces the host compiler to assume garbage in the upper 16 bits on input and to zero-extend the result on output. Cc: qemu-stable@nongnu.org Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Tested-by: Laurent Desnogues <laurent.desnogues@gmail.com> Message-id: 20180522175629.24932-1-richard.henderson@linaro.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-05-31Merge remote-tracking branch 'remotes/ehabkost/tags/numa-next-pull-request' ↵Peter Maydell3-0/+4
into staging NUMA queue, 2018-05-30 * New command-line option: --preconfig This option allows pausing QEMU and allow the configuration using QMP commands before running board initialization code. * New QMP set-numa-node, now made possible because of --preconfig * Small update on -numa error messages # gpg: Signature made Thu 31 May 2018 00:02:59 BST # gpg: using RSA key 2807936F984DC5A6 # gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>" # Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6 * remotes/ehabkost/tags/numa-next-pull-request: tests: functional tests for QMP command set-numa-node qmp: add set-numa-node command qmp: permit query-hotpluggable-cpus in preconfig state tests: extend qmp test with preconfig checks cli: add --preconfig option tests: qapi-schema tests for allow-preconfig qapi: introduce new cmd option "allow-preconfig" hmp: disable monitor in preconfig state qapi: introduce preconfig runstate numa: split out NumaOptions parsing into set_numa_options() numa: postpone options post-processing till machine_run_board_init() numa: clarify error message when node index is out of range in -numa dist, ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-05-30cli: add --preconfig optionIgor Mammedov1-0/+1
This option allows pausing QEMU in the new RUN_STATE_PRECONFIG state, allowing the configuration of QEMU from QMP before the machine jumps into board initialization code of machine_run_board_init() The intent is to allow management to query machine state and additionally configure it using previous query results within one QEMU instance (i.e. eliminate the need to start QEMU twice, 1st to query board specific parameters and 2nd for actual VM start using query results for additional parameters). The new option complements -S option and could be used with or without it. The difference is that -S pauses QEMU when the machine is completely initialized with all devices wired up and ready to execute guest code (QEMU needs only to unpause VCPUs to let guest execute its code), while the "preconfig" option pauses QEMU early before board specific init callback (machine_run_board_init) is executed and allows the configuration of machine parameters which will be used by board init code. When early introspection/configuration is done, command 'exit-preconfig' should be used to exit RUN_STATE_PRECONFIG and transition to the next requested state (i.e. if -S is used then QEMU will pause the second time when board/device initialization is completed or start guest execution if -S isn't provided on CLI) PS: Initially 'preconfig' is planned to be used for configuring numa topology depending on board specified possible cpus layout. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <1526059483-42847-1-git-send-email-imammedo@redhat.com> [ehabkost: Changed "since 2.13" to "since 3.0"] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-05-30qapi: introduce new cmd option "allow-preconfig"Igor Mammedov1-0/+1
New option will be used to allow commands, which are prepared/need to run, during preconfig state. Other commands that should be able to run in preconfig state, should be amended to not expect machine in initialized state or deal with it. For compatibility reasons, commands that don't use new flag 'allow-preconfig' explicitly are not permitted to run in preconfig state but allowed in all other states like they used to be. Within this patch allow following commands in preconfig state: qmp_capabilities query-qmp-schema query-commands query-command-line-options query-status exit-preconfig to allow qmp connection, basic introspection and moving to the next state. PS: set-numa-node and query-hotpluggable-cpus will be enabled later in a separate patches. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1526057503-39287-1-git-send-email-imammedo@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [ehabkost: Changed "since 2.13" to "since 3.0"] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-05-30numa: split out NumaOptions parsing into set_numa_options()Igor Mammedov1-0/+1
it will allow to reuse set_numa_options() for parsing configuration commands received via QMP interface Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1525423069-61903-3-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-05-30numa: postpone options post-processing till machine_run_board_init()Igor Mammedov1-0/+1
in preparation for numa options to being handled via QMP before machine_run_board_init(), move final numa configuration checks and processing to machine_run_board_init() so it could take into account both CLI (via parse_numa_opts()) and QMP input Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <1525423069-61903-2-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-05-30job: Add error message for failing jobsKevin Wolf1-1/+6
So far we relied on job->ret and strerror() to produce an error message for failed jobs. Not surprisingly, this tends to result in completely useless messages. This adds a Job.error field that can contain an error string for a failing job, and a parameter to job_completed() that sets the field. As a default, if NULL is passed, we continue to use strerror(job->ret). All existing callers are changed to pass NULL. They can be improved in separate patches. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com>
2018-05-25gdbstub: Clarify what gdb_handlesig() is doingPeter Maydell1-0/+15
gdb_handlesig()'s behaviour is not entirely obvious at first glance. Add a doc comment for it, and also add a comment explaining why it's ok for gdb_do_syscallv() to ignore gdb_handlesig()'s return value. (Coverity complains about this: CID 1390850.) Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <20180515181958.25837-1-peter.maydell@linaro.org> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2018-05-24linux-user: Assert on bad type in thunk_type_align() and thunk_type_size()Peter Maydell1-2/+2
In thunk_type_align() and thunk_type_size() we currently return -1 if the value at the type_ptr isn't one of the TYPE_* values we understand. However, this should never happen, and if it does then the calling code will go confusingly wrong because none of the callsites try to handle an error return. Switch to an assertion instead, so that if this does somehow happen we'll have a nice clear backtrace of what happened rather than a weird crash or misbehaviour. This also silences various Coverity complaints about not handling the negative return value (CID 1005735, 1005736, 1005738, 1390582). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20180514174616.19601-1-peter.maydell@linaro.org> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2018-05-24Merge remote-tracking branch 'remotes/kraxel/tags/vga-20180524-pull-request' ↵Peter Maydell1-0/+69
into staging vga: catch depth 0 hw/display: add new bochs-display device some cleanups. # gpg: Signature made Thu 24 May 2018 16:45:46 BST # gpg: using RSA key 4CB6D8EED3E87138 # gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>" # gpg: aka "Gerd Hoffmann <gerd@kraxel.org>" # gpg: aka "Gerd Hoffmann (private) <kraxel@gmail.com>" # Primary key fingerprint: A032 8CFF B93A 17A7 9901 FE7D 4CB6 D8EE D3E8 7138 * remotes/kraxel/tags/vga-20180524-pull-request: MAINTAINERS: add vga entries bochs-display: add pcie support bochs-display: add dirty tracking support hw/display: add new bochs-display device vga-pci: use PCI_VGA_MMIO_SIZE vga: move bochs vbe defines to header file vga: catch depth 0 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-05-24Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into stagingPeter Maydell9-10/+276
pc, pci, virtio, vhost: fixes, features Beginning of merging vDPA, new PCI ID, a new virtio balloon stat, intel iommu rework fixing a couple of security problems (no CVEs yet), fixes all over the place. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Wed 23 May 2018 15:41:32 BST # gpg: using RSA key 281F0DB8D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: (28 commits) intel-iommu: rework the page walk logic util: implement simple iova tree intel-iommu: trace domain id during page walk intel-iommu: pass in address space when page walk intel-iommu: introduce vtd_page_walk_info intel-iommu: only do page walk for MAP notifiers intel-iommu: add iommu lock intel-iommu: remove IntelIOMMUNotifierNode intel-iommu: send PSI always even if across PDEs nvdimm: fix typo in label-size definition contrib/vhost-user-blk: enable protocol feature for vhost-user-blk hw/virtio: Fix brace Werror with clang 6.0.0 libvhost-user: Send messages with no data vhost-user+postcopy: Use qemu_set_nonblock virtio: support setting memory region based host notifier vhost-user: support receiving file descriptors in slave_read vhost-user: add Net prefix to internal state structure linux-headers: add kvm header for mips linux-headers: add unistd.h on all arches update-linux-headers.sh: unistd.h, kvm consistency ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-05-24Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into stagingPeter Maydell4-464/+600
Block layer patches: - Generic background jobs - qemu-iotests fixes for NFS and the 'migration' group - sheepdog: Minor code simplification # gpg: Signature made Wed 23 May 2018 13:33:49 BST # gpg: using RSA key 7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (46 commits) qemu-iotests: Test job-* with block jobs iotests: Move qmp_to_opts() to VM blockjob: Remove BlockJob.driver job: Add query-jobs QMP command job: Add lifecycle QMP commands job: Add JOB_STATUS_CHANGE QMP event job: Introduce qapi/job.json job: Move progress fields to Job job: Add job_transition_to_ready() job: Add job_is_ready() job: Add job_dismiss() job: Add job_yield() block: Cancel job in bdrv_close_all() callers job: Move completion and cancellation to Job job: Move transactions to Job job: Switch transactions to JobTxn job: Move job_finish_sync() to Job job: Move .complete callback to Job job: Add job_drain() job: Convert block_job_cancel_async() to Job ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-05-24Merge remote-tracking branch ↵Peter Maydell2-3/+48
'remotes/sstabellini-http/tags/xen-20180522-tag' into staging Xen 2018/05/22 # gpg: Signature made Tue 22 May 2018 19:44:06 BST # gpg: using RSA key 894F8F4870E1AE90 # gpg: Good signature from "Stefano Stabellini <stefano.stabellini@eu.citrix.com>" # gpg: aka "Stefano Stabellini <sstabellini@kernel.org>" # Primary key fingerprint: D04E 33AB A51F 67BA 07D3 0AEA 894F 8F48 70E1 AE90 * remotes/sstabellini-http/tags/xen-20180522-tag: xen_disk: be consistent with use of xendev and blkdev->xendev xen_disk: use a single entry iovec xen_backend: make the xen_feature_grant_copy flag private xen_disk: remove use of grant map/unmap xen_backend: add an emulation of grant copy xen: remove other open-coded use of libxengnttab xen_disk: remove open-coded use of libxengnttab xen_backend: add grant table helpers xen: add a meaningful declaration of grant_copy_segment into xen_common.h checkpatch: generalize xen handle matching in the list of types xen-hvm: create separate function for ioreq server initialization xen_pt: Present the size of 64 bit BARs correctly configure: Add explanation for --enable-xen-pci-passthrough xen/pt: use address_space_memory object for memory region hooks xen-pvdevice: Introduce a simplistic xen-pvdevice save state Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-05-24vga: move bochs vbe defines to header fileGerd Hoffmann1-0/+69
Create a new header file, move the bochs vbe dispi interface defines to it, so they can be used outside vga code. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-id: 20180522165058.15404-2-kraxel@redhat.com
2018-05-23intel-iommu: rework the page walk logicPeter Xu1-0/+2
This patch fixes a potential small window that the DMA page table might be incomplete or invalid when the guest sends domain/context invalidations to a device. This can cause random DMA errors for assigned devices. This is a major change to the VT-d shadow page walking logic. It includes but is not limited to: - For each VTDAddressSpace, now we maintain what IOVA ranges we have mapped and what we have not. With that information, now we only send MAP or UNMAP when necessary. Say, we don't send MAP notifies if we know we have already mapped the range, meanwhile we don't send UNMAP notifies if we know we never mapped the range at all. - Introduce vtd_sync_shadow_page_table[_range] APIs so that we can call in any places to resync the shadow page table for a device. - When we receive domain/context invalidation, we should not really run the replay logic, instead we use the new sync shadow page table API to resync the whole shadow page table without unmapping the whole region. After this change, we'll only do the page walk once for each domain invalidations (before this, it can be multiple, depending on number of notifiers per address space). While at it, the page walking logic is also refactored to be simpler. CC: QEMU Stable <qemu-stable@nongnu.org> Reported-by: Jintack Lim <jintack@cs.columbia.edu> Tested-by: Jintack Lim <jintack@cs.columbia.edu> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-05-23util: implement simple iova treePeter Xu1-0/+134
Introduce a simplest iova tree implementation based on GTree. CC: QEMU Stable <qemu-stable@nongnu.org> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-05-23intel-iommu: only do page walk for MAP notifiersPeter Xu1-0/+2
For UNMAP-only IOMMU notifiers, we don't need to walk the page tables. Fasten that procedure by skipping the page table walk. That should boost performance for UNMAP-only notifiers like vhost. CC: QEMU Stable <qemu-stable@nongnu.org> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-05-23intel-iommu: add iommu lockPeter Xu1-0/+6
SECURITY IMPLICATION: this patch fixes a potential race when multiple threads access the IOMMU IOTLB cache. Add a per-iommu big lock to protect IOMMU status. Currently the only thing to be protected is the IOTLB/context cache, since that can be accessed even without BQL, e.g., in IO dataplane. Note that we don't need to protect device page tables since that's fully controlled by the guest kernel. However there is still possibility that malicious drivers will program the device to not obey the rule. In that case QEMU can't really do anything useful, instead the guest itself will be responsible for all uncertainties. CC: QEMU Stable <qemu-stable@nongnu.org> Reported-by: Fam Zheng <famz@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-05-23intel-iommu: remove IntelIOMMUNotifierNodePeter Xu1-7/+2
That is not really necessary. Removing that node struct and put the list entry directly into VTDAddressSpace. It simplfies the code a lot. Since at it, rename the old notifiers_list into vtd_as_with_notifiers. CC: QEMU Stable <qemu-stable@nongnu.org> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-05-23nvdimm: fix typo in label-size definitionRoss Zwisler1-1/+1
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Fixes: commit da6789c27c2e ("nvdimm: add a macro for property "label-size"") Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Cc: Haozhong Zhang <haozhong.zhang@intel.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-05-23virtio: support setting memory region based host notifierTiwei Bie2-0/+4
This patch introduces the support for setting memory region based host notifiers for virtio device. This is helpful when using a hardware accelerator for a virtio device, because hardware heavily depends on the notification, this will allow the guest driver in the VM to notify the hardware directly. Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-05-23blockjob: Remove BlockJob.driverKevin Wolf1-3/+0
BlockJob.driver is redundant with Job.driver and only used in very few places any more. Remove it. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-05-23job: Add query-jobs QMP commandKevin Wolf1-0/+3
This adds a minimal query-jobs implementation that shouldn't pose many design questions. It can later be extended to expose more information, and especially job-specific information. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-05-23job: Move progress fields to JobKevin Wolf2-25/+28
BlockJob has fields .offset and .len, which are actually misnomers today because they are no longer tied to block device sizes, but just progress counters. As such they make a lot of sense in generic Jobs. This patch moves the fields to Job and renames them to .progress_current and .progress_total to describe their function better. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-05-23job: Add job_transition_to_ready()Kevin Wolf3-11/+9
The transition to the READY state was still performed in the BlockJob layer, in the same function that sent the BLOCK_JOB_READY QMP event. This patch brings the state transition to the Job layer and implements the QMP event using a notifier called from the Job layer, like we already do for other events related to state transitions. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-05-23job: Add job_is_ready()Kevin Wolf2-5/+3
Instead of having a 'bool ready' in BlockJob, add a function that derives its value from the job status. At the same time, this fixes the behaviour to match what the QAPI documentation promises for query-block-job: 'true if the job may be completed'. When the ready flag was introduced in commit ef6dbf1e46e, the flag never had to be reset to match the description because after being ready, the jobs would immediately complete and disappear. Job transactions and manual job finalisation were introduced only later. With these changes, jobs may stay around even after having completed (and they are not ready to be completed a second time), however their patches forgot to reset the ready flag. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-05-23job: Add job_dismiss()Kevin Wolf2-10/+6
This moves block_job_dismiss() to the Job layer. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-05-23job: Add job_yield()Kevin Wolf2-10/+7
This moves block_job_yield() to the Job layer. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-05-23job: Move completion and cancellation to JobKevin Wolf3-84/+57
This moves the top-level job completion and cancellation functions from BlockJob to Job. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-05-23job: Move transactions to JobKevin Wolf3-73/+62
This moves the logic that implements job transactions from BlockJob to Job. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-05-23job: Switch transactions to JobTxnKevin Wolf4-8/+10
This doesn't actually move any transaction code to Job yet, but it renames the type for transactions from BlockJobTxn to JobTxn and makes them contain Jobs rather than BlockJobs Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-05-23job: Move job_finish_sync() to JobKevin Wolf1-0/+9
block_job_finish_sync() doesn't contain anything block job specific any more, so it can be moved to Job. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-05-23job: Move .complete callback to JobKevin Wolf3-16/+8
This moves the .complete callback that tells a READY job to complete from BlockJobDriver to JobDriver. The wrapper function job_complete() doesn't require anything block job specific any more and can be moved to Job. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-05-23job: Add job_drain()Kevin Wolf2-0/+25
block_job_drain() contains a blk_drain() call which cannot be moved to Job, so add a new JobDriver callback JobDriver.drain which has a common implementation for all BlockJobs. In addition to this we keep the existing BlockJobDriver.drain callback that is called by the common drain implementation for all block jobs. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-05-23job: Convert block_job_cancel_async() to JobKevin Wolf2-6/+6
block_job_cancel_async() did two things that were still block job specific: * Setting job->force. This field makes sense on the Job level, so we can just move it. While at it, rename it to job->force_cancel to make its purpose more obvious. * Resetting the I/O status. This can't be moved because generic Jobs don't have an I/O status. What the function really implements is a user resume, except without entering the coroutine. Consequently, it makes sense to call the .user_resume driver callback here which already resets the I/O status. The old block_job_cancel_async() has two separate if statements that check job->iostatus != BLOCK_DEVICE_IO_STATUS_OK and job->user_paused. However, the former condition always implies the latter (as is asserted in block_job_iostatus_reset()), so changing the explicit call of block_job_iostatus_reset() on the former condition with the .user_resume callback on the latter condition is equivalent and doesn't need to access any BlockJob specific state. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>