aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2021-06-21Merge remote-tracking branch ↵Peter Maydell10-71/+497
'remotes/ehabkost-gl/tags/x86-next-pull-request' into staging x86 queue, 2021-06-18 Features: * Add ratelimit for bus locks acquired in guest (Chenyi Qiang) Documentation: * SEV documentation updates (Tom Lendacky) * Add a table showing x86-64 ABI compatibility levels (Daniel P. Berrangé) Automated changes: * Update Linux headers to 5.13-rc4 (Eduardo Habkost) # gpg: Signature made Fri 18 Jun 2021 20:51:26 BST # gpg: using RSA key 5A322FD5ABC4D3DBACCFD1AA2807936F984DC5A6 # gpg: issuer "ehabkost@redhat.com" # gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>" [full] # Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6 * remotes/ehabkost-gl/tags/x86-next-pull-request: scripts: helper to generate x86_64 CPU ABI compat info docs: add a table showing x86-64 ABI compatibility levels docs/interop/firmware.json: Add SEV-ES support docs: Add SEV-ES documentation to amd-memory-encryption.txt doc: Fix some mistakes in the SEV documentation i386: Add ratelimit for bus locks acquired in guest Update Linux headers to 5.13-rc4 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-06-18nbd/client-connection: add option for non-blocking connection attemptVladimir Sementsov-Ogievskiy1-1/+1
We'll need a possibility of non-blocking nbd_co_establish_connection(), so that it returns immediately, and it returns success only if a connections was previously established in background. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-30-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18nbd/client-connection: return only one io channelVladimir Sementsov-Ogievskiy1-2/+2
block/nbd doesn't need underlying sioc channel anymore. So, we can update nbd/client-connection interface to return only one top-most io channel, which is more straight forward. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-27-vsementsov@virtuozzo.com> [eblake: squash in Vladimir's fixes for uninit usage caught by clang] Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18nbd/client-connection: implement connection retryVladimir Sementsov-Ogievskiy1-0/+2
Add an option for a thread to retry connecting until it succeeds. We'll use nbd/client-connection both for reconnect and for initial connection in nbd_open(), so we need a possibility to use same NBDClientConnection instance to connect once in nbd_open() and then use retry semantics for reconnect. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210610100802.5888-21-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: grammar tweak] Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18nbd/client-connection: add possibility of negotiationVladimir Sementsov-Ogievskiy1-2/+7
Add arguments and logic to support nbd negotiation in the same thread after successful connection. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210610100802.5888-20-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18nbd: move connection code from block/nbd to nbd/client-connectionVladimir Sementsov-Ogievskiy1-0/+11
We now have bs-independent connection API, which consists of four functions: nbd_client_connection_new() nbd_client_connection_release() nbd_co_establish_connection() nbd_co_establish_connection_cancel() Move them to a separate file together with NBDClientConnection structure which becomes private to the new API. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-18-vsementsov@virtuozzo.com> [eblake: comment tweaks] Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18qemu-sockets: introduce socket_address_parse_named_fd()Vladimir Sementsov-Ogievskiy1-0/+11
Add function that transforms named fd inside SocketAddress structure into number representation. This way it may be then used in a context where current monitor is not available. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210610100802.5888-6-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: comment tweak] Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18co-queue: drop extra coroutine_fn marksVladimir Sementsov-Ogievskiy1-2/+4
qemu_co_queue_next() and qemu_co_queue_restart_all() just call aio_co_wake() which works well in non-coroutine context. So these functions can be called from non-coroutine context as well. And actually qemu_co_queue_restart_all() is called from nbd_cancel_in_flight(), which is called from non-coroutine context. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210610100802.5888-2-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18async: the main AioContext is only "current" if under the BQLPaolo Bonzini1-1/+4
If we want to wake up a coroutine from a worker thread, aio_co_wake() currently does not work. In that scenario, aio_co_wake() calls aio_co_enter(), but there is no current AioContext and therefore qemu_get_current_aio_context() returns the main thread. aio_co_wake() then attempts to call aio_context_acquire() instead of going through aio_co_schedule(). The default case of qemu_get_current_aio_context() was added to cover synchronous I/O started from the vCPU thread, but the main and vCPU threads are quite different. The main thread is an I/O thread itself, only running a more complicated event loop; the vCPU thread instead is essentially a worker thread that occasionally calls qemu_mutex_lock_iothread(). It is only in those critical sections that it acts as if it were the home thread of the main AioContext. Therefore, this patch detaches qemu_get_current_aio_context() from iothreads, which is a useless complication. The AioContext pointer is stored directly in the thread-local variable, including for the main loop. Worker threads (including vCPU threads) optionally behave as temporary home threads if they have taken the big QEMU lock, but if that is not the case they will always schedule coroutines on remote threads via aio_co_schedule(). With this change, the stub qemu_mutex_iothread_locked() must be changed from true to false. The previous value of true was needed because the main thread did not have an AioContext in the thread-local variable, but now it does have one. Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210609122234.544153-1-pbonzini@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Tested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [eblake: tweak commit message per Vladimir's review] Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-17i386: Add ratelimit for bus locks acquired in guestChenyi Qiang1-0/+8
A bus lock is acquired through either split locked access to writeback (WB) memory or any locked access to non-WB memory. It is typically >1000 cycles slower than an atomic operation within a cache and can also disrupts performance on other cores. Virtual Machines can exploit bus locks to degrade the performance of system. To address this kind of performance DOS attack coming from the VMs, bus lock VM exit is introduced in KVM and it can report the bus locks detected in guest. If enabled in KVM, it would exit to the userspace to let the user enforce throttling policies once bus locks acquired in VMs. The availability of bus lock VM exit can be detected through the KVM_CAP_X86_BUS_LOCK_EXIT. The returned bitmap contains the potential policies supported by KVM. The field KVM_BUS_LOCK_DETECTION_EXIT in bitmap is the only supported strategy at present. It indicates that KVM will exit to userspace to handle the bus locks. This patch adds a ratelimit on the bus locks acquired in guest as a mitigation policy. Introduce a new field "bus_lock_ratelimit" to record the limited speed of bus locks in the target VM. The user can specify it through the "bus-lock-ratelimit" as a machine property. In current implementation, the default value of the speed is 0 per second, which means no restrictions on the bus locks. As for ratelimit on detected bus locks, simply set the ratelimit interval to 1s and restrict the quota of bus lock occurence to the value of "bus_lock_ratelimit". A potential alternative is to introduce the time slice as a property which can help the user achieve more precise control. The detail of bus lock VM exit can be found in spec: https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20210521043820.29678-1-chenyi.qiang@intel.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2021-06-17Update Linux headers to 5.13-rc4Eduardo Habkost9-71/+489
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20210603191541.2862286-1-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2021-06-17Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into ↵Peter Maydell7-35/+70
staging * avoid deprecation warnings for SASL on macOS 10.11 or newer * fix -readconfig when config blocks have an id (like [chardev "qmp"]) * Error* initialization fixes * Improvements to ESP emulation (Mark) * Allow creating noreserve memory backends (David) * Improvements to query-memdev (David) * Bump compiler to C11 (Richard) * First round of SVM fixes from GSoC project (Lara) # gpg: Signature made Wed 16 Jun 2021 16:37:49 BST # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini-gitlab/tags/for-upstream: (45 commits) configure: Remove probe for _Static_assert qemu/compiler: Remove QEMU_GENERIC include/qemu/lockable: Use _Generic instead of QEMU_GENERIC util: Use unique type for QemuRecMutex in thread-posix.h util: Pass file+line to qemu_rec_mutex_unlock_impl util: Use real functions for thread-posix QemuRecMutex softfloat: Use _Generic instead of QEMU_GENERIC configure: Use -std=gnu11 target/i386: Added Intercept CR0 writes check target/i386: Added consistency checks for CR0 target/i386: Added consistency checks for VMRUN intercept and ASID target/i386: Refactored intercept checks into cpu_svm_has_intercept configure: map x32 to cpu_family x86_64 for meson hmp: Print "reserve" property of memory backends with "info memdev" qmp: Include "reserve" property of memory backends hmp: Print "share" property of memory backends with "info memdev" qmp: Include "share" property of memory backends qmp: Clarify memory backend properties returned via query-memdev hostmem: Wire up RAM_NORESERVE via "reserve" property util/mmap-alloc: Support RAM_NORESERVE via MAP_NORESERVE under Linux ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-06-17Merge remote-tracking branch 'remotes/rth-gitlab/tags/pull-c11-20210615' ↵Peter Maydell5-113/+61
into staging Change to -std=gnu11. Replace QEMU_GENERIC with _Generic. Remove configure detect of _Static_assert. # gpg: Signature made Wed 16 Jun 2021 02:32:32 BST # gpg: using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F # gpg: issuer "richard.henderson@linaro.org" # gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full] # Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F * remotes/rth-gitlab/tags/pull-c11-20210615: configure: Remove probe for _Static_assert qemu/compiler: Remove QEMU_GENERIC include/qemu/lockable: Use _Generic instead of QEMU_GENERIC util: Use unique type for QemuRecMutex in thread-posix.h util: Pass file+line to qemu_rec_mutex_unlock_impl util: Use real functions for thread-posix QemuRecMutex softfloat: Use _Generic instead of QEMU_GENERIC configure: Use -std=gnu11 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-06-16Merge remote-tracking branch ↵Peter Maydell2-0/+39
'remotes/pmaydell/tags/pull-target-arm-20210616' into staging target-arm queue: * hw/intc/arm_gicv3_cpuif: Tolerate spurious EOIR writes * handle some UNALLOCATED decode cases correctly rather than asserting * hw: virt: consider hw_compat_6_0 * hw/arm: add quanta-gbs-bmc machine * hw/intc/armv7m_nvic: Remove stale comment * target/arm: Fix mte page crossing test * hw/arm: quanta-q71l add pca954x muxes * target/arm: First few parts of MVE support # gpg: Signature made Wed 16 Jun 2021 14:34:49 BST # gpg: using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE # gpg: issuer "peter.maydell@linaro.org" # gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" [ultimate] # gpg: aka "Peter Maydell <pmaydell@gmail.com>" [ultimate] # gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" [ultimate] # Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE * remotes/pmaydell/tags/pull-target-arm-20210616: (25 commits) include/qemu/int128.h: Add function to create Int128 from int64_t bitops.h: Provide hswap32(), hswap64(), wswap64() swapping operations target/arm: Move expand_pred_b() data to vec_helper.c target/arm: Add framework for MVE decode target/arm: Implement MVE LETP insn target/arm: Implement MVE DLSTP target/arm: Implement MVE WLSTP insn target/arm: Implement MVE LCTP target/arm: Let vfp_access_check() handle late NOCP checks target/arm: Add handling for PSR.ECI/ICI target/arm: Handle VPR semantics in existing code target/arm: Enable FPSCR.QC bit for MVE target/arm: Provide and use H8 and H1_8 macros hw/arm: quanta-q71l add pca954x muxes hw/arm: gsj add pca9548 hw/arm: gsj add i2c comments target/arm: Fix mte page crossing test hw/intc/armv7m_nvic: Remove stale comment hw/arm: quanta-gbs-bmc add i2c comments hw/arm: add quanta-gbs-bmc machine ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-06-16include/qemu/int128.h: Add function to create Int128 from int64_tPeter Maydell1-0/+10
int128_make64() creates an Int128 from an unsigned 64 bit value; add a function int128_makes64() creating an Int128 from a signed 64 bit value. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20210614151007.4545-34-peter.maydell@linaro.org
2021-06-16bitops.h: Provide hswap32(), hswap64(), wswap64() swapping operationsPeter Maydell1-0/+29
Currently the ARM SVE helper code defines locally some utility functions for swapping 16-bit halfwords within 32-bit or 64-bit values and for swapping 32-bit words within 64-bit values, parallel to the byte-swapping bswap16/32/64 functions. We want these also for the ARM MVE code, and they're potentially generally useful for other targets, so move them to bitops.h. (We don't put them in bswap.h with the bswap* functions because they are implemented in terms of the rotate operations also defined in bitops.h, and including bitops.h from bswap.h seems better avoided.) Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20210614151007.4545-17-peter.maydell@linaro.org
2021-06-16configure: Remove probe for _Static_assertRichard Henderson1-11/+0
_Static_assert is part of C11, which is now required. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-Id: <20210614233143.1221879-9-richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-16qemu/compiler: Remove QEMU_GENERICRichard Henderson1-40/+0
All previous users now use C11 _Generic. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20210614233143.1221879-8-richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-16include/qemu/lockable: Use _Generic instead of QEMU_GENERICRichard Henderson1-48/+40
This is both more and less complicated than our expansion using __builtin_choose_expr and __builtin_types_compatible_p. The expansion through QEMU_MAKE_LOCKABLE_ doesn't work because we're not emumerating all of the types within the same _Generic, which results in errors about unhandled cases. We must also handle void* explicitly, so that the NULL constant can be used. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Tested-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20210614233143.1221879-7-richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-16util: Use unique type for QemuRecMutex in thread-posix.hRichard Henderson1-2/+8
We will shortly convert lockable.h to _Generic, and we cannot have two compatible types in the same expansion. Wrap QemuMutex in a struct, and unwrap in qemu-thread-posix.c. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-Id: <20210614233143.1221879-6-richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-16util: Pass file+line to qemu_rec_mutex_unlock_implRichard Henderson1-1/+9
Create macros for file+line expansion in qemu_rec_mutex_unlock like we have for qemu_mutex_unlock. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210614233143.1221879-5-richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-16util: Use real functions for thread-posix QemuRecMutexRichard Henderson3-13/+6
Move the declarations from thread-win32.h into thread.h and remove the macro redirection from thread-posix.h. This will be required by following cleanups. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-Id: <20210614233143.1221879-4-richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-15configure: Remove probe for _Static_assertRichard Henderson1-11/+0
_Static_assert is part of C11, which is now required. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20210614233143.1221879-9-richard.henderson@linaro.org>
2021-06-15qemu/compiler: Remove QEMU_GENERICRichard Henderson1-40/+0
All previous users now use C11 _Generic. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20210614233143.1221879-8-richard.henderson@linaro.org>
2021-06-15include/qemu/lockable: Use _Generic instead of QEMU_GENERICRichard Henderson1-48/+40
This is both more and less complicated than our expansion using __builtin_choose_expr and __builtin_types_compatible_p. The expansion through QEMU_MAKE_LOCKABLE_ doesn't work because we're not emumerating all of the types within the same _Generic, which results in errors about unhandled cases. We must also handle void* explicitly, so that the NULL constant can be used. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20210614233143.1221879-7-richard.henderson@linaro.org>
2021-06-15util: Use unique type for QemuRecMutex in thread-posix.hRichard Henderson1-2/+8
We will shortly convert lockable.h to _Generic, and we cannot have two compatible types in the same expansion. Wrap QemuMutex in a struct, and unwrap in qemu-thread-posix.c. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20210614233143.1221879-6-richard.henderson@linaro.org>
2021-06-15util: Pass file+line to qemu_rec_mutex_unlock_implRichard Henderson1-1/+9
Create macros for file+line expansion in qemu_rec_mutex_unlock like we have for qemu_mutex_unlock. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20210614233143.1221879-5-richard.henderson@linaro.org>
2021-06-15util: Use real functions for thread-posix QemuRecMutexRichard Henderson3-13/+6
Move the declarations from thread-win32.h into thread.h and remove the macro redirection from thread-posix.h. This will be required by following cleanups. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20210614233143.1221879-4-richard.henderson@linaro.org>
2021-06-15hostmem: Wire up RAM_NORESERVE via "reserve" propertyDavid Hildenbrand1-1/+1
Let's provide a way to control the use of RAM_NORESERVE via memory backends using the "reserve" property which defaults to true (old behavior). Only Linux currently supports clearing the flag (and support is checked at runtime, depending on the setting of "/proc/sys/vm/overcommit_memory"). Windows and other POSIX systems will bail out with "reserve=false". The target use case is virtio-mem, which dynamically exposes memory inside a large, sparse memory area to the VM. This essentially allows avoiding to set "/proc/sys/vm/overcommit_memory == 0") when using virtio-mem and also supporting hugetlbfs in the future. As really only Linux implements RAM_NORESERVE right now, let's expose the property only with CONFIG_LINUX. Setting the property to "false" will then only fail in corner cases -- for example on very old kernels or when memory overcommit was completely disabled by the admin. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com> for memory backend and machine core Cc: Markus Armbruster <armbru@redhat.com> Cc: Eric Blake <eblake@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210510114328.21835-11-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-15util/mmap-alloc: Support RAM_NORESERVE via MAP_NORESERVE under LinuxDavid Hildenbrand1-0/+3
Let's support RAM_NORESERVE via MAP_NORESERVE on Linux. The flag has no effect on most shared mappings - except for hugetlbfs and anonymous memory. Linux man page: "MAP_NORESERVE: Do not reserve swap space for this mapping. When swap space is reserved, one has the guarantee that it is possible to modify the mapping. When swap space is not reserved one might get SIGSEGV upon a write if no physical memory is available. See also the discussion of the file /proc/sys/vm/overcommit_memory in proc(5). In kernels before 2.6, this flag had effect only for private writable mappings." Note that the "guarantee" part is wrong with memory overcommit in Linux. Also, in Linux hugetlbfs is treated differently - we configure reservation of huge pages from the pool, not reservation of swap space (huge pages cannot be swapped). The rough behavior is [1]: a) !Hugetlbfs: 1) Without MAP_NORESERVE *or* with memory overcommit under Linux disabled ("/proc/sys/vm/overcommit_memory == 2"), the following accounting/reservation happens: For a file backed map SHARED or READ-only - 0 cost (the file is the map not swap) PRIVATE WRITABLE - size of mapping per instance For an anonymous or /dev/zero map SHARED - size of mapping PRIVATE READ-only - 0 cost (but of little use) PRIVATE WRITABLE - size of mapping per instance 2) With MAP_NORESERVE, no accounting/reservation happens. b) Hugetlbfs: 1) Without MAP_NORESERVE, huge pages are reserved. 2) With MAP_NORESERVE, no huge pages are reserved. Note: With "/proc/sys/vm/overcommit_memory == 0", we were already able to configure it for !hugetlbfs globally; this toggle now allows configuring it more fine-grained, not for the whole system. The target use case is virtio-mem, which dynamically exposes memory inside a large, sparse memory area to the VM. [1] https://www.kernel.org/doc/Documentation/vm/overcommit-accounting Reviewed-by: Peter Xu <peterx@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com> for memory backend and machine core Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210510114328.21835-10-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-15memory: Introduce RAM_NORESERVE and wire it up in qemu_ram_mmap()David Hildenbrand4-5/+23
Let's introduce RAM_NORESERVE, allowing mmap'ing with MAP_NORESERVE. The new flag has the following semantics: " RAM is mmap-ed with MAP_NORESERVE. When set, reserving swap space (or huge pages if applicable) is skipped: will bail out if not supported. When not set, the OS will do the reservation, if supported for the memory type. " Allow passing it into: - memory_region_init_ram_nomigrate() - memory_region_init_resizeable_ram() - memory_region_init_ram_from_file() ... and teach qemu_ram_mmap() and qemu_anon_ram_alloc() about the flag. Bail out if the flag is not supported, which is the case right now for both, POSIX and win32. We will add Linux support next and allow specifying RAM_NORESERVE via memory backends. The target use case is virtio-mem, which dynamically exposes memory inside a large, sparse memory area to the VM. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com> for memory backend and machine core Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210510114328.21835-9-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-15util/mmap-alloc: Pass flags instead of separate bools to qemu_ram_mmap()David Hildenbrand2-7/+27
Let's pass flags instead of bools to prepare for passing other flags and update the documentation of qemu_ram_mmap(). Introduce new QEMU_MAP_ flags that abstract the mmap() PROT_ and MAP_ flag handling and simplify it. We expose only flags that are currently supported by qemu_ram_mmap(). Maybe, we'll see qemu_mmap() in the future as well that can implement these flags. Note: We don't use MAP_ flags as some flags (e.g., MAP_SYNC) are only defined for some systems and we want to always be able to identify these flags reliably inside qemu_ram_mmap() -- for example, to properly warn when some future flags are not available or effective on a system. Also, this way we can simplify PROT_ handling as well. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com> for memory backend and machine core Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210510114328.21835-8-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-15softmmu/memory: Pass ram_flags to qemu_ram_alloc() and qemu_ram_alloc_internal()David Hildenbrand1-1/+1
Let's pass ram_flags to qemu_ram_alloc() and qemu_ram_alloc_internal(), preparing for passing additional flags. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com> for memory backend and machine core Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210510114328.21835-7-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-15softmmu/memory: Pass ram_flags to memory_region_init_ram_shared_nomigrate()David Hildenbrand1-12/+12
Let's forward ram_flags instead, renaming memory_region_init_ram_shared_nomigrate() into memory_region_init_ram_flags_nomigrate(). Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com> for memory backend and machine core Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210510114328.21835-6-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-15softmmu/memory: Pass ram_flags to qemu_ram_alloc_from_fd()David Hildenbrand2-11/+4
Let's pass in ram flags just like we do with qemu_ram_alloc_from_file(), to clean up and prepare for more flags. Simplify the documentation of passed ram flags: Looking at our documentation of RAM_SHARED and RAM_PMEM is sufficient, no need to be repetitive. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com> for memory backend and machine core Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210510114328.21835-5-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-15softmmu/physmem: Fix ram_block_discard_range() to handle shared anonymous memoryDavid Hildenbrand1-2/+2
We can create shared anonymous memory via "-object memory-backend-ram,share=on,..." which is, for example, required by PVRDMA for mremap() to work. Shared anonymous memory is weird, though. Instead of MADV_DONTNEED, we have to use MADV_REMOVE: MADV_DONTNEED will only remove / zap all relevant page table entries of the current process, the backend storage will not get removed, resulting in no reduced memory consumption and a repopulation of previous content on next access. Shared anonymous memory is internally really just shmem, but without a fd exposed. As we cannot use fallocate() without the fd to discard the backing storage, MADV_REMOVE gets the same job done without a fd as documented in "man 2 madvise". Removing backing storage implicitly invalidates all page table entries with relevant mappings - an additional MADV_DONTNEED is not required. Fixes: 06329ccecfa0 ("mem: add share parameter to memory-backend-ram") Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210406080126.24010-3-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-15esp: store lun coming from the MESSAGE OUT phasePaolo Bonzini1-0/+1
The LUN is selected with an IDENTIFY message, and persists until the next message out phase. Instead of passing it to do_busid_cmd, store it in ESPState. Because do_cmd can simply skip the message out phase if cmdfifo_cdb_offset is zero, it can now be used for the S without ATN cases as well. Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-15sysemu: Make TPM structures inaccessible if CONFIG_TPM is not definedStefan Berger2-1/+14
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210614191335.1968807-5-stefanb@linux.ibm.com> [PMD: Remove tpm_init() / tpm_cleanup() stubs] Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2021-06-15acpi: Eliminate all TPM related code if CONFIG_TPM is not setStefan Berger1-0/+4
Cc: M: Michael S. Tsirkin <mst@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210614191335.1968807-4-stefanb@linux.ibm.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2021-06-13tcg: Fix documentation for tcg_constant_* vs tcg_temp_free_*Richard Henderson1-1/+2
At some point during the development of tcg_constant_*, I changed my mind about whether such temps should be able to be passed to tcg_temp_free_*. The final version committed allows this, but the commentary was not updated to match. Fixes: c0522136adf Reported-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Luis Pires <luis.pires@eldorado.org.br> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-06-13tcg: Introduce tcg_remove_ops_afterRichard Henderson1-0/+10
Introduce a function to remove everything emitted since a given point. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-06-13tcg: Move tcg_init_ctx and tcg_ctx from accel/tcg/Richard Henderson1-1/+0
These variables belong to the jit side, not the user side. Since tcg_init_ctx is no longer used outside of tcg/, move the declaration to tcg-internal.h. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Luis Pires <luis.pires@eldorado.org.br> Suggested-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-06-13util/osdep: Add qemu_mprotect_rwRichard Henderson1-0/+1
For --enable-tcg-interpreter on Windows, we will need this. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Luis Pires <luis.pires@eldorado.org.br> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-06-11tcg: Move in_code_gen_buffer and tests to region.cRichard Henderson1-10/+1
Shortly, the full code_gen_buffer will only be visible to region.c, so move in_code_gen_buffer out-of-line. Move the debugging versions of tcg_splitwx_to_{rx,rw} to region.c as well, so that the compiler gets to see the implementation of in_code_gen_buffer. This leaves exactly one use of in_code_gen_buffer outside of region.c, in cpu_restore_state. Which, being on the exception path, is not performance critical. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Luis Pires <luis.pires@eldorado.org.br> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-06-11accel/tcg: Pass down max_cpus to tcg_initRichard Henderson1-1/+1
Start removing the include of hw/boards.h from tcg/. Pass down the max_cpus value from tcg_init_machine, where we have the MachineState already. Reviewed-by: Luis Pires <luis.pires@eldorado.org.br> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-06-11accel/tcg: Merge tcg_exec_init into tcg_init_machineRichard Henderson1-2/+0
There is only one caller, and shortly we will need access to the MachineState, which tcg_init_machine already has. Reviewed-by: Luis Pires <luis.pires@eldorado.org.br> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-06-11tcg: Create tcg_initRichard Henderson1-2/+1
Perform both tcg_context_init and tcg_region_init. Do not leave this split to the caller. Reviewed-by: Luis Pires <luis.pires@eldorado.org.br> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-06-11accel/tcg: Move alloc_code_gen_buffer to tcg/region.cRichard Henderson1-1/+1
Buffer management is integral to tcg. Do not leave the allocation to code outside of tcg/. This is code movement, with further cleanups to follow. Reviewed-by: Luis Pires <luis.pires@eldorado.org.br> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-06-11vhost-vdpa: remove the unused vhost_vdpa_get_acked_features()Jason Wang1-1/+0
No user for this helper, let's remove it. Signed-off-by: Jason Wang <jasowang@redhat.com>
2021-06-11vhost-vdpa: map virtqueue notification area if possibleJason Wang1-0/+6
This patch implements the vq notification mapping support for vhost-vDPA. This is simply done by using mmap()/munmap() for the vhost-vDPA fd during device start/stop. For the device without notification mapping support, we fall back to eventfd based notification gracefully. Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Signed-off-by: Jason Wang <jasowang@redhat.com>