aboutsummaryrefslogtreecommitdiff
path: root/backends/hostmem.c
AgeCommit message (Collapse)AuthorFilesLines
2022-12-28hostmem: Honor multiple preferred nodes if possibleMichal Privoznik1-2/+17
If a memory-backend is configured with mode HOST_MEM_POLICY_PREFERRED then host_memory_backend_memory_complete() calls mbind() as: mbind(..., MPOL_PREFERRED, nodemask, ...); Here, 'nodemask' is a bitmap of host NUMA nodes and corresponds to the .host-nodes attribute. Therefore, there can be multiple nodes specified. However, the documentation to MPOL_PREFERRED says: MPOL_PREFERRED This mode sets the preferred node for allocation. ... If nodemask specifies more than one node ID, the first node in the mask will be selected as the preferred node. Therefore, only the first node is honored and the rest is silently ignored. Well, with recent changes to the kernel and numactl we can do better. The Linux kernel added in v5.15 via commit cfcaa66f8032 ("mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY") support for MPOL_PREFERRED_MANY, which accepts multiple preferred NUMA nodes instead. Then, numa_has_preferred_many() API was introduced to numactl (v2.0.15~26) allowing applications to query kernel support. Wiring this all together, we can pass MPOL_PREFERRED_MANY to the mbind() call instead and stop ignoring multiple nodes, silently. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Message-Id: <a0b4adce1af5bd2344c2218eb4a04b3ff7bcfdb4.1671097918.git.mprivozn@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
2022-10-27hostmem: Allow for specifying a ThreadContext for preallocationDavid Hildenbrand1-3/+9
Let's allow for specifying a thread context via the "prealloc-context" property. When set, preallcoation threads will be crated via the thread context -- inheriting the same CPU affinity as the thread context. Pinning preallcoation threads to CPUs can heavily increase performance in NUMA setups, because, preallocation from a CPU close to the target NUMA node(s) is faster then preallocation from a CPU further remote, simply because of memory bandwidth for initializing memory with zeroes. This is especially relevant for very large VMs backed by huge/gigantic pages, whereby preallocation is mandatory. Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Message-Id: <20221014134720.168738-7-david@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
2022-10-27util: Make qemu_prealloc_mem() optionally consume a ThreadContextDavid Hildenbrand1-2/+3
... and implement it under POSIX. When a ThreadContext is provided, create new threads via the context such that these new threads obtain a properly configured CPU affinity. Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Message-Id: <20221014134720.168738-6-david@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
2022-10-27util: Cleanup and rename os_mem_prealloc()David Hildenbrand1-3/+3
Let's * give the function a "qemu_*" style name * make sure the parameters in the implementation match the prototype * rename smp_cpus to max_threads, which makes the semantics of that parameter clearer ... and add a function documentation. Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Message-Id: <20221014134720.168738-2-david@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
2022-08-26backends/hostmem: Fix support of memory-backend-memfd in qemu_maxrampagesize()Thomas Huth1-12/+2
It is currently not possible yet to use "memory-backend-memfd" on s390x with hugepages enabled. This problem is caused by qemu_maxrampagesize() not taking memory-backend-memfd objects into account yet, so the code in s390_memory_init() fails to enable the huge page support there via s390_set_max_pagesize(). Fix it by generalizing the code, so that it looks at qemu_ram_pagesize(memdev->mr.ram_block) instead of re-trying to get the information from the filesystem. Suggested-by: David Hildenbrand <david@redhat.com> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2116496 Message-Id: <20220810125720.3849835-2-thuth@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2022-05-23hostmem: default the amount of prealloc-threads to smp-cpusJaroslav Jindrak1-1/+1
Prior to the introduction of the prealloc-threads property, the amount of threads used to preallocate memory was derived from the value of smp-cpus passed to qemu, the amount of physical cpus of the host and a hardcoded maximum value. When the prealloc-threads property was introduced, it included a default of 1 in backends/hostmem.c and a default of smp-cpus using the sugar API for the property itself. The latter default is not used when the property is not specified on qemu's command line, so guests that were not adjusted for this change suddenly started to use the default of 1 thread to preallocate memory, which resulted in observable slowdowns in guest boots for guests with large memory (e.g. when using libvirt <8.2.0 or managing guests manually). This commit restores the original behavior for these cases while not impacting guests started with the prealloc-threads property in any way. Fixes: 220c1fd864e9d ("hostmem: introduce "prealloc-threads" property") Signed-off-by: Jaroslav Jindrak <dzejrou@gmail.com> Message-Id: <20220517123858.7933-1-dzejrou@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-06Replace qemu_real_host_page variables with inlined functionsMarc-André Lureau1-1/+1
Replace the global variables with inlined helper functions. getpagesize() is very likely annotated with a "const" function attribute (at least with glibc), and thus optimization should apply even better. This avoids the need for a constructor initialization too. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20220323155743.1585078-12-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-21include: Move qemu_madvise() and related #defines to new qemu/madvise.hPeter Maydell1-0/+1
The function qemu_madvise() and the QEMU_MADV_* constants associated with it are used in only 10 files. Move them out of osdep.h to a new qemu/madvise.h header that is included where it is needed. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20220208200856.3558249-2-peter.maydell@linaro.org
2021-06-15hostmem: Wire up RAM_NORESERVE via "reserve" propertyDavid Hildenbrand1-0/+36
Let's provide a way to control the use of RAM_NORESERVE via memory backends using the "reserve" property which defaults to true (old behavior). Only Linux currently supports clearing the flag (and support is checked at runtime, depending on the setting of "/proc/sys/vm/overcommit_memory"). Windows and other POSIX systems will bail out with "reserve=false". The target use case is virtio-mem, which dynamically exposes memory inside a large, sparse memory area to the VM. This essentially allows avoiding to set "/proc/sys/vm/overcommit_memory == 0") when using virtio-mem and also supporting hugetlbfs in the future. As really only Linux implements RAM_NORESERVE right now, let's expose the property only with CONFIG_LINUX. Setting the property to "false" will then only fail in corner cases -- for example on very old kernels or when memory overcommit was completely disabled by the admin. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com> for memory backend and machine core Cc: Markus Armbruster <armbru@redhat.com> Cc: Eric Blake <eblake@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210510114328.21835-11-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-02Do not include sysemu/sysemu.h if it's not really necessaryThomas Huth1-1/+0
Stop including sysemu/sysemu.h in files that don't need it. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20210416171314.2074665-2-thuth@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-02-08machine: add missing doc for memory-backend optionIgor Mammedov1-0/+10
Add documentation for '-machine memory-backend' CLI option and how to use it. And document that x-use-canonical-path-for-ramblock-id, is considered to be stable to make sure it won't go away by accident. x- was intended for unstable/iternal properties, and not supposed to be stable option. However it's too late to rename (drop x-) it as it would mean that users will have to mantain both x-use-canonical-path-for-ramblock-id (for QEMU 5.0-5.2) versions and prefix-less for later versions. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20210121161504.1007247-1-imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-28qapi: Use QAPI_LIST_APPEND in trivial casesEric Blake1-7/+3
The easiest spots to use QAPI_LIST_APPEND are where we already have an obvious pointer to the tail of a list. While at it, consistently use the variable name 'tail' for that purpose. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20210113221013.390592-5-eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
2020-12-18bugfix: hostmem: Free host_nodes list right after visitedKeqian Zhu1-0/+1
In host_memory_backend_get_host_nodes, we build host_nodes list and output it to v (a StringOutputVisitor) but forget to free the list. This fixes the memory leak. The memory leak stack: Direct leak of 32 byte(s) in 2 object(s) allocated from: #0 0xfffda30b3393 in __interceptor_calloc (/usr/lib64/libasan.so.4+0xd3393) #1 0xfffda1d28b9b in g_malloc0 (/usr/lib64/libglib-2.0.so.0+0x58b9b) #2 0xaaab05ca6e43 in host_memory_backend_get_host_nodes backends/hostmem.c:94 #3 0xaaab061ddf83 in object_property_get_uint16List qom/object.c:1478 #4 0xaaab05866513 in query_memdev hw/core/machine-qmp-cmds.c:312 #5 0xaaab061d980b in do_object_child_foreach qom/object.c:1001 #6 0xaaab0586779b in qmp_query_memdev hw/core/machine-qmp-cmds.c:328 #7 0xaaab0615ed3f in qmp_marshal_query_memdev qapi/qapi-commands-machine.c:327 #8 0xaaab0632d647 in do_qmp_dispatch qapi/qmp-dispatch.c:147 #9 0xaaab0632d647 in qmp_dispatch qapi/qmp-dispatch.c:190 #10 0xaaab0610f74b in monitor_qmp_dispatch monitor/qmp.c:120 #11 0xaaab0611074b in monitor_qmp_bh_dispatcher monitor/qmp.c:209 #12 0xaaab063caefb in aio_bh_poll util/async.c:117 #13 0xaaab063d30fb in aio_dispatch util/aio-posix.c:459 #14 0xaaab063cac8f in aio_ctx_dispatch util/async.c:268 #15 0xfffda1d22a6b in g_main_context_dispatch (/usr/lib64/libglib-2.0.so.0+0x52a6b) #16 0xaaab063d0e97 in glib_pollfds_poll util/main-loop.c:218 #17 0xaaab063d0e97 in os_host_main_loop_wait util/main-loop.c:241 #18 0xaaab063d0e97 in main_loop_wait util/main-loop.c:517 #19 0xaaab05c8bfa7 in main_loop /root/rpmbuild/BUILD/qemu-4.1.0/vl.c:1791 #20 0xaaab05713bc3 in main /root/rpmbuild/BUILD/qemu-4.1.0/vl.c:4473 #21 0xfffda0a83ebf in __libc_start_main (/usr/lib64/libc.so.6+0x23ebf) #22 0xaaab0571ed5f (aarch64-softmmu/qemu-system-aarch64+0x88ed5f) SUMMARY: AddressSanitizer: 32 byte(s) leaked in 2 allocation(s). Fixes: 4cf1b76bf1e2 (hostmem: add properties for NUMA memory policy) Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> Tested-by: Chen Qun <kuhn.chenqun@huawei.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20201210075226.20196-1-zhukeqian1@huawei.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-07-21qom: Change object_get_canonical_path_component() not to mallocMarkus Armbruster1-1/+1
object_get_canonical_path_component() returns a malloced copy of a property name on success, null on failure. 19 of its 25 callers immediately free the returned copy. Change object_get_canonical_path_component() to return the property name directly. Since modifying the name would be wrong, adjust the return type to const char *. Drop the free from the 19 callers become simpler, add the g_strdup() to the other six. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20200714160202.3121879-4-armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Li Qiang <liq3ea@gmail.com>
2020-07-10error: Eliminate error_propagate() with Coccinelle, part 1Markus Armbruster1-6/+2
When all we do with an Error we receive into a local variable is propagating to somewhere else, we can just as well receive it there right away. Convert if (!foo(..., &err)) { ... error_propagate(errp, err); ... return ... } to if (!foo(..., errp)) { ... ... return ... } where nothing else needs @err. Coccinelle script: @rule1 forall@ identifier fun, err, errp, lbl; expression list args, args2; binary operator op; constant c1, c2; symbol false; @@ if ( ( - fun(args, &err, args2) + fun(args, errp, args2) | - !fun(args, &err, args2) + !fun(args, errp, args2) | - fun(args, &err, args2) op c1 + fun(args, errp, args2) op c1 ) ) { ... when != err when != lbl: when strict - error_propagate(errp, err); ... when != err ( return; | return c2; | return false; ) } @rule2 forall@ identifier fun, err, errp, lbl; expression list args, args2; expression var; binary operator op; constant c1, c2; symbol false; @@ - var = fun(args, &err, args2); + var = fun(args, errp, args2); ... when != err if ( ( var | !var | var op c1 ) ) { ... when != err when != lbl: when strict - error_propagate(errp, err); ... when != err ( return; | return c2; | return false; | return var; ) } @depends on rule1 || rule2@ identifier err; @@ - Error *err = NULL; ... when != err Not exactly elegant, I'm afraid. The "when != lbl:" is necessary to avoid transforming if (fun(args, &err)) { goto out } ... out: error_propagate(errp, err); even though other paths to label out still need the error_propagate(). For an actual example, see sclp_realize(). Without the "when strict", Coccinelle transforms vfio_msix_setup(), incorrectly. I don't know what exactly "when strict" does, only that it helps here. The match of return is narrower than what I want, but I can't figure out how to express "return where the operand doesn't use @err". For an example where it's too narrow, see vfio_intx_enable(). Silently fails to convert hw/arm/armsse.c, because Coccinelle gets confused by ARMSSE being used both as typedef and function-like macro there. Converted manually. Line breaks tidied up manually. One nested declaration of @local_err deleted manually. Preexisting unwanted blank line dropped in hw/riscv/sifive_e.c. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200707160613.848843-35-armbru@redhat.com>
2020-07-10error: Avoid unnecessary error_propagate() after error_setg()Markus Armbruster1-15/+12
Replace error_setg(&err, ...); error_propagate(errp, err); by error_setg(errp, ...); Related pattern: if (...) { error_setg(&err, ...); goto out; } ... out: error_propagate(errp, err); return; When all paths to label out are that way, replace by if (...) { error_setg(errp, ...); return; } and delete the label along with the error_propagate(). When we have at most one other path that actually needs to propagate, and maybe one at the end that where propagation is unnecessary, e.g. foo(..., &err); if (err) { goto out; } ... bar(..., &err); out: error_propagate(errp, err); return; move the error_propagate() to where it's needed, like if (...) { foo(..., &err); error_propagate(errp, err); return; } ... bar(..., errp); return; and transform the error_setg() as above. In some places, the transformation results in obviously unnecessary error_propagate(). The next few commits will eliminate them. Bonus: the elimination of gotos will make later patches in this series easier to review. Candidates for conversion tracked down with this Coccinelle script: @@ identifier err, errp; expression list args; @@ - error_setg(&err, args); + error_setg(errp, args); ... when != err error_propagate(errp, err); Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200707160613.848843-34-armbru@redhat.com>
2020-07-10qapi: Use returned bool to check for failure, Coccinelle partMarkus Armbruster1-4/+2
The previous commit enables conversion of visit_foo(..., &err); if (err) { ... } to if (!visit_foo(..., errp)) { ... } for visitor functions that now return true / false on success / error. Coccinelle script: @@ identifier fun =~ "check_list|input_type_enum|lv_start_struct|lv_type_bool|lv_type_int64|lv_type_str|lv_type_uint64|output_type_enum|parse_type_bool|parse_type_int64|parse_type_null|parse_type_number|parse_type_size|parse_type_str|parse_type_uint64|print_type_bool|print_type_int64|print_type_null|print_type_number|print_type_size|print_type_str|print_type_uint64|qapi_clone_start_alternate|qapi_clone_start_list|qapi_clone_start_struct|qapi_clone_type_bool|qapi_clone_type_int64|qapi_clone_type_null|qapi_clone_type_number|qapi_clone_type_str|qapi_clone_type_uint64|qapi_dealloc_start_list|qapi_dealloc_start_struct|qapi_dealloc_type_anything|qapi_dealloc_type_bool|qapi_dealloc_type_int64|qapi_dealloc_type_null|qapi_dealloc_type_number|qapi_dealloc_type_str|qapi_dealloc_type_uint64|qobject_input_check_list|qobject_input_check_struct|qobject_input_start_alternate|qobject_input_start_list|qobject_input_start_struct|qobject_input_type_any|qobject_input_type_bool|qobject_input_type_bool_keyval|qobject_input_type_int64|qobject_input_type_int64_keyval|qobject_input_type_null|qobject_input_type_number|qobject_input_type_number_keyval|qobject_input_type_size_keyval|qobject_input_type_str|qobject_input_type_str_keyval|qobject_input_type_uint64|qobject_input_type_uint64_keyval|qobject_output_start_list|qobject_output_start_struct|qobject_output_type_any|qobject_output_type_bool|qobject_output_type_int64|qobject_output_type_null|qobject_output_type_number|qobject_output_type_str|qobject_output_type_uint64|start_list|visit_check_list|visit_check_struct|visit_start_alternate|visit_start_list|visit_start_struct|visit_type_.*"; expression list args; typedef Error; Error *err; @@ - fun(args, &err); - if (err) + if (!fun(args, &err)) { ... } A few line breaks tidied up manually. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200707160613.848843-19-armbru@redhat.com>
2020-06-02Merge remote-tracking branch ↵Peter Maydell1-2/+4
'remotes/ehabkost/tags/machine-next-pull-request' into staging machine queue, 2020-05-13 Bug fixes: * hostmem: don't use mbind() if host-nodes is empty (Igor Mammedov) # gpg: Signature made Wed 13 May 2020 15:00:25 BST # gpg: using RSA key 5A322FD5ABC4D3DBACCFD1AA2807936F984DC5A6 # gpg: issuer "ehabkost@redhat.com" # gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>" [full] # Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6 * remotes/ehabkost/tags/machine-next-pull-request: hostmem: don't use mbind() if host-nodes is empty Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-05-15qom: Drop parameter @errp of object_property_add() & friendsMarkus Armbruster1-10/+9
The only way object_property_add() can fail is when a property with the same name already exists. Since our property names are all hardcoded, failure is a programming error, and the appropriate way to handle it is passing &error_abort. Same for its variants, except for object_property_add_child(), which additionally fails when the child already has a parent. Parentage is also under program control, so this is a programming error, too. We have a bit over 500 callers. Almost half of them pass &error_abort, slightly fewer ignore errors, one test case handles errors, and the remaining few callers pass them to their own callers. The previous few commits demonstrated once again that ignoring programming errors is a bad idea. Of the few ones that pass on errors, several violate the Error API. The Error ** argument must be NULL, &error_abort, &error_fatal, or a pointer to a variable containing NULL. Passing an argument of the latter kind twice without clearing it in between is wrong: if the first call sets an error, it no longer points to NULL for the second call. ich9_pm_add_properties(), sparc32_ledma_realize(), sparc32_dma_realize(), xilinx_axidma_realize(), xilinx_enet_realize() are wrong that way. When the one appropriate choice of argument is &error_abort, letting users pick the argument is a bad idea. Drop parameter @errp and assert the preconditions instead. There's one exception to "duplicate property name is a programming error": the way object_property_add() implements the magic (and undocumented) "automatic arrayification". Don't drop @errp there. Instead, rename object_property_add() to object_property_try_add(), and add the obvious wrapper object_property_add(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200505152926.18877-15-armbru@redhat.com> [Two semantic rebase conflicts resolved]
2020-05-15qom: Drop object_property_set_description() parameter @errpMarkus Armbruster1-8/+8
object_property_set_description() and object_class_property_set_description() fail only when property @name is not found. There are 85 calls of object_property_set_description() and object_class_property_set_description(). None of them can fail: * 84 immediately follow the creation of the property. * The one in spapr_rng_instance_init() refers to a property created in spapr_rng_class_init(), from spapr_rng_properties[]. Every one of them still gets to decide what to pass for @errp. 51 calls pass &error_abort, 32 calls pass NULL, one receives the error and propagates it to &error_abort, and one propagates it to &error_fatal. I'm actually surprised none of them violates the Error API. What are we gaining by letting callers handle the "property not found" error? Use when the property is not known to exist is simpler: you don't have to guard the call with a check. We haven't found such a use in 5+ years. Until we do, let's make life a bit simpler and drop the @errp parameter. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200505152926.18877-8-armbru@redhat.com> [One semantic rebase conflict resolved]
2020-05-12hostmem: don't use mbind() if host-nodes is emptyIgor Mammedov1-2/+4
Since 5.0 QEMU uses hostmem backend for allocating main guest RAM. The backend however calls mbind() which is typically NOP in case of default policy/absent host-nodes bitmap. However when runing in container with black-listed mbind() syscall, QEMU fails to start with error "cannot bind memory to host NUMA nodes: Operation not permitted" even when user hasn't provided host-nodes to pin to explictly (which is the case with -m option) To fix issue, call mbind() only in case when user has provided host-nodes explicitly (i.e. host_nodes bitmap is not empty). That should allow to run QEMU in containers with black-listed mbind() without memory pinning. If QEMU provided memory-pinning is required user still has to white-list mbind() in container configuration. Reported-by: Manuel Hohmann <mhohmann@physnet.uni-hamburg.de> Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20200430154606.6421-1-imammedo@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-04-14hostmem: set default prealloc_threads to valid valueIgor Mammedov1-0/+1
Commit 4ebc74dbbf removed default prealloc_threads initialization by mistake, and that makes QEMU crash with division on zero at numpages_per_thread = numpages / memset_num_threads; when QEMU is started with following backend -object memory-backend-ram,id=ram-node0,prealloc=yes,size=128M Return back initialization removed by 4ebc74dbbf to fix issue. Fixes: 4ebc74dbbf7ad50e4101629f3f5da5fdc1544051 Reported-by: Raphael Norwitz <raphael.norwitz@nutanix.com> Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20200325094423.24293-2-imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-19hostmem: fix strict bind policyIgor Mammedov1-11/+1
When option -mem-prealloc is used with one or more memory-backend objects, created backends may not obey configured bind policy or creation may fail after kernel attempts to move pages according to bind policy. Reason is in file_ram_alloc(), which will pre-allocate any descriptor based RAM if global mem_prealloc != 0 and that happens way before bind policy is applied to memory range. One way to fix it would be to extend memory_region_foo() API and add more invariants that could broken later due implicit dependencies that's hard to track. Another approach is to drop adhoc main RAM allocation and consolidate it around memory-backend. That allows to have single place that allocates guest RAM (main and memdev) in the same way and then global mem_prealloc could be replaced by backend's property[s] that will affect created memory-backend objects but only in correct order this time. With main RAM now converted to hostmem backends, there is no point in keeping global mem_prealloc around, so alias -mem-prealloc to "memory-backend.prealloc=on" machine compat[*] property and make mem_prealloc a local variable to only stir registration of compat property. *) currently user accessible -global works only with DEVICE based objects and extra work is needed to make it work with hostmem backends. But that is convenience option and out of scope of this already huge refactoring. Hence machine compat properties were used. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20200219160953.13771-78-imammedo@redhat.com>
2020-02-19hostmem: introduce "prealloc-threads" propertyIgor Mammedov1-4/+39
the property will allow user to specify number of threads to use in pre-allocation stage. It also will allow to reduce implicit hostmem dependency on current_machine. On object creation it will default to 1, but via machine compat property it will be updated to MachineState::smp::cpus to keep current behavior for hostmem and main RAM (which is now also hostmem based). Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20200219160953.13771-77-imammedo@redhat.com>
2019-10-26core: replace getpagesize() with qemu_real_host_page_sizeWei Yang1-1/+1
There are three page size in qemu: real host page size host page size target page size All of them have dedicate variable to represent. For the last two, we use the same form in the whole qemu project, while for the first one we use two forms: qemu_real_host_page_size and getpagesize(). qemu_real_host_page_size is defined to be a replacement of getpagesize(), so let it serve the role. [Note] Not fully tested for some arch or device. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20191013021145.16011-3-richardw.yang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-16Include sysemu/sysemu.h a lot lessMarkus Armbruster1-0/+1
In my "build everything" tree, changing sysemu/sysemu.h triggers a recompile of some 5400 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). hw/qdev-core.h includes sysemu/sysemu.h since recent commit e965ffa70a "qdev: add qdev_add_vm_change_state_handler()". This is a bad idea: hw/qdev-core.h is widely included. Move the declaration of qdev_add_vm_change_state_handler() to sysemu/sysemu.h, and drop the problematic include from hw/qdev-core.h. Touching sysemu/sysemu.h now recompiles some 1800 objects. qemu/uuid.h also drops from 5400 to 1800. A few more headers show smaller improvement: qemu/notify.h drops from 5600 to 5200, qemu/timer.h from 5600 to 4500, and qapi/qapi-types-run-state.h from 5500 to 5000. Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20190812052359.30071-28-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
2019-07-05general: Replace global smp variables with smp machine propertiesLike Xu1-2/+4
Basically, the context could get the MachineState reference via call chains or unrecommended qdev_get_machine() in !CONFIG_USER_ONLY mode. A local variable of the same name would be introduced in the declaration phase out of less effort OR replace it on the spot if it's only used once in the context. No semantic changes. Signed-off-by: Like Xu <like.xu@linux.intel.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20190518205428.90532-4-like.xu@linux.intel.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-03-06hostmem: fix crash when querying empty host-nodes property via QMPIgor Mammedov1-1/+2
QEMU will crashes with qapi/qobject-output-visitor.c:210: qobject_output_complete: Assertion `qov->root && ((&qov->stack)->slh_first == ((void *)0))' failed when trying to get value of not set hostmem's "host-nodes" property, HostMemoryBackend::host_nodes bitmap doesn't have any bits set in it, which leads to find_first_bit() returning MAX_NODES and consequently to an early return from host_memory_backend_get_host_nodes() without calling visitor. Fix it by calling visitor even if "host-nodes" property wasn't set before exiting from property getter to return valid empty list. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20190214105733.25643-1-imammedo@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-01-28hostmem: add more information in error messagesZhang Yi1-3/+5
When there are multiple memory backends in use, including the object type and property name in the error message can help users to locate the error. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com> Message-Id: <97d9193875747d8378c05b9e3b3cb39c1b7d2b4e.1546399191.git.yi.z.zhang@linux.intel.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> [ehabkost: reword commit message] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-01-07hostmem: use object id for memory region name with >= 4.0Marc-André Lureau1-0/+36
hostmem-file and hostmem-memfd use the whole object path for the memory region name, and hostname-ram uses only the path component (the object id, or canonical path basename): qemu -m 1024 -object memory-backend-file,id=mem,size=1G,mem-path=/tmp/foo -numa node,memdev=mem -monitor stdio (qemu) info ramblock Block Name PSize Offset Used Total /objects/mem 4 KiB 0x0000000000000000 0x0000000040000000 0x0000000040000000 qemu -m 1024 -object memory-backend-memfd,id=mem,size=1G -numa node,memdev=mem -monitor stdio (qemu) info ramblock Block Name PSize Offset Used Total /objects/mem 4 KiB 0x0000000000000000 0x0000000040000000 0x0000000040000000 qemu -m 1024 -object memory-backend-ram,id=mem,size=1G -numa node,memdev=mem -monitor stdio (qemu) info ramblock Block Name PSize Offset Used Total mem 4 KiB 0x0000000000000000 0x0000000040000000 0x0000000040000000 For consistency, change to use object id for -file and -memfd as well with >= 4.0. Having a consistent naming allows to migrate to different hostmem backends. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com>
2018-12-11hostmem: Validate host-nodes before setting bitmapEduardo Habkost1-4/+13
host_memory_backend_set_host_nodes() was not validating host-nodes before writing to backend->host_nodes, making QEMU write beyond the end of the bitmap. Fix the crash and add a simple regression test for the fix. While at it, fix memory leak of the list returned by visit_type_uint16List(). Reported-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20181130122844.29103-1-ehabkost@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> [ehabkost: removed test case code] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-10-05hostmem: add some properties descriptionMarc-André Lureau1-0/+14
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2018-06-28hostmem: drop error variable from host_memory_backend_get_memory()David Hildenbrand1-2/+1
Unused, so let's remove it. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180619134141.29478-8-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-05-09memdev: remove "id" propertyPaolo Bonzini1-26/+0
The "id" property is unnecessary and can be replaced simply with object_get_canonical_path_component. This patch mostly undoes commit e1ff3c67e8 ("monitor: fix qmp/hmp query-memdev not reporting IDs of memory backends", 2017-01-12). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-04-27Add host_memory_backend_pagesize() helperDavid Gibson1-0/+18
There are a couple places (one generic, one target specific) where we need to get the host page size associated with a particular memory backend. I have some upcoming code which will add another place which wants this. So, for convenience, add a helper function to calculate this. host_memory_backend_pagesize() returns the host pagesize for a given HostMemoryBackend object. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
2018-03-02qapi: Move qapi-schema.json to qapi/, rename generated filesMarkus Armbruster1-1/+1
Move qapi-schema.json to qapi/, so it's next to its modules, and all files get generated to qapi/, not just the ones generated for modules. Consistently name the generated files qapi-MODULE.EXT: qmp-commands.[ch] become qapi-commands.[ch], qapi-event.[ch] become qapi-events.[ch], and qmp-introspect.[ch] become qapi-introspect.[ch]. This gets rid of the temporary hacks in scripts/qapi/commands.py, scripts/qapi/events.py, and scripts/qapi/common.py. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180211093607.27351-28-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> [eblake: Fix trailing dot in tpm.c, undo temporary hack for OSX toolchain] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-03-02Include less of the generated modular QAPI headersMarkus Armbruster1-1/+2
In my "build everything" tree, a change to the types in qapi-schema.json triggers a recompile of about 4800 out of 5100 objects. The previous commit split up qmp-commands.h, qmp-event.h, qmp-visit.h, qapi-types.h. Each of these headers still includes all its shards. Reduce compile time by including just the shards we actually need. To illustrate the benefits: adding a type to qapi/migration.json now recompiles some 2300 instead of 4800 objects. The next commit will improve it further. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180211093607.27351-24-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> [eblake: rebase to master] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-02-19mem: add share parameter to memory-backend-ramMarcel Apfelbaum1-0/+21
Currently only file backed memory backend can be created with a "share" flag in order to allow sharing guest RAM with other processes in the host. Add the "share" flag also to RAM Memory Backend in order to allow remapping parts of the guest RAM to different host virtual addresses. This is needed by the RDMA devices in order to remap non-contiguous QEMU virtual addresses to a contiguous virtual address range. Moved the "share" flag to the Host Memory base class, modified phys_mem_alloc to include the new parameter and a new interface memory_region_init_ram_shared_nomigrate. There are no functional changes if the new flag is not used. Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
2018-02-09Drop superfluous includes of qapi-types.h and test-qapi-types.hMarkus Armbruster1-1/+0
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-4-armbru@redhat.com>
2017-09-04Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2017-09-01-v3' ↵Peter Maydell1-2/+2
into staging QAPI patches for 2017-09-01 # gpg: Signature made Mon 04 Sep 2017 12:30:31 BST # gpg: using RSA key 0x3870B400EB918653 # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * remotes/armbru/tags/pull-qapi-2017-09-01-v3: (47 commits) qapi: drop the sentinel in enum array qapi: Change data type of the FOO_lookup generated for enum FOO qapi: Convert indirect uses of FOO_lookup[...] to qapi_enum_lookup() qapi: Mechanically convert FOO_lookup[...] to FOO_str(...) qapi: Generate FOO_str() macro for QAPI enum FOO qapi: Avoid unnecessary use of enum lookup table's sentinel qapi: Use qapi_enum_parse() in input_type_enum() crypto: Use qapi_enum_parse() in qcrypto_block_luks_name_lookup() quorum: Use qapi_enum_parse() in quorum_open() block: Use qemu_enum_parse() in blkdebug_debug_breakpoint() hmp: Use qapi_enum_parse() in hmp_migrate_set_parameter() hmp: Use qapi_enum_parse() in hmp_migrate_set_capability() tpm: Clean up model registration & lookup tpm: Clean up driver registration & lookup qapi: Drop superfluous qapi_enum_parse() parameter max qapi: Update qapi-code-gen.txt examples to match current code qapi-schema: Improve section headings qapi-schema: Move queries from common.json to qapi-schema.json qapi-schema: Make block-core.json self-contained qapi-schema: Fold event.json back into qapi-schema.json ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-04qapi: Change data type of the FOO_lookup generated for enum FOOMarc-André Lureau1-1/+1
Currently, a FOO_lookup is an array of strings terminated by a NULL sentinel. A future patch will generate enums with "holes". NULL-termination will cease to work then. To prepare for that, store the length in the FOO_lookup by wrapping it in a struct and adding a member for the length. The sentinel will be dropped next. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20170822132255.23945-13-marcandre.lureau@redhat.com> [Basically redone] Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1503564371-26090-16-git-send-email-armbru@redhat.com> [Rebased]
2017-09-04qapi: Mechanically convert FOO_lookup[...] to FOO_str(...)Markus Armbruster1-1/+1
Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1503564371-26090-14-git-send-email-armbru@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2017-09-01qom: Remove unused errp parameter from can_be_deleted()Eduardo Habkost1-1/+1
The errp argument is ignored by all implementations of the method, and user_creatable_del() would break if any implementation set an error (because it calls error_setg(errp) if the function returns false). Remove the unused parameter. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20170829220337.23427-1-ehabkost@redhat.com> Reviewed-by: Gonglei <arei.gonglei@huawei.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-04-20hostmem: use host_memory_backend_mr_inited() where properPeter Xu1-5/+5
Use the new interface to boost readability. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1489151370-15453-3-git-send-email-peterx@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-04-20hostmem: introduce host_memory_backend_mr_inited()Peter Xu1-0/+9
We were checking this against memory region size of host memory backend's mr field to see whether the mr has been inited. This is efficient but less elegant. Let's make a helper for it to avoid confusions, along with some notes. Suggested-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1489151370-15453-2-git-send-email-peterx@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-03-21Revert "hostmem: fix QEMU crash by 'info memdev'"Markus Armbruster1-14/+8
This reverts commit 1454d33f0507cb54d62ed80f494884157c9e7130. The string input visitor regression fixed in the previous commit made visit_type_uint16List() fail on empty input. query_memdev() calls it via object_property_get_uint16List(). Because it doesn't expect it to fail, it passes &error_abort, and duly crashes. Commit 1454d33 "fixes" this crash by making host_memory_backend_get_host_nodes() return a list containing just MAX_NODES instead of the empty list. Papers over the regression, and leads to bogus "info memdev" output, as shown below; revert. I suspect that if we had bisected the crash back then, we would have found and fixed the actual bug instead of papering over it. To reproduce, run HMP command "info memdev" with $ qemu-system-x86_64 --nodefaults -S -display none -monitor stdio -object memory-backend-ram,id=mem1,size=4k With this commit, "info memdev" prints memory backend: mem1 size: 4096 merge: true dump: true prealloc: false policy: default host nodes: exactly like before commit 74f24cb. Between commit 1454d33 and this commit, it prints memory backend: mem1 size: 4096 merge: true dump: true prealloc: false policy: default host nodes: 128 The last line is bogus. Between commit 74f24cb and 1454d33, it crashes like this: Unexpected error in parse_str() at /work/armbru/tmp/qemu/qapi/string-input-visitor.c:126: Parameter 'null' expects an int64 value or range Aborted (core dumped) Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1490026424-11330-3-git-send-email-armbru@redhat.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2017-03-14mem-prealloc: reduce large guest start-up and migration time.Jitendra Kolhe1-2/+2
Using "-mem-prealloc" option for a large guest leads to higher guest start-up and migration time. This is because with "-mem-prealloc" option qemu tries to map every guest page (create address translations), and make sure the pages are available during runtime. virsh/libvirt by default, seems to use "-mem-prealloc" option in case the guest is configured to use huge pages. The patch tries to map all guest pages simultaneously by spawning multiple threads. Currently limiting the change to QEMU library functions on POSIX compliant host only, as we are not sure if the problem exists on win32. Below are some stats with "-mem-prealloc" option for guest configured to use huge pages. ------------------------------------------------------------------------ Idle Guest | Start-up time | Migration time ------------------------------------------------------------------------ Guest stats with 2M HugePage usage - single threaded (existing code) ------------------------------------------------------------------------ 64 Core - 4TB | 54m11.796s | 75m43.843s 64 Core - 1TB | 8m56.576s | 14m29.049s 64 Core - 256GB | 2m11.245s | 3m26.598s ------------------------------------------------------------------------ Guest stats with 2M HugePage usage - map guest pages using 8 threads ------------------------------------------------------------------------ 64 Core - 4TB | 5m1.027s | 34m10.565s 64 Core - 1TB | 1m10.366s | 8m28.188s 64 Core - 256GB | 0m19.040s | 2m10.148s ----------------------------------------------------------------------- Guest stats with 2M HugePage usage - map guest pages using 16 threads ----------------------------------------------------------------------- 64 Core - 4TB | 1m58.970s | 31m43.400s 64 Core - 1TB | 0m39.885s | 7m55.289s 64 Core - 256GB | 0m11.960s | 2m0.135s ----------------------------------------------------------------------- Changed in v2: - modify number of memset threads spawned to min(smp_cpus, 16). - removed 64GB memory restriction for spawning memset threads. Changed in v3: - limit number of threads spawned based on min(sysconf(_SC_NPROCESSORS_ONLN), 16, smp_cpus) - implement memset thread specific siglongjmp in SIGBUS signal_handler. Changed in v4 - remove sigsetjmp/siglongjmp and SIGBUS unblock/block for main thread as main thread no longer touches any pages. - simplify code my returning memset_thread_failed status from touch_all_pages. Signed-off-by: Jitendra Kolhe <jitendra.kolhe@hpe.com> Message-Id: <1487907103-32350-1-git-send-email-jitendra.kolhe@hpe.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-12monitor: fix qmp/hmp query-memdev not reporting IDs of memory backendsIgor Mammedov1-0/+26
Considering 'id' is mandatory for user_creatable objects/backends and user_creatable_add_type() always has it as an argument regardless of where from it is called CLI/monitor or QMP, Fix issue by adding 'id' property to hostmem backends and set it in user_creatable_add_type() for every object that implements 'id' property. Then later at query-memdev time get 'id' from object directly. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1484052795-158195-4-git-send-email-imammedo@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17hostmem: Register TYPE_MEMORY_BACKEND properties as class propertiesEduardo Habkost1-20/+22
The NULL errp arguments on the property registration calls were changed to &error_abort. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-08-02fix qemu exit on memory hotplug when allocation fails at prealloc timeIgor Mammedov1-4/+14
When adding hostmem backend at runtime, QEMU might exit with error: "os_mem_prealloc: Insufficient free host memory pages available to allocate guest RAM" It happens due to os_mem_prealloc() not handling errors gracefully. Fix it by passing errp argument so that os_mem_prealloc() could report error to callers and undo performed allocation when os_mem_prealloc() fails. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1469008443-72059-1-git-send-email-imammedo@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>