aboutsummaryrefslogtreecommitdiff
path: root/hw/intc/spapr_xive_kvm.c
AgeCommit message (Collapse)AuthorFilesLines
2023-09-21hw/other: spelling fixesMichael Tokarev1-3/+3
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
2021-10-21spapr/xive: Use xive_esb_rw() to trigger interruptsCédric Le Goater1-3/+1
xive_esb_rw() is the common routine used for memory accesses on ESB page. Use it for triggers also. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20211006210546.641102-1-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-21spapr/xive: Add source status helpersCédric Le Goater1-7/+3
and use them to set and test the ASSERTED bit of LSI sources. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20211004212141.432954-1-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-30spapr/xive: Fix kvm_xive_source_reset trace eventCédric Le Goater1-2/+2
The trace event was placed in the wrong routine. Move it under kvmppc_xive_source_reset_one(). Fixes: 4e960974d4ee ("xive: Add trace events") Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210922070205.1235943-1-clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-08-27ppc/xive: Export PQ get/set routinesCédric Le Goater1-4/+4
These will be shared with the XIVE2 router. Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210809134547.689560-8-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-03-09sysemu: Let VMChangeStateHandler take boolean 'running' argumentPhilippe Mathieu-Daudé1-1/+1
The 'running' argument from VMChangeStateHandler does not require other value than 0 / 1. Make it a plain boolean. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20210111152020.1422021-3-philmd@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2020-12-14xive: Add trace eventsCédric Le Goater1-0/+5
I have been keeping those logging messages in an ugly form for while. Make them clean ! Beware not to activate all of them, this is really verbose. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20201123163717.1368450-1-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-11-18Revert series "spapr/xive: Allocate vCPU IPIs from the vCPU contexts"Greg Kurz1-84/+18
This series was largely built on the assumption that IPI numbers are numerically equal to vCPU ids. Even if this happens to be the case in practice with the default machine settings, this ceases to be true if VSMT is set to a different value than the number of vCPUs per core. This causes bogus IPI numbers to be created in KVM from a guest stand point. This leads to unknow results in the guest, including crashes or missing vCPUs (see BugLink) and even non-fatal oopses in current KVM that lacks a check before accessing misconfigured HW (see [1]). A tentative patch was sent (see [2]) but it seems too complex to be merged in an RC. Since the original changes are essentially an optimization, it seems safer to revert them for now. The damage is done by commit acbdb9956fe9 ("spapr/xive: Allocate IPIs independently from the other sources") but the previous patches in the series are really preparatory patches. So this reverts the whole series: eab0a2d06e97 ("spapr/xive: Allocate vCPU IPIs from the vCPU contexts") acbdb9956fe9 ("spapr/xive: Allocate IPIs independently from the other sources") fa94447a2cd6 ("spapr/xive: Use kvmppc_xive_source_reset() in post_load") 235d3b116213 ("spapr/xive: Modify kvm_cpu_is_enabled() interface") [1] https://marc.info/?l=kvm-ppc&m=160458409722959&w=4 [2] https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg03626.html Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Fixes: acbdb9956fe9 ("spapr/xive: Allocate IPIs independently from the other sources") BugLink: https://bugs.launchpad.net/qemu/+bug/1900241 Signed-off-by: Greg Kurz <groug@kaod.org> Acked-by: Cédric Le Goater <clg@kaod.org> Message-Id: <160554086275.1325084.12110142252189044646.stgit@bahia.lan>
2020-09-08spapr/xive: Allocate vCPU IPIs from the vCPU contextsCédric Le Goater1-3/+33
When QEMU switches to the XIVE interrupt mode, it creates all the guest interrupts at the level of the KVM device. These interrupts are backed by real HW interrupts from the IPI interrupt pool of the XIVE controller. Currently, this is done from the QEMU main thread, which results in allocating all interrupts from the chip on which QEMU is running. IPIs are not distributed across the system and the load is not well balanced across the interrupt controllers. Change the vCPU IPI allocation to run from the vCPU context. The associated XIVE IPI interrupt will be allocated on the chip on which the vCPU is running and improve distribution of the IPIs in the system. When the vCPUs are pinned, this will make the IPI local to the chip of the vCPU. It will reduce rerouting between interrupt controllers and gives better performance. Device interrupts are still treated the same. To improve placement, we would need some information on the chip owning the virtual source or the HW source in case of a passthrough device but this reuires changes in PAPR. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20200820134547.2355743-5-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-09-08spapr/xive: Allocate IPIs independently from the other sourcesCédric Le Goater1-5/+42
The vCPU IPIs are now allocated in kvmppc_xive_cpu_connect() when the vCPU connects to the KVM device and not when all the sources are reset in kvmppc_xive_source_reset() This requires extra care for hotplug vCPUs and VM restore. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20200820134547.2355743-4-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-09-08spapr/xive: Use kvmppc_xive_source_reset() in post_loadCédric Le Goater1-10/+10
This is doing an extra loop but should be equivalent. It also differentiate the reset of the sources from the restore of the sources configuration. This will help in allocating the vCPU IPIs independently. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20200820134547.2355743-3-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-09-08spapr/xive: Modify kvm_cpu_is_enabled() interfaceCédric Le Goater1-3/+2
We will use to check if a vCPU IPI has been created. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20200820134547.2355743-2-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-08-14spapr/xive: Use xive_source_esb_len()Greg Kurz1-1/+1
static inline size_t xive_source_esb_len(XiveSource *xsrc) { return (1ull << xsrc->esb_shift) * xsrc->nr_irqs; } Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <159733969034.320580.6571451425779179477.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-08-13spapr/xive: Simplify error handling of kvmppc_xive_cpu_synchronize_state()Greg Kurz1-8/+6
Now that kvmppc_xive_cpu_get_state() returns negative on error, use that and get rid of the temporary Error object and error_propagate(). Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <159707852916.1489912.8376334685349668124.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-08-13spapr/xive: Simplify error handling in kvmppc_xive_connect()Greg Kurz1-13/+11
Now that all these functions return a negative errno on failure, check that and get rid of the local_err boilerplate. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <159707851537.1489912.1030839306195472651.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-08-13spapr/xive: Fix error handling in kvmppc_xive_post_load()Greg Kurz1-17/+18
Now that all these functions return a negative errno on failure, check that because it is preferred to local_err. And most of all, propagate it because vmstate expects negative errnos. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <159707850148.1489912.18355118622296682631.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-08-13spapr/kvm: Fix error handling in kvmppc_xive_pre_save()Greg Kurz1-3/+4
Now that kvmppc_xive_get_queues() returns a negative errno on failure, check with that because it is preferred to local_err. And most of all, propagate it because vmstate expects negative errnos. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <159707849455.1489912.6034461176847728064.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-08-13spapr/xive: Rework error handling of kvmppc_xive_set_source_config()Greg Kurz1-9/+4
Since kvm_device_access() returns a negative errno on failure, convert kvmppc_xive_set_source_config() to use it for error checking. This allows to get rid of the local_err boilerplate. Propagate the return value so that callers may use it as well to check failures. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <159707848764.1489912.17078842252160674523.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-08-13spapr/xive: Rework error handling in kvmppc_xive_get_queues()Greg Kurz1-7/+8
Since kvmppc_xive_get_queue_config() has a return value, convert kvmppc_xive_get_queues() to use it for error checking. This allows to get rid of the local_err boiler plate. Propagate the return value so that callers may use it as well to check failures. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <159707848069.1489912.14879208798696134531.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-08-13spapr/xive: Rework error handling of kvmppc_xive_[gs]et_queue_config()Greg Kurz1-19/+16
Since kvm_device_access() returns a negative errno on failure, convert kvmppc_xive_get_queue_config() and kvmppc_xive_set_queue_config() to use it for error checking. This allows to get rid of the local_err boilerplate. Propagate the return value so that callers may use it as well to check failures. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <159707847357.1489912.2032291280645236480.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-08-13spapr/xive: Rework error handling of kvmppc_xive_cpu_[gs]et_state()Greg Kurz1-5/+10
kvm_set_one_reg() returns a negative errno on failure, use that instead of errno. Also propagate it to callers so they can use it to check for failures and hopefully get rid of their local_err boilerplate. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <159707846665.1489912.14267225652103441921.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-08-13spapr/xive: Rework error handling of kvmppc_xive_mmap()Greg Kurz1-7/+11
Callers currently check failures of kvmppc_xive_mmap() through the @errp argument, which isn't a recommanded practice. It is preferred to use a return value when possible. Since NULL isn't an invalid address in theory, it seems better to return MAP_FAILED and to teach callers to handle it. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <159707845972.1489912.719896767746375765.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-08-13spapr/xive: Rework error handling of kvmppc_xive_source_reset()Greg Kurz1-6/+7
Since kvmppc_xive_source_reset_one() has a return value, convert kvmppc_xive_source_reset() to use it for error checking. This allows to get rid of the local_err boiler plate. Propagate the return value so that callers may use it as well to check failures. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <159707845245.1489912.9151822670764690034.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-08-13spapr/xive: Rework error handling of kvmppc_xive_cpu_connect()Greg Kurz1-11/+10
Use error_setg_errno() instead of error_setg(strerror()). While here, use -ret instead of errno since kvm_vcpu_enable_cap() returns a negative errno on failure. Use ERRP_GUARD() to ensure that errp can be passed to error_append_hint(), and get rid of the local_err boilerplate. Propagate the return value so that callers may use it as well to check failures. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <159707844549.1489912.4862921680328017645.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-08-13spapr/xive: Convert KVM device fd checks to assert()Greg Kurz1-28/+7
All callers guard these functions with an xive_in_kernel() helper. Make it clear that they are only to be called when the KVM XIVE device exists. Note that the check on xive is dropped in kvmppc_xive_disconnect(). It really cannot be NULL since it comes from set_active_intc() which only passes pointers to allocated objects. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <159679994169.876294.11026653581505077112.stgit@bahia.lan> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-08-13ppc/xive: Rework setup of XiveSource::esb_mmioGreg Kurz1-2/+2
Depending on whether XIVE is emultated or backed with a KVM XIVE device, the ESB MMIOs of a XIVE source point to an I/O memory region or a mapped memory region. This is currently handled by checking kvm_irqchip_in_kernel() returns false in xive_source_realize(). This is a bit awkward as we usually need to do extra things when we're using the in-kernel backend, not less. But most important, we can do better: turn the existing "xive.esb" memory region into a plain container, introduce an "xive.esb-emulated" I/O subregion and rename the existing "xive.esb" subregion in the KVM code to "xive.esb-kvm". Since "xive.esb-kvm" is added with overlap and a higher priority, it prevails over "xive.esb-emulated" (ie. a guest using KVM XIVE will interact with "xive.esb-kvm" instead of the default "xive.esb-emulated" region. While here, consolidate the computation of the MMIO region size in a common helper. Suggested-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <159679992680.876294.7520540158586170894.stgit@bahia.lan> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-08-12spapr/xive: Simplify kvmppc_xive_disconnect()Greg Kurz1-4/+2
Since this function begins with: /* The KVM XIVE device is not in use */ if (!xive || xive->fd == -1) { return; } we obviously don't need to check xive->fd again. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <159673297296.766512.14780055521619233656.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-08-12spapr/xive: Fix xive->fd if kvm_create_device() failsGreg Kurz1-3/+5
If the creation of the KVM XIVE device fails for some reasons, the negative errno ends up in xive->fd, but the rest of the code assumes that xive->fd either contains an open fd, ie. positive value, or -1. This doesn't cause any misbehavior except kvmppc_xive_disconnect() that will try to close(xive->fd) during rollback and likely be rewarded with an EBADF. Only set xive->fd with a open fd. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <159673296585.766512.15404407281299745442.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-01-08spapr/xive: Deduce the SpaprXive pointer from XiveTCTX::xptrGreg Kurz1-5/+4
And use it instead of reaching out to the machine. This allows to get rid of a call to qdev_get_machine() and to reduce the scope of another one so that it is only used within the argument list of error_append_hint(). This is an acceptable tradeoff compared to all it would require to know about the maximum number of CPUs here without calling qdev_get_machine(). Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20200106145645.4539-10-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17spapr/xive: Configure number of servers in KVMGreg Kurz1-2/+21
The XIVE KVM devices now has an attribute to configure the number of interrupt servers. This allows to greatly optimize the usage of the VP space in the XIVE HW, and thus to start a lot more VMs. Only set this attribute if available in order to support older POWER9 KVM. The XIVE KVM device now reports the exhaustion of VPs upon the connection of the first VCPU. Check that in order to have a chance to provide a hint to the user. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157478679392.67101.7843580591407950866.stgit@bahia.tlslab.ibm.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17spapr: Pass the maximum number of vCPUs to the KVM interrupt controllerGreg Kurz1-1/+2
The XIVE and XICS-on-XIVE KVM devices on POWER9 hosts can greatly reduce their consumption of some scarce HW resources, namely Virtual Presenter identifiers, if they know the maximum number of vCPUs that may run in the VM. Prepare ground for this by passing the value down to xics_kvm_connect() and kvmppc_xive_connect(). This is purely mechanical, no functional change. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157478678301.67101.2717368060417156338.stgit@bahia.tlslab.ibm.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17xive/kvm: Trigger interrupts from userspaceGreg Kurz1-14/+2
When using the XIVE KVM device, the trigger page is directly accessible in QEMU. Unlike with XICS, no need to ask KVM to fire the interrupt. A simple store on the trigger page does the job. Just call xive_esb_trigger(). This may improve performance of emulated devices that go through qemu_set_irq(), eg. virtio devices created with ioeventfd=off or configured by the guest to use LSI interrupts, which aren't really recommended setups. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157408992731.494439.3405812941731584740.stgit@bahia.lan> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-10-24spapr, xics, xive: Match signatures for XICS and XIVE KVM connect routinesDavid Gibson1-12/+10
Both XICS and XIVE have routines to connect and disconnect KVM with similar but not identical signatures. This adjusts them to match exactly, which will be useful for further cleanups later. While we're there, we add an explicit return value to the connect path to streamline error reporting in the callers. We remove error reporting the disconnect path. In the XICS case this wasn't used at all. In the XIVE case the only error case was if the KVM device was set up, but KVM didn't have the capability to do so which is pretty obviously impossible. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>
2019-10-04xive: Improve irq claim/free pathDavid Gibson1-4/+4
spapr_xive_irq_claim() returns a bool to indicate if it succeeded. But most of the callers and one callee use int return values and/or an Error * with more information instead. In any case, ints are a more common idiom for success/failure states than bools (one never knows what sense they'll be in). So instead change to an int return value to indicate presence of error + an Error * to describe the details through that call chain. It also didn't actually check if the irq was already claimed, which is one of the primary purposes of the claim path, so do that. spapr_xive_irq_free() also returned a bool... which no callers checked and was always true, so just drop it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>
2019-10-04spapr/irq: Only claim VALID interrupts at the KVM levelCédric Le Goater1-3/+37
A typical pseries VM with 16 vCPUs, one disk, one network adapater uses less than 100 interrupts but the whole IRQ number space of the QEMU machine is allocated at reset time and it is 8K wide. This is wasting a considerable amount of interrupt numbers in the global IRQ space which has 1M interrupts per socket on a POWER9. To optimise the HW resources, only request at the KVM level interrupts which have been claimed by the guest. This will help to increase the maximum number of VMs per system and also help supporting nested guests using the XIVE interrupt mode. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190911133937.2716-3-clg@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <156942766014.1274533.10792048853177121231.stgit@bahia.lan> [dwg: Folded in fix up from Greg Kurz] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-08-16sysemu: Split sysemu/runstate.h off sysemu/sysemu.hMarkus Armbruster1-0/+1
sysemu/sysemu.h is a rather unfocused dumping ground for stuff related to the system-emulator. Evidence: * It's included widely: in my "build everything" tree, changing sysemu/sysemu.h still triggers a recompile of some 1100 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h, down from 5400 due to the previous two commits). * It pulls in more than a dozen additional headers. Split stuff related to run state management into its own header sysemu/runstate.h. Touching sysemu/sysemu.h now recompiles some 850 objects. qemu/uuid.h also drops from 1100 to 850, and qapi/qapi-types-run-state.h from 4400 to 4200. Touching new sysemu/runstate.h recompiles some 500 objects. Since I'm touching MAINTAINERS to add sysemu/runstate.h anyway, also add qemu/main-loop.h. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190812052359.30071-30-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> [Unbreak OS-X build]
2019-08-13spapr/xive: Fix migration of hot-plugged CPUsCédric Le Goater1-2/+17
The migration sequence of a guest using the XIVE exploitation mode relies on the fact that the states of all devices are restored before the machine is. This is not true for hot-plug devices such as CPUs which state come after the machine. This breaks migration because the thread interrupt context registers are not correctly set. Fix migration of hotplugged CPUs by restoring their context in the 'post_load' handler of the XiveTCTX model. Fixes: 277dd3d7712a ("spapr/xive: add migration support for KVM") Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190813064853.29310-1-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-07-02spapr/xive: Add proper rollback to kvmppc_xive_connect()Greg Kurz1-19/+29
Make kvmppc_xive_disconnect() able to undo the changes of a partial execution of kvmppc_xive_connect() and use it to perform rollback. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Message-Id: <156198735673.293938.7313195993600841641.stgit@bahia> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-07-02spapr/xive: rework the mapping the KVM memory regionsCédric Le Goater1-9/+12
Today, the interrupt device is fully initialized at reset when the CAS negotiation process has completed. Depending on the KVM capabilities, the SpaprXive memory regions (ESB, TIMA) are initialized with a host MMIO backend or a QEMU emulated backend. This results in a complex initialization sequence partially done at realize and later at reset, and some memory region leaks. To simplify this sequence and to remove of the late initialization of the emulated device which is required to be done only once, we introduce new memory regions specific for KVM. These regions are mapped as overlaps on top of the emulated device to make use of the host MMIOs. Also provide proper cleanups of these regions when the XIVE KVM device is destroyed to fix the leaks. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190614165920.12670-2-clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-05-29spapr/xive: fix multiple resets when using the 'dual' interrupt modeCédric Le Goater1-4/+0
Today, when a reset occurs on a pseries machine using the 'dual' interrupt mode, the KVM devices are released and recreated depending on the interrupt mode selected by CAS. If XIVE is selected, the SysBus memory regions of the SpaprXive model are initialized by the KVM backend initialization routine each time a reset occurs. This leads to a crash after a couple of resets because the machine reaches the QDEV_MAX_MMIO limit of SysBusDevice : qemu-system-ppc64: hw/core/sysbus.c:193: sysbus_init_mmio: Assertion `dev->num_mmio < QDEV_MAX_MMIO' failed. To fix, initialize the SysBus memory regions in spapr_xive_realize() called only once and remove the same inits from the QEMU and KVM backend initialization routines which are called at each reset. Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190522074016.10521-2-clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-05-29spapr/irq: add KVM support to the 'dual' machineCédric Le Goater1-1/+28
The interrupt mode is chosen by the CAS negotiation process and activated after a reset to take into account the required changes in the machine. This brings new constraints on how the associated KVM IRQ device is initialized. Currently, each model takes care of the initialization of the KVM device in their realize method but this is not possible anymore as the initialization needs to be done globaly when the interrupt mode is known, i.e. when machine is reseted. It also means that we need a way to delete a KVM device when another mode is chosen. Also, to support migration, the QEMU objects holding the state to transfer should always be available but not necessarily activated. The overall approach of this proposal is to initialize both interrupt mode at the QEMU level to keep the IRQ number space in sync and to allow switching from one mode to another. For the KVM side of things, the whole initialization of the KVM device, sources and presenters, is grouped in a single routine. The XICS and XIVE sPAPR IRQ reset handlers are modified accordingly to handle the init and the delete sequences of the KVM device. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20190513084245.25755-15-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-05-29spapr: check for the activation of the KVM IRQ deviceCédric Le Goater1-0/+33
The activation of the KVM IRQ device depends on the interrupt mode chosen at CAS time by the machine and some methods used at reset or by the migration need to be protected. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190513084245.25755-11-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-05-29spapr: introduce routines to delete the KVM IRQ deviceCédric Le Goater1-0/+56
If a new interrupt mode is chosen by CAS, the machine generates a reset to reconfigure. At this point, the connection with the previous KVM device needs to be closed and a new connection needs to opened with the KVM device operating the chosen interrupt mode. New routines are introduced to destroy the XICS and the XIVE KVM devices. They make use of a new KVM device ioctl which destroys the device and also disconnects the IRQ presenters from the vCPUs. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20190513084245.25755-10-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-05-29spapr/xive: add migration support for KVMCédric Le Goater1-1/+94
When the VM is stopped, the VM state handler stabilizes the XIVE IC and marks the EQ pages dirty. These are then transferred to destination before the transfer of the device vmstates starts. The SpaprXive interrupt controller model captures the XIVE internal tables, EAT and ENDT and the XiveTCTX model does the same for the thread interrupt context registers. At restart, the SpaprXive 'post_load' method restores all the XIVE states. It is called by the sPAPR machine 'post_load' method, when all XIVE states have been transferred and loaded. Finally, the source states are restored in the VM change state handler when the machine reaches the running state. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20190513084245.25755-7-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-05-29spapr/xive: introduce a VM state change handlerCédric Le Goater1-1/+95
This handler is in charge of stabilizing the flow of event notifications in the XIVE controller before migrating a guest. This is a requirement before transferring the guest EQ pages to a destination. When the VM is stopped, the handler sets the source PQs to PENDING to stop the flow of events and to possibly catch a triggered interrupt occuring while the VM is stopped. Their previous state is saved. The XIVE controller is then synced through KVM to flush any in-flight event notification and to stabilize the EQs. At this stage, the EQ pages are marked dirty to make sure the EQ pages are transferred if a migration sequence is in progress. The previous configuration of the sources is restored when the VM resumes, after a migration or a stop. If an interrupt was queued while the VM was stopped, the handler simply generates the missing trigger. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20190513084245.25755-6-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-05-29spapr/xive: add state synchronization with KVMCédric Le Goater1-0/+90
This extends the KVM XIVE device backend with 'synchronize_state' methods used to retrieve the state from KVM. The HW state of the sources, the KVM device and the thread interrupt contexts are collected for the monitor usage and also migration. These get operations rely on their KVM counterpart in the host kernel which acts as a proxy for OPAL, the host firmware. The set operations will be added for migration support later. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190513084245.25755-5-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-05-29spapr/xive: add hcall support when under KVMCédric Le Goater1-0/+197
XIVE hcalls are all redirected to QEMU as none are on a fast path. When necessary, QEMU invokes KVM through specific ioctls to perform host operations. QEMU should have done the necessary checks before calling KVM and, in case of failure, H_HARDWARE is simply returned. H_INT_ESB is a special case that could have been handled under KVM but the impact on performance was low when under QEMU. Here are some figures : kernel irqchip OFF ON H_INT_ESB KVM QEMU rtl8139 (LSI ) 1.19 1.24 1.23 Gbits/sec virtio 31.80 42.30 -- Gbits/sec Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20190513084245.25755-4-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-05-29spapr/xive: add KVM supportCédric Le Goater1-0/+237
This introduces a set of helpers when KVM is in use, which create the KVM XIVE device, initialize the interrupt sources at a KVM level and connect the interrupt presenters to the vCPU. They also handle the initialization of the TIMA and the source ESB memory regions of the controller. These have a different type under KVM. They are 'ram device' memory mappings, similarly to VFIO, exposed to the guest and the associated VMAs on the host are populated dynamically with the appropriate pages using a fault handler. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20190513084245.25755-3-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>