aboutsummaryrefslogtreecommitdiff
path: root/hw/ppc/spapr.c
AgeCommit message (Collapse)AuthorFilesLines
2016-12-01spapr: fix default DRC state for coldplugged LMBsMichael Roth1-0/+5
Currently we set the initial isolation/allocation state for DRCs associated with coldplugged LMBs to ISOLATED/UNUSABLE, respectively, under the assumption that the guest will move this state to UNISOLATED/USABLE. In fact, this is only the case for LMBs added via hotplug. For coldplugged LMBs, the guest actually assumes the initial state to be UNISOLATED/USABLE. In practice, this only becomes an issue when we attempt to unplug one of these LMBs, where the guest kernel will issue an rtas-get-sensor-state call to check that the corresponding DRC is in an USABLE state before it will release the LMB back to QEMU. If the returned state is otherwise, the guest will assume no further action is needed, which bypasses the QEMU-side cleanup that occurs during the USABLE->UNUSABLE transition. This results in LMBs and their corresponding pc-dimm devices to stick around indefinitely. This patch fixes the issue by manually setting DRCs associated with cold-plugged LMBs to UNISOLATED/ALLOCATED, but leaving the hotplug state untouched. As it turns out, this is analogous to the handling for cold-plugged CPUs in spapr_core_plug(). Cc: qemu-ppc@nongnu.org Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Bharata B Rao <bharata@linux.vnet.ibm.com> Cc: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-11-23spapr: Fix 2.7<->2.8 migration of PCI host bridgeDavid Gibson1-0/+5
daa2369 "spapr_pci: Add a 64-bit MMIO window" subtly broke migration from qemu-2.7 to the current version. It split the device's MMIO window into two pieces for 32-bit and 64-bit MMIO. The patch included backwards compatibility code to convert the old property into the new format. However, the property value was also transferred in the migration stream and compared with a (probably unwise) VMSTATE_EQUAL. So, the "raw" value from 2.7 is compared to the new style converted value from (pre-)2.8 giving a mismatch and migration failure. Along with the actual field that caused the breakage, there are several other ill-advised VMSTATE_EQUAL()s. To fix forwards migration, we read the values in the stream into scratch variables and ignore them, instead of comparing for equality. To fix backwards migration, we populate those scratch variables in pre_save() with adjusted values to match the old behaviour. To permit the eventual possibility of removing this cruft from the stream, we only include these compatibility fields if a new 'pre-2.8-migration' property is set. We clear it on the pseries-2.8 machine type, which obviously can't be migrated backwards, but set it on earlier machine type versions. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2016-11-23target-ppc: Allow eventual removal of old migration mistakesDavid Gibson1-0/+5
Until very recently, the vmstate for ppc cpus included some poorly thought out VMSTATE_EQUAL() components, that can easily break migration compatibility, and did so between qemu-2.6 and later versions. A hack was recently added which fixes this migration breakage, but it leaves the unhelpful cruft of these fields in the migration stream. This patch adds a new cpu property allowing these fields to be removed from the stream entirely. For the pseries-2.8 machine type - which comes after the fix - and for all non-pseries machine types - which aren't mature enough to care about cross-version migration - we remove the fields from the stream. For pseries-2.7 and earlier, The migration hack remains in place, allowing backwards and forwards migration with the older machine types. This restricts the migration compatibility cruft to older machine types, and at least opens the possibility of eventually deprecating and removing it entirely. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2016-11-23spapr: migration support for CAS-negotiated option vectorsMichael Roth1-0/+66
With the additional of the OV5_HP_EVT option vector, we now have certain functionality (namely, memory unplug) that checks at run-time for whether or not the guest negotiated the option via CAS. Because we don't currently migrate these negotiated values, we are unable to unplug memory from a guest after it's been migrated until after the guest is rebooted and CAS-negotiation is repeated. This patch fixes this by adding CAS-negotiated options to the migration stream. We do this using a subsection, since the negotiated value of OV5_HP_EVT is the only option currently needed to maintain proper functionality for a running guest. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-31Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream-mttcg' into ↵Peter Maydell1-2/+2
staging Base patches for MTTCG enablement. # gpg: Signature made Mon 31 Oct 2016 14:01:41 GMT # gpg: using RSA key 0xBFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream-mttcg: tcg: move locking for tb_invalidate_phys_page_range up *_run_on_cpu: introduce run_on_cpu_data type cpus: re-factor out handle_icount_deadline tcg: cpus rm tcg_exec_all() tcg: move tcg_exec_all and helpers above thread fn target-arm/arm-powerctl: wake up sleeping CPUs tcg: protect translation related stuff with tb_lock. translate-all: Add assert_(memory|tb)_lock annotations linux-user/elfload: ensure mmap_lock() held while setting up tcg: comment on which functions have to be called with tb_lock held cpu-exec: include cpu_index in CPU_LOG_EXEC messages translate-all: add DEBUG_LOCKING asserts translate_all: DEBUG_FLUSH -> DEBUG_TB_FLUSH cpus: make all_vcpus_paused() return bool Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-10-31*_run_on_cpu: introduce run_on_cpu_data typePaolo Bonzini1-2/+2
This changes the *_run_on_cpu APIs (and helpers) to pass data in a run_on_cpu_data type instead of a plain void *. This is because we sometimes want to pass a target address (target_ulong) and this fails on 32 bit hosts emulating 64 bit guests. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20161027151030.20863-24-alex.bennee@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-10-31Merge remote-tracking branch 'remotes/mjt/tags/trivial-patches-fetch' into ↵Peter Maydell1-1/+0
staging trivial patches for 2016-10-28 # gpg: Signature made Fri 28 Oct 2016 16:17:51 BST # gpg: using RSA key 0x701B4F6B1A693E59 # gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>" # gpg: aka "Michael Tokarev <mjt@corpit.ru>" # gpg: aka "Michael Tokarev <mjt@debian.org>" # Primary key fingerprint: 6EE1 95D1 886E 8FFB 810D 4324 457C E0A0 8044 65C5 # Subkey fingerprint: 7B73 BAD6 8BE7 A2C2 8931 4B22 701B 4F6B 1A69 3E59 * remotes/mjt/tags/trivial-patches-fetch: (23 commits) Fix build for less common build directories names clean-up: removed duplicate #includes scripts/clean-includes: added duplicate #include check monitor: deprecate 'default' option qemu-ga: Remove stray 'q' in documentation Makefile: Fix help text for target 'installer' s390: avoid always-true comparison in s390_pci_generate_fid() migration: Remove unneeded NULL check from migrate_fd_error() scripts/hxtool: fix undefined behavour of echo qemu-options.hx: set: fix copy-paste error usb: Change *_exitfn return type from int to void MAINTAINERS: qemu-trivial information colo-compare: remove unused struct CompareChardevProps and 'props' variable milkymist-pfpu: fix potential integer overflow hw/block/nvme: Simplify if-statements a little bit target-lm32: rewrite gen_compare() lm32: milkymist-tmu2: fix integer overflow target-lm32: disable asm logging via LOG_DIS() target-lm32: swap operand of wcsr in LOG_DIS() target-lm32: fix LOG_DIS operand order ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-10-28clean-up: removed duplicate #includesAnand J1-1/+0
Some files contain multiple #includes of the same header file. Removed most of those unnecessary duplicate entries using scripts/clean-includes. Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Anand J <anand.indukala@gmail.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-10-28spapr: Memory hot-unplug supportBharata B Rao1-1/+118
Add support to hot remove pc-dimm memory devices. Since we're introducing a machine-level unplug_request hook, we also had handling for CPU unplug there as well to ensure CPU unplug continues to work as it did before. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> * add hooks to CAS/cmdline enablement of hotplug ACR support * add hook for CPU unplug Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28spapr: use count+index for memory hotplugMichael Roth1-4/+18
Commit 0a417869: spapr: Move memory hotplug to RTAS_LOG_V6_HP_ID_DRC_COUNT type dropped per-DRC/per-LMB hotplugs event in favor of a bulk add via a single LMB count value. This was to avoid overrunning the guest EPOW event queue with hotplug events. This works fine, but relies on the guest exhaustively scanning for pluggable LMBs to satisfy the requested count by issuing rtas-get-sensor(DR_ENTITY_SENSE, ...) calls until all the LMBs associated with the DIMM are identified. With newer support for dedicated hotplug event source, this queue exhaustion is no longer as much of an issue due to implementation details on the guest side, but we still try to avoid excessive hotplug events by now supporting both a count and a starting index to avoid unecessary work. This patch makes use of that approach when the capability is available. Cc: bharata@linux.vnet.ibm.com Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28spapr: add hotplug interrupt machine optionsMichael Roth1-0/+28
This adds machine options of the form: -machine pseries,modern-hotplug-events=true -machine pseries,modern-hotplug-events=false If false, QEMU will force the use of "legacy" style hotplug events, which are surfaced through EPOW events instead of a dedicated hot plug event source, and lack certain features necessary, mainly, for memory unplug support. If true, QEMU will enable support for "modern" dedicated hot plug event source. Note that we will still default to "legacy" style unless the guest advertises support for the "modern" hotplug events via ibm,client-architecture-support hcall during early boot. For pseries-2.7 and earlier we default to false, for newer machine types we default to true. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28spapr_events: add support for dedicated hotplug event sourceMichael Roth1-2/+7
Hotplug events were previously delivered using an EPOW interrupt and were queued by linux guests into a circular buffer. For traditional EPOW events like shutdown/resets, this isn't an issue, but for hotplug events there are cases where this buffer can be exhausted, resulting in the loss of hotplug events, resets, etc. Newer-style hotplug event are delivered using a dedicated event source. We enable this in supported guests by adding standard an additional event source in the guest device-tree via /event-sources, and, if the guest advertises support for the newer-style hotplug events, using the corresponding interrupt to signal the available of hotplug/unplug events. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28spapr: improve ibm,architecture-vec-5 property handlingMichael Roth1-6/+17
ibm,architecture-vec-5 is supposed to encode all option vector 5 bits negotiated between platform/guest. Currently we hardcode this property in the boot-time device tree to advertise a single negotiated capability, "Form 1" NUMA Affinity, regardless of whether or not CAS has been invoked or that capability has actually been negotiated. Improve this by generating ibm,architecture-vec-5 based on the full set of option vector 5 capabilities negotiated via CAS. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28spapr: add option vector handling in CAS-generated resetsMichael Roth1-6/+34
In some cases, ibm,client-architecture-support calls can fail. This could happen in the current code for situations where the modified device tree segment exceeds the buffer size provided by the guest via the call parameters. In these cases, QEMU will reset, allowing an opportunity to regenerate the device tree from scratch via boot-time handling. There are potentially other scenarios as well, not currently reachable in the current code, but possible in theory, such as cases where device-tree properties or nodes need to be removed. We currently don't handle either of these properly for option vector capabilities however. Instead of carrying the negotiated capability beyond the reset and creating the boot-time device tree accordingly, we start from scratch, generating the same boot-time device tree as we did prior to the CAS-generated and the same device tree updates as we did before. This could (in theory) cause us to get stuck in a reset loop. This hasn't been observed, but depending on the extensiveness of CAS-induced device tree updates in the future, could eventually become an issue. Address this by pulling capability-related device tree updates resulting from CAS calls into a common routine, spapr_dt_cas_updates(), and adding an sPAPROptionVector* parameter that allows us to test for newly-negotiated capabilities. We invoke it as follows: 1) When ibm,client-architecture-support gets called, we call spapr_dt_cas_updates() with the set of capabilities added since the previous call to ibm,client-architecture-support. For the initial boot, or a system reset generated by something other than the CAS call itself, this set will consist of *all* options supported both the platform and the guest. For calls to ibm,client-architecture-support immediately after a CAS-induced reset, we call spapr_dt_cas_updates() with only the set of capabilities added since the previous call, since the other capabilities will have already been addressed by the boot-time device-tree this time around. In the unlikely event that capabilities are *removed* since the previous CAS, we will generate a CAS-induced reset. In the unlikely event that we cannot fit the device-tree updates into the buffer provided by the guest, well generate a CAS-induced reset. 2) When a CAS update results in the need to reset the machine and include the updates in the boot-time device tree, we call the spapr_dt_cas_updates() using the full set of negotiated capabilities as part of the reset path. At initial boot, or after a reset generated by something other than the CAS call itself, this set will be empty, resulting in what should be the same boot-time device-tree as we generated prior to this patch. For CAS-induced reset, this routine will be called with the full set of capabilities negotiated by the platform/guest in the previous CAS call, which should result in CAS updates from previous call being accounted for in the initial boot-time device tree. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [dwg: Changed an int -> bool conversion to be more explicit] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28spapr_hcall: use spapr_ovec_* interfaces for CAS optionsMichael Roth1-2/+8
Currently we access individual bytes of an option vector via ldub_phys() to test for the presence of a particular capability within that byte. Currently this is only done for the "dynamic reconfiguration memory" capability bit. If that bit is present, we pass a boolean value to spapr_h_cas_compose_response() to generate a modified device tree segment with the additional properties required to enable this functionality. As more capability bits are added, will would need to modify the code to add additional option vector accesses and extend the param list for spapr_h_cas_compose_response() to include similar boolean values for these parameters. Avoid this by switching to spapr_ovec_* helpers so we can do all the parsing in one shot and then test for these additional bits within spapr_h_cas_compose_response() directly. Cc: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28pseries: Remove spapr_create_fdt_skel()David Gibson1-58/+36
For historical reasons construction of the guest device tree in spapr is divided between spapr_create_fdt_skel() which is called at init time, and spapr_build_fdt() which runs at reset time. Over time, more and more things have needed to be moved to reset time. Previous cleanups mean the only things left in spapr_create_fdt_skel() are the properties of the root node itself. Finish consolidating these two parts of device tree construction, by moving this to the start of spapr_build_fdt(), and removing spapr_create_fdt_skel() entirely. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-10-28pseries: Consolidate construction of /vdevice device tree nodeDavid Gibson1-17/+2
Construction of the /vdevice node (and its children) is divided between spapr_create_fdt_skel() (at init time), which creates the base node, and spapr_populate_vdevice() (at reset time) which creates the nodes for each individual virtual device. This consolidates both into a single function called from spapr_build_fdt(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-10-28pseries: Move /hypervisor node construction to fdt_build_fdt()David Gibson1-21/+28
Currently the /hypervisor device tree node is constructed in spapr_create_fdt_skel(). As part of consolidating device tree construction to reset time, move it to a function called from spapr_build_fdt(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-10-28pseries: Move /event-sources construction to spapr_build_fdt()David Gibson1-3/+3
The /event-sources device tree node is built from spapr_create_fdt_skel(). As part of consolidating device tree construction to reset time, this moves it to spapr_build_fdt(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-10-28pseries: Consolidate construction of /rtas device tree nodeDavid Gibson1-58/+72
For historical reasons construction of the /rtas node in the device tree (amongst others) is split into several places. In particular it's split between spapr_create_fdt_skel(), spapr_build_fdt() and spapr_rtas_device_tree_setup(). In fact, as well as adding the actual RTAS tokens to the device tree, spapr_rtas_device_tree_setup() just adds the ibm,lrdr-capacity property, which despite going in the /rtas node, doesn't have a lot to do with RTAS. This patch consolidates the code constructing /rtas together into a new spapr_dt_rtas() function. spapr_rtas_device_tree_setup() is renamed to spapr_dt_rtas_tokens() and now only adds the token properties. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-10-28pseries: Consolidate construction of /chosen device tree nodeDavid Gibson1-66/+65
For historical reasons, building the /chosen node in the guest device tree is split across several places and includes both parts which write the DT sequentially and others which use random access functions. This patch consolidates construction of the node into one place, using random access functions throughout. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-10-28pseries: Move construction of /interrupt-controller fdt nodeDavid Gibson1-17/+3
Currently the device tree node for the XICS interrupt controller is in spapr_create_fdt_skel(). As part of consolidating device tree construction to reset time, this moves it to a function called from spapr_build_fdt(). In addition we move the actual code into hw/intc/xics_spapr.c with the rest of the PAPR specific interrupt controller code. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-10-28pseries: Consolidate RTAS loadingDavid Gibson1-2/+1
At each system reset, the pseries machine needs to load RTAS (the runtime portion of the guest firmware) into the VM. This means copying the actual RTAS code into guest memory, and also updating the device tree so that the guest OS and boot firmware can locate it. For historical reasons the copy and update to the device tree were in different parts of the code. This cleanup brings them both together in an spapr_load_rtas() function. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-10-28pseries: Move adding of fdt reserve map entriesDavid Gibson1-8/+9
The flattened device tree passed to pseries guests contains a list of reserved memory areas. Currently we construct this list early in spapr_create_fdt_skel() as we sequentially write the fdt. This will be inconvenient for upcoming cleanups, so this patch moves the reserve map changes to the end of fdt construction. This changes fdt_add_reservemap_entry() calls - which work when writing the fdt sequentially to fdt_add_mem_rsv() calls used when altering the fdt in random access mode. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-10-28pseries: Make spapr_create_fdt_skel() get information from machine stateDavid Gibson1-45/+36
Currently spapr_create_fdt_skel() takes a bunch of individual parameters for various things it will put in the device tree. Some of these can already be taken directly from sPAPRMachineState. This patch alters it so that all of them can be taken from there, which will allow this code to be moved away from its current caller in future. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-10-28pseries: Remove rtas_addr and fdt_addr fields from machinestateDavid Gibson1-7/+7
These values are used only within ppc_spapr_reset(), so just change them to local variables. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-10-28pseries: Split device tree construction from device tree loadDavid Gibson1-19/+23
spapr_finalize_fdt() both finishes building the device tree for the guest and loads it into guest memory. For future cleanups, it's going to be more convenient to do these two things separately. The loading portion is pretty trivial, so we move it inline into the caller, ppc_spapr_reset(). We also rename spapr_finalize_fdt(), because the current name is going to become inaccurate. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2016-10-24Increase MAX_CPUMASK_BITS from 255 to 288Igor Mammedov1-1/+1
so that it would be possible to increase maxcpus limit for x86 target. Keep spapr/virt_arm at limit they used to have 255. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-16spapr: Improved placement of PCI host bridges in guest memory mapDavid Gibson1-30/+92
Currently, the MMIO space for accessing PCI on pseries guests begins at 1 TiB in guest address space. Each PCI host bridge (PHB) has a 64 GiB chunk of address space in which it places its outbound PIO and 32-bit and 64-bit MMIO windows. This scheme as several problems: - It limits guest RAM to 1 TiB (though we have a limited fix for this now) - It limits the total MMIO window to 64 GiB. This is not always enough for some of the large nVidia GPGPU cards - Putting all the windows into a single 64 GiB area means that naturally aligning things within there will waste more address space. In addition there was a miscalculation in some of the defaults, which meant that the MMIO windows for each PHB actually slightly overran the 64 GiB region for that PHB. We got away without nasty consequences because the overrun fit within an unused area at the beginning of the next PHB's region, but it's not pretty. This patch implements a new scheme which addresses those problems, and is also closer to what bare metal hardware and pHyp guests generally use. Because some guest versions (including most current distro kernels) can't access PCI MMIO above 64 TiB, we put all the PCI windows between 32 TiB and 64 TiB. This is broken into 1 TiB chunks. The first 1 TiB contains the PIO (64 kiB) and 32-bit MMIO (2 GiB) windows for all of the PHBs. Each subsequent TiB chunk contains a naturally aligned 64-bit MMIO window for one PHB each. This reduces the number of allowed PHBs (without full manual configuration of all the windows) from 256 to 31, but this should still be plenty in practice. We also change some of the default window sizes for manually configured PHBs to saner values. Finally we adjust some tests and libqos so that it correctly uses the new default locations. Ideally it would parse the device tree given to the guest, but that's a more complex problem for another time. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-10-16spapr_pci: Add a 64-bit MMIO windowDavid Gibson1-2/+8
On real hardware, and under pHyp, the PCI host bridges on Power machines typically advertise two outbound MMIO windows from the guest's physical memory space to PCI memory space: - A 32-bit window which maps onto 2GiB..4GiB in the PCI address space - A 64-bit window which maps onto a large region somewhere high in PCI address space (traditionally this used an identity mapping from guest physical address to PCI address, but that's not always the case) The qemu implementation in spapr-pci-host-bridge, however, only supports a single outbound MMIO window, however. At least some Linux versions expect the two windows however, so we arranged this window to map onto the PCI memory space from 2 GiB..~64 GiB, then advertised it as two contiguous windows, the "32-bit" window from 2G..4G and the "64-bit" window from 4G..~64G. This approach means, however, that the 64G window is not naturally aligned. In turn this limits the size of the largest BAR we can map (which does have to be naturally aligned) to roughly half of the total window. With some large nVidia GPGPU cards which have huge memory BARs, this is starting to be a problem. This patch adds true support for separate 32-bit and 64-bit outbound MMIO windows to the spapr-pci-host-bridge implementation, each of which can be independently configured. The 32-bit window always maps to 2G.. in PCI space, but the PCI address of the 64-bit window can be configured (it defaults to the same as the guest physical address). So as not to break possible existing configurations, as long as a 64-bit window is not specified, a large single window can be specified. This will appear the same way to the guest as the old approach, although it's now implemented by two contiguous memory regions rather than a single one. For now, this only adds the possibility of 64-bit windows. The default configuration still uses the legacy mode. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-10-16spapr: Adjust placement of PCI host bridge to allow > 1TiB RAMDavid Gibson1-2/+14
Currently the default PCI host bridge for the 'pseries' machine type is constructed with its IO windows in the 1TiB..(1TiB + 64GiB) range in guest memory space. This means that if > 1TiB of guest RAM is specified, the RAM will collide with the PCI IO windows, causing serious problems. Problems won't be obvious until guest RAM goes a bit beyond 1TiB, because there's a little unused space at the bottom of the area reserved for PCI, but essentially this means that > 1TiB of RAM has never worked with the pseries machine type. This patch fixes this by altering the placement of PHBs on large-RAM VMs. Instead of always placing the first PHB at 1TiB, it is placed at the next 1 TiB boundary after the maximum RAM address. Technically, this changes behaviour in a migration-breaking way for existing machines with > 1TiB maximum memory, but since having > 1 TiB memory was broken anyway, this seems like a reasonable trade-off. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-10-16spapr_pci: Delegate placement of PCI host bridges to machine typeDavid Gibson1-0/+31
The 'spapr-pci-host-bridge' represents the virtual PCI host bridge (PHB) for a PAPR guest. Unlike on x86, it's routine on Power (both bare metal and PAPR guests) to have numerous independent PHBs, each controlling a separate PCI domain. There are two ways of configuring the spapr-pci-host-bridge device: first it can be done fully manually, specifying the locations and sizes of all the IO windows. This gives the most control, but is very awkward with 6 mandatory parameters. Alternatively just an "index" can be specified which essentially selects from an array of predefined PHB locations. The PHB at index 0 is automatically created as the default PHB. The current set of default locations causes some problems for guests with large RAM (> 1 TiB) or PCI devices with very large BARs (e.g. big nVidia GPGPU cards via VFIO). Obviously, for migration we can only change the locations on a new machine type, however. This is awkward, because the placement is currently decided within the spapr-pci-host-bridge code, so it breaks abstraction to look inside the machine type version. So, this patch delegates the "default mode" PHB placement from the spapr-pci-host-bridge device back to the machine type via a public method in sPAPRMachineClass. It's still a bit ugly, but it's about the best we can do. For now, this just changes where the calculation is done. It doesn't change the actual location of the host bridges, or any other behaviour. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-10-14spapr: fix inheritance chain for default machine optionsMichael Roth1-0/+3
Rather than machine instances having backward-compatible option defaults that need to be repeatedly re-enabled for every new machine type we introduce, we set the defaults appropriate for newer machine types, then add code to explicitly disable instance options as needed to maintain compatibility with older machine types. Currently pseries-2.5 does not inherit from pseries-2.6 in this fashion, which is okay at the moment since we do not have any instance compatibility options for pseries-2.6+ currently. We will make use of this in future patches though, so fix it here. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> [dwg: Extended to make 2.7 inherit from 2.8 as well] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-06hw/ppc/spapr: Use POWER8 by default for the pseries-2.8 machineThomas Huth1-1/+5
A couple of distributors are compiling their distributions with "-mcpu=power8" for ppc64le these days, so the user sooner or later runs into a crash there when not explicitely specifying the "-cpu POWER8" option to QEMU (which is currently using POWER7 for the "pseries" machine by default). Due to this reason, the linux-user target already switched to POWER8 a while ago (see commit de3f1b98410e0d5b406a0df3a48547b559d18602). Since the softmmu target of course has the same problem, we should switch there to POWER8 for the newer machine types, too. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-05ppc: Check the availability of transactional memoryThomas Huth1-1/+4
KVM-PR currently does not support transactional memory, and the implementation in TCG is just a fake. We should not announce TM support in the ibm,pa-features property when running on such a system, so disable it by default and only enable it if the KVM implementation supports it (i.e. recent versions of KVM-HV). These changes are based on some earlier work from Anton Blanchard (thanks!). Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-05hw/ppc/spapr: Fix the selection of the processor featuresThomas Huth1-2/+9
The current code uses pa_features_206 for POWERPC_MMU_2_06, and for everything else, it uses pa_features_207. This is bad in some cases because there is also a "degraded" MMU version of ISA 2.06, called POWERPC_MMU_2_06a, which should of course use the flags for 2.06 instead. And there is also the possibility that the user runs the pseries machine with a POWER5+ or even 970 processor. In that case we certainly do not want to set the flags for 2.07, and rather simply skip the setting of the pa-features property instead. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-05hw/ppc/spapr: Move code related to "ibm,pa-features" to a separate functionThomas Huth1-30/+36
The function spapr_populate_cpu_dt() has become quite big already, and since we likely have to extend the pa-features property for every new processor generation, it is nicer if we put the related code into a separate function. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-05pseries: Add 2.8 machine type, set up compatibility macrosDavid Gibson1-2/+20
Now that 2.7 is released, create the pseries-2.8 machine type and add the boilerplate compatiblity macro stuff. There's nothing new to put into the 2.7 compatiliby properties yet, but we'll need something eventually, so we might as well get it ready now. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-09-28Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into stagingPeter Maydell1-4/+2
* thread-safe tb_flush (Fred, Alex, Sergey, me, Richard, Emilio,... :-) * license clarification for compiler.h (Felipe) * glib cflags improvement (Marc-André) * checkpatch silencing (Paolo) * SMRAM migration fix (Paolo) * Replay improvements (Pavel) * IOMMU notifier improvements (Peter) * IOAPIC now defaults to version 0x20 (Peter) # gpg: Signature made Tue 27 Sep 2016 10:57:40 BST # gpg: using RSA key 0xBFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: (28 commits) replay: allow replay stopping and restarting replay: vmstate for replay module replay: move internal data to the structure cpus-common: lock-free fast path for cpu_exec_start/end tcg: Make tb_flush() thread safe cpus-common: Introduce async_safe_run_on_cpu() cpus-common: simplify locking for start_exclusive/end_exclusive cpus-common: remove redundant call to exclusive_idle() cpus-common: always defer async_run_on_cpu work items docs: include formal model for TCG exclusive sections cpus-common: move exclusive work infrastructure from linux-user cpus-common: fix uninitialized variable use in run_on_cpu cpus-common: move CPU work item management to common code cpus-common: move CPU list management to common code linux-user: Add qemu_cpu_is_self() and qemu_cpu_kick() linux-user: Use QemuMutex and QemuCond cpus: Rename flush_queued_work() cpus: Move common code out of {async_, }run_on_cpu() cpus: pass CPUState to run_on_cpu helpers build-sys: put glib_cflags in QEMU_CFLAGS ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-09-27sysbus: Remove ignored return value of FindSysbusDeviceFuncDavid Gibson1-3/+1
Functions of type FindSysbusDeviceFunc currently return an integer. However, this return value is always ignored by the caller in find_sysbus_device(). This changes the function type to return void, to avoid confusion over the function semantics. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-09-27cpus: pass CPUState to run_on_cpu helpersAlex Bennée1-4/+2
CPUState is a fairly common pointer to pass to these helpers. This means if you need other arguments for the async_run_on_cpu case you end up having to do a g_malloc to stuff additional data into the routine. For the current users this isn't a massive deal but for MTTCG this gets cumbersome when the only other parameter is often an address. This adds the typedef run_on_cpu_func for helper functions which has an explicit CPUState * passed as the first parameter. All the users of run_on_cpu and async_run_on_cpu have had their helpers updated to use CPUState where available. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> [Sergey Fedorov: - eliminate more CPUState in user data; - remove unnecessary user data passing; - fix target-s390x/kvm.c and target-s390x/misc_helper.c] Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> (ppc parts) Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> (s390 parts) Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <1470158864-17651-3-git-send-email-alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-23Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.8-20160923' ↵Peter Maydell1-3/+6
into staging ppc patch queue 2016-09-23 This pull request supersedes ppc-for-2.8-20160922. There was a clang build error in that, and I've also added one extra patch in the new pull. Included in this set of ppc and spapr patches are: * TCG implementations for more POWER9 instructions * Some preliminary XICS fixes in preparataion for the pnv machine type * A significant ADB (Macintosh kbd/mouse) cleanup * Some conversions to use trace instead of debug macros * Fixes to correctly handle global TLB flush synchronization in TCG. This is already a bug, but it will have much more impact when we get MTTCG * Add more qtest testcases for Power * Some MAINTAINERS updates * Assorted bugfixes * Add the basics of NUMA associativity to the spapr PCI host bridge This touches some test files and monitor.c which are technically outside the ppc code, but coming through this tree because the changes are primarily of interest to ppc. # gpg: Signature made Fri 23 Sep 2016 08:14:47 BST # gpg: using RSA key 0x6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dgibson/tags/ppc-for-2.8-20160923: (45 commits) spapr_pci: Add numa node id monitor: fix crash for platforms without a CPU 0 linux-user: ppc64: fix ARCH_206 bit in AT_HWCAP ppc/kvm: Mark 64kB page size support as disabled if not available ppc/xics: An ICS with offset 0 is assumed to be uninitialized ppc/xics: account correct irq status Enable H_CLEAR_MOD and H_CLEAR_REF hypercalls on KVM/PPC64. target-ppc: tlbie/tlbivax should have global effect target-ppc: add flag in check_tlb_flush() target-ppc: add TLB_NEED_LOCAL_FLUSH flag spapr: Introduce sPAPRCPUCoreClass target-ppc: implement darn instruction target-ppc: add stxsi[bh]x instruction target-ppc: add lxsi[bw]zx instruction target-ppc: add xxspltib instruction target-ppc: consolidate store conditional target-ppc: move out stqcx impementation target-ppc: consolidate load with reservation target-ppc: convert st[16,32,64]r to use new macro target-ppc: convert st64 to use new macro ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-09-23vl: Switch qemu_uuid to QemuUUIDFam Zheng1-6/+1
Update all qemu_uuid users as well, especially get rid of the duplicated low level g_strdup_printf, sscanf and snprintf calls with QEMU UUID API. Since qemu_uuid_parse is quite tangled with qemu_uuid, its switching to QemuUUID is done here too to keep everything in sync and avoid code churn. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-Id: <1474432046-325-10-git-send-email-famz@redhat.com>
2016-09-23Enable H_CLEAR_MOD and H_CLEAR_REF hypercalls on KVM/PPC64.Nathan Whitehorn1-0/+3
These are mandatory per PAPR and available on Linux 4.3 and newer kernels. The calls in question are required to run FreeBSD guests with reasonable performance, so enable them if possible. Signed-off-by: Nathan Whitehorn <nwhitehorn@freebsd.org> [dwg: Added a stub to fix compile without KVM (e.g. on x86 host)] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-09-23spapr: Introduce sPAPRCPUCoreClassBharata B Rao1-3/+3
Each spapr cpu core type defines an instance_init routine which just populates the CPU class name. This can be done in the class_init commonly for all core types which simplifies the registration. This is inspired by how PowerNV core types are registered. Certain types of spapr cpu cores ('host' and generic type based on host CPU) are initialized in target-ppc/kvm.c. To convert these type registrations to use class_init, we need to expose spapr_cpu_core_class_init() outside of spapr_cpu_core.c. Commit d11b268e1765 added a generic sPAPR CPU core family type to support cases like POWER8 CPU type on POWER8E host CPU. Switching to class_init would fix such scenarios to use the right CPU thread type instead of defaulting to host-powerpc64-cpu. In an unrelated cleanup, fix a typo in .get_hotplug_handler routine. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-09-07hw/ppc: add a ppc_create_page_sizes_prop() helper routineCédric Le Goater1-35/+1
The exact same routine will be used in PowerNV. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-09-07hw/ppc: use error_report instead of fprintfCédric Le Goater1-6/+6
Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-09-07hw/ppc: include fdt helper routine in a common fileCédric Le Goater1-10/+1
spapr_pci would also be a good candidate but the macro _FDT is slightly different. It returns and does not exit. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-08-13ppc: parse cpu features onceGreg Kurz1-0/+2
Considering that features are converted to global properties and global properties are automatically applied to every new instance of created CPU (at object_new() time), there is no point in parsing cpu_model string every time a CPU created. So move parsing outside CPU creation loop and do it only once. Parsing also should be done before any CPU is created so that features would affect the first CPU a well. This patch does that for all PowerPC machine types. It is based on previous work from Bharata: https://lists.nongnu.org/archive/html/qemu-devel/2016-06/msg07564.html Signed-off-by: Greg Kurz <groug@kaod.org> [clg: only kept the fix for the spapr platform. support for other platform will be added in 2.8 ] Signed-off-by: Cédric Le Goater <clg@kaod.org> Tested-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-08-10hw/ppc/spapr: Look up CPU alias names instead of hard-coding the aliasesThomas Huth1-1/+1
Hard-coding the CPU alias names in the spapr_cores[] array has two big disadvantages: 1) We register a real type with the CPU alias name in spapr_cpu_core_register_types() - this prevents us from registering a CPU family name in kvm_ppc_register_host_cpu_type() with the same name (as we do it for the non-hotpluggable CPU types). 2) It's quite cumbersome to maintain the aliases here in sync with the ppc_cpu_aliases list from target-ppc/cpu-models.c. So let's simply add proper alias lookup to the spapr cpu core code, too (by checking whether the given model can be used directly, and if not by trying to look up the given model as an alias name instead). Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>