aboutsummaryrefslogtreecommitdiff
path: root/hw
AgeCommit message (Collapse)AuthorFilesLines
2021-09-30hw/i386/fw_cfg: Set SGX bits in feature control fw_cfg accordinglySean Christopherson1-1/+11
Request SGX an SGX Launch Control to be enabled in FEATURE_CONTROL when the features are exposed to the guest. Our design is the SGX Launch Control bit will be unconditionally set in FEATURE_CONTROL, which is unlike host bios. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20210719112136.57018-17-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-30i386: Update SGX CPUID info according to hardware/KVM/user inputSean Christopherson3-1/+88
Expose SGX to the guest if and only if KVM is enabled and supports virtualization of SGX. While the majority of ENCLS can be emulated to some degree, because SGX uses a hardware-based root of trust, the attestation aspects of SGX cannot be emulated in software, i.e. ultimately emulation will fail as software cannot generate a valid quote/report. The complexity of partially emulating SGX in Qemu far outweighs the value added, e.g. an SGX specific simulator for userspace applications can emulate SGX for development and testing purposes. Note, access to the PROVISIONKEY is not yet advertised to the guest as KVM blocks access to the PROVISIONKEY by default and requires userspace to provide additional credentials (via ioctl()) to expose PROVISIONKEY. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20210719112136.57018-13-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-30vl: Add sgx compound properties to expose SGX EPC sections to guestSean Christopherson2-6/+43
Because SGX EPC is enumerated through CPUID, EPC "devices" need to be realized prior to realizing the vCPUs themselves, i.e. long before generic devices are parsed and realized. From a virtualization perspective, the CPUID aspect also means that EPC sections cannot be hotplugged without paravirtualizing the guest kernel (hardware does not support hotplugging as EPC sections must be locked down during pre-boot to provide EPC's security properties). So even though EPC sections could be realized through the generic -devices command, they need to be created much earlier for them to actually be usable by the guest. Place all EPC sections in a contiguous block, somewhat arbitrarily starting after RAM above 4g. Ensuring EPC is in a contiguous region simplifies calculations, e.g. device memory base, PCI hole, etc..., allows dynamic calculation of the total EPC size, e.g. exposing EPC to guests does not require -maxmem, and last but not least allows all of EPC to be enumerated in a single ACPI entry, which is expected by some kernels, e.g. Windows 7 and 8. The new compound properties command for sgx like below: ...... -object memory-backend-epc,id=mem1,size=28M,prealloc=on \ -object memory-backend-epc,id=mem2,size=10M \ -M sgx-epc.0.memdev=mem1,sgx-epc.1.memdev=mem2 Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20210719112136.57018-6-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-30i386: Add 'sgx-epc' device to expose EPC sections to guestSean Christopherson2-0/+168
SGX EPC is enumerated through CPUID, i.e. EPC "devices" need to be realized prior to realizing the vCPUs themselves, which occurs long before generic devices are parsed and realized. Because of this, do not allow 'sgx-epc' devices to be instantiated after vCPUS have been created. The 'sgx-epc' device is essentially a placholder at this time, it will be fully implemented in a future patch along with a dedicated command to create 'sgx-epc' devices. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20210719112136.57018-5-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-30Kconfig: Add CONFIG_SGX supportYang Zhong1-0/+5
Add new CONFIG_SGX for sgx support in the Qemu, and the Kconfig default enable sgx in the i386 platform. Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20210719112136.57018-32-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-30memory: Add RAM_PROTECTED flag to skip IOMMU mappingsSean Christopherson1-0/+1
Add a new RAMBlock flag to denote "protected" memory, i.e. memory that looks and acts like RAM but is inaccessible via normal mechanisms, including DMA. Use the flag to skip protected memory regions when mapping RAM for DMA in VFIO. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-30hw/arm: sabrelite: Connect SPI flash CS line to GPIO3_19Xuzhou Cheng1-1/+1
The Linux spi-imx driver does not work on QEMU. The reason is that the state of m25p80 loops in STATE_READING_DATA state after receiving RDSR command, the new command is ignored. Before sending a new command, CS line should be pulled high to make the state of m25p80 back to IDLE. Currently the SPI flash CS line is connected to the SPI controller, but on the real board, it's connected to GPIO3_19. This matches the ecspi1 device node in the board dts. ecspi1 node in imx6qdl-sabrelite.dtsi: &ecspi1 { cs-gpios = <&gpio3 19 GPIO_ACTIVE_LOW>; pinctrl-names = "default"; pinctrl-0 = <&pinctrl_ecspi1>; status = "okay"; flash: m25p80@0 { compatible = "sst,sst25vf016b", "jedec,spi-nor"; spi-max-frequency = <20000000>; reg = <0>; }; }; Should connect the SSI_GPIO_CS to GPIO3_19 when adding a spi-nor to spi1 on sabrelite machine. Verified this patch on Linux v5.14. Logs: # echo "01234567899876543210" > test # mtd_debug erase /dev/mtd0 0x0 0x1000 Erased 4096 bytes from address 0x00000000 in flash # mtd_debug write /dev/mtdblock0 0x0 20 test Copied 20 bytes from test to address 0x00000000 in flash # mtd_debug read /dev/mtdblock0 0x0 20 test_out Copied 20 bytes from address 0x00000000 in flash to test_out # cat test_out 01234567899876543210# Signed-off-by: Xuzhou Cheng <xuzhou.cheng@windriver.com> Reported-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Bin Meng <bin.meng@windriver.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20210927142825.491-1-xchengl.cn@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-09-30ide: Rename ide_bus_new() to ide_bus_init()Peter Maydell10-10/+10
The function ide_bus_new() does an in-place initialization. Rename it to ide_bus_init() to follow our _init vs _new convention. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Corey Minyard <cminyard@mvista.com> Reviewed-by: John Snow <jsnow@redhat.com> Acked-by: John Snow <jsnow@redhat.com> (Feel free to merge.) Message-id: 20210923121153.23754-7-peter.maydell@linaro.org
2021-09-30qbus: Rename qbus_create() to qbus_new()Peter Maydell13-13/+13
Rename the "allocate and return" qbus creation function to qbus_new(), to bring it into line with our _init vs _new convention. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Corey Minyard <cminyard@mvista.com> Message-id: 20210923121153.23754-6-peter.maydell@linaro.org
2021-09-30qbus: Rename qbus_create_inplace() to qbus_init()Peter Maydell31-59/+52
Rename qbus_create_inplace() to qbus_init(); this is more in line with our usual naming convention for functions that in-place initialize objects. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-id: 20210923121153.23754-5-peter.maydell@linaro.org
2021-09-30pci: Rename pci_root_bus_new_inplace() to pci_root_bus_init()Peter Maydell3-18/+18
Rename the pci_root_bus_new_inplace() function to pci_root_bus_init(); this brings the bus type in to line with a "_init for in-place init, _new for allocate-and-return" convention. To do this we need to rename the implementation-internal function that was using the pci_root_bus_init() name to pci_root_bus_internal_init(). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-id: 20210923121153.23754-4-peter.maydell@linaro.org
2021-09-30ipack: Rename ipack_bus_new_inplace() to ipack_bus_init()Peter Maydell2-7/+7
Rename ipack_bus_new_inplace() to ipack_bus_init(), to bring it in to line with a "_init for in-place init, _new for allocate-and-return" convention. Drop the 'name' argument, because the only caller does not pass in a name. If a future caller does need to specify the bus name, we should create an ipack_bus_init_named() function at that point. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-id: 20210923121153.23754-3-peter.maydell@linaro.org
2021-09-30scsi: Replace scsi_bus_new() with scsi_bus_init(), scsi_bus_init_named()Peter Maydell12-20/+15
The function scsi_bus_new() creates a new SCSI bus; callers can either pass in a name argument to specify the name of the new bus, or they can pass in NULL to allow the bus to be given an automatically generated unique name. Almost all callers want to use the autogenerated name; the only exception is the virtio-scsi device. Taking a name argument that should almost always be NULL is an easy-to-misuse API design -- it encourages callers to think perhaps they should pass in some standard name like "scsi" or "scsi-bus". We don't do this anywhere for SCSI, but we do (incorrectly) do it for other bus types such as i2c. The function name also implies that it will return a newly allocated object, when it in fact does in-place allocation. We more commonly name such functions foo_init(), with foo_new() being the allocate-and-return variant. Replace all the scsi_bus_new() callsites with either: * scsi_bus_init() for the usual case where the caller wants an autogenerated bus name * scsi_bus_init_named() for the rare case where the caller needs to specify the bus name and document that for the _named() version it's then the caller's responsibility to think about uniqueness of bus names. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20210923121153.23754-2-peter.maydell@linaro.org
2021-09-30hw/arm: xlnx-zcu102: Add Xilinx eFUSE deviceTong Ho3-0/+45
Connect the support for ZynqMP eFUSE one-time field-programmable bit array. The command argument: -drive if=pflash,index=3,... Can be used to optionally connect the bit array to a backend storage, such that field-programmed values in one invocation can be made available to next invocation. The backend storage must be a seekable binary file, and its size must be 768 bytes or larger. A file with all binary 0's is a 'blank'. Signed-off-by: Tong Ho <tong.ho@xilinx.com> Message-id: 20210917052400.1249094-9-tong.ho@xilinx.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-09-30hw/arm: xlnx-zcu102: Add Xilinx BBRAM deviceTong Ho3-0/+36
Connect the support for Xilinx ZynqMP Battery-Backed RAM (BBRAM) The command argument: -drive if=pflash,index=2,... Can be used to optionally connect the bbram to a backend storage, such that field-programmed values in one invocation can be made available to next invocation. The backend storage must be a seekable binary file, and its size must be 36 bytes or larger. A file with all binary 0's is a 'blank'. Signed-off-by: Tong Ho <tong.ho@xilinx.com> Message-id: 20210917052400.1249094-8-tong.ho@xilinx.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-09-30hw/arm: xlnx-versal-virt: Add Xilinx eFUSE deviceTong Ho3-0/+92
Connect the support for Versal eFUSE one-time field-programmable bit array. The command argument: -drive if=pflash,index=1,... Can be used to optionally connect the bit array to a backend storage, such that field-programmed values in one invocation can be made available to next invocation. The backend storage must be a seekable binary file, and its size must be 3072 bytes or larger. A file with all binary 0's is a 'blank'. Signed-off-by: Tong Ho <tong.ho@xilinx.com> Message-id: 20210917052400.1249094-7-tong.ho@xilinx.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-09-30hw/arm: xlnx-versal-virt: Add Xilinx BBRAM deviceTong Ho3-0/+55
Connect the support for Versal Battery-Backed RAM (BBRAM) The command argument: -drive if=pflash,index=0,... Can be used to optionally connect the bbram to a backend storage, such that field-programmed values in one invocation can be made available to next invocation. The backend storage must be a seekable binary file, and its size must be 36 bytes or larger. A file with all binary 0's is a 'blank'. Signed-off-by: Tong Ho <tong.ho@xilinx.com> Message-id: 20210917052400.1249094-6-tong.ho@xilinx.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-09-30hw/nvram: Introduce Xilinx battery-backed ramTong Ho3-0/+550
This device is present in Versal and ZynqMP product families to store a 256-bit encryption key. Co-authored-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Co-authored-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Signed-off-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com> Signed-off-by: Tong Ho <tong.ho@xilinx.com> Message-id: 20210917052400.1249094-5-tong.ho@xilinx.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-09-30hw/nvram: Introduce Xilinx ZynqMP eFuse deviceTong Ho3-0/+861
This implements the Xilinx ZynqMP eFuse, an one-time field-programmable non-volatile storage device. There is only one such device in the Xilinx ZynqMP product family. Co-authored-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Co-authored-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Signed-off-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com> Signed-off-by: Tong Ho <tong.ho@xilinx.com> Message-id: 20210917052400.1249094-4-tong.ho@xilinx.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-09-30hw/nvram: Introduce Xilinx Versal eFuse deviceTong Ho4-0/+904
This implements the Xilinx Versal eFuse, an one-time field-programmable non-volatile storage device. There is only one such device in the Xilinx Versal product family. This device has two separate mmio interfaces, a controller and a flatten readback. The controller provides interfaces for field-programming, configuration, control, and status. The flatten readback is a cache to provide a byte-accessible read-only interface to efficiently read efuse array. Co-authored-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Co-authored-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Signed-off-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com> Signed-off-by: Tong Ho <tong.ho@xilinx.com> Message-id: 20210917052400.1249094-3-tong.ho@xilinx.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-09-30hw/nvram: Introduce Xilinx eFuse QOMTong Ho4-0/+408
This introduces the QOM for Xilinx eFuse, an one-time field-programmable storage bit array. The actual mmio interface to the array varies by device families and will be provided in different change-sets. Co-authored-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Co-authored-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Signed-off-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com> Signed-off-by: Tong Ho <tong.ho@xilinx.com> Message-id: 20210917052400.1249094-2-tong.ho@xilinx.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-09-30allwinner-h3: Switch to SMC as PSCI conduitAlexander Graf1-1/+1
The Allwinner H3 SoC uses Cortex-A7 cores which support virtualization. However, today we are configuring QEMU to use HVC as PSCI conduit. That means HVC calls get trapped into QEMU instead of the guest's own emulated CPU and thus break the guest's ability to execute virtualization. Fix this by moving to SMC as conduit, freeing up HYP completely to the VM. Signed-off-by: Alexander Graf <agraf@csgraf.de> Message-id: 20210920203931.66527-1-agraf@csgraf.de Fixes: 740dafc0ba0 ("hw/arm: add Allwinner H3 System-on-Chip") Reviewed-by: Niek Linnenbank <nieklinnenbank@gmail.com> Tested-by: Niek Linnenbank <nieklinnenbank@gmail.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-09-30spapr/xive: Fix kvm_xive_source_reset trace eventCédric Le Goater1-2/+2
The trace event was placed in the wrong routine. Move it under kvmppc_xive_source_reset_one(). Fixes: 4e960974d4ee ("xive: Add trace events") Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210922070205.1235943-1-clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-30spapr_numa.c: fixes in spapr_numa_FORM2_write_rtas_tables()Daniel Henrique Barboza1-10/+9
This patch has a handful of modifications for the recent added FORM2 support: - to not allocate more than the necessary size in 'distance_table'. At this moment the array is oversized due to allocating uint32_t for all elements, when most of them fits in an uint8_t. Fix it by changing the array to uint8_t and allocating the exact size; - use stl_be_p() to store the uint32_t at the start of 'distance_table'; - use sizeof(uint32_t) to skip the uint32_t length when populating the distances; - use the NUMA_DISTANCE_MIN macro from sysemu/numa.h to avoid hardcoding the local distance value. Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210922122852.130054-2-danielhb413@gmail.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-30hw/intc: openpic: Clean up the stylesBin Meng1-21/+34
Correct the multi-line comment format. No functional changes. Signed-off-by: Bin Meng <bin.meng@windriver.com> Message-Id: <20210918032653.646370-3-bin.meng@windriver.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-30hw/intc: openpic: Drop Raven related codesBin Meng1-27/+1
There is no machine that uses Motorola MCP750 (aka Raven) model. Drop the related codes. While we are here, drop the mentioning of Intel GW80314 I/O companion chip in the comments as it has been obsolete for years, and correct a typo too. Signed-off-by: Bin Meng <bin.meng@windriver.com> Message-Id: <20210918032653.646370-2-bin.meng@windriver.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-30hw/intc: openpic: Correct the reset value of IPIDR for FSL chipsetBin Meng1-0/+9
The reset value of IPIDR should be zero for Freescale chipset, per the following 2 manuals I checked: - P2020RM (https://www.nxp.com/webapp/Download?colCode=P2020RM) - P4080RM (https://www.nxp.com/webapp/Download?colCode=P4080RM) Currently it is set to 1, which leaves the IPI enabled on core 0 after power-on reset. Such may cause unexpected interrupt to be delivered to core 0 if the IPI is triggered from core 0 to other cores later. Fixes: ffd5e9fe0276 ("openpic: Reset IRQ source private members") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/584 Signed-off-by: Bin Meng <bin.meng@windriver.com> Message-Id: <20210918032653.646370-1-bin.meng@windriver.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-30target/ppc: Fix 64-bit decrementerCédric Le Goater1-11/+9
The current way the mask is built can overflow with a 64-bit decrementer. Use sextract64() to extract the signed values and remove the logic to handle negative values which has become useless. Cc: Luis Fernando Fujita Pires <luis.pires@eldorado.org.br> Fixes: a8dafa525181 ("target/ppc: Implement large decrementer support for TCG") Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210920061203.989563-5-clg@kaod.org> Reviewed-by: Luis Pires <luis.pires@eldorado.org.br> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-30target/ppc: Convert debug to trace events (decrementer and IRQ)Cédric Le Goater2-109/+82
Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210920061203.989563-4-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-30spapr_numa.c: handle auto NUMA node with no distance infoDaniel Henrique Barboza1-0/+11
numa_complete_configuration() in hw/core/numa.c always adds a NUMA node for the pSeries machine if none was specified, but without node distance information for the single node created. NUMA FORM1 affinity code didn't rely on numa_state information to do its job, but FORM2 does. As is now, this is the result of a pSeries guest with NUMA FORM2 affinity when no NUMA nodes is specified: $ numactl -H available: 1 nodes (0) node 0 cpus: 0 node 0 size: 16222 MB node 0 free: 15681 MB No distance information available. This can be amended in spapr_numa_FORM2_write_rtas_tables(). We're enforcing that the local distance (the distance to the node to itself) is always 10. This allows for the proper creation of the NUMA distance tables, fixing the output of 'numactl -H' in the guest: $ numactl -H available: 1 nodes (0) node 0 cpus: 0 node 0 size: 16222 MB node 0 free: 15685 MB node distances: node 0 0: 10 CC: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210920174947.556324-8-danielhb413@gmail.com> Acked-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-30spapr_numa.c: FORM2 NUMA affinity supportDaniel Henrique Barboza2-0/+154
The main feature of FORM2 affinity support is the separation of NUMA distances from ibm,associativity information. This allows for a more flexible and straightforward NUMA distance assignment without relying on complex associations between several levels of NUMA via ibm,associativity matches. Another feature is its extensibility. This base support contains the facilities for NUMA distance assignment, but in the future more facilities will be added for latency, performance, bandwidth and so on. This patch implements the base FORM2 affinity support as follows: - the use of FORM2 associativity is indicated by using bit 2 of byte 5 of ibm,architecture-vec-5. A FORM2 aware guest can choose to use FORM1 or FORM2 affinity. Setting both forms will default to FORM2. We're not advertising FORM2 for pseries-6.1 and older machine versions to prevent guest visible changes in those; - ibm,associativity-reference-points has a new semantic. Instead of being used to calculate distances via NUMA levels, it's now used to indicate the primary domain index in the ibm,associativity domain of each resource. In our case it's set to {0x4}, matching the position where we already place logical_domain_id; - two new RTAS DT artifacts are introduced: ibm,numa-lookup-index-table and ibm,numa-distance-table. The index table is used to list all the NUMA logical domains of the platform, in ascending order, and allows for spartial NUMA configurations (although QEMU ATM doesn't support that). ibm,numa-distance-table is an array that contains all the distances from the first NUMA node to all other nodes, then the second NUMA node distances to all other nodes and so on; - get_max_dist_ref_points(), get_numa_assoc_size() and get_associativity() now checks for OV5_FORM2_AFFINITY and returns FORM2 values if the guest selected FORM2 affinity during CAS. Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210920174947.556324-7-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-30spapr: move FORM1 verifications to post CASDaniel Henrique Barboza3-39/+53
FORM2 NUMA affinity is prepared to deal with empty (memory/cpu less) NUMA nodes. This is used by the DAX KMEM driver to locate a PAPR SCM device that has a different latency than the original NUMA node from the regular memory. FORM2 is also able to deal with asymmetric NUMA distances gracefully, something that our FORM1 implementation doesn't do. Move these FORM1 verifications to a new function and wait until after CAS, when we're sure that we're sticking with FORM1, to enforce them. Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210920174947.556324-6-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-30spapr_numa.c: rename numa_assoc_array to FORM1_assoc_arrayDaniel Henrique Barboza2-13/+26
Introducing a new NUMA affinity, FORM2, requires a new mechanism to switch between affinity modes after CAS. Also, we want FORM2 data structures and functions to be completely separated from the existing FORM1 code, allowing us to avoid adding new code that inherits the existing complexity of FORM1. The idea of switching values used by the write_dt() functions in spapr_numa.c was already introduced in the previous patch, and the same approach will be used when dealing with the FORM1 and FORM2 arrays. We can accomplish that by that by renaming the existing numa_assoc_array to FORM1_assoc_array, which now is used exclusively to handle FORM1 affinity data. A new helper get_associativity() is then introduced to be used by the write_dt() functions to retrieve the current ibm,associativity array of a given node, after considering affinity selection that might have been done during CAS. All code that was using numa_assoc_array now needs to retrieve the array by calling this function. This will allow for an easier plug of FORM2 data later on. Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210920174947.556324-5-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-30spapr_numa.c: parametrize FORM1 macrosDaniel Henrique Barboza1-21/+53
The next preliminary step to introduce NUMA FORM2 affinity is to make the existing code independent of FORM1 macros and values, i.e. MAX_DISTANCE_REF_POINTS, NUMA_ASSOC_SIZE and VCPU_ASSOC_SIZE. This patch accomplishes that by doing the following: - move the NUMA related macros from spapr.h to spapr_numa.c where they are used. spapr.h gets instead a 'NUMA_NODES_MAX_NUM' macro that is used to refer to the maximum number of NUMA nodes, including GPU nodes, that the machine can support; - MAX_DISTANCE_REF_POINTS and NUMA_ASSOC_SIZE are renamed to FORM1_DIST_REF_POINTS and FORM1_NUMA_ASSOC_SIZE. These FORM1 specific macros are used in FORM1 init functions; - code that uses MAX_DISTANCE_REF_POINTS now retrieves the max_dist_ref_points value using get_max_dist_ref_points(). NUMA_ASSOC_SIZE is replaced by get_numa_assoc_size() and VCPU_ASSOC_SIZE is replaced by get_vcpu_assoc_size(). These functions are used by the generic device tree functions and h_home_node_associativity() and will allow them to switch between FORM1 and FORM2 without changing their core logic. Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210920174947.556324-4-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-30spapr_numa.c: scrap 'legacy_numa' conceptDaniel Henrique Barboza1-27/+20
When first introduced, 'legacy_numa' was a way to refer to guests that either wouldn't be affected by associativity domain calculations, namely the ones with only 1 NUMA node, and pre 5.2 guests that shouldn't be affected by it because it would be an userspace change. Calling these cases 'legacy_numa' was a convenient way to label these cases. We're about to introduce a new NUMA affinity, FORM2, and this concept of 'legacy_numa' is now a bit misleading because, although it is called 'legacy' it is in fact a FORM1 exclusive contraint. This patch removes spapr_machine_using_legacy_numa() and open code the conditions in each caller. While we're at it, move the chunk inside spapr_numa_FORM1_affinity_init() that sets all numa_assoc_array domains with 'node_id' to spapr_numa_define_FORM1_domains(). This chunk was being executed if !pre_5_2_numa_associativity and num_nodes => 1, the same conditions in which spapr_numa_define_FORM1_domains() is called shortly after. Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210920174947.556324-3-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-30spapr_numa.c: split FORM1 code into helpersDaniel Henrique Barboza1-10/+25
The upcoming FORM2 NUMA affinity will support asymmetric NUMA topologies and doesn't need be concerned with all the legacy support for older pseries FORM1 guests. We're also not going to calculate associativity domains based on numa distance (via spapr_numa_define_associativity_domains) since the distances will be written directly into new DT properties. Let's split FORM1 code into its own functions to allow for easier insertion of FORM2 logic later on. Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210920174947.556324-2-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-30target/ppc: Replace debug messages by asserts for unknown IRQ pinsCédric Le Goater1-18/+6
If an unknown pin of the IRQ controller is raised, something is very wrong in the QEMU model. It is better to abort. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210920061203.989563-3-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-30memory_hotplug.c: send DEVICE_UNPLUG_GUEST_ERROR in acpi_memory_hotplug_write()Daniel Henrique Barboza1-0/+9
MEM_UNPLUG_ERROR is deprecated since the introduction of DEVICE_UNPLUG_GUEST_ERROR. Keep emitting both while the deprecation of MEM_UNPLUG_ERROR is pending. CC: Michael S. Tsirkin <mst@redhat.com> CC: Igor Mammedov <imammedo@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210907004755.424931-8-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-30spapr: use DEVICE_UNPLUG_GUEST_ERROR to report unplug errorsDaniel Henrique Barboza2-5/+14
Linux Kernel 5.12 is now unisolating CPU DRCs in the device_removal error path, signalling that the hotunplug process wasn't successful. This allow us to send a DEVICE_UNPLUG_GUEST_ERROR in drc_unisolate_logical() to signal this error to the management layer. We also have another error path in spapr_memory_unplug_rollback() for configured LMB DRCs. Kernels older than 5.13 will not unisolate the LMBs in the hotunplug error path, but it will reconfigure them. Let's send the DEVICE_UNPLUG_GUEST_ERROR event in that code path as well to cover the case of older kernels. Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210907004755.424931-7-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-29spapr_drc.c: do not error_report() when drc->dev->id == NULLDaniel Henrique Barboza1-2/+5
The error_report() call in drc_unisolate_logical() is not considering that drc->dev->id can be NULL, and the underlying functions error_report() calls to do its job (vprintf(), g_strdup_printf() ...) has undefined behavior when trying to handle "%s" with NULL arguments. Besides, there is no utility into reporting that an unknown device was rejected by the guest. Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210907004755.424931-4-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-29spapr.c: handle dev->id in spapr_memory_unplug_rollback()Daniel Henrique Barboza1-1/+1
As done in hw/acpi/memory_hotplug.c, pass an empty string if dev->id is NULL to qapi_event_send_mem_unplug_error() to avoid relying on a behavior that can be changed in the future. Suggested-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210907004755.424931-3-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-29memory_hotplug.c: handle dev->id = NULL in acpi_memory_hotplug_write()Daniel Henrique Barboza1-1/+1
qapi_event_send_mem_unplug_error() deals with @device being NULL by replacing it with an empty string ("") when emitting the event. Aside from the fact that this behavior (qapi visitor mapping NULL pointer to "") can be patched/changed someday, there's also the lack of utility that the event brings to listeners, e.g. "a memory unplug error happened somewhere". In theory we should just avoit emitting this event at all if dev->id is NULL, but this would be an incompatible change to existing guests. Instead, let's make the forementioned behavior explicit: if dev->id is NULL, pass an empty string to qapi_event_send_mem_unplug_error(). Suggested-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210907004755.424931-2-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-29ppc/pnv: Add an assert when calculating the RAM distribution on chipsCédric Le Goater1-0/+2
Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210902130928.528803-3-clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-29ppc/pnv: Rename "id" to "quad-id" in PnvQuadCédric Le Goater2-4/+4
This to avoid possible conflicts with the "id" property of QOM objects. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210901094153.227671-9-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-29ppc/xive: Export xive_tctx_word2() helperCédric Le Goater1-5/+0
Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210901094153.227671-8-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-29ppc/xive: Export priority_to_ipb() helperCédric Le Goater1-15/+6
Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210901094153.227671-7-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-29ppc/pnv: Remove useless variableCédric Le Goater1-4/+3
Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210901094153.227671-5-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-29ppc/pnv: Add a comment on the "primary-topology-index" propertyCédric Le Goater1-0/+4
On P10, the chip id is calculated from the "Primary topology table index". See skiboot commits for more information [1]. This information is extracted from the hdata on real systems which QEMU needs to emulate. Add this property for all machines even if it is only used on POWER10. [1] https://github.com/open-power/skiboot/commit/2ce3f083f399 https://github.com/open-power/skiboot/commit/a2d4d7f9e14a Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210901094153.227671-4-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-29ppc/spapr: Add a POWER10 DD2 CPUCédric Le Goater1-0/+1
Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210901094153.227671-3-clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-29i386/kvm: Replace abs64() with uabs64() from host-utilsLuis Pires1-6/+1
Drop abs64() and use uabs64() from host-utils, which avoids an undefined behavior when taking abs of the most negative value. Signed-off-by: Luis Pires <luis.pires@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20210910112624.72748-5-luis.pires@eldorado.org.br> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>