aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-02-07RISC-V: Set minimum priv version for Zfh to 1.11Christoph Müllner1-1/+1
There are no differences for floating point instructions in priv version 1.11 and 1.12. There is also no dependency for Zfh to priv version 1.12. Therefore allow Zfh to be enabled for priv version 1.11. Acked-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Message-Id: <20230131202013.2541053-12-christoph.muellner@vrull.eu> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2023-02-07RISC-V: Adding T-Head FMemIdx extensionChristoph Müllner5-1/+123
This patch adds support for the T-Head FMemIdx instructions. The patch uses the T-Head specific decoder and translation. Co-developed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Message-Id: <20230131202013.2541053-11-christoph.muellner@vrull.eu> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2023-02-07RISC-V: Adding T-Head MemIdx extensionChristoph Müllner5-1/+464
This patch adds support for the T-Head MemIdx instructions. The patch uses the T-Head specific decoder and translation. Co-developed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Message-Id: <20230131202013.2541053-10-christoph.muellner@vrull.eu> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2023-02-07RISC-V: Adding T-Head MemPair extensionChristoph Müllner5-1/+109
This patch adds support for the T-Head MemPair instructions. The patch uses the T-Head specific decoder and translation. Co-developed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Message-Id: <20230131202013.2541053-9-christoph.muellner@vrull.eu> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2023-02-07RISC-V: Adding T-Head multiply-accumulate instructionsChristoph Müllner5-1/+96
This patch adds support for the T-Head MAC instructions. The patch uses the T-Head specific decoder and translation. Co-developed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Message-Id: <20230131202013.2541053-8-christoph.muellner@vrull.eu> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2023-02-07RISC-V: Adding XTheadCondMov ISA extensionChristoph Müllner5-1/+43
This patch adds support for the XTheadCondMov ISA extension. The patch uses the T-Head specific decoder and translation. Co-developed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Message-Id: <20230131202013.2541053-7-christoph.muellner@vrull.eu> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2023-02-07RISC-V: Adding XTheadBs ISA extensionChristoph Müllner5-1/+23
This patch adds support for the XTheadBs ISA extension. The patch uses the T-Head specific decoder and translation. Co-developed-by: Philipp Tomsich <philipp.tomsich@vrull.eu> Co-developed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Message-Id: <20230131202013.2541053-6-christoph.muellner@vrull.eu> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2023-02-07RISC-V: Adding XTheadBb ISA extensionChristoph Müllner5-2/+149
This patch adds support for the XTheadBb ISA extension. The patch uses the T-Head specific decoder and translation. Co-developed-by: Philipp Tomsich <philipp.tomsich@vrull.eu> Co-developed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Message-Id: <20230131202013.2541053-5-christoph.muellner@vrull.eu> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2023-02-07RISC-V: Adding XTheadBa ISA extensionChristoph Müllner5-1/+66
This patch adds support for the XTheadBa ISA extension. The patch uses the T-Head specific decoder and translation. Co-developed-by: Philipp Tomsich <philipp.tomsich@vrull.eu> Co-developed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Message-Id: <20230131202013.2541053-4-christoph.muellner@vrull.eu> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2023-02-07RISC-V: Adding XTheadSync ISA extensionChristoph Müllner7-1/+105
This patch adds support for the XTheadSync ISA extension. The patch uses the T-Head specific decoder and translation. The implementation introduces a helper to execute synchronization tasks: helper_tlb_flush_all() performs a synchronized TLB flush on all CPUs. Co-developed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Message-Id: <20230131202013.2541053-3-christoph.muellner@vrull.eu> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2023-02-07RISC-V: Adding XTheadCmo ISA extensionChristoph Müllner6-0/+131
This patch adds support for the XTheadCmo ISA extension. To avoid interfering with standard extensions, decoder and translation are in its own xthead* specific files. Future patches should be able to easily add additional T-Head extension. The implementation does not have much functionality (besides accepting the instructions and not qualifying them as illegal instructions if the hart executes in the required privilege level for the instruction), as QEMU does not model CPU caches and instructions are documented to not raise any exceptions. Co-developed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20230131202013.2541053-2-christoph.muellner@vrull.eu> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2023-02-07hw/riscv: change riscv_compute_fdt_addr() semanticsDaniel Henrique Barboza6-17/+32
As it is now, riscv_compute_fdt_addr() is receiving a dram_base, a mem_size (which is defaulted to MachineState::ram_size in all boards) and the FDT pointer. And it makes a very important assumption: the DRAM interval dram_base + mem_size is contiguous. This is indeed the case for most boards that use a FDT. The Icicle Kit board works with 2 distinct RAM banks that are separated by a gap. We have a lower bank with 1GiB size, a gap follows, then at 64GiB the high memory starts. MachineClass::default_ram_size for this board is set to 1.5Gb, and machine_init() is enforcing it as minimal RAM size, meaning that there we'll always have at least 512 MiB in the Hi RAM area. Using riscv_compute_fdt_addr() in this board is weird because not only the board has sparse RAM, and it's calling it using the base address of the Lo RAM area, but it's also using a mem_size that we have guarantees that it will go up to the Hi RAM. All the function assumptions doesn't work for this board. In fact, what makes the function works at all in this case is a coincidence. Commit 1a475d39ef54 introduced a 3GB boundary for the FDT, down from 4Gb, that is enforced if dram_base is lower than 3072 MiB. For the Icicle Kit board, memmap[MICROCHIP_PFSOC_DRAM_LO].base is 0x80000000 (2 Gb) and it has a 1Gb size, so it will fall in the conditions to put the FDT under a 3Gb address, which happens to be exactly at the end of DRAM_LO. If the base address of the Lo area started later than 3Gb this function would be unusable by the board. Changing any assumptions inside riscv_compute_fdt_addr() can also break it by accident as well. Let's change riscv_compute_fdt_addr() semantics to be appropriate to the Icicle Kit board and for future boards that might have sparse RAM topologies to worry about: - relieve the condition that the dram_base + mem_size area is contiguous, since this is already not the case today; - receive an extra 'dram_size' size attribute that refers to a contiguous RAM block that the board wants the FDT to reside on. Together with 'mem_size' and 'fdt', which are now now being consumed by a MachineState pointer, we're able to make clear assumptions based on the DRAM block and total mem_size available to ensure that the FDT will be put in a valid RAM address. Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20230201171212.1219375-4-dbarboza@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2023-02-07hw/riscv: split fdt address calculation from fdt loadDaniel Henrique Barboza6-17/+43
A common trend in other archs is to calculate the fdt address, which is usually straightforward, and then calling a function that loads the fdt/dtb by using that address. riscv_load_fdt() is doing a bit too much in comparison. It's calculating the fdt address via an elaborated heuristic to put the FDT at the bottom of DRAM, and "bottom of DRAM" will vary across boards and configurations, then it's actually loading the fdt, and finally it's returning the fdt address used to the caller. Reduce the existing complexity of riscv_load_fdt() by splitting its code into a new function, riscv_compute_fdt_addr(), that will take care of all fdt address logic. riscv_load_fdt() can then be a simple function that just loads a fdt at the given fdt address. We're also taken the opportunity to clarify the intentions and assumptions made by these functions. riscv_load_fdt() is now receiving a hwaddr as fdt_addr because there is no restriction of having to load the fdt in higher addresses that doesn't fit in an uint32_t. Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Message-Id: <20230201171212.1219375-3-dbarboza@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2023-02-07hw/riscv/boot.c: calculate fdt size after fdt_pack()Daniel Henrique Barboza1-4/+6
fdt_pack() can change the fdt size, meaning that fdt_totalsize() can contain a now deprecated (bigger) value. Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Message-Id: <20230201171212.1219375-2-dbarboza@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2023-02-07target/riscv: set tval for triggered watchpointsSergey Matyukevich2-1/+6
According to privileged spec, if [sm]tval is written with a nonzero value when a breakpoint exception occurs, then [sm]tval will contain the faulting virtual address. Set tval to hit address when breakpoint exception is triggered by hardware watchpoint. Signed-off-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com> Reviewed-by: Bin Meng <bmeng@tinylab.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20230131170955.752743-1-geomatsi@gmail.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2023-02-07hw/riscv/spike.c: rename MachineState 'mc' pointers to' ms'Daniel Henrique Barboza1-9/+9
Follow the QEMU convention of naming MachineState pointers as 'ms' by renaming the instances where we're calling it 'mc'. Suggested-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Message-Id: <20230124212234.412630-4-dbarboza@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2023-02-07hw/riscv/virt.c: rename MachineState 'mc' pointers to 'ms'Daniel Henrique Barboza1-217/+217
We have a convention in other QEMU boards/archs to name MachineState pointers as either 'machine' or 'ms'. MachineClass pointers are usually called 'mc'. The 'virt' RISC-V machine has a lot of instances where MachineState pointers are named 'mc'. There is nothing wrong with that, but we gain more compatibility with the rest of the QEMU code base, and easier reviews, if we follow QEMU conventions. Rename all 'mc' MachineState pointers to 'ms'. This is a very tedious and mechanical patch that was produced by doing the following: - find/replace all 'MachineState *mc' to 'MachineState *ms'; - find/replace all 'mc->fdt' to 'ms->fdt'; - find/replace all 'mc->smp.cpus' to 'ms->smp.cpus'; - replace any remaining occurrences of 'mc' that the compiler complained about. Suggested-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Message-Id: <20230124212234.412630-3-dbarboza@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2023-02-07hw/riscv/virt.c: calculate socket count once in create_fdt_imsic()Daniel Henrique Barboza1-15/+19
riscv_socket_count() returns either ms->numa_state->num_nodes or 1 depending on NUMA support. In any case the value can be retrieved only once and used in the rest of the function. This will also alleviate the rename we're going to do next by reducing the instances of MachineState 'mc' inside hw/riscv/virt.c. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Message-Id: <20230124212234.412630-2-dbarboza@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2023-02-07target/riscv: Ensure opcode is saved for all relevant instructionsAnup Patel7-3/+21
We should call decode_save_opc() for all relevant instructions which can potentially generate a virtual instruction fault or a guest page fault because generating transformed instruction upon guest page fault expects opcode to be available. Without this, hypervisor will see transformed instruction as zero in htinst CSR for guest MMIO emulation which makes MMIO emulation in hypervisor slow and also breaks nested virtualization. Fixes: a9814e3e08d2 ("target/riscv: Minimize the calls to decode_save_opc") Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20230120125950.2246378-5-apatel@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2023-02-07target/riscv: No need to re-start QEMU timer when timecmp == UINT64_MAXAnup Patel1-0/+24
The time CSR will wrap-around immediately after reaching UINT64_MAX so we don't need to re-start QEMU timer when timecmp == UINT64_MAX in riscv_timer_write_timecmp(). Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20230120125950.2246378-4-apatel@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2023-02-07target/riscv: Don't clear mask in riscv_cpu_update_mip() for VSTIPAnup Patel2-6/+8
Instead of clearing mask in riscv_cpu_update_mip() for VSTIP, we should call riscv_cpu_update_mip() with mask == 0 from timer_helper.c for VSTIP. Fixes: 3ec0fe18a31f ("target/riscv: Add vstimecmp suppor") Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20230120125950.2246378-3-apatel@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2023-02-07target/riscv: Update VS timer whenever htimedelta changesAnup Patel1-0/+16
The htimedelta[h] CSR has impact on the VS timer comparison so we should call riscv_timer_write_timecmp() whenever htimedelta changes. Fixes: 3ec0fe18a31f ("target/riscv: Add vstimecmp suppor") Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20230120125950.2246378-2-apatel@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2023-02-07hw/riscv: boot: Don't use CSRs if they are disabledAlistair Francis1-0/+9
If the CSRs and CSR instructions are disabled because the Zicsr extension isn't enabled then we want to make sure we don't run any CSR instructions in the boot ROM. This patches removes the CSR instructions from the reset-vec if the extension isn't enabled. We replace the instruction with a NOP instead. Note that we don't do this for the SiFive U machine, as we are modelling the hardware in that case. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1447 Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Message-Id: <20230123035754.75553-1-alistair.francis@opensource.wdc.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2023-02-07include/hw/riscv/opentitan: update opentitan IRQsWilfred Mallawa2-47/+47
Updates the opentitan IRQs to match the latest supported commit of Opentitan from TockOS. OPENTITAN_SUPPORTED_SHA := 565e4af39760a123c59a184aa2f5812a961fde47 Memory layout as per [1] [1] https://github.com/lowRISC/opentitan/blob/565e4af39760a123c59a184aa2f5812a961fde47/hw/top_earlgrey/sw/autogen/top_earlgrey_memory.h Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20230123063619.222459-1-wilfred.mallawa@opensource.wdc.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2023-02-07target/riscv: update disas.c for xnor/orn/andn and slli.uwPhilipp Tomsich1-4/+4
The decoding of the following instructions from Zb[abcs] currently contains decoding/printing errors: * xnor,orn,andn: the rs2 operand is not being printed * slli.uw: decodes and prints the immediate shift-amount as a register (e.g. 'shift-by-2' becomes 'sp') instead of interpreting this as an immediate This commit updates the instruction descriptions to use the appropriate decoding/printing formats. Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20230120151551.1022761-1-philipp.tomsich@vrull.eu> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2023-02-06migration: save/delete migration thread infoJiang Jiacheng2-0/+10
To support query migration thread infomation, save and delete thread(live_migration and multifdsend) information at thread creation and finish. Signed-off-by: Jiang Jiacheng <jiangjiacheng@huawei.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06migration: Introduce interface query-migrationthreadsJiang Jiacheng4-0/+109
Introduce interface query-migrationthreads. The interface is used to query information about migration threads and returns with migration thread's name and its id. Introduce threadinfo.c to manage threads with migration. Signed-off-by: Jiang Jiacheng <jiangjiacheng@huawei.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06multifd: Fix flush of zero copy page send requestZhenzhong Duan4-4/+1291
Make IO channel flush call after the inflight request has been drained in multifd thread, or else we may missed to flush the inflight request. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06multifd: Fix a race on reading MultiFDPages_t.blockZhenzhong Duan1-2/+5
In multifd_queue_page() MultiFDPages_t.block is checked twice. Between the two checks, MultiFDPages_t.block may be reset to NULL by multifd thread. This lead to the 2nd check always true then a redundant page submitted to multifd thread again. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06migration: check magic value for deciding the mapping of channelsmanish.mishra7-31/+101
Current logic assumes that channel connections on the destination side are always established in the same order as the source and the first one will always be the main channel followed by the multifid or post-copy preemption channel. This may not be always true, as even if a channel has a connection established on the source side it can be in the pending state on the destination side and a newer connection can be established first. Basically causing out of order mapping of channels on the destination side. Currently, all channels except post-copy preempt send a magic number, this patch uses that magic number to decide the type of channel. This logic is applicable only for precopy(multifd) live migration, as mentioned, the post-copy preempt channel does not send any magic number. Also, tls live migrations already does tls handshake before creating other channels, so this issue is not possible with tls, hence this logic is avoided for tls live migrations. This patch uses read peek to check the magic number of channels so that current data/control stream management remains un-effected. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Suggested-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: manish.mishra <manish.mishra@nutanix.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06io: Add support for MSG_PEEK for socket channelmanish.mishra16-10/+50
MSG_PEEK peeks at the channel, The data is treated as unread and the next read shall still return this data. This support is currently added only for socket class. Extra parameter 'flags' is added to io_readv calls to pass extra read flags like MSG_PEEK. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Suggested-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: manish.mishra <manish.mishra@nutanix.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06migration/dirtyrate: Show sample pages only in page-sampling modeZhenzhong Duan1-4/+6
The value of "Sample Pages" is confusing in mode other than page-sampling. See below: (qemu) calc_dirty_rate -b 10 520 (qemu) info dirty_rate Status: measuring Start Time: 11646834 (ms) Sample Pages: 520 (per GB) Period: 10 (sec) Mode: dirty-bitmap Dirty rate: (not ready) (qemu) info dirty_rate Status: measured Start Time: 11646834 (ms) Sample Pages: 0 (per GB) Period: 10 (sec) Mode: dirty-bitmap Dirty rate: 2 (MB/s) While it's totally useless in dirty-ring and dirty-bitmap mode, fix to show it only in page-sampling mode. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06migration: Perform vmsd structure check during testsDr. David Alan Gilbert1-0/+42
Perform a check on vmsd structures during test runs in the hope of catching any missing terminators and other simple screwups. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06migration: Add canary to VMSTATE_END_OF_LISTDr. David Alan Gilbert3-1/+9
We fairly regularly forget VMSTATE_END_OF_LIST markers off descriptions; given that the current check is only for ->name being NULL, sometimes we get unlucky and the code apparently works and no one spots the error. Explicitly add a flag, VMS_END that should be set, and assert it is set during the traversal. Note: This can't go in until we update the copy of vmstate.h in slirp. Suggested-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06migration/rdma: fix return value for qio_channel_rdma_{readv,writev}Fiona Ebner1-5/+10
upon errors. As the documentation in include/io/channel.h states, only -1 and QIO_CHANNEL_ERR_BLOCK should be returned upon error. Other values have the potential to confuse the call sites. error_setg is used rather than error_setg_errno, because there are certain code paths where -1 (as a non-errno) is propagated up (e.g. starting from qemu_rdma_block_for_wrid or qemu_rdma_post_recv_control) all the way to qio_channel_rdma_{readv,writev}. Similar to a216ec85b7 ("migration/channel-block: fix return value for qio_channel_block_{readv,writev}"). Suggested-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06migration: Show downtime during postcopy phasePeter Xu1-2/+12
The downtime should be displayed during postcopy phase because the switchover phase is done. OTOH it's weird to show "expected downtime" which can confuse what does that mean if the switchover has already happened anyway. This is a slight ABI change on QMP, but I assume it shouldn't affect anyone. Reviewed-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06virtio-mem: Proper support for preallocation with migrationDavid Hildenbrand1-0/+87
Ordinary memory preallocation runs when QEMU starts up and creates the memory backends, before processing the incoming migration stream. With virtio-mem, we don't know which memory blocks to preallocate before migration started. Now that we migrate the virtio-mem bitmap early, before migrating any RAM content, we can safely preallocate memory for all plugged memory blocks before migrating any RAM content. This is especially relevant for the following cases: (1) User errors With hugetlb/files, if we don't have sufficient backend memory available on the migration destination, we'll crash QEMU (SIGBUS) during RAM migration when running out of backend memory. Preallocating memory before actual RAM migration allows for failing gracefully and informing the user about the setup problem. (2) Excluded memory ranges during migration For example, virtio-balloon free page hinting will exclude some pages from getting migrated. In that case, we won't crash during RAM migration, but later, when running the VM on the destination, which is bad. To fix this for new QEMU machines that migrate the bitmap early, preallocate the memory early, before any RAM migration. Warn with old QEMU machines. Getting postcopy right is a bit tricky, but we essentially now implement the same (problematic) preallocation logic as ordinary preallocation: preallocate memory early and discard it again before precopy starts. During ordinary preallocation, discarding of RAM happens when postcopy is advised. As the state (bitmap) is loaded after postcopy was advised but before postcopy starts listening, we have to discard memory we preallocated immediately again ourselves. Note that nothing (not even hugetlb reservations) guarantees for postcopy that backend memory (especially, hugetlb pages) are still free after they were freed ones while discarding RAM. Still, allocating that memory at least once helps catching some basic setup problems. Before this change, trying to restore a VM when insufficient hugetlb pages are around results in the process crashing to to a "Bus error" (SIGBUS). With this change, QEMU fails gracefully: qemu-system-x86_64: qemu_prealloc_mem: preallocating memory failed: Bad address qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:03.0/virtio-mem-device-early' qemu-system-x86_64: load of migration failed: Cannot allocate memory And we can even introspect the early migration data, including the bitmap: $ ./scripts/analyze-migration.py -f STATEFILE { "ram (2)": { "section sizes": { "0000:00:03.0/mem0": "0x0000000780000000", "0000:00:04.0/mem1": "0x0000000780000000", "pc.ram": "0x0000000100000000", "/rom@etc/acpi/tables": "0x0000000000020000", "pc.bios": "0x0000000000040000", "0000:00:02.0/e1000.rom": "0x0000000000040000", "pc.rom": "0x0000000000020000", "/rom@etc/table-loader": "0x0000000000001000", "/rom@etc/acpi/rsdp": "0x0000000000001000" } }, "0000:00:03.0/virtio-mem-device-early (51)": { "tmp": "00 00 00 01 40 00 00 00 00 00 00 07 80 00 00 00 00 00 00 00 00 20 00 00 00 00 00 00", "size": "0x0000000040000000", "bitmap": "ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [...] }, "0000:00:04.0/virtio-mem-device-early (53)": { "tmp": "00 00 00 08 c0 00 00 00 00 00 00 07 80 00 00 00 00 00 00 00 00 20 00 00 00 00 00 00", "size": "0x00000001fa400000", "bitmap": "ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [...] }, [...] Reported-by: Jing Qi <jinqi@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com>S Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06virtio-mem: Migrate immutable properties earlyDavid Hildenbrand3-3/+60
The bitmap and the size are immutable while migration is active: see virtio_mem_is_busy(). We can migrate this information early, before migrating any actual RAM content. Further, all information we need for sanity checks is immutable as well. Having this information in place early will, for example, allow for properly preallocating memory before touching these memory locations during RAM migration: this way, we can make sure that all memory was actually preallocated and that any user errors (e.g., insufficient hugetlb pages) can be handled gracefully. In contrast, usable_region_size and requested_size can theoretically still be modified on the source while the VM is running. Keep migrating these properties the usual, late, way. Use a new device property to keep behavior of compat machines unmodified. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com>S Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06virtio-mem: Fail if a memory backend with "prealloc=on" is specifiedDavid Hildenbrand1-0/+6
"prealloc=on" for the memory backend does not work as expected, as virtio-mem will simply discard all preallocated memory immediately again. In the best case, it's an expensive NOP. In the worst case, it's an unexpected allocation error. Instead, "prealloc=on" should be specified for the virtio-mem device only, such that virtio-mem will try preallocating memory before plugging memory dynamically to the guest. Fail if such a memory backend is provided. Tested-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com>S Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06migration/ram: Factor out check for advised postcopyDavid Hildenbrand3-8/+11
Let's factor out this check, to be used in virtio-mem context next. While at it, fix a spelling error in a related comment. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com>S Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06migration/vmstate: Introduce VMSTATE_WITH_TMP_TEST() and VMSTATE_BITMAP_TEST()David Hildenbrand1-2/+10
We'll make use of both next in the context of virtio-mem. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com>S Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06migration/savevm: Allow immutable device state to be migrated early (i.e., ↵David Hildenbrand2-1/+29
before RAM) For virtio-mem, we want to have the plugged/unplugged state of memory blocks available before migrating any actual RAM content, and perform sanity checks before touching anything on the destination. This information is immutable on the migration source while migration is active, We want to use this information for proper preallocation support with migration: currently, we don't preallocate memory on the migration target, and especially with hugetlb, we can easily run out of hugetlb pages during RAM migration and will crash (SIGBUS) instead of catching this gracefully via preallocation. Migrating device state via a VMSD before we start iterating is currently impossible: the only approach that would be possible is avoiding a VMSD and migrating state manually during save_setup(), to be restored during load_state(). Let's allow for migrating device state via a VMSD early, during the setup phase in qemu_savevm_state_setup(). To keep it simple, we indicate applicable VMSD's using an "early_setup" flag. Note that only very selected devices (i.e., ones seriously messing with RAM setup) are supposed to make use of such early state migration. While at it, also use a bool for the "unmigratable" member. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com>S Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06migration/savevm: Prepare vmdesc json writer in qemu_savevm_state_setup()David Hildenbrand3-6/+18
... and store it in the migration state. This is a preparation for storing selected vmds's already in qemu_savevm_state_setup(). Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06migration/savevm: Move more savevm handling into vmstate_save()David Hildenbrand1-42/+37
Let's move more code into vmstate_save(), reducing code duplication and preparing for reuse of vmstate_save() in qemu_savevm_state_setup(). We have to move vmstate_save() to make the compiler happy. We'll now also trace from qemu_save_device_state(), triggering the same tracepoints as previously called from qemu_savevm_state_complete_precopy_non_iterable() only. Note that qemu_save_device_state() ignores iterable device state, such as RAM, and consequently doesn't trigger some other trace points (e.g., trace_savevm_state_setup()). Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06migration/ram: Optimize ram_write_tracking_start() for RamDiscardManagerDavid Hildenbrand1-2/+34
ram_block_populate_read() already optimizes for RamDiscardManager. However, ram_write_tracking_start() will still try protecting discarded memory ranges. Let's optimize, because discarded ranges don't map any pages and (1) For anonymous memory, trying to protect using uffd-wp without a mapped page is ignored by the kernel and consequently a NOP. (2) For shared/file-backed memory, we will fill present page tables in the range with PTE markers. However, we will even allocate page tables just to fill them with unnecessary PTE markers and effectively waste memory. So let's exclude these ranges, just like ram_block_populate_read() already does. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06migration/ram: Rely on used_length for uffd_change_protection()David Hildenbrand1-1/+1
ram_mig_ram_block_resized() will abort migration (including background snapshots) when resizing a RAMBlock. ram_block_populate_read() will only populate RAM up to used_length, so at least for anonymous memory protecting everything between used_length and max_length won't actually be protected and is just a NOP. So let's only protect everything up to used_length. Note: it still makes sense to register uffd-wp for max_length, such that RAM_UF_WRITEPROTECT is independent of a changing used_length. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06migration/ram: Don't explicitly unprotect when unregistering uffd-wpDavid Hildenbrand1-9/+0
When unregistering uffd-wp, older kernels before commit f369b07c86143 ("mm/uffd:reset write protection when unregister with wp-mode") won't clear the uffd-wp PTE bit. When re-registering uffd-wp, the previous uffd-wp PTE bits would trigger again. With above commit, the kernel will clear the uffd-wp PTE bits when unregistering itself. Consequently, we'll clear the uffd-wp PTE bits now twice -- whereby we don't care about clearing them at all: a new background snapshot will re-register uffd-wp and re-protect all memory either way. So let's skip the manual clearing of uffd-wp. If ever relevant, we could clear conditionally in uffd_unregister_memory() -- we just need a way to figure out more recent kernels. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06migration/ram: Fix error handling in ram_write_tracking_start()David Hildenbrand1-2/+3
If something goes wrong during uffd_change_protection(), we would miss to unregister uffd-wp and not release our reference. Fix it by performing the uffd_change_protection(true) last. Note that a uffd_change_protection(false) on the recovery path without a prior uffd_change_protection(false) is fine. Fixes: 278e2f551a09 ("migration: support UFFD write fault processing in ram_save_iterate()") Cc: qemu-stable@nongnu.org Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06migration/ram: Fix populate_read_range()David Hildenbrand1-1/+3
Unfortunately, commit f7b9dcfbcf44 broke populate_read_range(): the loop end condition is very wrong, resulting in that function not populating the full range. Lets' fix that. Fixes: f7b9dcfbcf44 ("migration/ram: Factor out populating pages readable in ram_block_populate_pages()") Cc: qemu-stable@nongnu.org Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06util/userfaultfd: Add uffd_open()Peter Xu4-10/+30
Add a helper to create the uffd handle. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>