path: root/target/arm
2024-09-19  target/arm: Push tcg_rnd into handle_shri_with_rndacc  (Richard Henderson, 1 file, -26/+6)
We always pass the same value for round; compute it within common code. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240912024114.1097832-21-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-09-19  target/arm: Convert SSHLL, USHLL to decodetree  (Richard Henderson, 2 files, -44/+45)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240912024114.1097832-20-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-09-19  target/arm: Use {, s}extract in handle_vec_simd_wshli  (Richard Henderson, 1 file, -2/+5)
Combine the right shift with the extension via the tcg extract operations. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240912024114.1097832-19-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
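A minimal sketch of the transformation (illustrative operand names, not the exact QEMU code): TCG's extract op returns len bits starting at a given offset zero-extended, and sextract sign-extends them, so a shift followed by an explicit extension collapses into a single op.

    /* Before: shift the element down, then widen it explicitly. */
    tcg_gen_shri_i64(tcg_tmp, tcg_src, 32);
    tcg_gen_ext16s_i64(tcg_tmp, tcg_tmp);
    /* After: sign-extract 16 bits starting at bit 32 in one operation. */
    tcg_gen_sextract_i64(tcg_tmp, tcg_src, 32, 16);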
2024-09-19  target/arm: Convert handle_vec_simd_shli to decodetree  (Richard Henderson, 2 files, -30/+18)
This includes SHL and SLI. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240912024114.1097832-18-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-09-19  target/arm: Convert handle_vec_simd_shri to decodetree  (Richard Henderson, 2 files, -60/+89)
This includes SSHR, USHR, SSRA, USRA, SRSHR, URSHR, SRSRA, URSRA, SRI. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240912024114.1097832-17-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-09-19  target/arm: Fix whitespace near gen_srshr64_i64  (Richard Henderson, 1 file, -1/+1)
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240912024114.1097832-16-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-09-19  target/arm: Introduce gen_gvec_sshr, gen_gvec_ushr  (Richard Henderson, 4 files, -38/+27)
Handle the two special cases within these new functions instead of higher in the call stack. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240912024114.1097832-15-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-09-19  target/arm: Convert MOVI, FMOV, ORR, BIC (vector immediate) to decodetree  (Richard Henderson, 2 files, -67/+59)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240912024114.1097832-14-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-09-19  target/arm: Convert FMOVI (scalar, immediate) to decodetree  (Richard Henderson, 2 files, -48/+30)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240912024114.1097832-13-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-09-19  target/arm: Convert FMAXNMV, FMINNMV, FMAXV, FMINV to decodetree  (Richard Henderson, 2 files, -123/+67)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240912024114.1097832-12-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-09-19  target/arm: Convert ADDV, *ADDLV, *MAXV, *MINV to decodetree  (Richard Henderson, 2 files, -91/+61)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240912024114.1097832-11-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-09-19  target/arm: Simplify do_reduction_op  (Richard Henderson, 1 file, -27/+13)
Use simple shift and add instead of ctpop, ctz, shift and mask. Unlike SVE, there is no predicate to disable elements. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240912024114.1097832-10-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-09-19  target/arm: Convert UZP, TRN, ZIP to decodetree  (Richard Henderson, 2 files, -90/+77)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240912024114.1097832-9-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-09-19  target/arm: Convert TBL, TBX to decodetree  (Richard Henderson, 2 files, -33/+18)
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240912024114.1097832-8-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-09-19  target/arm: Convert EXT to decodetree  (Richard Henderson, 2 files, -73/+53)
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240912024114.1097832-7-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-09-19  target/arm: Use tcg_gen_extract2_i64 for EXT  (Richard Henderson, 1 file, -20/+3)
The extract2 tcg op performs the same operation as the do_ext64 function. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240912024114.1097832-6-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
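A sketch of why this is a drop-in replacement (illustrative operand names): extract2 returns 64 bits of the 128-bit concatenation hi:lo starting at a given bit position, which is exactly the shift/shift/or sequence that do_ext64 open-coded.

    /* dst = (hi:lo) >> pos for 0 < pos < 64; this single op replaces
     *   tcg_gen_shri_i64(lo, lo, pos);
     *   tcg_gen_shli_i64(hi, hi, 64 - pos);
     *   tcg_gen_or_i64(dst, lo, hi);
     */
    tcg_gen_extract2_i64(dst, lo, hi, pos);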
2024-09-19  target/arm: Use cmpsel in gen_sshl_vec  (Richard Henderson, 1 file, -5/+3)
Instead of cmp+and or cmp+andc, use cmpsel. This will be better for hosts that use predicate registers for cmp. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240912024114.1097832-5-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-09-19  target/arm: Use cmpsel in gen_ushl_vec  (Richard Henderson, 1 file, -11/+8)
Instead of cmp+and or cmp+andc, use cmpsel. This will be better for hosts that use predicate registers for cmp. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240912024114.1097832-4-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
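A sketch of the idea behind these two cmpsel conversions (illustrative operand names, not the exact QEMU code): instead of materializing an all-ones mask with a vector compare and then ANDing it in, cmpsel selects between the shifted value and zero directly, which maps better onto hosts whose vector compares produce predicate registers.

    /* Before: build a mask from the compare, then AND it in. */
    tcg_gen_cmp_vec(TCG_COND_LTU, vece, mask, shift, max_shift);
    tcg_gen_and_vec(vece, dst, val, mask);
    /* After: dst = (shift < max_shift) ? val : zero, in one op
     * (zero here is an all-zeroes vector constant). */
    tcg_gen_cmpsel_vec(TCG_COND_LTU, vece, dst, shift, max_shift, val, zero);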
2024-09-19  target/arm: Replace tcg_gen_dupi_vec with constants in translate-sve.c  (Richard Henderson, 1 file, -79/+49)
Instead of copying a constant into a temporary with dupi, use a vector constant directly. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240912024114.1097832-3-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-09-19  target/arm: Replace tcg_gen_dupi_vec with constants in gengvec.c  (Richard Henderson, 1 file, -24/+19)
Instead of copying a constant into a temporary with dupi, use a vector constant directly. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240912024114.1097832-2-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
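A sketch of the pattern being replaced in both of these patches (illustrative values and operand names, not the exact QEMU code): a dupi into a freshly allocated temporary becomes a direct use of a vector constant, which the backend can share and which never needs freeing.

    /* Before: allocate a temp and broadcast the immediate into it. */
    TCGv_vec t = tcg_temp_new_vec_matching(d);
    tcg_gen_dupi_vec(vece, t, 0x80);
    tcg_gen_and_vec(vece, d, d, t);
    /* After: use a read-only vector constant directly as an operand. */
    tcg_gen_and_vec(vece, d, d, tcg_constant_vec_matching(d, vece, 0x80));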
2024-09-13  target/arm/tcg: refine cache descriptions with a wrapper  (Alireza Sanaee, 3 files, -60/+117)
This patch allows for easier manipulation of the cache description register, CCSIDR, which is also helpful for testing. Currently the numbers are hard-coded and prone to errors, so this patch adds a wrapper for the different types of CPUs available in tcg to describe their caches. A single function `make_ccsidr` supports two cases via a FORMAT parameter that can be LEGACY or CCIDX, which determines the layout of the register: the 32-bit version follows specification [1], while the 64-bit version follows specification [2]. [1] B4.1.19, ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition, https://developer.arm.com/documentation/ddi0406 [2] D23.2.29, ARM Architecture Reference Manual for A-profile Architecture, https://developer.arm.com/documentation/ddi0487/latest/ Signed-off-by: Alireza Sanaee <alireza.sanaee@huawei.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-id: 20240903144550.280-1-alireza.sanaee@huawei.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
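As a rough illustration of what such a wrapper computes (a sketch only: the signature, enum names and helper below are reconstructed from the Arm ARM references above, not copied from the QEMU patch):

    /* Sketch: pack a cache geometry into CCSIDR.  Field positions per the
     * Arm ARM: the legacy (32-bit) form uses LineSize [2:0],
     * Associativity [12:3], NumSets [27:13]; the CCIDX (64-bit) form uses
     * LineSize [2:0], Associativity [23:3], NumSets [55:32].  LineSize
     * encodes log2(bytes per line) - 4; the other fields are value - 1. */
    enum CCSIDRFormat { CCSIDR_LEGACY, CCSIDR_CCIDX };

    static uint64_t make_ccsidr(enum CCSIDRFormat fmt, unsigned assoc,
                                unsigned linesize, unsigned cachesize)
    {
        uint64_t lsize = __builtin_ctz(linesize) - 4;  /* power-of-two line */
        uint64_t sets = cachesize / (assoc * linesize);

        if (fmt == CCSIDR_LEGACY) {
            return lsize | ((assoc - 1) << 3) | ((sets - 1) << 13);
        }
        return lsize | ((uint64_t)(assoc - 1) << 3) | ((sets - 1) << 32);
    }

For example, a 64KiB 4-way cache with 64-byte lines encodes LineSize=2, Associativity=3 and NumSets=255 in either format.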
2024-09-13  hvf: arm: Implement and use hvf_get_physical_address_range  (Danny Canter, 4 files, -1/+108)
This patch's main focus is to use the previously added hvf_get_physical_address_range to inform VM creation about the IPA size we need for the VM, so we can extend the default 36b IPA size and support VMs with 64+GB of RAM. This is done by freezing the memory map, computing the highest GPA and then (depending on whether the platform supports an IPA size that large) telling the kernel to use a size >= that for the VM. In pursuit of this a couple of things related to how we handle the physical address range we expose to guests were altered; to explain what we were doing before: to get the IPA size we were reading id_aa64mmfr0_el1's PARange field from a newly created vcpu. Unfortunately, HVF just returns the host's PARange directly for the initial value, not the IPA size that will actually back the VM, so we believed we had much more address space than we actually do. Starting in macOS 13.0 some APIs were introduced to be able to query the maximum IPA size the kernel supports, and to set the IPA size for a given VM. However, this still has a couple of issues on < macOS 15. Up until macOS 15 (and if the hardware supported it) the max IPA size was 39 bits, which is not a valid PARange value, so we can't clamp down what we advertise in the vcpu's id_aa64mmfr0_el1 to our IPA size. Starting in macOS 15, however, the maximum IPA size is 40 bits (if the hardware supports it as well), which is also a valid PARange value, so we can set our IPA size to the maximum as well as clamp down the PARange we advertise to the guest. This allows VMs with 64+ GB of RAM and should fix the oddness of the PARange situation as well. Signed-off-by: Danny Canter <danny_canter@apple.com> Message-id: 20240828111552.93482-4-danny_canter@apple.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-09-13  hvf: Split up hv_vm_create logic per arch  (Danny Canter, 1 file, -0/+9)
This is preliminary work to split up hv_vm_create logic per platform so we can support creating VMs with > 64GB of RAM on Apple Silicon machines. This is done via ARM HVF's hv_vm_config_create() (and other APIs that modify this config that will be coming in future patches). This should have no behavioral difference at all as hv_vm_config_create() just assigns the same default values as if you just passed NULL to the function. Signed-off-by: Danny Canter <danny_canter@apple.com> Message-id: 20240828111552.93482-3-danny_canter@apple.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-09-10  gdbstub: Add support for MTE in system mode  (Gustavo Romero, 1 file, -2/+4)
This commit makes handle_q_memtag, handle_q_isaddresstagged, and handle_Q_memtag stubs build for system mode, allowing all GDB 'memory-tag' subcommands to work with QEMU gdbstub on aarch64 system mode. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/620 Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20240906143316.657436-3-gustavo.romero@linaro.org> [AJB: add #ifdef CONFIG_TCG guards] Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20240910173900.4154726-8-alex.bennee@linaro.org>
2024-09-10  gdbstub: Use specific MMU index when probing MTE addresses  (Gustavo Romero, 1 file, -4/+13)
Use cpu_mmu_index() to determine the specific translation regime (MMU index) before probing addresses using allocation_tag_mem_probe(). Currently, the MMU index is hardcoded to 0 and only works for user mode. By obtaining the specific MMU index according to the translation regime, future use of the stubs relying on allocation_tag_mem_probe in other regimes will be possible, like in EL1. This commit also changes the ptr_size value passed to allocation_tag_mem_probe() from 8 to 1. The ptr_size parameter actually represents the number of bytes in the memory access (which can be as small as 1 byte), rather than the number of bits used in the address space pointed to by ptr. Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20240906143316.657436-2-gustavo.romero@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20240910173900.4154726-7-alex.bennee@linaro.org>
2024-09-05  target/arm: Correct names of VFP VFNMA and VFNMS insns  (Peter Maydell, 2 files, -10/+10)
In vfp.decode we have the names of the VFNMA and VFNMS instructions the wrong way around. The architecture says that bit 6 is the 'op' bit, which is 1 for VFNMA and 0 for VFNMS, but we label these two lines of decode the other way around. This doesn't cause any user-visible problem because in the handling of these functions in translate-vfp.c we give VFNMA the behaviour specified for VFNMS and vice-versa, but it's confusing when reading the code. Switch the names of the VFP VFNMA and VFNMS instructions in the decode file and flip the behaviour also. NB: the instructions VFMA and VFMS *are* decoded with op=0 for VFMA and op=1 for VFMS; the confusion probably arose because we assumed VFNMA and VFNMS to be the same way around. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2536 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20240830152156.2046590-1-peter.maydell@linaro.org Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2024-09-05  target/arm: Enable FEAT_EBF16 in the "max" CPU  (Peter Maydell, 2 files, -3/+2)
Now that we've implemented the required behaviour for FEAT_EBF16, we can enable it for the "max" CPU type, list it in our documentation, and delete a TODO comment about it being missing. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2024-09-05  target/arm: Implement FPCR.EBF=1 semantics for bfdotadd()  (Peter Maydell, 1 file, -3/+54)
Implement the FPCR.EBF=1 semantics for bfdotadd() operations:
 * is_ebf() sets up fpst and fpst_odd
 * bfdotadd_ebf() implements the fused paired-multiply-and-add operation that we need
The paired-multiply-and-add is similar to f16_dotadd() and we use the same trick here as in that function, but the inputs here are bfloat16 rather than float16. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2024-09-05  target/arm: Prepare bfdotadd() callers for FEAT_EBF support  (Peter Maydell, 3 files, -65/+193)
We use bfdotadd() in four callsites for various helper functions. Currently this all assumes that we have the FPCR.EBF=0 semantics. For FPCR.EBF=1 we will need to:
 * call a different routine to bfdotadd() because we need to do a fused multiply-add rather than separate multiply and add steps
 * use a different float_status that honours the FPCR rounding mode and denormal-flushing fields
 * pass in an extra float_status that has been set up to perform round-to-odd rounding
To prepare for this, refactor all the callsites so that instead of
    for (...) {
        x = bfdotadd(...);
    }
they are:
    float_status fpst, fpst_odd;
    if (is_ebf(env, &fpst, &fpst_odd)) {
        for (...) {
            x = bfdotadd_ebf(..., &fpst, &fpst_odd);
        }
    } else {
        for (...) {
            x = bfdotadd(..., &fpst);
        }
    }
For the moment the is_ebf() function always returns false, sets up fpst for EBF=0 semantics and never sets up fpst_odd; bfdotadd_ebf() will assert if called. We'll fill in the handling for EBF=1 in the next commit. This change should be a zero-behaviour-change refactor. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2024-09-05  target/arm: Pass env pointer through to gvec_bfmmla helper  (Peter Maydell, 5 files, -7/+8)
Pass the env pointer through to the gvec_bfmmla helper, so we can use it to add support for FEAT_EBF16. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2024-09-05  target/arm: Pass env pointer through to gvec_bfdot_idx helper  (Peter Maydell, 5 files, -7/+22)
Pass the env pointer through to the gvec_bfdot_idx helper, so we can use it to add support for FEAT_EBF16. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2024-09-05  target/arm: Pass env pointer through to gvec_bfdot helper  (Peter Maydell, 5 files, -7/+77)
Pass the env pointer through to the gvec_bfdot helper, so we can use it to add support for FEAT_EBF16. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2024-09-05  target/arm: Pass env pointer through to sme_bfmopa helper  (Peter Maydell, 3 files, -5/+5)
To implement the FEAT_EBF16 semantics, we are going to need the CPUARMState env pointer in every helper function which calls bfdotadd(). Pass the env pointer through from generated code to the sme_bfmopa helper. (We'll add the code that uses it when we've adjusted all the helpers to have access to the env pointer.) Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2024-09-05  target/arm: Allow setting the FPCR.EBF bit for FEAT_EBF16  (Peter Maydell, 3 files, -2/+12)
FEAT_EBF16 adds one new bit to the FPCR floating point control register. Allow this bit to be read and written when the ID registers indicate the presence of the feature. Note that because this new bit is not in FPSCR_FPCR_MASK the bit is not visible in the AArch32 FPSCR, and FPSCR writes do not affect it. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2024-08-13  target/arm: Fix usage of MMU indexes when EL3 is AArch32  (Peter Maydell, 8 files, -34/+81)
Our current usage of MMU indexes when EL3 is AArch32 is confused. Architecturally, when EL3 is AArch32, all Secure code runs under the Secure PL1&0 translation regime:
 * code at EL3, which might be Mon, or SVC, or any of the other privileged modes (PL1)
 * code at EL0 (Secure PL0)
This is different from when EL3 is AArch64, in which case EL3 is its own translation regime, and EL1 and EL0 (whether AArch32 or AArch64) have their own regime. We claimed to be mapping Secure PL1 to our ARMMMUIdx_EL3, but didn't do anything special about Secure PL0, which meant it used the same ARMMMUIdx_E10_0 that NonSecure PL0 does. This resulted in a bug where arm_sctlr() incorrectly picked the NonSecure SCTLR as the controlling register when in Secure PL0, which meant we were spuriously generating alignment faults because we were looking at the wrong SCTLR control bits. The use of ARMMMUIdx_EL3 for Secure PL1 also resulted in the bug that we wouldn't honour the PAN bit for Secure PL1, because there's no equivalent _PAN mmu index for it. We could fix this in one of two ways:
 * The most straightforward is to add new MMU indexes EL30_0, EL30_3, EL30_3_PAN to correspond to "Secure PL1&0 at PL0", "Secure PL1&0 at PL1", and "Secure PL1&0 at PL1 with PAN". This matches how we use indexes for the AArch64 regimes, and preserves properties like being able to determine the privilege level from an MMU index without any other information. However it would add two MMU indexes (we can share one with ARMMMUIdx_EL3), and we are already using 14 of the 16 that the core TLB code permits.
 * The more complicated approach is the one we take here. We use the same MMU indexes (E10_0, E10_1, E10_1_PAN) for Secure PL1&0 as we do for NonSecure PL1&0. This saves on MMU indexes, but means we need to check in some places whether we're in the Secure PL1&0 regime or not before we interpret an MMU index.
The changes in this commit were created by auditing all the places where we use specific ARMMMUIdx_ values, and checking whether they needed to be changed to handle the new index value usage. Note for potential stable backports: taking also the previous (comment-change-only) commit might make the backport easier. Cc: qemu-stable@nongnu.org Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2326 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Tested-by: Bernhard Beschow <shentey@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240809160430.1144805-3-peter.maydell@linaro.org
2024-08-13  target/arm: Update translation regime comment for new features  (Peter Maydell, 1 file, -7/+16)
We have a long comment describing the Arm architectural translation regimes and how we map them to QEMU MMU indexes. This comment has got a bit out of date:
 * FEAT_SEL2 allows Secure EL2 and corresponding new regimes
 * FEAT_RME introduces Realm state and its translation regimes
 * We now model the Cortex-R52 so that is no longer a hypothetical
 * We separated Secure Stage 2 and NonSecure Stage 2 MMU indexes
 * We have an MMU index per physical address space
Add the missing pieces so that the list of architectural translation regimes matches the Arm ARM, and the list and count of QEMU MMU indexes in the comment matches the enum. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Tested-by: Bernhard Beschow <shentey@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240809160430.1144805-2-peter.maydell@linaro.org
2024-08-13  target/arm: Clear high SVE elements in handle_vec_simd_wshli  (Richard Henderson, 1 file, -0/+1)
AdvSIMD instructions are supposed to zero bits beyond 128. Affects SSHLL, USHLL, SSHLL2, USHLL2. Cc: qemu-stable@nongnu.org Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240717060903.205098-15-richard.henderson@linaro.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
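The one added line is the usual a64 translator idiom for this (a sketch of the shape of the fix, not a verbatim quote of the patch):

    /* After writing the last 128-bit result element, zero the bits of the
     * destination vector register above bit 128, as AdvSIMD writes must. */
    clear_vec_high(s, true, rd);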
2024-08-09  target/arm: Fix BTI versus CF_PCREL  (Richard Henderson, 4 files, -52/+56)
With pcrel, we cannot check the guarded page bit at translation time, as different mappings of the same physical page may or may not have the GP bit set. Instead, add a couple of helpers to check the page at runtime, after all other filters that might obviate the need for the check. The set_btype_for_br call must be moved after the gen_a64_set_pc call to ensure the current pc can still be computed. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240802003028.795476-1-richard.henderson@linaro.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-08-03  hvf: arm: Fix hvf_sysreg_read_cp() call  (Akihiko Odaki, 1 file, -1/+1)
Commit e9e640148c changed val from uint64_t to a pointer to uint64_t in hvf_sysreg_read, but didn't change its usage in the hvf_sysreg_read_cp call. Fixes: e9e640148c ("hvf: arm: Raise an exception for sysreg by default") Reported-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20240802-hvf-v1-1-e2c0292037e5@daynix.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-08-01  target/arm: Handle denormals correctly for FMOPA (widening)  (Peter Maydell, 3 files, -15/+51)
The FMOPA (widening) SME instruction takes pairs of half-precision floating point values, widens them to single-precision, does a two-way dot product and accumulates the results into a single-precision destination. We don't quite correctly handle the FPCR bits FZ and FZ16 which control flushing of denormal inputs and outputs. This is because at the moment we pass a single float_status value to the helper function, which then uses that configuration for all the fp operations it does. However, because the inputs to this operation are float16 and the outputs are float32 we need to use the fp_status_f16 for the float16 input widening but the normal fp_status for everything else. Otherwise we will apply the flushing control FPCR.FZ16 to the 32-bit output rather than the FPCR.FZ control, and incorrectly flush a denormal output to zero when we should not (or vice-versa). (In commit 207d30b5fdb5b we tried to fix the FZ handling but didn't get it right, switching from "use FPCR.FZ for everything" to "use FPCR.FZ16 for everything".) Pass the CPU env to the sme_fmopa_h helper instead of an fp_status pointer, and have the helper pass an extra fp_status into the f16_dotadd() function so that we can use the right status for the right parts of this operation. Cc: qemu-stable@nongnu.org Fixes: 207d30b5fdb5 ("target/arm: Use FPST_F16 for SME FMOPA (widening)") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2373 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2024-07-29  target/arm: Ignore SMCR_EL2.LEN and SVCR_EL2.LEN if EL2 is not enabled  (Peter Maydell, 1 file, -1/+1)
When determining the current vector length, the SMCR_EL2.LEN and SVCR_EL2.LEN settings should only be considered if EL2 is enabled (compare the pseudocode CurrentSVL and CurrentNSVL which call EL2Enabled()). We were checking against ARM_FEATURE_EL2 rather than calling arm_is_el2_enabled(), which meant that we would look at SMCR_EL2/SVCR_EL2 when in Secure EL1 or Secure EL0 even if Secure EL2 was not enabled. Use the correct check in sve_vqm1_for_el_sm(). Cc: qemu-stable@nongnu.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240722172957.1041231-5-peter.maydell@linaro.org
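The change is essentially one condition (a sketch, not a verbatim quote of the patch): the "does this CPU implement EL2" check becomes an "is EL2 enabled right now" check, which is also false in Secure state when Secure EL2 is absent or disabled.

    /* Before: only checks that the CPU implements EL2 at all. */
    if (arm_feature(env, ARM_FEATURE_EL2)) { ... }
    /* After: also false in Secure EL0/EL1 when Secure EL2 is not enabled. */
    if (arm_is_el2_enabled(env)) { ... }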
2024-07-29  target/arm: Avoid shifts by -1 in tszimm_shr() and tszimm_shl()  (Peter Maydell, 1 file, -2/+16)
The function tszimm_esz() returns a shift amount, or possibly -1 in certain cases that correspond to unallocated encodings in the instruction set. We catch these later in the trans_ functions (generally with an "a->esz < 0" check), but before we get there the decodetree-generated code will also call tszimm_shr() or tszimm_shl(), which will use the tszimm_esz() return value as a shift count without checking that it is not negative, which is undefined behaviour. Avoid the UB by checking the return value in tszimm_shr() and tszimm_shl(). Cc: qemu-stable@nongnu.org Resolves: Coverity CID 1547617, 1547694 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240722172957.1041231-4-peter.maydell@linaro.org
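A sketch of the shape of the fix (not the exact QEMU code; the shift-count formula shown is the architectural one for these SVE immediates): compute the element size once and bail out before using a negative value as a shift count. The value returned for an unallocated encoding does not matter, because the trans_ function rejects the encoding anyway.

    static inline int tszimm_shr(DisasContext *s, int x)
    {
        int esz = tszimm_esz(s, x);
        if (esz < 0) {
            return 0;              /* value unused: trans_* rejects esz < 0 */
        }
        return (16 << esz) - x;    /* shift = 2 * esize - imm */
    }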
2024-07-29  target/arm: Fix UMOPA/UMOPS of 16-bit values  (Peter Maydell, 1 file, -4/+4)
The UMOPA/UMOPS instructions are supposed to multiply unsigned 8 or 16 bit elements and accumulate the products into a 64-bit element. In the Arm ARM pseudocode, this is done with the usual infinite-precision signed arithmetic. However our implementation doesn't quite get it right, because in the DEF_IMOP_64() macro we do: sum += (NTYPE)(n >> 0) * (MTYPE)(m >> 0); where NTYPE and MTYPE are uint16_t or int16_t. In the uint16_t case, the C usual arithmetic conversions mean the values are converted to "int" type and the multiply is done as a 32-bit multiply. This means that if the inputs are, for example, 0xffff and 0xffff then the result is 0xFFFE0001 as an int, which is then promoted to uint64_t for the accumulation into sum; this promotion incorrectly sign extends the multiply. Avoid the incorrect sign extension by casting to int64_t before the multiply, so we do the multiply as 64-bit signed arithmetic, which is a type large enough that the multiply can never overflow into the sign bit. (The equivalent 8-bit operations in DEF_IMOP_32() are fine, because the 8-bit multiplies can never overflow into the sign bit of a 32-bit integer.) Cc: qemu-stable@nongnu.org Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2372 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240722172957.1041231-3-peter.maydell@linaro.org
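A self-contained illustration of the promotion problem described above (standalone demonstration code, not QEMU source; note that the intermediate int multiply overflows, which ISO C leaves undefined but which wraps in practice, and QEMU builds with -fwrapv):

    #include <stdint.h>
    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
        uint16_t n = 0xffff, m = 0xffff;
        uint64_t sum;

        /* Usual arithmetic conversions promote both uint16_t operands to
         * int, so this is a 32-bit signed multiply: 65535 * 65535 wraps to
         * a negative int, and converting that to uint64_t sign-extends. */
        sum = 0;
        sum += n * m;
        printf("promoted to int:  %016" PRIx64 "\n", sum); /* ffff...fffe0001 */

        /* Casting to int64_t first makes the multiply 64-bit, so the
         * product can never reach the sign bit and the sum is correct. */
        sum = 0;
        sum += (int64_t)n * (int64_t)m;
        printf("widened multiply: %016" PRIx64 "\n", sum); /* 00000000fffe0001 */
        return 0;
    }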
2024-07-29  target/arm: Don't assert for 128-bit tile accesses when SVL is 128  (Peter Maydell, 1 file, -1/+9)
For an instruction which accesses a 128-bit element tile when the SVL is also 128 (for example MOV z0.Q, p0/M, ZA0H.Q[w0,0]), we will assert in get_tile_rowcol(): qemu-system-aarch64: ../../tcg/tcg-op.c:926: tcg_gen_deposit_z_i32: Assertion `len > 0' failed. This happens because we calculate len = ctz32(streaming_vec_reg_size(s)) - esz; but if the SVL and the element size are the same len is 0, and the deposit operation asserts. In this case the ZA storage contains exactly one 128 bit element ZA tile, and the horizontal or vertical slice is just that tile. This means that regardless of the index value in the Ws register, we always access that tile. (In pseudocode terms, we calculate (index + offset) MOD 1, which is 0.) Special case the len == 0 case to avoid hitting the assertion in tcg_gen_deposit_z_i32(). Cc: qemu-stable@nongnu.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240722172957.1041231-2-peter.maydell@linaro.org
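A sketch of the special case (illustrative variable names, not a verbatim quote of the patch): when len is 0 there is exactly one tile element, so the computed row/column index is known to be 0 and the deposit op, which requires len > 0, can be skipped entirely.

    if (len == 0) {
        /* SVL == element size: (index + offset) MOD 1 is always 0. */
        tcg_gen_movi_i32(tmp, 0);
    } else {
        tcg_gen_deposit_z_i32(tmp, tmp, pos, len);
    }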
2024-07-29  hvf: arm: Do not advance PC when raising an exception  (Akihiko Odaki, 1 file, -3/+3)
This is identical with commit 30a1690f2402 ("hvf: arm: Do not advance PC when raising an exception") but for writes instead of reads. Fixes: a2260983c655 ("hvf: arm: Add support for GICv3") Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-07-29  hvf: arm: Properly disable PMU  (Akihiko Odaki, 1 file, -87/+97)
Setting the pmu property used to have no effect for hvf, so fix it. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-07-29  hvf: arm: Raise an exception for sysreg by default  (Akihiko Odaki, 1 file, -89/+85)
Any sysreg access results in an exception unless defined otherwise so we should raise an exception by default. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-07-29  target/arm/kvm: Do not silently remove PMU  (Akihiko Odaki, 1 file, -5/+0)
kvm_arch_init_vcpu() used to remove PMU when it is not available even if the CPU model needs one. It is semantically incorrect, and may continue execution on a misbehaving host that advertises a CPU model while lacking its PMU. Keep the PMU when the CPU model needs one, and let kvm_arm_vcpu_init() fail if the KVM implementation mismatches with our expectation. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-07-29  target/arm/kvm: Set PMU for host only when available  (Akihiko Odaki, 1 file, -1/+1)
target/arm/kvm.c checked PMU availability but unconditionally set the PMU feature flag for the host CPU model, which is confusing. Set the feature flag only when available. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-07-23  bsd-user: Make compile for non-linux user-mode stuff  (Warner Losh, 1 file, -0/+4)
We include the files that define PR_MTE_TCF_SHIFT only on Linux, but use it unconditionally. Restrict its use to Linux only. "It's ugly, but it's not actually wrong." Signed-off-by: Warner Losh <imp@bsdimp.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>