author | Stam Markianos-Wright <stam.markianos-wright@arm.com> | 2023-04-27 15:51:14 +0100 |
---|---|---|
committer | Stam Markianos-Wright <stam.markianos-wright@arm.com> | 2023-05-18 11:12:16 +0100 |
commit | 8eedd1e1d6ab14ab2e394a692cd0b6edb5262dd1 (patch) | |
tree | 58fee8ed1552e2c5510e5958913a33d131350242 /gcc/fortran/cpp.h | |
parent | f2dd012ae6cd1f488103e6c17b46fef64d1b96fd (diff) | |
arm: Stop vadcq, vsbcq intrinsics from overwriting the FPSCR NZ flags
Hi all,
We noticed that calls to the vadcq and vsbcq intrinsics, both of
which use __builtin_arm_set_fpscr_nzcvqc to set the Carry flag in
the FPSCR, would produce the following code:
```
< r2 is the *carry input >
vmrs r3, FPSCR_nzcvqc
bic r3, r3, #536870912
orr r3, r3, r2, lsl #29
vmsr FPSCR_nzcvqc, r3
```
whereas the MVE ACLE specifies this instruction sequence instead:
```
< Rt is the *carry input >
VMRS Rs,FPSCR_nzcvqc
BFI Rs,Rt,#29,#1
VMSR FPSCR_nzcvqc,Rs
```
The bic + orr pair is slower, and it is also wrong: if the *carry
input is greater than 1, the shifted value spills past bit 29 and we
risk overwriting the top two bits of the FPSCR register (the N and Z
flags).
This turned out to be a problem in the header file, and the solution was
simply to add a `& 0x1u` to the `*carry` input: the compiler then knows
that we only care about the lowest bit and can optimise to a BFI.
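To illustrate the difference, here is a minimal sketch in plain C that
models the two ways of updating the flags word. It is not the code from
arm_mve.h: the helper names set_carry_unmasked and set_carry_masked and
the FPSCR_C_* macros are made up for this example, and an ordinary
uint32_t stands in for the FPSCR.
```c
#include <assert.h>
#include <stdint.h>

/* Bit 29 of the FPSCR holds the Carry flag; bits 31 and 30 hold N and Z.
   These macro names are illustrative, not taken from the GCC sources. */
#define FPSCR_C_SHIFT 29u
#define FPSCR_C_MASK  (UINT32_C(1) << FPSCR_C_SHIFT)   /* 536870912 */

/* Models the bic + orr sequence: clear C, then OR in the shifted input.
   Any input bits above bit 0 land in bits 30 and 31 (Z and N). */
static uint32_t set_carry_unmasked(uint32_t fpscr, uint32_t carry)
{
    return (fpscr & ~FPSCR_C_MASK) | (carry << FPSCR_C_SHIFT);
}

/* Models the fixed form: masking with 0x1u keeps only the lowest bit,
   which has the same effect as BFI Rs,Rt,#29,#1. */
static uint32_t set_carry_masked(uint32_t fpscr, uint32_t carry)
{
    uint32_t result =
        (fpscr & ~FPSCR_C_MASK) | ((carry & 0x1u) << FPSCR_C_SHIFT);
    /* Postcondition: every bit other than C is unchanged. */
    assert((result & ~FPSCR_C_MASK) == (fpscr & ~FPSCR_C_MASK));
    return result;
}
```
With a zeroed flags word and a carry input of 3, the unmasked helper
returns 0x60000000 (C and Z both set), while the masked helper returns
0x20000000 (C only).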
Ok for trunk?
Thanks,
Stam Markianos-Wright
gcc/ChangeLog:
* config/arm/arm_mve.h (__arm_vadcq_s32): Fix arithmetic.
(__arm_vadcq_u32): Likewise.
(__arm_vadcq_m_s32): Likewise.
(__arm_vadcq_m_u32): Likewise.
(__arm_vsbcq_s32): Likewise.
(__arm_vsbcq_u32): Likewise.
(__arm_vsbcq_m_s32): Likewise.
(__arm_vsbcq_m_u32): Likewise.
* config/arm/mve.md (get_fpscr_nzcvqc): Make unspec_volatile.
gcc/testsuite/ChangeLog:
* gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c: New.