author     Kyrylo Tkachov <kyrylo.tkachov@arm.com>  2021-02-22 21:24:41 +0000
committer  Kyrylo Tkachov <kyrylo.tkachov@arm.com>  2021-02-22 21:24:41 +0000
commit     a65b9ad863c5fc0aea12db58557f4d286a1974d7 (patch)
tree       388caa27fcb281d2a4b2c2f4de839b5eb322c713 /gcc/fold-const.c
parent     692ba083d9a22aaa08c8a3700d0237db8c922dc4 (diff)
aarch64: Add internal tune flag to minimise VL-based scalar ops
This patch introduces an internal tune flag to break up VL-based scalar ops
into a GP-reg scalar op with the VL read kept separate.  This can be
preferable on some CPUs.

I went for a tune param rather than extending the rtx costs as our RTX costs
tables aren't set up to track this intricacy.

I've confirmed that on the simple loop:

void vadd (int *dst, int *op1, int *op2, int count)
{
  for (int i = 0; i < count; ++i)
    dst[i] = op1[i] + op2[i];
}

we now split the incw into a cntw outside the loop and the add inside.

+       cntw    x5
...
loop:
-       incw    x4
+       add     x4, x4, x5

gcc/ChangeLog:

	* config/aarch64/aarch64-tuning-flags.def (cse_sve_vl_constants):
	Define.
	* config/aarch64/aarch64.md (add<mode>3): Force CONST_POLY_INT
	immediates into a register when the above is enabled.
	* config/aarch64/aarch64.c (neoversev1_tunings):
	AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS.
	(aarch64_rtx_costs): Use AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS.

gcc/testsuite/

	* gcc.target/aarch64/sve/cse_sve_vl_constants_1.c: New test.
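
For reference, a minimal sketch of how the new behaviour could be exercised
from the command line; the compiler invocations and the use of -moverride
below are assumptions for illustration, not taken from the commit:

/* vadd.c -- the same loop as in the commit message.  */
void
vadd (int *dst, int *op1, int *op2, int count)
{
  for (int i = 0; i < count; ++i)
    dst[i] = op1[i] + op2[i];
}

/* Assumed invocations (illustrative only; adjust for your toolchain):

     aarch64-none-linux-gnu-gcc -O2 -march=armv8.2-a+sve -S vadd.c
     aarch64-none-linux-gnu-gcc -O2 -march=armv8.2-a+sve \
         -moverride=tune=cse_sve_vl_constants -S vadd.c

   With the tuning flag active, the index update in the generated loop is
   expected to use a cntw hoisted before the loop plus an add inside it,
   rather than an incw inside the loop, as the asm diff above shows.  */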
Diffstat (limited to 'gcc/fold-const.c')
0 files changed, 0 insertions, 0 deletions