riscv-gnu-toolchain/gcc.git

author	Richard Sandiford <richard.sandiford@arm.com>	2021-03-26 16:08:38 +0000
committer	Richard Sandiford <richard.sandiford@arm.com>	2021-03-26 16:08:38 +0000
commit	1205a8cadb6bd41cdf5b13d7aca8fb44332002e5 (patch)
tree	a926a061d24fdcbfe588e2dc7b37d5cbf8cb753e /gcc/lra-constraints.c
parent	e4180ab2fea0d3e8010f23b5e73095ac13cedafa (diff)
download	gcc-1205a8cadb6bd41cdf5b13d7aca8fb44332002e5.zip gcc-1205a8cadb6bd41cdf5b13d7aca8fb44332002e5.tar.gz gcc-1205a8cadb6bd41cdf5b13d7aca8fb44332002e5.tar.bz2

aarch64: Take issue rate into account for vector loop costs

When SVE is enabled, GCC needs to do a three-way comparison between scalar, Advanced SIMD and SVE code. The normal costs tend to be latency-based, which is well-suited to SLP. However, comparing sums of latency costs means that we effectively treat the code as executing sequentially. This can hide the effect of pipeline bubbles or resource contention that in practice are quite important for loop vectorisation. This is particularly true for loops that involve reductions.

This patch therefore tries to estimate how quickly each piece of code could issue, using a very (very) simplistic model. It then uses this to adjust the loop vector costs up or down as appropriate. Part of the Advanced SIMD vs. SVE adjustment is opt-in and is not enabled by default even for use_new_vector_costs.

As with the previous patches, this one only becomes active if a CPU selects use_new_vector_costs. It should therefore have a very low impact on other CPUs. The code also mostly ignores CPUs that have no issue information, even if use_new_vector_costs is enabled for some reason.

gcc/
	* config/aarch64/aarch64.opt
	(-param=aarch64-loop-vect-issue-rate-niters=): New parameter.
	* doc/invoke.texi: Document it.
	* config/aarch64/aarch64-protos.h (aarch64_base_vec_issue_info)
	(aarch64_scalar_vec_issue_info, aarch64_simd_vec_issue_info)
	(aarch64_advsimd_vec_issue_info, aarch64_sve_vec_issue_info)
	(aarch64_vec_issue_info): New structures.
	(cpu_vector_cost): Write comments above the variables rather than to the side.
	(cpu_vector_cost::issue_info): New member variable.
	* config/aarch64/aarch64.c: Include gimple-pretty-print.h and tree-ssa-loop-niter.h.
	(generic_vector_cost, a64fx_vector_cost, qdf24xx_vector_cost)
	(thunderx_vector_cost, tsv110_vector_cost, cortexa57_vector_cost)
	(exynosm1_vector_cost, xgene1_vector_cost, thunderx2t99_vector_cost)
	(thunderx3t110_vector_cost): Initialize issue_info to null.
	(neoversev1_scalar_issue_info, neoversev1_advsimd_issue_info)
	(neoversev1_sve_issue_info, neoversev1_vec_issue_info): New structures.
	(neoversev1_vector_cost): Use them.
	(aarch64_vec_op_count, aarch64_sve_op_count): New structures.
	(aarch64_vector_costs::saw_sve_only_op): New member variable.
	(aarch64_vector_costs::num_vector_iterations): Likewise.
	(aarch64_vector_costs::scalar_ops): Likewise.
	(aarch64_vector_costs::advsimd_ops): Likewise.
	(aarch64_vector_costs::sve_ops): Likewise.
	(aarch64_vector_costs::seen_loads): Likewise.
	(aarch64_simd_vec_costs_for_flags): New function.
	(aarch64_analyze_loop_vinfo): Initialize num_vector_iterations.
	Count the number of predicate operations required by SVE WHILE instructions.
	(aarch64_comparison_type, aarch64_multiply_add_p): New functions.
	(aarch64_sve_only_stmt_p, aarch64_in_loop_reduction_latency): Likewise.
	(aarch64_count_ops): Likewise.
	(aarch64_add_stmt_cost): Record whether we see an SVE operation that
	cannot currently be implemented using Advanced SIMD. Record issue
	information about the scalar, Advanced SIMD and (where relevant) SVE
	versions of a loop.
	(aarch64_vec_op_count::dump): New function.
	(aarch64_sve_op_count::dump): Likewise.
	(aarch64_estimate_min_cycles_per_iter): Likewise.
	(aarch64_adjust_body_cost): If issue information is available, try to
	compare the issue rates of the various loop implementations and
	increase or decrease the vector body cost accordingly.
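
The issue-rate idea in the commit message can be illustrated with a rough, standalone sketch. It is not the code added by this patch: the struct names, fields and the exact set of limits below are hypothetical, chosen only to show how a lower bound on cycles per iteration might be derived from operation counts, and how scalar and vector estimates might then be compared. All per-cycle limits are assumed to be positive.

```cpp
#include <algorithm>

// Hypothetical per-cycle issue limits for one kind of code (scalar,
// Advanced SIMD or SVE). This does not mirror the layout of the
// aarch64_*_vec_issue_info structures added by the patch.
struct IssueLimits {
  double loads_stores_per_cycle;  // combined load + store issue width (> 0)
  double stores_per_cycle;        // store-only issue width (> 0)
  double general_ops_per_cycle;   // arithmetic/logic issue width (> 0)
};

// Hypothetical operation counts for a single loop iteration.
struct OpCounts {
  double loads;
  double stores;
  double general_ops;
  double reduction_latency;  // latency of the loop-carried reduction chain
};

// Lower bound on cycles per iteration: the loop cannot run faster than its
// loop-carried dependency chain, nor faster than any one issue resource
// allows. At least one cycle per iteration is always assumed.
double min_cycles_per_iter(const OpCounts &ops, const IssueLimits &limits) {
  double cycles = std::max(1.0, ops.reduction_latency);
  cycles = std::max(cycles, ops.stores / limits.stores_per_cycle);
  cycles = std::max(cycles,
                    (ops.loads + ops.stores) / limits.loads_stores_per_cycle);
  cycles = std::max(cycles, ops.general_ops / limits.general_ops_per_cycle);
  return cycles;
}

// One vector iteration does the work of `vf` scalar iterations, so the
// vector loop issues no faster than the scalar loop when its per-iteration
// estimate is at least `vf` times the scalar estimate.
bool vector_issues_slower(double scalar_cycles, double vector_cycles,
                          double vf) {
  return vector_cycles >= scalar_cycles * vf;
}
```

In this sketch, a caller would raise the vector body cost when vector_issues_slower returns true and could lower it otherwise; the scaling actually applied by aarch64_adjust_body_cost is not reproduced here.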

Diffstat (limited to 'gcc/lra-constraints.c')

0 files changed, 0 insertions, 0 deletions

