author | Kewen Lin <linkw@linux.ibm.com> | 2021-07-05 20:53:19 -0500 |
---|---|---|
committer | Kewen Lin <linkw@linux.ibm.com> | 2021-07-05 20:53:19 -0500 |
commit | 8ffe25eefae57fb3a228a2d31a57af5bdab8911f (patch) | |
tree | d7b9e781a5eae40fff8d2b34ab95c99264a96c55 /gcc/ira.c | |
parent | a3543b5e8002c033b2304d7ac1d1e58218eebb51 (diff) | |
download | gcc-8ffe25eefae57fb3a228a2d31a57af5bdab8911f.zip gcc-8ffe25eefae57fb3a228a2d31a57af5bdab8911f.tar.gz gcc-8ffe25eefae57fb3a228a2d31a57af5bdab8911f.tar.bz2 |
ira: Support more matching constraint forms with param [PR100328]
This patch makes IRA give matching constraints more weight.
Even if at least one other alternative has a non-NO_REG register
class constraint, IRA now continues, checks the matching
constraints in all available alternatives, and respects a
matching constraint on the preferred register class.
One typical case is the destructive FMA-style instruction on rs6000.
Without this patch, IRA won't respect the matching constraint on
VSX_REG for that instruction, since some alternatives use
FLOAT_REG, which has no matching constraint. This can cause extra
register copies, since reload later has to generate code to
satisfy the constraint. This patch makes IRA respect the
matching constraint on VSX_REG, which is the preferred regclass.
It excludes cases where one preferred register class has two or
more alternatives, of which one has the matching constraint and
another doesn't. It also considers the possibility of a free
register copy.
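The exclusion rule above can be sketched as a toy model. This is only an illustration under stated assumptions: the types RegClass, OperandAlt and Alternative and the function tie_is_worthwhile are hypothetical and do not exist in GCC, and cost considerations such as free register copies are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins for IRA's internal data.
enum class RegClass { NoRegs, FloatRegs, VsxRegs };

struct OperandAlt {
  RegClass cl;  // register class required for the operand in this alternative
  int matches;  // index of the output operand it must match, or -1 if none
};

struct Alternative {
  std::vector<OperandAlt> ops;  // one entry per operand
};

// Decide whether input operand OP_NUM should be tied to output operand OUT.
// Only alternatives that use the preferred class PREF for OP_NUM are looked
// at.  If one of them has no input operand matching OUT, that alternative
// works without any tie, so the candidate is given up.
bool tie_is_worthwhile(const std::vector<Alternative>& alts,
                       const std::vector<bool>& is_input,
                       std::size_t op_num, int out, RegClass pref) {
  bool found_tie = false;
  for (const Alternative& alt : alts) {
    assert(op_num < alt.ops.size() && alt.ops.size() == is_input.size());
    if (alt.ops[op_num].cl != pref)
      continue;  // not the preferred class: ignored in this model
    bool some_input_matches = false;
    for (std::size_t i = 0; i < alt.ops.size(); ++i)
      if (is_input[i] && alt.ops[i].matches == out)
        some_input_matches = true;
    if (!some_input_matches)
      return false;  // this alternative is fine with no tie at all
    if (alt.ops[op_num].matches == out)
      found_tie = true;
  }
  return found_tie;
}
```

For a destructive FMA shape with one output and three inputs, where each preferred-class alternative ties some input to the output, the model accepts the tie. Adding an alternative with no tie at all makes it reject the candidate.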
With -Ofast and unrolling, this patch improves the SPEC2017
benchmarks 508.namd_r by 2.42% and 519.lbm_r by 2.43% on Power8,
and 508.namd_r by 3.02% and 519.lbm_r by 3.85% on Power9, without
any remarkable degradations. It also improves some SVE cases, as
the testcase changes show and as Richard confirmed.
Bootstrapped & regtested on powerpc64le-linux-gnu P9,
x86_64-redhat-linux and aarch64-linux-gnu.
gcc/ChangeLog:
PR rtl-optimization/100328
* doc/invoke.texi (ira-consider-dup-in-all-alts): Document new
parameter.
* ira.c (ira_get_dup_out_num): Adjust as parameter
param_ira_consider_dup_in_all_alts.
* params.opt (ira-consider-dup-in-all-alts): New.
* ira-conflicts.c (process_regs_for_copy): Add one parameter
single_input_op_has_cstr_p.
(get_freq_for_shuffle_copy): New function.
(add_insn_allocno_copies): Adjust as single_input_op_has_cstr_p.
* ira-int.h (ira_get_dup_out_num): Add one bool parameter.
gcc/testsuite/ChangeLog:
PR rtl-optimization/100328
* gcc.target/aarch64/sve/acle/asm/div_f16.c: Remove one xfail.
* gcc.target/aarch64/sve/acle/asm/div_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/div_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/divr_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/divr_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/divr_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mad_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mad_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mad_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mla_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mla_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mla_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mls_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mls_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mls_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/msb_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/msb_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/msb_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mulx_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mulx_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mulx_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmad_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmad_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmad_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmla_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmla_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmla_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmls_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmls_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmls_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmsb_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmsb_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmsb_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/sub_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/sub_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/sub_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/subr_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/subr_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/subr_f64.c: Likewise.
Diffstat (limited to 'gcc/ira.c')
-rw-r--r-- | gcc/ira.c | 128 |
1 files changed, 119 insertions, 9 deletions
@@ -1922,9 +1922,25 @@ ira_setup_alts (rtx_insn *insn)
 /* Return the number of the output non-early clobber operand which
    should be the same in any case as operand with number OP_NUM (or
    negative value if there is no such operand).  ALTS is the mask
-   of alternatives that we should consider.  */
+   of alternatives that we should consider.  SINGLE_INPUT_OP_HAS_CSTR_P
+   should be set in this function, it indicates whether there is only
+   a single input operand which has the matching constraint on the
+   output operand at the position specified in return value.  If the
+   pattern allows any one of several input operands holds the matching
+   constraint, it's set as false, one typical case is destructive FMA
+   instruction on target rs6000.  Note that for a non-NO_REG preferred
+   register class with no free register move copy, if the parameter
+   PARAM_IRA_CONSIDER_DUP_IN_ALL_ALTS is set to one, this function
+   will check all available alternatives for matching constraints,
+   even if it has found or will find one alternative with non-NO_REG
+   regclass, it can respect more cases with matching constraints.  If
+   PARAM_IRA_CONSIDER_DUP_IN_ALL_ALTS is set to zero,
+   SINGLE_INPUT_OP_HAS_CSTR_P is always true, it will stop to find
+   matching constraint relationship once it hits some alternative with
+   some non-NO_REG regclass.  */
 int
-ira_get_dup_out_num (int op_num, alternative_mask alts)
+ira_get_dup_out_num (int op_num, alternative_mask alts,
+                     bool &single_input_op_has_cstr_p)
 {
   int curr_alt, c, original;
   bool ignore_p, use_commut_op_p;
@@ -1937,10 +1953,42 @@ ira_get_dup_out_num (int op_num, alternative_mask alts)
     return -1;
   str = recog_data.constraints[op_num];
   use_commut_op_p = false;
+  single_input_op_has_cstr_p = true;
+
+  rtx op = recog_data.operand[op_num];
+  int op_regno = reg_or_subregno (op);
+  enum reg_class op_pref_cl = reg_preferred_class (op_regno);
+  machine_mode op_mode = GET_MODE (op);
+
+  ira_init_register_move_cost_if_necessary (op_mode);
+  /* If the preferred regclass isn't NO_REG, continue to find the matching
+     constraint in all available alternatives with preferred regclass, even
+     if we have found or will find one alternative whose constraint stands
+     for a REG (non-NO_REG) regclass.  Note that it would be fine not to
+     respect matching constraint if the register copy is free, so exclude
+     it.  */
+  bool respect_dup_despite_reg_cstr
+    = param_ira_consider_dup_in_all_alts
+      && op_pref_cl != NO_REGS
+      && ira_register_move_cost[op_mode][op_pref_cl][op_pref_cl] > 0;
+
+  /* Record the alternative whose constraint uses the same regclass as the
+     preferred regclass, later if we find one matching constraint for this
+     operand with preferred reclass, we will visit these recorded
+     alternatives to check whether if there is one alternative in which no
+     any INPUT operands have one matching constraint same as our candidate.
+     If yes, it means there is one alternative which is perfectly fine
+     without satisfying this matching constraint.  If no, it means in any
+     alternatives there is one other INPUT operand holding this matching
+     constraint, it's fine to respect this matching constraint and further
+     create this constraint copy since it would become harmless once some
+     other takes preference and it's interfered.  */
+  alternative_mask pref_cl_alts;
+
   for (;;)
     {
-      rtx op = recog_data.operand[op_num];
-
+      pref_cl_alts = 0;
+
       for (curr_alt = 0, ignore_p = !TEST_BIT (alts, curr_alt),
            original = -1;;)
         {
@@ -1963,9 +2011,25 @@ ira_get_dup_out_num (int op_num, alternative_mask alts)
                   {
                     enum constraint_num cn = lookup_constraint (str);
                     enum reg_class cl = reg_class_for_constraint (cn);
-                    if (cl != NO_REGS
-                        && !targetm.class_likely_spilled_p (cl))
-                      goto fail;
+                    if (cl != NO_REGS && !targetm.class_likely_spilled_p (cl))
+                      {
+                        if (respect_dup_despite_reg_cstr)
+                          {
+                            /* If it's free to move from one preferred class to
+                               the one without matching constraint, it doesn't
+                               have to respect this constraint with costs.  */
+                            if (cl != op_pref_cl
+                                && (ira_reg_class_intersect[cl][op_pref_cl]
+                                    != NO_REGS)
+                                && (ira_may_move_in_cost[op_mode][op_pref_cl][cl]
+                                    == 0))
+                              goto fail;
+                            else if (cl == op_pref_cl)
+                              pref_cl_alts |= ALTERNATIVE_BIT (curr_alt);
+                          }
+                        else
+                          goto fail;
+                      }
                     if (constraint_satisfied_p (op, cn))
                       goto fail;
                     break;
@@ -1979,7 +2043,21 @@ ira_get_dup_out_num (int op_num, alternative_mask alts)
                     str = end;
                     if (original != -1 && original != n)
                       goto fail;
-                    original = n;
+                    gcc_assert (n < recog_data.n_operands);
+                    if (respect_dup_despite_reg_cstr)
+                      {
+                        const operand_alternative *op_alt
+                          = &recog_op_alt[curr_alt * recog_data.n_operands];
+                        /* Only respect the one with preferred rclass, without
+                           respect_dup_despite_reg_cstr it's possible to get
+                           one whose regclass isn't preferred first before,
+                           but it would fail since there should be other
+                           alternatives with preferred regclass.  */
+                        if (op_alt[n].cl == op_pref_cl)
+                          original = n;
+                      }
+                    else
+                      original = n;
                     continue;
                   }
               }
@@ -1988,7 +2066,39 @@ ira_get_dup_out_num (int op_num, alternative_mask alts)
       if (original == -1)
         goto fail;
       if (recog_data.operand_type[original] == OP_OUT)
-        return original;
+        {
+          if (pref_cl_alts == 0)
+            return original;
+          /* Visit these recorded alternatives to check whether
+             there is one alternative in which no any INPUT operands
+             have one matching constraint same as our candidate.
+             Give up this candidate if so.  */
+          int nop, nalt;
+          for (nalt = 0; nalt < recog_data.n_alternatives; nalt++)
+            {
+              if (!TEST_BIT (pref_cl_alts, nalt))
+                continue;
+              const operand_alternative *op_alt
+                = &recog_op_alt[nalt * recog_data.n_operands];
+              bool dup_in_other = false;
+              for (nop = 0; nop < recog_data.n_operands; nop++)
+                {
+                  if (recog_data.operand_type[nop] != OP_IN)
+                    continue;
+                  if (nop == op_num)
+                    continue;
+                  if (op_alt[nop].matches == original)
+                    {
+                      dup_in_other = true;
+                      break;
+                    }
+                }
+              if (!dup_in_other)
+                return -1;
+            }
+          single_input_op_has_cstr_p = false;
+          return original;
+        }
     fail:
       if (use_commut_op_p)
         break;
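The gating logic in this diff can be read as two small decisions: whether to keep scanning past register-class constraints at all, and what to do with each such constraint. The sketch below models both outside of GCC. All names here (ToyClass, Action, ToyCosts, respect_dup, on_register_constraint) are hypothetical, and the cost fields merely stand in for the ira_reg_class_intersect and ira_may_move_in_cost lookups seen in the patch.

```cpp
#include <cassert>

// Hypothetical register classes for illustration only.
enum class ToyClass { NoRegs, FloatRegs, VsxRegs };

// What the scan does when it meets a plain register-class constraint.
enum class Action { GiveUp, RecordAlternative, KeepScanning };

// Stand-ins for the two IRA tables consulted in the patch.
struct ToyCosts {
  bool classes_intersect;  // models ira_reg_class_intersect[cl][pref] != NO_REGS
  int may_move_in_cost;    // models ira_may_move_in_cost[mode][pref][cl]
};

// Models how respect_dup_despite_reg_cstr is initialised: the parameter is
// on, there is a preferred class, and a copy within that class is not free.
bool respect_dup(bool param_enabled, ToyClass pref, int self_move_cost) {
  assert(self_move_cost >= 0);  // assumption of this sketch
  return param_enabled && pref != ToyClass::NoRegs && self_move_cost > 0;
}

// Models the new branch taken for a register-class constraint.  The caller
// is assumed to have checked already that CL is a real class that is not
// likely to be spilled.
Action on_register_constraint(bool respect, ToyClass cl, ToyClass pref,
                              const ToyCosts& costs) {
  if (!respect)
    return Action::GiveUp;  // old behaviour: stop at the first such class
  if (cl != pref && costs.classes_intersect && costs.may_move_in_cost == 0)
    return Action::GiveUp;  // moving into CL is free, no need for the tie
  if (cl == pref)
    return Action::RecordAlternative;  // revisit later (pref_cl_alts)
  return Action::KeepScanning;
}
```

With the parameter modelled as off, every register-class constraint ends the search, which corresponds to the behaviour before the patch.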