|
Hi, Richard and Richi.
This is the last autovec pattern I want to add for RVV (length loop control).
This patch is supposed to handle the following case:
int __attribute__ ((noinline, noclone))
condition_reduction (int *a, int min_v, int n)
{
  int last = 66; /* High start value. */
  for (int i = 0; i < n; i++)
    if (a[i] < min_v)
      last = i;
  return last;
}
ARM SVE IR:
...
mask__7.11_39 = vect__4.10_37 < vect_cst__38;
_40 = loop_mask_36 & mask__7.11_39;
last_5 = .FOLD_EXTRACT_LAST (last_15, _40, vect_vec_iv_.7_32);
...
For RVV, we want to see this IR:
...
loop_len = SELECT_VL
mask__7.11_39 = vect__4.10_37 < vect_cst__38;
last_5 = .LEN_FOLD_EXTRACT_LAST (last_15, _40, vect_vec_iv_.7_32, loop_len, bias);
...
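For reference, here is a rough scalar sketch (my reading of the semantics, not code from the patch) of what .LEN_FOLD_EXTRACT_LAST (fallback, mask, vec, len, bias) computes: the value of the last active element among the first len + bias elements, or fallback when none is active.
static int
len_fold_extract_last (int fallback, const _Bool *mask,
                       const int *vec, int len, int bias)
{
  int res = fallback;
  /* Only the first len + bias elements participate; the last
     masked-in one wins.  */
  for (int i = 0; i < len + bias; i++)
    if (mask[i])
      res = vec[i];
  return res;
}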
gcc/ChangeLog:
* doc/md.texi: Add LEN_FOLD_EXTRACT_LAST pattern.
* internal-fn.cc (fold_len_extract_direct): Ditto.
(expand_fold_len_extract_optab_fn): Ditto.
(direct_fold_len_extract_optab_supported_p): Ditto.
* internal-fn.def (LEN_FOLD_EXTRACT_LAST): Ditto.
* optabs.def (OPTAB_D): Ditto.
|
|
Like the support for conditional neg (r12-4470-g20dcda98ed376cb61c74b2c71),
this just adds conditional not too.
Also we should be able to turn `(a ? -1 : 0) ^ b` into a conditional
not.
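As a hedged illustration (mine, not from the patch), a loop of the following shape is the kind of source that produces the (a ? -1 : 0) ^ b form which could become a conditional not once vectorized:
void
f (int *restrict r, int *restrict a, int *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    /* ~b[i] where a[i] is nonzero, b[i] unchanged otherwise.  */
    r[i] = (a[i] ? -1 : 0) ^ b[i];
}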
OK? Bootstrapped and tested on x86_64-linux-gnu and aarch64-linux-gnu.
gcc/ChangeLog:
* internal-fn.def (COND_NOT): New internal function.
* match.pd (UNCOND_UNARY, COND_UNARY): Add bit_not/not
to the lists.
(`vec (a ? -1 : 0) ^ b`): New pattern to convert
into conditional not.
* optabs.def (cond_one_cmpl): New optab.
(cond_len_one_cmpl): Likewise.
gcc/testsuite/ChangeLog:
PR target/110986
* gcc.target/aarch64/sve/cond_unary_9.c: New test.
|
|
This patch adds the vec_mask_len_{load_lanes,store_lanes} autovectorization patterns.
Here we want to support the following autovectorization:
void
foo (int8_t *__restrict a,
     int8_t *__restrict b,
     int8_t *__restrict cond,
     int n)
{
  for (intptr_t i = 0; i < n; ++i)
    {
      if (cond[i])
        a[i] = b[i * 2] + b[i * 2 + 1];
    }
}
ARM SVE IR:
https://godbolt.org/z/cro1Eqc6a
# loop_mask_60 = PHI <next_mask_82(4), max_mask_81(3)>
...
mask__39.12_63 = vect__3.11_61 != { 0, ... };
vec_mask_and_66 = loop_mask_60 & mask__39.12_63;
...
vect_array.15 = .MASK_LOAD_LANES (_57, 8B, vec_mask_and_66);
...
For RVV, we would like to see IR:
loop_len = SELECT_VL;
...
mask__39.12_63 = vect__3.11_61 != { 0, ... };
...
vect_array.15 = .MASK_LEN_LOAD_LANES (_57, 8B, mask__39.12_63, loop_len, bias);
...
Bootstrap and Regression on X86 passed.
Ok for trunk?
gcc/ChangeLog:
* doc/md.texi: Add vec_mask_len_{load_lanes,store_lanes} patterns.
* internal-fn.cc (expand_partial_load_optab_fn): Ditto.
(expand_partial_store_optab_fn): Ditto.
* internal-fn.def (MASK_LEN_LOAD_LANES): Ditto.
(MASK_LEN_STORE_LANES): Ditto.
* optabs.def (OPTAB_CD): Ditto.
|
|
Hi.
Starting with LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE and the COND_LEN_* patterns,
the order of the len and mask operands is {mask, len, bias}.
The reason the "mask" argument comes before "len" is that we want to keep
the "mask" location the same as in the mask_* and cond_* patterns, so we can reuse
the existing code flow of mask_* and cond_*. Otherwise we would need to change much
more code and make it harder to maintain.
Now that we already have COND_LEN_*, it is natural to rename "LEN_MASK" into "MASK_LEN"
to keep the naming scheme consistent.
This patch only changes the name "LEN_MASK" into "MASK_LEN".
No functional change.
gcc/ChangeLog:
* config/riscv/autovec.md (len_maskload<mode><vm>): Change LEN_MASK into MASK_LEN.
(mask_len_load<mode><vm>): Ditto.
(len_maskstore<mode><vm>): Ditto.
(mask_len_store<mode><vm>): Ditto.
(len_mask_gather_load<RATIO64:mode><RATIO64I:mode>): Ditto.
(mask_len_gather_load<RATIO64:mode><RATIO64I:mode>): Ditto.
(len_mask_gather_load<RATIO32:mode><RATIO32I:mode>): Ditto.
(mask_len_gather_load<RATIO32:mode><RATIO32I:mode>): Ditto.
(len_mask_gather_load<RATIO16:mode><RATIO16I:mode>): Ditto.
(mask_len_gather_load<RATIO16:mode><RATIO16I:mode>): Ditto.
(len_mask_gather_load<RATIO8:mode><RATIO8I:mode>): Ditto.
(mask_len_gather_load<RATIO8:mode><RATIO8I:mode>): Ditto.
(len_mask_gather_load<RATIO4:mode><RATIO4I:mode>): Ditto.
(mask_len_gather_load<RATIO4:mode><RATIO4I:mode>): Ditto.
(len_mask_gather_load<RATIO2:mode><RATIO2I:mode>): Ditto.
(mask_len_gather_load<RATIO2:mode><RATIO2I:mode>): Ditto.
(len_mask_gather_load<RATIO1:mode><RATIO1:mode>): Ditto.
(mask_len_gather_load<RATIO1:mode><RATIO1:mode>): Ditto.
(len_mask_scatter_store<RATIO64:mode><RATIO64I:mode>): Ditto.
(mask_len_scatter_store<RATIO64:mode><RATIO64I:mode>): Ditto.
(len_mask_scatter_store<RATIO32:mode><RATIO32I:mode>): Ditto.
(mask_len_scatter_store<RATIO32:mode><RATIO32I:mode>): Ditto.
(len_mask_scatter_store<RATIO16:mode><RATIO16I:mode>): Ditto.
(mask_len_scatter_store<RATIO16:mode><RATIO16I:mode>): Ditto.
(len_mask_scatter_store<RATIO8:mode><RATIO8I:mode>): Ditto.
(mask_len_scatter_store<RATIO8:mode><RATIO8I:mode>): Ditto.
(len_mask_scatter_store<RATIO4:mode><RATIO4I:mode>): Ditto.
(mask_len_scatter_store<RATIO4:mode><RATIO4I:mode>): Ditto.
(len_mask_scatter_store<RATIO2:mode><RATIO2I:mode>): Ditto.
(mask_len_scatter_store<RATIO2:mode><RATIO2I:mode>): Ditto.
(len_mask_scatter_store<RATIO1:mode><RATIO1:mode>): Ditto.
(mask_len_scatter_store<RATIO1:mode><RATIO1:mode>): Ditto.
* doc/md.texi: Ditto.
* genopinit.cc (main): Ditto.
(CMP_NAME): Ditto.
* gimple-fold.cc (arith_overflowed_p): Ditto.
(gimple_fold_partial_load_store_mem_ref): Ditto.
(gimple_fold_call): Ditto.
* internal-fn.cc (len_maskload_direct): Ditto.
(mask_len_load_direct): Ditto.
(len_maskstore_direct): Ditto.
(mask_len_store_direct): Ditto.
(expand_call_mem_ref): Ditto.
(expand_len_maskload_optab_fn): Ditto.
(expand_mask_len_load_optab_fn): Ditto.
(expand_len_maskstore_optab_fn): Ditto.
(expand_mask_len_store_optab_fn): Ditto.
(direct_len_maskload_optab_supported_p): Ditto.
(direct_mask_len_load_optab_supported_p): Ditto.
(direct_len_maskstore_optab_supported_p): Ditto.
(direct_mask_len_store_optab_supported_p): Ditto.
(internal_load_fn_p): Ditto.
(internal_store_fn_p): Ditto.
(internal_gather_scatter_fn_p): Ditto.
(internal_fn_len_index): Ditto.
(internal_fn_mask_index): Ditto.
(internal_fn_stored_value_index): Ditto.
(internal_len_load_store_bias): Ditto.
* internal-fn.def (LEN_MASK_GATHER_LOAD): Ditto.
(MASK_LEN_GATHER_LOAD): Ditto.
(LEN_MASK_LOAD): Ditto.
(MASK_LEN_LOAD): Ditto.
(LEN_MASK_SCATTER_STORE): Ditto.
(MASK_LEN_SCATTER_STORE): Ditto.
(LEN_MASK_STORE): Ditto.
(MASK_LEN_STORE): Ditto.
* optabs-query.cc (supports_vec_gather_load_p): Ditto.
(supports_vec_scatter_store_p): Ditto.
* optabs-tree.cc (target_supports_mask_load_store_p): Ditto.
(target_supports_len_load_store_p): Ditto.
* optabs.def (OPTAB_CD): Ditto.
* tree-ssa-alias.cc (ref_maybe_used_by_call_p_1): Ditto.
(call_may_clobber_ref_p_1): Ditto.
* tree-ssa-dse.cc (initialize_ao_ref_for_dse): Ditto.
(dse_optimize_stmt): Ditto.
* tree-ssa-loop-ivopts.cc (get_mem_type_for_internal_fn): Ditto.
(get_alias_ptr_type_for_ptr_address): Ditto.
* tree-vect-data-refs.cc (vect_gather_scatter_fn_p): Ditto.
* tree-vect-patterns.cc (vect_recog_gather_scatter_pattern): Ditto.
* tree-vect-stmts.cc (check_load_store_for_partial_vectors): Ditto.
(vect_get_strided_load_store_ops): Ditto.
(vectorizable_store): Ditto.
(vectorizable_load): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/gimple_fold-1.c: Ditto.
|
|
Hi, Richard and Richi.
This patch adds the mask_len_fold_left_plus pattern to support in-order floating-point
reduction for targets that support length-based loop control.
Consider the following case:
double
foo2 (double *__restrict a,
      double init,
      int *__restrict cond,
      int n)
{
  for (int i = 0; i < n; i++)
    if (cond[i])
      init += a[i];
  return init;
}
ARM SVE:
...
vec_mask_and_60 = loop_mask_54 & mask__23.33_57;
vect__ifc__35.37_64 = .VCOND_MASK (vec_mask_and_60, vect__8.36_61, { 0.0, ... });
_36 = .MASK_FOLD_LEFT_PLUS (init_20, vect__ifc__35.37_64, loop_mask_54);
...
For RVV, we want to see:
...
_36 = .MASK_LEN_FOLD_LEFT_PLUS (init_20, vect__ifc__35.37_64, control_mask, loop_len, bias);
...
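As a rough scalar sketch (my reading, not from the patch), .MASK_LEN_FOLD_LEFT_PLUS (init, vec, mask, len, bias) performs an in-order sum of the masked-in elements among the first len + bias elements:
static double
mask_len_fold_left_plus (double init, const double *vec,
                         const _Bool *mask, int len, int bias)
{
  /* Strictly in-order accumulation of the active elements.  */
  for (int i = 0; i < len + bias; i++)
    if (mask[i])
      init += vec[i];
  return init;
}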
gcc/ChangeLog:
* doc/md.texi: Add mask_len_fold_left_plus.
* internal-fn.cc (mask_len_fold_left_direct): Ditto.
(expand_mask_len_fold_left_optab_fn): Ditto.
(direct_mask_len_fold_left_optab_supported_p): Ditto.
* internal-fn.def (MASK_LEN_FOLD_LEFT_PLUS): Ditto.
* optabs.def (OPTAB_D): Ditto.
|
|
Hi, Richard and Richi.
This patch adds cond_len_* operation patterns for targets that support loop control with length.
These patterns will be used in the following cases:
1. Integer division:
void
f (int32_t *restrict a, int32_t *restrict b, int32_t *restrict c, int n)
{
  for (int i = 0; i < n; ++i)
    {
      a[i] = b[i] / c[i];
    }
}
ARM SVE IR:
...
max_mask_36 = .WHILE_ULT (0, bnd.5_32, { 0, ... });
Loop:
...
# loop_mask_29 = PHI <next_mask_37(4), max_mask_36(3)>
...
vect__4.8_28 = .MASK_LOAD (_33, 32B, loop_mask_29);
...
vect__6.11_25 = .MASK_LOAD (_20, 32B, loop_mask_29);
vect__8.12_24 = .COND_DIV (loop_mask_29, vect__4.8_28, vect__6.11_25, vect__4.8_28);
...
.MASK_STORE (_1, 32B, loop_mask_29, vect__8.12_24);
...
next_mask_37 = .WHILE_ULT (_2, bnd.5_32, { 0, ... });
...
For targets like RVV that support loop control with length, we want to see IR as follows:
Loop:
...
# loop_len_29 = SELECT_VL
...
vect__4.8_28 = .LEN_MASK_LOAD (_33, 32B, loop_len_29);
...
vect__6.11_25 = .LEN_MASK_LOAD (_20, 32B, loop_len_29);
vect__8.12_24 = .COND_LEN_DIV (dummy_mask, vect__4.8_28, vect__6.11_25, vect__4.8_28, loop_len_29, bias);
...
.LEN_MASK_STORE (_1, 32B, loop_len_29, vect__8.12_24);
...
next_mask_37 = .WHILE_ULT (_2, bnd.5_32, { 0, ... });
...
Notice that here we use dummy_mask = { -1, -1, ..., -1 }.
2. Integer conditional division:
Similar to case (1) but with a condition:
void
f (int32_t *restrict a, int32_t *restrict b, int32_t *restrict c, int32_t * cond, int n)
{
  for (int i = 0; i < n; ++i)
    {
      if (cond[i])
        a[i] = b[i] / c[i];
    }
}
ARM SVE:
...
max_mask_76 = .WHILE_ULT (0, bnd.6_52, { 0, ... });
Loop:
...
# loop_mask_55 = PHI <next_mask_77(5), max_mask_76(4)>
...
vect__4.9_56 = .MASK_LOAD (_51, 32B, loop_mask_55);
mask__29.10_58 = vect__4.9_56 != { 0, ... };
vec_mask_and_61 = loop_mask_55 & mask__29.10_58;
...
vect__6.13_62 = .MASK_LOAD (_24, 32B, vec_mask_and_61);
...
vect__8.16_66 = .MASK_LOAD (_1, 32B, vec_mask_and_61);
vect__10.17_68 = .COND_DIV (vec_mask_and_61, vect__6.13_62, vect__8.16_66, vect__6.13_62);
...
.MASK_STORE (_2, 32B, vec_mask_and_61, vect__10.17_68);
...
next_mask_77 = .WHILE_ULT (_3, bnd.6_52, { 0, ... });
Here, ARM SVE uses vec_mask_and_61 = loop_mask_55 & mask__29.10_58; to guarantee the correct result.
However, targets with length control cannot perform this elegant flow. For RVV, we would expect:
Loop:
...
loop_len_55 = SELECT_VL
...
mask__29.10_58 = vect__4.9_56 != { 0, ... };
...
vect__10.17_68 = .COND_LEN_DIV (mask__29.10_58, vect__6.13_62, vect__8.16_66, vect__6.13_62, loop_len_55, bias);
...
Here we expect COND_LEN_DIV to be predicated by a real mask, which is the outcome of the comparison mask__29.10_58 = vect__4.9_56 != { 0, ... };,
and by a real length, which is produced by the loop control: loop_len_55 = SELECT_VL.
3. Conditional floating-point operations (without -ffast-math):
void
f (float *restrict a, float *restrict b, int32_t *restrict cond, int n)
{
  for (int i = 0; i < n; ++i)
    {
      if (cond[i])
        a[i] = b[i] + a[i];
    }
}
ARM SVE IR:
max_mask_70 = .WHILE_ULT (0, bnd.6_46, { 0, ... });
...
# loop_mask_49 = PHI <next_mask_71(4), max_mask_70(3)>
...
mask__27.10_52 = vect__4.9_50 != { 0, ... };
vec_mask_and_55 = loop_mask_49 & mask__27.10_52;
...
vect__9.17_62 = .COND_ADD (vec_mask_and_55, vect__6.13_56, vect__8.16_60, vect__6.13_56);
...
next_mask_71 = .WHILE_ULT (_22, bnd.6_46, { 0, ... });
...
For RVV, we would expect IR:
...
loop_len_49 = SELECT_VL
...
mask__27.10_52 = vect__4.9_50 != { 0, ... };
...
vect__9.17_62 = .COND_LEN_ADD (mask__27.10_52, vect__6.13_56, vect__8.16_60, vect__6.13_56, loop_len_49, bias);
...
4. Conditional unordered reduction:
int32_t
f (int32_t *restrict a,
   int32_t *restrict cond, int n)
{
  int32_t result = 0;
  for (int i = 0; i < n; ++i)
    {
      if (cond[i])
        result += a[i];
    }
  return result;
}
ARM SVE IR:
Loop:
# vect_result_18.7_37 = PHI <vect__33.16_51(4), { 0, ... }(3)>
...
# loop_mask_40 = PHI <next_mask_58(4), max_mask_57(3)>
...
mask__17.11_43 = vect__4.10_41 != { 0, ... };
vec_mask_and_46 = loop_mask_40 & mask__17.11_43;
...
vect__33.16_51 = .COND_ADD (vec_mask_and_46, vect_result_18.7_37, vect__7.14_47, vect_result_18.7_37);
...
next_mask_58 = .WHILE_ULT (_15, bnd.6_36, { 0, ... });
...
Epilogue:
_53 = .REDUC_PLUS (vect__33.16_51); [tail call]
For RVV, we expect:
Loop:
# vect_result_18.7_37 = PHI <vect__33.16_51(4), { 0, ... }(3)>
...
loop_len_40 = SELECT_VL
...
mask__17.11_43 = vect__4.10_41 != { 0, ... };
...
vect__33.16_51 = .COND_LEN_ADD (mask__17.11_43, vect_result_18.7_37, vect__7.14_47, vect_result_18.7_37, loop_len_40, bias);
...
next_mask_58 = .WHILE_ULT (_15, bnd.6_36, { 0, ... });
...
Epilogue:
_53 = .REDUC_PLUS (vect__33.16_51); [tail call]
I name these patterns "cond_len_*" since I want the length operand to come after the mask operand, and all other operands except the length operand
to be in the same order as in the "cond_*" patterns. Such an order will make life easier in the following loop vectorizer support.
gcc/ChangeLog:
* doc/md.texi: Add COND_LEN_* operations for loop control with length.
* internal-fn.cc (cond_len_unary_direct): Ditto.
(cond_len_binary_direct): Ditto.
(cond_len_ternary_direct): Ditto.
(expand_cond_len_unary_optab_fn): Ditto.
(expand_cond_len_binary_optab_fn): Ditto.
(expand_cond_len_ternary_optab_fn): Ditto.
(direct_cond_len_unary_optab_supported_p): Ditto.
(direct_cond_len_binary_optab_supported_p): Ditto.
(direct_cond_len_ternary_optab_supported_p): Ditto.
* internal-fn.def (COND_LEN_ADD): Ditto.
(COND_LEN_SUB): Ditto.
(COND_LEN_MUL): Ditto.
(COND_LEN_DIV): Ditto.
(COND_LEN_MOD): Ditto.
(COND_LEN_RDIV): Ditto.
(COND_LEN_MIN): Ditto.
(COND_LEN_MAX): Ditto.
(COND_LEN_FMIN): Ditto.
(COND_LEN_FMAX): Ditto.
(COND_LEN_AND): Ditto.
(COND_LEN_IOR): Ditto.
(COND_LEN_XOR): Ditto.
(COND_LEN_SHL): Ditto.
(COND_LEN_SHR): Ditto.
(COND_LEN_FMA): Ditto.
(COND_LEN_FMS): Ditto.
(COND_LEN_FNMA): Ditto.
(COND_LEN_FNMS): Ditto.
(COND_LEN_NEG): Ditto.
* optabs.def (OPTAB_D): Ditto.
|
|
Hi, Richi and Richard.
Based on the review comments from Richard:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623405.html
I change the len_mask_gather_load/len_mask_scatter_store operand order into:
{len,bias,mask}
We adjust adding the len and mask arguments using add_len_and_mask_args,
which is the same as for partial_load/partial_store.
Now the code becomes more reasonable and easier to maintain.
This patch adds LEN_MASK_{GATHER_LOAD,SCATTER_STORE} to allow targets
to handle flow control by mask and loop control by length on gather/scatter memory
operations. Consider the following case:
void
f (uint8_t *restrict a,
   uint8_t *restrict b, int n,
   int base, int step,
   int *restrict cond)
{
  for (int i = 0; i < n; ++i)
    {
      if (cond[i])
        a[i * step + base] = b[i * step + base];
    }
}
We hope RVV can vectorize such a case into the following IR:
loop_len = SELECT_VL
control_mask = comparison
v = LEN_MASK_GATHER_LOAD (.., loop_len, bias, control_mask)
LEN_MASK_SCATTER_STORE (... v, ..., loop_len, bias, control_mask)
This patch doesn't apply such patterns in the vectorizer; it just adds the patterns
and updates the documentation.
I will send the patch which applies such patterns in the vectorizer soon after this
patch is approved.
Ok for trunk?
gcc/ChangeLog:
* doc/md.texi: Add len_mask_gather_load/len_mask_scatter_store.
* internal-fn.cc (expand_scatter_store_optab_fn): Ditto.
(expand_gather_load_optab_fn): Ditto.
(internal_load_fn_p): Ditto.
(internal_store_fn_p): Ditto.
(internal_gather_scatter_fn_p): Ditto.
(internal_fn_len_index): Ditto.
(internal_fn_mask_index): Ditto.
(internal_fn_stored_value_index): Ditto.
* internal-fn.def (LEN_MASK_GATHER_LOAD): Ditto.
(LEN_MASK_SCATTER_STORE): Ditto.
* optabs.def (OPTAB_CD): Ditto.
|
|
This updates vect_recog_abd_pattern to recognize the widening
variant of absolute difference (ABDL, ABDL2).
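A hedged example (mine, not one of the committed testcases) of the widening shape this targets: the absolute difference of narrow inputs is produced in a wider type.
void
f (unsigned char *restrict a, unsigned char *restrict b,
   unsigned short *restrict out, int n)
{
  for (int i = 0; i < n; i++)
    {
      int diff = a[i] - b[i];
      out[i] = diff < 0 ? -diff : diff;  /* abs of the difference, stored widened.  */
    }
}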
gcc/ChangeLog:
* internal-fn.def (VEC_WIDEN_ABD): New internal hilo optab.
* optabs.def (vec_widen_sabd_optab,
vec_widen_sabd_hi_optab, vec_widen_sabd_lo_optab,
vec_widen_sabd_odd_even, vec_widen_sabd_even_optab,
vec_widen_uabd_optab,
vec_widen_uabd_hi_optab, vec_widen_uabd_lo_optab,
vec_widen_uabd_odd_even, vec_widen_uabd_even_optab):
New optabs.
* doc/md.texi: Document them.
* tree-vect-patterns.cc (vect_recog_abd_pattern): Update to
to build a VEC_WIDEN_ABD call if the input precision is smaller
than the precision of the output.
(vect_recog_widen_abd_pattern): Should an ABD expression be
found preceding an extension, replace the two with a
VEC_WIDEN_ABD.
|
|
This patch adds LEN_MASK_{LOAD,STORE} to support flow control for targets
like RISC-V that use length in loop control.
Normalize load/store into LEN_MASK_{LOAD,STORE} as long as either length
or mask is valid. Length is the outcome of SELECT_VL or MIN_EXPR.
Mask is the outcome of a comparison.
The LEN_MASK_{LOAD,STORE} format is defined as follows:
1). LEN_MASK_LOAD (ptr, align, length, mask).
2). LEN_MASK_STORE (ptr, align, length, mask, vec).
Consider the following 4 cases:
VLA: Variable-length auto-vectorization
VLS: Specific-length auto-vectorization
Case 1 (VLS), -mrvv-vector-bits=128 (does not use LEN_MASK_*):
  Code:
    for (int i = 0; i < 4; i++)
      a[i] = b[i] + c[i];
  IR:
    v1 = MEM (...)
    v2 = MEM (...)
    v3 = v1 + v2
    MEM[...] = v3
Case 2 (VLS), -mrvv-vector-bits=128 (LEN_MASK_* with length = VF, mask = comparison):
  Code:
    for (int i = 0; i < 4; i++)
      if (cond[i])
        a[i] = b[i] + c[i];
  IR:
    mask = comparison
    v1 = LEN_MASK_LOAD (length = VF, mask)
    v2 = LEN_MASK_LOAD (length = VF, mask)
    v3 = v1 + v2
    LEN_MASK_STORE (length = VF, mask, v3)
Case 3 (VLA):
  Code:
    for (int i = 0; i < n; i++)
      a[i] = b[i] + c[i];
  IR:
    loop_len = SELECT_VL or MIN
    v1 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...})
    v2 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...})
    v3 = v1 + v2
    LEN_MASK_STORE (length = loop_len, mask = {-1,-1,...}, v3)
Case 4 (VLA):
  Code:
    for (int i = 0; i < n; i++)
      if (cond[i])
        a[i] = b[i] + c[i];
  IR:
    loop_len = SELECT_VL or MIN
    mask = comparison
    v1 = LEN_MASK_LOAD (length = loop_len, mask)
    v2 = LEN_MASK_LOAD (length = loop_len, mask)
    v3 = v1 + v2
    LEN_MASK_STORE (length = loop_len, mask, v3)
Co-authored-by: Robin Dapp <rdapp.gcc@gmail.com>
gcc/ChangeLog:
* doc/md.texi: Add len_mask{load,store}.
* genopinit.cc (main): Ditto.
(CMP_NAME): Ditto.
* internal-fn.cc (len_maskload_direct): Ditto.
(len_maskstore_direct): Ditto.
(expand_call_mem_ref): Ditto.
(expand_partial_load_optab_fn): Ditto.
(expand_len_maskload_optab_fn): Ditto.
(expand_partial_store_optab_fn): Ditto.
(expand_len_maskstore_optab_fn): Ditto.
(direct_len_maskload_optab_supported_p): Ditto.
(direct_len_maskstore_optab_supported_p): Ditto.
* internal-fn.def (LEN_MASK_LOAD): Ditto.
(LEN_MASK_STORE): Ditto.
* optabs.def (OPTAB_CD): Ditto.
|
|
The following patch introduces {add,sub}c5_optab and pattern recognizes
various forms of add with carry and subtract with carry/borrow, see
pr79173-{1,2,3,4,5,6}.c tests on what is matched.
Primarily forms with 2 __builtin_add_overflow or __builtin_sub_overflow
calls per limb (with just one for the least significant one), for
add with carry even when it is hand written in C (for subtraction
reassoc seems to change it too much so that the pattern recognition
doesn't work). __builtin_{add,sub}_overflow are standardized in C23
under the ckd_{add,sub} names, so they are no longer a GNU-only extension.
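A hedged sketch (mine; the committed testcases are gcc.target/i386/pr79173-*.c) of the two-__builtin_add_overflow-per-limb shape that gets pattern recognized:
void
add2limb (unsigned long *restrict r, const unsigned long *restrict a,
          const unsigned long *restrict b)
{
  unsigned long t;
  /* Least significant limb: a single overflow check.  */
  unsigned long c0 = __builtin_add_overflow (a[0], b[0], &r[0]);
  /* Next limb: two checks, one for the addition and one for the carry-in.  */
  unsigned long c1 = __builtin_add_overflow (a[1], b[1], &t);
  unsigned long c2 = __builtin_add_overflow (t, c0, &r[1]);
  (void) (c1 | c2);  /* carry out of the top limb, if it were needed.  */
}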
Note, clang has for these (IMHO badly designed)
__builtin_{add,sub}c{b,s,,l,ll} builtins which don't add/subtract just
a single bit of carry, but basically add 3 unsigned values or
subtract 2 unsigned values from one, and result in carry out of 0, 1, or 2
because of that. If we wanted to introduce those for clang compatibility,
we could and lower them early to just two __builtin_{add,sub}_overflow
calls and let the pattern matching in this patch recognize it later.
I've added expanders for this on ix86 and in addition to that
added various peephole2s (in preparation patches for this patch) to make
sure we get nice (and small) code for the common cases. I think there are
other PRs which request that e.g. for the _{addcarry,subborrow}_u{32,64}
intrinsics, which the patch also improves.
Would be nice if support for these optabs was added to many other targets,
arm/aarch64 and powerpc* certainly have such instructions, I'd expect
in fact that most targets do.
The _BitInt support I'm working on will also need this to emit reasonable
code.
2023-06-15 Jakub Jelinek <jakub@redhat.com>
PR middle-end/79173
* internal-fn.def (UADDC, USUBC): New internal functions.
* internal-fn.cc (expand_UADDC, expand_USUBC): New functions.
(commutative_ternary_fn_p): Return true also for IFN_UADDC.
* optabs.def (uaddc5_optab, usubc5_optab): New optabs.
* tree-ssa-math-opts.cc (uaddc_cast, uaddc_ne0, uaddc_is_cplxpart,
match_uaddc_usubc): New functions.
(math_opts_dom_walker::after_dom_children): Call match_uaddc_usubc
for PLUS_EXPR, MINUS_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR unless
other optimizations have been successful for those.
* gimple-fold.cc (gimple_fold_call): Handle IFN_UADDC and IFN_USUBC.
* fold-const-call.cc (fold_const_call): Likewise.
* gimple-range-fold.cc (adjust_imagpart_expr): Likewise.
* tree-ssa-dce.cc (eliminate_unnecessary_stmts): Likewise.
* doc/md.texi (uaddc<mode>5, usubc<mode>5): Document new named
patterns.
* config/i386/i386.md (uaddc<mode>5, usubc<mode>5): New
define_expand patterns.
(*setcc_qi_addqi3_cconly_overflow_1_<mode>, *setccc): Split
into NOTE_INSN_DELETED note rather than nop instruction.
(*setcc_qi_negqi_ccc_1_<mode>, *setcc_qi_negqi_ccc_2_<mode>):
Likewise.
* gcc.target/i386/pr79173-1.c: New test.
* gcc.target/i386/pr79173-2.c: New test.
* gcc.target/i386/pr79173-3.c: New test.
* gcc.target/i386/pr79173-4.c: New test.
* gcc.target/i386/pr79173-5.c: New test.
* gcc.target/i386/pr79173-6.c: New test.
* gcc.target/i386/pr79173-7.c: New test.
* gcc.target/i386/pr79173-8.c: New test.
* gcc.target/i386/pr79173-9.c: New test.
* gcc.target/i386/pr79173-10.c: New test.
|
|
This adds a recognition pattern for the non-widening
absolute difference (ABD).
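A hedged example (mine, not from the patch) of the abs (a - b) idiom with same-width operands and result that the ABD pattern is meant to recognize:
void
f (unsigned char *restrict a, unsigned char *restrict b,
   unsigned char *restrict out, int n)
{
  for (int i = 0; i < n; i++)
    /* |a[i] - b[i]| computed without widening the result.  */
    out[i] = a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
}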
gcc/ChangeLog:
* doc/md.texi (sabd, uabd): Document them.
* internal-fn.def (ABD): Use new optab.
* optabs.def (sabd_optab, uabd_optab): New optabs.
* tree-vect-patterns.cc (vect_recog_absolute_difference):
Recognize the idiom abs (a - b).
(vect_recog_sad_pattern): Refactor to use
vect_recog_absolute_difference.
(vect_recog_abd_pattern): Use patterns found by
vect_recog_absolute_difference to build a new ABD
internal call.
|
|
This patch addresses comments from Richard && Richi and rebases to trunk.
This patch adds SELECT_VL middle-end support
to allow targets to have target-dependent optimizations in
the length calculation.
This patch is inspired by RVV ISA and LLVM:
https://reviews.llvm.org/D99750
SELECT_VL has the same behavior as LLVM's "get_vector_length" with
the following properties:
1. Only applies to a single rgroup.
2. Not used with SLP.
3. Adjusts the loop control IV.
4. Adjusts the data reference IVs.
5. Allows processing a non-VF number of elements in a non-final iteration.
Code
# void vvaddint32(size_t n, const int*x, const int*y, int*z)
# { for (size_t i=0; i<n; i++) { z[i]=x[i]+y[i]; } }
Take RVV codegen for example:
Before this patch:
vvaddint32:
ble a0,zero,.L6
csrr a4,vlenb
srli a6,a4,2
.L4:
mv a5,a0
bleu a0,a6,.L3
mv a5,a6
.L3:
vsetvli zero,a5,e32,m1,ta,ma
vle32.v v2,0(a1)
vle32.v v1,0(a2)
vsetvli a7,zero,e32,m1,ta,ma
sub a0,a0,a5
vadd.vv v1,v1,v2
vsetvli zero,a5,e32,m1,ta,ma
vse32.v v1,0(a3)
add a2,a2,a4
add a3,a3,a4
add a1,a1,a4
bne a0,zero,.L4
.L6:
ret
After this patch:
vvaddint32:
vsetvli t0, a0, e32, ta, ma # Set vector length based on 32-bit vectors
vle32.v v0, (a1) # Get first vector
sub a0, a0, t0 # Decrement number done
slli t0, t0, 2 # Multiply number done by 4 bytes
add a1, a1, t0 # Bump pointer
vle32.v v1, (a2) # Get second vector
add a2, a2, t0 # Bump pointer
vadd.vv v2, v0, v1 # Sum vectors
vse32.v v2, (a3) # Store result
add a3, a3, t0 # Bump pointer
bnez a0, vvaddint32 # Loop back
ret # Finished
Co-authored-by: Richard Sandiford<richard.sandiford@arm.com>
Co-authored-by: Richard Biener <rguenther@suse.de>
gcc/ChangeLog:
* doc/md.texi: Add SELECT_VL support.
* internal-fn.def (SELECT_VL): Ditto.
* optabs.def (OPTAB_D): Ditto.
* tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Ditto.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Ditto.
* tree-vect-stmts.cc (get_select_vl_data_ref_ptr): Ditto.
(vectorizable_store): Ditto.
(vectorizable_load): Ditto.
* tree-vectorizer.h (LOOP_VINFO_USING_SELECT_VL_P): Ditto.
|
|
This patch removes the old widen plus/minus tree codes which have been
replaced by internal functions.
2023-06-05 Andre Vieira <andre.simoesdiasvieira@arm.com>
Joel Hutton <joel.hutton@arm.com>
gcc/ChangeLog:
* doc/generic.texi: Remove old tree codes.
* expr.cc (expand_expr_real_2): Remove old tree code cases.
* gimple-pretty-print.cc (dump_binary_rhs): Likewise.
* optabs-tree.cc (optab_for_tree_code): Likewise.
(supportable_half_widening_operation): Likewise.
* tree-cfg.cc (verify_gimple_assign_binary): Likewise.
* tree-inline.cc (estimate_operator_cost): Likewise.
(op_symbol_code): Likewise.
* tree-vect-data-refs.cc (vect_get_smallest_scalar_type): Likewise.
(vect_analyze_data_ref_accesses): Likewise.
* tree-vect-generic.cc (expand_vector_operations_1): Likewise.
* cfgexpand.cc (expand_debug_expr): Likewise.
* tree-vect-stmts.cc (vectorizable_conversion): Likewise.
(supportable_widening_operation): Likewise.
* gimple-range-op.cc (gimple_range_op_handler::maybe_non_standard):
Likewise.
* optabs.def (vec_widen_ssubl_hi_optab, vec_widen_ssubl_lo_optab,
vec_widen_saddl_hi_optab, vec_widen_saddl_lo_optab,
vec_widen_usubl_hi_optab, vec_widen_usubl_lo_optab,
vec_widen_uaddl_hi_optab, vec_widen_uaddl_lo_optab): Remove optabs.
* tree-pretty-print.cc (dump_generic_node): Remove tree code definition.
* tree.def (WIDEN_PLUS_EXPR, WIDEN_MINUS_EXPR, VEC_WIDEN_PLUS_HI_EXPR,
VEC_WIDEN_PLUS_LO_EXPR, VEC_WIDEN_MINUS_HI_EXPR,
VEC_WIDEN_MINUS_LO_EXPR): Likewise.
|
|
DEF_INTERNAL_WIDENING_OPTAB_FN and DEF_INTERNAL_NARROWING_OPTAB_FN
are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN
respectively, with the exception that they provide convenience wrappers
for a single vector-to-vector conversion, a hi/lo split or an even/odd
split. Each definition for <NAME> will require either signed optabs
named <UOPTAB> and <SOPTAB> (for widening) or a single <OPTAB> (for
narrowing) for each of the five functions it creates.
For example, for widening addition the
DEF_INTERNAL_WIDENING_OPTAB_FN will create five internal functions:
IFN_VEC_WIDEN_PLUS, IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO,
IFN_VEC_WIDEN_PLUS_EVEN and IFN_VEC_WIDEN_PLUS_ODD. Each requiring two
optabs, one for signed and one for unsigned.
Aarch64 implements the hi/lo split optabs:
IFN_VEC_WIDEN_PLUS_HI -> vec_widen_<su>add_hi_<mode> -> (u/s)addl2
IFN_VEC_WIDEN_PLUS_LO -> vec_widen_<su>add_lo_<mode> -> (u/s)addl
This gives the same functionality as the previous
WIDEN_PLUS/WIDEN_MINUS tree codes which are expanded into
VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.
2023-06-05 Andre Vieira <andre.simoesdiasvieira@arm.com>
Joel Hutton <joel.hutton@arm.com>
Tamar Christina <tamar.christina@arm.com>
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (vec_widen_<su>addl_lo_<mode>): Rename
this ...
(vec_widen_<su>add_lo_<mode>): ... to this.
(vec_widen_<su>addl_hi_<mode>): Rename this ...
(vec_widen_<su>add_hi_<mode>): ... to this.
(vec_widen_<su>subl_lo_<mode>): Rename this ...
(vec_widen_<su>sub_lo_<mode>): ... to this.
(vec_widen_<su>subl_hi_<mode>): Rename this ...
(vec_widen_<su>sub_hi_<mode>): ...to this.
* doc/generic.texi: Document new IFN codes.
* internal-fn.cc (lookup_hilo_internal_fn): Add lookup function.
(commutative_binary_fn_p): Add widen_plus fn's.
(widening_fn_p): New function.
(narrowing_fn_p): New function.
(direct_internal_fn_optab): Change visibility.
* internal-fn.def (DEF_INTERNAL_WIDENING_OPTAB_FN): Macro to define an
internal_fn that expands into multiple internal_fns for widening.
(IFN_VEC_WIDEN_PLUS, IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO,
IFN_VEC_WIDEN_PLUS_EVEN, IFN_VEC_WIDEN_PLUS_ODD,
IFN_VEC_WIDEN_MINUS, IFN_VEC_WIDEN_MINUS_HI,
IFN_VEC_WIDEN_MINUS_LO, IFN_VEC_WIDEN_MINUS_ODD,
IFN_VEC_WIDEN_MINUS_EVEN): Define widening plus,minus functions.
* internal-fn.h (direct_internal_fn_optab): Declare new prototype.
(lookup_hilo_internal_fn): Likewise.
(widening_fn_p): Likewise.
(narrowing_fn_p): Likewise.
* optabs.cc (commutative_optab_p): Add widening plus optabs.
* optabs.def (OPTAB_D): Define widen add, sub optabs.
* tree-vect-patterns.cc (vect_recog_widen_op_pattern): Support
patterns with a hi/lo or even/odd split.
(vect_recog_sad_pattern): Refactor to use new IFN codes.
(vect_recog_widen_plus_pattern): Likewise.
(vect_recog_widen_minus_pattern): Likewise.
(vect_recog_average_pattern): Likewise.
* tree-vect-stmts.cc (vectorizable_conversion): Add support for
_HILO IFNs.
(supportable_widening_operation): Likewise.
* tree.def (WIDEN_SUM_EXPR): Update example to use new IFNs.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/vect-widen-add.c: Test that new
IFN_VEC_WIDEN_PLUS is being used.
* gcc.target/aarch64/vect-widen-sub.c: Test that new
IFN_VEC_WIDEN_MINUS is being used.
|
|
|
|
operations
This adds a new test-and-branch optab that can be used to do a conditional test
of a bit and branch. This is similar to the cbranch optab but instead can
test any arbitrary bit inside the register.
This patch recognizes boolean comparisons and single bit mask tests.
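A hedged example (mine, not from the patch) of the single-bit test the new optab is meant to catch, branching on one arbitrary bit of a register:
extern void g (void);
void
f (unsigned int x)
{
  if (x & (1u << 5))   /* test a single bit and branch on it  */
    g ();
}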
gcc/ChangeLog:
* dojump.cc (do_jump): Pass along value.
(do_jump_by_parts_greater_rtx): Likewise.
(do_jump_by_parts_zero_rtx): Likewise.
(do_jump_by_parts_equality_rtx): Likewise.
(do_compare_rtx_and_jump): Likewise.
(do_compare_and_jump): Likewise.
* dojump.h (do_compare_rtx_and_jump): New.
* optabs.cc (emit_cmp_and_jump_insn_1): Refactor to take optab to check.
(validate_test_and_branch): New.
(emit_cmp_and_jump_insns): Optionally take a value, and when the value is
supplied then check if it's suitable for tbranch.
* optabs.def (tbranch_eq$a4, tbranch_ne$a4): New.
* doc/md.texi (tbranch_@var{op}@var{mode}4): Document it.
* optabs.h (emit_cmp_and_jump_insns): New.
* tree.h (tree_zero_one_valued_p): New.
|
|
The following patch implements a new builtin, __builtin_issignaling,
which can be used to implement the ISO/IEC TS 18661-1 issignaling
macro.
It is implemented as type-generic function, so there is just one
builtin, not many with various suffixes.
This patch doesn't address PR56831 nor PR58416, but I think compared to
using glibc issignaling macro could make some cases better (as
the builtin is expanded always inline and for SFmode/DFmode just
reinterprets a memory or pseudo register as SImode/DImode, so could
avoid some raising of exception + turning sNaN into qNaN before the
builtin can analyze it).
For floating point modes that do not have NaNs it will return 0;
otherwise I've tried to implement this for all the other supported
real formats.
It handles both the MIPS/PA floats where a sNaN has the mantissa
MSB set and the rest where a sNaN has it cleared, with the exception
of formats which are known never to be in the MIPS/PA form.
The MIPS/PA floats are handled using a test like
(x & mask) == mask,
the other usually as
((x ^ bit) & mask) > val
where bit, mask and val are some constants.
IBM double double is done by doing DFmode test on the most significant
half, and Intel/Motorola extended (12 or 16 bytes) and IEEE quad are
handled by extracting 32-bit/16-bit words or 64-bit parts from the
value and testing those.
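As a hedged illustration of the ((x ^ bit) & mask) > val test for IEEE binary32 (where an sNaN has the mantissa MSB clear), one possible instantiation is below; the constants are my reconstruction, not necessarily the exact values the expander emits.
#include <stdint.h>
#include <string.h>
static int
issignaling_binary32 (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);   /* reinterpret the SFmode value as SImode.  */
  /* Flip the quiet bit, drop the sign, and compare against the qNaN
     threshold: only signaling NaNs end up above it.  */
  return ((u ^ 0x00400000u) & 0x7fffffffu) > 0x7fc00000u;
}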
On x86, XFmode is handled by a special optab so that even pseudo numbers
are considered signaling, like in glibc and like the i386 specific testcase
tests.
2022-08-26 Jakub Jelinek <jakub@redhat.com>
gcc/
* builtins.def (BUILT_IN_ISSIGNALING): New built-in.
* builtins.cc (expand_builtin_issignaling): New function.
(expand_builtin_signbit): Don't overwrite target.
(expand_builtin): Handle BUILT_IN_ISSIGNALING.
(fold_builtin_classify): Likewise.
(fold_builtin_1): Likewise.
* optabs.def (issignaling_optab): New.
* fold-const-call.cc (fold_const_call_ss): Handle
BUILT_IN_ISSIGNALING.
* config/i386/i386.md (issignalingxf2): New expander.
* doc/extend.texi (__builtin_issignaling): Document.
(__builtin_isinf, __builtin_isnan): Clarify behavior with
-ffinite-math-only.
* doc/md.texi (issignaling<mode>2): Likewise.
gcc/c-family/
* c-common.cc (check_builtin_function_arguments): Handle
BUILT_IN_ISSIGNALING.
gcc/c/
* c-typeck.cc (convert_arguments): Handle BUILT_IN_ISSIGNALING.
gcc/fortran/
* f95-lang.cc (gfc_init_builtin_functions): Initialize
BUILT_IN_ISSIGNALING.
gcc/testsuite/
* gcc.dg/torture/builtin-issignaling-1.c: New test.
* gcc.dg/torture/builtin-issignaling-2.c: New test.
* gcc.dg/torture/float16-builtin-issignaling-1.c: New test.
* gcc.dg/torture/float32-builtin-issignaling-1.c: New test.
* gcc.dg/torture/float32x-builtin-issignaling-1.c: New test.
* gcc.dg/torture/float64-builtin-issignaling-1.c: New test.
* gcc.dg/torture/float64x-builtin-issignaling-1.c: New test.
* gcc.dg/torture/float128-builtin-issignaling-1.c: New test.
* gcc.dg/torture/float128x-builtin-issignaling-1.c: New test.
* gcc.target/i386/builtin-issignaling-1.c: New test.
|
|
and feraiseexcept [PR94193]
These optimizations were originally in glibc, but they were removed,
and it was suggested that they were a good fit as gcc builtins [1].
feclearexcept and feraiseexcept were extended (in comparison to the
glibc version) to accept any combination of the accepted flags, not
limited to just one flag bit at a time anymore.
The builtin expanders need knowledge of the target libc's FE_*
values, so they are limited to expanding only for suitable libcs.
[1] https://sourceware.org/legacy-ml/libc-alpha/2020-03/msg00047.html
https://sourceware.org/legacy-ml/libc-alpha/2020-03/msg00080.html
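A hedged usage sketch: with these expanders, on a suitable target/libc the calls below can be expanded inline rather than going through the library.
#include <fenv.h>
int
round_mode_after_flag_update (void)
{
  feclearexcept (FE_INEXACT | FE_OVERFLOW);  /* any combination of flags is accepted  */
  feraiseexcept (FE_INVALID);
  return fegetround ();
}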
2020-08-13 Raoni Fassina Firmino <raoni@linux.ibm.com>
gcc/
PR target/94193
* builtins.cc (expand_builtin_fegetround): New function.
(expand_builtin_feclear_feraise_except): New function.
(expand_builtin): Add cases for BUILT_IN_FEGETROUND,
BUILT_IN_FECLEAREXCEPT and BUILT_IN_FERAISEEXCEPT.
* config/rs6000/rs6000.md (fegetroundsi): New pattern.
(feclearexceptsi): New pattern.
(feraiseexceptsi): New pattern.
* doc/extend.texi: Add a new introductory paragraph about the
new builtins.
* doc/md.texi (fegetround@var{m}): Document new optab.
(feclearexcept@var{m}): Document new optab.
(feraiseexcept@var{m}): Document new optab.
* optabs.def (fegetround_optab): New optab.
(feclearexcept_optab): New optab.
(feraiseexcept_optab): New optab.
gcc/testsuite/
PR target/94193
* gcc.target/powerpc/builtin-feclearexcept-feraiseexcept-1.c: New test.
* gcc.target/powerpc/builtin-feclearexcept-feraiseexcept-2.c: New test.
* gcc.target/powerpc/builtin-fegetround.c: New test.
Signed-off-by: Raoni Fassina Firmino <raoni@linux.ibm.com>
|
|
C++20:
#include <compare>
auto cmp4way(double a, double b)
{
  return a <=> b;
}
expands to:
ucomisd %xmm1, %xmm0
jp .L8
movl $0, %eax
jne .L8
.L2:
ret
.p2align 4,,10
.p2align 3
.L8:
comisd %xmm0, %xmm1
movl $-1, %eax
ja .L2
ucomisd %xmm1, %xmm0
setbe %al
addl $1, %eax
ret
That is 3 comparisons of the same operands.
The following patch improves it to just one comparison:
comisd %xmm1, %xmm0
jp .L4
seta %al
movl $0, %edx
leal -1(%rax,%rax), %eax
cmove %edx, %eax
ret
.L4:
movl $2, %eax
ret
While a <=> b expands to a == b ? 0 : a < b ? -1 : a > b ? 1 : 2
where the first comparison is equality and this shouldn't raise
exceptions on qNaN operands. If the operands aren't equal (which
includes unordered cases), then it immediately performs < or >
comparison and that raises exceptions even on qNaNs, so we can just
perform a single comparison that raises exceptions on qNaN.
As the 4 different cases are encoded as
ZF CF PF
 1  1  1   a unordered b
 0  0  0   a > b
 0  1  0   a < b
 1  0  0   a == b
we can emit an optimal sequence of comparisons: first jp
for the unordered case, then je for the == case and finally jb
for the < case.
The patch pattern recognizes spaceship-like comparisons during
widening_mul if the spaceship optab is implemented, and replaces
those comparisons with comparisons of .SPACESHIP ifn which returns
-1/0/1/2 based on the comparison. This seems to work well both for the
case of just returning the -1/0/1/2 (when we have just a common
successor with a PHI) or when the different cases are handled with
various other basic blocks. The testcases cover both of those cases,
the latter with different function calls in those.
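For reference, a hedged scalar sketch (mine) of the -1/0/1/2 encoding the .SPACESHIP result uses, matching the expansion described above:
int
spaceship_value (double a, double b)
{
  if (a == b)
    return 0;
  if (a < b)
    return -1;
  if (a > b)
    return 1;
  return 2;  /* unordered */
}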
2022-01-17 Jakub Jelinek <jakub@redhat.com>
PR target/103973
* tree-cfg.h (cond_only_block_p): Declare.
* tree-ssa-phiopt.c (cond_only_block_p): Move function to ...
* tree-cfg.c (cond_only_block_p): ... here. No longer static.
* optabs.def (spaceship_optab): New optab.
* internal-fn.def (SPACESHIP): New internal function.
* internal-fn.h (expand_SPACESHIP): Declare.
* internal-fn.c (expand_PHI): Formatting fix.
(expand_SPACESHIP): New function.
* tree-ssa-math-opts.c (optimize_spaceship): New function.
(math_opts_dom_walker::after_dom_children): Use it.
* config/i386/i386.md (spaceship<mode>3): New define_expand.
* config/i386/i386-protos.h (ix86_expand_fp_spaceship): Declare.
* config/i386/i386-expand.c (ix86_expand_fp_spaceship): New function.
* doc/md.texi (spaceship@var{m}3): Document.
* gcc.target/i386/pr103973-1.c: New test.
* gcc.target/i386/pr103973-2.c: New test.
* gcc.target/i386/pr103973-3.c: New test.
* gcc.target/i386/pr103973-4.c: New test.
* gcc.target/i386/pr103973-5.c: New test.
* gcc.target/i386/pr103973-6.c: New test.
* gcc.target/i386/pr103973-7.c: New test.
* gcc.target/i386/pr103973-8.c: New test.
* gcc.target/i386/pr103973-9.c: New test.
* gcc.target/i386/pr103973-10.c: New test.
* gcc.target/i386/pr103973-11.c: New test.
* gcc.target/i386/pr103973-12.c: New test.
* gcc.target/i386/pr103973-13.c: New test.
* gcc.target/i386/pr103973-14.c: New test.
* gcc.target/i386/pr103973-15.c: New test.
* gcc.target/i386/pr103973-16.c: New test.
* gcc.target/i386/pr103973-17.c: New test.
* gcc.target/i386/pr103973-18.c: New test.
* gcc.target/i386/pr103973-19.c: New test.
* gcc.target/i386/pr103973-20.c: New test.
* g++.target/i386/pr103973-1.C: New test.
* g++.target/i386/pr103973-2.C: New test.
* g++.target/i386/pr103973-3.C: New test.
* g++.target/i386/pr103973-4.C: New test.
* g++.target/i386/pr103973-5.C: New test.
* g++.target/i386/pr103973-6.C: New test.
* g++.target/i386/pr103973-7.C: New test.
* g++.target/i386/pr103973-8.C: New test.
* g++.target/i386/pr103973-9.C: New test.
* g++.target/i386/pr103973-10.C: New test.
* g++.target/i386/pr103973-11.C: New test.
* g++.target/i386/pr103973-12.C: New test.
* g++.target/i386/pr103973-13.C: New test.
* g++.target/i386/pr103973-14.C: New test.
* g++.target/i386/pr103973-15.C: New test.
* g++.target/i386/pr103973-16.C: New test.
* g++.target/i386/pr103973-17.C: New test.
* g++.target/i386/pr103973-18.C: New test.
* g++.target/i386/pr103973-19.C: New test.
* g++.target/i386/pr103973-20.C: New test.
|
|
{==,!=,<,<=,>,>=} 0 [PR98737]
On Wed, Jan 27, 2021 at 12:27:13PM +0100, Ulrich Drepper via Gcc-patches wrote:
> On 1/27/21 11:37 AM, Jakub Jelinek wrote:
> > Would equality comparison against 0 handle the most common cases.
> >
> > The user can write it as
> > __atomic_sub_fetch (x, y, z) == 0
> > or
> > __atomic_fetch_sub (x, y, z) - y == 0
> > thouch, so the expansion code would need to be able to cope with both.
>
> Please also keep !=0, <0, <=0, >0, and >=0 in mind. They all can be
> useful and can be handled with the flags.
<= 0 and > 0 don't really work well with lock {add,sub,inc,dec}, x86 doesn't
have comparisons that would look solely at both SF and ZF and not at other
flags (and emitting two separate conditional jumps or two setcc insns and
oring them together looks awful).
But the rest can work.
Here is a patch that adds internal functions and optabs for these,
recognizes them at the same spot as e.g. .ATOMIC_BIT_TEST_AND* internal
functions (fold all builtins pass) and expands them appropriately (or for
the <= 0 and > 0 cases of +/- FAILs and let's middle-end fall back).
So far I have handled just the op_fetch builtins. IMHO, instead of handling
also __atomic_fetch_sub (x, y, z) - y == 0 etc., we should canonicalize
__atomic_fetch_sub (x, y, z) - y to __atomic_sub_fetch (x, y, z) (and vice
versa).
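A hedged example (mine; the committed testcases are gcc.target/i386/pr98737-*.c) of the shape that is recognized: the result of an __atomic_*_fetch builtin compared against zero.
_Bool
dec_and_test (unsigned long *p)
{
  /* Can become .ATOMIC_SUB_FETCH_CMP_0 and use the flags of lock sub.  */
  return __atomic_sub_fetch (p, 1, __ATOMIC_SEQ_CST) == 0;
}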
2022-01-03 Jakub Jelinek <jakub@redhat.com>
PR target/98737
* internal-fn.def (ATOMIC_ADD_FETCH_CMP_0, ATOMIC_SUB_FETCH_CMP_0,
ATOMIC_AND_FETCH_CMP_0, ATOMIC_OR_FETCH_CMP_0, ATOMIC_XOR_FETCH_CMP_0):
New internal fns.
* internal-fn.h (ATOMIC_OP_FETCH_CMP_0_EQ, ATOMIC_OP_FETCH_CMP_0_NE,
ATOMIC_OP_FETCH_CMP_0_LT, ATOMIC_OP_FETCH_CMP_0_LE,
ATOMIC_OP_FETCH_CMP_0_GT, ATOMIC_OP_FETCH_CMP_0_GE): New enumerators.
* internal-fn.c (expand_ATOMIC_ADD_FETCH_CMP_0,
expand_ATOMIC_SUB_FETCH_CMP_0, expand_ATOMIC_AND_FETCH_CMP_0,
expand_ATOMIC_OR_FETCH_CMP_0, expand_ATOMIC_XOR_FETCH_CMP_0): New
functions.
* optabs.def (atomic_add_fetch_cmp_0_optab,
atomic_sub_fetch_cmp_0_optab, atomic_and_fetch_cmp_0_optab,
atomic_or_fetch_cmp_0_optab, atomic_xor_fetch_cmp_0_optab): New
direct optabs.
* builtins.h (expand_ifn_atomic_op_fetch_cmp_0): Declare.
* builtins.c (expand_ifn_atomic_op_fetch_cmp_0): New function.
* tree-ssa-ccp.c: Include internal-fn.h.
(optimize_atomic_bit_test_and): Add . before internal fn call
in function comment. Change return type from void to bool and
return true only if successfully replaced.
(optimize_atomic_op_fetch_cmp_0): New function.
(pass_fold_builtins::execute): Use optimize_atomic_op_fetch_cmp_0
for BUILT_IN_ATOMIC_{ADD,SUB,AND,OR,XOR}_FETCH_{1,2,4,8,16} and
BUILT_IN_SYNC_{ADD,SUB,AND,OR,XOR}_AND_FETCH_{1,2,4,8,16},
for *XOR* ones only if optimize_atomic_bit_test_and failed.
* config/i386/sync.md (atomic_<plusminus_mnemonic>_fetch_cmp_0<mode>,
atomic_<logic>_fetch_cmp_0<mode>): New define_expand patterns.
(atomic_add_fetch_cmp_0<mode>_1, atomic_sub_fetch_cmp_0<mode>_1,
atomic_<logic>_fetch_cmp_0<mode>_1): New define_insn patterns.
* doc/md.texi (atomic_add_fetch_cmp_0<mode>,
atomic_sub_fetch_cmp_0<mode>, atomic_and_fetch_cmp_0<mode>,
atomic_or_fetch_cmp_0<mode>, atomic_xor_fetch_cmp_0<mode>): Document
new named patterns.
* gcc.target/i386/pr98737-1.c: New test.
* gcc.target/i386/pr98737-2.c: New test.
* gcc.target/i386/pr98737-3.c: New test.
* gcc.target/i386/pr98737-4.c: New test.
* gcc.target/i386/pr98737-5.c: New test.
* gcc.target/i386/pr98737-6.c: New test.
* gcc.target/i386/pr98737-7.c: New test.
|
|
|
|
This patch adds support for reductions involving calls to fmax*()
and fmin*(), without the -ffast-math flags that allow them to be
converted to MAX_EXPR and MIN_EXPR.
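A hedged example (mine; the committed tests are gcc.dg/vect/vect-fmax-*.c and friends) of the reduction shape this enables without -ffast-math:
#include <math.h>
double
fmax_reduction (const double *x, int n)
{
  double m = -HUGE_VAL;
  for (int i = 0; i < n; i++)
    m = fmax (m, x[i]);   /* reduction via the fmax call, not MAX_EXPR.  */
  return m;
}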
gcc/
* doc/md.texi (reduc_fmin_scal_@var{m}): Document.
(reduc_fmax_scal_@var{m}): Likewise.
* optabs.def (reduc_fmax_scal_optab): New optab.
(reduc_fmin_scal_optab): Likewise.
* internal-fn.def (REDUC_FMAX, REDUC_FMIN): New functions.
* tree-vect-loop.c (reduction_fn_for_scalar_code): Handle
CASE_CFN_FMAX and CASE_CFN_FMIN.
(neutral_op_for_reduction): Likewise.
(needs_fold_left_reduction_p): Likewise.
* config/aarch64/iterators.md (FMAXMINV): New iterator.
(fmaxmin): Handle UNSPEC_FMAXNMV and UNSPEC_FMINNMV.
* config/aarch64/aarch64-simd.md (reduc_<optab>_scal_<mode>): Fix
unspec mode.
(reduc_<fmaxmin>_scal_<mode>): New pattern.
* config/aarch64/aarch64-sve.md (reduc_<fmaxmin>_scal_<mode>):
Likewise.
gcc/testsuite/
* gcc.dg/vect/vect-fmax-1.c: New test.
* gcc.dg/vect/vect-fmax-2.c: Likewise.
* gcc.dg/vect/vect-fmax-3.c: Likewise.
* gcc.dg/vect/vect-fmin-1.c: New test.
* gcc.dg/vect/vect-fmin-2.c: Likewise.
* gcc.dg/vect/vect-fmin-3.c: Likewise.
* gcc.target/aarch64/fmaxnm_1.c: Likewise.
* gcc.target/aarch64/fmaxnm_2.c: Likewise.
* gcc.target/aarch64/fminnm_1.c: Likewise.
* gcc.target/aarch64/fminnm_2.c: Likewise.
* gcc.target/aarch64/sve/fmaxnm_2.c: Likewise.
* gcc.target/aarch64/sve/fmaxnm_3.c: Likewise.
* gcc.target/aarch64/sve/fminnm_2.c: Likewise.
* gcc.target/aarch64/sve/fminnm_3.c: Likewise.
|
|
This patch adds conditional forms of FMAX and FMIN, following
the pattern for existing conditional binary functions.
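A hedged example (mine, not from the patch) of a predicated fmax that could map to COND_FMAX when vectorized:
#include <math.h>
void
f (double *restrict r, const double *restrict a,
   const double *restrict b, const int *restrict c, int n)
{
  for (int i = 0; i < n; i++)
    /* fmax only where the condition holds, a[i] otherwise.  */
    r[i] = c[i] ? fmax (a[i], b[i]) : a[i];
}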
gcc/
* doc/md.texi (cond_fmin@var{mode}, cond_fmax@var{mode}): Document.
* optabs.def (cond_fmin_optab, cond_fmax_optab): New optabs.
* internal-fn.def (COND_FMIN, COND_FMAX): New functions.
* internal-fn.c (first_commutative_argument): Handle them.
(FOR_EACH_COND_FN_PAIR): Likewise.
* match.pd (UNCOND_BINARY, COND_BINARY): Likewise.
* config/aarch64/aarch64-sve.md (cond_<fmaxmin><mode>): New
pattern.
gcc/testsuite/
* gcc.target/aarch64/sve/cond_fmaxnm_5.c: New test.
* gcc.target/aarch64/sve/cond_fmaxnm_5_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fmaxnm_6.c: Likewise.
* gcc.target/aarch64/sve/cond_fmaxnm_6_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fmaxnm_7.c: Likewise.
* gcc.target/aarch64/sve/cond_fmaxnm_7_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fmaxnm_8.c: Likewise.
* gcc.target/aarch64/sve/cond_fmaxnm_8_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_5.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_5_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_6.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_6_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_7.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_7_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_8.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_8_run.c: Likewise.
|
|
gcc/ChangeLog:
PR target/93183
* gimple-match-head.c (try_conditional_simplification): Add case for single operand.
* internal-fn.def: Add entry for COND_NEG internal function.
* internal-fn.c (FOR_EACH_CODE_MAPPING): Add entry for
NEGATE_EXPR, COND_NEG mapping.
* optabs.def: Add entry for cond_neg_optab.
* match.pd (UNCOND_UNARY, COND_UNARY): New operator lists.
(vec_cond COND (foo A) B) -> (IFN_COND_FOO COND A B): New pattern.
(vec_cond COND B (foo A)) -> (IFN_COND_FOO ~COND A B): Likewise.
gcc/testsuite/ChangeLog:
PR target/93183
* gcc.target/aarch64/sve/cond_unary_4.c: Adjust.
* gcc.target/aarch64/sve/pr93183.c: New test.
|
|
This patch adds support for recognizing loops which mimic the behaviour
of functions strlen and rawmemchr, and replaces those with internal
function calls in case a target provides them. In contrast to the
standard strlen and rawmemchr functions, this patch also supports
different instances where the memory pointed to is interpreted as 8, 16,
and 32-bit sized, respectively.
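A hedged example (mine, not one of the committed testcases) of a rawmemchr-like loop over 16-bit elements that can be replaced by a .RAWMEMCHR call when the target provides the optab:
#include <stdint.h>
uint16_t *
find16 (uint16_t *p, uint16_t c)
{
  /* No length bound: scan until the first element equal to c.  */
  while (*p != c)
    ++p;
  return p;
}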
gcc/ChangeLog:
* builtins.c (get_memory_rtx): Change to external linkage.
* builtins.h (get_memory_rtx): Add function prototype.
* doc/md.texi (rawmemchr<mode>): Document.
* internal-fn.c (expand_RAWMEMCHR): Define.
* internal-fn.def (RAWMEMCHR): Add.
* optabs.def (rawmemchr_optab): Add.
* tree-loop-distribution.c (find_single_drs): Change return code
behaviour by also returning true if no single store was found
but a single load.
(loop_distribution::classify_partition): Respect the new return
code behaviour of function find_single_drs.
(loop_distribution::execute): Call new function
transform_reduction_loop in order to replace rawmemchr or strlen
like loops by calls into builtins.
(generate_reduction_builtin_1): New function.
(generate_rawmemchr_builtin): New function.
(generate_strlen_builtin_1): New function.
(generate_strlen_builtin): New function.
(generate_strlen_builtin_using_rawmemchr): New function.
(reduction_var_overflows_first): New function.
(determine_reduction_stmt_1): New function.
(determine_reduction_stmt): New function.
(loop_distribution::transform_reduction_loop): New function.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/ldist-rawmemchr-1.c: New test.
* gcc.dg/tree-ssa/ldist-rawmemchr-2.c: New test.
* gcc.dg/tree-ssa/ldist-strlen-1.c: New test.
* gcc.dg/tree-ssa/ldist-strlen-2.c: New test.
* gcc.dg/tree-ssa/ldist-strlen-3.c: New test.
|
|
This patch adds support for a dot product where the sign of the multiplication
arguments differ. i.e. one is signed and one is unsigned but the precisions are
the same.
#define N 480
#define SIGNEDNESS_1 unsigned
#define SIGNEDNESS_2 signed
#define SIGNEDNESS_3 signed
#define SIGNEDNESS_4 unsigned
SIGNEDNESS_1 int __attribute__ ((noipa))
f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
   SIGNEDNESS_4 char *restrict b)
{
  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
    {
      int av = a[i];
      int bv = b[i];
      SIGNEDNESS_2 short mult = av * bv;
      res += mult;
    }
  return res;
}
The operations are performed as if the operands were extended to a 32-bit value.
As such this operation isn't valid if there is an intermediate conversion to an
unsigned value, i.e. if SIGNEDNESS_2 is unsigned.
Moreover, if the signs of SIGNEDNESS_3 and SIGNEDNESS_4 are flipped, the same
optab is used but the operands are flipped in the optab expansion.
To support this the patch extends the dot-product detection to optionally
ignore operands with different signs and stores this information in the optab
subtype, which is now made a bitfield.
The subtype now additionally controls which optab an EXPR can expand to.
gcc/ChangeLog:
* optabs.def (usdot_prod_optab): New.
* doc/md.texi: Document it and clarify other dot prod optabs.
* optabs-tree.h (enum optab_subtype): Add optab_vector_mixed_sign.
* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
* optabs.c (expand_widen_pattern_expr): Likewise.
* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
* tree-vect-loop.c (vectorizable_reduction): Query dot-product kind.
* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take optional
optab subtype.
(vect_widened_op_tree): Optionally ignore
mismatched types.
(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
|
|
This adds named expanders for vec_fmaddsub<mode>4 and
vec_fmsubadd<mode>4 which map to x86 vfmaddsubXXXp{ds} and
vfmsubaddXXXp{ds} instructions. This complements the previous
addition of ADDSUB support.
x86 lacks SUBADD and the negate variants of FMA with mixed
plus/minus, so I did not add optabs or patterns for those, but
it would not be difficult if there's a target that has them.
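A hedged example (mine, not from the patch) of the alternating fma shape vec_fmaddsub targets, following the existing ADDSUB convention of subtracting in the even lanes and adding in the odd lanes:
void
f (double *restrict r, const double *restrict a,
   const double *restrict b, const double *restrict c, int n)
{
  for (int i = 0; i < n; i += 2)
    {
      r[i] = a[i] * b[i] - c[i];                  /* even lane: fmsub */
      r[i + 1] = a[i + 1] * b[i + 1] + c[i + 1];  /* odd lane: fmadd */
    }
}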
2021-07-05 Richard Biener <rguenther@suse.de>
* doc/md.texi (vec_fmaddsub<mode>4): Document.
(vec_fmsubadd<mode>4): Likewise.
* optabs.def (vec_fmaddsub$a4): Add.
(vec_fmsubadd$a4): Likewise.
* internal-fn.def (IFN_VEC_FMADDSUB): Add.
(IFN_VEC_FMSUBADD): Likewise.
* tree-vect-slp-patterns.c (addsub_pattern::recognize):
Refactor to handle IFN_VEC_FMADDSUB and IFN_VEC_FMSUBADD.
(addsub_pattern::build): Likewise.
* tree-vect-slp.c (vect_optimize_slp): CFN_VEC_FMADDSUB
and CFN_VEC_FMSUBADD are not transparent for permutes.
* config/i386/sse.md (vec_fmaddsub<mode>4): New expander.
(vec_fmsubadd<mode>4): Likewise.
* gcc.target/i386/vect-fmaddsubXXXpd.c: New testcase.
* gcc.target/i386/vect-fmaddsubXXXps.c: Likewise.
* gcc.target/i386/vect-fmsubaddXXXpd.c: Likewise.
* gcc.target/i386/vect-fmsubaddXXXps.c: Likewise.
|
|
This adds SLP pattern recognition for the SSE3/AVX [v]addsubp{ds} v0, v1
instructions which compute { v0[0] - v1[0], v0[1] + v1[1], ... },
thus subtract/add alternating on lanes, starting with subtract.
It adds a corresponding optab and direct internal function,
vec_addsub$a3 and renames the existing i386 backend patterns to
the new canonical name.
The SLP pattern matches the exact alternating lane sequence rather
than trying to be clever and anticipating incoming permutes - we
could permute the two input vectors to the needed lane alternation,
do the addsub and then permute the result vector back but that's
only profitable in case the two input or the output permute will
vanish - something Tamar's refactoring of SLP pattern recog should
make possible.
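A hedged example (mine, not from the patch) of the exact alternating subtract/add lane sequence the pattern matches:
void
f (double *restrict r, const double *restrict a,
   const double *restrict b, int n)
{
  for (int i = 0; i < n; i += 2)
    {
      r[i] = a[i] - b[i];              /* even lane: subtract */
      r[i + 1] = a[i + 1] + b[i + 1];  /* odd lane: add */
    }
}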
2021-06-17 Richard Biener <rguenther@suse.de>
* config/i386/sse.md (avx_addsubv4df3): Rename to
vec_addsubv4df3.
(avx_addsubv8sf3): Rename to vec_addsubv8sf3.
(sse3_addsubv2df3): Rename to vec_addsubv2df3.
(sse3_addsubv4sf3): Rename to vec_addsubv4sf3.
* config/i386/i386-builtin.def: Adjust.
* internal-fn.def (VEC_ADDSUB): New internal optab fn.
* optabs.def (vec_addsub_optab): New optab.
* tree-vect-slp-patterns.c (class addsub_pattern): New.
(slp_patterns): Add addsub_pattern.
* tree-vect-slp.c (vect_optimize_slp): Disable propagation
across CFN_VEC_ADDSUB.
* tree-vectorizer.h (vect_pattern::vect_pattern): Make
m_ops optional.
* doc/md.texi (vec_addsub<mode>3): Document.
* gcc.target/i386/vect-addsubv2df.c: New testcase.
* gcc.target/i386/vect-addsubv4sf.c: Likewise.
* gcc.target/i386/vect-addsubv4df.c: Likewise.
* gcc.target/i386/vect-addsubv8sf.c: Likewise.
* gcc.target/i386/vect-addsub-2.c: Likewise.
* gcc.target/i386/vect-addsub-3.c: Likewise.
|
|
This adds support for FMS and FMS conjugated to the slp pattern matcher.
Example of matches:
#include <stdio.h>
#include <complex.h>
#define N 200
#define ROT
#define TYPE float
#define TYPE2 float
void g (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
{
  for (int i=0; i < N; i++)
    {
      c[i] -= a[i] * (b[i] ROT);
    }
}
void g_f1 (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
{
  for (int i=0; i < N; i++)
    {
      c[i] -= conjf (a[i]) * (b[i]);
    }
}
void g_s1 (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
{
  for (int i=0; i < N; i++)
    {
      c[i] -= a[i] * conjf (b[i] ROT);
    }
}
void caxpy_sub(double complex * restrict y, double complex * restrict x, size_t N, double complex f) {
  for (size_t i = 0; i < N; ++i)
    y[i] -= x[i]* f;
}
gcc/ChangeLog:
* internal-fn.def (COMPLEX_FMS, COMPLEX_FMS_CONJ): New.
* optabs.def (cmls_optab, cmls_conj_optab): New.
* doc/md.texi: Document them.
* tree-vect-slp-patterns.c (class complex_fms_pattern,
complex_fms_pattern::matches, complex_fms_pattern::recognize,
complex_fms_pattern::build): New.
|
|
This adds support for FMA and conjugated FMA to the SLP pattern matcher.
Example of instructions matched:
#include <stdio.h>
#include <complex.h>
#define N 200
#define ROT
#define TYPE float
#define TYPE2 float
void g (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
{
  for (int i=0; i < N; i++)
    {
      c[i] += a[i] * (b[i] ROT);
    }
}

void g_f1 (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
{
  for (int i=0; i < N; i++)
    {
      c[i] += conjf (a[i]) * (b[i] ROT);
    }
}

void g_s1 (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
{
  for (int i=0; i < N; i++)
    {
      c[i] += a[i] * conjf (b[i] ROT);
    }
}

void caxpy_add(double complex * restrict y, double complex * restrict x, size_t N, double complex f) {
  for (size_t i = 0; i < N; ++i)
    y[i] += x[i] * f;
}
gcc/ChangeLog:
* internal-fn.def (COMPLEX_FMA, COMPLEX_FMA_CONJ): New.
* optabs.def (cmla_optab, cmla_conj_optab): New.
* doc/md.texi: Document them.
* tree-vect-slp-patterns.c (vect_match_call_p,
class complex_fma_pattern, vect_slp_reset_pattern,
complex_fma_pattern::matches, complex_fma_pattern::recognize,
complex_fma_pattern::build): New.
|
|
This adds support for complex multiply, and complex multiply and accumulate,
to the vect pattern detector.
Example of instructions matched:
#include <stdio.h>
#include <complex.h>
#define N 200
#define ROT
#define TYPE float
#define TYPE2 float
void g (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
{
  for (int i=0; i < N; i++)
    {
      c[i] = a[i] * (b[i] ROT);
    }
}

void g_f1 (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
{
  for (int i=0; i < N; i++)
    {
      c[i] = conjf (a[i]) * (b[i] ROT);
    }
}

void g_s1 (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
{
  for (int i=0; i < N; i++)
    {
      c[i] = a[i] * conjf (b[i] ROT);
    }
}
gcc/ChangeLog:
* internal-fn.def (COMPLEX_MUL, COMPLEX_MUL_CONJ): New.
* optabs.def (cmul_optab, cmul_conj_optab): New.
* doc/md.texi: Document them.
* tree-vect-slp-patterns.c (vect_match_call_complex_mla,
vect_normalize_conj_loc, is_eq_or_top, vect_validate_multiplication,
vect_build_combine_node, class complex_mul_pattern,
complex_mul_pattern::matches, complex_mul_pattern::recognize,
complex_mul_pattern::build): New.
|
|
|
|
This patch adds support for
* Complex Addition with rotation of 90 and 270.
Addition with rotation of the second argument around the Argand plane.
Supported rotations are 90 and 270:
c = a + (b * I) and c = a + (b * I * I * I)
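An illustrative pair of loops in the style of the other complex-pattern
examples in this series (hypothetical test code, not part of the patch):
#include <complex.h>
#define N 200
/* Rotation by 90: the second operand is multiplied by I.  */
void
add_rot90 (float complex a[restrict N], float complex b[restrict N],
           float complex c[restrict N])
{
  for (int i = 0; i < N; i++)
    c[i] = a[i] + (b[i] * I);
}
/* Rotation by 270: the second operand is multiplied by I three times.  */
void
add_rot270 (float complex a[restrict N], float complex b[restrict N],
            float complex c[restrict N])
{
  for (int i = 0; i < N; i++)
    c[i] = a[i] + (b[i] * I * I * I);
}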
gcc/ChangeLog:
* tree-vect-slp-patterns.c: New file.
* Makefile.in: Add it.
* doc/passes.texi: Document it.
* internal-fn.def (COMPLEX_ADD_ROT90, COMPLEX_ADD_ROT270): New.
* optabs.def (cadd90_optab, cadd270_optab): New.
* doc/md.texi: Document them.
* tree-vect-loop.c (vect_analyze_loop_2): Add dissolve code.
* tree-vect-slp.c:
(vect_free_slp_instance, vect_create_new_slp_node): Export.
(vect_match_slp_patterns_2, vect_match_slp_patterns): New.
(vect_analyze_slp): Use it.
* tree-vectorizer.h (vect_free_slp_tree): Export.
(enum _complex_operation): Forward declare.
(class vect_pattern): New
gcc/testsuite/ChangeLog:
* lib/target-supports.exp
(check_effective_target_arm_v8_3a_complex_neon_ok_nocache): Fix it.
(check_effective_target_vect_complex_add_byte
,check_effective_target_vect_complex_add_int
,check_effective_target_vect_complex_add_short
,check_effective_target_vect_complex_add_long
,check_effective_target_vect_complex_add_half
,check_effective_target_vect_complex_add_float
,check_effective_target_vect_complex_add_double): New.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-byte.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-int.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-long.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-short.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-byte.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-int.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-long.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-short.c: New test.
* gcc.dg/vect/complex/complex-add-pattern-template.c: New test.
* gcc.dg/vect/complex/complex-add-template.c: New test.
* gcc.dg/vect/complex/complex-operations-run.c: New test.
* gcc.dg/vect/complex/complex-operations.c: New test.
* gcc.dg/vect/complex/complex.exp: New test.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-double.c: New test.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c: New test.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-half-float.c: New test.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-double.c: New test.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-float.c: New test.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c: New test.
* gcc.dg/vect/complex/fast-math-complex-add-double.c: New test.
* gcc.dg/vect/complex/fast-math-complex-add-float.c: New test.
* gcc.dg/vect/complex/fast-math-complex-add-half-float.c: New test.
* gcc.dg/vect/complex/fast-math-complex-add-pattern-double.c: New test.
* gcc.dg/vect/complex/fast-math-complex-add-pattern-float.c: New test.
* gcc.dg/vect/complex/fast-math-complex-add-pattern-half-float.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-byte.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-int.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-long.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-short.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-byte.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-int.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-long.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-short.c: New test.
|
|
Add widening add and subtract patterns to tree-vect-patterns. Update the
widening code of patterns that detect PLUS_EXPR to also detect
WIDEN_PLUS_EXPR. These patterns take two vectors with N elements of size
S and perform an add/subtract on the elements, storing the results as N
elements of size 2*S (in two result vectors). This is implemented in the
aarch64 backend as addl/addl2 and subl/subl2 respectively. Add aarch64
tests for the patterns.
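A loop of roughly this shape is what the new recog patterns target
(illustrative sketch; the names are made up):
#include <stdint.h>
#include <stddef.h>
/* 16-bit inputs, 32-bit results; on aarch64 the vectorized form can use
   saddl/saddl2.  */
void
widen_add (int32_t *restrict r, const int16_t *restrict a,
           const int16_t *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    r[i] = (int32_t) a[i] + (int32_t) b[i];
}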
gcc/ChangeLog:
* doc/generic.texi: Document new widen_plus/minus_lo/hi tree codes.
* doc/md.texi: Document new widenening add/subtract hi/lo optabs.
* expr.c (expand_expr_real_2): Add widen_add, widen_subtract cases.
* optabs-tree.c (optab_for_tree_code): Add case for widening optabs.
* optabs.def (OPTAB_D): Define vectorized widen add, subtracts.
* tree-cfg.c (verify_gimple_assign_binary): Add case for widening adds,
subtracts.
* tree-inline.c (estimate_operator_cost): Add case for widening adds,
subtracts.
* tree-vect-generic.c (expand_vector_operations_1): Add case for
widening adds, subtracts
* tree-vect-patterns.c (vect_recog_widen_add_pattern): New recog
pattern.
(vect_recog_widen_sub_pattern): New recog pattern.
(vect_recog_average_pattern): Update widened add code.
* tree-vect-stmts.c (vectorizable_conversion): Add case for widened add,
subtract.
(supportable_widening_operation): Add case for widened add, subtract.
* tree.def
(WIDEN_PLUS_EXPR): New tree code.
(WIDEN_MINUS_EXPR): New tree code.
(VEC_WIDEN_PLUS_HI_EXPR): New tree code.
(VEC_WIDEN_PLUS_LO_EXPR): New tree code.
(VEC_WIDEN_MINUS_HI_EXPR): New tree code.
(VEC_WIDEN_MINUS_LO_EXPR): New tree code.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/vect-widen-add.c: New test.
* gcc.target/aarch64/vect-widen-sub.c: New test.
|
|
This patch is to add the internal function and optabs support for
vector load/store with length.
For the vector load/store with length optabs, the length is measured
in lanes by default. Targets which measure the length in bytes, like
Power, should only define VnQI modes to wrap the other same-size
vector modes. If the length is larger than the total lane/byte count
of the given mode, the behavior is undefined. The remaining
lanes/bytes not covered by the length are taken as undefined values.
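A scalar sketch of the intended semantics, assuming the length is
measured in lanes (illustrative only, not part of the patch):
#include <stddef.h>
#include <stdint.h>
/* Emulates a length-controlled load of 32-bit lanes: lanes [0, len) are
   loaded, lanes [len, nunits) are left as undefined values, and
   len > nunits is undefined behaviour per the description above.  */
void
emulate_len_load (int32_t *dest, const int32_t *src,
                  size_t len, size_t nunits)
{
  for (size_t i = 0; i < len; i++)
    dest[i] = src[i];
  /* dest[len .. nunits-1] is deliberately left untouched.  */
  (void) nunits;
}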
gcc/ChangeLog:
* doc/md.texi (len_load_@var{m}): Document.
(len_store_@var{m}): Likewise.
* internal-fn.c (len_load_direct): New macro.
(len_store_direct): Likewise.
(expand_len_load_optab_fn): Likewise.
(expand_len_store_optab_fn): Likewise.
(direct_len_load_optab_supported_p): Likewise.
(direct_len_store_optab_supported_p): Likewise.
(expand_mask_load_optab_fn): New macro. Original renamed to ...
(expand_partial_load_optab_fn): ... here. Add handlings for
len_load_optab.
(expand_mask_store_optab_fn): New macro. Original renamed to ...
(expand_partial_store_optab_fn): ... here. Add handlings for
len_store_optab.
(internal_load_fn_p): Handle IFN_LEN_LOAD.
(internal_store_fn_p): Handle IFN_LEN_STORE.
(internal_fn_stored_value_index): Handle IFN_LEN_STORE.
* internal-fn.def (LEN_LOAD): New internal function.
(LEN_STORE): Likewise.
* optabs.def (len_load_optab, len_store_optab): New optab.
|
|
From-SVN: r279813
|
|
This patch adds optabs that check whether a read followed by a write
or a write followed by a read can be divided into interleaved byte
accesses without changing the dependencies between the bytes.
This is one of the uses of the SVE2 WHILERW and WHILEWR instructions.
(The instructions can also be used to limit the VF at runtime,
but that's future work.)
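An example of the kind of loop that needs such a runtime check
(illustrative only; not taken from the patch or its tests):
#include <stddef.h>
/* Without restrict, the compiler cannot prove that a[] and b[] do not
   overlap; whether reading b and then writing a can proceed a full
   vector at a time is exactly the read-after-write question the new
   check optabs answer at run time.  */
void
add_one (int *a, int *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    a[i] = b[i] + 1;
}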
2019-11-18 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* doc/sourcebuild.texi (vect_check_ptrs): Document.
* optabs.def (check_raw_ptrs_optab, check_war_ptrs_optab): New optabs.
* doc/md.texi: Document them.
* internal-fn.def (IFN_CHECK_RAW_PTRS, IFN_CHECK_WAR_PTRS): New
internal functions.
* internal-fn.h (internal_check_ptrs_fn_supported_p): Declare.
* internal-fn.c (check_ptrs_direct): New macro.
(expand_check_ptrs_optab_fn): Likewise.
(direct_check_ptrs_optab_supported_p): Likewise.
(internal_check_ptrs_fn_supported_p): New function.
* tree-data-ref.c: Include internal-fn.h.
(create_ifn_alias_checks): New function.
(create_intersect_range_checks): Use it.
* config/aarch64/iterators.md (SVE2_WHILE_PTR): New int iterator.
(optab, cmp_op): Handle it.
(raw_war, unspec): New int attributes.
* config/aarch64/aarch64.md (UNSPEC_WHILERW, UNSPEC_WHILE_WR): New
constants.
* config/aarch64/predicates.md (aarch64_bytes_per_sve_vector_operand):
New predicate.
* config/aarch64/aarch64-sve2.md (check_<raw_war>_ptrs<mode>): New
expander.
(@aarch64_sve2_while<cmp_op><GPI:mode><PRED_ALL:mode>_ptest): New
pattern.
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_vect_check_ptrs):
New procedure.
* gcc.dg/vect/vect-alias-check-14.c: Expect IFN_CHECK_WAR to be
used, if available.
* gcc.dg/vect/vect-alias-check-15.c: Likewise.
* gcc.dg/vect/vect-alias-check-16.c: Likewise IFN_CHECK_RAW.
* gcc.target/aarch64/sve2/whilerw_1.c: New test.
* gcc.target/aarch64/sve2/whilewr_1.c: Likewise.
* gcc.target/aarch64/sve2/whilewr_2.c: Likewise.
From-SVN: r278414
|
|
The gather and scatter optabs required the vector offset to be
the integer equivalent of the vector mode being loaded or stored.
This patch generalises them so that the two vectors can have different
element sizes, although they still need to have the same number of
elements.
One consequence of this is that it's possible (if unlikely)
for two IFN_GATHER_LOADs to have the same arguments but different
return types. E.g. the same scalar base and vector of 32-bit offsets
could be used to load 8-bit elements and to load 16-bit elements.
From just looking at the arguments, we could wrongly deduce that
they're equivalent.
I know we saw this happen at one point with IFN_WHILE_ULT,
and we dealt with it there by passing a zero of the return type
as an extra argument. Doing the same here also makes the load
and store functions have the same argument assignment.
For now this patch should be a no-op, but later SVE patches take
advantage of the new flexibility.
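As an illustration of the new flexibility (hypothetical code, not from
the patch), both loops below gather with a vector of 32-bit offsets but
load elements narrower than the offsets, 8-bit and 16-bit respectively:
#include <stdint.h>
#include <stddef.h>
void
gather_u8 (uint8_t *restrict r, const uint8_t *base,
           const uint32_t *restrict idx, size_t n)
{
  for (size_t i = 0; i < n; i++)
    r[i] = base[idx[i]];
}
void
gather_u16 (uint16_t *restrict r, const uint16_t *base,
            const uint32_t *restrict idx, size_t n)
{
  for (size_t i = 0; i < n; i++)
    r[i] = base[idx[i]];
}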
2019-11-08 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* optabs.def (gather_load_optab, mask_gather_load_optab)
(scatter_store_optab, mask_scatter_store_optab): Turn into
conversion optabs, with the offset mode given explicitly.
* doc/md.texi: Update accordingly.
* config/aarch64/aarch64-sve-builtins-base.cc
(svld1_gather_impl::expand): Likewise.
(svst1_scatter_impl::expand): Likewise.
* internal-fn.c (gather_load_direct, scatter_store_direct): Likewise.
(expand_scatter_store_optab_fn): Likewise.
(direct_gather_load_optab_supported_p): Likewise.
(direct_scatter_store_optab_supported_p): Likewise.
(expand_gather_load_optab_fn): Likewise. Expect the mask argument
to be argument 4.
(internal_fn_mask_index): Return 4 for IFN_MASK_GATHER_LOAD.
(internal_gather_scatter_fn_supported_p): Replace the offset sign
argument with the offset vector type. Require the two vector
types to have the same number of elements but allow their element
sizes to be different. Treat the optabs as conversion optabs.
* internal-fn.h (internal_gather_scatter_fn_supported_p): Update
prototype accordingly.
* optabs-query.c (supports_at_least_one_mode_p): Replace with...
(supports_vec_convert_optab_p): ...this new function.
(supports_vec_gather_load_p): Update accordingly.
(supports_vec_scatter_store_p): Likewise.
* tree-vectorizer.h (vect_gather_scatter_fn_p): Take a vec_info.
Replace the offset sign and bits parameters with a scalar type tree.
* tree-vect-data-refs.c (vect_gather_scatter_fn_p): Likewise.
Pass back the offset vector type instead of the scalar element type.
Allow the offset to be wider than the memory elements. Search for
an offset type that the target supports, stopping once we've
reached the maximum of the element size and pointer size.
Update call to internal_gather_scatter_fn_supported_p.
(vect_check_gather_scatter): Update calls accordingly.
When testing a new scale before knowing the final offset type,
check whether the scale is supported for any signed or unsigned
offset type. Check whether the target supports the source and
target types of a conversion before deciding whether to look
through the conversion. Record the chosen offset_vectype.
* tree-vect-patterns.c (vect_get_gather_scatter_offset_type): Delete.
(vect_recog_gather_scatter_pattern): Get the scalar offset type
directly from the gs_info's offset_vectype instead. Pass a zero
of the result type to IFN_GATHER_LOAD and IFN_MASK_GATHER_LOAD.
* tree-vect-stmts.c (check_load_store_masking): Update call to
internal_gather_scatter_fn_supported_p, passing the offset vector
type recorded in the gs_info.
(vect_truncate_gather_scatter_offset): Update call to
vect_check_gather_scatter, leaving it to search for a valid
offset vector type.
(vect_use_strided_gather_scatters_p): Convert the offset to the
element type of the gs_info's offset_vectype.
(vect_get_gather_scatter_ops): Get the offset vector type directly
from the gs_info.
(vect_get_strided_load_store_ops): Likewise.
(vectorizable_load): Pass a zero of the result type to IFN_GATHER_LOAD
and IFN_MASK_GATHER_LOAD.
* config/aarch64/aarch64-sve.md (gather_load<mode>): Rename to...
(gather_load<mode><v_int_equiv>): ...this.
(mask_gather_load<mode>): Rename to...
(mask_gather_load<mode><v_int_equiv>): ...this.
(scatter_store<mode>): Rename to...
(scatter_store<mode><v_int_equiv>): ...this.
(mask_scatter_store<mode>): Rename to...
(mask_scatter_store<mode><v_int_equiv>): ...this.
From-SVN: r277949
|
|
2019-09-30 Yuliang Wang <yuliang.wang@arm.com>
gcc/
* config/aarch64/aarch64-sve.md (sdiv_pow2<mode>3):
New pattern for ASRD.
* config/aarch64/iterators.md (UNSPEC_ASRD): New unspec.
* internal-fn.def (IFN_DIV_POW2): New internal function.
* optabs.def (sdiv_pow2_optab): New optab.
* tree-vect-patterns.c (vect_recog_divmod_pattern):
Modify pattern to support new operation.
* doc/md.texi (sdiv_pow2@var{m3}): Documentation for the above.
* doc/sourcebuild.texi (vect_sdiv_pow2_si):
Document new target selector.
gcc/testsuite/
* gcc.dg/vect/vect-sdiv-pow2-1.c: New test.
* gcc.target/aarch64/sve/asrdiv_1.c: As above.
* lib/target-supports.exp (check_effective_target_vect_sdiv_pow2_si):
Return true for AArch64 with SVE.
From-SVN: r276343
|
|
2019-09-12 Yuliang Wang <yuliang.wang@arm.com>
gcc/
PR tree-optimization/89386
* config/aarch64/aarch64-sve2.md (<su>mull<bt><Vwide>)
(<r>shrnb<mode>, <r>shrnt<mode>): New SVE2 patterns.
(<su>mulh<r>s<mode>3): New pattern for MULHRS.
* config/aarch64/iterators.md (UNSPEC_SMULLB, UNSPEC_SMULLT)
(UNSPEC_UMULLB, UNSPEC_UMULLT, UNSPEC_SHRNB, UNSPEC_SHRNT)
(UNSPEC_RSHRNB, UNSPEC_RSHRNT, UNSPEC_SMULHS, UNSPEC_SMULHRS)
UNSPEC_UMULHS, UNSPEC_UMULHRS): New unspecs.
(MULLBT, SHRNB, SHRNT, MULHRS): New int iterators.
(su, r): Handle the unspecs above.
(bt): New int attribute.
* internal-fn.def (IFN_MULHS, IFN_MULHRS): New internal functions.
* internal-fn.c (first_commutative_argument): Commutativity info for
above.
* optabs.def (smulhs_optab, smulhrs_optab, umulhs_optab)
(umulhrs_optab): New optabs.
* doc/md.texi (smulhs@var{m3}, umulhs@var{m3})
(smulhrs@var{m3}, umulhrs@var{m3}): Documentation for the above.
* tree-vect-patterns.c (vect_recog_mulhs_pattern): New pattern
function.
(vect_vect_recog_func_ptrs): Add it.
* testsuite/gcc.target/aarch64/sve2/mulhrs_1.c: New test.
* testsuite/gcc.dg/vect/vect-mulhrs-1.c: As above.
* testsuite/gcc.dg/vect/vect-mulhrs-2.c: As above.
* testsuite/gcc.dg/vect/vect-mulhrs-3.c: As above.
* testsuite/gcc.dg/vect/vect-mulhrs-4.c: As above.
* doc/sourcebuild.texi (vect_mulhrs_hi): Document new target selector.
* testsuite/lib/target-supports.exp
(check_effective_target_vect_mulhrs_hi): Return true for AArch64
with SVE2.
From-SVN: r275682
|
|
gcc/ChangeLog:
2019-08-26 Tejas Joshi <tejasjoshi9673@gmail.com>
Uros Bizjak <ubizjak@gmail.com>
* builtins.c (mathfn_built_in_2): Change CASE_MATHFN to
CASE_MATHFN_FLOATN for roundeven.
* config/i386/i386.c (ix86_i387_mode_needed): Add case
I387_ROUNDEVEN.
(ix86_mode_needed): Likewise.
(ix86_mode_after): Likewise.
(ix86_mode_entry): Likewise.
(ix86_mode_exit): Likewise.
(ix86_emit_mode_set): Likewise.
(emit_i387_cw_initialization): Add case I387_CW_ROUNDEVEN.
* config/i386/i386.h (ix86_stack_slot): Add SLOT_CW_ROUNDEVEN.
(ix86_entry): Add I387_ROUNDEVEN.
(avx_u128_state): Add I387_CW_ANY.
* config/i386/i386.md: Define UNSPEC_FRNDINT_ROUNDEVEN.
(define_int_iterator): Likewise.
(define_int_attr): Likewise for rounding_insn, rounding and ROUNDING.
(define_constant): Define ROUND_ROUNDEVEN mode.
(define_attr): Add roundeven mode for i387_cw.
(<rounding_insn><mode>2): Add condition for ROUND_ROUNDEVEN.
* internal-fn.def (ROUNDEVEN): New builtin function.
* optabs.def (roundeven_optab): New optab.
gcc/testsuite/ChangeLog:
2019-08-26 Tejas Joshi <tejasjoshi9673@gmail.com>
* gcc.target/i386/sse4_1-round-roundeven-1.c: New test.
* gcc.target/i386/sse4_1-round-roundeven-2.c: New test.
Co-Authored-By: Uros Bizjak <ubizjak@gmail.com>
From-SVN: r274928
|
|
This patch adds support for IFN_COND shifts left and shifts right.
This is mostly mechanical, but since we try to handle conditional
operations in the same way as unconditional operations in match.pd,
we need to support IFN_COND shifts by scalars as well as vectors.
E.g.:
IFN_COND_SHL (cond, a, { 1, 1, ... }, fallback)
and:
IFN_COND_SHL (cond, a, 1, fallback)
are the same operation, with:
(for shiftrotate (lrotate rrotate lshift rshift)
 ...
 /* Prefer vector1 << scalar to vector1 << vector2
    if vector2 is uniform.  */
 (for vec (VECTOR_CST CONSTRUCTOR)
  (simplify
   (shiftrotate @0 vec@1)
   (with { tree tem = uniform_vector_p (@1); }
    (if (tem)
     (shiftrotate @0 { tem; }))))))
preferring the latter. The patch copes with this by extending
create_convert_operand_from to handle scalar-to-vector conversions.
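A source-level example of the uniform-shift case (illustrative sketch;
not part of the patch or its tests):
#include <stdint.h>
#include <stddef.h>
/* The shift amount is a uniform scalar, so after the match.pd
   simplification the conditional shift carries a scalar operand that
   the expander has to broadcast to a vector.  */
void
cond_shl (int32_t *restrict r, const int32_t *restrict a,
          const int32_t *restrict pred, size_t n)
{
  for (size_t i = 0; i < n; i++)
    r[i] = pred[i] ? (a[i] << 1) : a[i];
}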
2019-08-15 Richard Sandiford <richard.sandiford@arm.com>
Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
gcc/
* internal-fn.def (IFN_COND_SHL, IFN_COND_SHR): New internal functions.
* internal-fn.c (FOR_EACH_CODE_MAPPING): Handle shifts.
* match.pd (UNCOND_BINARY, COND_BINARY): Likewise.
* optabs.def (cond_ashl_optab, cond_ashr_optab, cond_lshr_optab): New
optabs.
* optabs.h (create_convert_operand_from): Expand comment.
* optabs.c (maybe_legitimize_operand): Allow implicit broadcasts
when mapping scalar rtxes to vector operands.
* config/aarch64/iterators.md (SVE_INT_BINARY): Add ashift,
ashiftrt and lshiftrt.
(sve_int_op, sve_int_op_rev, sve_pred_int_rhs2_operand): Handle them.
* config/aarch64/aarch64-sve.md (*cond_<optab><mode>_2_const)
(*cond_<optab><mode>_any_const): New patterns.
gcc/testsuite/
* gcc.target/aarch64/sve/cond_shift_1.c: New test.
* gcc.target/aarch64/sve/cond_shift_1_run.c: Likewise.
* gcc.target/aarch64/sve/cond_shift_2.c: Likewise.
* gcc.target/aarch64/sve/cond_shift_2_run.c: Likewise.
* gcc.target/aarch64/sve/cond_shift_3.c: Likewise.
* gcc.target/aarch64/sve/cond_shift_3_run.c: Likewise.
* gcc.target/aarch64/sve/cond_shift_4.c: Likewise.
* gcc.target/aarch64/sve/cond_shift_4_run.c: Likewise.
* gcc.target/aarch64/sve/cond_shift_5.c: Likewise.
* gcc.target/aarch64/sve/cond_shift_5_run.c: Likewise.
* gcc.target/aarch64/sve/cond_shift_6.c: Likewise.
* gcc.target/aarch64/sve/cond_shift_6_run.c: Likewise.
* gcc.target/aarch64/sve/cond_shift_7.c: Likewise.
* gcc.target/aarch64/sve/cond_shift_7_run.c: Likewise.
* gcc.target/aarch64/sve/cond_shift_8.c: Likewise.
* gcc.target/aarch64/sve/cond_shift_8_run.c: Likewise.
* gcc.target/aarch64/sve/cond_shift_9.c: Likewise.
* gcc.target/aarch64/sve/cond_shift_9_run.c: Likewise.
Co-Authored-By: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
From-SVN: r274505
|
|
2019-07-02 Aaron Sawdey <acsawdey@linux.ibm.com>
* optabs.def (movmem_optab): Add movmem back for memmove().
* doc/md.texi: Add description of movmem pattern for overlapping move.
From-SVN: r272946
|
|
2019-06-27 Aaron Sawdey <acsawdey@linux.ibm.com>
* builtins.c (get_memory_rtx): Fix comment.
* optabs.def (movmem_optab): Change to cpymem_optab.
* expr.c (emit_block_move_via_cpymem): Change movmem to cpymem.
(emit_block_move_hints): Change movmem to cpymem.
* defaults.h: Change movmem to cpymem.
* targhooks.c (get_move_ratio): Change movmem to cpymem.
(default_use_by_pieces_infrastructure_p): Ditto.
* config/aarch64/aarch64-protos.h: Change movmem to cpymem.
* config/aarch64/aarch64.c (aarch64_expand_movmem): Change movmem
to cpymem.
* config/aarch64/aarch64.h: Change movmem to cpymem.
* config/aarch64/aarch64.md (movmemdi): Change name to cpymemdi.
* config/alpha/alpha.h: Change movmem to cpymem in comment.
* config/alpha/alpha.md (movmemqi, movmemdi, *movmemdi_1): Change
movmem to cpymem.
* config/arc/arc-protos.h: Change movmem to cpymem.
* config/arc/arc.c (arc_expand_movmem): Change movmem to cpymem.
* config/arc/arc.h: Change movmem to cpymem in comment.
* config/arc/arc.md (movmemsi): Change movmem to cpymem.
* config/arm/arm-protos.h: Change movmem to cpymem in names.
* config/arm/arm.c (arm_movmemqi_unaligned, arm_gen_movmemqi,
gen_movmem_ldrd_strd, thumb_expand_movmemqi) Change movmem to cpymem.
* config/arm/arm.md (movmemqi): Change movmem to cpymem.
* config/arm/thumb1.md (movmem12b, movmem8b): Change movmem to cpymem.
* config/avr/avr-protos.h: Change movmem to cpymem.
* config/avr/avr.c (avr_adjust_insn_length, avr_emit_movmemhi,
avr_out_movmem): Change movmem to cpymem.
* config/avr/avr.md (movmemhi, movmem_<mode>, movmemx_<mode>):
Change movmem to cpymem.
* config/bfin/bfin-protos.h: Change movmem to cpymem.
* config/bfin/bfin.c (single_move_for_movmem, bfin_expand_movmem):
Change movmem to cpymem.
* config/bfin/bfin.h: Change movmem to cpymem in comment.
* config/bfin/bfin.md (movmemsi): Change name to cpymemsi.
* config/c6x/c6x-protos.h: Change movmem to cpymem.
* config/c6x/c6x.c (c6x_expand_movmem): Change movmem to cpymem.
* config/c6x/c6x.md (movmemsi): Change name to cpymemsi.
* config/frv/frv.md (movmemsi): Change name to cpymemsi.
* config/ft32/ft32.md (movmemsi): Change name to cpymemsi.
* config/h8300/h8300.md (movmemsi): Change name to cpymemsi.
* config/i386/i386-expand.c (expand_set_or_movmem_via_loop,
expand_set_or_movmem_via_rep, expand_movmem_epilogue,
expand_setmem_epilogue_via_loop, expand_set_or_cpymem_prologue,
expand_small_cpymem_or_setmem,
expand_set_or_cpymem_prologue_epilogue_by_misaligned_moves,
expand_set_or_cpymem_constant_prologue,
ix86_expand_set_or_cpymem): Change movmem to cpymem.
* config/i386/i386-protos.h: Change movmem to cpymem.
* config/i386/i386.h: Change movmem to cpymem in comment.
* config/i386/i386.md (movmem<mode>): Change name to cpymem.
(setmem<mode>): Change expansion function name.
* config/lm32/lm32.md (movmemsi): Change name to cpymemsi.
* config/m32c/blkmov.md (movmemhi, movmemhi_bhi_op, movmemhi_bpsi_op,
movmemhi_whi_op, movmemhi_wpsi_op): Change movmem to cpymem.
* config/m32c/m32c-protos.h: Change movmem to cpymem.
* config/m32c/m32c.c (m32c_expand_movmemhi): Change movmem to cpymem.
* config/m32r/m32r.c (m32r_expand_block_move): Change movmem to cpymem.
* config/m32r/m32r.md (movmemsi, movmemsi_internal): Change movmem
to cpymem.
* config/mcore/mcore.md (movmemsi): Change name to cpymemsi.
* config/microblaze/microblaze.c: Change movmem to cpymem in comment.
* config/microblaze/microblaze.md (movmemsi): Change name to cpymemsi.
* config/mips/mips.c (mips_use_by_pieces_infrastructure_p):
Change movmem to cpymem.
* config/mips/mips.h: Change movmem to cpymem.
* config/mips/mips.md (movmemsi): Change name to cpymemsi.
* config/nds32/nds32-memory-manipulation.c
(nds32_expand_movmemsi_loop_unknown_size,
nds32_expand_movmemsi_loop_known_size, nds32_expand_movmemsi_loop,
nds32_expand_movmemsi_unroll,
nds32_expand_movmemsi): Change movmem to cpymem.
* config/nds32/nds32-multiple.md (movmemsi): Change name to cpymemsi.
* config/nds32/nds32-protos.h: Change movmem to cpymem.
* config/pa/pa.c (compute_movmem_length): Change movmem to cpymem.
(pa_adjust_insn_length): Change call to compute_movmem_length.
* config/pa/pa.md (movmemsi, movmemsi_prereload, movmemsi_postreload,
movmemdi, movmemdi_prereload,
movmemdi_postreload): Change movmem to cpymem.
* config/pdp11/pdp11.md (movmemhi, movmemhi1,
movmemhi_nocc, UNSPEC_MOVMEM): Change movmem to cpymem.
* config/riscv/riscv.c: Change movmem to cpymem in comment.
* config/riscv/riscv.h: Change movmem to cpymem.
* config/riscv/riscv.md: (movmemsi) Change name to cpymemsi.
* config/rs6000/rs6000.md: (movmemsi) Change name to cpymemsi.
* config/rx/rx.md: (UNSPEC_MOVMEM, movmemsi, rx_movmem): Change
movmem to cpymem.
* config/s390/s390-protos.h: Change movmem to cpymem.
* config/s390/s390.c (s390_expand_movmem, s390_expand_setmem,
s390_expand_insv): Change movmem to cpymem.
* config/s390/s390.md (movmem<mode>, movmem_short, *movmem_short,
movmem_long, *movmem_long, *movmem_long_31z): Change movmem to cpymem.
* config/sh/sh.md (movmemsi): Change name to cpymemsi.
* config/sparc/sparc.h: Change movmem to cpymem in comment.
* config/vax/vax-protos.h (vax_output_movmemsi): Remove prototype
for nonexistent function.
* config/vax/vax.h: Change movmem to cpymem in comment.
* config/vax/vax.md (movmemhi, movmemhi1): Change movmem to cpymem.
* config/visium/visium.h: Change movmem to cpymem in comment.
* config/visium/visium.md (movmemsi): Change name to cpymemsi.
* config/xtensa/xtensa.md (movmemsi): Change name to cpymemsi.
* doc/md.texi: Change movmem to cpymem and update description to match.
* doc/rtl.texi: Change movmem to cpymem.
* target.def (use_by_pieces_infrastructure_p): Change movmem to cpymem.
* doc/tm.texi: Regenerate.
From-SVN: r272755
|
|
* doc/md.texi: Document vec_shl_<mode> pattern.
* optabs.def (vec_shl_optab): New optab.
* optabs.c (shift_amt_for_vec_perm_mask): Add shift_optab
argument, if == vec_shl_optab, check for left whole vector shift
pattern rather than right shift.
(expand_vec_perm_const): Add vec_shl_optab support.
* optabs-query.c (can_vec_perm_var_p): Mention also vec_shl optab
in the comment.
* tree-vect-generic.c (lower_vec_perm): Support permutations which
can be handled by vec_shl_optab.
* tree-vect-stmts.c (scan_store_can_perm_p): New function.
(check_scan_store): Use it.
(vectorizable_scan_store): If target can't do normal permutations,
try to use whole vector left shifts and if needed a VEC_COND_EXPR
after it.
* config/i386/sse.md (vec_shl_<mode>): New expander.
* gcc.dg/vect/vect-simd-8.c: If main is defined, don't include
tree-vect.h nor call check_vect.
* gcc.dg/vect/vect-simd-9.c: Likewise.
* gcc.dg/vect/vect-simd-10.c: New test.
* gcc.target/i386/sse2-vect-simd-8.c: New test.
* gcc.target/i386/sse2-vect-simd-9.c: New test.
* gcc.target/i386/sse2-vect-simd-10.c: New test.
* gcc.target/i386/avx2-vect-simd-8.c: New test.
* gcc.target/i386/avx2-vect-simd-9.c: New test.
* gcc.target/i386/avx2-vect-simd-10.c: New test.
* gcc.target/i386/avx512f-vect-simd-8.c: New test.
* gcc.target/i386/avx512f-vect-simd-9.c: New test.
* gcc.target/i386/avx512f-vect-simd-10.c: New test.
From-SVN: r272472
|
|
This patch adds support in the vectorizer for masking fold-left reductions.
This avoids the need to insert a conditional assignment with some identity
value.
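An in-order reduction of the kind this applies to (illustrative sketch only):
#include <stddef.h>
/* A fold-left (in-order) sum: under a fully-masked loop the inactive
   lanes can now be masked within the reduction itself instead of first
   being replaced by a 0.0 identity value.  */
double
fold_left_sum (const double *a, size_t n)
{
  double sum = 0.0;
  for (size_t i = 0; i < n; i++)
    sum += a[i];
  return sum;
}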
From-SVN: r272407
|
|
From-SVN: r267494
|
|
PR target/88556
* internal-fn.def (COSH): New.
(SINH): Ditto.
(TANH): Ditto.
* optabs.def (cosh_optab): New.
(sinh_optab): Ditto.
(tanh_optab): Ditto.
* config/i386/i386-protos.h (ix86_emit_i387_sinh): New prototype.
(ix86_emit_i387_cosh): Ditto.
(ix86_emit_i387_tanh): Ditto.
* config/i386/i386.c (ix86_emit_i387_sinh): New function.
(ix86_emit_i387_cosh): Ditto.
(ix86_emit_i387_tanh): Ditto.
* config/i386/i386.md (sinhxf2): New expander.
(sinh<mode>2): Ditto.
(coshxf2): Ditto.
(cosh<mode>2): Ditto.
(tanhxf2): Ditto.
(tanh<mode>2): Ditto.
From-SVN: r267325
|
|
PR target/88513
PR target/88514
* optabs.def (vec_pack_sbool_trunc_optab, vec_unpacks_sbool_hi_optab,
vec_unpacks_sbool_lo_optab): New optabs.
* optabs.c (expand_widen_pattern_expr): Use vec_unpacks_sbool_*_optab
and pass additional argument if both input and target have the same
scalar mode of VECTOR_BOOLEAN_TYPE_P vectors.
* expr.c (expand_expr_real_2) <case VEC_PACK_TRUNC_EXPR>: Handle
VECTOR_BOOLEAN_TYPE_P pack where result has the same scalar mode
as the operands using vec_pack_sbool_trunc_optab.
* tree-vect-stmts.c (supportable_widening_operation): Use
vec_unpacks_sbool_{lo,hi}_optab for VECTOR_BOOLEAN_TYPE_P conversions
where both wider_vectype and vectype have the same scalar mode.
(supportable_narrowing_operation): Similarly use
vec_pack_sbool_trunc_optab if narrow_vectype and vectype have the same
scalar mode.
* config/i386/i386.c (ix86_get_builtin)
<case IX86_BUILTIN_GATHER3ALTDIV8SF>: Check for VECTOR_MODE_P
rather than non-VOIDmode.
* config/i386/sse.md (vec_pack_trunc_qi, vec_pack_trunc_<mode>):
Remove useless ()s around "register_operand", formatting fixes.
(vec_pack_sbool_trunc_qi, vec_unpacks_sbool_lo_qi,
vec_unpacks_sbool_hi_qi): New expanders.
* doc/md.texi (vec_pack_sbool_trunc_M, vec_unpacks_sbool_hi_M,
vec_unpacks_sbool_lo_M): Document.
* gcc.target/i386/avx512f-pr88513-1.c: New test.
* gcc.target/i386/avx512f-pr88513-2.c: New test.
* gcc.target/i386/avx512vl-pr88464-1.c: New test.
* gcc.target/i386/avx512vl-pr88464-2.c: New test.
* gcc.target/i386/avx512vl-pr88464-3.c: New test.
* gcc.target/i386/avx512vl-pr88464-4.c: New test.
* gcc.target/i386/avx512vl-pr88513-1.c: New test.
* gcc.target/i386/avx512vl-pr88513-2.c: New test.
* gcc.target/i386/avx512vl-pr88513-3.c: New test.
* gcc.target/i386/avx512vl-pr88513-4.c: New test.
* gcc.target/i386/avx512vl-pr88514-1.c: New test.
* gcc.target/i386/avx512vl-pr88514-2.c: New test.
* gcc.target/i386/avx512vl-pr88514-3.c: New test.
From-SVN: r267228
|
|
PR target/88502
* internal-fn.def (ACOSH): New.
(ASINH): Ditto.
(ATANH): Ditto.
* optabs.def (acosh_optab): New.
(asinh_optab): Ditto.
(atanh_optab): Ditto.
* config/i386/i386-protos.h (ix86_emit_i387_asinh): New prototype.
(ix86_emit_i387_acosh): Ditto.
(ix86_emit_i387_atanh): Ditto.
* config/i386/i386.c (ix86_emit_i387_asinh): New function.
(ix86_emit_i387_acosh): Ditto.
(ix86_emit_i387_atanh): Ditto.
* config/i386/i386.md (asinhxf2): New expander.
(asinh<mode>2): Ditto.
(acoshxf2): Ditto.
(acosh<mode>2): Ditto.
(atanhxf2): Ditto.
(atanh<mode>2): Ditto.
From-SVN: r267204
|