|
Hi, Richard and Richi.
This is the last autovec pattern I want to add for RVV (length loop control).
This patch is supposed to handle the following case:
int __attribute__ ((noinline, noclone))
condition_reduction (int *a, int min_v, int n)
{
  int last = 66; /* High start value. */
  for (int i = 0; i < n; i++)
    if (a[i] < min_v)
      last = i;
  return last;
}
ARM SVE IR:
...
mask__7.11_39 = vect__4.10_37 < vect_cst__38;
_40 = loop_mask_36 & mask__7.11_39;
last_5 = .FOLD_EXTRACT_LAST (last_15, _40, vect_vec_iv_.7_32);
...
For RVV, we want to see this IR:
...
loop_len = SELECT_VL
mask__7.11_39 = vect__4.10_37 < vect_cst__38;
last_5 = .LEN_FOLD_EXTRACT_LAST (last_15, _40, vect_vec_iv_.7_32, loop_len, bias);
...
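For reference, here is a rough scalar sketch (my reading of the semantics, not code from the patch) of what .LEN_FOLD_EXTRACT_LAST (fallback, mask, vec, len, bias) computes: the value of the last active element among the first len + bias elements, or fallback when none is active.
static int
len_fold_extract_last (int fallback, const _Bool *mask,
                       const int *vec, int len, int bias)
{
  int res = fallback;
  /* Only the first len + bias elements participate; the last
     masked-in one wins.  */
  for (int i = 0; i < len + bias; i++)
    if (mask[i])
      res = vec[i];
  return res;
}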
gcc/ChangeLog:
* doc/md.texi: Add LEN_FOLD_EXTRACT_LAST pattern.
* internal-fn.cc (fold_len_extract_direct): Ditto.
(expand_fold_len_extract_optab_fn): Ditto.
(direct_fold_len_extract_optab_supported_p): Ditto.
* internal-fn.def (LEN_FOLD_EXTRACT_LAST): Ditto.
* optabs.def (OPTAB_D): Ditto.
|
|
Like the support for conditional neg (r12-4470-g20dcda98ed376cb61c74b2c71),
this just adds conditional not too.
Also we should be able to turn `(a ? -1 : 0) ^ b` into a conditional
not.
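As a hedged illustration (mine, not from the patch), a loop of the following shape is the kind of source that produces the (a ? -1 : 0) ^ b form which could become a conditional not once vectorized:
void
f (int *restrict r, int *restrict a, int *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    /* ~b[i] where a[i] is nonzero, b[i] unchanged otherwise.  */
    r[i] = (a[i] ? -1 : 0) ^ b[i];
}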
OK? Bootstrapped and tested on x86_64-linux-gnu and aarch64-linux-gnu.
gcc/ChangeLog:
* internal-fn.def (COND_NOT): New internal function.
* match.pd (UNCOND_UNARY, COND_UNARY): Add bit_not/not
to the lists.
(`vec (a ? -1 : 0) ^ b`): New pattern to convert
into conditional not.
* optabs.def (cond_one_cmpl): New optab.
(cond_len_one_cmpl): Likewise.
gcc/testsuite/ChangeLog:
PR target/110986
* gcc.target/aarch64/sve/cond_unary_9.c: New test.
|
|
This patch adds the vec_mask_len_{load_lanes,store_lanes} autovectorization patterns.
Here we want to support the following autovectorization:
void
foo (int8_t *__restrict a,
     int8_t *__restrict b,
     int8_t *__restrict cond,
     int n)
{
  for (intptr_t i = 0; i < n; ++i)
    {
      if (cond[i])
        a[i] = b[i * 2] + b[i * 2 + 1];
    }
}
ARM SVE IR:
https://godbolt.org/z/cro1Eqc6a
# loop_mask_60 = PHI <next_mask_82(4), max_mask_81(3)>
...
mask__39.12_63 = vect__3.11_61 != { 0, ... };
vec_mask_and_66 = loop_mask_60 & mask__39.12_63;
...
vect_array.15 = .MASK_LOAD_LANES (_57, 8B, vec_mask_and_66);
...
For RVV, we would like to see IR:
loop_len = SELECT_VL;
...
mask__39.12_63 = vect__3.11_61 != { 0, ... };
...
vect_array.15 = .MASK_LEN_LOAD_LANES (_57, 8B, mask__39.12_63, loop_len, bias);
...
Bootstrap and Regression on X86 passed.
Ok for trunk?
gcc/ChangeLog:
* doc/md.texi: Add vec_mask_len_{load_lanes,store_lanes} patterns.
* internal-fn.cc (expand_partial_load_optab_fn): Ditto.
(expand_partial_store_optab_fn): Ditto.
* internal-fn.def (MASK_LEN_LOAD_LANES): Ditto.
(MASK_LEN_STORE_LANES): Ditto.
* optabs.def (OPTAB_CD): Ditto.
|
|
Hi.
Starting with LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE and the COND_LEN_* patterns,
the order of the len and mask operands is {mask, len, bias}.
The reason the "mask" argument comes before "len" is that we want to keep
the "mask" location the same as in the mask_* and cond_* patterns, so we can reuse
the existing code flow of mask_* and cond_*. Otherwise we would need to change much
more code and make it harder to maintain.
Now that we already have COND_LEN_*, it is natural to rename "LEN_MASK" into "MASK_LEN"
to keep the naming scheme consistent.
This patch only changes the name "LEN_MASK" into "MASK_LEN".
No functional change.
gcc/ChangeLog:
* config/riscv/autovec.md (len_maskload<mode><vm>): Change LEN_MASK into MASK_LEN.
(mask_len_load<mode><vm>): Ditto.
(len_maskstore<mode><vm>): Ditto.
(mask_len_store<mode><vm>): Ditto.
(len_mask_gather_load<RATIO64:mode><RATIO64I:mode>): Ditto.
(mask_len_gather_load<RATIO64:mode><RATIO64I:mode>): Ditto.
(len_mask_gather_load<RATIO32:mode><RATIO32I:mode>): Ditto.
(mask_len_gather_load<RATIO32:mode><RATIO32I:mode>): Ditto.
(len_mask_gather_load<RATIO16:mode><RATIO16I:mode>): Ditto.
(mask_len_gather_load<RATIO16:mode><RATIO16I:mode>): Ditto.
(len_mask_gather_load<RATIO8:mode><RATIO8I:mode>): Ditto.
(mask_len_gather_load<RATIO8:mode><RATIO8I:mode>): Ditto.
(len_mask_gather_load<RATIO4:mode><RATIO4I:mode>): Ditto.
(mask_len_gather_load<RATIO4:mode><RATIO4I:mode>): Ditto.
(len_mask_gather_load<RATIO2:mode><RATIO2I:mode>): Ditto.
(mask_len_gather_load<RATIO2:mode><RATIO2I:mode>): Ditto.
(len_mask_gather_load<RATIO1:mode><RATIO1:mode>): Ditto.
(mask_len_gather_load<RATIO1:mode><RATIO1:mode>): Ditto.
(len_mask_scatter_store<RATIO64:mode><RATIO64I:mode>): Ditto.
(mask_len_scatter_store<RATIO64:mode><RATIO64I:mode>): Ditto.
(len_mask_scatter_store<RATIO32:mode><RATIO32I:mode>): Ditto.
(mask_len_scatter_store<RATIO32:mode><RATIO32I:mode>): Ditto.
(len_mask_scatter_store<RATIO16:mode><RATIO16I:mode>): Ditto.
(mask_len_scatter_store<RATIO16:mode><RATIO16I:mode>): Ditto.
(len_mask_scatter_store<RATIO8:mode><RATIO8I:mode>): Ditto.
(mask_len_scatter_store<RATIO8:mode><RATIO8I:mode>): Ditto.
(len_mask_scatter_store<RATIO4:mode><RATIO4I:mode>): Ditto.
(mask_len_scatter_store<RATIO4:mode><RATIO4I:mode>): Ditto.
(len_mask_scatter_store<RATIO2:mode><RATIO2I:mode>): Ditto.
(mask_len_scatter_store<RATIO2:mode><RATIO2I:mode>): Ditto.
(len_mask_scatter_store<RATIO1:mode><RATIO1:mode>): Ditto.
(mask_len_scatter_store<RATIO1:mode><RATIO1:mode>): Ditto.
* doc/md.texi: Ditto.
* genopinit.cc (main): Ditto.
(CMP_NAME): Ditto.
* gimple-fold.cc (arith_overflowed_p): Ditto.
(gimple_fold_partial_load_store_mem_ref): Ditto.
(gimple_fold_call): Ditto.
* internal-fn.cc (len_maskload_direct): Ditto.
(mask_len_load_direct): Ditto.
(len_maskstore_direct): Ditto.
(mask_len_store_direct): Ditto.
(expand_call_mem_ref): Ditto.
(expand_len_maskload_optab_fn): Ditto.
(expand_mask_len_load_optab_fn): Ditto.
(expand_len_maskstore_optab_fn): Ditto.
(expand_mask_len_store_optab_fn): Ditto.
(direct_len_maskload_optab_supported_p): Ditto.
(direct_mask_len_load_optab_supported_p): Ditto.
(direct_len_maskstore_optab_supported_p): Ditto.
(direct_mask_len_store_optab_supported_p): Ditto.
(internal_load_fn_p): Ditto.
(internal_store_fn_p): Ditto.
(internal_gather_scatter_fn_p): Ditto.
(internal_fn_len_index): Ditto.
(internal_fn_mask_index): Ditto.
(internal_fn_stored_value_index): Ditto.
(internal_len_load_store_bias): Ditto.
* internal-fn.def (LEN_MASK_GATHER_LOAD): Ditto.
(MASK_LEN_GATHER_LOAD): Ditto.
(LEN_MASK_LOAD): Ditto.
(MASK_LEN_LOAD): Ditto.
(LEN_MASK_SCATTER_STORE): Ditto.
(MASK_LEN_SCATTER_STORE): Ditto.
(LEN_MASK_STORE): Ditto.
(MASK_LEN_STORE): Ditto.
* optabs-query.cc (supports_vec_gather_load_p): Ditto.
(supports_vec_scatter_store_p): Ditto.
* optabs-tree.cc (target_supports_mask_load_store_p): Ditto.
(target_supports_len_load_store_p): Ditto.
* optabs.def (OPTAB_CD): Ditto.
* tree-ssa-alias.cc (ref_maybe_used_by_call_p_1): Ditto.
(call_may_clobber_ref_p_1): Ditto.
* tree-ssa-dse.cc (initialize_ao_ref_for_dse): Ditto.
(dse_optimize_stmt): Ditto.
* tree-ssa-loop-ivopts.cc (get_mem_type_for_internal_fn): Ditto.
(get_alias_ptr_type_for_ptr_address): Ditto.
* tree-vect-data-refs.cc (vect_gather_scatter_fn_p): Ditto.
* tree-vect-patterns.cc (vect_recog_gather_scatter_pattern): Ditto.
* tree-vect-stmts.cc (check_load_store_for_partial_vectors): Ditto.
(vect_get_strided_load_store_ops): Ditto.
(vectorizable_store): Ditto.
(vectorizable_load): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/gimple_fold-1.c: Ditto.
|
|
Hi, Richard and Richi.
This patch adds the mask_len_fold_left_plus pattern to support in-order floating-point
reduction for targets that support length-based loop control.
Consider the following case:
double
foo2 (double *__restrict a,
      double init,
      int *__restrict cond,
      int n)
{
  for (int i = 0; i < n; i++)
    if (cond[i])
      init += a[i];
  return init;
}
ARM SVE:
...
vec_mask_and_60 = loop_mask_54 & mask__23.33_57;
vect__ifc__35.37_64 = .VCOND_MASK (vec_mask_and_60, vect__8.36_61, { 0.0, ... });
_36 = .MASK_FOLD_LEFT_PLUS (init_20, vect__ifc__35.37_64, loop_mask_54);
...
For RVV, we want to see:
...
_36 = .MASK_LEN_FOLD_LEFT_PLUS (init_20, vect__ifc__35.37_64, control_mask, loop_len, bias);
...
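As a rough scalar sketch (my reading, not from the patch), .MASK_LEN_FOLD_LEFT_PLUS (init, vec, mask, len, bias) performs an in-order sum of the masked-in elements among the first len + bias elements:
static double
mask_len_fold_left_plus (double init, const double *vec,
                         const _Bool *mask, int len, int bias)
{
  /* Strictly in-order accumulation of the active elements.  */
  for (int i = 0; i < len + bias; i++)
    if (mask[i])
      init += vec[i];
  return init;
}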
gcc/ChangeLog:
* doc/md.texi: Add mask_len_fold_left_plus.
* internal-fn.cc (mask_len_fold_left_direct): Ditto.
(expand_mask_len_fold_left_optab_fn): Ditto.
(direct_mask_len_fold_left_optab_supported_p): Ditto.
* internal-fn.def (MASK_LEN_FOLD_LEFT_PLUS): Ditto.
* optabs.def (OPTAB_D): Ditto.
|
|
Hi, Richard and Richi.
This patch adds cond_len_* operation patterns for targets that support loop control with length.
These patterns will be used in the following cases:
1. Integer division:
void
f (int32_t *restrict a, int32_t *restrict b, int32_t *restrict c, int n)
{
  for (int i = 0; i < n; ++i)
    {
      a[i] = b[i] / c[i];
    }
}
ARM SVE IR:
...
max_mask_36 = .WHILE_ULT (0, bnd.5_32, { 0, ... });
Loop:
...
# loop_mask_29 = PHI <next_mask_37(4), max_mask_36(3)>
...
vect__4.8_28 = .MASK_LOAD (_33, 32B, loop_mask_29);
...
vect__6.11_25 = .MASK_LOAD (_20, 32B, loop_mask_29);
vect__8.12_24 = .COND_DIV (loop_mask_29, vect__4.8_28, vect__6.11_25, vect__4.8_28);
...
.MASK_STORE (_1, 32B, loop_mask_29, vect__8.12_24);
...
next_mask_37 = .WHILE_ULT (_2, bnd.5_32, { 0, ... });
...
For targets like RVV that support loop control with length, we want to see IR as follows:
Loop:
...
# loop_len_29 = SELECT_VL
...
vect__4.8_28 = .LEN_MASK_LOAD (_33, 32B, loop_len_29);
...
vect__6.11_25 = .LEN_MASK_LOAD (_20, 32B, loop_len_29);
vect__8.12_24 = .COND_LEN_DIV (dummy_mask, vect__4.8_28, vect__6.11_25, vect__4.8_28, loop_len_29, bias);
...
.LEN_MASK_STORE (_1, 32B, loop_len_29, vect__8.12_24);
...
next_mask_37 = .WHILE_ULT (_2, bnd.5_32, { 0, ... });
...
Notice that here we use dummy_mask = { -1, -1, ..., -1 }.
2. Integer conditional division:
Similar to case (1) but with a condition:
void
f (int32_t *restrict a, int32_t *restrict b, int32_t *restrict c, int32_t * cond, int n)
{
  for (int i = 0; i < n; ++i)
    {
      if (cond[i])
        a[i] = b[i] / c[i];
    }
}
ARM SVE:
...
max_mask_76 = .WHILE_ULT (0, bnd.6_52, { 0, ... });
Loop:
...
# loop_mask_55 = PHI <next_mask_77(5), max_mask_76(4)>
...
vect__4.9_56 = .MASK_LOAD (_51, 32B, loop_mask_55);
mask__29.10_58 = vect__4.9_56 != { 0, ... };
vec_mask_and_61 = loop_mask_55 & mask__29.10_58;
...
vect__6.13_62 = .MASK_LOAD (_24, 32B, vec_mask_and_61);
...
vect__8.16_66 = .MASK_LOAD (_1, 32B, vec_mask_and_61);
vect__10.17_68 = .COND_DIV (vec_mask_and_61, vect__6.13_62, vect__8.16_66, vect__6.13_62);
...
.MASK_STORE (_2, 32B, vec_mask_and_61, vect__10.17_68);
...
next_mask_77 = .WHILE_ULT (_3, bnd.6_52, { 0, ... });
Here, ARM SVE uses vec_mask_and_61 = loop_mask_55 & mask__29.10_58; to guarantee the correct result.
However, targets with length control cannot perform this elegant flow. For RVV, we would expect:
Loop:
...
loop_len_55 = SELECT_VL
...
mask__29.10_58 = vect__4.9_56 != { 0, ... };
...
vect__10.17_68 = .COND_LEN_DIV (mask__29.10_58, vect__6.13_62, vect__8.16_66, vect__6.13_62, loop_len_55, bias);
...
Here we expect COND_LEN_DIV to be predicated by a real mask, which is the outcome of the comparison mask__29.10_58 = vect__4.9_56 != { 0, ... };,
and by a real length, which is produced by the loop control: loop_len_55 = SELECT_VL.
3. Conditional floating-point operations (without -ffast-math):
void
f (float *restrict a, float *restrict b, int32_t *restrict cond, int n)
{
  for (int i = 0; i < n; ++i)
    {
      if (cond[i])
        a[i] = b[i] + a[i];
    }
}
ARM SVE IR:
max_mask_70 = .WHILE_ULT (0, bnd.6_46, { 0, ... });
...
# loop_mask_49 = PHI <next_mask_71(4), max_mask_70(3)>
...
mask__27.10_52 = vect__4.9_50 != { 0, ... };
vec_mask_and_55 = loop_mask_49 & mask__27.10_52;
...
vect__9.17_62 = .COND_ADD (vec_mask_and_55, vect__6.13_56, vect__8.16_60, vect__6.13_56);
...
next_mask_71 = .WHILE_ULT (_22, bnd.6_46, { 0, ... });
...
For RVV, we would expect IR:
...
loop_len_49 = SELECT_VL
...
mask__27.10_52 = vect__4.9_50 != { 0, ... };
...
vect__9.17_62 = .COND_LEN_ADD (mask__27.10_52, vect__6.13_56, vect__8.16_60, vect__6.13_56, loop_len_49, bias);
...
4. Conditional unordered reduction:
int32_t
f (int32_t *restrict a,
   int32_t *restrict cond, int n)
{
  int32_t result = 0;
  for (int i = 0; i < n; ++i)
    {
      if (cond[i])
        result += a[i];
    }
  return result;
}
ARM SVE IR:
Loop:
# vect_result_18.7_37 = PHI <vect__33.16_51(4), { 0, ... }(3)>
...
# loop_mask_40 = PHI <next_mask_58(4), max_mask_57(3)>
...
mask__17.11_43 = vect__4.10_41 != { 0, ... };
vec_mask_and_46 = loop_mask_40 & mask__17.11_43;
...
vect__33.16_51 = .COND_ADD (vec_mask_and_46, vect_result_18.7_37, vect__7.14_47, vect_result_18.7_37);
...
next_mask_58 = .WHILE_ULT (_15, bnd.6_36, { 0, ... });
...
Epilogue:
_53 = .REDUC_PLUS (vect__33.16_51); [tail call]
For RVV, we expect:
Loop:
# vect_result_18.7_37 = PHI <vect__33.16_51(4), { 0, ... }(3)>
...
loop_len_40 = SELECT_VL
...
mask__17.11_43 = vect__4.10_41 != { 0, ... };
...
vect__33.16_51 = .COND_LEN_ADD (mask__17.11_43, vect_result_18.7_37, vect__7.14_47, vect_result_18.7_37, loop_len_40, bias);
...
next_mask_58 = .WHILE_ULT (_15, bnd.6_36, { 0, ... });
...
Epilogue:
_53 = .REDUC_PLUS (vect__33.16_51); [tail call]
I name these patterns "cond_len_*" since I want the length operand to come after the mask operand, and all other operands except the length operand
to be in the same order as in the "cond_*" patterns. Such an order will make life easier in the following loop vectorizer support.
gcc/ChangeLog:
* doc/md.texi: Add COND_LEN_* operations for loop control with length.
* internal-fn.cc (cond_len_unary_direct): Ditto.
(cond_len_binary_direct): Ditto.
(cond_len_ternary_direct): Ditto.
(expand_cond_len_unary_optab_fn): Ditto.
(expand_cond_len_binary_optab_fn): Ditto.
(expand_cond_len_ternary_optab_fn): Ditto.
(direct_cond_len_unary_optab_supported_p): Ditto.
(direct_cond_len_binary_optab_supported_p): Ditto.
(direct_cond_len_ternary_optab_supported_p): Ditto.
* internal-fn.def (COND_LEN_ADD): Ditto.
(COND_LEN_SUB): Ditto.
(COND_LEN_MUL): Ditto.
(COND_LEN_DIV): Ditto.
(COND_LEN_MOD): Ditto.
(COND_LEN_RDIV): Ditto.
(COND_LEN_MIN): Ditto.
(COND_LEN_MAX): Ditto.
(COND_LEN_FMIN): Ditto.
(COND_LEN_FMAX): Ditto.
(COND_LEN_AND): Ditto.
(COND_LEN_IOR): Ditto.
(COND_LEN_XOR): Ditto.
(COND_LEN_SHL): Ditto.
(COND_LEN_SHR): Ditto.
(COND_LEN_FMA): Ditto.
(COND_LEN_FMS): Ditto.
(COND_LEN_FNMA): Ditto.
(COND_LEN_FNMS): Ditto.
(COND_LEN_NEG): Ditto.
* optabs.def (OPTAB_D): Ditto.
|
|
Hi, Richi and Richard.
Based on the review comments from Richard:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623405.html
I change the len_mask_gather_load/len_mask_scatter_store operand order into:
{len,bias,mask}
We adjust adding the len and mask arguments using add_len_and_mask_args,
which is the same as for partial_load/partial_store.
Now the code becomes more reasonable and easier to maintain.
This patch adds LEN_MASK_{GATHER_LOAD,SCATTER_STORE} to allow targets
to handle flow control by mask and loop control by length on gather/scatter memory
operations. Consider the following case:
void
f (uint8_t *restrict a,
   uint8_t *restrict b, int n,
   int base, int step,
   int *restrict cond)
{
  for (int i = 0; i < n; ++i)
    {
      if (cond[i])
        a[i * step + base] = b[i * step + base];
    }
}
We hope RVV can vectorize such a case into the following IR:
loop_len = SELECT_VL
control_mask = comparison
v = LEN_MASK_GATHER_LOAD (.., loop_len, bias, control_mask)
LEN_MASK_SCATTER_STORE (... v, ..., loop_len, bias, control_mask)
This patch doesn't apply such patterns in the vectorizer; it just adds the patterns
and updates the documentation.
I will send the patch which applies such patterns in the vectorizer soon after this
patch is approved.
Ok for trunk?
gcc/ChangeLog:
* doc/md.texi: Add len_mask_gather_load/len_mask_scatter_store.
* internal-fn.cc (expand_scatter_store_optab_fn): Ditto.
(expand_gather_load_optab_fn): Ditto.
(internal_load_fn_p): Ditto.
(internal_store_fn_p): Ditto.
(internal_gather_scatter_fn_p): Ditto.
(internal_fn_len_index): Ditto.
(internal_fn_mask_index): Ditto.
(internal_fn_stored_value_index): Ditto.
* internal-fn.def (LEN_MASK_GATHER_LOAD): Ditto.
(LEN_MASK_SCATTER_STORE): Ditto.
* optabs.def (OPTAB_CD): Ditto.
|
|
This updates vect_recog_abd_pattern to recognize the widening
variant of absolute difference (ABDL, ABDL2).
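A hedged example (mine, not one of the committed testcases) of the widening shape this targets: the absolute difference of narrow inputs is produced in a wider type.
void
f (unsigned char *restrict a, unsigned char *restrict b,
   unsigned short *restrict out, int n)
{
  for (int i = 0; i < n; i++)
    {
      int diff = a[i] - b[i];
      out[i] = diff < 0 ? -diff : diff;  /* abs of the difference, stored widened.  */
    }
}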
gcc/ChangeLog:
* internal-fn.def (VEC_WIDEN_ABD): New internal hilo optab.
* optabs.def (vec_widen_sabd_optab,
vec_widen_sabd_hi_optab, vec_widen_sabd_lo_optab,
vec_widen_sabd_odd_even, vec_widen_sabd_even_optab,
vec_widen_uabd_optab,
vec_widen_uabd_hi_optab, vec_widen_uabd_lo_optab,
vec_widen_uabd_odd_even, vec_widen_uabd_even_optab):
New optabs.
* doc/md.texi: Document them.
* tree-vect-patterns.cc (vect_recog_abd_pattern): Update to
to build a VEC_WIDEN_ABD call if the input precision is smaller
than the precision of the output.
(vect_recog_widen_abd_pattern): Should an ABD expression be
found preceding an extension, replace the two with a
VEC_WIDEN_ABD.
|
|
This patch adds LEN_MASK_{LOAD,STORE} to support flow control for targets
like RISC-V that use length in loop control.
Normalize load/store into LEN_MASK_{LOAD,STORE} as long as either length
or mask is valid. Length is the outcome of SELECT_VL or MIN_EXPR.
Mask is the outcome of a comparison.
The LEN_MASK_{LOAD,STORE} format is defined as follows:
1). LEN_MASK_LOAD (ptr, align, length, mask).
2). LEN_MASK_STORE (ptr, align, length, mask, vec).
Consider the following 4 cases:
VLA: Variable-length auto-vectorization
VLS: Specific-length auto-vectorization
Case 1 (VLS), -mrvv-vector-bits=128 (does not use LEN_MASK_*):
  Code:
    for (int i = 0; i < 4; i++)
      a[i] = b[i] + c[i];
  IR:
    v1 = MEM (...)
    v2 = MEM (...)
    v3 = v1 + v2
    MEM[...] = v3
Case 2 (VLS), -mrvv-vector-bits=128 (LEN_MASK_* with length = VF, mask = comparison):
  Code:
    for (int i = 0; i < 4; i++)
      if (cond[i])
        a[i] = b[i] + c[i];
  IR:
    mask = comparison
    v1 = LEN_MASK_LOAD (length = VF, mask)
    v2 = LEN_MASK_LOAD (length = VF, mask)
    v3 = v1 + v2
    LEN_MASK_STORE (length = VF, mask, v3)
Case 3 (VLA):
  Code:
    for (int i = 0; i < n; i++)
      a[i] = b[i] + c[i];
  IR:
    loop_len = SELECT_VL or MIN
    v1 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...})
    v2 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...})
    v3 = v1 + v2
    LEN_MASK_STORE (length = loop_len, mask = {-1,-1,...}, v3)
Case 4 (VLA):
  Code:
    for (int i = 0; i < n; i++)
      if (cond[i])
        a[i] = b[i] + c[i];
  IR:
    loop_len = SELECT_VL or MIN
    mask = comparison
    v1 = LEN_MASK_LOAD (length = loop_len, mask)
    v2 = LEN_MASK_LOAD (length = loop_len, mask)
    v3 = v1 + v2
    LEN_MASK_STORE (length = loop_len, mask, v3)
Co-authored-by: Robin Dapp <rdapp.gcc@gmail.com>
gcc/ChangeLog:
* doc/md.texi: Add len_mask{load,store}.
* genopinit.cc (main): Ditto.
(CMP_NAME): Ditto.
* internal-fn.cc (len_maskload_direct): Ditto.
(len_maskstore_direct): Ditto.
(expand_call_mem_ref): Ditto.
(expand_partial_load_optab_fn): Ditto.
(expand_len_maskload_optab_fn): Ditto.
(expand_partial_store_optab_fn): Ditto.
(expand_len_maskstore_optab_fn): Ditto.
(direct_len_maskload_optab_supported_p): Ditto.
(direct_len_maskstore_optab_supported_p): Ditto.
* internal-fn.def (LEN_MASK_LOAD): Ditto.
(LEN_MASK_STORE): Ditto.
* optabs.def (OPTAB_CD): Ditto.
|
|
The following patch introduces {add,sub}c5_optab and pattern recognizes
various forms of add with carry and subtract with carry/borrow, see
pr79173-{1,2,3,4,5,6}.c tests on what is matched.
Primarily forms with 2 __builtin_add_overflow or __builtin_sub_overflow
calls per limb (with just one for the least significant one), for
add with carry even when it is hand written in C (for subtraction
reassoc seems to change it too much so that the pattern recognition
doesn't work). __builtin_{add,sub}_overflow are standardized in C23
under the ckd_{add,sub} names, so they are no longer a GNU-only extension.
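A hedged sketch (mine; the committed testcases are gcc.target/i386/pr79173-*.c) of the two-__builtin_add_overflow-per-limb shape that gets pattern recognized:
void
add2limb (unsigned long *restrict r, const unsigned long *restrict a,
          const unsigned long *restrict b)
{
  unsigned long t;
  /* Least significant limb: a single overflow check.  */
  unsigned long c0 = __builtin_add_overflow (a[0], b[0], &r[0]);
  /* Next limb: two checks, one for the addition and one for the carry-in.  */
  unsigned long c1 = __builtin_add_overflow (a[1], b[1], &t);
  unsigned long c2 = __builtin_add_overflow (t, c0, &r[1]);
  (void) (c1 | c2);  /* carry out of the top limb, if it were needed.  */
}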
Note, clang has for these (IMHO badly designed)
__builtin_{add,sub}c{b,s,,l,ll} builtins which don't add/subtract just
a single bit of carry, but basically add 3 unsigned values or
subtract 2 unsigned values from one, and result in carry out of 0, 1, or 2
because of that. If we wanted to introduce those for clang compatibility,
we could and lower them early to just two __builtin_{add,sub}_overflow
calls and let the pattern matching in this patch recognize it later.
I've added expanders for this on ix86 and in addition to that
added various peephole2s (in preparation patches for this patch) to make
sure we get nice (and small) code for the common cases. I think there are
other PRs which request that e.g. for the _{addcarry,subborrow}_u{32,64}
intrinsics, which the patch also improves.
Would be nice if support for these optabs was added to many other targets,
arm/aarch64 and powerpc* certainly have such instructions, I'd expect
in fact that most targets do.
The _BitInt support I'm working on will also need this to emit reasonable
code.
2023-06-15 Jakub Jelinek <jakub@redhat.com>
PR middle-end/79173
* internal-fn.def (UADDC, USUBC): New internal functions.
* internal-fn.cc (expand_UADDC, expand_USUBC): New functions.
(commutative_ternary_fn_p): Return true also for IFN_UADDC.
* optabs.def (uaddc5_optab, usubc5_optab): New optabs.
* tree-ssa-math-opts.cc (uaddc_cast, uaddc_ne0, uaddc_is_cplxpart,
match_uaddc_usubc): New functions.
(math_opts_dom_walker::after_dom_children): Call match_uaddc_usubc
for PLUS_EXPR, MINUS_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR unless
other optimizations have been successful for those.
* gimple-fold.cc (gimple_fold_call): Handle IFN_UADDC and IFN_USUBC.
* fold-const-call.cc (fold_const_call): Likewise.
* gimple-range-fold.cc (adjust_imagpart_expr): Likewise.
* tree-ssa-dce.cc (eliminate_unnecessary_stmts): Likewise.
* doc/md.texi (uaddc<mode>5, usubc<mode>5): Document new named
patterns.
* config/i386/i386.md (uaddc<mode>5, usubc<mode>5): New
define_expand patterns.
(*setcc_qi_addqi3_cconly_overflow_1_<mode>, *setccc): Split
into NOTE_INSN_DELETED note rather than nop instruction.
(*setcc_qi_negqi_ccc_1_<mode>, *setcc_qi_negqi_ccc_2_<mode>):
Likewise.
* gcc.target/i386/pr79173-1.c: New test.
* gcc.target/i386/pr79173-2.c: New test.
* gcc.target/i386/pr79173-3.c: New test.
* gcc.target/i386/pr79173-4.c: New test.
* gcc.target/i386/pr79173-5.c: New test.
* gcc.target/i386/pr79173-6.c: New test.
* gcc.target/i386/pr79173-7.c: New test.
* gcc.target/i386/pr79173-8.c: New test.
* gcc.target/i386/pr79173-9.c: New test.
* gcc.target/i386/pr79173-10.c: New test.
|
|
This adds a recognition pattern for the non-widening
absolute difference (ABD).
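A hedged example (mine, not from the patch) of the abs (a - b) idiom with same-width operands and result that the ABD pattern is meant to recognize:
void
f (unsigned char *restrict a, unsigned char *restrict b,
   unsigned char *restrict out, int n)
{
  for (int i = 0; i < n; i++)
    /* |a[i] - b[i]| computed without widening the result.  */
    out[i] = a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
}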
gcc/ChangeLog:
* doc/md.texi (sabd, uabd): Document them.
* internal-fn.def (ABD): Use new optab.
* optabs.def (sabd_optab, uabd_optab): New optabs.
* tree-vect-patterns.cc (vect_recog_absolute_difference):
Recognize the idiom abs (a - b).
(vect_recog_sad_pattern): Refactor to use
vect_recog_absolute_difference.
(vect_recog_abd_pattern): Use patterns found by
vect_recog_absolute_difference to build a new ABD
internal call.
|
|
This patch addresses comments from Richard && Richi and rebases to trunk.
This patch adds SELECT_VL middle-end support
to allow targets to have target-dependent optimizations in
the length calculation.
This patch is inspired by RVV ISA and LLVM:
https://reviews.llvm.org/D99750
SELECT_VL has the same behavior as LLVM's "get_vector_length" with
the following properties:
1. Only applies to a single rgroup.
2. Not used with SLP.
3. Adjusts the loop control IV.
4. Adjusts the data reference IVs.
5. Allows processing a non-VF number of elements in a non-final iteration.
Code
# void vvaddint32(size_t n, const int*x, const int*y, int*z)
# { for (size_t i=0; i<n; i++) { z[i]=x[i]+y[i]; } }
Take RVV codegen for example:
Before this patch:
vvaddint32:
ble a0,zero,.L6
csrr a4,vlenb
srli a6,a4,2
.L4:
mv a5,a0
bleu a0,a6,.L3
mv a5,a6
.L3:
vsetvli zero,a5,e32,m1,ta,ma
vle32.v v2,0(a1)
vle32.v v1,0(a2)
vsetvli a7,zero,e32,m1,ta,ma
sub a0,a0,a5
vadd.vv v1,v1,v2
vsetvli zero,a5,e32,m1,ta,ma
vse32.v v1,0(a3)
add a2,a2,a4
add a3,a3,a4
add a1,a1,a4
bne a0,zero,.L4
.L6:
ret
After this patch:
vvaddint32:
vsetvli t0, a0, e32, ta, ma # Set vector length based on 32-bit vectors
vle32.v v0, (a1) # Get first vector
sub a0, a0, t0 # Decrement number done
slli t0, t0, 2 # Multiply number done by 4 bytes
add a1, a1, t0 # Bump pointer
vle32.v v1, (a2) # Get second vector
add a2, a2, t0 # Bump pointer
vadd.vv v2, v0, v1 # Sum vectors
vse32.v v2, (a3) # Store result
add a3, a3, t0 # Bump pointer
bnez a0, vvaddint32 # Loop back
ret # Finished
Co-authored-by: Richard Sandiford<richard.sandiford@arm.com>
Co-authored-by: Richard Biener <rguenther@suse.de>
gcc/ChangeLog:
* doc/md.texi: Add SELECT_VL support.
* internal-fn.def (SELECT_VL): Ditto.
* optabs.def (OPTAB_D): Ditto.
* tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Ditto.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Ditto.
* tree-vect-stmts.cc (get_select_vl_data_ref_ptr): Ditto.
(vectorizable_store): Ditto.
(vectorizable_load): Ditto.
* tree-vectorizer.h (LOOP_VINFO_USING_SELECT_VL_P): Ditto.
|
|
This patch removes the old widen plus/minus tree codes which have been
replaced by internal functions.
2023-06-05 Andre Vieira <andre.simoesdiasvieira@arm.com>
Joel Hutton <joel.hutton@arm.com>
gcc/ChangeLog:
* doc/generic.texi: Remove old tree codes.
* expr.cc (expand_expr_real_2): Remove old tree code cases.
* gimple-pretty-print.cc (dump_binary_rhs): Likewise.
* optabs-tree.cc (optab_for_tree_code): Likewise.
(supportable_half_widening_operation): Likewise.
* tree-cfg.cc (verify_gimple_assign_binary): Likewise.
* tree-inline.cc (estimate_operator_cost): Likewise.
(op_symbol_code): Likewise.
* tree-vect-data-refs.cc (vect_get_smallest_scalar_type): Likewise.
(vect_analyze_data_ref_accesses): Likewise.
* tree-vect-generic.cc (expand_vector_operations_1): Likewise.
* cfgexpand.cc (expand_debug_expr): Likewise.
* tree-vect-stmts.cc (vectorizable_conversion): Likewise.
(supportable_widening_operation): Likewise.
* gimple-range-op.cc (gimple_range_op_handler::maybe_non_standard):
Likewise.
* optabs.def (vec_widen_ssubl_hi_optab, vec_widen_ssubl_lo_optab,
vec_widen_saddl_hi_optab, vec_widen_saddl_lo_optab,
vec_widen_usubl_hi_optab, vec_widen_usubl_lo_optab,
vec_widen_uaddl_hi_optab, vec_widen_uaddl_lo_optab): Remove optabs.
* tree-pretty-print.cc (dump_generic_node): Remove tree code definition.
* tree.def (WIDEN_PLUS_EXPR, WIDEN_MINUS_EXPR, VEC_WIDEN_PLUS_HI_EXPR,
VEC_WIDEN_PLUS_LO_EXPR, VEC_WIDEN_MINUS_HI_EXPR,
VEC_WIDEN_MINUS_LO_EXPR): Likewise.
|
|
DEF_INTERNAL_WIDENING_OPTAB_FN and DEF_INTERNAL_NARROWING_OPTAB_FN
are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN
respectively, with the exception that they provide convenience wrappers
for a single vector-to-vector conversion, a hi/lo split or an even/odd
split. Each definition for <NAME> will require either signed optabs
named <UOPTAB> and <SOPTAB> (for widening) or a single <OPTAB> (for
narrowing) for each of the five functions it creates.
For example, for widening addition the
DEF_INTERNAL_WIDENING_OPTAB_FN will create five internal functions:
IFN_VEC_WIDEN_PLUS, IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO,
IFN_VEC_WIDEN_PLUS_EVEN and IFN_VEC_WIDEN_PLUS_ODD. Each requiring two
optabs, one for signed and one for unsigned.
Aarch64 implements the hi/lo split optabs:
IFN_VEC_WIDEN_PLUS_HI -> vec_widen_<su>add_hi_<mode> -> (u/s)addl2
IFN_VEC_WIDEN_PLUS_LO -> vec_widen_<su>add_lo_<mode> -> (u/s)addl
This gives the same functionality as the previous
WIDEN_PLUS/WIDEN_MINUS tree codes which are expanded into
VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.
2023-06-05 Andre Vieira <andre.simoesdiasvieira@arm.com>
Joel Hutton <joel.hutton@arm.com>
Tamar Christina <tamar.christina@arm.com>
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (vec_widen_<su>addl_lo_<mode>): Rename
this ...
(vec_widen_<su>add_lo_<mode>): ... to this.
(vec_widen_<su>addl_hi_<mode>): Rename this ...
(vec_widen_<su>add_hi_<mode>): ... to this.
(vec_widen_<su>subl_lo_<mode>): Rename this ...
(vec_widen_<su>sub_lo_<mode>): ... to this.
(vec_widen_<su>subl_hi_<mode>): Rename this ...
(vec_widen_<su>sub_hi_<mode>): ...to this.
* doc/generic.texi: Document new IFN codes.
* internal-fn.cc (lookup_hilo_internal_fn): Add lookup function.
(commutative_binary_fn_p): Add widen_plus fn's.
(widening_fn_p): New function.
(narrowing_fn_p): New function.
(direct_internal_fn_optab): Change visibility.
* internal-fn.def (DEF_INTERNAL_WIDENING_OPTAB_FN): Macro to define an
internal_fn that expands into multiple internal_fns for widening.
(IFN_VEC_WIDEN_PLUS, IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO,
IFN_VEC_WIDEN_PLUS_EVEN, IFN_VEC_WIDEN_PLUS_ODD,
IFN_VEC_WIDEN_MINUS, IFN_VEC_WIDEN_MINUS_HI,
IFN_VEC_WIDEN_MINUS_LO, IFN_VEC_WIDEN_MINUS_ODD,
IFN_VEC_WIDEN_MINUS_EVEN): Define widening plus,minus functions.
* internal-fn.h (direct_internal_fn_optab): Declare new prototype.
(lookup_hilo_internal_fn): Likewise.
(widening_fn_p): Likewise.
(narrowing_fn_p): Likewise.
* optabs.cc (commutative_optab_p): Add widening plus optabs.
* optabs.def (OPTAB_D): Define widen add, sub optabs.
* tree-vect-patterns.cc (vect_recog_widen_op_pattern): Support
patterns with a hi/lo or even/odd split.
(vect_recog_sad_pattern): Refactor to use new IFN codes.
(vect_recog_widen_plus_pattern): Likewise.
(vect_recog_widen_minus_pattern): Likewise.
(vect_recog_average_pattern): Likewise.
* tree-vect-stmts.cc (vectorizable_conversion): Add support for
_HILO IFNs.
(supportable_widening_operation): Likewise.
* tree.def (WIDEN_SUM_EXPR): Update example to use new IFNs.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/vect-widen-add.c: Test that new
IFN_VEC_WIDEN_PLUS is being used.
* gcc.target/aarch64/vect-widen-sub.c: Test that new
IFN_VEC_WIDEN_MINUS is being used.
|
|
|
|
operations
This adds a new test-and-branch optab that can be used to do a conditional test
of a bit and branch. This is similar to the cbranch optab but instead can
test any arbitrary bit inside the register.
This patch recognizes boolean comparisons and single bit mask tests.
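A hedged example (mine, not from the patch) of the single-bit test the new optab is meant to catch, branching on one arbitrary bit of a register:
extern void g (void);
void
f (unsigned int x)
{
  if (x & (1u << 5))   /* test a single bit and branch on it  */
    g ();
}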
gcc/ChangeLog:
* dojump.cc (do_jump): Pass along value.
(do_jump_by_parts_greater_rtx): Likewise.
(do_jump_by_parts_zero_rtx): Likewise.
(do_jump_by_parts_equality_rtx): Likewise.
(do_compare_rtx_and_jump): Likewise.
(do_compare_and_jump): Likewise.
* dojump.h (do_compare_rtx_and_jump): New.
* optabs.cc (emit_cmp_and_jump_insn_1): Refactor to take optab to check.
(validate_test_and_branch): New.
(emit_cmp_and_jump_insns): Optionally take a value, and when the value is
supplied then check if it's suitable for tbranch.
* optabs.def (tbranch_eq$a4, tbranch_ne$a4): New.
* doc/md.texi (tbranch_@var{op}@var{mode}4): Document it.
* optabs.h (emit_cmp_and_jump_insns): New.
* tree.h (tree_zero_one_valued_p): New.
|
|
The following patch implements a new builtin, __builtin_issignaling,
which can be used to implement the ISO/IEC TS 18661-1 issignaling
macro.
It is implemented as type-generic function, so there is just one
builtin, not many with various suffixes.
This patch doesn't address PR56831 nor PR58416, but I think compared to
using glibc issignaling macro could make some cases better (as
the builtin is expanded always inline and for SFmode/DFmode just
reinterprets a memory or pseudo register as SImode/DImode, so could
avoid some raising of exception + turning sNaN into qNaN before the
builtin can analyze it).
For floating point modes that do not have NaNs it will return 0;
otherwise I've tried to implement this for all the other supported
real formats.
It handles both the MIPS/PA floats where a sNaN has the mantissa
MSB set and the rest where a sNaN has it cleared, with the exception
of formats which are known never to be in the MIPS/PA form.
The MIPS/PA floats are handled using a test like
(x & mask) == mask,
the other usually as
((x ^ bit) & mask) > val
where bit, mask and val are some constants.
IBM double double is done by doing DFmode test on the most significant
half, and Intel/Motorola extended (12 or 16 bytes) and IEEE quad are
handled by extracting 32-bit/16-bit words or 64-bit parts from the
value and testing those.
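As a hedged illustration of the ((x ^ bit) & mask) > val test for IEEE binary32 (where an sNaN has the mantissa MSB clear), one possible instantiation is below; the constants are my reconstruction, not necessarily the exact values the expander emits.
#include <stdint.h>
#include <string.h>
static int
issignaling_binary32 (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);   /* reinterpret the SFmode value as SImode.  */
  /* Flip the quiet bit, drop the sign, and compare against the qNaN
     threshold: only signaling NaNs end up above it.  */
  return ((u ^ 0x00400000u) & 0x7fffffffu) > 0x7fc00000u;
}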
On x86, XFmode is handled by a special optab so that even pseudo numbers
are considered signaling, like in glibc and like the i386 specific testcase
tests.
2022-08-26 Jakub Jelinek <jakub@redhat.com>
gcc/
* builtins.def (BUILT_IN_ISSIGNALING): New built-in.
* builtins.cc (expand_builtin_issignaling): New function.
(expand_builtin_signbit): Don't overwrite target.
(expand_builtin): Handle BUILT_IN_ISSIGNALING.
(fold_builtin_classify): Likewise.
(fold_builtin_1): Likewise.
* optabs.def (issignaling_optab): New.
* fold-const-call.cc (fold_const_call_ss): Handle
BUILT_IN_ISSIGNALING.
* config/i386/i386.md (issignalingxf2): New expander.
* doc/extend.texi (__builtin_issignaling): Document.
(__builtin_isinf, __builtin_isnan): Clarify behavior with
-ffinite-math-only.
* doc/md.texi (issignaling<mode>2): Likewise.
gcc/c-family/
* c-common.cc (check_builtin_function_arguments): Handle
BUILT_IN_ISSIGNALING.
gcc/c/
* c-typeck.cc (convert_arguments): Handle BUILT_IN_ISSIGNALING.
gcc/fortran/
* f95-lang.cc (gfc_init_builtin_functions): Initialize
BUILT_IN_ISSIGNALING.
gcc/testsuite/
* gcc.dg/torture/builtin-issignaling-1.c: New test.
* gcc.dg/torture/builtin-issignaling-2.c: New test.
* gcc.dg/torture/float16-builtin-issignaling-1.c: New test.
* gcc.dg/torture/float32-builtin-issignaling-1.c: New test.
* gcc.dg/torture/float32x-builtin-issignaling-1.c: New test.
* gcc.dg/torture/float64-builtin-issignaling-1.c: New test.
* gcc.dg/torture/float64x-builtin-issignaling-1.c: New test.
* gcc.dg/torture/float128-builtin-issignaling-1.c: New test.
* gcc.dg/torture/float128x-builtin-issignaling-1.c: New test.
* gcc.target/i386/builtin-issignaling-1.c: New test.
|
|
and feraiseexcept [PR94193]
These optimizations were originally in glibc, but they were removed,
and it was suggested that they were a good fit as gcc builtins [1].
feclearexcept and feraiseexcept were extended (in comparison to the
glibc version) to accept any combination of the accepted flags, not
limited to just one flag bit at a time anymore.
The builtin expanders need knowledge of the target libc's FE_*
values, so they are limited to expanding only for suitable libcs.
[1] https://sourceware.org/legacy-ml/libc-alpha/2020-03/msg00047.html
https://sourceware.org/legacy-ml/libc-alpha/2020-03/msg00080.html
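A hedged usage sketch: with these expanders, on a suitable target/libc the calls below can be expanded inline rather than going through the library.
#include <fenv.h>
int
round_mode_after_flag_update (void)
{
  feclearexcept (FE_INEXACT | FE_OVERFLOW);  /* any combination of flags is accepted  */
  feraiseexcept (FE_INVALID);
  return fegetround ();
}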
2020-08-13 Raoni Fassina Firmino <raoni@linux.ibm.com>
gcc/
PR target/94193
* builtins.cc (expand_builtin_fegetround): New function.
(expand_builtin_feclear_feraise_except): New function.
(expand_builtin): Add cases for BUILT_IN_FEGETROUND,
BUILT_IN_FECLEAREXCEPT and BUILT_IN_FERAISEEXCEPT.
* config/rs6000/rs6000.md (fegetroundsi): New pattern.
(feclearexceptsi): New pattern.
(feraiseexceptsi): New pattern.
* doc/extend.texi: Add a new introductory paragraph about the
new builtins.
* doc/md.texi (fegetround@var{m}): Document new optab.
(feclearexcept@var{m}): Document new optab.
(feraiseexcept@var{m}): Document new optab.
* optabs.def (fegetround_optab): New optab.
(feclearexcept_optab): New optab.
(feraiseexcept_optab): New optab.
gcc/testsuite/
PR target/94193
* gcc.target/powerpc/builtin-feclearexcept-feraiseexcept-1.c: New test.
* gcc.target/powerpc/builtin-feclearexcept-feraiseexcept-2.c: New test.
* gcc.target/powerpc/builtin-fegetround.c: New test.
Signed-off-by: Raoni Fassina Firmino <raoni@linux.ibm.com>
|
|
C++20:
#include <compare>
auto cmp4way(double a, double b)
{
  return a <=> b;
}
expands to:
ucomisd %xmm1, %xmm0
jp .L8
movl $0, %eax
jne .L8
.L2:
ret
.p2align 4,,10
.p2align 3
.L8:
comisd %xmm0, %xmm1
movl $-1, %eax
ja .L2
ucomisd %xmm1, %xmm0
setbe %al
addl $1, %eax
ret
That is 3 comparisons of the same operands.
The following patch improves it to just one comparison:
comisd %xmm1, %xmm0
jp .L4
seta %al
movl $0, %edx
leal -1(%rax,%rax), %eax
cmove %edx, %eax
ret
.L4:
movl $2, %eax
ret
While a <=> b expands to a == b ? 0 : a < b ? -1 : a > b ? 1 : 2
where the first comparison is equality and this shouldn't raise
exceptions on qNaN operands. If the operands aren't equal (which
includes unordered cases), then it immediately performs < or >
comparison and that raises exceptions even on qNaNs, so we can just
perform a single comparison that raises exceptions on qNaN.
As the 4 different cases are encoded as
ZF CF PF
 1  1  1   a unordered b
 0  0  0   a > b
 0  1  0   a < b
 1  0  0   a == b
we can emit an optimal sequence of comparisons: first jp
for the unordered case, then je for the == case and finally jb
for the < case.
The patch pattern recognizes spaceship-like comparisons during
widening_mul if the spaceship optab is implemented, and replaces
those comparisons with comparisons of .SPACESHIP ifn which returns
-1/0/1/2 based on the comparison. This seems to work well both for the
case of just returning the -1/0/1/2 (when we have just a common
successor with a PHI) or when the different cases are handled with
various other basic blocks. The testcases cover both of those cases,
the latter with different function calls in those.
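For reference, a hedged scalar sketch (mine) of the -1/0/1/2 encoding the .SPACESHIP result uses, matching the expansion described above:
int
spaceship_value (double a, double b)
{
  if (a == b)
    return 0;
  if (a < b)
    return -1;
  if (a > b)
    return 1;
  return 2;  /* unordered */
}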
2022-01-17 Jakub Jelinek <jakub@redhat.com>
PR target/103973
* tree-cfg.h (cond_only_block_p): Declare.
* tree-ssa-phiopt.c (cond_only_block_p): Move function to ...
* tree-cfg.c (cond_only_block_p): ... here. No longer static.
* optabs.def (spaceship_optab): New optab.
* internal-fn.def (SPACESHIP): New internal function.
* internal-fn.h (expand_SPACESHIP): Declare.
* internal-fn.c (expand_PHI): Formatting fix.
(expand_SPACESHIP): New function.
* tree-ssa-math-opts.c (optimize_spaceship): New function.
(math_opts_dom_walker::after_dom_children): Use it.
* config/i386/i386.md (spaceship<mode>3): New define_expand.
* config/i386/i386-protos.h (ix86_expand_fp_spaceship): Declare.
* config/i386/i386-expand.c (ix86_expand_fp_spaceship): New function.
* doc/md.texi (spaceship@var{m}3): Document.
* gcc.target/i386/pr103973-1.c: New test.
* gcc.target/i386/pr103973-2.c: New test.
* gcc.target/i386/pr103973-3.c: New test.
* gcc.target/i386/pr103973-4.c: New test.
* gcc.target/i386/pr103973-5.c: New test.
* gcc.target/i386/pr103973-6.c: New test.
* gcc.target/i386/pr103973-7.c: New test.
* gcc.target/i386/pr103973-8.c: New test.
* gcc.target/i386/pr103973-9.c: New test.
* gcc.target/i386/pr103973-10.c: New test.
* gcc.target/i386/pr103973-11.c: New test.
* gcc.target/i386/pr103973-12.c: New test.
* gcc.target/i386/pr103973-13.c: New test.
* gcc.target/i386/pr103973-14.c: New test.
* gcc.target/i386/pr103973-15.c: New test.
* gcc.target/i386/pr103973-16.c: New test.
* gcc.target/i386/pr103973-17.c: New test.
* gcc.target/i386/pr103973-18.c: New test.
* gcc.target/i386/pr103973-19.c: New test.
* gcc.target/i386/pr103973-20.c: New test.
* g++.target/i386/pr103973-1.C: New test.
* g++.target/i386/pr103973-2.C: New test.
* g++.target/i386/pr103973-3.C: New test.
* g++.target/i386/pr103973-4.C: New test.
* g++.target/i386/pr103973-5.C: New test.
* g++.target/i386/pr103973-6.C: New test.
* g++.target/i386/pr103973-7.C: New test.
* g++.target/i386/pr103973-8.C: New test.
* g++.target/i386/pr103973-9.C: New test.
* g++.target/i386/pr103973-10.C: New test.
* g++.target/i386/pr103973-11.C: New test.
* g++.target/i386/pr103973-12.C: New test.
* g++.target/i386/pr103973-13.C: New test.
* g++.target/i386/pr103973-14.C: New test.
* g++.target/i386/pr103973-15.C: New test.
* g++.target/i386/pr103973-16.C: New test.
* g++.target/i386/pr103973-17.C: New test.
* g++.target/i386/pr103973-18.C: New test.
* g++.target/i386/pr103973-19.C: New test.
* g++.target/i386/pr103973-20.C: New test.
|
|
{==,!=,<,<=,>,>=} 0 [PR98737]
On Wed, Jan 27, 2021 at 12:27:13PM +0100, Ulrich Drepper via Gcc-patches wrote:
> On 1/27/21 11:37 AM, Jakub Jelinek wrote:
> > Would equality comparison against 0 handle the most common cases.
> >
> > The user can write it as
> > __atomic_sub_fetch (x, y, z) == 0
> > or
> > __atomic_fetch_sub (x, y, z) - y == 0
> > thouch, so the expansion code would need to be able to cope with both.
>
> Please also keep !=0, <0, <=0, >0, and >=0 in mind. They all can be
> useful and can be handled with the flags.
<= 0 and > 0 don't really work well with lock {add,sub,inc,dec}, x86 doesn't
have comparisons that would look solely at both SF and ZF and not at other
flags (and emitting two separate conditional jumps or two setcc insns and
oring them together looks awful).
But the rest can work.
Here is a patch that adds internal functions and optabs for these,
recognizes them at the same spot as e.g. .ATOMIC_BIT_TEST_AND* internal
functions (fold all builtins pass) and expands them appropriately (or for
the <= 0 and > 0 cases of +/- FAILs and let's middle-end fall back).
So far I have handled just the op_fetch builtins. IMHO, instead of handling
also __atomic_fetch_sub (x, y, z) - y == 0 etc., we should canonicalize
__atomic_fetch_sub (x, y, z) - y to __atomic_sub_fetch (x, y, z) (and vice
versa).
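A hedged example (mine; the committed testcases are gcc.target/i386/pr98737-*.c) of the shape that is recognized: the result of an __atomic_*_fetch builtin compared against zero.
_Bool
dec_and_test (unsigned long *p)
{
  /* Can become .ATOMIC_SUB_FETCH_CMP_0 and use the flags of lock sub.  */
  return __atomic_sub_fetch (p, 1, __ATOMIC_SEQ_CST) == 0;
}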
2022-01-03 Jakub Jelinek <jakub@redhat.com>
PR target/98737
* internal-fn.def (ATOMIC_ADD_FETCH_CMP_0, ATOMIC_SUB_FETCH_CMP_0,
ATOMIC_AND_FETCH_CMP_0, ATOMIC_OR_FETCH_CMP_0, ATOMIC_XOR_FETCH_CMP_0):
New internal fns.
* internal-fn.h (ATOMIC_OP_FETCH_CMP_0_EQ, ATOMIC_OP_FETCH_CMP_0_NE,
ATOMIC_OP_FETCH_CMP_0_LT, ATOMIC_OP_FETCH_CMP_0_LE,
ATOMIC_OP_FETCH_CMP_0_GT, ATOMIC_OP_FETCH_CMP_0_GE): New enumerators.
* internal-fn.c (expand_ATOMIC_ADD_FETCH_CMP_0,
expand_ATOMIC_SUB_FETCH_CMP_0, expand_ATOMIC_AND_FETCH_CMP_0,
expand_ATOMIC_OR_FETCH_CMP_0, expand_ATOMIC_XOR_FETCH_CMP_0): New
functions.
* optabs.def (atomic_add_fetch_cmp_0_optab,
atomic_sub_fetch_cmp_0_optab, atomic_and_fetch_cmp_0_optab,
atomic_or_fetch_cmp_0_optab, atomic_xor_fetch_cmp_0_optab): New
direct optabs.
* builtins.h (expand_ifn_atomic_op_fetch_cmp_0): Declare.
* builtins.c (expand_ifn_atomic_op_fetch_cmp_0): New function.
* tree-ssa-ccp.c: Include internal-fn.h.
(optimize_atomic_bit_test_and): Add . before internal fn call
in function comment. Change return type from void to bool and
return true only if successfully replaced.
(optimize_atomic_op_fetch_cmp_0): New function.
(pass_fold_builtins::execute): Use optimize_atomic_op_fetch_cmp_0
for BUILT_IN_ATOMIC_{ADD,SUB,AND,OR,XOR}_FETCH_{1,2,4,8,16} and
BUILT_IN_SYNC_{ADD,SUB,AND,OR,XOR}_AND_FETCH_{1,2,4,8,16},
for *XOR* ones only if optimize_atomic_bit_test_and failed.
* config/i386/sync.md (atomic_<plusminus_mnemonic>_fetch_cmp_0<mode>,
atomic_<logic>_fetch_cmp_0<mode>): New define_expand patterns.
(atomic_add_fetch_cmp_0<mode>_1, atomic_sub_fetch_cmp_0<mode>_1,
atomic_<logic>_fetch_cmp_0<mode>_1): New define_insn patterns.
* doc/md.texi (atomic_add_fetch_cmp_0<mode>,
atomic_sub_fetch_cmp_0<mode>, atomic_and_fetch_cmp_0<mode>,
atomic_or_fetch_cmp_0<mode>, atomic_xor_fetch_cmp_0<mode>): Document
new named patterns.
* gcc.target/i386/pr98737-1.c: New test.
* gcc.target/i386/pr98737-2.c: New test.
* gcc.target/i386/pr98737-3.c: New test.
* gcc.target/i386/pr98737-4.c: New test.
* gcc.target/i386/pr98737-5.c: New test.
* gcc.target/i386/pr98737-6.c: New test.
* gcc.target/i386/pr98737-7.c: New test.
|
|
|
|
This patch adds support for reductions involving calls to fmax*()
and fmin*(), without the -ffast-math flags that allow them to be
converted to MAX_EXPR and MIN_EXPR.
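A hedged example (mine; the committed tests are gcc.dg/vect/vect-fmax-*.c and friends) of the reduction shape this enables without -ffast-math:
#include <math.h>
double
fmax_reduction (const double *x, int n)
{
  double m = -HUGE_VAL;
  for (int i = 0; i < n; i++)
    m = fmax (m, x[i]);   /* reduction via the fmax call, not MAX_EXPR.  */
  return m;
}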
gcc/
* doc/md.texi (reduc_fmin_scal_@var{m}): Document.
(reduc_fmax_scal_@var{m}): Likewise.
* optabs.def (reduc_fmax_scal_optab): New optab.
(reduc_fmin_scal_optab): Likewise.
* internal-fn.def (REDUC_FMAX, REDUC_FMIN): New functions.
* tree-vect-loop.c (reduction_fn_for_scalar_code): Handle
CASE_CFN_FMAX and CASE_CFN_FMIN.
(neutral_op_for_reduction): Likewise.
(needs_fold_left_reduction_p): Likewise.
* config/aarch64/iterators.md (FMAXMINV): New iterator.
(fmaxmin): Handle UNSPEC_FMAXNMV and UNSPEC_FMINNMV.
* config/aarch64/aarch64-simd.md (reduc_<optab>_scal_<mode>): Fix
unspec mode.
(reduc_<fmaxmin>_scal_<mode>): New pattern.
* config/aarch64/aarch64-sve.md (reduc_<fmaxmin>_scal_<mode>):
Likewise.
gcc/testsuite/
* gcc.dg/vect/vect-fmax-1.c: New test.
* gcc.dg/vect/vect-fmax-2.c: Likewise.
* gcc.dg/vect/vect-fmax-3.c: Likewise.
* gcc.dg/vect/vect-fmin-1.c: New test.
* gcc.dg/vect/vect-fmin-2.c: Likewise.
* gcc.dg/vect/vect-fmin-3.c: Likewise.
* gcc.target/aarch64/fmaxnm_1.c: Likewise.
* gcc.target/aarch64/fmaxnm_2.c: Likewise.
* gcc.target/aarch64/fminnm_1.c: Likewise.
* gcc.target/aarch64/fminnm_2.c: Likewise.
* gcc.target/aarch64/sve/fmaxnm_2.c: Likewise.
* gcc.target/aarch64/sve/fmaxnm_3.c: Likewise.
* gcc.target/aarch64/sve/fminnm_2.c: Likewise.
* gcc.target/aarch64/sve/fminnm_3.c: Likewise.
|
|
This patch adds conditional forms of FMAX and FMIN, following
the pattern for existing conditional binary functions.
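A hedged example (mine, not from the patch) of a predicated fmax that could map to COND_FMAX when vectorized:
#include <math.h>
void
f (double *restrict r, const double *restrict a,
   const double *restrict b, const int *restrict c, int n)
{
  for (int i = 0; i < n; i++)
    /* fmax only where the condition holds, a[i] otherwise.  */
    r[i] = c[i] ? fmax (a[i], b[i]) : a[i];
}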
gcc/
* doc/md.texi (cond_fmin@var{mode}, cond_fmax@var{mode}): Document.
* optabs.def (cond_fmin_optab, cond_fmax_optab): New optabs.
* internal-fn.def (COND_FMIN, COND_FMAX): New functions.
* internal-fn.c (first_commutative_argument): Handle them.
(FOR_EACH_COND_FN_PAIR): Likewise.
* match.pd (UNCOND_BINARY, COND_BINARY): Likewise.
* config/aarch64/aarch64-sve.md (cond_<fmaxmin><mode>): New
pattern.
gcc/testsuite/
* gcc.target/aarch64/sve/cond_fmaxnm_5.c: New test.
* gcc.target/aarch64/sve/cond_fmaxnm_5_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fmaxnm_6.c: Likewise.
* gcc.target/aarch64/sve/cond_fmaxnm_6_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fmaxnm_7.c: Likewise.
* gcc.target/aarch64/sve/cond_fmaxnm_7_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fmaxnm_8.c: Likewise.
* gcc.target/aarch64/sve/cond_fmaxnm_8_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_5.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_5_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_6.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_6_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_7.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_7_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_8.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_8_run.c: Likewise.
|
|
gcc/ChangeLog:
PR target/93183
* gimple-match-head.c (try_conditional_simplification): Add case for single operand.
* internal-fn.def: Add entry for COND_NEG internal function.
* internal-fn.c (FOR_EACH_CODE_MAPPING): Add entry for
NEGATE_EXPR, COND_NEG mapping.
* optabs.def: Add entry for cond_neg_optab.
* match.pd (UNCOND_UNARY, COND_UNARY): New operator lists.
(vec_cond COND (foo A) B) -> (IFN_COND_FOO COND A B): New pattern.
(vec_cond COND B (foo A)) -> (IFN_COND_FOO ~COND A B): Likewise.
gcc/testsuite/ChangeLog:
PR target/93183
* gcc.target/aarch64/sve/cond_unary_4.c: Adjust.
* gcc.target/aarch64/sve/pr93183.c: New test.
|
|
This patch adds support for recognizing loops which mimic the behaviour
of functions strlen and rawmemchr, and replaces those with internal
function calls in case a target provides them. In contrast to the
standard strlen and rawmemchr functions, this patch also supports
different instances where the memory pointed to is interpreted as 8, 16,
and 32-bit sized, respectively.
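A hedged example (mine, not one of the committed testcases) of a rawmemchr-like loop over 16-bit elements that can be replaced by a .RAWMEMCHR call when the target provides the optab:
#include <stdint.h>
uint16_t *
find16 (uint16_t *p, uint16_t c)
{
  /* No length bound: scan until the first element equal to c.  */
  while (*p != c)
    ++p;
  return p;
}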
gcc/ChangeLog:
* builtins.c (get_memory_rtx): Change to external linkage.
* builtins.h (get_memory_rtx): Add function prototype.
* doc/md.texi (rawmemchr<mode>): Document.
* internal-fn.c (expand_RAWMEMCHR): Define.
* internal-fn.def (RAWMEMCHR): Add.
* optabs.def (rawmemchr_optab): Add.
* tree-loop-distribution.c (find_single_drs): Change return code
behaviour by also returning true if no single store was found
but a single load.
(loop_distribution::classify_partition): Respect the new return
code behaviour of function find_single_drs.
(loop_distribution::execute): Call new function
transform_reduction_loop in order to replace rawmemchr or strlen
like loops by calls into builtins.
(generate_reduction_builtin_1): New function.
(generate_rawmemchr_builtin): New function.
(generate_strlen_builtin_1): New function.
(generate_strlen_builtin): New function.
(generate_strlen_builtin_using_rawmemchr): New function.
(reduction_var_overflows_first): New function.
(determine_reduction_stmt_1): New function.
(determine_reduction_stmt): New function.
(loop_distribution::transform_reduction_loop): New function.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/ldist-rawmemchr-1.c: New test.
* gcc.dg/tree-ssa/ldist-rawmemchr-2.c: New test.
* gcc.dg/tree-ssa/ldist-strlen-1.c: New test.
* gcc.dg/tree-ssa/ldist-strlen-2.c: New test.
* gcc.dg/tree-ssa/ldist-strlen-3.c: New test.
|
|
This patch adds support for a dot product where the sign of the multiplication
arguments differ. i.e. one is signed and one is unsigned but the precisions are
the same.
#define N 480
#define SIGNEDNESS_1 unsigned
#define SIGNEDNESS_2 signed
#define SIGNEDNESS_3 signed
#define SIGNEDNESS_4 unsigned
SIGNEDNESS_1 int __attribute__ ((noipa))
f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
   SIGNEDNESS_4 char *restrict b)
{
  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
    {
      int av = a[i];
      int bv = b[i];
      SIGNEDNESS_2 short mult = av * bv;
      res += mult;
    }
  return res;
}
The operations are performed as if the operands were extended to a 32-bit value.
As such this operation isn't valid if there is an intermediate conversion to an
unsigned value, i.e. if SIGNEDNESS_2 is unsigned.
Moreover, if the signs of SIGNEDNESS_3 and SIGNEDNESS_4 are flipped, the same
optab is used but the operands are flipped in the optab expansion.
To support this the patch extends the dot-product detection to optionally
ignore operands with different signs and stores this information in the optab
subtype, which is now made a bitfield.
The subtype now additionally controls which optab an EXPR can expand to.
gcc/ChangeLog:
* optabs.def (usdot_prod_optab): New.
* doc/md.texi: Document it and clarify other dot prod optabs.
* optabs-tree.h (enum optab_subtype): Add optab_vector_mixed_sign.
* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
* optabs.c (expand_widen_pattern_expr): Likewise.
* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
* tree-vect-loop.c (vectorizable_reduction): Query dot-product kind.
* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take optional
optab subtype.
(vect_widened_op_tree): Optionally ignore
mismatched types.
(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
|
|
This adds named expanders for vec_fmaddsub<mode>4 and
vec_fmsubadd<mode>4 which map to x86 vfmaddsubXXXp{ds} and
vfmsubaddXXXp{ds} instructions. This complements the previous
addition of ADDSUB support.
x86 lacks SUBADD and the negate variants of FMA with mixed
plus/minus, so I did not add optabs or patterns for those, but
it would not be difficult if there's a target that has them.
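A hedged example (mine, not from the patch) of the alternating fma shape vec_fmaddsub targets, following the existing ADDSUB convention of subtracting in the even lanes and adding in the odd lanes:
void
f (double *restrict r, const double *restrict a,
   const double *restrict b, const double *restrict c, int n)
{
  for (int i = 0; i < n; i += 2)
    {
      r[i] = a[i] * b[i] - c[i];                  /* even lane: fmsub */
      r[i + 1] = a[i + 1] * b[i + 1] + c[i + 1];  /* odd lane: fmadd */
    }
}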
2021-07-05 Richard Biener <rguenther@suse.de>
* doc/md.texi (vec_fmaddsub<mode>4): Document.
(vec_fmsubadd<mode>4): Likewise.
* optabs.def (vec_fmaddsub$a4): Add.
(vec_fmsubadd$a4): Likewise.
* internal-fn.def (IFN_VEC_FMADDSUB): Add.
(IFN_VEC_FMSUBADD): Likewise.
* tree-vect-slp-patterns.c (addsub_pattern::recognize):
Refactor to handle IFN_VEC_FMADDSUB and IFN_VEC_FMSUBADD.
(addsub_pattern::build): Likewise.
* tree-vect-slp.c (vect_optimize_slp): CFN_VEC_FMADDSUB
and CFN_VEC_FMSUBADD are not transparent for permutes.
* config/i386/sse.md (vec_fmaddsub<mode>4): New expander.
(vec_fmsubadd<mode>4): Likewise.
* gcc.target/i386/vect-fmaddsubXXXpd.c: New testcase.
* gcc.target/i386/vect-fmaddsubXXXps.c: Likewise.
* gcc.target/i386/vect-fmsubaddXXXpd.c: Likewise.
* gcc.target/i386/vect-fmsubaddXXXps.c: Likewise.
|
|
This adds SLP pattern recognition for the SSE3/AVX [v]addsubp{ds} v0, v1
instructions which compute { v0[0] - v1[0], v0[1] + v1[1], ... },
thus subtract/add alternating on lanes, starting with subtract.
It adds a corresponding optab and direct internal function,
vec_addsub$a3 and renames the existing i386 backend patterns to
the new canonical name.
The SLP pattern matches the exact alternating lane sequence rather
than trying to be clever and anticipating incoming permutes - we
could permute the two input vectors to the needed lane alternation,
do the addsub and then permute the result vector back but that's
only profitable in case the two input or the output permute will
vanish - something Tamar's refactoring of SLP pattern recog should
make possible.
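A hedged example (mine, not from the patch) of the exact alternating subtract/add lane sequence the pattern matches:
void
f (double *restrict r, const double *restrict a,
   const double *restrict b, int n)
{
  for (int i = 0; i < n; i += 2)
    {
      r[i] = a[i] - b[i];              /* even lane: subtract */
      r[i + 1] = a[i + 1] + b[i + 1];  /* odd lane: add */
    }
}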
2021-06-17 Richard Biener <rguenther@suse.de>
* config/i386/sse.md (avx_addsubv4df3): Rename to
vec_addsubv4df3.
(avx_addsubv8sf3): Rename to vec_addsubv8sf3.
(sse3_addsubv2df3): Rename to vec_addsubv2df3.
(sse3_addsubv4sf3): Rename to vec_addsubv4sf3.
* config/i386/i386-builtin.def: Adjust.
* internal-fn.def (VEC_ADDSUB): New internal optab fn.
* optabs.def (vec_addsub_optab): New optab.
* tree-vect-slp-patterns.c (class addsub_pattern): New.
(slp_patterns): Add addsub_pattern.
* tree-vect-slp.c (vect_optimize_slp): Disable propagation
across CFN_VEC_ADDSUB.
* tree-vectorizer.h (vect_pattern::vect_pattern): Make
m_ops optional.
* doc/md.texi (vec_addsub<mode>3): Document.
* gcc.target/i386/vect-addsubv2df.c: New testcase.
* gcc.target/i386/vect-addsubv4sf.c: Likewise.
* gcc.target/i386/vect-addsubv4df.c: Likewise.
* gcc.target/i386/vect-addsubv8sf.c: Likewise.
* gcc.target/i386/vect-addsub-2.c: Likewise.
* gcc.target/i386/vect-addsub-3.c: Likewise.
|
|
This adds support for FMS and FMS conjugated to the slp pattern matcher.
Example of matches:
#include <stdio.h>
#include <complex.h>
#define N 200
#define ROT
#define TYPE float
#define TYPE2 float
void g (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
{
  for (int i=0; i < N; i++)
    {
      c[i] -= a[i] * (b[i] ROT);
    }
}
void g_f1 (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
{
  for (int i=0; i < N; i++)
    {
      c[i] -= conjf (a[i]) * (b[i]);
    }
}
void g_s1 (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
{
  for (int i=0; i < N; i++)
    {
      c[i] -= a[i] * conjf (b[i] ROT);
    }
}
void caxpy_sub(double complex * restrict y, double complex * restrict x, size_t N, double complex f) {
  for (size_t i = 0; i < N; ++i)
    y[i] -= x[i]* f;
}
gcc/ChangeLog:
* internal-fn.def (COMPLEX_FMS, COMPLEX_FMS_CONJ): New.
* optabs.def (cmls_optab, cmls_conj_optab): New.
* doc/md.texi: Document them.
* tree-vect-slp-patterns.c (class complex_fms_pattern,
complex_fms_pattern::matches, complex_fms_pattern::recognize,
complex_fms_pattern::build): New.
|
|
This adds support for FMA and conjugated FMA to the SLP pattern matcher.
Example of instructions matched:
#include <stdio.h>
#include <complex.h>
#define N 200
#define ROT
#define TYPE float
#define TYPE2 float
void g (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
{
  for (int i=0; i < N; i++)
    {
      c[i] += a[i] * (b[i] ROT);
    }
}

void g_f1 (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
{
  for (int i=0; i < N; i++)
    {
      c[i] += conjf (a[i]) * (b[i] ROT);
    }
}

void g_s1 (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
{
  for (int i=0; i < N; i++)
    {
      c[i] += a[i] * conjf (b[i] ROT);
    }
}

void caxpy_add(double complex * restrict y, double complex * restrict x, size_t N, double complex f) {
  for (size_t i = 0; i < N; ++i)
    y[i] += x[i] * f;
}
gcc/ChangeLog:
* internal-fn.def (COMPLEX_FMA, COMPLEX_FMA_CONJ): New.
* optabs.def (cmla_optab, cmla_conj_optab): New.
* doc/md.texi: Document them.
* tree-vect-slp-patterns.c (vect_match_call_p,
class complex_fma_pattern, vect_slp_reset_pattern,
complex_fma_pattern::matches, complex_fma_pattern::recognize,
complex_fma_pattern::build): New.
|
|
This adds support for complex multiply, and complex multiply and accumulate,
to the vect pattern detector.
Example of instructions matched:
#include <stdio.h>
#include <complex.h>
#define N 200
#define ROT
#define TYPE float
#define TYPE2 float
void g (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
{
  for (int i=0; i < N; i++)
    {
      c[i] = a[i] * (b[i] ROT);
    }
}

void g_f1 (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
{
  for (int i=0; i < N; i++)
    {
      c[i] = conjf (a[i]) * (b[i] ROT);
    }
}

void g_s1 (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
{
  for (int i=0; i < N; i++)
    {
      c[i] = a[i] * conjf (b[i] ROT);
    }
}
gcc/ChangeLog:
* internal-fn.def (COMPLEX_MUL, COMPLEX_MUL_CONJ): New.
* optabs.def (cmul_optab, cmul_conj_optab): New.
* doc/md.texi: Document them.
* tree-vect-slp-patterns.c (vect_match_call_complex_mla,
vect_normalize_conj_loc, is_eq_or_top, vect_validate_multiplication,
vect_build_combine_node, class complex_mul_pattern,
complex_mul_pattern::matches, complex_mul_pattern::recognize,
complex_mul_pattern::build): New.
|
|
|
|
This patch adds support for
* Complex Addition with rotation of 90 and 270.
Addition with rotation of the second argument around the Argand plane.
Supported rotations are 90 and 270:
c = a + (b * I) and c = a + (b * I * I * I)
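An illustrative pair of loops in the style of the other complex-pattern
examples in this series (hypothetical test code, not part of the patch):
#include <complex.h>
#define N 200
/* Rotation by 90: the second operand is multiplied by I.  */
void
add_rot90 (float complex a[restrict N], float complex b[restrict N],
           float complex c[restrict N])
{
  for (int i = 0; i < N; i++)
    c[i] = a[i] + (b[i] * I);
}
/* Rotation by 270: the second operand is multiplied by I three times.  */
void
add_rot270 (float complex a[restrict N], float complex b[restrict N],
            float complex c[restrict N])
{
  for (int i = 0; i < N; i++)
    c[i] = a[i] + (b[i] * I * I * I);
}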
gcc/ChangeLog:
* tree-vect-slp-patterns.c: New file.
* Makefile.in: Add it.
* doc/passes.texi: Document it.
* internal-fn.def (COMPLEX_ADD_ROT90, COMPLEX_ADD_ROT270): New.
* optabs.def (cadd90_optab, cadd270_optab): New.
* doc/md.texi: Document them.
* tree-vect-loop.c (vect_analyze_loop_2): Add dissolve code.
* tree-vect-slp.c:
(vect_free_slp_instance, vect_create_new_slp_node): Export.
(vect_match_slp_patterns_2, vect_match_slp_patterns): New.
(vect_analyze_slp): Use it.
* tree-vectorizer.h (vect_free_slp_tree): Export.
(enum _complex_operation): Forward declare.
(class vect_pattern): New
gcc/testsuite/ChangeLog:
* lib/target-supports.exp
(check_effective_target_arm_v8_3a_complex_neon_ok_nocache): Fix it.
(check_effective_target_vect_complex_add_byte
,check_effective_target_vect_complex_add_int
,check_effective_target_vect_complex_add_short
,check_effective_target_vect_complex_add_long
,check_effective_target_vect_complex_add_half
,check_effective_target_vect_complex_add_float
,check_effective_target_vect_complex_add_double): New.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-byte.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-int.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-long.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-short.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-byte.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-int.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-long.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-short.c: New test.
* gcc.dg/vect/complex/complex-add-pattern-template.c: New test.
* gcc.dg/vect/complex/complex-add-template.c: New test.
* gcc.dg/vect/complex/complex-operations-run.c: New test.
* gcc.dg/vect/complex/complex-operations.c: New test.
* gcc.dg/vect/complex/complex.exp: New test.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-double.c: New test.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c: New test.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-half-float.c: New test.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-double.c: New test.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-float.c: New test.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c: New test.
* gcc.dg/vect/complex/fast-math-complex-add-double.c: New test.
* gcc.dg/vect/complex/fast-math-complex-add-float.c: New test.
* gcc.dg/vect/complex/fast-math-complex-add-half-float.c: New test.
* gcc.dg/vect/complex/fast-math-complex-add-pattern-double.c: New test.
* gcc.dg/vect/complex/fast-math-complex-add-pattern-float.c: New test.
* gcc.dg/vect/complex/fast-math-complex-add-pattern-half-float.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-byte.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-int.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-long.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-short.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-byte.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-int.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-long.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-short.c: New test.
|
|
Add widening add and subtract patterns to tree-vect-patterns. Update the
widening code of patterns that detect PLUS_EXPR to also detect
WIDEN_PLUS_EXPR. These patterns take two vectors with N elements of size
S and perform an add/subtract on the elements, storing the results as N
elements of size 2*S (in two result vectors). This is implemented in the
aarch64 backend as addl/addl2 and subl/subl2 respectively. Add aarch64
tests for the patterns.
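A loop of roughly this shape is what the new recog patterns target
(illustrative sketch; the names are made up):
#include <stdint.h>
#include <stddef.h>
/* 16-bit inputs, 32-bit results; on aarch64 the vectorized form can use
   saddl/saddl2.  */
void
widen_add (int32_t *restrict r, const int16_t *restrict a,
           const int16_t *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    r[i] = (int32_t) a[i] + (int32_t) b[i];
}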
gcc/ChangeLog:
* doc/generic.texi: Document new widen_plus/minus_lo/hi tree codes.
* doc/md.texi: Document new widenening add/subtract hi/lo optabs.
* expr.c (expand_expr_real_2): Add widen_add, widen_subtract cases.
* optabs-tree.c (optab_for_tree_code): Add case for widening optabs.
* optabs.def (OPTAB_D): Define vectorized widen add, subtracts.
* tree-cfg.c (verify_gimple_assign_binary): Add case for widening adds,
subtracts.
* tree-inline.c (estimate_operator_cost): Add case for widening adds,
subtracts.
* tree-vect-generic.c (expand_vector_operations_1): Add case for
widening adds, subtracts
* tree-vect-patterns.c (vect_recog_widen_add_pattern): New recog
pattern.
(vect_recog_widen_sub_pattern): New recog pattern.
(vect_recog_average_pattern): Update widened add code.
* tree-vect-stmts.c (vectorizable_conversion): Add case for widened add,
subtract.
(supportable_widening_operation): Add case for widened add, subtract.
* tree.def
(WIDEN_PLUS_EXPR): New tree code.
(WIDEN_MINUS_EXPR): New tree code.
(VEC_WIDEN_PLUS_HI_EXPR): New tree code.
(VEC_WIDEN_PLUS_LO_EXPR): New tree code.
(VEC_WIDEN_MINUS_HI_EXPR): New tree code.
(VEC_WIDEN_MINUS_LO_EXPR): New tree code.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/vect-widen-add.c: New test.
* gcc.target/aarch64/vect-widen-sub.c: New test.
|
|
This patch is to add the internal function and optabs support for
vector load/store with length.
For the vector load/store with length optabs, the length is measured
in lanes by default. Targets which measure the length in bytes, like
Power, should only define VnQI modes to wrap the other same-size
vector modes. If the length is larger than the total lane/byte count
of the given mode, the behavior is undefined. The remaining
lanes/bytes not covered by the length are taken as undefined values.
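A scalar sketch of the intended semantics, assuming the length is
measured in lanes (illustrative only, not part of the patch):
#include <stddef.h>
#include <stdint.h>
/* Emulates a length-controlled load of 32-bit lanes: lanes [0, len) are
   loaded, lanes [len, nunits) are left as undefined values, and
   len > nunits is undefined behaviour per the description above.  */
void
emulate_len_load (int32_t *dest, const int32_t *src,
                  size_t len, size_t nunits)
{
  for (size_t i = 0; i < len; i++)
    dest[i] = src[i];
  /* dest[len .. nunits-1] is deliberately left untouched.  */
  (void) nunits;
}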
gcc/ChangeLog:
* doc/md.texi (len_load_@var{m}): Document.
(len_store_@var{m}): Likewise.
* internal-fn.c (len_load_direct): New macro.
(len_store_direct): Likewise.
(expand_len_load_optab_fn): Likewise.
(expand_len_store_optab_fn): Likewise.
(direct_len_load_optab_supported_p): Likewise.
(direct_len_store_optab_supported_p): Likewise.
(expand_mask_load_optab_fn): New macro. Original renamed to ...
(expand_partial_load_optab_fn): ... here. Add handlings for
len_load_optab.
(expand_mask_store_optab_fn): New macro. Original renamed to ...
(expand_partial_store_optab_fn): ... here. Add handlings for
len_store_optab.
(internal_load_fn_p): Handle IFN_LEN_LOAD.
(internal_store_fn_p): Handle IFN_LEN_STORE.
(internal_fn_stored_value_index): Handle IFN_LEN_STORE.
* internal-fn.def (LEN_LOAD): New internal function.
(LEN_STORE): Likewise.
* optabs.def (len_load_optab, len_store_optab): New optab.
|
|
From-SVN: r279813
|
|
This patch adds optabs that check whether a read followed by a write
or a write followed by a read can be divided into interleaved byte
accesses without changing the dependencies between the bytes.
This is one of the uses of the SVE2 WHILERW and WHILEWR instructions.
(The instructions can also be used to limit the VF at runtime,
but that's future work.)
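An example of the kind of loop that needs such a runtime check
(illustrative only; not taken from the patch or its tests):
#include <stddef.h>
/* Without restrict, the compiler cannot prove that a[] and b[] do not
   overlap; whether reading b and then writing a can proceed a full
   vector at a time is exactly the read-after-write question the new
   check optabs answer at run time.  */
void
add_one (int *a, int *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    a[i] = b[i] + 1;
}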
2019-11-18 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* doc/sourcebuild.texi (vect_check_ptrs): Document.
* optabs.def (check_raw_ptrs_optab, check_war_ptrs_optab): New optabs.
* doc/md.texi: Document them.
* internal-fn.def (IFN_CHECK_RAW_PTRS, IFN_CHECK_WAR_PTRS): New
internal functions.
* internal-fn.h (internal_check_ptrs_fn_supported_p): Declare.
* internal-fn.c (check_ptrs_direct): New macro.
(expand_check_ptrs_optab_fn): Likewise.
(direct_check_ptrs_optab_supported_p): Likewise.
(internal_check_ptrs_fn_supported_p): New function.
* tree-data-ref.c: Include internal-fn.h.
(create_ifn_alias_checks): New function.
(create_intersect_range_checks): Use it.
* config/aarch64/iterators.md (SVE2_WHILE_PTR): New int iterator.
(optab, cmp_op): Handle it.
(raw_war, unspec): New int attributes.
* config/aarch64/aarch64.md (UNSPEC_WHILERW, UNSPEC_WHILE_WR): New
constants.
* config/aarch64/predicates.md (aarch64_bytes_per_sve_vector_operand):
New predicate.
* config/aarch64/aarch64-sve2.md (check_<raw_war>_ptrs<mode>): New
expander.
(@aarch64_sve2_while<cmp_op><GPI:mode><PRED_ALL:mode>_ptest): New
pattern.
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_vect_check_ptrs):
New procedure.
* gcc.dg/vect/vect-alias-check-14.c: Expect IFN_CHECK_WAR to be
used, if available.
* gcc.dg/vect/vect-alias-check-15.c: Likewise.
* gcc.dg/vect/vect-alias-check-16.c: Likewise IFN_CHECK_RAW.
* gcc.target/aarch64/sve2/whilerw_1.c: New test.
* gcc.target/aarch64/sve2/whilewr_1.c: Likewise.
* gcc.target/aarch64/sve2/whilewr_2.c: Likewise.
From-SVN: r278414
|
|
The gather and scatter optabs required the vector offset to be
the integer equivalent of the vector mode being loaded or stored.
This patch generalises them so that the two vectors can have different
element sizes, although they still need to have the same number of
elements.
One consequence of this is that it's possible (if unlikely)
for two IFN_GATHER_LOADs to have the same arguments but different
return types. E.g. the same scalar base and vector of 32-bit offsets
could be used to load 8-bit elements and to load 16-bit elements.
From just looking at the arguments, we could wrongly deduce that
they're equivalent.
I know we saw this happen at one point with IFN_WHILE_ULT,
and we dealt with it there by passing a zero of the return type
as an extra argument. Doing the same here also makes the load
and store functions have the same argument assignment.
For now this patch should be a no-op, but later SVE patches take
advantage of the new flexibility.
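As an illustration of the new flexibility (hypothetical code, not from
the patch), both loops below gather with a vector of 32-bit offsets but
load elements narrower than the offsets, 8-bit and 16-bit respectively:
#include <stdint.h>
#include <stddef.h>
void
gather_u8 (uint8_t *restrict r, const uint8_t *base,
           const uint32_t *restrict idx, size_t n)
{
  for (size_t i = 0; i < n; i++)
    r[i] = base[idx[i]];
}
void
gather_u16 (uint16_t *restrict r, const uint16_t *base,
            const uint32_t *restrict idx, size_t n)
{
  for (size_t i = 0; i < n; i++)
    r[i] = base[idx[i]];
}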
2019-11-08 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* optabs.def (gather_load_optab, mask_gather_load_optab)
(scatter_store_optab, mask_scatter_store_optab): Turn into
conversion optabs, with the offset mode given explicitly.
* doc/md.texi: Update accordingly.
* config/aarch64/aarch64-sve-builtins-base.cc
(svld1_gather_impl::expand): Likewise.
(svst1_scatter_impl::expand): Likewise.
* internal-fn.c (gather_load_direct, scatter_store_direct): Likewise.
(expand_scatter_store_optab_fn): Likewise.
(direct_gather_load_optab_supported_p): Likewise.
(direct_scatter_store_optab_supported_p): Likewise.
(expand_gather_load_optab_fn): Likewise. Expect the mask argument
to be argument 4.
(internal_fn_mask_index): Return 4 for IFN_MASK_GATHER_LOAD.
(internal_gather_scatter_fn_supported_p): Replace the offset sign
argument with the offset vector type. Require the two vector
types to have the same number of elements but allow their element
sizes to be different. Treat the optabs as conversion optabs.
* internal-fn.h (internal_gather_scatter_fn_supported_p): Update
prototype accordingly.
* optabs-query.c (supports_at_least_one_mode_p): Replace with...
(supports_vec_convert_optab_p): ...this new function.
(supports_vec_gather_load_p): Update accordingly.
(supports_vec_scatter_store_p): Likewise.
* tree-vectorizer.h (vect_gather_scatter_fn_p): Take a vec_info.
Replace the offset sign and bits parameters with a scalar type tree.
* tree-vect-data-refs.c (vect_gather_scatter_fn_p): Likewise.
Pass back the offset vector type instead of the scalar element type.
Allow the offset to be wider than the memory elements. Search for
an offset type that the target supports, stopping once we've
reached the maximum of the element size and pointer size.
Update call to internal_gather_scatter_fn_supported_p.
(vect_check_gather_scatter): Update calls accordingly.
When testing a new scale before knowing the final offset type,
check whether the scale is supported for any signed or unsigned
offset type. Check whether the target supports the source and
target types of a conversion before deciding whether to look
through the conversion. Record the chosen offset_vectype.
* tree-vect-patterns.c (vect_get_gather_scatter_offset_type): Delete.
(vect_recog_gather_scatter_pattern): Get the scalar offset type
directly from the gs_info's offset_vectype instead. Pass a zero
of the result type to IFN_GATHER_LOAD and IFN_MASK_GATHER_LOAD.
* tree-vect-stmts.c (check_load_store_masking): Update call to
internal_gather_scatter_fn_supported_p, passing the offset vector
type recorded in the gs_info.
(vect_truncate_gather_scatter_offset): Update call to
vect_check_gather_scatter, leaving it to search for a valid
offset vector type.
(vect_use_strided_gather_scatters_p): Convert the offset to the
element type of the gs_info's offset_vectype.
(vect_get_gather_scatter_ops): Get the offset vector type directly
from the gs_info.
(vect_get_strided_load_store_ops): Likewise.
(vectorizable_load): Pass a zero of the result type to IFN_GATHER_LOAD
and IFN_MASK_GATHER_LOAD.
* config/aarch64/aarch64-sve.md (gather_load<mode>): Rename to...
(gather_load<mode><v_int_equiv>): ...this.
(mask_gather_load<mode>): Rename to...
(mask_gather_load<mode><v_int_equiv>): ...this.
(scatter_store<mode>): Rename to...
(scatter_store<mode><v_int_equiv>): ...this.
(mask_scatter_store<mode>): Rename to...
(mask_scatter_store<mode><v_int_equiv>): ...this.
From-SVN: r277949
|
|
2019-09-30 Yuliang Wang <yuliang.wang@arm.com>
gcc/
* config/aarch64/aarch64-sve.md (sdiv_pow2<mode>3):
New pattern for ASRD.
* config/aarch64/iterators.md (UNSPEC_ASRD): New unspec.
* internal-fn.def (IFN_DIV_POW2): New internal function.
* optabs.def (sdiv_pow2_optab): New optab.
* tree-vect-patterns.c (vect_recog_divmod_pattern):
Modify pattern to support new operation.
* doc/md.texi (sdiv_pow2@var{m3}): Documentation for the above.
* doc/sourcebuild.texi (vect_sdiv_pow2_si):
Document new target selector.
gcc/testsuite/
* gcc.dg/vect/vect-sdiv-pow2-1.c: New test.
* gcc.target/aarch64/sve/asrdiv_1.c: As above.
* lib/target-supports.exp (check_effective_target_vect_sdiv_pow2_si):
Return true for AArch64 with SVE.
From-SVN: r276343
|
|
2019-09-12 Yuliang Wang <yuliang.wang@arm.com>
gcc/
PR tree-optimization/89386
* config/aarch64/aarch64-sve2.md (<su>mull<bt><Vwide>)
(<r>shrnb<mode>, <r>shrnt<mode>): New SVE2 patterns.
(<su>mulh<r>s<mode>3): New pattern for MULHRS.
* config/aarch64/iterators.md (UNSPEC_SMULLB, UNSPEC_SMULLT)
(UNSPEC_UMULLB, UNSPEC_UMULLT, UNSPEC_SHRNB, UNSPEC_SHRNT)
(UNSPEC_RSHRNB, UNSPEC_RSHRNT, UNSPEC_SMULHS, UNSPEC_SMULHRS)
UNSPEC_UMULHS, UNSPEC_UMULHRS): New unspecs.
(MULLBT, SHRNB, SHRNT, MULHRS): New int iterators.
(su, r): Handle the unspecs above.
(bt): New int attribute.
* internal-fn.def (IFN_MULHS, IFN_MULHRS): New internal functions.
* internal-fn.c (first_commutative_argument): Commutativity info for
above.
* optabs.def (smulhs_optab, smulhrs_optab, umulhs_optab)
(umulhrs_optab): New optabs.
* doc/md.texi (smulhs@var{m3}, umulhs@var{m3})
(smulhrs@var{m3}, umulhrs@var{m3}): Documentation for the above.
* tree-vect-patterns.c (vect_recog_mulhs_pattern): New pattern
function.
(vect_vect_recog_func_ptrs): Add it.
* testsuite/gcc.target/aarch64/sve2/mulhrs_1.c: New test.
* testsuite/gcc.dg/vect/vect-mulhrs-1.c: As above.
* testsuite/gcc.dg/vect/vect-mulhrs-2.c: As above.
* testsuite/gcc.dg/vect/vect-mulhrs-3.c: As above.
* testsuite/gcc.dg/vect/vect-mulhrs-4.c: As above.
* doc/sourcebuild.texi (vect_mulhrs_hi): Document new target selector.
* testsuite/lib/target-supports.exp
(check_effective_target_vect_mulhrs_hi): Return true for AArch64
with SVE2.
From-SVN: r275682
|
|
gcc/ChangeLog:
2019-08-26 Tejas Joshi <tejasjoshi9673@gmail.com>
Uros Bizjak <ubizjak@gmail.com>
* builtins.c (mathfn_built_in_2): Change CASE_MATHFN to
CASE_MATHFN_FLOATN for roundeven.
* config/i386/i386.c (ix86_i387_mode_needed): Add case
I387_ROUNDEVEN.
(ix86_mode_needed): Likewise.
(ix86_mode_after): Likewise.
(ix86_mode_entry): Likewise.
(ix86_mode_exit): Likewise.
(ix86_emit_mode_set): Likewise.
(emit_i387_cw_initialization): Add case I387_CW_ROUNDEVEN.
* config/i386/i386.h (ix86_stack_slot): Add SLOT_CW_ROUNDEVEN.
(ix86_entry): Add I387_ROUNDEVEN.
(avx_u128_state): Add I387_CW_ANY.
* config/i386/i386.md: Define UNSPEC_FRNDINT_ROUNDEVEN.
(define_int_iterator): Likewise.
(define_int_attr): Likewise for rounding_insn, rounding and ROUNDING.
(define_constant): Define ROUND_ROUNDEVEN mode.
(define_attr): Add roundeven mode for i387_cw.
(<rounding_insn><mode>2): Add condition for ROUND_ROUNDEVEN.
* internal-fn.def (ROUNDEVEN): New builtin function.
* optabs.def (roundeven_optab): New optab.
gcc/testsuite/ChangeLog:
2019-08-26 Tejas Joshi <tejasjoshi9673@gmail.com>
* gcc.target/i386/sse4_1-round-roundeven-1.c: New test.
* gcc.target/i386/sse4_1-round-roundeven-2.c: New test.
Co-Authored-By: Uros Bizjak <ubizjak@gmail.com>
From-SVN: r274928
|
|
This patch adds support for IFN_COND shifts left and shifts right.
This is mostly mechanical, but since we try to handle conditional
operations in the same way as unconditional operations in match.pd,
we need to support IFN_COND shifts by scalars as well as vectors.
E.g.:
IFN_COND_SHL (cond, a, { 1, 1, ... }, fallback)
and:
IFN_COND_SHL (cond, a, 1, fallback)
are the same operation, with:
(for shiftrotate (lrotate rrotate lshift rshift)
 ...
 /* Prefer vector1 << scalar to vector1 << vector2
    if vector2 is uniform.  */
 (for vec (VECTOR_CST CONSTRUCTOR)
  (simplify
   (shiftrotate @0 vec@1)
   (with { tree tem = uniform_vector_p (@1); }
    (if (tem)
     (shiftrotate @0 { tem; }))))))
preferring the latter. The patch copes with this by extending
create_convert_operand_from to handle scalar-to-vector conversions.
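A source-level example of the uniform-shift case (illustrative sketch;
not part of the patch or its tests):
#include <stdint.h>
#include <stddef.h>
/* The shift amount is a uniform scalar, so after the match.pd
   simplification the conditional shift carries a scalar operand that
   the expander has to broadcast to a vector.  */
void
cond_shl (int32_t *restrict r, const int32_t *restrict a,
          const int32_t *restrict pred, size_t n)
{
  for (size_t i = 0; i < n; i++)
    r[i] = pred[i] ? (a[i] << 1) : a[i];
}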
2019-08-15 Richard Sandiford <richard.sandiford@arm.com>
Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
gcc/
* internal-fn.def (IFN_COND_SHL, IFN_COND_SHR): New internal functions.
* internal-fn.c (FOR_EACH_CODE_MAPPING): Handle shifts.
* match.pd (UNCOND_BINARY, COND_BINARY): Likewise.
* optabs.def (cond_ashl_optab, cond_ashr_optab, cond_lshr_optab): New
optabs.
* optabs.h (create_convert_operand_from): Expand comment.
* optabs.c (maybe_legitimize_operand): Allow implicit broadcasts
when mapping scalar rtxes to vector operands.
* config/aarch64/iterators.md (SVE_INT_BINARY): Add ashift,
ashiftrt and lshiftrt.
(sve_int_op, sve_int_op_rev, sve_pred_int_rhs2_operand): Handle them.
* config/aarch64/aarch64-sve.md (*cond_<optab><mode>_2_const)
(*cond_<optab><mode>_any_const): New patterns.
gcc/testsuite/
* gcc.target/aarch64/sve/cond_shift_1.c: New test.
* gcc.target/aarch64/sve/cond_shift_1_run.c: Likewise.
* gcc.target/aarch64/sve/cond_shift_2.c: Likewise.
* gcc.target/aarch64/sve/cond_shift_2_run.c: Likewise.
* gcc.target/aarch64/sve/cond_shift_3.c: Likewise.
* gcc.target/aarch64/sve/cond_shift_3_run.c: Likewise.
* gcc.target/aarch64/sve/cond_shift_4.c: Likewise.
* gcc.target/aarch64/sve/cond_shift_4_run.c: Likewise.
* gcc.target/aarch64/sve/cond_shift_5.c: Likewise.
* gcc.target/aarch64/sve/cond_shift_5_run.c: Likewise.
* gcc.target/aarch64/sve/cond_shift_6.c: Likewise.
* gcc.target/aarch64/sve/cond_shift_6_run.c: Likewise.
* gcc.target/aarch64/sve/cond_shift_7.c: Likewise.
* gcc.target/aarch64/sve/cond_shift_7_run.c: Likewise.
* gcc.target/aarch64/sve/cond_shift_8.c: Likewise.
* gcc.target/aarch64/sve/cond_shift_8_run.c: Likewise.
* gcc.target/aarch64/sve/cond_shift_9.c: Likewise.
* gcc.target/aarch64/sve/cond_shift_9_run.c: Likewise.
Co-Authored-By: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
From-SVN: r274505
|
|
2019-07-02 Aaron Sawdey <acsawdey@linux.ibm.com>
* optabs.def (movmem_optab): Add movmem back for memmove().
* doc/md.texi: Add description of movmem pattern for overlapping move.
From-SVN: r272946
|
|
2019-06-27 Aaron Sawdey <acsawdey@linux.ibm.com>
* builtins.c (get_memory_rtx): Fix comment.
* optabs.def (movmem_optab): Change to cpymem_optab.
* expr.c (emit_block_move_via_cpymem): Change movmem to cpymem.
(emit_block_move_hints): Change movmem to cpymem.
* defaults.h: Change movmem to cpymem.
* targhooks.c (get_move_ratio): Change movmem to cpymem.
(default_use_by_pieces_infrastructure_p): Ditto.
* config/aarch64/aarch64-protos.h: Change movmem to cpymem.
* config/aarch64/aarch64.c (aarch64_expand_movmem): Change movmem
to cpymem.
* config/aarch64/aarch64.h: Change movmem to cpymem.
* config/aarch64/aarch64.md (movmemdi): Change name to cpymemdi.
* config/alpha/alpha.h: Change movmem to cpymem in comment.
* config/alpha/alpha.md (movmemqi, movmemdi, *movmemdi_1): Change
movmem to cpymem.
* config/arc/arc-protos.h: Change movmem to cpymem.
* config/arc/arc.c (arc_expand_movmem): Change movmem to cpymem.
* config/arc/arc.h: Change movmem to cpymem in comment.
* config/arc/arc.md (movmemsi): Change movmem to cpymem.
* config/arm/arm-protos.h: Change movmem to cpymem in names.
* config/arm/arm.c (arm_movmemqi_unaligned, arm_gen_movmemqi,
gen_movmem_ldrd_strd, thumb_expand_movmemqi) Change movmem to cpymem.
* config/arm/arm.md (movmemqi): Change movmem to cpymem.
* config/arm/thumb1.md (movmem12b, movmem8b): Change movmem to cpymem.
* config/avr/avr-protos.h: Change movmem to cpymem.
* config/avr/avr.c (avr_adjust_insn_length, avr_emit_movmemhi,
avr_out_movmem): Change movmem to cpymem.
* config/avr/avr.md (movmemhi, movmem_<mode>, movmemx_<mode>):
Change movmem to cpymem.
* config/bfin/bfin-protos.h: Change movmem to cpymem.
* config/bfin/bfin.c (single_move_for_movmem, bfin_expand_movmem):
Change movmem to cpymem.
* config/bfin/bfin.h: Change movmem to cpymem in comment.
* config/bfin/bfin.md (movmemsi): Change name to cpymemsi.
* config/c6x/c6x-protos.h: Change movmem to cpymem.
* config/c6x/c6x.c (c6x_expand_movmem): Change movmem to cpymem.
* config/c6x/c6x.md (movmemsi): Change name to cpymemsi.
* config/frv/frv.md (movmemsi): Change name to cpymemsi.
* config/ft32/ft32.md (movmemsi): Change name to cpymemsi.
* config/h8300/h8300.md (movmemsi): Change name to cpymemsi.
* config/i386/i386-expand.c (expand_set_or_movmem_via_loop,
expand_set_or_movmem_via_rep, expand_movmem_epilogue,
expand_setmem_epilogue_via_loop, expand_set_or_cpymem_prologue,
expand_small_cpymem_or_setmem,
expand_set_or_cpymem_prologue_epilogue_by_misaligned_moves,
expand_set_or_cpymem_constant_prologue,
ix86_expand_set_or_cpymem): Change movmem to cpymem.
* config/i386/i386-protos.h: Change movmem to cpymem.
* config/i386/i386.h: Change movmem to cpymem in comment.
* config/i386/i386.md (movmem<mode>): Change name to cpymem.
(setmem<mode>): Change expansion function name.
* config/lm32/lm32.md (movmemsi): Change name to cpymemsi.
* config/m32c/blkmov.md (movmemhi, movmemhi_bhi_op, movmemhi_bpsi_op,
movmemhi_whi_op, movmemhi_wpsi_op): Change movmem to cpymem.
* config/m32c/m32c-protos.h: Change movmem to cpymem.
* config/m32c/m32c.c (m32c_expand_movmemhi): Change movmem to cpymem.
* config/m32r/m32r.c (m32r_expand_block_move): Change movmem to cpymem.
* config/m32r/m32r.md (movmemsi, movmemsi_internal): Change movmem
to cpymem.
* config/mcore/mcore.md (movmemsi): Change name to cpymemsi.
* config/microblaze/microblaze.c: Change movmem to cpymem in comment.
* config/microblaze/microblaze.md (movmemsi): Change name to cpymemsi.
* config/mips/mips.c (mips_use_by_pieces_infrastructure_p):
Change movmem to cpymem.
* config/mips/mips.h: Change movmem to cpymem.
* config/mips/mips.md (movmemsi): Change name to cpymemsi.
* config/nds32/nds32-memory-manipulation.c
(nds32_expand_movmemsi_loop_unknown_size,
nds32_expand_movmemsi_loop_known_size, nds32_expand_movmemsi_loop,
nds32_expand_movmemsi_unroll,
nds32_expand_movmemsi): Change movmem to cpymem.
* config/nds32/nds32-multiple.md (movmemsi): Change name to cpymemsi.
* config/nds32/nds32-protos.h: Change movmem to cpymem.
* config/pa/pa.c (compute_movmem_length): Change movmem to cpymem.
(pa_adjust_insn_length): Change call to compute_movmem_length.
* config/pa/pa.md (movmemsi, movmemsi_prereload, movmemsi_postreload,
movmemdi, movmemdi_prereload,
movmemdi_postreload): Change movmem to cpymem.
* config/pdp11/pdp11.md (movmemhi, movmemhi1,
movmemhi_nocc, UNSPEC_MOVMEM): Change movmem to cpymem.
* config/riscv/riscv.c: Change movmem to cpymem in comment.
* config/riscv/riscv.h: Change movmem to cpymem.
* config/riscv/riscv.md: (movmemsi) Change name to cpymemsi.
* config/rs6000/rs6000.md: (movmemsi) Change name to cpymemsi.
* config/rx/rx.md: (UNSPEC_MOVMEM, movmemsi, rx_movmem): Change
movmem to cpymem.
* config/s390/s390-protos.h: Change movmem to cpymem.
* config/s390/s390.c (s390_expand_movmem, s390_expand_setmem,
s390_expand_insv): Change movmem to cpymem.
* config/s390/s390.md (movmem<mode>, movmem_short, *movmem_short,
movmem_long, *movmem_long, *movmem_long_31z): Change movmem to cpymem.
* config/sh/sh.md (movmemsi): Change name to cpymemsi.
* config/sparc/sparc.h: Change movmem to cpymem in comment.
* config/vax/vax-protos.h (vax_output_movmemsi): Remove prototype
for nonexistent function.
* config/vax/vax.h: Change movmem to cpymem in comment.
* config/vax/vax.md (movmemhi, movmemhi1): Change movmem to cpymem.
* config/visium/visium.h: Change movmem to cpymem in comment.
* config/visium/visium.md (movmemsi): Change name to cpymemsi.
* config/xtensa/xtensa.md (movmemsi): Change name to cpymemsi.
* doc/md.texi: Change movmem to cpymem and update description to match.
* doc/rtl.texi: Change movmem to cpymem.
* target.def (use_by_pieces_infrastructure_p): Change movmem to cpymem.
* doc/tm.texi: Regenerate.
From-SVN: r272755
|
|
* doc/md.texi: Document vec_shl_<mode> pattern.
* optabs.def (vec_shl_optab): New optab.
* optabs.c (shift_amt_for_vec_perm_mask): Add shift_optab
argument, if == vec_shl_optab, check for left whole vector shift
pattern rather than right shift.
(expand_vec_perm_const): Add vec_shl_optab support.
* optabs-query.c (can_vec_perm_var_p): Mention also vec_shl optab
in the comment.
* tree-vect-generic.c (lower_vec_perm): Support permutations which
can be handled by vec_shl_optab.
* tree-vect-stmts.c (scan_store_can_perm_p): New function.
(check_scan_store): Use it.
(vectorizable_scan_store): If target can't do normal permutations,
try to use whole vector left shifts and if needed a VEC_COND_EXPR
after it.
* config/i386/sse.md (vec_shl_<mode>): New expander.
* gcc.dg/vect/vect-simd-8.c: If main is defined, don't include
tree-vect.h nor call check_vect.
* gcc.dg/vect/vect-simd-9.c: Likewise.
* gcc.dg/vect/vect-simd-10.c: New test.
* gcc.target/i386/sse2-vect-simd-8.c: New test.
* gcc.target/i386/sse2-vect-simd-9.c: New test.
* gcc.target/i386/sse2-vect-simd-10.c: New test.
* gcc.target/i386/avx2-vect-simd-8.c: New test.
* gcc.target/i386/avx2-vect-simd-9.c: New test.
* gcc.target/i386/avx2-vect-simd-10.c: New test.
* gcc.target/i386/avx512f-vect-simd-8.c: New test.
* gcc.target/i386/avx512f-vect-simd-9.c: New test.
* gcc.target/i386/avx512f-vect-simd-10.c: New test.
From-SVN: r272472
|
|
This patch adds support in the vectorizer for masking fold-left reductions.
This avoids the need to insert a conditional assignment with some identity
value.
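An in-order reduction of the kind this applies to (illustrative sketch only):
#include <stddef.h>
/* A fold-left (in-order) sum: under a fully-masked loop the inactive
   lanes can now be masked within the reduction itself instead of first
   being replaced by a 0.0 identity value.  */
double
fold_left_sum (const double *a, size_t n)
{
  double sum = 0.0;
  for (size_t i = 0; i < n; i++)
    sum += a[i];
  return sum;
}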
From-SVN: r272407
|
|
From-SVN: r267494
|
|
PR target/88556
* internal-fn.def (COSH): New.
(SINH): Ditto.
(TANH): Ditto.
* optabs.def (cosh_optab): New.
(sinh_optab): Ditto.
(tanh_optab): Ditto.
* config/i386/i386-protos.h (ix86_emit_i387_sinh): New prototype.
(ix86_emit_i387_cosh): Ditto.
(ix86_emit_i387_tanh): Ditto.
* config/i386/i386.c (ix86_emit_i387_sinh): New function.
(ix86_emit_i387_cosh): Ditto.
(ix86_emit_i387_tanh): Ditto.
* config/i386/i386.md (sinhxf2): New expander.
(sinh<mode>2): Ditto.
(coshxf2): Ditto.
(cosh<mode>2): Ditto.
(tanhxf2): Ditto.
(tanh<mode>2): Ditto.
From-SVN: r267325
|
|
PR target/88513
PR target/88514
* optabs.def (vec_pack_sbool_trunc_optab, vec_unpacks_sbool_hi_optab,
vec_unpacks_sbool_lo_optab): New optabs.
* optabs.c (expand_widen_pattern_expr): Use vec_unpacks_sbool_*_optab
and pass additional argument if both input and target have the same
scalar mode of VECTOR_BOOLEAN_TYPE_P vectors.
* expr.c (expand_expr_real_2) <case VEC_PACK_TRUNC_EXPR>: Handle
VECTOR_BOOLEAN_TYPE_P pack where result has the same scalar mode
as the operands using vec_pack_sbool_trunc_optab.
* tree-vect-stmts.c (supportable_widening_operation): Use
vec_unpacks_sbool_{lo,hi}_optab for VECTOR_BOOLEAN_TYPE_P conversions
where both wider_vectype and vectype have the same scalar mode.
(supportable_narrowing_operation): Similarly use
vec_pack_sbool_trunc_optab if narrow_vectype and vectype have the same
scalar mode.
* config/i386/i386.c (ix86_get_builtin)
<case IX86_BUILTIN_GATHER3ALTDIV8SF>: Check for VECTOR_MODE_P
rather than non-VOIDmode.
* config/i386/sse.md (vec_pack_trunc_qi, vec_pack_trunc_<mode>):
Remove useless ()s around "register_operand", formatting fixes.
(vec_pack_sbool_trunc_qi, vec_unpacks_sbool_lo_qi,
vec_unpacks_sbool_hi_qi): New expanders.
* doc/md.texi (vec_pack_sbool_trunc_M, vec_unpacks_sbool_hi_M,
vec_unpacks_sbool_lo_M): Document.
* gcc.target/i386/avx512f-pr88513-1.c: New test.
* gcc.target/i386/avx512f-pr88513-2.c: New test.
* gcc.target/i386/avx512vl-pr88464-1.c: New test.
* gcc.target/i386/avx512vl-pr88464-2.c: New test.
* gcc.target/i386/avx512vl-pr88464-3.c: New test.
* gcc.target/i386/avx512vl-pr88464-4.c: New test.
* gcc.target/i386/avx512vl-pr88513-1.c: New test.
* gcc.target/i386/avx512vl-pr88513-2.c: New test.
* gcc.target/i386/avx512vl-pr88513-3.c: New test.
* gcc.target/i386/avx512vl-pr88513-4.c: New test.
* gcc.target/i386/avx512vl-pr88514-1.c: New test.
* gcc.target/i386/avx512vl-pr88514-2.c: New test.
* gcc.target/i386/avx512vl-pr88514-3.c: New test.
From-SVN: r267228
|
|
PR target/88502
* internal-fn.def (ACOSH): New.
(ASINH): Ditto.
(ATANH): Ditto.
* optabs.def (acosh_optab): New.
(asinh_optab): Ditto.
(atanh_optab): Ditto.
* config/i386/i386-protos.h (ix86_emit_i387_asinh): New prototype.
(ix86_emit_i387_acosh): Ditto.
(ix86_emit_i387_atanh): Ditto.
* config/i386/i386.c (ix86_emit_i387_asinh): New function.
(ix86_emit_i387_acosh): Ditto.
(ix86_emit_i387_atanh): Ditto.
* config/i386/i386.md (asinhxf2): New expander.
(asinh<mode>2): Ditto.
(acoshxf2): Ditto.
(acosh<mode>2): Ditto.
(atanhxf2): Ditto.
(atanh<mode>2): Ditto.
From-SVN: r267204
|