path: root/gcc/internal-fn.def
2024-12-10  Remove vcond{,u,eq} optabs  (Richard Sandiford; 1 file, -3/+0)
This patch removes the remaining traces of the vcond{,u,eq} optabs. Earlier patches removed the target-independent uses and I couldn't find any direct references to either the *_optabs or the ifns in target-specific code. gcc/ * doc/md.texi (vcond@var{m}@var{n}, vcondu@var{m}@var{n}) (vcondeq@var{m}@var{n}): Delete. (vcond_mask_@var{m}@var{n}): Redocument in standalone form. * internal-fn.def (VCOND, VCONDU, VCONDEQ): Delete. * internal-fn.cc (expand_vec_cond_optab_fn): Delete. * optabs.def (vcond_optab, vcondu_optab, vcondeq_optab): Delete.
2024-11-28  [PATCH v6 01/12] Implement internal functions for efficient CRC computation.  (Mariam Arutunian; 1 file, -0/+2)
Add two new internal functions (IFN_CRC, IFN_CRC_REV), to provide faster CRC generation. One performs bit-forward and the other bit-reversed CRC computation. If CRC optabs are supported, they are used for the CRC computation. Otherwise, table-based CRC is generated. The supported data and CRC sizes are 8, 16, 32, and 64 bits. The polynomial is without the leading 1. A table with 256 elements is used to store precomputed CRCs. For the reflection of inputs and the output, a simple algorithm involving SHIFT, AND, and OR operations is used. gcc/ * doc/md.texi (crc@var{m}@var{n}4, crc_rev@var{m}@var{n}4): Document. * expr.cc (calculate_crc): New function. (assemble_crc_table): Likewise. (generate_crc_table): Likewise. (calculate_table_based_CRC): Likewise. (expand_crc_table_based): Likewise. (gen_common_operation_to_reflect): Likewise. (reflect_64_bit_value): Likewise. (reflect_32_bit_value): Likewise. (reflect_16_bit_value): Likewise. (reflect_8_bit_value): Likewise. (generate_reflecting_code_standard): Likewise. (expand_reversed_crc_table_based): Likewise. * expr.h (generate_reflecting_code_standard): New function declaration. (expand_crc_table_based): Likewise. (expand_reversed_crc_table_based): Likewise. * internal-fn.cc: (crc_direct): Define. (direct_crc_optab_supported_p): Likewise. (expand_crc_optab_fn): New function * internal-fn.def (CRC, CRC_REV): New internal functions. * optabs.def (crc_optab, crc_rev_optab): New optabs. Signed-off-by: Mariam Arutunian <mariamarutunian@gmail.com> Co-authored-by: Joern Rennecke <joern.rennecke@embecosm.com> Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
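For reference, the table-based fallback follows the classic byte-at-a-time scheme. A minimal sketch in C, assuming a bit-reversed CRC-32 with the reflected polynomial 0xEDB88320 (the names and constants here are illustrative, not the ones used by expr.cc):

  #include <stdint.h>
  #include <stddef.h>

  /* Build the 256-entry table for a bit-reversed CRC, as the table-based
     fallback does.  POLY is the reflected polynomial without the leading 1
     (0xEDB88320 for CRC-32).  */
  static uint32_t crc_table[256];

  static void
  init_crc_table (uint32_t poly)
  {
    for (int i = 0; i < 256; i++)
      {
        uint32_t c = i;
        for (int k = 0; k < 8; k++)
          c = (c >> 1) ^ ((c & 1) ? poly : 0);
        crc_table[i] = c;
      }
  }

  /* One table lookup per input byte.  */
  static uint32_t
  crc32_rev_update (uint32_t crc, const uint8_t *buf, size_t len)
  {
    for (size_t i = 0; i < len; i++)
      crc = (crc >> 8) ^ crc_table[(crc ^ buf[i]) & 0xff];
    return crc;
  }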
2024-11-20  OpenMP: middle-end support for dispatch + adjust_args  (Paul-Antoine Arras; 1 file, -0/+1)
This patch adds middle-end support for the `dispatch` construct and the `adjust_args` clause. The heavy lifting is done in `gimplify_omp_dispatch` and `gimplify_call_expr` respectively. For `adjust_args`, this mostly consists in emitting a call to `omp_get_mapped_ptr` for the adequate device. For dispatch, the following steps are performed: * Handle the device clause, if any: set the default-device ICV at the top of the dispatch region and restore its previous value at the end. * Handle novariants and nocontext clauses, if any. Evaluate compile-time constants and select a variant, if possible. Otherwise, emit code to handle all possible cases at run time. gcc/ChangeLog: * builtins.cc (builtin_fnspec): Handle BUILT_IN_OMP_GET_MAPPED_PTR. * gimple-low.cc (lower_stmt): Handle GIMPLE_OMP_DISPATCH. * gimple-pretty-print.cc (dump_gimple_omp_dispatch): New function. (pp_gimple_stmt_1): Handle GIMPLE_OMP_DISPATCH. * gimple-walk.cc (walk_gimple_stmt): Likewise. * gimple.cc (gimple_build_omp_dispatch): New function. (gimple_copy): Handle GIMPLE_OMP_DISPATCH. * gimple.def (GIMPLE_OMP_DISPATCH): Define. * gimple.h (gimple_build_omp_dispatch): Declare. (gimple_has_substatements): Handle GIMPLE_OMP_DISPATCH. (gimple_omp_dispatch_clauses): New function. (gimple_omp_dispatch_clauses_ptr): Likewise. (gimple_omp_dispatch_set_clauses): Likewise. (gimple_return_set_retval): Handle GIMPLE_OMP_DISPATCH. * gimplify.cc (enum omp_region_type): Add ORT_DISPATCH. (struct gimplify_omp_ctx): Add in_call_args. (gimplify_call_expr): Handle need_device_ptr arguments. (is_gimple_stmt): Handle OMP_DISPATCH. (gimplify_scan_omp_clauses): Handle OMP_CLAUSE_DEVICE in a dispatch construct. Handle OMP_CLAUSE_NOVARIANTS and OMP_CLAUSE_NOCONTEXT. (omp_has_novariants): New function. (omp_has_nocontext): Likewise. (omp_construct_selector_matches): Handle OMP_DISPATCH with nocontext clause. (find_ifn_gomp_dispatch): New function. (gimplify_omp_dispatch): Likewise. (gimplify_expr): Handle OMP_DISPATCH. * gimplify.h (omp_has_novariants): Declare. * internal-fn.cc (expand_GOMP_DISPATCH): New function. * internal-fn.def (GOMP_DISPATCH): Define. * omp-builtins.def (BUILT_IN_OMP_GET_MAPPED_PTR): Define. (BUILT_IN_OMP_GET_DEFAULT_DEVICE): Define. (BUILT_IN_OMP_SET_DEFAULT_DEVICE): Define. * omp-general.cc (omp_construct_traits_to_codes): Add OMP_DISPATCH. (struct omp_ts_info): Add dispatch. (omp_resolve_declare_variant): Handle novariants. Adjust DECL_ASSEMBLER_NAME. * omp-low.cc (scan_omp_1_stmt): Handle GIMPLE_OMP_DISPATCH. (lower_omp_dispatch): New function. (lower_omp_1): Call it. * tree-inline.cc (remap_gimple_stmt): Handle GIMPLE_OMP_DISPATCH. (estimate_num_insns): Handle GIMPLE_OMP_DISPATCH.
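As a source-level illustration, the construct and clause being lowered look roughly like this (a hedged sketch; the function names are hypothetical and the pragma syntax follows OpenMP 5.1):

  /* base() may be replaced by variant() inside an "omp dispatch" region;
     the adjust_args clause asks for p to be translated with
     omp_get_mapped_ptr for the active device.  */
  void variant (double *p, int n);
  #pragma omp declare variant (variant) match (construct={dispatch}) \
              adjust_args (need_device_ptr : p)
  void base (double *p, int n);

  void
  caller (double *p, int n, int dev)
  {
    #pragma omp dispatch device(dev)
    base (p, n);
  }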
2024-11-13  aarch64: Optimise calls to ldexp with SVE FSCALE instruction [PR111733]  (Soumya AR; 1 file, -1/+1)
This patch uses the FSCALE instruction provided by SVE to implement the standard ldexp family of functions. Currently, with '-Ofast -mcpu=neoverse-v2', GCC generates libcalls for the following code: float test_ldexpf (float x, int i) { return __builtin_ldexpf (x, i); } double test_ldexp (double x, int i) { return __builtin_ldexp(x, i); } GCC Output: test_ldexpf: b ldexpf test_ldexp: b ldexp Since SVE has support for an FSCALE instruction, we can use this to process scalar floats by moving them to a vector register and performing an fscale call, similar to how LLVM tackles an ldexp builtin as well. New Output: test_ldexpf: fmov s31, w0 ptrue p7.b, vl4 fscale z0.s, p7/m, z0.s, z31.s ret test_ldexp: sxtw x0, w0 ptrue p7.b, vl8 fmov d31, x0 fscale z0.d, p7/m, z0.d, z31.d ret This is a revision of an earlier patch, and now uses the extended definition of aarch64_ptrue_reg to generate predicate registers with the appropriate set bits. The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. OK for mainline? Signed-off-by: Soumya AR <soumyaa@nvidia.com> gcc/ChangeLog: PR target/111733 * config/aarch64/aarch64-sve.md (ldexp<mode>3): Added a new pattern to match ldexp calls with scalar floating modes and expand to the existing pattern for FSCALE. * config/aarch64/iterators.md: (SVE_FULL_F_SCALAR): Added an iterator to match all FP SVE modes as well as their scalar equivalents. (VPRED): Extended the attribute to handle GPF_HF modes. * internal-fn.def (LDEXP): Changed macro to incorporate ldexpf16. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/fscale.c: New test.
2024-11-06  openmp: Add IFN_GOMP_MAX_VF  (Andrew Stubbs; 1 file, -0/+1)
Delay omp_max_vf call until after the host and device compilers have diverged so that the max_vf value can be tuned exactly right on both variants. This change means that the ompdevlow pass must be enabled for functions that use OpenMP directives with both "simd" and "schedule" enabled. gcc/ChangeLog: * internal-fn.cc (expand_GOMP_MAX_VF): New function. * internal-fn.def (GOMP_MAX_VF): New internal function. * omp-expand.cc (omp_adjust_chunk_size): Emit IFN_GOMP_MAX_VF when called in offload context, otherwise assume host context. * omp-offload.cc (execute_omp_device_lower): Expand IFN_GOMP_MAX_VF.
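A loop shape that exercises this path combines "simd" with "schedule", since omp_adjust_chunk_size must round the chunk size to a multiple of the (now possibly device-specific) max VF. A sketch:

  /* With offloading, the right vectorization factor is only known once
     host and device compilation have diverged, hence the late
     IFN_GOMP_MAX_VF expansion in the ompdevlow pass.  */
  void
  f (int *a, int n)
  {
  #pragma omp parallel for simd schedule (simd:static, 5)
    for (int i = 0; i < n; i++)
      a[i] += 1;
  }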
2024-10-29  Internal-fn: Introduce new IFN MASK_LEN_STRIDED_LOAD{STORE}  (Pan Li; 1 file, -0/+6)
This patch introduces new IFNs for strided load and store.

LOAD:  v = MASK_LEN_STRIDED_LOAD (ptr, stride, mask, len, bias)
STORE: MASK_LEN_STRIDED_STORE (ptr, stride, v, mask, len, bias)

The IFNs target code similar to the example below:

  void foo (int * a, int * b, int stride, int n)
  {
    for (int i = 0; i < n; i++)
      a[i * stride] = b[i * stride];
  }

The below test suites passed for this patch:
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.

gcc/ChangeLog:

* internal-fn.cc (strided_load_direct): Add new define direct for strided load.
(strided_store_direct): Ditto but for store.
(expand_strided_load_optab_fn): Add new func to expand the IFN MASK_LEN_STRIDED_LOAD in middle-end.
(expand_strided_store_optab_fn): Ditto but for store.
(direct_strided_load_optab_supported_p): Add define for stride load optab supported.
(direct_strided_store_optab_supported_p): Ditto but for store.
(internal_fn_len_index): Add strided load/store len index.
(internal_fn_mask_index): Ditto but for mask.
(internal_fn_stored_value_index): Add strided store value index.
* internal-fn.def (MASK_LEN_STRIDED_LOAD): Add new IFN for strided load.
(MASK_LEN_STRIDED_STORE): Ditto but for store.
* optabs.def (mask_len_strided_load_optab): Add strided load optab.
(mask_len_strided_store_optab): Add strided store optab.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored-By: Juzhe-Zhong <juzhe.zhong@rivai.ai>
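Element-wise, the new load behaves roughly like the scalar loop below (a sketch only: the bias operand is omitted and the stride is taken in bytes here; the actual semantics are defined in internal-fn.cc):

  #include <stdbool.h>

  /* v[i] = *(ptr + i * stride) for active elements i < len;
     everything else is left untouched.  */
  void
  mask_len_strided_load (int *v, const char *ptr, long stride,
                         const bool *mask, int len)
  {
    for (int i = 0; i < len; i++)
      if (mask[i])
        v[i] = *(const int *) (ptr + i * stride);
  }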
2024-09-04  coros: mark .CO_YIELD as LEAF [PR106973]  (Arsen Arsenović; 1 file, -1/+1)
We rely on .CO_YIELD calls being followed by an assignment (optionally) and then a switch/if in the same basic block. This implies that a .CO_YIELD can never end a block. However, since a call to .CO_YIELD is still a call, if the function containing it calls setjmp, GCC thinks that the .CO_YIELD can introduce abnormal control flow, and generates an edge for the call. We know this is not the case; .CO_YIELD calls get removed quite early on and have no effect, and result in no other calls, so .CO_YIELD can be considered a leaf function, preventing generating an edge when calling it. PR c++/106973 - coroutine generator and setjmp PR c++/106973 gcc/ChangeLog: * internal-fn.def (CO_YIELD): Mark as ECF_LEAF. gcc/testsuite/ChangeLog: * g++.dg/coroutines/pr106973.C: New test.
2024-07-24  optabs/rs6000: Rename iorc and andc to iorn and andn  (Andrew Pinski; 1 file, -2/+2)
When I was trying to add a scalar version of iorc and andc, the optab that got matched was for and/ior with the modes csi and cdi instead of the iorc and andc optabs for the si and di modes. Since csi/cdi are the complex integer modes, we need to rename the optabs so they don't start with c. This changes the c to an n, which is neutral and known not to be the first letter of any mode.

Bootstrapped and tested on x86_64 and powerpc64le.

gcc/ChangeLog:

* config/rs6000/rs6000-builtins.def: s/iorc/iorn/. s/andc/andn/ for the code.
* config/rs6000/rs6000-string.cc (expand_cmp_vec_sequence): Update to iorn.
* config/rs6000/rs6000.md (andc<mode>3): Rename to ...
(andn<mode>3): This.
(iorc<mode>3): Rename to ...
(iorn<mode>3): This.
* doc/md.texi: Update documentation for the rename.
* internal-fn.def (BIT_ANDC): Rename to ...
(BIT_ANDN): This.
(BIT_IORC): Rename to ...
(BIT_IORN): This.
* optabs.def (andc_optab): Rename to ...
(andn_optab): This.
(iorc_optab): Rename to ...
(iorn_optab): This.
* gimple-isel.cc (gimple_expand_vec_cond_expr): Update for the renamed internal functions, ANDC/IORC to ANDN/IORN.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
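On scalars the renamed operations compute the following; this is a sketch assuming (as in the rs6000 andc/iorc RTL patterns) that the second operand is the complemented one:

  /* op0 = op1 & ~op2 and op0 = op1 | ~op2.  */
  static inline unsigned
  andn (unsigned a, unsigned b) { return a & ~b; }

  static inline unsigned
  iorn (unsigned a, unsigned b) { return a | ~b; }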
2024-07-08  isel: Fold more in gimple_expand_vec_cond_expr with andc and iorc [PR115659]  (Kewen Lin; 1 file, -0/+4)
As PR115659 shows, assuming c = x CMP y, there are some folding chances for the patterns r = c ? 0/z : z/-1:
- for r = c ? 0 : z, it can be folded into r = ~c & z.
- for r = c ? z : -1, it can be folded into r = ~c | z.

But BIT_AND/BIT_IOR applied to one BIT_NOT operand is a compound operation, so it is arguable whether it beats vector selection. This patch therefore introduces the new optabs andc and iorc and their corresponding internal functions BIT_{ANDC,IORC}; if a target defines such optabs for vector modes, it means the target supports these hardware insns and they should be no worse than vector selection.

PR tree-optimization/115659

gcc/ChangeLog:

* doc/md.texi: Document andcm3 and iorcm3.
* gimple-isel.cc (gimple_expand_vec_cond_expr): Add more foldings for patterns x CMP y ? 0 : z and x CMP y ? z : -1.
* internal-fn.def (BIT_ANDC): New internal function.
(BIT_IORC): Likewise.
* optabs.def (andc, iorc): New optab.
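Both folds are plain bitwise identities once c is an all-ones/all-zeros vector mask element; a brute-force scalar check (illustrative only):

  #include <assert.h>

  int
  main (void)
  {
    for (int z = -2; z <= 2; z++)
      for (int c = -1; c <= 0; c++)   /* c is all-ones or all-zeros */
        {
          assert ((c ? 0 : z)  == (~c & z));   /* r = c ? 0 : z  */
          assert ((c ? z : -1) == (~c | z));   /* r = c ? z : -1 */
        }
    return 0;
  }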
2024-06-27  Internal-fn: Support new IFN SAT_TRUNC for unsigned scalar int  (Pan Li; 1 file, -0/+2)
This patch adds the middle-end representation for saturating truncation, i.e. the result is set to the maximum value of the narrower type when the truncated value overflows. It matches patterns like the one below.

Form 1:
  #define DEF_SAT_U_TRUC_FMT_1(WT, NT) \
  NT __attribute__((noinline)) \
  sat_u_truc_##T##_fmt_1 (WT x) \
  { \
    bool overflow = x > (WT)(NT)(-1); \
    return ((NT)x) | (NT)-overflow; \
  }

For example, truncating uint16_t to uint8_t, we have:
* SAT_TRUNC (254)   => 254
* SAT_TRUNC (255)   => 255
* SAT_TRUNC (256)   => 255
* SAT_TRUNC (65536) => 255

Given the below SAT_TRUNC from uint64_t to uint32_t:
  DEF_SAT_U_TRUC_FMT_1 (uint64_t, uint32_t)

Before this patch:
  __attribute__((noinline))
  uint32_t sat_u_truc_T_fmt_1 (uint64_t x)
  {
    _Bool overflow;
    unsigned int _1;
    unsigned int _2;
    unsigned int _3;
    uint32_t _6;

    ;; basic block 2, loop depth 0
    ;;  pred:       ENTRY
    overflow_5 = x_4(D) > 4294967295;
    _1 = (unsigned int) x_4(D);
    _2 = (unsigned int) overflow_5;
    _3 = -_2;
    _6 = _1 | _3;
    return _6;
    ;;  succ:       EXIT
  }

After this patch:
  __attribute__((noinline))
  uint32_t sat_u_truc_T_fmt_1 (uint64_t x)
  {
    uint32_t _6;

    ;; basic block 2, loop depth 0
    ;;  pred:       ENTRY
    _6 = .SAT_TRUNC (x_4(D)); [tail call]
    return _6;
    ;;  succ:       EXIT
  }

The below tests passed for this patch:
*. The rv64gcv full regression tests.
*. The rv64gcv build with glibc.
*. The x86 bootstrap tests.
*. The x86 full regression tests.

gcc/ChangeLog:

* internal-fn.def (SAT_TRUNC): Add new signed IFN sat_trunc as unary_convert.
* match.pd: Add new matching pattern for unsigned int sat_trunc.
* optabs.def (OPTAB_CL): Add unsigned and signed optab.
* tree-ssa-math-opts.cc (gimple_unsigend_integer_sat_trunc): Add new decl for the matching pattern generated func.
(match_unsigned_saturation_trunc): Add new func impl to match the .SAT_TRUNC.
(math_opts_dom_walker::after_dom_children): Add .SAT_TRUNC match function under BIT_IOR_EXPR case.

Signed-off-by: Pan Li <pan2.li@intel.com>
2024-06-05  Internal-fn: Support new IFN SAT_SUB for unsigned scalar int  (Pan Li; 1 file, -0/+1)
This patch adds the middle-end representation for saturating subtraction, i.e. the result is set to the minimum value on underflow. It matches patterns of the form:

  SAT_SUB (x, y) => (x - y) & (-(TYPE)(x >= y));

For example, for uint8_t we have:
* SAT_SUB (255, 0)   => 255
* SAT_SUB (1, 2)     => 0
* SAT_SUB (254, 255) => 0
* SAT_SUB (0, 255)   => 0

Given the below SAT_SUB for uint64:

  uint64_t sat_sub_u64 (uint64_t x, uint64_t y)
  {
    return (x - y) & (-(TYPE)(x >= y));
  }

Before this patch:
  uint64_t sat_sub_u_0_uint64_t (uint64_t x, uint64_t y)
  {
    _Bool _1;
    long unsigned int _3;
    uint64_t _6;

    ;; basic block 2, loop depth 0
    ;;  pred:       ENTRY
    _1 = x_4(D) >= y_5(D);
    _3 = x_4(D) - y_5(D);
    _6 = _1 ? _3 : 0;
    return _6;
    ;;  succ:       EXIT
  }

After this patch:
  uint64_t sat_sub_u_0_uint64_t (uint64_t x, uint64_t y)
  {
    uint64_t _6;

    ;; basic block 2, loop depth 0
    ;;  pred:       ENTRY
    _6 = .SAT_SUB (x_4(D), y_5(D)); [tail call]
    return _6;
    ;;  succ:       EXIT
  }

The below tests were run for this patch:
*. The riscv full regression tests.
*. The x86 bootstrap tests.
*. The x86 full regression tests.

PR target/51492
PR target/112600

gcc/ChangeLog:

* internal-fn.def (SAT_SUB): Add new IFN define for SAT_SUB.
* match.pd: Add new match for SAT_SUB.
* optabs.def (OPTAB_NL): Remove fixed-point for ussub/ssub.
* tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_sub): Add new decl for generated in match.pd.
(build_saturation_binary_arith_call): Add new helper function to build the gimple call to binary SAT alu.
(match_saturation_arith): Rename from.
(match_unsigned_saturation_add): Rename to.
(match_unsigned_saturation_sub): Add new func to match the unsigned sat sub.
(math_opts_dom_walker::after_dom_children): Add SAT_SUB matching try when COND_EXPR.

Signed-off-by: Pan Li <pan2.li@intel.com>
2024-05-31  Convert references with "counted_by" attributes to/from .ACCESS_WITH_SIZE.  (Qing Zhao; 1 file, -0/+5)
Including the following changes:

* The definition of the new internal function .ACCESS_WITH_SIZE in internal-fn.def.
* The C FE converts every reference to a FAM with a "counted_by" attribute to a call to the internal function .ACCESS_WITH_SIZE (build_component_ref in c-typeck.cc). This includes the case when the object is statically allocated and initialized. In order to make this work, the routine digest_init in c-typeck.cc is updated to fold calls to .ACCESS_WITH_SIZE to their first argument when require_constant is TRUE. However, for a reference inside "offsetof", the "counted_by" attribute is ignored since it's not useful at all (c_parser_postfix_expression in c/c-parser.cc). In addition to "offsetof", for references inside the operators "typeof" and "alignof", we ignore the counted_by attribute too. When building an ADDR_EXPR for the .ACCESS_WITH_SIZE in the C FE, the call is replaced with its first argument.
* Convert every call to .ACCESS_WITH_SIZE to its first argument (expand_ACCESS_WITH_SIZE in internal-fn.cc).
* Provide utility routines to check whether a call is .ACCESS_WITH_SIZE and to get the reference from the call to .ACCESS_WITH_SIZE (is_access_with_size_p and get_ref_from_access_with_size in tree.cc).

gcc/c/ChangeLog:

* c-parser.cc (c_parser_postfix_expression): Ignore the counted-by attribute when build_component_ref inside offsetof operator.
* c-tree.h (build_component_ref): Add one more parameter.
* c-typeck.cc (build_counted_by_ref): New function.
(build_access_with_size_for_counted_by): New function.
(build_component_ref): Check the counted-by attribute and build call to .ACCESS_WITH_SIZE.
(build_unary_op): When building ADDR_EXPR for .ACCESS_WITH_SIZE, use its first argument.
(lvalue_p): Accept call to .ACCESS_WITH_SIZE.
(digest_init): Fold call to .ACCESS_WITH_SIZE to its first argument when require_constant is TRUE.

gcc/ChangeLog:

* internal-fn.cc (expand_ACCESS_WITH_SIZE): New function.
* internal-fn.def (ACCESS_WITH_SIZE): New internal function.
* tree.cc (is_access_with_size_p): New function.
(get_ref_from_access_with_size): New function.
* tree.h (is_access_with_size_p): New prototype.
(get_ref_from_access_with_size): New prototype.

gcc/testsuite/ChangeLog:

* gcc.dg/flex-array-counted-by-2.c: New test.
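A source-level sketch of the feature this series implements (the struct and accessor below are made up for illustration):

  struct pkt
  {
    int count;
    char data[] __attribute__ ((counted_by (count)));
  };

  char
  get (struct pkt *p, int i)
  {
    /* This reference is converted by the C FE into
       (*.ACCESS_WITH_SIZE (&p->data, &p->count, ...))[i], carrying the
       element count along for later bounds analysis.  */
    return p->data[i];
  }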
2024-05-16  Internal-fn: Support new IFN SAT_ADD for unsigned scalar int  (Pan Li; 1 file, -0/+2)
This patch adds the middle-end representation for saturating addition, i.e. the result is set to the maximum value on overflow. It matches patterns of the form:

  SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))

Taking uint8_t as an example, we will have:
* SAT_ADD (1, 254)   => 255.
* SAT_ADD (1, 255)   => 255.
* SAT_ADD (2, 255)   => 255.
* SAT_ADD (255, 255) => 255.

Given the below example for the unsigned scalar integer uint64_t:

  uint64_t sat_add_u64 (uint64_t x, uint64_t y)
  {
    return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
  }

Before this patch:
  uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
  {
    long unsigned int _1;
    _Bool _2;
    long unsigned int _3;
    long unsigned int _4;
    uint64_t _7;
    long unsigned int _10;
    __complex__ long unsigned int _11;

    ;; basic block 2, loop depth 0
    ;;  pred:       ENTRY
    _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
    _1 = REALPART_EXPR <_11>;
    _10 = IMAGPART_EXPR <_11>;
    _2 = _10 != 0;
    _3 = (long unsigned int) _2;
    _4 = -_3;
    _7 = _1 | _4;
    return _7;
    ;;  succ:       EXIT
  }

After this patch:
  uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
  {
    uint64_t _7;

    ;; basic block 2, loop depth 0
    ;;  pred:       ENTRY
    _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
    return _7;
    ;;  succ:       EXIT
  }

The below tests passed for this patch:
1. The riscv full regression tests.
2. The x86 bootstrap tests.
3. The x86 full regression tests.

PR target/51492
PR target/112600

gcc/ChangeLog:

* internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD to the return true switch case(s).
* internal-fn.def (SAT_ADD): Add new signed optab SAT_ADD.
* match.pd: Add unsigned SAT_ADD match(es).
* optabs.def (OPTAB_NL): Remove fixed-point limitation for us/ssadd.
* tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New extern func decl generated in match.pd match.
(match_saturation_arith): New func impl to match the saturation arith.
(math_opts_dom_walker::after_dom_children): Try match saturation arith when IOR expr.

Signed-off-by: Pan Li <pan2.li@intel.com>
2024-02-27  expand: Add trivial folding for bit query builtins at expansion time [PR114044]  (Jakub Jelinek; 1 file, -5/+5)
While a lot of places in various optimization passes fold bit query internal functions with INTEGER_CST arguments to INTEGER_CST when there is a lhs, when the lhs is missing all the removals of such dead stmts are guarded with -ftree-dce, so with -fno-tree-dce those unfolded ifn calls remain in the IL until expansion. If they have large/huge BITINT_TYPE arguments, there is no BLKmode optab and so expansion ICEs; bitint lowering doesn't touch such calls because it doesn't know they need touching. Functions only containing those will not even be further processed by the pass, because there are no non-small BITINT_TYPE SSA_NAMEs plus the 2 exceptions (stores of BITINT_TYPE INTEGER_CSTs and conversions from BITINT_TYPE INTEGER_CSTs to floating point SSA_NAMEs), and when walking there is no special case for calls with BITINT_TYPE INTEGER_CSTs either; those are, for normal calls, normally handled at expansion time.

So, the following patch adjusts the expansion of these 6 ifns: do nothing if there is no lhs, and, just in case the user disabled all possible passes that would fold this, also handle the case of a lhs set from an ifn call with an INTEGER_CST argument.

2024-02-27  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/114044

* internal-fn.def (CLRSB, CLZ, CTZ, FFS, PARITY): Use DEF_INTERNAL_INT_EXT_FN macro rather than DEF_INTERNAL_INT_FN.
* internal-fn.h (expand_CLRSB, expand_CLZ, expand_CTZ, expand_FFS, expand_PARITY): Declare.
* internal-fn.cc (expand_bitquery, expand_CLRSB, expand_CLZ, expand_CTZ, expand_FFS, expand_PARITY): New functions.
(expand_POPCOUNT): Use expand_bitquery.
* gcc.dg/bitint-95.c: New test.
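A sketch of the previously ICEing shape, along the lines of the new gcc.dg/bitint-95.c test (the exact test contents may differ). Compiled with -O -fno-tree-dce on a target with _BitInt support, the lhs-less call used to survive to expansion:

  void
  f (void)
  {
    /* No lhs: dead, but only -ftree-dce would remove it.  There is no
       BLKmode optab for a 512-bit _BitInt, so expansion used to ICE;
       it now simply drops the call.  */
    __builtin_popcountg ((unsigned _BitInt (512)) 0);
  }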
2024-01-03  Update copyright years.  (Jakub Jelinek; 1 file, -1/+1)
2023-11-20  tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization for direct optab [PR90693]  (Jakub Jelinek; 1 file, -1/+12)
On Fri, Nov 17, 2023 at 03:01:04PM +0100, Jakub Jelinek wrote:
> As a follow-up, I'm considering changing in this routine the popcount
> call to IFN_POPCOUNT with 2 arguments and during expansion test costs.

Here is the follow-up which does the rtx costs testing.

2023-11-20  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/90693

* tree-ssa-math-opts.cc (match_single_bit_test): Mark POPCOUNT with result only used in equality comparison against 1 with direct optab support as .POPCOUNT call with 2 arguments.
* internal-fn.h (expand_POPCOUNT): Declare.
* internal-fn.def (DEF_INTERNAL_INT_EXT_FN): New macro, document it, undefine at the end.
(POPCOUNT): Use it instead of DEF_INTERNAL_INT_FN.
* internal-fn.cc (DEF_INTERNAL_INT_EXT_FN): Define to nothing before inclusion to define expanders.
(expand_POPCOUNT): New function.
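Why the identity holds: x - 1 flips the lowest set bit of x and every bit below it, so x ^ (x - 1) is a mask covering exactly those bits. For x != 0 that mask exceeds x - 1 only if x has no higher set bits, i.e. x is a power of two; for x == 0 the unsigned compare -1 > -1 is false. A brute-force scalar check (illustrative only):

  #include <assert.h>

  int
  main (void)
  {
    for (unsigned x = 0; x < 100000; x++)
      assert ((__builtin_popcount (x) == 1)
              == ((x ^ (x - 1)) > (x - 1)));
    return 0;
  }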
2023-11-20  internal-fn: Always undefine DEF_INTERNAL* macros at the end of internal-fn.def  (Jakub Jelinek; 1 file, -0/+6)
I have noticed we are inconsistent: some DEF_INTERNAL* macros (most of them) were undefined at the end of internal-fn.def (but in some cases uselessly undefined again after inclusion), while others were not (and sometimes undefined after the inclusion). I've changed it to always undefine at the end of internal-fn.def.

2023-11-20  Jakub Jelinek  <jakub@redhat.com>

* internal-fn.def: Document missing DEF_INTERNAL* macros and make sure they are all undefined at the end.
* internal-fn.cc (lookup_hilo_internal_fn, lookup_evenodd_internal_fn, widening_fn_p, get_len_internal_fn): Don't undef DEF_INTERNAL_*FN macros after inclusion of internal-fn.def.
2023-11-17  c++: Implement C++ DR 2406 - [[fallthrough]] attribute and iteration statements  (Jakub Jelinek; 1 file, -1/+3)
The following patch implements CWG 2406 - [[fallthrough]] attribute and iteration statements The genericization of some loops leaves nothing at all or just a label after a body of a loop, so if the loop is later followed by case or default label in a switch, the fallthrough statement isn't diagnosed. The following patch implements it by marking the IFN_FALLTHROUGH call in such a case, such that during gimplification it can be pedantically diagnosed even if it is followed by case or default label or some normal labels followed by case/default labels. While looking into this, I've discovered other problems. expand_FALLTHROUGH_r is removing the IFN_FALLTHROUGH calls from the IL, but wasn't telling that to walk_gimple_stmt/walk_gimple_seq_mod, so the callers would then skip the next statement after it, and it would return non-NULL if the removed stmt was last in the sequence. This could lead to wi->callback_result being set even if it didn't appear at the very end of switch sequence. The patch makes use of wi->removed_stmt such that the callers properly know what happened, and use different way to handle the end of switch sequence case. That change discovered a bug in the gimple-walk handling of wi->removed_stmt. If that flag is set, the callback is telling the callers that the current statement has been removed and so the innermost walk_gimple_seq_mod shouldn't gsi_next. The problem is that wi->removed_stmt is only reset at the start of a walk_gimple_stmt, but that can be too late for some cases. If we have two nested gimple sequences, say GIMPLE_BIND as the last stmt of some gimple seq, we remove the last statement inside of that GIMPLE_BIND, set wi->removed_stmt there, don't do gsi_next correctly because already gsi_remove moved us to the next stmt, there is no next stmt, so we return back to the caller, but wi->removed_stmt is still set and so we don't do gsi_next even in the outer sequence, despite the GIMPLE_BIND (etc.) not being removed. That means we walk the GIMPLE_BIND with its whole sequence again. The patch fixes that by resetting wi->removed_stmt after we've used that flag in walk_gimple_seq_mod. Nothing really uses that flag after the outermost walk_gimple_seq_mod, it is just a private notification that the stmt callback has removed a stmt. 2023-11-17 Jakub Jelinek <jakub@redhat.com> PR c++/107571 gcc/ * gimplify.cc (expand_FALLTHROUGH_r): Use wi->removed_stmt after gsi_remove, change the way of passing fallthrough stmt at the end of sequence to expand_FALLTHROUGH. Diagnose IFN_FALLTHROUGH with GF_CALL_NOTHROW flag. (expand_FALLTHROUGH): Change loc into array of 2 location_t elts, don't test wi.callback_result, instead check whether first elt is not UNKNOWN_LOCATION and in that case pedwarn with the second location. * gimple-walk.cc (walk_gimple_seq_mod): Clear wi->removed_stmt after the flag has been used. * internal-fn.def (FALLTHROUGH): Mention in comment the special meaning of the TREE_NOTHROW/GF_CALL_NOTHROW flag on the calls. gcc/c-family/ * c-gimplify.cc (genericize_c_loop): For C++ mark IFN_FALLTHROUGH call at the end of loop body as TREE_NOTHROW. gcc/testsuite/ * g++.dg/DRs/dr2406.C: New test.
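A sketch of the newly diagnosed shape, along the lines of g++.dg/DRs/dr2406.C (the exact test differs):

  void
  f (int n)
  {
    switch (n)
      {
      case 0:
        do
          {
            [[fallthrough]];    /* Now pedantically diagnosed: after
                                   genericization the loop leaves
                                   (almost) nothing between this and
                                   the following case label, but the
                                   next statement executed is the loop
                                   back-edge, not the label.  */
          }
        while (false);
      case 1:
        break;
      }
  }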
2023-11-10  Internal-fn: Add FLOATN support for l/ll round and rint [PR/112432]  (Pan Li; 1 file, -4/+4)
The defined DEF_EXT_LIB_FLOATN_NX_BUILTINS functions should also have DEF_INTERNAL_FLT_FLOATN_FN instead of DEF_INTERNAL_FLT_FN for the FLOATN support. According to the glibc API and gcc builtin, we have the below table for whether FLOATN is supported or not.

  +---------+-------+-------------------------------------+
  |         | glibc | gcc: DEF_EXT_LIB_FLOATN_NX_BUILTINS |
  +---------+-------+-------------------------------------+
  | iceil   |   N   |                  N                  |
  | ifloor  |   N   |                  N                  |
  | irint   |   N   |                  N                  |
  | iround  |   N   |                  N                  |
  | lceil   |   N   |                  N                  |
  | lfloor  |   N   |                  N                  |
  | lrint   |   Y   |                  Y                  |
  | lround  |   Y   |                  Y                  |
  | llceil  |   N   |                  N                  |
  | llfloor |   N   |                  N                  |
  | llrint  |   Y   |                  Y                  |
  | llround |   Y   |                  Y                  |
  +---------+-------+-------------------------------------+

This patch would like to support FLOATN for:
1. lrint
2. lround
3. llrint
4. llround

The below tests passed within this patch:
1. x86 bootstrap and regression test.
2. aarch64 regression test.
3. riscv regression tests.

PR target/112432

gcc/ChangeLog:

* internal-fn.def (LRINT): Add FLOATN support.
(LROUND): Ditto.
(LLRINT): Ditto.
(LLROUND): Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>
2023-11-09  ifcvt: Add support for conditional copysign  (Tamar Christina; 1 file, -0/+1)
This adds a masked variant of copysign. Nothing very exciting, just the general machinery to define and use a new masked IFN.

Bootstrapped and regtested on aarch64-none-linux-gnu and no issues.

Note: this patch is part of a test series; tests for it are added in the AArch64 patch that adds support for the optab.

gcc/ChangeLog:

PR tree-optimization/109154

* internal-fn.def (COPYSIGN): New.
* match.pd (UNCOND_BINARY, COND_BINARY): Map IFN_COPYSIGN to IFN_COND_COPYSIGN.
* optabs.def (cond_copysign_optab, cond_len_copysign_optab): New.
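The kind of loop the masked IFN lets the vectorizer if-convert looks roughly like this (a sketch; the real tests are in the follow-on AArch64 patch):

  #include <math.h>

  void
  f (double *restrict r, double *restrict a, double *restrict b,
     int *restrict c, int n)
  {
    for (int i = 0; i < n; i++)
      /* if-converted into a masked .COND_COPYSIGN-style operation.  */
      r[i] = c[i] ? copysign (a[i], b[i]) : a[i];
  }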
2023-11-06  internal-fn: Add VCOND_MASK_LEN.  (Robin Dapp; 1 file, -0/+2)
In order to prevent simplification of a COND_OP with degenerate mask (CONSTM1_RTX) into just an OP in the presence of length masking, this patch introduces a length-masked analog to VEC_COND_EXPR: IFN_VCOND_MASK_LEN. It also adds new match patterns that allow the combination of unconditional unary, binary and ternary operations with the VCOND_MASK_LEN into a conditional operation if the target supports it.

gcc/ChangeLog:

PR tree-optimization/111760

* config/riscv/autovec.md (vcond_mask_len_<mode><vm>): Add expander.
* config/riscv/riscv-protos.h (enum insn_type): Add.
* config/riscv/riscv-v.cc (needs_fp_rounding): Add !pred_mov.
* doc/md.texi: Add vcond_mask_len.
* gimple-match-exports.cc (maybe_resimplify_conditional_op): Create VCOND_MASK_LEN when length masking.
* gimple-match.h (gimple_match_op::gimple_match_op): Always initialize len and bias.
* internal-fn.cc (vec_cond_mask_len_direct): Add.
(direct_vec_cond_mask_len_optab_supported_p): Add.
(internal_fn_len_index): Add VCOND_MASK_LEN.
(internal_fn_mask_index): Ditto.
* internal-fn.def (VCOND_MASK_LEN): New internal function.
* match.pd: Combine unconditional unary, binary and ternary operations into the respective COND_LEN operations.
* optabs.def (OPTAB_D): Add vcond_mask_len optab.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-cond-arith-2.c: No vect cost model for riscv_v.
2023-09-06  Middle-end _BitInt support [PR102989]  (Jakub Jelinek; 1 file, -0/+6)
The following patch introduces the middle-end part of the _BitInt support, a new BITINT_TYPE, handling it where needed, except the lowering pass and sanitizer support. 2023-09-06 Jakub Jelinek <jakub@redhat.com> PR c/102989 * tree.def (BITINT_TYPE): New type. * tree.h (TREE_CHECK6, TREE_NOT_CHECK6): Define. (NUMERICAL_TYPE_CHECK, INTEGRAL_TYPE_P): Include BITINT_TYPE. (BITINT_TYPE_P): Define. (CONSTRUCTOR_BITFIELD_P): Return true even for BLKmode bit-fields if they have BITINT_TYPE type. (tree_check6, tree_not_check6): New inline functions. (any_integral_type_check): Include BITINT_TYPE. (build_bitint_type): Declare. * tree.cc (tree_code_size, wide_int_to_tree_1, cache_integer_cst, build_zero_cst, type_hash_canon_hash, type_cache_hasher::equal, type_hash_canon): Handle BITINT_TYPE. (bitint_type_cache): New variable. (build_bitint_type): New function. (signed_or_unsigned_type_for, verify_type_variant, verify_type): Handle BITINT_TYPE. (tree_cc_finalize): Free bitint_type_cache. * builtins.cc (type_to_class): Handle BITINT_TYPE. (fold_builtin_unordered_cmp): Handle BITINT_TYPE like INTEGER_TYPE. * cfgexpand.cc (expand_debug_expr): Punt on BLKmode BITINT_TYPE INTEGER_CSTs. * convert.cc (convert_to_pointer_1, convert_to_real_1, convert_to_complex_1): Handle BITINT_TYPE like INTEGER_TYPE. (convert_to_integer_1): Likewise. For BITINT_TYPE don't check GET_MODE_PRECISION (TYPE_MODE (type)). * doc/generic.texi (BITINT_TYPE): Document. * doc/tm.texi.in (TARGET_C_BITINT_TYPE_INFO): New. * doc/tm.texi: Regenerated. * dwarf2out.cc (base_type_die, is_base_type, modified_type_die, gen_type_die_with_usage): Handle BITINT_TYPE. (rtl_for_decl_init): Punt on BLKmode BITINT_TYPE INTEGER_CSTs or handle those which fit into shwi. * expr.cc (expand_expr_real_1): Define EXTEND_BITINT macro, reduce to bitfield precision reads from BITINT_TYPE vars, parameters or memory locations. Expand large/huge BITINT_TYPE INTEGER_CSTs into memory. * fold-const.cc (fold_convert_loc, make_range_step): Handle BITINT_TYPE. (extract_muldiv_1): For BITINT_TYPE use TYPE_PRECISION rather than GET_MODE_SIZE (SCALAR_INT_TYPE_MODE). (native_encode_int, native_interpret_int, native_interpret_expr): Handle BITINT_TYPE. * gimple-expr.cc (useless_type_conversion_p): Make BITINT_TYPE to some other integral type or vice versa conversions non-useless. * gimple-fold.cc (gimple_fold_builtin_memset): Punt for BITINT_TYPE. (clear_padding_unit): Mention in comment that _BitInt types don't need to fit either. (clear_padding_bitint_needs_padding_p): New function. (clear_padding_type_may_have_padding_p): Handle BITINT_TYPE. (clear_padding_type): Likewise. * internal-fn.cc (expand_mul_overflow): For unsigned non-mode precision operands force pos_neg? to 1. (expand_MULBITINT, expand_DIVMODBITINT, expand_FLOATTOBITINT, expand_BITINTTOFLOAT): New functions. * internal-fn.def (MULBITINT, DIVMODBITINT, FLOATTOBITINT, BITINTTOFLOAT): New internal functions. * internal-fn.h (expand_MULBITINT, expand_DIVMODBITINT, expand_FLOATTOBITINT, expand_BITINTTOFLOAT): Declare. * match.pd (non-equality compare simplifications from fold_binary): Punt if TYPE_MODE (arg1_type) is BLKmode. * pretty-print.h (pp_wide_int): Handle printing of large precision wide_ints which would buffer overflow digit_buffer. * stor-layout.cc (finish_bitfield_representative): For bit-fields with BITINT_TYPE, prefer representatives with precisions in multiple of limb precision. (layout_type): Handle BITINT_TYPE. Handle COMPLEX_TYPE with BLKmode element type and assert it is BITINT_TYPE. 
* target.def (bitint_type_info): New C target hook. * target.h (struct bitint_info): New type. * targhooks.cc (default_bitint_type_info): New function. * targhooks.h (default_bitint_type_info): Declare. * tree-pretty-print.cc (dump_generic_node): Handle BITINT_TYPE. Handle printing large wide_ints which would buffer overflow digit_buffer. * tree-ssa-sccvn.cc: Include target.h. (eliminate_dom_walker::eliminate_stmt): Punt for large/huge BITINT_TYPE. * tree-switch-conversion.cc (jump_table_cluster::emit): For more than 64-bit BITINT_TYPE subtract low bound from expression and cast to 64-bit integer type both the controlling expression and case labels. * typeclass.h (enum type_class): Add bitint_type_class enumerator. * varasm.cc (output_constant): Handle BITINT_TYPE INTEGER_CSTs. * vr-values.cc (check_for_binary_op_overflow): Use widest2_int rather than widest_int. (simplify_using_ranges::simplify_internal_call_using_ranges): Use unsigned_type_for rather than build_nonstandard_integer_type.
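At the source level the new type looks as follows (a sketch; it requires a target whose TARGET_C_BITINT_TYPE_INFO hook accepts the precision):

  typedef unsigned _BitInt (256) u256;

  /* Arithmetic on large/huge _BitInt is handled by the (separate)
     lowering pass; e.g. this multiplication becomes a call to the new
     .MULBITINT internal function.  */
  u256
  mul (u256 a, u256 b)
  {
    return a * b;
  }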
2023-08-22  VECT: Add LEN_FOLD_EXTRACT_LAST pattern  (Juzhe-Zhong; 1 file, -0/+3)
Hi, Richard and Richi. This is the last autovec pattern I want to add for RVV (length loop control). This patch is supposed to handle the following case:

  int __attribute__ ((noinline, noclone))
  condition_reduction (int *a, int min_v, int n)
  {
    int last = 66; /* High start value.  */
    for (int i = 0; i < n; i++)
      if (a[i] < min_v)
        last = i;
    return last;
  }

ARM SVE IR:

  ...
  mask__7.11_39 = vect__4.10_37 < vect_cst__38;
  _40 = loop_mask_36 & mask__7.11_39;
  last_5 = .FOLD_EXTRACT_LAST (last_15, _40, vect_vec_iv_.7_32);
  ...

RVV IR, we want to see:

  ...
  loop_len = SELECT_VL
  mask__7.11_39 = vect__4.10_37 < vect_cst__38;
  last_5 = .LEN_FOLD_EXTRACT_LAST (last_15, _40, vect_vec_iv_.7_32, loop_len, bias);
  ...

gcc/ChangeLog:

* doc/md.texi: Add LEN_FOLD_EXTRACT_LAST pattern.
* internal-fn.cc (fold_len_extract_direct): Ditto.
(expand_fold_len_extract_optab_fn): Ditto.
(direct_fold_len_extract_optab_supported_p): Ditto.
* internal-fn.def (LEN_FOLD_EXTRACT_LAST): Ditto.
* optabs.def (OPTAB_D): Ditto.
2023-08-16  Add support for vector conditional not  (Andrew Pinski; 1 file, -0/+2)
Like the conditional neg support (r12-4470-g20dcda98ed376cb61c74b2c71), this just adds conditional not too. Also we should be able to turn `(a ? -1 : 0) ^ b` into a conditional not.

OK? Bootstrapped and tested on x86_64-linux-gnu and aarch64-linux-gnu.

gcc/ChangeLog:

* internal-fn.def (COND_NOT): New internal function.
* match.pd (UNCOND_UNARY, COND_UNARY): Add bit_not/not to the lists.
(`vec (a ? -1 : 0) ^ b`): New pattern to convert into conditional not.
* optabs.def (cond_one_cmpl): New optab.
(cond_len_one_cmpl): Likewise.

gcc/testsuite/ChangeLog:

PR target/110986

* gcc.target/aarch64/sve/cond_unary_9.c: New test.
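The `(a ? -1 : 0) ^ b` claim is a two-case bitwise identity; a scalar check (illustrative only):

  #include <assert.h>

  int
  main (void)
  {
    for (int b = -4; b <= 4; b++)
      {
        assert ((-1 ^ b) == ~b);   /* a true:  conditional not  */
        assert ((0 ^ b) == b);     /* a false: pass-through     */
      }
    return 0;
  }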
2023-08-11  VECT: Add vec_mask_len_{load_lanes,store_lanes} patterns  (Juzhe-Zhong; 1 file, -0/+6)
This patch adds the vec_mask_len_{load_lanes,store_lanes} autovectorization patterns. Here we want to support the following autovectorization:

  void
  foo (int8_t *__restrict a, int8_t *__restrict b, int8_t *__restrict cond, int n)
  {
    for (intptr_t i = 0; i < n; ++i)
      {
        if (cond[i])
          a[i] = b[i * 2] + b[i * 2 + 1];
      }
  }

ARM SVE IR: https://godbolt.org/z/cro1Eqc6a

  # loop_mask_60 = PHI <next_mask_82(4), max_mask_81(3)>
  ...
  mask__39.12_63 = vect__3.11_61 != { 0, ... };
  vec_mask_and_66 = loop_mask_60 & mask__39.12_63;
  ...
  vect_array.15 = .MASK_LOAD_LANES (_57, 8B, vec_mask_and_66);
  ...

For RVV, we would like to see the IR:

  loop_len = SELECT_VL;
  ...
  mask__39.12_63 = vect__3.11_61 != { 0, ... };
  ...
  vect_array.15 = .MASK_LEN_LOAD_LANES (_57, 8B, mask__39.12_63, loop_len, bias);
  ...

Bootstrap and regression on X86 passed. Ok for trunk?

gcc/ChangeLog:

* doc/md.texi: Add vec_mask_len_{load_lanes,store_lanes} patterns.
* internal-fn.cc (expand_partial_load_optab_fn): Ditto.
(expand_partial_store_optab_fn): Ditto.
* internal-fn.def (MASK_LEN_LOAD_LANES): Ditto.
(MASK_LEN_STORE_LANES): Ditto.
* optabs.def (OPTAB_CD): Ditto.
2023-08-10  Make ISEL used internal functions const/nothrow where appropriate  (Richard Biener; 1 file, -7/+9)
Both .VEC_SET and .VEC_EXTRACT and the various .VCOND internal functions operate on registers only, and they are not supposed to raise any exceptions. The following makes them const/nothrow. I've verified this avoids useless SSA updates in ISEL.

* internal-fn.def (VCOND, VCONDU, VCONDEQ, VCOND_MASK, VEC_SET, VEC_EXTRACT): Make ECF_CONST | ECF_NOTHROW.
2023-07-31  internal-fn: Refine macro define of COND_* and COND_LEN_* internal functions  (Ju-Zhe Zhong; 1 file, -67/+56)
Hi, Richard and Richi. Based on previous discussions, we should make COND_* and COND_LEN_* consistent. So, this patch defines these internal functions together via these 2 wrappers:

  #define DEF_INTERNAL_COND_FN(NAME, FLAGS, OPTAB, TYPE) \
    DEF_INTERNAL_OPTAB_FN (COND_##NAME, FLAGS, cond_##OPTAB, cond_##TYPE) \
    DEF_INTERNAL_OPTAB_FN (COND_LEN_##NAME, FLAGS, cond_len_##OPTAB, \
                           cond_len_##TYPE)

  #define DEF_INTERNAL_SIGNED_COND_FN(NAME, FLAGS, SELECTOR, SIGNED_OPTAB, \
                                      UNSIGNED_OPTAB, TYPE) \
    DEF_INTERNAL_SIGNED_OPTAB_FN (COND_##NAME, FLAGS, SELECTOR, \
                                  cond_##SIGNED_OPTAB, cond_##UNSIGNED_OPTAB, \
                                  cond_##TYPE) \
    DEF_INTERNAL_SIGNED_OPTAB_FN (COND_LEN_##NAME, FLAGS, SELECTOR, \
                                  cond_len_##SIGNED_OPTAB, \
                                  cond_len_##UNSIGNED_OPTAB, cond_len_##TYPE)

Bootstrap and regression on X86 passed. Ok for trunk?

gcc/ChangeLog:

* internal-fn.def (DEF_INTERNAL_COND_FN): New macro. (DEF_INTERNAL_SIGNED_COND_FN): Ditto. (COND_ADD): Remove. (COND_SUB): Ditto. (COND_MUL): Ditto. (COND_DIV): Ditto. (COND_MOD): Ditto. (COND_RDIV): Ditto. (COND_MIN): Ditto. (COND_MAX): Ditto. (COND_FMIN): Ditto. (COND_FMAX): Ditto. (COND_AND): Ditto. (COND_IOR): Ditto. (COND_XOR): Ditto. (COND_SHL): Ditto. (COND_SHR): Ditto. (COND_FMA): Ditto. (COND_FMS): Ditto. (COND_FNMA): Ditto. (COND_FNMS): Ditto. (COND_NEG): Ditto. (COND_LEN_ADD): Ditto. (COND_LEN_SUB): Ditto. (COND_LEN_MUL): Ditto. (COND_LEN_DIV): Ditto. (COND_LEN_MOD): Ditto. (COND_LEN_RDIV): Ditto. (COND_LEN_MIN): Ditto. (COND_LEN_MAX): Ditto. (COND_LEN_FMIN): Ditto. (COND_LEN_FMAX): Ditto. (COND_LEN_AND): Ditto. (COND_LEN_IOR): Ditto. (COND_LEN_XOR): Ditto. (COND_LEN_SHL): Ditto. (COND_LEN_SHR): Ditto. (COND_LEN_FMA): Ditto. (COND_LEN_FMS): Ditto. (COND_LEN_FNMA): Ditto. (COND_LEN_FNMS): Ditto. (COND_LEN_NEG): Ditto. (ADD): New macro define. (SUB): Ditto. (MUL): Ditto. (DIV): Ditto. (MOD): Ditto. (RDIV): Ditto. (MIN): Ditto. (MAX): Ditto. (FMIN): Ditto. (FMAX): Ditto. (AND): Ditto. (IOR): Ditto. (XOR): Ditto. (SHL): Ditto. (SHR): Ditto. (FMA): Ditto. (FMS): Ditto. (FNMA): Ditto. (FNMS): Ditto. (NEG): Ditto.
2023-07-21  cleanup: Change LEN_MASK into MASK_LEN  (Juzhe-Zhong; 1 file, -10/+10)
Hi. Since start from LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE, COND_LEN_* patterns, the order of len and mask is {mask,len,bias}. The reason we make "mask" argument comes before "len" is because we want to keep the "mask" location same as mask_* or cond_* patterns to make use of current codes flow of mask_* and cond_*. Otherwise, we will need to change codes much more and make codes hard to maintain. Now, we already have COND_LEN_*, it's naturally that we should rename "LEN_MASK" into "MASK_LEN" to keep name scheme consistent. This patch only changes the name "LEN_MASK" into "MASK_LEN". No codes functionality change. gcc/ChangeLog: * config/riscv/autovec.md (len_maskload<mode><vm>): Change LEN_MASK into MASK_LEN. (mask_len_load<mode><vm>): Ditto. (len_maskstore<mode><vm>): Ditto. (mask_len_store<mode><vm>): Ditto. (len_mask_gather_load<RATIO64:mode><RATIO64I:mode>): Ditto. (mask_len_gather_load<RATIO64:mode><RATIO64I:mode>): Ditto. (len_mask_gather_load<RATIO32:mode><RATIO32I:mode>): Ditto. (mask_len_gather_load<RATIO32:mode><RATIO32I:mode>): Ditto. (len_mask_gather_load<RATIO16:mode><RATIO16I:mode>): Ditto. (mask_len_gather_load<RATIO16:mode><RATIO16I:mode>): Ditto. (len_mask_gather_load<RATIO8:mode><RATIO8I:mode>): Ditto. (mask_len_gather_load<RATIO8:mode><RATIO8I:mode>): Ditto. (len_mask_gather_load<RATIO4:mode><RATIO4I:mode>): Ditto. (mask_len_gather_load<RATIO4:mode><RATIO4I:mode>): Ditto. (len_mask_gather_load<RATIO2:mode><RATIO2I:mode>): Ditto. (mask_len_gather_load<RATIO2:mode><RATIO2I:mode>): Ditto. (len_mask_gather_load<RATIO1:mode><RATIO1:mode>): Ditto. (mask_len_gather_load<RATIO1:mode><RATIO1:mode>): Ditto. (len_mask_scatter_store<RATIO64:mode><RATIO64I:mode>): Ditto. (mask_len_scatter_store<RATIO64:mode><RATIO64I:mode>): Ditto. (len_mask_scatter_store<RATIO32:mode><RATIO32I:mode>): Ditto. (mask_len_scatter_store<RATIO32:mode><RATIO32I:mode>): Ditto. (len_mask_scatter_store<RATIO16:mode><RATIO16I:mode>): Ditto. (mask_len_scatter_store<RATIO16:mode><RATIO16I:mode>): Ditto. (len_mask_scatter_store<RATIO8:mode><RATIO8I:mode>): Ditto. (mask_len_scatter_store<RATIO8:mode><RATIO8I:mode>): Ditto. (len_mask_scatter_store<RATIO4:mode><RATIO4I:mode>): Ditto. (mask_len_scatter_store<RATIO4:mode><RATIO4I:mode>): Ditto. (len_mask_scatter_store<RATIO2:mode><RATIO2I:mode>): Ditto. (mask_len_scatter_store<RATIO2:mode><RATIO2I:mode>): Ditto. (len_mask_scatter_store<RATIO1:mode><RATIO1:mode>): Ditto. (mask_len_scatter_store<RATIO1:mode><RATIO1:mode>): Ditto. * doc/md.texi: Ditto. * genopinit.cc (main): Ditto. (CMP_NAME): Ditto. Ditto. * gimple-fold.cc (arith_overflowed_p): Ditto. (gimple_fold_partial_load_store_mem_ref): Ditto. (gimple_fold_call): Ditto. * internal-fn.cc (len_maskload_direct): Ditto. (mask_len_load_direct): Ditto. (len_maskstore_direct): Ditto. (mask_len_store_direct): Ditto. (expand_call_mem_ref): Ditto. (expand_len_maskload_optab_fn): Ditto. (expand_mask_len_load_optab_fn): Ditto. (expand_len_maskstore_optab_fn): Ditto. (expand_mask_len_store_optab_fn): Ditto. (direct_len_maskload_optab_supported_p): Ditto. (direct_mask_len_load_optab_supported_p): Ditto. (direct_len_maskstore_optab_supported_p): Ditto. (direct_mask_len_store_optab_supported_p): Ditto. (internal_load_fn_p): Ditto. (internal_store_fn_p): Ditto. (internal_gather_scatter_fn_p): Ditto. (internal_fn_len_index): Ditto. (internal_fn_mask_index): Ditto. (internal_fn_stored_value_index): Ditto. (internal_len_load_store_bias): Ditto. * internal-fn.def (LEN_MASK_GATHER_LOAD): Ditto. (MASK_LEN_GATHER_LOAD): Ditto. 
(LEN_MASK_LOAD): Ditto. (MASK_LEN_LOAD): Ditto. (LEN_MASK_SCATTER_STORE): Ditto. (MASK_LEN_SCATTER_STORE): Ditto. (LEN_MASK_STORE): Ditto. (MASK_LEN_STORE): Ditto. * optabs-query.cc (supports_vec_gather_load_p): Ditto. (supports_vec_scatter_store_p): Ditto. * optabs-tree.cc (target_supports_mask_load_store_p): Ditto. (target_supports_len_load_store_p): Ditto. * optabs.def (OPTAB_CD): Ditto. * tree-ssa-alias.cc (ref_maybe_used_by_call_p_1): Ditto. (call_may_clobber_ref_p_1): Ditto. * tree-ssa-dse.cc (initialize_ao_ref_for_dse): Ditto. (dse_optimize_stmt): Ditto. * tree-ssa-loop-ivopts.cc (get_mem_type_for_internal_fn): Ditto. (get_alias_ptr_type_for_ptr_address): Ditto. * tree-vect-data-refs.cc (vect_gather_scatter_fn_p): Ditto. * tree-vect-patterns.cc (vect_recog_gather_scatter_pattern): Ditto. * tree-vect-stmts.cc (check_load_store_for_partial_vectors): Ditto. (vect_get_strided_load_store_ops): Ditto. (vectorizable_store): Ditto. (vectorizable_load): Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-1.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-10.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-11.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-12.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-2.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-3.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-4.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-5.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-6.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-7.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-8.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-9.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-1.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-10.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-11.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-2.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-3.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-4.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-5.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-6.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-7.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-8.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-9.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-1.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-10.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-2.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-3.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-4.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-5.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-6.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-7.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-8.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-9.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-1.c: Ditto. 
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-10.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-2.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-3.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-4.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-5.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-6.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-7.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-8.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-9.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c: Ditto. * gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c: Ditto. * gcc.target/riscv/rvv/autovec/partial/gimple_fold-1.c: Ditto.
2023-07-19  VECT: Add mask_len_fold_left_plus for in-order floating-point reduction  (Ju-Zhe Zhong; 1 file, -0/+3)
Hi, Richard and Richi. This patch adds the mask_len_fold_left_plus pattern to support in-order floating-point reduction for targets that support len loop control.

Consider this following case:

  double
  foo2 (double *__restrict a, double init, int *__restrict cond, int n)
  {
    for (int i = 0; i < n; i++)
      if (cond[i])
        init += a[i];
    return init;
  }

ARM SVE IR:

  ...
  vec_mask_and_60 = loop_mask_54 & mask__23.33_57;
  vect__ifc__35.37_64 = .VCOND_MASK (vec_mask_and_60, vect__8.36_61, { 0.0, ... });
  _36 = .MASK_FOLD_LEFT_PLUS (init_20, vect__ifc__35.37_64, loop_mask_54);
  ...

For RVV, we want to see:

  ...
  _36 = .MASK_LEN_FOLD_LEFT_PLUS (init_20, vect__ifc__35.37_64, control_mask, loop_len, bias);
  ...

gcc/ChangeLog:

* doc/md.texi: Add mask_len_fold_left_plus.
* internal-fn.cc (mask_len_fold_left_direct): Ditto.
(expand_mask_len_fold_left_optab_fn): Ditto.
(direct_mask_len_fold_left_optab_supported_p): Ditto.
* internal-fn.def (MASK_LEN_FOLD_LEFT_PLUS): Ditto.
* optabs.def (OPTAB_D): Ditto.
2023-07-11  VECT: Add COND_LEN_* operations for loop control with length targets  (Ju-Zhe Zhong; 1 file, -0/+38)
Hi, Richard and Richi. This patch adds cond_len_* operation patterns for targets that support loop control with length. These patterns will be used in the following cases:

1. Integer division:

  void
  f (int32_t *restrict a, int32_t *restrict b, int32_t *restrict c, int n)
  {
    for (int i = 0; i < n; ++i)
      {
        a[i] = b[i] / c[i];
      }
  }

ARM SVE IR:

  ...
  max_mask_36 = .WHILE_ULT (0, bnd.5_32, { 0, ... });

  Loop:
  ...
  # loop_mask_29 = PHI <next_mask_37(4), max_mask_36(3)>
  ...
  vect__4.8_28 = .MASK_LOAD (_33, 32B, loop_mask_29);
  ...
  vect__6.11_25 = .MASK_LOAD (_20, 32B, loop_mask_29);
  vect__8.12_24 = .COND_DIV (loop_mask_29, vect__4.8_28, vect__6.11_25, vect__4.8_28);
  ...
  .MASK_STORE (_1, 32B, loop_mask_29, vect__8.12_24);
  ...
  next_mask_37 = .WHILE_ULT (_2, bnd.5_32, { 0, ... });
  ...

For a target like RVV which supports loop control with length, we want to see IR as follows:

  Loop:
  ...
  # loop_len_29 = SELECT_VL
  ...
  vect__4.8_28 = .LEN_MASK_LOAD (_33, 32B, loop_len_29);
  ...
  vect__6.11_25 = .LEN_MASK_LOAD (_20, 32B, loop_len_29);
  vect__8.12_24 = .COND_LEN_DIV (dummy_mask, vect__4.8_28, vect__6.11_25, vect__4.8_28, loop_len_29, bias);
  ...
  .LEN_MASK_STORE (_1, 32B, loop_len_29, vect__8.12_24);
  ...
  next_mask_37 = .WHILE_ULT (_2, bnd.5_32, { 0, ... });
  ...

Notice here, we use dummy_mask = { -1, -1, ..., -1 }.

2. Integer conditional division:

Similar case to (1) but with a condition:

  void
  f (int32_t *restrict a, int32_t *restrict b, int32_t *restrict c, int32_t * cond, int n)
  {
    for (int i = 0; i < n; ++i)
      {
        if (cond[i])
          a[i] = b[i] / c[i];
      }
  }

ARM SVE:

  ...
  max_mask_76 = .WHILE_ULT (0, bnd.6_52, { 0, ... });

  Loop:
  ...
  # loop_mask_55 = PHI <next_mask_77(5), max_mask_76(4)>
  ...
  vect__4.9_56 = .MASK_LOAD (_51, 32B, loop_mask_55);
  mask__29.10_58 = vect__4.9_56 != { 0, ... };
  vec_mask_and_61 = loop_mask_55 & mask__29.10_58;
  ...
  vect__6.13_62 = .MASK_LOAD (_24, 32B, vec_mask_and_61);
  ...
  vect__8.16_66 = .MASK_LOAD (_1, 32B, vec_mask_and_61);
  vect__10.17_68 = .COND_DIV (vec_mask_and_61, vect__6.13_62, vect__8.16_66, vect__6.13_62);
  ...
  .MASK_STORE (_2, 32B, vec_mask_and_61, vect__10.17_68);
  ...
  next_mask_77 = .WHILE_ULT (_3, bnd.6_52, { 0, ... });

Here, ARM SVE uses vec_mask_and_61 = loop_mask_55 & mask__29.10_58; to guarantee the correct result. However, a target with length control cannot perform this elegant flow; for RVV, we would expect:

  Loop:
  ...
  loop_len_55 = SELECT_VL
  ...
  mask__29.10_58 = vect__4.9_56 != { 0, ... };
  ...
  vect__10.17_68 = .COND_LEN_DIV (mask__29.10_58, vect__6.13_62, vect__8.16_66, vect__6.13_62, loop_len_55, bias);
  ...

Here we expect COND_LEN_DIV predicated by a real mask, which is the outcome of the comparison mask__29.10_58 = vect__4.9_56 != { 0, ... };, and a real length, which is produced by the loop control loop_len_55 = SELECT_VL.

3. Conditional floating-point operations (no -ffast-math):

  void
  f (float *restrict a, float *restrict b, int32_t *restrict cond, int n)
  {
    for (int i = 0; i < n; ++i)
      {
        if (cond[i])
          a[i] = b[i] + a[i];
      }
  }

ARM SVE IR:

  max_mask_70 = .WHILE_ULT (0, bnd.6_46, { 0, ... });
  ...
  # loop_mask_49 = PHI <next_mask_71(4), max_mask_70(3)>
  ...
  mask__27.10_52 = vect__4.9_50 != { 0, ... };
  vec_mask_and_55 = loop_mask_49 & mask__27.10_52;
  ...
  vect__9.17_62 = .COND_ADD (vec_mask_and_55, vect__6.13_56, vect__8.16_60, vect__6.13_56);
  ...
  next_mask_71 = .WHILE_ULT (_22, bnd.6_46, { 0, ... });
  ...

For RVV, we would expect IR:

  ...
  loop_len_49 = SELECT_VL
  ...
  mask__27.10_52 = vect__4.9_50 != { 0, ... };
  ...
  vect__9.17_62 = .COND_LEN_ADD (mask__27.10_52, vect__6.13_56, vect__8.16_60, vect__6.13_56, loop_len_49, bias);
  ...

4. Conditional unordered reduction:

  int32_t
  f (int32_t *restrict a, int32_t *restrict cond, int n)
  {
    int32_t result = 0;
    for (int i = 0; i < n; ++i)
      {
        if (cond[i])
          result += a[i];
      }
    return result;
  }

ARM SVE IR:

  Loop:
  # vect_result_18.7_37 = PHI <vect__33.16_51(4), { 0, ... }(3)>
  ...
  # loop_mask_40 = PHI <next_mask_58(4), max_mask_57(3)>
  ...
  mask__17.11_43 = vect__4.10_41 != { 0, ... };
  vec_mask_and_46 = loop_mask_40 & mask__17.11_43;
  ...
  vect__33.16_51 = .COND_ADD (vec_mask_and_46, vect_result_18.7_37, vect__7.14_47, vect_result_18.7_37);
  ...
  next_mask_58 = .WHILE_ULT (_15, bnd.6_36, { 0, ... });
  ...

  Epilogue:
  _53 = .REDUC_PLUS (vect__33.16_51); [tail call]

For RVV, we expect:

  Loop:
  # vect_result_18.7_37 = PHI <vect__33.16_51(4), { 0, ... }(3)>
  ...
  loop_len_40 = SELECT_VL
  ...
  mask__17.11_43 = vect__4.10_41 != { 0, ... };
  ...
  vect__33.16_51 = .COND_LEN_ADD (mask__17.11_43, vect_result_18.7_37, vect__7.14_47, vect_result_18.7_37, loop_len_40, bias);
  ...
  next_mask_58 = .WHILE_ULT (_15, bnd.6_36, { 0, ... });
  ...

  Epilogue:
  _53 = .REDUC_PLUS (vect__33.16_51); [tail call]

I name these patterns "cond_len_*" since I want the length operand to come after the mask operand, with all other operands except the length in the same order as the "cond_*" patterns. Such an order will make life easier in the following loop vectorizer support.

gcc/ChangeLog:

* doc/md.texi: Add COND_LEN_* operations for loop control with length.
* internal-fn.cc (cond_len_unary_direct): Ditto. (cond_len_binary_direct): Ditto. (cond_len_ternary_direct): Ditto. (expand_cond_len_unary_optab_fn): Ditto. (expand_cond_len_binary_optab_fn): Ditto. (expand_cond_len_ternary_optab_fn): Ditto. (direct_cond_len_unary_optab_supported_p): Ditto. (direct_cond_len_binary_optab_supported_p): Ditto. (direct_cond_len_ternary_optab_supported_p): Ditto.
* internal-fn.def (COND_LEN_ADD): Ditto. (COND_LEN_SUB): Ditto. (COND_LEN_MUL): Ditto. (COND_LEN_DIV): Ditto. (COND_LEN_MOD): Ditto. (COND_LEN_RDIV): Ditto. (COND_LEN_MIN): Ditto. (COND_LEN_MAX): Ditto. (COND_LEN_FMIN): Ditto. (COND_LEN_FMAX): Ditto. (COND_LEN_AND): Ditto. (COND_LEN_IOR): Ditto. (COND_LEN_XOR): Ditto. (COND_LEN_SHL): Ditto. (COND_LEN_SHR): Ditto. (COND_LEN_FMA): Ditto. (COND_LEN_FMS): Ditto. (COND_LEN_FNMA): Ditto. (COND_LEN_FNMS): Ditto. (COND_LEN_NEG): Ditto.
* optabs.def (OPTAB_D): Ditto.
2023-07-05  gimple-isel: Recognize vec_extract pattern.  (Robin Dapp; 1 file, -0/+1)
In gimple-isel we already deduce a vec_set pattern from an ARRAY_REF(VIEW_CONVERT_EXPR). This patch does the same for a vec_extract. The code is largely similar to the vec_set one including the addition of a can_vec_extract_var_idx_p function in optabs.cc to check if the backend can handle a register operand as index. We already have can_vec_extract in optabs-query but that one checks whether we can extract specific modes. With the introduction of an internal function for vec_extract the expander must not FAIL. For vec_set this has already been the case so adjust the documentation accordingly. Additionally, clarify the wording of the vector-vector case for vec_extract. gcc/ChangeLog: * doc/md.texi: Document that vec_set and vec_extract must not fail. * gimple-isel.cc (gimple_expand_vec_set_expr): Rename this... (gimple_expand_vec_set_extract_expr): ...to this. (gimple_expand_vec_exprs): Call renamed function. * internal-fn.cc (vec_extract_direct): Add. (expand_vec_extract_optab_fn): New function to expand vec_extract optab. (direct_vec_extract_optab_supported_p): Add. * internal-fn.def (VEC_EXTRACT): Add. * optabs.cc (can_vec_extract_var_idx_p): New function. * optabs.h (can_vec_extract_var_idx_p): Declare.
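The GIMPLE shape now recognized comes from a variable-index read of a generic vector, e.g. (a sketch):

  typedef int v4si __attribute__ ((vector_size (16)));

  int
  extract (v4si v, int i)
  {
    /* Gimplified to an ARRAY_REF of a VIEW_CONVERT_EXPR of v; isel now
       turns it into .VEC_EXTRACT (v, i) when the target's vec_extract
       pattern accepts a register index.  */
    return v[i];
  }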
2023-07-04Machine Description: Add LEN_MASK_{GATHER_LOAD, SCATTER_STORE} patternJu-Zhe Zhong1-2/+6
Hi, Richi and Richard.

Based on the review comments from Richard:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623405.html
I change the len_mask_gather_load/len_mask_scatter_store operand order into:
{len,bias,mask}

We adjust adding the len and mask arguments using add_len_and_mask_args, which is the same as for partial_load/partial_store. Now the code is more reasonable and easier to maintain.

This patch adds LEN_MASK_{GATHER_LOAD,SCATTER_STORE} to allow targets to handle flow control by mask and loop control by length on gather/scatter memory operations. Consider the following case:

  void
  f (uint8_t *restrict a,
     uint8_t *restrict b, int n,
     int base, int step,
     int *restrict cond)
  {
    for (int i = 0; i < n; ++i)
      {
        if (cond[i])
          a[i * step + base] = b[i * step + base];
      }
  }

We hope RVV can vectorize such a case into the following IR:

  loop_len = SELECT_VL
  control_mask = comparison
  v = LEN_MASK_GATHER_LOAD (.., loop_len, bias, control_mask)
  LEN_MASK_SCATTER_STORE (... v, ..., loop_len, bias, control_mask)

This patch doesn't apply such patterns in the vectorizer; it just adds the patterns and updates the documentation. I will send the patch that applies these patterns in the vectorizer soon after this patch is approved.

Ok for trunk?

gcc/ChangeLog:

	* doc/md.texi: Add len_mask_gather_load/len_mask_scatter_store.
	* internal-fn.cc (expand_scatter_store_optab_fn): Ditto.
	(expand_gather_load_optab_fn): Ditto.
	(internal_load_fn_p): Ditto.
	(internal_store_fn_p): Ditto.
	(internal_gather_scatter_fn_p): Ditto.
	(internal_fn_len_index): Ditto.
	(internal_fn_mask_index): Ditto.
	(internal_fn_stored_value_index): Ditto.
	* internal-fn.def (LEN_MASK_GATHER_LOAD): Ditto.
	(LEN_MASK_SCATTER_STORE): Ditto.
	* optabs.def (OPTAB_CD): Ditto.
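A scalar reference model of the gather side (my own illustrative sketch; the names are invented, and only active lanes are defined here, since what inactive lanes hold is left to the target):

  /* Rough semantics of LEN_MASK_GATHER_LOAD: lane i is loaded only when
     i < len and mask[i] is set.  */
  void
  len_mask_gather (int *dst, const int *base, const long *offsets,
                   const _Bool *mask, int len)
  {
    for (int i = 0; i < len; i++)
      if (mask[i])
        dst[i] = base[offsets[i]];
  }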
2023-06-30Mid engine setup [SU]ABDLOluwatamilore Adebayo1-0/+5
This updates vect_recog_abd_pattern to recognize the widening variant of absolute difference (ABDL, ABDL2).

gcc/ChangeLog:

	* internal-fn.def (VEC_WIDEN_ABD): New internal hilo optab.
	* optabs.def (vec_widen_sabd_optab,
	vec_widen_sabd_hi_optab, vec_widen_sabd_lo_optab,
	vec_widen_sabd_odd_even, vec_widen_sabd_even_optab,
	vec_widen_uabd_optab,
	vec_widen_uabd_hi_optab, vec_widen_uabd_lo_optab,
	vec_widen_uabd_odd_even, vec_widen_uabd_even_optab): New optabs.
	* doc/md.texi: Document them.
	* tree-vect-patterns.cc (vect_recog_abd_pattern): Update to
	build a VEC_WIDEN_ABD call if the input precision is smaller
	than the precision of the output.
	(vect_recog_widen_abd_pattern): Should an ABD expression be
	found preceding an extension, replace the two with a
	VEC_WIDEN_ABD.
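The source idiom this targets looks roughly like the following (an illustrative example of my own): an absolute difference whose result is immediately widened, so the ABD plus the extension can fuse into one VEC_WIDEN_ABD.

  void
  widen_abd (const unsigned char *a, const unsigned char *b,
             unsigned short *out, int n)
  {
    for (int i = 0; i < n; i++)
      {
        int diff = a[i] - b[i];
        /* abs (a - b), stored in a wider type than the inputs.  */
        out[i] = diff < 0 ? -diff : diff;
      }
  }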
2023-06-19VECT: Support LEN_MASK_{LOAD,STORE} ifn && optabsJu-Zhe Zhong1-0/+4
This patch adds LEN_MASK_{LOAD,STORE} to support flow control for targets like RISC-V that use length in loop control. Loads/stores are normalized into LEN_MASK_{LOAD,STORE} as long as either the length or the mask is valid. The length is the outcome of SELECT_VL or MIN_EXPR. The mask is the outcome of a comparison.

The LEN_MASK_{LOAD,STORE} format is defined as follows:
  1) LEN_MASK_LOAD (ptr, align, length, mask).
  2) LEN_MASK_STORE (ptr, align, length, mask, vec).

Consider the following 4 cases:

  VLA: Variable-length auto-vectorization
  VLS: Specific-length auto-vectorization

Case 1 (VLS): -mrvv-vector-bits=128

  Code:
    for (int i = 0; i < 4; i++)
      a[i] = b[i] + c[i];
  IR (does not use LEN_MASK_*):
    v1 = MEM (...)
    v2 = MEM (...)
    v3 = v1 + v2
    MEM[...] = v3

Case 2 (VLS): -mrvv-vector-bits=128

  Code:
    for (int i = 0; i < 4; i++)
      if (cond[i])
        a[i] = b[i] + c[i];
  IR (LEN_MASK_* with length = VF, mask = comparison):
    mask = comparison
    v1 = LEN_MASK_LOAD (length = VF, mask)
    v2 = LEN_MASK_LOAD (length = VF, mask)
    v3 = v1 + v2
    LEN_MASK_STORE (length = VF, mask, v3)

Case 3 (VLA):

  Code:
    for (int i = 0; i < n; i++)
      a[i] = b[i] + c[i];
  IR:
    loop_len = SELECT_VL or MIN
    v1 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...})
    v2 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...})
    v3 = v1 + v2
    LEN_MASK_STORE (length = loop_len, mask = {-1,-1,...}, v3)

Case 4 (VLA):

  Code:
    for (int i = 0; i < n; i++)
      if (cond[i])
        a[i] = b[i] + c[i];
  IR:
    loop_len = SELECT_VL or MIN
    mask = comparison
    v1 = LEN_MASK_LOAD (length = loop_len, mask)
    v2 = LEN_MASK_LOAD (length = loop_len, mask)
    v3 = v1 + v2
    LEN_MASK_STORE (length = loop_len, mask, v3)

Co-authored-by: Robin Dapp <rdapp.gcc@gmail.com>

gcc/ChangeLog:

	* doc/md.texi: Add len_mask{load,store}.
	* genopinit.cc (main): Ditto.
	(CMP_NAME): Ditto.
	* internal-fn.cc (len_maskload_direct): Ditto.
	(len_maskstore_direct): Ditto.
	(expand_call_mem_ref): Ditto.
	(expand_partial_load_optab_fn): Ditto.
	(expand_len_maskload_optab_fn): Ditto.
	(expand_partial_store_optab_fn): Ditto.
	(expand_len_maskstore_optab_fn): Ditto.
	(direct_len_maskload_optab_supported_p): Ditto.
	(direct_len_maskstore_optab_supported_p): Ditto.
	* internal-fn.def (LEN_MASK_LOAD): Ditto.
	(LEN_MASK_STORE): Ditto.
	* optabs.def (OPTAB_CD): Ditto.
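A scalar reference model of the store side (my own illustrative sketch; the names are invented):

  /* Rough semantics of LEN_MASK_STORE: lane i participates only when
     i < len and mask[i] is set; all other memory lanes are untouched.  */
  void
  len_mask_store (int *ptr, const int *vec, const _Bool *mask, int len)
  {
    for (int i = 0; i < len; i++)
      if (mask[i])
        ptr[i] = vec[i];
  }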
2023-06-15middle-end, i386: Pattern recognize add/subtract with carry [PR79173]Jakub Jelinek1-0/+2
The following patch introduces {add,sub}c5_optab and pattern recognizes various forms of add with carry and subtract with carry/borrow, see pr79173-{1,2,3,4,5,6}.c tests on what is matched. Primarily forms with 2 __builtin_add_overflow or __builtin_sub_overflow calls per limb (with just one for the least significant one), for add with carry even when it is hand written in C (for subtraction reassoc seems to change it too much so that the pattern recognition doesn't work). __builtin_{add,sub}_overflow are standardized in C23 under ckd_{add,sub} names, so it isn't any longer a GNU only extension. Note, clang has for these (IMHO badly designed) __builtin_{add,sub}c{b,s,,l,ll} builtins which don't add/subtract just a single bit of carry, but basically add 3 unsigned values or subtract 2 unsigned values from one, and result in carry out of 0, 1, or 2 because of that. If we wanted to introduce those for clang compatibility, we could and lower them early to just two __builtin_{add,sub}_overflow calls and let the pattern matching in this patch recognize it later. I've added expanders for this on ix86 and in addition to that added various peephole2s (in preparation patches for this patch) to make sure we get nice (and small) code for the common cases. I think there are other PRs which request that e.g. for the _{addcarry,subborrow}_u{32,64} intrinsics, which the patch also improves. Would be nice if support for these optabs was added to many other targets, arm/aarch64 and powerpc* certainly have such instructions, I'd expect in fact that most targets do. The _BitInt support I'm working on will also need this to emit reasonable code. 2023-06-15 Jakub Jelinek <jakub@redhat.com> PR middle-end/79173 * internal-fn.def (UADDC, USUBC): New internal functions. * internal-fn.cc (expand_UADDC, expand_USUBC): New functions. (commutative_ternary_fn_p): Return true also for IFN_UADDC. * optabs.def (uaddc5_optab, usubc5_optab): New optabs. * tree-ssa-math-opts.cc (uaddc_cast, uaddc_ne0, uaddc_is_cplxpart, match_uaddc_usubc): New functions. (math_opts_dom_walker::after_dom_children): Call match_uaddc_usubc for PLUS_EXPR, MINUS_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR unless other optimizations have been successful for those. * gimple-fold.cc (gimple_fold_call): Handle IFN_UADDC and IFN_USUBC. * fold-const-call.cc (fold_const_call): Likewise. * gimple-range-fold.cc (adjust_imagpart_expr): Likewise. * tree-ssa-dce.cc (eliminate_unnecessary_stmts): Likewise. * doc/md.texi (uaddc<mode>5, usubc<mode>5): Document new named patterns. * config/i386/i386.md (uaddc<mode>5, usubc<mode>5): New define_expand patterns. (*setcc_qi_addqi3_cconly_overflow_1_<mode>, *setccc): Split into NOTE_INSN_DELETED note rather than nop instruction. (*setcc_qi_negqi_ccc_1_<mode>, *setcc_qi_negqi_ccc_2_<mode>): Likewise. * gcc.target/i386/pr79173-1.c: New test. * gcc.target/i386/pr79173-2.c: New test. * gcc.target/i386/pr79173-3.c: New test. * gcc.target/i386/pr79173-4.c: New test. * gcc.target/i386/pr79173-5.c: New test. * gcc.target/i386/pr79173-6.c: New test. * gcc.target/i386/pr79173-7.c: New test. * gcc.target/i386/pr79173-8.c: New test. * gcc.target/i386/pr79173-9.c: New test. * gcc.target/i386/pr79173-10.c: New test.
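The shape of source code the new matching handles can be sketched like this (my own example in the spirit of the pr79173-* tests, not copied from them): two __builtin_add_overflow calls for the more significant limb, with the two carries combined.

  /* Two-limb add with carry out; the two overflow checks on the high
     limb are what match_uaddc_usubc fuses into a single .UADDC.  */
  _Bool
  add2 (unsigned long *p, const unsigned long *q)
  {
    unsigned long tmp;
    _Bool c1 = __builtin_add_overflow (p[0], q[0], &p[0]);
    _Bool c2 = __builtin_add_overflow (p[1], q[1], &tmp);
    _Bool c3 = __builtin_add_overflow (tmp, (unsigned long) c1, &p[1]);
    return c2 | c3;  /* at most one of the two carries can be set */
  }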
2023-06-15Missed opportunity to use [SU]ABDOluwatamilore Adebayo1-0/+3
This adds a recognition pattern for the non-widening absolute difference (ABD).

gcc/ChangeLog:

	* doc/md.texi (sabd, uabd): Document them.
	* internal-fn.def (ABD): Use new optab.
	* optabs.def (sabd_optab, uabd_optab): New optabs.
	* tree-vect-patterns.cc (vect_recog_absolute_difference): Recognize
	the idiom abs (a - b).
	(vect_recog_sad_pattern): Refactor to use
	vect_recog_absolute_difference.
	(vect_recog_abd_pattern): Use patterns found by
	vect_recog_absolute_difference to build a new ABD
	internal call.
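In contrast to the widening variant above, the plain ABD pattern keeps the element width; an illustrative input of my own:

  void
  abd (const signed char *a, const signed char *b, signed char *out, int n)
  {
    for (int i = 0; i < n; i++)
      {
        int diff = a[i] - b[i];
        out[i] = diff < 0 ? -diff : diff;  /* abs (a - b), same width */
      }
  }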
2023-06-10VECT: Add SELECT_VL supportJu-Zhe Zhong1-0/+1
This patch addresses comments from Richard && Richi and rebases to trunk.

This patch adds SELECT_VL middle-end support to allow targets to apply target-dependent optimizations to the length calculation. It is inspired by the RVV ISA and LLVM:
https://reviews.llvm.org/D99750

SELECT_VL has the same behavior as LLVM's "get_vector_length", with the following properties:
1. Only applies to a single rgroup.
2. Non-SLP.
3. Adjusts the loop-control IV.
4. Adjusts the data-reference IVs.
5. Allows processing fewer than vf elements in a non-final iteration.

Code:

  # void vvaddint32(size_t n, const int*x, const int*y, int*z)
  # { for (size_t i=0; i<n; i++) { z[i]=x[i]+y[i]; } }

Take RVV codegen for example.

Before this patch:

  vvaddint32:
    ble a0,zero,.L6
    csrr a4,vlenb
    srli a6,a4,2
  .L4:
    mv a5,a0
    bleu a0,a6,.L3
    mv a5,a6
  .L3:
    vsetvli zero,a5,e32,m1,ta,ma
    vle32.v v2,0(a1)
    vle32.v v1,0(a2)
    vsetvli a7,zero,e32,m1,ta,ma
    sub a0,a0,a5
    vadd.vv v1,v1,v2
    vsetvli zero,a5,e32,m1,ta,ma
    vse32.v v1,0(a3)
    add a2,a2,a4
    add a3,a3,a4
    add a1,a1,a4
    bne a0,zero,.L4
  .L6:
    ret

After this patch:

  vvaddint32:
    vsetvli t0, a0, e32, ta, ma  # Set vector length based on 32-bit vectors
    vle32.v v0, (a1)             # Get first vector
    sub a0, a0, t0               # Decrement number done
    slli t0, t0, 2               # Multiply number done by 4 bytes
    add a1, a1, t0               # Bump pointer
    vle32.v v1, (a2)             # Get second vector
    add a2, a2, t0               # Bump pointer
    vadd.vv v2, v0, v1           # Sum vectors
    vse32.v v2, (a3)             # Store result
    add a3, a3, t0               # Bump pointer
    bnez a0, vvaddint32          # Loop back
    ret                          # Finished

Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>
Co-authored-by: Richard Biener <rguenther@suse.de>

gcc/ChangeLog:

	* doc/md.texi: Add SELECT_VL support.
	* internal-fn.def (SELECT_VL): Ditto.
	* optabs.def (OPTAB_D): Ditto.
	* tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Ditto.
	* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Ditto.
	* tree-vect-stmts.cc (get_select_vl_data_ref_ptr): Ditto.
	(vectorizable_store): Ditto.
	(vectorizable_load): Ditto.
	* tree-vectorizer.h (LOOP_VINFO_USING_SELECT_VL_P): Ditto.
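A scalar model of the contract (my own sketch, under the assumption that the target may return any positive length up to min (remaining, vf), which is what lets RVV's vsetvli split the tail evenly rather than always taking the minimum):

  /* Simplest conforming choice for SELECT_VL; a real target may return
     less in non-final iterations.  */
  unsigned long
  select_vl (unsigned long remaining, unsigned long vf)
  {
    return remaining < vf ? remaining : vf;
  }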
2023-06-05internal-fn,vect: Refactor widen_plus as internal_fnAndre Vieira1-0/+34
DEF_INTERNAL_WIDENING_OPTAB_FN and DEF_INTERNAL_NARROWING_OPTAB_FN are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN respectively, except that they provide convenience wrappers for a single vector-to-vector conversion, a hi/lo split or an even/odd split. Each definition for <NAME> will require either unsigned and signed optabs named <UOPTAB> and <SOPTAB> (for widening) or a single <OPTAB> (for narrowing) for each of the five functions it creates.

For example, for widening addition DEF_INTERNAL_WIDENING_OPTAB_FN will create five internal functions: IFN_VEC_WIDEN_PLUS, IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO, IFN_VEC_WIDEN_PLUS_EVEN and IFN_VEC_WIDEN_PLUS_ODD, each requiring two optabs, one for signed and one for unsigned.

AArch64 implements the hi/lo split optabs:

  IFN_VEC_WIDEN_PLUS_HI -> vec_widen_<su>add_hi_<mode> -> (u/s)addl2
  IFN_VEC_WIDEN_PLUS_LO -> vec_widen_<su>add_lo_<mode> -> (u/s)addl

This gives the same functionality as the previous WIDEN_PLUS/WIDEN_MINUS tree codes, which are expanded into VEC_WIDEN_PLUS_LO and VEC_WIDEN_PLUS_HI.

2023-06-05  Andre Vieira  <andre.simoesdiasvieira@arm.com>
            Joel Hutton  <joel.hutton@arm.com>
            Tamar Christina  <tamar.christina@arm.com>

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (vec_widen_<su>addl_lo_<mode>): Rename
	this ...
	(vec_widen_<su>add_lo_<mode>): ... to this.
	(vec_widen_<su>addl_hi_<mode>): Rename this ...
	(vec_widen_<su>add_hi_<mode>): ... to this.
	(vec_widen_<su>subl_lo_<mode>): Rename this ...
	(vec_widen_<su>sub_lo_<mode>): ... to this.
	(vec_widen_<su>subl_hi_<mode>): Rename this ...
	(vec_widen_<su>sub_hi_<mode>): ... to this.
	* doc/generic.texi: Document new IFN codes.
	* internal-fn.cc (lookup_hilo_internal_fn): Add lookup function.
	(commutative_binary_fn_p): Add widen_plus fn's.
	(widening_fn_p): New function.
	(narrowing_fn_p): New function.
	(direct_internal_fn_optab): Change visibility.
	* internal-fn.def (DEF_INTERNAL_WIDENING_OPTAB_FN): Macro to define an
	internal_fn that expands into multiple internal_fns for widening.
	(IFN_VEC_WIDEN_PLUS, IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO,
	IFN_VEC_WIDEN_PLUS_EVEN, IFN_VEC_WIDEN_PLUS_ODD, IFN_VEC_WIDEN_MINUS,
	IFN_VEC_WIDEN_MINUS_HI, IFN_VEC_WIDEN_MINUS_LO,
	IFN_VEC_WIDEN_MINUS_ODD, IFN_VEC_WIDEN_MINUS_EVEN): Define widening
	plus,minus functions.
	* internal-fn.h (direct_internal_fn_optab): Declare new prototype.
	(lookup_hilo_internal_fn): Likewise.
	(widening_fn_p): Likewise.
	(narrowing_fn_p): Likewise.
	* optabs.cc (commutative_optab_p): Add widening plus optabs.
	* optabs.def (OPTAB_D): Define widen add, sub optabs.
	* tree-vect-patterns.cc (vect_recog_widen_op_pattern): Support
	patterns with a hi/lo or even/odd split.
	(vect_recog_sad_pattern): Refactor to use new IFN codes.
	(vect_recog_widen_plus_pattern): Likewise.
	(vect_recog_widen_minus_pattern): Likewise.
	(vect_recog_average_pattern): Likewise.
	* tree-vect-stmts.cc (vectorizable_conversion): Add support for
	_HILO IFNs.
	(supportable_widening_operation): Likewise.
	* tree.def (WIDEN_SUM_EXPR): Update example to use new IFNs.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vect-widen-add.c: Test that new
	IFN_VEC_WIDEN_PLUS is being used.
	* gcc.target/aarch64/vect-widen-sub.c: Test that new
	IFN_VEC_WIDEN_MINUS is being used.
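The C-level idiom these functions vectorize is a widen-then-add loop; an illustrative example of my own:

  /* Each uint8_t input is widened before the add, so the vectorizer can
     use IFN_VEC_WIDEN_PLUS (or its _HI/_LO halves on AArch64).  */
  void
  widen_add (const unsigned char *a, const unsigned char *b,
             unsigned short *out, int n)
  {
    for (int i = 0; i < n; i++)
      out[i] = (unsigned short) a[i] + (unsigned short) b[i];
  }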
2023-02-22vect: inbranch SIMD clonesAndrew Stubbs1-0/+3
There has been support for generating "inbranch" SIMD clones for a long time, but nothing actually uses them (as far as I can see).

This patch adds support for a subset of the possible cases (those using mask_mode == VOIDmode). The other cases fail to vectorize, just as before, so there should be no regressions. The supported subset should cover all cases needed by amdgcn, at present.

gcc/ChangeLog:

	* internal-fn.cc (expand_MASK_CALL): New.
	* internal-fn.def (MASK_CALL): New.
	* internal-fn.h (expand_MASK_CALL): New prototype.
	* omp-simd-clone.cc (simd_clone_adjust_argument_types): Set
	vector_type for mask arguments also.
	* tree-if-conv.cc: Include cgraph.h.
	(if_convertible_stmt_p): Do if conversions for calls to SIMD calls.
	(predicate_statements): Convert functions to IFN_MASK_CALL.
	* tree-vect-loop.cc (vect_get_datarefs_in_loop): Recognise
	IFN_MASK_CALL as a SIMD function call.
	* tree-vect-stmts.cc (vectorizable_simd_clone_call): Handle
	IFN_MASK_CALL as an inbranch SIMD function call.
	Generate the mask vector arguments.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-simd-clone-16.c: New test.
	* gcc.dg/vect/vect-simd-clone-16b.c: New test.
	* gcc.dg/vect/vect-simd-clone-16c.c: New test.
	* gcc.dg/vect/vect-simd-clone-16d.c: New test.
	* gcc.dg/vect/vect-simd-clone-16e.c: New test.
	* gcc.dg/vect/vect-simd-clone-16f.c: New test.
	* gcc.dg/vect/vect-simd-clone-17.c: New test.
	* gcc.dg/vect/vect-simd-clone-17b.c: New test.
	* gcc.dg/vect/vect-simd-clone-17c.c: New test.
	* gcc.dg/vect/vect-simd-clone-17d.c: New test.
	* gcc.dg/vect/vect-simd-clone-17e.c: New test.
	* gcc.dg/vect/vect-simd-clone-17f.c: New test.
	* gcc.dg/vect/vect-simd-clone-18.c: New test.
	* gcc.dg/vect/vect-simd-clone-18b.c: New test.
	* gcc.dg/vect/vect-simd-clone-18c.c: New test.
	* gcc.dg/vect/vect-simd-clone-18d.c: New test.
	* gcc.dg/vect/vect-simd-clone-18e.c: New test.
	* gcc.dg/vect/vect-simd-clone-18f.c: New test.
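The pattern this enables can be sketched as follows (my own illustrative example): a declare-simd function called under a condition, which if-conversion now rewrites to IFN_MASK_CALL so the inbranch clone's mask argument can be fed from the condition.

  #pragma omp declare simd inbranch
  int f (int x);

  void
  g (int *restrict a, const int *restrict b, int n)
  {
    for (int i = 0; i < n; i++)
      if (b[i] > 0)
        a[i] = f (b[i]);  /* becomes .MASK_CALL (f, ...) before vectorization */
  }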
2023-02-02Replace IFN_TRAP with BUILT_IN_UNREACHABLE_TRAP [PR107300]Jakub Jelinek1-5/+0
For PR106099 I've added IFN_TRAP as an alternative to __builtin_trap, meant for __builtin_unreachable purposes (e.g. with -funreachable-traps or some sanitizers), which doesn't need vops because __builtin_unreachable doesn't need them either. This works in various cases, but unfortunately IPA likes to decide on the redirection to unreachable just by tweaking the cgraph edge to point to a different FUNCTION_DECL. As internal functions don't have a decl, this causes problems like in the following testcase. The following patch fixes it by removing IFN_TRAP again and replacing it with a user-inaccessible BUILT_IN_UNREACHABLE_TRAP, so that e.g. builtin_decl_unreachable can return it directly and we don't need to tweak it later wherever we actually replace the call stmt.

2023-02-02  Jakub Jelinek  <jakub@redhat.com>

	PR ipa/107300
	* builtins.def (BUILT_IN_UNREACHABLE_TRAP): New builtin.
	* internal-fn.def (TRAP): Remove.
	* internal-fn.cc (expand_TRAP): Remove.
	* tree.cc (build_common_builtin_nodes): Define
	BUILT_IN_UNREACHABLE_TRAP if not yet defined.
	(builtin_decl_unreachable): Use BUILT_IN_UNREACHABLE_TRAP
	instead of BUILT_IN_TRAP.
	* gimple.cc (gimple_build_builtin_unreachable): Remove
	emitting internal function for BUILT_IN_TRAP.
	* asan.cc (maybe_instrument_call): Handle
	BUILT_IN_UNREACHABLE_TRAP.
	* cgraph.cc (cgraph_edge::verify_corresponds_to_fndecl): Handle
	BUILT_IN_UNREACHABLE_TRAP instead of BUILT_IN_TRAP.
	* ipa-devirt.cc (possible_polymorphic_call_target_p): Handle
	BUILT_IN_UNREACHABLE_TRAP.
	* builtins.cc (expand_builtin, is_inexpensive_builtin): Likewise.
	* tree-cfg.cc (verify_gimple_call,
	pass_warn_function_return::execute): Likewise.
	* attribs.cc (decl_attributes): Don't report exclusions on
	BUILT_IN_UNREACHABLE_TRAP either.
	* gcc.dg/pr107300.c: New test.
2023-01-02Update copyright years.Jakub Jelinek1-1/+1
2022-10-06c++, c: Implement C++23 P1774R8 - Portable assumptions [PR106654]Jakub Jelinek1-0/+4
The following patch implements C++23 P1774R8 - Portable assumptions paper, by introducing support for [[assume (cond)]]; attribute for C++. In addition to that the patch adds [[gnu::assume (cond)]]; and __attribute__((assume (cond))); support to both C and C++. As described in C++23, the attribute argument is conditional-expression rather than the usual assignment-expression for attribute arguments, the condition is contextually converted to bool (for C truthvalue conversion is done on it) and is never evaluated at runtime. For C++ constant expression evaluation, I only check the simplest conditions for undefined behavior, because otherwise I'd need to undo changes to *ctx->global which happened during the evaluation (but I believe the spec allows that and we can further improve later). The patch uses a new internal function, .ASSUME, to hold the condition in the FEs. At gimplification time, if the condition is simple/without side-effects, it is gimplified as if (cond) ; else __builtin_unreachable (); and otherwise for now dropped on the floor. The intent is to incrementally outline the conditions into separate artificial functions and use .ASSUME further to tell the ranger and perhaps other optimization passes about the assumptions, as detailed in the PR. When implementing it, I found that assume entry hasn't been added to https://eel.is/c++draft/cpp.cond#6 Jonathan said he'll file a NB comment about it, this patch assumes it has been added into the table as 202207L when the paper has been voted in. With the attributes for both C/C++, I'd say we don't need to add __builtin_assume with similar purpose, especially when __builtin_assume in LLVM is just weird. It is strange for side-effects in function call's argument not to be evaluated, and LLVM in that case (annoyingly) warns and ignores the side-effects (but doesn't do then anything with it), if there are no side-effects, it will work like our if (!cond) __builtin_unreachable (); 2022-10-06 Jakub Jelinek <jakub@redhat.com> PR c++/106654 gcc/ * internal-fn.def (ASSUME): New internal function. * internal-fn.h (expand_ASSUME): Declare. * internal-fn.cc (expand_ASSUME): Define. * gimplify.cc (gimplify_call_expr): Gimplify IFN_ASSUME. * fold-const.h (simple_condition_p): Declare. * fold-const.cc (simple_operand_p_2): Rename to ... (simple_condition_p): ... this. Remove forward declaration. No longer static. Adjust function comment and fix a typo in it. Adjust recursive call. (simple_operand_p): Adjust function comment. (fold_truth_andor): Adjust simple_operand_p_2 callers to call simple_condition_p. * doc/extend.texi: Document assume attribute. Move fallthrough attribute example to its section. gcc/c-family/ * c-attribs.cc (handle_assume_attribute): New function. (c_common_attribute_table): Add entry for assume attribute. * c-lex.cc (c_common_has_attribute): Handle __have_cpp_attribute (assume). gcc/c/ * c-parser.cc (handle_assume_attribute): New function. (c_parser_declaration_or_fndef): Handle assume attribute. (c_parser_attribute_arguments): Add assume_attr argument, if true, parse first argument as conditional expression. (c_parser_gnu_attribute, c_parser_std_attribute): Adjust c_parser_attribute_arguments callers. (c_parser_statement_after_labels) <case RID_ATTRIBUTE>: Handle assume attribute. gcc/cp/ * cp-tree.h (process_stmt_assume_attribute): Implement C++23 P1774R8 - Portable assumptions. Declare. (diagnose_failing_condition): Declare. (find_failing_clause): Likewise. * parser.cc (assume_attr): New enumerator. 
(cp_parser_parenthesized_expression_list): Handle assume_attr. Remove identifier variable, for id_attr push the identifier into expression_list right away instead of inserting it before all the others at the end. (cp_parser_conditional_expression): New function. (cp_parser_constant_expression): Use it. (cp_parser_statement): Handle assume attribute. (cp_parser_expression_statement): Likewise. (cp_parser_gnu_attribute_list): Use assume_attr for assume attribute. (cp_parser_std_attribute): Likewise. Handle standard assume attribute like gnu::assume. * cp-gimplify.cc (process_stmt_assume_attribute): New function. * constexpr.cc: Include fold-const.h. (find_failing_clause_r, find_failing_clause): New functions, moved from semantics.cc with ctx argument added and if non-NULL, call cxx_eval_constant_expression rather than fold_non_dependent_expr. (cxx_eval_internal_function): Handle IFN_ASSUME. (potential_constant_expression_1): Likewise. * pt.cc (tsubst_copy_and_build): Likewise. * semantics.cc (diagnose_failing_condition): New function. (find_failing_clause_r, find_failing_clause): Moved to constexpr.cc. (finish_static_assert): Use it. Add auto_diagnostic_group. gcc/testsuite/ * gcc.dg/attr-assume-1.c: New test. * gcc.dg/attr-assume-2.c: New test. * gcc.dg/attr-assume-3.c: New test. * g++.dg/cpp2a/feat-cxx2a.C: Add colon to C++20 features comment, add C++20 attributes comment and move C++20 new features after the attributes before them. * g++.dg/cpp23/feat-cxx2b.C: Likewise. Test __has_cpp_attribute(assume). * g++.dg/cpp23/attr-assume1.C: New test. * g++.dg/cpp23/attr-assume2.C: New test. * g++.dg/cpp23/attr-assume3.C: New test. * g++.dg/cpp23/attr-assume4.C: New test.
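A minimal illustration of the new attribute (my own example; the GNU spelling below works in both C and C++, the [[assume (cond)]] form in C++23): the condition is never evaluated at run time, it merely licenses optimizations.

  int
  divide (int x)
  {
    __attribute__ ((assume (x >= 0 && x % 8 == 0)));
    return x / 8;  /* with the assumption, this may become a plain shift */
  }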
2022-08-26OpenMP: Support reverse offload (middle end part)Tobias Burnus1-0/+1
gcc/ChangeLog: * internal-fn.cc (expand_GOMP_TARGET_REV): New. * internal-fn.def (GOMP_TARGET_REV): New. * lto-cgraph.cc (lto_output_node, verify_node_partition): Mark 'omp target device_ancestor_host' as in_other_partition and don't error if absent. * omp-low.cc (create_omp_child_function): Mark as 'noclone'. * omp-expand.cc (expand_omp_target): For reverse offload, remove sorry, use device = GOMP_DEVICE_HOST_FALLBACK and create empty-body nohost function. * omp-offload.cc (execute_omp_device_lower): Handle IFN_GOMP_TARGET_REV. (pass_omp_target_link::execute): For ACCEL_COMPILER, don't nullify fn argument for reverse offload libgomp/ChangeLog: * libgomp.texi (OpenMP 5.0): Mark 'ancestor' as implemented but refer to 'requires'. * testsuite/libgomp.c-c++-common/reverse-offload-1-aux.c: New test. * testsuite/libgomp.c-c++-common/reverse-offload-1.c: New test. * testsuite/libgomp.fortran/reverse-offload-1-aux.f90: New test. * testsuite/libgomp.fortran/reverse-offload-1.f90: New test. gcc/testsuite/ChangeLog: * c-c++-common/gomp/reverse-offload-1.c: Remove dg-sorry. * c-c++-common/gomp/target-device-ancestor-4.c: Likewise. * gfortran.dg/gomp/target-device-ancestor-4.f90: Likewise. * gfortran.dg/gomp/target-device-ancestor-5.f90: Likewise. * c-c++-common/goacc/classify-kernels-parloops.c: Add 'noclone' to scan-tree-dump-times. * c-c++-common/goacc/classify-kernels-unparallelized-parloops.c: Likewise. * c-c++-common/goacc/classify-kernels-unparallelized.c: Likewise. * c-c++-common/goacc/classify-kernels.c: Likewise. * c-c++-common/goacc/classify-parallel.c: Likewise. * c-c++-common/goacc/classify-serial.c: Likewise. * c-c++-common/goacc/kernels-counter-vars-function-scope.c: Likewise. * c-c++-common/goacc/kernels-loop-2.c: Likewise. * c-c++-common/goacc/kernels-loop-3.c: Likewise. * c-c++-common/goacc/kernels-loop-data-2.c: Likewise. * c-c++-common/goacc/kernels-loop-data-enter-exit-2.c: Likewise. * c-c++-common/goacc/kernels-loop-data-enter-exit.c: Likewise. * c-c++-common/goacc/kernels-loop-data-update.c: Likewise. * c-c++-common/goacc/kernels-loop-data.c: Likewise. * c-c++-common/goacc/kernels-loop-g.c: Likewise. * c-c++-common/goacc/kernels-loop-mod-not-zero.c: Likewise. * c-c++-common/goacc/kernels-loop-n.c: Likewise. * c-c++-common/goacc/kernels-loop-nest.c: Likewise. * c-c++-common/goacc/kernels-loop.c: Likewise. * c-c++-common/goacc/kernels-one-counter-var.c: Likewise. * c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c: Likewise. * gfortran.dg/goacc/classify-kernels-parloops.f95: Likewise. * gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95: Likewise. * gfortran.dg/goacc/classify-kernels-unparallelized.f95: Likewise. * gfortran.dg/goacc/classify-kernels.f95: Likewise. * gfortran.dg/goacc/classify-parallel.f95: Likewise. * gfortran.dg/goacc/classify-serial.f95: Likewise. * gfortran.dg/goacc/kernels-loop-2.f95: Likewise. * gfortran.dg/goacc/kernels-loop-data-2.f95: Likewise. * gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95: Likewise. * gfortran.dg/goacc/kernels-loop-data-enter-exit.f95: Likewise. * gfortran.dg/goacc/kernels-loop-data-update.f95: Likewise. * gfortran.dg/goacc/kernels-loop-data.f95: Likewise. * gfortran.dg/goacc/kernels-loop-n.f95: Likewise. * gfortran.dg/goacc/kernels-loop.f95: Likewise. * gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95: Likewise.
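A sketch of what reverse offload looks like at the source level (my own illustrative example, following the OpenMP 5.0 syntax; it requires a matching 'requires reverse_offload'):

  #pragma omp requires reverse_offload

  void
  compute (int *a, int n)
  {
  #pragma omp target map(tofrom: a[0:n])
    {
      /* ... device code ... */
  #pragma omp target device(ancestor: 1) map(always, tofrom: a[0:n])
      {
        /* Runs back on the host, e.g. for I/O the device cannot do.  */
      }
    }
  }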
2022-08-26internal-fn, tree-cfg: Fix .TRAP handling and another __builtin_trap vops issue [PR106099]Jakub Jelinek1-1/+2
This patch fixes 2 __builtin_unreachable/__builtin_trap related issues. One (first hunk) is that CDDCE happily removes calls to the .TRAP () internal-fn as useless. The problem is that the internal-fn is ECF_CONST | ECF_NORETURN, doesn't have an lhs, and so DCE thinks it doesn't have side-effects and removes it. __builtin_unreachable, which has the same ECF_* flags, works fine, because since PR44485 we implicitly add ECF_LOOPING_CONST_OR_PURE to ECF_CONST | ECF_NORETURN builtins, but do it in flags_from_decl_or_type, which isn't called for internal-fns. As IFN_TRAP is the only ifn with such flags, it seems easier to add the flag explicitly.

The other issue (which on the testcase can be seen only with the first bug unfixed) is that execute_fixup_cfg can add a __builtin_trap which needs vops, but nothing adds them, and this can appear in many passes which don't have the corresponding TODO_update_ssa_only_virtuals etc. Fixed similarly to last time, but by emitting the ifn there instead.

2022-08-26  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/106099
	* internal-fn.def (TRAP): Add ECF_LOOPING_CONST_OR_PURE flag.
	* tree-cfg.cc (execute_fixup_cfg): Add IFN_TRAP instead of
	__builtin_trap to avoid the need of vops.
	* gcc.dg/pr106099.c: New test.
2022-07-28gimple, internal-fn: Add IFN_TRAP and use it for __builtin_unreachable [PR106099]Jakub Jelinek1-0/+4
__builtin_unreachable and __ubsan_handle_builtin_unreachable don't use vops; they are marked const/leaf/noreturn/nothrow/cold. But __builtin_trap uses vops and isn't const, just leaf/noreturn/nothrow/cold. This is, I believe, so that when users explicitly use __builtin_trap in their sources, they get stores visible at the trap site.

-fsanitize=unreachable -fsanitize-undefined-trap-on-error used to transform __builtin_unreachable to __builtin_trap even in the past, but the sanopt pass has TODO_update_ssa, so it worked fine. Now that gimple_build_builtin_unreachable can build a __builtin_trap call right away, we can run into problems: whenever we need it, we would have to ensure the vops are updated, either manually or through TODO_update*. Though, as it is originally a __builtin_unreachable that is just implemented as a trap, I think it is fine to avoid vops for this case. For this the patch introduces IFN_TRAP, which has ECF_* flags like __builtin_unreachable and is expanded as __builtin_trap.

2022-07-28  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/106099
	* internal-fn.def (TRAP): New internal fn.
	* internal-fn.h (expand_TRAP): Declare.
	* internal-fn.cc (expand_TRAP): Define.
	* gimple.cc (gimple_build_builtin_unreachable): For
	BUILT_IN_TRAP, use internal fn rather than builtin.
	* gcc.dg/ubsan/pr106099.c: New test.
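The distinction can be illustrated like this (my own example, matching the description above rather than code from the patch):

  void
  f (int *p)
  {
    *p = 42;
    __builtin_trap ();          /* the store must stay visible at the trap,
                                   hence __builtin_trap needs vops */
  }

  void
  g (int x)
  {
    if (x != 0)
      __builtin_unreachable (); /* unreachable-as-trap has no such
                                   requirement, so IFN_TRAP can skip vops */
  }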
2022-07-12Add internal functions for iround etc. [PR106253]Richard Sandiford1-0/+23
The PR is about the aarch64 port using an ACLE built-in function to vectorise a scalar function call, even though the ECF_* flags for the ACLE function didn't match the ECF_* flags for the scalar call. To some extent that kind of difference is inevitable, since the ACLE intrinsics are supposed to follow the behaviour of the underlying instruction as closely as possible. Also, using target-specific builtins has the drawback of limiting further gimple optimisation, since the gimple optimisers won't know what the function does. We handle several other maths functions, including round, floor and ceil, by defining directly-mapped internal functions that are linked to the associated built-in functions. This has two main advantages: - it means that, internally, we are not restricted to the set of scalar types that happen to have associated C/C++ functions - the functions (and thus the underlying optabs) extend naturally to vectors This patch takes the same approach for the remaining functions handled by aarch64_builtin_vectorized_function. gcc/ PR target/106253 * predict.h (insn_optimization_type): Declare. * predict.cc (insn_optimization_type): New function. * internal-fn.def (IFN_ICEIL, IFN_IFLOOR, IFN_IRINT, IFN_IROUND) (IFN_LCEIL, IFN_LFLOOR, IFN_LRINT, IFN_LROUND, IFN_LLCEIL) (IFN_LLFLOOR, IFN_LLRINT, IFN_LLROUND): New internal functions. * internal-fn.cc (unary_convert_direct): New macro. (expand_convert_optab_fn): New function. (expand_unary_convert_optab_fn): New macro. (direct_unary_convert_optab_supported_p): Likewise. * optabs.cc (expand_sfix_optab): Pass insn_optimization_type to convert_optab_handler. * config/aarch64/aarch64-protos.h (aarch64_builtin_vectorized_function): Delete. * config/aarch64/aarch64-builtins.cc (aarch64_builtin_vectorized_function): Delete. * config/aarch64/aarch64.cc (TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION): Delete. * config/i386/i386.cc (ix86_optab_supported_p): Handle lround_optab. * config/i386/i386.md (lround<X87MODEF:mode><SWI248x:mode>2): Remove optimize_insn_for_size_p test. gcc/testsuite/ PR target/106253 * gcc.target/aarch64/vect_unary_1.c: Add tests for iroundf, llround, iceilf, llceil, ifloorf, llfloor, irintf and llrint. * gfortran.dg/vect/pr106253.f: New test.
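The kind of loop this lets the vectorizer handle generically (an illustrative example of my own, in the spirit of the new vect_unary_1.c tests):

  /* Can now vectorize via the directly-mapped IFN_LROUND rather than a
     target-specific builtin, keeping the call visible to gimple opts.  */
  void
  round_all (const float *in, long *out, int n)
  {
    for (int i = 0; i < n; i++)
      out[i] = __builtin_lroundf (in[i]);
  }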
2022-06-15Revert recent internal-fn changes [PR105975]Richard Sandiford1-26/+8
The recent internal-fn “clean-ups” triggered problems on nvptx because some of the omp_simt_* patterns had modeless operands. I wondered about adapting expand_fn_using_insn to cope with that, but then the problem becomes: what should the mode of operand 0 be when there is no lhs? The answer depends on the target insn. For GOMP_SIMT_ENTER_ALLOC the answer was: use Pmode. For GOMP_SIMT_ORDERED_PRED and others the answer was: elide the call. (However, GOMP_SIMT_ORDERED_PRED doesn't seem to have ECF_* flags that would normally allow it to be dropped at the gimple level.) So these instructions seem to be special enough that they need their own code after all. This patch reverts the second patch and most of the first. The only part retained from the first is splitting expand_fn_using_insn out of expand_direct_optab_fn, since I think expand_fn_using_insn could still be useful in future. gcc/ PR middle-end/105975 Revert everything apart from the expand_fn_using_insn and expand_direct_optab_fn changes from: * internal-fn.def (DEF_INTERNAL_INSN_FN): New macro. (GOMP_SIMT_ENTER_ALLOC, GOMP_SIMT_EXIT, GOMP_SIMT_LANE) (GOMP_SIMT_LAST_LANE, GOMP_SIMT_ORDERED_PRED, GOMP_SIMT_VOTE_ANY) (GOMP_SIMT_XCHG_BFLY, GOMP_SIMT_XCHG_IDX): Use it. * internal-fn.h (direct_internal_fn_info::directly_mapped): New member variable. (direct_internal_fn_info::vectorizable): Reduce to 1 bit. (direct_internal_fn_p): Also return true for internal functions that map directly to instructions defined target-insns.def. (direct_internal_fn): Adjust comment accordingly. * internal-fn.cc (direct_insn, optab1, optab2, vectorizable_optab1) (vectorizable_optab2): New local macros. (not_direct): Initialize directly_mapped. (mask_load_direct, load_lanes_direct, mask_load_lanes_direct) (gather_load_direct, len_load_direct, mask_store_direct) (store_lanes_direct, mask_store_lanes_direct, vec_cond_mask_direct) (vec_cond_direct, scatter_store_direct, len_store_direct) (vec_set_direct, unary_direct, binary_direct, ternary_direct) (cond_unary_direct, cond_binary_direct, cond_ternary_direct) (while_direct, fold_extract_direct, fold_left_direct) (mask_fold_left_direct, check_ptrs_direct): Use the macros above. (expand_GOMP_SIMT_ENTER_ALLOC, expand_GOMP_SIMT_EXIT): Delete (expand_GOMP_SIMT_LANE, expand_GOMP_SIMT_LAST_LANE): Likewise; (expand_GOMP_SIMT_ORDERED_PRED, expand_GOMP_SIMT_VOTE_ANY): Likewise. (expand_GOMP_SIMT_XCHG_BFLY, expand_GOMP_SIMT_XCHG_IDX): Likewise. (direct_internal_fn_types): Handle functions that map to instructions defined in target-insns.def. (direct_internal_fn_types): Likewise. (direct_internal_fn_supported_p): Likewise. (internal_fn_expanders): Likewise. (expand_fn_using_insn): New function, split out and adapted from... (expand_direct_optab_fn): ...here. (expand_GOMP_SIMT_ENTER_ALLOC): Use it. (expand_GOMP_SIMT_EXIT): Likewise. (expand_GOMP_SIMT_LANE): Likewise. (expand_GOMP_SIMT_LAST_LANE): Likewise. (expand_GOMP_SIMT_ORDERED_PRED): Likewise. (expand_GOMP_SIMT_VOTE_ANY): Likewise. (expand_GOMP_SIMT_XCHG_BFLY): Likewise. (expand_GOMP_SIMT_XCHG_IDX): Likewise.
2022-06-13Add a general mapping from internal fns to target insnsRichard Sandiford1-8/+26
Several existing internal functions map directly to an instruction defined in target-insns.def. This patch makes it easier to define more such functions in future. This should help to reduce cut-&-paste, but more importantly, it allows the difference between optab functions and target-insns.def functions to be abstracted away; both are now treated as “directly-mapped”. gcc/ * internal-fn.def (DEF_INTERNAL_INSN_FN): New macro. (GOMP_SIMT_ENTER_ALLOC, GOMP_SIMT_EXIT, GOMP_SIMT_LANE) (GOMP_SIMT_LAST_LANE, GOMP_SIMT_ORDERED_PRED, GOMP_SIMT_VOTE_ANY) (GOMP_SIMT_XCHG_BFLY, GOMP_SIMT_XCHG_IDX): Use it. * internal-fn.h (direct_internal_fn_info::directly_mapped): New member variable. (direct_internal_fn_info::vectorizable): Reduce to 1 bit. (direct_internal_fn_p): Also return true for internal functions that map directly to instructions defined target-insns.def. (direct_internal_fn): Adjust comment accordingly. * internal-fn.cc (direct_insn, optab1, optab2, vectorizable_optab1) (vectorizable_optab2): New local macros. (not_direct): Initialize directly_mapped. (mask_load_direct, load_lanes_direct, mask_load_lanes_direct) (gather_load_direct, len_load_direct, mask_store_direct) (store_lanes_direct, mask_store_lanes_direct, vec_cond_mask_direct) (vec_cond_direct, scatter_store_direct, len_store_direct) (vec_set_direct, unary_direct, binary_direct, ternary_direct) (cond_unary_direct, cond_binary_direct, cond_ternary_direct) (while_direct, fold_extract_direct, fold_left_direct) (mask_fold_left_direct, check_ptrs_direct): Use the macros above. (expand_GOMP_SIMT_ENTER_ALLOC, expand_GOMP_SIMT_EXIT): Delete (expand_GOMP_SIMT_LANE, expand_GOMP_SIMT_LAST_LANE): Likewise; (expand_GOMP_SIMT_ORDERED_PRED, expand_GOMP_SIMT_VOTE_ANY): Likewise. (expand_GOMP_SIMT_XCHG_BFLY, expand_GOMP_SIMT_XCHG_IDX): Likewise. (direct_internal_fn_types): Handle functions that map to instructions defined in target-insns.def. (direct_internal_fn_types): Likewise. (direct_internal_fn_supported_p): Likewise. (internal_fn_expanders): Likewise.
2022-01-17widening_mul, i386: Improve spaceship expansion on x86 [PR103973]Jakub Jelinek1-0/+3
C++20:

  #include <compare>
  auto cmp4way (double a, double b)
  {
    return a <=> b;
  }

expands to:

	ucomisd	%xmm1, %xmm0
	jp	.L8
	movl	$0, %eax
	jne	.L8
.L2:
	ret
	.p2align 4,,10
	.p2align 3
.L8:
	comisd	%xmm0, %xmm1
	movl	$-1, %eax
	ja	.L2
	ucomisd	%xmm1, %xmm0
	setbe	%al
	addl	$1, %eax
	ret

That is 3 comparisons of the same operands. The following patch improves it to just one comparison:

	comisd	%xmm1, %xmm0
	jp	.L4
	seta	%al
	movl	$0, %edx
	leal	-1(%rax,%rax), %eax
	cmove	%edx, %eax
	ret
.L4:
	movl	$2, %eax
	ret

While a <=> b expands to

  a == b ? 0 : a < b ? -1 : a > b ? 1 : 2

where the first comparison is equality and so shouldn't raise exceptions on qNaN operands, if the operands aren't equal (which includes the unordered cases), it immediately performs a < or > comparison, and that raises exceptions even on qNaNs, so we can just perform a single comparison that raises exceptions on qNaN. As the 4 different cases are encoded as

  ZF CF PF
   1  1  1   a unordered b
   0  0  0   a > b
   0  1  0   a < b
   1  0  0   a == b

we can emit an optimal sequence of comparisons: first jp for the unordered case, then je for the == case, and finally jb for the < case.

The patch pattern-recognizes spaceship-like comparisons during widening_mul if the spaceship optab is implemented, and replaces those comparisons with comparisons of the .SPACESHIP ifn, which returns -1/0/1/2 based on the comparison. This seems to work well both for the case of just returning the -1/0/1/2 (when we have just a common successor with a PHI) and for the case where the different cases are handled in various other basic blocks. The testcases cover both of those cases, the latter with different function calls in those.

2022-01-17  Jakub Jelinek  <jakub@redhat.com>

	PR target/103973
	* tree-cfg.h (cond_only_block_p): Declare.
	* tree-ssa-phiopt.c (cond_only_block_p): Move function to ...
	* tree-cfg.c (cond_only_block_p): ... here. No longer static.
	* optabs.def (spaceship_optab): New optab.
	* internal-fn.def (SPACESHIP): New internal function.
	* internal-fn.h (expand_SPACESHIP): Declare.
	* internal-fn.c (expand_PHI): Formatting fix.
	(expand_SPACESHIP): New function.
	* tree-ssa-math-opts.c (optimize_spaceship): New function.
	(math_opts_dom_walker::after_dom_children): Use it.
	* config/i386/i386.md (spaceship<mode>3): New define_expand.
	* config/i386/i386-protos.h (ix86_expand_fp_spaceship): Declare.
	* config/i386/i386-expand.c (ix86_expand_fp_spaceship): New function.
	* doc/md.texi (spaceship@var{m}3): Document.
	* gcc.target/i386/pr103973-1.c: New test.
	* gcc.target/i386/pr103973-2.c: New test.
	* gcc.target/i386/pr103973-3.c: New test.
	* gcc.target/i386/pr103973-4.c: New test.
	* gcc.target/i386/pr103973-5.c: New test.
	* gcc.target/i386/pr103973-6.c: New test.
	* gcc.target/i386/pr103973-7.c: New test.
	* gcc.target/i386/pr103973-8.c: New test.
	* gcc.target/i386/pr103973-9.c: New test.
	* gcc.target/i386/pr103973-10.c: New test.
	* gcc.target/i386/pr103973-11.c: New test.
	* gcc.target/i386/pr103973-12.c: New test.
	* gcc.target/i386/pr103973-13.c: New test.
	* gcc.target/i386/pr103973-14.c: New test.
	* gcc.target/i386/pr103973-15.c: New test.
	* gcc.target/i386/pr103973-16.c: New test.
	* gcc.target/i386/pr103973-17.c: New test.
	* gcc.target/i386/pr103973-18.c: New test.
	* gcc.target/i386/pr103973-19.c: New test.
	* gcc.target/i386/pr103973-20.c: New test.
	* g++.target/i386/pr103973-1.C: New test.
	* g++.target/i386/pr103973-2.C: New test.
	* g++.target/i386/pr103973-3.C: New test.
	* g++.target/i386/pr103973-4.C: New test.
	* g++.target/i386/pr103973-5.C: New test.
	* g++.target/i386/pr103973-6.C: New test.
* g++.target/i386/pr103973-7.C: New test. * g++.target/i386/pr103973-8.C: New test. * g++.target/i386/pr103973-9.C: New test. * g++.target/i386/pr103973-10.C: New test. * g++.target/i386/pr103973-11.C: New test. * g++.target/i386/pr103973-12.C: New test. * g++.target/i386/pr103973-13.C: New test. * g++.target/i386/pr103973-14.C: New test. * g++.target/i386/pr103973-15.C: New test. * g++.target/i386/pr103973-16.C: New test. * g++.target/i386/pr103973-17.C: New test. * g++.target/i386/pr103973-18.C: New test. * g++.target/i386/pr103973-19.C: New test. * g++.target/i386/pr103973-20.C: New test.
2022-01-03i386, fab: Optimize __atomic_{add,sub,and,or,xor}_fetch (x, y, z) {==,!=,<,<=,>,>=} 0 [PR98737]Jakub Jelinek1-0/+5
On Wed, Jan 27, 2021 at 12:27:13PM +0100, Ulrich Drepper via Gcc-patches wrote:
> On 1/27/21 11:37 AM, Jakub Jelinek wrote:
> > Would equality comparison against 0 handle the most common cases.
> >
> > The user can write it as
> > __atomic_sub_fetch (x, y, z) == 0
> > or
> > __atomic_fetch_sub (x, y, z) - y == 0
> > though, so the expansion code would need to be able to cope with both.
>
> Please also keep !=0, <0, <=0, >0, and >=0 in mind. They all can be
> useful and can be handled with the flags.

<= 0 and > 0 don't really work well with lock {add,sub,inc,dec}: x86 doesn't have comparisons that would look solely at both SF and ZF and not at other flags (and emitting two separate conditional jumps or two setcc insns ORed together looks awful). But the rest can work.

Here is a patch that adds internal functions and optabs for these, recognizes them at the same spot as e.g. the .ATOMIC_BIT_TEST_AND* internal functions (the fold-all-builtins pass) and expands them appropriately (or, for the <= 0 and > 0 cases of +/-, FAILs and lets the middle end fall back).

So far I have handled just the op_fetch builtins; IMHO instead of also handling __atomic_fetch_sub (x, y, z) - y == 0 etc. we should canonicalize __atomic_fetch_sub (x, y, z) - y to __atomic_sub_fetch (x, y, z) (and vice versa).

2022-01-03  Jakub Jelinek  <jakub@redhat.com>

	PR target/98737
	* internal-fn.def (ATOMIC_ADD_FETCH_CMP_0, ATOMIC_SUB_FETCH_CMP_0,
	ATOMIC_AND_FETCH_CMP_0, ATOMIC_OR_FETCH_CMP_0,
	ATOMIC_XOR_FETCH_CMP_0): New internal fns.
	* internal-fn.h (ATOMIC_OP_FETCH_CMP_0_EQ, ATOMIC_OP_FETCH_CMP_0_NE,
	ATOMIC_OP_FETCH_CMP_0_LT, ATOMIC_OP_FETCH_CMP_0_LE,
	ATOMIC_OP_FETCH_CMP_0_GT, ATOMIC_OP_FETCH_CMP_0_GE): New enumerators.
	* internal-fn.c (expand_ATOMIC_ADD_FETCH_CMP_0,
	expand_ATOMIC_SUB_FETCH_CMP_0, expand_ATOMIC_AND_FETCH_CMP_0,
	expand_ATOMIC_OR_FETCH_CMP_0, expand_ATOMIC_XOR_FETCH_CMP_0): New
	functions.
	* optabs.def (atomic_add_fetch_cmp_0_optab,
	atomic_sub_fetch_cmp_0_optab, atomic_and_fetch_cmp_0_optab,
	atomic_or_fetch_cmp_0_optab, atomic_xor_fetch_cmp_0_optab): New
	direct optabs.
	* builtins.h (expand_ifn_atomic_op_fetch_cmp_0): Declare.
	* builtins.c (expand_ifn_atomic_op_fetch_cmp_0): New function.
	* tree-ssa-ccp.c: Include internal-fn.h.
	(optimize_atomic_bit_test_and): Add . before internal fn call
	in function comment. Change return type from void to bool and
	return true only if successfully replaced.
	(optimize_atomic_op_fetch_cmp_0): New function.
	(pass_fold_builtins::execute): Use optimize_atomic_op_fetch_cmp_0
	for BUILT_IN_ATOMIC_{ADD,SUB,AND,OR,XOR}_FETCH_{1,2,4,8,16} and
	BUILT_IN_SYNC_{ADD,SUB,AND,OR,XOR}_AND_FETCH_{1,2,4,8,16},
	for *XOR* ones only if optimize_atomic_bit_test_and failed.
	* config/i386/sync.md (atomic_<plusminus_mnemonic>_fetch_cmp_0<mode>,
	atomic_<logic>_fetch_cmp_0<mode>): New define_expand patterns.
	(atomic_add_fetch_cmp_0<mode>_1, atomic_sub_fetch_cmp_0<mode>_1,
	atomic_<logic>_fetch_cmp_0<mode>_1): New define_insn patterns.
	* doc/md.texi (atomic_add_fetch_cmp_0<mode>,
	atomic_sub_fetch_cmp_0<mode>, atomic_and_fetch_cmp_0<mode>,
	atomic_or_fetch_cmp_0<mode>, atomic_xor_fetch_cmp_0<mode>): Document
	new named patterns.
	* gcc.target/i386/pr98737-1.c: New test.
	* gcc.target/i386/pr98737-2.c: New test.
	* gcc.target/i386/pr98737-3.c: New test.
	* gcc.target/i386/pr98737-4.c: New test.
	* gcc.target/i386/pr98737-5.c: New test.
	* gcc.target/i386/pr98737-6.c: New test.
	* gcc.target/i386/pr98737-7.c: New test.
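A minimal example of the shape that now folds (my own illustration; the ifn name comes from the ChangeLog above):

  /* The == 0 test is folded into .ATOMIC_SUB_FETCH_CMP_0, which on x86
     expands to a single lock sub whose flags feed the setcc/jcc directly,
     with no separate re-read or compare.  */
  _Bool
  dec_and_test (unsigned long *counter)
  {
    return __atomic_sub_fetch (counter, 1, __ATOMIC_ACQ_REL) == 0;
  }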