Age | Commit message | Author | Files | Lines
2024-06-07 | Match: Support more form for scalar unsigned SAT_ADD | Pan Li | 4 | -5/+236

After supporting one gassign form of the unsigned .SAT_ADD, we would like to
support more forms, including both the branch and branchless ones. There are
5 other forms of .SAT_ADD, listed below:

Form 1:
  #define SAT_ADD_U_1(T) \
  T sat_add_u_1_##T(T x, T y) \
  { \
    return (T)(x + y) >= x ? (x + y) : -1; \
  }

Form 2:
  #define SAT_ADD_U_2(T) \
  T sat_add_u_2_##T(T x, T y) \
  { \
    T ret; \
    T overflow = __builtin_add_overflow (x, y, &ret); \
    return (T)(-overflow) | ret; \
  }

Form 3:
  #define SAT_ADD_U_3(T) \
  T sat_add_u_3_##T (T x, T y) \
  { \
    T ret; \
    return __builtin_add_overflow (x, y, &ret) ? -1 : ret; \
  }

Form 4:
  #define SAT_ADD_U_4(T) \
  T sat_add_u_4_##T (T x, T y) \
  { \
    T ret; \
    return __builtin_add_overflow (x, y, &ret) == 0 ? ret : -1; \
  }

Form 5:
  #define SAT_ADD_U_5(T) \
  T sat_add_u_5_##T(T x, T y) \
  { \
    return (T)(x + y) < x ? -1 : (x + y); \
  }

Take form 3 above as an example:

  uint64_t sat_add (uint64_t x, uint64_t y)
  {
    uint64_t ret;
    return __builtin_add_overflow (x, y, &ret) ? -1 : ret;
  }

Before this patch:

  uint64_t sat_add (uint64_t x, uint64_t y)
  {
    long unsigned int _1;
    long unsigned int _2;
    uint64_t _3;
    __complex__ long unsigned int _6;

  ;;   basic block 2, loop depth 0
  ;;    pred:       ENTRY
    _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
    _2 = IMAGPART_EXPR <_6>;
    if (_2 != 0)
      goto <bb 4>; [35.00%]
    else
      goto <bb 3>; [65.00%]
  ;;    succ:       4
  ;;                3

  ;;   basic block 3, loop depth 0
  ;;    pred:       2
    _1 = REALPART_EXPR <_6>;
  ;;    succ:       4

  ;;   basic block 4, loop depth 0
  ;;    pred:       3
  ;;                2
    # _3 = PHI <_1(3), 18446744073709551615(2)>
    return _3;
  ;;    succ:       EXIT
  }

After this patch:

  uint64_t sat_add (uint64_t x, uint64_t y)
  {
    long unsigned int _12;

  ;;   basic block 2, loop depth 0
  ;;    pred:       ENTRY
    _12 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
    return _12;
  ;;    succ:       EXIT
  }

The flag '^' applied to cond_expr generates matching code similar to the
following:

  else if (gphi *_a1 = dyn_cast <gphi *> (_d1))
    {
      basic_block _b1 = gimple_bb (_a1);
      if (gimple_phi_num_args (_a1) == 2)
        {
          basic_block _pb_0_1 = EDGE_PRED (_b1, 0)->src;
          basic_block _pb_1_1 = EDGE_PRED (_b1, 1)->src;
          basic_block _db_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_pb_0_1)) ? _pb_0_1 : _pb_1_1;
          basic_block _other_db_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_pb_0_1)) ? _pb_1_1 : _pb_0_1;
          gcond *_ct_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_db_1));
          if (_ct_1 && EDGE_COUNT (_other_db_1->preds) == 1
              && EDGE_COUNT (_other_db_1->succs) == 1
              && EDGE_PRED (_other_db_1, 0)->src == _db_1)
            {
              tree _cond_lhs_1 = gimple_cond_lhs (_ct_1);
              tree _cond_rhs_1 = gimple_cond_rhs (_ct_1);
              tree _p0 = build2 (gimple_cond_code (_ct_1), boolean_type_node, _cond_lhs_1, _cond_rhs_1);
              bool _arg_0_is_true_1 = gimple_phi_arg_edge (_a1, 0)->flags & EDGE_TRUE_VALUE;
              tree _p1 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 0 : 1);
              tree _p2 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 1 : 0);
              ....

The following test suites pass with this patch:
* The x86 bootstrap test.
* The x86 full regression test.
* The riscv full regression test.

gcc/ChangeLog:

	* doc/match-and-simplify.texi: Add doc for the matching flag '^'.
	* genmatch.cc (cmp_operand): Add match_phi comparison.
	(dt_node::gen_kids_1): Add cond_expr bool flag for phi match.
	(dt_operand::gen_phi_on_cond): Add new func to gen phi matching
	on cond_expr.
	(parser::parse_expr): Add handling for the expr flag '^'.
	* match.pd: Add more forms for unsigned .SAT_ADD.
	* tree-ssa-math-opts.cc (build_saturation_binary_arith_call): Add
	new func impl to build call for phi gimple.
	(match_unsigned_saturation_add): Add new func impl to match the
	.SAT_ADD for phi gimple.
	(math_opts_dom_walker::after_dom_children): Add phi matching try
	for all gimple phi stmt.

Signed-off-by: Pan Li <pan2.li@intel.com>
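Because all five forms are meant to compute the same saturating add, they can be cross-checked against each other. The sketch below instantiates the commit's own macros for `uint8_t` rather than `uint64_t`, so every input pair can be tested exhaustively. `sat_add_ref` and `check_all_sat_add_forms` are hypothetical helpers added only for the comparison, and `__builtin_add_overflow` assumes GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Forms 1-5, copied from the commit message.  */
#define SAT_ADD_U_1(T) \
T sat_add_u_1_##T(T x, T y) \
{ \
  return (T)(x + y) >= x ? (x + y) : -1; \
}

#define SAT_ADD_U_2(T) \
T sat_add_u_2_##T(T x, T y) \
{ \
  T ret; \
  T overflow = __builtin_add_overflow (x, y, &ret); \
  return (T)(-overflow) | ret; \
}

#define SAT_ADD_U_3(T) \
T sat_add_u_3_##T (T x, T y) \
{ \
  T ret; \
  return __builtin_add_overflow (x, y, &ret) ? -1 : ret; \
}

#define SAT_ADD_U_4(T) \
T sat_add_u_4_##T (T x, T y) \
{ \
  T ret; \
  return __builtin_add_overflow (x, y, &ret) == 0 ? ret : -1; \
}

#define SAT_ADD_U_5(T) \
T sat_add_u_5_##T(T x, T y) \
{ \
  return (T)(x + y) < x ? -1 : (x + y); \
}

/* Instantiate every form for an 8-bit type (small enough to test all
   65536 input pairs).  */
SAT_ADD_U_1(uint8_t)
SAT_ADD_U_2(uint8_t)
SAT_ADD_U_3(uint8_t)
SAT_ADD_U_4(uint8_t)
SAT_ADD_U_5(uint8_t)

/* Hypothetical reference: add in a wider type, then clamp to the max.  */
uint8_t sat_add_ref (uint8_t x, uint8_t y)
{
  unsigned sum = (unsigned) x + y;
  return sum > UINT8_MAX ? UINT8_MAX : (uint8_t) sum;
}

/* Asserts that all five forms agree with the reference on every input.  */
void check_all_sat_add_forms (void)
{
  for (unsigned x = 0; x <= UINT8_MAX; x++)
    for (unsigned y = 0; y <= UINT8_MAX; y++)
      {
        uint8_t a = (uint8_t) x, b = (uint8_t) y;
        uint8_t want = sat_add_ref (a, b);
        assert (sat_add_u_1_uint8_t (a, b) == want);
        assert (sat_add_u_2_uint8_t (a, b) == want);
        assert (sat_add_u_3_uint8_t (a, b) == want);
        assert (sat_add_u_4_uint8_t (a, b) == want);
        assert (sat_add_u_5_uint8_t (a, b) == want);
      }
}
```

For `uint64_t`, as in the commit's example, exhaustive testing is infeasible and spot checks around the overflow boundary would have to do instead.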
2024-06-06 | c: Fix up pointer types to may_alias structures [PR114493] | Jakub Jelinek | 3 | -0/+60

The following testcase ICEs in ipa-free-lang, because the
fld_incomplete_type_of
  gcc_assert (TYPE_CANONICAL (t2) != t2
	      && TYPE_CANONICAL (t2) == TYPE_CANONICAL (TREE_TYPE (t)));
assertion doesn't hold.
This is because t is a struct S * type which was created while struct S
was still incomplete and without the may_alias attribute (and TYPE_CANONICAL
of a pointer type is a type created with the can_alias_all = false argument),
while the may_alias attribute was only added later, on the struct definition.
fld_incomplete_type_of then creates an incomplete distinct copy of the
structure (but with the original attributes), and because of the "may_alias"
attribute, pointers created for it are TYPE_REF_CAN_ALIAS_ALL, including
their TYPE_CANONICAL: although that is created with a !can_alias_all
argument, we later set the flag because of the "may_alias" attribute on
the to_type.

This doesn't ICE with C++ since the PR70512 fix, because the C++ FE sets
TYPE_REF_CAN_ALIAS_ALL on all pointer types to the class type (and its
variants) when the may_alias attribute is added.
The following patch does that in the C FE as well.

2024-06-06  Jakub Jelinek  <jakub@redhat.com>

	PR c/114493
	* c-decl.cc (c_fixup_may_alias): New function.
	(finish_struct): Call it if "may_alias" attribute is
	specified.

	* gcc.dg/pr114493-1.c: New test.
	* gcc.dg/pr114493-2.c: New test.
2024-06-06 | aarch64: Add vector floating point extend pattern [PR113880, PR113869] | Pengxuan Zheng | 3 | -1/+31

This patch adds a vector floating-point extend pattern for V2SF->V2DF and
V4HF->V4SF conversions by renaming the existing aarch64_float_extend_lo_<Vwide>
pattern to the standard optab one, i.e., extend<mode><Vwide>2. This allows the
vectorizer to vectorize certain floating-point widening operations for the
aarch64 target.

	PR target/113880
	PR target/113869

gcc/ChangeLog:

	* config/aarch64/aarch64-builtins.cc (VAR1): Remap float_extend_lo_
	builtin codes to standard optab ones.
	* config/aarch64/aarch64-simd.md (aarch64_float_extend_lo_<Vwide>): Rename
	to...
	(extend<mode><Vwide>2): ... This.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/extend-vec.c: New test.

Signed-off-by: Pengxuan Zheng <quic_pzheng@quicinc.com>
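A loop of the following shape is the kind of floating-point widening operation (V2SF->V2DF) the commit refers to. It is a hypothetical illustration, not the contents of `extend-vec.c`; on aarch64 at `-O3` one would expect the vectorizer to use the renamed `extend<mode><Vwide>2` pattern (the widening `fcvtl` convert), but the exact code generated is not stated in the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Widen each float to double.  With a standard extend optab available,
   the vectorizer can convert two floats to two doubles per vector op.  */
void widen_f32_to_f64 (double *restrict out, const float *restrict in,
                       size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (double) in[i];
}
```

The V4HF->V4SF case would look the same with a half-precision source type such as `_Float16`, where the compiler supports it.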
2024-06-06 | modula2: Simplify REAL/LONGREAL/SHORTREAL node creation. | Gaius Mulley | 1 | -23/+7

This patch simplifies the real type build functions by using the
default float_type_node and double_type_node rather than creating new
nodes. It also uses the default GCC long_double_type_node or
float128_type_node for longreal.

gcc/m2/ChangeLog:

	* gm2-gcc/m2type.cc (build_m2_short_real_node): Rewrite to use the
	default float_type_node.
	(build_m2_real_node): Rewrite to use the default double_type_node.
	(build_m2_long_real_node): Rewrite to use the default
	long_double_type_node or float128_type_node.

Co-Authored-By: Kewen.Lin <linkw@linux.ibm.com>
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
2024-06-06 | testsuite/i386: Add vector sat_sub testcases [PR112600] | Uros Bizjak | 2 | -0/+30

	PR middle-end/112600

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr112600-2a.c: New test.
	* gcc.target/i386/pr112600-2b.c: New test.
2024-06-06 | Plugins: Add label-text.h to CPPLIB_H so it will be installed [PR115288] | Andrew Pinski | 1 | -0/+1

After r15-874-g9bda2c4c81b668, out-of-tree plugins won't compile, as the new
libcpp header file label-text.h is not installed.

This adds the new header file to CPPLIB_H, which is used for the plugin
headers to install.

Committed as obvious after building, installing, and making sure the new
header file is installed.

gcc/ChangeLog:

	PR plugins/115288
	* Makefile.in (CPPLIB_H): Add label-text.h.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-06-06 | aarch64: Add missing ACLE macro for NEON-SVE Bridge | Richard Ball | 1 | -0/+1

__ARM_NEON_SVE_BRIDGE was missed in the original patch and is
added by this patch.

gcc/ChangeLog:

	* config/aarch64/aarch64-c.cc (aarch64_define_unconditional_macros):
	Add missing __ARM_NEON_SVE_BRIDGE.
2024-06-06 | arm: Fix CASE_VECTOR_SHORTEN_MODE for thumb2. | Richard Ball | 2 | -2/+146

The CASE_VECTOR_SHORTEN_MODE query is missing some equals signs,
which causes suboptimal codegen due to missed optimisation
opportunities. This patch also adds a test for thumb2 switch
statements, as none exist currently.

gcc/ChangeLog:
	PR target/115353
	* config/arm/arm.h (enum arm_auto_incmodes):
	Correct CASE_VECTOR_SHORTEN_MODE query.

gcc/testsuite/ChangeLog:

	* gcc.target/arm/thumb2-switchstatement.c: New test.
2024-06-06 | arm: Add .type and .size to __gnu_cmse_nonsecure_call [PR115360] | Andre Vieira | 1 | -0/+2

This patch adds missing assembly directives to the CMSE library wrapper used
to call functions with the attribute cmse_nonsecure_call.
Without the .type directive, the linker will fail to produce the correct
veneer if a call to this wrapper function is too far from the wrapper itself.
The .size was added for completeness, though we don't necessarily have a
use case for it.

libgcc/ChangeLog:

	PR target/115360
	* config/arm/cmse_nonsecure_call.S: Add .type and .size directives.
2024-06-06 | libgomp.texi (nvptx): Add missing preposition | Tobias Burnus | 1 | -1/+1

libgomp/
	* libgomp.texi (nvptx): Add missing preposition.
2024-06-06 | AArch64: correct constraint on Upl early clobber alternatives | Tamar Christina | 2 | -33/+33

I made an oversight in the previous patch, where I added a ?Upa
alternative to the Upl cases. This causes the tie to be created with
the larger register file rather than the constrained one.

This fixes the affected patterns.

gcc/ChangeLog:

	* config/aarch64/aarch64-sve.md (@aarch64_pred_cmp<cmp_op><mode>,
	*cmp<cmp_op><mode>_cc, *cmp<cmp_op><mode>_ptest,
	@aarch64_pred_cmp<cmp_op><mode>_wide,
	*aarch64_pred_cmp<cmp_op><mode>_wide_cc,
	*aarch64_pred_cmp<cmp_op><mode>_wide_ptest): Fix Upl tie alternative.
	* config/aarch64/aarch64-sve2.md (@aarch64_pred_<sve_int_op><mode>): Fix
	Upl tie alternative.
2024-06-06 | nvptx, libgfortran: Switch out of "minimal" mode | Thomas Schwinge | 11 | -53/+134

..., in order to enable (portions of) Fortran I/O, for example.

libgfortran/
	* configure.ac: No longer set 'LIBGFOR_MINIMAL' for nvptx.
	* configure: Regenerate.
libgomp/
	* libgomp.texi (nvptx): Update.
	* testsuite/libgomp.fortran/target-print-1-nvptx.f90: Remove.
	* testsuite/libgomp.fortran/target-print-1.f90: Adjust.
	* testsuite/libgomp.oacc-fortran/error_stop-2-nvptx.f: New.
	* testsuite/libgomp.oacc-fortran/error_stop-2.f: Adjust.
	* testsuite/libgomp.oacc-fortran/print-1-nvptx.f90: Adjust.
	* testsuite/libgomp.oacc-fortran/print-1.f90: Adjust.
	* testsuite/libgomp.oacc-fortran/stop-2-nvptx.f: New.
	* testsuite/libgomp.oacc-fortran/stop-2.f: Adjust.

Co-authored-by: Andrew Stubbs <ams@gcc.gnu.org>
2024-06-06 | nvptx offloading: 'GOMP_NVPTX_NATIVE_GPU_THREAD_STACK_SIZE' environment variable [PR97384, PR105274] | Thomas Schwinge | 2 | -0/+46

... as a means to manually set the "native" GPU thread stack size.

	PR libgomp/97384
	PR libgomp/105274
libgomp/
	* plugin/cuda-lib.def (cuCtxSetLimit): Add.
	* plugin/plugin-nvptx.c (nvptx_open_device): Handle
	'GOMP_NVPTX_NATIVE_GPU_THREAD_STACK_SIZE' environment variable.
2024-06-06 | nvptx, libgcc: Stub unwinding implementation | Thomas Schwinge | 2 | -1/+39

Adding stub '_Unwind_Backtrace', '_Unwind_GetIPInfo' functions is necessary
for linking libbacktrace, as a normal (non-'LIBGFOR_MINIMAL') configuration
of libgfortran wants to do, for example.

The file 'libgcc/config/nvptx/unwind-nvptx.c' is copied from
'libgcc/config/gcn/unwind-gcn.c'.

libgcc/ChangeLog:

	* config/nvptx/t-nvptx: Add unwind-nvptx.c.
	* config/nvptx/unwind-nvptx.c: New file.

Co-authored-by: Andrew Stubbs <ams@gcc.gnu.org>
2024-06-06 | nvptx offloading: Global constructor, destructor support, via nvptx-tools 'ld' | Thomas Schwinge | 2 | -1/+171

This extends commit d9c90c82d900fdae95df4499bf5f0a4ecb903b53
"nvptx target: Global constructor, destructor support, via nvptx-tools 'ld'"
for offloading.

libgcc/
	* config/nvptx/gbl-ctors.c ["mgomp"]
	(__do_global_ctors__entry__mgomp)
	(__do_global_dtors__entry__mgomp): New.
	[!"mgomp"] (__do_global_ctors__entry, __do_global_dtors__entry):
	New.
libgomp/
	* plugin/plugin-nvptx.c (nvptx_do_global_cdtors): New.
	(nvptx_close_device, GOMP_OFFLOAD_load_image)
	(GOMP_OFFLOAD_unload_image): Call it.
2024-06-06 | nvptx: Make 'nvptx_uniform_warp_check' fit for non-full-warp execution, via 'vote.all.pred' | Thomas Schwinge | 4 | -9/+39

For example, this allows for '-muniform-simt' code to be executed
single-threaded, which currently fails (device-side 'trap'): the '0xffffffff'
bitmask isn't correct if not all 32 threads of a warp are active. The same
issue/fix would, I suppose (but have not verified), apply if we were to allow
for OpenACC 'vector_length' smaller than 32, for example for OpenACC 'serial'.

We use 'nvptx_uniform_warp_check' only for PTX ISA versions less than 6.0.
Otherwise we're using 'nvptx_warpsync', which emits 'bar.warp.sync 0xffffffff',
which evidently appears to do the right thing. (I've tested '-muniform-simt'
code executing single-threaded.)

The change that I proposed on 2022-12-15 was to emit PTX code to calculate
'(1 << %ntid.x) - 1' as the actual bitmask to use instead of '0xffffffff'.
This works, but the PTX JIT generates SASS code to do this computation.

In turn, this change now uses PTX 'vote.all.pred', which even simplifies upon
the original code a little bit; see the following exemplary SASS 'diff'
before vs. after this change:

     [...]
               /*[...]*/                 SYNC (*"BRANCH_TARGETS .L_x_332"*) }
       .L_x_332:
    -          /*[...]*/                 VOTE.ANY R9, PT, PT ;
    +          /*[...]*/                 VOTE.ALL P1, PT ;
    -          /*[...]*/                 ISETP.NE.U32.AND P1, PT, R9, -0x1, PT ;
    -          /*[...]*/             @!P1 BRA `(.L_x_333) ;
    +          /*[...]*/              @P1 BRA `(.L_x_333) ;
               /*[...]*/                 BPT.TRAP 0x1 ;
       .L_x_333:
    -          /*[...]*/              @P1 EXIT ;
    +          /*[...]*/             @!P1 EXIT ;
     [...]

gcc/
	* config/nvptx/nvptx.md (nvptx_uniform_warp_check): Make fit for
	non-full-warp execution, via 'vote.all.pred'.
gcc/testsuite/
	* gcc.target/nvptx/nvptx.exp
	(check_effective_target_default_ptx_isa_version_at_least_6_0):
	New.
	* gcc.target/nvptx/uniform-simt-2.c: Adjust.
	* gcc.target/nvptx/uniform-simt-5.c: New.
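The bitmask problem can be modelled on the host. The sketch below is a hypothetical C model, not PTX: it builds the ballot a warp produces when only the first `ntid` lanes are active and each votes true, then compares it with the full-warp constant `0xffffffff` (the old check) and with the `(1 << %ntid.x) - 1` mask from the 2022 proposal. It assumes a single warp (`ntid` at most 32) and does not model `vote.all.pred` itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Ballot over one warp when lanes 0 .. ntid-1 are active and all vote 1. */
uint32_t ballot_of_true (unsigned ntid)
{
  uint32_t ballot = 0;
  for (unsigned lane = 0; lane < 32 && lane < ntid; lane++)
    ballot |= 1u << lane;
  return ballot;
}

/* The 2022 proposal's mask, (1 << ntid) - 1.  In C, 1u << 32 is
   undefined, so the full-warp case is handled separately.  */
uint32_t mask_for_ntid (unsigned ntid)
{
  return ntid >= 32 ? 0xffffffffu : (1u << ntid) - 1u;
}

/* Old check: only passes when all 32 lanes are active.  */
bool old_check_passes (unsigned ntid)
{
  return ballot_of_true (ntid) == 0xffffffffu;
}

/* Proposed check: compares against the mask of lanes that exist.  */
bool proposed_check_passes (unsigned ntid)
{
  return ballot_of_true (ntid) == mask_for_ntid (ntid);
}
```

With `ntid == 1` (single-threaded execution) the old comparison fails, which corresponds to the device-side `trap` described above, while the per-`ntid` mask passes.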
2024-06-06 | Clean up after newlib "nvptx: In offloading execution, map '_exit' to 'abort' [GCC PR85463]" | Thomas Schwinge | 8 | -33/+37

	PR target/85463
libgfortran/
	* runtime/minimal.c [__nvptx__] (exit): Don't override.
libgomp/
	* config/nvptx/error.c (exit): Don't override.
	* testsuite/libgomp.oacc-fortran/error_stop-1.f: Update.
	* testsuite/libgomp.oacc-fortran/error_stop-2.f: Likewise.
	* testsuite/libgomp.oacc-fortran/error_stop-3.f: Likewise.
	* testsuite/libgomp.oacc-fortran/stop-1.f: Likewise.
	* testsuite/libgomp.oacc-fortran/stop-2.f: Likewise.
	* testsuite/libgomp.oacc-fortran/stop-3.f: Likewise.
2024-06-06 | Vect: Support IFN SAT_SUB for unsigned vector int | Pan Li | 2 | -15/+84

This patch adds support for .SAT_SUB on unsigned vector integers.
Given the following example code:

  void
  vec_sat_sub_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
  {
    for (unsigned i = 0; i < n; i++)
      out[i] = (x[i] - y[i]) & (-(uint64_t)(x[i] >= y[i]));
  }

Before this patch:

  void
  vec_sat_sub_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
  {
    ...
    _77 = .SELECT_VL (ivtmp_75, POLY_INT_CST [2, 2]);
    ivtmp_56 = _77 * 8;
    vect__4.7_59 = .MASK_LEN_LOAD (vectp_x.5_57, 64B, { -1, ... }, _77, 0);
    vect__6.10_63 = .MASK_LEN_LOAD (vectp_y.8_61, 64B, { -1, ... }, _77, 0);
    mask__7.11_64 = vect__4.7_59 >= vect__6.10_63;
    _66 = .COND_SUB (mask__7.11_64, vect__4.7_59, vect__6.10_63, { 0, ... });
    .MASK_LEN_STORE (vectp_out.15_71, 64B, { -1, ... }, _77, 0, _66);
    vectp_x.5_58 = vectp_x.5_57 + ivtmp_56;
    vectp_y.8_62 = vectp_y.8_61 + ivtmp_56;
    vectp_out.15_72 = vectp_out.15_71 + ivtmp_56;
    ivtmp_76 = ivtmp_75 - _77;
    ...
  }

After this patch:

  void
  vec_sat_sub_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
  {
    ...
    _76 = .SELECT_VL (ivtmp_74, POLY_INT_CST [2, 2]);
    ivtmp_60 = _76 * 8;
    vect__4.7_63 = .MASK_LEN_LOAD (vectp_x.5_61, 64B, { -1, ... }, _76, 0);
    vect__6.10_67 = .MASK_LEN_LOAD (vectp_y.8_65, 64B, { -1, ... }, _76, 0);
    vect_patt_37.11_68 = .SAT_SUB (vect__4.7_63, vect__6.10_67);
    .MASK_LEN_STORE (vectp_out.12_70, 64B, { -1, ... }, _76, 0, vect_patt_37.11_68);
    vectp_x.5_62 = vectp_x.5_61 + ivtmp_60;
    vectp_y.8_66 = vectp_y.8_65 + ivtmp_60;
    vectp_out.12_71 = vectp_out.12_70 + ivtmp_60;
    ivtmp_75 = ivtmp_74 - _76;
    ...
  }

The following test suites pass with this patch:
* The x86 bootstrap test.
* The x86 full regression test.
* The riscv full regression tests.

gcc/ChangeLog:

	* match.pd: Add new form for vector mode recog.
	* tree-vect-patterns.cc (gimple_unsigned_integer_sat_sub): Add
	new match func decl;
	(vect_recog_build_binary_gimple_call): Extract helper func to
	build gcall with given internal_fn.
	(vect_recog_sat_sub_pattern): Add new func impl to recog .SAT_SUB.

Signed-off-by: Pan Li <pan2.li@intel.com>
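The branchless expression in the example loop is a saturating subtraction: `-(uint64_t)(x >= y)` is all-ones when `x >= y` and zero otherwise, so the wrapped difference is either kept or cleared. The sketch below checks that reading against a plain conditional. `sat_sub_u64` and `sat_sub_u64_ref` are hypothetical names; `vec_sat_sub_u64` is the loop from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Branchless form used in the commit's loop body.  */
uint64_t sat_sub_u64 (uint64_t x, uint64_t y)
{
  return (x - y) & (-(uint64_t)(x >= y));
}

/* Hypothetical reference: clamp at zero instead of wrapping.  */
uint64_t sat_sub_u64_ref (uint64_t x, uint64_t y)
{
  return x >= y ? x - y : 0;
}

/* The loop from the commit message, which GCC can now turn into
   .SAT_SUB on vectors.  */
void vec_sat_sub_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    out[i] = (x[i] - y[i]) & (-(uint64_t)(x[i] >= y[i]));
}
```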
2024-06-06 | lto: Remove random_seed from section name. | Michal Jires | 2 | -2/+16

This patch removes suffixes from section names during LTO linking.

These suffixes were originally added for ld -r to work (PR lto/44992).
They were added to all LTO object files, but are only useful before WPA.
After that they waste space, and if kept random, make LTO caching impossible.

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

	* lto-streamer.cc (lto_get_section_name): Remove suffixes after WPA.

gcc/lto/ChangeLog:

	* lto-common.cc (lto_section_with_id): Don't load suffix during LTRANS.
2024-06-06 | lto: Skip flag OPT_fltrans_output_list_. | Michal Jires | 1 | -0/+1

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

	* lto-opts.cc (lto_write_options): Skip OPT_fltrans_output_list_.
2024-06-06 | RISC-V: Regenerate opt urls. | Robin Dapp | 1 | -0/+6

I wasn't aware that I needed to regenerate the opt urls when
adding an option. This patch does that.

gcc/ChangeLog:

	* config/riscv/riscv.opt.urls: Regenerate.
2024-06-06 | [APX CCMP] Support ccmp for float compare | Hongyu Wang | 3 | -7/+138

The ccmp insn itself doesn't support fp compare, but x86 has the fp comi
insn, which changes EFLAGS and can be the scc input to ccmp. Allow scalar
fp compare in ix86_gen_ccmp_first, except ORDERED/UNORDERED compares,
which cannot be identified in ccmp.

gcc/ChangeLog:

	* config/i386/i386-expand.cc (ix86_gen_ccmp_first): Add fp
	compare and check the allowed fp compare type.
	(ix86_gen_ccmp_next): Adjust compare_code input to ccmp for
	fp compare.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ccmp-1.c: Add test for fp compare.
	* gcc.target/i386/apx-ccmp-2.c: Likewise.
2024-06-06 | [APX CCMP] Adjust strategy for selecting ccmp candidates | Hongyu Wang | 1 | -1/+9

For the general ccmp scenario, the tree sequence is like

  _1 = (a < b)
  _2 = (c < d)
  _3 = _1 & _2

The current ccmp expansion tries both compare orders for _1 and _2,
compares the expansion costs (cost/cost2) of expanding _1 first versus
_2 first, and then returns the sequence with the lower cost.

It is possible that one expansion succeeds and the other fails.
For example, x86 has int ccmp but not fp ccmp, so a combined fp and
int comparison must be ordered such that the fp comparison happens
first. The costs are not meaningful for failed expansions.

Check the expand_ccmp_next results ret and ret2, and return the valid
one before comparing costs.

gcc/ChangeLog:

	* ccmp.cc (expand_ccmp_expr_1): Check ret and ret2 of
	expand_ccmp_next, returns the valid one first instead of
	comparing cost.
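At the source level, the mixed case described above is simply an fp comparison combined with an int comparison. The function below is a hypothetical example whose GIMPLE would take the `_1 = (a < b); _2 = (c < d); _3 = _1 & _2` shape; the commit does not say which testcase exercises it, and the generated x86 code is not shown here. It also illustrates why unordered compares matter: a NaN operand makes `a < b` false.

```c
#include <assert.h>
#include <math.h>     /* NAN, used in the checks */
#include <stdbool.h>

/* fp compare (a < b) combined with an int compare (c < d).  Per the
   commit, with APX CCMP the fp compare must be expanded first, since
   only the integer compare can be the conditional one.  */
bool fp_and_int_less (double a, double b, int c, int d)
{
  return (a < b) & (c < d);
}
```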
2024-06-06 | [APX CCMP] Support APX CCMP | Hongyu Wang | 9 | -4/+337

The APX CCMP feature implements a conditional compare, which executes the
compare only when EFLAGS matches a certain condition.

CCMP introduces a default flags value (dfv): when the conditional compare
does not execute, it directly sets the flags according to the dfv.

The instruction looks like

  ccmpeq {dfv=sf,of,cf,zf}  %rax, %r16

For this instruction, it tests whether EFLAGS matches the conditional code
EQ; if yes, it compares %rax and %r16 like a legacy cmp. If not, EFLAGS is
updated according to the dfv, which here means SF, OF, CF and ZF are set.
PF is set according to CF in the dfv, and AF is always cleared.

The dfv part can be a combination of sf, of, cf and zf, like {dfv=cf,zf},
which sets CF and ZF only and clears the others, or {dfv=}, which clears
all of EFLAGS.

To enable CCMP, we implemented the target hooks TARGET_GEN_CCMP_FIRST and
TARGET_GEN_CCMP_NEXT to reuse the current ccmp infrastructure.
We also extended the cstorem4 optab to support storing different CCmodes
to fit the current ccmp infrastructure.

gcc/ChangeLog:

	* config/i386/i386-expand.cc (ix86_gen_ccmp_first): New function
	that tests if the first compare can be generated.
	(ix86_gen_ccmp_next): New function to emit a single compare and ccmp
	sequence.
	* config/i386/i386-opts.h (enum apx_features): Add apx_ccmp.
	* config/i386/i386-protos.h (ix86_gen_ccmp_first): New proto
	declare.
	(ix86_gen_ccmp_next): Likewise.
	(ix86_get_flags_cc): Likewise.
	* config/i386/i386.cc (ix86_flags_cc): New enum.
	(ix86_ccmp_dfv_mapping): New string array to map conditional
	code to dfv.
	(ix86_print_operand): Handle special dfv flag for CCMP.
	(ix86_get_flags_cc): New function to return x86 CC enum.
	(TARGET_GEN_CCMP_FIRST): Define.
	(TARGET_GEN_CCMP_NEXT): Likewise.
	* config/i386/i386.h (TARGET_APX_CCMP): Define.
	* config/i386/i386.md (@ccmp<mode>): New define_insn to support
	ccmp.
	(UNSPEC_APX_DFV): New unspec for ccmp dfv.
	(ALL_CC): New mode iterator.
	(cstorecc4): Change to ...
	(cstore<mode>4) ... this, use ALL_CC to loop through all
	available CCmodes.
	* config/i386/i386.opt (apx_ccmp): Add enum value for ccmp.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ccmp-1.c: New compile test.
	* gcc.target/i386/apx-ccmp-2.c: New runtime test.
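The dfv behaviour can be summarised with a small host-side model. This is a simplified, hypothetical sketch of the semantics described in the commit message, not the ISA specification: it only tracks SF, OF, CF and ZF (PF and AF are ignored), computes flags as for `lhs - rhs` without mapping that to AT&T operand order, and the names `struct flags`, `cmp_flags` and `ccmp_eq` are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Only the four flags a dfv can name.  */
struct flags { bool sf, of, cf, zf; };

/* Flags of an ordinary 64-bit compare, i.e. of computing lhs - rhs.  */
struct flags cmp_flags (uint64_t lhs, uint64_t rhs)
{
  uint64_t r = lhs - rhs;
  struct flags f;
  f.zf = (r == 0);
  f.cf = (lhs < rhs);                             /* unsigned borrow */
  f.sf = (r >> 63) & 1;                           /* sign of result */
  f.of = (((lhs ^ rhs) & (lhs ^ r)) >> 63) & 1;   /* signed overflow */
  return f;
}

/* Model of "ccmpeq {dfv=...} lhs, rhs": if the incoming flags satisfy
   EQ (ZF set), perform a normal compare; otherwise load the dfv.  */
struct flags ccmp_eq (struct flags in, struct flags dfv,
                      uint64_t lhs, uint64_t rhs)
{
  return in.zf ? cmp_flags (lhs, rhs) : dfv;
}
```

Chaining `cmp_flags` into `ccmp_eq` mirrors how a `cmp` followed by `ccmpeq` evaluates something like `a == b && c == d` without a branch.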
2024-06-06 | [APX] Adjust target-support check [PR 115341] | Hongyu Wang | 1 | -1/+7

The current apxf target check does not specify the sub-features that the
assembler supports, so the check with older binutils will fail at the
assemble stage for new APX features like NF, CCMP or CFCMOV.

Adjust the assembler check for all APX sub-features.

gcc/testsuite/ChangeLog:

	PR target/115341
	* lib/target-supports.exp (check_effective_target_apxf):
	Check for all apx sub-features.
2024-06-06 | Allow single-lane SLP in-order reductions | Richard Biener | 1 | -29/+19

The single-lane case isn't different from non-SLP; no re-association
is implied. But the transform stage cannot handle a conditional
reduction op, which isn't checked during analysis. This makes it work,
exercised with a single-lane non-reduction-chain by
gcc.target/i386/pr112464.c.

	* tree-vect-loop.cc (vectorizable_reduction): Allow
	single-lane SLP in-order reductions.
	(vectorize_fold_left_reduction): Handle SLP reduction with
	conditional reduction op.
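An in-order (fold-left) reduction is needed when the additions may not be reassociated, as with floating-point sums compiled without `-ffast-math`. The loop below is a hypothetical illustration of an in-order reduction with a conditional reduction op; it is not the `pr112464.c` testcase, and whether GCC vectorizes it this way depends on the target and flags.

```c
#include <assert.h>
#include <stddef.h>

/* The float additions must happen in index order, so a vectorized
   version still folds the elements into 'sum' one by one; the 'if'
   makes the reduction op conditional.  */
float cond_sum_in_order (const float *a, const int *c, size_t n)
{
  float sum = 0.0f;
  for (size_t i = 0; i < n; i++)
    if (c[i])
      sum += a[i];
  return sum;
}
```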
2024-06-06 | Add double reduction support for SLP vectorization | Richard Biener | 3 | -11/+31

The following makes double reduction vectorization work when
using (single-lane) SLP vectorization.

	* tree-vect-loop.cc (vect_analyze_scalar_cycles_1): Queue
	double reductions in LOOP_VINFO_REDUCTIONS.
	(vect_create_epilog_for_reduction): Remove asserts disabling
	SLP for double reductions.
	(vectorizable_reduction): Analyze SLP double reductions
	only once and start off the correct places.
	* tree-vect-slp.cc (vect_get_and_check_slp_defs): Allow
	vect_double_reduction_def.
	(vect_build_slp_tree_2): Fix condition for the ignored
	reduction initial values.
	* tree-vect-stmts.cc (vect_analyze_stmt): Allow
	vect_double_reduction_def.
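A double reduction arises when a sum is accumulated in an inner loop and also carried around an outer loop, and it is the outer loop that gets vectorized. The function below is a hypothetical illustration of that shape, not one of the commit's testcases; the array dimensions `ROWS` and `COLS` are arbitrary.

```c
#include <assert.h>

#define ROWS 4
#define COLS 8

/* 'sum' is reduced in the inner loop (over j) and carried across the
   outer loop (over i).  Vectorizing the outer loop processes several
   columns i at once, making the inner accumulation a double reduction. */
int double_reduction_sum (int a[ROWS][COLS])
{
  int sum = 0;
  for (int i = 0; i < COLS; i++)
    for (int j = 0; j < ROWS; j++)
      sum += a[j][i];
  return sum;
}
```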
2024-06-06 | Allow single-lane COND_REDUCTION vectorization | Richard Biener | 1 | -16/+81

The following enables single-lane COND_REDUCTION vectorization.

	* tree-vect-loop.cc (vect_create_epilog_for_reduction):
	Adjust for single-lane COND_REDUCTION SLP vectorization.
	(vectorizable_reduction): Likewise.
	(vect_transform_cycle_phi): Likewise.
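A conditional reduction keeps the value from the last iteration whose condition holds, so the result depends on which lane matched last rather than on an associative combine. The loop below is a hypothetical illustration of that pattern; GCC distinguishes several flavours of conditional reduction internally, and the commit does not say how this particular loop would be classified.

```c
#include <assert.h>

/* Returns the last element below 'limit', or 'init' if there is none.
   'last' is updated only when the condition holds.  */
int last_value_below (const int *a, int n, int limit, int init)
{
  int last = init;
  for (int i = 0; i < n; i++)
    last = a[i] < limit ? a[i] : last;
  return last;
}
```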
2024-06-06 | Relax COND_EXPR reduction vectorization SLP restriction | Richard Biener | 1 | -1/+5

Allow one-lane SLP, except for the case where we need to swap the arms.

	* tree-vect-stmts.cc (vectorizable_condition): Allow
	single-lane SLP, but not when we need to swap the then and
	else clauses.
2024-06-06 | libgomp: Mark Loop transformation constructs as implemented in the implementation status | Jakub Jelinek | 1 | -1/+1

The implementation has been committed in r15-1037.

2024-06-06  Jakub Jelinek  <jakub@redhat.com>

	* libgomp.texi (OpenMP 5.1 status): Mark Loop transformation
	constructs as implemented.
2024-06-06 | MIPS: Need COSTS_N_INSNS in mips_insn_cost | YunQiang Su | 1 | -1/+1

In mips_insn_cost, COSTS_N_INSNS is missing when we return the cost
if count * ratio > 0.

gcc

	* config/mips/mips.cc (mips_insn_cost): Add missing
	COSTS_N_INSNS to count.
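GCC's insn costs are expressed in units produced by `COSTS_N_INSNS`, which scales an instruction count so that costs finer than one instruction can be represented. Returning a raw count therefore makes an insn look far cheaper than intended. The sketch below uses the same `COSTS_N_INSNS` definition as GCC's `rtl.h`; the two cost functions are hypothetical stand-ins for the before and after behaviour, not the real `mips_insn_cost`.

```c
#include <assert.h>

/* Same definition as in GCC's rtl.h.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical "before": returns count * ratio in instruction units,
   which is mixed with costs expressed in COSTS_N_INSNS units.  */
int cost_without_scaling (int count, int ratio)
{
  return count * ratio;
}

/* Hypothetical "after": the count is scaled to cost units.  */
int cost_with_scaling (int count, int ratio)
{
  return COSTS_N_INSNS (count) * ratio;
}
```

For example, an unscaled cost of 2 compares as cheaper than a single instruction costed as `COSTS_N_INSNS (1)`, which is 4.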
2024-06-06 | Refine testcase for power10. | liuhongt | 1 | -1/+1

For power10, there are 3 extra REG_EQUIV notes containing (fix:SI.
To avoid the failure, check that (fix:SI comes from the pattern,
not from a note.

gcc/testsuite/ChangeLog:

	PR target/115365
	* gcc.dg/pr100927.c: Don't scan fix:SI from the note.
2024-06-05 | [libstdc++] add _GLIBCXX_CLANG to workaround predefined __clang__ | Alexandre Oliva | 13 | -31/+42

A proprietary embedded operating system that uses clang as its primary
compiler ships headers that require __clang__ to be defined. Defining
that macro causes libstdc++ to adopt workarounds that work for clang
but that break for GCC.

So, introduce a _GLIBCXX_CLANG macro, and a convention to test for it
rather than for __clang__, so that a GCC variant that adds -D__clang__
to satisfy system headers can also -D_GLIBCXX_CLANG=0 to avoid
workarounds that are not meant for GCC.

I've left fast_float and ryu files alone: their tests for __clang__
don't seem to be harmful for GCC, they don't include bits/c++config,
and patching such third-party files would just make trouble for
updating them without visible benefit. pstl_config.h, though also
imported, required adjustment.

for libstdc++-v3/ChangeLog

	* include/bits/c++config (_GLIBCXX_CLANG): Define or undefine.
	* include/bits/locale_facets_nonio.tcc: Test for it.
	* include/bits/stl_bvector.h: Likewise.
	* include/c_compatibility/stdatomic.h: Likewise.
	* include/experimental/bits/simd.h: Likewise.
	* include/experimental/bits/simd_builtin.h: Likewise.
	* include/experimental/bits/simd_detail.h: Likewise.
	* include/experimental/bits/simd_x86.h: Likewise.
	* include/experimental/simd: Likewise.
	* include/std/complex: Likewise.
	* include/std/ranges: Likewise.
	* include/std/variant: Likewise.
	* include/pstl/pstl_config.h: Likewise.
2024-06-06 | Adjust rtx_cost for MEM to enable more simplification | liuhongt | 4 | -1/+33

For CONST_VECTOR_DUPLICATE_P in the constant pool, it is just a broadcast
or one of the variants in ix86_vector_duplicate_simode_const.
Adjust the cost to COSTS_N_INSNS (2) + speed, which should be a little
larger than a broadcast.

gcc/ChangeLog:
	PR target/114428
	* config/i386/i386.cc (ix86_rtx_costs): Adjust cost for
	CONST_VECTOR_DUPLICATE_P in constant_pool.
	* config/i386/i386-expand.cc (ix86_broadcast_from_constant):
	Remove static.
	* config/i386/i386-protos.h (ix86_broadcast_from_constant):
	Declare.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr114428.c: New test.
2024-06-06 | Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for vector mode. | liuhongt | 2 | -0/+64

When mask is ((1 << (prec - imm)) - 1), which is used to clear the upper
bits of A, the expression can be simplified to LSHIFTRT.

i.e. simplify
(and:v8hi
  (ashiftrt:v8hi A 8)
  (const_vector 0xff x8))
to
(lshiftrt:v8hi A 8)

gcc/ChangeLog:

	PR target/114428
	* simplify-rtx.cc
	(simplify_context::simplify_binary_operation_1):
	Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for
	specific mask.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr114428-1.c: New test.
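The identity can be checked exhaustively for a single 16-bit lane (prec = 16, imm = 8, so mask = (1 << (16 - 8)) - 1 = 0xff). The arithmetic shift fills the upper bits with copies of the sign bit, and the mask clears exactly those bits, leaving what a logical shift would produce. This is a scalar sketch of the vector simplification; note that right-shifting a negative signed value is implementation-defined in C, and the sketch assumes GCC's arithmetic-shift behaviour to model ASHIFTRT.

```c
#include <assert.h>
#include <stdint.h>

/* (and (ashiftrt a 8) 0xff), assuming >> on signed values is an
   arithmetic shift (as GCC defines it).  */
uint16_t and_of_ashiftrt (int16_t a)
{
  return (uint16_t) ((a >> 8) & 0xff);
}

/* (lshiftrt a 8): shift the 16-bit pattern as unsigned.  */
uint16_t lshiftrt (int16_t a)
{
  return (uint16_t) ((uint16_t) a >> 8);
}

/* Returns 1 if the two forms agree for every 16-bit input.  */
int forms_agree_for_all_inputs (void)
{
  for (int32_t v = INT16_MIN; v <= INT16_MAX; v++)
    if (and_of_ashiftrt ((int16_t) v) != lshiftrt ((int16_t) v))
      return 0;
  return 1;
}
```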
2024-06-06 | Daily bump. | GCC Administrator | 14 | -1/+672
2024-06-05 | contrib: Fix spelling and capitalization in header-tools | Jonathan Wakely | 2 | -13/+13

contrib/header-tools/ChangeLog:

	* README: Fix spelling and capitalization typos.
	* gcc-order-headers: Fix spelling typo.
2024-06-05 | contrib: header-tools scripts updated to python3 | Sundeep KOKKONDA | 9 | -177/+177

The scripts in contrib/header-tools/ are incompatible with python3.
This updates them to use python3.

contrib/header-tools/ChangeLog:

	* count-headers: Adapt to Python 3.
	* gcc-order-headers: Likewise.
	* graph-header-logs: Likewise.
	* graph-include-web: Likewise.
	* headerutils.py: Likewise.
	* included-by: Likewise.
	* reduce-headers: Likewise.
	* replace-header: Likewise.
	* show-headers: Likewise.

Signed-off-by: Sundeep KOKKONDA <sundeep.kokkonda@windriver.com>
2024-06-05 | check_GNU_style: Use raw strings. | Robin Dapp | 1 | -10/+10

This silences some warnings when using check_GNU_style.

contrib/ChangeLog:

	* check_GNU_style_lib.py: Use raw strings for regexps.
2024-06-05 | RISC-V: Introduce -mvector-strict-align. | Robin Dapp | 13 | -12/+89

This patch disables movmisalign by default and introduces
the -mno-vector-strict-align option to override it and re-enable
movmisalign. For now, generic-ooo is the only uarch that supports
misaligned vector access.

The patch also adds a check_effective_target_riscv_v_misalign_ok to
the testsuite, which enables or disables the vector misalignment tests
depending on whether the target under test can execute a misaligned
vle32.

Changes from v3:
 - Addressed Kito's comments.
 - Made -mscalar-strict-align a real alias.

gcc/ChangeLog:

	* config/riscv/riscv-opts.h (TARGET_VECTOR_MISALIGN_SUPPORTED):
	Move from here...
	* config/riscv/riscv.h (TARGET_VECTOR_MISALIGN_SUPPORTED):
	...to here and map to riscv_vector_unaligned_access_p.
	* config/riscv/riscv.opt: Add -mvector-strict-align.
	* config/riscv/riscv.cc (struct riscv_tune_param): Add
	vector_unaligned_access.
	(riscv_override_options_internal): Set
	riscv_vector_unaligned_access_p.
	* doc/invoke.texi: Document -mvector-strict-align.

gcc/testsuite/ChangeLog:

	* lib/target-supports.exp: Add
	check_effective_target_riscv_v_misalign_ok.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c: Add
	-mno-vector-strict-align.
	* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-10.c: Ditto.
	* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-11.c: Ditto.
	* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-12.c: Ditto.
	* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-8.c: Ditto.
	* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-9.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls/misalign-1.c: Ditto.
2024-06-05 | AArch64: enable new predicate tuning for Neoverse cores. | Tamar Christina | 7 | -3/+95

This enables the new tuning flag for Neoverse V1, Neoverse V2 and Neoverse N2.
It is kept off for generic codegen.

Note: the tests use +sve even though they are in aarch64-sve.exp because, if
the testsuite is run with a forced SVE-off option, e.g. -march=armv8-a+nosve,
the intrinsics end up being disabled: the -march is preferred over the -mcpu
even though the -mcpu comes later.

This prevents the tests from failing in such runs.

gcc/ChangeLog:

	* config/aarch64/tuning_models/neoversen2.h (neoversen2_tunings): Add
	AARCH64_EXTRA_TUNE_AVOID_PRED_RMW.
	* config/aarch64/tuning_models/neoversev1.h (neoversev1_tunings): Add
	AARCH64_EXTRA_TUNE_AVOID_PRED_RMW.
	* config/aarch64/tuning_models/neoversev2.h (neoversev2_tunings): Add
	AARCH64_EXTRA_TUNE_AVOID_PRED_RMW.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/pred_clobber_1.c: New test.
	* gcc.target/aarch64/sve/pred_clobber_2.c: New test.
	* gcc.target/aarch64/sve/pred_clobber_3.c: New test.
	* gcc.target/aarch64/sve/pred_clobber_4.c: New test.
2024-06-05 | AArch64: add new alternative with early clobber to patterns | Tamar Christina | 2 | -60/+124

This patch adds new alternatives to the patterns which are affected. The new
alternatives with the conditional early clobbers are added before the normal
ones in order for LRA to prefer them in the event that we have enough free
registers to accommodate them.

In case register pressure is too high, the normal alternatives will be
preferred before a reload is considered, as we would rather have the tie than
a spill.

Tests are in the next patch.

gcc/ChangeLog:

	* config/aarch64/aarch64-sve.md (and<mode>3,
	@aarch64_pred_<optab><mode>_z, *<optab><mode>3_cc,
	*<optab><mode>3_ptest, aarch64_pred_<nlogical><mode>_z,
	*<nlogical><mode>3_cc, *<nlogical><mode>3_ptest,
	aarch64_pred_<logical_nn><mode>_z, *<logical_nn><mode>3_cc,
	*<logical_nn><mode>3_ptest, @aarch64_pred_cmp<cmp_op><mode>,
	*cmp<cmp_op><mode>_cc, *cmp<cmp_op><mode>_ptest,
	@aarch64_pred_cmp<cmp_op><mode>_wide,
	*aarch64_pred_cmp<cmp_op><mode>_wide_cc,
	*aarch64_pred_cmp<cmp_op><mode>_wide_ptest, @aarch64_brk<brk_op>,
	*aarch64_brk<brk_op>_cc, *aarch64_brk<brk_op>_ptest,
	@aarch64_brk<brk_op>, *aarch64_brk<brk_op>_cc,
	*aarch64_brk<brk_op>_ptest, aarch64_rdffr_z, *aarch64_rdffr_z_ptest,
	*aarch64_rdffr_ptest, *aarch64_rdffr_z_cc, *aarch64_rdffr_cc): Add
	new early clobber alternative.
	* config/aarch64/aarch64-sve2.md
	(@aarch64_pred_<sve_int_op><mode>): Likewise.
2024-06-05 AArch64: add new tuning param and attribute for enabling conditional early clobber (Tamar Christina; 3 files; -2/+25)
This adds a new tuning parameter AARCH64_EXTRA_TUNE_AVOID_PRED_RMW for AArch64 to allow us to conditionally enable the early clobber alternatives based on the tuning models. gcc/ChangeLog: * config/aarch64/aarch64-tuning-flags.def (AVOID_PRED_RMW): New. * config/aarch64/aarch64.h (TARGET_SVE_PRED_CLOBBER): New. * config/aarch64/aarch64.md (pred_clobber): New. (arch_enabled): Use it.
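As a rough standalone illustration of gating a code-generation choice on a per-core tuning flag, the sketch below uses the X-macro pattern that GCC .def files commonly rely on and a predicate in the spirit of TARGET_SVE_PRED_CLOBBER. Every SKETCH_ name, the EXAMPLE_OTHER flag and both model structs are hypothetical; they are not GCC's actual definitions. The generic model leaves the flag off and a Neoverse-like model turns it on, mirroring the tuning choices described in the previous commit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag list in X-macro form, in the spirit of a .def file.  */
#define SKETCH_TUNING_FLAGS(X) \
  X (EXAMPLE_OTHER)            \
  X (AVOID_PRED_RMW)

/* One bit index per flag, generated from the list.  */
enum sketch_tune_index
{
#define X(name) SKETCH_TUNE_##name##_index,
  SKETCH_TUNING_FLAGS (X)
#undef X
  SKETCH_TUNE_index_END
};

/* One mask bit per flag, generated from the same list.  */
enum sketch_tune_flag
{
#define X(name) SKETCH_TUNE_##name = 1 << SKETCH_TUNE_##name##_index,
  SKETCH_TUNING_FLAGS (X)
#undef X
};

struct sketch_tuning
{
  unsigned extra_tuning_flags;
};

/* Hypothetical models: generic keeps the flag off, a Neoverse-like
   model enables it.  */
static const struct sketch_tuning sketch_generic = { 0 };
static const struct sketch_tuning sketch_neoverse = { SKETCH_TUNE_AVOID_PRED_RMW };

/* Analogue of a TARGET_SVE_PRED_CLOBBER-style condition that would enable
   the early-clobber alternatives only for tuning models that ask for it.  */
bool
sketch_pred_clobber_enabled (const struct sketch_tuning *t, bool have_sve)
{
  return have_sve && (t->extra_tuning_flags & SKETCH_TUNE_AVOID_PRED_RMW) != 0;
}
```

Keeping the flag list in one place means the index and mask enums cannot drift apart when a new flag is added.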
2024-06-05 AArch64: convert several predicate patterns to new compact syntax (Tamar Christina; 2 files; -113/+161)
This converts the single-alternative patterns to the new compact syntax so that, when I add the new alternatives, it's clearer what's being changed. Note that this will spew out a bunch of warnings from geninsn, as it will warn that @ is useless for a single-alternative pattern. These are not fatal, so they won't break the build, and they are only temporary. No change in functionality is expected with this patch. gcc/ChangeLog: * config/aarch64/aarch64-sve.md (and<mode>3, @aarch64_pred_<optab><mode>_z, *<optab><mode>3_cc, *<optab><mode>3_ptest, aarch64_pred_<nlogical><mode>_z, *<nlogical><mode>3_cc, *<nlogical><mode>3_ptest, aarch64_pred_<logical_nn><mode>_z, *<logical_nn><mode>3_cc, *<logical_nn><mode>3_ptest, *cmp<cmp_op><mode>_ptest, @aarch64_pred_cmp<cmp_op><mode>_wide, *aarch64_pred_cmp<cmp_op><mode>_wide_cc, *aarch64_pred_cmp<cmp_op><mode>_wide_ptest, *aarch64_brk<brk_op>_cc, *aarch64_brk<brk_op>_ptest, @aarch64_brk<brk_op>, *aarch64_brk<brk_op>_cc, *aarch64_brk<brk_op>_ptest, aarch64_rdffr_z, *aarch64_rdffr_z_ptest, *aarch64_rdffr_ptest, *aarch64_rdffr_z_cc, *aarch64_rdffr_cc): Convert to compact syntax. * config/aarch64/aarch64-sve2.md (@aarch64_pred_<sve_int_op><mode>): Likewise.
2024-06-05 openmp: OpenMP loop transformation support (Jakub Jelinek; 197 files; -212/+10843)
This patch is a largely rewritten version of the https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631764.html patch set, which I had promised to adjust to the way I'd like it but didn't get to until now. The previous series together in diffstat was 176 files changed, 12107 insertions(+), 298 deletions(-); this patch is 197 files changed, 10843 insertions(+), 212 deletions(-); and the diff between the old series and the new patch is 268 files changed, 8053 insertions(+), 9231 deletions(-). Only the 5.1/5.2 tile/unroll constructs are supported. In various places some preparations for the other 6.0 loop transformation constructs (interchange/reverse/fuse) are done, but they are certainly not complete and not everywhere. The important difference is that tile/unroll partial map the original loops 1:1 to generated canonical loops and add another set of generated loops without canonical form inside them; because of that, the tile/unroll partial constructs are terminal for the generated loop: one can't have some loops from the tile or unroll partial and some further loops from inside the body of that construct. The GENERIC representation attempts to match what the standard specifies, so there are separate OMP_TILE and OMP_UNROLL trees. If a particular loop in the loop nest of some OpenMP loop construct awaits a generated loop from a nested loop, or if in an OMP_LOOPXFORM_LOWERED OMP_TILE/UNROLL construct a generated loop has been moved to some surrounding construct, that particular loop is represented by all NULL_TREEs in the OMP_FOR_{INIT,COND,INCR,ORIG_DECLS} vectors. The lowering of the loop transforming constructs is done at gimplification time, at the start of gimplify_omp_for. I think this way is more maintainable than magic clauses with various loop depths on the other looping constructs or the magic OMP_LOOP_TRANS construct.
That said, I admit I'm still undecided how to represent the OpenMP 6.0 loop transformation case of, say:

  #pragma omp for collapse (4)
  for (int i = 0; i < 32; ++i)
  #pragma omp interchange permutation (2, 1)
    #pragma omp reverse
    for (int j = 0; j < 32; ++j)
    #pragma omp reverse
      for (int k = 0; k < 32; ++k)
        for (int l = 0; l < 32; ++l)
          ;

Surely the i loop would go into the first vector elements of OMP_FOR_* of the work-sharing loop, and then 2 loops expect generated loops from the interchange, which would be inside its body. But the innermost l loop isn't part of the interchange, so the question is where to put it. One possibility is to have it in the 4th loop of the OMP_FOR; another would be to add some artificial construct inside the OMP_INTERCHANGE and 2 OMP_REVERSE bodies which would contain the inner loop(s), e.g. an OMP_INTERCHANGE without a permutation clause or some other artificial construct. I've recently raised various unclear points in the 5.1/5.2/TR versions regarding loop transformations, in particular https://github.com/OpenMP/spec/issues/3908 and https://github.com/OpenMP/spec/issues/3909 (sorry, these links are private unless you have OpenMP membership). Until those are resolved, I emit a sorry (unimplemented) diagnostic when trying to mix generated loops with non-rectangular loops (way too many questions need to be answered before that can be done), and similarly for mixing non-perfectly nested loops with generated loops (again, it could be implemented somehow, but it is way too unclear). The second issue is mostly about data sharing, which is ambiguous: the patch makes the artificial iterators of the loops effectively private in the associated constructs (more like local), but does nothing in particular for user iterators, so for now one needs to use explicit data sharing clauses on the non-loop-transformation OpenMP looping constructs or on surrounding parallel/task/target etc.
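For readers less familiar with tile and unroll partial, the following hand-written C sketch shows the loop structure these constructs request, following the description above: outer "floor" loops that step over tile origins (the generated canonical loops) and inner loops clamped to the iteration space (the generated loops without canonical form). It illustrates the semantics only and is not the code GCC generates; the function names are hypothetical, and the pragmas appear only in comments so the sketch behaves the same with or without -fopenmp.

```c
#include <assert.h>

/* Hand-written sketch of
     #pragma omp tile sizes(t, t)
     for (int i = 0; i < n; ++i)
       for (int j = 0; j < n; ++j)
         body (i, j);
   Records the visit order in ORDER and returns the number of (i, j)
   pairs visited.  */
int
sketch_tile_order (int n, int t, int order[][2])
{
  int k = 0;
  /* Floor loops: step over tile origins.  */
  for (int ii = 0; ii < n; ii += t)
    for (int jj = 0; jj < n; jj += t)
      /* Tile loops: walk one tile, clamped to n for partial edge tiles.  */
      for (int i = ii; i < ii + t && i < n; ++i)
        for (int j = jj; j < jj + t && j < n; ++j)
          {
            order[k][0] = i;
            order[k][1] = j;
            ++k;
          }
  return k;
}

/* Hand-written sketch of
     #pragma omp unroll partial(4)
     for (int i = 0; i < n; ++i)
       s += i;
   The outer loop is the generated canonical loop; the inner loop runs at
   most 4 original iterations and is the part a compiler would unroll.  */
long
sketch_unroll_partial4_sum (int n)
{
  long s = 0;
  for (int ii = 0; ii < n; ii += 4)
    for (int i = ii; i < ii + 4 && i < n; ++i)
      s += i;
  return s;
}
```

With n = 6 and t = 4, the first 16 visits cover the full 4x4 tile at the origin before any element with j >= 4 is touched, which is the reordering that tiling is meant to produce.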
2024-06-05 Jakub Jelinek <jakub@redhat.com> Frederik Harwath <frederik@codesourcery.com> Sandra Loosemore <sandra@codesourcery.com> gcc/ * tree.def (OMP_TILE, OMP_UNROLL): New tree codes. * tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_PARTIAL, OMP_CLAUSE_FULL and OMP_CLAUSE_SIZES. * tree.h (OMP_LOOPXFORM_CHECK): Define. (OMP_LOOPXFORM_LOWERED): Define. (OMP_CLAUSE_PARTIAL_EXPR): Define. (OMP_CLAUSE_SIZES_LIST): Define. * tree.cc (omp_clause_num_ops, omp_clause_code_name): Add entries for OMP_CLAUSE_{PARTIAL,FULL,SIZES}. * tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_{PARTIAL,FULL,SIZES}. (dump_generic_node): Handle OMP_TILE and OMP_UNROLL. Skip printing loops with NULL OMP_FOR_INIT (node) vector element. * gimplify.cc (is_gimple_stmt): Handle OMP_TILE and OMP_UNROLL. (gimplify_omp_taskloop_expr): For SAVE_EXPR use gimplify_save_expr. (gimplify_omp_loop_xform): New function. (gimplify_omp_for): Call omp_maybe_apply_loop_xforms and if that reshuffles what the passed pointer points to, retry or return GS_OK. Handle OMP_TILE and OMP_UNROLL. (gimplify_omp_loop): Call omp_maybe_apply_loop_xforms and if that reshuffles what the passed pointer points to, return GS_OK. (gimplify_expr): Handle OMP_TILE and OMP_UNROLL. * omp-general.h (omp_loop_number_of_iterations, omp_maybe_apply_loop_xforms): Declare. * omp-general.cc (omp_adjust_for_condition): For LE_EXPR and GE_EXPR with pointers, don't add/subtract one, but the size of what the pointer points to. (omp_loop_number_of_iterations, omp_apply_tile, find_nested_loop_xform, omp_maybe_apply_loop_xforms): New functions. gcc/c-family/ * c-common.h (c_omp_find_generated_loop): Declare. * c-gimplify.cc (c_genericize_control_stmt): Handle OMP_TILE and OMP_UNROLL. * c-omp.cc (c_finish_omp_for): Handle generated loops. (c_omp_is_loop_iterator): Likewise. (c_find_nested_loop_xform_r, c_omp_find_generated_loop): New functions. (c_omp_check_loop_iv): Handle generated loops. 
For now sorry on mixing non-rectangular loop with generated loops. (c_omp_check_loop_binding_exprs): For now sorry on mixing imperfect loops with generated loops. (c_omp_directives): Uncomment tile and unroll entries. * c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_TILE and PRAGMA_OMP_UNROLL, change PRAGMA_OMP__LAST_ to the latter. (enum pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_FULL and PRAGMA_OMP_CLAUSE_PARTIAL. * c-pragma.cc (omp_pragmas_simd): Add tile and unroll omp pragmas. gcc/c/ * c-parser.cc (c_parser_skip_std_attribute_spec_seq): New function. (check_omp_intervening_code): Reject imperfectly nested tile. (c_parser_compound_statement_nostart): If want_nested_loop, use c_parser_omp_next_tokens_can_be_canon_loop instead of just checking for RID_FOR keyword. (c_parser_omp_clause_name): Handle full and partial clause names. (c_parser_omp_clause_allocate): Remove spurious semicolon. (c_parser_omp_clause_full, c_parser_omp_clause_partial): New functions. (c_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_FULL and PRAGMA_OMP_CLAUSE_PARTIAL. (c_parser_omp_next_tokens_can_be_canon_loop): New function. (c_parser_omp_loop_nest): Parse C23 attributes. Handle tile/unroll constructs. Use c_parser_omp_next_tokens_can_be_canon_loop instead of just checking for RID_FOR keyword. Only add_stmt (body) if it is non-NULL. (c_parser_omp_for_loop): Rename tiling variable to oacc_tiling. For OMP_CLAUSE_SIZES set collapse to list length of OMP_CLAUSE_SIZES_LIST. Use c_parser_omp_next_tokens_can_be_canon_loop instead of just checking for RID_FOR keyword. Remove spurious semicolon. Don't call c_omp_check_loop_binding_exprs if stmt is NULL. Skip generated loops. (c_parser_omp_tile_sizes, c_parser_omp_tile): New functions. (OMP_UNROLL_CLAUSE_MASK): Define. (c_parser_omp_unroll): New function. (c_parser_omp_construct): Handle PRAGMA_OMP_TILE and PRAGMA_OMP_UNROLL. 
* c-typeck.cc (c_finish_omp_clauses): Adjust wording of some of the conflicting clause diagnostic messages to include word clause. Handle OMP_CLAUSE_{FULL,PARTIAL,SIZES} and diagnose full vs. partial conflict. gcc/cp/ * cp-tree.h (dependent_omp_for_p): Add another tree argument. * parser.cc (check_omp_intervening_code): Reject imperfectly nested tile. (cp_parser_statement_seq_opt): If want_nested_loop, use cp_parser_next_tokens_can_be_canon_loop instead of just checking for RID_FOR keyword. (cp_parser_omp_clause_name): Handle full and partial clause names. (cp_parser_omp_clause_full, cp_parser_omp_clause_partial): New functions. (cp_parser_omp_all_clauses): Formatting fix. Handle PRAGMA_OMP_CLAUSE_PARTIAL and PRAGMA_OMP_CLAUSE_FULL. (cp_parser_next_tokens_can_be_canon_loop): New function. (cp_parser_omp_loop_nest): Parse C++11 attributes. Handle tile/unroll constructs. Use cp_parser_next_tokens_can_be_canon_loop instead of just checking for RID_FOR keyword. Only add_stmt cp_parser_omp_loop_nest result if it is non-NULL. (cp_parser_omp_for_loop): Rename tiling variable to oacc_tiling. For OMP_CLAUSE_SIZES set collapse to list length of OMP_CLAUSE_SIZES_LIST. Use cp_parser_next_tokens_can_be_canon_loop instead of just checking for RID_FOR keyword. Remove spurious semicolon. Don't call c_omp_check_loop_binding_exprs if stmt is NULL. Skip and/or handle generated loops. Remove spurious ()s around & operands. (cp_parser_omp_tile_sizes, cp_parser_omp_tile): New functions. (OMP_UNROLL_CLAUSE_MASK): Define. (cp_parser_omp_unroll): New function. (cp_parser_omp_construct): Handle PRAGMA_OMP_TILE and PRAGMA_OMP_UNROLL. (cp_parser_pragma): Likewise. * semantics.cc (finish_omp_clauses): Don't call fold_build_cleanup_point_expr for cases which obviously won't need it, like checked INTEGER_CSTs. Handle OMP_CLAUSE_{FULL,PARTIAL,SIZES} and diagnose full vs. partial conflict. Adjust wording of some of the conflicting clause diagnostic messages to include word clause. 
(finish_omp_for): Use decl equal to global_namespace as a marker for generated loop. Pass also body to dependent_omp_for_p. Skip generated loops. (finish_omp_for_block): Skip generated loops. * pt.cc (tsubst_omp_clauses): Handle OMP_CLAUSE_{FULL,PARTIAL,SIZES}. (tsubst_stmt): Handle OMP_TILE and OMP_UNROLL. Handle or skip generated loops. (dependent_omp_for_p): Add body argument. If declv vector element is NULL, find generated loop. * cp-gimplify.cc (cp_gimplify_expr): Handle OMP_TILE and OMP_UNROLL. (cp_fold_r): Likewise. (cp_genericize_r): Likewise. Skip generated loops. gcc/fortran/ * gfortran.h (enum gfc_statement): Add ST_OMP_UNROLL, ST_OMP_END_UNROLL, ST_OMP_TILE and ST_OMP_END_TILE. (struct gfc_omp_clauses): Add sizes_list, partial, full and erroneous members. (enum gfc_exec_op): Add EXEC_OMP_UNROLL and EXEC_OMP_TILE. (gfc_expr_list_len): Declare. * match.h (gfc_match_omp_tile, gfc_match_omp_unroll): Declare. * openmp.cc (gfc_get_location): Declare. (gfc_free_omp_clauses): Free sizes_list. (match_oacc_expr_list): Rename to ... (match_omp_oacc_expr_list): ... this. Add is_omp argument and change diagnostic wording if it is true. (enum omp_mask2): Add OMP_CLAUSE_{FULL,PARTIAL,SIZES}. (gfc_match_omp_clauses): Parse full, partial and sizes clauses. (gfc_match_oacc_wait): Use match_omp_oacc_expr_list instead of match_oacc_expr_list. (OMP_UNROLL_CLAUSES, OMP_TILE_CLAUSES): Define. (gfc_match_omp_tile, gfc_match_omp_unroll): New functions. (resolve_omp_clauses): Diagnose full vs. partial clause conflict. Resolve sizes clause arguments. (find_nested_loop_in_chain): Use switch instead of series of ifs. Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL. (gfc_resolve_omp_do_blocks): Set omp_current_do_collapse to list length of sizes_list if present. (gfc_resolve_do_iterator): Return for EXEC_OMP_TILE or EXEC_OMP_UNROLL. (restructure_intervening_code): Remove spurious ()s around & operands. (is_outer_iteration_variable): Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL. 
(check_nested_loop_in_chain): Likewise. (expr_is_invariant): Likewise. (resolve_omp_do): Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL. Diagnose tile without sizes clause. Use sizes_list length for count if non-NULL. Set code->ext.omp_clauses->erroneous on loops where we've reported diagnostics. Sorry for mixing non-rectangular loops with generated loops. (omp_code_to_statement): Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL. (gfc_resolve_omp_directive): Likewise. * parse.cc (decode_omp_directive): Parse end tile, end unroll, tile and unroll. Move nothing entry alphabetically. (case_exec_markers): Add ST_OMP_TILE and ST_OMP_UNROLL. (gfc_ascii_statement): Handle ST_OMP_END_TILE, ST_OMP_END_UNROLL, ST_OMP_TILE and ST_OMP_UNROLL. (parse_omp_do): Add nested argument. Handle ST_OMP_TILE and ST_OMP_UNROLL. (parse_omp_structured_block): Adjust parse_omp_do caller. (parse_executable): Likewise. Handle ST_OMP_TILE and ST_OMP_UNROLL. * resolve.cc (gfc_resolve_blocks): Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL. (gfc_resolve_code): Likewise. * st.cc (gfc_free_statement): Likewise. * trans.cc (trans_code): Likewise. * trans-openmp.cc (gfc_trans_omp_clauses): Handle full, partial and sizes clauses. Use tree_cons + nreverse instead of temporary vector and build_tree_list_vec for tile_list handling. (gfc_expr_list_len): New function. (gfc_trans_omp_do): Rename tile to oacc_tile. Handle sizes clause. Don't assert code->op is EXEC_DO. Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL. (gfc_trans_omp_directive): Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL. * dump-parse-tree.cc (show_omp_clauses): Dump full, partial and sizes clauses. (show_omp_node): Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL. (show_code_node): Likewise. gcc/testsuite/ * c-c++-common/gomp/attrs-tile-1.c: New test. * c-c++-common/gomp/attrs-tile-2.c: New test. * c-c++-common/gomp/attrs-tile-3.c: New test. * c-c++-common/gomp/attrs-tile-4.c: New test. * c-c++-common/gomp/attrs-tile-5.c: New test. * c-c++-common/gomp/attrs-tile-6.c: New test. 
* c-c++-common/gomp/attrs-unroll-1.c: New test. * c-c++-common/gomp/attrs-unroll-2.c: New test. * c-c++-common/gomp/attrs-unroll-3.c: New test. * c-c++-common/gomp/attrs-unroll-inner-1.c: New test. * c-c++-common/gomp/attrs-unroll-inner-2.c: New test. * c-c++-common/gomp/attrs-unroll-inner-3.c: New test. * c-c++-common/gomp/attrs-unroll-inner-4.c: New test. * c-c++-common/gomp/attrs-unroll-inner-5.c: New test. * c-c++-common/gomp/imperfect-attributes.c: Adjust expected diagnostics. * c-c++-common/gomp/imperfect-loop-nest.c: New test. * c-c++-common/gomp/ordered-5.c: New test. * c-c++-common/gomp/scan-7.c: New test. * c-c++-common/gomp/tile-1.c: New test. * c-c++-common/gomp/tile-2.c: New test. * c-c++-common/gomp/tile-3.c: New test. * c-c++-common/gomp/tile-4.c: New test. * c-c++-common/gomp/tile-5.c: New test. * c-c++-common/gomp/tile-6.c: New test. * c-c++-common/gomp/tile-7.c: New test. * c-c++-common/gomp/tile-8.c: New test. * c-c++-common/gomp/tile-9.c: New test. * c-c++-common/gomp/tile-10.c: New test. * c-c++-common/gomp/tile-11.c: New test. * c-c++-common/gomp/tile-12.c: New test. * c-c++-common/gomp/tile-13.c: New test. * c-c++-common/gomp/tile-14.c: New test. * c-c++-common/gomp/tile-15.c: New test. * c-c++-common/gomp/unroll-1.c: New test. * c-c++-common/gomp/unroll-2.c: New test. * c-c++-common/gomp/unroll-3.c: New test. * c-c++-common/gomp/unroll-4.c: New test. * c-c++-common/gomp/unroll-5.c: New test. * c-c++-common/gomp/unroll-6.c: New test. * c-c++-common/gomp/unroll-7.c: New test. * c-c++-common/gomp/unroll-8.c: New test. * c-c++-common/gomp/unroll-9.c: New test. * c-c++-common/gomp/unroll-inner-1.c: New test. * c-c++-common/gomp/unroll-inner-2.c: New test. * c-c++-common/gomp/unroll-inner-3.c: New test. * c-c++-common/gomp/unroll-non-rect-1.c: New test. * c-c++-common/gomp/unroll-non-rect-2.c: New test. * c-c++-common/gomp/unroll-non-rect-3.c: New test. * c-c++-common/gomp/unroll-simd-1.c: New test. 
* gcc.dg/gomp/attrs-4.c: Adjust expected diagnostics. * gcc.dg/gomp/for-1.c: Likewise. * gcc.dg/gomp/for-11.c: Likewise. * g++.dg/gomp/attrs-4.C: Likewise. * g++.dg/gomp/for-1.C: Likewise. * g++.dg/gomp/pr94512.C: Likewise. * g++.dg/gomp/tile-1.C: New test. * g++.dg/gomp/tile-2.C: New test. * g++.dg/gomp/unroll-1.C: New test. * g++.dg/gomp/unroll-2.C: New test. * g++.dg/gomp/unroll-3.C: New test. * gfortran.dg/gomp/inner-loops-1.f90: New test. * gfortran.dg/gomp/inner-loops-2.f90: New test. * gfortran.dg/gomp/pure-1.f90: Add tests for !$omp unroll and !$omp tile. * gfortran.dg/gomp/pure-2.f90: Remove those tests from here. * gfortran.dg/gomp/scan-9.f90: New test. * gfortran.dg/gomp/tile-1.f90: New test. * gfortran.dg/gomp/tile-2.f90: New test. * gfortran.dg/gomp/tile-3.f90: New test. * gfortran.dg/gomp/tile-4.f90: New test. * gfortran.dg/gomp/tile-5.f90: New test. * gfortran.dg/gomp/tile-6.f90: New test. * gfortran.dg/gomp/tile-7.f90: New test. * gfortran.dg/gomp/tile-8.f90: New test. * gfortran.dg/gomp/tile-9.f90: New test. * gfortran.dg/gomp/tile-10.f90: New test. * gfortran.dg/gomp/tile-imperfect-nest-1.f90: New test. * gfortran.dg/gomp/tile-imperfect-nest-2.f90: New test. * gfortran.dg/gomp/tile-inner-loops-1.f90: New test. * gfortran.dg/gomp/tile-inner-loops-2.f90: New test. * gfortran.dg/gomp/tile-inner-loops-3.f90: New test. * gfortran.dg/gomp/tile-inner-loops-4.f90: New test. * gfortran.dg/gomp/tile-inner-loops-5.f90: New test. * gfortran.dg/gomp/tile-inner-loops-6.f90: New test. * gfortran.dg/gomp/tile-inner-loops-7.f90: New test. * gfortran.dg/gomp/tile-inner-loops-8.f90: New test. * gfortran.dg/gomp/tile-non-rectangular-1.f90: New test. * gfortran.dg/gomp/tile-non-rectangular-2.f90: New test. * gfortran.dg/gomp/tile-non-rectangular-3.f90: New test. * gfortran.dg/gomp/tile-unroll-1.f90: New test. * gfortran.dg/gomp/tile-unroll-2.f90: New test. * gfortran.dg/gomp/unroll-1.f90: New test. * gfortran.dg/gomp/unroll-2.f90: New test. 
* gfortran.dg/gomp/unroll-3.f90: New test. * gfortran.dg/gomp/unroll-4.f90: New test. * gfortran.dg/gomp/unroll-5.f90: New test. * gfortran.dg/gomp/unroll-6.f90: New test. * gfortran.dg/gomp/unroll-7.f90: New test. * gfortran.dg/gomp/unroll-8.f90: New test. * gfortran.dg/gomp/unroll-9.f90: New test. * gfortran.dg/gomp/unroll-10.f90: New test. * gfortran.dg/gomp/unroll-11.f90: New test. * gfortran.dg/gomp/unroll-12.f90: New test. * gfortran.dg/gomp/unroll-13.f90: New test. * gfortran.dg/gomp/unroll-inner-loop-1.f90: New test. * gfortran.dg/gomp/unroll-inner-loop-2.f90: New test. * gfortran.dg/gomp/unroll-no-clause-1.f90: New test. * gfortran.dg/gomp/unroll-non-rect-1.f90: New test. * gfortran.dg/gomp/unroll-non-rect-2.f90: New test. * gfortran.dg/gomp/unroll-simd-1.f90: New test. * gfortran.dg/gomp/unroll-simd-2.f90: New test. * gfortran.dg/gomp/unroll-simd-3.f90: New test. * gfortran.dg/gomp/unroll-tile-1.f90: New test. * gfortran.dg/gomp/unroll-tile-2.f90: New test. * gfortran.dg/gomp/unroll-tile-inner-1.f90: New test. libgomp/ * testsuite/libgomp.c-c++-common/imperfect-transform-1.c: New test. * testsuite/libgomp.c-c++-common/imperfect-transform-2.c: New test. * testsuite/libgomp.c-c++-common/matrix-1.h: New test. * testsuite/libgomp.c-c++-common/matrix-constant-iter.h: New test. * testsuite/libgomp.c-c++-common/matrix-helper.h: New test. * testsuite/libgomp.c-c++-common/matrix-no-directive-1.c: New test. * testsuite/libgomp.c-c++-common/matrix-no-directive-unroll-full-1.c: New test. * testsuite/libgomp.c-c++-common/matrix-omp-distribute-parallel-for-1.c: New test. * testsuite/libgomp.c-c++-common/matrix-omp-for-1.c: New test. * testsuite/libgomp.c-c++-common/matrix-omp-parallel-for-1.c: New test. * testsuite/libgomp.c-c++-common/matrix-omp-parallel-masked-taskloop-1.c: New test. * testsuite/libgomp.c-c++-common/matrix-omp-parallel-masked-taskloop-simd-1.c: New test. * testsuite/libgomp.c-c++-common/matrix-omp-target-parallel-for-1.c: New test. 
* testsuite/libgomp.c-c++-common/matrix-omp-target-teams-distribute-parallel-for-1.c: New test. * testsuite/libgomp.c-c++-common/matrix-omp-taskloop-1.c: New test. * testsuite/libgomp.c-c++-common/matrix-omp-teams-distribute-parallel-for-1.c: New test. * testsuite/libgomp.c-c++-common/matrix-simd-1.c: New test. * testsuite/libgomp.c-c++-common/matrix-transform-variants-1.h: New test. * testsuite/libgomp.c-c++-common/target-imperfect-transform-1.c: New test. * testsuite/libgomp.c-c++-common/target-imperfect-transform-2.c: New test. * testsuite/libgomp.c-c++-common/unroll-1.c: New test. * testsuite/libgomp.c-c++-common/unroll-non-rect-1.c: New test. * testsuite/libgomp.c++/matrix-no-directive-unroll-full-1.C: New test. * testsuite/libgomp.c++/tile-2.C: New test. * testsuite/libgomp.c++/tile-3.C: New test. * testsuite/libgomp.c++/unroll-1.C: New test. * testsuite/libgomp.c++/unroll-2.C: New test. * testsuite/libgomp.c++/unroll-full-tile.C: New test. * testsuite/libgomp.fortran/imperfect-transform-1.f90: New test. * testsuite/libgomp.fortran/imperfect-transform-2.f90: New test. * testsuite/libgomp.fortran/inner-1.f90: New test. * testsuite/libgomp.fortran/nested-fn.f90: New test. * testsuite/libgomp.fortran/target-imperfect-transform-1.f90: New test. * testsuite/libgomp.fortran/target-imperfect-transform-2.f90: New test. * testsuite/libgomp.fortran/tile-1.f90: New test. * testsuite/libgomp.fortran/tile-2.f90: New test. * testsuite/libgomp.fortran/tile-unroll-1.f90: New test. * testsuite/libgomp.fortran/tile-unroll-2.f90: New test. * testsuite/libgomp.fortran/tile-unroll-3.f90: New test. * testsuite/libgomp.fortran/tile-unroll-4.f90: New test. * testsuite/libgomp.fortran/unroll-1.f90: New test. * testsuite/libgomp.fortran/unroll-2.f90: New test. * testsuite/libgomp.fortran/unroll-3.f90: New test. * testsuite/libgomp.fortran/unroll-4.f90: New test. * testsuite/libgomp.fortran/unroll-5.f90: New test. * testsuite/libgomp.fortran/unroll-6.f90: New test. 
* testsuite/libgomp.fortran/unroll-7a.f90: New test. * testsuite/libgomp.fortran/unroll-7b.f90: New test. * testsuite/libgomp.fortran/unroll-7c.f90: New test. * testsuite/libgomp.fortran/unroll-7.f90: New test. * testsuite/libgomp.fortran/unroll-8.f90: New test. * testsuite/libgomp.fortran/unroll-simd-1.f90: New test. * testsuite/libgomp.fortran/unroll-tile-1.f90: New test. * testsuite/libgomp.fortran/unroll-tile-2.f90: New test.
2024-06-05 AArch64: Fix cpu features initialization [PR115342] (Wilco Dijkstra; 1 file; -106/+75)
The CPU features initialization code uses CPUID registers (rather than HWCAP). The equality comparisons it uses are incorrect: for example, FEAT_SVE is not set if SVE2 is available. Using HWCAPs for these is both simpler and correct. The initialization must also be done atomically to avoid corruption when multiple threads perform non-atomic read-modify-write (RMW) accesses to the global. libgcc: PR target/115342 * config/aarch64/cpuinfo.c (__init_cpu_features_constructor): Use HWCAP where possible. Use atomic write for initialization. Fix FEAT_PREDRES comparison. (__init_cpu_features_resolver): Use atomic load for correct initialization. (__init_cpu_features): Likewise.
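The two ideas in this fix (test HWCAP bits instead of comparing ID register fields for equality, then publish the result atomically) can be sketched in standalone C11. All SKETCH_ names and bit positions are placeholders assumed for illustration; they are neither libgcc's actual data layout nor the kernel's HWCAP values, which real code would take from the system headers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical feature bits; not libgcc's real layout.  */
enum
{
  SKETCH_FEAT_INIT = 1u << 0,   /* set once initialization has run */
  SKETCH_FEAT_SVE  = 1u << 1,
  SKETCH_FEAT_SVE2 = 1u << 2
};

/* Placeholder HWCAP bits for illustration only.  */
#define SKETCH_HWCAP_SVE   (1u << 22)
#define SKETCH_HWCAP2_SVE2 (1u << 1)

/* Zero means "not yet initialized".  */
static _Atomic uint64_t sketch_cpu_features;

/* Derive the mask with independent bit tests, so SVE stays set when SVE2
   is also present (an equality test on a version field could miss it).  */
uint64_t
sketch_compute_features (uint64_t hwcap, uint64_t hwcap2)
{
  uint64_t feat = SKETCH_FEAT_INIT;
  if (hwcap & SKETCH_HWCAP_SVE)
    feat |= SKETCH_FEAT_SVE;
  if (hwcap2 & SKETCH_HWCAP2_SVE2)
    feat |= SKETCH_FEAT_SVE2;
  return feat;
}

/* Build the mask in a local and publish it with one atomic store, so no
   thread can observe a partially written value.  */
void
sketch_init_cpu_features (uint64_t hwcap, uint64_t hwcap2)
{
  if (atomic_load_explicit (&sketch_cpu_features, memory_order_acquire) != 0)
    return;
  atomic_store_explicit (&sketch_cpu_features,
                         sketch_compute_features (hwcap, hwcap2),
                         memory_order_release);
}

uint64_t
sketch_get_cpu_features (void)
{
  return atomic_load_explicit (&sketch_cpu_features, memory_order_acquire);
}
```

The check-then-store is not a lock: two threads may both compute and store the mask. That is only safe because, assuming every caller passes the same HWCAP words (as when they come from the auxiliary vector), they store identical values.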
2024-06-05 testsuite: Improve check-function-bodies (Wilco Dijkstra; 1 file; -3/+3)
Improve check-function-bodies by allowing single-character function names. gcc/testsuite: * lib/scanasm.exp (configure_check-function-bodies): Allow single-char function names.
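As an assumption-labeled illustration of how a function-label pattern can reject one-character names: if the start-of-function regex demands at least one further character after the leading letter, a label such as `f:` never matches, while relaxing `+` to `*` accepts it. The two patterns below are hypothetical POSIX extended regexes, not the Tcl regexes actually configured in scanasm.exp, and serve only to show the `+` versus `*` distinction.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if LINE matches the POSIX extended regex PATTERN.  */
bool
sketch_matches (const char *pattern, const char *line)
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;               /* treat a bad pattern as no match */
  bool ok = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return ok;
}

/* Hypothetical label patterns: the first requires at least two characters
   before the colon, the second allows a single-character name.  */
static const char *const sketch_two_plus = "^[a-zA-Z_][^[:space:]]+:$";
static const char *const sketch_one_plus = "^[a-zA-Z_][^[:space:]]*:$";
```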
2024-06-05 darwin: Replace use of LONG_DOUBLE_TYPE_SIZE (Kewen Lin; 1 file; -1/+1)
In the discussion [1], Joseph pointed out that "floating types should have their mode, not a poorly defined precision value". As he and Richi suggested, the existing macros {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a hook, mode_for_floating_type. In preparation for that, this patch replaces the use of LONG_DOUBLE_TYPE_SIZE in darwin with TYPE_PRECISION of long_double_type_node. [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html gcc/ChangeLog: * config/darwin.cc (darwin_patch_builtins): Use TYPE_PRECISION of long_double_type_node to replace LONG_DOUBLE_TYPE_SIZE.
2024-06-05 fortran: Replace uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE (Kewen Lin; 2 files; -5/+8)
In the discussion [1], Joseph pointed out that "floating types should have their mode, not a poorly defined precision value". As he and Richi suggested, the existing macros {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a hook, mode_for_floating_type. In preparation for that, this patch replaces the uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE in fortran with TYPE_PRECISION of {float,{,long_}double}_type_node. [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html gcc/fortran/ChangeLog: * trans-intrinsic.cc (build_round_expr): Use TYPE_PRECISION of long_double_type_node to replace LONG_DOUBLE_TYPE_SIZE. * trans-types.cc (gfc_build_real_type): Use TYPE_PRECISION of {float,double,long_double}_type_node to replace {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE.
2024-06-05 d: Replace use of LONG_DOUBLE_TYPE_SIZE (Kewen Lin; 1 file; -1/+1)
In the discussion [1], Joseph pointed out that "floating types should have their mode, not a poorly defined precision value". As he and Richi suggested, the existing macros {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a hook, mode_for_floating_type. In preparation for that, this patch removes the only use of LONG_DOUBLE_TYPE_SIZE in d. Iain found that LONG_DOUBLE_TYPE_SIZE was poorly named and had been used incorrectly, so this patch follows his advice and uses int_size_in_bytes instead. [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html Co-authored-by: Iain Buclaw <ibuclaw@gdcproject.org> gcc/d/ChangeLog: * d-target.cc (Target::_init): Use int_size_in_bytes of long_double_type_node to replace the expression with LONG_DOUBLE_TYPE_SIZE for c.long_doublesize assignment.