path: root/gcc/passes.def
Age | Commit message | Author | Files | Lines
2024-01-03 | Update copyright years. | Jakub Jelinek | 1 | -1/+1
2023-12-14 | A new copy propagation and PHI elimination pass | Filip Kastl | 1 | -0/+2
This patch adds the strongly-connected copy propagation (SCCOPY) pass. It is a lightweight GIMPLE copy propagation pass that also removes some redundant PHI statements. It handles degenerate PHIs, e.g.:

  _5 = PHI <_1>;
  _6 = PHI <_6, _6, _1, _1>;
  _7 = PHI <16, _7>;
  // Replaces occurrences of _5 and _6 by _1 and _7 by 16

It also handles more complicated situations, e.g.:

  _8 = PHI <_9, _10>;
  _9 = PHI <_8, _10>;
  _10 = PHI <_8, _9, _1>;
  // Replaces occurrences of _8, _9 and _10 by _1

gcc/ChangeLog:

	* Makefile.in: Added sccopy pass.
	* passes.def: Added sccopy pass before LTO streaming and before RTL expansion.
	* tree-pass.h (make_pass_sccopy): Added sccopy pass.
	* gimple-ssa-sccopy.cc: New file.

gcc/testsuite/ChangeLog:

	* gcc.dg/sccopy-1.c: New test.

Signed-off-by: Filip Kastl <fkastl@suse.cz>
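The two PHI examples in this message can be reproduced with a toy model. The sketch below is not the code in gimple-ssa-sccopy.cc: it replaces the strongly-connected-component computation with a plain reachability walk, which is enough for the cases quoted above. The names `PhiMap`, `collect_leaves`, `redundant_phis` and `check_commit_examples` are hypothetical, and values are modelled as strings rather than SSA names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model (an assumption, not GCC's data structures): each PHI name
// maps to its argument names.  Any name that is not a key in the map is
// a non-PHI value such as "_1" or the constant "16".
using PhiMap = std::map<std::string, std::vector<std::string>>;

// Collect the non-PHI values reachable from NAME through PHI arguments.
static void collect_leaves(const PhiMap &phis, const std::string &name,
                           std::set<std::string> &visited,
                           std::set<std::string> &leaves)
{
  auto it = phis.find(name);
  if (it == phis.end()) {
    leaves.insert(name);   // not a PHI: a value from outside the PHI web
    return;
  }
  if (!visited.insert(name).second)
    return;                // PHI already walked, so this is a cycle
  for (const std::string &arg : it->second)
    collect_leaves(phis, arg, visited, leaves);
}

// Map every PHI that can only ever hold one outside value to that value.
std::map<std::string, std::string> redundant_phis(const PhiMap &phis)
{
  std::map<std::string, std::string> replacement;
  for (const auto &entry : phis) {
    std::set<std::string> visited, leaves;
    collect_leaves(phis, entry.first, visited, leaves);
    if (leaves.size() == 1)
      replacement[entry.first] = *leaves.begin();
  }
  return replacement;
}

// The PHIs quoted in the commit message, checked against the stated result.
void check_commit_examples()
{
  PhiMap phis = {
    {"_5", {"_1"}},
    {"_6", {"_6", "_6", "_1", "_1"}},
    {"_7", {"16", "_7"}},
    {"_8", {"_9", "_10"}},
    {"_9", {"_8", "_10"}},
    {"_10", {"_8", "_9", "_1"}},
  };
  std::map<std::string, std::string> r = redundant_phis(phis);
  assert(r.at("_5") == "_1" && r.at("_6") == "_1" && r.at("_7") == "16");
  assert(r.at("_8") == "_1" && r.at("_9") == "_1" && r.at("_10") == "_1");
}
```

A PHI with two distinct outside values, such as `_3 = PHI <_1, _2>`, gets no entry in the returned map, so it would be left alone.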
2023-12-05 | Introduce strub: machine-independent stack scrubbing | Alexandre Oliva | 1 | -0/+2
This patch adds the strub attribute for function and variable types, command-line options, passes and adjustments to implement it, documentation, and tests. Stack scrubbing is implemented in a machine-independent way: functions with strub enabled are modified so that they take an extra stack watermark argument, that they update with their stack use, and the caller can then zero it out once it regains control, whether by return or exception. There are two ways to go about it: at-calls, that modifies the visible interface (signature) of the function, and internal, in which the body is moved to a clone, the clone undergoes the interface change, and the function becomes a wrapper, preserving its original interface, that calls the clone and then clears the stack used by it. Variables can also be annotated with the strub attribute, so that functions that read from them get stack scrubbing enabled implicitly, whether at-calls, for functions only usable within a translation unit, or internal, for functions whose interfaces must not be modified. There is a strict mode, in which functions that have their stack scrubbed can only call other functions with stack-scrubbing interfaces, or those explicitly marked as callable from strub contexts, so that an entire call chain gets scrubbing, at once or piecemeal depending on optimization levels. In the default mode, relaxed, this requirement is not enforced by the compiler. The implementation adds two IPA passes, one that assigns strub modes early on, another that modifies interfaces and adds calls to the builtins that jointly implement stack scrubbing. Another builtin, that obtains the stack pointer, is added for use in the implementation of the builtins, whether expanded inline or called in libgcc. There are new command-line options to change operation modes and to force the feature disabled; it is enabled by default, but it has no effect and is implicitly disabled if the strub attribute is never used. 
There are also options meant to use for testing the feature, enabling different strubbing modes for all (viable) functions. for gcc/ChangeLog * Makefile.in (OBJS): Add ipa-strub.o. (GTFILES): Add ipa-strub.cc. * builtins.def (BUILT_IN_STACK_ADDRESS): New. (BUILT_IN___STRUB_ENTER): New. (BUILT_IN___STRUB_UPDATE): New. (BUILT_IN___STRUB_LEAVE): New. * builtins.cc: Include ipa-strub.h. (STACK_STOPS, STACK_UNSIGNED): Define. (expand_builtin_stack_address): New. (expand_builtin_strub_enter): New. (expand_builtin_strub_update): New. (expand_builtin_strub_leave): New. (expand_builtin): Call them. * common.opt (fstrub=*): New options. * doc/extend.texi (strub): New type attribute. (__builtin_stack_address): New function. (Stack Scrubbing): New section. * doc/invoke.texi (-fstrub=*): New options. (-fdump-ipa-*): New passes. * gengtype-lex.l: Ignore multi-line pp-directives. * ipa-inline.cc: Include ipa-strub.h. (can_inline_edge_p): Test strub_inlinable_to_p. * ipa-split.cc: Include ipa-strub.h. (execute_split_functions): Test strub_splittable_p. * ipa-strub.cc, ipa-strub.h: New. * passes.def: Add strub_mode and strub passes. * tree-cfg.cc (gimple_verify_flow_info): Note on debug stmts. * tree-pass.h (make_pass_ipa_strub_mode): Declare. (make_pass_ipa_strub): Declare. (make_pass_ipa_function_and_variable_visibility): Fix formatting. * tree-ssa-ccp.cc (optimize_stack_restore): Keep restores before strub leave. * attribs.cc: Include ipa-strub.h. (decl_attributes): Support applying attributes to function type, rather than pointer type, at handler's request. (comp_type_attributes): Combine strub_comptypes and target comp_type results. * doc/tm.texi.in (TARGET_STRUB_USE_DYNAMIC_ARRAY): New. (TARGET_STRUB_MAY_USE_MEMSET): New. * doc/tm.texi: Rebuilt. * cgraph.h (symtab_node::reset): Add preserve_comdat_group param, with a default. * cgraphunit.cc (symtab_node::reset): Use it. for gcc/c-family/ChangeLog * c-attribs.cc: Include ipa-strub.h. (handle_strub_attribute): New. 
(c_common_attribute_table): Add strub. for gcc/ada/ChangeLog * gcc-interface/trans.cc: Include ipa-strub.h. (gigi): Make internal decls for targets of compiler-generated calls strub-callable too. (build_raise_check): Likewise. * gcc-interface/utils.cc: Include ipa-strub.h. (handle_strub_attribute): New. (gnat_internal_attribute_table): Add strub. for gcc/testsuite/ChangeLog * c-c++-common/strub-O0.c: New. * c-c++-common/strub-O1.c: New. * c-c++-common/strub-O2.c: New. * c-c++-common/strub-O2fni.c: New. * c-c++-common/strub-O3.c: New. * c-c++-common/strub-O3fni.c: New. * c-c++-common/strub-Og.c: New. * c-c++-common/strub-Os.c: New. * c-c++-common/strub-all1.c: New. * c-c++-common/strub-all2.c: New. * c-c++-common/strub-apply1.c: New. * c-c++-common/strub-apply2.c: New. * c-c++-common/strub-apply3.c: New. * c-c++-common/strub-apply4.c: New. * c-c++-common/strub-at-calls1.c: New. * c-c++-common/strub-at-calls2.c: New. * c-c++-common/strub-defer-O1.c: New. * c-c++-common/strub-defer-O2.c: New. * c-c++-common/strub-defer-O3.c: New. * c-c++-common/strub-defer-Os.c: New. * c-c++-common/strub-internal1.c: New. * c-c++-common/strub-internal2.c: New. * c-c++-common/strub-parms1.c: New. * c-c++-common/strub-parms2.c: New. * c-c++-common/strub-parms3.c: New. * c-c++-common/strub-relaxed1.c: New. * c-c++-common/strub-relaxed2.c: New. * c-c++-common/strub-short-O0-exc.c: New. * c-c++-common/strub-short-O0.c: New. * c-c++-common/strub-short-O1.c: New. * c-c++-common/strub-short-O2.c: New. * c-c++-common/strub-short-O3.c: New. * c-c++-common/strub-short-Os.c: New. * c-c++-common/strub-strict1.c: New. * c-c++-common/strub-strict2.c: New. * c-c++-common/strub-tail-O1.c: New. * c-c++-common/strub-tail-O2.c: New. * c-c++-common/torture/strub-callable1.c: New. * c-c++-common/torture/strub-callable2.c: New. * c-c++-common/torture/strub-const1.c: New. * c-c++-common/torture/strub-const2.c: New. * c-c++-common/torture/strub-const3.c: New. * c-c++-common/torture/strub-const4.c: New. 
* c-c++-common/torture/strub-data1.c: New. * c-c++-common/torture/strub-data2.c: New. * c-c++-common/torture/strub-data3.c: New. * c-c++-common/torture/strub-data4.c: New. * c-c++-common/torture/strub-data5.c: New. * c-c++-common/torture/strub-indcall1.c: New. * c-c++-common/torture/strub-indcall2.c: New. * c-c++-common/torture/strub-indcall3.c: New. * c-c++-common/torture/strub-inlinable1.c: New. * c-c++-common/torture/strub-inlinable2.c: New. * c-c++-common/torture/strub-ptrfn1.c: New. * c-c++-common/torture/strub-ptrfn2.c: New. * c-c++-common/torture/strub-ptrfn3.c: New. * c-c++-common/torture/strub-ptrfn4.c: New. * c-c++-common/torture/strub-pure1.c: New. * c-c++-common/torture/strub-pure2.c: New. * c-c++-common/torture/strub-pure3.c: New. * c-c++-common/torture/strub-pure4.c: New. * c-c++-common/torture/strub-run1.c: New. * c-c++-common/torture/strub-run2.c: New. * c-c++-common/torture/strub-run3.c: New. * c-c++-common/torture/strub-run4.c: New. * c-c++-common/torture/strub-run4c.c: New. * c-c++-common/torture/strub-run4d.c: New. * c-c++-common/torture/strub-run4i.c: New. * g++.dg/strub-run1.C: New. * g++.dg/torture/strub-init1.C: New. * g++.dg/torture/strub-init2.C: New. * g++.dg/torture/strub-init3.C: New. * gnat.dg/strub_attr.adb, gnat.dg/strub_attr.ads: New. * gnat.dg/strub_ind.adb, gnat.dg/strub_ind.ads: New. for libgcc/ChangeLog * Makefile.in (LIB2ADD): Add strub.c. * libgcc2.h (__strub_enter, __strub_update, __strub_leave): Declare. * strub.c: New. * libgcc-std.ver.in (__strub_enter): Add to GCC_14.0.0. (__strub_update, __strub_leave): Likewise.
2023-12-05 | Allow prologues and epilogues to be inserted later | Richard Sandiford | 1 | -0/+3
Arm's SME adds a new processor mode called streaming mode. This mode enables some new (matrix-oriented) instructions and disables several existing groups of instructions, such as most Advanced SIMD vector instructions and a much smaller set of SVE instructions. It can also change the current vector length. There are instructions to switch in and out of streaming mode. However, their effect on the ISA and vector length can't be represented directly in RTL, so they need to be emitted late in the pass pipeline, close to md_reorg. It's sometimes the responsibility of the prologue and epilogue to switch modes, which means we need to emit the prologue and epilogue sequences late as well. (This loses shrink-wrapping and scheduling opportunities, but that's a price worth paying.) This patch therefore adds a target hook for forcing prologue and epilogue insertion to happen later in the pipeline. gcc/ * target.def (use_late_prologue_epilogue): New hook. * doc/tm.texi.in: Add TARGET_USE_LATE_PROLOGUE_EPILOGUE. * doc/tm.texi: Regenerate. * passes.def (pass_late_thread_prologue_and_epilogue): New pass. * tree-pass.h (make_pass_late_thread_prologue_and_epilogue): Declare. * function.cc (pass_thread_prologue_and_epilogue::gate): New function. (pass_data_late_thread_prologue_and_epilogue): New pass variable. (pass_late_thread_prologue_and_epilogue): New pass class. (make_pass_late_thread_prologue_and_epilogue): New function.
2023-10-20 | Control flow redundancy hardening | Alexandre Oliva | 1 | -0/+1
This patch introduces an optional hardening pass to catch unexpected execution flows. Functions are transformed so that basic blocks set a bit in an automatic array, and (non-exceptional) function exit edges check that the bits in the array represent an expected execution path in the CFG. Functions with multiple exit edges, or with too many blocks, call an out-of-line checker builtin implemented in libgcc. For simpler functions, the verification is performed in-line. -fharden-control-flow-redundancy enables the pass for eligible functions, --param hardcfr-max-blocks sets a block count limit for functions to be eligible, and --param hardcfr-max-inline-blocks tunes the "too many blocks" limit for in-line verification. -fhardcfr-skip-leaf makes leaf functions non-eligible. Additional -fhardcfr-check-* options are added to enable checking at exception escape points, before potential sibcalls, hereby dubbed returning calls, and before noreturn calls and exception raises. A notable case is the distinction between noreturn calls expected to throw and those expected to terminate or loop forever: the default setting for -fhardcfr-check-noreturn-calls, no-xthrow, performs checking before the latter, but the former only gets checking in the exception handler. GCC can only tell between them by explicit marking noreturn functions expected to raise with the newly-introduced expected_throw attribute, and corresponding ECF_XTHROW flag. for gcc/ChangeLog * tree-core.h (ECF_XTHROW): New macro. * tree.cc (set_call_expr): Add expected_throw attribute when ECF_XTHROW is set. (build_common_builtin_node): Add ECF_XTHROW to __cxa_end_cleanup and _Unwind_Resume or _Unwind_SjLj_Resume. * calls.cc (flags_from_decl_or_type): Check for expected_throw attribute to set ECF_XTHROW. * gimple.cc (gimple_build_call_from_tree): Propagate ECF_XTHROW from decl flags to gimple call... (gimple_call_flags): ... and back. * gimple.h (GF_CALL_XTHROW): New gf_mask flag. (gimple_call_set_expected_throw): New. 
(gimple_call_expected_throw_p): New. * Makefile.in (OBJS): Add gimple-harden-control-flow.o. * builtins.def (BUILT_IN___HARDCFR_CHECK): New. * common.opt (fharden-control-flow-redundancy): New. (-fhardcfr-check-returning-calls): New. (-fhardcfr-check-exceptions): New. (-fhardcfr-check-noreturn-calls=*): New. (Enum hardcfr_check_noreturn_calls): New. (fhardcfr-skip-leaf): New. * doc/invoke.texi: Document them. (hardcfr-max-blocks, hardcfr-max-inline-blocks): New params. * flag-types.h (enum hardcfr_noret): New. * gimple-harden-control-flow.cc: New. * params.opt (-param=hardcfr-max-blocks=): New. (-param=hradcfr-max-inline-blocks=): New. * passes.def (pass_harden_control_flow_redundancy): Add. * tree-pass.h (make_pass_harden_control_flow_redundancy): Declare. * doc/extend.texi: Document expected_throw attribute. for gcc/ada/ChangeLog * gcc-interface/trans.cc (gigi): Mark __gnat_reraise_zcx with ECF_XTHROW. (build_raise_check): Likewise for all rcheck subprograms. for gcc/c-family/ChangeLog * c-attribs.cc (handle_expected_throw_attribute): New. (c_common_attribute_table): Add expected_throw. for gcc/cp/ChangeLog * decl.cc (push_throw_library_fn): Mark with ECF_XTHROW. * except.cc (build_throw): Likewise __cxa_throw, _ITM_cxa_throw, __cxa_rethrow. for gcc/testsuite/ChangeLog * c-c++-common/torture/harden-cfr.c: New. * c-c++-common/harden-cfr-noret-never-O0.c: New. * c-c++-common/torture/harden-cfr-noret-never.c: New. * c-c++-common/torture/harden-cfr-noret-noexcept.c: New. * c-c++-common/torture/harden-cfr-noret-nothrow.c: New. * c-c++-common/torture/harden-cfr-noret.c: New. * c-c++-common/torture/harden-cfr-notail.c: New. * c-c++-common/torture/harden-cfr-returning.c: New. * c-c++-common/torture/harden-cfr-tail.c: New. * c-c++-common/torture/harden-cfr-abrt-always.c: New. * c-c++-common/torture/harden-cfr-abrt-never.c: New. * c-c++-common/torture/harden-cfr-abrt-no-xthrow.c: New. * c-c++-common/torture/harden-cfr-abrt-nothrow.c: New. 
* c-c++-common/torture/harden-cfr-abrt.c: New. * c-c++-common/torture/harden-cfr-always.c: New. * c-c++-common/torture/harden-cfr-never.c: New. * c-c++-common/torture/harden-cfr-no-xthrow.c: New. * c-c++-common/torture/harden-cfr-nothrow.c: New. * c-c++-common/torture/harden-cfr-bret-always.c: New. * c-c++-common/torture/harden-cfr-bret-never.c: New. * c-c++-common/torture/harden-cfr-bret-noopt.c: New. * c-c++-common/torture/harden-cfr-bret-noret.c: New. * c-c++-common/torture/harden-cfr-bret-no-xthrow.c: New. * c-c++-common/torture/harden-cfr-bret-nothrow.c: New. * c-c++-common/torture/harden-cfr-bret-retcl.c: New. * c-c++-common/torture/harden-cfr-bret.c: New. * g++.dg/harden-cfr-throw-always-O0.C: New. * g++.dg/harden-cfr-throw-returning-O0.C: New. * g++.dg/torture/harden-cfr-noret-always-no-nothrow.C: New. * g++.dg/torture/harden-cfr-noret-never-no-nothrow.C: New. * g++.dg/torture/harden-cfr-noret-no-nothrow.C: New. * g++.dg/torture/harden-cfr-throw-always.C: New. * g++.dg/torture/harden-cfr-throw-never.C: New. * g++.dg/torture/harden-cfr-throw-no-xthrow.C: New. * g++.dg/torture/harden-cfr-throw-no-xthrow-expected.C: New. * g++.dg/torture/harden-cfr-throw-nothrow.C: New. * g++.dg/torture/harden-cfr-throw-nocleanup.C: New. * g++.dg/torture/harden-cfr-throw-returning.C: New. * g++.dg/torture/harden-cfr-throw.C: New. * gcc.dg/torture/harden-cfr-noret-no-nothrow.c: New. * gcc.dg/torture/harden-cfr-tail-ub.c: New. * gnat.dg/hardcfr.adb: New. for libgcc/ChangeLog * Makefile.in (LIB2ADD): Add hardcfr.c. * hardcfr.c: New.
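The idea of the hardening pass, blocks recording that they ran and exits checking that the record is consistent with the CFG, can be modelled outside the compiler. The sketch below is an assumption about what such a consistency check could look like, not the checker shipped in libgcc: it requires that every visited block has a visited predecessor (unless it is the entry) and a visited successor (unless it is the exit). `Cfg`, `add_edge` and `path_is_plausible` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy control flow graph: blocks are numbered 0..n-1.
struct Cfg {
  int entry, exit_block;
  std::vector<std::vector<int>> preds, succs;

  Cfg(int nblocks, int entry_blk, int exit_blk)
    : entry(entry_blk), exit_block(exit_blk),
      preds(nblocks), succs(nblocks) {}

  void add_edge(int from, int to)
  {
    succs[from].push_back(to);
    preds[to].push_back(from);
  }
};

// VISITED[b] plays the role of the per-block bit in the automatic array.
// Returns false when the set of visited blocks cannot come from a walk
// from entry to exit under this simplified rule.
bool path_is_plausible(const Cfg &cfg, const std::vector<bool> &visited)
{
  assert(visited.size() == cfg.preds.size());
  auto seen = [&](int b) { return bool(visited[b]); };

  if (!seen(cfg.entry) || !seen(cfg.exit_block))
    return false;
  for (int b = 0; b < (int) visited.size(); ++b) {
    if (!seen(b))
      continue;
    bool came_from = b == cfg.entry
      || std::any_of(cfg.preds[b].begin(), cfg.preds[b].end(), seen);
    bool went_to = b == cfg.exit_block
      || std::any_of(cfg.succs[b].begin(), cfg.succs[b].end(), seen);
    if (!came_from || !went_to)
      return false;
  }
  return true;
}
```

For a diamond-shaped CFG (0 to 1 and 2, both to 3), the visited set {0, 1, 3} passes, while {0, 3} fails because block 0 has no visited successor. A check this simple accepts some impossible sets too, so it only illustrates the principle.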
2023-10-16 | Implement new RTL optimizations pass: fold-mem-offsets | Manolis Tsamis | 1 | -0/+1
This is a new RTL pass that tries to optimize memory offset calculations by moving them from add immediate instructions to the memory loads/stores. For example it can transform this: addi t4,sp,16 add t2,a6,t4 shl t3,t2,1 ld a2,0(t3) addi a2,1 sd a2,8(t2) into the following (one instruction less): add t2,a6,sp shl t3,t2,1 ld a2,32(t3) addi a2,1 sd a2,24(t2) Although there are places where this is done already, this pass is more powerful and can handle the more difficult cases that are currently not optimized. Also, it runs late enough and can optimize away unnecessary stack pointer calculations. gcc/ChangeLog: * Makefile.in: Add fold-mem-offsets.o. * passes.def: Schedule a new pass. * tree-pass.h (make_pass_fold_mem_offsets): Declare. * common.opt: New options. * doc/invoke.texi: Document new option. * fold-mem-offsets.cc: New file. gcc/testsuite/ChangeLog: * gcc.target/riscv/fold-mem-offsets-1.c: New test. * gcc.target/riscv/fold-mem-offsets-2.c: New test. * gcc.target/riscv/fold-mem-offsets-3.c: New test. * gcc.target/i386/pr52146.c: Adjust expected output. Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
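That the two instruction sequences in this message touch the same memory can be checked with plain arithmetic. The sketch below models the registers as 64-bit integers and computes the load and store addresses of both sequences; the struct and function names are hypothetical and only the address computation is modelled, not the loaded or stored data.

```cpp
#include <cassert>
#include <cstdint>

// Effective addresses used by the ld and the sd in a sequence.
struct Accesses {
  std::uint64_t load_addr;
  std::uint64_t store_addr;
};

// Sequence before the pass runs.
Accesses before_fold(std::uint64_t sp, std::uint64_t a6)
{
  std::uint64_t t4 = sp + 16;   // addi t4,sp,16
  std::uint64_t t2 = a6 + t4;   // add  t2,a6,t4
  std::uint64_t t3 = t2 << 1;   // shl  t3,t2,1
  return { t3 + 0,              // ld   a2,0(t3)
           t2 + 8 };            // sd   a2,8(t2)
}

// Sequence after the pass: the +16 is folded into both offsets.
Accesses after_fold(std::uint64_t sp, std::uint64_t a6)
{
  std::uint64_t t2 = a6 + sp;   // add  t2,a6,sp
  std::uint64_t t3 = t2 << 1;   // shl  t3,t2,1
  return { t3 + 32,             // ld   a2,32(t3)
           t2 + 24 };           // sd   a2,24(t2)
}

// Both sequences must access the same two addresses.
void check_fold(std::uint64_t sp, std::uint64_t a6)
{
  Accesses b = before_fold(sp, a6);
  Accesses a = after_fold(sp, a6);
  assert(b.load_addr == a.load_addr);
  assert(b.store_addr == a.store_addr);
}
```

The load offset grows by 32 rather than 16 because the folded constant passes through the shift by one, while the store offset grows by 16 (8 becomes 24).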
2023-10-03 | Remove pass counting in VRP. | Andrew MacLeod | 1 | -2/+2
Rather than using a pass count to decide which parameters are passed to VRP, make it explicit.

	* passes.def (pass_vrp): Pass "final pass" flag as parameter.
	* tree-vrp.cc (vrp_pass_num): Remove.
	(pass_vrp::my_pass): Remove.
	(pass_vrp::pass_vrp): Add warn_p as a parameter.
	(pass_vrp::final_p): New.
	(pass_vrp::set_pass_param): Set final_p param.
	(pass_vrp::execute): Call execute_range_vrp with no conditions.
	(make_pass_vrp): Pass additional parameter.
	(make_pass_early_vrp): Ditto.
2023-09-06 | _BitInt lowering support [PR102989] | Jakub Jelinek | 1 | -0/+3
The following patch adds a new bitintlower lowering pass which lowers most operations on medium _BitInt into operations on corresponding integer types, large _BitInt into straight line code operating on 2 or more limbs and finally huge _BitInt into a loop plus optional straight line code.

As the only supported architecture is little-endian, the lowering only supports little-endian for now, because it would be impossible to test it all for big-endian. The rest is written with any endian support in mind, but of course only little-endian has actually been tested. I hope it is ok to add big-endian support to the lowering pass incrementally later, when the first big-endian target shows up with the backend support.

There are 2 possibilities for adding such support. One would be the minimal one: just tweak the limb_access function and perhaps one or two other spots and transform there the indexes from little endian (index 0 is least significant) to big endian for just the memory access. The advantage is, I think, maintenance cost; the disadvantage is that the loops will still iterate from 0 to some number of limbs and we'd rely on IVOPTs or something similar changing it later if needed. Or we could make those indexes endian related everywhere, though I'm afraid that would be several hundreds of changes.

For switches indexed by large/huge _BitInt the patch invokes what the switch lowering pass does (but only on those specific switches, not all of them); the switch lowering breaks the switches into clusters and none of the clusters can have a range which doesn't fit into 64-bit UWHI, everything else will be turned into a tree of comparisons. For clusters normally emitted as smaller switches, because we already have a guarantee that the low .. high range is at most 64 bits, the patch forces subtraction of the low and turns it into a 64-bit switch. This is done before the actual pass starts.
Similarly, we cancel lowering of certain constructs like ABS_EXPR, ABSU_EXPR, MIN_EXPR, MAX_EXPR and COND_EXPR and turn those back to simpler comparisons etc., so that fewer operations need to be lowered later. 2023-09-06 Jakub Jelinek <jakub@redhat.com> PR c/102989 * Makefile.in (OBJS): Add gimple-lower-bitint.o. * passes.def: Add pass_lower_bitint after pass_lower_complex and pass_lower_bitint_O0 after pass_lower_complex_O0. * tree-pass.h (PROP_gimple_lbitint): Define. (make_pass_lower_bitint_O0, make_pass_lower_bitint): Declare. * gimple-lower-bitint.h: New file. * tree-ssa-live.h (struct _var_map): Add bitint member. (init_var_map): Adjust declaration. (region_contains_p): Handle map->bitint like map->outofssa_p. * tree-ssa-live.cc (init_var_map): Add BITINT argument, initialize map->bitint and set map->outofssa_p to false if it is non-NULL. * tree-ssa-coalesce.cc: Include gimple-lower-bitint.h. (build_ssa_conflict_graph): Call build_bitint_stmt_ssa_conflicts if map->bitint. (create_coalesce_list_for_region): For map->bitint ignore SSA_NAMEs not in that bitmap, and allow res without default def. (compute_optimized_partition_bases): In map->bitint mode try hard to coalesce any SSA_NAMEs with the same size. (coalesce_bitint): New function. (coalesce_ssa_name): In map->bitint mode, or map->bitmap into used_in_copies and call coalesce_bitint. * gimple-lower-bitint.cc: New file.
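The "loop over limbs" shape that the message describes for huge _BitInt can be illustrated with a limb-wise addition. This is a hand-written sketch of the general technique under stated assumptions (64-bit limbs, little-endian limb order with index 0 least significant, equal operand sizes, final carry discarded); it is not the code the pass emits, and `Limb` and `add_limbs` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Limb = std::uint64_t;

// Add two numbers stored as little-endian limb vectors (index 0 is the
// least significant limb).  The result wraps modulo 2^(64 * limbs).
std::vector<Limb> add_limbs(const std::vector<Limb> &a,
                            const std::vector<Limb> &b)
{
  assert(a.size() == b.size());
  std::vector<Limb> sum(a.size());
  Limb carry = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    Limb partial = a[i] + carry;       // may wrap around
    Limb carry1 = partial < carry;     // wrapped iff smaller than addend
    sum[i] = partial + b[i];           // may wrap around
    Limb carry2 = sum[i] < b[i];
    carry = carry1 | carry2;           // at most one of the two is set
  }
  return sum;
}
```

With two limbs, adding 1 to 2^64 - 1 yields limbs {0, 1}: the low limb wraps and the carry moves into the next limb, which is the cross-limb dependency a lowering pass has to keep track of.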
2023-08-03 | Swap loop splitting and final value replacement | Richard Biener | 1 | -1/+1
The following swaps the loop splitting pass and the final value replacement pass to avoid keeping the IV of the earlier loop live when not necessary. The existing gcc.target/i386/pr87007-5.c testcase shows that we otherwise fail to elide an empty loop later. I don't see any good reason why loop splitting would need final value replacement, all exit values honor the constraints we place on loop header PHIs automatically. * passes.def: Exchange loop splitting and final value replacement passes. * gcc.target/i386/pr87007-5.c: Make sure we split the loop and eliminate both in the end.
2023-07-14 | Turn TODO_rebuild_frequencies to a pass | Jan Hubicka | 1 | -0/+8
Currently we rebuild profile_counts from profile_probability after inlining, because there is a chance that producing large loop nests may yield unrealistically large profile_count values. This is much less of a concern since we switched to the new profile_count representation a while back.

This propagation can also compensate for profile inconsistencies caused by optimization passes. Since the inliner is followed by basic cleanup passes that do not use the profile, we get a more realistic profile by delaying the recomputation until after the basic optimizations exposed by inlining are finished.

This does not fit into the TODO machinery, so I turn rebuilding into a stand-alone pass and schedule it before the first consumer of the profile in the optimization queue.

I also added logic that avoids repropagating when the CFG is good and not too close to overflow. Propagating visits every basic block loop_depth times, so it is not linear and avoiding it may help a bit.

On tramp3d we get 14 functions repropagated and 916 are OK. The repropagated functions are RB tree ones where we produce crazy loop nests by recursive inlining. This is something to fix independently.

gcc/ChangeLog:

	* passes.cc (execute_function_todo): Remove TODO_rebuild_frequencies.
	* passes.def: Add rebuild_frequencies pass.
	* predict.cc (estimate_bb_frequencies): Drop force parameter.
	(tree_estimate_probability): Update call of estimate_bb_frequencies.
	(rebuild_frequencies): Turn into a pass; verify CFG profile consistency first and do not rebuild if not necessary.
	(class pass_rebuild_frequencies): New.
	(make_pass_rebuild_frequencies): New.
	* profile-count.h: Add profile_count::very_large_p.
	* tree-inline.cc (optimize_inline_calls): Do not return TODO_rebuild_frequencies.
	* tree-pass.h (TODO_rebuild_frequencies): Remove.
	(make_pass_rebuild_frequencies): Declare.
2023-06-19 | optimize std::max early | Jan Hubicka | 1 | -0/+2
We currently produce very bad code on loops using std::vector as a stack, since we fail to inline push_back, which in turn prevents SRA, and we fail to optimize out some store-to-load pairs.

I looked into why this function is not inlined while it is inlined by clang. We currently estimate it at 66 instructions and the inline limits are 15 at -O2 and 30 at -O3. Clang has a similar estimate, but still decides to inline at -O2.

I looked into the reason why the body is so large, and one problem I spotted is the way std::max is implemented by taking and returning references to the values:

  const T& max( const T& a, const T& b );

This makes it necessary to store the values to memory and load them later, and max is used by the code computing the new size of the vector on resize.

We optimize this to MAX_EXPR, but only during late optimizations. I think this is a common enough coding pattern and we ought to make this transparent to early opts and IPA. The following is the easiest fix: it simply adds a phiprop pass that turns the PHI of address values into a PHI of values, so that later FRE can propagate values across memory, phiopt can discover the MAX_EXPR pattern and DSE can remove the memory stores.

gcc/ChangeLog:

	PR tree-optimization/109811
	PR tree-optimization/109849
	* passes.def: Add phiprop to early optimization passes.
	* tree-ssa-phiprop.cc: Allow cloning.

gcc/testsuite/ChangeLog:

	PR tree-optimization/109811
	PR tree-optimization/109849
	* gcc.dg/tree-ssa/phiprop-1.c: New test.
	* gcc.dg/tree-ssa/pr21463.c: Adjust template.
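The coding pattern in question is easy to reproduce in isolation. The function below is a hypothetical stand-in for the size computation a vector performs when it grows; the name `grown_capacity` and its exact growth rule are assumptions for illustration, not a copy of any library's code. Because `std::max` takes and returns `const T&`, the compiler initially sees two addresses being selected, and only after the selection is turned into a selection of values can it recognise a plain maximum.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical growth rule: grow by at least EXTRA elements, and at
// least double the current size.
std::size_t grown_capacity(std::size_t size, std::size_t extra)
{
  // Sketch only: overflow is ruled out by a precondition, not handled.
  assert(std::max(size, extra)
         <= std::numeric_limits<std::size_t>::max() - size);

  // std::max returns a reference to one of its arguments, so before
  // optimization both SIZE and EXTRA need an address to refer to.
  const std::size_t &larger = std::max(size, extra);
  return size + larger;
}
```

Semantically this is just `size + (size > extra ? size : extra)`; the point of the commit is to let the early optimizers see it that way.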
2023-01-02 | Update copyright years. | Jakub Jelinek | 1 | -1/+1
2022-11-11 | Make last DCE remove empty loops | Richard Biener | 1 | -4/+4
The following makes the last DCE pass CD-DCE and in turn the last CD-DCE pass a DCE one. That ensures we remove empty loops that become empty between the two. I've also moved the tail-call pass after DCE since DCE can only improve things here.

The two testcases were the only ones scanning cddce3 so I've changed them to scan the dce7 pass that's now in this place. The testcases scanning dce7 also work when that's in the earlier position.

	PR tree-optimization/84646
	* tree-ssa-dce.cc (pass_dce::set_pass_param): Add param whether to run update-address-taken.
	(pass_dce::execute): Honor it.
	* passes.def: Exchange last DCE and CD-DCE invocations. Swap pass_tail_calls and the last DCE.

	* g++.dg/tree-ssa/pr106922.C: Continue to scan earlier DCE dump.
	* gcc.dg/tree-ssa/20030808-1.c: Likewise.
2022-10-18 | middle-end IFN_ASSUME support [PR106654] | Jakub Jelinek | 1 | -0/+1
My earlier patches gimplify the simplest non-side-effects assumptions into if (cond) ; else __builtin_unreachable (); and throw the rest on the floor. The following patch attempts to do something with the rest too. For -O0, it throws the more complex assumptions on the floor, we don't expect optimizations and the assumptions are there to allow optimizations. Otherwise arranges for the assumptions to be visible in the IL as .ASSUME (_Z2f4i._assume.0, i_1(D)); call where there is an artificial function like: bool _Z2f4i._assume.0 (int i) { bool _2; <bb 2> [local count: 1073741824]: _2 = i_1(D) == 43; return _2; } with the semantics that there is UB unless the assumption function would return true. Aldy, could ranger handle this? If it sees .ASSUME call, walk the body of such function from the edge(s) to exit with the assumption that the function returns true, so above set _2 [true, true] and from there derive that i_1(D) [43, 43] and then map the argument in the assumption function to argument passed to IFN_ASSUME (note, args there are shifted by 1)? During gimplification it actually gimplifies it into [[assume (D.2591)]] { { i = i + 1; D.2591 = i == 44; } } which is a new GIMPLE_ASSUME statement wrapping a GIMPLE_BIND and specifying a boolean_type_node variable which contains the result. The GIMPLE_ASSUME then survives just a couple of passes and is lowered during gimple lowering into an outlined separate function and IFN_ASSUME call. Variables declared inside of the condition (both static and automatic) just change context, automatic variables from the caller are turned into parameters (note, as the code is never executed, I handle this way even non-POD types, we don't need to bother pretending there would be user copy constructors etc. involved). 
The assume_function artificial functions are then optimized until the new assumptions pass which doesn't do much right now but I'd like to see there the backwards ranger walk and filling up of SSA_NAME_RANGE_INFO for the parameters. There are a few further changes I'd like to do, like ignoring the .ASSUME calls in inlining size estimations (but haven't figured out where it is done), or for LTO arrange for the assume functions to be emitted in all partitions that reference those (usually there will be just one, unless code with the assumption got inlined, versioned etc.). 2022-10-18 Jakub Jelinek <jakub@redhat.com> PR c++/106654 gcc/ * gimple.def (GIMPLE_ASSUME): New statement kind. * gimple.h (struct gimple_statement_assume): New type. (is_a_helper <gimple_statement_assume *>::test, is_a_helper <const gimple_statement_assume *>::test): New. (gimple_build_assume): Declare. (gimple_has_substatements): Return true for GIMPLE_ASSUME. (gimple_assume_guard, gimple_assume_set_guard, gimple_assume_guard_ptr, gimple_assume_body_ptr, gimple_assume_body): New inline functions. * gsstruct.def (GSS_ASSUME): New. * gimple.cc (gimple_build_assume): New function. (gimple_copy): Handle GIMPLE_ASSUME. * gimple-pretty-print.cc (dump_gimple_assume): New function. (pp_gimple_stmt_1): Handle GIMPLE_ASSUME. * gimple-walk.cc (walk_gimple_op): Handle GIMPLE_ASSUME. * omp-low.cc (WALK_SUBSTMTS): Likewise. (lower_omp_1): Likewise. * omp-oacc-kernels-decompose.cc (adjust_region_code_walk_stmt_fn): Likewise. * tree-cfg.cc (verify_gimple_stmt, verify_gimple_in_seq_2): Likewise. * function.h (struct function): Add assume_function bitfield. * gimplify.cc (gimplify_call_expr): If the assumption isn't simple enough, expand it into GIMPLE_ASSUME wrapped block or for -O0 drop it. * gimple-low.cc: Include attribs.h. (create_assumption_fn): New function. (struct lower_assumption_data): New type. 
(find_assumption_locals_r, assumption_copy_decl, adjust_assumption_stmt_r, adjust_assumption_stmt_op, lower_assumption): New functions. (lower_stmt): Handle GIMPLE_ASSUME. * tree-ssa-ccp.cc (pass_fold_builtins::execute): Remove IFN_ASSUME calls. * lto-streamer-out.cc (output_struct_function_base): Pack assume_function bit. * lto-streamer-in.cc (input_struct_function_base): And unpack it. * cgraphunit.cc (cgraph_node::expand): Don't verify assume_function has TREE_ASM_WRITTEN set and don't release its body. (symbol_table::compile): Allow assume functions not to have released body. * internal-fn.cc (expand_ASSUME): Remove gcc_unreachable. * passes.cc (execute_one_pass): For TODO_discard_function don't release body of assume functions. * cgraph.cc (cgraph_node::verify_node): Don't verify cgraph nodes of PROP_assumptions_done functions. * tree-pass.h (PROP_assumptions_done): Define. (TODO_discard_function): Adjust comment. (make_pass_assumptions): Declare. * passes.def (pass_assumptions): Add. * timevar.def (TV_TREE_ASSUMPTIONS): New. * tree-inline.cc (remap_gimple_stmt): Handle GIMPLE_ASSUME. * tree-vrp.cc (pass_data_assumptions): New variable. (pass_assumptions): New class. (make_pass_assumptions): New function. gcc/cp/ * cp-tree.h (build_assume_call): Declare. * parser.cc (cp_parser_omp_assumption_clauses): Use build_assume_call. * cp-gimplify.cc (build_assume_call): New function. (process_stmt_assume_attribute): Use build_assume_call. * pt.cc (tsubst_copy_and_build): Likewise. gcc/testsuite/ * g++.dg/cpp23/attr-assume5.C: New test. * g++.dg/cpp23/attr-assume6.C: New test. * g++.dg/cpp23/attr-assume7.C: New test.
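The simplest form of assumption mentioned at the start of this message, `if (cond) ; else __builtin_unreachable ();`, can be written by hand. The sketch below assumes a compiler that provides `__builtin_unreachable` (GCC and Clang do); the `ASSUME` macro and the function names are hypothetical, and the condition `i == 43` is borrowed from the example in the message.

```cpp
#include <cassert>

// Hypothetical helper: tells the optimizer COND holds.  If COND is false
// at run time the behavior is undefined, so nothing is checked here.
#define ASSUME(cond) \
  do { if (!(cond)) __builtin_unreachable (); } while (0)

// With the assumption in place the compiler is allowed to treat I as
// the constant 43 in the rest of the function.
int twice_if_43(int i)
{
  ASSUME (i == 43);
  return i * 2;
}

// An assert, by contrast, is a run-time check.  Callers that cannot
// prove the condition should go through a checked wrapper like this.
int checked_twice(int i)
{
  assert(i == 43);
  return twice_if_43(i);
}
```

Calling `twice_if_43` with any value other than 43 is undefined behavior, which is the semantics the message describes for `.ASSUME`: there is UB unless the assumption function would return true.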
2022-09-22 | tree-optimization/99407 - DSE with data-ref analysis | Richard Biener | 1 | -1/+1
The following resolves the issue that DSE cannot handle references with variable offsets well when identifying possible uses of a store. Instead of just relying on ref_maybe_used_by_stmt_p we use data-ref analysis, making sure to perform that at most once per stmt. The new mode is only exercised by the DSE pass before loop optimization as specified by a new pass parameter and when expensive optimizations are enabled, so it's disabled below -O2. PR tree-optimization/99407 * tree-ssa-dse.cc (dse_stmt_to_dr_map): New global. (dse_classify_store): Use data-ref analysis to disambiguate more uses. (pass_dse::use_dr_analysis_p): New pass parameter. (pass_dse::set_pass_param): Implement. (pass_dse::execute): Allocate and deallocate dse_stmt_to_dr_map. * passes.def: Allow DR analysis for the DSE pass before loop. * gcc.dg/vect/tsvc/vect-tsvc-s243.c: Remove XFAIL.
2022-07-20Move pass_cse_sincos after vectorizer.liuhongt1-1/+2
__builtin_cexpi can't be vectorized because there is a gap between it and the vectorized sincos version (the libmvec version takes a double and two double pointers and returns nothing). Some vectorization opportunities are therefore lost if sin and cos are optimized to cexpi before the vectorizer runs. I tried adding vect_recog_cexpi_pattern to split cexpi into sin and cos, but it failed in vectorizable_simd_clone_call since NULL is returned by cgraph_node::get (fndecl). As an alternative, this patch moves pass_cse_sincos after the vectorizer, just before pass_cse_reciprocals. The original pass_cse_sincos additionally expands pow and cabs; this patch splits that part into a separate pass named pass_expand_powcabs, which stays at the old pass position. gcc/ChangeLog: * passes.def: (Split pass_cse_sincos to pass_expand_powcabs and pass_cse_sincos, and move pass_cse_sincos after vectorizer). * timevar.def (TV_TREE_POWCABS): New timevar. * tree-pass.h (make_pass_expand_powcabs): Split from pass_cse_sincos. * tree-ssa-math-opts.cc (gimple_expand_builtin_cabs): Ditto. (class pass_expand_powcabs): Ditto. (pass_expand_powcabs::execute): Ditto. (make_pass_expand_powcabs): Ditto. (pass_cse_sincos::execute): Remove pow/cabs expand part. (make_pass_cse_sincos): Ditto. gcc/testsuite/ChangeLog: * gcc.dg/pow-sqrt-synth-1.c: Adjust testcase.
2022-04-05c/105151 - move early walloca passRichard Biener1-1/+1
When the walloca pass gained support for ranger, the early pass was not moved to a place where SSA form is available but remained in the lowering pipeline. For the testcase in this bug this is a problem because for erroneous input we still run the lowering pipeline but have broken SSA form there, which ranger does not like. The solution is to rectify the mistake of using ranger without SSA form by moving the pass, which solves both issues. 2022-04-05 Richard Biener <rguenther@suse.de> PR c/105151 * passes.def (pass_walloca): Move early instance into pass_build_ssa_passes to make SSA form available. * gcc.dg/gimplefe-error-14.c: New testcase.
2022-03-17tree-optimization/104960 - unsplit edges after late sinkingRichard Biener1-2/+2
Something went wrong when testing the earlier patch to move the late sinking to before the late phiopt for PR102008. The following makes sure to unsplit edges after the late sinking since the split edges confuse the following phiopt, leading to missed optimizations. I went for a new pass parameter for this to avoid changing the CFG after the early sinking pass at this point. 2022-03-17 Richard Biener <rguenther@suse.de> PR tree-optimization/104960 * passes.def: Add pass parameter to pass_sink_code, mark last one to unsplit edges. * tree-ssa-sink.cc (pass_sink_code::set_pass_param): New. (pass_sink_code::execute): Always execute TODO_cleanup_cfg when we need to unsplit edges. * gcc.dg/gimplefe-37.c: Adjust to allow either the true or false edge to have a forwarder.
2022-03-16tree-optimization/102008 - restore if-conversion of adjacent loadsRichard Biener1-1/+1
The following re-orders the newly added code sinking pass before the last phiopt pass which performs hoisting of adjacent loads with the intent to enable if-conversion on those. I've added the aarch64 specific testcase from the PR. 2022-03-16 Richard Biener <rguenther@suse.de> PR tree-optimization/102008 * passes.def: Move the added code sinking pass before the preceding phiopt pass. * gcc.target/aarch64/pr102008.c: New testcase.
2022-02-03Adjust warn_access pass placement [PR104260].Martin Sebor1-2/+2
Resolves: PR middle-end/104260 - Misplaced waccess3 pass gcc/ChangeLog: PR middle-end/104260 * passes.def (pass_warn_access): Adjust pass placement.
2022-01-18ipa/103989 - tame IPA optimizations at -OgRichard Biener1-2/+3
With -Og we are not prepared to do cleanup after IPA optimizations and dead code exposed by those confuses late diagnostic passes. This is a first patch removing unwanted IPA optimizations, namely both late modref and pure-const analysis. 2022-01-18 Richard Biener <rguenther@suse.de> PR ipa/103989 * passes.def (pass_all_optimizations_g): Remove pass_modref and pass_local_pure_const.
2022-01-15Add -Wdangling-pointer [PR63272].Martin Sebor1-1/+4
Resolves: PR c/63272 - GCC should warn when using pointer to dead scoped variable within the same function gcc/c-family/ChangeLog: PR c/63272 * c.opt (-Wdangling-pointer): New option. gcc/ChangeLog: PR c/63272 * diagnostic-spec.c (nowarn_spec_t::nowarn_spec_t): Handle -Wdangling-pointer. * doc/invoke.texi (-Wdangling-pointer): Document new option. * gimple-ssa-warn-access.cc (pass_waccess::clone): Set new member. (pass_waccess::check_pointer_uses): New function. (pass_waccess::gimple_call_return_arg): New function. (pass_waccess::gimple_call_return_arg_ref): New function. (pass_waccess::check_call_dangling): New function. (pass_waccess::check_dangling_uses): New function overloads. (pass_waccess::check_dangling_stores): New function. (pass_waccess::check_dangling_stores): New function. (pass_waccess::m_clobbers): New data member. (pass_waccess::m_func): New data member. (pass_waccess::m_run_number): New data member. (pass_waccess::m_check_dangling_p): New data member. (pass_waccess::check_alloca): Check m_early_checks_p. (pass_waccess::check_alloc_size_call): Same. (pass_waccess::check_strcat): Same. (pass_waccess::check_strncat): Same. (pass_waccess::check_stxcpy): Same. (pass_waccess::check_stxncpy): Same. (pass_waccess::check_strncmp): Same. (pass_waccess::check_memop_access): Same. (pass_waccess::check_read_access): Same. (pass_waccess::check_builtin): Call check_pointer_uses. (pass_waccess::warn_invalid_pointer): Add arguments. (is_auto_decl): New function. (pass_waccess::check_stmt): New function. (pass_waccess::check_block): Call check_stmt. (pass_waccess::execute): Call check_dangling_uses, check_dangling_stores. Empty m_clobbers. * passes.def (pass_warn_access): Invoke pass two more times. gcc/testsuite/ChangeLog: PR c/63272 * g++.dg/warn/Wfree-nonheap-object-6.C: Disable valid warnings. * g++.dg/warn/ref-temp1.C: Prune expected warning. * gcc.dg/uninit-pr50476.c: Expect a new warning. * c-c++-common/Wdangling-pointer-2.c: New test. 
* c-c++-common/Wdangling-pointer-3.c: New test. * c-c++-common/Wdangling-pointer-4.c: New test. * c-c++-common/Wdangling-pointer-5.c: New test. * c-c++-common/Wdangling-pointer-6.c: New test. * c-c++-common/Wdangling-pointer.c: New test. * g++.dg/warn/Wdangling-pointer-2.C: New test. * g++.dg/warn/Wdangling-pointer.C: New test. * gcc.dg/Wdangling-pointer-2.c: New test. * gcc.dg/Wdangling-pointer.c: New test.
2022-01-03Update copyright years.Jakub Jelinek1-1/+1
2021-11-23Implement -Winfinite-recursion [PR88232].Martin Sebor1-0/+1
Resolves: PR middle-end/88232 - Please implement -Winfinite-recursion gcc/ChangeLog: PR middle-end/88232 * Makefile.in (OBJS): Add gimple-warn-recursion.o. * common.opt: Add -Winfinite-recursion. * doc/invoke.texi (-Winfinite-recursion): Document. * passes.def (pass_warn_recursion): Schedule a new pass. * tree-pass.h (make_pass_warn_recursion): Declare. * gimple-warn-recursion.c: New file. gcc/c-family/ChangeLog: PR middle-end/88232 * c.opt: Add -Winfinite-recursion. gcc/testsuite/ChangeLog: PR middle-end/88232 * c-c++-common/attr-used-5.c: Suppress valid warning. * c-c++-common/attr-used-6.c: Same. * c-c++-common/attr-used-9.c: Same. * g++.dg/warn/Winfinite-recursion-2.C: New test. * g++.dg/warn/Winfinite-recursion-3.C: New test. * g++.dg/warn/Winfinite-recursion.C: New test. * gcc.dg/Winfinite-recursion-2.c: New test. * gcc.dg/Winfinite-recursion.c: New test.
2021-11-11Enable pure-const discovery in modref.Jan Hubicka1-1/+1
We can now handle some extra cases, for example: struct a {int a,b,c;}; __attribute__ ((noinline)) int init (struct a *a) { a->a=1; a->b=2; a->c=3; } int const_fn () { struct a a; init (&a); return a.a + a.b + a.c; } Here pure/const stops on the fact that const_fn calls non-const init, while modref knows that the memory it initializes is local to const_fn. I ended up reordering passes so early modref is done after early pure-const, mostly to avoid the need to change the testsuite, which greps for const functions being detected in pure-const. Still, some testsuite compensation is needed. gcc/ChangeLog: 2021-11-11 Jan Hubicka <hubicka@ucw.cz> * ipa-modref.c (analyze_function): Do pure/const discovery, return true on success. (pass_modref::execute): If pure/const is discovered fixup cfg. (ignore_edge): Do not ignore pure/const edges. (modref_propagate_in_scc): Do pure/const discovery, return true if cdtor was promoted pure/const. (pass_ipa_modref::execute): If needed remove unreachable functions. * ipa-pure-const.c (warn_function_noreturn): Fix whitespace. (warn_function_cold): Likewise. (skip_function_for_local_pure_const): Move earlier. (ipa_make_function_const): Break out from ... (ipa_make_function_pure): Break out from ... (propagate_pure_const): ... here. (pass_local_pure_const::execute): Use it. * ipa-utils.h (ipa_make_function_const): Declare. (ipa_make_function_pure): Declare. * passes.def: Move early modref after pure-const. gcc/testsuite/ChangeLog: 2021-11-11 Jan Hubicka <hubicka@ucw.cz> * c-c++-common/tm/inline-asm.c: Disable pure-const. * g++.dg/ipa/modref-1.C: Update template. * gcc.dg/tree-ssa/modref-11.c: Disable pure-const. * gcc.dg/tree-ssa/modref-14.c: New test. * gcc.dg/tree-ssa/modref-8.c: Do not optimize sibling calls. * gfortran.dg/do_subscript_3.f90: Add -O0.
2021-11-08Move uncprop after modrefJan Hubicka1-2/+6
Moves uncprop after the modref and pure/const passes and adds a comment that this pass should always be last since it is only supposed to help PHI lowering. The pass replaces constants by SSA names that are known to be constant at that place, which hardly helps other passes. gcc/ChangeLog: PR tree-optimization/103177 * passes.def: Move uncprop after pure/const and modref.
2021-11-01Add debug counters to back threader.Aldy Hernandez1-5/+5
Chasing down stage3 miscomparisons is never fun, and having no way to distinguish between jump threads registered by a particular pass, is even harder. This patch adds debug counters for the individual back threading passes. I've left the ethread pass alone, as that one is usually benign, but we could easily add it if needed. The fact that we can only pass one boolean argument to the passes infrastructure has us do all sorts of gymnastics to differentiate between the various back threading passes. Tested on x86-64 Linux. gcc/ChangeLog: * dbgcnt.def: Add debug counter for back_thread[12] and back_threadfull[12]. * passes.def: Pass "first" argument to each back threading pass. * tree-ssa-threadbackward.c (back_threader::back_threader): Add first argument. (back_threader::debug_counter): New. (back_threader::maybe_register_path): Call debug_counter.
2021-10-29Remove VRP threader passes in exchange for better threading pre-VRP.Aldy Hernandez1-4/+2
This patch upgrades the pre-VRP threading passes to fully resolving backward threaders, and removes the post-VRP threading passes altogether. With it, we reduce the number of threaders in our pipeline from 9 to 7. This will leave DOM as the only forward threader client. When the ranger can handle floats, we should be able to upgrade the pre-DOM threaders to fully resolving threaders and kill the embedded DOM threader. The numbers are as follows: prev: # threads in backward + vrp-threaders = 92624 now: # threads in backward threaders = 94275 Gain: +1.78% prev: # total threads: 189495 now: # total threads: 193714 Gain: +2.22% The numbers are not as great as my initial proposal, but I've recently pushed all the work that got us to this point ;-). And... the compilation improves by 1.32%! There's a regression on uninit-pred-7_a.c that I've yet to look at. I want to make sure it's not a missing thread. If it is, I'll create a PR and own it. Also, the tree-ssa/phi_on_compare-*.c tests have all regressed. This seems to be some special case the forward threader handles that the backward threader does not (edge_forwards_cmp_to_conditional_jump*). I haven't dug deep to see if this is solveable within our infrastructure, but a cursory look shows that even though the VRP threader threads this, the *.optimized dump ends with more conditional jumps than without the optimization. I'd like to punt on this for now, because DOM actually catches this through its lone use of the forward threader (I've adjusted the tests). However, we will need to address this sooner or later, if indeed it's still improving the final assembly. gcc/ChangeLog: * passes.def: Replace the pass_thread_jumps before VRP* with pass_thread_jumps_full. Remove all pass_vrp_threader instances. * tree-ssa-threadbackward.c (pass_data_thread_jumps_full): Remove hyphen from "thread-full" name. libgomp/ChangeLog: * testsuite/libgomp.graphite/force-parallel-4.c: Adjust for threading changes. 
* testsuite/libgomp.graphite/force-parallel-8.c: Same. gcc/testsuite/ChangeLog: * gcc.dg/loop-unswitch-2.c: Adjust for threading changes. * gcc.dg/old-style-asm-1.c: Same. * gcc.dg/tree-ssa/phi_on_compare-1.c: Same. * gcc.dg/tree-ssa/phi_on_compare-2.c: Same. * gcc.dg/tree-ssa/phi_on_compare-3.c: Same. * gcc.dg/tree-ssa/phi_on_compare-4.c: Same. * gcc.dg/tree-ssa/pr20701.c: Same. * gcc.dg/tree-ssa/pr21001.c: Same. * gcc.dg/tree-ssa/pr21294.c: Same. * gcc.dg/tree-ssa/pr21417.c: Same. * gcc.dg/tree-ssa/pr21559.c: Same. * gcc.dg/tree-ssa/pr21563.c: Same. * gcc.dg/tree-ssa/pr49039.c: Same. * gcc.dg/tree-ssa/pr59597.c: Same. * gcc.dg/tree-ssa/pr61839_1.c: Same. * gcc.dg/tree-ssa/pr61839_3.c: Same. * gcc.dg/tree-ssa/pr66752-3.c: Same. * gcc.dg/tree-ssa/pr68198.c: Same. * gcc.dg/tree-ssa/pr77445-2.c: Same. * gcc.dg/tree-ssa/pr77445.c: Same. * gcc.dg/tree-ssa/ranger-threader-1.c: Same. * gcc.dg/tree-ssa/ranger-threader-2.c: Same. * gcc.dg/tree-ssa/ranger-threader-4.c: Same. * gcc.dg/tree-ssa/ssa-dom-thread-1.c: Same. * gcc.dg/tree-ssa/ssa-dom-thread-11.c: Same. * gcc.dg/tree-ssa/ssa-dom-thread-12.c: Same. * gcc.dg/tree-ssa/ssa-dom-thread-14.c: Same. * gcc.dg/tree-ssa/ssa-dom-thread-16.c: Same. * gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Same. * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same. * gcc.dg/tree-ssa/ssa-thread-14.c: Same. * gcc.dg/tree-ssa/ssa-thread-backedge.c: Same. * gcc.dg/tree-ssa/ssa-vrp-thread-1.c: Same. * gcc.dg/tree-ssa/vrp02.c: Same. * gcc.dg/tree-ssa/vrp03.c: Same. * gcc.dg/tree-ssa/vrp05.c: Same. * gcc.dg/tree-ssa/vrp06.c: Same. * gcc.dg/tree-ssa/vrp07.c: Same. * gcc.dg/tree-ssa/vrp08.c: Same. * gcc.dg/tree-ssa/vrp09.c: Same. * gcc.dg/tree-ssa/vrp33.c: Same. * gcc.dg/uninit-pred-9_b.c: Same. * gcc.dg/uninit-pred-7_a.c: xfail.
2021-10-28hardened conditionalsAlexandre Oliva1-0/+2
This patch introduces optional passes to harden conditionals used in branches, and in computing boolean expressions, by adding redundant tests of the reversed conditions, and trapping in case of unexpected results. Though in abstract machines the redundant tests should never fail, CPUs may be led to misbehave under certain kinds of attacks, such as of power deprivation, and these tests reduce the likelihood of going too far down an unexpected execution path. for gcc/ChangeLog * common.opt (fharden-compares): New. (fharden-conditional-branches): New. * doc/invoke.texi: Document new options. * gimple-harden-conditionals.cc: New. * Makefile.in (OBJS): Build it. * passes.def: Add new passes. * tree-pass.h (make_pass_harden_compares): Declare. (make_pass_harden_conditional_branches): Declare. for gcc/ada/ChangeLog * doc/gnat_rm/security_hardening_features.rst (Hardened Conditionals): New. for gcc/testsuite/ChangeLog * c-c++-common/torture/harden-comp.c: New. * c-c++-common/torture/harden-cond.c: New.
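The idea described above can be sketched at the source level in C. This is only a conceptual analogue under stated assumptions, not what the passes emit: the function name `hardened_less` is hypothetical, the real transformation happens on the compiler's intermediate representation, and a real implementation has to keep the optimizer from folding the redundant test away, which this sketch does not attempt.

```c
/* Conceptual sketch only: compute a comparison, then redundantly compute
   the reversed comparison and trap if the two results do not disagree.
   On the abstract machine exactly one of them is true, so the trap is
   unreachable unless the hardware misbehaves.  */
int hardened_less(int a, int b)
{
    int result = a < b;      /* the original comparison */
    int reversed = a >= b;   /* redundant test of the reversed condition */

    if (result == reversed)  /* both true or both false: unexpected */
        __builtin_trap();    /* GCC builtin that aborts execution */

    return result;
}
```

Under normal execution the function behaves exactly like `a < b`; the extra test only matters when a fault makes the two comparisons agree.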
2021-10-19Change threading comment before pass_ccp pass.Aldy Hernandez1-3/+1
gcc/ChangeLog: * passes.def: Change threading comment before pass_ccp pass.
2021-09-28reassoc: Do not bias loop-carried PHIs earlyIlya Leoshkevich1-2/+2
Biasing loop-carried PHIs during the 1st reassociation pass interferes with reduction chains and does not bring measurable benefits, so do it only during the 2nd reassociation pass. gcc/ChangeLog: * passes.def (pass_reassoc): Rename parameter to early_p. * tree-ssa-reassoc.c (reassoc_bias_loop_carried_phi_ranks_p): New variable. (phi_rank): Don't bias loop-carried phi ranks before vectorization pass. (execute_reassoc): Add bias_loop_carried_phi_ranks_p parameter. (pass_reassoc::pass_reassoc): Add bias_loop_carried_phi_ranks_p initializer. (pass_reassoc::set_param): Set bias_loop_carried_phi_ranks_p value. (pass_reassoc::execute): Pass bias_loop_carried_phi_ranks_p to execute_reassoc. (pass_reassoc::bias_loop_carried_phi_ranks_p): New member.
2021-09-27Replace VRP threader with a hybrid forward threader.Aldy Hernandez1-0/+2
This patch implements the new hybrid forward threader and replaces the embedded VRP threader with it. With all the pieces that have gone in, the implementation of the hybrid threader is straightforward: convert the current state into SSA imports that the solver will understand, and let the path solver precompute ranges and relations for the path. After this setup is done, we can use the range_query API to solve gimple statements in the threader. The forward threader is now engine agnostic so there are no changes to the threader per se. I have put the hybrid bits in tree-ssa-threadedge.*, instead of VRP, because they will also be used in the evrp removal of the DOM/threader, which is my next task. Most of the patch is actually test changes. I have gone through every single one and verified that we're correct. Most were trivial dump file name changes, but others required going through the IL and certifying that the different IL was expected. For example, in pr59597.c, we have one less thread because the ASSERT_EXPR was getting in the way, and making it seem like things were not crossing loops. The hybrid threader sees the correct representation of the IL, and avoids threading this one case. The final numbers are a 12.16% improvement in jump threads immediately after VRP, and a 0.82% improvement in overall jump threads. The performance drop is 0.6% (plus the 1.43% hit from moving the embedded threader into its own pass). As I've said, I'd prefer to keep the threader in its own pass, but if this is an issue, we can address this with a shared ranger when VRP is replaced with an evrp instance (upcoming). Note that these numbers are slightly different than what I originally posted. A few correctness tweaks, plus restricting loop threads, made the difference. That being said, I was aiming for par. A 12% gain is just gravy ;-). When we merge the threaders, we should see even better numbers-- and we'll have the benefit of an entire release stress testing the solver. 
As I mentioned in my introductory note, paths ending in MEM_REF conditional are missing. In reality, this didn't make a difference, as it was so rare. However, as a follow-up, I will distill a test and add a suitable PR to keep us honest. There is a one-line change to libgomp/team.c silencing a new used uninitialized warning. As my previous work with the threaders has shown, warnings flare up after each improvement to jump threading. I expect this to be no different. I've promised Jakub to investigate fully, so I will analyze and add the appropriate PR for the warning experts. Oh yeah, the new pass dump is called vrp-threader[12] to match each VRP[12] pass. However, there's no reason for it to either be named vrp-threader, or for it to live in tree-vrp.c. Tested on x86-64 Linux. OK? p.s. "Did I say 5 weeks? My bad, I meant 5 months." gcc/ChangeLog: * passes.def (pass_vrp_threader): New. * tree-pass.h (make_pass_vrp_threader): Add make_pass_vrp_threader. * tree-ssa-threadedge.c (hybrid_jt_state::register_equivs_stmt): New. (hybrid_jt_simplifier::hybrid_jt_simplifier): New. (hybrid_jt_simplifier::simplify): New. (hybrid_jt_simplifier::compute_ranges_from_state): New. * tree-ssa-threadedge.h (class hybrid_jt_state): New. (class hybrid_jt_simplifier): New. * tree-vrp.c (execute_vrp): Remove ASSERT_EXPR based jump threader. (class hybrid_threader): New. (hybrid_threader::hybrid_threader): New. (hybrid_threader::~hybrid_threader): New. (hybrid_threader::before_dom_children): New. (hybrid_threader::after_dom_children): New. (execute_vrp_threader): New. (class pass_vrp_threader): New. (make_pass_vrp_threader): New. libgomp/ChangeLog: * team.c: Initialize start_data. * testsuite/libgomp.graphite/force-parallel-4.c: Adjust. * testsuite/libgomp.graphite/force-parallel-8.c: Adjust. gcc/testsuite/ChangeLog: * gcc.dg/torture/pr55107.c: Adjust. * gcc.dg/tree-ssa/phi_on_compare-1.c: Adjust. * gcc.dg/tree-ssa/phi_on_compare-2.c: Adjust. * gcc.dg/tree-ssa/phi_on_compare-3.c: Adjust. 
* gcc.dg/tree-ssa/phi_on_compare-4.c: Adjust. * gcc.dg/tree-ssa/pr21559.c: Adjust. * gcc.dg/tree-ssa/pr59597.c: Adjust. * gcc.dg/tree-ssa/pr61839_1.c: Adjust. * gcc.dg/tree-ssa/pr61839_3.c: Adjust. * gcc.dg/tree-ssa/pr71437.c: Adjust. * gcc.dg/tree-ssa/ssa-dom-thread-11.c: Adjust. * gcc.dg/tree-ssa/ssa-dom-thread-16.c: Adjust. * gcc.dg/tree-ssa/ssa-dom-thread-18.c: Adjust. * gcc.dg/tree-ssa/ssa-dom-thread-2a.c: Adjust. * gcc.dg/tree-ssa/ssa-dom-thread-4.c: Adjust. * gcc.dg/tree-ssa/ssa-thread-14.c: Adjust. * gcc.dg/tree-ssa/ssa-vrp-thread-1.c: Adjust. * gcc.dg/tree-ssa/vrp106.c: Adjust. * gcc.dg/tree-ssa/vrp55.c: Adjust.
2021-08-09openacc: Middle-end worker-partitioning supportJulian Brown1-0/+1
This patch implements worker-partitioning support in the middle end, by rewriting gimple. The OpenACC execution model requires that code can run in either "worker single" mode where only a single worker per gang is active, or "worker partitioned" mode, where multiple workers per gang are active. This means we need to do something equivalent to spawning additional workers when transitioning from worker-single to worker-partitioned mode. However, GPUs typically fix the number of threads of invoked kernels at launch time, so we need to do something with the "extra" threads when they are not wanted. The scheme used is to conditionalise each basic block that executes in "worker single" mode for worker 0 only. Conditional branches are handled specially so "idle" (non-0) workers follow along with worker 0. On transitioning to "worker partitioned" mode, any variables modified by worker 0 are propagated to the other workers via GPU shared memory. Special care is taken for routine calls, writes through pointers, and so forth, as follows: - There are two types of function calls to consider in worker-single mode: "normal" calls to maths library routines, etc. are called from worker 0 only. OpenACC routines may contain worker-partitioned loops themselves, so are called from all workers, including "idle" ones. - SSA names set in worker-single mode, but used in worker-partitioned mode, are copied to shared memory in worker 0. Other workers retrieve the value from the appropriate shared-memory location after a barrier, and new phi nodes are introduced at the convergence point to resolve the worker 0/other worker copies of the value. - Local scalar variables (on the stack) also need special handling. We broadcast any variables that are written in the current worker-single block, and that are read in any worker-partitioned block. (This is believed to be safe, and is flow-insensitive to ease analysis.) - Local aggregates (arrays and composites) on the stack are *not* broadcast. 
Instead we force gimple stmts modifying elements/fields of local aggregates into fully-partitioned mode. The RHS of the assignment is a scalar, and is thus subject to broadcasting as above. - Writes through pointers may affect any local variable that has its address taken. We use points-to analysis to determine the set of potentially-affected variables for a given pointer indirection. We broadcast any such variable which is used in worker-partitioned mode, on a per-block basis for any block containing a write through a pointer. Some slides about the implementation (from 2018) are available at: https://jtb20.github.io/gcnworkers.pdf gcc/ * Makefile.in (OBJS): Add omp-oacc-neuter-broadcast.o. * doc/tm.texi.in (TARGET_GOACC_CREATE_WORKER_BROADCAST_RECORD): Add documentation hook. * doc/tm.texi: Regenerate. * omp-oacc-neuter-broadcast.cc: New file. * omp-builtins.def (BUILT_IN_GOACC_BARRIER) (BUILT_IN_GOACC_SINGLE_START, BUILT_IN_GOACC_SINGLE_COPY_START) (BUILT_IN_GOACC_SINGLE_COPY_END): New builtins. * passes.def (pass_omp_oacc_neuter_broadcast): Add pass. * target.def (goacc.create_worker_broadcast_record): Add target hook. * tree-pass.h (make_pass_omp_oacc_neuter_broadcast): Add prototype. * config/gcn/gcn-protos.h (gcn_goacc_adjust_propagation_record): Rename prototype to... (gcn_goacc_create_worker_broadcast_record): ... this. * config/gcn/gcn-tree.c (gcn_goacc_adjust_propagation_record): Rename function to... (gcn_goacc_create_worker_broadcast_record): ... this. * config/gcn/gcn.c (TARGET_GOACC_ADJUST_PROPAGATION_RECORD): Rename to... (TARGET_GOACC_CREATE_WORKER_BROADCAST_RECORD): ... this. Co-Authored-By: Nathan Sidwell <nathan@codesourcery.com> (via 'gcc/config/nvptx/nvptx.c' master) Co-Authored-By: Kwok Cheung Yeung <kcy@codesourcery.com> Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2021-07-29[OpenACC] Extract 'pass_oacc_loop_designation' out of 'pass_oacc_device_lower'Thomas Schwinge1-0/+1
This really is a separate step -- and another pass to be added between the two, later on. gcc/ * omp-offload.c (oacc_loop_xform_head_tail, oacc_loop_process): 'update_stmt' after modification. (pass_oacc_loop_designation): New function, extracted out of... (pass_oacc_device_lower): ... this. (pass_data_oacc_loop_designation, pass_oacc_loop_designation) (make_pass_oacc_loop_designation): New * passes.def: Add it. * tree-parloops.c (create_parallel_loop): Adjust. * tree-pass.h (make_pass_oacc_loop_designation): New. gcc/testsuite/ * c-c++-common/goacc/classify-kernels-unparallelized.c: 's%oaccdevlow%oaccloops%g'. * c-c++-common/goacc/classify-kernels.c: Likewise. * c-c++-common/goacc/classify-parallel.c: Likewise. * c-c++-common/goacc/classify-routine-nohost.c: Likewise. * c-c++-common/goacc/classify-routine.c: Likewise. * c-c++-common/goacc/classify-serial.c: Likewise. * c-c++-common/goacc/routine-nohost-1.c: Likewise. * g++.dg/goacc/template.C: Likewise. * gcc.dg/goacc/loop-processing-1.c: Likewise. * gfortran.dg/goacc/classify-kernels-unparallelized.f95: Likewise. * gfortran.dg/goacc/classify-kernels.f95: Likewise. * gfortran.dg/goacc/classify-parallel.f95: Likewise. * gfortran.dg/goacc/classify-routine-nohost.f95: Likewise. * gfortran.dg/goacc/classify-routine.f95: Likewise. * gfortran.dg/goacc/classify-serial.f95: Likewise. * gfortran.dg/goacc/routine-multiple-directives-1.f90: Likewise. libgomp/ * testsuite/libgomp.oacc-c-c++-common/pr85486-2.c: 's%oaccdevlow%oaccloops%g'. * testsuite/libgomp.oacc-c-c++-common/pr85486-3.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/pr85486.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/routine-nohost-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c: Likewise. 
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c: Likewise. * testsuite/libgomp.oacc-fortran/routine-nohost-1.f90: Likewise. Co-Authored-By: Julian Brown <julian@codesourcery.com> Co-Authored-By: Kwok Cheung Yeung <kcy@codesourcery.com>
2021-07-28Add new gimple-ssa-warn-access pass.Martin Sebor1-0/+1
gcc/ChangeLog: * Makefile.in (OBJS): Add gimple-ssa-warn-access.o and pointer-query.o. * attribs.h (fndecl_dealloc_argno): Move fndecl_dealloc_argno to tree.h. * builtins.c (compute_objsize_r): Move to pointer-query.cc. (access_ref::access_ref): Same. (access_ref::phi): Same. (access_ref::get_ref): Same. (access_ref::size_remaining): Same. (access_ref::offset_in_range): Same. (access_ref::add_offset): Same. (access_ref::inform_access): Same. (ssa_name_limit_t::visit_phi): Same. (ssa_name_limit_t::leave_phi): Same. (ssa_name_limit_t::next): Same. (ssa_name_limit_t::next_phi): Same. (ssa_name_limit_t::~ssa_name_limit_t): Same. (pointer_query::pointer_query): Same. (pointer_query::get_ref): Same. (pointer_query::put_ref): Same. (pointer_query::flush_cache): Same. (warn_string_no_nul): Move to gimple-ssa-warn-access.cc. (check_nul_terminated_array): Same. (unterminated_array): Same. (maybe_warn_for_bound): Same. (check_read_access): Same. (warn_for_access): Same. (get_size_range): Same. (check_access): Same. (gimple_call_alloc_size): Move to tree.c. (gimple_parm_array_size): Move to pointer-query.cc. (get_offset_range): Same. (gimple_call_return_array): Same. (handle_min_max_size): Same. (handle_array_ref): Same. (handle_mem_ref): Same. (compute_objsize): Same. (gimple_call_alloc_p): Move to gimple-ssa-warn-access.cc. (call_dealloc_argno): Same. (fndecl_dealloc_argno): Same. (new_delete_mismatch_p): Same. (matching_alloc_calls_p): Same. (warn_dealloc_offset): Same. (maybe_emit_free_warning): Same. * builtins.h (check_nul_terminated_array): Move to gimple-ssa-warn-access.h. (check_nul_terminated_array): Same. (warn_string_no_nul): Same. (unterminated_array): Same. (class ssa_name_limit_t): Same. (class pointer_query): Same. (struct access_ref): Same. (class range_query): Same. (struct access_data): Same. (gimple_call_alloc_size): Same. (gimple_parm_array_size): Same. (compute_objsize): Same. (class access_data): Same. (maybe_emit_free_warning): Same. 
* calls.c (initialize_argument_information): Remove call to maybe_emit_free_warning. * gimple-array-bounds.cc: Include new header.. * gimple-fold.c: Same. * gimple-ssa-sprintf.c: Same. * gimple-ssa-warn-restrict.c: Same. * passes.def: Add pass_warn_access. * tree-pass.h (make_pass_warn_access): Declare. * tree-ssa-strlen.c: Include new headers. * tree.c (fndecl_dealloc_argno): Move here from builtins.c. * tree.h (fndecl_dealloc_argno): Move here from attribs.h. * gimple-ssa-warn-access.cc: New file. * gimple-ssa-warn-access.h: New file. * pointer-query.cc: New file. * pointer-query.h: New file. gcc/cp/ChangeLog: * init.c: Include new header.
2021-07-13passes: Fix up subobject __bos [PR101419]Jakub Jelinek1-3/+3
The following testcase is miscompiled, because VN during cunrolli changes __bos argument from address of a larger field to address of a smaller field and so __builtin_object_size (, 1) then folds into smaller value than the actually available size. copy_reference_ops_from_ref has a hack for this, but it was using cfun->after_inlining as a check whether the hack can be ignored, and cunrolli is after_inlining. This patch uses a property to make it exact (set at the end of objsz pass that doesn't do insert_min_max_p) and additionally based on discussions in the PR moves the objsz pass earlier after IPA. 2021-07-13 Jakub Jelinek <jakub@redhat.com> Richard Biener <rguenther@suse.de> PR tree-optimization/101419 * tree-pass.h (PROP_objsz): Define. (make_pass_early_object_sizes): Declare. * passes.def (pass_all_early_optimizations): Rename pass_object_sizes there to pass_early_object_sizes, drop parameter. (pass_all_optimizations): Move pass_object_sizes right after pass_ccp, drop parameter, move pass_post_ipa_warn right after that. * tree-object-size.c (pass_object_sizes::execute): Rename to... (object_sizes_execute): ... this. Add insert_min_max_p argument. (pass_data_object_sizes): Move after object_sizes_execute. (pass_object_sizes): Likewise. In execute method call object_sizes_execute, drop set_pass_param method and insert_min_max_p non-static data member and its initializer in the ctor. (pass_data_early_object_sizes, pass_early_object_sizes, make_pass_early_object_sizes): New. * tree-ssa-sccvn.c (copy_reference_ops_from_ref): Use (cfun->curr_properties & PROP_objsz) instead of cfun->after_inlining. * gcc.dg/builtin-object-size-10.c: Pass -fdump-tree-early_objsz-details instead of -fdump-tree-objsz1-details in dg-options and adjust names of dump file in scan-tree-dump. * gcc.dg/pr101419.c: New test.
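The difference between whole-object and subobject sizes that this fix depends on can be illustrated with a small C sketch. The struct and function names below are hypothetical and this is not the testcase from the PR. `__builtin_object_size` with type 0 considers the whole enclosing object, while type 1 is limited to the closest surrounding subobject, so replacing the address of one field with the address of a smaller field can change a type-1 result.

```c
#include <stddef.h>

/* Hypothetical layout: a small field followed by a larger one.  */
struct packet {
    char tag[4];
    char payload[12];
};

static struct packet pkt;

/* Type 0: bytes remaining up to the end of the whole object.  */
size_t remaining_in_object(void)
{
    return __builtin_object_size(pkt.tag, 0);
}

/* Type 1: bytes remaining up to the end of the closest subobject,
   here the tag field.  */
size_t remaining_in_field(void)
{
    return __builtin_object_size(pkt.tag, 1);
}
```

The subobject size can never exceed the whole-object size, which is why substituting a smaller field's address before the object-size pass has run can make a check stricter than the source warrants.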
2021-05-18Run pass_sink_code once more before store_mergingXionghu Luo1-0/+1
The GIMPLE sink code pass runs quite early, so later GIMPLE optimization passes may expose new opportunities; this patch runs the sink code pass once more before store_merging. For a detailed discussion, please refer to: https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562352.html Tested SPEC2017 performance on P8LE: 544.nab_r improves by 2.43%, with no big changes to other cases; the GEOMEAN improves slightly, by 0.25%. gcc/ChangeLog: 2021-05-18 Xionghu Luo <luoxhu@linux.ibm.com> * passes.def: Add sink_code pass before store_merging. * tree-ssa-sink.c (pass_sink_code:clone): New. gcc/testsuite/ChangeLog: 2021-05-18 Xionghu Luo <luoxhu@linux.ibm.com> * gcc.dg/tree-ssa/ssa-sink-1.c: Adjust. * gcc.dg/tree-ssa/ssa-sink-2.c: Ditto. * gcc.dg/tree-ssa/ssa-sink-3.c: Ditto. * gcc.dg/tree-ssa/ssa-sink-4.c: Ditto. * gcc.dg/tree-ssa/ssa-sink-5.c: Ditto. * gcc.dg/tree-ssa/ssa-sink-6.c: Ditto. * gcc.dg/tree-ssa/ssa-sink-7.c: Ditto. * gcc.dg/tree-ssa/ssa-sink-8.c: Ditto. * gcc.dg/tree-ssa/ssa-sink-9.c: Ditto. * gcc.dg/tree-ssa/ssa-sink-10.c: Ditto. * gcc.dg/tree-ssa/ssa-sink-13.c: Ditto. * gcc.dg/tree-ssa/ssa-sink-14.c: Ditto. * gcc.dg/tree-ssa/ssa-sink-16.c: Ditto. * gcc.dg/tree-ssa/ssa-sink-17.c: Ditto. * gcc.dg/tree-ssa/ssa-sink-18.c: New.
2021-05-05PR middle-end/100325 - missing warning with -O0 on sprintf overflow with pointer plus offsetMartin Sebor1-1/+1
gcc/ChangeLog: * passes.def (pass_warn_printf): Run after SSA. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/builtin-sprintf-warn-26.c: New test.
2021-05-03introduce try store by multiple piecesAlexandre Oliva1-2/+3
The ldist pass turns even very short loops into memset calls. E.g., the TFmode emulation calls end with a loop of up to 3 iterations, to zero out trailing words, and the loop distribution pass turns them into calls of the memset builtin. Though short constant-length clearing memsets are usually dealt with efficiently, for non-constant-length ones, the options are setmemM, or a function call. RISC-V doesn't have any setmemM pattern, so the loops above end up "optimized" into memset calls, incurring not only the overhead of an explicit call, but also discarding the information the compiler has about the alignment of the destination, and that the length is a multiple of the word alignment. This patch handles variable lengths with multiple conditional power-of-2-constant-sized stores-by-pieces, so as to reduce the overhead of length compares. It also changes the last copy-prop pass into ccp, so that pointer alignment and length's nonzero bits are detected and made available for the expander, even for ldist-introduced SSA_NAMEs. for gcc/ChangeLog * builtins.c (try_store_by_multiple_pieces): New. (expand_builtin_memset_args): Use it. If target_char_cast fails, proceed as for non-constant val. Pass len's ctz to... * expr.c (clear_storage_hints): ... this. Try store by multiple pieces after setmem. (clear_storage): Adjust. * expr.h (clear_storage_hints): Likewise. (try_store_by_multiple_pieces): Declare. * passes.def: Replace the last copy_prop with ccp.
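The idea of conditional power-of-2-sized stores can be sketched in plain C. This is only an illustration under stated assumptions, not what try_store_by_multiple_pieces emits: the helper name is hypothetical, the length is assumed to be a multiple of 8 bytes and at most 24 bytes (up to three 8-byte words), and the constant-size memset calls stand in for the stores-by-pieces the expander would generate.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: clear LEN bytes at DST, assuming LEN is a
   multiple of 8 and no larger than 24.  Each set bit of LEN selects
   one constant-sized block, so no loop or length compare is needed.  */
void
clear_up_to_three_words (unsigned char *dst, size_t len)
{
  if (len & 16)
    {
      memset (dst, 0, 16);	/* Constant size: two words.  */
      dst += 16;
    }
  if (len & 8)
    {
      memset (dst, 0, 8);	/* Constant size: one word.  */
      dst += 8;
    }
}
```

Knowing that the low three bits of the length are zero is what lets the sketch skip the 4-, 2- and 1-byte cases, which is why the commit also wants ccp to record the nonzero bits of the length.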
2021-04-27tree-optimization/99912 - schedule another TODO_remove_unused_localsRichard Biener1-0/+3
This makes sure to remove unused locals and prune CLOBBERs after the first scalar cleanup phase after IPA optimizations. On the testcase in the PR this results in 8000 CLOBBERs removed which in turn unleashes more DSE which otherwise hits its walking limit of 256 too early on this testcase. 2021-04-27 Richard Biener <rguenther@suse.de> PR tree-optimization/99912 * passes.def: Add comment about new TODO_remove_unused_locals. * tree-stdarg.c (pass_data_stdarg): Run TODO_remove_unused_locals at start.
2021-04-27tree-optimization/99912 - schedule DSE before SRARichard Biener1-1/+2
For the testcase in the PR the main SRA pass is unable to do some important scalarizations because dead stores of addresses make the candidate variables disqualified. The following patch adds another DSE pass before SRA forming a DCE/DSE pair and moves the DSE pass that is currently closely after SRA up to after the next DCE pass, forming another DCE/DSE pair now residing after PRE. 2021-04-07 Richard Biener <rguenther@suse.de> PR tree-optimization/99912 * passes.def (pass_all_optimizations): Add pass_dse before the first pass_dce, move the first pass_dse before the pass_dce following pass_pre. * gcc.dg/tree-ssa/ldist-33.c: Disable PRE and LIM. * gcc.dg/tree-ssa/pr96789.c: Adjust dump file scanned. * gcc.dg/tree-ssa/ssa-dse-28.c: Likewise. * gcc.dg/tree-ssa/ssa-dse-29.c: Likewise.
2021-01-16cd_dce: Return TODO_update_address_taken from last cd_dce [PR96271]Jakub Jelinek1-3/+5
On the following testcase, handle_builtin_memcmp in the strlen pass folds the memcmp into comparison of two MEM_REFs. But nothing triggers updating of addressable vars afterwards, so even when the parameters are no longer address taken, we force the parameters to stack and back anyway. This patch causes TODO_update_address_taken to happen right before last forwprop pass (at the end of last cd_dce), so after strlen1 too. 2021-01-16 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/96271 * passes.def: Pass false argument to first two pass_cd_dce instances and true to last instance. Add comment that last instance rewrites no longer addressed locals. * tree-ssa-dce.c (pass_cd_dce): Add update_address_taken_p member and initialize it. (pass_cd_dce::set_pass_param): New method. (pass_cd_dce::execute): Return TODO_update_address_taken from last cd_dce instance. * gcc.target/i386/pr96271.c: New test.
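A function of roughly the following shape shows the situation described above. It is a hypothetical example, not the contents of gcc.target/i386/pr96271.c: taking the addresses of the parameters makes them addressable, and once the memcmp call is folded into a direct comparison the addresses are no longer needed.

```c
#include <string.h>

/* Hypothetical example: A and B are address-taken only because of
   the memcmp call.  After the call is folded into a comparison of
   the two values, nothing needs their addresses any more.  */
int
same_bits (unsigned long long a, unsigned long long b)
{
  return memcmp (&a, &b, sizeof a) == 0;
}
```

Unless the address-taken state is recomputed after the fold, the parameters are still spilled to the stack and reloaded, which is what scheduling TODO_update_address_taken at the end of the last cd_dce avoids.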
2021-01-04Update copyright years.Jakub Jelinek1-1/+1
2020-12-01Add if-chain to switch conversion pass.Martin Liska1-0/+1
gcc/ChangeLog: PR tree-optimization/14799 PR ipa/88702 * Makefile.in: Add gimple-if-to-switch.o. * dbgcnt.def (DEBUG_COUNTER): Add new debug counter. * passes.def: Include new pass_if_to_switch pass. * timevar.def (TV_TREE_IF_TO_SWITCH): New timevar. * tree-pass.h (make_pass_if_to_switch): New. * tree-ssa-reassoc.c (struct operand_entry): Move to the header. (dump_range_entry): Move to header file. (debug_range_entry): Likewise. (no_side_effect_bb): Make it global. * tree-switch-conversion.h (simple_cluster::simple_cluster): Add inline for couple of functions in order to prevent error about multiple defined symbols. * gimple-if-to-switch.cc: New file. * tree-ssa-reassoc.h: New file. gcc/testsuite/ChangeLog: PR tree-optimization/14799 PR ipa/88702 * gcc.dg/tree-ssa/pr96480.c: Disable if-to-switch conversion. * gcc.dg/tree-ssa/reassoc-32.c: Likewise. * g++.dg/tree-ssa/if-to-switch-1.C: New test. * gcc.dg/tree-ssa/if-to-switch-1.c: New test. * gcc.dg/tree-ssa/if-to-switch-2.c: New test. * gcc.dg/tree-ssa/if-to-switch-3.c: New test. * gcc.dg/tree-ssa/if-to-switch-4.c: New test. * gcc.dg/tree-ssa/if-to-switch-5.c: New test. * gcc.dg/tree-ssa/if-to-switch-6.c: New test. * gcc.dg/tree-ssa/if-to-switch-7.c: New test. * gcc.dg/tree-ssa/if-to-switch-8.c: New test.
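As a rough illustration of what an if-chain to switch conversion is about, the two functions below are hand-written and behave identically. Both are hypothetical examples; whether the new pass actually converts a given chain depends on heuristics that are not described in this commit message.

```c
/* Hypothetical input: a chain of equality tests on one variable.  */
int
classify_chain (int c)
{
  if (c == 1 || c == 2)
    return 10;
  else if (c == 5)
    return 20;
  else if (c == 7 || c == 8 || c == 9)
    return 30;
  return 0;
}

/* Hand-written switch with the same behaviour, i.e. the form such a
   chain can be rewritten into.  */
int
classify_switch (int c)
{
  switch (c)
    {
    case 1:
    case 2:
      return 10;
    case 5:
      return 20;
    case 7:
    case 8:
    case 9:
      return 30;
    default:
      return 0;
    }
}
```

Once the chain is in switch form, the existing switch lowering can pick a strategy for it instead of evaluating each test in order.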
2020-11-13Decompose OpenACC 'kernels' constructs into parts, a sequence of compute constructsGergö Barany1-0/+1
Not yet enabled by default: for now, the current mode of OpenACC 'kernels' constructs handling still remains '-fopenacc-kernels=parloops', but that is to change later. gcc/ * omp-oacc-kernels-decompose.cc: New. * Makefile.in (OBJS): Add it. * passes.def: Instantiate it. * tree-pass.h (make_pass_omp_oacc_kernels_decompose): Declare. * flag-types.h (enum openacc_kernels): Add. * doc/invoke.texi (-fopenacc-kernels): Document. * gimple.h (enum gf_mask): Add 'GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED', 'GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE', 'GF_OMP_TARGET_KIND_OACC_DATA_KERNELS'. (is_gimple_omp_oacc, is_gimple_omp_offloaded): Handle these. * gimple-pretty-print.c (dump_gimple_omp_target): Likewise. * omp-expand.c (expand_omp_target, build_omp_regions_1) (omp_make_gimple_edges): Likewise. * omp-low.c (scan_sharing_clauses, scan_omp_for) (check_omp_nesting_restrictions, lower_oacc_reductions) (lower_oacc_head_mark, lower_omp_target): Likewise. * omp-offload.c (execute_oacc_device_lower): Likewise. gcc/c-family/ * c.opt (fopenacc-kernels): Add. gcc/fortran/ * lang.opt (fopenacc-kernels): Add. gcc/testsuite/ * c-c++-common/goacc/kernels-decompose-1.c: New. * c-c++-common/goacc/kernels-decompose-2.c: New. * c-c++-common/goacc/kernels-decompose-ice-1.c: New. * c-c++-common/goacc/kernels-decompose-ice-2.c: New. * gfortran.dg/goacc/kernels-decompose-1.f95: New. * gfortran.dg/goacc/kernels-decompose-2.f95: New. * c-c++-common/goacc/if-clause-2.c: Adjust. * gfortran.dg/goacc/kernels-tree.f95: Likewise. libgomp/ * testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c: New. * testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/declare-vla.c: Adjust. * testsuite/libgomp.oacc-fortran/pr94358-1.f90: Likewise. Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
2020-11-02pass: Run cleanup passes before SLP [PR96789]Kewen Lin1-3/+8
As discussed in PR96789, some scalar stmts that can be eliminated by passes after SLP still had their costs modeled when trying to SLP, which could affect the vectorizer's decision. One typical case is the one in PR96789 on target Power. As Richard suggested there, this patch introduces a pass called pre_slp_scalar_cleanup which has some secondary clean up passes; for now they are FRE and DSE. It introduces a new group of TODO flags called pending TODO flags. Unlike normal TODO flags, the pending TODO flags are passed down in the pipeline until one of their consumers can perform the requested action. Consumers should then clear the flags for the actions that they have taken. Some compilation time statistics on all SPEC2017 INT bmks were collected on one Power9 machine for the option sets below: A1: -Ofast -funroll-loops A2: -O1 A3: -O1 -funroll-loops A4: -O2 A5: -O2 -funroll-loops The corresponding increase in compilation time is trivial: A1: 0.08%, A2: 0.00%, A3: -0.38%, A4: -0.10%, A5: -0.05%. Bootstrapped/regtested on powerpc64le-linux-gnu P8. gcc/ChangeLog: PR tree-optimization/96789 * function.h (struct function): New member unsigned pending_TODOs. * passes.c (class pass_pre_slp_scalar_cleanup): New class. (make_pass_pre_slp_scalar_cleanup): New function. (pass_data_pre_slp_scalar_cleanup): New pass data. * passes.def: (pass_pre_slp_scalar_cleanup): New pass, add pass_fre and pass_dse as its children. * timevar.def (TV_SCALAR_CLEANUP): New timevar. * tree-pass.h (PENDING_TODO_force_next_scalar_cleanup): New pending TODO flag. (make_pass_pre_slp_scalar_cleanup): New declare. * tree-ssa-loop-ivcanon.c (tree_unroll_loops_completely_1): Once any outermost loop gets unrolled, flag cfun pending_TODOs PENDING_TODO_force_next_scalar_cleanup on. gcc/testsuite/ChangeLog: PR tree-optimization/96789 * gcc.dg/tree-ssa/ssa-dse-28.c: Adjust. * gcc.dg/tree-ssa/ssa-dse-29.c: Likewise. * gcc.dg/vect/bb-slp-41.c: Likewise. * gcc.dg/tree-ssa/pr96789.c: New test.
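The pending TODO protocol (a producer sets a flag, the flag travels down the pipeline, and the consumer that acts on it clears it) can be modelled with a small toy in C. Everything here is a simplified assumption for illustration: `struct toy_function`, `toy_cfun` and the two helpers are hypothetical, and the bit value chosen for the flag is made up; only the names pending_TODOs and PENDING_TODO_force_next_scalar_cleanup come from the ChangeLog.

```c
/* Toy model only.  The real member lives in struct function and the
   real flag is declared in tree-pass.h; the bit value is assumed.  */
#define PENDING_TODO_force_next_scalar_cleanup (1u << 1)

struct toy_function
{
  unsigned pending_TODOs;
};

/* Stand-in for cfun in this toy model.  */
struct toy_function toy_cfun;

/* Producer, e.g. a pass that has just completely unrolled a loop:
   request a cleanup from some later pass.  */
void
request_scalar_cleanup (struct toy_function *fn)
{
  fn->pending_TODOs |= PENDING_TODO_force_next_scalar_cleanup;
}

/* Consumer, e.g. the gate of the cleanup pass: return 1 if the
   cleanup was requested, and clear the flag because the request is
   now being handled.  */
int
consume_scalar_cleanup (struct toy_function *fn)
{
  if (!(fn->pending_TODOs & PENDING_TODO_force_next_scalar_cleanup))
    return 0;
  fn->pending_TODOs &= ~PENDING_TODO_force_next_scalar_cleanup;
  return 1;
}
```

Passes that run between the producer and the consumer simply leave the flag alone, which is what distinguishes a pending TODO from a normal TODO that is acted on right after the pass that returned it.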
2020-10-30Add -fzero-call-used-regs option and zero_call_used_regs function attributes.qing zhao1-0/+1
This new feature causes the compiler to zero a subset of all call-used registers at function return. This is used to increase program security by either mitigating Return-Oriented Programming (ROP) attacks or preventing information leakage through registers. gcc/ChangeLog: 2020-10-30 Qing Zhao <qing.zhao@oracle.com> H.J.Lu <hjl.tools@gmail.com> * common.opt: Add new option -fzero-call-used-regs * config/i386/i386.c (zero_call_used_regno_p): New function. (zero_call_used_regno_mode): Likewise. (zero_all_vector_registers): Likewise. (zero_all_st_registers): Likewise. (zero_all_mm_registers): Likewise. (ix86_zero_call_used_regs): Likewise. (TARGET_ZERO_CALL_USED_REGS): Define. * df-scan.c (df_epilogue_uses_p): New function. (df_get_exit_block_use_set): Replace EPILOGUE_USES with df_epilogue_uses_p. * df.h (df_epilogue_uses_p): Declare. * doc/extend.texi: Document the new zero_call_used_regs attribute. * doc/invoke.texi: Document the new -fzero-call-used-regs option. * doc/tm.texi: Regenerate. * doc/tm.texi.in (TARGET_ZERO_CALL_USED_REGS): New hook. * emit-rtl.h (struct rtl_data): New field must_be_zero_on_return. * flag-types.h (namespace zero_regs_flags): New namespace. * function.c (gen_call_used_regs_seq): New function. (class pass_zero_call_used_regs): New class. (pass_zero_call_used_regs::execute): New function. (make_pass_zero_call_used_regs): New function. * optabs.c (expand_asm_reg_clobber_mem_blockage): New function. * optabs.h (expand_asm_reg_clobber_mem_blockage): Declare. * opts.c (zero_call_used_regs_opts): New structure array initialization. (parse_zero_call_used_regs_options): New function. (common_handle_option): Handle -fzero-call-used-regs. * opts.h (zero_call_used_regs_opts): New structure array. * passes.def: Add new pass pass_zero_call_used_regs. * recog.c (valid_insn_p): New function. * recog.h (valid_insn_p): Declare. * resource.c (init_resource_info): Replace EPILOGUE_USES with df_epilogue_uses_p. * target.def (zero_call_used_regs): New hook. 
* targhooks.c (default_zero_call_used_regs): New function. * targhooks.h (default_zero_call_used_regs): Declare. * tree-pass.h (make_pass_zero_call_used_regs): Declare. gcc/c-family/ChangeLog: 2020-10-30 Qing Zhao <qing.zhao@oracle.com> H.J.Lu <hjl.tools@gmail.com> * c-attribs.c (c_common_attribute_table): Add new attribute zero_call_used_regs. (handle_zero_call_used_regs_attribute): New function. gcc/testsuite/ChangeLog: 2020-10-30 Qing Zhao <qing.zhao@oracle.com> H.J.Lu <hjl.tools@gmail.com> * c-c++-common/zero-scratch-regs-1.c: New test. * c-c++-common/zero-scratch-regs-10.c: New test. * c-c++-common/zero-scratch-regs-11.c: New test. * c-c++-common/zero-scratch-regs-2.c: New test. * c-c++-common/zero-scratch-regs-3.c: New test. * c-c++-common/zero-scratch-regs-4.c: New test. * c-c++-common/zero-scratch-regs-5.c: New test. * c-c++-common/zero-scratch-regs-6.c: New test. * c-c++-common/zero-scratch-regs-7.c: New test. * c-c++-common/zero-scratch-regs-8.c: New test. * c-c++-common/zero-scratch-regs-9.c: New test. * c-c++-common/zero-scratch-regs-attr-usages.c: New test. * gcc.target/i386/zero-scratch-regs-1.c: New test. * gcc.target/i386/zero-scratch-regs-10.c: New test. * gcc.target/i386/zero-scratch-regs-11.c: New test. * gcc.target/i386/zero-scratch-regs-12.c: New test. * gcc.target/i386/zero-scratch-regs-13.c: New test. * gcc.target/i386/zero-scratch-regs-14.c: New test. * gcc.target/i386/zero-scratch-regs-15.c: New test. * gcc.target/i386/zero-scratch-regs-16.c: New test. * gcc.target/i386/zero-scratch-regs-17.c: New test. * gcc.target/i386/zero-scratch-regs-18.c: New test. * gcc.target/i386/zero-scratch-regs-19.c: New test. * gcc.target/i386/zero-scratch-regs-2.c: New test. * gcc.target/i386/zero-scratch-regs-20.c: New test. * gcc.target/i386/zero-scratch-regs-21.c: New test. * gcc.target/i386/zero-scratch-regs-22.c: New test. * gcc.target/i386/zero-scratch-regs-23.c: New test. * gcc.target/i386/zero-scratch-regs-24.c: New test. 
* gcc.target/i386/zero-scratch-regs-25.c: New test. * gcc.target/i386/zero-scratch-regs-26.c: New test. * gcc.target/i386/zero-scratch-regs-27.c: New test. * gcc.target/i386/zero-scratch-regs-28.c: New test. * gcc.target/i386/zero-scratch-regs-29.c: New test. * gcc.target/i386/zero-scratch-regs-30.c: New test. * gcc.target/i386/zero-scratch-regs-31.c: New test. * gcc.target/i386/zero-scratch-regs-3.c: New test. * gcc.target/i386/zero-scratch-regs-4.c: New test. * gcc.target/i386/zero-scratch-regs-5.c: New test. * gcc.target/i386/zero-scratch-regs-6.c: New test. * gcc.target/i386/zero-scratch-regs-7.c: New test. * gcc.target/i386/zero-scratch-regs-8.c: New test. * gcc.target/i386/zero-scratch-regs-9.c: New test.
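A minimal usage sketch of the new function attribute follows. The function itself is hypothetical; "used-gpr" is one of the documented choices and restricts zeroing to the call-used general-purpose registers that the function actually uses. Compilers that do not know the attribute are expected to warn and ignore it, so the effect is only observable in the generated code on supporting targets.

```c
/* Hypothetical function that handles a secret value.  With the
   attribute, the call-used general-purpose registers it used are
   zeroed before it returns (the return value register is kept).  */
__attribute__ ((zero_call_used_regs ("used-gpr")))
int
check_pin (int entered, int expected)
{
  return entered == expected;
}
```

The same policy can be requested for a whole translation unit with -fzero-call-used-regs=used-gpr, with the attribute taking care of individual functions that need a different setting.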
2020-10-22Materialize clones on demandJan Hubicka1-1/+0
This patch removes the pass to materialize all clones; instead this is now done on demand. The motivation is to reduce the lifetime of function bodies in ltrans, which should noticeably reduce memory use for highly parallel compilations of large programs (like Martin does) or with partitioning reduced/disabled. For cc1 with one partition the memory use seems to go down from 4GB to circa 1.5GB (as seen in top, so this is not particularly accurate). gcc/ChangeLog: 2020-10-22 Jan Hubicka <hubicka@ucw.cz> * cgraph.c (cgraph_node::get_untransformed_body): Perform lazy clone materialization. * cgraph.h (cgraph_node::materialize_clone): Declare. (symbol_table::materialize_all_clones): Remove. * cgraphclones.c (cgraph_materialize_clone): Turn to ... (cgraph_node::materialize_clone): .. this one; move here dumping from symbol_table::materialize_all_clones. (symbol_table::materialize_all_clones): Remove. * cgraphunit.c (mark_functions_to_output): Clear stmt references. (cgraph_node::expand): Initialize bitmaps early; do not call execute_all_ipa_transforms if there are no transforms. * ipa-inline-transform.c (save_inline_function_body): Fix formatting. (inline_transform): Materialize all clones before function is modified. * ipa-param-manipulation.c (ipa_param_adjustments::modify_call): Materialize clone if needed. * ipa.c (class pass_materialize_all_clones): Remove. (make_pass_materialize_all_clones): Remove. * passes.c (execute_all_ipa_transforms): Materialize all clones. * passes.def: Remove pass_materialize_all_clones. * tree-pass.h (make_pass_materialize_all_clones): Remove. * tree-ssa-structalias.c (ipa_pta_execute): Clear refs.
2020-09-20New modref/ipa_modref optimization passesJan Hubicka1-0/+4
2020-09-19 David Cepelik <d@dcepelik.cz> Jan Hubicka <hubicka@ucw.cz> * Makefile.in: Add ipa-modref.c and ipa-modref-tree.c. * alias.c (reference_alias_ptr_type_1): Export. * alias.h (reference_alias_ptr_type_1): Declare. * common.opt (fipa-modref): New. * gengtype.c (open_base_files): Add ipa-modref-tree.h and ipa-modref.h * ipa-modref-tree.c: New file. * ipa-modref-tree.h: New file. * ipa-modref.c: New file. * ipa-modref.h: New file. * lto-section-in.c (lto_section_name): Add ipa_modref. * lto-streamer.h (enum lto_section_type): Add LTO_section_ipa_modref. * opts.c (default_options_table): Enable ipa-modref at -O1+. * params.opt (-param=modref-max-bases, -param=modref-max-refs, -param=modref-max-tests): New params. * passes.def: Schedule pass_modref and pass_ipa_modref. * timevar.def (TV_IPA_MODREF): New timevar. (TV_TREE_MODREF): New timevar. * tree-pass.h (make_pass_modref): Declare. (make_pass_ipa_modref): Declare. * tree-ssa-alias.c (dump_alias_stats): Include ipa-modref-tree.h and ipa-modref.h (alias_stats): Add modref_use_may_alias, modref_use_no_alias, modref_clobber_may_alias, modref_clobber_no_alias, modref_tests. (dump_alias_stats): Dump new stats. (nonoverlapping_array_refs_p): Fix formatting. (modref_may_conflict): New function. (ref_maybe_used_by_call_p_1): Use it. (call_may_clobber_ref_p_1): Use it. (call_may_clobber_ref_p): Update. (stmt_may_clobber_ref_p_1): Update. * tree-ssa-alias.h (call_may_clobber_ref_p_1): Update.
2020-08-03Removal of HSA offloading from gcc and libgompMartin Jambor1-2/+0
This patch removes the generation of HSAIL from the compiler, the HSA offloading plugin from libgomp and the associated testsuite tests and infrastructure bits from the respective testsuites. Apart from removal of the obvious files, I removed bits that I found by searching for HSA related terms and by re-tracing my steps and looking at the patches that introduced HSA in the first place. I did not remove everything these patches brought in, for example: - the mechanism to pass offload-target specific info from the application to the offloading plugin - but the same mechanism is also used to communicate number of teams and the thread limit to all offload targets. - run_func hook in gomp_device_descr stays too, although now it is not used. If some future offload target would like the ability to refuse to offload some functions, it can use it. It is easy to remove as a follow-up if it is considered clutter, though. - configure options --with-hsa-runtime=PATH, --with-hsa-runtime-include=PATH and --with-hsa-runtime-lib=PATH remain because GCN uses them too. - Surprisingly, GOMP_TARGET_ARG_HSA_KERNEL_ATTRIBUTES (a constant from gomp-constants.h) appears in the source of the amdgcn libgomp plugin, although I tend to think that code path is not ever used and this patch certainly removes it from the compiler. Nevertheless, it seems it has potential value beyond HSAIL and so I've kept it, it can of course always be easily removed in the future if GCN folk abandon it too. - I assume constants OFFLOAD_TARGET_TYPE_HSA and GOMP_DEVICE_HSA need to stay indefinitely too just so that no future offload target picks that number. - I have kept dg-require-effective-target offload_device_nonshared_as requirement of tests which have it. It is quite probable I missed some small HSA artifacts but those should be easy to remove later as we find them. include/ChangeLog: 2020-07-24 Martin Jambor <mjambor@suse.cz> * gomp-constants.h (GOMP_VERSION_HSA): Remove. 
gcc/ChangeLog: 2020-07-24 Martin Jambor <mjambor@suse.cz> * hsa-brig-format.h: Moved to brig/brigfrontend. * hsa-brig.c: Removed. * hsa-builtins.def: Likewise. * hsa-common.c: Likewise. * hsa-common.h: Likewise. * hsa-dump.c: Likewise. * hsa-gen.c: Likewise. * hsa-regalloc.c: Likewise. * ipa-hsa.c: Likewise. * omp-grid.c: Likewise. * omp-grid.h: Likewise. * Makefile.in (BUILTINS_DEF): Remove hsa-builtins.def. (OBJS): Remove hsa-common.o, hsa-gen.o, hsa-regalloc.o, hsa-brig.o, hsa-dump.o, ipa-hsa.c and omp-grid.o. (GTFILES): Removed hsa-common.c and omp-expand.c. * builtins.def: Remove processing of hsa-builtins.def. (DEF_HSA_BUILTIN): Remove. * common.opt (flag_disable_hsa): Remove. (-Whsa): Ignore. * config.in (ENABLE_HSA): Removed. * configure.ac: Removed handling configuration for hsa offloading. (ENABLE_HSA): Removed. * configure: Regenerated. * doc/install.texi (--enable-offload-targets): Remove hsa from the example. (--with-hsa-runtime): Reword to reference any HSA run-time, not specifically HSA offloading. * doc/invoke.texi (Option Summary): Remove -Whsa. (Warning Options): Likewise. (Optimize Options): Remove hsa-gen-debug-stores. * doc/passes.texi (Regular IPA passes): Remove section on IPA HSA pass. * gimple-low.c (lower_stmt): Remove GIMPLE_OMP_GRID_BODY case. * gimple-pretty-print.c (dump_gimple_omp_for): Likewise. (dump_gimple_omp_block): Likewise. (pp_gimple_stmt_1): Likewise. * gimple-walk.c (walk_gimple_stmt): Likewise. * gimple.c (gimple_build_omp_grid_body): Removed function. (gimple_copy): Remove GIMPLE_OMP_GRID_BODY case. * gimple.def (GIMPLE_OMP_GRID_BODY): Removed. * gimple.h (gf_mask): Removed GF_OMP_PARALLEL_GRID_PHONY, OMP_FOR_KIND_GRID_LOOP, GF_OMP_FOR_GRID_PHONY, GF_OMP_FOR_GRID_INTRA_GROUP, GF_OMP_FOR_GRID_GROUP_ITER and GF_OMP_TEAMS_GRID_PHONY. Renumbered GF_OMP_FOR_KIND_SIMD and GF_OMP_TEAMS_HOST. (gimple_build_omp_grid_body): Removed declaration. (gimple_has_substatements): Remove GIMPLE_OMP_GRID_BODY case. 
(gimple_omp_for_grid_phony): Removed. (gimple_omp_for_set_grid_phony): Likewise. (gimple_omp_for_grid_intra_group): Likewise. (gimple_omp_for_grid_intra_group): Likewise. (gimple_omp_for_grid_group_iter): Likewise. (gimple_omp_for_set_grid_group_iter): Likewise. (gimple_omp_parallel_grid_phony): Likewise. (gimple_omp_parallel_set_grid_phony): Likewise. (gimple_omp_teams_grid_phony): Likewise. (gimple_omp_teams_set_grid_phony): Likewise. (CASE_GIMPLE_OMP): Remove GIMPLE_OMP_GRID_BODY case. * lto-section-in.c (lto_section_name): Removed hsa. * lto-streamer.h (lto_section_type): Removed LTO_section_ipa_hsa. * lto-wrapper.c (compile_images_for_offload_targets): Remove special handling of hsa. * omp-expand.c: Do not include hsa-common.h and gt-omp-expand.h. (parallel_needs_hsa_kernel_p): Removed. (grid_launch_attributes_trees): Likewise. (grid_launch_attributes_trees): Likewise. (grid_create_kernel_launch_attr_types): Likewise. (grid_insert_store_range_dim): Likewise. (grid_get_kernel_launch_attributes): Likewise. (get_target_arguments): Remove code passing HSA grid sizes. (grid_expand_omp_for_loop): Remove. (grid_arg_decl_map): Likewise. (grid_remap_kernel_arg_accesses): Likewise. (grid_expand_target_grid_body): Likewise. (expand_omp): Remove call to grid_expand_target_grid_body. (omp_make_gimple_edges): Remove GIMPLE_OMP_GRID_BODY case. * omp-general.c: Do not include hsa-common.h. (omp_maybe_offloaded): Do not check for HSA offloading. (omp_context_selector_matches): Likewise. * omp-low.c: Do not include hsa-common.h and omp-grid.h. (build_outer_var_ref): Remove handling of GIMPLE_OMP_GRID_BODY. (scan_sharing_clauses): Remove handling of OMP_CLAUSE__GRIDDIM_. (scan_omp_parallel): Remove handling of the phoney variant. (check_omp_nesting_restrictions): Remove handling of GIMPLE_OMP_GRID_BODY and GF_OMP_FOR_KIND_GRID_LOOP. (scan_omp_1_stmt): Remove handling of GIMPLE_OMP_GRID_BODY. (lower_omp_for_lastprivate): Remove handling of gridified loops. 
(lower_omp_for): Remove phony loop handling. (lower_omp_taskreg): Remove phony construct handling. (lower_omp_teams): Likewise. (lower_omp_grid_body): Removed. (lower_omp_1): Remove GIMPLE_OMP_GRID_BODY case. (execute_lower_omp): Do not call omp_grid_gridify_all_targets. * opts.c (common_handle_option): Do not handle hsa when processing OPT_foffload_. * params.opt (hsa-gen-debug-stores): Remove. * passes.def: Remove pass_ipa_hsa and pass_gen_hsail. * timevar.def: Remove TV_IPA_HSA. * toplev.c: Do not include hsa-common.h. (compile_file): Do not call hsa_output_brig. * tree-core.h (enum omp_clause_code): Remove OMP_CLAUSE__GRIDDIM_. (tree_omp_clause): Remove union field dimension. * tree-nested.c (convert_nonlocal_omp_clauses): Remove the OMP_CLAUSE__GRIDDIM_ case. (convert_local_omp_clauses): Likewise. * tree-pass.h (make_pass_gen_hsail): Remove declaration. (make_pass_ipa_hsa): Likewise. * tree-pretty-print.c (dump_omp_clause): Remove GIMPLE_OMP_GRID_BODY case. * tree.c (omp_clause_num_ops): Remove the element corresponding to OMP_CLAUSE__GRIDDIM_. (omp_clause_code_name): Likewise. (walk_tree_1): Remove GIMPLE_OMP_GRID_BODY case. * tree.h (OMP_CLAUSE__GRIDDIM__DIMENSION): Remove. (OMP_CLAUSE__GRIDDIM__SIZE): Likewise. (OMP_CLAUSE__GRIDDIM__GROUP): Likewise. gcc/fortran/ChangeLog: 2020-07-24 Martin Jambor <mjambor@suse.cz> * f95-lang.c (gfc_init_builtin_functions): Remove processing of hsa-builtins.def. gcc/brig/ChangeLog: 2020-07-24 Martin Jambor <mjambor@suse.cz> * brigfrontend/brig-util.h (hsa_type_packed_p): Declared. * brigfrontend/brig-util.cc (hsa_type_packed_p): Moved here from removed gcc/hsa-common.c. libgomp/ChangeLog: 2020-07-24 Martin Jambor <mjambor@suse.cz> * plugin/Makefrag.am: Remove configuration of HSA plugin. * aclocal.m4: Regenerated. * Makefile.in: Regenerated. * config.h.in: Regenerated. * configure: Regenerated. * plugin/configfrag.ac: Likewise. * plugin/hsa_ext_finalize.h: Removed. * plugin/plugin-hsa.c: Likewise. 
* testsuite/Makefile.in: Regenerated. * testsuite/lib/libgomp.exp (offload_target_to_openacc_device_type): Remove hsa case. (check_effective_target_hsa_offloading_selected_nocache): Removed (check_effective_target_hsa_offloading_selected): Likewise. (libgomp_init): Do not add -Wno-hsa to additional_flags. * testsuite/libgomp.hsa.c/alloca-1.c: Removed test. * testsuite/libgomp.hsa.c/bitfield-1.c: Likewise. * testsuite/libgomp.hsa.c/bits-insns.c: Likewise. * testsuite/libgomp.hsa.c/builtins-1.c: Likewise. * testsuite/libgomp.hsa.c/c.exp: Likewise. * testsuite/libgomp.hsa.c/complex-1.c: Likewise. * testsuite/libgomp.hsa.c/complex-align-2.c: Likewise. * testsuite/libgomp.hsa.c/formal-actual-args-1.c: Likewise. * testsuite/libgomp.hsa.c/function-call-1.c: Likewise. * testsuite/libgomp.hsa.c/get-level-1.c: Likewise. * testsuite/libgomp.hsa.c/gridify-1.c: Likewise. * testsuite/libgomp.hsa.c/gridify-2.c: Likewise. * testsuite/libgomp.hsa.c/gridify-3.c: Likewise. * testsuite/libgomp.hsa.c/gridify-4.c: Likewise. * testsuite/libgomp.hsa.c/memory-operations-1.c: Likewise. * testsuite/libgomp.hsa.c/pr69568.c: Likewise. * testsuite/libgomp.hsa.c/pr82416.c: Likewise. * testsuite/libgomp.hsa.c/rotate-1.c: Likewise. * testsuite/libgomp.hsa.c/staticvar.c: Likewise. * testsuite/libgomp.hsa.c/switch-1.c: Likewise. * testsuite/libgomp.hsa.c/switch-branch-1.c: Likewise. * testsuite/libgomp.hsa.c/switch-sbr-2.c: Likewise. * testsuite/libgomp.hsa.c/tiling-1.c: Likewise. * testsuite/libgomp.hsa.c/tiling-2.c: Likewise. gcc/testsuite/ChangeLog: 2020-07-24 Martin Jambor <mjambor@suse.cz> * lib/target-supports.exp (check_effective_target_offload_hsa): Removed. * c-c++-common/gomp/gridify-1.c: Removed test. * c-c++-common/gomp/gridify-2.c: Likewise. * c-c++-common/gomp/gridify-3.c: Likewise. * c-c++-common/gomp/hsa-indirect-call-1.c: Likewise. * gfortran.dg/gomp/gridify-1.f90: Likewise. * gcc.dg/gomp/gomp.exp: Do not pass -Wno-hsa to tests. * g++.dg/gomp/gomp.exp: Likewise. 
* gfortran.dg/gomp/gomp.exp: Likewise.