|
This is the updated patch with the suggestion from:
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665217.html
where we use a second arg/param to select which of the dce pass
instances should perform remove_unused_locals.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
PR tree-optimization/117096
* passes.def: Update some of the dce/cd-dce passes setting
the 2nd arg to true.
Also remove comment about stdarg since dce does it.
* tree-ssa-dce.cc (pass_dce): Add remove_unused_locals_p field.
Update set_pass_param to allow for 2nd param.
Use remove_unused_locals_p in execute to return TODO_remove_unused_locals.
(pass_cd_dce): Likewise.
* tree-stdarg.cc (pass_data_stdarg): Remove TODO_remove_unused_locals.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
When looking at PR 115815 we realized that it would make sense to make
calls to functions originally declared static constructors and
destructors created by pass_ipa_cdtor_merge visible to IPA-SRA. This
patch does that.
gcc/ChangeLog:
2024-07-25 Martin Jambor <mjambor@suse.cz>
* passes.def: Move pass_ipa_cdtor_merge before pass_ipa_cp and
pass_ipa_sra.
|
|
See Subversion r217124 (Git commit 433e4164339f18d0b8798968444a56b681b5232c)
"Reorganize post-ra pipeline for targets without register allocation".
gcc/
* passes.cc: Document 'pass_postreload' vs. 'pass_late_compilation'.
* passes.def: Likewise.
|
|
Since Subversion r201359 (Git commit a167b052dfe9a8509bb23c374ffaeee953df0917)
"Introduce gen-pass-instances.awk and pass-instances.def", the usage comment at
the top of 'gcc/passes.def' no longer is accurate (even if that latter file
does continue to use the 'NEXT_PASS' form without 'NUM') -- and, worse, the
'NEXT_PASS' etc. in that usage comment are processed by the
'gcc/gen-pass-instances.awk' script:
--- source-gcc/gcc/passes.def 2024-06-24 18:55:15.132561641 +0200
+++ build-gcc/gcc/pass-instances.def 2024-06-24 18:55:27.768562714 +0200
[...]
@@ -20,546 +22,578 @@
/*
Macros that should be defined when using this file:
INSERT_PASSES_AFTER (PASS)
PUSH_INSERT_PASSES_WITHIN (PASS)
POP_INSERT_PASSES ()
- NEXT_PASS (PASS)
+ NEXT_PASS (PASS, 1)
TERMINATE_PASS_LIST (PASS)
*/
[...]
(That is, this is 'NEXT_PASS' for the first instance of pass 'PASS'.)
That's benign so far, but with another thing that I'll be extending, I'd
then run into an error while the script handles this comment block. ;-\
gcc/
* passes.def: Rewrite usage comment at the top.
|
|
Enable the tailcall optimization for non-optimizing builds,
but in this case it only checks calls that have the musttail attribute set.
This makes musttail work without optimization.
This is done with a new late musttail pass that is only active when
not optimizing. The new pass relies on tree-cfg to discover musttails.
This avoids a ~0.8% compiler run time penalty at -O0.
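At the source level, the feature this enables looks roughly like the sketch below. The attribute spelling and the version guard are assumptions (guarded so the example still compiles with compilers that lack the attribute); the function name is invented.

```c
#include <assert.h>

/* Hypothetical illustration: with a new-enough GCC, the musttail
   statement attribute forces tail-call codegen even at -O0.  The
   guard is an assumption about when the attribute is available. */
#if defined(__GNUC__) && __GNUC__ >= 15
#  define MUSTTAIL __attribute__((musttail))
#else
#  define MUSTTAIL
#endif

static long sum_to (long n, long acc)
{
  if (n == 0)
    return acc;
  MUSTTAIL return sum_to (n - 1, acc + n);  /* required to be a tail call */
}
```

With the attribute honored, the recursion reuses one stack frame regardless of `n`, even in unoptimized builds.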
gcc/ChangeLog:
PR c/83324
* function.h (struct function): Add has_musttail.
* lto-streamer-in.cc (input_struct_function_base): Stream
has_musttail.
* lto-streamer-out.cc (output_struct_function_base): Dito.
* passes.def (pass_musttail): Add.
* tree-cfg.cc (notice_special_calls): Record has_musttail.
(clear_special_calls): Clear has_musttail.
* tree-pass.h (make_pass_musttail): Add.
* tree-tailcall.cc (find_tail_calls): Handle only_musttail
argument.
(tree_optimize_tail_calls_1): Pass on only_musttail.
(execute_tail_calls): Pass only_musttail as false.
(class pass_musttail): Add.
(make_pass_musttail): Add.
|
|
The pre-commit testing showed that making ext-dce only active at -O2 and above
would require minor edits to the tests. In some cases we had specified -O1 in
the test or specified no optimization level at all. Those need to be bumped to
-O2. In one test we had one set of dg-options overriding another.
The other approach that could have been taken would be to drop the -On
argument, add an explicit -fext-dce and add dg-skip-if options. I originally
thought that was going to be the way to go, but the dg-skip-if aspect was going to
get ugly as things like interaction between unrolling, peeling and -ftracer
would have to be accounted for and would likely need semi-regular adjustment.
Changes since V2:
Testsuite changes to deal with pass only being enabled at -O2 or
higher.
--
Changes since V1:
Check flag_ext_dce before running the new pass. I'd forgotten that
I had removed that part of the gate to facilitate more testing.
Turn flag_ext_dce on at -O2 and above.
Adjust one of the riscv tests to explicitly avoid vectors
Adjust a few aarch64 tests
In tbz_2.c we remove an unnecessary extension which causes us to use
"x" registers instead of "w" registers.
In the pred_clobber tests we also remove an extension and that
ultimately causes a reg->reg copy to change locations.
--
This was actually ack'd late in the gcc-14 cycle, but I chose not to integrate
it given how late we were in the cycle.
The basic idea here is to track liveness of subobjects within a word and if we
find an extension where the bits set aren't actually used, then we convert the
extension into a subreg. The subreg typically simplifies away.
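A minimal C-level sketch of the kind of dead extension the pass targets (the function name is illustrative, not from the patch):

```c
#include <assert.h>
#include <stdint.h>

/* The widening of 'b' sets upper bits that the final 8-bit result
   never reads, so the extension's upper bits are dead and the
   extension can be rewritten as a plain subreg, which then
   simplifies away. */
static uint8_t low_byte_sum (uint8_t a, uint8_t b)
{
  uint32_t wide = (uint32_t) b;   /* extension: bits 8..31 unused */
  return (uint8_t) (a + wide);    /* only bits 0..7 are live */
}
```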
I've seen this help a few routines in coremark, fix one bug in the testsuite
(pr111384) and fix a couple internally reported bugs in Ventana.
The original idea and code were from Joern; Jivan and I hacked it into usable
shape. I've had this in my tester for ~8 months, so it's been through more
build/test cycles than I care to contemplate and nearly every architecture we
support.
But just in case, I'm going to wait for it to spin through the pre-commit CI
tester. I'll find my old ChangeLog before committing.
gcc/
* Makefile.in (OBJS): Add ext-dce.o
* common.opt (ext-dce): Document new option.
* df-scan.cc (df_get_exit_block_use_set): Delete prototype and
make extern.
* df.h (df_get_exit_block_use_set): Prototype.
* ext-dce.cc: New file/pass.
* opts.cc (default_options_table): Handle ext-dce at -O2 or higher.
* passes.def: Add ext-dce before combine.
* tree-pass.h (make_pass_ext_dce): Prototype.
gcc/testsuite
* gcc.target/aarch64/sve/pred_clobber_1.c: Update expected output.
* gcc.target/aarch64/sve/pred_clobber_2.c: Likewise.
* gcc.target/aarch64/sve/pred_clobber_3.c: Likewise.
* gcc.target/aarch64/tbz_2.c: Likewise.
* gcc.target/riscv/core_bench_list.c: New test.
* gcc.target/riscv/core_init_matrix.c: New test.
* gcc.target/riscv/core_list_init.c: New test.
* gcc.target/riscv/matrix_add_const.c: New test.
* gcc.target/riscv/mem-extend.c: New test.
* gcc.target/riscv/pr111384.c: New test.
Co-authored-by: Jivan Hakobyan <jivanhakobyan9@gmail.com>
Co-authored-by: Joern Rennecke <joern.rennecke@embecosm.com>
|
|
Since cabs expansion was removed from this pass,
it would be good to rename it.
Bootstrapped and tested on x86_64-linux-gnu
gcc/ChangeLog:
* passes.def (expand_pow): Renamed from expand_powcabs.
* timevar.def (TV_TREE_POWCABS): Remove.
(TV_TREE_POW): Add.
* tree-pass.h (make_pass_expand_powcabs): Rename to ...
(make_pass_expand_pow): This.
* tree-ssa-math-opts.cc (class pass_expand_powcabs): Rename to ...
(class pass_expand_pow): This.
(pass_expand_powcabs::execute): Rename to ...
(pass_expand_pow::execute): This.
(make_pass_expand_powcabs): Rename to ...
(make_pass_expand_pow): This.
gcc/testsuite/ChangeLog:
* gcc.dg/pow-sqrt-synth-1.c: Update testcase for renamed pass.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
This patch adds a combine pass that runs late in the pipeline.
There are two instances: one between combine and split1, and one
after postreload.
The pass currently has a single objective: remove definitions by
substituting into all uses. The pre-RA version tries to restrict
itself to cases that are likely to have a neutral or beneficial
effect on register pressure.
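A toy model of that single objective, with invented names rather than GCC internals: when a register has one definition and the defining expression can be absorbed into every use, substitute it and delete the definition.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy "insn": a destination name and a flat expression string. */
struct insn { const char *dest; char expr[64]; };

/* Substitute def's expression into a use wherever def's destination
   appears (toy: the name occurs at most once in the string). */
static void substitute (struct insn *use, const struct insn *def)
{
  char *p = strstr (use->expr, def->dest);
  if (!p)
    return;
  char tail[64];
  strcpy (tail, p + strlen (def->dest));
  sprintf (p, "(%s)%s", def->expr, tail);  /* fold the def into the use */
}
```

For example, substituting the definition `r100 = r101+1` into the use `r100*2` yields `(r101+1)*2`, after which the definition of `r100` can be deleted.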
The patch fixes PR106594. It also fixes a few FAILs and XFAILs
in the aarch64 test results, mostly due to making proper use of
MOVPRFX in cases where we didn't previously.
This is just a first step. I'm hoping that the pass could be
used for other combine-related optimisations in future. In particular,
the post-RA version doesn't need to restrict itself to cases where all
uses are substitutable, since it doesn't have to worry about register
pressure. If we did that, and if we extended it to handle multi-register
REGs, the pass might be a viable replacement for regcprop, which in
turn might reduce the cost of having a post-RA instance of the new pass.
On most targets, the pass is enabled by default at -O2 and above.
However, it has a tendency to undo x86's STV and RPAD passes,
by folding the more complex post-STV/RPAD form back into the
simpler pre-pass form.
Also, running a pass after register allocation means that we can
now match define_insn_and_splits that were previously only matched
before register allocation. This trips things like:
(define_insn_and_split "..."
[...pattern...]
"...cond..."
"#"
"&& 1"
[...pattern...]
{
...unconditional use of gen_reg_rtx ()...;
}
because matching and splitting after RA will call gen_reg_rtx when
pseudos are no longer allowed. rs6000 has several instances of this.
xtensa has a variation in which the split condition is:
"&& can_create_pseudo_p ()"
The failure then is that, if we match after RA, we'll never be
able to split the instruction.
The patch therefore disables the pass by default on i386, rs6000
and xtensa. Hopefully we can fix those ports later (if their
maintainers want). It seems better to add the pass first, though,
to make it easier to test any such fixes.
gcc.target/aarch64/bitfield-bitint-abi-align{16,8}.c would need
quite a few updates for the late-combine output. That might be
worth doing, but it seems too complex to do as part of this patch.
I tried compiling at least one target per CPU directory and comparing
the assembly output for parts of the GCC testsuite. This is just a way
of getting a flavour of how the pass performs; it obviously isn't a
meaningful benchmark. All targets seemed to improve on average:
Target Tests Good Bad %Good Delta Median
====== ===== ==== === ===== ===== ======
aarch64-linux-gnu 2215 1975 240 89.16% -4159 -1
aarch64_be-linux-gnu 1569 1483 86 94.52% -10117 -1
alpha-linux-gnu 1454 1370 84 94.22% -9502 -1
amdgcn-amdhsa 5122 4671 451 91.19% -35737 -1
arc-elf 2166 1932 234 89.20% -37742 -1
arm-linux-gnueabi 1953 1661 292 85.05% -12415 -1
arm-linux-gnueabihf 1834 1549 285 84.46% -11137 -1
avr-elf 4789 4330 459 90.42% -441276 -4
bfin-elf 2795 2394 401 85.65% -19252 -1
bpf-elf 3122 2928 194 93.79% -8785 -1
c6x-elf 2227 1929 298 86.62% -17339 -1
cris-elf 3464 3270 194 94.40% -23263 -2
csky-elf 2915 2591 324 88.89% -22146 -1
epiphany-elf 2399 2304 95 96.04% -28698 -2
fr30-elf 7712 7299 413 94.64% -99830 -2
frv-linux-gnu 3332 2877 455 86.34% -25108 -1
ft32-elf 2775 2667 108 96.11% -25029 -1
h8300-elf 3176 2862 314 90.11% -29305 -2
hppa64-hp-hpux11.23 4287 4247 40 99.07% -45963 -2
ia64-linux-gnu 2343 1946 397 83.06% -9907 -2
iq2000-elf 9684 9637 47 99.51% -126557 -2
lm32-elf 2681 2608 73 97.28% -59884 -3
loongarch64-linux-gnu 1303 1218 85 93.48% -13375 -2
m32r-elf 1626 1517 109 93.30% -9323 -2
m68k-linux-gnu 3022 2620 402 86.70% -21531 -1
mcore-elf 2315 2085 230 90.06% -24160 -1
microblaze-elf 2782 2585 197 92.92% -16530 -1
mipsel-linux-gnu 1958 1827 131 93.31% -15462 -1
mipsisa64-linux-gnu 1655 1488 167 89.91% -16592 -2
mmix 4914 4814 100 97.96% -63021 -1
mn10300-elf 3639 3320 319 91.23% -34752 -2
moxie-rtems 3497 3252 245 92.99% -87305 -3
msp430-elf 4353 3876 477 89.04% -23780 -1
nds32le-elf 3042 2780 262 91.39% -27320 -1
nios2-linux-gnu 1683 1355 328 80.51% -8065 -1
nvptx-none 2114 1781 333 84.25% -12589 -2
or1k-elf 3045 2699 346 88.64% -14328 -2
pdp11 4515 4146 369 91.83% -26047 -2
pru-elf 1585 1245 340 78.55% -5225 -1
riscv32-elf 2122 2000 122 94.25% -101162 -2
riscv64-elf 1841 1726 115 93.75% -49997 -2
rl78-elf 2823 2530 293 89.62% -40742 -4
rx-elf 2614 2480 134 94.87% -18863 -1
s390-linux-gnu 1591 1393 198 87.55% -16696 -1
s390x-linux-gnu 2015 1879 136 93.25% -21134 -1
sh-linux-gnu 1870 1507 363 80.59% -9491 -1
sparc-linux-gnu 1123 1075 48 95.73% -14503 -1
sparc-wrs-vxworks 1121 1073 48 95.72% -14578 -1
sparc64-linux-gnu 1096 1021 75 93.16% -15003 -1
v850-elf 1897 1728 169 91.09% -11078 -1
vax-netbsdelf 3035 2995 40 98.68% -27642 -1
visium-elf 1392 1106 286 79.45% -7984 -2
xstormy16-elf 2577 2071 506 80.36% -13061 -1
gcc/
PR rtl-optimization/106594
PR rtl-optimization/114515
PR rtl-optimization/114575
PR rtl-optimization/114996
PR rtl-optimization/115104
* Makefile.in (OBJS): Add late-combine.o.
* common.opt (flate-combine-instructions): New option.
* doc/invoke.texi: Document it.
* opts.cc (default_options_table): Enable it by default at -O2
and above.
* tree-pass.h (make_pass_late_combine): Declare.
* late-combine.cc: New file.
* passes.def: Add two instances of late_combine.
* doc/passes.texi: Document the new passes.
* config/i386/i386-options.cc (ix86_override_options_after_change):
Disable late-combine by default.
* config/rs6000/rs6000.cc (rs6000_option_override_internal): Likewise.
* config/xtensa/xtensa.cc (xtensa_option_override): Likewise.
gcc/testsuite/
PR rtl-optimization/106594
* gcc.dg/ira-shrinkwrap-prep-1.c: Restrict XFAIL to non-aarch64
targets.
* gcc.dg/ira-shrinkwrap-prep-2.c: Likewise.
* gcc.dg/stack-check-4.c: Add -fno-shrink-wrap.
* gcc.target/aarch64/bitfield-bitint-abi-align16.c: Add
-fno-late-combine-instructions.
* gcc.target/aarch64/bitfield-bitint-abi-align8.c: Likewise.
* gcc.target/aarch64/sve/cond_asrd_3.c: Remove XFAILs.
* gcc.target/aarch64/sve/cond_convert_3.c: Likewise.
* gcc.target/aarch64/sve/cond_fabd_5.c: Likewise.
* gcc.target/aarch64/sve/cond_convert_6.c: Expect the MOVPRFX /Zs
described in the comment.
* gcc.target/aarch64/sve/cond_unary_4.c: Likewise.
* gcc.target/aarch64/pr106594_1.c: New test.
|
|
Array bounds checking is currently tied to VRP. This causes issues with
using alternate VRP algorithms as well as with experimenting with moving
the location of the warnings later. This moves it to its own pass
and cleans up the vrp_pass object.
* gimple-array-bounds.cc (array_bounds_checker::array_bounds_checker):
Always use current range_query.
(pass_data_array_bounds): New.
(pass_array_bounds): New.
(make_pass_array_bounds): New.
* gimple-array-bounds.h (array_bounds_checker): Adjust prototype.
* passes.def (pass_array_bounds): New. Add after VRP1.
* timevar.def (TV_TREE_ARRAY_BOUNDS): New timevar.
* tree-pass.h (make_pass_array_bounds): Add prototype.
* tree-vrp.cc (execute_ranger_vrp): Remove warning param and do
not invoke array bounds warning pass.
(pass_vrp::pass_vrp): Adjust params.
(pass_vrp::clone): Adjust parameters.
(pass_vrp::warn_array_bounds_p): Remove.
(make_pass_vrp): Remove warning param.
(make_pass_early_vrp): Remove warning param.
(make_pass_fast_vrp): Remove warning param.
|
|
This patch adds the strongly-connected copy propagation (SCCOPY) pass.
It is a lightweight GIMPLE copy propagation pass that also removes some
redundant PHI statements. It handles degenerate PHIs, e.g.:
_5 = PHI <_1>;
_6 = PHI <_6, _6, _1, _1>;
_7 = PHI <16, _7>;
// Replaces occurrences of _5 and _6 by _1 and _7 by 16
It also handles more complicated situations, e.g.:
_8 = PHI <_9, _10>;
_9 = PHI <_8, _10>;
_10 = PHI <_8, _9, _1>;
// Replaces occurrences of _8, _9 and _10 by _1
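One source-level shape that gives rise to such PHI cycles is several variables that are all copies of one value rotated through a loop (a hedged illustration; the function name is invented):

```c
#include <assert.h>

/* a, b and c all stay equal to init no matter how the loop rotates
   them, so every PHI in the resulting cycle can be replaced by init,
   and the return folds to 3 * init after copy propagation. */
static int rotate_copies (int n, int init)
{
  int a = init, b = init, c = init;
  for (int i = 0; i < n; i++)
    {
      int t = a;
      a = b;
      b = c;
      c = t;            /* still: a == b == c == init */
    }
  return a + b + c;
}
```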
gcc/ChangeLog:
* Makefile.in: Added sccopy pass.
* passes.def: Added sccopy pass before LTO streaming and before
RTL expansion.
* tree-pass.h (make_pass_sccopy): Added sccopy pass.
* gimple-ssa-sccopy.cc: New file.
gcc/testsuite/ChangeLog:
* gcc.dg/sccopy-1.c: New test.
Signed-off-by: Filip Kastl <fkastl@suse.cz>
|
|
This patch adds the strub attribute for function and variable types,
command-line options, passes and adjustments to implement it,
documentation, and tests.
Stack scrubbing is implemented in a machine-independent way: functions
with strub enabled are modified so that they take an extra stack
watermark argument, that they update with their stack use, and the
caller can then zero it out once it regains control, whether by return
or exception. There are two ways to go about it: at-calls, which
modifies the visible interface (signature) of the function, and
internal, in which the body is moved to a clone, the clone undergoes
the interface change, and the function becomes a wrapper, preserving
its original interface, that calls the clone and then clears the stack
used by it.
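A rough executable model of the watermark protocol described above. This is a simulation with invented helpers, not the real `__strub_enter`/`__strub_update`/`__strub_leave` ABI:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model: a fake downward-growing stack.  The caller
   records a watermark, the callee lowers it as it uses stack, and
   the caller zeroes the used range once it regains control. */
static char fake_stack[64];
static int sp = sizeof fake_stack;      /* stack grows downward */

static void callee (int *watermark)
{
  sp -= 16;
  memset (fake_stack + sp, 0xAA, 16);   /* leave "secrets" behind */
  if (sp < *watermark)
    *watermark = sp;                    /* models __strub_update */
  sp += 16;                             /* pop the frame */
}

/* Returns nonzero iff no secret bytes survive the scrub. */
static int scrub_demo (void)
{
  int watermark = sp;                   /* models __strub_enter */
  callee (&watermark);
  memset (fake_stack + watermark, 0,
          sp - watermark);              /* models __strub_leave */
  for (int i = 0; i < (int) sizeof fake_stack; i++)
    if (fake_stack[i] != 0)
      return 0;
  return 1;
}
```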
Variables can also be annotated with the strub attribute, so that
functions that read from them get stack scrubbing enabled implicitly,
whether at-calls, for functions only usable within a translation unit,
or internal, for functions whose interfaces must not be modified.
There is a strict mode, in which functions that have their stack
scrubbed can only call other functions with stack-scrubbing
interfaces, or those explicitly marked as callable from strub
contexts, so that an entire call chain gets scrubbing, at once or
piecemeal depending on optimization levels. In the default mode,
relaxed, this requirement is not enforced by the compiler.
The implementation adds two IPA passes, one that assigns strub modes
early on, another that modifies interfaces and adds calls to the
builtins that jointly implement stack scrubbing. Another builtin,
that obtains the stack pointer, is added for use in the implementation
of the builtins, whether expanded inline or called in libgcc.
There are new command-line options to change operation modes and to
force the feature disabled; it is enabled by default, but it has no
effect and is implicitly disabled if the strub attribute is never
used. There are also options meant to use for testing the feature,
enabling different strubbing modes for all (viable) functions.
for gcc/ChangeLog
* Makefile.in (OBJS): Add ipa-strub.o.
(GTFILES): Add ipa-strub.cc.
* builtins.def (BUILT_IN_STACK_ADDRESS): New.
(BUILT_IN___STRUB_ENTER): New.
(BUILT_IN___STRUB_UPDATE): New.
(BUILT_IN___STRUB_LEAVE): New.
* builtins.cc: Include ipa-strub.h.
(STACK_STOPS, STACK_UNSIGNED): Define.
(expand_builtin_stack_address): New.
(expand_builtin_strub_enter): New.
(expand_builtin_strub_update): New.
(expand_builtin_strub_leave): New.
(expand_builtin): Call them.
* common.opt (fstrub=*): New options.
* doc/extend.texi (strub): New type attribute.
(__builtin_stack_address): New function.
(Stack Scrubbing): New section.
* doc/invoke.texi (-fstrub=*): New options.
(-fdump-ipa-*): New passes.
* gengtype-lex.l: Ignore multi-line pp-directives.
* ipa-inline.cc: Include ipa-strub.h.
(can_inline_edge_p): Test strub_inlinable_to_p.
* ipa-split.cc: Include ipa-strub.h.
(execute_split_functions): Test strub_splittable_p.
* ipa-strub.cc, ipa-strub.h: New.
* passes.def: Add strub_mode and strub passes.
* tree-cfg.cc (gimple_verify_flow_info): Note on debug stmts.
* tree-pass.h (make_pass_ipa_strub_mode): Declare.
(make_pass_ipa_strub): Declare.
(make_pass_ipa_function_and_variable_visibility): Fix
formatting.
* tree-ssa-ccp.cc (optimize_stack_restore): Keep restores
before strub leave.
* attribs.cc: Include ipa-strub.h.
(decl_attributes): Support applying attributes to function
type, rather than pointer type, at handler's request.
(comp_type_attributes): Combine strub_comptypes and target
comp_type results.
* doc/tm.texi.in (TARGET_STRUB_USE_DYNAMIC_ARRAY): New.
(TARGET_STRUB_MAY_USE_MEMSET): New.
* doc/tm.texi: Rebuilt.
* cgraph.h (symtab_node::reset): Add preserve_comdat_group
param, with a default.
* cgraphunit.cc (symtab_node::reset): Use it.
for gcc/c-family/ChangeLog
* c-attribs.cc: Include ipa-strub.h.
(handle_strub_attribute): New.
(c_common_attribute_table): Add strub.
for gcc/ada/ChangeLog
* gcc-interface/trans.cc: Include ipa-strub.h.
(gigi): Make internal decls for targets of compiler-generated
calls strub-callable too.
(build_raise_check): Likewise.
* gcc-interface/utils.cc: Include ipa-strub.h.
(handle_strub_attribute): New.
(gnat_internal_attribute_table): Add strub.
for gcc/testsuite/ChangeLog
* c-c++-common/strub-O0.c: New.
* c-c++-common/strub-O1.c: New.
* c-c++-common/strub-O2.c: New.
* c-c++-common/strub-O2fni.c: New.
* c-c++-common/strub-O3.c: New.
* c-c++-common/strub-O3fni.c: New.
* c-c++-common/strub-Og.c: New.
* c-c++-common/strub-Os.c: New.
* c-c++-common/strub-all1.c: New.
* c-c++-common/strub-all2.c: New.
* c-c++-common/strub-apply1.c: New.
* c-c++-common/strub-apply2.c: New.
* c-c++-common/strub-apply3.c: New.
* c-c++-common/strub-apply4.c: New.
* c-c++-common/strub-at-calls1.c: New.
* c-c++-common/strub-at-calls2.c: New.
* c-c++-common/strub-defer-O1.c: New.
* c-c++-common/strub-defer-O2.c: New.
* c-c++-common/strub-defer-O3.c: New.
* c-c++-common/strub-defer-Os.c: New.
* c-c++-common/strub-internal1.c: New.
* c-c++-common/strub-internal2.c: New.
* c-c++-common/strub-parms1.c: New.
* c-c++-common/strub-parms2.c: New.
* c-c++-common/strub-parms3.c: New.
* c-c++-common/strub-relaxed1.c: New.
* c-c++-common/strub-relaxed2.c: New.
* c-c++-common/strub-short-O0-exc.c: New.
* c-c++-common/strub-short-O0.c: New.
* c-c++-common/strub-short-O1.c: New.
* c-c++-common/strub-short-O2.c: New.
* c-c++-common/strub-short-O3.c: New.
* c-c++-common/strub-short-Os.c: New.
* c-c++-common/strub-strict1.c: New.
* c-c++-common/strub-strict2.c: New.
* c-c++-common/strub-tail-O1.c: New.
* c-c++-common/strub-tail-O2.c: New.
* c-c++-common/torture/strub-callable1.c: New.
* c-c++-common/torture/strub-callable2.c: New.
* c-c++-common/torture/strub-const1.c: New.
* c-c++-common/torture/strub-const2.c: New.
* c-c++-common/torture/strub-const3.c: New.
* c-c++-common/torture/strub-const4.c: New.
* c-c++-common/torture/strub-data1.c: New.
* c-c++-common/torture/strub-data2.c: New.
* c-c++-common/torture/strub-data3.c: New.
* c-c++-common/torture/strub-data4.c: New.
* c-c++-common/torture/strub-data5.c: New.
* c-c++-common/torture/strub-indcall1.c: New.
* c-c++-common/torture/strub-indcall2.c: New.
* c-c++-common/torture/strub-indcall3.c: New.
* c-c++-common/torture/strub-inlinable1.c: New.
* c-c++-common/torture/strub-inlinable2.c: New.
* c-c++-common/torture/strub-ptrfn1.c: New.
* c-c++-common/torture/strub-ptrfn2.c: New.
* c-c++-common/torture/strub-ptrfn3.c: New.
* c-c++-common/torture/strub-ptrfn4.c: New.
* c-c++-common/torture/strub-pure1.c: New.
* c-c++-common/torture/strub-pure2.c: New.
* c-c++-common/torture/strub-pure3.c: New.
* c-c++-common/torture/strub-pure4.c: New.
* c-c++-common/torture/strub-run1.c: New.
* c-c++-common/torture/strub-run2.c: New.
* c-c++-common/torture/strub-run3.c: New.
* c-c++-common/torture/strub-run4.c: New.
* c-c++-common/torture/strub-run4c.c: New.
* c-c++-common/torture/strub-run4d.c: New.
* c-c++-common/torture/strub-run4i.c: New.
* g++.dg/strub-run1.C: New.
* g++.dg/torture/strub-init1.C: New.
* g++.dg/torture/strub-init2.C: New.
* g++.dg/torture/strub-init3.C: New.
* gnat.dg/strub_attr.adb, gnat.dg/strub_attr.ads: New.
* gnat.dg/strub_ind.adb, gnat.dg/strub_ind.ads: New.
for libgcc/ChangeLog
* Makefile.in (LIB2ADD): Add strub.c.
* libgcc2.h (__strub_enter, __strub_update, __strub_leave):
Declare.
* strub.c: New.
* libgcc-std.ver.in (__strub_enter): Add to GCC_14.0.0.
(__strub_update, __strub_leave): Likewise.
|
|
Arm's SME adds a new processor mode called streaming mode.
This mode enables some new (matrix-oriented) instructions and
disables several existing groups of instructions, such as most
Advanced SIMD vector instructions and a much smaller set of SVE
instructions. It can also change the current vector length.
There are instructions to switch in and out of streaming mode.
However, their effect on the ISA and vector length can't be represented
directly in RTL, so they need to be emitted late in the pass pipeline,
close to md_reorg.
It's sometimes the responsibility of the prologue and epilogue to
switch modes, which means we need to emit the prologue and epilogue
sequences late as well. (This loses shrink-wrapping and scheduling
opportunities, but that's a price worth paying.)
This patch therefore adds a target hook for forcing prologue
and epilogue insertion to happen later in the pipeline.
gcc/
* target.def (use_late_prologue_epilogue): New hook.
* doc/tm.texi.in: Add TARGET_USE_LATE_PROLOGUE_EPILOGUE.
* doc/tm.texi: Regenerate.
* passes.def (pass_late_thread_prologue_and_epilogue): New pass.
* tree-pass.h (make_pass_late_thread_prologue_and_epilogue): Declare.
* function.cc (pass_thread_prologue_and_epilogue::gate): New function.
(pass_data_late_thread_prologue_and_epilogue): New pass variable.
(pass_late_thread_prologue_and_epilogue): New pass class.
(make_pass_late_thread_prologue_and_epilogue): New function.
|
|
This patch introduces an optional hardening pass to catch unexpected
execution flows. Functions are transformed so that basic blocks set a
bit in an automatic array, and (non-exceptional) function exit edges
check that the bits in the array represent an expected execution path
in the CFG.
Functions with multiple exit edges, or with too many blocks, call an
out-of-line checker builtin implemented in libgcc. For simpler
functions, the verification is performed in-line.
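The in-line verification can be modeled roughly like this. It is a sketch: a whitelist of path masks stands in for the real predecessor-bit check, and all names are invented:

```c
#include <assert.h>

/* Each basic block sets its bit in an automatic bitmap; the single
   exit verifies the mask corresponds to a path that actually exists
   in the CFG. */
static int valid_path (unsigned mask)
{
  return mask == 0xBu      /* entry -> then -> exit */
      || mask == 0xDu;     /* entry -> else -> exit */
}

static int checked_abs (int x)
{
  unsigned visited = 1u << 0;                   /* entry block */
  int r;
  if (x >= 0) { visited |= 1u << 1; r = x; }    /* then block  */
  else        { visited |= 1u << 2; r = -x; }   /* else block  */
  visited |= 1u << 3;                           /* exit block  */
  assert (valid_path (visited));   /* stand-in for __hardcfr_check */
  return r;
}
```

An attacker-induced jump that skipped or repeated a block would leave a mask outside the whitelist and trip the check.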
-fharden-control-flow-redundancy enables the pass for eligible
functions, --param hardcfr-max-blocks sets a block count limit for
functions to be eligible, and --param hardcfr-max-inline-blocks
tunes the "too many blocks" limit for in-line verification.
-fhardcfr-skip-leaf makes leaf functions non-eligible.
Additional -fhardcfr-check-* options are added to enable checking at
exception escape points, before potential sibcalls, hereby dubbed
returning calls, and before noreturn calls and exception raises. A
notable case is the distinction between noreturn calls expected to
throw and those expected to terminate or loop forever: the default
setting for -fhardcfr-check-noreturn-calls, no-xthrow, performs
checking before the latter, but the former only gets checking in the
exception handler. GCC can only tell them apart by explicitly marking
noreturn functions expected to raise with the newly-introduced
expected_throw attribute, and the corresponding ECF_XTHROW flag.
for gcc/ChangeLog
* tree-core.h (ECF_XTHROW): New macro.
* tree.cc (set_call_expr): Add expected_throw attribute when
ECF_XTHROW is set.
(build_common_builtin_node): Add ECF_XTHROW to
__cxa_end_cleanup and _Unwind_Resume or _Unwind_SjLj_Resume.
* calls.cc (flags_from_decl_or_type): Check for expected_throw
attribute to set ECF_XTHROW.
* gimple.cc (gimple_build_call_from_tree): Propagate
ECF_XTHROW from decl flags to gimple call...
(gimple_call_flags): ... and back.
* gimple.h (GF_CALL_XTHROW): New gf_mask flag.
(gimple_call_set_expected_throw): New.
(gimple_call_expected_throw_p): New.
* Makefile.in (OBJS): Add gimple-harden-control-flow.o.
* builtins.def (BUILT_IN___HARDCFR_CHECK): New.
* common.opt (fharden-control-flow-redundancy): New.
(-fhardcfr-check-returning-calls): New.
(-fhardcfr-check-exceptions): New.
(-fhardcfr-check-noreturn-calls=*): New.
(Enum hardcfr_check_noreturn_calls): New.
(fhardcfr-skip-leaf): New.
* doc/invoke.texi: Document them.
(hardcfr-max-blocks, hardcfr-max-inline-blocks): New params.
* flag-types.h (enum hardcfr_noret): New.
* gimple-harden-control-flow.cc: New.
* params.opt (-param=hardcfr-max-blocks=): New.
(-param=hardcfr-max-inline-blocks=): New.
* passes.def (pass_harden_control_flow_redundancy): Add.
* tree-pass.h (make_pass_harden_control_flow_redundancy):
Declare.
* doc/extend.texi: Document expected_throw attribute.
for gcc/ada/ChangeLog
* gcc-interface/trans.cc (gigi): Mark __gnat_reraise_zcx with
ECF_XTHROW.
(build_raise_check): Likewise for all rcheck subprograms.
for gcc/c-family/ChangeLog
* c-attribs.cc (handle_expected_throw_attribute): New.
(c_common_attribute_table): Add expected_throw.
for gcc/cp/ChangeLog
* decl.cc (push_throw_library_fn): Mark with ECF_XTHROW.
* except.cc (build_throw): Likewise __cxa_throw,
_ITM_cxa_throw, __cxa_rethrow.
for gcc/testsuite/ChangeLog
* c-c++-common/torture/harden-cfr.c: New.
* c-c++-common/harden-cfr-noret-never-O0.c: New.
* c-c++-common/torture/harden-cfr-noret-never.c: New.
* c-c++-common/torture/harden-cfr-noret-noexcept.c: New.
* c-c++-common/torture/harden-cfr-noret-nothrow.c: New.
* c-c++-common/torture/harden-cfr-noret.c: New.
* c-c++-common/torture/harden-cfr-notail.c: New.
* c-c++-common/torture/harden-cfr-returning.c: New.
* c-c++-common/torture/harden-cfr-tail.c: New.
* c-c++-common/torture/harden-cfr-abrt-always.c: New.
* c-c++-common/torture/harden-cfr-abrt-never.c: New.
* c-c++-common/torture/harden-cfr-abrt-no-xthrow.c: New.
* c-c++-common/torture/harden-cfr-abrt-nothrow.c: New.
* c-c++-common/torture/harden-cfr-abrt.c: New.
* c-c++-common/torture/harden-cfr-always.c: New.
* c-c++-common/torture/harden-cfr-never.c: New.
* c-c++-common/torture/harden-cfr-no-xthrow.c: New.
* c-c++-common/torture/harden-cfr-nothrow.c: New.
* c-c++-common/torture/harden-cfr-bret-always.c: New.
* c-c++-common/torture/harden-cfr-bret-never.c: New.
* c-c++-common/torture/harden-cfr-bret-noopt.c: New.
* c-c++-common/torture/harden-cfr-bret-noret.c: New.
* c-c++-common/torture/harden-cfr-bret-no-xthrow.c: New.
* c-c++-common/torture/harden-cfr-bret-nothrow.c: New.
* c-c++-common/torture/harden-cfr-bret-retcl.c: New.
* c-c++-common/torture/harden-cfr-bret.c: New.
* g++.dg/harden-cfr-throw-always-O0.C: New.
* g++.dg/harden-cfr-throw-returning-O0.C: New.
* g++.dg/torture/harden-cfr-noret-always-no-nothrow.C: New.
* g++.dg/torture/harden-cfr-noret-never-no-nothrow.C: New.
* g++.dg/torture/harden-cfr-noret-no-nothrow.C: New.
* g++.dg/torture/harden-cfr-throw-always.C: New.
* g++.dg/torture/harden-cfr-throw-never.C: New.
* g++.dg/torture/harden-cfr-throw-no-xthrow.C: New.
* g++.dg/torture/harden-cfr-throw-no-xthrow-expected.C: New.
* g++.dg/torture/harden-cfr-throw-nothrow.C: New.
* g++.dg/torture/harden-cfr-throw-nocleanup.C: New.
* g++.dg/torture/harden-cfr-throw-returning.C: New.
* g++.dg/torture/harden-cfr-throw.C: New.
* gcc.dg/torture/harden-cfr-noret-no-nothrow.c: New.
* gcc.dg/torture/harden-cfr-tail-ub.c: New.
* gnat.dg/hardcfr.adb: New.
for libgcc/ChangeLog
* Makefile.in (LIB2ADD): Add hardcfr.c.
* hardcfr.c: New.
|
|
This is a new RTL pass that tries to optimize memory offset calculations
by moving them from add immediate instructions to the memory loads/stores.
For example it can transform this:
addi t4,sp,16
add t2,a6,t4
shl t3,t2,1
ld a2,0(t3)
addi a2,1
sd a2,8(t2)
into the following (one instruction less):
add t2,a6,sp
shl t3,t2,1
ld a2,32(t3)
addi a2,1
sd a2,24(t2)
Although there are places where this is done already, this pass is more
powerful and can handle the more difficult cases that are currently not
optimized. Also, it runs late enough that it can optimize away unnecessary
stack pointer calculations.
gcc/ChangeLog:
* Makefile.in: Add fold-mem-offsets.o.
* passes.def: Schedule a new pass.
* tree-pass.h (make_pass_fold_mem_offsets): Declare.
* common.opt: New options.
* doc/invoke.texi: Document new option.
* fold-mem-offsets.cc: New file.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/fold-mem-offsets-1.c: New test.
* gcc.target/riscv/fold-mem-offsets-2.c: New test.
* gcc.target/riscv/fold-mem-offsets-3.c: New test.
* gcc.target/i386/pr52146.c: Adjust expected output.
Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
|
|
Rather than using a pass count to decide which parameters are passed to
VRP, make it explicit.
* passes.def (pass_vrp): Pass "final pass" flag as parameter.
* tree-vrp.cc (vrp_pass_num): Remove.
(pass_vrp::my_pass): Remove.
(pass_vrp::pass_vrp): Add final_p as a parameter.
(pass_vrp::final_p): New.
(pass_vrp::set_pass_param): Set final_p param.
(pass_vrp::execute): Call execute_ranger_vrp with no conditions.
(make_pass_vrp): Pass additional parameter.
(make_pass_early_vrp): Ditto.
|
|
The following patch adds a new bitintlower lowering pass which lowers most
operations on medium _BitInt into operations on corresponding integer types,
large _BitInt into straight line code operating on 2 or more limbs and
finally huge _BitInt into a loop plus optional straight line code.
As the only supported architecture is little-endian, the lowering only
supports little-endian for now, because it would be impossible to test it
all for big-endian. The rest is written with any-endian support in mind, but
of course only little-endian has been actually tested.
I hope it is ok to add big-endian support to the lowering pass incrementally
later when first big-endian target shows with the backend support.
There are two possibilities for adding such support. One would be minimal:
just tweak the limb_access function and perhaps one or two other spots and
transform the indexes there from little endian (index 0 is least significant)
to big endian for just the memory access. The advantage is, I think,
maintenance cost; the disadvantage is that the loops will still iterate from 0
to some number of limbs and we'd rely on IVOPTs or something similar changing
that later if needed. Or we could make those indexes endian-related
everywhere, though I'm afraid that would be several hundred changes.
For switches indexed by large/huge _BitInt the patch invokes what the switch
lowering pass does (but only on those specific switches, not all of them);
the switch lowering breaks the switches into clusters and none of the clusters
can have a range which doesn't fit into 64-bit UWHI, everything else will be
turned into a tree of comparisons. For clusters normally emitted as smaller
switches, because we already have a guarantee that the low .. high range is
at most 64 bits, the patch forces subtraction of the low and turns it into
a 64-bit switch. This is done before the actual pass starts.
Similarly, we cancel lowering of certain constructs like ABS_EXPR, ABSU_EXPR,
MIN_EXPR, MAX_EXPR and COND_EXPR and turn those back to simpler comparisons
etc., so that fewer operations need to be lowered later.
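The loop-over-limbs lowering for huge _BitInt can be sketched in plain C; the
uint64_t limb type, the little-endian limb order, and the helper name below
are illustrative assumptions, not the code the pass actually emits:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of lowering a huge _BitInt addition into a loop over limbs:
   limb 0 is least significant (little-endian limb order) and the carry
   is threaded through the loop.  */
void
bitint_add (uint64_t *res, const uint64_t *a, const uint64_t *b, int nlimbs)
{
  uint64_t carry = 0;
  for (int i = 0; i < nlimbs; i++)
    {
      uint64_t s = a[i] + carry;
      uint64_t c1 = s < carry;      /* carry out of a[i] + carry */
      res[i] = s + b[i];
      uint64_t c2 = res[i] < s;     /* carry out of s + b[i] */
      carry = c1 | c2;
    }
}
```

A big-endian target would only need to remap the limb indexes used for the
memory accesses, which is the "minimal" option discussed above.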
2023-09-06 Jakub Jelinek <jakub@redhat.com>
PR c/102989
* Makefile.in (OBJS): Add gimple-lower-bitint.o.
* passes.def: Add pass_lower_bitint after pass_lower_complex and
pass_lower_bitint_O0 after pass_lower_complex_O0.
* tree-pass.h (PROP_gimple_lbitint): Define.
(make_pass_lower_bitint_O0, make_pass_lower_bitint): Declare.
* gimple-lower-bitint.h: New file.
* tree-ssa-live.h (struct _var_map): Add bitint member.
(init_var_map): Adjust declaration.
(region_contains_p): Handle map->bitint like map->outofssa_p.
* tree-ssa-live.cc (init_var_map): Add BITINT argument, initialize
map->bitint and set map->outofssa_p to false if it is non-NULL.
* tree-ssa-coalesce.cc: Include gimple-lower-bitint.h.
(build_ssa_conflict_graph): Call build_bitint_stmt_ssa_conflicts if
map->bitint.
(create_coalesce_list_for_region): For map->bitint ignore SSA_NAMEs
not in that bitmap, and allow res without default def.
(compute_optimized_partition_bases): In map->bitint mode try hard to
coalesce any SSA_NAMEs with the same size.
(coalesce_bitint): New function.
(coalesce_ssa_name): In map->bitint mode, or map->bitmap into
used_in_copies and call coalesce_bitint.
* gimple-lower-bitint.cc: New file.
|
|
The following swaps the loop splitting pass and the final value
replacement pass to avoid keeping the IV of the earlier loop
live when not necessary. The existing gcc.target/i386/pr87007-5.c
testcase shows that we otherwise fail to elide an empty loop
later. I don't see any good reason why loop splitting would need
final value replacement, all exit values honor the constraints
we place on loop header PHIs automatically.
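Final value replacement can be pictured with a hypothetical example (not from
the patch): the IV's exit value is computable in closed form, so uses after
the loop can be rewritten and the loop itself becomes empty:

```c
#include <assert.h>

/* After the loop i == n (for n >= 0), so final value replacement can
   rewrite the return to "n", leaving an empty loop that later passes
   can delete entirely.  */
int
count_up (int n)
{
  int i = 0;
  while (i < n)
    i++;
  return i;   /* replaceable by the closed-form exit value n */
}
```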
* passes.def: Exchange loop splitting and final value
replacement passes.
* gcc.target/i386/pr87007-5.c: Make sure we split the loop
and eliminate both in the end.
|
|
Currently we rebuild profile_counts from profile_probability after inlining,
because there is a chance that producing large loop nests may get
unrealistically large profile_count values. This is much less of a concern
since we switched to the new profile_count representation a while back.
This propagation can also compensate for profile inconsistencies caused by
optimization passes. Since the inliner is followed by basic cleanup passes
that do not use the profile, we get a more realistic profile by delaying the
recomputation until the basic optimizations exposed by inlining are finished.
This does not fit into the TODO machinery, so I turned rebuilding into a
standalone pass and scheduled it before the first consumer of the profile in
the optimization queue.
I also added logic that avoids repropagating when the CFG is good and not too
close to overflow. Propagation visits every basic block loop_depth times, so
it is not linear and avoiding it may help a bit.
On tramp3d we get 14 functions repropagated and 916 that are OK. The
repropagated functions are the RB tree ones where we produce crazy loop nests
by recursive inlining. This is something to fix independently.
gcc/ChangeLog:
* passes.cc (execute_function_todo): Remove
TODO_rebuild_frequencies
* passes.def: Add rebuild_frequencies pass.
* predict.cc (estimate_bb_frequencies): Drop
force parameter.
(tree_estimate_probability): Update call of
estimate_bb_frequencies.
(rebuild_frequencies): Turn into a pass; verify CFG profile consistency
first and do not rebuild if not necessary.
(class pass_rebuild_frequencies): New.
(make_pass_rebuild_frequencies): New.
* profile-count.h: Add profile_count::very_large_p.
* tree-inline.cc (optimize_inline_calls): Do not return
TODO_rebuild_frequencies
* tree-pass.h (TODO_rebuild_frequencies): Remove.
(make_pass_rebuild_frequencies): Declare.
|
|
We currently produce very bad code on loops using std::vector as a stack, since
we fail to inline push_back which in turn prevents SRA and we fail to optimize
out some store-to-load pairs.
I looked into why this function is not inlined and it is inlined by clang. We
currently estimate it to 66 instructions and inline limits are 15 at -O2 and 30
at -O3. Clang has similar estimate, but still decides to inline at -O2.
I looked into reason why the body is so large and one problem I spotted is the
way std::max is implemented by taking and returning references to the values.
const T& max( const T& a, const T& b );
This makes it necessary to store the values to memory and load them later
and max is used by code computing new size of vector on resize.
We optimize this to MAX_EXPR, but only during late optimizations. I think this
is a common enough coding pattern and we ought to make this transparent to
early opts and IPA. The following is the easiest fix: it simply adds a
phiprop pass that turns a PHI of address values into a PHI of values, so
later FRE can propagate values across memory, phiopt can discover the
MAX_EXPR pattern, and DSE can remove the memory stores.
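The transformation phiprop performs can be sketched in C (a hypothetical
example; the real input is the reference-returning std::max):

```c
#include <assert.h>

/* Before phiprop the IL contains a PHI of addresses followed by a load:
   p = a > b ? &a : &b; return *p;.  phiprop rewrites it into a PHI of
   the loaded values, return a > b ? a : b, so a and b no longer need
   to live in memory and phiopt can form a MAX_EXPR.  */
int
max_by_ref (int a, int b)
{
  const int *p = a > b ? &a : &b;   /* PHI of address values */
  return *p;                        /* load that phiprop removes */
}
```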
gcc/ChangeLog:
PR tree-optimization/109811
PR tree-optimization/109849
* passes.def: Add phiprop to early optimization passes.
* tree-ssa-phiprop.cc: Allow cloning.
gcc/testsuite/ChangeLog:
PR tree-optimization/109811
PR tree-optimization/109849
* gcc.dg/tree-ssa/phiprop-1.c: New test.
* gcc.dg/tree-ssa/pr21463.c: Adjust template.
|
|
|
|
The following makes the last DCE pass CD-DCE and in turn the
last CD-DCE pass a DCE one. That ensures we remove loops
that become empty between the two. I've also moved the tail-call
pass after DCE since DCE can only improve things here.
The two testcases were the only ones scanning cddce3 so I've
changed them to scan the dce7 pass that's now in this place.
The testcases scanning dce7 also work when that's in the earlier
position.
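A hypothetical example of a loop that only becomes empty between the two
passes:

```c
#include <assert.h>

/* Plain DCE deletes the additions once acc is proven unused, emptying
   the loop body; removing the now-empty loop itself then needs the
   control-dependence computation that only CD-DCE performs.  */
int
sum_then_discard (int n)
{
  int acc = 0;
  for (int i = 0; i < n; i++)
    acc += i;    /* dead: acc is never used afterwards */
  (void) acc;
  return n;
}
```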
PR tree-optimization/84646
* tree-ssa-dce.cc (pass_dce::set_pass_param): Add param
whether to run update-address-taken.
(pass_dce::execute): Honor it.
* passes.def: Exchange last DCE and CD-DCE invocations.
Swap pass_tail_calls and the last DCE.
* g++.dg/tree-ssa/pr106922.C: Continue to scan earlier DCE dump.
* gcc.dg/tree-ssa/20030808-1.c: Likewise.
|
|
My earlier patches gimplify the simplest non-side-effects assumptions
into if (cond) ; else __builtin_unreachable (); and throw the rest
on the floor.
The following patch attempts to do something with the rest too.
For -O0, it throws the more complex assumptions on the floor,
we don't expect optimizations and the assumptions are there to allow
optimizations. Otherwise it arranges for the assumptions to be
visible in the IL as
.ASSUME (_Z2f4i._assume.0, i_1(D));
call where there is an artificial function like:
bool _Z2f4i._assume.0 (int i)
{
bool _2;
<bb 2> [local count: 1073741824]:
_2 = i_1(D) == 43;
return _2;
}
with the semantics that there is UB unless the assumption function
would return true.
Aldy, could ranger handle this? If it sees .ASSUME call,
walk the body of such function from the edge(s) to exit with the
assumption that the function returns true, so above set _2 [true, true]
and from there derive that i_1(D) [43, 43] and then map the argument
in the assumption function to argument passed to IFN_ASSUME (note,
args there are shifted by 1)?
During gimplification it actually gimplifies it into
[[assume (D.2591)]]
{
{
i = i + 1;
D.2591 = i == 44;
}
}
which is a new GIMPLE_ASSUME statement wrapping a GIMPLE_BIND and
specifying a boolean_type_node variable which contains the result.
The GIMPLE_ASSUME then survives just a couple of passes and is lowered
during gimple lowering into an outlined separate function and
IFN_ASSUME call. Variables declared inside of the
condition (both static and automatic) just change context, automatic
variables from the caller are turned into parameters (note, as the code
is never executed, I handle this way even non-POD types, we don't need to
bother pretending there would be user copy constructors etc. involved).
The assume_function artificial functions are then optimized until the
new assumptions pass which doesn't do much right now but I'd like to see
there the backwards ranger walk and filling up of SSA_NAME_RANGE_INFO
for the parameters.
There are a few further changes I'd like to do, like ignoring the
.ASSUME calls in inlining size estimations (but haven't figured out where
it is done), or for LTO arrange for the assume functions to be emitted
in all partitions that reference those (usually there will be just one,
unless code with the assumption got inlined, versioned etc.).
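The simple case mentioned at the top, an assumption without side effects
gimplified into a guarded __builtin_unreachable, can be written out by hand
(a sketch of the equivalent source, not the generated IL):

```c
#include <assert.h>

/* Hand-written equivalent of the simple gimplification: the branch is
   never taken on valid inputs (taking it is undefined behavior), and
   the optimizers may use the implied range i == 43 to fold the
   arithmetic below to a constant.  Requires GCC/Clang for the
   builtin.  */
int
f_with_assume (int i)
{
  if (!(i == 43))
    __builtin_unreachable ();   /* UB if the assumption is violated */
  return i + 1;                 /* may fold to the constant 44 */
}
```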
2022-10-18 Jakub Jelinek <jakub@redhat.com>
PR c++/106654
gcc/
* gimple.def (GIMPLE_ASSUME): New statement kind.
* gimple.h (struct gimple_statement_assume): New type.
(is_a_helper <gimple_statement_assume *>::test,
is_a_helper <const gimple_statement_assume *>::test): New.
(gimple_build_assume): Declare.
(gimple_has_substatements): Return true for GIMPLE_ASSUME.
(gimple_assume_guard, gimple_assume_set_guard,
gimple_assume_guard_ptr, gimple_assume_body_ptr, gimple_assume_body):
New inline functions.
* gsstruct.def (GSS_ASSUME): New.
* gimple.cc (gimple_build_assume): New function.
(gimple_copy): Handle GIMPLE_ASSUME.
* gimple-pretty-print.cc (dump_gimple_assume): New function.
(pp_gimple_stmt_1): Handle GIMPLE_ASSUME.
* gimple-walk.cc (walk_gimple_op): Handle GIMPLE_ASSUME.
* omp-low.cc (WALK_SUBSTMTS): Likewise.
(lower_omp_1): Likewise.
* omp-oacc-kernels-decompose.cc (adjust_region_code_walk_stmt_fn):
Likewise.
* tree-cfg.cc (verify_gimple_stmt, verify_gimple_in_seq_2): Likewise.
* function.h (struct function): Add assume_function bitfield.
* gimplify.cc (gimplify_call_expr): If the assumption isn't
simple enough, expand it into GIMPLE_ASSUME wrapped block or
for -O0 drop it.
* gimple-low.cc: Include attribs.h.
(create_assumption_fn): New function.
(struct lower_assumption_data): New type.
(find_assumption_locals_r, assumption_copy_decl,
adjust_assumption_stmt_r, adjust_assumption_stmt_op,
lower_assumption): New functions.
(lower_stmt): Handle GIMPLE_ASSUME.
* tree-ssa-ccp.cc (pass_fold_builtins::execute): Remove
IFN_ASSUME calls.
* lto-streamer-out.cc (output_struct_function_base): Pack
assume_function bit.
* lto-streamer-in.cc (input_struct_function_base): And unpack it.
* cgraphunit.cc (cgraph_node::expand): Don't verify assume_function
has TREE_ASM_WRITTEN set and don't release its body.
(symbol_table::compile): Allow assume functions not to have released
body.
* internal-fn.cc (expand_ASSUME): Remove gcc_unreachable.
* passes.cc (execute_one_pass): For TODO_discard_function don't
release body of assume functions.
* cgraph.cc (cgraph_node::verify_node): Don't verify cgraph nodes
of PROP_assumptions_done functions.
* tree-pass.h (PROP_assumptions_done): Define.
(TODO_discard_function): Adjust comment.
(make_pass_assumptions): Declare.
* passes.def (pass_assumptions): Add.
* timevar.def (TV_TREE_ASSUMPTIONS): New.
* tree-inline.cc (remap_gimple_stmt): Handle GIMPLE_ASSUME.
* tree-vrp.cc (pass_data_assumptions): New variable.
(pass_assumptions): New class.
(make_pass_assumptions): New function.
gcc/cp/
* cp-tree.h (build_assume_call): Declare.
* parser.cc (cp_parser_omp_assumption_clauses): Use build_assume_call.
* cp-gimplify.cc (build_assume_call): New function.
(process_stmt_assume_attribute): Use build_assume_call.
* pt.cc (tsubst_copy_and_build): Likewise.
gcc/testsuite/
* g++.dg/cpp23/attr-assume5.C: New test.
* g++.dg/cpp23/attr-assume6.C: New test.
* g++.dg/cpp23/attr-assume7.C: New test.
|
|
The following resolves the issue that DSE cannot handle references
with variable offsets well when identifying possible uses of a store.
Instead of just relying on ref_maybe_used_by_stmt_p we use data-ref
analysis, making sure to perform that at most once per stmt. The
new mode is only exercised by the DSE pass before loop optimization
as specified by a new pass parameter and when expensive optimizations
are enabled, so it's disabled below -O2.
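A hypothetical example of the kind of store the new mode can remove: with
variable offsets, ref_maybe_used_by_stmt_p cannot tell the intervening load
apart from the store, while data-ref analysis can compare the access
functions:

```c
#include <assert.h>

/* The first store to a[i] is dead: the only intervening use reads
   a[i + 1], which data-ref analysis proves never overlaps a[i], and
   the second store overwrites it.  A plain alias query on the two
   variable-offset references cannot disambiguate them.  */
int
dead_store (int *a, int i)
{
  a[i] = 1;          /* dead store the new mode can eliminate */
  int t = a[i + 1];  /* reads a provably different element */
  a[i] = 2;          /* kills the first store */
  return t;
}
```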
PR tree-optimization/99407
* tree-ssa-dse.cc (dse_stmt_to_dr_map): New global.
(dse_classify_store): Use data-ref analysis to disambiguate more uses.
(pass_dse::use_dr_analysis_p): New pass parameter.
(pass_dse::set_pass_param): Implement.
(pass_dse::execute): Allocate and deallocate dse_stmt_to_dr_map.
* passes.def: Allow DR analysis for the DSE pass before loop.
* gcc.dg/vect/tsvc/vect-tsvc-s243.c: Remove XFAIL.
|
|
__builtin_cexpi can't be vectorized since there's a gap between it and the
vectorized sincos version (in libmvec, that takes a double and two
double pointers and returns nothing), and we lose some
vectorization opportunities if sin & cos are optimized to cexpi before the
vectorizer.
I tried to add vect_recog_cexpi_pattern to split cexpi back into sin and
cos, but it failed vectorizable_simd_clone_call since NULL is returned
by cgraph_node::get (fndecl). So alternatively, the patch moves
pass_cse_sincos after the vectorizer, just before pass_cse_reciprocals.
Also, the original pass_cse_sincos additionally expands pow & cabs; this
patch splits that part into a separate pass named pass_expand_powcabs which
remains at the old pass position.
gcc/ChangeLog:
* passes.def: (Split pass_cse_sincos to pass_expand_powcabs
and pass_cse_sincos, and move pass_cse_sincos after vectorizer).
* timevar.def (TV_TREE_POWCABS): New timevar.
* tree-pass.h (make_pass_expand_powcabs): Split from pass_cse_sincos.
* tree-ssa-math-opts.cc (gimple_expand_builtin_cabs): Ditto.
(class pass_expand_powcabs): Ditto.
(pass_expand_powcabs::execute): Ditto.
(make_pass_expand_powcabs): Ditto.
(pass_cse_sincos::execute): Remove pow/cabs expand part.
(make_pass_cse_sincos): Ditto.
gcc/testsuite/ChangeLog:
* gcc.dg/pow-sqrt-synth-1.c: Adjust testcase.
|
|
When the walloca pass gained support for ranger the early pass
was not moved to a place where SSA form is available but remained
in the lowering pipeline. For the testcase in this bug this is
a problem because for erroneous input we still run the lowering
pipeline but here have broken SSA form which ranger does not like.
The solution is to rectify the mistake of using ranger without
SSA form and move the pass, which solves both issues.
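A hypothetical example of why SSA form matters here: the ranger-based checks
need ranges on SSA names to see that an alloca size is bounded (the builtin
is GCC/Clang specific):

```c
#include <string.h>

/* With SSA form available, ranger can compute that n is in [1, 64] at
   the alloca and treat the allocation as bounded; without SSA form the
   range query has nothing to work with.  */
int
bounded_alloca (unsigned n)
{
  if (n == 0 || n > 64)
    return -1;
  char *p = __builtin_alloca (n);  /* size provably in [1, 64] */
  memset (p, 7, n);
  return p[0];
}
```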
2022-04-05 Richard Biener <rguenther@suse.de>
PR c/105151
* passes.def (pass_walloca): Move early instance into
pass_build_ssa_passes to make SSA form available.
* gcc.dg/gimplefe-error-14.c: New testcase.
|
|
Something went wrong when testing the earlier patch to move the
late sinking to before the late phiopt for PR102008. The following
makes sure to unsplit edges after the late sinking since the split
edges confuse the following phiopt leading to missed optimizations.
I've gone for a new pass parameter for this to avoid changing the
CFG after the early sinking pass at this point.
2022-03-17 Richard Biener <rguenther@suse.de>
PR tree-optimization/104960
* passes.def: Add pass parameter to pass_sink_code, mark
last one to unsplit edges.
* tree-ssa-sink.cc (pass_sink_code::set_pass_param): New.
(pass_sink_code::execute): Always execute TODO_cleanup_cfg
when we need to unsplit edges.
* gcc.dg/gimplefe-37.c: Adjust to allow either the true
or false edge to have a forwarder.
|
|
The following re-orders the newly added code sinking pass before
the last phiopt pass which performs hoisting of adjacent loads
with the intent to enable if-conversion on those.
I've added the aarch64 specific testcase from the PR.
2022-03-16 Richard Biener <rguenther@suse.de>
PR tree-optimization/102008
* passes.def: Move the added code sinking pass before the
preceding phiopt pass.
* gcc.target/aarch64/pr102008.c: New testcase.
|
|
Resolves:
PR middle-end/104260 - Misplaced waccess3 pass
gcc/ChangeLog:
PR middle-end/104260
* passes.def (pass_warn_access): Adjust pass placement.
|
|
With -Og we are not prepared to do cleanup after IPA optimizations
and dead code exposed by those confuses late diagnostic passes.
This is a first patch removing unwanted IPA optimizations, namely
both late modref and pure-const analysis.
2022-01-18 Richard Biener <rguenther@suse.de>
PR ipa/103989
* passes.def (pass_all_optimizations_g): Remove pass_modref
and pass_local_pure_const.
|
|
Resolves:
PR c/63272 - GCC should warn when using pointer to dead scoped variable
within the same function
gcc/c-family/ChangeLog:
PR c/63272
* c.opt (-Wdangling-pointer): New option.
gcc/ChangeLog:
PR c/63272
* diagnostic-spec.c (nowarn_spec_t::nowarn_spec_t): Handle
-Wdangling-pointer.
* doc/invoke.texi (-Wdangling-pointer): Document new option.
* gimple-ssa-warn-access.cc (pass_waccess::clone): Set new member.
(pass_waccess::check_pointer_uses): New function.
(pass_waccess::gimple_call_return_arg): New function.
(pass_waccess::gimple_call_return_arg_ref): New function.
(pass_waccess::check_call_dangling): New function.
(pass_waccess::check_dangling_uses): New function overloads.
(pass_waccess::check_dangling_stores): New function.
(pass_waccess::check_dangling_stores): New function.
(pass_waccess::m_clobbers): New data member.
(pass_waccess::m_func): New data member.
(pass_waccess::m_run_number): New data member.
(pass_waccess::m_check_dangling_p): New data member.
(pass_waccess::check_alloca): Check m_early_checks_p.
(pass_waccess::check_alloc_size_call): Same.
(pass_waccess::check_strcat): Same.
(pass_waccess::check_strncat): Same.
(pass_waccess::check_stxcpy): Same.
(pass_waccess::check_stxncpy): Same.
(pass_waccess::check_strncmp): Same.
(pass_waccess::check_memop_access): Same.
(pass_waccess::check_read_access): Same.
(pass_waccess::check_builtin): Call check_pointer_uses.
(pass_waccess::warn_invalid_pointer): Add arguments.
(is_auto_decl): New function.
(pass_waccess::check_stmt): New function.
(pass_waccess::check_block): Call check_stmt.
(pass_waccess::execute): Call check_dangling_uses,
check_dangling_stores. Empty m_clobbers.
* passes.def (pass_warn_access): Invoke pass two more times.
gcc/testsuite/ChangeLog:
PR c/63272
* g++.dg/warn/Wfree-nonheap-object-6.C: Disable valid warnings.
* g++.dg/warn/ref-temp1.C: Prune expected warning.
* gcc.dg/uninit-pr50476.c: Expect a new warning.
* c-c++-common/Wdangling-pointer-2.c: New test.
* c-c++-common/Wdangling-pointer-3.c: New test.
* c-c++-common/Wdangling-pointer-4.c: New test.
* c-c++-common/Wdangling-pointer-5.c: New test.
* c-c++-common/Wdangling-pointer-6.c: New test.
* c-c++-common/Wdangling-pointer.c: New test.
* g++.dg/warn/Wdangling-pointer-2.C: New test.
* g++.dg/warn/Wdangling-pointer.C: New test.
* gcc.dg/Wdangling-pointer-2.c: New test.
* gcc.dg/Wdangling-pointer.c: New test.
|
|
|
|
Resolves:
PR middle-end/88232 - Please implement -Winfinite-recursion
gcc/ChangeLog:
PR middle-end/88232
* Makefile.in (OBJS): Add gimple-warn-recursion.o.
* common.opt: Add -Winfinite-recursion.
* doc/invoke.texi (-Winfinite-recursion): Document.
* passes.def (pass_warn_recursion): Schedule a new pass.
* tree-pass.h (make_pass_warn_recursion): Declare.
* gimple-warn-recursion.c: New file.
gcc/c-family/ChangeLog:
PR middle-end/88232
* c.opt: Add -Winfinite-recursion.
gcc/testsuite/ChangeLog:
PR middle-end/88232
* c-c++-common/attr-used-5.c: Suppress valid warning.
* c-c++-common/attr-used-6.c: Same.
* c-c++-common/attr-used-9.c: Same.
* g++.dg/warn/Winfinite-recursion-2.C: New test.
* g++.dg/warn/Winfinite-recursion-3.C: New test.
* g++.dg/warn/Winfinite-recursion.C: New test.
* gcc.dg/Winfinite-recursion-2.c: New test.
* gcc.dg/Winfinite-recursion.c: New test.
|
|
We can now handle some extra cases, for example:
struct a {int a,b,c;};
__attribute__ ((noinline))
int init (struct a *a)
{
a->a=1;
a->b=2;
a->c=3;
}
int const_fn ()
{
struct a a;
init (&a);
return a.a + a.b + a.c;
}
Here pure/const stops on the fact that const_fn calls non-const init, while
modref knows that the memory it initializes is local to const_fn.
I ended up reordering passes so early modref is done after early pure-const,
mostly to avoid the need to change the testsuite, which greps for const
functions being detected in pure-const. Still, some testsuite compensation
is needed.
gcc/ChangeLog:
2021-11-11 Jan Hubicka <hubicka@ucw.cz>
* ipa-modref.c (analyze_function): Do pure/const discovery, return
true on success.
(pass_modref::execute): If pure/const is discovered fixup cfg.
(ignore_edge): Do not ignore pure/const edges.
(modref_propagate_in_scc): Do pure/const discovery, return true if
cdtor was promoted pure/const.
(pass_ipa_modref::execute): If needed remove unreachable functions.
* ipa-pure-const.c (warn_function_noreturn): Fix whitespace.
(warn_function_cold): Likewise.
(skip_function_for_local_pure_const): Move earlier.
(ipa_make_function_const): Break out from ...
(ipa_make_function_pure): Break out from ...
(propagate_pure_const): ... here.
(pass_local_pure_const::execute): Use it.
* ipa-utils.h (ipa_make_function_const): Declare.
(ipa_make_function_pure): Declare.
* passes.def: Move early modref after pure-const.
gcc/testsuite/ChangeLog:
2021-11-11 Jan Hubicka <hubicka@ucw.cz>
* c-c++-common/tm/inline-asm.c: Disable pure-const.
* g++.dg/ipa/modref-1.C: Update template.
* gcc.dg/tree-ssa/modref-11.c: Disable pure-const.
* gcc.dg/tree-ssa/modref-14.c: New test.
* gcc.dg/tree-ssa/modref-8.c: Do not optimize sibling calls.
* gfortran.dg/do_subscript_3.f90: Add -O0.
|
|
This moves uncprop after the modref and pure/const passes and adds a comment
that this pass should always be last since it is only supposed to help PHI
lowering. The pass replaces constants by SSA names that are known to be
constant at that place, which hardly helps other passes.
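What uncprop does can be sketched on a small example (hypothetical): on the
edge where x == 0 holds, the constant PHI argument can be replaced by x
itself, helping out-of-SSA coalescing:

```c
#include <assert.h>

/* At the join point r is a PHI of (0, y).  On the edge coming from the
   then-branch x == 0 is known, so uncprop rewrites the PHI argument 0
   back to x; out-of-SSA can then coalesce r with x instead of emitting
   a constant load on that edge.  */
int
phi_example (int x, int y)
{
  int r;
  if (x == 0)
    r = 0;     /* uncprop: this argument becomes x */
  else
    r = y;
  return r;
}
```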
gcc/ChangeLog:
PR tree-optimization/103177
* passes.def: Move uncprop after pure/const and modref.
|
|
Chasing down stage3 miscomparisons is never fun, and having no way to
distinguish between jump threads registered by a particular
pass is even harder. This patch adds debug counters for the individual
back threading passes. I've left the ethread pass alone, as that one is
usually benign, but we could easily add it if needed.
The fact that we can only pass one boolean argument to the passes
infrastructure has us do all sorts of gymnastics to differentiate
between the various back threading passes.
Tested on x86-64 Linux.
gcc/ChangeLog:
* dbgcnt.def: Add debug counter for back_thread[12] and
back_threadfull[12].
* passes.def: Pass "first" argument to each back threading pass.
* tree-ssa-threadbackward.c (back_threader::back_threader): Add
first argument.
(back_threader::debug_counter): New.
(back_threader::maybe_register_path): Call debug_counter.
|
|
This patch upgrades the pre-VRP threading passes to fully resolving
backward threaders, and removes the post-VRP threading passes altogether.
With it, we reduce the number of threaders in our pipeline from 9 to 7.
This will leave DOM as the only forward threader client. When the ranger
can handle floats, we should be able to upgrade the pre-DOM threaders to
fully resolving threaders and kill the embedded DOM threader.
The numbers are as follows:
prev: # threads in backward + vrp-threaders = 92624
now: # threads in backward threaders = 94275
Gain: +1.78%
prev: # total threads: 189495
now: # total threads: 193714
Gain: +2.22%
The numbers are not as great as my initial proposal, but I've
recently pushed all the work that got us to this point ;-).
And... the compilation improves by 1.32%!
There's a regression on uninit-pred-7_a.c that I've yet to look at. I
want to make sure it's not a missing thread. If it is, I'll create a PR
and own it.
Also, the tree-ssa/phi_on_compare-*.c tests have all regressed. This
seems to be some special case the forward threader handles that the
backward threader does not (edge_forwards_cmp_to_conditional_jump*).
I haven't dug deep to see if this is solveable within our
infrastructure, but a cursory look shows that even though the VRP
threader threads this, the *.optimized dump ends with more conditional
jumps than without the optimization. I'd like to punt on this for
now, because DOM actually catches this through its lone use of the
forward threader (I've adjusted the tests). However, we will need to
address this sooner or later, if indeed it's still improving the final
assembly.
gcc/ChangeLog:
* passes.def: Replace the pass_thread_jumps before VRP* with
pass_thread_jumps_full. Remove all pass_vrp_threader instances.
* tree-ssa-threadbackward.c (pass_data_thread_jumps_full):
Remove hyphen from "thread-full" name.
libgomp/ChangeLog:
* testsuite/libgomp.graphite/force-parallel-4.c: Adjust for threading changes.
* testsuite/libgomp.graphite/force-parallel-8.c: Same.
gcc/testsuite/ChangeLog:
* gcc.dg/loop-unswitch-2.c: Adjust for threading changes.
* gcc.dg/old-style-asm-1.c: Same.
* gcc.dg/tree-ssa/phi_on_compare-1.c: Same.
* gcc.dg/tree-ssa/phi_on_compare-2.c: Same.
* gcc.dg/tree-ssa/phi_on_compare-3.c: Same.
* gcc.dg/tree-ssa/phi_on_compare-4.c: Same.
* gcc.dg/tree-ssa/pr20701.c: Same.
* gcc.dg/tree-ssa/pr21001.c: Same.
* gcc.dg/tree-ssa/pr21294.c: Same.
* gcc.dg/tree-ssa/pr21417.c: Same.
* gcc.dg/tree-ssa/pr21559.c: Same.
* gcc.dg/tree-ssa/pr21563.c: Same.
* gcc.dg/tree-ssa/pr49039.c: Same.
* gcc.dg/tree-ssa/pr59597.c: Same.
* gcc.dg/tree-ssa/pr61839_1.c: Same.
* gcc.dg/tree-ssa/pr61839_3.c: Same.
* gcc.dg/tree-ssa/pr66752-3.c: Same.
* gcc.dg/tree-ssa/pr68198.c: Same.
* gcc.dg/tree-ssa/pr77445-2.c: Same.
* gcc.dg/tree-ssa/pr77445.c: Same.
* gcc.dg/tree-ssa/ranger-threader-1.c: Same.
* gcc.dg/tree-ssa/ranger-threader-2.c: Same.
* gcc.dg/tree-ssa/ranger-threader-4.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-1.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-11.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-12.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-14.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-16.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.
* gcc.dg/tree-ssa/ssa-thread-14.c: Same.
* gcc.dg/tree-ssa/ssa-thread-backedge.c: Same.
* gcc.dg/tree-ssa/ssa-vrp-thread-1.c: Same.
* gcc.dg/tree-ssa/vrp02.c: Same.
* gcc.dg/tree-ssa/vrp03.c: Same.
* gcc.dg/tree-ssa/vrp05.c: Same.
* gcc.dg/tree-ssa/vrp06.c: Same.
* gcc.dg/tree-ssa/vrp07.c: Same.
* gcc.dg/tree-ssa/vrp08.c: Same.
* gcc.dg/tree-ssa/vrp09.c: Same.
* gcc.dg/tree-ssa/vrp33.c: Same.
* gcc.dg/uninit-pred-9_b.c: Same.
* gcc.dg/uninit-pred-7_a.c: xfail.
|
|
This patch introduces optional passes to harden conditionals used in
branches, and in computing boolean expressions, by adding redundant
tests of the reversed conditions, and trapping in case of unexpected
results. Though in abstract machines the redundant tests should never
fail, CPUs may be led to misbehave under certain kinds of attacks,
such as of power deprivation, and these tests reduce the likelihood of
going too far down an unexpected execution path.
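The hardening idea can be approximated in source form (a sketch only:
written like this an optimizer would simply fold the redundant test away,
which is why the pass emits it at the IL level with the operands hidden;
__builtin_trap stands in for the trap):

```c
/* Hand-written analogue of a hardened compare: the reversed condition
   is recomputed, and since r and rev must always disagree in the
   abstract machine, their agreeing means the CPU misbehaved, so trap.
   Requires GCC/Clang for the builtin.  */
int
hardened_less (int a, int b)
{
  int r = a < b;
  int rev = a >= b;     /* redundant test of the reversed condition */
  if (r == rev)
    __builtin_trap ();  /* only reachable under hardware fault */
  return r;
}
```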
for gcc/ChangeLog
* common.opt (fharden-compares): New.
(fharden-conditional-branches): New.
* doc/invoke.texi: Document new options.
* gimple-harden-conditionals.cc: New.
* Makefile.in (OBJS): Build it.
* passes.def: Add new passes.
* tree-pass.h (make_pass_harden_compares): Declare.
(make_pass_harden_conditional_branches): Declare.
for gcc/ada/ChangeLog
* doc/gnat_rm/security_hardening_features.rst
(Hardened Conditionals): New.
for gcc/testsuite/ChangeLog
* c-c++-common/torture/harden-comp.c: New.
* c-c++-common/torture/harden-cond.c: New.
|
|
gcc/ChangeLog:
* passes.def: Change threading comment before pass_ccp pass.
|
|
Biasing loop-carried PHIs during the 1st reassociation pass interferes
with reduction chains and does not bring measurable benefits, so do it
only during the 2nd reassociation pass.
gcc/ChangeLog:
* passes.def (pass_reassoc): Rename parameter to early_p.
* tree-ssa-reassoc.c (reassoc_bias_loop_carried_phi_ranks_p):
New variable.
(phi_rank): Don't bias loop-carried phi ranks
before vectorization pass.
(execute_reassoc): Add bias_loop_carried_phi_ranks_p parameter.
(pass_reassoc::pass_reassoc): Add bias_loop_carried_phi_ranks_p
initializer.
(pass_reassoc::set_param): Set bias_loop_carried_phi_ranks_p
value.
(pass_reassoc::execute): Pass bias_loop_carried_phi_ranks_p to
execute_reassoc.
(pass_reassoc::bias_loop_carried_phi_ranks_p): New member.
|
|
This patch implements the new hybrid forward threader and replaces the
embedded VRP threader with it.
With all the pieces that have gone in, the implementation of the hybrid
threader is straightforward: convert the current state into
SSA imports that the solver will understand, and let the path solver
precompute ranges and relations for the path. After this setup is done,
we can use the range_query API to solve gimple statements in the threader.
The forward threader is now engine agnostic so there are no changes to
the threader per se.
I have put the hybrid bits in tree-ssa-threadedge.*, instead of VRP,
because they will also be used in the evrp removal of the DOM/threader,
which is my next task.
Most of the patch is actually test changes. I have gone through every
single one and verified that they're correct. Most were trivial dump
file name changes, but others required going through the IL and
certifying that the different IL was expected.
For example, in pr59597.c, we have one less thread because the
ASSERT_EXPR was getting in the way, and making it seem like things were
not crossing loops. The hybrid threader sees the correct representation
of the IL, and avoids threading this one case.
The final numbers are a 12.16% improvement in jump threads immediately
after VRP, and a 0.82% improvement in overall jump threads. The
performance drop is 0.6% (plus the 1.43% hit from moving the embedded
threader into its own pass). As I've said, I'd prefer to keep the
threader in its own pass, but if this is an issue, we can address this
with a shared ranger when VRP is replaced with an evrp instance
(upcoming).
Note, that these numbers are slightly different than what I originally
posted. A few correctness tweaks, plus restricting loop threads, made
the difference. That being said, I was aiming for par. A 12% gain is
just gravy ;-). When we merge the threaders, we should see even better
numbers-- and we'll have the benefit of an entire release stress testing
the solver.
As I mentioned in my introductory note, paths ending in MEM_REF
conditional are missing. In reality, this didn't make a difference, as
it was so rare. However, as a follow-up, I will distill a test and add
a suitable PR to keep us honest.
There is a one-line change to libgomp/team.c silencing a new
used-uninitialized warning. As my previous work with the threaders has
shown, warnings flare up after each improvement to jump threading. I
expect this to be no different. I've promised Jakub to investigate
fully, so I will analyze and add the appropriate PR for the warning
experts.
Oh yeah, the new pass dump is called vrp-threader[12] to match each
VRP[12] pass. However, there's no reason for it either to be named
vrp-threader or to live in tree-vrp.c.
Tested on x86-64 Linux.
OK?
p.s. "Did I say 5 weeks? My bad, I meant 5 months."
gcc/ChangeLog:
* passes.def (pass_vrp_threader): New.
* tree-pass.h (make_pass_vrp_threader): Add make_pass_vrp_threader.
* tree-ssa-threadedge.c (hybrid_jt_state::register_equivs_stmt): New.
(hybrid_jt_simplifier::hybrid_jt_simplifier): New.
(hybrid_jt_simplifier::simplify): New.
(hybrid_jt_simplifier::compute_ranges_from_state): New.
* tree-ssa-threadedge.h (class hybrid_jt_state): New.
(class hybrid_jt_simplifier): New.
* tree-vrp.c (execute_vrp): Remove ASSERT_EXPR based jump
threader.
(class hybrid_threader): New.
(hybrid_threader::hybrid_threader): New.
(hybrid_threader::~hybrid_threader): New.
(hybrid_threader::before_dom_children): New.
(hybrid_threader::after_dom_children): New.
(execute_vrp_threader): New.
(class pass_vrp_threader): New.
(make_pass_vrp_threader): New.
libgomp/ChangeLog:
* team.c: Initialize start_data.
* testsuite/libgomp.graphite/force-parallel-4.c: Adjust.
* testsuite/libgomp.graphite/force-parallel-8.c: Adjust.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr55107.c: Adjust.
* gcc.dg/tree-ssa/phi_on_compare-1.c: Adjust.
* gcc.dg/tree-ssa/phi_on_compare-2.c: Adjust.
* gcc.dg/tree-ssa/phi_on_compare-3.c: Adjust.
* gcc.dg/tree-ssa/phi_on_compare-4.c: Adjust.
* gcc.dg/tree-ssa/pr21559.c: Adjust.
* gcc.dg/tree-ssa/pr59597.c: Adjust.
* gcc.dg/tree-ssa/pr61839_1.c: Adjust.
* gcc.dg/tree-ssa/pr61839_3.c: Adjust.
* gcc.dg/tree-ssa/pr71437.c: Adjust.
* gcc.dg/tree-ssa/ssa-dom-thread-11.c: Adjust.
* gcc.dg/tree-ssa/ssa-dom-thread-16.c: Adjust.
* gcc.dg/tree-ssa/ssa-dom-thread-18.c: Adjust.
* gcc.dg/tree-ssa/ssa-dom-thread-2a.c: Adjust.
* gcc.dg/tree-ssa/ssa-dom-thread-4.c: Adjust.
* gcc.dg/tree-ssa/ssa-thread-14.c: Adjust.
* gcc.dg/tree-ssa/ssa-vrp-thread-1.c: Adjust.
* gcc.dg/tree-ssa/vrp106.c: Adjust.
* gcc.dg/tree-ssa/vrp55.c: Adjust.
|
|
This patch implements worker-partitioning support in the middle end,
by rewriting gimple. The OpenACC execution model requires that code
can run in either "worker single" mode where only a single worker per
gang is active, or "worker partitioned" mode, where multiple workers
per gang are active. This means we need to do something equivalent
to spawning additional workers when transitioning from worker-single
to worker-partitioned mode. However, GPUs typically fix the number of
threads of invoked kernels at launch time, so we need to do something
with the "extra" threads when they are not wanted.
The scheme used is to conditionalise each basic block that executes
in "worker single" mode so that it runs on worker 0 only. Conditional
branches are handled specially so that "idle" (non-0) workers follow along with
worker 0. On transitioning to "worker partitioned" mode, any variables
modified by worker 0 are propagated to the other workers via GPU shared
memory. Special care is taken for routine calls, writes through pointers,
and so forth, as follows:
- There are two types of function calls to consider in worker-single
mode: "normal" calls to maths library routines, etc. are called from
worker 0 only. OpenACC routines may contain worker-partitioned loops
themselves, so are called from all workers, including "idle" ones.
- SSA names set in worker-single mode, but used in worker-partitioned
mode, are copied to shared memory in worker 0. Other workers retrieve
the value from the appropriate shared-memory location after a barrier,
and new phi nodes are introduced at the convergence point to resolve
the worker 0/other worker copies of the value.
- Local scalar variables (on the stack) also need special handling. We
broadcast any variables that are written in the current worker-single
block, and that are read in any worker-partitioned block. (This is
believed to be safe, and is flow-insensitive to ease analysis.)
- Local aggregates (arrays and composites) on the stack are *not*
broadcast. Instead we force gimple stmts modifying elements/fields of
local aggregates into fully-partitioned mode. The RHS of the
assignment is a scalar, and is thus subject to broadcasting as above.
- Writes through pointers may affect any local variable that has
its address taken. We use points-to analysis to determine the set
of potentially-affected variables for a given pointer indirection.
We broadcast any such variable which is used in worker-partitioned
mode, on a per-block basis for any block containing a write through
a pointer.
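The worker-single to worker-partitioned transition above can be sketched as a toy sequential simulation (all names here are illustrative, not GCC's; real broadcasts go through GPU shared memory and a hardware barrier):

```c
#include <assert.h>

#define NWORKERS 4

static int shared_slot;            /* stand-in for GPU shared memory */
static int per_worker[NWORKERS];

static void
run_workers (void)
{
  /* Worker-single block: only worker 0 executes the body.  */
  for (int w = 0; w < NWORKERS; w++)
    if (w == 0)
      shared_slot = 42;            /* value computed by worker 0 */

  /* (barrier: all workers wait here)  */

  /* Broadcast: every worker, idle ones included, reads the shared copy.  */
  for (int w = 0; w < NWORKERS; w++)
    per_worker[w] = shared_slot;

  /* Worker-partitioned code would follow, with all workers active.  */
}
```

In the real pass the per-worker copies are resolved with new phi nodes at the convergence point rather than an explicit array.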
Some slides about the implementation (from 2018) are available at:
https://jtb20.github.io/gcnworkers.pdf
gcc/
* Makefile.in (OBJS): Add omp-oacc-neuter-broadcast.o.
* doc/tm.texi.in (TARGET_GOACC_CREATE_WORKER_BROADCAST_RECORD):
Add documentation hook.
* doc/tm.texi: Regenerate.
* omp-oacc-neuter-broadcast.cc: New file.
* omp-builtins.def (BUILT_IN_GOACC_BARRIER)
(BUILT_IN_GOACC_SINGLE_START, BUILT_IN_GOACC_SINGLE_COPY_START)
(BUILT_IN_GOACC_SINGLE_COPY_END): New builtins.
* passes.def (pass_omp_oacc_neuter_broadcast): Add pass.
* target.def (goacc.create_worker_broadcast_record): Add target
hook.
* tree-pass.h (make_pass_omp_oacc_neuter_broadcast): Add
prototype.
* config/gcn/gcn-protos.h (gcn_goacc_adjust_propagation_record):
Rename prototype to...
(gcn_goacc_create_worker_broadcast_record): ... this.
* config/gcn/gcn-tree.c (gcn_goacc_adjust_propagation_record): Rename
function to...
(gcn_goacc_create_worker_broadcast_record): ... this.
* config/gcn/gcn.c (TARGET_GOACC_ADJUST_PROPAGATION_RECORD):
Rename to...
(TARGET_GOACC_CREATE_WORKER_BROADCAST_RECORD): ... this.
Co-Authored-By: Nathan Sidwell <nathan@codesourcery.com> (via 'gcc/config/nvptx/nvptx.c' master)
Co-Authored-By: Kwok Cheung Yeung <kcy@codesourcery.com>
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
|
|
This really is a separate step -- and another pass to be added between the two,
later on.
gcc/
* omp-offload.c (oacc_loop_xform_head_tail, oacc_loop_process):
'update_stmt' after modification.
(pass_oacc_loop_designation): New function, extracted out of...
(pass_oacc_device_lower): ... this.
(pass_data_oacc_loop_designation, pass_oacc_loop_designation)
(make_pass_oacc_loop_designation): New.
* passes.def: Add it.
* tree-parloops.c (create_parallel_loop): Adjust.
* tree-pass.h (make_pass_oacc_loop_designation): New.
gcc/testsuite/
* c-c++-common/goacc/classify-kernels-unparallelized.c:
's%oaccdevlow%oaccloops%g'.
* c-c++-common/goacc/classify-kernels.c: Likewise.
* c-c++-common/goacc/classify-parallel.c: Likewise.
* c-c++-common/goacc/classify-routine-nohost.c: Likewise.
* c-c++-common/goacc/classify-routine.c: Likewise.
* c-c++-common/goacc/classify-serial.c: Likewise.
* c-c++-common/goacc/routine-nohost-1.c: Likewise.
* g++.dg/goacc/template.C: Likewise.
* gcc.dg/goacc/loop-processing-1.c: Likewise.
* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Likewise.
* gfortran.dg/goacc/classify-kernels.f95: Likewise.
* gfortran.dg/goacc/classify-parallel.f95: Likewise.
* gfortran.dg/goacc/classify-routine-nohost.f95: Likewise.
* gfortran.dg/goacc/classify-routine.f95: Likewise.
* gfortran.dg/goacc/classify-serial.f95: Likewise.
* gfortran.dg/goacc/routine-multiple-directives-1.f90: Likewise.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/pr85486-2.c:
's%oaccdevlow%oaccloops%g'.
* testsuite/libgomp.oacc-c-c++-common/pr85486-3.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/pr85486.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-nohost-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c:
Likewise.
* testsuite/libgomp.oacc-fortran/routine-nohost-1.f90: Likewise.
Co-Authored-By: Julian Brown <julian@codesourcery.com>
Co-Authored-By: Kwok Cheung Yeung <kcy@codesourcery.com>
|
|
gcc/ChangeLog:
* Makefile.in (OBJS): Add gimple-ssa-warn-access.o and pointer-query.o.
* attribs.h (fndecl_dealloc_argno): Move fndecl_dealloc_argno to tree.h.
* builtins.c (compute_objsize_r): Move to pointer-query.cc.
(access_ref::access_ref): Same.
(access_ref::phi): Same.
(access_ref::get_ref): Same.
(access_ref::size_remaining): Same.
(access_ref::offset_in_range): Same.
(access_ref::add_offset): Same.
(access_ref::inform_access): Same.
(ssa_name_limit_t::visit_phi): Same.
(ssa_name_limit_t::leave_phi): Same.
(ssa_name_limit_t::next): Same.
(ssa_name_limit_t::next_phi): Same.
(ssa_name_limit_t::~ssa_name_limit_t): Same.
(pointer_query::pointer_query): Same.
(pointer_query::get_ref): Same.
(pointer_query::put_ref): Same.
(pointer_query::flush_cache): Same.
(warn_string_no_nul): Move to gimple-ssa-warn-access.cc.
(check_nul_terminated_array): Same.
(unterminated_array): Same.
(maybe_warn_for_bound): Same.
(check_read_access): Same.
(warn_for_access): Same.
(get_size_range): Same.
(check_access): Same.
(gimple_call_alloc_size): Move to tree.c.
(gimple_parm_array_size): Move to pointer-query.cc.
(get_offset_range): Same.
(gimple_call_return_array): Same.
(handle_min_max_size): Same.
(handle_array_ref): Same.
(handle_mem_ref): Same.
(compute_objsize): Same.
(gimple_call_alloc_p): Move to gimple-ssa-warn-access.cc.
(call_dealloc_argno): Same.
(fndecl_dealloc_argno): Same.
(new_delete_mismatch_p): Same.
(matching_alloc_calls_p): Same.
(warn_dealloc_offset): Same.
(maybe_emit_free_warning): Same.
* builtins.h (check_nul_terminated_array): Move to
gimple-ssa-warn-access.h.
(check_nul_terminated_array): Same.
(warn_string_no_nul): Same.
(unterminated_array): Same.
(class ssa_name_limit_t): Same.
(class pointer_query): Same.
(struct access_ref): Same.
(class range_query): Same.
(struct access_data): Same.
(gimple_call_alloc_size): Same.
(gimple_parm_array_size): Same.
(compute_objsize): Same.
(class access_data): Same.
(maybe_emit_free_warning): Same.
* calls.c (initialize_argument_information): Remove call to
maybe_emit_free_warning.
* gimple-array-bounds.cc: Include new header..
* gimple-fold.c: Same.
* gimple-ssa-sprintf.c: Same.
* gimple-ssa-warn-restrict.c: Same.
* passes.def: Add pass_warn_access.
* tree-pass.h (make_pass_warn_access): Declare.
* tree-ssa-strlen.c: Include new headers.
* tree.c (fndecl_dealloc_argno): Move here from builtins.c.
* tree.h (fndecl_dealloc_argno): Move here from attribs.h.
* gimple-ssa-warn-access.cc: New file.
* gimple-ssa-warn-access.h: New file.
* pointer-query.cc: New file.
* pointer-query.h: New file.
gcc/cp/ChangeLog:
* init.c: Include new header.
|
|
The following testcase is miscompiled because VN during cunrolli changes
the __bos argument from the address of a larger field to the address of a
smaller field, and so __builtin_object_size (, 1) then folds into a
smaller value than the actually available size.
copy_reference_ops_from_ref has a hack for this, but it was using
cfun->after_inlining as a check for whether the hack can be ignored, and
cunrolli runs after inlining.
This patch uses a property to make it exact (set at the end of objsz
pass that doesn't do insert_min_max_p) and additionally based on discussions
in the PR moves the objsz pass earlier after IPA.
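A minimal sketch (not the actual PR 101419 testcase) of why the substitution matters: __builtin_object_size with type 1 counts only the bytes remaining in the closest enclosing subobject, while type 0 counts the whole object, so rewriting the address of a larger field into the address of a smaller one can shrink the type-1 answer below the space that is really available.

```c
#include <assert.h>

struct S { char a[4]; char b[28]; };
static struct S s;

static unsigned long
whole_object (void)
{
  /* Type 0: bytes remaining in the whole object.  */
  return __builtin_object_size ((char *) &s, 0);
}

static unsigned long
subobject (void)
{
  /* Type 1: bytes remaining in the closest subobject, here field 'a'.  */
  return __builtin_object_size (s.a, 1);
}
```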
2021-07-13 Jakub Jelinek <jakub@redhat.com>
Richard Biener <rguenther@suse.de>
PR tree-optimization/101419
* tree-pass.h (PROP_objsz): Define.
(make_pass_early_object_sizes): Declare.
* passes.def (pass_all_early_optimizations): Rename pass_object_sizes
there to pass_early_object_sizes, drop parameter.
(pass_all_optimizations): Move pass_object_sizes right after pass_ccp,
drop parameter, move pass_post_ipa_warn right after that.
* tree-object-size.c (pass_object_sizes::execute): Rename to...
(object_sizes_execute): ... this. Add insert_min_max_p argument.
(pass_data_object_sizes): Move after object_sizes_execute.
(pass_object_sizes): Likewise. In execute method call
object_sizes_execute, drop set_pass_param method and insert_min_max_p
non-static data member and its initializer in the ctor.
(pass_data_early_object_sizes, pass_early_object_sizes,
make_pass_early_object_sizes): New.
* tree-ssa-sccvn.c (copy_reference_ops_from_ref): Use
(cfun->curr_properties & PROP_objsz) instead of cfun->after_inlining.
* gcc.dg/builtin-object-size-10.c: Pass -fdump-tree-early_objsz-details
instead of -fdump-tree-objsz1-details in dg-options and adjust names
of dump file in scan-tree-dump.
* gcc.dg/pr101419.c: New test.
|
|
The gimple sink code pass runs quite early, so there may be new
opportunities exposed by later gimple optimization passes; this patch
runs the sink code pass once more before store_merging. For detailed
discussion, please refer to:
https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562352.html
Tested SPEC2017 performance on P8LE: 544.nab_r is improved by 2.43%,
with no big changes to the other cases; the geomean improves slightly,
by 0.25%.
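An illustrative (made-up) example of the kind of opportunity code sinking catches: the multiplication below is only needed on one branch, so sinking it past the condition avoids the work when COND is false.

```c
#include <assert.h>

static int
use_on_one_branch (int a, int b, int cond)
{
  int t = a * b;   /* candidate for sinking into the taken branch */
  if (cond)
    return t;
  return 0;
}
```

Later passes (e.g. PRE or forwprop) can expose new instances of this shape, which is why rerunning the pass before store_merging pays off.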
gcc/ChangeLog:
2021-05-18 Xionghu Luo <luoxhu@linux.ibm.com>
* passes.def: Add sink_code pass before store_merging.
* tree-ssa-sink.c (pass_sink_code::clone): New.
gcc/testsuite/ChangeLog:
2021-05-18 Xionghu Luo <luoxhu@linux.ibm.com>
* gcc.dg/tree-ssa/ssa-sink-1.c: Adjust.
* gcc.dg/tree-ssa/ssa-sink-2.c: Ditto.
* gcc.dg/tree-ssa/ssa-sink-3.c: Ditto.
* gcc.dg/tree-ssa/ssa-sink-4.c: Ditto.
* gcc.dg/tree-ssa/ssa-sink-5.c: Ditto.
* gcc.dg/tree-ssa/ssa-sink-6.c: Ditto.
* gcc.dg/tree-ssa/ssa-sink-7.c: Ditto.
* gcc.dg/tree-ssa/ssa-sink-8.c: Ditto.
* gcc.dg/tree-ssa/ssa-sink-9.c: Ditto.
* gcc.dg/tree-ssa/ssa-sink-10.c: Ditto.
* gcc.dg/tree-ssa/ssa-sink-13.c: Ditto.
* gcc.dg/tree-ssa/ssa-sink-14.c: Ditto.
* gcc.dg/tree-ssa/ssa-sink-16.c: Ditto.
* gcc.dg/tree-ssa/ssa-sink-17.c: Ditto.
* gcc.dg/tree-ssa/ssa-sink-18.c: New.
|
|
pointer plus offset
gcc/ChangeLog:
* passes.def (pass_warn_printf): Run after SSA.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/builtin-sprintf-warn-26.c: New test.
|
|
The ldist pass turns even very short loops into memset calls. E.g.,
the TFmode emulation calls end with a loop of up to 3 iterations, to
zero out trailing words, and the loop distribution pass turns them
into calls of the memset builtin.
Though short constant-length clearing memsets are usually dealt with
efficiently, for non-constant-length ones the options are a setmemM
pattern or a function call.
RISC-V doesn't have any setmemM pattern, so the loops above end up
"optimized" into memset calls, incurring not only the overhead of an
explicit call, but also discarding the information the compiler has
about the alignment of the destination, and that the length is a
multiple of the word alignment.
This patch handles variable lengths with multiple conditional
power-of-2-constant-sized stores-by-pieces, so as to reduce the
overhead of length compares.
It also changes the last copy-prop pass into ccp, so that pointer
alignment and length's nonzero bits are detected and made available
for the expander, even for ldist-introduced SSA_NAMEs.
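A rough C-level sketch (not GCC's implementation, which works on RTL in the expander) of the idea behind try_store_by_multiple_pieces: a variable length known to be a multiple of 4 and below 32 can be cleared by testing its bits and issuing fixed-size stores, instead of calling memset.

```c
#include <string.h>

static void
clear_by_pieces (unsigned char *dst, unsigned len)
{
  /* LEN is assumed to be a multiple of 4 and less than 32; each
     test/store pair is a fixed power-of-2 size the compiler can emit
     inline, so no library call is needed.  */
  static const unsigned char zeros[16];
  if (len & 16) { memcpy (dst, zeros, 16); dst += 16; }
  if (len & 8)  { memcpy (dst, zeros, 8);  dst += 8;  }
  if (len & 4)  { memcpy (dst, zeros, 4);  dst += 4;  }
}
```

Knowing the length's trailing-zero count (its ctz) lets the expander skip the smaller store sizes entirely, which is why the patch threads that information through.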
for gcc/ChangeLog
* builtins.c (try_store_by_multiple_pieces): New.
(expand_builtin_memset_args): Use it. If target_char_cast
fails, proceed as for non-constant val. Pass len's ctz to...
* expr.c (clear_storage_hints): ... this. Try store by
multiple pieces after setmem.
(clear_storage): Adjust.
* expr.h (clear_storage_hints): Likewise.
(try_store_by_multiple_pieces): Declare.
* passes.def: Replace the last copy_prop with ccp.
|
|
This makes sure to remove unused locals and prune CLOBBERs after
the first scalar cleanup phase after IPA optimizations. On the
testcase in the PR this results in 8000 CLOBBERs removed which
in turn unleashes more DSE which otherwise hits its walking limit
of 256 too early on this testcase.
2021-04-27 Richard Biener <rguenther@suse.de>
PR tree-optimization/99912
* passes.def: Add comment about new TODO_remove_unused_locals.
* tree-stdarg.c (pass_data_stdarg): Run TODO_remove_unused_locals
at start.
|
|
For the testcase in the PR the main SRA pass is unable to do some
important scalarizations because dead stores of addresses make
the candidate variables disqualified. The following patch adds
another DSE pass before SRA forming a DCE/DSE pair and moves the
DSE pass that is currently closely after SRA up to after the
next DCE pass, forming another DCE/DSE pair now residing after PRE.
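A hypothetical illustration of the interaction: the first store to Q below is dead, but as long as it survives, P is considered address-taken and SRA cannot scalarize it; running DSE first removes the dead store and re-qualifies P as a candidate.

```c
struct pair { int x, y; };

static int
sum (void)
{
  struct pair p = { 1, 2 };
  struct pair *q = &p;   /* dead store: overwritten before any use */
  q = 0;
  (void) q;
  return p.x + p.y;      /* scalarizable once &p no longer escapes */
}
```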
2021-04-07 Richard Biener <rguenther@suse.de>
PR tree-optimization/99912
* passes.def (pass_all_optimizations): Add pass_dse before
the first pass_dce, move the first pass_dse before the
pass_dce following pass_pre.
* gcc.dg/tree-ssa/ldist-33.c: Disable PRE and LIM.
* gcc.dg/tree-ssa/pr96789.c: Adjust dump file scanned.
* gcc.dg/tree-ssa/ssa-dse-28.c: Likewise.
* gcc.dg/tree-ssa/ssa-dse-29.c: Likewise.
|