Starting with GCC 11 we keep emitting false positive -Warray-bounds or
-Wstringop-overflow etc. warnings on widely used *(type *)0x12345000
style accesses (or on memory/string routines applied to such pointers).
This is a standard programming style supported by all C/C++ compilers
I've ever tried, used mostly in kernel or DSP programming, but sometimes
also together with mmap MAP_FIXED when certain things, often I/O registers
but could be anything else too are known to be present at fixed
addresses.
Such INTEGER_CST addresses can appear in code either because the user
wrote them like that (in which case they are fine) or because somebody
used pointer arithmetic (including &((struct whatever *)NULL)->field) on
a NULL pointer.  The middle-end warning code wrongly assumes that the
latter case is the very likely one, while the former is unlikely and
users should change their code.
The following patch adds a min-pagesize param defaulting to 4KB and
treats INTEGER_CST addresses smaller than that as presumed results of
pointer arithmetic from NULL, while addresses equal to or larger than
that are treated as intended user constant addresses.  For GCC 13 we can
represent results of pointer arithmetic on NULL using
&MEM[(void*)0 + offset] instead of (void*)offset INTEGER_CSTs.
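For illustration, a minimal example of the coding style in question (the
addresses and the register name here are made up):

  #include <string.h>

  /* Hypothetical device register at a fixed address.  */
  #define DEV_REG ((volatile unsigned int *) 0x12345000)

  void
  poke (void)
  {
    *DEV_REG = 1;                         /* constant-address store */
    memset ((void *) 0x12346000, 0, 64);  /* string routine on a fixed address */
  }

With this patch both accesses are assumed to be valid user constant
addresses because they are above the 4KB default, while e.g. (void *) 16
would still be treated as a likely result of pointer arithmetic on NULL.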
2022-03-18 Jakub Jelinek <jakub@redhat.com>
PR middle-end/99578
PR middle-end/100680
PR tree-optimization/100834
* params.opt (--param=min-pagesize=): New parameter.
* pointer-query.cc
(compute_objsize_r) <case ARRAY_REF>: Formatting fix.
(compute_objsize_r) <case INTEGER_CST>: Use maximum object size
instead of zero for pointer constants equal to or larger than
min-pagesize.
* gcc.dg/tree-ssa/pr99578-1.c: New test.
* gcc.dg/pr99578-1.c: New test.
* gcc.dg/pr99578-2.c: New test.
* gcc.dg/pr99578-3.c: New test.
* gcc.dg/pr100680.c: New test.
* gcc.dg/pr100834.c: New test.
---
Addresses one of the points raised in #104552; checking in under
the "obvious" rule.
gcc/ChangeLog:
PR translation/104552
* params.opt: Fix typo.
---
This adds a --param to allow disabling vectorization of floating point
inductions.  On top of -Ofast this should allow 549.fotonik3d_r to not
miscompare.
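As an illustration (example mine, not from the patch), a floating point
induction the vectorizer targets:

  void
  init (float *a, int n)
  {
    float x = 0.0f;
    for (int i = 0; i < n; i++)
      {
        a[i] = x;
        x += 1.0f;  /* FP induction; vectorizing reassociates these adds */
      }
  }

Under -Ofast the reassociation is allowed and can change results slightly,
which is what --param vect-induction-float=0 now lets users avoid.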
2022-03-08 Richard Biener <rguenther@suse.de>
PR tree-optimization/84201
* params.opt (-param=vect-induction-float): Add.
* doc/invoke.texi (vect-induction-float): Document.
* tree-vect-loop.cc (vectorizable_induction): Honor
param_vect_induction_float.
* gcc.dg/vect/pr84201.c: New testcase.
---
One of the mistakes reported in PR 104552 is a repeated "that" in the
description of ipa-cp-recursive-freq-factor, which I introduced.  This
patch removes one of them.
gcc/ChangeLog:
2022-03-07 Martin Jambor <mjambor@suse.cz>
PR translation/104552
* params.opt (ipa-cp-recursive-freq-factor): Remove repeated word
"that" in the description.
---
In pathological cases, the number of transitive relations being added is
potentially quadratic.  Lookups for relations in a block are linear in
nature, so simply limit the number of relations per block to some
reasonable number.
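A small example (mine, not from the patch) of the kind of transitive
relation the oracle derives:

  int
  f (int a, int b, int c)
  {
    if (a < b && b < c)
      return a < c;  /* derived transitively from a < b and b < c */
    return 0;
  }

Each newly registered relation can combine with every existing one in the
block, hence the potentially quadratic growth the new limit guards
against.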
PR tree-optimization/104038
* doc/invoke.texi (relation-block-limit): New.
* params.opt (relation-block-limit): New.
* value-relation.cc (dom_oracle::register_relation): Check for NULL
record before invoking transitive registry.
(dom_oracle::set_one_relation): Check limit before creating record.
(dom_oracle::register_transitives): Stop when no record created.
* value-relation.h (relation_chain_head::m_num_relations): New.
---
Fixes:
FAIL: compiler driver --help=param option(s): "^ +-.*[^:.]$" absent from output: "
--param=max-inline-functions-called-once-loop-depth= Maximum loop depth of a call which is considered for inlining functions called once"
FAIL: compiler driver --help=params option(s): "[^.]$" absent from output: "e"
gcc/ChangeLog:
* params.opt: Add missing dot.
---
As discussed in PR ipa/103454, there are several benchmarks that regress
with -finline-functions-called-once.  Runtimes:
- tramp3d with -Ofast: 31%
- exchange2 with -Ofast: 11-21%
- roms with -O2: 9-10%
- tonto with LTO: 2.5-3.5%
Build times:
- specfp2006: 41% (mostly wrf, which builds 71% faster)
- specint2006: 1.5-3%
- specfp2017: 64% (again mostly wrf)
- specint2017: 2.5-3.5%
This patch adds two params to tweak the behaviour:
1) max-inline-functions-called-once-loop-depth limiting the loop depth
(this is useful primarily for exchange where the inlined function is in
loop depth 9)
2) max-inline-functions-called-once-insns
We already have large-function-insns/growth parameters, but those also
limit inlining of small functions, so reducing them would regress very
large functions that are hot.
Because inlining functions called once is meant just as a cleanup pass,
I think it makes sense to have a separate limit for it.
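As a made-up reduction (the real exchange2 case sits at loop depth 9),
the situation the new loop-depth param targets looks like:

  /* Single call site, so "called once", but at loop depth 3; inlining
     moves the large body into a hot nest and inflates compile time.  */
  static void
  body (int *a)
  {
    for (int i = 0; i < 100; i++)
      a[i] += i;
  }

  void
  driver (int *a)
  {
    for (int i = 0; i < 10; i++)
      for (int j = 0; j < 10; j++)
        for (int k = 0; k < 10; k++)
          body (a);
  }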
gcc/ChangeLog:
2021-12-09 Jan Hubicka <hubicka@ucw.cz>
* doc/invoke.texi (max-inline-functions-called-once-loop-depth,
max-inline-functions-called-once-insns): New parameters.
* ipa-inline.c (check_callers): Handle
param_inline_functions_called_once_loop_depth and
param_inline_functions_called_once_insns.
(edge_badness): Fix linebreaks.
* params.opt (param=max-inline-functions-called-once-loop-depth,
param=max-inline-functions-called-once-insns): New params.
---
I am returning a TDF_* flag to the queue of available entries as I am
unconvinced that we need to burn an entire flag for internal debugging
constructs, especially since we seem to be running out of them.
I've added a --param=threader-debug entry similar to the one we use for
ranger debugging. Currently this only affects the backward threader,
but since the DOM threader is an outlier and on the chopping block, I
avoided using the "backward" name.
Tested on x86-64 Linux.
gcc/ChangeLog:
* dumpfile.c (dump_options): Remove TDF_THREADING entry.
* dumpfile.h (enum dump_flag): Remove TDF_THREADING and adjust
remaining entries.
* flag-types.h (enum threader_debug): New.
* gimple-range-path.cc (DEBUG_SOLVER): Use param_threader_debug.
* params.opt: Add entry for --param=threader-debug=.
---
gcc/ChangeLog:
PR ipa/103055
* params.opt (modref-max-depth): Add range.
(modref-max-adjustments): Fix range.
---
This patch flips the default for the VRP2 pass to execute ranger vrp rather
than the assert_expr version of VRP.
* params.opt (param_vrp2_mode): Make ranger the default for VRP2.
---
gcc/ChangeLog:
* opt-functions.awk: Add new sanity checking.
* optc-gen.awk: Add new argument to integer_range_info.
* params.opt: Update 2 params which have negative IntegerRange.
---
When profile feedback is available, IPA-CP takes the count of the
hottest node and then evaluates all call contexts relative to it.
This means that typically almost no clones for specialized contexts
are ever created because the maximum is some special function, called
from everywhere (that is likely to get inlined anyway) and all the
examined edges look cold compared to it.
This patch changes the selection. It simply sorts counts of all edges
eligible for cloning in a vector and then picks the count at the 90th
percentile (the actual percentile is configurable via a parameter).
I also tried more complex approaches, such as summing the counts and
picking the edge which together with all hotter edges accounted for a
given portion of the total sum of all edge counts.  But first, it was
not apparent to me that they make more logical sense than the
simple method, and in practice I always also had to ignore a few
percent of the hottest edges with really extreme counts (looking at
bash and python).  And when I had to do that anyway, it seemed simpler
to just "ignore" more and take the first non-ignored count as the
base.
Nevertheless, if people think some more sophisticated method should be
used anyway, I am willing to be persuaded. But this patch is a clear
improvement over the current situation.
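A minimal sketch of the selection (function and variable names are mine,
not the actual GCC code):

  #include <algorithm>
  #include <cstdint>
  #include <vector>

  /* Sort all eligible edge counts and take the one at the requested
     percentile (default 90) as the base for hotness comparisons, in
     effect ignoring the hottest ~10% of edges.  */
  std::uint64_t
  pick_base_count (std::vector<std::uint64_t> &counts, int percentile = 90)
  {
    if (counts.empty ())
      return 0;
    std::sort (counts.begin (), counts.end ());
    auto idx = std::min (counts.size () - 1,
                         counts.size () * percentile / 100);
    return counts[idx];
  }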
gcc/ChangeLog:
2021-10-26 Martin Jambor <mjambor@suse.cz>
* params.opt (param_ipa_cp_profile_count_base): New parameter.
* doc/invoke.texi (Optimize Options): Add entry for
ipa-cp-profile-count-base.
* ipa-cp.c (max_count): Replace with base_count, replace all
occurrences too, unless otherwise stated.
(ipcp_cloning_candidate_p): Identify mostly-directly called
functions based on their counts, not max_count.
(compare_edge_profile_counts): New function.
(ipcp_propagate_stage): Instead of setting max_count, find the
appropriate edge count in a sorted vector of counts of eligible
edges and make it the base_count.
---
Set the 3 possible flags as individual bits, and group them for the options.
* flag-types.h (enum ranger_debug): Adjust values.
* params.opt (ranger_debug): Ditto.
---
With Ranger being used in more than EVRP, the debug output should no longer
be tied up with the EVRP mode flag.
* doc/invoke.texi (ranger-debug): Document.
* flag-types.h (enum ranger_debug): New.
(enum evrp_mode): Remove debug values.
* gimple-range-cache.cc (DEBUG_RANGE_CACHE): Use new debug flag.
* gimple-range-gori.cc (gori_compute::gori_compute): Ditto.
* gimple-range.cc (gimple_ranger::gimple_ranger): Ditto.
* gimple-ssa-evrp.c (hybrid_folder::choose_value): Ditto.
(execute_early_vrp): Use evrp-mode directly.
* params.opt (enum evrp_mode): Remove debug values.
(ranger-debug): New.
(ranger-logical-depth): Relocate to be in alphabetical order.
---
Add 2 new params to select between VRP and RANGER to be used for each pass.
* doc/invoke.texi (vrp1-mode, vrp2-mode): Document.
* flag-types.h (enum vrp_mode): New.
* params.opt (vrp1-mode, vrp2-mode): New.
* tree-vrp.c (vrp_pass_num): New.
(pass_vrp::pass_vrp): Set pass number.
(pass_vrp::execute): Choose which VRP mode to execute.
---
The removal of --param=max-fsm-thread-length is causing code
explosion. I thought that --param=max-fsm-thread-path-insns was a
better gauge for path profitability than raw BB length, but it turns
out that we don't take into account PHIs when estimating the number of
statements.
In this PR, we have a sequence of very large PHIs that have us
traversing extremely large paths that blow up the compilation.
We could fix this a couple of different ways. We could avoid
traversing more than a certain number of PHI arguments, or ignore
large PHIs altogether. The old implementation certainly had this
knob, and we could cut things off before we even got to the ranger.
We could also adjust the instruction estimation to take into account
PHIs, but I'm sure we'd mess something else up in the process ;-).
The easiest thing to do is just restore the knob.
At a later time we could tweak this further, for instance,
disregarding empty blocks in the count. BTW, this is the reason I
didn't chop things off in the lowlevel registry for all threaders: the
forward threader can't really explore too deep paths, but it could
theoretically get there while threading over empty blocks.
This fixes 102814, 102852, and I bet it solves the Linux kernel cross
compile issue.
Tested on x86-64 Linux.
gcc/ChangeLog:
PR tree-optimization/102814
* doc/invoke.texi: Document --param=max-fsm-thread-length.
* params.opt: Add --param=max-fsm-thread-length.
* tree-ssa-threadbackward.c
(back_threader_profitability::profitable_path_p): Fail on paths
longer than max-fsm-thread-length.
---
The new backward threader makes some of the --param knobs used to
control it questionable at best or no longer applicable at worst.
The fsm-maximum-phi-arguments param is unused and can be removed.
The max-fsm-thread-length param is block based which is a bit redundant,
since we already restrict paths based on instruction estimates.
The max-fsm-thread-paths restricts the total number of threadable paths
in a function. We probably don't need this. Besides, the forward
threader has no such restriction.
Tested on x86-64 Linux.
gcc/ChangeLog:
* doc/invoke.texi: Remove max-fsm-thread-length,
max-fsm-thread-paths, and fsm-maximum-phi-arguments.
* params.opt: Same.
* tree-ssa-threadbackward.c (back_threader::back_threader): Remove
argument.
(back_threader_registry::back_threader_registry): Same.
(back_threader_profitability::profitable_path_p): Remove
param_max_fsm_thread_length.
(back_threader_registry::register_path): Remove
m_max_allowable_paths.
---
Recursive call graph edges, even when they are hot and important for
the compiled program, can never have frequency bigger than one, even
when the actual time savings in the next recursion call are not
realized just once but depend on the depth of recursion. The current
IPA-CP effect propagation code did not take that into account and just
used the frequency, thus severely underestimating the effect.
This patch artificially boosts values taking part in such calls. If a
value feeds into itself through a recursive call, the frequency of the
edge is multiplied by a parameter with default value of 6, basically
assuming that the recursion will take place 6 times. This value can
of course be subject to change.
Moreover, values which do not feed into themselves but which were
generated for a self-recursive call with an arithmetic
pass-function (aka the 548.exchange "hack" which however is generally
applicable for recursive functions which count the recursion depth in
a parameter) have the edge frequency multiplied as many times as there
are generated values in the chain. In essence, we will assume they
are all useful.
This patch partially fixes the current situation when we fail to
optimize 548.exchange with PGO. In the benchmark one recursive edge
count overwhelmingly dominates all other counts in the program and so
we fail to perform the first cloning (for the nonrecursive entry call)
because it looks totally insignificant.
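For intuition (example mine), both boosted situations in one function:

  /* 'val' feeds into itself through the recursive call, and 'depth - 1'
     is the arithmetic pass-through counting the recursion depth (the
     548.exchange-style case).  */
  static int
  work (int val, int depth)
  {
    if (depth == 0)
      return val;
    return val + work (val, depth - 1);
  }

  int
  entry (void)
  {
    return work (5, 100);  /* the nonrecursive entry call */
  }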
gcc/ChangeLog:
2021-07-16 Martin Jambor <mjambor@suse.cz>
* params.opt (ipa-cp-recursive-freq-factor): New.
* ipa-cp.c (ipcp_value): Switch to inline initialization. New members
scc_no, self_recursion_generated_level, same_scc and
self_recursion_generated_p.
(ipcp_lattice::add_value): Replaced parameter unlimited with
same_lat_gen_level, use it to determine the limit of values and store it
to the value.
(ipcp_lattice<valtype>::print): Dump the new fields.
(allocate_and_init_ipcp_value): Take same_lat_gen_level as a new
parameter and store it to the new value.
(self_recursively_generated_p): Removed.
(propagate_vals_across_arith_jfunc): Use self_recursion_generated_p
instead of self_recursively_generated_p, store self generation level
to such values.
(value_topo_info<valtype>::add_val): Set scc_no.
(value_topo_info<valtype>::propagate_effects): Multiply frequencies of
recursively feeding values and self generated values by appropriate
new factors.
---
Very large switches cause a lot of range calculations with multiple subranges
to happen. This can cause quadratic or even exponential time increases in
large testcases. This patch introduces a param variable to limit
the size of switches EVRP will process.
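For illustration (example mine), the kind of statement the limit applies
to:

  int
  g (int x)
  {
    switch (x)
      {
      case 1: return 10;
      case 5: return 20;
      case 9: return 30;
      /* ... hundreds more sparse cases ... */
      default: return 0;
      }
  }

On the default edge the range of x excludes every case label, so with
many sparse labels each recomputation carries a range with many
subranges, which is what gets expensive.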
* gimple-range-edge.cc (gimple_outgoing_range::gimple_outgoing_range):
Add parameter to limit size when recognizing switches.
(gimple_outgoing_range::edge_range_p): Check size limit.
* gimple-range-edge.h (gimple_outgoing_range): Add size field.
* gimple-range-gori.cc (gori_map::calculate_gori): Ignore switches
that exceed the size limit.
(gori_compute::gori_compute): Add initializer.
* params.opt (evrp-switch-limit): New.
* doc/invoke.texi: Update docs.
---
The last missing piece of the C++17 standard library is the hardware
interference size constants. Much of the delay in implementing these has
been due to uncertainty about what the right values are, and even whether
there is a single constant value that is suitable; the destructive
interference size is intended to be used in structure layout, so program
ABIs will depend on it.
In principle, both of these values should be the same as the target's L1
cache line size. When compiling for a generic target that is intended to
support a range of target CPUs with different cache line sizes, the
constructive size should probably be the minimum size, and the destructive
size the maximum, unless you are constrained by ABI compatibility with
previous code.
From discussion on gcc-patches, I've come to the conclusion that the
solution to the difficulty of choosing stable values is to give up on it,
and instead encourage only uses where ABI stability is unimportant: in
particular, uses where the ABI is shared at most between translation units
built at the same time with the same flags.
To that end, I've added a warning for any use of the constant value of
std::hardware_destructive_interference_size in a header or module export.
Appropriate uses within a project can disable the warning.
A previous iteration of this patch included an -finterference-tune flag to
make the value vary with -mtune; this iteration makes that the default
behavior, which should be appropriate for all reasonable uses of the
variable. The previous default of "stable-ish" seems to me likely to have
been more of an attractive nuisance; since we can't promise actual
stability, we should instead make proper uses more convenient.
JF Bastien's implementation proposal is summarized at
https://github.com/itanium-cxx-abi/cxx-abi/issues/74
I implement this by adding new --params for the two sizes. Targets can
override these values in targetm.target_option.override() to support a range
of values for the generic target; otherwise, both will default to the L1
cache line size.
64 bytes still seems correct for all x86.
I'm not sure why he proposed 64/64 for generic 32-bit ARM, since the Cortex
A9 has a 32-byte cache line, so I'd think 32/64 would make more sense.
He proposed 64/128 for generic AArch64, but since the A64FX now has a 256B
cache line, I've changed that to 64/256.
Other arch maintainers are invited to set ranges for their generic targets
if that seems better than using the default cache line size for both values.
With the above choice to reject stability as a goal, getting these values
"right" is now just a matter of what we want the default optimization to be,
and we can feel free to adjust them as CPUs with different cache lines
become more and less common.
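As a usage sketch (C++17; example mine):

  #include <atomic>
  #include <new>

  /* Keep two counters written by different threads on separate cache
     lines to avoid destructive interference (false sharing).  */
  struct counters
  {
    alignas (std::hardware_destructive_interference_size)
      std::atomic<long> produced;
    alignas (std::hardware_destructive_interference_size)
      std::atomic<long> consumed;
  };

Keeping such uses inside a single translation unit, rather than in a
shared header, sidesteps the ABI-stability concern that
-Winterference-size diagnoses.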
gcc/ChangeLog:
* params.opt: Add destructive-interference-size and
constructive-interference-size.
* doc/invoke.texi: Document them.
* config/aarch64/aarch64.c (aarch64_override_options_internal):
Set them.
* config/arm/arm.c (arm_option_override): Set them.
* config/i386/i386-options.c (ix86_option_override_internal):
Set them.
gcc/c-family/ChangeLog:
* c.opt: Add -Winterference-size.
* c-cppbuiltin.c (cpp_atomic_builtins): Add __GCC_DESTRUCTIVE_SIZE
and __GCC_CONSTRUCTIVE_SIZE.
gcc/cp/ChangeLog:
* constexpr.c (maybe_warn_about_constant_value):
Complain about std::hardware_destructive_interference_size.
(cxx_eval_constant_expression): Call it.
* decl.c (cxx_init_decl_processing): Check
--param *-interference-size values.
libstdc++-v3/ChangeLog:
* include/std/version: Define __cpp_lib_hardware_interference_size.
* libsupc++/new: Define hardware interference size variables.
gcc/testsuite/ChangeLog:
* g++.dg/warn/Winterference.H: New file.
* g++.dg/warn/Winterference.C: New test.
* g++.target/aarch64/interference.C: New test.
* g++.target/arm/interference.C: New test.
* g++.target/i386/interference.C: New test.
---
gcc/ChangeLog:
* params.opt (modref-max-adjustments): Add full stop.
---
This patch adds the logic needed to merge neighbouring accesses in
ipa-modref summaries.  This helps with analyzing array initializers and
similar code.  It is a bit of work, since it breaks the property that the
modref tree makes a good lattice for dataflow: the access ranges can be
extended indefinitely.  For this reason I added a counter tracking the
number of adjustments and a cap to limit them during the dataflow.
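A made-up example of the code this helps:

  /* Four adjacent 4-byte stores; merging neighbouring accesses lets the
     modref summary record one 16-byte range instead of four entries.
     Each range extension counts as an adjustment, which the new cap
     bounds.  */
  void
  init_array (int *a)
  {
    a[0] = 0;
    a[1] = 1;
    a[2] = 2;
    a[3] = 3;
  }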
gcc/ChangeLog:
* doc/invoke.texi: Document --param modref-max-adjustments.
* ipa-modref-tree.c (test_insert_search_collapse): Update.
(test_merge): Update.
* ipa-modref-tree.h (struct modref_access_node): Add adjustments;
(modref_access_node::operator==): Fix handling of access ranges.
(modref_access_node::contains): Constify parameter; handle also
mismatched parm offsets.
(modref_access_node::update): New function.
(modref_access_node::merge): New function.
(unspecified_modref_access_node): Update constructor.
(modref_ref_node::insert_access): Add record_adjustments parameter;
handle merging.
(modref_ref_node::try_merge_with): New private function.
(modref_tree::insert): New record_adjustments parameter.
(modref_tree::merge): New record_adjustments parameter.
(modref_tree::copy_from): Update.
* ipa-modref.c (dump_access): Dump adjustments field.
(get_access): Update constructor.
(record_access): Update call of insert.
(record_access_lto): Update call of insert.
(merge_call_side_effects): Add record_adjustments parameter.
(get_access_for_fnspec): Update.
(process_fnspec): Update.
(analyze_call): Update.
(analyze_function): Update.
(read_modref_records): Update.
(ipa_merge_modref_summary_after_inlining): Update.
(propagate_unknown_call): Update.
(modref_propagate_in_scc): Update.
* params.opt (param=modref-max-adjustments=): New.
gcc/testsuite/ChangeLog:
* gcc.dg/ipa/modref-1.c: Update testcase.
* gcc.dg/tree-ssa/modref-4.c: Update testcase.
* gcc.dg/tree-ssa/modref-8.c: New test.
---
This makes use of the estimated number of iterations of the inner loop
to limit --param vect-inner-loop-cost-factor scaling. It also reduces
the maximum value of vect-inner-loop-cost-factor to 10000 making it
less likely to cause overflow of costs.
2021-08-23 Richard Biener <rguenther@suse.de>
* doc/invoke.texi (vect-inner-loop-cost-factor): Adjust.
* params.opt (--param vect-inner-loop-cost-factor): Adjust
maximum value.
* tree-vect-loop.c (vect_analyze_loop_form): Initialize
inner_loop_cost_factor to the minimum of the estimated number
of iterations of the inner loop and vect-inner-loop-cost-factor.
---
Remove tracing in hybrid mode.  Add trace/gori/cache tracing options.
Tracing options are now 'trace', 'gori', 'cache', or all combined in
'debug'.
* flag-types.h (enum evrp_mode): Adjust evrp-mode values.
* gimple-range-cache.cc (DEBUG_RANGE_CACHE): Relocate from.
* gimple-range-trace.h (DEBUG_RANGE_CACHE): Here.
* params.opt (--param=evrp-mode): Adjust options.
---
At this point I don't see any use for the legacy mode, which I had
originally left in place during the transition.
This patch removes the legacy back threader, and cleans up the code a
bit. There are no functional changes to the non-legacy code.
Tested on x86-64 Linux.
gcc/ChangeLog:
* doc/invoke.texi: Remove docs for threader-mode param.
* flag-types.h (enum threader_mode): Remove.
* params.opt: Remove threader-mode param.
* tree-ssa-threadbackward.c (class back_threader): Remove
path_is_unreachable_p.
Make find_paths private.
Add maybe_thread and thread_through_all_blocks.
Remove reference marker for m_registry.
Remove reference marker for m_profit.
(back_threader::back_threader): Adjust for registry and profit not
being references.
(dump_path): Move down.
(debug): Move down.
(class thread_jumps): Remove.
(class back_threader_registry): Remove m_all_paths.
Remove destructor.
(thread_jumps::thread_through_all_blocks): Move to back_threader
class.
(fsm_find_thread_path): Remove.
(back_threader::maybe_thread): New.
(back_threader::thread_through_all_blocks): Move from
thread_jumps.
(back_threader_registry::back_threader_registry): Remove
m_all_paths.
(back_threader_registry::~back_threader_registry): Remove.
(thread_jumps::find_taken_edge): Remove.
(thread_jumps::check_subpath_and_update_thread_path): Remove.
(thread_jumps::maybe_register_path): Remove.
(thread_jumps::handle_phi): Remove.
(handle_assignment_p): Remove.
(thread_jumps::handle_assignment): Remove.
(thread_jumps::fsm_find_control_statement_thread_paths): Remove.
(thread_jumps::find_jump_threads_backwards): Remove.
(thread_jumps::find_jump_threads_backwards_with_ranger): Remove.
(try_thread_blocks): Rename find_jump_threads_backwards to
maybe_thread.
(pass_early_thread_jumps::execute): Same.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Remove call into the legacy
code and adjust for ranger threader.
---
This was meant to be an internal construct, but I see folks are using
it and submitting PRs against it. Let's just remove this to avoid
further confusion.
Tested on x86-64 Linux.
gcc/ChangeLog:
PR tree-optimization/101724
* params.opt: Remove --param=threader-iterative.
* tree-ssa-threadbackward.c (pass_thread_jumps::execute): Remove
iterative mode.
---
This is a rewrite of the backwards threader with a ranger based solver.
The code is divided into two parts: the path solver in
gimple-range-path.*, and the path discovery bits in
tree-ssa-threadbackward.c.
The legacy code is still available with --param=threader-mode=legacy,
but will be removed shortly after.
gcc/ChangeLog:
* Makefile.in (tree-ssa-loop-im.o-warn): New.
* flag-types.h (enum threader_mode): New.
* params.opt: Add entry for --param=threader-mode.
* tree-ssa-threadbackward.c (THREADER_ITERATIVE_MODE): New.
(class back_threader): New.
(back_threader::back_threader): New.
(back_threader::~back_threader): New.
(back_threader::maybe_register_path): New.
(back_threader::find_taken_edge): New.
(back_threader::find_taken_edge_switch): New.
(back_threader::find_taken_edge_cond): New.
(back_threader::resolve_def): New.
(back_threader::resolve_phi): New.
(back_threader::find_paths_to_names): New.
(back_threader::find_paths): New.
(dump_path): New.
(debug): New.
(thread_jumps::find_jump_threads_backwards): Call ranger threader.
(thread_jumps::find_jump_threads_backwards_with_ranger): New.
(pass_thread_jumps::execute): Abstract out code...
(try_thread_blocks): ...here.
* tree-ssa-threadedge.c (jump_threader::thread_outgoing_edges):
Abstract out threading candidate code to...
(single_succ_to_potentially_threadable_block): ...here.
* tree-ssa-threadedge.h (single_succ_to_potentially_threadable_block):
New.
* tree-ssa-threadupdate.c (register_jump_thread): Return boolean.
* tree-ssa-threadupdate.h (class jump_thread_path_registry):
Return bool from register_jump_thread.
libgomp/ChangeLog:
* testsuite/libgomp.graphite/force-parallel-4.c: Adjust for
threader.
* testsuite/libgomp.graphite/force-parallel-8.c: Same.
gcc/testsuite/ChangeLog:
* g++.dg/debug/dwarf2/deallocator.C: Adjust for threader.
* gcc.c-torture/compile/pr83510.c: Same.
* gcc.dg/analyzer/pr94851-2.c: Same.
* gcc.dg/loop-unswitch-2.c: Same.
* gcc.dg/old-style-asm-1.c: Same.
* gcc.dg/pr68317.c: Same.
* gcc.dg/pr97567-2.c: Same.
* gcc.dg/predict-9.c: Same.
* gcc.dg/shrink-wrap-loop.c: Same.
* gcc.dg/sibcall-1.c: Same.
* gcc.dg/tree-ssa/builtin-sprintf-3.c: Same.
* gcc.dg/tree-ssa/pr21001.c: Same.
* gcc.dg/tree-ssa/pr21294.c: Same.
* gcc.dg/tree-ssa/pr21417.c: Same.
* gcc.dg/tree-ssa/pr21458-2.c: Same.
* gcc.dg/tree-ssa/pr21563.c: Same.
* gcc.dg/tree-ssa/pr49039.c: Same.
* gcc.dg/tree-ssa/pr61839_1.c: Same.
* gcc.dg/tree-ssa/pr61839_3.c: Same.
* gcc.dg/tree-ssa/pr77445-2.c: Same.
* gcc.dg/tree-ssa/split-path-4.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-11.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-12.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-14.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-18.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-6.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.
* gcc.dg/tree-ssa/ssa-fre-48.c: Same.
* gcc.dg/tree-ssa/ssa-thread-11.c: Same.
* gcc.dg/tree-ssa/ssa-thread-12.c: Same.
* gcc.dg/tree-ssa/ssa-thread-14.c: Same.
* gcc.dg/tree-ssa/vrp02.c: Same.
* gcc.dg/tree-ssa/vrp03.c: Same.
* gcc.dg/tree-ssa/vrp05.c: Same.
* gcc.dg/tree-ssa/vrp06.c: Same.
* gcc.dg/tree-ssa/vrp07.c: Same.
* gcc.dg/tree-ssa/vrp09.c: Same.
* gcc.dg/tree-ssa/vrp19.c: Same.
* gcc.dg/tree-ssa/vrp20.c: Same.
* gcc.dg/tree-ssa/vrp33.c: Same.
* gcc.dg/uninit-pred-9_b.c: Same.
* gcc.dg/uninit-pr61112.c: Same.
* gcc.dg/vect/bb-slp-16.c: Same.
* gcc.target/i386/avx2-vect-aggressive.c: Same.
* gcc.dg/tree-ssa/ranger-threader-1.c: New test.
* gcc.dg/tree-ssa/ranger-threader-2.c: New test.
* gcc.dg/tree-ssa/ranger-threader-3.c: New test.
* gcc.dg/tree-ssa/ranger-threader-4.c: New test.
* gcc.dg/tree-ssa/ranger-threader-5.c: New test.
---
Change the default EVRP mode to ranger-only.
gcc/
* params.opt (param_evrp_mode): Change default.
gcc/testsuite/
* gcc.dg/pr80776-1.c: Remove xfail.
---
This patch makes IRA consider matching constraints more seriously: even
if there is at least one other alternative with a non-NO_REG register
class constraint, it will continue and check matching constraints in all
available alternatives and respect the matching constraint with the
preferred register class.
One typical case is a destructive FMA style instruction on rs6000.
Without this patch, for the mentioned FMA instruction, IRA won't respect
the matching constraint on VSX_REG since there are some alternatives with
FLOAT_REG which don't have a matching constraint.  That can cause extra
register copies, since reload later has to emit code to respect the
constraint.  This patch makes IRA respect the matching constraint on
VSX_REG, which is the preferred regclass, but excludes some cases where
for one preferred register class there can be two or more alternatives,
one of which has the matching constraint while another doesn't.  It also
considers the possibility of a free register copy.
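As a rough illustration of a matching constraint (generic GNU extended
asm with dummy operands, not the actual rs6000 VSX pattern):

  /* The "0" constraint ties the first input to output operand 0,
     modeling a destructive instruction whose result must land in the
     same register as the accumulator input.  */
  long
  madd_like (long acc, long a, long b)
  {
    asm ("# multiply-add %0, %2, %3"
         : "=r" (acc)
         : "0" (acc), "r" (a), "r" (b));
    return acc;
  }

If the allocator ignores such a tie and puts the input and the output in
different registers, reload must emit an extra copy, which is what
honoring the constraint up front avoids.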
With -Ofast and unrolling, this patch improves the SPEC2017 benchmarks
508.namd_r by +2.42% and 519.lbm_r by +2.43% on Power8, and 508.namd_r
by +3.02% and 519.lbm_r by +3.85% on Power9, without any remarkable
degradations.  It also improved things on SVE, as the testcase changes
show and as Richard confirmed.
Bootstrapped & regtested on powerpc64le-linux-gnu P9,
x86_64-redhat-linux and aarch64-linux-gnu.
gcc/ChangeLog:
PR rtl-optimization/100328
* doc/invoke.texi (ira-consider-dup-in-all-alts): Document new
parameter.
* ira.c (ira_get_dup_out_num): Adjust as parameter
param_ira_consider_dup_in_all_alts.
* params.opt (ira-consider-dup-in-all-alts): New.
* ira-conflicts.c (process_regs_for_copy): Add one parameter
single_input_op_has_cstr_p.
(get_freq_for_shuffle_copy): New function.
(add_insn_allocno_copies): Adjust as single_input_op_has_cstr_p.
* ira-int.h (ira_get_dup_out_num): Add one bool parameter.
gcc/testsuite/ChangeLog:
PR rtl-optimization/100328
* gcc.target/aarch64/sve/acle/asm/div_f16.c: Remove one xfail.
* gcc.target/aarch64/sve/acle/asm/div_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/div_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/divr_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/divr_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/divr_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mad_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mad_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mad_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mla_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mla_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mla_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mls_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mls_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mls_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/msb_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/msb_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/msb_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mulx_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mulx_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mulx_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmad_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmad_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmad_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmla_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmla_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmla_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmls_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmls_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmls_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmsb_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmsb_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmsb_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/sub_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/sub_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/sub_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/subr_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/subr_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/subr_f64.c: Likewise.
---
Use a sparse representation for the on-entry cache, and utilize it when
the number of basic blocks in the function exceeds
param_evrp_sparse_threshold.
PR tree-optimization/100299
* gimple-range-cache.cc (class sbr_sparse_bitmap): New.
(sbr_sparse_bitmap::sbr_sparse_bitmap): New.
(sbr_sparse_bitmap::bitmap_set_quad): New.
(sbr_sparse_bitmap::bitmap_get_quad): New.
(sbr_sparse_bitmap::set_bb_range): New.
(sbr_sparse_bitmap::get_bb_range): New.
(sbr_sparse_bitmap::bb_range_p): New.
(block_range_cache::block_range_cache): Initialize bitmap obstack.
(block_range_cache::~block_range_cache): Destruct obstack.
(block_range_cache::set_bb_range): Decide when to utilize the
sparse on-entry cache.
* gimple-range-cache.h (block_range_cache): Add bitmap obstack.
* params.opt (-param=evrp-sparse-threshold): New.
---
testsuite coverage [PR90115]
gcc/
PR middle-end/90115
* flag-types.h (enum openacc_privatization): New.
* params.opt (-param=openacc-privatization): New.
* doc/invoke.texi (openacc-privatization): Document it.
* omp-general.h (get_openacc_privatization_dump_flags): New
function.
* omp-low.c (oacc_privatization_candidate_p): Add diagnostics.
* omp-offload.c (execute_oacc_device_lower)
<IFN_UNIQUE_OACC_PRIVATE>: Re-work diagnostics.
* target.def (goacc.adjust_private_decl): Add 'location_t'
parameter.
* doc/tm.texi: Regenerate.
* config/gcn/gcn-protos.h (gcn_goacc_adjust_private_decl): Adjust.
* config/gcn/gcn-tree.c (gcn_goacc_adjust_private_decl): Likewise.
* config/nvptx/nvptx.c (nvptx_goacc_adjust_private_decl):
Likewise. Preserve it for...
(nvptx_goacc_expand_var_decl): ... use here.
gcc/testsuite/
PR middle-end/90115
* c-c++-common/goacc/privatization-1-compute-loop.c: New file.
* c-c++-common/goacc/privatization-1-compute.c: Likewise.
* c-c++-common/goacc/privatization-1-routine_gang-loop.c:
Likewise.
* c-c++-common/goacc/privatization-1-routine_gang.c: Likewise.
* gfortran.dg/goacc/privatization-1-compute-loop.f90: Likewise.
* gfortran.dg/goacc/privatization-1-compute.f90: Likewise.
* gfortran.dg/goacc/privatization-1-routine_gang-loop.f90:
Likewise.
* gfortran.dg/goacc/privatization-1-routine_gang.f90: Likewise.
* c-c++-common/goacc-gomp/nesting-1.c: Update.
* c-c++-common/goacc/private-reduction-1.c: Likewise.
* gfortran.dg/goacc/private-3.f95: Likewise.
libgomp/
PR middle-end/90115
* testsuite/libgomp.oacc-fortran/private-atomic-1-vector.f90: New
file.
* testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c: Update.
* testsuite/libgomp.oacc-c-c++-common/host_data-7.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-3.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-4.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-5.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-6.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-g-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-g-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-v-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-wv-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/private-atomic-1-gang.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/private-atomic-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/private-variables.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-4.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/static-variable-1.c:
Likewise.
* testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f: Likewise.
* testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f: Likewise.
* testsuite/libgomp.oacc-fortran/declare-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/host_data-5.F90: Likewise.
* testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-1.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-2.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-3.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-6.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-1.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-2.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-1.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-2.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-3.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-4.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-5.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-6.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-7.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/optional-private.f90: Likewise.
* testsuite/libgomp.oacc-fortran/parallel-dims.f90: Likewise.
* testsuite/libgomp.oacc-fortran/private-atomic-1-gang.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/private-atomic-1-worker.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/private-variables.f90: Likewise.
* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Likewise.
* testsuite/libgomp.oacc-fortran/routine-7.f90: Likewise.
---
This patch replaces the current hardcoded weight factor
50, which is applied by the loop vectorizer to the cost of
statements in an inner loop relative to the loop being
vectorized, with one newly added member inner_loop_cost_factor
in loop vinfo. It also introduces one parameter
vect-inner-loop-cost-factor whose default value is 50, and
is used to initialize the inner_loop_cost_factor member.
The motivation here is that if targets want to have one unique
function to gather some information in each add_stmt_cost call, no
matter whether it's placed before or after the cost tweaking part for
the inner loop, they may need to adjust (expand or shrink) the gathered
data by the factor.  While the factor is hardcoded, that is not easily
maintained.
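For illustration (example mine):

  /* If the outer loop is the one being vectorized, each statement of
     the inner loop is weighted by inner_loop_cost_factor (previously a
     hardcoded 50) to approximate its relative execution frequency.  */
  void
  f (float *a, int n, int m)
  {
    for (int i = 0; i < n; i++)    /* loop being vectorized */
      for (int j = 0; j < m; j++)  /* statements here cost ~factor x */
        a[i] += 1.0f;
  }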
Bootstrapped/regtested on powerpc64le-linux-gnu P9,
x86_64-redhat-linux and aarch64-linux-gnu.
gcc/ChangeLog:
* doc/invoke.texi (vect-inner-loop-cost-factor): Document new
parameter.
* params.opt (vect-inner-loop-cost-factor): New.
* targhooks.c (default_add_stmt_cost): Replace hardcoded factor
50 with LOOP_VINFO_INNER_LOOP_COST_FACTOR, include head file
tree-vectorizer.h and its required ones.
* config/aarch64/aarch64.c (aarch64_add_stmt_cost): Replace
hardcoded factor 50 with LOOP_VINFO_INNER_LOOP_COST_FACTOR.
* config/arm/arm.c (arm_add_stmt_cost): Likewise.
* config/i386/i386.c (ix86_add_stmt_cost): Likewise.
* config/rs6000/rs6000.c (rs6000_add_stmt_cost): Likewise.
* tree-vect-loop.c (vect_compute_single_scalar_iteration_cost):
Likewise.
(_loop_vec_info::_loop_vec_info): Init inner_loop_cost_factor.
* tree-vectorizer.h (_loop_vec_info): Add inner_loop_cost_factor.
(LOOP_VINFO_INNER_LOOP_COST_FACTOR): New macro.
---
gcc/ChangeLog:
* doc/invoke.texi: Fix typo.
* params.opt: Likewise.
---
Limit how many logical expressions GORI will look back through when
evaluating outgoing edge range.
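For example (mine), the kind of chain GORI walks back through:

  int
  f (int a, int b, int c, int d)
  {
    /* Computing the ranges of a..d on the true edge requires walking
       back through the whole chain of logical ANDs;
       ranger-logical-depth bounds the depth of that walk.  */
    if (a > 0 && b > 0 && c > 0 && d > 0)
      return a + b + c + d;  /* all four operands known positive here */
    return 0;
  }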
PR tree-optimization/100081
* gimple-range-cache.h (ranger_cache): Inherit from gori_compute
rather than gori_compute_cache.
* gimple-range-gori.cc (is_gimple_logical_p): Move to top of file.
(range_def_chain::m_logical_depth): New member.
(range_def_chain::range_def_chain): Initialize m_logical_depth.
(range_def_chain::get_def_chain): Don't build defchains through more
than LOGICAL_LIMIT logical expressions.
* params.opt (param_ranger_logical_depth): New.
---
This configuration knob is temporary, and isn't really meant to be exposed to
users.
gcc/
* params.opt (-param=openacc-kernels=): Add.
* omp-oacc-kernels-decompose.cc
(pass_omp_oacc_kernels_decompose::gate): Use it.
* doc/invoke.texi (-fopenacc-kernels=@var{mode}): Move...
(--param): ... here, 'openacc-kernels'.
gcc/c-family/
* c.opt (fopenacc-kernels=): Remove.
gcc/fortran/
* lang.opt (fopenacc-kernels=): Remove.
gcc/testsuite/
* c-c++-common/goacc/if-clause-2.c: '-fopenacc-kernels=[...]' ->
'--param=openacc-kernels=[...]'.
* c-c++-common/goacc/kernels-decompose-1.c: Likewise.
* c-c++-common/goacc/kernels-decompose-2.c: Likewise.
* c-c++-common/goacc/kernels-decompose-ice-1.c: Likewise.
* c-c++-common/goacc/kernels-decompose-ice-2.c: Likewise.
* gfortran.dg/goacc/kernels-decompose-1.f95: Likewise.
* gfortran.dg/goacc/kernels-decompose-2.f95: Likewise.
* gfortran.dg/goacc/kernels-tree.f95: Likewise.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c:
'-fopenacc-kernels=[...]' -> '--param=openacc-kernels=[...]'.
* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
Likewise.
* testsuite/libgomp.oacc-fortran/pr94358-1.f90: Likewise.
---
gcc/ChangeLog:
PR translation/99167
* params.opt: Fix typo.
---
gcc/ChangeLog:
* params.opt: Add 2 missing Param keywords.
---
The following puts a limit on the number of alias tests we do in
terminate_all_aliasing_chains which is quadratic in the number of
overall stores currently tracked. There is already a limit in
place on the maximum number of stores in a single chain so the
following adds a limit on the number of chains tracked. The
worst number of overall stores tracked from the defaults (64 and 64)
is then 4096 which when imposed as the sole limit for the testcase
still causes
store merging : 71.65 ( 56%)
because the testcase is somewhat degenerate with most chains
consisting only of a single store (and 25% of exactly three stores).
The single stores are all CLOBBERs at the point variables go out of
scope. Note unpatched we have
store merging : 308.60 ( 84%)
Limiting the number of chains to 64 brings this down to
store merging : 1.52 ( 3%)
which is more reasonable.  There are ideas on how to make
terminate_all_aliasing_chains cheaper, but for this degenerate case
they would not have any effect, so I'll defer those to GCC 12.
I'm not sure we want to have both --params; just keeping the more
to-the-point max-stores-to-track works, but makes the degenerate case
above slower.  I made its current default 1024, which for the testcase
(without limiting chains) results in 25% compile time and 20s, putting
it in the same ballpark as the next offender (which is PTA).
This is a regression on trunk and the GCC 10 branch btw.
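For reference, the kind of chain being tracked (example mine):

  struct S { char a, b, c, d; };

  /* The four byte stores form one store chain that merging rewrites
     into a single 32-bit store; the new params bound how many chains
     (and stores overall) are tracked at once.  */
  void
  set (struct S *s)
  {
    s->a = 1;
    s->b = 2;
    s->c = 3;
    s->d = 4;
  }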
2021-02-11 Richard Biener <rguenther@suse.de>
PR tree-optimization/38474
* params.opt (-param=max-store-chains-to-track=): New param.
(-param=max-stores-to-track=): Likewise.
* doc/invoke.texi (max-store-chains-to-track): Document.
(max-stores-to-track): Likewise.
* gimple-ssa-store-merging.c (pass_store_merging::m_n_chains):
New.
(pass_store_merging::m_n_stores): Likewise.
(pass_store_merging::terminate_and_process_chain): Update
m_n_stores and m_n_chains.
(pass_store_merging::process_store): Likewise. Terminate
oldest chains if the number of stores or chains get too large.
(imm_store_chain_info::terminate_and_process_chain): Dump
chain length.
---
This changes it from bytes to kB since its value is limited to
2147483648.
2021-01-29 Richard Biener <rguenther@suse.de>
* doc/invoke.texi (--param max-gcse-memory): Document unit
of size.
* gcse.c (gcse_or_cprop_is_too_expensive): Adjust.
* params.opt (--param max-gcse-memory): Adjust default and
document unit of size.
---
Here are the new parameter and instrumentation timers for modules.
gcc/
* params.opt (lazy-modules): New.
* timevar.def (TV_MODULE_IMPORT, TV_MODULE_EXPORT)
(TV_MODULE_MAPPER): New.
---
These flags can't be used at the same time as any of the other
sanitizers.
We add an equivalent flag to -static-libasan in -static-libhwasan to
ensure static linking.
The -fsanitize=kernel-hwaddress option is for compiling code targeting
the kernel.  Its defaults match the LLVM implementation and set some
other behaviors to work in the kernel (e.g. accounting for the fact
that the stack pointer will have 0xff in the top byte, and not calling
the userspace library initialisation routines).
The defaults are that we do not sanitize variables on the stack and
always recover from a detected bug.
Since we are introducing a few more conflicts between sanitizer flags,
we refactor the checking for such conflicts to use a helper function,
which makes the checks easier and more consistent.
We introduce a backend hook `targetm.memtag.can_tag_addresses` that
indicates to the mid-end whether a target has a feature like AArch64 TBI
where the top byte of an address is ignored.
Without this feature hwasan sanitization is not done.
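A rough sketch of the tagging idea hwasan relies on (illustrative only;
not the runtime's actual encoding, and assuming 64-bit pointers):

  #include <cstdint>

  /* With AArch64 TBI the top byte of an address is ignored by loads and
     stores, so a tag can live there and be checked against the shadow
     memory tag of the addressed granule.  */
  constexpr int TAG_SHIFT = 56;

  std::uint64_t
  set_tag (std::uint64_t addr, std::uint8_t tag)
  {
    return (addr & ~(std::uint64_t{0xff} << TAG_SHIFT))
           | (std::uint64_t{tag} << TAG_SHIFT);
  }

  std::uint8_t
  get_tag (std::uint64_t addr)
  {
    return static_cast<std::uint8_t> (addr >> TAG_SHIFT);
  }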
gcc/ChangeLog:
* common.opt (flag_sanitize_recover): Default for kernel
hwaddress.
(static-libhwasan): New cli option.
* config/aarch64/aarch64.c (aarch64_can_tag_addresses): New.
(TARGET_MEMTAG_CAN_TAG_ADDRESSES): New.
* config/gnu-user.h (LIBHWASAN_EARLY_SPEC): hwasan equivalent of
asan command line flags.
* cppbuiltin.c (define_builtin_macros_for_compilation_flags):
Add hwasan equivalent of __SANITIZE_ADDRESS__.
* doc/invoke.texi: Document hwasan command line flags.
* doc/tm.texi: Document new hook.
* doc/tm.texi.in: Document new hook.
* flag-types.h (enum sanitize_code): New sanitizer values.
* gcc.c (STATIC_LIBHWASAN_LIBS): New macro.
(LIBHWASAN_SPEC): New macro.
(LIBHWASAN_EARLY_SPEC): New macro.
(SANITIZER_EARLY_SPEC): Update to include hwasan.
(SANITIZER_SPEC): Update to include hwasan.
(sanitize_spec_function): Use hwasan options.
* opts.c (finish_options): Describe conflicts between address
sanitizers.
(find_sanitizer_argument): New.
(report_conflicting_sanitizer_options): New.
(sanitizer_opts): Introduce new sanitizer flags.
(common_handle_option): Add defaults for kernel sanitizer.
* params.opt (hwasan-instrument-stack): New.
(hwasan-random-frame-tag): New.
(hwasan-instrument-allocas): New.
(hwasan-instrument-reads): New.
(hwasan-instrument-writes): New.
(hwasan-instrument-mem-intrinsics): New.
* target.def (HOOK_PREFIX): Add new hook.
(can_tag_addresses): Add new hook under memtag prefix.
* targhooks.c (default_memtag_can_tag_addresses): New.
* targhooks.h (default_memtag_can_tag_addresses): New decl.
* toplev.c (process_options): Ensure hwasan only on
architectures that advertise the possibility.
---
gcc/ChangeLog:
* params.opt: Add missing dot.
---
This patch implements the IPA propagation part of EAF flags handling in
ipa-modref.  It extends the local analysis to collect a lattice
consisting of flags and escape points.  An SSA name escapes if it is
passed directly or indirectly to a function call.
If useful flags are found for a parameter, its escape list is stored
into the escape summaries.  Each call site is then annotated with info
on which function parameters escape to which argument of the function
call.
At IPA time we then perform iterative dataflow and produce final flags.
ipa-modref is still cheaper than pure-const when running on cc1plus
(about 2-3%, which is what every non-trivial pass accounts for) and the
dataflow converges in 1 or 2 iterations.
Local analysis does some work to avoid streaming escape points when they are
not useful to determine final flags (that is, local escape analysis determined
good enough flags).  For cc1plus there are 225k calls with a useful
escape summary.
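For intuition (example mine):

  void sink (int *);  /* may store its argument anywhere */

  int
  f (int *p, int *q)
  {
    sink (q);   /* q escapes: recorded as an escape point to sink's arg 0 */
    return *p;  /* p is only dereferenced, so it gets nonescaping flags */
  }

The final flags of q depend on what sink does with its first parameter,
which is exactly what the IPA dataflow over the escape summaries
computes.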
* ipa-modref.c (escape_point): New type.
(modref_lattice): New type.
(escape_entry): New type.
(escape_summary): New type.
(escape_summaries_t): New type.
(escape_summaries): New static variable.
(eaf_flags_useful_p): New function.
(modref_summary::useful_p): Add new check_flags
attribute; check eaf_flags for usefulness.
(modref_summary_lto): Add arg_flags.
(modref_summary_lto::useful_p): Add new check_flags
attribute; check eaf_flags for usefulness.
(dump_modref_edge_summaries): New function.
(remove_modref_edge_summaries): New function.
(ignore_retval_p): New predicate.
(ignore_stores_p): Also ignore for const.
(remove_summary): Call remove_modref_edge_summaries.
(modref_lattice::init): New member function.
(modref_lattice::release): New member function.
(modref_lattice::dump): New member function.
(modref_lattice::add_escape_point): New member function.
(modref_lattice::merge): Two new member functions.
(modref_lattice::merge_deref): New member functions.
(modref_lattice::merge_direct_load): New member function.
(modref_lattice::merge_direct_store): New member function.
(call_lhs_flags): Rename to ...
(merge_call_lhs_flags): ... this one; reimplement using
modreflattice.
(analyze_ssa_name_flags): Replace KNOWN_FLAGS param by LATTICE;
add IPA parameter; use modref_lattice.
(analyze_parms): New parameters IPA and SUMMARY_LTO; update for
modref_lattice; initialize escape_summary.
(analyze_function): Allocate escape_summaries; update uses of useful_p.
(modref_write_escape_summary): New function.
(modref_read_escape_summary): New function.
(modref_write): Write escape summary.
(read_section): Read escape summary.
(modref_read): Initialize escape_summaries.
(remap_arg_flags): New function.
(update_signature): Use it.
(escape_map): New structure.
(update_escape_summary_1, update_escape_summary): New functions.
(ipa_merge_modref_summary_after_inlining): Merge escape summaries.
(propagate_unknown_call): Do not remove useless summaries.
(remove_useless_summaries): Remove them here.
(modref_propagate_in_scc): Update; do not dump scc.
(modref_propagate_dump_scc): New function.
(modref_merge_call_site_flags): New function.
(modref_propagate_flags_in_scc): New function.
(pass_ipa_modref::execute): Use modref_propagate_flags_in_scc
and modref_propagate_dump_scc; delete escape_summaries.
(ipa_modref_c_finalize): Remove escape_summaries.
* ipa-modref.h (modref_summary): Update prototype of useful_p.
* params.opt (param=modref-max-escape-points): New param.
* doc/invoke.texi (modref-max-escape-points): Document.
---
Fixes:
FAIL: compiler driver --help=common option(s): "^ +-.*[^:.]$" absent from output: " --param=modref-max-depth= Maximum depth of DFS walk used by modref escape analysis"
gcc/ChangeLog:
* params.opt: All modref parameters were missing the Optimization and
Param keywords, as seen in the testsuite failure.
---
* params.opt (-param=modref-max-depth=): Add missing full stop.
---
A minimal patch for the EAF flags discovery.  It works only in local
ipa-modref and gives up on cyclic SSA graphs.  It roughly doubles the
number of pt_solution_includes disambiguations.
gcc/ChangeLog:
* gimple.c: Include ipa-modref-tree.h and ipa-modref.h.
(gimple_call_arg_flags): Use modref to determine flags.
* ipa-modref.c: Include gimple-ssa.h, tree-phinodes.h,
tree-ssa-operands.h, stringpool.h and tree-ssanames.h.
(analyze_ssa_name_flags): Declare.
(modref_summary::useful_p): Summary is also useful if arg flags are
known.
(dump_eaf_flags): New function.
(modref_summary::dump): Use it.
(get_modref_function_summary): Be ready for current_function_decl
being NULL.
(memory_access_to): New function.
(deref_flags): New function.
(call_lhs_flags): New function.
(analyze_parms): New function.
(analyze_function): Use it.
* ipa-modref.h (struct modref_summary): Add arg_flags.
* doc/invoke.texi (ipa-modref-max-depth): Document.
* params.opt (ipa-modref-max-depth): New param.
---
The recent change in this area limited hoist insertion iteration via a
param, but the following is IMHO better since we are not really
interested in PRE opportunities exposed by hoisting, but only the other
way around.  So this moves hoist insertion after the PRE iteration has
finished and removes hoist insertion iteration altogether.
2020-11-11 Richard Biener <rguenther@suse.de>
PR tree-optimization/97623
* params.opt (-param=max-pre-hoist-insert-iterations): Remove
again.
* doc/invoke.texi (max-pre-hoist-insert-iterations): Likewise.
* tree-ssa-pre.c (insert): Move hoist insertion after PRE
insertion iteration and do not iterate it.
* gcc.dg/tree-ssa/ssa-hoist-3.c: Adjust.
* gcc.dg/tree-ssa/ssa-hoist-7.c: Likewise.
* gcc.dg/tree-ssa/ssa-pre-30.c: Likewise.
---
This limits insertion iteration caused by PRE insertions generating
hoist insertion opportunities and vice versa. The patch limits
the hoist insertion iterations to three by default.
2020-11-03 Richard Biener <rguenther@suse.de>
PR tree-optimization/97623
* params.opt (-param=max-pre-hoist-insert-iterations): New.
* doc/invoke.texi (max-pre-hoist-insert-iterations): Document.
* tree-ssa-pre.c (insert): Do at most max-pre-hoist-insert-iterations
hoist insert iterations.