aboutsummaryrefslogtreecommitdiff
path: root/gcc/params.opt
AgeCommit message (Collapse)AuthorFilesLines
2022-03-18Allow (void *) 0xdeadbeef accesses without warnings [PR99578]Jakub Jelinek1-0/+4
Starting with GCC11 we keep emitting false positive -Warray-bounds or -Wstringop-overflow etc. warnings on widely used *(type *)0x12345000 style accesses (or memory/string routines to such pointers). This is a standard programming style supported by all C/C++ compilers I've ever tried, used mostly in kernel or DSP programming, but sometimes also together with mmap MAP_FIXED when certain things, often I/O registers but could be anything else too are known to be present at fixed addresses. Such INTEGER_CST addresses can appear in code either because a user used it like that (in which case it is fine) or because somebody used pointer arithmetics (including &((struct whatever *)NULL)->field) on a NULL pointer. The middle-end warning code wrongly assumes that the latter case is what is very likely, while the former is unlikely and users should change their code. The following patch adds a min-pagesize param defaulting to 4KB, and treats INTEGER_CST addresses smaller than that as assumed results of pointer arithmetics from NULL while addresses equal or larger than that as expected user constant addresses. For GCC 13 we can represent results from pointer arithmetics on NULL using &MEM[(void*)0 + offset] instead of (void*)offset INTEGER_CSTs. 2022-03-18 Jakub Jelinek <jakub@redhat.com> PR middle-end/99578 PR middle-end/100680 PR tree-optimization/100834 * params.opt (--param=min-pagesize=): New parameter. * pointer-query.cc (compute_objsize_r) <case ARRAY_REF>: Formatting fix. (compute_objsize_r) <case INTEGER_CST>: Use maximum object size instead of zero for pointer constants equal or larger than min-pagesize. * gcc.dg/tree-ssa/pr99578-1.c: New test. * gcc.dg/pr99578-1.c: New test. * gcc.dg/pr99578-2.c: New test. * gcc.dg/pr99578-3.c: New test. * gcc.dg/pr100680.c: New test. * gcc.dg/pr100834.c: New test.
2022-03-08Fix typo in gcc/params.opt.Eric Gallager1-1/+1
Addresses one of the points raised in #104552; checking in under the "obvious" rule. gcc/ChangeLog: PR translation/104552 * params.opt: Fix typo.
2022-03-08tree-optimization/84201 - add --param vect-induction-floatRichard Biener1-0/+4
This adds a --param to allow disabling of vectorization of floating point inductions. Ontop of -Ofast this should allow 549.fotonik3d_r to not miscompare. 2022-03-08 Richard Biener <rguenther@suse.de> PR tree-optimization/84201 * params.opt (-param=vect-induction-float): Add. * doc/invoke.texi (vect-induction-float): Document. * tree-vect-loop.cc (vectorizable_induction): Honor param_vect_induction_float. * gcc.dg/vect/pr84201.c: New testcase.
2022-03-08params: Remove repeated word "that" in parameter descriptionMartin Jambor1-1/+1
One of the mistakes reported in PR 104552 is repeated "that" in description of ipa-cp-recursive-freq-factor which I introduced. This patch removes one of them. gcc/ChangeLog: 2022-03-07 Martin Jambor <mjambor@suse.cz> PR translation/104552 * params.opt (ipa-cp-recursive-freq-factor): Remove repeated word "that" in the description.
2022-01-18Limit the number of relations registered per basic block.Andrew MacLeod1-0/+4
In pathological cases, the number of transitive relations being added is potentially quadratic. Lookups for relations in a block is linear in nature, so simply limit the number of relations to some reasonable number. PR tree-optimization/104038 * doc/invoke.texi (relation-block-limit): New. * params.opt (relation-block-limit): New. * value-relation.cc (dom_oracle::register_relation): Check for NULL record before invoking transitive registery. (dom_oracle::set_one_relation): Check limit before creating record. (dom_oracle::register_transitives): Stop when no record created. * value-relation.h (relation_chain_head::m_num_relations): New.
2022-01-03Update copyright years.Jakub Jelinek1-1/+1
2021-12-10param: Add missing . in description.Martin Liska1-1/+1
Fixes: FAIL: compiler driver --help=param option(s): "^ +-.*[^:.]$" absent from output: " --param=max-inline-functions-called-once-loop-depth= Maximum loop depth of a call which is considered for inlining functions called once" FAIL: compiler driver --help=params option(s): "[^.]$" absent from output: "e" gcc/ChangeLog: * params.opt: Add missing dot.
2021-12-09Limit inlining functions called onceJan Hubicka1-0/+8
as dicussed in PR ipa/103454 there are several benchmarks that regresses for -finline-functions-called once. Runtmes: - tramp3d with -Ofast. 31% - exchange2 with -Ofast 11-21% - roms O2 9%-10% - tonto 2.5-3.5% with LTO Build times: - specfp2006 41% (mostly wrf that builds 71% faster) - specint2006 1.5-3% - specfp2017 64% (again mostly wrf) - specint2017 2.5-3.5% This patch adds two params to tweak the behaviour: 1) max-inline-functions-called-once-loop-depth limiting the loop depth (this is useful primarily for exchange where the inlined function is in loop depth 9) 2) max-inline-functions-called-once-insns We already have large-function-insns/growth parameters, but these are limiting also inlining small functions, so reducing them will regress very large functions that are hot. Because inlining functions called once is meant just as a cleanup pass I think it makes sense to have separate limit for it. gcc/ChangeLog: 2021-12-09 Jan Hubicka <hubicka@ucw.cz> * doc/invoke.texi (max-inline-functions-called-once-loop-depth, max-inline-functions-called-once-insns): New parameters. * ipa-inline.c (check_callers): Handle param_inline_functions_called_once_loop_depth and param_inline_functions_called_once_insns. (edge_badness): Fix linebreaks. * params.opt (param=max-inline-functions-called-once-loop-depth, param=max-inline-functions-called-once-insn): New params.
2021-11-09Remove TDF_THREADING flag in favor of param.Aldy Hernandez1-0/+13
I am returning a TDF_* flag to the queue of available entries as I am unconvinced that we need to burn an entire flag for internal debugging constructs, especially since we seem to be running out of them. I've added a --param=threader-debug entry similar to the one we use for ranger debugging. Currently this only affects the backward threader, but since the DOM threader is an outlier and on the chopping block, I avoided using the "backward" name. Tested on x86-64 Linux. gcc/ChangeLog: * dumpfile.c (dump_options): Remove TDF_THREADING entry. * dumpfile.h (enum dump_flag): Remove TDF_THREADING and adjust remaining entries. * flag-types.h (enum threader_debug): New. * gimple-range-path.cc (DEBUG_SOLVER): Use param_threader_debug. * params.opt: Add entry for --param=threader-debug=.
2021-11-07Limit range of modref-max-depthJan Hubicka1-2/+2
gcc/ChangeLog: PR ipa/103055 * params.opt (modref-max-depth): Add range. (modref-max-adjustments): Fix range.
2021-11-03Switch vrp2 to ranger.Andrew MacLeod1-1/+1
This patch flips the default for the VRP2 pass to execute ranger vrp rather than the assert_expr version of VRP. * params.opt (param_vrp2_mode): Make ranger the default for VRP2.
2021-11-01Fix negative integer range for UInteger.Martin Liska1-2/+2
gcc/ChangeLog: * opt-functions.awk: Add new sanity checking. * optc-gen.awk: Add new argument to integer_range_info. * params.opt: Update 2 params which have negative IntegerRange.
2021-10-27ipa-cp: Select saner profile count to base heuristics onMartin Jambor1-0/+4
When profile feedback is available, IPA-CP takes the count of the hottest node and then evaluates all call contexts relative to it. This means that typically almost no clones for specialized contexts are ever created because the maximum is some special function, called from everywhere (that is likely to get inlined anyway) and all the examined edges look cold compared to it. This patch changes the selection. It simply sorts counts of all edges eligible for cloning in a vector and then picks the count in 90th percentile (the actual number is configurable via a parameter). I also tried more complex approaches which were summing the counts and picking the edge which together with all hotter edges accounted for a given portion of the total sum of all edge counts. But first it was not apparently clear to me that they make more logical sense that the simple method and practically I always also had to ignore a few percent of the hottest edges with really extreme counts (looking at bash and python). And when I had to do that anyway, it seemed simpler to just "ignore" more and take the first non-ignored count as the base. Nevertheless, if people think some more sophisticated method should be used anyway, I am willing to be persuaded. But this patch is a clear improvement over the current situation. gcc/ChangeLog: 2021-10-26 Martin Jambor <mjambor@suse.cz> * params.opt (param_ipa_cp_profile_count_base): New parameter. * doc/invoke.texi (Optimize Options): Add entry for ipa-cp-profile-count-base. * ipa-cp.c (max_count): Replace with base_count, replace all occurrences too, unless otherwise stated. (ipcp_cloning_candidate_p): identify mostly-directly called functions based on their counts, not max_count. (compare_edge_profile_counts): New function. (ipcp_propagate_stage): Instead of setting max_count, find the appropriate edge count in a sorted vector of counts of eligible edges and make it the base_count.
2021-10-25Tweak ranger-debug flags.Andrew MacLeod1-1/+1
Set the 3 possible flags as all individual bits and group for options. * flag-types.h (enum ranger_debug): Adjust values. * params.opt (ranger_debug): Ditto.
2021-10-21Split --param=evrp-mode into evrp-mode and ranger-debug.Andrew MacLeod1-21/+31
With Ranger being used in more than EVRP, the debug output should no longer be tied up with the EVRP mode flag. * doc/invoke.texi (ranger-debug): Document. * flag-types.h (enum ranger_debug): New. (enum evrp_mode): Remove debug values. * gimple-range-cache.cc (DEBUG_RANGE_CACHE): Use new debug flag. * gimple-range-gori.cc (gori_compute::gori_compute): Ditto. * gimple-range.cc (gimple_ranger::gimple_ranger): Ditto. * gimple-ssa-evrp.c (hybrid_folder::choose_value): Ditto. (execute_early_vrp): Use evrp-mode directly. * params.opt (enum evrp_mode): Remove debug values. (ranger-debug): New. (ranger-logical-depth): Relocate to be in alphabetical order.
2021-10-21Add --param=vrp1-mode and --param=vrp2-mode.Andrew MacLeod1-0/+17
Add 2 new params to select between VRP and RANGER to be used for each pass. * doc/invoke.texi: (vrp1-mode, vrp2-mode): Document. * flag-types.h: (enum vrp_mode): New. * params.opt: (vrp1-mode, vrp2-mode): New. * tree-vrp.c (vrp_pass_num): New. (pass_vrp::pass_vrp): Set pass number. (pass_vrp::execute): Choose which VRP mode to execute.
2021-10-20Restore --param=max-fsm-thread-lengthAldy Hernandez1-0/+4
The removal of --param=max-fsm-thread-length is causing code explosion. I thought that --param=max-fsm-thread-path-insns was a better gague for path profitability than raw BB length, but it turns out that we don't take into account PHIs when estimating the number of statements. In this PR, we have a sequence of very large PHIs that have us traversing extremely large paths that blow up the compilation. We could fix this a couple of different ways. We could avoid traversing more than a certain number of PHI arguments, or ignore large PHIs altogether. The old implementation certainly had this knob, and we could cut things off before we even got to the ranger. We could also adjust the instruction estimation to take into account PHIs, but I'm sure we'll mess something else in the process ;-). The easiest thing to do is just restore the knob. At a later time we could tweak this further, for instance, disregarding empty blocks in the count. BTW, this is the reason I didn't chop things off in the lowlevel registry for all threaders: the forward threader can't really explore too deep paths, but it could theoretically get there while threading over empty blocks. This fixes 102814, 102852, and I bet it solves the Linux kernel cross compile issue. Tested on x86-64 Linux. gcc/ChangeLog: PR tree-optimization/102814 * doc/invoke.texi: Document --param=max-fsm-thread-length. * params.opt: Add --param=max-fsm-thread-length. * tree-ssa-threadbackward.c (back_threader_profitability::profitable_path_p): Fail on paths longer than max-fsm-thread-length.
2021-10-14Cleanup --params for backward threader.Aldy Hernandez1-12/+0
The new backward threader makes some of the --param knobs used to control it questionable at best or no longer applicable at worst. The fsm-maximum-phi-arguments param is unused and can be removed. The max-fsm-thread-length param is block based which is a bit redundant, since we already restrict paths based on instruction estimates. The max-fsm-thread-paths restricts the total number of threadable paths in a function. We probably don't need this. Besides, the forward threader has no such restriction. Tested on x86-64 Linux. gcc/ChangeLog: * doc/invoke.texi: Remove max-fsm-thread-length, max-fsm-thread-paths, and fsm-maximum-phi-arguments. * params.opt: Same. * tree-ssa-threadbackward.c (back_threader::back_threader): Remove argument. (back_threader_registry::back_threader_registry): Same. (back_threader_profitability::profitable_path_p): Remove param_max_fsm_thread-length. (back_threader_registry::register_path): Remove m_max_allowable_paths.
2021-10-14ipa-cp: Propagation boost for recursion generated valuesMartin Jambor1-0/+4
Recursive call graph edges, even when they are hot and important for the compiled program, can never have frequency bigger than one, even when the actual time savings in the next recursion call are not realized just once but depend on the depth of recursion. The current IPA-CP effect propagation code did not take that into account and just used the frequency, thus severely underestimating the effect. This patch artificially boosts values taking part in such calls. If a value feeds into itself through a recursive call, the frequency of the edge is multiplied by a parameter with default value of 6, basically assuming that the recursion will take place 6 times. This value can of course be subject to change. Moreover, values which do not feed into themselves but which were generated for a self-recursive call with an arithmetic pass-function (aka the 548.exchange "hack" which however is generally applicable for recursive functions which count the recursion depth in a parameter) have the edge frequency multiplied as many times as there are generated values in the chain. In essence, we will assume they are all useful. This patch partially fixes the current situation when we fail to optimize 548.exchange with PGO. In the benchmark one recursive edge count overwhelmingly dominates all other counts in the program and so we fail to perform the first cloning (for the nonrecursive entry call) because it looks totally insignificant. gcc/ChangeLog: 2021-07-16 Martin Jambor <mjambor@suse.cz> * params.opt (ipa-cp-recursive-freq-factor): New. * ipa-cp.c (ipcp_value): Switch to inline initialization. New members scc_no, self_recursion_generated_level, same_scc and self_recursion_generated_p. (ipcp_lattice::add_value): Replaced parameter unlimited with same_lat_gen_level, usit it determine limit of values and store it to the value. (ipcp_lattice<valtype>::print): Dump the new fileds. (allocate_and_init_ipcp_value): Take same_lat_gen_level as a new parameter and store it to the new value. (self_recursively_generated_p): Removed. (propagate_vals_across_arith_jfunc): Use self_recursion_generated_p instead of self_recursively_generated_p, store self generation level to such values. (value_topo_info<valtype>::add_val): Set scc_no. (value_topo_info<valtype>::propagate_effects): Multiply frequencies of recursively feeding values and self generated values by appropriate new factors.
2021-10-06Introduce a param-switch-limit for EVRP.Andrew MacLeod1-0/+4
Very large switches cause a lot of range calculations with multiple subranges to happen. This can cause quadratic or even exponetial time increases in large testcases. This patch introduces a param variable to limit the size of switches EVRP will process. * gimple-range-edge.cc (gimple_outgoing_range::gimple_outgoing_range): Add parameter to limit size when recognizing switches. (gimple_outgoing_range::edge_range_p): Check size limit. * gimple-range-edge.h (gimple_outgoing_range): Add size field. * gimple-range-gori.cc (gori_map::calculate_gori): Ignore switches that exceed the size limit. (gori_compute::gori_compute): Add initializer. * params.opt (evrp-switch-limit): New. * doc/invoke.texi: Update docs.
2021-09-13c++: implement C++17 hardware interference sizeJason Merrill1-0/+16
The last missing piece of the C++17 standard library is the hardware intereference size constants. Much of the delay in implementing these has been due to uncertainty about what the right values are, and even whether there is a single constant value that is suitable; the destructive interference size is intended to be used in structure layout, so program ABIs will depend on it. In principle, both of these values should be the same as the target's L1 cache line size. When compiling for a generic target that is intended to support a range of target CPUs with different cache line sizes, the constructive size should probably be the minimum size, and the destructive size the maximum, unless you are constrained by ABI compatibility with previous code. From discussion on gcc-patches, I've come to the conclusion that the solution to the difficulty of choosing stable values is to give up on it, and instead encourage only uses where ABI stability is unimportant: in particular, uses where the ABI is shared at most between translation units built at the same time with the same flags. To that end, I've added a warning for any use of the constant value of std::hardware_destructive_interference_size in a header or module export. Appropriate uses within a project can disable the warning. A previous iteration of this patch included an -finterference-tune flag to make the value vary with -mtune; this iteration makes that the default behavior, which should be appropriate for all reasonable uses of the variable. The previous default of "stable-ish" seems to me likely to have been more of an attractive nuisance; since we can't promise actual stability, we should instead make proper uses more convenient. JF Bastien's implementation proposal is summarized at https://github.com/itanium-cxx-abi/cxx-abi/issues/74 I implement this by adding new --params for the two sizes. Targets can override these values in targetm.target_option.override() to support a range of values for the generic target; otherwise, both will default to the L1 cache line size. 64 bytes still seems correct for all x86. I'm not sure why he proposed 64/64 for generic 32-bit ARM, since the Cortex A9 has a 32-byte cache line, so I'd think 32/64 would make more sense. He proposed 64/128 for generic AArch64, but since the A64FX now has a 256B cache line, I've changed that to 64/256. Other arch maintainers are invited to set ranges for their generic targets if that seems better than using the default cache line size for both values. With the above choice to reject stability as a goal, getting these values "right" is now just a matter of what we want the default optimization to be, and we can feel free to adjust them as CPUs with different cache lines become more and less common. gcc/ChangeLog: * params.opt: Add destructive-interference-size and constructive-interference-size. * doc/invoke.texi: Document them. * config/aarch64/aarch64.c (aarch64_override_options_internal): Set them. * config/arm/arm.c (arm_option_override): Set them. * config/i386/i386-options.c (ix86_option_override_internal): Set them. gcc/c-family/ChangeLog: * c.opt: Add -Winterference-size. * c-cppbuiltin.c (cpp_atomic_builtins): Add __GCC_DESTRUCTIVE_SIZE and __GCC_CONSTRUCTIVE_SIZE. gcc/cp/ChangeLog: * constexpr.c (maybe_warn_about_constant_value): Complain about std::hardware_destructive_interference_size. (cxx_eval_constant_expression): Call it. * decl.c (cxx_init_decl_processing): Check --param *-interference-size values. libstdc++-v3/ChangeLog: * include/std/version: Define __cpp_lib_hardware_interference_size. * libsupc++/new: Define hardware interference size variables. gcc/testsuite/ChangeLog: * g++.dg/warn/Winterference.H: New file. * g++.dg/warn/Winterference.C: New test. * g++.target/aarch64/interference.C: New test. * g++.target/arm/interference.C: New test. * g++.target/i386/interference.C: New test.
2021-08-26Add full stop to params.opt.Jan Hubicka1-1/+1
gcc/ChangeLog: * params.opt: (modref-max-adjustments): Add full stop.
2021-08-25Merge load/stores in ipa-modref summariesJan Hubicka1-0/+4
this patch adds logic needed to merge neighbouring accesses in ipa-modref summaries. This helps analyzing array initializers and similar code. It is bit of work, since it breaks the fact that modref tree makes a good lattice for dataflow: the access ranges can be extended indefinitely. For this reason I added counter tracking number of adjustments and a cap to limit them during the dataflow. gcc/ChangeLog: * doc/invoke.texi: Document --param modref-max-adjustments. * ipa-modref-tree.c (test_insert_search_collapse): Update. (test_merge): Update. * ipa-modref-tree.h (struct modref_access_node): Add adjustments; (modref_access_node::operator==): Fix handling of access ranges. (modref_access_node::contains): Constify parameter; handle also mismatched parm offsets. (modref_access_node::update): New function. (modref_access_node::merge): New function. (unspecified_modref_access_node): Update constructor. (modref_ref_node::insert_access): Add record_adjustments parameter; handle merging. (modref_ref_node::try_merge_with): New private function. (modref_tree::insert): New record_adjustments parameter. (modref_tree::merge): New record_adjustments parameter. (modref_tree::copy_from): Update. * ipa-modref.c (dump_access): Dump adjustments field. (get_access): Update constructor. (record_access): Update call of insert. (record_access_lto): Update call of insert. (merge_call_side_effects): Add record_adjustments parameter. (get_access_for_fnspec): Update. (process_fnspec): Update. (analyze_call): Update. (analyze_function): Update. (read_modref_records): Update. (ipa_merge_modref_summary_after_inlining): Update. (propagate_unknown_call): Update. (modref_propagate_in_scc): Update. * params.opt (param-max-modref-adjustments=): New. gcc/testsuite/ChangeLog: * gcc.dg/ipa/modref-1.c: Update testcase. * gcc.dg/tree-ssa/modref-4.c: Update testcase. * gcc.dg/tree-ssa/modref-8.c: New test.
2021-08-24Adjust inner loop cost scalingRichard Biener1-2/+2
This makes use of the estimated number of iterations of the inner loop to limit --param vect-inner-loop-cost-factor scaling. It also reduces the maximum value of vect-inner-loop-cost-factor to 10000 making it less likely to cause overflow of costs. 2021-08-23 Richard Biener <rguenther@suse.de> * doc/invoke.texi (vect-inner-loop-cost-factor): Adjust. * params.opt (--param vect-inner-loop-cost-factor): Adjust maximum value. * tree-vect-loop.c (vect_analyze_loop_form): Initialize inner_loop_cost_factor to the minimum of the estimated number of iterations of the inner loop and vect-inner-loop-cost-factor.
2021-08-17Change evrp-mode options.Andrew MacLeod1-4/+7
Remove tracing in hybrid mode. Add trace/gori/cache tracing options. tracing options are now 'trace', 'gori', 'cache', or all combined in 'debug' * flag-types.h (enum evrp_mode): Adjust evrp-mode values. * gimple-range-cache.cc (DEBUG_RANGE_CACHE): Relocate from. * gimple-range-trace.h (DEBUG_RANGE_CACHE): Here. * params.opt (--param=evrp-mode): Adjust options.
2021-08-12Remove legacy back threader.Aldy Hernandez1-13/+0
At this point I don't see any use for the legacy mode, which I had originally left in place during the transition. This patch removes the legacy back threader, and cleans up the code a bit. There are no functional changes to the non-legacy code. Tested on x86-64 Linux. gcc/ChangeLog: * doc/invoke.texi: Remove docs for threader-mode param. * flag-types.h (enum threader_mode): Remove. * params.opt: Remove threader-mode param. * tree-ssa-threadbackward.c (class back_threader): Remove path_is_unreachable_p. Make find_paths private. Add maybe_thread and thread_through_all_blocks. Remove reference marker for m_registry. Remove reference marker for m_profit. (back_threader::back_threader): Adjust for registry and profit not being references. (dump_path): Move down. (debug): Move down. (class thread_jumps): Remove. (class back_threader_registry): Remove m_all_paths. Remove destructor. (thread_jumps::thread_through_all_blocks): Move to back_threader class. (fsm_find_thread_path): Remove (back_threader::maybe_thread): New. (back_threader::thread_through_all_blocks): Move from thread_jumps. (back_threader_registry::back_threader_registry): Remove m_all_paths. (back_threader_registry::~back_threader_registry): Remove. (thread_jumps::find_taken_edge): Remove. (thread_jumps::check_subpath_and_update_thread_path): Remove. (thread_jumps::maybe_register_path): Remove. (thread_jumps::handle_phi): Remove. (handle_assignment_p): Remove. (thread_jumps::handle_assignment): Remove. (thread_jumps::fsm_find_control_statement_thread_paths): Remove. (thread_jumps::find_jump_threads_backwards): Remove. (thread_jumps::find_jump_threads_backwards_with_ranger): Remove. (try_thread_blocks): Rename find_jump_threads_backwards to maybe_thread. (pass_early_thread_jumps::execute): Same. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Remove call into the legacy code and adjust for ranger threader.
2021-08-02Remove --param=threader-iterative.Aldy Hernandez1-4/+0
This was meant to be an internal construct, but I see folks are using it and submitting PRs against it. Let's just remove this to avoid further confusion. Tested on x86-64 Linux. gcc/ChangeLog: PR tree-optimization/101724 * params.opt: Remove --param=threader-iterative. * tree-ssa-threadbackward.c (pass_thread_jumps::execute): Remove iterative mode.
2021-07-29Backwards jump threader rewrite with ranger.Aldy Hernandez1-0/+17
This is a rewrite of the backwards threader with a ranger based solver. The code is divided into two parts: the path solver in gimple-range-path.*, and the path discovery bits in tree-ssa-threadbackward.c. The legacy code is still available with --param=threader-mode=legacy, but will be removed shortly after. gcc/ChangeLog: * Makefile.in (tree-ssa-loop-im.o-warn): New. * flag-types.h (enum threader_mode): New. * params.opt: Add entry for --param=threader-mode. * tree-ssa-threadbackward.c (THREADER_ITERATIVE_MODE): New. (class back_threader): New. (back_threader::back_threader): New. (back_threader::~back_threader): New. (back_threader::maybe_register_path): New. (back_threader::find_taken_edge): New. (back_threader::find_taken_edge_switch): New. (back_threader::find_taken_edge_cond): New. (back_threader::resolve_def): New. (back_threader::resolve_phi): New. (back_threader::find_paths_to_names): New. (back_threader::find_paths): New. (dump_path): New. (debug): New. (thread_jumps::find_jump_threads_backwards): Call ranger threader. (thread_jumps::find_jump_threads_backwards_with_ranger): New. (pass_thread_jumps::execute): Abstract out code... (try_thread_blocks): ...here. * tree-ssa-threadedge.c (jump_threader::thread_outgoing_edges): Abstract out threading candidate code to... (single_succ_to_potentially_threadable_block): ...here. * tree-ssa-threadedge.h (single_succ_to_potentially_threadable_block): New. * tree-ssa-threadupdate.c (register_jump_thread): Return boolean. * tree-ssa-threadupdate.h (class jump_thread_path_registry): Return bool from register_jump_thread. libgomp/ChangeLog: * testsuite/libgomp.graphite/force-parallel-4.c: Adjust for threader. * testsuite/libgomp.graphite/force-parallel-8.c: Same. gcc/testsuite/ChangeLog: * g++.dg/debug/dwarf2/deallocator.C: Adjust for threader. * gcc.c-torture/compile/pr83510.c: Same. * dg.dg/analyzer/pr94851-2.c: Same. * gcc.dg/loop-unswitch-2.c: Same. * gcc.dg/old-style-asm-1.c: Same. * gcc.dg/pr68317.c: Same. * gcc.dg/pr97567-2.c: Same. * gcc.dg/predict-9.c: Same. * gcc.dg/shrink-wrap-loop.c: Same. * gcc.dg/sibcall-1.c: Same. * gcc.dg/tree-ssa/builtin-sprintf-3.c: Same. * gcc.dg/tree-ssa/pr21001.c: Same. * gcc.dg/tree-ssa/pr21294.c: Same. * gcc.dg/tree-ssa/pr21417.c: Same. * gcc.dg/tree-ssa/pr21458-2.c: Same. * gcc.dg/tree-ssa/pr21563.c: Same. * gcc.dg/tree-ssa/pr49039.c: Same. * gcc.dg/tree-ssa/pr61839_1.c: Same. * gcc.dg/tree-ssa/pr61839_3.c: Same. * gcc.dg/tree-ssa/pr77445-2.c: Same. * gcc.dg/tree-ssa/split-path-4.c: Same. * gcc.dg/tree-ssa/ssa-dom-thread-11.c: Same. * gcc.dg/tree-ssa/ssa-dom-thread-12.c: Same. * gcc.dg/tree-ssa/ssa-dom-thread-14.c: Same. * gcc.dg/tree-ssa/ssa-dom-thread-18.c: Same. * gcc.dg/tree-ssa/ssa-dom-thread-6.c: Same. * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same. * gcc.dg/tree-ssa/ssa-fre-48.c: Same. * gcc.dg/tree-ssa/ssa-thread-11.c: Same. * gcc.dg/tree-ssa/ssa-thread-12.c: Same. * gcc.dg/tree-ssa/ssa-thread-14.c: Same. * gcc.dg/tree-ssa/vrp02.c: Same. * gcc.dg/tree-ssa/vrp03.c: Same. * gcc.dg/tree-ssa/vrp05.c: Same. * gcc.dg/tree-ssa/vrp06.c: Same. * gcc.dg/tree-ssa/vrp07.c: Same. * gcc.dg/tree-ssa/vrp09.c: Same. * gcc.dg/tree-ssa/vrp19.c: Same. * gcc.dg/tree-ssa/vrp20.c: Same. * gcc.dg/tree-ssa/vrp33.c: Same. * gcc.dg/uninit-pred-9_b.c: Same. * gcc.dg/uninit-pr61112.c: Same. * gcc.dg/vect/bb-slp-16.c: Same. * gcc.target/i386/avx2-vect-aggressive.c: Same. * gcc.dg/tree-ssa/ranger-threader-1.c: New test. * gcc.dg/tree-ssa/ranger-threader-2.c: New test. * gcc.dg/tree-ssa/ranger-threader-3.c: New test. * gcc.dg/tree-ssa/ranger-threader-4.c: New test. * gcc.dg/tree-ssa/ranger-threader-5.c: New test.
2021-07-14Turn hybrid mode off, default to ranger-only mode for EVRP.Andrew MacLeod1-1/+1
Change the default EVRP mode to ranger-only. gcc/ * params.opt (param_evrp_mode): Change default. gcc/testsuite/ * gcc.dg/pr80776-1.c: Remove xfail.
2021-07-05ira: Support more matching constraint forms with param [PR100328]Kewen Lin1-0/+4
This patch is to make IRA consider matching constraint heavily, even if there is at least one other alternative with non-NO_REG register class constraint, it will continue and check matching constraint in all available alternatives and respect the matching constraint with preferred register class. One typical case is destructive FMA style instruction on rs6000. Without this patch, for the mentioned FMA instruction, IRA won't respect the matching constraint on VSX_REG since there are some alternative with FLOAT_REG which doesn't have matching constraint. It can cause extra register copies since later reload has to make code to respect the constraint. This patch make IRA respect this matching constraint on VSX_REG which is the preferred regclass, but it excludes some cases where for one preferred register class there can be two or more alternatives, one of them has the matching constraint, while another doesn't have. It also considers the possibility of free register copy. With option Ofast unroll, this patch can help to improve SPEC2017 bmk 508.namd_r +2.42% and 519.lbm_r +2.43% on Power8 while 508.namd_r +3.02% and 519.lbm_r +3.85% on Power9 without any remarkable degradations. It also improved something on SVE as testcase changes showed and Richard's confirmation. Bootstrapped & regtested on powerpc64le-linux-gnu P9, x86_64-redhat-linux and aarch64-linux-gnu. gcc/ChangeLog: PR rtl-optimization/100328 * doc/invoke.texi (ira-consider-dup-in-all-alts): Document new parameter. * ira.c (ira_get_dup_out_num): Adjust as parameter param_ira_consider_dup_in_all_alts. * params.opt (ira-consider-dup-in-all-alts): New. * ira-conflicts.c (process_regs_for_copy): Add one parameter single_input_op_has_cstr_p. (get_freq_for_shuffle_copy): New function. (add_insn_allocno_copies): Adjust as single_input_op_has_cstr_p. * ira-int.h (ira_get_dup_out_num): Add one bool parameter. gcc/testsuite/ChangeLog: PR rtl-optimization/100328 * gcc.target/aarch64/sve/acle/asm/div_f16.c: Remove one xfail. * gcc.target/aarch64/sve/acle/asm/div_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/div_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/divr_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/divr_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/divr_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mad_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mad_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mad_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mla_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mla_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mla_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mls_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mls_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mls_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/msb_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/msb_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/msb_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mulx_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mulx_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mulx_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/nmad_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/nmad_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/nmad_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/nmla_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/nmla_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/nmla_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/nmls_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/nmls_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/nmls_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/nmsb_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/nmsb_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/nmsb_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/sub_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/sub_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/sub_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/subr_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/subr_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/subr_f64.c: Likewise.
2021-06-07Implement a sparse bitmap representation for Rangers on-entry cache.Andrew MacLeod1-0/+4
Use a sparse representation for the on entry cache, and utilize it when the number of basic blocks in the function exceeds param_evrp_sparse_threshold. PR tree-optimization/PR100299 * gimple-range-cache.cc (class sbr_sparse_bitmap): New. (sbr_sparse_bitmap::sbr_sparse_bitmap): New. (sbr_sparse_bitmap::bitmap_set_quad): New. (sbr_sparse_bitmap::bitmap_get_quad): New. (sbr_sparse_bitmap::set_bb_range): New. (sbr_sparse_bitmap::get_bb_range): New. (sbr_sparse_bitmap::bb_range_p): New. (block_range_cache::block_range_cache): initialize bitmap obstack. (block_range_cache::~block_range_cache): Destruct obstack. (block_range_cache::set_bb_range): Decide when to utilze the sparse on entry cache. * gimple-range-cache.h (block_range_cache): Add bitmap obstack. * params.opt (-param=evrp-sparse-threshold): New.
2021-05-21[OpenACC privatization] Largely extend diagnostics and corresponding ↵Thomas Schwinge1-0/+13
testsuite coverage [PR90115] gcc/ PR middle-end/90115 * flag-types.h (enum openacc_privatization): New. * params.opt (-param=openacc-privatization): New. * doc/invoke.texi (openacc-privatization): Document it. * omp-general.h (get_openacc_privatization_dump_flags): New function. * omp-low.c (oacc_privatization_candidate_p): Add diagnostics. * omp-offload.c (execute_oacc_device_lower) <IFN_UNIQUE_OACC_PRIVATE>: Re-work diagnostics. * target.def (goacc.adjust_private_decl): Add 'location_t' parameter. * doc/tm.texi: Regenerate. * config/gcn/gcn-protos.h (gcn_goacc_adjust_private_decl): Adjust. * config/gcn/gcn-tree.c (gcn_goacc_adjust_private_decl): Likewise. * config/nvptx/nvptx.c (nvptx_goacc_adjust_private_decl): Likewise. Preserve it for... (nvptx_goacc_expand_var_decl): ... use here. gcc/testsuite/ PR middle-end/90115 * c-c++-common/goacc/privatization-1-compute-loop.c: New file. * c-c++-common/goacc/privatization-1-compute.c: Likewise. * c-c++-common/goacc/privatization-1-routine_gang-loop.c: Likewise. * c-c++-common/goacc/privatization-1-routine_gang.c: Likewise. * gfortran.dg/goacc/privatization-1-compute-loop.f90: Likewise. * gfortran.dg/goacc/privatization-1-compute.f90: Likewise. * gfortran.dg/goacc/privatization-1-routine_gang-loop.f90: Likewise. * gfortran.dg/goacc/privatization-1-routine_gang.f90: Likewise. * c-c++-common/goacc-gomp/nesting-1.c: Update. * c-c++-common/goacc/private-reduction-1.c: Likewise. * gfortran.dg/goacc/private-3.f95: Likewise. libgomp/ PR middle-end/90115 * testsuite/libgomp.oacc-fortran/private-atomic-1-vector.f90: New file. * testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c: Update. * testsuite/libgomp.oacc-c-c++-common/host_data-7.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-3.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-4.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-5.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-6.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-g-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-g-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-v-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-wv-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/private-atomic-1-gang.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/private-atomic-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/private-variables.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/routine-4.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/static-variable-1.c: Likewise. * testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f: Likewise. * testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f: Likewise. * testsuite/libgomp.oacc-fortran/declare-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/host_data-5.F90: Likewise. * testsuite/libgomp.oacc-fortran/if-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-2.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-3.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-6.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-2.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-2.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-3.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-4.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-5.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-6.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-7.f90: Likewise. * testsuite/libgomp.oacc-fortran/optional-private.f90: Likewise. * testsuite/libgomp.oacc-fortran/parallel-dims.f90: Likewise. * testsuite/libgomp.oacc-fortran/private-atomic-1-gang.f90: Likewise. * testsuite/libgomp.oacc-fortran/private-atomic-1-worker.f90: Likewise. * testsuite/libgomp.oacc-fortran/private-variables.f90: Likewise. * testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Likewise. * testsuite/libgomp.oacc-fortran/routine-7.f90: Likewise.
2021-05-20vect: Replace hardcoded inner loop cost factorKewen Lin1-0/+4
This patch is to replace the current hardcoded weight factor 50, which is applied by the loop vectorizer to the cost of statements in an inner loop relative to the loop being vectorized, with one newly added member inner_loop_cost_factor in loop vinfo. It also introduces one parameter vect-inner-loop-cost-factor whose default value is 50, and is used to initialize the inner_loop_cost_factor member. The motivation here is that: if targets want to have one unique function to gather some information in each add_stmt_cost call, no matter that it's put before or after the cost tweaking part for inner loop, it may have the need to adjust (expand or shrink) the gathered data as the factor. Now the factor is hardcoded, it's not easily maintained. Bootstrapped/regtested on powerpc64le-linux-gnu P9, x86_64-redhat-linux and aarch64-linux-gnu. gcc/ChangeLog: * doc/invoke.texi (vect-inner-loop-cost-factor): Document new parameter. * params.opt (vect-inner-loop-cost-factor): New. * targhooks.c (default_add_stmt_cost): Replace hardcoded factor 50 with LOOP_VINFO_INNER_LOOP_COST_FACTOR, include head file tree-vectorizer.h and its required ones. * config/aarch64/aarch64.c (aarch64_add_stmt_cost): Replace hardcoded factor 50 with LOOP_VINFO_INNER_LOOP_COST_FACTOR. * config/arm/arm.c (arm_add_stmt_cost): Likewise. * config/i386/i386.c (ix86_add_stmt_cost): Likewise. * config/rs6000/rs6000.c (rs6000_add_stmt_cost): Likewise. * tree-vect-loop.c (vect_compute_single_scalar_iteration_cost): Likewise. (_loop_vec_info::_loop_vec_info): Init inner_loop_cost_factor. * tree-vectorizer.h (_loop_vec_info): Add inner_loop_cost_factor. (LOOP_VINFO_INNER_LOOP_COST_FACTOR): New macro.
2021-04-20Fix typo in param description.Martin Liska1-1/+1
gcc/ChangeLog: * doc/invoke.texi: Fix typo. * params.opt: Likewise.
2021-04-19tree-optimization/100081 - Limit depth of logical expression windback.Andrew MacLeod1-0/+5
Limit how many logical expressions GORI will look back through when evaluating outgoing edge range. PR tree-optimization/100081 * gimple-range-cache.h (ranger_cache): Inherit from gori_compute rather than gori_compute_cache. * gimple-range-gori.cc (is_gimple_logical_p): Move to top of file. (range_def_chain::m_logical_depth): New member. (range_def_chain::range_def_chain): Initialize m_logical_depth. (range_def_chain::get_def_chain): Don't build defchains through more than LOGICAL_LIMIT logical expressions. * params.opt (param_ranger_logical_depth): New.
2021-04-19[OpenACC 'kernels'] '-fopenacc-kernels=[...]' -> '--param=openacc-kernels=[...]'Thomas Schwinge1-0/+13
This configuration knob is temporary, and isn't really meant to be exposed to users. gcc/ * params.opt (-param=openacc-kernels=): Add. * omp-oacc-kernels-decompose.cc (pass_omp_oacc_kernels_decompose::gate): Use it. * doc/invoke.texi (-fopenacc-kernels=@var{mode}): Move... (--param): ... here, 'openacc-kernels'. gcc/c-family/ * c.opt (fopenacc-kernels=): Remove. gcc/fortran/ * lang.opt (fopenacc-kernels=): Remove. gcc/testsuite/ * c-c++-common/goacc/if-clause-2.c: '-fopenacc-kernels=[...]' -> '--param=openacc-kernels=[...]'. * c-c++-common/goacc/kernels-decompose-1.c: Likewise. * c-c++-common/goacc/kernels-decompose-2.c: Likewise. * c-c++-common/goacc/kernels-decompose-ice-1.c: Likewise. * c-c++-common/goacc/kernels-decompose-ice-2.c: Likewise. * gfortran.dg/goacc/kernels-decompose-1.f95: Likewise. * gfortran.dg/goacc/kernels-decompose-2.f95: Likewise. * gfortran.dg/goacc/kernels-tree.f95: Likewise. libgomp/ * testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c: '-fopenacc-kernels=[...]' -> '--param=openacc-kernels=[...]'. * testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Likewise. * testsuite/libgomp.oacc-fortran/pr94358-1.f90: Likewise.
2021-02-19Fix typo in param description.Martin Liska1-1/+1
gcc/ChangeLog: PR translation/99167 * params.opt: Fix typo.
2021-02-15Add 2 missing Param keywords.Martin Liska1-2/+2
gcc/ChangeLog: * params.opt: Add 2 missing Param keywords.
2021-02-12tree-optimization/38474 - fix store-merging compile-time regressionRichard Biener1-0/+8
The following puts a limit on the number of alias tests we do in terminate_all_aliasing_chains which is quadratic in the number of overall stores currentrly tracked. There is already a limit in place on the maximum number of stores in a single chain so the following adds a limit on the number of chains tracked. The worst number of overall stores tracked from the defaults (64 and 64) is then 4096 which when imposed as the sole limit for the testcase still causes store merging : 71.65 ( 56%) because the testcase is somewhat degenerate with most chains consisting only of a single store (and 25% of exactly three stores). The single stores are all CLOBBERs at the point variables go out of scope. Note unpatched we have store merging : 308.60 ( 84%) Limiting the number of chains to 64 brings this down to store merging : 1.52 ( 3%) which is more reasonable. There are ideas on how to make terminate_all_aliasing_chains cheaper but for this degenerate case they would not have any effect so I'll defer for GCC 12 for those. I'm not sure we want to have both --params, just keeping the more to-the-point max-stores-to-track works but makes the degenerate case above slower. I made the current default 1024 which for the testcasse (without limiting chains) results in 25% compile time and 20s putting it in the same ballpart as the next offender (which is PTA). This is a regression on trunk and the GCC 10 branch btw. 2021-02-11 Richard Biener <rguenther@suse.de> PR tree-optimization/38474 * params.opt (-param=max-store-chains-to-track=): New param. (-param=max-stores-to-track=): Likewise. * doc/invoke.texi (max-store-chains-to-track): Document. (max-stores-to-track): Likewise. * gimple-ssa-store-merging.c (pass_store_merging::m_n_chains): New. (pass_store_merging::m_n_stores): Likewise. (pass_store_merging::terminate_and_process_chain): Update m_n_stores and m_n_chains. (pass_store_merging::process_store): Likewise. Terminate oldest chains if the number of stores or chains get too large. (imm_store_chain_info::terminate_and_process_chain): Dump chain length.
2021-01-29change unit of --param max-gcse-memory to kBRichard Biener1-2/+2
This changes it from bytes to kB since its value is limited to 2147483648. 2021-01-29 Richard Biener <rguenther@suse.de> * doc/invoke.texi (--param max-gcse-memory): Document unit of size. * gcse.c (gcse_or_cprop_is_too_expensive): Adjust. * params.opt (--param max-gcse-memory): Adjust default and document unit of size.
2021-01-04Update copyright years.Jakub Jelinek1-1/+1
2020-12-01C++ Module parameters & timersNathan Sidwell1-0/+4
Here is the new parameter and instrumentation timers for modules. gcc/ * params.opt (lazy-modules): New. * timevar.def (TV_MODULE_IMPORT, TV_MODULE_EXPORT) (TV_MODULE_MAPPER): New.
2020-11-25libsanitizer: options: Add hwasan flags and argument parsingMatthew Malcomson1-0/+24
These flags can't be used at the same time as any of the other sanitizers. We add an equivalent flag to -static-libasan in -static-libhwasan to ensure static linking. The -fsanitize=kernel-hwaddress option is for compiling targeting the kernel. This flag has defaults to match the LLVM implementation and sets some other behaviors to work in the kernel (e.g. accounting for the fact that the stack pointer will have 0xff in the top byte and to not call the userspace library initialisation routines). The defaults are that we do not sanitize variables on the stack and always recover from a detected bug. Since we are introducing a few more conflicts between sanitizer flags we refactor the checking for such conflicts to use a helper function which makes checking for such conflicts more easy and consistent. We introduce a backend hook `targetm.memtag.can_tag_addresses` that indicates to the mid-end whether a target has a feature like AArch64 TBI where the top byte of an address is ignored. Without this feature hwasan sanitization is not done. gcc/ChangeLog: * common.opt (flag_sanitize_recover): Default for kernel hwaddress. (static-libhwasan): New cli option. * config/aarch64/aarch64.c (aarch64_can_tag_addresses): New. (TARGET_MEMTAG_CAN_TAG_ADDRESSES): New. * config/gnu-user.h (LIBHWASAN_EARLY_SPEC): hwasan equivalent of asan command line flags. * cppbuiltin.c (define_builtin_macros_for_compilation_flags): Add hwasan equivalent of __SANITIZE_ADDRESS__. * doc/invoke.texi: Document hwasan command line flags. * doc/tm.texi: Document new hook. * doc/tm.texi.in: Document new hook. * flag-types.h (enum sanitize_code): New sanitizer values. * gcc.c (STATIC_LIBHWASAN_LIBS): New macro. (LIBHWASAN_SPEC): New macro. (LIBHWASAN_EARLY_SPEC): New macro. (SANITIZER_EARLY_SPEC): Update to include hwasan. (SANITIZER_SPEC): Update to include hwasan. (sanitize_spec_function): Use hwasan options. * opts.c (finish_options): Describe conflicts between address sanitizers. (find_sanitizer_argument): New. (report_conflicting_sanitizer_options): New. (sanitizer_opts): Introduce new sanitizer flags. (common_handle_option): Add defaults for kernel sanitizer. * params.opt (hwasan--instrument-stack): New (hwasan-random-frame-tag): New (hwasan-instrument-allocas): New (hwasan-instrument-reads): New (hwasan-instrument-writes): New (hwasan-instrument-mem-intrinsics): New * target.def (HOOK_PREFIX): Add new hook. (can_tag_addresses): Add new hook under memtag prefix. * targhooks.c (default_memtag_can_tag_addresses): New. * targhooks.h (default_memtag_can_tag_addresses): New decl. * toplev.c (process_options): Ensure hwasan only on architectures that advertise the possibility.
2020-11-16param: Add missing dot for param description.Martin Liska1-1/+1
gcc/ChangeLog: * params.opt: Add missing dot.
2020-11-16IPA tracking of EAF flags in ipa-modref.Jan Hubicka1-0/+4
this patch implements the IPA propagation part of EAF flags handling in ipa-modref. It extends the local analysis to collect lattice consisting of flags and escape points. SSA name escapes if it is passed directly or indirectly to a function call. If useful flags are found for parameter its escape list is stored into escape summaries. This time each call site is annotated with info on which function parameters escape to what argument of function call. At IPA time we then perform iterative dataflow and produce final flags. ipa-modref is still cheaper than pure-const when running on cc1plus (about 2-3% that is what accounts every non-trivial passs) and the dataflow converges in 1 or 2 iterations. Local analysis does some work to avoid streaming escape points when they are not useful to determine final flags (that is, local escape analysis determined good enough flags). For cc1plus there are 225k calls with useful escape summary. * ipa-modref.c (escape_point): New type. (modref_lattice): New type. (escape_entry): New type. (escape_summary): New type. (escape_summaries_t): New type. (escape_summaries): New static variable. (eaf_flags_useful_p): New function. (modref_summary::useful_p): Add new check_flags attribute; check eaf_flags for usefulness. (modref_summary_lto): Add arg_flags. (modref_summary_lto::useful_p): Add new check_flags attribute; check eaf_flags for usefulness. (dump_modref_edge_summaries): New function. (remove_modref_edge_summaries): New function. (ignore_retval_p): New predicate. (ignore_stores_p): Also ignore for const. (remove_summary): Call remove_modref_edge_summaries. (modref_lattice::init): New member function. (modref_lattice::release): New member unction. (modref_lattice::dump): New member function. (modref_lattice::add_escape_point): New member function. (modref_lattice::merge): Two new member functions. (modref_lattice::merge_deref): New member functions. (modref_lattice::merge_direct_load): New member function. (modref_lattice::merge_direct_store): New member function. (call_lhs_flags): Rename to ... (merge_call_lhs_flags): ... this one; reimplement using modreflattice. (analyze_ssa_name_flags): Replace KNOWN_FLAGS param by LATTICE; add IPA parametr; use modref_lattice. (analyze_parms): New parameter IPA and SUMMARY_LTO; update for modref_lattice; initialize escape_summary. (analyze_function): Allocate escape_summaries; update uses of useful_p. (modref_write_escape_summary): New function. (modref_read_escape_summary): New function. (modref_write): Write escape summary. (read_section): Read escape summary. (modref_read): Initialie escape_summaries. (remap_arg_flags): New function. (update_signature): Use it. (escape_map): New structure. (update_escape_summary_1, update_escape_summary): New functions. (ipa_merge_modref_summary_after_inlining): Merge escape summaries. (propagate_unknown_call): Do not remove useless summaries. (remove_useless_summaries): Remove them here. (modref_propagate_in_scc): Update; do not dump scc. (modref_propagate_dump_scc): New function. (modref_merge_call_site_flags): New function. (modref_propagate_flags_in_scc): New function. (pass_ipa_modref::execute): Use modref_propagate_flags_in_scc and modref_propagate_dump_scc; delete escape_summaries. (ipa_modref_c_finalize): Remove escape_summaries. * ipa-modref.h (modref_summary): Update prototype of useful_p. * params.opt (param=modref-max-escape-points): New param. * doc/invoke.texi (modref-max-escape-points): Document.
2020-11-16modref: add missing Param Optimization keywordsMartin Liska1-5/+5
Fixes: FAIL: compiler driver --help=common option(s): "^ +-.*[^:.]$" absent from output: " --param=modref-max-depth= Maximum depth of DFS walk used by modref escape analysis" gcc/ChangeLog: * params.opt: All modref parameters miss Optimization and Param keyword as seen in testsuite failure.
2020-11-16Fix -param=modref-max-depth in params.optJan Hubicka1-1/+1
* params.opt (-param=modref-max-depth=): Add missing full stop.
2020-11-14Detect EAF flags in ipa-modrefJan Hubicka1-0/+4
A minimal patch for the EAF flags discovery. It works only in local ipa-modref and gives up on cyclic SSA graphs. It improves pt_solution_includes disambiguations twice. gcc/Changelog: * gimple.c: Include ipa-modref-tree.h and ipa-modref.h. (gimple_call_arg_flags): Use modref to determine flags. * ipa-modref.c: Include gimple-ssa.h, tree-phinodes.h, tree-ssa-operands.h, stringpool.h and tree-ssanames.h. (analyze_ssa_name_flags): Declare. (modref_summary::useful_p): Summary is also useful if arg flags are known. (dump_eaf_flags): New function. (modref_summary::dump): Use it. (get_modref_function_summary): Be read for current_function_decl being NULL. (memory_access_to): New function. (deref_flags): New function. (call_lhs_flags): New function. (analyze_parms): New function. (analyze_function): Use it. * ipa-modref.h (struct modref_summary): Add arg_flags. * doc/invoke.texi (ipa-modref-max-depth): Document. * params.opt (ipa-modref-max-depth): New param.
2020-11-11tree-optimization/97623 - Avoid PRE hoist insertion iterationRichard Biener1-4/+0
The recent previous change in this area limited hoist insertion iteration via a param but the following is IMHO better since we are not really interested in PRE opportunities exposed by hoisting but only the other way around. So this moves hoist insertion after PRE iteration finished and removes hoist insertion iteration alltogether. 2020-11-11 Richard Biener <rguenther@suse.de> PR tree-optimization/97623 * params.opt (-param=max-pre-hoist-insert-iterations): Remove again. * doc/invoke.texi (max-pre-hoist-insert-iterations): Likewise. * tree-ssa-pre.c (insert): Move hoist insertion after PRE insertion iteration and do not iterate it. * gcc.dg/tree-ssa/ssa-hoist-3.c: Adjust. * gcc.dg/tree-ssa/ssa-hoist-7.c: Likewise. * gcc.dg/tree-ssa/ssa-pre-30.c: Likewise.
2020-11-03tree-optimization/97623 - limit PRE hoist insertionRichard Biener1-0/+4
This limits insert iteration caused by PRE insertions generating hoist insertion opportunities and vice versa. The patch limits the hoist insertion iterations to three by default. 2020-11-03 Richard Biener <rguenther@suse.de> PR tree-optimization/97623 * params.opt (-param=max-pre-hoist-insert-iterations): New. * doc/invoke.texi (max-pre-hoist-insert-iterations): Document. * tree-ssa-pre.c (insert): Do at most max-pre-hoist-insert-iterations hoist insert iterations.