path: root/gcc
Age | Commit message | Author | Changes
2021-11-14 | VAX: Add the `setmemhi' instruction | Maciej W. Rozycki | 3 files, +87/-0
The MOVC5 machine instruction has `memset' semantics if encoded with a zero source length[1]:

    "4. MOVC5 with a zero source length operand is the preferred way
        to fill a block of memory with the fill character."

Use that instruction to implement the `setmemhi' instruction then.  Use the AP register in the register deferred mode for the source address to yield the shortest possible encoding of the otherwise unused operand, observing that the address is never dereferenced if the source length is zero.

The use of this instruction yields steadily better performance, at least with the Mariah VAX implementation, for a variable-length `memset' call expanded inline as a single MOVC5 operation compared to an equivalent libcall invocation:

Length:   1, time elapsed:  0.971789 (builtin),  2.847303 (libcall)
Length:   2, time elapsed:  0.907904 (builtin),  2.728259 (libcall)
Length:   3, time elapsed:  1.038311 (builtin),  2.917245 (libcall)
Length:   4, time elapsed:  0.775305 (builtin),  2.686088 (libcall)
Length:   7, time elapsed:  1.112331 (builtin),  2.992968 (libcall)
Length:   8, time elapsed:  0.856882 (builtin),  2.764885 (libcall)
Length:  15, time elapsed:  1.256086 (builtin),  3.096660 (libcall)
Length:  16, time elapsed:  1.001962 (builtin),  2.888131 (libcall)
Length:  31, time elapsed:  1.590456 (builtin),  3.774164 (libcall)
Length:  32, time elapsed:  1.288909 (builtin),  3.629622 (libcall)
Length:  63, time elapsed:  3.430285 (builtin),  5.269789 (libcall)
Length:  64, time elapsed:  3.265147 (builtin),  5.113156 (libcall)
Length: 127, time elapsed:  6.438772 (builtin),  8.268305 (libcall)
Length: 128, time elapsed:  6.268991 (builtin),  8.114557 (libcall)
Length: 255, time elapsed: 12.417338 (builtin), 14.259678 (libcall)

(times given in seconds per 1000000 `memset' invocations for the given length made in a loop).

It is clear from these figures that hardware does data coalescence for consecutive bytes rather than naively copying them one by one, as for lengths that are powers of 2 the figures are consistently lower than ones for their respective next lower lengths.

The use of MOVC5 also requires at least 4 bytes less in terms of machine code as it avoids encoding the address of `memset' needed for the CALLS instruction used to make a libcall, as well as extra PUSHL instructions needed to pass arguments to the call as those can be encoded directly as the respective operands of the MOVC5 instruction.

It is perhaps worth noting too that for constant lengths we prefer to emit up to 5 individual MOVx instructions rather than a single MOVC5 instruction to clear memory and for consistency we copy this behavior here for filling memory with another value too, even though there may be a performance advantage with a string copy in comparison to a piecemeal copy, e.g.:

Length:  40, time elapsed:  2.183192 (string),  2.638878 (piecemeal)

But this is something for another change as it will have to be carefully evaluated.

[1] DEC STD 032-0 "VAX Architecture Standard", Digital Equipment Corporation, A-DS-EL-00032-00-0 Rev J, December 15, 1989, Section 3.10 "Character-String Instructions", p. 3-163

gcc/
	* config/vax/vax.h (SET_RATIO): New macro.
	* config/vax/vax.md (UNSPEC_SETMEM_FILL): New constant.
	(setmemhi): New expander.
	(setmemhi1): New insn and splitter.
	(*setmemhi1): New insn.

gcc/testsuite/
	* gcc.target/vax/setmem.c: New test.
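For illustration, a made-up example (not the commit's testcase): a variable-length `memset' call like the one below is the kind of code the new `setmemhi' expander can now turn into a single inline MOVC5 on VAX instead of a libcall.

    #include <string.h>

    /* With this change, a variable-length fill may be expanded inline
       as one MOVC5 instruction rather than a CALLS to memset.  */
    void
    fill_buffer (char *buf, int c, unsigned short len)
    {
      memset (buf, c, len);
    }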
2021-11-14 | Extend modref to track kills | Jan Hubicka | 6 files, +471/-79
This patch adds kill tracking to ipa-modref.  This is represented by an array of accesses to memory locations that are known to be overwritten by the function.

gcc/ChangeLog:

2021-11-14  Jan Hubicka  <hubicka@ucw.cz>

	* ipa-modref-tree.c (modref_access_node::update_for_kills): New
	member function.
	(modref_access_node::merge_for_kills): Likewise.
	(modref_access_node::insert_kill): Likewise.
	* ipa-modref-tree.h (modref_access_node::update_for_kills,
	modref_access_node::merge_for_kills,
	modref_access_node::insert_kill): Declare.
	(modref_access_node::useful_for_kill): New member function.
	* ipa-modref.c (modref_summary::useful_p): Release useless kills.
	(lto_modref_summary): Add kills.
	(modref_summary::dump): Dump kills.
	(record_access): Add modref_access_node parameter.
	(record_access_lto): Likewise.
	(merge_call_side_effects): Merge kills.
	(analyze_call): Add ALWAYS_EXECUTED param and pass it around.
	(struct summary_ptrs): Add always_executed field.
	(analyze_load): Update.
	(analyze_store): Update; record kills.
	(analyze_stmt): Add always_executed; record kills in clobbers.
	(analyze_function): Track always_executed.
	(modref_summaries::duplicate): Duplicate kills.
	(update_signature): Release kills.
	* ipa-modref.h (struct modref_summary): Add kills.
	* tree-ssa-alias.c (alias_stats): Add kill stats.
	(dump_alias_stats): Dump kill stats.
	(store_kills_ref_p): Break out from ...
	(stmt_kills_ref_p): Use it; handle modref info based kills.

gcc/testsuite/ChangeLog:

2021-11-14  Jan Hubicka  <hubicka@ucw.cz>

	* gcc.dg/tree-ssa/modref-dse-3.c: New test.
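For illustration, a minimal sketch (invented names, not the commit's testcase) of the kind of dead store that kill tracking exposes: the callee always overwrites the location, so the store before the call can be deleted.

    /* 'wipe' unconditionally overwrites *p, so modref records a kill
       for that location.  */
    void
    wipe (int *p)
    {
      *p = 0;
    }

    int
    f (void)
    {
      int x = 123;   /* dead store: wipe () always overwrites x */
      wipe (&x);
      return x;
    }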
2021-11-14 | Remove gcc.dg/pr103229.c | Aldy Hernandez | 1 file, +0/-10
gcc/testsuite/ChangeLog:

	* gcc.dg/pr103229.c: Removed.
2021-11-14 | Do not pass NULL to memset in ssa_global_cache. | Aldy Hernandez | 2 files, +12/-1
The code computing ranges in PHIs in the path solver reuses the temporary ssa_global_cache by calling its clear method.  Calling it on an empty cache causes us to call memset with NULL.

Tested on x86-64 Linux.

gcc/ChangeLog:

	PR tree-optimization/103229
	* gimple-range-cache.cc (ssa_global_cache::clear): Do not pass
	null value to memset.

gcc/testsuite/ChangeLog:

	* gcc.dg/pr103229.c: New test.
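For illustration, the underlying C rule in a standalone sketch (invented example, not GCC internals): memset requires a valid pointer even when the length is zero, so a possibly-empty clear must be guarded.

    #include <string.h>
    #include <stddef.h>

    void
    clear_table (int **tab, size_t len)
    {
      /* memset (NULL, 0, 0) is undefined behavior; skip the call
         when the table is empty.  */
      if (len == 0)
        return;
      memset (tab, 0, len * sizeof (*tab));
    }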
2021-11-14 | tsan: remove unneeded -ldl from options | Martin Liska | 2 files, +2/-2
gcc/testsuite/ChangeLog:

	* c-c++-common/tsan/free_race.c: Remove unnecessary -ldl.
	* c-c++-common/tsan/free_race2.c: Likewise.
2021-11-14 | Cleanup tree-ssa-alias and tree-ssa-dse use of modref summary | Jan Hubicka | 8 files, +77/-73
Move the code getting a tree op from an access_node and stmt to a common place.  I also commonized the logic to build an ao_ref.  While at it, I also replaced FOR_EACH_* by range-for since it reads better.

gcc/ChangeLog:

2021-11-14  Jan Hubicka  <hubicka@ucw.cz>

	* ipa-modref-tree.c (modref_access_node::get_call_arg): New
	member function.
	(modref_access_node::get_ao_ref): Likewise.
	* ipa-modref-tree.h (modref_access_node::get_call_arg): Declare.
	(modref_access_node::get_ao_ref): Declare.
	* tree-ssa-alias.c (modref_may_conflict): Use new accessors.
	* tree-ssa-dse.c (dse_optimize_call): Use new accessors.

gcc/testsuite/ChangeLog:

2021-11-14  Jan Hubicka  <hubicka@ucw.cz>

	* c-c++-common/asan/null-deref-1.c: Update template.
	* c-c++-common/tsan/free_race.c: Update template.
	* c-c++-common/tsan/free_race2.c: Update template.
	* gcc.dg/ipa/ipa-sra-4.c: Update template.
2021-11-14 | Daily bump. | GCC Administrator | 4 files, +244/-1
2021-11-14 | Fix bug in ipa-pure-const and add debug counters | Jan Hubicka | 2 files, +10/-4
gcc/ChangeLog:

	PR lto/103211
	* dbgcnt.def (ipa_attr): New counters.
	* ipa-pure-const.c: Include dbgcnt.h.
	(ipa_make_function_const): Use debug counter.
	(ipa_make_function_pure): Likewise.
	(propagate_pure_const): Fix bug in my previous change.
2021-11-13 | More ipa-modref-tree.h cleanups | Jan Hubicka | 3 files, +63/-49
Move access dumping to a member function and clean up formatting.

gcc/ChangeLog:

2021-11-13  Jan Hubicka  <hubicka@ucw.cz>

	* ipa-modref-tree.c (modref_access_node::range_info_useful_p):
	Offline from ipa-modref-tree.h.
	(modref_access_node::dump): Move from ipa-modref.c; make member
	function.
	* ipa-modref-tree.h (modref_access_node::range_info_useful_p,
	modref_access_node::dump): Declare.
	* ipa-modref.c (dump_access): Remove.
	(dump_records): Update.
	(dump_lto_records): Update.
	(record_access): Update.
	(record_access_lto): Update.
2021-11-13 | Implement DSE of dead function calls storing memory. | Jan Hubicka | 9 files, +227/-12
gcc/ChangeLog:

2021-11-13  Jan Hubicka  <hubicka@ucw.cz>

	* ipa-modref.c (modref_summary::modref_summary): Clear new flags.
	(modref_summary::dump): Dump try_dse.
	(modref_summary::finalize): Add FUN attribute; compute try-dse.
	(analyze_function): Update.
	(read_section): Update.
	(update_signature): Update.
	(pass_ipa_modref::execute): Update.
	* ipa-modref.h (struct modref_summary):
	* tree-ssa-alias.c (ao_ref_init_from_ptr_and_range): Export.
	* tree-ssa-alias.h (ao_ref_init_from_ptr_and_range): Declare.
	* tree-ssa-dse.c (dse_optimize_call): New function.
	(dse_optimize_stmt): Use it.

gcc/testsuite/ChangeLog:

2021-11-13  Jan Hubicka  <hubicka@ucw.cz>

	* g++.dg/cpp1z/inh-ctor23.C: Fix template.
	* g++.dg/ipa/ipa-icf-4.C: Fix template.
	* gcc.dg/tree-ssa/modref-dse-1.c: New test.
	* gcc.dg/tree-ssa/modref-dse-2.c: New test.
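For illustration, a minimal sketch (invented names, not the commit's testcase) of the new optimization: the call's only side effect is a store to a local that is never read afterwards, so the whole call is dead and can be removed.

    /* modref knows 'set_it' only writes through its argument.  */
    void
    set_it (int *p)
    {
      *p = 123;
    }

    int
    f (void)
    {
      int x;
      set_it (&x);   /* x is never read again: the call itself is dead */
      return 0;
    }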
2021-11-13 | Fix checking disabled build. | Jan Hubicka | 1 file, +2/-2
gcc/ChangeLog:

2021-11-13  Jan Hubicka  <hubicka@ucw.cz>

	* ipa-modref-tree.c: Move #if CHECKING_P to proper place.
2021-11-13 | modref_access_node cleanup | Jan Hubicka | 2 files, +563/-514
Move member functions of modref_access_node from ipa-modref-tree.h to ipa-modref-tree.c, since they have become long and are not fitting for inlines anyway.  I also cleaned up the interface by adding a static insert method (which handles inserting accesses into a vector and optimizing them), which makes it possible to keep most of the interval-merging interface private.

Honza

gcc/ChangeLog:

	* ipa-modref-tree.h (struct modref_access_node): Move longer
	member functions to ipa-modref-tree.c.
	(modref_ref_node::try_merge_with): Turn into modref_access_node
	member function.
	* ipa-modref-tree.c (modref_access_node::contains): Move here
	from ipa-modref-tree.h.
	(modref_access_node::update): Likewise.
	(modref_access_node::merge): Likewise.
	(modref_access_node::closer_pair_p): Likewise.
	(modref_access_node::forced_merge): Likewise.
	(modref_access_node::update2): Likewise.
	(modref_access_node::combined_offsets): Likewise.
	(modref_access_node::try_merge_with): Likewise.
	(modref_access_node::insert): Likewise.
2021-11-13 | Add finalize method to modref summary. | Jan Hubicka | 3 files, +36/-29
gcc/ChangeLog:

	* ipa-modref.c (modref_summary::global_memory_read_p): Remove.
	(modref_summary::global_memory_written_p): Remove.
	(modref_summary::dump): Dump new flags.
	(modref_summary::finalize): New member function.
	(analyze_function): Call it.
	(read_section): Call it.
	(update_signature): Call it.
	(pass_ipa_modref::execute): Call it.
	* ipa-modref.h (struct modref_summary): Remove
	global_memory_read_p and global_memory_written_p.
	Add global_memory_read, global_memory_written.
	* tree-ssa-structalias.c (determine_global_memory_access):
	Update.
2021-11-13 | Whitelist type attributes for function signature change | Jan Hubicka | 3 files, +39/-9
gcc/ChangeLog:

	* ipa-fnsummary.c (compute_fn_summary): Use
	type_attribute_allowed_p.
	* ipa-param-manipulation.c
	(ipa_param_adjustments::type_attribute_allowed_p): New member
	function.
	(drop_type_attribute_if_params_changed_p): New function.
	(build_adjusted_function_type): Use it.
	* ipa-param-manipulation.h: Add type_attribute_allowed_p.
2021-11-13 | analyzer: add four new taint-based warnings | David Malcolm | 19 files, +1492/-123
The initial commit of the analyzer in GCC 10 had a single warning, -Wanalyzer-tainted-array-index, and required manually enabling the taint checker with -fanalyzer-checker=taint (due to scaling issues).

This patch extends the taint detection to add four new taint-based warnings:

  -Wanalyzer-tainted-allocation-size
    for e.g. attacker-controlled malloc/alloca
  -Wanalyzer-tainted-divisor
    for detecting where an attacker can inject a divide-by-zero
  -Wanalyzer-tainted-offset
    for attacker-controlled pointer offsets
  -Wanalyzer-tainted-size
    for e.g. attacker-controlled memset

and rewords all the warnings to talk about "attacker-controlled" values rather than "tainted" values.

Unfortunately I haven't yet addressed the scaling issues, so all of these still require -fanalyzer-checker=taint (in addition to -fanalyzer).

gcc/analyzer/ChangeLog:

	* analyzer.opt (Wanalyzer-tainted-allocation-size): New.
	(Wanalyzer-tainted-divisor): New.
	(Wanalyzer-tainted-offset): New.
	(Wanalyzer-tainted-size): New.
	* engine.cc (impl_region_model_context::get_taint_map): New.
	* exploded-graph.h (impl_region_model_context::get_taint_map):
	New decl.
	* program-state.cc (sm_state_map::get_state): Call
	alt_get_inherited_state.
	(sm_state_map::impl_set_state): Modify states within compound
	svalues.
	(program_state::impl_call_analyzer_dump_state): Undo casts.
	(selftest::test_program_state_1): Update for new context param
	of create_region_for_heap_alloc.
	(selftest::test_program_state_merging): Likewise.
	* region-model-impl-calls.cc (region_model::impl_call_alloca):
	Likewise.
	(region_model::impl_call_calloc): Likewise.
	(region_model::impl_call_malloc): Likewise.
	(region_model::impl_call_operator_new): Likewise.
	(region_model::impl_call_realloc): Likewise.
	* region-model.cc (region_model::check_region_access): Call
	check_region_for_taint.
	(region_model::get_representative_path_var_1): Handle binops.
	(region_model::create_region_for_heap_alloc): Add "ctxt" param
	and pass it to set_dynamic_extents.
	(region_model::create_region_for_alloca): Likewise.
	(region_model::set_dynamic_extents): Add "ctxt" param and use it
	to call check_dynamic_size_for_taint.
	(selftest::test_state_merging): Update for new context param of
	create_region_for_heap_alloc.
	(selftest::test_malloc_constraints): Likewise.
	(selftest::test_malloc): Likewise.
	(selftest::test_alloca): Likewise for create_region_for_alloca.
	* region-model.h (region_model::create_region_for_heap_alloc):
	Add "ctxt" param.
	(region_model::create_region_for_alloca): Likewise.
	(region_model::set_dynamic_extents): Likewise.
	(region_model::check_dynamic_size_for_taint): New decl.
	(region_model::check_region_for_taint): New decl.
	(region_model_context::get_taint_map): New vfunc.
	(noop_region_model_context::get_taint_map): New.
	* sm-taint.cc: Remove include of "diagnostic-event-id.h"; add
	includes of "gimple-iterator.h", "tristate.h", "selftest.h",
	"ordered-hash-map.h", "cgraph.h", "cfg.h", "digraph.h",
	"analyzer/supergraph.h", "analyzer/call-string.h",
	"analyzer/program-point.h", "analyzer/store.h",
	"analyzer/region-model.h", and "analyzer/program-state.h".
	(enum bounds): Move to top of file.
	(class taint_diagnostic): New.
	(class tainted_array_index): Convert to subclass of
	taint_diagnostic.
	(tainted_array_index::emit): Add CWE-129.  Reword warning to use
	"attacker-controlled" rather than "tainted".
	(tainted_array_index::describe_state_change): Move to
	taint_diagnostic::describe_state_change.
	(tainted_array_index::describe_final_event): Reword to use
	"attacker-controlled" rather than "tainted".
	(class tainted_offset): New.
	(class tainted_size): New.
	(class tainted_divisor): New.
	(class tainted_allocation_size): New.
	(taint_state_machine::alt_get_inherited_state): New.
	(taint_state_machine::on_stmt): In assignment handling, remove
	ARRAY_REF handling in favor of check_region_for_taint.  Add
	detection of tainted divisors.
	(taint_state_machine::get_taint): New.
	(taint_state_machine::combine_states): New.
	(region_model::check_region_for_taint): New.
	(region_model::check_dynamic_size_for_taint): New.
	* sm.h (state_machine::alt_get_inherited_state): New.

gcc/ChangeLog:

	* doc/invoke.texi (Static Analyzer Options): Add
	-Wno-analyzer-tainted-allocation-size,
	-Wno-analyzer-tainted-divisor, -Wno-analyzer-tainted-offset,
	and -Wno-analyzer-tainted-size to list.  Add
	-Wanalyzer-tainted-allocation-size, -Wanalyzer-tainted-divisor,
	-Wanalyzer-tainted-offset, and -Wanalyzer-tainted-size to list
	of options effectively enabled by -fanalyzer.
	(-Wanalyzer-tainted-allocation-size): New.
	(-Wanalyzer-tainted-array-index): Tweak wording; add link to CWE.
	(-Wanalyzer-tainted-divisor): New.
	(-Wanalyzer-tainted-offset): New.
	(-Wanalyzer-tainted-size): New.

gcc/testsuite/ChangeLog:

	* gcc.dg/analyzer/pr93382.c: Tweak expected wording.
	* gcc.dg/analyzer/taint-alloc-1.c: New test.
	* gcc.dg/analyzer/taint-alloc-2.c: New test.
	* gcc.dg/analyzer/taint-divisor-1.c: New test.
	* gcc.dg/analyzer/taint-1.c: Rename to...
	* gcc.dg/analyzer/taint-read-index-1.c: ...this.  Tweak expected
	wording.  Mark some events as xfail.
	* gcc.dg/analyzer/taint-read-offset-1.c: New test.
	* gcc.dg/analyzer/taint-size-1.c: New test.
	* gcc.dg/analyzer/taint-write-index-1.c: New test.
	* gcc.dg/analyzer/taint-write-offset-1.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
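For illustration, a hypothetical example (not from the testsuite) of the pattern -Wanalyzer-tainted-allocation-size targets, compiled with -fanalyzer -fanalyzer-checker=taint: a size read from an untrusted source flows into malloc unchecked.

    #include <stdio.h>
    #include <stdlib.h>

    void *
    read_and_alloc (FILE *f)
    {
      size_t sz;
      if (fread (&sz, sizeof (sz), 1, f) != 1)
        return NULL;
      /* sz is attacker-controlled and reaches an allocation size
         without any bounds check.  */
      return malloc (sz);
    }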
2021-11-13 | Remember fnspec based EAF flags in modref summary. | Jan Hubicka | 3 files, +49/-41
gcc/ChangeLog:

	* attr-fnspec.h (attr_fnspec::arg_eaf_flags): Break out from ...
	* gimple.c (gimple_call_arg_flags): ... here.
	* ipa-modref.c (analyze_parms): Record flags known from fnspec.
	(modref_merge_call_site_flags): Use arg_eaf_flags.
2021-11-13 | path solver: Compute all PHI ranges simultaneously. | Aldy Hernandez | 3 files, +69/-9
PHIs must be resolved simultaneously, otherwise we may not pick up the ranges incoming to the block.

For example, if we put p3_7 in the cache before all PHIs have been computed, we will pick up the wrong p3_7 value for p2_17:

    # p3_7 = PHI <1(2), 0(5)>
    # p2_17 = PHI <1(2), p3_7(5)>

This patch delays updating the cache until all PHIs have been analyzed.

gcc/ChangeLog:

	PR tree-optimization/103222
	* gimple-range-path.cc
	(path_range_query::compute_ranges_in_phis): New.
	(path_range_query::compute_ranges_in_block): Call
	compute_ranges_in_phis.
	* gimple-range-path.h
	(path_range_query::compute_ranges_in_phis): New.

gcc/testsuite/ChangeLog:

	* gcc.dg/pr103222.c: New test.
2021-11-13 | Enable ipa-sra with fnspec attributes | Jan Hubicka | 3 files, +76/-15
Enable some ipa-sra on Fortran by allowing signature changes on functions with the "fn spec" attribute when ipa-modref is enabled.  This is possible since ipa-modref knows how to preserve things we trace in fnspec, and the fnspecs generated by the Fortran frontend are quite simple and can be analysed automatically now.  To be sure, I will also add code that merges fnspec to parameters.

This unfortunately hits a bug in ipa-param-manipulation when we remove a parameter that specifies the size of a variable-length parameter.  For this reason I added a hack that prevents signature changes on such functions and will handle it incrementally.

I tried creating a C testcase but it is blocked by another problem: we punt ipa-sra on the access attribute.  This is an optimization regression we ought to fix, so I filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103223.

As a followup I will add code classifying the type attributes (we have just a few) and get stats on the access attribute.

gcc/ChangeLog:

	* ipa-fnsummary.c (compute_fn_summary): Do not give up on
	signature changes on "fn spec" attribute; give up on variadic
	types.
	* ipa-param-manipulation.c: Include attribs.h.
	(build_adjusted_function_type): New parameter ARG_MODIFIED; if
	it is true remove "fn spec" attribute.
	(ipa_param_adjustments::build_new_function_type): Update.
	(ipa_param_body_adjustments::modify_formal_parameters): Update.
	* ipa-sra.c: Include attribs.h.
	(ipa_sra_preliminary_function_checks): Do not check for
	TYPE_ATTRIBUTES.
2021-11-13 | path solver: Merge path_range_query constructors. | Aldy Hernandez | 2 files, +21/-27
There's no need for two constructors when we can do it all with one that defaults to the common behavior:

    path_range_query (bool resolve = true, gimple_ranger *ranger = NULL);

Tested on x86-64 Linux.

gcc/ChangeLog:

	* gimple-range-path.cc (path_range_query::path_range_query):
	Merge ctors.
	(path_range_query::import_p): Move from header file.
	(path_range_query::~path_range_query): Adjust for combined
	ctors.
	* gimple-range-path.h: Merge ctors.
	(path_range_query::import_p): Move to .cc file.
2021-11-13 | Fix wrong code with modref and some builtins. | Jan Hubicka | 1 file, +14/-16
ipa-modref gets confused by the EAF flags of memcpy, because parameter 1 is escaping but used only directly.  In modref we do not track values saved to memory and thus we clear all other flags on each store.  This needs to also happen when the called function escapes a parameter.

gcc/ChangeLog:

	PR tree-optimization/103182
	* ipa-modref.c (callee_to_caller_flags): Fix merging of flags.
	(modref_eaf_analysis::analyze_ssa_name): Fix merging of flags.
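A hypothetical illustration (invented, not the PR's testcase) of the kind of pattern involved: the value of a parameter escapes through memory via memcpy even though the parameter itself is only used directly as a call argument.

    #include <string.h>

    int *g;

    void
    f (int *p)
    {
      /* p is passed to memcpy directly, yet its value escapes into
         global memory; EAF flags must reflect the escape.  */
      memcpy (&g, &p, sizeof (p));
    }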
2021-11-13 | Daily bump. | GCC Administrator | 6 files, +323/-1
2021-11-13 | or1k: Fix clobbering of _mcount argument if fPIC is enabled | Stafford Horne | 3 files, +42/-16
Recently we changed the PROFILE_HOOK _mcount call to pass in the link register as an argument.  This actually does not work when the _mcount call uses a PLT because the GOT register setup code ends up getting inserted before the PROFILE_HOOK and clobbers the link register argument.

These glibc tests are failing:
    gmon/tst-gmon-pie-gprof
    gmon/tst-gmon-static-gprof

This patch fixes this by saving the instruction that stores the Link Register to the _mcount argument and then inserts the GOT register setup instructions after that.

For example:

main.c:

    extern int e;

    int f2(int a) {
      return a + e;
    }

    int f1(int a) {
      return f2 (a + a);
    }

    int main(int argc, char ** argv) {
      return f1 (argc);
    }

Compiled:

    or1k-smh-linux-gnu-gcc -Wall -c -O2 -fPIC -pg -S main.c

Before Fix:

    main:
	l.addi	r1, r1, -16
	l.sw	8(r1), r2
	l.sw	0(r1), r16
	l.addi	r2, r1, 16	# Keeping FP, but not needed
	l.sw	4(r1), r18
	l.sw	12(r1), r9
	l.jal	8		# GOT Setup clobbers r9 (Link Register)
	l.movhi	r16, gotpchi(_GLOBAL_OFFSET_TABLE_-4)
	l.ori	r16, r16, gotpclo(_GLOBAL_OFFSET_TABLE_+0)
	l.add	r16, r16, r9
	l.or	r18, r3, r3
	l.or	r3, r9, r9	# This is not the original LR
	l.jal	plt(_mcount)
	l.nop
	l.jal	plt(f1)
	l.or	r3, r18, r18
	l.lwz	r9, 12(r1)
	l.lwz	r16, 0(r1)
	l.lwz	r18, 4(r1)
	l.lwz	r2, 8(r1)
	l.jr	r9
	l.addi	r1, r1, 16

After the fix:

    main:
	l.addi	r1, r1, -12
	l.sw	0(r1), r16
	l.sw	4(r1), r18
	l.sw	8(r1), r9
	l.or	r18, r3, r3
	l.or	r3, r9, r9	# We now have r9 (LR) set early
	l.jal	8		# Clobbers r9 (Link Register)
	l.movhi	r16, gotpchi(_GLOBAL_OFFSET_TABLE_-4)
	l.ori	r16, r16, gotpclo(_GLOBAL_OFFSET_TABLE_+0)
	l.add	r16, r16, r9
	l.jal	plt(_mcount)
	l.nop
	l.jal	plt(f1)
	l.or	r3, r18, r18
	l.lwz	r9, 8(r1)
	l.lwz	r16, 0(r1)
	l.lwz	r18, 4(r1)
	l.jr	r9
	l.addi	r1, r1, 12

Fixes: 308531d148a ("or1k: Add return address argument to _mcount call")

gcc/ChangeLog:

	* config/or1k/or1k-protos.h (or1k_profile_hook): New function.
	* config/or1k/or1k.h (PROFILE_HOOK): Change macro to reference
	new function or1k_profile_hook.
	* config/or1k/or1k.c (struct machine_function): Add new field
	set_mcount_arg_insn.
	(or1k_profile_hook): New function.
	(or1k_init_pic_reg): Update to inject pic rtx after _mcount arg
	when profiling.
	(or1k_frame_pointer_required): Frame pointer no longer needed
	when profiling.
2021-11-12 | Fix wrong code with pure functions | Jan Hubicka | 3 files, +38/-2
I introduced a bug into find_func_aliases_for_call in the handling of pure functions: instead of reading global memory, pure functions were believed to write global memory.  This results in misoptimization of the testcase at -O1.

The change to pta-callused.c updates the template for the new behaviour of the constraint generation: we copy nonlocal memory to callused, which is correct but also not strictly necessary, because later we take care to add the nonlocal_p flag manually.

gcc/ChangeLog:

	PR tree-optimization/103209
	* tree-ssa-structalias.c (find_func_aliases_for_call): Fix use
	of handle_rhs_call.

gcc/testsuite/ChangeLog:

	PR tree-optimization/103209
	* gcc.dg/tree-ssa/pta-callused.c: Update template.
	* gcc.c-torture/execute/pr103209.c: New test.
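For illustration, a hypothetical sketch (not the PR's testcase) of why the distinction matters: a pure function may read global memory, so stores to globals before such calls must not be treated as dead.

    int g;

    /* A pure function: no side effects, but it may read globals.  */
    __attribute__ ((pure)) int
    get_g (void)
    {
      return g;
    }

    int
    f (void)
    {
      g = 1;               /* must NOT be removed: get_g () reads g */
      int a = get_g ();
      g = 2;
      return a + get_g (); /* expected result: 1 + 2 = 3 */
    }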
2021-11-12 | path solver: Solve PHI imports first for ranges. | Aldy Hernandez | 1 file, +13/-2
PHIs must be resolved first while solving ranges in a block, regardless of where they appear in the import bitmap.  We went through a similar exercise for the relational code, but missed these.

Tested on x86-64 & ppc64le Linux.

gcc/ChangeLog:

	PR tree-optimization/103202
	* gimple-range-path.cc
	(path_range_query::compute_ranges_in_block): Solve PHI imports
	first.
2021-11-12 | Fix ipa-pure-const | Jan Hubicka | 1 file, +2/-5
gcc/ChangeLog:

	* ipa-pure-const.c (propagate_pure_const): Remove redundant
	check; fix call of ipa_make_function_const and
	ipa_make_function_pure.
2021-11-12 | analyzer: "__analyzer_dump_state" has no side-effects | David Malcolm | 1 file, +5/-2
gcc/analyzer/ChangeLog:

	* engine.cc (exploded_node::on_stmt_pre): Return when handling
	"__analyzer_dump_state".

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2021-11-12 | aarch64: Remove redundant costing code | Richard Sandiford | 1 file, +30/-112
Previous patches made some of the complex parts of the issue rate code redundant.

gcc/
	* config/aarch64/aarch64.c (aarch64_vector_op::n_advsimd_ops):
	Delete.
	(aarch64_vector_op::m_seen_loads): Likewise.
	(aarch64_vector_costs::aarch64_vector_costs): Don't push to
	m_advsimd_ops.
	(aarch64_vector_op::count_ops): Remove vectype and factor
	parameters.  Remove code that tries to predict different
	vec_flags from the current loop's.
	(aarch64_vector_costs::add_stmt_cost): Update accordingly.
	Remove m_advsimd_ops handling.
2021-11-12 | aarch64: Use new hooks for vector comparisons | Richard Sandiford | 1 file, +145/-146
Previously we tried to account for the different issue rates of the various vector modes by guessing what the Advanced SIMD version of an SVE loop would look like and what its issue rate was likely to be.  We'd then increase the cost of the SVE loop if the Advanced SIMD loop might issue more quickly.

This patch moves that logic to better_main_loop_than_p, so that we can compare loops side-by-side rather than having to guess.  This also means we can apply the issue rate heuristics to *any* vector loop comparison, rather than just weighting SVE vs. Advanced SIMD.

The actual heuristics are otherwise unchanged.  We're just applying them in a different place.

gcc/
	* config/aarch64/aarch64.c
	(aarch64_vector_costs::m_saw_sve_only_op)
	(aarch64_sve_only_stmt_p): Delete.
	(aarch64_vector_costs::prefer_unrolled_loop): New function,
	extracted from adjust_body_cost.
	(aarch64_vector_costs::better_main_loop_than_p): New function,
	using heuristics extracted from adjust_body_cost and
	adjust_body_cost_sve.
	(aarch64_vector_costs::adjust_body_cost_sve): Remove
	advsimd_cycles_per_iter and could_use_advsimd parameters.
	Update after changes above.
	(aarch64_vector_costs::adjust_body_cost): Update after changes
	above.
2021-11-12 | aarch64: Add vf_factor to aarch64_vec_op_count | Richard Sandiford | 1 file, +24/-6
-mtune=neoverse-512tvb sets the likely SVE vector length to 128 bits, but it also takes into account Neoverse V1, which is a 256-bit target.  This patch adds this VF (VL) factor to aarch64_vec_op_count.

gcc/
	* config/aarch64/aarch64.c (aarch64_vec_op_count::m_vf_factor):
	New member variable.
	(aarch64_vec_op_count::aarch64_vec_op_count): Add a parameter
	for it.
	(aarch64_vec_op_count::vf_factor): New function.
	(aarch64_vector_costs::aarch64_vector_costs): When costing for
	neoverse-512tvb, pass a vf_factor of 2 for the Neoverse V1
	version of an SVE loop.
	(aarch64_vector_costs::adjust_body_cost): Read the vf factor
	instead of hard-coding 2.
2021-11-12 | aarch64: Move cycle estimation into aarch64_vec_op_count | Richard Sandiford | 1 file, +105/-98
This patch just moves the main cycle estimation routines into aarch64_vec_op_count.

gcc/
	* config/aarch64/aarch64.c
	(aarch64_vec_op_count::rename_cycles_per_iter): New function.
	(aarch64_vec_op_count::min_nonpred_cycles_per_iter): Likewise.
	(aarch64_vec_op_count::min_pred_cycles_per_iter): Likewise.
	(aarch64_vec_op_count::min_cycles_per_iter): Likewise.
	(aarch64_vec_op_count::dump): Move earlier in file.  Dump the
	above properties too.
	(aarch64_estimate_min_cycles_per_iter): Delete.
	(adjust_body_cost): Use aarch64_vec_op_count::min_cycles_per_iter
	instead of aarch64_estimate_min_cycles_per_iter.  Rely on the
	dump routine to print CPI estimates.
	(adjust_body_cost_sve): Likewise.  Use the other functions above
	instead of doing the work inline.
2021-11-12 | aarch64: Use an array of aarch64_vec_op_counts | Richard Sandiford | 1 file, +60/-55
-mtune=neoverse-512tvb uses two issue rates, one for Neoverse V1 and one with more generic parameters.  We use both rates when making a choice between scalar, Advanced SIMD and SVE code.

Previously we calculated the Neoverse V1 issue rates from the more generic issue rates, but by removing m_scalar_ops and (later) m_advsimd_ops, it becomes easier to track multiple issue rates directly.

This patch therefore converts m_ops and (temporarily) m_advsimd_ops into arrays.

gcc/
	* config/aarch64/aarch64.c (aarch64_vec_op_count): Allow
	default initialization.
	(aarch64_vec_op_count::base_issue_info): Remove handling of
	null issue_infos.
	(aarch64_vec_op_count::simd_issue_info): Likewise.
	(aarch64_vec_op_count::sve_issue_info): Likewise.
	(aarch64_vector_costs::m_ops): Turn into a vector.
	(aarch64_vector_costs::m_advsimd_ops): Likewise.
	(aarch64_vector_costs::aarch64_vector_costs): Add entries to
	the vectors based on aarch64_tune_params.
	(aarch64_vector_costs::analyze_loop_vinfo): Update the pred_ops
	of all entries in m_ops.
	(aarch64_vector_costs::add_stmt_cost): Call count_ops for all
	entries in m_ops.
	(aarch64_estimate_min_cycles_per_iter): Remove issue_info
	parameter and get the information from the ops instead.
	(aarch64_vector_costs::adjust_body_cost_sve): Take a
	aarch64_vec_issue_info instead of a aarch64_vec_op_count.
	(aarch64_vector_costs::adjust_body_cost): Update call
	accordingly.  Exit earlier if m_ops is empty for either cost
	structure.
2021-11-12 | aarch64: Use real scalar op counts | Richard Sandiford | 1 file, +88/-94
Now that vector finish_costs is passed the associated scalar costs, we can record the scalar issue information while computing the scalar costs, rather than trying to estimate it while computing the vector costs.

This simplifies things a little, but the main motivation is to improve accuracy.

gcc/
	* config/aarch64/aarch64.c (aarch64_vector_costs::m_scalar_ops)
	(aarch64_vector_costs::m_sve_ops): Replace with...
	(aarch64_vector_costs::m_ops): ...this.
	(aarch64_vector_costs::analyze_loop_vinfo): Update accordingly.
	(aarch64_vector_costs::adjust_body_cost_sve): Likewise.
	(aarch64_vector_costs::aarch64_vector_costs): Likewise.
	Initialize m_vec_flags here rather than in add_stmt_cost.
	(aarch64_vector_costs::count_ops): Test for scalar reductions
	too.  Allow vectype to be null.
	(aarch64_vector_costs::add_stmt_cost): Call count_ops for
	scalar code too.  Don't require vectype to be nonnull.
	(aarch64_vector_costs::adjust_body_cost): Take the loop_vec_info
	and scalar costs as parameters.  Use the scalar costs to
	determine the cycles per iteration of the scalar loop, then
	multiply it by the estimated VF.
	(aarch64_vector_costs::finish_cost): Update call accordingly.
2021-11-12 | aarch64: Get floatness from stmt_info | Richard Sandiford | 1 file, +12/-2
This patch gets the floatness of a memory access from the data reference rather than the vectype.  This makes it more suitable for use in scalar costing code.

gcc/
	* config/aarch64/aarch64.c (aarch64_dr_type): New function.
	(aarch64_vector_costs::count_ops): Use it rather than the
	vectype to determine floatness.
2021-11-12 | aarch64: Remove vectype from latency tests | Richard Sandiford | 1 file, +13/-20
This patch gets the scalar mode of a reduction operation from the gimple stmt rather than the vectype.  This makes it more suitable for use in scalar costs.

gcc/
	* config/aarch64/aarch64.c
	(aarch64_sve_in_loop_reduction_latency): Remove vectype
	parameter and get floatness from the type of the stmt lhs
	instead.
	(aarch64_in_loop_reduction_latency): Likewise.
	(aarch64_detect_vector_stmt_subtype): Update caller.
	(aarch64_vector_costs::count_ops): Likewise.
2021-11-12 | aarch64: Fold aarch64_sve_op_count into aarch64_vec_op_count | Richard Sandiford | 1 file, +96/-57
Later patches make aarch64 use the new vector hooks.  We then only need to track one set of ops for each aarch64_vector_costs structure.  This in turn means that it's more convenient to merge aarch64_sve_op_count and aarch64_vec_op_count.

The patch also adds issue info and vec flags to aarch64_vec_op_count, so that the structure is more self-descriptive.  This simplifies some things later.

gcc/
	* config/aarch64/aarch64.c (aarch64_sve_op_count): Fold into...
	(aarch64_vec_op_count): ...this.  Add a constructor.
	(aarch64_vec_op_count::vec_flags): New function.
	(aarch64_vec_op_count::base_issue_info): Likewise.
	(aarch64_vec_op_count::simd_issue_info): Likewise.
	(aarch64_vec_op_count::sve_issue_info): Likewise.
	(aarch64_vec_op_count::m_issue_info): New member variable.
	(aarch64_vec_op_count::m_vec_flags): Likewise.
	(aarch64_vector_costs): Add a constructor.
	(aarch64_vector_costs::m_sve_ops): Change type to
	aarch64_vec_op_count.
	(aarch64_vector_costs::aarch64_vector_costs): New function.
	Initialize m_scalar_ops, m_advsimd_ops and m_sve_ops.
	(aarch64_vector_costs::count_ops): Remove vec_flags and
	issue_info parameters, using the new aarch64_vec_op_count
	functions instead.
	(aarch64_vector_costs::add_stmt_cost): Update call accordingly.
	(aarch64_sve_op_count::dump): Fold into...
	(aarch64_vec_op_count::dump): ..here.
2021-11-12 | aarch64: Detect more consecutive MEMs | Richard Sandiford | 2 files, +133/-52
For tests like:

    int res[2];

    void
    f1 (int x, int y)
    {
      res[0] = res[1] = x + y;
    }

we generated:

    add	w0, w0, w1
    adrp	x1, .LANCHOR0
    add	x2, x1, :lo12:.LANCHOR0
    str	w0, [x1, #:lo12:.LANCHOR0]
    str	w0, [x2, 4]
    ret

Using [x1, #:lo12:.LANCHOR0] for the first store prevented the two stores being recognised as a pair.  However, the MEM_EXPR and MEM_OFFSET information tell us that the MEMs really are consecutive.  The peephole2 context then guarantees that the first address is equivalent to [x2, 0].

While there: the reg_mentioned_p tests for loads were probably correct, but seemed a bit indirect.  We're matching two consecutive loads, so the thing we need to test is that the second MEM in the original sequence doesn't depend on the result of the first load in the original sequence.

gcc/
	* config/aarch64/aarch64.c: Include tree-dfa.h.
	(aarch64_check_consecutive_mems): New function that takes
	MEM_EXPR and MEM_OFFSET into account.
	(aarch64_swap_ldrstr_operands): Use it.
	(aarch64_operands_ok_for_ldpstp): Likewise.  Check that the
	address of the second memory doesn't depend on the result of
	the first load.

gcc/testsuite/
	* gcc.target/aarch64/stp_1.c: New test.
2021-11-12 | Fortran/openmp: Fix '!$omp end' | Tobias Burnus | 7 files, +1131/-17
gcc/fortran/ChangeLog:

	* parse.c (decode_omp_directive): Fix permitting 'nowait' for
	some combined directives, add missing 'omp end ... loop'.
	(gfc_ascii_statement): Fix ST_OMP_END_TEAMS_LOOP result.
	* openmp.c (resolve_omp_clauses): Add missing combined loop
	constructs case values to the 'if(directive-name: ...)' check.
	* trans-openmp.c (gfc_split_omp_clauses): Put nowait on target
	if first leaf construct accepting it.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/unexpected-end.f90: Update dg-error.
	* gfortran.dg/gomp/clauses-1.f90: New test.
	* gfortran.dg/gomp/nowait-2.f90: New test.
	* gfortran.dg/gomp/nowait-3.f90: New test.
2021-11-12 | Fix exit condition in ipa_make_function_pure | Jan Hubicka | 1 file, +1/-1
gcc/ChangeLog:

	* ipa-pure-const.c (ipa_make_function_pure): Fix exit condition.
2021-11-12 | Fix ICE in tree-ssa-structalias.c | Jan Hubicka | 1 file, +10/-1
	PR tree-optimization/103175
	* ipa-modref.c (modref_lattice::merge): Add sanity check.
	(callee_to_caller_flags): Make flags adjustment sane.
	(modref_eaf_analysis::analyze_ssa_name): Likewise.
2021-11-12 | Fortran: Use build_debug_expr_decl to create DEBUG_EXPR_DECLs | Martin Jambor | 1 file, +2/-4
This patch converts one more open-coded construction of a DEBUG_EXPR_DECL to a call of build_debug_expr_decl that I missed in my previous patch because it happens to be in the Fortran front-end.

gcc/fortran/ChangeLog:

2021-11-11  Martin Jambor  <mjambor@suse.cz>

	* trans-types.c (gfc_get_array_descr_info): Use
	build_debug_expr_decl instead of building DEBUG_EXPR_DECL
	manually.
2021-11-12 | testsuite: Filter out TSVC test on Power [PR103051] | Martin Liska | 1 file, +1/-1
	PR testsuite/103051

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/tsvc/vect-tsvc-s112.c: Skip test for old Power
	CPUs.
2021-11-12 | jit: fix -Werror=format-overflow= in testsuite [PR103199] | David Malcolm | 2 files, +2/-2
gcc/jit/ChangeLog:

	PR jit/103199
	* docs/examples/tut04-toyvm/toyvm.c (toyvm_function_compile):
	Increase size of buffer.
	* docs/examples/tut04-toyvm/toyvm.cc
	(compilation_state::create_function): Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2021-11-12 | Fix ipa-modref pure/const discovery | Jan Hubicka | 1 file, +5/-3
	PR ipa/103200
	* ipa-modref.c (analyze_function, modref_propagate_in_scc): Do
	not mark pure/const function if there are side-effects.
2021-11-12 | openmp: Relax handling of implicit map vs. existing device mappings | Chung-Lin Tang | 9 files, +76/-11
This patch implements relaxing the requirements when a map with the implicit attribute encounters an overlapping existing map.  As the OpenMP 5.0 spec describes on page 320, lines 18-27 (and 5.1 spec, page 352, lines 13-22):

"If a single contiguous part of the original storage of a list item with an implicit data-mapping attribute has corresponding storage in the device data environment prior to a task encountering the construct that is associated with the map clause, only that part of the original storage will have corresponding storage in the device data environment as a result of the map clause."

2021-11-12  Chung-Lin Tang  <cltang@codesourcery.com>

include/ChangeLog:

	* gomp-constants.h (GOMP_MAP_FLAG_SPECIAL_3): Define special
	bit macro.
	(GOMP_MAP_IMPLICIT): New special map kind bits value.
	(GOMP_MAP_FLAG_SPECIAL_BITS): Define helper mask for whole set
	of special map kind bits.
	(GOMP_MAP_IMPLICIT_P): New predicate macro for implicit map
	kinds.

gcc/ChangeLog:

	* tree.h (OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P): New access macro
	for 'implicit' bit, using 'base.deprecated_flag' field of
	tree_node.
	* tree-pretty-print.c (dump_omp_clause): Add support for
	printing implicit attribute in tree dumping.
	* gimplify.c (gimplify_adjust_omp_clauses_1): Set
	OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P to 1 if map clause is
	implicitly created.
	(gimplify_adjust_omp_clauses): Adjust place of adding implicitly
	created clauses, from simple append, to starting of list, after
	non-map clauses.
	* omp-low.c (lower_omp_target): Add GOMP_MAP_IMPLICIT bits into
	kind values passed to libgomp for implicit maps.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/target-implicit-map-1.c: New test.
	* c-c++-common/goacc/combined-reduction.c: Adjust scan test
	pattern.
	* c-c++-common/goacc/firstprivate-mappings-1.c: Likewise.
	* c-c++-common/goacc/mdc-1.c: Likewise.
	* g++.dg/goacc/firstprivate-mappings-1.C: Likewise.

libgomp/ChangeLog:

	* target.c (gomp_map_vars_existing): Add 'bool implicit'
	parameter, add implicit map handling to allow a "superset"
	existing map as valid case.
	(get_kind): Adjust to filter out GOMP_MAP_IMPLICIT bits in
	return value.
	(get_implicit): New function to extract implicit status.
	(gomp_map_fields_existing): Adjust arguments in calls to
	gomp_map_vars_existing, and add uses of get_implicit.
	(gomp_map_vars_internal): Likewise.
	* testsuite/libgomp.c-c++-common/target-implicit-map-1.c: New
	test.
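For illustration, a hypothetical sketch (in the spirit of target-implicit-map-1.c, not copied from it) of a now-accepted overlap between a partial explicit map and a later implicit map:

    int a[100];

    void
    f (void)
    {
      /* Only part of 'a' has corresponding device storage.  */
      #pragma omp target enter data map(to: a[0:50])

      /* 'a' gets an implicit map here; with the relaxation, the
         existing partial mapping is accepted, and only the mapped
         part has corresponding storage on the device.  */
      #pragma omp target
      a[10] += 1;

      #pragma omp target exit data map(from: a[0:50])
    }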
2021-11-12 | fortran: Ignore unused args in scalarization [PR97896] | Mikael Morin | 9 files, +121/-83
The KIND argument of the INDEX intrinsic is a compile-time constant that is used at compile time only to resolve to a kind-specific library function.  That argument is otherwise completely ignored at runtime, and there is no code generated for it as the library procedure has no kind argument.  This confuses the scalarizer, which expects to see every argument of elemental functions used when calling a procedure.

This change removes the argument from the scalarization lists at the beginning of the scalarization process, so that the argument is completely ignored.  This also reverts the existing workaround (commit d09847357b965a2c2cda063827ce362d4c9c86f2, except for its testcase).

	PR fortran/97896

gcc/fortran/ChangeLog:

	* intrinsic.c (add_sym_4ind): Remove.
	(add_functions): Use add_sym_4 instead of add_sym_4ind.
	Don’t special case the index intrinsic.
	* iresolve.c (gfc_resolve_index_func): Use the individual
	arguments directly instead of the full argument list.
	* intrinsic.h (gfc_resolve_index_func): Update the declaration
	accordingly.
	* trans-decl.c (gfc_get_extern_function_decl): Don’t modify the
	list of arguments in the case of the index intrinsic.
	* trans-array.h (gfc_get_intrinsic_for_expr,
	gfc_get_proc_ifc_for_expr): New.
	* trans-array.c (gfc_get_intrinsic_for_expr,
	arg_evaluated_for_scalarization): New.
	(gfc_walk_elemental_function_args): Add intrinsic procedure
	as argument.  Count arguments.  Check
	arg_evaluated_for_scalarization.
	* trans-intrinsic.c (gfc_walk_intrinsic_function): Update call.
	* trans-stmt.c (get_intrinsic_for_code): New.
	(gfc_trans_call): Update call.

gcc/testsuite/ChangeLog:

	* gfortran.dg/index_5.f90: New.
2021-11-12 | openmp: Honor OpenMP 5.1 num_teams lower bound | Jakub Jelinek | 4 files, +42/-10
The following patch implements what I've been talking about earlier: honor that for an explicit num_teams clause we create at least the lower-bound (if not specified, upper-bound) number of teams in the league.  For host fallback, it still means we only have one thread doing all the teams, sequentially one after another.

For PTX and GCN, I think the new teams-2.c test and maybe teams-4.c too will or might fail.  For these offloads, I think it is ok to remove symbols no longer used from libgomp.a.  If num_teams_lower is bigger than the provided num_blocks or num_workgroups, we should arrange for gomp_num_teams_var to be num_teams_lower - 1, stop using the %ctaid.x or __builtin_gcn_dim_pos (0) for omp_get_team_num () and instead use for it some .shared var that GOMP_teams4 initializes to %ctaid.x or __builtin_gcn_dim_pos (0) when first and for !first increment that by num_blocks or num_workgroups each time and only return false when we are above num_teams_lower.  Any help with actually implementing this for the 2 architectures highly appreciated.

2021-11-12  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* omp-builtins.def (BUILT_IN_GOMP_TEAMS): Remove.
	(BUILT_IN_GOMP_TEAMS4): New.
	* builtin-types.def (BT_FN_VOID_UINT_UINT): Remove.
	(BT_FN_BOOL_UINT_UINT_UINT_BOOL): New.
	* omp-low.c (lower_omp_teams): Use GOMP_teams4 instead of
	GOMP_teams, pass to it also num_teams lower-bound expression
	or a dup of upper-bound if it is missing and a flag whether
	it is the first call or not.

gcc/fortran/
	* types.def (BT_FN_VOID_UINT_UINT): Remove.
	(BT_FN_BOOL_UINT_UINT_UINT_BOOL): New.

libgomp/
	* libgomp_g.h (GOMP_teams4): Declare.
	* libgomp.map (GOMP_5.1): Export GOMP_teams4.
	* target.c (GOMP_teams4): New function.
	* config/nvptx/target.c (GOMP_teams): Remove.
	(GOMP_teams4): New function.
	* config/gcn/target.c (GOMP_teams): Remove.
	(GOMP_teams4): New function.
	* testsuite/libgomp.c/teams-4.c (main): Expect exactly 2 teams
	instead of <= 2.
	* testsuite/libgomp.c-c++-common/teams-2.c: New test.
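For illustration, a hypothetical example (not from the testsuite) of the OpenMP 5.1 syntax this patch honors: num_teams(4 : 8) requests a league of at least 4 and at most 8 teams.

    #include <omp.h>
    #include <stdio.h>

    int
    main (void)
    {
      /* With the lower bound honored, the league must contain
         at least 4 teams (and at most 8).  */
      #pragma omp teams num_teams(4 : 8)
      if (omp_get_team_num () == 0)
        printf ("league size: %d\n", omp_get_num_teams ());
      return 0;
    }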
2021-11-12 | Remove unused function. | Martin Liska | 1 file, +0/-61
	PR tree-optimization/102497

gcc/ChangeLog:

	* gimple-predicate-analysis.cc (add_pred): Remove unused
	function.
2021-11-12 | tree-optimization/103204 - fix missed valueization in VN | Richard Biener | 2 files, +24/-5
The following fixes a missed valueization when simplifying a MEM[&...] combination during valueization.

2021-11-12  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/103204
	* tree-ssa-sccvn.c (valueize_refs_1): Re-valueize the top
	operand after folding in an address.

	* gcc.dg/torture/pr103204.c: New testcase.
2021-11-12 | Daily bump. | GCC Administrator | 7 files, +679/-1
2021-11-11 | Make ranger optional in path_range_query. | Aldy Hernandez | 4 files, +37/-24
All users of path_range_query are currently allocating a gimple_ranger only to pass it to the query object.  It's tidier to just do it from path_range_query if no ranger was passed.

Tested on x86-64 Linux.

gcc/ChangeLog:

	* gimple-range-path.cc (path_range_query::path_range_query): New
	ctor without a ranger.
	(path_range_query::~path_range_query): Free ranger if necessary.
	(path_range_query::range_on_path_entry): Adjust m_ranger for
	pointer.
	(path_range_query::ssa_range_in_phi): Same.
	(path_range_query::compute_ranges_in_block): Same.
	(path_range_query::compute_imports): Same.
	(path_range_query::compute_ranges): Same.
	(path_range_query::range_of_stmt): Same.
	(path_range_query::compute_outgoing_relations): Same.
	* gimple-range-path.h (class path_range_query): New ctor.
	* tree-ssa-loop-ch.c (ch_base::copy_headers): Remove
	gimple_ranger as path_range_query allocates one.
	* tree-ssa-threadbackward.c (class back_threader): Remove
	m_ranger.
	(back_threader::~back_threader): Same.