path: root/gcc
2021-03-28  d: Define language hook for LANG_HOOKS_ENUM_UNDERLYING_BASE_TYPE  [Iain Buclaw; 1 file, +12/-0]

The underlying base type for enumerals is always present in TREE_TYPE.

gcc/d/ChangeLog:
    * d-lang.cc (d_enum_underlying_base_type): New function.
    (LANG_HOOKS_ENUM_UNDERLYING_BASE_TYPE): Set as
    d_enum_underlying_base_type.
2021-03-28  d: Use COMPILER_FOR_BUILD to build all D front-end generator programs  [Iain Buclaw; 2 files, +15/-3]

This means the correct config headers are included when building the
D front-end in a Canadian cross configuration.

gcc/d/ChangeLog:
    * Make-lang.in (DMDGEN_COMPILE): Remove.
    (d/%.dmdgen.o): Use COMPILER_FOR_BUILD and BUILD_COMPILERFLAGS to
    build all D generator programs.
    (D_SYSTEM_H): New macro.
    (d/idgen.dmdgen.o): Add dependencies to build.
    (d/impcnvgen.dmdgen.o): Likewise.
    * d-system.h: Include bconfig.h if GENERATOR_FILE is defined.
2021-03-28  d: Don't generate per-module wrapper for calling DSO constructor/destructor.  [Iain Buclaw; 4 files, +6/-50]

The static constructor/destructor list only ever has one function to
call in it, so mark the gdc.dso_ctor and gdc.dso_dtor functions as
static ctor/dtor directly instead.

gcc/d/ChangeLog:
    * config-lang.in (gtfiles): Remove modules.cc.
    * modules.cc (struct module_info): Remove GTY marker.
    (static_ctor_list): Remove variable.
    (static_dtor_list): Remove variable.
    (register_moduleinfo): Directly set DECL_STATIC_CONSTRUCTOR on
    dso_ctor, and DECL_STATIC_DESTRUCTOR on dso_dtor.
    (d_finish_compilation): Remove static ctor/dtor handling.

gcc/testsuite/ChangeLog:
    * gdc.dg/gdc270a.d: Removed.
    * gdc.dg/gdc270b.d: Removed.
2021-03-28  Daily bump.  [GCC Administrator; 2 files, +5/-1]
2021-03-27  fortran: Fix off-by-one in buffer sizes.  [Steve Kargl; 1 file, +4/-2]

gcc/fortran/ChangeLog:
    * misc.c (gfc_typename): Fix off-by-one in buffer sizes.
2021-03-27  Daily bump.  [GCC Administrator; 5 files, +366/-1]
2021-03-26  aix: ABI struct alignment (PR99557)  [David Edelsohn; 4 files, +130/-19]

The AIX power alignment rules apply the natural alignment of the
"first member" if it is of a floating-point data type (or is an
aggregate whose recursively "first" member or element is such a type).
The alignment associated with these types for subsequent members uses
an alignment value where the floating-point data type is considered to
have 4-byte alignment.

GCC had been stripping the array type but had not recursively looked
within structs and unions.  This also applies to classes and
subclasses and, therefore, becomes more prominent with C++.

For example:

    struct A { double x[2]; int y; };
    struct B { int i; struct A a; };

struct A has double-word alignment for the bare type, but word
alignment and offset within struct B despite the alignment of
struct A.  If struct A were the first member of struct B, struct B
would have double-word alignment.  One must search for the innermost
first member to increase the alignment if double, and then search for
the innermost first member to reduce the alignment if the TYPE had
double-word alignment solely because the innermost first member was
double.

This patch recursively looks through the first member to apply the
double-word alignment to the struct / union as a whole and to apply
the word alignment to the struct or union as a member within a struct
or union.

This is an ABI change for GCC on AIX, but GCC on AIX had not correctly
implemented the AIX ABI and had not been compatible with the IBM XL
compiler.

Bootstrapped on powerpc-ibm-aix7.2.3.0.

gcc/ChangeLog:
    * config/rs6000/aix.h (ADJUST_FIELD_ALIGN): Call function.
    * config/rs6000/rs6000-protos.h (rs6000_special_adjust_field_align):
    Declare.
    * config/rs6000/rs6000.c (rs6000_special_adjust_field_align): New.
    (rs6000_special_round_type_align): Recursively check innermost
    first field.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/pr99557.c: New.
2021-03-26  dwarf2cfi: Defer queued register saves some more [PR99334]  [Jakub Jelinek; 2 files, +32/-4]

On the testcase in the PR with -fno-tree-sink -O3 -fPIC
-fomit-frame-pointer -fno-strict-aliasing -mstackrealign we have
prologue:

    0000000000000000 <_func_with_dwarf_issue_>:
       0:  4c 8d 54 24 08        lea    0x8(%rsp),%r10
       5:  48 83 e4 f0           and    $0xfffffffffffffff0,%rsp
       9:  41 ff 72 f8           pushq  -0x8(%r10)
       d:  55                    push   %rbp
       e:  48 89 e5              mov    %rsp,%rbp
      11:  41 57                 push   %r15
      13:  41 56                 push   %r14
      15:  41 55                 push   %r13
      17:  41 54                 push   %r12
      19:  41 52                 push   %r10
      1b:  53                    push   %rbx
      1c:  48 83 ec 20           sub    $0x20,%rsp

and emit unwind info for that:

    00000000 0000000000000014 00000000 CIE
      Version:               1
      Augmentation:          "zR"
      Code alignment factor: 1
      Data alignment factor: -8
      Return address column: 16
      Augmentation data:     1b
      DW_CFA_def_cfa: r7 (rsp) ofs 8
      DW_CFA_offset: r16 (rip) at cfa-8
      DW_CFA_nop
      DW_CFA_nop

    00000018 0000000000000044 0000001c FDE cie=00000000 pc=0000000000000000..00000000000001d5
      DW_CFA_advance_loc: 5 to 0000000000000005
      DW_CFA_def_cfa: r10 (r10) ofs 0
      DW_CFA_advance_loc: 9 to 000000000000000e
      DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
      DW_CFA_advance_loc: 13 to 000000000000001b
      DW_CFA_def_cfa_expression (DW_OP_breg6 (rbp): -40; DW_OP_deref)
      DW_CFA_expression: r15 (r15) (DW_OP_breg6 (rbp): -8)
      DW_CFA_expression: r14 (r14) (DW_OP_breg6 (rbp): -16)
      DW_CFA_expression: r13 (r13) (DW_OP_breg6 (rbp): -24)
      DW_CFA_expression: r12 (r12) (DW_OP_breg6 (rbp): -32)
      ...

The problem is that when an async signal (or stepping through in the
debugger) stops after the pushq %rbp instruction and before
movq %rsp, %rbp, the unwind info says that the caller's %rbp is saved
there at *%rbp, but that is not true: the caller's %rbp is either
still available in the %rbp register, or in *%rsp.  Only after
executing the next instruction - movq %rsp, %rbp - is the location
for %rbp correct.
So, either we'd need to temporarily say:

    DW_CFA_advance_loc: 9 to 000000000000000e
    DW_CFA_expression: r6 (rbp) (DW_OP_breg7 (rsp): 0)
    DW_CFA_advance_loc: 3 to 0000000000000011
    DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
    DW_CFA_advance_loc: 10 to 000000000000001b

or, what seems more compact to me, just say:

    DW_CFA_advance_loc: 12 to 0000000000000011
    DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
    DW_CFA_advance_loc: 10 to 000000000000001b

I've tried instead to deal with it through REG_FRAME_RELATED_EXPR from
the backend, but that failed miserably as explained in the PR:
dwarf2cfi.c has some rules (Rule 16 to Rule 19) that are specific to
the dynamic stack realignment using a drap register that only the i386
backend does right now, and by using REG_FRAME_RELATED_EXPR or
REG_CFA* notes we can't emulate those rules.

The following patch instead does the deferring of the hard frame
pointer save rule in the dwarf2cfi.c Rule 18 handling and emits it on
the (set hfp sp) assignment that must appear shortly after it, and
adds an assertion that this is the case.  The difference
before/after the patch on the assembly is:

    --- pr99334.s~  2021-03-26 15:42:40.881749380 +0100
    +++ pr99334.s   2021-03-26 17:38:05.729161910 +0100
    @@ -11,8 +11,8 @@ _func_with_dwarf_issue_:
        andq    $-16, %rsp
        pushq   -8(%r10)
        pushq   %rbp
    -   .cfi_escape 0x10,0x6,0x2,0x76,0
        movq    %rsp, %rbp
    +   .cfi_escape 0x10,0x6,0x2,0x76,0
        pushq   %r15
        pushq   %r14
        pushq   %r13

i.e. it does just what we IMHO need: after pushq %rbp, %rbp still
contains the parent's frame value and so the save rule doesn't need to
be overridden there, ditto at the start of the next insn before the
side-effect takes effect; we override it only afterwards, when %rbp
already has the right value.  If some other target adds dynamic stack
realignment in the future and the offset 0 case doesn't hold there,
the code can be adjusted so that it works on all the drap
architectures; I'm pretty sure the code would need other adjustments
too.
For rule 18 and for the (set hfp sp) after it, we already have asserts
for the drap cases that check whether the code looks the way
i?86/x86_64 emit it currently.

2021-03-26  Jakub Jelinek  <jakub@redhat.com>

    PR debug/99334
    * dwarf2out.h (struct dw_fde_node): Add rule18 member.
    * dwarf2cfi.c (dwarf2out_frame_debug_expr): When handling
    (set hfp sp) assignment with drap_reg active, queue reg save for
    hfp with offset 0 and flush queued reg saves.  When handling
    a push with rule18, defer queueing reg save for hfp and just
    assert the offset is 0.
    (scan_trace): Assert that fde->rule18 is false.
2021-03-26  PR tree-optimization/59970 - Bogus -Wmaybe-uninitialized at low optimization levels  [Martin Sebor; 1 file, +79/-0]

    PR tree-optimization/59970
    * gcc.dg/uninit-pr59970.c: New test.
2021-03-26  c++: ICE on invalid with NSDMI in C++98 [PR98352]  [Marek Polacek; 3 files, +10/-2]

NSDMIs are a C++11 thing, and here we ICE with them on the non-C++11
path.  Fortunately all we need is a small tweak to my recent r11-7835
patch.

gcc/cp/ChangeLog:
    PR c++/98352
    * method.c (implicitly_declare_fn): Pass &raises to
    synthesized_method_walk.

gcc/testsuite/ChangeLog:
    PR c++/98352
    * g++.dg/cpp0x/inh-ctor37.C: Remove dg-error.
    * g++.dg/cpp0x/nsdmi17.C: New test.
2021-03-26  c++: imported templates and alias-template changes [PR 99283]  [Nathan Sidwell; 13 files, +259/-245]

During development of modules, I had difficulty deciding whether the
module flags of a template should live on the decl_template_result,
the template_decl, or both.  I chose the latter, and require them to
be consistent.  This and a few other defects show how hard that
consistency is.  Hence this patch moves to holding the flags on the
template-decl-result decl.  That's the entity various bits of the
parser have at the appropriate time.  One needs STRIP_TEMPLATE in a
bunch of places, which this patch adds.  Also a check that we never
give a TEMPLATE_DECL to the module flag accessors.

This left a problem with how I was handling template aliases.  These
were in two parts -- separating the TEMPLATE_DECL from the TYPE_DECL.
That seemed somewhat funky, but development showed it necessary.  Of
course, that causes problems if the TEMPLATE_DECL cannot contain
'am imported' information.  Investigating now shows that we do not
need to treat them separately.  By reverting a bit of template
instantiation machinery that caused the problem, we're back on course.
I think what has happened is that between then and now, other typedef
fixes have corrected the underlying problem this separation was
working around.  It allows a bunch of cleanup in the decl streamer, as
we no longer have to handle a null TEMPLATE_DECL_RESULT.

    PR c++/99283
gcc/cp/
    * cp-tree.h (DECL_MODULE_CHECK): Ban TEMPLATE_DECL.
    (SET_TYPE_TEMPLATE_INFO): Restore Alias template setting.
    * decl.c (duplicate_decls): Remove template_decl module flag
    propagation.
    * module.cc (merge_kind_name): Add alias tmpl spec as a thing.
    (dumper::impl::nested_name): Adjust for template-decl module flag
    change.
    (trees_in::assert_definition): Likewise.
    (trees_in::install_entity): Likewise.
    (trees_out::decl_value): Likewise.  Remove alias template
    separation of template and type_decl.
    (trees_in::decl_value): Likewise.
    (trees_out::key_mergeable): Likewise.
    (trees_in::key_mergeable): Likewise.
    (trees_out::decl_node): Adjust for template-decl module flag
    change.
    (depset::hash::make_dependency): Likewise.
    (get_originating_module, module_may_redeclare): Likewise.
    (set_instantiating_module, set_defining_module): Likewise.
    * name-lookup.c (name_lookup::search_adl): Likewise.
    (do_pushdecl): Likewise.
    * pt.c (build_template_decl): Likewise.
    (lookup_template_class_1): Remove special alias_template handling
    of DECL_TI_TEMPLATE.
    (tsubst_template_decl): Likewise.
gcc/testsuite/
    * g++.dg/modules/pr99283-2_a.H: New.
    * g++.dg/modules/pr99283-2_b.H: New.
    * g++.dg/modules/pr99283-2_c.H: New.
    * g++.dg/modules/pr99283-3_a.H: New.
    * g++.dg/modules/pr99283-3_b.H: New.
    * g++.dg/modules/pr99283-4.H: New.
    * g++.dg/modules/tpl-alias-1_a.H: Adjust scans.
    * g++.dg/modules/tpl-alias-1_b.C: Adjust scans.
2021-03-26  [PR99766] Consider relaxed memory associated more with memory instead of special memory.  [Vladimir Makarov; 6 files, +29/-5]

Relaxed memory should be considered more like memory than special
memory.

gcc/ChangeLog:
    PR target/99766
    * ira-costs.c (record_reg_classes): Put case with
    CT_RELAXED_MEMORY adjacent to one with CT_MEMORY.
    * ira.c (ira_setup_alts): Ditto.
    * lra-constraints.c (process_alt_operands): Ditto.
    * recog.c (asm_operand_ok): Ditto.
    * reload.c (find_reloads): Ditto.

gcc/testsuite/ChangeLog:
    PR target/99766
    * g++.target/aarch64/sve/pr99766.C: New.
2021-03-26  aarch64: Add costs for LD[34] and ST[34] postincrements  [Richard Sandiford; 2 files, +45/-2]

Most postincrements are cheap on Neoverse V1, but it's generally
better to avoid them on LD[34] and ST[34] instructions.  This patch
adds separate address costs fields for these cases.  Other CPUs
continue to use the same costs for all postincrements.

gcc/
    * config/aarch64/aarch64-protos.h
    (cpu_addrcost_table::post_modify_ld3_st3): New member variable.
    (cpu_addrcost_table::post_modify_ld4_st4): Likewise.
    * config/aarch64/aarch64.c (generic_addrcost_table): Update
    accordingly, using the same costs as for post_modify.
    (exynosm1_addrcost_table, xgene1_addrcost_table): Likewise.
    (thunderx2t99_addrcost_table, thunderx3t110_addrcost_table):
    (tsv110_addrcost_table, qdf24xx_addrcost_table): Likewise.
    (a64fx_addrcost_table): Likewise.
    (neoversev1_addrcost_table): New.
    (neoversev1_tunings): Use neoversev1_addrcost_table.
    (aarch64_address_cost): Use the new post_modify costs for CImode
    and XImode.
2021-03-26  aarch64: Take issue rate into account for vector loop costs  [Richard Sandiford; 4 files, +966/-21]

When SVE is enabled, GCC needs to do a three-way comparison between
scalar, Advanced SIMD and SVE code.  The normal costs tend to be
latency-based, which is well-suited to SLP.  However, comparing sums
of latency costs means that we effectively treat the code as executing
sequentially.  This can hide the effect of pipeline bubbles or
resource contention that in practice are quite important for loop
vectorisation.  This is particularly true for loops that involve
reductions.

This patch therefore tries to estimate how quickly each piece of code
could issue, using a very (very) simplistic model.  It then uses this
to adjust the loop vector costs up or down as appropriate.  Part of
the Advanced SIMD vs. SVE adjustment is opt-in and is not enabled by
default even for use_new_vector_costs.

Like with the previous patches, this one only becomes active if a CPU
selects use_new_vector_costs.  It should therefore have a very low
impact on other CPUs.  The code also mostly ignores CPUs that have no
issue information, even if use_new_vector_costs is enabled for some
reason.

gcc/
    * config/aarch64/aarch64.opt
    (-param=aarch64-loop-vect-issue-rate-niters=): New parameter.
    * doc/invoke.texi: Document it.
    * config/aarch64/aarch64-protos.h (aarch64_base_vec_issue_info)
    (aarch64_scalar_vec_issue_info, aarch64_simd_vec_issue_info)
    (aarch64_advsimd_vec_issue_info, aarch64_sve_vec_issue_info)
    (aarch64_vec_issue_info): New structures.
    (cpu_vector_cost): Write comments above the variables rather
    than to the side.
    (cpu_vector_cost::issue_info): New member variable.
    * config/aarch64/aarch64.c: Include gimple-pretty-print.h
    and tree-ssa-loop-niter.h.
    (generic_vector_cost, a64fx_vector_cost, qdf24xx_vector_cost)
    (thunderx_vector_cost, tsv110_vector_cost, cortexa57_vector_cost)
    (exynosm1_vector_cost, xgene1_vector_cost, thunderx2t99_vector_cost)
    (thunderx3t110_vector_cost): Initialize issue_info to null.
    (neoversev1_scalar_issue_info, neoversev1_advsimd_issue_info)
    (neoversev1_sve_issue_info, neoversev1_vec_issue_info): New
    structures.
    (neoversev1_vector_cost): Use them.
    (aarch64_vec_op_count, aarch64_sve_op_count): New structures.
    (aarch64_vector_costs::saw_sve_only_op): New member variable.
    (aarch64_vector_costs::num_vector_iterations): Likewise.
    (aarch64_vector_costs::scalar_ops): Likewise.
    (aarch64_vector_costs::advsimd_ops): Likewise.
    (aarch64_vector_costs::sve_ops): Likewise.
    (aarch64_vector_costs::seen_loads): Likewise.
    (aarch64_simd_vec_costs_for_flags): New function.
    (aarch64_analyze_loop_vinfo): Initialize num_vector_iterations.
    Count the number of predicate operations required by SVE WHILE
    instructions.
    (aarch64_comparison_type, aarch64_multiply_add_p): New functions.
    (aarch64_sve_only_stmt_p, aarch64_in_loop_reduction_latency):
    Likewise.
    (aarch64_count_ops): Likewise.
    (aarch64_add_stmt_cost): Record whether we see an SVE operation
    that cannot currently be implemented using Advanced SIMD.  Record
    issue information about the scalar, Advanced SIMD and (where
    relevant) SVE versions of a loop.
    (aarch64_vec_op_count::dump): New function.
    (aarch64_sve_op_count::dump): Likewise.
    (aarch64_estimate_min_cycles_per_iter): Likewise.
    (aarch64_adjust_body_cost): If issue information is available,
    try to compare the issue rates of the various loop implementations
    and increase or decrease the vector body cost accordingly.
2021-03-26  aarch64: Ignore inductions when costing vector code  [Richard Sandiford; 1 file, +6/-0]

In practice it seems to be better not to cost a vector induction.
The scalar code generally needs the same induction but doesn't cost
it, making an apples-for-apples comparison harder.  Most inductions
also have a low latency and their cost usually gets hidden by other
operations.

Like with the previous patches, this one only becomes active if a CPU
selects use_new_vector_costs.  It should therefore have a very low
impact on other CPUs.

gcc/
    * config/aarch64/aarch64.c (aarch64_detect_vector_stmt_subtype):
    Assume a zero cost for induction phis.
2021-03-26  aarch64: Cost comparisons embedded in COND_EXPRs  [Richard Sandiford; 1 file, +33/-0]

So far the costing of COND_EXPRs hasn't distinguished between cases in
which the condition is calculated separately or is built into the
COND_EXPR itself.  This patch adds the cost of any embedded
comparison.

Like with the previous patches, this one only becomes active if a CPU
selects use_new_vector_costs.  It should therefore have a very low
impact on other CPUs.

gcc/
    * config/aarch64/aarch64.c (aarch64_embedded_comparison_type): New
    function.
    (aarch64_adjust_stmt_cost): Add the costs of embedded scalar and
    vector comparisons.
2021-03-26  aarch64: Detect scalar extending loads  [Richard Sandiford; 1 file, +27/-4]

If the scalar code does an integer load followed by an integer
extension, we've tended to cost that as two separate operations, even
though the extension is probably going to be free in practice.  This
patch treats the extension as having zero cost, like we already do for
extending SVE loads.

Like with previous patches, this one only becomes active if a CPU
selects use_new_vector_costs.  It should therefore have a very low
impact on other CPUs.

gcc/
    * config/aarch64/aarch64.c (aarch64_detect_scalar_stmt_subtype):
    New function.
    (aarch64_add_stmt_cost): Call it.
2021-03-26  aarch64: Try to detect when Advanced SIMD code would be completely unrolled  [Richard Sandiford; 2 files, +210/-7]

GCC usually costs the SVE and Advanced SIMD versions of a loop and
picks the one with the lowest cost.  By default it will choose SVE
over Advanced SIMD in the event of a tie.  This is normally the
correct behaviour, not least because SVE can handle every scalar
iteration count whereas Advanced SIMD can only handle full vectors.
However, there is one important exception that GCC failed to consider:
we can completely unroll Advanced SIMD code at compile time, but we
can't do the same for SVE.

This patch therefore adds an opt-in heuristic to guess whether the
Advanced SIMD version of a loop is likely to be unrolled.  This will
only be suitable for some CPUs, so it is not enabled by default and is
controlled separately from use_new_vector_costs.

Like with previous patches, this one only becomes active if a CPU
selects both of the new tuning parameters.  It should therefore have a
very low impact on other CPUs.

gcc/
    * config/aarch64/aarch64-tuning-flags.def
    (matched_vector_throughput): New tuning parameter.
    * config/aarch64/aarch64.c (neoversev1_tunings): Use it.
    (aarch64_estimated_sve_vq): New function.
    (aarch64_vector_costs::analyzed_vinfo): New member variable.
    (aarch64_vector_costs::is_loop): Likewise.
    (aarch64_vector_costs::unrolled_advsimd_niters): Likewise.
    (aarch64_vector_costs::unrolled_advsimd_stmts): Likewise.
    (aarch64_record_potential_advsimd_unrolling): New function.
    (aarch64_analyze_loop_vinfo, aarch64_analyze_bb_vinfo): Likewise.
    (aarch64_add_stmt_cost): Call aarch64_analyze_loop_vinfo or
    aarch64_analyze_bb_vinfo on the first use of a costs structure.
    Detect whether we're vectorizing a loop for SVE that might be
    completely unrolled if it used Advanced SIMD instead.
    (aarch64_adjust_body_cost_for_latency): New function.
    (aarch64_finish_cost): Call it.
2021-03-26  aarch64: Use an aarch64-specific structure for vector costing  [Richard Sandiford; 1 file, +44/-2]

This patch makes the AArch64 vector code use its own vector costs
structure, rather than just using the default unsigned[3].

Unfortunately, it's not easy to make this change specific to
use_new_vector_costs, so this part is one that affects all CPUs.
The change is relatively mechanical though.

gcc/
    * config/aarch64/aarch64.c (aarch64_vector_costs): New structure.
    (aarch64_init_cost): New function.
    (aarch64_add_stmt_cost): Use aarch64_vector_costs instead of
    the default unsigned[3].
    (aarch64_finish_cost, aarch64_destroy_cost_data): New functions.
    (TARGET_VECTORIZE_INIT_COST): Override.
    (TARGET_VECTORIZE_FINISH_COST): Likewise.
    (TARGET_VECTORIZE_DESTROY_COST_DATA): Likewise.
2021-03-26  aarch64: Add a CPU-specific cost table for Neoverse V1  [Richard Sandiford; 1 file, +93/-2]

This patch adds dedicated vector costs for Neoverse V1.  Previously we
just used the Cortex-A57 costs, which isn't ideal given that
Cortex-A57 doesn't support SVE.

gcc/
    * config/aarch64/aarch64.c (neoversev1_advsimd_vector_cost)
    (neoversev1_sve_vector_cost): New cost structures.
    (neoversev1_vector_cost): Likewise.
    (neoversev1_tunings): Use them.  Enable use_new_vector_costs.
2021-03-26  aarch64: Add costs for one element of a scatter store  [Richard Sandiford; 2 files, +18/-4]

Currently each element in a gather load is costed as a scalar_load and
each element in a scatter store is costed as a scalar_store.  The load
side seems to work pretty well in practice, since many CPU-specific
costs give loads quite a high cost relative to arithmetic operations.
However, stores usually have a cost of just 1, which means that
scatters tend to appear too cheap.

This patch adds a separate cost for one element in a scatter store.

Like with the previous patches, this one only becomes active if a CPU
selects use_new_vector_costs.  It should therefore have a very low
impact on other CPUs.

gcc/
    * config/aarch64/aarch64-protos.h
    (sve_vec_cost::scatter_store_elt_cost): New member variable.
    * config/aarch64/aarch64.c (generic_sve_vector_cost): Update
    accordingly, taking the cost from the cost of a scalar_store.
    (a64fx_sve_vector_cost): Likewise.
    (aarch64_detect_vector_stmt_subtype): Detect scatter stores.
2021-03-26  aarch64: Add costs for storing one element of a vector  [Richard Sandiford; 2 files, +24/-0]

Storing one element of a vector is costed as a vec_to_scalar followed
by a scalar_store.  However, vec_to_scalar is also used for reductions
and for vector-to-GPR moves, which makes it difficult to pick one cost
for them all.

This patch therefore adds a cost for extracting one element of a
vector in preparation for storing it out.  The store itself is still
costed separately.

Like with the previous patches, this one only becomes active if a CPU
selects use_new_vector_costs.  It should therefore have a very low
impact on other CPUs.

gcc/
    * config/aarch64/aarch64-protos.h
    (simd_vec_cost::store_elt_extra_cost): New member variable.
    * config/aarch64/aarch64.c (generic_advsimd_vector_cost): Update
    accordingly, using the vec_to_scalar cost for the new field.
    (generic_sve_vector_cost, a64fx_advsimd_vector_cost): Likewise.
    (a64fx_sve_vector_cost, qdf24xx_advsimd_vector_cost): Likewise.
    (thunderx_advsimd_vector_cost, tsv110_advsimd_vector_cost):
    Likewise.
    (cortexa57_advsimd_vector_cost, exynosm1_advsimd_vector_cost)
    (xgene1_advsimd_vector_cost, thunderx2t99_advsimd_vector_cost)
    (thunderx3t110_advsimd_vector_cost): Likewise.
    (aarch64_detect_vector_stmt_subtype): Detect single-element stores.
2021-03-26  aarch64: Add costs for LD[234]/ST[234] permutes  [Richard Sandiford; 2 files, +101/-0]

At the moment, we cost LD[234] and ST[234] as N vector loads or
stores, which effectively treats the implied permute as free.  This
patch adds additional costs for the permutes, which apply on top of
the costs for the loads and stores.

Like with the previous patches, this one only becomes active if a CPU
selects use_new_vector_costs.  It should therefore have a very low
impact on other CPUs.

gcc/
    * config/aarch64/aarch64-protos.h
    (simd_vec_cost::ld2_st2_permute_cost)
    (simd_vec_cost::ld3_st3_permute_cost): New member variables.
    (simd_vec_cost::ld4_st4_permute_cost): Likewise.
    * config/aarch64/aarch64.c (generic_advsimd_vector_cost): Update
    accordingly, using zero for the new costs.
    (generic_sve_vector_cost, a64fx_advsimd_vector_cost): Likewise.
    (a64fx_sve_vector_cost, qdf24xx_advsimd_vector_cost): Likewise.
    (thunderx_advsimd_vector_cost, tsv110_advsimd_vector_cost):
    Likewise.
    (cortexa57_advsimd_vector_cost, exynosm1_advsimd_vector_cost)
    (xgene1_advsimd_vector_cost, thunderx2t99_advsimd_vector_cost)
    (thunderx3t110_advsimd_vector_cost): Likewise.
    (aarch64_ld234_st234_vectors): New function.
    (aarch64_adjust_stmt_cost): Likewise.
    (aarch64_add_stmt_cost): Call aarch64_adjust_stmt_cost if using
    the new vector costs.
2021-03-26  aarch64: Add vector costs for SVE CLAST[AB] and FADDA  [Richard Sandiford; 2 files, +141/-37]

Following on from the previous reduction costs patch, this one adds
costs for the SVE CLAST[AB] and FADDA instructions.  These
instructions occur within the loop body, whereas the reductions
handled by the previous patch occur outside.

Like with the previous patch, this one only becomes active if a CPU
selects use_new_vector_costs.  It should therefore have a very low
impact on other CPUs.

gcc/
    * config/aarch64/aarch64-protos.h (sve_vec_cost): Turn into a
    derived class of simd_vec_cost.  Add information about CLAST[AB]
    and FADDA instructions.
    * config/aarch64/aarch64.c (generic_sve_vector_cost): Update
    accordingly, using the vec_to_scalar costs for the new fields.
    (a64fx_sve_vector_cost): Likewise.
    (aarch64_reduc_type): New function.
    (aarch64_sve_in_loop_reduction_latency): Likewise.
    (aarch64_detect_vector_stmt_subtype): Take a vinfo parameter.
    Use aarch64_sve_in_loop_reduction_latency to handle SVE reductions
    that occur in the loop body.
    (aarch64_add_stmt_cost): Update call accordingly.
2021-03-26  aarch64: Add reduction costs to simd_vec_costs  [Richard Sandiford; 3 files, +216/-22]

This patch is part of a series that makes opt-in tweaks to the AArch64
vector cost model.

At the moment, all reductions are costed as vec_to_scalar, which also
includes things like extracting a single element from a vector.  This
is a bit too coarse in practice, since the cost of a reduction depends
very much on the type of value that it's processing.  This patch
therefore adds separate costs for each case.  To start with, all the
new costs are copied from the associated vec_to_scalar ones.

Due to the extreme lateness of this patch in the GCC 11 cycle, I've
added a new tuning flag (use_new_vector_costs) that selects the new
behaviour.  This should help to ensure that the risk of the new code
is only borne by the CPUs that need it.  Generic tuning is not
affected.

gcc/
    * config/aarch64/aarch64-tuning-flags.def (use_new_vector_costs):
    New tuning flag.
    * config/aarch64/aarch64-protos.h (simd_vec_cost): Put comments
    above the fields rather than to the right.
    (simd_vec_cost::reduc_i8_cost): New member variable.
    (simd_vec_cost::reduc_i16_cost): Likewise.
    (simd_vec_cost::reduc_i32_cost): Likewise.
    (simd_vec_cost::reduc_i64_cost): Likewise.
    (simd_vec_cost::reduc_f16_cost): Likewise.
    (simd_vec_cost::reduc_f32_cost): Likewise.
    (simd_vec_cost::reduc_f64_cost): Likewise.
    * config/aarch64/aarch64.c (generic_advsimd_vector_cost): Update
    accordingly, using the vec_to_scalar_cost for the new fields.
    (generic_sve_vector_cost, a64fx_advsimd_vector_cost): Likewise.
    (a64fx_sve_vector_cost, qdf24xx_advsimd_vector_cost): Likewise.
    (thunderx_advsimd_vector_cost, tsv110_advsimd_vector_cost):
    Likewise.
    (cortexa57_advsimd_vector_cost, exynosm1_advsimd_vector_cost)
    (xgene1_advsimd_vector_cost, thunderx2t99_advsimd_vector_cost)
    (thunderx3t110_advsimd_vector_cost): Likewise.
    (aarch64_use_new_vector_costs_p): New function.
    (aarch64_simd_vec_costs): New function, split out from...
    (aarch64_builtin_vectorization_cost): ...here.
    (aarch64_is_reduction): New function.
    (aarch64_detect_vector_stmt_subtype): Likewise.
    (aarch64_add_stmt_cost): Call aarch64_detect_vector_stmt_subtype
    if using the new vector costs.
2021-03-26  Fix ICE: in function_and_variable_visibility, at ipa-visibility.c:795 [PR99466]  [Iain Buclaw; 3 files, +22/-2]

In get_emutls_init_templ_addr, only thread-local declarations that
were DECL_ONE_ONLY would have a public initializer symbol, ignoring
variables that were declared with __attribute__((weak)).

gcc/ChangeLog:
    PR ipa/99466
    * tree-emutls.c (get_emutls_init_templ_addr): Mark initializer of
    weak TLS declarations as public.

gcc/testsuite/ChangeLog:
    PR ipa/99466
    * gcc.dg/tls/pr99466-1.c: New test.
    * gcc.dg/tls/pr99466-2.c: New test.
2021-03-26  d: Define IN_TARGET_CODE in all machine-specific D language files.  [Iain Buclaw; 9 files, +18/-0]

This is to be consistent with the rest of the back-end.

gcc/ChangeLog:
    * config/aarch64/aarch64-d.c (IN_TARGET_CODE): Define.
    * config/arm/arm-d.c (IN_TARGET_CODE): Likewise.
    * config/i386/i386-d.c (IN_TARGET_CODE): Likewise.
    * config/mips/mips-d.c (IN_TARGET_CODE): Likewise.
    * config/pa/pa-d.c (IN_TARGET_CODE): Likewise.
    * config/riscv/riscv-d.c (IN_TARGET_CODE): Likewise.
    * config/rs6000/rs6000-d.c (IN_TARGET_CODE): Likewise.
    * config/s390/s390-d.c (IN_TARGET_CODE): Likewise.
    * config/sparc/sparc-d.c (IN_TARGET_CODE): Likewise.
2021-03-26  d: Add windows support for D compiler [PR91595]  [Iain Buclaw; 5 files, +87/-0]

gcc/ChangeLog:
    PR d/91595
    * config.gcc (*-*-cygwin*): Add winnt-d.o
    (*-*-mingw*): Likewise.
    * config/i386/cygwin.h (EXTRA_TARGET_D_OS_VERSIONS): New macro.
    * config/i386/mingw32.h (EXTRA_TARGET_D_OS_VERSIONS): Likewise.
    * config/i386/t-cygming: Add winnt-d.o.
    * config/i386/winnt-d.c: New file.
2021-03-26  [freebsd] d: Fix build failures on sparc64-*-freebsd*  [Iain Buclaw; 1 file, +1/-0]

All target platforms that could run on SPARC should include this
header in order to avoid errors from memmodel being used in
sparc-protos.h.

gcc/ChangeLog:
    * config/freebsd-d.c: Include memmodel.h.
2021-03-26  d: Add openbsd support for D compiler [PR99691]  [Iain Buclaw; 3 files, +46/-0]

gcc/ChangeLog:
    PR d/99691
    * config.gcc (*-*-openbsd*): Add openbsd-d.o.
    * config/t-openbsd: Add openbsd-d.o.
    * config/openbsd-d.c: New file.
2021-03-26c++: Fix ICE with nsdmi [PR99705]Jakub Jelinek2-0/+63
When adding P0784R7 constexpr new support, we still didn't have P1331R2 implemented and so I had to change also build_vec_delete_1 - instead of having uninitialized tbase temporary later initialized by MODIFY_EXPR I've set the DECL_INITIAL for it - because otherwise it would be rejected during constexpr evaluation which didn't like uninitialized vars. Unfortunately, that change broke the following testcase. The problem is that these temporaries (not just tbase but tbase was the only one with an initializer) are created during NSDMI parsing and current_function_decl is NULL at that point. Later when we clone body of constructors, auto_var_in_fn_p is false for those (as they have NULL DECL_CONTEXT) and so they aren't duplicated, and what is worse, the DECL_INITIAL isn't duplicated either nor processed, and during expansion we ICE because the code from DECL_INITIAL of that var refers to the abstract constructor's PARM_DECL (this) rather than the actual constructor's one. So, either we can just revert those build_vec_delete_1 changes (as done in the second patch - in attachment), or, as the first patch does, we can copy the temporaries during bot_manip like we copy the temporaries of TARGET_EXPRs. To me that looks like a better fix because e.g. if break_out_of_target_exprs is called for the same NSDMI multiple times, sharing the temporaries looks just wrong to me. If the temporaries are declared as BIND_EXPR_VARS of some BIND_EXPR (which is the case of the tbase variable built by build_vec_delete_1 and is the only way how the DECL_INITIAL can be walked by *walk_tree*), then we need to copy it also in the BIND_EXPR BIND_EXPR_VARS chain, other temporaries (those that don't need DECL_INITIAL) often have just DECL_EXPR and no corresponding BIND_EXPR. Note, ({ }) are rejected in nsdmis, so all we run into are temporaries the FE creates artificially. 
2021-03-26  Jakub Jelinek  <jakub@redhat.com>

        PR c++/99705
        * tree.c (bot_manip): Remap artificial automatic temporaries
        mentioned in DECL_EXPR or in BIND_EXPR_VARS.
        * g++.dg/cpp0x/new5.C: New test.
2021-03-26  Fortran: Fix intrinsic null() handling [PR99651]  (Tobias Burnus, 2 files changed, -0/+21)
gcc/fortran/ChangeLog:

        PR fortran/99651
        * intrinsic.c (gfc_intrinsic_func_interface): Set attr.proc =
        PROC_INTRINSIC if FL_PROCEDURE.

gcc/testsuite/ChangeLog:

        PR fortran/99651
        * gfortran.dg/null_11.f90: New test.
2021-03-26  Daily bump.  (GCC Administrator, 8 files changed, -1/+212)
2021-03-25  PR tree-optimization/55060 - False un-initialized variable warnings  (Martin Sebor, 1 file changed, -0/+30)
gcc/testsuite/ChangeLog:

        PR tree-optimization/55060
        * gcc.dg/uninit-pr55060.c: New.
2021-03-25  PR tree-optimization/48483 - Construct from yourself w/o warning  (Martin Sebor, 1 file changed, -0/+56)
gcc/testsuite/ChangeLog:

        PR tree-optimization/48483
        * g++.dg/warn/uninit-pr48483.C: New test.
2021-03-25  New test for PR tree-optimization/44547 - -Wuninitialized reports false warning in nested switch statements.  (Martin Sebor, 1 file changed, -0/+61)

gcc/testsuite/ChangeLog:

        * gcc.dg/uninit-pr44547.c: New.
2021-03-25  Update gcc fr.po.  (Joseph Myers, 1 file changed, -446/+310)
* fr.po: Update.
2021-03-25  c++: Fix source_location inconsistency between calls from templates and non-templates [PR99672]  (Jakub Jelinek, 12 files changed, -26/+77)

The srcloc19.C testcase shows inconsistency in std::source_location::current() locations between calls from templates and non-templates.  The location used by __builtin_source_location comes in both cases from input_location which is set on it by bot_manip when handling the default argument, called during finish_call_expr.

The problem is that in templates that input_location comes from the CALL_EXPR we built earlier and that has the combined locus with range between first character of the function name and closing paren with caret on the opening paren, so something printed as caret as:

    foobar ();
    ~~~~~~^~

But outside of templates, finish_call_expr is called when input_location is just the closing paren token, i.e.:

    foobar ();
             ^

and only after that returns we create the combined location and set the CALL_EXPR location to that.  So, it means std::source_location::current() reports in templates the column of opening (, while outside of templates closing ).  The following patch makes it consistent by creating the combined location already before calling finish_call_expr and temporarily overriding input_location to that.

2021-03-25  Jakub Jelinek  <jakub@redhat.com>

        PR c++/99672
        * parser.c (cp_parser_postfix_expression): For calls, create
        combined_loc and temporarily set input_location to it before
        calling finish_call_expr.
        * g++.dg/concepts/diagnostic2.C: Adjust expected caret line.
        * g++.dg/cpp1y/builtin_location.C (f4, n6): Move #line directives
        to match locus changes.
        * g++.dg/cpp2a/srcloc1.C: Adjust expected column numbers.
        * g++.dg/cpp2a/srcloc2.C: Likewise.
        * g++.dg/cpp2a/srcloc15.C: Likewise.
        * g++.dg/cpp2a/srcloc16.C: Likewise.
        * g++.dg/cpp2a/srcloc19.C: New test.
        * g++.dg/modules/adhoc-1_b.C: Adjust expected column numbers and
        caret line.
        * g++.dg/modules/macloc-1_c.C: Adjust expected column numbers.
        * g++.dg/modules/macloc-1_d.C: Likewise.
        * g++.dg/plugin/diagnostic-test-expressions-1.C: Adjust expected
        caret line.
        * testsuite/18_support/source_location/consteval.cc (main): Adjust
        expected column numbers.
        * testsuite/18_support/source_location/1.cc (main): Likewise.
2021-03-25  c++: ICE on invalid with inheriting constructors [PR94751]  (Marek Polacek, 4 files changed, -8/+51)
This is an ICE on invalid where we crash because since r269032 we keep error_mark_node around instead of using noexcept_false_spec when things go wrong; see the walk_field_subobs hunk.  We crash in deduce_inheriting_ctor which calls synthesized_method_walk to deduce the exception-specification, but fails to do so in this case, because the testcase is invalid so get_nsdmi returns error_mark_node for the member 'c', and per r269032 the error_mark_node propagates back to deduce_inheriting_ctor which subsequently calls build_exception_variant whereon we crash.

I think we should return early if the deduction fails and I decided to call mark_used to get an error right away instead of hoping that it would get called later.  My worry is that we could forget that there was an error and think that we just deduced noexcept(false).

And then I noticed that the test still crashes in C++98.  Here again we failed to deduce the exception-specification in implicitly_declare_fn, but nothing reported an error between synthesized_method_walk and the assert.  Well, not much we can do except calling synthesized_method_walk again, this time in the verbose mode and making sure that we did get an error.

gcc/cp/ChangeLog:

        PR c++/94751
        * call.c (build_over_call): Maybe call mark_used in case
        deduce_inheriting_ctor fails and return error_mark_node.
        * cp-tree.h (deduce_inheriting_ctor): Adjust declaration.
        * method.c (deduce_inheriting_ctor): Return bool if the deduction
        fails.
        (implicitly_declare_fn): If raises is error_mark_node, call
        synthesized_method_walk with diag being true.

gcc/testsuite/ChangeLog:

        PR c++/94751
        * g++.dg/cpp0x/inh-ctor37.C: New test.
2021-03-25  c++: Diagnose bare parameter packs in bitfield widths [PR99745]  (Jakub Jelinek, 2 files changed, -1/+10)
The following invalid tests ICE because we don't diagnose (and drop) bare parameter packs in bitfield widths.

2021-03-25  Jakub Jelinek  <jakub@redhat.com>

        PR c++/99745
        * decl2.c (grokbitfield): Diagnose bitfields containing bare
        parameter packs and don't set DECL_BIT_FIELD_REPRESENTATIVE in
        that case.
        * g++.dg/cpp0x/variadic181.C: New test.
2021-03-25  c++: -Wconversion vs value-dependent expressions [PR99331]  (Marek Polacek, 2 files changed, -0/+22)
This PR complains that we issue a -Wconversion warning in

    template <int N> struct X {};
    template <class T> X<sizeof(T)> foo();

saying "conversion from 'long unsigned int' to 'int' may change value".  While it's not technically wrong, I suspect -Wconversion warnings aren't all that useful for value-dependent expressions.  So this patch disables them.

This is a regression that started with r241425:

    @@ -7278,7 +7306,7 @@ convert_template_argument (tree parm,
                   val = error_mark_node;
               }
           }
    -      else if (!dependent_template_arg_p (orig_arg)
    +      else if (!type_dependent_expression_p (orig_arg)
                    && !uses_template_parms (t))
             /* We used to call digest_init here.  However, digest_init
                will report errors, which we don't want when complain

Here orig_arg is SIZEOF_EXPR<T>; dependent_template_arg_p (orig_arg) was true, but type_dependent_expression_p (orig_arg) is false so we warn in convert_nontype_argument.

gcc/cp/ChangeLog:

        PR c++/99331
        * call.c (build_converted_constant_expr_internal): Don't emit
        -Wconversion warnings.

gcc/testsuite/ChangeLog:

        PR c++/99331
        * g++.dg/warn/Wconversion5.C: New test.
2021-03-25  tree-optimization/96974 - avoid ICE by replacing assert with standard failure  (Stam Markianos-Wright, 2 files changed, -2/+24)
Minor patch to add a graceful exit in the rare case where an invalid combination of TYPE_VECTOR_SUBPARTS for nunits_vectype and *stmt_vectype_out is reached in vect_get_vector_types_for_stmt.  This resolves the ICE seen in PR tree-optimization/96974; however, the issue of correctly handling this rare vectorization combination is left for a later patch.

Bootstrapped and reg-tested on aarch64-linux-gnu.

2021-03-25  Stam Markianos-Wright  <stam.markianos-wright@arm.com>

gcc/ChangeLog:

        PR tree-optimization/96974
        * tree-vect-stmts.c (vect_get_vector_types_for_stmt): Replace
        assert with graceful exit.

gcc/testsuite/ChangeLog:

        PR tree-optimization/96974
        * g++.target/aarch64/sve/pr96974.C: New test.
2021-03-25  Revert "x86: Skip ISA check for always_inline in system headers"  (H.J. Lu, 4 files changed, -58/+8)
This reverts commit 72982851d70dfbc547d83ed2bb45356b9ebe3ff0.
2021-03-25  vect: Init inside_cost in vect_model_reduction_cost  (Kewen Lin, 1 file changed, -1/+1)
This patch initializes inside_cost to zero, to avoid using its uninitialized value when some path doesn't assign it.

gcc/ChangeLog:

        * tree-vect-loop.c (vect_model_reduction_cost): Init inside_cost.
2021-03-25  c-family: Fix up -Wduplicated-branches for union members [PR99565]  (Jakub Jelinek, 8 files changed, -9/+35)
Honza has fairly recently changed operand_equal_p to compare DECL_FIELD_OFFSET for COMPONENT_REFs when comparing addresses.  As the first testcase in this patch shows, while that is very nice for optimizations, for the -Wduplicated-branches warning it causes regressions.  Pedantically a union in both C and C++ has only one active member at a time, so using some other union member even if it has the same type is UB, so I think the warning shouldn't warn when it sees access to different fields that happen to have the same offset and should consider them different.

In my first attempt to fix this I've keyed the old behavior on OEP_LEXICOGRAPHIC, but unfortunately that has various problems: the warning has a quick non-lexicographic compare in build_conditional_expr* and another lexicographic, more expensive one later during genericization, and turning the first one into lexicographic would mean wasting compile time on large conditionals.  So, this patch instead introduces a new OEP_ flag and makes sure to pass it to operand_equal_p in all -Wduplicated-branches cases.  The cvt.c changes are because on the other testcase we were warning with UNKNOWN_LOCATION, so the user wouldn't really know where the questionable code is.

2021-03-25  Jakub Jelinek  <jakub@redhat.com>

        PR c++/99565
        * tree-core.h (enum operand_equal_flag): Add
        OEP_ADDRESS_OF_SAME_FIELD.
        * fold-const.c (operand_compare::operand_equal_p): Don't compare
        field offsets if OEP_ADDRESS_OF_SAME_FIELD.
        * c-warn.c (do_warn_duplicated_branches): Pass also
        OEP_ADDRESS_OF_SAME_FIELD to operand_equal_p.
        * c-typeck.c (build_conditional_expr): Pass
        OEP_ADDRESS_OF_SAME_FIELD to operand_equal_p.
        * call.c (build_conditional_expr_1): Pass
        OEP_ADDRESS_OF_SAME_FIELD to operand_equal_p.
        * cvt.c (convert_to_void): Preserve location_t on COND_EXPR or
        COMPOUND_EXPR.
        * g++.dg/warn/Wduplicated-branches6.C: New test.
        * g++.dg/warn/Wduplicated-branches7.C: New test.
2021-03-25  x86: Skip ISA check for always_inline in system headers  (H.J. Lu, 4 files changed, -8/+58)
For always_inline in system headers, we don't know if caller's ISAs are compatible with callee's ISAs until much later.  Skip ISA check for always_inline in system headers if caller has target attribute.

gcc/

        PR target/98209
        PR target/99744
        * config/i386/i386.c (ix86_can_inline_p): Don't check ISA for
        always_inline in system headers.

gcc/testsuite/

        PR target/98209
        PR target/99744
        * gcc.target/i386/pr98209.c: New test.
        * gcc.target/i386/pr99744-1.c: Likewise.
        * gcc.target/i386/pr99744-2.c: Likewise.
2021-03-25  tree-optimization/99746 - avoid confusing hybrid code  (Richard Biener, 2 files changed, -8/+47)
This avoids confusing the hybrid vectorization code with SLP patterns by not marking SLP pattern covered stmts as patterns (they are marked as SLP patterns already).  This means that loop vectorization will vectorize the scalar stmt rather than the SLP pattern stmt (which it can't anyway).

2021-03-24  Richard Biener  <rguenther@suse.de>

        PR tree-optimization/99746
        * tree-vect-slp-patterns.c (complex_pattern::build): Do not mark
        the scalar stmt as patterned.  Instead set up required things
        manually.
        * gfortran.dg/vect/pr99746.f90: New testcase.
2021-03-24  rs6000: Correct Power8 cost of l2 cache size [PR97329]  (Xionghu Luo, 1 file changed, -1/+1)
The l2 cache size for Power8 is 512kB; the old value had been copied from Power7 before the Power8 details were public.  Tested: no performance change for SPEC2017.

gcc/ChangeLog:

2021-03-24  Xionghu Luo  <luoxhu@linux.ibm.com>

        * config/rs6000/rs6000.c (power8_costs): Change l2 cache from 256
        to 512.
2021-03-24  analyzer: reset sm-state for SSA names at def-stmts [PR93695, PR99044, PR99716]  (David Malcolm, 13 files changed, -3/+372)
Various false positives from -fanalyzer involve SSA names in loops, where sm-state associated with an SSA name from one iteration is erroneously reused in a subsequent iteration.  For example, PR analyzer/99716 describes a false "double 'fclose' of FILE 'fp'" on:

    for (i = 0; i < 2; ++i)
      {
        FILE *fp = fopen ("/tmp/test", "w");
        fprintf (fp, "hello");
        fclose (fp);
      }

where the gimple of the loop body is:

    fp_7 = fopen ("/tmp/test", "w");
    __builtin_fwrite ("hello", 1, 5, fp_7);
    fclose (fp_7);
    i_10 = i_1 + 1;

where fp_7 transitions to "closed" at the fclose, but is not reset at the subsequent fopen, leading to the false positive when the fclose is re-reached.

The fix is to reset sm-state for svalues that involve an SSA name at the SSA name's def-stmt, since the def-stmt effectively changes the meaning of those related svalues.

gcc/analyzer/ChangeLog:

        PR analyzer/93695
        PR analyzer/99044
        PR analyzer/99716
        * engine.cc (exploded_node::on_stmt): Clear sm-state involving an
        SSA name at the def-stmt of that SSA name.
        * program-state.cc (sm_state_map::purge_state_involving): New.
        * program-state.h (sm_state_map::purge_state_involving): New decl.
        * region-model.cc (selftest::test_involves_p): New.
        (selftest::analyzer_region_model_cc_tests): Call it.
        * svalue.cc (class involvement_visitor): New class.
        (svalue::involves_p): New.
        * svalue.h (svalue::involves_p): New decl.

gcc/testsuite/ChangeLog:

        PR analyzer/93695
        PR analyzer/99044
        PR analyzer/99716
        * gcc.dg/analyzer/attr-malloc-CVE-2019-19078-usb-leak.c: Remove
        xfail.
        * gcc.dg/analyzer/pr93695-1.c: New test.
        * gcc.dg/analyzer/pr99044-1.c: New test.
        * gcc.dg/analyzer/pr99044-2.c: New test.
        * gcc.dg/analyzer/pr99716-1.c: New test.
        * gcc.dg/analyzer/pr99716-2.c: New test.
        * gcc.dg/analyzer/pr99716-3.c: New test.
2021-03-25  Daily bump.  (GCC Administrator, 5 files changed, -1/+106)