path: root/gcc
2021-03-22  debug: Fix __int128 handling in dwarf2out [PR99562]  (Jakub Jelinek; 1 file, -5/+5)

The PR66728 changes broke __int128 handling.  They emit wide_int numbers in their minimum unsigned precision rather than in their full precision.  The problem is then that e.g. the DW_OP_implicit_value path:

          int_mode = as_a <scalar_int_mode> (mode);
          loc_result = new_loc_descr (DW_OP_implicit_value,
                                      GET_MODE_SIZE (int_mode), 0);
          loc_result->dw_loc_oprnd2.val_class = dw_val_class_wide_int;
          loc_result->dw_loc_oprnd2.v.val_wide = ggc_alloc<wide_int> ();
          *loc_result->dw_loc_oprnd2.v.val_wide = rtx_mode_t (rtl, int_mode);

emits invalid DWARF.  In particular, this patch fixes multiple occurrences of:

        .byte   0x9e    # DW_OP_implicit_value
        .uleb128 0x10
        .quad   0xffffffffffffffff
+       .quad   0
        .quad   .LVL46  # Location list begin address (*.LLST40)
        .quad   .LFE14  # Location list end address (*.LLST40)

where we said the value has a 16 byte size but then emitted only an 8 byte value.

My understanding is that most of the places that use val_wide expect the precision they chose (the one of the mode they want, etc.); the only exception is the add_const_value_attribute case where it deals with VOIDmode CONST_WIDE_INTs.  For that case I agree that when we don't have a mode we need to fall back to the minimum precision (not sure if the maximum of min_precision UNSIGNED and SIGNED wouldn't be better, then consumers would know whether it is signed or unsigned by looking at the MSB), but that code already computes the precision; it just decided to create the wide_int with a much larger precision (e.g. 512 bit on x86_64).

2021-03-22  Jakub Jelinek  <jakub@redhat.com>

        PR debug/99562
        PR debug/66728
        * dwarf2out.c (get_full_len): Use get_precision rather than
        min_precision.
        (add_const_value_attribute): Make sure the add_AT_wide argument has
        precision prec rather than some very wide one.
2021-03-21  rs6000: Fix some unexpected empty split conditions  (Kewen Lin; 2 files, -19/+19)

This patch fixes the empty split conditions of some define_insn_and_split definitions whose conditions for the define_insn part aren't empty.  As Segher and Mike pointed out, these can sometimes lead to unexpected consequences.

Bootstrapped/regtested on powerpc64le-linux-gnu P9 and powerpc64-linux-gnu P8.

gcc/ChangeLog:

        * config/rs6000/rs6000.md (*rotldi3_insert_sf,
        *mov<SFDF:mode><SFDF2:mode>cc_p9, floatsi<mode>2_lfiwax,
        floatsi<mode>2_lfiwax_mem, floatunssi<mode>2_lfiwzx,
        floatunssi<mode>2_lfiwzx_mem, *floatsidf2_internal,
        *floatunssidf2_internal, fix_trunc<mode>si2_stfiwx,
        fix_trunc<mode>si2_internal, fixuns_trunc<mode>si2_stfiwx,
        *round32<mode>2_fprs, *roundu32<mode>2_fprs,
        *fix_trunc<mode>si2_internal): Fix empty split condition.
        * config/rs6000/vsx.md (*vsx_le_undo_permute_<mode>,
        vsx_reduc_<VEC_reduc_name>_v2df, vsx_reduc_<VEC_reduc_name>_v4sf,
        *vsx_reduc_<VEC_reduc_name>_v2df_scalar,
        *vsx_reduc_<VEC_reduc_name>_v4sf_scalar): Likewise.
2021-03-21  rs6000: Convert the vector set variable idx to DImode [PR98914]  (Xionghu Luo; 2 files, -21/+29)

vec_insert defines the element argument type to be signed int per the ELFv2 ABI.  When expanding a vector with a variable rtx, convert the rtx type to DImode to support both intrinsic usage and other callers from rs6000_expand_vector_init produced by v[k] = val when k is of long type.

gcc/ChangeLog:

2021-03-21  Xionghu Luo  <luoxhu@linux.ibm.com>

        PR target/98914
        * config/rs6000/rs6000.c (rs6000_expand_vector_set_var_p9): Convert
        idx to DImode.
        (rs6000_expand_vector_set_var_p8): Likewise.

gcc/testsuite/ChangeLog:

2021-03-21  Xionghu Luo  <luoxhu@linux.ibm.com>

        PR target/98914
        * gcc.target/powerpc/pr98914.c: New test.
2021-03-22  Daily bump.  (GCC Administrator; 2 files, -1/+9)
2021-03-21  dwarf2out: Fix debug info for 2 byte floats [PR99388]  (Jakub Jelinek; 1 file, -10/+20)

AArch64, Arm and a couple of other architectures have 16-bit floats, HFmode.  As can be seen e.g. on the

        void
        foo (void)
        {
          __fp16 a = 1.0;
          asm ("nop");
          a = 2.0;
          asm ("nop");
          a = 3.0;
          asm ("nop");
        }

testcase, GCC mishandles this on the dwarf2out.c side by assuming all floating point types have sizes that are multiples of 4 bytes.  What GCC emits says that e.g. the DW_OP_implicit_value will be 2 bytes, but then doesn't emit anything, so anything emitted after it is treated by consumers as the value and they get out of sync.  real_to_target, which insert_float uses, indeed fills it that way, putting the result into an array of longs, 32 bits each, but for half floats it puts everything into the least significant 16 bits of the first long no matter what endianity the host or target has.

The following patch fixes it.  With the patch the -g -O2 -dA output changes (in a cross without .uleb128 support):

        .byte   0x9e    // DW_OP_implicit_value
        .byte   0x2     // uleb128 0x2
+       .2byte  0x3c00  // fp or vector constant word 0
        .byte   0x7     // DW_LLE_start_end (*.LLST0)
        .8byte  .LVL1   // Location list begin address (*.LLST0)
        .8byte  .LVL2   // Location list end address (*.LLST0)
        .byte   0x4     // uleb128 0x4; Location expression size
        .byte   0x9e    // DW_OP_implicit_value
        .byte   0x2     // uleb128 0x2
+       .2byte  0x4000  // fp or vector constant word 0
        .byte   0x7     // DW_LLE_start_end (*.LLST0)
        .8byte  .LVL2   // Location list begin address (*.LLST0)
        .8byte  .LFE0   // Location list end address (*.LLST0)
        .byte   0x4     // uleb128 0x4; Location expression size
        .byte   0x9e    // DW_OP_implicit_value
        .byte   0x2     // uleb128 0x2
+       .2byte  0x4200  // fp or vector constant word 0
        .byte   0       // DW_LLE_end_of_list (*.LLST0)

Bootstrapped/regtested on x86_64-linux, aarch64-linux and armv7hl-linux-gnueabi, ok for trunk?

I fear the CONST_VECTOR case is still broken.  While HFmode elements of vectors should be fine (it uses eltsize for the element sizes) and likewise SFmode could be fine, DFmode vectors are emitted as two 32-bit ints regardless of endianity and I'm afraid it can't be right on big-endian.  But I haven't been able to create a testcase that emits a CONST_VECTOR; for e.g. unused vector vars with constant operands we emit CONCATN during expansion and thus DW_OP_*piece for each element of the vector, and for DW_TAG_call_site_parameter we give up (because we handle CONST_VECTOR only in loc_descriptor, not mem_loc_descriptor).

2021-03-21  Jakub Jelinek  <jakub@redhat.com>

        PR debug/99388
        * dwarf2out.c (insert_float): Change return type from void to
        unsigned, handle GET_MODE_SIZE (mode) == 2 and return element size.
        (mem_loc_descriptor, loc_descriptor, add_const_value_attribute):
        Adjust callers.
2021-03-21  Daily bump.  (GCC Administrator; 5 files, -1/+40)
2021-03-20  x86: Check cfun != NULL before accessing silent_p  (H.J. Lu; 3 files, -2/+36)

Since construct_container may be called with cfun == NULL, check cfun != NULL before accessing silent_p.

gcc/

        PR target/99679
        * config/i386/i386.c (construct_container): Check cfun != NULL
        before accessing silent_p.

gcc/testsuite/

        PR target/99679
        * g++.target/i386/pr99679-1.C: New test.
        * g++.target/i386/pr99679-2.C: Likewise.
2021-03-20  c-family: Fix PR94272 -fcompare-debug issue even for C [PR99230]  (Jakub Jelinek; 3 files, -29/+70)

The following testcase results in a -fcompare-debug failure.  The problem is similar to PR94272, https://gcc.gnu.org/pipermail/gcc-patches/2020-March/542562.html

When genericizing, with -g0 we have just a TREE_SIDE_EFFECTS DO_STMT in a branch of an if, while with -g we have that wrapped into a TREE_SIDE_EFFECTS STATEMENT_LIST containing a DEBUG_BEGIN_STMT and that DO_STMT.  The do loop is empty with a 0 condition, so c_genericize_control_stmt turns it into an empty statement (without TREE_SIDE_EFFECTS).  For -g0 that means the if branch suddenly doesn't have side effects and is expanded differently.  But with -g we still have a TREE_SIDE_EFFECTS STATEMENT_LIST containing a DEBUG_BEGIN_STMT and a non-TREE_SIDE_EFFECTS stmt.  The following patch fixes that by detecting this case and removing TREE_SIDE_EFFECTS.  And, so that we don't duplicate the same code, it changes the C++ FE to just call the c_genericize_control_stmt function that can now handle it.

2021-03-20  Jakub Jelinek  <jakub@redhat.com>

        PR debug/99230
        * c-gimplify.c (c_genericize_control_stmt): Handle STATEMENT_LIST.
        * cp-gimplify.c (cp_genericize_r) <case STATEMENT_LIST>: Remove
        special code, instead call c_genericize_control_stmt.
        * gcc.dg/pr99230.c: New test.
2021-03-20  [PATCH] Fix typo in gcc/asan.c comment  (Ahamed Husni; 1 file, -2/+2)

gcc/

        * asan.c: Fix typos in comments.
2021-03-20  [PR99680] Check empty constraint before using CONSTRAINT_LEN.  (Vladimir N. Makarov; 1 file, -8/+8)

It seems CONSTRAINT_LEN treats the constraint '\0' as one having length 1.  Therefore we read past the end of the constraint string.  The patch fixes it.

gcc/ChangeLog:

        PR rtl-optimization/99680
        * lra-constraints.c (skip_contraint_modifiers): Rename to
        skip_constraint_modifiers.
        (process_address_1): Check empty constraint before using
        CONSTRAINT_LEN.
2021-03-20  Daily bump.  (GCC Administrator; 8 files, -1/+150)
2021-03-19  c: Fix up -Wunused-but-set-* warnings for _Atomics [PR99588]  (Jakub Jelinek; 3 files, -6/+97)

As the following testcases show, compared to the -D_Atomic= case we have many -Wunused-but-set-* warning false positives.  When an _Atomic variable/parameter is read, we call mark_exp_read on it in convert_lvalue_to_rvalue, but build_atomic_assign does not.  For consistency with the non-_Atomic case, where we mark_exp_read the lhs for lhs op= ... but not for lhs = ..., this patch does that too.  But furthermore we need to pattern match the trees emitted by an _Atomic store, so that the _Atomic store itself is not marked as a variable read, but when the result of the store is used, we mark it.

2021-03-19  Jakub Jelinek  <jakub@redhat.com>

        PR c/99588
        * c-typeck.c (mark_exp_read): Recognize what build_atomic_assign
        with modifycode NOP_EXPR produces and mark the _Atomic var as read
        if found.
        (build_atomic_assign): For modifycode of NOP_EXPR, use COMPOUND_EXPRs
        rather than STATEMENT_LIST.  Otherwise call mark_exp_read on lhs.
        Set TREE_SIDE_EFFECTS on the TARGET_EXPR.
        * gcc.dg/Wunused-var-5.c: New test.
        * gcc.dg/Wunused-var-6.c: New test.
2021-03-19  Regenerate gcc.pot.  (Joseph Myers; 1 file, -6766/+7527)

        * gcc.pot: Regenerate.
2021-03-19  Add Power10 scheduling description.  (Pat Haugen; 2 files, -253/+294)

2021-03-19  Pat Haugen  <pthaugen@linux.ibm.com>

gcc/

        * config/rs6000/rs6000.c (power10_cost): New.
        (rs6000_option_override_internal): Set Power10 costs.
        (rs6000_issue_rate): Set Power10 issue rate.
        * config/rs6000/power10.md: Rewrite for Power10.
2021-03-19  Add size check to vector-matrix matmul.  (Thomas Koenig; 2 files, -7/+21)

It turns out the library version is much faster than what inlining can produce for vector-matrix multiplications at large sizes.  Use size checks to switch between the library version and inlining for that case, too.

gcc/fortran/ChangeLog:

        * frontend-passes.c (inline_limit_check): Add rank_a argument.
        If a is rank 1, set the second dimension to 1.
        (inline_matmul_assign): Pass rank_a argument to inline_limit_check.
        (call_external_blas): Likewise.

gcc/testsuite/ChangeLog:

        * gfortran.dg/inline_matmul_6.f90: Adjust count for
        _gfortran_matmul.
2021-03-19  [PR99663] Don't use unknown constraint for address constraint in process_address_1.  (Vladimir N. Makarov; 2 files, -5/+30)

s390x has insns using several alternatives with address constraints.  Even if we don't know at this stage what alternative will be used, we can still say that it is an address constraint.  So don't use the unknown constraint in this case when there are multiple constraints and/or alternatives.

gcc/ChangeLog:

        PR target/99663
        * lra-constraints.c (process_address_1): Don't use unknown
        constraint for address constraint.

gcc/testsuite/ChangeLog:

        PR target/99663
        * gcc.target/s390/pr99663.c: New.
2021-03-19  c++: Only reject reinterpret casts from pointers to integers for manifestly_const_eval evaluation [PR99456]  (Jakub Jelinek; 3 files, -2/+36)

My PR82304/PR95307 fix moved the reinterpret cast from pointer to integer diagnostics from cxx_eval_outermost_constant_expr, where it caught invalid code only at the outermost level, down into cxx_eval_constant_expression.  Unfortunately, it regressed the following testcase: we emit worse code, including dynamic initialization of some vars.  While the initializers are not constant expressions due to the reinterpret_cast in there, there is no reason not to fold them as an optimization.  I've tried to make this dependent on !ctx->quiet, but that regressed two further tests, and on ctx->strict, which regressed other tests, so this patch bases it on manifestly_const_eval.  The new testcase is now optimized as much as it used to be in GCC 10, and the only regression it causes is an extra -Wnarrowing warning on the vla22.C test on invalid code (which the patch adjusts).

2021-03-19  Jakub Jelinek  <jakub@redhat.com>

        PR c++/99456
        * constexpr.c (cxx_eval_constant_expression): For CONVERT_EXPR from
        INDIRECT_TYPE_P to ARITHMETIC_TYPE_P, when
        !ctx->manifestly_const_eval don't diagnose it, set *non_constant_p
        nor return t.
        * g++.dg/opt/pr99456.C: New test.
        * g++.dg/ext/vla22.C: Expect a -Wnarrowing warning for c++11 and
        later.
2021-03-19  Darwin: Fix build failure for powerpc-darwin8 [PR99661].  (Iain Sandoe; 1 file, -1/+0)

A hunk had been missed from r11-6417; fixed thus:

gcc/ChangeLog:

        PR target/99661
        * config.gcc (powerpc-*-darwin8): Delete the reference to the now
        removed darwin8.h.
2021-03-19  target/99660 - missing VX_CPU_PREFIX for vxworksae  (Olivier Hainque; 1 file, -0/+4)

This fixes an oversight which causes make all-gcc to fail for --target=*vxworksae or vxworksmils, a regression introduced by the recent VxWorks7 related updates.  Both AE and MILS variants resort to a common config/vxworksae.h, which misses a definition of VX_CPU_PREFIX expected by port specific headers.  The change just provides the missing definition.

2021-03-19  Olivier Hainque  <hainque@adacore.com>

gcc/

        PR target/99660
        * config/vxworksae.h (VX_CPU_PREFIX): Define.
2021-03-19  Use memcpy instead of strncpy to avoid error with -Werror=stringop-truncation.  (John David Anglin; 1 file, -1/+1)

gcc/ChangeLog:

        * config/pa/pa.c (import_milli): Use memcpy instead of strncpy.
2021-03-19  slp: remove unneeded permute calculation (PR99656)  (Tamar Christina; 3 files, -47/+47)

The attached testcase ICEs because, as you showed on the PR, we have one child which is an internal with a PERM of EVENEVEN and one with TOP.  The problem is that while we can conceptually merge the permute itself into EVENEVEN, merging the lanes doesn't really make sense.  That said, we no longer even require the merged lanes, as we create the permutes based on the KIND directly.  This patch just removes all of that code.

Unfortunately it still won't vectorize with the cost model enabled, due to the blend that's created combining the load and the external:

        note:   node 0x51f2ce8 (max_nunits=1, refcnt=1)
        note:   op: VEC_PERM_EXPR
        note:       { }
        note:   lane permutation { 0[0] 1[1] }
        note:   children 0x51f23e0 0x51f2578
        note:   node 0x51f23e0 (max_nunits=2, refcnt=1)
        note:   op template: _16 = REALPART_EXPR <*t1_9(D)>;
        note:       stmt 0 _16 = REALPART_EXPR <*t1_9(D)>;
        note:       stmt 1 _16 = REALPART_EXPR <*t1_9(D)>;
        note:   load permutation { 0 0 }
        note:   node (external) 0x51f2578 (max_nunits=1, refcnt=1)
        note:       { _18, _18 }

which costs the cost for the load-and-split and the cost of the external splat, and the one for blending them, while in reality it's just a scalar load and insert.  The compiler (with the cost model disabled) generates

        ldr     q1, [x19]
        dup     v1.2d, v1.d[0]
        ldr     d0, [x19, 8]
        fneg    d0, d0
        ins     v1.d[1], v0.d[0]

while really it should be

        ldp     d1, d0, [x19]
        fneg    d0, d0
        ins     v1.d[1], v0.d[0]

but that's for another time.

gcc/ChangeLog:

        PR tree-optimization/99656
        * tree-vect-slp-patterns.c (linear_loads_p,
        complex_add_pattern::matches, is_eq_or_top,
        vect_validate_multiplication, complex_mul_pattern::matches,
        complex_fms_pattern::matches): Remove complex_perm_kinds_t.
        * tree-vectorizer.h: (complex_load_perm_t): Removed.
        (slp_tree_to_load_perm_map_t): Use complex_perm_kinds_t instead of
        complex_load_perm_t.

gcc/testsuite/ChangeLog:

        PR tree-optimization/99656
        * gfortran.dg/vect/pr99656.f90: New test.
2021-03-19  x86: Issue error for return/argument only with function body  (H.J. Lu; 9 files, -8/+41)

If we never generate a function body, we shouldn't issue errors for the return value or arguments.  Add silent_p to the i386 machine_function to avoid issuing errors for return and argument without a function body.

gcc/

        PR target/99652
        * config/i386/i386-options.c (ix86_init_machine_status): Set
        silent_p to true.
        * config/i386/i386.c (init_cumulative_args): Set silent_p to false.
        (construct_container): Return early for return and argument errors
        if silent_p is true.
        * config/i386/i386.h (machine_function): Add silent_p.

gcc/testsuite/

        PR target/99652
        * gcc.dg/torture/pr99652-1.c: New test.
        * gcc.dg/torture/pr99652-2.c: Likewise.
        * gcc.target/i386/pr57655.c: Adjusted.
        * gcc.target/i386/pr59794-6.c: Likewise.
        * gcc.target/i386/pr70738-1.c: Likewise.
        * gcc.target/i386/pr96744-1.c: Likewise.
2021-03-19  analyzer: mark epath_finder with DISABLE_COPY_AND_ASSIGN [PR99614]  (David Malcolm; 1 file, -0/+2)

cppcheck warns that class epath_finder does dynamic memory allocation, but is missing a copy constructor and operator=.  This class isn't meant to be copied or assigned, so mark it with DISABLE_COPY_AND_ASSIGN.

gcc/analyzer/ChangeLog:

        PR analyzer/99614
        * diagnostic-manager.cc (class epath_finder): Add
        DISABLE_COPY_AND_ASSIGN.
2021-03-19  arm: Fix mve_vshlq* [PR99593]  (Jakub Jelinek; 3 files, -2/+139)

As mentioned in the PR, before the r11-6708-gbfab355012ca0f5219da8beb04f2fdaf757d34b7 change, the v[al]shr<mode>3 expanders expanded shifts by register to gen_ashl<mode>3_{,un}signed, which don't support immediate CONST_VECTOR shift amounts, but now they expand to mve_vshlq_<supf><mode>, which does.  The testcase ICEs because the constraint doesn't match the predicate, and because LRA works solely with the constraints, it can e.g. propagate a CONST_VECTOR there from REG_EQUAL that matches the constraint but fails the predicate, and only later on other passes will notice the predicate fails and ICE.  Fixed by adding a constraint that matches the immediate part of the predicate.

        PR target/99593
        * config/arm/constraints.md (Ds): New constraint.
        * config/arm/vec-common.md (mve_vshlq_<supf><mode>): Use w,Ds
        constraint instead of w,Dm.
        * g++.target/arm/pr99593.C: New test.
2021-03-19  amdgcn: Typo fix  (Andrew Stubbs; 1 file, -1/+1)

gcc/ChangeLog:

        * config/gcn/gcn.c (gcn_parse_amdgpu_hsa_kernel_attribute): Fix
        quotes in error message.
2021-03-19  Require linker plugin for another LTO test  (Eric Botcazou; 1 file, -1/+6)

If it is not present, fat LTO is generated with an additional warning.

gcc/testsuite/

        * g++.dg/lto/pr89335_0.C: Require the linker plugin.
2021-03-19  Fix segfault during encoding of CONSTRUCTORs  (Eric Botcazou; 1 file, -13/+32)

The segfault occurs in native_encode_initializer when it is encoding the CONSTRUCTOR for an array whose lower bound is negative (that's OK in Ada).  The computation of the current position is done in HOST_WIDE_INT, and this does not work for arrays whose original range has a negative lower bound and a positive upper bound; the computation must be done in sizetype instead, so that it may wrap around.

gcc/

        PR middle-end/99641
        * fold-const.c (native_encode_initializer) <CONSTRUCTOR>: For an
        array type, do the computation of the current position in sizetype.
2021-03-19  Daily bump.  (GCC Administrator; 4 files, -1/+137)
2021-03-18  c++: Fix error-recovery with requires expression [PR99500]  (Marek Polacek; 2 files, -2/+9)

This fixes an ICE on invalid code where one of the parameters was error_mark_node and thus resetting its DECL_CONTEXT crashed.

gcc/cp/ChangeLog:

        PR c++/99500
        * parser.c (cp_parser_requirement_parameter_list): Handle
        error_mark_node.

gcc/testsuite/ChangeLog:

        PR c++/99500
        * g++.dg/cpp2a/concepts-err3.C: New test.
2021-03-18  c++: Remove FLOAT_EXPR assert in tsubst.  (Marek Polacek; 1 file, -1/+0)

This assert triggered when pr85013.C was compiled with -fchecking=2, which the usual testing doesn't exercise.  Let's remove it for now and revisit in GCC 12.

gcc/cp/ChangeLog:

        * pt.c (tsubst_copy_and_build) <case FLOAT_EXPR>: Remove.
2021-03-18  [PR99422] LRA: Use lookup_constraint only for a single constraint in process_address_1.  (Vladimir N. Makarov; 1 file, -1/+6)

This is an additional patch for PR99422.  In process_address_1 we look only at the first constraint in the first alternative and ignore all other possibilities.  As we don't know what alternative and constraint will be used at this stage, we can be sure only for a single constraint with one alternative, and should use the unknown constraint for all other cases.

gcc/ChangeLog:

        PR target/99422
        * lra-constraints.c (process_address_1): Use lookup_constraint only
        for a single constraint.
2021-03-18  PR middle-end/99502 - missing -Warray-bounds on partial out of bounds  (Martin Sebor; 4 files, -5/+753)

gcc/ChangeLog:

        PR middle-end/99502
        * gimple-array-bounds.cc (inbounds_vbase_memaccess_p): Rename...
        (inbounds_memaccess_p): ...to this.  Check the ending offset of
        the accessed member.

gcc/testsuite/ChangeLog:

        PR middle-end/99502
        * g++.dg/warn/Warray-bounds-22.C: New test.
        * g++.dg/warn/Warray-bounds-23.C: New test.
        * g++.dg/warn/Warray-bounds-24.C: New test.
2021-03-18  c++: Add assert to tsubst.  (Marek Polacek; 1 file, -0/+2)

As discussed in the r11-7709 patch, we can now make sure that tsubst never sees a FLOAT_EXPR, much like its counterpart FIX_TRUNC_EXPR.

gcc/cp/ChangeLog:

        * pt.c (tsubst_copy_and_build): Add assert.
2021-03-18  amdgcn: Silence warnings in gcn.c  (Andrew Stubbs; 1 file, -7/+10)

This fixes a few cases of "unquoted identifier or keyword", one "spurious trailing punctuation sequence", and a "may be used uninitialized".

gcc/ChangeLog:

        * config/gcn/gcn.c (gcn_parse_amdgpu_hsa_kernel_attribute): Add %<
        and %> quote markers to error messages.
        (gcn_goacc_validate_dims): Likewise.
        (gcn_conditional_register_usage): Remove exclamation mark from
        error message.
        (gcn_vectorize_vec_perm_const): Ensure perm is fully initialized.
2021-03-18  Fix idiv latencies for znver3  (Jan Hubicka; 1 file, -7/+5)

Update the costs of integer divides to match actual latencies (the scheduler model already does the right thing).  It is essentially a no-op, since we end up expanding idiv for all sensible constants, so this may only end up disabling vectorization in some cases, but I did not find any such examples.  However, in general it is better to have actual latencies than random numbers.

gcc/ChangeLog:

2021-03-18  Jan Hubicka  <hubicka@ucw.cz>

        * config/i386/x86-tune-costs.h (struct processor_costs): Fix costs
        of integer divides1.
2021-03-19  PR target/99314: Fix integer signedness issue for cpymem pattern expansion.  (Sinan Lin; 1 file, -11/+13)

The third operand of the cpymem pattern is unsigned HOST_WIDE_INT; however, we were interpreting it as signed HOST_WIDE_INT.  That is not a problem in most cases, but when the value is larger than fits in a signed HOST_WIDE_INT it might screw up, since we use that value to calculate the buffer size.

2021-03-05  Sinan Lin  <sinan@isrc.iscas.ac.cn>
            Kito Cheng  <kito.cheng@sifive.com>

gcc/ChangeLog:

        * config/riscv/riscv.c (riscv_block_move_straight): Change type to
        unsigned HOST_WIDE_INT for parameter and local variable with
        HOST_WIDE_INT type.
        (riscv_adjust_block_mem): Ditto.
        (riscv_block_move_loop): Ditto.
        (riscv_expand_block_move): Ditto.
2021-03-18  testsuite: Fix up strlenopt-80.c on powerpc [PR99636]  (Jakub Jelinek; 1 file, -1/+1)

Similar issue as in strlenopt-73.c: various spots in this test rely on MOVE_MAX >= 8.  This time it uses a target selector to pick up a couple of targets, and all of them but 32-bit powerpc satisfy it, but 32-bit powerpc has MOVE_MAX of just 4.

2021-03-18  Jakub Jelinek  <jakub@redhat.com>

        PR testsuite/99636
        * gcc.dg/strlenopt-80.c: For powerpc*-*-*, only enable for lp64.
2021-03-18  testsuite: Fix up strlenopt-73.c on powerpc [PR99626]  (Jakub Jelinek; 1 file, -2/+11)

As mentioned in the testcase as well as in the PR, this testcase relies on MOVE_MAX being sufficiently large that the memcpy call is folded early into a load + store.  Some popular targets define MOVE_MAX to 8 or even 16 (e.g. x86_64 or some options on s390x), but many other targets define it to just 4 (e.g. 32-bit powerpc), or even 2.  The testcase already has one test routine guarded on one particular target with MOVE_MAX 16 (but does it incorrectly: __i386__ is only defined on 32-bit x86 and __SIZEOF_INT128__ is only defined on 64-bit targets).  This patch fixes that, and guards another test that relies on memcpy (, , 8) being folded that way (which therefore needs MOVE_MAX >= 8) on a couple of common targets that are known to have such a MOVE_MAX.

2021-03-18  Jakub Jelinek  <jakub@redhat.com>

        PR testsuite/99626
        * gcc.dg/strlenopt-73.c: Ifdef out test_copy_cond_unequal_length_i64
        on targets other than x86, aarch64, s390 and 64-bit powerpc.  Use
        test_copy_cond_unequal_length_i128 for __x86_64__ with int128
        support rather than __i386__.
2021-03-18  testsuite: Skip c-c++-common/zero-scratch-regs-10.c on arm  (Christophe Lyon; 1 file, -0/+1)

As discussed in PR 97680, -fzero-call-used-regs is not supported on arm.  Skip this test to avoid failure reports.

2021-03-18  Christophe Lyon  <christophe.lyon@linaro.org>

gcc/testsuite/

        PR testsuite/97680
        * c-c++-common/zero-scratch-regs-10.c: Skip on arm.
2021-03-18  Fix building the V850 port using recent versions of gcc.  (Nick Clifton; 2 files, -2/+3)

gcc/

        * config/v850/v850.c (construct_restore_jr): Increase static buffer
        size.
        (construct_save_jarl): Likewise.
        * config/v850/v850.h (DWARF2_DEBUGGING_INFO): Define.
2021-03-18  Objective-C++: Fix handling of unnamed message parms [PR49070].  (Iain Sandoe; 4 files, -1/+91)

When we are parsing an Objective-C++ message, a colon is a valid terminator for an assignment-expression.  That is:

        [receiver meth:x:x:x:x];

is a valid, if somewhat unreadable, construction, corresponding to a method declaration like:

        - (id) meth:(id)arg0 :(id)arg1 :(id)arg2 :(id)arg3;

where three of the message params have no selector name.  In fact, although it might be unintentional, Objective-C/C++ can accept message selectors with all the parms unnamed (this applies to the clang implementation too, which is taken as the reference for the language).

For regular C++, the pattern x:x is not valid in that position, and an error is emitted with a fixit for the expected scope token.  If we simply made that error conditional on !c_dialect_objc(), that would regress Objective-C++ diagnostics for cases outside a message selector, so we add a state flag for this.

gcc/cp/ChangeLog:

        PR objc++/49070
        * parser.c (cp_debug_parser): Add Objective-C++ message state flag.
        (cp_parser_nested_name_specifier_opt): Allow colon to terminate an
        assignment-expression when parsing Objective-C++ messages.
        (cp_parser_objc_message_expression): Set and clear message parsing
        state on entry and exit.
        * parser.h (struct cp_parser): Add a context flag for Objective-C++
        message state.

gcc/testsuite/ChangeLog:

        PR objc++/49070
        * obj-c++.dg/pr49070.mm: New test.
        * objc.dg/unnamed-parms.m: New test.
2021-03-18  aarch64: Improve generic SVE tuning defaults  (Kyrylo Tkachov; 7 files, -1/+43)

This patch adds the recently-added tweak to split some SVE VL-based scalar operations [1] to the generic tuning used for SVE, as enabled by adding +sve to the -march flag, for example -march=armv8.2-a+sve.

The recommendation for best performance on a particular CPU remains unchanged: use the -mcpu option for that CPU, where possible.  -mcpu=native makes this straightforward for native compilation.

The tweak to split out SVE VL-based scalar operations is a consistent win for the Neoverse V1 CPU and should be neutral for the Fujitsu A64FX.  A run of SPEC2017 on A64FX with this tweak on didn't show any non-noise differences.  It is also expected to be neutral on SVE2 implementations.  Therefore, the patch enables the tweak for generic +sve tuning, e.g. -march=armv8.2-a+sve.  No SVE2 CPUs are expected to benefit from it, therefore the tweak is disabled for generic tuning when +sve2 is in -march, e.g. -march=armv8.2-a+sve2.

The implementation of this approach requires a bit of custom logic in aarch64_override_options_internal to handle these kinds of architecture-dependent decisions, but we do believe the user-facing principle here is important to implement.  In general, for the generic target we're using a decision framework that looks like:

* If all cores that are known to benefit from an optimization are of architecture X, and all other cores that implement X or above are not impacted, or have a very slight impact, we will consider it for generic tuning for architecture X.

* We will not enable that optimisation for generic tuning for architecture X+1 if no known cores of architecture X+1 or above will benefit.

This framework allows us to improve generic tuning for CPUs of generation X while avoiding accumulating tweaks for future CPUs of generation X+1, X+2... that do not need them, and thus avoid even the slight negative effects of these optimisations if the user is willing to tell us the desired architecture accurately.

X above can mean either annual architecture updates (Armv8.2-a, Armv8.3-a etc) or optional architecture extensions (like SVE, SVE2).

[1] http://gcc.gnu.org/g:a65b9ad863c5fc0aea12db58557f4d286a1974d7

gcc/ChangeLog:

        * config/aarch64/aarch64.c (aarch64_adjust_generic_arch_tuning):
        Define.
        (aarch64_override_options_internal): Use it.
        (generic_tunings): Add AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS to
        tune_flags.

gcc/testsuite/ChangeLog:

        * g++.target/aarch64/sve/aarch64-sve.exp: Add -moverride=tune=none
        to sve_flags.
        * g++.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp: Likewise.
        * g++.target/aarch64/sve/acle/aarch64-sve-acle.exp: Likewise.
        * gcc.target/aarch64/sve/aarch64-sve.exp: Likewise.
        * gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp: Likewise.
        * gcc.target/aarch64/sve/acle/aarch64-sve-acle.exp: Likewise.
2021-03-18  coroutines: init struct members to NULL  (Martin Liska; 1 file, -1/+1)

gcc/cp/ChangeLog:

        PR c++/99617
        * coroutines.cc (struct var_nest_node): Init then_cl and else_cl to
        NULL.
2021-03-18  testsuite: Fix up pr98099.c testcase for big endian [PR98099]  (Jakub Jelinek; 1 file, -2/+3)

The testcase fails on big-endian without int128 support, because due to -fsso-struct=big-endian no swapping is needed for big endian.  This patch restricts the testcase to big or little endian (but not pdp) and uses -fsso-struct=little-endian for big endian, so that it is swapping everywhere.

2021-03-18  Jakub Jelinek  <jakub@redhat.com>

        PR middle-end/98099
        * gcc.dg/pr98099.c: Don't compile the test on pdp endian.  For big
        endian use -fsso-struct=little-endian dg-options.
2021-03-18  Daily bump.  (GCC Administrator; 4 files, -1/+93)
2021-03-17  c++: ICE with real-to-int conversion in template [PR97973]  (Marek Polacek; 2 files, -1/+39)

In this test we are building a call in a template, but since neither the function nor any of its arguments are dependent, we go down the normal path in finish_call_expr.  convert_arguments sees that we're binding a reference to int to double, and therein convert_to_integer creates a FIX_TRUNC_EXPR.  Later, we call check_function_arguments, which folds the arguments, and, in a template, fold_for_warn calls fold_non_dependent_expr.  But tsubst_copy_and_build should not see a FIX_TRUNC_EXPR (see the patch discussed in <https://gcc.gnu.org/pipermail/gcc-patches/2018-March/496183.html>) or we crash.  So let's not create a FIX_TRUNC_EXPR in a template in the first place and instead use IMPLICIT_CONV_EXPR.

gcc/cp/ChangeLog:

        PR c++/97973
        * call.c (conv_unsafe_in_template_p): New.
        (convert_like): Use it.

gcc/testsuite/ChangeLog:

        PR c++/97973
        * g++.dg/conversion/real-to-int1.C: New test.
2021-03-17  c++: Private parent access check for using decls [PR19377]  (Anthony Sharp; 2 files, -18/+132)

This bug was already mostly fixed by the patch for PR17314.  This patch continues that by ensuring that where a using decl is used, causing an access failure to a child class because the using decl is private, the compiler correctly points to the using decl as the source of the problem.

gcc/cp/ChangeLog:

2021-03-10  Anthony Sharp  <anthonysharp15@gmail.com>

        * semantics.c (get_class_access_diagnostic_decl): New function that
        examines special cases when a parent class causes a private access
        failure.
        (enforce_access): Slightly modified to call function above.

gcc/testsuite/ChangeLog:

2021-03-10  Anthony Sharp  <anthonysharp15@gmail.com>

        * g++.dg/cpp1z/using9.C: New using decl test.

Co-authored-by: Jason Merrill <jason@redhat.com>
2021-03-17  nios2: Fix format complaints and similar diagnostics.  Sandra Loosemore  (1 file changed, -29/+34)
The nios2 back end has not been building with newer versions of host GCC
due to several complaints about diagnostic formatting, along with a couple
of other warnings. This patch fixes the errors seen when building with a
host compiler from current mainline head. I also made a pass through all
the error messages in this file to make them use more consistent
formatting, even where the host compiler was not specifically complaining.

gcc/
	* config/nios2/nios2.c (nios2_custom_check_insns): Clean up
	error message format issues.
	(nios2_option_override): Likewise.
	(nios2_expand_fpu_builtin): Likewise.
	(nios2_init_custom_builtins): Adjust to avoid bogus strncpy
	truncation warning.
	(nios2_expand_custom_builtin): More error message format fixes.
	(nios2_expand_rdwrctl_builtin): Likewise.
	(nios2_expand_rdprs_builtin): Likewise.
	(nios2_expand_eni_builtin): Likewise.
	(nios2_expand_builtin): Likewise.
	(nios2_register_custom_code): Likewise.
	(nios2_valid_target_attribute_rec): Likewise.
	(nios2_add_insn_asm): Fix uninitialized variable warning.
2021-03-17  Enable gather on zen3 hardware.  Jan Hubicka  (2 files changed, -6/+6)
For TSVC it gets used by 5 benchmarks, with the following runtime
improvements:

    s4114: 1.424 -> 1.209 (84.9017%)
    s4115: 2.021 -> 1.065 (52.6967%)
    s4116: 1.549 -> 0.854 (55.1323%)
    s4117: 1.386 -> 1.193 (86.075%)
    vag:   2.741 -> 1.940 (70.7771%)

There is a regression in s4112: 1.115 -> 1.184 (106.188%). The internal
loop is:

    for (int i = 0; i < LEN_1D; i++) {
        a[i] += b[ip[i]] * s;
    }

(so a standard accumulate-and-add with indirect addressing)

    40a400: c5 fe 6f 24 03          vmovdqu (%rbx,%rax,1),%ymm4
    40a405: c5 fc 28 da             vmovaps %ymm2,%ymm3
    40a409: 48 83 c0 20             add    $0x20,%rax
    40a40d: c4 e2 65 92 04 a5 00    vgatherdps %ymm3,0x594100(,%ymm4,4),%ymm0
    40a414: 41 59 00
    40a417: c4 e2 75 a8 80 e0 34    vfmadd213ps 0x5b34e0(%rax),%ymm1,%ymm0
    40a41e: 5b 00
    40a420: c5 fc 29 80 e0 34 5b    vmovaps %ymm0,0x5b34e0(%rax)
    40a427: 00
    40a428: 48 3d 00 f4 01 00       cmp    $0x1f400,%rax
    40a42e: 75 d0                   jne    40a400 <s4112+0x60>

compared to:

    40a280: 49 63 14 04             movslq (%r12,%rax,1),%rdx
    40a284: 48 83 c0 04             add    $0x4,%rax
    40a288: c5 fa 10 04 95 00 41    vmovss 0x594100(,%rdx,4),%xmm0
    40a28f: 59 00
    40a291: c4 e2 71 a9 80 fc 34    vfmadd213ss 0x5b34fc(%rax),%xmm1,%xmm0
    40a298: 5b 00
    40a29a: c5 fa 11 80 fc 34 5b    vmovss %xmm0,0x5b34fc(%rax)
    40a2a1: 00
    40a2a2: 48 3d 00 f4 01 00       cmp    $0x1f400,%rax
    40a2a8: 75 d6                   jne    40a280 <s4112+0x40>

Looking at instruction latencies:
 - fmadd is 4 cycles
 - vgatherdps is 39 cycles

So vgather itself is 4.8 cycles per iteration, and the CPU is probably able
to execute the rest out of order, getting close to 4 cycles per iteration
(it can do 2 loads in parallel, one store, and the rest fits easily into
the execution resources). That would explain the 20% slowdown.
The gimple internal loop is:

    _2 = a[i_38];
    _3 = (long unsigned int) i_38;
    _4 = _3 * 4;
    _5 = ip_18 + _4;
    _6 = *_5;
    _7 = b[_6];
    _8 = _7 * s_19;
    _9 = _2 + _8;
    a[i_38] = _9;
    i_28 = i_38 + 1;
    ivtmp_52 = ivtmp_53 - 1;
    if (ivtmp_52 != 0)
      goto <bb 8>; [98.99%]
    else
      goto <bb 4>; [1.01%]

    0x25bac30 a[i_38] 1 times scalar_load costs 12 in body
    0x25bac30 *_5 1 times scalar_load costs 12 in body
    0x25bac30 b[_6] 1 times scalar_load costs 12 in body
    0x25bac30 _7 * s_19 1 times scalar_stmt costs 12 in body
    0x25bac30 _2 + _8 1 times scalar_stmt costs 12 in body
    0x25bac30 _9 1 times scalar_store costs 16 in body

so 19 cycles estimated for the scalar loop

    0x2668630 a[i_38] 1 times vector_load costs 12 in body
    0x2668630 *_5 1 times unaligned_load (misalign -1) costs 12 in body
    0x2668630 b[_6] 8 times scalar_load costs 96 in body
    0x2668630 _7 * s_19 1 times scalar_to_vec costs 4 in prologue
    0x2668630 _7 * s_19 1 times vector_stmt costs 12 in body
    0x2668630 _2 + _8 1 times vector_stmt costs 12 in body
    0x2668630 _9 1 times vector_store costs 16 in body

so 40 cycles per 8x vectorized body

    tsvc.c:3450:27: note: operating only on full vectors.
    tsvc.c:3450:27: note: Cost model analysis:
      Vector inside of loop cost: 160
      Vector prologue cost: 4
      Vector epilogue cost: 0
      Scalar iteration cost: 76
      Scalar outside cost: 0
      Vector outside cost: 4
      prologue iterations: 0
      epilogue iterations: 0
      Calculated minimum iters for profitability: 1

I think this generally suffers from the GIGO principle. One problem seems
to be that we do not know about fmadd yet and compute it as two
instructions (6 cycles instead of 4). A more important problem is that we
do not account for the parallelism at all. I do not see how to disable the
vectorization here without bumping gather costs noticeably off reality, and
thus we can probably experiment with this if more similar problems are
found.

Icc also uses gathers in s1115 and s128. For s1115 the vectorization does
not seem to help, and s128 gets slower.
Clang and aocc do not use gathers.

	* config/i386/x86-tune-costs.h (struct processor_costs): Update
	costs of gather to match reality.
	* config/i386/x86-tune.def (X86_TUNE_USE_GATHER): Enable for
	znver3.
2021-03-17  compiler: copy receiver argument for go/defer of method call  Ian Lance Taylor  (3 files changed, -2/+29)
Test case is https://golang.org/cl/302371.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/302270