2020-08-25  analyzer: fix ICE on initializers for unsized array fields [PR96777]  (David Malcolm, 4 files changed, +55/-5)

gcc/analyzer/ChangeLog:
    PR analyzer/96777
    * region-model.h (class compound_svalue): Document that all keys must be concrete.
    (compound_svalue::compound_svalue): Move definition to svalue.cc.
    * store.cc (binding_map::apply_ctor_to_region): Handle initializers for trailing arrays with incomplete size.
    * svalue.cc (compound_svalue::compound_svalue): Move definition here from region-model.h.  Add assertion that all keys are concrete.

gcc/testsuite/ChangeLog:
    PR analyzer/96777
    * gcc.dg/analyzer/pr96777.c: New test.
2020-08-26  Daily bump.  (GCC Administrator, 5 files changed, +246/-1)
2020-08-25  x86: Change CTZ_DEFINED_VALUE_AT_ZERO to return 0/2  (H.J. Lu, 3 files changed, +76/-2)

Change CLZ_DEFINED_VALUE_AT_ZERO/CTZ_DEFINED_VALUE_AT_ZERO to return 0/2 to enable table-based clz/ctz optimization:

    -- Macro: CLZ_DEFINED_VALUE_AT_ZERO (MODE, VALUE)
    -- Macro: CTZ_DEFINED_VALUE_AT_ZERO (MODE, VALUE)
       A C expression that indicates whether the architecture defines a
       value for 'clz' or 'ctz' with a zero operand.  A result of '0'
       indicates the value is undefined.  If the value is defined for
       only the RTL expression, the macro should evaluate to '1'; if
       the value applies also to the corresponding optab entry (which
       is normally the case if it expands directly into the
       corresponding RTL), then the macro should evaluate to '2'.  In
       the cases where the value is defined, VALUE should be set to
       this value.

gcc/
    PR target/95863
    * config/i386/i386.h (CTZ_DEFINED_VALUE_AT_ZERO): Return 0/2.
    (CLZ_DEFINED_VALUE_AT_ZERO): Likewise.

gcc/testsuite/
    PR target/95863
    * gcc.target/i386/pr95863-1.c: New test.
    * gcc.target/i386/pr95863-2.c: Likewise.
2020-08-25  hppa: PR middle-end/87256: Improved hppa_rtx_costs avoids synth_mult madness.  (Roger Sayle, 1 file changed, +138/-34)

This is my proposed fix to PR middle-end/87256, where synth_mult takes an unreasonable amount of CPU time determining an optimal sequence of instructions to perform multiplication by (large) integer constants on hppa.

One workaround proposed in bugzilla is to increase the hash table used to cache/reuse intermediate results.  This helps, but is a workaround for the (hidden) underlying problem.  The real issue is that the hppa_rtx_costs function is providing wildly inaccurate values (estimates) to the middle-end.  For example, (p*q)+(r*s) would appear to be cheaper than a single multiplication.  Another example is that "(ashiftrt:di regA regB)" is claimed to be only COSTS_N_INSNS(1) when in fact the hppa backend actually generates slightly more than a single instruction.

It turns out that simply tightening up the logic in hppa_rtx_costs to return more reasonable values dramatically reduces the number of recursive invocations in synth_mult for the test case in PR87256, and presumably also produces faster code (that should be observable in benchmarks).

2020-08-25  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
    PR middle-end/87256
    * config/pa/pa.c (hppa_rtx_costs_shadd_p): New helper function to check for coefficients supported by shNadd and shladd,l.
    (hppa_rtx_costs): Rewrite to avoid using estimates based upon FACTOR and enable recursing deeper into RTL expressions.
2020-08-25  hppa: Improve expansion of ashldi3 when !TARGET_64BIT  (Roger Sayle, 1 file changed, +33/-1)

This patch improves the code generated on PA-RISC for DImode (double word) left shifts by small constants (1-31).  This target has a very cool shd instruction that can be recognized by combine for simple shifts, but relying on combine is fragile for more complicated functions.  This patch tweaks pa.md's ashldi3 expander to form the optimal two instruction shd/zdep sequence at RTL expansion time.

As an example of the benefits of this approach, the simple function

    unsigned long long u9(unsigned long long x) { return x*9; }

currently generates 9 instructions and with this patch now requires only 7.

2020-08-25  Roger Sayle  <roger@nextmovesoftware.com>

    * config/pa/pa.md (ashldi3): Additionally, on !TARGET_64BIT, generate a two instruction shd/zdep sequence when shifting registers by suitable constants.
    (shd_internal): New define_expand to provide gen_shd_internal.
2020-08-25  OpenMP: Improve map-clause error message for array function parameter (PR96678)  (Tobias Burnus, 8 files changed, +48/-9)

gcc/c/ChangeLog:
    PR c/96678
    * c-typeck.c (handle_omp_array_sections_1): Talk about array function parameter in the error message.

gcc/cp/ChangeLog:
    PR c/96678
    * semantics.c (handle_omp_array_sections_1): Talk about array function parameter in the error message.

gcc/testsuite/ChangeLog:
    PR c/96678
    * c-c++-common/gomp/map-4.c: New test.
    * c-c++-common/gomp/depend-1.c: Update dg-error.
    * c-c++-common/gomp/map-1.c: Likewise.
    * c-c++-common/gomp/reduction-1.c: Likewise.
    * g++.dg/gomp/depend-1.C: Likewise.
    * g++.dg/gomp/depend-2.C: Likewise.
2020-08-25  aarch64: Update feature macro name  (Richard Sandiford, 2 files changed, +3/-3)

GCC used the name __ARM_FEATURE_SVE_VECTOR_OPERATIONS, but in the final spec it was renamed to __ARM_FEATURE_SVE_VECTOR_OPERATORS.

gcc/
    * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Rename __ARM_FEATURE_SVE_VECTOR_OPERATIONS to __ARM_FEATURE_SVE_VECTOR_OPERATORS.

gcc/testsuite/
    * gcc.target/aarch64/sve/acle/general/attributes_1.c: Rename __ARM_FEATURE_SVE_VECTOR_OPERATIONS to __ARM_FEATURE_SVE_VECTOR_OPERATORS.
2020-08-25  aarch64: Tweaks to the handling of fixed-length SVE types  (Richard Sandiford, 8 files changed, +295/-17)

This patch is really four things rolled into one, since separating them seemed artificial:

- Update the mangling of the fixed-length SVE ACLE types to match the upcoming spec.  The idea is to mangle:

      VLAT __attribute__((arm_sve_vector_bits(N)))

  as an instance __SVE_VLS<VLAT, N> of the template:

      __SVE_VLS<typename, unsigned>

- Give the fixed-length types their own TYPE_DECL.  This is needed to make the above mangling fix work, but should also be a minor QoI improvement for error reporting.  Unfortunately, the names are quite verbose, e.g.:

      svint8_t __attribute__((arm_sve_vector_bits(512)))

  but anything shorter would be ad-hoc syntax and so might be more confusing.

- Improve the error message reported when arm_sve_vector_bits is applied to tuples, such as:

      svint32x2_t __attribute__((arm_sve_vector_bits(N)))

  Previously we would complain that the type isn't an SVE type; now we complain that it isn't a vector type.

- Don't allow arm_sve_vector_bits(N) to be applied to existing fixed-length SVE types.

gcc/
    * config/aarch64/aarch64-sve-builtins.cc (add_sve_type_attribute): Take the ACLE name of the type as a parameter and add it as fourth argument to the "SVE type" attribute.
    (register_builtin_types): Update call accordingly.
    (register_tuple_type): Likewise.  Construct the name of the type earlier in order to do this.
    (get_arm_sve_vector_bits_attributes): New function.
    (handle_arm_sve_vector_bits_attribute): Report a more sensible error message if the attribute is applied to an SVE tuple type.  Don't allow the attribute to be applied to an existing fixed-length SVE type.  Mangle the new type as __SVE_VLS<type, vector-bits>.  Add a dummy TYPE_DECL to the new type.

gcc/testsuite/
    * g++.target/aarch64/sve/acle/general-c++/attributes_2.C: New test.
    * g++.target/aarch64/sve/acle/general-c++/mangle_6.C: Likewise.
    * g++.target/aarch64/sve/acle/general-c++/mangle_7.C: Likewise.
    * g++.target/aarch64/sve/acle/general-c++/mangle_8.C: Likewise.
    * g++.target/aarch64/sve/acle/general-c++/mangle_9.C: Likewise.
    * g++.target/aarch64/sve/acle/general-c++/mangle_10.C: Likewise.
    * gcc.target/aarch64/sve/acle/general/attributes_7.c: Check the error messages reported when arm_sve_vector_bits is applied to SVE tuple types or to existing fixed-length SVE types.
2020-08-25  aarch64: Update the mangling of single SVE vectors and predicates  (Richard Sandiford, 5 files changed, +31/-31)

GCC was implementing an old mangling scheme for single SVE vectors and predicates (based on the Advanced SIMD one).  The final definition instead put them in the vendor built-in namespace via the "u" prefix.

gcc/
    * config/aarch64/aarch64-sve-builtins.cc (DEF_SVE_TYPE): Add a leading "u" to each mangled name.

gcc/testsuite/
    * g++.target/aarch64/sve/acle/general-c++/mangle_1.C: Add a leading "u" to the mangling of each SVE vector and predicate type.
    * g++.target/aarch64/sve/acle/general-c++/mangle_2.C: Likewise.
    * g++.target/aarch64/sve/acle/general-c++/mangle_3.C: Likewise.
    * g++.target/aarch64/sve/acle/general-c++/mangle_5.C: Likewise.
2020-08-25  tree-optimization/96548 - fix failure to recompute RPO after CFG change  (Richard Biener, 3 files changed, +46/-0)

This recomputes RPO after store-motion changes the CFG.

2020-08-25  Richard Biener  <rguenther@suse.de>
    PR tree-optimization/96548
    PR tree-optimization/96760
    * tree-ssa-loop-im.c (tree_ssa_lim): Recompute RPO after store-motion.
    * gcc.dg/torture/pr96548.c: New testcase.
    * gcc.dg/torture/pr96760.c: Likewise.
2020-08-25  gimple: Ignore *0 = {CLOBBER} in path isolation [PR96722]  (Jakub Jelinek, 2 files changed, +24/-3)

Clobbers of a MEM_REF with a NULL address are just fancy nops, something we simply ignore and don't emit any code for (ditto for other clobbers); they just mark end of life on something, so we shouldn't infer from them that there is some UB.

2020-08-25  Jakub Jelinek  <jakub@redhat.com>
    PR tree-optimization/96722
    * gimple.c (infer_nonnull_range): Formatting fix.
    (infer_nonnull_range_by_dereference): Return false for clobber stmts.
    * g++.dg/opt/pr96722.C: New test.
2020-08-25  strlen: Fix handle_builtin_string_cmp [PR96758]  (Jakub Jelinek, 2 files changed, +27/-2)

The following testcase is miscompiled, because handle_builtin_string_cmp sees a strncmp call with constant last argument 4, where one of the strings has an upper bound of 5 bytes (due to it being an array of that size) and the other has a known string length of 1, and the result is used only in an equality comparison.  It is folded into __builtin_strncmp_eq (str1, str2, 4), which is incorrect, because that means reading 4 bytes from both strings and comparing that.  When one of the strings has a known strlen of 1, we want to compare just 2 bytes, not 4, as strncmp shouldn't compare any bytes beyond the null.  So, the last argument to __builtin_strncmp_eq should be the minimum of the provided strncmp last argument and the known string length + 1 (assuming the other string has only a known upper bound due to array size).

Besides that, I've noticed the code has been written with the intent to also support the case where we know the exact string length of both strings (but not the string content, so we can't compute the result at compile time).  In that case, both cstlen1 and cstlen2 are non-negative and both arysiz1 and arysiz2 are negative.  We wouldn't optimize that: cmpsiz would be either the strncmp last argument, or for strcmp the first string length, but varsiz would be -1 and thus cmpsiz would never be < varsiz.  The patch fixes it by using the correct length, in that case using the minimum of the two and, for strncmp, also the last argument.

2020-08-25  Jakub Jelinek  <jakub@redhat.com>
    PR tree-optimization/96758
    * tree-ssa-strlen.c (handle_builtin_string_cmp): If both cstlen1 and cstlen2 are set, set cmpsiz to their minimum, otherwise use the one that is set.  If bound is used and smaller than cmpsiz, set cmpsiz to bound.  If both cstlen1 and cstlen2 are set, perform the optimization.
    * gcc.dg/strcmpopt_12.c: New test.
2020-08-25  sra: Bail out when encountering accesses with negative offsets (PR 96730)  (Martin Jambor, 2 files changed, +19/-0)

I must admit I was quite surprised to see that SRA does not disqualify an aggregate from any transformations when it encounters an offset for which get_ref_base_and_extent returns a negative offset.

It may not matter too much, because I sure hope such programs always have undefined behavior (SRA candidates are local variables on the stack), but it is probably better not to perform weird transformations on them, as build_ref_for_model with the new build_reconstructed_reference function currently happily does for negative offsets (they just copy the existing expression, which is then used as the expression of a "propagated" access).  And of course the compiler must not ICE (as it currently does, because the SRA forest verifier does not like the expression).

gcc/ChangeLog:

2020-08-24  Martin Jambor  <mjambor@suse.cz>
    PR tree-optimization/96730
    * tree-sra.c (create_access): Disqualify any aggregate with negative offset access.
    (build_ref_for_model): Add assert that offset is non-negative.

gcc/testsuite/ChangeLog:

2020-08-24  Martin Jambor  <mjambor@suse.cz>
    PR tree-optimization/96730
    * gcc.dg/tree-ssa/pr96730.c: New test.
2020-08-25  Fix a typo in rtl.def  (Wei Wentao, 1 file changed, +1/-1)

gcc/
    * rtl.def: Fix typo in comment.
2020-08-25  middle-end: PR tree-optimization/21137: STRIP_NOPS avoids missed optimization.  (Roger Sayle, 2 files changed, +59/-31)

PR tree-optimization/21137 is now an old enhancement request pointing out that an optimization I added back in 2006, to optimize "((x>>31)&64) != 0" as "x < 0", doesn't fire in the presence of unanticipated type conversions.  The fix is to call STRIP_NOPS at the appropriate point.

2020-08-25  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
    PR tree-optimization/21137
    * fold-const.c (fold_binary_loc) [NE_EXPR/EQ_EXPR]: Call STRIP_NOPS when checking whether to simplify ((x>>C1)&C2) != 0.

gcc/testsuite/ChangeLog
    PR tree-optimization/21137
    * gcc.dg/pr21137.c: New test.
2020-08-25  MIPS: Fix __builtin_longjmp (PR 64242)  (Andrew Pinski, 1 file changed, +11/-1)

The problem here is that mips has its own builtin_longjmp pattern and it was not fixed when expand_builtin_longjmp was fixed.  We need to read the new fp and gp before restoring the stack, as the buffer might be a local variable.

2020-08-25  Andrew Pinski  <apinski@marvell.com>

gcc/ChangeLog:
    PR middle-end/64242
    * config/mips/mips.md (builtin_longjmp): Restore the frame pointer and stack pointer and gp.
2020-08-25  debug/96690 - mangle symbols eventually used by late dwarf output  (Richard Biener, 2 files changed, +24/-8)

The following makes sure to, at early debug generation time, mangle symbols we eventually end up outputting during late finish.

2020-08-24  Richard Biener  <rguenther@suse.de>
    PR debug/96690
    * dwarf2out.c (reference_to_unused): Make FUNCTION_DECL processing more consistent with respect to symtab->global_info_ready.
    (tree_add_const_value_attribute): Unconditionally call rtl_for_decl_init to do all mangling early but throw away the result if early_dwarf.
    * g++.dg/lto/pr96690_0.C: New testcase.
2020-08-25  Fix typo to fix ICE.  (liuhongt, 2 files changed, +17/-1)

2020-08-24  Hongtao Liu  <hongtao.liu@intel.com>

gcc/ChangeLog:
    PR target/96755
    * config/i386/sse.md: Correct the mode of NOT operands to SImode.

gcc/testsuite/ChangeLog:
    * gcc.target/i386/pr96755.c: New test.
2020-08-25  match.pd: Simplify copysign (x, -x) to -x [PR96715]  (Jakub Jelinek, 2 files changed, +28/-0)

The following patch implements an optimization suggested in the PR: copysign(x,-x) can be optimized into -x (even without -ffast-math; it should work fine even for signed zeros and infinities or nans).

2020-08-25  Jakub Jelinek  <jakub@redhat.com>
    PR tree-optimization/96715
    * match.pd (copysign(x,-x) -> -x): New simplification.
    * gcc.dg/tree-ssa/copy-sign-3.c: New test.
2020-08-25  c++: Fix up ptr.~PTR () handling [PR96721]  (Jakub Jelinek, 4 files changed, +20/-5)

The following testcase is miscompiled, because build_trivial_dtor_call handles the case when the instance is a pointer by adding a clobber to what the pointer points to (which is desirable e.g. for delete) rather than the pointer itself.  That is, I think, always desirable behavior for references, but for pointers in the pseudo-dtor case it is not.

2020-08-25  Jakub Jelinek  <jakub@redhat.com>
    PR c++/96721
    * cp-tree.h (build_trivial_dtor_call): Add bool argument defaulted to false.
    * call.c (build_trivial_dtor_call): Add NO_PTR_DEREF argument.  If instance is a pointer and NO_PTR_DEREF is true, clobber the pointer rather than what it points to.
    * semantics.c (finish_call_expr): Call build_trivial_dtor_call with true as NO_PTR_DEREF.
    * g++.dg/opt/flifetime-dse8.C: New test.
2020-08-25  gimple-fold: Don't optimize weird floating point value reads [PR95450]  (Jakub Jelinek, 2 files changed, +42/-1)

My patch to introduce native_encode_initializer to fold_ctor_reference apparently broke gnulib/m4 on powerpc64.  There it uses a const union with two doubles and the corresponding IBM double double long double, which actually is the largest normalizable long double value (1 ulp higher than __LDBL_MAX__).  The reason our __LDBL_MAX__ is smaller is that we internally treat the double double type as one having 106-bit precision, but it actually has a variable 53-bit to 2000-ish bit precision, and for the 0x1.fffffffffffff7ffffffffffffc000p+1023L value gnulib uses we need 107-bit precision; therefore for GCC __LDBL_MAX__ is 0x1.fffffffffffff7ffffffffffff8000p+1023L.

Before my changes, we wouldn't be able to fold_ctor_reference it and it worked fine at runtime, but with the change we are able to do that, and because it is larger than anything we can handle internally, we treat it weirdly.  A similar problem would arise if somebody creates this way a valid value with much more than 106-bit precision, e.g. 1.0 + 1.0e-768.  Now, I think a similar problem could happen e.g. on i?86/x86_64 with long double there; it also has some weird values in the format, e.g. the unnormals, pseudo infinities and various other magic values.

This patch, for floating point types (including vector and complex types with such elements), will try to encode the returned value again and punt if it has a different memory representation from the original.  Note, this is only done in the path where native_encode_initializer was used, in order not to affect e.g. just reading an unpunned long double value; the value should be compiler generated in that case and thus should be properly representable.  It will punt also if e.g. the padding bits are initialized to non-zero values.

I think the verification that what we encode can be interpreted back would be only an internal consistency check (so perhaps for ENABLE_CHECKING if flag_checking only, but if both directions perform it, then we need to avoid mutual recursion).  While for the other direction (interpretation), at least for the broken-by-design long doubles, we just know we can't represent in GCC all valid values.  The other floating point formats are just a theoretical case; perhaps we would canonicalize something to a value that wouldn't trigger an invalid exception when without canonicalization it would trigger it at runtime, so let's just ignore those.

Adjusted (so far untested) patch to do it in native_interpret_real instead and limit it to the MODE_COMPOSITE_P cases, for which e.g. fold-const.c/simplify-rtx.c punts in several other places too because we just know we can't represent everything.  E.g.:

    /* Don't constant fold this floating point operation if the
       result may dependent upon the run-time rounding mode and
       flag_rounding_math is set, or if GCC's software emulation
       is unable to accurately represent the result.  */
    if ((flag_rounding_math
         || (MODE_COMPOSITE_P (mode) && !flag_unsafe_math_optimizations))
        && (inexact || !real_identical (&result, &value)))
      return NULL_TREE;

Or perhaps guard it with MODE_COMPOSITE_P (mode) && !flag_unsafe_math_optimizations too, thus breaking what gnulib/m4 does with -ffast-math, but not normally?

2020-08-25  Jakub Jelinek  <jakub@redhat.com>
    PR target/95450
    * fold-const.c (native_interpret_real): For MODE_COMPOSITE_P modes, punt if the to-be-returned REAL_CST does not encode to the bitwise same representation.
    * gcc.target/powerpc/pr95450.c: New test.
2020-08-24  c++: Emit as-base 'tor symbols for final class. [PR95428]  (Jason Merrill, 2 files changed, +8/-10)

For PR70462 I stopped emitting the as-base constructor and destructor variants for final classes, because they can never be called.  Except that it turns out that clang calls base variants from complete variants, even for classes with virtual bases, and in some cases inlines them such that the calls to the base variant are exposed.  So we need to continue to emit the as-base symbols, even though they're unreachable by G++-compiled code.

gcc/cp/ChangeLog:
    PR c++/95428
    * optimize.c (populate_clone_array): Revert PR70462 change.
    (maybe_clone_body): Likewise.

gcc/testsuite/ChangeLog:
    * g++.dg/other/final8.C: Adjust expected output.
2020-08-25  Daily bump.  (GCC Administrator, 4 files changed, +72/-1)
2020-08-24  doc: Switch valgrind.com to https  (Gerald Pfeifer, 1 file changed, +1/-1)

gcc/ChangeLog:
    * doc/install.texi (Configuration): Switch valgrind.com to https.
2020-08-24  c++: overload dumper  (Nathan Sidwell, 1 file changed, +16/-0)

I frequently need to look at overload sets, and debug_node spews more information than is useful, most of the time.  Here's a dumper for overloads, that just tells you their full name and where they came from.

gcc/cp/
    * ptree.c (debug_overload): New.
2020-08-24  Fortran : get_environment_variable runtime error PR96486  (Mark Eggleston, 1 file changed, +9/-0)

A runtime error occurs when the type of the value argument is character(0): "Zero-length string passed as value...".  The status argument, intent(out), will contain -1 if the value of the environment variable is too large to fit in the value argument; this is always the case if the type is character(0), so there is no reason to produce a runtime error when the value argument is zero length.

2020-08-24  Mark Eggleston  <markeggleston@gcc.gnu.org>

libgfortran/
    PR fortran/96486
    * intrinsics/env.c: If value_len is > 0 blank the string.  Copy the result only if its length is > 0.

2020-08-24  Mark Eggleston  <markeggleston@gcc.gnu.org>

gcc/testsuite/
    PR fortran/96486
    * gfortran.dg/pr96486.f90: New test.
2020-08-24  arm: Fix -mpure-code support/-mslow-flash-data for armv8-m.base [PR94538]  (Christophe Lyon, 3 files changed, +79/-12)

armv8-m.base (cortex-m23) has the movt instruction, so we need to disable the define_split that generates a constant in this case, otherwise we get incorrect insn constraints as described in PR94538.

We also need to fix the pure-code alternative for thumb1_movsi_insn, because the assembler complains with instructions like

    movs r0, #:upper8_15:1234

(Internal error in md_apply_fix).  We now generate

    movs r0, 4

instead.

2020-08-24  Christophe Lyon  <christophe.lyon@linaro.org>

gcc/
    PR target/94538
    * config/arm/thumb1.md: Disable set-constant splitter when TARGET_HAVE_MOVT.
    (thumb1_movsi_insn): Fix -mpure-code alternative.

gcc/testsuite/
    PR target/94538
    * gcc.target/arm/pure-code/pr94538-1.c: New test.
    * gcc.target/arm/pure-code/pr94538-2.c: New test.
2020-08-24  SLP: support entire BB.  (Martin Liska, 6 files changed, +120/-76)

gcc/ChangeLog:
    * tree-vect-data-refs.c (dr_group_sort_cmp): Work on data_ref_pair.
    (vect_analyze_data_ref_accesses): Work on groups.
    (vect_find_stmt_data_reference): Add group_id argument and fill up dataref_groups vector.
    * tree-vect-loop.c (vect_get_datarefs_in_loop): Pass new arguments.
    (vect_analyze_loop_2): Likewise.
    * tree-vect-slp.c (vect_slp_analyze_bb_1): Pass argument.
    (vect_slp_bb_region): Likewise.
    (vect_slp_region): Likewise.
    (vect_slp_bb): Work on the entire BB.
    * tree-vectorizer.h (vect_analyze_data_ref_accesses): Add new argument.
    (vect_find_stmt_data_reference): Likewise.

gcc/testsuite/ChangeLog:
    * gcc.dg/vect/bb-slp-38.c: Adjust pattern as now we only process a single vectorization and not 2 partial.
    * gcc.dg/vect/bb-slp-45.c: New test.
2020-08-24  Add missing vn_reference_t::punned initialization  (Martin Liska, 1 file changed, +3/-2)

gcc/ChangeLog:
    PR tree-optimization/96597
    * tree-ssa-sccvn.c (vn_reference_lookup_call): Add missing initialization of ::punned.
    (vn_reference_insert): Use consistently false instead of 0.
    (vn_reference_insert_pieces): Likewise.
2020-08-24  reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM  (Hans-Peter Nilsson, 2 files changed, +85/-1)

This handles TARGET_FLAGS_REGNUM-clobbering insns as delay-slot fillers using a method similar to that in commit 33c2207d3fda, where care was taken for fill_simple_delay_slots to allow such insns when scanning for delay-slot fillers *backwards* (before the insn).

A TARGET_FLAGS_REGNUM target is typically a former cc0 target.  For cc0 targets, insns don't mention clobbering cc0, so the clobbers are mentioned in the "resources" only as a special entity and only for compare-insns and branches, where the cc0 value matters.  In contrast, with TARGET_FLAGS_REGNUM, most insns clobber it, and the register liveness detection in reorg.c / resource.c treats that as a blocker (for other insns mentioning it, i.e. most) when looking for delay-slot-filling candidates.  This means that when comparing code and performance for a delay-slot cc0 target before and after the de-cc0 conversion, the inability to fill a delay slot after conversion manifests as a regression.  This was one such case, for CRIS, with random_bitstring in gcc.c-torture/execute/arith-rand-ll.c as well as the target libgcc division function.  After this, all known performance regressions compared to cc0 are fixed.

gcc:
    PR target/93372
    * reorg.c (fill_slots_from_thread): Allow trial insns that clobber TARGET_FLAGS_REGNUM as delay-slot fillers.

gcc/testsuite:
    PR target/93372
    * gcc.target/cris/pr93372-47.c: New test.
2020-08-24  Daily bump.  (GCC Administrator, 4 files changed, +51/-1)
2020-08-23  x86: Add target("general-regs-only") function attribute  (H.J. Lu, 13 files changed, +240/-3)

gcc/
    PR target/96744
    * config/i386/i386-options.c (IX86_ATTR_IX86_YES): New.
    (IX86_ATTR_IX86_NO): Likewise.
    (ix86_opt_type): Add ix86_opt_ix86_yes and ix86_opt_ix86_no.
    (ix86_valid_target_attribute_inner_p): Handle general-regs-only, ix86_opt_ix86_yes and ix86_opt_ix86_no.
    (ix86_option_override_internal): Check opts->x_ix86_target_flags instead of opts->x_ix86_target_flags.
    * doc/extend.texi: Document target("general-regs-only") function attribute.

gcc/testsuite/
    PR target/96744
    * gcc.target/i386/pr96744-1.c: New test.
    * gcc.target/i386/pr96744-2.c: Likewise.
    * gcc.target/i386/pr96744-3a.c: Likewise.
    * gcc.target/i386/pr96744-3b.c: Likewise.
    * gcc.target/i386/pr96744-4.c: Likewise.
    * gcc.target/i386/pr96744-5.c: Likewise.
    * gcc.target/i386/pr96744-6.c: Likewise.
    * gcc.target/i386/pr96744-7.c: Likewise.
    * gcc.target/i386/pr96744-8a.c: Likewise.
    * gcc.target/i386/pr96744-8b.c: Likewise.
    * gcc.target/i386/pr96744-9.c: Likewise.
2020-08-23  Changed to STOP 1 in unlimited_polymorphic_31.f03.  (Paul Thomas, 1 file changed, +1/-1)

2020-08-23  Paul Thomas  <pault@gcc.gnu.org>

gcc/testsuite/
    PR fortran/92785
    * gfortran.dg/unlimited_polymorphic_31.f03: Change to stop 1.
2020-08-23  Adding option -g to pr96737.f90.  (Paul Thomas, 1 file changed, +1/-1)

2020-08-23  Paul Thomas  <pault@gcc.gnu.org>

gcc/testsuite/
    PR fortran/96737
    * gfortran.dg/pr96737.f90: Add option -g.
2020-08-23  This patch fixes PR96737. See the explanatory comment in the testcase.  (Paul Thomas, 2 files changed, +107/-2)

2020-08-23  Paul Thomas  <pault@gcc.gnu.org>

gcc/fortran
    PR fortran/96737
    * trans-types.c (gfc_get_derived_type): Derived types that are used in submodules are not compatible with TYPE_CANONICAL from any of the global namespaces.

gcc/testsuite/
    PR fortran/96737
    * gfortran.dg/pr96737.f90: New test.
2020-08-23  Daily bump.  (GCC Administrator, 4 files changed, +43/-1)
2020-08-22  analyzer: fix NULL deref false positives [PR94851]  (David Malcolm, 4 files changed, +96/-0)

PR analyzer/94851 reports various false "NULL dereference" diagnostics.

The first case (comment #1) affects GCC 10.2 but no longer affects trunk; I believe it was fixed by the state rewrite of r11-2694-g808f4dfeb3a95f50f15e71148e5c1067f90a126d.  The patch adds a regression test for this case.

The other cases (comment #3 and comment #4) still affect trunk.  In both cases, the && in a conditional is optimized to bitwise &:

    _1 = p_4 != 0B;
    _2 = p_4 != q_6(D);
    _3 = _1 & _2;

and the analyzer fails to fold this for the case where one (or both) of the conditionals is false, and thus erroneously considers the path where "p" is non-NULL despite being passed a NULL value.

Fix this by implementing folding for this case.

gcc/analyzer/ChangeLog:
    PR analyzer/94851
    * region-model-manager.cc (region_model_manager::maybe_fold_binop): Fold bitwise "& 0" to 0.

gcc/testsuite/ChangeLog:
    PR analyzer/94851
    * gcc.dg/analyzer/pr94851-1.c: New test.
    * gcc.dg/analyzer/pr94851-3.c: New test.
    * gcc.dg/analyzer/pr94851-4.c: New test.
2020-08-22  analyzer: simplify store::eval_alias  (David Malcolm, 2 files changed, +27/-22)

I have followup patches that add new conditions to store::eval_alias.  Rather than duplicate all conditions for symmetry, split it up and call it on both (A, B) and (B, A).

gcc/analyzer/ChangeLog:
    * store.cc (store::eval_alias): Make const.  Split out 2nd half into store::eval_alias_1 and call it twice for symmetry, avoiding test duplication.
    (store::eval_alias_1): New function, split out from the above.
    * store.h (store::eval_alias): Make const.
    (store::eval_alias_1): New decl.
2020-08-22  analyzer: simplify region_model::push_frame  (David Malcolm, 2 files changed, +12/-17)

region_model::push_frame was binding arguments for both the default SSA name for each parameter and the underlying parameter.  Simplify the generated states by only binding the default SSA name if it exists, or the parameter if there is no default SSA name.

gcc/analyzer/ChangeLog:
    * region-model.cc (region_model::push_frame): Bind the default SSA name for each parm if it exists, falling back to the parm itself otherwise, rather than doing both.

gcc/testsuite/ChangeLog:
    * gcc.dg/analyzer/malloc-ipa-8-double-free.c: Drop -fanalyzer-verbose-state-changes.
2020-08-22  libgccjit: Update comments for gcc_jit_context_new_rvalue_from* functions  (Andrea Corallo, 1 file changed, +9/-5)

gcc/jit/ChangeLog

2020-08-06  Andrea Corallo  <andrea.corallo@arm.com>
    * libgccjit.c (gcc_jit_context_new_rvalue_from_int)
    (gcc_jit_context_new_rvalue_from_long)
    (gcc_jit_context_new_rvalue_from_double)
    (gcc_jit_context_new_rvalue_from_ptr): Update function heading comments.
2020-08-22  Daily bump.  (GCC Administrator, 3 files changed, +127/-1)
2020-08-21  Update links to Arm docs  (Richard Sandiford, 2 files changed, +4/-4)

gcc/
    * doc/extend.texi: Update links to Arm docs.
    * doc/invoke.texi: Likewise.
2020-08-22  Using gen_int_mode instead of GEN_INT to avoid ICE caused by type promotion.  (liuhongt, 2 files changed, +14/-3)

2020-07-22  Hongtao Liu  <hongtao.liu@intel.com>

gcc/
    PR target/96262
    * config/i386/i386-expand.c (ix86_expand_vec_shift_qihi_constant): Refine.

gcc/testsuite/
    * gcc.target/i386/pr96262-1.c: New test.
2020-08-21  driver: Fix several memory leaks [PR63854]  (Alex Coplan, 1 file changed, +52/-8)

This patch fixes several memory leaks in the driver, all of which relate to the handling of static specs.

We introduce functions set_static_spec_{shared,owned}() which are used to enforce proper memory management when updating the strings in the static_specs table.  This is achieved by making use of the alloc_p field in the table entries.  Similarly to set_spec(), each time we update an entry, we check whether alloc_p is set, and free the old value if so.  We then set alloc_p correctly based on whether we "own" this memory or whether we're just taking a pointer to a shared string which we shouldn't free.

The following table shows the number of leaks found by AddressSanitizer when running a minimal libgccjit program on AArch64.  The test program does the whole libgccjit compilation cycle in a loop (including acquiring and releasing the context), and the table below shows the number of leaks for different iterations of that loop.

    +--------------+-----+-----+------+---------------+
    | # of runs >  |  1  |  2  |  3   | Leaks per run |
    +--------------+-----+-----+------+---------------+
    | Before patch | 463 | 940 | 1417 |      477      |
    +--------------+-----+-----+------+---------------+
    | After patch  | 416 | 846 | 1276 |      430      |
    +--------------+-----+-----+------+---------------+

gcc/ChangeLog:
    PR jit/63854
    * gcc.c (set_static_spec): New.
    (set_static_spec_owned): New.
    (set_static_spec_shared): New.
    (driver::maybe_putenv_COLLECT_LTO_WRAPPER): Use set_static_spec_owned() to take ownership of lto_wrapper_file such that it gets freed in driver::finalize.
    (driver::maybe_run_linker): Use set_static_spec_shared() to ensure that we don't try and free() the static string "ld", also ensuring that any previously-allocated string in linker_name_spec is freed.  Likewise with argv0.
    (driver::finalize): Use set_static_spec_shared() when resetting specs that previously had allocated strings; remove if(0) around call to free().
2020-08-21Allow try_split to split RTX_FRAME_RELATED_P insnsSenthil Kumar Selvaraj3-65/+90
Instead of rejecting RTX_FRAME_RELATED_P insns, allow try_split to split such insns, provided the split occurs after reload and the result of the split is a single insn. recog.c:peep2_attempt already handles splitting an RTX_FRAME_RELATED_P insn when the split yields a single insn. This patch refactors the existing code that copies frame-related info into a separate function (copy_frame_info_to_split_insn) and calls it from both peep2_attempt and try_split. 2020-08-21 Senthil Kumar Selvaraj <saaadhu@gcc.gnu.org> gcc/ChangeLog: * emit-rtl.c (try_split): Call copy_frame_info_to_split_insn to split certain RTX_FRAME_RELATED_P insns. * recog.c (copy_frame_info_to_split_insn): New function. (peep2_attempt): Split copying of frame related info of RTX_FRAME_RELATED_P insns into above function and call it. * recog.h (copy_frame_info_to_split_insn): Declare it.
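The guard the patch applies can be sketched as follows; the struct, the function name, and the simulated `reload_completed` flag are illustrative stand-ins, not GCC's internal API.

```c
/* Hedged sketch of try_split's new acceptance test: frame-related info
   is copied to the split result only after reload and only when the
   split produced exactly one insn.  Names are illustrative. */
struct insn_chain { int frame_related_p; struct insn_chain *next; };

static int reload_completed = 1;   /* stand-in for GCC's global flag */

static int
split_ok_for_frame_related_p (const struct insn_chain *orig,
                              const struct insn_chain *split_seq)
{
  if (!orig->frame_related_p)
    return 1;                      /* no frame info to preserve */
  /* Reject before reload, or when the split yielded more than one insn. */
  return reload_completed && split_seq && split_seq->next == 0;
}
```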
2020-08-21  Enable bitwise operation for type mask.  (liuhongt; 13 files changed, -68/+472)
Enable the or/xor/and/andn/not operators for mask registers; kxnor is not enabled since there is no corresponding instruction for general registers. gcc/ PR target/88808 * config/i386/i386.c (ix86_preferred_reload_class): Allow QImode data to go into mask registers. * config/i386/i386.md: (*movhi_internal): Adjust constraints for mask registers. (*movqi_internal): Ditto. (*anddi_1): Support mask register operations. (*and<mode>_1): Ditto. (*andqi_1): Ditto. (*andn<mode>_1): Ditto. (*<code><mode>_1): Ditto. (*<code>qi_1): Ditto. (*one_cmpl<mode>2_1): Ditto. (*one_cmplsi2_1_zext): Ditto. (*one_cmplqi2_1): Ditto. (define_peephole2): Move constant 0/-1 directly into mask registers. * config/i386/predicates.md (mask_reg_operand): New predicate. * config/i386/sse.md (define_split): Add post-reload splitters that would convert "generic" patterns to mask patterns. (*knotsi_1_zext): New define_insn. gcc/testsuite/ * gcc.target/i386/bitwise_mask_op-1.c: New test. * gcc.target/i386/bitwise_mask_op-2.c: New test. * gcc.target/i386/bitwise_mask_op-3.c: New test. * gcc.target/i386/avx512bw-pr88465.c: New testcase. * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase. * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto. * gcc.target/i386/avx512dq-kmovb-5.c: Ditto. * gcc.target/i386/avx512f-kmovw-5.c: Ditto. * gcc.target/i386/pr55342.c: Ditto.
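A hedged sketch of the kind of scalar code the change targets (not one of the actual bitwise_mask_op-*.c testsuite files): compiled with something like -mavx512bw -O2, values like these may now stay in AVX-512 mask (k) registers and use kand/kor/kxor/knot. The function itself is plain C and computes the same result on any target.

```c
/* Plain C; the and/or/xor/not below are the operations the patch allows
   GCC to carry out in mask (k) registers when profitable. */
unsigned short
mask_bitwise (unsigned short a, unsigned short b)
{
  /* Arithmetic happens in int after promotion; the cast truncates the
     result back to the 16-bit width of a kmovw-sized mask. */
  return (unsigned short) ((a & b) ^ (a | b) ^ ~a);
}
```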
2020-08-21  According to instruction_tables.pdf  (liuhongt; 1 file changed, -4/+4)
1. Set the cost of movement inside mask registers a bit higher than the GPRs'.
2. Set the cost of movement between mask registers and GPRs much higher than movement inside GPRs, but still less than or equal to load/store.
3. Set the cost of mask register load/store a bit higher than GPR load/store.
gcc/ * config/i386/x86-tune-costs.h (skylake_cost): Adjust cost model.
2020-08-21  Enable direct movement between gpr and mask registers in pass_reload.  (liuhongt; 7 files changed, -3/+128)
ChangeLog gcc/ * config/i386/i386.c (inline_secondary_memory_needed): No memory is needed between mask regs and gpr. (ix86_hard_regno_mode_ok): Add condition TARGET_AVX512F for mask regno. * config/i386/i386.h (enum reg_class): Add INT_MASK_REGS. (REG_CLASS_NAMES): Ditto. (REG_CLASS_CONTENTS): Ditto. * config/i386/i386.md: Exclude mask register in define_peephole2 which is available only for gpr. gcc/testsuite/ * gcc.target/i386/spill_to_mask-1.c: New tests. * gcc.target/i386/spill_to_mask-2.c: New tests. * gcc.target/i386/spill_to_mask-3.c: New tests. * gcc.target/i386/spill_to_mask-4.c: New tests.
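To give a sense of what the spill_to_mask-*.c tests exercise (the actual testsuite files differ), here is the style of code involved: many simultaneously-live scalar temporaries, which with -mavx512f the allocator may now spill to k-registers via direct kmov rather than through memory. The function is plain portable C.

```c
/* Illustrative register-pressure kernel, not the actual testsuite code:
   enough live 32-bit temporaries that some may be spilled, and with
   direct gpr<->mask movement enabled they can land in k-registers. */
unsigned int
rotate_mix (unsigned int a, unsigned int b, unsigned int c, unsigned int d)
{
  unsigned int t0 = a ^ b, t1 = b ^ c, t2 = c ^ d, t3 = d ^ a;
  unsigned int u0 = (t0 << 3) | (t0 >> 29);
  unsigned int u1 = (t1 << 5) | (t1 >> 27);
  unsigned int u2 = (t2 << 7) | (t2 >> 25);
  unsigned int u3 = (t3 << 11) | (t3 >> 21);
  /* t0 and t3 stay live across all the rotates above. */
  return (u0 + u1) ^ (u2 + u3) ^ (t0 & t3);
}
```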
2020-08-21  x86: Add cost model for operation of mask registers.  (H.J. Lu; 3 files changed, -0/+185)
gcc/ PR target/71453 * config/i386/i386.h (struct processor_costs): Add member mask_to_integer, integer_to_mask, mask_load[3], mask_store[3], mask_move. * config/i386/x86-tune-costs.h (ix86_size_cost, i386_cost, i486_cost, pentium_cost, lakemont_cost, pentiumpro_cost, geode_cost, k6_cost, athlon_cost, k8_cost, amdfam10_cost, bdver_cost, znver1_cost, znver2_cost, skylake_cost, btver1_cost, btver2_cost, pentium4_cost, nocona_cost, atom_cost, slm_cost, intel_cost, generic_cost, core_cost): Initialize mask_load[3], mask_store[3], mask_move, integer_to_mask, mask_to_integer for all target costs. * config/i386/i386.c (ix86_register_move_cost): Use cost model of mask registers. (inline_memory_move_cost): Ditto. (ix86_register_move_cost): Ditto.
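The new processor_costs members can be pictured as the fragment below. The field names come from the ChangeLog above, but grouping them into a standalone struct and every numeric value are invented for illustration only; they merely obey the ordering the cost-tuning commit describes.

```c
/* Illustrative grouping of the new cost fields (names from the
   ChangeLog; the struct layout and values are made up, not GCC's or
   Skylake's actual numbers). */
struct mask_costs
{
  int mask_to_integer;  /* kmov from a k-reg to a GPR */
  int integer_to_mask;  /* kmov from a GPR to a k-reg */
  int mask_load[3];     /* load into a k-reg, by operand size */
  int mask_store[3];    /* store from a k-reg, by operand size */
  int mask_move;        /* k-reg to k-reg move */
};

/* Example values obeying the tuning described earlier: in-mask moves
   cheapest, mask<->GPR transfers dearer, load/store dearest.  */
static const struct mask_costs example_costs =
  { 6, 6, { 8, 8, 8 }, { 8, 8, 8 }, 3 };
```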
2020-08-20  analyzer: add regression tests [PR95152]  (David Malcolm; 2 files changed, -0/+17)
PR analyzer/95152 reports various ICEs in region_model::get_or_create_mem_ref. I removed this function as part of the state rewrite in r11-2694-g808f4dfeb3a95f50f15e71148e5c1067f90a126d. I've verified that these two test cases reproduce the issue with 10.2 and don't ICE with trunk; adding them as regression tests. gcc/testsuite/ChangeLog: PR analyzer/95152 * gcc.dg/analyzer/pr95152-4.c: New test. * gcc.dg/analyzer/pr95152-5.c: New test.