Age | Commit message (Collapse) | Author | Files | Lines |
|
Patch from Svante Signell.
PR go/103573
PR go/104290
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/386216
|
|
LRA tried to split non-allocated hard reg for reload pseudos again and
again until number of assignment passes reaches the limit. The patch fixes
this.
gcc/ChangeLog:
PR rtl-optimization/104447
* lra-constraints.cc (spill_hard_reg_in_range): Initiate ignore
hard reg set by lra_no_alloc_regs.
gcc/testsuite/ChangeLog:
PR rtl-optimization/104447
* gcc.target/i386/pr104447.c: New.
|
|
In finish_compound_literal, we perform non-dependent expr folding before
the call to check_narrowing ever since r9-5973. But ever since r10-7096,
check_narrowing also performs non-dependent expr folding of its own.
This double folding means tsubst will see non-templated trees during the
second folding, which causes a spurious error in the below testcase.
This patch removes the former folding operation; it seems obviated by
the latter one.
PR c++/104565
gcc/cp/ChangeLog:
* semantics.cc (finish_compound_literal): Don't perform
non-dependent expr folding before calling check_narrowing.
gcc/testsuite/ChangeLog:
* g++.dg/template/non-dependent22.C: New test.
|
|
the same type when convert is extension.
It's not equal to transform
(cond (cmp @1 @2) (convert@3 @4) (convert@5 @6))
to
(convert (cmp @1 @2) (convert)@4 @6)
when(convert@3 @4) is extension because it's zero_extend vs sign_extend.
gcc/ChangeLog:
PR tree-optimization/104551
PR tree-optimization/103771
* match.pd (cond_expr_convert_p): Add types_match check when
convert is extension.
* tree-vect-patterns.cc
(gimple_cond_expr_convert_p): Adjust comments.
(vect_recog_cond_expr_convert_pattern): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr104551.c: New test.
|
|
After the recent r12-7240 simplify_immed_subreg changes, we bail on more
simplify_subreg calls than before, e.g. apparently for decimal modes
in the NaN representations we almost never preserve anything except the
canonical {q,s}NaNs.
simplify_gen_subreg will punt in such cases because a SUBREG with VOIDmode
is not valid, but debug_lowpart_subreg wants to attempt even harder, even
if e.g. target indicates certain mode combinations aren't valid for the
backend, dwarf2out can still handle them. But a SUBREG from a VOIDmode
operand is just too much, the inner mode is lost there. We'd need some
new rtx that would be able to represent those cases.
For now, just punt in those cases.
2022-02-17 Jakub Jelinek <jakub@redhat.com>
PR debug/104557
* valtrack.cc (debug_lowpart_subreg): Don't call gen_rtx_raw_SUBREG
if expr has VOIDmode.
* gcc.dg/dfp/pr104557.c: New test.
|
|
The following patch uses the functions normal CPP_DEREF parsing uses,
i.e. convert_lvalue_to_rvalue and build_indirect_ref, instead of
blindly calling build_simple_mem_ref, so that if the variable does not
have correct type, we properly diagnose it instead of ICEing on it.
2022-02-17 Jakub Jelinek <jakub@redhat.com>
PR c/104532
* c-parser.cc (c_parser_omp_variable_list): For CPP_DEREF, use
convert_lvalue_to_rvalue and build_indirect_ref instead of
build_simple_mem_ref.
* gcc.dg/gomp/pr104532.c: New test.
|
|
gcc/ChangeLog:
* config/i386/cpuid.h (bit_MPX): Removed.
(bit_BNDREGS): Ditto.
(bit_BNDCSR): Ditto.
|
|
Define the sizes of the PowerPC specific types __float128 and __ibm128 if those
types are enabled.
This patch will define __SIZEOF_IBM128__ and __SIZEOF_FLOAT128__ if their
respective types are created in the compiler. Currently, this means both of
these will be defined if float128 support is enabled. But at some point in
the future, __ibm128 could be enabled without enabling float128 support and
__SIZEOF_IBM128__ would be defined.
2022-02-16 Michael Meissner <meissner@the-meissners.org>
gcc/
PR target/99708
* config/rs6000/rs6000-c.cc (rs6000_cpu_cpp_builtins): Define
__SIZEOF_IBM128__ if the IBM 128-bit long double type is created.
Define __SIZEOF_FLOAT128__ if the IEEE 128-bit floating point type
is created.
gcc/testsuite/
PR target/99708
* gcc.target/powerpc/pr99708.c: New test.
|
|
PR analyzer/104576 tracks that we issue a false positive from
-Wanalyzer-use-of-uninitialized-value for the reproducers of PR 63311
when optimization is disabled.
The root cause is that the analyzer was considering that a call to
__builtin_sinf could have side-effects.
This patch fixes things by generalizing the handling for "pure"
functions to also consider "const" functions.
gcc/analyzer/ChangeLog:
PR analyzer/104576
* region-model.cc: Include "calls.h".
(region_model::on_call_pre): Use flags_from_decl_or_type to
generalize check for DECL_PURE_P to also check for ECF_CONST.
gcc/testsuite/ChangeLog:
PR analyzer/104576
* gcc.dg/analyzer/torture/uninit-pr63311.c: New test.
* gcc.dg/analyzer/uninit-pr104576.c: New test.
* gfortran.dg/analyzer/uninit-pr63311.f90: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
|
|
PR analyzer/104560 reports various false positives from
-Wanalyzer-free-of-non-heap seen with rdma-core, on what's
effectively:
free (&ptr->field)
where in this case "field" is the first element of its struct, and thus
&ptr->field == ptr, and could be on the heap.
The root cause is due to malloc_state_machine::on_stmt making
"LHS = &EXPR;"
transition LHS from start to non_heap when EXPR is not a MEM_REF;
this assumption doesn't hold for the above case.
This patch eliminates that state transition, instead relying on
malloc_state_machine::get_default_state to detect regions known to
not be on the heap.
Doing so fixes the false positive, but eliminates some events relating
to free-of-alloca identifying the alloca, so the patch also reworks
free_of_non_heap to capture which region has been freed, adding
region creation events to diagnostic paths, so that the alloca calls
can be identified, and using the memory space of the region for more
precise wording of the diagnostic.
The improvement to malloc_state_machine::get_default_state also
means we now detect attempts to free VLAs, functions and code labels.
In doing so I spotted that I wasn't adding region creation events for
regions for global variables, and for cases where an allocation is the
last stmt within its basic block, so the patch also fixes these issues.
gcc/analyzer/ChangeLog:
PR analyzer/104560
* diagnostic-manager.cc (diagnostic_manager::build_emission_path):
Add region creation events for globals of interest.
(null_assignment_sm_context::get_old_program_state): New.
(diagnostic_manager::add_events_for_eedge): Move check for
changing dynamic extents from PK_BEFORE_STMT case to after the
switch on the dst_point's kind so that we can emit them for the
final stmt in a basic block.
* engine.cc (impl_sm_context::get_old_program_state): New.
* sm-malloc.cc (malloc_state_machine::get_default_state): Rewrite
detection of m_non_heap to use get_memory_space.
(free_of_non_heap::free_of_non_heap): Add freed_reg param.
(free_of_non_heap::subclass_equal_p): Update for changes to
fields.
(free_of_non_heap::emit): Drop m_kind in favor of
get_memory_space.
(free_of_non_heap::describe_state_change): Remove logic for
detecting alloca.
(free_of_non_heap::mark_interesting_stuff): Add region-creation of
m_freed_reg.
(free_of_non_heap::get_memory_space): New.
(free_of_non_heap::kind): Drop enum.
(free_of_non_heap::m_freed_reg): New field.
(free_of_non_heap::m_kind): Drop field.
(malloc_state_machine::on_stmt): Drop transition to m_non_heap.
(malloc_state_machine::handle_free_of_non_heap): New function,
split out from on_deallocator_call and on_realloc_call, adding
detection of the freed region.
(malloc_state_machine::on_deallocator_call): Use it.
(malloc_state_machine::on_realloc_call): Likewise.
* sm.h (sm_context::get_old_program_state): New vfunc.
gcc/testsuite/ChangeLog:
PR analyzer/104560
* g++.dg/analyzer/placement-new.C: Update expected wording.
* g++.dg/analyzer/pr100244.C: Likewise.
* gcc.dg/analyzer/attr-malloc-1.c (test_7): Likewise.
* gcc.dg/analyzer/malloc-1.c (test_24): Likewise.
(test_25): Likewise.
(test_26): Likewise.
(test_50a, test_50b, test_50c): New.
* gcc.dg/analyzer/malloc-callbacks.c (test_5): Update expected
wording.
* gcc.dg/analyzer/malloc-paths-8.c: Likewise.
* gcc.dg/analyzer/pr104560-1.c: New test.
* gcc.dg/analyzer/pr104560-2.c: New test.
* gcc.dg/analyzer/realloc-1.c (test_7): Updated expected wording.
* gcc.dg/analyzer/vla-1.c (test_2): New. Prune output from
-Wfree-nonheap-object.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
Add build tags and a few other changes so that libgo builds on Solaris.
Patch partially from Rainer Orth.
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/386215
|
|
* gimple-range-gori.cc (gori_compute::condexpr_adjust): Use
range_compatible_p instead of direct type comparison.
|
|
Here we're crashing from potential_constant_expression because it tries
to perform trial evaluation of the first operand '(bool)__r' of the
conjunction (which is overall wrapped in a NON_DEPENDENT_EXPR), but
cxx_eval_constant_expression ICEs on unsupported trees (of which CAST_EXPR
is one). The sequence of events is:
1. build_non_dependent_expr for the array subscript yields
NON_DEPENDENT_EXPR<<<(bool)__r && __s>>> ? 1 : 2
2. cp_build_array_ref calls fold_non_dependent_expr on this subscript
(after this point, processing_template_decl is cleared)
3. during which, the COND_EXPR case of tsubst_copy_and_build calls
fold_non_dependent_expr on the first operand
4. during which, we crash from p_c_e_1 because it attempts trial
evaluation of the CAST_EXPR '(bool)__r'.
Note that even if this crash didn't happen, fold_non_dependent_expr
from cp_build_array_ref would still ultimately be one big no-op here
since neither constexpr evaluation nor tsubst handle NON_DEPENDENT_EXPR.
In light of this and of the observation that we should never see
NON_DEPENDENT_EXPR in a context where a constant expression is needed
(it's used primarily in the build_x_* family of functions), it seems
futile for p_c_e_1 to ever return true for NON_DEPENDENT_EXPR. And the
otherwise inconsistent handling of NON_DEPENDENT_EXPR between p_c_e_1,
cxx_evaluate_constexpr_expression and tsubst apparently leads to weird
bugs such as this one.
PR c++/104507
gcc/cp/ChangeLog:
* constexpr.cc (potential_constant_expression_1)
<case NON_DEPENDENT_EXPR>: Return false instead of recursing.
Assert tf_error isn't set.
gcc/testsuite/ChangeLog:
* g++.dg/template/non-dependent21.C: New test.
|
|
This PR has been fixed with r12-7147-g2f9ab267e725ddf2.
2022-02-16 Jakub Jelinek <jakub@redhat.com>
PR target/104448
* gcc.target/i386/pr104448.c: New test.
|
|
On the following testcase on aarch64-linux, we behave differently
with -g and -g0.
The problem is that on:
(insn 10011 10010 10012 2 (set (reg:CC 66 cc)
(compare:CC (reg:DI 105)
(const_int 0 [0]))) "pr104544.c":18:3 407 {cmpdi}
(expr_list:REG_DEAD (reg:DI 105)
(nil)))
(insn 10012 10011 10013 2 (set (reg:SI 109)
(eq:SI (reg:CC 66 cc)
(const_int 0 [0]))) "pr104544.c":18:3 444 {aarch64_cstoresi}
(expr_list:REG_DEAD (reg:CC 66 cc)
(nil)))
(insn 10013 10012 10016 2 (set (reg:DI 110)
(zero_extend:DI (reg:SI 109))) "pr104544.c":18:3 111 {*zero_extendsidi2_aarch64}
(expr_list:REG_DEAD (reg:SI 109)
(nil)))
(insn 10016 10013 10017 2 (parallel [
(set (reg:CC 66 cc)
(compare:CC (const_int 0 [0])
(reg:DI 110)))
(set (reg:DI 111)
(neg:DI (reg:DI 110)))
]) "pr104544.c":18:3 281 {negdi_carryout}
(expr_list:REG_DEAD (reg:DI 110)
(nil)))
...
(debug_insn 6 5 7 2 (var_location:SI y (debug_expr:SI D#5)) "pr104544.c":18:3 -1
(nil))
(debug_insn 7 6 10033 2 (debug_marker) "pr104544.c":11:3 -1
(nil))
(insn 10033 7 10034 2 (set (reg:DI 117 [ _14 ])
(ior:DI (reg:DI 111)
(reg:DI 112))) "pr104544.c":11:6 496 {iordi3}
(expr_list:REG_DEAD (reg:DI 112)
(expr_list:REG_DEAD (reg:DI 111)
(nil))))
we successfully split 3 insns into two:
Trying 10011, 10013 -> 10016:
10011: cc:CC=cmp(r105:DI,0)
REG_DEAD r105:DI
10013: r110:DI=cc:CC==0
REG_DEAD cc:CC
10016: {cc:CC=cmp(0,r110:DI);r111:DI=-r110:DI;}
REG_DEAD r110:DI
Failed to match this instruction:
(parallel [
(set (reg:CC 66 cc)
(compare:CC (reg:DI 105)
(const_int 0 [0])))
(set (reg:DI 111)
(neg:DI (eq:DI (reg:DI 105)
(const_int 0 [0]))))
])
Failed to match this instruction:
(parallel [
(set (reg:CC 66 cc)
(compare:CC (reg:DI 105)
(const_int 0 [0])))
(set (reg:DI 111)
(neg:DI (eq:DI (reg:DI 105)
(const_int 0 [0]))))
])
Successfully matched this instruction:
(set (reg:DI 111)
(neg:DI (eq:DI (reg:DI 105)
(const_int 0 [0]))))
Successfully matched this instruction:
(set (reg:CC 66 cc)
(compare:CC (reg:DI 105)
(const_int 0 [0])))
Successfully matched this instruction:
(set (reg:DI 112)
(neg:DI (eq:DI (reg:CC 66 cc)
(const_int 0 [0]))))
allowing combination of insns 10011, 10013 and 10016
original costs 4 + 4 + 4 = 16
replacement costs 4 + 4 = 12
deferring deletion of insn with uid = 10011.
but the code that searches forward for insns to update their log
links (before the change there is a link from insn 10033 to insn 10016
for pseudo 111) only finds insn 10033 and updates the log link if
-g isn't enabled, otherwise it stops earlier because there are debug insns
in between. So, with -g LOG_LINKS of 10033 isn't updated, points eventually
to NOTE_INSN_DELETED and so we do not attempt to combine 10033 with other
insns, while with -g0 we do.
The following patch fixes that by instead ignoring debug insns during the
searching. We can still check BLOCK_FOR_INSN (insn) on those, because
if we notice DEBUG_INSN in a following basic block, necessarily there won't
be any further normal insns in the current block after it.
2022-02-16 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/104544
* combine.cc (try_combine): When looking for insn whose links
should be updated from i3 to i2, don't stop on debug insns, instead
skip over them.
* gcc.dg/pr104544.c: New test.
|
|
atomic-inst-cas.c has code to skip __atomic_compare_exchange_n
calls for invalid memory orderings, but -Winvalid-memory-model
applies before the dead code is removed (which is the right
behaviour IMO). This patch therefore suppresses the warning
for this test.
gcc/testsuite/
* gcc.target/aarch64/atomic-inst-cas.c: Add
-Wno-invalid-memory-model.
|
|
bic-bitmask-1.c is now passing, so remove the XFAIL.
gcc/testsuite/
* gcc.target/aarch64/bic-bitmask-1.c: Remove XFAIL.
|
|
pr100056.c contains things like:
int
or_shift_u3a (unsigned i)
{
i &= 7;
return i | (i << 11);
}
After g:96146e61cd7aee62c21c2845916ec42152918ab7, the preferred
gimple representation of this is a multiplication:
i_2 = i_1(D) & 7;
_5 = i_2 * 2049;
Expand then open-codes the multiplication back to individual shifts,
but (of course) it uses + rather than | to combine the shifts.
This means that we end up with the RTL equivalent of:
i + (i << 11)
I wondered about canonicalising the + to | (*back* to | in this case)
when the operands have no set bits in common and when one of the
operands is &, | or ^, but that didn't seem to be a popular idea when
I asked on IRC. The feeling seemed to be that + is inherently simpler
than |, so we shouldn't be “simplifying” the other way.
This patch therefore adjusts the PR100056 patterns to handle +
as well as |, in cases where the operands are provably disjoint.
For:
int
or_shift_u8 (unsigned char i)
{
return i | (i << 11);
}
the instructions:
2: r95:SI=zero_extend(x0:QI)
REG_DEAD x0:QI
7: r98:SI=r95:SI<<0xb
are combined into:
(parallel [
(set (reg:SI 98)
(and:SI (ashift:SI (reg:SI 0 x0 [ i ])
(const_int 11 [0xb]))
(const_int 522240 [0x7f800])))
(set (reg/v:SI 95 [ i ])
(zero_extend:SI (reg:QI 0 x0 [ i ])))
])
which fails to match, but which is then split into its individual
(independent) sets. Later the zero_extend is combined with the add
to get an ADD UXTB:
(set (reg:SI 99)
(plus:SI (zero_extend:SI (reg:QI 0 x0 [ i ]))
(reg:SI 98)))
This means that there is never a 3-insn combo to match the split
against. The end result is therefore:
ubfiz w1, w0, 11, 8
add w0, w1, w0, uxtb
This is a bit redundant, since it's doing the zero_extend twice.
It is at least 2 instructions though, rather than the 3 that we
had before the original patch for PR100056. or_shift_u8_asm is
affected similarly.
The net effect is that we do still have 2 UBFIZs, but we're at
least back down to 2 instructions per function, as for GCC 11.
I think that's good enough for now.
There are probably other instructions that should be extended
to support + as well as | (e.g. the EXTR ones), but those aren't
regressions and so are GCC 13 material.
gcc/
PR target/100056
* config/aarch64/iterators.md (LOGICAL_OR_PLUS): New iterator.
* config/aarch64/aarch64.md: Extend the PR100056 patterns
to handle plus in the same way as ior, if the operands have
no set bits in common.
gcc/testsuite/
PR target/100056
* gcc.target/aarch64/pr100056.c: XFAIL the original UBFIZ test
and instead expect two UBFIZs + two ADD UXTBs.
|
|
D front-end changes:
- Parsing and compiling C code is now possible using `import'.
- `throw' statements can now be used as an expression.
- Improvements to the D template emission strategy when compiling
with `-funittest'.
D Runtime changes:
- New core.int128 module for implementing intrinsics to support
128-bit integer types.
- C bindings for the kernel and C runtime have been better separated
to allow compiling for hybrid targets, such as kFreeBSD.
Phobos changes:
- The std.experimental.checkedint module has been renamed to
std.checkedint.
gcc/d/ChangeLog:
* d-builtins.cc (d_build_builtins_module): Set purity of DECL_PURE_P
functions to PURE::const_.
* d-gimplify.cc (bit_field_ref): New function.
(d_gimplify_modify_expr): Handle implicit casting for assignments to
bit-fields.
(d_gimplify_unary_expr): New function.
(d_gimplify_binary_expr): New function.
(d_gimplify_expr): Handle UNARY_CLASS_P and BINARY_CLASS_P.
* d-target.cc (Target::_init): Initialize bitFieldStyle.
(TargetCPP::parameterType): Update signature.
(Target::supportsLinkerDirective): New function.
* dmd/MERGE: Merge upstream dmd 52844d4b1.
* expr.cc (ExprVisitor::visit (ThrowExp *)): New function.
* types.cc (d_build_bitfield_integer_type): New function.
(insert_aggregate_bitfield): New function.
(layout_aggregate_members): Handle inserting bit-fields into an
aggregate type.
libphobos/ChangeLog:
* Makefile.in: Regenerate.
* libdruntime/MERGE: Merge upstream druntime dbd0c874.
* libdruntime/Makefile.am (DRUNTIME_CSOURCES): Add core/int128.d.
(DRUNTIME_DISOURCES): Add __builtins.di.
* libdruntime/Makefile.in: Regenerate.
* src/MERGE: Merge upstream phobos 896b1d0e1.
* src/Makefile.am (PHOBOS_DSOURCES): Add std/checkedint.d.
* src/Makefile.in: Regenerate.
* testsuite/testsuite_flags.in: Add -fall-instantiations to
--gdcflags.
|
|
build_binary_op [PR104531]
The MIN_EXPR/MAX_EXPR handling in *build_binary_op is minimal (especially
for C FE), because min/max aren't expressions the languages contain directly.
I'm using those for the
#pragma omp atomic
x = x < y ? y : x;
forms, but e.g. for the attached testcase we normally reject _Complex int vs. int
comparisons, in C++ due to MIN/MAX_EXPR we were diagnosing it as invalid types
for <unknown> while in C we accept it and ICEd later on.
The following patch will try build_binary_op with LT_EXPR on the operands first
to get needed diagnostics and fail if it returns error_mark_node.
2022-02-16 Jakub Jelinek <jakub@redhat.com>
PR c/104531
* c-omp.cc (c_finish_omp_atomic): For MIN_EXPR/MAX_EXPR, try first
build_binary_op with LT_EXPR and only if that doesn't return
error_mark_node call build_modify_expr.
* c-c++-common/gomp/atomic-31.c: New test.
|
|
comparison [PR104510]
The comment in shorten_compare says:
/* If either arg is decimal float and the other is float, fail. */
but the callers of shorten_compare don't expect anything like failure
as a possibility from the function, callers require that the function
promotes the operands to the same type, whether the original selected
*restype_ptr one or some shortened.
So, if we choose not to shorten, we should still promote to the original
*restype_ptr.
2022-02-16 Jakub Jelinek <jakub@redhat.com>
PR c/104510
* c-common.cc (shorten_compare): Convert original arguments to
the original *restype_ptr when mixing binary and decimal float.
* gcc.dg/dfp/pr104510.c: New test.
|
|
|
|
The HTM tbegin. instruction can fail intermittently due to many reasons.
This can lead to htm-1.c FAILing from time to time. The solution is to
allow retrying the instruction a few times before aborting.
2022-02-15 Peter Bergner <bergner@linux.ibm.com>
gcc/testsuite/
* gcc.target/powerpc/htm-1.c: Retry intermittent failing tbegins.
|
|
Provide an API into gori to perform a basic evaluation of the arguments of a
COND_EXPR if they are in the dependency chain of the condition.
PR tree-optimization/104526
gcc/
* gimple-range-fold.cc (fold_using_range::range_of_cond_expr): Call
new routine.
* gimple-range-gori.cc (range_def_chain::get_def_chain): Force a build
of dependency chain if there isn't one.
(gori_compute::condexpr_adjust): New.
* gimple-range-gori.h (class gori_compute): New prototype.
gcc/testsuite/
* gcc.dg/pr104526.c: New.
|
|
gcc/analyzer/ChangeLog:
PR analyzer/104524
* region-model-manager.cc
(region_model_manager::maybe_fold_sub_svalue): Only call
get_or_create_cast if type is non-NULL.
gcc/testsuite/ChangeLog:
PR analyzer/104524
* gcc.dg/analyzer/pr104524.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
There is false positive from -Wanalyzer-use-of-uninitialized-value on
gcc.dg/analyzer/pr102692.c here:
‘fix_overlays_before’: events 1-3
|
| 75 | while (tail
| | ~~~~
| 76 | && (tem = make_lisp_ptr (tail, 5),
| | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| | |
| | (1) following ‘false’ branch (when ‘tail’ is NULL)...
| 77 | (end = marker_position (XOVERLAY (tem)->end)) >= pos))
| | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|......
| 82 | if (!tail || end < prev || !tail->next)
| | ~~~~~ ~~~~~~~~~~
| | | |
| | | (3) use of uninitialized value ‘end’ here
| | (2) ...to here
|
The issue is that inner || of the conditionals have been folded within the
frontend from a chain of control flow:
5 │ if (tail == 0B) goto <D.1986>; else goto <D.1988>;
6 │ <D.1988>:
7 │ if (end < prev) goto <D.1986>; else goto <D.1989>;
8 │ <D.1989>:
9 │ _1 = tail->next;
10 │ if (_1 == 0B) goto <D.1986>; else goto <D.1987>;
11 │ <D.1986>:
to an OR expr (and then to a bitwise-or by the gimplifier):
5 │ _1 = tail == 0B;
6 │ _2 = end < prev;
7 │ _3 = _1 | _2;
8 │ if (_3 != 0) goto <D.1986>; else goto <D.1988>;
9 │ <D.1988>:
10 │ _4 = tail->next;
11 │ if (_4 == 0B) goto <D.1986>; else goto <D.1987>;
This happens for sufficiently simple conditionals in fold_truth_andor.
In particular, the (end < prev) is short-circuited without optimization,
but is evaluated with optimization, leading to the false positive.
Given how early this folding occurs, it seems the simplest fix is to
try to detect places where this optimization appears to have happened,
and suppress uninit warnings within the statement that would have
been short-circuited.
gcc/analyzer/ChangeLog:
PR analyzer/102692
* exploded-graph.h (impl_region_model_context::get_stmt): New.
* region-model.cc: Include "gimple-ssa.h", "tree-phinodes.h",
"tree-ssa-operands.h", and "ssa-iterators.h".
(within_short_circuited_stmt_p): New.
(region_model::check_for_poison): Don't warn about uninit values
if within_short_circuited_stmt_p.
* region-model.h (region_model_context::get_stmt): New vfunc.
(noop_region_model_context::get_stmt): New.
gcc/testsuite/ChangeLog:
PR analyzer/102692
* gcc.dg/analyzer/pr102692-2.c: New test.
* gcc.dg/analyzer/pr102692.c: Remove xfail. Remove -O2 from
options and move to...
* gcc.dg/analyzer/torture/pr102692.c: ...here.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
gcc/fortran/ChangeLog:
* trans-openmp.cc (gfc_trans_omp_depobj): Fix to alloc/ptr dummy
and for c_ptr.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/depend-4.f90: Add VALUE test, update scan test.
* gfortran.dg/gomp/depend-5.f90: Fix scan tree for -m32.
* gfortran.dg/gomp/depend-6.f90: New test.
|
|
subs_compare_2.c tests that we can use a SUBS+CSEL sequence for:
unsigned int
foo (unsigned int a, unsigned int b)
{
unsigned int x = a - 4;
if (a < 4)
return x;
else
return 0;
}
As Andrew notes in the PR, this is effectively MIN (x, 4) - 4,
and it is now recognised as such by phiopt. Previously it was
if-converted in RTL instead.
I tried to look for ways to generalise this to other situations
and to other ?:-style operations, not just max and min. However,
for general ?: we tend to push an outer “- CST” into the arms of
the ?: -- at least if one of them simplifies -- so I didn't find
any useful abstraction.
This patch therefore adds a pattern specifically for
max/min(a,cst)-cst. I'm not thrilled at having to do this,
but it seems like the least worst fix in the circumstances.
Also, max(a,cst)-cst for unsigned a is a useful saturating
subtraction idiom and so is arguably worth its own code
for that reason.
gcc/
PR target/100874
* config/aarch64/aarch64-protos.h (aarch64_maxmin_plus_const):
Declare.
* config/aarch64/aarch64.cc (aarch64_maxmin_plus_const): New function.
* config/aarch64/aarch64.md (*aarch64_minmax_plus): New pattern.
gcc/testsuite/
* gcc.target/aarch64/max_plus_1.c: New test.
* gcc.target/aarch64/max_plus_2.c: Likewise.
* gcc.target/aarch64/max_plus_3.c: Likewise.
* gcc.target/aarch64/max_plus_4.c: Likewise.
* gcc.target/aarch64/max_plus_5.c: Likewise.
* gcc.target/aarch64/max_plus_6.c: Likewise.
* gcc.target/aarch64/max_plus_7.c: Likewise.
* gcc.target/aarch64/min_plus_1.c: Likewise.
* gcc.target/aarch64/min_plus_2.c: Likewise.
* gcc.target/aarch64/min_plus_3.c: Likewise.
* gcc.target/aarch64/min_plus_4.c: Likewise.
* gcc.target/aarch64/min_plus_5.c: Likewise.
* gcc.target/aarch64/min_plus_6.c: Likewise.
* gcc.target/aarch64/min_plus_7.c: Likewise.
|
|
store_v2vec_lanes.c started failing after SLP was enabled at -O2.
The test is specifically checking what happens for unvectorised code,
with the vectors being constructed from individal addition results.
gcc/testsuite/
* gcc.target/aarch64/store_v2vec_lanes.c: Add -fno-tree-vectorize.
|
|
This patch adds +nosve to various Advanced SIMD-only tests.
gcc/testsuite/
* gcc.target/aarch64/shl-combine-2.c: New test.
* gcc.target/aarch64/shl-combine-3.c: Likewise.
* gcc.target/aarch64/shl-combine-4.c: Likewise.
* gcc.target/aarch64/shl-combine-5.c: Likewise.
* gcc.target/aarch64/xtn-combine-1.c: Likewise.
* gcc.target/aarch64/xtn-combine-2.c: Likewise.
* gcc.target/aarch64/xtn-combine-3.c: Likewise.
* gcc.target/aarch64/xtn-combine-4.c: Likewise.
* gcc.target/aarch64/xtn-combine-5.c: Likewise.
* gcc.target/aarch64/xtn-combine-6.c: Likewise.
|
|
ldp_stp_1.c, ldp_stp_4.c and ldp_stp_5.c have been failing since
vectorisation was enabled at -O2. In all three cases SLP is
generating vector code when scalar code would be better.
The problem is that the target costs do not model whether STP could
be used for the scalar or vector code, so the normal latency-based
costs for store-heavy code can be way off. It would be good to fix
that “properly” at some point, but it isn't easy; see the existing
discussion in aarch64_sve_adjust_stmt_cost for more details.
This patch therefore adds an on-the-side check for whether the
code is doing nothing more than set-up+stores. It then applies
STP-based costs to those cases only, in addition to the normal
latency-based costs. (That is, the vector code has to win on
both counts rather than on one count individually.)
However, at the moment, SLP costs one vector set-up instruction
for every vector in an SLP node, even if the contents are the
same as a previous vector in the same node. Fixing the STP costs
without fixing that would regress other cases, tested in the patch.
The patch therefore makes the SLP costing code check for duplicates
within a node. Ideally we'd check for duplicates more globally,
but that would require a more global approach to costs: the cost
of an initialisation should be amoritised across all trees that
use the initialisation, rather than fully counted against one
arbitrarily-chosen subtree.
Back on aarch64: an earlier version of the patch tried to apply
the new heuristic to constant stores. However, that didn't work
too well in practice; see the comments for details. The patch
therefore just tests the status quo for constant cases, leaving out
a match if the current choice is dubious.
ldp_stp_5.c was affected by the same thing. The test would be
worth vectorising if we generated better vector code, but:
(1) We do a bad job of moving the { -1, 1 } constant, given that
we have { -1, -1 } and { 1, 1 } to hand.
(2) The vector code has 6 pairable stores to misaligned offsets.
We have peephole patterns to handle such misalignment for
4 pairable stores, but not 6.
So the SLP decision isn't wrong as such. It's just being let
down by later codegen.
The patch therefore adds -mstrict-align to preserve the original
intention of the test while adding ldp_stp_19.c to check for the
preferred vector code (XFAILed for now).
gcc/
* tree-vectorizer.h (vect_scalar_ops_slice): New struct.
(vect_scalar_ops_slice_hash): Likewise.
(vect_scalar_ops_slice::op): New function.
* tree-vect-slp.cc (vect_scalar_ops_slice::all_same_p): New function.
(vect_scalar_ops_slice_hash::hash): Likewise.
(vect_scalar_ops_slice_hash::equal): Likewise.
(vect_prologue_cost_for_slp): Check for duplicate vectors.
* config/aarch64/aarch64.cc
(aarch64_vector_costs::m_stp_sequence_cost): New member variable.
(aarch64_aligned_constant_offset_p): New function.
(aarch64_stp_sequence_cost): Likewise.
(aarch64_vector_costs::add_stmt_cost): Handle new STP heuristic.
(aarch64_vector_costs::finish_cost): Likewise.
gcc/testsuite/
* gcc.target/aarch64/ldp_stp_5.c: Require -mstrict-align.
* gcc.target/aarch64/ldp_stp_14.h,
* gcc.target/aarch64/ldp_stp_14.c: New test.
* gcc.target/aarch64/ldp_stp_15.c: Likewise.
* gcc.target/aarch64/ldp_stp_16.c: Likewise.
* gcc.target/aarch64/ldp_stp_17.c: Likewise.
* gcc.target/aarch64/ldp_stp_18.c: Likewise.
* gcc.target/aarch64/ldp_stp_19.c: Likewise.
|
|
When updating the target costs interface, I failed to move the
free of the scalar costs beyond the new last use.
gcc/
* tree-vect-slp.cc (vect_bb_vectorization_profitable_p): Fix
use after free.
|
|
We have to make sure that outer loop exits come after the inner
loop since we otherwise will put it into the fused loop body.
2022-02-15 Richard Biener <rguenther@suse.de>
PR tree-optimization/104543
* gimple-loop-jam.cc (unroll_jam_possible_p): Check outer loop exits
come after the inner loop.
* gcc.dg/torture/pr104543.c: New testcase.
|
|
gcc/fortran/ChangeLog:
* trans-openmp.cc (gfc_trans_omp_clauses, gfc_trans_omp_depobj):
Depend on the proper addr, for ptr/alloc depend on pointee.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/depend-4.f90: New test.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/depend-4.f90: New test.
* gfortran.dg/gomp/depend-5.f90: New test.
|
|
As the testcase reports, cygwin has 3 can%'t contractions in diagnostics,
we use cannot everywhere else instead and -Wformat-diag enforces that.
2022-02-15 Jakub Jelinek <jakub@redhat.com>
PR target/104536
* config/i386/host-cygwin.cc (cygwin_gt_pch_get_address): Use
cannot instead of can%'t in diagnostics. Formatting fixes.
|
|
[PR104522]
For IBM double double I've added in PR95450 and PR99648 verification that
when we at the tree/GIMPLE or RTL level interpret target bytes as a REAL_CST
or CONST_DOUBLE constant, we try to encode it back to target bytes and
verify it is the same.
This is because our real.c support isn't able to represent all valid values
of IBM double double which has variable precision.
In PR104522, it has been noted that we have similar problem with the
Intel/Motorola extended XFmode formats, our internal representation isn't
able to record pseudo denormals, pseudo infinities, pseudo NaNs and unnormal
values.
So, the following patch is an attempt to extend that verification to all
floats.
Unfortunately, it wasn't that straightforward, because the
__builtin_clear_padding code exactly for the XFmode long doubles needs to
discover what bits are padding and does that by interpreting memory of
all 1s. That is actually a valid supported value, a qNaN with negative
sign with all mantissa bits set, but the verification includes also the
padding bits (exactly what __builtin_clear_padding wants to figure out)
and so fails the comparison check and so we ICE.
The patch fixes that case by moving that verification from
native_interpret_real to its caller, so that clear_padding_type can
call native_interpret_real and avoid that extra check.
With this, the only thing that regresses in the testsuite is
+FAIL: gcc.target/i386/auto-init-4.c scan-assembler-times long\\t-16843010 5
because it decides to use a pattern that has non-zero bits in the padding
bits of the long double, so the simplify-rtx.cc change prevents folding
a SUBREG into a constant. We emit (the testcase is -O0 but we emit worse
code at all opt levels) something like:
movabsq $-72340172838076674, %rax
movabsq $-72340172838076674, %rdx
movq %rax, -48(%rbp)
movq %rdx, -40(%rbp)
fldt -48(%rbp)
fstpt -32(%rbp)
instead of
fldt .LC2(%rip)
fstpt -32(%rbp)
...
.LC2:
.long -16843010
.long -16843010
.long 65278
.long 0
Note, neither of those sequences actually stores the padding bits, fstpt
simply doesn't touch them.
For vars with clear_padding_real_needs_padding_p types that are allocated
to memory at expansion time, I'd say much better would be to do the stores
using integral modes rather than XFmode, so do that:
movabsq $-72340172838076674, %rax
movq %rax, -32(%rbp)
movq %rax, -24(%rbp)
directly. That is the only way to ensure the padding bits are initialized
(or expand __builtin_clear_padding, but then you initialize separately the
value bits and padding bits).
2022-02-15 Jakub Jelinek <jakub@redhat.com>
PR middle-end/104522
* fold-const.h (native_interpret_real): Declare.
* fold-const.cc (native_interpret_real): No longer static. Don't
perform MODE_COMPOSITE_P verification here.
(native_interpret_expr) <case REAL_TYPE>: But perform it here instead
for all modes.
* gimple-fold.cc (clear_padding_type): Call native_interpret_real
instead of native_interpret_expr.
* simplify-rtx.cc (simplify_immed_subreg): Perform the native_encode_rtx
and comparison verification for all FLOAT_MODE_P modes, not just
MODE_COMPOSITE_P.
* gcc.dg/pr104522.c: New test.
|
|
The following adjusts the PR100499 niter fix to use the appropriate
types when checking whether the difference between the final and base
values of the IV are a multiple of the step. It also gets rid of
an always false condition in multiple_of_p which lead me to a
wrong solution first.
2022-02-15 Richard Biener <rguenther@suse.de>
PR tree-optimization/104519
* fold-const.cc (multiple_of_p): Remove never true condition.
* tree-ssa-loop-niter.cc (number_of_iterations_ne): Use
the appropriate types for determining whether the difference
of final and base is a multiple of the step.
* gcc.dg/torture/pr104519.c: New testcase.
|
|
The following testcase fails -fcompare-debug, because finalize_task_copyfn
was invoked from splay tree destruction, whose order can in some cases
depend on -g/-g0. The fix is to queue the task stmts that need copyfn
in a vector and run finalize_task_copyfn on elements of that vector.
2022-02-15 Jakub Jelinek <jakub@redhat.com>
PR debug/104517
* omp-low.cc (task_cpyfns): New variable.
(delete_omp_context): Don't call finalize_task_copyfn from here.
(create_task_copyfn): Push task_stmt into task_cpyfns.
(execute_lower_omp): Call finalize_task_copyfn here on entries from
task_cpyfns vector and release the vector.
* gcc.dg/gomp/pr104517.c: New test.
|
|
In the first testcase, coerce_template_template_parms was adding too much of
outer_args when coercing to match P's template parameters, so that when
substituting into the 'const T&' parameter we got an unrelated template
argument for T. We should only add outer_args when the argument template is
a nested template.
PR c++/104107
PR c++/95036
gcc/cp/ChangeLog:
* pt.cc (coerce_template_template_parms): Take full parms.
Avoid adding too much of outer_args.
(coerce_template_template_parm): Adjust.
(template_template_parm_bindings_ok_p): Adjust.
(convert_template_argument): Adjust.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/alias-decl-ttp2.C: New test.
* g++.dg/cpp1z/ttp2.C: New test.
|
|
|
|
Resolves:
PR middle-end/104355 - Misleading and outdated -Warray-bounds documentation
gcc/ChangeLog:
PR middle-end/104355
* doc/invoke.texi (-Warray-bounds): Update documentation.
|
|
If you are on a PowerPC system where the default long double is IEEE
128-bit (either through the compiler option -mabi=ieeelongdouble or via
the configure option --with-long-double-format=ieee), GCC used the wrong
names for some of the conversion functions for the __ibm128 type.
Internally, GCC uses IFmode for __ibm128 if long double is IEEE 128-bit,
instead of TFmode when long double is IBM 128-bit. This patch adds the
missing conversions to prevent the 'if' name from being used.
In particular, before the patch, the conversions used were:
IFmode to DImode signed: __fixifdi instead of __fixtfdi
IFmode to DImode unsigned __fixunsifti instead of __fixunstfti
DImode to IFmode signed: __floatdiif instead of __floatditf
DImode to IFmode unsigned: __floatundiif instead of __floatunditf
2022-02-14 Michael Meissner <meissner@the-meissners.org>
gcc/
PR target/104253
* config/rs6000/rs6000.cc (init_float128_ibm): Update the
conversion functions used to convert IFmode types.
gcc/testsuite/
PR target/104253
* gcc.target/powerpc/pr104253.c: New test.
|
|
gcc/fortran/ChangeLog:
PR fortran/104211
* expr.cc (find_array_section): Replace assertion by error
recovery when encountering bad array constructor.
gcc/testsuite/ChangeLog:
PR fortran/104211
* gfortran.dg/pr104211.f90: New test.
|
|
potential_constant_expression_1 [PR104513]
return in ctors on targetm.cxx.cdtor_returns_this () target like arm
is emitted as GOTO_EXPR cdtor_label where at cdtor_label it emits
RETURN_EXPR with the this.
Similarly, in all dtors regardless of targetm.cxx.cdtor_returns_this ()
a return is emitted similarly.
potential_constant_expression_1 was rejecting these gotos and so we
incorrectly rejected these testcases, but actual cxx_eval* is apparently
handling these just fine. I was a little bit worried that for the
destruction of bases we wouldn't evaluate something we should, but as the
testcase shows, that is evaluated through try ... finally and there is
nothing after the cdtor_label. For arm there is RETURN_EXPR this; but we
don't really care about the return value from ctors and dtors during the
constexpr evaluation.
I must say I don't see much the point of cdtor_labels at all, I'd think
that with try ... finally around it for non-arm we could just RETURN_EXPR
instead of the GOTO_EXPR and the try/finally gimplification would DTRT,
and we could just add the right return value for the arm case.
2022-02-14 Jakub Jelinek <jakub@redhat.com>
PR c++/104513
* constexpr.cc (potential_constant_expression_1) <case GOTO_EXPR>:
Don't punt if returns (target).
* g++.dg/cpp1y/constexpr-104513.C: New test.
* g++.dg/cpp2a/constexpr-dtor12.C: New test.
|
|
Obviously it would be better if these reductions could be evaluated at compile
time, but this will avoid an ICE.
gcc/ChangeLog:
* config/gcn/gcn.cc (gcn_expand_reduc_scalar): Use force_reg.
|
|
When DSE removes a trivially dead def we have to reset niter information
on loops since that might refer to it. The patch also adds verification
to make sure this does not happen.
2022-02-14 Richard Biener <rguenther@suse.de>
PR tree-optimization/104528
* tree-ssa.h (find_released_ssa_name): Declare.
* tree-ssa.cc (find_released_ssa_name): Export.
* cfgloop.cc (verify_loop_structure): Look for released
SSA names in loops nb_iterations.
* tree-ssa-dse.cc (pass_dse::execute): Release number of iteration
estimates.
* gfortran.dg/pr104528.f: New testcase.
|
|
This avoids forwprop from matching DFP <-> FP vector conversions
using VEC_[UN]PACK{_TRUNC,_LO,_HI}. Maybe DFP vectors shouldn't be
a thing, but they appearantly are. Re-using CONVERT/NOP_EXPR for
DFP <-> FP conversions was probably a mistake.
2022-02-14 Richard Biener <rguenther@suse.de>
PR tree-optimization/104511
* tree-ssa-forwprop.cc (simplify_vector_constructor): Avoid
touching DFP <-> FP conversions.
* gcc.dg/pr104511.c: New testcase.
|
|
The following handles internal function calls similar to how the
C++ frontend does, avoiding ICEing on those.
2022-02-14 Richard Biener <rguenther@suse.de>
PR c/104505
gcc/c-family/
* c-pretty-print.cc (c_pretty_printer::postfix_expression): Handle
internal function calls.
gcc/testsuite/
* c-c++-common/pr104505.c: New testcase.
|
|
The following attempts to address gimplification of
... = VIEW_CONVERT_EXPR<int[4]>((i & 1) != 0 ? inv : src)[i];
which is problematic since gimplifying the base object
? inv : src produces a register temporary but GIMPLE does not
really support a register as a base for an ARRAY_REF (even
though that's not strictly validated it seems as can be seen
at -O0). Interestingly the C++ frontend avoids this issue
by emitting the following GENERIC instead:
... = (i & 1) != 0 ? VIEW_CONVERT_EXPR<int[4]>(inv)[i] : VIEW_CONVERT_EXPR<int[4]>(src)[i];
The proposed patch below fixes things up when using an rvalue
as the base is OK by emitting a copy from a register base to a
non-register one. The ?: as lvalue extension seems to be gone
for C, C++ again unwraps the COND_EXPR in that case.
2022-02-11 Richard Biener <rguenther@suse.de>
PR middle-end/104497
* gimplify.cc (gimplify_compound_lval): Make sure the
base is a non-register if needed and possible.
* c-c++-common/torture/pr104497.c: New testcase.
|