Age | Commit message (Collapse) | Author | Files | Lines |
|
Shreya and I were working through some testsuite failures and noticed that many
of the current failures on the pioneer were just silly. We have tests that
expect to see full architecture strings in their expected output when the bulk
(some might say all) of the architecture string is irrelevant.
Worse yet, we'd have different matching lines. ie we'd have one that would
machine rv64gc_blah_blah and another for rv64imfa_blah_blah. Judicious
wildcard usage cleans this up considerably.
This fixes ~80 failures in the riscv.exp testsuite. Pushing to the trunk as
it's happy on the pioneer native, riscv32-elf and riscv64-elf.
gcc/testsuite/
* gcc.target/riscv/arch-25.c: Use wildcards to simplify/eliminate
dg-error directives.
* gcc.target/riscv/arch-ss-2.c: Similarly.
* gcc.target/riscv/arch-zilsd-2.c: Similarly.
* gcc.target/riscv/arch-zilsd-3.c: Similarly.
|
|
The test fails to compile on 32-bit targets because the arrays are too
large. Restrict to targets where the array index type is 64-bits.
Also note the relevant PR in the test comment.
PR debug/121411
gcc/testsuite/
* gcc.dg/debug/ctf/ctf-array-7.c: Restrict to lp64,llp64
targets.
|
|
Disable sched2 and sched3 to only have one order of instructions to
consider.
gcc/testsuite/ChangeLog:
* gcc.target/arm/unsigned-extend-2.c: Disable sched2 and sched3
and update function body to match.
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
|
|
FMA/DOT_PROD_EXPR/SAD_EXPR
The patch is trying to unroll the vectorized loop when there're
FMA/DOT_PRDO_EXPR/SAD_EXPR reductions, it will break cross-iteration dependence
and enable more parallelism(since vectorize will also enable partial
sum).
When there's gather/scatter or scalarization in the loop, don't do the
unroll since the performance bottleneck is not at the reduction.
The unroll factor is set according to FMA/DOT_PROX_EXPR/SAD_EXPR
CEIL ((latency * throught), num_of_reduction)
.i.e
For fma, latency is 4, throught is 2, if there's 1 FMA for reduction
then unroll factor is 2 * 4 / 1 = 8.
There's also a vect_unroll_limit, the final suggested_unroll_factor is
set as MIN (vect_unroll_limix, 8).
The vect_unroll_limit is mainly for register pressure, avoid to many
spills.
Ideally, all instructions in the vectorized loop should be used to
determine unroll_factor with their (latency * throughput) / number,
but that would too much for this patch, and may just GIGO, so the
patch only considers 3 kinds of instructions: FMA, DOT_PROD_EXPR,
SAD_EXPR.
Note when DOT_PROD_EXPR is not native support,
m_num_reduction += 3 * count which almost prevents unroll.
There's performance boost for simple benchmark with DOT_PRDO_EXPR/FMA
chain, slight improvement in SPEC2017 performance.
gcc/ChangeLog:
* config/i386/i386.cc (ix86_vector_costs::ix86_vector_costs):
Addd new memeber m_num_reduc, m_prefer_unroll.
(ix86_vector_costs::add_stmt_cost): Set m_prefer_unroll and
m_num_reduc
(ix86_vector_costs::finish_cost): Determine
m_suggested_unroll_vector with consideration of
reduc_lat_mult_thr, m_num_reduction and
ix86_vect_unroll_limit.
* config/i386/i386.h (enum ix86_reduc_unroll_factor): New
enum.
(processor_costs): Add reduc_lat_mult_thr and
vect_unroll_limit.
* config/i386/x86-tune-costs.h: Initialize
reduc_lat_mult_thr and vect_unroll_limit.
* config/i386/i386.opt: Add -param=ix86-vect-unroll-limit.
gcc/testsuite/ChangeLog:
* gcc.target/i386/vect_unroll-1.c: New test.
* gcc.target/i386/vect_unroll-2.c: New test.
* gcc.target/i386/vect_unroll-3.c: New test.
* gcc.target/i386/vect_unroll-4.c: New test.
* gcc.target/i386/vect_unroll-5.c: New test.
|
|
This pattern enables the combine pass (or late-combine, depending on the case)
to merge a vec_duplicate into a div RTL instruction. The vec_duplicate is the
dividend operand.
Before this patch, we have two instructions, e.g.:
vfmv.v.f v2,fa0
vfdiv.vv v1,v2,v1
After, we get only one:
vfrdiv.vf v1,v1,fa0
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*vfrdiv_vf_<mode>): Add new pattern to
combine vec_duplicate + vfdiv.vv into vfrdiv.vf.
* config/riscv/vector.md (@pred_<optab><mode>_reverse_scalar): Allow VLS
modes.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfrdiv.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_binop.h: Add support for reverse
variants.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_binop_data.h: Add data for
reverse variants.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfrdiv-run-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfrdiv-run-1-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfrdiv-run-1-f64.c: New test.
|
|
invariant [PR121290]
Consider the example:
void
f (int *restrict x, int *restrict y, int *restrict z, int n)
{
for (int i = 0; i < 4; ++i)
{
int res = 0;
for (int j = 0; j < 100; ++j)
res += y[j] * z[i];
x[i] = res;
}
}
we currently vectorize as
f:
movi v30.4s, 0
ldr q31, [x2]
add x2, x1, 400
.L2:
ld1r {v29.4s}, [x1], 4
mla v30.4s, v29.4s, v31.4s
cmp x2, x1
bne .L2
str q30, [x0]
ret
which is not useful because by doing outer-loop vectorization we're performing
less work per iteration than we would had we done inner-loop vectorization and
simply unrolled the inner loop.
This patch teaches the cost model that if all your leafs are invariant, then
adjust the loop cost by * VF, since every vector iteration has at least one lane
really just doing 1 scalar.
There are a couple of ways we could have solved this, one is to increase the
unroll factor to process more iterations of the inner loop. This removes the
need for the broadcast, however we don't support unrolling the inner loop within
the outer loop. We only support unrolling by increasing the VF, which would
affect the outer loop as well as the inner loop.
We also don't directly support costing inner-loop vs outer-loop vectorization,
and as such we're left trying to predict/steer the cost model ahead of time to
what we think should be profitable. This patch attempts to do so using a
heuristic which penalizes the outer-loop vectorization.
We now cost the loop as
note: Cost model analysis:
Vector inside of loop cost: 2000
Vector prologue cost: 4
Vector epilogue cost: 0
Scalar iteration cost: 300
Scalar outside cost: 0
Vector outside cost: 4
prologue iterations: 0
epilogue iterations: 0
missed: cost model: the vector iteration cost = 2000 divided by the scalar iteration cost = 300 is greater or equal to the vectorization factor = 4.
missed: not vectorized: vectorization not profitable.
missed: not vectorized: vector version will never be profitable.
missed: Loop costings may not be worthwhile.
And subsequently generate:
.L5:
add w4, w4, w7
ld1w z24.s, p6/z, [x0, #1, mul vl]
ld1w z23.s, p6/z, [x0, #2, mul vl]
ld1w z22.s, p6/z, [x0, #3, mul vl]
ld1w z29.s, p6/z, [x0]
mla z26.s, p6/m, z24.s, z30.s
add x0, x0, x8
mla z27.s, p6/m, z23.s, z30.s
mla z28.s, p6/m, z22.s, z30.s
mla z25.s, p6/m, z29.s, z30.s
cmp w4, w6
bls .L5
and avoids the load and replicate if it knows it has enough vector pipes to do
so.
gcc/ChangeLog:
PR target/121290
* config/aarch64/aarch64.cc
(class aarch64_vector_costs ): Add m_loop_fully_scalar_dup.
(aarch64_vector_costs::add_stmt_cost): Detect invariant inner loops.
(adjust_body_cost): Adjust final costing if m_loop_fully_scalar_dup.
gcc/testsuite/ChangeLog:
PR target/121290
* gcc.target/aarch64/pr121290.c: New test.
|
|
multiply
This pattern enables the combine pass (or late-combine, depending on the case)
to merge a vec_duplicate into a mult RTL instruction.
Before this patch, we have two instructions, e.g.:
vfmv.v.f v2,fa0
vfmul.vv v1,v1,v2
After, we get only one:
vfmul.vf v2,v2,fa0
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*vfmul_vf_<mode>): Add new pattern to
combine vec_duplicate + vfmul.vv into vfmul.vf.
* config/riscv/vector.md (@pred_<optab><mode>_scalar): Allow VLS modes.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfmul.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_binop.h: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_binop_data.h: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_binop_run.h: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmul-run-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmul-run-1-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmul-run-1-f64.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-mul-2.c: Adjust scan
dump.
* gcc.target/riscv/rvv/autovec/vls/floating-point-mul-3.c: Likewise.
|
|
The compiler is getting too smart! But this test is really intended
to test that we generate BICS instead of BIC+CMP, so make the test use
something that we can't subsequently fold away into a bit minipulation
of a store-flag value.
I've also added a couple of extra tests, so we now cover both the
cases where we fold the result away and where that cannot be done.
Also add a test that we don't generate a compare against 0, since
that's really part of what this test is covering.
gcc/testsuite:
* gcc.target/arm/bics_3.c: Add some additional tests that
cannot be folded to a bit manipulation.
|
|
The following addresses a bogus swapping of SLP operands of a
reduction operation which gets STMT_VINFO_REDUC_IDX out of sync
with the SLP operand order. In fact the most obvious mistake is
that we simply swap operands even on the first stmt even when
there's no difference in the comparison operators (for == and !=
at least). But there are more latent issues that I noticed and
fixed up in the process.
PR tree-optimization/121659
* tree-vect-slp.cc (vect_build_slp_tree_1): Do not allow
matching up comparison operators by swapping if that would
disturb STMT_VINFO_REDUC_IDX. Make sure to only
actually mark operands for swapping when there was a
mismatch and we're not processing the first stmt.
* gcc.dg/vect/pr121659.c: New testcase.
|
|
The vgf2p8affineqb_<mode><mask_name> pattern uses "register_operand"
predicate for the first input operand, so using "general_operand"
for the rotate operand passed to it leads to ICEs, and so does
the "nonimmediate_operand" in the <insn>v16qi3 define_expand.
The following patch fixes it by using "register_operand" in the former
case (that pattern is TARGET_GFNI only) and using force_reg in
the latter case (the pattern is TARGET_XOP || TARGET_GFNI and for XOP
we can handle MEM operand).
The rest of the changes are small formatting tweaks or use of const0_rtx
instead of GEN_INT (0).
2025-08-26 Jakub Jelinek <jakub@redhat.com>
PR target/121658
* config/i386/sse.md (<insn><mode>3 any_shift): Use const0_rtx
instead of GEN_INT (0).
(cond_<insn><mode> any_shift): Likewise. Formatting fix.
(<insn><mode>3 any_rotate): Use register_operand predicate instead of
general_operand for match_operand 1. Use const0_rtx instead of
GEN_INT (0).
(<insn>v16qi3 any_rotate): Use force_reg on operands[1]. Formatting
fix.
* config/i386/i386.cc (ix86_shift_rotate_cost): Comment formatting
fixes.
* gcc.target/i386/pr121658.c: New test.
|
|
|
|
cost 0, 1 and 15
Add asm dump check and run test for vec_duplicate + vmacc.vvm
combine to vmacc.vx, with the GR2VR cost is 0, 2 and 15.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check
for vx combine.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmacc-run-1-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmacc-run-1-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmacc-run-1-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmacc-run-1-u8.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
0, 1 and 15
Add asm dump check and run test for vec_duplicate + vmacc.vvm
combine to vmacc.vx, with the GR2VR cost is 0, 2 and 15.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check
for vx combine.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_ternary.h: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_ternary_data.h: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_ternary_run.h: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmacc-run-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmacc-run-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmacc-run-1-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmacc-run-1-i8.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
When expand_omp_for_init_counts is called from expand_omp_for_generic,
zero_iter1_bb is NULL and the code always creates a new bb in which it
clears fd->loop.n2 var (if it is a var), because it can dominate code
with lastprivate guards that use the var.
When called from other places, zero_iter1_bb is non-NULL and so we don't
insert the clearing (and can't, because the same bb is used also for the
non-zero iterations exit and in that case we need to preserve the iteration
count). Clearing is also not necessary when e.g. outermost collapsed
loop has constant non-zero number of iterations, in that case we initialize the
var to something already earlier. The following patch makes sure to clear
it if it hasn't been initialized yet before the first check for zero iterations.
2025-08-26 Jakub Jelinek <jakub@redhat.com>
PR middle-end/121453
* omp-expand.cc (expand_omp_for_init_counts): Clear fd->loop.n2
before first zero count check if zero_iter1_bb is non-NULL upon
entry and fd->loop.n2 has not been written yet.
* gcc.dg/gomp/pr121453.c: New test.
|
|
PR tree-optimization/121656
* gcc.dg/pr121656.c: New file.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
CTF array encoding uses uint32 for number of elements. This means there
is a hard upper limit on array types which the format can represent.
GCC internally was also using a uint32_t for this, which would overflow
when translating from DWARF for arrays with more than UINT32_MAX
elements. Use an unsigned HOST_WIDE_INT instead to fetch the array
bound, and fall back to CTF_K_UNKNOWN if the array cannot be
represented in CTF.
PR debug/121411
gcc/
* dwarf2ctf.cc (gen_ctf_subrange_type): Use unsigned HWI for
array_num_elements. Fallback to CTF_K_UNKNOWN if the array
type has too many elements for CTF to represent.
gcc/testsuite/
* gcc.dg/debug/ctf/ctf-array-7.c: New test.
|
|
Just like r16-465-gf2bb7ffe84840d8 but this time
instead of a VCE there is a full on load from a boolean.
This showed up when trying to remove the extra copy
in the testcase from the revision mentioned above (pr120122-1.c).
So when moving loads from a boolean type from being conditional
to non-conditional, the load needs to become a full load and then
casted into a bool so that the upper bits are correct.
Bitfields loads will always do the truncation so they don't need to
be rewritten. Non boolean types always do the truncation too.
What we do is wrap the original reference with a VCE which causes
the full load and then do a casting to do the truncation. Using
fold_build1 with VCE will do the correct thing if there is a secondary
VCE and will also fold if this was just a plain MEM_REF so there is
no need to handle those 2 cases special either.
Changes since v1:
* v2: Use VIEW_CONVERT_EXPR instead of doing a manual load.
Accept all non mode precision loads rather than just
boolean ones.
* v3: Move back to checking boolean type. Don't handle BIT_FIELD_REF.
Add asserts for IMAG/REAL_PART_EXPR.
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/121279
gcc/ChangeLog:
* gimple-fold.cc (gimple_needing_rewrite_undefined): Return
true for non mode precision boolean loads.
(rewrite_to_defined_unconditional): Handle non mode precision loads.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr121279-1.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
|
|
The following patch implements the proposed resolution of
https://cplusplus.github.io/CWG/issues/3048.html
Instead of rejecting structured binding size it just builds a normal
decl rather than structured binding declaration.
2025-08-25 Jakub Jelinek <jakub@redhat.com>
* pt.cc (finish_expansion_stmt): Implement C++ CWG3048
- Empty destructuring expansion statements. Don't error for
destructuring expansion stmts if sz is 0, don't call
fit_decomposition_lang_decl if n is 0 and pass NULL rather than
this_decomp to cp_finish_decl.
* g++.dg/cpp26/expansion-stmt15.C: Don't expect error on
destructuring expansion stmts with structured binding size 0.
* g++.dg/cpp26/expansion-stmt21.C: New test.
* g++.dg/cpp26/expansion-stmt22.C: New test.
|
|
The following testcase ICEs, because the
/* Check we aren't dereferencing a null pointer when calling a non-static
member function, which is undefined behaviour. */
if (i == 0 && DECL_OBJECT_MEMBER_FUNCTION_P (fun)
&& integer_zerop (arg)
/* But ignore calls from within compiler-generated code, to handle
cases like lambda function pointer conversion operator thunks
which pass NULL as the 'this' pointer. */
&& !(TREE_CODE (t) == CALL_EXPR && CALL_FROM_THUNK_P (t)))
{
if (!ctx->quiet)
error_at (cp_expr_loc_or_input_loc (x),
"dereferencing a null pointer");
*non_constant_p = true;
}
checking is done before testing if (*jump_target). Especially when
throws (jump_target), arg can be (and is on this testcase) NULL_TREE,
so calling integer_zerop on it ICEs.
Fixed by moving the if (*jump_target) test earlier.
2025-08-25 Jakub Jelinek <jakub@redhat.com>
PR c++/121601
* constexpr.cc (cxx_bind_parameters_in_call): Move break
if *jump_target before the check for null this object pointer.
* g++.dg/cpp26/constexpr-eh16.C: New test.
|
|
The following fixes a missed SLP discovery of a live induction.
Our pattern matching of those fails because of the PR81529 fix
which I think was misguided and should now no longer be relevant.
So this essentially reverts that fix. I have added a GIMPLE
testcase to increase the chance the particular IL is preserved
through the future.
This shows that how we make some IVs live because of early-break
isn't quite correct, so I had to preserve a hack here. Hopefully
to be investigated at some point.
PR tree-optimization/121638
* tree-vect-stmts.cc (process_use): Do not make induction
PHI backedge values relevant.
* gcc.dg/vect/pr121638.c: New testcase.
|
|
The GFNI AVX gf2p8affineqb instruction can be used to implement
vectorized byte shifts or rotates. This patch uses them to implement
shift and rotate patterns to allow the vectorizer to use them.
Previously AVX couldn't do rotates (except with XOP) and had to handle
8 bit shifts with a half throughput 16 bit shift.
This is only implemented for constant shifts. In theory it could
be used with a lookup table for variable shifts, but it's unclear
if it's worth it.
The vectorizer cost model could be improved, but seems to work for now.
It doesn't model the true latencies of the instructions. Also it doesn't
account for the memory loading of the mask, assuming that for a loop
it will be loaded outside the loop.
The instructions would also support more complex patterns
(e.g. arbitary bit movement or inversions), so some of the tricks
applied to ternlog could be applied here too to collapse
more code. It's trickier because the input patterns
can be much longer since they can apply to every bit individually. I didn't
attempt any of this.
There's currently no test case for the masked/cond_ variants, they seem
to be difficult to trigger with the vectorizer. Suggestions for a test
case for them welcome.
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_vgf2p8affine_shift_matrix):
New function to lookup shift/rotate matrixes for gf2p8affine.
* config/i386/i386-protos.h (ix86_vgf2p8affine_shift_matrix):
Declare new function.
* config/i386/i386.cc (ix86_shift_rotate_cost): Add cost model
for shift/rotate implemented using gf2p8affine.
* config/i386/sse.md (VI1_AVX512_3264): New mode iterator.
(<insn><mode>3): Add GFNI case for shift patterns.
(cond_<insn><mode>3): New pattern.
(<insn><mode>3<mask_name>): Dito.
(<insn>v16qi): New rotate pattern to handle XOP V16QI case
and GFNI.
(rotl<mode>3, rotr<mode>3): Exclude V16QI case.
gcc/testsuite/ChangeLog:
* gcc.target/i386/shift-gf2p8affine-1.c: New test
* gcc.target/i386/shift-gf2p8affine-2.c: New test
* gcc.target/i386/shift-gf2p8affine-3.c: New test
* gcc.target/i386/shift-v16qi-4.c: New test
* gcc.target/i386/shift-gf2p8affine-5.c: New test
* gcc.target/i386/shift-gf2p8affine-6.c: New test
* gcc.target/i386/shift-gf2p8affine-7.c: New test
|
|
I can't believe I made such a stupid pasto and the regression test
didn't detect anything wrong.
PR target/121634
gcc/
* config/loongarch/simd.md (simd_maddw_evod_<mode>_<su>): Use
WVEC_HALF instead of WVEC for the mode of the sign_extend for
the rhs of multiplication.
gcc/testsuite/
* gcc.target/loongarch/pr121634.c: New test.
|
|
I got too clever trying to simplify the right shift computation in my recent
ifcvt patch. Interestingly enough, I haven't seen anything but the Linaro CI
configuration actually trip the problem, though the code is clearly wrong.
The problem I was trying to avoid were the leading zeros when calling clz on a
HWI when the real object is just say 32 bits.
The net is we get a right shift count of "2" when we really wanted a right
shift count of 30. That causes the execution aspect of bics_3 to fail.
The scan failures are due to creating slightly more efficient code. THe new
code sequences don't need to use conditional execution for selection and thus
we can use bic rather bics which requires a twiddle in the scan.
I reviewed recent bug reports and haven't seen one for this issue. So no new
testcase as this is covered by the armv7 testsuite in the right configuration.
Bootstrapped and regression tested on x86_64, also verified it fixes the Linaro
reported CI failure and verified the crosses are still happy. Pushing to the
trunk.
gcc/
* ifcvt.cc (noce_try_sign_bit_splat): Fix right shift computation.
gcc/testsuite/
* gcc.target/arm/bics_3.c: Adjust expected output
|
|
|
|
PR c++/116928
gcc/cp/ChangeLog:
* parser.cc (cp_parser_braced_list): Set greater_than_is_operator_p.
gcc/testsuite/ChangeLog:
* g++.dg/parse/template33.C: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>
|
|
Compile noplt-gd-1.c and noplt-ld-1.c with -mtls-dialect=gnu to support
the --with-tls=gnu2 configure option since they scan the assembly output
for the __tls_get_addr call which is generated by -mtls-dialect=gnu.
PR target/120933
* gcc.target/i386/noplt-gd-1.c (dg-options): Add
-mtls-dialect=gnu.
* gcc.target/i386/noplt-ld-1.c (dg-options): Likewise.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
defining module [PR120499]
In the PR, we're getting a linker error from _Vector_impl's destructor
never getting emitted. This is because of a combination of factors:
1. in imp-member-4_a, the destructor is not used and so there is no
definition generated.
2. in imp-member-4_b, the destructor gets synthesized (as part of the
synthesis for Coll's destructor) but is not ODR-used and so does not
get emitted. Despite there being a definition provided in this TU,
the destructor is still considered imported and so isn't streamed
into the module body.
3. in imp-member-4_c, we need to ODR-use the destructor but we only got
a forward declaration from imp-member-4_b, so we cannot emit a body.
The point of failure here is step 2; this function has effectively been
declared in the imp-member-4_b module, and so we shouldn't treat it as
imported. This way we'll properly stream the body so that importers can
emit it.
PR c++/120499
gcc/cp/ChangeLog:
* method.cc (synthesize_method): Set the instantiating module.
gcc/testsuite/ChangeLog:
* g++.dg/modules/imp-member-4_a.C: New test.
* g++.dg/modules/imp-member-4_b.C: New test.
* g++.dg/modules/imp-member-4_c.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
|
|
|
|
sign bit test
While working to remove mvconst_internal I stumbled over a regression in
the code to handle signed division by a power of two.
In that sequence we want to select between 0, 2^n-1 by pairing a sign
bit splat with a subsequent logical right shift. This can be done
without branches or conditional moves.
Playing with it a bit made me realize there's a handful of selections we
can do based on a sign bit test. Essentially there's two broad cases.
Clearing bits after the sign bit splat. So we have 0, -1, if we clear
bits the 0 stays as-is, but the -1 could easily turn into 2^n-1, ~2^n-1,
or some small constants.
Setting bits after the sign bit splat. If we have 0, -1, setting bits
the -1 stays as-is, but the 0 can turn into 2^n, a small constant, etc.
Shreya and I originally started looking at target patterns to do this,
essentially discovering conditional move forms of the selects and
rewriting them into something more efficient. That got out of control
pretty quickly and it relied on if-conversion to initially create the
conditional move.
The better solution is to actually discover the cases during
if-conversion itself. That catches cases that were previously being
missed, checks cost models, and is actually simpler since we don't have
to distinguish between things like ori and bseti, instead we just emit
the natural RTL and let the target figure it out.
In the ifcvt implementation we put these cases just before trying the
traditional conditional move sequences. Essentially these are a last
attempt before trying the generalized conditional move sequence.
This as been bootstrapped and regression tested on aarch64, riscv,
ppc64le, s390x, alpha, m68k, sh4eb, x86_64 and probably a couple others
I've forgotten. It's also been tested on the other embedded targets.
Obviously the new tests are risc-v specific, so that testing was
primarily to make sure we didn't ICE, generate incorrect code or regress
target existing specific tests.
Raphael has some changes to attack this from the gimple direction as
well. I think the latest version of those is on me to push through
internal review.
PR rtl-optimization/120553
gcc/
* ifcvt.cc (noce_try_sign_bit_splat): New function.
(noce_process_if_block): Use it.
gcc/testsuite/
* gcc.target/riscv/pr120553-1.c: New test.
* gcc.target/riscv/pr120553-2.c: New test.
* gcc.target/riscv/pr120553-3.c: New test.
* gcc.target/riscv/pr120553-4.c: New test.
* gcc.target/riscv/pr120553-5.c: New test.
* gcc.target/riscv/pr120553-6.c: New test.
* gcc.target/riscv/pr120553-7.c: New test.
* gcc.target/riscv/pr120553-8.c: New test.
|
|
Add run and asm check test cases for scalar unsigned SAT_MUL form 3.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat/sat_u_mul-4-u16-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-4-u16-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-4-u16-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-4-u16-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-4-u32-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-4-u32-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-4-u32-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-4-u64-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-4-u8-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-4-u8-from-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-4-u8-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-4-u8-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-4-u8-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u16-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u16-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u16-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u16-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u32-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u32-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u32-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u64-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u8-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u8-from-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u8-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u8-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u8-from-u64.rv32.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
For the beginning basic block:
(note 4 0 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 2 4 26 2 NOTE_INSN_FUNCTION_BEG)
emit the TLS call after NOTE_INSN_FUNCTION_BEG.
gcc/
PR target/121635
* config/i386/i386-features.cc (ix86_emit_tls_call): Emit the
TLS call after NOTE_INSN_FUNCTION_BEG.
gcc/testsuite/
PR target/121635
* gcc.target/i386/pr121635-1a.c: New test.
* gcc.target/i386/pr121635-1b.c: Likewise.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
Linaro CI informed me that this test fails on ARM thumb-m7-hard-eabi.
This appears to be because the target defaults to -fshort-enums, and so
the mangled names are inaccurate.
This patch just disables the implicit type enum test for this case.
gcc/testsuite/ChangeLog:
* g++.dg/abi/mangle83.C: Disable implicit enum test for
-fshort-enums.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
|
|
Without stating the architecture version required by the test, test
runs with options that are incompatible with the required
architecture version fail, e.g. -mfloat-abi=hard.
armv7 was not covered by the long list of arm variants in
target-supports.exp, so add it, and use it for the effective target
requirement and for the option.
for gcc/testsuite/ChangeLog
PR rtl-optimization/120424
* lib/target-supports.exp (arm arches): Add arm_arch_v7.
* g++.target/arm/pr120424.C: Require armv7 support. Use
dg-add-options arm_arch_v7 instead of explicit -march=armv7.
|
|
|
|
PR fortran/121627
gcc/fortran/ChangeLog:
* module.cc (create_int_parameter_array): Avoid NULL
pointer dereference and enhance error message.
gcc/testsuite/ChangeLog:
* gfortran.dg/pr121627.f90: New test.
|
|
The middle-end does not fully understand NULLPTR_TYPE. So it
gets confused a lot of the time when dealing with it.
This adds the folding that is similarly done in the C++ front-end already.
In some cases it should produce slightly better code as there is no
reason to load from a nullptr_t variable as it is always NULL.
The following is handled:
nullptr_v ==/!= nullptr_v -> true/false
(ptr)nullptr_v -> (ptr)0, nullptr_v
f(nullptr_v) -> f ((nullptr, nullptr_v))
The last one is for conversion inside ... .
Bootstrapped and tested on x86_64-linux-gnu.
PR c/121478
gcc/c/ChangeLog:
* c-fold.cc (c_fully_fold_internal): Fold nullptr_t ==/!= nullptr_t.
* c-typeck.cc (convert_arguments): Handle conversion from nullptr_t
for varargs.
(convert_for_assignment): Handle conversions from nullptr_t to
pointer type specially.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr121478-1.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
|
|
Since r16-3022, 20_util/variant/102912.cc was failing in C++20 and above due
to wrong errors about destruction modifying a const object; destruction is
OK.
PR c++/121068
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_store_expression): Allow clobber of a const
object.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/constexpr-dtor18.C: New test.
|
|
Call check_effective_target_riscv_zvfh_ok rather than
check_effective_target_riscv_zvfh in vx_vf_*run-1-f16.c run tests and ensure
that they are actually run.
Also fix remove_options_for_riscv_zvfh.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmacc-run-1-f16.c: Call
check_effective_target_riscv_zvfh_ok rather than
check_effective_target_riscv_zvfh.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmadd-run-1-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsac-run-1-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsub-run-1-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmacc-run-1-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmadd-run-1-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmsac-run-1-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmsub-run-1-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwmacc-run-1-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwmsac-run-1-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwnmacc-run-1-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwnmsac-run-1-f16.c: Likewise.
* lib/target-supports.exp (check_effective_target_riscv_zvfh_ok): Append
zvfh instead of v to march.
(remove_options_for_riscv_zvfh): Remove duplicate and
call remove_ rather than add_options_for_riscv_z_ext.
|
|
This PR is another bug in the rtl-ssa code to manage live-out uses.
It seems that this didn't get much coverage until recently.
In the testcase, late-combine first removed a register-to-register
move by substituting into all uses, some of which were in other EBBs.
This was done after checking make_uses_available, which (as expected)
says that single dominating definitions are available everywhere
that the definition dominates. But the update failed to add
appropriate live-out uses, so a later parallelisation attempt
tried to move the new destination into a later block.
gcc/
PR rtl-optimization/121619
* rtl-ssa/functions.h (function_info::commit_make_use_available):
Declare.
* rtl-ssa/blocks.cc (function_info::commit_make_use_available):
New function.
* rtl-ssa/changes.cc (function_info::apply_changes_to_insn): Use it.
gcc/testsuite/
PR rtl-optimization/121619
* gcc.dg/pr121619.c: New test.
|
|
For a basic block with only a label:
(code_label 78 11 77 3 14 (nil) [1 uses])
(note 77 78 54 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
emit the TLS call after NOTE_INSN_BASIC_BLOCK, instead of before
NOTE_INSN_BASIC_BLOCK, to avoid
x.c: In function ‘aout_16_write_syms’:
x.c:54:1: error: NOTE_INSN_BASIC_BLOCK is missing for block 3
54 | }
| ^
x.c:54:1: error: NOTE_INSN_BASIC_BLOCK 77 in middle of basic block 3
during RTL pass: x86_cse
x.c:54:1: internal compiler error: verify_flow_info failed
gcc/
PR target/121607
* config/i386/i386-features.cc (ix86_emit_tls_call): Emit the
TLS call after NOTE_INSN_BASIC_BLOCK in a basic block with only
a label.
gcc/testsuite/
PR target/121607
* gcc.target/i386/pr121607-1a.c: New test.
* gcc.target/i386/pr121607-1b.c: Likewise.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
2025-08-21 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/84122
* parse.cc (parse_derived): PDT type parameters are not allowed
an explicit access specification and must appear before a
PRIVATE statement. If a PRIVATE statement is seen, mark all the
other components as PRIVATE.
PR fortran/85942
* simplify.cc (get_kind): Convert a PDT KIND component into a
specification expression using the default initializer.
gcc/testsuite/
PR fortran/84122
* gfortran.dg/pdt_38.f03: New test.
PR fortran/85942
* gfortran.dg/pdt_39.f03: New test.
|
|
Here r13-1210 correctly changed &A<int>::foo to not be considered
type-dependent, but tsubst_expr of the OFFSET_REF got confused trying to
tsubst a type that involved auto. Fixed by getting the type from the
member rather than tsubst.
PR c++/120757
gcc/cp/ChangeLog:
* pt.cc (tsubst_expr) [OFFSET_REF]: Don't tsubst the type.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1y/auto-fn66.C: New test.
|
|
|
|
P2036 says that this:
[x=1]{ int x; }
should be rejected, but with my P2036 we started giving an error
for the attached testcase as well, breaking Dolphin. So let's
keep the error only for init-captures.
PR c++/121553
gcc/cp/ChangeLog:
* name-lookup.cc (check_local_shadow): Check !is_normal_capture_proxy.
gcc/testsuite/ChangeLog:
* g++.dg/warn/Wshadow-19.C: Revert P2036 changes.
* g++.dg/warn/Wshadow-6.C: Likewise.
* g++.dg/warn/Wshadow-20.C: New test.
* g++.dg/warn/Wshadow-21.C: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>
|
|
-Wstringop-* warnings [PR109071,PR85788,PR88771,PR106762,PR108770,PR115274,PR117179]
'-fdiagnostics-show-context[=DEPTH]'
'-fno-diagnostics-show-context'
With this option, the compiler might print the interesting control
flow chain that guards the basic block of the statement which has
the warning. DEPTH is the maximum depth of the control flow chain.
Currently, The list of the impacted warning options includes:
'-Warray-bounds', '-Wstringop-overflow', '-Wstringop-overread',
'-Wstringop-truncation'. and '-Wrestrict'. More warning options
might be added to this list in future releases. The forms
'-fdiagnostics-show-context' and '-fno-diagnostics-show-context'
are aliases for '-fdiagnostics-show-context=1' and
'-fdiagnostics-show-context=0', respectively.
For example:
$ cat t.c
extern void warn(void);
static inline void assign(int val, int *regs, int *index)
{
if (*index >= 4)
warn();
*regs = val;
}
struct nums {int vals[4];};
void sparx5_set (int *ptr, struct nums *sg, int index)
{
int *val = &sg->vals[index];
assign(0, ptr, &index);
assign(*val, ptr, &index);
}
$ gcc -Wall -O2 -c -o t.o t.c
t.c: In function ‘sparx5_set’:
t.c:12:23: warning: array subscript 4 is above array bounds of ‘int[4]’ [-Warray-bounds=]
12 | int *val = &sg->vals[index];
| ~~~~~~~~^~~~~~~
t.c:8:18: note: while referencing ‘vals’
8 | struct nums {int vals[4];};
| ^~~~
In the above, Although the warning is correct in theory, the warning message
itself is confusing to the end-user since there is information that cannot
be connected to the source code directly.
It will be a nice improvement to add more information in the warning message
to report where such index value come from.
With the new option -fdiagnostics-show-context=1, the warning message for
the above testing case is now:
$ gcc -Wall -O2 -fdiagnostics-show-context=1 -c -o t.o t.c
t.c: In function ‘sparx5_set’:
t.c:12:23: warning: array subscript 4 is above array bounds of ‘int[4]’ [-Warray-bounds=]
12 | int *val = &sg->vals[index];
| ~~~~~~~~^~~~~~~
‘sparx5_set’: events 1-2
4 | if (*index >= 4)
| ^
| |
| (1) when the condition is evaluated to true
......
12 | int *val = &sg->vals[index];
| ~~~~~~~~~~~~~~~
| |
| (2) warning happens here
t.c:8:18: note: while referencing ‘vals’
8 | struct nums {int vals[4];};
| ^~~~
PR tree-optimization/109071
PR tree-optimization/85788
PR tree-optimization/88771
PR tree-optimization/106762
PR tree-optimization/108770
PR tree-optimization/115274
PR tree-optimization/117179
gcc/ChangeLog:
* Makefile.in (OBJS): Add diagnostic-context-rich-location.o.
* common.opt (fdiagnostics-show-context): New option.
(fdiagnostics-show-context=): New option.
* diagnostic-context-rich-location.cc: New file.
* diagnostic-context-rich-location.h: New file.
* doc/invoke.texi (fdiagnostics-details): Add
documentation for the new options.
* gimple-array-bounds.cc (check_out_of_bounds_and_warn): Add
one new parameter. Use rich location with details for warning_at.
(array_bounds_checker::check_array_ref): Use rich location with
ditails for warning_at.
(array_bounds_checker::check_mem_ref): Add one new parameter.
Use rich location with details for warning_at.
(array_bounds_checker::check_addr_expr): Use rich location with
move_history_diagnostic_path for warning_at.
(array_bounds_checker::check_array_bounds): Call check_mem_ref with
one more parameter.
* gimple-array-bounds.h: Update prototype for check_mem_ref.
* gimple-ssa-warn-access.cc (warn_string_no_nul): Use rich location
with details for warning_at.
(maybe_warn_nonstring_arg): Likewise.
(maybe_warn_for_bound): Likewise.
(warn_for_access): Likewise.
(check_access): Likewise.
(pass_waccess::check_strncat): Likewise.
(pass_waccess::maybe_check_access_sizes): Likewise.
* gimple-ssa-warn-restrict.cc (pass_wrestrict::execute): Calculate
dominance info for diagnostics show context.
(maybe_diag_overlap): Use rich location with details for warning_at.
(maybe_diag_access_bounds): Use rich location with details for
warning_at.
gcc/testsuite/ChangeLog:
* gcc.dg/pr109071.c: New test.
* gcc.dg/pr109071_1.c: New test.
* gcc.dg/pr109071_10.c: New test.
* gcc.dg/pr109071_11.c: New test.
* gcc.dg/pr109071_12.c: New test.
* gcc.dg/pr109071_2.c: New test.
* gcc.dg/pr109071_3.c: New test.
* gcc.dg/pr109071_4.c: New test.
* gcc.dg/pr109071_5.c: New test.
* gcc.dg/pr109071_6.c: New test.
* gcc.dg/pr109071_7.c: New test.
* gcc.dg/pr109071_8.c: New test.
* gcc.dg/pr109071_9.c: New test.
* gcc.dg/pr117375.c: New test.
|
|
We can't place a TLS call before a conditional jump in a basic block like
(code_label 13 11 14 4 2 (nil) [1 uses])
(note 14 13 16 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
(jump_insn 16 14 17 4 (set (pc)
(if_then_else (le (reg:CCNO 17 flags)
(const_int 0 [0]))
(label_ref 27)
(pc))) "x.c":10:21 discrim 1 1462 {*jcc}
(expr_list:REG_DEAD (reg:CCNO 17 flags)
(int_list:REG_BR_PROB 628353713 (nil)))
-> 27)
since the TLS call will clobber flags register nor place a TLS call in a
basic block if any live caller-saved registers aren't dead at the end of
the basic block:
;; live in 6 [bp] 7 [sp] 16 [argp] 17 [flags] 19 [frame] 104
;; live gen 0 [ax] 102 106 108 116 117 118 120
;; live kill 5 [di]
Instead, we should place such call before all register setting basic
blocks which dominate the current basic block.
Keep track the replaced GNU and GNU2 TLS instructions. Use these info to
place the __tls_get_addr call and mark FLAGS register as dead.
gcc/
PR target/121572
* config/i386/i386-features.cc (replace_tls_call): Add a bitmap
argument and put the updated TLS instruction in the bitmap.
(ix86_get_dominator_for_reg): New.
(ix86_check_flags_reg): Likewise.
(ix86_emit_tls_call): Likewise.
(ix86_place_single_tls_call): Add 2 bitmap arguments for updated
GNU and GNU2 TLS instructions. Call ix86_emit_tls_call to emit
TLS instruction. Correct debug dump for before instruction.
gcc/testsuite/
PR target/121572
* gcc.target/i386/pr121572-1a.c: New test.
* gcc.target/i386/pr121572-1b.c: Likewise.
* gcc.target/i386/pr121572-2a.c: Likewise.
* gcc.target/i386/pr121572-2b.c: Likewise.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
This testcase is testing the difference between functions that are or are
not declared constexpr.
gcc/testsuite/ChangeLog:
* g++.dg/cpp26/expansion-stmt16.C: Add -fno-implicit-constexpr.
|
|
This testcase caused an ICE when mangling the invalid type-constraint in
write_requirement since write_type_constraint expects a TEMPLATE_TYPE_PARM.
Setting the trailing return type to NULL_TREE when a
return-type-requirement is found in place of a type-constraint prevents the
failed assertion in write_requirement. It also allows the invalid
constraint to be satisfied in some contexts to prevent redundant errors,
e.g. in concepts-requires5.C.
Bootstrapped and tested on x86_64-linux-gnu.
PR c++/120618
gcc/cp/ChangeLog:
* parser.cc (cp_parser_compound_requirement): Set type to
NULL_TREE for invalid type-constraint.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-requires5.C: Don't require
redundant diagnostic in static assertion.
* g++.dg/concepts/pr120618.C: New test.
Suggested-by: Jason Merrill <jason@redhat.com>
|
|
When expanding malloc like functions, we copy the return register into a temporary
and then mark that temporary register with a noalias regnote and the alignment.
This works fine unless you are calling the function with a return type of void.
At this point then the valreg will be null and a crash will happen.
A few cleanups are included in this patch because it was easier to do the fix
with the cleanups added.
The start_sequence/end_sequence for ECF_MALLOC is no longer needed; I can't tell
if it was ever needed.
The emit_move_insn function returns the last emitted instruction anyways so
there is no reason to call get_last_insn as we can just use the return value
of emit_move_insn. This has been true since this code was originally added
so I don't understand why it was done that way beforehand.
Bootstrapped and tested on x86_64-linux-gnu.
PR middle-end/120024
gcc/ChangeLog:
* calls.cc (expand_call): Remove start_sequence/end_sequence
for ECF_MALLOC.
Check valreg before deferencing it when it comes to malloc like
functions. Use the return value of emit_move_insn instead of
calling get_last_insn.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/malloc-1.c: New test.
* gcc.dg/torture/malloc-2.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
|