Age | Commit message | Author | Files | Lines |
|
cost 0, 1 and 15
Add asm dump check and run test for vec_duplicate + vmacc.vv
combine to vmacc.vx when the GR2VR cost is 0, 2 and 15.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check
for vx combine.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmacc-run-1-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmacc-run-1-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmacc-run-1-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmacc-run-1-u8.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
0, 1 and 15
Add asm dump check and run test for vec_duplicate + vmacc.vv
combine to vmacc.vx when the GR2VR cost is 0, 2 and 15.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check
for vx combine.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_ternary.h: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_ternary_data.h: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_ternary_run.h: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmacc-run-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmacc-run-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmacc-run-1-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmacc-run-1-i8.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch would like to combine the vec_duplicate + vmacc.vv into
vmacc.vx, as in the example code below. The related pattern depends
on the cost of the vec_duplicate from GR2VR: late-combine will
take action if the cost of GR2VR is zero, and reject the combination
if the GR2VR cost is greater than zero.
Assume we have the example code below, with a GR2VR cost of 0.
#define DEF_VX_TERNARY_CASE_0(T, OP_1, OP_2, NAME) \
void \
test_vx_ternary_##NAME##_##T##_case_0 (T * restrict vd, T * restrict vs2, \
T rs1, unsigned n) \
{ \
for (unsigned i = 0; i < n; i++) \
vd[i] = vd[i] OP_2 vs2[i] OP_1 rs1; \
}
DEF_VX_TERNARY_CASE_0(int32_t, *, +, macc)
Before this patch:
11 │ beq a3,zero,.L8
12 │ vsetvli a5,zero,e32,m1,ta,ma
13 │ vmv.v.x v2,a2
...
16 │ .L3:
17 │ vsetvli a5,a3,e32,m1,ta,ma
...
22 │ vmacc.vv v1,v2,v3
...
25 │ bne a3,zero,.L3
After this patch:
11 │ beq a3,zero,.L8
...
14 │ .L3:
15 │ vsetvli a5,a3,e32,m1,ta,ma
...
20 │ vmacc.vx v1,a2,v3
...
23 │ bne a3,zero,.L3
gcc/ChangeLog:
* config/riscv/vector.md (@pred_mul_plus_vx_<mode>): Add new pattern to
generate vmacc rtl.
(*pred_macc_<mode>_scalar_undef): Ditto.
* config/riscv/autovec-opt.md (*vmacc_vx_<mode>): Add new
pattern to match the vmacc vx combine.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
When expand_omp_for_init_counts is called from expand_omp_for_generic,
zero_iter1_bb is NULL and the code always creates a new bb in which it
clears fd->loop.n2 var (if it is a var), because it can dominate code
with lastprivate guards that use the var.
When called from other places, zero_iter1_bb is non-NULL and so we don't
insert the clearing (and can't, because the same bb is used also for the
non-zero iterations exit and in that case we need to preserve the iteration
count). Clearing is also not necessary when e.g. the outermost collapsed
loop has a constant non-zero number of iterations; in that case we have
already initialized the var earlier. The following patch makes sure to clear
it if it hasn't been initialized yet before the first check for zero iterations.
2025-08-26 Jakub Jelinek <jakub@redhat.com>
PR middle-end/121453
* omp-expand.cc (expand_omp_for_init_counts): Clear fd->loop.n2
before first zero count check if zero_iter1_bb is non-NULL upon
entry and fd->loop.n2 has not been written yet.
* gcc.dg/gomp/pr121453.c: New test.
|
|
PR tree-optimization/121656
* gcc.dg/pr121656.c: New file.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
CTF array encoding uses uint32 for number of elements. This means there
is a hard upper limit on array types which the format can represent.
GCC internally was also using a uint32_t for this, which would overflow
when translating from DWARF for arrays with more than UINT32_MAX
elements. Use an unsigned HOST_WIDE_INT instead to fetch the array
bound, and fall back to CTF_K_UNKNOWN if the array cannot be
represented in CTF.
PR debug/121411
gcc/
* dwarf2ctf.cc (gen_ctf_subrange_type): Use unsigned HWI for
array_num_elements. Fall back to CTF_K_UNKNOWN if the array
type has too many elements for CTF to represent.
gcc/testsuite/
* gcc.dg/debug/ctf/ctf-array-7.c: New test.
|
|
After the return type of remove_prop_source_from_use was changed to void,
simplify_permutation only returns 1 or 0 so it can be boolified.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-forwprop.cc (simplify_permutation): Boolify.
(pass_forwprop::execute): No longer handle 2 as the return
from simplify_permutation.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
|
|
After changing the return type of remove_prop_source_from_use,
forward_propagate_into_comparison will never return 2. So boolify
forward_propagate_into_comparison.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-forwprop.cc (forward_propagate_into_comparison): Boolify.
(pass_forwprop::execute): Don't handle return of 2 from
forward_propagate_into_comparison.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
|
|
Since r5-4705-ga499aac5dfa5d9, remove_prop_source_from_use has always
returned false. This removes the return type of remove_prop_source_from_use
and cleans up the usage of remove_prop_source_from_use.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-forwprop.cc (remove_prop_source_from_use): Remove
return type.
(forward_propagate_into_comparison): Update dealing with
no return type of remove_prop_source_from_use.
(forward_propagate_into_gimple_cond): Likewise.
(simplify_permutation): Likewise.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
|
|
While looking at this code I noticed that we don't remove
the old switch index assignment if it is only used in the switch
after it is modified in simplify_gimple_switch.
This fixes that by marking the old switch index for the dce worklist.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-forwprop.cc (simplify_gimple_switch): Add simple_dce_worklist
argument. Mark the old index when doing the replacement.
(pass_forwprop::execute): Update call to simplify_gimple_switch.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
|
|
Just like r16-465-gf2bb7ffe84840d8 but this time
instead of a VCE there is a full on load from a boolean.
This showed up when trying to remove the extra copy
in the testcase from the revision mentioned above (pr120122-1.c).
So when moving loads from a boolean type from being conditional
to non-conditional, the load needs to become a full load which is
then cast to a bool so that the upper bits are correct.
Bitfield loads always do the truncation, so they don't need to
be rewritten; non-boolean types always do the truncation too.
What we do is wrap the original reference with a VCE, which causes
the full load, and then add a cast to do the truncation. Using
fold_build1 with VCE will do the correct thing if there is a secondary
VCE and will also fold if this was just a plain MEM_REF, so there is
no need to handle those two cases specially either.
Changes since v1:
* v2: Use VIEW_CONVERT_EXPR instead of doing a manual load.
Accept all non mode precision loads rather than just
boolean ones.
* v3: Move back to checking boolean type. Don't handle BIT_FIELD_REF.
Add asserts for IMAG/REAL_PART_EXPR.
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/121279
gcc/ChangeLog:
* gimple-fold.cc (gimple_needing_rewrite_undefined): Return
true for non mode precision boolean loads.
(rewrite_to_defined_unconditional): Handle non mode precision loads.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr121279-1.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
|
|
When working on PR121279, I noticed that lim
would create an uninitialized decl and mark it
with the suppression attribute for uninitialized warnings.
This is fine, but into-ssa would then just call
get_or_create_ssa_default_def on that new decl, which
could in theory take some extra compile time to figure
that out.
Plus, when doing the rewriting for undefinedness, there
would now be a VCE around the decl. This means the ssa
name is kept around and not propagated in some cases.
So instead this patch manually calls get_or_create_ssa_default_def
to get the "uninitialized" ssa name for this decl, and
no longer needs the rewrite into ssa nor the rewrite for undefinedness.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-loop-im.cc (execute_sm): Call
get_or_create_ssa_default_def for the new uninitialized
decl.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
|
|
multiple alternatives
The use of compact syntax makes the relationship between asm output,
operand constraints, and insn attributes easier to understand and modify,
especially for "mov<mode>_internal".
gcc/ChangeLog:
* config/xtensa/xtensa.md (addsi3, <u>mulhisi3, andsi3,
zero_extend<mode>si2, extendhisi2_internal, movsi_internal,
movhi_internal, movqi_internal, movsf_internal, ashlsi3_internal,
ashrsi3, lshrsi3, rotlsi3, rotrsi3):
Rewrite in compact syntax.
|
|
gcc/ChangeLog:
* config/xtensa/xtensa.md
(The auxiliary define_split for *masktrue_const_bitcmpl):
Use a more concise function call, i.e.,
(1 << GET_MODE_BITSIZE (mode)) - 1 is equivalent to
GET_MODE_MASK (mode).
|
|
gcc/ChangeLog:
* config/xtensa/xtensa.md (mode_bits):
New mode attribute.
(zero_extend<mode>si2): Use the appropriate mode iterator and
attribute to unify "zero_extend[hq]isi2" to this description.
|
|
The following patch implements the proposed resolution of
https://cplusplus.github.io/CWG/issues/3048.html
Instead of rejecting a structured binding size of 0, it just builds a normal
decl rather than a structured binding declaration.
2025-08-25 Jakub Jelinek <jakub@redhat.com>
* pt.cc (finish_expansion_stmt): Implement C++ CWG3048
- Empty destructuring expansion statements. Don't error for
destructuring expansion stmts if sz is 0, don't call
fit_decomposition_lang_decl if n is 0 and pass NULL rather than
this_decomp to cp_finish_decl.
* g++.dg/cpp26/expansion-stmt15.C: Don't expect error on
destructuring expansion stmts with structured binding size 0.
* g++.dg/cpp26/expansion-stmt21.C: New test.
* g++.dg/cpp26/expansion-stmt22.C: New test.
|
|
The following testcase ICEs, because the
/* Check we aren't dereferencing a null pointer when calling a non-static
member function, which is undefined behaviour. */
if (i == 0 && DECL_OBJECT_MEMBER_FUNCTION_P (fun)
&& integer_zerop (arg)
/* But ignore calls from within compiler-generated code, to handle
cases like lambda function pointer conversion operator thunks
which pass NULL as the 'this' pointer. */
&& !(TREE_CODE (t) == CALL_EXPR && CALL_FROM_THUNK_P (t)))
{
if (!ctx->quiet)
error_at (cp_expr_loc_or_input_loc (x),
"dereferencing a null pointer");
*non_constant_p = true;
}
checking is done before testing if (*jump_target). Especially when
throws (jump_target), arg can be (and is on this testcase) NULL_TREE,
so calling integer_zerop on it ICEs.
Fixed by moving the if (*jump_target) test earlier.
2025-08-25 Jakub Jelinek <jakub@redhat.com>
PR c++/121601
* constexpr.cc (cxx_bind_parameters_in_call): Move break
if *jump_target before the check for null this object pointer.
* g++.dg/cpp26/constexpr-eh16.C: New test.
|
|
The following fixes a missed SLP discovery of a live induction.
Our pattern matching of those fails because of the PR81529 fix
which I think was misguided and should now no longer be relevant.
So this essentially reverts that fix. I have added a GIMPLE
testcase to increase the chance the particular IL is preserved
into the future.
This shows that how we make some IVs live because of early-break
isn't quite correct, so I had to preserve a hack here. Hopefully
to be investigated at some point.
PR tree-optimization/121638
* tree-vect-stmts.cc (process_use): Do not make induction
PHI backedge values relevant.
* gcc.dg/vect/pr121638.c: New testcase.
|
|
gcc/ChangeLog:
* asan.h (HWASAN_TAG_SIZE): Use targetm.memtag.tag_bitsize.
* config/i386/i386.cc (ix86_memtag_tag_size): Rename to
ix86_memtag_tag_bitsize.
(TARGET_MEMTAG_TAG_SIZE): Rename to TARGET_MEMTAG_TAG_BITSIZE.
* doc/tm.texi (TARGET_MEMTAG_TAG_SIZE): Likewise.
* doc/tm.texi.in (TARGET_MEMTAG_TAG_SIZE): Likewise.
* target.def (tag_size): Rename to tag_bitsize.
* targhooks.cc (default_memtag_tag_size): Rename to
default_memtag_tag_bitsize.
* targhooks.h (default_memtag_tag_size): Likewise.
Signed-off-by: Claudiu Zissulescu <claudiu.zissulescu-ianculescu@oracle.com>
Co-authored-by: Claudiu Zissulescu <claudiu.zissulescu-ianculescu@oracle.com>
|
|
The FUNCTION_VALUE and LIBCALL_VALUE macros are deprecated in favor of
the TARGET_FUNCTION_VALUE and TARGET_LIBCALL_VALUE target hooks. This
patch replaces the macro definitions with proper target hook implementations.
This change is also a preparatory step for VLS calling convention support,
which will require additional information that is more easily handled
through the target hook interface.
gcc/ChangeLog:
* config/riscv/riscv-protos.h (riscv_init_cumulative_args): Change
fntype parameter from tree to const_tree.
* config/riscv/riscv.cc (riscv_init_cumulative_args): Likewise.
(riscv_function_value): Replace with new implementation that
conforms to TARGET_FUNCTION_VALUE hook signature.
(riscv_libcall_value): New function implementing TARGET_LIBCALL_VALUE.
(TARGET_FUNCTION_VALUE): Define.
(TARGET_LIBCALL_VALUE): Define.
* config/riscv/riscv.h (FUNCTION_VALUE): Remove.
(LIBCALL_VALUE): Remove.
|
|
The GFNI AVX gf2p8affineqb instruction can be used to implement
vectorized byte shifts or rotates. This patch uses them to implement
shift and rotate patterns to allow the vectorizer to use them.
Previously AVX couldn't do rotates (except with XOP) and had to handle
8 bit shifts with a half throughput 16 bit shift.
This is only implemented for constant shifts. In theory it could
be used with a lookup table for variable shifts, but it's unclear
if it's worth it.
The vectorizer cost model could be improved, but seems to work for now.
It doesn't model the true latencies of the instructions. Also it doesn't
account for the memory loading of the mask, assuming that for a loop
it will be loaded outside the loop.
The instructions would also support more complex patterns
(e.g. arbitrary bit movement or inversions), so some of the tricks
applied to ternlog could be applied here too to collapse
more code. It's trickier because the input patterns
can be much longer since they can apply to every bit individually. I didn't
attempt any of this.
There's currently no test case for the masked/cond_ variants, they seem
to be difficult to trigger with the vectorizer. Suggestions for a test
case for them welcome.
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_vgf2p8affine_shift_matrix):
New function to look up shift/rotate matrices for gf2p8affine.
* config/i386/i386-protos.h (ix86_vgf2p8affine_shift_matrix):
Declare new function.
* config/i386/i386.cc (ix86_shift_rotate_cost): Add cost model
for shift/rotate implemented using gf2p8affine.
* config/i386/sse.md (VI1_AVX512_3264): New mode iterator.
(<insn><mode>3): Add GFNI case for shift patterns.
(cond_<insn><mode>3): New pattern.
(<insn><mode>3<mask_name>): Ditto.
(<insn>v16qi): New rotate pattern to handle XOP V16QI case
and GFNI.
(rotl<mode>3, rotr<mode>3): Exclude V16QI case.
gcc/testsuite/ChangeLog:
* gcc.target/i386/shift-gf2p8affine-1.c: New test.
* gcc.target/i386/shift-gf2p8affine-2.c: New test.
* gcc.target/i386/shift-gf2p8affine-3.c: New test.
* gcc.target/i386/shift-v16qi-4.c: New test.
* gcc.target/i386/shift-gf2p8affine-5.c: New test.
* gcc.target/i386/shift-gf2p8affine-6.c: New test.
* gcc.target/i386/shift-gf2p8affine-7.c: New test.
|
|
I can't believe I made such a stupid pasto and the regression test
didn't detect anything wrong.
PR target/121634
gcc/
* config/loongarch/simd.md (simd_maddw_evod_<mode>_<su>): Use
WVEC_HALF instead of WVEC for the mode of the sign_extend for
the rhs of multiplication.
gcc/testsuite/
* gcc.target/loongarch/pr121634.c: New test.
|
|
I got too clever trying to simplify the right shift computation in my recent
ifcvt patch. Interestingly enough, I haven't seen anything but the Linaro CI
configuration actually trip the problem, though the code is clearly wrong.
The problem I was trying to avoid were the leading zeros when calling clz on a
HWI when the real object is just say 32 bits.
The net is we get a right shift count of "2" when we really wanted a right
shift count of 30. That causes the execution aspect of bics_3 to fail.
The scan failures are due to creating slightly more efficient code. The new
code sequences don't need to use conditional execution for selection and thus
we can use bic rather than bics, which requires a twiddle in the scan.
I reviewed recent bug reports and haven't seen one for this issue. So no new
testcase as this is covered by the armv7 testsuite in the right configuration.
Bootstrapped and regression tested on x86_64, also verified it fixes the Linaro
reported CI failure and verified the crosses are still happy. Pushing to the
trunk.
gcc/
* ifcvt.cc (noce_try_sign_bit_splat): Fix right shift computation.
gcc/testsuite/
* gcc.target/arm/bics_3.c: Adjust expected output
|
|
|
|
* de.po: Update.
|
|
Fix a typo in the ChangeLog entry from r16-3355-g96a291c4bb0b8a.
|
|
|
|
PR c++/116928
gcc/cp/ChangeLog:
* parser.cc (cp_parser_braced_list): Set greater_than_is_operator_p.
gcc/testsuite/ChangeLog:
* g++.dg/parse/template33.C: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>
|
|
Compile noplt-gd-1.c and noplt-ld-1.c with -mtls-dialect=gnu to support
the --with-tls=gnu2 configure option since they scan the assembly output
for the __tls_get_addr call which is generated by -mtls-dialect=gnu.
PR target/120933
* gcc.target/i386/noplt-gd-1.c (dg-options): Add
-mtls-dialect=gnu.
* gcc.target/i386/noplt-ld-1.c (dg-options): Likewise.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
Allow passing --with-tls= at configure-time to control the default value
of -mtls-dialect= for i386 and x86_64. The default itself (gnu) is not changed
unless --with-tls= is passed.
--with-tls= is already wired up for ARM and RISC-V.
gcc/ChangeLog:
PR target/120933
* config.gcc (supported_defaults): Add tls for i386, x86_64.
* config/i386/i386.h (host_detect_local_cpu): Add tls.
* doc/install.texi: Document --with-tls= for i386, x86_64.
|
|
The old C style was cumbersome, making one responsible for manually
creating and passing a closure in two parts (a separate function and
a *_info class for closed-over variables).
With C++ lambdas, we can just:
- derive environment types implicitly
- have fewer stray static functions
Also thanks to templates we can
- make the return type polymorphic, to avoid casting pointee types.
Note that `struct spec_path` was *not* converted because it is used
multiple times. We could still convert to a lambda, but we would want to
put the for_each_path call with that lambda inside a separate function
anyways, to support the multiple callers. Unlike the other two
refactors, it is not clear that this one would make anything shorter.
Instead, I define the `operator()` explicitly. Keeping the explicit
struct gives us some nice "named arguments", versus the wrapper function
alternative, too.
gcc/ChangeLog:
* gcc.cc (for_each_path): Templatize, to make passing lambdas
possible/easy/safe, and to have a polymorphic return type.
(struct add_to_obstack_info): Deleted, lambda captures replace
it.
(add_to_obstack): Moved to lambda in build_search_list.
(build_search_list): Has above lambda now.
(struct file_at_path_info): Deleted, lambda captures replace
it.
(file_at_path): Moved to lambda in find_a_file.
(find_a_file): Has above lambda now.
(struct spec_path_info): Renamed to just struct spec_path.
(struct spec_path): New name.
(spec_path): Renamed to spec_path::operator().
(spec_path::operator()): New name.
(do_spec_1): Updated for_each_path call sites.
Signed-off-by: John Ericson <git@JohnEricson.me>
Reviewed-by: Jason Merrill <jason@redhat.com>
|
|
defining module [PR120499]
In the PR, we're getting a linker error from _Vector_impl's destructor
never getting emitted. This is because of a combination of factors:
1. in imp-member-4_a, the destructor is not used and so there is no
definition generated.
2. in imp-member-4_b, the destructor gets synthesized (as part of the
synthesis for Coll's destructor) but is not ODR-used and so does not
get emitted. Despite there being a definition provided in this TU,
the destructor is still considered imported and so isn't streamed
into the module body.
3. in imp-member-4_c, we need to ODR-use the destructor but we only got
a forward declaration from imp-member-4_b, so we cannot emit a body.
The point of failure here is step 2; this function has effectively been
declared in the imp-member-4_b module, and so we shouldn't treat it as
imported. This way we'll properly stream the body so that importers can
emit it.
PR c++/120499
gcc/cp/ChangeLog:
* method.cc (synthesize_method): Set the instantiating module.
gcc/testsuite/ChangeLog:
* g++.dg/modules/imp-member-4_a.C: New test.
* g++.dg/modules/imp-member-4_b.C: New test.
* g++.dg/modules/imp-member-4_c.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
|
|
|
|
This patch adds missing guards on shift amounts to prevent UB when the
shift count equals or exceeds HOST_BITS_PER_WIDE_INT.
In the patch (r16-2666-g647bd0a02789f1), shift counts were only checked
for nonzero but not for being within valid bounds. This patch tightens
those conditions by enforcing that shift counts are greater than zero
and less than HOST_BITS_PER_WIDE_INT.
2025-08-23 Kishan Parmar <kishan@linux.ibm.com>
gcc/
PR target/118890
* config/rs6000/rs6000.cc (can_be_rotated_to_negative_lis): Add bounds
checks for shift counts to prevent undefined behavior.
(rs6000_emit_set_long_const): Likewise.
|
|
sign bit test
While working to remove mvconst_internal I stumbled over a regression in
the code to handle signed division by a power of two.
In that sequence we want to select between 0 and 2^n-1 by pairing a sign
bit splat with a subsequent logical right shift. This can be done
without branches or conditional moves.
Playing with it a bit made me realize there are a handful of selections we
can do based on a sign bit test. Essentially there are two broad cases.
Clearing bits after the sign bit splat: we have 0 or -1; if we clear
bits the 0 stays as-is, but the -1 could easily turn into 2^n-1, ~2^n-1,
or some small constants.
Setting bits after the sign bit splat: the -1 stays as-is, but the 0
can turn into 2^n, a small constant, etc.
Shreya and I originally started looking at target patterns to do this,
essentially discovering conditional move forms of the selects and
rewriting them into something more efficient. That got out of control
pretty quickly and it relied on if-conversion to initially create the
conditional move.
The better solution is to actually discover the cases during
if-conversion itself. That catches cases that were previously being
missed, checks cost models, and is actually simpler since we don't have
to distinguish between things like ori and bseti, instead we just emit
the natural RTL and let the target figure it out.
In the ifcvt implementation we put these cases just before trying the
traditional conditional move sequences. Essentially these are a last
attempt before trying the generalized conditional move sequence.
This has been bootstrapped and regression tested on aarch64, riscv,
ppc64le, s390x, alpha, m68k, sh4eb, x86_64 and probably a couple others
I've forgotten. It's also been tested on the other embedded targets.
Obviously the new tests are risc-v specific, so that testing was
primarily to make sure we didn't ICE, generate incorrect code, or regress
existing target-specific tests.
Raphael has some changes to attack this from the gimple direction as
well. I think the latest version of those is on me to push through
internal review.
PR rtl-optimization/120553
gcc/
* ifcvt.cc (noce_try_sign_bit_splat): New function.
(noce_process_if_block): Use it.
gcc/testsuite/
* gcc.target/riscv/pr120553-1.c: New test.
* gcc.target/riscv/pr120553-2.c: New test.
* gcc.target/riscv/pr120553-3.c: New test.
* gcc.target/riscv/pr120553-4.c: New test.
* gcc.target/riscv/pr120553-5.c: New test.
* gcc.target/riscv/pr120553-6.c: New test.
* gcc.target/riscv/pr120553-7.c: New test.
* gcc.target/riscv/pr120553-8.c: New test.
|
|
We passed the reduc_info which is close, but the representative is
more spot on and will not collide with making the reduc_info a
distinct type.
* tree-vect-loop.cc (vectorizable_live_operation): Pass
the representative of the PHIs node to
vect_create_epilog_for_reduction.
|
|
STMT_VINFO_REDUC_VECTYPE_IN exists on relevant reduction stmts, not
the reduction info. And STMT_VINFO_DEF_TYPE exists on the
reduction info. The following fixes up a few places.
* tree-vect-loop.cc (vectorizable_lane_reducing): Get
reduction info properly. Adjust checks according to
comments.
(vectorizable_reduction): Do not set STMT_VINFO_REDUC_VECTYPE_IN
on the reduc info.
(vect_transform_reduction): Query STMT_VINFO_REDUC_VECTYPE_IN
on the actual reduction stmt, not the info.
|
|
Add run and asm check test cases for scalar unsigned SAT_MUL form 3.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat/sat_u_mul-4-u16-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-4-u16-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-4-u16-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-4-u16-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-4-u32-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-4-u32-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-4-u32-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-4-u64-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-4-u8-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-4-u8-from-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-4-u8-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-4-u8-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-4-u8-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u16-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u16-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u16-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u16-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u32-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u32-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u32-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u64-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u8-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u8-from-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u8-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u8-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u8-from-u64.rv32.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch would like to try to match the unsigned
SAT_MUL form 3, aka below:
#define DEF_SAT_U_MUL_FMT_3(NT, WT) \
NT __attribute__((noinline)) \
sat_u_mul_##NT##_from_##WT##_fmt_3 (NT a, NT b) \
{ \
WT x = (WT)a * (WT)b; \
if ((x >> sizeof(a) * 8) == 0) \
return (NT)x; \
else \
return (NT)-1; \
}
Here WT is uint16_t, uint32_t, uint64_t or uint128_t,
and NT is uint8_t, uint16_t, uint32_t or uint64_t.
gcc/ChangeLog:
* match.pd: Add form 3 for unsigned SAT_MUL.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
For the beginning basic block:
(note 4 0 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 2 4 26 2 NOTE_INSN_FUNCTION_BEG)
emit the TLS call after NOTE_INSN_FUNCTION_BEG.
gcc/
PR target/121635
* config/i386/i386-features.cc (ix86_emit_tls_call): Emit the
TLS call after NOTE_INSN_FUNCTION_BEG.
gcc/testsuite/
PR target/121635
* gcc.target/i386/pr121635-1a.c: New test.
* gcc.target/i386/pr121635-1b.c: Likewise.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
REDUC_GROUP_FIRST_ELEMENT is often checked to see whether we are
dealing with a SLP reduction or a reduction chain. When we are
in the context of analyzing the reduction (so we are sure
the SLP instance we see is correct), then we can use the SLP
instance kind instead.
* tree-vect-loop.cc (get_initial_defs_for_reduction): Adjust
comment.
(vect_create_epilog_for_reduction): Get at the reduction
kind via the instance, re-use the slp_reduc flag instead
of checking REDUC_GROUP_FIRST_ELEMENT again.
Remove unreachable code.
(vectorizable_reduction): Compute a reduc_chain flag from
the SLP instance kind, avoid REDUC_GROUP_FIRST_ELEMENT
checks.
(vect_transform_cycle_phi): Likewise.
(vectorizable_live_operation): Check the SLP instance
kind instead of REDUC_GROUP_FIRST_ELEMENT.
|
|
Linaro CI informed me that this test fails on ARM thumb-m7-hard-eabi.
This appears to be because the target defaults to -fshort-enums, and so
the mangled names are inaccurate.
This patch just disables the implicit type enum test for this case.
gcc/testsuite/ChangeLog:
* g++.dg/abi/mangle83.C: Disable implicit enum test for
-fshort-enums.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
|
|
The following removes the use of STMT_VINFO_REDUC_* from parloops,
also fixing a mistake with analyzing double reductions which rely
on the outer loop vinfo so the inner loop is properly detected as
nested.
* tree-parloops.cc (parloops_is_simple_reduction): Pass
in double reduction inner loop LC phis and query that.
(parloops_force_simple_reduction): Similar, but set it.
Check for valid reduction types here.
(valid_reduction_p): Remove.
(gather_scalar_reductions): Adjust, fixup double
reduction inner loop processing.
|
|
gcc/ChangeLog:
* config/riscv/t-rtems: Add -mstrict-align multilibs for
targets without support for misaligned access in hardware.
|
|
Without stating the architecture version required by the test, test
runs with options that are incompatible with the required
architecture version fail, e.g. -mfloat-abi=hard.
armv7 was not covered by the long list of arm variants in
target-supports.exp, so add it, and use it for the effective target
requirement and for the option.
for gcc/testsuite/ChangeLog
PR rtl-optimization/120424
* lib/target-supports.exp (arm arches): Add arm_arch_v7.
* g++.target/arm/pr120424.C: Require armv7 support. Use
dg-add-options arm_arch_v7 instead of explicit -march=armv7.
|
|
|
|
PR fortran/121627
gcc/fortran/ChangeLog:
* module.cc (create_int_parameter_array): Avoid NULL
pointer dereference and enhance error message.
gcc/testsuite/ChangeLog:
* gfortran.dg/pr121627.f90: New test.
|
|
For cores without a hardware multiplier, set the respective optabs
to library functions which use a software implementation of
multiplication.
The implementation was copied from the RL78 backend.
gcc/ChangeLog:
* config/pru/pru.cc (pru_init_libfuncs): Set softmpy libgcc
functions for optab multiplication entries if TARGET_OPT_MUL
option is not set.
libgcc/ChangeLog:
* config/pru/libgcc-eabi.ver: Add __pruabi_softmpyi and
__pruabi_softmpyll symbols.
* config/pru/t-pru: Add softmpy source files.
* config/pru/pru-softmpy.h: New file.
* config/pru/softmpyi.c: New file.
* config/pru/softmpyll.c: New file.
Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
|
|
Enable multilib builds for contemporary PRU core versions (AM335x and
later), and older versions present in AM18xx.
gcc/ChangeLog:
* config.gcc: Include pru/t-multilib.
* config/pru/pru.h (MULTILIB_DEFAULTS): Define.
* config/pru/t-multilib: New file.
Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
|
|
Older PRU core versions (e.g. in AM1808 SoC) do not support
XIN, XOUT, FILL, ZERO instructions. Add GCC command line options to
optionally disable generation of those instructions, so that code
can be executed on such older PRU cores.
gcc/ChangeLog:
* common/config/pru/pru-common.cc (TARGET_DEFAULT_TARGET_FLAGS):
Keep multiplication, FILL and ZERO instructions enabled by
default.
* config/pru/pru.md (prumov<mode>): Gate code generation on
TARGET_OPT_FILLZERO.
(mov<mode>): Ditto.
(zero_extendqidi2): Ditto.
(zero_extendhidi2): Ditto.
(zero_extendsidi2): Ditto.
(@pru_ior_fillbytes<mode>): Ditto.
(@pru_and_zerobytes<mode>): Ditto.
(@<code>di3): Ditto.
(mulsi3): Gate code generation on TARGET_OPT_MUL.
* config/pru/pru.opt: Add mmul and mfillzero options.
* config/pru/pru.opt.urls: Regenerate.
* config/rl78/rl78.opt.urls: Regenerate.
* doc/invoke.texi: Document new options.
Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
|