2025-09-05 Jakub Jelinek <jakub@redhat.com>
* J: Remove.
|
|
On Tue, Jul 01, 2025 at 02:50:40PM -0500, Segher Boessenkool wrote:
> No tests become good tests without effort. And tests that are not good
> tests require constant maintenance!
Here are two patches, either just the first one or both can be used
and both were tested on powerpc64le-linux.
The second one adds 8 further tests, which are dg-do run, #include
the former tests, don't do any dump tests and just define the checking/main
for those.
2025-09-05 Jakub Jelinek <jakub@redhat.com>
PR testsuite/118567
* gcc.target/powerpc/vsx-vectorize-9.c: New test.
* gcc.target/powerpc/vsx-vectorize-10.c: New test.
* gcc.target/powerpc/vsx-vectorize-11.c: New test.
* gcc.target/powerpc/vsx-vectorize-12.c: New test.
* gcc.target/powerpc/vsx-vectorize-13.c: New test.
* gcc.target/powerpc/vsx-vectorize-14.c: New test.
* gcc.target/powerpc/vsx-vectorize-15.c: New test.
* gcc.target/powerpc/vsx-vectorize-16.c: New test.
|
|
On Tue, Jul 01, 2025 at 02:50:40PM -0500, Segher Boessenkool wrote:
> No tests become good tests without effort. And tests that are not good
> tests require constant maintenance!
Here are two patches, either just the first one or both can be used
and both were tested on powerpc64le-linux.
The first one removes all the checking etc. stuff from the testcases,
as they are just dg-do compile, for the vectorize dump checks all we
care about are the vectorized loops they want to test.
2025-09-05 Jakub Jelinek <jakub@redhat.com>
PR testsuite/118567
* gcc.target/powerpc/vsx-vectorize-1.c: Remove includes, checking
part of main1 and main.
* gcc.target/powerpc/vsx-vectorize-2.c: Remove includes, replace
bar definition with declaration, remove main.
* gcc.target/powerpc/vsx-vectorize-3.c: Likewise.
* gcc.target/powerpc/vsx-vectorize-4.c: Likewise.
* gcc.target/powerpc/vsx-vectorize-5.c: Likewise.
* gcc.target/powerpc/vsx-vectorize-6.c: Likewise.
* gcc.target/powerpc/vsx-vectorize-7.c: Likewise.
* gcc.target/powerpc/vsx-vectorize-8.c: Likewise.
|
|
Unlike Advanced SIMD, SVE has instructions to perform smin, smax, umin, umax
on 64-bit elements. Thus, we can use them with the fixed-width V2DImode
expander. Most of the machinery is already there on the define_insn side,
supporting V2DImode operands of the SVE pattern. We just need to wire up
the RTL emission to the v2di standard names for the TARGET_SVE case.
So for the smin case we now generate:
min_di:
ldr q30, [x0]
ptrue p7.b, all
ldr q31, [x1]
smin z30.d, p7/m, z30.d, z31.d
str q30, [x2]
ret
min_imm_di:
ldr q31, [x0]
smin z31.d, z31.d, #5
str q31, [x2]
ret
instead of the previous:
min_di:
ldr q30, [x0]
ldr q31, [x1]
cmgt v29.2d, v30.2d, v31.2d
bsl v29.16b, v31.16b, v30.16b
str q29, [x2]
ret
min_imm_di:
ldr q31, [x0]
mov z30.d, #5
cmgt v29.2d, v30.2d, v31.2d
bsl v29.16b, v31.16b, v30.16b
str q29, [x2]
ret
The register operand case is the same length, though the new ptrue can now be
shared and moved away. But the immediate operand case is obviously better
as the SVE immediate form doesn't require a predicate operand.
Bootstrapped and tested on aarch64-none-linux-gnu.
Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
gcc/
* config/aarch64/iterators.md (sve_di_suf): New mode attribute.
* config/aarch64/aarch64-sve.md (<optab><mode>3 SVE_INT_BINARY_MULTI):
Rename to...
(<optab><mode>3<sve_di_suf>): ... This. Use SVE_I_SIMD_DI mode
iterator.
* config/aarch64/aarch64-simd.md (<su><maxmin>v2di3): Use the above
for TARGET_SVE.
gcc/testsuite/
* gcc.target/aarch64/sve/usminmax_di.c: New test.
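A minimal sketch of source that could lead to the min_di and min_imm_di
sequences above when vectorized as V2DImode. The function bodies and
signatures are assumptions inferred from the assembly (three pointer
arguments in x0, x1, x2); they are not copied from usminmax_di.c.

```c
#include <stdint.h>

/* Element-wise signed minimum of two 2 x 64-bit vectors.
   Hypothetical source; the real test may be written differently.  */
void min_di(const int64_t *a, const int64_t *b, int64_t *restrict c)
{
  for (int i = 0; i < 2; i++)
    c[i] = a[i] < b[i] ? a[i] : b[i];
}

/* Same operation against the immediate 5; b is unused and kept only so
   that the result pointer arrives in the third argument register.  */
void min_imm_di(const int64_t *a, const int64_t *b, int64_t *restrict c)
{
  (void) b;
  for (int i = 0; i < 2; i++)
    c[i] = a[i] < 5 ? a[i] : 5;
}
```

Advanced SIMD has no 64-bit element min/max instruction, which is why the
old sequence needed a cmgt plus bsl pair for both functions.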
|
|
2025-09-04 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/84432
PR fortran/114815
* expr.cc (gfc_check_assign_symbol): Check that components in a
PDT with a default initializer have type and length parameters
that reduce to constant integer expressions.
* trans-expr.cc (gfc_trans_assignment_1): Parameterized
components cannot have default initializers so they must be
allocated after initialization.
gcc/testsuite/
PR fortran/84432
PR fortran/114815
* gfortran.dg/pdt_26.f03: Update with default no initializer.
* gfortran.dg/pdt_27.f03: Change to test non-conforming
initializers.
|
|
2025-09-05 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/83762
PR fortran/102457
* decl.cc (gfc_get_pdt_instance): Check that variable PDT parm
expressions are of type integer. Note that the symbol must be
tested since the expression often appears as BT_PROCEDURE.
gcc/testsuite/
PR fortran/83762
PR fortran/102457
* gfortran.dg/pdt_44.f03: New test.
* gfortran.dg/pr95090.f90: Give the PDT parameter a value to
suppress the type error.
|
|
|
|
cost 0, 1 and 15
Add asm dump check and run test for vec_duplicate + vmadd.vv
combine to vmadd.vx, with GR2VR costs of 0, 2 and 15.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check
for vmadd.vx.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_ternary.h: Add test
helper macros.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_ternary_data.h: Add test
data for run test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmadd-run-1-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmadd-run-1-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmadd-run-1-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmadd-run-1-u8.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
0, 1 and 15
Add asm dump check and run test for vec_duplicate + vmadd.vv
combine to vmadd.vx, with GR2VR costs of 0, 2 and 15.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check
for vmadd.vx.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_ternary.h: Add test
helper macros.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_ternary_data.h: Add test
data for run test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmadd-run-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmadd-run-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmadd-run-1-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmadd-run-1-i8.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
To avoid generating the vmadd.vx code.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx_ternary.h: Adjust the
vmacc.vx to avoid generating vmadd.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch combines vec_duplicate + vmadd.vv into vmadd.vx, as in the
example code below. The related pattern depends on the cost of
vec_duplicate from GR2VR: late-combine performs the combination if the
cost of GR2VR is zero, and rejects it if the GR2VR cost is greater than
zero.
Assume we have example code like below, with a GR2VR cost of 0.
Before this patch:
beq a3,zero,.L8
vsetvli a5,zero,e32,m1,ta,ma
vmv.v.x v2,a2
...
.L3:
vsetvli a5,a3,e32,m1,ta,ma
...
vmadd.vv v1,v2,v3
...
bne a3,zero,.L3
After this patch:
beq a3,zero,.L8
...
.L3:
vsetvli a5,a3,e32,m1,ta,ma
...
vmadd.vx v1,a2,v3
...
bne a3,zero,.L3
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*vmacc_vx_<mode>): Rename to
handle both the macc and madd.
(*mul_plus_vx_<mode>): Add madd pattern.
* config/riscv/vector.md (@pred_mul_plus_vx_<mode>): Rename to
handle both the macc and madd.
(*pred_macc_<mode>_scalar_undef): Remove.
(*pred_nmsac_<mode>_scalar_undef): Remove.
(*pred_mul_plus_vx<mode>_undef): Add new pattern to handle
both the vmacc and vmadd.
(@pred_mul_plus_vx<mode>): Ditto.
Signed-off-by: Pan Li <pan2.li@intel.com>
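A minimal sketch of a loop with the multiply-add shape that vmadd.vx
implements (vd = rs1 * vd + vs2, so the scalar multiplies the destination
vector). The function name and signature are hypothetical and are not taken
from vx_ternary.h.

```c
#include <stddef.h>
#include <stdint.h>

/* acc[i] = x * acc[i] + addend[i].  The scalar x is the value that would
   otherwise be broadcast with vmv.v.x before a vmadd.vv.  */
void madd_scalar(int32_t *restrict acc, const int32_t *restrict addend,
                 int32_t x, size_t n)
{
  for (size_t i = 0; i < n; i++)
    acc[i] = x * acc[i] + addend[i];
}
```

With a GR2VR cost of zero the broadcast is free to fold into the vmadd.vx
form; with a non-zero cost the commit text says the combination is rejected.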
|
|
In r16-3414 libstdc++ changed its ABI (for still experimental C++20) and now
uses the unordered value -128 instead of 2. Generally the change improved code
generation on all targets tested, see
https://gcc.gnu.org/pipermail/gcc-patches/2025-August/693534.html
for details.
In r16-3474 I've adjusted the middle-end and backends to use that value.
This apparently broke the spaceship_1.C test on aarch64 which scans the
exact function bodies which are now different.
The following patch adjusts the full body patterns to match. On these
2 routines, the generated code is 1 insn longer than in the past, so if
you have ideas how to change the code generation for the common case of
-1, 0, 1, -128 value, maybe it could be improved.
2025-09-04 Jakub Jelinek <jakub@redhat.com>
PR testsuite/121732
PR target/117013
* g++.target/aarch64/spaceship_1.C: Adjust expected fn bodies
for _Z8ss_floatff and _Z9ss_doubledd.
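A minimal C sketch of the four-way result that a floating-point three-way
comparison has to produce, using the values named above (-1, 0, 1 and -128
for unordered). The function is only an illustration named after the
demangled _Z8ss_floatff; the real test is C++ and uses operator<=>.

```c
/* Three-way compare of two floats.  Returns -1, 0 or 1 for ordered
   operands and -128 when either operand is a NaN (unordered).  */
int ss_float(float a, float b)
{
  if (a == b)
    return 0;
  if (a < b)
    return -1;
  if (a > b)
    return 1;
  return -128;  /* All ordered comparisons failed: unordered.  */
}
```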
|
|
With -fpartial-profling we ICE building perlbench and gcc from spec2k17 since
afdo_annotate_cfg applies knowledge about zero profiles too early. This patch
moves it after the early exit when the profile is 0 everywhere and also fixes
a formatting issue in the next block.
gcc/ChangeLog:
* auto-profile.cc (afdo_annotate_cfg): Apply zero_bbs after early
exit for missing profile; fix formatting.
|
|
With auto-FDO it is possible that function bar with non-zero profile is inlined
into foo with zero profile and foo is the only caller of it. In this case
we currently scale bar to also have zero profile which makes it optimized
for size. With normal profiles this does not happen, since basic blocks with
non-zero count must have some way to be reached.
This patch makes the inliner scale the caller in this case, which mitigates the
problem (to some degree).
Bootstrapped/regtested x86_64-linux, plan to commit it shortly.
gcc/ChangeLog:
* ipa-inline-transform.cc (inline_call): If function with
AFDO profile is inlined into function with
GUESSED_GLOBAL0_AFDO or GUESSED_GLOBAL0_ADJUSTED, scale
caller to AFDO profile.
* profile-count.h (profile_count::apply_scale): If num is AFDO
and den is not GUESSED, make result AFDO rather than GUESSED.
|
|
Add an optab for isnan. This requires changes to the existing folding code
to extend the interclass_mathfn infrastructure to support BUILT_IN_ISNAN.
It now checks for a valid optab before emitting the generic expansion.
There is no change if no optab is defined. Update documentation.
gcc:
* builtins.cc (interclass_mathfn_icode): Add support for isnan
optab.
(expand_builtin): Add BUILT_IN_ISNAN to expand isnan optab.
(fold_builtin_interclass_mathfn): Expand BUILT_IN_ISNAN only after
checking for a valid optab.
(fold_builtin_classify): Move generic BUILT_IN_ISNAN expansion
to fold_builtin_interclass_mathfn.
(fold_builtin_1): For BUILT_IN_ISNAN first try fold_builtin_classify,
then fold_builtin_interclass_mathfn.
* optabs.def: Add isnan optab.
* doc/md.texi: Document isnan.
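For illustration, a small sketch of the two forms involved: the builtin
call that a target isnan optab would expand directly, and an unordered
self-comparison, which is my understanding of what the generic expansion
amounts to. The wrapper names are hypothetical.

```c
/* Call that can be expanded through an isnan optab when the target
   defines one.  */
int isnan_builtin(double x)
{
  return __builtin_isnan(x);
}

/* Generic fallback form (assumed): a value is a NaN exactly when it
   compares unordered with itself.  */
int isnan_generic(double x)
{
  return __builtin_isunordered(x, x);
}
```

Both functions should agree for every input; only the instruction sequence
chosen by the target differs.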
|
|
The following removes back-and-forth of state in
vect_create_epilog_for_reduction and code that's pointless, in
particular around double reduction handling, which isn't as
special as it seems.
* tree-vect-loop.cc (vect_create_epilog_for_reduction):
Remove unnecessary code around double reductions.
|
|
Insufficient validation of the operands in vec_set_<mode>_internal
means that the optimizers can transform the expanded code into
something that is invalid. We then emit code based on the incorrect
RTL assuming that it is still valid. A valid pattern can only have a
single bit set in the immediate operand, representing the lane to be
written.
gcc/ChangeLog:
PR target/121775
* config/arm/neon.md (vec_set<mode>_internal, all variants):
validate the immediate operand that indicates the lane to
modify.
gcc/testsuite/ChangeLog:
PR target/121775
* gcc.target/arm/simd/vset_lane_u8.c: New test.
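The validity condition described above can be sketched in plain C. This is
an illustration of the check only, not the predicate used in neon.md; the
function name and the nunits parameter are hypothetical.

```c
#include <stdbool.h>

/* Return true if mask has exactly one bit set and that bit selects a
   lane below nunits.  */
bool lane_mask_valid(unsigned mask, unsigned nunits)
{
  /* Reject zero and any value with more than one bit set.  */
  if (mask == 0 || (mask & (mask - 1)) != 0)
    return false;

  /* The lane index is the position of the single set bit.  */
  unsigned lane = 0;
  while ((mask >> lane) != 1u)
    lane++;

  return lane < nunits;
}
```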
|
|
The following removes never taken paths and consolidates the
nested_cycle and double_reduc variables which are the same.
* tree-vect-loop.cc (vectorizable_reduction): Eliminate
nested_cycle in favor of double_reduc and set that where
it makes most sense. Remove never taken paths and always
true conditions.
|
|
This fixes a glaring mistake in yesterday's change to the expansion of
vec_perm. We should of course move tmp_target into the real target
and not the other way around. I wonder why my testing hasn't
caught this...
PR target/121742
PR target/121780
PR target/121781
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_vec_perm): Swap target and
tmp_target.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr121780.c: New test.
* gcc.target/riscv/rvv/autovec/pr121781.c: New test.
|
|
Since peeling and versioning for alignment for VLA modes was introduced
(r16-3065-geee51f9a4b6) we have been seeing a lot of test suite failures
like
internal compiler error: in apply_scale, at profile-count.h:1187
This is because vect_gen_prolog_loop_niters sets the prolog bound to -1
in case align_in_elems is a non-constant poly_int.
bound - 1 is later used to scale the loop profile in scale_loop_profile
so we try to calculate with an assumed -2 iterations.
This patch changes bound_prolog to poly_int64, using a poly estimate for
frequency scaling but only records an iteration bound for the prolog if
the bound is a scalar.
PR tree-optimization/121523
gcc/ChangeLog:
* tree-vect-loop-manip.cc (vect_gen_prolog_loop_niters):
Change prolog bound to poly_int64.
(vect_gen_scalar_loop_niters): Ditto.
(vect_do_peeling): Use poly estimate for frequency scaling.
|
|
The following changes how we detect double reductions, in particular
not setting vect_double_reduction_def on the outer PHIs when the inner
loop doesn't satisfy double reduction constraints. It also simplifies
the setup a bit by not having to detect whether we process an inner
loop of a double reduction.
PR tree-optimization/121768
* tree-vect-loop.cc (vect_inner_phi_in_double_reduction_p): Remove.
(vect_analyze_scalar_cycles_1): Analyze inner loops of
double reductions immediately and only mark fully recognized
double reductions. Skip already analyzed inner loops.
(vect_is_simple_reduction): Change double_reduc from a flag
to an output of the inner loop PHI and to whether we are
processing an inner loop of a double reduction.
* gcc.dg/vect/pr121768.c: New testcase.
|
|
When inside a method, we know the this pointer points to
an object of at least the size of the method's base type. We
can use this to compute more references as not trapping and
enable invariant motion and in turn vectorization as for a
slightly modified version of the testcase in the PR.
PR tree-optimization/121685
* tree-eh.cc (ref_outside_object_p): Split out from ...
(tree_could_trap_p): ... here. Assume the this pointer
of a method refers to an object of at least size of its
base type.
* g++.dg/vect/pr121685-1.cc: New testcase.
|
|
Currently the code rejects:
```
tmp = *a;
*b = tmp;
```
(unless *a == *b). This can be improved such that if a and b are known to
share the same base, then only reject it if they overlap; that is the
difference of the offsets (from the base) is maybe less than the size.
This fixes the testcase in comment #0 of PR 107051.
Changes since v1:
* v2: Use ranges_maybe_overlap_p instead of manually checking the overlap.
Allow for the case where the alignment is known to be greater than
the size.
PR tree-optimization/107051
gcc/ChangeLog:
* tree-ssa-forwprop.cc (optimize_agr_copyprop_1): Allow for
memory sharing the same base if they are known not to overlap over
the size.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/copy-prop-aggregate-union-1.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
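The overlap condition described above (the difference of the offsets from
the shared base is less than the size) can be sketched as follows. This is
a constant-offset illustration with a hypothetical name; the patch itself
uses ranges_maybe_overlap_p, which also has to handle offsets that are not
compile-time constants.

```c
#include <stdbool.h>
#include <stdint.h>

/* Two accesses of the same size at byte offsets off_a and off_b from a
   common base overlap when the distance between them is less than size.  */
bool copies_may_overlap(int64_t off_a, int64_t off_b, int64_t size)
{
  int64_t diff = off_a > off_b ? off_a - off_b : off_b - off_a;
  return diff < size;
}
```

Note that identical offsets report an overlap here; the exact-match case
(*a == *b) is the one the existing code already accepted separately.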
|
|
Previously, vector built-in functions were not properly registered during
the LTO pipeline, causing link failures when vector intrinsics were used
in LTO builds with mixed architecture options. This patch ensures all
vector built-in functions are always registered during LTO compilation.
The key changes include:
- Moving pragma intrinsic flag manipulation from riscv-c.cc to
riscv-vector-builtins.cc for better encapsulation
- Registering all vector built-in functions regardless of current ISA
extensions, deferring the actual extension checking to expansion time
- Adding proper support for built-in type registration during LTO
This approach is safe because we already perform extension requirement
checking at expansion time. The trade-off is a slight increase in
bootstrap time for LTO builds due to registering more built-in functions.
PR target/110812
gcc/ChangeLog:
* config/riscv/riscv-c.cc (pragma_intrinsic_flags): Remove struct.
(riscv_pragma_intrinsic_flags_pollute): Remove function.
(riscv_pragma_intrinsic_flags_restore): Remove function.
(riscv_pragma_intrinsic): Simplify to only call handle_pragma_vector.
* config/riscv/riscv-vector-builtins.cc (pragma_intrinsic_flags):
Move struct definition here from riscv-c.cc.
(riscv_pragma_intrinsic_flags_pollute): Move and adapt from
riscv-c.cc, add zvfbfmin, zvfhmin and vector_elen_bf_16 support.
(riscv_pragma_intrinsic_flags_restore): Move from riscv-c.cc.
(rvv_switcher::rvv_switcher): Add pollute_flags parameter to
control flag manipulation.
(rvv_switcher::~rvv_switcher): Restore flags conditionally.
(register_builtin_types): Use rvv_switcher without polluting flags.
(get_required_extensions): Remove function.
(check_required_extensions): Simplify to only check type validity.
(function_instance::function_returns_void_p): Move implementation
from header.
(function_builder::add_function): Register placeholder for LTO.
(init_builtins): Simplify and handle LTO case.
(reinit_builtins): Remove function.
(handle_pragma_vector): Remove extension checking.
* config/riscv/riscv-vector-builtins.h
(function_instance::function_returns_void_p): Add declaration.
(function_call_info::function_returns_void_p): Remove inline
implementation.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/lto/pr110812_0.c: New test.
* gcc.target/riscv/lto/pr110812_1.c: New test.
* gcc.target/riscv/lto/riscv-lto.exp: New test driver.
* gcc.target/riscv/lto/riscv_vector.h: New header wrapper.
|
|
The extension subset check logic in riscv_ext_is_subset was incorrectly
inverted, causing functions with more extensions to be incorrectly
rejected from being inlined into functions with fewer extensions.
This patch fixes the logic to correctly check if the callee's required
extensions are a subset of the caller's extensions. The corrected logic
now properly allows inlining when the caller has all the extensions that
the callee requires.
gcc/
* common/config/riscv/riscv-common.cc (riscv_ext_is_subset): Fix
inverted logic in extension subset check.
gcc/testsuite/
* gcc.target/riscv/can_inline_p_test-01.c: New test.
* gcc.target/riscv/can_inline_p_test-02.c: New test.
* gcc.target/riscv/can_inline_p_test-03.c: New test.
* gcc.target/riscv/can_inline_p_test-04.c: New test.
* gcc.target/riscv/riscv_vector.h: New header wrapper for vector
tests.
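The direction of the subset test is easy to get backwards, so here is a
minimal sketch using bitmasks. The bitmask representation and the function
name are assumptions for illustration; riscv_ext_is_subset works on GCC's
own option structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Inlining is allowed when every extension the callee requires is also
   enabled in the caller, i.e. the callee set is a subset of the caller
   set.  */
bool ext_is_subset(uint64_t callee_exts, uint64_t caller_exts)
{
  return (callee_exts & ~caller_exts) == 0;
}
```

Swapping the two arguments gives the inverted check: it would refuse to
inline a plain callee into a caller with extra extensions and accept the
unsafe opposite case.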
|
|
This patch fixes regressions of the gcc.dg/torture/bitint-* tests
caused by r16-3036-ga76a032354ee48 with --enable-checking=all.
The errors are similar to the following:
../../gcc/testsuite/gcc.dg/torture/bitint-14.c:54:1: error: type mismatch in 'array_ref'
<unnamed-signed:63>
unsigned long
_42 = VIEW_CONVERT_EXPR<unsigned long[10]>(r575[i_10])[8];
during GIMPLE pass: bitintlower0
../../gcc/testsuite/gcc.dg/torture/bitint-14.c:54:1: internal compiler error: verify_gimple failed
The first two hunks aren't strictly necessary, I'm just trying to
avoid calling build_qualified_type when it won't be needed.
At least on s390x-linux (tried cross) bitint-14.c doesn't ICE with it
anymore.
Though, I must say the more I look at the limb_access changes, the less
I like the abi_load_p stuff, so I think what we eventually should do instead
is return values with m_limb_type always.
For bitint_extended case (but only if we can prove that the extension there
is for the right precision and right sign) or !write_p just return it,
otherwise cast to lower precision and back to m_limb_type.
And on the other side on stores, for !bitint_extended happily store whatever
the whole m_limb_type value contains, for bitint_extended do the cast to
smaller precision and back on the writes.
2025-09-04 Jakub Jelinek <jakub@redhat.com>
PR target/117599
* gimple-lower-bitint.cc (bitint_large_huge::limb_access): Move
build_qualified_type calls into the if/else if/else bodies, for
the last one set ltype to m_limb_type first, drop limb_type_a
and use ltype instead.
|
|
The following handles SCEV analysis of a peeled converted IV if
that IV is known to not overflow. For
# _15 = PHI <_4(6), 0(5)>
# i_18 = PHI <i_11(6), 0(5)>
i_11 = i_18 + 1;
_4 = (long unsigned int) i_11;
we cannot analyze _15 directly since the SCC has a widening
conversion. But we can analyze _4 to (long unsigned int) {1, +, 1}_1
which is "peeled" (it's from after the first iteration of _15).
If the un-peeled IV {0, +, 1}_1 has the same initial value as _15
and it does not overflow then _15 can be analyzed as
{0ul, +, 1ul}_1.
The following implements this in simplify_peeled_chrec.
PR tree-optimization/61247
* tree-scalar-evolution.cc (simplify_peeled_chrec):
Handle the case of a converted peeled chrec.
* gcc.dg/vect/vect-pr61247.c: New testcase.
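A minimal sketch of a loop whose SSA form has the shape shown above: prev
plays the role of _15 and lags the converted value of i by one iteration.
The function is hypothetical and is not the testcase from the PR.

```c
/* prev starts at 0 and is updated from the incremented, converted i, so
   it takes the values 0, 1, ..., n-1: the same sequence as the un-peeled
   IV {0, +, 1} viewed as unsigned long.  */
unsigned long sum_prev_index(const unsigned long *a, int n)
{
  unsigned long prev = 0;
  unsigned long sum = 0;
  for (int i = 0; i < n; )
    {
      sum += a[prev];
      i = i + 1;
      prev = (unsigned long) i;
    }
  return sum;
}
```

Because i is a signed int that cannot overflow without undefined
behavior, prev can be treated as a simple unsigned long induction variable.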
|
|
The following makes value-numbering handle a situation like
D.58046 = {};
SR.83_44->i = {};
pretmp_41 = MEM[(struct _Optional_payload_base &)&D.58046 + 8]._M_engaged;
where the intermediate may-def SR.83_44->i = {} prevents CSE of the
load to zero. The problem is two-fold here, one is that the code
skipping may-defs does not handle zeroing via a CTOR, the other is that
(partial) must-defs can be better handled by later code as otherwise
we may not find an appropriate definition to CSE to.
I've noticed we fail to guard against storage-order issues, so fixed
that on the fly.
PR tree-optimization/121740
* tree-ssa-sccvn.cc (vn_reference_lookup_3): Allow skipping
may-defs from CTORs. Do not skip may-defs with storage-order
issues or (partial) must-defs.
* gcc.dg/tree-ssa/ssa-fre-104.c: Un-XFAIL.
* gcc.dg/tree-ssa/ssa-fre-110.c: New testcase.
|
|
On looking again at [basic.lookup.argdep] p4, I believe GCC hasn't fully
implemented the wording here for ADL. This patch fixes two issues.
First, 4.3 indicates that a function exported from a named module should
be visible to ADL regardless of whether it's visible to normal name
lookup, as long as some restrictions are followed.
This patch implements this; for skipping declarations that "do not
appear in the TU containing the point of lookup" I don't think there's
anything special we need to do, as any declarations before the point of
lookup will be found in other ways anyway, and any remaining
declarations from the current TU cannot be seen regardless.
Secondly, currently we only add the exported functions along the
instantiation path of a lookup. But I don't think this is intended by
the current wording, so this patch adjusts that. I also clean up the
logic to do all different module processing in adl_namespace_fns so that
we don't duplicate work in traversing the module binding list
unnecessarily.
This new handling means we need to do some extra work to properly error
on overload sets containing TU-local entities (as this might actually
come up now!) but I'm leaving that for a later patch.
As a drive-by fix this also fixes an ICE for C++26 expansion statements
with finding the instantiation path.
PR c++/117658
gcc/cp/ChangeLog:
* cp-tree.h (get_originating_module): Adjust parameter names.
* module.cc (path_of_instantiation): Handle C++26 expansion
statements.
* name-lookup.cc (name_lookup::adl_namespace_fns): Handle
exported declarations attached to the same module of an
associated entity with the same innermost non-inline namespace,
and non-exported functions on the instantiation path.
(name_lookup::search_adl): Build mapping of namespace to modules
that associated entities are attached to; remove now-unneeded
instantiation path handling.
gcc/testsuite/ChangeLog:
* g++.dg/modules/adl-4_a.C: Test should pass.
* g++.dg/modules/adl-4_b.C: Test should pass.
* g++.dg/modules/adl-6_a.C: New test.
* g++.dg/modules/adl-6_b.C: New test.
* g++.dg/modules/adl-6_c.C: New test.
* g++.dg/modules/adl-7_a.C: New test.
* g++.dg/modules/adl-7_b.C: New test.
* g++.dg/modules/adl-7_c.C: New test.
* g++.dg/modules/adl-8_a.C: New test.
* g++.dg/modules/adl-8_b.C: New test.
* g++.dg/modules/adl-8_c.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
|
|
When we push an existing namespace within the module purview for the
first time, we also need to mark any parent inline namespaces as purview
to not confuse the streaming logic.
PR c++/121724
gcc/cp/ChangeLog:
* name-lookup.cc (push_namespace): Mark inline namespace
contexts as purview if needed.
gcc/testsuite/ChangeLog:
* g++.dg/modules/namespace-12_a.C: New test.
* g++.dg/modules/namespace-12_b.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
|
|
Currently, for Darwin unwind and EH frames are emitted without use
of .cfi_xxx instructions; the emitted frames also contain the
string 'ascii'. For the purpose of this test, omit them.
PR testsuite/112728
gcc/testsuite/ChangeLog:
* gcc.dg/scantest-lto.c: Omit unwind frames.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
|
|
|
|
This extension defines instructions to perform scalar floating-point
conversion between the BFLOAT16 floating-point data and the IEEE-754
32-bit single-precision floating-point (SP) data in a scalar
floating point register.
gcc/ChangeLog:
* config/riscv/andes.def: Add nds_fcvt_s_bf16 and nds_fcvt_bf16_s.
* config/riscv/riscv.md (truncsfbf2): Add TARGET_XANDESBFHCVT support.
(extendbfsf2): Ditto.
* config/riscv/riscv-builtins.cc: New AVAIL andesbfhcvt.
Add new define RISCV_ATYPE_BF and RISCV_ATYPE_SF.
* config/riscv/riscv-ftypes.def: New DEF_RISCV_FTYPE.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/xandes/xandesbfhcvt-1.c: New test.
* gcc.target/riscv/xandes/xandesbfhcvt-2.c: New test.
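As background, a software sketch of the two conversions. BF16 is the upper
16 bits of an IEEE-754 binary32 value, so widening is exact. The narrowing
below uses round-to-nearest-even, which is an assumption; the commit text
does not state the rounding behavior of the Andes instructions, and the
function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Widen BF16 bits to float by placing them in the upper half.  */
float bf16_bits_to_float(uint16_t h)
{
  uint32_t bits = (uint32_t) h << 16;
  float f;
  memcpy(&f, &bits, sizeof f);
  return f;
}

/* Narrow float to BF16 bits, rounding to nearest even (assumed).  */
uint16_t float_to_bf16_bits(float f)
{
  uint32_t bits;
  memcpy(&bits, &f, sizeof bits);
  /* NaN: keep the sign and force a quiet NaN payload bit.  */
  if ((bits & 0x7fffffffu) > 0x7f800000u)
    return (uint16_t) ((bits >> 16) | 0x0040u);
  /* Add half of the discarded range, plus one more on an odd result so
     that ties round to even.  */
  bits += 0x7fffu + ((bits >> 16) & 1u);
  return (uint16_t) (bits >> 16);
}
```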
|
|
This patch adds support for the XAndesperf ISA extension.
The 32-bit AndeStar V5 extension includes branch instructions,
load effective address instructions, and string processing
instructions for performance improvement.
New INSN patterns are added into the new file andes.md
as a separate vendor extension.
gcc/ChangeLog:
* config/riscv/constraints.md (Ou07): New constraint.
(ads_Bext): New constraint.
* config/riscv/iterators.md (ANYLE32): New iterator.
(sizen): New iterator.
(sh_limit): New iterator.
(sh_bit): New iterator.
(cs): New iterator.
* config/riscv/predicates.md (ads_branch_bbcs_operand): New predicate.
(ads_branch_bimm_operand): New predicate.
(ads_imm_extract_operand): New predicate.
(ads_extract_size_imm_si): New predicate.
(ads_extract_size_imm_di): New predicate.
(const_int5_operand): New predicate.
* config/riscv/riscv-builtins.cc:
Add new AVAIL andesperf32 and andesperf64.
Add new define RISCV_ATYPE_DI.
* config/riscv/riscv-ftypes.def: New DEF_RISCV_FTYPE.
* config/riscv/riscv.cc
(riscv_extend_cost): Cost for pattern 'bfo'.
(riscv_rtx_costs): Cost for XAndesperf extension.
* config/riscv/riscv.md: Add support for XAndesperf to patterns
zero_extendsidi2_internal, zero_extendhi2, extendsidi2_internal,
extend<SHORT:mode><SUPERQI:mode>2, <any_extract:optab><GPR:mode>3
and branch_on_bit.
* config/riscv/vector-iterators.md
(sz): Add sign_extract and zero_extract.
* config/riscv/andes.def: New file for vendor Andes.
* config/riscv/andes.md: New file for vendor Andes.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/riscv.exp: Add runtest for subdir xandes.
* gcc.target/riscv/xandes/xandesperf-1.c: New test.
* gcc.target/riscv/xandes/xandesperf-10.c: New test.
* gcc.target/riscv/xandes/xandesperf-2.c: New test.
* gcc.target/riscv/xandes/xandesperf-3.c: New test.
* gcc.target/riscv/xandes/xandesperf-4.c: New test.
* gcc.target/riscv/xandes/xandesperf-5.c: New test.
* gcc.target/riscv/xandes/xandesperf-6.c: New test.
* gcc.target/riscv/xandes/xandesperf-7.c: New test.
* gcc.target/riscv/xandes/xandesperf-8.c: New test.
* gcc.target/riscv/xandes/xandesperf-9.c: New test.
|
|
This patch adds basic support for the following XAndes ISA extensions:
XANDESPERF
XANDESBFHCVT
XANDESVBFHCVT
XANDESVSINTLOAD
XANDESVPACKFPH
XANDESVDOT
gcc/ChangeLog:
* config/riscv/riscv-ext.def: Include riscv-ext-andes.def.
* config/riscv/riscv-ext.opt (riscv_xandes_subext): New variable.
(XANDESPERF) : New mask.
(XANDESBFHCVT): Ditto.
(XANDESVBFHCVT): Ditto.
(XANDESVSINTLOAD): Ditto.
(XANDESVPACKFPH): Ditto.
(XANDESVDOT): Ditto.
* config/riscv/t-riscv: Add riscv-ext-andes.def.
* doc/riscv-ext.texi: Regenerated.
* config/riscv/riscv-ext-andes.def: New file.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/xandes/xandes-predef-1.c: New test.
* gcc.target/riscv/xandes/xandes-predef-2.c: New test.
* gcc.target/riscv/xandes/xandes-predef-3.c: New test.
* gcc.target/riscv/xandes/xandes-predef-4.c: New test.
* gcc.target/riscv/xandes/xandes-predef-5.c: New test.
* gcc.target/riscv/xandes/xandes-predef-6.c: New test.
Co-author: Lino Hsing-Yu Peng (linopeng@andestech.com)
Co-author: Kai Kai-Yi Weng (kaiweng@andestech.com).
|
|
This pattern enables the combine pass (or late-combine, depending on the case)
to merge a vec_duplicate into an smax RTL instruction.
Before this patch, we have two instructions, e.g.:
vfmv.v.f v2,fa0
vfmax.vv v1,v1,v2
After, we get only one:
vfmax.vf v1,v1,fa0
In some cases, it also shaves off one vsetvli.
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*vfmax_vf_<mode>): Rename into...
(*vf<optab>_vf_<mode>): New pattern to combine vec_duplicate +
vf{min,max}.vv into vf{max,min}.vf.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls/floating-point-max-2.c: Adjust scan
dump.
* gcc.target/riscv/rvv/autovec/vls/floating-point-max-4.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfmax. Also add
missing scan-dump for vfmul.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Add vfmax.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_binop.h: Add max functions.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_binop_data.h: Add data for
vfmax.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmax-run-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmax-run-1-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmax-run-1-f64.c: New test.
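A minimal sketch of a loop with a scalar maximum operand of the kind this
pattern targets. The function is hypothetical and not taken from
vf_binop.h. Whether the conditional is recognized as smax depends on the
floating-point options in effect; relaxed semantics such as -ffast-math are
an assumption here, not something the commit text states.

```c
#include <stddef.h>

/* out[i] = max (in[i], f).  The scalar f is what would otherwise be
   broadcast with vfmv.v.f before a vfmax.vv.  */
void max_scalar(float *restrict out, const float *restrict in, float f,
                size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = in[i] > f ? in[i] : f;
}
```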
|
|
PR fortran/121263
gcc/fortran/ChangeLog:
* trans-intrinsic.cc (gfc_conv_intrinsic_transfer): For an
unlimited polymorphic SOURCE to TRANSFER use saved descriptor
if possible.
gcc/testsuite/ChangeLog:
* gfortran.dg/transfer_class_5.f90: New test.
|
|
This is Austin's work to remove the redundant sign extension seen in pr121213.
--
The .w form of amoswap will sign extend its result from 32 to 64 bits, thus any
explicit sign extension insn doing the same is redundant.
This uses Jivan's approach of allocating a DI temporary for an extended result
and using a promoted subreg extraction to get that result into the final
destination.
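As an illustration, source of roughly this shape exchanges a 32-bit value and then widens the old value. It is a sketch under assumptions: the function name `swap_and_widen` is hypothetical and this is not the pr121213.c testcase; it only uses the `__atomic_exchange_n` builtin.

```c
/* Hypothetical example: exchange a 32-bit value and widen the old value.
   On a 64-bit RISC-V target the int -> long conversion is a sign
   extension, which the .w form of amoswap already performs.  */
long
swap_and_widen (int *p, int v)
{
  int old = __atomic_exchange_n (p, v, __ATOMIC_RELAXED);
  return old;
}
```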
Tested with no regressions on riscv32-elf and riscv64-elf and bootstrapped on
the BPI and pioneer systems.
PR target/121213
gcc/
* config/riscv/sync.md (amo_atomic_exchange_extended<mode>):
Separate insn with sign extension for 64 bit targets.
gcc/testsuite
* gcc.target/riscv/amo/pr121213.c: Remove xfail.
|
|
WPA currently does not print profile_info which might have been modified
by profile merging logic. This patch adds dumping logic to the ipa-profile pass.
Bootstrapped/regtested x86_64-linux, committed.
gcc/ChangeLog:
* ipa-profile.cc (ipa_profile): Dump profile_info.
|
|
With -O2 we automatically enable several loop optimizations with -fprofile-use.
The rationale is that those optimizations are enabled only at -O3 mainly because they may
hurt performance or not pay back in code size when used blindly on all loops.
Profile feedback gives us data on the number of iterations, which is used by the heuristics
controlling those optimizations.
Currently auto-FDO is not that good at determining the number of iterations, so I think we
do not want to enable them until we can prove that they are useful.
This primarily affects -O2 codegen.
Theoretically auto-FDO with LBR can be pretty good at estimating the number of
iterations, but to make it useful we will need to implement multiplicity for
discriminators at least.
Bootstrapped/regtested x86_64-linux, committed.
gcc/ChangeLog:
* opts.cc (enable_fdo_optimizations): Do not auto-enable loop
optimizations with AutoFDO.
|
|
Committing as obvious.
Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
gcc/testsuite/
PR target/121749
* gcc.target/aarch64/simd/pr121749.c: Use dg-assemble directive.
|
|
The number of LTO partitions should exceed number of CPUs (or hyper-threads) of
commonly used CPUs. I think it is time to increase it again and as discussed
in the LTO and toplevel asm thread, doing so scales quite well. Tmp file usage
grows from 2.7 to 2.9MB which seems acceptable. Overall build time on machine
with 256 hyperthreads is comparable.
Bootstrapped/regtested x86_64-linux, committed.
gcc/ChangeLog:
* params.opt (-param=lto-partitions=): Increase default value from 128 to 512.
|
|
With g:d20b2ad845876eec0ee80a3933ad49f9f6c4ee30 the narrowing shift instructions
are now represented with standard RTL and more merging optimisations occur.
This exposed a wrong predicate for the shift amount operand.
The shift amount is the number of bits of the narrow destination, not the input
sources.
Correct this by using the vn_mode attribute when specifying the predicate, which
exists for this purpose.
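The range restriction can be pictured with a scalar model of one lane. This is a hedged sketch, not code from the patch: the helper name `shrn_u32_to_u16` is hypothetical, and it models only a plain 32-bit to 16-bit narrowing right shift.

```c
#include <stdint.h>

/* Hypothetical scalar model of a narrowing right shift from a 32-bit
   element to a 16-bit element.  The valid immediate range is 1..16,
   i.e. bounded by the width of the narrow destination, not by the
   32-bit source.  Returns 0 and leaves *out untouched if the shift
   amount is out of range.  */
int
shrn_u32_to_u16 (uint32_t in, unsigned shift, uint16_t *out)
{
  const unsigned narrow_bits = 16;
  if (shift < 1 || shift > narrow_bits)
    return 0;
  *out = (uint16_t) (in >> shift);
  return 1;
}
```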
I've spotted a few more narrowing shift patterns that need the restriction, so
they are updated as well.
Bootstrapped and tested on aarch64-none-linux-gnu.
Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
gcc/
PR target/121749
* config/aarch64/aarch64-simd.md (aarch64_<shrn_op>shrn_n<mode>):
Use aarch64_simd_shift_imm_offset_<vn_mode> instead of
aarch64_simd_shift_imm_offset_<ve_mode> predicate.
(aarch64_<shrn_op>shrn_n<mode> VQN define_expand): Likewise.
(*aarch64_<shrn_op>rshrn_n<mode>_insn): Likewise.
(aarch64_<shrn_op>rshrn_n<mode>): Likewise.
(aarch64_<shrn_op>rshrn_n<mode> VQN define_expand): Likewise.
(aarch64_sqshrun_n<mode>_insn): Likewise.
(aarch64_sqshrun_n<mode>): Likewise.
(aarch64_sqshrun_n<mode> VQN define_expand): Likewise.
(aarch64_sqrshrun_n<mode>_insn): Likewise.
(aarch64_sqrshrun_n<mode>): Likewise.
(aarch64_sqrshrun_n<mode>): Likewise.
* config/aarch64/iterators.md (vn_mode): Handle DI, SI, HI modes.
gcc/testsuite/
PR target/121749
* gcc.target/aarch64/simd/pr121749.c: New test.
|
|
Here although the local templated variables x and y have the same
reduced constant value, only x's initializer {a.get()} is well-formed
as written since A::m has private access. We correctly reject y's
initializer {&a.m} (at instantiation time), but we also reject x's
initializer because we happen to constant fold it ahead of time, which
means at instantiation time it's already represented as a COMPONENT_REF
to a FIELD_DECL, and so when substituting this COMPONENT_REF we naively
double check that the given FIELD_DECL is accessible, which fails.
This patch sidesteps this particular issue by not checking access
when substituting a COMPONENT_REF to a FIELD_DECL. If the target of a
COMPONENT_REF is already a FIELD_DECL (i.e. before substitution), then I
think we can assume access has been already checked appropriately.
PR c++/97740
gcc/cp/ChangeLog:
* pt.cc (tsubst_expr) <case COMPONENT_REF>: Don't check access
when the given member is already a FIELD_DECL.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/constexpr-97740a.C: New test.
* g++.dg/cpp0x/constexpr-97740b.C: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>
|
|
The sinking code currently does not heuristically avoid placing
code into an irreducible region in the same way it avoids placing
into a deeper loop nest. Critically for the PR we may not insert
a VDEF into an irreducible region that does not contain a virtual
definition. The following adds the missing heuristic and also
a stop-gap for the VDEF issue - since we cannot determine
validity inside an irreducible region we have to reject any
VDEF movement with destination inside such region, even when
it originates there. In particular irreducible sub-cycles are
not tracked separately and can cause issues.
I chose to not complicate the already partly incomplete assert
but prune it down to essentials.
PR tree-optimization/121756
* tree-ssa-sink.cc (select_best_block): Avoid irreducible
regions in otherwise same loop depth.
(statement_sink_location): When sinking a VDEF, never place
that into an irreducible region.
* gcc.dg/torture/pr121756.c: New testcase.
|
|
This pattern doesn't do any target support check so no need to set
a vector type.
* tree-vect-patterns.cc (vect_recog_cond_expr_convert_pattern):
Do not set any vector types.
|
|
The a % b -> a - (a / b) * b pattern breaks reduction constraints, disable it
for reduction stmts.
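For illustration, a modulo reduction and the expanded form of the remainder might look as follows. Both function names are hypothetical and this is not the pr121767.c testcase; the point of the sketch is that the expansion uses the first operand twice, which a reduction chain does not allow for its accumulator.

```c
/* Hypothetical example: a modulo reduction, where acc feeds back into
   the next iteration through the % operation.  */
unsigned
mod_reduce (const unsigned *b, int n, unsigned acc)
{
  for (int i = 0; i < n; i++)
    acc = acc % b[i];
  return acc;
}

/* The same remainder written as the expansion a - (a / b) * b, in which
   a is used twice.  */
unsigned
mod_expanded (unsigned a, unsigned b)
{
  return a - (a / b) * b;
}
```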
PR tree-optimization/121767
* tree-vect-patterns.cc (vect_recog_mod_var_pattern): Disable
for reductions.
* gcc.dg/vect/pr121767.c: New testcase.
|
|
The following fixes a corner case of pattern stmt STMT_VINFO_REDUC_IDX
updating which happens auto-magically. When a 2nd pattern sequence
uses defs from inside a prior pattern sequence then the first guess
for the lookfor can be off. This happens when for example widening
patterns use vect_get_internal_def, which looks into earlier patterns.
PR tree-optimization/121758
* tree-vect-patterns.cc (vect_mark_pattern_stmts): Try
harder to find a reduction continuation.
* gcc.dg/vect/pr121758.c: New testcase.
|
|
split_address_to_core_and_offset [PR121355]
Inside split_address_to_core_and_offset, this calls get_inner_reference.
Take:
```
_6 = t_3(D) + 12;
_8 = &MEM[(struct s1 *)t_3(D) + 4B].t;
_1 = _6 - _8;
```
On the assignment of _8, get_inner_reference will return `MEM[(struct s1 *)t_3(D) + 4B]`
and an offset, but that does not match up with `t_3(D)`, which is how split_address_to_core_and_offset
handles pointer plus.
So this patch adds the unwrapping of the MEM_REF after the call to get_inner_reference
and has it act like a pointer plus.
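Source of roughly the following shape could give rise to GIMPLE like the snippet above. It is a sketch under assumptions: the layout of `struct s1` and the function name `field_distance` are hypothetical, and this is not the ptrdiff-1.c testcase.

```c
#include <stddef.h>

/* Hypothetical layout; the field names and types are assumptions.  */
struct s1
{
  int a;
  int t;
};

/* Distance in bytes between base + 12 and the address of the t field of
   a struct s1 placed at base + 4.  Both addresses share the same base
   pointer, so the difference is a compile-time constant.  */
ptrdiff_t
field_distance (char *base)
{
  char *p = base + 12;
  char *q = (char *) &((struct s1 *) (base + 4))->t;
  return p - q;
}
```

Only addresses are computed here; nothing is loaded or stored through the casted pointer.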
Changes since v1:
* v2: Remove check on operand 1 for poly_int_tree_p, it always is.
Add before the check to see if it fits in shwi instead of after.
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/121355
gcc/ChangeLog:
* fold-const.cc (split_address_to_core_and_offset): Handle a MEM_REF after the call
to get_inner_reference.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/ptrdiff-1.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
|
|
|