|
* de.po: Update.
|
|
Fix a typo in the ChangeLog entry from r16-3355-g96a291c4bb0b8a.
|
|
|
|
PR c++/116928
gcc/cp/ChangeLog:
* parser.cc (cp_parser_braced_list): Set greater_than_is_operator_p.
gcc/testsuite/ChangeLog:
* g++.dg/parse/template33.C: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>
|
|
Compile noplt-gd-1.c and noplt-ld-1.c with -mtls-dialect=gnu to support
the --with-tls=gnu2 configure option since they scan the assembly output
for the __tls_get_addr call which is generated by -mtls-dialect=gnu.
PR target/120933
* gcc.target/i386/noplt-gd-1.c (dg-options): Add
-mtls-dialect=gnu.
* gcc.target/i386/noplt-ld-1.c (dg-options): Likewise.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
Allow passing --with-tls= at configure-time to control the default value
of -mtls-dialect= for i386 and x86_64. The default itself (gnu) is not changed
unless --with-tls= is passed.
--with-tls= is already wired up for ARM and RISC-V.
gcc/ChangeLog:
PR target/120933
* config.gcc (supported_defaults): Add tls for i386, x86_64.
* config/i386/i386.h (host_detect_local_cpu): Add tls.
* doc/install.texi: Document --with-tls= for i386, x86_64.
|
|
The old C style was cumbersome, making one responsible for manually
creating and passing a closure in two parts (a separate function and a
*_info class for the closed-over variables).
With C++ lambdas, we can just:
- derive environment types implicitly
- have fewer stray static functions
Also thanks to templates we can
- make the return type polymorphic, to avoid casting pointee types.
Note that `struct spec_path` was *not* converted because it is used
multiple times. We could still convert to a lambda, but we would want to
put the for_each_path call with that lambda inside a separate function
anyways, to support the multiple callers. Unlike the other two
refactors, it is not clear that this one would make anything shorter.
Instead, I define the `operator()` explicitly. Keeping the explicit
struct gives us some nice "named arguments", versus the wrapper function
alternative, too.
gcc/ChangeLog:
* gcc.cc (for_each_path): templated, to make passing lambdas
possible/easy/safe, and to have a polymorphic return type.
(struct add_to_obstack_info): Deleted, lambda captures replace
it.
(add_to_obstack): Moved to lambda in build_search_list.
(build_search_list): Has above lambda now.
(struct file_at_path_info): Deleted, lambda captures replace
it.
(file_at_path): Moved to lambda in find_a_file.
(find_a_file): Has above lambda now.
(struct spec_path_info): Renamed to just struct spec_path.
(struct spec_path): New name.
(spec_path): Renamed to spec_path::operator().
(spec_path::operator()): New name.
(do_spec_1): Updated for_each_path call sites.
Signed-off-by: John Ericson <git@JohnEricson.me>
Reviewed-by: Jason Merrill <jason@redhat.com>
|
|
defining module [PR120499]
In the PR, we're getting a linker error from _Vector_impl's destructor
never getting emitted. This is because of a combination of factors:
1. in imp-member-4_a, the destructor is not used and so there is no
definition generated.
2. in imp-member-4_b, the destructor gets synthesized (as part of the
synthesis for Coll's destructor) but is not ODR-used and so does not
get emitted. Despite there being a definition provided in this TU,
the destructor is still considered imported and so isn't streamed
into the module body.
3. in imp-member-4_c, we need to ODR-use the destructor but we only got
a forward declaration from imp-member-4_b, so we cannot emit a body.
The point of failure here is step 2; this function has effectively been
declared in the imp-member-4_b module, and so we shouldn't treat it as
imported. This way we'll properly stream the body so that importers can
emit it.
PR c++/120499
gcc/cp/ChangeLog:
* method.cc (synthesize_method): Set the instantiating module.
gcc/testsuite/ChangeLog:
* g++.dg/modules/imp-member-4_a.C: New test.
* g++.dg/modules/imp-member-4_b.C: New test.
* g++.dg/modules/imp-member-4_c.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
|
|
|
|
This patch adds missing guards on shift amounts to prevent UB when the
shift count equals or exceeds HOST_BITS_PER_WIDE_INT.
In that patch (r16-2666-g647bd0a02789f1), shift counts were only checked
for being nonzero, not for being within valid bounds. This patch tightens
those conditions by enforcing that shift counts are greater than zero
and less than HOST_BITS_PER_WIDE_INT.
2025-08-23 Kishan Parmar <kishan@linux.ibm.com>
gcc/
PR target/118890
* config/rs6000/rs6000.cc (can_be_rotated_to_negative_lis): Add bounds
checks for shift counts to prevent undefined behavior.
(rs6000_emit_set_long_const): Likewise.
|
|
sign bit test
While working to remove mvconst_internal I stumbled over a regression in
the code to handle signed division by a power of two.
In that sequence we want to select between 0, 2^n-1 by pairing a sign
bit splat with a subsequent logical right shift. This can be done
without branches or conditional moves.
Playing with it a bit made me realize there's a handful of selections we
can do based on a sign bit test. Essentially there are two broad cases.
Clearing bits after the sign bit splat: we have 0 or -1; if we clear
bits, the 0 stays as-is, but the -1 could easily turn into 2^n-1,
~2^n-1, or some small constants.
Setting bits after the sign bit splat: if we have 0 or -1, setting bits
leaves the -1 as-is, but the 0 can turn into 2^n, a small constant, etc.
Shreya and I originally started looking at target patterns to do this,
essentially discovering conditional move forms of the selects and
rewriting them into something more efficient. That got out of control
pretty quickly and it relied on if-conversion to initially create the
conditional move.
The better solution is to actually discover the cases during
if-conversion itself. That catches cases that were previously being
missed, checks cost models, and is actually simpler since we don't have
to distinguish between things like ori and bseti, instead we just emit
the natural RTL and let the target figure it out.
In the ifcvt implementation we put these cases just before trying the
traditional conditional move sequences. Essentially these are a last
attempt before trying the generalized conditional move sequence.
This has been bootstrapped and regression tested on aarch64, riscv,
ppc64le, s390x, alpha, m68k, sh4eb, x86_64 and probably a couple of
others I've forgotten. It's also been tested on the other embedded
targets. Obviously the new tests are RISC-V specific, so that testing
was primarily to make sure we didn't ICE, generate incorrect code, or
regress existing target-specific tests.
Raphael has some changes to attack this from the gimple direction as
well. I think the latest version of those is on me to push through
internal review.
PR rtl-optimization/120553
gcc/
* ifcvt.cc (noce_try_sign_bit_splat): New function.
(noce_process_if_block): Use it.
gcc/testsuite/
* gcc.target/riscv/pr120553-1.c: New test.
* gcc.target/riscv/pr120553-2.c: New test.
* gcc.target/riscv/pr120553-3.c: New test.
* gcc.target/riscv/pr120553-4.c: New test.
* gcc.target/riscv/pr120553-5.c: New test.
* gcc.target/riscv/pr120553-6.c: New test.
* gcc.target/riscv/pr120553-7.c: New test.
* gcc.target/riscv/pr120553-8.c: New test.
|
|
We passed the reduc_info which is close, but the representative is
more spot on and will not collide with making the reduc_info a
distinct type.
* tree-vect-loop.cc (vectorizable_live_operation): Pass
the representative of the PHIs node to
vect_create_epilog_for_reduction.
|
|
STMT_VINFO_REDUC_VECTYPE_IN exists on relevant reduction stmts, not
the reduction info. And STMT_VINFO_DEF_TYPE exists on the
reduction info. The following fixes up a few places.
* tree-vect-loop.cc (vectorizable_lane_reducing): Get
reduction info properly. Adjust checks according to
comments.
(vectorizable_reduction): Do not set STMT_VINFO_REDUC_VECTYPE_IN
on the reduc info.
(vect_transform_reduction): Query STMT_VINFO_REDUC_VECTYPE_IN
on the actual reduction stmt, not the info.
|
|
Add run and asm check test cases for scalar unsigned SAT_MUL form 3.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat/sat_u_mul-4-u16-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-4-u16-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-4-u16-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-4-u16-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-4-u32-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-4-u32-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-4-u32-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-4-u64-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-4-u8-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-4-u8-from-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-4-u8-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-4-u8-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-4-u8-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u16-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u16-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u16-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u16-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u32-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u32-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u32-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u64-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u8-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u8-from-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u8-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u8-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-4-u8-from-u64.rv32.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch would like to try to match the unsigned
SAT_MUL form 3, aka below:
#define DEF_SAT_U_MUL_FMT_3(NT, WT) \
NT __attribute__((noinline)) \
sat_u_mul_##NT##_from_##WT##_fmt_3 (NT a, NT b) \
{ \
WT x = (WT)a * (WT)b; \
if ((x >> sizeof(a) * 8) == 0) \
return (NT)x; \
else \
return (NT)-1; \
}
Where WT is uint16_t, uint32_t, uint64_t or uint128_t,
and NT is uint8_t, uint16_t, uint32_t or uint64_t.
gcc/ChangeLog:
* match.pd: Add form 3 for unsigned SAT_MUL.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
For the beginning basic block:
(note 4 0 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 2 4 26 2 NOTE_INSN_FUNCTION_BEG)
emit the TLS call after NOTE_INSN_FUNCTION_BEG.
gcc/
PR target/121635
* config/i386/i386-features.cc (ix86_emit_tls_call): Emit the
TLS call after NOTE_INSN_FUNCTION_BEG.
gcc/testsuite/
PR target/121635
* gcc.target/i386/pr121635-1a.c: New test.
* gcc.target/i386/pr121635-1b.c: Likewise.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
REDUC_GROUP_FIRST_ELEMENT is often checked to see whether we are
dealing with a SLP reduction or a reduction chain. When we are
in the context of analyzing the reduction (so we are sure
the SLP instance we see is correct), then we can use the SLP
instance kind instead.
* tree-vect-loop.cc (get_initial_defs_for_reduction): Adjust
comment.
(vect_create_epilog_for_reduction): Get at the reduction
kind via the instance, re-use the slp_reduc flag instead
of checking REDUC_GROUP_FIRST_ELEMENT again.
Remove unreachable code.
(vectorizable_reduction): Compute a reduc_chain flag from
the SLP instance kind, avoid REDUC_GROUP_FIRST_ELEMENT
checks.
(vect_transform_cycle_phi): Likewise.
(vectorizable_live_operation): Check the SLP instance
kind instead of REDUC_GROUP_FIRST_ELEMENT.
|
|
Linaro CI informed me that this test fails on ARM thumb-m7-hard-eabi.
This appears to be because the target defaults to -fshort-enums, and so
the mangled names are inaccurate.
This patch just disables the implicit type enum test for this case.
gcc/testsuite/ChangeLog:
* g++.dg/abi/mangle83.C: Disable implicit enum test for
-fshort-enums.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
|
|
The following removes the use of STMT_VINFO_REDUC_* from parloops,
also fixing a mistake with analyzing double reductions which rely
on the outer loop vinfo so the inner loop is properly detected as
nested.
* tree-parloops.cc (parloops_is_simple_reduction): Pass
in double reduction inner loop LC phis and query that.
(parloops_force_simple_reduction): Similar, but set it.
Check for valid reduction types here.
(valid_reduction_p): Remove.
(gather_scalar_reductions): Adjust, fixup double
reduction inner loop processing.
|
|
gcc/ChangeLog:
* config/riscv/t-rtems: Add -mstrict-align multilibs for
targets without support for misaligned access in hardware.
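A t-rtems addition of this kind typically uses GCC's multilib makefile variables; a hypothetical sketch (the actual config/riscv/t-rtems contents may differ):

```make
# Hypothetical sketch -- the real config/riscv/t-rtems entry may differ.
# MULTILIB_OPTIONS adds -mstrict-align variants of the multilib set;
# MULTILIB_DIRNAMES names the install subdirectory for those variants.
MULTILIB_OPTIONS  += mstrict-align
MULTILIB_DIRNAMES += strict-align
```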
|
|
Without stating the architecture version required by the test, test
runs using options that are incompatible with that architecture
version fail, e.g. with -mfloat-abi=hard.
armv7 was not covered by the long list of arm variants in
target-supports.exp, so add it, and use it for the effective target
requirement and for the option.
for gcc/testsuite/ChangeLog
PR rtl-optimization/120424
* lib/target-supports.exp (arm arches): Add arm_arch_v7.
* g++.target/arm/pr120424.C: Require armv7 support. Use
dg-add-options arm_arch_v7 instead of explicit -march=armv7.
|
|
|
|
PR fortran/121627
gcc/fortran/ChangeLog:
* module.cc (create_int_parameter_array): Avoid NULL
pointer dereference and enhance error message.
gcc/testsuite/ChangeLog:
* gfortran.dg/pr121627.f90: New test.
|
|
For cores without a hardware multiplier, set the respective optabs
to library functions that use a software implementation of
multiplication.
The implementation was copied from the RL78 backend.
gcc/ChangeLog:
* config/pru/pru.cc (pru_init_libfuncs): Set softmpy libgcc
functions for optab multiplication entries if TARGET_OPT_MUL
option is not set.
libgcc/ChangeLog:
* config/pru/libgcc-eabi.ver: Add __pruabi_softmpyi and
__pruabi_softmpyll symbols.
* config/pru/t-pru: Add softmpy source files.
* config/pru/pru-softmpy.h: New file.
* config/pru/softmpyi.c: New file.
* config/pru/softmpyll.c: New file.
Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
|
|
Enable multilib builds for contemporary PRU core versions (AM335x and
later), and older versions present in AM18xx.
gcc/ChangeLog:
* config.gcc: Include pru/t-multilib.
* config/pru/pru.h (MULTILIB_DEFAULTS): Define.
* config/pru/t-multilib: New file.
Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
|
|
Older PRU core versions (e.g. in AM1808 SoC) do not support
XIN, XOUT, FILL, ZERO instructions. Add GCC command line options to
optionally disable generation of those instructions, so that code
can be executed on such older PRU cores.
gcc/ChangeLog:
* common/config/pru/pru-common.cc (TARGET_DEFAULT_TARGET_FLAGS):
Keep multiplication, FILL and ZERO instructions enabled by
default.
* config/pru/pru.md (prumov<mode>): Gate code generation on
TARGET_OPT_FILLZERO.
(mov<mode>): Ditto.
(zero_extendqidi2): Ditto.
(zero_extendhidi2): Ditto.
(zero_extendsidi2): Ditto.
(@pru_ior_fillbytes<mode>): Ditto.
(@pru_and_zerobytes<mode>): Ditto.
(@<code>di3): Ditto.
(mulsi3): Gate code generation on TARGET_OPT_MUL.
* config/pru/pru.opt: Add mmul and mfillzero options.
* config/pru/pru.opt.urls: Regenerate.
* config/rl78/rl78.opt.urls: Regenerate.
* doc/invoke.texi: Document new options.
Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
|
|
The middle-end does not fully understand NULLPTR_TYPE. So it
gets confused a lot of the time when dealing with it.
This adds the folding that is similarly done in the C++ front-end already.
In some cases it should produce slightly better code as there is no
reason to load from a nullptr_t variable as it is always NULL.
The following is handled:
nullptr_v ==/!= nullptr_v -> true/false
(ptr)nullptr_v -> (ptr)0, nullptr_v
f(nullptr_v) -> f ((nullptr, nullptr_v))
The last one is for conversions inside `...` (varargs).
Bootstrapped and tested on x86_64-linux-gnu.
PR c/121478
gcc/c/ChangeLog:
* c-fold.cc (c_fully_fold_internal): Fold nullptr_t ==/!= nullptr_t.
* c-typeck.cc (convert_arguments): Handle conversion from nullptr_t
for varargs.
(convert_for_assignment): Handle conversions from nullptr_t to
pointer type specially.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr121478-1.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
|
|
Since r16-3022, 20_util/variant/102912.cc was failing in C++20 and above due
to wrong errors about destruction modifying a const object; destruction is
OK.
PR c++/121068
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_store_expression): Allow clobber of a const
object.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/constexpr-dtor18.C: New test.
|
|
Call check_effective_target_riscv_zvfh_ok rather than
check_effective_target_riscv_zvfh in vx_vf_*run-1-f16.c run tests and ensure
that they are actually run.
Also fix remove_options_for_riscv_zvfh.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmacc-run-1-f16.c: Call
check_effective_target_riscv_zvfh_ok rather than
check_effective_target_riscv_zvfh.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmadd-run-1-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsac-run-1-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsub-run-1-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmacc-run-1-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmadd-run-1-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmsac-run-1-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmsub-run-1-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwmacc-run-1-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwmsac-run-1-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwnmacc-run-1-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwnmsac-run-1-f16.c: Likewise.
* lib/target-supports.exp (check_effective_target_riscv_zvfh_ok): Append
zvfh instead of v to march.
(remove_options_for_riscv_zvfh): Remove duplicate and
call remove_ rather than add_options_for_riscv_z_ext.
|
|
This PR is another bug in the rtl-ssa code to manage live-out uses.
It seems that this didn't get much coverage until recently.
In the testcase, late-combine first removed a register-to-register
move by substituting into all uses, some of which were in other EBBs.
This was done after checking make_uses_available, which (as expected)
says that single dominating definitions are available everywhere
that the definition dominates. But the update failed to add
appropriate live-out uses, so a later parallelisation attempt
tried to move the new destination into a later block.
gcc/
PR rtl-optimization/121619
* rtl-ssa/functions.h (function_info::commit_make_use_available):
Declare.
* rtl-ssa/blocks.cc (function_info::commit_make_use_available):
New function.
* rtl-ssa/changes.cc (function_info::apply_changes_to_insn): Use it.
gcc/testsuite/
PR rtl-optimization/121619
* gcc.dg/pr121619.c: New test.
|
|
The following makes sure to pun arithmetic that's used in vectorized
reduction to unsigned when overflow invokes undefined behavior.
PR tree-optimization/111494
* gimple-fold.h (arith_code_with_undefined_signed_overflow): Declare.
* gimple-fold.cc (arith_code_with_undefined_signed_overflow): Export.
* tree-vect-stmts.cc (vectorizable_operation): Use unsigned
arithmetic for operations participating in a reduction.
|
|
For a basic block with only a label:
(code_label 78 11 77 3 14 (nil) [1 uses])
(note 77 78 54 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
emit the TLS call after NOTE_INSN_BASIC_BLOCK, instead of before
NOTE_INSN_BASIC_BLOCK, to avoid
x.c: In function ‘aout_16_write_syms’:
x.c:54:1: error: NOTE_INSN_BASIC_BLOCK is missing for block 3
54 | }
| ^
x.c:54:1: error: NOTE_INSN_BASIC_BLOCK 77 in middle of basic block 3
during RTL pass: x86_cse
x.c:54:1: internal compiler error: verify_flow_info failed
gcc/
PR target/121607
* config/i386/i386-features.cc (ix86_emit_tls_call): Emit the
TLS call after NOTE_INSN_BASIC_BLOCK in a basic block with only
a label.
gcc/testsuite/
PR target/121607
* gcc.target/i386/pr121607-1a.c: New test.
* gcc.target/i386/pr121607-1b.c: Likewise.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
This patch changes the implementation of the insn to test whether the
result itself is negative, rather than testing the MSB of the result of
the ABS machine instruction. This eliminates the need to consider bit
endianness and allows for longer branch distances.
/* example */
extern void foo(int);
void test0(int a) {
if (a == -2147483648)
foo(a);
}
void test1(int a) {
if (a != -2147483648)
foo(a);
}
;; before (endianness: little)
test0:
entry sp, 32
abs a8, a2
bbci a8, 31, .L1
mov.n a10, a2
call8 foo
.L1:
retw.n
test1:
entry sp, 32
abs a8, a2
bbsi a8, 31, .L4
mov.n a10, a2
call8 foo
.L4:
retw.n
;; after (endianness-independent)
test0:
entry sp, 32
abs a8, a2
bgez a8, .L1
mov.n a10, a2
call8 foo
.L1:
retw.n
test1:
entry sp, 32
abs a8, a2
bltz a8, .L4
mov.n a10, a2
call8 foo
.L4:
retw.n
gcc/ChangeLog:
* config/xtensa/xtensa.md (*btrue_INT_MIN):
Change the branch insn condition to test for a negative number
rather than testing for the MSB.
|
|
We have now common patterns for most of the vectorizable_* calls, so
merge. This also avoids calling vectorizable_early_exit for BB
vect and clarifies signatures of it and vectorizable_phi.
* tree-vectorizer.h (vectorizable_phi): Take bb_vec_info.
(vectorizable_early_exit): Take loop_vec_info.
* tree-vect-loop.cc (vectorizable_phi): Adjust.
* tree-vect-slp.cc (vect_slp_analyze_operations): Likewise.
(vectorize_slp_instance_root_stmt): Likewise.
* tree-vect-stmts.cc (vectorizable_early_exit): Likewise.
(vect_transform_stmt): Likewise.
(vect_analyze_stmt): Merge the sequences of vectorizable_*
where common.
|
|
2025-08-21 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/84122
* parse.cc (parse_derived): PDT type parameters are not allowed
an explicit access specification and must appear before a
PRIVATE statement. If a PRIVATE statement is seen, mark all the
other components as PRIVATE.
PR fortran/85942
* simplify.cc (get_kind): Convert a PDT KIND component into a
specification expression using the default initializer.
gcc/testsuite/
PR fortran/84122
* gfortran.dg/pdt_38.f03: New test.
PR fortran/85942
* gfortran.dg/pdt_39.f03: New test.
|
|
Here r13-1210 correctly changed &A<int>::foo to not be considered
type-dependent, but tsubst_expr of the OFFSET_REF got confused trying to
tsubst a type that involved auto. Fixed by getting the type from the
member rather than tsubst.
PR c++/120757
gcc/cp/ChangeLog:
* pt.cc (tsubst_expr) [OFFSET_REF]: Don't tsubst the type.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1y/auto-fn66.C: New test.
|
|
|
|
P2036 says that this:
[x=1]{ int x; }
should be rejected, but with my P2036 patch we started giving an error
for the attached testcase as well, breaking Dolphin. So let's
keep the error only for init-captures.
PR c++/121553
gcc/cp/ChangeLog:
* name-lookup.cc (check_local_shadow): Check !is_normal_capture_proxy.
gcc/testsuite/ChangeLog:
* g++.dg/warn/Wshadow-19.C: Revert P2036 changes.
* g++.dg/warn/Wshadow-6.C: Likewise.
* g++.dg/warn/Wshadow-20.C: New test.
* g++.dg/warn/Wshadow-21.C: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>
|
|
When -fdiagnostics-show-context[=DEPTH] was added, the options were
documented, but common.opt.urls wasn't regenerated.
gcc/ChangeLog:
* common.opt.urls: Regenerate.
|
|
-Wstringop-* warnings [PR109071,PR85788,PR88771,PR106762,PR108770,PR115274,PR117179]
'-fdiagnostics-show-context[=DEPTH]'
'-fno-diagnostics-show-context'
With this option, the compiler might print the interesting control
flow chain that guards the basic block of the statement which has
the warning. DEPTH is the maximum depth of the control flow chain.
Currently, the list of impacted warning options includes
'-Warray-bounds', '-Wstringop-overflow', '-Wstringop-overread',
'-Wstringop-truncation', and '-Wrestrict'. More warning options
might be added to this list in future releases. The forms
'-fdiagnostics-show-context' and '-fno-diagnostics-show-context'
are aliases for '-fdiagnostics-show-context=1' and
'-fdiagnostics-show-context=0', respectively.
For example:
$ cat t.c
extern void warn(void);
static inline void assign(int val, int *regs, int *index)
{
if (*index >= 4)
warn();
*regs = val;
}
struct nums {int vals[4];};
void sparx5_set (int *ptr, struct nums *sg, int index)
{
int *val = &sg->vals[index];
assign(0, ptr, &index);
assign(*val, ptr, &index);
}
$ gcc -Wall -O2 -c -o t.o t.c
t.c: In function ‘sparx5_set’:
t.c:12:23: warning: array subscript 4 is above array bounds of ‘int[4]’ [-Warray-bounds=]
12 | int *val = &sg->vals[index];
| ~~~~~~~~^~~~~~~
t.c:8:18: note: while referencing ‘vals’
8 | struct nums {int vals[4];};
| ^~~~
In the above, although the warning is correct in theory, the warning
message itself is confusing to the end user since there is information
that cannot be connected to the source code directly.
It is a nice improvement to add more information to the warning
message to report where such an index value comes from.
With the new option -fdiagnostics-show-context=1, the warning message for
the above testing case is now:
$ gcc -Wall -O2 -fdiagnostics-show-context=1 -c -o t.o t.c
t.c: In function ‘sparx5_set’:
t.c:12:23: warning: array subscript 4 is above array bounds of ‘int[4]’ [-Warray-bounds=]
12 | int *val = &sg->vals[index];
| ~~~~~~~~^~~~~~~
‘sparx5_set’: events 1-2
4 | if (*index >= 4)
| ^
| |
| (1) when the condition is evaluated to true
......
12 | int *val = &sg->vals[index];
| ~~~~~~~~~~~~~~~
| |
| (2) warning happens here
t.c:8:18: note: while referencing ‘vals’
8 | struct nums {int vals[4];};
| ^~~~
PR tree-optimization/109071
PR tree-optimization/85788
PR tree-optimization/88771
PR tree-optimization/106762
PR tree-optimization/108770
PR tree-optimization/115274
PR tree-optimization/117179
gcc/ChangeLog:
* Makefile.in (OBJS): Add diagnostic-context-rich-location.o.
* common.opt (fdiagnostics-show-context): New option.
(fdiagnostics-show-context=): New option.
* diagnostic-context-rich-location.cc: New file.
* diagnostic-context-rich-location.h: New file.
* doc/invoke.texi (fdiagnostics-details): Add
documentation for the new options.
* gimple-array-bounds.cc (check_out_of_bounds_and_warn): Add
one new parameter. Use rich location with details for warning_at.
(array_bounds_checker::check_array_ref): Use rich location with
details for warning_at.
(array_bounds_checker::check_mem_ref): Add one new parameter.
Use rich location with details for warning_at.
(array_bounds_checker::check_addr_expr): Use rich location with
move_history_diagnostic_path for warning_at.
(array_bounds_checker::check_array_bounds): Call check_mem_ref with
one more parameter.
* gimple-array-bounds.h: Update prototype for check_mem_ref.
* gimple-ssa-warn-access.cc (warn_string_no_nul): Use rich location
with details for warning_at.
(maybe_warn_nonstring_arg): Likewise.
(maybe_warn_for_bound): Likewise.
(warn_for_access): Likewise.
(check_access): Likewise.
(pass_waccess::check_strncat): Likewise.
(pass_waccess::maybe_check_access_sizes): Likewise.
* gimple-ssa-warn-restrict.cc (pass_wrestrict::execute): Calculate
dominance info for diagnostics show context.
(maybe_diag_overlap): Use rich location with details for warning_at.
(maybe_diag_access_bounds): Use rich location with details for
warning_at.
gcc/testsuite/ChangeLog:
* gcc.dg/pr109071.c: New test.
* gcc.dg/pr109071_1.c: New test.
* gcc.dg/pr109071_10.c: New test.
* gcc.dg/pr109071_11.c: New test.
* gcc.dg/pr109071_12.c: New test.
* gcc.dg/pr109071_2.c: New test.
* gcc.dg/pr109071_3.c: New test.
* gcc.dg/pr109071_4.c: New test.
* gcc.dg/pr109071_5.c: New test.
* gcc.dg/pr109071_6.c: New test.
* gcc.dg/pr109071_7.c: New test.
* gcc.dg/pr109071_8.c: New test.
* gcc.dg/pr109071_9.c: New test.
* gcc.dg/pr117375.c: New test.
|
|
build_ref_for_offset was originally made external
with r0-95095-g3f84bf08c48ea4. The call was extracted
out into ipa_get_jf_ancestor_result by r0-110216-g310bc6334823b9.
Then the call was removed by r10-7273-gf3280e4c0c98e1.
So there is no use of build_ref_for_offset outside of SRA, so
let's make it static again.
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/121568
gcc/ChangeLog:
* ipa-prop.h (build_ref_for_offset): Remove.
* tree-sra.cc (build_ref_for_offset): Make static.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
|
|
I'd added the aarch64-specific CC fusion pass to fold a PTEST
instruction into the instruction that feeds the PTEST, in cases
where the latter instruction can set the appropriate flags as a
side-effect.
Combine does the same optimisation. However, as explained in the
comments, the PTEST case often has:
A: set predicate P based on inputs X
B: clobber X
C: test P
and so the fusion is only possible if we move C before B.
That's something that combine currently can't do (for the cases
that we needed).
The optimisation was never really AArch64-specific. It's just that,
in an all-too-familiar fashion, we needed it in stage 3, when it was
too late to add something target-independent.
late-combine adds a convenient place to do the optimisation in a
target-independent way, just as combine is a convenient place to
do its related optimisation.
gcc/
* config.gcc (aarch64*-*-*): Remove aarch64-cc-fusion.o from
extra_objs.
* config/aarch64/aarch64-passes.def (pass_cc_fusion): Delete.
* config/aarch64/aarch64-protos.h (make_pass_cc_fusion): Delete.
* config/aarch64/t-aarch64 (aarch64-cc-fusion.o): Delete.
* config/aarch64/aarch64-cc-fusion.cc: Delete.
* late-combine.cc (late_combine::optimizable_set): Take a set_info *
rather than an insn_info * and move destination tests from...
(late_combine::combine_into_uses): ...here. Take a set_info * rather
than an insn_info *. Take the rtx set.
(late_combine::parallelize_insns, late_combine::combine_cc_setter)
(late_combine::combine_insn): New member functions.
(late_combine::m_parallel): New member variable.
* rtlanal.cc (pattern_cost): Handle sets of CC registers in the
same way as comparisons.
|
|
While testing a later patch, I found that create_degenerate_phi
had an inverted test for bitmap_set_bit. It was assuming that
the return value was the previous bit value, rather than a
"something changed" value. :(
Also, the call to add_live_out_use shouldn't be conditional
on the DF_LR_OUT operation, since the register could be live-out
because of uses later in the same EBB (which do not require a
live-out use to be added to the rtl-ssa instruction). Instead,
add_live_out should itself check whether a live-out use already exists.
gcc/
* rtl-ssa/blocks.cc (function_info::create_degenerate_phi): Fix
inverted test of bitmap_set_bit. Call add_live_out_use even
if the register was previously live-out from the predecessor block.
Instead...
(function_info::add_live_out_use): ...check here whether a live-out
use already exists.
|
|
rtl-ssa already has a find_def function for finding the definition
of a particular resource (register or memory) at a particular point
in the program. This patch adds a similar function for looking
up uses. Both functions have amortised logarithmic complexity.
gcc/
* rtl-ssa/accesses.h (use_lookup): New class.
* rtl-ssa/functions.h (function_info::find_def): Expand comment.
(function_info::find_use): Declare.
* rtl-ssa/member-fns.inl (use_lookup::prev_use, use_lookup::next_use)
(use_lookup::matching_use, use_lookup::matching_or_prev_use)
(use_lookup::matching_or_next_use): New member functions.
* rtl-ssa/accesses.cc (function_info::find_use): Likewise.
|
|
The testcase in the PR shows that it's worth splitting the processing
of the initial workset (which is def_blocks) from the main iteration.
This reduces SSA incremental update time from 44.7s to 32.9s. Further
changing the workset bitmap of the main iteration to a vector
speeds up things further to 23.5s for an overall nearly halving of
the SSA incremental update compile-time and an overall 12% compile-time
saving at -O1.
Using bitmap_ior in the first loop or avoiding (immediate) re-processing
of blocks in def_blocks does not make a measurable difference for the
testcase so I left this as-is.
PR tree-optimization/114480
* cfganal.cc (compute_idf): Split processing of the initial
workset from the main iteration. Use a vector for the
workset of the main iteration.
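The bitmap-to-vector workset change follows a general worklist pattern: a
vector gives cheap push/pop, while a separate membership set deduplicates
entries. A generic sketch of iterating a frontier function to a fixed point
(not the actual compute_idf code):

```cpp
#include <map>
#include <set>
#include <vector>

// Iterate frontier function DF to a fixed point, seeded from INIT.
// The result set deduplicates; the vector is the cheap-to-pop worklist.
inline std::set<int>
iterate_frontier (const std::map<int, std::set<int>> &df,
                  const std::set<int> &init)
{
  std::set<int> result;
  std::vector<int> work (init.begin (), init.end ());
  while (!work.empty ())
    {
      int b = work.back ();
      work.pop_back ();
      auto it = df.find (b);
      if (it == df.end ())
        continue;
      for (int f : it->second)
        if (result.insert (f).second)   // newly added: process later
          work.push_back (f);
    }
  return result;
}
```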
|
|
The linker rejects --relax in relocatable links (-r), hence only
add --relax when -r is not specified.
gcc/
PR target/121608
* config/avr/specs.h (LINK_RELAX_SPEC): Wrap in %{!r...}.
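In GCC's spec language, %{!r:...} expands its contents only when -r is not
given on the command line, so the guarded spec has roughly this shape
(illustrative; the exact avr spec string may differ):

    %{!r:--relax}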
|
|
vect_analyze_slp_instance still handles stores and reduction chains.
The following splits out the special handling of those two kinds,
duplicating vect_build_slp_instance into two specialized entry points.
* tree-vect-slp.cc (vect_analyze_slp_reduc_chain): New,
copied from vect_analyze_slp_instance and only handle
slp_inst_kind_reduc_chain. Inline vect_build_slp_instance.
(vect_analyze_slp_instance): Only handle slp_inst_kind_store.
Inline vect_build_slp_instance.
(vect_build_slp_instance): Remove now unused stmt_info parameter,
remove special code for store groups and reduction chains.
(vect_analyze_slp): Call vect_analyze_slp_reduc_chain
for reduction chain SLP build and adjust.
|
|
The restriction no longer applies, so remove it.
* tree-vect-data-refs.cc (vect_check_gather_scatter):
Remove restriction on epilogue of epilogue vectorization.
|
|
The following removes the fixup we apply to pattern stmt operands
before code generating vector epilogues. This isn't necessary anymore
since the SLP graph now exclusively records the data flow. Similarly,
fixing up SSA references inside the DR_REF of gather/scatter accesses
isn't necessary since we now record the analysis result and avoid
re-doing it during transform.
What we still need to keep is the adjustment of the actual pointers
to gimple stmts from stmt_vec_info and the back-reference from the DRs.
* tree-vect-loop.cc (update_epilogue_loop_vinfo): Remove
fixing up pattern stmt operands and gather/scatter DR_REFs.
(find_in_mapping): Remove.
|
|
The following is a patch to make us record the get_load_store_type
results from load/store analysis and re-use them during transform.
In particular this moves where SLP_TREE_MEMORY_ACCESS_TYPE is stored.
A major hassle was (and still is, to some extent) gather/scatter
handling with its accompanying gather_scatter_info. Since
get_load_store_type no longer fully re-analyzes them and parts of
the information are recorded in the SLP tree during SLP build, the
following eliminates the use of this data in
vectorizable_load/store, instead recording the other relevant
part in the load-store info (namely the IFN or decl chosen).
Strided load handling keeps the re-analysis but populates the
data back to the SLP tree and the load-store info. That's something
for further improvement. This also shows that classifying an SLP
tree as load/store early and allocating the load-store data might
be a way to move all of the gather/scatter auxiliary data back
into one place.
Rather than mass-replacing references to variables I've kept the
locals but made them read-only, only adjusting a few elsval setters
and adding a FIXME to strided SLP handling of alignment (allowing
local override there).
The FIXME shows that while a lot of analysis is done in
get_load_store_type that's far from all of it. There's also
a possibility that splitting up the transform phase into
separate load/store def types, based on the VMAT chosen, will make
the code more maintainable.
* tree-vectorizer.h (vect_load_store_data): New.
(_slp_tree::memory_access_type): Remove.
(SLP_TREE_MEMORY_ACCESS_TYPE): Turn into inline function.
* tree-vect-slp.cc (_slp_tree::_slp_tree): Do not
initialize SLP_TREE_MEMORY_ACCESS_TYPE.
* tree-vect-stmts.cc (check_load_store_for_partial_vectors):
Remove gather_scatter_info pointer argument, instead get
info from the SLP node.
(vect_build_one_gather_load_call): Get SLP node and builtin
decl as argument and remove uses of gather_scatter_info.
(vect_build_one_scatter_store_call): Likewise.
(vect_get_gather_scatter_ops): Remove uses of gather_scatter_info.
(vect_get_strided_load_store_ops): Get SLP node and remove
uses of gather_scatter_info.
(get_load_store_type): Take pointer to vect_load_store_data
instead of individual pointers.
(vectorizable_store): Adjust. Re-use get_load_store_type
result from analysis time.
(vectorizable_load): Likewise.
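The underlying pattern is the classic analysis/transform split: decide once
during analysis, record the decision in a side structure attached to the node,
and make the transform phase a pure consumer. A generic sketch with
hypothetical names (not the actual vect_load_store_data layout):

```cpp
#include <cassert>
#include <optional>

enum access_kind { CONTIGUOUS, STRIDED, GATHER };

// Decision record filled during analysis, consumed at transform time.
struct load_store_data
{
  access_kind kind;
  int chosen_vf;   // e.g. a vectorization factor picked during analysis
};

struct slp_node
{
  int stride;
  std::optional<load_store_data> ls;  // set by analyze, read by transform
};

inline bool analyze (slp_node &n)
{
  // All costly classification happens here, exactly once.
  load_store_data d;
  d.kind = n.stride == 1 ? CONTIGUOUS : STRIDED;
  d.chosen_vf = 4;
  n.ls = d;
  return true;
}

inline access_kind transform (const slp_node &n)
{
  // Re-use the recorded decision instead of re-analyzing.
  assert (n.ls.has_value ());
  return n.ls->kind;
}
```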
|