Age | Commit message (Collapse) | Author | Files | Lines |
|
Configuring gcc for --target=powerpc-wrs-vxworks7r2 sets things up for
a 64-bit compiler, just like powerpc64-wrs-vxworks7r2, except that
TARGET_VXWORKS64 is only defined as 1 for targets that match
*64-*-vxworks*.
With !TARGET_VXWORKS64, we get a 64-bit toolchain that defines
SIZE_TYPE, PTRDIFF_TYPE, and WCHAR_TYPE as 32-bit types, and that
breaks GCC passes that expect SIZE_TYPE and PTRDIFF_TYPE to be as wide
as pointers.
Arrange for TARGET_VXWORKS64 on ppc to match TARGET_64BIT, after using
it to select the default word size with driver self specs.
for gcc/ChangeLog
* config/rs6000/vxworks.h (SUBTARGET_DRIVER_SELF_SPECS):
Redefine to select word size matching TARGET_VXWORKS64.
(TARGET_VXWORKS64): Redefine in terms of TARGET_64BIT.
|
|
|
|
prefetch was recently fixed/tightened (with Q reg constraint) to only
support right address patterns (REG or REG+D with lower 5 bits clear).
However in some cases that's too restrictive for LRA and it fails to
allocate a reg resulting in following ICE...
| gcc/testsuite/gcc.target/riscv/pr118241-b.cc:31:19: error: unable to generate reloads for:
| 31 | void m() { a.l(); }
| | ^
|(insn 26 25 27 7 (prefetch (mem/f:DI (plus:DI (reg/f:DI 143 [ _5 ])
| (const_int 56 [0x38])) [5 _5->batch[6]+0 S8 A64])
| (const_int 0 [0])
| (const_int 3 [0x3])) "gcc/testsuite/gcc.target/riscv/pr118241-b.cc":18:29 498 {prefetch}
| (expr_list:REG_DEAD (reg/f:DI 142 [ _5->batch[6] ])
| (nil)))
|during RTL pass: reload
Fix that by providing a fallback alternative register constraint to reload the address.
PR target/118241
gcc/ChangeLog:
* config/riscv/riscv.md (prefetch): Add alternative "r".
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr118241-b.cc: New test.
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
|
|
Spotted this by chance as I saw a similar fixup in comment.
From comments, I think this is needed, but I've not hit any issues due
to this.
gcc/ChangeLog:
* config/riscv/predicates.md (prefetch_operand): mack 5 bits.
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
|
|
A right shift of 31 will become 0 or 1, this can be checked for
treg_set_expr_not_const01 to avoid matching addc_t_r as this
can expand to a 3 insn sequence instead.
This improves tests 023 to 026 from gcc.target/sh/pr54236-2.c, e.g.:
test_023:
shll r5
mov #0,r1
mov r4,r0
rts
addc r1,r0
With this change:
test_023:
shll r5
movt r0
rts
add r4,r0
We noticed this while evaluating a patch to improve how we handle
selecting between two constants based on the output of a LT/GE 0
test.
gcc/ChangeLog:
* config/sh/predicates.md
(treg_set_expr_not_const01): call sh_recog_treg_set_expr_not_01
* config/sh/sh-protos.h
(sh_recog_treg_set_expr_not_01): New function
* config/sh/sh.cc (sh_recog_treg_set_expr_not_01): Likewise
gcc/testsuite/ChangeLog:
* gcc.target/sh/pr54236-2.c: Fix comments and expected output
|
|
When GCC is built with clang, it suggests that we add a brace to the
initialization of format_asterisk:
gcc/fortran/io.cc:32:16: warning: suggest braces around initialization of subobject [-Wmissing-braces]
So this patch does that to silence it.
gcc/fortran/ChangeLog:
2025-06-24 Martin Jambor <mjambor@suse.cz>
* io.cc (format_asterisk): Add a brace around static initialization
location part of the field locus.
|
|
tree_expr_nonnegative_warnv_p [PR118948]
This is an obvious fix for this small regression. Basically after r15-328-g5726de79e2154a,
there is a call to tree_expr_nonnegative_warnv_p where the type of the expression is now
error_mark_node. Though there was only a check if the expression was error_mark_node.
Bootstrapped and tested on x86_64-linux-gnu.
PR c/118948
gcc/ChangeLog:
* fold-const.cc (tree_expr_nonnegative_warnv_p): Use
error_operand_p instead of checking for error_mark_node directly.
gcc/testsuite/ChangeLog:
* gcc.dg/pr118948-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
Here we were asserting non-zero errorcount, which is not the case if the
parse error was reduced to a warning (or silenced) in a template body. So
check seen_error instead.
PR c++/120575
PR c++/116064
gcc/cp/ChangeLog:
* parser.cc (cp_parser_abort_tentative_parse): Check seen_error
instead of errorcount.
gcc/testsuite/ChangeLog:
* g++.dg/template/permissive-error3.C: New test.
|
|
0, 1 and 2
Add asm dump check test for vec_duplicate + vsadd.vv combine to
vsadd.vx, with the GR2VR cost is 0, 1 and 2.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c: Add asm check.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i8.c: Ditto.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
0, 2 and 15
Add asm dump check and run test for vec_duplicate + vsadd.vv
combine to vsadd.vx, with the GR2VR cost is 0, 2 and 15.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h: Add test
helper macros.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h: Add test
data for run test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vsadd-run-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vsadd-run-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vsadd-run-1-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vsadd-run-1-i8.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch would like to combine the vec_duplicate + vsadd.vv to the
vsadd.vx. From example as below code. The related pattern will depend
on the cost of vec_duplicate from GR2VR. Then the late-combine will
take action if the cost of GR2VR is zero, and reject the combination
if the GR2VR cost is greater than zero.
Assume we have example code like below, GR2VR cost is 0.
#define DEF_SAT_S_ADD(T, UT, MIN, MAX) \
T \
test_##T##_sat_add (T x, T y) \
{ \
T sum = (UT)x + (UT)y; \
return (x ^ y) < 0 \
? sum \
: (sum ^ x) >= 0 \
? sum \
: x < 0 ? MIN : MAX; \
}
DEF_SAT_S_ADD(int32_t, uint32_t, INT32_MIN, INT32_MAX)
DEF_VX_BINARY_CASE_2_WRAP(T, SAT_S_ADD_FUNC(T), sat_add)
Before this patch:
10 │ test_vx_binary_or_int32_t_case_0:
11 │ beq a3,zero,.L8
12 │ vsetvli a5,zero,e32,m1,ta,ma
13 │ vmv.v.x v2,a2
14 │ slli a3,a3,32
15 │ srli a3,a3,32
16 │ .L3:
17 │ vsetvli a5,a3,e32,m1,ta,ma
18 │ vle32.v v1,0(a1)
19 │ slli a4,a5,2
20 │ sub a3,a3,a5
21 │ add a1,a1,a4
22 │ vsadd.vv v1,v1,v2
23 │ vse32.v v1,0(a0)
24 │ add a0,a0,a4
25 │ bne a3,zero,.L3
After this patch:
10 │ test_vx_binary_or_int32_t_case_0:
11 │ beq a3,zero,.L8
12 │ slli a3,a3,32
13 │ srli a3,a3,32
14 │ .L3:
15 │ vsetvli a5,a3,e32,m1,ta,ma
16 │ vle32.v v1,0(a1)
17 │ slli a4,a5,2
18 │ sub a3,a3,a5
19 │ add a1,a1,a4
20 │ vsadd.vx v1,v1,a2
21 │ vse32.v v1,0(a0)
22 │ add a0,a0,a4
23 │ bne a3,zero,.L3
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_vx_binary_vec_dup_vec): Add
new case SS_PLUS.
(expand_vx_binary_vec_vec_dup): Ditto.
* config/riscv/riscv.cc (riscv_rtx_costs): Ditto.
* config/riscv/vector-iterators.md: Add new op ss_plus.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
The following avoids translating expressions through volatile
copies.
PR tree-optimization/120944
* tree-ssa-sccvn.cc (vn_reference_lookup_3): Gate optimizations
invalid when volatile is involved.
* gcc.dg/torture/pr120944.c: New testcase.
|
|
This effectively adds 250 new tests, i.e. around 10% more tests.
gcc/ada/
* gcc-interface/Make-lang.in (ACATSDIR): Change to acats-4.
|
|
This happens when aggressive optimizations are enabled (i.e. -O2 and above)
because the ivopts pass fails to properly mark the new memory accesses it is
creating as misaligned by means of the build_aligned_type function.
gcc/ada/ChangeLog:
* gcc-interface/utils.cc (make_packable_type): Clear the TYPE_PACKED
flag in the case where the alignment is bumped.
|
|
System.Pack_NN
Initialization procedures are turned into functions under the hood and, even
when they are null (empty), the compiler may generate a convoluted sequence
of instructions that return uninitialized data and, therefore, is useless.
gcc/ada/ChangeLog:
* gcc-interface/trans.cc (Subprogram_Body_to_gnu): Do not generate
a block-copy out for a null initialization procedure when the _Init
parameter is not passed in.
|
|
The debugger cannot correctly interpret the return value in this case.
gcc/ada/ChangeLog:
* gcc-interface/decl.cc (gnat_to_gnu_subprog_type): Only apply the
transformation to integer types.
|
|
We need to make sure that an integer type exists for the given size.
gcc/ada/ChangeLog:
* gcc-interface/decl.cc (gnat_to_gnu_subprog_type): Add guards.
|
|
The second problem occurs on 64-bit platforms where there is a second Out
parameter that is smaller than the access parameter, creating a hole in the
return structure.
gcc/ada/ChangeLog:
* gcc-interface/decl.cc (gnat_to_gnu_subprog_type): In the case of a
subprogram using the Copy-In/Copy-Out mechanism, deal specially with
the case of 2 parameters of differing sizes.
* gcc-interface/trans.cc (Subprogram_Body_to_gnu): In the case of a
subprogram using the Copy-In/Copy-Out mechanism, make sure the types
are consistent on the two sides for all the parameters.
|
|
If -gnatwr is enabled, then in some cases a type conversion between two
different Boolean types incorrectly results in a warning that the conversion
is redundant.
gcc/ada/ChangeLog:
* sem_res.adb (Resolve_Type_Conversion): Replace code for
detecting a similar case with a more comprehensive test.
|
|
Improve documentation of pragma Short_Circuit_And_Or.
Also disallow renamings, because the semantics as currently
implemented is confusing.
gcc/ada/ChangeLog:
* doc/gnat_rm/implementation_defined_pragmas.rst
(Short_Circuit_And_Or): Add more documentation.
* sem_ch8.adb (Analyze_Subprogram_Renaming):
Disallow renamings.
* gnat_rm.texi: Regenerate.
|
|
The newly introduced Finalizable aspect makes it possible to derive from
a type that is not tagged but has a Finalize primitive. This patch fixes
problems where overridings of the Finalize primitive were ignored.
gcc/ada/ChangeLog:
* exp_ch7.adb (Make_Final_Call): Tweak search of Finalize primitive.
* exp_util.adb (Finalize_Address): Likewise.
|
|
We fail to use the implementation permission given by RM 13.9(12) because
the array type does not have the Size_Known_At_Compile_Time flag set.
gcc/ada/ChangeLog:
* freeze.adb (Check_Compile_Time_Size): Try harder to see whether
the bounds of array types are known at compile time.
|
|
Cleanup; technical commit meant to trigger a GNAT continuous builder.
gcc/ada/ChangeLog:
* sem_aux.ads (First_Discriminant): Remove space before period.
|
|
Even when -gnatw.c is enabled, no warning about a missing component clause
should be generated if the placement of a discriminant of an Unchecked_Union
type is left unspecified in a record representation clause (such a discriminant
occupies no storage). In determining whether to generate such a warning, in
some cases the compiler would incorrectly ignore an Unchecked_Union pragma
occurring after the record representation clause. This could result in a
spurious warning.
gcc/ada/ChangeLog:
* sem_ch13.adb (Analyze_Record_Representation_Clause): In deciding
whether to generate a warning about a missing component clause, in
addition to calling Is_Unchecked_Union also call a new local
function, Unchecked_Union_Pragma_Pending, which checks for the
case of a not-yet-analyzed Unchecked_Union pragma occurring later
in the declaration list.
|
|
limit
Improve the error message that is generated when the size of tagged type
exceeds a Size'Class limit specified for an ancestor type.
gcc/ada/ChangeLog:
* mutably_tagged.adb (Make_CW_Size_Compile_Check): Include the
value of the Size'Class limit in the message generated via a
Compile_Time_Error pragma.
|
|
This patch removes some comments and object definitions that referred to
a hacky use of the Entity field that had been removed by the latest
rework of the internal representation of aspects.
gcc/ada/ChangeLog:
* sem_ch13.adb (Check_Aspect_At_Freeze_Point): Remove obsolete bits.
|
|
The format string used for the error in that case requires setting the
Error_Msg_Name_1 global variable. This was not done so this patch adds
the missing assignment.
gcc/ada/ChangeLog:
* sem_ch13.adb (Analyze_Aspect_Specifications): Fix error emission.
|
|
gcc/ChangeLog:
* common.opt: Add period.
* common.opt.urls: Regenerate.
|
|
The following fixes bad alignment computaton for epilog vectorization
when as in this case for 510.parest_r and masked epilog vectorization
with AVX512 we end up choosing AVX to vectorize the main loop and
masked AVX512 (sic!) to vectorize the epilog. In that case alignment
analysis for the epilog tries to force alignment of the base to 64,
but that cannot possibly help the epilog when the main loop had used
a vector mode with smaller alignment requirement.
There's another issue, that the check whether the step preserves
alignment needs to consider possibly previously involved VFs
(here, the main loops smaller VF) as well.
These might not be the only case with problems for such a mode mix
but at least there it seems wise to never use DR alignment forcing
when analyzing an epilog.
We get to chose this mode setup because the iteration over epilog
modes doesn't prevent this, the maybe_ge (cached_vf_per_mode[0],
first_vinfo_vf) skip is conditional on !supports_partial_vectors
and it is also conditional on having a cached VF. Further nothing
in vect_analyze_loop_1 rejects this setup - it might be conceivable
that a target can do masking only for larger modes. There is a
second reason we end up with this mode setup, which is that
vect_need_peeling_or_partial_vectors_p says we do not need
peeling or partial vectors when analyzing the main loop with
AVX512 (if it would say so we'd have chosen a masked AVX512
epilog-only vectorization). It does that because it looks at
LOOP_VINFO_COST_MODEL_THRESHOLD (which is not yet computed, so
always zero at this point), and compares max_niter (5) against
the VF (8), but not with equality as the comment says but with
greater. This also needs looking at, PR120939.
PR tree-optimization/120927
* tree-vect-data-refs.cc (vect_compute_data_ref_alignment):
Do not force a DRs base alignment when analyzing an
epilog loop. Check whether the step preserves alignment
for all VFs possibly involved sofar.
* gcc.dg/vect/vect-pr120927.c: New testcase.
* gcc.dg/vect/vect-pr120927-2.c: Likewise.
|
|
The following testcase is miscompiled with -fsanitize=undefined but we
introduce UB into the IL even without that flag.
The optimization ptr +- (expr +- cst) when expr/cst have undefined
overflow into (ptr +- cst) +- expr is sometimes simply not valid,
without careful analysis on what ptr points to we don't know if it
is valid to do (ptr +- cst) pointer arithmetics.
E.g. on the testcase, ptr points to start of an array (actually
conditionally one or another) and cst is -1, so ptr - 1 is invalid
pointer arithmetics, while ptr + (expr - 1) can be valid if expr
is at runtime always > 1 and smaller than size of the array ptr points
to + 1.
Unfortunately, removing this 1992-ish optimization altogether causes
FAIL: c-c++-common/restrict-2.c -Wc++-compat scan-tree-dump-times lim2 "Moving statement" 11
FAIL: gcc.dg/tree-ssa/copy-headers-5.c scan-tree-dump ch2 "is now do-while loop"
FAIL: gcc.dg/tree-ssa/copy-headers-5.c scan-tree-dump-times ch2 " if " 3
FAIL: gcc.dg/vect/pr57558-2.c scan-tree-dump vect "vectorized 1 loops"
FAIL: gcc.dg/vect/pr57558-2.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
regressions (restrict-2.c also for C++ in all std modes). I've been thinking
about some match.pd optimization for signed integer addition/subtraction of
constant followed by widening integral conversion followed by multiplication
or left shift, but that wouldn't help 32-bit arches.
So, instead at least for now, the following patch keeps doing the
optimization, just doesn't perform it in pointer arithmetics.
pointer_int_sum itself actually adds the multiplication by size_exp,
so ptr + expr is turned into ptr p+ expr * size_exp,
so this patch will try to optimize
ptr + (expr +- cst)
into
ptr p+ ((sizetype)expr * size_exp +- (sizetype)cst * size_exp)
and
ptr - (expr +- cst)
into
ptr p+ -((sizetype)expr * size_exp +- (sizetype)cst * size_exp)
2025-07-04 Jakub Jelinek <jakub@redhat.com>
PR c/120837
* c-common.cc (pointer_int_sum): Rewrite the intop PLUS_EXPR or
MINUS_EXPR optimization into extension of both intop operands,
their separate multiplication and then addition/subtraction followed
by rest of pointer_int_sum handling after the multiplication.
* gcc.dg/ubsan/pr120837.c: New test.
|
|
I mistyped the file name :(.
gcc/testsuite/ChangeLog:
PR target/120807
* gcc.c-torture/compile/pr120708.c: Rename to ...
* gcc.c-torture/compile/pr120807.c: ... here.
|
|
The register_operand predicate can match subreg, then we'd have a subreg
of subreg and it's invalid. Use lowpart_subreg to avoid the nested
subreg.
gcc/ChangeLog:
* config/loongarch/loongarch.md (crc_combine): Avoid nested
subreg.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/pr120708.c: New test.
|
|
We were looking to evaluate some changes from Artemiy that improve GCC's
ability to discover fusible instruction pairs. There was no good way to get
any static data out of the compiler about what kinds of fusions were happening.
Yea, you could grub around the .sched dumps looking for the magic '+'
annotation, then look around at the slim RTL representation and make an
educated guess about what fused. But boy that was inconvenient.
All we really needed was a quick note in the dump file that the target hook
found a fusion pair and what kind was discovered. That made it easy to spot
invalid fusions, evaluate the effectiveness of Artemiy's work, write/discover
testcases for existing fusions and implement new fusions.
So from a codegen standpoint this is NFC, it only affects dump file output.
It's gone through the usual testing and I'll wait for pre-commit CI to churn
through it before moving forward.
gcc/
* config/riscv/riscv.cc (riscv_macro_fusion_pair_p): Add basic
instrumentation to all cases where fusion is detected. Fix
minor formatting goofs found in the process.
|
|
This patch adds testcase for form2, as shown below:
T __attribute__((noinline)) \
sat_s_add_imm_##T##_fmt_2##_##INDEX (T x) \
{ \
T sum = (T)((UT)x + (UT)IMM); \
return ((x ^ sum) < 0 && (x ^ IMM) >= 0) ? \
(-(T)(x < 0) ^ MAX) : sum; \
}
Passed the rv64gcv regression test.
Signed-off-by: Ciyan Pan <panciyan@eswincomputing.com>
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat/sat_arith.h: Add signed scalar SAT_ADD IMM form2.
* gcc.target/riscv/sat/sat_s_add_imm-2-i16.c: New test.
* gcc.target/riscv/sat/sat_s_add_imm-2-i32.c: New test.
* gcc.target/riscv/sat/sat_s_add_imm-2-i64.c: New test.
* gcc.target/riscv/sat/sat_s_add_imm-2-i8.c: New test.
* gcc.target/riscv/sat/sat_s_add_imm-run-2-i16.c: New test.
* gcc.target/riscv/sat/sat_s_add_imm-run-2-i32.c: New test.
* gcc.target/riscv/sat/sat_s_add_imm-run-2-i64.c: New test.
* gcc.target/riscv/sat/sat_s_add_imm-run-2-i8.c: New test.
* gcc.target/riscv/sat/sat_s_add_imm_type_check-2-i16.c: New test.
* gcc.target/riscv/sat/sat_s_add_imm_type_check-2-i32.c: New test.
* gcc.target/riscv/sat/sat_s_add_imm_type_check-2-i8.c: New test.
|
|
This patch would like to support signed scalar SAT_ADD IMM form 2
Form2:
T __attribute__((noinline)) \
sat_s_add_imm_##T##_fmt_2##_##INDEX (T x) \
{ \
T sum = (T)((UT)x + (UT)IMM); \
return ((x ^ sum) < 0 && (x ^ IMM) >= 0) ? \
(-(T)(x < 0) ^ MAX) : sum; \
}
Take below form1 as example:
DEF_SAT_S_ADD_IMM_FMT_2(0, int8_t, uint8_t, 9, INT8_MIN, INT8_MAX)
Before this patch:
__attribute__((noinline))
int8_t sat_s_add_imm_int8_t_fmt_2_0 (int8_t x)
{
int8_t sum;
unsigned char x.0_1;
unsigned char _2;
signed char _3;
signed char _4;
_Bool _5;
signed char _6;
int8_t _7;
int8_t _10;
signed char _11;
signed char _13;
signed char _14;
<bb 2> [local count: 1073741822]:
x.0_1 = (unsigned char) x_8(D);
_2 = x.0_1 + 9;
sum_9 = (int8_t) _2;
_3 = x_8(D) ^ sum_9;
_4 = x_8(D) ^ 9;
_13 = ~_3;
_14 = _4 | _13;
if (_14 >= 0)
goto <bb 3>; [59.00%]
else
goto <bb 4>; [41.00%]
<bb 3> [local count: 259738146]:
_5 = x_8(D) < 0;
_11 = (signed char) _5;
_6 = -_11;
_10 = _6 ^ 127;
<bb 4> [local count: 1073741824]:
# _7 = PHI <sum_9(2), _10(3)>
return _7;
}
After this patch:
__attribute__((noinline))
int8_t sat_s_add_imm_int8_t_fmt_2_0 (int8_t x)
{
int8_t _7;
<bb 2> [local count: 1073741824]:
_7 = .SAT_ADD (x_8(D), 9); [tail call]
return _7;
}
The below test suites are passed for this patch:
1. The rv64gcv fully regression tests.
2. The x86 bootstrap tests.
3. The x86 fully regression tests.
Signed-off-by: Ciyan Pan <panciyan@eswincomputing.com>
gcc/ChangeLog:
* match.pd: Add signed scalar SAT_ADD IMM form2 matching.
|
|
|
|
In this testcase there is nothing in the lambda except a static_assert which
mentions a variable from the enclosing scope but does not odr-use it, so we
want prune_lambda_captures to remove its capture. Since the lambda is so
empty, there's nothing in the body except the DECL_EXPR of the capture
proxy, so pop_stmt_list moves that into the enclosing STATEMENT_LIST and
passes the 'body' STATEMENT_LIST to free_stmt_list. As a result, passing
'body' to prune_lambda_captures is wrong; we should instead pass the
enclosing scope, i.e. cur_stmt_list.
PR c++/120716
gcc/cp/ChangeLog:
* lambda.cc (finish_lambda_function): Pass cur_stmt_list to
prune_lambda_captures.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/lambda/lambda-constexpr3.C: New test.
* g++.dg/cpp0x/lambda/lambda-constexpr3a.C: New test.
|
|
This testcase was crashing from infinite recursion in the diagnostic
machinery, trying to print the lambda signature, which referred to the
__this capture field in the lambda, which wanted to print the lambda again.
But we don't want the signature to refer to the capture field; 'this' in an
unevaluated context refers to the 'this' from the enclosing function, not
the capture.
After fixing that, we still wrongly rejected the B case because
THIS_FORBIDDEN is set in a default (template) argument. Since we don't
distinguish between THIS_FORBIDDEN being set for a default argument and it
being set for a static member function, let's just ignore it if
cp_unevaluated_operand; we'll give a better diagnostic for the static memfn
case in finish_this_expr.
PR c++/120748
gcc/cp/ChangeLog:
* lambda.cc (lambda_expr_this_capture): Don't return a FIELD_DECL.
* parser.cc (cp_parser_primary_expression): Ignore THIS_FORBIDDEN
if cp_unevaluated_operand.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/lambda-targ16.C: New test.
* g++.dg/cpp0x/this1.C: Adjust diagnostics.
|
|
No idea how this slipped in, I'm terribly sorry.
Strangely nothing in the testsuite has caught this, so I've added
a new test for that.
2025-07-03 Jakub Jelinek <jakub@redhat.com>
PR c++/120940
* typeck.cc (cp_build_array_ref): Fix a pasto.
* g++.dg/parse/pr120940.C: New test.
* g++.dg/warn/Wduplicated-branches9.C: New test.
|
|
It was removed from the compiler a few releases ago.
gcc/ada/
* gcc-interface/Makefile.in (gnatlib-sjlj): Delete.
(gnatlib-zcx): Do not modify Frontend_Exceptions constant.
* libgnat/system-linux-loongarch.ads (Frontend_Exceptions): Delete.
|
|
s390 missed constant vector permutation cases based on the vector pack
instruction or changing the size of the vector elements during vector
merge. This enables some more patterns that do not need to load a
constant vector for permutation.
gcc/ChangeLog:
* config/s390/s390.cc (expand_perm_with_merge): Add size change cases.
(expand_perm_with_pack): New function.
(vectorize_vec_perm_const_1): Wire up new function.
gcc/testsuite/ChangeLog:
* gcc.target/s390/vector/vec-perm-merge-1.c: New test.
* gcc.target/s390/vector/vec-perm-pack-1.c: New test.
Signed-off-by: Juergen Christ <jchrist@linux.ibm.com>
|
|
With this fix-up for commit 387209938d2c476a67966c6ddbdbf817626f24a2
"OpenMP: Add omp_get_initial_device/omp_get_num_devices builtins", we progress:
PASS: c-c++-common/gomp/omp_get_num_devices_initial_device.c (test for excess errors)
PASS: c-c++-common/gomp/omp_get_num_devices_initial_device.c scan-tree-dump-not optimized "abort"
-FAIL: c-c++-common/gomp/omp_get_num_devices_initial_device.c scan-tree-dump-times optimized "omp_get_num_devices;" 1
+PASS: c-c++-common/gomp/omp_get_num_devices_initial_device.c scan-tree-dump-times optimized "omp_get_num_devices" 1
PASS: c-c++-common/gomp/omp_get_num_devices_initial_device.c scan-tree-dump optimized "_1 = __builtin_omp_get_num_devices \\(\\);[\\r\\n]+[ ]+return _1;"
... etc. for offloading configurations.
gcc/testsuite/
* c-c++-common/gomp/omp_get_num_devices_initial_device.c: Fix.
* gfortran.dg/gomp/omp_get_num_devices_initial_device.f90: Likewise.
|
|
candidates
A number of folks have had their fingers in this code and it's going to take a
few submissions to do everything we want to do.
This patch is primarily concerned with avoiding signaling that fusion can occur
in cases where it obviously should not be signaling fusion.
Every DEC based fusion I'm aware of requires the first instruction to set a
destination register that is both used and set again by the second instruction.
If the two instructions set different registers, then the destination of the
first instruction was not dead and would need to have a result produced.
This is complicated by the fact that we have pseudo registers prior to reload.
So the approach we take is to signal fusion prior to reload even if the
destination registers don't match. Post reload we require them to match.
That allows us to clean up the code ever-so-slightly.
Second, we sometimes signaled fusion into loads that weren't scalar integer
loads. I'm not aware of a design that's fusing into FP loads or vector loads.
So those get rejected explicitly.
Third, the store pair "fusion" code is cleaned up a little. We use fusion to
model store pair commits since the basic properties for detection are the same.
The point where they "fuse" is different. Also this code liked to "return
false" at each step along the way if fusion wasn't possible. Future work for
additional fusion cases makes that behavior undesirable. So the logic gets
reworked a little bit to be more friendly to future work.
Fourth, if we already fused the previous instruction, then we can't fuse it
again. Signaling fusion in that case is, umm, bad as it creates an atomic blob
of code from a scheduling standpoint.
Hopefully I got everything correct with extracting this work out of a larger
set of changes 🙂 We will contribute some instrumentation & testing code so if
I botched things in a major way we'll soon have a way to test that and I'll be
on the hook to fix any goof's.
From a correctness standpoint this should be a big fat nop. We've seen this
make measurable differences in pico benchmarks, but obviously as you scale up
to bigger stuff the gains largely disappear into the noise.
This has been through Ventana's internal CI and my tester. I'll obviously wait
for a verdict from the pre-commit tester.
PR target/118886
gcc/
* config/riscv/riscv.cc (riscv_macro_fusion_pair_p): Check
for fusion being disabled earlier. If PREV is already fused,
then it can't be fused again. Be more selective about fusing
when the destination registers do not match. Don't fuse into
loads that aren't scalar integer modes. Revamp store pair
commit support.
Co-authored-by: Daniel Barboza <dbarboza@ventanamicro.com>
Co-authored-by: Shreya Munnangi <smunnangi1@ventanamicro.com>
|
|
gcc.dg/ipa/pr120295.c FAILs on Solaris:
FAIL: gcc.dg/ipa/pr120295.c (test for excess errors)
Excess errors:
ld: warning: symbol 'glob' has differing types:
(file /var/tmp//ccsDR59c.o type=OBJT; file /lib/libc.so type=FUNC);
/var/tmp//ccsDR59c.o definition taken
Fixed by renaming the glob variable to glob_ to avoid the conflict.
Tested on i386-pc-solaris2.11 and x86_64-pc-linux-gnu.
gcc/testsuite:
* gcc.dg/ipa/pr120295.c (glob): Rename to glob_.
|
|
Move the rules for CBZ/TBZ to be above the rules for
CBB<cond>/CBH<cond>/CB<cond>. We want them to have higher priority
because they can express larger displacements.
gcc/ChangeLog:
* config/aarch64/aarch64.md (aarch64_cbz<optab><mode>1): Move
above rules for CBB<cond>/CBH<cond>/CB<cond>.
(*aarch64_tbz<optab><mode>1): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/cmpbr.c: Update tests.
|
|
Add rules for lowering `cbranch<mode>4` to CBB<cond>/CBH<cond>/CB<cond> when
CMPBR extension is enabled.
gcc/ChangeLog:
* config/aarch64/aarch64-protos.h (aarch64_cb_rhs): New function.
* config/aarch64/aarch64.cc (aarch64_cb_rhs): Likewise.
* config/aarch64/aarch64.md (cbranch<mode>4): Rename to ...
(cbranch<GPI:mode>4): ...here, and emit CMPBR if possible.
(cbranch<SHORT:mode>4): New expand rule.
(aarch64_cb<INT_CMP:code><GPI:mode>): New insn rule.
(aarch64_cb<INT_CMP:code><SHORT:mode>): Likewise.
* config/aarch64/constraints.md (Uc0): New constraint.
(Uc1): Likewise.
(Uc2): Likewise.
* config/aarch64/iterators.md (cmpbr_suffix): New mode attr.
(INT_CMP): New code iterator.
(cmpbr_imm_constraint): New code attr.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/cmpbr.c:
|
|
Commit the test file `cmpbr.c` before rules for generating the new
instructions are added, so that the changes in codegen are more obvious
in the next commit.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp: Add `cmpbr` to the list of extensions.
* gcc.target/aarch64/cmpbr.c: New test.
|
|
Add the `+cmpbr` option to enable the FEAT_CMPBR architectural
extension.
gcc/ChangeLog:
* config/aarch64/aarch64-option-extensions.def (cmpbr): New
option.
* config/aarch64/aarch64.h (TARGET_CMPBR): New macro.
* doc/invoke.texi (cmpbr): New option.
|
|
The `far_branch` attribute only ever takes the values 0 or 1, so make it
a `no/yes` valued string attribute instead.
gcc/ChangeLog:
* config/aarch64/aarch64.md (far_branch): Replace 0/1 with
no/yes.
(aarch64_bcond): Handle rename.
(aarch64_cbz<optab><mode>1): Likewise.
(*aarch64_tbz<optab><mode>1): Likewise.
(@aarch64_tbz<optab><ALLI:mode><GPI:mode>): Likewise.
|
|
Extract the hardcoded values for the minimum PC-relative displacements
into named constants and document them.
gcc/ChangeLog:
* config/aarch64/aarch64.md (BRANCH_LEN_P_1MiB): New constant.
(BRANCH_LEN_N_1MiB): Likewise.
(BRANCH_LEN_P_32KiB): Likewise.
(BRANCH_LEN_N_32KiB): Likewise.
|