|
As the testcase shows, a missing unshare_expr caused the condition
to be evaluated only once instead of every time a 'declare variant'
was resolved.
PR middle-end/121922
gcc/ChangeLog:
* omp-general.cc (omp_dynamic_cond): Use 'unshare_expr' for
the user condition.
libgomp/ChangeLog:
* testsuite/libgomp.c-c++-common/declare-variant-1.c: New test.
Co-authored-by: Sandra Loosemore <sloosemore@baylibre.com>
|
|
The following re-implements the fix for PR84830, where the original
fix causes missed optimizations. The issue with PR84830 is that
we end up growing the ANTIC_IN value set during iteration, which happens
because we conditionally prune values based on ANTIC_OUT - TMP_GEN
expressions. But when ANTIC_OUT was computed including the
MAX set on one edge, we fail to take into account the implicitly
represented MAX expression set. The following rectifies this by
not pruning the value set in bitmap_set_subtract_expressions in
such cases. This avoids the pruning from the ANTIC_IN value
set when MAX is involved, and thus its later growing, removing the
need to explicitly prune it with the last-iteration set.
PR tree-optimization/121720
* tree-ssa-pre.cc (bitmap_set_subtract_expressions): Add
flag to tell whether we should copy instead of prune the
value set.
(compute_antic_aux): Remove intersection of ANTIC_IN with
the old solution. When subtracting TMP_GEN from
ANTIC_OUT do not prune the value set when MAX was involved
in the ANTIC_OUT computation.
* gcc.dg/tree-ssa/ssa-pre-36.c: New testcase.
|
|
Align move_max with prefer_vector_width for SPR/GNR/DMR, similar
to the commit below.
commit 6ea25c041964bf63014fcf7bb68fb1f5a0a4e123
Author: liuhongt <hongtao.liu@intel.com>
Date: Thu Aug 15 12:54:07 2024 +0800
Align ix86_{move_max,store_max} with vectorizer.
When none of mprefer-vector-width, avx256_optimal/avx128_optimal,
avx256_store_by_pieces/avx512_store_by_pieces is specified, GCC will
set ix86_{move_max,store_max} as max available vector length except
for AVX part.
if (TARGET_AVX512F_P (opts->x_ix86_isa_flags)
    && TARGET_EVEX512_P (opts->x_ix86_isa_flags2))
  opts->x_ix86_move_max = PVW_AVX512;
else
  opts->x_ix86_move_max = PVW_AVX128;
So for -mavx2, the vectorizer will choose 256-bit for vectorization, but
128-bit is used for struct copies; there could be a potential STLF
(store-to-load forwarding) issue due to this mismatch.
gcc/ChangeLog:
* config/i386/x86-tune.def (X86_TUNE_AVX512_MOVE_BY_PIECES):
Remove SPR/GNR/DMR.
(X86_TUNE_AVX512_STORE_BY_PIECES): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pieces-memcpy-18.c: Use -mtune=znver5
instead of -mtune=sapphirerapids.
* gcc.target/i386/pieces-memcpy-21.c: Ditto.
* gcc.target/i386/pieces-memset-46.c: Ditto.
* gcc.target/i386/pieces-memset-49.c: Ditto.
|
|
Comment #2 of PR c++/121966 notes that the "inherited here" messages
should be nested *within* the note they describe.
Implemented by this patch, which also nests other notes emitted for
rejection_reason within the first note of print_z_candidate.
gcc/cp/ChangeLog:
PR c++/121966
* call.cc (print_z_candidate): Consolidate instances of
auto_diagnostic_nesting_level into one, above the "inherited here"
message so that any such message is nested within the note,
and any messages emitted due to the switch on rejection_reason are
similarly nested within the note.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
In r15-6116-gd3dd24acd74605 I updated print_z_candidates to show the
number of candidates, and a number for each candidate.
PR c++/121966 notes that the printed count is sometimes higher than
what's actually printed: I missed the case where candidates in the
list aren't printed due to not being viable.
Fixed thusly.
gcc/cp/ChangeLog:
PR c++/121966
* call.cc (print_z_candidates): Copy the filtering logic on viable
candidates from the printing loop to the counting loop, so that
num_candidates matches the number of iterations of the latter
loop.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
gcc/testsuite/ChangeLog:
* g++.dg/analyzer/unique_ptr-1.C: Rename to...
* g++.dg/analyzer/std-unique_ptr-1.C: ...this.
* g++.dg/analyzer/unique_ptr-2.C: Rename to...
* g++.dg/analyzer/std-unique_ptr-2.C: ...this.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
In r16-2766-g7969e4859ed007 I added a new field to replay_opts
but forgot to initialize it in set_defaults.
Fixed thusly.
Spotted thanks to valgrind.
gcc/ChangeLog:
* sarif-replay.cc (set_defaults): Initialize
m_debug_physical_locations.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
After r16-3887-g597b50abb0d2fc, the check to see if the copy is
a nop copy becomes inefficient: the code goes into an infinite
loop as the copy keeps being propagated over and over again.
That is, if we have:
```
struct s1 *b = &a.t;
a.t = *b;
p = *b;
```
This goes into an infinite loop propagating over and over again the
`MEM[&a]`.
To solve this, a new function is needed for the comparison, similar
to new_src_based_on_copy.
PR tree-optimization/121962
gcc/ChangeLog:
* tree-ssa-forwprop.cc (same_for_assignment): New function.
(optimize_agr_copyprop_1): Use same_for_assignment to check for
nop copies.
(optimize_agr_copyprop): Likewise.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr121962-1.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
|
|
If both operands that are being compared are decls, operand_equal_p will already
handle that case so an early out can be done here.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-forwprop.cc (new_src_based_on_copy): Add an early
out if both operands are decls.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
|
|
This moves the code used in optimize_agr_copyprop_1 (r16-3887-g597b50abb0d)
to handle this same case into a new function and uses it inside
optimize_agr_copyprop_arg. This allows removing more copies that show up only
in arguments.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-forwprop.cc (optimize_agr_copyprop_1): Split out
the case where `operand_equal_p (dest, src2)` is false into ...
(new_src_based_on_copy): This. New function.
(optimize_agr_copyprop_arg): Use new_src_based_on_copy
instead of operand_equal_p to find the new src.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/copy-prop-aggregate-arg-2.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
|
|
Libraries like Intel MKL use 64-bit integers in their API, but gfortran
up to now only provides external BLAS for matmul with 32-bit
integers. This straightforward patch provides a new option -fexternal-blas64
to remedy that situation.
gcc/fortran/ChangeLog:
* frontend-passes.cc (optimize_namespace): Handle
flag_external_blas64.
(call_external_blas): If flag_external_blas is set, use
gfc_integer_4_kind as the argument kind, gfc_integer_8_kind otherwise.
* gfortran.h (gfc_integer_8_kind): Define.
* invoke.texi: Document -fexternal-blas64.
* lang.opt: Add -fexternal-blas64.
* lang.opt.urls: Regenerated.
* options.cc (gfc_post_options): -fexternal-blas is incompatible
with -fexternal-blas64.
gcc/testsuite/ChangeLog:
* gfortran.dg/matmul_blas_3.f90: New test.
|
|
Here's Shreya's next patch.
In pr58727 we have a case where the tree/gimple optimizers have decided to
"simplify" constants involved in logical ops by turning off as many bits as
they can in the hope that the simplified constant will be easier/smaller to
encode. That "simplified" constant gets passed down into the RTL optimizers
where it can ultimately cause a missed optimization.
Concretely let's assume we have insns 6, 7, 8 as shown in the combine dump
below:
> Trying 6, 7 -> 9:
> 6: r139:SI=r141:SI&0xfffffffffffffffd
> REG_DEAD r141:SI
> 7: r140:SI=r139:SI&0xffffffffffbfffff
> REG_DEAD r139:SI
> 9: r137:SI=r140:SI|0x2
> REG_DEAD r140:SI
We can obviously see that insn 6 is redundant as the bit we turn off would be
turned on by insn 9. But combine ultimately tries to generate:
> (set (reg:SI 137 [ _3 ])
> (ior:SI (and:SI (reg:SI 141 [ a ])
> (const_int -4194305 [0xffffffffffbffffd]))
> (const_int 2 [0x2])))
That does actually match a pattern on RISC-V, but it's a pattern that generates
two bit-clear insns (or a bit-clear followed by andi and a pattern we'll be
removing someday). But if instead we IOR 0x2 back into the simplified constant
we get:
> (set (reg:SI 137 [ _3 ])
> (ior:SI (and:SI (reg:SI 141 [ a ])
> (const_int -4194305 [0xffffffffffbfffff]))
> (const_int 2 [0x2])))
That doesn't match, but when split by generic code in the combiner we get:
> Successfully matched this instruction:
> (set (reg:SI 140)
> (and:SI (reg:SI 141 [ a ])
> (const_int -4194305 [0xffffffffffbfffff])))
> Successfully matched this instruction:
> (set (reg:SI 137 [ _3 ])
> (ior:SI (reg:SI 140)
> (const_int 2 [0x2])))
Which is bclr+bset/ori, i.e., we dropped one of the logical AND operations.
Bootstrapped and regression tested on x86 and riscv. Regression tested on the
30 or so embedded targets as well without new failures.
I'll give this a couple days for folks to chime in before pushing on Shreya's
behalf. This doesn't fix pr58727 for the other targets as they would need
target dependent hackery.
Jeff
PR tree-optimization/58727
gcc/
* simplify-rtx.cc (simplify_context::simplify_binary_operation_1):
In (A & C1) | C2, if (C1|C2) results in a constant with a single bit
clear, then adjust C1 appropriately.
gcc/testsuite/
* gcc.target/riscv/pr58727.c: New test.
|
|
When transitioning gcc.dg/torture/pr84830.c to a GIMPLE testcase to
feed the IL into PRE that caused the original issue (and verify it's
still there with the fix reverted), I noticed we set up SSA operands
before having fully parsed the function and thus before all
variables have their final TREE_ADDRESSABLE state. The following
fixes this, delaying update_stmt calls to when we create PHI nodes.
It also makes pr84830.c not rely on the particular fake exit
edge source location by making the loop have an exit.
gcc/c/
* gimple-parser.cc (c_parser_parse_gimple_body): Initialize
SSA operands for each stmt.
(c_parser_gimple_compound_statement): Append stmts without
updating SSA operands.
gcc/testsuite/
* gcc.dg/torture/pr84830.c: Turn into GIMPLE unit test for PRE.
|
|
After r16-2649-g0340177d54d the tests
gcc.target/s390/arch13/bitops-{1,2}.c fail since sign extends in
conjunction with (subreg (not a)) are now folded. That is, of course,
wanted. Since the original tests were about 32-bit operations,
circumvent the sign extend by not returning a value but rather writing
it to memory. Similarly for andc-splitter-2.c: sign extends are folded
there, too. Since the test is not about 32- or 64-bit operations,
adjust only the scan-assembler directives.
gcc/testsuite/ChangeLog:
* gcc.target/s390/arch13/bitops-1.c: Do not return a 32bit value
but write it to memory.
* gcc.target/s390/arch13/bitops-2.c: Ditto.
* gcc.target/s390/md/andc-splitter-2.c: Adjust scan assembler
directive because sign extends are folded, now.
|
|
For parameters passed by reference, the Ada compiler sets TREE_THIS_NOTRAP
on their dereference to prevent tree_could_trap_p from returning true and
then causing a new basic block to be created for every access to them,
given that in Ada the -fnon-call-exceptions flag is enabled by default.
However, when the subprogram is inlined, this TREE_THIS_NOTRAP flag cannot
be blindly preserved because the call may pass the dereference of a pointer
as the argument: even if the compiler generates a check that the pointer is
not null just before, preserving TREE_THIS_NOTRAP could cause an access to
be hoisted before the check; therefore it gets cleared for parameters.
Now that's suboptimal if the argument is a full object because accessing it
through the dereference of the parameter cannot trap, which causes MEM_REFs
of the form MEM_REF [&DECL] to be considered as trapping in the case where
the nominal subtype of DECL is self-referential.
gcc/
* tree-inline.cc (maybe_copy_this_notrap): New function. Also copy
the TREE_THIS_NOTRAP flag for parameters when the argument is a full
object and the parameter's type is self-referential.
(remap_gimple_op_r): Call maybe_copy_this_notrap.
(copy_tree_body_r): Likewise.
|
|
For macOS/Darwin, we run Objective-C tests for both the GNU and
NeXT runtimes (and these runs are usually differentiated by
identifying the runtime in the test name).
However, the 'special' sub-set of tests had a non-standard driver
since it needs two sources for each test (but did not report the
runtime in the test name and so shows duplicates).
We can now automate the multi-source case with dg-additional-sources
but need to do a little work to filter these additional sources
from the set (since they also have a .m suffix).
This addresses the FIXME in the original driver.
Resolving the duplicated names means amending the reported name
to include the runtime as a differentiator. This means that test
comparisons will temporarily report new and missing tests for any
comparison that includes this change.
gcc/testsuite/ChangeLog:
* objc.dg/special/load-category-1.m: Add second source.
* objc.dg/special/load-category-2.m: Likewise.
* objc.dg/special/load-category-3.m: Likewise.
* objc.dg/special/unclaimed-category-1.m: Likewise.
* objc.dg/special/special.exp: Rewrite to make use of generic
testsuite facilities.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
|
|
Reduce the fp16-aapcs testcases to return-value testing since parameter
passing is already tested in aapcs/vfp*.c.
gcc/testsuite/ChangeLog:
* gcc.target/arm/fp16-aapcs.c: New test.
* gcc.target/arm/fp16-aapcs-1.c: Removed.
* gcc.target/arm/fp16-aapcs-2.c: Likewise.
* gcc.target/arm/fp16-aapcs-3.c: Likewise.
* gcc.target/arm/fp16-aapcs-4.c: Likewise.
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
|
|
cost 0, 1 and 15
Add asm dump check and run test for vec_duplicate + vwmulu.vv
combine to vwmulu.vx, with GR2VR costs 0, 2 and 15.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check
for vwmulu.vx.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_widen.h: Add test helper
macros.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_widen_data.h: Add test
data for vwmulu.vx run test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vwmulu-run-1-u64.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
cost 0, 1 and 15
Add asm dump check and run test for vec_duplicate + vwsubu.vv
combine to vwsubu.vx, with GR2VR costs 0, 2 and 15.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check
for vwsubu.vx.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_widen.h: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_widen_data.h: Add test
data for run test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vwsubu-run-1-u64.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
cost 0, 1 and 15
Add asm dump check and run test for vec_duplicate + vwaddu.vv
combine to vwaddu.vx, with GR2VR costs 0, 2 and 15.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check
for vwaddu.vx.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vwaddu-run-1-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_widen.h: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_widen_data.h: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_widen_vx_run.h: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch would like to combine vec_duplicate + vwaddu.vv into
vwaddu.vx; see the example code below. The related pattern depends
on the cost of vec_duplicate from GR2VR: late-combine will
take action if the cost of GR2VR is zero, and reject the combination
if the GR2VR cost is greater than zero.
Assume we have example code like below, GR2VR cost is 0.
Before this patch:
11 beq a3,zero,.L8
12 vsetvli a5,zero,e32,m1,ta,ma
13 vmv.v.x v2,a2
...
16 .L3:
17 vsetvli a5,a3,e32,m1,ta,ma
...
22 vwaddu.vv v1,v2,v3
...
25 bne a3,zero,.L3
After this patch:
11 beq a3,zero,.L8
...
14 .L3:
15 vsetvli a5,a3,e32,m1,ta,ma
...
20 vwaddu.vx v1,a2,v3
...
23 bne a3,zero,.L3
The pattern of this patch only works on DImode, i.e. the pattern below.
v1:RVVM1DImode = (zero_extend:RVVM1DImode v2:RVVM1SImode)
+ (vec_dup:RVVM1DImode (zero_extend:DImode x2:SImode));
Unfortunately, for uint16_t to uint32_t or uint8_t to uint16_t, we lose
this extend op after expand.
For uint16_t => uint32_t we have:
(set (reg:SI 149) (subreg/s/v:SI (reg/v:DI 146 [ rs1 ]) 0))
For uint32_t => uint64_t we have:
(set (reg:DI 148 [ _6 ])
(zero_extend:DI (subreg/s/u:SI (reg/v:DI 146 [ rs1 ]) 0)))
We can see there is no zero_extend for uint16_t to uint32_t, and we
cannot hit the pattern above. So combine will try the below pattern
for uint16_t to uint32_t.
v1:RVVM1SImode = (zero_extend:RVVM1SImode v2:RVVM1HImode)
+ (vec_dup:RVVM1SImode (subreg:SIMode (:DImode x2:SImode)))
But it cannot match the vwaddu semantics, thus we need another handling
of vwaddu.vv for uint16_t to uint32_t, as well as for uint8_t to
uint16_t.
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*widen_first_<any_extend:su>_vx_<mode>):
Add helper bridge pattern for vwaddu.vx combine.
(*widen_<any_widen_binop:optab>_<any_extend:su>_vx_<mode>): Add
new pattern to match vwaddu.vx combine.
* config/riscv/iterators.md: Add code attr to get extend CODE.
* config/riscv/vector-iterators.md: Add Dmode iterator for
widen.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
Both of the tests under 128 bit raise:
warning: writing 16 bytes into a region of size 8 [-Wstringop-overflow=]
when compiling, leading to test failures. The warning is caused by the
incorrect array size for res_ref2; the wrong size caused the overflow.
Correct them in this patch to fix the test failures.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512bw-vpmovuswb-2.c: Correct res_ref2
array size.
* gcc.target/i386/avx512bw-vpmovwb-2.c: Ditto.
|
|
vect-epilogues-4.c uses a 64-byte mask to vectorize the epilogue part.
Similar to the r16-876 fix for vect-epilogues-5.c, we need to adjust the
scan tree dump.
gcc/testsuite/ChangeLog:
* gcc.target/i386/vect-epilogues-4.c: Fix for epilogue
vect tree dump.
|
|
These two don't make sense as nested functions, as neither handles
unnesting nor has support for the static chain.
So let's reject them.
Bootstrapped and tested on x86_64-linux-gnu.
PR c/121421
gcc/c/ChangeLog:
* c-parser.cc (c_parser_declaration_or_fndef): Error out for gimple
and rtl functions as nested functions.
gcc/testsuite/ChangeLog:
* gcc.dg/gimplefe-error-16.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
|
|
I've noticed in -Wimplicit-fallthrough= documentation we talk about
[[fallthrough]]; for C++17 but don't mention that it is also standard
way to suppress the warning for C23.
2025-09-16 Jakub Jelinek <jakub@redhat.com>
* doc/invoke.texi (Wimplicit-fallthrough=): Document that also C23
provides a standard way to suppress the warning with [[fallthrough]];.
|
|
it directly
In recent gcc versions, REGNO_OK_FOR_BASE_P() is not called directly, but
rather via regno_ok_for_base_p() which is a wrapper in gcc/addresses.h.
The wrapper obtains a hard register number from pseudo via reg_renumber
array, so REGNO_OK_FOR_BASE_P() does not need to take this into
consideration.
On the other hand, since there is only one use of REGNO_OK_FOR_BASE_P()
in the target-specific code, it would make more sense to simplify the
definition of REGNO_OK_FOR_BASE_P() and replace its call with that of
regno_ok_for_base_p().
gcc/ChangeLog:
* config/xtensa/xtensa.cc (#include):
Add "addresses.h".
* config/xtensa/xtensa.h (REGNO_OK_FOR_BASE_P):
Simplify to just a call to GP_REG_P().
(BASE_REG_P): Replace REGNO_OK_FOR_BASE_P() with the equivalent
call to regno_ok_for_base_p().
|
|
Add an expander for isnan using integer arithmetic. Since isnan is
just a compare, enable it only with -fsignaling-nans to avoid
generating spurious exceptions. This fixes part of PR66462.
int isnan1 (float x) { return __builtin_isnan (x); }
Before:
fcmp s0, s0
cset w0, vs
ret
After:
fmov w1, s0
mov w0, -16777216
cmp w0, w1, lsl 1
cset w0, cc
ret
gcc:
PR middle-end/66462
* config/aarch64/aarch64.md (isnan<mode>2): Add new expander.
gcc/testsuite:
PR middle-end/66462
* gcc.target/aarch64/pr66462.c: Update test.
|
|
The following unifies the vect_transform_slp_perm_load call done
in vectorizable_load with that eventually done in get_load_store_type.
On the way it fixes the conditions on which we can allow
VMAT_ELEMENTWISE or VMAT_GATHER_SCATTER when there's a SLP permutation
(and we arrange to not code generate that). In particular that only
works for single-lane SLP of non-grouped loads or groups of size one.
VMAT_ELEMENTWISE does not (yet?) materialize a permutation upon
vector build but still relies on vect_transform_slp_perm_load.
* tree-vect-stmts.cc (get_load_store_type): Get in a flag
whether a SLP_TREE_LOAD_PERMUTATION on the node can be
code generated and use it. Fix the condition on using
strided gather/scatter to avoid dropping a meaningful
permutation.
(vectorizable_store): Adjust.
(vectorizable_load): Analyze the permutation early and
pass the result down to get_load_store_type. Fix the
condition on when we are allowed to elide a load permutation.
|
|
An ICE was reported in the following test case:
svint8_t foo(svbool_t pg, int8_t op2) {
return svmul_n_s8_z(pg, svdup_s8(1), op2);
}
with a type mismatch in 'vec_cond_expr':
_4 = VEC_COND_EXPR <v16_2(D), v32_3(D), { 0, ... }>;
The reason is that svmul_impl::fold folds calls where one of the operands
is all ones to the other operand using
gimple_folder::fold_active_lanes_to. However, we implicitly assumed
that the argument that is passed to fold_active_lanes_to is a vector
type. In the given test case op2 is a scalar type, resulting in the type
mismatch in the vec_cond_expr.
This patch fixes the ICE by forcing a vector type of the argument
in fold_active_lanes_to before the statement with the vec_cond_expr.
In the initial version of this patch, the force_vector statement was placed in
svmul_impl::fold, but it was moved to fold_active_lanes_to to align it with
fold_const_binary which takes care of the fixup from scalar to vector
type using vector_const_binop.
The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
OK for trunk?
OK to backport to GCC 15?
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
PR target/121602
* config/aarch64/aarch64-sve-builtins.cc
(gimple_folder::fold_active_lanes_to): Add force_vector
statement.
gcc/testsuite/
PR target/121602
* gcc.target/aarch64/sve/acle/asm/mul_s16.c: New test.
* gcc.target/aarch64/sve/acle/asm/mul_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_u8.c: Likewise.
|
|
Before this patch, confirming Stream_Size aspect specifications on
elementary types were incorrectly rejected when the stream size was 128,
and the error messages emitted for Stream_Size aspect errors gave
incorrect possible values.
This patch fixes this. The most significant part of the fix is a new
subprogram in Exp_Strm, Get_Primitives, that makes it possible to
retrieve a precise list of supported stream sizes, but also to select
the right runtime streaming primitives for a given type. Using the
latter, this patch factorizes code that was present in both
Build_Elementary_Input_Call and Build_Elementary_Write_Call.
gcc/ada/ChangeLog:
* exp_strm.ads (Get_Primitives): New function.
* exp_strm.adb (Get_Primitives): Likewise.
(Build_Elementary_Input_Call, Build_Elementary_Write_Call): Use
Get_Primitives.
(Has_Stream_Standard_Rep): Add formal parameter and rename to...
(Is_Stream_Standard_Rep): New function.
* sem_ch13.adb (Analyze_Attribute_Definition_Clause): Fix error
emission.
|
|
component"
This reverts commit 91b51fc42b167eedaaded6360c490a4306bc5c55.
|
|
Recent changes to Ada have produced a new diagnostic:
s-osinte.adb:34:18: warning: unit "Interfaces.C.Extensions"...
which causes a bootstrap fail on Darwin when Ada is enabled.
Fixed thus.
PR ada/114065
gcc/ada/ChangeLog:
* libgnarl/s-osinte__darwin.adb: Add and reference clause
for Interfaces.C, remove clause for Interfaces.C.Extensions.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
|
|
Allow profiles as input to '--with-arch'. Check profiles against
'riscv-profiles.def'.
gcc/ChangeLog:
* config.gcc: Accept RISC-V profiles in `--with-arch`.
* config/riscv/arch-canonicalize: Add profile detection and
skip canonicalization for profiles.
|
|
Move RISC-V profile definitions into 'riscv-profiles.def'. Add comments for
'riscv_profiles'.
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc (struct riscv_profiles): Add comments.
(RISCV_PROFILE): Removed.
* config/riscv/riscv-profiles.def: New file.
|
|
This patch implies zicsr for sdtrig and ssstrict extensions.
According to the riscv-privileged spec, the sdtrig and ssstrict extensions
are privileged extensions, so they should imply zicsr.
gcc/ChangeLog:
* config/riscv/riscv-ext.def: Imply zicsr.
|
|
After r16-3651, the compare_tests script will explicitly mention those
tests that have the same name. This helps us review all the tests we have.
Most of them are unintentional typos (e.g., keeping testing
the same vector size for scan-assembler). Fix them through this commit.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512bw-vpackssdw-1.c:
Fix xmm/ymm mask tests.
* gcc.target/i386/avx512bw-vpacksswb-1.c: Ditto.
* gcc.target/i386/avx512bw-vpackusdw-1.c: Ditto.
* gcc.target/i386/avx512bw-vpackuswb-1.c: Ditto.
* gcc.target/i386/avx512bw-vpermw-1.c: Test xmm.
* gcc.target/i386/avx512bw-vpmulhw-1.c:
Fix xmm/ymm mask tests.
* gcc.target/i386/avx512f-vec-init.c: Remove duplicate test.
* gcc.target/i386/avx512fp16-13.c: Fix test for aligned load.
* gcc.target/i386/avx512fp16-conjugation-1.c: Revise the test
to test more precisely on masks.
* gcc.target/i386/avx512fp16vl-conjugation-1.c: Ditto.
* gcc.target/i386/avx512vbmi-vpermb-1.c: Test xmm.
* gcc.target/i386/avx512vl-vcvtpd2ps-1.c: Fix scan asm.
* gcc.target/i386/avx512vl-vinsert-1.c: Fix typo.
* gcc.target/i386/avx512vl-vpmulld-1.c:
Fix xmm/ymm mask tests.
* gcc.target/i386/avx512vl-vptestmd-1.c: Ditto.
* gcc.target/i386/bitwise_mask_op-1.c: Fix typo.
* gcc.target/i386/cond_op_shift_q-1.c: Test both vpsra{,v}
and vpsll{,v}.
* gcc.target/i386/cond_op_shift_ud-1.c: Ditto.
* gcc.target/i386/cond_op_shift_uq-1.c: Ditto.
* gcc.target/i386/memcpy-pr95886.c: Fix the wrong const int.
* gcc.target/i386/part-vect-sqrtph-1.c: Remove duplicate test.
* gcc.target/i386/pr107432-7.c: Test vpmov{s,z}xbw instead of
vpmov{s,z}xbd.
* gcc.target/i386/pr88828-0.c: Fix pblendw scan asm.
|
|
gcc/ChangeLog:
* config/i386/predicates.md (avx_vbroadcast128_operand): New
predicate.
* config/i386/sse.md (*avx_vbroadcastf128_<mode>_perm): New
pre_reload splitter.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx_vbroadcastf128.c: New test.
|
|
Add __get_errno_ptr() as yet another synonym for __errno_location.
for gcc/analyzer/ChangeLog
* kf.cc (register_known_functions): Add __get_errno_ptr.
|
|
This test requires a C++ compiler.
for gcc/testsuite/ChangeLog
* gcc.target/aarch64/pr113356.C: Move to ...
* g++.target/aarch64/pr113356.C: ... here.
|
|
Bring code model selection logic to vxworks.h as well.
for gcc/ChangeLog
* config/rs6000/vxworks.h (TARGET_CMODEL, SET_CMODEL): Define.
|
|
The use of the TLS register in a TOC/GOT address computation was
probably a cut&pasto or a thinko. It causes a linker warning and,
because the TLS access in the test is incomplete, may cause
significant confusion. Adjust to use the TOC/GOT register as base.
for gcc/ChangeLog
* configure.ac: Adjust base register in linker test for large
TOC support.
* configure: Rebuild.
|
|
The widen-mul change removed the unnecessary cast, thus adjust the
SAT_MUL match for widen-mul to a simpler form.
gcc/ChangeLog:
* match.pd: Remove unnecessary cast of unsigned
SAT_MUL for widen-mul.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
The widening-mul pass will insert a cast for the widen-mul; the
function build_and_insert_cast is designed to take care of it.
In some cases the optimized gimple has an unnecessary cast,
for example the code below.
#define SAT_U_MUL_FMT_5(NT, WT) \
NT __attribute__((noinline)) \
sat_u_mul_##NT##_from_##WT##_fmt_5 (NT a, NT b) \
{ \
WT x = (WT)a * (WT)b; \
NT hi = x >> (sizeof(NT) * 8); \
NT lo = (NT)x; \
return lo | -!!hi; \
}
SAT_U_MUL_FMT_5(uint64_t, uint128_t)
There will be an additional cast to uint128_t after optimization.
This patch would like to refine this by checking the def of
the rhs cast: if it comes from a cast with less or equal
precision, the rhs of the def will be leveraged.
Before this patch:
29 │ _1 = (__int128 unsigned) a_8(D);
30 │ _2 = (__int128 unsigned) b_9(D);
31 │ _35 = (unsigned long) _1;
32 │ _34 = (unsigned long) _2;
33 │ x_10 = _35 w* _34;
After this patch:
27 │ _35 = (unsigned long) a_8(D);
28 │ _34 = (unsigned long) b_9(D);
29 │ x_10 = _35 w* _34;
gcc/ChangeLog:
* tree-ssa-math-opts.cc (build_and_insert_cast): Refine
the cast insert by check the rhs of val.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat/widen-mul-0.c: New test.
* gcc.target/riscv/sat/widen-mul-1.c: New test.
* gcc.target/riscv/sat/widen-mul-2.c: New test.
* gcc.target/riscv/sat/widen-mul-3.c: New test.
* gcc.target/riscv/sat/widen-mul.h: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
The 'size' argument of ctf_add_sou was size_t. After the prior fixes
for PR121411, this could cause the struct size to be truncated when
encoding extremely large structs on a host where size_t is smaller than
unsigned HOST_WIDE_INT, manifesting for example as the test failure
reported in PR121903. Change the argument to uHWI to resolve the issue.
PR debug/121411
PR debug/121903
gcc/
* ctfc.h (ctf_add_sou): Change size arg from size_t to uHWI.
* ctfc.cc (ctf_add_sou): Likewise.
|
|
gcc/ada
PR ada/114065
PR ada/121953
* Makefile.rtl (LIBGNAT_TARGET_PAIRS) [x32-linux]: Replace
libgnarl/s-osinte__x32.adb with libgnarl/s-osinte__posix.adb.
* libgnarl/s-osinte__x32.adb: Delete.
|
|
It turns out to be easy to add support for memcpy copy prop when the memcpy
has been changed into a `MEM<char[N]>` copy.
Instead of rejecting it outright, we need to figure out that
`a` and `MEM<char[N]>[&a]` are equivalent in terms of address and size,
and then create a VIEW_CONVERT_EXPR from the original src to the new type.
Note this also allows for `a.b` and `a` being considered equivalent if b is the
only field (PR 121751).
Changes since v1:
* v2: Move check for IMAG/REAL and BFR earlier.
Add a wrapping function around get_inner_reference and use that instead
of get_addr_base_and_unit_offset.
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/121751
PR tree-optimization/121418
PR tree-optimization/121417
gcc/ChangeLog:
* tree-ssa-forwprop.cc (split_core_and_offset_size): New function.
(optimize_agr_copyprop_1): Allow for the same
address but different type accesses via a VCE.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/copy-prop-aggregate-1.c: New test.
* gcc.dg/tree-ssa/copy-prop-aggregate-memcpy-1.c: New test.
* gcc.dg/tree-ssa/copy-prop-aggregate-memcpy-2.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
|
|
The sufficient conditions are that the aspect be deferred and the object be
rewritten as a renaming because of the complex initialization expression.
gcc/ada/ChangeLog:
* gcc-interface/trans.cc (gnat_to_gnu)
<N_Object_Renaming_Declaration>: Deal with objects whose elaboration
is deferred.
(process_freeze_entity): Deal with renamed objects whose elaboration
is deferred.
|