|
A change committed back in 2015 relies on the middle-end applying the
ABI alignment requested by the backend to ALL symbols. However, this
does not appear to be the case for external symbols. With this commit
we assume that all symbols without explicit alignment are aligned
according to the ABI. That is the behavior we had before.
This fixes a performance regression caused by the 2015 patch. Since
then, the addresses of external char type symbols have been pushed to
the literal pool, although it is safe to access them with larl (which
requires symbols to reside at even addresses).
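Why larl needs an even target can be sketched in C. This is only an
illustrative model, not code from the patch: the helper name
larl_can_encode is hypothetical, and the sketch relies solely on the fact
that larl encodes its PC-relative displacement as a signed 32-bit count of
halfwords.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: can a larl located at address PC produce the
   address TARGET?  The displacement is stored as a signed 32-bit number
   of halfwords, so it must be even and fit in that range.  */
bool
larl_can_encode (int64_t pc, int64_t target)
{
  int64_t delta = target - pc;

  if (delta % 2 != 0)
    return false;               /* Odd displacement: not expressible.  */

  int64_t halfwords = delta / 2;
  return halfwords >= INT32_MIN && halfwords <= INT32_MAX;
}
```

Since instructions themselves sit at even addresses, an odd symbol address
always gives an odd displacement, which is why such symbols need the
literal pool instead.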
gcc/
* config/s390/s390.cc (s390_encode_section_info): Set
SYMBOL_FLAG_SET_NOTALIGN2 only if the symbol has explicitly been
misaligned.
gcc/testsuite/
* gcc.target/s390/larl-1.c: New test.
|
|
Previously I made a mistake in the GIMPLE fold of LEN_MASK_{LOAD,STORE}.
We should fold LEN_MASK_{LOAD,STORE} when (bias + len) == vf (counted in nunits rather than bytesize) and the mask is an all-true mask
into:
MEM_REF [...].
This patch adds a test case for the GIMPLE fold of LEN_MASK_{LOAD,STORE}.
It also fixes LEN_LOAD/LEN_STORE so that they have the same behavior.
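The fold condition described above can be modelled in plain C. This is a
sketch under stated assumptions, not the code in gimple-fold.cc: the
function name is hypothetical, the length and bias are assumed to be
counted in elements, and the mask is modelled as an array of NUNITS
booleans.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the fold condition: LEN and BIAS are counted in
   elements, NUNITS is the number of elements in the vector, and MASK
   has NUNITS entries.  */
bool
can_fold_to_mem_ref (int len, int bias, int nunits, const bool *mask)
{
  if (len + bias != nunits)
    return false;               /* Partial access: keep the LEN form.  */

  for (int i = 0; i < nunits; i++)
    if (!mask[i])
      return false;             /* Some lane is masked off.  */

  return true;                  /* Same effect as a full-vector access.  */
}
```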
OK for trunk?
gcc/ChangeLog:
* gimple-fold.cc (gimple_fold_partial_load_store_mem_ref): Fix gimple
fold of LOAD/STORE with length.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/partial/gimple_fold-1.c: New test.
|
|
The following fixes a bug that manifests itself during fold-left
reduction transform in picking not the last scalar def to replace
and thus double-counting some elements. But the underlying issue
is that we merge a load permutation into the in-order reduction
which is of course wrong.
Now, reduction analysis has not yet been performed when optimizing
permutations, so we have to resort to checking that ourselves.
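Why a load permutation must not be merged into an in-order reduction can
be shown with a small C sketch. It is illustrative only and not taken from
the test case: the function names and the ORDER array (a stand-in for a
load permutation) are hypothetical. Floating-point addition is not
associative, so summing the same values in a different order can give a
different result.

```c
#include <stddef.h>

/* Sum V[ORDER[0]], V[ORDER[1]], ... strictly left to right, as a
   fold-left (in-order) reduction does.  ORDER is a hypothetical
   stand-in for a load permutation.  */
double
fold_left_sum (const double *v, const size_t *order, size_t n)
{
  double acc = 0.0;

  for (size_t i = 0; i < n; i++)
    acc += v[order[i]];

  return acc;
}

/* Return 1 if permuting the inputs changes the reduction result.  */
int
permutation_changes_sum (void)
{
  static const double v[] = { 1e30, 1.0, -1e30 };
  static const size_t in_order[] = { 0, 1, 2 };
  static const size_t permuted[] = { 0, 2, 1 };

  return fold_left_sum (v, in_order, 3) != fold_left_sum (v, permuted, 3);
}
```

In the original order the 1.0 is absorbed by the large value before the
cancellation; in the permuted order it survives.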
PR tree-optimization/110381
* tree-vect-slp.cc (vect_optimize_slp_pass::start_choosing_layouts):
Materialize permutes before fold-left reductions.
* gcc.dg/vect/pr110381.c: New testcase.
|
|
This patch contains a pair of (related) optimizations in i386.md that
allow us to generate better code for the example below (this is a step
towards fixing a bugzilla PR, but I've forgotten the number).
__int128 foo64(__int128 x, long long y)
{
  __int128 t = (__int128)y << 64;
  return x ^ t;
}
The hidden issue is that the RTL currently seen by reload contains
the sign extension of y from DImode to TImode, even though this is
dead (not required) for left shifts by more than WORD_SIZE bits.
(insn 11 8 12 2 (parallel [
(set (reg:TI 0 ax [orig:91 y ] [91])
(sign_extend:TI (reg:DI 1 dx [97])))
(clobber (reg:CC 17 flags))
(clobber (scratch:DI))
]) {extendditi2}
What makes this particularly undesirable is that the sign-extension
pattern above requires an additional DImode scratch register, indicated
by the clobber, which unnecessarily increases register pressure.
The proposed solution is to add a define_insn_and_split for such
left shifts (of sign or zero extensions) that only have a non-zero
highpart, where the extension is redundant and eliminated, that can
be split after reload, without scratch registers or early clobbers.
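That the extension is redundant for such shifts can be checked with a
short C sketch. It is not part of the patch: the function names are
hypothetical, it assumes a 64-bit target with the __int128 extension, and
it shifts an unsigned 128-bit value to stay clear of signed-shift
questions.

```c
/* Uses the GCC __int128 extension on a 64-bit target.  */
typedef unsigned __int128 u128;

/* Shift after sign-extending Y to 128 bits.  */
u128
shl64_sign_extended (long long y)
{
  return (u128) (__int128) y << 64;
}

/* Shift after zero-extending Y to 128 bits.  */
u128
shl64_zero_extended (long long y)
{
  return (u128) (unsigned long long) y << 64;
}
```

Both functions return the same value for every y, because the shift
discards all bits that the extension produced: the high part is y and the
low part is zero.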
This (late split) exposes a second optimization opportunity where
setting the lowpart to zero can sometimes be combined/simplified with
the following instruction during peephole2.
For the test case above, we previously generated with -O2:
foo64: xorl %eax, %eax
xorq %rsi, %rdx
xorq %rdi, %rax
ret
with this patch, we now generate:
foo64: movq %rdi, %rax
xorq %rsi, %rdx
ret
Likewise for the related -m32 test case, we go from:
foo32: movl 12(%esp), %eax
movl %eax, %edx
xorl %eax, %eax
xorl 8(%esp), %edx
xorl 4(%esp), %eax
ret
to the improved:
foo32: movl 12(%esp), %edx
movl 4(%esp), %eax
xorl 8(%esp), %edx
ret
2023-06-26 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386.md (peephole2): Simplify zeroing a register
followed by an IOR, XOR or PLUS operation on it, into a move.
(*ashl<dwi>3_doubleword_highpart): New define_insn_and_split to
eliminate (and hide from reload) unnecessary word to doubleword
extensions that are followed by left shifts by sufficiently large,
but valid, bit counts.
gcc/testsuite/ChangeLog
* gcc.target/i386/ashldi3-1.c: New 32-bit test case.
* gcc.target/i386/ashlti3-2.c: New 64-bit test case.
|
|
When there are multiple operands in vec_oprnds0, vec_dest is
overwritten with vectype_out, but in the multi_step_cvt case cvt_type
is expected. This caused an ICE in verify_gimple_in_cfg.
gcc/ChangeLog:
PR tree-optimization/110371
PR tree-optimization/110018
* tree-vect-stmts.cc (vectorizable_conversion): Use cvt_op to
save intermediate type operand instead of "subtle" vec_dest
for case NONE.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/pr110371.c: New test.
|
|
> > Hmm, good question. GENERIC has a direct truncation to unsigned char
> > for example, the C standard generally says if the integral part cannot
> > be represented then the behavior is undefined. So I think we should be
> > safe here (0x1.0p32 doesn't fit an int).
>
> We should be following Annex F (unspecified value plus "invalid" exception
> for out-of-range floating-to-integer conversions rather than undefined
> behavior). But we don't achieve that very well at present (see bug 93806
> comments 27-29 for examples of how such conversions produce wobbly
> values).
That would mean guarding this with !flag_trapping_math would be the appropriate
thing to do.
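The concern can be illustrated with a scalar C sketch. It is not the
vectorizer code: the function name is hypothetical and CHAR_BIT is assumed
to be 8. Converting through an intermediate int turns an out-of-range
conversion into two steps that are each well defined, so the "invalid"
exception that Annex F asks for can no longer be signalled.

```c
/* Convert D to unsigned char through an intermediate int, the kind of
   two-step conversion an intermediate type introduces.  */
unsigned char
to_uchar_via_int (double d)
{
  int wide = (int) d;           /* In range for 300.0: no exception.  */
  return (unsigned char) wide;  /* Well-defined wrap modulo 256.  */
}
```

For 300.0 the direct conversion to unsigned char is out of range, whereas
the two-step version silently yields 44.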
gcc/ChangeLog:
PR tree-optimization/110371
PR tree-optimization/110018
* tree-vect-stmts.cc (vectorizable_conversion): Don't use
intermediate type for FIX_TRUNC_EXPR when -ftrapping-math.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr110018-1.c: Add -fno-trapping-math to dg-options.
* gcc.target/i386/pr110018-2.c: Ditto.
|
|
For a function with the target attribute arch=*, the current logic sets
its tune to the -mtune value from the command line, so all target_clones
get the same tuning flags, which affects the performance of each clone.
Override tune with arch if tune was not explicitly specified, to get
proper tuning flags for target_clones.
gcc/ChangeLog:
* config/i386/i386-options.cc (ix86_valid_target_attribute_tree):
Override tune_string with arch_string if tune_string is not
explicitly specified.
gcc/testsuite/ChangeLog:
* gcc.target/i386/mvc17.c: New test.
|
|
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/vlmul_ext-2.c: Add -Wno-psabi for dg.
|
|
Since PR96435, both boolean objects and expressions have been evaluated
in the following way.
(*(ubyte*)&obj_or_expr) & 1
It has been noted that sometimes this can cause the back-end to optimize
in non-obvious ways - in particular with __builtin_expect.
This @safe feature is now restricted to just when reading the value of a
bool field that comes from a union.
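A C analogue of the expression above, applied to a bool stored in a union,
might look as follows. This is a sketch and not the D front-end code: the
union and function names are hypothetical.

```c
#include <stdbool.h>

union flag_bits
{
  unsigned char raw;
  bool flag;
};

/* Read FLAG through its byte representation and keep only bit 0, so a
   stray bit pattern such as 2 cannot yield a bool that is neither true
   nor false.  */
bool
read_flag_safely (const union flag_bits *u)
{
  return (*(const unsigned char *) &u->flag) & 1;
}
```

Outside a union a bool can only hold 0 or 1, so the masking is pure
overhead there and can hide information from the optimizers.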
PR d/110359
gcc/d/ChangeLog:
* d-convert.cc (convert_for_rvalue): Only apply the @safe boolean
conversion to boolean fields of a union.
(convert_for_condition): Call convert_for_rvalue in the default case.
gcc/testsuite/ChangeLog:
* gdc.dg/pr110359.d: New test.
|
|
|
|
D front-end changes:
- Import dmd v2.103.1.
- Deprecated invalid special token sequences inside token strings.
D runtime changes:
- Import druntime v2.103.1.
Phobos changes:
- Import phobos v2.103.1.
gcc/d/ChangeLog:
* dmd/MERGE: Merge upstream dmd a45f4e9f43.
* dmd/VERSION: Bump version to v2.103.1.
libphobos/ChangeLog:
* libdruntime/MERGE: Merge upstream druntime a45f4e9f43.
* src/MERGE: Merge upstream phobos 106038f2e.
|
|
This patch depends on the LEN_MASK_{LOAD,STORE} patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622742.html
After enabling LEN_MASK_{LOAD,STORE}, I noticed a case where the VSETVL pass needs to be optimized:
void
f (int32_t *__restrict a,
   int32_t *__restrict b,
   int32_t *__restrict cond,
   int n)
{
  for (int i = 0; i < 8; i++)
    if (cond[i])
      a[i] = b[i];
}
Before this patch:
f:
vsetivli a5,8,e8,mf4,tu,mu --> Propagate "8" to the following vsetvl
vsetvli zero,a5,e32,m1,ta,ma
vle32.v v0,0(a2)
vsetvli a6,zero,e32,m1,ta,ma
li a3,8
vmsne.vi v0,v0,0
vsetvli zero,a5,e32,m1,ta,ma
vle32.v v1,0(a1),v0.t
vse32.v v1,0(a0),v0.t
sub a4,a3,a5
beq a3,a5,.L6
slli a5,a5,2
add a2,a2,a5
add a1,a1,a5
add a0,a0,a5
vsetvli a5,a4,e8,mf4,tu,mu --> Propagate "a4" to the following vsetvl
vsetvli zero,a5,e32,m1,ta,ma
vle32.v v0,0(a2)
vsetvli a6,zero,e32,m1,ta,ma
vmsne.vi v0,v0,0
vsetvli zero,a5,e32,m1,ta,ma
vle32.v v1,0(a1),v0.t
vse32.v v1,0(a0),v0.t
.L6:
ret
The current VSETVL pass only enables AVL propagation for the VLMAX AVL ("zero").
Now we also enable AVL propagation for immediate AVLs and, conservatively, for non-VLMAX AVLs.
After this patch:
f:
vsetivli a5,8,e8,mf4,ta,ma
vle32.v v0,0(a2)
vsetvli a6,zero,e32,m1,ta,ma
li a3,8
vmsne.vi v0,v0,0
vsetivli zero,8,e32,m1,ta,ma
vle32.v v1,0(a1),v0.t
vse32.v v1,0(a0),v0.t
sub a4,a3,a5
beq a3,a5,.L6
slli a5,a5,2
vsetvli a4,a4,e8,mf4,ta,ma
add a2,a2,a5
vle32.v v0,0(a2)
add a1,a1,a5
vsetvli a6,zero,e32,m1,ta,ma
add a0,a0,a5
vmsne.vi v0,v0,0
vsetvli zero,a4,e32,m1,ta,ma
vle32.v v1,0(a1),v0.t
vse32.v v1,0(a0),v0.t
.L6:
ret
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (vector_insn_info::parse_insn): Enhance
AVL propagation.
* config/riscv/riscv-vsetvl.h: New function.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/partial/select_vl-1.c: Add dump checks.
* gcc.target/riscv/rvv/autovec/partial/select_vl-2.c: New test.
|
|
Consider the following case:
void test_vlmul_ext_v_i8mf8_i8mf4(vint8mf8_t op1) {
  vint8mf4_t res = __riscv_vlmul_ext_v_i8mf8_i8mf4(op1);
}
Compilation fails with:
test.c: In function 'test_vlmul_ext_v_i8mf8_i8mf4':
test.c:5:1: error: unrecognizable insn:
5 | }
| ^
(insn 30 29 0 2 (set (mem/c:VNx2QI (reg/f:DI 143) [0 x+0 S[2, 2] A32])
(mem/c:VNx2QI (reg/f:DI 148) [0 op1+0 S[2, 2] A16])) "test.c":4:18 -1
(nil))
during RTL pass: vregs
test.c:5:1: internal compiler error: in extract_insn, at recog.cc:2791
0x7c61b8 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
../.././riscv-gcc/gcc/rtl-error.cc:108
0x7c61d7 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
../.././riscv-gcc/gcc/rtl-error.cc:116
0xed58a7 extract_insn(rtx_insn*)
../.././riscv-gcc/gcc/recog.cc:2791
0xb7f789 instantiate_virtual_regs_in_insn
../.././riscv-gcc/gcc/function.cc:1611
0xb7f789 instantiate_virtual_regs
../.././riscv-gcc/gcc/function.cc:1984
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-bases.cc: Change emit_insn to
emit_move_insn.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/vlmul_ext-2.c: New test.
|
|
This patch enables len_mask_{load,store} to support control flow in RVV auto-vectorization.
Consider the following case:
void
f (int32_t *__restrict a,
   int32_t *__restrict b,
   int32_t *__restrict cond,
   int n)
{
  for (int i = 0; i < n; i++)
    if (cond[i])
      a[i] = b[i];
}
Before this patch:
<source>:9:21: missed: couldn't vectorize loop
<source>:9:21: missed: not vectorized: control flow in loop.
After this patch:
f:
ble a3,zero,.L5
.L3:
vsetvli a5,a3,e32,m1,ta,ma
vle32.v v0,0(a2)
vsetvli a6,zero,e32,m1,ta,ma
slli a4,a5,2
vmsne.vi v0,v0,0
sub a3,a3,a5
vsetvli zero,a5,e32,m1,ta,ma
vle32.v v1,0(a1),v0.t
vse32.v v1,0(a0),v0.t
add a2,a2,a4
add a1,a1,a4
add a0,a0,a4
bne a3,zero,.L3
.L5:
ret
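What the vectorized loop above computes can be modelled in scalar C. This
is a hypothetical sketch, not generated code: the function name and the
VLMAX parameter are assumptions, and taking min(remaining, VLMAX) as the
vector length is just one choice that vsetvli is permitted to make.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the vectorized loop.  VLMAX stands for
   the number of int32_t lanes in one vector register and must be
   nonzero.  */
void
cond_copy_model (int32_t *a, const int32_t *b, const int32_t *cond,
                 size_t n, size_t vlmax)
{
  for (size_t i = 0; i < n; )
    {
      /* vsetvli: process at most VLMAX of the remaining elements.  */
      size_t vl = n - i < vlmax ? n - i : vlmax;

      /* Masked load and store: only lanes below VL whose condition is
         nonzero touch memory.  */
      for (size_t lane = 0; lane < vl; lane++)
        if (cond[i + lane] != 0)
          a[i + lane] = b[i + lane];

      i += vl;
    }
}
```

The length handles the loop tail and the mask handles the condition, which
is why both are needed in a single load/store operation.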
gcc/ChangeLog:
* config/riscv/autovec.md (len_load_<mode>): Remove.
(len_maskload<mode><vm>): Remove.
(len_store_<mode>): New pattern.
(len_maskstore<mode><vm>): New pattern.
* config/riscv/predicates.md (autovec_length_operand): New predicate.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(expand_load_store): New function.
* config/riscv/riscv-v.cc (emit_vlmax_masked_insn): Ditto.
(emit_nonvlmax_masked_insn): Ditto.
(expand_load_store): Ditto.
* config/riscv/riscv-vector-builtins.cc
(function_expander::use_contiguous_store_insn): Add avl_type operand
into pred_store.
* config/riscv/vector.md: Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/partial/single_rgroup-2.c: New test.
* gcc.target/riscv/rvv/autovec/partial/single_rgroup-2.h: New test.
* gcc.target/riscv/rvv/autovec/partial/single_rgroup-3.c: New test.
* gcc.target/riscv/rvv/autovec/partial/single_rgroup-3.h: New test.
* gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c: New test.
|
|
This reverts commit f9ab5d62c94547499de52c800ab914cc8e802212 due to a
bootstrap failure from an out-of-range machine mode memory access.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/vector.md: Revert.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/abi-10.c: Revert.
* gcc.target/riscv/rvv/base/abi-11.c: Ditto.
* gcc.target/riscv/rvv/base/abi-12.c: Ditto.
* gcc.target/riscv/rvv/base/abi-15.c: Ditto.
* gcc.target/riscv/rvv/base/abi-8.c: Ditto.
* gcc.target/riscv/rvv/base/abi-9.c: Ditto.
* gcc.target/riscv/rvv/base/abi-17.c: Ditto.
* gcc.target/riscv/rvv/base/abi-18.c: Ditto.
|
|
This reverts commit 8a96f240d71d367a2955ab9e0f0fef3a0b0e2a74 due to a
bootstrap failure from an out-of-range mode access. This patch will be
committed again after the issue is addressed.
gcc/ChangeLog:
* config/riscv/genrvv-type-indexer.cc (valid_type): Revert changes.
* config/riscv/riscv-modes.def (RVV_TUPLE_MODES): Ditto.
(ADJUST_ALIGNMENT): Ditto.
(RVV_TUPLE_PARTIAL_MODES): Ditto.
(ADJUST_NUNITS): Ditto.
* config/riscv/riscv-vector-builtins-types.def (vfloat16mf4x2_t): Ditto.
(vfloat16mf4x3_t): Ditto.
(vfloat16mf4x4_t): Ditto.
(vfloat16mf4x5_t): Ditto.
(vfloat16mf4x6_t): Ditto.
(vfloat16mf4x7_t): Ditto.
(vfloat16mf4x8_t): Ditto.
(vfloat16mf2x2_t): Ditto.
(vfloat16mf2x3_t): Ditto.
(vfloat16mf2x4_t): Ditto.
(vfloat16mf2x5_t): Ditto.
(vfloat16mf2x6_t): Ditto.
(vfloat16mf2x7_t): Ditto.
(vfloat16mf2x8_t): Ditto.
(vfloat16m1x2_t): Ditto.
(vfloat16m1x3_t): Ditto.
(vfloat16m1x4_t): Ditto.
(vfloat16m1x5_t): Ditto.
(vfloat16m1x6_t): Ditto.
(vfloat16m1x7_t): Ditto.
(vfloat16m1x8_t): Ditto.
(vfloat16m2x2_t): Ditto.
(vfloat16m2x3_t): Ditto.
(vfloat16m2x4_t): Ditto.
(vfloat16m4x2_t): Ditto.
* config/riscv/riscv-vector-builtins.def (vfloat16mf4x2_t): Ditto.
(vfloat16mf4x3_t): Ditto.
(vfloat16mf4x4_t): Ditto.
(vfloat16mf4x5_t): Ditto.
(vfloat16mf4x6_t): Ditto.
(vfloat16mf4x7_t): Ditto.
(vfloat16mf4x8_t): Ditto.
(vfloat16mf2x2_t): Ditto.
(vfloat16mf2x3_t): Ditto.
(vfloat16mf2x4_t): Ditto.
(vfloat16mf2x5_t): Ditto.
(vfloat16mf2x6_t): Ditto.
(vfloat16mf2x7_t): Ditto.
(vfloat16mf2x8_t): Ditto.
(vfloat16m1x2_t): Ditto.
(vfloat16m1x3_t): Ditto.
(vfloat16m1x4_t): Ditto.
(vfloat16m1x5_t): Ditto.
(vfloat16m1x6_t): Ditto.
(vfloat16m1x7_t): Ditto.
(vfloat16m1x8_t): Ditto.
(vfloat16m2x2_t): Ditto.
(vfloat16m2x3_t): Ditto.
(vfloat16m2x4_t): Ditto.
(vfloat16m4x2_t): Ditto.
* config/riscv/riscv-vector-switch.def (TUPLE_ENTRY): Ditto.
* config/riscv/riscv.md: Ditto.
* config/riscv/vector-iterators.md: Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/tuple-28.c: Removed.
* gcc.target/riscv/rvv/base/tuple-29.c: Removed.
* gcc.target/riscv/rvv/base/tuple-30.c: Removed.
* gcc.target/riscv/rvv/base/tuple-31.c: Removed.
* gcc.target/riscv/rvv/base/tuple-32.c: Removed.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
If mem_addr points to a memory region with fewer accessible bytes than
a whole vector, and k is a mask that would prevent reading the
inaccessible bytes from mem_addr, add UNSPEC_MASKLOAD to prevent the
load from being transformed into vpblendd.
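The difference matters because a blend first loads the whole vector and
only then selects lanes, while a masked load must not touch the lanes that
are masked off. A scalar C model of the masked-load semantics, with a
hypothetical function name and with masked-off lanes set to zero as an
assumption, could look like this:

```c
#include <stddef.h>
#include <stdint.h>

/* Masked load model: lane I of SRC is read only if bit I of MASK is
   set; the other lanes of DST are set to zero.  LANES must not exceed
   the number of bits in MASK.  */
void
masked_load_model (int32_t *dst, const int32_t *src, unsigned mask,
                   size_t lanes)
{
  for (size_t i = 0; i < lanes; i++)
    dst[i] = ((mask >> i) & 1) ? src[i] : 0;
}
```

With this model it is valid to pass a source buffer that holds only the
lanes selected by the mask, which is exactly the case the patch protects.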
gcc/ChangeLog:
PR target/110309
* config/i386/sse.md (maskload<mode><avx512fmaskmodelower>):
Refine pattern with UNSPEC_MASKLOAD.
(maskload<mode><avx512fmaskmodelower>): Ditto.
(*<avx512>_load<mode>_mask): Extend mode iterator to
VI12HFBF_AVX512VL.
(*<avx512>_load<mode>): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr110309.c: New test.
|
|
gcc/ChangeLog:
* config/riscv/vector.md: Add float16 attr at sew, vlmul and ratio.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/abi-10.c: Add float16 tuple type case.
* gcc.target/riscv/rvv/base/abi-11.c: Ditto.
* gcc.target/riscv/rvv/base/abi-12.c: Ditto.
* gcc.target/riscv/rvv/base/abi-15.c: Ditto.
* gcc.target/riscv/rvv/base/abi-8.c: Ditto.
* gcc.target/riscv/rvv/base/abi-9.c: Ditto.
* gcc.target/riscv/rvv/base/abi-17.c: New test.
* gcc.target/riscv/rvv/base/abi-18.c: New test.
|
|
|
|
This patch adds RVV floating-point auto-vectorization.
Also, fix attribute bug of floating-point ternary operations in vector.md.
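A loop of the following shape is the kind of code the new fma<mode>4
pattern is meant for. This is a hypothetical example rather than one of
the ternop tests, and whether it is actually vectorized into a fused
multiply-add depends on the target and the options in use.

```c
#include <stddef.h>

/* dst[i] = a[i] * b[i] + dst[i]: a multiply-add candidate when the
   loop is auto-vectorized.  */
void
fma_loop (float *restrict dst, const float *restrict a,
          const float *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = a[i] * b[i] + dst[i];
}
```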
gcc/ChangeLog:
* config/riscv/autovec.md (fma<mode>4): New pattern.
(*fma<mode>): Ditto.
(fnma<mode>4): Ditto.
(*fnma<mode>): Ditto.
(fms<mode>4): Ditto.
(*fms<mode>): Ditto.
(fnms<mode>4): Ditto.
(*fnms<mode>): Ditto.
* config/riscv/riscv-protos.h (emit_vlmax_fp_ternary_insn):
New function.
* config/riscv/riscv-v.cc (emit_vlmax_fp_ternary_insn): Ditto.
* config/riscv/vector.md: Fix attribute bug.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/ternop/ternop-1.c: Adjust tests.
* gcc.target/riscv/rvv/autovec/ternop/ternop-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop-10.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-11.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-12.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-7.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-8.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-9.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-11.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-12.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-9.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-1.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-10.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-11.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-12.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-2.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-3.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-4.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-5.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-6.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-7.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-8.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-9.c: New test.
|
|
gcc/analyzer/ChangeLog:
* access-diagram.cc: Add #define INCLUDE_VECTOR.
* bounds-checking.cc: Likewise.
gcc/ChangeLog:
* diagnostic-format-sarif.cc: Add #define INCLUDE_VECTOR.
* diagnostic.cc: Likewise.
* text-art/box-drawing.cc: Likewise.
* text-art/canvas.cc: Likewise.
* text-art/ruler.cc: Likewise.
* text-art/selftests.cc: Likewise.
* text-art/selftests.h (text_art::canvas): New forward decl.
* text-art/style.cc: Add #define INCLUDE_VECTOR.
* text-art/styled-string.cc: Likewise.
* text-art/table.cc: Likewise.
* text-art/table.h: Remove #include <vector>.
* text-art/theme.cc: Add #define INCLUDE_VECTOR.
* text-art/types.h: Check that INCLUDE_VECTOR is defined.
Remove #include of <vector> and <string>.
* text-art/widget.cc: Add #define INCLUDE_VECTOR.
* text-art/widget.h: Remove #include <vector>.
gcc/testsuite/ChangeLog:
* gcc.dg/plugin/diagnostic_plugin_test_text_art.c: Add
#define INCLUDE_VECTOR.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
|
|
PR c++/110164 notes that in cases where we have a forward decl
of a std library type such as:
std::array<int, 10> x;
we emit this diagnostic:
error: aggregate ‘std::array<int, 10> x’ has incomplete type and cannot be defined
This patch adds this hint to the diagnostic:
note: ‘std::array’ is defined in header ‘<array>’; this is probably fixable by adding ‘#include <array>’
gcc/cp/ChangeLog:
PR c++/110164
* cp-name-hint.h (maybe_suggest_missing_header): New decl.
* decl.cc: Define INCLUDE_MEMORY. Add include of
"cp/cp-name-hint.h".
(start_decl_1): Call maybe_suggest_missing_header.
* name-lookup.cc (maybe_suggest_missing_header): Remove "static".
gcc/testsuite/ChangeLog:
PR c++/110164
* g++.dg/diagnostic/missing-header-pr110164.C: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
It seems prudent to add C++26 now that the first C++26 papers have been
approved. I followed commit r11-6920 as well as r8-3237.
Since C++23 is essentially finished and its __cplusplus value has
settled to 202302L, I've updated cpp_init_builtins and marked
-std=c++2b Undocumented and made -std=c++23 no longer Undocumented.
As for __cplusplus, I've chosen 202400L:
$ xg++ -std=c++26 -dM -E -x c++ - < /dev/null | grep cplusplus
#define __cplusplus 202400L
I've verified the patch with a simple test, exercising the new
directives. Don't forget to update your GXX_TESTSUITE_STDS!
This patch does not add -Wc++26-extensions.
gcc/c-family/ChangeLog:
* c-common.h (cxx_dialect): Add cxx26 as a dialect.
* c-opts.cc (set_std_cxx26): New.
(c_common_handle_option): Set options when -std={c,gnu}++2{c,6} is
enabled.
(c_common_post_options): Adjust comments.
* c.opt: Add options for -std=c++26, std=c++2c, -std=gnu++26,
and -std=gnu++2c.
(std=c++2b): Mark as Undocumented.
(std=c++23): No longer Undocumented.
gcc/ChangeLog:
* doc/cpp.texi (__cplusplus): Document value for -std=c++26 and
-std=gnu++26. Document that for C++23, its value is 202302L.
* doc/invoke.texi: Document -std=c++26 and -std=gnu++26.
* dwarf2out.cc (highest_c_language): Handle GNU C++26.
(gen_compile_unit_die): Likewise.
libcpp/ChangeLog:
* include/cpplib.h (c_lang): Add CXX26 and GNUCXX26.
* init.cc (lang_defaults): Add rows for CXX26 and GNUCXX26.
(cpp_init_builtins): Set __cplusplus to 202400L for C++26.
Set __cplusplus to 202302L for C++23.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp (check_effective_target_c++23): Return
1 also if check_effective_target_c++26.
(check_effective_target_c++23_down): New.
(check_effective_target_c++26_only): New.
(check_effective_target_c++26): New.
* g++.dg/cpp23/cplusplus.C: Adjust expected value.
* g++.dg/cpp26/cplusplus.C: New test.
|
|
gcc/fortran/ChangeLog:
PR fortran/110360
* trans-expr.cc (gfc_conv_procedure_call): Pass actual argument
to scalar CHARACTER(1),VALUE dummy argument by value.
gcc/testsuite/ChangeLog:
PR fortran/110360
* gfortran.dg/value_9.f90: New test.
|
|
This change fixes PR target/105325, a bug where an invalid lwa
instruction is generated due to power10 fusion of a load instruction to
a GPR and a compare immediate instruction with the immediate being
-1, 0, or 1.
In some cases, when the load instruction is done, the GCC compiler would
generate a load instruction with an offset that was too large to fit into the
normal load instruction.
In particular, loads from the stack might originally have a small offset, so
that the load is not a prefixed load. However, after the stack is set up, and
register allocation has been done, the offset now is large enough that we would
have to use a prefixed load instruction.
The support for prefixed loads did not consider that patterns with a fused load
and compare might have a prefixed address. Without this support, the proper
prefixed load won't be generated.
In the original code, when the split2 pass is run after reload has finished the
ds_form_mem_operand predicate that was used for lwa and ld no longer returns
true. When the pattern was created, ds_form_mem_operand recognized the insn as
being valid since the offset was small. But after register allocation,
ds_form_mem_operand did not return true. Because it didn't return true, the
insn could not be split. Since the insn was not split and the prefix support
did not indicate a prefixed instruction was used, the wrong load is generated.
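The offset ranges involved can be sketched in C. These helpers are
hypothetical and are not the predicates in the rs6000 backend: they only
model the displacement limits, assuming that ds-form loads (ld, lwa) take
a signed 16-bit displacement that is a multiple of 4 and that prefixed
loads take a signed 34-bit displacement.

```c
#include <stdbool.h>
#include <stdint.h>

/* DS-form (ld, lwa): signed 16-bit displacement, multiple of 4.  */
bool
fits_ds_form (int64_t offset)
{
  return offset >= -32768 && offset <= 32767 && offset % 4 == 0;
}

/* Prefixed form: signed 34-bit displacement.  */
bool
fits_prefixed (int64_t offset)
{
  const int64_t limit = (int64_t) 1 << 33;
  return offset >= -limit && offset < limit;
}
```

A stack slot whose offset grows past the first limit after register
allocation still fits the second, which is the situation in which the
prefixed load has to be selected.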
The solution involves:
1) Don't use ds_form_mem_operand for ld and lwa, always use
non_update_memory_operand.
2) Delete ds_form_mem_operand since it is no longer used.
3) Use the "YZ" constraints for ld/lwa instead of "m".
4) If we don't need to sign extend the lwa, convert it to lwz, and use
cmpwi instead of cmpdi. Adjust the insn name to reflect the code
generated.
5) Ensure that the insn using lwa will be recognized as having a prefixed
operand (and hence the insn length will be 16 bytes instead of 8
bytes).
5a) Set the prefixed and maybe_prefix attributes to know that
fused_load_cmpi are also load insns;
5b) In the case where we are just setting CC and not using the memory
afterward, set the clobber to use a DI register, and put an
explicit sign_extend operation in the split;
5c) Set the sign_extend attribute to "yes" for lwa.
5d) 5a-5c are the things that prefixed_load_p in rs6000.cc checks to
ensure that lwa is treated as a ds-form instruction and not as
a d-form instruction (i.e. lwz).
6) Add a new test case for this case.
7) Adjust the insn counts in fusion-p10-ldcmpi.c. Because we are no
longer using ds_form_mem_operand, the ld and lwa instructions will fuse
x-form (reg+reg) addresses in addition to ds-form (reg+offset or reg).
2023-06-23 Michael Meissner <meissner@linux.ibm.com>
gcc/
PR target/105325
* config/rs6000/genfusion.pl (gen_ld_cmpi_p10_one): Fix problems that
allowed prefixed lwa to be generated.
* config/rs6000/fusion.md: Regenerate.
* config/rs6000/predicates.md (ds_form_mem_operand): Delete.
* config/rs6000/rs6000.md (prefixed attribute): Add support for load
plus compare immediate fused insns.
(maybe_prefixed): Likewise.
gcc/testsuite/
PR target/105325
* g++.target/powerpc/pr105325.C: New test.
* gcc.target/powerpc/fusion-p10-ldcmpi.c: Update insn counts.
Co-Authored-By: Aaron Sawdey <acsawdey@linux.ibm.com>
|
|
We have imported some headers from the GNUStep project to allow us
to maintain the testsuite independent of changing versions of system
headers.
One of these headers has a macro that (now we have support for
__has_feature) expands to a declaration that triggers a warning.
These headers are considered part of the implementation so that, in
this case, we can suppress the warning with the system_header pragma.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/testsuite/ChangeLog:
* objc-obj-c++-shared/GNUStep/Foundation/NSObjCRuntime.h: Make
this header use pragma system_header.
|
|
gcc/ChangeLog:
PR tree-optimization/110280
* match.pd (vec_perm_expr(v, v, mask) -> v): Explicitly build vector
using build_vector_from_val with the element of input operand, and
mask's type if operand and mask's types don't match.
gcc/testsuite/ChangeLog:
PR tree-optimization/110280
* gcc.target/aarch64/sve/pr110280.c: New test.
|
|
|
|
The following fixes an ICE that occurs when we visit an edge
inserted load from the code validating correctness for inserting
an aggregate copy there. We can simply skip those loads here.
PR tree-optimization/110332
* tree-ssa-phiprop.cc (propagate_with_phi): Always
check aliasing with edge inserted loads.
* g++.dg/torture/pr110332.C: New testcase.
* gcc.dg/torture/pr110332-1.c: Likewise.
* gcc.dg/torture/pr110332-2.c: Likewise.
|
|
This patch is the next installment in a set of backend patches around
improvements to ptest/vptest. A previous patch optimized the sequence
t=pand(x,y); ptestz(t,t) into the equivalent ptestz(x,y), using the
property that ZF is set to (X&Y) == 0. This patch performs a similar
transformation, converting t=pandn(x,y); ptestz(t,t) into the (almost)
equivalent ptestc(y,x), using the property that the CF flag is set to
(~X&Y) == 0. The tricky bit is that this sets the CF flag instead of
the ZF flag, so we can only perform this transformation when we can
also convert the flags consumer, as well as the producer.
For the test case:
int foo (__m128i x, __m128i y)
{
  __m128i a = x & ~y;
  return __builtin_ia32_ptestz128 (a, a);
}
With -O2 -msse4.1 we previously generated:
foo: pandn %xmm0, %xmm1
xorl %eax, %eax
ptest %xmm1, %xmm1
sete %al
ret
with this patch we now generate:
foo: xorl %eax, %eax
ptest %xmm0, %xmm1
setc %al
ret
At the same time, this patch also provides alternative fixes for
PR target/109973 and PR target/110118, by recognizing that ptestc(x,x)
always sets the carry flag (X&~X is always zero). This is achieved
both by recognizing the special case in ix86_expand_sse_ptest and with
a splitter to convert an eligible ptest into an stc.
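The flag identities used by this transformation can be checked with a
scalar C model. It is a sketch only: 64-bit integers stand in for the
128-bit operands, the function names are hypothetical, and the model
follows the test case above (a = x & ~y) together with the ZF and CF
definitions quoted in this message.

```c
#include <stdbool.h>
#include <stdint.h>

/* ZF as set by ptest: (A & B) == 0.  */
bool
ptestz_model (uint64_t a, uint64_t b)
{
  return (a & b) == 0;
}

/* CF as set by ptest: (~A & B) == 0.  */
bool
ptestc_model (uint64_t a, uint64_t b)
{
  return (~a & b) == 0;
}

/* Before the patch: compute x & ~y, then test it against itself.  */
bool
andnot_then_ptestz (uint64_t x, uint64_t y)
{
  uint64_t t = x & ~y;
  return ptestz_model (t, t);
}
```

For all inputs andnot_then_ptestz (x, y) equals ptestc_model (y, x), and
ptestc_model (x, x) is always true, which is what allows the ptest with
equal operands to become an stc.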
2023-06-22 Roger Sayle <roger@nextmovesoftware.com>
Uros Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_sse_ptest): Recognize
expansion of ptestc with equal operands as producing const1_rtx.
* config/i386/i386.cc (ix86_rtx_costs): Provide accurate cost
estimates of UNSPEC_PTEST, where the ptest performs the PAND
or PANDN of its operands.
* config/i386/sse.md (define_split): Transform CCCmode UNSPEC_PTEST
of reg_equal_p operands into an x86_stc instruction.
(define_split): Split pandn/ptestz/set{n?}e into ptestc/set{n?}c.
(define_split): Similar to above for strict_low_part destinations.
(define_split): Split pandn/ptestz/j{n?}e into ptestc/j{n?}c.
gcc/testsuite/ChangeLog
* gcc.target/i386/avx-vptest-4.c: New test case.
* gcc.target/i386/avx-vptest-5.c: Likewise.
* gcc.target/i386/avx-vptest-6.c: Likewise.
* gcc.target/i386/pr109973-1.c: Update test case.
* gcc.target/i386/pr109973-2.c: Likewise.
* gcc.target/i386/sse4_1-ptest-4.c: New test case.
* gcc.target/i386/sse4_1-ptest-5.c: Likewise.
* gcc.target/i386/sse4_1-ptest-6.c: Likewise.
|
|
This patch extends -Wanalyzer-out-of-bounds so that, where possible, it
will emit a text art diagram visualizing the spatial relationship between
(a) the memory region that the analyzer predicts would be accessed, versus
(b) the range of memory that is valid to access - whether they overlap,
are touching, are close or far apart; which one is before or after in
memory, the relative sizes involved, the direction of the access (read vs
write), and, in some cases, the values of data involved. This diagram
can be suppressed using -fdiagnostics-text-art-charset=none.
For example, given:
int32_t arr[10];
int32_t int_arr_read_element_before_start_far(void)
{
  return arr[-100];
}
it emits:
demo-1.c: In function ‘int_arr_read_element_before_start_far’:
demo-1.c:7:13: warning: buffer under-read [CWE-127] [-Wanalyzer-out-of-bounds]
7 | return arr[-100];
| ~~~^~~~~~
‘int_arr_read_element_before_start_far’: event 1
|
| 7 | return arr[-100];
| | ~~~^~~~~~
| | |
| | (1) out-of-bounds read from byte -400 till byte -397 but ‘arr’ starts at byte 0
|
demo-1.c:7:13: note: valid subscripts for ‘arr’ are ‘[0]’ to ‘[9]’
┌───────────────────────────┐
│read of ‘int32_t’ (4 bytes)│
└───────────────────────────┘
^
│
│
┌───────────────────────────┐ ┌────────┬────────┬─────────┐
│ │ │ [0] │ ... │ [9] │
│ before valid range │ ├────────┴────────┴─────────┤
│ │ │‘arr’ (type: ‘int32_t[10]’)│
└───────────────────────────┘ └───────────────────────────┘
├─────────────┬─────────────┤├─────┬──────┤├─────────────┬─────────────┤
│ │ │
╭────────────┴───────────╮ ╭────┴────╮ ╭───────┴──────╮
│⚠️ under-read of 4 bytes│ │396 bytes│ │size: 40 bytes│
╰────────────────────────╯ ╰─────────╯ ╰──────────────╯
and given:
#include <string.h>
void
test_non_ascii ()
{
  char buf[5];
  strcpy (buf, "文字化け");
}
it emits:
demo-2.c: In function ‘test_non_ascii’:
demo-2.c:7:3: warning: stack-based buffer overflow [CWE-121] [-Wanalyzer-out-of-bounds]
7 | strcpy (buf, "文字化け");
| ^~~~~~~~~~~~~~~~~~~~~~~~
‘test_non_ascii’: events 1-2
|
| 6 | char buf[5];
| | ^~~
| | |
| | (1) capacity: 5 bytes
| 7 | strcpy (buf, "文字化け");
| | ~~~~~~~~~~~~~~~~~~~~~~~~
| | |
| | (2) out-of-bounds write from byte 5 till byte 12 but ‘buf’ ends at byte 5
|
demo-2.c:7:3: note: write of 8 bytes to beyond the end of ‘buf’
7 | strcpy (buf, "文字化け");
| ^~~~~~~~~~~~~~~~~~~~~~~~
demo-2.c:7:3: note: valid subscripts for ‘buf’ are ‘[0]’ to ‘[4]’
┌─────┬─────┬─────┬────┬────┐┌────┬────┬────┬────┬────┬────┬────┬──────┐
│ [0] │ [1] │ [2] │[3] │[4] ││[5] │[6] │[7] │[8] │[9] │[10]│[11]│ [12] │
├─────┼─────┼─────┼────┼────┤├────┼────┼────┼────┼────┼────┼────┼──────┤
│0xe6 │0x96 │0x87 │0xe5│0xad││0x97│0xe5│0x8c│0x96│0xe3│0x81│0x91│ 0x00 │
├─────┴─────┴─────┼────┴────┴┴────┼────┴────┴────┼────┴────┴────┼──────┤
│ U+6587 │ U+5b57 │ U+5316 │ U+3051 │U+0000│
├─────────────────┼───────────────┼──────────────┼──────────────┼──────┤
│ 文 │ 字 │ 化 │ け │ NUL │
├─────────────────┴───────────────┴──────────────┴──────────────┴──────┤
│ string literal (type: ‘char[13]’) │
└──────────────────────────────────────────────────────────────────────┘
│ │ │ │ │ │ │ │ │ │ │ │ │
│ │ │ │ │ │ │ │ │ │ │ │ │
v v v v v v v v v v v v v
┌─────┬────────────────┬────┐┌─────────────────────────────────────────┐
│ [0] │ ... │[4] ││ │
├─────┴────────────────┴────┤│ after valid range │
│ ‘buf’ (type: ‘char[5]’) ││ │
└───────────────────────────┘└─────────────────────────────────────────┘
├─────────────┬─────────────┤├────────────────────┬────────────────────┤
│ │
╭────────┴────────╮ ╭───────────┴──────────╮
│capacity: 5 bytes│ │⚠️ overflow of 8 bytes│
╰─────────────────╯ ╰──────────────────────╯
showing that the overflow occurs partway through the UTF-8 encoding of
the U+5b57 code point.
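The byte positions in the second diagram can be checked by hand. The
following is a minimal sketch, not code from the patch: the helper names
utf8_seq_len and straddling_code_point_start are made up for
illustration, and the string literal is spelled with hex escapes so the
result does not depend on the source file's encoding.

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of the UTF-8 sequence introduced by lead byte B,
   or 0 if B is a continuation byte or not a valid lead byte.  */
static size_t
utf8_seq_len (unsigned char b)
{
  if (b < 0x80)
    return 1;
  if ((b & 0xe0) == 0xc0)
    return 2;
  if ((b & 0xf0) == 0xe0)
    return 3;
  if ((b & 0xf8) == 0xf0)
    return 4;
  return 0;
}

/* STR holds SIZE bytes of well-formed UTF-8 (including the NUL).
   Return the offset of the code point whose encoding crosses byte
   offset CAPACITY, or SIZE if no code point crosses it.  */
static size_t
straddling_code_point_start (const char *str, size_t size, size_t capacity)
{
  size_t i = 0;
  while (i < size)
    {
      size_t len = utf8_seq_len ((unsigned char) str[i]);
      assert (len != 0);
      if (i < capacity && i + len > capacity)
        return i;
      i += len;
    }
  return size;
}

/* The literal from demo-2.c: four 3-byte code points plus the NUL.  */
static const char demo_literal[]
  = "\xe6\x96\x87\xe5\xad\x97\xe5\x8c\x96\xe3\x81\x91";
```

With a capacity of 5, the code point that crosses the boundary starts at
byte 3, i.e. the bytes 0xe5 0xad 0x97 that encode U+5b57, and the
13-byte literal overflows the buffer by 13 - 5 = 8 bytes.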
There are lots more examples in the test suite.
It doesn't show up in this email, but the above diagrams are colorized
to contrast the valid and invalid access ranges.
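The byte figures in the first diagram (the read of bytes -400 to -397 and
the 396-byte gap before 'arr') follow from simple range arithmetic. The
sketch below reproduces them; struct demo_range and its helpers are
hypothetical names for illustration only and are not the analyzer's own
classes.

```c
#include <assert.h>
#include <stdint.h>

/* A half-open range of byte offsets: [start, start + size).  */
struct demo_range
{
  int64_t start;
  int64_t size;
};

/* Offset of the first byte after R.  */
static int64_t
next_byte (struct demo_range r)
{
  return r.start + r.size;
}

/* Bytes touched by accessing element INDEX of an array whose
   elements are ELEM_SIZE bytes each.  */
static struct demo_range
element_access (int64_t index, int64_t elem_size)
{
  struct demo_range r = { index * elem_size, elem_size };
  return r;
}

/* Number of bytes separating A from B when A lies wholly before B;
   zero means the two ranges are touching.  */
static int64_t
gap_before (struct demo_range a, struct demo_range b)
{
  assert (next_byte (a) <= b.start);
  return b.start - next_byte (a);
}
```

For arr[-100] with 4-byte elements, element_access (-100, 4) starts at
byte -400 and its last byte is -397; against the valid range of 40 bytes
starting at 0, gap_before gives 396.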
gcc/ChangeLog:
PR analyzer/106626
* Makefile.in (ANALYZER_OBJS): Add analyzer/access-diagram.o.
* doc/invoke.texi (Wanalyzer-out-of-bounds): Add description of
text art.
(fanalyzer-debug-text-art): New.
gcc/analyzer/ChangeLog:
PR analyzer/106626
* access-diagram.cc: New file.
* access-diagram.h: New file.
* analyzer.h (class region_offset): Add default ctor.
(region_offset::make_byte_offset): New decl.
(region_offset::concrete_p): New.
(region_offset::get_concrete_byte_offset): New.
(region_offset::calc_symbolic_bit_offset): New decl.
(region_offset::calc_symbolic_byte_offset): New decl.
(region_offset::dump_to_pp): New decl.
(region_offset::dump): New decl.
(operator<, operator<=, operator>, operator>=): New decls for
region_offset.
* analyzer.opt
(-param=analyzer-text-art-string-ellipsis-threshold=): New.
(-param=analyzer-text-art-string-ellipsis-head-len=): New.
(-param=analyzer-text-art-string-ellipsis-tail-len=): New.
(-param=analyzer-text-art-ideal-canvas-width=): New.
(fanalyzer-debug-text-art): New.
* bounds-checking.cc: Include "intl.h", "diagnostic-diagram.h",
and "analyzer/access-diagram.h".
(class out_of_bounds::oob_region_creation_event_capacity): New.
(out_of_bounds::out_of_bounds): Add "model" and "sval_hint"
params.
(out_of_bounds::mark_interesting_stuff): Use the base region.
(out_of_bounds::add_region_creation_events): Use
oob_region_creation_event_capacity.
(out_of_bounds::get_dir): New pure vfunc.
(out_of_bounds::maybe_show_notes): New.
(out_of_bounds::maybe_show_diagram): New.
(out_of_bounds::make_access_diagram): New.
(out_of_bounds::m_model): New field.
(out_of_bounds::m_sval_hint): New field.
(out_of_bounds::m_region_creation_event_id): New field.
(concrete_out_of_bounds::concrete_out_of_bounds): Update for new
fields.
(concrete_past_the_end::concrete_past_the_end): Likewise.
(concrete_past_the_end::add_region_creation_events): Use
oob_region_creation_event_capacity.
(concrete_buffer_overflow::concrete_buffer_overflow): Update for
new fields.
(concrete_buffer_overflow::emit): Replace call to
maybe_describe_array_bounds with maybe_show_notes.
(concrete_buffer_overflow::get_dir): New.
(concrete_buffer_over_read::concrete_buffer_over_read): Update for
new fields.
(concrete_buffer_over_read::emit): Replace call to
maybe_describe_array_bounds with maybe_show_notes.
(concrete_buffer_over_read::get_dir): New.
(concrete_buffer_underwrite::concrete_buffer_underwrite): Update
for new fields.
(concrete_buffer_underwrite::emit): Replace call to
maybe_describe_array_bounds with maybe_show_notes.
(concrete_buffer_underwrite::get_dir): New.
(concrete_buffer_under_read::concrete_buffer_under_read): Update
for new fields.
(concrete_buffer_under_read::emit): Replace call to
maybe_describe_array_bounds with maybe_show_notes.
(concrete_buffer_under_read::get_dir): New.
(symbolic_past_the_end::symbolic_past_the_end): Update for new
fields.
(symbolic_buffer_overflow::symbolic_buffer_overflow): Likewise.
(symbolic_buffer_overflow::emit): Call maybe_show_notes.
(symbolic_buffer_overflow::get_dir): New.
(symbolic_buffer_over_read::symbolic_buffer_over_read): Update for
new fields.
(symbolic_buffer_over_read::emit): Call maybe_show_notes.
(symbolic_buffer_over_read::get_dir): New.
(region_model::check_symbolic_bounds): Add "sval_hint" param. Pass
it and sized_offset_reg to diagnostics.
(region_model::check_region_bounds): Add "sval_hint" param, passing
it to diagnostics.
* diagnostic-manager.cc
(diagnostic_manager::emit_saved_diagnostic): Pass logger to
pending_diagnostic::emit.
* engine.cc: Add logger param to pending_diagnostic::emit
implementations.
* infinite-recursion.cc: Likewise.
* kf-analyzer.cc: Likewise.
* kf.cc: Likewise. Add nullptr for new param of
check_region_for_write.
* pending-diagnostic.h: Likewise in decl.
* region-model-manager.cc
(region_model_manager::get_or_create_int_cst): Convert param from
poly_int64 to const poly_wide_int_ref &.
(region_model_manager::maybe_fold_binop): Support type being NULL
when checking for floating-point types.
Check for (X + Y) - X => Y. Be less strict about types when folding
associative ops. Check for (X + Y) * CST => (X * CST) + (Y * CST).
* region-model-manager.h
(region_model_manager::get_or_create_int_cst): Convert param from
poly_int64 to const poly_wide_int_ref &.
* region-model.cc: Add logger param to pending_diagnostic::emit
implementations.
(region_model::check_external_function_for_access_attr): Update
for new param of check_region_for_write.
(region_model::deref_rvalue): Use nullptr rather than NULL.
(region_model::get_capacity): Handle RK_STRING.
(region_model::check_region_access): Add "sval_hint" param; pass it to
check_region_bounds.
(region_model::check_region_for_write): Add "sval_hint" param;
pass it to check_region_access.
(region_model::check_region_for_read): Add NULL for new param to
check_region_access.
(region_model::set_value): Pass rhs_sval to
check_region_for_write.
(region_model::get_representative_path_var_1): Handle SK_CONSTANT
in the check for infinite recursion.
* region-model.h (region_model::check_region_for_write): Add
"sval_hint" param.
(region_model::check_region_access): Likewise.
(region_model::check_symbolic_bounds): Likewise.
(region_model::check_region_bounds): Likewise.
* region.cc (region_offset::make_byte_offset): New.
(region_offset::calc_symbolic_bit_offset): New.
(region_offset::calc_symbolic_byte_offset): New.
(region_offset::dump_to_pp): New.
(region_offset::dump): New.
(struct linear_op): New.
(operator<, operator<=, operator>, operator>=): New, for
region_offset.
(region::get_next_offset): New.
(region::get_relative_symbolic_offset): Use ptrdiff_type_node.
(field_region::get_relative_symbolic_offset): Likewise.
(element_region::get_relative_symbolic_offset): Likewise.
(bit_range_region::get_relative_symbolic_offset): Likewise.
* region.h (region::get_next_offset): New decl.
* sm-fd.cc: Add logger param to pending_diagnostic::emit
implementations.
* sm-file.cc: Likewise.
* sm-malloc.cc: Likewise.
* sm-pattern-test.cc: Likewise.
* sm-sensitive.cc: Likewise.
* sm-signal.cc: Likewise.
* sm-taint.cc: Likewise.
* store.cc (bit_range::contains_p): Allow "out" to be null.
* store.h (byte_range::get_start_bit_offset): New.
(byte_range::get_next_bit_offset): New.
* varargs.cc: Add logger param to pending_diagnostic::emit
implementations.
gcc/testsuite/ChangeLog:
PR analyzer/106626
* gcc.dg/analyzer/data-model-1.c (test_16): Update for
out-of-bounds working.
* gcc.dg/analyzer/out-of-bounds-diagram-1-ascii.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-1-debug.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-1-emoji.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-1-json.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-1-sarif.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-1-unicode.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-10.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-11.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-12.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-13.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-14.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-15.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-2.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-3.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-4.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-5-ascii.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-5-unicode.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-6.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-7.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-8.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-9.c: New test.
* gcc.dg/analyzer/pattern-test-2.c: Update expected results.
* gcc.dg/analyzer/pr101962.c: Update expected results.
* gcc.dg/plugin/analyzer_gil_plugin.c: Add logger param to
pending_diagnostic::emit implementations.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
Existing text output in GCC has to be implemented by writing
sequentially to a pretty_printer instance. This makes it
hard to implement some kinds of diagnostic output (see e.g.
diagnostic-show-locus.cc).
This patch adds more flexible ways of creating text output:
- a canvas class, which can be "painted" to via random-access (rather
than sequentially)
- a table class for 2D grid layout, supporting items that span
multiple rows/columns
- a widget class for organizing diagrams hierarchically.
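To illustrate what "random-access" painting buys over sequential output,
here is a minimal sketch of the idea in plain C. It is an assumption-laden
toy, not the text_art::canvas API: the fixed-size struct canvas and the
canvas_* helpers are invented for this example.

```c
#include <assert.h>
#include <string.h>

enum { CANVAS_W = 12, CANVAS_H = 3 };

/* A fixed-size grid of cells; each row is kept NUL-terminated.  */
struct canvas
{
  char cells[CANVAS_H][CANVAS_W + 1];
};

/* Fill the canvas with spaces.  */
static void
canvas_init (struct canvas *c)
{
  for (int y = 0; y < CANVAS_H; y++)
    {
      memset (c->cells[y], ' ', CANVAS_W);
      c->cells[y][CANVAS_W] = '\0';
    }
}

/* Paint one cell at an arbitrary coordinate.  */
static void
canvas_paint (struct canvas *c, int x, int y, char ch)
{
  assert (x >= 0 && x < CANVAS_W && y >= 0 && y < CANVAS_H);
  c->cells[y][x] = ch;
}

/* Paint the string S starting at (X, Y).  */
static void
canvas_paint_text (struct canvas *c, int x, int y, const char *s)
{
  for (; *s; s++, x++)
    canvas_paint (c, x, y, *s);
}

/* Paint a box outline spanning columns X0..X1 and rows Y0..Y1.  */
static void
canvas_draw_box (struct canvas *c, int x0, int y0, int x1, int y1)
{
  for (int x = x0; x <= x1; x++)
    {
      canvas_paint (c, x, y0, '-');
      canvas_paint (c, x, y1, '-');
    }
  for (int y = y0; y <= y1; y++)
    {
      canvas_paint (c, x0, y, '|');
      canvas_paint (c, x1, y, '|');
    }
  canvas_paint (c, x0, y0, '+');
  canvas_paint (c, x1, y0, '+');
  canvas_paint (c, x0, y1, '+');
  canvas_paint (c, x1, y1, '+');
}
```

The label can be painted before the box that surrounds it, or after;
either order yields the same three rows. With a sequential printer the
caller would have to compute each line in full before emitting it.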
The patch also expands GCC's diagnostics subsystem so that diagnostics
can have "text art" diagrams - think ASCII art, but potentially
including some Unicode characters, such as box-drawing chars.
The new code is in a new "gcc/text-art" subdirectory and "text_art"
namespace.
The patch adds a new "-fdiagnostics-text-art-charset=VAL" option, with
values:
- "none": don't emit diagrams (added to -fdiagnostics-plain-output)
- "ascii": use pure ASCII in diagrams
- "unicode": allow for conservative use of unicode drawing characters
(such as box-drawing characters).
- "emoji" (the default): as "unicode", but potentially allow for
conservative use of emoji in the output (such as U+26A0 WARNING SIGN).
I made it possible to disable emoji separately from unicode as I believe
there's a generation gap in acceptance of these characters (some older
programmers have a visceral reaction against them, whereas younger
programmers may have no problem with them).
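The four values form a ladder in which each level permits strictly more
than the one before. A rough sketch of that relationship follows; the
enum and function names are hypothetical and are not the ones the patch
adds to common.opt or diagnostic-text-art.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Ordered from least to most permissive.  */
enum demo_charset
{
  DEMO_CHARSET_NONE,
  DEMO_CHARSET_ASCII,
  DEMO_CHARSET_UNICODE,
  DEMO_CHARSET_EMOJI
};

/* Parse VAL as given to -fdiagnostics-text-art-charset=.
   Return false, leaving *OUT untouched, if VAL is not recognized.  */
static bool
demo_parse_charset (const char *val, enum demo_charset *out)
{
  static const char *const names[] = { "none", "ascii", "unicode", "emoji" };
  assert (val && out);
  for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
    if (strcmp (val, names[i]) == 0)
      {
        *out = (enum demo_charset) i;
        return true;
      }
  return false;
}

/* Whether any diagram should be emitted at all.  */
static bool
demo_diagrams_enabled (enum demo_charset cs)
{
  return cs != DEMO_CHARSET_NONE;
}

/* Whether box-drawing characters may be used.  */
static bool
demo_allows_box_drawing (enum demo_charset cs)
{
  return cs >= DEMO_CHARSET_UNICODE;
}

/* Whether emoji such as U+26A0 WARNING SIGN may be used.  */
static bool
demo_allows_emoji (enum demo_charset cs)
{
  return cs == DEMO_CHARSET_EMOJI;
}
```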
Diagrams are emitted to stderr by default. With SARIF output they are
captured as a location in "relatedLocations", with the diagram as a
code block in Markdown within a "markdown" property of a message.
This patch doesn't add any such diagram usage to GCC, saving that for
followups, apart from adding a plugin to the test suite to exercise the
functionality.
contrib/ChangeLog:
* unicode/gen-box-drawing-chars.py: New file.
* unicode/gen-combining-chars.py: New file.
* unicode/gen-printable-chars.py: New file.
gcc/ChangeLog:
* Makefile.in (OBJS-libcommon): Add text-art/box-drawing.o,
text-art/canvas.o, text-art/ruler.o, text-art/selftests.o,
text-art/style.o, text-art/styled-string.o, text-art/table.o,
text-art/theme.o, and text-art/widget.o.
* color-macros.h (COLOR_FG_BRIGHT_BLACK): New.
(COLOR_FG_BRIGHT_RED): New.
(COLOR_FG_BRIGHT_GREEN): New.
(COLOR_FG_BRIGHT_YELLOW): New.
(COLOR_FG_BRIGHT_BLUE): New.
(COLOR_FG_BRIGHT_MAGENTA): New.
(COLOR_FG_BRIGHT_CYAN): New.
(COLOR_FG_BRIGHT_WHITE): New.
(COLOR_BG_BRIGHT_BLACK): New.
(COLOR_BG_BRIGHT_RED): New.
(COLOR_BG_BRIGHT_GREEN): New.
(COLOR_BG_BRIGHT_YELLOW): New.
(COLOR_BG_BRIGHT_BLUE): New.
(COLOR_BG_BRIGHT_MAGENTA): New.
(COLOR_BG_BRIGHT_CYAN): New.
(COLOR_BG_BRIGHT_WHITE): New.
* common.opt (fdiagnostics-text-art-charset=): New option.
(diagnostic-text-art.h): New SourceInclude.
(diagnostic_text_art_charset): New Enum and EnumValues.
* configure: Regenerate.
* configure.ac (gccdepdir): Add text-art to loop.
* diagnostic-diagram.h: New file.
* diagnostic-format-json.cc (json_emit_diagram): New.
(diagnostic_output_format_init_json): Wire it up to
context->m_diagrams.m_emission_cb.
* diagnostic-format-sarif.cc: Include "diagnostic-diagram.h" and
"text-art/canvas.h".
(sarif_result::on_nested_diagnostic): Move code to...
(sarif_result::add_related_location): ...this new function.
(sarif_result::on_diagram): New.
(sarif_builder::emit_diagram): New.
(sarif_builder::make_message_object_for_diagram): New.
(sarif_emit_diagram): New.
(diagnostic_output_format_init_sarif): Set
context->m_diagrams.m_emission_cb to sarif_emit_diagram.
* diagnostic-text-art.h: New file.
* diagnostic.cc: Include "diagnostic-text-art.h",
"diagnostic-diagram.h", and "text-art/theme.h".
(diagnostic_initialize): Initialize context->m_diagrams and
call diagnostics_text_art_charset_init.
(diagnostic_finish): Clean up context->m_diagrams.m_theme.
(diagnostic_emit_diagram): New.
(diagnostics_text_art_charset_init): New.
* diagnostic.h (text_art::theme): New forward decl.
(class diagnostic_diagram): Likewise.
(diagnostic_context::m_diagrams): New field.
(diagnostic_emit_diagram): New decl.
* doc/invoke.texi (Diagnostic Message Formatting Options): Add
-fdiagnostics-text-art-charset=.
(-fdiagnostics-plain-output): Add
-fdiagnostics-text-art-charset=none.
* gcc.cc: Include "diagnostic-text-art.h".
(driver_handle_option): Handle OPT_fdiagnostics_text_art_charset_.
* opts-common.cc (decode_cmdline_options_to_array): Add
"-fdiagnostics-text-art-charset=none" to expanded_args for
-fdiagnostics-plain-output.
* opts.cc: Include "diagnostic-text-art.h".
(common_handle_option): Handle OPT_fdiagnostics_text_art_charset_.
* pretty-print.cc (pp_unicode_character): New.
* pretty-print.h (pp_unicode_character): New decl.
* selftest-run-tests.cc: Include "text-art/selftests.h".
(selftest::run_tests): Call text_art_tests.
* text-art/box-drawing-chars.inc: New file, generated by
contrib/unicode/gen-box-drawing-chars.py.
* text-art/box-drawing.cc: New file.
* text-art/box-drawing.h: New file.
* text-art/canvas.cc: New file.
* text-art/canvas.h: New file.
* text-art/ruler.cc: New file.
* text-art/ruler.h: New file.
* text-art/selftests.cc: New file.
* text-art/selftests.h: New file.
* text-art/style.cc: New file.
* text-art/styled-string.cc: New file.
* text-art/table.cc: New file.
* text-art/table.h: New file.
* text-art/theme.cc: New file.
* text-art/theme.h: New file.
* text-art/types.h: New file.
* text-art/widget.cc: New file.
* text-art/widget.h: New file.
gcc/testsuite/ChangeLog:
* gcc.dg/plugin/diagnostic-test-text-art-ascii-bw.c: New test.
* gcc.dg/plugin/diagnostic-test-text-art-ascii-color.c: New test.
* gcc.dg/plugin/diagnostic-test-text-art-none.c: New test.
* gcc.dg/plugin/diagnostic-test-text-art-unicode-bw.c: New test.
* gcc.dg/plugin/diagnostic-test-text-art-unicode-color.c: New test.
* gcc.dg/plugin/diagnostic_plugin_test_text_art.c: New test plugin.
* gcc.dg/plugin/plugin.exp (plugin_test_list): Add them.
libcpp/ChangeLog:
* charset.cc (get_cppchar_property): New function template, based
on...
(cpp_wcwidth): ...this function. Rework to use the above.
Include "combining-chars.inc".
(cpp_is_combining_char): New function.
Include "printable-chars.inc".
(cpp_is_printable_char): New function.
* combining-chars.inc: New file, generated by
contrib/unicode/gen-combining-chars.py.
* include/cpplib.h (cpp_is_combining_char): New function decl.
(cpp_is_printable_char): New function decl.
* printable-chars.inc: New file, generated by
contrib/unicode/gen-printable-chars.py.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
I have followup patches that require checking for multiline patterns
that have blank lines within them, so this moves the handling of
multiline patterns before the check for blank lines, allowing for such
multiline patterns.
Doing so uncovers some issues with existing multiline directives, which
the patch fixes.
gcc/testsuite/ChangeLog:
* c-c++-common/Wlogical-not-parentheses-2.c: Split up the
multiline directive.
* gcc.dg/analyzer/malloc-macro-inline-events.c: Remove redundant
dg-regexp directives.
* gcc.dg/missing-header-fixit-5.c: Split up the multiline
directives.
* lib/gcc-dg.exp (gcc-dg-prune): Move call to
handle-multiline-outputs from prune_gcc_output to here.
* lib/multiline.exp (dg-end-multiline-output): Move call to
maybe-handle-nn-line-numbers from prune_gcc_output to here.
* lib/prune.exp (prune_gcc_output): Move calls to
maybe-handle-nn-line-numbers and handle-multiline-outputs from
here to the above.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
|
|
2023-06-21 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/87477
PR fortran/88688
PR fortran/94380
PR fortran/107900
PR fortran/110224
* decl.cc (char_len_param_value): Fix memory leak.
(resolve_block_construct): Remove unnecessary static decls.
* expr.cc (gfc_is_ptr_fcn): New function.
(gfc_check_vardef_context): Use it to permit pointer function
result selectors to be used for associate names in variable
definition context.
* gfortran.h: Prototype for gfc_is_ptr_fcn.
* match.cc (build_associate_name): New function.
(gfc_match_select_type): Use the new function to replace inline
version and to build a new associate name for the case where
the supplied associate name is already used for that purpose.
* resolve.cc (resolve_assoc_var): Call gfc_is_ptr_fcn to allow
associate names with pointer function targets to be used in
variable definition context.
* trans-decl.cc (gfc_get_symbol_decl): Unlimited polymorphic
variables need deferred initialisation of the vptr.
(gfc_trans_deferred_vars): Do the vptr initialisation.
* trans-stmt.cc (trans_associate_var): Ensure that a pointer
associate name points to the target of the selector and not
the selector itself.
gcc/testsuite/
PR fortran/87477
PR fortran/107900
* gfortran.dg/pr107900.f90 : New test
PR fortran/110224
* gfortran.dg/pr110224.f90 : New test
PR fortran/88688
* gfortran.dg/pr88688.f90 : New test
PR fortran/94380
* gfortran.dg/pr94380.f90 : New test
PR fortran/95398
* gfortran.dg/pr95398.f90 : Set -std=f2008, bump the line
numbers in the error tests by two and change the text in two.
|
|
2023-06-21 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/108961
* trans-expr.cc (gfc_conv_procedure_call): The hidden string
length must not be passed to a formal arg of type(cptr).
gcc/testsuite/
PR fortran/108961
* gfortran.dg/pr108961.f90: New test.
|
|
Also test conversions with unsigned types.
PR target/110018
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr110018-1.c: Use explicit signed types.
* gcc.target/i386/pr110018-2.c: New test.
|
|
The architecture recommends that load-gather instructions avoid using the same
Z register for the load address and the destination, and the Software Optimization
Guides for Arm cores recommend that as well.
This means that for code like:
#include <arm_sve.h>
svuint64_t
food (svbool_t p, uint64_t *in, svint64_t offsets, svuint64_t a)
{
return svadd_u64_x (p, a, svld1_gather_offset(p, in, offsets));
}
we'll want to avoid generating the current:
food:
ld1d z0.d, p0/z, [x0, z0.d] // Z0 reused as input and output.
add z0.d, z1.d, z0.d
ret
However, we still want to avoid generating extra moves where there were
none before, so the tight aarch64-sve-acle.exp tests for load gathers
should still pass as they are.
This patch implements that recommendation for the load gather patterns by:
* duplicating the alternatives
* marking the output operand as early clobber
* Tying the input Z register operand in the original alternatives to 0
* Penalising the original alternatives with '?'
This results in a large-ish patch in terms of diff lines but the new
compact syntax (thanks Tamar) makes it quite a readable and regular change.
The benchmark numbers on a Neoverse V1 on fprate look okay:
                 diff
503.bwaves_r     0.00%
507.cactuBSSN_r  0.00%
508.namd_r       0.00%
510.parest_r     0.55%
511.povray_r     0.22%
519.lbm_r        0.00%
521.wrf_r        0.00%
526.blender_r    0.00%
527.cam4_r       0.56%
538.imagick_r    0.00%
544.nab_r        0.00%
549.fotonik3d_r  0.00%
554.roms_r       0.00%
fprate           0.10%
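As a sanity check on the summary line: assuming the fprate figure is the
geometric mean of the thirteen per-benchmark ratios (the usual way SPEC
aggregates are formed; the commit message does not say so explicitly),
it can be recomputed from the table. The helper names below are
illustrative only, and the root is found by bisection to avoid depending
on libm.

```c
#include <assert.h>
#include <stddef.h>

/* Per-benchmark differences in percent, in table order.  */
static const double diffs_percent[] = {
  0.00, 0.00, 0.00, 0.55, 0.22, 0.00, 0.00,
  0.00, 0.56, 0.00, 0.00, 0.00, 0.00
};

/* X raised to the non-negative integer power N.  */
static double
ipow (double x, size_t n)
{
  double r = 1.0;
  while (n--)
    r *= x;
  return r;
}

/* N-th root of V, for V >= 1, found by bisection on [1, V].  */
static double
nth_root (double v, size_t n)
{
  assert (v >= 1.0 && n > 0);
  double lo = 1.0, hi = v;
  for (int i = 0; i < 100; i++)
    {
      double mid = lo + (hi - lo) / 2;
      if (ipow (mid, n) < v)
        lo = mid;
      else
        hi = mid;
    }
  return lo;
}

/* Geometric mean of the ratios (1 + diff/100), expressed again as a
   percentage difference.  */
static double
geomean_diff_percent (const double *diffs, size_t n)
{
  double product = 1.0;
  for (size_t i = 0; i < n; i++)
    product *= 1.0 + diffs[i] / 100.0;
  return (nth_root (product, n) - 1.0) * 100.0;
}
```

Applied to the thirteen values above, this gives roughly 0.10%, which
agrees with the fprate line.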
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md (mask_gather_load<mode><v_int_container>):
Add alternatives to prefer to avoid same input and output Z register.
(mask_gather_load<mode><v_int_container>): Likewise.
(*mask_gather_load<mode><v_int_container>_<su>xtw_unpacked): Likewise.
(*mask_gather_load<mode><v_int_container>_sxtw): Likewise.
(*mask_gather_load<mode><v_int_container>_uxtw): Likewise.
(@aarch64_gather_load_<ANY_EXTEND:optab><SVE_4HSI:mode><SVE_4BHI:mode>):
Likewise.
(@aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode><SVE_2BHSI:mode>):
Likewise.
(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
<SVE_2BHSI:mode>_<ANY_EXTEND2:su>xtw_unpacked): Likewise.
(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
<SVE_2BHSI:mode>_sxtw): Likewise.
(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
<SVE_2BHSI:mode>_uxtw): Likewise.
(@aarch64_ldff1_gather<mode>): Likewise.
(@aarch64_ldff1_gather<mode>): Likewise.
(*aarch64_ldff1_gather<mode>_sxtw): Likewise.
(*aarch64_ldff1_gather<mode>_uxtw): Likewise.
(@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx4_WIDE:mode>
<VNx4_NARROW:mode>): Likewise.
(@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
<VNx2_NARROW:mode>): Likewise.
(*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
<VNx2_NARROW:mode>_sxtw): Likewise.
(*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
<VNx2_NARROW:mode>_uxtw): Likewise.
* config/aarch64/aarch64-sve2.md (@aarch64_gather_ldnt<mode>): Likewise.
(@aarch64_gather_ldnt_<ANY_EXTEND:optab><SVE_FULL_SDI:mode>
<SVE_PARTIAL_I:mode>): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/gather_earlyclobber.c: New test.
* gcc.target/aarch64/sve2/gather_earlyclobber.c: New test.
|
|
The following works around the x86 backend not making the
vectorizer compare the costs of the different possible vector
sizes the backend advertises through the vector_modes hook. When
masked epilogues or main loops are enabled, this means we will
select the preferred vector mode, which is usually the largest, even
for loops whose iteration count is nowhere near the number of
vector lanes. Without masking, the vectorizer would reject any
mode resulting in a VF bigger than the number of iterations,
but with masking the excess lanes are simply masked out.
So this overloads the finish_cost function and matches for
the problematic case, forcing a high cost to make us try a
smaller vector size.
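The rule, as worded in the ChangeLog entry, can be sketched as follows.
This is only an illustration of the stated condition, not the code
committed to ix86_vector_costs::finish_cost: struct loop_candidate and
finish_cost_sketch are hypothetical names, and the real check may be
phrased differently.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* The facts about a candidate vectorization that the rule looks at.  */
struct loop_candidate
{
  bool uses_masking;   /* main loop uses partial (masked) vectors */
  bool niters_known;   /* iteration count is a compile-time constant */
  unsigned niters;     /* scalar iteration count, if known */
  unsigned vf;         /* vectorization factor for this vector mode */
  unsigned body_cost;  /* cost computed so far */
};

/* Return the final cost.  A masked main loop whose VF is more than
   double the iteration count gets a prohibitive cost, so that a
   smaller vector size is tried instead.  */
static unsigned
finish_cost_sketch (struct loop_candidate c)
{
  assert (c.vf > 0);
  if (c.uses_masking && c.niters_known && c.vf > 2 * c.niters)
    return UINT_MAX;
  return c.body_cost;
}
```

For a loop of 3 iterations, a VF of 16 is rejected (16 > 6) while a VF
of 4 keeps its computed cost.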
* config/i386/i386.cc (ix86_vector_costs::finish_cost):
Overload. For masked main loops make sure the vectorization
factor isn't more than double the number of iterations.
* gcc.target/i386/vect-partial-vectors-1.c: New testcase.
* gcc.target/i386/vect-partial-vectors-2.c: Likewise.
|
|
There's no reason to constrain this to AVX512VL, unless instructed so by
-mprefer-vector-width=, as the wider operation is unusable for more
narrow operands only when the possible memory source is a non-broadcast
one. This way even the scalar copysign<mode>3 can benefit from the
operation being a single-insn one (leaving aside moves which the
compiler decides to insert for unclear reasons, and leaving aside the
fact that bcst_mem_operand() is too restrictive for broadcast to be
embedded right into VPTERNLOG*).
While there also bring *<avx512>_vternlog<mode>_all's in sync with that
of the three splitters.
Along with this also request value duplication in
ix86_expand_copysign()'s call to ix86_build_signbit_mask(), eliminating
excess space allocation in .rodata.*, filled with zeros which are never
read.
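For context on why a single ternary-logic instruction suffices: copysign
is a bitwise select between two inputs under a sign-bit mask, i.e. a
three-input boolean function applied to every bit. The scalar sketch
below shows that function in portable C, assuming double is IEEE 754
binary64; copysign_bitwise is an illustrative name, and this is not the
RTL the backend emits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return MAG with its sign bit replaced by that of SGN, computed as
   (mag & ~mask) | (sgn & mask) on the raw bit patterns.  */
static double
copysign_bitwise (double mag, double sgn)
{
  const uint64_t sign_mask = UINT64_C (1) << 63;
  uint64_t m, s, r;
  double out;

  assert (sizeof (double) == sizeof (uint64_t));
  memcpy (&m, &mag, sizeof m);
  memcpy (&s, &sgn, sizeof s);
  r = (m & ~sign_mask) | (s & sign_mask);
  memcpy (&out, &r, sizeof out);
  return out;
}
```

The three operands are the magnitude, the sign source and the mask,
which is why the mask constant matters and why duplicating it across the
vector avoids padding .rodata with unused zeros.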
gcc/
* config/i386/i386-expand.cc (ix86_expand_copysign): Request
value duplication by ix86_build_signbit_mask() when AVX512F and
not HFmode.
* config/i386/sse.md (*<avx512>_vternlog<mode>_all): Convert to
2-alternative form. Adjust "mode" attribute. Add "enabled"
attribute.
(*<avx512>_vpternlog<mode>_1): Also permit when TARGET_AVX512F
&& !TARGET_PREFER_AVX256.
(*<avx512>_vpternlog<mode>_2): Likewise.
(*<avx512>_vpternlog<mode>_3): Likewise.
gcc/testsuite/
* gcc.target/i386/avx512f-copysign.c: New test.
|
|
This is to cover testing also being done with -march=cascadelake.
gcc/testsuite/
* gcc.target/i386/avx512f-dupv2di.c: Add
-mprefer-vector-width=512.
|
|
optab does not exist.
We already use an intermediate type in the WIDEN case, but not for NONE;
this patch extends that.
gcc/ChangeLog:
PR target/110018
* tree-vect-stmts.cc (vectorizable_conversion): Use
intermediate integer type for float_expr/fix_trunc_expr when
direct optab does not exist.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr110018-1.c: New test.
|
|
|
|
When libcpp reports diagnostics whose locus is a macro name (such as for
-Wunused-macros), it uses the location in the cpp_macro object that was
stored by _cpp_new_macro. This is currently set to pfile->directive_line,
which contains the line number only and no column information. This patch
changes the stored location to the src_loc for the token defining the macro
name, which includes the location and range information.
libcpp/ChangeLog:
PR c++/66290
* macro.cc (_cpp_create_definition): Add location argument.
* internal.h (_cpp_create_definition): Adjust prototype.
* directives.cc (do_define): Pass new location argument to
_cpp_create_definition.
(do_undef): Stop passing inferior location to cpp_warning_with_line;
the default from cpp_warning is better.
(cpp_pop_definition): Pass new location argument to
_cpp_create_definition.
* pch.cc (cpp_read_state): Likewise.
gcc/testsuite/ChangeLog:
PR c++/66290
* c-c++-common/cpp/macro-ranges.c: New test.
* c-c++-common/cpp/line-2.c: Adapt to check for column information
on macro-related libcpp warnings.
* c-c++-common/cpp/line-3.c: Likewise.
* c-c++-common/cpp/macro-arg-count-1.c: Likewise.
* c-c++-common/cpp/pr58844-1.c: Likewise.
* c-c++-common/cpp/pr58844-2.c: Likewise.
* c-c++-common/cpp/warning-zero-location.c: Likewise.
* c-c++-common/pragma-diag-14.c: Likewise.
* c-c++-common/pragma-diag-15.c: Likewise.
* g++.dg/modules/macro-2_d.C: Likewise.
* g++.dg/modules/macro-4_d.C: Likewise.
* g++.dg/modules/macro-4_e.C: Likewise.
* g++.dg/spellcheck-macro-ordering.C: Likewise.
* gcc.dg/builtin-redefine.c: Likewise.
* gcc.dg/cpp/Wunused.c: Likewise.
* gcc.dg/cpp/redef2.c: Likewise.
* gcc.dg/cpp/redef3.c: Likewise.
* gcc.dg/cpp/redef4.c: Likewise.
* gcc.dg/cpp/ucnid-11-utf8.c: Likewise.
* gcc.dg/cpp/ucnid-11.c: Likewise.
* gcc.dg/cpp/undef2.c: Likewise.
* gcc.dg/cpp/warn-redefined-2.c: Likewise.
* gcc.dg/cpp/warn-redefined.c: Likewise.
* gcc.dg/cpp/warn-unused-macros-2.c: Likewise.
* gcc.dg/cpp/warn-unused-macros.c: Likewise.
|
|
Several gcc.target/aarch64/sve/pcs tests started failing after
6a2e8dcbbd4, because the tests weren't robust against whether
an indirect argument register or the stack pointer was used as
the base for stores.
The patch allows either base register when there is only one
indirect argument. It disables -fcprop-registers in cases where
there are sometimes multiple indirect arguments, since the name
of the argument register is then an important part of the test.
Disabling -fcprop-registers gives poor final register allocation,
since:
* combine's make_more_copies hack adds extra redundant moves
* code with those moves is not allocated as well as moves without them
* we often rely on -fcprop-registers to clean up the allocation later
The patch therefore disables combine in the same tests as
cprop-registers.
gcc/testsuite/
* gcc.target/aarch64/sve/pcs/args_1.c: Match moves from the stack
pointer to indirect argument registers and allow either to be used
as the base register in subsequent stores.
* gcc.target/aarch64/sve/pcs/args_8.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_2.c: Allow the store of the
indirect argument to happen via the argument register or the
stack pointer.
* gcc.target/aarch64/sve/pcs/args_3.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_4.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_bf16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_f16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_f32.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_f64.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_s16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_s32.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_s64.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_s8.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_u16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_u32.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_u64.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_u8.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_le_bf16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_le_f16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_le_f32.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_le_f64.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_le_s16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_le_s32.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_le_s64.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_le_s8.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_le_u16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_le_u32.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_le_u64.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_le_u8.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_be_bf16.c: Disable
-fcprop-registers and combine.
* gcc.target/aarch64/sve/pcs/args_6_be_f16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_be_f32.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_be_f64.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_be_s16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_be_s32.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_be_s64.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_be_s8.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_be_u16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_be_u32.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_be_u64.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_be_u8.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_le_bf16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_le_f16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_le_f32.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_le_f64.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_le_s16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_le_s32.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_le_s64.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_le_s8.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_le_u16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_le_u32.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_le_u64.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_le_u8.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_1.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_2_f16.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_2_f32.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_2_f64.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_2_s16.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_2_s32.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_2_s64.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_2_s8.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_2_u16.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_2_u32.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_2_u64.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_2_u8.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_3_nosc.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_3_sc.c: Likewise.
|
|
In the following testcase we fail to pattern recognize the least significant
.UADDC call. The reason is that arg3 in that case is
_3 = .ADD_OVERFLOW (...);
_2 = __imag__ _3;
_1 = _2 != 0;
arg3 = (unsigned long) _1;
and while before the changes arg3 has a single use in some .ADD_OVERFLOW
later on, we add a .UADDC call next to it (and gsi_remove/gsi_replace only
what is strictly necessary and leave quite a few dead stmts around which
the next DCE cleans up) and so it all of a sudden isn't used just once, but twice
(.ADD_OVERFLOW and .UADDC) and so uaddc_cast fails. While we could tweak
uaddc_cast and not require has_single_use in these uses, there is also
no vrp that would figure out that because __imag__ _3 is in [0, 1] range,
it can just use arg3 = __imag__ _3; and drop the comparison and cast.
We already search if either arg2 or arg3 is ultimately set from __imag__
of .{{ADD,SUB}_OVERFLOW,U{ADD,SUB}C} call, so the following patch just
remembers the lhs of __imag__ from that case and uses it later.
2023-06-20 Jakub Jelinek <jakub@redhat.com>
PR middle-end/79173
* tree-ssa-math-opts.cc (match_uaddc_usubc): Remember lhs of
IMAGPART_EXPR of arg2/arg3 and use that as arg3 if it has the right
type.
* g++.target/i386/pr79173-1.C: New test.
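For illustration only, here is a minimal sketch of the kind of source
this pattern recognition is aimed at: a carry chain built from
__builtin_add_overflow. The names add_with_carry and add2 are hypothetical
and this is not the committed pr79173-1.C test. Whether a given compiler
version turns both limbs into add-with-carry instructions is not claimed
here.

```c
#include <stdbool.h>

/* Hypothetical helper: compute x + y + carry_in and store the outgoing
   carry (0 or 1) in *carry_out.  Assumes carry_in is 0 or 1, so at most
   one of the two additions can overflow.  */
static unsigned long
add_with_carry (unsigned long x, unsigned long y, unsigned long carry_in,
		unsigned long *carry_out)
{
  unsigned long r;
  bool c1 = __builtin_add_overflow (x, y, &r);
  bool c2 = __builtin_add_overflow (r, carry_in, &r);
  *carry_out = (unsigned long) c1 + c2;
  return r;
}

/* Hypothetical two-limb addition: p[0..1] += q[0..1], least significant
   limb first.  Returns the carry out of the most significant limb.  */
static unsigned long
add2 (unsigned long *p, const unsigned long *q)
{
  unsigned long c;
  p[0] = add_with_carry (p[0], q[0], 0, &c);
  p[1] = add_with_carry (p[1], q[1], c, &c);
  return c;
}
```

The least significant limb is the call with a constant zero carry-in,
which is the case the message above describes as not being recognized.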
|
|
In IPA-SRA we use the can_be_local_p () predicate rather than just the plain
local call graph flag in order to figure out whether the node is a
part of an external API that we cannot change. Although there are
cases where this can allow more transformations, it also means we can
analyze functions which have no callers at all, which is pointless.
Moreover, it makes an assert of hint propagation trigger, which checks
that we have looked at callers before processing hints that come from
them. This has been reported as PR 110276.
This patch simply adds a check that a node has at least one caller
into the early checks and makes the node a non-candidate for any
transformation if it does not.
gcc/ChangeLog:
2023-06-16 Martin Jambor <mjambor@suse.cz>
PR ipa/110276
* ipa-sra.cc (struct caller_issues): New field there_is_one.
(check_for_caller_issues): Set it.
(check_all_callers_for_issues): Check it.
gcc/testsuite/ChangeLog:
2023-06-16 Martin Jambor <mjambor@suse.cz>
PR ipa/110276
* gcc.dg/ipa/pr110276.c: New test.
|
|
Add support for the following builtins:
__vector unsigned long long int scalar_extract_exp_to_vec (__ieee128);
__vector unsigned __int128 scalar_extract_sig_to_vec (__ieee128);
__ieee128 scalar_insert_exp (__vector unsigned __int128,
__vector unsigned long long);
The instructions used in the builtins operate on vector registers. Thus
the result must be moved to a scalar type. There is no clean, performant
way to do this. The user code typically needs the result as a vector
anyway.
gcc/
* config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin):
Rename CODE_FOR_xsxsigqp_tf to CODE_FOR_xsxsigqp_tf_ti.
Rename CODE_FOR_xsxsigqp_kf to CODE_FOR_xsxsigqp_kf_ti.
Rename CODE_FOR_xsxexpqp_tf to CODE_FOR_xsxexpqp_tf_di.
Rename CODE_FOR_xsxexpqp_kf to CODE_FOR_xsxexpqp_kf_di.
(CODE_FOR_xsxexpqp_kf_v2di, CODE_FOR_xsxsigqp_kf_v1ti,
CODE_FOR_xsiexpqp_kf_v2di): Add case statements.
* config/rs6000/rs6000-builtins.def
(__builtin_vsx_scalar_extract_exp_to_vec,
__builtin_vsx_scalar_extract_sig_to_vec,
__builtin_vsx_scalar_insert_exp_vqp): Add new builtin definitions.
Rename xsxexpqp_kf, xsxsigqp_kf, xsiexpqp_kf to xsxexpqp_kf_di,
xsxsigqp_kf_ti, xsiexpqp_kf_di respectively.
* config/rs6000/rs6000-c.cc (altivec_resolve_overloaded_builtin):
Update case RS6000_OVLD_VEC_VSIE to handle MODE_VECTOR_INT for new
overloaded instance. Update comments.
* config/rs6000/rs6000-overload.def
(__builtin_vec_scalar_insert_exp): Add new overload definition with
vector arguments.
(scalar_extract_exp_to_vec, scalar_extract_sig_to_vec): New
overloaded definitions.
* config/rs6000/vsx.md (V2DI_DI): New mode iterator.
(DI_to_TI): New mode attribute.
Rename xsxexpqp_<mode> to xsxexpqp_<IEEE128:mode>_<V2DI_DI:mode>.
Rename xsxsigqp_<mode> to xsxsigqp_<IEEE128:mode>_<VEC_TI:mode>.
Rename xsiexpqp_<mode> to xsiexpqp_<IEEE128:mode>_<V2DI_DI:mode>.
* doc/extend.texi (scalar_extract_exp_to_vec,
scalar_extract_sig_to_vec): Add documentation for new builtins.
(scalar_insert_exp): Add new overloaded builtin definition.
gcc/testsuite/
* gcc.target/powerpc/bfp/scalar-extract-exp-8.c: New test case.
* gcc.target/powerpc/bfp/scalar-extract-sig-8.c: New test case.
* gcc.target/powerpc/bfp/scalar-insert-exp-16.c: New test case.
|
|
This fixes more cases of missing -mabi=lp64d.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls-vlmax/full-vec-move1.c: Add
-mabi=lp64d.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-4.c: Ditto.
|