|
This patch adds the strongly-connected copy propagation (SCCOPY) pass.
It is a lightweight GIMPLE copy propagation pass that also removes some
redundant PHI statements. It handles degenerate PHIs, e.g.:
_5 = PHI <_1>;
_6 = PHI <_6, _6, _1, _1>;
_7 = PHI <16, _7>;
// Replaces occurrences of _5 and _6 by _1 and _7 by 16
It also handles more complicated situations, e.g.:
_8 = PHI <_9, _10>;
_9 = PHI <_8, _10>;
_10 = PHI <_8, _9, _1>;
// Replaces occurrences of _8, _9 and _10 by _1
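The second situation can be sketched as follows. This is a simplified model, not the code in gimple-ssa-sccopy.cc: SSA names are plain ints, the `Phi` struct and both helper functions are hypothetical, and the set of PHIs forming the strongly connected component is taken as given rather than computed. The idea shown is that a group of PHIs can be replaced by a single value when every argument coming from outside the group is that same value.

```cpp
// Simplified model (an assumption, not GCC's data structures): SSA names are
// non-negative ints, and a PHI has a result name plus up to four argument
// names, with -1 marking an unused slot.
struct Phi { int result; int args[4]; };

// True if NAME is the result of one of the N PHIs in the candidate set.
constexpr bool in_set(const Phi *phis, int n, int name) {
  for (int i = 0; i < n; ++i)
    if (phis[i].result == name) return true;
  return false;
}

// If every argument that comes from outside the set is the same name,
// return that name; otherwise return -1.
constexpr int common_outside_value(const Phi *phis, int n) {
  int found = -1;
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < 4; ++j) {
      int a = phis[i].args[j];
      if (a < 0 || in_set(phis, n, a)) continue;  // unused slot or internal edge
      if (found >= 0 && found != a) return -1;    // two different outside values
      found = a;
    }
  return found;
}

// _8 = PHI <_9, _10>;  _9 = PHI <_8, _10>;  _10 = PHI <_8, _9, _1>;
constexpr Phi cycle[] = {
  {8, {9, 10, -1, -1}},
  {9, {8, 10, -1, -1}},
  {10, {8, 9, 1, -1}},
};
// _6 = PHI <_6, _6, _1, _1>;
constexpr Phi degenerate[] = {{6, {6, 6, 1, 1}}};
// A PHI with two distinct outside values, which must be kept.
constexpr Phi real_merge[] = {{5, {1, 2, -1, -1}}};

static_assert(common_outside_value(cycle, 3) == 1, "_8, _9, _10 collapse to _1");
```

In this model the degenerate single-PHI case is just a component of size one, which is why one check covers both examples.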
gcc/ChangeLog:
* Makefile.in: Added sccopy pass.
* passes.def: Added sccopy pass before LTO streaming and before
RTL expansion.
* tree-pass.h (make_pass_sccopy): Added sccopy pass.
* gimple-ssa-sccopy.cc: New file.
gcc/testsuite/ChangeLog:
* gcc.dg/sccopy-1.c: New test.
Signed-off-by: Filip Kastl <fkastl@suse.cz>
|
|
This patch half-reverts 3aaf704bca3e and replaces it with a fix with
relaxed requirements for invoking build_reconstructed_reference in
build_ref_for_model.
build_ref_for_model/build_ref_for_offset is used in two slightly
different contexts. The first is when we are looking at an assignment
like
p->field_A.field_B = s.field_B;
and we have replacements for e.g. s.field_B.field_C.field_D and we
want to store them directly to p->field_A.field_B.field_C.field_D (as
opposed to going through s or using a MEM_REF based in
p->field_A.field_B). In this case, the offset of the
"model" (s.field_B.field_C.field_D) within this can be different than
offset within the LHS that we want to reach (field_C.field_D within
the "base" p->field_A.field_B). Patch 3aaf704bca3e has caused us to
unnecessarily create MEM_REFs for these situations. These uses of
build_ref_for_model work with the relaxed condition just fine.
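To make the offset difference in the first context concrete, here is a small sketch. The struct layouts are hypothetical (the `pad`, `x` and `y` members exist only so that the offsets differ); only the field names field_B, field_C and field_D come from the description above.

```cpp
#include <cstddef>

// Hypothetical layouts, chosen only for illustration.
struct C { int pad; int field_D; };
struct B { int pad; C field_C; };
struct S { int x; int y; B field_B; };  // the type of "s"

// Offset of the model s.field_B.field_C.field_D from the start of s.
constexpr std::size_t model_offset =
    offsetof(S, field_B) + offsetof(B, field_C) + offsetof(C, field_D);

// Offset of field_C.field_D from the start of the base, i.e. from a B
// object such as p->field_A.field_B.
constexpr std::size_t offset_in_base =
    offsetof(B, field_C) + offsetof(C, field_D);

// The two offsets differ by exactly the position of field_B inside s.
static_assert(model_offset != offset_in_base, "offsets differ");
static_assert(model_offset - offsetof(S, field_B) == offset_in_base,
              "difference is the offset of field_B in s");
```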
The second, problematic, context is when somewhere in the function we
have an assignment
s.field_A = t.field_A.field_B;
and we are creating an access structure to represent s.field_A.field_B
even if it is not actually accessed in the original input. This is
done after scanning the entire function body and we need to construct
a "universal" reference to s.field_A.field_B. In this case the "base"
is "s" and it has to be the DECL itself and not some reference for it
because for arbitrary references we need a GSI pointing to a statement
which we don't have, the reference is supposed to be universal.
But then using build_ref_for_model and within it
build_reconstructed_reference misbehaves if the expression contains
any ARRAY_REFs. In the first case those are fine because as we
eventually reach the aggregate type that matches a real LHS or RHS, we
know we can just bolt the rest of the references onto it and end up
with the correct overall reference. However when dealing with
s.array[1].field_A = s.array[2].field_B;
we cannot just bolt on the array[2] reference when we want array[1], but
that is exactly what happens when we use build_reconstructed_reference
and keep it walking all the way to s.
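The array problem can be illustrated with plain offset arithmetic. This is only a sketch under assumed layouts (`Elem`, `Holder` and `elem_offset` are hypothetical names): reusing the array[2] prefix of the right-hand side and attaching field_A to it lands in a different element than the array[1].field_A that was wanted.

```cpp
#include <cstddef>

// Hypothetical types standing in for the aggregate "s" and its elements.
struct Elem { int field_A; int field_B; };
struct Holder { Elem array[4]; };

// Byte offset of array[index].<member> from the start of a Holder.
constexpr std::size_t elem_offset(std::size_t index, std::size_t member_offset) {
  return offsetof(Holder, array) + index * sizeof(Elem) + member_offset;
}

// The reference we want: s.array[1].field_A.
constexpr std::size_t wanted = elem_offset(1, offsetof(Elem, field_A));
// What we get if the array[2] prefix from the RHS is reused as is.
constexpr std::size_t bolted = elem_offset(2, offsetof(Elem, field_A));

static_assert(wanted != bolted, "reusing the RHS index reaches the wrong element");
```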
I was considering making all users of the second kind use
build_ref_for_offset directly instead of build_ref_for_model, but the
latter also handles COMPONENT_REFs to bit-fields, which the former does
not. Therefore I have decided to use the NULL-ness of GSI as an indicator
of how strict we need to be. I have changed the function comment to
reflect that.
I have been able to observe disambiguation improvements with this patch
over current master; we do successfully manage a few more
aliasing_component_refs_p disambiguations when compiling cc1, going
from:
Alias oracle query stats:
refs_may_alias_p: 94354287 disambiguations, 106279231 queries
ref_maybe_used_by_call_p: 1572511 disambiguations, 95618222 queries
call_may_clobber_ref_p: 649273 disambiguations, 659371 queries
stmt_kills_ref_p: 142342 kills, 8407309 queries
nonoverlapping_component_refs_p: 19 disambiguations, 10227 queries
nonoverlapping_refs_since_match_p: 15665 disambiguations, 52585 must overlaps, 68893 queries
aliasing_component_refs_p: 67090 disambiguations, 3081766 queries
TBAA oracle: 22675296 disambiguations 61781978 queries
14045969 are in alias set 0
10997085 queries asked about the same object
153 queries asked about the same alias set
0 access volatile
12485774 are dependent in the DAG
1577701 are aritificially in conflict with void *
Modref stats:
modref kill: 832 kills, 19399 queries
modref use: 50760 disambiguations, 1825109 queries
modref clobber: 1371014 disambiguations, 40152535 queries
5190238 tbaa queries (0.129263 per modref query)
1341663 base compares (0.033414 per modref query)
PTA query stats:
pt_solution_includes: 36784427 disambiguations, 46141175 queries
pt_solutions_intersect: 4519387 disambiguations, 17081996 queries
to:
Alias oracle query stats:
refs_may_alias_p: 94354083 disambiguations, 106278948 queries
ref_maybe_used_by_call_p: 1572511 disambiguations, 95618018 queries
call_may_clobber_ref_p: 649273 disambiguations, 659371 queries
stmt_kills_ref_p: 142342 kills, 8407310 queries
nonoverlapping_component_refs_p: 19 disambiguations, 10227 queries
nonoverlapping_refs_since_match_p: 15665 disambiguations, 52585 must overlaps, 68893 queries
aliasing_component_refs_p: 67104 disambiguations, 3081781 queries
TBAA oracle: 22676608 disambiguations 61782455 queries
14044948 are in alias set 0
10998619 queries asked about the same object
153 queries asked about the same alias set
0 access volatile
12484882 are dependent in the DAG
1577245 are aritificially in conflict with void *
Modref stats:
modref kill: 832 kills, 19399 queries
modref use: 50760 disambiguations, 1825106 queries
modref clobber: 1371028 disambiguations, 40152504 queries
5190319 tbaa queries (0.129265 per modref query)
1341403 base compares (0.033408 per modref query)
PTA query stats:
pt_solution_includes: 36784449 disambiguations, 46141210 queries
pt_solutions_intersect: 4519320 disambiguations, 17082083 queries
gcc/ChangeLog:
2023-12-13 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/111807
* tree-sra.cc (build_ref_for_model): Allow offset smaller than
model->offset when gsi is non-NULL. Adjust function comment.
|
|
TARGET_AVX2 is not available.
vpbroadcastd/vpbroadcastq is available under TARGET_AVX2, but the
vec_dup{v4di,v8si} pattern is available under AVX with a memory operand.
This causes LRA/Reload to generate a spill and reload if we put the
constant in a register.
gcc/ChangeLog:
PR target/112992
* config/i386/i386-expand.cc
(ix86_convert_const_wide_int_to_broadcast): Don't convert to
broadcast for vec_dup{v4di,v8si} when TARGET_AVX2 is not
available.
(ix86_broadcast_from_constant): Allow broadcast for V4DI/V8SI
when !TARGET_AVX2 since it will be forced to memory later.
(ix86_expand_vector_move): Force constant to mem for
vec_dup{v8si,v4di} when TARGET_AVX2 is not available.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr100865-7a.c: Adjust testcase.
* gcc.target/i386/pr100865-7c.c: Ditto.
* gcc.target/i386/pr112992.c: New test.
|
|
After the recent RVV cost model tweak, I found that this PR has been fixed.
Add a testcase; committed.
PR target/112387
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/pr112387.c: New test.
|
|
PR tree-optimization/110640
* gcc.dg/torture/pr110640.c: New testcase.
|
|
struct bar { int num_vectors; double *vectors; };
is 16 bytes only on 64-bit targets, on 32-bit ones it is just 8 bytes,
so the explicit matching of the * 16 multiplication only works on the
former.
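The size difference is easy to check directly. The sketch below assumes a common ABI where int is 4 bytes and a pointer is aligned to its own size; that is an assumption about the target, not a language guarantee, and `expected_bar_size` is a name introduced here for illustration.

```cpp
struct bar { int num_vectors; double *vectors; };

// With 8-byte pointers: 4 (int) + 4 (padding) + 8 (pointer) = 16 bytes.
// With 4-byte pointers: 4 (int) + 4 (pointer) = 8 bytes.
constexpr auto expected_bar_size = sizeof(void *) == 8 ? 16u : 8u;

static_assert(sizeof(bar) == expected_bar_size,
              "sizeof (struct bar) follows the pointer size");
```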
2023-12-14 Jakub Jelinek <jakub@redhat.com>
* c-c++-common/gomp/target-enter-data-1.c: Match also sizeof bar on
32-bit targets - 8 bytes - rather than just 16 bytes.
|
|
On Fri, Dec 08, 2023 at 03:12:00PM +0800, liuhongt wrote:
> * g++.target/i386/pr112904.C: New test.
The new test FAILs on i686-linux and even on x86_64-linux I think
it doesn't actually test what was reported, unless one performs testing
with -march= for some XOP enabled CPU or -mxop.
The following patch fixes that, tested on x86_64-linux with
make check-g++ RUNTESTFLAGS='--target_board=unix\{-m32,-m32/-mno-sse/-mno-mmx,-m64\} i386.exp=pr112904.C'
2023-12-14 Jakub Jelinek <jakub@redhat.com>
PR target/112904
* g++.target/i386/pr112904.C: Add dg-do compile, dg-options -mxop
and for ia32 also dg-additional-options -mmmx.
|
|
With valgrind checking, there are various errors reported on some C++26
libstdc++ tests, like:
==2009913== Conditional jump or move depends on uninitialised value(s)
==2009913== at 0x914C59: gt_ggc_mx_lang_tree_node(void*) (gt-cp-tree.h:107)
==2009913== by 0x8AB7A5: gt_ggc_mx_tinst_level(void*) (gt-cp-pt.h:32)
==2009913== by 0xB89B25: ggc_mark_root_tab(ggc_root_tab const*) (ggc-common.cc:75)
==2009913== by 0xB89DF4: ggc_mark_roots() (ggc-common.cc:104)
==2009913== by 0x9D6311: ggc_collect(ggc_collect) (ggc-page.cc:2227)
==2009913== by 0xDB70F6: execute_one_pass(opt_pass*) (passes.cc:2738)
==2009913== by 0xDB721F: execute_pass_list_1(opt_pass*) (passes.cc:2755)
==2009913== by 0xDB7258: execute_pass_list(function*, opt_pass*) (passes.cc:2766)
==2009913== by 0xA55525: cgraph_node::analyze() (cgraphunit.cc:695)
==2009913== by 0xA57CC7: analyze_functions(bool) (cgraphunit.cc:1248)
==2009913== by 0xA5890D: symbol_table::finalize_compilation_unit() (cgraphunit.cc:2555)
==2009913== by 0xEB02A1: compile_file() (toplev.cc:473)
I think the problem is in the tinst_level::to_list optimization from 2018.
That function returns a TREE_LIST with TREE_PURPOSE/TREE_VALUE filled in.
Either it freshly allocates using build_tree_list (NULL, NULL); + stores
TREE_PURPOSE/TREE_VALUE, that case is fine (the whole tree_list object
is zeros, except for TREE_CODE set to TREE_LIST and TREE_PURPOSE/TREE_VALUE
modified later; the above also means in particular TREE_TYPE of it is NULL
and TREE_CHAIN is NULL and both are accessible/initialized even in valgrind
annotations).
Or it grabs a TREE_LIST node from a freelist.
If defined(ENABLE_GC_CHECKING), the object is still all zeros except
for TREE_CODE/TREE_PURPOSE/TREE_VALUE like in the fresh allocation case
(but unlike the build_tree_list case in the valgrind annotations
TREE_TYPE and TREE_CHAIN are marked as uninitialized).
If !defined(ENABLE_GC_CHECKING), I believe the actual memory content
is that everything but TREE_CODE/TREE_PURPOSE/TREE_VALUE/TREE_CHAIN is
zeros and TREE_CHAIN is something random (whatever next entry is in the
freelist, nothing overwrote it) and from valgrind POV again,
TREE_TYPE and TREE_CHAIN are marked as uninitialized.
When using the other freelist instantiations (pending_template and
tinst_level) I believe everything is correct (from valgrind POV it marks
the whole pending_template or tinst_level as uninitialized, but the
caller initializes it all).
One way to fix this would be let tinst_level::to_list not store just
TREE_PURPOSE (ret) = tldcl;
TREE_VALUE (ret) = targs;
but also
TREE_TYPE (ret) = NULL_TREE;
TREE_CHAIN (ret) = NULL_TREE;
Though, that seems like wasted effort in the build_tree_list case to me.
So, the following patch instead does that TREE_CHAIN = NULL_TREE store only
in the case where it isn't already done (and likewise for TREE_TYPE just to
be sure) and marks both TREE_CHAIN and TREE_TYPE as initialized (the latter
is at that spot, the former is because we never really touch TREE_TYPE of a
TREE_LIST anywhere and so the NULL gets stored into the freelist and
restored from there (except for ENABLE_GC_CHECKING where it is poisoned
and then cleared again)).
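The stale-link issue can be modelled in a few lines. This is a toy freelist, not GCC's freelist<tree_node>: `TreeListNode`, `FreeListSketch` and `reused_node_is_clean` are hypothetical names, and the valgrind annotations are left out entirely. It only shows why a node taken from a freelist that links entries through the chain field needs that field cleared before reuse.

```cpp
// Toy stand-in for a TREE_LIST node: two payload fields plus type and chain.
struct TreeListNode {
  int purpose;
  int value;
  TreeListNode *type;
  TreeListNode *chain;
};

// Toy freelist that threads free nodes through their chain field.
struct FreeListSketch {
  TreeListNode *head = nullptr;

  constexpr void release(TreeListNode *n) {
    n->chain = head;  // chain now holds freelist bookkeeping, not user data
    head = n;
  }

  constexpr TreeListNode *take() {
    TreeListNode *n = head;
    head = n->chain;
    // Without these two stores the caller would see the stale freelist link.
    n->type = nullptr;
    n->chain = nullptr;
    return n;
  }
};

constexpr bool reused_node_is_clean() {
  TreeListNode a{1, 2, nullptr, nullptr};
  TreeListNode b{3, 4, nullptr, nullptr};
  FreeListSketch fl{};
  fl.release(&a);
  fl.release(&b);                    // b.chain now points at a
  bool stale_before = (b.chain == &a);
  TreeListNode *n = fl.take();       // hands back b with the link cleared
  return stale_before && n == &b && n->chain == nullptr &&
         n->type == nullptr && fl.head == &a;
}

static_assert(reused_node_is_clean(), "take() must clear the freelist link");
```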
2023-12-14 Jakub Jelinek <jakub@redhat.com>
PR c++/112968
* pt.cc (freelist<tree_node>::reinit): Make whole obj->common
defined for valgrind annotations rather than just obj->base,
and do it even for ENABLE_GC_CHECKING. If not ENABLE_GC_CHECKING,
clear TREE_CHAIN (obj) and TREE_TYPE (obj).
|
|
This patch fixes PR111153:
ble a1,zero,.L8
addiw a5,a1,-1
li a4,4
addi sp,sp,-16
mv a2,a0
sext.w a3,a1
bleu a5,a4,.L9
srliw a4,a3,2
slli a4,a4,4
mv a5,a0
add a4,a4,a0
vsetivli zero,4,e32,m1,ta,ma
vmv.v.i v1,0
vse32.v v1,0(sp)
.L4:
vle32.v v1,0(a5) ---> This loop always processes 4 elements, which is OK for VLEN = 128 bits but wastes a huge amount of computation units when VLEN > 128 bits
vle32.v v2,0(sp)
addi a5,a5,16
vadd.vv v1,v2,v1
vse32.v v1,0(sp)
bne a4,a5,.L4
ld a5,0(sp)
lw a4,0(sp)
andi a1,a1,-4
srai a5,a5,32
addw a5,a4,a5
lw a4,8(sp)
addw a5,a5,a4
ld a4,8(sp)
srai a4,a4,32
addw a0,a5,a4
beq a3,a1,.L15
.L3:
subw a3,a3,a1
slli a5,a1,32
slli a3,a3,32
srli a3,a3,32
srli a5,a5,30
add a2,a2,a5
vsetvli a5,a3,e8,mf4,tu,mu
vsetvli a4,zero,e32,m1,ta,ma
sub a1,a3,a5
vmv.v.i v1,0
vsetvli zero,a3,e32,m1,tu,ma
vle32.v v2,0(a2)
vmv.v.v v1,v2
bne a3,a5,.L21
.L7:
vsetvli a4,zero,e32,m1,ta,ma
vmv.s.x v2,zero
vredsum.vs v1,v1,v2
vmv.x.s a5,v1
addw a0,a0,a5
.L15:
addi sp,sp,16
jr ra
.L21:
slli a5,a5,2
add a2,a2,a5
vsetvli zero,a1,e32,m1,tu,ma
vle32.v v2,0(a2)
vadd.vv v1,v1,v2
j .L7
.L8:
li a0,0
ret
.L9:
li a1,0
li a0,0
j .L3
The root cause is that we were missing an RVV builtin vectorization cost model.
After this patch:
ble a1,zero,.L4
vsetvli a5,zero,e32,m1,ta,ma
vmv.v.i v1,0
.L3:
vsetvli a5,a1,e32,m1,tu,ma
vle32.v v2,0(a0)
slli a4,a5,2
sub a1,a1,a5
add a0,a0,a4
vadd.vv v1,v2,v1
bne a1,zero,.L3
li a5,0
vsetivli zero,1,e32,m1,ta,ma
vmv.s.x v2,a5
vsetvli a5,zero,e32,m1,ta,ma
vredsum.vs v1,v1,v2
vmv.x.s a0,v1
ret
.L4:
li a0,0
ret
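For orientation, both listings look like the output for a simple integer sum reduction. The function below is an assumed shape only; the actual testcase for the PR is not reproduced here, and `sum` and `sample` are names made up for this sketch.

```cpp
// Assumed shape of the source loop (not the PR's testcase): a0 would hold
// the pointer and a1 the element count in the listings above.
constexpr int sum(const int *a, int n) {
  int s = 0;
  for (int i = 0; i < n; ++i)
    s += a[i];  // vectorized as vadd.vv, then folded with vredsum.vs
  return s;
}

constexpr int sample[] = {1, 2, 3, 4, 5};

static_assert(sum(sample, 5) == 15, "1 + 2 + 3 + 4 + 5");
// n <= 0 takes the early exit that just loads 0 into a0.
static_assert(sum(sample, 0) == 0, "empty range sums to zero");
```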
PR target/111153
gcc/ChangeLog:
* config/riscv/riscv-protos.h (struct common_vector_cost): New struct.
(struct scalable_vector_cost): Ditto.
(struct cpu_vector_cost): Ditto.
* config/riscv/riscv-vector-costs.cc (costs::add_stmt_cost): Add RVV
builtin vectorization cost
* config/riscv/riscv.cc (struct riscv_tune_param): Ditto.
(get_common_costs): New function.
(riscv_builtin_vectorization_cost): Ditto.
(TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST): New target hook.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/pr111153.c: New test.
|
|
The alpha port failed its weekly test due to a lack of a prototype for the
syscall() routine. Fixed thusly and pushed to the trunk.
gcc/testsuite
* gcc.c-torture/execute/20001229-1.c: Prototype syscall().
|
|
Since r14-6505 I see:
FAIL: g++.dg/cpp0x/constexpr-ex1.C -std=c++23 at line 91 (test for errors, line 89)
FAIL: g++.dg/cpp0x/constexpr-ex1.C -std=c++23 (test for excess errors)
FAIL: g++.dg/cpp0x/constexpr-ex1.C -std=c++26 at line 91 (test for errors, line 89)
FAIL: g++.dg/cpp0x/constexpr-ex1.C -std=c++26 (test for excess errors)
and it wasn't fixed by r14-6511. So I'm fixing it with the below.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/constexpr-ex1.C: Adjust expected diagnostic line.
|
|
ACLE has added intrinsics to bridge between SVE and Neon.
The NEON_SVE Bridge adds intrinsics that allow conversions between NEON and
SVE vectors.
This patch adds support to GCC for the following 3 intrinsics:
svset_neonq, svget_neonq and svdup_neonq
gcc/ChangeLog:
* config.gcc: Adds new header to config.
* config/aarch64/aarch64-builtins.cc (enum aarch64_type_qualifiers):
Moved to header file.
(ENTRY): Likewise.
(enum aarch64_simd_type): Likewise.
(struct aarch64_simd_type_info): Remove static.
(GTY): Likewise.
* config/aarch64/aarch64-c.cc (aarch64_pragma_aarch64):
Defines pragma for arm_neon_sve_bridge.h.
* config/aarch64/aarch64-protos.h:
Add handle_arm_neon_sve_bridge_h
* config/aarch64/aarch64-sve-builtins-base.h: New intrinsics.
* config/aarch64/aarch64-sve-builtins-base.cc
(class svget_neonq_impl): New intrinsic implementation.
(class svset_neonq_impl): Likewise.
(class svdup_neonq_impl): Likewise.
(NEON_SVE_BRIDGE_FUNCTION): New intrinsics.
* config/aarch64/aarch64-sve-builtins-functions.h
(NEON_SVE_BRIDGE_FUNCTION): Defines macro for NEON_SVE_BRIDGE
functions.
* config/aarch64/aarch64-sve-builtins-shapes.h: New shapes.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(parse_element_type): Add NEON element types.
(parse_type): Likewise.
(struct get_neonq_def): Defines function shape for get_neonq.
(struct set_neonq_def): Defines function shape for set_neonq.
(struct dup_neonq_def): Defines function shape for dup_neonq.
* config/aarch64/aarch64-sve-builtins.cc
(DEF_SVE_TYPE_SUFFIX): Changed to be called through
SVE_NEON macro.
(DEF_SVE_NEON_TYPE_SUFFIX): Defines
macro for NEON_SVE_BRIDGE type suffixes.
(DEF_NEON_SVE_FUNCTION): Defines
macro for NEON_SVE_BRIDGE functions.
(function_resolver::infer_neon128_vector_type): Infers type suffix
for overloaded functions.
(handle_arm_neon_sve_bridge_h): Handles #pragma arm_neon_sve_bridge.h.
* config/aarch64/aarch64-sve-builtins.def
(DEF_SVE_NEON_TYPE_SUFFIX): Macro for handling neon_sve type suffixes.
(bf16): Replace entry with neon-sve entry.
(f16): Likewise.
(f32): Likewise.
(f64): Likewise.
(s8): Likewise.
(s16): Likewise.
(s32): Likewise.
(s64): Likewise.
(u8): Likewise.
(u16): Likewise.
(u32): Likewise.
(u64): Likewise.
* config/aarch64/aarch64-sve-builtins.h
(GCC_AARCH64_SVE_BUILTINS_H): Include aarch64-builtins.h.
(ENTRY): Add aarch64_simd_type definition.
(enum aarch64_simd_type): Add neon information to type_suffix_info.
(struct type_suffix_info): New function.
* config/aarch64/aarch64-sve.md
(@aarch64_sve_get_neonq_<mode>): New intrinsic insn for big endian.
(@aarch64_sve_set_neonq_<mode>): Likewise.
* config/aarch64/iterators.md: Add UNSPEC_SET_NEONQ.
* config/aarch64/aarch64-builtins.h: New file.
* config/aarch64/aarch64-neon-sve-bridge-builtins.def: New file.
* config/aarch64/arm_neon_sve_bridge.h: New file.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Add include
arm_neon_sve_bridge header file
* gcc.dg/torture/neon-sve-bridge.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_bf16.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_f16.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_f32.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_f64.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_s16.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_s32.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_s64.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_s8.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_u16.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_u32.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_u64.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_u8.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_bf16.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_f16.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_f32.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_f64.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_s16.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_s32.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_s64.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_s8.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_u16.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_u32.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_u64.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_u8.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_bf16.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_f16.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_f32.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_f64.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_s16.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_s32.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_s64.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_s8.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_u16.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_u32.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_u64.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_u8.c: New test.
* gcc.target/aarch64/sve/acle/general-c/dup_neonq_1.c: New test.
* gcc.target/aarch64/sve/acle/general-c/get_neonq_1.c: New test.
* gcc.target/aarch64/sve/acle/general-c/set_neonq_1.c: New test.
|
|
With the previous two patches in place, we can now extend our
deletedness diagnostic to note the other considered candidates, e.g.:
deleted.C: In function 'int main()':
deleted.C:10:4: error: use of deleted function 'void f(int)'
10 | f(0);
| ~^~~
deleted.C:5:6: note: declared here
5 | void f(int) = delete;
| ^
deleted.C:5:6: note: candidate: 'void f(int)' (deleted)
deleted.C:6:6: note: candidate: 'void f(...)'
6 | void f(...);
| ^
deleted.C:7:6: note: candidate: 'void f(int, int)'
7 | void f(int, int);
| ^
deleted.C:7:6: note: candidate expects 2 arguments, 1 provided
These notes are controlled by a new command line flag
-fdiagnostics-all-candidates which also controls whether we note
ignored candidates more generally.
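The overload set behind that diagnostic can be reconstructed from the quoted source lines. Since calling f(0) is an error by design, the sketch below does not call it; instead it uses a hypothetical detection trait, `can_call_f`, to show which calls overload resolution accepts. A call that selects the deleted overload counts as a substitution failure.

```cpp
#include <type_traits>
#include <utility>

void f(int) = delete;  // selected for f(0), which makes that call ill-formed
void f(...);           // viable for any single argument, but a worse match
void f(int, int);      // not viable for one argument

// Hypothetical helper: true if f(arg) is well-formed for an Arg rvalue.
template <class Arg, class = void>
struct can_call_f : std::false_type {};

template <class Arg>
struct can_call_f<Arg, decltype(f(std::declval<Arg>()))> : std::true_type {};

// f(int) wins for an int argument and is deleted, so the call is rejected.
static_assert(!can_call_f<int>::value, "f(0) selects the deleted overload");
// A pointer does not convert to int, so only f(...) is viable.
static_assert(can_call_f<const char *>::value, "f(\"x\") selects f(...)");
```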
gcc/ChangeLog:
* doc/invoke.texi (C++ Dialect Options): Document
-fdiagnostics-all-candidates.
gcc/c-family/ChangeLog:
* c.opt: Add -fdiagnostics-all-candidates.
gcc/cp/ChangeLog:
* call.cc (print_z_candidates): Only print ignored candidates
when -fdiagnostics-all-candidates is set, otherwise suggest
the flag.
(build_over_call): When diagnosing deletedness, note
other candidates only if -fdiagnostics-all-candidates is
set, otherwise suggest the flag.
gcc/testsuite/ChangeLog:
* g++.dg/overload/error6.C: Pass -fdiagnostics-all-candidates.
* g++.dg/cpp0x/deleted16.C: New test.
* g++.dg/cpp0x/deleted16a.C: New test.
* g++.dg/overload/error6a.C: New test.
|
|
During overload resolution, we sometimes outright ignore a function in
the overload set and leave no trace of it in the candidates list, for
example when we find a perfect non-template candidate we discard all
function templates, or when the callee is a template-id we discard all
non-template functions. We should still however make note of these
non-viable functions when diagnosing overload resolution failure, but
that's not possible if they're not present in the returned candidates
list.
To that end, this patch reworks add_candidates to add such ignored
functions to the list. The new rr_ignored rejection reason is somewhat
of a catch-all; we could perhaps split it up into more specific rejection
reasons, but I leave that as future work.
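The two situations named above can be seen in a small overload set. The function name `g` is made up for this sketch; the behaviour shown is standard overload resolution, not anything specific to the patch.

```cpp
constexpr int g(int) { return 1; }                    // non-template
template <class T> constexpr int g(T) { return 2; }   // function template

// A perfect non-template match: the template is not needed.
static_assert(g(0) == 1, "exact non-template match is preferred");
// The callee is a template-id: non-template functions are not considered.
static_assert(g<>(0) == 2, "template-id call only sees templates");
// No perfect non-template match: the template deduces T = long exactly.
static_assert(g(0L) == 2, "template wins over a converting non-template");
```

With this patch, the overloads that are skipped in the first two calls still show up in the candidate list, marked as ignored, so diagnostics can mention them.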
gcc/cp/ChangeLog:
* call.cc (enum rejection_reason_code): Add rr_ignored.
(add_ignored_candidate): Define.
(ignored_candidate_p): Define.
(add_template_candidate_real): Do add_ignored_candidate
instead of returning NULL.
(splice_viable): Put ignored (non-viable) candidates last.
(print_z_candidate): Handle ignored candidates.
(build_new_function_call): Refine shortcut that calls
cp_build_function_call_vec now that non-templates can
appear in the candidate list for a template-id call.
(add_candidates): Replace 'bad_fns' overload with 'bad_cands'
candidate list. When not considering a candidate, add it
to the list as an ignored candidate. Add all 'bad_cands'
to the overload set as well.
gcc/testsuite/ChangeLog:
* g++.dg/diagnostic/param-type-mismatch-2.C: Rename template
function test_7 that (maybe accidentally) shares the same name
as its non-template callee.
* g++.dg/overload/error6.C: New test.
|
|
This patch:
* changes splice_viable to move the non-viable candidates to the end
of the list instead of removing them outright
* makes tourney move the best candidate to the front of the candidate
list
* adjusts print_z_candidates to preserve our behavior of printing only
viable candidates when diagnosing ambiguity
* adds a parameter to print_z_candidates to control this default behavior
(the follow-up patch will want to print all candidates when diagnosing
deletedness)
Thus after this patch we have access to the entire candidate list through
the best viable candidate.
This change also happens to fix diagnostics for the below testcase where
we currently neglect to note the third candidate, since the presence of
the two unordered non-strictly viable candidates causes splice_viable to
prematurely get rid of the non-viable third candidate.
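The reordering idea can be sketched on a plain array. This is not the linked-list code in call.cc: `Candidate`, `CandidateList` and `viable_first` are hypothetical, and viability is reduced to a single flag, whereas the real code also distinguishes strictly from non-strictly viable candidates. The point is that non-viable entries move to the end, in stable order, rather than being dropped.

```cpp
// Hypothetical, simplified candidate record.
struct Candidate { int id; bool viable; };
struct CandidateList { Candidate c[4]; };

// Stable reordering: viable candidates first, non-viable ones kept at the end.
constexpr CandidateList viable_first(CandidateList in) {
  CandidateList out{};
  int k = 0;
  for (int i = 0; i < 4; ++i)
    if (in.c[i].viable) out.c[k++] = in.c[i];
  for (int i = 0; i < 4; ++i)
    if (!in.c[i].viable) out.c[k++] = in.c[i];
  return out;
}

constexpr CandidateList sample_list{{{1, false}, {2, true}, {3, false}, {4, true}}};
constexpr CandidateList sorted_list = viable_first(sample_list);

static_assert(sorted_list.c[0].id == 2 && sorted_list.c[1].id == 4,
              "viable candidates come first, in their original order");
static_assert(sorted_list.c[2].id == 1 && sorted_list.c[3].id == 3,
              "non-viable candidates are kept, not removed");
```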
gcc/cp/ChangeLog:
* call.cc: Include "tristate.h".
(splice_viable): Sort the candidate list according to viability.
Don't remove non-viable candidates from the list.
(print_z_candidates): Add defaulted only_viable_p parameter.
By default only print non-viable candidates if there is no
viable candidate.
(tourney): Ignore non-viable candidates. Move the true champ to
the front of the candidates list, and update 'candidates' to
point to the front. Rename champ_compared_to_predecessor to
previous_worse_champ.
gcc/testsuite/ChangeLog:
* g++.dg/overload/error5.C: New test.
|
|
When unifying constants we need to treat constants of different types
but same value as different in light of auto template parameters since
otherwise e.g. A<1> will unify with A<1u> (where A's template-head is
template<auto>). This patch fixes this in a minimal way; it seems we
could get away with just using template_args_equal here, as we do in the
default case, or even just cp_tree_equal since the CONVERT_EXPR_P loop
seems to be dead code, but that's a simplification we could consider
during next stage 1.
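A short illustration of why the type has to take part in the comparison, assuming C++17 for `template <auto>`. The `value_type` alias is added here only to make the deduced type visible; it is not part of the testcases.

```cpp
#include <type_traits>

// A's template-head is template<auto>, as in the description above.
template <auto V> struct A { using value_type = decltype(V); };

// Same value, different type: these are distinct specializations.
static_assert(!std::is_same<A<1>, A<1u>>::value, "A<1> and A<1u> differ");
static_assert(std::is_same<A<1>::value_type, int>::value, "1 is an int");
static_assert(std::is_same<A<1u>::value_type, unsigned>::value, "1u is unsigned");
```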
PR c++/99186
PR c++/104867
gcc/cp/ChangeLog:
* pt.cc (unify) <case INTEGER_CST>: Compare types as well.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1z/nontype-auto23.C: New test.
* g++.dg/cpp1z/nontype-auto24.C: New test.
|
|
unify currently always returns success when unifying two FUNCTION_DECLs
(due to the is_overloaded_fn deferment within the default case), which
means for the below testcase we incorrectly unify &A::foo and &A::bar
leading to deduction failure for the index_of calls due to a bogus base
class ambiguity.
This patch makes unify handle FUNCTION_DECL naturally like other decls.
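As a reminder of what must stay distinct, here is a minimal sketch. `Owner` stands in for the class A of the testcase and `member_tag` is a hypothetical template; the actual test (ptrmem34.C) is not reproduced here.

```cpp
#include <type_traits>

struct Owner { void foo(); void bar(); };  // stand-in for A in the testcase

// A template parameterized on a pointer to member function.
template <void (Owner::*Fn)()> struct member_tag {};

// Different member functions give different template arguments, so
// deduction must never treat them as equal.
static_assert(!std::is_same<member_tag<&Owner::foo>,
                            member_tag<&Owner::bar>>::value,
              "&Owner::foo and &Owner::bar are distinct arguments");
static_assert(std::is_same<member_tag<&Owner::foo>,
                           member_tag<&Owner::foo>>::value,
              "the same member function compares equal");
```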
PR c++/93740
gcc/cp/ChangeLog:
* pt.cc (unify) <case FUNCTION_DECL>: Handle it like FIELD_DECL
and TEMPLATE_DECL.
gcc/testsuite/ChangeLog:
* g++.dg/template/ptrmem34.C: New test.
|
|
Following the last patch, let's rename the functions to reflect the change
in behavior.
gcc/c-family/ChangeLog:
* c-warn.cc (check_address_or_pointer_of_packed_member):
Rename to check_address_of_packed_member.
(check_and_warn_address_or_pointer_of_packed_member):
Rename to check_and_warn_address_of_packed_member.
(warn_for_address_or_pointer_of_packed_member):
Rename to warn_for_address_of_packed_member.
* c-common.h: Adjust.
gcc/c/ChangeLog:
* c-typeck.cc (convert_for_assignment): Adjust call to
warn_for_address_of_packed_member.
gcc/cp/ChangeLog:
* call.cc (convert_for_arg_passing)
* typeck.cc (convert_for_assignment): Adjust call to
warn_for_address_of_packed_member.
|
|
-Waddress-of-packed-member, in addition to the documented warning about
actually taking the address of a packed member, also warns about casting
from a pointer to a TYPE_PACKED type to a pointer to a type with greater
alignment.
This wrongly warns if the source is a pointer to enum when -fshort-enums
is on, since that is also represented by TYPE_PACKED.
And there's already -Wcast-align to catch casting from pointer to less
aligned type (packed or otherwise) to pointer to more aligned type; even
apart from the enum problem, this seems like a somewhat arbitrary subset of
that warning.
So, this patch removes the undocumented type-based warning from
-Waddress-of-packed-member. Some of the tests where the warning is
desirable I changed to use -Wcast-align=strict instead. The ones that
require -Wno-incompatible-pointer-types I just removed.
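For context, the documented part of the warning exists because a member of a packed struct can sit at an offset that is not a multiple of its natural alignment. The sketch below uses the GCC/Clang `packed` attribute; `PackedPair` and `PlainPair` are hypothetical types.

```cpp
#include <cstddef>

struct __attribute__((packed)) PackedPair { char tag; int value; };
struct PlainPair { char tag; int value; };

// In the packed struct the int follows the char directly, so an int *
// formed from &p.value may be misaligned.
static_assert(offsetof(PackedPair, value) == 1, "no padding before value");
static_assert(alignof(PackedPair) == 1, "packed struct has byte alignment");
// Without the attribute the int is padded up to its own alignment.
static_assert(offsetof(PlainPair, value) == alignof(int), "value is aligned");
```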
gcc/c-family/ChangeLog:
* c-warn.cc (check_address_or_pointer_of_packed_member):
Remove warning based on TYPE_PACKED.
gcc/testsuite/ChangeLog:
* c-c++-common/Waddress-of-packed-member-1.c: Don't expect
a warning on the cast cases.
* c-c++-common/pr51628-35.c: Use -Wcast-align=strict.
* g++.dg/warn/Waddress-of-packed-member3.C: Likewise.
* gcc.dg/pr88928.c: Likewise.
* gcc.dg/pr51628-20.c: Removed.
* gcc.dg/pr51628-21.c: Removed.
* gcc.dg/pr51628-25.c: Removed.
|
|
This patch changes the mapping node arrangement used for array components
of derived types in order to accommodate changes made in the previous
patch, particularly the use of "GOMP_MAP_ATTACH_DETACH" for pointer-typed
derived-type members instead of "GOMP_MAP_ALWAYS_POINTER".
We change the mapping nodes used for a derived-type mapping like this:
type T
integer, pointer, dimension(:) :: arrptr
end type T
type(T) :: tvar
[...]
!$omp target map(tofrom: tvar%arrptr)
So that the nodes used look like this:
1) map(to: tvar%arrptr) -->
GOMP_MAP_TO [implicit] *tvar%arrptr%data (the array data)
GOMP_MAP_TO_PSET tvar%arrptr (the descriptor)
GOMP_MAP_ATTACH_DETACH tvar%arrptr%data
2) map(tofrom: tvar%arrptr(3:8)) -->
GOMP_MAP_TOFROM *tvar%arrptr%data(3) (size 8-3+1, etc.)
GOMP_MAP_TO_PSET tvar%arrptr
GOMP_MAP_ATTACH_DETACH tvar%arrptr%data (bias 3, etc.)
In this case, we can determine in the front-end that the
whole-array/pointer mapping (1) is only needed to map the pointer
-- so we drop it entirely. (Note also that we set -- early -- the
OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P flag for whole-array-via-pointer
mappings. See below.)
In the middle end, we process mappings using the struct sibling-list
handling machinery by moving the "GOMP_MAP_TO_PSET" node from the middle
of the group of three mapping nodes to the proper sorted position after
the GOMP_MAP_STRUCT mapping:
GOMP_MAP_STRUCT tvar (len: 1)
GOMP_MAP_TO_PSET tvar%arr (size: 64, etc.) <--. moved here
[...] |
GOMP_MAP_TOFROM *tvar%arrptr%data(3) ___|
GOMP_MAP_ATTACH_DETACH tvar%arrptr%data
In another case, if we have an array of derived-type values "dtarr",
and mappings like:
i = 1
j = 1
map(to: dtarr(i)%arrptr) map(tofrom: dtarr(j)%arrptr(3:8))
We still map the same way, but this time we cannot prove that the base
expressions "dtarr(i)" and "dtarr(j)" are the same in the front-end.
So we keep both mappings, but we move the "[implicit]" mapping of the
full-array reference to the end of the clause list in gimplify.cc (by
adjusting the topological sorting algorithm):
GOMP_MAP_STRUCT dtvar (len: 2)
GOMP_MAP_TO_PSET dtvar(i)%arrptr
GOMP_MAP_TO_PSET dtvar(j)%arrptr
[...]
GOMP_MAP_TOFROM *dtvar(j)%arrptr%data(3) (size: 8-3+1)
GOMP_MAP_ATTACH_DETACH dtvar(j)%arrptr%data
GOMP_MAP_TO [implicit] *dtvar(i)%arrptr%data(1) (size: whole array)
GOMP_MAP_ATTACH_DETACH dtvar(i)%arrptr%data
Always moving "[implicit]" full-array mappings after array-section
mappings (without that bit set) means that we'll avoid copying the whole
array unnecessarily -- even in cases where we can't prove that the arrays
are the same.
The patch also fixes some bugs with "enter data" and "exit data"
directives with this new mapping arrangement. Also now if you have
mappings like this:
#pragma omp target enter data map(to: dv, dv%arr(1:20))
The whole of the derived-type variable "dv" is mapped, so the
GOMP_MAP_TO_PSET for the array-section mapping can be dropped:
GOMP_MAP_TO dv
GOMP_MAP_TO *dv%arr%data
GOMP_MAP_TO_PSET dv%arr <-- deleted (array section mapping)
GOMP_MAP_ATTACH_DETACH dv%arr%data
To accommodate recent changes to mapping nodes made by
Tobias, this version of the patch avoids using GOMP_MAP_TO_PSET
for "exit data" directives, in favour of using the "correct"
GOMP_MAP_RELEASE/GOMP_MAP_DELETE kinds during early expansion. A new
flag is introduced so the middle-end knows when the latter two kinds
are being used specifically for an array descriptor.
This version of the patch fixes "omp target exit data" handling
for GOMP_MAP_DELETE, and adds pretty-printing dump output
for the OMP_CLAUSE_RELEASE_DESCRIPTOR flag (for a little extra
clarity).
Also I noticed the handling of descriptors on *OpenACC*
exit-data directives was inconsistent, so I've made those use
GOMP_MAP_RELEASE/GOMP_MAP_DELETE with the new flag in the same way as
OpenMP too. In the end it doesn't actually matter to the runtime,
which handles GOMP_MAP_RELEASE/GOMP_MAP_DELETE/GOMP_MAP_TO_PSET for
array descriptors on OpenACC "exit data" directives the same, anyway,
and doing it this way in the FE avoids needless divergence.
I've added a couple of new tests (gomp/target-enter-exit-data.f90 and
goacc/enter-exit-data-2.f90).
2023-12-07 Julian Brown <julian@codesourcery.com>
gcc/fortran/
* dependency.cc (gfc_omp_expr_prefix_same): New function.
* dependency.h (gfc_omp_expr_prefix_same): Add prototype.
* gfortran.h (gfc_omp_namelist): Add "duplicate_of" field to "u2"
union.
* trans-openmp.cc (dependency.h): Include.
(gfc_trans_omp_array_section): Adjust mapping node arrangement for
array descriptors. Use GOMP_MAP_TO_PSET or
GOMP_MAP_RELEASE/GOMP_MAP_DELETE with the OMP_CLAUSE_RELEASE_DESCRIPTOR
flag set.
(gfc_symbol_rooted_namelist): New function.
(gfc_trans_omp_clauses): Check subcomponent and subarray/element
accesses elsewhere in the clause list for pointers to derived types or
array descriptors, and adjust or drop mapping nodes appropriately.
Adjust for changes to mapping node arrangement.
(gfc_trans_oacc_executable_directive): Pass code op through.
gcc/
* gimplify.cc (omp_map_clause_descriptor_p): New function.
(build_omp_struct_comp_nodes, omp_get_attachment, omp_group_base): Use
above function.
(omp_tsort_mapping_groups): Process nodes that have
OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P set after those that don't. Add
enter_exit_data parameter.
(omp_resolve_clause_dependencies): Remove GOMP_MAP_TO_PSET mappings if
we're mapping the whole containing derived-type variable.
(omp_accumulate_sibling_list): Adjust GOMP_MAP_TO_PSET handling.
Remove GOMP_MAP_ALWAYS_POINTER handling.
(gimplify_scan_omp_clauses): Pass enter_exit argument to
omp_tsort_mapping_groups. Don't adjust/remove GOMP_MAP_TO_PSET
mappings for derived-type components here.
* tree.h (OMP_CLAUSE_RELEASE_DESCRIPTOR): New macro.
* tree-pretty-print.cc (dump_omp_clause): Show
OMP_CLAUSE_RELEASE_DESCRIPTOR in dump output (with
GOMP_MAP_TO_PSET-like syntax).
gcc/testsuite/
* gfortran.dg/goacc/enter-exit-data-2.f90: New test.
* gfortran.dg/goacc/finalize-1.f: Adjust scan output.
* gfortran.dg/gomp/map-9.f90: Adjust scan output.
* gfortran.dg/gomp/map-subarray-2.f90: New test.
* gfortran.dg/gomp/map-subarray.f90: New test.
* gfortran.dg/gomp/target-enter-exit-data.f90: New test.
libgomp/
* testsuite/libgomp.fortran/map-subarray.f90: New test.
* testsuite/libgomp.fortran/map-subarray-2.f90: New test.
* testsuite/libgomp.fortran/map-subarray-3.f90: New test.
* testsuite/libgomp.fortran/map-subarray-4.f90: New test.
* testsuite/libgomp.fortran/map-subarray-6.f90: New test.
* testsuite/libgomp.fortran/map-subarray-7.f90: New test.
* testsuite/libgomp.fortran/map-subarray-8.f90: New test.
* testsuite/libgomp.fortran/map-subcomponents.f90: New test.
* testsuite/libgomp.fortran/struct-elem-map-1.f90: Adjust for
descriptor-mapping changes. Remove XFAIL.
|
|
This patch reworks clause expansion in the C, C++ and (to a lesser
extent) Fortran front ends for OpenMP and OpenACC mapping nodes used in
GPU offloading support.
At present a single clause may be turned into several mapping nodes,
or have its mapping type changed, in several places scattered through
the front- and middle-end. The analysis relating to which particular
transformations are needed for some given expression has become quite hard
to follow. Briefly, we manipulate clause types in the following places:
1. During parsing, in c_omp_adjust_map_clauses. Depending on a set of
rules, we may change a FIRSTPRIVATE_POINTER (etc.) mapping into
ATTACH_DETACH, or mark the decl addressable.
2. In semantics.cc or c-typeck.cc, clauses are expanded in
handle_omp_array_sections (called via {c_}finish_omp_clauses), or in
finish_omp_clauses itself. The two cases are for processing array
sections (the former), or non-array sections (the latter).
3. In gimplify.cc, we build sibling lists for struct accesses, which
groups and sorts accesses along with their struct base, creating
new ALLOC/RELEASE nodes for pointers.
4. In gimplify.cc:gimplify_adjust_omp_clauses, mapping nodes may be
adjusted or created.
This patch doesn't completely disrupt this scheme, though clause
types are no longer adjusted in c_omp_adjust_map_clauses (step 1).
Clause expansion in step 2 (for C and C++) now uses a single, unified
mechanism, parts of which are also reused for analysis in step 3.
Rather than the somewhat "ad-hoc" pattern matching on addresses that is
used to expand clauses at present, a new method for analysing addresses is
introduced. This does a recursive-descent tree walk on expression nodes,
and emits a vector of tokens describing each "part" of the address.
This tokenized address can then be translated directly into mapping nodes,
with the assurance that no part of the expression has been inadvertently
skipped or misinterpreted. In this way, all the variations of ways
pointers, arrays, references and component accesses might be combined
can be teased apart into easily-understood cases - and we know we've
"parsed" the whole address before we start analysis, so the right code
paths can easily be selected.
For example, a simple access "arr[idx]" might parse as:
base-decl access-indexed-array
or "mystruct->foo[x]" with a pointer "foo" component might parse as:
base-decl access-pointer component-selector access-pointer
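The two token sequences above can be reproduced with a toy model. This is only a sketch: the toy_* names and the four-kind expression tree are hypothetical stand-ins, not GCC's tree codes or the real omp_addr_tokenizer interface, and the real tokens carry more information than a name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical expression kinds, one per "part" of an address.  */
enum toy_kind { TOY_DECL, TOY_INDEXED_ARRAY, TOY_POINTER, TOY_COMPONENT };

struct toy_expr
{
  enum toy_kind kind;
  const struct toy_expr *inner;	/* NULL for the root decl.  */
};

/* Map a toy kind to the token spelling used in the text above.  */
static const char *
toy_token_name (enum toy_kind kind)
{
  switch (kind)
    {
    case TOY_DECL: return "base-decl";
    case TOY_INDEXED_ARRAY: return "access-indexed-array";
    case TOY_POINTER: return "access-pointer";
    case TOY_COMPONENT: return "component-selector";
    }
  return "unknown";
}

/* Recursive descent: tokenize the inner expression first, then append
   the token for EXPR itself, so tokens come out root-first.  */
static void
toy_tokenize (const struct toy_expr *expr, char *buf, size_t bufsz)
{
  assert (bufsz > 0);
  if (expr->inner)
    toy_tokenize (expr->inner, buf, bufsz);
  else
    buf[0] = '\0';		/* Root reached: start with empty output.  */

  size_t len = strlen (buf);
  snprintf (buf + len, bufsz - len, "%s%s", len ? " " : "",
	    toy_token_name (expr->kind));
}
```

Building "arr[idx]" as TOY_INDEXED_ARRAY over TOY_DECL, and "mystruct->foo[x]" as TOY_POINTER over TOY_COMPONENT over TOY_POINTER over TOY_DECL, yields the two sequences shown above. Because every node emits exactly one token, no part of the address can be skipped silently.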
A key observation is that "array" bases, i.e. accesses
whose root nodes are not structures, but describe scalars or arrays,
and also *one-level deep* structure accesses, have first-class support
in gimplify and beyond. Expressions that use deeper struct accesses
or e.g. multiple indirections were more problematic: some cases worked,
but lots of cases didn't. This patch reimplements the support for those
in gimplify.cc, again using the new "address tokenization" support.
An expression like "mystruct->foo->bar[0:10]" used in a mapping node will
translate the right-hand access directly in the front-end. The base for
the access will be "mystruct->foo". This is handled recursively in
gimplify.cc -- there may be several accesses of "mystruct"'s members
on the same directive, so the sibling-list building machinery can be
used again. (This was already being done for OpenACC, but the new
implementation differs somewhat in details, and is more robust.)
For OpenMP, in the case where the base pointer itself,
i.e. "mystruct->foo" here, is NOT mapped on the same directive, we
create a "fragile" mapping. This turns the "foo" component access
into a zero-length allocation (which is a new feature for the runtime,
so support has been added there too).
A couple of changes have been made to how mapping clauses are turned
into mapping nodes:
The first change is based on the observation that it is probably never
correct to use GOMP_MAP_ALWAYS_POINTER for component accesses (e.g. for
references), because if the containing struct is already mapped on the
target then the host version of the pointer in question will be corrupted
if the struct is copied back from the target. This patch removes all
such uses, across each of C, C++ and Fortran.
The second change is to the way that GOMP_MAP_ATTACH_DETACH nodes
are processed during sibling-list creation. For OpenMP, for pointer
components, we must map the base pointer separately from an array section
that uses the base pointer, so e.g. we must have both "map(mystruct.base)"
and "map(mystruct.base[0:10])" mappings. These create nodes such as:
GOMP_MAP_TOFROM mystruct.base
G_M_TOFROM *mystruct.base [len: 10*elemsize] G_M_ATTACH_DETACH mystruct.base
Instead of using the first of these directly when building the struct
sibling list then skipping the group using GOMP_MAP_ATTACH_DETACH,
leading to:
GOMP_MAP_STRUCT mystruct [len: 1] GOMP_MAP_TOFROM mystruct.base
we now introduce a new "mini-pass", omp_resolve_clause_dependencies, that
drops the GOMP_MAP_TOFROM for the base pointer, marks the second group
as having had a base-pointer mapping, then omp_build_struct_sibling_lists
can create:
GOMP_MAP_STRUCT mystruct [len: 1] GOMP_MAP_ALLOC mystruct.base [len: ptrsize]
This ends up working better in many cases, particularly those involving
references. (The "alloc" space is immediately overwritten by a pointer
attachment, so this is mildly more efficient than a redundant TO mapping
at runtime also.)
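For reference, user code exercising this pair of mappings might look like the sketch below. The struct and function names are made up for illustration, and it assumes a compiler that includes this patch; without -fopenmp the pragma is ignored and the loop simply runs on the host.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_struct
{
  int *base;
};

/* Increment ten elements in a target region and return their sum.  */
static int
sum_after_target_increment (void)
{
  struct toy_struct mystruct;
  int sum = 0;

  mystruct.base = malloc (10 * sizeof (int));
  assert (mystruct.base != NULL);
  for (int i = 0; i < 10; i++)
    mystruct.base[i] = i;

  /* Map the base pointer itself and, separately, the array section
     that uses it, as described above.  */
#pragma omp target map(mystruct.base) map(mystruct.base[0:10])
  for (int i = 0; i < 10; i++)
    mystruct.base[i] += 1;

  for (int i = 0; i < 10; i++)
    sum += mystruct.base[i];
  free (mystruct.base);
  return sum;
}
```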
There is support in the address tokenizer for "arbitrary" base expressions
which aren't rooted at a decl, but that is not used at present because
such addresses are disallowed at parse time.
In the front-ends, the address tokenization machinery is mostly only
used for clause expansion and not for diagnostics at present. It could
be used for those too, which would allow more of my previous "address
inspector" implementation to be removed.
The new bits in gimplify.cc work with OpenACC also.
This version of the patch addresses several first-pass review comments
from Tobias, and fixes a few previously-missed cases for manually-managed
ragged array mappings (including cases using references). Some arbitrary
differences between handling of clause expansion for C vs. C++ have also
been fixed, and some fragments from later in the patch series have been
moved forward (where they were useful for fixing bugs). Several new
test cases have been added.
2023-11-29 Julian Brown <julian@codesourcery.com>
gcc/c-family/
* c-common.h (c_omp_region_type): Add C_ORT_EXIT_DATA,
C_ORT_OMP_EXIT_DATA and C_ORT_ACC_TARGET.
(omp_addr_token): Add forward declaration.
(c_omp_address_inspector): New class.
* c-omp.cc (c_omp_adjust_map_clauses): Mark decls addressable here, but
do not change any mapping node types.
(c_omp_address_inspector::unconverted_ref_origin,
c_omp_address_inspector::component_access_p,
c_omp_address_inspector::check_clause,
c_omp_address_inspector::get_root_term,
c_omp_address_inspector::map_supported_p,
c_omp_address_inspector::get_origin,
c_omp_address_inspector::maybe_unconvert_ref,
c_omp_address_inspector::maybe_zero_length_array_section,
c_omp_address_inspector::expand_array_base,
c_omp_address_inspector::expand_component_selector,
c_omp_address_inspector::expand_map_clause): New methods.
(omp_expand_access_chain): New function.
gcc/c/
* c-parser.cc (c_parser_oacc_all_clauses): Add TARGET_P parameter. Use
to select region type for c_finish_omp_clauses call.
(c_parser_oacc_loop): Update calls to c_parser_oacc_all_clauses.
(c_parser_oacc_compute): Likewise.
(c_parser_omp_target_data, c_parser_omp_target_enter_data): Support
ATTACH kind.
(c_parser_omp_target_exit_data): Support DETACH kind.
(check_clauses): Handle GOMP_MAP_POINTER and GOMP_MAP_ATTACH here.
* c-typeck.cc (handle_omp_array_sections_1,
handle_omp_array_sections, c_finish_omp_clauses): Use
c_omp_address_inspector class and OMP address tokenizer to analyze and
expand map clause expressions. Fix some diagnostics. Fix "is OpenACC"
condition for C_ORT_ACC_TARGET addition.
gcc/cp/
* parser.cc (cp_parser_oacc_all_clauses): Add TARGET_P parameter. Use
to select region type for finish_omp_clauses call.
(cp_parser_omp_target_data, cp_parser_omp_target_enter_data): Support
GOMP_MAP_ATTACH kind.
(cp_parser_omp_target_exit_data): Support GOMP_MAP_DETACH kind.
(cp_parser_oacc_declare): Update call to cp_parser_oacc_all_clauses.
(cp_parser_oacc_loop): Update calls to cp_parser_oacc_all_clauses.
(cp_parser_oacc_compute): Likewise.
* pt.cc (tsubst_expr): Use C_ORT_ACC_TARGET for call to
tsubst_omp_clauses for OpenACC compute regions.
* semantics.cc (cp_omp_address_inspector): New class, derived from
c_omp_address_inspector.
(handle_omp_array_sections_1, handle_omp_array_sections,
finish_omp_clauses): Use cp_omp_address_inspector class and OMP address
tokenizer to analyze and expand OpenMP map clause expressions. Fix
some diagnostics. Support C_ORT_ACC_TARGET.
(finish_omp_target): Handle GOMP_MAP_POINTER.
gcc/fortran/
* trans-openmp.cc (gfc_trans_omp_array_section): Add OPENMP parameter.
Use GOMP_MAP_ATTACH_DETACH instead of GOMP_MAP_ALWAYS_POINTER for
derived type components.
(gfc_trans_omp_clauses): Update calls to gfc_trans_omp_array_section.
gcc/
* gimplify.cc (build_struct_comp_nodes): Don't process
GOMP_MAP_ATTACH_DETACH "middle" nodes here.
(omp_mapping_group): Add REPROCESS_STRUCT and FRAGILE booleans for
nested struct handling.
(omp_strip_components_and_deref, omp_strip_indirections): Remove
functions.
(omp_get_attachment): Handle GOMP_MAP_DETACH here.
(omp_group_last): Handle GOMP_MAP_*, GOMP_MAP_DETACH,
GOMP_MAP_ATTACH_DETACH groups for "exit data" of reference-to-pointer
component array sections.
(omp_gather_mapping_groups_1): Initialise reprocess_struct and fragile
fields.
(omp_group_base): Handle GOMP_MAP_ATTACH_DETACH after GOMP_MAP_STRUCT.
(omp_index_mapping_groups_1): Skip reprocess_struct groups.
(omp_get_nonfirstprivate_group, omp_directive_maps_explicitly,
omp_resolve_clause_dependencies, omp_first_chained_access_token): New
functions.
(omp_check_mapping_compatibility): Adjust accepted node combinations
for "from" clauses using release instead of alloc.
(omp_accumulate_sibling_list): Add GROUP_MAP, ADDR_TOKENS, FRAGILE_P,
REPROCESSING_STRUCT, ADDED_TAIL parameters. Use OMP address tokenizer
to analyze addresses. Reimplement nested struct handling, and
implement "fragile groups".
(omp_build_struct_sibling_lists): Adjust for changes to
omp_accumulate_sibling_list. Recalculate bias for ATTACH_DETACH nodes
after GOMP_MAP_STRUCT nodes.
(gimplify_scan_omp_clauses): Call omp_resolve_clause_dependencies. Use
OMP address tokenizer.
(gimplify_adjust_omp_clauses_1): Use build_fold_indirect_ref_loc
instead of build_simple_mem_ref_loc.
* omp-general.cc (omp-general.h, tree-pretty-print.h): Include.
(omp_addr_tokenizer): New namespace.
(omp_addr_tokenizer::omp_addr_token): New.
(omp_addr_tokenizer::omp_parse_component_selector,
omp_addr_tokenizer::omp_parse_ref,
omp_addr_tokenizer::omp_parse_pointer,
omp_addr_tokenizer::omp_parse_access_method,
omp_addr_tokenizer::omp_parse_access_methods,
omp_addr_tokenizer::omp_parse_structure_base,
omp_addr_tokenizer::omp_parse_structured_expr,
omp_addr_tokenizer::omp_parse_array_expr,
omp_addr_tokenizer::omp_access_chain_p,
omp_addr_tokenizer::omp_accessed_addr): New functions.
(omp_parse_expr, debug_omp_tokenized_addr): New functions.
* omp-general.h (omp_addr_tokenizer::access_method_kinds,
omp_addr_tokenizer::structure_base_kinds,
omp_addr_tokenizer::token_type,
omp_addr_tokenizer::omp_addr_token,
omp_addr_tokenizer::omp_access_chain_p,
omp_addr_tokenizer::omp_accessed_addr): New.
(omp_addr_token, omp_parse_expr): New.
* omp-low.cc (scan_sharing_clauses): Skip error check for references
to pointers.
* tree.h (OMP_CLAUSE_ATTACHMENT_MAPPING_ERASED): New macro.
gcc/testsuite/
* c-c++-common/gomp/clauses-2.c: Fix error output.
* c-c++-common/gomp/target-implicit-map-2.c: Adjust scan output.
* c-c++-common/gomp/target-50.c: Adjust scan output.
* c-c++-common/gomp/target-enter-data-1.c: Adjust scan output.
* g++.dg/gomp/static-component-1.C: New test.
* gcc.dg/gomp/target-3.c: Adjust scan output.
* gfortran.dg/gomp/map-9.f90: Adjust scan output.
libgomp/
* target.c (gomp_map_pointer): Modify zero-length array section
pointer handling.
(gomp_attach_pointer): Likewise.
(gomp_map_fields_existing): Use gomp_map_0len_lookup.
(gomp_attach_pointer): Allow attaching null pointers (or Fortran
"unassociated" pointers).
(gomp_map_vars_internal): Handle zero-sized struct members. Add
diagnostic for unmapped struct pointer members.
* testsuite/libgomp.c-c++-common/baseptrs-1.c: New test.
* testsuite/libgomp.c-c++-common/baseptrs-2.c: New test.
* testsuite/libgomp.c-c++-common/baseptrs-6.c: New test.
* testsuite/libgomp.c-c++-common/baseptrs-7.c: New test.
* testsuite/libgomp.c-c++-common/ptr-attach-2.c: New test.
* testsuite/libgomp.c-c++-common/target-implicit-map-2.c: Fix missing
"free".
* testsuite/libgomp.c-c++-common/target-implicit-map-5.c: New test.
* testsuite/libgomp.c-c++-common/target-map-zlas-1.c: New test.
* testsuite/libgomp.c++/class-array-1.C: New test.
* testsuite/libgomp.c++/baseptrs-3.C: New test.
* testsuite/libgomp.c++/baseptrs-4.C: New test.
* testsuite/libgomp.c++/baseptrs-5.C: New test.
* testsuite/libgomp.c++/baseptrs-8.C: New test.
* testsuite/libgomp.c++/baseptrs-9.C: New test.
* testsuite/libgomp.c++/ref-mapping-1.C: New test.
* testsuite/libgomp.c++/target-48.C: New test.
* testsuite/libgomp.c++/target-49.C: New test.
* testsuite/libgomp.c++/target-exit-data-reftoptr-1.C: New test.
* testsuite/libgomp.c++/target-lambda-1.C: Update for OpenMP 5.2
semantics.
* testsuite/libgomp.c++/target-this-3.C: Likewise.
* testsuite/libgomp.c++/target-this-4.C: Likewise.
* testsuite/libgomp.fortran/struct-elem-map-1.f90: Add temporary XFAIL.
* testsuite/libgomp.fortran/target-enter-data-6.f90: Likewise.
|
|
This patch trivially adds braces and reindents the
OMP_CLAUSE_TO/OMP_CLAUSE_FROM/OMP_CLAUSE__CACHE_ stanza in
c_finish_omp_clause and finish_omp_clause, in preparation for the
following patch (to clarify the diff a little).
2022-09-13 Julian Brown <julian@codesourcery.com>
gcc/c/
* c-typeck.cc (c_finish_omp_clauses): Add braces and reindent
OMP_CLAUSE_TO/OMP_CLAUSE_FROM/OMP_CLAUSE__CACHE_ stanza.
gcc/cp/
* semantics.cc (finish_omp_clause): Add braces and reindent
OMP_CLAUSE_TO/OMP_CLAUSE_FROM/OMP_CLAUSE__CACHE_ stanza.
|
|
On the c-c++-common/cpp/pr88974.c testcase I'm seeing
==600549== Conditional jump or move depends on uninitialised value(s)
==600549== at 0x1DD3A05: cpp_get_token_1(cpp_reader*, unsigned int*) (macro.cc:3050)
==600549== by 0x1DBFC7F: _cpp_parse_expr (expr.cc:1392)
==600549== by 0x1DB9471: do_if(cpp_reader*) (directives.cc:2087)
==600549== by 0x1DBB4D8: _cpp_handle_directive (directives.cc:572)
==600549== by 0x1DCD488: _cpp_lex_token (lex.cc:3682)
==600549== by 0x1DD3A97: cpp_get_token_1(cpp_reader*, unsigned int*) (macro.cc:2936)
==600549== by 0x7F7EE4: scan_translation_unit (c-ppoutput.cc:350)
==600549== by 0x7F7EE4: preprocess_file(cpp_reader*) (c-ppoutput.cc:106)
==600549== by 0x7F6235: c_common_init() (c-opts.cc:1280)
==600549== by 0x704C8B: lang_dependent_init (toplev.cc:1837)
==600549== by 0x704C8B: do_compile (toplev.cc:2135)
==600549== by 0x704C8B: toplev::main(int, char**) (toplev.cc:2306)
==600549== by 0x7064BA: main (main.cc:39)
error. The problem is that _cpp_lex_direct can leave result->src_loc
uninitialized in some cases and later on we use that location_t.
_cpp_lex_direct essentially does:
cppchar_t c;
...
cpp_token *result = pfile->cur_token++;
fresh_line:
result->flags = 0;
...
if (buffer->need_line)
{
if (pfile->state.in_deferred_pragma)
{
result->type = CPP_PRAGMA_EOL;
... // keeps result->src_loc uninitialized;
return result;
}
if (!_cpp_get_fresh_line (pfile))
{
result->type = CPP_EOF;
if (!pfile->state.in_directive && !pfile->state.parsing_args)
{
result->src_loc = pfile->line_table->highest_line;
...
}
... // otherwise result->src_loc is sometimes uninitialized here
return result;
}
...
}
...
result->src_loc = pfile->line_table->highest_line;
...
c = *buffer->cur++;
switch (c)
{
...
case '\n':
...
buffer->need_line = true;
if (pfile->state.in_deferred_pragma)
{
result->type = CPP_PRAGMA_EOL;
...
return result;
}
goto fresh_line;
...
}
...
So, if _cpp_lex_direct is called without buffer->need_line initially set,
result->src_loc is always initialized (and actually hundreds of tests rely
on the exact value it has), even when c == '\n' and we set that flag later
on and goto fresh_line. The CPP_PRAGMA_EOL case has separate handling
there and doesn't goto.
But if _cpp_lex_direct is called with buffer->need_line initially set and
we either decide to return a CPP_PRAGMA_EOL token, or getting a new line
fails for some reason and we return a CPP_EOF token while in directive
or parsing args state, it is kept uninitialized and can be whatever the
allocation left there.
The following patch attempts to keep the status quo: use the value that was
returned previously if it was initialized (i.e. we went through the
goto fresh_line; statement in c == '\n' handling) and only initialize
result->src_loc if it was uninitialized before.
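The idea can be modelled with a heavily simplified toy lexer. This is a sketch only, not the lex.cc code: the toy_* names are hypothetical, fetching a fresh line is assumed to always fail, and highest_line is bumped on '\n' purely so that the "old" and "new" locations differ.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_type { TOY_OTHER, TOY_PRAGMA_EOL, TOY_EOF };

struct toy_token
{
  enum toy_type type;
  unsigned int src_loc;
};

struct toy_reader
{
  const char *cur;		/* Remaining characters of the current line.  */
  bool need_line;		/* A fresh line must be fetched first.  */
  bool in_deferred_pragma;
  bool in_directive;
  unsigned int highest_line;	/* Stand-in for line_table->highest_line.  */
};

static void
toy_lex (struct toy_reader *r, struct toy_token *result)
{
  char c = 0;			/* 0 means no character was read yet.  */

 fresh_line:
  if (r->need_line)
    {
      if (r->in_deferred_pragma)
	{
	  result->type = TOY_PRAGMA_EOL;
	  result->src_loc = r->highest_line;
	  return;
	}
      /* Toy assumption: fetching a fresh line always fails.  */
      result->type = TOY_EOF;
      if (!r->in_directive || c == 0)
	result->src_loc = r->highest_line;
      /* Otherwise keep the location set before the '\n' was read.  */
      return;
    }

  result->src_loc = r->highest_line;
  assert (*r->cur != '\0');
  c = *r->cur++;
  if (c == '\n')
    {
      r->need_line = true;
      r->highest_line++;	/* Toy assumption, see above.  */
      if (r->in_deferred_pragma)
	{
	  result->type = TOY_PRAGMA_EOL;
	  return;
	}
      goto fresh_line;
    }
  result->type = TOY_OTHER;
}
```

With need_line set on entry, c is still 0 at the early returns, so src_loc gets a defined value. When the early return is reached via goto fresh_line, c is '\n' and the previously stored location is preserved.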
2023-12-13 Jakub Jelinek <jakub@redhat.com>
PR preprocessor/112956
* lex.cc (_cpp_lex_direct): Initialize c to 0.
For CPP_PRAGMA_EOL tokens and if c == 0 also for CPP_EOF
set result->src_loc to highest locus.
|
|
Fix-up for commit 348874f0baac0f22c98ab11abbfa65fd172f6bdd
"libgomp: basic pinned memory on Linux", which may result in build failures
as follows, for example, for the '-m32' multilib of x86_64-pc-linux-gnu:
In file included from [...]/source-gcc/libgomp/config/linux/allocator.c:31:
[...]/source-gcc/libgomp/config/linux/allocator.c: In function ‘linux_memspace_alloc’:
[...]/source-gcc/libgomp/config/linux/allocator.c:70:26: error: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has type ‘size_t’ {aka ‘unsigned int’} [-Werror=format=]
70 | gomp_debug (0, "libgomp: failed to pin %ld bytes of"
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
71 | " memory (ulimit too low?)\n", size);
| ~~~~
| |
| size_t {aka unsigned int}
[...]/source-gcc/libgomp/libgomp.h:186:29: note: in definition of macro ‘gomp_debug’
186 | (gomp_debug) ((KIND), __VA_ARGS__); \
| ^~~~~~~~~~~
[...]/source-gcc/libgomp/config/linux/allocator.c:70:52: note: format string is defined here
70 | gomp_debug (0, "libgomp: failed to pin %ld bytes of"
| ~~^
| |
| long int
| %d
cc1: all warnings being treated as errors
make[9]: *** [allocator.lo] Error 1
make[9]: Leaving directory `[...]/build-gcc/x86_64-pc-linux-gnu/32/libgomp'
[...]
Fix this in the same way as used elsewhere in libgomp.
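For illustration only, two portable ways of printing a 'size_t' are sketched below. Which form libgomp actually uses is not stated in this message; the function names and the message text here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Variant 1: the C99 'z' length modifier matches size_t on every ABI.  */
static int
format_pin_failure (char *buf, size_t bufsz, size_t size)
{
  assert (buf != NULL && bufsz > 0);
  return snprintf (buf, bufsz, "failed to pin %zu bytes of memory", size);
}

/* Variant 2: cast explicitly so the argument always matches '%lu',
   whether size_t is 'unsigned int' (e.g. -m32) or 'unsigned long'.  */
static int
format_pin_failure_cast (char *buf, size_t bufsz, size_t size)
{
  assert (buf != NULL && bufsz > 0);
  return snprintf (buf, bufsz, "failed to pin %lu bytes of memory",
		   (unsigned long) size);
}
```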
libgomp/
* config/linux/allocator.c (linux_memspace_alloc): Fix 'size_t'
vs. '%ld' format string mismatch.
|
|
My r14-6505-g52b4b7d7f5c7c0 change to copy the location in
build_aggr_init_expr reopened PR96997; let's fix it properly this time, by
clearing the location like we do for other trees.
PR c++/96997
gcc/cp/ChangeLog:
* tree.cc (bot_manip): Check data.clear_location for TARGET_EXPR.
gcc/testsuite/ChangeLog:
* g++.dg/debug/cleanup2.C: New test.
|
|
This reverts commit d2b269ce30d77dbfc6c28c75887c330d4698b132.
|
|
For completeness here are three SHORTREAL modules which match their
LONGREAL and REAL counterparts. The datatype SHORTREAL is a GNU
extension and these modules were missing.
gcc/m2/ChangeLog:
PR modula2/112921
* gm2-libs-iso/ConvStringShort.def: New file.
* gm2-libs-iso/ConvStringShort.mod: New file.
* gm2-libs-iso/ShortConv.def: New file.
* gm2-libs-iso/ShortConv.mod: New file.
* gm2-libs-iso/ShortMath.def: New file.
* gm2-libs-iso/ShortMath.mod: New file.
* gm2-libs-iso/ShortStr.def: New file.
* gm2-libs-iso/ShortStr.mod: New file.
libgm2/ChangeLog:
PR modula2/112921
* libm2iso/Makefile.am (M2DEFS): Add ConvStringShort.def,
ShortConv.def, ShortMath.def and ShortStr.def.
(M2MODS): Add ConvStringShort.mod,
ShortConv.mod, ShortMath.mod and ShortStr.mod.
* libm2iso/Makefile.in: Regenerate.
gcc/testsuite/ChangeLog:
PR modula2/112921
* gm2/iso/run/pass/shorttest.mod: New test.
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
|
|
This patch adds checks for using objects after they've been manually
destroyed via explicit destructor call. Currently this is only
implemented for 'top-level' objects; FIELD_DECLs and individual elements
of arrays will need a lot more work to track correctly and are left for
a future patch.
The other limitation is that destruction of parameter objects is checked
too 'early', happening at the end of the function call rather than the
end of the owning full-expression as they should be for consistency;
see cpp2a/constexpr-lifetime2.C. This is because I wasn't able to find a
good way to link the constructed parameter declarations with the
variable declarations that are actually destroyed later on to propagate
their lifetime status, so I'm leaving this for a later patch.
PR c++/71093
gcc/cp/ChangeLog:
* constexpr.cc (constexpr_global_ctx::get_value_ptr): Don't
return NULL_TREE for objects we're initializing.
(constexpr_global_ctx::destroy_value): Rename from remove_value.
Only mark real variables as outside lifetime.
(constexpr_global_ctx::clear_value): New function.
(destroy_value_checked): New function.
(cxx_eval_call_expression): Defer complaining about non-constant
arg0 for operator delete. Use remove_value_safe.
(cxx_fold_indirect_ref_1): Handle conversion to 'as base' type.
(outside_lifetime_error): Include name of object we're
accessing.
(cxx_eval_store_expression): Handle clobbers. Improve error
messages.
(cxx_eval_constant_expression): Use remove_value_safe. Clear
bind variables before entering body.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1y/constexpr-lifetime1.C: Improve error message.
* g++.dg/cpp1y/constexpr-lifetime2.C: Likewise.
* g++.dg/cpp1y/constexpr-lifetime3.C: Likewise.
* g++.dg/cpp1y/constexpr-lifetime4.C: Likewise.
* g++.dg/cpp2a/bitfield2.C: Likewise.
* g++.dg/cpp2a/constexpr-new3.C: Likewise. New check.
* g++.dg/cpp1y/constexpr-lifetime7.C: New test.
* g++.dg/cpp2a/constexpr-lifetime1.C: New test.
* g++.dg/cpp2a/constexpr-lifetime2.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
|
|
I was puzzled by the proposed patch for PR71093 specifically ignoring the
in-charge parameter; the problem turned out to be that when
cxx_eval_call_expression jumps from the clone to the cloned function, it
assumes that the latter has the same parameters, and so the in-charge parm
doesn't get an argument. Since a class with vbases can't have constexpr
'tors there isn't actually a need for an in-charge parameter in a
destructor, but we used to use it for deleting destructors and never removed
it. I have a patch to do that for GCC 15, but for now let's work around it.
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_call_expression): Handle missing in-charge
argument.
|
|
When testing the proposed patch for PR71093 I noticed that it changed the
diagnostic for consteval-prop6.C. I then noticed that the diagnostic wasn't
very helpful either way; it was complaining about modification of the 'x'
variable, but it's not a problem to initialize a local variable with a
consteval constructor as long as the value is actually constant; we want to
know why the value isn't constant. And then it turned out that this also
fixed a missed-optimization bug in the testsuite.
PR c++/108243
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_outermost_constant_expr): Turn
a constructor CALL_EXPR into a TARGET_EXPR.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/consteval-prop6.C: Adjust diagnostic.
* g++.dg/opt/is_constant_evaluated3.C: Remove xfails.
|
|
When building an AGGR_INIT_EXPR from a CALL_EXPR, we shouldn't lose location
information.
gcc/cp/ChangeLog:
* tree.cc (build_aggr_init_expr): Copy EXPR_LOCATION.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1y/constexpr-nsdmi7b.C: Adjust line.
* g++.dg/template/copy1.C: Likewise.
|
|
gcc/testsuite/ChangeLog:
* g++.dg/pr112822.C: Require C++17.
|
|
The extra register pressure is causing infinite loops in some cases, especially
at -O0. I have not yet observed any issue on devices that have AVGPRs for
spilling, and XNACK is only really useful on those devices anyway, so change
the defaults.
gcc/ChangeLog:
* config/gcn/gcn-hsa.h (NO_XNACK): Change the defaults.
* config/gcn/gcn-opts.h (enum hsaco_attr_type): Add HSACO_ATTR_DEFAULT.
* config/gcn/gcn.cc (gcn_option_override): Set the default flag_xnack.
* config/gcn/gcn.opt: Add -mxnack=default.
* doc/invoke.texi: Document the -mxnack default.
|
|
The XNACK feature allows memory load instructions to restart safely following
a page-miss interrupt. This is useful for shared-memory devices, like APUs,
and to implement OpenMP Unified Shared Memory.
To support the feature we must be able to set the appropriate meta-data and
set the load instructions to early-clobber. When the port supports scheduling
of s_waitcnt instructions there will be further requirements.
gcc/ChangeLog:
* config/gcn/gcn-hsa.h (NO_XNACK): Ignore missing -march.
(XNACKOPT): Match on/off; ignore any.
* config/gcn/gcn-valu.md (gather<mode>_insn_1offset<exec>):
Add xnack compatible alternatives.
(gather<mode>_insn_2offsets<exec>): Likewise.
* config/gcn/gcn.cc (gcn_option_override): Permit -mxnack for devices
other than Fiji and gfx1030.
(gcn_expand_epilogue): Remove early-clobber problems.
(gcn_hsa_declare_function_name): Obey -mxnack setting.
* config/gcn/gcn.md (xnack): New attribute.
(enabled): Rework to include "xnack" attribute.
(*movbi): Add xnack compatible alternatives.
(*mov<mode>_insn): Likewise.
(*mov<mode>_insn): Likewise.
(*mov<mode>_insn): Likewise.
(*movti_insn): Likewise.
* config/gcn/gcn.opt (-mxnack): Change the default to "any".
* doc/invoke.texi: Remove placeholder notice for -mxnack.
|
|
Add a terminating newline to various tests, and add missing
extensions to some test strings. The current output is broken for
options_set_4.c, so this test is left unchanged, to be fixed in a
subsequent patch.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/cpunative/native_cpu_18.c: Add \+nopauth\n
* gcc.target/aarch64/options_set_7.c: Add \+crc\n
* gcc.target/aarch64/options_set_8.c: Add \+crc\+nodotprod\n
* gcc.target/aarch64/cpunative/native_cpu_0.c: Add \n
* gcc.target/aarch64/cpunative/native_cpu_1.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_2.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_3.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_4.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_5.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_6.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_7.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_8.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_9.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_10.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_11.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_12.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_13.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_14.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_15.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_16.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_17.c: Ditto.
* gcc.target/aarch64/options_set_1.c: Ditto.
* gcc.target/aarch64/options_set_2.c: Ditto.
* gcc.target/aarch64/options_set_3.c: Ditto.
* gcc.target/aarch64/options_set_5.c: Ditto.
* gcc.target/aarch64/options_set_6.c: Ditto.
* gcc.target/aarch64/options_set_9.c: Ditto.
* gcc.target/aarch64/options_set_11.c: Ditto.
* gcc.target/aarch64/options_set_12.c: Ditto.
* gcc.target/aarch64/options_set_13.c: Ditto.
* gcc.target/aarch64/options_set_14.c: Ditto.
* gcc.target/aarch64/options_set_15.c: Ditto.
* gcc.target/aarch64/options_set_16.c: Ditto.
* gcc.target/aarch64/options_set_17.c: Ditto.
* gcc.target/aarch64/options_set_18.c: Ditto.
* gcc.target/aarch64/options_set_19.c: Ditto.
* gcc.target/aarch64/options_set_20.c: Ditto.
* gcc.target/aarch64/options_set_21.c: Ditto.
* gcc.target/aarch64/options_set_22.c: Ditto.
* gcc.target/aarch64/options_set_23.c: Ditto.
* gcc.target/aarch64/options_set_24.c: Ditto.
* gcc.target/aarch64/options_set_25.c: Ditto.
* gcc.target/aarch64/options_set_26.c: Ditto.
|
|
gcc/ChangeLog:
* config/aarch64/x-aarch64: Add missing dependencies.
|
|
Implement the OpenMP pinned memory trait on Linux hosts using the mlock
syscall. Pinned allocations are performed using mmap, not malloc, to ensure
that they can be unpinned safely when freed.
This implementation will work OK for page-scale allocations, and finer-grained
allocations will be implemented in a future patch.
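The approach can be sketched as follows. This is a minimal illustration assuming a Linux host, not the contents of config/linux/allocator.c; the pinned_alloc and pinned_free names are hypothetical.

```c
#define _DEFAULT_SOURCE		/* For MAP_ANONYMOUS on strict compilers.  */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Return SIZE bytes of page-aligned, pinned memory, or NULL on failure.  */
static void *
pinned_alloc (size_t size)
{
  assert (size > 0);
  void *addr = mmap (NULL, size, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (addr == MAP_FAILED)
    return NULL;
  if (mlock (addr, size) != 0)
    {
      /* Pinning failed, for example because RLIMIT_MEMLOCK is too low.  */
      munmap (addr, size);
      return NULL;
    }
  return addr;
}

/* Release memory obtained from pinned_alloc.  Unmapping the pages also
   drops their lock, so no separate munlock call is needed.  */
static void
pinned_free (void *addr, size_t size)
{
  munmap (addr, size);
}
```

Using a dedicated mapping means the locked pages are never shared with unrelated malloc allocations, which is what makes unpinning on free safe. The cost is page granularity, hence the note above about finer-grained allocations.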
libgomp/ChangeLog:
* allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add PIN.
(MEMSPACE_REALLOC): Add PIN.
(MEMSPACE_FREE): Add PIN.
(MEMSPACE_VALIDATE): Add PIN.
(omp_init_allocator): Use MEMSPACE_VALIDATE to check pinning.
(omp_aligned_alloc): Add pinning to all MEMSPACE_* calls.
(omp_aligned_calloc): Likewise.
(omp_realloc): Likewise.
(omp_free): Likewise.
* config/linux/allocator.c: New file.
* config/nvptx/allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add PIN.
(MEMSPACE_REALLOC): Add PIN.
(MEMSPACE_FREE): Add PIN.
(MEMSPACE_VALIDATE): Add PIN.
* config/gcn/allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add PIN.
(MEMSPACE_REALLOC): Add PIN.
(MEMSPACE_FREE): Add PIN.
* libgomp.texi: Switch pinned trait to supported.
(MEMSPACE_VALIDATE): Add PIN.
* testsuite/libgomp.c/alloc-pinned-1.c: New test.
* testsuite/libgomp.c/alloc-pinned-2.c: New test.
* testsuite/libgomp.c/alloc-pinned-3.c: New test.
* testsuite/libgomp.c/alloc-pinned-4.c: New test.
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
|
|
Add a dg-do compile target directive that limits the test case to being
built with C++17 or later.
2023-12-13 Peter Bergner <bergner@linux.ibm.com>
gcc/testsuite/
PR tree-optimization/112822
* g++.dg/pr112822.C: Add dg-do compile target c++17 directive.
|
|
Refine the test cases for:
* Name convention.
* Add run case.
These test cases used to cause out-of-bounds writes to the stack
and therefore showed unreliable behavior. Depending on the
execution environment they can either pass or fail. As of now,
with the latest QEMU version, they will pass even without the
underlying issue fixed. As the test case is known to have
caused the problem before, we keep it as a run test case for
future reference.
PR target/112929
PR target/112988
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/vsetvl/pr112929.c: Moved to...
* gcc.target/riscv/rvv/vsetvl/pr112929-1.c: ...here.
* gcc.target/riscv/rvv/vsetvl/pr112988.c: Moved to...
* gcc.target/riscv/rvv/vsetvl/pr112988-1.c: ...here.
* gcc.target/riscv/rvv/vsetvl/pr112929-2.c: New test.
* gcc.target/riscv/rvv/vsetvl/pr112988-2.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
gcc/testsuite/ChangeLog:
* g++.target/aarch64/sve/aarch64-ssve.exp:
|
|
This patch improves the code generated for bitfield sign extensions on
ARC cpus without a barrel shifter.
Compiling the following test case:
int foo(int x) { return (x<<27)>>27; }
with -O2 -mcpu=em, generates two loops:
foo: mov lp_count,27
lp 2f
add r0,r0,r0
nop
2: # end single insn loop
mov lp_count,27
lp 2f
asr r0,r0
nop
2: # end single insn loop
j_s [blink]
and the closely related test case:
struct S { int a : 5; };
int bar (struct S *p) { return p->a; }
generates the slightly better:
bar: ldb_s r0,[r0]
mov_s r2,0 ;3
add3 r0,r2,r0
sexb_s r0,r0
asr_s r0,r0
asr_s r0,r0
j_s.d [blink]
asr_s r0,r0
which uses 6 instructions to perform this particular sign extension.
It turns out that sign extensions can always be implemented using at
most three instructions on ARC (without a barrel shifter) using the
idiom ((x&mask)^msb)-msb [as described in section "2-5 Sign Extension"
of Henry Warren's book "Hacker's Delight"]. Using this, the sign
extensions above on ARC's EM both become:
bmsk_s r0,r0,4
xor r0,r0,16
sub r0,r0,16
which takes about 3 cycles, compared to the ~112 cycles for the loops
in foo.
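The idiom can be checked in portable C. The sketch below is illustrative only: the function names are hypothetical, and it assumes GCC's behavior for converting out-of-range unsigned values to int32_t and for right-shifting negative values (both are implementation-defined in ISO C). For a 5-bit field the mask is 31 and the msb is 16, matching the bmsk_s/xor/sub sequence above.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low WIDTH bits of X with the ((x & mask) ^ msb) - msb
   idiom.  All arithmetic is done on unsigned values so that wraparound is
   well defined; only the final conversion relies on the implementation.  */
static int32_t sext_xor_sub(uint32_t x, unsigned width)
{
    assert(width >= 1 && width <= 32);
    uint32_t mask = (width == 32) ? UINT32_MAX
                                  : ((UINT32_C(1) << width) - 1);
    uint32_t msb = UINT32_C(1) << (width - 1);
    uint32_t r = ((x & mask) ^ msb) - msb;
    return (int32_t)r;
}

/* Reference version using the shift pair from the test case foo,
   generalized to any width.  Assumes an arithmetic right shift.  */
static int32_t sext_shift(uint32_t x, unsigned width)
{
    assert(width >= 1 && width <= 32);
    unsigned s = 32 - width;
    return (int32_t)(x << s) >> s;
}
```

For example, sext_xor_sub(0x1F, 5) computes (31 ^ 16) - 16 = 15 - 16 = -1, and sext_xor_sub(0x0F, 5) computes (15 ^ 16) - 16 = 31 - 16 = 15.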
2023-12-13 Roger Sayle <roger@nextmovesoftware.com>
Jeff Law <jlaw@ventanamicro.com>
gcc/ChangeLog
* config/arc/arc.md (*extvsi_n_0): New define_insn_and_split to
implement SImode sign extract using an AND, XOR and MINUS sequence.
gcc/testsuite/ChangeLog
* gcc.target/arc/extvsi-1.c: New test case.
* gcc.target/arc/extvsi-2.c: Likewise.
|
|
The crypto vector extensions depend on the Vector extension, so add the
implied ISA info for the corresponding crypto vector extensions.
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc: Modify implied ISA info.
* config/riscv/arch-canonicalize: Add crypto vector implied info.
|
|
The change in r14-6468-ga01462ae8bafa8 was only supposed to apply to %C
formats, not %Y.
libstdc++-v3/ChangeLog:
* include/bits/chrono_io.h (__formatter_chrono::_M_C_y_Y): Do
not round century down for %Y formats.
|
|
This fixes issues reported by David Edelsohn <dje.gcc@gmail.com>, and by
Eric Gallager <egallager@gcc.gnu.org>.
ChangeLog:
* Makefile.def (gettext): Disable (via missing)
{install-,}{pdf,html,info,dvi} and TAGS targets. Set no_install
to true. Add --disable-threads --disable-libasprintf. Drop the
lib_path (as there are no shared libs).
* Makefile.in: Regenerate.
|
|
contrib/ChangeLog:
* download_prerequisites
<arg parse>: Parse --only-gettext.
(echo_archives): Check only_gettext and stop early if true.
(helptext): Document --only-gettext.
|
|
Fix a VSETVL bug where the AVL is polluted:
.L15:
li a3,9
lui a4,%hi(s)
sw a3,%lo(j)(t2)
sh a5,%lo(s)(a4) <-- a4 holds the address of s
beq t0,zero,.L42
sw t5,8(t4)
vsetvli zero,a4,e8,m8,ta,ma <-- a4 used as AVL
Actually, this vsetvl is redundant.
The root cause is that we include the full available optimization in the LCM
local data computation. The full available optimization should run after the
LCM computation.
PR target/112929
PR target/112988
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc
(pre_vsetvl::compute_lcm_local_properties): Remove full available.
(pre_vsetvl::pre_global_vsetvl_info): Add full available optimization.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/vsetvl/pr112929.c: New test.
* gcc.target/riscv/rvv/vsetvl/pr112988.c: New test.
|
|
Some toolchain configs would report:
fatal error: gnu/stubs-ilp32.h: No such file or directory
Fix method suggested by Juzhe-Zhong
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/riscv_vector.h: New file.
Signed-off-by: demin.han <demin.han@starfivetech.com>
|
|
Hi, before this patch, a simple conversion case for RVV codegen:
foo:
ble a2,zero,.L8
addiw a5,a2,-1
li a4,6
bleu a5,a4,.L6
srliw a3,a2,3
slli a3,a3,3
add a3,a3,a0
mv a5,a0
mv a4,a1
vsetivli zero,8,e16,m1,ta,ma
.L4:
vle8.v v2,0(a5)
addi a5,a5,8
vzext.vf2 v1,v2
vse16.v v1,0(a4)
addi a4,a4,16
bne a3,a5,.L4
andi a5,a2,-8
beq a2,a5,.L10
.L3:
slli a4,a5,32
srli a4,a4,32
subw a2,a2,a5
slli a2,a2,32
slli a5,a4,1
srli a2,a2,32
add a0,a0,a4
add a1,a1,a5
vsetvli zero,a2,e16,m1,ta,ma
vle8.v v2,0(a0)
vzext.vf2 v1,v2
vse16.v v1,0(a1)
.L8:
ret
.L10:
ret
.L6:
li a5,0
j .L3
This vectorization goes through the first loop:
vsetivli zero,8,e16,m1,ta,ma
.L4:
vle8.v v2,0(a5)
addi a5,a5,8
vzext.vf2 v1,v2
vse16.v v1,0(a4)
addi a4,a4,16
bne a3,a5,.L4
Each iteration processes 8 elements.
For scalable vectorization this is OK when VLEN = 128 bits.
But as soon as VLEN > 128 bits, it wastes CPU resources: e.g. with VLEN = 256 bits,
only half of the vector units are working and the other half are idle.
After investigation, I realized that I forgot to adjust the cost for SELECT_VL.
So adjust the cost for SELECT_VL style length vectorization from 3 to 2, since
after this patch:
foo:
ble a2,zero,.L5
.L3:
vsetvli a5,a2,e16,m1,ta,ma -----> SELECT_VL cost.
vle8.v v2,0(a0)
slli a4,a5,1 -----> additional shift of outcome SELECT_VL for memory address calculation.
vzext.vf2 v1,v2
sub a2,a2,a5
vse16.v v1,0(a1)
add a0,a0,a5
add a1,a1,a4
bne a2,zero,.L3
.L5:
ret
This patch is a simple fix that I previously forgot.
OK for trunk?
If not, I am going to adjust the cost in the backend cost model.
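To make the difference between the two loop shapes concrete, here is a scalar sketch of the SELECT_VL loop from the second listing. It is an illustration only: the function names and the vlmax parameter are hypothetical, the source loop is assumed to be a plain uint8_t to uint16_t widening copy, and the vsetvli result is modelled as min(remaining, VLMAX), which is a simplification of what hardware may return.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed scalar form of the conversion loop being vectorized.  */
static void widen_scalar(const uint8_t *src, uint16_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Strip-mined form that mirrors the SELECT_VL loop: each trip asks for as
   many elements as remain and processes however many it is granted, so no
   separate epilogue is needed and a larger VLMAX is used automatically.  */
static void widen_select_vl(const uint8_t *src, uint16_t *dst, size_t n,
                            size_t vlmax)
{
    if (vlmax == 0)
        return;                              /* guard against a stuck loop */

    while (n != 0) {
        size_t vl = n < vlmax ? n : vlmax;   /* vsetvli a5,a2,...          */
        for (size_t i = 0; i < vl; i++)      /* vle8.v, vzext.vf2, vse16.v */
            dst[i] = src[i];
        n -= vl;                             /* sub a2,a2,a5               */
        src += vl;                           /* add a0,a0,a5               */
        dst += vl;                           /* add a1,a1,a4 (a4 = vl * 2) */
    }
}
```

With vlmax = 8 this behaves like the fixed 8-element loop plus its tail; with vlmax = 16 (modelling VLEN = 256 bits) the same code processes 16 elements per trip, which is the utilization gain the cost adjustment is meant to enable.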
PR target/111317
gcc/ChangeLog:
* tree-vect-loop.cc (vect_estimate_min_profitable_iters): Adjust COST for decrement IV.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/pr111317.c: New test.
|
|
wider cast [PR112940]
The following testcase ICEs, because a PHI argument from latch edge
uses a SSA_NAME set only in a conditionally executed block inside of the
loop.
This happens when we have some outer cast which lowers its operand several
times, under some condition with variable index, under different condition
with some constant index, otherwise something else, and then there is
an inner cast from non-_BitInt integer (or small/middle one). Such cast
in certain conditions is emitted by initializing some SSA_NAMEs in the
initialization statements before loops (say for casts from <= limb size
precision by computing a SSA_NAME for the first limb and then extension
of it for the later limbs) and uses the prepare_data_in_out function
to create a PHI node. Such function is passed the value (constant or
SSA_NAME) to use in the PHI argument from the pre-header edge, but for
the latch edge it always created a new SSA_NAME and then caller emitted
in the following 3 spots an extra assignment to set that SSA_NAME to
whatever value we want from the latch edge. In all these 3 cases
the argument from the latch edge is known already before the loop though,
either constant or SSA_NAME computed in pre-header as well.
But the need to emit an assignment combined with the handle_operand done
in a conditional basic block results in the SSA verification failure.
The following patch fixes it by extending the prepare_data_in_out method,
so that when the latch edge argument is known before (constant or computed
in pre-header), we can just use it directly and avoid the extra assignment
that would normally be hopefully optimized away later to what we now emit
directly.
2023-12-13 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/112940
* gimple-lower-bitint.cc (struct bitint_large_huge): Add another
argument to prepare_data_in_out method defaulted to NULL_TREE.
(bitint_large_huge::handle_operand): Pass another argument to
prepare_data_in_out instead of emitting an assignment to set it.
(bitint_large_huge::prepare_data_in_out): Add VAL_OUT argument.
If non-NULL, use it as PHI argument instead of creating a new
SSA_NAME.
(bitint_large_huge::handle_cast): Pass rext as another argument
to 2 prepare_data_in_out calls instead of emitting assignments
to set them.
* gcc.dg/bitint-53.c: New test.
|