riscv-gnu-toolchain/gcc.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author	Files	Lines
2023-12-14	A new copy propagation and PHI elimination pass	Filip Kastl	5	-0/+762
	This patch adds the strongly-connected copy propagation (SCCOPY) pass. It is a lightweight GIMPLE copy propagation pass that also removes some redundant PHI statements. It handles degenerate PHIs, e.g.: _5 = PHI <_1>; _6 = PHI <_6, _6, _1, _1>; _7 = PHI <16, _7>; // Replaces occurences of _5 and _6 by _1 and _7 by 16 It also handles more complicated situations, e.g.: _8 = PHI <_9, _10>; _9 = PHI <_8, _10>; _10 = PHI <_8, _9, _1>; // Replaces occurences of _8, _9 and _10 by _1 gcc/ChangeLog: * Makefile.in: Added sccopy pass. * passes.def: Added sccopy pass before LTO streaming and before RTL expansion. * tree-pass.h (make_pass_sccopy): Added sccopy pass. * gimple-ssa-sccopy.cc: New file. gcc/testsuite/ChangeLog: * gcc.dg/sccopy-1.c: New test. Signed-off-by: Filip Kastl <fkastl@suse.cz>
2023-12-14	SRA: Relax requirements to use build_reconstructed_reference (PR 111807)	Martin Jambor	1	-3/+7
	This patch half-reverts 3aaf704bca3e and replaces it with a fix with relaxed requiremets for invoking build_reconstructed_reference in build_ref_for_model. build_ref_for_model/build_ref_for_offset is used in two slightly different contexts. The first is when we are looking at an assignmernt like p->field_A.field_B = s.field_B; and we have a replacements for e.g. s.field_B.field_C.field_D and we want to store them directly to p->field_A.field_B.field_C.field_D (as opposed to going through s or using a MEM_REF based in p->field_A.field_B). In this case, the offset of the "model" (s.field_B.field_C.field_D) within this can be different than offset within the LHS that we want to reach (field_C.field_D within the "base" p->field_A.field_B). Patch 3aaf704bca3e has caused us to unnecessarily create MEM_REFs for these situations. These uses of build_ref_for_model work with the relaxed condition just fine. The second, problematic, context is when somewhere in the function we have an assignment s.field_A = t.field_A.field_B; and we are creating an access structure to represent s.field_A.field_B even if it is not actually accessed in the original input. This is done after scanning the entire function body and we need to construct a "universal" reference to s.field_A.field_B. In this case the "base" is "s" and it has to be the DECL itself and not some reference for it because for arbitrary references we need a GSI pointing to a statement which we don't have, the reference is supposed to be universal. But then using build_ref_for_model and within it build_reconstructed_reference misbihaves if the expression contains any ARRAY_REFs. In the first case those are fine because as we eventually reach the aggregate type that matches a real LHS or RHS, we know we we can just bolt the rest of the references onto it and end up with the correct overall reference. However when dealing with s.array[1].field_A = s.array[2].field_B; we cannot just bolt array[2] reference when we want array[1] but that is exactly what happens when we use build_reconstructed_reference and keep it walking all the way to s. I was consiering making all users of the second kind use directly build_ref_for_offset instead of build_ref_for_model but the latter also handles COMPONENT_REFs to bit-fields which the former does not. THerefore I have deided to use the NULL-ness of GSI as an indicator how strict we need to be. I have changed the function comment to reflect that. I have been able to observe diambiguation improvements with this patch over currenct master, we do successfuly manage a few more aliasing_component_refs_p disambiguations when compiling cc1, going from: Alias oracle query stats: refs_may_alias_p: 94354287 disambiguations, 106279231 queries ref_maybe_used_by_call_p: 1572511 disambiguations, 95618222 queries call_may_clobber_ref_p: 649273 disambiguations, 659371 queries stmt_kills_ref_p: 142342 kills, 8407309 queries nonoverlapping_component_refs_p: 19 disambiguations, 10227 queries nonoverlapping_refs_since_match_p: 15665 disambiguations, 52585 must overlaps, 68893 queries aliasing_component_refs_p: 67090 disambiguations, 3081766 queries TBAA oracle: 22675296 disambiguations 61781978 queries 14045969 are in alias set 0 10997085 queries asked about the same object 153 queries asked about the same alias set 0 access volatile 12485774 are dependent in the DAG 1577701 are aritificially in conflict with void * Modref stats: modref kill: 832 kills, 19399 queries modref use: 50760 disambiguations, 1825109 queries modref clobber: 1371014 disambiguations, 40152535 queries 5190238 tbaa queries (0.129263 per modref query) 1341663 base compares (0.033414 per modref query) PTA query stats: pt_solution_includes: 36784427 disambiguations, 46141175 queries pt_solutions_intersect: 4519387 disambiguations, 17081996 queries to: Alias oracle query stats: refs_may_alias_p: 94354083 disambiguations, 106278948 queries ref_maybe_used_by_call_p: 1572511 disambiguations, 95618018 queries call_may_clobber_ref_p: 649273 disambiguations, 659371 queries stmt_kills_ref_p: 142342 kills, 8407310 queries nonoverlapping_component_refs_p: 19 disambiguations, 10227 queries nonoverlapping_refs_since_match_p: 15665 disambiguations, 52585 must overlaps, 68893 queries aliasing_component_refs_p: 67104 disambiguations, 3081781 queries TBAA oracle: 22676608 disambiguations 61782455 queries 14044948 are in alias set 0 10998619 queries asked about the same object 153 queries asked about the same alias set 0 access volatile 12484882 are dependent in the DAG 1577245 are aritificially in conflict with void * Modref stats: modref kill: 832 kills, 19399 queries modref use: 50760 disambiguations, 1825106 queries modref clobber: 1371028 disambiguations, 40152504 queries 5190319 tbaa queries (0.129265 per modref query) 1341403 base compares (0.033408 per modref query) PTA query stats: pt_solution_includes: 36784449 disambiguations, 46141210 queries pt_solutions_intersect: 4519320 disambiguations, 17082083 queries gcc/ChangeLog: 2023-12-13 Martin Jambor <mjambor@suse.cz> PR tree-optimization/111807 * tree-sra.cc (build_ref_for_model): Allow offset smaller than model->offset when gsi is non-NULL. Adjust function comment.
2023-12-14	Force broadcast constant to mem for vec_dup{v4di,v8si,v4df,v8df} when ↵	liuhongt	4	-22/+62
	TARGET_AVX2 is not available. vpbroadcastd/vpbroadcastq is avaiable under TARGET_AVX2, but vec_dup{v4di,v8si} pattern is avaiable under AVX with memory operand. And it will cause LRA/Reload to generate spill and reload if we put constant in register. gcc/ChangeLog: PR target/112992 * config/i386/i386-expand.cc (ix86_convert_const_wide_int_to_broadcast): Don't convert to broadcast for vec_dup{v4di,v8si} when TARGET_AVX2 is not available. (ix86_broadcast_from_constant): Allow broadcast for V4DI/V8SI when !TARGET_AVX2 since it will be forced to memory later. (ix86_expand_vector_move): Force constant to mem for vec_dup{vssi,v4di} when TARGET_AVX2 is not available. gcc/testsuite/ChangeLog: * gcc.target/i386/pr100865-7a.c: Adjust testcase. * gcc.target/i386/pr100865-7c.c: Ditto. * gcc.target/i386/pr112992.c: New test.
2023-12-14	RISC-V: Add failed SLP testcase	Juzhe-Zhong	1	-0/+19
	After recent RVV cost model tweak, I found this PR issue has been fixed. Add testcase and committed. PR target/112387 gcc/testsuite/ChangeLog: * gcc.dg/vect/costmodel/riscv/rvv/pr112387.c: New test.
2023-12-14	tree-optimization/110640 - testcase for fixed bug	Richard Biener	1	-0/+22
	PR tree-optimization/110640 * gcc.dg/torture/pr110640.c: New testcase.
2023-12-14	testsuite: Fix up target-enter-data-1.c on 32-bit targets	Jakub Jelinek	1	-1/+1
	struct bar { int num_vectors; double vectors; }; is 16 bytes only on 64-bit targets, on 32-bit ones it is just 8 bytes, so the explicit matching of the 16 multiplication only works on the former. 2023-12-14 Jakub Jelinek <jakub@redhat.com> * c-c++-common/gomp/target-enter-data-1.c: Match also sizeof bar on 32-bit targets - 8 bytes - rather than just 16 bytes.
2023-12-14	testsuite: Fix up pr112904.C test [PR112904]	Jakub Jelinek	1	-0/+5
	On Fri, Dec 08, 2023 at 03:12:00PM +0800, liuhongt wrote: > * g++.target/i386/pr112904.C: New test. The new test FAILs on i686-linux and even on x86_64-linux I think it doesn't actually test what was reported, unless one performs testing with -march= for some XOP enabled CPU or -mxop. The following patch fixes that, tested on x86_64-linux with make check-g++ RUNTESTFLAGS='--target_board=unix\{-m32,-m32/-mno-sse/-mno-mmx,-m64\} i386.exp=pr112904.C' 2023-12-14 Jakub Jelinek <jakub@redhat.com> PR target/112904 * g++.target/i386/pr112904.C: Add dg-do compile, dg-options -mxop and for ia32 also dg-additional-options -mmmx.
2023-12-14	c++: Fix tinst_level::to_list [PR112968]	Jakub Jelinek	1	-2/+4
	With valgrind checking, there are various errors reported on some C++26 libstdc++ tests, like: ==2009913== Conditional jump or move depends on uninitialised value(s) ==2009913== at 0x914C59: gt_ggc_mx_lang_tree_node(void) (gt-cp-tree.h:107) ==2009913== by 0x8AB7A5: gt_ggc_mx_tinst_level(void) (gt-cp-pt.h:32) ==2009913== by 0xB89B25: ggc_mark_root_tab(ggc_root_tab const) (ggc-common.cc:75) ==2009913== by 0xB89DF4: ggc_mark_roots() (ggc-common.cc:104) ==2009913== by 0x9D6311: ggc_collect(ggc_collect) (ggc-page.cc:2227) ==2009913== by 0xDB70F6: execute_one_pass(opt_pass) (passes.cc:2738) ==2009913== by 0xDB721F: execute_pass_list_1(opt_pass) (passes.cc:2755) ==2009913== by 0xDB7258: execute_pass_list(function, opt_pass) (passes.cc:2766) ==2009913== by 0xA55525: cgraph_node::analyze() (cgraphunit.cc:695) ==2009913== by 0xA57CC7: analyze_functions(bool) (cgraphunit.cc:1248) ==2009913== by 0xA5890D: symbol_table::finalize_compilation_unit() (cgraphunit.cc:2555) ==2009913== by 0xEB02A1: compile_file() (toplev.cc:473) I think the problem is in the tinst_level::to_list optimization from 2018. That function returns a TREE_LIST with TREE_PURPOSE/TREE_VALUE filled in. Either it freshly allocates using build_tree_list (NULL, NULL); + stores TREE_PURPOSE/TREE_VALUE, that case is fine (the whole tree_list object is zeros, except for TREE_CODE set to TREE_LIST and TREE_PURPOSE/TREE_VALUE modified later; the above also means in particular TREE_TYPE of it is NULL and TREE_CHAIN is NULL and both are accessible/initialized even in valgrind annotations. Or it grabs a TREE_LIST node from a freelist. If defined(ENABLE_GC_CHECKING), the object is still all zeros except for TREE_CODE/TREE_PURPOSE/TREE_VALUE like in the fresh allocation case (but unlike the build_tree_list case in the valgrind annotations TREE_TYPE and TREE_CHAIN are marked as uninitialized). If !defined(ENABLE_GC_CHECKING), I believe the actual memory content is that everything but TREE_CODE/TREE_PURPOSE/TREE_VALUE/TREE_CHAIN is zeros and TREE_CHAIN is something random (whatever next entry is in the freelist, nothing overwrote it) and from valgrind POV again, TREE_TYPE and TREE_CHAIN are marked as uninitialized. When using the other freelist instantiations (pending_template and tinst_level) I believe everything is correct, from valgrind POV it marks the whole pending_template or tinst_level as uninitialized, but the caller initializes it all). One way to fix this would be let tinst_level::to_list not store just TREE_PURPOSE (ret) = tldcl; TREE_VALUE (ret) = targs; but also TREE_TYPE (ret) = NULL_TREE; TREE_CHAIN (ret) = NULL_TREE; Though, that seems like wasted effort in the build_tree_list case to me. So, the following patch instead does that TREE_CHAIN = NULL_TREE store only in the case where it isn't already done (and likewise for TREE_TYPE just to be sure) and marks both TREE_CHAIN and TREE_TYPE as initialized (the latter is at that spot, the former is because we never really touch TREE_TYPE of a TREE_LIST anywhere and so the NULL gets stored into the freelist and restored from there (except for ENABLE_GC_CHECKING where it is poisoned and then cleared again). 2023-12-14 Jakub Jelinek <jakub@redhat.com> PR c++/112968 pt.cc (freelist<tree_node>::reinit): Make whole obj->common defined for valgrind annotations rather than just obj->base, and do it even for ENABLE_GC_CHECKING. If not ENABLE_GC_CHECKING, clear TREE_CHAIN (obj) and TREE_TYPE (obj).
2023-12-14	RISC-V: Add RVV builtin vectorization cost model	Juzhe-Zhong	4	-3/+239
	This patch fixes PR11153: ble a1,zero,.L8 addiw a5,a1,-1 li a4,4 addi sp,sp,-16 mv a2,a0 sext.w a3,a1 bleu a5,a4,.L9 srliw a4,a3,2 slli a4,a4,4 mv a5,a0 add a4,a4,a0 vsetivli zero,4,e32,m1,ta,ma vmv.v.i v1,0 vse32.v v1,0(sp) .L4: vle32.v v1,0(a5) ---> This loop always processes 4 elements which is ok for VLEN = 128bits, but waste a huge amount of computation units when VLEN > 128bits vle32.v v2,0(sp) addi a5,a5,16 vadd.vv v1,v2,v1 vse32.v v1,0(sp) bne a4,a5,.L4 ld a5,0(sp) lw a4,0(sp) andi a1,a1,-4 srai a5,a5,32 addw a5,a4,a5 lw a4,8(sp) addw a5,a5,a4 ld a4,8(sp) srai a4,a4,32 addw a0,a5,a4 beq a3,a1,.L15 .L3: subw a3,a3,a1 slli a5,a1,32 slli a3,a3,32 srli a3,a3,32 srli a5,a5,30 add a2,a2,a5 vsetvli a5,a3,e8,mf4,tu,mu vsetvli a4,zero,e32,m1,ta,ma sub a1,a3,a5 vmv.v.i v1,0 vsetvli zero,a3,e32,m1,tu,ma vle32.v v2,0(a2) vmv.v.v v1,v2 bne a3,a5,.L21 .L7: vsetvli a4,zero,e32,m1,ta,ma vmv.s.x v2,zero vredsum.vs v1,v1,v2 vmv.x.s a5,v1 addw a0,a0,a5 .L15: addi sp,sp,16 jr ra .L21: slli a5,a5,2 add a2,a2,a5 vsetvli zero,a1,e32,m1,tu,ma vle32.v v2,0(a2) vadd.vv v1,v1,v2 j .L7 .L8: li a0,0 ret .L9: li a1,0 li a0,0 j .L3 The rootcause of this is we missed RVV builtin vectorization cost model. After this patch: ble a1,zero,.L4 vsetvli a5,zero,e32,m1,ta,ma vmv.v.i v1,0 .L3: vsetvli a5,a1,e32,m1,tu,ma vle32.v v2,0(a0) slli a4,a5,2 sub a1,a1,a5 add a0,a0,a4 vadd.vv v1,v2,v1 bne a1,zero,.L3 li a5,0 vsetivli zero,1,e32,m1,ta,ma vmv.s.x v2,a5 vsetvli a5,zero,e32,m1,ta,ma vredsum.vs v1,v1,v2 vmv.x.s a0,v1 ret .L4: li a0,0 ret PR target/111153 gcc/ChangeLog: * config/riscv/riscv-protos.h (struct common_vector_cost): New struct. (struct scalable_vector_cost): Ditto. (struct cpu_vector_cost): Ditto. * config/riscv/riscv-vector-costs.cc (costs::add_stmt_cost): Add RVV builtin vectorization cost * config/riscv/riscv.cc (struct riscv_tune_param): Ditto. (get_common_costs): New function. (riscv_builtin_vectorization_cost): Ditto. (TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST): New targethook. gcc/testsuite/ChangeLog: * gcc.dg/vect/costmodel/riscv/rvv/pr111153.c: New test.
2023-12-13	[committed] Minor testsuite fallout from c99 changes	Jeff Law	1	-0/+2
	The alpha port failed its weekly test due to a lack of a prototype for the syscall() routine. Fixed thusly and pushed to the trunk. gcc/testsuite * gcc.c-torture/execute/20001229-1.c: Prototype syscall().
2023-12-14	Daily bump.	GCC Administrator	14	-1/+922

2023-12-13	c++: fix cpp0x/constexpr-ex1.C in C++23	Marek Polacek	1	-1/+1
	Since r14-6505 I see: FAIL: g++.dg/cpp0x/constexpr-ex1.C -std=c++23 at line 91 (test for errors, line 89) FAIL: g++.dg/cpp0x/constexpr-ex1.C -std=c++23 (test for excess errors) FAIL: g++.dg/cpp0x/constexpr-ex1.C -std=c++26 at line 91 (test for errors, line 89) FAIL: g++.dg/cpp0x/constexpr-ex1.C -std=c++26 (test for excess errors) and it wasn't fixed by r14-6511. So I'm fixing it with the below. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/constexpr-ex1.C: Adjust expected diagnostic line.
2023-12-13	aarch64: SVE/NEON Bridging intrinsics	Richard Ball	58	-96/+1697
	ACLE has added intrinsics to bridge between SVE and Neon. The NEON_SVE Bridge adds intrinsics that allow conversions between NEON and SVE vectors. This patch adds support to GCC for the following 3 intrinsics: svset_neonq, svget_neonq and svdup_neonq gcc/ChangeLog: * config.gcc: Adds new header to config. * config/aarch64/aarch64-builtins.cc (enum aarch64_type_qualifiers): Moved to header file. (ENTRY): Likewise. (enum aarch64_simd_type): Likewise. (struct aarch64_simd_type_info): Remove static. (GTY): Likewise. * config/aarch64/aarch64-c.cc (aarch64_pragma_aarch64): Defines pragma for arm_neon_sve_bridge.h. * config/aarch64/aarch64-protos.h: Add handle_arm_neon_sve_bridge_h * config/aarch64/aarch64-sve-builtins-base.h: New intrinsics. * config/aarch64/aarch64-sve-builtins-base.cc (class svget_neonq_impl): New intrinsic implementation. (class svset_neonq_impl): Likewise. (class svdup_neonq_impl): Likewise. (NEON_SVE_BRIDGE_FUNCTION): New intrinsics. * config/aarch64/aarch64-sve-builtins-functions.h (NEON_SVE_BRIDGE_FUNCTION): Defines macro for NEON_SVE_BRIDGE functions. * config/aarch64/aarch64-sve-builtins-shapes.h: New shapes. * config/aarch64/aarch64-sve-builtins-shapes.cc (parse_element_type): Add NEON element types. (parse_type): Likewise. (struct get_neonq_def): Defines function shape for get_neonq. (struct set_neonq_def): Defines function shape for set_neonq. (struct dup_neonq_def): Defines function shape for dup_neonq. * config/aarch64/aarch64-sve-builtins.cc (DEF_SVE_TYPE_SUFFIX): Changed to be called through SVE_NEON macro. (DEF_SVE_NEON_TYPE_SUFFIX): Defines macro for NEON_SVE_BRIDGE type suffixes. (DEF_NEON_SVE_FUNCTION): Defines macro for NEON_SVE_BRIDGE functions. (function_resolver::infer_neon128_vector_type): Infers type suffix for overloaded functions. (handle_arm_neon_sve_bridge_h): Handles #pragma arm_neon_sve_bridge.h. * config/aarch64/aarch64-sve-builtins.def (DEF_SVE_NEON_TYPE_SUFFIX): Macro for handling neon_sve type suffixes. (bf16): Replace entry with neon-sve entry. (f16): Likewise. (f32): Likewise. (f64): Likewise. (s8): Likewise. (s16): Likewise. (s32): Likewise. (s64): Likewise. (u8): Likewise. (u16): Likewise. (u32): Likewise. (u64): Likewise. * config/aarch64/aarch64-sve-builtins.h (GCC_AARCH64_SVE_BUILTINS_H): Include aarch64-builtins.h. (ENTRY): Add aarch64_simd_type definiton. (enum aarch64_simd_type): Add neon information to type_suffix_info. (struct type_suffix_info): New function. * config/aarch64/aarch64-sve.md (@aarch64_sve_get_neonq_<mode>): New intrinsic insn for big endian. (@aarch64_sve_set_neonq_<mode>): Likewise. * config/aarch64/iterators.md: Add UNSPEC_SET_NEONQ. * config/aarch64/aarch64-builtins.h: New file. * config/aarch64/aarch64-neon-sve-bridge-builtins.def: New file. * config/aarch64/arm_neon_sve_bridge.h: New file. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Add include arm_neon_sve_bridge header file * gcc.dg/torture/neon-sve-bridge.c: New test. * gcc.target/aarch64/sve/acle/asm/dup_neonq_bf16.c: New test. * gcc.target/aarch64/sve/acle/asm/dup_neonq_f16.c: New test. * gcc.target/aarch64/sve/acle/asm/dup_neonq_f32.c: New test. * gcc.target/aarch64/sve/acle/asm/dup_neonq_f64.c: New test. * gcc.target/aarch64/sve/acle/asm/dup_neonq_s16.c: New test. * gcc.target/aarch64/sve/acle/asm/dup_neonq_s32.c: New test. * gcc.target/aarch64/sve/acle/asm/dup_neonq_s64.c: New test. * gcc.target/aarch64/sve/acle/asm/dup_neonq_s8.c: New test. * gcc.target/aarch64/sve/acle/asm/dup_neonq_u16.c: New test. * gcc.target/aarch64/sve/acle/asm/dup_neonq_u32.c: New test. * gcc.target/aarch64/sve/acle/asm/dup_neonq_u64.c: New test. * gcc.target/aarch64/sve/acle/asm/dup_neonq_u8.c: New test. * gcc.target/aarch64/sve/acle/asm/get_neonq_bf16.c: New test. * gcc.target/aarch64/sve/acle/asm/get_neonq_f16.c: New test. * gcc.target/aarch64/sve/acle/asm/get_neonq_f32.c: New test. * gcc.target/aarch64/sve/acle/asm/get_neonq_f64.c: New test. * gcc.target/aarch64/sve/acle/asm/get_neonq_s16.c: New test. * gcc.target/aarch64/sve/acle/asm/get_neonq_s32.c: New test. * gcc.target/aarch64/sve/acle/asm/get_neonq_s64.c: New test. * gcc.target/aarch64/sve/acle/asm/get_neonq_s8.c: New test. * gcc.target/aarch64/sve/acle/asm/get_neonq_u16.c: New test. * gcc.target/aarch64/sve/acle/asm/get_neonq_u32.c: New test. * gcc.target/aarch64/sve/acle/asm/get_neonq_u64.c: New test. * gcc.target/aarch64/sve/acle/asm/get_neonq_u8.c: New test. * gcc.target/aarch64/sve/acle/asm/set_neonq_bf16.c: New test. * gcc.target/aarch64/sve/acle/asm/set_neonq_f16.c: New test. * gcc.target/aarch64/sve/acle/asm/set_neonq_f32.c: New test. * gcc.target/aarch64/sve/acle/asm/set_neonq_f64.c: New test. * gcc.target/aarch64/sve/acle/asm/set_neonq_s16.c: New test. * gcc.target/aarch64/sve/acle/asm/set_neonq_s32.c: New test. * gcc.target/aarch64/sve/acle/asm/set_neonq_s64.c: New test. * gcc.target/aarch64/sve/acle/asm/set_neonq_s8.c: New test. * gcc.target/aarch64/sve/acle/asm/set_neonq_u16.c: New test. * gcc.target/aarch64/sve/acle/asm/set_neonq_u32.c: New test. * gcc.target/aarch64/sve/acle/asm/set_neonq_u64.c: New test. * gcc.target/aarch64/sve/acle/asm/set_neonq_u8.c: New test. * gcc.target/aarch64/sve/acle/general-c/dup_neonq_1.c: New test. * gcc.target/aarch64/sve/acle/general-c/get_neonq_1.c: New test. * gcc.target/aarch64/sve/acle/general-c/set_neonq_1.c: New test.
2023-12-13	c++: note other candidates when diagnosing deletedness	Patrick Palka	7	-1/+71
	With the previous two patches in place, we can now extend our deletedness diagnostic to note the other considered candidates, e.g.: deleted.C: In function 'int main()': deleted.C:10:4: error: use of deleted function 'void f(int)' 10 \| f(0); \| ~^~~ deleted.C:5:6: note: declared here 5 \| void f(int) = delete; \| ^ deleted.C:5:6: note: candidate: 'void f(int)' (deleted) deleted.C:6:6: note: candidate: 'void f(...)' 6 \| void f(...); \| ^ deleted.C:7:6: note: candidate: 'void f(int, int)' 7 \| void f(int, int); \| ^ deleted.C:7:6: note: candidate expects 2 arguments, 1 provided These notes are controlled by a new command line flag -fdiagnostics-all-candidates which also controls whether we note ignored candidates more generally. gcc/ChangeLog: * doc/invoke.texi (C++ Dialect Options): Document -fdiagnostics-all-candidates. gcc/c-family/ChangeLog: * c.opt: Add -fdiagnostics-all-candidates. gcc/cp/ChangeLog: * call.cc (print_z_candidates): Only print ignored candidates when -fdiagnostics-all-candidates is set, otherwise suggest the flag. (build_over_call): When diagnosing deletedness, note other candidates only if -fdiagnostics-all-candidates is set, otherwise suggest the flag. gcc/testsuite/ChangeLog: * g++.dg/overload/error6.C: Pass -fdiagnostics-all-candidates. * g++.dg/cpp0x/deleted16.C: New test. * g++.dg/cpp0x/deleted16a.C: New test. * g++.dg/overload/error6a.C: New test.
2023-12-13	c++: remember candidates that we ignored	Patrick Palka	3	-44/+131
	During overload resolution, we sometimes outright ignore a function in the overload set and leave no trace of it in the candidates list, for example when we find a perfect non-template candidate we discard all function templates, or when the callee is a template-id we discard all non-template functions. We should still however make note of these non-viable functions when diagnosing overload resolution failure, but that's not possible if they're not present in the returned candidates list. To that end, this patch reworks add_candidates to add such ignored functions to the list. The new rr_ignored rejection reason is somewhat of a catch-all; we could perhaps split it up into more specific rejection reasons, but I leave that as future work. gcc/cp/ChangeLog: * call.cc (enum rejection_reason_code): Add rr_ignored. (add_ignored_candidate): Define. (ignored_candidate_p): Define. (add_template_candidate_real): Do add_ignored_candidate instead of returning NULL. (splice_viable): Put ignored (non-viable) candidates last. (print_z_candidate): Handle ignored candidates. (build_new_function_call): Refine shortcut that calls cp_build_function_call_vec now that non-templates can appear in the candidate list for a template-id call. (add_candidates): Replace 'bad_fns' overload with 'bad_cands' candidate list. When not considering a candidate, add it to the list as an ignored candidate. Add all 'bad_cands' to the overload set as well. gcc/testsuite/ChangeLog: * g++.dg/diagnostic/param-type-mismatch-2.C: Rename template function test_7 that (maybe accidentally) shares the same name as its non-template callee. * g++.dg/overload/error6.C: New test.
2023-12-13	c++: sort candidates according to viability	Patrick Palka	2	-74/+115
	This patch: * changes splice_viable to move the non-viable candidates to the end of the list instead of removing them outright * makes tourney move the best candidate to the front of the candidate list * adjusts print_z_candidates to preserve our behavior of printing only viable candidates when diagnosing ambiguity * adds a parameter to print_z_candidates to control this default behavior (the follow-up patch will want to print all candidates when diagnosing deletedness) Thus after this patch we have access to the entire candidate list through the best viable candidate. This change also happens to fix diagnostics for the below testcase where we currently neglect to note the third candidate, since the presence of the two unordered non-strictly viable candidates causes splice_viable to prematurely get rid of the non-viable third candidate. gcc/cp/ChangeLog: * call.cc: Include "tristate.h". (splice_viable): Sort the candidate list according to viability. Don't remove non-viable candidates from the list. (print_z_candidates): Add defaulted only_viable_p parameter. By default only print non-viable candidates if there is no viable candidate. (tourney): Ignore non-viable candidates. Move the true champ to the front of the candidates list, and update 'candidates' to point to the front. Rename champ_compared_to_predecessor to previous_worse_champ. gcc/testsuite/ChangeLog: * g++.dg/overload/error5.C: New test.
2023-12-13	c++: unifying constants vs their type [PR99186, PR104867]	Patrick Palka	3	-0/+42
	When unifying constants we need to treat constants of different types but same value as different in light of auto template parameters since otherwise e.g. A<1> will unify with A<1u> (where A's template-head is template<auto>). This patch fixes this in a minimal way; it seems we could get away with just using template_args_equal here, as we do in the default case, or even just cp_tree_equal since the CONVERT_EXPR_P loop seems to be dead code, but that's a simplification we could consider during next stage 1. PR c++/99186 PR c++/104867 gcc/cp/ChangeLog: * pt.cc (unify) <case INTEGER_CST>: Compare types as well. gcc/testsuite/ChangeLog: * g++.dg/cpp1z/nontype-auto23.C: New test. * g++.dg/cpp1z/nontype-auto24.C: New test.
2023-12-13	c++: unifying FUNCTION_DECLs [PR93740]	Patrick Palka	2	-0/+28
	unify currently always returns success when unifying two FUNCTION_DECLs (due to the is_overloaded_fn deferment within the default case), which means for the below testcase we incorrectly unify &A::foo and &A::bar leading to deduction failure for the index_of calls due to a bogus base class ambiguity. This patch makes unify handle FUNCTION_DECL naturally like other decls. PR c++/93740 gcc/cp/ChangeLog: * pt.cc (unify) <case FUNCTION_DECL>: Handle it like FIELD_DECL and TEMPLATE_DECL. gcc/testsuite/ChangeLog: * g++.dg/template/ptrmem34.C: New test.
2023-12-13	c-family: rename warn_for_address_or_pointer_of_packed_member	Jason Merrill	5	-23/+19
	Following the last patch, let's rename the functions to reflect the change in behavior. gcc/c-family/ChangeLog: * c-warn.cc (check_address_or_pointer_of_packed_member): Rename to check_address_of_packed_member. (check_and_warn_address_or_pointer_of_packed_member): Rename to check_and_warn_address_of_packed_member. (warn_for_address_or_pointer_of_packed_member): Rename to warn_for_address_of_packed_member. * c-common.h: Adjust. gcc/c/ChangeLog: * c-typeck.cc (convert_for_assignment): Adjust call to warn_for_address_of_packed_member. gcc/cp/ChangeLog: * call.cc (convert_for_arg_passing) * typeck.cc (convert_for_assignment): Adjust call to warn_for_address_of_packed_member.
2023-12-13	c-family: -Waddress-of-packed-member and casts	Jason Merrill	8	-102/+19
	-Waddress-of-packed-member, in addition to the documented warning about actually taking the address of a packed member, also warns about casting from a pointer to a TYPE_PACKED type to a pointer to a type with greater alignment. This wrongly warns if the source is a pointer to enum when -fshort-enums is on, since that is also represented by TYPE_PACKED. And there's already -Wcast-align to catch casting from pointer to less aligned type (packed or otherwise) to pointer to more aligned type; even apart from the enum problem, this seems like a somewhat arbitrary subset of that warning. So, this patch removes the undocumented type-based warning from -Waddress-of-packed-member. Some of the tests where the warning is desirable I changed to use -Wcast-align=strict instead. The ones that require -Wno-incompatible-pointer-types I just removed. gcc/c-family/ChangeLog: * c-warn.cc (check_address_or_pointer_of_packed_member): Remove warning based on TYPE_PACKED. gcc/testsuite/ChangeLog: * c-c++-common/Waddress-of-packed-member-1.c: Don't expect a warning on the cast cases. * c-c++-common/pr51628-35.c: Use -Wcast-align=strict. * g++.dg/warn/Waddress-of-packed-member3.C: Likewise. * gcc.dg/pr88928.c: Likewise. * gcc.dg/pr51628-20.c: Removed. * gcc.dg/pr51628-21.c: Removed. * gcc.dg/pr51628-25.c: Removed.
2023-12-13	OpenMP: Pointers and member mappings	Julian Brown	22	-88/+1220
	This patch changes the mapping node arrangement used for array components of derived types in order to accommodate for changes made in the previous patch, particularly the use of "GOMP_MAP_ATTACH_DETACH" for pointer-typed derived-type members instead of "GOMP_MAP_ALWAYS_POINTER". We change the mapping nodes used for a derived-type mapping like this: type T integer, pointer, dimension(:) :: arrptr end type T type(T) :: tvar [...] !$omp target map(tofrom: tvar%arrptr) So that the nodes used look like this: 1) map(to: tvar%arrptr) --> GOMP_MAP_TO [implicit] tvar%arrptr%data (the array data) GOMP_MAP_TO_PSET tvar%arrptr (the descriptor) GOMP_MAP_ATTACH_DETACH tvar%arrptr%data 2) map(tofrom: tvar%arrptr(3:8) --> GOMP_MAP_TOFROM tvar%arrptr%data(3) (size 8-3+1, etc.) GOMP_MAP_TO_PSET tvar%arrptr GOMP_MAP_ATTACH_DETACH tvar%arrptr%data (bias 3, etc.) In this case, we can determine in the front-end that the whole-array/pointer mapping (1) is only needed to map the pointer -- so we drop it entirely. (Note also that we set -- early -- the OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P flag for whole-array-via-pointer mappings. See below.) In the middle end, we process mappings using the struct sibling-list handling machinery by moving the "GOMP_MAP_TO_PSET" node from the middle of the group of three mapping nodes to the proper sorted position after the GOMP_MAP_STRUCT mapping: GOMP_MAP_STRUCT tvar (len: 1) GOMP_MAP_TO_PSET tvar%arr (size: 64, etc.) <--. moved here [...] \| GOMP_MAP_TOFROM tvar%arrptr%data(3) ___\| GOMP_MAP_ATTACH_DETACH tvar%arrptr%data In another case, if we have an array of derived-type values "dtarr", and mappings like: i = 1 j = 1 map(to: dtarr(i)%arrptr) map(tofrom: dtarr(j)%arrptr(3:8)) We still map the same way, but this time we cannot prove that the base expressions "dtarr(i) and "dtarr(j)" are the same in the front-end. So we keep both mappings, but we move the "[implicit]" mapping of the full-array reference to the end of the clause list in gimplify.cc (by adjusting the topological sorting algorithm): GOMP_MAP_STRUCT dtvar (len: 2) GOMP_MAP_TO_PSET dtvar(i)%arrptr GOMP_MAP_TO_PSET dtvar(j)%arrptr [...] GOMP_MAP_TOFROM dtvar(j)%arrptr%data(3) (size: 8-3+1) GOMP_MAP_ATTACH_DETACH dtvar(j)%arrptr%data GOMP_MAP_TO [implicit] dtvar(i)%arrptr%data(1) (size: whole array) GOMP_MAP_ATTACH_DETACH dtvar(i)%arrptr%data Always moving "[implicit]" full-array mappings after array-section mappings (without that bit set) means that we'll avoid copying the whole array unnecessarily -- even in cases where we can't prove that the arrays are the same. The patch also fixes some bugs with "enter data" and "exit data" directives with this new mapping arrangement. Also now if you have mappings like this: #pragma omp target enter data map(to: dv, dv%arr(1:20)) The whole of the derived-type variable "dv" is mapped, so the GOMP_MAP_TO_PSET for the array-section mapping can be dropped: GOMP_MAP_TO dv GOMP_MAP_TO dv%arr%data GOMP_MAP_TO_PSET dv%arr <-- deleted (array section mapping) GOMP_MAP_ATTACH_DETACH dv%arr%data To accommodate for recent changes to mapping nodes made by Tobias, this version of the patch avoids using GOMP_MAP_TO_PSET for "exit data" directives, in favour of using the "correct" GOMP_MAP_RELEASE/GOMP_MAP_DELETE kinds during early expansion. A new flag is introduced so the middle-end knows when the latter two kinds are being used specifically for an array descriptor. This version of the patch fixes "omp target exit data" handling for GOMP_MAP_DELETE, and adds pretty-printing dump output for the OMP_CLAUSE_RELEASE_DESCRIPTOR flag (for a little extra clarity). Also I noticed the handling of descriptors on OpenACC exit-data directives was inconsistent, so I've made those use GOMP_MAP_RELEASE/GOMP_MAP_DELETE with the new flag in the same way as OpenMP too. In the end it doesn't actually matter to the runtime, which handles GOMP_MAP_RELEASE/GOMP_MAP_DELETE/GOMP_MAP_TO_PSET for array descriptors on OpenACC "exit data" directives the same, anyway, and doing it this way in the FE avoids needless divergence. I've added a couple of new tests (gomp/target-enter-exit-data.f90 and goacc/enter-exit-data-2.f90). 2023-12-07 Julian Brown <julian@codesourcery.com> gcc/fortran/ * dependency.cc (gfc_omp_expr_prefix_same): New function. * dependency.h (gfc_omp_expr_prefix_same): Add prototype. * gfortran.h (gfc_omp_namelist): Add "duplicate_of" field to "u2" union. * trans-openmp.cc (dependency.h): Include. (gfc_trans_omp_array_section): Adjust mapping node arrangement for array descriptors. Use GOMP_MAP_TO_PSET or GOMP_MAP_RELEASE/GOMP_MAP_DELETE with the OMP_CLAUSE_RELEASE_DESCRIPTOR flag set. (gfc_symbol_rooted_namelist): New function. (gfc_trans_omp_clauses): Check subcomponent and subarray/element accesses elsewhere in the clause list for pointers to derived types or array descriptors, and adjust or drop mapping nodes appropriately. Adjust for changes to mapping node arrangement. (gfc_trans_oacc_executable_directive): Pass code op through. gcc/ * gimplify.cc (omp_map_clause_descriptor_p): New function. (build_omp_struct_comp_nodes, omp_get_attachment, omp_group_base): Use above function. (omp_tsort_mapping_groups): Process nodes that have OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P set after those that don't. Add enter_exit_data parameter. (omp_resolve_clause_dependencies): Remove GOMP_MAP_TO_PSET mappings if we're mapping the whole containing derived-type variable. (omp_accumulate_sibling_list): Adjust GOMP_MAP_TO_PSET handling. Remove GOMP_MAP_ALWAYS_POINTER handling. (gimplify_scan_omp_clauses): Pass enter_exit argument to omp_tsort_mapping_groups. Don't adjust/remove GOMP_MAP_TO_PSET mappings for derived-type components here. * tree.h (OMP_CLAUSE_RELEASE_DESCRIPTOR): New macro. * tree-pretty-print.cc (dump_omp_clause): Show OMP_CLAUSE_RELEASE_DESCRIPTOR in dump output (with GOMP_MAP_TO_PSET-like syntax). gcc/testsuite/ * gfortran.dg/goacc/enter-exit-data-2.f90: New test. * gfortran.dg/goacc/finalize-1.f: Adjust scan output. * gfortran.dg/gomp/map-9.f90: Adjust scan output. * gfortran.dg/gomp/map-subarray-2.f90: New test. * gfortran.dg/gomp/map-subarray.f90: New test. * gfortran.dg/gomp/target-enter-exit-data.f90: New test. libgomp/ * testsuite/libgomp.fortran/map-subarray.f90: New test. * testsuite/libgomp.fortran/map-subarray-2.f90: New test. * testsuite/libgomp.fortran/map-subarray-3.f90: New test. * testsuite/libgomp.fortran/map-subarray-4.f90: New test. * testsuite/libgomp.fortran/map-subarray-6.f90: New test. * testsuite/libgomp.fortran/map-subarray-7.f90: New test. * testsuite/libgomp.fortran/map-subarray-8.f90: New test. * testsuite/libgomp.fortran/map-subcomponents.f90: New test. * testsuite/libgomp.fortran/struct-elem-map-1.f90: Adjust for descriptor-mapping changes. Remove XFAIL.
2023-12-13	OpenMP/OpenACC: Rework clause expansion and nested struct handling	Julian Brown	44	-820/+7216
	This patch reworks clause expansion in the C, C++ and (to a lesser extent) Fortran front ends for OpenMP and OpenACC mapping nodes used in GPU offloading support. At present a single clause may be turned into several mapping nodes, or have its mapping type changed, in several places scattered through the front- and middle-end. The analysis relating to which particular transformations are needed for some given expression has become quite hard to follow. Briefly, we manipulate clause types in the following places: 1. During parsing, in c_omp_adjust_map_clauses. Depending on a set of rules, we may change a FIRSTPRIVATE_POINTER (etc.) mapping into ATTACH_DETACH, or mark the decl addressable. 2. In semantics.cc or c-typeck.cc, clauses are expanded in handle_omp_array_sections (called via {c_}finish_omp_clauses, or in finish_omp_clauses itself. The two cases are for processing array sections (the former), or non-array sections (the latter). 3. In gimplify.cc, we build sibling lists for struct accesses, which groups and sorts accesses along with their struct base, creating new ALLOC/RELEASE nodes for pointers. 4. In gimplify.cc:gimplify_adjust_omp_clauses, mapping nodes may be adjusted or created. This patch doesn't completely disrupt this scheme, though clause types are no longer adjusted in c_omp_adjust_map_clauses (step 1). Clause expansion in step 2 (for C and C++) now uses a single, unified mechanism, parts of which are also reused for analysis in step 3. Rather than the kind-of "ad-hoc" pattern matching on addresses used to expand clauses used at present, a new method for analysing addresses is introduced. This does a recursive-descent tree walk on expression nodes, and emits a vector of tokens describing each "part" of the address. This tokenized address can then be translated directly into mapping nodes, with the assurance that no part of the expression has been inadvertently skipped or misinterpreted. In this way, all the variations of ways pointers, arrays, references and component accesses might be combined can be teased apart into easily-understood cases - and we know we've "parsed" the whole address before we start analysis, so the right code paths can easily be selected. For example, a simple access "arr[idx]" might parse as: base-decl access-indexed-array or "mystruct->foo[x]" with a pointer "foo" component might parse as: base-decl access-pointer component-selector access-pointer A key observation is that support for "array" bases, e.g. accesses whose root nodes are not structures, but describe scalars or arrays, and also one-level deep structure accesses, have first-class support in gimplify and beyond. Expressions that use deeper struct accesses or e.g. multiple indirections were more problematic: some cases worked, but lots of cases didn't. This patch reimplements the support for those in gimplify.cc, again using the new "address tokenization" support. An expression like "mystruct->foo->bar[0:10]" used in a mapping node will translate the right-hand access directly in the front-end. The base for the access will be "mystruct->foo". This is handled recursively in gimplify.cc -- there may be several accesses of "mystruct"'s members on the same directive, so the sibling-list building machinery can be used again. (This was already being done for OpenACC, but the new implementation differs somewhat in details, and is more robust.) For OpenMP, in the case where the base pointer itself, i.e. "mystruct->foo" here, is NOT mapped on the same directive, we create a "fragile" mapping. This turns the "foo" component access into a zero-length allocation (which is a new feature for the runtime, so support has been added there too). A couple of changes have been made to how mapping clauses are turned into mapping nodes: The first change is based on the observation that it is probably never correct to use GOMP_MAP_ALWAYS_POINTER for component accesses (e.g. for references), because if the containing struct is already mapped on the target then the host version of the pointer in question will be corrupted if the struct is copied back from the target. This patch removes all such uses, across each of C, C++ and Fortran. The second change is to the way that GOMP_MAP_ATTACH_DETACH nodes are processed during sibling-list creation. For OpenMP, for pointer components, we must map the base pointer separately from an array section that uses the base pointer, so e.g. we must have both "map(mystruct.base)" and "map(mystruct.base[0:10])" mappings. These create nodes such as: GOMP_MAP_TOFROM mystruct.base G_M_TOFROM mystruct.base [len: 10elemsize] G_M_ATTACH_DETACH mystruct.base Instead of using the first of these directly when building the struct sibling list then skipping the group using GOMP_MAP_ATTACH_DETACH, leading to: GOMP_MAP_STRUCT mystruct [len: 1] GOMP_MAP_TOFROM mystruct.base we now introduce a new "mini-pass", omp_resolve_clause_dependencies, that drops the GOMP_MAP_TOFROM for the base pointer, marks the second group as having had a base-pointer mapping, then omp_build_struct_sibling_lists can create: GOMP_MAP_STRUCT mystruct [len: 1] GOMP_MAP_ALLOC mystruct.base [len: ptrsize] This ends up working better in many cases, particularly those involving references. (The "alloc" space is immediately overwritten by a pointer attachment, so this is mildly more efficient than a redundant TO mapping at runtime also.) There is support in the address tokenizer for "arbitrary" base expressions which aren't rooted at a decl, but that is not used as present because such addresses are disallowed at parse time. In the front-ends, the address tokenization machinery is mostly only used for clause expansion and not for diagnostics at present. It could be used for those too, which would allow more of my previous "address inspector" implementation to be removed. The new bits in gimplify.cc work with OpenACC also. This version of the patch addresses several first-pass review comments from Tobias, and fixes a few previously-missed cases for manually-managed ragged array mappings (including cases using references). Some arbitrary differences between handling of clause expansion for C vs. C++ have also been fixed, and some fragments from later in the patch series have been moved forward (where they were useful for fixing bugs). Several new test cases have been added. 2023-11-29 Julian Brown <julian@codesourcery.com> gcc/c-family/ * c-common.h (c_omp_region_type): Add C_ORT_EXIT_DATA, C_ORT_OMP_EXIT_DATA and C_ORT_ACC_TARGET. (omp_addr_token): Add forward declaration. (c_omp_address_inspector): New class. * c-omp.cc (c_omp_adjust_map_clauses): Mark decls addressable here, but do not change any mapping node types. (c_omp_address_inspector::unconverted_ref_origin, c_omp_address_inspector::component_access_p, c_omp_address_inspector::check_clause, c_omp_address_inspector::get_root_term, c_omp_address_inspector::map_supported_p, c_omp_address_inspector::get_origin, c_omp_address_inspector::maybe_unconvert_ref, c_omp_address_inspector::maybe_zero_length_array_section, c_omp_address_inspector::expand_array_base, c_omp_address_inspector::expand_component_selector, c_omp_address_inspector::expand_map_clause): New methods. (omp_expand_access_chain): New function. gcc/c/ * c-parser.cc (c_parser_oacc_all_clauses): Add TARGET_P parameter. Use to select region type for c_finish_omp_clauses call. (c_parser_oacc_loop): Update calls to c_parser_oacc_all_clauses. (c_parser_oacc_compute): Likewise. (c_parser_omp_target_data, c_parser_omp_target_enter_data): Support ATTACH kind. (c_parser_omp_target_exit_data): Support DETACH kind. (check_clauses): Handle GOMP_MAP_POINTER and GOMP_MAP_ATTACH here. * c-typeck.cc (handle_omp_array_sections_1, handle_omp_array_sections, c_finish_omp_clauses): Use c_omp_address_inspector class and OMP address tokenizer to analyze and expand map clause expressions. Fix some diagnostics. Fix "is OpenACC" condition for C_ORT_ACC_TARGET addition. gcc/cp/ * parser.cc (cp_parser_oacc_all_clauses): Add TARGET_P parameter. Use to select region type for finish_omp_clauses call. (cp_parser_omp_target_data, cp_parser_omp_target_enter_data): Support GOMP_MAP_ATTACH kind. (cp_parser_omp_target_exit_data): Support GOMP_MAP_DETACH kind. (cp_parser_oacc_declare): Update call to cp_parser_oacc_all_clauses. (cp_parser_oacc_loop): Update calls to cp_parser_oacc_all_clauses. (cp_parser_oacc_compute): Likewise. * pt.cc (tsubst_expr): Use C_ORT_ACC_TARGET for call to tsubst_omp_clauses for OpenACC compute regions. * semantics.cc (cp_omp_address_inspector): New class, derived from c_omp_address_inspector. (handle_omp_array_sections_1, handle_omp_array_sections, finish_omp_clauses): Use cp_omp_address_inspector class and OMP address tokenizer to analyze and expand OpenMP map clause expressions. Fix some diagnostics. Support C_ORT_ACC_TARGET. (finish_omp_target): Handle GOMP_MAP_POINTER. gcc/fortran/ * trans-openmp.cc (gfc_trans_omp_array_section): Add OPENMP parameter. Use GOMP_MAP_ATTACH_DETACH instead of GOMP_MAP_ALWAYS_POINTER for derived type components. (gfc_trans_omp_clauses): Update calls to gfc_trans_omp_array_section. gcc/ * gimplify.cc (build_struct_comp_nodes): Don't process GOMP_MAP_ATTACH_DETACH "middle" nodes here. (omp_mapping_group): Add REPROCESS_STRUCT and FRAGILE booleans for nested struct handling. (omp_strip_components_and_deref, omp_strip_indirections): Remove functions. (omp_get_attachment): Handle GOMP_MAP_DETACH here. (omp_group_last): Handle GOMP_MAP_, GOMP_MAP_DETACH, GOMP_MAP_ATTACH_DETACH groups for "exit data" of reference-to-pointer component array sections. (omp_gather_mapping_groups_1): Initialise reprocess_struct and fragile fields. (omp_group_base): Handle GOMP_MAP_ATTACH_DETACH after GOMP_MAP_STRUCT. (omp_index_mapping_groups_1): Skip reprocess_struct groups. (omp_get_nonfirstprivate_group, omp_directive_maps_explicitly, omp_resolve_clause_dependencies, omp_first_chained_access_token): New functions. (omp_check_mapping_compatibility): Adjust accepted node combinations for "from" clauses using release instead of alloc. (omp_accumulate_sibling_list): Add GROUP_MAP, ADDR_TOKENS, FRAGILE_P, REPROCESSING_STRUCT, ADDED_TAIL parameters. Use OMP address tokenizer to analyze addresses. Reimplement nested struct handling, and implement "fragile groups". (omp_build_struct_sibling_lists): Adjust for changes to omp_accumulate_sibling_list. Recalculate bias for ATTACH_DETACH nodes after GOMP_MAP_STRUCT nodes. (gimplify_scan_omp_clauses): Call omp_resolve_clause_dependencies. Use OMP address tokenizer. (gimplify_adjust_omp_clauses_1): Use build_fold_indirect_ref_loc instead of build_simple_mem_ref_loc. omp-general.cc (omp-general.h, tree-pretty-print.h): Include. (omp_addr_tokenizer): New namespace. (omp_addr_tokenizer::omp_addr_token): New. (omp_addr_tokenizer::omp_parse_component_selector, omp_addr_tokenizer::omp_parse_ref, omp_addr_tokenizer::omp_parse_pointer, omp_addr_tokenizer::omp_parse_access_method, omp_addr_tokenizer::omp_parse_access_methods, omp_addr_tokenizer::omp_parse_structure_base, omp_addr_tokenizer::omp_parse_structured_expr, omp_addr_tokenizer::omp_parse_array_expr, omp_addr_tokenizer::omp_access_chain_p, omp_addr_tokenizer::omp_accessed_addr): New functions. (omp_parse_expr, debug_omp_tokenized_addr): New functions. * omp-general.h (omp_addr_tokenizer::access_method_kinds, omp_addr_tokenizer::structure_base_kinds, omp_addr_tokenizer::token_type, omp_addr_tokenizer::omp_addr_token, omp_addr_tokenizer::omp_access_chain_p, omp_addr_tokenizer::omp_accessed_addr): New. (omp_addr_token, omp_parse_expr): New. * omp-low.cc (scan_sharing_clauses): Skip error check for references to pointers. * tree.h (OMP_CLAUSE_ATTACHMENT_MAPPING_ERASED): New macro. gcc/testsuite/ * c-c++-common/gomp/clauses-2.c: Fix error output. * c-c++-common/gomp/target-implicit-map-2.c: Adjust scan output. * c-c++-common/gomp/target-50.c: Adjust scan output. * c-c++-common/gomp/target-enter-data-1.c: Adjust scan output. * g++.dg/gomp/static-component-1.C: New test. * gcc.dg/gomp/target-3.c: Adjust scan output. * gfortran.dg/gomp/map-9.f90: Adjust scan output. libgomp/ * target.c (gomp_map_pointer): Modify zero-length array section pointer handling. (gomp_attach_pointer): Likewise. (gomp_map_fields_existing): Use gomp_map_0len_lookup. (gomp_attach_pointer): Allow attaching null pointers (or Fortran "unassociated" pointers). (gomp_map_vars_internal): Handle zero-sized struct members. Add diagnostic for unmapped struct pointer members. * testsuite/libgomp.c-c++-common/baseptrs-1.c: New test. * testsuite/libgomp.c-c++-common/baseptrs-2.c: New test. * testsuite/libgomp.c-c++-common/baseptrs-6.c: New test. * testsuite/libgomp.c-c++-common/baseptrs-7.c: New test. * testsuite/libgomp.c-c++-common/ptr-attach-2.c: New test. * testsuite/libgomp.c-c++-common/target-implicit-map-2.c: Fix missing "free". * testsuite/libgomp.c-c++-common/target-implicit-map-5.c: New test. * testsuite/libgomp.c-c++-common/target-map-zlas-1.c: New test. * testsuite/libgomp.c++/class-array-1.C: New test. * testsuite/libgomp.c++/baseptrs-3.C: New test. * testsuite/libgomp.c++/baseptrs-4.C: New test. * testsuite/libgomp.c++/baseptrs-5.C: New test. * testsuite/libgomp.c++/baseptrs-8.C: New test. * testsuite/libgomp.c++/baseptrs-9.C: New test. * testsuite/libgomp.c++/ref-mapping-1.C: New test. * testsuite/libgomp.c++/target-48.C: New test. * testsuite/libgomp.c++/target-49.C: New test. * testsuite/libgomp.c++/target-exit-data-reftoptr-1.C: New test. * testsuite/libgomp.c++/target-lambda-1.C: Update for OpenMP 5.2 semantics. * testsuite/libgomp.c++/target-this-3.C: Likewise. * testsuite/libgomp.c++/target-this-4.C: Likewise. * testsuite/libgomp.fortran/struct-elem-map-1.f90: Add temporary XFAIL. * testsuite/libgomp.fortran/target-enter-data-6.f90: Likewise.
2023-12-13	OpenMP/OpenACC: Reindent TO/FROM/_CACHE_ stanza in {c_}finish_omp_clause	Julian Brown	2	-677/+688
	This patch trivially adds braces and reindents the OMP_CLAUSE_TO/OMP_CLAUSE_FROM/OMP_CLAUSE__CACHE_ stanza in c_finish_omp_clause and finish_omp_clause, in preparation for the following patch (to clarify the diff a little). 2022-09-13 Julian Brown <julian@codesourcery.com> gcc/c/ * c-typeck.cc (c_finish_omp_clauses): Add braces and reindent OMP_CLAUSE_TO/OMP_CLAUSE_FROM/OMP_CLAUSE__CACHE_ stanza. gcc/cp/ * semantics.cc (finish_omp_clause): Add braces and reindent OMP_CLAUSE_TO/OMP_CLAUSE_FROM/OMP_CLAUSE__CACHE_ stanza.
2023-12-13	libcpp: Fix valgrind errors on pr88974.c [PR112956]	Jakub Jelinek	1	-1/+4
	On the c-c++-common/cpp/pr88974.c testcase I'm seeing ==600549== Conditional jump or move depends on uninitialised value(s) ==600549== at 0x1DD3A05: cpp_get_token_1(cpp_reader, unsigned int) (macro.cc:3050) ==600549== by 0x1DBFC7F: _cpp_parse_expr (expr.cc:1392) ==600549== by 0x1DB9471: do_if(cpp_reader) (directives.cc:2087) ==600549== by 0x1DBB4D8: _cpp_handle_directive (directives.cc:572) ==600549== by 0x1DCD488: _cpp_lex_token (lex.cc:3682) ==600549== by 0x1DD3A97: cpp_get_token_1(cpp_reader, unsigned int) (macro.cc:2936) ==600549== by 0x7F7EE4: scan_translation_unit (c-ppoutput.cc:350) ==600549== by 0x7F7EE4: preprocess_file(cpp_reader) (c-ppoutput.cc:106) ==600549== by 0x7F6235: c_common_init() (c-opts.cc:1280) ==600549== by 0x704C8B: lang_dependent_init (toplev.cc:1837) ==600549== by 0x704C8B: do_compile (toplev.cc:2135) ==600549== by 0x704C8B: toplev::main(int, char*) (toplev.cc:2306) ==600549== by 0x7064BA: main (main.cc:39) error. The problem is that _cpp_lex_direct can leave result->src_loc uninitialized in some cases and later on we use that location_t. _cpp_lex_direct essentially does: cppchar_t c; ... cpp_token result = pfile->cur_token++; fresh_line: result->flags = 0; ... if (buffer->need_line) { if (pfile->state.in_deferred_pragma) { result->type = CPP_PRAGMA_EOL; ... // keeps result->src_loc uninitialized; return result; } if (!_cpp_get_fresh_line (pfile)) { result->type = CPP_EOF; if (!pfile->state.in_directive && !pfile->state.parsing_args) { result->src_loc = pfile->line_table->highest_line; ... } ... // otherwise result->src_loc is sometimes uninitialized here return result; } ... } ... result->src_loc = pfile->line_table->highest_line; ... c = buffer->cur++; switch (c) { ... case '\n': ... buffer->need_line = true; if (pfile->state.in_deferred_pragma) { result->type = CPP_PRAGMA_EOL; ... return result; } goto fresh_line; ... } ... So, if _cpp_lex_direct is called without buffer->need_line initially set, result->src_loc is always initialized (and actually hundreds of tests rely on that exact value it has), even when c == '\n' and we set that flag later on and goto fresh_line. For CPP_PRAGMA_EOL case we have in that case separate handling and don't goto. But if _cpp_lex_direct is called with buffer->need_line initially set and either decide to return a CPP_PRAGMA_EOL token or if getting a new line fails for some reason and we return an CPP_ERROR token and we are in directive or parsing args state, it is kept uninitialized and can be whatever the allocation left it there as. The following patch attempts to keep the status quo, use value that was returned previously if it was initialized (i.e. we went through the goto fresh_line; statement in c == '\n' handling) and only initialize result->src_loc if it was uninitialized before. 2023-12-13 Jakub Jelinek <jakub@redhat.com> PR preprocessor/112956 lex.cc (_cpp_lex_direct): Initialize c to 0. For CPP_PRAGMA_EOL tokens and if c == 0 also for CPP_EOF set result->src_loc to highest locus.
2023-12-13	Fix 'libgomp/config/linux/allocator.c' 'size_t' vs. '%ld' format string mismatch	Thomas Schwinge	1	-2/+10
	Fix-up for commit 348874f0baac0f22c98ab11abbfa65fd172f6bdd "libgomp: basic pinned memory on Linux", which may result in build failures as follow, for example, for the '-m32' multilib of x86_64-pc-linux-gnu: In file included from [...]/source-gcc/libgomp/config/linux/allocator.c:31: [...]/source-gcc/libgomp/config/linux/allocator.c: In function ‘linux_memspace_alloc’: [...]/source-gcc/libgomp/config/linux/allocator.c:70:26: error: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has type ‘size_t’ {aka ‘unsigned int’} [-Werror=format=] 70 \| gomp_debug (0, "libgomp: failed to pin %ld bytes of" \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 71 \| " memory (ulimit too low?)\n", size); \| ~~~~ \| \| \| size_t {aka unsigned int} [...]/source-gcc/libgomp/libgomp.h:186:29: note: in definition of macro ‘gomp_debug’ 186 \| (gomp_debug) ((KIND), __VA_ARGS__); \ \| ^~~~~~~~~~~ [...]/source-gcc/libgomp/config/linux/allocator.c:70:52: note: format string is defined here 70 \| gomp_debug (0, "libgomp: failed to pin %ld bytes of" \| ~~^ \| \| \| long int \| %d cc1: all warnings being treated as errors make[9]: *** [allocator.lo] Error 1 make[9]: Leaving directory `[...]/build-gcc/x86_64-pc-linux-gnu/32/libgomp' [...] Fix this in the same way as used elsewhere in libgomp. libgomp/ * config/linux/allocator.c (linux_memspace_alloc): Fix 'size_t' vs. '%ld' format string mismatch.
2023-12-13	c++: TARGET_EXPR location in default arg [PR96997]	Jason Merrill	2	-0/+13
	My r14-6505-g52b4b7d7f5c7c0 change to copy the location in build_aggr_init_expr reopened PR96997; let's fix it properly this time, by clearing the location like we do for other trees. PR c++/96997 gcc/cp/ChangeLog: * tree.cc (bot_manip): Check data.clear_location for TARGET_EXPR. gcc/testsuite/ChangeLog: * g++.dg/debug/cleanup2.C: New test.
2023-12-13	Revert "testsuite: fix g++.dg/pr112822.C"	Jason Merrill	1	-1/+0
	This reverts commit d2b269ce30d77dbfc6c28c75887c330d4698b132.
2023-12-13	PR modula2/112921 missing modules shortreal shortstr shortconv convstringshort	Gaius Mulley	11	-15/+1020
	For completeness here are three SHORTREAL modules which match their LONGREAL and REAL counterparts. The datatype SHORTREAL is a GNU extension and these modules were missing. gcc/m2/ChangeLog: PR modula2/112921 * gm2-libs-iso/ConvStringShort.def: New file. * gm2-libs-iso/ConvStringShort.mod: New file. * gm2-libs-iso/ShortConv.def: New file. * gm2-libs-iso/ShortConv.mod: New file. * gm2-libs-iso/ShortMath.def: New file. * gm2-libs-iso/ShortMath.mod: New file. * gm2-libs-iso/ShortStr.def: New file. * gm2-libs-iso/ShortStr.mod: New file. libgm2/ChangeLog: PR modula2/112921 * libm2iso/Makefile.am (M2DEFS): Add ConvStringShort.def, ShortConv.def, ShortMath.def and ShortStr.def. (M2MODS): Add ConvStringShort.mod, ShortConv.mod, ShortMath.mod and ShortStr.mod. * libm2iso/Makefile.in: Regenerate. gcc/testsuite/ChangeLog: PR modula2/112921 * gm2/iso/run/pass/shorttest.mod: New test. Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
2023-12-13	c++: End lifetime of objects in constexpr after destructor call [PR71093]	Nathaniel Shead	10	-28/+284
	This patch adds checks for using objects after they've been manually destroyed via explicit destructor call. Currently this is only implemented for 'top-level' objects; FIELD_DECLs and individual elements of arrays will need a lot more work to track correctly and are left for a future patch. The other limitation is that destruction of parameter objects is checked too 'early', happening at the end of the function call rather than the end of the owning full-expression as they should be for consistency; see cpp2a/constexpr-lifetime2.C. This is because I wasn't able to find a good way to link the constructed parameter declarations with the variable declarations that are actually destroyed later on to propagate their lifetime status, so I'm leaving this for a later patch. PR c++/71093 gcc/cp/ChangeLog: * constexpr.cc (constexpr_global_ctx::get_value_ptr): Don't return NULL_TREE for objects we're initializing. (constexpr_global_ctx::destroy_value): Rename from remove_value. Only mark real variables as outside lifetime. (constexpr_global_ctx::clear_value): New function. (destroy_value_checked): New function. (cxx_eval_call_expression): Defer complaining about non-constant arg0 for operator delete. Use remove_value_safe. (cxx_fold_indirect_ref_1): Handle conversion to 'as base' type. (outside_lifetime_error): Include name of object we're accessing. (cxx_eval_store_expression): Handle clobbers. Improve error messages. (cxx_eval_constant_expression): Use remove_value_safe. Clear bind variables before entering body. gcc/testsuite/ChangeLog: * g++.dg/cpp1y/constexpr-lifetime1.C: Improve error message. * g++.dg/cpp1y/constexpr-lifetime2.C: Likewise. * g++.dg/cpp1y/constexpr-lifetime3.C: Likewise. * g++.dg/cpp1y/constexpr-lifetime4.C: Likewise. * g++.dg/cpp2a/bitfield2.C: Likewise. * g++.dg/cpp2a/constexpr-new3.C: Likewise. New check. * g++.dg/cpp1y/constexpr-lifetime7.C: New test. * g++.dg/cpp2a/constexpr-lifetime1.C: New test. * g++.dg/cpp2a/constexpr-lifetime2.C: New test. Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
2023-12-13	c++: fix in-charge parm in constexpr	Jason Merrill	1	-0/+13
	I was puzzled by the proposed patch for PR71093 specifically ignoring the in-charge parameter; the problem turned out to be that when cxx_eval_call_expression jumps from the clone to the cloned function, it assumes that the latter has the same parameters, and so the in-charge parm doesn't get an argument. Since a class with vbases can't have constexpr 'tors there isn't actually a need for an in-charge parameter in a destructor, but we used to use it for deleting destructors and never removed it. I have a patch to do that for GCC 15, but for now let's work around it. gcc/cp/ChangeLog: * constexpr.cc (cxx_eval_call_expression): Handle missing in-charge argument.
2023-12-13	c++: constant direct-initialization [PR108243]	Jason Merrill	3	-6/+20
	When testing the proposed patch for PR71093 I noticed that it changed the diagnostic for consteval-prop6.C. I then noticed that the diagnostic wasn't very helpful either way; it was complaining about modification of the 'x' variable, but it's not a problem to initialize a local variable with a consteval constructor as long as the value is actually constant, we want to know why the value isn't constant. And then it turned out that this also fixed a missed-optimization bug in the testsuite. PR c++/108243 gcc/cp/ChangeLog: * constexpr.cc (cxx_eval_outermost_constant_expr): Turn a constructor CALL_EXPR into a TARGET_EXPR. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/consteval-prop6.C: Adjust diagnostic. * g++.dg/opt/is_constant_evaluated3.C: Remove xfails.
2023-12-13	c++: copy location to AGGR_INIT_EXPR	Jason Merrill	3	-3/+4
	When building an AGGR_INIT_EXPR from a CALL_EXPR, we shouldn't lose location information. gcc/cp/ChangeLog: * tree.cc (build_aggr_init_expr): Copy EXPR_LOCATION. gcc/testsuite/ChangeLog: * g++.dg/cpp1y/constexpr-nsdmi7b.C: Adjust line. * g++.dg/template/copy1.C: Likewise.
2023-12-13	testsuite: fix g++.dg/pr112822.C	Jason Merrill	1	-0/+1
	gcc/testsuite/ChangeLog: * g++.dg/pr112822.C: Require C++17.
2023-12-13	amdgcn: Work around XNACK register allocation problem	Andrew Stubbs	5	-5/+35
	The extra register pressure is causing infinite loops in some cases, especially at -O0. I have not yet observed any issue on devices that have AVGPRs for spilling, and XNACK is only really useful on those devices anyway, so change the defaults. gcc/ChangeLog: * config/gcn/gcn-hsa.h (NO_XNACK): Change the defaults. * config/gcn/gcn-opts.h (enum hsaco_attr_type): Add HSACO_ATTR_DEFAULT. * config/gcn/gcn.cc (gcn_option_override): Set the default flag_xnack. * config/gcn/gcn.opt: Add -mxnack=default. * doc/invoke.texi: Document the -mxnack default.
2023-12-13	amdgcn: Support XNACK mode	Andrew Stubbs	6	-138/+180
	The XNACK feature allows memory load instructions to restart safely following a page-miss interrupt. This is useful for shared-memory devices, like APUs, and to implement OpenMP Unified Shared Memory. To support the feature we must be able to set the appropriate meta-data and set the load instructions to early-clobber. When the port supports scheduling of s_waitcnt instructions there will be further requirements. gcc/ChangeLog: * config/gcn/gcn-hsa.h (NO_XNACK): Ignore missing -march. (XNACKOPT): Match on/off; ignore any. * config/gcn/gcn-valu.md (gather<mode>_insn_1offset<exec>): Add xnack compatible alternatives. (gather<mode>_insn_2offsets<exec>): Likewise. * config/gcn/gcn.cc (gcn_option_override): Permit -mxnack for devices other than Fiji and gfx1030. (gcn_expand_epilogue): Remove early-clobber problems. (gcn_hsa_declare_function_name): Obey -mxnack setting. * config/gcn/gcn.md (xnack): New attribute. (enabled): Rework to include "xnack" attribute. (movbi): Add xnack compatible alternatives. (mov<mode>_insn): Likewise. (mov<mode>_insn): Likewise. (mov<mode>_insn): Likewise. (movti_insn): Likewise. config/gcn/gcn.opt (-mxnack): Change the default to "any". * doc/invoke.texi: Remove placeholder notice for -mxnack.
2023-12-13	aarch64 testsuite: Check entire .arch string	Andrew Carlotti	43	-43/+43
	Add a terminating newline to various tests, and add missing extensions to some test strings. The current output is broken for options_set_4.c, so this test is left unchanged, to be fixed in a subsequent patch. gcc/testsuite/ChangeLog: * gcc.target/aarch64/cpunative/native_cpu_18.c: Add \+nopauth\n * gcc.target/aarch64/options_set_7.c: Add \+crc\n * gcc.target/aarch64/options_set_8.c: Add \+crc\+nodotprod\n * gcc.target/aarch64/cpunative/native_cpu_0.c: Add \n * gcc.target/aarch64/cpunative/native_cpu_1.c: Ditto. * gcc.target/aarch64/cpunative/native_cpu_2.c: Ditto. * gcc.target/aarch64/cpunative/native_cpu_3.c: Ditto. * gcc.target/aarch64/cpunative/native_cpu_4.c: Ditto. * gcc.target/aarch64/cpunative/native_cpu_5.c: Ditto. * gcc.target/aarch64/cpunative/native_cpu_6.c: Ditto. * gcc.target/aarch64/cpunative/native_cpu_7.c: Ditto. * gcc.target/aarch64/cpunative/native_cpu_8.c: Ditto. * gcc.target/aarch64/cpunative/native_cpu_9.c: Ditto. * gcc.target/aarch64/cpunative/native_cpu_10.c: Ditto. * gcc.target/aarch64/cpunative/native_cpu_11.c: Ditto. * gcc.target/aarch64/cpunative/native_cpu_12.c: Ditto. * gcc.target/aarch64/cpunative/native_cpu_13.c: Ditto. * gcc.target/aarch64/cpunative/native_cpu_14.c: Ditto. * gcc.target/aarch64/cpunative/native_cpu_15.c: Ditto. * gcc.target/aarch64/cpunative/native_cpu_16.c: Ditto. * gcc.target/aarch64/cpunative/native_cpu_17.c: Ditto. * gcc.target/aarch64/options_set_1.c: Ditto. * gcc.target/aarch64/options_set_2.c: Ditto. * gcc.target/aarch64/options_set_3.c: Ditto. * gcc.target/aarch64/options_set_5.c: Ditto. * gcc.target/aarch64/options_set_6.c: Ditto. * gcc.target/aarch64/options_set_9.c: Ditto. * gcc.target/aarch64/options_set_11.c: Ditto. * gcc.target/aarch64/options_set_12.c: Ditto. * gcc.target/aarch64/options_set_13.c: Ditto. * gcc.target/aarch64/options_set_14.c: Ditto. * gcc.target/aarch64/options_set_15.c: Ditto. * gcc.target/aarch64/options_set_16.c: Ditto. * gcc.target/aarch64/options_set_17.c: Ditto. * gcc.target/aarch64/options_set_18.c: Ditto. * gcc.target/aarch64/options_set_19.c: Ditto. * gcc.target/aarch64/options_set_20.c: Ditto. * gcc.target/aarch64/options_set_21.c: Ditto. * gcc.target/aarch64/options_set_22.c: Ditto. * gcc.target/aarch64/options_set_23.c: Ditto. * gcc.target/aarch64/options_set_24.c: Ditto. * gcc.target/aarch64/options_set_25.c: Ditto. * gcc.target/aarch64/options_set_26.c: Ditto.
2023-12-13	aarch64: Add missing driver-aarch64 dependencies	Andrew Carlotti	1	-1/+3
	gcc/ChangeLog: * config/aarch64/x-aarch64: Add missing dependencies.
2023-12-13	libgomp: basic pinned memory on Linux	Andrew Stubbs	9	-46/+716
	Implement the OpenMP pinned memory trait on Linux hosts using the mlock syscall. Pinned allocations are performed using mmap, not malloc, to ensure that they can be unpinned safely when freed. This implementation will work OK for page-scale allocations, and finer-grained allocations will be implemented in a future patch. libgomp/ChangeLog: * allocator.c (MEMSPACE_ALLOC): Add PIN. (MEMSPACE_CALLOC): Add PIN. (MEMSPACE_REALLOC): Add PIN. (MEMSPACE_FREE): Add PIN. (MEMSPACE_VALIDATE): Add PIN. (omp_init_allocator): Use MEMSPACE_VALIDATE to check pinning. (omp_aligned_alloc): Add pinning to all MEMSPACE_* calls. (omp_aligned_calloc): Likewise. (omp_realloc): Likewise. (omp_free): Likewise. * config/linux/allocator.c: New file. * config/nvptx/allocator.c (MEMSPACE_ALLOC): Add PIN. (MEMSPACE_CALLOC): Add PIN. (MEMSPACE_REALLOC): Add PIN. (MEMSPACE_FREE): Add PIN. (MEMSPACE_VALIDATE): Add PIN. * config/gcn/allocator.c (MEMSPACE_ALLOC): Add PIN. (MEMSPACE_CALLOC): Add PIN. (MEMSPACE_REALLOC): Add PIN. (MEMSPACE_FREE): Add PIN. * libgomp.texi: Switch pinned trait to supported. (MEMSPACE_VALIDATE): Add PIN. * testsuite/libgomp.c/alloc-pinned-1.c: New test. * testsuite/libgomp.c/alloc-pinned-2.c: New test. * testsuite/libgomp.c/alloc-pinned-3.c: New test. * testsuite/libgomp.c/alloc-pinned-4.c: New test. Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2023-12-13	testsuite: Add dg-do compile target c++17 directive for testcase [PR112822]	Peter Bergner	1	-0/+1
	Add dg-do compile target directive that limits the test case to being built on c++17 compiles or greater. 2023-12-13 Peter Bergner <bergner@linux.ibm.com> gcc/testsuite/ PR tree-optimization/112822 * g++.dg/pr112822.C: Add dg-do compile target c++17 directive.
2023-12-13	RISC-V: Refine test cases for both PR112929 and PR112988	Pan Li	4	-0/+110
	Refine the test cases for: * Name convention. * Add run case. These test cases used to cause out-of-bounds writes to the stack and therefore showed unreliable behavior. Depending on the execution environment they can either pass or fail. As of now, with the latest QEMU version, they will pass even without the underlying issue fixed. As the test case is known to have caused the problem before we keep it as a run test case for future reference. PR target/112929 PR target/112988 gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/vsetvl/pr112929.c: Moved to... * gcc.target/riscv/rvv/vsetvl/pr112929-1.c: ...here. * gcc.target/riscv/rvv/vsetvl/pr112988.c: Moved to... * gcc.target/riscv/rvv/vsetvl/pr112988-1.c: ...here. * gcc.target/riscv/rvv/vsetvl/pr112929-2.c: New test. * gcc.target/riscv/rvv/vsetvl/pr112988-2.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2023-12-13	aarch64 testsuite: Only run aarch64-ssve tests once	Andrew Carlotti	1	-0/+4
	gcc/testsuite/ChangeLog: * g++.target/aarch64/sve/aarch64-ssve.exp:
2023-12-13	ARC: Add *extvsi_n_0 define_insn_and_split for PR 110717.	Roger Sayle	3	-0/+47
	This patch improves the code generated for bitfield sign extensions on ARC cpus without a barrel shifter. Compiling the following test case: int foo(int x) { return (x<<27)>>27; } with -O2 -mcpu=em, generates two loops: foo: mov lp_count,27 lp 2f add r0,r0,r0 nop 2: # end single insn loop mov lp_count,27 lp 2f asr r0,r0 nop 2: # end single insn loop j_s [blink] and the closely related test case: struct S { int a : 5; }; int bar (struct S p) { return p->a; } generates the slightly better: bar: ldb_s r0,[r0] mov_s r2,0 ;3 add3 r0,r2,r0 sexb_s r0,r0 asr_s r0,r0 asr_s r0,r0 j_s.d [blink] asr_s r0,r0 which uses 6 instructions to perform this particular sign extension. It turns out that sign extensions can always be implemented using at most three instructions on ARC (without a barrel shifter) using the idiom ((x&mask)^msb)-msb [as described in section "2-5 Sign Extension" of Henry Warren's book "Hacker's Delight"]. Using this, the sign extensions above on ARC's EM both become: bmsk_s r0,r0,4 xor r0,r0,16 sub r0,r0,16 which takes about 3 cycles, compared to the ~112 cycles for the loops in foo. 2023-12-13 Roger Sayle <roger@nextmovesoftware.com> Jeff Law <jlaw@ventanamicro.com> gcc/ChangeLog config/arc/arc.md (extvsi_n_0): New define_insn_and_split to implement SImode sign extract using a AND, XOR and MINUS sequence. gcc/testsuite/ChangeLog gcc.target/arc/extvsi-1.c: New test case. * gcc.target/arc/extvsi-2.c: Likewise.
2023-12-13	RISC-V:Add crypto vector implied ISA info.	Feng Wang	2	-6/+24
	Due to the crypto vector entension is depend on the Vector extension, so add the implied ISA info with the corresponding crypto vector extension. gcc/ChangeLog: * common/config/riscv/riscv-common.cc: Modify implied ISA info. * config/riscv/arch-canonicalize: Add crypto vector implied info.
2023-12-13	libstdc++: Fix regression in std::format output of %Y for negative years	Jonathan Wakely	1	-1/+1
	The change in r14-6468-ga01462ae8bafa8 was only supposed to apply to %C formats, not %Y. libstdc++-v3/ChangeLog: * include/bits/chrono_io.h (__formatter_chrono::_M_C_y_Y): Do not round century down for %Y formats.
2023-12-13	gettext: disable install, docs targets, libasprintf, threads	Arsen Arsenović	2	-175/+40
	This fixes issues reported by David Edelsohn <dje.gcc@gmail.com>, and by Eric Gallager <egallager@gcc.gnu.org>. ChangeLog: * Makefile.def (gettext): Disable (via missing) {install-,}{pdf,html,info,dvi} and TAGS targets. Set no_install to true. Add --disable-threads --disable-libasprintf. Drop the lib_path (as there are no shared libs). * Makefile.in: Regenerate.
2023-12-13	download_prerequisites: add --only-gettext	Arsen Arsenović	1	-1/+7
	contrib/ChangeLog: * download_prerequisites <arg parse>: Parse --only-gettext. (echo_archives): Check only_gettext and stop early if true. (helptext): Document --only-gettext.
2023-12-13	RISC-V: Postpone full available optimization [VSETVL PASS]	Juzhe-Zhong	3	-2/+139
	Fix VSETVL BUG that AVL is polluted .L15: li a3,9 lui a4,%hi(s) sw a3,%lo(j)(t2) sh a5,%lo(s)(a4) <--a4 is hold the address of s beq t0,zero,.L42 sw t5,8(t4) vsetvli zero,a4,e8,m8,ta,ma <<--- a4 as avl Actually, this vsetvl is redundant. The root cause we include full available optimization in LCM local data computation. full available optimization should be after LCM computation. PR target/112929 PR target/112988 gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (pre_vsetvl::compute_lcm_local_properties): Remove full available. (pre_vsetvl::pre_global_vsetvl_info): Add full available optimization. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/vsetvl/pr112929.c: New test. * gcc.target/riscv/rvv/vsetvl/pr112988.c: New test.
2023-12-13	RISC-V: Fix dynamic lmul tests depended on abi	demin.han	1	-0/+11
	Some toolchain configs would report: fatal error: gnu/stubs-ilp32.h: No such file or directory Fix method suggested by Juzhe-Zhong gcc/testsuite/ChangeLog: * gcc.dg/vect/costmodel/riscv/rvv/riscv_vector.h: New file. Signed-off-by: demin.han <demin.han@starfivetech.com> Signed-off-by: demin.han <demin.han@starfivetech.com>
2023-12-13	Middle-end: Adjust decrement IV style partial vectorization COST model	Juzhe-Zhong	2	-3/+26
	Hi, before this patch, a simple conversion case for RVV codegen: foo: ble a2,zero,.L8 addiw a5,a2,-1 li a4,6 bleu a5,a4,.L6 srliw a3,a2,3 slli a3,a3,3 add a3,a3,a0 mv a5,a0 mv a4,a1 vsetivli zero,8,e16,m1,ta,ma .L4: vle8.v v2,0(a5) addi a5,a5,8 vzext.vf2 v1,v2 vse16.v v1,0(a4) addi a4,a4,16 bne a3,a5,.L4 andi a5,a2,-8 beq a2,a5,.L10 .L3: slli a4,a5,32 srli a4,a4,32 subw a2,a2,a5 slli a2,a2,32 slli a5,a4,1 srli a2,a2,32 add a0,a0,a4 add a1,a1,a5 vsetvli zero,a2,e16,m1,ta,ma vle8.v v2,0(a0) vzext.vf2 v1,v2 vse16.v v1,0(a1) .L8: ret .L10: ret .L6: li a5,0 j .L3 This vectorization go through first loop: vsetivli zero,8,e16,m1,ta,ma .L4: vle8.v v2,0(a5) addi a5,a5,8 vzext.vf2 v1,v2 vse16.v v1,0(a4) addi a4,a4,16 bne a3,a5,.L4 Each iteration processes 8 elements. For a scalable vectorization with VLEN > 128 bits CPU, it's ok when VLEN = 128. But, as long as VLEN > 128 bits, it will waste the CPU resources. That is, e.g. VLEN = 256bits. only half of the vector units are working and another half is idle. After investigation, I realize that I forgot to adjust COST for SELECT_VL. So, adjust COST for SELECT_VL styple length vectorization. We adjust COST from 3 to 2. since after this patch: foo: ble a2,zero,.L5 .L3: vsetvli a5,a2,e16,m1,ta,ma -----> SELECT_VL cost. vle8.v v2,0(a0) slli a4,a5,1 -----> additional shift of outcome SELECT_VL for memory address calculation. vzext.vf2 v1,v2 sub a2,a2,a5 vse16.v v1,0(a1) add a0,a0,a5 add a1,a1,a4 bne a2,zero,.L3 .L5: ret This patch is a simple fix that I previous forgot. Ok for trunk ? If not, I am going to adjust cost in backend cost model. PR target/111317 gcc/ChangeLog: * tree-vect-loop.cc (vect_estimate_min_profitable_iters): Adjust for COST for decrement IV. gcc/testsuite/ChangeLog: * gcc.dg/vect/costmodel/riscv/rvv/pr111317.c: New test.
2023-12-13	lower-bitint: Fix lowering of non-_BitInt to _BitInt cast merged with some ↵	Jakub Jelinek	2	-15/+28
	wider cast [PR112940] The following testcase ICEs, because a PHI argument from latch edge uses a SSA_NAME set only in a conditionally executed block inside of the loop. This happens when we have some outer cast which lowers its operand several times, under some condition with variable index, under different condition with some constant index, otherwise something else, and then there is an inner cast from non-_BitInt integer (or small/middle one). Such cast in certain conditions is emitted by initializing some SSA_NAMEs in the initialization statements before loops (say for casts from <= limb size precision by computing a SSA_NAME for the first limb and then extension of it for the later limbs) and uses the prepare_data_in_out function to create a PHI node. Such function is passed the value (constant or SSA_NAME) to use in the PHI argument from the pre-header edge, but for the latch edge it always created a new SSA_NAME and then caller emitted in the following 3 spots an extra assignment to set that SSA_NAME to whatever value we want from the latch edge. In all these 3 cases the argument from the latch edge is known already before the loop though, either constant or SSA_NAME computed in pre-header as well. But the need to emit an assignment combined with the handle_operand done in a conditional basic block results in the SSA verification failure. The following patch fixes it by extending the prpare_data_in_out method, so that when the latch edge argument is known before (constant or computed in pre-header), we can just use it directly and avoid the extra assignment that would normally be hopefully optimized away later to what we now emit directly. 2023-12-13 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/112940 * gimple-lower-bitint.cc (struct bitint_large_huge): Add another argument to prepare_data_in_out method defaulted to NULL_TREE. (bitint_large_huge::handle_operand): Pass another argument to prepare_data_in_out instead of emitting an assignment to set it. (bitint_large_huge::prepare_data_in_out): Add VAL_OUT argument. If non-NULL, use it as PHI argument instead of creating a new SSA_NAME. (bitint_large_huge::handle_cast): Pass rext as another argument to 2 prepare_data_in_out calls instead of emitting assignments to set them. * gcc.dg/bitint-53.c: New test.