Age | Commit message | Author | Files | Lines
2022-10-25OpenAcc: Correction of reduction enhancementMarcel Vollweiler2-7/+5
Commit bce2c92cfec2ae1eb9d79e36dff5a220b688bfa1 "Various OpenACC reduction enhancements - ME and nvptx changes" introduced several regressions: gcc/testsuite/c-c++-common/goacc/nested-reductions-1-routine.c gcc/testsuite/c-c++-common/goacc/nested-reductions-2-routine.c gcc/testsuite/c-c++-common/goacc/orphan-reductions-2.c gcc/testsuite/gfortran.dg/goacc/nested-reductions-1-routine.f90 gcc/testsuite/gfortran.dg/goacc/nested-reductions-2-routine.f90 gcc/testsuite/gfortran.dg/goacc/orphan-reductions-2.f90 This fixes the above regressions. gcc/ChangeLog: * omp-offload.cc (oacc_loop_auto_partitions): Removed OLF reduction handling.
2022-10-25Relax assertion in profilerEric Botcazou4-5/+20
This assertion in branch_prob: if (bb == ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb) { location_t loc = DECL_SOURCE_LOCATION (current_function_decl); gcc_checking_assert (!RESERVED_LOCATION_P (loc)); had been correct until the fix for PR debug/101598 was installed. gcc/ * profile.cc (branch_prob): Be prepared for ignored functions with DECL_SOURCE_LOCATION set to UNKNOWN_LOCATION. gcc/testsuite: * gnat.dg/specs/coverage1.ads: New test. * gnat.dg/specs/variant_part.ads: Minor tweak. * gnat.dg/specs/weak1.ads: Add dg directive.
2022-10-25IBM zSystems: Fix function_ok_for_sibcall [PR106355]Stefan Schulze Frielinghaus5-23/+100
For a parameter with BLKmode we cannot use REG_NREGS in order to determine the number of consecutive registers. Streamlined this with the implementation of s390_function_arg. Fix some indentation whitespace, too. gcc/ChangeLog: PR target/106355 * config/s390/s390.cc (s390_call_saved_register_used): For a parameter with BLKmode fix determining number of consecutive registers. gcc/testsuite/ChangeLog: * gcc.target/s390/pr106355.h: Common code for new tests. * gcc.target/s390/pr106355-1.c: New test. * gcc.target/s390/pr106355-2.c: New test. * gcc.target/s390/pr106355-3.c: New test. (cherry picked from commit cb994acc08b67f26a54e7c5dc1f4995a2ce24d98)
2022-10-25i386: fix pedantic warningMartin Liska1-1/+1
PR target/107364 gcc/ChangeLog: * common/config/i386/i386-cpuinfo.h (enum processor_vendor): Fix pedantic warning. (cherry picked from commit f3f000b7689ce9eb6364808072025672af1e4e1b)
2022-10-25x86: fix VENDOR_MAX enum valueMartin Liska1-1/+3
PR target/107364 gcc/ChangeLog: * common/config/i386/i386-cpuinfo.h (enum processor_vendor): Reorder enum values as BUILTIN_VENDOR_MAX should not point in the middle of the valid enum values. (cherry picked from commit f751bf4c5d1aaa1aacfcbdec62881c5ea1175dfb)
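The concern behind this reordering can be illustrated with a small stand-alone C sketch. The enum and the helper below are hypothetical (the names `mock_vendor`, `MOCK_VENDOR_*` and `mock_vendor_is_valid` do not appear in i386-cpuinfo.h); they only show, under that assumption, why a "max" marker is usually kept after every valid value when code range-checks against it.

```c
#include <assert.h>

/* Hypothetical vendor enum; the names do not match i386-cpuinfo.h.  */
enum mock_vendor
{
  MOCK_VENDOR_A = 1,
  MOCK_VENDOR_B,
  MOCK_VENDOR_C,
  /* Sentinel placed after every valid value.  */
  MOCK_VENDOR_MAX
};

/* A range check written against the sentinel accepts all valid vendors
   only when the sentinel follows the last of them.  */
int
mock_vendor_is_valid (int v)
{
  return v >= MOCK_VENDOR_A && v < MOCK_VENDOR_MAX;
}
```

If the sentinel sat between `MOCK_VENDOR_B` and `MOCK_VENDOR_C`, the same check would reject `MOCK_VENDOR_C` even though it is a valid value.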
2022-10-25Daily bump.GCC Administrator1-1/+1
2022-10-24libgomp/nvptx: Prepare for reverse-offload callback handling, resolve spurious SIGSEGVsThomas Schwinge2-0/+10
Per commit r13-3460-g131d18e928a3ea1ab2d3bf61aa92d68a8a254609 "libgomp/nvptx: Prepare for reverse-offload callback handling", I'm seeing a lot of libgomp execution test regressions. Random example, 'libgomp.c-c++-common/error-1.c': [...] GOMP_OFFLOAD_run: kernel main$_omp_fn$0: launch [(teams: 1), 1, 1] [(lanes: 32), (threads: 8), 1] Thread 1 "a.out" received signal SIGSEGV, Segmentation fault. 0x00007ffff793b87d in GOMP_OFFLOAD_run (ord=<optimized out>, tgt_fn=<optimized out>, tgt_vars=<optimized out>, args=<optimized out>) at [...]/source-gcc/libgomp/plugin/plugin-nvptx.c:2127 2127 if (__atomic_load_n (&ptx_dev->rev_data->fn, __ATOMIC_ACQUIRE) != 0) (gdb) print ptx_dev $1 = (struct ptx_device *) 0x6a55a0 (gdb) print ptx_dev->rev_data $2 = (struct rev_offload *) 0xffffffff00000000 (gdb) print ptx_dev->rev_data->fn Cannot access memory at address 0xffffffff00000000 libgomp/ * plugin/plugin-nvptx.c (nvptx_open_device): Initialize 'ptx_dev->rev_data'. (cherry picked from commit 205538832b7033699047900cf25928f5920d8b93)
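The failure mode in the gdb session (a pointer member holding garbage because it was never initialized) can be modelled in a few lines of C. This is only a sketch under stated assumptions: `mock_rev_offload`, `mock_ptx_device`, `mock_open_device` and `mock_rev_request_pending` are hypothetical simplifications, not the plugin's real definitions, and the NULL guard in the polling helper is an assumption of the sketch rather than something the commit message states.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified versions of the plugin's structures.  */
struct mock_rev_offload { uint64_t fn; };
struct mock_ptx_device { struct mock_rev_offload *rev_data; };

/* Model of device opening: malloc leaves the fields indeterminate, so
   rev_data is set explicitly before anyone can read it.  */
struct mock_ptx_device *
mock_open_device (void)
{
  struct mock_ptx_device *dev = malloc (sizeof *dev);
  if (dev == NULL)
    return NULL;
  dev->rev_data = NULL;
  return dev;
}

/* Model of the polling check: nonzero if a reverse-offload request is
   pending.  The NULL guard is an assumption of this sketch.  */
int
mock_rev_request_pending (const struct mock_ptx_device *dev)
{
  if (dev->rev_data == NULL)
    return 0;
  return __atomic_load_n (&dev->rev_data->fn, __ATOMIC_ACQUIRE) != 0;
}
```

Without the explicit assignment in `mock_open_device`, reading `dev->rev_data` would yield an indeterminate value, which matches the kind of bogus address shown in the backtrace.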
2022-10-24vect: WORKAROUND vectorizer bugAndrew Stubbs2-1/+20
This patch disables vectorization of memory accesses to non-default address spaces where the pointer size is different to the usual pointer size. This condition typically occurs in OpenACC programs on amdgcn, where LDS memory is used for broadcasting gang-private variables between threads. In particular, see libgomp.oacc-c-c++-common/private-variables.c The problem is that the address space information is dropped from the various types in the middle-end and eventually it triggers an ICE trying to do an address conversion. That ICE can be avoided by defining POINTERS_EXTEND_UNSIGNED, but that just produces wrong RTL code later on. A correct solution would ensure that all the vectypes have the correct address spaces, but I don't have time for that right now. gcc/ChangeLog: * tree-vect-data-refs.cc (vect_analyze_data_refs): Workaround an address-space bug.
2022-10-24amdgcn: disallow USM on gfx908Andrew Stubbs2-0/+5
It does work, but not well and only with the amdgpu.noreply=0 kernel boot option. gcc/ChangeLog: * config/gcn/gcn.cc (gcn_init_cumulative_args): Disallow gfx908.
2022-10-24amdgcn, libgomp: USM allocation updateAndrew Stubbs6-5/+86
Allocate Unified Shared Memory via malloc and hsa_amd_svm_attributes_set, instead of hsa_allocate_memory. This scheme should be more efficient for memory that is first accessed by the CPU. libgomp/ChangeLog: * plugin/plugin-gcn.c (HSA_AMD_SYSTEM_INFO_SVM_SUPPORTED): New. (HSA_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT): New. (HSA_AMD_SVM_ATTRIB_GLOBAL_FLAG): New. (HSA_AMD_SVM_GLOBAL_FLAG_COARSE_GRAINED): New. (hsa_amd_svm_attribute_pair_t): New. (struct hsa_runtime_fn_info): Add hsa_amd_svm_attributes_set_fn. (dump_hsa_system_info): Dump HSA_AMD_SYSTEM_INFO_SVM_SUPPORTED and HSA_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT. (DLSYM_OPT_FN): New. (init_hsa_runtime_functions): Add hsa_amd_svm_attributes_set. (GOMP_OFFLOAD_usm_alloc): Use malloc and hsa_amd_svm_attributes_set. (GOMP_OFFLOAD_usm_free): Use regular free. * testsuite/libgomp.c/usm-1.c: Add -mxnack=on for amdgcn. * testsuite/libgomp.c/usm-2.c: Likewise. * testsuite/libgomp.c/usm-3.c: Likewise. * testsuite/libgomp.c/usm-4.c: Likewise.
2022-10-24libgomp/nvptx: Prepare for reverse-offload callback handlingTobias Burnus13-17/+286
This patch adds a stub 'gomp_target_rev' in the host's target.c, which will later handle the reverse offload. For nvptx, it adds support for forwarding the offload gomp_target_ext call to the host by setting values in a struct on the device and querying it on the host - invoking gomp_target_rev on the result. include/ChangeLog: * cuda/cuda.h (enum CUdevice_attribute): Add CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING. (CU_MEMHOSTALLOC_DEVICEMAP): Define. (cuMemHostAlloc): Add prototype. libgomp/ChangeLog: * config/nvptx/icv-device.c (GOMP_DEVICE_NUM_VAR): Remove 'static' for this variable. * config/nvptx/libgomp-nvptx.h: New file. * config/nvptx/target.c: Include it. (GOMP_ADDITIONAL_ICVS): Declare extern var. (GOMP_REV_OFFLOAD_VAR): Declare var. (GOMP_target_ext): Handle reverse offload. * libgomp-plugin.h (GOMP_PLUGIN_target_rev): New prototype. * libgomp-plugin.c (GOMP_PLUGIN_target_rev): New, call ... * target.c (gomp_target_rev): ... this new stub function. * libgomp.h (gomp_target_rev): Declare. * libgomp.map (GOMP_PLUGIN_1.4): New; add GOMP_PLUGIN_target_rev. * plugin/cuda-lib.def (cuMemHostAlloc): Add. * plugin/plugin-nvptx.c: Include libgomp-nvptx.h. (struct ptx_device): Add rev_data member. (nvptx_open_device): Remove async_engines query, last used in r10-304-g1f4c5b9b; add unified-address assert check. (GOMP_OFFLOAD_get_num_devices): Claim unified address support. (GOMP_OFFLOAD_load_image): Free rev_fn_table if no offload functions exist. Make offload var available on host and device. (rev_off_dev_to_host_cpy, rev_off_host_to_dev_cpy): New. (GOMP_OFFLOAD_run): Handle reverse offload. (cherry picked from commit 052dfa279f5de90b324d60cf787e821b18cf496c)
2022-10-24gcc/testsuite: Change 'cunrolli' to 'cunrolli1' in dump scan + optionsTobias Burnus13-21/+37
The OG12 commit 3e8b51d143e "openacc: Move pass_oacc_device_lower after pass_graphite" adds a new pass, which also re-invokes some previous passes. This seems to have the effect that the pass names and, hence, the dump names now have a trailing number. In that commit, 'cunrolli' was changed to 'cunrolli1' for some testcases. This commit does likewise for some more testcases. In particular, . in scan-tree-dump the trailing '1' is crucial to change UNRESOLVED to PASS. . for "-fdisable-tree-cunrolli1" option, it changes FAIL (excess errors) to PASS . even without the change, "-fdump-tree-cunrolli{-details,-optimized}" has PASS, but I believe the trailing 1 ensures that only the first 'cunrolli' dumps. gcc/testsuite * g++.dg/ext/unroll-1.C: Change 'cunrolli' to 'cunrolli1' in dg-options and scan-tree-dump. * g++.dg/ext/unroll-2.C: Likewise. * g++.dg/ext/unroll-3.C: Likewise. * g++.dg/vect/pr36648.cc: Likewise. * gcc.dg/tree-prof/init-array.c: Likewise. * gcc.dg/tree-ssa/pr100359.c: Likewise. * gcc.dg/tree-ssa/pr59597.c: Likewise. * gcc.dg/unroll-2.c: Likewise. * gfortran.dg/directive_unroll_1.f90: Likewise. * gfortran.dg/directive_unroll_4.f90: Likewise. * gnat.dg/unroll1.adb: Likewise. * gnat.dg/unroll2.adb: Likewise.
2022-10-24OpenMP: Fix reverse offload GOMP_TARGET_REV IFN corner cases [PR107236]Tobias Burnus5-7/+43
For 'target parallel' and similarly nested directives, cgraph_node's calls_declare_variant_alt was not set in the parent region node but in cfun->decl. Hence, pass_omp_device_lower did not handle the internal function GOMP_TARGET_REV. The solution is to set it in the DECL_CONTEXT, which is set in adjust_context_and_scope. The cgraph_node::create_clone issue is exposed with -O2 for the existing libgomp.fortran/reverse-offload-1.f90. PR middle-end/107236 gcc/ChangeLog: * omp-expand.cc (expand_omp_target): Set calls_declare_variant_alt in DECL_CONTEXT and not to cfun->decl. * cgraphclones.cc (cgraph_node::create_clone): Copy also the node's calls_declare_variant_alt value. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/target-device-ancestor-6.f90: New test. (cherry picked from commit 178ac530fe67e4f2fc439cc4ce89bc19d571ca31)
2022-10-24Merge branch 'releases/gcc-12' into devel/omp/gcc-12Tobias Burnus29-64/+540
Merge up to r12-8861-g1ccec25cf0c3c9cfd5882c83fd8cc56ea2987bad (24th Oct 2022)
2022-10-24Missing pr104517.c change from: 'Add a restriction on allocate clause (OpenMP 5.0)'Tobias Burnus1-8/+10
OG12 commit df47c25110474565f521508a1545232550052a75 included everything from r13-150-g1a8c4d9ed36556a95bd7d53c04d2ec4c95594061 except the change to gcc/testsuite/gcc.dg/gomp/pr104517.c. This commit cherry-picks the missing changes to that file. Note: The OG12 commit already contained the ChangeLog.omp entry for that file. gcc/testsuite/ * gcc.dg/gomp/pr104517.c: Update. (cherry picked from commit 1a8c4d9ed36556a95bd7d53c04d2ec4c95594061)
2022-10-24Daily bump.GCC Administrator3-1/+20
2022-10-23Fortran: error recovery with references of bad array constructors [PR105633]Harald Anlauf2-3/+15
gcc/fortran/ChangeLog: PR fortran/105633 * expr.cc (find_array_section): Move check for NULL pointers so that both subscript triplets and vector subscripts are covered. gcc/testsuite/ChangeLog: PR fortran/105633 * gfortran.dg/pr105633.f90: New test. Co-authored-by: Steven G. Kargl <kargl@gcc.gnu.org> (cherry picked from commit ecb20df4fa6d99daa635c7fb662dc0554610777e)
2022-10-23Daily bump.GCC Administrator1-1/+1
2022-10-22Daily bump.GCC Administrator4-1/+40
2022-10-21omp-oacc-kernels-decompose.cc: fix -fcompare-debug with GIMPLE_DEBUGTobias Burnus2-2/+9
GIMPLE_DEBUG statements were put in a parallel region of their own, which is not only pointless but also breaks -fcompare-debug. With this commit, they are handled like simple assignments: they are placed into the same body as the loop such that only one parallel region remains, as without debugging. This fixes the existing testcase libgomp.oacc-c-c++-common/kernels-loop-g.c. Note: GIMPLE_DEBUG are only accepted with -fcompare-debug; if they appear otherwise, decompose_kernels_region_body rejects them with a sorry (unchanged). gcc/ * omp-oacc-kernels-decompose.cc (top_level_omp_for_in_stmt, decompose_kernels_region_body): Handle GIMPLE_DEBUG like simple assignment.
2022-10-21Add 'libgomp.oacc-c-c++-common/private-big-1.c' [PR105421]Thomas Schwinge1-0/+100
After commit r13-3404-g7c55755d4c760de326809636531478fd7419e1e5 "amdgcn: Use FLAT addressing for all functions with pointer arguments [PR105421]", "big" private data now works for GCN offloading, too. PR target/105421 libgomp/ * testsuite/libgomp.oacc-c-c++-common/private-big-1.c: New. (cherry picked from commit c7ebee2378426eeca425ca5406af213a926f154c)
2022-10-21amdgcn: Use FLAT addressing for all functions with pointer arguments [PR105421]Julian Brown1-6/+9
The GCN backend uses a heuristic to determine whether to use FLAT or GLOBAL addressing in a particular (offload) function: namely, if a function takes a pointer-to-scalar parameter, it is assumed that the pointer may refer to "flat scratch" space, and thus FLAT addressing must be used instead of GLOBAL. I came up with this heuristic initially whilst working on support for moving OpenACC gang-private variables into local-data share (scratch) memory. The assumption that only scalar variables would be transformed in that way turned out to be wrong. For example, prior to the next patch in the series, Fortran compiler-generated temporary structures were treated as gang private and moved to LDS space, typically overflowing the region allocated for such variables. That will no longer happen after that patch is applied, but there may be other cases of structs moving to LDS space now or in the future that this patch may be needed for. 2022-10-14 Julian Brown <julian@codesourcery.com> PR target/105421 gcc/ * config/gcn/gcn.cc (gcn_detect_incoming_pointer_arg): Any pointer argument forces FLAT addressing mode, not just pointer-to-non-aggregate. (cherry picked from commit 7c55755d4c760de326809636531478fd7419e1e5)
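The change in heuristic can be sketched as two predicates over a simplified parameter list. The sketch below is a stand-alone model, not code from gcn.cc: `mock_arg`, `mock_needs_flat_old` and `mock_needs_flat_new` are hypothetical names, and the two boolean fields are an assumed simplification of what the backend inspects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one function parameter.  */
struct mock_arg
{
  bool is_pointer;
  bool pointee_is_aggregate;
};

/* Old heuristic as described: only a pointer to a non-aggregate
   forces FLAT addressing.  */
bool
mock_needs_flat_old (const struct mock_arg *args, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (args[i].is_pointer && !args[i].pointee_is_aggregate)
      return true;
  return false;
}

/* New heuristic as described: any pointer argument forces FLAT
   addressing.  */
bool
mock_needs_flat_new (const struct mock_arg *args, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (args[i].is_pointer)
      return true;
  return false;
}
```

A function whose only pointer parameter points to a struct is the case where the two predicates disagree, which corresponds to the compiler-generated temporary structures mentioned in the message.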
2022-10-21Added "noclone" to scan-tree-dump for several OpenAcc tests.Marcel Vollweiler11-11/+22
This fixes multiple tests in addition to b0256655fb402f87c921cd782b873dd301760ebd. gcc/testsuite/ChangeLog: * c-c++-common/goacc/classify-kernels-unparallelized-graphite.c: Add "noclone" in scan-tree-dump. * c-c++-common/goacc/kernels-acc-loop-reduction.c: Likewise. * c-c++-common/goacc/kernels-acc-loop-smaller-equal.c: Likewise. * c-c++-common/goacc/kernels-loop-2-acc-loop.c: Likewise. * c-c++-common/goacc/kernels-loop-3-acc-loop.c: Likewise. * c-c++-common/goacc/kernels-loop-acc-loop.c: Likewise. * c-c++-common/goacc/kernels-loop-n-acc-loop.c: Likewise. * gfortran.dg/goacc/kernels-loop-data-parloops-2.f95: Likewise. * gfortran.dg/goacc/kernels-loop-parloops-2.f95: Likewise. * gfortran.dg/goacc/kernels-loop-parloops.f95: Likewise.
2022-10-21tree-optimization/107323 - loop distribution partition ordering issueRichard Biener2-14/+64
The following reverts part of the PR94125 fix which causes us to use a bogus partition ordering after applying versioning for alias to the testcase in PR107323. Instead PR94125 is fixed by appropriately considering to be merged SCCs when skipping edges we want to ignore because of the alias versioning. PR tree-optimization/107323 * tree-loop-distribution.cc (pg_unmark_merged_alias_ddrs): New function. (loop_distribution::break_alias_scc_partitions): Revert postorder save/restore from the PR94125 fix. Instead make sure to not ignore edges from SCCs we are going to merge. * gcc.dg/tree-ssa/pr107323.c: New testcase. (cherry picked from commit 09f9814dc02c161ed78604c6df70b19b596f7524)
2022-10-21Daily bump.GCC Administrator3-1/+72
2022-10-20Make 'c-c++-common/goacc/kernels-decompose-pr100400-1-*.c' behave consistently, regardless of checking levelThomas Schwinge2-7/+11
Fix-up for commit c14ea6a72fb1ae66e3d32ac8329558497c6e4403 "Catch 'GIMPLE_DEBUG' misbehavior in OpenACC 'kernels' decomposition [PR100400, PR103836, PR104061]". For C++ compilation of 'c-c++-common/goacc/kernels-decompose-pr100400-1-2.c', we first emit a 'sorry' diagnostic, and then a 'gcc_unreachable' (or 'internal_error', see below) diagnostic, but for example, for '--enable-checking=release' (thus, '!CHECKING_P'), the second one may actually be turned into a 'confused by earlier errors, bailing out' diagnostic. (See 'gcc/diagnostic.cc:diagnostic_report_diagnostic': "When not checking, ICEs are converted to fatal errors when an error has already occurred.") Thus, make 'c-c++-common/goacc/kernels-decompose-pr100400-1-2.c' behave consistently via '-Wfatal-errors', and thus only matching the 'sorry' diagnostic. For example, for '--enable-checking=no' (thus, '!ENABLE_ASSERT_CHECKING'), a call to 'gcc_unreachable' cannot be assumed to emit an 'internal_error'-like diagnostic, so explicitly call 'internal_error' in 'gcc/omp-oacc-kernels-decompose.cc:visit_loops_in_gang_single_region', in the 'GIMPLE_OMP_FOR' case, to avoid regressing 'c-c++-common/goacc/kernels-decompose-pr100400-1-3.c', and 'c-c++-common/goacc/kernels-decompose-pr100400-1-4.c'. PR middle-end/100400 gcc/ * omp-oacc-kernels-decompose.cc (visit_loops_in_gang_single_region) <GIMPLE_OMP_FOR>: Explicitly call 'internal_error'. gcc/testsuite/ * c-c++-common/goacc/kernels-decompose-pr100400-1-2.c: Specify '-Wfatal-errors'. (cherry picked from commit da6305558bab9e24943848e4fc5bd8738d7e8f9b)
2022-10-20aarch64: Prevent generation of /M BRKAS and BRKBSRichard Sandiford3-18/+16
Bit of a brown-paper-bag bug, but: GCC was generating non-existent merging forms of BRKAS and BRKBS. Those instructions only support zero predication (although BRKA and BRKB support both). gcc/ * config/aarch64/aarch64-sve.md (*aarch64_brk<brk_op>_cc): Remove merging alternative. (*aarch64_brk<brk_op>_ptest): Likewise. gcc/testsuite/ * gcc.target/aarch64/sve/acle/general/brka_1.c: Expect a separate PTEST instruction. * gcc.target/aarch64/sve/acle/general/brkb_1.c: Likewise. (cherry picked from commit 57675c7f92a3bd3ca8dae1faac7f2f51d40e0f9e)
2022-10-20aarch64: Fix matching of BRKNSRichard Sandiford4-10/+90
Unlike other flag-setting SVE instructions, BRKNS sets the flags based on an all-true governing predicate, rather than the GP operand. gcc/ * config/aarch64/iterators.md (SVE_BRKP): New iterator. * config/aarch64/aarch64-sve.md (*aarch64_brkn_cc): New pattern. (*aarch64_brkn_ptest): Likewise. (*aarch64_brk<brk_op>_cc): Restrict to SVE_BRKP. (*aarch64_brk<brk_op>_ptest): Likewise. gcc/testsuite/ * gcc.target/aarch64/sve/acle/general/brkn_1.c: Expect separate PTEST instructions. * gcc.target/aarch64/sve/acle/general/brkn_2.c: New test. (cherry picked from commit 6bec66640597e2604f51fc1642c7d279164cd442)
2022-10-20aarch64: Define __ARM_FEATURE_RCPCRichard Sandiford4-6/+29
https://github.com/ARM-software/acle/pull/199 adds a new feature macro for RCPC, for use in things like inline assembly. This patch adds the associated support to GCC. Also, RCPC is required for Armv8.3-A and later, but the armv8.3-a entry didn't include it. This was probably harmless in practice since GCC simply ignored the extension until now. (The GAS definition is OK.) gcc/ * config/aarch64/aarch64.h (AARCH64_FL_FOR_ARCH8_3): Add AARCH64_FL_RCPC. (AARCH64_ISA_RCPC): New macro. * config/aarch64/aarch64-cores.def (thunderx3t110, zeus, neoverse-v1) (neoverse-512tvb, saphira): Remove RCPC from these Armv8.3-A+ cores. * config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Define __ARM_FEATURE_RCPC when appropriate. gcc/testsuite/ * gcc.target/aarch64/pragma_cpp_predefs_1.c: Add RCPC tests.
2022-10-20libgomp.c-c++-common/requires-4.c: dg-xfail-run-if for USM with -foffload-memory=Tobias Burnus1-0/+2
The USM implementation uses -foffload-memory=... which allocates variables in special memory. This does not support static variables. Hence, XFAIL this test on nvptx/gcn. The requires-4a.c testcase tests the same but uses heap memory instead. libgomp/ * testsuite/libgomp.c-c++-common/requires-4.c: dg-xfail-run-if on nvptx and gcn.
2022-10-20libgomp: Add offload_device_gcn check, add requires-4a.c testTobias Burnus6-1/+84
Duplicate libgomp.c-c++-common/requires-4.c (as ...-4a.c) but using heap-allocated instead of static memory for a variable. This change and the added offload_device_gcn check prepare for pseudo-USM, where the device hardware cannot access all host memory but only managed and pinned memory; for those, requires-4.c will fail and the new check permits adding target { ! { offload_device_nvptx || offload_device_gcn } } to requires-4.c; however, it has not been added yet as pseudo-USM support is not yet on mainline. (Review is pending for the USM patches.) include/ChangeLog: * gomp-constants.h (GOMP_DEVICE_HSA): Comment out unused define. libgomp/ChangeLog: * testsuite/lib/libgomp.exp (check_effective_target_offload_device_gcn): New. * testsuite/libgomp.c-c++-common/on_device_arch.h (device_arch_gcn, on_device_arch_gcn): New. * testsuite/libgomp.c-c++-common/requires-4a.c: New test; copied from requires-4.c but using heap-allocated memory. (cherry picked from commit 12d9f5afbd2660862045acd41cb65a77e35bea4d)
2022-10-20Daily bump.GCC Administrator4-1/+49
2022-10-19libstdc++: eh_globals: gthreads: reset _S_init before deleting keyAlexandre Oliva1-2/+7
Clear __eh_globals_init's _S_init in the dtor before deleting the gthread key. This ensures that, in case any code involved in deleting the key interacts with eh_globals, the key that is being deleted won't be used, and the non-thread-specific eh_globals fallback will. for libstdc++-v3/ChangeLog * libsupc++/eh_globals.cc [!_GLIBCXX_HAVE_TLS] (__eh_globals_init::~__eh_globals_init): Clear _S_init first. (cherry picked from commit a33dda016e5acf9c6325ce8a72a1b0238130374e)
2022-10-19Fix omp-expand.cc's expand_omp_target for OpenACCTobias Burnus2-1/+6
In OG12 commit a6c1eccffb161130351d891dc87f5afe54f8075c, "Fortran/OpenMP: Support mapping of DT with allocatable components" the size of the addr/sizes/kind arrays was passed as 4th argument. However, OpenACC uses >3 arguments for its own purpose, e.g. to handle noncontiguous arrays by passing an array descriptor there. This patch restores the previous behaviour for OpenACC, fixing testcases like libgomp.oacc-c-c++-common/noncontig_array-1.c. gcc/ * omp-expand.cc (expand_omp_target): Fix OpenACC in case there are more than 3 arguments to the builtin function.
2022-10-19ChangeLog for "Fortran: Fix delinearization regression"Tobias Burnus2-0/+14
The previous commit, 76b773a4a2d1daf0b83e50cd999bc38f8dd047be, missed updating gcc/fortran/ChangeLog.omp and including the following. gcc/fortran/ChangeLog: * trans-array.cc (non_negative_strides_array_p): Fix handling of GFC_DECL_SAVED_DESCRIPTOR. (gfc_conv_array_ref): Use ARRAY_REF again when possible. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/affinity-clause-1.f90: Revert to upstream version, update one scan-tree item. * gfortran.dg/gomp/depend-4.f90: Revert to upstream version. * gfortran.dg/gomp/depend-5.f90: Likewise. * gfortran.dg/gomp/depend-6.f90: Likewise.
2022-10-19Fortran: Fix delinearization regressionTobias Burnus5-92/+91
The delinearization patch "Fortran: delinearize multi-dimensional array accesses", OG12 commit 39a8c371fda6136cf77c74895a00b136409e0ba3 uses gfc_build_array_ref for the non-delinearization path. The generated code depends on whether there can be negative strides or not, an addition to that function in r12-8230-g7964ab6c364 - adding a Boolean argument. The follow-up OG12 commit "Fix Fortran array-access regressions", 9fb0076b11eb2774b620bcf2171d55c7d1fb899f also added this argument to the call in gfc_conv_array_ref, but it always evaluated to false. This commit changes it to a call to non_negative_strides_array_p (Note: for 'se->expr' not 'base'; the former could be 'arraydesc' while the latter is then 'arraydesc.data' whose TREE_TYPE does not contain information about the array type.) However, doing so revealed a bug in non_negative_strides_array_p, fixed in this commit but also submitted as "Fortran: Fix non_negative_strides_array_p" to mainline, https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603883.html As a side effect of this commit, several testcases now pass and the OG12-only changes to depend-{4,5,6}.f90 and affinity-clause-1.f90 could be undone, except that the latter now uses the delinearized array syntax in one case, which is an improvement (as honored in the scan-tree-dump). Hence, this commit (partially) reverts the commits: 21c806f73fc gfortran.dg/gomp/{depend-5,scope-6}.f90: Update scan-tree-dump 014fc7cd451 Fix dg- pattern for gomp/{affinity-clause-1.f90,uses_allocators-3.f90} 2d8aa5cc5d3 gfortran.dg/gomp/depend-6.f90: minor fix + dump update d77133b29fc gfortran.dg/gomp/depend-4.f90: minor fix + dump update The main testcase for non_negative_strides_array_p is gfortran.dg/array_reference_3.f90, which now also passes. Additionally, this change prevents some unintended implicit mapping that made libgomp.fortran/map-alloc-comp-{4,6}.f90 fail before; those tests now pass again.
2022-10-19Remove undefined behaviour from testcase.Andrew MacLeod1-1/+1
There was a patch posted to remove the undefined behaviour from this testcase, but it appears never to have been applied. gcc/testsuite/ PR tree-optimization/102892 * gcc.dg/pr102892-1.c: Remove undefined behaviour.
2022-10-19rs6000: Fix the condition with frame_pointer_needed_indeed [PR96072]Kewen Lin2-1/+15
As PR96072 shows, the code adding REG_CFA_DEF_CFA reg note makes one assumption that we have emitted one insn which restores the frame pointer previously. That part of code was guarded with flag frame_pointer_needed before, it was consistent, but it was replaced with flag frame_pointer_needed_indeed since commit r10-7981. It caused ICE due to unexpected NULL insn. PR target/96072 gcc/ChangeLog: * config/rs6000/rs6000-logue.cc (rs6000_emit_epilogue): Update the condition for adding REG_CFA_DEF_CFA reg note with frame_pointer_needed_indeed. gcc/testsuite/ChangeLog: * gcc.target/powerpc/pr96072.c: New test. (cherry picked from commit 5be0950d22209f5ba69d244387228e12389a8470)
2022-10-19rs6000: Fix condition of define_expand vec_shr_<mode> [PR100645]Kewen Lin2-1/+14
PR100645 exposes one latent bug in define_expand vec_shr_<mode> that the current condition TARGET_ALTIVEC is too loose. The mode iterator VEC_L contains a few modes, they are not always supported as vector mode, VECTOR_UNIT_ALTIVEC_OR_VSX_P should be used like some other VEC_L usages. PR target/100645 gcc/ChangeLog: * config/rs6000/vector.md (vec_shr_<mode>): Replace condition TARGET_ALTIVEC with VECTOR_UNIT_ALTIVEC_OR_VSX_P. gcc/testsuite/ChangeLog: * gcc.target/powerpc/pr100645.c: New test. (cherry picked from commit bfad7069b74c97000b698191c1945f07a6192db5)
2022-10-19Daily bump.GCC Administrator1-1/+1
2022-10-18Merge branch 'releases/gcc-12' into devel/omp/gcc-12Tobias Burnus32-128/+1234
Merge up to r12-8843-g912bdd5cfb92f6dd58accd755ad14f47c0df619e (18th Oct 2022)
2022-10-18Daily bump.GCC Administrator3-1/+136
2022-10-17Fix register count when not splitting Complex IEEE 128-bit args.Pat Haugen1-0/+6
For ABI_V4, we do not split complex args. This created a problem because even though an arg would be passed in two VSX regs, we were only advancing the function arg counter by one VSX register. Fixed with this patch. PR target/99685 gcc/ * config/rs6000/rs6000-call.cc (rs6000_function_arg_advance_1): Bump register count when not splitting IEEE 128-bit Complex. (cherry picked from commit 2ee68beee709e48fce85b8892ff9985acc6a91a8)
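The accounting problem described here can be modelled with a small stand-alone C sketch. It is not taken from rs6000-call.cc: `mock_ieee128_arg`, `mock_vsx_regs_used` and `mock_advance` are hypothetical names, and the sketch assumes the simplified rule that an unsplit complex IEEE 128-bit argument occupies two VSX registers while every other case occupies one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical descriptor for an IEEE 128-bit argument.  */
struct mock_ieee128_arg
{
  bool is_complex;     /* has a real and an imaginary part */
  bool split_complex;  /* ABI passes the two parts as separate args */
};

/* Number of VSX registers one argument consumes in this model.  */
int
mock_vsx_regs_used (const struct mock_ieee128_arg *arg)
{
  if (arg->is_complex && !arg->split_complex)
    return 2;
  return 1;
}

/* Advance the next-free-register counter past one argument.  */
int
mock_advance (int next_vsx_reg, const struct mock_ieee128_arg *arg)
{
  return next_vsx_reg + mock_vsx_regs_used (arg);
}
```

Advancing by one in the unsplit complex case would leave the counter pointing at the register that still holds the imaginary part, so the next argument would overlap it.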
2022-10-17Fortran: Fixes for kind=4 characters strings [PR107266]Tobias Burnus7-12/+153
PR fortran/107266 gcc/fortran/ * trans-expr.cc (gfc_conv_string_parameter): Use passed type to honor character kind. * trans-types.cc (gfc_sym_type): Honor character kind. * trans-decl.cc (gfc_conv_cfi_to_gfc): Fix handling kind=4 character strings. gcc/testsuite/ * gfortran.dg/char4_decl.f90: New test. * gfortran.dg/char4_decl-2.f90: New test. (cherry picked from commit c610cf20ebb3444ef4224d789aca670a12f5da40)
2022-10-17libgomp: Add Fortran testcases for omp_in_explicit_taskTobias Burnus8-0/+260
Fortranized testcases of commits r13-3257-ga58a965eb73 and r13-3258-g0ec4e93fb9f. libgomp/ChangeLog: * testsuite/libgomp.fortran/task-7.f90: New test. * testsuite/libgomp.fortran/task-8.f90: New test. * testsuite/libgomp.fortran/task-in-explicit-1.f90: New test. * testsuite/libgomp.fortran/task-in-explicit-2.f90: New test. * testsuite/libgomp.fortran/task-in-explicit-3.f90: New test. * testsuite/libgomp.fortran/task-reduction-17.f90: New test. * testsuite/libgomp.fortran/task-reduction-18.f90: New test. (cherry picked from commit ab8477af9949a7e6fcaf89c5f1dcf32788accf88)
2022-10-17libgomp: Fix up OpenMP 5.2 feature bulletJakub Jelinek2-1/+9
The previous bullet correctly mentions 5.2 added for Fortran allocators directive which is a replacement of allocate directive associated with ALLOCATE statement to differentiate it at parse time from allocate directive as declarative one not associated with ALLOCATE statement, but the deprecation bullet talks about non-existing allocator directive. 2022-10-12 Jakub Jelinek <jakub@redhat.com> * libgomp.texi (OpenMP 5.2): Fix up allocator -> allocate directive in deprecation bullet. (cherry picked from commit caf9db5a7f99fae8b6088328b9b48ee79fa5e5f0)
2022-10-17libgomp: Add omp_in_explicit_task supportJakub Jelinek11-2/+220
This is pretty straightforward, if gomp_thread ()->task is NULL, it can't be explicit task, otherwise if gomp_thread ()->task->kind == GOMP_TASK_IMPLICIT, it is an implicit task, otherwise explicit task. 2022-10-12 Jakub Jelinek <jakub@redhat.com> * omp.h.in (omp_in_explicit_task): Declare. * omp_lib.h.in (omp_in_explicit_task): Likewise. * omp_lib.f90.in (omp_in_explicit_task): New interface. * libgomp.map (OMP_5.2): New symbol version, export omp_in_explicit_task and omp_in_explicit_task_. * task.c (omp_in_explicit_task): New function. * fortran.c (omp_in_explicit_task): Add ialias_redirect. (omp_in_explicit_task_): New function. * libgomp.texi (OpenMP 5.2): Mark omp_in_explicit_task as implemented. * testsuite/libgomp.c-c++-common/task-in-explicit-1.c: New test. * testsuite/libgomp.c-c++-common/task-in-explicit-2.c: New test. * testsuite/libgomp.c-c++-common/task-in-explicit-3.c: New test. (cherry picked from commit 0ec4e93fb9fa5e9d2424683c5fab1310c8ae2f76)
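The three-way decision described in this message can be sketched in plain C. This is a minimal stand-alone model, not the libgomp source: `mock_task_kind`, `mock_task`, `mock_thread` and `mock_in_explicit_task` are hypothetical stand-ins for libgomp's internal types and for `omp_in_explicit_task`, and the model collapses all non-implicit task kinds into a single `MOCK_TASK_EXPLICIT` value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for libgomp's internal task bookkeeping.  */
enum mock_task_kind { MOCK_TASK_IMPLICIT, MOCK_TASK_EXPLICIT };

struct mock_task { enum mock_task_kind kind; };
struct mock_thread { struct mock_task *task; };

/* Return nonzero only if the thread currently runs an explicit task:
   no task at all cannot be explicit, an implicit task is not explicit,
   and anything else is treated as explicit.  */
int
mock_in_explicit_task (const struct mock_thread *thr)
{
  if (thr->task == NULL)
    return 0;
  return thr->task->kind != MOCK_TASK_IMPLICIT;
}
```

In user code the routine is called without arguments, typically inside a `#pragma omp task` region, where it is expected to return a nonzero value.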
2022-10-17libgomp: Fix up creation of artificial teamsJakub Jelinek7-6/+130
When not in explicit parallel/target/teams construct, we in some cases create an artificial parallel with a single thread (either to handle target nowait or for task reduction purposes). In those cases, it handled again artificially created implicit task (created by gomp_new_icv for cases where we needed to write to some ICVs), but as the testcases show, didn't take into account possibility of this being done from explicit task(s). The code would destroy/free the previous task and replace it with the new implicit task. If task is an explicit task (when teams is NULL, all explicit tasks behave like if (0)), it is a pointer to a local stack variable, so freeing it doesn't work, and additionally we shouldn't lose the explicit tasks - the new implicit task should instead replace the ancestor task which is the first implicit one. 2022-10-12 Jakub Jelinek <jakub@redhat.com> * task.c (gomp_create_artificial_team): Fix up handling of invocations from within explicit task. * target.c (GOMP_target_ext): Likewise. * testsuite/libgomp.c/task-7.c: New test. * testsuite/libgomp.c/task-8.c: New test. * testsuite/libgomp.c-c++-common/task-reduction-17.c: New test. * testsuite/libgomp.c-c++-common/task-reduction-18.c: New test. (cherry picked from commit a58a965eb73253759f6a3e1c7380392557da89c8)
2022-10-17tree-optimization/107254 - check and support live lanes from permutesRichard Biener2-5/+77
The following fixes an omission from adding SLP permute nodes which is live lanes originating from those. We have to check that we can extract the lane and have to actually code generate them. PR tree-optimization/107254 * tree-vect-slp.cc (vect_slp_analyze_node_operations_1): For permutes also analyze live lanes. (vect_schedule_slp_node): For permutes also code generate live lane extracts. * gfortran.dg/vect/pr107254.f90: New testcase. (cherry picked from commit 9ed4a849afb5b18b462bea311e7eee454c2c9f68)
2022-10-17tree-optimization/107212 - SLP reduction of reduction pathsRichard Biener3-7/+63
The following fixes an issue with how we handle epilogue generation for SLP reductions of reduction paths where the actual live lanes are not "canonical". We need to make sure to identify all live lanes as reductions and thus have to iterate over all participating SLP lanes when walking the reduction SSA use-def chain. Also the previous attempt likely to mitigate such issue in vectorizable_live_operation is misguided and has to be removed. PR tree-optimization/107212 * tree-vect-loop.cc (vectorizable_reduction): Make sure to set STMT_VINFO_REDUC_DEF for all live lanes in a SLP reduction. (vectorizable_live_operation): Do not pun to the SLP node representative for reduction epilogue generation. * gcc.dg/vect/pr107212-1.c: New testcase. * gcc.dg/vect/pr107212-2.c: Likewise. (cherry picked from commit ee467644c53ee2f7d633a8e1f53603feafab4351)