Age | Commit message (Collapse) | Author | Files | Lines |
|
Commit bce2c92cfec2ae1eb9d79e36dff5a220b688bfa1 "Various OpenACC reduction
enhancements - ME and nvptx changes" introduced several regressions:
gcc/testsuite/c-c++-common/goacc/nested-reductions-1-routine.c
gcc/testsuite/c-c++-common/goacc/nested-reductions-2-routine.c
gcc/testsuite/c-c++-common/goacc/orphan-reductions-2.c
gcc/testsuite/gfortran.dg/goacc/nested-reductions-1-routine.f90
gcc/testsuite/gfortran.dg/goacc/nested-reductions-2-routine.f90
gcc/testsuite/gfortran.dg/goacc/orphan-reductions-2.f90
This fixes above regressions.
gcc/ChangeLog:
* omp-offload.cc (oacc_loop_auto_partitions): Removed OLF reduction
handling.
|
|
This assertion in branch_prob:
if (bb == ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb)
{
location_t loc = DECL_SOURCE_LOCATION (current_function_decl);
gcc_checking_assert (!RESERVED_LOCATION_P (loc));
had been correct until the fix for PR debug/101598 was installed.
gcc/
* profile.cc (branch_prob): Be prepared for ignored functions with
DECL_SOURCE_LOCATION set to UNKNOWN_LOCATION.
gcc/testsuite:
* gnat.dg/specs/coverage1.ads: New test.
* gnat.dg/specs/variant_part.ads: Minor tweak.
* gnat.dg/specs/weak1.ads: Add dg directive.
|
|
For a parameter with BLKmode we cannot use REG_NREGS in order to
determine the number of consecutive registers. Streamlined this with
the implementation of s390_function_arg.
Fix some indentation whitespace, too.
gcc/ChangeLog:
PR target/106355
* config/s390/s390.cc (s390_call_saved_register_used): For a
parameter with BLKmode fix determining number of consecutive
registers.
gcc/testsuite/ChangeLog:
* gcc.target/s390/pr106355.h: Common code for new tests.
* gcc.target/s390/pr106355-1.c: New test.
* gcc.target/s390/pr106355-2.c: New test.
* gcc.target/s390/pr106355-3.c: New test.
(cherry picked from commit cb994acc08b67f26a54e7c5dc1f4995a2ce24d98)
|
|
PR target/107364
gcc/ChangeLog:
* common/config/i386/i386-cpuinfo.h (enum processor_vendor):
Fix pedantic warning.
(cherry picked from commit f3f000b7689ce9eb6364808072025672af1e4e1b)
|
|
PR target/107364
gcc/ChangeLog:
* common/config/i386/i386-cpuinfo.h (enum processor_vendor):
Reorder enum values as BUILTIN_VENDOR_MAX should not point
in the middle of the valid enum values.
(cherry picked from commit f751bf4c5d1aaa1aacfcbdec62881c5ea1175dfb)
|
|
|
|
spurious SIGSEGVs
Per commit r13-3460-g131d18e928a3ea1ab2d3bf61aa92d68a8a254609
"libgomp/nvptx: Prepare for reverse-offload callback handling",
I'm seeing a lot of libgomp execution test regressions. Random
example, 'libgomp.c-c++-common/error-1.c':
[...]
GOMP_OFFLOAD_run: kernel main$_omp_fn$0: launch [(teams: 1), 1, 1] [(lanes: 32), (threads: 8), 1]
Thread 1 "a.out" received signal SIGSEGV, Segmentation fault.
0x00007ffff793b87d in GOMP_OFFLOAD_run (ord=<optimized out>, tgt_fn=<optimized out>, tgt_vars=<optimized out>, args=<optimized out>) at [...]/source-gcc/libgomp/plugin/plugin-nvptx.c:2127
2127 if (__atomic_load_n (&ptx_dev->rev_data->fn, __ATOMIC_ACQUIRE) != 0)
(gdb) print ptx_dev
$1 = (struct ptx_device *) 0x6a55a0
(gdb) print ptx_dev->rev_data
$2 = (struct rev_offload *) 0xffffffff00000000
(gdb) print ptx_dev->rev_data->fn
Cannot access memory at address 0xffffffff00000000
libgomp/
* plugin/plugin-nvptx.c (nvptx_open_device): Initialize
'ptx_dev->rev_data'.
(cherry picked from commit 205538832b7033699047900cf25928f5920d8b93)
|
|
This patch disables vectorization of memory accesses to non-default address
spaces where the pointer size is different to the usual pointer size. This
condition typically occurs in OpenACC programs on amdgcn, where LDS memory is
used for broadcasting gang-private variables between threads. In particular,
see libgomp.oacc-c-c++-common/private-variables.c
The problem is that the address space information is dropped from the various
types in the middle-end and eventually it triggers an ICE trying to do an
address conversion. That ICE can be avoided by defining
POINTERS_EXTEND_UNSIGNED, but that just produces wrong RTL code later on.
A correct solution would ensure that all the vectypes have the correct address
spaces, but I don't have time for that right now.
gcc/ChangeLog:
* tree-vect-data-refs.cc (vect_analyze_data_refs): Workaround an
address-space bug.
|
|
It does work, but not well and only with the amdgpu.noreply=0 kernel boot
option.
gcc/ChangeLog:
* config/gcn/gcn.cc (gcn_init_cumulative_args): Disallow gfx908.
|
|
Allocate Unified Shared Memory via malloc and hsa_amd_svm_attributes_set,
instead of hsa_allocate_memory. This scheme should be more efficient for
for memory that is first accessed by the CPU.
libgomp/ChangeLog:
* plugin/plugin-gcn.c (HSA_AMD_SYSTEM_INFO_SVM_SUPPORTED): New.
(HSA_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT): New.
(HSA_AMD_SVM_ATTRIB_GLOBAL_FLAG): New.
(HSA_AMD_SVM_GLOBAL_FLAG_COARSE_GRAINED): New.
(hsa_amd_svm_attribute_pair_t): New.
(struct hsa_runtime_fn_info): Add hsa_amd_svm_attributes_set_fn.
(dump_hsa_system_info): Dump HSA_AMD_SYSTEM_INFO_SVM_SUPPORTED and
HSA_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT.
(DLSYM_OPT_FN): New.
(init_hsa_runtime_functions): Add hsa_amd_svm_attributes_set.
(GOMP_OFFLOAD_usm_alloc): Use malloc and hsa_amd_svm_attributes_set.
(GOMP_OFFLOAD_usm_free): Use regular free.
* testsuite/libgomp.c/usm-1.c: Add -mxnack=on for amdgcn.
* testsuite/libgomp.c/usm-2.c: Likewise.
* testsuite/libgomp.c/usm-3.c: Likewise.
* testsuite/libgomp.c/usm-4.c: Likewise.
|
|
This patch adds a stub 'gomp_target_rev' in the host's target.c, which will
later handle the reverse offload.
For nvptx, it adds support for forwarding the offload gomp_target_ext call
to the host by setting values in a struct on the device and querying it on
the host - invoking gomp_target_rev on the result.
include/ChangeLog:
* cuda/cuda.h (enum CUdevice_attribute): Add
CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING.
(CU_MEMHOSTALLOC_DEVICEMAP): Define.
(cuMemHostAlloc): Add prototype.
libgomp/ChangeLog:
* config/nvptx/icv-device.c (GOMP_DEVICE_NUM_VAR): Remove
'static' for this variable.
* config/nvptx/libgomp-nvptx.h: New file.
* config/nvptx/target.c: Include it.
(GOMP_ADDITIONAL_ICVS): Declare extern var.
(GOMP_REV_OFFLOAD_VAR): Declare var.
(GOMP_target_ext): Handle reverse offload.
* libgomp-plugin.h (GOMP_PLUGIN_target_rev): New prototype.
* libgomp-plugin.c (GOMP_PLUGIN_target_rev): New, call ...
* target.c (gomp_target_rev): ... this new stub function.
* libgomp.h (gomp_target_rev): Declare.
* libgomp.map (GOMP_PLUGIN_1.4): New; add GOMP_PLUGIN_target_rev.
* plugin/cuda-lib.def (cuMemHostAlloc): Add.
* plugin/plugin-nvptx.c: Include libgomp-nvptx.h.
(struct ptx_device): Add rev_data member.
(nvptx_open_device): Remove async_engines query, last used in
r10-304-g1f4c5b9b; add unified-address assert check.
(GOMP_OFFLOAD_get_num_devices): Claim unified address
support.
(GOMP_OFFLOAD_load_image): Free rev_fn_table if no
offload functions exist. Make offload var available
on host and device.
(rev_off_dev_to_host_cpy, rev_off_host_to_dev_cpy): New.
(GOMP_OFFLOAD_run): Handle reverse offload.
(cherry picked from commit 052dfa279f5de90b324d60cf787e821b18cf496c)
|
|
The OG12 commit
3e8b51d143e openacc: Move pass_oacc_device_lower after pass_graphite
adds a new pass, which also re-invokes some previous passes. This
seems to have the effect that the pass names and, hence, the dump
names have now a tailing number.
In that commit, 'cunrolli' was changed to 'cunrolli1 for some testcases.
This commit does likewise for some more testcases. In particular,
. in scan-tree-dump the tailing '1' is crucial to change UNRESOLVED
to PASS.
. for "-fdisable-tree-cunrolli1" option, it changes FAIL (excess errors)
to PASS
. even without the change, "-fdump-tree-cunrolli{-details,-optimized}"
has PASS, but I believe the tailing 1 ensures that only the first
'cunrolli' dumps.
gcc/testsuite
* g++.dg/ext/unroll-1.C: Change 'cunrolli' to 'cunrolli1' in
dg-options and scan-tree-dump.
* g++.dg/ext/unroll-2.C: Likewise.
* g++.dg/ext/unroll-3.C: Likewise.
* g++.dg/vect/pr36648.cc: Likewise.
* gcc.dg/tree-prof/init-array.c: Likewise.
* gcc.dg/tree-ssa/pr100359.c: Likewise.
* gcc.dg/tree-ssa/pr59597.c: Likewise.
* gcc.dg/unroll-2.c: Likewise.
* gfortran.dg/directive_unroll_1.f90: Likewise.
* gfortran.dg/directive_unroll_4.f90: Likewise.
* gnat.dg/unroll1.adb: Likewise.
* gnat.dg/unroll2.adb: Likewise.
|
|
For 'target parallel' and similarly nested directives, cgraph_node's
calls_declare_variant_alt was not set in the parent region node but in
cfun->decl. Hence, pass_omp_device_lower did not process handle the
internal function GOMP_TARGET_REV. - Solution is to set it to the
DECL_CONTEXT, which is set in adjust_context_and_scope.
The cgraph_node::create_clone issue is exposed with -O2 for the existing
libgomp.fortran/reverse-offload-1.f90.
PR middle-end/107236
gcc/ChangeLog:
* omp-expand.cc (expand_omp_target): Set calls_declare_variant_alt
in DECL_CONTEXT and not to cfun->decl.
* cgraphclones.cc (cgraph_node::create_clone): Copy also the
node's calls_declare_variant_alt value.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/target-device-ancestor-6.f90: New test.
(cherry picked from commit 178ac530fe67e4f2fc439cc4ce89bc19d571ca31)
|
|
Merge up to r12-8861-g1ccec25cf0c3c9cfd5882c83fd8cc56ea2987bad (24th Oct 2022)
|
|
(OpenMP 5.0)'
OG12 commit df47c25110474565f521508a1545232550052a75 included everything
of r13-150-g1a8c4d9ed36556a95bd7d53c04d2ec4c95594061 but the change to
gcc/testsuite/gcc.dg/gomp/pr104517.c
This commit cherry-picks the missing changes to that file.
Note: The OG12 commit already contained the ChangeLog.omp entry for
that file.
gcc/testsuite/
* gcc.dg/gomp/pr104517.c: Update.
(cherry picked from commit 1a8c4d9ed36556a95bd7d53c04d2ec4c95594061)
|
|
|
|
gcc/fortran/ChangeLog:
PR fortran/105633
* expr.cc (find_array_section): Move check for NULL pointers so
that both subscript triplets and vector subscripts are covered.
gcc/testsuite/ChangeLog:
PR fortran/105633
* gfortran.dg/pr105633.f90: New test.
Co-authored-by: Steven G. Kargl <kargl@gcc.gnu.org>
(cherry picked from commit ecb20df4fa6d99daa635c7fb662dc0554610777e)
|
|
|
|
|
|
GIMPLE_DEBUG were put in a parallel region of its own, which is not
only pointless but also breaks -fcompare-debug. With this commit,
they are handled like simple assignments: those placed are places
into the same body as the loop such that only one parallel region
remains as without debugging. This fixes the existing testcase
libgomp.oacc-c-c++-common/kernels-loop-g.c.
Note: GIMPLE_DEBUG are only accepted with -fcompare-debug; if they
appear otherwise, decompose_kernels_region_body rejects them with
a sorry (unchanged).
gcc/
* omp-oacc-kernels-decompose.cc (top_level_omp_for_in_stmt,
decompose_kernels_region_body): Handle GIMPLE_DEBUG like
simple assignment.
|
|
After commit r13-3404-g7c55755d4c760de326809636531478fd7419e1e5
"amdgcn: Use FLAT addressing for all functions with pointer arguments [PR105421]",
"big" private data now works for GCN offloading, too.
PR target/105421
libgomp/
* testsuite/libgomp.oacc-c-c++-common/private-big-1.c: New.
(cherry picked from commit c7ebee2378426eeca425ca5406af213a926f154c)
|
|
The GCN backend uses a heuristic to determine whether to use FLAT or
GLOBAL addressing in a particular (offload) function: namely, if a
function takes a pointer-to-scalar parameter, it is assumed that the
pointer may refer to "flat scratch" space, and thus FLAT addressing must
be used instead of GLOBAL.
I came up with this heuristic initially whilst working on support for
moving OpenACC gang-private variables into local-data share (scratch)
memory. The assumption that only scalar variables would be transformed in
that way turned out to be wrong. For example, prior to the next patch in
the series, Fortran compiler-generated temporary structures were treated
as gang private and moved to LDS space, typically overflowing the region
allocated for such variables. That will no longer happen after that
patch is applied, but there may be other cases of structs moving to LDS
space now or in the future that this patch may be needed for.
2022-10-14 Julian Brown <julian@codesourcery.com>
PR target/105421
gcc/
* config/gcn/gcn.cc (gcn_detect_incoming_pointer_arg): Any pointer
argument forces FLAT addressing mode, not just
pointer-to-non-aggregate.
(cherry picked from commit 7c55755d4c760de326809636531478fd7419e1e5)
|
|
This fixes multiple tests in addition to
b0256655fb402f87c921cd782b873dd301760ebd.
gcc/testsuite/ChangeLog:
* c-c++-common/goacc/classify-kernels-unparallelized-graphite.c:
Add "noclone" in scan-tree-dump.
* c-c++-common/goacc/kernels-acc-loop-reduction.c: Likewise.
* c-c++-common/goacc/kernels-acc-loop-smaller-equal.c: Likewise.
* c-c++-common/goacc/kernels-loop-2-acc-loop.c: Likewise.
* c-c++-common/goacc/kernels-loop-3-acc-loop.c: Likewise.
* c-c++-common/goacc/kernels-loop-acc-loop.c: Likewise.
* c-c++-common/goacc/kernels-loop-n-acc-loop.c: Likewise.
* gfortran.dg/goacc/kernels-loop-data-parloops-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-parloops-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-parloops.f95: Likewise.
|
|
The following reverts part of the PR94125 fix which causes us to
use a bogus partition ordering after applying versioning for
alias to the testcase in PR107323. Instead PR94125 is fixed by
appropriately considering to be merged SCCs when skipping edges
we want to ignore because of the alias versioning.
PR tree-optimization/107323
* tree-loop-distribution.cc (pg_unmark_merged_alias_ddrs):
New function.
(loop_distribution::break_alias_scc_partitions): Revert
postorder save/restore from the PR94125 fix. Instead
make sure to not ignore edges from SCCs we are going to
merge.
* gcc.dg/tree-ssa/pr107323.c: New testcase.
(cherry picked from commit 09f9814dc02c161ed78604c6df70b19b596f7524)
|
|
|
|
consistently, regardless of checking level
Fix-up for commit c14ea6a72fb1ae66e3d32ac8329558497c6e4403
"Catch 'GIMPLE_DEBUG' misbehavior in OpenACC 'kernels' decomposition
[PR100400, PR103836, PR104061]".
For C++ compilation of 'c-c++-common/goacc/kernels-decompose-pr100400-1-2.c',
we first emit a 'sorry' diagnostic, and then a 'gcc_unreachable' (or
'internal_error', see below) diagnostic, but for example, for
'--enable-checking=release' (thus, '!CHECKING_P'), the second one may actually
be turned into a 'confused by earlier errors, bailing out' diagnostic. (See
'gcc/diagnostic.cc:diagnostic_report_diagnostic': "When not checking, ICEs are
converted to fatal errors when an error has already occurred.") Thus, make
'c-c++-common/goacc/kernels-decompose-pr100400-1-2.c' behave consistently via
'-Wfatal-errors', and thus only matching the 'sorry' diagnostic.
For example, for '--enable-checking=no' (thus, '!ENABLE_ASSERT_CHECKING'), a
call to 'gcc_unreachable' cannot be assumed emit an 'internal_error'-like
diagnostic, so explicitly call 'internal_error' in
'gcc/omp-oacc-kernels-decompose.cc:visit_loops_in_gang_single_region', in the
'GIMPLE_OMP_FOR' case, to avoid regressing
'c-c++-common/goacc/kernels-decompose-pr100400-1-3.c', and
'c-c++-common/goacc/kernels-decompose-pr100400-1-4.c'.
PR middle-end/100400
gcc/
* omp-oacc-kernels-decompose.cc
(visit_loops_in_gang_single_region) <GIMPLE_OMP_FOR>: Explicitly
call 'internal_error'.
gcc/testsuite/
* c-c++-common/goacc/kernels-decompose-pr100400-1-2.c: Specify
'-Wfatal-errors'.
(cherry picked from commit da6305558bab9e24943848e4fc5bd8738d7e8f9b)
|
|
Bit of a brown-paper-bag bug, but: GCC was generating
non-existent merging forms of BRKAS and BRKBS. Those
instructions only support zero predication (although
BRKA and BRKB support both).
gcc/
* config/aarch64/aarch64-sve.md (*aarch64_brk<brk_op>_cc): Remove
merging alternative.
(*aarch64_brk<brk_op>_ptest): Likewise.
gcc/testsuite/
* gcc.target/aarch64/sve/acle/general/brka_1.c: Expect a separate
PTEST instruction.
* gcc.target/aarch64/sve/acle/general/brkb_1.c: Likewise.
(cherry picked from commit 57675c7f92a3bd3ca8dae1faac7f2f51d40e0f9e)
|
|
Unlike other flag-setting SVE instructions, BRKNS sets the flags
based on an all-true governing predicate, rather than the GP operand.
gcc/
* config/aarch64/iterators.md (SVE_BRKP): New iterator.
* config/aarch64/aarch64-sve.md (*aarch64_brkn_cc): New pattern.
(*aarch64_brkn_ptest): Likewise.
(*aarch64_brk<brk_op>_cc): Restrict to SVE_BRKP.
(*aarch64_brk<brk_op>_ptest): Likewise.
gcc/testsuite/
* gcc.target/aarch64/sve/acle/general/brkn_1.c: Expect separate
PTEST instructions.
* gcc.target/aarch64/sve/acle/general/brkn_2.c: New test.
(cherry picked from commit 6bec66640597e2604f51fc1642c7d279164cd442)
|
|
https://github.com/ARM-software/acle/pull/199 adds a new feature
macro for RCPC, for use in things like inline assembly. This patch
adds the associated support to GCC.
Also, RCPC is required for Armv8.3-A and later, but the armv8.3-a
entry didn't include it. This was probably harmless in practice
since GCC simply ignored the extension until now. (The GAS
definition is OK.)
gcc/
* config/aarch64/aarch64.h (AARCH64_FL_FOR_ARCH8_3): Add
AARCH64_FL_RCPC.
(AARCH64_ISA_RCPC): New macro.
* config/aarch64/aarch64-cores.def (thunderx3t110, zeus, neoverse-v1)
(neoverse-512tvb, saphira): Remove RCPC from these Armv8.3-A+ cores.
* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Define
__ARM_FEATURE_RCPC when appropriate.
gcc/testsuite/
* gcc.target/aarch64/pragma_cpp_predefs_1.c: Add RCPC tests.
|
|
-foffload-memory=
The USM implementation uses -foffload-memory=... which allocates variables
in a special memory. This does not support static variables. Hence, XFAIL
this test on nvptx/gcn. The requires-4a.c testcase tests the same but uses
hash memory instead.
libgomp/
* testsuite/libgomp.c-c++-common/requires-4.c: dg-xfail-run-if on
nvptx and gcn.
|
|
Duplicate libgomp.c-c++-common/requires-4.c (as ...-4a.c) but
with using a heap-allocated instead of static memory for a variable.
This change and the added offload_device_gcn check prepare for
pseudo-USM, where the device hardware cannot access all host
memory but only managed and pinned memory; for those, requires-4.c
will fail and the new check permits to add
target { ! { offload_device_nvptx || offload_device_gcn } }
to requires-4.c; however, it has not been added yet as pseuo-USM
support is not yet on mainline. (Review is pending for the USM
patches.)
include/ChangeLog:
* gomp-constants.h (GOMP_DEVICE_HSA): Comment out unused define.
libgomp/ChangeLog:
* testsuite/lib/libgomp.exp (check_effective_target_offload_device_gcn):
New.
* testsuite/libgomp.c-c++-common/on_device_arch.h (device_arch_gcn,
on_device_arch_gcn): New.
* testsuite/libgomp.c-c++-common/requires-4a.c: New test; copied from
requires-4.c but using heap-allocated memory.
(cherry picked from commit 12d9f5afbd2660862045acd41cb65a77e35bea4d)
|
|
|
|
Clear __eh_globals_init's _S_init in the dtor before deleting the
gthread key.
This ensures that, in case any code involved in deleting the key
interacts with eh_globals, the key that is being deleted won't be
used, and the non-thread-specific eh_globals fallback will.
for libstdc++-v3/ChangeLog
* libsupc++/eh_globals.cc [!_GLIBCXX_HAVE_TLS]
(__eh_globals_init::~__eh_globals_init): Clear _S_init first.
(cherry picked from commit a33dda016e5acf9c6325ce8a72a1b0238130374e)
|
|
In OG12 commit a6c1eccffb161130351d891dc87f5afe54f8075c,
"Fortran/OpenMP: Support mapping of DT with allocatable components"
the size of the addr/sizes/kind arrays was passed as 4th argument.
However, OpenACC uses >3 arguments for its own purpose, e.g. to
handle noncontiguous arrays by passing an array descriptor there.
This patch restores the previous behaviour for OpenACC, fixing
testcases like libgomp.oacc-c-c++-common/noncontig_array-1.c.
gcc/
* omp-expand.cc (expand_omp_target): Fix OpenACC in case there
are more than 3 arguments to the builtin function.
|
|
Missed to update gcc/fortran/ChangeLog.omp and to include the
following in previous commit, i.e.
commit 76b773a4a2d1daf0b83e50cd999bc38f8dd047be.
gcc/fortran/ChangeLog:
* trans-array.cc (non_negative_strides_array_p): Fix handling
of GFC_DECL_SAVED_DESCRIPTOR.
(gfc_conv_array_ref): Use ARRAY_REF again when possible.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/affinity-clause-1.f90: Revert to upsteam version,
update one scan-tree item.
* gfortran.dg/gomp/depend-4.f90: Revert to upstream version.
* gfortran.dg/gomp/depend-5.f90: Likewise.
* gfortran.dg/gomp/depend-6.f90: Likewise.
|
|
The delinearization patch "Fortran: delinearize multi-dimensional array
accesses", OG12 commit 39a8c371fda6136cf77c74895a00b136409e0ba3 uses
gfc_build_array_ref for the non-delinearization path. The generated
code depends on whether there can be negative strides or not, an
addition to that function in r12-8230-g7964ab6c364 - adding a Boolean
argument.
The follow-up OG12 commit "Fix Fortran array-access regressions",
9fb0076b11eb2774b620bcf2171d55c7d1fb899f also added this argument
to the call in gfc_conv_array_ref, but always evaluating as false.
This commit changes it to a call to non_negative_strides_array_p
(Note: for 'se->expr' not 'base'; the former could be 'arraydesc'
while the later is then 'arraydesc.data' whose TREE_TYPE does not
contain information about the array type.)
However, doing so revealed a bug in non_negative_strides_array_p,
fixed in this commit but also submitted as "Fortran: Fix
non_negative_strides_array_p" to mainline,
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603883.html
As a side effect of this commit, several testcases now pass and the
OG12-only changes to depend-{4,5,6}.f90 and affinity-clause-1.f90
could be undone, except that the latter now uses the delinearized
array syntax in one case, which is an improvement (as honored in
the scan-dump-tree). Hence, this commit (partially) reverts the
commits:
21c806f73fc gfortran.dg/gomp/{depend-5,scope-6}.f90: Update scan-tree-dump
014fc7cd451 Fix dg- pattern for gomp/{affinity-clause-1.f90,uses_allocators-3.f90}
2d8aa5cc5d3 gfortran.dg/gomp/depend-6.f90: minor fix + dump update
d77133b29fc gfortran.dg/gomp/depend-4.f90: minor fix + dump update
The main testcase for non_negative_strides_array_p is
gfortran.dg/array_reference_3.f90, which now also passes as well.
Additionally, this changes prevents some unintended implicit
mapping such that libgomp.fortran/map-alloc-comp-{4,6}.f90 failed
before - and now passes again.
|
|
There was a patch posted to remove the undefined behaviour from this
testcase, but it appear to never have been applied.
gcc/teststuite/
PR tree-optimization/102892
* gcc.dg/pr102892-1.c: Remove undefined behaviour.
|
|
As PR96072 shows, the code adding REG_CFA_DEF_CFA reg note
makes one assumption that we have emitted one insn which
restores the frame pointer previously. That part of code
was guarded with flag frame_pointer_needed before, it was
consistent, but it was replaced with flag
frame_pointer_needed_indeed since commit r10-7981. It
caused ICE due to unexpected NULL insn.
PR target/96072
gcc/ChangeLog:
* config/rs6000/rs6000-logue.cc (rs6000_emit_epilogue): Update the
condition for adding REG_CFA_DEF_CFA reg note with
frame_pointer_needed_indeed.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/pr96072.c: New test.
(cherry picked from commit 5be0950d22209f5ba69d244387228e12389a8470)
|
|
PR100645 exposes one latent bug in define_expand vec_shr_<mode>
that the current condition TARGET_ALTIVEC is too loose. The
mode iterator VEC_L contains a few modes, they are not always
supported as vector mode, VECTOR_UNIT_ALTIVEC_OR_VSX_P should
be used like some other VEC_L usages.
PR target/100645
gcc/ChangeLog:
* config/rs6000/vector.md (vec_shr_<mode>): Replace condition
TARGET_ALTIVEC with VECTOR_UNIT_ALTIVEC_OR_VSX_P.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/pr100645.c: New test.
(cherry picked from commit bfad7069b74c97000b698191c1945f07a6192db5)
|
|
|
|
Merge up to r12-8843-g912bdd5cfb92f6dd58accd755ad14f47c0df619e (18th Oct 2022)
|
|
|
|
For ABI_V4, we do not split complex args. This created a problem because
even though an arg would be passed in two VSX regs, we were only advancing the
function arg counter by one VSX register. Fixed with this patch.
PR target/99685
gcc/
* config/rs6000/rs6000-call.cc (rs6000_function_arg_advance_1): Bump
register count when not splitting IEEE 128-bit Complex.
(cherry picked from commit 2ee68beee709e48fce85b8892ff9985acc6a91a8)
|
|
PR fortran/107266
gcc/fortran/
* trans-expr.cc (gfc_conv_string_parameter): Use passed
type to honor character kind.
* trans-types.cc (gfc_sym_type): Honor character kind.
* trans-decl.cc (gfc_conv_cfi_to_gfc): Fix handling kind=4
character strings.
gcc/testsuite/
* gfortran.dg/char4_decl.f90: New test.
* gfortran.dg/char4_decl-2.f90: New test.
(cherry picked from commit c610cf20ebb3444ef4224d789aca670a12f5da40)
|
|
Fortranized testcases of commits r13-3257-ga58a965eb73
and r13-3258-g0ec4e93fb9f.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/task-7.f90: New test.
* testsuite/libgomp.fortran/task-8.f90: New test.
* testsuite/libgomp.fortran/task-in-explicit-1.f90: New test.
* testsuite/libgomp.fortran/task-in-explicit-2.f90: New test.
* testsuite/libgomp.fortran/task-in-explicit-3.f90: New test.
* testsuite/libgomp.fortran/task-reduction-17.f90: New test.
* testsuite/libgomp.fortran/task-reduction-18.f90: New test.
(cherry picked from commit ab8477af9949a7e6fcaf89c5f1dcf32788accf88)
|
|
The previous bullet correctly mentions 5.2 added for Fortran
allocators directive which is a replacement of allocate directive
associated with ALLOCATE statement to differentiate it at parse time
from allocate directive as declarative one not associated with ALLOCATE
statement, but the deprecation bullet talks about non-existing allocator
directive.
2022-10-12 Jakub Jelinek <jakub@redhat.com>
* libgomp.texi (OpenMP 5.2): Fix up allocator -> allocate directive
in deprecation bullet.
(cherry picked from commit caf9db5a7f99fae8b6088328b9b48ee79fa5e5f0)
|
|
This is pretty straightforward, if gomp_thread ()->task is NULL,
it can't be explicit task, otherwise if
gomp_thread ()->task->kind == GOMP_TASK_IMPLICIT, it is an implicit
task, otherwise explicit task.
2022-10-12 Jakub Jelinek <jakub@redhat.com>
* omp.h.in (omp_in_explicit_task): Declare.
* omp_lib.h.in (omp_in_explicit_task): Likewise.
* omp_lib.f90.in (omp_in_explicit_task): New interface.
* libgomp.map (OMP_5.2): New symbol version, export
omp_in_explicit_task and omp_in_explicit_task_.
* task.c (omp_in_explicit_task): New function.
* fortran.c (omp_in_explicit_task): Add ialias_redirect.
(omp_in_explicit_task_): New function.
* libgomp.texi (OpenMP 5.2): Mark omp_in_explicit_task as implemented.
* testsuite/libgomp.c-c++-common/task-in-explicit-1.c: New test.
* testsuite/libgomp.c-c++-common/task-in-explicit-2.c: New test.
* testsuite/libgomp.c-c++-common/task-in-explicit-3.c: New test.
(cherry picked from commit 0ec4e93fb9fa5e9d2424683c5fab1310c8ae2f76)
|
|
When not in explicit parallel/target/teams construct, we in some cases create
an artificial parallel with a single thread (either to handle target nowait
or for task reduction purposes). In those cases, it handled again artificially
created implicit task (created by gomp_new_icv for cases where we needed to write
to some ICVs), but as the testcases show, didn't take into account possibility
of this being done from explicit task(s). The code would destroy/free the previous
task and replace it with the new implicit task. If task is an explicit task
(when teams is NULL, all explicit tasks behave like if (0)), it is a pointer to
a local stack variable, so freeing it doesn't work, and additionally we shouldn't
lose the explicit tasks - the new implicit task should instead replace the
ancestor task which is the first implicit one.
2022-10-12 Jakub Jelinek <jakub@redhat.com>
* task.c (gomp_create_artificial_team): Fix up handling of invocations
from within explicit task.
* target.c (GOMP_target_ext): Likewise.
* testsuite/libgomp.c/task-7.c: New test.
* testsuite/libgomp.c/task-8.c: New test.
* testsuite/libgomp.c-c++-common/task-reduction-17.c: New test.
* testsuite/libgomp.c-c++-common/task-reduction-18.c: New test.
(cherry picked from commit a58a965eb73253759f6a3e1c7380392557da89c8)
|
|
The following fixes an omission from adding SLP permute nodes which
is live lanes originating from those. We have to check that we
can extract the lane and have to actually code generate them.
PR tree-optimization/107254
* tree-vect-slp.cc (vect_slp_analyze_node_operations_1):
For permutes also analyze live lanes.
(vect_schedule_slp_node): For permutes also code generate
live lane extracts.
* gfortran.dg/vect/pr107254.f90: New testcase.
(cherry picked from commit 9ed4a849afb5b18b462bea311e7eee454c2c9f68)
|
|
The following fixes an issue with how we handle epilogue generation
for SLP reductions of reduction paths where the actual live lanes
are not "canonical". We need to make sure to identify all live
lanes as reductions and thus have to iterate over all participating
SLP lanes when walking the reduction SSA use-def chain. Also the
previous attempt likely to mitigate such issue in
vectorizable_live_operation is misguided and has to be removed.
PR tree-optimization/107212
* tree-vect-loop.cc (vectorizable_reduction): Make sure to
set STMT_VINFO_REDUC_DEF for all live lanes in a SLP
reduction.
(vectorizable_live_operation): Do not pun to the SLP
node representative for reduction epilogue generation.
* gcc.dg/vect/pr107212-1.c: New testcase.
* gcc.dg/vect/pr107212-2.c: Likewise.
(cherry picked from commit ee467644c53ee2f7d633a8e1f53603feafab4351)
|