aboutsummaryrefslogtreecommitdiff
path: root/libgomp/testsuite
AgeCommit message (Collapse)AuthorFilesLines
2022-02-09C, C++, Fortran, OpenMP: Add 'has_device_addr' clause to 'target' construct.Marcel Vollweiler10-0/+478
This patch adds the 'has_device_addr' clause to the OpenMP 'target' construct which was introduced in OpenMP 5.1 (OpenMP API 5.1 specification pp. 197ff): has_device_addr(list) "The has_device_addr clause indicates that its list items already have device addresses and therefore they may be directly accessed from a target device. If the device address of a list item is not for the device on which the target region executes, accessing the list item inside the region results in unspecified behavior. The list items may include array sections." (p. 200) "A list item may not be specified in both an is_device_ptr clause and a has_device_addr clause on the directive." (p. 202) "A list item that appears in an is_device_ptr or a has_device_addr clause must not be specified in any data-sharing attribute clause on the same target construct." (p. 203) gcc/c-family/ChangeLog: * c-omp.cc (c_omp_split_clauses): Added OMP_CLAUSE_HAS_DEVICE_ADDR case. * c-pragma.h (enum pragma_kind): Added 5.1 in comment. (enum pragma_omp_clause): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR. gcc/c/ChangeLog: * c-parser.cc (c_parser_omp_clause_name): Parse 'has_device_addr' clause. (c_parser_omp_variable_list): Handle array sections. (c_parser_omp_clause_has_device_addr): Added. (c_parser_omp_all_clauses): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR case. (c_parser_omp_target_exit_data): Added HAS_DEVICE_ADDR to OMP_CLAUSE_MASK. * c-typeck.cc (handle_omp_array_sections): Handle clause restrictions. (c_finish_omp_clauses): Handle array sections. gcc/cp/ChangeLog: * parser.cc (cp_parser_omp_clause_name): Parse 'has_device_addr' clause. (cp_parser_omp_var_list_no_open): Handle array sections. (cp_parser_omp_all_clauses): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR case. (cp_parser_omp_target_update): Added HAS_DEVICE_ADDR to OMP_CLAUSE_MASK. * semantics.cc (handle_omp_array_sections): Handle clause restrictions. (finish_omp_clauses): Handle array sections. gcc/fortran/ChangeLog: * dump-parse-tree.cc (show_omp_clauses): Added OMP_LIST_HAS_DEVICE_ADDR case. * gfortran.h: Added OMP_LIST_HAS_DEVICE_ADDR. * openmp.cc (enum omp_mask2): Added OMP_CLAUSE_HAS_DEVICE_ADDR. (gfc_match_omp_clauses): Parse HAS_DEVICE_ADDR clause. (resolve_omp_clauses): Same. * trans-openmp.cc (gfc_trans_omp_variable_list): Added OMP_LIST_HAS_DEVICE_ADDR case. (gfc_trans_omp_clauses): Firstprivatize of array descriptors. gcc/ChangeLog: * gimplify.cc (gimplify_scan_omp_clauses): Added cases for OMP_CLAUSE_HAS_DEVICE_ADDR and handle array sections. (gimplify_adjust_omp_clauses): Added OMP_CLAUSE_HAS_DEVICE_ADDR case. * omp-low.cc (scan_sharing_clauses): Handle OMP_CLAUSE_HAS_DEVICE_ADDR. (lower_omp_target): Same. * tree-core.h (enum omp_clause_code): Same. * tree-nested.cc (convert_nonlocal_omp_clauses): Same. (convert_local_omp_clauses): Same. * tree-pretty-print.cc (dump_omp_clause): Same. * tree.cc: Same. libgomp/ChangeLog: * libgomp.texi: Updated entry for HAS_DEVICE_ADDR. * target.c (copy_firstprivate_data): Copy only if host address is not NULL. * testsuite/libgomp.c++/target-has-device-addr-2.C: New test. * testsuite/libgomp.c++/target-has-device-addr-4.C: New test. * testsuite/libgomp.c++/target-has-device-addr-5.C: New test. * testsuite/libgomp.c++/target-has-device-addr-6.C: New test. * testsuite/libgomp.c-c++-common/target-has-device-addr-1.c: New test. * testsuite/libgomp.c/target-has-device-addr-3.c: New test. * testsuite/libgomp.fortran/target-has-device-addr-1.f90: New test. * testsuite/libgomp.fortran/target-has-device-addr-2.f90: New test. * testsuite/libgomp.fortran/target-has-device-addr-3.f90: New test. * testsuite/libgomp.fortran/target-has-device-addr-4.f90: New test. gcc/testsuite/ChangeLog: * c-c++-common/gomp/clauses-1.c: Added has_device_addr to test cases. * g++.dg/gomp/attrs-1.C: Added has_device_addr to test cases. * g++.dg/gomp/attrs-2.C: Added has_device_addr to test cases. * c-c++-common/gomp/target-has-device-addr-1.c: New test. * c-c++-common/gomp/target-has-device-addr-2.c: New test. * c-c++-common/gomp/target-is-device-ptr-1.c: New test. * c-c++-common/gomp/target-is-device-ptr-2.c: New test. * gfortran.dg/gomp/is_device_ptr-3.f90: New test. * gfortran.dg/gomp/target-has-device-addr-1.f90: New test. * gfortran.dg/gomp/target-has-device-addr-2.f90: New test.
2022-02-08libgomp: Fix segfault with posthumous orphan tasks [PR104385]Jakub Jelinek1-0/+26
The following patch fixes crashes with posthumous orphan tasks. When a parent task finishes, gomp_clear_parent clears the parent pointers of its children tasks present in the parent->children_queue. But children that are still waiting for dependencies aren't in that queue yet, they will be added there only when the sibling they are waiting for exits. Unfortunately we were adding those tasks into the queues with the original task->parent which then causes crashes because that task is gone and freed. The following patch fixes that by clearing the parent field when we schedule such task for running by adding it into the queues and we know that the sibling task which is about to finish has NULL parent. 2022-02-08 Jakub Jelinek <jakub@redhat.com> PR libgomp/104385 * task.c (gomp_task_run_post_handle_dependers): If parent is NULL, clear task->parent. * testsuite/libgomp.c/pr104385.c: New test.
2022-02-04libgomp.fortran/allocate-1.f90: Fix minor cleanupTobias Burnus1-3/+0
libgomp/ChangeLog: * testsuite/libgomp.fortran/allocate-1.f90: Remove spurious STOP of previous commit.
2022-02-04libgomp.fortran/allocate-1.f90: Minor cleanupTobias Burnus2-56/+60
libgomp/ChangeLog: * testsuite/libgomp.fortran/allocate-1.c (is_64bit_aligned): Renamed from is_64bit_aligned_. * testsuite/libgomp.fortran/allocate-1.f90: Fix interface decl and use it, more implicit none, remove unused argument.
2022-02-01[nvptx] Add some support for .local atomicsTom de Vries3-21/+0
The ptx insn atom doesn't support local memory. In case of doing an atomic operation on local memory, we run into: ... operation not supported on global/shared address space ... This is the cuGetErrorString message for CUDA_ERROR_INVALID_ADDRESS_SPACE. The message is somewhat confusing given that actually the operation is not supported on local address space. Fix this by falling back on a non-atomic version when detecting a frame-related memory operand. This only solves some cases that are detected at compile-time. It does however fix the openacc private-atomic-* test-cases. Tested on x86_64 with nvptx accelerator. gcc/ChangeLog: 2022-01-27 Tom de Vries <tdevries@suse.de> * config/nvptx/nvptx.md (define_insn "atomic_compare_and_swap<mode>_1") (define_insn "atomic_exchange<mode>") (define_insn "atomic_fetch_add<mode>") (define_insn "atomic_fetch_addsf") (define_insn "atomic_fetch_<logic><mode>"): Output non-atomic version if memory operands is frame-relative. gcc/testsuite/ChangeLog: 2022-01-31 Tom de Vries <tdevries@suse.de> * gcc.target/nvptx/stack-atomics-run.c: New test. libgomp/ChangeLog: 2022-01-27 Tom de Vries <tdevries@suse.de> * testsuite/libgomp.oacc-c-c++-common/private-atomic-1.c: Remove PR83812 workaround. * testsuite/libgomp.oacc-fortran/private-atomic-1-vector.f90: Same. * testsuite/libgomp.oacc-fortran/private-atomic-1-worker.f90: Same.
2022-02-01[libgomp, testsuite] Fix insufficient resources in test-casesTom de Vries3-3/+25
When running libgomp test-case broadcast-many.c on an nvptx accelerator (T400, driver version 470.86), I run into: ... libgomp: The Nvidia accelerator has insufficient resources to launch \ 'main$_omp_fn$0' with num_workers = 32 and vector_length = 32; \ recompile the program with 'num_workers = x and vector_length = y' on \ that offloaded region or '-fopenacc-dim=:x:y' where x * y <= 896. FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/broadcast-many.c \ -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none \ -O0 execution test ... The error does not occur when using GOMP_NVPTX_JIT=-O0. Fix this by using 896 / 32 == 28 workers for ACC_DEVICE_TYPE_nvidia. Likewise for some other test-cases. Tested libgomp on x86_64 with nvptx accelerator. libgomp/ChangeLog: 2022-01-27 Tom de Vries <tdevries@suse.de> * testsuite/libgomp.oacc-c-c++-common/broadcast-many.c: Reduce num_workers for nvidia accelerator to fix libgomp error 'insufficient resources'. * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c: Same. * testsuite/libgomp.oacc-c-c++-common/reduction-7.c: Same.
2022-02-01[libgomp, testsuite] Reduce recursion depth in declare_target-*.f90Tom de Vries2-3/+7
When running the libgomp testsuite with GOMP_NVPTX_JIT=-O0 using an nvptx accelerator (Nvidia T400, 2GB), I run into: ... libgomp: cuCtxSynchronize error: unspecified launch failure \ (perhaps abort was called) libgomp: cuMemFree_v2 error: unspecified launch failure libgomp: device finalization failed FAIL: libgomp.fortran/examples-4/declare_target-1.f90 -O0 execution test ... The test-case contains: ... ! Reduced from 25 to 23, otherwise execution runs out of thread stack on ! Nvidia Titan V. if (fib (23) /= fib_wrapper (23)) stop 2 ... Fix this by reducing the fib/fib_wrapper argument from 23 to 22. Same for declare_target-2.f90. Tested on x86_64 with nvptx accelerator. libgomp/ChangeLog: 2022-01-27 Tom de Vries <tdevries@suse.de> * testsuite/libgomp.fortran/examples-4/declare_target-1.f90: Reduce recursion depth. * testsuite/libgomp.fortran/examples-4/declare_target-2.f90: Same.
2022-01-21Strengthen a few OpenACC test casesThomas Schwinge15-57/+202
Rather than rubber-stamp whatever requested vs. actual device kernel launch configuration happens, actually (again) verify the requested values (modulo expected variations). This better highlights that "AMD GCN has an upper limit of 'num_workers(16)'", and the deficiency that "AMD GCN uses the autovectorizer for the vector dimension: the use of a function call in vector-partitioned code [...] is not currently supported". And, this removes several instances of race conditions, where variables are concurrently written to in OpenACC gang-redundant mode. libgomp/ * testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c: Strengthen. * testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-v-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-wv-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/routine-gwv-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/routine-v-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/routine-w-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c: Likewise.
2022-01-19nvptx: update fix for -Wformat-diagMartin Liska16-33/+33
gcc/ChangeLog: * config/nvptx/nvptx.cc (nvptx_goacc_validate_dims_1): Update warning messages. libgomp/ChangeLog: * testsuite/libgomp.oacc-c++/privatized-ref-2.C: Update scanning patterns. * testsuite/libgomp.oacc-c++/privatized-ref-3.C: Likewise. * testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/pr85486.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/pr95270-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/routine-nohost-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/struct-copyout-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/struct-copyout-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/vector-length-64-1.c: Likewise. * testsuite/libgomp.oacc-fortran/attach-descriptor-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/derivedtypes-arrays-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-loop-2.f95: Likewise. * testsuite/libgomp.oacc-fortran/parallel-dims.f90: Likewise. * testsuite/libgomp.oacc-fortran/privatized-ref-1.f95: Likewise. Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
2022-01-18nvptx: fix -Wformat-diag warningsMartin Liska16-33/+33
gcc/ChangeLog: * config/nvptx/nvptx.cc (nvptx_goacc_validate_dims_1): Wrap keyword. * config/nvptx/nvptx.md: Remove trailing dot. libgomp/ChangeLog: * testsuite/libgomp.oacc-c++/privatized-ref-2.C: Update keyword in dg-warning. * testsuite/libgomp.oacc-c++/privatized-ref-3.C: Likewise. * testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/pr85486.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/pr95270-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/routine-nohost-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/struct-copyout-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/struct-copyout-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/vector-length-64-1.c: Likewise. * testsuite/libgomp.oacc-fortran/attach-descriptor-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/derivedtypes-arrays-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-loop-2.f95: Likewise. * testsuite/libgomp.oacc-fortran/parallel-dims.f90: Likewise. * testsuite/libgomp.oacc-fortran/privatized-ref-1.f95: Likewise.
2022-01-17Extend test cases for references in OpenACC 'private' clausesThomas Schwinge3-22/+364
libgomp/ * testsuite/libgomp.oacc-c++/privatized-ref-2.C: Extend. * testsuite/libgomp.oacc-c++/privatized-ref-3.C: Likewise. * testsuite/libgomp.oacc-fortran/privatized-ref-1.f95: Likewise.
2022-01-17Test cases for references in OpenACC 'private' clausesJulian Brown3-0/+211
libgomp/ * testsuite/libgomp.oacc-fortran/privatized-ref-1.f95: New test. * testsuite/libgomp.oacc-c++/privatized-ref-2.C: New test. * testsuite/libgomp.oacc-c++/privatized-ref-3.C: New test. Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
2022-01-13Add support for allocate clause (OpenMP 5.0).Hafiz Abid Qadeer2-0/+340
This patch adds support for OpenMP 5.0 allocate clause for fortran. It does not yet support the allocator-modifier as specified in OpenMP 5.1. The allocate clause is already supported in C/C++. gcc/fortran/ChangeLog: * dump-parse-tree.c (show_omp_clauses): Handle OMP_LIST_ALLOCATE. * gfortran.h (OMP_LIST_ALLOCATE): New enum value. * openmp.c (enum omp_mask1): Add OMP_CLAUSE_ALLOCATE. (gfc_match_omp_clauses): Handle OMP_CLAUSE_ALLOCATE (OMP_PARALLEL_CLAUSES, OMP_DO_CLAUSES, OMP_SECTIONS_CLAUSES) (OMP_TASK_CLAUSES, OMP_TASKLOOP_CLAUSES, OMP_TARGET_CLAUSES) (OMP_TEAMS_CLAUSES, OMP_DISTRIBUTE_CLAUSES) (OMP_SINGLE_CLAUSES): Add OMP_CLAUSE_ALLOCATE. (OMP_TASKGROUP_CLAUSES): New. (gfc_match_omp_taskgroup): Use OMP_TASKGROUP_CLAUSES instead of OMP_CLAUSE_TASK_REDUCTION. (resolve_omp_clauses): Handle OMP_LIST_ALLOCATE. (resolve_omp_do): Avoid warning when loop iteration variable is in allocate clause. * trans-openmp.c (gfc_trans_omp_clauses): Handle translation of allocate clause. (gfc_split_omp_clauses): Update for OMP_LIST_ALLOCATE. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/allocate-1.f90: New test. * gfortran.dg/gomp/allocate-2.f90: New test. * gfortran.dg/gomp/allocate-3.f90: New test. * gfortran.dg/gomp/collapse1.f90: Update error message. * gfortran.dg/gomp/openmp-simd-4.f90: Likewise. * gfortran.dg/gomp/clauses-1.f90: Uncomment allocate clause. libgomp/ChangeLog: * testsuite/libgomp.fortran/allocate-1.c: New test. * testsuite/libgomp.fortran/allocate-1.f90: New test. * libgomp.texi: Remove string that says that allocate clause support is for C/C++ only.
2022-01-13Improve Intel MIC offloading XFAILing for 'omp_get_device_num'Thomas Schwinge4-3/+36
After recent commit be661959a6b6d8f9c3c8608a746789e7b2ec3ca4 "libgomp/testsuite: Improve omp_get_device_num() tests", we're now iterating over all OpenMP target devices. Intel MIC (emulated) offloading still doesn't properly implement device-side 'omp_get_device_num', and we thus regress: PASS: libgomp.c/../libgomp.c-c++-common/target-45.c (test for excess errors) [-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/target-45.c execution test PASS: libgomp.c++/../libgomp.c-c++-common/target-45.c (test for excess errors) [-PASS:-]{+FAIL:+} libgomp.c++/../libgomp.c-c++-common/target-45.c execution test PASS: libgomp.fortran/target10.f90 -O0 (test for excess errors) [-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90 -O0 execution test PASS: libgomp.fortran/target10.f90 -O1 (test for excess errors) [-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90 -O1 execution test PASS: libgomp.fortran/target10.f90 -O2 (test for excess errors) [-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90 -O2 execution test PASS: libgomp.fortran/target10.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors) [-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test PASS: libgomp.fortran/target10.f90 -O3 -g (test for excess errors) [-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90 -O3 -g execution test PASS: libgomp.fortran/target10.f90 -Os (test for excess errors) [-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90 -Os execution test Improve the XFAILing added in commit bb75b22aba254e8ff144db27b1c8b4804bad73bb "Allow matching Intel MIC in OpenMP 'declare variant'" for the case that *any* Intel MIC offload device is available. libgomp/ * testsuite/libgomp.c-c++-common/on_device_arch.h (any_device_arch, any_device_arch_intel_mic): New. * testsuite/lib/libgomp.exp (check_effective_target_offload_device_any_intel_mic): New. * testsuite/libgomp.c-c++-common/target-45.c: Use it. * testsuite/libgomp.fortran/target10.f90: Likewise.
2022-01-13Document current '-Wuninitialized' diagnostics for ↵Thomas Schwinge1-1/+13
'libgomp.oacc-fortran/routine-10.f90' [PR102192] libgomp/ PR tree-optimization/102192 * testsuite/libgomp.oacc-fortran/routine-10.f90: Document current '-Wuninitialized' diagnostics.
2022-01-13Document current '-Wuninitialized'/'-Wmaybe-uninitialized' diagnostics for ↵Thomas Schwinge12-3/+108
OpenACC test cases ... including "note: '[...]' was declared here" emitted since recent commit 9695e1c23be5b5c55d572ced152897313ddb96ae "Improve -Wuninitialized note location". For those that seemed incorrect to me, I've placed XFAILed 'dg-bogus'es, including one more instance of PR77504 etc., and several instances where for "local variables" of reference-data-type reductions (etc.?) we emit bogus (?) diagnostics. For implicit data clauses (including 'firstprivate'), we seem to be missing diagnostics, so I've placed XFAILed 'dg-warning's. gcc/testsuite/ * c-c++-common/goacc/builtin-goacc-parlevel-id-size.c: Document current '-Wuninitialized' diagnostics. * c-c++-common/goacc/mdc-1.c: Likewise. * c-c++-common/goacc/nested-reductions-1-kernels.c: Likewise. * c-c++-common/goacc/nested-reductions-1-parallel.c: Likewise. * c-c++-common/goacc/nested-reductions-1-routine.c: Likewise. * c-c++-common/goacc/nested-reductions-2-kernels.c: Likewise. * c-c++-common/goacc/nested-reductions-2-parallel.c: Likewise. * c-c++-common/goacc/nested-reductions-2-routine.c: Likewise. * c-c++-common/goacc/uninit-dim-clause.c: Likewise. * c-c++-common/goacc/uninit-firstprivate-clause.c: Likewise. * c-c++-common/goacc/uninit-if-clause.c: Likewise. * gfortran.dg/goacc/array-with-dt-1.f90: Likewise. * gfortran.dg/goacc/array-with-dt-2.f90: Likewise. * gfortran.dg/goacc/array-with-dt-3.f90: Likewise. * gfortran.dg/goacc/array-with-dt-4.f90: Likewise. * gfortran.dg/goacc/array-with-dt-5.f90: Likewise. * gfortran.dg/goacc/derived-chartypes-1.f90: Likewise. * gfortran.dg/goacc/derived-chartypes-2.f90: Likewise. * gfortran.dg/goacc/derived-chartypes-3.f90: Likewise. * gfortran.dg/goacc/derived-chartypes-4.f90: Likewise. * gfortran.dg/goacc/derived-classtypes-1.f95: Likewise. * gfortran.dg/goacc/derived-types-2.f90: Likewise. * gfortran.dg/goacc/host_data-tree.f95: Likewise. * gfortran.dg/goacc/kernels-tree.f95: Likewise. * gfortran.dg/goacc/modules.f95: Likewise. * gfortran.dg/goacc/nested-reductions-1-kernels.f90: Likewise. * gfortran.dg/goacc/nested-reductions-1-parallel.f90: Likewise. * gfortran.dg/goacc/nested-reductions-1-routine.f90: Likewise. * gfortran.dg/goacc/nested-reductions-2-kernels.f90: Likewise. * gfortran.dg/goacc/nested-reductions-2-parallel.f90: Likewise. * gfortran.dg/goacc/nested-reductions-2-routine.f90: Likewise. * gfortran.dg/goacc/parallel-tree.f95: Likewise. * gfortran.dg/goacc/pr93464.f90: Likewise. * gfortran.dg/goacc/privatization-1-compute-loop.f90: Likewise. * gfortran.dg/goacc/privatization-1-compute.f90: Likewise. * gfortran.dg/goacc/privatization-1-routine_gang-loop.f90: Likewise. * gfortran.dg/goacc/privatization-1-routine_gang.f90: Likewise. * gfortran.dg/goacc/uninit-dim-clause.f95: Likewise. * gfortran.dg/goacc/uninit-firstprivate-clause.f95: Likewise. * gfortran.dg/goacc/uninit-if-clause.f95: Likewise. * gfortran.dg/goacc/uninit-use-device-clause.f95: Likewise. * gfortran.dg/goacc/wait.f90: Likewise. libgomp/ * testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Document current '-Wuninitialized' diagnostics. * testsuite/libgomp.oacc-fortran/data-5.f90: Likewise. * testsuite/libgomp.oacc-fortran/gemm-2.f90: Likewise. * testsuite/libgomp.oacc-fortran/gemm.f90: Likewise. * testsuite/libgomp.oacc-fortran/optional-reduction.f90: Likewise. * testsuite/libgomp.oacc-fortran/parallel-reduction.f90: Likewise. * testsuite/libgomp.oacc-fortran/pr70643.f90: Likewise. * testsuite/libgomp.oacc-fortran/pr96628-part1.f90: Likewise. * testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Likewise. * testsuite/libgomp.oacc-fortran/reduction-5.f90: Likewise. * testsuite/libgomp.oacc-fortran/reduction-7.f90: Likewise. * testsuite/libgomp.oacc-fortran/reference-reductions.f90: Likewise.
2022-01-13Host and offload targets have no common meaning of address spacesThomas Schwinge1-4/+0
gcc/ * tree-streamer-out.c (pack_ts_base_value_fields): Don't pack 'TYPE_ADDR_SPACE' for offloading. * tree-streamer-in.c (unpack_ts_base_value_fields): Don't unpack 'TYPE_ADDR_SPACE' for offloading. libgomp/ * testsuite/libgomp.c/address-space-1.c: Remove 'dg-xfail-run-if' for 'offload_device_intel_mic'.
2022-01-13Wait at end of OpenACC asynchronous kernels regionsJulian Brown1-1/+0
In OpenACC 'kernels' decomposition, we're improperly nesting synchronous and asynchronous data and compute regions, giving rise to data races when the asynchronicity is actually executed, as is visible in at least on test case with GCN offloading. The proper fix is to correctly use the asynchronous interfaces, making the currently synchronous data regions fully asynchronous (see also <https://gcc.gnu.org/PR97390> "[OpenACC] 'async' clause on 'data' construct", which is to share the same implementation), but that's for later; for now add some more synchronization. gcc/ * omp-oacc-kernels-decompose.cc (add_wait): New function, split out of... (add_async_clauses_and_wait): ...here. Call new outlined function. (decompose_kernels_region_body): Add wait at the end of explicitly-asynchronous kernels regions. libgomp/ * testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Remove GCN offloading execution XFAIL. Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2022-01-13OpenACC 'kernels' decomposition: Mark variables used in synthesized data ↵Thomas Schwinge3-28/+33
clauses as addressable [PR100280] ... as otherwise 'gcc/omp-low.c:lower_omp_target' has to create a temporary: 13073 else if (is_gimple_reg (var)) 13074 { 13075 gcc_assert (offloaded); 13076 tree avar = create_tmp_var (TREE_TYPE (var)); 13077 mark_addressable (avar); ..., which (a) is only implemented for actualy *offloaded* regions (but not data regions), and (b) the subsequently synthesized code for writing to and later reading back from the temporary fundamentally conflicts with OpenACC 'async' (as used by OpenACC 'kernels' decomposition). That's all not trivial to make work, so let's just avoid this case. gcc/ PR middle-end/100280 * omp-oacc-kernels-decompose.cc (maybe_build_inner_data_region): Mark variables used in synthesized data clauses as addressable. gcc/testsuite/ PR middle-end/100280 * c-c++-common/goacc/kernels-decompose-pr100280-1.c: New. * c-c++-common/goacc/classify-kernels-parloops.c: Likewise. * c-c++-common/goacc/classify-kernels-unparallelized-parloops.c: Likewise. * c-c++-common/goacc/classify-kernels-unparallelized.c: Test '--param openacc-kernels=decompose'. * c-c++-common/goacc/classify-kernels.c: Likewise. * c-c++-common/goacc/kernels-decompose-2.c: Update. * c-c++-common/goacc/kernels-decompose-ice-1.c: Remove. * c-c++-common/goacc/kernels-decompose-ice-2.c: Likewise. * gfortran.dg/goacc/classify-kernels-parloops.f95: New. * gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95: Likewise. * gfortran.dg/goacc/classify-kernels-unparallelized.f95: Test '--param openacc-kernels=decompose'. * gfortran.dg/goacc/classify-kernels.f95: Likewise. libgomp/ PR middle-end/100280 * testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c: Update. * testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Likewise. Suggested-by: Julian Brown <julian@codesourcery.com>
2022-01-13Enhance OpenACC 'kernels' decomposition testingThomas Schwinge11-91/+434
gcc/testsuite/ * c-c++-common/goacc/kernels-decompose-1.c: Enhance. * c-c++-common/goacc/kernels-decompose-2.c: Likewise. * c-c++-common/goacc/kernels-decompose-ice-1.c: Likewise. * c-c++-common/goacc/kernels-decompose-ice-2.c: Likewise. * gfortran.dg/goacc/kernels-decompose-1.f95: Likewise. * gfortran.dg/goacc/kernels-decompose-2.f95: Likewise. libgomp/ * testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c: Enhance. * testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/declare-vla.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/f-asyncwait-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/f-asyncwait-3.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Likewise. * testsuite/libgomp.oacc-fortran/asyncwait-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/asyncwait-2.f90: Likewise. * testsuite/libgomp.oacc-fortran/asyncwait-3.f90: Likewise. * testsuite/libgomp.oacc-fortran/pr94358-1.f90: Likewise.
2022-01-04libgomp/testsuite: Improve omp_get_device_num() testsTobias Burnus2-19/+27
Related to r12-6208-gebc853deb7cc0487de9ef6e891a007ba853d1933 "libgomp: Fix GOMP_DEVICE_NUM_VAR stringification during offload image load" That commit fixed an issue with omp_get_device_num() on gcn/nvptx that resulted in having always the value 0. This commit modifies the tests to iterate over all devices such that on a multi-nonhost-device system it had detected that always-zero issue. libgomp/ChangeLog: * testsuite/libgomp.c-c++-common/target-45.c: Iterate over all devices. * testsuite/libgomp.fortran/target10.f90: Likewise.
2022-01-03Update copyright years.Jakub Jelinek4-4/+4
2021-12-13Fortran: Handle compare in OpenMP atomicTobias Burnus1-0/+313
gcc/fortran/ChangeLog: PR fortran/103576 * openmp.c (is_scalar_intrinsic_expr): Fix condition. (resolve_omp_atomic): Fix/update checks, accept compare. * trans-openmp.c (gfc_trans_omp_atomic): Handle compare. libgomp/ChangeLog: * libgomp.texi (OpenMP 5.1): Set Fortran support for atomic to 'Y'. * testsuite/libgomp.fortran/atomic-19.f90: New test. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/atomic-25.f90: Remove sorry, fix + add checks. * gfortran.dg/gomp/atomic-26.f90: Likewise. * gfortran.dg/gomp/atomic-21.f90: New test.
2021-12-10openmp: Fix libgomp.c++ testsuite errors for non-offload configsChung-Lin Tang3-1/+6
Some testcases for libgomp.c++ only works for non-shared address space offloading, because it exercises the zero-length array section behavior for offloaded address space, testing for NULL/non-NULL cases. libgomp/ChangeLog: * testsuite/libgomp.c++/target-lambda-1.C: Only run under "target offload_device_nonshared_as" * testsuite/libgomp.c++/target-this-3.C: Likewise. * testsuite/libgomp.c++/target-this-4.C: Likewise.
2021-12-09OpenMP 5.0: Remove array section base-pointer mapping semantics and other ↵Chung-Lin Tang10-54/+102
front-end adjustments This patch implements three pieces of functionality: (1) Adjust array section mapping to have standards conforming behavior, mapping array sections should *NOT* also map the base-pointer: struct S { int *ptr; ... }; struct S s; Instead of generating this during gimplify: map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 0]) Now, adjust to: (i.e. do not map the base-pointer together. The attach operation is still generated, and if s.ptr is already mapped prior, attachment will happen) The correct way of achieving the base-pointer-also-mapped behavior would be to use: (A small Fortran front-end patch to trans-openmp.c:gfc_trans_omp_array_section is also included, which removes generation of a GOMP_MAP_ALWAYS_POINTER for array types, which appears incorrect and causes a regression in libgomp.fortranlibgomp.fortran/struct-elem-map-1.f90) (2) Related to the first item above, are fixes in libgomp/target.c to not overwrite attached pointers when handling device<->host copies, mainly for the "always" case. (3) The third is a set of changes to the C/C++ front-ends to extend the allowed component access syntax in map clauses. These changes are enabled for both OpenACC and OpenMP. gcc/c/ChangeLog: * c-parser.c (struct omp_dim): New struct type for use inside c_parser_omp_variable_list. (c_parser_omp_variable_list): Allow multiple levels of array and component accesses in array section base-pointer expression. (c_parser_omp_clause_to): Set 'allow_deref' to true in call to c_parser_omp_var_list_parens. (c_parser_omp_clause_from): Likewise. * c-typeck.c (handle_omp_array_sections_1): Extend allowed range of base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR. (c_finish_omp_clauses): Extend allowed ranged of expressions involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR. gcc/cp/ChangeLog: * parser.c (struct omp_dim): New struct type for use inside cp_parser_omp_var_list_no_open. (cp_parser_omp_var_list_no_open): Allow multiple levels of array and component accesses in array section base-pointer expression. (cp_parser_omp_all_clauses): Set 'allow_deref' to true in call to cp_parser_omp_var_list for to/from clauses. * semantics.c (handle_omp_array_sections_1): Extend allowed range of base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR. (handle_omp_array_sections): Adjust pointer map generation of references. (finish_omp_clauses): Extend allowed ranged of expressions involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR. gcc/fortran/ChangeLog: * trans-openmp.c (gfc_trans_omp_array_section): Do not generate GOMP_MAP_ALWAYS_POINTER map for main array maps of ARRAY_TYPE type. gcc/ChangeLog: * gimplify.c (extract_base_bit_offset): Add 'tree *offsetp' parameter, accomodate case where 'offset' return of get_inner_reference is non-NULL. (is_or_contains_p): Further robustify conditions. (omp_target_reorder_clauses): In alloc/to/from sorting phase, also move following GOMP_MAP_ALWAYS_POINTER maps along. Add new sorting phase where we make sure pointers with an attach/detach map are ordered correctly. (gimplify_scan_omp_clauses): Add modifications to avoid creating GOMP_MAP_STRUCT and associated alloc map for attach/detach maps. gcc/testsuite/ChangeLog: * c-c++-common/goacc/deep-copy-arrayofstruct.c: Adjust testcase. * c-c++-common/gomp/target-enter-data-1.c: New testcase. * c-c++-common/gomp/target-implicit-map-2.c: New testcase. libgomp/ChangeLog: * target.c (gomp_map_vars_existing): Make sure attached pointer is not overwritten during cross-host/device copying. (gomp_update): Likewise. (gomp_exit_data): Likewise. * testsuite/libgomp.c++/target-11.C: Adjust testcase. * testsuite/libgomp.c++/target-12.C: Likewise. * testsuite/libgomp.c++/target-15.C: Likewise. * testsuite/libgomp.c++/target-16.C: Likewise. * testsuite/libgomp.c++/target-17.C: Likewise. * testsuite/libgomp.c++/target-21.C: Likewise. * testsuite/libgomp.c++/target-23.C: Likewise. * testsuite/libgomp.c/target-23.c: Likewise. * testsuite/libgomp.c/target-29.c: Likewise. * testsuite/libgomp.c-c++-common/target-implicit-map-2.c: New testcase.
2021-12-08openmp: Improve OpenMP target support for C++ (PR92120)Chung-Lin Tang8-0/+459
This patch implements several C++ specific mapping capabilities introduced for OpenMP 5.0, including implicit mapping of this[:1] for non-static member functions, zero-length array section mapping of pointer-typed members, lambda captured variable access in target regions, and use of lambda objects inside target regions. Several adjustments to the C/C++ front-ends to allow more member-access syntax as valid is also included. PR middle-end/92120 gcc/cp/ChangeLog: * cp-tree.h (finish_omp_target): New declaration. (finish_omp_target_clauses): Likewise. * parser.c (cp_parser_omp_clause_map): Adjust call to cp_parser_omp_var_list_no_open to set 'allow_deref' argument to true. (cp_parser_omp_target): Factor out code, adjust into calls to new function finish_omp_target. * pt.c (tsubst_expr): Add call to finish_omp_target_clauses for OMP_TARGET case. * semantics.c (handle_omp_array_sections_1): Add handling to create 'this->member' from 'member' FIELD_DECL. Remove case of rejecting 'this' when not in declare simd. (handle_omp_array_sections): Likewise. (finish_omp_clauses): Likewise. Adjust to allow 'this[]' in OpenMP map clauses. Handle 'A->member' case in map clauses. Remove case of rejecting 'this' when not in declare simd. (struct omp_target_walk_data): New struct for walking over target-directive tree body. (finish_omp_target_clauses_r): New function for tree walk. (finish_omp_target_clauses): New function. (finish_omp_target): New function. gcc/c/ChangeLog: * c-parser.c (c_parser_omp_clause_map): Set 'allow_deref' argument in call to c_parser_omp_variable_list to 'true'. * c-typeck.c (handle_omp_array_sections_1): Add strip of MEM_REF in array base handling. (c_finish_omp_clauses): Handle 'A->member' case in map clauses. gcc/ChangeLog: * gimplify.c ("tree-hash-traits.h"): Add include. (gimplify_scan_omp_clauses): Change struct_map_to_clause to type hash_map<tree_operand, tree> *. Adjust struct map handling to handle cases of *A and A->B expressions. Under !DECL_P case of GOMP_CLAUSE_MAP handling, add STRIP_NOPS for indir_p case, add to struct_deref_set for map(*ptr_to_struct) cases. Add MEM_REF case when handling component_ref_p case. Add unshare_expr and gimplification when created GOMP_MAP_STRUCT is not a DECL. Add code to add firstprivate pointer for *pointer-to-struct case. (gimplify_adjust_omp_clauses): Move GOMP_MAP_STRUCT removal code for exit data directives code to earlier position. * omp-low.c (lower_omp_target): Handle GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION map kinds. * tree-pretty-print.c (dump_omp_clause): Likewise. gcc/testsuite/ChangeLog: * gcc.dg/gomp/target-3.c: New testcase. * g++.dg/gomp/target-3.C: New testcase. * g++.dg/gomp/target-lambda-1.C: New testcase. * g++.dg/gomp/target-lambda-2.C: New testcase. * g++.dg/gomp/target-this-1.C: New testcase. * g++.dg/gomp/target-this-2.C: New testcase. * g++.dg/gomp/target-this-3.C: New testcase. * g++.dg/gomp/target-this-4.C: New testcase. * g++.dg/gomp/target-this-5.C: New testcase. * g++.dg/gomp/this-2.C: Adjust testcase. include/ChangeLog: * gomp-constants.h (enum gomp_map_kind): Add GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION map kinds. (GOMP_MAP_POINTER_P): Include GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION. libgomp/ChangeLog: * libgomp.h (gomp_attach_pointer): Add bool parameter. * oacc-mem.c (acc_attach_async): Update call to gomp_attach_pointer. (goacc_enter_data_internal): Likewise. * target.c (gomp_map_vars_existing): Update assert condition to include GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION. (gomp_map_pointer): Add 'bool allow_zero_length_array_sections' parameter, add support for mapping a pointer with NULL target. (gomp_attach_pointer): Add 'bool allow_zero_length_array_sections' parameter, add support for attaching a pointer with NULL target. (gomp_map_vars_internal): Update calls to gomp_map_pointer and gomp_attach_pointer, add handling for GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION cases. * testsuite/libgomp.c++/target-23.C: New testcase. * testsuite/libgomp.c++/target-lambda-1.C: New testcase. * testsuite/libgomp.c++/target-lambda-2.C: New testcase. * testsuite/libgomp.c++/target-this-1.C: New testcase. * testsuite/libgomp.c++/target-this-2.C: New testcase. * testsuite/libgomp.c++/target-this-3.C: New testcase. * testsuite/libgomp.c++/target-this-4.C: New testcase. * testsuite/libgomp.c++/target-this-5.C: New testcase.
2021-12-02fortran: OpenMP/OpenACC array mapping alignment fix (PR90030)Chung-Lin Tang2-0/+32
Fix issue with the Fortran front-end when mapping arrays: when creating the data MEM_REF for the map clause, there was a convention of casting the referencing pointer to 'c_char *' by fold_convert (build_pointer_type (char_type_node), ptr). This causes the alignment passed to the libgomp runtime for array data hardwared to '1', and causes alignment errors on the offload target. This patch fixes this by removing the char_type_node pointer converts, and adding gcc_asserts to ensure POINTER_TYPE_P (TREE_TYPE (ptr)). PR fortran/90030 gcc/fortran/ChangeLog: * trans-openmp.c (gfc_omp_finish_clause): Remove fold_convert to pointer to char_type_node, add gcc_assert of POINTER_TYPE_P. (gfc_trans_omp_array_section): Likewise. (gfc_trans_omp_clauses): Likewise. gcc/testsuite/ChangeLog: * gfortran.dg/goacc/finalize-1.f: Adjust scan test. * gfortran.dg/gomp/affinity-clause-1.f90: Likewise. * gfortran.dg/gomp/affinity-clause-5.f90: Likewise. * gfortran.dg/gomp/defaultmap-4.f90: Likewise. * gfortran.dg/gomp/defaultmap-5.f90: Likewise. * gfortran.dg/gomp/defaultmap-6.f90: Likewise. * gfortran.dg/gomp/map-3.f90: Likewise. * gfortran.dg/gomp/pr78260-2.f90: Likewise. * gfortran.dg/gomp/pr78260-3.f90: Likewise. libgomp/ChangeLog: * testsuite/libgomp.oacc-fortran/pr90030.f90: New test. * testsuite/libgomp.fortran/pr90030.f90: New test.
2021-11-30[OpenACC] Allow gang reductions inside serial constructsKwok Cheung Yeung1-1/+0
... fixing a regression introduced in the preceding commit 2b7dac2c0dcb087da9e4018943c023c0678234a3 "Make OpenACC orphan gang reductions errors". gcc/fortran/ * openmp.c (oacc_is_serial, oacc_is_parallel_or_serial): New. (resolve_oacc_loop_blocks): Use oacc_is_parallel_or_serial instead of oacc_is_parallel. libgomp/ * testsuite/libgomp.oacc-fortran/parallel-dims.f90: Remove temporary skip. Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2021-11-30Make OpenACC orphan gang reductions errorsCesar Philippidis1-0/+1
This patch promotes all OpenACC gang reductions on orphan loops as errors. Accord to the spec, orphan loops are those which are not lexically nested inside an OpenACC parallel or kernels regions. I.e., acc loops inside acc routines. At first I thought this could be a warning because the gang reduction finalizer uses an atomic update. However, because there is no synchronization between gangs, there is way to guarantee that reduction will have completed once a single gang entity returns from the acc routine call. gcc/c/ * c-typeck.c (c_finish_omp_clauses): Emit an error on orphan OpenACC gang reductions. gcc/cp/ * semantics.c (finish_omp_clauses): Emit an error on orphan OpenACC gang reductions. gcc/fortran/ * openmp.c (oacc_is_parallel, oacc_is_kernels): New 'static' functions. (resolve_oacc_loop_blocks): Emit an error on orphan OpenACC gang reductions. gcc/ * omp-general.h (enum oacc_loop_flags): Add OLF_REDUCTION enum. * omp-low.c (lower_oacc_head_mark): Use it to mark OpenACC reductions. * omp-offload.c (oacc_loop_auto_partitions): Don't assign gang level parallelism to orphan reductions. gcc/testsuite/ * c-c++-common/goacc/nested-reductions-1-routine.c: Adjust. * c-c++-common/goacc/nested-reductions-2-routine.c: Likewise. * gcc.dg/goacc/loop-processing-1.c: Likewise. * gfortran.dg/goacc/nested-reductions-1-routine.f90: Likewise. * gfortran.dg/goacc/nested-reductions-2-routine.f90: Likewise. * c-c++-common/goacc/orphan-reductions-1.c: New test. * c-c++-common/goacc/orphan-reductions-2.c: New test. * gfortran.dg/goacc/orphan-reductions-1.f90: New test. * gfortran.dg/goacc/orphan-reductions-2.f90: New test. libgomp/ * testsuite/libgomp.oacc-fortran/parallel-dims.f90: Temporarily skip. Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2021-11-24openmp: Fix up handling of kind(host) and kind(nohost) in ACCEL_COMPILERs ↵Jakub Jelinek1-0/+45
[PR103384] As the testcase shows, we weren't handling kind(host) and kind(nohost) properly in the ACCEL_COMPILERs, the code written in there is valid for the host compiler only, where if we are maybe offloaded, we defer resolution after IPA, otherwise return 0 for kind(nohost) and accept it for kind(host). Note, omp_maybe_offloaded is false after IPA. If ACCEL_COMPILER is defined, it is the other way around, but also we know we are after IPA. 2021-11-24 Jakub Jelinek <jakub@redhat.com> PR middle-end/103384 gcc/ * omp-general.c (omp_context_selector_matches): For ACCEL_COMPILER, return 0 for kind(host) and continue for kind(nohost). libgomp/ * testsuite/libgomp.c/declare-variant-2.c: New test.
2021-11-15Fortran: openmp: Add support for thread_limit clause on targetTobias Burnus1-0/+41
gcc/fortran/ChangeLog: * openmp.c (OMP_TARGET_CLAUSES): Add thread_limit. * trans-openmp.c (gfc_split_omp_clauses): Add thread_limit also to teams. libgomp/ChangeLog: * testsuite/libgomp.fortran/thread-limit-1.f90: New test.
2021-11-15openmp: Add support for thread_limit clause on targetJakub Jelinek1-0/+23
OpenMP 5.1 says that thread_limit clause can also appear on target, and similarly to teams should affect the thread-limit-var ICV. On combined target teams, the clause goes to both. We actually passed thread_limit internally on target already before, but only used it for gcn/ptx offloading to hint how many threads should be created and for ptx didn't set thread_limit_var in that case. Similarly for host fallback. Also, I found that we weren't copying the args array that contains encoded thread_limit and num_teams clause for target (etc.) for async target. 2021-11-15 Jakub Jelinek <jakub@redhat.com> gcc/ * gimplify.c (optimize_target_teams): Only add OMP_CLAUSE_THREAD_LIMIT to OMP_TARGET_CLAUSES if it isn't there already. gcc/c-family/ * c-omp.c (c_omp_split_clauses) <case OMP_CLAUSE_THREAD_LIMIT>: Duplicate to both OMP_TARGET and OMP_TEAMS. gcc/c/ * c-parser.c (OMP_TARGET_CLAUSE_MASK): Add PRAGMA_OMP_CLAUSE_THREAD_LIMIT. gcc/cp/ * parser.c (OMP_TARGET_CLAUSE_MASK): Add PRAGMA_OMP_CLAUSE_THREAD_LIMIT. libgomp/ * task.c (gomp_create_target_task): Copy args array as well. * target.c (gomp_target_fallback): Add args argument. Set gomp_icv (true)->thread_limit_var if thread_limit is present. (GOMP_target): Adjust gomp_target_fallback caller. (GOMP_target_ext): Likewise. (gomp_target_task_fn): Likewise. * config/nvptx/team.c (gomp_nvptx_main): Set gomp_global_icv.thread_limit_var. * testsuite/libgomp.c-c++-common/thread-limit-1.c: New test.
2021-11-15libgomp: Add a testcase for omp_get_num_teams inside of target inside of ↵Jakub Jelinek1-0/+17
host teams This is https://github.com/OpenMP/spec/issues/3183 There is an agreement that we should return 1 team inside of target, even if that target is inside of host teams. We were doing that when offloading and not during host fallback, r12-5151 should fix that even for host fallback. 2021-11-15 Jakub Jelinek <jakub@redhat.com> * testsuite/libgomp.c/teams-5.c: New test.
2021-11-12openmp: Relax handling of implicit map vs. existing device mappingsChung-Lin Tang1-0/+31
This patch implements relaxing the requirements when a map with the implicit attribute encounters an overlapping existing map. As the OpenMP 5.0 spec describes on page 320, lines 18-27 (and 5.1 spec, page 352, lines 13-22): "If a single contiguous part of the original storage of a list item with an implicit data-mapping attribute has corresponding storage in the device data environment prior to a task encountering the construct that is associated with the map clause, only that part of the original storage will have corresponding storage in the device data environment as a result of the map clause." 2021-11-12 Chung-Lin Tang <cltang@codesourcery.com> include/ChangeLog: * gomp-constants.h (GOMP_MAP_FLAG_SPECIAL_3): Define special bit macro. (GOMP_MAP_IMPLICIT): New special map kind bits value. (GOMP_MAP_FLAG_SPECIAL_BITS): Define helper mask for whole set of special map kind bits. (GOMP_MAP_IMPLICIT_P): New predicate macro for implicit map kinds. gcc/ChangeLog: * tree.h (OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P): New access macro for 'implicit' bit, using 'base.deprecated_flag' field of tree_node. * tree-pretty-print.c (dump_omp_clause): Add support for printing implicit attribute in tree dumping. * gimplify.c (gimplify_adjust_omp_clauses_1): Set OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P to 1 if map clause is implicitly created. (gimplify_adjust_omp_clauses): Adjust place of adding implicitly created clauses, from simple append, to starting of list, after non-map clauses. * omp-low.c (lower_omp_target): Add GOMP_MAP_IMPLICIT bits into kind values passed to libgomp for implicit maps. gcc/testsuite/ChangeLog: * c-c++-common/gomp/target-implicit-map-1.c: New test. * c-c++-common/goacc/combined-reduction.c: Adjust scan test pattern. * c-c++-common/goacc/firstprivate-mappings-1.c: Likewise. * c-c++-common/goacc/mdc-1.c: Likewise. * g++.dg/goacc/firstprivate-mappings-1.C: Likewise. libgomp/ChangeLog: * target.c (gomp_map_vars_existing): Add 'bool implicit' parameter, add implicit map handling to allow a "superset" existing map as valid case. (get_kind): Adjust to filter out GOMP_MAP_IMPLICIT bits in return value. (get_implicit): New function to extract implicit status. (gomp_map_fields_existing): Adjust arguments in calls to gomp_map_vars_existing, and add uses of get_implicit. (gomp_map_vars_internal): Likewise. * testsuite/libgomp.c-c++-common/target-implicit-map-1.c: New test.
2021-11-12openmp: Honor OpenMP 5.1 num_teams lower boundJakub Jelinek2-1/+71
The following patch implements what I've been talking about earlier, honor that for explicit num_teams clause we create at least the lower-bound (if not specified, upper-bound) teams in the league. For host fallback, it still means we only have one thread doing all the teams, sequentially one after another. For PTX and GCN, I think the new teams-2.c test and maybe teams-4.c too will or might fail. For these offloads, I think it is ok to remove symbols no longer used from libgomp.a. If num_teams_lower is bigger than the provided num_blocks or num_workgroups, we should arrange for gomp_num_teams_var to be num_teams_lower - 1, stop using the %ctaid.x or __builtin_gcn_dim_pos (0) for omp_get_team_num () and instead use for it some .shared var that GOMP_teams4 initializes to %ctaid.x or __builtin_gcn_dim_pos (0) when first and for !first increment that by num_blocks or num_workgroups each time and only return false when we are above num_teams_lower. Any help with actually implementing this for the 2 architectures highly appreciated. 2021-11-12 Jakub Jelinek <jakub@redhat.com> gcc/ * omp-builtins.def (BUILT_IN_GOMP_TEAMS): Remove. (BUILT_IN_GOMP_TEAMS4): New. * builtin-types.def (BT_FN_VOID_UINT_UINT): Remove. (BT_FN_BOOL_UINT_UINT_UINT_BOOL): New. * omp-low.c (lower_omp_teams): Use GOMP_teams4 instead of GOMP_teams, pass to it also num_teams lower-bound expression or a dup of upper-bound if it is missing and a flag whether it is the first call or not. gcc/fortran/ * types.def (BT_FN_VOID_UINT_UINT): Remove. (BT_FN_BOOL_UINT_UINT_UINT_BOOL): New. libgomp/ * libgomp_g.h (GOMP_teams4): Declare. * libgomp.map (GOMP_5.1): Export GOMP_teams4. * target.c (GOMP_teams4): New function. * config/nvptx/target.c (GOMP_teams): Remove. (GOMP_teams4): New function. * config/gcn/target.c (GOMP_teams): Remove. (GOMP_teams4): New function. * testsuite/libgomp.c/teams-4.c (main): Expect exactly 2 teams instead of <= 2. * testsuite/libgomp.c-c++-common/teams-2.c: New test.
2021-11-11Fortran/openmp: Add support for 2 argument num_teams clauseTobias Burnus1-0/+22
Fortran part to commit r12-5146-g48d7327f2aaf65 gcc/fortran/ChangeLog: * gfortran.h (struct gfc_omp_clauses): Rename num_teams to num_teams_upper, add num_teams_upper. * dump-parse-tree.c (show_omp_clauses): Update to handle lower-bound num_teams clause. * frontend-passes.c (gfc_code_walker): Likewise * openmp.c (gfc_free_omp_clauses, gfc_match_omp_clauses, resolve_omp_clauses): Likewise. * trans-openmp.c (gfc_trans_omp_clauses, gfc_split_omp_clauses, gfc_trans_omp_target): Likewise. libgomp/ChangeLog: * testsuite/libgomp.fortran/teams-1.f90: New test.
2021-11-11libgomp: Use TLS storage for omp_get_num_teams()/omp_get_team_num() valuesJakub Jelinek1-0/+30
When thinking about GOMP_teams3, I've realized that using global variables for the values returned by omp_get_num_teams()/omp_get_team_num() calls is incorrect even with our right now dumb way of implementing host teams. The problems are two, one is if host teams is used from multiple pthread_create created threads - the spec says that host teams can't be nested inside of explicit parallel or other teams constructs, but with pthread_create the standard says obviously nothing about it. Another more important thing is host fallback, right now we don't do anything for omp_get_num_teams() or omp_get_team_num() which was fine before host teams was introduced and the 5.1 requirement that num_teams clause specifies minimum of teams, but with the global vars it means inside of target teams num_teams (2) we happily return omp_get_num_teams() == 4 if the target teams is inside of host teams with num_teams(4). With target fallback being invoked from parallel regions global vars simply can't work right on the host. So, this patch moves them to struct gomp_thread and propagates those for parallel to child threads. For host fallback, the implicit zeroing of *thr results in us returning omp_get_num_teams () == 1 and omp_get_team_num () == 0 which is fine for target teams without num_teams clause, for target teams with num_teams clause something to work on and for target without teams nested in it I've asked on omp-lang what should be done. 2021-11-11 Jakub Jelinek <jakub@redhat.com> * libgomp.h (struct gomp_thread): Add num_teams and team_num members. * team.c (struct gomp_thread_start_data): Likewise. (gomp_thread_start): Initialize thr->num_teams and thr->team_num. (gomp_team_start): Initialize start_data->num_teams and start_data->team_num. Update nthr->num_teams and nthr->team_num. * teams.c (gomp_num_teams, gomp_team_num): Remove. (GOMP_teams_reg): Set and restore thr->num_teams and thr->team_num instead of gomp_num_teams and gomp_team_num. (omp_get_num_teams): Use thr->num_teams + 1 instead of gomp_num_teams. (omp_get_team_num): Use thr->team_num instead of gomp_team_num. * testsuite/libgomp.c/teams-4.c: New test.
2021-11-11openmp: Add support for 2 argument num_teams clauseJakub Jelinek1-0/+26
In OpenMP 5.1, num_teams clause can accept either one expression as before, but it in that case changed meaning, rather than create <= expression teams it is now create == expression teams. Or it accepts two expressions separated by :, with the meaning that the first is low bound and second upper bound on how many teams should be created. The other ways to set number of teams are upper bounds with lower bound of 1. The following patch does parsing of this for C/C++. For host teams, we actually don't need to do anything further right now, we always create (pretend to create) exactly the requested number of teams, so we can just evaluate and throw away the lower bound for now. For teams nested in target, we don't guarantee that though and further work will be needed. In particular, omplower now turns the teams part of: struct S { S (); S (const S &); ~S (); int s; }; void bar (S &, S &); int baz (); _Pragma ("omp declare target to (baz)"); void foo (void) { S a, b; #pragma omp target private (a) map (b) { #pragma omp teams firstprivate (b) num_teams (baz ()) { bar (a, b); } } } into: retval.0 = baz (); retval.1 = retval.0; { unsigned int retval.3; struct S * D.2549; struct S b; retval.3 = (unsigned int) retval.1; D.2549 = .omp_data_i->b; S::S (&b, D.2549); #pragma omp teams num_teams(retval.1) firstprivate(b) shared(a) __builtin_GOMP_teams (retval.3, 0); { bar (&a, &b); } S::~S (&b); #pragma omp return(nowait) } IMHO we want a new API, say GOMP_teams3 which will take 3 arguments instead of 2 (the lower and upper bounds from num_teams and thread_limit) and will return a bool whether it should do the teams body or not. And, we should add right before outermost {} above while (__builtin_GOMP_teams3 ((unsigned) retval.1, (unsigned) retval.1, 0)) and remove the __builtin_GOMP_teams call. The current function performs exit equivalent (at least on NVPTX) which seems bad because that means the destructors of e.g. private variables on target aren't invoked, and at the current placement neither destructors of the already constructed privatized variables in teams. I'll do this next on the compiler side, but I'm afraid I'll need help with the nvptx and amdgcn implementations. E.g. for nvptx, we won't be able to use %ctaid.x . I think ideal would be to use a .shared integer variable for the omp_get_team_num value, but I don't have any experience with that, are .shared variables zero initialized by default, or do they have random value at start? PTX docs say they aren't initializable. 2021-11-11 Jakub Jelinek <jakub@redhat.com> gcc/ * tree.h (OMP_CLAUSE_NUM_TEAMS_EXPR): Rename to ... (OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR): ... this. (OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR): Define. * tree.c (omp_clause_num_ops): Increase num ops for OMP_CLAUSE_NUM_TEAMS to 2. * tree-pretty-print.c (dump_omp_clause): Print optional lower bound for OMP_CLAUSE_NUM_TEAMS. * gimplify.c (gimplify_scan_omp_clauses): Gimplify OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR if non-NULL. (optimize_target_teams): Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of OMP_CLAUSE_NUM_TEAMS_EXPR. Handle OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR. * omp-low.c (lower_omp_teams): Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of OMP_CLAUSE_NUM_TEAMS_EXPR. * omp-expand.c (expand_teams_call, get_target_arguments): Likewise. gcc/c/ * c-parser.c (c_parser_omp_clause_num_teams): Parse optional lower-bound and store it into OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR. Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of OMP_CLAUSE_NUM_TEAMS_EXPR. (c_parser_omp_target): For OMP_CLAUSE_NUM_TEAMS evaluate before combined target teams even lower-bound expression. gcc/cp/ * parser.c (cp_parser_omp_clause_num_teams): Parse optional lower-bound and store it into OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR. Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of OMP_CLAUSE_NUM_TEAMS_EXPR. (cp_parser_omp_target): For OMP_CLAUSE_NUM_TEAMS evaluate before combined target teams even lower-bound expression. * semantics.c (finish_omp_clauses): Handle OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR of OMP_CLAUSE_NUM_TEAMS clause. * pt.c (tsubst_omp_clauses): Likewise. (tsubst_expr): For OMP_CLAUSE_NUM_TEAMS evaluate before combined target teams even lower-bound expression. gcc/fortran/ * trans-openmp.c (gfc_trans_omp_clauses): Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of OMP_CLAUSE_NUM_TEAMS_EXPR. gcc/testsuite/ * c-c++-common/gomp/clauses-1.c (bar): Supply lower-bound expression to half of the num_teams clauses. * c-c++-common/gomp/num-teams-1.c: New test. * c-c++-common/gomp/num-teams-2.c: New test. * g++.dg/gomp/attrs-1.C (bar): Supply lower-bound expression to half of the num_teams clauses. * g++.dg/gomp/attrs-2.C (bar): Likewise. * g++.dg/gomp/num-teams-1.C: New test. * g++.dg/gomp/num-teams-2.C: New test. libgomp/ * testsuite/libgomp.c-c++-common/teams-1.c: New test.
2021-10-30OpenMP: Add strictly nested API call check [PR102972]Tobias Burnus14-70/+146
The teams construct only permits omp_get_num_teams and omp_get_team_num as API call in strictly nested regions - check for it. Additionally, for Fortran, using DECL_NAME does not show the mangled name, hence, DECL_ASSEMBLER_NAME had to be used to. Finally, 'target device(ancestor:1)' wrongly rejected non-API calls as well. PR middle-end/102972 gcc/ChangeLog: * omp-low.c (omp_runtime_api_call): Use DECL_ASSEMBLER_NAME to get internal Fortran name; new permit_num_teams arg to permit omp_get_num_teams and omp_get_team_num. (scan_omp_1_stmt): Update call to it, add missing call for reverse offload, and check for strictly nested API calls in teams. gcc/testsuite/ChangeLog: * c-c++-common/gomp/target-device-ancestor-3.c: Add non-API routine test. * gfortran.dg/gomp/order-6.f90: Add missing bind(C). * c-c++-common/gomp/teams-3.c: New test. * gfortran.dg/gomp/teams-3.f90: New test. * gfortran.dg/gomp/teams-4.f90: New test. libgomp/ChangeLog: * testsuite/libgomp.c-c++-common/icv-3.c: Nest API calls inside parallel construct. * testsuite/libgomp.c-c++-common/icv-4.c: Likewise. * testsuite/libgomp.c/target-3.c: Likewise. * testsuite/libgomp.c/target-5.c: Likewise. * testsuite/libgomp.c/target-6.c: Likewise. * testsuite/libgomp.c/target-teams-1.c: Likewise. * testsuite/libgomp.c/teams-1.c: Likewise. * testsuite/libgomp.c/thread-limit-2.c: Likewise. * testsuite/libgomp.c/thread-limit-3.c: Likewise. * testsuite/libgomp.c/thread-limit-4.c: Likewise. * testsuite/libgomp.c/thread-limit-5.c: Likewise. * testsuite/libgomp.fortran/icv-3.f90: Likewise. * testsuite/libgomp.fortran/icv-4.f90: Likewise. * testsuite/libgomp.fortran/teams1.f90: Likewise.
2021-10-29Remove VRP threader passes in exchange for better threading pre-VRP.Aldy Hernandez2-2/+2
This patch upgrades the pre-VRP threading passes to fully resolving backward threaders, and removes the post-VRP threading passes altogether. With it, we reduce the number of threaders in our pipeline from 9 to 7. This will leave DOM as the only forward threader client. When the ranger can handle floats, we should be able to upgrade the pre-DOM threaders to fully resolving threaders and kill the embedded DOM threader. The numbers are as follows: prev: # threads in backward + vrp-threaders = 92624 now: # threads in backward threaders = 94275 Gain: +1.78% prev: # total threads: 189495 now: # total threads: 193714 Gain: +2.22% The numbers are not as great as my initial proposal, but I've recently pushed all the work that got us to this point ;-). And... the compilation improves by 1.32%! There's a regression on uninit-pred-7_a.c that I've yet to look at. I want to make sure it's not a missing thread. If it is, I'll create a PR and own it. Also, the tree-ssa/phi_on_compare-*.c tests have all regressed. This seems to be some special case the forward threader handles that the backward threader does not (edge_forwards_cmp_to_conditional_jump*). I haven't dug deep to see if this is solveable within our infrastructure, but a cursory look shows that even though the VRP threader threads this, the *.optimized dump ends with more conditional jumps than without the optimization. I'd like to punt on this for now, because DOM actually catches this through its lone use of the forward threader (I've adjusted the tests). However, we will need to address this sooner or later, if indeed it's still improving the final assembly. gcc/ChangeLog: * passes.def: Replace the pass_thread_jumps before VRP* with pass_thread_jumps_full. Remove all pass_vrp_threader instances. * tree-ssa-threadbackward.c (pass_data_thread_jumps_full): Remove hyphen from "thread-full" name. libgomp/ChangeLog: * testsuite/libgomp.graphite/force-parallel-4.c: Adjust for threading changes. * testsuite/libgomp.graphite/force-parallel-8.c: Same. gcc/testsuite/ChangeLog: * gcc.dg/loop-unswitch-2.c: Adjust for threading changes. * gcc.dg/old-style-asm-1.c: Same. * gcc.dg/tree-ssa/phi_on_compare-1.c: Same. * gcc.dg/tree-ssa/phi_on_compare-2.c: Same. * gcc.dg/tree-ssa/phi_on_compare-3.c: Same. * gcc.dg/tree-ssa/phi_on_compare-4.c: Same. * gcc.dg/tree-ssa/pr20701.c: Same. * gcc.dg/tree-ssa/pr21001.c: Same. * gcc.dg/tree-ssa/pr21294.c: Same. * gcc.dg/tree-ssa/pr21417.c: Same. * gcc.dg/tree-ssa/pr21559.c: Same. * gcc.dg/tree-ssa/pr21563.c: Same. * gcc.dg/tree-ssa/pr49039.c: Same. * gcc.dg/tree-ssa/pr59597.c: Same. * gcc.dg/tree-ssa/pr61839_1.c: Same. * gcc.dg/tree-ssa/pr61839_3.c: Same. * gcc.dg/tree-ssa/pr66752-3.c: Same. * gcc.dg/tree-ssa/pr68198.c: Same. * gcc.dg/tree-ssa/pr77445-2.c: Same. * gcc.dg/tree-ssa/pr77445.c: Same. * gcc.dg/tree-ssa/ranger-threader-1.c: Same. * gcc.dg/tree-ssa/ranger-threader-2.c: Same. * gcc.dg/tree-ssa/ranger-threader-4.c: Same. * gcc.dg/tree-ssa/ssa-dom-thread-1.c: Same. * gcc.dg/tree-ssa/ssa-dom-thread-11.c: Same. * gcc.dg/tree-ssa/ssa-dom-thread-12.c: Same. * gcc.dg/tree-ssa/ssa-dom-thread-14.c: Same. * gcc.dg/tree-ssa/ssa-dom-thread-16.c: Same. * gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Same. * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same. * gcc.dg/tree-ssa/ssa-thread-14.c: Same. * gcc.dg/tree-ssa/ssa-thread-backedge.c: Same. * gcc.dg/tree-ssa/ssa-vrp-thread-1.c: Same. * gcc.dg/tree-ssa/vrp02.c: Same. * gcc.dg/tree-ssa/vrp03.c: Same. * gcc.dg/tree-ssa/vrp05.c: Same. * gcc.dg/tree-ssa/vrp06.c: Same. * gcc.dg/tree-ssa/vrp07.c: Same. * gcc.dg/tree-ssa/vrp08.c: Same. * gcc.dg/tree-ssa/vrp09.c: Same. * gcc.dg/tree-ssa/vrp33.c: Same. * gcc.dg/uninit-pred-9_b.c: Same. * gcc.dg/uninit-pred-7_a.c: xfail.
2021-10-27openmp: Allow non-rectangular loops with pointer iteratorsJakub Jelinek2-0/+376
This patch handles pointer iterators for non-rectangular loops. They are more limited than integral iterators of non-rectangular loops, in particular only var-outer, var-outer + a2, a2 + var-outer or var-outer - a2 can appear in lb or ub where a2 is some integral loop invariant expression, so no e.g. multiplication etc. 2021-10-27 Jakub Jelinek <jakub@redhat.com> gcc/ * omp-expand.c (expand_omp_for_init_counts): Handle non-rectangular iterators with pointer types. (expand_omp_for_init_vars, extract_omp_for_update_vars): Likewise. gcc/c-family/ * c-omp.c (c_omp_check_loop_iv_r): Don't clear 3rd bit for POINTER_PLUS_EXPR. (c_omp_check_nonrect_loop_iv): Handle POINTER_PLUS_EXPR. (c_omp_check_loop_iv): Set kind even if the iterator is non-integral. gcc/testsuite/ * c-c++-common/gomp/loop-8.c: New test. * c-c++-common/gomp/loop-9.c: New test. libgomp/ * testsuite/libgomp.c/loop-26.c: New test. * testsuite/libgomp.c/loop-27.c: New test.
2021-10-25libgomp.oacc-c-c++-common/loop-gwv-2.c: Use __builtin_allocaTobias Burnus1-4/+3
Some systems do not have <alloca.h> but provide alloca differently, e.g. via stdlib.h. Do it like other testcases do and use __builtin_alloca. libgomp/ChangeLog: PR testsuite/102910 * testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: Use __builtin_alloca instead of #include <alloca.h> + alloca.
2021-10-21openmp: Fortran strictly-structured blocks supportChung-Lin Tang1-0/+1
This implements strictly-structured blocks support for Fortran, as specified in OpenMP 5.2. This now allows using a Fortran BLOCK construct as the body of most OpenMP constructs, with a "!$omp end ..." ending directive optional for that form. gcc/fortran/ChangeLog: * decl.c (gfc_match_end): Add COMP_OMP_STRICTLY_STRUCTURED_BLOCK case together with COMP_BLOCK. * parse.c (parse_omp_structured_block): Change return type to 'gfc_statement', add handling for strictly-structured block case, adjust recursive calls to parse_omp_structured_block. (parse_executable): Adjust calls to parse_omp_structured_block. * parse.h (enum gfc_compile_state): Add COMP_OMP_STRICTLY_STRUCTURED_BLOCK. * trans-openmp.c (gfc_trans_omp_workshare): Add EXEC_BLOCK case handling. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/cancel-1.f90: Adjust testcase. * gfortran.dg/gomp/nesting-3.f90: Adjust testcase. * gfortran.dg/gomp/strictly-structured-block-1.f90: New test. * gfortran.dg/gomp/strictly-structured-block-2.f90: New test. * gfortran.dg/gomp/strictly-structured-block-3.f90: New test. libgomp/ChangeLog: * libgomp.texi (Support of strictly structured blocks in Fortran): Adjust to 'Y'. * testsuite/libgomp.fortran/task-reduction-16.f90: Adjust testcase.
2021-10-20openmp: in_reduction support for FortranChung-Lin Tang2-0/+108
This patch implements support for the in_reduction clause for Fortran. It also includes more completion of the taskgroup construct inside the Fortran front-end, thus allowing task_reduction to work for task and target constructs. gcc/fortran/ChangeLog: * openmp.c (gfc_match_omp_clause_reduction): Add 'openmp_target' default false parameter. Add 'always,tofrom' map for OMP_LIST_IN_REDUCTION case. (gfc_match_omp_clauses): Add 'openmp_target' default false parameter, adjust call to gfc_match_omp_clause_reduction. (match_omp): Adjust call to gfc_match_omp_clauses * trans-openmp.c (gfc_trans_omp_taskgroup): Add call to gfc_match_omp_clause, create and return block. gcc/ChangeLog: * omp-low.c (omp_copy_decl_2): For !ctx, use record_vars to add new copy as local variable. (scan_sharing_clauses): Place copy of OMP_CLAUSE_IN_REDUCTION decl in ctx->outer instead of ctx. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/reduction4.f90: Adjust omp target in_reduction' scan pattern. libgomp/ChangeLog: * testsuite/libgomp.fortran/target-in-reduction-1.f90: New test. * testsuite/libgomp.fortran/target-in-reduction-2.f90: New test.
2021-10-20Disallow loop rotation and loop header crossing in jump threaders.Aldy Hernandez1-1/+1
There is a lot of fall-out from this patch, as there were many threading tests that assumed the restrictions introduced by this patch were valid. Some tests have merely shifted the threading to after loop optimizations, but others ended up with no threading opportunities at all. Surprisingly some tests ended up with more total threads. It was a crapshoot all around. On a postive note, there are 6 tests that no longer XFAIL, and one guality test which now passes. I felt a bit queasy about such a fundamental change wrt threading, so I ran it through my callgrind test harness (.ii files from a bootstrap). There was no change in overall compilation, DOM, or the VRP threaders. However, there was a slight increase of 1.63% in the backward threader. I'm pretty sure we could reduce this if we incorporated the restrictions into their profitability code. This way we could stop the search when we ran into one of these restrictions. Not sure it's worth it at this point. Tested on x86-64 Linux. Co-authored-by: Richard Biener <rguenther@suse.de> gcc/ChangeLog: * tree-ssa-threadupdate.c (cancel_thread): Dump threading reason on the same line as the threading cancellation. (jt_path_registry::cancel_invalid_paths): Avoid rotating loops. Avoid threading through loop headers where the path remains in the loop. libgomp/ChangeLog: * testsuite/libgomp.graphite/force-parallel-5.c: Remove xfail. gcc/testsuite/ChangeLog: * gcc.dg/Warray-bounds-87.c: Remove xfail. * gcc.dg/analyzer/pr94851-2.c: Remove xfail. * gcc.dg/graphite/pr69728.c: Remove xfail. * gcc.dg/graphite/scop-dsyr2k.c: Remove xfail. * gcc.dg/graphite/scop-dsyrk.c: Remove xfail. * gcc.dg/shrink-wrap-loop.c: Remove xfail. * gcc.dg/loop-8.c: Adjust for new threading restrictions. * gcc.dg/tree-ssa/ifc-20040816-1.c: Same. * gcc.dg/tree-ssa/pr21559.c: Same. * gcc.dg/tree-ssa/pr59597.c: Same. * gcc.dg/tree-ssa/pr71437.c: Same. * gcc.dg/tree-ssa/pr77445-2.c: Same. * gcc.dg/tree-ssa/ssa-dom-thread-4.c: Same. * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same. * gcc.dg/vect/bb-slp-16.c: Same. * gcc.dg/tree-ssa/ssa-dom-thread-6.c: Remove. * gcc.dg/tree-ssa/ssa-dom-thread-18.c: Remove. * gcc.dg/tree-ssa/ssa-dom-thread-2a.c: Remove. * gcc.dg/tree-ssa/ssa-thread-invalid.c: New test.
2021-10-18Fortran: Fix Bind(C) Array-Descriptor ConversionTobias Burnus1-0/+18
gfortran uses internally a different array descriptor ("gfc") as Fortran 2018 alias TS291113 defines for C interoperability via ISO_Fortran_binding.h ("CFI"). Hence, when calling a C function from Fortran, it has to be converted in the callee - and if a BIND(C) procedure is written in Fortran, the CFI argument has to be converted to gfc in order work with the rest of the FE code and the library calls. Before this patch, part was handled in the FE generated code and other parts in libgfortran. With this patch, all code is generated and CFI is defined as proper type - visible in the debugger and to the middle end - avoiding both alias issues and missed optimization issues. This patch also fixes issues like: intent(out) deallocation in the bind(C) callee, using the CFI descriptor also for allocatable and pointer scalars and for len=* character strings. For 'select rank', it also optimizes the code + avoid accessing uninitialized memory if the dummy argument is allocatable/a pointer. It additionally rejects passing a descriptorless type(*) to an assumed-rank dummy argument. [F2018:C711] PR fortran/102086 PR fortran/92189 PR fortran/92621 PR fortran/101308 PR fortran/101309 PR fortran/101635 PR fortran/92482 gcc/fortran/ChangeLog: * decl.c (gfc_verify_c_interop_param): Remove 'sorry' for scalar allocatable/pointer and len=*. * expr.c (is_CFI_desc): Return true for for those. * gfortran.h (CFI_type_kind_shift, CFI_type_mask, CFI_type_from_type_kind, CFI_VERSION, CFI_MAX_RANK, CFI_attribute_pointer, CFI_attribute_allocatable, CFI_attribute_other, CFI_type_Integer, CFI_type_Logical, CFI_type_Real, CFI_type_Complex, CFI_type_Character, CFI_type_ucs4_char, CFI_type_struct, CFI_type_cptr, CFI_type_cfunptr, CFI_type_other): New #define. * trans-array.c (CFI_FIELD_BASE_ADDR, CFI_FIELD_ELEM_LEN, CFI_FIELD_VERSION, CFI_FIELD_RANK, CFI_FIELD_ATTRIBUTE, CFI_FIELD_TYPE, CFI_FIELD_DIM, CFI_DIM_FIELD_LOWER_BOUND, CFI_DIM_FIELD_EXTENT, CFI_DIM_FIELD_SM, gfc_get_cfi_descriptor_field, gfc_get_cfi_desc_base_addr, gfc_get_cfi_desc_elem_len, gfc_get_cfi_desc_version, gfc_get_cfi_desc_rank, gfc_get_cfi_desc_type, gfc_get_cfi_desc_attribute, gfc_get_cfi_dim_item, gfc_get_cfi_dim_lbound, gfc_get_cfi_dim_extent, gfc_get_cfi_dim_sm): New define/functions to access the CFI array descriptor. (gfc_conv_descriptor_type): New function for the GFC descriptor. (gfc_get_array_span): Handle expr of CFI descriptors and assumed-type descriptors. (gfc_trans_array_bounds): Remove 'static'. (gfc_conv_expr_descriptor): For assumed type, use the dtype of the actual argument. (structure_alloc_comps): Remove ' ' inside tabs. * trans-array.h (gfc_trans_array_bounds, gfc_conv_descriptor_type, gfc_get_cfi_desc_base_addr, gfc_get_cfi_desc_elem_len, gfc_get_cfi_desc_version, gfc_get_cfi_desc_rank, gfc_get_cfi_desc_type, gfc_get_cfi_desc_attribute, gfc_get_cfi_dim_lbound, gfc_get_cfi_dim_extent, gfc_get_cfi_dim_sm): New prototypes. * trans-decl.c (gfor_fndecl_cfi_to_gfc, gfor_fndecl_gfc_to_cfi): Remove global vars. (gfc_build_builtin_function_decls): Remove their initialization. (gfc_get_symbol_decl, create_function_arglist, gfc_trans_deferred_vars): Update for CFI. (convert_CFI_desc): Remove and replace by ... (gfc_conv_cfi_to_gfc): ... this function (gfc_generate_function_code): Call it; create local GFC var for CFI. * trans-expr.c (gfc_maybe_dereference_var): Handle CFI. (gfc_conv_subref_array_arg): Handle the if-noncontigous-only copy in when the result should be a descriptor. (gfc_conv_gfc_desc_to_cfi_desc): Completely rewritten. (gfc_conv_procedure_call): CFI fixes. * trans-openmp.c (gfc_omp_is_optional_argument, gfc_omp_check_optional_argument): Handle optional CFI. * trans-stmt.c (gfc_trans_select_rank_cases): Cleanup, avoid invalid code for allocatable/pointer dummies, which cannot be assumed size. * trans-types.c (gfc_cfi_descriptor_base): New global var. (gfc_get_dtype_rank_type): Skip rank init for rank < 0. (gfc_sym_type): Handle CFI dummies. (gfc_get_function_type): Update call. (gfc_get_cfi_dim_type, gfc_get_cfi_type): New. * trans-types.h (gfc_sym_type): Update prototype. (gfc_get_cfi_type): New prototype. * trans.c (gfc_trans_runtime_check): Make conditions more consistent to avoid '<logical> AND_THEN <long int>' in conditions. * trans.h (gfor_fndecl_cfi_to_gfc, gfor_fndecl_gfc_to_cfi): Remove global-var declaration. libgfortran/ChangeLog: * ISO_Fortran_binding.h (CFI_type_cfunptr): Make unique type again. * runtime/ISO_Fortran_binding.c (cfi_desc_to_gfc_desc, gfc_desc_to_cfi_desc): Add comment that those are no longer called by new code. libgomp/ChangeLog: * testsuite/libgomp.fortran/optional-bind-c.f90: New test. gcc/testsuite/ChangeLog: * gfortran.dg/ISO_Fortran_binding_4.f90: Extend testcase. * gfortran.dg/PR100914.f90: Remove xfail. * gfortran.dg/PR100915.c: Expect CFI_type_cfunptr. * gfortran.dg/PR100915.f90: Handle CFI_type_cfunptr != CFI_type_cptr. * gfortran.dg/PR93963.f90: Extend select-rank tests. * gfortran.dg/bind-c-intent-out.f90: Change to dg-do run, update scan-dump. * gfortran.dg/bind_c_array_params_2.f90: Update/extend scan-dump. * gfortran.dg/bind_c_char_10.f90: Update scan-dump. * gfortran.dg/bind_c_char_8.f90: Remove dg-error "sorry". * gfortran.dg/c-interop/allocatable-dummy.f90: Remove xfail. * gfortran.dg/c-interop/c1255-1.f90: Likewise. * gfortran.dg/c-interop/c407c-1.f90: Update dg-error. * gfortran.dg/c-interop/cf-descriptor-5.f90: Remove xfail. * gfortran.dg/c-interop/cf-out-descriptor-3.f90: Likewise. * gfortran.dg/c-interop/cf-out-descriptor-4.f90: Likewise. * gfortran.dg/c-interop/cf-out-descriptor-5.f90: Likewise. * gfortran.dg/c-interop/contiguous-2.f90: Likewise. * gfortran.dg/c-interop/contiguous-3.f90: Likewise. * gfortran.dg/c-interop/deferred-character-1.f90: Likewise. * gfortran.dg/c-interop/deferred-character-2.f90: Likewise. * gfortran.dg/c-interop/fc-descriptor-3.f90: Likewise. * gfortran.dg/c-interop/fc-descriptor-5.f90: Likewise. * gfortran.dg/c-interop/fc-descriptor-6.f90: Likewise. * gfortran.dg/c-interop/fc-out-descriptor-3.f90: Likewise. * gfortran.dg/c-interop/fc-out-descriptor-4.f90: Likewise. * gfortran.dg/c-interop/fc-out-descriptor-5.f90: Likewise. * gfortran.dg/c-interop/fc-out-descriptor-6.f90: Likewise. * gfortran.dg/c-interop/ff-descriptor-5.f90: Likewise. * gfortran.dg/c-interop/ff-descriptor-6.f90: Likewise. * gfortran.dg/c-interop/fc-descriptor-7.f90: Remove xfail + extend. * gfortran.dg/c-interop/fc-descriptor-7-c.c: Update for changes. * gfortran.dg/c-interop/shape.f90: Add implicit none. * gfortran.dg/c-interop/typecodes-array-char-c.c: Add kind=4 char. * gfortran.dg/c-interop/typecodes-array-char.f90: Likewise. * gfortran.dg/c-interop/typecodes-array-float128.f90: Remove xfail. * gfortran.dg/c-interop/typecodes-scalar-basic.f90: Likewise. * gfortran.dg/c-interop/typecodes-scalar-float128.f90: Likewise. * gfortran.dg/c-interop/typecodes-scalar-int128.f90: Likewise. * gfortran.dg/c-interop/typecodes-scalar-longdouble.f90: Likewise. * gfortran.dg/iso_c_binding_char_1.f90: Remove dg-error "sorry". * gfortran.dg/pr93792.f90: Turn XFAIL into PASS. * gfortran.dg/ISO_Fortran_binding_19.f90: New test. * gfortran.dg/assumed_type_12.f90: New test. * gfortran.dg/assumed_type_13.c: New test. * gfortran.dg/assumed_type_13.f90: New test. * gfortran.dg/bind-c-char-descr.f90: New test. * gfortran.dg/bind-c-contiguous-1.c: New test. * gfortran.dg/bind-c-contiguous-1.f90: New test. * gfortran.dg/bind-c-contiguous-2.f90: New test. * gfortran.dg/bind-c-contiguous-3.c: New test. * gfortran.dg/bind-c-contiguous-3.f90: New test. * gfortran.dg/bind-c-contiguous-4.c: New test. * gfortran.dg/bind-c-contiguous-4.f90: New test. * gfortran.dg/bind-c-contiguous-5.c: New test. * gfortran.dg/bind-c-contiguous-5.f90: New test.
2021-10-15openmp: Improve testsuite/libgomp.c/affinity-1.c testcaseJakub Jelinek1-2/+3
I've noticed that while I have added hopefully sufficient test coverage for the case where one uses simple number or !number as p-interval, I haven't added any coverage for number:len:stride or number:len. This patch adds that. 2021-10-15 Jakub Jelinek <jakub@redhat.com> * testsuite/libgomp.c/affinity-1.c (struct places): Change name field type from char [50] to const char *. (places_array): Add a testcase for simplified syntax place followed by length or length and stride.
2021-10-15openmp: Handle OpenMP 5.1 simplified OMP_PLACES syntaxJakub Jelinek1-2/+4
In addition to adding ll_caches and numa_domain abstract names to OMP_PLACES syntax, OpenMP 5.1 also added one syntax simplification: https://github.com/OpenMP/spec/issues/2080 https://github.com/OpenMP/spec/pull/2081 in particular that in the grammar place non-terminal is now not only { res-list } but also res (i.e. a non-negative integer), which stands as a shortcut for { res } So, one can specify OMP_PLACES=0,4,8,12 with the meaning OMP_PLACES={0},{4},{8},{12} or OMP_PLACES=0:4 instead of OMP_PLACES={0}:4 or OMP_PLACES={0},{1},{2},{3} etc. This patch implements that. 2021-10-15 Jakub Jelinek <jakub@redhat.com> * env.c (parse_one_place): Handle non-negative-number the same as { non-negative-number }. Reject even !number:1 and !number:1:stride or !place:1 or !place:1:stride instead of just length other than 1. * libgomp.texi (OpenMP 5.1): Document OMP_PLACES syntax extensions and OMP_NUM_TEAMS/OMP_TEAMS_THREAD_LIMIT and omp_{set_num,get_max}_teams/omp_{s,g}et_teams_thread_limit features as implemented. * testsuite/libgomp.c/affinity-1.c: Add a test for the 5.1 place simplified syntax.
2021-10-15openmp: Fix up handling of OMP_PLACES=threads(1)Jakub Jelinek5-0/+50
When writing the places-*.c tests, I've noticed that we mishandle threads abstract name with specified num-places if num-places isn't a multiple of number of hw threads in a core. It then happily ignores the maximum count and overwrites for the remaining hw threads in a core further places that haven't been allocated. 2021-10-15 Jakub Jelinek <jakub@redhat.com> * config/linux/affinity.c (gomp_affinity_init_level_1): For level 1 after creating count places clean up and return immediately. * testsuite/libgomp.c/places-6.c: New test. * testsuite/libgomp.c/places-7.c: New test. * testsuite/libgomp.c/places-8.c: New test. * testsuite/libgomp.c/places-9.c: New test. * testsuite/libgomp.c/places-10.c: New test.
2021-10-15openmp: Add support for OMP_PLACES=numa_domainsJakub Jelinek1-0/+10
This adds support for numa_domains abstract name in OMP_PLACES, also new in OpenMP 5.1. Way to test this is OMP_PLACES=numa_domains OMP_DISPLAY_ENV=true LD_PRELOAD=.libs/libgomp.so.1 /bin/true and see what it prints on OMP_PLACES line. For non-NUMA machines it should print a single place that covers all CPUs, for NUMA machine one place for each NUMA node with corresponding CPUs. 2021-10-15 Jakub Jelinek <jakub@redhat.com> * env.c (parse_places_var): Handle numa_domains as level 5. * config/linux/affinity.c (gomp_affinity_init_numa_domains): New function. (gomp_affinity_init_level): Use it instead of gomp_affinity_init_level_1 for level == 5. * testsuite/libgomp.c/places-5.c: New test.