path: root/libgomp
Age | Commit message | Author | Files | Lines
2024-08-06 | Daily bump. | GCC Administrator | 1 | -0/+9
2024-08-05 | libgomp: Remove bogus warnings from privatized-ref-2.f90. | Paul Thomas | 1 | -6/+0
2024-07-19  Paul Thomas  <pault@gcc.gnu.org>

libgomp/ChangeLog
* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Cut dg-note about 'a' and remove bogus warnings about its array descriptor components being used uninitialized.

(cherry picked from commit 8d6994f33a98a168151a57a3d21395b19196cd9d)
2024-06-20 | Update ChangeLog and version files for release (releases/gcc-12.4.0) | Jakub Jelinek | 1 | -0/+4
2024-06-12 | Daily bump. | GCC Administrator | 1 | -0/+18
2024-06-11 | c++: Fix ICE with weird copy assignment operator [PR114572] | Jakub Jelinek | 1 | -0/+24
While ctors/dtors don't return anything (undeclared void or this pointer on arm) and copy assignment operators normally return a reference to *this, it isn't invalid to return uselessly some class object which might need destructing, but the OpenMP clause handling code wasn't expecting that. The following patch fixes that.

2024-04-05  Jakub Jelinek  <jakub@redhat.com>

PR c++/114572
* cp-gimplify.cc (cxx_omp_clause_apply_fn): Call build_cplus_new on build_call_a result if it has class type.
* testsuite/libgomp.c++/pr114572.C: New test.

(cherry picked from commit 592536eb3c0a97a55b1019ff0216ef77e6ca847e)
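
A minimal, hypothetical sketch of the shape described above (not the actual testsuite/libgomp.c++/pr114572.C): a copy assignment operator that returns a class object by value, exercised through a lastprivate clause whose copy-back goes through that operator.

  #include <stdio.h>

  struct R {
    int v;
    R (int i) : v (i) {}
    ~R () {}                          // returned object needs destructing
  };

  struct S {
    int x = 0;
    R operator= (const S &o)          // "weird": returns a class object by value
    { x = o.x; return R (x); }
  };

  int
  main ()
  {
    S s;
    #pragma omp parallel for lastprivate (s)
    for (int i = 0; i < 8; i++)
      s.x = i;                        // copy-back of the last iteration uses operator=
    printf ("%d\n", s.x);
    return s.x != 7;
  }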
2024-06-11 | libgomp: Fix up FLOCK fallback handling [PR113192] | Jakub Jelinek | 2 | -2/+18
My earlier change broke Solaris testing, because @FLOCK@ isn't substituted just into libgomp/Makefile, where it worked, but also into the testsuite/libgomp-site-extra.exp file, where Make variables aren't present and can't be substituted. The following patch instead computes the absolute srcdir path and uses it for FLOCK.

2024-01-10  Jakub Jelinek  <jakub@redhat.com>

PR libgomp/113192
* configure.ac (FLOCK): Use $libgomp_abs_srcdir/testsuite/flock instead of \$(abs_top_srcdir)/testsuite/flock.
* configure: Regenerated.

(cherry picked from commit 2fb3ee3ee82874e160309344bc3e52afeed8f26a)
2023-09-02 | Daily bump. | GCC Administrator | 1 | -0/+8
2023-09-01 | omp-expand.cc: Fix wrong code with non-rectangular loop nest [PR111017] | Tobias Burnus | 1 | -0/+72
Before commit r12-5295-g47de0b56ee455e, all gimple_build_cond in expand_omp_for_* were inserted with
  gsi_insert_before (gsi_p, cond_stmt, GSI_SAME_STMT);
except the one dealing with the multiplicative factor, which used
  gsi_insert_after (gsi, cond_stmt, GSI_CONTINUE_LINKING);

That commit for PR103208 fixed the issue of some missing regimplification of operands of GIMPLE_CONDs by moving the condition handling to the new function expand_omp_build_cond. That function has a 'bool after = false' argument to switch between the two variants, but all callers omitted this argument.

This commit reinstates the prior behavior by passing 'true' for the 'factor != 0' condition, fixing the included testcase.

PR middle-end/111017
gcc/
* omp-expand.cc (expand_omp_for_init_vars): Pass after=true to expand_omp_build_cond for 'factor != 0' condition, resulting in pre-r12-5295-g47de0b56ee455e code for the gimple insert.
libgomp/
* testsuite/libgomp.c-c++-common/non-rect-loop-1.c: New test.

(cherry picked from commit 1dc65003b66e5a97200f454eeddcccfce34416b3)
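
For illustration, a hedged sketch of the kind of non-rectangular loop nest this is about (not the actual testsuite/libgomp.c-c++-common/non-rect-loop-1.c): inner bounds that scale with the outer iteration variable, which is where the multiplicative-factor condition comes into play.

  #include <stdio.h>

  int
  main ()
  {
    int sum = 0, ref = 0;

    #pragma omp parallel for collapse(2) reduction(+:sum)
    for (int i = 0; i < 10; i++)
      for (int j = 2 * i; j < 2 * i + 5; j++)   // non-rectangular: bounds depend on i
        sum += j;

    for (int i = 0; i < 10; i++)                // serial reference
      for (int j = 2 * i; j < 2 * i + 5; j++)
        ref += j;

    printf (sum == ref ? "OK\n" : "FAIL\n");
    return sum != ref;
  }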
2023-06-29 | Daily bump. | GCC Administrator | 1 | -0/+58
2023-06-28 | Support parallel testing in libgomp: fallback Perl 'flock' [PR66005] | Thomas Schwinge | 4 | -1/+67
Follow-up to commit 6c3b30ef9e0578509bdaf59c13da4a212fe6c2ba "Support parallel testing in libgomp, part II [PR66005]" ("..., and enable if 'flock' is available for serializing execution testing"), where we saw:

> On my Dell Precision 7530 laptop:
>
> $ uname -srvi
> Linux 5.15.0-71-generic #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023 x86_64
> $ grep '^model name' < /proc/cpuinfo | uniq -c
> 12 model name : Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
> $ nvidia-smi -L
> GPU 0: Quadro P1000 (UUID: GPU-e043973b-b52a-d02b-c066-a8fdbf64e8ea)
>
> ... [...]: case (c) standard configuration, no offloading
> configured, [...]
> $ \time make check-target-libgomp
>
> Case (c), baseline; [...]:
>
> 1180.98user 110.80system 19:36.40elapsed 109%CPU (0avgtext+0avgdata 505148maxresident)k
> 1133.22user 111.08system 19:35.75elapsed 105%CPU (0avgtext+0avgdata 505212maxresident)k
>
> Case (c), parallelized [using 'flock']:
>
> [...]
> -j12 GCC_TEST_PARALLEL_SLOTS=12
> 2591.04user 192.64system 4:44.98elapsed 976%CPU (0avgtext+0avgdata 505216maxresident)k
> 2581.23user 195.21system 4:47.51elapsed 965%CPU (0avgtext+0avgdata 505212maxresident)k

Quite the same when instead of 'flock' using this fallback Perl 'flock':

2565.23user 194.35system 4:46.77elapsed 962%CPU (0avgtext+0avgdata 505216maxresident)k
2549.38user 200.20system 4:46.08elapsed 961%CPU (0avgtext+0avgdata 505216maxresident)k

PR testsuite/66005
gcc/
* doc/install.texi: Document (optional) Perl usage for parallel testing of libgomp.
libgomp/
* testsuite/lib/libgomp.exp: 'flock' through stdout.
* testsuite/flock: New.
* configure.ac (FLOCK): Point to that if no 'flock' available, but 'perl' is.
* configure: Regenerate.

(cherry picked from commit 04abe1944d30eb18a2060cfcd9695d085f7b4752)
2023-06-28 | Support parallel testing in libgomp, part II [PR66005] | Thomas Schwinge | 8 | -6/+84
..., and enable if 'flock' is available for serializing execution testing.

Regarding the default of 19 parallel slots, this turned out to be a local minimum for wall time when testing this on:

$ uname -srvi
Linux 4.2.0-42-generic #49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 UTC 2016 x86_64
$ grep '^model name' < /proc/cpuinfo | uniq -c
32 model name : Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz

... in two configurations: case (a) standard configuration, no offloading configured, case (b) offloading for GCN and nvptx configured but no devices available. For both cases, default plus '-m32' variant.

$ \time make check-target-libgomp RUNTESTFLAGS="--target_board=unix\{,-m32\}"

Case (a), baseline:

6432.23user 332.38system 47:32.28elapsed 237%CPU (0avgtext+0avgdata 505044maxresident)k
6382.43user 319.21system 47:06.04elapsed 237%CPU (0avgtext+0avgdata 505172maxresident)k

This is what people have been complaining about, rightly so, in <https://gcc.gnu.org/PR66005> "libgomp make check time is excessive" and elsewhere.

Case (a), parallelized:

-j12 GCC_TEST_PARALLEL_SLOTS=10
3088.49user 267.74system 6:43.82elapsed 831%CPU (0avgtext+0avgdata 505188maxresident)k
-j15 GCC_TEST_PARALLEL_SLOTS=15
3308.08user 294.79system 5:56.04elapsed 1011%CPU (0avgtext+0avgdata 505360maxresident)k
-j17 GCC_TEST_PARALLEL_SLOTS=17
3539.93user 298.99system 5:27.86elapsed 1170%CPU (0avgtext+0avgdata 505112maxresident)k
-j18 GCC_TEST_PARALLEL_SLOTS=18
3697.50user 317.18system 5:14.63elapsed 1275%CPU (0avgtext+0avgdata 505360maxresident)k
-j19 GCC_TEST_PARALLEL_SLOTS=19
3765.94user 324.27system 5:13.22elapsed 1305%CPU (0avgtext+0avgdata 505128maxresident)k
-j20 GCC_TEST_PARALLEL_SLOTS=20
3684.66user 312.32system 5:15.26elapsed 1267%CPU (0avgtext+0avgdata 505100maxresident)k
-j23 GCC_TEST_PARALLEL_SLOTS=23
4040.59user 347.10system 5:29.12elapsed 1333%CPU (0avgtext+0avgdata 505200maxresident)k
-j26 GCC_TEST_PARALLEL_SLOTS=26
3973.24user 377.96system 5:24.70elapsed 1340%CPU (0avgtext+0avgdata 505160maxresident)k
-j32 GCC_TEST_PARALLEL_SLOTS=32
4004.42user 346.10system 5:16.11elapsed 1376%CPU (0avgtext+0avgdata 505160maxresident)k

Yay!

Case (b), baseline; 2+ h:

7227.58user 700.54system 2:14:33elapsed 98%CPU (0avgtext+0avgdata 994264maxresident)k

Case (b), parallelized:

-j12 GCC_TEST_PARALLEL_SLOTS=10
7377.46user 777.52system 16:06.63elapsed 843%CPU (0avgtext+0avgdata 994344maxresident)k
-j15 GCC_TEST_PARALLEL_SLOTS=15
8019.18user 721.42system 12:13.56elapsed 1191%CPU (0avgtext+0avgdata 994228maxresident)k
-j17 GCC_TEST_PARALLEL_SLOTS=17
8530.11user 716.95system 10:45.92elapsed 1431%CPU (0avgtext+0avgdata 994176maxresident)k
-j18 GCC_TEST_PARALLEL_SLOTS=18
8776.79user 645.89system 10:27.20elapsed 1502%CPU (0avgtext+0avgdata 994248maxresident)k
-j19 GCC_TEST_PARALLEL_SLOTS=19
9332.37user 641.76system 10:15.09elapsed 1621%CPU (0avgtext+0avgdata 994260maxresident)k
-j20 GCC_TEST_PARALLEL_SLOTS=20
9609.54user 789.88system 10:26.94elapsed 1658%CPU (0avgtext+0avgdata 994284maxresident)k
-j23 GCC_TEST_PARALLEL_SLOTS=23
10362.40user 911.14system 10:44.47elapsed 1749%CPU (0avgtext+0avgdata 994208maxresident)k
-j26 GCC_TEST_PARALLEL_SLOTS=26
11159.44user 850.99system 11:09.25elapsed 1794%CPU (0avgtext+0avgdata 994256maxresident)k
-j32 GCC_TEST_PARALLEL_SLOTS=32
11453.50user 939.52system 11:00.38elapsed 1876%CPU (0avgtext+0avgdata 994240maxresident)k

On my Dell Precision 7530 laptop:

$ uname -srvi
Linux 5.15.0-71-generic #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023 x86_64
$ grep '^model name' < /proc/cpuinfo | uniq -c
12 model name : Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
$ nvidia-smi -L
GPU 0: Quadro P1000 (UUID: GPU-e043973b-b52a-d02b-c066-a8fdbf64e8ea)

... in two configurations: case (c) standard configuration, no offloading configured, case (d) offloading for nvptx configured and device available. For both cases, only default variant, no '-m32'.

$ \time make check-target-libgomp

Case (c), baseline; roughly half of case (a) (just one variant):

1180.98user 110.80system 19:36.40elapsed 109%CPU (0avgtext+0avgdata 505148maxresident)k
1133.22user 111.08system 19:35.75elapsed 105%CPU (0avgtext+0avgdata 505212maxresident)k

Case (c), parallelized:

-j12 GCC_TEST_PARALLEL_SLOTS=2
1143.83user 110.76system 10:20.46elapsed 202%CPU (0avgtext+0avgdata 505216maxresident)k
-j12 GCC_TEST_PARALLEL_SLOTS=6
1737.08user 143.94system 4:59.48elapsed 628%CPU (0avgtext+0avgdata 505200maxresident)k
1730.31user 143.02system 4:58.75elapsed 627%CPU (0avgtext+0avgdata 505152maxresident)k
-j12 GCC_TEST_PARALLEL_SLOTS=8
2192.63user 169.34system 4:52.96elapsed 806%CPU (0avgtext+0avgdata 505216maxresident)k
2219.04user 167.67system 4:53.19elapsed 814%CPU (0avgtext+0avgdata 505152maxresident)k
-j12 GCC_TEST_PARALLEL_SLOTS=10
2463.93user 184.98system 4:48.39elapsed 918%CPU (0avgtext+0avgdata 505200maxresident)k
2455.62user 183.68system 4:47.40elapsed 918%CPU (0avgtext+0avgdata 505216maxresident)k
-j12 GCC_TEST_PARALLEL_SLOTS=12
2591.04user 192.64system 4:44.98elapsed 976%CPU (0avgtext+0avgdata 505216maxresident)k
2581.23user 195.21system 4:47.51elapsed 965%CPU (0avgtext+0avgdata 505212maxresident)k
-j20 GCC_TEST_PARALLEL_SLOTS=20 [oversubscribe]
2613.18user 199.51system 4:44.06elapsed 990%CPU (0avgtext+0avgdata 505216maxresident)k

Case (d), baseline (compared to case (b): only nvptx offloading compilation, but also nvptx offloading execution); ~1 h:

2841.93user 653.68system 1:02:26elapsed 93%CPU (0avgtext+0avgdata 909792maxresident)k
2842.03user 654.39system 1:02:24elapsed 93%CPU (0avgtext+0avgdata 909880maxresident)k

Case (d), parallelized:

-j12 GCC_TEST_PARALLEL_SLOTS=2
2856.39user 606.87system 33:58.64elapsed 169%CPU (0avgtext+0avgdata 909948maxresident)k
-j12 GCC_TEST_PARALLEL_SLOTS=6
3444.90user 666.86system 18:37.57elapsed 367%CPU (0avgtext+0avgdata 909856maxresident)k
3462.13user 667.13system 18:36.87elapsed 369%CPU (0avgtext+0avgdata 909872maxresident)k
-j12 GCC_TEST_PARALLEL_SLOTS=8
3929.74user 716.22system 18:02.36elapsed 429%CPU (0avgtext+0avgdata 909832maxresident)k
-j12 GCC_TEST_PARALLEL_SLOTS=10
4152.84user 736.16system 17:43.05elapsed 459%CPU (0avgtext+0avgdata 909872maxresident)k
-j12 GCC_TEST_PARALLEL_SLOTS=12
4209.60user 749.00system 17:35.20elapsed 469%CPU (0avgtext+0avgdata 909840maxresident)k
-j20 GCC_TEST_PARALLEL_SLOTS=20 [oversubscribe]
4255.54user 756.78system 17:29.06elapsed 477%CPU (0avgtext+0avgdata 909868maxresident)k

Worth noting is that with nvptx offloading, there is one execution test case that times out ('libgomp.fortran/reverse-offload-5.f90'). This effectively stalls progress for almost 5 min: quickly other execution test cases queue up on the lock for all parallel slots. That's working as expected; just noting this as it accordingly does skew the wall time numbers.

PR testsuite/66005
libgomp/
* configure.ac: Look for 'flock'.
* testsuite/Makefile.am (gcc_test_parallel_slots): Enable parallel testing.
* testsuite/config/default.exp: Don't 'load_lib "standard.exp"' here...
* testsuite/lib/libgomp.exp: ... but here, instead.
(libgomp_load): Override for parallel testing.
* testsuite/libgomp-site-extra.exp.in (FLOCK): Set.
* configure: Regenerate.
* Makefile.in: Regenerate.
* testsuite/Makefile.in: Regenerate.

(cherry picked from commit 6c3b30ef9e0578509bdaf59c13da4a212fe6c2ba)
2023-06-28 | Support parallel testing in libgomp, part I [PR66005] | Rainer Orth | 3 | -23/+138
..., while still hard-coding the number of parallel slots to one.

PR testsuite/66005
libgomp/
* testsuite/Makefile.am (PWD_COMMAND): New variable.
(%/site.exp): New target.
(check_p_numbers0, check_p_numbers1, check_p_numbers2)
(check_p_numbers3, check_p_numbers4, check_p_numbers5)
(check_p_numbers6, check_p_numbers, gcc_test_parallel_slots)
(check_p_subdirs)
(check_DEJAGNU_libgomp_targets): New variables.
($(check_DEJAGNU_libgomp_targets)): New target.
($(check_DEJAGNU_libgomp_targets)): New dependency.
(check-DEJAGNU $(check_DEJAGNU_libgomp_targets)): New targets.
* testsuite/Makefile.in: Regenerate.
* testsuite/lib/libgomp.exp: For parallel testing, 'load_file ../libgomp-test-support.exp'.

Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>

(cherry picked from commit e797db5c744f7b4e110f23a495fca8e6b8aebe83)
2023-06-28 | libgomp C++ testsuite: Use 'lang_include_flags' instead of 'libstdcxx_includes' | Thomas Schwinge | 2 | -8/+6
With nvptx offloading configured, and supported, and CUDA available:

$ make check-target-libgomp RUNTESTFLAGS="--all c.exp=context-1.c c++.exp=context-1.c"
[...]
Running [...]/libgomp.oacc-c/c.exp ...
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 (test for excess errors)
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 execution test
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 (test for excess errors)
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test
UNSUPPORTED: libgomp.oacc-c/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2
Running [...]/libgomp.oacc-c++/c++.exp ...
PASS: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 (test for excess errors)
PASS: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 execution test
PASS: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 (test for excess errors)
PASS: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test
UNSUPPORTED: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2
[...]

..., but for 'c++.exp=context-1.c' alone, we currently get all-UNSUPPORTED:

$ make check-target-libgomp RUNTESTFLAGS="--all c++.exp=context-1.c"
[...]
Running [...]/libgomp.oacc-c++/c++.exp ...
UNSUPPORTED: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0
UNSUPPORTED: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2
UNSUPPORTED: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2
[...]

That is, if 'c.exp' executes first, it does successfully evaluate 'dg-require-effective-target openacc_cublas' -- and does cache this result (so it isn't reevaluated for 'c++.exp'). However, for 'c++.exp' alone (that is, without the 'c.exp' result cached), we run into:

spawn -ignore SIGHUP [xgcc] [...] -x c++ openacc_cublas2311907.c [...]
In file included from /usr/include/cuda_fp16.h:3673,
from /usr/include/cublas_api.h:75,
from /usr/include/cublas_v2.h:65,
from openacc_cublas2311907.c:3:
/usr/include/cuda_fp16.hpp:67:10: fatal error: utility: No such file or directory

We're missing include paths to C++/libstdc++ build-tree headers.

Fix this by using the mechanism introduced for Fortran in r212268 (commit f707da16f714f7fe5a42391748212c84dfec639b) re "libgomp.fortran/fortran.exp - add -fintrinsic-modules-path ${blddir}".

libgomp/
* testsuite/libgomp.c++/c++.exp: Use 'lang_include_flags' instead of 'libstdcxx_includes'.
* testsuite/libgomp.oacc-c++/c++.exp: Likewise.

(cherry picked from commit 1b93b9191d073bf9e867ab8bfc8e4b59ba5af1f3)
2023-05-08 | Update ChangeLog and version files for release (releases/gcc-12.3.0) | Richard Biener | 1 | -0/+4
2023-03-20 | Daily bump. | GCC Administrator | 1 | -0/+7
2023-03-19 | libgomp: Fix up some typos in libgomp.texi | Jakub Jelinek | 1 | -7/+7
I decided to check for a repeated "the the" in libgomp and noticed there are several occurrences of the typo "theads" rather than "threads" in libgomp.texi.

2023-02-16  Jakub Jelinek  <jakub@redhat.com>

* libgomp.texi: Fix typos - theads -> threads.

(cherry picked from commit 0b9bd33d69d0c30330a465e6bad262d90c94d4ea)
2023-03-09 | Daily bump. | GCC Administrator | 1 | -0/+9
2023-03-08 | OpenMP/Fortran: Fix handling of optional is_device_ptr + bind(C) [PR108546] | Tobias Burnus | 2 | -0/+99
For is_device_ptr, optional checks should only be done before calling libgomp; afterwards the arguments are NULL either because they are absent or, by chance, because they are unallocated or unassociated (for pointers/allocatables). Additionally, this commit fixes an issue with explicit mapping of 'type(c_ptr)' variables.

PR middle-end/108546

gcc/fortran/ChangeLog:
* trans-openmp.cc (gfc_trans_omp_clauses): Fix mapping of type(C_ptr) variables.

gcc/ChangeLog:
* omp-low.cc (lower_omp_target): Remove optional handling on the receiver side, i.e. inside target (data), for use_device_ptr.

libgomp/ChangeLog:
* testsuite/libgomp.fortran/is_device_ptr-3.f90: New test.
* testsuite/libgomp.fortran/use_device_ptr-optional-4.f90: New test.

(cherry picked from commit 96ff97ff6574666a5509ae9fa596e7f2b6ad4f88)
2023-02-11 | Daily bump. | GCC Administrator | 1 | -0/+24
2023-02-10 | openmp: Fix up OpenMP expansion of non-rectangular loops [PR108459] | Jakub Jelinek | 1 | -0/+41
expand_omp_for_init_counts was using fold_unary (NEGATE_EXPR, ...) for the case where a collapse(2) inner loop has an init expression dependent on a non-constant multiple of the outer iterator and the condition's upper bound expression doesn't depend on the outer iterator. fold_unary will just return NULL if the expression can't be folded; we need fold_build1 instead.

2023-01-19  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/108459
* omp-expand.cc (expand_omp_for_init_counts): Use fold_build1 rather than fold_unary for NEGATE_EXPR.
* testsuite/libgomp.c/pr108459.c: New test.

(cherry picked from commit 46644ec99cb355845b23bb1d02775c057ed8ee88)
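
A hedged sketch of the loop shape described (not the actual testsuite/libgomp.c/pr108459.c): the inner initializer depends on a non-constant multiple of the outer iterator while the upper bound does not.

  int
  main (int argc, char **argv)
  {
    int k = argc + 1;              // non-constant multiplier (at least 2)
    int count = 0, ref = 0;

    #pragma omp parallel for collapse(2) reduction(+:count)
    for (int i = 0; i < 8; i++)
      for (int j = k * i; j < 32; j++)   // init depends on k * i; bound independent of i
        count++;

    for (int i = 0; i < 8; i++)          // serial reference
      for (int j = k * i; j < 32; j++)
        ref++;

    return count != ref;
  }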
2023-02-10 | openmp: Fix up finish_omp_target_clauses [PR108286] | Jakub Jelinek | 1 | -0/+29
The comment in the loop says that we shouldn't add a map clause if such a clause exists already, but the loop was actually using OMP_CLAUSE_DECL on any clause. A target construct can have various clauses which don't have OMP_CLAUSE_DECL at all (e.g. nowait, device or if) or clauses where it means something different (e.g. privatization clauses, allocate, depend). So, only check OMP_CLAUSE_DECL on OMP_CLAUSE_MAP clauses.

2023-01-05  Jakub Jelinek  <jakub@redhat.com>

PR c++/108286
* semantics.cc (finish_omp_target_clauses): Ignore clauses other than OMP_CLAUSE_MAP.
* testsuite/libgomp.c++/pr108286.C: New test.

(cherry picked from commit 29c3218618ef6177dc33871b26c8fbd9b21eabe1)
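
A hypothetical sketch of the clause mix involved (not the actual testsuite/libgomp.c++/pr108286.C): clauses such as 'nowait' or 'if' carry no OMP_CLAUSE_DECL, while the member access still needs an implicit map of this->a to be added exactly once.

  struct S {
    int a[16];

    void f ()
    {
      // 'nowait' and 'if' have no associated variable; the implicit map of
      // this->a must still be handled correctly alongside them.
      #pragma omp target nowait if (0)   // if(0): run on the host, keeps the sketch device-independent
      for (int i = 0; i < 16; i++)
        a[i] = i;
      #pragma omp taskwait
    }
  };

  int
  main ()
  {
    S s;
    s.f ();
    return s.a[15] != 15;
  }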
2023-02-10 | openmp: Don't try to destruct DECL_OMP_PRIVATIZED_MEMBER vars [PR108180] | Jakub Jelinek | 1 | -0/+55
DECL_OMP_PRIVATIZED_MEMBER vars are artificial vars with a DECL_VALUE_EXPR of this->field, used just during gimplification and omp lowering/expansion to privatize individual fields in methods when needed. As the following testcase shows, when not in templates they were handled right, but in templates we actually called cp_finish_decl on them, and that can result in their destruction, which is obviously undesirable; we should only destruct the privatized copies of them created in omp lowering. Fixed thusly.

2022-12-21  Jakub Jelinek  <jakub@redhat.com>

PR c++/108180
* pt.cc (tsubst_expr): Don't call cp_finish_decl on DECL_OMP_PRIVATIZED_MEMBER vars.
* testsuite/libgomp.c++/pr108180.C: New test.

(cherry picked from commit 1119902b6c7c1c50123ed85ec1def8be4772d68c)
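
A hypothetical sketch of the pattern (not the actual testsuite/libgomp.c++/pr108180.C): a non-static data member with a non-trivial destructor used in an OpenMP clause inside a member function of a template, where only the privatized copies may be destructed.

  struct A {
    int *p;
    A () : p (new int (0)) {}
    A (const A &o) : p (new int (*o.p)) {}
    A &operator= (const A &o) { *p = *o.p; return *this; }
    ~A () { delete p; }            // must only run for the privatized copies
  };

  template <typename T>
  struct S {
    A a;                           // non-static data member used in a clause below
    void f ()
    {
      #pragma omp parallel for firstprivate (a)
      for (int i = 0; i < 4; i++)
        *a.p += i;                 // 'a' here is the per-thread privatized copy
    }
  };

  int
  main ()
  {
    S<int> s;
    s.f ();
    return 0;
  }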
2023-01-31 | Daily bump. | GCC Administrator | 1 | -0/+8
2023-01-30 | OpenMP/Fortran: Fix has_device_addr clause splitting [PR108558] | Tobias Burnus | 1 | -0/+59
gcc/fortran/ChangeLog:
PR fortran/108558
* trans-openmp.cc (gfc_split_omp_clauses): Handle has_device_addr.

libgomp/ChangeLog:
PR fortran/108558
* testsuite/libgomp.fortran/has_device_addr.f90: New test.

(cherry picked from commit 2325c8920bbc99edcc9fffaa79577c528df41eb8)
2022-11-04 | Daily bump. | GCC Administrator | 1 | -0/+21
2022-11-03 | libgomp: Fix up creation of artificial teams | Jakub Jelinek | 6 | -6/+117
When not in an explicit parallel/target/teams construct, we in some cases create an artificial parallel with a single thread (either to handle target nowait or for task reduction purposes). In those cases, the code handled the case of an artificially created implicit task (created by gomp_new_icv for cases where we needed to write to some ICVs), but, as the testcases show, it didn't take into account the possibility of this being done from within explicit task(s). The code would destroy/free the previous task and replace it with the new implicit task. If the task is an explicit task (when teams is NULL, all explicit tasks behave like if (0)), it is a pointer to a local stack variable, so freeing it doesn't work; additionally, we shouldn't lose the explicit tasks - the new implicit task should instead replace the ancestor task which is the first implicit one.

2022-10-12  Jakub Jelinek  <jakub@redhat.com>

* task.c (gomp_create_artificial_team): Fix up handling of invocations from within explicit task.
* target.c (GOMP_target_ext): Likewise.
* testsuite/libgomp.c/task-7.c: New test.
* testsuite/libgomp.c/task-8.c: New test.
* testsuite/libgomp.c-c++-common/task-reduction-17.c: New test.
* testsuite/libgomp.c-c++-common/task-reduction-18.c: New test.

(cherry picked from commit a58a965eb73253759f6a3e1c7380392557da89c8)
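
A hedged sketch of the situation described (not the actual task-7.c/task-8.c or task-reduction tests): a 'target nowait' region encountered from inside an explicit task while no parallel team exists, which makes libgomp create an artificial team without clobbering the explicit task.

  int
  main ()
  {
    int x = 0;

    #pragma omp task shared (x)                  // explicit task outside any parallel region
    {
      #pragma omp target map(tofrom: x) nowait   // triggers the artificial team/implicit task path
      x = 42;
      #pragma omp taskwait                       // wait for the deferred target task
    }
    #pragma omp taskwait

    return x != 42;
  }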
2022-11-03 | openmp, c: Tighten up c_tree_equal [PR106981] | Jakub Jelinek | 1 | -0/+19
This patch changes c_tree_equal to work more like cp_tree_equal and be more strict in what it accepts. The ICE on the first testcase was due to an INTEGER_CST wi::wide (t1) == wi::wide (t2) comparison, which ICEs if the two constants have different precision; but, as the second testcase shows, being too lenient can also lead to miscompilation of valid OpenMP programs where we think a certain expression is the same even when it isn't and can be guaranteed at runtime to represent a different memory location. So, the patch looks through only NON_LVALUE_EXPRs, and for constants as well as casts requires that the types match before actually comparing the constant values or recursing on the cast operands.

2022-09-24  Jakub Jelinek  <jakub@redhat.com>

PR c/106981
gcc/c/
* c-typeck.cc (c_tree_equal): Only strip NON_LVALUE_EXPRs at the start. For CONSTANT_CLASS_P or CASE_CONVERT: return false if t1 and t2 have different types.
gcc/testsuite/
* c-c++-common/gomp/pr106981.c: New test.
libgomp/
* testsuite/libgomp.c-c++-common/pr106981.c: New test.

(cherry picked from commit 3c5bccb608c665ac3f62adb1817c42c845812428)
2022-10-29 | Daily bump. | GCC Administrator | 1 | -0/+13
2022-10-28 | OpenACC: Don't gang-privatize artificial variables [PR90115] | Julian Brown | 5 | -37/+22
This patch prevents compiler-generated artificial variables from being treated as privatization candidates for OpenACC.

The rationale is that e.g. "gang-private" variables actually must be shared by each worker and vector spawned within a particular gang, but that sharing is not necessary for any compiler-generated variable (at least at present, but no such need is anticipated either). Variables on the stack (and machine registers) are already private per-"thread" (gang, worker and/or vector), and that's fine for artificial variables.

We're restricting this to blocks, as we still need to understand what it means for a 'DECL_ARTIFICIAL' to appear in a 'private' clause.

Several tests need their scan output patterns adjusted to compensate.

2022-10-14  Julian Brown  <julian@codesourcery.com>

PR middle-end/90115
gcc/
* omp-low.cc (oacc_privatization_candidate_p): Artificial vars are not privatization candidates.
libgomp/
* testsuite/libgomp.oacc-fortran/declare-1.f90: Adjust scan output.
* testsuite/libgomp.oacc-fortran/host_data-5.F90: Likewise.
* testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/print-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Likewise.

Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>

(cherry picked from commit 11e811d8e2f63667f60f73731bb934273f5882b8)
2022-10-22 | Daily bump. | GCC Administrator | 1 | -0/+8
2022-10-21 | Add 'libgomp.oacc-c-c++-common/private-big-1.c' [PR105421] | Thomas Schwinge | 1 | -0/+100
After commit r13-3404-g7c55755d4c760de326809636531478fd7419e1e5 "amdgcn: Use FLAT addressing for all functions with pointer arguments [PR105421]", "big" private data now works for GCN offloading, too.

PR target/105421
libgomp/
* testsuite/libgomp.oacc-c-c++-common/private-big-1.c: New.

(cherry picked from commit c7ebee2378426eeca425ca5406af213a926f154c)
2022-08-24 | Daily bump. | GCC Administrator | 1 | -0/+8
2022-08-23 | OpenMP: Fix var replacement with 'simd' and linear-step vars [PR106548] | Tobias Burnus | 1 | -0/+254
gcc/ChangeLog:
PR middle-end/106548
* omp-low.cc (lower_rec_input_clauses): Use build_outer_var_ref for 'simd' linear-step values that are variable.

libgomp/ChangeLog:
PR middle-end/106548
* testsuite/libgomp.c/linear-2.c: New test.

(cherry picked from commit d9c9424d2c4f7b25acfc00db0076a65882c8a99f)
2022-08-19 | Update ChangeLog and version files for release (releases/gcc-12.2.0) | Richard Biener | 1 | -0/+4
2022-08-02 | Daily bump. | GCC Administrator | 1 | -0/+9
2022-08-01 | c: Fix location for _Pragma tokens [PR97498] | Lewis Hyatt | 2 | -11/+11
The handling of #pragma GCC diagnostic uses input_location, which is not always as precise as needed; in particular the relative location of some tokens and a _Pragma directive will crucially determine whether a given diagnostic is enabled or suppressed in the desired way. PR97498 shows how the C frontend ends up with input_location pointing to the beginning of the line containing a _Pragma() directive, resulting in the wrong behavior if the diagnostic to be modified pertains to some tokens found earlier on the same line.

This patch fixes that by addressing two issues:
a) libcpp was not assigning a valid location to the CPP_PRAGMA token generated by the _Pragma directive.
b) C frontend was not setting input_location to something reasonable.

With this change, the C frontend is able to change input_location to point to the _Pragma token as needed. This is just a two-line fix (one for each of a) and b)), the testsuite changes were needed only because the location on the tested warnings has been somewhat improved, so the tests need to look for the new locations.

gcc/c/ChangeLog:
PR preprocessor/97498
* c-parser.cc (c_parser_pragma): Set input_location to the location of the pragma, rather than the start of the line.

libcpp/ChangeLog:
PR preprocessor/97498
* directives.cc (destringize_and_run): Override the location of the CPP_PRAGMA token from a _Pragma directive to the location of the expansion point, as is done for the tokens lexed from it.

gcc/testsuite/ChangeLog:
PR preprocessor/97498
* c-c++-common/pr97498.c: New test.
* c-c++-common/gomp/pragma-3.c: Adapt for improved warning locations.
* c-c++-common/gomp/pragma-5.c: Likewise.
* gcc.dg/pragma-message.c: Likewise.

libgomp/ChangeLog:
* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Adapt for improved warning locations.
* testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Likewise.

(cherry picked from commit 0587cef3d7962a8b0f44779589ba2920dd3d71e5)
2022-07-31 | Daily bump. | GCC Administrator | 1 | -0/+8
2022-07-30 | openmp: Fix up handling of non-rectangular simd loops with pointer type iterators [PR106449] | Jakub Jelinek | 1 | -0/+62
There were 2 issues visible on this new testcase. One is that we didn't have special POINTER_TYPE_P handling in a few spots of expand_omp_simd - for pointers we need to use POINTER_PLUS_EXPR and need to have the non-pointer part in sizetype, while for non-rectangular loops on the other hand we can rely on a multiplication factor of 1, as pointers can't be multiplied. Without those changes we'd ICE. The other issue was that we put the n2 expression directly into a comparison in a condition and regimplified that; for the &a[512] case, with gimplification being destructive, that unfortunately meant modification of the original fd->loops[?].n2. Fixed by unsharing the expression. This was causing a runtime failure on the testcase.

2022-07-29  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/106449
* omp-expand.cc (expand_omp_simd): Fix up handling of pointer iterators in non-rectangular simd loops. Unshare fd->loops[i].n2 or n2 before regimplifying it inside of a condition.
* testsuite/libgomp.c-c++-common/pr106449.c: New test.

(cherry picked from commit 97d32048c04e9787fccadc4bae1c042754503e34)
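
A hedged sketch of a non-rectangular simd loop with pointer iterators and an address as the upper bound, similar in shape to the &a[512] case mentioned above (not the actual testsuite/libgomp.c-c++-common/pr106449.c).

  int a[1024];

  int
  main ()
  {
    int niters = 0;
    int *p, *q;

    #pragma omp simd collapse(2) reduction(+:niters)
    for (p = a; p < &a[512]; p++)
      for (q = p; q < &a[512]; q++)   // inner init depends on the outer pointer iterator
        niters++;

    // 512 + 511 + ... + 1 iterations expected.
    return niters != 512 * 513 / 2;
  }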
2022-06-29 | Daily bump. | GCC Administrator | 1 | -0/+18
2022-06-28 | libgomp: Fix up target-31.c test [PR106045] | Jakub Jelinek | 1 | -1/+1
The i variable is used inside of the parallel in:

  #pragma omp simd safelen(32) private (v)
  for (i = 0; i < 64; i++)
    {
      v = 3 * i;
      ll[i] = u1 + v * u2[0] + u2[1] + x + y[0] + y[1] + v + h[0] + u3[i];
    }

where i is predetermined linear (so, while inside of the body, it is a safe private per-SIMD-lane var), the final value is written to the shared variable, and in:

  for (i = 0; i < 64; i++)
    if (ll[i] != u1 + 3 * i * u2[0] + u2[1] + x + y[0] + y[1] + 3 * i + 13 + 14 + i)
      #pragma omp atomic write
      err = 1;

which is a normal loop and so it isn't in any way privatized there. So we have a data race, fixed by adding a private (i) clause to the parallel.

2022-06-21  Jakub Jelinek  <jakub@redhat.com>
            Paul Iannetta  <piannetta@kalrayinc.com>

PR libgomp/106045
* testsuite/libgomp.c/target-31.c: Add private (i) clause.

(cherry picked from commit 85d613da341b76308edea48359a5dbc7061937c4)
2022-06-28 | libgomp: fix typo in mold linker detection | Martin Liska | 3 | -3/+3
libgomp/ChangeLog:
* acinclude.m4: Fix typo in mold linker detection.
* Makefile.in: Regenerate.
* configure: Regenerate.

(cherry picked from commit 6835baee7196eeb03f24f6e650c0d37cbb295647)
2022-05-31 | Daily bump. | GCC Administrator | 1 | -0/+13
2022-05-30 | libgomp: Don't define GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC for _aligned_malloc [PR105745] | Jakub Jelinek | 2 | -4/+5
since apparently _aligned_malloc requires freeing with _aligned_free and:

  /* Defined if gomp_aligned_alloc doesn't use fallback version
     and free can be used instead of gomp_aligned_free.  */
  #define GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC 1

so the second condition isn't satisfied. For uses inside of the OpenMP allocators we can still use _aligned_malloc but we need to call _aligned_free in gomp_aligned_free.

2022-05-28  Jakub Jelinek  <jakub@redhat.com>

PR libgomp/105745
* libgomp.h (GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC): Don't define for defined(HAVE__ALIGNED_MALLOC) case.
* alloc.c (gomp_aligned_alloc): Move defined(HAVE__ALIGNED_MALLOC) handling as last option before fallback instead of first.
(gomp_aligned_free): For defined(HAVE__ALIGNED_MALLOC) call _aligned_free.

(cherry picked from commit 42fd2cd932384288914174f4af7974a060972bff)
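
To illustrate the pairing requirement: a small, hypothetical aligned-allocation wrapper (not libgomp's alloc.c), assuming a CRT that provides _aligned_malloc/_aligned_free on Windows and posix_memalign elsewhere.

  #include <stdlib.h>
  #ifdef _WIN32
  # include <malloc.h>
  #endif

  static void *
  my_aligned_alloc (size_t align, size_t size)
  {
  #ifdef _WIN32
    return _aligned_malloc (size, align);   // note: size first, alignment second
  #else
    void *p = NULL;
    return posix_memalign (&p, align, size) == 0 ? p : NULL;
  #endif
  }

  static void
  my_aligned_free (void *p)
  {
  #ifdef _WIN32
    _aligned_free (p);    // memory from _aligned_malloc must NOT be passed to plain free ()
  #else
    free (p);
  #endif
  }

  int
  main ()
  {
    void *p = my_aligned_alloc (64, 256);
    my_aligned_free (p);
    return p == NULL;
  }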
2022-05-18 | Daily bump. | GCC Administrator | 1 | -0/+8
2022-05-17 | libgomp: Clarify that omp_display_env is fully implemented | Jakub Jelinek | 1 | -2/+1
OpenMP 5.2 added the "When called from within a target region the effect is unspecified." restriction to omp_display_env, so it is ok not to support it in target regions (worst case we could add an empty implementation or one with __builtin_trap in there).

2022-05-17  Jakub Jelinek  <jakub@redhat.com>

* libgomp.texi (OpenMP 5.1): Remove "Not inside target regions" comment for omp_display_env feature.

(cherry picked from commit 741478ed3ed51658af476bccbc198da931e41d44)
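
For reference, the routine itself is trivial to use from host code; the argument selects terse versus verbose output.

  #include <omp.h>

  int
  main ()
  {
    omp_display_env (0);   // 0: standard ICVs only; nonzero: may also show vendor-specific ones
    return 0;
  }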
2022-05-06 | Update ChangeLog and version files for release (releases/gcc-12.1.0) | Jakub Jelinek | 1 | -0/+4
2022-04-29 | Daily bump. | GCC Administrator | 1 | -0/+7
2022-04-28 | Fix up 'libgomp.oacc-fortran/print-1.f90' GCN offloading compilation [PR104717] | Thomas Schwinge | 1 | -1/+29
That got broken by recent commit b2202431910e30d8505c94d1cb9341cac7080d10 "fortran: Fix up gfc_trans_oacc_construct [PR104717]".

PR fortran/104717
libgomp/
* testsuite/libgomp.oacc-fortran/print-1.f90: Add OpenACC privatization scanning. For GCN offloading compilation, raise '-mgang-private-size'.
2022-04-27 | Daily bump. | GCC Administrator | 1 | -0/+13
2022-04-26 | libgomp: Fix up two non-GOMP_USE_ALIGNED_WORK_SHARES related issues [PR105358] | Jakub Jelinek | 4 | -7/+18
Last fall I changed struct gomp_work_share so that it doesn't have the __attribute__((aligned (64))) lock member in the middle unless the target has a non-emulated aligned allocator; otherwise it just makes sure the first and second halves are 64 bytes apart for cache line reasons, but doesn't make the struct itself 64-byte aligned, so we can use normal allocators for it.

When the struct isn't 64-byte aligned, the amount of tail padding significantly decreases, to 0 or 4 bytes or so. The library uses that tail padding for the ordered_teams_ids array (array of uints) and/or the memory for lastprivate conditional temporaries (the latter wants to guarantee long long alignment) when they fit there.

The problem with it on ia32 darwin9 is that while the struct contains long long members, long long is just 4 byte aligned while __alignof__(long long) is 8. That causes problems in gomp_init_work_share, where we currently rely on the assumption that if offsetof (struct gomp_work_share, inline_ordered_team_ids) is long long aligned, then that tail array will be aligned at runtime and so no extra memory for dynamic realignment will be needed (which is false when the whole struct doesn't have long long alignment). The remaining hunks deal with another problem, where we compute INLINE_ORDERED_TEAM_IDS_OFF as the above offsetof aligned up to a long long boundary and then compute sizeof (struct gomp_work_share) minus INLINE_ORDERED_TEAM_IDS_OFF. When unlucky, the former isn't a multiple of 8 and the latter is 4 bigger than that, and as the subtraction is done in size_t, we end up with (size_t) -4, so the comparison doesn't really work.

The fixes add additional conditions to make it work properly, but all of them should be evaluated at compile time when optimizing and so shouldn't slow anything down.

2022-04-26  Jakub Jelinek  <jakub@redhat.com>

PR libgomp/105358
* work.c (gomp_init_work_share): Don't mask of adjustment for dynamic long long realignment if struct gomp_work_share has smaller alignof than long long.
* loop.c (GOMP_loop_start): Don't use inline_ordered_team_ids if struct gomp_work_share has smaller alignof than long long or if sizeof (struct gomp_work_share) is smaller than INLINE_ORDERED_TEAM_IDS_OFF.
* loop_ull.c (GOMP_loop_ull_start): Likewise.
* sections.c (GOMP_sections2_start): Likewise.
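
A small sketch of the ABI quirk the extra conditions guard against (hypothetical struct, not the real gomp_work_share): on ia32, a long long member inside a struct can end up at a 4-byte-aligned offset even though __alignof__ (long long) is 8, so an offsetof-based alignment check alone isn't sufficient.

  #include <stddef.h>
  #include <stdio.h>

  struct ws_like {
    int flags;
    long long ll;   // on ia32 this typically sits at offset 4, not 8
  };

  int
  main ()
  {
    printf ("offsetof (ws_like, ll) = %zu, __alignof__ (long long) = %zu, __alignof__ (ws_like) = %zu\n",
            offsetof (struct ws_like, ll),
            (size_t) __alignof__ (long long),
            (size_t) __alignof__ (struct ws_like));
    return 0;
  }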