Age | Commit message (Collapse) | Author | Files | Lines |
|
This documents that GDC 9.4 or later is required to build the D
language rather than GDC 9.1 which suffers from PR94240.
PR d/104749
* doc/install.texi (GDC): Document GDC 9.4 or later is required
to build the D language frontend.
(cherry picked from commit 05b7cf52e1b640271900894a894da2d27ef90092)
|
|
For Alderlake there is similar issue like PR81616, enable
avoid_fma256_chain will also benefit on Intel latest platforms
Alderlake and Sapphire Rapids.
gcc/ChangeLog:
* config/i386/x86-tune.def (X86_TUNE_AVOID_256FMA_CHAINS): Add
m_SAPPHIRERAPIDS, m_ALDERLAKE.
|
|
|
|
Merge up to r12-8998-gcdc1a14be1182874ccf1ceb27ee5b67c5ce8c62d (19th Dec 2022)
|
|
Here during partial instantiation of the constexpr if, extra_local_specs
walks the statement looking for local specializations within to capture.
However, we're thwarted by the fact that 'ts' first appears inside an
unevaluated context, and so the calls to process_outer_var_ref for its
local specializations are a no-op. And since we walk each tree exactly
once, we end up not capturing the local specializations despite 'ts'
later occurring in an evaluated context.
This patch fixes this by making extract_local_specs walk evaluated
contexts first before walking unevaluated contexts. We could probably
get away with not walking unevaluated contexts at all, but this approach
seems more clearly safe.
PR c++/100295
PR c++/107579
gcc/cp/ChangeLog:
* pt.cc (el_data::skip_unevaluated_operands): New data member.
(extract_locals_r): If skip_unevaluated_operands is true,
don't walk into unevaluated contexts.
(extract_local_specs): Walk the pattern twice, first with
skip_unevaluated_operands true followed by it set to false.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1z/constexpr-if-lambda5.C: New test.
(cherry picked from commit 18499b9f848707aee42d810e99ac0a4c9788433c)
|
|
Here we're triggering an overzealous assert in unify during partial
ordering since the member function pointer constants are represented as
ordinary CONSTRUCTORs (with TYPE_PTRMEMFUNC_P TREE_TYPE) but the assert
expects COMPOUND_LITERAL_P constructors.
PR c++/108104
gcc/cp/ChangeLog:
* pt.cc (unify) <default>: Relax assert to accept any
CONSTRUCTOR parm, not just COMPOUND_LITERAL_P one.
gcc/testsuite/ChangeLog:
* g++.dg/template/ptrmem33.C: New test.
(cherry picked from commit 38304846d18d6bb14b0fd6c627c5c6d43a814d01)
|
|
Here find_parameter_packs_r isn't detecting the pack T inside the
requires-expr's parameter list ultimately because cp_walk_trees
deliberately avoids walking the list so as to avoid false positives in
the unexpanded pack checker.
But it should still be fine to walk the TREE_TYPE of each parameter,
which we already need to do from for_each_template_parm_r, and is
sufficient to fix the testcase.
PR c++/107417
gcc/cp/ChangeLog:
* pt.cc (for_each_template_parm_r) <case REQUIRES_EXPR>: Move
walking of the TREE_TYPE of each parameter to ...
* tree.cc (cp_walk_subtrees) <case REQUIRES_EXPR>: ... here.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-requires33.C: New test.
(cherry picked from commit 079add3ad39d6620d34665dd9c26c21951eb657c)
|
|
We implement class-scope using enum by injecting clones of the enum's
CONST_DECLs as fields of the class, for which CONST_DECL_USING_P is
true, so that qualified lookup naturally finds the enumerators.
Substitution into such a CONST_DECL currently ICEs however, because we
assume the DECL_CONTEXT is always the ENUMERAL_TYPE (which has
TYPE_VALUES) but in this case it's the RECORD_TYPE for the class scope
(which has TYPE_FIELDS).
Since these CONST_DECLs appear to always be non-dependent, this patch
fixes this by shortcutting substitution for CONST_DECLs that have
non-dependent DECL_CONTEXT. This subsumes the existing (and seemingly
dead) DECL_NAMESPACE_SCOPE_P early exit test and also benefits
substitution into ordinary non-dependent CONST_DECLs.
PR c++/103081
gcc/cp/ChangeLog:
* pt.cc (tsubst_copy) <case CONST_DECL>: Generalize
early exit test for namespace-scope decls to check dependence of
the enclosing scope instead. Remove dead early exit test.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/using-enum-10.C: New test.
* g++.dg/cpp2a/using-enum-10a.C: New test.
(cherry picked from commit b3912122c9dfaba6c8229e8f095885f69782ceda)
|
|
In a SFINAE context composite_pointer_type returns error_mark_node if
the given pointer types are incompatible. But the SPACESHIP_EXPR case
of cp_build_binary_op wasn't prepared for this error_mark_node result,
which led to an ICE (from spaceship_comp_cat) for the below testcase.
(In a non-SFINAE context composite_pointer_type issues a permerror and
returns cv void* in this case, so this ICE seems specific to SFINAE.)
PR c++/107542
gcc/cp/ChangeLog:
* typeck.cc (cp_build_binary_op): In the SPACESHIP_EXPR case,
handle an error_mark_node result type.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/spaceship-sfinae2.C: New test.
(cherry picked from commit 000e9863120cbc75a0f8d497264519974c97669f)
|
|
Here we crash because check_function_format was using TREE_PURPOSE
directly rather than using get_attribute_name.
PR c/98487
gcc/c-family/ChangeLog:
* c-format.cc (check_function_format): Use get_attribute_name.
gcc/testsuite/ChangeLog:
* c-c++-common/Wsuggest-attribute-1.c: New test.
(cherry picked from commit 68e51bd0a85794cd437d3e740357dfef84dc560d)
|
|
|
|
gcc/fortran/ChangeLog:
PR fortran/107872
* resolve.cc (derived_inaccessible): Skip over allocatable components
to prevent an infinite loop.
gcc/testsuite/ChangeLog:
PR fortran/107872
* gfortran.dg/pr107872.f90: New test.
(cherry picked from commit 01254aa2eb766c7584fd047568d7277d4d65d067)
|
|
|
|
|
|
Fix up some USM corner cases.
libgomp/ChangeLog:
* libgomp.h (OFFSET_USM): New macro.
* target.c (gomp_map_pointer): Handle USM mappings.
(gomp_map_val): Handle OFFSET_USM.
(gomp_map_vars_internal): Move USM check earlier, and use OFFSET_USM.
Add OFFSET_USM check to the second mapping pass.
* testsuite/libgomp.fortran/usm-1.f90: New test.
* testsuite/libgomp.fortran/usm-2.f90: New test.
|
|
|
|
Currently patchable area is at the wrong place on AArch64. It is placed
immediately after function label, before .cfi_startproc. This patch
adds UNSPECV_PATCHABLE_AREA for pseudo patchable area instruction and
modifies aarch64_print_patchable_function_entry to avoid placing
patchable area before .cfi_startproc.
gcc/
PR target/98776
* config/aarch64/aarch64-protos.h (aarch64_output_patchable_area):
Declared.
* config/aarch64/aarch64.cc (aarch64_print_patchable_function_entry):
Emit an UNSPECV_PATCHABLE_AREA pseudo instruction.
(aarch64_output_patchable_area): New.
* config/aarch64/aarch64.md (UNSPECV_PATCHABLE_AREA): New.
(patchable_area): Define.
gcc/testsuite/
PR target/98776
* gcc.target/aarch64/pr98776.c: New.
* gcc.target/aarch64/pr92424-2.c: Adjust pattern.
* gcc.target/aarch64/pr92424-3.c: Adjust pattern.
|
|
|
|
The number of elements gets stored in _M_capacity so use a separate
variable for the number of bytes to allocate.
libstdc++-v3/ChangeLog:
PR libstdc++/108097
* include/std/stacktrace (basic_stracktrace::_Impl): Do not
multiply N by sizeof(value_type) when allocating.
(cherry picked from commit 881c6cabce5d0b27285ed41bd6dabdf48848cce7)
|
|
|
|
D Runtime changes:
- Fix MIPS64 bindings for CRuntime_UClibc.
Phobos changes:
- Fix std.path.expandTilde erroneously raising onOutOfMemory
after failed call to getpwnam_r().
- Fix std.random unittest failures on ILP32 targets.
- Use GENERIC_IO on CRuntime_UClibc port of std.stdio.
libphobos/ChangeLog:
* libdruntime/core/stdc/fenv.d: Compile in MIPS uClibc bindings on
MIPS_Any targets.
* libdruntime/core/stdc/math.d: Likewise.
* libdruntime/core/sys/posix/dlfcn.d: Likewise.
* libdruntime/core/sys/posix/setjmp.d: Add MIPS64 definitions for
CRuntime_UClibc.
* libdruntime/core/sys/posix/sys/types.d: Likewise.
* src/std/path.d (expandTilde): Handle more errno codes that could be
left set by getpwnam_r.
* src/std/random.d: Use D_LP64 in unittests.
* src/std/stdio.d: Set CRuntime_UClibc as GENERIC_IO target.
|
|
This patch fixes a type confusion bug in varasm.cc:assemble_variable.
The problem is that the current code calls:
sect = get_variable_section (decl, false);
and then accesses sect->named.name without checking whether the section
is in fact a named section. In the surrounding else clause, we only know
that SECTION_STYLE (sect) != SECTION_NOSWITCH, so it is possible that
the section is an unnamed section.
In practice, this means that we end up doing a wild string compare
between a function pointer and the string literal ".vtable_map_vars".
This is because sect->named.name aliases sect->unnamed.callback in the
section union.
This can be seen in GDB with a simple testcase such as "int x;".
This patch fixes the issue by checking the SECTION_STYLE of the section
is in fact SECTION_NAMED before trying to do the string comparison.
We drop the existing check of whether sect->named.name is non-NULL
because this should presumably always be the case for a named section.
gcc/ChangeLog:
* varasm.cc (assemble_variable): Fix type confusion bug when
checking for ".vtable_map_vars" section.
(cherry picked from commit de144fdab17dbbb64ccb540056ab78b4ffb3fbbc)
|
|
This patch handles Unified Shared Memory (USM) in the OpenMP runtime routine
omp_target_is_accessible.
libgomp/ChangeLog:
* target.c (omp_target_is_accessible): Handle unified shared memory.
* testsuite/libgomp.c-c++-common/target-is-accessible-1.c: Updated.
* testsuite/libgomp.fortran/target-is-accessible-1.f90: Updated.
* testsuite/libgomp.c-c++-common/target-is-accessible-2.c: New test.
* testsuite/libgomp.fortran/target-is-accessible-2.f90: New test.
|
|
|
|
Sometimes, nested lambdas of templated functions get no code generation
due to them being marked as instantianted outside of all modules being
compiled in the current compilation unit. This despite enclosing
template instances being marked as instantiated inside the current
compilation unit. To fix, all enclosing templates are now checked in
`function_defined_in_root_p'.
Because of this change, `function_needs_inline_definition_p' has also
been fixed up to only check whether the regular function definition
itself is to be emitted in the current compilation unit.
PR d/108055
gcc/d/ChangeLog:
* decl.cc (function_defined_in_root_p): Check all enclosing template
instances for definition in a root module.
(function_needs_inline_definition_p): Replace call to
function_defined_in_root_p with test for outer module `isRoot'.
gcc/testsuite/ChangeLog:
* gdc.dg/torture/imports/pr108055conv.d: New.
* gdc.dg/torture/imports/pr108055spec.d: New.
* gdc.dg/torture/imports/pr108055write.d: New.
* gdc.dg/torture/pr108055.d: New test.
(cherry picked from commit 9fe7d3debbf60ed9fef8053123ad542a99d62100)
|
|
Merge up to r12-8979-g1a8af012222a8386fcda16a61dc17f11ba9cfbfd (12th Dec 2022)
|
|
The following avoids ICEing with a mismatched prototype for alloca
and -Walloca-larger-than using irange for checks which doesn't
like mismatched types.
PR tree-optimization/107898
* gimple-ssa-warn-alloca.cc (alloca_call_type): Check
the type of the alloca argument is compatible with size_t
before querying ranges.
(cherry picked from commit 9948daa4fd0f0ea0a9d56c2fefe1bca478554d27)
|
|
The following makes sure to clear loops number of iterations when
outlining them as part of a SESE region as can happen with
auto-parallelization. The referenced SSA names become stale otherwise.
PR tree-optimization/107865
* tree-cfg.cc (move_sese_region_to_fn): Free the number of
iterations of moved loops.
* gfortran.dg/graphite/pr107865.f90: New testcase.
(cherry picked from commit bcc2449384f2092cbdf5d6ac2357aeabe3212b2e)
|
|
The following fixes a wrong-code bug caused by loop invariant motion
hoisting an expression using an uninitialized value outside of its
controlling condition causing IVOPTs to use that to rewrite a defined
value. PR107839 is a similar case involving a bogus uninit diagnostic.
PR tree-optimization/107833
PR tree-optimization/107839
* cfghooks.cc: Include tree.h.
* tree-ssa-loop-im.cc (movement_possibility): Wrap and
make stmts using any ssa_name_maybe_undef_p operand
to preserve execution.
(loop_invariant_motion_in_fun): Call mark_ssa_maybe_undefs
to init maybe-undefined status.
* tree-ssa-loop-ivopts.cc (ssa_name_maybe_undef_p,
ssa_name_set_maybe_undef, ssa_name_any_use_dominates_bb_p,
mark_ssa_maybe_undefs): Move ...
* tree-ssa.cc: ... here.
* tree-ssa.h (ssa_name_any_use_dominates_bb_p,
mark_ssa_maybe_undefs): Declare.
(ssa_name_maybe_undef_p, ssa_name_set_maybe_undef): Define.
* gcc.dg/torture/pr107833.c: New testcase.
* gcc.dg/uninit-pr107839.c: Likewise.
(cherry picked from commit 44c8402d35160515b3c09fd2bc239587e0c32a2b)
|
|
The following propely restricts the bitfield access to integral types
when we look through VEC_UNPACK with the intent to emit a widening
conversion.
PR tree-optimization/107686
* tree-ssa-forwprop.cc (optimize_vector_load): Restrict
VEC_UNPACK support to integral typed bitfield refs.
* gcc.dg/pr107686.c: New testcase.
(cherry picked from commit 246bbdaa5f536b7a199dda9860c473137f40d622)
|
|
The following uses *node to check for FP types rather than the
child nodes which could be constant leafs and thus without a
vector type.
PR tree-optimization/107766
* tree-vect-slp-patterns.cc (complex_mul_pattern::matches):
Use *node to check for FP vector types.
* g++.dg/vect/pr107766.cc: New testcase.
(cherry picked from commit 1a06ae6f2f4f292fd05a900bcf433cb4282da1e3)
|
|
Only with -ffp-contract=fast we can synthesize FMA operations like
vfmaddsub231ps, so properly guard the transform in SLP pattern
detection.
PR tree-optimization/107647
* tree-vect-slp-patterns.cc (addsub_pattern::recognize): Only
allow FMA generation with -ffp-contract=fast for FP types.
(complex_mul_pattern::matches): Likewise.
* gcc.target/i386/pr107647.c: New testcase.
(cherry picked from commit c5df8392c5848c0462558f41cdf6eab31db301cf)
|
|
So what happens is that we elide processing of this check with
/* In addition to kills we can remove defs whose only use
is another def in defs. That can only ever be PHIs of which
we track two for simplicity reasons, the first and last in
{first,last}_phi_def (we fail for multiple PHIs anyways).
We can also ignore defs that feed only into
already visited PHIs. */
else if (single_imm_use (vdef, &use_p, &use_stmt)
&& (use_stmt == first_phi_def
|| use_stmt == last_phi_def
|| (gimple_code (use_stmt) == GIMPLE_PHI
&& bitmap_bit_p (visited,
SSA_NAME_VERSION
(PHI_RESULT (use_stmt))))))
where we have the last PHI being the outer loop virtual PHI and the first
PHI being the loop exit PHI of the outer loop and we've already processed
the single immediate use of the outer loop PHI, the inner loop PHI. But
we still have to perform the above check!
It's easiest to perform the check when we visit the PHI node instead of
delaying it to the actual processing loop.
PR tree-optimization/107407
* tree-ssa-dse.cc (dse_classify_store): Perform backedge
varying index check when collecting PHI uses rather than
after optimizing processing of the candidate defs.
* gcc.dg/torture/pr107407.c: New testcase.
(cherry picked from commit 031a400e49d8db156c43f9ec0b21ab0c2aee8c6d)
|
|
The testcase shows we mishandle the case where there's a pass-through
of a pointer through a function like memcpy. The following adjusts
handling of this copy case to require a taken address and adjust
the PHI case similarly.
PR tree-optimization/106868
* gimple-ssa-warn-access.cc (pass_waccess::gimple_call_return_arg_ref):
Inline into single user ...
(pass_waccess::check_dangling_uses): ... here and adjust the
call and the PHI case to require that ref.aref is the address
of the decl.
* gcc.dg/Wdangling-pointer-pr106868.c: New testcase.
(cherry picked from commit d492d50f644811327c5976e2c918ab6d906ed40c)
|
|
This commit enabled reverse offload for nvptx such that gomp_target_rev
actually gets called. And it fills the latter function to do all of
the following: finding the host function to the device func ptr and
copying the arguments to the host, processing the mapping/firstprivate,
calling the host function, copying back the data and freeing as needed.
The data handling is made easier by assuming that all host variables
either existed before (and are in the mapping) or that those are
devices variables not yet available on the host. Thus, the reverse
mapping can do without refcounts etc. Note that the spec disallows
inside a target region device-affecting constructs other than target
plus ancestor device-modifier and it also limits the clauses permitted
on this construct.
For the function addresses, an additional splay tree is used; for
the lookup of mapped variables, the existing splay-tree is used.
Unfortunately, its data structure requires a full walk of the tree;
Additionally, the just mapped variables are recorded in a separate
data structure an extra lookup. While the lookup is slow, assuming
that only few variables get mapped in each reverse offload construct
and that reverse offload is the exception and not performance critical,
this seems to be acceptable.
libgomp/ChangeLog:
* libgomp.h (struct target_mem_desc): Predeclare; move
below after 'reverse_splay_tree_node' and add rev_array
member.
(struct reverse_splay_tree_key_s, reverse_splay_compare): New.
(reverse_splay_tree_node, reverse_splay_tree,
reverse_splay_tree_key): New typedef.
(struct gomp_device_descr): Add mem_map_rev member.
* oacc-host.c (host_dispatch): NULL init .mem_map_rev.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices): Claim
support for GOMP_REQUIRES_REVERSE_OFFLOAD.
* splay-tree.h (splay_tree_callback_stop): New typedef; like
splay_tree_callback but returning int not void.
(splay_tree_foreach_lazy): Define; like splay_tree_foreach but
taking splay_tree_callback_stop as argument.
* splay-tree.c (splay_tree_foreach_internal_lazy,
splay_tree_foreach_lazy): New; but early exit if callback returns
nonzero.
* target.c: Instatiate splay_tree_c with splay_tree_prefix 'reverse'.
(gomp_map_lookup_rev): New.
(gomp_load_image_to_device): Handle reverse-offload function
lookup table.
(gomp_unload_image_from_device): Free devicep->mem_map_rev.
(struct gomp_splay_tree_rev_lookup_data, gomp_splay_tree_rev_lookup,
gomp_map_rev_lookup, struct cpy_data, gomp_map_cdata_lookup_int,
gomp_map_cdata_lookup): New auxiliary structs and functions for
gomp_target_rev.
(gomp_target_rev): Implement reverse offloading and its mapping.
(gomp_target_init): Init current_device.mem_map_rev.root.
* testsuite/libgomp.fortran/reverse-offload-2.f90: New test.
* testsuite/libgomp.fortran/reverse-offload-3.f90: New test.
* testsuite/libgomp.fortran/reverse-offload-4.f90: New test.
* testsuite/libgomp.fortran/reverse-offload-5.f90: New test.
* testsuite/libgomp.fortran/reverse-offload-5a.f90: New test without
mapping of on-device allocated variables.
(cherry picked from commit ea4b23d9c82d9be3b982c3519fe5e8e9d833a6a8)
|
|
gcc/fortran/ChangeLog:
* dump-parse-tree.cc (show_omp_namelist): Improve OMP_LIST_ALLOCATE
output.
* gfortran.h (struct gfc_omp_namelist): Add 'align' to 'u'.
(gfc_free_omp_namelist): Add bool arg.
* match.cc (gfc_free_omp_namelist): Likewise; free 'u.align'.
* openmp.cc (gfc_free_omp_clauses, gfc_match_omp_clause_reduction,
gfc_match_omp_flush): Update call.
(gfc_match_omp_clauses): Match 'align/allocate modifers in
'allocate' clause.
(resolve_omp_clauses): Resolve align.
* st.cc (gfc_free_statement): Update call
* trans-openmp.cc (gfc_trans_omp_clauses): Handle 'align'.
libgomp/ChangeLog:
* libgomp.texi (5.1 Impl. Status): Split allocate clause/directive
item about 'align'; mark clause as 'Y' and directive as 'N'.
* testsuite/libgomp.fortran/allocate-2.f90: New test.
* testsuite/libgomp.fortran/allocate-3.f90: New test.
(cherry picked from commit b2e1c49b4a4592f9e96ae9ece8af7d0e6527b194)
|
|
|
|
This was added by the backport of an ICE in r12-8969. While harmless,
it was not until r13-758 that "final" and "override" were introduced to
all visitor methods in the D front-end. Removing it from the release
branch just for consistency with the rest of the file.
gcc/d/ChangeLog:
* imports.cc (ImportVisitor::visit (OverloadSet *)): Remove "final"
and "override" from visitor method.
|
|
The visitor for lowering IMPORTED_DECLs did not have an override for
dealing with importing OverloadSet symbols. This has now been
implemented in the code generator.
PR d/108050
gcc/d/ChangeLog:
* decl.cc (DeclVisitor::visit (Import *)): Handle build_import_decl
returning a TREE_LIST.
* imports.cc (ImportVisitor::visit (OverloadSet *)): New override.
gcc/testsuite/ChangeLog:
* gdc.dg/imports/pr108050/mod1.d: New.
* gdc.dg/imports/pr108050/mod2.d: New.
* gdc.dg/imports/pr108050/package.d: New.
* gdc.dg/pr108050.d: New test.
(cherry picked from commit d9d8c9674ad3ad3aa38419d24b1aaaffe31f5d3f)
|
|
|
|
|
|
Similar story as PR103661, we again return a negative number
for __builtin_cpu_supports:
Documentation says:
int __builtin_cpu_supports(const char *feature)
This function returns a positive integer if the run-time CPU supports feature and returns 0 otherwise.
while we return -2147483648.
Moreover, I noticed "x86-64" is not a valid option for __builtin_cpu_is,
but for __builtin_cpu_supports.
PR target/107551
gcc/ChangeLog:
* config/i386/i386-builtins.cc (fold_builtin_cpu): Use same path
as for PR103661.
* doc/extend.texi: Fix "x86-64" use.
gcc/testsuite/ChangeLog:
* gcc.target/i386/builtin_target.c: Add more checks.
(cherry picked from commit d71b20fc30965ba8326ad9363d0aca9d61eb4ed3)
|
|
The patch removes unneeded loops for cpu_features2 and CONVERT_EXPR
that can be simplified with NOP_EXPR.
gcc/ChangeLog:
* common/config/i386/cpuinfo.h (has_cpu_feature): Directly
compute index in cpu_features2.
(set_cpu_feature): Likewise.
* config/i386/i386-builtins.cc (fold_builtin_cpu): Also remove
loop for cpu_features2 and use NOP_EXPRs.
(cherry picked from commit ef14bba0a6f3836d41d75863e6516d21aef0e936)
|
|
Merge up to r12-8964-g58791f4db575ad9f952025c5eac4cc46e5c27019 (9th Dec 2022)
|
|
|
|
|
|
|
|
omp_{gs}et_teams_thread_limit on offload devices
This patch adds support for omp_get_max_teams, omp_set_num_teams, and
omp_{gs}et_teams_thread_limit on offload devices. That includes the usage of
device-specific ICV values (specified as environment variables or changed on a
device). In order to reuse device-specific ICV values, a copy back mechanism is
implemented that copies ICV values back from device to the host.
Additionally, a limitation of the number of teams on gcn offload devices is
implemented. The number of teams is limited by twice the number of compute
units (one team is executed on one compute unit). This avoids queueing
unnessecary many teams and a corresponding allocation of large amounts of
memory. Without that limitation the memory allocation for a large number of
user-specified teams can result in an "memory access fault".
A limitation of the number of teams is already also implemented for nvptx
devices (see nvptx_adjust_launch_bounds in libgomp/plugin/plugin-nvptx.c).
gcc/ChangeLog:
* gimplify.cc (optimize_target_teams): Set initial num_teams_upper
to "-2" instead of "1" for non-existing num_teams clause in order to
disambiguate from the case of an existing num_teams clause with value 1.
libgomp/ChangeLog:
* config/gcn/icv-device.c (omp_get_teams_thread_limit): Added to
allow processing of device-specific values.
(omp_set_teams_thread_limit): Likewise.
(ialias): Likewise.
* config/nvptx/icv-device.c (omp_get_teams_thread_limit): Likewise.
(omp_set_teams_thread_limit): Likewise.
(ialias): Likewise.
* icv-device.c (omp_get_teams_thread_limit): Likewise.
(ialias): Likewise.
(omp_set_teams_thread_limit): Likewise.
* icv.c (omp_set_teams_thread_limit): Removed.
(omp_get_teams_thread_limit): Likewise.
(ialias): Likewise.
* libgomp.texi: Updated documentation for nvptx and gcn corresponding
to the limitation of the number of teams.
* plugin/plugin-gcn.c (limit_teams): New helper function that limits
the number of teams by twice the number of compute units.
(parse_target_attributes): Limit the number of teams on gcn offload
devices.
* target.c (get_gomp_offload_icvs): Added teams_thread_limit_var
handling.
(gomp_load_image_to_device): Added a size check for the ICVs struct
variable.
(gomp_copy_back_icvs): New function that is used in GOMP_target_ext to
copy back the ICV values from device to host.
(GOMP_target_ext): Update the number of teams and threads in the kernel
args also considering device-specific values.
* testsuite/libgomp.c-c++-common/icv-4.c: Fixed an error in the reading
of OMP_TEAMS_THREAD_LIMIT from the environment.
* testsuite/libgomp.c-c++-common/icv-5.c: Extended.
* testsuite/libgomp.c-c++-common/icv-6.c: Extended.
* testsuite/libgomp.c-c++-common/icv-7.c: Extended.
* testsuite/libgomp.c-c++-common/icv-9.c: New test.
* testsuite/libgomp.fortran/icv-5.f90: New test.
* testsuite/libgomp.fortran/icv-6.f90: New test.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/target-teams-1.c: Adapt expected values for
num_teams from "1" to "-2" in cases without num_teams clause.
* g++.dg/gomp/target-teams-1.C: Likewise.
* gfortran.dg/gomp/defaultmap-4.f90: Likewise.
* gfortran.dg/gomp/defaultmap-5.f90: Likewise.
* gfortran.dg/gomp/defaultmap-6.f90: Likewise.
(cherry picked from commit 81476bc4f4a20bcf3af7ac2548c2322d48499402)
|
|
Add libgomp support for 'amdgcn' as arch, and for each processor type (as passed
to '-march') as isa traits.
Add test case for all supported 'isa' values used as context selectors in a
metadirective construct.
libgomp/ChangeLog:
* config/gcn/selector.c (GOMP_evaluate_current_device): Recognise 'amdgcn'
as arch, and '-march' values (as well as 'gfx803') as isa traits.
* testsuite/libgomp.c-c++-common/metadirective-6.c: New test.
|
|
Provide a specific builtin for each possible value of '-march'.
gcc/ChangeLog:
* config/gcn/gcn-opts.h (TARGET_FIJI): -march=fiji.
(TARGET_VEGA10): -march=gfx900.
(TARGET_VEGA20): -march=gfx906.
(TARGET_GFX908): -march=gfx908.
(TARGET_GFX90a): -march=gfx90a.
* config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS): Define a builtin that
uniquely maps to '-march'.
(cherry picked from commit e41b243302e9964e642924329826448afb21d28e)
|