Age | Commit message (Collapse) | Author | Files | Lines |
|
Merge up to r12-9058-ge1357577e6e39430869e294f94c2c547717b960f (23rd Jan 2023)
|
|
This avoids generating illegal (strict_low_part (reg ...)) RTXs. This
required two changes:
1. Do not use gen_lowpart to generate the inner expression of a
STRICT_LOW_PART. gen_lowpart might fold the SUBREG either because
there is already a paradoxical subreg or because it can directly be
applied to the register. A new wrapper function makes sure that we
always end up having an actual SUBREG.
2. Change the movstrict patterns to enforce a SUBREG as inner operand
of the STRICT_LOW_PARTs. The new predicate introduced for the
destination operand requires a SUBREG expression with a
register_operand as inner operand. However, since reload strips away
the majority of the SUBREGs we have to accept single registers as well
once we reach reload.
Bootstrapped and regression tested on IBM zSystems 64 bit.
gcc/ChangeLog:
PR target/106101
* config/s390/predicates.md (subreg_register_operand): New
predicate.
* config/s390/s390-protos.h (s390_gen_lowpart_subreg): New
function prototype.
* config/s390/s390.cc (s390_gen_lowpart_subreg): New function.
(s390_expand_insv): Use s390_gen_lowpart_subreg instead of
gen_lowpart.
* config/s390/s390.md ("*get_tp_64", "*zero_extendhisi2_31")
("*zero_extendqisi2_31", "*zero_extendqihi2_31"): Likewise.
("movstrictqi", "movstricthi", "movstrictsi"): Use the
subreg_register_operand predicate instead of register_operand.
gcc/testsuite/ChangeLog:
PR target/106101
* gcc.c-torture/compile/pr106101.c: New test.
(cherry picked from commit 585a21bab3ec688c2039bff2922cc372d8558283)
|
|
|
|
|
|
PR fortran/106731
gcc/fortran/ChangeLog:
* trans-array.cc (gfc_trans_auto_array_allocation): Remove gcc_assert (!TREE_STATIC()).
gcc/testsuite/ChangeLog:
* gfortran.dg/pr106731.f90: New test.
|
|
|
|
The function attributes 'constructor', 'destructor', and 'init_priority' now
work, as do the C++ features making use of this. Test cases with effective
target 'global_constructor' and 'init_priority' now generally work, and
'check-gcc-c++' test results greatly improve; no more "sorry, unimplemented:
global constructors not supported on this target".
This depends on <https://github.com/MentorEmbedded/nvptx-tools/pull/40> "'nm'"
generally, and for global destructors support: newlib
<https://inbox.sourceware.org/newlib/878rjqaku5.fsf@dem-tschwing-1.ger.mentorg.com/>
"nvptx: Implement '_exit' instead of 'exit'".
gcc/
* collect2.cc (write_c_file_glob): Allow for
'COLLECT2_MAIN_REFERENCE' override.
* config.gcc <case ${target} in nvptx-*>: Set 'use_collect2=yes'.
* config/nvptx/nvptx.h: Adjust.
gcc/testsuite/
* gcc.dg/no_profile_instrument_function-attr-1.c: GCC/nvptx is
'NO_DOT_IN_LABEL' but not 'NO_DOLLAR_IN_LABEL', so '$' may apper
in identifiers.
* lib/target-supports.exp
(check_effective_target_global_constructor): Enable for nvptx.
libgcc/
* config.host <case ${host} in nvptx-*>: Add 'crtbegin.o',
'crtend.o' to 'extra_parts'.
* config/nvptx/crt0.c: Invoke '__do_global_ctors',
'__do_global_dtors'.
* config/nvptx/crtstuff.c: New.
* config/nvptx/t-nvptx: Adjust.
|
|
'__nvptx_uni'
As I have reported to Nvidia in 2022-12-01 'NVIDIA Incident Report (3891704):
ptxas: Duplicate declaration error: "cannot be resolved by a '.static'"',
'ptxas' has an inscrutable error mode for duplicate declarations:
ptxas softstack-decl-1.o, line 11; error : '.extern' variable '__nvptx_stacks' cannot be resolved by a '.static'
ptxas fatal : Ptx assembly aborted due to errors
nvptx-as: ptxas returned 255 exit status
ptxas uniform-simt-decl-1.o, line 12; error : '.extern' variable '__nvptx_uni' cannot be resolved by a '.static'
ptxas fatal : Ptx assembly aborted due to errors
nvptx-as: ptxas returned 255 exit status
This is inscrutable, because (a) what is "cannot be resolved by a '.static'"
supposed to tell me (there is no '.static' in PTX?), and (b) why arent't
repeated declaration just verified to match the first, but otherwise a no-op
(like in other programming languages)?
gcc/
* config/nvptx/nvptx.cc (nvptx_assemble_undefined_decl): Notice
'__nvptx_stacks', '__nvptx_uni' declarations.
(nvptx_file_end): Don't emit duplicate declarations for those.
gcc/testsuite/
* gcc.target/nvptx/softstack-decl-1.c: Make 'dg-do assemble',
adjust.
* gcc.target/nvptx/uniform-simt-decl-1.c: Likewise.
|
|
'gcc.target/nvptx/uniform-simt-decl-1.c'
... to document the status quo re implicit (via 'need_softstack_decl',
'need_unisimt_decl') and explicit declarations of '__nvptx_stacks',
'__nvptx_uni'.
gcc/testsuite/
* gcc.target/nvptx/softstack-decl-1.c: New.
* gcc.target/nvptx/uniform-simt-decl-1.c: Likewise.
|
|
For example, this allows for '-muniform-simt' code to be executed
single-threaded, which currently fails (device-side 'trap'), as the 0xffffffff
mask isn't correct if not all 32 threads of a warp are active. The same
issue/fix, I suppose but have not verified, would apply if we were to allow for
OpenACC 'vector_length' smaller than 32, for example for OpenACC 'serial'.
We use 'nvptx_uniform_warp_check' only for PTX ISA version less than 6.0.
Otherwise we're using 'nvptx_warpsync', which emits 'bar.warp.sync 0xffffffff',
which evidently appears to do the right thing. (I've tested '-muniform-simt'
code executing single-threaded.)
gcc/
* config/nvptx/nvptx.md (nvptx_uniform_warp_check): Make fit for
non-full-warp execution.
gcc/testsuite/
* gcc.target/nvptx/nvptx.exp
(check_effective_target_default_ptx_isa_version_at_least_6_0):
New.
* gcc.target/nvptx/uniform-simt-5.c: New.
libgomp/
* plugin/plugin-nvptx.c (nvptx_exec): Assert what we know about
'blockDimX'.
|
|
|
|
Merge up to r12-9052-g61ef24af3ce8ec9c5eb65770f8047d98f42a93bf (19th Jan 2023)
|
|
expand_omp_for_init_counts was using for the case where collapse(2)
inner loop has init expression dependent on non-constant multiple of
the outer iterator and the condition upper bound expression doesn't
depend on the outer iterator fold_unary (NEGATE_EXPR, ...). This
will just return NULL if it can't be folded, we need fold_build1
instead.
2023-01-19 Jakub Jelinek <jakub@redhat.com>
PR middle-end/108459
* omp-expand.cc (expand_omp_for_init_counts): Use fold_build1 rather
than fold_unary for NEGATE_EXPR.
* testsuite/libgomp.c/pr108459.c: New test.
(cherry picked from commit 46644ec99cb355845b23bb1d02775c057ed8ee88)
|
|
|
|
|
|
|
|
gcc/fortran/ChangeLog:
PR fortran/107706
* openmp.cc (gfc_resolve_omp_assumptions): Reject nonscalars.
gcc/testsuite/ChangeLog:
PR fortran/107706
* gfortran.dg/gomp/assume-2.f90: Update dg-error.
* gfortran.dg/gomp/assumes-2.f90: Likewise.
* gfortran.dg/gomp/assume-5.f90: New test.
(cherry picked from commit 2ce55247a8bf32985a96ed63a7a92d36746723dc)
|
|
Merge up to r12-9046-gd369eb486bdc720e4c50563226dbbb11a0226b5d (16th Jan 2023)
|
|
|
|
|
|
|
|
|
|
|
|
The handling of bitfields by the SRA pass is peculiar and this must be taken
into account to support the scalar_storage_order attribute.
gcc/
PR tree-optimization/108199
* tree-sra.cc (sra_modify_expr): Deal with reverse storage order
for bit-field references.
gcc/testsuite/
* gcc.dg/sso-17.c: New test.
|
|
PR tree-optimization/108137
gcc/ChangeLog:
* tree-ssa-strlen.cc (get_range_strlen_phi): Reject anything
different from INTEGER_CST.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr108137.c: New test.
(cherry picked from commit ee6f262b87fef590729e96e999f1c3b207c251c0)
|
|
|
|
In the M-Class Arm-ARM:
https://developer.arm.com/documentation/ddi0553/bu/?lang=en
these MVE instructions only have '!' writeback variant and at:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107714
we found that the Um constraint would also allow through a
register offset writeback, resulting in an assembler error.
Here I have added a new constraint and predicate for these
instructions, which (uniquely, AFAICT), only support a `!` writeback
increment by the data size (inside the compiler this is a POST_INC).
No regressions in arm-none-eabi with MVE and MVE.FP.
gcc/ChangeLog:
PR target/107714
* config/arm/arm-protos.h (mve_struct_mem_operand): New protoype.
* config/arm/arm.cc (mve_struct_mem_operand): New function.
* config/arm/constraints.md (Ug): New constraint.
* config/arm/mve.md (mve_vst4q<mode>): Change constraint.
(mve_vst2q<mode>): Likewise.
(mve_vld4q<mode>): Likewise.
(mve_vld2q<mode>): Likewise.
* config/arm/predicates.md (mve_struct_operand): New predicate.
gcc/testsuite/ChangeLog:
PR target/107714
* gcc.target/arm/mve/intrinsics/vldst24q_reg_offset.c: New test.
(cherry picked from commit 4269a6567eb991e6838f40bda5be9e3a7972530c)
|
|
In this PR we ICE when expanding the __rbit builtin with a NULL target rtx.
I *think* that only happens when the result is unused and hence maybe we shouldn't be expanding
any RTL at all, but the ICE here is easily fixed by deriving the mode from the type of the expression
rather than the target.
This patch does that.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
PR target/108140
* config/aarch64/aarch64-builtins.cc
(aarch64_expand_builtin_data_intrinsic): Handle NULL target.
gcc/testsuite/ChangeLog:
PR target/108140
* gcc.target/aarch64/acle/pr108140.c: New test.
(cherry picked from commit 98756bcbe27647f263f2b312d1d933d70cf56ba9)
|
|
|
|
The comment in the loop says that we shouldn't add a map clause if such
a clause exists already, but the loop was actually using OMP_CLAUSE_DECL
on any clause. Target construct can have various clauses which don't
have OMP_CLAUSE_DECL at all (e.g. nowait, device or if) or clause
where it means something different (e.g. privatization clauses, allocate,
depend).
So, only check OMP_CLAUSE_DECL on OMP_CLAUSE_MAP clauses.
2023-01-05 Jakub Jelinek <jakub@redhat.com>
PR c++/108286
* semantics.cc (finish_omp_target_clauses): Ignore clauses other than
OMP_CLAUSE_MAP.
* testsuite/libgomp.c++/pr108286.C: New test.
(cherry picked from commit 29c3218618ef6177dc33871b26c8fbd9b21eabe1)
|
|
Merge up to r12-9034-g4494965932fc5d005e2482bbe58cf9e138c830bd (9th Jan 2023)
|
|
|
|
|
|
|
|
|
|
As PR106736 shows, it's unexpected to use __vector_quad and
__vector_pair types without MMA support, it would cause ICE
when expanding the corresponding assignment. We can't guard
these built-in types registering under MMA support as Peter
pointed out in that PR, because the registering is global,
it doesn't work for target pragma/attribute support with MMA
enabled. The existing verify_type_context mentioned in [2]
can help to make the diagnostics invalid built-in type uses
better, but as Richard pointed out in [4], it can't deal with
all cases. As the discussions in [1][3], this patch is to
check the invalid use of built-in types __vector_quad and
__vector_pair in mov pattern of OOmode and XOmode, on the
currently being expanded gimple assignment statement. It
still puts an assertion in else arm rather than just makes
it go through, it's to ensure we can catch any other possible
unexpected cases in time if there are.
[1] https://gcc.gnu.org/pipermail/gcc/2022-December/240218.html
[2] https://gcc.gnu.org/pipermail/gcc/2022-December/240220.html
[3] https://gcc.gnu.org/pipermail/gcc/2022-December/240223.html
[4] https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608083.html
PR target/106736
gcc/ChangeLog:
* config/rs6000/mma.md (define_expand movoo): Call function
rs6000_opaque_type_invalid_use_p to check and emit error message for
the invalid use of opaque type.
(define_expand movxo): Likewise.
* config/rs6000/rs6000-protos.h
(rs6000_opaque_type_invalid_use_p): New function declaration.
(currently_expanding_gimple_stmt): New extern declaration.
* config/rs6000/rs6000.cc (rs6000_opaque_type_invalid_use_p): New
function.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/pr106736-1.c: New test.
* gcc.target/powerpc/pr106736-2.c: Likewise.
* gcc.target/powerpc/pr106736-3.c: Likewise.
* gcc.target/powerpc/pr106736-4.c: Likewise.
* gcc.target/powerpc/pr106736-5.c: Likewise.
|
|
|
|
We typically ignore mark_used failure when in a non-SFINAE context for
sake of better error recovery. But in mark_single_function we're
instead ignoring mark_used failure in a SFINAE context, which ends up
causing the second static_assert here to incorrectly fail.
PR c++/108282
gcc/cp/ChangeLog:
* decl2.cc (mark_single_function): Ignore mark_used failure
only in a non-SFINAE context rather than in a SFINAE one.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-requires34.C: New test.
(cherry picked from commit 238e292cf5d822f3bd12d9b58eb04cf377758b2a)
|
|
SIMD clones are created during the IPA phase when it is not known whether
or not the vectorizer can use them. Clones for functions with external
linkage are part of the ABI, but local clones can be GC'ed if no calls are
found in the compilation unit after vectorization.
gcc/ChangeLog
* cgraph.h (struct cgraph_node): Add gc_candidate bit, modify
default constructor to initialize it.
* cgraphunit.cc (expand_all_functions): Save gc_candidate functions
for last and iterate to handle recursive calls. Delete leftover
candidates at the end.
* omp-simd-clone.cc (simd_clone_create): Set gc_candidate bit
on local clones.
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Clear
gc_candidate bit when a clone is used.
gcc/testsuite/ChangeLog
* g++.dg/gomp/target-simd-clone-1.C: Tweak to test
that the unused clone is GC'ed.
* gcc.dg/gomp/target-simd-clone-1.c: Likewise.
(cherry picked from commit 0425ae780fb2b055d985b5719af5edfaaad5e980)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
gcc/fortran/ChangeLog:
PR fortran/108131
* array.cc (match_array_element_spec): Avoid too early simplification
of matched array element specs that can lead to a misinterpretation
when used as array bounds in array declarations.
gcc/testsuite/ChangeLog:
PR fortran/108131
* gfortran.dg/pr103505.f90: Adjust expected patterns.
* gfortran.dg/pr108131.f90: New test.
(cherry picked from commit 6a95f0e0a06d78d94138d4c3dd64d41591197281)
|
|
|
|
Merge up to r12-9015-ga3fbfc1027e9edcd14bb290b5702504d80d9e8fe (27th Dec 2022)
|
|
|