|
|
|
__builtin_tgmath implements <tgmath.h> semantics for integer generic
arguments that handle cases involving _FloatN / _FloatNx types as
specified in TS 18661-3 plus some defect fixes.
C2x has further changes to the semantics for <tgmath.h> macros with
such types, which should also be considered defect fixes (although
handled through the integration of TS 18661-3 in C2x rather than
through an issue tracking process). Specifically, the rules were
changed because of problems raised with using the macros with the
evaluation format types such as float_t and _Float32_t: the older
version of the rules didn't allow passing _FloatN / _FloatNx types to
the narrowing macros returning float or double, or passing float /
double / long double to the narrowing macros returning _FloatN /
_FloatNx, which was a problem with the evaluation format types which
could be either kind of type depending on the value of
FLT_EVAL_METHOD.
Thus the new rules allow cases of mixing types which were not allowed
before - which is not itself a problem for __builtin_tgmath - and, as
part of the changes, the handling of integer arguments was also
changed: if there is any _FloatNx generic argument, integer generic
arguments are treated as _Float32x (not double), while the rule about
treating integer arguments to narrowing macros returning _FloatN or
_FloatNx as _Float64 not double was removed (no longer needed now
double is a valid argument to such macros).
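For context, the baseline rule that the change builds on (an integer generic argument is treated as double when no _FloatNx argument is involved) can be observed with plain <tgmath.h>. The following is only a sketch of that baseline; it does not exercise the _FloatN / _FloatNx cases, which depend on compiler and library support, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <tgmath.h>

/* Size of the type the type-generic sin macro yields for an integer
   argument.  sizeof does not evaluate its operand, so no call is made.  */
size_t sin_result_size_for_int(void)
{
    return sizeof(sin(1));
}

/* Same query for a float argument, for comparison.  */
size_t sin_result_size_for_float(void)
{
    return sizeof(sin(1.0f));
}

/* Usage: the integer argument selects the double variant, the float
   argument selects the float variant.  */
void check_baseline_integer_rule(void)
{
    assert(sin_result_size_for_int() == sizeof(double));
    assert(sin_result_size_for_float() == sizeof(float));
}
```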
Implement the changes for __builtin_tgmath. (The changes also added a
rule that if any argument is _DecimalNx, integer arguments are treated
as _Decimal64x, but GCC doesn't support _DecimalNx types so nothing is
done about that.)
I have a corresponding glibc patch to update glibc test expectations
for C2x and also ensure that appropriate semantics are followed when
GCC 7 through 12 are used with <tgmath.h> (avoiding __builtin_tgmath
in cases where it doesn't match the C2x semantics).
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
gcc/
* doc/extend.texi (__builtin_tgmath): Do not restate standard rule
for handling real integer types.
gcc/c/
* c-parser.cc (c_parser_postfix_expression): Handle integer
generic arguments to functions passed to __builtin_tgmath as
_Float32x if any argument has _FloatNx or _Complex _FloatNx type.
Do not handle integer arguments to some narrowing functions as
_Float64.
gcc/testsuite/
* gcc.dg/builtin-tgmath-3.c: Update expectations and add more
tests.
|
|
This reverts commit 2cba118e538ba0b7582af7f9fb5ba2dfbb772f8e.
|
|
These PRs were for now fixed by reversion of the r13-4977
patch, but so that the problems don't reappear during stage 1,
I'm adding testcase coverage from those PRs.
2023-01-06 Jakub Jelinek <jakub@redhat.com>
PR target/108292
PR target/108308
* gcc.c-torture/execute/pr108292.c: New test.
* gcc.target/i386/pr108292.c: New test.
* gcc.dg/pr108308.c: New test.
|
|
|
|
ix86_expand_int_movcc to allow condition (mask) sharing"
This reverts commit d0558f420b2a5692fd38ac76ffa97ae6c1726ed9.
2023-01-05 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR rtl-optimization/108292
* config/i386/i386-expand.cc (ix86_expand_int_movcc): Revert
previous changes.
gcc/testsuite/ChangeLog
PR rtl-optimization/108292
* gcc.target/i386/cmov10.c: Remove test case.
|
|
When tentatively parsing what is really an elaborated-type-specifier
containing a template-id first as a class-specifier, we may form a
CPP_TEMPLATE_ID token that later gets reused by the fallback parse if
the tentative parse fails. These special tokens also capture the access
checks that have been deferred while parsing the template-id. But here
we form such a token when the access check state is dk_no_check, and so
the token captures no access checks. This effectively bypasses access
checking for the template-id during the subsequent parse as an
elaborated-type-specifier.
This patch fixes this by using dk_deferred instead of dk_no_check when
parsing the class name of a class-head.
PR c++/108275
gcc/cp/ChangeLog:
* parser.cc (cp_parser_class_head): Use dk_deferred instead of
dk_no_check when parsing the class name.
gcc/testsuite/ChangeLog:
* g++.dg/parse/access14.C: New test.
|
|
This patch adds two missing procedures to
gcc/m2/gm2-libs-min/M2RTS.{def,mod} required for linking. The
patch also includes test code, changes to
gcc/testsuite/lib/gm2.exp and an expect tcl script to test the
min libraries.
gcc/m2/ChangeLog:
* gm2-libs-min/M2RTS.def (ConstructModules): New procedure
declaration.
(DeconstructModules): New procedure declaration.
* gm2-libs-min/M2RTS.mod (ConstructModules): New procedure
dummy implementation.
(DeconstructModules): New procedure dummy implementation.
gcc/testsuite:
* lib/gm2.exp (gm2_init_minx): New procedure.
(gm2_init_min): New procedure calls gm2_init_min with
dialect flags.
* gm2/link/min/pass/tiny.mod: New test case.
* gm2/link/min/pass/link-min-pass.exp: New file.
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
|
|
|
|
We typically ignore mark_used failure when in a non-SFINAE context for
the sake of better error recovery. But in mark_single_function we're
instead ignoring mark_used failure in a SFINAE context, which ends up
causing the second static_assert here to incorrectly fail.
PR c++/108282
gcc/cp/ChangeLog:
* decl2.cc (mark_single_function): Ignore mark_used failure
only in a non-SFINAE context rather than in a SFINAE one.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-requires34.C: New test.
|
|
We ICE on the following testcase during error recovery: both new_parm
and old_parm are error_mark_node, and the ICE is on
error ("redefinition of default argument for %q+#D", new_parm);
inform (DECL_SOURCE_LOCATION (old_parm),
"original definition appeared here");
where we don't print anything useful for new_parm and ICE trying to
access DECL_SOURCE_LOCATION of old_parm. I think we shouldn't diagnose
anything when either of the parms is erroneous; GCC 11, before
merge_default_template_args was added, was doing
if (TREE_VEC_ELT (tmpl_parms, i) == error_mark_node
|| TREE_VEC_ELT (parms, i) == error_mark_node)
continue;
tmpl_parm = TREE_VALUE (TREE_VEC_ELT (tmpl_parms, i));
if (error_operand_p (tmpl_parm))
return false;
in redeclare_class_template.
2023-01-04 Jakub Jelinek <jakub@redhat.com>
PR c++/108206
* decl.cc (merge_default_template_args): Return false if either
new_parm or old_parm is erroneous.
* g++.dg/template/pr108206.C: New test.
|
|
The realbitscast.mod is currently failing on x86_64 and aarch64
Darwin since they do not have a 96b floating type. Disable the
type for all Darwin arches.
gcc/testsuite/ChangeLog:
* gm2/iso/pass/realbitscast.mod: Disable REAL96 on Darwin.
|
|
maybe_set_nonzero_bits calls set_nonzero_bits which asserts that
var doesn't have pointer type. While we could punt for those
cases, I think we can handle at least some easy cases.
Earlier in maybe_set_nonzero_bits we've checked this is on
(var & cst) == 0
edge and the other edge is __builtin_unreachable, so if cst
is say 3 as in the testcase, we want to turn it into 4 byte alignment
of the pointer.
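The step from the mask to an alignment can be sketched in isolation. This is not the code in tree-vrp.cc; alignment_from_zero_mask is a hypothetical helper that only shows the arithmetic: the run of set bits at the bottom of cst gives the known-zero low bits of the pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Given a mask cst for which (ptr & cst) == 0 is known to hold, return
   the power-of-two alignment implied for ptr.  Only the contiguous run
   of set bits starting at bit 0 says anything about alignment.  */
uintptr_t alignment_from_zero_mask(uintptr_t cst)
{
    uintptr_t unknown = ~cst;          /* bits not known to be zero */
    assert(unknown != 0);              /* an all-ones cst would mean ptr == 0 */
    return unknown & (~unknown + 1);   /* isolate the lowest set bit */
}

/* Usage: cst == 3 as in the testcase implies 4 byte alignment.  */
void check_alignment_from_zero_mask(void)
{
    assert(alignment_from_zero_mask(3) == 4);
    assert(alignment_from_zero_mask(7) == 8);
    assert(alignment_from_zero_mask(2) == 1);  /* bit 0 is unknown */
}
```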
2023-01-04 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/108253
* tree-vrp.cc (maybe_set_nonzero_bits): Handle var with pointer
types.
* g++.dg/opt/pr108253.C: New test.
|
|
We ICE on the following testcase, because a valid V2DImode
!= comparison is folded into an unsupported V2DImode > comparison.
The match.pd pattern which does this looks like:
/* Transform comparisons of the form (X & Y) CMP 0 to X CMP2 Z
where ~Y + 1 == pow2 and Z = ~Y. */
(for cst (VECTOR_CST INTEGER_CST)
(for cmp (eq ne)
icmp (le gt)
(simplify
(cmp (bit_and:c@2 @0 cst@1) integer_zerop)
(with { tree csts = bitmask_inv_cst_vector_p (@1); }
(if (csts && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)))
(with { auto optab = VECTOR_TYPE_P (TREE_TYPE (@1))
? optab_vector : optab_default;
tree utype = unsigned_type_for (TREE_TYPE (@1)); }
(if (target_supports_op_p (utype, icmp, optab)
|| (optimize_vectors_before_lowering_p ()
&& (!target_supports_op_p (type, cmp, optab)
|| !target_supports_op_p (type, BIT_AND_EXPR, optab))))
(if (TYPE_UNSIGNED (TREE_TYPE (@1)))
(icmp @0 { csts; })
(icmp (view_convert:utype @0) { csts; })))))))))
and that optimize_vectors_before_lowering_p () guarded stuff there
already deals with this problem, not trying to fold a supported comparison
into a non-supported one. The reason it doesn't work in this case is that
it isn't GIMPLE folding which does this, but GENERIC folding done during
forwprop4 - forward_propagate_into_comparison -> forward_propagate_into_comparison_1
-> combine_cond_expr_cond -> fold_binary_loc -> generic_simplify
and we simply assumed that GENERIC folding happens only before
gimplification.
The following patch fixes that by checking cfun properties instead of
always returning true in those cases.
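The shape of the fix can be modelled outside the compiler. The types and names below are hypothetical stand-ins, not GCC's (PROP_LOWERED_VECTORS plays the role of PROP_gimple_lvec, struct function_state the role of cfun); the sketch only contrasts an unconditional "true" with a check of the current function's properties.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC internals, for illustration only.  */
enum { PROP_LOWERED_VECTORS = 1 };

struct function_state {
    unsigned curr_properties;
};

/* Old behaviour on the GENERIC side: assume folding always happens
   before vector lowering.  */
bool before_lowering_old(const struct function_state *cfun)
{
    (void)cfun;
    return true;
}

/* New behaviour: false once the current function has the
   "vectors lowered" property set.  */
bool before_lowering_new(const struct function_state *cfun)
{
    return !(cfun && (cfun->curr_properties & PROP_LOWERED_VECTORS));
}

/* Usage: a late caller (for example folding reached from a GIMPLE pass)
   is now told that lowering has already happened.  */
void check_before_lowering(void)
{
    struct function_state early = { 0 };
    struct function_state late = { PROP_LOWERED_VECTORS };
    assert(before_lowering_new(NULL));
    assert(before_lowering_new(&early));
    assert(before_lowering_old(&late));
    assert(!before_lowering_new(&late));
}
```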
2023-01-04 Jakub Jelinek <jakub@redhat.com>
PR middle-end/108237
* generic-match-head.cc: Include tree-pass.h.
(canonicalize_math_p, optimize_vectors_before_lowering_p): Define
to false if cfun and cfun->curr_properties has PROP_gimple_opt_math
resp. PROP_gimple_lvec property set.
* gcc.c-torture/compile/pr108237.c: New test.
|
|
[PR108256]
We shouldn't narrow multiplications originally done in signed types,
because the original multiplication might overflow but the narrowed
one will be done in unsigned arithmetic and will never overflow.
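A small sketch of why the narrowing hides the overflow from the sanitizer. It assumes 32-bit int and 16-bit unsigned short, and uses the GCC/Clang builtin __builtin_mul_overflow to detect the signed overflow; the function names are made up for illustration and this is not the code in convert.cc.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* True if a * b computed in int would overflow.  */
bool int_mul_overflows(int a, int b)
{
    int discarded;
    return __builtin_mul_overflow(a, b, &discarded);
}

/* The narrowed form: multiply in unsigned arithmetic and keep the low
   16 bits.  Unsigned multiplication wraps, so nothing here can trap.  */
unsigned short narrowed_mul(int a, int b)
{
    return (unsigned short)((unsigned)a * (unsigned)b);
}

/* Usage: the original int multiplication overflows, while the narrowed
   one silently produces the low bits.  */
void check_narrowed_mul(void)
{
    assert(int_mul_overflows(INT_MAX, 2));
    assert(narrowed_mul(INT_MAX, 2) == 0xFFFE);
    assert(!int_mul_overflows(3, 4));
    assert(narrowed_mul(3, 4) == 12);
}
```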
2023-01-04 Jakub Jelinek <jakub@redhat.com>
PR sanitizer/108256
* convert.cc (do_narrow): Punt for MULT_EXPR if original
type doesn't wrap around and -fsanitize=signed-integer-overflow
is on.
* fold-const.cc (fold_unary_loc) <CASE_CONVERT>: Likewise.
* c-c++-common/ubsan/pr108256.c: New test.
|
|
|
|
C++ Modules do not work reliably on AIX. This patch disables the
modules portion of the testsuite on AIX.
IBM128 float keywords are not enabled for AIX, so skip the pr99708.c test there too.
gcc/testsuite/ChangeLog:
* g++.dg/modules/modules.exp: Skip on AIX.
* gcc.target/powerpc/pr99708.c: Skip on AIX.
|
|
SIMD clones are created during the IPA phase when it is not known whether
or not the vectorizer can use them. Clones for functions with external
linkage are part of the ABI, but local clones can be GC'ed if no calls are
found in the compilation unit after vectorization.
gcc/ChangeLog
* cgraph.h (struct cgraph_node): Add gc_candidate bit, modify
default constructor to initialize it.
* cgraphunit.cc (expand_all_functions): Save gc_candidate functions
for last and iterate to handle recursive calls. Delete leftover
candidates at the end.
* omp-simd-clone.cc (simd_clone_create): Set gc_candidate bit
on local clones.
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Clear
gc_candidate bit when a clone is used.
gcc/testsuite/ChangeLog
* g++.dg/gomp/target-simd-clone-1.C: Tweak to test
that the unused clone is GC'ed.
* gcc.dg/gomp/target-simd-clone-1.c: Likewise.
|
|
This patch modifies the way that ix86_expand_int_movcc generates RTL,
to allow the condition mask to be shared/reused between multiple
conditional move sequences. Such redundancy is common when RTL
if-conversion transforms non-trivial basic blocks.
As a motivating example, consider the new test case:
int a, b, c, d;
int foo(int x)
{
if (x == 0) {
a = 3;
b = 1;
c = 4;
d = 1;
} else {
a = 5;
b = 9;
c = 2;
d = 7;
}
return x;
}
This is currently compiled, with -O2, to:
foo: cmpl $1, %edi
movl %edi, %eax
sbbl %edi, %edi
andl $-2, %edi
addl $5, %edi
cmpl $1, %eax
sbbl %esi, %esi
movl %edi, a(%rip)
andl $-8, %esi
addl $9, %esi
cmpl $1, %eax
sbbl %ecx, %ecx
movl %esi, b(%rip)
andl $2, %ecx
addl $2, %ecx
cmpl $1, %eax
sbbl %edx, %edx
movl %ecx, c(%rip)
andl $-6, %edx
addl $7, %edx
movl %edx, d(%rip)
ret
Notice that the if-then-else blocks have been if-converted into four
conditional move sequences/assignments, each consisting of cmpl, sbbl,
andl and addl. However, as the conditions are the same, the cmpl and
sbbl instructions used to generate the mask could be shared by CSE.
This patch enables that so that we now generate:
foo: cmpl $1, %edi
movl %edi, %eax
sbbl %edx, %edx
movl %edx, %edi
movl %edx, %esi
movl %edx, %ecx
andl $-6, %edx
andl $-2, %edi
andl $-8, %esi
andl $2, %ecx
addl $7, %edx
addl $5, %edi
addl $9, %esi
addl $2, %ecx
movl %edx, d(%rip)
movl %edi, a(%rip)
movl %esi, b(%rip)
movl %ecx, c(%rip)
ret
Notice, the code now contains only a single cmpl and a single sbbl,
with result being shared (via movl).
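The idiom in the listings can be written out in C to make the sharing explicit. This is only a sketch of the arithmetic, not of the RTL expansion; zero_mask, select_with_mask and foo_branchless are names invented here. The constants match the andl/addl immediates above: for a, 3 - 5 = -2 and the fallback is 5, and so on.

```c
#include <assert.h>

/* All ones when x == 0, zero otherwise: the value that the
   cmpl $1 / sbbl pair leaves in a register.  */
int zero_mask(int x)
{
    return -(x == 0);
}

/* Pick if_zero when mask is all ones, if_nonzero when mask is zero.  */
int select_with_mask(int mask, int if_zero, int if_nonzero)
{
    return (mask & (if_zero - if_nonzero)) + if_nonzero;
}

/* The four assignments from foo, computed from one shared mask.  */
void foo_branchless(int x, int out[4])
{
    int mask = zero_mask(x);                /* computed once */
    out[0] = select_with_mask(mask, 3, 5);  /* a: andl $-2, addl $5 */
    out[1] = select_with_mask(mask, 1, 9);  /* b: andl $-8, addl $9 */
    out[2] = select_with_mask(mask, 4, 2);  /* c: andl $2,  addl $2 */
    out[3] = select_with_mask(mask, 1, 7);  /* d: andl $-6, addl $7 */
}

/* Usage: both branches of the original if/else are reproduced.  */
void check_foo_branchless(void)
{
    int out[4];
    foo_branchless(0, out);
    assert(out[0] == 3 && out[1] == 1 && out[2] == 4 && out[3] == 1);
    foo_branchless(42, out);
    assert(out[0] == 5 && out[1] == 9 && out[2] == 2 && out[3] == 7);
}
```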
2023-01-03 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_int_movcc): Rewrite
RTL expansion to allow condition (mask) to be shared/reused,
by avoiding overwriting pseudos and adding REG_EQUAL notes.
gcc/testsuite/ChangeLog
* gcc.target/i386/cmov10.c: New test case.
|
|
The following testcase ICEs on s390x-linux (e.g. with -march=z13).
The problem is that target is (subreg/s/u:SI (reg/v:DI 66 [ x+-4 ]) 4)
and we call convert_move from temp to the SUBREG_REG of that, expecting
to extend the value properly. That works nicely if temp has some
scalar integer mode (or partial one), but ICEs when temp has V4QImode
on the assertion that from and to modes have the same bitsize.
store_expr generally allows say store from V4QI to SI target because
they have the same size and if temp is a CONST_INT, we already have code
to convert the constant properly, so the following patch just adds handling
of non-scalar integer modes by converting them to the mode of target
first before convert_move extends them.
2023-01-03 Jakub Jelinek <jakub@redhat.com>
PR middle-end/108264
* expr.cc (store_expr): For stores into SUBREG_PROMOTED_* targets
from source which doesn't have scalar integral mode first convert
it to outer_mode.
* gcc.dg/pr108264.c: New test.
|
|
The following testcase distilled from Linux kernel on ppc64le ICEs,
because fixup_reorder_chain sees a bb with a single fallthru edge
falling into a bb with simple return and decides to redirect
that fallthru edge to EXIT. That is possible if the bb ending
in the fallthru edge doesn't end with a jump or ends with a normal
unconditional jump, but not when the bb ends with asm goto which can despite
a single fallthru have multiple labels to the fallthrough basic block.
The following patch makes sure we never try to redirect such cases to EXIT.
2023-01-03 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/108263
* cfgrtl.cc (fixup_reorder_chain): Avoid trying to redirect
asm goto to EXIT.
* gcc.dg/pr108263.c: New test.
|
|
|
|
This is another step towards a possible solution for PR 105137.
This patch introduces a define_insn for extendditi2 that allows
DImode to TImode sign-extension to be represented in the early
RTL optimizers, before being split post-reload into the exact
same idiom as currently produced by RTL expansion.
Typically this produces the identical code, so the first new
test case:
__int128 foo(long long x) { return (__int128)x; }
continues to generate:
foo: movq %rdi, %rax
cqto
ret
The "magic" is that this representation allows combine and the
other RTL optimizers to do a better job. Hence, the second
test case:
__int128 foo(__int128 a, long long b) {
a += ((__int128)b) << 70;
return a;
}
which mainline with -O2 currently generates as:
foo: movq %rsi, %rax
movq %rdx, %rcx
movq %rdi, %rsi
salq $6, %rcx
movq %rax, %rdi
xorl %eax, %eax
movq %rcx, %rdx
addq %rsi, %rax
adcq %rdi, %rdx
ret
with this patch now becomes:
foo: movl $0, %eax
salq $6, %rdx
addq %rdi, %rax
adcq %rsi, %rdx
ret
i.e. the same code for the signed and unsigned extension variants.
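Why the extension bits do not matter here can be checked by hand: the low 64 bits of ((__int128)b) << 70 are all zero, so only the high half of a changes, by b << 6 (the salq $6 above). The sketch below assumes a 64-bit target with __int128 and GCC's documented behaviour for signed shifts and conversions; foo_by_halves is a hypothetical helper, not part of the patch.

```c
#include <assert.h>

/* The second test case from the message.  */
__int128 foo(__int128 a, long long b)
{
    a += ((__int128)b) << 70;
    return a;
}

/* The same computation on 64-bit halves: the low half is untouched and
   the high half receives b << 6, whatever the sign of b.  */
__int128 foo_by_halves(__int128 a, long long b)
{
    unsigned long long lo = (unsigned long long)a;
    unsigned long long hi = (unsigned long long)((unsigned __int128)a >> 64);
    hi += (unsigned long long)b << 6;
    return (__int128)(((unsigned __int128)hi << 64) | lo);
}

/* Usage: both forms agree for positive and negative b.  */
void check_foo_by_halves(void)
{
    assert(foo(1, 3) == foo_by_halves(1, 3));
    assert(foo(5, -1) == foo_by_halves(5, -1));
}
```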
2023-01-01 Roger Sayle <roger@nextmovesoftware.com>
Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
* config/i386/i386.md (extendditi2): New define_insn.
(define_split): Use DWIH mode iterator to treat new extendditi2
identically to existing extendsidi2_1.
(define_peephole2): Likewise.
(define_peephole2): Likewise.
(define_split): Likewise.
gcc/testsuite/ChangeLog
* gcc.target/i386/extendditi2-1.c: New test case.
* gcc.target/i386/extendditi2-2.c: Likewise.
|
|
Rotate ChangeLog files for ChangeLogs with yearly cadence.
|
|
|
|
This adds tests from bugzilla for PR103770 and duplicates.
gcc/testsuite/
* gcc.dg/pr103770.c: New test.
* gcc.dg/pr103859.c: New test.
* gcc.dg/pr105065.c: New test.
|
|
In the M-Class Arm-ARM:
https://developer.arm.com/documentation/ddi0553/bu/?lang=en
these MVE instructions only have '!' writeback variant and at:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107714
we found that the Um constraint would also allow through a
register offset writeback, resulting in an assembler error.
Here I have added a new constraint and predicate for these
instructions, which (uniquely, AFAICT), only support a `!` writeback
increment by the data size (inside the compiler this is a POST_INC).
No regressions in arm-none-eabi with MVE and MVE.FP.
gcc/ChangeLog:
PR target/107714
* config/arm/arm-protos.h (mve_struct_mem_operand): New prototype.
* config/arm/arm.cc (mve_struct_mem_operand): New function.
* config/arm/constraints.md (Ug): New constraint.
* config/arm/mve.md (mve_vst4q<mode>): Change constraint.
(mve_vst2q<mode>): Likewise.
(mve_vld4q<mode>): Likewise.
(mve_vld2q<mode>): Likewise.
* config/arm/predicates.md (mve_struct_operand): New predicate.
gcc/testsuite/ChangeLog:
PR target/107714
* gcc.target/arm/mve/intrinsics/vldst24q_reg_offset.c: New test.
|
|
Update test cases with error messages that changed as a result.
gcc/fortran/ChangeLog:
PR fortran/102595
* decl.cc (attr_decl1): Guard against NULL pointer.
* parse.cc (match_deferred_characteristics): Include BT_CLASS in check for
derived being undefined.
gcc/testsuite/ChangeLog:
PR fortran/102595
* gfortran.dg/class_result_4.f90: Update error message check.
* gfortran.dg/pr85779_3.f90: Update error message check.
|
|
|
|
This patch is a one line change, to call ix86_expand_clear instead of
emit_move_insn with const0_rtx in ix86_split_ashl, allowing the backend
to use an xor instruction to clear a register if appropriate.
The effect is demonstrated with the following function.
__int128 foo(__int128 x, unsigned long long b) {
return ((__int128)b << 72) + x;
}
previously with -O2, GCC would generate
foo: movl $0, %eax
salq $8, %rdx
addq %rdi, %rax
adcq %rsi, %rdx
ret
with this patch, it now generates
foo: xorl %eax, %eax
salq $8, %rdx
addq %rdi, %rax
adcq %rsi, %rdx
ret
2022-12-28 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_split_ashl): Call
ix86_expand_clear to generate an xor instruction.
gcc/testsuite/ChangeLog
* gcc.target/i386/ashlti3-1.c: New test case.
|
|
PR tree-optimization/108137
gcc/ChangeLog:
* tree-ssa-strlen.cc (get_range_strlen_phi): Reject anything
different from INTEGER_CST.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr108137.c: New test.
|
|
|
|
gcc/ChangeLog:
PR target/95632
PR target/106602
* config/riscv/riscv.md: New pattern to simulate complex
const_int loads.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr95632.c: New test.
* gcc.target/riscv/pr106602.c: New test.
|
|
Like d0bbecb1c418b680505faa998fe420f0fd4bbfc1, we add a wrapper to
prevent it from pulling in stdint.h from the standard C library.
gcc/testsuite:
* gcc.target/riscv/rvv/vsetvl/riscv_vector.h: New.
|
|
PR106680 shows that -m32 -mpowerpc64 behaves differently from
-mpowerpc64 -m32; this is caused by the way we
handle option powerpc64 in rs6000_handle_option.
Segher pointed out that this difference should be treated as
a bug and that we should ensure option powerpc64 is
independent of -m32/-m64. So this patch removes the
handling in rs6000_handle_option and adds the necessary
support in rs6000_option_override_internal instead.
With this patch, if users specify -m{no-,}powerpc64, the
specified value is honoured, otherwise, for 64bit it
always enables OPTION_MASK_POWERPC64; while for 32bit
and TARGET_POWERPC64 and OS_MISSING_POWERPC64, it disables
OPTION_MASK_POWERPC64.
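The resulting decision can be written as a small pure function. This is a model of the rule as described above, not the code in rs6000_option_override_internal; the struct and its field names are invented for illustration. Because the function only looks at the final option state, the order of -m32 and -mpowerpc64 on the command line cannot matter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the inputs to the decision.  */
struct ppc64_option_inputs {
    bool target_is_64bit;       /* -m64 (or equivalent) in effect */
    bool explicitly_specified;  /* user passed -mpowerpc64 or -mno-powerpc64 */
    bool explicit_value;        /* the value the user asked for */
    bool enabled_implicitly;    /* set via defaults or the cpu mask */
    bool os_missing_powerpc64;  /* OS_MISSING_POWERPC64 */
};

/* Returns whether the powerpc64 mask ends up enabled.  */
bool resolve_powerpc64(const struct ppc64_option_inputs *in)
{
    if (in->explicitly_specified)
        return in->explicit_value;   /* the user's choice is honoured */
    if (in->target_is_64bit)
        return true;                 /* always enabled for 64-bit */
    if (in->enabled_implicitly && in->os_missing_powerpc64)
        return false;                /* 32-bit, OS lacks support */
    return in->enabled_implicitly;
}

/* Usage: an explicit -mpowerpc64 survives on a 32-bit target.  */
void check_resolve_powerpc64(void)
{
    struct ppc64_option_inputs m32_explicit = { false, true, true, false, true };
    struct ppc64_option_inputs m64_default = { true, false, false, false, false };
    struct ppc64_option_inputs m32_implicit = { false, false, false, true, true };
    assert(resolve_powerpc64(&m32_explicit));
    assert(resolve_powerpc64(&m64_default));
    assert(!resolve_powerpc64(&m32_implicit));
}
```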
Incidentally, following Segher's suggestion, I tried warning
when OPTION_MASK_POWERPC64 is set for OS_MISSING_POWERPC64.
If we warn when powerpc64 is specified explicitly,
some test cases that use -m32 -mpowerpc64 on ppc64-linux
need updating, and artificial runs
with "--target_board=unix'{-m32/-mpowerpc64}'" produce
noisy warnings on ppc64-linux. If we warn when
it is set implicitly, the flag may simply have been initialized by
TARGET_DEFAULT (like -m32 on ppc64-linux) or set from the
given cpu mask, so we would have to special-case those and not warn.
Per Segher's latest comment, I decided not to warn and
to keep the behaviour consistent with before.
Bootstrapped and regress-tested on:
- powerpc64-linux-gnu P7 and P8 {-m64,-m32}
- powerpc64le-linux-gnu P9 and P10
- powerpc-ibm-aix7.2.0.0 {-maix64,-maix32}
- powerpc-darwin9 (with Iain's help)
PR target/106680
gcc/ChangeLog:
* common/config/rs6000/rs6000-common.cc (rs6000_handle_option): Remove
the adjustment for option powerpc64 in -m64 handling, and remove the
whole -m32 handling.
* config/rs6000/rs6000.cc (rs6000_option_override_internal): When no
explicit powerpc64 option is provided, enable it for -m64. For 32 bit
and OS_MISSING_POWERPC64, disable powerpc64 if it's enabled but not
specified explicitly.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/pr106680-1.c: New test.
* gcc.target/powerpc/pr106680-2.c: New test.
* gcc.target/powerpc/pr106680-3.c: New test.
* gcc.target/powerpc/pr106680-4.c: New test.
2022-12-27 Kewen Lin <linkw@linux.ibm.com>
Iain Sandoe <iain@sandoe.co.uk>
|
|
|
|
Many analyzer testcases are failing on AIX, some due to specific system
header expectations. This patch skips the testcases to avoid the noise.
* gcc.dg/analyzer/fd-accept.c: Skip.
* gcc.dg/analyzer/fd-access-mode-target-headers.c: Skip.
* gcc.dg/analyzer/fd-bind.c: Skip.
* gcc.dg/analyzer/fd-connect.c: Skip.
* gcc.dg/analyzer/fd-datagram-socket.c: Skip.
* gcc.dg/analyzer/fd-glibc-datagram-client.c: Skip.
* gcc.dg/analyzer/fd-glibc-datagram-socket.c: Skip.
* gcc.dg/analyzer/fd-listen.c: Skip.
* gcc.dg/analyzer/fd-socket-misuse.c: Skip.
* gcc.dg/analyzer/fd-stream-socket-active-open.c: Skip.
* gcc.dg/analyzer/fd-stream-socket-passive-open.c: Skip.
* gcc.dg/analyzer/fd-stream-socket.c: Skip.
* gcc.dg/analyzer/fd-symbolic-socket.c: Skip.
* gcc.dg/analyzer/flex-with-call-summaries.c: Skip.
* gcc.dg/analyzer/getchar-1.c: Skip.
* gcc.dg/analyzer/isatty-1.c: Skip.
* gcc.dg/analyzer/pr94851-1.c: Skip.
* gcc.dg/analyzer/pragma-2.c: Skip.
|
|
|
|
This patch tweaks the x86 backend to use the movss and movsd instructions
to perform some vector permutations on integer vectors (V4SI and V2DI) in
the same way they are used for floating point vectors (V4SF and V2DF).
As a motivating example, consider:
typedef unsigned int v4si __attribute__((vector_size(16)));
typedef float v4sf __attribute__((vector_size(16)));
v4si foo(v4si x,v4si y) { return (v4si){y[0],x[1],x[2],x[3]}; }
v4sf bar(v4sf x,v4sf y) { return (v4sf){y[0],x[1],x[2],x[3]}; }
which is currently compiled with -O2 to:
foo: movdqa %xmm0, %xmm2
shufps $80, %xmm0, %xmm1
movdqa %xmm1, %xmm0
shufps $232, %xmm2, %xmm0
ret
bar: movss %xmm1, %xmm0
ret
with this patch both functions compile to the same form.
Likewise for the V2DI case:
typedef unsigned long v2di __attribute__((vector_size(16)));
typedef double v2df __attribute__((vector_size(16)));
v2di foo(v2di x,v2di y) { return (v2di){y[0],x[1]}; }
v2df bar(v2df x,v2df y) { return (v2df){y[0],x[1]}; }
which currently generates:
foo: shufpd $2, %xmm0, %xmm1
movdqa %xmm1, %xmm0
ret
bar: movsd %xmm1, %xmm0
ret
2022-12-25 Roger Sayle <roger@nextmovesoftware.com>
Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
* config/i386/i386-builtin.def (__builtin_ia32_movss): Update
CODE_FOR_sse_movss to CODE_FOR_sse_movss_v4sf.
(__builtin_ia32_movsd): Likewise, update CODE_FOR_sse2_movsd to
CODE_FOR_sse2_movsd_v2df.
* config/i386/i386-expand.cc (split_convert_uns_si_sse): Update
gen_sse_movss call to gen_sse_movss_v4sf, and gen_sse2_movsd call
to gen_sse2_movsd_v2df.
(expand_vec_perm_movs): Also allow V4SImode with TARGET_SSE and
V2DImode with TARGET_SSE2.
* config/i386/sse.md
(avx512fp16_fcmaddcsh_v8hf_mask3<round_expand_name>): Update
gen_sse_movss call to gen_sse_movss_v4sf.
(avx512fp16_fmaddcsh_v8hf_mask3<round_expand_name>): Likewise.
(sse_movss_<mode>): Renamed from sse_movss using VI4F_128 mode
iterator to handle both V4SF and V4SI.
(sse2_movsd_<mode>): Likewise, renamed from sse2_movsd using
VI8F_128 mode iterator to handle both V2DF and V2DI.
gcc/testsuite/ChangeLog
* gcc.target/i386/sse-movss-4.c: New test case.
* gcc.target/i386/sse2-movsd-3.c: New test case.
|
|
|
|
My recently added testcases gcc.target/i386/pr107548-[12].c need to be
tweaked slightly for -march=cascadelake. Committed as obvious.
2022-12-24 Roger Sayle <roger@nextmovesoftware.com>
gcc/testsuite/ChangeLog
PR target/107548
* gcc.target/i386/pr107548-1.c: Match both vmovd and movd.
* gcc.target/i386/pr107548-2.c: Match both vpaddq and paddq.
|
|
|
|
gcc/fortran/ChangeLog:
PR fortran/108131
* array.cc (match_array_element_spec): Avoid too early simplification
of matched array element specs that can lead to a misinterpretation
when used as array bounds in array declarations.
gcc/testsuite/ChangeLog:
PR fortran/108131
* gfortran.dg/pr103505.f90: Adjust expected patterns.
* gfortran.dg/pr108131.f90: New test.
|
|
Here during ahead of time checking of C{}, we indirectly call get_nsdmi
for C::m from finish_compound_literal, which in turn calls
break_out_target_exprs for C::m's (non-templated) initializer, during
which we build a call to A::~A and check expr_noexcept_p for it (from
build_vec_delete_1). But this is all done with processing_template_decl
set, so the built A::~A call is templated (whose form was recently
changed by r12-6897-gdec8d0e5fa00ceb2) which expr_noexcept_p doesn't
expect, and we crash.
This patch fixes this by clearing processing_template_decl before
the call to break_out_target_exprs from get_nsdmi. And since it more
generally seems we shouldn't be seeing (or producing) non-templated
trees in break_out_target_exprs, this patch also adds an assert to
that effect.
PR c++/108116
gcc/cp/ChangeLog:
* constexpr.cc (maybe_constant_value): Clear
processing_template_decl before calling break_out_target_exprs.
* init.cc (get_nsdmi): Likewise.
* tree.cc (break_out_target_exprs): Assert processing_template_decl
is cleared.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/nsdmi-template24.C: New test.
|
|
As reported in the PR, tree-ssa-dom.cc uses real_zerop call to find
if a floating point constant is zero and it shouldn't try to infer
equivalences from comparison against it if signed zeros are honored.
This doesn't work at all for decimal types, because real_zerop always
returns false for them (one can have different representations of decimal
zero beyond -0/+0), and it doesn't work for vector compares either,
as real_zerop checks if all elements are zero, while we need to avoid
inferring equivalences from comparison against vector constants which have
at least one zero element in it (if signed zeros are honored).
Furthermore, as mentioned by Joseph, for decimal types many other values
aren't singleton.
So, this patch stops inferring anything if element mode is decimal, and
otherwise uses instead of real_zerop a new function, real_maybe_zerop,
which will work even for decimal types and for complex or vector will
return true if any element is or might be zero (so it returns true
for anything but constants for now).
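The underlying hazard for binary floating point can be shown with a few lines of C: a value can compare equal to zero without being the same value as +0.0, so x == 0.0 does not justify replacing x by the constant. This sketch assumes IEEE signed zeros (no -ffast-math); the helper names are made up and it does not model the decimal or vector cases.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* The comparison a pass might want to turn into an equivalence.  */
bool equals_zero(double x)
{
    return x == 0.0;
}

/* True if x and y have the same sign bit.  */
bool same_sign_bit(double x, double y)
{
    return !signbit(x) == !signbit(y);
}

/* Usage: -0.0 compares equal to 0.0, yet is distinguishable from it.  */
void check_signed_zero(void)
{
    double neg_zero = -0.0;
    assert(equals_zero(neg_zero));
    assert(!same_sign_bit(neg_zero, 0.0));
}
```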
2022-12-23 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/108068
* tree.h (real_maybe_zerop): Declare.
* tree.cc (real_maybe_zerop): Define.
* tree-ssa-dom.cc (record_edge_info): Use it instead of
real_zerop or TREE_CODE (op1) == SSA_NAME || real_zerop. Always set
can_infer_simple_equiv to false for decimal floating point types.
* gcc.dg/dfp/pr108068.c: New test.
|
|
When instantiating a constrained hidden template friend, we substitute
into its template-head requirements in tsubst_friend_function. For this
substitution we use the template's full argument vector whose outer
levels correspond to the instantiated class's arguments and innermost
level corresponds to the template's own level-lowered generic arguments.
But for A<int>::f here, for which the relevant argument vector is
{{int}, {Us...}}, the substitution into (C<Ts, Us> && ...) triggers the
assert in use_pack_expansion_extra_args_p since one argument is a pack
expansion and the other isn't.
And for A<int, int>::f, for which the relevant argument vector is
{{int, int}, {Us...}}, the use_pack_expansion_extra_args_p assert would
also trigger but we first get a bogus "mismatched argument pack lengths"
error from tsubst_pack_expansion.
Sidestepping the question of whether tsubst_pack_expansion should be
able to handle such substitutions, it seems we can work around this by
using only the instantiated class's arguments and not also the template
friend's own generic arguments, which is consistent with how we normally
substitute into the signature of a member template.
PR c++/107853
gcc/cp/ChangeLog:
* constraint.cc (maybe_substitute_reqs_for): Substitute into
the template-head requirements of a template friend using only
its outer arguments via outer_template_args.
* cp-tree.h (outer_template_args): Declare.
* pt.cc (outer_template_args): Define, factored out and
generalized from ...
(ctor_deduction_guides_for): ... here.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-friend12.C: New test.
* g++.dg/cpp2a/concepts-friend13.C: New test.
|
|
This patch enhances x86's STV pass to handle VEC_SELECT during general
scalar chain conversion, performing SImode scalar extraction from V4SI
and DImode scalar extraction from V2DI in vector registers.
The motivating test case from bugzilla is:
typedef unsigned int v4si __attribute__((vector_size(16)));
unsigned int f (v4si a, v4si b)
{
a[0] += b[0];
return a[0] + a[1];
}
currently with -O2 -march=znver2 this generates:
vpextrd $1, %xmm0, %edx
vmovd %xmm0, %eax
addl %edx, %eax
vmovd %xmm1, %edx
addl %edx, %eax
ret
which performs three transfers from the vector unit to the scalar unit,
and performs the two additions there. With this patch, we now generate:
vmovdqa %xmm0, %xmm2
vpshufd $85, %xmm0, %xmm0
vpaddd %xmm0, %xmm2, %xmm0
vpaddd %xmm1, %xmm0, %xmm0
vmovd %xmm0, %eax
ret
which performs the two additions in the vector unit, and then transfers
the result to the scalar unit. Technically the (cheap) movdqa isn't
needed with better register allocation (or this could be cleaned up
during peephole2), but even so this transform is still a win.
2022-12-23 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR target/107548
* config/i386/i386-features.cc (scalar_chain::add_insn): The
operands of a VEC_SELECT don't need to be added to the scalar chain.
(general_scalar_chain::compute_convert_gain) <case VEC_SELECT>:
Provide gains for performing STV on a VEC_SELECT.
(general_scalar_chain::convert_insn): Convert VEC_SELECT to pshufd,
psrldq or no-op.
(general_scalar_to_vector_candidate_p): Handle VEC_SELECT of a
single element from a vector register to a scalar register.
gcc/testsuite/ChangeLog
PR target/107548
* gcc.target/i386/pr107548-1.c: New test V4SI case.
* gcc.target/i386/pr107548-2.c: New test V2DI case.
|
|
With many thanks to H.J. for doing all the hard work, this patch resolves
two P1 regressions; PR target/106933 and PR target/106959.
Although superficially similar, the i386 backend's two scalar-to-vector
(STV) passes perform their transformations in importantly different ways.
The original pass converting SImode and DImode operations to V4SImode
or V2DImode operations is "soft", allowing values to be maintained in
both integer and vector hard registers. The newer pass converting TImode
operations to V1TImode is "hard" (all or nothing) that converts all uses
of a pseudo to vector form. To implement this it invokes powerful ju-ju
calling SET_MODE on a reg_rtx, which due to RTL sharing, often updates
this pseudo's mode everywhere in the RTL chain. Hence, TImode STV can only
be performed when all uses of a pseudo are convertible to V1TImode form.
To ensure this the STV passes currently use data-flow analysis to inspect
all DEFs and USEs in a chain. This works fine for chains that are in
the usual single assignment form, but the occurrence of uninitialized
variables, or multiple assignments that split a pseudo's usage into
several independent chains (lifetimes) can lead to situations where
some but not all of a pseudo's occurrences need to be updated. This is
safe for the SImode/DImode pass, but leads to the above bugs during
the TImode pass.
My one minor tweak to HJ's patch from comment #4 of bugzilla PR106959
is to only perform the new single_def_chain_p check for TImode STV; it
turns out that STV of SImode/DImode min/max operates safely on multiple-def
chains, and prohibiting this leads to testsuite regressions. We don't
(yet) support V1TImode min/max, so this idiom isn't an issue during the
TImode STV pass.
For the record, the two alternate possible fixes are (i) make the TImode
STV pass "soft", by eliminating use of SET_MODE, instead using replace_rtx
with a new pseudo, or (ii) merging "chains" so that multiple DFA
chains/lifetimes are considered a single STV chain.
2022-12-23 H.J. Lu <hjl.tools@gmail.com>
Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR target/106933
PR target/106959
* config/i386/i386-features.cc (single_def_chain_p): New predicate
function to check that a pseudo's use-def chain is in SSA form.
(timode_scalar_to_vector_candidate_p): Check that TImode regs that
are SET_DEST or SET_SRC of an insn match/are single_def_chain_p.
gcc/testsuite/ChangeLog
PR target/106933
PR target/106959
* gcc.target/i386/pr106933-1.c: New test case.
* gcc.target/i386/pr106933-2.c: Likewise.
* gcc.target/i386/pr106959-1.c: Likewise.
* gcc.target/i386/pr106959-2.c: Likewise.
* gcc.target/i386/pr106959-3.c: Likewise.
|
|
gcc/ChangeLog:
* config/riscv/vector.md: Fix constraints.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/vle-constraint-1.c: New test.
|
|
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-shapes.cc (struct vsetvl_def): Add
"__riscv_" prefix.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/vsetvl-1.c: Add "__riscv_" prefix.
|