Age | Commit message (Collapse) | Author | Files | Lines |
|
This resolves all instances of PR119369
"GCN: weak undefined symbols -> execution test FAIL, 'HSA_STATUS_ERROR_VARIABLE_UNDEFINED'";
for all affected test cases, the execution test status progresses FAIL -> PASS.
This however also causes a small number of (expected) regressions, very similar
to GCC/nvptx:
[-PASS:-]{+FAIL:+} g++.dg/abi/pure-virtual1.C -std=c++17 (test for excess errors)
[-PASS:-]{+FAIL:+} g++.dg/abi/pure-virtual1.C -std=c++26 (test for excess errors)
[-PASS:-]{+FAIL:+} g++.dg/abi/pure-virtual1.C -std=c++98 (test for excess errors)
[-PASS:-]{+FAIL:+} g++.dg/cpp0x/pr84497.C -std=c++11 scan-assembler .weak[ \t]*_?_ZTH11derived_obj
[-PASS:-]{+FAIL:+} g++.dg/cpp0x/pr84497.C -std=c++11 scan-assembler .weak[ \t]*_?_ZTH13container_obj
[-PASS:-]{+FAIL:+} g++.dg/cpp0x/pr84497.C -std=c++11 scan-assembler .weak[ \t]*_?_ZTH8base_obj
PASS: g++.dg/cpp0x/pr84497.C -std=c++11 (test for excess errors)
[-PASS:-]{+FAIL:+} g++.dg/cpp0x/pr84497.C -std=c++17 scan-assembler .weak[ \t]*_?_ZTH11derived_obj
[-PASS:-]{+FAIL:+} g++.dg/cpp0x/pr84497.C -std=c++17 scan-assembler .weak[ \t]*_?_ZTH13container_obj
[-PASS:-]{+FAIL:+} g++.dg/cpp0x/pr84497.C -std=c++17 scan-assembler .weak[ \t]*_?_ZTH8base_obj
PASS: g++.dg/cpp0x/pr84497.C -std=c++17 (test for excess errors)
[-PASS:-]{+FAIL:+} g++.dg/cpp0x/pr84497.C -std=c++26 scan-assembler .weak[ \t]*_?_ZTH11derived_obj
[-PASS:-]{+FAIL:+} g++.dg/cpp0x/pr84497.C -std=c++26 scan-assembler .weak[ \t]*_?_ZTH13container_obj
[-PASS:-]{+FAIL:+} g++.dg/cpp0x/pr84497.C -std=c++26 scan-assembler .weak[ \t]*_?_ZTH8base_obj
PASS: g++.dg/cpp0x/pr84497.C -std=c++26 (test for excess errors)
[-PASS:-]{+FAIL:+} g++.dg/ext/weak2.C -std=gnu++17 scan-assembler weak[^ \t]*[ \t]_?_Z3foov
PASS: g++.dg/ext/weak2.C -std=gnu++17 (test for excess errors)
[-PASS:-]{+FAIL:+} g++.dg/ext/weak2.C -std=gnu++26 scan-assembler weak[^ \t]*[ \t]_?_Z3foov
PASS: g++.dg/ext/weak2.C -std=gnu++26 (test for excess errors)
[-PASS:-]{+FAIL:+} g++.dg/ext/weak2.C -std=gnu++98 scan-assembler weak[^ \t]*[ \t]_?_Z3foov
PASS: g++.dg/ext/weak2.C -std=gnu++98 (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.dg/attr-weakref-1.c (test for excess errors)
[-FAIL:-]{+UNRESOLVED:+} gcc.dg/attr-weakref-1.c [-execution test-]{+compilation failed to produce executable+}
@@ -131211,25 +131211,25 @@ PASS: gcc.dg/weak/weak-1.c scan-assembler weak[^ \t]*[ \t]_?c
PASS: gcc.dg/weak/weak-1.c scan-assembler weak[^ \t]*[ \t]_?d
PASS: gcc.dg/weak/weak-1.c scan-assembler weak[^ \t]*[ \t]_?e
PASS: gcc.dg/weak/weak-1.c scan-assembler weak[^ \t]*[ \t]_?g
[-PASS:-]{+FAIL:+} gcc.dg/weak/weak-1.c scan-assembler weak[^ \t]*[ \t]_?j
PASS: gcc.dg/weak/weak-1.c scan-assembler-not weak[^ \t]*[ \t]_?i
PASS: gcc.dg/weak/weak-12.c (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.dg/weak/weak-12.c scan-assembler weak[^ \t]*[ \t]_?foo
PASS: gcc.dg/weak/weak-15.c (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.dg/weak/weak-15.c scan-assembler weak[^ \t]*[ \t]_?a
[-PASS:-]{+FAIL:+} gcc.dg/weak/weak-15.c scan-assembler weak[^ \t]*[ \t]_?c
[-PASS:-]{+FAIL:+} gcc.dg/weak/weak-15.c scan-assembler weak[^ \t]*[ \t]_?d
PASS: gcc.dg/weak/weak-15.c scan-assembler-not weak[^ \t]*[ \t]_?b
PASS: gcc.dg/weak/weak-16.c (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.dg/weak/weak-16.c scan-assembler weak[^ \t]*[ \t]_?kallsyms_token_index
[-PASS:-]{+FAIL:+} gcc.dg/weak/weak-16.c scan-assembler weak[^ \t]*[ \t]_?kallsyms_token_table
PASS: gcc.dg/weak/weak-2.c (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.dg/weak/weak-2.c scan-assembler weak[^ \t]*[ \t]_?ffoo1a
[-PASS:-]{+FAIL:+} gcc.dg/weak/weak-2.c scan-assembler weak[^ \t]*[ \t]_?ffoo1b
[-PASS:-]{+FAIL:+} gcc.dg/weak/weak-2.c scan-assembler weak[^ \t]*[ \t]_?ffoo1c
[-PASS:-]{+FAIL:+} gcc.dg/weak/weak-2.c scan-assembler weak[^ \t]*[ \t]_?ffoo1e
PASS: gcc.dg/weak/weak-2.c scan-assembler-not weak[^ \t]*[ \t]_?ffoo1d
PASS: gcc.dg/weak/weak-3.c (test for warnings, line 58)
PASS: gcc.dg/weak/weak-3.c (test for warnings, line 73)
PASS: gcc.dg/weak/weak-3.c (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.dg/weak/weak-3.c scan-assembler weak[^ \t]*[ \t]_?ffoo1a
[-PASS:-]{+FAIL:+} gcc.dg/weak/weak-3.c scan-assembler weak[^ \t]*[ \t]_?ffoo1b
[-PASS:-]{+FAIL:+} gcc.dg/weak/weak-3.c scan-assembler weak[^ \t]*[ \t]_?ffoo1c
[-PASS:-]{+FAIL:+} gcc.dg/weak/weak-3.c scan-assembler weak[^ \t]*[ \t]_?ffoo1e
PASS: gcc.dg/weak/weak-3.c scan-assembler weak[^ \t]*[ \t]_?ffoo1f
PASS: gcc.dg/weak/weak-3.c scan-assembler weak[^ \t]*[ \t]_?ffoo1g
PASS: gcc.dg/weak/weak-3.c scan-assembler-not weak[^ \t]*[ \t]_?ffoo1d
PASS: gcc.dg/weak/weak-4.c (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.dg/weak/weak-4.c scan-assembler weak[^ \t]*[ \t]_?vfoo1a
[-PASS:-]{+FAIL:+} gcc.dg/weak/weak-4.c scan-assembler weak[^ \t]*[ \t]_?vfoo1b
[-PASS:-]{+FAIL:+} gcc.dg/weak/weak-4.c scan-assembler weak[^ \t]*[ \t]_?vfoo1c
PASS: gcc.dg/weak/weak-4.c scan-assembler weak[^ \t]*[ \t]_?vfoo1d
PASS: gcc.dg/weak/weak-4.c scan-assembler weak[^ \t]*[ \t]_?vfoo1e
PASS: gcc.dg/weak/weak-4.c scan-assembler weak[^ \t]*[ \t]_?vfoo1f
@@ -131267,16 +131267,16 @@ PASS: gcc.dg/weak/weak-4.c scan-assembler weak[^ \t]*[ \t]_?vfoo1i
PASS: gcc.dg/weak/weak-4.c scan-assembler weak[^ \t]*[ \t]_?vfoo1j
PASS: gcc.dg/weak/weak-4.c scan-assembler weak[^ \t]*[ \t]_?vfoo1k
PASS: gcc.dg/weak/weak-5.c (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1a
[-PASS:-]{+FAIL:+} gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1b
[-PASS:-]{+FAIL:+} gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1c
PASS: gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1d
PASS: gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1e
PASS: gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1f
PASS: gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1g
PASS: gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1h
[-PASS:-]{+FAIL:+} gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1i
[-PASS:-]{+FAIL:+} gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1j
PASS: gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1k
PASS: gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1l
These get 'dg-xfail-if'ed or 'dg-skip-if'ed, (mostly) similar to GCC/nvptx.
PR target/119369
gcc/
* config/gcn/gcn-protos.h (gcn_asm_weaken_decl): Declare.
* config/gcn/gcn.cc (gcn_asm_weaken_decl): New.
* config/gcn/gcn-hsa.h (ASM_WEAKEN_DECL): '#define' to this.
gcc/testsuite/
* g++.dg/abi/pure-virtual1.C: 'dg-xfail-if' GCN.
* g++.dg/cpp0x/pr84497.C: 'dg-skip-if' GCN.
* g++.dg/ext/weak2.C: Likewise.
* gcc.dg/attr-weakref-1.c: Likewise.
* gcc.dg/weak/weak-1.c: Likewise.
* gcc.dg/weak/weak-12.c: Likewise.
* gcc.dg/weak/weak-15.c: Likewise.
* gcc.dg/weak/weak-16.c: Likewise.
* gcc.dg/weak/weak-2.c: Likewise.
* gcc.dg/weak/weak-3.c: Likewise.
* gcc.dg/weak/weak-4.c: Likewise.
* gcc.dg/weak/weak-5.c: Likewise.
|
|
The following fixes ix86_valid_target_attribute_inner_p to properly
handle target("no-sse4") via OPT_mno_sse4 rather than as unset OPT_msse4.
I've added asserts to ix86_handle_option that RejectNegative is honored
for both.
PR target/119549
* common/config/i386/i386-common.cc (ix86_handle_option):
Assert that both OPT_msse4 and OPT_mno_sse4 are never unset.
* config/i386/i386-options.cc (ix86_valid_target_attribute_inner_p):
Process negated OPT_msse4 as OPT_mno_sse4.
* gcc.target/i386/pr119549.c: New testcase.
|
|
For vaes patterns with jm constraint and gpr16 attr, it requires "isa"
attr to distinct avx/avx512 alternatives in ix86_memory_address_reg_class.
Also adds missing type and mode attributes for those vaes patterns.
gcc/ChangeLog:
PR target/119473
* config/i386/sse.md
(vaesdec_<mode>): Set attr "isa" as "avx,vaes_avx512vl", "type" as
"sselog1", "mode" as "TI".
(vaesdeclast_<mode>): Ditto.
(vaesenc_<mode>): Ditto.
(vaesenclast_<mode>): Ditto.
gcc/testsuite/ChangeLog:
PR target/119473
* gcc.target/i386/pr119473.c: New test.
Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
|
|
According to Section 3.4.2, Vector Register Grouping, in the RISC-V
Vector Specification, the rule for LMUL is LMUL >= SEW/ELEN
Changes since V2:
- Add check on vector-iterators.md
- Add one more testcase to check the VLS use correct mode.
gcc/ChangeLog:
* config/riscv/riscv-v.cc: Add restrict for insert LMUL.
* config/riscv/riscv-vector-builtins-types.def:
Use RVV_REQUIRE_ELEN_64 to check LMUL number.
* config/riscv/riscv-vector-switch.def: Likewise.
* config/riscv/vector-iterators.md: Check TARGET_VECTOR_ELEN_64
rather than "TARGET_MIN_VLEN > 32" for all iterator.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr111391-2.c: Update test.
* gcc.target/riscv/rvv/base/abi-14.c: Update test.
* gcc.target/riscv/rvv/base/abi-16.c: Update test.
* gcc.target/riscv/rvv/base/abi-18.c: Update test.
* gcc.target/riscv/rvv/base/vsetvl_zve32-1.c: New test.
* gcc.target/riscv/rvv/base/vsetvl_zve32-2.c: New test.
Co-authored-by: Kito Cheng <kito.cheng@sifive.com>
|
|
As per the AArch64 ISA FEAT_SME does not require FEAT_SVE2. However, we don't
support SME without SVE2 and bail out with a 'sorry' if this configuration is
encountered. We may choose to support this in the future.
gcc/ChangeLog:
* config/aarch64/aarch64-option-extensions.def (SME): Remove SVE2 as
prerequisite and add in FCMA and F16FML.
* config/aarch64/aarch64.cc (aarch64_override_options_internal):
Diagnose use of SME without SVE2 and implicitly enable SVE2 when
enabling SME after streaming mode diagnosis.
* doc/invoke.texi (sme): Document that this can only be used with the
sve2 extension.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/no-sve-with-sme-1.c: New.
* gcc.target/aarch64/no-sve-with-sme-2.c: New.
* gcc.target/aarch64/no-sve-with-sme-3.c: New.
* gcc.target/aarch64/no-sve-with-sme-4.c: New.
* gcc.target/aarch64/pragma_cpp_predefs_4.c: Pass +sve2 to existing
+sme pragma.
* gcc.target/aarch64/sve/acle/general-c/binary_int_opt_single_n_2.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_opt_single_n_2.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_single_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_za_slice_int_opt_single_1.c:
* gcc.target/aarch64/sve/acle/general-c/binary_za_slice_lane_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_za_slice_lane_2.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_za_slice_lane_3.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_za_slice_lane_4.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_za_slice_opt_single_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_za_slice_opt_single_2.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_za_slice_opt_single_3.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_za_slice_uint_opt_single_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/binaryxn_2.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/clamp_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/compare_scalar_count_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/dot_za_slice_int_lane_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/dot_za_slice_lane_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/dot_za_slice_lane_2.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/dot_za_slice_uint_lane_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/shift_right_imm_narrowxn_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/storexn_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_group_selection_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/ternary_qq_or_011_lane_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/unary_convertxn_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/unary_convertxn_narrow_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/unary_convertxn_narrowt_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/unary_za_slice_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/unaryxn_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/write_za_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/write_za_slice_1.c: Likewise.
|
|
Like the other instances. This avoids
;; 1--> b 0: i6540 {xmm2=const_vector;unspec[xmm2] 38;} :nothing
PR target/119010
* config/i386/sse.md (*vmov<mode>_constm1_pternlog_false_dep):
Add mode attribute.
|
|
The following fixes up the ssemov2 type introduction, amending
the znver4_sse_mov_fp_load reservation. This fixes
;; 14--> b 0: i1436 xmm6=vec_concat(xmm6,[ax+0x8]) :nothing
PR target/119010
* config/i386/zn4zn5.md (znver4_sse_mov_fp_load,
znver5_sse_mov_fp_load): Also match ssemov2.
|
|
The following adds missing reservations for the store variant of
sselog reservations covering
;; 112--> b 0: i1499 [dx-0x10]=vec_select(xmm10,parallel) :nothing
PR target/119010
* config/i386/zn4zn5.md (znver4_sse_log_evex_store,
znver5_sse_log_evex_store): New reservations.
|
|
They were using ssecvt instead of sseicvt, I've also added handling
for sseicvt2 which was introduced without fixing up automata, and
the relevant instruction uses DFmode. IMO this is a quite messy
area that could need TLC in the machine description itself.
PR target/119010
* config/i386/zn4zn5.md (znver4_sse_icvt): Use sseicvt.
(znver4_sse_icvt_store): Likewise.
(znver5_sse_icvt_store): Likewise.
(znver4_sse_icvt2): New.
|
|
Like the other DFmode cases.
PR target/119010
* config/i386/zn4zn5.md (znver4_sse_div_pd,
znver4_sse_div_pd_load, znver5_sse_div_pd_load): Handle DFmode.
|
|
The following handles TI, OI and XI mode in the respective EVEX
compare reservations that do not use memory (I've not yet run into
ones with). The znver automata has separate reservations for
integer compares (but only for zen1, for zen2 and zen3 there are
no compare reservations at all), but I don't see why that should
be necessary here.
PR target/119010
* config/i386/zn4zn5.md (znver4_sse_cmp_avx128,
znver5_sse_cmp_avx128): Handle TImode.
(znver4_sse_cmp_avx256, znver5_sse_cmp_avx256): Handle OImode.
(znver4_sse_cmp_avx512, znver5_sse_cmp_avx512): Handle XImode.
|
|
There's the znver4_sse_test reservation which matches the memory-less
SSE compares but currently requires prefix_extra == 1. The old
znver automata in this case sometimes uses znver1-double instead of
znver1-direct, but it's quite a maze. The following simply drops
the prefix_extra requirement, but I have no idea what I'm doing here.
There doesn't seem to be any documentation on the scheduler relevant
attributes used, or at least I cannot find that.
PR target/119010
* config/i386/zn4zn5.md (znver4_sse_test): Drop test of
prefix_extra attribute.
|
|
movv8si_internal uses sselog1 and V4SFmode for an instruction like
(insn 363 2437 371 97 (set (reg:V8SI 46 xmm10 [1125])
(const_vector:V8SI [
(const_int 0 [0]) repeated x8
])) "ComputeNonbondedUtil.C":185:21 2402 {movv8si_internal}
this wasn't catched by the existing znver4_sse_log1 reservation,
I think the znver automaton catches this with the generic
(define_insn_reservation "znver1_sse_log1" 1
(and (eq_attr "cpu" "znver1,znver2,znver3")
(and (eq_attr "type" "sselog1")
(eq_attr "memory" "none")))
"znver1-direct,znver1-fp1|znver1-fp2")
which does not look at the mode at all. The zn4zn5 automaton lacks
this and instead has separated store and load-store reservations
in odd ways. The following renames the store one and introduces
a none variant.
PR target/119010
* config/i386/zn4zn5.md (znver4_sse_log1): Rename to
znver4_sse_log1_store.
(znver5_sse_log1): Rename to znver5_sse_log1_store.
(znver4_sse_log1): New memory-less variant.
|
|
Similarly to data races with 8-bit byte or 16-bit word quantity memory
writes on non-BWX Alpha implementations we have the same problem even on
BWX implementations with partial memory writes produced for unaligned
stores as well as block memory move and clear operations. This happens
at the boundaries of the area written where we produce unprotected RMW
sequences, such as for example:
ldbu $1,0($3)
stw $31,8($3)
stq $1,0($3)
to zero a 9-byte member at the byte offset of 1 of a quadword-aligned
struct, happily clobbering a 1-byte member at the beginning of said
struct if concurrent write happens while executing on the same CPU such
as in a signal handler or a parallel write happens while executing on
another CPU such as in another thread or via a shared memory segment.
To guard against these data races with partial memory write accesses
introduce the `-msafe-partial' command-line option that instructs the
compiler to protect boundaries of the data quantity accessed by instead
using a longer code sequence composed of narrower memory writes where
suitable machine instructions are available (i.e. with BWX targets) or
atomic RMW access sequences where byte and word memory access machine
instructions are not available (i.e. with non-BWX targets).
Owing to the desire of branch avoidance there are redundant overlapping
writes in unaligned cases where STQ_U operations are used in the middle
of a block so as to make sure no part of data to be written has been
lost regardless of run-time alignment. For the non-BWX case it means
that with blocks whose size is not a multiple of 8 there are additional
atomic RMW sequences issued towards the end of the block in addition to
the always required pair enclosing the block from each end.
Only one such additional atomic RMW sequence is actually required, but
code currently issues two for the sake of simplicity. An improvement
might be added to `alpha_expand_unaligned_store_words_safe_partial' in
the future, by folding `alpha_expand_unaligned_store_safe_partial' code
for handling multi-word blocks whose size is not a multiple of 8 (i.e.
with a trailing partial-word part). It would improve performance a bit,
but current code is correct regardless.
Update test cases with `-mno-safe-partial' where required and add new
ones accordingly.
In some cases GCC chooses to open-code block memory write operations, so
with non-BWX targets `-msafe-partial' will in the usual case have to be
used together with `-msafe-bwa'.
Credit to Magnus Lindholm <linmag7@gmail.com> for sharing hardware for
the purpose of verifying the BWX side of this change.
gcc/
PR target/117759
* config/alpha/alpha-protos.h
(alpha_expand_unaligned_store_safe_partial): New prototype.
* config/alpha/alpha.cc (alpha_expand_movmisalign)
(alpha_expand_block_move, alpha_expand_block_clear): Handle
TARGET_SAFE_PARTIAL.
(alpha_expand_unaligned_store_safe_partial)
(alpha_expand_unaligned_store_words_safe_partial)
(alpha_expand_clear_safe_partial_nobwx): New functions.
* config/alpha/alpha.md (insvmisaligndi): Handle
TARGET_SAFE_PARTIAL.
* config/alpha/alpha.opt (msafe-partial): New option.
* config/alpha/alpha.opt.urls: Regenerate.
* doc/invoke.texi (Option Summary, DEC Alpha Options): Document
the new option.
gcc/testsuite/
PR target/117759
* gcc.target/alpha/memclr-a2-o1-c9-ptr.c: Add
`-mno-safe-partial'.
* gcc.target/alpha/memclr-a2-o1-c9-ptr-safe-partial.c: New file.
* gcc.target/alpha/memcpy-di-unaligned-dst.c: New file.
* gcc.target/alpha/memcpy-di-unaligned-dst-safe-partial.c: New
file.
* gcc.target/alpha/memcpy-di-unaligned-dst-safe-partial-bwx.c:
New file.
* gcc.target/alpha/memcpy-si-unaligned-dst.c: New file.
* gcc.target/alpha/memcpy-si-unaligned-dst-safe-partial.c: New
file.
* gcc.target/alpha/memcpy-si-unaligned-dst-safe-partial-bwx.c:
New file.
* gcc.target/alpha/stlx0.c: Add `-mno-safe-partial'.
* gcc.target/alpha/stlx0-safe-partial.c: New file.
* gcc.target/alpha/stlx0-safe-partial-bwx.c: New file.
* gcc.target/alpha/stqx0.c: Add `-mno-safe-partial'.
* gcc.target/alpha/stqx0-safe-partial.c: New file.
* gcc.target/alpha/stqx0-safe-partial-bwx.c: New file.
* gcc.target/alpha/stwx0.c: Add `-mno-safe-partial'.
* gcc.target/alpha/stwx0-bwx.c: Add `-mno-safe-partial'. Refer
to stwx0.c rather than copying its code and also verify no LDQ_U
or STQ_U instructions have been produced.
* gcc.target/alpha/stwx0-safe-partial.c: New file.
* gcc.target/alpha/stwx0-safe-partial-bwx.c: New file.
|
|
With non-BWX Alpha implementations we have a problem of data races where
a 8-bit byte or 16-bit word quantity is to be written to memory in that
in those cases we use an unprotected RMW access of a 32-bit longword or
64-bit quadword width. If contents of the longword or quadword accessed
outside the byte or word to be written are changed midway through by a
concurrent write executing on the same CPU such as by a signal handler
or a parallel write executing on another CPU such as by another thread
or via a shared memory segment, then the concluding write of the RMW
access will clobber them. This is especially important for the safety
of RCU algorithms, but is otherwise an issue anyway.
To guard against these data races with byte and aligned word quantities
introduce the `-msafe-bwa' command-line option (standing for Safe Byte &
Word Access) that instructs the compiler to instead use an atomic RMW
access sequence where byte and word memory access machine instructions
are not available. There is no change to code produced for BWX targets.
It would be sufficient for the secondary reload handle to use a pair of
scratch registers, as requested by `reload_out<mode>', but it would end
with poor code produced as one of the scratches would be occupied by
data retrieved and the other one would have to be reloaded with repeated
calculations, all within the LL/SC sequence.
Therefore I chose to add a dedicated `reload_out<mode>_safe_bwa' handler
and ask for more scratches there by defining a 256-bit OI integer mode.
While reload is documented in our manual to support an arbitrary number
of scratches in reality it hasn't been implemented for IRA:
/* ??? It would be useful to be able to handle only two, or more than
three, operands, but for now we can only handle the case of having
exactly three: output, input and one temp/scratch. */
and it seems to be the case for LRA as well. Do what everyone else does
then and just have one wide multi-register scratch.
I note that the atomic sequences emitted are suboptimal performance-wise
as the looping branch for the unsuccessful completion of the sequence
points backwards, which means it will be predicted as taken despite that
in most cases it will fall through. I do not see it as a deficiency of
this change proposed as it takes care of recording that the branch is
unlikely to be taken, by calling `alpha_emit_unlikely_jump'. Therefore
generic code elsewhere should instead be investigated and adjusted
accordingly for the arrangement to actually take effect.
Add test cases accordingly.
There are notable regressions between a plain `-mno-bwx' configuration
and a `-mno-bwx -msafe-bwa' one:
FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O0 execution test
FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O1 execution test
FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O2 execution test
FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O3 -g execution test
FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -Os execution test
FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
FAIL: g++.dg/init/array25.C -std=c++17 execution test
FAIL: g++.dg/init/array25.C -std=c++98 execution test
FAIL: g++.dg/init/array25.C -std=c++26 execution test
They come from the fact that these test cases play tricks with alignment
and end up calling code that expects a reference to aligned data but is
handed one to unaligned data.
This doesn't cause a visible problem with plain `-mno-bwx' code, because
the resulting alignment exception is fixed up by Linux. There's no such
handling currently implemented for LDL_L or LDQ_L instructions (which
are first in the sequence) and consequently the offender is issued with
SIGBUS instead. Suitable handling will be added to Linux to complement
this change that will emulate the trapping instructions[1], so these
interim regressions are seen as harmless and expected.
References:
[1] "Alpha: Emulate unaligned LDx_L/STx_C for data consistency",
<https://lore.kernel.org/r/alpine.DEB.2.21.2502181912230.65342@angie.orcam.me.uk/>
gcc/
PR target/117759
* config/alpha/alpha-modes.def (OI): New integer mode.
* config/alpha/alpha-protos.h (alpha_expand_mov_safe_bwa): New
prototype.
* config/alpha/alpha.cc (alpha_expand_mov_safe_bwa): New
function.
(alpha_secondary_reload): Handle TARGET_SAFE_BWA.
* config/alpha/alpha.md (aligned_store_safe_bwa)
(unaligned_store<mode>_safe_bwa, reload_out<mode>_safe_bwa)
(reload_out<mode>_unaligned_safe_bwa): New expanders.
(mov<mode>, movcqi, reload_out<mode>_aligned): Handle
TARGET_SAFE_BWA.
(reload_out<mode>): Guard against TARGET_SAFE_BWA.
* config/alpha/alpha.opt (msafe-bwa): New option.
* config/alpha/alpha.opt.urls: Regenerate.
* doc/invoke.texi (Option Summary, DEC Alpha Options): Document
the new option.
gcc/testsuite/
PR target/117759
* gcc.target/alpha/stb.c: New file.
* gcc.target/alpha/stb-bwa.c: New file.
* gcc.target/alpha/stb-bwx.c: New file.
* gcc.target/alpha/stba.c: New file.
* gcc.target/alpha/stba-bwa.c: New file.
* gcc.target/alpha/stba-bwx.c: New file.
* gcc.target/alpha/stw.c: New file.
* gcc.target/alpha/stw-bwa.c: New file.
* gcc.target/alpha/stw-bwx.c: New file.
* gcc.target/alpha/stwa.c: New file.
* gcc.target/alpha/stwa-bwa.c: New file.
* gcc.target/alpha/stwa-bwx.c: New file.
|
|
Rename `emit_unlikely_jump' function to `alpha_emit_unlikely_jump', so
as to avoid namespace pollution, updating callers accordingly and export
it for use in the machine description. Make it return the insn emitted.
gcc/
* config/alpha/alpha-protos.h (alpha_emit_unlikely_jump): New
prototype.
* config/alpha/alpha.cc (emit_unlikely_jump): Rename to...
(alpha_emit_unlikely_jump): ... this. Return the insn emitted.
(alpha_split_atomic_op, alpha_split_compare_and_swap)
(alpha_split_compare_and_swap_12, alpha_split_atomic_exchange)
(alpha_split_atomic_exchange_12): Update call sites accordingly.
|
|
Windows only requires sections to be aligned on a 4-byte boundary. This used
to work because in binutils the `.rdata` section is over-aligned to a 16-byte
boundary, which will be fixed in the future.
This matches the output of Clang.
Signed-off-by: LIU Hao <lh_mouse@126.com>
Signed-off-by: Jonathan Yong <10walls@gmail.com>
gcc/ChangeLog:
* config/mingw/winnt.cc (mingw_pe_file_end): Add `.p2align`.
|
|
Based on r15-7624, a set of align combinations with better performance
was tested through spec2006.
LA464: -falign-loops=8 -falign-functions=32 -falign-jumps=32 -falign-labels=8
LA664: -falign-loops=16 -falign-functions=16 -falign-jumps=32 -falign-labels=8
gcc/ChangeLog:
* config/loongarch/loongarch-def.cc
(la464_align): Add settings for labels.
(la664_align): Likewise.
* config/loongarch/loongarch-opts.cc
(loongarch_target_option_override): Likewise.
* config/loongarch/loongarch-tune.h
(struct loongarch_align): Implement the function `label_`.
|
|
plus_constant expects integer as its third argument, not rtx.
gcc/ChangeLog:
* config/i386/i386.cc (ix86_redzone_clobber): Use integer, not rtx
as the third argument of plus_constant.
|
|
I still was seeing
;; 0--> b 0: i 101 {[sp-0x3c]=[sp-0x3c]+0x1;clobber flags;}:nothing
so the following adds a standard alu insn reservation mimicing that
from the znver.md description allowing both load and store.
PR target/119010
* config/i386/zn4zn5.md (znver4_insn_both, znver5_insn_both):
New reservation for ALU ops with load and store.
|
|
The following adds DFmode where V1DFmode and SFmode were handled.
This resolves missing reservations for adds, subs [with memory]
and for FMAs for the testcase I'm looking at. Resolved cases are
-;; 16--> b 0: i 237 xmm3=xmm3+[r9*0x8+si] :nothing
-;; 29--> b 0: i 246 xmm3=xmm3+xmm1 :nothing
-;; 46--> b 0: i 296 xmm1=xmm1-xmm3 :nothing
I've done search-and-replace for this, the catched cases look reasonable
though I'm of course not sure all of them can actually happen.
This also fixes the matched type for the znver{4,5}_sse_muladd_load
reservations from sseshuf to ssemuladd, resolving
-;; 1--> b 0: i 161 xmm0={-xmm0*xmm27+[cx+ax]} :nothing
-;; 22--> b 0: i 229 xmm11={-xmm11*xmm7+[di*0x8+dx]} :nothing
PR target/119010
* config/i386/zn4zn5.md (znver4_sse_add, znver4_sse_add_load,
znver5_sse_add_load, znver4_sse_add1, znver4_sse_add1_load,
znver5_sse_add1_load, znver4_sse_mul, znver4_sse_mul_load,
znver5_sse_mul_load, znver4_sse_cvt, znver4_sse_cvt_load,
znver5_sse_cvt_load, znver4_sse_shuf, znver5_sse_shuf,
znver4_sse_shuf_load, znver5_sse_shuf_load,
znver4_sse_cmp_avx128, znver5_sse_cmp_avx128,
znver4_sse_cmp_avx128_load, znver5_sse_cmp_avx128_load):
Also handle DFmode.
(znver4_sse_muladd_load, znver5_sse_muladd_load): Use
ssemuladd type.
|
|
This test has presumably been failing since vectorization was enabled
at -O2. I suspect part of the reason this wasn't picked up sooner is
that the test is a hybrid execution/scan-assembler test and the
execution part requires appropriate hardware.
The problem is that we are vectorizing an expansion of fmaxf() when
the vector version of the instruction does not preserve denormal
values. This means we should only apply this optimization when
-funsafe-math-optimizations is enabled.
This fix does a few things:
- Moves the expand pattern to vec-common.md. Although I haven't changed
its behaviour (beyond fixing the bug), this should really be enabled for
MVE as well (but that will need to wait for gcc-16 since the MVE code
needs some additional changes first).
- Adds support for HF mode vectors.
- splits the test that was exposing the bug into two parts: an executable
test and a scan-assembler test. The scan-assembler version is more
widely enabled, since it does not require a suitable executable environment.
gcc/ChangeLog:
* config/arm/neon.md (<fmaxmin><mode>3): Move pattern from here...
* config/arm/vec-common.md (<fmaxmin><mode>3): ... to here. Convert
to define_expand and disable the pattern when denormal values might
get truncated to zero. Iterate on VF to add V4HF and V8HF variants.
gcc/testsuite/ChangeLog:
* gcc.target/arm/fmaxmin.c: Move scan-assembler checks to ...
* gcc.target/arm/fmaxmin-2.c: ... here. New test.
|
|
"jm" should with "gpr16", otherwise maybe raise ICE in reload pass.
gcc/ChangeLog:
PR target/119425
* config/i386/sse.md:
(vec_set<mode>_0): Set the alternative with constraint "jm"'s
attribute "addr" to "gpr16".
(<mask_codefor>avx512dq_shuf_<shuffletype>64x2_1<mask_name>):
Ditto.
(avx512vl_shuf_<shuffletype>32x4_1<mask_name>): Ditto.
(avx2_pblendd<mode>): Ditto.
(aesenc): Ditto.
(aesenclast): Ditto.
(aesdec): Ditto.
(aesdeclast): Ditto.
(vaesdec_<mode>): Ditto.
(vaesdeclast_<mode>): Ditto.
(vaesenc_<mode>):: Ditto.
(vaesenclast_<mode>):: Ditto.
(aes<aesklvariant>u8): Ditto.
(*aes<aeswideklvariant>u8): Ditto.
gcc/testsuite/ChangeLog:
PR target/119425
* gcc.target/i386/pr119425.c: New test.
Co-authered-by: Hongyu Wang <hongyu.wang@intel.com>
|
|
In r14-3635 supports `__float128`, but does not support the 'q/Q' suffix.
PR target/119408
gcc/ChangeLog:
* config/loongarch/loongarch.cc
(loongarch_c_mode_for_suffix): New.
(TARGET_C_MODE_FOR_SUFFIX): Define.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/pr119408.c: New test.
|
|
..., so that users don't manually need to specify '-foffload-options=-lstdc++'
in addition to '-lstdc++' (specified manually, or implicitly by the driver).
Do like commit 4bcb46b3ade1796c5a57b294f5cca25f00671cac
"driver: Forward '-lgfortran', '-lm' to offloading compilation".
PR driver/101544
gcc/
* gcc.cc (driver_handle_option): Forward host '-lstdc++' to
offloading compilation.
* config/gcn/mkoffload.cc (main): Adjust.
* config/nvptx/mkoffload.cc (main): Likewise.
libgomp/
* testsuite/libgomp.c++/pr101544-1-O0.C: Remove
'-foffload-options=-lstdc++'.
* testsuite/libgomp.c++/pr101544-1.C: Likewise.
* testsuite/libgomp.oacc-c++/pr101544-1.C: Likewise.
|
|
The following testcase ICEs because a peephole2 attempts to offset
memory which is not offsettable (in particular address is a ZERO_EXTEND
in this case).
Because peephole2s don't support constraints, I've added a check for this
in the peephole2's condition.
2025-03-26 Jakub Jelinek <jakub@redhat.com>
PR target/119450
* config/i386/i386.md (narrow test peephole2): Test for
offsettable_memref_p in condition.
* gcc.target/i386/pr119450.c: New test.
|
|
The following resolves missing reservations for DFmode *movdf_internal
loads and stores, visible as 'nothing' in -fsched-verbose=2 dumps.
PR target/119010
* config/i386/zn4zn5.md (znver4_sse_mov_fp, znver4_sse_mov_fp_load,
znver5_sse_mov_fp_load, znver4_sse_mov_fp_store,
znver5_sse_mov_fp_store): Also match V1SF and DF.
|
|
The imov and imovx classified stores miss reservations in the znver4/5
pipeline description. The following adds them.
PR target/119010
* config/i386/zn4zn5.md (znver4_imov_double_store,
znver5_imov_double_store, znver4_imov_store, znver5_imov_store):
New reservations for integer stores.
|
|
This patch aims to add "s_" after 'cvt' represent saturation.
gcc/ChangeLog:
* config/i386/avx10_2-512convertintrin.h (_mm512_mask_cvtx2ps_ph): Formatting fixes
(_mm512_mask_cvtx_round2ps_ph): Ditto
(_mm512_maskz_cvtx_round2ps_ph): Ditto
(_mm512_cvtbiassph_bf8): Rename to _mm512_cvts_biasph_bf8.
(_mm512_mask_cvtbiassph_bf8): Rename to _mm512_mask_cvts_biasph_bf8.
(_mm512_maskz_cvtbiassph_bf8): Rename to _mm512_maskz_cvts_biasph_bf8.
(_mm512_cvtbiassph_hf8): Rename to _mm512_cvts_biasph_hf8.
(_mm512_mask_cvtbiassph_hf8): Rename to _mm512_mask_cvts_biasph_hf8.
(_mm512_maskz_cvtbiassph_hf8): Rename to _mm512_maskz_cvts_biasph_hf8.
(_mm512_cvts2ph_bf8): Rename to _mm512_cvts_2ph_bf8.
(_mm512_mask_cvts2ph_bf8): Rename to _mm512_mask_cvts_2ph_bf8.
(_mm512_maskz_cvts2ph_bf8): Rename to _mm512_maskz_cvts_2ph_bf8.
(_mm512_cvts2ph_hf8): Rename to _mm512_cvts_2ph_hf8.
(_mm512_mask_cvts2ph_hf8): Rename to _mm512_mask_cvts_2ph_hf8.
(_mm512_maskz_cvts2ph_hf8): Rename to _mm512_maskz_cvts_2ph_hf8.
(_mm512_cvtsph_bf8): Rename to _mm512_cvts_ph_bf8.
(_mm512_mask_cvtsph_bf8): Rename to _mm512_mask_cvts_ph_bf8.
(_mm512_maskz_cvtsph_bf8): Rename to _mm512_maskz_cvts_ph_bf8.
(_mm512_cvtsph_hf8): Rename to _mm512_cvts_ph_hf8.
(_mm512_mask_cvtsph_hf8): Rename to _mm512_mask_cvts_ph_hf8.
(_mm512_maskz_cvtsph_hf8): Rename to _mm512_maskz_cvts_ph_hf8.
* config/i386/avx10_2convertintrin.h
(_mm_cvtbiassph_bf8): Rename to _mm_cvts_biasph_bf8.
(_mm_mask_cvtbiassph_bf8): Rename to _mm_mask_cvts_biasph_bf8.
(_mm_maskz_cvtbiassph_bf8): Rename to _mm_maskz_cvts_biasph_bf8.
(_mm256_cvtbiassph_bf8): Rename to _mm256_cvts_biasph_bf8.
(_mm256_mask_cvtbiassph_bf8): Rename to _mm256_mask_cvts_biasph_bf8.
(_mm256_maskz_cvtbiassph_bf8): Rename to _mm256_maskz_cvts_biasph_bf8.
(_mm_cvtbiassph_hf8): Rename to _mm_cvts_biasph_hf8.
(_mm_mask_cvtbiassph_hf8): Rename to _mm_mask_cvts_biasph_hf8.
(_mm_maskz_cvtbiassph_hf8): Rename to _mm_maskz_cvts_biasph_hf8.
(_mm256_cvtbiassph_hf8): Rename to _mm256_cvts_biasph_hf8.
(_mm256_mask_cvtbiassph_hf8): Rename to _mm256_mask_cvts_biasph_hf8.
(_mm256_maskz_cvtbiassph_hf8): Rename to _mm256_maskz_cvts_biasph_hf8.
(_mm_cvts2ph_bf8): Rename to _mm_cvts_2ph_bf8.
(_mm_mask_cvts2ph_bf8): Rename to _mm_mask_cvts_2ph_bf8.
(_mm_maskz_cvts2ph_bf8): Rename to _mm_maskz_cvts_2ph_bf8.
(_mm256_cvts2ph_bf8): Rename to _mm256_cvts_2ph_bf8.
(_mm256_mask_cvts2ph_bf8): Rename to _mm256_mask_cvts_2ph_bf8.
(_mm256_maskz_cvts2ph_bf8): Rename to _mm256_maskz_cvts_2ph_bf8.
(_mm_cvts2ph_hf8): Rename to _mm_cvts_2ph_hf8.
(_mm_mask_cvts2ph_hf8): Rename to _mm_mask_cvts_2ph_hf8.
(_mm_maskz_cvts2ph_hf8): Rename to _mm_maskz_cvts_2ph_hf8.
(_mm256_cvts2ph_hf8): Rename to _mm256_cvts_2ph_hf8.
(_mm256_mask_cvts2ph_hf8): Rename to _mm256_mask_cvts_2ph_hf8.
(_mm256_maskz_cvts2ph_hf8): Rename to _mm256_maskz_cvts_2ph_hf8.
(_mm_cvtsph_bf8): Rename to _mm_cvts_ph_bf8.
(_mm_mask_cvtsph_bf8): Rename to _mm_mask_cvts_ph_bf8.
(_mm_maskz_cvtsph_bf8): Rename to _mm_maskz_cvts_ph_bf8.
(_mm256_cvtsph_bf8): Rename to _mm256_cvts_ph_bf8.
(_mm256_mask_cvtsph_bf8): Rename to _mm256_mask_cvts_ph_bf8.
(_mm256_maskz_cvtsph_bf8): Rename to _mm256_maskz_cvts_ph_bf8.
(_mm_cvtsph_hf8): Rename to _mm_cvts_ph_hf8.
(_mm_mask_cvtsph_hf8): Rename to _mm_mask_cvts_ph_hf8.
(_mm_maskz_cvtsph_hf8): Rename to _mm_maskz_cvts_ph_hf8.
(_mm256_cvtsph_hf8): Rename to _mm256_cvts_ph_hf8.
(_mm256_mask_cvtsph_hf8): Rename to _mm256_mask_cvts_ph_hf8.
(_mm256_maskz_cvtsph_hf8): Rename to _mm256_maskz_cvts_ph_hf8.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx10_2-512-convert-1.c: Modify function name
to follow the latest version.
* gcc.target/i386/avx10_2-512-vcvt2ph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvt2ph2hf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtbiasph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtbiasph2hf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtph2hf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-convert-1.c: Ditto.
|
|
The following patch is miscompiled from r15-8478 but latently already
since my r11-5756 and r11-6631 changes.
The r11-5756 change was
https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561164.html
which changed the splitters to immediately throw away the masking.
And the r11-6631 change was an optimization to recognize
(set (zero_extract:HI (...) (const_int 1) (...)) (const_int 1)
as btr.
The problem is their interaction. x86 is not a SHIFT_COUNT_TRUNCATED
target, so the masking needs to be explicit in the IL.
And combine.cc (make_field_assignment) has since 1992 optimizations
which try to optimize x &= (-2 r<< y) into zero_extract (x) = 0.
Now, such an optimization is fine if y has not been masked or if the
chosen zero_extract has the same mode as the rotate (or it recognizes
something with a left shift too). IMHO such optimization is invalid
for SHIFT_COUNT_TRUNCATED targets because we explicitly say that
the masking of the shift/rotate counts are redundant there and don't
need to be part of the IL (I have a patch for that, but because it
is just latent, I'm not sure it needs to be posted for gcc 15 (and
also am not sure if it should punt or add operand masking just in case)).
x86 is not SHIFT_COUNT_TRUNCATED though and so even fixing combine
not to do that for SHIFT_COUNT_TRUNCATED targets doesn't help, and we don't
have QImode insv, so it is optimized into HImode insertions. Now,
if the y in x &= (-2 r<< y) wasn't masked in any way, turning it into
HImode btr is just fine, but if it was x &= (-2 r<< (y & 7)) and we just
decided to throw away the masking, using btr changes the behavior on it
and causes e2fsprogs and sqlite miscompilations.
So IMHO on !SHIFT_COUNT_TRUNCATED targets, we need to keep the maskings
explicit in the IL, either at least for the duration of the combine pass
as does the following patch (where combine is the only known pass to have
such transformation), or even keep it until final pass in case there are
some later optimizations that would also need to know whether there was
explicit masking or not and with what mask. The latter change would be
much larger.
The following patch just reverts the r11-5756 change and adds a testcase.
2025-03-25 Jakub Jelinek <jakub@redhat.com>
PR target/96226
PR target/119428
* config/i386/i386.md (splitter after *<rotate_insn><mode>3_mask,
splitter after *<rotate_insn><mode>3_mask_1): Revert 2020-12-05
changes.
* gcc.c-torture/execute/pr119428.c: New test.
|
|
It seems the new expander triggers a latent issue in sched1 causing
extraneous spills in a different sad variant.
Given how close we are to gcc-15 release, disable it for now.
Since we do want to retain and re-enable this capabilty, manully disable
vs. reverting the orig patch which takes away the test case too.
Fix the orig test case to expect old codegen idiom (although vneg is no
longer emitted, in favor of vrsub).
Also add a new testcase which flags any future spills in the affected
routine.
PR target/119224
gcc/ChangeLog:
* config/riscv/autovec.md: Disable abd splitter.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr117722.c: Adjust output insn.
* gcc.target/riscv/rvv/autovec/pr119224.c: Add new test.
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
|
|
Prior to Armv6, the SMULL and UMULL instructions, which have the form
UMULL Rdlo, Rdhi, Rm, Rs
had an operand restriction such that Rdlo, Rdhi and Rm must all be
different registers. Rs, however can overlap either of the
destination registers. Add some register-tie alternatives to allow
the register allocator to find these forms without having to use
additional register moves.
In addition to this, the test is pretty meaningless on Thumb-1 targets
as the S/UMULL instructions do not exist in a 16-bit encoding. So skip
the test in this case.
gcc/ChangeLog:
* config/arm/arm.md (<US>mull): Add alternatives that allow Rs
to be tied to either Rdlo or Rdhi.
gcc/testsuite/ChangeLog:
* gcc.target/arm/pr42575.c: Skip test if thumb1.
|
|
aliases [PR101544]
Namely, use PTX '.alias' even for (default) '-mno-alias' if the host made the
C++ "base and complete [cd]tor aliases".
PR target/101544
gcc/
* config/nvptx/nvptx.cc (nvptx_asm_output_def_from_decls)
[ACCEL_COMPILER]: Special-case certain host-setup symbol aliases.
* varasm.cc (do_assemble_alias) [ACCEL_COMPILER]: Adjust.
|
|
gcc/
* config/nvptx/nvptx.cc (default_ptx_version_option): Default at
least to '-mptx=6.3'.
* doc/invoke.texi (Nvidia PTX Options): Update '-mptx=[...]'.
gcc/testsuite/
* gcc.target/nvptx/march-map=sm_30.c: Adjust.
* gcc.target/nvptx/march-map=sm_32.c: Likewise.
* gcc.target/nvptx/march-map=sm_35.c: Likewise.
* gcc.target/nvptx/march-map=sm_37.c: Likewise.
* gcc.target/nvptx/march-map=sm_50.c: Likewise.
* gcc.target/nvptx/march=sm_30.c: Likewise.
* gcc.target/nvptx/march=sm_35.c: Likewise.
* gcc.target/nvptx/march=sm_37.c: Likewise.
|
|
-mavx10.1 back with 512 bit alias
When AVX10.1 options are added into GCC 14, E-core is supposed to
support up to 256 bit vector width, while P-core up to 512 bit vector
width. Therefore, we added avx10.1-256 and avx10.1-512 options into
compiler since there will be real platforms with 256 bit only support.
At the same time, for old platforms could also compile a 256 bit only
binary, we introduced -mno-evex512 to disable 512 bit vector.
However, all the future platforms will now support 512 bit vector width,
including P-core and E-core. It will result in no need for split the
option for vector width. Therefore, we will remove them in this patch.
Unlike AVX10.2 options, AVX10.1 options has been there in a major
release, so we have to raise a deprecate warning in GCC 15 and remove
them in GCC 16. At the same time, to align with avx10.2 options, we will
add just removed avx10.1 option back with warning to mention its
behavior change.
gcc/ChangeLog:
* common/config/i386/cpuinfo.h
(get_available_features): Change to FEATURE_AVX10_1.
* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_AVX10_1_512_SET): Renamed to ...
(OPTION_MASK_ISA2_AVX10_1_SET): ... this.
(OPTION_MASK_ISA2_AVX10_2_SET): Use renamed macro.
(OPTION_MASK_ISA2_AVX10_1_UNSET): Ditto.
(ix86_handle_option): Ditto.
(processor_alias_table): Use P_PROC_AVX10_1.
* common/config/i386/i386-cpuinfo.h
(enum feature_priority): Rename from AVX10_1_512 to AVX10_1.
(enum processor_features): Ditto.
* common/config/i386/i386-isas.h: Add avx10.1.
* config/i386/driver-i386.cc
(host_detect_local_cpu): Use renamed enum.
* config/i386/i386-c.cc
(ix86_target_macros_internal): Rename to avx10.1.
* config/i386/i386-isa.def (AVX10_1_512): Rename to ...
(AVX10_1): ... this.
* config/i386/i386-options.cc (isa2_opts): Rename to avx10.1.
(ix86_valid_target_attribute_inner_p): Add avx10.1.
(ix86_option_override_internal): Rename to AVX10_1.
Revise warnings to mention behavior change for option
combination in GCC 16.
* config/i386/i386.h (PTA_DIAMONDRAPIDS): Use AVX10_1.
* config/i386/i386.opt: Add avx10.1.
Add deprecate warnings for mevex512 and mavx10.1-256/512.
* config/i386/i386.opt.urls: Add avx10.1.
* doc/extend.texi: Ditto.
* doc/sourcebuild.texi: Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx10-check.h: Change to avx10.1.
* gcc.target/i386/avx10_1-1.c: Add warning check.
* gcc.target/i386/avx10_1-10.c: Ditto.
* gcc.target/i386/avx10_1-11.c: Ditto.
* gcc.target/i386/avx10_1-12.c: Ditto.
* gcc.target/i386/avx10_1-13.c: Ditto.
* gcc.target/i386/avx10_1-15.c: Ditto.
* gcc.target/i386/avx10_1-16.c: Ditto.
* gcc.target/i386/avx10_1-18.c: Ditto.
* gcc.target/i386/avx10_1-19.c: Ditto.
* gcc.target/i386/avx10_1-2.c: Ditto.
* gcc.target/i386/avx10_1-20.c: Ditto.
* gcc.target/i386/avx10_1-21.c: Ditto.
* gcc.target/i386/avx10_1-22.c: Ditto.
* gcc.target/i386/avx10_1-23.c: Ditto.
* gcc.target/i386/avx10_1-26.c: Ditto.
* gcc.target/i386/avx10_1-3.c: Ditto.
* gcc.target/i386/avx10_1-4.c: Ditto.
* gcc.target/i386/avx10_1-7.c: Ditto.
* gcc.target/i386/avx10_1-8.c: Ditto.
* gcc.target/i386/avx10_1-9.c: Ditto.
* gcc.target/i386/noevex512-1.c: Ditto.
* gcc.target/i386/noevex512-2.c: Ditto.
* gcc.target/i386/pr111068.c: Ditto.
* gcc.target/i386/pr111907.c: Ditto.
* gcc.target/i386/pr117240_avx512f.c: Ditto.
* gcc.target/i386/pr117304-1.c: Ditto.
* gcc.target/i386/pr117946.c: Ditto.
* gcc.target/i386/avx10_1-24.c: Removed.
* gcc.target/i386/avx10_1-25.c: Removed.
* gcc.target/i386/avx10_1-5.c: Removed.
* gcc.target/i386/avx10_1-6.c: Removed.
|
|
When AVX10.2 options are added into GCC 15, E-core is supposed to
support up to 256 bit vector width, while P-core up to 512 bit vector
width. Therefore, we added avx10.2-256 and avx10.2-512 options into
compiler since there will be real platforms with 256 bit only support.
However, all the future platforms will now support 512 bit vector width,
including P-core and E-core. It will result in no need for split the
option for vector width. Therefore, we will remove them in this patch.
gcc/ChangeLog:
* common/config/i386/cpuinfo.h
(get_available_features): Revise the logic AVX10 version.
* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_AVX10_2_256_SET): Removed.
(OPTION_MASK_ISA2_AVX10_2_512_SET): Ditto.
(OPTION_MASK_ISA2_AVX10_2_SET): New.
(OPTION_MASK_ISA2_AMX_AVX512_SET): Use AVX10.2 macro.
(OPTION_MASK_ISA2_AVX10_2_UNSET): Ditto.
(ix86_handle_option): Remove avx10.2-256 part. Adjust avx10.2.
* common/config/i386/i386-cpuinfo.h
(enum processor_features): Remove FEATURE_AVX10_2_256 and skip
the value for it. Change the name from FEATURE_AVX10_2_512 to
FEATURE_AVX10_2.
* common/config/i386/i386-isas.h: Remove avx10.2-256/512.
* config/i386/avx10_2-512bf16intrin.h: Use avx10.2 instead of
avx10.2-256/512.
* config/i386/avx10_2-512convertintrin.h: Ditto.
* config/i386/avx10_2-512mediaintrin.h: Ditto.
* config/i386/avx10_2-512minmaxintrin.h: Ditto.
* config/i386/avx10_2-512satcvtintrin.h: Ditto.
* config/i386/avx10_2bf16intrin.h: Ditto.
* config/i386/avx10_2convertintrin.h: Ditto.
* config/i386/avx10_2mediaintrin.h: Ditto.
* config/i386/avx10_2minmaxintrin.h: Ditto.
* config/i386/avx10_2satcvtintrin.h: Ditto.
* config/i386/movrsintrin.h: Ditto.
* config/i386/sm4intrin.h: Ditto.
* config/i386/cpuid.h (bit_AVX10_256): Removed.
(bit_AVX10_512): Ditto.
* config/i386/driver-i386.cc (host_detect_local_cpu): Adjust
Diamond Rapids and -march=native condition.
* config/i386/i386-builtin.def (BDESC): Use AVX10.2 macro
instead of AVX10.2-256/512.
* config/i386/i386-c.cc (ix86_target_macros_internal): Ditto.
* config/i386/i386-expand.cc
(ix86_expand_branch): Use TARGET_AVX10_2 instead of specifying
vector size.
(ix86_prepare_fp_compare_args): Ditto.
(ix86_expand_fp_compare): Ditto.
(ix86_ssecom_setcc): Ditto.
(ix86_expand_sse_comi): Ditto.
(ix86_expand_sse_comi_round): Ditto.
(ix86_check_builtin_isa_match): Ditto.
* config/i386/i386.cc (ix86_fp_compare_code_to_integer): Ditto.
(ix86_get_mask_mode): Ditto.
* config/i386/i386.h (SSE_FLOAT_MODE_SSEMATH_OR_HFBF_P): Ditto.
* config/i386/i386.md: Ditto.
* config/i386/mmx.md: Ditto.
* config/i386/sse.md: Ditto.
* config/i386/predicates.md: Ditto.
* config/i386/i386-isa.def (AVX10_2_256): Removed.
(AVX10_2_512): Removed.
(AVX10_2): New.
* config/i386/i386-options.cc
(isa2_opts): Remove avx10.2-256/512.
(ix86_valid_target_attribute_inner_p): Ditto.
(PTA_DIAMONDRAPIDS): Use PTA_AVX10_2.
* config/i386/i386.opt: Remove avx10.2-256/512.
* config/i386/i386.opt.urls: Ditto.
* doc/extend.texi: Ditto.
* doc/invoke.texi: Ditto.
* doc/sourcebuild.texi: Ditto.
|
|
This reverts commit e22e3af1954469c40b139b7cfa8e7708592f4bfd.
|
|
This reverts commit 85e874d19548f0dcb9a3f14f9e4b1e3411c88c4b.
|
|
This reverts commit 508ac49e1a94c28346642bff512d0ed5f4f58b64.
|
|
vcvtph2{,u}{dq,qq} intrins"
This reverts commit 6f2eac53b6026836f3222961c32312e02c2c7dbc.
|
|
This reverts commit b70bb94aca7bc10a54f744d793c32c51f91ce195.
|
|
This reverts commit 0f5a42d41b46b746c6f77374d76a3b918a1e2b57.
|
|
vcvttpd2{,u}{dq,qq} intrins"
This reverts commit 6e231f8504874828b23bbe89f3ef4086dcc15a44.
|
|
This reverts commit 493c5096050523ebc05e5fa21612683a996b97a7.
|
|
vcvtu{dq,qq}2p{s,d,h} intrins"
This reverts commit b2754227139512adecb6fda067632b587ff4a017.
|
|
This reverts commit 3d1b5530ea1d23e26dc5ab70aa4a2e7b9dc19b50.
|
|
This reverts commit 95980b292b24110d3f1dffb81926df23c61b4fe7.
|
|
This reverts commit 0683ca355a87fd36a2e7ae1721199204ceff4c4c.
|
|
vfmaddsub{132,231,213}p{s,d,h} intrins"
This reverts commit cfbc94eaf167ae7aecd21ee6054556e1cf9d7143.
|
|
intrins"
This reverts commit dd48acbe85ca55dd23ffafbb917ffe559d13b6a3.
|