path: root/gcc/config
Age | Commit message | Author | Files | Lines
40 hours | GCN: Don't emit weak undefined symbols [PR119369] | Thomas Schwinge | 3 | -0/+20
This resolves all instances of PR119369 "GCN: weak undefined symbols -> execution test FAIL, 'HSA_STATUS_ERROR_VARIABLE_UNDEFINED'"; for all affected test cases, the execution test status progresses FAIL -> PASS.

This however also causes a small number of (expected) regressions, very similar to GCC/nvptx:

    [-PASS:-]{+FAIL:+} g++.dg/abi/pure-virtual1.C -std=c++17 (test for excess errors)
    [-PASS:-]{+FAIL:+} g++.dg/abi/pure-virtual1.C -std=c++26 (test for excess errors)
    [-PASS:-]{+FAIL:+} g++.dg/abi/pure-virtual1.C -std=c++98 (test for excess errors)

    [-PASS:-]{+FAIL:+} g++.dg/cpp0x/pr84497.C -std=c++11 scan-assembler .weak[ \t]*_?_ZTH11derived_obj
    [-PASS:-]{+FAIL:+} g++.dg/cpp0x/pr84497.C -std=c++11 scan-assembler .weak[ \t]*_?_ZTH13container_obj
    [-PASS:-]{+FAIL:+} g++.dg/cpp0x/pr84497.C -std=c++11 scan-assembler .weak[ \t]*_?_ZTH8base_obj
    PASS: g++.dg/cpp0x/pr84497.C -std=c++11 (test for excess errors)
    [-PASS:-]{+FAIL:+} g++.dg/cpp0x/pr84497.C -std=c++17 scan-assembler .weak[ \t]*_?_ZTH11derived_obj
    [-PASS:-]{+FAIL:+} g++.dg/cpp0x/pr84497.C -std=c++17 scan-assembler .weak[ \t]*_?_ZTH13container_obj
    [-PASS:-]{+FAIL:+} g++.dg/cpp0x/pr84497.C -std=c++17 scan-assembler .weak[ \t]*_?_ZTH8base_obj
    PASS: g++.dg/cpp0x/pr84497.C -std=c++17 (test for excess errors)
    [-PASS:-]{+FAIL:+} g++.dg/cpp0x/pr84497.C -std=c++26 scan-assembler .weak[ \t]*_?_ZTH11derived_obj
    [-PASS:-]{+FAIL:+} g++.dg/cpp0x/pr84497.C -std=c++26 scan-assembler .weak[ \t]*_?_ZTH13container_obj
    [-PASS:-]{+FAIL:+} g++.dg/cpp0x/pr84497.C -std=c++26 scan-assembler .weak[ \t]*_?_ZTH8base_obj
    PASS: g++.dg/cpp0x/pr84497.C -std=c++26 (test for excess errors)

    [-PASS:-]{+FAIL:+} g++.dg/ext/weak2.C -std=gnu++17 scan-assembler weak[^ \t]*[ \t]_?_Z3foov
    PASS: g++.dg/ext/weak2.C -std=gnu++17 (test for excess errors)
    [-PASS:-]{+FAIL:+} g++.dg/ext/weak2.C -std=gnu++26 scan-assembler weak[^ \t]*[ \t]_?_Z3foov
    PASS: g++.dg/ext/weak2.C -std=gnu++26 (test for excess errors)
    [-PASS:-]{+FAIL:+} g++.dg/ext/weak2.C -std=gnu++98 scan-assembler weak[^ \t]*[ \t]_?_Z3foov
    PASS: g++.dg/ext/weak2.C -std=gnu++98 (test for excess errors)

    [-PASS:-]{+FAIL:+} gcc.dg/attr-weakref-1.c (test for excess errors)
    [-FAIL:-]{+UNRESOLVED:+} gcc.dg/attr-weakref-1.c [-execution test-]{+compilation failed to produce executable+}

    @@ -131211,25 +131211,25 @@ PASS: gcc.dg/weak/weak-1.c scan-assembler weak[^ \t]*[ \t]_?c
    PASS: gcc.dg/weak/weak-1.c scan-assembler weak[^ \t]*[ \t]_?d
    PASS: gcc.dg/weak/weak-1.c scan-assembler weak[^ \t]*[ \t]_?e
    PASS: gcc.dg/weak/weak-1.c scan-assembler weak[^ \t]*[ \t]_?g
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-1.c scan-assembler weak[^ \t]*[ \t]_?j
    PASS: gcc.dg/weak/weak-1.c scan-assembler-not weak[^ \t]*[ \t]_?i
    PASS: gcc.dg/weak/weak-12.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-12.c scan-assembler weak[^ \t]*[ \t]_?foo
    PASS: gcc.dg/weak/weak-15.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-15.c scan-assembler weak[^ \t]*[ \t]_?a
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-15.c scan-assembler weak[^ \t]*[ \t]_?c
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-15.c scan-assembler weak[^ \t]*[ \t]_?d
    PASS: gcc.dg/weak/weak-15.c scan-assembler-not weak[^ \t]*[ \t]_?b
    PASS: gcc.dg/weak/weak-16.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-16.c scan-assembler weak[^ \t]*[ \t]_?kallsyms_token_index
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-16.c scan-assembler weak[^ \t]*[ \t]_?kallsyms_token_table
    PASS: gcc.dg/weak/weak-2.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-2.c scan-assembler weak[^ \t]*[ \t]_?ffoo1a
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-2.c scan-assembler weak[^ \t]*[ \t]_?ffoo1b
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-2.c scan-assembler weak[^ \t]*[ \t]_?ffoo1c
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-2.c scan-assembler weak[^ \t]*[ \t]_?ffoo1e
    PASS: gcc.dg/weak/weak-2.c scan-assembler-not weak[^ \t]*[ \t]_?ffoo1d
    PASS: gcc.dg/weak/weak-3.c (test for warnings, line 58)
    PASS: gcc.dg/weak/weak-3.c (test for warnings, line 73)
    PASS: gcc.dg/weak/weak-3.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-3.c scan-assembler weak[^ \t]*[ \t]_?ffoo1a
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-3.c scan-assembler weak[^ \t]*[ \t]_?ffoo1b
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-3.c scan-assembler weak[^ \t]*[ \t]_?ffoo1c
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-3.c scan-assembler weak[^ \t]*[ \t]_?ffoo1e
    PASS: gcc.dg/weak/weak-3.c scan-assembler weak[^ \t]*[ \t]_?ffoo1f
    PASS: gcc.dg/weak/weak-3.c scan-assembler weak[^ \t]*[ \t]_?ffoo1g
    PASS: gcc.dg/weak/weak-3.c scan-assembler-not weak[^ \t]*[ \t]_?ffoo1d
    PASS: gcc.dg/weak/weak-4.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-4.c scan-assembler weak[^ \t]*[ \t]_?vfoo1a
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-4.c scan-assembler weak[^ \t]*[ \t]_?vfoo1b
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-4.c scan-assembler weak[^ \t]*[ \t]_?vfoo1c
    PASS: gcc.dg/weak/weak-4.c scan-assembler weak[^ \t]*[ \t]_?vfoo1d
    PASS: gcc.dg/weak/weak-4.c scan-assembler weak[^ \t]*[ \t]_?vfoo1e
    PASS: gcc.dg/weak/weak-4.c scan-assembler weak[^ \t]*[ \t]_?vfoo1f
    @@ -131267,16 +131267,16 @@ PASS: gcc.dg/weak/weak-4.c scan-assembler weak[^ \t]*[ \t]_?vfoo1i
    PASS: gcc.dg/weak/weak-4.c scan-assembler weak[^ \t]*[ \t]_?vfoo1j
    PASS: gcc.dg/weak/weak-4.c scan-assembler weak[^ \t]*[ \t]_?vfoo1k
    PASS: gcc.dg/weak/weak-5.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1a
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1b
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1c
    PASS: gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1d
    PASS: gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1e
    PASS: gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1f
    PASS: gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1g
    PASS: gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1h
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1i
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1j
    PASS: gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1k
    PASS: gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1l

These get 'dg-xfail-if'ed or 'dg-skip-if'ed, (mostly) similar to GCC/nvptx.

        PR target/119369
        gcc/
        * config/gcn/gcn-protos.h (gcn_asm_weaken_decl): Declare.
        * config/gcn/gcn.cc (gcn_asm_weaken_decl): New.
        * config/gcn/gcn-hsa.h (ASM_WEAKEN_DECL): '#define' to this.
        gcc/testsuite/
        * g++.dg/abi/pure-virtual1.C: 'dg-xfail-if' GCN.
        * g++.dg/cpp0x/pr84497.C: 'dg-skip-if' GCN.
        * g++.dg/ext/weak2.C: Likewise.
        * gcc.dg/attr-weakref-1.c: Likewise.
        * gcc.dg/weak/weak-1.c: Likewise.
        * gcc.dg/weak/weak-12.c: Likewise.
        * gcc.dg/weak/weak-15.c: Likewise.
        * gcc.dg/weak/weak-16.c: Likewise.
        * gcc.dg/weak/weak-2.c: Likewise.
        * gcc.dg/weak/weak-3.c: Likewise.
        * gcc.dg/weak/weak-4.c: Likewise.
        * gcc.dg/weak/weak-5.c: Likewise.
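For context, a weak undefined symbol is a declaration marked weak for which no definition is ever linked in. The following is a minimal sketch in C, assuming an ELF host toolchain (for example GCC or Clang on Linux); `optional_hook` and `call_hook_or_default` are hypothetical names chosen for illustration and do not come from the commit or the affected tests.

```c
/* A weak declaration with no definition anywhere in the program.
   On an ELF host the linker resolves the symbol to a null address,
   so the test below is false at run time.  The PR title reports that
   on GCN such symbols instead made execution tests fail with
   HSA_STATUS_ERROR_VARIABLE_UNDEFINED.  */
extern int optional_hook (void) __attribute__ ((weak));

/* Call the hook when one was linked in, otherwise return FALLBACK.  */
int
call_hook_or_default (int fallback)
{
  if (optional_hook)
    return optional_hook ();
  return fallback;
}
```

Code of this shape relies on the assembler output containing a weak reference, which is what the scan-assembler tests listed above look for.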
41 hours | target/119549 - fixup handling of -mno-sse4 in target attribute | Richard Biener | 1 | -0/+7
The following fixes ix86_valid_target_attribute_inner_p to properly handle target("no-sse4") via OPT_mno_sse4 rather than as unset OPT_msse4. I've added asserts to ix86_handle_option that RejectNegative is honored for both. PR target/119549 * common/config/i386/i386-common.cc (ix86_handle_option): Assert that both OPT_msse4 and OPT_mno_sse4 are never unset. * config/i386/i386-options.cc (ix86_valid_target_attribute_inner_p): Process negated OPT_msse4 as OPT_mno_sse4. * gcc.target/i386/pr119549.c: New testcase.
47 hours | i386: Add attr_isa for vaes patterns to sync with attr gpr16. [PR119473] | Hu, Lin1 | 1 | -4/+16
For vaes patterns with the "jm" constraint and the "gpr16" attribute, an "isa" attribute is required to distinguish the avx and avx512 alternatives in ix86_memory_address_reg_class. Also add the missing "type" and "mode" attributes for those vaes patterns.

gcc/ChangeLog:

        PR target/119473
        * config/i386/sse.md (vaesdec_<mode>): Set attr "isa" as "avx,vaes_avx512vl", "type" as "sselog1", "mode" as "TI".
        (vaesdeclast_<mode>): Ditto.
        (vaesenc_<mode>): Ditto.
        (vaesenclast_<mode>): Ditto.

gcc/testsuite/ChangeLog:

        PR target/119473
        * gcc.target/i386/pr119473.c: New test.

Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
2 days | RISC-V: Fix wrong LMUL when only implicit zve32f. | Monk Chiang | 4 | -334/+336
According to Section 3.4.2, Vector Register Grouping, in the RISC-V Vector Specification, the rule for LMUL is

    LMUL >= SEW/ELEN

Changes since V2:
- Add check on vector-iterators.md
- Add one more testcase to check that VLS uses the correct mode.

gcc/ChangeLog:

        * config/riscv/riscv-v.cc: Add restrict for insert LMUL.
        * config/riscv/riscv-vector-builtins-types.def: Use RVV_REQUIRE_ELEN_64 to check LMUL number.
        * config/riscv/riscv-vector-switch.def: Likewise.
        * config/riscv/vector-iterators.md: Check TARGET_VECTOR_ELEN_64 rather than "TARGET_MIN_VLEN > 32" for all iterator.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/autovec/pr111391-2.c: Update test.
        * gcc.target/riscv/rvv/base/abi-14.c: Update test.
        * gcc.target/riscv/rvv/base/abi-16.c: Update test.
        * gcc.target/riscv/rvv/base/abi-18.c: Update test.
        * gcc.target/riscv/rvv/base/vsetvl_zve32-1.c: New test.
        * gcc.target/riscv/rvv/base/vsetvl_zve32-2.c: New test.

Co-authored-by: Kito Cheng <kito.cheng@sifive.com>
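The rule quoted from the specification can be checked with integer arithmetic alone. The following is a minimal sketch in C and is not the code from riscv-v.cc; `lmul_is_legal` is a hypothetical helper, and LMUL is passed as a fraction so that fractional groupings such as 1/8 can be expressed.

```c
#include <stdbool.h>

/* Return true when LMUL = lmul_num / lmul_den satisfies
   LMUL >= SEW / ELEN.  Cross-multiplying avoids fractions:
   lmul_num * ELEN >= SEW * lmul_den.  */
bool
lmul_is_legal (unsigned sew, unsigned elen,
               unsigned lmul_num, unsigned lmul_den)
{
  return lmul_num * elen >= sew * lmul_den;
}
```

With ELEN = 32, as for zve32f, SEW = 8 permits LMUL = 1/4 but not LMUL = 1/8. With ELEN = 64, LMUL = 1/8 becomes legal for SEW = 8.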
2 days | aarch64: Remove +sme -> +sve2 feature flag dependency | Andre Simoes Dias Vieira | 2 | -2/+11
As per the AArch64 ISA FEAT_SME does not require FEAT_SVE2. However, we don't support SME without SVE2 and bail out with a 'sorry' if this configuration is encountered. We may choose to support this in the future. gcc/ChangeLog: * config/aarch64/aarch64-option-extensions.def (SME): Remove SVE2 as prerequisite and add in FCMA and F16FML. * config/aarch64/aarch64.cc (aarch64_override_options_internal): Diagnose use of SME without SVE2 and implicitly enable SVE2 when enabling SME after streaming mode diagnosis. * doc/invoke.texi (sme): Document that this can only be used with the sve2 extension. gcc/testsuite/ChangeLog: * gcc.target/aarch64/no-sve-with-sme-1.c: New. * gcc.target/aarch64/no-sve-with-sme-2.c: New. * gcc.target/aarch64/no-sve-with-sme-3.c: New. * gcc.target/aarch64/no-sve-with-sme-4.c: New. * gcc.target/aarch64/pragma_cpp_predefs_4.c: Pass +sve2 to existing +sme pragma. * gcc.target/aarch64/sve/acle/general-c/binary_int_opt_single_n_2.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/binary_opt_single_n_2.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/binary_single_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/binary_za_slice_int_opt_single_1.c: * gcc.target/aarch64/sve/acle/general-c/binary_za_slice_lane_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/binary_za_slice_lane_2.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/binary_za_slice_lane_3.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/binary_za_slice_lane_4.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/binary_za_slice_opt_single_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/binary_za_slice_opt_single_2.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/binary_za_slice_opt_single_3.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/binary_za_slice_uint_opt_single_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/binaryxn_2.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/clamp_1.c: Likewise. 
* gcc.target/aarch64/sve/acle/general-c/compare_scalar_count_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/dot_za_slice_int_lane_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/dot_za_slice_lane_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/dot_za_slice_lane_2.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/dot_za_slice_uint_lane_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/shift_right_imm_narrowxn_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/storexn_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_group_selection_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/ternary_qq_or_011_lane_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/unary_convertxn_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/unary_convertxn_narrow_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/unary_convertxn_narrowt_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/unary_za_slice_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/unaryxn_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/write_za_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/write_za_slice_1.c: Likewise.
3 days | target/119010 - add mode attribute to *vmovv16si_constm1_pternlog_false_dep | Richard Biener | 1 | -1/+5
Like the other instances. This avoids ;; 1--> b 0: i6540 {xmm2=const_vector;unspec[xmm2] 38;} :nothing PR target/119010 * config/i386/sse.md (*vmov<mode>_constm1_pternlog_false_dep): Add mode attribute.
3 days | target/119010 - Zen4/Zen5 reservations for movlhps loads | Richard Biener | 1 | -2/+2
The following fixes up the ssemov2 type introduction, amending the znver4_sse_mov_fp_load reservation. This fixes ;; 14--> b 0: i1436 xmm6=vec_concat(xmm6,[ax+0x8]) :nothing PR target/119010 * config/i386/zn4zn5.md (znver4_sse_mov_fp_load, znver5_sse_mov_fp_load): Also match ssemov2.
3 days | target/119010 - reservations for Zen4/Zen5 movhlps to memory | Richard Biener | 1 | -0/+14
The following adds missing reservations for the store variant of sselog reservations covering ;; 112--> b 0: i1499 [dx-0x10]=vec_select(xmm10,parallel) :nothing PR target/119010 * config/i386/zn4zn5.md (znver4_sse_log_evex_store, znver5_sse_log_evex_store): New reservations.
3 days | target/119010 - fixup Zen4/Zen5 fp<->int convert reservations | Richard Biener | 1 | -3/+10
They were using ssecvt instead of sseicvt. I've also added handling for sseicvt2, which was introduced without fixing up the automata; the relevant instruction uses DFmode. IMO this is quite a messy area that could use some TLC in the machine description itself.

        PR target/119010
        * config/i386/zn4zn5.md (znver4_sse_icvt): Use sseicvt.
        (znver4_sse_icvt_store): Likewise.
        (znver5_sse_icvt_store): Likewise.
        (znver4_sse_icvt2): New.
3 days | target/119010 - handle DFmode in SSE divide reservations for Zen4/Zen5 | Richard Biener | 1 | -3/+3
Like the other DFmode cases. PR target/119010 * config/i386/zn4zn5.md (znver4_sse_div_pd, znver4_sse_div_pd_load, znver5_sse_div_pd_load): Handle DFmode.
3 days | target/119010 - add reservations for integer vector compares to zen4/zen5 | Richard Biener | 1 | -6/+6
The following handles TI, OI and XI mode in the respective EVEX compare reservations that do not use memory (I've not yet run into ones that do). The znver automata has separate reservations for integer compares (but only for zen1; for zen2 and zen3 there are no compare reservations at all), but I don't see why that should be necessary here.

        PR target/119010
        * config/i386/zn4zn5.md (znver4_sse_cmp_avx128, znver5_sse_cmp_avx128): Handle TImode.
        (znver4_sse_cmp_avx256, znver5_sse_cmp_avx256): Handle OImode.
        (znver4_sse_cmp_avx512, znver5_sse_cmp_avx512): Handle XImode.
3 days | target/119010 - missing reservations for Zen4/5 and SSE compares | Richard Biener | 1 | -3/+2
There's the znver4_sse_test reservation which matches the memory-less SSE compares but currently requires prefix_extra == 1. The old znver automata in this case sometimes uses znver1-double instead of znver1-direct, but it's quite a maze. The following simply drops the prefix_extra requirement, but I have no idea what I'm doing here. There doesn't seem to be any documentation on the scheduler relevant attributes used, or at least I cannot find that. PR target/119010 * config/i386/zn4zn5.md (znver4_sse_test): Drop test of prefix_extra attribute.
3 days | target/119010 - fixup zn4zn5 reservation for move from const_vector | Richard Biener | 1 | -1/+8
movv8si_internal uses sselog1 and V4SFmode for an instruction like

    (insn 363 2437 371 97 (set (reg:V8SI 46 xmm10 [1125])
            (const_vector:V8SI [
                    (const_int 0 [0]) repeated x8
                ])) "ComputeNonbondedUtil.C":185:21 2402 {movv8si_internal}

This wasn't caught by the existing znver4_sse_log1 reservation. I think the znver automaton catches this with the generic

    (define_insn_reservation "znver1_sse_log1" 1
                             (and (eq_attr "cpu" "znver1,znver2,znver3")
                                  (and (eq_attr "type" "sselog1")
                                       (eq_attr "memory" "none")))
                             "znver1-direct,znver1-fp1|znver1-fp2")

which does not look at the mode at all. The zn4zn5 automaton lacks this and instead has separated store and load-store reservations in odd ways. The following renames the store one and introduces a none variant.

        PR target/119010
        * config/i386/zn4zn5.md (znver4_sse_log1): Rename to znver4_sse_log1_store.
        (znver5_sse_log1): Rename to znver5_sse_log1_store.
        (znver4_sse_log1): New memory-less variant.
3 days | Alpha: Add option to avoid data races for partial writes [PR117759] | Maciej W. Rozycki | 5 | -35/+603
Similarly to data races with 8-bit byte or 16-bit word quantity memory writes on non-BWX Alpha implementations we have the same problem even on BWX implementations with partial memory writes produced for unaligned stores as well as block memory move and clear operations. This happens at the boundaries of the area written where we produce unprotected RMW sequences, such as for example: ldbu $1,0($3) stw $31,8($3) stq $1,0($3) to zero a 9-byte member at the byte offset of 1 of a quadword-aligned struct, happily clobbering a 1-byte member at the beginning of said struct if concurrent write happens while executing on the same CPU such as in a signal handler or a parallel write happens while executing on another CPU such as in another thread or via a shared memory segment. To guard against these data races with partial memory write accesses introduce the `-msafe-partial' command-line option that instructs the compiler to protect boundaries of the data quantity accessed by instead using a longer code sequence composed of narrower memory writes where suitable machine instructions are available (i.e. with BWX targets) or atomic RMW access sequences where byte and word memory access machine instructions are not available (i.e. with non-BWX targets). Owing to the desire of branch avoidance there are redundant overlapping writes in unaligned cases where STQ_U operations are used in the middle of a block so as to make sure no part of data to be written has been lost regardless of run-time alignment. For the non-BWX case it means that with blocks whose size is not a multiple of 8 there are additional atomic RMW sequences issued towards the end of the block in addition to the always required pair enclosing the block from each end. Only one such additional atomic RMW sequence is actually required, but code currently issues two for the sake of simplicity. 
An improvement might be added to `alpha_expand_unaligned_store_words_safe_partial' in the future, by folding `alpha_expand_unaligned_store_safe_partial' code for handling multi-word blocks whose size is not a multiple of 8 (i.e. with a trailing partial-word part). It would improve performance a bit, but current code is correct regardless. Update test cases with `-mno-safe-partial' where required and add new ones accordingly. In some cases GCC chooses to open-code block memory write operations, so with non-BWX targets `-msafe-partial' will in the usual case have to be used together with `-msafe-bwa'. Credit to Magnus Lindholm <linmag7@gmail.com> for sharing hardware for the purpose of verifying the BWX side of this change. gcc/ PR target/117759 * config/alpha/alpha-protos.h (alpha_expand_unaligned_store_safe_partial): New prototype. * config/alpha/alpha.cc (alpha_expand_movmisalign) (alpha_expand_block_move, alpha_expand_block_clear): Handle TARGET_SAFE_PARTIAL. (alpha_expand_unaligned_store_safe_partial) (alpha_expand_unaligned_store_words_safe_partial) (alpha_expand_clear_safe_partial_nobwx): New functions. * config/alpha/alpha.md (insvmisaligndi): Handle TARGET_SAFE_PARTIAL. * config/alpha/alpha.opt (msafe-partial): New option. * config/alpha/alpha.opt.urls: Regenerate. * doc/invoke.texi (Option Summary, DEC Alpha Options): Document the new option. gcc/testsuite/ PR target/117759 * gcc.target/alpha/memclr-a2-o1-c9-ptr.c: Add `-mno-safe-partial'. * gcc.target/alpha/memclr-a2-o1-c9-ptr-safe-partial.c: New file. * gcc.target/alpha/memcpy-di-unaligned-dst.c: New file. * gcc.target/alpha/memcpy-di-unaligned-dst-safe-partial.c: New file. * gcc.target/alpha/memcpy-di-unaligned-dst-safe-partial-bwx.c: New file. * gcc.target/alpha/memcpy-si-unaligned-dst.c: New file. * gcc.target/alpha/memcpy-si-unaligned-dst-safe-partial.c: New file. * gcc.target/alpha/memcpy-si-unaligned-dst-safe-partial-bwx.c: New file. * gcc.target/alpha/stlx0.c: Add `-mno-safe-partial'. 
* gcc.target/alpha/stlx0-safe-partial.c: New file. * gcc.target/alpha/stlx0-safe-partial-bwx.c: New file. * gcc.target/alpha/stqx0.c: Add `-mno-safe-partial'. * gcc.target/alpha/stqx0-safe-partial.c: New file. * gcc.target/alpha/stqx0-safe-partial-bwx.c: New file. * gcc.target/alpha/stwx0.c: Add `-mno-safe-partial'. * gcc.target/alpha/stwx0-bwx.c: Add `-mno-safe-partial'. Refer to stwx0.c rather than copying its code and also verify no LDQ_U or STQ_U instructions have been produced. * gcc.target/alpha/stwx0-safe-partial.c: New file. * gcc.target/alpha/stwx0-safe-partial-bwx.c: New file.
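The lost-update hazard described in this commit message can be modelled on any host. The sketch below is a single-threaded C simulation under stated assumptions: it is neither Alpha code nor GCC's expander, the concurrent writer is represented by a hypothetical callback, and all function names are invented for illustration.

```c
#include <string.h>

/* Hypothetical callback standing in for a signal handler or another
   thread that writes to the buffer in the middle of the sequence.  */
typedef void (*intruder_fn) (unsigned char *buf);

/* Model of the unprotected sequence quoted in the commit message:
   zero bytes 1..9 of a quadword-aligned buffer.  */
void
clear_unprotected (unsigned char *buf, intruder_fn intruder)
{
  unsigned char quad[8] = { 0 };
  quad[0] = buf[0];             /* ldbu $1,0($3): read byte 0.  */
  buf[8] = 0;                   /* stw $31,8($3): zero bytes 8..9.  */
  buf[9] = 0;
  if (intruder)
    intruder (buf);             /* The concurrent write lands here.  */
  memcpy (buf, quad, 8);        /* stq $1,0($3): rewrites byte 0 too.  */
}

/* Model of the narrower-write strategy: only bytes 1..9 are touched.  */
void
clear_narrow (unsigned char *buf, intruder_fn intruder)
{
  if (intruder)
    intruder (buf);
  for (int i = 1; i <= 9; i++)
    buf[i] = 0;
}

/* Example intruder: updates the 1-byte member at offset 0.  */
void
set_first_byte (unsigned char *buf)
{
  buf[0] = 0xAA;
}

/* Run one of the routines on a fresh buffer filled with 0x55 and
   report the final value of byte 0.  */
unsigned char
first_byte_after (void (*clear) (unsigned char *, intruder_fn))
{
  unsigned char buf[16];
  memset (buf, 0x55, sizeof buf);
  clear (buf, set_first_byte);
  return buf[0];
}
```

In this model `first_byte_after (clear_unprotected)` yields the stale value 0x55, so the intruder's update is lost, while `first_byte_after (clear_narrow)` keeps 0xAA.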
3 days | Alpha: Add option to avoid data races for sub-longword memory stores [PR117759] | Maciej W. Rozycki | 6 | -6/+229
With non-BWX Alpha implementations we have a problem of data races where an 8-bit byte or 16-bit word quantity is to be written to memory, because in those cases we use an unprotected RMW access of a 32-bit longword or 64-bit quadword width. If contents of the longword or quadword accessed outside the byte or word to be written are changed midway through by a concurrent write executing on the same CPU such as by a signal handler or a parallel write executing on another CPU such as by another thread or via a shared memory segment, then the concluding write of the RMW access will clobber them. This is especially important for the safety of RCU algorithms, but is otherwise an issue anyway.

To guard against these data races with byte and aligned word quantities introduce the `-msafe-bwa' command-line option (standing for Safe Byte & Word Access) that instructs the compiler to instead use an atomic RMW access sequence where byte and word memory access machine instructions are not available. There is no change to code produced for BWX targets.

It would be sufficient for the secondary reload handler to use a pair of scratch registers, as requested by `reload_out<mode>', but it would end with poor code produced as one of the scratches would be occupied by data retrieved and the other one would have to be reloaded with repeated calculations, all within the LL/SC sequence.

Therefore I chose to add a dedicated `reload_out<mode>_safe_bwa' handler and ask for more scratches there by defining a 256-bit OI integer mode. While reload is documented in our manual to support an arbitrary number of scratches, in reality it hasn't been implemented for IRA:

    /* ??? It would be useful to be able to handle only two, or more than
       three, operands, but for now we can only handle the case of having
       exactly three: output, input and one temp/scratch.  */

and it seems to be the case for LRA as well. Do what everyone else does then and just have one wide multi-register scratch.
I note that the atomic sequences emitted are suboptimal performance-wise as the looping branch for the unsuccessful completion of the sequence points backwards, which means it will be predicted as taken despite that in most cases it will fall through. I do not see it as a deficiency of this change proposed as it takes care of recording that the branch is unlikely to be taken, by calling `alpha_emit_unlikely_jump'. Therefore generic code elsewhere should instead be investigated and adjusted accordingly for the arrangement to actually take effect. Add test cases accordingly. There are notable regressions between a plain `-mno-bwx' configuration and a `-mno-bwx -msafe-bwa' one: FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O0 execution test FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O1 execution test FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O2 execution test FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O3 -g execution test FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -Os execution test FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test FAIL: g++.dg/init/array25.C -std=c++17 execution test FAIL: g++.dg/init/array25.C -std=c++98 execution test FAIL: g++.dg/init/array25.C -std=c++26 execution test They come from the fact that these test cases play tricks with alignment and end up calling code that expects a reference to aligned data but is handed one to unaligned data. This doesn't cause a visible problem with plain `-mno-bwx' code, because the resulting alignment exception is fixed up by Linux. There's no such handling currently implemented for LDL_L or LDQ_L instructions (which are first in the sequence) and consequently the offender is issued with SIGBUS instead. 
Suitable handling will be added to Linux to complement this change that will emulate the trapping instructions[1], so these interim regressions are seen as harmless and expected. References: [1] "Alpha: Emulate unaligned LDx_L/STx_C for data consistency", <https://lore.kernel.org/r/alpine.DEB.2.21.2502181912230.65342@angie.orcam.me.uk/> gcc/ PR target/117759 * config/alpha/alpha-modes.def (OI): New integer mode. * config/alpha/alpha-protos.h (alpha_expand_mov_safe_bwa): New prototype. * config/alpha/alpha.cc (alpha_expand_mov_safe_bwa): New function. (alpha_secondary_reload): Handle TARGET_SAFE_BWA. * config/alpha/alpha.md (aligned_store_safe_bwa) (unaligned_store<mode>_safe_bwa, reload_out<mode>_safe_bwa) (reload_out<mode>_unaligned_safe_bwa): New expanders. (mov<mode>, movcqi, reload_out<mode>_aligned): Handle TARGET_SAFE_BWA. (reload_out<mode>): Guard against TARGET_SAFE_BWA. * config/alpha/alpha.opt (msafe-bwa): New option. * config/alpha/alpha.opt.urls: Regenerate. * doc/invoke.texi (Option Summary, DEC Alpha Options): Document the new option. gcc/testsuite/ PR target/117759 * gcc.target/alpha/stb.c: New file. * gcc.target/alpha/stb-bwa.c: New file. * gcc.target/alpha/stb-bwx.c: New file. * gcc.target/alpha/stba.c: New file. * gcc.target/alpha/stba-bwa.c: New file. * gcc.target/alpha/stba-bwx.c: New file. * gcc.target/alpha/stw.c: New file. * gcc.target/alpha/stw-bwa.c: New file. * gcc.target/alpha/stw-bwx.c: New file. * gcc.target/alpha/stwa.c: New file. * gcc.target/alpha/stwa-bwa.c: New file. * gcc.target/alpha/stwa-bwx.c: New file.
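The idea of replacing an unprotected read-modify-write with an atomic one can be illustrated in portable C11. This is only a sketch under stated assumptions: a compare-exchange loop stands in for Alpha's load-locked/store-conditional pair, it is not the sequence GCC emits for `-msafe-bwa', and `store_byte_atomic` and `demo_store_byte` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Store VALUE into byte BYTE_INDEX (0 = least significant) of *WORD
   without disturbing the other three bytes, even if they are modified
   concurrently.  The loop retries whenever the word changed between
   the read and the conditional write.  */
void
store_byte_atomic (_Atomic uint32_t *word, unsigned byte_index,
                   uint8_t value)
{
  unsigned shift = (byte_index & 3) * 8;
  uint32_t mask = (uint32_t) 0xff << shift;
  uint32_t old = atomic_load (word);
  uint32_t desired;
  do
    desired = (old & ~mask) | ((uint32_t) value << shift);
  while (!atomic_compare_exchange_weak (word, &old, desired));
}

/* Apply one byte store to INITIAL and return the resulting word.  */
uint32_t
demo_store_byte (uint32_t initial, unsigned byte_index, uint8_t value)
{
  _Atomic uint32_t word = initial;
  store_byte_atomic (&word, byte_index, value);
  return atomic_load (&word);
}
```

On failure `atomic_compare_exchange_weak` refreshes `old` with the current contents, so each retry merges the new byte into up-to-date neighbouring bytes instead of clobbering them.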
3 days | Alpha: Export `emit_unlikely_jump' for a subsequent change to use | Maciej W. Rozycki | 2 | -9/+11
Rename `emit_unlikely_jump' function to `alpha_emit_unlikely_jump', so as to avoid namespace pollution, updating callers accordingly and export it for use in the machine description. Make it return the insn emitted. gcc/ * config/alpha/alpha-protos.h (alpha_emit_unlikely_jump): New prototype. * config/alpha/alpha.cc (emit_unlikely_jump): Rename to... (alpha_emit_unlikely_jump): ... this. Return the insn emitted. (alpha_split_atomic_op, alpha_split_compare_and_swap) (alpha_split_compare_and_swap_12, alpha_split_atomic_exchange) (alpha_split_atomic_exchange_12): Update call sites accordingly.
4 days | gcc/mingw: Align `.refptr.` to 8-byte boundaries for 64-bit targets | LIU Hao | 1 | -0/+1
Windows only requires sections to be aligned on a 4-byte boundary. This used to work because in binutils the `.rdata` section is over-aligned to a 16-byte boundary, which will be fixed in the future. This matches the output of Clang. Signed-off-by: LIU Hao <lh_mouse@126.com> Signed-off-by: Jonathan Yong <10walls@gmail.com> gcc/ChangeLog: * config/mingw/winnt.cc (mingw_pe_file_end): Add `.p2align`.
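A `.p2align N` directive pads the location counter to a multiple of 2^N bytes, so an 8-byte boundary corresponds to N = 3. The exact operand used in winnt.cc is not shown in this log, so the value 3 is an inference from the commit subject. The arithmetic can be sketched in C; `align_up_p2` is a hypothetical helper and is not part of GCC.

```c
#include <stdint.h>

/* Round OFFSET up to the next multiple of 2**P2, which is where a
   `.p2align P2' directive would place the data that follows it.
   P2 must be less than 64.  */
uint64_t
align_up_p2 (uint64_t offset, unsigned p2)
{
  uint64_t mask = ((uint64_t) 1 << p2) - 1;
  return (offset + mask) & ~mask;
}
```

For example, data that would otherwise start at offset 4 moves to offset 8 with N = 3, whereas 4-byte alignment (N = 2) would leave it at offset 4.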
5 days | LoongArch: Set default alignment for functions jumps loops and labels. | Lulu Cheng | 3 | -3/+13
Based on r15-7624, a set of alignment combinations with better performance was determined by testing with SPEC2006:

LA464: -falign-loops=8 -falign-functions=32 -falign-jumps=32 -falign-labels=8
LA664: -falign-loops=16 -falign-functions=16 -falign-jumps=32 -falign-labels=8

gcc/ChangeLog:

        * config/loongarch/loongarch-def.cc (la464_align): Add settings for labels.
        (la664_align): Likewise.
        * config/loongarch/loongarch-opts.cc (loongarch_target_option_override): Likewise.
        * config/loongarch/loongarch-tune.h (struct loongarch_align): Implement the function `label_`.
6 days | i386: Fix offset calculation in ix86_redzone_clobber | Uros Bizjak | 1 | -2/+1
plus_constant expects integer as its third argument, not rtx. gcc/ChangeLog: * config/i386/i386.cc (ix86_redzone_clobber): Use integer, not rtx as the third argument of plus_constant.
6 days | target/119010 - add znver{4,5}_insn_both to resolve missing reservations | Richard Biener | 1 | -0/+12
I still was seeing

    ;; 0--> b 0: i 101 {[sp-0x3c]=[sp-0x3c]+0x1;clobber flags;}:nothing

so the following adds a standard alu insn reservation mimicking that from the znver.md description allowing both load and store.

        PR target/119010
        * config/i386/zn4zn5.md (znver4_insn_both, znver5_insn_both): New reservation for ALU ops with load and store.
6 days | target/119010 - more DFmode handling in zn4zn5 reservations | Richard Biener | 1 | -22/+22
The following adds DFmode where V1DFmode and SFmode were handled. This resolves missing reservations for adds, subs [with memory] and for FMAs for the testcase I'm looking at. Resolved cases are

    -;; 16--> b 0: i 237 xmm3=xmm3+[r9*0x8+si] :nothing
    -;; 29--> b 0: i 246 xmm3=xmm3+xmm1 :nothing
    -;; 46--> b 0: i 296 xmm1=xmm1-xmm3 :nothing

I've done search-and-replace for this; the caught cases look reasonable, though I'm of course not sure all of them can actually happen.

This also fixes the matched type for the znver{4,5}_sse_muladd_load reservations from sseshuf to ssemuladd, resolving

    -;; 1--> b 0: i 161 xmm0={-xmm0*xmm27+[cx+ax]} :nothing
    -;; 22--> b 0: i 229 xmm11={-xmm11*xmm7+[di*0x8+dx]} :nothing

        PR target/119010
        * config/i386/zn4zn5.md (znver4_sse_add, znver4_sse_add_load, znver5_sse_add_load, znver4_sse_add1, znver4_sse_add1_load, znver5_sse_add1_load, znver4_sse_mul, znver4_sse_mul_load, znver5_sse_mul_load, znver4_sse_cvt, znver4_sse_cvt_load, znver5_sse_cvt_load, znver4_sse_shuf, znver5_sse_shuf, znver4_sse_shuf_load, znver5_sse_shuf_load, znver4_sse_cmp_avx128, znver5_sse_cmp_avx128, znver4_sse_cmp_avx128_load, znver5_sse_cmp_avx128_load): Also handle DFmode.
        (znver4_sse_muladd_load, znver5_sse_muladd_load): Use ssemuladd type.
7 days | arm: don't vectorize fmaxf() unless unsafe math opts are enabled | Richard Earnshaw | 2 | -11/+11
This test has presumably been failing since vectorization was enabled at -O2. I suspect part of the reason this wasn't picked up sooner is that the test is a hybrid execution/scan-assembler test and the execution part requires appropriate hardware.

The problem is that we are vectorizing an expansion of fmaxf() when the vector version of the instruction does not preserve denormal values. This means we should only apply this optimization when -funsafe-math-optimizations is enabled.

This fix does a few things:
- Moves the expand pattern to vec-common.md. Although I haven't changed its behaviour (beyond fixing the bug), this should really be enabled for MVE as well (but that will need to wait for gcc-16 since the MVE code needs some additional changes first).
- Adds support for HF mode vectors.
- Splits the test that was exposing the bug into two parts: an executable test and a scan-assembler test. The scan-assembler version is more widely enabled, since it does not require a suitable executable environment.

gcc/ChangeLog:

        * config/arm/neon.md (<fmaxmin><mode>3): Move pattern from here...
        * config/arm/vec-common.md (<fmaxmin><mode>3): ... to here. Convert to define_expand and disable the pattern when denormal values might get truncated to zero. Iterate on VF to add V4HF and V8HF variants.

gcc/testsuite/ChangeLog:

        * gcc.target/arm/fmaxmin.c: Move scan-assembler checks to ...
        * gcc.target/arm/fmaxmin-2.c: ... here. New test.
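Why flushing denormals changes the result of a maximum can be shown with a portable model. The sketch below is C for any host, not Arm code: the flush-to-zero behaviour is simulated in software, NaN handling and the sign of zero are ignored, and all function names are hypothetical.

```c
#include <float.h>

/* Model of a unit that flushes subnormal inputs to zero.  A value is
   subnormal when it is non-zero and smaller in magnitude than FLT_MIN.  */
static float
flush_subnormal (float x)
{
  return (x != 0.0f && x > -FLT_MIN && x < FLT_MIN) ? 0.0f : x;
}

/* Maximum that keeps subnormal values intact.  */
float
max_preserving (float a, float b)
{
  return a > b ? a : b;
}

/* Maximum as computed when inputs are flushed to zero first.  */
float
max_flushing (float a, float b)
{
  a = flush_subnormal (a);
  b = flush_subnormal (b);
  return a > b ? a : b;
}

/* Half of the smallest positive normal float: a subnormal value.  */
float
sample_subnormal (void)
{
  return FLT_MIN / 2.0f;
}
```

For the inputs `sample_subnormal ()` and `0.0f`, `max_preserving` returns the subnormal value while `max_flushing` returns zero, which is the kind of difference that makes the vectorized form unsuitable without -funsafe-math-optimizations.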
7 daysi386: Set attr "addr" as "gpr16" for constraint "jm". [PR 119425]Hu, Lin11-11/+20
Constraint "jm" should be paired with "gpr16"; otherwise an ICE may be raised in the reload pass. gcc/ChangeLog: PR target/119425 * config/i386/sse.md (vec_set<mode>_0): Set the alternative with constraint "jm"'s attribute "addr" to "gpr16". (<mask_codefor>avx512dq_shuf_<shuffletype>64x2_1<mask_name>): Ditto. (avx512vl_shuf_<shuffletype>32x4_1<mask_name>): Ditto. (avx2_pblendd<mode>): Ditto. (aesenc): Ditto. (aesenclast): Ditto. (aesdec): Ditto. (aesdeclast): Ditto. (vaesdec_<mode>): Ditto. (vaesdeclast_<mode>): Ditto. (vaesenc_<mode>): Ditto. (vaesenclast_<mode>): Ditto. (aes<aesklvariant>u8): Ditto. (*aes<aeswideklvariant>u8): Ditto. gcc/testsuite/ChangeLog: PR target/119425 * gcc.target/i386/pr119425.c: New test. Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
7 daysLoongArch: Support Q suffix for __float128.Lulu Cheng1-0/+13
r14-3635 added support for `__float128`, but not for the 'q/Q' suffix. PR target/119408 gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_c_mode_for_suffix): New. (TARGET_C_MODE_FOR_SUFFIX): Define. gcc/testsuite/ChangeLog: * gcc.target/loongarch/pr119408.c: New test.
7 daysdriver: Forward '-lstdc++' to offloading compilation [PR101544]Thomas Schwinge2-2/+4
..., so that users don't manually need to specify '-foffload-options=-lstdc++' in addition to '-lstdc++' (specified manually, or implicitly by the driver). Do like commit 4bcb46b3ade1796c5a57b294f5cca25f00671cac "driver: Forward '-lgfortran', '-lm' to offloading compilation". PR driver/101544 gcc/ * gcc.cc (driver_handle_option): Forward host '-lstdc++' to offloading compilation. * config/gcn/mkoffload.cc (main): Adjust. * config/nvptx/mkoffload.cc (main): Likewise. libgomp/ * testsuite/libgomp.c++/pr101544-1-O0.C: Remove '-foffload-options=-lstdc++'. * testsuite/libgomp.c++/pr101544-1.C: Likewise. * testsuite/libgomp.oacc-c++/pr101544-1.C: Likewise.
8 daysi386: Require in peephole2 that memory is offsettable [PR119450]Jakub Jelinek1-1/+3
The following testcase ICEs because a peephole2 attempts to offset memory which is not offsettable (in particular address is a ZERO_EXTEND in this case). Because peephole2s don't support constraints, I've added a check for this in the peephole2's condition. 2025-03-26 Jakub Jelinek <jakub@redhat.com> PR target/119450 * config/i386/i386.md (narrow test peephole2): Test for offsettable_memref_p in condition. * gcc.target/i386/pr119450.c: New test.
8 daystarget/119010 - add missing DF load/store reservations for znver4 and znver5Richard Biener1-5/+5
The following resolves missing reservations for DFmode *movdf_internal loads and stores, visible as 'nothing' in -fsched-verbose=2 dumps. PR target/119010 * config/i386/zn4zn5.md (znver4_sse_mov_fp, znver4_sse_mov_fp_load, znver5_sse_mov_fp_load, znver4_sse_mov_fp_store, znver5_sse_mov_fp_store): Also match V1SF and DF.
8 daystarget/119010 - add missing integer store reservations for znver4 and znver5Richard Biener1-0/+26
The imov and imovx classified stores miss reservations in the znver4/5 pipeline description. The following adds them. PR target/119010 * config/i386/zn4zn5.md (znver4_imov_double_store, znver5_imov_double_store, znver4_imov_store, znver5_imov_store): New reservations for integer stores.
8 daysi386: Add "s_" as Saturation for AVX10.2 Converting Intrinsics.Hu, Lin12-69/+69
This patch aims to add "s_" after 'cvt' to represent saturation. gcc/ChangeLog: * config/i386/avx10_2-512convertintrin.h (_mm512_mask_cvtx2ps_ph): Formatting fixes. (_mm512_mask_cvtx_round2ps_ph): Ditto. (_mm512_maskz_cvtx_round2ps_ph): Ditto. (_mm512_cvtbiassph_bf8): Rename to _mm512_cvts_biasph_bf8. (_mm512_mask_cvtbiassph_bf8): Rename to _mm512_mask_cvts_biasph_bf8. (_mm512_maskz_cvtbiassph_bf8): Rename to _mm512_maskz_cvts_biasph_bf8. (_mm512_cvtbiassph_hf8): Rename to _mm512_cvts_biasph_hf8. (_mm512_mask_cvtbiassph_hf8): Rename to _mm512_mask_cvts_biasph_hf8. (_mm512_maskz_cvtbiassph_hf8): Rename to _mm512_maskz_cvts_biasph_hf8. (_mm512_cvts2ph_bf8): Rename to _mm512_cvts_2ph_bf8. (_mm512_mask_cvts2ph_bf8): Rename to _mm512_mask_cvts_2ph_bf8. (_mm512_maskz_cvts2ph_bf8): Rename to _mm512_maskz_cvts_2ph_bf8. (_mm512_cvts2ph_hf8): Rename to _mm512_cvts_2ph_hf8. (_mm512_mask_cvts2ph_hf8): Rename to _mm512_mask_cvts_2ph_hf8. (_mm512_maskz_cvts2ph_hf8): Rename to _mm512_maskz_cvts_2ph_hf8. (_mm512_cvtsph_bf8): Rename to _mm512_cvts_ph_bf8. (_mm512_mask_cvtsph_bf8): Rename to _mm512_mask_cvts_ph_bf8. (_mm512_maskz_cvtsph_bf8): Rename to _mm512_maskz_cvts_ph_bf8. (_mm512_cvtsph_hf8): Rename to _mm512_cvts_ph_hf8. (_mm512_mask_cvtsph_hf8): Rename to _mm512_mask_cvts_ph_hf8. (_mm512_maskz_cvtsph_hf8): Rename to _mm512_maskz_cvts_ph_hf8. * config/i386/avx10_2convertintrin.h (_mm_cvtbiassph_bf8): Rename to _mm_cvts_biasph_bf8. (_mm_mask_cvtbiassph_bf8): Rename to _mm_mask_cvts_biasph_bf8. (_mm_maskz_cvtbiassph_bf8): Rename to _mm_maskz_cvts_biasph_bf8. (_mm256_cvtbiassph_bf8): Rename to _mm256_cvts_biasph_bf8. (_mm256_mask_cvtbiassph_bf8): Rename to _mm256_mask_cvts_biasph_bf8. (_mm256_maskz_cvtbiassph_bf8): Rename to _mm256_maskz_cvts_biasph_bf8. (_mm_cvtbiassph_hf8): Rename to _mm_cvts_biasph_hf8. (_mm_mask_cvtbiassph_hf8): Rename to _mm_mask_cvts_biasph_hf8. (_mm_maskz_cvtbiassph_hf8): Rename to _mm_maskz_cvts_biasph_hf8. 
(_mm256_cvtbiassph_hf8): Rename to _mm256_cvts_biasph_hf8. (_mm256_mask_cvtbiassph_hf8): Rename to _mm256_mask_cvts_biasph_hf8. (_mm256_maskz_cvtbiassph_hf8): Rename to _mm256_maskz_cvts_biasph_hf8. (_mm_cvts2ph_bf8): Rename to _mm_cvts_2ph_bf8. (_mm_mask_cvts2ph_bf8): Rename to _mm_mask_cvts_2ph_bf8. (_mm_maskz_cvts2ph_bf8): Rename to _mm_maskz_cvts_2ph_bf8. (_mm256_cvts2ph_bf8): Rename to _mm256_cvts_2ph_bf8. (_mm256_mask_cvts2ph_bf8): Rename to _mm256_mask_cvts_2ph_bf8. (_mm256_maskz_cvts2ph_bf8): Rename to _mm256_maskz_cvts_2ph_bf8. (_mm_cvts2ph_hf8): Rename to _mm_cvts_2ph_hf8. (_mm_mask_cvts2ph_hf8): Rename to _mm_mask_cvts_2ph_hf8. (_mm_maskz_cvts2ph_hf8): Rename to _mm_maskz_cvts_2ph_hf8. (_mm256_cvts2ph_hf8): Rename to _mm256_cvts_2ph_hf8. (_mm256_mask_cvts2ph_hf8): Rename to _mm256_mask_cvts_2ph_hf8. (_mm256_maskz_cvts2ph_hf8): Rename to _mm256_maskz_cvts_2ph_hf8. (_mm_cvtsph_bf8): Rename to _mm_cvts_ph_bf8. (_mm_mask_cvtsph_bf8): Rename to _mm_mask_cvts_ph_bf8. (_mm_maskz_cvtsph_bf8): Rename to _mm_maskz_cvts_ph_bf8. (_mm256_cvtsph_bf8): Rename to _mm256_cvts_ph_bf8. (_mm256_mask_cvtsph_bf8): Rename to _mm256_mask_cvts_ph_bf8. (_mm256_maskz_cvtsph_bf8): Rename to _mm256_maskz_cvts_ph_bf8. (_mm_cvtsph_hf8): Rename to _mm_cvts_ph_hf8. (_mm_mask_cvtsph_hf8): Rename to _mm_mask_cvts_ph_hf8. (_mm_maskz_cvtsph_hf8): Rename to _mm_maskz_cvts_ph_hf8. (_mm256_cvtsph_hf8): Rename to _mm256_cvts_ph_hf8. (_mm256_mask_cvtsph_hf8): Rename to _mm256_mask_cvts_ph_hf8. (_mm256_maskz_cvtsph_hf8): Rename to _mm256_maskz_cvts_ph_hf8. gcc/testsuite/ChangeLog: * gcc.target/i386/avx10_2-512-convert-1.c: Modify function name to follow the latest version. * gcc.target/i386/avx10_2-512-vcvt2ph2bf8s-2.c: Ditto. * gcc.target/i386/avx10_2-512-vcvt2ph2hf8s-2.c: Ditto. * gcc.target/i386/avx10_2-512-vcvtbiasph2bf8s-2.c: Ditto. * gcc.target/i386/avx10_2-512-vcvtbiasph2hf8s-2.c: Ditto. * gcc.target/i386/avx10_2-512-vcvtph2bf8s-2.c: Ditto. 
* gcc.target/i386/avx10_2-512-vcvtph2hf8s-2.c: Ditto. * gcc.target/i386/avx10_2-convert-1.c: Ditto.
8 daysi386: Fix up combination of -2 r<<= (x & 7) into btr [PR119428]Jakub Jelinek1-2/+4
The following testcase is miscompiled from r15-8478 but latently already since my r11-5756 and r11-6631 changes. The r11-5756 change was https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561164.html which changed the splitters to immediately throw away the masking. And the r11-6631 change was an optimization to recognize (set (zero_extract:HI (...) (const_int 1) (...)) (const_int 1) as btr. The problem is their interaction. x86 is not a SHIFT_COUNT_TRUNCATED target, so the masking needs to be explicit in the IL. And combine.cc (make_field_assignment) has since 1992 optimizations which try to optimize x &= (-2 r<< y) into zero_extract (x) = 0. Now, such an optimization is fine if y has not been masked or if the chosen zero_extract has the same mode as the rotate (or it recognizes something with a left shift too). IMHO such optimization is invalid for SHIFT_COUNT_TRUNCATED targets because we explicitly say that the masking of the shift/rotate counts are redundant there and don't need to be part of the IL (I have a patch for that, but because it is just latent, I'm not sure it needs to be posted for gcc 15 (and also am not sure if it should punt or add operand masking just in case)). x86 is not SHIFT_COUNT_TRUNCATED though and so even fixing combine not to do that for SHIFT_COUNT_TRUNCATED targets doesn't help, and we don't have QImode insv, so it is optimized into HImode insertions. Now, if the y in x &= (-2 r<< y) wasn't masked in any way, turning it into HImode btr is just fine, but if it was x &= (-2 r<< (y & 7)) and we just decided to throw away the masking, using btr changes the behavior on it and causes e2fsprogs and sqlite miscompilations. 
So IMHO on !SHIFT_COUNT_TRUNCATED targets, we need to keep the maskings explicit in the IL, either at least for the duration of the combine pass as does the following patch (where combine is the only known pass to have such transformation), or even keep it until final pass in case there are some later optimizations that would also need to know whether there was explicit masking or not and with what mask. The latter change would be much larger. The following patch just reverts the r11-5756 change and adds a testcase. 2025-03-25 Jakub Jelinek <jakub@redhat.com> PR target/96226 PR target/119428 * config/i386/i386.md (splitter after *<rotate_insn><mode>3_mask, splitter after *<rotate_insn><mode>3_mask_1): Revert 2020-12-05 changes. * gcc.c-torture/execute/pr119428.c: New test.
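The difference between the masked 8-bit form and a 16-bit bit-reset can be modelled in portable C. This is only an illustrative sketch: the helper names rotl8, clear_bit_masked and clear_bit_hi are hypothetical, it is not the pr119428.c testcase, and clear_bit_hi merely models a 16-bit btr that takes its bit index modulo 16.

```c
#include <stdint.h>

/* 8-bit rotate left; the count is reduced modulo 8 explicitly.  */
static uint8_t rotl8(uint8_t v, unsigned count)
{
    count &= 7;
    return (uint8_t)((v << count) | (v >> ((8 - count) & 7)));
}

/* Source-level meaning of x &= (-2 r<< (y & 7)) on an 8-bit value:
   clear bit (y & 7) of x.  */
uint8_t clear_bit_masked(uint8_t x, unsigned y)
{
    return x & rotl8((uint8_t)-2, y & 7);
}

/* Model of a 16-bit bit-reset applied after the "& 7" has been thrown
   away: clears bit (y & 15) of a 16-bit container instead.  */
uint16_t clear_bit_hi(uint16_t x, unsigned y)
{
    return x & (uint16_t)~(1u << (y & 15));
}
```

For y = 9 the masked form clears bit 1 of 0xFF and yields 0xFD, while the 16-bit model clears bit 9 and leaves the low byte at 0xFF. For y below 8 the two agree, which is consistent with the bug staying latent for so long.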
8 daysRISC-V: disable the abd expander for gcc-15 release [PR119224]Vineet Gupta1-1/+2
It seems the new expander triggers a latent issue in sched1, causing extraneous spills in a different sad variant. Given how close we are to the gcc-15 release, disable it for now. Since we do want to retain and re-enable this capability, manually disable it rather than reverting the original patch, which would take away the test case too. Fix the original test case to expect the old codegen idiom (although vneg is no longer emitted, in favor of vrsub). Also add a new testcase which flags any future spills in the affected routine. PR target/119224 gcc/ChangeLog: * config/riscv/autovec.md: Disable abd splitter. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr117722.c: Adjust output insn. * gcc.target/riscv/rvv/autovec/pr119224.c: Add new test. Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
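For readers unfamiliar with the idiom, a sum-of-absolute-differences (sad) loop of the general shape an abd pattern targets might look like the sketch below. The names abs_diff_u8, sad_u8 and sad_example are hypothetical; this is not the code from pr117722.c or pr119224.c, and whether a given compiler vectorizes it this way is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Absolute difference of two unsigned bytes, computed without overflow.  */
static inline unsigned abs_diff_u8(uint8_t a, uint8_t b)
{
    return a > b ? (unsigned)(a - b) : (unsigned)(b - a);
}

/* Sum of absolute differences over n elements.  */
unsigned sad_u8(const uint8_t *a, const uint8_t *b, size_t n)
{
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += abs_diff_u8(a[i], b[i]);
    return sum;
}

/* Small fixed input: 7 + 5 + 255 + 0.  */
unsigned sad_example(void)
{
    static const uint8_t a[4] = { 10, 0, 255, 7 };
    static const uint8_t b[4] = { 3, 5, 0, 7 };
    return sad_u8(a, b, 4);
}
```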
9 daysarm: add commutative alternatives to <US>mull pattern.Richard Earnshaw1-5/+5
Prior to Armv6, the SMULL and UMULL instructions, which have the form UMULL Rdlo, Rdhi, Rm, Rs, had an operand restriction such that Rdlo, Rdhi and Rm must all be different registers. Rs, however, can overlap either of the destination registers. Add some register-tie alternatives to allow the register allocator to find these forms without having to use additional register moves. In addition to this, the test is pretty meaningless on Thumb-1 targets as the S/UMULL instructions do not exist in a 16-bit encoding. So skip the test in this case. gcc/ChangeLog: * config/arm/arm.md (<US>mull): Add alternatives that allow Rs to be tied to either Rdlo or Rdhi. gcc/testsuite/ChangeLog: * gcc.target/arm/pr42575.c: Skip test if thumb1.
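As a rough illustration of where these instructions come from, a 32x32 to 64-bit widening multiply in C is the sort of source that typically maps onto UMULL or SMULL on Arm. The function names below are hypothetical, and the exact instruction selection and register assignment depend on the target and options, so treat this as a sketch rather than the contents of pr42575.c.

```c
#include <stdint.h>

/* Unsigned widening multiply: the 64-bit result occupies a register
   pair (Rdlo, Rdhi), the inputs are the two 32-bit source operands.  */
uint64_t umul32x32(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* Signed counterpart.  */
int64_t smul32x32(int32_t a, int32_t b)
{
    return (int64_t)a * b;
}
```

When one input dies at the multiply, letting it share a register with half of the result is what saves the extra move the commit message mentions.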
10 daysnvptx: In offloading compilation, special-case certain host-setup symbol ↵Thomas Schwinge1-1/+27
aliases [PR101544] Namely, use PTX '.alias' even for (default) '-mno-alias' if the host made the C++ "base and complete [cd]tor aliases". PR target/101544 gcc/ * config/nvptx/nvptx.cc (nvptx_asm_output_def_from_decls) [ACCEL_COMPILER]: Special-case certain host-setup symbol aliases. * varasm.cc (do_assemble_alias) [ACCEL_COMPILER]: Adjust.
10 daysnvptx: Default at least to '-mptx=6.3'Thomas Schwinge1-0/+3
gcc/ * config/nvptx/nvptx.cc (default_ptx_version_option): Default at least to '-mptx=6.3'. * doc/invoke.texi (Nvidia PTX Options): Update '-mptx=[...]'. gcc/testsuite/ * gcc.target/nvptx/march-map=sm_30.c: Adjust. * gcc.target/nvptx/march-map=sm_32.c: Likewise. * gcc.target/nvptx/march-map=sm_35.c: Likewise. * gcc.target/nvptx/march-map=sm_37.c: Likewise. * gcc.target/nvptx/march-map=sm_50.c: Likewise. * gcc.target/nvptx/march=sm_30.c: Likewise. * gcc.target/nvptx/march=sm_35.c: Likewise. * gcc.target/nvptx/march=sm_37.c: Likewise.
10 daysi386: Raise deprecate warning for -mavx10.1-256/512 and -mevex512 while add ↵Haochen Jiang7-22/+33
-mavx10.1 back with 512 bit alias When AVX10.1 options were added to GCC 14, E-core was supposed to support up to 256 bit vector width, while P-core supported up to 512 bit vector width. Therefore, we added the avx10.1-256 and avx10.1-512 options to the compiler, since there would be real platforms with 256 bit only support. At the same time, so that old platforms could also compile a 256 bit only binary, we introduced -mno-evex512 to disable 512 bit vectors. However, all future platforms will now support 512 bit vector width, on both P-core and E-core, so there is no longer any need to split the options by vector width. Therefore, we will remove them in this patch. Unlike the AVX10.2 options, the AVX10.1 options have been in a major release, so we have to raise a deprecation warning in GCC 15 and remove them in GCC 16. At the same time, to align with the avx10.2 options, we will add the just removed avx10.1 option back, with a warning that mentions its behavior change. gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_available_features): Change to FEATURE_AVX10_1. * common/config/i386/i386-common.cc (OPTION_MASK_ISA2_AVX10_1_512_SET): Renamed to ... (OPTION_MASK_ISA2_AVX10_1_SET): ... this. (OPTION_MASK_ISA2_AVX10_2_SET): Use renamed macro. (OPTION_MASK_ISA2_AVX10_1_UNSET): Ditto. (ix86_handle_option): Ditto. (processor_alias_table): Use P_PROC_AVX10_1. * common/config/i386/i386-cpuinfo.h (enum feature_priority): Rename from AVX10_1_512 to AVX10_1. (enum processor_features): Ditto. * common/config/i386/i386-isas.h: Add avx10.1. * config/i386/driver-i386.cc (host_detect_local_cpu): Use renamed enum. * config/i386/i386-c.cc (ix86_target_macros_internal): Rename to avx10.1. * config/i386/i386-isa.def (AVX10_1_512): Rename to ... (AVX10_1): ... this. * config/i386/i386-options.cc (isa2_opts): Rename to avx10.1. (ix86_valid_target_attribute_inner_p): Add avx10.1. (ix86_option_override_internal): Rename to AVX10_1. Revise warnings to mention behavior change for option combination in GCC 16. 
* config/i386/i386.h (PTA_DIAMONDRAPIDS): Use AVX10_1. * config/i386/i386.opt: Add avx10.1. Add deprecate warnings for mevex512 and mavx10.1-256/512. * config/i386/i386.opt.urls: Add avx10.1. * doc/extend.texi: Ditto. * doc/sourcebuild.texi: Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx10-check.h: Change to avx10.1. * gcc.target/i386/avx10_1-1.c: Add warning check. * gcc.target/i386/avx10_1-10.c: Ditto. * gcc.target/i386/avx10_1-11.c: Ditto. * gcc.target/i386/avx10_1-12.c: Ditto. * gcc.target/i386/avx10_1-13.c: Ditto. * gcc.target/i386/avx10_1-15.c: Ditto. * gcc.target/i386/avx10_1-16.c: Ditto. * gcc.target/i386/avx10_1-18.c: Ditto. * gcc.target/i386/avx10_1-19.c: Ditto. * gcc.target/i386/avx10_1-2.c: Ditto. * gcc.target/i386/avx10_1-20.c: Ditto. * gcc.target/i386/avx10_1-21.c: Ditto. * gcc.target/i386/avx10_1-22.c: Ditto. * gcc.target/i386/avx10_1-23.c: Ditto. * gcc.target/i386/avx10_1-26.c: Ditto. * gcc.target/i386/avx10_1-3.c: Ditto. * gcc.target/i386/avx10_1-4.c: Ditto. * gcc.target/i386/avx10_1-7.c: Ditto. * gcc.target/i386/avx10_1-8.c: Ditto. * gcc.target/i386/avx10_1-9.c: Ditto. * gcc.target/i386/noevex512-1.c: Ditto. * gcc.target/i386/noevex512-2.c: Ditto. * gcc.target/i386/pr111068.c: Ditto. * gcc.target/i386/pr111907.c: Ditto. * gcc.target/i386/pr117240_avx512f.c: Ditto. * gcc.target/i386/pr117304-1.c: Ditto. * gcc.target/i386/pr117946.c: Ditto. * gcc.target/i386/avx10_1-24.c: Removed. * gcc.target/i386/avx10_1-25.c: Removed. * gcc.target/i386/avx10_1-5.c: Removed. * gcc.target/i386/avx10_1-6.c: Removed.
10 daysi386: Remove avx10.2-256 and avx10.2-512 optionsHaochen Jiang27-654/+613
When AVX10.2 options were added to GCC 15, E-core was supposed to support up to 256 bit vector width, while P-core supported up to 512 bit vector width. Therefore, we added the avx10.2-256 and avx10.2-512 options to the compiler, since there would be real platforms with 256 bit only support. However, all future platforms will now support 512 bit vector width, on both P-core and E-core, so there is no longer any need to split the option by vector width. Therefore, we will remove them in this patch. gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_available_features): Revise the logic AVX10 version. * common/config/i386/i386-common.cc (OPTION_MASK_ISA2_AVX10_2_256_SET): Removed. (OPTION_MASK_ISA2_AVX10_2_512_SET): Ditto. (OPTION_MASK_ISA2_AVX10_2_SET): New. (OPTION_MASK_ISA2_AMX_AVX512_SET): Use AVX10.2 macro. (OPTION_MASK_ISA2_AVX10_2_UNSET): Ditto. (ix86_handle_option): Remove avx10.2-256 part. Adjust avx10.2. * common/config/i386/i386-cpuinfo.h (enum processor_features): Remove FEATURE_AVX10_2_256 and skip the value for it. Change the name from FEATURE_AVX10_2_512 to FEATURE_AVX10_2. * common/config/i386/i386-isas.h: Remove avx10.2-256/512. * config/i386/avx10_2-512bf16intrin.h: Use avx10.2 instead of avx10.2-256/512. * config/i386/avx10_2-512convertintrin.h: Ditto. * config/i386/avx10_2-512mediaintrin.h: Ditto. * config/i386/avx10_2-512minmaxintrin.h: Ditto. * config/i386/avx10_2-512satcvtintrin.h: Ditto. * config/i386/avx10_2bf16intrin.h: Ditto. * config/i386/avx10_2convertintrin.h: Ditto. * config/i386/avx10_2mediaintrin.h: Ditto. * config/i386/avx10_2minmaxintrin.h: Ditto. * config/i386/avx10_2satcvtintrin.h: Ditto. * config/i386/movrsintrin.h: Ditto. * config/i386/sm4intrin.h: Ditto. * config/i386/cpuid.h (bit_AVX10_256): Removed. (bit_AVX10_512): Ditto. * config/i386/driver-i386.cc (host_detect_local_cpu): Adjust Diamond Rapids and -march=native condition. * config/i386/i386-builtin.def (BDESC): Use AVX10.2 macro instead of AVX10.2-256/512. 
* config/i386/i386-c.cc (ix86_target_macros_internal): Ditto. * config/i386/i386-expand.cc (ix86_expand_branch): Use TARGET_AVX10_2 instead of specifying vector size. (ix86_prepare_fp_compare_args): Ditto. (ix86_expand_fp_compare): Ditto. (ix86_ssecom_setcc): Ditto. (ix86_expand_sse_comi): Ditto. (ix86_expand_sse_comi_round): Ditto. (ix86_check_builtin_isa_match): Ditto. * config/i386/i386.cc (ix86_fp_compare_code_to_integer): Ditto. (ix86_get_mask_mode): Ditto. * config/i386/i386.h (SSE_FLOAT_MODE_SSEMATH_OR_HFBF_P): Ditto. * config/i386/i386.md: Ditto. * config/i386/mmx.md: Ditto. * config/i386/sse.md: Ditto. * config/i386/predicates.md: Ditto. * config/i386/i386-isa.def (AVX10_2_256): Removed. (AVX10_2_512): Removed. (AVX10_2): New. * config/i386/i386-options.cc (isa2_opts): Remove avx10.2-256/512. (ix86_valid_target_attribute_inner_p): Ditto. (PTA_DIAMONDRAPIDS): Use PTA_AVX10_2. * config/i386/i386.opt: Remove avx10.2-256/512. * config/i386/i386.opt.urls: Ditto. * doc/extend.texi: Ditto. * doc/invoke.texi: Ditto. * doc/sourcebuild.texi: Ditto.
10 daysRevert "AVX10.2 ymm rounding: Support vadd{s,d,h} and vcmp{s,d,h} intrins"Haochen Jiang7-437/+68
This reverts commit e22e3af1954469c40b139b7cfa8e7708592f4bfd.
10 daysRevert "AVX10.2 ymm rounding: Support vcvtdq2p{s,h} and vcvtpd2p{s,h} intrins"Haochen Jiang6-249/+9
This reverts commit 85e874d19548f0dcb9a3f14f9e4b1e3411c88c4b.
10 daysRevert "AVX10.2 ymm rounding: Support vcvtpd2{,u}{dq,qq} intrins"Haochen Jiang6-234/+6
This reverts commit 508ac49e1a94c28346642bff512d0ed5f4f58b64.
10 daysRevert "AVX10.2 ymm rounding: Support vcvtph2p{s,d,sx} and ↵Haochen Jiang6-410/+9
vcvtph2{,u}{dq,qq} intrins" This reverts commit 6f2eac53b6026836f3222961c32312e02c2c7dbc.
10 daysRevert "AVX10.2 ymm rounding: Support vcvtph2{,u}w and vcvtps2p{d,hx} intrins"Haochen Jiang6-232/+1
This reverts commit b70bb94aca7bc10a54f744d793c32c51f91ce195.
10 daysRevert "AVX10.2 ymm rounding: Support vcvtps2{,u}{dq,qq} intrins"Haochen Jiang6-240/+5
This reverts commit 0f5a42d41b46b746c6f77374d76a3b918a1e2b57.
10 daysRevert "AVX10.2 ymm rounding: Support vcvtqq2p{s,d,h} and ↵Haochen Jiang6-434/+14
vcvttpd2{,u}{dq,qq} intrins" This reverts commit 6e231f8504874828b23bbe89f3ef4086dcc15a44.
10 daysRevert "AVX10.2 ymm rounding: Support vcvttph2{,u}{dq,qq,w} intrins"Haochen Jiang4-347/+5
This reverts commit 493c5096050523ebc05e5fa21612683a996b97a7.
10 daysRevert "AVX10.2 ymm rounding: Support vcvttps2{,u}{dq,qq} and ↵Haochen Jiang3-515/+13
vcvtu{dq,qq}2p{s,d,h} intrins" This reverts commit b2754227139512adecb6fda067632b587ff4a017.
10 daysRevert "AVX10.2 ymm rounding: Support vcvt{,u}w2ph and vdivp{s,d,h} intrins"Haochen Jiang4-293/+0
This reverts commit 3d1b5530ea1d23e26dc5ab70aa4a2e7b9dc19b50.
10 daysRevert "AVX10.2 ymm rounding: Support vfc{madd,mul}cph, vfixupimmp{s,d} intrins"Haochen Jiang5-269/+2
This reverts commit 95980b292b24110d3f1dffb81926df23c61b4fe7.
10 daysRevert "AVX10.2 ymm rounding: Support vfmadd{132,231,213}p{s,d,h} intrins"Haochen Jiang3-186/+1
This reverts commit 0683ca355a87fd36a2e7ae1721199204ceff4c4c.
10 daysRevert "AVX10.2 ymm rounding: Support vfmaddcph and ↵Haochen Jiang3-253/+2
vfmaddsub{132,231,213}p{s,d,h} intrins" This reverts commit cfbc94eaf167ae7aecd21ee6054556e1cf9d7143.
10 daysRevert "AVX10.2 ymm rounding: Support vfm{sub,subadd}{132,231,213}p{s,d,h} ↵Haochen Jiang3-369/+1
intrins" This reverts commit dd48acbe85ca55dd23ffafbb917ffe559d13b6a3.