path: root/gcc
2024-07-05  i386: Refactor ssedoublemode  (Hu, Lin1; 1 file, -10/+9)

ssedoublemode's "double" should mean a double-width type, like SI -> DI,
and we need to refactor some patterns to use <ssedoublevecmode> instead of
<ssedoublemode>.

gcc/ChangeLog:
	* config/i386/sse.md (ssedoublemode): Remove mappings to twice
	the number of same-sized elements.  Add mappings to the same
	number of double-sized elements.
	(define_split for vec_concat_minus_plus): Change mode_attr from
	ssedoublemode to ssedoublevecmode.
	(define_split for vec_concat_plus_minus): Ditto.
	(<mask_codefor>avx512dq_shuf_<shuffletype>64x2_1<mask_name>): Ditto.
	(avx512f_shuf_<shuffletype>64x2_1<mask_name>): Ditto.
	(avx512vl_shuf_<shuffletype>32x4_1<mask_name>): Ditto.
	(avx512f_shuf_<shuffletype>32x4_1<mask_name>): Ditto.
2024-07-05  MIPS: Support more cases with alien mode of SHF.DF  (YunQiang Su; 3 files, -15/+170)

Currently, we only support the cases that strictly fit the instructions.
For example, for V16QImode we only support shuffles like
(0 <= N0, N1, N2, N3 <= 3 here):

	N0,    N1,    N2,    N3
	N0+4,  N1+4,  N2+4,  N3+4
	N0+8,  N1+8,  N2+8,  N3+8
	N0+12, N1+12, N2+12, N3+12

In fact we can support more cases by trying other SHF.DF instructions that
do not strictly fit the mode:

1) We can use SHF.H to support more cases for V16QImode
   (M0/M1/M2/M3 are 0, 2, 4 or 6):

	M0,   M0+1, M1,   M1+1
	M2,   M2+1, M3,   M3+1
	M0+8, M0+9, M1+8, M1+9
	M2+8, M2+9, M3+8, M3+9

2) We can use SHF.W to support some cases for V16QImode
   (M0/M1/M2/M3 are 0, 4, 8 or 12):

	M0, M0+1, M0+2, M0+3
	M1, M1+1, M1+2, M1+3
	M2, M2+1, M2+2, M2+3
	M3, M3+1, M3+2, M3+3

3) We can use SHF.W to support some cases for V8HImode
   (M0/M1/M2/M3 are 0, 2, 4 or 6):

	M0, M0+1
	M1, M1+1
	M2, M2+1
	M3, M3+1

4) We can also use SHF.W to swap the 2 parts of V2DF or V2DI.

gcc
	* config/mips/mips-protos.h: New function mips_msa_shf_i8.
	* config/mips/mips-msa.md (MSA_WHB_W): Not used anymore;
	(msa_shf_<msafmt_f>): Use mips_msa_shf_i8.
	* config/mips/mips.cc (mips_const_vector_shuffle_set_p): Support
	more cases by trying to use an alien-mode instruction;
	(mips_msa_shf_i8): New function to get the correct MSA SHF
	instruction and IMM.
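As an illustrative sketch (mine, not from the patch) of case 2 above: a
V16QI shuffle whose mask moves whole 4-byte words (here M0=0, M1=4, M2=12,
M3=8), so a single SHF.W can implement it:

	typedef unsigned char v16u8 __attribute__ ((vector_size (16)));

	/* The mask permutes 4-byte groups as whole words, so this V16QI
	   shuffle can be done with one word-mode SHF.W.  */
	v16u8
	shuffle_words (v16u8 a)
	{
	  return __builtin_shuffle (a, (v16u8) { 0, 1, 2, 3, 4, 5, 6, 7,
	                                         12, 13, 14, 15, 8, 9, 10, 11 });
	}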
2024-07-05  Testsuite/MIPS: Fix msa.c: test7_v2f64, test7_v4f32, test43_v2i64  (YunQiang Su; 1 file, -3/+3)

BNEGI.W/D are now used for test7_v2f64 and test7_v4f32.  This is an
improvement, since we save an instruction.  ILVR.D is now used for
test43_v2i64, instead of INSVE.D.

gcc/testsuite
	* gcc.target/mips/msa.c: Fix test7_v2f64, test7_v4f32 and
	test43_v2i64.
2024-07-05  MIPS/testsuite: Add -mfpxx to call-clobbered-1.c  (YunQiang Su; 1 file, -1/+1)

The scan-assembler-times rules only fit -mfp32 and -mfpxx.  The test fails
if we are configured as FP64 by default, as there is one less sdc1/ldc1
pair.

gcc/testsuite
	* gcc.target/mips/call-clobbered-1.c: Add -mfpxx.
2024-07-05  MIPS/testsuite: Fix umips-save-restore-1.c  (YunQiang Su; 1 file, -4/+6)

With some recent optimizations, -O1/-O2/-O3 can achieve almost the same
performance/size with plain stack loads/stores, so lwm/swm will
save/restore fewer callee-saved registers; in fact only $16 is saved with
swm.  To be sure this optimization still triggers, let's add 2 more
function calls, so that lwm/swm become much more profitable.  If we add
only one more, -O1 will still use stack loads/stores.

gcc/testsuite
	* gcc.target/mips/umips-save-restore-1.c: Be sure lwm/swm are
	used for more callee-saved registers with 2 additional function
	calls.
2024-07-05  Support group size of three in SLP store permute lowering  (Richard Biener; 3 files, -1/+97)

The following implements the group-size-three scheme from
vect_permute_store_chain in SLP grouped store permute lowering and extends
it to power-of-two multiples of group size three.

The scheme goes from vectors A, B and C to
{ A[0], B[0], C[0], A[1], B[1], C[1], ... } by first producing
{ A[0], B[0], X, A[1], B[1], X, ... } (with X random, but chosen as A[n])
and then permuting in C[n] at the appropriate places.  The extension
replaces single vector elements with a power-of-two number of lanes, giving
pairwise interleaving until the final three-input permutes happen.  The
last permute step could be seen as extending C to { C[0], C[0], C[0], ... }
and then performing a blend.

VLA archs will want to use store-lanes here, I guess; I'm not sure if the
three-vector interleave operation is also available with a register source
and destination and thus available for a shuffle.

	* tree-vect-slp.cc (vect_build_slp_instance): Special case
	three-input permute with the same number of lanes in store
	permute lowering.

	* gcc.dg/vect/slp-53.c: New testcase.
	* gcc.dg/vect/slp-54.c: New testcase.
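For context, a minimal sketch (my example, not the committed testcases) of
the kind of group-size-three store this lowering targets:

	/* Interleaved store with group size three: the vectorizer must
	   build { a[0], b[0], c[0], a[1], b[1], c[1], ... } from vectors
	   of a, b and c.  */
	void
	interleave3 (int *restrict out, int *restrict a, int *restrict b,
	             int *restrict c, int n)
	{
	  for (int i = 0; i < n; i++)
	    {
	      out[3 * i + 0] = a[i];
	      out[3 * i + 1] = b[i];
	      out[3 * i + 2] = c[i];
	    }
	}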
2024-07-05  Daily bump.  (GCC Administrator; 5 files, -1/+155)
2024-07-04  analyzer: convert sm_context * to sm_context &  (David Malcolm; 12 files, -427/+419)

These are never nullptr and never change, so use a reference rather than a
pointer.  No functional change intended.

gcc/analyzer/ChangeLog:
	* diagnostic-manager.cc (diagnostic_manager::add_events_for_eedge):
	Pass sm_ctxt by reference.
	* engine.cc (impl_region_model_context::on_condition): Likewise.
	(impl_region_model_context::on_bounded_ranges): Likewise.
	(impl_region_model_context::on_phi): Likewise.
	(exploded_node::on_stmt): Likewise.
	* sm-fd.cc: Update all uses of sm_context * to sm_context &.
	* sm-file.cc: Likewise.
	* sm-malloc.cc: Likewise.
	* sm-pattern-test.cc: Likewise.
	* sm-sensitive.cc: Likewise.
	* sm-signal.cc: Likewise.
	* sm-taint.cc: Likewise.
	* sm.h: Likewise.
	* varargs.cc: Likewise.

gcc/testsuite/ChangeLog:
	* gcc.dg/plugin/analyzer_gil_plugin.c: Update all uses of
	sm_context * to sm_context &.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2024-07-04  analyzer: handle <error.h> at -O0 [PR115724]  (David Malcolm; 2 files, -0/+90)

At -O0, glibc's:

	__extern_always_inline void
	error (int __status, int __errnum, const char *__format, ...)
	{
	  if (__builtin_constant_p (__status) && __status != 0)
	    __error_noreturn (__status, __errnum, __format,
			      __builtin_va_arg_pack ());
	  else
	    __error_alias (__status, __errnum, __format,
			   __builtin_va_arg_pack ());
	}

becomes just:

	__extern_always_inline void
	error (int __status, int __errnum, const char *__format, ...)
	{
	  if (0)
	    __error_noreturn (__status, __errnum, __format,
			      __builtin_va_arg_pack ());
	  else
	    __error_alias (__status, __errnum, __format,
			   __builtin_va_arg_pack ());
	}

and thus calls to "error" are calls to "__error_alias" by the time
-fanalyzer "sees" them.  Handle them with more special-casing in kf.cc.

gcc/analyzer/ChangeLog:
	PR analyzer/115724
	* kf.cc (register_known_functions): Add __error_alias and
	__error_at_line_alias.

gcc/testsuite/ChangeLog:
	PR analyzer/115724
	* c-c++-common/analyzer/error-pr115724.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2024-07-04  [committed][RISC-V] Fix test expectations after recent late-combine changes  (Jeff Law; 1 file, -3/+3)

With the recent DCE-related adjustment to late-combine, the
rvv/base/vcreate.c test no longer has those undesirable vmvNr statements.

It's a bit unclear why this wasn't written as a scan-assembler-not and
xfailed, given the comment says we don't want to see vmvNr instructions.  I
must have missed that during review.

This patch adjusts the test to expect no vmvNr statements; if they're ever
reintroduced, we'll get a nice unexpected failure.

gcc/testsuite
	* gcc.target/riscv/rvv/base/vcreate.c: Update expected output.
2024-07-04  testsuite: Update test for PR115537 to use SVE  (Tamar Christina; 1 file, -1/+1)

The PR was about SVE codegen; the testcase accidentally used neoverse-n1
instead of neoverse-v1, which was in the original report.  This updates the
tool options.

gcc/testsuite/ChangeLog:
	PR tree-optimization/115537
	* gcc.dg/vect/pr115537.c: Update flag from neoverse-n1 to
	neoverse-v1.
2024-07-04  c++ frontend: check for missing condition for novector [PR115623]  (Tamar Christina; 2 files, -1/+11)

It looks like I forgot to check, in the C++ frontend, whether a condition
exists for the loop being adorned with novector.  This causes a segfault
because cond isn't expected to be null.

This fixes it by ignoring the pragma when there's no loop condition, the
same way we do in the C frontend.

gcc/cp/ChangeLog:
	PR c++/115623
	* semantics.cc (finish_for_cond): Add check for C++ cond.

gcc/testsuite/ChangeLog:
	PR c++/115623
	* g++.dg/vect/vect-novector-pragma_2.cc: New test.
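A minimal sketch (my own, not the committed test) of the kind of input that
used to crash: a novector loop with no condition at all.

	/* finish_for_cond used to dereference the null condition here;
	   the pragma is now ignored instead.  */
	void
	f (int *a, int n)
	{
	  int i = 0;
	#pragma GCC novector
	  for (;;)
	    {
	      if (i == n)
	        break;
	      a[i] = 0;
	      ++i;
	    }
	}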
2024-07-04  arm: Use LDMIA/STMIA for thumb1 DI/DF loads/stores  (Siarhei Volkau; 4 files, -8/+58)

If the address register is dead after the load/store operation, it looks
beneficial to use LDMIA/STMIA instead of a pair of LDR/STR instructions, at
least when optimizing for size.

gcc/ChangeLog:
	* config/arm/arm.cc (thumb_load_double_from_address): Emit ldmia
	when address reg rewritten by load.
	* config/arm/thumb1.md (peephole2 to rewrite DI/DF load): New.
	(peephole2 to rewrite DI/DF store): New.
	* config/arm/iterators.md (DIDF): New.

gcc/testsuite:
	* gcc.target/arm/thumb1-load-store-64bit.c: Add new test.

Signed-off-by: Siarhei Volkau <lis8215@gmail.com>
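A sketch of the shape this targets (mine, not the added testcase): a 64-bit
load through a pointer that dies after the access, where one ldmia can
replace two ldr instructions.

	/* p is dead after the access, so on thumb1 a single ldmia can
	   load both halves of the 64-bit value.  */
	long long
	load64 (long long *p)
	{
	  return *p;
	}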
2024-07-04  Aarch64, bugfix: Fix NEON bigendian addp intrinsic [PR114890]  (Alfie Richards; 2 files, -3/+1)

This change removes code that erroneously switched the operands in
big-endian mode.  This fixes the related test also.

gcc/ChangeLog:
	PR target/114890
	* config/aarch64/aarch64-simd.md: Remove bigendian operand swap.

gcc/testsuite/ChangeLog:
	PR target/114890
	* gcc.target/aarch64/vector_intrinsics_asm.c: Remove xfail.
2024-07-04  Aarch64: Add test for non-commutative SIMD intrinsic  (Alfie Richards; 1 file, -0/+371)

This adds a test for non-commutative SIMD NEON intrinsics.  Specifically,
addp is non-commutative and has a bug in the current big-endian
implementation.

gcc/testsuite/ChangeLog:
	* gcc.target/aarch64/vector_intrinsics_asm.c: New test.
2024-07-04  middle-end/115426 - wrong gimplification of "rm" asm output operand  (Richard Biener; 2 files, -0/+22)

When the operand is gimplified to an extract of a register or a register we
have to disallow memory, as we otherwise fail to gimplify it properly.
Instead of

	__asm__("" : "=rm" __imag <r>);

we want

	__asm__("" : "=rm" D.2772);
	_1 = REALPART_EXPR <r>;
	r = COMPLEX_EXPR <_1, D.2772>;

otherwise SSA rewrite will fail and generate wrong code with 'r' left bare
in the asm output.

	PR middle-end/115426
	* gimplify.cc (gimplify_asm_expr): Handle "rm" output
	constraint gimplified to a register (operation).

	* gcc.dg/pr115426.c: New testcase.
2024-07-04  Use __builtin_cpu_supports instead of __get_cpuid_count.  (liuhongt; 1 file, -26/+20)

gcc/testsuite/ChangeLog:
	PR target/115748
	* gcc.target/i386/avx512-check.h: Use __builtin_cpu_supports
	instead of __get_cpuid_count.
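For reference, a hedged sketch of the replacement style (the feature name
is illustrative, not quoted from the patch): the builtin lets libgcc do the
CPUID decoding instead of hand-written leaf checks.

	/* Checking an ISA feature via the builtin rather than decoding
	   CPUID leaves by hand with __get_cpuid_count.  */
	static int
	have_avx512f (void)
	{
	  return __builtin_cpu_supports ("avx512f");
	}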
2024-07-04  i386: Add additional variant of bswaphisi2_lowpart peephole2.  (Roger Sayle; 2 files, -0/+35)

This patch adds an additional variation of the peephole2 used to convert
bswaphisi2_lowpart into rotlhi3_1_slp, which converts xchgb %ah,%al into
rotw if the flags register isn't live.  The motivating example is:

	void ext(int x);
	void foo(int x)
	{
	  ext((x&~0xffff)|((x>>8)&0xff)|((x&0xff)<<8));
	}

where GCC with -O2 currently produces:

	foo:	movl	%edi, %eax
		rolw	$8, %ax
		movl	%eax, %edi
		jmp	ext

The issue is that the original xchgb (bswaphisi2_lowpart) can only be
performed in "Q" registers that allow the %?h register to be used, so
reload generates the above two movl.  However, later in peephole2 we see
that CC_FLAGS can be clobbered, so we can use a rotate word, which is more
forgiving with register allocations.  With the additional peephole2
proposed here, we now generate:

	foo:	rolw	$8, %di
		jmp	ext

2024-07-04  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/i386/i386.md (bswaphisi2_lowpart peephole2): New
	peephole2 variant to eliminate register shuffling.

gcc/testsuite/ChangeLog
	* gcc.target/i386/xchg-4.c: New test case.
2024-07-03  [committed] Fix newlib build failure with rx as well as several dozen testsuite failures  (Jeff Law; 1 file, -3/+2)

The rx port has been failing to build newlib for a bit over a week.  I
can't remember if it was the late-combine work or the IRA costing twiddle;
regardless, the real bug is in the rx backend.

Basically dwarf2cfi is blowing up because of inconsistent state caused by
the failure to mark a stack adjustment as frame related.  This instance in
the epilogue looks like a simple goof.

With the port building again, the testsuite would run and it showed a
number of regressions, again related to CFI handling.  The common thread
was a failure to mark a copy from FP to SP in the prologue as frame
related.  The change which introduced this bug was supposed to just be
changing promotions of vector types.  It's unclear if Nick included the
hunk accidentally or just goof'd on the logic.  Regardless, it looks quite
incorrect.  Reverting that hunk fixes the regressions *and* fixes 94
pre-existing failures.

The net is rx-elf is regression free and has moved forward in terms of its
testsuite status.  Pushing to the trunk momentarily.

gcc/
	* config/rx/rx.cc (rx_expand_prologue): Mark the copy from FP
	to SP as frame related.
	(rx_expand_epilogue): Mark the stack pointer adjustment as
	frame related.
2024-07-04  [APX PPX] Avoid generating unmatched pushp/popp in pro/epilogue  (Hongyu Wang; 4 files, -4/+34)

According to the APX spec, pushp/popp pairs should be matched, otherwise
the PPX hint cannot take effect and causes performance loss.

In ix86_expand_epilogue, there are several optimizations that may cause the
epilogue to use mov to restore the regs.  Check if PPX is applied and
prevent usage of mov/leave in the epilogue.  Also do not use PPX for
eh_return.

gcc/ChangeLog:
	* config/i386/i386.cc (ix86_expand_prologue): Set apx_ppx_used
	flag in m.fs with TARGET_APX_PPX && !crtl->calls_eh_return.
	(ix86_emit_save_regs): Emit ppx only when TARGET_APX_PPX &&
	!crtl->calls_eh_return.
	(ix86_expand_epilogue): Don't restore reg using mov when
	apx_ppx_used flag is true.
	* config/i386/i386.h (struct machine_frame_state): Add
	apx_ppx_used flag.

gcc/testsuite/ChangeLog:
	* gcc.target/i386/apx-ppx-2.c: New test.
	* gcc.target/i386/apx-ppx-3.c: Likewise.
2024-07-03  c++: OVERLOAD in diagnostics  (Jason Merrill; 2 files, -5/+3)

In modules we can get an OVERLOAD around a non-function, so let's tail
recurse instead of falling through.  As a result we start printing the
template header in this testcase.

gcc/cp/ChangeLog:
	* error.cc (dump_decl) [OVERLOAD]: Recurse on single case.

gcc/testsuite/ChangeLog:
	* g++.dg/warn/pr61945.C: Adjust diagnostic.
2024-07-03  c++: CTAD and trait built-ins  (Jason Merrill; 1 file, -0/+5)

While poking at 101232 I noticed that we started trying to parse
__is_invocable(_Fn, _Args...) as a functional cast to a CTAD placeholder
type; we shouldn't consider CTAD for a template that shares a name
(reserved for the implementation) with a built-in trait.

gcc/cp/ChangeLog:
	* pt.cc (ctad_template_p): Return false for trait names.
2024-07-04  vect: Fix ICE caused by missing check for TREE_CODE == SSA_NAME  (Hu, Lin1; 4 files, -1/+48)

We need to check that the tree's code is SSA_NAME before using
SSA_NAME_RANGE_INFO.

2024-07-03  Hu, Lin1  <lin1.hu@intel.com>
	    Andrew Pinski  <quic_apinski@quicinc.com>

gcc/ChangeLog:
	PR tree-optimization/115753
	* tree-vect-stmts.cc (supportable_indirect_convert_operation):
	Add TREE_CODE check before SSA_NAME_RANGE_INFO.

gcc/testsuite/ChangeLog:
	PR tree-optimization/115753
	* gcc.dg/vect/pr115753-1.c: New test.
	* gcc.dg/vect/pr115753-2.c: Ditto.
	* gcc.dg/vect/pr115753-3.c: Ditto.
2024-07-04  Daily bump.  (GCC Administrator; 6 files, -1/+286)
2024-07-03  [committed] Fix previously latent bug in reorg affecting cris port  (Jeff Law; 1 file, -1/+2)

The late-combine patch has triggered a previously latent bug in reorg.

Basically we have a sequence like this in the middle of reorg before we
start relaxing delay slots (cris-elf, gcc.dg/torture/pr98289.c):

> (insn 67 49 18 (sequence [
>             (jump_insn 50 49 52 (set (pc)
>                     (if_then_else (ne (reg:CC 19 ccr)
>                             (const_int 0 [0]))
>                         (label_ref:SI 30)
>                         (pc))) "j.c":10:6 discrim 1 282 {*bnecc}
>                  (expr_list:REG_DEAD (reg:CC 19 ccr)
>                     (int_list:REG_BR_PROB 7 (nil)))
>              -> 30)
>             (insn/f 52 50 18 (set (mem:SI (reg/f:SI 14 sp) [1 S4 A8])
>                     (reg:SI 16 srp)) 37 {*mov_tomemsi}
>                  (nil))
>         ]) "j.c":10:6 discrim 1 -1
>      (nil))
>
> (note 18 67 54 [bb 3] NOTE_INSN_BASIC_BLOCK)
>
> (note 54 18 55 NOTE_INSN_EPILOGUE_BEG)
>
> (jump_insn 55 54 56 (return) "j.c":14:1 228 {*return_expanded}
>      (nil)
>  -> return)
>
> (barrier 56 55 43)
>
> (note 43 56 65 [bb 4] NOTE_INSN_BASIC_BLOCK)
>
> (note 65 43 30 NOTE_INSN_SWITCH_TEXT_SECTIONS)
>
> (code_label 30 65 8 5 6 (nil) [1 uses])
>
> (note 8 30 61 [bb 5] NOTE_INSN_BASIC_BLOCK)

So at a high level the things to note are that insn 50 conditionally jumps
around insn 55.  Second, there's a SWITCH_TEXT_SECTIONS note between insn
50 and the target label for insn 50 (code_label 30).

reorg sees the conditional jump around the unconditional jump/return and
will invert the jump and retarget the original jump to an appropriate
location.  In this case generating:

> (insn 67 49 18 (sequence [
>             (jump_insn 50 49 52 (set (pc)
>                     (if_then_else (eq (reg:CC 19 ccr)
>                             (const_int 0 [0]))
>                         (label_ref:SI 68)
>                         (pc))) "j.c":10:6 discrim 1 281 {*beqcc}
>                  (expr_list:REG_DEAD (reg:CC 19 ccr)
>                     (int_list:REG_BR_PROB 1073741831 (nil)))
>              -> 68)
>             (insn/s/f 52 50 18 (set (mem:SI (reg/f:SI 14 sp) [1 S4 A8])
>                     (reg:SI 16 srp)) 37 {*mov_tomemsi}
>                  (nil))
>         ]) "j.c":10:6 discrim 1 -1
>      (nil))
>
> (note 18 67 54 [bb 3] NOTE_INSN_BASIC_BLOCK)
>
> (note 54 18 43 NOTE_INSN_EPILOGUE_BEG)
>
> (note 43 54 65 [bb 4] NOTE_INSN_BASIC_BLOCK)
>
> (note 65 43 8 NOTE_INSN_SWITCH_TEXT_SECTIONS)
>
> (note 8 65 61 [bb 5] NOTE_INSN_BASIC_BLOCK)

[ ... ]

Where the new target of the jump is a return statement later in the IL.

Note that we now have a SWITCH_TEXT_SECTIONS note that is not immediately
preceded by a BARRIER.  That triggers an assertion in the dwarf2 code.
Removal of the BARRIER is inherent in this optimization.

The fix is simple: we avoid this optimization when there's a
SWITCH_TEXT_SECTIONS note between the conditional jump insn and its target.
Thankfully we already have a routine to test for this in reorg, so we just
need to call it appropriately.  The other approach would be to drop the
note, which I considered and discarded.

We don't have great coverage for delay slot targets.  I've tested arc,
cris, fr30, frv, h8, iq2000, microblaze, or1k, sh3 and visium in my tester
as crosses without new regressions, fixing one regression along the way.
Bootstrap & regression testing on sh4 and hppa will take considerably
longer.

gcc/
	* reorg.cc (relax_delay_slots): Do not optimize a conditional
	jump around an unconditional jump/return in the presence of a
	text section switch.
2024-07-03  Revert "Delete MALLOC_ABI_ALIGNMENT define from pa32-linux.h"  (John David Anglin; 1 file, -0/+5)
This reverts commit 0ee3266b3dec4d984d43c79e2b3e649256e3eaaa.
2024-07-03  Fortran: fix associate with assumed-length character array [PR115700]  (Harald Anlauf; 2 files, -4/+47)

gcc/fortran/ChangeLog:
	PR fortran/115700
	* trans-stmt.cc (trans_associate_var): When the associate target
	is an array-valued character variable, the length is known at
	entry of the associate block.  Move setting of string length of
	the selector to the initialization part of the block.

gcc/testsuite/ChangeLog:
	PR fortran/115700
	* gfortran.dg/associate_69.f90: New test.
2024-07-03  RISC-V: Describe -march behavior for dependent extensions  (Palmer Dabbelt; 1 file, -0/+4)

gcc/ChangeLog:
	* doc/invoke.texi: Describe -march behavior for dependent
	extensions on RISC-V.
2024-07-03  RISC-V: Add support for Zabha extension  (Gianluca Guida; 30 files, -8/+476)

The Zabha extension adds support for subword Zaamo ops.

Extension: https://github.com/riscv/riscv-zabha.git
Ratification: https://jira.riscv.org/browse/RVS-1685

gcc/ChangeLog:
	* common/config/riscv/riscv-common.cc
	(riscv_subset_list::to_string): Skip zabha when not supported by
	the assembler.
	* config.in: Regenerate.
	* config/riscv/arch-canonicalize: Make zabha imply zaamo.
	* config/riscv/iterators.md (amobh): Add iterator for amo
	byte/halfword.
	* config/riscv/riscv.opt: Add zabha.
	* config/riscv/sync.md (atomic_<atomic_optab><mode>): Add
	subword atomic op pattern.
	(zabha_atomic_fetch_<atomic_optab><mode>): Add subword
	atomic_fetch op pattern.
	(lrsc_atomic_fetch_<atomic_optab><mode>): Prefer zabha over lrsc
	for subword atomic ops.
	(zabha_atomic_exchange<mode>): Add subword atomic exchange
	pattern.
	(lrsc_atomic_exchange<mode>): Prefer zabha over lrsc for subword
	atomic exchange ops.
	* configure: Regenerate.
	* configure.ac: Add zabha assembler check.
	* doc/sourcebuild.texi: Add zabha documentation.

gcc/testsuite/ChangeLog:
	* lib/target-supports.exp: Add zabha testsuite infra support.
	* gcc.target/riscv/amo/inline-atomics-1.c: Remove zabha to
	continue to test the lr/sc subword patterns.
	* gcc.target/riscv/amo/inline-atomics-2.c: Ditto.
	* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-acq-rel.c: Ditto.
	* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-acquire.c: Ditto.
	* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-relaxed.c: Ditto.
	* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-release.c: Ditto.
	* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-seq-cst.c: Ditto.
	* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-acq-rel.c: Ditto.
	* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-acquire.c: Ditto.
	* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-relaxed.c: Ditto.
	* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-release.c: Ditto.
	* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-seq-cst.c: Ditto.
	* gcc.target/riscv/amo/zabha-all-amo-ops-char-run.c: New test.
	* gcc.target/riscv/amo/zabha-all-amo-ops-short-run.c: New test.
	* gcc.target/riscv/amo/zabha-rvwmo-all-amo-ops-char.c: New test.
	* gcc.target/riscv/amo/zabha-rvwmo-all-amo-ops-short.c: New test.
	* gcc.target/riscv/amo/zabha-rvwmo-amo-add-char.c: New test.
	* gcc.target/riscv/amo/zabha-rvwmo-amo-add-short.c: New test.
	* gcc.target/riscv/amo/zabha-ztso-amo-add-char.c: New test.
	* gcc.target/riscv/amo/zabha-ztso-amo-add-short.c: New test.

Co-Authored-By: Patrick O'Neill <patrick@rivosinc.com>
Signed-Off-By: Gianluca Guida <gianluca@rivosinc.com>
Tested-by: Andrea Parri <andrea@rivosinc.com>
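As a rough illustration (mine, not from the patch): with zabha enabled, a
byte-sized atomic like the one below can use a single subword AMO instead
of an lr.w/sc.w masking loop.

	#include <stdint.h>

	/* With a zabha-enabled -march this can become a single amoadd.b;
	   without it, GCC emits an lr.w/sc.w loop with masking.  */
	uint8_t
	fetch_add_u8 (uint8_t *p, uint8_t v)
	{
	  return __atomic_fetch_add (p, v, __ATOMIC_RELAXED);
	}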
2024-07-03  [PATCH] ARC: Update gcc.target/arc/pr9001184797.c test  (Luis Silva; 1 file, -1/+3)

... to comply with new standards due to stricter analysis in the latest GCC
versions.

gcc/testsuite/ChangeLog:
	* gcc.target/arc/pr9001184797.c: Fix compiler warnings.
2024-07-03  RISC-V: Bugfix vfmv insn honor zvfhmin for FP16 SEW [PR115763]  (Pan Li; 7 files, -26/+64)

According to the ISA, the zvfhmin sub-extension should only contain
conversion insns.  Thus, the vfmv insn acting on FP16 should not be present
when only the zvfhmin option is given.

This patch fixes that by splitting the pred_broadcast define_insn into
zvfhmin and zvfh parts.  Given the example below:

	void test (_Float16 *dest, _Float16 bias)
	{
	  dest[0] = bias;
	  dest[1] = bias;
	}

when compiled with -march=rv64gcv_zfh_zvfhmin.

Before this patch:

	test:
	  vsetivli	zero,2,e16,mf4,ta,ma
	  vfmv.v.f	v1,fa0    // should not leverage vfmv for zvfhmin
	  vse16.v	v1,0(a0)
	  ret

After this patch:

	test:
	  addi		sp,sp,-16
	  fsh		fa0,14(sp)
	  addi		a5,sp,14
	  vsetivli	zero,2,e16,mf4,ta,ma
	  vlse16.v	v1,0(a5),zero
	  vse16.v	v1,0(a0)
	  addi		sp,sp,16
	  jr		ra

	PR target/115763

gcc/ChangeLog:
	* config/riscv/vector.md (*pred_broadcast<mode>): Split into
	zvfh and zvfhmin parts.
	(*pred_broadcast<mode>_zvfh): New define_insn for zvfh part.
	(*pred_broadcast<mode>_zvfhmin): Ditto but for zvfhmin.

gcc/testsuite/ChangeLog:
	* gcc.target/riscv/rvv/base/scalar_move-5.c: Adjust asm check.
	* gcc.target/riscv/rvv/base/scalar_move-6.c: Ditto.
	* gcc.target/riscv/rvv/base/scalar_move-7.c: Ditto.
	* gcc.target/riscv/rvv/base/scalar_move-8.c: Ditto.
	* gcc.target/riscv/rvv/base/pr115763-1.c: New test.
	* gcc.target/riscv/rvv/base/pr115763-2.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
2024-07-03  Match: Allow more types truncation for .SAT_TRUNC  (Pan Li; 1 file, -6/+6)

The .SAT_TRUNC has input and output types, aka a conversion from itype to
otype with sizeof (otype) < sizeof (itype).  The previous patch only
allowed sizeof (otype) == sizeof (itype) / 2, but we actually have 1/4 and
1/8 truncations too.  This patch supports more truncations whenever
sizeof (otype) < sizeof (itype).  The truncations below will be covered:

* uint64_t => uint8_t
* uint64_t => uint16_t
* uint64_t => uint32_t
* uint32_t => uint8_t
* uint32_t => uint16_t
* uint16_t => uint8_t

The below test suites are passed for this patch:
1. The rv64gcv fully regression tests.
2. The rv64gcv build with glibc.
3. The x86 bootstrap tests.
4. The x86 fully regression tests.

gcc/ChangeLog:
	* match.pd: Allow truncation to any otype that is less than
	itype.

Signed-off-by: Pan Li <pan2.li@intel.com>
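A minimal scalar sketch of the saturating-truncation shape being matched,
mirroring the vector form in the next entry (the function name is mine):

	#include <stdint.h>
	#include <stdbool.h>

	/* Saturating truncation: the result is x when it fits in uint8_t,
	   otherwise 0xff.  This is the shape recognized as .SAT_TRUNC.  */
	uint8_t
	sat_trunc_u64_to_u8 (uint64_t x)
	{
	  bool overflow = x > (uint64_t)(uint8_t)-1;
	  return ((uint8_t)x) | (uint8_t)-overflow;
	}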
2024-07-03  Vect: Support IFN SAT_TRUNC for unsigned vector int  (Pan Li; 1 file, -0/+54)

This patch supports the .SAT_TRUNC for the unsigned vector int.  Given we
have the example code below:

Form 1:

	#define VEC_DEF_SAT_U_TRUC_FMT_1(NT, WT)                             \
	void __attribute__((noinline))                                       \
	vec_sat_u_truc_##WT##_to_##NT##_fmt_1 (NT *x, WT *y, unsigned limit) \
	{                                                                    \
	  for (unsigned i = 0; i < limit; i++)                               \
	    {                                                                \
	      bool overflow = y[i] > (WT)(NT)(-1);                           \
	      x[i] = ((NT)y[i]) | (NT)-overflow;                             \
	    }                                                                \
	}

	VEC_DEF_SAT_U_TRUC_FMT_1 (uint32_t, uint64_t)

Before this patch:

	void vec_sat_u_truc_uint64_t_to_uint32_t_fmt_1 (uint32_t * x, uint64_t * y, unsigned int limit)
	{
	  ...
	  _51 = .SELECT_VL (ivtmp_49, POLY_INT_CST [2, 2]);
	  ivtmp_35 = _51 * 8;
	  vect__4.7_32 = .MASK_LEN_LOAD (vectp_y.5_34, 64B, { -1, ... }, _51, 0);
	  mask_overflow_16.8_30 = vect__4.7_32 > { 4294967295, ... };
	  vect__5.9_29 = (vector([2,2]) unsigned int) vect__4.7_32;
	  vect__10.13_20 = .VCOND_MASK (mask_overflow_16.8_30, { 4294967295, ... }, vect__5.9_29);
	  ivtmp_12 = _51 * 4;
	  .MASK_LEN_STORE (vectp_x.14_11, 32B, { -1, ... }, _51, 0, vect__10.13_20);
	  vectp_y.5_33 = vectp_y.5_34 + ivtmp_35;
	  vectp_x.14_46 = vectp_x.14_11 + ivtmp_12;
	  ivtmp_50 = ivtmp_49 - _51;
	  if (ivtmp_50 != 0)
	  ...
	}

After this patch:

	void vec_sat_u_truc_uint64_t_to_uint32_t_fmt_1 (uint32_t * x, uint64_t * y, unsigned int limit)
	{
	  ...
	  _12 = .SELECT_VL (ivtmp_21, POLY_INT_CST [2, 2]);
	  ivtmp_34 = _12 * 8;
	  vect__4.7_31 = .MASK_LEN_LOAD (vectp_y.5_33, 64B, { -1, ... }, _12, 0);
	  vect_patt_40.8_30 = .SAT_TRUNC (vect__4.7_31); // << .SAT_TRUNC
	  ivtmp_29 = _12 * 4;
	  .MASK_LEN_STORE (vectp_x.9_28, 32B, { -1, ... }, _12, 0, vect_patt_40.8_30);
	  vectp_y.5_32 = vectp_y.5_33 + ivtmp_34;
	  vectp_x.9_27 = vectp_x.9_28 + ivtmp_29;
	  ivtmp_20 = ivtmp_21 - _12;
	  if (ivtmp_20 != 0)
	  ...
	}

The below test suites are passed for this patch:
* The x86 bootstrap test.
* The x86 fully regression test.
* The rv64gcv fully regression tests.

gcc/ChangeLog:
	* tree-vect-patterns.cc (gimple_unsigned_integer_sat_trunc): Add
	new decl generated by match.
	(vect_recog_sat_trunc_pattern): Add new func impl to recog the
	.SAT_TRUNC pattern.

Signed-off-by: Pan Li <pan2.li@intel.com>
2024-07-03  Remove redundant vector permute dump  (Richard Biener; 1 file, -10/+0)

The following removes redundant dumping in vect permute vectorization.

	* tree-vect-slp.cc (vectorizable_slp_permutation_1): Remove
	redundant dump.
2024-07-03  [PATCH] match.pd: Fold x/sqrt(x) to sqrt(x)  (Jennifer Schmitz; 2 files, -0/+27)

This patch adds a pattern in match.pd folding x/sqrt(x) to sqrt(x) for
-funsafe-math-optimizations.  Test cases were added for double, float, and
long double.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no
regression.
Ok for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>

gcc/
	* match.pd: Fold x/sqrt(x) to sqrt(x).

gcc/testsuite/
	* gcc.dg/tree-ssa/sqrt_div.c: New test.
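A quick sketch of the kind of function the new fold applies to (my example,
not the committed test):

	/* With -funsafe-math-optimizations, x / sqrt (x) now folds to
	   sqrt (x), removing the division.  */
	double
	f (double x)
	{
	  return x / __builtin_sqrt (x);
	}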
2024-07-03  Deduplicate explicitly-sized types  (Alexandre Oliva; 4 files, -9/+120)

When make_type_from_size is called with a biased type, for an entity that
isn't explicitly biased, we may refrain from reusing the given type because
it doesn't seem to match, and then proceed to create an exact copy of that
type.

Compute earlier the biased status of the expected type, early enough for
the suitability check of the given type.  Modify for_biased instead of
biased_p, so that biased_p remains with the given type's status for the
comparison.

Avoid creating unnecessary copies of types in make_type_from_size, by
caching and reusing previously-created identical types, similarly to the
caching of packable types.

While at that, fix two vaguely related issues:

- TYPE_DEBUG_TYPE's storage is shared with other sorts of references to
  types, so it shouldn't be accessed unless TYPE_CAN_HAVE_DEBUG_TYPE_P
  holds.

- When we choose the narrower/packed variant of a type as the main debug
  info type, we fail to output its name if we fail to follow debug type
  for the TYPE_NAME decl type in modified_type_die.

for gcc/ada/ChangeLog
	* gcc-interface/misc.cc (gnat_get_array_descr_info): Only follow
	TYPE_DEBUG_TYPE if TYPE_CAN_HAVE_DEBUG_TYPE_P.
	* gcc-interface/utils.cc (sized_type_hash): New struct.
	(sized_type_hasher): New struct.
	(sized_type_hash_table): New variable.
	(init_gnat_utils): Allocate it.
	(destroy_gnat_utils): Release it.
	(sized_type_hasher::equal): New.
	(hash_sized_type): New.
	(canonicalize_sized_type): New.
	(make_type_from_size): Use it to cache packed variants.  Fix
	type reuse by combining biased_p and for_biased earlier.  Hold
	the combination in for_biased, adjusting later uses.

for gcc/ChangeLog
	* dwarf2out.cc (modified_type_die): Follow name's debug type.

for gcc/testsuite/ChangeLog
	* gnat.dg/bias1.adb: Count occurrences of -7.*DW_AT_GNU_bias.
2024-07-03  [debug] Avoid dropping bits from num/den in fixed-point types  (Alexandre Oliva; 2 files, -21/+64)

We used to use an unsigned 128-bit type to hold the numerator and
denominator used to represent the delta of a fixed-point type in debug
information, but there are cases in which that was not enough, and more
significant bits silently overflowed and got omitted from debug
information.

Introduce a mode in which UI_to_gnu selects a wide-enough unsigned type,
and use that to convert numerator and denominator.  While at that, avoid
exceeding the maximum precision for wide ints, and for available int modes,
when selecting a type to represent very wide constants, falling back to 0/0
for unrepresentable fractions.

for gcc/ada/ChangeLog
	* gcc-interface/cuintp.cc (UI_To_gnu): Add mode that selects a
	wide enough unsigned type.  Fail if the constant exceeds the
	representable numbers.
	* gcc-interface/decl.cc (gnat_to_gnu_entity): Use it for
	numerator and denominator of fixed-point types.  In case of
	failure, fall back to an indeterminate fraction.
2024-07-03  [i386] restore recompute to override opts after change [PR113719]  (Alexandre Oliva; 1 file, -19/+40)

The first patch for PR113719 regressed gcc.dg/ipa/iinline-attr.c on
toolchains configured with --enable-frame-pointer, because the optimization
node created within handle_optimize_attribute had flag_omit_frame_pointer
incorrectly set, whereas default_optimization_node didn't.  With this
difference, can_inline_edge_by_limits_p flagged an optimization mismatch
and we refused to inline the function that had a redundant optimization
flag into one that didn't, which is exactly what is tested for there.

This patch restores the calls to ix86_default_align and
ix86_recompute_optlev_based_flags that used to be, and ought to be, issued
during TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE, but preserves the intent of
the original change, of having those functions called at different spots
within ix86_option_override_internal.  To that end, the remaining bits were
refactored into a separate function, that was in turn adjusted to operate
on explicitly-passed opts and opts_set, rather than going for their global
counterparts.

for gcc/ChangeLog
	PR target/113719
	* config/i386/i386-options.cc
	(ix86_override_options_after_change_1): Add opts and opts_set
	parms, operate on them, after factoring out of...
	(ix86_override_options_after_change): ... this.  Restore calls
	of ix86_default_align and ix86_recompute_optlev_based_flags.
	(ix86_option_override_internal): Call the factored-out bits.
2024-07-03  aarch64: PR target/115475 Implement missing __ARM_FEATURE_SVE_BF16 macro  (Kyrylo Tkachov; 2 files, -0/+13)

The ACLE requires __ARM_FEATURE_SVE_BF16 to be defined when SVE and BF16
and the associated intrinsics are available.  GCC does support the required
intrinsics for TARGET_SVE_BF16, so define this macro too.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/
	PR target/115475
	* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins):
	Define __ARM_FEATURE_SVE_BF16 for TARGET_SVE_BF16.

gcc/testsuite/
	PR target/115475
	* gcc.target/aarch64/acle/bf16_sve_feature.c: New test.

Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
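A hedged usage sketch (mine, not the committed test): guarding SVE BF16
intrinsic code on the macro, as the ACLE intends.

	#include <arm_sve.h>

	#ifdef __ARM_FEATURE_SVE_BF16
	/* Only compiled when the SVE BF16 intrinsics are available.  */
	svbfloat16_t
	dup_bf16 (bfloat16_t x)
	{
	  return svdup_n_bf16 (x);
	}
	#endif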
2024-07-03  aarch64: PR target/115457 Implement missing __ARM_FEATURE_BF16 macro  (Kyrylo Tkachov; 2 files, -0/+12)

The ACLE asks the user to test for __ARM_FEATURE_BF16 before using the
<arm_bf16.h> header, but GCC doesn't set this up.  LLVM does, so this is an
inconsistency between the compilers.

This patch enables that macro for TARGET_BF16_FP.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/
	PR target/115457
	* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins):
	Define __ARM_FEATURE_BF16 for TARGET_BF16_FP.

gcc/testsuite/
	PR target/115457
	* gcc.target/aarch64/acle/bf16_feature.c: New test.

Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
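For illustration (my example, not the new test), the usage pattern the ACLE
prescribes: test the macro before including the header.

	#ifdef __ARM_FEATURE_BF16
	#include <arm_bf16.h>

	/* bfloat16_t comes from <arm_bf16.h>, which is only safe to
	   include once the feature macro is defined.  */
	bfloat16_t
	copy_bf16 (bfloat16_t x)
	{
	  return x;
	}
	#endif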
2024-07-03  Handle NULL stmt in SLP_TREE_SCALAR_STMTS  (Richard Biener; 3 files, -39/+61)

The following starts to handle NULL elements in SLP_TREE_SCALAR_STMTS, with
the first candidate being the two-operator nodes where some lanes are
do-not-care and also do not have a scalar stmt computing the result.  I
originally added SLP_TREE_SCALAR_STMTS to two-operator nodes, but this
exposes PR115764, so I've split that out.

I have a patch using NULL elements for loads from groups with gaps, where
we currently get around not doing that by having a load permutation.

	* tree-vect-slp.cc (bst_traits::hash): Handle NULL elements in
	SLP_TREE_SCALAR_STMTS.
	(vect_print_slp_tree): Likewise.
	(vect_mark_slp_stmts): Likewise.
	(vect_mark_slp_stmts_relevant): Likewise.
	(vect_find_last_scalar_stmt_in_slp): Likewise.
	(vect_bb_slp_mark_live_stmts): Likewise.
	(vect_slp_prune_covered_roots): Likewise.
	(vect_bb_partition_graph_r): Likewise.
	(vect_remove_slp_scalar_calls): Likewise.
	(vect_slp_gather_vectorized_scalar_stmts): Likewise.
	(vect_bb_slp_scalar_cost): Likewise.
	(vect_contains_pattern_stmt_p): Likewise.
	(vect_slp_convert_to_external): Likewise.
	(vect_find_first_scalar_stmt_in_slp): Likewise.
	(vect_optimize_slp_pass::remove_redundant_permutations): Likewise.
	(vect_slp_analyze_node_operations_1): Likewise.
	(vect_schedule_slp_node): Likewise.
	* tree-vect-stmts.cc (can_vectorize_live_stmts): Likewise.
	(vectorizable_shift): Likewise.
	* tree-vect-data-refs.cc (vect_slp_analyze_load_dependences):
	Handle NULL elements in SLP_TREE_SCALAR_STMTS.
2024-07-03  AVR: target/98762 - Handle partial clobber in movqi output.  (Georg-Johann Lay; 2 files, -5/+41)

	PR target/98762

gcc/
	* config/avr/avr.cc (avr_out_movqi_r_mr_reg_disp_tiny): Properly
	restore the base register when it is partially clobbered.

gcc/testsuite/
	* gcc.target/avr/torture/pr98762.c: New test.
2024-07-03  ivopts: replace constant_multiple_of with aff_combination_constant_multiple_p [PR114932]  (Tamar Christina; 1 file, -58/+8)

The current implementation of constant_multiple_of is a more limited
version of aff_combination_constant_multiple_p.  The only non-debug usage
of constant_multiple_of will proceed with the values as affine trees.

There is scope for further optimization here: namely, I believe that if
constant_multiple_of returned the aff_tree after the conversion, then
get_computation_aff_1 could use it instead of manually creating the
aff_tree.  However, I think it makes sense to first commit this smaller
change and then incrementally change things.

gcc/ChangeLog:
	PR tree-optimization/114932
	* tree-ssa-loop-ivopts.cc (constant_multiple_of): Use
	aff_combination_constant_multiple_p instead.
2024-07-03  ivopts: fix wide_int_constant_multiple_p when VAL and DIV are 0 [PR114932]  (Tamar Christina; 1 file, -5/+8)

wide_int_constant_multiple_p tries to check, for two tree expressions a and
b, whether there is a multiplier c which makes a == b * c.  This code
however seems to think that there is no such c when a = 0 and b = 0, which
is of course wrong: any c satisfies 0 == 0 * c.  This fixes it and also
fixes the comment.

gcc/ChangeLog:
	PR tree-optimization/114932
	* tree-affine.cc (wide_int_constant_multiple_p): Support 0 and 0
	being multiples.
2024-07-03  Give fast DCE a separate dirty flag  (Richard Sandiford; 4 files, -30/+74)

Thomas pointed out that we sometimes failed to eliminate some dead code
(specifically clobbers of otherwise unused registers) on nvptx when
late-combine is enabled.  This happens because:

- combine is able to optimise the function in a way that exposes dead
  code.  This leaves the df information in a "dirty" state.

- late_combine calls df_analyze without the DF_LR_RUN_DCE flag set.  This
  updates the df information and clears the "dirty" state.

- late_combine doesn't find any extra optimisations, and so leaves the df
  information up-to-date.

- if_after_combine (ce2) calls df_analyze with DF_LR_RUN_DCE set.  Because
  the df information is already up-to-date, fast DCE is not run.

The upshot is that running late-combine has the effect of suppressing a
DCE opportunity that would have been noticed without late_combine.

I think this shows that we should track the state of the DCE separately
from the LR problem.  Every pass updates the latter, but not all passes
update the former.

gcc/
	* df.h (DF_LR_DCE): New df_problem_id.
	(df_lr_dce): New macro.
	* df-core.cc (rest_of_handle_df_finish): Check for a null
	free_fun.
	* df-problems.cc (df_lr_finalize): Split out fast DCE handling
	to...
	(df_lr_dce_finalize): ...this new function.
	(problem_LR_DCE): New df_problem.
	(df_lr_add_problem): Register LR_DCE rather than LR itself.
	* dce.cc (fast_dce): Clear df_lr_dce->solutions_dirty.
2024-07-03  Move runtime check into a separate function and guard it with target ("no-avx")  (liuhongt; 1 file, -1/+13)

The patch avoids a SIGILL on non-AVX512 machines due to a kmovd being
generated in the dynamic check.

gcc/testsuite/ChangeLog:
	PR target/115748
	* gcc.target/i386/avx512-check.h: Move runtime check into a
	separate function and guard it with target ("no-avx").
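A hedged sketch of the technique (details differ from the actual header;
the bit and helper names come from <cpuid.h>): compile the checking
function itself without AVX, so the compiler cannot emit AVX/AVX512
instructions before the runtime check has passed.

	#include <cpuid.h>

	/* Compiled with AVX disabled so no AVX/AVX512 instructions (such
	   as kmovd) can appear before the check succeeds.  */
	__attribute__ ((target ("no-avx")))
	static int
	check_avx512f (void)
	{
	  unsigned int eax, ebx, ecx, edx;
	  if (!__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx))
	    return 0;
	  return (ebx & bit_AVX512F) != 0;
	}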
2024-07-03  RISC-V: Fix asm check failure for truncated after SAT_SUB  (Pan Li; 3 files, -3/+3)

It seems that the asm check is incorrect for the truncation after SAT_SUB;
we should use the vx check for vssubu instead of the vv check.

gcc/testsuite/ChangeLog:
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-1.c:
	Update vssubu check from vv to vx.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-2.c:
	Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-3.c:
	Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>
2024-07-03  tree-optimization/115764 - testcase for BB SLP issue  (Richard Biener; 1 file, -0/+30)

The following adds a testcase for a CSE issue with BB SLP two-operator
handling when we make those CSE-aware by providing SLP_TREE_SCALAR_STMTS
for them.  This was reduced from 526.blender_r.

	PR tree-optimization/115764
	* gcc.dg/vect/bb-slp-76.c: New testcase.
2024-07-02  preprocessor: Create the parser before handling command-line includes [PR115312]  (Lewis Hyatt; 3 files, -1/+4)

Since r14-2893, we create a parser object in preprocess-only mode for the
purpose of parsing #pragma while preprocessing.  The parser object was
formerly created after calling c_finish_options(), which leads to problems
on platforms that don't use stdc-predef.h (such as MinGW, as reported in
the PR).  On such platforms, the call to c_finish_options() will process
the first command-line-specified include file.  If that includes a PCH,
then c-ppoutput.cc will encounter a state it did not anticipate.  Fix it by
creating the parser prior to calling c_finish_options().

gcc/c-family/ChangeLog:
	PR pch/115312
	* c-opts.cc (c_common_init): Call c_init_preprocess() before
	c_finish_options() so that a parser is available to process any
	includes specified on the command line.

gcc/testsuite/ChangeLog:
	PR pch/115312
	* g++.dg/pch/pr115312.C: New test.
	* g++.dg/pch/pr115312.Hs: New test.
2024-07-03  Daily bump.  (GCC Administrator; 5 files, -1/+552)