path: root/gcc/config
Age Commit message Author Files Lines
2024-02-18LoongArch: Fix wrong return value type of __iocsrrd_h.Lulu Cheng1-1/+1
gcc/ChangeLog: * config/loongarch/larchintrin.h (__iocsrrd_h): Modify the function return value type to unsigned short.
2024-02-16RISC-V: Fix *sge<u>_<X:mode><GPR:mode> patternKito Cheng1-1/+1
*sge<u>_<X:mode><GPR:mode> pattern has referenced operand[2] which is invalid...it should just use `slti<u>` rather than `slti%i2<u>`. gcc/ChangeLog: PR target/106543 * config/riscv/riscv.md (*sge<u>_<X:mode><GPR:mode>): Fix asm pattern.
2024-02-16RISC-V: Add new option -march=help to print all supported extensionsKito Cheng4-2/+26
The output of -march=help is like below:

```
All available -march extensions for RISC-V:
        Name    Version
        i       2.0, 2.1
        e       2.0
        m       2.0
        a       2.0, 2.1
        f       2.0, 2.2
        d       2.0, 2.2
...
```

Also support -print-supported-extensions and --print-supported-extensions for clang compatibility.

gcc/ChangeLog: PR target/109349 * common/config/riscv/riscv-common.cc (riscv_arch_help): New. * config/riscv/riscv-protos.h (RISCV_MAJOR_VERSION_BASE): New. (RISCV_MINOR_VERSION_BASE): Ditto. (RISCV_REVISION_VERSION_BASE): Ditto. * config/riscv/riscv-c.cc (riscv_ext_version_value): Use enum rather than magic number. * config/riscv/riscv.h (riscv_arch_help): New. (EXTRA_SPEC_FUNCTIONS): Add riscv_arch_help. (DRIVER_SELF_SPECS): Handle -march=help, -print-supported-extensions and --print-supported-extensions. * config/riscv/riscv.opt (march=help): New. (print-supported-extensions): New. (-print-supported-extensions): New. * doc/invoke.texi (RISC-V Options): Document -march=help. Reviewed-by: Christoph Müllner <christoph.muellner@vrull.eu>
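The "enum rather than magic number" change in riscv_ext_version_value can be pictured with a small sketch. The enumerator names and the numeric bases below are assumptions made for illustration; they are not copied from riscv-protos.h.

```c
/* Hypothetical stand-ins for RISCV_MAJOR_VERSION_BASE and friends.
   The numeric values are assumptions made for this sketch.  */
enum
{
  SKETCH_MAJOR_VERSION_BASE = 1000000,
  SKETCH_MINOR_VERSION_BASE = 1000,
  SKETCH_REVISION_VERSION_BASE = 1
};

/* Pack major.minor.revision into a single integer using the named
   bases instead of literal multipliers scattered through the code.  */
int
ext_version_value (int major, int minor, int revision)
{
  return major * SKETCH_MAJOR_VERSION_BASE
	 + minor * SKETCH_MINOR_VERSION_BASE
	 + revision * SKETCH_REVISION_VERSION_BASE;
}
```

With these assumed bases, version 2.1 of an extension packs to 2001000.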
2024-02-16Arm: Fix incorrect tailcall-generation for indirect calls [PR113780]Tejas Belagod1-4/+7
This patch fixes a bug that causes indirect calls in PAC-enabled functions to be tailcalled incorrectly when all argument registers R0-R3 are used. 2024-02-07 Tejas Belagod <tejas.belagod@arm.com> PR target/113780 * config/arm/arm.cc (arm_function_ok_for_sibcall): Don't allow tailcalls for indirect calls with 4 or more arguments in pac-enabled functions. * lib/target-supports.exp (v8_1m_main_pacbti): Add __ARM_FEATURE_PAUTH. * gcc.target/arm/pac-sibcall.c: New.
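The rule added to arm_function_ok_for_sibcall can be sketched as a plain predicate. This is a simplified model, not the GCC code: the function and parameter names are hypothetical, and the rationale in the comment (no free register left to hold the call target) is an assumption drawn from the commit description.

```c
#include <stdbool.h>

/* R0-R3 are the argument registers named in the commit message.  */
enum { SKETCH_NUM_ARG_REGS = 4 };

/* Return true if a tail call is acceptable under the rule described
   above.  Assumption for this sketch: an indirect call needs a spare
   register for the target address, and none is left when a
   PAC-enabled function passes arguments in all of R0-R3.  */
bool
sibcall_ok (bool indirect_call, bool pac_enabled, int arg_regs_used)
{
  if (indirect_call && pac_enabled && arg_regs_used >= SKETCH_NUM_ARG_REGS)
    return false;
  return true;
}
```

Direct calls and calls that leave an argument register free are unaffected by the new restriction in this model.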
2024-02-15bpf: fix zero_extendqidi2 ldx templateDavid Faust1-1/+1
Commit 77d0f9ec3809b4d2e32c36069b6b9239d301c030 inadvertently changed the normal asm dialect instruction template for zero_extendqidi2 from ldxb to ldxh. Fix that. gcc/ * config/bpf/bpf.md (zero_extendqidi2): Correct asm template to use ldxb instead of ldxh.
2024-02-15AVR: target 113927 - Simple code triggers stack frame for Reduced Tiny.Georg-Johann Lay5-24/+48
The -mmcu=avrtiny cores have no ADIW and SBIW instructions. This was implemented by clearing all regs out of regclass ADDW_REGS so that constraint "w" never matched. This corrupted the subset relations of the register classes as they appear in enum reg_class. This patch keeps ADDW_REGS like for all other cores, i.e. it contains R24...R31. Instead of tests like test_hard_reg_class (ADDW_REGS, *) the code now uses avr_adiw_reg_p (*). And all insns with constraint "w" get "isa" insn attribute value of "adiw". Plus, a new built-in macro __AVR_HAVE_ADIW__ is provided, which is more specific than __AVR_TINY__. gcc/ PR target/113927 * config/avr/avr.h (AVR_HAVE_ADIW): New macro. * config/avr/avr-protos.h (avr_adiw_reg_p): New proto. * config/avr/avr.cc (avr_adiw_reg_p): New function. (avr_conditional_register_usage) [AVR_TINY]: Don't clear ADDW_REGS. Replace test_hard_reg_class (ADDW_REGS, ...) with calls to * config/avr/avr.md: Same. (attr "isa") <tiny, no_tiny>: Remove. <adiw, no_adiw>: Add. (define_insn, define_insn_and_split): When an alternative has constraint "w", then set attribute "isa" to "adiw". * config/avr/avr-c.cc (avr_cpu_cpp_builtins) [AVR_HAVE_ADIW]: Built-in define __AVR_HAVE_ADIW__. * doc/invoke.texi (AVR Options): Document it.
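A minimal sketch of the idea behind avr_adiw_reg_p, assuming it combines "the core has ADIW/SBIW" with "the register lies in R24...R31". The signature below is hypothetical and the real function works on GCC's RTL and register-class data.

```c
#include <stdbool.h>

/* Model of the new predicate: ADDW_REGS keeps R24...R31 on every core,
   and availability of ADIW/SBIW is tested separately.  The parameter
   core_has_adiw is a hypothetical stand-in for AVR_HAVE_ADIW.  */
bool
adiw_reg_p (int regno, bool core_has_adiw)
{
  return core_has_adiw && regno >= 24 && regno <= 31;
}
```

Keeping the register class intact and moving the core check into a predicate is what preserves the subset relations between register classes.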
2024-02-15amdgcn: Disallow unsupported permute on RDNA devicesAndrew Stubbs2-8/+14
The RDNA architecture has limited support for permute operations. This should allow use of the permutations that do work, and fall back to linear code for other cases. gcc/ChangeLog: * config/gcn/gcn-valu.md (vec_extract<V_MOV:mode><V_MOV_ALT:mode>): Add conditions for RDNA. * config/gcn/gcn.cc (gcn_vectorize_vec_perm_const): Check permutation details are supported on RDNA devices.
2024-02-14i386: psrlq is not used for PERM<a,{0},1,2,3,4> [PR113871]Uros Bizjak2-14/+89
Introduce vec_shl_<mode> and vec_shr_<mode> expanders to improve '*a = __builtin_shufflevector(*a, (vect64){0}, 1, 2, 3, 4);' and '*a = __builtin_shufflevector((vect64){0}, *a, 3, 4, 5, 6);' shuffles. The generated code improves from:

    movzwl 6(%rdi), %eax
    movzwl 4(%rdi), %edx
    salq $16, %rax
    orq %rdx, %rax
    movzwl 2(%rdi), %edx
    salq $16, %rax
    orq %rdx, %rax
    movq %rax, (%rdi)

to:

    movq (%rdi), %xmm0
    psrlq $16, %xmm0
    movq %xmm0, (%rdi)

and to:

    movq (%rdi), %xmm0
    psllq $16, %xmm0
    movq %xmm0, (%rdi)

in the second case. The patch handles 32-bit vectors as well and improves generated code from:

    movd (%rdi), %xmm0
    pxor %xmm1, %xmm1
    punpcklwd %xmm1, %xmm0
    pshuflw $230, %xmm0, %xmm0
    movd %xmm0, (%rdi)

to:

    movd (%rdi), %xmm0
    psrld $16, %xmm0
    movd %xmm0, (%rdi)

and to:

    movd (%rdi), %xmm0
    pslld $16, %xmm0
    movd %xmm0, (%rdi)

PR target/113871 gcc/ChangeLog: * config/i386/mmx.md (V248FI): New mode iterator. (V24FI_32): Ditto. (vec_shl_<V248FI:mode>): New expander. (vec_shl_<V24FI_32:mode>): Ditto. (vec_shr_<V248FI:mode>): Ditto. (vec_shr_<V24FI_32:mode>): Ditto. * config/i386/sse.md (vec_shl_<V_128:mode>): Simplify expander. (vec_shr_<V248FI:mode>): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/pr113871-1a.c: New test. * gcc.target/i386/pr113871-1b.c: New test. * gcc.target/i386/pr113871-2a.c: New test. * gcc.target/i386/pr113871-2b.c: New test. * gcc.target/i386/pr113871-3a.c: New test. * gcc.target/i386/pr113871-3b.c: New test. * gcc.target/i386/pr113871-4a.c: New test.
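The equivalence that lets a single psrlq replace the scalar sequence can be checked in portable C. The sketch below models a vector of four 16-bit lanes as a uint64_t, with lane 0 in the least significant bits (an assumption that matches the x86 layout). The helper names are made up for this example.

```c
#include <stdint.h>

/* Extract 16-bit lane I of V, lane 0 being the least significant.  */
uint16_t
lane (uint64_t v, int i)
{
  return (uint16_t) (v >> (16 * i));
}

/* Model of __builtin_shufflevector (a, zero, 1, 2, 3, 4): indices 0-3
   select lanes of A, indices 4-7 select lanes of the zero vector.  */
uint64_t
shuffle_1234 (uint64_t a)
{
  uint64_t result = 0;
  for (int i = 0; i < 4; i++)
    {
      int index = i + 1;
      uint16_t element = index < 4 ? lane (a, index) : 0;
      result |= (uint64_t) element << (16 * i);
    }
  return result;
}

/* The same operation expressed as one 64-bit logical right shift.  */
uint64_t
shift_right_16 (uint64_t a)
{
  return a >> 16;
}
```

For any input, shuffle_1234 and shift_right_16 agree, which is the property the new vec_shr expander relies on.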
2024-02-13x86-64: Use push2/pop2 only if the incoming stack is 16-byte alignedH.J. Lu1-0/+6
Since push2/pop2 requires 16-byte stack alignment, don't use them if the incoming stack isn't 16-byte aligned. gcc/ PR target/113876 * config/i386/i386.cc (ix86_pro_and_epilogue_can_use_push2pop2): Return false if the incoming stack isn't 16-byte aligned. gcc/testsuite/ PR target/113876 * gcc.target/i386/pr113876.c: New test.
2024-02-13Re: [PATCH] RISC-V: Fix macro fusion for auipc+add, when identifying UNSPEC_AUIPC. [PR113742]Monk Chiang1-1/+1
gcc/ChangeLog: PR target/113742 * config/riscv/riscv.cc (riscv_macro_fusion_pair_p): Fix recognition of UNSPEC_AUIPC for RISCV_FUSE_LUI_ADDI. gcc/testsuite/ChangeLog: * gcc.target/riscv/pr113742.c: New test.
2024-02-12x86, libgcc: Implement ia32 basic heap trampoline [PR113855].Iain Sandoe3-7/+2
The initial heap trampoline implementation was targeting 64b platforms. As the PR demonstrates this creates an issue where it is expected that the same symbols are exported for 32 and 64b. Rather than conditionalize the exports and code-gen on x86_64, this patch provides a basic implementation of the IA32 trampoline. This also avoids potential user confusion, when a 32b target has 64b multilibs, and vice versa; which is the case for Darwin. PR target/113855 gcc/ChangeLog: * config/i386/darwin.h (DARWIN_HEAP_T_LIB): Moved to be available to all sub-targets. * config/i386/darwin32-biarch.h (DARWIN_HEAP_T_LIB): Delete. * config/i386/darwin64-biarch.h (DARWIN_HEAP_T_LIB): Delete. libgcc/ChangeLog: * config.host: Add trampoline support to x?86-linux. * config/i386/heap-trampoline.c (trampoline_insns): Provide a variant for IA32. (union ix86_trampoline): Likewise. (__gcc_nested_func_ptr_created): Implement a basic trampoline for IA32.
2024-02-12RISC-V: Fix misspelled term args in error_at messagePan Li1-1/+2
When building with "-Werror=format-diag", the misspelled term 'args' is reported as shown below. This patch fixes it by using the term 'arguments' instead.

    ../../gcc/config/riscv/riscv-vector-builtins.cc: In function 'tree_node* riscv_vector::resolve_overloaded_builtin(location_t, unsigned int, tree, vec<tree_node*, va_gc>*)':
    ../../gcc/config/riscv/riscv-vector-builtins.cc:4633:65: error: misspelled term 'args' in format; use 'arguments' instead [-Werror=format-diag]
     4633 | error_at (loc, "no matching function call to %qE with empty args", fndecl);

gcc/ChangeLog: * config/riscv/riscv-vector-builtins.cc (resolve_overloaded_builtin): Replace args with arguments for misspelled term. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/pr113766-1.c: Adjust the test cases. Signed-off-by: Pan Li <pan2.li@intel.com>
2024-02-12AVR: target/112944 - Addendum: Link code to initialize NVMCTRL_CTRLB.FLMAPGeorg-Johann Lay1-1/+4
For devices that see a part of the flash memory in the RAM address space, bit-field NVMCTRL_CTRLB.FLMAP must match the value of symbol __flmap. This is achieved by dragging in startup code from lib<mcu>.a. The mechanism is the same as for libgcc's __do_copy_data and __do_clear_bss. The code is implemented in AVR-LibC #931 and can be dragged in by referencing __do_flmap_init. In addition to setting FLMAP, that code also sets bit FLMAPLOCK provided symbol __flmap_lock has a non-zero value. This protects FLMAP from future changes. When the __do_flmap_init code is not wanted, the symbol can be satisfied by linking with -Wl,--defsym,__do_flmap_init=0 gcc/ PR target/112944 * config/avr/gen-avr-mmcu-specs.cc (print_mcu) [have_flmap]: <*link_rodata_in_ram>: Spec undefs symbol __do_flmap_init when not linked with -mrodata-in-ram.
2024-02-08AVR: Tidy up gen-avr-mmcu-specs.ccGeorg-Johann Lay1-67/+66
Some information was (re-)computed in different places. This patch computes them in new struct McuInfo and passes it around in order to provide the information. gcc/ * config/avr/gen-avr-mmcu-specs.cc (struct McuInfo): New. (main, print_mcu, diagnose_mrodata_in_ram): Pass it down.
2024-02-08x86: Update constraints for APX NDD instructionsH.J. Lu5-92/+164
1. The only supported TLS code sequence with ADD is addq foo@gottpoff(%rip),%reg Change je constraint to a memory operand in APX NDD ADD pattern with register source operand. 2. The instruction length of APX NDD instructions with immediate operand: op imm, mem, reg may exceed the size limit of 15 bytes when non-default address space, segment register or address size prefix are used. Add jM constraint which is a memory operand valid for APX NDD instructions with immediate operand and add jO constraint which is an offsettable memory operand valid for APX NDD instructions with immediate operand. Update APX NDD patterns with jM and jO constraints. gcc/ PR target/113711 PR target/113733 * config/i386/constraints.md: List all constraints with j prefix. (j>): Change auto-dec to auto-inc in documentation. (je): Changed to a memory constraint with APX NDD TLS operand check. (jM): New memory constraint for APX NDD instructions. (jO): Likewise. * config/i386/i386-protos.h (x86_poff_operand_p): Removed. * config/i386/i386.cc (x86_poff_operand_p): Likewise. * config/i386/i386.md (*add<dwi>3_doubleword): Use rjO. (*add<mode>_1[SWI48]): Use je and jM. (addsi_1_zext): Use jM. (*addv<dwi>4_doubleword_1[DWI]): Likewise. (*sub<mode>_1[SWI]): Use jM. (@add<mode>3_cc_overflow_1[SWI]): Likewise. (*add<dwi>3_doubleword_cc_overflow_1): Use rjO. (*and<dwi>3_doubleword): Likewise. (*anddi_1): Use jM. (*andsi_1_zext): Likewise. (*and<mode>_1[SWI24]): Likewise. (*<code><dwi>3_doubleword[any_or]): Use rjO. (*code<mode>_1[any_or SWI248]): Use jM. (*<code>si_1_zext[zero_extend + any_or]): Likewise. * config/i386/predicates.md (apx_ndd_memory_operand): New. (apx_ndd_add_memory_operand): Likewise. gcc/testsuite/ PR target/113711 PR target/113733 * gcc.target/i386/apx-ndd-2.c: New test. * gcc.target/i386/apx-ndd-base-index-1.c: Likewise. * gcc.target/i386/apx-ndd-no-seg-global-1.c: Likewise. * gcc.target/i386/apx-ndd-seg-1.c: Likewise. * gcc.target/i386/apx-ndd-seg-2.c: Likewise. 
* gcc.target/i386/apx-ndd-seg-3.c: Likewise. * gcc.target/i386/apx-ndd-seg-4.c: Likewise. * gcc.target/i386/apx-ndd-seg-5.c: Likewise. * gcc.target/i386/apx-ndd-tls-1a.c: Likewise. * gcc.target/i386/apx-ndd-tls-2.c: Likewise. * gcc.target/i386/apx-ndd-tls-3.c: Likewise. * gcc.target/i386/apx-ndd-tls-4.c: Likewise. * gcc.target/i386/apx-ndd-x32-1.c: Likewise.
2024-02-08AVR: target/113824 - Fix multilib set for ATA5795.Georg-Johann Lay1-2/+2
gcc/ PR target/113824 * config/avr/avr-mcus.def (ata5797): Move from avr5 to avr4. * doc/avr-mmcu.texi: Rebuild.
2024-02-08AVR: Always define __AVR_PM_BASE_ADDRESS__ in specs provided the core has it.Georg-Johann Lay1-6/+14
gcc/ * config/avr/gen-avr-mmcu-specs.cc (print_mcu) <*cpp_mcu>: Spec always defines __AVR_PM_BASE_ADDRESS__ if the core has it.
2024-02-08AVR: Rename device-specs %_misc to %_rodata_in_ram.Georg-Johann Lay2-8/+5
gcc/ * config/avr/gen-avr-mmcu-specs.cc: Rename spec cc1_misc to cc1_rodata_in_ram. Rename spec link_misc to link_rodata_in_ram. Remove spec asm_misc. * config/avr/specs.h: Same.
2024-02-08RISC-V: Bugfix for RVV overloaded intrinsic ICE in function checkerPan Li1-4/+13
There is another corner case, similar to the example below:

    void test (void)
    {
      __riscv_vaadd ();
    }

We report an error when an overloaded function is called with empty args. For example:

    test.c: In function 'foo':
    test.c:8:3: error: no matching function call to '__riscv_vaadd' with empty args
        8 | __riscv_vaadd ();
          | ^~~~~~~~~~~~~~~~~~~~

Unfortunately, another ICE similar to the one below follows that message. The underlying function checker is built with zero args, which breaks some of its assumptions, for example that the count of args is not less than 2.

    ice.c: In function ‘foo’:
    ice.c:8:3: internal compiler error: in require_immediate, at config/riscv/riscv-vector-builtins.cc:4252
        8 | __riscv_vaadd ();
          | ^~~~~~~~~~~~~
    0x20b36ac riscv_vector::function_checker::require_immediate(unsigned int, long, long) const
        .../__RISC-V_BUILD__/../gcc/config/riscv/riscv-vector-builtins.cc:4252
    0x20b890c riscv_vector::alu_def::check(riscv_vector::function_checker&) const
        .../__RISC-V_BUILD__/../gcc/config/riscv/riscv-vector-builtins-shapes.cc:387
    0x20b38d7 riscv_vector::function_checker::check()
        .../__RISC-V_BUILD__/../gcc/config/riscv/riscv-vector-builtins.cc:4315
    0x20b4876 riscv_vector::check_builtin_call(unsigned int, vec<unsigned int, va_heap, vl_ptr>,
        .../__RISC-V_BUILD__/../gcc/config/riscv/riscv-vector-builtins.cc:4605
    0x2069393 riscv_check_builtin_call
        .../__RISC-V_BUILD__/../gcc/config/riscv/riscv-c.cc:227

The tests below pass with this patch.
* The riscv regression tests.

PR target/113766 gcc/ChangeLog: * config/riscv/riscv-vector-builtins-shapes.cc (struct alu_def): Make sure the c.arg_num is >= 2 before checking. (struct build_frm_base): Ditto. (struct narrow_alu_def): Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/pr113766-1.c: Add new cases. Signed-off-by: Pan Li <pan2.li@intel.com>
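The "make sure the c.arg_num is >= 2 before checking" guard can be illustrated in isolation. The sketch assumes the checker inspects an argument counted from the end of the list (here, the second to last); that position and the function name are hypothetical choices for this example.

```c
#include <stdbool.h>

/* Compute the index of the argument to validate.  With an unsigned
   count of zero, ARG_NUM - 2 would wrap around to a huge index, so the
   count is tested first and the caller is told to skip the check.  */
bool
checked_arg_index (unsigned int arg_num, unsigned int *index)
{
  if (arg_num < 2)
    return false;		/* Too few arguments: nothing to validate.  */
  *index = arg_num - 2;
  return true;
}
```

A call with no arguments then skips the immediate check entirely instead of indexing past the argument list.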
2024-02-07PR target/113690: Remove TImode REG_EQUAL notes in STV.Roger Sayle1-22/+20
This patch fixes PR target/113690, an ICE-on-valid regression on x86_64 that exhibits with a specific combination of command line options. The cause is that x86's scalar-to-vector pass converts a chain of instructions from TImode to V1TImode, but fails to appropriately update or delete the attached REG_EQUAL note. This implements Uros' recommendation of removing these notes. For convenience, this code (re)factors the logic to convert a TImode constant into a V1TImode constant vector into a subroutine and reuses it. For the record, STV is actually doing something useful in this strange testcase, GCC with -O2 -fno-dce -fno-forward-propagate -fno-split-wide-types -funroll-loops generates: foo: movl $v, %eax pxor %xmm0, %xmm0 movaps %xmm0, 48(%rax) movaps %xmm0, (%rax) movaps %xmm0, 16(%rax) movaps %xmm0, 32(%rax) ret With the addition of -mno-stv (to disable the patched code) it gives: foo: movl $v, %eax movq $0, 48(%rax) movq $0, 56(%rax) movq $0, (%rax) movq $0, 8(%rax) movq $0, 16(%rax) movq $0, 24(%rax) movq $0, 32(%rax) movq $0, 40(%rax) ret 2024-02-07 Roger Sayle <roger@nextmovesoftware.com> Uros Bizjak <ubizjak@gmail.com> gcc/ChangeLog PR target/113690 * config/i386/i386-features.cc (timode_convert_cst): New helper function to convert a TImode CONST_SCALAR_INT_P to a V1TImode CONST_VECTOR. (timode_scalar_chain::convert_op): Use timode_convert_cst. (timode_scalar_chain::convert_insn): Delete REG_EQUAL notes. Use timode_convert_cst. gcc/testsuite/ChangeLog PR target/113690 * gcc.target/i386/pr113690.c: New test case.
2024-02-07AArch64: Update system register database.Victor Do Nascimento2-0/+105
With the release of Binutils 2.42, this brings the level of system-register support in GCC in line with the current state-of-the-art in Binutils, ensuring everything available in Binutils is plainly accessible from GCC. Where Binutils uses a more detailed description of which features are responsible for enabling a given system register, GCC aliases the binutils-equivalent feature flag macro constant to that of the base architecture implementing the feature, resulting in entries such as #define AARCH64_FL_S2PIE AARCH64_FL_V8_9A in `aarch64.h', thus ensuring that the Binutils `aarch64-sys-regs.def' file can be understood by GCC without the need for modification. To accompany the addition of the new system registers, a new test is added confirming they were successfully added to the list of recognized registers. gcc/ChangeLog: * config/aarch64/aarch64-sys-regs.def: Copy from Binutils. * config/aarch64/aarch64.h (AARCH64_FL_AIE): New. (AARCH64_FL_DEBUGv8p9): Likewise. (AARCH64_FL_FGT2): Likewise. (AARCH64_FL_ITE): Likewise. (AARCH64_FL_PFAR): Likewise. (AARCH64_FL_PMUv3_ICNTR): Likewise. (AARCH64_FL_PMUv3_SS): Likewise. (AARCH64_FL_PMUv3p9): Likewise. (AARCH64_FL_RASv2): Likewise. (AARCH64_FL_S1PIE): Likewise. (AARCH64_FL_S1POE): Likewise. (AARCH64_FL_S2PIE): Likewise. (AARCH64_FL_S2POE): Likewise. (AARCH64_FL_SCTLR2): Likewise. (AARCH64_FL_SEBEP): Likewise. (AARCH64_FL_SPE_FDS): Likewise. (AARCH64_FL_TCR2): Likewise. gcc/testsuite/ChangeLog: * gcc.target/aarch64/acle/rwsr-armv8p9.c: New.
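The aliasing scheme can be sketched with ordinary macros. All SKETCH_ names and the bit position are hypothetical; only the pattern "fine-grained feature name expands to the base-architecture flag" is taken from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical base-architecture flag; the bit position is made up.  */
#define SKETCH_FL_V8_9A (UINT64_C (1) << 0)

/* Fine-grained feature names aliased to the base architecture, in the
   style of "#define AARCH64_FL_S2PIE AARCH64_FL_V8_9A".  */
#define SKETCH_FL_S2PIE SKETCH_FL_V8_9A
#define SKETCH_FL_TCR2 SKETCH_FL_V8_9A

/* A register is usable when every flag it requires is enabled.  */
bool
sysreg_available (uint64_t enabled_flags, uint64_t required_flags)
{
  return (enabled_flags & required_flags) == required_flags;
}
```

A table written in terms of the fine-grained names can then be consumed unchanged, at the cost of treating every aliased feature as enabled whenever the base architecture is.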
2024-02-07RISC-V: Bugfix for RVV overloaded intrinsic ICE when empty argsPan Li3-6/+22
There is one corner case, similar to the example below:

    void test (void)
    {
      __riscv_vfredosum_tu ();
    }

It triggers an ICE because of the implementation details of overloaded functions in gcc. According to the rvv intrinsic doc, there is no such overloaded function with empty args. Unfortunately, we register the empty args function as overloaded to avoid conflicts. Thus, there is actually one registered function after NULL_TREE is returned to the middle-end, which finally results in an ICE when expanding. For example:

1. First we registered void __riscv_vfredmax () as the overloaded function.
2. Then resolve_overloaded_builtin (this func) returns NULL_TREE.
3. The function registered in step 1 bypasses the args check as empty args.
4. Finally, we fall into expand_builtin with empty args and hit the ICE.

Here we report an error when an overloaded function is called with empty args. For example:

    test.c: In function 'foo':
    test.c:8:3: error: no matching function call to '__riscv_vfredosum_tu' with empty args
        8 | __riscv_vfredosum_tu();
          | ^~~~~~~~~~~~~~~~~~~~

The tests below pass with this patch.
* The riscv regression tests.

PR target/113766 gcc/ChangeLog: * config/riscv/riscv-protos.h (resolve_overloaded_builtin): Adjust the signature of func. * config/riscv/riscv-c.cc (riscv_resolve_overloaded_builtin): Ditto. * config/riscv/riscv-vector-builtins.cc (resolve_overloaded_builtin): Make overloaded func with empty args error. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/pr113766-1.c: New test. * gcc.target/riscv/rvv/base/pr113766-2.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2024-02-06x86-64: Return R10_REG if there is no scratch registerH.J. Lu1-1/+1
If we can't find a scratch register for large model profiling, return R10_REG. PR target/113689 * config/i386/i386.cc (x86_64_select_profile_regnum): Return R10_REG after sorry.
2024-02-06aarch64: Fix function multiversioning manglingAndrew Carlotti1-38/+81
It would be neater if the middle end for target_clones used a target hook for version name mangling, so we only do version name mangling once. However, that would require more intrusive refactoring that will have to wait till Stage 1. gcc/ChangeLog: * config/aarch64/aarch64.cc (aarch64_mangle_decl_assembler_name): Move before new caller, and add ".default" suffix. (get_suffixed_assembler_name): New. (make_resolver_func): Use get_suffixed_assembler_name. (aarch64_generate_version_dispatcher_body): Redo name mangling. gcc/testsuite/ChangeLog: * g++.target/aarch64/mv-symbols1.C: New test. * g++.target/aarch64/mv-symbols2.C: Ditto. * g++.target/aarch64/mv-symbols3.C: Ditto. * g++.target/aarch64/mv-symbols4.C: Ditto. * g++.target/aarch64/mv-symbols5.C: Ditto. * g++.target/aarch64/mvc-symbols1.C: Ditto. * g++.target/aarch64/mvc-symbols2.C: Ditto. * g++.target/aarch64/mvc-symbols3.C: Ditto. * g++.target/aarch64/mvc-symbols4.C: Ditto.
2024-02-06aarch64: Fix build against libc++ in c++11 mode [PR113763]Jakub Jelinek1-3/+3
std::pair ctor used in tiles constexpr variable is only constexpr in C++14 and later, it works with libstdc++ because it is marked constexpr there even in C++11 mode. The following patch fixes it by using an unnamed local class instead of std::pair, and additionally changes the first element from unsigned int to unsigned char because 0xff has to fit into unsigned char on all hosts. 2024-02-06 Jakub Jelinek <jakub@redhat.com> PR target/113763 * config/aarch64/aarch64.cc (aarch64_output_sme_zero_za): Change tiles element from std::pair<unsigned int, char> to an unnamed struct. Adjust uses of tile range variable.
2024-02-06RISC-V: Fix infinite compilation of VSETVL PASSJuzhe-Zhong1-5/+4
This patch fixes the issue reported by Jeff. Testing is running. OK for trunk if testing passes with no regressions? gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (pre_vsetvl::emit_vsetvl): Fix infinite compilation. (pre_vsetvl::remove_vsetvl_pre_insns): Ditto.
2024-02-06AArch64: aarch64_class_max_nregs mishandles 64-bit structure modes [PR112577]Tejas Belagod1-0/+2
The target hook aarch64_class_max_nregs returns the incorrect result for 64-bit structure modes like V3x1DImode or V4x1DFmode etc. The calculation of the nregs is based on the size of the AdvSIMD vector register, which for 64-bit modes ought to be UNITS_PER_VREG / 2. This patch fixes the register size. gcc/ChangeLog: PR target/112577 * config/aarch64/aarch64.cc (aarch64_class_max_nregs): Handle 64-bit vector structure modes correctly.
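The arithmetic behind the fix can be worked through with a ceiling division. The sketch assumes a 16-byte full vector register (the SKETCH_UNITS_PER_VREG constant is a stand-in for UNITS_PER_VREG) and a structure of three 64-bit vectors, 24 bytes in total.

```c
/* Assumed size in bytes of a full AdvSIMD vector register.  */
enum { SKETCH_UNITS_PER_VREG = 16 };

/* Number of registers needed to hold MODE_BYTES when each register
   contributes REG_BYTES, rounding up.  */
unsigned int
regs_needed (unsigned int mode_bytes, unsigned int reg_bytes)
{
  return (mode_bytes + reg_bytes - 1) / reg_bytes;
}
```

Dividing the 24-byte mode by the full register size gives 2, whereas dividing by SKETCH_UNITS_PER_VREG / 2 gives the 3 registers such a structure actually occupies, one per 64-bit vector.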
2024-02-06riscv: Fix compiler warning in thead.ccChristoph Müllner1-1/+2
A recent commit introduced a compiler warning in thead.cc: error: invalid suffix on literal; C++11 requires a space between literal and string macro [-Werror=literal-suffix] 1144 | fprintf (file, "(%s),"HOST_WIDE_INT_PRINT_DEC",%u", reg_names[REGNO (addr.reg)], | ^ This commit addresses this issue and breaks the line such that it won't exceed 80 characters. gcc/ChangeLog: * config/riscv/thead.cc (th_print_operand_address): Fix compiler warning. Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
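The fix amounts to putting whitespace between the string literals and the macro so that the macro name cannot be read as a literal suffix. A minimal sketch follows; SKETCH_PRINT_DEC is a hypothetical stand-in for HOST_WIDE_INT_PRINT_DEC, whose real expansion depends on the host, and the function name is made up.

```c
#include <stdio.h>

/* Hypothetical stand-in for HOST_WIDE_INT_PRINT_DEC.  */
#define SKETCH_PRINT_DEC "%ld"

/* Format "(reg),offset,shift".  The spaces around SKETCH_PRINT_DEC keep
   the adjacent string literals separate tokens, so they concatenate
   cleanly in both C and C++11.  */
int
format_operand (char *buf, size_t size, const char *reg, long offset,
		unsigned int shift)
{
  return snprintf (buf, size, "(%s)," SKETCH_PRINT_DEC ",%u",
		   reg, offset, shift);
}
```

Calling format_operand with "a0", 8 and 2 produces the text (a0),8,2.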
2024-02-05x86-64: Find a scratch register for large model profilingH.J. Lu1-15/+76
2 scratch registers, %r10 and %r11, are available at function entry for large model profiling. But %r10 may be used by stack realignment and we can't use %r10 in this case. Add x86_64_select_profile_regnum to find a caller-saved register which isn't live or a callee-saved register which has been saved on stack in the prologue at entry for large model profiling and sorry if we can't find one. gcc/ PR target/113689 * config/i386/i386.cc (x86_64_select_profile_regnum): New. (x86_function_profiler): Call x86_64_select_profile_regnum to get a scratch register for large model profiling. gcc/testsuite/ PR target/113689 * gcc.target/i386/pr113689-1.c: New file. * gcc.target/i386/pr113689-2.c: Likewise. * gcc.target/i386/pr113689-3.c: Likewise.
2024-02-05arm: Fix missing bti instruction for virtual thunksRichard Ball1-0/+2
Adds missing bti instruction at the beginning of a virtual thunk, when bti is enabled. gcc/ChangeLog: * config/arm/arm.cc (arm_output_mi_thunk): Emit insn for bti_c when bti is enabled. gcc/testsuite/ChangeLog: * lib/target-supports.exp: Add v8_1_m_main_pacbti. * g++.target/arm/bti_thunk.C: New test.
2024-02-05mips: Fix missing mode in neg<mode:MSA>2Xi Ruoyao1-1/+1
I was too sleepy writing this :(. gcc/ChangeLog: * config/mips/mips-msa.md (neg<mode:MSA>2): Add missing mode for neg.
2024-02-05MIPS: Fix wrong MSA FP vector negationXi Ruoyao1-3/+15
We expanded (neg x) to (minus const0 x) for MSA FP vectors. This is wrong because -0.0 is not 0 - 0.0. This causes some Python tests to fail when Python is built with MSA enabled. Use the bnegi.df instructions to simply reverse the sign bit instead. gcc/ChangeLog: * config/mips/mips-msa.md (elmsgnbit): New define_mode_attr. (neg<mode>2): Change the mode iterator from MSA to IMSA because in FP arithmetic we cannot use (0 - x) for -x. (neg<mode>2): New define_insn to implement FP vector negation, using a bnegi instruction to negate the sign bit.
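The difference between subtracting from zero and flipping the sign bit shows up on scalar floats too. The sketch assumes IEEE 754 binary32 floats, the default rounding mode and no fast-math style options. The function names are illustrative only.

```c
#include <stdint.h>
#include <string.h>

/* Negation written as a subtraction from zero: 0.0f - 0.0f is +0.0f.  */
float
neg_by_subtraction (float x)
{
  return 0.0f - x;
}

/* Negation written as a flip of the sign bit, the scalar analogue of
   toggling the top bit of each vector element.  */
float
neg_by_sign_flip (float x)
{
  uint32_t bits;
  memcpy (&bits, &x, sizeof bits);
  bits ^= UINT32_C (0x80000000);
  memcpy (&x, &bits, sizeof x);
  return x;
}

/* Return the sign bit of X (1 for negative values, including -0.0f).  */
int
sign_bit (float x)
{
  uint32_t bits;
  memcpy (&bits, &x, sizeof bits);
  return (int) (bits >> 31);
}
```

For an input of +0.0f the subtraction yields +0.0f while the sign flip yields -0.0f. The two results compare equal but can be told apart by operations such as copysign or division.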
2024-02-05i386: Clear REG_UNUSED and REG_DEAD notes from the IL at the end of vzeroupper pass [PR113059]Jakub Jelinek1-0/+26
The move of the vzeroupper pass from after reload pass to after postreload_cse helped only partially, CSE-like passes can still invalidate those notes (especially REG_UNUSED) if they use some earlier register holding some value later on in the IL. So, either we could try to move it one pass further after gcse2 and hope no later pass invalidates the notes, or the following patch attempts to restore the REG_DEAD/REG_UNUSED state from GCC 13 and earlier, where the LRA or reload passes remove all REG_DEAD/REG_UNUSED notes and the notes reappear only at the start of dse2 pass when it calls

    df_note_add_problem ();
    df_analyze ();

So, effectively

    NEXT_PASS (pass_postreload_cse);
    NEXT_PASS (pass_gcse2);
    NEXT_PASS (pass_split_after_reload);
    NEXT_PASS (pass_ree);
    NEXT_PASS (pass_compare_elim_after_reload);
    NEXT_PASS (pass_thread_prologue_and_epilogue);

passes operate without those notes in the IL. While in GCC 14 mode switching computes the notes problem at the start of vzeroupper, the patch below removes them at the end of the pass again, so that the above passes continue to operate without them. 2024-02-05 Jakub Jelinek <jakub@redhat.com> PR target/113059 * config/i386/i386-features.cc (rest_of_handle_insert_vzeroupper): Remove REG_DEAD/REG_UNUSED notes at the end of the pass before df_analyze call.
2024-02-05target/113255 - avoid REG_POINTER on a pointer differenceRichard Biener1-1/+1
The following avoids re-using a register holding a pointer (and thus might be REG_POINTER) for the result of a pointer difference computation. That might confuse heuristics in (broken) RTL alias analysis which relies on REG_POINTER indicating that we're dealing with one. This alone doesn't fix anything. PR target/113255 * config/i386/i386-expand.cc (expand_set_or_cpymem_prologue_epilogue_by_misaligned_moves): Use a new pseudo for the skipped number of bytes.
2024-02-04RISC-V: Add sifive-p450, sifive-p670 to -mcpuMonk Chiang1-0/+9
gcc/ChangeLog: * config/riscv/riscv-cores.def: Add sifive-p450, sifive-p670. * doc/invoke.texi (RISC-V Options): Add sifive-p450, sifive-p670. gcc/testsuite/ChangeLog: * gcc.target/riscv/mcpu-sifive-p450.c: New test. * gcc.target/riscv/mcpu-sifive-p670.c: New test.
2024-02-04RISC-V: Support scheduling for sifive p400 seriesMonk Chiang6-1/+196
Add sifive p400 series scheduler module. For more information see https://www.sifive.com/cores/performance-p450-470. gcc/ChangeLog: * config/riscv/riscv.md: Include sifive-p400.md. * config/riscv/sifive-p400.md: New file. * config/riscv/riscv-cores.def (RISCV_TUNE): Add parameter. * config/riscv/riscv-opts.h (enum riscv_microarchitecture_type): Add sifive_p400. * config/riscv/riscv.cc (sifive_p400_tune_info): New. * config/riscv/riscv.h (TARGET_SFB_ALU): Update. * doc/invoke.texi (RISC-V Options): Add sifive-p400-series
2024-02-04xtensa: Fix missing mode warning in "*eqne_zero_masked_bits"Takayuki 'January June' Suwa1-1/+1
gcc/ChangeLog: * config/xtensa/xtensa.md (*eqne_zero_masked_bits): Add missing ":SI" to the match_operator.
2024-02-04xtensa: Recover constant synthesis for HImode after LRA transitionTakayuki 'January June' Suwa1-8/+14
After LRA transition, HImode constants that don't fit into signed 12 bits are no longer subject to constant synthesis:

    /* example */
    void test(void) {
      short foo = 32767;
      __asm__ ("" :: "r"(foo));
    }

    ;; before
    .literal_position
    .literal .LC0, 32767
    test:
    l32r a9, .LC0
    ret.n

This patch fixes that:

    ;; after
    test:
    movi.n a9, -1
    extui a9, a9, 17, 15
    ret.n

gcc/ChangeLog: * config/xtensa/xtensa.md (SHI): New mode iterator. (2 split patterns related to constsynth): Change to also accept HImode operands.
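The synthesized sequence can be traced in C: movi.n loads all ones, and extui keeps a 15-bit field starting at bit 17. The helper below models that extract operation. Its name and argument order mirror the assembly but are otherwise a sketch.

```c
#include <stdint.h>

/* Model of an unsigned bit-field extract: take WIDTH bits of VALUE
   starting at bit SHIFT and zero-extend them.  This sketch assumes
   WIDTH is between 1 and 16 and SHIFT is below 32.  */
uint32_t
extui (uint32_t value, unsigned int shift, unsigned int width)
{
  return (value >> shift) & ((UINT32_C (1) << width) - 1);
}
```

Applied to 0xFFFFFFFF with shift 17 and width 15 this gives 0x7FFF, that is 32767, without a literal pool load.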
2024-02-04[committed] Reasonably handle SUBREGs in risc-v cost modelingJeff Law1-7/+11
This patch adjusts the costs so that we treat REG and SUBREG expressions the same for costing. This was motivated by bt_skip_func and bt_find_func in xz and results in nearly a 5% improvement in the dynamic instruction count for input #2 and smaller, but definitely visible improvements pretty much across the board. Exceptions would be perlbench input #1 and exchange2 which showed very small regressions. In the bt_find_func and bt_skip_func cases we have something like this: > (insn 10 7 11 2 (set (reg/v:DI 136 [ x ]) > (zero_extend:DI (subreg/s/u:SI (reg/v:DI 137 [ a ]) 0))) "zz.c":6:21 387 {*zero_extendsidi2_bitmanip} > (nil)) > (insn 11 10 12 2 (set (reg:DI 142 [ _1 ]) > (plus:DI (reg/v:DI 136 [ x ]) > (reg/v:DI 139 [ b ]))) "zz.c":7:23 5 {adddi3} > (nil)) [ ... ]> (insn 13 12 14 2 (set (reg:DI 143 [ _2 ]) > (plus:DI (reg/v:DI 136 [ x ]) > (reg/v:DI 141 [ c ]))) "zz.c":8:23 5 {adddi3} > (nil)) Note the two uses of (reg 136). The best way to handle that in combine might be a 3->2 split. But there's a much better approach if we look at fwprop... (set (reg:DI 142 [ _1 ]) (plus:DI (zero_extend:DI (subreg/s/u:SI (reg/v:DI 137 [ a ]) 0)) (reg/v:DI 139 [ b ]))) change not profitable (cost 4 -> cost 8) So that should be the same cost as a regular DImode addition when the ZBA extension is enabled. But it ends up costing more because the clause to cost this variant isn't prepared to handle a SUBREG. That results in the RTL above having too high a cost and fwprop gives up. One approach would be to replace the REG_P with REG_P || SUBREG_P in the costing code. I ultimately decided against that and instead check if the operand in question passes register_operand. By far the most important case to handle is the DImode PLUS. But for the sake of consistency, I changed the other instances in riscv_rtx_costs as well. For those other cases we're talking about improvements in the .000001% range. 
While we are into stage4, this just hits cost modeling which we've generally agreed is still appropriate (though we were mostly talking about vector). So I'm going to extend that general agreement ever so slightly and include scalar cost modeling :-) gcc/ * config/riscv/riscv.cc (riscv_rtx_costs): Handle SUBREG and REG similarly. gcc/testsuite/ * gcc.target/riscv/reg_subreg_costs.c: New test. Co-authored-by: Jivan Hakobyan <jivanhakobyan9@gmail.com>
2024-02-04LoongArch: Fix wrong LSX FP vector negationXi Ruoyao3-27/+18
We expanded (neg x) to (minus const0 x) for LSX FP vectors, this is wrong because -0.0 is not 0 - 0.0. This causes some Python tests to fail when Python is built with LSX enabled. Use the vbitrevi.{d/w} instructions to simply reverse the sign bit instead. We are already doing this for LASX and now we can unify them into simd.md. gcc/ChangeLog: * config/loongarch/lsx.md (neg<mode:FLSX>2): Remove the incorrect expand. * config/loongarch/simd.md (simdfmt_as_i): New define_mode_attr. (elmsgnbit): Likewise. (neg<mode:FVEC>2): New define_insn. * config/loongarch/lasx.md (negv4df2, negv8sf2): Remove as they are now instantiated in simd.md.
2024-02-04LoongArch: Avoid out-of-bounds access in loongarch_symbol_insnsXi Ruoyao1-1/+2
We call loongarch_symbol_insns with mode = MAX_MACHINE_MODE sometimes. But in loongarch_symbol_insns:

    if (LSX_SUPPORTED_MODE_P (mode) || LASX_SUPPORTED_MODE_P (mode))
      return 0;

And LSX_SUPPORTED_MODE_P is defined as:

    #define LSX_SUPPORTED_MODE_P(MODE) \
      (ISA_HAS_LSX \
       && GET_MODE_SIZE (MODE) == UNITS_PER_LSX_REG ... ...

GET_MODE_SIZE expands to a call to mode_to_bytes, which is defined as:

    ALWAYS_INLINE poly_uint16
    mode_to_bytes (machine_mode mode)
    {
    #if GCC_VERSION >= 4001
      return (__builtin_constant_p (mode)
              ? mode_size_inline (mode) : mode_size[mode]);
    #else
      return mode_size[mode];
    #endif
    }

There is an assertion in mode_size_inline:

    gcc_assert (mode >= 0 && mode < NUM_MACHINE_MODES);

Note that NUM_MACHINE_MODES = MAX_MACHINE_MODE (emitted by genmodes.cc), so if __builtin_constant_p (mode) evaluates to true (which happens when GCC is bootstrapped with LTO+PGO), the assertion triggers and causes an ICE. OTOH if __builtin_constant_p (mode) evaluates to false, mode_size[mode] is still an out-of-bounds array access (the length of the mode_size array is NUM_MACHINE_MODES).

So we shouldn't call LSX_SUPPORTED_MODE_P or LASX_SUPPORTED_MODE_P with MAX_MACHINE_MODE in loongarch_symbol_insns. This is very similar to the MIPS bug PR98491, which I fixed about 3 years ago.

gcc/ChangeLog:
    * config/loongarch/loongarch.cc (loongarch_symbol_insns): Do not use LSX_SUPPORTED_MODE_P or LASX_SUPPORTED_MODE_P if mode is MAX_MACHINE_MODE.
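The shape of the fix, testing for the sentinel before indexing the size table, can be sketched as follows. The enum, the table and the function are hypothetical and only mirror the pattern; they are not the loongarch.cc code.

```cpp
#include <array>
#include <cassert>

// Hypothetical miniature of machine_mode: the last enumerator is a
// sentinel equal to the number of real modes, like MAX_MACHINE_MODE.
enum mode_sketch { MODE_A, MODE_B, MAX_MODE_SKETCH };

// Size table with exactly one entry per real mode (no entry for the sentinel).
constexpr std::array<unsigned, MAX_MODE_SKETCH> mode_size_sketch = {8, 16};

// Returns true for 16-byte modes. The sentinel is rejected first, so the
// table is never indexed out of bounds.
bool is_16_byte_mode(mode_sketch mode) {
    return mode != MAX_MODE_SKETCH && mode_size_sketch[mode] == 16;
}
```

Because && short-circuits, the table lookup is skipped entirely when the sentinel is passed in.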
2024-02-04LoongArch: testsuite: Fix gcc.dg/vect/vect-reduc-mul_{1, 2}.c FAIL.Li Wei1-55/+163
This FAIL was introduced by r14-6908. When the constant vector permutation implementations were merged, the 128-bit matching case was not fully considered: after the merge, the expansion of 128-bit vectors only supported the value-based 4-element set shuffle. This patch therefore implements the entire 128-bit vector constant permutation and makes some structural adjustments to the code.

gcc/ChangeLog:
    * config/loongarch/loongarch.cc (loongarch_expand_vselect): Adjust.
    (loongarch_expand_vselect_vconcat): Ditto.
    (loongarch_try_expand_lsx_vshuf_const): New, use vshuf to implement all 128-bit constant permutation situations.
    (loongarch_expand_lsx_shuffle): Adjust and rename function.
    (loongarch_is_imm_set_shuffle): Rename function.
    (loongarch_expand_vec_perm_even_odd): Function forward declaration.
    (loongarch_expand_vec_perm_even_odd_1): Add implementation for 128-bit extract-even and extract-odd permutations.
    (loongarch_is_odd_extraction): Delete.
    (loongarch_is_even_extraction): Ditto.
    (loongarch_expand_vec_perm_const): Adjust.
2024-02-03LoongArch: Fix an ODR violationXi Ruoyao2-2/+3
When bootstrapping GCC 14 with --with-build-config=bootstrap-lto, an ODR violation is detected:

    ../../gcc/config/loongarch/loongarch-opts.cc:57: warning: 'abi_minimal_isa' violates the C++ One Definition Rule [-Wodr]
       57 | abi_minimal_isa[N_ABI_BASE_TYPES][N_ABI_EXT_TYPES];
    ../../gcc/config/loongarch/loongarch-def.cc:186: note: 'abi_minimal_isa' was previously declared here
      186 | abi_minimal_isa = array<array<loongarch_isa, N_ABI_EXT_TYPES>,
    ../../gcc/config/loongarch/loongarch-def.cc:186: note: code may be misoptimized unless '-fno-strict-aliasing' is used

Fix it by adding a proper declaration of abi_minimal_isa to loongarch-def.h and removing the ODR-violating local declaration in loongarch-opts.cc.

gcc/ChangeLog:
    * config/loongarch/loongarch-def.h (abi_minimal_isa): Declare.
    * config/loongarch/loongarch-opts.cc (abi_minimal_isa): Remove the ODR-violating local declaration.
2024-02-02hppa: Implement TARGET_ATOMIC_ASSIGN_EXPAND_FENVJohn David Anglin2-1/+298
This change implements __builtin_get_fpsr() and __builtin_set_fpsr(x) to get and set the floating-point status register. They are used to implement pa_atomic_assign_expand_fenv(). 2024-02-02 John David Anglin <danglin@gcc.gnu.org> gcc/ChangeLog: PR target/59778 * config/pa/pa.cc (enum pa_builtins): Add PA_BUILTIN_GET_FPSR and PA_BUILTIN_SET_FPSR builtins. * (pa_builtins_icode): Declare. * (def_builtin, pa_fpu_init_builtins): New. * (pa_init_builtins): Initialize FPU builtins. * (pa_builtin_decl, pa_expand_builtin_1): New. * (pa_expand_builtin): Handle PA_BUILTIN_GET_FPSR and PA_BUILTIN_SET_FPSR builtins. * (pa_atomic_assign_expand_fenv): New. * config/pa/pa.md (UNSPECV_GET_FPSR, UNSPECV_SET_FPSR): New UNSPECV constants. (get_fpsr, put_fpsr): New expanders. (get_fpsr_32, get_fpsr_64, set_fpsr_32, set_fpsr_64): New insn patterns.
2024-02-03RISC-V: Expand VLMAX scalar move in reductionJuzhe-Zhong1-5/+7
This patch removes the redundant vsetvl in the following code:

    vsetvli a5,a1,e32,m1,tu,ma
    slli a4,a5,2
    sub a1,a1,a5
    vle32.v v2,0(a0)
    add a0,a0,a4
    vadd.vv v1,v2,v1
    bne a1,zero,.L3
    vsetivli zero,1,e32,m1,ta,ma
    vmv.s.x v2,zero
    vsetvli a5,zero,e32,m1,ta,ma ---> Redundant vsetvl.
    vredsum.vs v1,v1,v2
    vmv.x.s a0,v1
    ret

The VSETVL pass is able to fuse the avl = 1 of the scalar move with the VLMAX avl of the reduction. However, the following RTL blocks the fusion during dependence analysis in the VSETVL pass:

    (insn 49 24 50 5 (set (reg:RVVM1SI 98 v2 [148])
        (if_then_else:RVVM1SI (unspec:RVVMF32BI [
                (const_vector:RVVMF32BI [ (const_int 1 [0x1]) repeat [ (const_int 0 [0]) ] ])
                (const_int 1 [0x1])
                (const_int 2 [0x2]) repeated x2
                (const_int 0 [0])
                (reg:SI 66 vl)
                (reg:SI 67 vtype)
              ] UNSPEC_VPREDICATE)
            (const_vector:RVVM1SI repeat [ (const_int 0 [0]) ])
            (unspec:RVVM1SI [ (reg:DI 0 zero) ] UNSPEC_VUNDEF))) 3813 {*pred_broadcastrvvm1si_zero}
      (nil))
    (insn 50 49 51 5 (set (reg:DI 15 a5 [151])   ----> This sets a5, which blocks fusing the following VLMAX into the scalar move above.
        (unspec:DI [ (const_int 32 [0x20]) ] UNSPEC_VLMAX)) 2566 {vlmax_avldi}
      (expr_list:REG_EQUIV (unspec:DI [ (const_int 32 [0x20]) ] UNSPEC_VLMAX)
        (nil)))
    (insn 51 50 52 5 (set (reg:RVVM1SI 97 v1 [150])
        (unspec:RVVM1SI [
            (unspec:RVVMF32BI [
                (const_vector:RVVMF32BI repeat [ (const_int 1 [0x1]) ])
                (reg:DI 15 a5 [151])
                (const_int 2 [0x2])
                (const_int 1 [0x1])
                (reg:SI 66 vl)
                (reg:SI 67 vtype)
              ] UNSPEC_VPREDICATE)
            (unspec:RVVM1SI [
                (reg:RVVM1SI 97 v1 [orig:134 vect_result_14.6 ] [134])
                (reg:RVVM1SI 98 v2 [148])
              ] UNSPEC_REDUC_SUM)
            (unspec:RVVM1SI [ (reg:DI 0 zero) ] UNSPEC_VUNDEF)
          ] UNSPEC_REDUC)) 17541 {pred_redsumrvvm1si}
      (expr_list:REG_DEAD (reg:RVVM1SI 98 v2 [148])
        (expr_list:REG_DEAD (reg:SI 66 vl)
          (expr_list:REG_DEAD (reg:DI 15 a5 [151])
            (expr_list:REG_DEAD (reg:DI 0 zero)
              (nil))))))

This situation can only arise with auto-vectorization, never with intrinsic code.

Since the reduction is passed the VLMAX AVL, it is more natural to also pass VLMAX to the scalar move that initializes the value of the reduction.

After this patch:

    vsetvli a5,a1,e32,m1,tu,ma
    slli a4,a5,2
    sub a1,a1,a5
    vle32.v v2,0(a0)
    add a0,a0,a4
    vadd.vv v1,v2,v1
    bne a1,zero,.L3
    vsetvli a5,zero,e32,m1,ta,ma
    vmv.s.x v2,zero
    vredsum.vs v1,v1,v2
    vmv.x.s a0,v1
    ret

Tested on both RV32 and RV64 with no regressions.

PR target/113697

gcc/ChangeLog:
    * config/riscv/riscv-v.cc (expand_reduction): Pass VLMAX avl to scalar move.

gcc/testsuite/ChangeLog:
    * gcc.target/riscv/rvv/autovec/pr113697.c: New test.
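For orientation, a loop of roughly the following shape is the kind of source an auto-vectorizer would be expected to turn into the vle32.v / vadd.vv loop followed by a single vredsum.vs. This is an assumed example; the function name is made up and the actual contents of pr113697.c are not shown in this log.

```cpp
#include <cassert>

// Hypothetical source shape: a plain 32-bit integer sum reduction.
int sum_reduction(const int *a, int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i)
        sum += a[i];  // partial sums accumulate per lane, then reduce once after the loop
    return sum;
}
```

The scalar move discussed above corresponds to initializing the reduction's start value (zero here) before the final reduction instruction.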
2024-02-02Revert "RISC-V: Allow LICM hoist POLY_INT configuration code sequence"Lehua Ding1-5/+4
This reverts commit 74489c19070703361acc20bc172f304cae845a96.
2024-02-02RISC-V: Allow LICM hoist POLY_INT configuration code sequenceJuzhe-Zhong1-4/+5
A recent benchmark evaluation (coremark-pro zip-test) showed the following:

    vid.v v2
    vmv.v.i v5,0
    .L9:
    vle16.v v3,0(a4)
    vrsub.vx v4,v2,a6 ---> LICM failed to hoist it outside the loop.

The root cause is:

    (insn 56 47 57 4 (set (subreg:DI (reg:HI 220) 0)
        (reg:DI 223)) "rvv.c":11:9 208 {*movdi_64bit}   -> Its result is used by the following vrsub.vx, which suppresses hoisting of the vrsub.vx.
      (nil))
    (insn 57 56 59 4 (set (reg:RVVMF2HI 216)
        (if_then_else:RVVMF2HI (unspec:RVVMF32BI [
                (const_vector:RVVMF32BI repeat [ (const_int 1 [0x1]) ])
                (reg:DI 350)
                (const_int 2 [0x2]) repeated x2
                (const_int 1 [0x1])
                (reg:SI 66 vl)
                (reg:SI 67 vtype)
              ] UNSPEC_VPREDICATE)
            (minus:RVVMF2HI (vec_duplicate:RVVMF2HI (reg:HI 220))
                (reg:RVVMF2HI 217))
            (unspec:RVVMF2HI [ (reg:DI 0 zero) ] UNSPEC_VUNDEF))) "rvv.c":11:9 6938 {pred_subrvvmf2hi_reverse_scalar}
      (expr_list:REG_DEAD (reg:HI 220)
        (nil)))

This patch fixes it by generating (set (reg:HI) (subreg:HI (reg:DI))) instead of (set (subreg:DI (reg:HI)) (reg:DI)).

After this patch:

    vid.v v2
    vrsub.vx v2,v2,a7
    vmv.v.i v4,0
    .L3:
    vle16.v v3,0(a4)

Tested on both RV32 and RV64 with no regressions.

gcc/ChangeLog:
    * config/riscv/riscv.cc (riscv_legitimize_move): Fix poly_int dest generation.

gcc/testsuite/ChangeLog:
    * gcc.target/riscv/rvv/autovec/poly_licm-1.c: New test.
    * gcc.target/riscv/rvv/autovec/poly_licm-2.c: New test.
2024-02-02RISC-V: Cleanup the comments for the psabiPan Li1-12/+9
This patch cleans up comments that are out of date or incorrect.

gcc/ChangeLog:
    * config/riscv/riscv.cc (riscv_get_arg_info): Cleanup comments.
    (riscv_pass_by_reference): Ditto.
    (riscv_fntype_abi): Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>
2024-02-02RISC-V: Remove vsetvl_pre bogus instructions in VSETVL PASSJuzhe-Zhong1-0/+64
There is an RTL regression between GCC-13 and GCC-14. https://godbolt.org/z/Ga7K6MqaT

GCC-14:

    (insn 9 13 31 2 (set (reg:DI 15 a5 [138])
        (unspec:DI [ (const_int 64 [0x40]) ] UNSPEC_VLMAX)) "/app/example.c":5:15 2566 {vlmax_avldi}
      (expr_list:REG_EQUIV (unspec:DI [ (const_int 64 [0x40]) ] UNSPEC_VLMAX)
        (nil)))
    (insn 31 9 10 2 (parallel [
          (set (reg:DI 15 a5 [138]) (unspec:DI [ (reg:DI 0 zero) (const_int 32 [0x20]) (const_int 7 [0x7]) (const_int 1 [0x1]) repeated x2 ] UNSPEC_VSETVL))
          (set (reg:SI 66 vl) (unspec:SI [ (reg:DI 0 zero) (const_int 32 [0x20]) (const_int 7 [0x7]) ] UNSPEC_VSETVL))
          (set (reg:SI 67 vtype) (unspec:SI [ (const_int 32 [0x20]) (const_int 7 [0x7]) (const_int 1 [0x1]) repeated x2 ] UNSPEC_VSETVL))
        ]) "/app/example.c":5:15 3281 {vsetvldi}
      (nil))

GCC-13:

    (insn 10 7 26 2 (set (reg/f:DI 11 a1 [139])
        (plus:DI (reg:DI 11 a1 [142]) (const_int 800 [0x320]))) "/app/example.c":6:32 5 {adddi3}
      (nil))
    (insn 26 10 9 2 (parallel [
          (set (reg:DI 15 a5) (unspec:DI [ (reg:DI 0 zero) (const_int 32 [0x20]) (const_int 7 [0x7]) (const_int 1 [0x1]) repeated x2 ] UNSPEC_VSETVL))
          (set (reg:SI 66 vl) (unspec:SI [ (reg:DI 0 zero) (const_int 32 [0x20]) (const_int 7 [0x7]) ] UNSPEC_VSETVL))
          (set (reg:SI 67 vtype) (unspec:SI [ (const_int 32 [0x20]) (const_int 7 [0x7]) (const_int 1 [0x1]) repeated x2 ] UNSPEC_VSETVL))
        ]) "/app/example.c":5:15 792 {vsetvldi}
      (nil))

GCC-13 doesn't have:

    (insn 9 13 31 2 (set (reg:DI 15 a5 [138])
        (unspec:DI [ (const_int 64 [0x40]) ] UNSPEC_VLMAX)) "/app/example.c":5:15 2566 {vlmax_avldi}
      (expr_list:REG_EQUIV (unspec:DI [ (const_int 64 [0x40]) ] UNSPEC_VLMAX)
        (nil)))

vsetvl_pre does not emit any assembly; it is only used to occupy a scalar register, so it should be removed in the VSETVL pass.

Tested on both RV32 and RV64 with no regressions.

gcc/ChangeLog:
    * config/riscv/riscv-vsetvl.cc (vsetvl_pre_insn_p): New function.
    (pre_vsetvl::cleaup): Remove vsetvl_pre.
    (pre_vsetvl::remove_vsetvl_pre_insns): New function.

gcc/testsuite/ChangeLog:
    * gcc.target/riscv/rvv/vsetvl/vsetvl_pre-1.c: New test.
2024-02-02LoongArch: Fix incorrect return type for frecipe/frsqrte intrinsic functionsJiahao Xu1-8/+8
gcc/ChangeLog: * config/loongarch/larchintrin.h (__frecipe_s): Update function return type. (__frecipe_d): Ditto. (__frsqrte_s): Ditto. (__frsqrte_d): Ditto. gcc/testsuite/ChangeLog: * gcc.target/loongarch/larch-frecipe-intrinsic.c: New test.