|
gcc/ChangeLog:
* config/loongarch/larchintrin.h (__iocsrrd_h): Modify the
function return value type to unsigned short.
|
|
*sge<u>_<X:mode><GPR:mode> pattern has referenced operand[2] which is
invalid...it should just use `slti<u>` rather than `slti%i2<u>`.
gcc/ChangeLog:
PR target/106543
* config/riscv/riscv.md (*sge<u>_<X:mode><GPR:mode>): Fix asm
pattern.
|
|
The output of -march=help is like below:
```
All available -march extensions for RISC-V:
Name Version
i 2.0, 2.1
e 2.0
m 2.0
a 2.0, 2.1
f 2.0, 2.2
d 2.0, 2.2
...
```
Also support -print-supported-extensions and --print-supported-extensions for
clang compatibility.
gcc/ChangeLog:
PR target/109349
* common/config/riscv/riscv-common.cc (riscv_arch_help): New.
* config/riscv/riscv-protos.h (RISCV_MAJOR_VERSION_BASE): New.
(RISCV_MINOR_VERSION_BASE): Ditto.
(RISCV_REVISION_VERSION_BASE): Ditto.
* config/riscv/riscv-c.cc (riscv_ext_version_value): Use enum
rather than magic number.
* config/riscv/riscv.h (riscv_arch_help): New.
(EXTRA_SPEC_FUNCTIONS): Add riscv_arch_help.
(DRIVER_SELF_SPECS): Handle -march=help, -print-supported-extensions and
--print-supported-extensions.
* config/riscv/riscv.opt (march=help): New.
(print-supported-extensions): New.
(-print-supported-extensions): New.
* doc/invoke.texi (RISC-V Options): Document -march=help.
Reviewed-by: Christoph Müllner <christoph.muellner@vrull.eu>
|
|
This patch fixes a bug that causes indirect calls in PAC-enabled functions
to be tailcalled incorrectly when all argument registers R0-R3 are used.
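As a rough illustration of the affected shape (a sketch, not the contents of pac-sibcall.c): under the AAPCS the first four integer arguments occupy r0-r3, so an indirect call with four arguments leaves no argument register free for the call target. The names fn4_t, sum4 and call_in_tail_position are made up for this example.

```c
#include <assert.h>

/* Hypothetical callee type taking four integer arguments; on AArch32
   these would be passed in r0-r3.  */
typedef int (*fn4_t) (int, int, int, int);

/* Hypothetical callee, used only to exercise the call below.  */
static int
sum4 (int a, int b, int c, int d)
{
  return a + b + c + d;
}

/* The indirect call is in tail position, so a compiler may consider
   turning it into a sibling call (a plain jump to F).  */
static int
call_in_tail_position (fn4_t f, int a, int b, int c, int d)
{
  assert (f != 0);
  return f (a, b, c, d);
}
```

Calling call_in_tail_position (sum4, 1, 2, 3, 4) returns 10 on any host; whether the call is emitted as a sibcall depends on the target and options.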
2024-02-07 Tejas Belagod <tejas.belagod@arm.com>
PR target/113780
* config/arm/arm.cc (arm_function_ok_for_sibcall): Don't allow tailcalls
for indirect calls with 4 or more arguments in pac-enabled functions.
* lib/target-supports.exp (v8_1m_main_pacbti): Add __ARM_FEATURE_PAUTH.
* gcc.target/arm/pac-sibcall.c: New.
|
|
Commit 77d0f9ec3809b4d2e32c36069b6b9239d301c030 inadvertently changed
the normal asm dialect instruction template for zero_extendqidi2 from
ldxb to ldxh. Fix that.
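To make the difference concrete, here is a small host-side sketch of the two load widths. It only models the semantics in plain C (a zero-extending byte load versus a zero-extending halfword load, with little-endian byte order assumed for the halfword); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extending byte load: the semantics zero_extendqidi2 needs.  */
static uint64_t
load_u8_zext (const unsigned char *p)
{
  assert (p != 0);
  return (uint64_t) p[0];
}

/* Zero-extending halfword load; little-endian order is an assumption
   of this sketch.  It also reads the neighbouring byte.  */
static uint64_t
load_u16_zext_le (const unsigned char *p)
{
  assert (p != 0);
  return (uint64_t) p[0] | ((uint64_t) p[1] << 8);
}
```

With the bytes { 0x34, 0x12 } in memory, the byte load yields 0x34 while the halfword load yields 0x1234, which is why the wrong template produced wrong values.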
gcc/
* config/bpf/bpf.md (zero_extendqidi2): Correct asm template to
use ldxb instead of ldxh.
|
|
The -mmcu=avrtiny cores have no ADIW and SBIW instructions. This was
implemented by clearing all regs out of regclass ADDW_REGS so that
constraint "w" never matched. This corrupted the subset relations of
the register classes as they appear in enum reg_class.
This patch keeps ADDW_REGS like for all other cores, i.e. it contains
R24...R31. Instead of tests like test_hard_reg_class (ADDW_REGS, *)
the code now uses avr_adiw_reg_p (*). And all insns with constraint "w"
get "isa" insn attribute value of "adiw".
Plus, a new built-in macro __AVR_HAVE_ADIW__ is provided, which is more
specific than __AVR_TINY__.
gcc/
PR target/113927
* config/avr/avr.h (AVR_HAVE_ADIW): New macro.
* config/avr/avr-protos.h (avr_adiw_reg_p): New proto.
* config/avr/avr.cc (avr_adiw_reg_p): New function.
(avr_conditional_register_usage) [AVR_TINY]: Don't clear ADDW_REGS.
Replace test_hard_reg_class (ADDW_REGS, ...) with calls to
* config/avr/avr.md: Same.
(attr "isa") <tiny, no_tiny>: Remove.
<adiw, no_adiw>: Add.
(define_insn, define_insn_and_split): When an alternative has
constraint "w", then set attribute "isa" to "adiw".
* config/avr/avr-c.cc (avr_cpu_cpp_builtins) [AVR_HAVE_ADIW]:
Built-in define __AVR_HAVE_ADIW__.
* doc/invoke.texi (AVR Options): Document it.
|
|
The RDNA architecture has limited support for permute operations. This should
allow use of the permutations that do work, and fall back to linear code for
other cases.
gcc/ChangeLog:
* config/gcn/gcn-valu.md
(vec_extract<V_MOV:mode><V_MOV_ALT:mode>): Add conditions for RDNA.
* config/gcn/gcn.cc (gcn_vectorize_vec_perm_const): Check permutation
details are supported on RDNA devices.
|
|
Introduce vec_shl_<mode> and vec_shr_<mode> expanders to improve
'*a = __builtin_shufflevector(*a, (vect64){0}, 1, 2, 3, 4);'
and
'*a = __builtin_shufflevector((vect64){0}, *a, 3, 4, 5, 6);'
shuffles. The generated code improves from:
movzwl 6(%rdi), %eax
movzwl 4(%rdi), %edx
salq $16, %rax
orq %rdx, %rax
movzwl 2(%rdi), %edx
salq $16, %rax
orq %rdx, %rax
movq %rax, (%rdi)
to:
movq (%rdi), %xmm0
psrlq $16, %xmm0
movq %xmm0, (%rdi)
and to:
movq (%rdi), %xmm0
psllq $16, %xmm0
movq %xmm0, (%rdi)
in the second case.
The patch handles 32-bit vectors as well and improves generated code from:
movd (%rdi), %xmm0
pxor %xmm1, %xmm1
punpcklwd %xmm1, %xmm0
pshuflw $230, %xmm0, %xmm0
movd %xmm0, (%rdi)
to:
movd (%rdi), %xmm0
psrld $16, %xmm0
movd %xmm0, (%rdi)
and to:
movd (%rdi), %xmm0
pslld $16, %xmm0
movd %xmm0, (%rdi)
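The reason a single shift suffices can be checked on the host with portable C. The sketch below emulates the first shuffle (indices 1, 2, 3, 4 over the concatenation of *a and a zero vector) for four 16-bit lanes and compares it with a 64-bit logical right shift by one lane. It assumes a little-endian host such as x86; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Emulate __builtin_shufflevector (a, zero, 1, 2, 3, 4): indices 0..3
   select from A, indices 4..7 select from the zero vector.  */
static void
shuffle_1234 (uint16_t v[4])
{
  uint16_t concat[8] = { v[0], v[1], v[2], v[3], 0, 0, 0, 0 };
  for (int i = 0; i < 4; i++)
    v[i] = concat[i + 1];
}

/* The same operation on the whole 64-bit value: shift right by one
   16-bit lane (lane 0 is the least significant on little-endian).  */
static void
shift_right_one_lane (uint16_t v[4])
{
  uint64_t bits;
  assert (sizeof bits == 4 * sizeof v[0]);
  memcpy (&bits, v, sizeof bits);
  bits >>= 16;
  memcpy (v, &bits, sizeof bits);
}
```

Starting from { 1, 2, 3, 4 }, both functions produce { 2, 3, 4, 0 }. The second shuffle mirrors this with a left shift.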
PR target/113871
gcc/ChangeLog:
* config/i386/mmx.md (V248FI): New mode iterator.
(V24FI_32): Ditto.
(vec_shl_<V248FI:mode>): New expander.
(vec_shl_<V24FI_32:mode>): Ditto.
(vec_shr_<V248FI:mode>): Ditto.
(vec_shr_<V24FI_32:mode>): Ditto.
* config/i386/sse.md (vec_shl_<V_128:mode>): Simplify expander.
(vec_shr_<V248FI:mode>): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr113871-1a.c: New test.
* gcc.target/i386/pr113871-1b.c: New test.
* gcc.target/i386/pr113871-2a.c: New test.
* gcc.target/i386/pr113871-2b.c: New test.
* gcc.target/i386/pr113871-3a.c: New test.
* gcc.target/i386/pr113871-3b.c: New test.
* gcc.target/i386/pr113871-4a.c: New test.
|
|
Since push2/pop2 requires 16-byte stack alignment, don't use them if the
incoming stack isn't 16-byte aligned.
gcc/
PR target/113876
* config/i386/i386.cc (ix86_pro_and_epilogue_can_use_push2pop2):
Return false if the incoming stack isn't 16-byte aligned.
gcc/testsuite/
PR target/113876
* gcc.target/i386/pr113876.c: New test.
|
|
UNSPEC_AUIPC. [PR113742]
gcc/ChangeLog:
PR target/113742
* config/riscv/riscv.cc (riscv_macro_fusion_pair_p): Fix
recognition of UNSPEC_AUIPC for RISCV_FUSE_LUI_ADDI.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr113742.c: New test.
|
|
The initial heap trampoline implementation was targeting 64b
platforms. As the PR demonstrates this creates an issue where it
is expected that the same symbols are exported for 32 and 64b.
Rather than conditionalize the exports and code-gen on x86_64,
this patch provides a basic implementation of the IA32 trampoline.
This also avoids potential user confusion, when a 32b target has
64b multilibs, and vice versa; which is the case for Darwin.
PR target/113855
gcc/ChangeLog:
* config/i386/darwin.h (DARWIN_HEAP_T_LIB): Moved to be
available to all sub-targets.
* config/i386/darwin32-biarch.h (DARWIN_HEAP_T_LIB): Delete.
* config/i386/darwin64-biarch.h (DARWIN_HEAP_T_LIB): Delete.
libgcc/ChangeLog:
* config.host: Add trampoline support to x?86-linux.
* config/i386/heap-trampoline.c (trampoline_insns): Provide
a variant for IA32.
(union ix86_trampoline): Likewise.
(__gcc_nested_func_ptr_created): Implement a basic trampoline
for IA32.
|
|
When building with "-Werror=format-diag", the misspelled term 'args' is
diagnosed as shown below. This patch fixes it by using the term
'arguments' instead.
../../gcc/config/riscv/riscv-vector-builtins.cc: In function 'tree_node*
riscv_vector::resolve_overloaded_builtin(location_t, unsigned int, tree,
vec<tree_node*, va_gc>*)':
../../gcc/config/riscv/riscv-vector-builtins.cc:4633:65: error:
misspelled term 'args' in format; use 'arguments' instead
[-Werror=format-diag]
4633 | error_at (loc, "no matching function call to %qE with empty
args", fndecl);
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins.cc (resolve_overloaded_builtin):
Replace the misspelled term 'args' with 'arguments'.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/pr113766-1.c: Adjust the test cases.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
For devices that see a part of the flash memory in the RAM address space,
bit-field NVMCTRL_CTRLB.FLMAP must match the value of symbol __flmap.
This is achieved by dragging in startup code from lib<mcu>.a.
The mechanism is the same as for libgcc's __do_copy_data and __do_clear_bss.
The code is implemented in AVR-LibC #931 and can be dragged in by referencing
__do_flmap_init.
In addition to setting FLMAP, that code also sets bit FLMAPLOCK provided
symbol __flmap_lock has a non-zero value. This protects FLMAP from future
changes.
When the __do_flmap_init code is not wanted, the symbol can be satisfied by
linking with -Wl,--defsym,__do_flmap_init=0
gcc/
PR target/112944
* config/avr/gen-avr-mmcu-specs.cc (print_mcu) [have_flmap]:
<*link_rodata_in_ram>: Spec undefs symbol __do_flmap_init
when not linked with -mrodata-in-ram.
|
|
Some information was (re-)computed in different places.
This patch computes them in new struct McuInfo and passes
it around in order to provide the information.
gcc/
* config/avr/gen-avr-mmcu-specs.cc (struct McuInfo): New.
(main, print_mcu, diagnose_mrodata_in_ram): Pass it down.
|
|
1. The only supported TLS code sequence with ADD is
addq foo@gottpoff(%rip),%reg
Change je constraint to a memory operand in APX NDD ADD pattern with
register source operand.
2. The instruction length of APX NDD instructions with immediate operand:
op imm, mem, reg
may exceed the size limit of 15 bytes when non-default address space,
segment register or address size prefix are used.
Add jM constraint which is a memory operand valid for APX NDD instructions
with immediate operand and add jO constraint which is an offsetable memory
operand valid for APX NDD instructions with immediate operand. Update
APX NDD patterns with jM and jO constraints.
gcc/
PR target/113711
PR target/113733
* config/i386/constraints.md: List all constraints with j prefix.
(j>): Change auto-dec to auto-inc in documentation.
(je): Changed to a memory constraint with APX NDD TLS operand
check.
(jM): New memory constraint for APX NDD instructions.
(jO): Likewise.
* config/i386/i386-protos.h (x86_poff_operand_p): Removed.
* config/i386/i386.cc (x86_poff_operand_p): Likewise.
* config/i386/i386.md (*add<dwi>3_doubleword): Use rjO.
(*add<mode>_1[SWI48]): Use je and jM.
(addsi_1_zext): Use jM.
(*addv<dwi>4_doubleword_1[DWI]): Likewise.
(*sub<mode>_1[SWI]): Use jM.
(@add<mode>3_cc_overflow_1[SWI]): Likewise.
(*add<dwi>3_doubleword_cc_overflow_1): Use rjO.
(*and<dwi>3_doubleword): Likewise.
(*anddi_1): Use jM.
(*andsi_1_zext): Likewise.
(*and<mode>_1[SWI24]): Likewise.
(*<code><dwi>3_doubleword[any_or]): Use rjO
(*code<mode>_1[any_or SWI248]): Use jM.
(*<code>si_1_zext[zero_extend + any_or]): Likewise.
* config/i386/predicates.md (apx_ndd_memory_operand): New.
(apx_ndd_add_memory_operand): Likewise.
gcc/testsuite/
PR target/113711
PR target/113733
* gcc.target/i386/apx-ndd-2.c: New test.
* gcc.target/i386/apx-ndd-base-index-1.c: Likewise.
* gcc.target/i386/apx-ndd-no-seg-global-1.c: Likewise.
* gcc.target/i386/apx-ndd-seg-1.c: Likewise.
* gcc.target/i386/apx-ndd-seg-2.c: Likewise.
* gcc.target/i386/apx-ndd-seg-3.c: Likewise.
* gcc.target/i386/apx-ndd-seg-4.c: Likewise.
* gcc.target/i386/apx-ndd-seg-5.c: Likewise.
* gcc.target/i386/apx-ndd-tls-1a.c: Likewise.
* gcc.target/i386/apx-ndd-tls-2.c: Likewise.
* gcc.target/i386/apx-ndd-tls-3.c: Likewise.
* gcc.target/i386/apx-ndd-tls-4.c: Likewise.
* gcc.target/i386/apx-ndd-x32-1.c: Likewise.
|
|
gcc/
PR target/113824
* config/avr/avr-mcus.def (ata5797): Move from avr5 to avr4.
* doc/avr-mmcu.texi: Rebuild.
|
|
gcc/
* config/avr/gen-avr-mmcu-specs.cc (print_mcu) <*cpp_mcu>: Spec always
defines __AVR_PM_BASE_ADDRESS__ if the core has it.
|
|
gcc/
* config/avr/gen-avr-mmcu-specs.cc: Rename spec cc1_misc to
cc1_rodata_in_ram. Rename spec link_misc to link_rodata_in_ram.
Remove spec asm_misc.
* config/avr/specs.h: Same.
|
|
There is another corner case, similar to the example below:
void test (void)
{
__riscv_vaadd ();
}
We report an error when an overloaded function is called with empty args. For example:
test.c: In function 'foo':
test.c:8:3: error: no matching function call to '__riscv_vaadd' with empty args
8 | __riscv_vaadd ();
| ^~~~~~~~~~~~~~~~~~~~
Unfortunately, another ICE similar to the one below follows that
message. The underlying function checker is built with zero args,
which breaks some of its assumptions, for example that the count of
args is not less than 2.
ice.c: In function ‘foo’:
ice.c:8:3: internal compiler error: in require_immediate, at
config/riscv/riscv-vector-builtins.cc:4252
8 | __riscv_vaadd ();
| ^~~~~~~~~~~~~
0x20b36ac riscv_vector::function_checker::require_immediate(unsigned
int, long, long) const
.../__RISC-V_BUILD__/../gcc/config/riscv/riscv-vector-builtins.cc:4252
0x20b890c riscv_vector::alu_def::check(riscv_vector::function_checker&) const
.../__RISC-V_BUILD__/../gcc/config/riscv/riscv-vector-builtins-shapes.cc:387
0x20b38d7 riscv_vector::function_checker::check()
.../__RISC-V_BUILD__/../gcc/config/riscv/riscv-vector-builtins.cc:4315
0x20b4876 riscv_vector::check_builtin_call(unsigned int, vec<unsigned int, va_heap, vl_ptr>,
.../__RISC-V_BUILD__/../gcc/config/riscv/riscv-vector-builtins.cc:4605
0x2069393 riscv_check_builtin_call
.../__RISC-V_BUILD__/../gcc/config/riscv/riscv-c.cc:227
The tests below pass with this patch.
* The riscv regression tests.
PR target/113766
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-shapes.cc (struct alu_def): Make
sure the c.arg_num is >= 2 before checking.
(struct build_frm_base): Ditto.
(struct narrow_alu_def): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/pr113766-1.c: Add new cases.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch fixes PR target/113690, an ICE-on-valid regression on x86_64
that exhibits with a specific combination of command line options. The
cause is that x86's scalar-to-vector pass converts a chain of instructions
from TImode to V1TImode, but fails to appropriately update or delete the
attached REG_EQUAL note. This implements Uros' recommendation of removing
these notes. For convenience, this code (re)factors the logic to convert
a TImode constant into a V1TImode constant vector into a subroutine and
reuses it.
For the record, STV is actually doing something useful in this strange
testcase, GCC with -O2 -fno-dce -fno-forward-propagate -fno-split-wide-types
-funroll-loops generates:
foo: movl $v, %eax
pxor %xmm0, %xmm0
movaps %xmm0, 48(%rax)
movaps %xmm0, (%rax)
movaps %xmm0, 16(%rax)
movaps %xmm0, 32(%rax)
ret
With the addition of -mno-stv (to disable the patched code) it gives:
foo: movl $v, %eax
movq $0, 48(%rax)
movq $0, 56(%rax)
movq $0, (%rax)
movq $0, 8(%rax)
movq $0, 16(%rax)
movq $0, 24(%rax)
movq $0, 32(%rax)
movq $0, 40(%rax)
ret
2024-02-07 Roger Sayle <roger@nextmovesoftware.com>
Uros Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
PR target/113690
* config/i386/i386-features.cc (timode_convert_cst): New helper
function to convert a TImode CONST_SCALAR_INT_P to a V1TImode
CONST_VECTOR.
(timode_scalar_chain::convert_op): Use timode_convert_cst.
(timode_scalar_chain::convert_insn): Delete REG_EQUAL notes.
Use timode_convert_cst.
gcc/testsuite/ChangeLog
PR target/113690
* gcc.target/i386/pr113690.c: New test case.
|
|
With the release of Binutils 2.42, this brings the level of
system-register support in GCC in line with the current
state-of-the-art in Binutils, ensuring everything available in
Binutils is plainly accessible from GCC.
Where Binutils uses a more detailed description of which features are
responsible for enabling a given system register, GCC aliases the
binutils-equivalent feature flag macro constant to that of the base
architecture implementing the feature, resulting in entries such as
#define AARCH64_FL_S2PIE AARCH64_FL_V8_9A
in `aarch64.h', thus ensuring that the Binutils `aarch64-sys-regs.def'
file can be understood by GCC without the need for modification.
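A minimal sketch of the aliasing idea follows. It is not the layout of aarch64.h or of aarch64-sys-regs.def: the SKETCH_ names, the flag value, the two-field entry format and the lookup helper are all assumptions made for illustration. The point is only that an entry naming the fine-grained feature resolves to the base-architecture flag through the alias.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag value standing in for the base architecture.  */
#define SKETCH_FL_V8_9A (1u << 0)
/* Alias the fine-grained feature to the base architecture flag.  */
#define SKETCH_FL_S2PIE SKETCH_FL_V8_9A

struct sketch_sysreg
{
  const char *name;
  unsigned required_flags;
};

/* Stand-in for a shared .def entry that names the fine-grained feature.  */
#define SKETCH_SYSREG(NAME, FEATURE) { NAME, SKETCH_FL_##FEATURE },

static const struct sketch_sysreg sketch_sysregs[] = {
  SKETCH_SYSREG ("s2pir_el2", S2PIE)
};

/* Return 1 if NAME is known and all flags it requires are enabled.  */
static int
sketch_sysreg_available (const char *name, unsigned enabled_flags)
{
  assert (name != NULL);
  for (size_t i = 0; i < sizeof sketch_sysregs / sizeof sketch_sysregs[0]; i++)
    if (strcmp (sketch_sysregs[i].name, name) == 0)
      return ((sketch_sysregs[i].required_flags & enabled_flags)
	      == sketch_sysregs[i].required_flags);
  return 0;
}
```

With this arrangement the entry list needs no edits when a fine-grained feature is folded into a base architecture; only the alias changes.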
To accompany the addition of the new system registers, a new test is
added confirming they were successfully added to the list of
recognized registers.
gcc/ChangeLog:
* config/aarch64/aarch64-sys-regs.def: Copy from Binutils.
* config/aarch64/aarch64.h (AARCH64_FL_AIE): New.
(AARCH64_FL_DEBUGv8p9): Likewise.
(AARCH64_FL_FGT2): Likewise.
(AARCH64_FL_ITE): Likewise.
(AARCH64_FL_PFAR): Likewise.
(AARCH64_FL_PMUv3_ICNTR): Likewise.
(AARCH64_FL_PMUv3_SS): Likewise.
(AARCH64_FL_PMUv3p9): Likewise.
(AARCH64_FL_RASv2): Likewise.
(AARCH64_FL_S1PIE): Likewise.
(AARCH64_FL_S1POE): Likewise.
(AARCH64_FL_S2PIE): Likewise.
(AARCH64_FL_S2POE): Likewise.
(AARCH64_FL_SCTLR2): Likewise.
(AARCH64_FL_SEBEP): Likewise.
(AARCH64_FL_SPE_FDS): Likewise.
(AARCH64_FL_TCR2): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/acle/rwsr-armv8p9.c: New.
|
|
There is one corner case, similar to the example below:
void test (void)
{
__riscv_vfredosum_tu ();
}
This triggers an ICE because of the implementation details of overloaded
functions in GCC. According to the RVV intrinsic doc, there is no such
overloaded function with empty args. Unfortunately, we register the empty-args
function as overloaded to avoid conflicts. Thus, an actual registered
function remains after NULL_TREE is returned to the middle-end,
which finally results in an ICE when expanding. For example:
1. First we register void __riscv_vfredmax () as the overloaded function.
2. Then resolve_overloaded_builtin (this func) returns NULL_TREE.
3. The function registered in step 1 bypasses the args check because it has empty args.
4. Finally, we fall into expand_builtin with empty args and hit the ICE.
Here we report an error when an overloaded function is called with empty args. For example:
test.c: In function 'foo':
test.c:8:3: error: no matching function call to '__riscv_vfredosum_tu' with empty args
8 | __riscv_vfredosum_tu();
| ^~~~~~~~~~~~~~~~~~~~
The tests below pass with this patch.
* The riscv regression tests.
PR target/113766
gcc/ChangeLog:
* config/riscv/riscv-protos.h (resolve_overloaded_builtin): Adjust
the signature of func.
* config/riscv/riscv-c.cc (riscv_resolve_overloaded_builtin): Ditto.
* config/riscv/riscv-vector-builtins.cc (resolve_overloaded_builtin): Make
overloaded func with empty args error.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/pr113766-1.c: New test.
* gcc.target/riscv/rvv/base/pr113766-2.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
If we can't find a scratch register for large model profiling, return
R10_REG.
PR target/113689
* config/i386/i386.cc (x86_64_select_profile_regnum): Return
R10_REG after sorry.
|
|
It would be neater if the middle end for target_clones used a target
hook for version name mangling, so we only do version name mangling
once. However, that would require more intrusive refactoring that will
have to wait till Stage 1.
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_mangle_decl_assembler_name):
Move before new caller, and add ".default" suffix.
(get_suffixed_assembler_name): New.
(make_resolver_func): Use get_suffixed_assembler_name.
(aarch64_generate_version_dispatcher_body): Redo name mangling.
gcc/testsuite/ChangeLog:
* g++.target/aarch64/mv-symbols1.C: New test.
* g++.target/aarch64/mv-symbols2.C: Ditto.
* g++.target/aarch64/mv-symbols3.C: Ditto.
* g++.target/aarch64/mv-symbols4.C: Ditto.
* g++.target/aarch64/mv-symbols5.C: Ditto.
* g++.target/aarch64/mvc-symbols1.C: Ditto.
* g++.target/aarch64/mvc-symbols2.C: Ditto.
* g++.target/aarch64/mvc-symbols3.C: Ditto.
* g++.target/aarch64/mvc-symbols4.C: Ditto.
|
|
The std::pair ctor used in the tiles constexpr variable is only constexpr in
C++14 and later; it works with libstdc++ because it is marked constexpr there
even in C++11 mode.
The following patch fixes it by using an unnamed local class instead of
std::pair, and additionally changes the first element from unsigned int to
unsigned char because 0xff has to fit into unsigned char on all hosts.
2024-02-06 Jakub Jelinek <jakub@redhat.com>
PR target/113763
* config/aarch64/aarch64.cc (aarch64_output_sme_zero_za): Change tiles
element from std::pair<unsigned int, char> to an unnamed struct.
Adjust uses of tile range variable.
|
|
This patch fixes the issue reported by Jeff.
Testing is running. OK for trunk if testing passes with no regressions?
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (pre_vsetvl::emit_vsetvl): Fix infinite compilation.
(pre_vsetvl::remove_vsetvl_pre_insns): Ditto.
|
|
The target hook aarch64_class_max_nregs returns the incorrect result for 64-bit
structure modes like V3x1DImode or V4x1DFmode. The calculation of nregs
is based on the size of AdvSIMD vector register for 64-bit modes which ought to
be UNITS_PER_VREG / 2. This patch fixes the register size.
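The arithmetic can be spelled out with a small sketch. It assumes a full AdvSIMD vector register is 16 bytes and models the register count as a round-up division; sketch_nregs and SKETCH_UNITS_PER_VREG are hypothetical names, not the code in aarch64_class_max_nregs.

```c
#include <assert.h>

/* Assumption for the sketch: a full AdvSIMD vector register is 16 bytes.  */
enum { SKETCH_UNITS_PER_VREG = 16 };

/* Number of registers needed to hold MODE_BYTES when each register
   contributes REG_BYTES (round-up division).  */
static unsigned
sketch_nregs (unsigned mode_bytes, unsigned reg_bytes)
{
  assert (reg_bytes > 0);
  return (mode_bytes + reg_bytes - 1) / reg_bytes;
}
```

A structure of three 64-bit vectors is 24 bytes. Dividing by the full register size gives sketch_nregs (24, 16) == 2, which is one register short; dividing by SKETCH_UNITS_PER_VREG / 2 gives sketch_nregs (24, 8) == 3, one register per vector.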
gcc/ChangeLog:
PR target/112577
* config/aarch64/aarch64.cc (aarch64_class_max_nregs): Handle 64-bit
vector structure modes correctly.
|
|
A recent commit introduced a compiler warning in thead.cc:
error: invalid suffix on literal; C++11 requires a space between literal and string macro [-Werror=literal-suffix]
1144 | fprintf (file, "(%s),"HOST_WIDE_INT_PRINT_DEC",%u", reg_names[REGNO (addr.reg)],
| ^
This commit addresses this issue and breaks the line such that it won't
exceed 80 characters.
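The shape of the fix, sketched with a stand-in macro: SKETCH_PRINT_DEC plays the role of HOST_WIDE_INT_PRINT_DEC and is built from the standard PRId64 here, and sketch_format_address is a hypothetical helper rather than the code in thead.cc. The whitespace between each string literal and the macro is what keeps C++11 from parsing the macro name as a literal suffix.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Stand-in for HOST_WIDE_INT_PRINT_DEC (an assumption of this sketch).  */
#define SKETCH_PRINT_DEC "%" PRId64

/* Format "(reg),offset,shift" into BUF; returns what snprintf returns.  */
static int
sketch_format_address (char *buf, size_t size, const char *reg,
		       int64_t offset, unsigned shift)
{
  assert (buf != NULL && reg != NULL);
  /* Note the spaces around the macro between the string literals.  */
  return snprintf (buf, size, "(%s)," SKETCH_PRINT_DEC ",%u",
		   reg, offset, shift);
}
```

The adjacent literals are still concatenated into one format string, so the output is unchanged by the added spaces.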
gcc/ChangeLog:
* config/riscv/thead.cc (th_print_operand_address): Fix compiler
warning.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
|
|
2 scratch registers, %r10 and %r11, are available at function entry for
large model profiling. But %r10 may be used by stack realignment and we
can't use %r10 in this case. Add x86_64_select_profile_regnum to find
a caller-saved register which isn't live or a callee-saved register
which has been saved on stack in the prologue at entry for large model
profiling and sorry if we can't find one.
gcc/
PR target/113689
* config/i386/i386.cc (x86_64_select_profile_regnum): New.
(x86_function_profiler): Call x86_64_select_profile_regnum to
get a scratch register for large model profiling.
gcc/testsuite/
PR target/113689
* gcc.target/i386/pr113689-1.c: New file.
* gcc.target/i386/pr113689-2.c: Likewise.
* gcc.target/i386/pr113689-3.c: Likewise.
|
|
Adds missing bti instruction at the beginning of a virtual
thunk, when bti is enabled.
gcc/ChangeLog:
* config/arm/arm.cc (arm_output_mi_thunk): Emit
insn for bti_c when bti is enabled.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp: Add v8_1_m_main_pacbti.
* g++.target/arm/bti_thunk.C: New test.
|
|
I was too sleepy writing this :(.
gcc/ChangeLog:
* config/mips/mips-msa.md (neg<mode:MSA>2): Add missing mode for
neg.
|
|
We expanded (neg x) to (minus const0 x) for MSA FP vectors, this is
wrong because -0.0 is not 0 - 0.0. This causes some Python tests to
fail when Python is built with MSA enabled.
Use the bnegi.df instructions to simply reverse the sign bit instead.
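The difference is easy to observe in scalar C on any IEEE 754 host with default rounding and without -ffast-math style options. The helper names below are hypothetical.

```c
#include <math.h>

/* Negation expressed as a subtraction from zero.  */
static double
neg_by_subtraction (double x)
{
  return 0.0 - x;
}

/* True negation, which only changes the sign.  */
static double
neg_by_sign_flip (double x)
{
  return -x;
}

/* 1 if the sign bit of X is set, including for -0.0.  */
static int
is_sign_bit_set (double x)
{
  return signbit (x) != 0;
}
```

For x = 0.0 the subtraction yields +0.0 while the negation yields -0.0, so the two only agree for nonzero inputs.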
gcc/ChangeLog:
* config/mips/mips-msa.md (elmsgnbit): New define_mode_attr.
(neg<mode>2): Change the mode iterator from MSA to IMSA because
in FP arithmetic we cannot use (0 - x) for -x.
(neg<mode>2): New define_insn to implement FP vector negation,
using a bnegi instruction to negate the sign bit.
|
|
vzeroupper pass [PR113059]
The move of the vzeroupper pass from after reload pass to after
postreload_cse helped only partially, CSE-like passes can still invalidate
those notes (especially REG_UNUSED) if they use some earlier register
holding some value later on in the IL.
So, either we could try to move it one pass further after gcse2 and hope
no later pass invalidates the notes, or the following patch attempts to
restore the REG_DEAD/REG_UNUSED state from GCC 13 and earlier, where
the LRA or reload passes remove all REG_DEAD/REG_UNUSED notes and the notes
reappear only at the start of dse2 pass when it calls
df_note_add_problem ();
df_analyze ();
So, effectively
NEXT_PASS (pass_postreload_cse);
NEXT_PASS (pass_gcse2);
NEXT_PASS (pass_split_after_reload);
NEXT_PASS (pass_ree);
NEXT_PASS (pass_compare_elim_after_reload);
NEXT_PASS (pass_thread_prologue_and_epilogue);
passes operate without those notes in the IL.
While in GCC 14 mode switching computes the notes problem at the start of
vzeroupper, the patch below removes them at the end of the pass again, so
that the above passes continue to operate without them.
2024-02-05 Jakub Jelinek <jakub@redhat.com>
PR target/113059
* config/i386/i386-features.cc (rest_of_handle_insert_vzeroupper):
Remove REG_DEAD/REG_UNUSED notes at the end of the pass before
df_analyze call.
|
|
The following avoids re-using a register holding a pointer (and
thus might be REG_POINTER) for the result of a pointer difference
computation. That might confuse heuristics in (broken) RTL alias
analysis which relies on REG_POINTER indicating that we're
dealing with one.
This alone doesn't fix anything.
PR target/113255
* config/i386/i386-expand.cc
(expand_set_or_cpymem_prologue_epilogue_by_misaligned_moves):
Use a new pseudo for the skipped number of bytes.
|
|
gcc/ChangeLog:
* config/riscv/riscv-cores.def: Add sifive-p450, sifive-p670.
* doc/invoke.texi (RISC-V Options): Add sifive-p450,
sifive-p670.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/mcpu-sifive-p450.c: New test.
* gcc.target/riscv/mcpu-sifive-p670.c: New test.
|
|
Add sifive p400 series scheduler module. For more information
see https://www.sifive.com/cores/performance-p450-470.
gcc/ChangeLog:
* config/riscv/riscv.md: Include sifive-p400.md.
* config/riscv/sifive-p400.md: New file.
* config/riscv/riscv-cores.def (RISCV_TUNE): Add parameter.
* config/riscv/riscv-opts.h (enum riscv_microarchitecture_type):
Add sifive_p400.
* config/riscv/riscv.cc (sifive_p400_tune_info): New.
* config/riscv/riscv.h (TARGET_SFB_ALU): Update.
* doc/invoke.texi (RISC-V Options): Add sifive-p400-series
|
|
gcc/ChangeLog:
* config/xtensa/xtensa.md (*eqne_zero_masked_bits):
Add missing ":SI" to the match_operator.
|
|
After LRA transition, HImode constants that don't fit into signed 12 bits
are no longer subject to constant synthesis:
/* example */
void test(void) {
short foo = 32767;
__asm__ ("" :: "r"(foo));
}
;; before
.literal_position
.literal .LC0, 32767
test:
l32r a9, .LC0
ret.n
This patch fixes that:
;; after
test:
movi.n a9, -1
extui a9, a9, 17, 15
ret.n
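The two-instruction sequence can be checked by hand: movi.n a9, -1 sets all 32 bits, and extui a9, a9, 17, 15 shifts right by 17 and keeps the low 15 bits. The sketch below models extui in C under that reading of its operands; sketch_extui is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Model of an unsigned bit-field extract: shift VALUE right by SHIFT
   and keep the low WIDTH bits.  WIDTH is limited to 1..16 here.  */
static uint32_t
sketch_extui (uint32_t value, unsigned shift, unsigned width)
{
  assert (shift < 32 && width >= 1 && width <= 16);
  return (value >> shift) & ((UINT32_C (1) << width) - 1);
}
```

sketch_extui (0xFFFFFFFF, 17, 15) evaluates to 0x7FFF, i.e. 32767, matching the constant in the example without a literal pool entry.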
gcc/ChangeLog:
* config/xtensa/xtensa.md (SHI): New mode iterator.
(2 split patterns related to constsynth):
Change to also accept HImode operands.
|
|
This patch adjusts the costs so that we treat REG and SUBREG expressions the
same for costing.
This was motivated by bt_skip_func and bt_find_func in xz and results in nearly
a 5% improvement in the dynamic instruction count for input #2 and smaller, but
definitely visible improvements pretty much across the board. Exceptions would
be perlbench input #1 and exchange2 which showed very small regressions.
In the bt_find_func and bt_skip_func cases we have something like this:
> (insn 10 7 11 2 (set (reg/v:DI 136 [ x ])
> (zero_extend:DI (subreg/s/u:SI (reg/v:DI 137 [ a ]) 0))) "zz.c":6:21 387 {*zero_extendsidi2_bitmanip}
> (nil))
> (insn 11 10 12 2 (set (reg:DI 142 [ _1 ])
> (plus:DI (reg/v:DI 136 [ x ])
> (reg/v:DI 139 [ b ]))) "zz.c":7:23 5 {adddi3}
> (nil))
[ ... ]
> (insn 13 12 14 2 (set (reg:DI 143 [ _2 ])
> (plus:DI (reg/v:DI 136 [ x ])
> (reg/v:DI 141 [ c ]))) "zz.c":8:23 5 {adddi3}
> (nil))
Note the two uses of (reg 136). The best way to handle that in combine might be
a 3->2 split. But there's a much better approach if we look at fwprop...
(set (reg:DI 142 [ _1 ])
(plus:DI (zero_extend:DI (subreg/s/u:SI (reg/v:DI 137 [ a ]) 0))
(reg/v:DI 139 [ b ])))
change not profitable (cost 4 -> cost 8)
So that should be the same cost as a regular DImode addition when the ZBA
extension is enabled. But it ends up costing more because the clause to cost
this variant isn't prepared to handle a SUBREG. That results in the RTL above
having too high a cost and fwprop gives up.
One approach would be to replace the REG_P with REG_P || SUBREG_P in the
costing code. I ultimately decided against that and instead check if the
operand in question passes register_operand.
By far the most important case to handle is the DImode PLUS. But for the sake
of consistency, I changed the other instances in riscv_rtx_costs as well. For
those other cases we're talking about improvements in the .000001% range.
While we are into stage4, this just hits cost modeling which we've generally
agreed is still appropriate (though we were mostly talking about vector). So
I'm going to extend that general agreement ever so slightly and include scalar
cost modeling :-)
gcc/
* config/riscv/riscv.cc (riscv_rtx_costs): Handle SUBREG and REG
similarly.
gcc/testsuite/
* gcc.target/riscv/reg_subreg_costs.c: New test.
Co-authored-by: Jivan Hakobyan <jivanhakobyan9@gmail.com>
|
|
We expanded (neg x) to (minus const0 x) for LSX FP vectors, this is
wrong because -0.0 is not 0 - 0.0. This causes some Python tests to
fail when Python is built with LSX enabled.
Use the vbitrevi.{d/w} instructions to simply reverse the sign bit
instead. We are already doing this for LASX and now we can unify them
into simd.md.
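What reversing the sign bit does to one single-precision lane can be sketched in scalar C. This assumes the usual 32-bit IEEE 754 float layout with the sign in bit 31; float_bits and flip_sign_bit are hypothetical helpers and say nothing about how the vector instruction is emitted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the raw bit pattern of X.  */
static uint32_t
float_bits (float x)
{
  uint32_t bits;
  assert (sizeof bits == sizeof x);
  memcpy (&bits, &x, sizeof bits);
  return bits;
}

/* Negate X by toggling only bit 31, the sign bit.  */
static float
flip_sign_bit (float x)
{
  uint32_t bits = float_bits (x) ^ (UINT32_C (1) << 31);
  memcpy (&x, &bits, sizeof x);
  return x;
}
```

flip_sign_bit (0.0f) has the bit pattern 0x80000000 (that is, -0.0f), whereas 0.0f - 0.0f has the pattern 0.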
gcc/ChangeLog:
* config/loongarch/lsx.md (neg<mode:FLSX>2): Remove the
incorrect expand.
* config/loongarch/simd.md (simdfmt_as_i): New define_mode_attr.
(elmsgnbit): Likewise.
(neg<mode:FVEC>2): New define_insn.
* config/loongarch/lasx.md (negv4df2, negv8sf2): Remove as they
are now instantiated in simd.md.
|
|
We call loongarch_symbol_insns with mode = MAX_MACHINE_MODE sometimes.
But in loongarch_symbol_insns:
if (LSX_SUPPORTED_MODE_P (mode) || LASX_SUPPORTED_MODE_P (mode))
return 0;
And LSX_SUPPORTED_MODE_P is defined as:
#define LSX_SUPPORTED_MODE_P(MODE) \
(ISA_HAS_LSX \
&& GET_MODE_SIZE (MODE) == UNITS_PER_LSX_REG ... ...
GET_MODE_SIZE is expanded to a call to mode_to_bytes, which is defined:
ALWAYS_INLINE poly_uint16
mode_to_bytes (machine_mode mode)
{
#if GCC_VERSION >= 4001
return (__builtin_constant_p (mode)
? mode_size_inline (mode) : mode_size[mode]);
#else
return mode_size[mode];
#endif
}
There is an assertion in mode_size_inline:
gcc_assert (mode >= 0 && mode < NUM_MACHINE_MODES);
Note that NUM_MACHINE_MODES = MAX_MACHINE_MODE (emitted by genmodes.cc),
thus if __builtin_constant_p (mode) is evaluated true (it happens when
GCC is bootstrapped with LTO+PGO), the assertion will be triggered and
cause an ICE. OTOH if __builtin_constant_p (mode) is evaluated false,
mode_size[mode] is still an out-of-bound array access (the length of the
mode_size array is NUM_MACHINE_MODES).
So we shouldn't call LSX_SUPPORTED_MODE_P or LASX_SUPPORTED_MODE_P with
MAX_MACHINE_MODE in loongarch_symbol_insns. This is very similar to a
MIPS bug PR98491 fixed by me about 3 years ago.
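The shape of the guard, reduced to a toy model: an enum whose sentinel equals the number of modes, a size table indexed by mode, and a predicate that checks for the sentinel before any size lookup. All SKETCH_ and sketch_ names and the table contents are made up; this is not the code in loongarch.cc.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy mode enum; the sentinel doubles as the number of modes.  */
enum sketch_mode
{
  SKETCH_QImode,
  SKETCH_SImode,
  SKETCH_V4SImode,
  SKETCH_MAX_MACHINE_MODE,
  SKETCH_NUM_MACHINE_MODES = SKETCH_MAX_MACHINE_MODE
};

/* Size in bytes of each real mode; the sentinel has no entry.  */
static const unsigned char sketch_mode_size[SKETCH_NUM_MACHINE_MODES]
  = { 1, 4, 16 };

/* Size lookup that, like mode_size_inline, rejects the sentinel.  */
static unsigned
sketch_mode_to_bytes (enum sketch_mode mode)
{
  assert ((int) mode >= 0 && (int) mode < SKETCH_NUM_MACHINE_MODES);
  return sketch_mode_size[mode];
}

/* Test for the sentinel first so the size lookup is never reached.  */
static bool
sketch_is_vector_mode (enum sketch_mode mode)
{
  if (mode == SKETCH_MAX_MACHINE_MODE)
    return false;
  return sketch_mode_to_bytes (mode) == 16;
}
```

Without the early return, passing the sentinel would either trip the assertion or read one element past the end of the table.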
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_symbol_insns): Do not
use LSX_SUPPORTED_MODE_P or LASX_SUPPORTED_MODE_P if mode is
MAX_MACHINE_MODE.
|
|
This FAIL was introduced by r14-6908. The reason is that when merging
constant vector permutation implementations, the 128-bit matching situation
was not fully considered. In fact, the expansion of 128-bit vectors after
merging only supports value-based 4 elements set shuffle, so this patch
provides a complete implementation of the entire 128-bit vector constant
permutation and also makes some structural adjustments to the code.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_expand_vselect): Adjust.
(loongarch_expand_vselect_vconcat): Ditto.
(loongarch_try_expand_lsx_vshuf_const): New, use vshuf to implement
all 128-bit constant permutation situations.
(loongarch_expand_lsx_shuffle): Adjust and rename function name.
(loongarch_is_imm_set_shuffle): Renamed function name.
(loongarch_expand_vec_perm_even_odd): Function forward declaration.
(loongarch_expand_vec_perm_even_odd_1): Add implementation for 128-bit
extract-even and extract-odd permutations.
(loongarch_is_odd_extraction): Delete.
(loongarch_is_even_extraction): Ditto.
(loongarch_expand_vec_perm_const): Adjust.
|
|
When bootstrapping GCC 14 with --with-build-config=bootstrap-lto, an ODR
violation is detected:
../../gcc/config/loongarch/loongarch-opts.cc:57: warning:
'abi_minimal_isa' violates the C++ One Definition Rule [-Wodr]
57 | abi_minimal_isa[N_ABI_BASE_TYPES][N_ABI_EXT_TYPES];
../../gcc/config/loongarch/loongarch-def.cc:186: note:
'abi_minimal_isa' was previously declared here
186 | abi_minimal_isa = array<array<loongarch_isa, N_ABI_EXT_TYPES>,
../../gcc/config/loongarch/loongarch-def.cc:186: note:
code may be misoptimized unless '-fno-strict-aliasing' is used
Fix it by adding a proper declaration of abi_minimal_isa to
loongarch-def.h and removing the ODR-violating local declaration in
loongarch-opts.cc.
gcc/ChangeLog:
* config/loongarch/loongarch-def.h (abi_minimal_isa): Declare.
* config/loongarch/loongarch-opts.cc (abi_minimal_isa): Remove
the ODR-violating local declaration.
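The shape of the fix can be sketched in a single file as follows. This is only an illustration under assumptions: the names minimal_isa, isa, N_BASE, N_EXT and base_of are hypothetical, and the three sections stand in for a shared header, the defining source file and a using source file.

```cpp
#include <array>

// --- would live in the shared header (the role loongarch-def.h plays) ---
constexpr int N_BASE = 2;   // hypothetical table dimensions
constexpr int N_EXT = 1;
struct isa { int base; };   // hypothetical element type
// Single authoritative declaration, seen by every file that uses the table.
extern const std::array<std::array<isa, N_EXT>, N_BASE> minimal_isa;

// --- would live in the defining source file ---
const std::array<std::array<isa, N_EXT>, N_BASE> minimal_isa = {{
  {{ {1} }},
  {{ {2} }},
}};

// --- would live in a using source file ---
// It relies on the header declaration instead of a hand-written local one.
// A local declaration with a different type, such as a built-in
// two-dimensional array, is the kind of mismatch that -Wodr reports.
int base_of(int abi) { return minimal_isa[abi][0].base; }
```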
|
|
This change implements __builtin_get_fpsr() and __builtin_set_fpsr(x)
to get and set the floating-point status register. They are used to
implement pa_atomic_assign_expand_fenv().
2024-02-02 John David Anglin <danglin@gcc.gnu.org>
gcc/ChangeLog:
PR target/59778
* config/pa/pa.cc (enum pa_builtins): Add PA_BUILTIN_GET_FPSR
and PA_BUILTIN_SET_FPSR builtins.
* (pa_builtins_icode): Declare.
* (def_builtin, pa_fpu_init_builtins): New.
* (pa_init_builtins): Initialize FPU builtins.
* (pa_builtin_decl, pa_expand_builtin_1): New.
* (pa_expand_builtin): Handle PA_BUILTIN_GET_FPSR and
PA_BUILTIN_SET_FPSR builtins.
* (pa_atomic_assign_expand_fenv): New.
* config/pa/pa.md (UNSPECV_GET_FPSR, UNSPECV_SET_FPSR): New
UNSPECV constants.
(get_fpsr, put_fpsr): New expanders.
(get_fpsr_32, get_fpsr_64, set_fpsr_32, set_fpsr_64): New
insn patterns.
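For context, the atomic_assign_expand_fenv target hook supplies hold, clear and update sequences that correspond to the standard feholdexcept, feclearexcept and feupdateenv functions. The sketch below shows that protocol with the portable <cfenv> interface instead of the PA builtins; the function name and the simulated "attempts" are assumptions made for illustration only.

```cpp
#include <cfenv>

// Simulates the exception-flag handling around an atomic floating-point
// compound assignment whose first attempt fails and whose second succeeds.
// Returns the exception flags visible afterwards.
int compound_assign_flags() {
  std::feclearexcept(FE_ALL_EXCEPT);
  std::feraiseexcept(FE_DIVBYZERO);   // a flag that was already raised

  std::fenv_t saved;
  std::feholdexcept(&saved);          // hold: save environment, clear flags

  std::feraiseexcept(FE_INVALID);     // flag from the failed attempt
  std::feclearexcept(FE_ALL_EXCEPT);  // clear: discard it before retrying

  std::feraiseexcept(FE_INEXACT);     // flag from the successful attempt
  std::feupdateenv(&saved);           // update: restore, then re-raise

  return std::fetestexcept(FE_ALL_EXCEPT);
}
```

After the call, the pre-existing flag and the flag from the successful attempt are set, while the flag from the discarded attempt is not.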
|
|
This patch fixes the following:
vsetvli a5,a1,e32,m1,tu,ma
slli a4,a5,2
sub a1,a1,a5
vle32.v v2,0(a0)
add a0,a0,a4
vadd.vv v1,v2,v1
bne a1,zero,.L3
vsetivli zero,1,e32,m1,ta,ma
vmv.s.x v2,zero
vsetvli a5,zero,e32,m1,ta,ma ---> Redundant vsetvl.
vredsum.vs v1,v1,v2
vmv.x.s a0,v1
ret
The VSETVL pass is able to fuse the AVL = 1 of the scalar move with the VLMAX AVL of the reduction.
However, the following RTL blocks the fusion during dependence analysis in the VSETVL pass:
(insn 49 24 50 5 (set (reg:RVVM1SI 98 v2 [148])
(if_then_else:RVVM1SI (unspec:RVVMF32BI [
(const_vector:RVVMF32BI [
(const_int 1 [0x1])
repeat [
(const_int 0 [0])
]
])
(const_int 1 [0x1])
(const_int 2 [0x2]) repeated x2
(const_int 0 [0])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(const_vector:RVVM1SI repeat [
(const_int 0 [0])
])
(unspec:RVVM1SI [
(reg:DI 0 zero)
] UNSPEC_VUNDEF))) 3813 {*pred_broadcastrvvm1si_zero}
(nil))
(insn 50 49 51 5 (set (reg:DI 15 a5 [151]) ----> It sets a5, which blocks fusing the following VLMAX into the scalar move above.
(unspec:DI [
(const_int 32 [0x20])
] UNSPEC_VLMAX)) 2566 {vlmax_avldi}
(expr_list:REG_EQUIV (unspec:DI [
(const_int 32 [0x20])
] UNSPEC_VLMAX)
(nil)))
(insn 51 50 52 5 (set (reg:RVVM1SI 97 v1 [150])
(unspec:RVVM1SI [
(unspec:RVVMF32BI [
(const_vector:RVVMF32BI repeat [
(const_int 1 [0x1])
])
(reg:DI 15 a5 [151])
(const_int 2 [0x2])
(const_int 1 [0x1])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(unspec:RVVM1SI [
(reg:RVVM1SI 97 v1 [orig:134 vect_result_14.6 ] [134])
(reg:RVVM1SI 98 v2 [148])
] UNSPEC_REDUC_SUM)
(unspec:RVVM1SI [
(reg:DI 0 zero)
] UNSPEC_VUNDEF)
] UNSPEC_REDUC)) 17541 {pred_redsumrvvm1si}
(expr_list:REG_DEAD (reg:RVVM1SI 98 v2 [148])
(expr_list:REG_DEAD (reg:SI 66 vl)
(expr_list:REG_DEAD (reg:DI 15 a5 [151])
(expr_list:REG_DEAD (reg:DI 0 zero)
(nil))))))
This situation can only happen with auto-vectorization, never with intrinsic code.
Since the reduction is passed a VLMAX AVL, it is more natural to also pass VLMAX to the scalar move that initializes the value of the reduction.
After this patch:
vsetvli a5,a1,e32,m1,tu,ma
slli a4,a5,2
sub a1,a1,a5
vle32.v v2,0(a0)
add a0,a0,a4
vadd.vv v1,v2,v1
bne a1,zero,.L3
vsetvli a5,zero,e32,m1,ta,ma
vmv.s.x v2,zero
vredsum.vs v1,v1,v2
vmv.x.s a0,v1
ret
Tested on both RV32 and RV64 with no regressions.
PR target/113697
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_reduction): Pass VLMAX avl to scalar move.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr113697.c: New test.
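For orientation, a loop of roughly the following shape is the kind of source that auto-vectorizes into a vle32.v/vadd.vv body followed by a vredsum.vs epilogue. This is an assumed example; the contents of pr113697.c are not shown here, and the function name is hypothetical.

```cpp
// Hypothetical 32-bit integer sum reduction. When auto-vectorized for RVV,
// the loop body accumulates partial sums in a vector register and the
// epilogue reduces them to a scalar, starting from a zero-initialized
// scalar move.
int sum(const int *a, int n) {
  int result = 0;
  for (int i = 0; i < n; ++i)
    result += a[i];
  return result;
}
```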
|
|
This reverts commit 74489c19070703361acc20bc172f304cae845a96.
|
|
Noticed in a recent benchmark evaluation (coremark-pro zip-test):
vid.v v2
vmv.v.i v5,0
.L9:
vle16.v v3,0(a4)
vrsub.vx v4,v2,a6 ---> LICM failed to hoist it outside the loop.
The root cause is:
(insn 56 47 57 4 (set (subreg:DI (reg:HI 220) 0)
(reg:DI 223)) "rvv.c":11:9 208 {*movdi_64bit} -> Its result is used by the following vrsub.vx, which suppresses the hoisting of the vrsub.vx.
(nil))
(insn 57 56 59 4 (set (reg:RVVMF2HI 216)
(if_then_else:RVVMF2HI (unspec:RVVMF32BI [
(const_vector:RVVMF32BI repeat [
(const_int 1 [0x1])
])
(reg:DI 350)
(const_int 2 [0x2]) repeated x2
(const_int 1 [0x1])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(minus:RVVMF2HI (vec_duplicate:RVVMF2HI (reg:HI 220))
(reg:RVVMF2HI 217))
(unspec:RVVMF2HI [
(reg:DI 0 zero)
] UNSPEC_VUNDEF))) "rvv.c":11:9 6938 {pred_subrvvmf2hi_reverse_scalar}
(expr_list:REG_DEAD (reg:HI 220)
(nil)))
This patch fixes it by generating (set (reg:HI) (subreg:HI (reg:DI))) instead of (set (subreg:DI (reg:HI)) (reg:DI)).
After this patch:
vid.v v2
vrsub.vx v2,v2,a7
vmv.v.i v4,0
.L3:
vle16.v v3,0(a4)
Tested on both RV32 and RV64 with no regressions.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_legitimize_move): Fix poly_int dest generation.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/poly_licm-1.c: New test.
* gcc.target/riscv/rvv/autovec/poly_licm-2.c: New test.
|
|
This patch cleans up some comments that are out of date or incorrect.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_get_arg_info): Cleanup comments.
(riscv_pass_by_reference): Ditto.
(riscv_fntype_abi): Ditto.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
I noticed an RTL regression between GCC-13 and GCC-14.
https://godbolt.org/z/Ga7K6MqaT
GCC-14:
(insn 9 13 31 2 (set (reg:DI 15 a5 [138])
(unspec:DI [
(const_int 64 [0x40])
] UNSPEC_VLMAX)) "/app/example.c":5:15 2566 {vlmax_avldi}
(expr_list:REG_EQUIV (unspec:DI [
(const_int 64 [0x40])
] UNSPEC_VLMAX)
(nil)))
(insn 31 9 10 2 (parallel [
(set (reg:DI 15 a5 [138])
(unspec:DI [
(reg:DI 0 zero)
(const_int 32 [0x20])
(const_int 7 [0x7])
(const_int 1 [0x1]) repeated x2
] UNSPEC_VSETVL))
(set (reg:SI 66 vl)
(unspec:SI [
(reg:DI 0 zero)
(const_int 32 [0x20])
(const_int 7 [0x7])
] UNSPEC_VSETVL))
(set (reg:SI 67 vtype)
(unspec:SI [
(const_int 32 [0x20])
(const_int 7 [0x7])
(const_int 1 [0x1]) repeated x2
] UNSPEC_VSETVL))
]) "/app/example.c":5:15 3281 {vsetvldi}
(nil))
GCC-13:
(insn 10 7 26 2 (set (reg/f:DI 11 a1 [139])
(plus:DI (reg:DI 11 a1 [142])
(const_int 800 [0x320]))) "/app/example.c":6:32 5 {adddi3}
(nil))
(insn 26 10 9 2 (parallel [
(set (reg:DI 15 a5)
(unspec:DI [
(reg:DI 0 zero)
(const_int 32 [0x20])
(const_int 7 [0x7])
(const_int 1 [0x1]) repeated x2
] UNSPEC_VSETVL))
(set (reg:SI 66 vl)
(unspec:SI [
(reg:DI 0 zero)
(const_int 32 [0x20])
(const_int 7 [0x7])
] UNSPEC_VSETVL))
(set (reg:SI 67 vtype)
(unspec:SI [
(const_int 32 [0x20])
(const_int 7 [0x7])
(const_int 1 [0x1]) repeated x2
] UNSPEC_VSETVL))
]) "/app/example.c":5:15 792 {vsetvldi}
(nil))
GCC-13 doesn't have:
(insn 9 13 31 2 (set (reg:DI 15 a5 [138])
(unspec:DI [
(const_int 64 [0x40])
] UNSPEC_VLMAX)) "/app/example.c":5:15 2566 {vlmax_avldi}
(expr_list:REG_EQUIV (unspec:DI [
(const_int 64 [0x40])
] UNSPEC_VLMAX)
(nil)))
vsetvl_pre doesn't emit any assembly; it is only used to occupy a scalar register.
It should be removed in the VSETVL pass.
Tested on both RV32 and RV64 with no regressions.
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (vsetvl_pre_insn_p): New function.
(pre_vsetvl::cleaup): Remove vsetvl_pre.
(pre_vsetvl::remove_vsetvl_pre_insns): New function.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/vsetvl/vsetvl_pre-1.c: New test.
|
|
gcc/ChangeLog:
* config/loongarch/larchintrin.h
(__frecipe_s): Update function return type.
(__frecipe_d): Ditto.
(__frsqrte_s): Ditto.
(__frsqrte_d): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/larch-frecipe-intrinsic.c: New test.
|