Age | Commit message (Collapse) | Author | Files | Lines |
|
We found that when only 128-bit vectorization was enabled, 549.fotonik3d_r
failed to vectorize effectively. For this reason, we adjust the cost of
128-bit vector_stmt that match the multiply-add pattern to facilitate 128-bit
vectorization.
The experimental results show that after the modification, 549.fotonik3d_r
performance can be improved by 9.77% under the 128-bit vectorization option.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_multiply_add_p): New.
(loongarch_vector_costs::add_stmt_cost): Adjust.
gcc/testsuite/ChangeLog:
* gfortran.dg/vect/vect-10.f90: New test.
|
|
model.
The ABI mandates the pcalau12i/addi.d/lu32i.d/lu52i.d instructions for
addressing a symbol to be adjacent. So model them as "one large
instruction", i.e. define_insn, with two output registers. The real
address is the sum of these two registers.
The advantage of this approach is the RTL passes can still use ldx/stx
instructions to skip an addi.d instruction.
gcc/ChangeLog:
* config/loongarch/loongarch.md (unspec): Add
UNSPEC_LA_PCREL_64_PART1 and UNSPEC_LA_PCREL_64_PART2.
(la_pcrel64_two_parts): New define_insn.
* config/loongarch/loongarch.cc (loongarch_tls_symbol): Fix a
typo in the comment.
(loongarch_call_tls_get_addr): If -mcmodel=extreme
-mexplicit-relocs={always,auto}, use la_pcrel64_two_parts for
addressing the TLS symbol and __tls_get_addr. Emit an REG_EQUAL
note to allow CSE addressing __tls_get_addr.
(loongarch_legitimize_tls_address): If -mcmodel=extreme
-mexplicit-relocs={always,auto}, address TLS IE symbols with
la_pcrel64_two_parts.
(loongarch_split_symbol): If -mcmodel=extreme
-mexplicit-relocs={always,auto}, address symbols with
la_pcrel64_two_parts.
(loongarch_output_mi_thunk): Clean up unreachable code. If
-mcmodel=extreme -mexplicit-relocs={always,auto}, address the MI
thunks with la_pcrel64_two_parts.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/func-call-extreme-1.c (dg-options):
Use -O2 instead of -O0 to ensure the pcalau12i/addi/lu32i/lu52i
instruction sequences are not reordered by the compiler.
(NOIPA): Disallow interprocedural optimizations.
* gcc.target/loongarch/func-call-extreme-2.c: Remove the content
duplicated from func-call-extreme-1.c, include it instead.
(dg-options): Likewise.
* gcc.target/loongarch/func-call-extreme-3.c (dg-options):
Likewise.
* gcc.target/loongarch/func-call-extreme-4.c (dg-options):
Likewise.
* gcc.target/loongarch/cmodel-extreme-1.c: New test.
* gcc.target/loongarch/cmodel-extreme-2.c: New test.
* g++.target/loongarch/cmodel-extreme-mi-thunk-1.C: New test.
* g++.target/loongarch/cmodel-extreme-mi-thunk-2.C: New test.
* g++.target/loongarch/cmodel-extreme-mi-thunk-3.C: New test.
|
|
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_call_tls_get_addr):
Add support for call36.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/explicit-relocs-medium-call36-auto-tls-ld-gd.c: New test.
|
|
-mexplicit-relocs=auto.
Binutils does not support relaxation using four instructions to obtain
symbol addresses
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_explicit_relocs_p):
When the code model of the symbol is extreme and -mexplicit-relocs=auto,
the macro instruction loading symbol address is not applicable.
(loongarch_call_tls_get_addr): Adjust code.
(loongarch_legitimize_tls_address): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c: New test.
* gcc.target/loongarch/explicit-relocs-medium-auto-tls-ld-gd.c: New test.
|
|
gcc/ChangeLog:
* config/loongarch/loongarch-protos.h (loongarch_symbol_extreme_p):
Add function declaration.
* config/loongarch/loongarch.cc (loongarch_symbolic_constant_p):
For SYMBOL_PCREL64, non-zero addend of "la.local $rd,$rt,sym+addend"
is not allowed
(loongarch_load_tls): Added macro support in extreme mode.
(loongarch_call_tls_get_addr): Likewise.
(loongarch_legitimize_tls_address): Likewise.
(loongarch_force_address): Likewise.
(loongarch_legitimize_move): Likewise.
(loongarch_output_mi_thunk): Likewise.
(loongarch_option_override_internal): Remove the code that detects
explicit relocs status.
(loongarch_handle_model_attribute): Likewise.
* config/loongarch/loongarch.md (movdi_symbolic_off64): New template.
* config/loongarch/predicates.md (symbolic_off64_operand): New predicate.
(symbolic_off64_or_reg_operand): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/attr-model-5.c: New test.
* gcc.target/loongarch/func-call-extreme-5.c: New test.
* gcc.target/loongarch/func-call-extreme-6.c: New test.
* gcc.target/loongarch/tls-extreme-macro.c: New test.
|
|
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_load_tls):
Load all types of tls symbols through one function.
(loongarch_got_load_tls_gd): Delete.
(loongarch_got_load_tls_ld): Delete.
(loongarch_got_load_tls_ie): Delete.
(loongarch_got_load_tls_le): Delete.
(loongarch_call_tls_get_addr): Modify the called function name.
(loongarch_legitimize_tls_address): Likewise.
* config/loongarch/loongarch.md (@got_load_tls_gd<mode>): Delete.
(@load_tls<mode>): New template.
(@got_load_tls_ld<mode>): Delete.
(@got_load_tls_le<mode>): Delete.
(@got_load_tls_ie<mode>): Delete.
|
|
values through fp.
Modify address calculation logic from (((a x C) + fp) + offset) to ((fp + offset) + a x C).
Thereby modifying the register dependencies and optimizing the code.
The value of C is 2 4 or 8.
The following is the assembly code before and after a loop modification in spec2006 401.bzip:
old | new
735 .L71: | 735 .L71:
736 slli.d $r12,$r15,2 | 736 slli.d $r12,$r15,2
737 ldx.w $r13,$r22,$r12 | 737 ldx.w $r13,$r22,$r12
738 addi.d $r15,$r15,-1 | 738 addi.d $r15,$r15,-1
739 slli.w $r16,$r15,0 | 739 slli.w $r16,$r15,0
740 addi.w $r13,$r13,-1 | 740 addi.w $r13,$r13,-1
741 slti $r14,$r13,0 | 741 slti $r14,$r13,0
742 add.w $r12,$r26,$r13 | 742 add.w $r12,$r26,$r13
743 maskeqz $r12,$r12,$r14 | 743 maskeqz $r12,$r12,$r14
744 masknez $r14,$r13,$r14 | 744 masknez $r14,$r13,$r14
745 or $r12,$r12,$r14 | 745 or $r12,$r12,$r14
746 ldx.bu $r14,$r30,$r12 | 746 ldx.bu $r14,$r30,$r12
747 lu12i.w $r13,4096>>12 | 747 alsl.d $r14,$r14,$r18,2
748 ori $r13,$r13,432 | 748 ldptr.w $r13,$r14,0
749 add.d $r13,$r13,$r3 | 749 addi.w $r17,$r13,-1
750 alsl.d $r14,$r14,$r13,2 | 750 stptr.w $r17,$r14,0
751 ldptr.w $r13,$r14,-1968 | 751 slli.d $r13,$r13,2
752 addi.w $r17,$r13,-1 | 752 stx.w $r12,$r22,$r13
753 st.w $r17,$r14,-1968 | 753 ldptr.w $r12,$r19,0
754 slli.d $r13,$r13,2 | 754 blt $r12,$r16,.L71
755 stx.w $r12,$r22,$r13 | 755 .align 4
756 ldptr.w $r12,$r18,-2048 | 756
757 blt $r12,$r16,.L71 | 757
758 .align 4 | 758
This patch is ported from riscv's commit r14-3111.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (mem_shadd_or_shadd_rtx_p): New function.
(loongarch_legitimize_address): Add logical transformation code.
|
|
The fix for PR70321 introduced a splitter that split a doubleword
comparison into a pair of XORs followed by an IOR to set the (zero)
flags register. To help the reload, splitter forced SUBREG pieces of
double-word input values to a pseudo, but this regressed
gcc.target/i386/pr82580.c:
int f0 (U x, U y) { return x == y; }
from:
xorq %rdx, %rdi
xorq %rcx, %rsi
xorl %eax, %eax
orq %rsi, %rdi
sete %al
ret
to:
xchgq %rdi, %rsi
movq %rdx, %r8
movq %rcx, %rax
movq %rsi, %rdx
movq %rdi, %rcx
xorq %rax, %rcx
xorq %r8, %rdx
xorl %eax, %eax
orq %rcx, %rdx
sete %al
ret
To mitigate the regression, remove this legacy heuristic (workaround?).
There have been many incremental changes and improvements to x86 TImode
and register allocation, so this legacy workaround is not only no longer
useful, but it actually hurts register allocation. The patched compiler
now produces:
xchgq %rdi, %rsi
xorl %eax, %eax
xorq %rsi, %rdx
xorq %rdi, %rcx
orq %rcx, %rdx
sete %al
ret
PR target/113701
gcc/ChangeLog:
* config/i386/i386.md (*cmp<dwi>_doubleword):
Do not force SUBREG pieces to pseudos.
|
|
The first alternative stores the floating-point status register
in the destination. It should store zero. We need to copy %fr0
to another floating-point register to initialize it to zero.
2024-02-01 John David Anglin <danglin@gcc.gnu.org>
gcc/ChangeLog:
* config/pa/pa.md (atomic_storedi_1): Fix bug in
alternative 1.
|
|
gcc/
* config/avr/avr.cc: Tabify.
|
|
Also add 'STATIC_ASSERT's for number of SGPR/VGPR/AVGPR registers (in
'#ifndef USED_FOR_TARGET', as otherwise 'STATIC_ASSERT' isn't available).
gcc/
* config/gcn/gcn.cc (gcn_hsa_declare_function_name): Don't
hard-code number of SGPR/VGPR/AVGPR registers.
* config/gcn/gcn.h: Add a 'STATIC_ASSERT's for number of
SGPR/VGPR/AVGPR registers.
|
|
Add sifive p600 series scheduler module. For more information
see https://www.sifive.com/cores/performance-p650-670.
Add sifive-p650, sifive-p670 for mcpu option will come in separate patches.
gcc/ChangeLog:
* config/riscv/riscv.md: Add "fcvt_i2f", "fcvt_f2i" type
attribute, and include sifive-p600.md.
* config/riscv/generic-ooo.md: Update type attribute.
* config/riscv/generic.md: Update type attribute.
* config/riscv/sifive-7.md: Update type attribute.
* config/riscv/sifive-p600.md: New file.
* config/riscv/riscv-cores.def (RISCV_TUNE): Add parameter.
* config/riscv/riscv-opts.h (enum riscv_microarchitecture_type):
Add sifive_p600.
* config/riscv/riscv.cc (sifive_p600_tune_info): New.
* config/riscv/riscv.h (TARGET_SFB_ALU): Update.
* doc/invoke.texi (RISC-V Options): Add sifive-p600-series
|
|
The RISC-V Profiles specification here:
https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc#7-new-isa-extensions
These extensions don't add any new features but
describe existing features. So this patch only adds parsing.
Za64rs: Reservation set size of 64 bytes
Za128rs: Reservation set size of 128 bytes
Ziccif: Main memory supports instruction fetch with atomicity requirement
Ziccrse: Main memory supports forward progress on LR/SC sequences
Ziccamoa: Main memory supports all atomics in A
Zicclsm: Main memory supports misaligned loads/stores
Zic64b: Cache block size isf 64 bytes
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc: Add Za64rs, Za128rs,
Ziccif, Ziccrse, Ziccamoa, Zicclsm, Zic64b items.
* config/riscv/riscv.opt: New macro for 7 new unprivileged
extensions.
* doc/invoke.texi (RISC-V Options): Add Za64rs, Za128rs,
Ziccif, Ziccrse, Ziccamoa, Zicclsm, Zic64b extensions.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/za-ext.c: New test.
* gcc.target/riscv/zi-ext.c: New test.
|
|
g++.dg/asan/default-options-1.C FAILs on Solaris/SPARC and x86:
FAIL: g++.dg/asan/default-options-1.C -O0 execution test
FAIL: g++.dg/asan/default-options-1.C -O1 execution test
FAIL: g++.dg/asan/default-options-1.C -O2 execution test
FAIL: g++.dg/asan/default-options-1.C -O2 -flto execution test
FAIL: g++.dg/asan/default-options-1.C -O2 -flto -flto-partition=none execution test
FAIL: g++.dg/asan/default-options-1.C -O3 -g execution test
FAIL: g++.dg/asan/default-options-1.C -Os execution test
The failure is always the same:
AddressSanitizer: CHECK failed: asan_rtl.cpp:397 "((!AsanInitIsRunning() && "ASan init calls itself!")) != (0)" (0x0, 0x0) (tid=1)
This happens because libasan makes unportable assumptions about
initialization order that don't hold on Solaris. The problem has
already been fixed in clang by
[Driver] Link shared asan runtime lib with -z now on Solaris/x86
https://reviews.llvm.org/D156325
where it was way more prevalent.
This patch applies the same fix to gcc.
Tested on i386-pc-solaris2.11 (ld and gld) and sparc-sun-solaris2.11.
2024-01-30 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
gcc:
* config/sol2.h (LIBASAN_EARLY_SPEC): Add -z now unless
-static-libasan. Add missing whitespace.
|
|
machine description
They're not used there, and we avoid potentially out-of-sync definitions.
gcc/
* config/gcn/gcn.md (FIRST_SGPR_REG, LAST_SGPR_REG)
(FIRST_VGPR_REG, LAST_VGPR_REG, FIRST_AVGPR_REG, LAST_AVGPR_REG):
Don't 'define_constants'.
|
|
..., which was always (a) unused, and (b) bogus: always-false.
gcc/
* config/gcn/gcn.h (SGPR_OR_VGPR_REGNO_P): Remove.
|
|
For OpenACC/GCN '-march=gfx1100', a lot of libgomp OpenACC test cases FAIL:
/tmp/ccGfLJ8a.mkoffload.2.s:406:2: error: instruction not supported on this GPU
ds_cmpst_rtn_b32 v0, v0, v4, v3
^
In RDNA 3, 'ds_cmpst_[...]' has been replaced by 'ds_cmpstore_[...]', and the
notes for 'ds_cmpst_[...]' in pre-RDNA 3 ISA manuals:
Caution, the order of src and cmp are the *opposite* of the BUFFER_ATOMIC_CMPSWAP opcode.
..., have been resolved for 'ds_cmpstore_[...]' in the RDNA 3 ISA manual:
In this architecture the order of src and cmp agree with the BUFFER_ATOMIC_CMPSWAP opcode.
..., and therefore '%2', '%3' now swapped with regards to GCC operand order.
Most of the affected libgomp OpenACC test cases then PASS their execution test.
gcc/
* config/gcn/gcn.md (sync_compare_and_swap<mode>_lds_insn)
[TARGET_RDNA3]: Adjust.
|
|
This reverts commit 26c34b809cd1a6249027730a8b52bbf6a1c0f4a8.
|
|
This reverts commit e56fb037d9d265682f5e7217d8a4c12a8d3fddf8.
|
|
This reverts commit 23cd2961bd2ff63583f46e3499a07bd54491d45c.
|
|
Enables assert that every typed instruction is associated with a
dfa reservation
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_sched_variable_issue): enable assert
|
|
Creates new generic vector pipeline file common to all cpu tunes.
Moves all vector related pipelines from generic-ooo to generic-vector-ooo.
Creates new vector crypto related insn reservations.
gcc/ChangeLog:
* config/riscv/generic-ooo.md (generic_ooo): Move reservation
(generic_ooo_vec_load): ditto
(generic_ooo_vec_store): ditto
(generic_ooo_vec_loadstore_seg): ditto
(generic_ooo_vec_alu): ditto
(generic_ooo_vec_fcmp): ditto
(generic_ooo_vec_imul): ditto
(generic_ooo_vec_fadd): ditto
(generic_ooo_vec_fmul): ditto
(generic_ooo_crypto): ditto
(generic_ooo_perm): ditto
(generic_ooo_vec_reduction): ditto
(generic_ooo_vec_ordered_reduction): ditto
(generic_ooo_vec_idiv): ditto
(generic_ooo_vec_float_divsqrt): ditto
(generic_ooo_vec_mask): ditto
(generic_ooo_vec_vesetvl): ditto
(generic_ooo_vec_setrm): ditto
(generic_ooo_vec_readlen): ditto
* config/riscv/riscv.md: include generic-vector-ooo
* config/riscv/generic-vector-ooo.md: New file. to here
Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
Co-authored-by: Robin Dapp <rdapp.gcc@gmail.com>
|
|
This patch adds non-vector related insn reservations and updates/creates
new insn reservations so all non-vector typed instructions have a reservation.
gcc/ChangeLog:
* config/riscv/generic-ooo.md (generic_ooo_sfb_alu): Add reservation
(generic_ooo_branch): ditto
* config/riscv/generic.md (generic_sfb_alu): ditto
(generic_fmul_half): ditto
* config/riscv/riscv.md: Remove cbo, pushpop, and rdfrm types
* config/riscv/sifive-7.md (sifive_7_hfma):Add reservation
(sifive_7_popcount): ditto
* config/riscv/vector.md: change rdfrm to fmove
* config/riscv/zc.md: change pushpop to load/store
Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
|
|
After r14-1187-gd6b756447cd58b, simplify_gen_subreg can return
NULL for "unaligned" memory subreg. Since V8DI has an alignment of 8 bytes,
using TImode causes simplify_gen_subreg to return NULL.
This fixes the issue by using DImode instead for the loop. And then we will have
later on the STP/LDP pass combine it back into STP/LDP if needed.
Since strict align is less important (usually used for firmware and early boot only),
not doing LDP/STP here is ok.
Built and tested for aarch64-linux-gnu with no regressions.
PR target/113657
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (split for movv8di):
For strict aligned mode, use DImode instead of TImode.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/acle/ls64_strict_align.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
The PR shows us ICEing due to an unrecognizable TFmode save emitted by
aarch64_process_components. The problem is that for T{I,F,D}mode we
conservatively require mems to be in range for x-register ldp/stp. That
is because (at least for TImode) it can be allocated to both GPRs and
FPRs, and in the GPR case that is an x-reg ldp/stp, and the FPR case is
a q-register load/store.
As Richard pointed out in the PR, aarch64_get_separate_components
already checks that the offsets are suitable for a single load, so we
just need to choose a mode in aarch64_reg_save_mode that gives the full
q-register range. In this patch, we choose V16QImode as an alternative
16-byte "bag-of-bits" mode that doesn't have the artificial range
restrictions imposed on T{I,F,D}mode.
For T{F,D}mode in GCC 15 I think we could consider relaxing the
restriction imposed in aarch64_classify_address, as typically T{F,D}mode
should be allocated to FPRs. But such a change seems too invasive to
consider for GCC 14 at this stage (let alone backports).
Fortunately the new flexible load/store pair patterns in GCC 14 allow
this mode change to work without further changes. The backports are
more involved as we need to adjust the load/store pair handling to cater
for V16QImode in a few places.
Note that for the testcase we are relying on the torture options to add
-funroll-loops at -O3 which is necessary to trigger the ICE on trunk
(but not on the 13 branch).
gcc/ChangeLog:
PR target/111677
* config/aarch64/aarch64.cc (aarch64_reg_save_mode): Use
V16QImode for the full 16-byte FPR saves in the vector PCS case.
gcc/testsuite/ChangeLog:
PR target/111677
* gcc.target/aarch64/torture/pr111677.c: New test.
|
|
gcc/
* config/avr/avr-mcus.def: Add AVR64DU28, AVR64DU32, ATA5787,
ATA5835, ATtiny64AUTO, ATA5700M322.
* doc/avr-mmcu.texi: Rebuild.
|
|
strub: introduce STACK_ADDRESS_OFFSET
Since STACK_POINTER_OFFSET is not necessarily at the boundary between
caller- and callee-owned stack, as desired by
__builtin_stack_address(), and using it as if it were or not causes
problems, introduce a new macro so that ports can define it suitably,
without modifying STACK_POINTER_OFFSET.
for gcc/ChangeLog
PR middle-end/112917
PR middle-end/113100
* builtins.cc (expand_builtin_stack_address): Use
STACK_ADDRESS_OFFSET.
* doc/extend.texi (__builtin_stack_address): Adjust.
* config/sparc/sparc.h (STACK_ADDRESS_OFFSET): Define.
* doc/tm.texi.in (STACK_ADDRESS_OFFSET): Document.
* doc/tm.texi: Rebuilt.
|
|
The compile time issue was discovered in SPEC 2017 wrf:
Use time and -ftime-report to analyze the profile data of SPEC 2017 wrf compilation .
Before this patch (Lazy vsetvl):
scheduling : 121.89 ( 15%) 0.53 ( 11%) 122.72 ( 15%) 13M ( 1%)
machine dep reorg : 424.61 ( 53%) 1.84 ( 37%) 427.44 ( 53%) 5290k ( 0%)
real 13m27.074s
user 13m19.539s
sys 0m5.180s
Simple vsetvl:
machine dep reorg : 0.10 ( 0%) 0.00 ( 0%) 0.11 ( 0%) 4138k ( 0%)
real 6m5.780s
user 6m2.396s
sys 0m2.373s
The machine dep reorg is the compile time of VSETVL PASS (424 seconds) which counts 53% of
the compilation time, spends much more time than scheduling.
After investigation, the critical patch of VSETVL pass is compute_lcm_local_properties which
is called every iteration of phase 2 (earliest fusion) and phase 3 (global lcm).
This patch optimized the codes of compute_lcm_local_properties to reduce the compilation time.
After this patch:
scheduling : 117.51 ( 27%) 0.21 ( 6%) 118.04 ( 27%) 13M ( 1%)
machine dep reorg : 80.13 ( 18%) 0.91 ( 26%) 81.26 ( 18%) 5290k ( 0%)
real 7m25.374s
user 7m20.116s
sys 0m3.795s
The optimization of this patch is very obvious, lazy VSETVL PASS: 424s (53%) -> 80s (18%) which
spend less time than scheduling.
Tested on both RV32 and RV64 no regression. Ok for trunk ?
PR target/113495
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (extract_single_source): Remove.
(pre_vsetvl::compute_vsetvl_def_data): Fix compile time issue.
(pre_vsetvl::compute_transparent): New function.
(pre_vsetvl::compute_lcm_local_properties): Fix compile time time issue.
|
|
Printing the raw symbol is useful in inline asm (e.g. in C++ to get the
mangled name). Similar constraints are available in other targets (e.g.
"S" for aarch64/riscv, "Cs" for m68k).
There isn't a good way for x86 yet, e.g. "i" doesn't work for
PIC/-mcmodel=large. This patch adds "Ws". Here are possible use cases:
```
namespace ns { extern int var; }
asm (".pushsection .xxx,\"aw\"; .dc.a %0; .popsection" :: "Ws"(&var));
asm (".reloc ., BFD_RELOC_NONE, %0" :: "Ws"(&var));
```
gcc/ChangeLog:
PR target/105576
* config/i386/constraints.md: Define constraint "Ws".
* doc/md.texi: Document it.
gcc/testsuite/ChangeLog:
PR target/105576
* gcc.target/i386/asm-raw-symbol.c: New testcase.
|
|
gcc/ChangeLog:
* config/xtensa/constraints.md (R, T, U):
Change define_constraint to define_memory_constraint.
* config/xtensa/predicates.md (move_operand): Don't check that a
constant pool operand size is a multiple of UNITS_PER_WORD.
* config/xtensa/xtensa.cc
(xtensa_lra_p, TARGET_LRA_P): Remove.
(xtensa_emit_move_sequence): Remove "if (reload_in_progress)"
clause as it can no longer be true.
(fixup_subreg_mem): Drop function.
(xtensa_output_integer_literal_parts): Consider 16-bit wide
constants.
(xtensa_legitimate_constant_p): Add short-circuit path for
integer load instructions. Don't check that mode size is
at least UNITS_PER_WORD.
* config/xtensa/xtensa.md (movsf): Use can_create_pseudo_p()
rather reload_in_progress and reload_completed.
(doloop_end): Drop operand 2.
(movhi_internal): Add alternative loading constant from a
literal pool.
(define_split for DI register_operand): Don't limit to
!TARGET_AUTO_LITPOOLS.
* config/xtensa/xtensa.opt (mlra): Change to no effect.
|
|
According to the issue as below.
https://hub.fgit.cf/riscv-non-isa/riscv-elf-psabi-doc/pull/416
When the mode size of vls integer mode is less than 2 * XLEN, we will
take the gpr for both the args and the return values. Instead of the
reference. For example the below code:
typedef short v8hi __attribute__ ((vector_size (16)));
v8hi __attribute__((noinline))
add (v8hi a, v8hi b)
{
v8hi r = a + b;
return r;
}
Before this patch:
add:
vsetivli zero,8,e16,m1,ta,ma
vle16.v v1,0(a1) <== arg by reference
vle16.v v2,0(a2) <== arg by reference
vadd.vv v1,v1,v2
vse16.v v1,0(a0) <== return by reference
ret
After this patch:
add:
addi sp,sp,-32
sd a0,0(sp) <== arg by register a0 - a3
sd a1,8(sp)
sd a2,16(sp)
sd a3,24(sp)
addi a5,sp,16
vsetivli zero,8,e16,m1,ta,ma
vle16.v v2,0(sp)
vle16.v v1,0(a5)
vadd.vv v1,v1,v2
vse16.v v1,0(sp)
ld a0,0(sp) <== return by a0 - a1.
ld a1,8(sp)
addi sp,sp,32
jr ra
For vls floating point, we take the same rules as integer and passed by
the gpr or reference.
The riscv regression passed for this patch.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_v_vls_mode_aggregate_gpr_count): New function to
calculate the gpr count required by vls mode.
(riscv_v_vls_to_gpr_mode): New function convert vls mode to gpr mode.
(riscv_pass_vls_aggregate_in_gpr): New function to return the rtx of gpr
for vls mode.
(riscv_get_arg_info): Add vls mode handling.
(riscv_pass_by_reference): Return false if arg info has no zero gpr count.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls/def.h: Add new helper macro.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-10.c: New test.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-8.c: New test.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-9.c: New test.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-run-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-run-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-run-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-run-6.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
The UNSPEC_XTHEAD* macros ended up in the unspecv enum,
which broke gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c.
The INSNs expect these unspecs to be not volatile.
Further, there is not reason to have them defined volatile.
So let's simply move the macros into the unspec enum.
With this patch we have again 0 fails in riscv.exp.
gcc/ChangeLog:
* config/riscv/riscv.md: Move UNSPEC_XTHEADFMV* to unspec enum.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
|
|
In order to handle system security constraints during GCC build
and test and that most platform versions cannot link to libgcc_eh
since the unwinder there is incompatible with the system one.
1. We make the support functions weak definitions.
2. We include them as a CRT for platform conditions that do not
allow libgcc_eh.
3. We ensure that the weak symbols are exported from DSOs (which
includes exes on Darwin) so that the dynamic linker will
pick one instance (which avoids duplication of trampoline
caches).
PR libgcc/113403
gcc/ChangeLog:
* config/darwin.h (DARWIN_SHARED_WEAK_ADDS, DARWIN_WEAK_CRTS): New.
(REAL_LIBGCC_SPEC): Move weak CRT handling to separate spec.
* config/i386/darwin.h (DARWIN_HEAP_T_LIB): New.
* config/i386/darwin32-biarch.h (DARWIN_HEAP_T_LIB): New.
* config/i386/darwin64-biarch.h (DARWIN_HEAP_T_LIB): New.
* config/rs6000/darwin.h (DARWIN_HEAP_T_LIB): New.
libgcc/ChangeLog:
* config.host: Build libheap_t.a for i686/x86_64 Darwin.
* config/aarch64/heap-trampoline.c (HEAP_T_ATTR): New.
(allocate_tramp_ctrl): Allow a target to build this as a weak def.
(__gcc_nested_func_ptr_created): Likewise.
* config/i386/heap-trampoline.c (HEAP_T_ATTR): New.
(allocate_tramp_ctrl): Allow a target to build this as a weak def.
(__gcc_nested_func_ptr_created): Likewise.
* config/t-darwin: Build libheap_t.a (a CRT with heap trampoline
support).
|
|
For something like:
void
foo (void)
{
int *ptr;
asm volatile ("%0" : "=w" (ptr));
asm volatile ("%0" :: "m" (*ptr));
}
early-ra would allocate ptr to an FPR for the first asm, thus
leaving an FPR address in the second asm. The address was then
reloaded by LRA to make it valid.
But early-ra shouldn't be allocating at all in that kind of
situation. Doing so caused the ICE in the PR (with LDP fusion).
Fixed by making sure that we record address references as
GPR references.
gcc/
PR target/113623
* config/aarch64/aarch64-early-ra.cc (early_ra::preprocess_insns):
Mark all registers that occur in addresses as needing a GPR.
gcc/testsuite/
PR target/113623
* gcc.c-torture/compile/pr113623.c: New test.
|
|
In this PR, we entered early-ra with quite a bit of dead code.
The code was duly removed (to avoid wasting registers), but there
was a dangling reference in debug instructions, which caused an
ICE later.
Fixed by resetting a debug instruction if it references a register
that is no longer needed by non-debug instructions.
gcc/
PR target/113636
* config/aarch64/aarch64-early-ra.cc (early_ra::replace_regs): Take
the containing insn as an extra parameter. Reset debug instructions
if they reference a register that is no longer used by real insns.
(early_ra::apply_allocation): Update calls accordingly.
gcc/testsuite/
PR target/113636
* go.dg/pr113636.go: New test.
|
|
32-bit systems.
When using '%ld' to print 'long long int' variable, 'fprintf' will
produce messy output on a 32-bit system, in an incorrect instruction
being generated, such as 'th.lwib a1,(a0),-16,4294967295'. And the
following error occurred during compilation:
Assembler messages:
Error: improper immediate value (18446744073709551615)
gcc/ChangeLog:
* config/riscv/thead.cc (th_print_operand_address): Change %ld
to %lld.
|
|
The current ldp/stp policy framework implementation would miss cases,
where the memory operands were reversed. To address this, the call to
the framework function is moved after the lower mem check with the
suitable parameters.
This change removes the mode of aarch64_operands_ok_for_ldpstp, which
becomes unused.
gcc/ChangeLog:
* config/aarch64/aarch64-ldpstp.md: Remove unused mode.
* config/aarch64/aarch64-protos.h (aarch64_operands_ok_for_ldpstp):
Likewise.
* config/aarch64/aarch64.cc (aarch64_operands_ok_for_ldpstp):
Call on framework moved later.
Signed-off-by: Manos Anagnostakis <manos.anagnostakis@vrull.eu>
Co-Authored-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
Co-Authored-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
|
|
This patch fixes the BPF backend to not generate `exit' (return)
instructions in epilogues of functions that are declared as naked via
the corresponding compiler attribute. Having extra exit instructions
upsets the kernel BPF verifier.
Tested in bpf-unknown-none target in x86_64-linux-gnu host.
gcc/ChangeLog
* config/bpf/bpf.cc (bpf_expand_epilogue): Do not emit a return
instruction in naked function epilogues.
gcc/testsuite/ChangeLog
* gcc.target/bpf/naked-1.c: Update test to not expect an exit
instruction in naked function.
* gcc.target/bpf/naked-2.c: New test.
|
|
The rev16 pattern was not recognised anymore as a change in the bswap
tree pass was introducing a new GIMPLE form, not recognized by the
assembly final transformation pass.
Also, fix the output patterns for arm_rev16si_alt[12] to correctly
handle the instructions being made conditional.
More details in the PR.
gcc/ChangeLog:
PR target/108933
* config/arm/arm.md (arm_rev16si2): Convert to define_insn.
Correct generated RTL.
(arm_rev16si2_alt1): Correctly handle conditional execution.
(arm_rev16si2_alt2): Likewise.
gcc/testsuite/ChangeLog:
PR target/108933
* gcc.target/arm/rev16.c: Moved to...
* gcc.target/arm/rev16_1.c: ...here.
* gcc.target/arm/rev16_2.c: New test to check that rev16 is emitted.
|
|
The fix for PR113089 introduced range-based for loops over the
debug_insn_uses of an RTL-SSA set_info, but in the case that we reset a
debug insn, the use would get removed from the use list, and thus we
would end up using an invalidated iterator in the next iteration of the
loop. In practice this means we end up terminating the loop
prematurely, and hence ICE as in PR113089 since there are debug uses
that we failed to fix up.
This patch fixes that by introducing a general mechanism to avoid this
sort of problem. We introduce a safe_iterator to iterator-utils.h which
wraps an iterator, and also holds the end iterator value. It then
pre-computes the next iterator value at all iterations, so it doesn't
matter if the original iterator got invalidated during the loop body, we
can still move safely to the next iteration.
We introduce an iterate_safely helper which effectively adapts a
container such as iterator_range into a container of safe_iterators over
the original iterator type.
We then use iterate_safely around all loops over debug_insn_uses () in
the aarch64 ldp/stp pass to fix PR113616. While doing this, I
remembered that cleanup_tombstones () had the same problem. I
previously worked around this locally by manually maintaining the next
nondebug insn, so this patch also refactors that loop to use the new
iterate_safely helper.
While doing that I noticed that a couple of cases in cleanup_tombstones
could be converted from using dyn_cast<set_info *> to as_a<set_info *>,
which should be safe because there are no clobbers of mem in RTL-SSA, so
all defs of memory should be set_infos.
gcc/ChangeLog:
PR target/113616
* config/aarch64/aarch64-ldp-fusion.cc (fixup_debug_uses_trailing_add):
Use iterate_safely when iterating over debug uses.
(fixup_debug_uses): Likewise.
(ldp_bb_info::cleanup_tombstones): Use iterate_safely to iterate
over nondebug insns instead of manually maintaining the next insn.
* iterator-utils.h (class safe_iterator): New.
(iterate_safely): New.
gcc/testsuite/ChangeLog:
PR target/113616
* gcc.c-torture/compile/pr113616.c: New test.
|
|
Save callee-saved registers in noreturn functions for -O0/-Og so that
debugger can restore callee-saved registers in caller's frame.
Also add the TREE_THIS_VOLATILE check to minimize noreturn attribute
lookup.
gcc/
PR target/38534
* config/i386/i386-options.cc (ix86_set_func_type): Save
callee-saved registers in noreturn functions for -O0/-Og.
gcc/testsuite/
PR target/38534
* gcc.target/i386/pr38534-5.c: New file.
* gcc.target/i386/pr38534-6.c: Likewise.
|
|
gcc/ChangeLog:
PR target/113615
* config/gcn/gcn-valu.md (fold_left_plus_<mode>): Only
define for !TARGET_RDNA2_PLUS.
Signed-off-by: Tobias Burnus <tburnus@baylibre.com>
|
|
The -march-map= options maps the compute capability to the closest
lower compute capability that has been implemented; for sm_89 and
sm_90a, that were previously missing, that's currently -march=sm_80
alias -misa=sm_80.
gcc/ChangeLog:
* config/nvptx/nvptx.opt (march-map=): Add sm_89 and sm_90a.
Signed-off-by: Tobias Burnus <tburnus@baylibre.com>
|
|
Some more '-g' fixes as the .mkoffload.dbg.o debug file's has elf flags
which did not match those generated for the compilation, leading to linker
errors. For .mkoffload.dbg.o, the elf flags are generated by mkoffload
itself - while for the other .o files, that's done by the compiler via
setting default and mainly via the ASM_SPEC.
This is a follow up to r14-8332-g13127dac106724 which fixed an issue
caused by the default arch. In this patch, it is mainly for gfx1100
and gfx1030 which always failed. It also affects gfx906 and possibly
gfx900 but only when using the -mxnack/-msram-ecc flags explicitly.
What happens on the compiler side is mainly determined by gcn-hsa.h's
and otherwise by some default setting. In particular for xnack and
sram_ecc, there is:
For gfx1100 and gfx1030, neither xnack nor sram_ecc is set (only
'+wavefrontsize64').
For fiji, gfx900, gfx906 and gfx908 there is always -mattr=-xnack and
for all but gfx908 also -msram-ecc=no - independent of what has been
passed to the compiler. However, on the elf flags, the result differs:
For fiji, due to the HSACOv3, it is always set to 0 via
copy_early_debug_info; for gfx900, gfx906 and gfx908, xnack is OFF.
For sram-ecc, it is 'unset' for gfx900, 'any' for gfx906 and for
gfx908 it is 'any' unless overridden.
For gfx90a, the -msram-ecc= and -mxnack= are passed on, or if not present,
...=any is passed on. Note that this "any" is different from argument
nor present at elf flag level:
For XNACK: unset/unsupported is 0, any = 0x100, off = 0x200, on = 0x300.
For SRAMECC: unset/unsupported is 0, any = 0x400, off = 0x800, on = 0xc00.
The obstack_ptr_grow changes are more to avoid confusion than having an
actual effect as they would overwise be filtered out via the ASM_SPEC.
gcc/ChangeLog:
PR other/111966
* config/gcn/mkoffload.cc (SET_XNACK_UNSET, TEST_SRAM_ECC_UNSET): New.
(SET_SRAM_ECC_UNSUPPORTED): Renamed to ...
(SET_SRAM_ECC_UNSET): ... this.
(copy_early_debug_info): Remove gfx900 special case, now handled as
part of the generic handling.
(main): Update SRAM_ECC and XNACK for the -march as done in gcn-hsa.h.
Signed-off-by: Tobias Burnus <tburnus@baylibre.com>
|
|
We have reports of regressions in both Objective-C and Objective-C++ on
Darwin23 (macOS 14). In some cases, these are linker warnings about the
alignment of CFString constants; in other cases the built executables
crash during runtime initialization. The underlying issue is the same in
both cases; since the objects (CFStrings, Objective-C meta-data) are TU-
local, we are choosing to increase their alignment for efficiency - to
values greater than ABI alignment.
However, although these objects are TU-local, they are also visible to the
linker (since they are placed in specific named sections). In many cases
the metadata can be regarded as tables of data, and thus it is expected
that these sections can be concatenated from multiple TUs and the data
treated as tabular. In order for this to work the data cannot be allowed
to exceed ABI alignment - which leads to the crashes.
For GCC-15+ it would be nice to find a more elegant solution to this issue
(perhaps by adjusting the concept of binds-locally to exclude specific
named sections) - but I do not want to do that in stage 4.
The solution here is to force the alignment to be preserved as created by
setting DECL_USER_ALIGN on the relevant objects.
gcc/ChangeLog:
* config/darwin.cc (darwin_build_constant_cfstring): Prevent over-
alignment of CFString constants by setting DECL_USER_ALIGN.
gcc/objc/ChangeLog:
* objc-next-runtime-abi-02.cc (build_v2_address_table): Prevent
over-alignment of Objective-C metadata by setting DECL_USER_ALIGN
on relevant variables.
(build_v2_protocol_list_address_table): Likewise.
(generate_v2_protocol_list): Likewise.
(generate_v2_meth_descriptor_table): Likewise.
(generate_v2_meth_type_list): Likewise.
(generate_v2_property_table): Likewise.
(generate_v2_dispatch_table): Likewise.
(generate_v2_ivars_list): Likewise.
(generate_v2_class_structs): Likewise.
(build_ehtype): Likewise.
* objc-runtime-shared-support.cc (generate_strings): Likewise.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
|
|
There is no need to save callee-saved registers in noreturn functions
if they don't throw nor support exceptions. We can treat them the same
as functions with no_callee_saved_registers attribute.
Adjust stack-check-17.c for noreturn function which no longer saves any
registers.
With this change, __libc_start_main in glibc 2.39, which is a noreturn
function, is changed from
__libc_start_main:
endbr64
push %r15
push %r14
mov %rcx,%r14
push %r13
push %r12
push %rbp
mov %esi,%ebp
push %rbx
mov %rdx,%rbx
sub $0x28,%rsp
mov %rdi,(%rsp)
mov %fs:0x28,%rax
mov %rax,0x18(%rsp)
xor %eax,%eax
test %r9,%r9
to
__libc_start_main:
endbr64
sub $0x28,%rsp
mov %esi,%ebp
mov %rdx,%rbx
mov %rcx,%r14
mov %rdi,(%rsp)
mov %fs:0x28,%rax
mov %rax,0x18(%rsp)
xor %eax,%eax
test %r9,%r9
In Linux kernel 6.7.0 on x86-64, do_exit is changed from
do_exit:
endbr64
call <do_exit+0x9>
push %r15
push %r14
push %r13
push %r12
mov %rdi,%r12
push %rbp
push %rbx
mov %gs:0x0,%rbx
sub $0x28,%rsp
mov %gs:0x28,%rax
mov %rax,0x20(%rsp)
xor %eax,%eax
call *0x0(%rip) # <do_exit+0x39>
test $0x2,%ah
je <do_exit+0x8d3>
to
do_exit:
endbr64
call <do_exit+0x9>
sub $0x28,%rsp
mov %rdi,%r12
mov %gs:0x28,%rax
mov %rax,0x20(%rsp)
xor %eax,%eax
mov %gs:0x0,%rbx
call *0x0(%rip) # <do_exit+0x2f>
test $0x2,%ah
je <do_exit+0x8c9>
I compared GCC master branch bootstrap and test times on a slow machine
with 6.6 Linux kernels compiled with the original GCC 13 and the GCC 13
with the backported patch. The performance data isn't precise since the
measurements were done on different days with different GCC sources under
different 6.6 kernel versions.
GCC master branch build time in seconds:
before after improvement
30043.75user 30013.16user 0%
1274.85system 1243.72system 2.4%
GCC master branch test time in seconds (new tests added):
before after improvement
216035.90user 216547.51user 0
27365.51system 26658.54system 2.6%
gcc/
PR target/38534
* config/i386/i386-options.cc (ix86_set_func_type): Don't
save and restore callee saved registers for a noreturn function
with nothrow or compiled with -fno-exceptions.
gcc/testsuite/
PR target/38534
* gcc.target/i386/pr38534-1.c: New file.
* gcc.target/i386/pr38534-2.c: Likewise.
* gcc.target/i386/pr38534-3.c: Likewise.
* gcc.target/i386/pr38534-4.c: Likewise.
* gcc.target/i386/stack-check-17.c: Updated.
|
|
When an interrupt handler is implemented by an assembly stub which does:
1. Save all registers.
2. Call a C function.
3. Restore all registers.
4. Return from interrupt.
it is completely unnecessary to save and restore any registers in the C
function called by the assembly stub, even if they would normally be
callee-saved.
Add no_callee_saved_registers function attribute, which is complementary
to no_caller_saved_registers function attribute, to mark a function which
doesn't have any callee-saved registers. Such a function won't save and
restore any registers. Classify function call-saved register handling
type with:
1. Default call-saved registers.
2. No caller-saved registers with no_caller_saved_registers attribute.
3. No callee-saved registers with no_callee_saved_registers attribute.
Disallow sibcall if callee is a no_callee_saved_registers function
and caller isn't a no_callee_saved_registers function. Otherwise,
callee-saved registers won't be preserved.
After a no_callee_saved_registers function is called, all registers may
be clobbered. If the calling function isn't a no_callee_saved_registers
function, we need to preserve all registers which aren't used by function
calls.
gcc/
PR target/103503
PR target/113312
* config/i386/i386-expand.cc (ix86_expand_call): Replace
no_caller_saved_registers check with call_saved_registers check.
Clobber all registers that are not used by the callee with
no_callee_saved_registers attribute.
* config/i386/i386-options.cc (ix86_set_func_type): Set
call_saved_registers to TYPE_NO_CALLEE_SAVED_REGISTERS for
noreturn function. Disallow no_callee_saved_registers with
interrupt or no_caller_saved_registers attributes together.
(ix86_set_current_function): Replace no_caller_saved_registers
check with call_saved_registers check.
(ix86_handle_no_caller_saved_registers_attribute): Renamed to ...
(ix86_handle_call_saved_registers_attribute): This.
(ix86_gnu_attributes): Add
ix86_handle_call_saved_registers_attribute.
* config/i386/i386.cc (ix86_conditional_register_usage): Replace
no_caller_saved_registers check with call_saved_registers check.
(ix86_function_ok_for_sibcall): Don't allow callee with
no_callee_saved_registers attribute when the calling function
has callee-saved registers.
(ix86_comp_type_attributes): Also check
no_callee_saved_registers.
(ix86_epilogue_uses): Replace no_caller_saved_registers check
with call_saved_registers check.
(ix86_hard_regno_scratch_ok): Likewise.
(ix86_save_reg): Replace no_caller_saved_registers check with
call_saved_registers check. Don't save any registers for
TYPE_NO_CALLEE_SAVED_REGISTERS. Save all registers with
TYPE_DEFAULT_CALL_SAVED_REGISTERS if function with
no_callee_saved_registers attribute is called.
(find_drap_reg): Replace no_caller_saved_registers check with
call_saved_registers check.
* config/i386/i386.h (call_saved_registers_type): New enum.
(machine_function): Replace no_caller_saved_registers with
call_saved_registers.
* doc/extend.texi: Document no_callee_saved_registers attribute.
gcc/testsuite/
PR target/103503
PR target/113312
* gcc.dg/torture/no-callee-saved-run-1a.c: New file.
* gcc.dg/torture/no-callee-saved-run-1b.c: Likewise.
* gcc.target/i386/no-callee-saved-1.c: Likewise.
* gcc.target/i386/no-callee-saved-2.c: Likewise.
* gcc.target/i386/no-callee-saved-3.c: Likewise.
* gcc.target/i386/no-callee-saved-4.c: Likewise.
* gcc.target/i386/no-callee-saved-5.c: Likewise.
* gcc.target/i386/no-callee-saved-6.c: Likewise.
* gcc.target/i386/no-callee-saved-7.c: Likewise.
* gcc.target/i386/no-callee-saved-8.c: Likewise.
* gcc.target/i386/no-callee-saved-9.c: Likewise.
* gcc.target/i386/no-callee-saved-10.c: Likewise.
* gcc.target/i386/no-callee-saved-11.c: Likewise.
* gcc.target/i386/no-callee-saved-12.c: Likewise.
* gcc.target/i386/no-callee-saved-13.c: Likewise.
* gcc.target/i386/no-callee-saved-14.c: Likewise.
* gcc.target/i386/no-callee-saved-15.c: Likewise.
* gcc.target/i386/no-callee-saved-16.c: Likewise.
* gcc.target/i386/no-callee-saved-17.c: Likewise.
* gcc.target/i386/no-callee-saved-18.c: Likewise.
|
|
This is enough to get gfx1030 and gfx1100 working; there are still some test
failures to investigate, and probably some tuning to do.
gcc/ChangeLog:
* config/gcn/gcn-opts.h (TARGET_PACKED_WORK_ITEMS): Add TARGET_RDNA3.
* config/gcn/gcn-valu.md (all_convert): New iterator.
(<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>): New
define_expand, and rename the old one to ...
(*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>): ... this.
(extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>): Likewise, to ...
(extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>): .. this.
(*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_shift<exec>): New.
* config/gcn/gcn.cc (gcn_global_address_p): Use "offsetbits" correctly.
(gcn_hsa_declare_function_name): Update the vgpr counting for gfx1100.
* config/gcn/gcn.md (<u>mulhisi3): Disable on RDNA3.
(<u>mulqihi3_scalar): Likewise.
libgcc/ChangeLog:
* config/gcn/amdgcn_veclib.h (CDNA3_PLUS): Handle RDNA3.
libgomp/ChangeLog:
* config/gcn/time.c (RTC_TICKS): Configure RDNA3.
(omp_get_wtime): Add RDNA3-compatible variant.
* plugin/plugin-gcn.c (max_isa_vgprs): Tune for gfx1030 and gfx1100.
Signed-off-by: Andrew Stubbs <ams@baylibre.com>
|
|
Since LLVM commit 082f87c9d418 (Pull Req. #79038; will become LLVM 18)
"[AMDGPU] Change default AMDHSA Code Object version to 5"
the default - when no --amdhsa-code-object-version= is used - was bumped.
Using --amdhsa-code-object-version=5 is supported (with unknown limitations)
since LLVM 14. GCC required for proper support at least LLVM 13.0.1 such
that explicitly using COV5 is not possible.
Unfortunately, the COV number matters for debugging ("-g") as mkoffload.cc
extracts debugging data from the host's object file and writes into an
an AMD GPU object file it creates. And all object files linked together
must have the same ABI version.
gcc/ChangeLog:
* config/gcn/gcn-hsa.h (ABI_VERSION_SPEC): New; creates the
"--amdhsa-code-object-version=" argument.
(ASM_SPEC): Use it; replace previous version of it.
Signed-off-by: Tobias Burnus <tburnus@baylibre.com>
|
|
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (pre_vsetvl::earliest_fuse_vsetvl_info): Refine some codes.
(pre_vsetvl::emit_vsetvl): Ditto.
|