aboutsummaryrefslogtreecommitdiff
path: root/gcc/config
AgeCommit message (Collapse)AuthorFilesLines
2024-02-02LoongArch: Adjust cost of vector_stmt that match multiply-add pattern.Li Wei1-0/+48
We found that when only 128-bit vectorization was enabled, 549.fotonik3d_r failed to vectorize effectively. For this reason, we adjust the cost of 128-bit vector_stmt that match the multiply-add pattern to facilitate 128-bit vectorization. The experimental results show that after the modification, 549.fotonik3d_r performance can be improved by 9.77% under the 128-bit vectorization option. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_multiply_add_p): New. (loongarch_vector_costs::add_stmt_cost): Adjust. gcc/testsuite/ChangeLog: * gfortran.dg/vect/vect-10.f90: New test.
2024-02-02LoongArch: Don't split the instructions containing relocs for extreme code ↵Xi Ruoyao2-57/+94
model. The ABI mandates the pcalau12i/addi.d/lu32i.d/lu52i.d instructions for addressing a symbol to be adjacent. So model them as "one large instruction", i.e. define_insn, with two output registers. The real address is the sum of these two registers. The advantage of this approach is the RTL passes can still use ldx/stx instructions to skip an addi.d instruction. gcc/ChangeLog: * config/loongarch/loongarch.md (unspec): Add UNSPEC_LA_PCREL_64_PART1 and UNSPEC_LA_PCREL_64_PART2. (la_pcrel64_two_parts): New define_insn. * config/loongarch/loongarch.cc (loongarch_tls_symbol): Fix a typo in the comment. (loongarch_call_tls_get_addr): If -mcmodel=extreme -mexplicit-relocs={always,auto}, use la_pcrel64_two_parts for addressing the TLS symbol and __tls_get_addr. Emit an REG_EQUAL note to allow CSE addressing __tls_get_addr. (loongarch_legitimize_tls_address): If -mcmodel=extreme -mexplicit-relocs={always,auto}, address TLS IE symbols with la_pcrel64_two_parts. (loongarch_split_symbol): If -mcmodel=extreme -mexplicit-relocs={always,auto}, address symbols with la_pcrel64_two_parts. (loongarch_output_mi_thunk): Clean up unreachable code. If -mcmodel=extreme -mexplicit-relocs={always,auto}, address the MI thunks with la_pcrel64_two_parts. gcc/testsuite/ChangeLog: * gcc.target/loongarch/func-call-extreme-1.c (dg-options): Use -O2 instead of -O0 to ensure the pcalau12i/addi/lu32i/lu52i instruction sequences are not reordered by the compiler. (NOIPA): Disallow interprocedural optimizations. * gcc.target/loongarch/func-call-extreme-2.c: Remove the content duplicated from func-call-extreme-1.c, include it instead. (dg-options): Likewise. * gcc.target/loongarch/func-call-extreme-3.c (dg-options): Likewise. * gcc.target/loongarch/func-call-extreme-4.c (dg-options): Likewise. * gcc.target/loongarch/cmodel-extreme-1.c: New test. * gcc.target/loongarch/cmodel-extreme-2.c: New test. * g++.target/loongarch/cmodel-extreme-mi-thunk-1.C: New test. * g++.target/loongarch/cmodel-extreme-mi-thunk-2.C: New test. * g++.target/loongarch/cmodel-extreme-mi-thunk-3.C: New test.
2024-02-02LoongArch: Added support for loading __get_tls_addr symbol address using call36.Lulu Cheng1-6/+16
gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_call_tls_get_addr): Add support for call36. gcc/testsuite/ChangeLog: * gcc.target/loongarch/explicit-relocs-medium-call36-auto-tls-ld-gd.c: New test.
2024-02-02LoongArch: Enable explicit reloc for extreme TLS GD/LD with ↵Lulu Cheng1-10/+9
-mexplicit-relocs=auto. Binutils does not support relaxation using four instructions to obtain symbol addresses gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_explicit_relocs_p): When the code model of the symbol is extreme and -mexplicit-relocs=auto, the macro instruction loading symbol address is not applicable. (loongarch_call_tls_get_addr): Adjust code. (loongarch_legitimize_tls_address): Likewise. gcc/testsuite/ChangeLog: * gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c: New test. * gcc.target/loongarch/explicit-relocs-medium-auto-tls-ld-gd.c: New test.
2024-02-02LoongArch: Add the macro implementation of mcmodel=extreme.Lulu Cheng4-44/+127
gcc/ChangeLog: * config/loongarch/loongarch-protos.h (loongarch_symbol_extreme_p): Add function declaration. * config/loongarch/loongarch.cc (loongarch_symbolic_constant_p): For SYMBOL_PCREL64, non-zero addend of "la.local $rd,$rt,sym+addend" is not allowed (loongarch_load_tls): Added macro support in extreme mode. (loongarch_call_tls_get_addr): Likewise. (loongarch_legitimize_tls_address): Likewise. (loongarch_force_address): Likewise. (loongarch_legitimize_move): Likewise. (loongarch_output_mi_thunk): Likewise. (loongarch_option_override_internal): Remove the code that detects explicit relocs status. (loongarch_handle_model_attribute): Likewise. * config/loongarch/loongarch.md (movdi_symbolic_off64): New template. * config/loongarch/predicates.md (symbolic_off64_operand): New predicate. (symbolic_off64_or_reg_operand): Likewise. gcc/testsuite/ChangeLog: * gcc.target/loongarch/attr-model-5.c: New test. * gcc.target/loongarch/func-call-extreme-5.c: New test. * gcc.target/loongarch/func-call-extreme-6.c: New test. * gcc.target/loongarch/tls-extreme-macro.c: New test.
2024-02-02LoongArch: Merge template got_load_tls_{ld/gd/le/ie}.Lulu Cheng2-76/+30
gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_load_tls): Load all types of tls symbols through one function. (loongarch_got_load_tls_gd): Delete. (loongarch_got_load_tls_ld): Delete. (loongarch_got_load_tls_ie): Delete. (loongarch_got_load_tls_le): Delete. (loongarch_call_tls_get_addr): Modify the called function name. (loongarch_legitimize_tls_address): Likewise. * config/loongarch/loongarch.md (@got_load_tls_gd<mode>): Delete. (@load_tls<mode>): New template. (@got_load_tls_ld<mode>): Delete. (@got_load_tls_le<mode>): Delete. (@got_load_tls_ie<mode>): Delete.
2024-02-02LoongArch: Modify the address calculation logic for obtaining array element ↵Lulu Cheng1-0/+43
values through fp. Modify address calculation logic from (((a x C) + fp) + offset) to ((fp + offset) + a x C). Thereby modifying the register dependencies and optimizing the code. The value of C is 2 4 or 8. The following is the assembly code before and after a loop modification in spec2006 401.bzip: old | new 735 .L71: | 735 .L71: 736 slli.d $r12,$r15,2 | 736 slli.d $r12,$r15,2 737 ldx.w $r13,$r22,$r12 | 737 ldx.w $r13,$r22,$r12 738 addi.d $r15,$r15,-1 | 738 addi.d $r15,$r15,-1 739 slli.w $r16,$r15,0 | 739 slli.w $r16,$r15,0 740 addi.w $r13,$r13,-1 | 740 addi.w $r13,$r13,-1 741 slti $r14,$r13,0 | 741 slti $r14,$r13,0 742 add.w $r12,$r26,$r13 | 742 add.w $r12,$r26,$r13 743 maskeqz $r12,$r12,$r14 | 743 maskeqz $r12,$r12,$r14 744 masknez $r14,$r13,$r14 | 744 masknez $r14,$r13,$r14 745 or $r12,$r12,$r14 | 745 or $r12,$r12,$r14 746 ldx.bu $r14,$r30,$r12 | 746 ldx.bu $r14,$r30,$r12 747 lu12i.w $r13,4096>>12 | 747 alsl.d $r14,$r14,$r18,2 748 ori $r13,$r13,432 | 748 ldptr.w $r13,$r14,0 749 add.d $r13,$r13,$r3 | 749 addi.w $r17,$r13,-1 750 alsl.d $r14,$r14,$r13,2 | 750 stptr.w $r17,$r14,0 751 ldptr.w $r13,$r14,-1968 | 751 slli.d $r13,$r13,2 752 addi.w $r17,$r13,-1 | 752 stx.w $r12,$r22,$r13 753 st.w $r17,$r14,-1968 | 753 ldptr.w $r12,$r19,0 754 slli.d $r13,$r13,2 | 754 blt $r12,$r16,.L71 755 stx.w $r12,$r22,$r13 | 755 .align 4 756 ldptr.w $r12,$r18,-2048 | 756 757 blt $r12,$r16,.L71 | 757 758 .align 4 | 758 This patch is ported from riscv's commit r14-3111. gcc/ChangeLog: * config/loongarch/loongarch.cc (mem_shadd_or_shadd_rtx_p): New function. (loongarch_legitimize_address): Add logical transformation code.
2024-02-01i386: Improve *cmp<dwi>_doubleword splitter [PR113701]Uros Bizjak1-4/+0
The fix for PR70321 introduced a splitter that split a doubleword comparison into a pair of XORs followed by an IOR to set the (zero) flags register. To help the reload, splitter forced SUBREG pieces of double-word input values to a pseudo, but this regressed gcc.target/i386/pr82580.c: int f0 (U x, U y) { return x == y; } from: xorq %rdx, %rdi xorq %rcx, %rsi xorl %eax, %eax orq %rsi, %rdi sete %al ret to: xchgq %rdi, %rsi movq %rdx, %r8 movq %rcx, %rax movq %rsi, %rdx movq %rdi, %rcx xorq %rax, %rcx xorq %r8, %rdx xorl %eax, %eax orq %rcx, %rdx sete %al ret To mitigate the regression, remove this legacy heuristic (workaround?). There have been many incremental changes and improvements to x86 TImode and register allocation, so this legacy workaround is not only no longer useful, but it actually hurts register allocation. The patched compiler now produces: xchgq %rdi, %rsi xorl %eax, %eax xorq %rsi, %rdx xorq %rdi, %rcx orq %rcx, %rdx sete %al ret PR target/113701 gcc/ChangeLog: * config/i386/i386.md (*cmp<dwi>_doubleword): Do not force SUBREG pieces to pseudos.
2024-02-01hppa: Fix bug in atomic_storedi_1 patternJohn David Anglin1-3/+3
The first alternative stores the floating-point status register in the destination. It should store zero. We need to copy %fr0 to another floating-point register to initialize it to zero. 2024-02-01 John David Anglin <danglin@gcc.gnu.org> gcc/ChangeLog: * config/pa/pa.md (atomic_storedi_1): Fix bug in alternative 1.
2024-02-01AVR: Tabify avr.ccGeorg-Johann Lay1-3715/+3715
gcc/ * config/avr/avr.cc: Tabify.
2024-02-01GCN: Don't hard-code number of SGPR/VGPR/AVGPR registersThomas Schwinge2-6/+15
Also add 'STATIC_ASSERT's for number of SGPR/VGPR/AVGPR registers (in '#ifndef USED_FOR_TARGET', as otherwise 'STATIC_ASSERT' isn't available). gcc/ * config/gcn/gcn.cc (gcn_hsa_declare_function_name): Don't hard-code number of SGPR/VGPR/AVGPR registers. * config/gcn/gcn.h: Add a 'STATIC_ASSERT's for number of SGPR/VGPR/AVGPR registers.
2024-02-01RISC-V: Support scheduling for sifive p600 seriesMonk Chiang9-12/+214
Add sifive p600 series scheduler module. For more information see https://www.sifive.com/cores/performance-p650-670. Add sifive-p650, sifive-p670 for mcpu option will come in separate patches. gcc/ChangeLog: * config/riscv/riscv.md: Add "fcvt_i2f", "fcvt_f2i" type attribute, and include sifive-p600.md. * config/riscv/generic-ooo.md: Update type attribute. * config/riscv/generic.md: Update type attribute. * config/riscv/sifive-7.md: Update type attribute. * config/riscv/sifive-p600.md: New file. * config/riscv/riscv-cores.def (RISCV_TUNE): Add parameter. * config/riscv/riscv-opts.h (enum riscv_microarchitecture_type): Add sifive_p600. * config/riscv/riscv.cc (sifive_p600_tune_info): New. * config/riscv/riscv.h (TARGET_SFB_ALU): Update. * doc/invoke.texi (RISC-V Options): Add sifive-p600-series
2024-02-01RISC-V: Add minimal support for 7 new unprivileged extensionsMonk Chiang1-0/+14
The RISC-V Profiles specification here: https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc#7-new-isa-extensions These extensions don't add any new features but describe existing features. So this patch only adds parsing. Za64rs: Reservation set size of 64 bytes Za128rs: Reservation set size of 128 bytes Ziccif: Main memory supports instruction fetch with atomicity requirement Ziccrse: Main memory supports forward progress on LR/SC sequences Ziccamoa: Main memory supports all atomics in A Zicclsm: Main memory supports misaligned loads/stores Zic64b: Cache block size isf 64 bytes gcc/ChangeLog: * common/config/riscv/riscv-common.cc: Add Za64rs, Za128rs, Ziccif, Ziccrse, Ziccamoa, Zicclsm, Zic64b items. * config/riscv/riscv.opt: New macro for 7 new unprivileged extensions. * doc/invoke.texi (RISC-V Options): Add Za64rs, Za128rs, Ziccif, Ziccrse, Ziccamoa, Zicclsm, Zic64b extensions. gcc/testsuite/ChangeLog: * gcc.target/riscv/za-ext.c: New test. * gcc.target/riscv/zi-ext.c: New test.
2024-02-01Link shared libasan with -z now on SolarisRainer Orth1-1/+1
g++.dg/asan/default-options-1.C FAILs on Solaris/SPARC and x86: FAIL: g++.dg/asan/default-options-1.C -O0 execution test FAIL: g++.dg/asan/default-options-1.C -O1 execution test FAIL: g++.dg/asan/default-options-1.C -O2 execution test FAIL: g++.dg/asan/default-options-1.C -O2 -flto execution test FAIL: g++.dg/asan/default-options-1.C -O2 -flto -flto-partition=none execution test FAIL: g++.dg/asan/default-options-1.C -O3 -g execution test FAIL: g++.dg/asan/default-options-1.C -Os execution test The failure is always the same: AddressSanitizer: CHECK failed: asan_rtl.cpp:397 "((!AsanInitIsRunning() && "ASan init calls itself!")) != (0)" (0x0, 0x0) (tid=1) This happens because libasan makes unportable assumptions about initialization order that don't hold on Solaris. The problem has already been fixed in clang by [Driver] Link shared asan runtime lib with -z now on Solaris/x86 https://reviews.llvm.org/D156325 where it was way more prevalent. This patch applies the same fix to gcc. Tested on i386-pc-solaris2.11 (ld and gld) and sparc-sun-solaris2.11. 2024-01-30 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> gcc: * config/sol2.h (LIBASAN_EARLY_SPEC): Add -z now unless -static-libasan. Add missing whitespace.
2024-02-01GCN: Remove 'FIRST_{SGPR,VGPR,AVGPR}_REG', 'LAST_{SGPR,VGPR,AVGPR}_REG' from ↵Thomas Schwinge1-8/+2
machine description They're not used there, and we avoid potentially out-of-sync definitions. gcc/ * config/gcn/gcn.md (FIRST_SGPR_REG, LAST_SGPR_REG) (FIRST_VGPR_REG, LAST_VGPR_REG, FIRST_AVGPR_REG, LAST_AVGPR_REG): Don't 'define_constants'.
2024-02-01GCN: Remove 'SGPR_OR_VGPR_REGNO_P' definitionThomas Schwinge1-1/+0
..., which was always (a) unused, and (b) bogus: always-false. gcc/ * config/gcn/gcn.h (SGPR_OR_VGPR_REGNO_P): Remove.
2024-02-01GCN, RDNA 3: Adjust 'sync_compare_and_swap<mode>_lds_insn'Thomas Schwinge1-1/+6
For OpenACC/GCN '-march=gfx1100', a lot of libgomp OpenACC test cases FAIL: /tmp/ccGfLJ8a.mkoffload.2.s:406:2: error: instruction not supported on this GPU ds_cmpst_rtn_b32 v0, v0, v4, v3 ^ In RDNA 3, 'ds_cmpst_[...]' has been replaced by 'ds_cmpstore_[...]', and the notes for 'ds_cmpst_[...]' in pre-RDNA 3 ISA manuals: Caution, the order of src and cmp are the *opposite* of the BUFFER_ATOMIC_CMPSWAP opcode. ..., have been resolved for 'ds_cmpstore_[...]' in the RDNA 3 ISA manual: In this architecture the order of src and cmp agree with the BUFFER_ATOMIC_CMPSWAP opcode. ..., and therefore '%2', '%3' now swapped with regards to GCC operand order. Most of the affected libgomp OpenACC test cases then PASS their execution test. gcc/ * config/gcn/gcn.md (sync_compare_and_swap<mode>_lds_insn) [TARGET_RDNA3]: Adjust.
2024-01-31Revert "RISC-V: Add non-vector types to dfa pipelines"Edwin Lu6-102/+66
This reverts commit 26c34b809cd1a6249027730a8b52bbf6a1c0f4a8.
2024-01-31Revert "RISC-V: Add vector related pipelines"Edwin Lu3-145/+126
This reverts commit e56fb037d9d265682f5e7217d8a4c12a8d3fddf8.
2024-01-31Revert "RISC-V: Enable assert for insn_has_dfa_reservation"Edwin Lu1-0/+2
This reverts commit 23cd2961bd2ff63583f46e3499a07bd54491d45c.
2024-01-31RISC-V: Enable assert for insn_has_dfa_reservationEdwin Lu1-2/+0
Enables assert that every typed instruction is associated with a dfa reservation gcc/ChangeLog: * config/riscv/riscv.cc (riscv_sched_variable_issue): enable assert
2024-01-31RISC-V: Add vector related pipelinesEdwin Lu3-126/+145
Creates new generic vector pipeline file common to all cpu tunes. Moves all vector related pipelines from generic-ooo to generic-vector-ooo. Creates new vector crypto related insn reservations. gcc/ChangeLog: * config/riscv/generic-ooo.md (generic_ooo): Move reservation (generic_ooo_vec_load): ditto (generic_ooo_vec_store): ditto (generic_ooo_vec_loadstore_seg): ditto (generic_ooo_vec_alu): ditto (generic_ooo_vec_fcmp): ditto (generic_ooo_vec_imul): ditto (generic_ooo_vec_fadd): ditto (generic_ooo_vec_fmul): ditto (generic_ooo_crypto): ditto (generic_ooo_perm): ditto (generic_ooo_vec_reduction): ditto (generic_ooo_vec_ordered_reduction): ditto (generic_ooo_vec_idiv): ditto (generic_ooo_vec_float_divsqrt): ditto (generic_ooo_vec_mask): ditto (generic_ooo_vec_vesetvl): ditto (generic_ooo_vec_setrm): ditto (generic_ooo_vec_readlen): ditto * config/riscv/riscv.md: include generic-vector-ooo * config/riscv/generic-vector-ooo.md: New file. to here Signed-off-by: Edwin Lu <ewlu@rivosinc.com> Co-authored-by: Robin Dapp <rdapp.gcc@gmail.com>
2024-01-31RISC-V: Add non-vector types to dfa pipelinesEdwin Lu6-66/+102
This patch adds non-vector related insn reservations and updates/creates new insn reservations so all non-vector typed instructions have a reservation. gcc/ChangeLog: * config/riscv/generic-ooo.md (generic_ooo_sfb_alu): Add reservation (generic_ooo_branch): ditto * config/riscv/generic.md (generic_sfb_alu): ditto (generic_fmul_half): ditto * config/riscv/riscv.md: Remove cbo, pushpop, and rdfrm types * config/riscv/sifive-7.md (sifive_7_hfma):Add reservation (sifive_7_popcount): ditto * config/riscv/vector.md: change rdfrm to fmove * config/riscv/zc.md: change pushpop to load/store Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
2024-01-31aarch64: -mstrict-align vs __arm_data512_t [PR113657]Andrew Pinski1-4/+7
After r14-1187-gd6b756447cd58b, simplify_gen_subreg can return NULL for "unaligned" memory subreg. Since V8DI has an alignment of 8 bytes, using TImode causes simplify_gen_subreg to return NULL. This fixes the issue by using DImode instead for the loop. And then we will have later on the STP/LDP pass combine it back into STP/LDP if needed. Since strict align is less important (usually used for firmware and early boot only), not doing LDP/STP here is ok. Built and tested for aarch64-linux-gnu with no regressions. PR target/113657 gcc/ChangeLog: * config/aarch64/aarch64-simd.md (split for movv8di): For strict aligned mode, use DImode instead of TImode. gcc/testsuite/ChangeLog: * gcc.target/aarch64/acle/ls64_strict_align.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-01-31aarch64: Avoid out-of-range shrink-wrapped saves [PR111677]Alex Coplan1-1/+1
The PR shows us ICEing due to an unrecognizable TFmode save emitted by aarch64_process_components. The problem is that for T{I,F,D}mode we conservatively require mems to be in range for x-register ldp/stp. That is because (at least for TImode) it can be allocated to both GPRs and FPRs, and in the GPR case that is an x-reg ldp/stp, and the FPR case is a q-register load/store. As Richard pointed out in the PR, aarch64_get_separate_components already checks that the offsets are suitable for a single load, so we just need to choose a mode in aarch64_reg_save_mode that gives the full q-register range. In this patch, we choose V16QImode as an alternative 16-byte "bag-of-bits" mode that doesn't have the artificial range restrictions imposed on T{I,F,D}mode. For T{F,D}mode in GCC 15 I think we could consider relaxing the restriction imposed in aarch64_classify_address, as typically T{F,D}mode should be allocated to FPRs. But such a change seems too invasive to consider for GCC 14 at this stage (let alone backports). Fortunately the new flexible load/store pair patterns in GCC 14 allow this mode change to work without further changes. The backports are more involved as we need to adjust the load/store pair handling to cater for V16QImode in a few places. Note that for the testcase we are relying on the torture options to add -funroll-loops at -O3 which is necessary to trigger the ICE on trunk (but not on the 13 branch). gcc/ChangeLog: PR target/111677 * config/aarch64/aarch64.cc (aarch64_reg_save_mode): Use V16QImode for the full 16-byte FPR saves in the vector PCS case. gcc/testsuite/ChangeLog: PR target/111677 * gcc.target/aarch64/torture/pr111677.c: New test.
2024-01-31AVR: Add AVR64DU and some older devices.Georg-Johann Lay1-1/+7
gcc/ * config/avr/avr-mcus.def: Add AVR64DU28, AVR64DU32, ATA5787, ATA5835, ATtiny64AUTO, ATA5700M322. * doc/avr-mmcu.texi: Rebuild.
2024-01-310From: Alexandre Oliva <oliva@adacore.com>Alexandre Oliva1-0/+7
strub: introduce STACK_ADDRESS_OFFSET Since STACK_POINTER_OFFSET is not necessarily at the boundary between caller- and callee-owned stack, as desired by __builtin_stack_address(), and using it as if it were or not causes problems, introduce a new macro so that ports can define it suitably, without modifying STACK_POINTER_OFFSET. for gcc/ChangeLog PR middle-end/112917 PR middle-end/113100 * builtins.cc (expand_builtin_stack_address): Use STACK_ADDRESS_OFFSET. * doc/extend.texi (__builtin_stack_address): Adjust. * config/sparc/sparc.h (STACK_ADDRESS_OFFSET): Define. * doc/tm.texi.in (STACK_ADDRESS_OFFSET): Document. * doc/tm.texi: Rebuilt.
2024-01-31RISC-V: Fix VSETLV PASS compile-time issueJuzhe-Zhong1-124/+60
The compile time issue was discovered in SPEC 2017 wrf: Use time and -ftime-report to analyze the profile data of SPEC 2017 wrf compilation . Before this patch (Lazy vsetvl): scheduling : 121.89 ( 15%) 0.53 ( 11%) 122.72 ( 15%) 13M ( 1%) machine dep reorg : 424.61 ( 53%) 1.84 ( 37%) 427.44 ( 53%) 5290k ( 0%) real 13m27.074s user 13m19.539s sys 0m5.180s Simple vsetvl: machine dep reorg : 0.10 ( 0%) 0.00 ( 0%) 0.11 ( 0%) 4138k ( 0%) real 6m5.780s user 6m2.396s sys 0m2.373s The machine dep reorg is the compile time of VSETVL PASS (424 seconds) which counts 53% of the compilation time, spends much more time than scheduling. After investigation, the critical patch of VSETVL pass is compute_lcm_local_properties which is called every iteration of phase 2 (earliest fusion) and phase 3 (global lcm). This patch optimized the codes of compute_lcm_local_properties to reduce the compilation time. After this patch: scheduling : 117.51 ( 27%) 0.21 ( 6%) 118.04 ( 27%) 13M ( 1%) machine dep reorg : 80.13 ( 18%) 0.91 ( 26%) 81.26 ( 18%) 5290k ( 0%) real 7m25.374s user 7m20.116s sys 0m3.795s The optimization of this patch is very obvious, lazy VSETVL PASS: 424s (53%) -> 80s (18%) which spend less time than scheduling. Tested on both RV32 and RV64 no regression. Ok for trunk ? PR target/113495 gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (extract_single_source): Remove. (pre_vsetvl::compute_vsetvl_def_data): Fix compile time issue. (pre_vsetvl::compute_transparent): New function. (pre_vsetvl::compute_lcm_local_properties): Fix compile time time issue.
2024-01-30i386: Add "Ws" constraint for symbolic address/label reference [PR105576]Fangrui Song1-0/+4
Printing the raw symbol is useful in inline asm (e.g. in C++ to get the mangled name). Similar constraints are available in other targets (e.g. "S" for aarch64/riscv, "Cs" for m68k). There isn't a good way for x86 yet, e.g. "i" doesn't work for PIC/-mcmodel=large. This patch adds "Ws". Here are possible use cases: ``` namespace ns { extern int var; } asm (".pushsection .xxx,\"aw\"; .dc.a %0; .popsection" :: "Ws"(&var)); asm (".reloc ., BFD_RELOC_NONE, %0" :: "Ws"(&var)); ``` gcc/ChangeLog: PR target/105576 * config/i386/constraints.md: Define constraint "Ws". * doc/md.texi: Document it. gcc/testsuite/ChangeLog: PR target/105576 * gcc.target/i386/asm-raw-symbol.c: New testcase.
2024-01-30xtensa: Make full transition to LRATakayuki 'January June' Suwa5-74/+26
gcc/ChangeLog: * config/xtensa/constraints.md (R, T, U): Change define_constraint to define_memory_constraint. * config/xtensa/predicates.md (move_operand): Don't check that a constant pool operand size is a multiple of UNITS_PER_WORD. * config/xtensa/xtensa.cc (xtensa_lra_p, TARGET_LRA_P): Remove. (xtensa_emit_move_sequence): Remove "if (reload_in_progress)" clause as it can no longer be true. (fixup_subreg_mem): Drop function. (xtensa_output_integer_literal_parts): Consider 16-bit wide constants. (xtensa_legitimate_constant_p): Add short-circuit path for integer load instructions. Don't check that mode size is at least UNITS_PER_WORD. * config/xtensa/xtensa.md (movsf): Use can_create_pseudo_p() rather reload_in_progress and reload_completed. (doloop_end): Drop operand 2. (movhi_internal): Add alternative loading constant from a literal pool. (define_split for DI register_operand): Don't limit to !TARGET_AUTO_LITPOOLS. * config/xtensa/xtensa.opt (mlra): Change to no effect.
2024-01-30RISC-V: Bugfix for vls mode aggregated in GPR calling conventionPan Li1-0/+78
According to the issue as below. https://hub.fgit.cf/riscv-non-isa/riscv-elf-psabi-doc/pull/416 When the mode size of vls integer mode is less than 2 * XLEN, we will take the gpr for both the args and the return values. Instead of the reference. For example the below code: typedef short v8hi __attribute__ ((vector_size (16))); v8hi __attribute__((noinline)) add (v8hi a, v8hi b) { v8hi r = a + b; return r; } Before this patch: add: vsetivli zero,8,e16,m1,ta,ma vle16.v v1,0(a1) <== arg by reference vle16.v v2,0(a2) <== arg by reference vadd.vv v1,v1,v2 vse16.v v1,0(a0) <== return by reference ret After this patch: add: addi sp,sp,-32 sd a0,0(sp) <== arg by register a0 - a3 sd a1,8(sp) sd a2,16(sp) sd a3,24(sp) addi a5,sp,16 vsetivli zero,8,e16,m1,ta,ma vle16.v v2,0(sp) vle16.v v1,0(a5) vadd.vv v1,v1,v2 vse16.v v1,0(sp) ld a0,0(sp) <== return by a0 - a1. ld a1,8(sp) addi sp,sp,32 jr ra For vls floating point, we take the same rules as integer and passed by the gpr or reference. The riscv regression passed for this patch. gcc/ChangeLog: * config/riscv/riscv.cc (riscv_v_vls_mode_aggregate_gpr_count): New function to calculate the gpr count required by vls mode. (riscv_v_vls_to_gpr_mode): New function convert vls mode to gpr mode. (riscv_pass_vls_aggregate_in_gpr): New function to return the rtx of gpr for vls mode. (riscv_get_arg_info): Add vls mode handling. (riscv_pass_by_reference): Return false if arg info has no zero gpr count. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls/def.h: Add new helper macro. * gcc.target/riscv/rvv/autovec/vls/calling-convention-1.c: New test. * gcc.target/riscv/rvv/autovec/vls/calling-convention-10.c: New test. * gcc.target/riscv/rvv/autovec/vls/calling-convention-2.c: New test. * gcc.target/riscv/rvv/autovec/vls/calling-convention-3.c: New test. * gcc.target/riscv/rvv/autovec/vls/calling-convention-4.c: New test. * gcc.target/riscv/rvv/autovec/vls/calling-convention-5.c: New test. * gcc.target/riscv/rvv/autovec/vls/calling-convention-6.c: New test. * gcc.target/riscv/rvv/autovec/vls/calling-convention-7.c: New test. * gcc.target/riscv/rvv/autovec/vls/calling-convention-8.c: New test. * gcc.target/riscv/rvv/autovec/vls/calling-convention-9.c: New test. * gcc.target/riscv/rvv/autovec/vls/calling-convention-run-1.c: New test. * gcc.target/riscv/rvv/autovec/vls/calling-convention-run-2.c: New test. * gcc.target/riscv/rvv/autovec/vls/calling-convention-run-3.c: New test. * gcc.target/riscv/rvv/autovec/vls/calling-convention-run-4.c: New test. * gcc.target/riscv/rvv/autovec/vls/calling-convention-run-5.c: New test. * gcc.target/riscv/rvv/autovec/vls/calling-convention-run-6.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2024-01-30riscv: Move UNSPEC_XTHEAD* from unspecv to unspecChristoph Müllner1-4/+4
The UNSPEC_XTHEAD* macros ended up in the unspecv enum, which broke gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c. The INSNs expect these unspecs to be not volatile. Further, there is not reason to have them defined volatile. So let's simply move the macros into the unspec enum. With this patch we have again 0 fails in riscv.exp. gcc/ChangeLog: * config/riscv/riscv.md: Move UNSPEC_XTHEADFMV* to unspec enum. Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
2024-01-30libgcc: Make heap trampoline support dynamic [PR113403].Iain Sandoe5-16/+38
In order to handle system security constraints during GCC build and test and that most platform versions cannot link to libgcc_eh since the unwinder there is incompatible with the system one. 1. We make the support functions weak definitions. 2. We include them as a CRT for platform conditions that do not allow libgcc_eh. 3. We ensure that the weak symbols are exported from DSOs (which includes exes on Darwin) so that the dynamic linker will pick one instance (which avoids duplication of trampoline caches). PR libgcc/113403 gcc/ChangeLog: * config/darwin.h (DARWIN_SHARED_WEAK_ADDS, DARWIN_WEAK_CRTS): New. (REAL_LIBGCC_SPEC): Move weak CRT handling to separate spec. * config/i386/darwin.h (DARWIN_HEAP_T_LIB): New. * config/i386/darwin32-biarch.h (DARWIN_HEAP_T_LIB): New. * config/i386/darwin64-biarch.h (DARWIN_HEAP_T_LIB): New. * config/rs6000/darwin.h (DARWIN_HEAP_T_LIB): New. libgcc/ChangeLog: * config.host: Build libheap_t.a for i686/x86_64 Darwin. * config/aarch64/heap-trampoline.c (HEAP_T_ATTR): New. (allocate_tramp_ctrl): Allow a target to build this as a weak def. (__gcc_nested_func_ptr_created): Likewise. * config/i386/heap-trampoline.c (HEAP_T_ATTR): New. (allocate_tramp_ctrl): Allow a target to build this as a weak def. (__gcc_nested_func_ptr_created): Likewise. * config/t-darwin: Build libheap_t.a (a CRT with heap trampoline support).
2024-01-30aarch64: Avoid allocating FPRs to address registers [PR113623]Richard Sandiford1-0/+9
For something like: void foo (void) { int *ptr; asm volatile ("%0" : "=w" (ptr)); asm volatile ("%0" :: "m" (*ptr)); } early-ra would allocate ptr to an FPR for the first asm, thus leaving an FPR address in the second asm. The address was then reloaded by LRA to make it valid. But early-ra shouldn't be allocating at all in that kind of situation. Doing so caused the ICE in the PR (with LDP fusion). Fixed by making sure that we record address references as GPR references. gcc/ PR target/113623 * config/aarch64/aarch64-early-ra.cc (early_ra::preprocess_insns): Mark all registers that occur in addresses as needing a GPR. gcc/testsuite/ PR target/113623 * gcc.c-torture/compile/pr113623.c: New test.
2024-01-30aarch64: Handle debug references to removed registers [PR113636]Richard Sandiford1-5/+14
In this PR, we entered early-ra with quite a bit of dead code. The code was duly removed (to avoid wasting registers), but there was a dangling reference in debug instructions, which caused an ICE later. Fixed by resetting a debug instruction if it references a register that is no longer needed by non-debug instructions. gcc/ PR target/113636 * config/aarch64/aarch64-early-ra.cc (early_ra::replace_regs): Take the containing insn as an extra parameter. Reset debug instructions if they reference a register that is no longer used by real insns. (early_ra::apply_allocation): Update calls accordingly. gcc/testsuite/ PR target/113636 * go.dg/pr113636.go: New test.
2024-01-30RISC-V: THEAD: Fix improper immediate value for MODIFY_DISP instruction on ↵Jin Ma1-1/+1
32-bit systems. When using '%ld' to print 'long long int' variable, 'fprintf' will produce messy output on a 32-bit system, in an incorrect instruction being generated, such as 'th.lwib a1,(a0),-16,4294967295'. And the following error occurred during compilation: Assembler messages: Error: improper immediate value (18446744073709551615) gcc/ChangeLog: * config/riscv/thead.cc (th_print_operand_address): Change %ld to %lld.
2024-01-30aarch64: fix handling of reversed mem ops in ldp/stp policy modelManos Anagnostakis3-21/+21
The current ldp/stp policy framework implementation would miss cases, where the memory operands were reversed. To address this, the call to the framework function is moved after the lower mem check with the suitable parameters. This change removes the mode of aarch64_operands_ok_for_ldpstp, which becomes unused. gcc/ChangeLog: * config/aarch64/aarch64-ldpstp.md: Remove unused mode. * config/aarch64/aarch64-protos.h (aarch64_operands_ok_for_ldpstp): Likewise. * config/aarch64/aarch64.cc (aarch64_operands_ok_for_ldpstp): Call on framework moved later. Signed-off-by: Manos Anagnostakis <manos.anagnostakis@vrull.eu> Co-Authored-by: Manolis Tsamis <manolis.tsamis@vrull.eu> Co-Authored-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
2024-01-29bpf: emit empty epilogues in naked functionsJose E. Marchesi1-3/+2
This patch fixes the BPF backend to not generate `exit' (return) instructions in epilogues of functions that are declared as naked via the corresponding compiler attribute. Having extra exit instructions upsets the kernel BPF verifier. Tested in bpf-unknown-none target in x86_64-linux-gnu host. gcc/ChangeLog * config/bpf/bpf.cc (bpf_expand_epilogue): Do not emit a return instruction in naked function epilogues. gcc/testsuite/ChangeLog * gcc.target/bpf/naked-1.c: Update test to not expect an exit instruction in naked function. * gcc.target/bpf/naked-2.c: New test.
2024-01-29arm: Add pattern for bswap + rotate -> rev16 [Bug 108933]Matthieu Longo1-11/+20
The rev16 pattern was not recognised anymore as a change in the bswap tree pass was introducing a new GIMPLE form, not recognized by the assembly final transformation pass. Also, fix the output patterns for arm_rev16si_alt[12] to correctly handle the instructions being made conditional. More details in the PR. gcc/ChangeLog: PR target/108933 * config/arm/arm.md (arm_rev16si2): Convert to define_insn. Correct generated RTL. (arm_rev16si2_alt1): Correctly handle conditional execution. (arm_rev16si2_alt2): Likewise. gcc/testsuite/ChangeLog: PR target/108933 * gcc.target/arm/rev16.c: Moved to... * gcc.target/arm/rev16_1.c: ...here. * gcc.target/arm/rev16_2.c: New test to check that rev16 is emitted.
2024-01-29aarch64: Ensure iterator validity when updating debug uses [PR113616]Alex Coplan1-27/+20
The fix for PR113089 introduced range-based for loops over the debug_insn_uses of an RTL-SSA set_info, but in the case that we reset a debug insn, the use would get removed from the use list, and thus we would end up using an invalidated iterator in the next iteration of the loop. In practice this means we end up terminating the loop prematurely, and hence ICE as in PR113089 since there are debug uses that we failed to fix up. This patch fixes that by introducing a general mechanism to avoid this sort of problem. We introduce a safe_iterator to iterator-utils.h which wraps an iterator, and also holds the end iterator value. It then pre-computes the next iterator value at all iterations, so it doesn't matter if the original iterator got invalidated during the loop body, we can still move safely to the next iteration. We introduce an iterate_safely helper which effectively adapts a container such as iterator_range into a container of safe_iterators over the original iterator type. We then use iterate_safely around all loops over debug_insn_uses () in the aarch64 ldp/stp pass to fix PR113616. While doing this, I remembered that cleanup_tombstones () had the same problem. I previously worked around this locally by manually maintaining the next nondebug insn, so this patch also refactors that loop to use the new iterate_safely helper. While doing that I noticed that a couple of cases in cleanup_tombstones could be converted from using dyn_cast<set_info *> to as_a<set_info *>, which should be safe because there are no clobbers of mem in RTL-SSA, so all defs of memory should be set_infos. gcc/ChangeLog: PR target/113616 * config/aarch64/aarch64-ldp-fusion.cc (fixup_debug_uses_trailing_add): Use iterate_safely when iterating over debug uses. (fixup_debug_uses): Likewise. (ldp_bb_info::cleanup_tombstones): Use iterate_safely to iterate over nondebug insns instead of manually maintaining the next insn. * iterator-utils.h (class safe_iterator): New. (iterate_safely): New. gcc/testsuite/ChangeLog: PR target/113616 * gcc.c-torture/compile/pr113616.c: New test.
2024-01-29x86: Save callee-saved registers in noreturn functions for -O0/-OgH.J. Lu1-4/+8
Save callee-saved registers in noreturn functions for -O0/-Og so that debugger can restore callee-saved registers in caller's frame. Also add the TREE_THIS_VOLATILE check to minimize noreturn attribute lookup. gcc/ PR target/38534 * config/i386/i386-options.cc (ix86_set_func_type): Save callee-saved registers in noreturn functions for -O0/-Og. gcc/testsuite/ PR target/38534 * gcc.target/i386/pr38534-5.c: New file. * gcc.target/i386/pr38534-6.c: Likewise.
2024-01-29gcn/gcn-valu.md: Disable fold_left_plus for TARGET_RDNA2_PLUS [PR113615]Tobias Burnus1-1/+2
gcc/ChangeLog: PR target/113615 * config/gcn/gcn-valu.md (fold_left_plus_<mode>): Only define for !TARGET_RDNA2_PLUS. Signed-off-by: Tobias Burnus <tburnus@baylibre.com>
2024-01-29nvptx.opt: Add sm_89 and sm_90a to -march-map=Tobias Burnus1-0/+6
The -march-map= options maps the compute capability to the closest lower compute capability that has been implemented; for sm_89 and sm_90a, that were previously missing, that's currently -march=sm_80 alias -misa=sm_80. gcc/ChangeLog: * config/nvptx/nvptx.opt (march-map=): Add sm_89 and sm_90a. Signed-off-by: Tobias Burnus <tburnus@baylibre.com>
2024-01-29gcn/mkoffload.cc: Fix SRAM_ECC and XNACK handling [PR111966]Tobias Burnus1-18/+32
Some more '-g' fixes as the .mkoffload.dbg.o debug file's has elf flags which did not match those generated for the compilation, leading to linker errors. For .mkoffload.dbg.o, the elf flags are generated by mkoffload itself - while for the other .o files, that's done by the compiler via setting default and mainly via the ASM_SPEC. This is a follow up to r14-8332-g13127dac106724 which fixed an issue caused by the default arch. In this patch, it is mainly for gfx1100 and gfx1030 which always failed. It also affects gfx906 and possibly gfx900 but only when using the -mxnack/-msram-ecc flags explicitly. What happens on the compiler side is mainly determined by gcn-hsa.h's and otherwise by some default setting. In particular for xnack and sram_ecc, there is: For gfx1100 and gfx1030, neither xnack nor sram_ecc is set (only '+wavefrontsize64'). For fiji, gfx900, gfx906 and gfx908 there is always -mattr=-xnack and for all but gfx908 also -msram-ecc=no - independent of what has been passed to the compiler. However, on the elf flags, the result differs: For fiji, due to the HSACOv3, it is always set to 0 via copy_early_debug_info; for gfx900, gfx906 and gfx908, xnack is OFF. For sram-ecc, it is 'unset' for gfx900, 'any' for gfx906 and for gfx908 it is 'any' unless overridden. For gfx90a, the -msram-ecc= and -mxnack= are passed on, or if not present, ...=any is passed on. Note that this "any" is different from argument nor present at elf flag level: For XNACK: unset/unsupported is 0, any = 0x100, off = 0x200, on = 0x300. For SRAMECC: unset/unsupported is 0, any = 0x400, off = 0x800, on = 0xc00. The obstack_ptr_grow changes are more to avoid confusion than having an actual effect as they would overwise be filtered out via the ASM_SPEC. gcc/ChangeLog: PR other/111966 * config/gcn/mkoffload.cc (SET_XNACK_UNSET, TEST_SRAM_ECC_UNSET): New. (SET_SRAM_ECC_UNSUPPORTED): Renamed to ... (SET_SRAM_ECC_UNSET): ... this. (copy_early_debug_info): Remove gfx900 special case, now handled as part of the generic handling. (main): Update SRAM_ECC and XNACK for the -march as done in gcn-hsa.h. Signed-off-by: Tobias Burnus <tburnus@baylibre.com>
2024-01-28Objective-C, Darwin: Do not overalign CFStrings and Objective-C metadata.Iain Sandoe1-0/+1
We have reports of regressions in both Objective-C and Objective-C++ on Darwin23 (macOS 14). In some cases, these are linker warnings about the alignment of CFString constants; in other cases the built executables crash during runtime initialization. The underlying issue is the same in both cases; since the objects (CFStrings, Objective-C meta-data) are TU- local, we are choosing to increase their alignment for efficiency - to values greater than ABI alignment. However, although these objects are TU-local, they are also visible to the linker (since they are placed in specific named sections). In many cases the metadata can be regarded as tables of data, and thus it is expected that these sections can be concatenated from multiple TUs and the data treated as tabular. In order for this to work the data cannot be allowed to exceed ABI alignment - which leads to the crashes. For GCC-15+ it would be nice to find a more elegant solution to this issue (perhaps by adjusting the concept of binds-locally to exclude specific named sections) - but I do not want to do that in stage 4. The solution here is to force the alignment to be preserved as created by setting DECL_USER_ALIGN on the relevant objects. gcc/ChangeLog: * config/darwin.cc (darwin_build_constant_cfstring): Prevent over- alignment of CFString constants by setting DECL_USER_ALIGN. gcc/objc/ChangeLog: * objc-next-runtime-abi-02.cc (build_v2_address_table): Prevent over-alignment of Objective-C metadata by setting DECL_USER_ALIGN on relevant variables. (build_v2_protocol_list_address_table): Likewise. (generate_v2_protocol_list): Likewise. (generate_v2_meth_descriptor_table): Likewise. (generate_v2_meth_type_list): Likewise. (generate_v2_property_table): Likewise. (generate_v2_dispatch_table): Likewise. (generate_v2_ivars_list): Likewise. (generate_v2_class_structs): Likewise. (build_ehtype): Likewise. * objc-runtime-shared-support.cc (generate_strings): Likewise. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2024-01-27x86: Don't save callee-saved registers in noreturn functionsH.J. Lu1-2/+14
There is no need to save callee-saved registers in noreturn functions if they don't throw nor support exceptions. We can treat them the same as functions with no_callee_saved_registers attribute. Adjust stack-check-17.c for noreturn function which no longer saves any registers. With this change, __libc_start_main in glibc 2.39, which is a noreturn function, is changed from __libc_start_main: endbr64 push %r15 push %r14 mov %rcx,%r14 push %r13 push %r12 push %rbp mov %esi,%ebp push %rbx mov %rdx,%rbx sub $0x28,%rsp mov %rdi,(%rsp) mov %fs:0x28,%rax mov %rax,0x18(%rsp) xor %eax,%eax test %r9,%r9 to __libc_start_main: endbr64 sub $0x28,%rsp mov %esi,%ebp mov %rdx,%rbx mov %rcx,%r14 mov %rdi,(%rsp) mov %fs:0x28,%rax mov %rax,0x18(%rsp) xor %eax,%eax test %r9,%r9 In Linux kernel 6.7.0 on x86-64, do_exit is changed from do_exit: endbr64 call <do_exit+0x9> push %r15 push %r14 push %r13 push %r12 mov %rdi,%r12 push %rbp push %rbx mov %gs:0x0,%rbx sub $0x28,%rsp mov %gs:0x28,%rax mov %rax,0x20(%rsp) xor %eax,%eax call *0x0(%rip) # <do_exit+0x39> test $0x2,%ah je <do_exit+0x8d3> to do_exit: endbr64 call <do_exit+0x9> sub $0x28,%rsp mov %rdi,%r12 mov %gs:0x28,%rax mov %rax,0x20(%rsp) xor %eax,%eax mov %gs:0x0,%rbx call *0x0(%rip) # <do_exit+0x2f> test $0x2,%ah je <do_exit+0x8c9> I compared GCC master branch bootstrap and test times on a slow machine with 6.6 Linux kernels compiled with the original GCC 13 and the GCC 13 with the backported patch. The performance data isn't precise since the measurements were done on different days with different GCC sources under different 6.6 kernel versions. GCC master branch build time in seconds: before after improvement 30043.75user 30013.16user 0% 1274.85system 1243.72system 2.4% GCC master branch test time in seconds (new tests added): before after improvement 216035.90user 216547.51user 0 27365.51system 26658.54system 2.6% gcc/ PR target/38534 * config/i386/i386-options.cc (ix86_set_func_type): Don't save and restore callee saved registers for a noreturn function with nothrow or compiled with -fno-exceptions. gcc/testsuite/ PR target/38534 * gcc.target/i386/pr38534-1.c: New file. * gcc.target/i386/pr38534-2.c: Likewise. * gcc.target/i386/pr38534-3.c: Likewise. * gcc.target/i386/pr38534-4.c: Likewise. * gcc.target/i386/stack-check-17.c: Updated.
2024-01-27x86: Add no_callee_saved_registers function attributeH.J. Lu4-35/+139
When an interrupt handler is implemented by an assembly stub which does: 1. Save all registers. 2. Call a C function. 3. Restore all registers. 4. Return from interrupt. it is completely unnecessary to save and restore any registers in the C function called by the assembly stub, even if they would normally be callee-saved. Add no_callee_saved_registers function attribute, which is complementary to no_caller_saved_registers function attribute, to mark a function which doesn't have any callee-saved registers. Such a function won't save and restore any registers. Classify function call-saved register handling type with: 1. Default call-saved registers. 2. No caller-saved registers with no_caller_saved_registers attribute. 3. No callee-saved registers with no_callee_saved_registers attribute. Disallow sibcall if callee is a no_callee_saved_registers function and caller isn't a no_callee_saved_registers function. Otherwise, callee-saved registers won't be preserved. After a no_callee_saved_registers function is called, all registers may be clobbered. If the calling function isn't a no_callee_saved_registers function, we need to preserve all registers which aren't used by function calls. gcc/ PR target/103503 PR target/113312 * config/i386/i386-expand.cc (ix86_expand_call): Replace no_caller_saved_registers check with call_saved_registers check. Clobber all registers that are not used by the callee with no_callee_saved_registers attribute. * config/i386/i386-options.cc (ix86_set_func_type): Set call_saved_registers to TYPE_NO_CALLEE_SAVED_REGISTERS for noreturn function. Disallow no_callee_saved_registers with interrupt or no_caller_saved_registers attributes together. (ix86_set_current_function): Replace no_caller_saved_registers check with call_saved_registers check. (ix86_handle_no_caller_saved_registers_attribute): Renamed to ... (ix86_handle_call_saved_registers_attribute): This. (ix86_gnu_attributes): Add ix86_handle_call_saved_registers_attribute. * config/i386/i386.cc (ix86_conditional_register_usage): Replace no_caller_saved_registers check with call_saved_registers check. (ix86_function_ok_for_sibcall): Don't allow callee with no_callee_saved_registers attribute when the calling function has callee-saved registers. (ix86_comp_type_attributes): Also check no_callee_saved_registers. (ix86_epilogue_uses): Replace no_caller_saved_registers check with call_saved_registers check. (ix86_hard_regno_scratch_ok): Likewise. (ix86_save_reg): Replace no_caller_saved_registers check with call_saved_registers check. Don't save any registers for TYPE_NO_CALLEE_SAVED_REGISTERS. Save all registers with TYPE_DEFAULT_CALL_SAVED_REGISTERS if function with no_callee_saved_registers attribute is called. (find_drap_reg): Replace no_caller_saved_registers check with call_saved_registers check. * config/i386/i386.h (call_saved_registers_type): New enum. (machine_function): Replace no_caller_saved_registers with call_saved_registers. * doc/extend.texi: Document no_callee_saved_registers attribute. gcc/testsuite/ PR target/103503 PR target/113312 * gcc.dg/torture/no-callee-saved-run-1a.c: New file. * gcc.dg/torture/no-callee-saved-run-1b.c: Likewise. * gcc.target/i386/no-callee-saved-1.c: Likewise. * gcc.target/i386/no-callee-saved-2.c: Likewise. * gcc.target/i386/no-callee-saved-3.c: Likewise. * gcc.target/i386/no-callee-saved-4.c: Likewise. * gcc.target/i386/no-callee-saved-5.c: Likewise. * gcc.target/i386/no-callee-saved-6.c: Likewise. * gcc.target/i386/no-callee-saved-7.c: Likewise. * gcc.target/i386/no-callee-saved-8.c: Likewise. * gcc.target/i386/no-callee-saved-9.c: Likewise. * gcc.target/i386/no-callee-saved-10.c: Likewise. * gcc.target/i386/no-callee-saved-11.c: Likewise. * gcc.target/i386/no-callee-saved-12.c: Likewise. * gcc.target/i386/no-callee-saved-13.c: Likewise. * gcc.target/i386/no-callee-saved-14.c: Likewise. * gcc.target/i386/no-callee-saved-15.c: Likewise. * gcc.target/i386/no-callee-saved-16.c: Likewise. * gcc.target/i386/no-callee-saved-17.c: Likewise. * gcc.target/i386/no-callee-saved-18.c: Likewise.
2024-01-26amdgcn: additional gfx1030/gfx1100 supportAndrew Stubbs4-16/+62
This is enough to get gfx1030 and gfx1100 working; there are still some test failures to investigate, and probably some tuning to do. gcc/ChangeLog: * config/gcn/gcn-opts.h (TARGET_PACKED_WORK_ITEMS): Add TARGET_RDNA3. * config/gcn/gcn-valu.md (all_convert): New iterator. (<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>): New define_expand, and rename the old one to ... (*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>): ... this. (extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>): Likewise, to ... (extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>): .. this. (*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_shift<exec>): New. * config/gcn/gcn.cc (gcn_global_address_p): Use "offsetbits" correctly. (gcn_hsa_declare_function_name): Update the vgpr counting for gfx1100. * config/gcn/gcn.md (<u>mulhisi3): Disable on RDNA3. (<u>mulqihi3_scalar): Likewise. libgcc/ChangeLog: * config/gcn/amdgcn_veclib.h (CDNA3_PLUS): Handle RDNA3. libgomp/ChangeLog: * config/gcn/time.c (RTC_TICKS): Configure RDNA3. (omp_get_wtime): Add RDNA3-compatible variant. * plugin/plugin-gcn.c (max_isa_vgprs): Tune for gfx1030 and gfx1100. Signed-off-by: Andrew Stubbs <ams@baylibre.com>
2024-01-26gcn/gcn-hsa.h: Always pass --amdhsa-code-object-version= in ASM_SPECTobias Burnus1-1/+16
Since LLVM commit 082f87c9d418 (Pull Req. #79038; will become LLVM 18) "[AMDGPU] Change default AMDHSA Code Object version to 5" the default - when no --amdhsa-code-object-version= is used - was bumped. Using --amdhsa-code-object-version=5 is supported (with unknown limitations) since LLVM 14. GCC required for proper support at least LLVM 13.0.1 such that explicitly using COV5 is not possible. Unfortunately, the COV number matters for debugging ("-g") as mkoffload.cc extracts debugging data from the host's object file and writes into an an AMD GPU object file it creates. And all object files linked together must have the same ABI version. gcc/ChangeLog: * config/gcn/gcn-hsa.h (ABI_VERSION_SPEC): New; creates the "--amdhsa-code-object-version=" argument. (ASM_SPEC): Use it; replace previous version of it. Signed-off-by: Tobias Burnus <tburnus@baylibre.com>
2024-01-26RISC-V: Refine some codes of VSETVL PASS [NFC]Juzhe-Zhong1-42/+27
gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (pre_vsetvl::earliest_fuse_vsetvl_info): Refine some codes. (pre_vsetvl::emit_vsetvl): Ditto.