path: root/gcc/avoid-store-forwarding.cc
Age | Commit message | Author | Files | Lines
5 days | asf: Fix null pointer dereference in is_store_forwarding [PR121303] | Konstantinos Eleftheriou | 1 | -1/+8
We were calling `is_store_forwarding` with a NULL value for `off_val`, which was causing a null pointer dereference in `is_constant`, leading to an ICE. This patch updates the call to `is_constant` in `is_store_forwarding` and adds a check for `off_val`, in order to update it with the right value. Bootstrapped/regtested on AArch64 and x86_64. PR rtl-optimization/121303 gcc/ChangeLog: * avoid-store-forwarding.cc (is_store_forwarding): Add check for `off_val` in `is_store_forwarding`. gcc/testsuite/ChangeLog: * gcc.target/i386/pr121303.c: New test.
11 days | asf: Fix case of multiple stores with base offset [PR120660] | Konstantinos Eleftheriou | 1 | -8/+27
When having multiple stores with the same offset as the load, in the case that we are eliminating the load, we were generating a mov instruction for each of them, leading to the overwrite of the register containing the loaded value. This patch fixes this issue by generating a mov instruction only for the first store in the store-load sequence that has the same offset as the load. For the next ones that might be encountered, we use bit-field insertion. Bootstrapped/regtested on AArch64 and x86_64. PR rtl-optimization/120660 gcc/ChangeLog: * avoid-store-forwarding.cc (process_store_forwarding): Fix instruction generation when having multiple stores with base offset. gcc/testsuite/ChangeLog: * gcc.dg/pr120660.c: New test.
11 days | asf: Skip when an instruction doesn't satisfy the constraints [PR119795] | Konstantinos Eleftheriou | 1 | -8/+57
While scanning the instructions, upon reaching an instruction that doesn't satisfy the constraints that we have set, we were removing the already detected stores, but we continued adding stores from that point onward. This was causing issues when the address ranges of later stores overlapped with the load's address, leading to a partial and wrong update of the register containing the loaded value. With this patch, we skip the transformation for stores that operate on the load's address range when stores that operate on the same range have been deleted due to constraint violations. PR rtl-optimization/119795 gcc/ChangeLog: * avoid-store-forwarding.cc (store_forwarding_analyzer::avoid_store_forwarding): Skip transformations for stores that operate on the same address range as deleted ones. gcc/testsuite/ChangeLog: * gcc.target/i386/pr119795.c: New test.
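The rule described above can be pictured as a byte-range overlap test. The following is only a minimal sketch under stated assumptions, not the pass's actual code: `ByteRange`, `overlaps` and `can_transform` are hypothetical names, and ranges are modelled as half-open byte intervals relative to the load's address.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: a half-open byte interval [begin, end) relative to
// the start of the load's address range.
struct ByteRange {
    long begin;
    long end;
};

// Two half-open intervals overlap when each one starts before the other ends.
bool overlaps(const ByteRange &a, const ByteRange &b) {
    assert(a.begin <= a.end && b.begin <= b.end);
    return a.begin < b.end && b.begin < a.end;
}

// A later store may only take part in the transformation if it does not
// touch any byte range of a store that was dropped earlier because of a
// constraint violation.
bool can_transform(const ByteRange &store,
                   const std::vector<ByteRange> &deleted) {
    for (const ByteRange &d : deleted)
        if (overlaps(store, d))
            return false;
    return true;
}
```

In this model a store covering bytes 2..5 is skipped once a store covering bytes 0..3 has been deleted, while a store covering bytes 4..7 is still eligible.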
2025-07-15 | asf: Fix offset check in base reg initialization for big-endian targets | Konstantinos Eleftheriou | 1 | -13/+5
During the base register initialization, in the case that we are eliminating the load instruction, we are using `offset == 0` in order to find the store instruction that has the same offset as the load. This would not work on big-endian targets, where byte 0 would be the MS byte. This patch updates the condition to take into account the target's endianness. We are also removing the adjustment of the starting position for the bitfield insertion when BYTES_BIG_ENDIAN != BITS_BIG_ENDIAN. This is supposed to be handled inside `store_bit_field` and is no longer needed after the offset fix. Bootstrapped/regtested on AArch64 LE, x86_64 and PowerPC LE. gcc/ChangeLog: * avoid-store-forwarding.cc (generate_bit_insert_sequence): Remove adjustment of bitfield insertion's starting position when BYTES_BIG_ENDIAN != BITS_BIG_ENDIAN. (process_store_forwarding): Update offset check in base reg initialization to take into account the target's endianness. gcc/testsuite/ChangeLog: * gcc.target/aarch64/avoid-store-forwarding-be.c: New test.
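To see why `offset == 0` is not enough, consider which memory byte offset corresponds to the low part of the loaded register. The sketch below is a simplified model and not the code from the patch: the function names are hypothetical, sizes are in bytes, and it assumes word and byte endianness agree.

```cpp
#include <cassert>
#include <cstddef>

// Byte offset, within a load of `load_size` bytes, at which a store of
// `store_size` bytes lands in the low part of the loaded register.
// Little-endian: the least significant bytes sit at offset 0.
// Big-endian: they sit at the end of the loaded range.
std::size_t lowpart_byte_offset(std::size_t store_size,
                                std::size_t load_size,
                                bool bytes_big_endian) {
    assert(store_size <= load_size);
    return bytes_big_endian ? load_size - store_size : 0;
}

// Endianness-aware replacement for a plain `offset == 0` test.
bool store_writes_lowpart(std::size_t offset, std::size_t store_size,
                          std::size_t load_size, bool bytes_big_endian) {
    return offset
           == lowpart_byte_offset(store_size, load_size, bytes_big_endian);
}
```

For a 1-byte store feeding an 8-byte load, the low part is at offset 0 on a little-endian target and at offset 7 on a big-endian one, so a fixed zero test picks the wrong store on the latter.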
2025-06-25 | Mark rtl_avoid_store_forwarding functions final override | Martin Jambor | 1 | -2/+2
It is customary to mark the gate and execute functions of the classes representing passes as final override but this is missing in pass_rtl_avoid_store_forwarding. This patch adds it which also silences a clang warning about it. gcc/ChangeLog: 2025-06-24 Martin Jambor <mjambor@suse.cz> * avoid-store-forwarding.cc (class pass_rtl_avoid_store_forwarding): Mark member function gate as final override.
2025-05-27 | asf: Fix calling of emit_move_insn on registers of different modes [PR119884] | Konstantinos Eleftheriou | 1 | -11/+40
This patch uses `lowpart_subreg` for the base register initialization, instead of zero-extending it. We had tried this solution before, but we were leaving undefined bytes in the upper part of the register. This shouldn't be happening as we are supposed to write the whole register when the load is eliminated. This was occurring when having multiple stores with the same offset as the load, generating a register move for all of them, overwriting the bit inserts that were inserted before them. In order to overcome this, we are removing redundant stores from the sequence, i.e. stores that write to addresses that will be overwritten by stores that come after them in the sequence. We are using the same bitmap that is used for the load elimination check, to keep track of the bytes that are written by each store. Also, we are now allowing the load to be eliminated even when there are overlaps between the stores, as there is no obvious reason why we shouldn't do that, we just want the stores to cover all of the load's bytes. Bootstrapped/regtested on AArch64 and x86_64. PR rtl-optimization/119884 gcc/ChangeLog: * avoid-store-forwarding.cc (process_store_forwarding): Use `lowpart_subreg` for the base register initialization and remove redundant stores from the store/load sequence. gcc/testsuite/ChangeLog: * gcc.target/i386/pr119884.c: New test.
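The redundant-store removal and the coverage requirement can be illustrated with a per-byte bitmap. This is a minimal sketch, assuming a store counts as redundant only when every byte it writes is overwritten by later stores; `StoreRange`, `mark_live_stores` and `stores_cover_load` are hypothetical names, and offsets are bytes relative to the start of the load.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a store: byte offset and size relative to the load.
struct StoreRange {
    std::size_t offset;
    std::size_t size;
};

// Walk the stores from last to first, recording written bytes in a bitmap.
// A store is kept only if it contributes at least one byte that no later
// store overwrites; otherwise it is redundant.
std::vector<bool> mark_live_stores(const std::vector<StoreRange> &stores,
                                   std::size_t load_size) {
    std::vector<bool> covered(load_size, false);
    std::vector<bool> keep(stores.size(), false);
    for (std::size_t i = stores.size(); i-- > 0;) {
        const StoreRange &s = stores[i];
        assert(s.offset + s.size <= load_size);
        for (std::size_t b = s.offset; b < s.offset + s.size; ++b) {
            if (!covered[b]) {
                covered[b] = true;
                keep[i] = true;
            }
        }
    }
    return keep;
}

// The load can be eliminated only if the stores together write every byte
// of it; overlaps between the stores are allowed.
bool stores_cover_load(const std::vector<StoreRange> &stores,
                       std::size_t load_size) {
    std::vector<bool> covered(load_size, false);
    for (const StoreRange &s : stores) {
        assert(s.offset + s.size <= load_size);
        for (std::size_t b = s.offset; b < s.offset + s.size; ++b)
            covered[b] = true;
    }
    for (bool c : covered)
        if (!c)
            return false;
    return true;
}
```

With two 4-byte stores at offset 0 followed by one at offset 4 and an 8-byte load, the first store is marked redundant and the remaining two cover the whole load.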
2025-05-16 | Automatic replacement of get_insns/end_sequence pairs | Richard Sandiford | 1 | -2/+1
This is the result of using a regexp to replace instances of: <stuff> = get_insns (); end_sequence (); with: <stuff> = end_sequence (); where the indentation is the same for both lines, and where there might be blank lines inbetween. gcc/ * asan.cc (asan_clear_shadow): Use the return value of end_sequence, rather than calling get_insns separately. (asan_emit_stack_protection, asan_emit_allocas_unpoison): Likewise. (hwasan_frame_base, hwasan_emit_untag_frame): Likewise. * auto-inc-dec.cc (attempt_change): Likewise. * avoid-store-forwarding.cc (process_store_forwarding): Likewise. * bb-reorder.cc (fix_crossing_unconditional_branches): Likewise. * builtins.cc (expand_builtin_apply_args): Likewise. (expand_builtin_return, expand_builtin_mathfn_ternary): Likewise. (expand_builtin_mathfn_3, expand_builtin_int_roundingfn): Likewise. (expand_builtin_int_roundingfn_2, expand_builtin_saveregs): Likewise. (inline_string_cmp): Likewise. * calls.cc (expand_call): Likewise. * cfgexpand.cc (expand_asm_stmt, pass_expand::execute): Likewise. * cfgloopanal.cc (init_set_costs): Likewise. * cfgrtl.cc (insert_insn_on_edge, prepend_insn_to_edge): Likewise. (rtl_lv_add_condition_to_bb): Likewise. * config/aarch64/aarch64-speculation.cc (aarch64_speculation_clobber_sp): Likewise. (aarch64_speculation_establish_tracker): Likewise. (aarch64_do_track_speculation): Likewise. * config/aarch64/aarch64.cc (aarch64_load_symref_appropriately) (aarch64_expand_vector_init, aarch64_gen_ccmp_first): Likewise. (aarch64_gen_ccmp_next, aarch64_mode_emit): Likewise. (aarch64_md_asm_adjust): Likewise. (aarch64_switch_pstate_sm_for_landing_pad): Likewise. (aarch64_switch_pstate_sm_for_jump): Likewise. (aarch64_switch_pstate_sm_for_call): Likewise. * config/alpha/alpha.cc (alpha_legitimize_address_1): Likewise. (alpha_emit_xfloating_libcall, alpha_gp_save_rtx): Likewise. * config/arc/arc.cc (hwloop_optimize): Likewise. * config/arm/aarch-common.cc (arm_md_asm_adjust): Likewise. 
* config/arm/arm-builtins.cc: Likewise. * config/arm/arm.cc (require_pic_register): Likewise. (arm_call_tls_get_addr, arm_gen_load_multiple_1): Likewise. (arm_gen_store_multiple_1, cmse_clear_registers): Likewise. (cmse_nonsecure_call_inline_register_clear): Likewise. (arm_attempt_dlstp_transform): Likewise. * config/avr/avr-passes.cc (bbinfo_t::optimize_one_block): Likewise. (avr_parallel_insn_from_insns): Likewise. * config/avr/avr.cc (avr_prologue_setup_frame): Likewise. (avr_expand_epilogue): Likewise. * config/bfin/bfin.cc (hwloop_optimize): Likewise. * config/c6x/c6x.cc (c6x_expand_compare): Likewise. * config/cris/cris.cc (cris_split_movdx): Likewise. * config/cris/cris.md: Likewise. * config/csky/csky.cc (csky_call_tls_get_addr): Likewise. * config/epiphany/resolve-sw-modes.cc (pass_resolve_sw_modes::execute): Likewise. * config/fr30/fr30.cc (fr30_move_double): Likewise. * config/frv/frv.cc (frv_split_scc, frv_split_cond_move): Likewise. (frv_split_minmax, frv_split_abs): Likewise. * config/frv/frv.md: Likewise. * config/gcn/gcn.cc (move_callee_saved_registers): Likewise. (gcn_expand_prologue, gcn_restore_exec, gcn_md_reorg): Likewise. * config/i386/i386-expand.cc (ix86_expand_carry_flag_compare, ix86_expand_int_movcc): Likewise. (ix86_vector_duplicate_value, expand_vec_perm_interleave2): Likewise. (expand_vec_perm_vperm2f128_vblend): Likewise. (expand_vec_perm_2perm_interleave): Likewise. (expand_vec_perm_2perm_pblendv): Likewise. (expand_vec_perm2_vperm2f128_vblend, ix86_gen_ccmp_first): Likewise. (ix86_gen_ccmp_next): Likewise. * config/i386/i386-features.cc (scalar_chain::make_vector_copies): Likewise. (scalar_chain::convert_reg, scalar_chain::convert_op): Likewise. (timode_scalar_chain::convert_insn): Likewise. * config/i386/i386.cc (ix86_init_pic_reg, ix86_va_start): Likewise. (ix86_get_drap_rtx, legitimize_tls_address): Likewise. (ix86_md_asm_adjust): Likewise. * config/ia64/ia64.cc (ia64_expand_tls_address): Likewise. 
(ia64_expand_compare, spill_restore_mem): Likewise. (expand_vec_perm_interleave_2): Likewise. * config/loongarch/loongarch.cc (loongarch_call_tls_get_addr): Likewise. * config/m32r/m32r.cc (gen_split_move_double): Likewise. * config/m32r/m32r.md: Likewise. * config/m68k/m68k.cc (m68k_call_tls_get_addr): Likewise. (m68k_call_m68k_read_tp, m68k_sched_md_init_global): Likewise. * config/m68k/m68k.md: Likewise. * config/microblaze/microblaze.cc (microblaze_call_tls_get_addr): Likewise. * config/mips/mips.cc (mips_call_tls_get_addr): Likewise. (mips_ls2_init_dfa_post_cycle_insn): Likewise. (mips16_split_long_branches): Likewise. * config/nvptx/nvptx.cc (nvptx_gen_shuffle): Likewise. (nvptx_gen_shared_bcast, nvptx_propagate): Likewise. (workaround_uninit_method_1, workaround_uninit_method_2): Likewise. (workaround_uninit_method_3): Likewise. * config/or1k/or1k.cc (or1k_init_pic_reg): Likewise. * config/pa/pa.cc (legitimize_tls_address): Likewise. * config/pru/pru.cc (pru_expand_fp_compare, pru_reorg_loop): Likewise. * config/riscv/riscv-shorten-memrefs.cc (pass_shorten_memrefs::transform): Likewise. * config/riscv/riscv-vsetvl.cc (pre_vsetvl::emit_vsetvl): Likewise. * config/riscv/riscv.cc (riscv_call_tls_get_addr): Likewise. (riscv_frm_emit_after_bb_end): Likewise. * config/rl78/rl78.cc (rl78_emit_libcall): Likewise. * config/rs6000/rs6000.cc (rs6000_debug_legitimize_address): Likewise. * config/s390/s390.cc (legitimize_tls_address): Likewise. (s390_two_part_insv, s390_load_got, s390_va_start): Likewise. * config/sh/sh_treg_combine.cc (sh_treg_combine::make_not_reg_insn): Likewise. * config/sparc/sparc.cc (sparc_legitimize_tls_address): Likewise. (sparc_output_mi_thunk, sparc_init_pic_reg): Likewise. * config/stormy16/stormy16.cc (xstormy16_split_cbranch): Likewise. * config/xtensa/xtensa.cc (xtensa_copy_incoming_a7): Likewise. (xtensa_expand_block_set_libcall): Likewise. (xtensa_expand_block_set_unrolled_loop): Likewise. 
(xtensa_expand_block_set_small_loop, xtensa_call_tls_desc): Likewise. * dse.cc (emit_inc_dec_insn_before, find_shift_sequence): Likewise. (replace_read): Likewise. * emit-rtl.cc (reorder_insns, gen_clobber, gen_use): Likewise. * except.cc (dw2_build_landing_pads, sjlj_mark_call_sites): Likewise. (sjlj_emit_function_enter, sjlj_emit_function_exit): Likewise. (sjlj_emit_dispatch_table): Likewise. * expmed.cc (expmed_mult_highpart_optab, expand_sdiv_pow2): Likewise. * expr.cc (convert_mode_scalar, emit_move_multi_word): Likewise. (gen_move_insn, expand_cond_expr_using_cmove): Likewise. (expand_expr_divmod, expand_expr_real_2): Likewise. (maybe_optimize_pow2p_mod_cmp, maybe_optimize_mod_cmp): Likewise. * function.cc (emit_initial_value_sets): Likewise. (instantiate_virtual_regs_in_insn, expand_function_end): Likewise. (get_arg_pointer_save_area, make_split_prologue_seq): Likewise. (make_prologue_seq, gen_call_used_regs_seq): Likewise. (thread_prologue_and_epilogue_insns): Likewise. (match_asm_constraints_1): Likewise. * gcse.cc (prepare_copy_insn): Likewise. * ifcvt.cc (noce_emit_store_flag, noce_emit_move_insn): Likewise. (noce_emit_cmove): Likewise. * init-regs.cc (initialize_uninitialized_regs): Likewise. * internal-fn.cc (expand_POPCOUNT): Likewise. * ira-emit.cc (emit_move_list): Likewise. * ira.cc (ira): Likewise. * loop-doloop.cc (doloop_modify): Likewise. * loop-unroll.cc (compare_and_jump_seq): Likewise. (unroll_loop_runtime_iterations, insert_base_initialization): Likewise. (split_iv, insert_var_expansion_initialization): Likewise. (combine_var_copies_in_loop_exit): Likewise. * lower-subreg.cc (resolve_simple_move,resolve_shift_zext): Likewise. * lra-constraints.cc (match_reload, check_and_process_move): Likewise. (process_addr_reg, insert_move_for_subreg): Likewise. (process_address_1, curr_insn_transform): Likewise. (inherit_reload_reg, process_invariant_for_inheritance): Likewise. (inherit_in_ebb, remove_inheritance_pseudos): Likewise. 
* lra-remat.cc (do_remat): Likewise. * mode-switching.cc (commit_mode_sets): Likewise. (optimize_mode_switching): Likewise. * optabs.cc (expand_binop, expand_twoval_binop_libfunc): Likewise. (expand_clrsb_using_clz, expand_doubleword_clz_ctz_ffs): Likewise. (expand_doubleword_popcount, expand_ctz, expand_ffs): Likewise. (expand_absneg_bit, expand_unop, expand_copysign_bit): Likewise. (prepare_float_lib_cmp, expand_float, expand_fix): Likewise. (expand_fixed_convert, gen_cond_trap): Likewise. (expand_atomic_fetch_op): Likewise. * ree.cc (combine_reaching_defs): Likewise. * reg-stack.cc (compensate_edge): Likewise. * reload1.cc (emit_input_reload_insns): Likewise. * sel-sched-ir.cc (setup_nop_and_exit_insns): Likewise. * shrink-wrap.cc (emit_common_heads_for_components): Likewise. (emit_common_tails_for_components): Likewise. (insert_prologue_epilogue_for_components): Likewise. * tree-outof-ssa.cc (emit_partition_copy): Likewise. (insert_value_copy_on_edge): Likewise. * tree-ssa-loop-ivopts.cc (computation_cost): Likewise.
2025-04-18 | avoid-store-forwarding: Fix reg init on load-elimination [PR119160] | kelefth | 1 | -3/+8
In the case that we are eliminating the load instruction, we use zero_extend for the initialization of the base register for the zero-offset store. This causes issues when the store and the load use the same mode, as we are trying to generate a zero_extend with the same inner and outer modes. This patch fixes the issue by zero-extending the value stored in the base register only when the load's mode is wider than the store's mode. PR rtl-optimization/119160 gcc/ChangeLog: * avoid-store-forwarding.cc (process_store_forwarding): Zero-extend the value stored in the base register, in case of load-elimination, only when the mode of the destination is wider. gcc/testsuite/ChangeLog: * gcc.dg/pr119160.c: New test.
2025-01-02 | Update copyright years. | Jakub Jelinek | 1 | -1/+1
2024-12-30 | avoid-store-forwarding: fix reg init on load-eliminiation [PR117835] | kelefth | 1 | -5/+1
During the initialization of the base register for the zero-offset store, in the case that we are eliminating the load, we used a paradoxical subreg assuming that we don't care about the higher bits of the register. This led to writing wrong values when we were not updating the whole register. This patch fixes the issue by zero-extending the value stored in the base register instead of using a paradoxical subreg. Bootstrapped/regtested on x86 and AArch64. PR rtl-optimization/117835 PR rtl-optimization/117872 gcc/ChangeLog: * avoid-store-forwarding.cc (store_forwarding_analyzer::process_store_forwarding): Zero-extend the value stored in the base register instead of using a paradoxical subreg. gcc/testsuite/ChangeLog: * gcc.target/i386/pr117835.c: New test.
2024-12-06 | avoid-store-forwarding: bail when an instruction may throw [PR117816] | kelefth | 1 | -1/+1
Avoid-store-forwarding doesn't handle the case where an instruction in the store-load sequence contains a REG_EH_REGION note, leading to the insertion of instructions after it, while it should be the last instruction in the basic block. This causes an ICE when compiling using `-O -fnon-call-exceptions -favoid-store-forwarding -fno-forward-propagate -finstrument-functions`. This patch rejects the transformation when there are instructions in the sequence that may throw an exception. PR rtl-optimization/117816 gcc/ChangeLog: * avoid-store-forwarding.cc (store_forwarding_analyzer::avoid_store_forwarding): Reject the transformation when having instructions that may throw exceptions in the sequence. gcc/testsuite/ChangeLog: * gcc.dg/pr117816.c: New test.
2024-11-25 | Add target-independent store forwarding avoidance pass | Konstantinos Eleftheriou | 1 | -0/+651
This pass detects cases of expensive store forwarding and tries to avoid them by reordering the stores and using suitable bit insertion sequences. For example it can transform this: strb w2, [x1, 1]; ldr x0, [x1] # Expensive store forwarding to larger load. To: ldr x0, [x1]; strb w2, [x1]; bfi x0, x2, 0, 8. Assembly like this can appear with bitfields or type punning / unions. On stress-ng when running the cpu-union microbenchmark the following speedups have been observed. Neoverse-N1: +29.4% Intel Coffeelake: +13.1% AMD 5950X: +17.5% The transformation is rejected on cases that cause store_bit_field to generate subreg expressions on different register classes. Files avoid-store-forwarding-4.c and avoid-store-forwarding-5.c contain such cases and have been marked as XFAIL. Due to biasing of its operands in store_bit_field, there is a special handling for machines with BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN. The need for this was exposed by an issue on the H8 architecture, which uses big-endian ordering, but BITS_BIG_ENDIAN is false. In that case, the START parameter of store_bit_field needs to be calculated from the end of the destination register. gcc/ChangeLog: * Makefile.in (OBJS): Add avoid-store-forwarding.o. * common.opt (favoid-store-forwarding): New option. * common.opt.urls: Regenerate. * doc/invoke.texi: New param store-forwarding-max-distance. * doc/passes.texi: Document new pass. * doc/tm.texi: Regenerate. * doc/tm.texi.in: Document new pass. * params.opt (store-forwarding-max-distance): New param. * passes.def: Add pass_rtl_avoid_store_forwarding before pass_early_remat. * target.def (avoid_store_forwarding_p): New DEFHOOK. * target.h (struct store_fwd_info): Declare. * targhooks.cc (default_avoid_store_forwarding_p): New function. * targhooks.h (default_avoid_store_forwarding_p): Declare. * tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare. * avoid-store-forwarding.cc: New file. * avoid-store-forwarding.h: New file. 
* timevar.def (TV_AVOID_STORE_FORWARDING): New timevar. gcc/testsuite/ChangeLog: * gcc.target/aarch64/avoid-store-forwarding-1.c: New test. * gcc.target/aarch64/avoid-store-forwarding-2.c: New test. * gcc.target/aarch64/avoid-store-forwarding-3.c: New test. * gcc.target/aarch64/avoid-store-forwarding-4.c: New test. * gcc.target/aarch64/avoid-store-forwarding-5.c: New test. * gcc.target/x86_64/abi/callabi/avoid-store-forwarding-1.c: New test. * gcc.target/x86_64/abi/callabi/avoid-store-forwarding-2.c: New test. Co-authored-by: Philipp Tomsich <philipp.tomsich@vrull.eu> Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu> Signed-off-by: Konstantinos Eleftheriou <konstantinos.eleftheriou@vrull.eu>
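The equivalence the pass relies on, a narrow store followed by a wide load versus a wide load followed by the store and a bit-field insert into the loaded register, can be checked with a small model. This is a sketch under stated assumptions rather than anything taken from the pass: memory is little-endian, the helper names are hypothetical, and the insert position is taken as 8 times the store's byte offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Read 8 bytes as a little-endian 64-bit value, independent of the host.
std::uint64_t load64_le(const std::uint8_t *p) {
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < 8; ++i)
        v |= static_cast<std::uint64_t>(p[i]) << (8 * i);
    return v;
}

// Model of a bit-field insert (like AArch64 `bfi`): replace `width` bits of
// `dst`, starting at bit `lsb`, with the low `width` bits of `src`.
std::uint64_t bit_insert(std::uint64_t dst, std::uint64_t src,
                         unsigned lsb, unsigned width) {
    assert(width >= 1 && lsb < 64 && lsb + width <= 64);
    const std::uint64_t low = width == 64 ? ~0ULL : ((1ULL << width) - 1);
    const std::uint64_t mask = low << lsb;
    return (dst & ~mask) | ((src << lsb) & mask);
}

// Original order: byte store, then a wide load that depends on it.
std::uint64_t store_then_load(std::uint8_t *mem, std::size_t byte_off,
                              std::uint8_t val) {
    assert(byte_off < 8);
    mem[byte_off] = val;
    return load64_le(mem);
}

// Transformed order: wide load first, then the store, then patch the
// loaded register with a bit insert so it holds the same value.
std::uint64_t load_then_insert(std::uint8_t *mem, std::size_t byte_off,
                               std::uint8_t val) {
    assert(byte_off < 8);
    std::uint64_t reg = load64_le(mem);
    mem[byte_off] = val;
    return bit_insert(reg, val, static_cast<unsigned>(8 * byte_off), 8);
}
```

Both orders leave memory and the result register in the same state; the difference is that the transformed load no longer has to wait on data forwarded from a narrower store.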