|
The following addresses quadratic behavior in processing debug insns
in delete_trivially_dead_insns and insn_live_p by using TREE_VISITED
on the INSN_VAR_LOCATION_DECL to indicate a later debug bind
with the same decl and no intervening real insn or debug marker.
That replaces the NEXT_INSN walk in insn_live_p with first clearing
TREE_VISITED in the first loop over the insns and keeping track of
the decls on which we set the bit, since we need to clear it again
when visiting a real or debug marker insn.
That improves the time spent in delete_trivially_dead_insns from
10.6s to 2.2s for the testcase.
PR rtl-optimization/109237
* cse.cc (insn_live_p): Remove NEXT_INSN walk, instead check
TREE_VISITED on INSN_VAR_LOCATION_DECL.
(delete_trivially_dead_insns): Maintain TREE_VISITED on
active debug bind INSN_VAR_LOCATION_DECL.
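The bookkeeping described above can be sketched outside of GCC as a single backward scan. This is a minimal illustration under assumptions, not the cse.cc code: `Insn`, `Kind` and `live_insns` are hypothetical names, and a plain `visited` vector stands in for TREE_VISITED on the decl.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for RTL insns; not GCC's data structures.
enum class Kind { Real, DebugBind, DebugMarker };

struct Insn {
  Kind kind;
  int decl;  // index of the bound variable; only meaningful for DebugBind
};

// Returns liveness per insn. A debug bind is dead when a later bind of the
// same decl follows with no real insn or debug marker in between.
std::vector<bool> live_insns(const std::vector<Insn> &insns, int num_decls) {
  std::vector<bool> visited(num_decls, false);  // plays the role of TREE_VISITED
  std::vector<int> marked;                      // decls whose bit is currently set
  std::vector<bool> live(insns.size(), true);

  // Walk backwards so that "later" insns are seen first.
  for (std::size_t i = insns.size(); i-- > 0;) {
    const Insn &insn = insns[i];
    if (insn.kind == Kind::DebugBind) {
      if (visited[insn.decl]) {
        live[i] = false;  // superseded by a later bind of the same decl
      } else {
        visited[insn.decl] = true;
        marked.push_back(insn.decl);
      }
    } else {
      // A real insn or debug marker ends the window: reset all set bits.
      for (int d : marked) visited[d] = false;
      marked.clear();
    }
  }
  return live;
}
```

Each insn is handled once and each set bit is cleared at most once, so the scan stays linear where a forward walk per debug bind could be quadratic.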
|
|
For the testcase bb_is_just_return is at the top of the profile; changing
it to walk BB insns backwards takes it off the profile. That's because
in the forward walk you have to process possibly many debug insns
but in a backward walk you very likely run into control insns first.
PR rtl-optimization/109237
* cfgcleanup.cc (bb_is_just_return): Walk insns backwards.
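A rough sketch of why the backward walk exits sooner, assuming a block is modelled as a flat list of insn kinds. `Op` and `is_just_return` are hypothetical names and the check is simplified compared to what cfgcleanup.cc does:

```cpp
#include <vector>

enum class Op { Debug, Return, Other };  // hypothetical insn kinds

// True when the block's only non-debug insn is a return.
bool is_just_return(const std::vector<Op> &block) {
  bool seen_return = false;
  // Walk from the end: control insns sit at the tail of a block, so a
  // disqualifying real insn is usually found before any run of debug insns.
  for (auto it = block.rbegin(); it != block.rend(); ++it) {
    if (*it == Op::Debug) continue;  // debug insns never disqualify
    if (*it == Op::Return && !seen_return) {
      seen_return = true;
      continue;
    }
    return false;  // any other real insn (or a second return): bail out
  }
  return seen_return;
}
```

For a block made of many debug insns, one ordinary insn and a return, this loop stops after two steps, while a forward walk would first step over every debug insn.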
|
|
This testcase was reduced such that it isn't valid C++23, so with my
usual testing with GXX_TESTSUITE_STDS=98,11,14,17,20,2b it fails:
FAIL: g++.dg/pr109524.C -std=gnu++2b (test for excess errors)
.../gcc/testsuite/g++.dg/pr109524.C: In function 'nn hh(nn)':
.../gcc/testsuite/g++.dg/pr109524.C:35:12: error: cannot bind non-const lvalue reference of type 'nn&' to an rvalue of type 'nn'
.../gcc/testsuite/g++.dg/pr109524.C:17:6: note: initializing argument 1 of 'nn::nn(nn&)'
The following patch fixes that and I've verified it doesn't change
anything on what the test was testing, it still ICEs in r13-7198 and
passes in r13-7203, now in all language modes (except for 98 where
it is intentionally UNSUPPORTED).
2023-04-19 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/109524
* g++.dg/pr109524.C (nn::nn): Change argument type from nn & to
const nn &.
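The error above can be reproduced in miniature. The following is a reduced illustration, not the contents of pr109524.C; only the names `nn` and `hh` are borrowed from the diagnostic and the `value` member is made up:

```cpp
struct nn {
  int value = 0;
  nn() = default;
  // Taking `const nn &` lets the constructor bind to rvalues as well as
  // lvalues; a `nn &` parameter would reject the rvalue produced below.
  nn(const nn &other) : value(other.value) {}
};

// In C++23 a returned parameter is treated as an rvalue when initializing
// the result, so this needs a constructor that accepts rvalues.
nn hh(nn arg) { return arg; }
```

With `nn(nn &)` instead, earlier language modes fall back to treating `arg` as an lvalue, which is why only -std=gnu++2b failed.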
|
|
When I committed the patches to enable support for DFP on AArch64, I
forgot to update the installation documentation.
This patch adds AArch64 as needed (same as i386/x86_64).
2023-04-17 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* doc/install.texi (enable-decimal-float): Add AArch64.
|
|
with different reg classes.
There's a potential performance issue when the backend returns some
unreasonable value for a mode which can never be allocated with the
reg class.
gcc/ChangeLog:
PR rtl-optimization/109351
* ira.cc (setup_class_subset_and_memory_move_costs): Check
hard_regno_mode_ok before setting lowest memory move cost for
the mode with different reg classes.
|
|
|
|
@gol was removed in r13-6778, new doc additions can't use it.
gcc/ChangeLog:
* doc/invoke.texi: Remove stray @gol.
|
|
gcc/
* ifcvt.cc (cond_move_process_if_block): Consider the result of
targetm.noce_conversion_profitable_p() when replacing the original
sequence with the converted one.
|
|
gcc/
* common.opt (gcodeview): Add new option.
* gcc.cc (driver_handle_option): Handle OPT_gcodeview.
* opts.cc (command_handle_option): Similarly.
* doc/invoke.texi: Add documentation for -gcodeview.
|
|
This moves around the code for tree_ssa_cs_elim slightly
improving code readability and removing declarations that
are no longer needed.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Remove declaration.
(make_pass_phiopt): Make execute out of line.
(tree_ssa_cs_elim): Move code into ...
(pass_cselim::execute): here.
|
|
gcc/ChangeLog:
* system.h: Drop unused INCLUDE_PTHREAD_H.
|
|
vect_grouped_store_supported
gcc/ChangeLog:
* tree-vect-data-refs.cc (vect_grouped_store_supported): Add new
condition.
|
|
gcc/
* config/riscv/bitmanip.md (rotr<mode>3 expander): Enable for ZBKB.
(bswapdi2, bswapsi2): Similarly.
|
|
INSERTPS can select any element from src and insert it into any place
of the dest. For SSE4.1 targets, the compiler can generate e.g.
insertps $64, %xmm0, %xmm1
to insert element 1 from %xmm0 into element 0 of %xmm1.
gcc/ChangeLog:
PR target/94908
* config/i386/i386-builtin.def (__builtin_ia32_insertps128):
Use CODE_FOR_sse4_1_insertps_v4sf.
* config/i386/i386-expand.cc (expand_vec_perm_insertps): New.
(expand_vec_perm_1): Call expand_vec_perm_insertps.
* config/i386/i386.md ("unspec"): Declare UNSPEC_INSERTPS here.
* config/i386/mmx.md (mmxscalarmode): New mode attribute.
(@sse4_1_insertps_<mode>): New insn pattern.
* config/i386/sse.md (@sse4_1_insertps_<mode>): Macroize insn
pattern from sse4_1_insertps using VI4F_128 mode iterator.
gcc/testsuite/ChangeLog:
PR target/94908
* gcc.target/i386/pr94908.c: New test.
* gcc.target/i386/sse4_1-insertps-5.c: New test.
* gcc.target/i386/vperm-v4sf-2-sse4.c: New test.
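For reference, the immediate of INSERTPS packs three fields: bits 7:6 select the source element, bits 5:4 select the destination element, and bits 3:0 are a zero mask. A small decoder, with `InsertpsImm` and `decode_insertps` being hypothetical names used only for illustration:

```cpp
// Decoded fields of an INSERTPS immediate.
struct InsertpsImm {
  unsigned src_elem;   // bits 7:6, element taken from the source
  unsigned dst_elem;   // bits 5:4, element written in the destination
  unsigned zero_mask;  // bits 3:0, destination elements forced to zero
};

constexpr InsertpsImm decode_insertps(unsigned imm8) {
  return InsertpsImm{(imm8 >> 6) & 3u, (imm8 >> 4) & 3u, imm8 & 0xFu};
}
```

Applied to the `$64` immediate from the example, this gives source element 1, destination element 0 and an empty zero mask.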
|
|
IPA currently puts *some* irange's in GC memory. When I contribute
support for generic ranges in IPA, we'll need to change this to
vrange. This patch adds GTY support for both vrange and frange.
gcc/ChangeLog:
* value-range.cc (gt_ggc_mx): New.
(gt_pch_nx): New.
* value-range.h (class vrange): Add GTY marker.
(class frange): Same.
(gt_ggc_mx): Remove.
(gt_pch_nx): Remove.
|
|
The function `constrain_operands' lacked the logic to consider relaxed
memory constraints when "traditional" memory constraints were not
satisfied, creating potential issues as observed during the reload
compilation pass.
In addition, it was observed that while `constrain_operands' chooses
to disregard constraints when more than one alternative is provided,
e.g. "m,r" using CONSTRAINT__UNKNOWN, it has no checks in place to
determine whether the multiple constraints in a given string are in
fact repetitions of the same constraint and should thus in fact be
treated as a single constraint, as ought to be the case for something
like "m,m".
Both of these issues are dealt with here, thus ensuring that we get
appropriate pattern matching.
gcc/
* lra-constraints.cc (constraint_unique): New.
(process_address_1): Apply constraint_unique test.
* recog.cc (constrain_operands): Allow relaxed memory
constraints.
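The "m,m" case can be illustrated with a string check. This is only a sketch of the idea that repeated alternatives amount to one constraint; `all_alternatives_equal` is a hypothetical helper and does not reproduce constraint_unique from lra-constraints.cc:

```cpp
#include <string>

// Hypothetical helper: returns true when every comma-separated alternative
// in `constraints` is the same non-empty string, e.g. "m,m" but not "m,r".
bool all_alternatives_equal(const std::string &constraints) {
  std::string::size_type end = constraints.find(',');
  const std::string first = constraints.substr(0, end);
  if (first.empty()) return false;
  while (end != std::string::npos) {
    std::string::size_type start = end + 1;
    end = constraints.find(',', start);
    std::string::size_type len =
        (end == std::string::npos) ? std::string::npos : end - start;
    // Compare this alternative against the first one.
    if (constraints.compare(start, len, first) != 0) return false;
  }
  return true;
}
```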
|
|
Document which version of the RISC-V vector intrinsics has been
implemented in GCC.
gcc/ChangeLog:
* doc/extend.texi (Target Builtins): Add RISC-V Vector
Intrinsics.
(RISC-V Vector Intrinsics): Document which version of the RISC-V
vector intrinsics GCC implements, and its reference.
|
|
This adds bitmap_clear_first_set_bit and uses it where previously
bitmap_clear_bit followed bitmap_first_set_bit. The advantage
is a faster search and no clobbering of ->current.
PR middle-end/108786
* bitmap.h (bitmap_clear_first_set_bit): New.
* bitmap.cc (bitmap_first_set_bit_worker): Rename from
bitmap_first_set_bit and add optional clearing of the bit.
(bitmap_first_set_bit): Wrap bitmap_first_set_bit_worker.
(bitmap_clear_first_set_bit): Likewise.
* df-core.cc (df_worklist_dataflow_doublequeue): Use
bitmap_clear_first_set_bit.
* graphite-scop-detection.cc (scop_detection::merge_sese):
Likewise.
* sanopt.cc (sanitize_asan_mark_unpoison): Likewise.
(sanitize_asan_mark_poison): Likewise.
* tree-cfgcleanup.cc (cleanup_tree_cfg_noloop): Likewise.
* tree-into-ssa.cc (rewrite_blocks): Likewise.
* tree-ssa-dce.cc (simple_dce_from_worklist): Likewise.
* tree-ssa-sccvn.cc (do_rpo_vn_1): Likewise.
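The combined operation can be sketched on a toy bitmap. GCC's bitmap is a different, sparse structure; `Bitmap` and `clear_first_set_bit` below are hypothetical, and returning -1 for an empty bitmap is a choice made for this sketch only:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy bitmap over 64-bit words.
struct Bitmap {
  std::vector<std::uint64_t> words;
};

// Finds the lowest set bit, clears it, and returns its index, or -1 when
// the bitmap is empty. One scan does the work of a find plus a clear.
long clear_first_set_bit(Bitmap &b) {
  for (std::size_t w = 0; w < b.words.size(); ++w) {
    std::uint64_t word = b.words[w];
    if (word == 0) continue;
    unsigned bit = 0;
    while (((word >> bit) & 1u) == 0) ++bit;  // position of lowest set bit
    b.words[w] = word & (word - 1);           // clear the lowest set bit
    return static_cast<long>(w * 64 + bit);
  }
  return -1;
}
```

A worklist loop then becomes `while ((i = clear_first_set_bit (b)) >= 0) ...`, with no second lookup to clear the bit that was just found.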
|
|
The following allows getting PTA stats with -stats without blowing
up your filesystem by guarding constraint and solution dumping
with TDF_DETAILS and the SSA points-to info with TDF_DETAILS
or TDF_ALIAS.
* tree-ssa-structalias.cc (dump_sa_stats): Split out from...
(dump_sa_points_to_info): ... this function.
(compute_points_to_sets): Guard large dumps with TDF_DETAILS,
and call dump_sa_stats guarded with TDF_STATS.
(ipa_pta_execute): Likewise.
(compute_may_aliases): Guard dump_alias_info with
TDF_DETAILS|TDF_ALIAS.
* gcc.dg/ipa/ipa-pta-16.c: Use -details for dump.
* gcc.dg/tm/alias-1.c: Likewise.
* gcc.dg/tm/alias-2.c: Likewise.
* gcc.dg/torture/ipa-pta-1.c: Likewise.
* gcc.dg/torture/pr39074-2.c: Likewise.
* gcc.dg/torture/pr39074.c: Likewise.
* gcc.dg/torture/pta-callused-1.c: Likewise.
* gcc.dg/torture/pta-escape-1.c: Likewise.
* gcc.dg/torture/pta-ptrarith-1.c: Likewise.
* gcc.dg/torture/pta-ptrarith-2.c: Likewise.
* gcc.dg/torture/pta-ptrarith-3.c: Likewise.
* gcc.dg/torture/pta-structcopy-1.c: Likewise.
* gcc.dg/torture/ssa-pta-fn-1.c: Likewise.
* gcc.dg/tree-ssa/alias-19.c: Likewise.
* gcc.dg/tree-ssa/pta-callused.c: Likewise.
* gcc.dg/tree-ssa/pta-fp.c: Likewise.
* gcc.dg/tree-ssa/pta-ptrarith-1.c: Likewise.
* gcc.dg/tree-ssa/pta-ptrarith-2.c: Likewise.
|
|
While debugging PHI-OPT with match-and-simplify,
I found that adding more output to the debug dumps made
it easier to understand what was going on than stepping in
the debugger, so this adds it. Note I used TDF_FOLDING rather
than TDF_DETAILS as these debug messages can be chatty and
are only needed if you are debugging match-and-simplify
with PHI-OPT, and match-and-simplify uses TDF_FOLDING as
its check.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (gimple_simplify_phiopt): Dump
the expression that is being tried when TDF_FOLDING
is true.
(phiopt_worker::match_simplify_replacement): Dump
the sequence which was created by gimple_simplify_phiopt
when TDF_FOLDING is true.
|
|
We know that the statement we are moving already
has an SSA_NAME on the lhs, so we don't need to
check that and can also just call reset_flow_sensitive_info
with the name we already got.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (match_simplify_replacement):
Simplify code that does the movement slightly.
|
|
I noticed for the expansion of the __rev16* arm_acle.h intrinsics we don't need to use an unspec just because it doesn't match neatly to a bswap code.
We have organic combine patterns for it that we can reuse.
This patch removes the define_insn using UNSPEC_REV (should it have been an UNSPEC_REV16?) and adds an expander to emit
the patterns we have for rev16 using standard RTL codes.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64.md (@aarch64_rev16<mode>): Change to
define_expand.
(rev16<mode>2): Rename to...
(aarch64_rev16<mode>2_alt1): ... This.
(rev16<mode>2_alt): Rename to...
(*aarch64_rev16<mode>2_alt2): ... This.
|
|
Negating dconst0 is getting pretty old, and we will keep adding copies
of the same idiom. Fixed by adding a dconstm0 constant to go along
with dconst1, dconstm1, etc.
gcc/ChangeLog:
* emit-rtl.cc (init_emit_once): Initialize dconstm0.
* gimple-range-op.cc (class cfn_signbit): Remove dconstm0
declaration.
* range-op-float.cc (zero_range): Use dconstm0.
(zero_to_inf_range): Same.
* real.h (dconstm0): New.
* value-range.cc (frange::flush_denormals_to_zero): Use dconstm0.
(frange::set_zero): Do not declare dconstm0.
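As a reminder of why a dedicated minus-zero constant is useful, here is the same distinction with plain doubles. GCC's dconst0 and dconstm0 are REAL_VALUE_TYPE objects, so the names below are illustrative only:

```cpp
#include <cmath>

// Illustrative constants only; not GCC's REAL_VALUE_TYPE constants.
const double zero = 0.0;
const double minus_zero = -0.0;

// The two zeros compare equal, so the sign bit is the only way to tell
// them apart.
bool is_minus_zero(double x) { return x == 0.0 && std::signbit(x); }
```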
|
|
The following adds two RAII classes, one for mpz_t and one for mpfr_t
making object lifetime management easier. Both formerly required
explicit initialization with {mpz,mpfr}_init and release with
{mpz,mpfr}_clear.
I've converted two example places (where lifetime is trivial).
* system.h (class auto_mpz): New.
* realmpfr.h (class auto_mpfr): Likewise.
* fold-const-call.cc (do_mpfr_arg1): Use auto_mpfr.
(do_mpfr_arg2): Likewise.
* tree-ssa-loop-niter.cc (bound_difference): Use auto_mpz.
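The shape of such a wrapper can be sketched without linking against GMP or MPFR. Everything below is a stand-in: `raw_num`, `raw_num_init`, `raw_num_clear` and `auto_num` are hypothetical names that mimic an init/clear style C API, and the class is not the auto_mpz or auto_mpfr definition from the patch:

```cpp
// Stand-in for a C-style type with explicit init/clear, in the spirit of
// mpz_t; this is not the GMP API.
struct raw_num { int *limbs; };
int live_count = 0;  // counts initialized-but-not-cleared objects
void raw_num_init(raw_num *n) { n->limbs = new int[4](); ++live_count; }
void raw_num_clear(raw_num *n) { delete[] n->limbs; n->limbs = nullptr; --live_count; }

// RAII wrapper: initializes in the constructor, releases in the destructor.
class auto_num {
public:
  auto_num() { raw_num_init(&m_num); }
  ~auto_num() { raw_num_clear(&m_num); }
  auto_num(const auto_num &) = delete;             // one owner only
  auto_num &operator=(const auto_num &) = delete;
  operator raw_num *() { return &m_num; }          // pass to C-style calls
private:
  raw_num m_num;
};

int sum_first_two(int a, int b) {
  auto_num tmp;                 // no explicit init or clear needed
  raw_num *p = tmp;
  p->limbs[0] = a;
  p->limbs[1] = b;
  return p->limbs[0] + p->limbs[1];
}                               // tmp released here, on every return path
```

The conversion operator keeps call sites close to the original C style, which is what makes converting existing users mostly mechanical.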
|
|
We record the flags to use for the intrinsics in aarch64_simd_intrinsic_data, so use it when initialising them
rather than using a hardcoded FLAG_AUTO_FP. The current vreinterpret intrinsics use FLAG_AUTO_FP anyway so this
patch is an NFC but this will be needed as we migrate more builtins into the intrinsics infrastructure.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc (aarch64_init_simd_intrinsics): Take
builtin flags from intrinsic data rather than hardcoded FLAG_AUTO_FP.
|
|
gcc/ada
* gcc-interface/utils.cc (unchecked_convert): Fixed typo.
|
|
The == operator for ranges signifies that two ranges contain the same
thing, not that they are ultimately equal. So [2,4] == [2,4], even
though one may be a 2 and the other may be a 3. Similarly with two
VARYING ranges.
There is an oversight in frange::operator== where we are returning
false for two identical NANs. This is causing us to never cache NANs
in sbr_sparse_bitmap::set_bb_range.
gcc/ChangeLog:
* value-range.cc (frange::operator==): Adjust for NAN.
(range_tests_nan): Remove some NAN tests.
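The distinction between range equality and value equality can be shown with a toy type. `ToyRange` and `same_range` are hypothetical and far simpler than frange, which also tracks signs and partial NAN information:

```cpp
// Toy floating-point range: either a pure NaN range or a closed
// interval [lo, hi].
struct ToyRange {
  bool is_nan;
  double lo;
  double hi;
};

// Range equality asks "do both describe the same set of values?", so two
// NaN ranges are equal even though NaN != NaN as a value.
bool same_range(const ToyRange &a, const ToyRange &b) {
  if (a.is_nan || b.is_nan) return a.is_nan && b.is_nan;
  return a.lo == b.lo && a.hi == b.hi;
}
```

A cache keyed on this comparison can recognize an already stored NaN range, which is the behavior the fix restores.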
|
|
This patch provides inchash support for vrange. It is along the lines
of the streaming support I just posted and will be used for IPA
hashing of ranges.
gcc/ChangeLog:
* inchash.cc (hash::add_real_value): New.
* inchash.h (class hash): Add add_real_value.
* value-range.cc (add_vrange): New.
* value-range.h (inchash::add_vrange): New.
|
|
Access diagnostics visits the SSA def-use chains to diagnose things like
dangling pointer uses. When that runs into PHIs it tries to prove
all incoming pointers of which one is the currently visited use are
related to decide whether to keep looking for the PHI def uses.
That turns out to be overly optimistic and thus costly. The following
scraps the existing handling for simply requiring that we eventually
visit all incoming pointers of the PHI during the def-use chain
analysis and only then process uses of the PHI def.
Note this handles backedges of natural loops optimistically, diagnosing
the first iteration. There's gcc.dg/Wuse-after-free-2.c containing
a testcase requiring this.
PR tree-optimization/109539
* gimple-ssa-warn-access.cc (pass_waccess::check_pointer_uses):
Re-implement pointer relatedness for PHIs.
|
|
Implement FP division using hardware instructions. This replaces both the
softfp library calls, and the -ffast-math inaccurate division we had previously.
The GCN architecture does not have a single divide instruction, but it does
have a number of support instructions designed to make multiply-by-reciprocal
sufficiently accurate for non-fast-math usage.
gcc/ChangeLog:
* config/gcn/gcn-valu.md (SV_SFDF): New iterator.
(SV_FP): New iterator.
(scalar_mode, SCALAR_MODE): Add identity mappings for scalar modes.
(recip<mode>2): Unify the two patterns using SV_FP.
(div_scale<mode><exec_vcc>): New insn.
(div_fmas<mode><exec>): New insn.
(div_fixup<mode><exec>): New insn.
(div<mode>3): Unify the two expanders and rewrite using hardfp.
* config/gcn/gcn.cc (gcn_md_reorg): Support "vccwait" attribute.
* config/gcn/gcn.md (unspec): Add UNSPEC_DIV_SCALE, UNSPEC_DIV_FMAS,
and UNSPEC_DIV_FIXUP.
(vccwait): New attribute.
gcc/testsuite/ChangeLog:
* gcc.target/gcn/fpdiv.c: Remove the -ffast-math requirement.
|
|
We should redirect users of the erroneous -mcpu=armv8.2-a to use -march instead.
There is an equivalent hint for -march used with a CPU name.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_validate_mcpu): Add hint to use -march
if the argument matches that.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/spellcheck_11.c: New test.
|
|
This patch is a straightforward extension of the zero-extending LDAPR
pattern to represent QI -> HI load-extends. This maps down to a LDAPRB-W
instruction.
This lets us remove a redundant zero-extend in the new test function.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/atomics.md
(*aarch64_atomic_load<ALLX:mode>_rcpc_zext):
Use SD_HSDI for destination mode iterator.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/ldapr-zext.c: Add test for u8 to u16
extension.
|
|
riscv-spec and binutils.
The current order of gcc and binutils parsing extensions is inconsistent.
According to the latest RISC-V spec, the canonical order in which extension names must
appear in the name string specified in Table 29.1 is different from before.
In the latest table, non-standard extensions must be listed after all standard
extensions. To keep consistent, we now change the parsing order.
Related llvm patch links:
https://reviews.llvm.org/D148315
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc (multi_letter_subset_rank): Swap the order
of z-extensions and s-extensions.
(riscv_subset_list::parse): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/arch-5.c: Likewise.
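A sketch of prefix-based ordering, assuming the resulting order is z-extensions, then s-extensions, then non-standard x-extensions. The ranks and the alphabetical tie-break are assumptions for illustration; `prefix_rank` and `sort_extensions` are hypothetical and do not mirror multi_letter_subset_rank:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Assumed ranks for multi-letter prefixes: z, then s, then x.
int prefix_rank(const std::string &ext) {
  switch (ext.empty() ? '\0' : ext[0]) {
    case 'z': return 0;
    case 's': return 1;
    case 'x': return 2;
    default:  return 3;  // unknown prefixes sort last in this sketch
  }
}

// Orders extension names by prefix rank, then alphabetically within a
// rank (a simplification of the real within-class ordering).
void sort_extensions(std::vector<std::string> &exts) {
  std::stable_sort(exts.begin(), exts.end(),
                   [](const std::string &a, const std::string &b) {
                     int ra = prefix_rank(a), rb = prefix_rank(b);
                     return ra != rb ? ra < rb : a < b;
                   });
}
```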
|
|
match.pd has, mostly for AArch64, an optimization that turns
certain forms of __builtin_shuffle of x + y and x - y vectors into
fneg using a twice as wide element type, so that every other sign is
changed, followed by fadd.
The following patch extends that optimization, so that it can handle
other forms as well, using the same fneg but fsub instead of fadd.
As plus is commutative and minus is not, and I want to handle
vec_perm with plus minus and minus plus order preferably in one
pattern, I had to do the matching operand checks by hand.
2023-04-18 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/109240
* match.pd (fneg/fadd): Rewrite such that it handles both plus as
first vec_perm operand and minus as second using fneg/fadd and
minus as first vec_perm operand and plus as second using fneg/fsub.
* gcc.target/aarch64/simd/addsub_2.c: New test.
* gcc.target/aarch64/sve/addsub_2.c: New test.
|
|
In upcoming patches I will contribute code to stream out frange's as
well as vrange's. This patch abstracts out the REAL_VALUE_TYPE
streaming into their own functions, so that they may be used elsewhere.
gcc/ChangeLog:
* data-streamer.cc (bp_pack_real_value): New.
(bp_unpack_real_value): New.
* data-streamer.h (bp_pack_real_value): New.
(bp_unpack_real_value): New.
* tree-streamer-in.cc (unpack_ts_real_cst_value_fields): Use
bp_unpack_real_value.
* tree-streamer-out.cc (pack_ts_real_cst_value_fields): Use
bp_pack_real_value.
|
|
I'm about to add one more use of the same snippet of code, for a total
of 4 identical calculations in the code base.
gcc/ChangeLog:
* wide-int.h (WIDE_INT_MAX_HWIS): New.
(class fixed_wide_int_storage): Use it.
(trailing_wide_ints <N>::set_precision): Use it.
(trailing_wide_ints <N>::extra_size): Use it.
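The message does not spell out the repeated calculation. Assuming it is the usual round-up division of a bit precision by the host word width, it can be sketched as follows; `host_bits_per_wide_int` and `max_hwis` are illustrative names rather than the macro from wide-int.h:

```cpp
// Assumed word width; GCC's HOST_BITS_PER_WIDE_INT is typically 64.
constexpr unsigned host_bits_per_wide_int = 64;

// Number of host words needed to hold `precision` bits, rounding up.
constexpr unsigned max_hwis(unsigned precision) {
  return (precision + host_bits_per_wide_int - 1) / host_bits_per_wide_int;
}

static_assert(max_hwis(1) == 1, "one bit still needs a whole word");
static_assert(max_hwis(64) == 1, "exactly one word");
static_assert(max_hwis(65) == 2, "one bit over spills into a second word");
```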
|
|
1. Use addu16i.d for TARGET_64BIT and suitable immediates.
2. Split one addition with immediate into two addu16i.d or addi.{d/w}
instructions if possible. This can avoid using a temp register without
increasing the instruction count.
Inspired by https://reviews.llvm.org/D143710 and
https://reviews.llvm.org/D147222.
Bootstrapped and regtested on loongarch64-linux-gnu. Ok for GCC 14?
gcc/ChangeLog:
* config/loongarch/loongarch-protos.h
(loongarch_addu16i_imm12_operand_p): New function prototype.
(loongarch_split_plus_constant): Likewise.
* config/loongarch/loongarch.cc
(loongarch_addu16i_imm12_operand_p): New function.
(loongarch_split_plus_constant): Likewise.
* config/loongarch/loongarch.h (ADDU16I_OPERAND): New macro.
(DUAL_IMM12_OPERAND): Likewise.
(DUAL_ADDU16I_OPERAND): Likewise.
* config/loongarch/constraints.md (La, Lb, Lc, Ld, Le): New
constraint.
* config/loongarch/predicates.md (const_dual_imm12_operand): New
predicate.
(const_addu16i_operand): Likewise.
(const_addu16i_imm12_di_operand): Likewise.
(const_addu16i_imm12_si_operand): Likewise.
(plus_di_operand): Likewise.
(plus_si_operand): Likewise.
(plus_si_extend_operand): Likewise.
* config/loongarch/loongarch.md (add<mode>3): Convert to
define_insn_and_split. Use plus_<mode>_operand predicate
instead of arith_operand. Add alternatives for La, Lb, Lc, Ld,
and Le constraints.
(*addsi3_extended): Convert to define_insn_and_split. Use
plus_si_extend_operand instead of arith_operand. Add
alternatives for La and Le constraints.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/add-const.c: New test.
* gcc.target/loongarch/stack-check-cfa-1.c: Adjust for stack
frame size change.
* gcc.target/loongarch/stack-check-cfa-2.c: Likewise.
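The splitting in point 2 can be sketched for the simplest case, two addi instructions with signed 12-bit immediates. `split_dual_imm12` is a hypothetical helper; the addu16i.d combinations from the patch are not covered here:

```cpp
// Signed 12-bit immediate range of addi.{w,d}.
constexpr long imm12_min = -2048;
constexpr long imm12_max = 2047;

constexpr bool fits_imm12(long v) { return v >= imm12_min && v <= imm12_max; }

// If `v` does not fit one addi but is the sum of two imm12 values, write
// the two pieces and return true; otherwise return false.
bool split_dual_imm12(long v, long &first, long &second) {
  if (fits_imm12(v) || v < 2 * imm12_min || v > 2 * imm12_max) return false;
  first = v > 0 ? imm12_max : imm12_min;  // take as much as one addi allows
  second = v - first;                     // remainder fits by the range check
  return true;
}
```

Adding 3000, for example, becomes an add of 2047 followed by an add of 953: still two instructions, as with loading the constant into a temporary first, but with no temp register.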
|
|
This is for upcoming work in this area.
gcc/ChangeLog:
* value-range.h (Value_Range::Value_Range): New.
(Value_Range::contains_p): New.
|
|
The discriminator in vrange cannot change after construction,
similarly the number of allocated ranges in an irange. It's best to
make them constant to avoid invalid changes.
gcc/ChangeLog:
* value-range.h (class vrange): Make m_discriminator const.
(class irange): Make m_max_ranges const. Adjust constructors
accordingly.
(class unsupported_range): Construct vrange appropriately.
(class frange): Same.
|
|
under the architecture and use the default definition instead.
In some cases, setting this macro as the default can reduce the number of conditional
branch instructions.
gcc/ChangeLog:
* config/loongarch/loongarch.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Remove the macro
definition.
|
|
set instructions.
gcc/ChangeLog:
* doc/extend.texi: Add section for LoongArch Base Built-in functions.
|
|
|
|
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_first_stack_step): Make code more
readable.
(riscv_expand_epilogue): Likewise.
|
|
Here when level lowering the bound ttp TT<typename T::type> via the
substitution T=C, we're neglecting to canonicalize (and thereby strip
of simple typedefs) the substituted template arguments {A<int>} before
determining the new canonical type via hash table lookup. This leads to
a hash mismatch ICE for the two equivalent types TT<int> and TT<A<int>>
since iterative_hash_template_arg assumes type arguments are already
canonicalized.
We can fix this by canonicalizing or coercing the substituted arguments
directly, but seeing as creation and ordinary substitution of bound ttps
both go through lookup_template_class, which in turn performs the desired
coercion/canonicalization, it seems preferable to make this code path go
through lookup_template_class as well.
PR c++/109531
gcc/cp/ChangeLog:
* pt.cc (tsubst) <case BOUND_TEMPLATE_TEMPLATE_PARM>:
In the level-lowering case just use lookup_template_class
to rebuild the bound ttp.
gcc/testsuite/ChangeLog:
* g++.dg/template/canon-type-20.C: New test.
* g++.dg/template/ttp36.C: New test.
|
|
The stack space that save-restore reserves is not accounted for well in stack allocation and deallocation.
This patch allows fewer instructions to be used in stack allocation and deallocation if save-restore is enabled.
before patch:
bar:
  call t0,__riscv_save_4
  addi sp,sp,-64
  ...
  li t0,-12288
  addi t0,t0,-1968 # optimized out after patch
  add sp,sp,t0 # prologue
  ...
  li t0,12288 # epilogue
  addi t0,t0,2000 # optimized out after patch
  add sp,sp,t0
  ...
  addi sp,sp,32
  tail __riscv_restore_4
after patch:
bar:
  call t0,__riscv_save_4
  addi sp,sp,-2032
  ...
  li t0,-12288
  add sp,sp,t0 # prologue
  ...
  li t0,12288 # epilogue
  add sp,sp,t0
  ...
  addi sp,sp,2032
  tail __riscv_restore_4
gcc/
* config/riscv/riscv.cc (riscv_expand_prologue): Consider save-restore in
stack allocation.
(riscv_expand_epilogue): Consider save-restore in stack deallocation.
gcc/testsuite
* gcc.target/riscv/stack_save_restore.c: New test.
|
|
gate_hoist_loads is defined before its usage so there is
no reason for the declaration (prototype) to be there.
Committed as obvious after a bootstrap/test on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (gate_hoist_loads): Remove
prototype.
|
|
A warning pass should not be exporting global ranges it finds along
the way, because that will alter the behavior of future passes.
The present behavior was there because of some long-forgotten
regression in another pass. This regression is no longer
there, and if there's ever any fallout from cleaning this up, we can
address it in the pass that is missing some information.
gcc/ChangeLog:
* gimple-ssa-warn-alloca.cc (pass_walloca::execute): Do not export
global ranges.
|
|
These functions are NOPs on the soft-float ABIs. Since we're already
forcing the ISA, let's just force the ABI too.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/xtheadfmv-fmv.c: Force the ilp32d ABI.
|
|
The RVV test harness currently sets the ISA according to the target
tuple, but doesn't also set the ABI. This just sets the ABI to match
the ISA, though we should really also be respecting the user's specific
ISA to test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/rvv.exp (gcc_mabi): New variable.
|
|
The test case that was added is rv64i-specific, as there are better ways
to generate this code on rv32i (where the long/int cast is a NOP) and on
rv64i_zba (where we have word shifts). This renames the original test
case and adds two more for those targets.
gcc/testsuite/ChangeLog:
PR target/106602
* gcc.target/riscv/pr106602.c: Moved to...
* gcc.target/riscv/pr106602-rv64i.c: ...here.
* gcc.target/riscv/pr106602-rv32i.c: New test.
* gcc.target/riscv/pr106602-rv64i_zba.c: New test.
|