The following makes sure the copy edges we ignore or need to
special-case are handled only once, by removing them.
* tree-ssa-structalias.cc (solve_graph): Remove self-copy
edges, remove edges from escaped after special-casing them.
|
|
The following fixes the escape special casing to test the proper
variable IDs.
* tree-ssa-structalias.cc (do_sd_constraint): Fixup escape
special casing.
|
|
* tree-ssa-structalias.cc (do_sd_constraint): Do not write
to the LHS varinfo solution member.
|
|
Since we do not update successor edges when merging nodes, we have
to deal with this in the users. The following avoids putting those
on the topo order vector.
* tree-ssa-structalias.cc (topo_visit): Look at the real
destination of edges.
|
|
The following adjusts tree_[transform_and_]unroll_loop to set an
upper bound on the number of iterations of the epilogue loop it
creates. For the testcase at hand, which involves array prefetching,
this avoids applying RTL unrolling to the epilogue loops when
-funroll-loops is specified.
Other users of this API include predictive commoning and
unroll-and-jam.
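A hypothetical illustration (not the committed testcase): when a loop
is unrolled by a factor of four, the epilogue handles the remaining
n % 4 iterations, so an upper bound of three iterations can be
recorded for it:
  void
  f (int *a, int n)
  {
    /* Unrolled 4x by tree_transform_and_unroll_loop; the epilogue
       loop then runs n % 4 times, i.e. at most 3 iterations, which
       the recorded bound now makes visible to the RTL unroller.  */
    for (int i = 0; i < n; i++)
      a[i] += 1;
  }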
PR tree-optimization/44794
* tree-ssa-loop-manip.cc (tree_transform_and_unroll_loop):
If an epilogue loop is required set its iteration upper bound.
|
|
We'd been generating really bad block move sequences, which kernel
developers who tried __builtin_memcpy recently complained about. To
improve it:
1. Take advantage of -mno-strict-align. When it is set, set the mode
size to UNITS_PER_WORD regardless of the alignment.
2. Halve the mode size when (block size) % (mode size) != 0, instead of
falling back to ld.bu/st.b at once (see the illustration below).
3. Limit the length of the block move sequence by the number of
instructions, not the size of the block. When -mstrict-align is set and
the block is not aligned, the old size limit for the straight-line
implementation (64 bytes) was definitely too large (we don't have 64
registers anyway).
Change since v1: add a comment about the calculation of num_reg.
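As an illustration of point 2 (a hypothetical example, not one of the
committed tests): with UNITS_PER_WORD == 8 and -mno-strict-align, a
15-byte copy can now use 8-, 4-, 2- and 1-byte accesses instead of
fifteen ld.bu/st.b pairs:
  char a[15], b[15];

  void
  copy15 (void)
  {
    /* Expected to expand to ld.d/st.d + ld.w/st.w + ld.h/st.h
       + ld.bu/st.b, halving the access size for the left-overs.  */
    __builtin_memcpy (a, b, 15);
  }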
gcc/ChangeLog:
PR target/109465
* config/loongarch/loongarch-protos.h
(loongarch_expand_block_move): Add a parameter as alignment RTX.
* config/loongarch/loongarch.h:
(LARCH_MAX_MOVE_BYTES_PER_LOOP_ITER): Remove.
(LARCH_MAX_MOVE_BYTES_STRAIGHT): Remove.
(LARCH_MAX_MOVE_OPS_PER_LOOP_ITER): Define.
(LARCH_MAX_MOVE_OPS_STRAIGHT): Define.
(MOVE_RATIO): Use LARCH_MAX_MOVE_OPS_PER_LOOP_ITER instead of
LARCH_MAX_MOVE_BYTES_PER_LOOP_ITER.
* config/loongarch/loongarch.cc (loongarch_expand_block_move):
Take the alignment from the parameter, but set it to
UNITS_PER_WORD if !TARGET_STRICT_ALIGN. Limit the length of
straight-line implementation with LARCH_MAX_MOVE_OPS_STRAIGHT
instead of LARCH_MAX_MOVE_BYTES_STRAIGHT.
(loongarch_block_move_straight): When there are left-over bytes,
halve the mode size instead of falling back to byte mode at once.
(loongarch_block_move_loop): Limit the length of loop body with
LARCH_MAX_MOVE_OPS_PER_LOOP_ITER instead of
LARCH_MAX_MOVE_BYTES_PER_LOOP_ITER.
* config/loongarch/loongarch.md (cpymemsi): Pass the alignment
to loongarch_expand_block_move.
gcc/testsuite/ChangeLog:
PR target/109465
* gcc.target/loongarch/pr109465-1.c: New test.
* gcc.target/loongarch/pr109465-2.c: New test.
* gcc.target/loongarch/pr109465-3.c: New test.
|
|
The LoongArch backend used to save all GARs for a function with variable
arguments. But sometimes a function only accepts variable arguments for
a purpose like C++ function overloading. For example, POSIX defines
open() as:
int open(const char *path, int oflag, ...);
But only two forms are actually used:
int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode);
So it's obviously a waste to save all 8 GARs in open(). We can use the
cfun->va_list_gpr_size field set by the stdarg pass to save only the
GARs that actually need to be saved.
If the va_list escapes (for example, in fprintf() we pass it to
vfprintf()), stdarg would set cfun->va_list_gpr_size to 255 so we
don't need a special case.
With this patch, only one GAR ($a2/$r6) is saved in open(). Ideally
even this stack store should be omitted too, but doing so is not trivial
and AFAIK there are no compilers (for any target) performing the "ideal"
optimization here, see https://godbolt.org/z/n1YqWq9c9.
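A minimal sketch of the mechanism (a hypothetical function, not the
committed va_arg.c test):
  int
  second (int n, ...)
  {
    __builtin_va_list ap;
    __builtin_va_start (ap, n);
    int r = __builtin_va_arg (ap, int);
    __builtin_va_end (ap);
    return r;
  }
Here the stdarg pass records in cfun->va_list_gpr_size how many bytes
of the GPR save area can actually be read through ap, so only the
corresponding GARs need to be saved instead of all eight.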
Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk
(GCC 14 or now)?
gcc/ChangeLog:
* config/loongarch/loongarch.cc
(loongarch_setup_incoming_varargs): Don't save more GARs than
cfun->va_list_gpr_size / UNITS_PER_WORD.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/va_arg.c: New test.
|
|
The following fixes the condition determining whether we need an
epilogue.
* tree-ssa-loop-manip.cc (determine_exit_conditions): Fix
no epilogue condition.
|
|
The following simplifies and outlines gimple_assign_load. In
particular it is not necessary to get at the base of the possibly
loaded expression; it suffices to handle the case of a single handled
component wrapping a non-memory operand.
* gimple.h (gimple_assign_load): Outline...
* gimple.cc (gimple_assign_load): ... here. Avoid
get_base_address and instead just strip the outermost
handled component, treating a remaining handled component
as a load.
|
|
I don't think we need to keep the __builtin_aarch64_neg* builtins around.
They are only used once, in the vnegh_f16 intrinsic in arm_fp16.h, and AFAICT
it was added this way only for the sake of orthogonality in
https://gcc.gnu.org/g:d7f33f07d88984cbe769047e3d07fc21067fbba9
We already use normal "-" negation in the other vneg* intrinsics, so do so here as well.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-simd-builtins.def (neg): Delete builtins
definition.
* config/aarch64/arm_fp16.h (vnegh_f16): Reimplement using normal negation.
|
|
For __builtin_popcountll tree-vect-patterns.cc has
vect_recog_popcount_pattern, which improves the vectorized code.
Without that the vectorization is always multi-type vectorization
in the loop (at least int and long long types) where we emit two
.POPCOUNT calls with long long arguments and int return value and then
widen to long long, so effectively after vectorization do the
V?DImode -> V?DImode popcount twice, then pack the result into V?SImode
and immediately unpack.
The following patch extends that handling to the __builtin_{clz,ctz,ffs}ll
builtins as well (as long as there is an optab for them; more to come
later).
x86 can do __builtin_popcountll with -mavx512vpopcntdq, __builtin_clzll
with -mavx512cd, ppc can do __builtin_popcountll and __builtin_clzll
with -mpower8-vector and __builtin_ctzll with -mpower9-vector, s390
can do __builtin_{popcount,clz,ctz}ll with -march=z13 -mzarch (i.e. VX).
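For instance, a loop like the following (a sketch, not the committed
pr109011-1.c) can now be vectorized using a single .CLZ on V?DImode
per iteration on x86 with -mavx512cd:
  void
  foo (long long *p, int *q, int n)
  {
    for (int i = 0; i < n; i++)
      q[i] = __builtin_clzll (p[i]);
  }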
2023-04-19 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/109011
* tree-vect-patterns.cc (vect_recog_popcount_pattern): Rename to ...
(vect_recog_popcount_clz_ctz_ffs_pattern): ... this. Handle also
CLZ, CTZ and FFS. Remove vargs variable, use
gimple_build_call_internal rather than gimple_build_call_internal_vec.
(vect_vect_recog_func_ptrs): Adjust popcount entry.
* gcc.dg/vect/pr109011-1.c: New test.
|
|
WORD_REGISTER_OPERATIONS targets [PR109040]
While we've agreed this is not the right fix for the PR109040 bug,
the patch clearly improves generated code (at least on the testcase from the
PR), so I'd like to propose this as optimization heuristics improvement
for GCC 14.
2023-04-19 Jakub Jelinek <jakub@redhat.com>
PR target/109040
* dse.cc (replace_read): If read_reg is a SUBREG of a word mode
REG, for WORD_REGISTER_OPERATIONS copy SUBREG_REG of it into
a new REG rather than the SUBREG.
|
|
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (aarch64_simd_vec_set_zero<mode>):
New pattern.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/vec-set-zero.c: New test.
|
|
shift amount masking
In this PR we fail to eliminate explicit &31 operations for variable shifts such as in:
void
bar (int x[3], int y)
{
x[0] <<= (y & 31);
x[1] <<= (y & 31);
x[2] <<= (y & 31);
}
This is rejected by RTX costs that end up giving too high a cost for:
(set (reg:SI 96)
(ashift:SI (reg:SI 98)
(subreg:QI (and:SI (reg:SI 99)
(const_int 31 [0x1f])) 0)))
There is code to handle the AND-31 case in rtx costs, but it gets confused by the subreg.
It's easy enough to fix by looking inside the subreg when costing the expression.
While doing that I noticed that the ASHIFT case and the other shift-like cases are almost identical
and we should just merge them. This code will only be used for valid insns anyway, so the code after this
patch should do the Right Thing (TM) for all such shift cases.
With this patch there are no more "and wn, wn, 31" instructions left in the testcase.
Bootstrapped and tested on aarch64-none-linux-gnu.
PR target/108840
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_rtx_costs): Merge ASHIFT and
ROTATE, ROTATERT, LSHIFTRT, ASHIFTRT cases. Handle subregs in op1.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/pr108840.c: New test.
|
|
The following addresses quadraticness in processing debug insns
in delete_trivially_dead_insns and insn_live_p by using TREE_VISITED
on the INSN_VAR_LOCATION_DECL to indicate a later debug bind
with the same decl and no intervening real insn or debug marker.
That gets rid of the NEXT_INSN walk in insn_live_p in favor of
first clearing TREE_VISITED in the first loop over the insns, plus
book-keeping of the decls we set the bit on, since we need to clear
the bits again when visiting a real insn or a debug marker insn.
That improves the time spent in delete_trivially_dead_insns from
10.6s to 2.2s for the testcase.
PR rtl-optimization/109237
* cse.cc (insn_live_p): Remove NEXT_INSN walk, instead check
TREE_VISITED on INSN_VAR_LOCATION_DECL.
(delete_trivially_dead_insns): Maintain TREE_VISITED on
active debug bind INSN_VAR_LOCATION_DECL.
|
|
For the testcase, bb_is_just_return is on top of the profile; changing
it to walk the BB insns backwards puts it off the profile. That's
because in the forward walk you have to process possibly many debug
insns, but in a backward walk you very likely run into control insns first.
PR rtl-optimization/109237
* cfgcleanup.cc (bb_is_just_return): Walk insns backwards.
|
|
This testcase was reduced such that it isn't valid C++23, so with my
usual testing with GXX_TESTSUITE_STDS=98,11,14,17,20,2b it fails:
FAIL: g++.dg/pr109524.C -std=gnu++2b (test for excess errors)
.../gcc/testsuite/g++.dg/pr109524.C: In function 'nn hh(nn)':
.../gcc/testsuite/g++.dg/pr109524.C:35:12: error: cannot bind non-const lvalue reference of type 'nn&' to an rvalue of type 'nn'
.../gcc/testsuite/g++.dg/pr109524.C:17:6: note: initializing argument 1 of 'nn::nn(nn&)'
The following patch fixes that, and I've verified it doesn't change
anything in what the test was testing: it still ICEs in r13-7198 and
passes in r13-7203, now in all language modes (except for 98, where
it is intentionally UNSUPPORTED).
2023-04-19 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/109524
* g++.dg/pr109524.C (nn::nn): Change argument type from nn & to
const nn &.
|
|
When I committed the patches to enable support for DFP on AArch64, I
forgot to update the installation documentation.
This patch adds AArch64 as needed (same as i386/x86_64).
2023-04-17 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* doc/install.texi (enable-decimal-float): Add AArch64.
|
|
with different reg classes.
There's a potential performance issue when the backend returns some
unreasonable value for a mode which can never be allocated with the
reg class.
gcc/ChangeLog:
PR rtl-optimization/109351
* ira.cc (setup_class_subset_and_memory_move_costs): Check
hard_regno_mode_ok before setting lowest memory move cost for
the mode with different reg classes.
|
|
|
@gol was removed in r13-6778; new doc additions can't use it.
gcc/ChangeLog:
* doc/invoke.texi: Remove stray @gol.
|
|
gcc/
* ifcvt.cc (cond_move_process_if_block): Consider the result of
targetm.noce_conversion_profitable_p() when replacing the original
sequence with the converted one.
|
|
gcc/
* common.opt (gcodeview): Add new option.
* gcc.cc (driver_handle_option): Handle OPT_gcodeview.
* opts.cc (command_handle_option): Similarly.
* doc/invoke.texi: Add documentation for -gcodeview.
|
|
This moves around the code for tree_ssa_cs_elim, slightly
improving code readability and removing declarations that
are no longer needed.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Remove declaration.
(make_pass_phiopt): Make execute out of line.
(tree_ssa_cs_elim): Move code into ...
(pass_cselim::execute): ... here.
|
|
gcc/ChangeLog:
* system.h: Drop unused INCLUDE_PTHREAD_H.
|
|
vect_grouped_store_supported
gcc/ChangeLog:
* tree-vect-data-refs.cc (vect_grouped_store_supported): Add new
condition.
|
|
gcc/
* config/riscv/bitmanip.md (rotr<mode>3 expander): Enable for ZBKB.
(bswapdi2, bswapsi2): Similarly.
|
|
INSERTPS can select any element from src and insert it into any place
of the dest. For SSE4.1 targets, the compiler can generate e.g.
insertps $64, %xmm0, %xmm1
to insert element 1 of %xmm0 into element 0 of %xmm1.
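For example, a permute that selects element 1 of the second operand
into element 0 of the first (a hypothetical illustration; the
committed tests may differ) can now become a single insertps with
-msse4.1:
  typedef float v4sf __attribute__ ((vector_size (16)));
  typedef int v4si __attribute__ ((vector_size (16)));

  v4sf
  f (v4sf x, v4sf y)
  {
    /* Indices 4-7 select from y; {5, 1, 2, 3} puts y[1] into
       lane 0 while keeping lanes 1-3 of x.  */
    return __builtin_shuffle (x, y, (v4si) { 5, 1, 2, 3 });
  }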
gcc/ChangeLog:
PR target/94908
* config/i386/i386-builtin.def (__builtin_ia32_insertps128):
Use CODE_FOR_sse4_1_insertps_v4sf.
* config/i386/i386-expand.cc (expand_vec_perm_insertps): New.
(expand_vec_perm_1): Call expand_vec_perm_insertps.
* config/i386/i386.md ("unspec"): Declare UNSPEC_INSERTPS here.
* config/i386/mmx.md (mmxscalarmode): New mode attribute.
(@sse4_1_insertps_<mode>): New insn pattern.
* config/i386/sse.md (@sse4_1_insertps_<mode>): Macroize insn
pattern from sse4_1_insertps using VI4F_128 mode iterator.
gcc/testsuite/ChangeLog:
PR target/94908
* gcc.target/i386/pr94908.c: New test.
* gcc.target/i386/sse4_1-insertps-5.c: New test.
* gcc.target/i386/vperm-v4sf-2-sse4.c: New test.
|
|
IPA currently puts *some* irange's in GC memory. When I contribute
support for generic ranges in IPA, we'll need to change this to
vrange. This patch adds GTY support for both vrange and frange.
gcc/ChangeLog:
* value-range.cc (gt_ggc_mx): New.
(gt_pch_nx): New.
* value-range.h (class vrange): Add GTY marker.
(class frange): Same.
(gt_ggc_mx): Remove.
(gt_pch_nx): Remove.
|
|
The function `constrain_operands' lacked the logic to consider relaxed
memory constraints when "traditional" memory constraints were not
satisfied, creating potential issues as observed during the reload
compilation pass.
In addition, it was observed that while `constrain_operands' chooses
to disregard constraints when more than one alternative is provided,
e.g. "m,r" using CONSTRAINT__UNKNOWN, it has no checks in place to
determine whether the multiple constraints in a given string are in
fact repetitions of the same constraint and should thus in fact be
treated as a single constraint, as ought to be the case for something
like "m,m".
Both of these issues are dealt with here, thus ensuring that we get
appropriate pattern matching.
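A sketch of the uniqueness test (an assumption about the shape of
constraint_unique, not the committed code; the real version also has
to cope with multi-character constraint names):
  #include <cstring>

  /* Return true if every comma-separated alternative in CSTR repeats
     the same constraint, e.g. "m,m" but not "m,r".  */
  static bool
  constraint_unique_sketch (const char *cstr)
  {
    const char *comma = strchr (cstr, ',');
    if (!comma)
      return true;
    size_t len = comma - cstr;
    for (const char *alt = comma + 1; *alt; )
      {
        if (strncmp (alt, cstr, len) != 0
            || (alt[len] != ',' && alt[len] != '\0'))
          return false;
        alt += len;
        if (*alt == ',')
          ++alt;
      }
    return true;
  }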
gcc/
* lra-constraints.cc (constraint_unique): New.
(process_address_1): Apply constraint_unique test.
* recog.cc (constrain_operands): Allow relaxed memory
constraints.
|
|
Document which version of the RISC-V vector intrinsics GCC has
implemented.
gcc/ChangeLog:
* doc/extend.texi (Target Builtins): Add RISC-V Vector
Intrinsics.
(RISC-V Vector Intrinsics): Document which version of the
RISC-V vector intrinsics GCC implements, and its reference.
|
|
This adds bitmap_clear_first_set_bit and uses it where previously
bitmap_clear_bit followed bitmap_first_set_bit. The advantage
is speeding up the search and avoiding clobbering ->current.
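The caller-side change is mechanical; a sketch using the GCC-internal
bitmap API (work stands for whichever worklist bitmap a pass uses):
  /* Before: two searches through the bitmap.  */
  unsigned i = bitmap_first_set_bit (work);
  bitmap_clear_bit (work, i);

  /* After: a single search that also clears the bit.  */
  unsigned i = bitmap_clear_first_set_bit (work);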
PR middle-end/108786
* bitmap.h (bitmap_clear_first_set_bit): New.
* bitmap.cc (bitmap_first_set_bit_worker): Rename from
bitmap_first_set_bit and add optional clearing of the bit.
(bitmap_first_set_bit): Wrap bitmap_first_set_bit_worker.
(bitmap_clear_first_set_bit): Likewise.
* df-core.cc (df_worklist_dataflow_doublequeue): Use
bitmap_clear_first_set_bit.
* graphite-scop-detection.cc (scop_detection::merge_sese):
Likewise.
* sanopt.cc (sanitize_asan_mark_unpoison): Likewise.
(sanitize_asan_mark_poison): Likewise.
* tree-cfgcleanup.cc (cleanup_tree_cfg_noloop): Likewise.
* tree-into-ssa.cc (rewrite_blocks): Likewise.
* tree-ssa-dce.cc (simple_dce_from_worklist): Likewise.
* tree-ssa-sccvn.cc (do_rpo_vn_1): Likewise.
|
|
The following makes it possible to get PTA stats with -stats without
blowing up your filesystem, by guarding constraint and solution
dumping with TDF_DETAILS and the SSA points-to info with TDF_DETAILS
or TDF_ALIAS.
* tree-ssa-structalias.cc (dump_sa_stats): Split out from...
(dump_sa_points_to_info): ... this function.
(compute_points_to_sets): Guard large dumps with TDF_DETAILS,
and call dump_sa_stats guarded with TDF_STATS.
(ipa_pta_execute): Likewise.
(compute_may_aliases): Guard dump_alias_info with
TDF_DETAILS|TDF_ALIAS.
* gcc.dg/ipa/ipa-pta-16.c: Use -details for dump.
* gcc.dg/tm/alias-1.c: Likewise.
* gcc.dg/tm/alias-2.c: Likewise.
* gcc.dg/torture/ipa-pta-1.c: Likewise.
* gcc.dg/torture/pr39074-2.c: Likewise.
* gcc.dg/torture/pr39074.c: Likewise.
* gcc.dg/torture/pta-callused-1.c: Likewise.
* gcc.dg/torture/pta-escape-1.c: Likewise.
* gcc.dg/torture/pta-ptrarith-1.c: Likewise.
* gcc.dg/torture/pta-ptrarith-2.c: Likewise.
* gcc.dg/torture/pta-ptrarith-3.c: Likewise.
* gcc.dg/torture/pta-structcopy-1.c: Likewise.
* gcc.dg/torture/ssa-pta-fn-1.c: Likewise.
* gcc.dg/tree-ssa/alias-19.c: Likewise.
* gcc.dg/tree-ssa/pta-callused.c: Likewise.
* gcc.dg/tree-ssa/pta-fp.c: Likewise.
* gcc.dg/tree-ssa/pta-ptrarith-1.c: Likewise.
* gcc.dg/tree-ssa/pta-ptrarith-2.c: Likewise.
|
|
While debugging PHI-OPT with match-and-simplify,
I found that adding more dumping to the debug dumps made
it easier to understand what was going on than stepping
through in the debugger, so this adds them. Note I used TDF_FOLDING
rather than TDF_DETAILS as these debug messages can be chatty and
are only needed if you are debugging match and simplify
with PHI-OPT; match and simplify uses TDF_FOLDING as
its check as well.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (gimple_simplify_phiopt): Dump
the expression that is being tried when TDF_FOLDING
is true.
(phiopt_worker::match_simplify_replacement): Dump
the sequence which was created by gimple_simplify_phiopt
when TDF_FOLDING is true.
|
|
We know that the statement we are moving already
has an SSA_NAME on the lhs, so we don't need to
check that and can just call reset_flow_sensitive_info
with the name we already have.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (match_simplify_replacement):
Simplify code that does the movement slightly.
|
|
I noticed that for the expansion of the __rev16* arm_acle.h intrinsics we don't need to use an unspec just because the operation doesn't match a bswap code neatly.
We have organic combine patterns for it that we can reuse.
This patch removes the define_insn using UNSPEC_REV (should it have been an UNSPEC_REV16?) and adds an expander to emit
the patterns we have for rev16 using standard RTL codes.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64.md (@aarch64_rev16<mode>): Change to
define_expand.
(rev16<mode>2): Rename to...
(aarch64_rev16<mode>2_alt1): ... This.
(rev16<mode>2_alt): Rename to...
(*aarch64_rev16<mode>2_alt2): ... This.
|
|
Negating dconst0 is getting pretty old, and we will keep adding copies
of the same idiom. Fixed by adding a dconstm0 constant to go along
with dconst1, dconstm1, etc.
gcc/ChangeLog:
* emit-rtl.cc (init_emit_once): Initialize dconstm0.
* gimple-range-op.cc (class cfn_signbit): Remove dconstm0
declaration.
* range-op-float.cc (zero_range): Use dconstm0.
(zero_to_inf_range): Same.
* real.h (dconstm0): New.
* value-range.cc (frange::flush_denormals_to_zero): Use dconstm0.
(frange::set_zero): Do not declare dconstm0.
|
|
The following adds two RAII classes, one for mpz_t and one for mpfr_t,
making object lifetime management easier. Both types otherwise require
explicit initialization with {mpz,mpfr}_init and release with
{mpz,mpfr}_clear.
I've converted two example places (where lifetime is trivial).
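A minimal sketch of the mpz_t wrapper (assuming this shape; the
committed class may differ in details):
  #include <gmp.h>

  class auto_mpz
  {
  public:
    auto_mpz () { mpz_init (m_mpz); }
    ~auto_mpz () { mpz_clear (m_mpz); }
    /* Let an auto_mpz be passed wherever an mpz_t is expected.  */
    operator mpz_t & () { return m_mpz; }
    /* Not copyable: the wrapped value owns its allocation.  */
    auto_mpz (const auto_mpz &) = delete;
    auto_mpz &operator= (const auto_mpz &) = delete;
  private:
    mpz_t m_mpz;
  };
A caller then simply declares 'auto_mpz bnd;' with no explicit
mpz_init/mpz_clear pair.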
* system.h (class auto_mpz): New.
* realmpfr.h (class auto_mpfr): Likewise.
* fold-const-call.cc (do_mpfr_arg1): Use auto_mpfr.
(do_mpfr_arg2): Likewise.
* tree-ssa-loop-niter.cc (bound_difference): Use auto_mpz.
|
|
We record the flags to use for the intrinsics in aarch64_simd_intrinsic_data, so use it when initialising them
rather than using a hardcoded FLAG_AUTO_FP. The current vreinterpret intrinsics use FLAG_AUTO_FP anyway so this
patch is an NFC, but this will be needed as we migrate more builtins into the intrinsics infrastructure.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc (aarch64_init_simd_intrinsics): Take
builtin flags from intrinsic data rather than hardcoded FLAG_AUTO_FP.
|
|
gcc/ada
* gcc-interface/utils.cc (unchecked_convert): Fix typo.
|
|
The == operator for ranges signifies that two ranges contain the same
thing, not that they are ultimately equal. So [2,4] == [2,4], even
though one may be a 2 and the other may be a 3. Similarly with two
VARYING ranges.
There is an oversight in frange::operator== where we are returning
false for two identical NANs. This is causing us to never cache NANs
in sbr_sparse_bitmap::set_bb_range.
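For reference, value equality genuinely fails for NANs (plain C,
independent of the GCC internals):
  #include <math.h>

  int
  nan_value_eq (void)
  {
    double n = NAN;
    return n == n;  /* 0: a NAN never compares equal as a value.  */
  }
which is exactly why operator== must mean identity ("contains the
same thing") for two identical NAN ranges to be cacheable.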
gcc/ChangeLog:
* value-range.cc (frange::operator==): Adjust for NAN.
(range_tests_nan): Remove some NAN tests.
|
|
This patch provides inchash support for vrange. It is along the lines
of the streaming support I just posted and will be used for IPA
hashing of ranges.
gcc/ChangeLog:
* inchash.cc (hash::add_real_value): New.
* inchash.h (class hash): Add add_real_value.
* value-range.cc (add_vrange): New.
* value-range.h (inchash::add_vrange): New.
|
|
The access diagnostics code visits the SSA def-use chains to diagnose
things like dangling pointer uses. When that runs into PHIs, it tries
to prove that all incoming pointers, one of which is the currently
visited use, are related, to decide whether to keep looking for uses
of the PHI def. That turns out to be overly optimistic and thus
costly. The following scraps the existing handling and simply
requires that we eventually visit all incoming pointers of the PHI
during the def-use chain analysis and only then process uses of the
PHI def.
Note this handles backedges of natural loops optimistically, diagnosing
the first iteration. There's gcc.dg/Wuse-after-free-2.c containing
a testcase requiring this.
PR tree-optimization/109539
* gimple-ssa-warn-access.cc (pass_waccess::check_pointer_uses):
Re-implement pointer relatedness for PHIs.
|
|
Implement FP division using hardware instructions. This replaces both the
softfp library calls and the inaccurate -ffast-math division we had previously.
The GCN architecture does not have a single divide instruction, but it does
have a number of support instructions designed to make multiply-by-reciprocal
sufficiently accurate for non-fast-math usage.
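The rough idea, sketched as scalar code (a hypothetical illustration;
the real sequence uses div_scale/div_fmas/div_fixup to stay
IEEE-accurate over the full range):
  float
  div_via_recip (float a, float b)
  {
    float r = 1.0f / b;      /* stands in for the hardware reciprocal estimate */
    r = r * (2.0f - b * r);  /* one Newton-Raphson refinement step */
    return a * r;            /* multiply by the refined reciprocal */
  }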
gcc/ChangeLog:
* config/gcn/gcn-valu.md (SV_SFDF): New iterator.
(SV_FP): New iterator.
(scalar_mode, SCALAR_MODE): Add identity mappings for scalar modes.
(recip<mode>2): Unify the two patterns using SV_FP.
(div_scale<mode><exec_vcc>): New insn.
(div_fmas<mode><exec>): New insn.
(div_fixup<mode><exec>): New insn.
(div<mode>3): Unify the two expanders and rewrite using hardfp.
* config/gcn/gcn.cc (gcn_md_reorg): Support "vccwait" attribute.
* config/gcn/gcn.md (unspec): Add UNSPEC_DIV_SCALE, UNSPEC_DIV_FMAS,
and UNSPEC_DIV_FIXUP.
(vccwait): New attribute.
gcc/testsuite/ChangeLog:
* gcc.target/gcn/fpdiv.c: Remove the -ffast-math requirement.
|
|
We should redirect users of the erroneous -mcpu=armv8.2-a to use -march instead.
There is an equivalent hint for -march used with a CPU name.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_validate_mcpu): Add hint to use -march
if the argument matches that.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/spellcheck_11.c: New test.
|
|
This patch is a straightforward extension of the zero-extending LDAPR
pattern to represent QI -> HI load-extends. This maps down to a LDAPRB-W
instruction.
This lets us remove a redundant zero-extend in the new test function.
Bootstrapped and tested on aarch64-none-linux-gnu.
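The shape of the newly-covered case (a sketch; the committed addition
to ldapr-zext.c may differ):
  unsigned short
  load_u8_to_u16 (unsigned char *p)
  {
    /* With RCpc this can now be a single LDAPRB to a W register,
       with no separate zero-extension instruction.  */
    return __atomic_load_n (p, __ATOMIC_ACQUIRE);
  }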
gcc/ChangeLog:
* config/aarch64/atomics.md
(*aarch64_atomic_load<ALLX:mode>_rcpc_zext):
Use SD_HSDI for destination mode iterator.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/ldapr-zext.c: Add test for u8 to u16
extension.
|
|
riscv-spec and binutils.
The current order in which gcc and binutils parse extensions is
inconsistent. According to the latest RISC-V spec, the canonical order
in which extension names must appear in the name string, specified in
Table 29.1, is different from before. In the latest table, non-standard
extensions must be listed after all standard extensions. To keep
consistent, we now change the parsing order.
Related llvm patch link:
https://reviews.llvm.org/D148315
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc (multi_letter_subset_rank): Swap the order
of z-extensions and s-extensions.
(riscv_subset_list::parse): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/arch-5.c: Likewise.
|
|
match.pd has mostly for AArch64 an optimization in which it optimizes
certain forms of __builtin_shuffle of x + y and x - y vectors into
fneg using twice as wide element type so that every other sign is changed,
followed by fadd.
The following patch extends that optimization, so that it can handle
other forms as well, using the same fneg but fsub instead of fadd.
As plus is commutative and minus is not, and I want to handle
vec_perm with plus/minus and minus/plus order, preferably in one
pattern, I had to do the matching operand checks by hand.
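A sketch of the newly-handled minus/plus order (a hypothetical
example; the committed addsub_2.c tests differ):
  typedef double v2df __attribute__ ((vector_size (16)));
  typedef long long v2di __attribute__ ((vector_size (16)));

  v2df
  f (v2df x, v2df y)
  {
    v2df a = x + y, b = x - y;
    /* Lane 0 from b (x - y), lane 1 from a (x + y): this equals
       x - (y[0], -y[1]), i.e. an fneg flipping the sign of y's odd
       lane in a twice-as-wide element mode, followed by fsub.  */
    return __builtin_shuffle (b, a, (v2di) { 0, 3 });
  }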
2023-04-18 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/109240
* match.pd (fneg/fadd): Rewrite such that it handles both plus as
first vec_perm operand and minus as second using fneg/fadd and
minus as first vec_perm operand and plus as second using fneg/fsub.
* gcc.target/aarch64/simd/addsub_2.c: New test.
* gcc.target/aarch64/sve/addsub_2.c: New test.
|
|
In upcoming patches I will contribute code to stream out frange's as
well as vrange's. This patch abstracts out the REAL_VALUE_TYPE
streaming into their own functions, so that they may be used elsewhere.
gcc/ChangeLog:
* data-streamer.cc (bp_pack_real_value): New.
(bp_unpack_real_value): New.
* data-streamer.h (bp_pack_real_value): New.
(bp_unpack_real_value): New.
* tree-streamer-in.cc (unpack_ts_real_cst_value_fields): Use
bp_unpack_real_value.
* tree-streamer-out.cc (pack_ts_real_cst_value_fields): Use
bp_pack_real_value.
|
|
I'm about to add one more use of the same snippet of code, for a total
of 4 identical calculations in the code base.
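The shared snippet rounds a bit precision up to a number of
HOST_WIDE_INTs; a sketch of the factored-out macro (assuming this
definition, which may differ from the committed one):
  /* Number of HOST_WIDE_INTs needed to store a value of PREC bits.  */
  #define WIDE_INT_MAX_HWIS(PREC) \
    (((PREC) + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT)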
gcc/ChangeLog:
* wide-int.h (WIDE_INT_MAX_HWIS): New.
(class fixed_wide_int_storage): Use it.
(trailing_wide_ints <N>::set_precision): Use it.
(trailing_wide_ints <N>::extra_size): Use it.
|