Age | Commit message | Author | Files | Lines |
|
Currently, rtl_ssa::change_insns requires all new uses and defs to be
specified explicitly. This turns out to be rather inconvenient for
forming load pairs in the new aarch64 load pair pass, as the pass has to
determine which mem def the final load pair consumes, and then obtain or
create a suitable use (i.e. significant bookkeeping, just to keep the
RTL-SSA IR consistent). It turns out to be much more convenient to
allow change_insns to infer which def is consumed and create a suitable
use of mem itself. This patch does that.
gcc/ChangeLog:
* rtl-ssa/changes.cc (function_info::finalize_new_accesses): Add new
parameter to give final insn position, infer use of mem if it isn't
specified explicitly.
(function_info::change_insns): Pass down final insn position to
finalize_new_accesses.
* rtl-ssa/functions.h: Add parameter to finalize_new_accesses.
|
|
This is needed by the upcoming aarch64 load pair pass, as it can
re-order stores (when alias analysis determines this is safe) and thus
change which mem def a given use consumes (in the RTL-SSA view, there is
no alias disambiguation of memory).
gcc/ChangeLog:
* rtl-ssa/accesses.cc (function_info::reparent_use): New.
* rtl-ssa/functions.h (function_info): Declare new member
function reparent_use.
|
|
Add a helper routine to access-utils.h which removes the memory access
from an access_array, if it has one.
gcc/ChangeLog:
* rtl-ssa/access-utils.h (drop_memory_access): New.
|
|
In the case that !insn->is_debug_insn () && next->is_debug_insn (), this
function was missing an update of the prev pointer on the first nondebug
insn following the sequence of debug insns starting at next.
This can lead to corruption of the insn chain, in that we end up with:
insn->next_any_insn ()->prev_any_insn () != insn
in this case. This patch fixes that.
gcc/ChangeLog:
* rtl-ssa/insns.cc (function_info::add_insn_after): Ensure we
update the prev pointer on the following nondebug insn in the
case that !insn->is_debug_insn () && next->is_debug_insn ().
|
|
gcc/ChangeLog:
* config/i386/i386.h: Correct the ISA enabled for Arrow Lake.
Also make Clearwater Forest depend on Sierra Forest.
* config/i386/i386-options.cc: Revise the order of the macro
definition to avoid confusion.
* doc/extend.texi: Revise documentation.
* doc/invoke.texi: Correct documentation.
gcc/testsuite/ChangeLog:
* gcc.target/i386/funcspec-56.inc: Group Clearwater Forest
with atom cores.
|
|
LLVM wants to remove it, which breaks our build. This patch means that
most users won't notice that change, when it comes, and those that do will
have chosen to enable Fiji explicitly.
I'm selecting gfx900 as the new default as that's the one users are least likely
to want, which means most users will specify -march explicitly, which means
we'll be free to change the default again, when we need to, without breaking
anybody's makefiles.
gcc/ChangeLog:
* config.gcc (amdgcn): Switch default to --with-arch=gfx900.
Implement support for --with-multilib-list.
* config/gcn/t-gcn-hsa: Likewise.
* doc/install.texi: Likewise.
* doc/invoke.texi: Mark Fiji deprecated.
|
|
This patch makes LoongArch use the new vector hooks and implements the costing
function determine_suggested_unroll_factor, making it able to suggest an unroll
factor for a given loop being vectorized based on the vec_ops analysis during
vector costing and the available issue information. This follows the aarch64
and rs6000 ports.
The patch also reduces the cost of unaligned stores, making it equal to the
cost of aligned ones in order to avoid odd alignment peeling.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_vector_costs): Inherit from
vector_costs. Add a constructor.
(loongarch_vector_costs::add_stmt_cost): Use adjust_cost_for_freq to
adjust the cost for inner loops.
(loongarch_vector_costs::count_operations): New function.
(loongarch_vector_costs::determine_suggested_unroll_factor): Ditto.
(loongarch_vector_costs::finish_cost): Ditto.
(loongarch_builtin_vectorization_cost): Adjust.
* config/loongarch/loongarch.opt (loongarch-vect-unroll-limit): New parameter.
(loongarch-vect-issue-info): Ditto.
(mmemvec-cost): Delete.
* config/loongarch/genopts/loongarch.opt.in
(loongarch-vect-unroll-limit): Ditto.
(loongarch-vect-issue-info): Ditto.
(mmemvec-cost): Delete.
* doc/invoke.texi (loongarch-vect-unroll-limit): Document new option.
|
|
Add support for vec_widen lo/hi patterns. These do not directly
match on LoongArch LASX instructions but can be emulated with
even/odd + vector merge.
gcc/ChangeLog:
* config/loongarch/lasx.md
(vec_widen_<su>mult_even_v8si): New patterns.
(vec_widen_<su>add_hi_<mode>): Ditto.
(vec_widen_<su>add_lo_<mode>): Ditto.
(vec_widen_<su>sub_hi_<mode>): Ditto.
(vec_widen_<su>sub_lo_<mode>): Ditto.
(vec_widen_<su>mult_hi_<mode>): Ditto.
(vec_widen_<su>mult_lo_<mode>): Ditto.
* config/loongarch/loongarch.md (u_bool): New iterator.
* config/loongarch/loongarch-protos.h
(loongarch_expand_vec_widen_hilo): New prototype.
* config/loongarch/loongarch.cc
(loongarch_expand_vec_interleave): New function.
(loongarch_expand_vec_widen_hilo): New function.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/vect-widen-add.c: New test.
* gcc.target/loongarch/vect-widen-mul.c: New test.
* gcc.target/loongarch/vect-widen-sub.c: New test.
|
|
gcc/ChangeLog:
* config/loongarch/lasx.md
(avg<mode>3_ceil): New patterns.
(uavg<mode>3_ceil): Ditto.
(avg<mode>3_floor): Ditto.
(uavg<mode>3_floor): Ditto.
(usadv32qi): Ditto.
(ssadv32qi): Ditto.
* config/loongarch/lsx.md
(avg<mode>3_ceil): New patterns.
(uavg<mode>3_ceil): Ditto.
(avg<mode>3_floor): Ditto.
(uavg<mode>3_floor): Ditto.
(usadv16qi): Ditto.
(ssadv16qi): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/avg-ceil-lasx.c: New test.
* gcc.target/loongarch/avg-ceil-lsx.c: New test.
* gcc.target/loongarch/avg-floor-lasx.c: New test.
* gcc.target/loongarch/avg-floor-lsx.c: New test.
* gcc.target/loongarch/sad-lasx.c: New test.
* gcc.target/loongarch/sad-lsx.c: New test.
|
|
|
|
I had a thinko in r14-1600-ge60593f3881c72a96a3fa4844d73e8a2cd14f670
where we would remove the `& CST` part if we ended up not calling
expand_single_bit_test.
This fixes the problem by introducing a new variable that will be used
for calling expand_single_bit_test.
As far as I know this can only show up when disabling optimization
passes, as the above form would otherwise have been optimized away.
Committed as obvious after a bootstrap/test on x86_64-linux-gnu.
PR middle-end/111863
gcc/ChangeLog:
* expr.cc (do_store_flag): Don't overwrite arg0
when stripping off `& POW2`.
gcc/testsuite/ChangeLog:
* gcc.c-torture/execute/pr111863-1.c: New test.
|
|
checking for error
When checking whether a function declaration conflicts with an earlier one due
to promotions, there is no test for whether the type is an error mark before
calling c_type_promotes_to. c_type_promotes_to is not ready for error_mark and
causes an ICE.
This adds a check for error_mark before the call to c_type_promotes_to.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR c/101364
gcc/c/ChangeLog:
* c-decl.cc (diagnose_arglist_conflict): Test for
error mark before calling of c_type_promotes_to.
gcc/testsuite/ChangeLog:
* gcc.dg/pr101364-1.c: New test.
|
|
This is a simple error recovery issue when c_safe_arg_type_equiv_p
was added in r8-5312-gc65e18d3331aa999. The issue is that after
an error, an argument type (of a function type) might turn
into an error mark node and c_safe_arg_type_equiv_p was not ready
for that. So this just adds a check for error operands on its
arguments before getting the main variant.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR c/101285
gcc/c/ChangeLog:
* c-typeck.cc (c_safe_arg_type_equiv_p): Return true for error
operands early.
gcc/testsuite/ChangeLog:
* gcc.dg/pr101285-1.c: New test.
|
|
gcc/ChangeLog:
PR tree-optimization/111648
* fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): If a1
chooses base element from arg, ensure that it's a natural stepped
sequence.
(build_vec_cst_rand): New param natural_stepped and use it to
construct a naturally stepped sequence.
(test_nunits_min_2): Add new unit tests Case 6 and Case 7.
|
|
This patch slightly improves the embench-iot benchmark score for
PRU code size. There is also small improvement in a few real-world
firmware programs.
Embench-iot size
------------------------------------------
Benchmark before after delta
--------- ---- ---- -----
aha-mont64 4.15 4.15 0
crc32 6.04 6.04 0
cubic 21.64 21.62 -0.02
edn 6.37 6.37 0
huffbench 18.63 18.55 -0.08
matmult-int 5.44 5.44 0
md5sum 25.56 25.43 -0.13
minver 12.82 12.76 -0.06
nbody 15.09 14.97 -0.12
nettle-aes 4.75 4.75 0
nettle-sha256 4.67 4.67 0
nsichneu 3.77 3.77 0
picojpeg 4.11 4.11 0
primecount 7.90 7.90 0
qrduino 7.18 7.16 -0.02
sglib-combined 13.63 13.59 -0.04
slre 5.19 5.19 0
st 14.23 14.12 -0.11
statemate 2.34 2.34 0
tarfind 36.85 36.64 -0.21
ud 10.51 10.46 -0.05
wikisort 7.44 7.41 -0.03
--------- ----- -----
Geometric mean 8.42 8.40 -0.02
Geometric SD 2.00 2.00 0
Geometric range 12.68 12.62 -0.06
gcc/ChangeLog:
* config/pru/pru.cc (pru_insn_cost): New function.
(TARGET_INSN_COST): Define for PRU.
Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
|
|
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_test_fractional_cost):
Test <= instead of testing < twice.
|
|
Before the r5-3834 commit for PR63362, GCC 4.8-4.9 refuses to compile
cse.cc which contains a variable with rtx_def type, because rtx_def
contains a union with a poly_uint16 element. The poly_int template has a
defaulted default constructor and a variadic template constructor which
could have an empty parameter pack. GCC < 5 treated it as a non-trivially
constructible class and deleted the rtunion and rtx_def default constructors.
For the cse_insn purposes, all we need is a variable with size and alignment
of rtx_def, not necessarily rtx_def itself, which we then memset to 0 and
fill in like rtx is normally allocated from heap, so this patch for
GCC_VERSION < 5000 uses an unsigned char array of the right size/alignment.
2023-10-18 Jakub Jelinek <jakub@redhat.com>
PR bootstrap/111852
* cse.cc (cse_insn): Add workaround for GCC 4.8-4.9, instead of
using rtx_def type for memory_extend_buf, use unsigned char
array with size of rtx_def and its alignment.
|
|
In the discussion of promoting some pedwarns to be errors by default, rather
than move them all into -fpermissive it seems to me to make sense to support
DK_PERMERROR with an option flag. This way will also work with
-fpermissive, but users can also still use -Wno-error=narrowing to downgrade
that specific diagnostic rather than everything affected by -fpermissive.
So, for diagnostics that we want to make errors by default we can just
change the pedwarn call to permerror.
The tests check desired behavior for such a permerror in a system header
with various flags. The patch preserves the existing permerror behavior of
ignoring -w and system headers by default, but respecting them when
downgraded to a warning by -fpermissive.
This seems similar to but a bit better than the approach of forcing
-pedantic-errors that I previously used for -Wnarrowing: specifically, in
that now -w by itself is not enough to silence the -Wnarrowing
error (integer-pack2.C).
gcc/ChangeLog:
* doc/invoke.texi: Move -fpermissive to Warning Options.
* diagnostic.cc (update_effective_level_from_pragmas): Remove
redundant system header check.
(diagnostic_report_diagnostic): Move down syshdr/-w check.
(diagnostic_impl): Handle DK_PERMERROR with an option number.
(permerror): Add new overloads.
* diagnostic-core.h (permerror): Declare them.
gcc/cp/ChangeLog:
* typeck2.cc (check_narrowing): Use permerror.
gcc/testsuite/ChangeLog:
* g++.dg/ext/integer-pack2.C: Add -fpermissive.
* g++.dg/diagnostic/sys-narrow.h: New test.
* g++.dg/diagnostic/sys-narrow1.C: New test.
* g++.dg/diagnostic/sys-narrow1a.C: New test.
* g++.dg/diagnostic/sys-narrow1b.C: New test.
* g++.dg/diagnostic/sys-narrow1c.C: New test.
* g++.dg/diagnostic/sys-narrow1d.C: New test.
* g++.dg/diagnostic/sys-narrow1e.C: New test.
* g++.dg/diagnostic/sys-narrow1f.C: New test.
* g++.dg/diagnostic/sys-narrow1g.C: New test.
* g++.dg/diagnostic/sys-narrow1h.C: New test.
* g++.dg/diagnostic/sys-narrow1i.C: New test.
|
|
gcc/ChangeLog:
* gimplify.cc (gimplify_bind_expr): Remove "omp allocate" attribute
to prevent the auxiliary statement list from reaching LTO.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/allocate-13a.f90: New test.
|
|
GCC ICEs on the first testcase. Successful match_uaddc_usubc ends up with
some dead stmts, all of which DCE will hopefully remove later.
The ICE is because one of the dead stmts refers to a freed SSA_NAME.
The code already gsi_removes a couple of stmts in the
/* Remove some statements which can't be kept in the IL because they
use SSA_NAME whose setter is going to be removed too. */
section for the same reason (the reason for the freed SSA_NAMEs is that
we don't really have a replacement for those cases - all we have after
a match is combined overflow from the addition/subtraction of 2 operands + a
[0, 1] carry in, but not the individual overflows from the former 2
additions), but for the last (most significant) limb case, where we try
to match x = op1 + op2 + carry1 + carry2; or
x = op1 - op2 - carry1 - carry2; we just gsi_replace the final stmt, but
leave around the 2 temporary stmts as dead; if we were unlucky enough that
those referenced the carry flag that went away, it ICEs.
So, the following patch remembers those temporary statements (rather than
trying to rediscover them more expensively) and removes them before the
final one is replaced.
While working on it, I've noticed we didn't support all the reassociated
possibilities of writing the addition of 4 operands or subtracting 3
operands from one, we supported e.g.
x = ((op1 + op2) + op3) + op4;
x = op1 + ((op2 + op3) + op4);
but not
x = (op1 + (op2 + op3)) + op4;
x = op1 + (op2 + (op3 + op4));
Fixed by the change to inspect also rhs[2] when rhs[1] didn't yield what
we were searching for (if non-NULL) - rhs[0] is inspected in the first
loop and has different handling for the MINUS_EXPR case.
2023-10-18 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/111845
* tree-ssa-math-opts.cc (match_uaddc_usubc): Remember temporary
statements for the 4 operand addition or subtraction of 3 operands
from 1 operand cases and remove them when successful. Look for
nested additions even from rhs[2], not just rhs[1].
* gcc.dg/pr111845.c: New test.
* gcc.target/i386/pr111845.c: New test.
|
|
gcc/ChangeLog:
PR target/111093
* config/nvptx/nvptx.cc (nvptx_option_override): Issue fatal error
instead of an assert ICE when no -march= has been specified.
|
|
This adds support for the minimum OS version data in assembler files.
At present, we have no mechanism to detect the SDK version in use, and
so that is omitted from build_versions.
We follow the implementation in clang, '.build_version' is only emitted
(where supported) for target macOS versions >= 10.14. For earlier macOS
we fall back to using a '.macosx_version_min' directive. This latter is
also emitted when the assembler supports it, but not build_version.
gcc/ChangeLog:
* config.in: Regenerate.
* config/darwin.cc (darwin_file_start): Add assembler directives
for the target OS version, where these are supported by the
assembler.
(darwin_override_options): Check for building >= macOS 10.14.
* configure: Regenerate.
* configure.ac: Check for assembler support of .build_version
directives.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
|
|
This refactors the code to remove the args cache and index lookups
in favor of a single structure. It also again, removes the use of
std::sort as previously requested but avoids the new asserts in
trunk.
gcc/ChangeLog:
PR tree-optimization/109154
* tree-if-conv.cc (INCLUDE_ALGORITHM): Remove.
(typedef struct ifcvt_arg_entry): New.
(cmp_arg_entry): New.
(gen_phi_arg_condition, gen_phi_nest_statement,
predicate_scalar_phi): Use them.
|
|
This rewrites the simd MOV patterns to use the new compact syntax.
No change in semantics is expected. This will be needed in follow on patches.
This also merges the splits into the define_insn which will also be needed soon.
gcc/ChangeLog:
PR tree-optimization/109154
* config/aarch64/aarch64-simd.md (*aarch64_simd_mov<VDMOV:mode>):
Rewrite to new syntax.
(*aarch64_simd_mov<VQMOV:mode): Rewrite to new syntax and merge in
splits.
|
|
When ifcvt was initially added masking was not a thing and as such it was
rather conservative in what it supported.
For builtins it only allowed C99 builtin functions which it knew it could fold
away.
These days the vectorizer is able to deal with needing to mask IFNs itself.
vectorizable_call is able to vectorize the IFN by emitting a VEC_PERM_EXPR after
the operation to emulate the masking.
This is then used by match.pd to convert the IFN into a masked variant if it's
available.
For these reasons the restriction in if-conversion is no longer required, and we
needlessly block vectorization when we can effectively handle the operations.
Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
Note: This patch is part of a patch series; tests for it are added in the
AArch64 patch that adds support for the optab.
gcc/ChangeLog:
PR tree-optimization/109154
* tree-if-conv.cc (if_convertible_stmt_p): Allow any const IFN.
|
|
sharing operand [PR109154]
When we have a vector conditional on a masked target which is doing a selection
on the result of a conditional operation where one of the operands of the
conditional operation is the other operand of the select, then we can fold the
vector conditional into the operation.
Concretely this transforms
c = mask1 ? (masked_op mask2 a b) : b
into
c = masked_op (mask1 & mask2) a b
The mask is then propagated upwards by the compiler. In the SVE case we don't
end up needing a mask AND here since `mask2` will end up in the instruction
creating `mask` which gives us a natural &.
Such transformations are more common in GCC 13+ as PRE now performs unsharing
of common code when it can make one branch fully independent.
e.g. in this case `b` becomes a loop invariant value after PRE.
This transformation removes the extra select for masked architectures but
doesn't fix the general case.
gcc/ChangeLog:
PR tree-optimization/109154
* match.pd: Add new cond_op rule.
gcc/testsuite/ChangeLog:
PR tree-optimization/109154
* gcc.target/aarch64/sve/pre_cond_share_1.c: New test.
|
|
During the review of an LLVM change [1], on LA464 we found that zeroing
an fcc with fcmp.caf.s is much faster than a movgr2cf from $r0.
[1]: https://github.com/llvm/llvm-project/pull/69300
gcc/ChangeLog:
* config/loongarch/loongarch.md (movfcc): Use fcmp.caf.s for
zeroing a fcc.
|
|
The following allows passing integer mask data as a traditional
vector mask for OMP SIMD clone calls, which is required due to
the limited set of OMP SIMD clones in the x86 ABI when using
AVX512 but a preferred vector size of 256 bits.
* tree-vect-stmts.cc (vectorizable_simd_clone_call):
Relax check to again allow passing integer mode masks
as traditional vectors.
|
|
This final patch updates peeling to maintain LCSSA all the way through.
It's significantly easier to maintain it during peeling while we still know
where all new edges connect rather than touching it up later as is currently
being done.
This allows us to remove many of the helper functions that touch up the loops
at various parts. The only complication is for loop distribution where we
should be able to use the same, however ldist depending on whether
redirect_lc_phi_defs is true or not will either try to maintain a limited LCSSA
form itself or removes are non-virtual phis.
The problem here is that if we maintain LCSSA then in some cases the blocks
connecting the two loops get PHIs to keep the loop IV up to date.
However there is no loop: the guard condition is rewritten as 0 != 0, so the
"loop" always exits. Due to the PHI nodes, though, the probabilities get
completely wrong. It seems to think that the impossible exit is the likely
edge. This causes incorrect warnings, and the presence of the PHIs prevents the
blocks from being simplified.
While it may be possible to make ldist work with LCSSA form, doing so seems more
work than not. For that reason the peeling code has an additional parameter,
used only by ldist, to not connect the two loops during peeling.
This preserves the current behaviour from ldist until I can dive into the
implementation more. Hopefully that's ok for now.
gcc/ChangeLog:
* tree-loop-distribution.cc (copy_loop_before): Request no LCSSA.
* tree-vect-loop-manip.cc (adjust_phi_and_debug_stmts): Add additional
asserts.
(slpeel_tree_duplicate_loop_to_edge_cfg): Keep LCSSA during peeling.
(find_guard_arg): Look value up through explicit edge and original defs.
(vect_do_peeling): Use it.
(slpeel_update_phi_nodes_for_guard2): Take explicit exit edge.
(slpeel_update_phi_nodes_for_lcssa, slpeel_update_phi_nodes_for_loops):
Remove.
* tree-vect-loop.cc (vect_create_epilog_for_reduction): Initialize phi.
* tree-vectorizer.h (slpeel_tree_duplicate_loop_to_edge_cfg): Add
optional param to turn off LCSSA mode.
|
|
This second part updates niters analysis to be able to analyze any number of
exits. If we have multiple exits we determine the main exit by finding the
first counting IV.
The change allows the vectorizer to pass analysis for loops with multiple exits,
but we later gracefully reject them. It does however allow us to test if the exit
handling is using the right exit everywhere.
Additionally since we analyze all exits, we now return all conditions for them
and determine which condition belongs to the main exit.
The main condition is needed because the vectorizer needs to ignore the main IV
condition during vectorization as it will replace it during codegen.
To track versioned loops we extend the contract between ifcvt and the vectorizer
to store the exit number in aux so that we can match it up again during peeling.
gcc/ChangeLog:
* tree-if-conv.cc (tree_if_conversion): Record exits in aux.
* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg): Use
it.
* tree-vect-loop.cc (vect_get_loop_niters): Determine main exit.
(vec_init_loop_exit_info): Extend analysis when multiple exits.
(vect_analyze_loop_form): Record conds and determine main cond.
(vect_create_loop_vinfo): Extend bookkeeping of conds.
(vect_analyze_loop): Release conds.
* tree-vectorizer.h (LOOP_VINFO_LOOP_CONDS,
LOOP_VINFO_LOOP_IV_COND): New.
(struct vect_loop_form_info): Add conds, alt_loop_conds.
(struct loop_vec_info): Add conds, loop_iv_cond.
|
|
variables
This is extracted out of the patch series to support early break vectorization
in order to simplify the review of that patch series.
The goal of this one is to separate out the refactoring from the new
functionality.
This first patch separates out the vectorizer's definition of an exit to their
own values inside loop_vinfo. During vectorization we can have three separate
copies for each loop: scalar, vectorized, epilogue. The scalar loop can also be
the versioned loop before peeling.
Because of this we track 3 different exits inside loop_vinfo corresponding to
each of these loops. Additionally each function that uses an exit, when not
obviously clear which exit is needed will now take the exit explicitly as an
argument.
This is because oftentimes the callers switch the loops being passed around.
While the caller knows which loop it is, the callee does not.
For now the loop exits are simply initialized to same value as before determined
by single_exit (..).
No change in functionality is expected throughout this patch series.
gcc/ChangeLog:
* tree-loop-distribution.cc (copy_loop_before): Pass exit explicitly.
(loop_distribution::distribute_loop): Bail out if not single exit.
* tree-scalar-evolution.cc (get_loop_exit_condition): New.
* tree-scalar-evolution.h (get_loop_exit_condition): New.
* tree-vect-data-refs.cc (vect_enhance_data_refs_alignment): Pass exit
explicitly.
* tree-vect-loop-manip.cc (vect_set_loop_condition_partial_vectors,
vect_set_loop_condition_partial_vectors_avx512,
vect_set_loop_condition_normal, vect_set_loop_condition): Explicitly
take exit.
(slpeel_tree_duplicate_loop_to_edge_cfg): Explicitly take exit and
return the corresponding peeled exit.
(slpeel_can_duplicate_loop_p): Explicitly take exit.
(find_loop_location): Handle not knowing an explicit exit.
(vect_update_ivs_after_vectorizer, vect_gen_vector_loop_niters_mult_vf,
find_guard_arg, slpeel_update_phi_nodes_for_loops,
slpeel_update_phi_nodes_for_guard2): Use new exits.
(vect_do_peeling): Update bookkeeping to keep track of exits.
* tree-vect-loop.cc (vect_get_loop_niters): Explicitly take exit to
analyze.
(vec_init_loop_exit_info): New.
(_loop_vec_info::_loop_vec_info): Initialize vec_loop_iv,
vec_epilogue_loop_iv, scalar_loop_iv.
(vect_analyze_loop_form): Initialize exits.
(vect_create_loop_vinfo): Set main exit.
(vect_create_epilog_for_reduction, vectorizable_live_operation,
vect_transform_loop): Use it.
(scale_profile_for_vect_loop): Explicitly take exit to scale.
* tree-vectorizer.cc (set_uid_loop_bbs): Initialize loop exit.
* tree-vectorizer.h (LOOP_VINFO_IV_EXIT, LOOP_VINFO_EPILOGUE_IV_EXIT,
LOOP_VINFO_SCALAR_IV_EXIT): New.
(struct loop_vec_info): Add vec_loop_iv, vec_epilogue_loop_iv,
scalar_loop_iv.
(vect_set_loop_condition, slpeel_can_duplicate_loop_p,
slpeel_tree_duplicate_loop_to_edge_cfg): Take explicit exits.
(vec_init_loop_exit_info): New.
(struct vect_loop_form_info): Add loop_exit.
|
|
Vectorization of a gcond starts off essentially the same as vectorizing a
comparison, with the only difference being how the operands are extracted.
This refactors vectorizable_comparison such that we now have a generic function
that can be used from vectorizable_early_break. The refactoring splits the
gassign checks and actual validation/codegen off to a helper function.
No change in functionality expected.
gcc/ChangeLog:
* tree-vect-stmts.cc (vectorizable_comparison): Refactor, splitting body
to ...
(vectorizable_comparison_1): ...This.
|
|
This patch optimizes the following permutation with consecutive pattern indices:
typedef char vnx16i __attribute__ ((vector_size (16)));
#define MASK_16 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15
vnx16i __attribute__ ((noinline, noclone))
test_1 (vnx16i x, vnx16i y)
{
return __builtin_shufflevector (x, y, MASK_16);
}
Before this patch:
lui a5,%hi(.LC0)
addi a5,a5,%lo(.LC0)
vsetivli zero,16,e8,m1,ta,ma
vle8.v v3,0(a5)
vle8.v v2,0(a1)
vrgather.vv v1,v2,v3
vse8.v v1,0(a0)
ret
After this patch:
vsetivli zero,16,e8,mf8,ta,ma
vle8.v v2,0(a1)
vsetivli zero,4,e32,mf2,ta,ma
vrgather.vi v1,v2,3
vsetivli zero,16,e8,mf8,ta,ma
vse8.v v1,0(a0)
ret
Overall this removes one instruction, a vector load, which is much more expensive
than VL toggling.
Also, with this patch, we use vrgather.vi, which reduces vector register
consumption by one.
gcc/ChangeLog:
* config/riscv/riscv-v.cc (shuffle_consecutive_patterns): New function.
(expand_vec_perm_const_1): Add consecutive pattern recognition.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls/def.h: Add new test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/consecutive-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/consecutive-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/consecutive-3.c: New test.
|
|
gcc/fortran/ChangeLog:
* intrinsic.texi (signal): Add 'intrinsic :: signal, sleep' to
the example to make it safer.
|
|
gcc/ChangeLog:
* common/config/i386/cpuinfo.h (get_intel_cpu): Add Panther
Lake.
* common/config/i386/i386-common.cc (processor_name):
Ditto.
(processor_alias_table): Ditto.
* common/config/i386/i386-cpuinfo.h (enum processor_types):
Add INTEL_PANTHERLAKE.
* config.gcc: Add -march=pantherlake.
* config/i386/driver-i386.cc (host_detect_local_cpu): Refactor
the if clause. Handle pantherlake.
* config/i386/i386-c.cc (ix86_target_macros_internal):
Handle pantherlake.
* config/i386/i386-options.cc (processor_cost_table): Ditto.
(m_PANTHERLAKE): New.
(m_CORE_HYBRID): Add pantherlake.
* config/i386/i386.h (enum processor_type): Ditto.
* doc/extend.texi: Ditto.
* doc/invoke.texi: Ditto.
gcc/testsuite/ChangeLog:
* g++.target/i386/mv16.C: Ditto.
* gcc.target/i386/funcspec-56.inc: Handle new march.
|
|
gcc/ChangeLog:
* config/i386/i386-options.cc (m_CORE_HYBRID): New.
* config/i386/x86-tune.def: Replace hybrid client tunes with
m_CORE_HYBRID.
|
|
gcc/ChangeLog:
* common/config/i386/cpuinfo.h
(get_intel_cpu): Handle Clearwater Forest.
* common/config/i386/i386-common.cc (processor_name):
Add Clearwater Forest.
(processor_alias_table): Ditto.
* common/config/i386/i386-cpuinfo.h (enum processor_types):
Add INTEL_CLEARWATERFOREST.
* config.gcc: Add -march=clearwaterforest.
* config/i386/driver-i386.cc (host_detect_local_cpu): Handle
clearwaterforest.
* config/i386/i386-c.cc (ix86_target_macros_internal): Ditto.
* config/i386/i386-options.cc (processor_cost_table): Ditto.
(m_CLEARWATERFOREST): New.
(m_CORE_ATOM): Add clearwaterforest.
* config/i386/i386.h (enum processor_type): Ditto.
* doc/extend.texi: Ditto.
* doc/invoke.texi: Ditto.
gcc/testsuite/ChangeLog:
* g++.target/i386/mv16.C: Ditto.
* gcc.target/i386/funcspec-56.inc: Handle new march.
|
|
gcc/ChangeLog:
* config/i386/mmx.md (fma<mode>4): New expander.
(fms<mode>4): Ditto.
(fnma<mode>4): Ditto.
(fnms<mode>4): Ditto.
(vec_fmaddsubv4hf4): Ditto.
(vec_fmsubaddv4hf4): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/part-vect-fmaddsubhf-1.c: New test.
* gcc.target/i386/part-vect-fmahf-1.c: New test.
|
|
Last time, Robin mentioned that dynamic LMUL will cause an ICE in SPEC:
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629992.html
which is caused by assertion FAIL.
When we enable more tests in rvv.exp with dynamic LMUL, this issue can be
reproduced and has a PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111832
Now, we enable more tests in rvv.exp in this patch and fix the bug.
PR target/111832
gcc/ChangeLog:
* config/riscv/riscv-vector-costs.cc (get_biggest_mode): New function.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/rvv.exp: Enable more dynamic tests.
|
|
|
|
Now that the prologue and epilogue code iterates over saved
registers in offset order, we can put the LR save slot first
without compromising LDP/STP formation.
This isn't worthwhile when shadow call stacks are enabled, since the
first two registers are also push/pop candidates, and LR cannot be
popped when shadow call stacks are enabled. (LR is instead loaded
first and compared against the shadow stack's value.)
But otherwise, it seems better to put the LR save slot first,
to reduce unnecessary variation with the layout for stack clash
protection.
gcc/
* config/aarch64/aarch64.cc (aarch64_layout_frame): Don't make
the position of the LR save slot dependent on stack clash
protection unless shadow call stacks are enabled.
gcc/testsuite/
* gcc.target/aarch64/test_frame_2.c: Expect x30 to come before x19.
* gcc.target/aarch64/test_frame_4.c: Likewise.
* gcc.target/aarch64/test_frame_7.c: Likewise.
* gcc.target/aarch64/test_frame_10.c: Likewise.
|
|
aarch64_save/restore_callee_saves looped over registers in register
number order. This in turn meant that we could only use LDP and STP
for registers that were consecutive both number-wise and
offset-wise (after unsaved registers are excluded).
This patch instead builds lists of the registers that we've decided to
save, in offset order. We can then form LDP/STP pairs regardless of
register number order, which in turn means that we can put the LR save
slot first without losing LDP/STP opportunities.
gcc/
* config/aarch64/aarch64.h (aarch64_frame): Add vectors that
store the lists of saved GPRs, FPRs and predicate registers.
* config/aarch64/aarch64.cc (aarch64_layout_frame): Initialize
the lists of saved registers. Use them to choose push candidates.
Invalidate pop candidates if we're not going to do a pop.
(aarch64_next_callee_save): Delete.
(aarch64_save_callee_saves): Take a list of registers,
rather than a range. Make !skip_wb select only write-back
candidates.
(aarch64_expand_prologue): Update calls accordingly.
(aarch64_restore_callee_saves): Take a list of registers,
rather than a range. Always skip pop candidates. Also skip
LR if shadow call stacks are enabled.
(aarch64_expand_epilogue): Update calls accordingly.
gcc/testsuite/
* gcc.target/aarch64/sve/pcs/stack_clash_2.c: Expect restores
to happen in offset order.
* gcc.target/aarch64/sve/pcs/stack_clash_2_128.c: Likewise.
* gcc.target/aarch64/sve/pcs/stack_clash_2_256.c: Likewise.
* gcc.target/aarch64/sve/pcs/stack_clash_2_512.c: Likewise.
* gcc.target/aarch64/sve/pcs/stack_clash_2_1024.c: Likewise.
* gcc.target/aarch64/sve/pcs/stack_clash_2_2048.c: Likewise.
|
|
The prologue/epilogue pass allows the prologue sequence to contain
jumps. The sequence is then partitioned into basic blocks using
find_many_sub_basic_blocks.
This patch treats epilogues in a similar way. Since only one block
might need to be split, the patch (re)introduces a find_sub_basic_blocks
routine to handle a single block.
The new routine hard-codes the assumption that split_block will chain
the new block immediately after the original block. The routine doesn't
try to replicate the fix for PR81030, since that was specific to
gimple->rtl expansion.
The patch is needed for follow-on aarch64 patches that add conditional
code to the epilogue. The tests are part of those patches.
gcc/
* cfgbuild.h (find_sub_basic_blocks): Declare.
* cfgbuild.cc (update_profile_for_new_sub_basic_block): New function,
split out from...
(find_many_sub_basic_blocks): ...here.
(find_sub_basic_blocks): New function.
* function.cc (thread_prologue_and_epilogue_insns): Handle
epilogues that contain jumps.
|
|
This turns out to be a latent bug in ssa_name_has_boolean_range:
it would return true for all boolean types, but all of its callers
were expecting a 0/1 range rather than [-1,0].
So when I fixed vector lowering to do all comparisons in boolean_type
rather than still in the signed-boolean:31 type (to fix a different issue),
the match.pd pattern `-(type)!A -> (type)A - 1.` assumed A (which
was signed-boolean:31) had a range of [0,1].  That broke down and sometimes
gave us -1/-2 as values rather than the expected -1/0.
This was the simplest patch I found while testing.
We have another way of matching a [0,1] range which we could use instead
of ssa_name_has_boolean_range, except that it uses only the global ranges
rather than the local ranges (during VRP).
I tried to clean this up slightly by using gimple_match_zero_one_valuedp
inside ssa_name_has_boolean_range, but that failed because it uses
only the global ranges.  I then tried to change get_nonzero_bits to use
the local ranges at optimization time, but that also failed: we
remove branches to __builtin_unreachable during evrp and lose the
information, since we don't set the global ranges during evrp.
OK? Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/110817
gcc/ChangeLog:
* tree-ssanames.cc (ssa_name_has_boolean_range): Remove the
check for boolean types as they don't always have a [0,1] range.
gcc/testsuite/ChangeLog:
* gcc.c-torture/execute/pr110817-1.c: New test.
* gcc.c-torture/execute/pr110817-2.c: New test.
* gcc.c-torture/execute/pr110817-3.c: New test.
|
|
r6-2367 added a DECL_INITIAL check to cp_parser_simple_declaration
so that we don't emit multiple errors in g++.dg/parse/error57.C.
But that means we don't diagnose
int f1() = delete("george_crumb");
anymore, because fn decls often have error_mark_node in their
DECL_INITIAL. (The code may be allowed one day via https://wg21.link/P2573R0.)
I was hoping I could use cp_parser_error_occurred but that would
regress error57.C.
PR c++/111840
gcc/cp/ChangeLog:
* parser.cc (cp_parser_simple_declaration): Do cp_parser_error
for FUNCTION_DECLs.
gcc/testsuite/ChangeLog:
* g++.dg/parse/error65.C: New test.
|
|
My recent patch introducing cp_fold_immediate_r caused exponential
compile time with nested COND_EXPRs. The problem is that the COND_EXPR
case recursively walks the arms of a COND_EXPR, but after processing
both arms it doesn't end the walk; it proceeds to walk the
sub-expressions of the outermost COND_EXPR, which walks the arms of
the nested COND_EXPR again, and so on.  This patch brings the
compile time down to about 0m0.030s.
The ff_fold_immediate flag is unused after this patch but since I'm
using it in the P2564 patch, I'm not removing it now. Maybe at_eof
can be used instead and then we can remove ff_fold_immediate.
PR c++/111660
gcc/cp/ChangeLog:
* cp-gimplify.cc (cp_fold_immediate_r) <case COND_EXPR>: Don't
handle it here.
(cp_fold_r): Handle COND_EXPR here.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/hog1.C: New test.
* g++.dg/cpp2a/consteval36.C: New test.
|
|
Most of this is introducing the abi_check function to reduce the verbosity
of most places that check -fabi-version.
The start_mangling change is to avoid needing to zero-initialize additional
members of the mangling globals, though I'm not actually adding any.
The comment documents existing semantics.
gcc/cp/ChangeLog:
* mangle.cc (abi_check): New.
(write_prefix, write_unqualified_name, write_discriminator)
(write_type, write_member_name, write_expression)
(write_template_arg, write_template_param): Use it.
(start_mangling): Assign from {}.
* cp-tree.h: Update comment.
|
|
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_dynamic_cast_fn): Add missing
auto_diagnostic_group.
(cxx_eval_call_expression): Likewise.
(diag_array_subscript): Likewise.
(outside_lifetime_error): Likewise.
(potential_constant_expression_1): Likewise.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Marek Polacek <polacek@redhat.com>
|
|
Update the test to potentially generate two SEXT.W instructions: one for
the incoming function arg, the other for the function return.
But after commit 8eb9cdd14218
("expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg")
the test is not supposed to generate either of them, so fix the expected
assembler output which was erroneously introduced by the commit above.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr111466.c (foo2): Change return to unsigned
int as that will potentially generate two SEXT.W instructions.
dg-final: Change to scan-assembler-not SEXT.W.
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
|
|
Declaring a function with both external and internal linkage
in the same TU is translation-time UB. Add an error for this
case as already done for objects.
PR c/111708
gcc/c/ChangeLog:
* c-decl.cc (grokdeclarator): Add error.
gcc/testsuite/ChangeLog:
* gcc.dg/pr111708-1.c: New test.
* gcc.dg/pr111708-2.c: New test.
|