Age | Commit message (Collapse) | Author | Files | Lines |
|
This patch reimplements the analyzer's implementation of strcpy using
the region_model::scan_for_null_terminator infrastructure, so that e.g.
it can complain about out-of-bounds reads/writes, unterminated strings,
etc.
gcc/analyzer/ChangeLog:
PR analyzer/105899
* kf.cc (kf_strcpy::impl_call_pre): Reimplement using
check_for_null_terminated_string_arg.
* region-model.cc (region_model::get_store_bytes): Shortcut
reading all of a string_region.
(region_model::scan_for_null_terminator): Use get_store_value for
the bytes rather than "unknown" when returning an unknown length.
(region_model::write_bytes): New.
* region-model.h (region_model::write_bytes): New decl.
gcc/testsuite/ChangeLog:
PR analyzer/105899
* gcc.dg/analyzer/out-of-bounds-diagram-16.c: New test.
* gcc.dg/analyzer/strcpy-1.c: Add test coverage.
* gcc.dg/analyzer/strcpy-3.c: Likewise.
* gcc.dg/analyzer/strcpy-4.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
gcc/analyzer/ChangeLog:
PR analyzer/105899
* region-model.cc (iterable_cluster::iterable_cluster): Add
symbolic binding keys to m_symbolic_bindings.
(iterable_cluster::has_symbolic_bindings_p): New.
(iterable_cluster::m_symbolic_bindings): New field.
(region_model::scan_for_null_terminator): Treat clusters with
symbolic bindings as having unknown strlen.
gcc/testsuite/ChangeLog:
PR analyzer/105899
* gcc.dg/analyzer/sprintf-1.c: Include "analyzer-decls.h".
(test_strlen_1): New.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
gcc/analyzer/ChangeLog:
* engine.cc (impl_path_context::impl_path_context): Add logger
param.
(impl_path_context::bifurcate): Add log message.
(impl_path_context::terminate_path): Likewise.
(impl_path_context::m_logger): New field.
(exploded_graph::process_node): Pass logger to path_ctxt ctor.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
The testcase in the PR shows that late uninit diagnostic relies
on indirect clobbers in CTORs but we throw those away in the fab
pass which is too early. The reasoning was they were supposed
to keep SSA names live but that's no longer the case since DCE
doesn't treat them as keeping SSA uses live.
The following instead removes them before out-of-SSA coalescing
which is the thing that's still affected by them.
PR tree-optimization/111123
* tree-ssa-ccp.cc (pass_fold_builtins::execute): Do not
remove indirect clobbers here ...
* tree-outof-ssa.cc (rewrite_out_of_ssa): ... but here.
(remove_indirect_clobbers): New function.
* g++.dg/warn/Wuninitialized-pr111123-1.C: New testcase.
|
|
This patch extends verifier to check that all probabilities and counts are
initialized if profile is supposed to be present. This is a bit complicated
by the posibility that we inline !flag_guess_branch_probability function
into function with profile defined and in this case we need to stop
verification. For this reason I added flag to cfg structure tracking this.
Bootstrapped/regtested x86_64-linux, comitted.
gcc/ChangeLog:
* cfg.h (struct control_flow_graph): New field full_profile.
* auto-profile.cc (afdo_annotate_cfg): Set full_profile to true.
* cfg.cc (init_flow): Set full_profile to false.
* graphite.cc (graphite_transform_loops): Set full_profile to false.
* lto-streamer-in.cc (input_cfg): Initialize full_profile flag.
* predict.cc (pass_profile::execute): Set full_profile to true.
* symtab-thunks.cc (expand_thunk): Set full_profile to true.
* tree-cfg.cc (gimple_verify_flow_info): Verify that profile is full
if full_profile is set.
* tree-inline.cc (initialize_cfun): Initialize full_profile.
(expand_call_inline): Combine full_profile.
|
|
libstdc++-v3/ChangeLog:
PR libstdc++/111102
* testsuite/std/format/string.cc: Check wide character format
strings with out-of-range widths.
|
|
When parsing a format string, the width is parsed into an unsigned short
but the result is not checked in the case the format string is not a
char string (such as a wide string). In case the parse fails, a null
pointer is returned which is used for pointer arithmetic which is
undefined behaviour.
Signed-off-by: Paul Dreik <gccpatches@pauldreik.se>
libstdc++-v3/ChangeLog:
PR libstdc++/111102
* include/std/format (__format::__parse_integer): Check for
non-null pointer.
|
|
libstdc++-v3/ChangeLog:
* testsuite/std/format/functions/format_to.cc: Avoid warning for
unused variables.
|
|
Update a preprocessor condition using __cplusplus and _GLIBCXX_HOSTED
to use the relevant feature test macro for <syncstream>.
Also add comments to some conditions saying which C++ standard revision
the check corresponds to.
libstdc++-v3/ChangeLog:
* include/std/atomic: Add comment to #ifdef and fix indentation.
* include/std/ostream: Check __glibcxx_syncbuf instead of
__cplusplus and _GLIBCXX_HOSTED.
* include/std/thread: Add comment to #ifdef.
|
|
This is a no-op for libstdc++, because our intmax_t is a 64-bit type and
so is incapable of representing the largest and smallest ratios from
C++11, let alone the new ones. I've added them to the file anyway (and
defined the feature test macro) so that if somebody ports libstdc++ to a
target with 128-bit intmax_t then they'll be present.
libstdc++-v3/ChangeLog:
* include/bits/version.def (__cpp_lib_ratio): Define.
* include/bits/version.h: Regenerate.
* include/std/ratio (quecto, ronto, yocto, zepto)
(zetta, yotta, ronna, quetta): Define.
* testsuite/20_util/ratio/operations/ops_overflow_neg.cc: Adjust
dg-error line numbers.
|
|
load_p is set and used as to whether the stmt is a memory operation,
not whether it is only a load. The following renames it to ldst_p
to avoid this confusion. It also replaces checking for a VUSE
with checking STMT_VINFO_DATA_REF since VUSE checking doesn't
work for pattern matched stores where no virtual operands are
present. Where we want to distinguish between loads and stores
we then check DR_IS_READ/WRITE.
I've made a classification mistake with .MASK_STORE support and
this hits other complications when dealing with single-lane SLP.
* tree-vect-slp.cc (vect_build_slp_tree_1): Rename
load_p to ldst_p, fix mistakes and rely on
STMT_VINFO_DATA_REF.
|
|
Print the locale's name, except when it uses the same named C locale for
all categories except one, in which case print something like:
std::locale = "en_GB.UTF-8" with "LC_CTYPE=en_US.UTF-8"
libstdc++-v3/ChangeLog:
* python/libstdcxx/v6/printers.py (StdLocalePrinter): New
printer class.
* testsuite/libstdc++-prettyprinters/locale.cc: New test.
|
|
As the PR says, including the template arguments in the GDB output of
these class templates can result in very long names, especially for
std::variant. You can use 'whatis' or other GDB commands to get details
of the type, we don't need to include it in the value.
We could consider including the type if it's not too long, but I think
consistency is better (and we already omit the template arguments for
std::vector and other class templates).
libstdc++-v3/ChangeLog:
PR libstdc++/110944
* python/libstdcxx/v6/printers.py (StdExpOptionalPrinter): Do
not show template arguments.
(StdVariantPrinter): Likewise.
* testsuite/libstdc++-prettyprinters/compat.cc: Adjust expected
output.
* testsuite/libstdc++-prettyprinters/cxx17.cc: Likewise.
* testsuite/libstdc++-prettyprinters/libfundts.cc: Likewise.
|
|
gcc/ChangeLog:
* gimple-harden-conditionals.cc (insert_check_and_trap): Set count
of newly build trap bb.
|
|
This patch is depending on middle-end patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627621.html
We already had COND_LEN_FNMA/COND_LEN_FMS/COND_FNMS patterns.
Remove TARGET_PREFERRED_ELSE_VALUE since it forbid the COND_LEN_FMS/COND_LEN_FNMS STMT fold.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_preferred_else_value): Remove it since
it forbid COND_LEN_FMS/COND_LEN_FNMS STMT fold.
(TARGET_PREFERRED_ELSE_VALUE): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c: Adapt test.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv-nofm.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-10.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-11.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-12.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-4.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-5.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-6.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-7.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-8.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-9.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-11.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-12.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-9.c: New test.
|
|
this patch enables pressure-aware scheduling for riscv. There have been
various requests for it so I figured I'd just go ahead and send
the patch.
There is some slight regression in code quality for a number of
vector tests where we spill more due to different instructions order.
The ones I looked at were a mix of bad luck and/or brittle tests.
Comparing the size of the generated assembly or the number of vsetvls
for SPECint also didn't show any immediate benefit but that's obviously
not a very fine-grained analysis.
As cost and scheduling models mature I expect the situation to improve
and for now I think it's generally favorable to enable pressure-aware
scheduling so we can work with it rather than trying to find every
possible problem in advance.
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc: Add -fsched-pressure.
* config/riscv/riscv.cc (riscv_option_override): Set sched
pressure algorithm.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/narrow_constraint-1.c: Add
-fno-sched-pressure.
* gcc.target/riscv/rvv/base/narrow_constraint-17.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-18.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-19.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-20.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-21.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-22.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-23.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-24.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-25.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-26.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-27.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-28.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-29.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-30.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-31.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-4.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-5.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-8.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-9.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-11.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-12.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-3.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c: Ditto.
|
|
This patch adds a missing constraint in order to be able to print (and
not ICE) vector immediates 17-31 for vector shifts.
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_print_operand): Allow vk operand.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/binop/shift-immediate.c: New test.
|
|
This adds some missing tests for vf[nw]cvt.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-run.c:
Add tests.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-rv32gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-rv64gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-template.h:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-rv32gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-rv64gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-template.h:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-rv32gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-rv64gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-template.h:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-rv32gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-rv64gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-template.h:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-zvfh-run.c:
Ditto.
|
|
This patch fixes the reduc_strict_run-1 testcase by introducing
a variable that holds the reference result. This is necessary
because in presence of _Float16 emulation an intermediate
result used in a comparison is computed in higher precision.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-1.c:
Add variable to hold reference result.
|
|
When a loop is marked with
#pragma GCC novector
the following makes sure to also skip BB vectorization for contained
blocks. That avoids gcc.dg/vect/bb-slp-29.c failing on aarch64
because of extra BB vectorization therein. I'm not specifically
dealing with sub-loops of novector loops, the desired semantics
isn't documented.
PR tree-optimization/111125
* tree-vect-slp.cc (vect_slp_function): Split at novector
loop entry, do not push blocks in novector loops.
|
|
[[]] attributes are a recent addition to C, but as a GNU extension,
GCC allows them to be used in C11 and earlier. Normally this use
would trigger a pedwarn (for -pedantic, -Wc11-c2x-compat, etc.).
This patch allows the pedwarn to be suppressed by starting the
attribute-list with __extension__.
Also, :: is not a single lexing token prior to C2X, so it wasn't
possible to use scoped attributes in C11, even as a GNU extension.
The patch allows two colons to be used in place of :: when
__extension__ is used. No attempt is made to check whether the
two colons are immediately adjacent.
gcc/
* doc/extend.texi: Document the C [[__extension__ ...]] construct.
gcc/c/
* c-parser.cc (c_parser_std_attribute): Conditionally allow
two colons to be used in place of ::.
(c_parser_std_attribute_list): New function, split out from...
(c_parser_std_attribute_specifier): ...here. Allow the attribute-list
to start with __extension__. When it does, also allow two colons
to be used in place of ::.
gcc/testsuite/
* gcc.dg/c2x-attr-syntax-6.c: New test.
* gcc.dg/c2x-attr-syntax-7.c: Likewise.
|
|
Hi, Richard and Richi.
Currently, GCC support COND_LEN_FMA for floating-point **NO** -ffast-math.
It's supported in tree-ssa-math-opts.cc. However, GCC failed to support COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS.
Consider this following case:
__attribute__ ((noipa)) void ternop_##TYPE (TYPE *__restrict dst, \
TYPE *__restrict a, \
TYPE *__restrict b, int n) \
{ \
for (int i = 0; i < n; i++) \
dst[i] -= a[i] * b[i]; \
}
TEST_TYPE (float) \
TEST_ALL ()
Gimple IR for RVV:
...
_39 = -vect__8.14_26;
vect__10.16_21 = .COND_LEN_FMA ({ -1, ... }, vect__6.11_30, _39, vect__4.8_34, vect__4.8_34, _46, 0);
...
This is because this following piece of codes in tree-ssa-math-opts.cc:
if (len)
fma_stmt
= gimple_build_call_internal (IFN_COND_LEN_FMA, 7, cond, mulop1, op2,
addop, else_value, len, bias);
else if (cond)
fma_stmt = gimple_build_call_internal (IFN_COND_FMA, 5, cond, mulop1,
op2, addop, else_value);
else
fma_stmt = gimple_build_call_internal (IFN_FMA, 3, mulop1, op2, addop);
gimple_set_lhs (fma_stmt, gimple_get_lhs (use_stmt));
gimple_call_set_nothrow (fma_stmt, !stmt_can_throw_internal (cfun,
use_stmt));
gsi_replace (&gsi, fma_stmt, true);
/* Follow all SSA edges so that we generate FMS, FNMA and FNMS
regardless of where the negation occurs. */
gimple *orig_stmt = gsi_stmt (gsi);
if (fold_stmt (&gsi, follow_all_ssa_edges))
{
if (maybe_clean_or_replace_eh_stmt (orig_stmt, gsi_stmt (gsi)))
gcc_unreachable ();
update_stmt (gsi_stmt (gsi));
}
'fold_stmt' failed to fold NEGATE_EXPR + COND_LEN_FMA ====> COND_LEN_FNMA.
This patch support STMT fold into:
vect__10.16_21 = .COND_LEN_FNMA ({ -1, ... }, vect__8.14_26, vect__6.11_30, vect__4.8_34, { 0.0, ... }, _46, 0);
Note that COND_LEN_FNMA has 7 arguments and COND_LEN_ADD has 6 arguments.
Extend maximum num ops:
- static const unsigned int MAX_NUM_OPS = 5;
+ static const unsigned int MAX_NUM_OPS = 7;
Bootstrap and Regtest on X86 passed.
Tested on aarch64 Qemu.
Fully tested COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS on RISC-V backend.
gcc/ChangeLog:
* genmatch.cc (decision_tree::gen): Support
COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold.
* gimple-match-exports.cc (gimple_simplify): Ditto.
(gimple_resimplify6): New function.
(gimple_resimplify7): New function.
(gimple_match_op::resimplify): Support
COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold.
(convert_conditional_op): Ditto.
(build_call_internal): Ditto.
(try_conditional_simplification): Ditto.
(gimple_extract): Ditto.
* gimple-match.h (gimple_match_cond::gimple_match_cond): Ditto.
* internal-fn.cc (CASE): Ditto.
|
|
The following adds the capability to do SLP on .MASK_STORE, I do not
plan to add interleaving support.
PR tree-optimization/111115
gcc/
* tree-vectorizer.h (vect_slp_child_index_for_operand): New.
* tree-vect-data-refs.cc (can_group_stmts_p): Also group
.MASK_STORE.
* tree-vect-slp.cc (arg3_arg2_map): New.
(vect_get_operand_map): Handle IFN_MASK_STORE.
(vect_slp_child_index_for_operand): New function.
(vect_build_slp_tree_1): Handle statements with no LHS,
masked store ifns.
(vect_remove_slp_scalar_calls): Likewise.
* tree-vect-stmts.cc (vect_check_store_rhs): Lookup the
SLP child corresponding to the ifn value index.
(vectorizable_store): Likewise for the mask index. Support
masked stores.
(vectorizable_load): Lookup the SLP child corresponding to the
ifn mask index.
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_vect_masked_store):
Supported with check_avx_available.
* gcc.dg/vect/slp-mask-store-1.c: New testcase.
|
|
We assume that all root stmts which compose the total reduction chain
are vectorized but fail to account for the cost of adding back the
scalar defs we are not vectorizing. The following rectifies this,
fixing the gcc.dg/tree-ssa/slsr-11.c FAIL on aarch64.
PR tree-optimization/111125
* tree-vect-slp.cc (vectorizable_bb_reduc_epilogue): Account
for the remain_defs processing.
|
|
The scalar FNMADD/FNMSUB and SVE FNMLA/FNMLS instructions mean
that either side of a subtraction can start an accumulator chain.
However, Advanced SIMD doesn't have an equivalent instruction.
This means that, for Advanced SIMD, a subtraction can only be
fused if the second operand is a multiplication.
Also, if both sides of a subtraction are multiplications,
and if the second operand is used multiple times, such as:
c * d - a * b
e * f - a * b
then the first rather than second multiplication operand will tend
to be fused. On Advanced SIMD, this leads to:
tmp1 = a * b
tmp2 = -tmp1
... = tmp2 + c * d // FMLA
... = tmp2 + e * f // FMLA
where one of the FMLAs also requires a MOV.
This patch tries to account for this in the vector cost model.
It improves roms performance by 2-3% on Neoverse V1. It's also
needed to avoid a regression in fotonik for Neoverse N2 and
Neoverse V2 with the patch for PR110625.
gcc/
* config/aarch64/aarch64.cc: Include ssa.h.
(aarch64_multiply_add_p): Require the second operand of an
Advanced SIMD subtraction to be a multiplication. Assume that
such an operation won't be fused if the second operand is used
multiple times and if the first operand is also a multiplication.
gcc/testsuite/
* gcc.target/aarch64/neoverse_v1_2.c: New test.
* gcc.target/aarch64/neoverse_v1_3.c: Likewise.
|
|
Hi.
This patch is apply LEN_FOLD_EXTRACT_LAST into loop vectorizer.
Consider this following case:
/* Simple condition reduction. */
int __attribute__ ((noinline, noclone))
condition_reduction (int *a, int min_v)
{
int last = 66; /* High start value. */
for (int i = 0; i < N; i++)
if (a[i] < min_v)
last = i;
return last;
}
With this patch, we can generate this following IR:
_44 = .SELECT_VL (ivtmp_42, POLY_INT_CST [4, 4]);
_34 = vect_vec_iv_.5_33 + { POLY_INT_CST [4, 4], ... };
ivtmp_36 = _44 * 4;
vect__4.8_39 = .MASK_LEN_LOAD (vectp_a.6_37, 32B, { -1, ... }, _44, 0);
mask__11.9_41 = vect__4.8_39 < vect_cst__40;
last_5 = .LEN_FOLD_EXTRACT_LAST (last_14, mask__11.9_41, vect_vec_iv_.5_33, _44, 0);
...
gcc/ChangeLog:
* tree-vect-loop.cc (vectorizable_reduction): Apply
LEN_FOLD_EXTRACT_LAST.
* tree-vect-stmts.cc (vectorizable_condition): Ditto.
|
|
The following fixes placement of shift operand sanitization with
MIN when the original shift operand was external but the actual
one is not.
PR tree-optimization/111128
* tree-vect-patterns.cc (vect_recog_over_widening_pattern):
Emit external shift operand inline if we promoted it with
another pattern stmt.
* gcc.dg/torture/pr111128.c: New testcase.
|
|
The test is for loop vectorization producing non-canonical
multiplications. We can now BB vectorize the whole function
when the target supports .REDUC_PLUS for V2SImode but we don't
have a dejagnu selector for that. Disable BB vectorization
like we disabled epilogue vectorization.
PR testsuite/111125
* gcc.dg/vect/pr53773.c: Disable BB vectorization.
|
|
vfmsac => vfnmacc
vfmsub => vfnmadd
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/autovec.md: Fix typo.
|
|
As suggested by kito, we will add new frm_opt_type template arg
to the op class, to avoid the duplicated function expand.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-bases.cc
(class binop_frm): Removed.
(class reverse_binop_frm): Ditto.
(class widen_binop_frm): Ditto.
(class vfmacc_frm): Ditto.
(class vfnmacc_frm): Ditto.
(class vfmsac_frm): Ditto.
(class vfnmsac_frm): Ditto.
(class vfmadd_frm): Ditto.
(class vfnmadd_frm): Ditto.
(class vfmsub_frm): Ditto.
(class vfnmsub_frm): Ditto.
(class vfwmacc_frm): Ditto.
(class vfwnmacc_frm): Ditto.
(class vfwmsac_frm): Ditto.
(class vfwnmsac_frm): Ditto.
(class unop_frm): Ditto.
(class vfrec7_frm): Ditto.
(class binop): Add frm_op_type template arg.
(class unop): Ditto.
(class widen_binop): Ditto.
(class widen_binop_fp): Ditto.
(class reverse_binop): Ditto.
(class vfmacc): Ditto.
(class vfnmsac): Ditto.
(class vfmadd): Ditto.
(class vfnmsub): Ditto.
(class vfnmacc): Ditto.
(class vfmsac): Ditto.
(class vfnmadd): Ditto.
(class vfmsub): Ditto.
(class vfwmacc): Ditto.
(class vfwnmacc): Ditto.
(class vfwmsac): Ditto.
(class vfwnmsac): Ditto.
(class float_misc): Ditto.
|
|
The patterns that were added in r13-4620-g4d9db4bdd458, missed that
(a > b) and (a <= b) are not inverse of each other for floating point
comparisons (if NaNs are supported). Even though there was a check for
intergal types, it was only for the result of the cond rather for the
type of what is being compared. The fix is to check to see if cmp and
icmp are inverse of each other by using the invert_tree_comparison function.
OK for trunk and GCC 13 branch? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
I added the testcase to execute/ieee as it requires support for NAN.
PR tree-optimization/111109
gcc/ChangeLog:
* match.pd (ior(cond,cond), ior(vec_cond,vec_cond)):
Add check to make sure cmp and icmp are inverse.
gcc/testsuite/ChangeLog:
* gcc.c-torture/execute/ieee/fp-cmp-cond-1.c: New test.
|
|
For 1bit types, negate is either undefined or don't change the value.
In either cases we want to remove them.
This patch adds a match pattern to do that.
Also converting to a 1bit type we can remove the negate just like we already do
for `&1` so this patch adds that too.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
Notes on the testcases:
This patch is the last part to fix PR 95929; cond-bool-2.c testcase.
bit1neg-1.c is a 1bit-field testcase where we could remove the assignment
all the way in one case (which happened on the RTL level for some targets but not all).
cond-bool-2.c is the reduced testcase of PR 95929.
PR tree-optimization/95929
gcc/ChangeLog:
* match.pd (convert?(-a)): New pattern
for 1bit integer types.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/bit1neg-1.c: New test.
* gcc.dg/tree-ssa/cond-bool-1.c: New test.
* gcc.dg/tree-ssa/cond-bool-2.c: New test.
|
|
This reverts commit 11ad44da01dd1c91c96e45802fd8b1c50e88703f.
|
|
AVX10 with AVX512 enabled"
This reverts commit 0288ab14732a16b3787546cdd159941eb7306cf3.
|
|
This reverts commit 26a820dc136b00b4dc37609429576b6a914cb572.
|
|
This reverts commit 2485dd9b4e219307f00d683077bbaf5a2add6604.
|
|
This reverts commit 1c3c405ecf23aeb3a2976350887bf2238719c71f.
|
|
This reverts commit d14ab07ee91de0ebf80b73a22c4a23ecf2a2572e.
|
|
This reverts commit aba10895052fcb2ab3c6d53ad98c855509877555.
|
|
This reverts commit 0b20e0f17b47a86cddba68a2e016be0132ae9b0a.
|
|
This reverts commit 5ccdfd0870be168031f8902e1039e77be93b131a.
|
|
This reverts commit 68f7cb6cf9e8b9f2254855507f3b479552adda5f.
|
|
The following applies some maintainance with respect to type qualifiers
and kinds added by later DWARF standards to prune_unused_types_walk.
The particular case in the bug is not handling (thus marking required)
all restrict qualified type DIEs. I've found more DW_TAG_*_type that
are unhandled, looked up the DWARF docs and added them as well based
on common sense.
PR debug/111080
* dwarf2out.cc (prune_unused_types_walk): Handle
DW_TAG_restrict_type, DW_TAG_shared_type, DW_TAG_atomic_type,
DW_TAG_immutable_type, DW_TAG_coarray_type, DW_TAG_unspecified_type
and DW_TAG_dynamic_type as to only output them when referenced.
* gcc.dg/debug/dwarf2/pr111080.c: New testcase.
|
|
gcc/ChangeLog:
* config/i386/i386.cc (ix86_invalid_conversion): Adjust GCC
V13 to GCC 13.1.
|
|
Both "graniterapid-d" and "graniterapids" are attached with
PROCESSOR_GRANITERAPID in processor_alias_table but mapped to
different __cpu_subtype in get_intel_cpu.
And get_builtin_code_for_version will try to match the first
PROCESSOR_GRANITERAPIDS in processor_alias_table which maps to
"granitepraids" here.
861 else if (new_target->arch_specified && new_target->arch > 0)
1862 for (i = 0; i < pta_size; i++)
1863 if (processor_alias_table[i].processor == new_target->arch)
1864 {
1865 const pta *arch_info = &processor_alias_table[i];
1866 switch (arch_info->priority)
1867 {
1868 default:
1869 arg_str = arch_info->name;
This mismatch makes dispatch_function_versions check the preidcate
of__builtin_cpu_is ("graniterapids") for "graniterapids-d" and causes
the issue.
The patch explicitly adds PROCESSOR_ARROWLAKE_S and
PROCESSOR_GRANITERAPIDS_D to make a distinction.
For "alderlake","raptorlake", "meteorlake" they share same isa, cost,
tuning, and mapped to the same __cpu_type/__cpu_subtype in
get_intel_cpu, so no need to add PROCESSOR_RAPTORLAKE and others.
gcc/ChangeLog:
* common/config/i386/i386-common.cc (processor_names): Add new
member graniterapids-s and arrowlake-s.
* config/i386/i386-options.cc (processor_alias_table): Update
table with PROCESSOR_ARROWLAKE_S and
PROCESSOR_GRANITERAPIDS_D.
(m_GRANITERAPID_D): New macro.
(m_ARROWLAKE_S): Ditto.
(m_CORE_AVX512): Add m_GRANITERAPIDS_D.
(processor_cost_table): Add icelake_cost for
PROCESSOR_GRANITERAPIDS_D and alderlake_cost for
PROCESSOR_ARROWLAKE_S.
* config/i386/x86-tune.def: Hanlde m_ARROWLAKE_S same as
m_ARROWLAKE.
* config/i386/i386.h (enum processor_type): Add new member
PROCESSOR_GRANITERAPIDS_D and PROCESSOR_ARROWLAKE_S.
* config/i386/i386-c.cc (ix86_target_macros_internal): Handle
PROCESSOR_GRANITERAPIDS_D and PROCESSOR_ARROWLAKE_S
|
|
* gcc.dg/tree-ssa/update-threading.c: Xfail for cris-*-*.
|
|
|
|
This is primarily Jivan's work, I'm mostly responsible for the write-up and
coordinating with Vlad on a few questions.
On targets with limitations on immediates usable in arithmetic instructions,
LRA's register elimination phase can construct fairly poor code.
This example (from the GCC testsuite) illustrates the problem well.
int consume (void *);
int foo (void) {
int x[1000000];
return consume (x + 1000);
}
If you compile on riscv64-linux-gnu with "-O2 -march=rv64gc -mabi=lp64d", then
you'll get this code (up to the call to consume()).
.cfi_startproc
li t0,-4001792
li a0,-3997696
li a5,4001792
addi sp,sp,-16
.cfi_def_cfa_offset 16
addi t0,t0,1792
addi a0,a0,1696
addi a5,a5,-1792
sd ra,8(sp)
add a5,a5,a0
add sp,sp,t0
.cfi_def_cfa_offset 4000016
.cfi_offset 1, -8
add a0,a5,sp
call consume
Of particular interest is the value in a0 when we call consume. We compute that
horribly inefficiently. If we back-substitute from the final assignment to a0
we get...
a0 = a5 + sp
a0 = a5 + (sp + t0)
a0 = (a5 + a0) + (sp + t0)
a0 = ((a5 - 1792) + a0) + (sp + t0)
a0 = ((a5 - 1792) + (a0 + 1696)) + (sp + t0)
a0 = ((a5 - 1792) + (a0 + 1696)) + (sp + (t0 + 1792))
a0 = (a5 + (a0 + 1696)) + (sp + t0) // removed offsetting terms
a0 = (a5 + (a0 + 1696)) + ((sp - 16) + t0)
a0 = (4001792 + (a0 + 1696)) + ((sp - 16) + t0)
a0 = (4001792 + (-3997696 + 1696)) + ((sp - 16) + t0)
a0 = (4001792 + (-3997696 + 1696)) + ((sp - 16) + -4001792)
a0 = (-3997696 + 1696) + (sp -16) // removed offsetting terms
a0 = sp - 3990616
That's a pretty convoluted way to compute sp - 3990616.
Something like this would be notably better (not great, but we need both the
stack adjustment and the address of the object to pass to consume):
addi sp,sp,-16
sd ra,8(sp)
li t0,-4001792
addi t0,t0,1792
add sp,sp,t0
li a0,4096
addi a0,a0,-96
add a0,sp,a0
call consume
The problem is LRA's elimination code is not handling the case where we have
(plus (reg1) (reg2) where reg1 is an eliminable register and reg2 has a known
equivalency, particularly a constant.
If we can determine that reg2 is equivalent to a constant and treat (plus
(reg1) (reg2)) in the same way we'd treat (plus (reg1) (const_int)) then we can
get the desired code.
This eliminates about 19b instructions, or roughly 1% for deepsjeng on rv64.
There are improvements elsewhere, but they're relatively small. This may
ultimately lessen the value of Manolis's fold-mem-offsets patch. So we'll have
to evaluate that again once he posts a new version.
Bootstrapped and regression tested on x86_64 as well as bootstrapped on rv64.
Earlier versions have been tested against spec2017. Pre-approved by Vlad in a
private email conversation (thanks Vlad!).
Committed to the trunk,
gcc/
* lra-eliminations.cc (eliminate_regs_in_insn): Use equivalences to
to help simplify code further.
|
|
gcc/fortran/ChangeLog:
PR fortran/32986
* resolve.cc (is_non_constant_shape_array): Add forward declaration.
(resolve_common_vars): Diagnose automatic array object in COMMON.
(resolve_symbol): Prevent confusing follow-on error.
gcc/testsuite/ChangeLog:
PR fortran/32986
* gfortran.dg/common_28.f90: New test.
|
|
Rangers PHI analyzer currently only allows a single initializer to a group.
This patch changes that to use an inialization range, which is
cumulative of all integer constants, plus a single symbolic value.
There is no other change to group functionality.
This patch also changes the way PHI groups are printed so they show up in the
listing as they are encountered, rather than as a list at the end. It
was more difficult to see what was going on previously.
PR tree-optimization/110918 - Initialize with range instead of a tree.
gcc/
* gimple-range-fold.cc (fold_using_range::range_of_phi): Tweak output.
* gimple-range-phi.cc (phi_group::phi_group): Remove unused members.
Initialize using a range instead of value and edge.
(phi_group::calculate_using_modifier): Use initializer value and
process for relations after trying for iteration convergence.
(phi_group::refine_using_relation): Use initializer range.
(phi_group::dump): Rework the dump output.
(phi_analyzer::process_phi): Allow multiple constant initilizers.
Dump groups immediately as created.
(phi_analyzer::dump): Tweak output.
* gimple-range-phi.h (phi_group::phi_group): Adjust prototype.
(phi_group::initial_value): Delete.
(phi_group::refine_using_relation): Adjust prototype.
(phi_group::m_initial_value): Delete.
(phi_group::m_initial_edge): Delete.
(phi_group::m_vr): Use int_range_max.
* tree-vrp.cc (execute_ranger_vrp): Don't dump phi groups.
gcc/testsuite/
* gcc.dg/pr102983.c: Adjust output expectations.
* gcc.dg/pr110918.c: New.
|