|
When vectorizing 4 times, we sometimes do
  for
    <4x vectorized body>
  for
    <2x vectorized body>
  for
    <1x vectorized body>
Here the second two fors handling the epilogue never iterate.
Currently the vectorizer thinks that the middle for iterates twice.
This turns out to be caused by scale_profile_for_vect_loop, which uses
niter_for_unrolled_loop.
At that time we know the epilogue will iterate at most 2 times,
but niter_for_unrolled_loop does not know that the last iteration
will be taken by the epilogue-of-epilogue and thus it thinks
that the loop may iterate once and exit in the middle of the second
iteration.
We already do a correct job updating the niter bounds, and this is
just an ordering issue. This patch makes us first update
the bounds and then update the loop profile. I re-implemented
the function more correctly and precisely.
The loop reducing the iteration factor for overly flat profiles is a bit
funny, but the only other method I can think of is to compute an sreal
scale, which I think would have similar overhead.
Bootstrapped/regtested x86_64-linux, will commit it shortly.
gcc/ChangeLog:
PR middle-end/110649
* tree-vect-loop.cc (scale_profile_for_vect_loop): Rewrite.
(vect_transform_loop): Move scale_profile_for_vect_loop after
upper bound updates.
|
While looking into the sphinx3 regression I noticed that the vectorizer
produces BBs with an overall probability count of 120%. This patch fixes it.
Richi, I don't know how to create a testcase, but having one would
be nice.
Bootstrapped/regtested x86_64-linux, will commit it shortly.
gcc/ChangeLog:
PR tree-optimization/110649
* tree-vect-loop.cc (optimize_mask_stores): Set the probability
of the if-then-else construct correctly.
|
try_peel_loop uses gimple_duplicate_loop_body_to_header_edge, which subtracts
the profile from the original loop. However, it then tries to scale the profile
in a wrong way (it forces the header count to be the entry count).
This eliminates the profile misupdates in the internal loop of sphinx3.
gcc/ChangeLog:
PR middle-end/110649
* tree-ssa-loop-ivcanon.cc (try_peel_loop): Avoid double profile update.
|
pr103628.f90 adds the -mabi=ibmlongdouble option, but AIX defaults
to 64-bit long double. This patch adds -mlong-double-128 to ensure
that the testcase is compiled with 128-bit long double.
gcc/testsuite/ChangeLog:
* gfortran.dg/pr103628.f90: Add -mlong-double-128 option.
Signed-off-by: David Edelsohn <dje.gcc@gmail.com>
|
Committed as obvious after making sure the documentation still builds.
gcc/ChangeLog:
* doc/contrib.texi: Update my entry.
|
2023-07-15 John David Anglin <danglin@gcc.gnu.org>
gcc/ChangeLog:
* config/pa/pa.md: Define constants R1_REGNUM, R19_REGNUM and
R27_REGNUM.
(tgd_load): Restrict to !TARGET_64BIT. Use register constants.
(tld_load): Likewise.
(tgd_load_pic): Change to expander.
(tld_load_pic, tld_offset_load, tp_load): Likewise.
(tie_load_pic, tle_load): Likewise.
(tgd_load_picsi, tgd_load_picdi): New.
(tld_load_picsi, tld_load_picdi): New.
(tld_offset_load<P:mode>): New.
(tp_load<P:mode>): New.
(tie_load_picsi, tie_load_picdi): New.
(tle_load<P:mode>): New.
|
Here the call A().f() is represented as a COMPOUND_EXPR whose first
operand is the otherwise unused object argument A() and second operand
is the call result (both are TARGET_EXPRs). Within the return statement,
this outermost COMPOUND_EXPR ends up foiling the copy elision check in
build_special_member_call, resulting in us introducing a bogus call to the
deleted move constructor. (Within the variable initialization, which goes
through ocp_convert instead of convert_for_initialization, we've already
been eliding the copy -- despite the outermost COMPOUND_EXPR -- ever since
r10-7410-g72809d6fe8e085 made ocp_convert look through COMPOUND_EXPR).
In contrast I noticed '(A(), A::f())' (which should be equivalent to
the above call) is represented with the COMPOUND_EXPR inside the RHS's
TARGET_EXPR initializer thanks to a special case in cp_build_compound_expr.
So this patch fixes this by making keep_unused_object_arg use
cp_build_compound_expr as well.
PR c++/110441
gcc/cp/ChangeLog:
* call.cc (keep_unused_object_arg): Use cp_build_compound_expr
instead of building a COMPOUND_EXPR directly.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1z/elide8.C: New test.
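As an illustration, a minimal sketch of the kind of code described (a
hypothetical reconstruction, not the actual elide8.C testcase): f is a static
member function, so the object argument A() is evaluated but otherwise unused,
and the call result must be elided into the return value because no copy/move
constructor is available.
struct A {
  A() = default;
  A(A&&) = delete;
  static A f() { return {}; }
};
A g() { return A().f(); }  // previously: bogus call to the deleted move ctor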
|
This fixes a crash when mangling an ADL-enabled call to a template-id
naming an unknown template (as per P0846R0).
PR c++/110524
gcc/cp/ChangeLog:
* mangle.cc (write_expression): Handle TEMPLATE_ID_EXPR
whose template is already an IDENTIFIER_NODE.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/fn-template26.C: New test.
|
At this point r == t, but it makes more sense to refer to t like all the
other cases do.
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_constant_expression): Pass t to get_value.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
|
Apparently I wasn't actually running the testsuite in C++26 mode like I
thought I was, so there were some failures I wasn't seeing.
The constexpr hunk fixes regressions with the P2738 implementation; we still
need to use the old handling for casting from void pointers to heap
variables.
PR c++/110344
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_constant_expression): Move P2738 handling
after heap handling.
* name-lookup.cc (get_cxx_dialect_name): Add C++26.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/constexpr-cast2.C: Adjust for P2738.
* g++.dg/ipa/devirt-45.C: Handle -fimplicit-constexpr.
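A rough sketch of the interaction described (hypothetical, assuming C++26
mode; not the actual constexpr-cast2.C testcase): a void* obtained from a heap
allocation during constant evaluation must still go through the old heap
handling rather than the generic P2738 cast handling.
constexpr int f ()
{
  int *p = new int (42);
  void *v = p;                      // heap variable through a void*
  int *q = static_cast<int *> (v);  // needs the old heap handling
  int r = *q;
  delete q;
  return r;
}
static_assert (f () == 42);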
|
Implement vcmlaq using the new MVE builtins framework.
2023-07-13 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/arm-mve-builtins-base.cc (vcmlaq, vcmlaq_rot90)
(vcmlaq_rot180, vcmlaq_rot270): New.
* config/arm/arm-mve-builtins-base.def (vcmlaq, vcmlaq_rot90)
(vcmlaq_rot180, vcmlaq_rot270): New.
* config/arm/arm-mve-builtins-base.h (vcmlaq, vcmlaq_rot90)
(vcmlaq_rot180, vcmlaq_rot270): New.
* config/arm/arm-mve-builtins.cc
(function_instance::has_inactive_argument): Handle vcmlaq,
vcmlaq_rot90, vcmlaq_rot180, vcmlaq_rot270.
* config/arm/arm_mve.h (vcmlaq): Delete.
(vcmlaq_rot180): Delete.
(vcmlaq_rot270): Delete.
(vcmlaq_rot90): Delete.
(vcmlaq_m): Delete.
(vcmlaq_rot180_m): Delete.
(vcmlaq_rot270_m): Delete.
(vcmlaq_rot90_m): Delete.
(vcmlaq_f16): Delete.
(vcmlaq_rot180_f16): Delete.
(vcmlaq_rot270_f16): Delete.
(vcmlaq_rot90_f16): Delete.
(vcmlaq_f32): Delete.
(vcmlaq_rot180_f32): Delete.
(vcmlaq_rot270_f32): Delete.
(vcmlaq_rot90_f32): Delete.
(vcmlaq_m_f32): Delete.
(vcmlaq_m_f16): Delete.
(vcmlaq_rot180_m_f32): Delete.
(vcmlaq_rot180_m_f16): Delete.
(vcmlaq_rot270_m_f32): Delete.
(vcmlaq_rot270_m_f16): Delete.
(vcmlaq_rot90_m_f32): Delete.
(vcmlaq_rot90_m_f16): Delete.
(__arm_vcmlaq_f16): Delete.
(__arm_vcmlaq_rot180_f16): Delete.
(__arm_vcmlaq_rot270_f16): Delete.
(__arm_vcmlaq_rot90_f16): Delete.
(__arm_vcmlaq_f32): Delete.
(__arm_vcmlaq_rot180_f32): Delete.
(__arm_vcmlaq_rot270_f32): Delete.
(__arm_vcmlaq_rot90_f32): Delete.
(__arm_vcmlaq_m_f32): Delete.
(__arm_vcmlaq_m_f16): Delete.
(__arm_vcmlaq_rot180_m_f32): Delete.
(__arm_vcmlaq_rot180_m_f16): Delete.
(__arm_vcmlaq_rot270_m_f32): Delete.
(__arm_vcmlaq_rot270_m_f16): Delete.
(__arm_vcmlaq_rot90_m_f32): Delete.
(__arm_vcmlaq_rot90_m_f16): Delete.
(__arm_vcmlaq): Delete.
(__arm_vcmlaq_rot180): Delete.
(__arm_vcmlaq_rot270): Delete.
(__arm_vcmlaq_rot90): Delete.
(__arm_vcmlaq_m): Delete.
(__arm_vcmlaq_rot180_m): Delete.
(__arm_vcmlaq_rot270_m): Delete.
(__arm_vcmlaq_rot90_m): Delete.
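The user-facing intrinsics are unchanged by the move to the new framework; a
minimal usage sketch (assumes an MVE-enabled target, e.g.
-march=armv8.1-m.main+mve.fp):
#include <arm_mve.h>
float32x4_t
f (float32x4_t acc, float32x4_t a, float32x4_t b)
{
  return vcmlaq_rot90 (acc, a, b);  // complex multiply-accumulate, 90-degree rotation
}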
|
Factorize vcmlaq builtins so that they use parameterized names.
2023-07-13 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/arm_mve_builtins.def (vcmlaq_rot90_f)
(vcmlaq_rot270_f, vcmlaq_rot180_f, vcmlaq_f): Add "_f" suffix.
* config/arm/iterators.md (MVE_VCMLAQ_M): New.
(mve_insn): Add vcmla.
(rot): Add VCMLAQ_M_F, VCMLAQ_ROT90_M_F, VCMLAQ_ROT180_M_F,
VCMLAQ_ROT270_M_F.
(mve_rot): Add VCMLAQ_M_F, VCMLAQ_ROT90_M_F, VCMLAQ_ROT180_M_F,
VCMLAQ_ROT270_M_F.
* config/arm/mve.md (mve_vcmlaq<mve_rot><mode>): Rename into ...
(@mve_<mve_insn>q<mve_rot>_f<mode>): ... this.
(mve_vcmlaq_m_f<mode>, mve_vcmlaq_rot180_m_f<mode>)
(mve_vcmlaq_rot270_m_f<mode>, mve_vcmlaq_rot90_m_f<mode>): Merge
into ...
(@mve_<mve_insn>q<mve_rot>_m_f<mode>): ... this.
|
Implement vcmulq using the new MVE builtins framework.
2023-07-13 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/arm-mve-builtins-base.cc (vcmulq, vcmulq_rot90)
(vcmulq_rot180, vcmulq_rot270): New.
* config/arm/arm-mve-builtins-base.def (vcmulq, vcmulq_rot90)
(vcmulq_rot180, vcmulq_rot270): New.
* config/arm/arm-mve-builtins-base.h (vcmulq, vcmulq_rot90)
(vcmulq_rot180, vcmulq_rot270): New.
* config/arm/arm_mve.h (vcmulq_rot90): Delete.
(vcmulq_rot270): Delete.
(vcmulq_rot180): Delete.
(vcmulq): Delete.
(vcmulq_m): Delete.
(vcmulq_rot180_m): Delete.
(vcmulq_rot270_m): Delete.
(vcmulq_rot90_m): Delete.
(vcmulq_x): Delete.
(vcmulq_rot90_x): Delete.
(vcmulq_rot180_x): Delete.
(vcmulq_rot270_x): Delete.
(vcmulq_rot90_f16): Delete.
(vcmulq_rot270_f16): Delete.
(vcmulq_rot180_f16): Delete.
(vcmulq_f16): Delete.
(vcmulq_rot90_f32): Delete.
(vcmulq_rot270_f32): Delete.
(vcmulq_rot180_f32): Delete.
(vcmulq_f32): Delete.
(vcmulq_m_f32): Delete.
(vcmulq_m_f16): Delete.
(vcmulq_rot180_m_f32): Delete.
(vcmulq_rot180_m_f16): Delete.
(vcmulq_rot270_m_f32): Delete.
(vcmulq_rot270_m_f16): Delete.
(vcmulq_rot90_m_f32): Delete.
(vcmulq_rot90_m_f16): Delete.
(vcmulq_x_f16): Delete.
(vcmulq_x_f32): Delete.
(vcmulq_rot90_x_f16): Delete.
(vcmulq_rot90_x_f32): Delete.
(vcmulq_rot180_x_f16): Delete.
(vcmulq_rot180_x_f32): Delete.
(vcmulq_rot270_x_f16): Delete.
(vcmulq_rot270_x_f32): Delete.
(__arm_vcmulq_rot90_f16): Delete.
(__arm_vcmulq_rot270_f16): Delete.
(__arm_vcmulq_rot180_f16): Delete.
(__arm_vcmulq_f16): Delete.
(__arm_vcmulq_rot90_f32): Delete.
(__arm_vcmulq_rot270_f32): Delete.
(__arm_vcmulq_rot180_f32): Delete.
(__arm_vcmulq_f32): Delete.
(__arm_vcmulq_m_f32): Delete.
(__arm_vcmulq_m_f16): Delete.
(__arm_vcmulq_rot180_m_f32): Delete.
(__arm_vcmulq_rot180_m_f16): Delete.
(__arm_vcmulq_rot270_m_f32): Delete.
(__arm_vcmulq_rot270_m_f16): Delete.
(__arm_vcmulq_rot90_m_f32): Delete.
(__arm_vcmulq_rot90_m_f16): Delete.
(__arm_vcmulq_x_f16): Delete.
(__arm_vcmulq_x_f32): Delete.
(__arm_vcmulq_rot90_x_f16): Delete.
(__arm_vcmulq_rot90_x_f32): Delete.
(__arm_vcmulq_rot180_x_f16): Delete.
(__arm_vcmulq_rot180_x_f32): Delete.
(__arm_vcmulq_rot270_x_f16): Delete.
(__arm_vcmulq_rot270_x_f32): Delete.
(__arm_vcmulq_rot90): Delete.
(__arm_vcmulq_rot270): Delete.
(__arm_vcmulq_rot180): Delete.
(__arm_vcmulq): Delete.
(__arm_vcmulq_m): Delete.
(__arm_vcmulq_rot180_m): Delete.
(__arm_vcmulq_rot270_m): Delete.
(__arm_vcmulq_rot90_m): Delete.
(__arm_vcmulq_x): Delete.
(__arm_vcmulq_rot90_x): Delete.
(__arm_vcmulq_rot180_x): Delete.
(__arm_vcmulq_rot270_x): Delete.
|
Factorize vcmulq builtins so that they use parameterized names.
We can merge them with vcadd.
2023-07-13 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/arm_mve_builtins.def (vcmulq_rot90_f)
(vcmulq_rot270_f, vcmulq_rot180_f, vcmulq_f): Add "_f" suffix.
* config/arm/iterators.md (MVE_VCADDQ_VCMULQ)
(MVE_VCADDQ_VCMULQ_M): New.
(mve_insn): Add vcmul.
(rot): Add VCMULQ_M_F, VCMULQ_ROT90_M_F, VCMULQ_ROT180_M_F,
VCMULQ_ROT270_M_F.
(VCMUL): Delete.
(mve_rot): Add VCMULQ_M_F, VCMULQ_ROT90_M_F, VCMULQ_ROT180_M_F,
VCMULQ_ROT270_M_F.
* config/arm/mve.md (mve_vcmulq<mve_rot><mode>): Merge into
@mve_<mve_insn>q<mve_rot>_f<mode>.
(mve_vcmulq_m_f<mode>, mve_vcmulq_rot180_m_f<mode>)
(mve_vcmulq_rot270_m_f<mode>, mve_vcmulq_rot90_m_f<mode>): Merge
into @mve_<mve_insn>q<mve_rot>_m_f<mode>.
|
Implement vcaddq, vhcaddq using the new MVE builtins framework.
2023-07-13 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/arm-mve-builtins-base.cc (vcaddq_rot90)
(vcaddq_rot270, vhcaddq_rot90, vhcaddq_rot270): New.
* config/arm/arm-mve-builtins-base.def (vcaddq_rot90)
(vcaddq_rot270, vhcaddq_rot90, vhcaddq_rot270): New.
* config/arm/arm-mve-builtins-base.h (vcaddq_rot90)
(vcaddq_rot270, vhcaddq_rot90, vhcaddq_rot270): New.
* config/arm/arm-mve-builtins-functions.h (class
unspec_mve_function_exact_insn_rot): New.
* config/arm/arm_mve.h (vcaddq_rot90): Delete.
(vcaddq_rot270): Delete.
(vhcaddq_rot90): Delete.
(vhcaddq_rot270): Delete.
(vcaddq_rot270_m): Delete.
(vcaddq_rot90_m): Delete.
(vhcaddq_rot270_m): Delete.
(vhcaddq_rot90_m): Delete.
(vcaddq_rot90_x): Delete.
(vcaddq_rot270_x): Delete.
(vhcaddq_rot90_x): Delete.
(vhcaddq_rot270_x): Delete.
(vcaddq_rot90_u8): Delete.
(vcaddq_rot270_u8): Delete.
(vhcaddq_rot90_s8): Delete.
(vhcaddq_rot270_s8): Delete.
(vcaddq_rot90_s8): Delete.
(vcaddq_rot270_s8): Delete.
(vcaddq_rot90_u16): Delete.
(vcaddq_rot270_u16): Delete.
(vhcaddq_rot90_s16): Delete.
(vhcaddq_rot270_s16): Delete.
(vcaddq_rot90_s16): Delete.
(vcaddq_rot270_s16): Delete.
(vcaddq_rot90_u32): Delete.
(vcaddq_rot270_u32): Delete.
(vhcaddq_rot90_s32): Delete.
(vhcaddq_rot270_s32): Delete.
(vcaddq_rot90_s32): Delete.
(vcaddq_rot270_s32): Delete.
(vcaddq_rot90_f16): Delete.
(vcaddq_rot270_f16): Delete.
(vcaddq_rot90_f32): Delete.
(vcaddq_rot270_f32): Delete.
(vcaddq_rot270_m_s8): Delete.
(vcaddq_rot270_m_s32): Delete.
(vcaddq_rot270_m_s16): Delete.
(vcaddq_rot270_m_u8): Delete.
(vcaddq_rot270_m_u32): Delete.
(vcaddq_rot270_m_u16): Delete.
(vcaddq_rot90_m_s8): Delete.
(vcaddq_rot90_m_s32): Delete.
(vcaddq_rot90_m_s16): Delete.
(vcaddq_rot90_m_u8): Delete.
(vcaddq_rot90_m_u32): Delete.
(vcaddq_rot90_m_u16): Delete.
(vhcaddq_rot270_m_s8): Delete.
(vhcaddq_rot270_m_s32): Delete.
(vhcaddq_rot270_m_s16): Delete.
(vhcaddq_rot90_m_s8): Delete.
(vhcaddq_rot90_m_s32): Delete.
(vhcaddq_rot90_m_s16): Delete.
(vcaddq_rot270_m_f32): Delete.
(vcaddq_rot270_m_f16): Delete.
(vcaddq_rot90_m_f32): Delete.
(vcaddq_rot90_m_f16): Delete.
(vcaddq_rot90_x_s8): Delete.
(vcaddq_rot90_x_s16): Delete.
(vcaddq_rot90_x_s32): Delete.
(vcaddq_rot90_x_u8): Delete.
(vcaddq_rot90_x_u16): Delete.
(vcaddq_rot90_x_u32): Delete.
(vcaddq_rot270_x_s8): Delete.
(vcaddq_rot270_x_s16): Delete.
(vcaddq_rot270_x_s32): Delete.
(vcaddq_rot270_x_u8): Delete.
(vcaddq_rot270_x_u16): Delete.
(vcaddq_rot270_x_u32): Delete.
(vhcaddq_rot90_x_s8): Delete.
(vhcaddq_rot90_x_s16): Delete.
(vhcaddq_rot90_x_s32): Delete.
(vhcaddq_rot270_x_s8): Delete.
(vhcaddq_rot270_x_s16): Delete.
(vhcaddq_rot270_x_s32): Delete.
(vcaddq_rot90_x_f16): Delete.
(vcaddq_rot90_x_f32): Delete.
(vcaddq_rot270_x_f16): Delete.
(vcaddq_rot270_x_f32): Delete.
(__arm_vcaddq_rot90_u8): Delete.
(__arm_vcaddq_rot270_u8): Delete.
(__arm_vhcaddq_rot90_s8): Delete.
(__arm_vhcaddq_rot270_s8): Delete.
(__arm_vcaddq_rot90_s8): Delete.
(__arm_vcaddq_rot270_s8): Delete.
(__arm_vcaddq_rot90_u16): Delete.
(__arm_vcaddq_rot270_u16): Delete.
(__arm_vhcaddq_rot90_s16): Delete.
(__arm_vhcaddq_rot270_s16): Delete.
(__arm_vcaddq_rot90_s16): Delete.
(__arm_vcaddq_rot270_s16): Delete.
(__arm_vcaddq_rot90_u32): Delete.
(__arm_vcaddq_rot270_u32): Delete.
(__arm_vhcaddq_rot90_s32): Delete.
(__arm_vhcaddq_rot270_s32): Delete.
(__arm_vcaddq_rot90_s32): Delete.
(__arm_vcaddq_rot270_s32): Delete.
(__arm_vcaddq_rot270_m_s8): Delete.
(__arm_vcaddq_rot270_m_s32): Delete.
(__arm_vcaddq_rot270_m_s16): Delete.
(__arm_vcaddq_rot270_m_u8): Delete.
(__arm_vcaddq_rot270_m_u32): Delete.
(__arm_vcaddq_rot270_m_u16): Delete.
(__arm_vcaddq_rot90_m_s8): Delete.
(__arm_vcaddq_rot90_m_s32): Delete.
(__arm_vcaddq_rot90_m_s16): Delete.
(__arm_vcaddq_rot90_m_u8): Delete.
(__arm_vcaddq_rot90_m_u32): Delete.
(__arm_vcaddq_rot90_m_u16): Delete.
(__arm_vhcaddq_rot270_m_s8): Delete.
(__arm_vhcaddq_rot270_m_s32): Delete.
(__arm_vhcaddq_rot270_m_s16): Delete.
(__arm_vhcaddq_rot90_m_s8): Delete.
(__arm_vhcaddq_rot90_m_s32): Delete.
(__arm_vhcaddq_rot90_m_s16): Delete.
(__arm_vcaddq_rot90_x_s8): Delete.
(__arm_vcaddq_rot90_x_s16): Delete.
(__arm_vcaddq_rot90_x_s32): Delete.
(__arm_vcaddq_rot90_x_u8): Delete.
(__arm_vcaddq_rot90_x_u16): Delete.
(__arm_vcaddq_rot90_x_u32): Delete.
(__arm_vcaddq_rot270_x_s8): Delete.
(__arm_vcaddq_rot270_x_s16): Delete.
(__arm_vcaddq_rot270_x_s32): Delete.
(__arm_vcaddq_rot270_x_u8): Delete.
(__arm_vcaddq_rot270_x_u16): Delete.
(__arm_vcaddq_rot270_x_u32): Delete.
(__arm_vhcaddq_rot90_x_s8): Delete.
(__arm_vhcaddq_rot90_x_s16): Delete.
(__arm_vhcaddq_rot90_x_s32): Delete.
(__arm_vhcaddq_rot270_x_s8): Delete.
(__arm_vhcaddq_rot270_x_s16): Delete.
(__arm_vhcaddq_rot270_x_s32): Delete.
(__arm_vcaddq_rot90_f16): Delete.
(__arm_vcaddq_rot270_f16): Delete.
(__arm_vcaddq_rot90_f32): Delete.
(__arm_vcaddq_rot270_f32): Delete.
(__arm_vcaddq_rot270_m_f32): Delete.
(__arm_vcaddq_rot270_m_f16): Delete.
(__arm_vcaddq_rot90_m_f32): Delete.
(__arm_vcaddq_rot90_m_f16): Delete.
(__arm_vcaddq_rot90_x_f16): Delete.
(__arm_vcaddq_rot90_x_f32): Delete.
(__arm_vcaddq_rot270_x_f16): Delete.
(__arm_vcaddq_rot270_x_f32): Delete.
(__arm_vcaddq_rot90): Delete.
(__arm_vcaddq_rot270): Delete.
(__arm_vhcaddq_rot90): Delete.
(__arm_vhcaddq_rot270): Delete.
(__arm_vcaddq_rot270_m): Delete.
(__arm_vcaddq_rot90_m): Delete.
(__arm_vhcaddq_rot270_m): Delete.
(__arm_vhcaddq_rot90_m): Delete.
(__arm_vcaddq_rot90_x): Delete.
(__arm_vcaddq_rot270_x): Delete.
(__arm_vhcaddq_rot90_x): Delete.
(__arm_vhcaddq_rot270_x): Delete.
|
Factorize vcaddq, vhcaddq so that they use the same parameterized
names.
To be able to use the same patterns, we add a suffix to vcaddq.
Note that vcadd uses UNSPEC_VCADDxx for builtins without predication,
and VCADDQ_ROTxx_M_x (that is, not starting with "UNSPEC_") for the
predicated ones. The UNSPEC_* names are also used by neon.md.
2023-07-13 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/arm_mve_builtins.def (vcaddq_rot90_, vcaddq_rot270_)
(vcaddq_rot90_f, vcaddq_rot270_f): Add "_" or "_f" suffix.
* config/arm/iterators.md (mve_insn): Add vcadd, vhcadd.
(isu): Add UNSPEC_VCADD90, UNSPEC_VCADD270, VCADDQ_ROT270_M_U,
VCADDQ_ROT270_M_S, VCADDQ_ROT90_M_U, VCADDQ_ROT90_M_S,
VHCADDQ_ROT90_M_S, VHCADDQ_ROT270_M_S, VHCADDQ_ROT90_S,
VHCADDQ_ROT270_S.
(rot): Add VCADDQ_ROT90_M_F, VCADDQ_ROT90_M_S, VCADDQ_ROT90_M_U,
VCADDQ_ROT270_M_F, VCADDQ_ROT270_M_S, VCADDQ_ROT270_M_U,
VHCADDQ_ROT90_S, VHCADDQ_ROT270_S, VHCADDQ_ROT90_M_S,
VHCADDQ_ROT270_M_S.
(mve_rot): Add VCADDQ_ROT90_M_F, VCADDQ_ROT90_M_S,
VCADDQ_ROT90_M_U, VCADDQ_ROT270_M_F, VCADDQ_ROT270_M_S,
VCADDQ_ROT270_M_U, VHCADDQ_ROT90_S, VHCADDQ_ROT270_S,
VHCADDQ_ROT90_M_S, VHCADDQ_ROT270_M_S.
(supf): Add VHCADDQ_ROT90_M_S, VHCADDQ_ROT270_M_S,
VHCADDQ_ROT90_S, VHCADDQ_ROT270_S, UNSPEC_VCADD90,
UNSPEC_VCADD270.
(VCADDQ_ROT270_M): Delete.
(VCADDQ_M_F, VxCADDQ, VxCADDQ_M): New.
(VCADDQ_ROT90_M): Delete.
* config/arm/mve.md (mve_vcaddq<mve_rot><mode>)
(mve_vhcaddq_rot270_s<mode>, mve_vhcaddq_rot90_s<mode>): Merge
into ...
(@mve_<mve_insn>q<mve_rot>_<supf><mode>): ... this.
(mve_vcaddq<mve_rot><mode>): Rename into ...
(@mve_<mve_insn>q<mve_rot>_f<mode>): ... this.
(mve_vcaddq_rot270_m_<supf><mode>)
(mve_vcaddq_rot90_m_<supf><mode>, mve_vhcaddq_rot270_m_s<mode>)
(mve_vhcaddq_rot90_m_s<mode>): Merge into ...
(@mve_<mve_insn>q<mve_rot>_m_<supf><mode>): ... this.
(mve_vcaddq_rot270_m_f<mode>, mve_vcaddq_rot90_m_f<mode>): Merge
into ...
(@mve_<mve_insn>q<mve_rot>_m_f<mode>): ... this.
|
This patch resolves PR target/110588 to catch another case in combine
where the i386 backend should be generating a btl instruction. This adds
another define_insn_and_split to recognize the RTL representation for this
case.
I also noticed that two related define_insn_and_split weren't using the
preferred string style for single statement preparation-statements, so
I've reformatted these to be consistent in style with the new one.
2023-07-14 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR target/110588
* config/i386/i386.md (*bt<mode>_setcqi): Prefer string form
preparation statement over braces for a single statement.
(*bt<mode>_setncqi): Likewise.
(*bt<mode>_setncqi_2): New define_insn_and_split.
gcc/testsuite/ChangeLog
PR target/110588
* gcc.target/i386/pr110588.c: New test case.
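A hedged guess at the shape of code affected (the actual pr110588.c testcase
is not shown here): testing a single bit and materializing its complement,
which ideally becomes btl followed by setnc instead of a shift/and/xor
sequence.
unsigned char
f (unsigned int x, unsigned int n)
{
  return !((x >> n) & 1);
}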
|
Since r8-509, we'll no longer create a static temporary var for
the initializer '{ 1, 2 }' for num in the attached test because
the code in finish_compound_literal is now guarded by
'&& fcl_context == fcl_c99' but it's fcl_functional here. This
causes us to reject num as non-constant when evaluating it in
a template.
Jason's idea was to treat num as value-dependent even though it
actually isn't. This patch implements that suggestion.
We weren't marking objects whose type is an empty class type as
constant. This patch changes that so that value_dependent_expression_p
(v_d_e_p) doesn't need to check is_really_empty_class.
Co-authored-by: Jason Merrill <jason@redhat.com>
PR c++/109876
gcc/cp/ChangeLog:
* decl.cc (cp_finish_decl): Set TREE_CONSTANT when initializing
an object of empty class type.
* pt.cc (value_dependent_expression_p) <case VAR_DECL>: Treat a
constexpr-declared non-constant variable as value-dependent.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/constexpr-template12.C: New test.
* g++.dg/cpp1z/constexpr-template1.C: New test.
* g++.dg/cpp1z/constexpr-template2.C: New test.
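A sketch of the kind of code described (hypothetical reconstruction, not one
of the actual testcases): the functional cast S{1, 2} goes through
finish_compound_literal with fcl_functional, and num used to be rejected as
non-constant inside the template.
struct S { int a, b; };
template <class T>
void f ()
{
  constexpr S num = S{1, 2};
  static_assert (num.a == 1, "");
}
void g () { f<int> (); }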
|
This is the next piece towards a fix for (the x86_64 ABI issues affecting)
PR 88873. This patch generalizes the recent tweak to ix86_expand_move
for setting the highpart of a TImode reg from a DImode source using
*insvti_highpart_1, to handle both DImode and DFmode sources, and also
use the recently added *insvti_lowpart_1 for setting the lowpart.
Although this is another intermediate step (not yet a fix), towards
enabling *insvti and *concat* patterns to be candidates for TImode STV
(by using V2DI/V2DF instructions), it already improves things a little.
For the test case from PR 88873
typedef struct { double x, y; } s_t;
typedef double v2df __attribute__ ((vector_size (2 * sizeof(double))));
s_t foo (s_t a, s_t b, s_t c)
{
return (s_t) { fma(a.x, b.x, c.x), fma (a.y, b.y, c.y) };
}
With -O2 -march=cascadelake, GCC currently generates:
Before (29 instructions):
vmovq %xmm2, -56(%rsp)
movq -56(%rsp), %rdx
vmovq %xmm4, -40(%rsp)
movq $0, -48(%rsp)
movq %rdx, -56(%rsp)
movq -40(%rsp), %rdx
vmovq %xmm0, -24(%rsp)
movq %rdx, -40(%rsp)
movq -24(%rsp), %rsi
movq -56(%rsp), %rax
movq $0, -32(%rsp)
vmovq %xmm3, -48(%rsp)
movq -48(%rsp), %rcx
vmovq %xmm5, -32(%rsp)
vmovq %rax, %xmm6
movq -40(%rsp), %rax
movq $0, -16(%rsp)
movq %rsi, -24(%rsp)
movq -32(%rsp), %rsi
vpinsrq $1, %rcx, %xmm6, %xmm6
vmovq %rax, %xmm7
vmovq %xmm1, -16(%rsp)
vmovapd %xmm6, %xmm3
vpinsrq $1, %rsi, %xmm7, %xmm7
vfmadd132pd -24(%rsp), %xmm7, %xmm3
vmovapd %xmm3, -56(%rsp)
vmovsd -48(%rsp), %xmm1
vmovsd -56(%rsp), %xmm0
ret
After (20 instructions):
vmovq %xmm2, -56(%rsp)
movq -56(%rsp), %rax
vmovq %xmm3, -48(%rsp)
vmovq %xmm4, -40(%rsp)
movq -48(%rsp), %rcx
vmovq %xmm5, -32(%rsp)
vmovq %rax, %xmm6
movq -40(%rsp), %rax
movq -32(%rsp), %rsi
vpinsrq $1, %rcx, %xmm6, %xmm6
vmovq %xmm0, -24(%rsp)
vmovq %rax, %xmm7
vmovq %xmm1, -16(%rsp)
vmovapd %xmm6, %xmm2
vpinsrq $1, %rsi, %xmm7, %xmm7
vfmadd132pd -24(%rsp), %xmm7, %xmm2
vmovapd %xmm2, -56(%rsp)
vmovsd -48(%rsp), %xmm1
vmovsd -56(%rsp), %xmm0
ret
2023-07-14 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_move): Generalize special
case inserting of 64-bit values into a TImode register, to handle
both DImode and DFmode using either *insvti_lowpart_1
or *insvti_highpart_1.
|
The cprop1 pass does not consider paradoxical subregs, and for (insn 22) claims
that it equals 8 elements of HImode by setting a REG_EQUAL note:
(insn 21 19 22 4 (set (reg:V4QI 98)
(mem/u/c:V4QI (symbol_ref/u:DI ("*.LC1") [flags 0x2]) [0 S4 A32])) "pr110206.c":12:42 1530 {*movv4qi_internal}
(expr_list:REG_EQUAL (const_vector:V4QI [
(const_int -52 [0xffffffffffffffcc]) repeated x4
])
(nil)))
(insn 22 21 23 4 (set (reg:V8HI 100)
(zero_extend:V8HI (vec_select:V8QI (subreg:V16QI (reg:V4QI 98) 0)
(parallel [
(const_int 0 [0])
(const_int 1 [0x1])
(const_int 2 [0x2])
(const_int 3 [0x3])
(const_int 4 [0x4])
(const_int 5 [0x5])
(const_int 6 [0x6])
(const_int 7 [0x7])
])))) "pr110206.c":12:42 7471 {sse4_1_zero_extendv8qiv8hi2}
(expr_list:REG_EQUAL (const_vector:V8HI [
(const_int 204 [0xcc]) repeated x8
])
(expr_list:REG_DEAD (reg:V4QI 98)
(nil))))
We rely on the "undefined" vals to have a specific value (from the earlier
REG_EQUAL note), but actual code generation doesn't ensure this (it doesn't
need to). That said, the issue isn't the constant folding per se, but that
we do not actually constant fold; we register an equality that doesn't hold.
PR target/110206
gcc/ChangeLog:
* fwprop.cc (contains_paradoxical_subreg_p): Move to ...
* rtlanal.cc (contains_paradoxical_subreg_p): ... here.
* rtlanal.h (contains_paradoxical_subreg_p): Add prototype.
* cprop.cc (try_replace_reg): Do not set REG_EQUAL note
when the original source contains a paradoxical subreg.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr110206.c: New test.
|
Currently we rebuild profile_counts from profile_probability after inlining,
because there is a chance that producing large loop nests may get unrealistically
large profile_count values. This is much less of a concern since we switched to
the new profile_count representation a while back.
This propagation can also compensate for profile inconsistencies caused by
optimization passes. Since the inliner is followed by basic cleanup passes that
do not use the profile, we get a more realistic profile by delaying the
recomputation until the basic optimizations exposed by inlining are finished.
This does not fit into the TODO machinery, so I turned the rebuilding into a
stand-alone pass and scheduled it before the first consumer of the profile in
the optimization queue.
I also added logic that avoids repropagating when the CFG is good and not too
close to overflow. Propagation visits every basic block loop_depth times, so it
is not linear and avoiding it may help a bit.
On tramp3d we get 14 functions repropagated and 916 that are OK. The
repropagated functions are the RB tree ones, where we produce crazy loop nests
by recursive inlining. This is something to fix independently.
gcc/ChangeLog:
* passes.cc (execute_function_todo): Remove
TODO_rebuild_frequencies.
* passes.def: Add rebuild_frequencies pass.
* predict.cc (estimate_bb_frequencies): Drop
force parameter.
(tree_estimate_probability): Update call of
estimate_bb_frequencies.
(rebuild_frequencies): Turn into a pass; verify CFG profile consistency
first and do not rebuild if not necessary.
(class pass_rebuild_frequencies): New.
(make_pass_rebuild_frequencies): New.
* profile-count.h: Add profile_count::very_large_p.
* tree-inline.cc (optimize_inline_calls): Do not return
TODO_rebuild_frequencies.
* tree-pass.h (TODO_rebuild_frequencies): Remove.
(make_pass_rebuild_frequencies): Declare.
|
Add comments in scatter_store_run-7.c, as Robin suggested.
Enable COND_LEN_FMA auto-vectorization for floating-point FMA without
-ffast-math.
The middle-end support has been approved, and I will merge it after I finish
bootstrap && regression on x86:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624395.html
Now it's time to send this patch.
Consider the following case:
__attribute__ ((noipa)) void ternop_##TYPE (TYPE *__restrict dst, \
TYPE *__restrict a, \
TYPE *__restrict b, int n) \
{ \
for (int i = 0; i < n; i++) \
dst[i] += a[i] * b[i]; \
}
TEST_ALL ()
Before this patch:
ternop_double:
ble a3,zero,.L5
mv a6,a0
.L3:
vsetvli a5,a3,e64,m1,tu,ma
slli a4,a5,3
vle64.v v1,0(a0)
vle64.v v2,0(a1)
vle64.v v3,0(a2)
sub a3,a3,a5
vfmul.vv v2,v2,v3
vfadd.vv v1,v1,v2
vse64.v v1,0(a6)
add a0,a0,a4
add a1,a1,a4
add a2,a2,a4
add a6,a6,a4
bne a3,zero,.L3
.L5:
ret
After this patch:
ternop_double:
ble a3,zero,.L5
mv a6,a0
.L3:
vsetvli a5,a3,e64,m1,tu,ma
slli a4,a5,3
vle64.v v1,0(a0)
vle64.v v2,0(a1)
vle64.v v3,0(a2)
sub a3,a3,a5
vfmacc.vv v1,v3,v2
vse64.v v1,0(a6)
add a0,a0,a4
add a1,a1,a4
add a2,a2,a4
add a6,a6,a4
bne a3,zero,.L3
.L5:
ret
Notice: this patch only supports COND_LEN_FMA, not COND_LEN_FNMA etc., since
I didn't support those in the middle-end yet.
I will support them in the following patches soon.
gcc/ChangeLog:
* config/riscv/autovec.md (cond_len_fma<mode>): New pattern.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(expand_cond_len_ternop): New function.
* config/riscv/riscv-v.cc (emit_nonvlmax_fp_ternary_tu_insn): Ditto.
(expand_cond_len_ternop): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-7.c:
Adapt testcase for link fail.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-1.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-3.c: New test.
|
This patch adds a dummy FSM to bpf.md in order to get INSN_SCHEDULING
defined. If the latter is not defined, the `combine' pass generates
paradoxical subregs of mems, which then seem to be mishandled by LRA,
resulting in invalid code.
Tested on bpf-unknown-none.
gcc/ChangeLog:
2023-07-14 Jose E. Marchesi <jose.marchesi@oracle.com>
PR target/110657
* config/bpf/bpf.md: Enable instruction scheduling.
|
In the case of an array actual arg passed to a polymorphic array dummy
with INTENT(OUT) attribute, reorder the argument evaluation code to
the following:
- first evaluate arguments' values, and data references,
- deallocate data references associated with an allocatable,
intent(out) dummy,
- create a class container using the freed data references.
The ordering used to be incorrect between the first two items:
one argument could be deallocated before a later argument evaluated
its expression, which depended on the former argument.
r14-2395-gb1079fc88f082d3c5b583c8822c08c5647810259 fixed it by treating
arguments associated with an allocatable, intent(out) dummy in a
separate, later block. This, however, wasn't working either if the data
reference of such an argument was depending on its own content, as
the class container initialization was trying to use deallocated
content.
This change generates class container initialization code in a separate
block, so that it is moved after the deallocation block without moving
the rest of the argument evaluation code.
This alone is not sufficient to fix the problem, because the class
container generation code repeatedly uses the full expression of
the argument at a place where deallocation might have happened
already. This is non-optimal, but may also be invalid, because the data
reference may depend on its own content. In that case the expression
can't be evaluated after the data has been deallocated.
As in the scalar case previously treated, this is fixed by saving
the data reference to a pointer before any deallocation happens,
and then only referring to the pointer. gfc_reset_vptr is updated
to take into account the already evaluated class container if it's
available.
Contrary to the scalar case, one hunk is needed to wrap the parameter
evaluation in a conditional, to avoid regressing in
optional_class_2.f90. This used to be handled by the class wrapper
construction which wrapped the whole code in a conditional. With
this change the class wrapper construction can't see the parameter
evaluation code, so the latter is updated with an additional handling
for optional arguments.
PR fortran/92178
gcc/fortran/ChangeLog:
* trans.h (gfc_reset_vptr): Add class_container argument.
* trans-expr.cc (gfc_reset_vptr): Ditto. If a valid vptr can
be obtained through class_container argument, bypass evaluation
of e.
(gfc_conv_procedure_call): Wrap the argument evaluation code
in a conditional if the associated dummy is optional. Evaluate
the data reference to a pointer now, and replace later
references with usage of the pointer.
gcc/testsuite/ChangeLog:
* gfortran.dg/intent_out_21.f90: New test.
|
In the case of a scalar actual arg passed to a polymorphic assumed-rank
dummy with INTENT(OUT) attribute, avoid repeatedly evaluating the actual
argument reference by saving a pointer to it. This is non-optimal, but
may also be invalid, because the data reference may depend on its own
content. In that case the expression can't be evaluated after the data
has been deallocated.
There are two ways redundant expressions are generated:
- parmse.expr, which contains the actual argument expression, is
reused to get or set subfields in gfc_conv_class_to_class.
- gfc_conv_class_to_class, to get the virtual table pointer associated
with the argument, generates a new expression from scratch starting
with the frontend expression.
The first part is fixed by saving parmse.expr to a pointer and using
the pointer instead of the original expression.
The second part is fixed by adding a separate field to gfc_se that
is set to the class container expression when the expression to
evaluate is polymorphic. This needs the same field in gfc_ss_info
so that its value can be propagated to gfc_conv_class_to_class which
is modified to use that value. Finally gfc_conv_procedure saves the
expression in that field to a pointer in between to avoid the same
problem as for the first part.
PR fortran/92178
gcc/fortran/ChangeLog:
* trans.h (struct gfc_se): New field class_container.
(struct gfc_ss_info): Ditto.
(gfc_evaluate_data_ref_now): New prototype.
* trans.cc (gfc_evaluate_data_ref_now): Implement it.
* trans-array.cc (gfc_conv_ss_descriptor): Copy class_container
field from gfc_se struct to gfc_ss_info struct.
(gfc_conv_expr_descriptor): Copy class_container field from
gfc_ss_info struct to gfc_se struct.
* trans-expr.cc (gfc_conv_class_to_class): Use class container
set in class_container field if available.
(gfc_conv_variable): Set class_container field on encountering
class variables or components, clear it on encountering
non-class components.
(gfc_conv_procedure_call): Evaluate data ref to a pointer now,
and replace later references by usage of the pointer.
gcc/testsuite/ChangeLog:
* gfortran.dg/intent_out_20.f90: New test.
|
If an actual argument is associated with an INTENT(OUT) dummy, and code
to deallocate it is generated, generate the class wrapper initialization
after the actual argument deallocation.
This is achieved by passing a cleaned up expression to
gfc_conv_class_to_class, so that the class wrapper initialization code
can be isolated and moved independently after the deallocation.
PR fortran/92178
gcc/fortran/ChangeLog:
* trans-expr.cc (gfc_conv_procedure_call): Use a separate gfc_se
struct, initialized from parmse, to generate the class wrapper.
After the class wrapper code has been generated, copy it back
depending on whether parameter deallocation code has been
generated.
gcc/testsuite/ChangeLog:
* gfortran.dg/intent_out_19.f90: New test.
|
libgomp/
* libgomp.texi (OMP_ALLOCATOR): Document the default values for
the traits. Add crossref to 'Memory allocation'.
(Memory allocation): Refer to OMP_ALLOCATOR for the available
traits and allocators/mem spaces; document the default value
for the pool_size trait.
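For reference, a small sketch of the API side that these docs cross-reference
(a minimal example, not taken from the manual): constructing an allocator with
an explicit pool_size trait; the OMP_ALLOCATOR environment variable configures
the default allocator in the same terms.
#include <omp.h>
int main ()
{
  omp_alloctrait_t traits[] = { { omp_atk_pool_size, 1 << 20 } };
  omp_allocator_handle_t al
    = omp_init_allocator (omp_default_mem_space, 1, traits);
  void *p = omp_alloc (1024, al);
  omp_free (p, al);
  omp_destroy_allocator (al);
  return 0;
}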
|
This patch builds on the previous patch by fixing another issue with the
way ifcvt currently picks which branches to test.
The issue with the current implementation is that while it sorts by
occurrences of the arguments, it doesn't check the complexity of the arguments.
As an example:
<bb 15> [local count: 528603100]:
...
if (distbb_75 >= 0.0)
goto <bb 17>; [59.00%]
else
goto <bb 16>; [41.00%]
<bb 16> [local count: 216727269]:
...
goto <bb 19>; [100.00%]
<bb 17> [local count: 311875831]:
...
if (distbb_75 < iftmp.0_98)
goto <bb 18>; [20.00%]
else
goto <bb 19>; [80.00%]
<bb 18> [local count: 62375167]:
...
<bb 19> [local count: 528603100]:
# prephitmp_175 = PHI <_173(18), 0.0(17), _174(16)>
All three arguments to the PHI have the same number of occurrences, namely 1;
however, it makes a big difference which comparison we test first.
Sorting only on occurrences we'll pick the compares coming from BB 18 and BB 17.
This means we end up generating 4 comparisons, while 2 would have been enough.
By keeping track of the "complexity" of the COND in each BB (i.e. the number
of comparisons needed to traverse from the start [BB 15] to the end [BB 19]) and
using a key tuple of <occurrences, complexity>, we end up selecting the compares
from BB 16 and BB 18 first. BB 16 only requires 1 compare, and BB 18, after we
test BB 16 also only requires one additional compare. This change paired with
the one previous above results in the optimal 2 compares.
For deep nesting, i.e. for
...
_79 = vr_15 > 20;
_80 = _68 & _79;
_82 = vr_15 <= 20;
_83 = _68 & _82;
_84 = vr_15 < -20;
_85 = _73 & _84;
_87 = vr_15 >= -20;
_88 = _73 & _87;
_ifc__111 = _55 ? 10 : 12;
_ifc__112 = _70 ? 7 : _ifc__111;
_ifc__113 = _85 ? 8 : _ifc__112;
_ifc__114 = _88 ? 9 : _ifc__113;
_ifc__115 = _45 ? 1 : _ifc__114;
_ifc__116 = _63 ? 3 : _ifc__115;
_ifc__117 = _65 ? 4 : _ifc__116;
_ifc__118 = _83 ? 6 : _ifc__117;
_ifc__119 = _60 ? 2 : _ifc__118;
_ifc__120 = _43 ? 13 : _ifc__119;
_ifc__121 = _75 ? 11 : _ifc__120;
vw_1 = _80 ? 5 : _ifc__121;
Most of the comparisons are still needed because the chained conditions do
not negate each other. I.e. _88 is _73 & vr_15 >= -20 and
_85 is _73 & vr_15 < -20. Clearly, given that _73 needs to be true in both
branches, the only additional test needed is on vr_15, where the one test is
the negation of the other. So we don't need to do the comparison of _73 twice.
The changes in the patch reduces the overall number of compares by one, but has
a bigger effect on the dependency chain.
Previously we would generate 5 instructions chain:
cmple p7.s, p4/z, z29.s, z30.s
cmpne p7.s, p7/z, z29.s, #0
cmple p6.s, p7/z, z31.s, z30.s
cmpge p6.s, p6/z, z27.s, z25.s
cmplt p15.s, p6/z, z28.s, z21.s
as the longest chain. With this patch we generate 3:
cmple p7.s, p3/z, z27.s, z30.s
cmpne p7.s, p7/z, z27.s, #0
cmpgt p7.s, p7/z, z31.s, z30.s
and I don't think (x <= y) && (x != 0) && (z > y) can be reduced further.
gcc/ChangeLog:
PR tree-optimization/109154
* tree-if-conv.cc (INCLUDE_ALGORITHM): Include.
(struct bb_predicate): Add no_predicate_stmts.
(set_bb_predicate): Increase predicate count.
(set_bb_predicate_gimplified_stmts): Conditionally initialize
no_predicate_stmts.
(get_bb_num_predicate_stmts): New.
(init_bb_predicate): Initialize no_predicate_stmts.
(release_bb_predicate): Cleanup no_predicate_stmts.
(insert_gimplified_predicates): Preserve no_predicate_stmts.
gcc/testsuite/ChangeLog:
PR tree-optimization/109154
* gcc.dg/vect/vect-ifcvt-20.c: New test.
|
Following on from Jakub's patch in g:de0ee9d14165eebb3d31c84e98260c05c3b33acb,
these two patches finish the work of fixing the regression and improve codegen.
As explained in that commit, ifconvert sorts PHI args in increasing number of
occurrences in order to reduce the number of comparisons done while
traversing the tree.
The remaining task that this patch fixes is dealing with the long chain of
comparisons that can be created from phi nodes, particularly when they share
any common successor (classical example is a diamond node).
On a PHI node the true and else branches carry a condition: true will
carry `a` and false `~a`. The issue is that at the moment GCC tests both `a`
and `~a` when the phi node has more than 2 arguments. Clearly this isn't
needed. The deeper the nesting of phi nodes, the larger the repetition.
As an example, for
foo (int *f, int d, int e)
{
for (int i = 0; i < 1024; i++)
{
int a = f[i];
int t;
if (a < 0)
t = 1;
else if (a < e)
t = 1 - a * d;
else
t = 0;
f[i] = t;
}
}
after Jakub's patch we generate:
_7 = a_10 < 0;
_21 = a_10 >= 0;
_22 = a_10 < e_11(D);
_23 = _21 & _22;
_ifc__42 = _23 ? t_13 : 0;
t_6 = _7 ? 1 : _ifc__42
but while better than before it is still inefficient, since in the false
branch, where we know ~_7 is true, we still test _21.
This leads to superfluous tests for every diamond node. After this patch we
generate
_7 = a_10 < 0;
_22 = a_10 < e_11(D);
_ifc__42 = _22 ? t_13 : 0;
t_6 = _7 ? 1 : _ifc__42;
This correctly elides the test of _21. It is done by borrowing the
vectorizer's helper functions to limit predicate mask usage. Ifcvt will chain
conditionals on the false edge (unless specifically inverted), so this patch,
on creating cond a ? b : c, will register ~a when traversing c. If c is a
conditional then c will be simplified to the smallest possible predicate given
the assumptions we already know to be true.
gcc/ChangeLog:
PR tree-optimization/109154
* tree-if-conv.cc (gen_simplified_condition,
gen_phi_nest_statement): New.
(gen_phi_arg_condition, predicate_scalar_phi): Use it.
gcc/testsuite/ChangeLog:
PR tree-optimization/109154
* gcc.dg/vect/vect-ifcvt-19.c: New test.
|
The following adds checking that the edge we query an associated
PHI arg for is related to the PHI node. Triggered by questionable
code in one of my reviews.
* gimple.h (gimple_phi_arg): New const overload.
(gimple_phi_arg_def): Make gimple arg const.
(gimple_phi_arg_def_from_edge): New inline function.
* tree-phinodes.h (gimple_phi_arg_imm_use_ptr_from_edge):
Likewise.
* tree-ssa-operands.h (PHI_ARG_DEF_FROM_EDGE): Direct to
new inline function.
(PHI_ARG_DEF_PTR_FROM_EDGE): Likewise.
|
Follow-up to r14-2462-g450b05ce54d3f0. The case where libnuma was not
available at runtime was not properly handled; now it falls back to
normal malloc.
libgomp/
* allocator.c (omp_init_allocator): Check whether symbol from
dlopened libnuma is available before using libnuma for
allocations.
|
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc (riscv_implied_info):
Add zihintntl item.
(riscv_ext_version_table): Ditto.
(riscv_ext_flag_table): Ditto.
* config/riscv/riscv-opts.h (MASK_ZIHINTNTL): New macro.
(TARGET_ZIHINTNTL): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/arch-22.c: New test.
* gcc.target/riscv/predef-28.c: New test.
|
When generating the gen_and<mode>3 function based on the and<mode>3
template, it produces the expression emit_insn (gen_rtx_SET (operand0,
gen_rtx_AND (<mode>, operand1, operand2)));, which is identical to the
portion I removed in this patch. Therefore, the redundant portion can be
deleted.
Signed-off-by: Die Li <lidie@eswincomputing.com>
gcc/ChangeLog:
* config/riscv/riscv.md: Remove redundant portion in and<mode>3.
|
gcc/ChangeLog:
PR target/101469
* config/sh/sh.md (peephole2): Handle case where eliminated reg is also
used by the address of the following memory operand.
|
gcc/
PR target/107841
* config/pdp11/pdp11.cc (pdp11_expand_epilogue): Also
deallocate alloca-only frame.
gcc/testsuite/
PR target/107841
* gcc.target/pdp11/pr107841.c: New test.
|
When trying to bootstrap current trunk on macOS 14.0 beta 3 with Xcode
15 beta 4, the build failed running mklink in stage 2:
unset CC ; m2/boot-bin/mklink -s --langc++ --exit --name m2/mc-boot/main.cc
/vol/gcc/src/hg/master/darwin/gcc/m2/init/mcinit
dyld[55825]: Library not loaded: /vol/gcc/lib/libstdc++.6.dylib
While it's unclear to me why this only happens on macOS 14, the problem
is clear: unlike other C++ executables, mklink isn't linked with
-static-libstdc++ which is passed in from toplevel in LDFLAGS.
This patch fixes that and allows the build to continue.
Bootstrapped on x86_64-apple-darwin23.0.0, i386-pc-solaris2.11, and
sparc-sun-solaris2.11.
2023-07-11 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
gcc/m2:
* Make-lang.in (m2/boot-bin/mklink$(exeext)): Add $(LDFLAGS).
|
Release symbols in reversed order wrt the order they were allocated.
This fixes an error recovery ICE in the case of a misplaced
derived type declaration. Such a declaration creates nested
symbols, one for the derived type and one for each type parameter,
which should be immediately released as the declaration is
rejected. This breaks if the derived type is released first.
As the type parameter symbols are in the namespace of the derived
type, releasing the derived type releases the type parameters, so
one can't access them after that, even to release them. Hence,
the type parameters should be released first.
PR fortran/106050
gcc/fortran/ChangeLog:
* symbol.cc (gfc_restore_last_undo_checkpoint): Release symbols
in reverse order.
gcc/testsuite/ChangeLog:
* gfortran.dg/pdt_33.f90: New test.
|
Later versions of the static linker support a more flexible flag to
describe the OS, OS version and SDK used to build the code. This
replaces the functionality of '-mmacosx_version_min' (which is now
deprecated, leading to the diagnostic described in the PR).
We now use the platform_version flag when available which avoids the
diagnostic.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
PR target/110624
gcc/ChangeLog:
* config/darwin.h (DARWIN_PLATFORM_ID): New.
(LINK_COMMAND_A): Use DARWIN_PLATFORM_ID to pass OS, OS version
and SDK data to the static linker.
|
Change the return value from void to double for __builtin_set_fpscr_rn.
The return value consists of the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI and
RN in their bit positions. A new test file, powerpc/test_fpscr_rn_builtin_2.c,
is added to test the new return value of the built-in.
The value __SET_FPSCR_RN_RETURNS_FPSCR__ is defined if
__builtin_set_fpscr_rn returns a double.
gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_set_fpscr_rn): Update
built-in definition return type.
* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Add check,
define __SET_FPSCR_RN_RETURNS_FPSCR__ macro.
* config/rs6000/rs6000.md (rs6000_set_fpscr_rn): Add return
argument to return FPSCR fields.
* doc/extend.texi (__builtin_set_fpscr_rn): Update description for
the return value. Add description for
__SET_FPSCR_RN_RETURNS_FPSCR__ macro.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/test_fpscr_rn_builtin.c: Rename to
test_fpscr_rn_builtin_1.c. Add comment.
* gcc.target/powerpc/test_fpscr_rn_builtin_2.c: New test for the
return value of __builtin_set_fpscr_rn builtin.
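A sketch of how user code might use the new return value while staying
compatible with older compilers, keying off the macro the patch defines
(hypothetical helper name; PowerPC only):
static inline double
set_rounding_mode (int rn)
{
#ifdef __SET_FPSCR_RN_RETURNS_FPSCR__
  return __builtin_set_fpscr_rn (rn);  // returns the prior DRN/VE/OE/UE/ZE/XE/NI/RN fields
#else
  __builtin_set_fpscr_rn (rn);         // older GCC: returns void
  return 0.0;
#endif
}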
|
std::stoi, std::stol, std::stoul, and std::stod only depend on C89
functions, so they don't need to be guarded by _GLIBCXX_USE_C99_STDLIB.
std::stoll and std::stoull don't need C99 strtoll and strtoull if
sizeof(long) == sizeof(long long).
std::stold doesn't need C99 strtold if DBL_MANT_DIG == LDBL_MANT_DIG.
This only applies to the narrow character overloads; the wchar_t
overloads depend on a separate _GLIBCXX_USE_C99_WCHAR macro, and none of
them can be implemented in C89 easily.
libstdc++-v3/ChangeLog:
PR libstdc++/110653
* include/bits/basic_string.h (stoi, stol, stoul, stod): Do not
depend on _GLIBCXX_USE_C99_STDLIB.
[__LONG_WIDTH__ == __LONG_LONG_WIDTH__] (stoll, stoull): Define
in terms of stol and stoul respectively.
[__DBL_MANT_DIG__ == __LDBL_MANT_DIG__] (stold): Define in terms
of stod.
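An illustration of the fallback described above (a sketch using a hypothetical
my_stoll, not the libstdc++ source): when long and long long have the same
width, strtol-based stol can stand in for stoll.
#include <cstddef>
#include <string>
#if __LONG_WIDTH__ == __LONG_LONG_WIDTH__
inline long long
my_stoll (const std::string &s, std::size_t *pos = nullptr, int base = 10)
{
  return std::stol (s, pos, base);  // same range, so C99 strtoll isn't needed
}
#endif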
|
PR target/106966
gcc/ChangeLog:
* config/alpha/alpha.cc (alpha_emit_set_long_const):
Always use DImode when constructing long const.
gcc/testsuite/ChangeLog:
* gcc.target/alpha/pr106966.c: New test.
|
gcc/ChangeLog:
* haifa-sched.cc: Change TRUE/FALSE to true/false.
* ira.cc: Ditto.
* lra-assigns.cc: Ditto.
* lra-constraints.cc: Ditto.
* sel-sched.cc: Ditto.
|
This fixes part of PR 110293, for the outer comparison case
being `!=` or `==`. In turn PR 110539 is able to be optimized
again as the if statement for `(a&1) == ((a & 1) != 0)` gets optimized
to `false` early enough to allow FRE/DOM to do a CSE for memory store/load.
OK? Bootstrapped and tested on x86_64-linux with no regressions.
gcc/ChangeLog:
PR tree-optimization/110293
PR tree-optimization/110539
* match.pd: Expand the `x != (typeof x)(x == 0)`
pattern to handle cases where the inner and outer comparisons
are either `!=` or `==` and handle constants other than 0.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr110293-1.c: New test.
* gcc.dg/tree-ssa/pr110539-1.c: New test.
* gcc.dg/tree-ssa/pr110539-2.c: New test.
* gcc.dg/tree-ssa/pr110539-3.c: New test.
* gcc.dg/tree-ssa/pr110539-4.c: New test.
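A hedged illustration of the kind of comparison now handled (not one of the
actual testcases): both sides below are always 0 or 1 and always agree, so the
pattern lets the comparison fold to a constant early.
bool
f (unsigned a)
{
  return (a & 1) == ((a & 1) != 0);  // folds away once the pattern matches
}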
|
An asm insn, unlike other insns, can have so many operands that its
constraints cannot be satisfied, which results in LRA cycling on such a
test case. The following patch catches this situation and reports the
problem.
PR middle-end/109520
gcc/ChangeLog:
* lra-int.h (lra_insn_recog_data): Add member asm_reloads_num.
(lra_asm_insn_error): New prototype.
* lra.cc: Include rtl_error.h.
(lra_set_insn_recog_data): Initialize asm_reloads_num.
(lra_asm_insn_error): New func whose code is taken from ...
* lra-assigns.cc (lra_split_hard_reg_for): ... here. Use lra_asm_insn_error.
* lra-constraints.cc (curr_insn_transform): Check reloads number for asm.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr109520.c: New test.
|
Hi, Richard and Richi.
In a previous patch we supported COND_LEN_* binary operations. However, we
didn't support COND_LEN_* ternary operations.
Now, this patch supports COND_LEN_* ternary. Consider the following case:
__attribute__ ((noipa)) void ternop_##TYPE (TYPE *__restrict dst, \
TYPE *__restrict a, \
TYPE *__restrict b,\
TYPE *__restrict c, int n) \
{ \
for (int i = 0; i < n; i++) \
dst[i] += a[i] * b[i]; \
}
TEST_ALL ()
Before this patch:
...
COND_LEN_MUL
COND_LEN_ADD
After this patch:
...
COND_LEN_FMA
gcc/ChangeLog:
* genmatch.cc (commutative_op): Add COND_LEN_*
* internal-fn.cc (first_commutative_argument): Ditto.
(CASE): Ditto.
(get_unconditional_internal_fn): Ditto.
(can_interpret_as_conditional_op_p): Ditto.
(internal_fn_len_index): Ditto.
* internal-fn.h (can_interpret_as_conditional_op_p): Ditto.
* tree-ssa-math-opts.cc (convert_mult_to_fma_1): Ditto.
(convert_mult_to_fma): Ditto.
(math_opts_dom_walker::after_dom_children): Ditto.
|
Some test cases in the libgomp testsuite pass -flto as an option, but
the testcases do not require LTO target support. This patch adds
the necessary DejaGNU requirement for LTO support to the testcases.
libgomp/ChangeLog:
* testsuite/libgomp.c++/target-map-class-2.C: Require LTO.
* testsuite/libgomp.c-c++-common/requires-4.c: Require LTO.
* testsuite/libgomp.c-c++-common/requires-4a.c: Require LTO.
Signed-off-by: David Edelsohn <dje.gcc@gmail.com>
|