2023-07-17  Daily bump.  (GCC Administrator, 2 files, -1/+19)
2023-07-16  Fix profile update in scale_profile_for_vect_loop  (Jan Hubicka, 1 file, -25/+24)
When vectorizing 4 times, we sometimes produce:

  for <4x vectorized body>
  for <2x vectorized body>
  for <1x vectorized body>

Here the second two fors handling the epilogue never iterate. Currently the vectorizer thinks that the middle for iterates twice. This turns out to be scale_profile_for_vect_loop, which uses niter_for_unrolled_loop. At that time we know the epilogue will iterate at most 2 times, but niter_for_unrolled_loop does not know that the last iteration will be taken by the epilogue-of-epilogue, and thus it thinks that the loop may iterate once and exit in the middle of the second iteration. We already do a correct job updating the niter bounds; this is just an ordering issue. This patch makes us first update the bounds and then do the updating of the loop. I re-implemented the function more correctly and precisely. The loop reducing the iteration factor for overly flat profiles is a bit funny, but the only other method I can think of is to compute an sreal scale, which would have similar overhead. Bootstrapped/regtested x86_64-linux, will commit it shortly. gcc/ChangeLog: PR middle-end/110649 * tree-vect-loop.cc (scale_profile_for_vect_loop): Rewrite. (vect_transform_loop): Move scale_profile_for_vect_loop after upper bound updates.
2023-07-16  Fix optimize_mask_stores profile update  (Jan Hubicka, 1 file, -0/+1)
While looking into the sphinx3 regression I noticed that the vectorizer produces BBs with an overall probability count of 120%. This patch fixes it. Richi, I don't know how to create a testcase, but having one would be nice. Bootstrapped/regtested x86_64-linux, will commit it shortly. gcc/ChangeLog: PR tree-optimization/110649 * tree-vect-loop.cc (optimize_mask_stores): Set the probability of the if-then-else construct correctly.
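For context, optimize_mask_stores wraps each vectorized masked store in a runtime branch that skips the store block when no mask lane is active; the probability being fixed is that of this if-then-else. A minimal sketch of a loop shape that produces such masked stores (a hypothetical example of mine, not the missing testcase):

  void
  f (int *a, const int *b, int n)
  {
    for (int i = 0; i < n; i++)
      if (b[i] > 0)     /* conditional store becomes a masked vector store */
        a[i] = b[i];
  }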
2023-07-16  Avoid double profile update in try_peel_loop  (Jan Hubicka, 1 file, -12/+1)
try_peel_loop uses gimple_duplicate_loop_body_to_header_edge, which subtracts the profile from the original loop. However, it then tries to scale the profile in a wrong way (it forces the header count to be the entry count). This eliminates two profile mis-updates in the internal loop of sphinx3. gcc/ChangeLog: PR middle-end/110649 * tree-ssa-loop-ivcanon.cc (try_peel_loop): Avoid double profile update.
2023-07-16  Daily bump.  (GCC Administrator, 4 files, -1/+47)
2023-07-15  testsuite: Require 128 bit long double for ibmlongdouble.  (David Edelsohn, 1 file, -1/+1)
pr103628.f90 adds the -mabi=ibmlongdouble option, but AIX defaults to 64 bit long double. This patch adds -mlong-double-128 to ensure that the testcase is compiled with 128 bit long double. gcc/testsuite/ChangeLog: * gfortran.dg/pr103628.f90: Add -mlong-double-128 option. Signed-off-by: David Edelsohn <dje.gcc@gmail.com>
2023-07-15  Update my contrib entry  (Andrew Pinski, 1 file, -1/+2)
Committed as obvious after making sure the documentation still builds. gcc/ChangeLog: * doc/contrib.texi: Update my entry.
2023-07-15  hppa: Modify TLS patterns to provide both 32 and 64-bit support.  (John David Anglin, 1 file, -34/+162)
2023-07-15 John David Anglin <danglin@gcc.gnu.org> gcc/ChangeLog: * config/pa/pa.md: Define constants R1_REGNUM, R19_REGNUM and R27_REGNUM. (tgd_load): Restrict to !TARGET_64BIT. Use register constants. (tld_load): Likewise. (tgd_load_pic): Change to expander. (tld_load_pic, tld_offset_load, tp_load): Likewise. (tie_load_pic, tle_load): Likewise. (tgd_load_picsi, tgd_load_picdi): New. (tld_load_picsi, tld_load_picdi): New. (tld_offset_load<P:mode>): New. (tp_load<P:mode>): New. (tie_load_picsi, tie_load_picdi): New. (tle_load<P:mode>): New.
2023-07-15  c++: copy elision w/ obj arg and static memfn call [PR110441]  (Patrick Palka, 2 files, -1/+26)
Here the call A().f() is represented as a COMPOUND_EXPR whose first operand is the otherwise unused object argument A() and second operand is the call result (both are TARGET_EXPRs). Within the return statement, this outermost COMPOUND_EXPR ends up foiling the copy elision check in build_special_member_call, resulting in us introducing a bogus call to the deleted move constructor. (Within the variable initialization, which goes through ocp_convert instead of convert_for_initialization, we've already been eliding the copy -- despite the outermost COMPOUND_EXPR -- ever since r10-7410-g72809d6fe8e085 made ocp_convert look through COMPOUND_EXPR). In contrast I noticed '(A(), A::f())' (which should be equivalent to the above call) is represented with the COMPOUND_EXPR inside the RHS's TARGET_EXPR initializer thanks to a special case in cp_build_compound_expr. So this patch fixes this by making keep_unused_object_arg use cp_build_compound_expr as well. PR c++/110441 gcc/cp/ChangeLog: * call.cc (keep_unused_object_arg): Use cp_build_compound_expr instead of building a COMPOUND_EXPR directly. gcc/testsuite/ChangeLog: * g++.dg/cpp1z/elide8.C: New test.
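A minimal sketch of the affected pattern (my reconstruction under the description above; the committed test is g++.dg/cpp1z/elide8.C, whose exact contents are not shown here):

  struct A {
    A () = default;
    A (A&&) = delete;  // deleted move constructor
    static A f ();     // static member function called through an object
  };

  A g ()
  {
    return A().f ();   // A() is evaluated but unused; the call result must
  }                    // be elided, not passed to the deleted move ctor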
2023-07-15  c++: mangling template-id of unknown template [PR110524]  (Patrick Palka, 2 files, -1/+18)
This fixes a crash when mangling an ADL-enabled call to a template-id naming an unknown template (as per P0846R0). PR c++/110524 gcc/cp/ChangeLog: * mangle.cc (write_expression): Handle TEMPLATE_ID_EXPR whose template is already an IDENTIFIER_NODE. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/fn-template26.C: New test.
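A hedged sketch of the construct involved (names invented; the committed test is g++.dg/cpp2a/fn-template26.C): under P0846R0 a template-id like f<0>(t) may name a template that is only found by ADL at instantiation time, and mangling such a call, for instance in a trailing return type, used to crash:

  namespace N {
    struct A { };
    template <int> int f (A) { return 0; }
  }

  template <class T>
  auto g (T t) -> decltype (f<0> (t))  // f<0> names an unknown template,
  { return f<0> (t); }                 // found via ADL; mangled in g's type

  int x = g (N::A{});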
2023-07-15  Daily bump.  (GCC Administrator, 6 files, -1/+626)
2023-07-14  c++: style tweak  (Nathaniel Shead, 1 file, -1/+1)
At this point r == t, but it makes more sense to refer to t like all the other cases do. gcc/cp/ChangeLog: * constexpr.cc (cxx_eval_constant_expression): Pass t to get_value. Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
2023-07-14  c++: c++26 regression fixes  (Jason Merrill, 4 files, -15/+16)
Apparently I wasn't actually running the testsuite in C++26 mode like I thought I was, so there were some failures I wasn't seeing. The constexpr hunk fixes regressions with the P2738 implementation; we still need to use the old handling for casting from void pointers to heap variables. PR c++/110344 gcc/cp/ChangeLog: * constexpr.cc (cxx_eval_constant_expression): Move P2738 handling after heap handling. * name-lookup.cc (get_cxx_dialect_name): Add C++26. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/constexpr-cast2.C: Adjust for P2738. * g++.dg/ipa/devirt-45.C: Handle -fimplicit-constexpr.
2023-07-14  arm: [MVE intrinsics] rework vcmlaq  (Christophe Lyon, 5 files, -310/+22)
Implement vcmlaq using the new MVE builtins framework. 2023-07-13 Christophe Lyon <christophe.lyon@linaro.org> gcc/ * config/arm/arm-mve-builtins-base.cc (vcmlaq, vcmlaq_rot90) (vcmlaq_rot180, vcmlaq_rot270): New. * config/arm/arm-mve-builtins-base.def (vcmlaq, vcmlaq_rot90) (vcmlaq_rot180, vcmlaq_rot270): New. * config/arm/arm-mve-builtins-base.h: (vcmlaq, vcmlaq_rot90) (vcmlaq_rot180, vcmlaq_rot270): New. * config/arm/arm-mve-builtins.cc (function_instance::has_inactive_argument): Handle vcmlaq, vcmlaq_rot90, vcmlaq_rot180, vcmlaq_rot270. * config/arm/arm_mve.h (vcmlaq): Delete. (vcmlaq_rot180): Delete. (vcmlaq_rot270): Delete. (vcmlaq_rot90): Delete. (vcmlaq_m): Delete. (vcmlaq_rot180_m): Delete. (vcmlaq_rot270_m): Delete. (vcmlaq_rot90_m): Delete. (vcmlaq_f16): Delete. (vcmlaq_rot180_f16): Delete. (vcmlaq_rot270_f16): Delete. (vcmlaq_rot90_f16): Delete. (vcmlaq_f32): Delete. (vcmlaq_rot180_f32): Delete. (vcmlaq_rot270_f32): Delete. (vcmlaq_rot90_f32): Delete. (vcmlaq_m_f32): Delete. (vcmlaq_m_f16): Delete. (vcmlaq_rot180_m_f32): Delete. (vcmlaq_rot180_m_f16): Delete. (vcmlaq_rot270_m_f32): Delete. (vcmlaq_rot270_m_f16): Delete. (vcmlaq_rot90_m_f32): Delete. (vcmlaq_rot90_m_f16): Delete. (__arm_vcmlaq_f16): Delete. (__arm_vcmlaq_rot180_f16): Delete. (__arm_vcmlaq_rot270_f16): Delete. (__arm_vcmlaq_rot90_f16): Delete. (__arm_vcmlaq_f32): Delete. (__arm_vcmlaq_rot180_f32): Delete. (__arm_vcmlaq_rot270_f32): Delete. (__arm_vcmlaq_rot90_f32): Delete. (__arm_vcmlaq_m_f32): Delete. (__arm_vcmlaq_m_f16): Delete. (__arm_vcmlaq_rot180_m_f32): Delete. (__arm_vcmlaq_rot180_m_f16): Delete. (__arm_vcmlaq_rot270_m_f32): Delete. (__arm_vcmlaq_rot270_m_f16): Delete. (__arm_vcmlaq_rot90_m_f32): Delete. (__arm_vcmlaq_rot90_m_f16): Delete. (__arm_vcmlaq): Delete. (__arm_vcmlaq_rot180): Delete. (__arm_vcmlaq_rot270): Delete. (__arm_vcmlaq_rot90): Delete. (__arm_vcmlaq_m): Delete. (__arm_vcmlaq_rot180_m): Delete. (__arm_vcmlaq_rot270_m): Delete. (__arm_vcmlaq_rot90_m): Delete.
2023-07-14  arm: [MVE intrinsics] factorize vcmlaq  (Christophe Lyon, 3 files, -64/+29)
Factorize vcmlaq builtins so that they use parameterized names. 2023-07-13 Christophe Lyon <christophe.lyon@linaro.org> gcc/ * config/arm/arm_mve_builtins.def (vcmlaq_rot90_f) (vcmlaq_rot270_f, vcmlaq_rot180_f, vcmlaq_f): Add "_f" suffix. * config/arm/iterators.md (MVE_VCMLAQ_M): New. (mve_insn): Add vcmla. (rot): Add VCMLAQ_M_F, VCMLAQ_ROT90_M_F, VCMLAQ_ROT180_M_F, VCMLAQ_ROT270_M_F. (mve_rot): Add VCMLAQ_M_F, VCMLAQ_ROT90_M_F, VCMLAQ_ROT180_M_F, VCMLAQ_ROT270_M_F. * config/arm/mve.md (mve_vcmlaq<mve_rot><mode>): Rename into ... (@mve_<mve_insn>q<mve_rot>_f<mode>): ... this. (mve_vcmlaq_m_f<mode>, mve_vcmlaq_rot180_m_f<mode>) (mve_vcmlaq_rot270_m_f<mode>, mve_vcmlaq_rot90_m_f<mode>): Merge into ... (@mve_<mve_insn>q<mve_rot>_m_f<mode>): ... this.
2023-07-14  arm: [MVE intrinsics] rework vcmulq  (Christophe Lyon, 4 files, -448/+12)
Implement vcmulq using the new MVE builtins framework. 2023-07-13 Christophe Lyon <christophe.lyon@linaro.org> gcc/ * config/arm/arm-mve-builtins-base.cc (vcmulq, vcmulq_rot90) (vcmulq_rot180, vcmulq_rot270): New. * config/arm/arm-mve-builtins-base.def (vcmulq, vcmulq_rot90) (vcmulq_rot180, vcmulq_rot270): New. * config/arm/arm-mve-builtins-base.h: (vcmulq, vcmulq_rot90) (vcmulq_rot180, vcmulq_rot270): New. * config/arm/arm_mve.h (vcmulq_rot90): Delete. (vcmulq_rot270): Delete. (vcmulq_rot180): Delete. (vcmulq): Delete. (vcmulq_m): Delete. (vcmulq_rot180_m): Delete. (vcmulq_rot270_m): Delete. (vcmulq_rot90_m): Delete. (vcmulq_x): Delete. (vcmulq_rot90_x): Delete. (vcmulq_rot180_x): Delete. (vcmulq_rot270_x): Delete. (vcmulq_rot90_f16): Delete. (vcmulq_rot270_f16): Delete. (vcmulq_rot180_f16): Delete. (vcmulq_f16): Delete. (vcmulq_rot90_f32): Delete. (vcmulq_rot270_f32): Delete. (vcmulq_rot180_f32): Delete. (vcmulq_f32): Delete. (vcmulq_m_f32): Delete. (vcmulq_m_f16): Delete. (vcmulq_rot180_m_f32): Delete. (vcmulq_rot180_m_f16): Delete. (vcmulq_rot270_m_f32): Delete. (vcmulq_rot270_m_f16): Delete. (vcmulq_rot90_m_f32): Delete. (vcmulq_rot90_m_f16): Delete. (vcmulq_x_f16): Delete. (vcmulq_x_f32): Delete. (vcmulq_rot90_x_f16): Delete. (vcmulq_rot90_x_f32): Delete. (vcmulq_rot180_x_f16): Delete. (vcmulq_rot180_x_f32): Delete. (vcmulq_rot270_x_f16): Delete. (vcmulq_rot270_x_f32): Delete. (__arm_vcmulq_rot90_f16): Delete. (__arm_vcmulq_rot270_f16): Delete. (__arm_vcmulq_rot180_f16): Delete. (__arm_vcmulq_f16): Delete. (__arm_vcmulq_rot90_f32): Delete. (__arm_vcmulq_rot270_f32): Delete. (__arm_vcmulq_rot180_f32): Delete. (__arm_vcmulq_f32): Delete. (__arm_vcmulq_m_f32): Delete. (__arm_vcmulq_m_f16): Delete. (__arm_vcmulq_rot180_m_f32): Delete. (__arm_vcmulq_rot180_m_f16): Delete. (__arm_vcmulq_rot270_m_f32): Delete. (__arm_vcmulq_rot270_m_f16): Delete. (__arm_vcmulq_rot90_m_f32): Delete. (__arm_vcmulq_rot90_m_f16): Delete. (__arm_vcmulq_x_f16): Delete. (__arm_vcmulq_x_f32): Delete. (__arm_vcmulq_rot90_x_f16): Delete. (__arm_vcmulq_rot90_x_f32): Delete. (__arm_vcmulq_rot180_x_f16): Delete. (__arm_vcmulq_rot180_x_f32): Delete. (__arm_vcmulq_rot270_x_f16): Delete. (__arm_vcmulq_rot270_x_f32): Delete. (__arm_vcmulq_rot90): Delete. (__arm_vcmulq_rot270): Delete. (__arm_vcmulq_rot180): Delete. (__arm_vcmulq): Delete. (__arm_vcmulq_m): Delete. (__arm_vcmulq_rot180_m): Delete. (__arm_vcmulq_rot270_m): Delete. (__arm_vcmulq_rot90_m): Delete. (__arm_vcmulq_x): Delete. (__arm_vcmulq_rot90_x): Delete. (__arm_vcmulq_rot180_x): Delete. (__arm_vcmulq_rot270_x): Delete.
2023-07-14  arm: [MVE intrinsics] factorize vcmulq  (Christophe Lyon, 3 files, -94/+33)
Factorize vcmulq builtins so that they use parameterized names. We can merge them with vcadd. 2023-07-13 Christophe Lyon <christophe.lyon@linaro.org> gcc/ * config/arm/arm_mve_builtins.def (vcmulq_rot90_f) (vcmulq_rot270_f, vcmulq_rot180_f, vcmulq_f): Add "_f" suffix. * config/arm/iterators.md (MVE_VCADDQ_VCMULQ) (MVE_VCADDQ_VCMULQ_M): New. (mve_insn): Add vcmul. (rot): Add VCMULQ_M_F, VCMULQ_ROT90_M_F, VCMULQ_ROT180_M_F, VCMULQ_ROT270_M_F. (VCMUL): Delete. (mve_rot): Add VCMULQ_M_F, VCMULQ_ROT90_M_F, VCMULQ_ROT180_M_F, VCMULQ_ROT270_M_F. * config/arm/mve.md (mve_vcmulq<mve_rot><mode>): Merge into @mve_<mve_insn>q<mve_rot>_f<mode>. (mve_vcmulq_m_f<mode>, mve_vcmulq_rot180_m_f<mode>) (mve_vcmulq_rot270_m_f<mode>, mve_vcmulq_rot90_m_f<mode>): Merge into @mve_<mve_insn>q<mve_rot>_m_f<mode>.
2023-07-14  arm: [MVE intrinsics] rework vcaddq vhcaddq  (Christophe Lyon, 5 files, -1202/+117)
Implement vcaddq, vhcaddq using the new MVE builtins framework. 2023-07-13 Christophe Lyon <christophe.lyon@linaro.org> gcc/ * config/arm/arm-mve-builtins-base.cc (vcaddq_rot90) (vcaddq_rot270, vhcaddq_rot90, vhcaddq_rot270): New. * config/arm/arm-mve-builtins-base.def (vcaddq_rot90) (vcaddq_rot270, vhcaddq_rot90, vhcaddq_rot270): New. * config/arm/arm-mve-builtins-base.h: (vcaddq_rot90) (vcaddq_rot270, vhcaddq_rot90, vhcaddq_rot270): New. * config/arm/arm-mve-builtins-functions.h (class unspec_mve_function_exact_insn_rot): New. * config/arm/arm_mve.h (vcaddq_rot90): Delete. (vcaddq_rot270): Delete. (vhcaddq_rot90): Delete. (vhcaddq_rot270): Delete. (vcaddq_rot270_m): Delete. (vcaddq_rot90_m): Delete. (vhcaddq_rot270_m): Delete. (vhcaddq_rot90_m): Delete. (vcaddq_rot90_x): Delete. (vcaddq_rot270_x): Delete. (vhcaddq_rot90_x): Delete. (vhcaddq_rot270_x): Delete. (vcaddq_rot90_u8): Delete. (vcaddq_rot270_u8): Delete. (vhcaddq_rot90_s8): Delete. (vhcaddq_rot270_s8): Delete. (vcaddq_rot90_s8): Delete. (vcaddq_rot270_s8): Delete. (vcaddq_rot90_u16): Delete. (vcaddq_rot270_u16): Delete. (vhcaddq_rot90_s16): Delete. (vhcaddq_rot270_s16): Delete. (vcaddq_rot90_s16): Delete. (vcaddq_rot270_s16): Delete. (vcaddq_rot90_u32): Delete. (vcaddq_rot270_u32): Delete. (vhcaddq_rot90_s32): Delete. (vhcaddq_rot270_s32): Delete. (vcaddq_rot90_s32): Delete. (vcaddq_rot270_s32): Delete. (vcaddq_rot90_f16): Delete. (vcaddq_rot270_f16): Delete. (vcaddq_rot90_f32): Delete. (vcaddq_rot270_f32): Delete. (vcaddq_rot270_m_s8): Delete. (vcaddq_rot270_m_s32): Delete. (vcaddq_rot270_m_s16): Delete. (vcaddq_rot270_m_u8): Delete. (vcaddq_rot270_m_u32): Delete. (vcaddq_rot270_m_u16): Delete. (vcaddq_rot90_m_s8): Delete. (vcaddq_rot90_m_s32): Delete. (vcaddq_rot90_m_s16): Delete. (vcaddq_rot90_m_u8): Delete. (vcaddq_rot90_m_u32): Delete. (vcaddq_rot90_m_u16): Delete. (vhcaddq_rot270_m_s8): Delete. (vhcaddq_rot270_m_s32): Delete. (vhcaddq_rot270_m_s16): Delete. (vhcaddq_rot90_m_s8): Delete. (vhcaddq_rot90_m_s32): Delete. (vhcaddq_rot90_m_s16): Delete. (vcaddq_rot270_m_f32): Delete. (vcaddq_rot270_m_f16): Delete. (vcaddq_rot90_m_f32): Delete. (vcaddq_rot90_m_f16): Delete. (vcaddq_rot90_x_s8): Delete. (vcaddq_rot90_x_s16): Delete. (vcaddq_rot90_x_s32): Delete. (vcaddq_rot90_x_u8): Delete. (vcaddq_rot90_x_u16): Delete. (vcaddq_rot90_x_u32): Delete. (vcaddq_rot270_x_s8): Delete. (vcaddq_rot270_x_s16): Delete. (vcaddq_rot270_x_s32): Delete. (vcaddq_rot270_x_u8): Delete. (vcaddq_rot270_x_u16): Delete. (vcaddq_rot270_x_u32): Delete. (vhcaddq_rot90_x_s8): Delete. (vhcaddq_rot90_x_s16): Delete. (vhcaddq_rot90_x_s32): Delete. (vhcaddq_rot270_x_s8): Delete. (vhcaddq_rot270_x_s16): Delete. (vhcaddq_rot270_x_s32): Delete. (vcaddq_rot90_x_f16): Delete. (vcaddq_rot90_x_f32): Delete. (vcaddq_rot270_x_f16): Delete. (vcaddq_rot270_x_f32): Delete. (__arm_vcaddq_rot90_u8): Delete. (__arm_vcaddq_rot270_u8): Delete. (__arm_vhcaddq_rot90_s8): Delete. (__arm_vhcaddq_rot270_s8): Delete. (__arm_vcaddq_rot90_s8): Delete. (__arm_vcaddq_rot270_s8): Delete. (__arm_vcaddq_rot90_u16): Delete. (__arm_vcaddq_rot270_u16): Delete. (__arm_vhcaddq_rot90_s16): Delete. (__arm_vhcaddq_rot270_s16): Delete. (__arm_vcaddq_rot90_s16): Delete. (__arm_vcaddq_rot270_s16): Delete. (__arm_vcaddq_rot90_u32): Delete. (__arm_vcaddq_rot270_u32): Delete. (__arm_vhcaddq_rot90_s32): Delete. (__arm_vhcaddq_rot270_s32): Delete. (__arm_vcaddq_rot90_s32): Delete. (__arm_vcaddq_rot270_s32): Delete. (__arm_vcaddq_rot270_m_s8): Delete. (__arm_vcaddq_rot270_m_s32): Delete. 
(__arm_vcaddq_rot270_m_s16): Delete. (__arm_vcaddq_rot270_m_u8): Delete. (__arm_vcaddq_rot270_m_u32): Delete. (__arm_vcaddq_rot270_m_u16): Delete. (__arm_vcaddq_rot90_m_s8): Delete. (__arm_vcaddq_rot90_m_s32): Delete. (__arm_vcaddq_rot90_m_s16): Delete. (__arm_vcaddq_rot90_m_u8): Delete. (__arm_vcaddq_rot90_m_u32): Delete. (__arm_vcaddq_rot90_m_u16): Delete. (__arm_vhcaddq_rot270_m_s8): Delete. (__arm_vhcaddq_rot270_m_s32): Delete. (__arm_vhcaddq_rot270_m_s16): Delete. (__arm_vhcaddq_rot90_m_s8): Delete. (__arm_vhcaddq_rot90_m_s32): Delete. (__arm_vhcaddq_rot90_m_s16): Delete. (__arm_vcaddq_rot90_x_s8): Delete. (__arm_vcaddq_rot90_x_s16): Delete. (__arm_vcaddq_rot90_x_s32): Delete. (__arm_vcaddq_rot90_x_u8): Delete. (__arm_vcaddq_rot90_x_u16): Delete. (__arm_vcaddq_rot90_x_u32): Delete. (__arm_vcaddq_rot270_x_s8): Delete. (__arm_vcaddq_rot270_x_s16): Delete. (__arm_vcaddq_rot270_x_s32): Delete. (__arm_vcaddq_rot270_x_u8): Delete. (__arm_vcaddq_rot270_x_u16): Delete. (__arm_vcaddq_rot270_x_u32): Delete. (__arm_vhcaddq_rot90_x_s8): Delete. (__arm_vhcaddq_rot90_x_s16): Delete. (__arm_vhcaddq_rot90_x_s32): Delete. (__arm_vhcaddq_rot270_x_s8): Delete. (__arm_vhcaddq_rot270_x_s16): Delete. (__arm_vhcaddq_rot270_x_s32): Delete. (__arm_vcaddq_rot90_f16): Delete. (__arm_vcaddq_rot270_f16): Delete. (__arm_vcaddq_rot90_f32): Delete. (__arm_vcaddq_rot270_f32): Delete. (__arm_vcaddq_rot270_m_f32): Delete. (__arm_vcaddq_rot270_m_f16): Delete. (__arm_vcaddq_rot90_m_f32): Delete. (__arm_vcaddq_rot90_m_f16): Delete. (__arm_vcaddq_rot90_x_f16): Delete. (__arm_vcaddq_rot90_x_f32): Delete. (__arm_vcaddq_rot270_x_f16): Delete. (__arm_vcaddq_rot270_x_f32): Delete. (__arm_vcaddq_rot90): Delete. (__arm_vcaddq_rot270): Delete. (__arm_vhcaddq_rot90): Delete. (__arm_vhcaddq_rot270): Delete. (__arm_vcaddq_rot270_m): Delete. (__arm_vcaddq_rot90_m): Delete. (__arm_vhcaddq_rot270_m): Delete. (__arm_vhcaddq_rot90_m): Delete. (__arm_vcaddq_rot90_x): Delete. (__arm_vcaddq_rot270_x): Delete. (__arm_vhcaddq_rot90_x): Delete. (__arm_vhcaddq_rot270_x): Delete.
2023-07-14  arm: [MVE intrinsics] Factorize vcaddq vhcaddq  (Christophe Lyon, 3 files, -117/+62)
Factorize vcaddq, vhcaddq so that they use the same parameterized names. To be able to use the same patterns, we add a suffix to vcaddq. Note that vcadd uses UNSPEC_VCADDxx for builtins without predication, and VCADDQ_ROTxx_M_x (that is, not starting with "UNSPEC_"). The UNSPEC_* names are also used by neon.md. 2023-07-13 Christophe Lyon <christophe.lyon@linaro.org> gcc/ * config/arm/arm_mve_builtins.def (vcaddq_rot90_, vcaddq_rot270_) (vcaddq_rot90_f, vcaddq_rot270_f): Add "_" or "_f" suffix. * config/arm/iterators.md (mve_insn): Add vcadd, vhcadd. (isu): Add UNSPEC_VCADD90, UNSPEC_VCADD270, VCADDQ_ROT270_M_U, VCADDQ_ROT270_M_S, VCADDQ_ROT90_M_U, VCADDQ_ROT90_M_S, VHCADDQ_ROT90_M_S, VHCADDQ_ROT270_M_S, VHCADDQ_ROT90_S, VHCADDQ_ROT270_S. (rot): Add VCADDQ_ROT90_M_F, VCADDQ_ROT90_M_S, VCADDQ_ROT90_M_U, VCADDQ_ROT270_M_F, VCADDQ_ROT270_M_S, VCADDQ_ROT270_M_U, VHCADDQ_ROT90_S, VHCADDQ_ROT270_S, VHCADDQ_ROT90_M_S, VHCADDQ_ROT270_M_S. (mve_rot): Add VCADDQ_ROT90_M_F, VCADDQ_ROT90_M_S, VCADDQ_ROT90_M_U, VCADDQ_ROT270_M_F, VCADDQ_ROT270_M_S, VCADDQ_ROT270_M_U, VHCADDQ_ROT90_S, VHCADDQ_ROT270_S, VHCADDQ_ROT90_M_S, VHCADDQ_ROT270_M_S. (supf): Add VHCADDQ_ROT90_M_S, VHCADDQ_ROT270_M_S, VHCADDQ_ROT90_S, VHCADDQ_ROT270_S, UNSPEC_VCADD90, UNSPEC_VCADD270. (VCADDQ_ROT270_M): Delete. (VCADDQ_M_F VxCADDQ VxCADDQ_M): New. (VCADDQ_ROT90_M): Delete. * config/arm/mve.md (mve_vcaddq<mve_rot><mode>) (mve_vhcaddq_rot270_s<mode>, mve_vhcaddq_rot90_s<mode>): Merge into ... (@mve_<mve_insn>q<mve_rot>_<supf><mode>): ... this. (mve_vcaddq<mve_rot><mode>): Rename into ... (@mve_<mve_insn>q<mve_rot>_f<mode>): ... this (mve_vcaddq_rot270_m_<supf><mode>) (mve_vcaddq_rot90_m_<supf><mode>, mve_vhcaddq_rot270_m_s<mode>) (mve_vhcaddq_rot90_m_s<mode>): Merge into ... (@mve_<mve_insn>q<mve_rot>_m_<supf><mode>): ... this. (mve_vcaddq_rot270_m_f<mode>, mve_vcaddq_rot90_m_f<mode>): Merge into ... (@mve_<mve_insn>q<mve_rot>_m_f<mode>): ... this.
2023-07-14  PR target/110588: Add *bt<mode>_setncqi_2 to generate btl on x86.  (Roger Sayle, 2 files, -6/+41)
This patch resolves PR target/110588 to catch another case in combine where the i386 backend should be generating a btl instruction. This adds another define_insn_and_split to recognize the RTL representation for this case. I also noticed that two related define_insn_and_split weren't using the preferred string style for single statement preparation-statements, so I've reformatted these to be consistent in style with the new one. 2023-07-14 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog PR target/110588 * config/i386/i386.md (*bt<mode>_setcqi): Prefer string form preparation statement over braces for a single statement. (*bt<mode>_setncqi): Likewise. (*bt<mode>_setncqi_2): New define_insn_and_split. gcc/testsuite/ChangeLog PR target/110588 * gcc.target/i386/pr110588.c: New test case.
2023-07-14  c++: wrong error with static constexpr var in tmpl [PR109876]  (Marek Polacek, 5 files, -3/+105)
Since r8-509, we'll no longer create a static temporary var for the initializer '{ 1, 2 }' for num in the attached test because the code in finish_compound_literal is now guarded by '&& fcl_context == fcl_c99' but it's fcl_functional here. This causes us to reject num as non-constant when evaluating it in a template. Jason's idea was to treat num as value-dependent even though it actually isn't. This patch implements that suggestion. We weren't marking objects whose type is an empty class type constant. This patch changes that so that v_d_e_p doesn't need to check is_really_empty_class. Co-authored-by: Jason Merrill <jason@redhat.com> PR c++/109876 gcc/cp/ChangeLog: * decl.cc (cp_finish_decl): Set TREE_CONSTANT when initializing an object of empty class type. * pt.cc (value_dependent_expression_p) <case VAR_DECL>: Treat a constexpr-declared non-constant variable as value-dependent. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/constexpr-template12.C: New test. * g++.dg/cpp1z/constexpr-template1.C: New test. * g++.dg/cpp1z/constexpr-template2.C: New test.
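A hedged reconstruction of the shape of code involved (my guess at the testcase shape, not the committed tests):

  struct A { int x, y; };

  template <class T>
  void f ()
  {
    // functional-cast notation, so finish_compound_literal sees
    // fcl_functional rather than fcl_c99
    static constexpr A num = A{ 1, 2 };
    static_assert (num.x == 1, "");  // num must remain usable in a
                                     // constant expression in the template
  }

  void g () { f<int> (); }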
2023-07-14  i386: Improved insv of DImode/DFmode {high,low}parts into TImode.  (Roger Sayle, 1 file, -10/+27)
This is the next piece towards a fix for (the x86_64 ABI issues affecting) PR 88873. This patch generalizes the recent tweak to ix86_expand_move for setting the highpart of a TImode reg from a DImode source using *insvti_highpart_1, to handle both DImode and DFmode sources, and also use the recently added *insvti_lowpart_1 for setting the lowpart. Although this is another intermediate step (not yet a fix) towards enabling *insvti and *concat* patterns to be candidates for TImode STV (by using V2DI/V2DF instructions), it already improves things a little. For the test case from PR 88873:

  typedef struct { double x, y; } s_t;
  typedef double v2df __attribute__ ((vector_size (2 * sizeof(double))));

  s_t foo (s_t a, s_t b, s_t c)
  {
    return (s_t) { fma(a.x, b.x, c.x), fma (a.y, b.y, c.y) };
  }

With -O2 -march=cascadelake, GCC currently generates:

Before (29 instructions):
        vmovq   %xmm2, -56(%rsp)
        movq    -56(%rsp), %rdx
        vmovq   %xmm4, -40(%rsp)
        movq    $0, -48(%rsp)
        movq    %rdx, -56(%rsp)
        movq    -40(%rsp), %rdx
        vmovq   %xmm0, -24(%rsp)
        movq    %rdx, -40(%rsp)
        movq    -24(%rsp), %rsi
        movq    -56(%rsp), %rax
        movq    $0, -32(%rsp)
        vmovq   %xmm3, -48(%rsp)
        movq    -48(%rsp), %rcx
        vmovq   %xmm5, -32(%rsp)
        vmovq   %rax, %xmm6
        movq    -40(%rsp), %rax
        movq    $0, -16(%rsp)
        movq    %rsi, -24(%rsp)
        movq    -32(%rsp), %rsi
        vpinsrq $1, %rcx, %xmm6, %xmm6
        vmovq   %rax, %xmm7
        vmovq   %xmm1, -16(%rsp)
        vmovapd %xmm6, %xmm3
        vpinsrq $1, %rsi, %xmm7, %xmm7
        vfmadd132pd     -24(%rsp), %xmm7, %xmm3
        vmovapd %xmm3, -56(%rsp)
        vmovsd  -48(%rsp), %xmm1
        vmovsd  -56(%rsp), %xmm0
        ret

After (20 instructions):
        vmovq   %xmm2, -56(%rsp)
        movq    -56(%rsp), %rax
        vmovq   %xmm3, -48(%rsp)
        vmovq   %xmm4, -40(%rsp)
        movq    -48(%rsp), %rcx
        vmovq   %xmm5, -32(%rsp)
        vmovq   %rax, %xmm6
        movq    -40(%rsp), %rax
        movq    -32(%rsp), %rsi
        vpinsrq $1, %rcx, %xmm6, %xmm6
        vmovq   %xmm0, -24(%rsp)
        vmovq   %rax, %xmm7
        vmovq   %xmm1, -16(%rsp)
        vmovapd %xmm6, %xmm2
        vpinsrq $1, %rsi, %xmm7, %xmm7
        vfmadd132pd     -24(%rsp), %xmm7, %xmm2
        vmovapd %xmm2, -56(%rsp)
        vmovsd  -48(%rsp), %xmm1
        vmovsd  -56(%rsp), %xmm0
        ret

2023-07-14 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * config/i386/i386-expand.cc (ix86_expand_move): Generalize special case inserting of 64-bit values into a TImode register, to handle both DImode and DFmode using either *insvti_lowpart_1 or *insvti_highpart_1.
2023-07-14  cprop: Do not set REG_EQUAL note when simplifying paradoxical subreg [PR110206]  (Uros Bizjak, 5 files, -16/+60)
The cprop1 pass does not consider paradoxical subregs and for (insn 22) claims that it equals 8 elements of HImode by setting a REG_EQUAL note:

  (insn 21 19 22 4 (set (reg:V4QI 98)
          (mem/u/c:V4QI (symbol_ref/u:DI ("*.LC1") [flags 0x2]) [0 S4 A32]))
       "pr110206.c":12:42 1530 {*movv4qi_internal}
       (expr_list:REG_EQUAL (const_vector:V4QI [
                  (const_int -52 [0xffffffffffffffcc]) repeated x4
              ])
          (nil)))

  (insn 22 21 23 4 (set (reg:V8HI 100)
          (zero_extend:V8HI (vec_select:V8QI (subreg:V16QI (reg:V4QI 98) 0)
                  (parallel [
                          (const_int 0 [0])
                          (const_int 1 [0x1])
                          (const_int 2 [0x2])
                          (const_int 3 [0x3])
                          (const_int 4 [0x4])
                          (const_int 5 [0x5])
                          (const_int 6 [0x6])
                          (const_int 7 [0x7])
                      ])))) "pr110206.c":12:42 7471 {sse4_1_zero_extendv8qiv8hi2}
       (expr_list:REG_EQUAL (const_vector:V8HI [
                  (const_int 204 [0xcc]) repeated x8
              ])
          (expr_list:REG_DEAD (reg:V4QI 98)
              (nil))))

We rely on the "undefined" vals to have a specific value (from the earlier REG_EQUAL note) but actual code generation doesn't ensure this (it doesn't need to). That said, the issue isn't the constant folding per-se but that we do not actually constant fold but register an equality that doesn't hold. PR target/110206 gcc/ChangeLog: * fwprop.cc (contains_paradoxical_subreg_p): Move to ... * rtlanal.cc (contains_paradoxical_subreg_p): ... here. * rtlanal.h (contains_paradoxical_subreg_p): Add prototype. * cprop.cc (try_replace_reg): Do not set REG_EQUAL note when the original source contains a paradoxical subreg. gcc/testsuite/ChangeLog: * gcc.target/i386/pr110206.c: New test.
2023-07-14  Turn TODO_rebuild_frequencies into a pass  (Jan Hubicka, 6 files, -116/+248)
Currently we rebiuild profile_counts from profile_probability after inlining, because there is a chance that producing large loop nests may get unrealistically large profile_count values. This is much less of concern when we switched to new profile_count representation while back. This propagation can also compensate for profile inconsistencies caused by optimization passes. Since inliner is followed by basic cleanup passes that does not use profile, we get more realistic profile by delaying the recomputation after basic optimizations exposed by inlininig are finished. This does not fit into TODO machinery, so I turn rebuilding into stand alone pass and schedule it before first consumer of profile in the optimization queue. I also added logic that avoids repropagating when CFG is good and not too close to overflow. Propagating visits very basic block loop_depth times, so it is not linear and avoiding it may help a bit. On tramp3d we get 14 functions repropagated and 916 are OK. The repropagated functions are RB tree ones where we produce crazy loop nests by recurisve inlining. This is something to fix independently. gcc/ChangeLog: * passes.cc (execute_function_todo): Remove TODO_rebuild_frequencies * passes.def: Add rebuild_frequencies pass. * predict.cc (estimate_bb_frequencies): Drop force parameter. (tree_estimate_probability): Update call of estimate_bb_frequencies. (rebuild_frequencies): Turn into a pass; verify CFG profile consistency first and do not rebuild if not necessary. (class pass_rebuild_frequencies): New. (make_pass_rebuild_frequencies): New. * profile-count.h: Add profile_count::very_large_p. * tree-inline.cc (optimize_inline_calls): Do not return TODO_rebuild_frequencies * tree-pass.h (TODO_rebuild_frequencies): Remove. (make_pass_rebuild_frequencies): Declare.
2023-07-14  RISC-V: Enable COND_LEN_FMA auto-vectorization  (Juzhe-Zhong, 10 files, -1/+118)
Add comments as per Robin's suggestion in scatter_store_run-7.c. Enable COND_LEN_FMA auto-vectorization for floating-point FMA auto-vectorization with **NO** ffast-math. Since the middle-end support has been approved, I will merge it after I finish bootstrap && regression on x86. https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624395.html Now, it's time to send this patch. Consider this following case:

  __attribute__ ((noipa))                         \
  void ternop_##TYPE (TYPE *__restrict dst,       \
                      TYPE *__restrict a,         \
                      TYPE *__restrict b, int n)  \
  {                                               \
    for (int i = 0; i < n; i++)                   \
      dst[i] += a[i] * b[i];                      \
  }

  TEST_ALL ()

Before this patch:

  ternop_double:
          ble     a3,zero,.L5
          mv      a6,a0
  .L3:
          vsetvli a5,a3,e64,m1,tu,ma
          slli    a4,a5,3
          vle64.v v1,0(a0)
          vle64.v v2,0(a1)
          vle64.v v3,0(a2)
          sub     a3,a3,a5
          vfmul.vv        v2,v2,v3
          vfadd.vv        v1,v1,v2
          vse64.v v1,0(a6)
          add     a0,a0,a4
          add     a1,a1,a4
          add     a2,a2,a4
          add     a6,a6,a4
          bne     a3,zero,.L3
  .L5:
          ret

After this patch:

  ternop_double:
          ble     a3,zero,.L5
          mv      a6,a0
  .L3:
          vsetvli a5,a3,e64,m1,tu,ma
          slli    a4,a5,3
          vle64.v v1,0(a0)
          vle64.v v2,0(a1)
          vle64.v v3,0(a2)
          sub     a3,a3,a5
          vfmacc.vv       v1,v3,v2
          vse64.v v1,0(a6)
          add     a0,a0,a4
          add     a1,a1,a4
          add     a2,a2,a4
          add     a6,a6,a4
          bne     a3,zero,.L3
  .L5:
          ret

Notice: This patch only supports COND_LEN_FMA, **NO** COND_LEN_FNMA, etc., since I didn't support them in the middle-end yet. Will support them in the following patches soon. gcc/ChangeLog: * config/riscv/autovec.md (cond_len_fma<mode>): New pattern. * config/riscv/riscv-protos.h (enum insn_type): New enum. (expand_cond_len_ternop): New function. * config/riscv/riscv-v.cc (emit_nonvlmax_fp_ternary_tu_insn): Ditto. (expand_cond_len_ternop): Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-7.c: Adapt testcase for link fail. * gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-1.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-1.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-2.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-3.c: New test.
2023-07-14  bpf: enable instruction scheduling  (Jose E. Marchesi, 1 file, -0/+11)
This patch adds a dummy FSM to bpf.md in order to get INSN_SCHEDULING defined. If the latter is not defined, the `combine' pass generates paradoxical subregs of mems, which seem to then be mishandled by LRA, resulting in invalid code. Tested in bpf-unknown-none. gcc/ChangeLog: 2023-07-14 Jose E. Marchesi <jose.marchesi@oracle.com> PR target/110657 * config/bpf/bpf.md: Enable instruction scheduling.
2023-07-14  fortran: Reorder array argument evaluation parts [PR92178]  (Mikael Morin, 3 files, -20/+101)
In the case of an array actual arg passed to a polymorphic array dummy with INTENT(OUT) attribute, reorder the argument evaluation code to the following:
- first evaluate arguments' values, and data references,
- deallocate data references associated with an allocatable, intent(out) dummy,
- create a class container using the freed data references.
The ordering used to be incorrect between the first two items, when one argument was deallocated before a later argument evaluated its expression depending on the former argument. r14-2395-gb1079fc88f082d3c5b583c8822c08c5647810259 fixed it by treating arguments associated with an allocatable, intent(out) dummy in a separate, later block. This, however, wasn't working either if the data reference of such an argument was depending on its own content, as the class container initialization was trying to use deallocated content. This change generates class container initialization code in a separate block, so that it is moved after the deallocation block without moving the rest of the argument evaluation code. This alone is not sufficient to fix the problem, because the class container generation code repeatedly uses the full expression of the argument at a place where deallocation might have happened already. This is non-optimal, but may also be invalid, because the data reference may depend on its own content. In that case the expression can't be evaluated after the data has been deallocated. As in the scalar case previously treated, this is fixed by saving the data reference to a pointer before any deallocation happens, and then only referring to the pointer. gfc_reset_vptr is updated to take into account the already evaluated class container if it's available. Contrary to the scalar case, one hunk is needed to wrap the parameter evaluation in a conditional, to avoid regressing in optional_class_2.f90. This used to be handled by the class wrapper construction which wrapped the whole code in a conditional. With this change the class wrapper construction can't see the parameter evaluation code, so the latter is updated with an additional handling for optional arguments. PR fortran/92178 gcc/fortran/ChangeLog: * trans.h (gfc_reset_vptr): Add class_container argument. * trans-expr.cc (gfc_reset_vptr): Ditto. If a valid vptr can be obtained through class_container argument, bypass evaluation of e. (gfc_conv_procedure_call): Wrap the argument evaluation code in a conditional if the associated dummy is optional. Evaluate the data reference to a pointer now, and replace later references with usage of the pointer. gcc/testsuite/ChangeLog: * gfortran.dg/intent_out_21.f90: New test.
2023-07-14  fortran: Factor data references for scalar class argument wrapping [PR92178]  (Mikael Morin, 5 files, -0/+96)
In the case of a scalar actual arg passed to a polymorphic assumed-rank dummy with INTENT(OUT) attribute, avoid repeatedly evaluating the actual argument reference by saving a pointer to it. This is non-optimal, but may also be invalid, because the data reference may depend on its own content. In that case the expression can't be evaluated after the data has been deallocated. There are two ways redundant expressions are generated:
- parmse.expr, which contains the actual argument expression, is reused to get or set subfields in gfc_conv_class_to_class.
- gfc_conv_class_to_class, to get the virtual table pointer associated with the argument, generates a new expression from scratch starting with the frontend expression.
The first part is fixed by saving parmse.expr to a pointer and using the pointer instead of the original expression. The second part is fixed by adding a separate field to gfc_se that is set to the class container expression when the expression to evaluate is polymorphic. This needs the same field in gfc_ss_info so that its value can be propagated to gfc_conv_class_to_class which is modified to use that value. Finally gfc_conv_procedure saves the expression in that field to a pointer in between to avoid the same problem as for the first part. PR fortran/92178 gcc/fortran/ChangeLog: * trans.h (struct gfc_se): New field class_container. (struct gfc_ss_info): Ditto. (gfc_evaluate_data_ref_now): New prototype. * trans.cc (gfc_evaluate_data_ref_now): Implement it. * trans-array.cc (gfc_conv_ss_descriptor): Copy class_container field from gfc_se struct to gfc_ss_info struct. (gfc_conv_expr_descriptor): Copy class_container field from gfc_ss_info struct to gfc_se struct. * trans-expr.cc (gfc_conv_class_to_class): Use class container set in class_container field if available. (gfc_conv_variable): Set class_container field on encountering class variables or components, clear it on encountering non-class components. (gfc_conv_procedure_call): Evaluate data ref to a pointer now, and replace later references by usage of the pointer. gcc/testsuite/ChangeLog: * gfortran.dg/intent_out_20.f90: New test.
2023-07-14  fortran: defer class wrapper initialization after deallocation [PR92178]  (Mikael Morin, 2 files, -1/+39)
If an actual argument is associated with an INTENT(OUT) dummy, and code to deallocate it is generated, generate the class wrapper initialization after the actual argument deallocation. This is achieved by passing a cleaned up expression to gfc_conv_class_to_class, so that the class wrapper initialization code can be isolated and moved independently after the deallocation. PR fortran/92178 gcc/fortran/ChangeLog: * trans-expr.cc (gfc_conv_procedure_call): Use a separate gfc_se struct, initialized from parmse, to generate the class wrapper. After the class wrapper code has been generated, copy it back depending on whether parameter deallocation code has been generated. gcc/testsuite/ChangeLog: * gfortran.dg/intent_out_19.f90: New test.
2023-07-14  libgomp.texi: Extend memory allocation documentation  (Tobias Burnus, 1 file, -5/+32)
libgomp/ * libgomp.texi (OMP_ALLOCATOR): Document the default values for the traits. Add crossref to 'Memory allocation'. (Memory allocation): Refer to OMP_ALLOCATOR for the available traits and allocators/mem spaces; document the default value for the pool_size trait.
2023-07-14  ifcvt: Sort PHI arguments not only by occurrences but also by complexity [PR109154]  (Tamar Christina, 2 files, -21/+116)
This patch builds on the previous patch by fixing another issue with the way ifcvt currently picks which branches to test. The issue with the current implementation is that while it sorts for occurrences of the argument, it doesn't check for complexity of the arguments. As an example:

  <bb 15> [local count: 528603100]:
  ...
  if (distbb_75 >= 0.0)
    goto <bb 17>; [59.00%]
  else
    goto <bb 16>; [41.00%]

  <bb 16> [local count: 216727269]:
  ...
  goto <bb 19>; [100.00%]

  <bb 17> [local count: 311875831]:
  ...
  if (distbb_75 < iftmp.0_98)
    goto <bb 18>; [20.00%]
  else
    goto <bb 19>; [80.00%]

  <bb 18> [local count: 62375167]:
  ...

  <bb 19> [local count: 528603100]:
  # prephitmp_175 = PHI <_173(18), 0.0(17), _174(16)>

All three arguments to the PHI have the same number of occurrences, namely 1; however, it makes a big difference which comparison we test first. Sorting only on occurrences we'll pick the compares coming from BB 18 and BB 17. This means we end up generating 4 comparisons, while 2 would have been enough. By keeping track of the "complexity" of the COND in each BB (i.e. the number of comparisons needed to traverse from the start [BB 15] to the end [BB 19]) and using a key tuple of <occurrences, complexity>, we end up selecting the compares from BB 16 and BB 18 first. BB 16 only requires 1 compare, and BB 18, after we test BB 16, also only requires one additional compare. This change, paired with the previous one above, results in the optimal 2 compares. For deep nesting, i.e. for

  _79 = vr_15 > 20;
  _80 = _68 & _79;
  _82 = vr_15 <= 20;
  _83 = _68 & _82;
  _84 = vr_15 < -20;
  _85 = _73 & _84;
  _87 = vr_15 >= -20;
  _88 = _73 & _87;
  _ifc__111 = _55 ? 10 : 12;
  _ifc__112 = _70 ? 7 : _ifc__111;
  _ifc__113 = _85 ? 8 : _ifc__112;
  _ifc__114 = _88 ? 9 : _ifc__113;
  _ifc__115 = _45 ? 1 : _ifc__114;
  _ifc__116 = _63 ? 3 : _ifc__115;
  _ifc__117 = _65 ? 4 : _ifc__116;
  _ifc__118 = _83 ? 6 : _ifc__117;
  _ifc__119 = _60 ? 2 : _ifc__118;
  _ifc__120 = _43 ? 13 : _ifc__119;
  _ifc__121 = _75 ? 11 : _ifc__120;
  vw_1 = _80 ? 5 : _ifc__121;

most of the comparisons are still needed, because the chains of occurrences do not negate each other; i.e. _80 is _73 & vr_15 >= -20 and _85 is _73 & vr_15 < -20. Clearly, given that _73 needs to be true in both branches, the only additional test needed is on vr_15, where one test is the negation of the other. So we don't need to do the comparison of _73 twice. The changes in the patch reduce the overall number of compares by one, but have a bigger effect on the dependency chain. Previously we would generate a 5-instruction chain:

  cmple   p7.s, p4/z, z29.s, z30.s
  cmpne   p7.s, p7/z, z29.s, #0
  cmple   p6.s, p7/z, z31.s, z30.s
  cmpge   p6.s, p6/z, z27.s, z25.s
  cmplt   p15.s, p6/z, z28.s, z21.s

as the longest chain. With this patch we generate 3:

  cmple   p7.s, p3/z, z27.s, z30.s
  cmpne   p7.s, p7/z, z27.s, #0
  cmpgt   p7.s, p7/z, z31.s, z30.s

and I don't think (x <= y) && (x != 0) && (z > y) can be reduced further. gcc/ChangeLog: PR tree-optimization/109154 * tree-if-conv.cc (INCLUDE_ALGORITHM): Include. (struct bb_predicate): Add no_predicate_stmts. (set_bb_predicate): Increase predicate count. (set_bb_predicate_gimplified_stmts): Conditionally initialize no_predicate_stmts. (get_bb_num_predicate_stmts): New. (init_bb_predicate): Initialize no_predicate_stmts. (release_bb_predicate): Cleanup no_predicate_stmts. (insert_gimplified_predicates): Preserve no_predicate_stmts. gcc/testsuite/ChangeLog: PR tree-optimization/109154 * gcc.dg/vect/vect-ifcvt-20.c: New test.
2023-07-14  ifcvt: Reduce comparisons on conditionals by tracking truths [PR109154]  (Tamar Christina, 2 files, -26/+174)
Following on from Jakub's patch in g:de0ee9d14165eebb3d31c84e98260c05c3b33acb, these two patches finish the work of fixing the regression and improve codegen. As explained in that commit, ifconvert sorts PHI args in increasing number of occurrences in order to reduce the number of comparisons done while traversing the tree. The remaining task that this patch fixes is dealing with the long chain of comparisons that can be created from phi nodes, particularly when they share any common successor (the classical example is a diamond node). On a PHI node the true and else branches carry a condition; true will carry `a` and false `~a`. The issue is that at the moment GCC tests both `a` and `~a` when the phi node has more than 2 arguments. Clearly this isn't needed. The deeper the nesting of phi nodes, the larger the repetition. As an example, for

  foo (int *f, int d, int e)
  {
    for (int i = 0; i < 1024; i++)
      {
        int a = f[i];
        int t;
        if (a < 0)
          t = 1;
        else if (a < e)
          t = 1 - a * d;
        else
          t = 0;
        f[i] = t;
      }
  }

after Jakub's patch we generate:

  _7 = a_10 < 0;
  _21 = a_10 >= 0;
  _22 = a_10 < e_11(D);
  _23 = _21 & _22;
  _ifc__42 = _23 ? t_13 : 0;
  t_6 = _7 ? 1 : _ifc__42

but while better than before it is still inefficient, since in the false branch, where we know ~_7 is true, we still test _21. This leads to superfluous tests for every diamond node. After this patch we generate:

  _7 = a_10 < 0;
  _22 = a_10 < e_11(D);
  _ifc__42 = _22 ? t_13 : 0;
  t_6 = _7 ? 1 : _ifc__42;

which correctly elides the test of _21. This is done by borrowing the vectorizer's helper functions to limit predicate mask usages. Ifcvt will chain conditionals on the false edge (unless specifically inverted), so this patch, on creating cond a ? b : c, will register ~a when traversing c. If c is a conditional then c will be simplified to the smallest possible predicate given the assumptions we already know to be true. gcc/ChangeLog: PR tree-optimization/109154 * tree-if-conv.cc (gen_simplified_condition, gen_phi_nest_statement): New. (gen_phi_arg_condition, predicate_scalar_phi): Use it. gcc/testsuite/ChangeLog: PR tree-optimization/109154 * gcc.dg/vect/vect-ifcvt-19.c: New test.
2023-07-14  Provide extra checking for phi argument access from edge  (Richard Biener, 3 files, -3/+33)
The following adds checking that the edge we query an associated PHI arg for is related to the PHI node. Triggered by questionable code in one of my reviews. * gimple.h (gimple_phi_arg): New const overload. (gimple_phi_arg_def): Make gimple arg const. (gimple_phi_arg_def_from_edge): New inline function. * tree-phinodes.h (gimple_phi_arg_imm_use_ptr_from_edge): Likewise. * tree-ssa-operands.h (PHI_ARG_DEF_FROM_EDGE): Direct to new inline function. (PHI_ARG_DEF_PTR_FROM_EDGE): Likewise.
2023-07-14  libgomp: Fix allocator handling for Linux when libnuma is not available  (Tobias Burnus, 1 file, -1/+2)
Follow up to r14-2462-g450b05ce54d3f0. The case that libnuma was not available at runtime was not properly handled; now it falls back to the normal malloc. libgomp/ * allocator.c (omp_init_allocator): Check whether symbol from dlopened libnuma is available before using libnuma for allocations.
2023-07-14  RISC-V: Recognize zihintntl extensions  (Monk Chiang, 4 files, -0/+58)
gcc/ChangeLog: * common/config/riscv/riscv-common.cc: (riscv_implied_info): Add zihintntl item. (riscv_ext_version_table): Ditto. (riscv_ext_flag_table): Ditto. * config/riscv/riscv-opts.h (MASK_ZIHINTNTL): New macro. (TARGET_ZIHINTNTL): Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/arch-22.c: New test. * gcc.target/riscv/predef-28.c: New test.
2023-07-14  RISC-V: Remove the redundant expressions in the and<mode>3.  (Die Li, 1 file, -5/+0)
When generating the gen_and<mode>3 function based on the and<mode>3 template, it produces the expression emit_insn (gen_rtx_SET (operand0, gen_rtx_AND (<mode>, operand1, operand2)));, which is identical to the portion I removed in this patch. Therefore, the redundant portion can be deleted. Signed-off-by: Die Li <lidie@eswincomputing.com> gcc/ChangeLog: * config/riscv/riscv.md: Remove redundant portion in and<mode>3.
2023-07-14  SH: Fix PR101496 peephole bug  (Oleg Endo, 1 file, -0/+39)
gcc/ChangeLog: PR target/101469 * config/sh/sh.md (peephole2): Handle case where eliminated reg is also used by the address of the following memory operand.
2023-07-14  Daily bump.  (GCC Administrator, 8 files, -1/+472)
2023-07-13  pdp11: Fix epilogue generation [PR107841]  (Mikael Pettersson, 2 files, -1/+13)
gcc/ PR target/107841 * config/pdp11/pdp11.cc (pdp11_expand_epilogue): Also deallocate alloca-only frame. gcc/testsuite/ PR target/107841 * gcc.target/pdp11/pr107841.c: New test.
2023-07-13  m2, build: Use LDFLAGS for mklink  (Rainer Orth, 1 file, -1/+1)
When trying to bootstrap current trunk on macOS 14.0 beta 3 with Xcode 15 beta 4, the build failed running mklink in stage 2:

  unset CC ; m2/boot-bin/mklink -s --langc++ --exit --name m2/mc-boot/main.cc /vol/gcc/src/hg/master/darwin/gcc/m2/init/mcinit
  dyld[55825]: Library not loaded: /vol/gcc/lib/libstdc++.6.dylib

While it's unclear to me why this only happens on macOS 14, the problem is clear: unlike other C++ executables, mklink isn't linked with -static-libstdc++ which is passed in from toplevel in LDFLAGS. This patch fixes that and allows the build to continue. Bootstrapped on x86_64-apple-darwin23.0.0, i386-pc-solaris2.11, and sparc-sun-solaris2.11. 2023-07-11 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> gcc/m2: * Make-lang.in (m2/boot-bin/mklink$(exeext)): Add $(LDFLAGS).
2023-07-13  fortran: Release symbols in reversed order [PR106050]  (Mikael Morin, 2 files, -1/+16)
Release symbols in reversed order wrt the order they were allocated. This fixes an error recovery ICE in the case of a misplaced derived type declaration. Such a declaration creates nested symbols, one for the derived type and one for each type parameter, which should be immediately released as the declaration is rejected. This breaks if the derived type is released first. As the type parameter symbols are in the namespace of the derived type, releasing the derived type releases the type parameters, so one can't access them after that, even to release them. Hence, the type parameters should be released first. PR fortran/106050 gcc/fortran/ChangeLog: * symbol.cc (gfc_restore_last_undo_checkpoint): Release symbols in reverse order. gcc/testsuite/ChangeLog: * gfortran.dg/pdt_33.f90: New test.
2023-07-13  Darwin: Use -platform_version when available [PR110624].  (Iain Sandoe, 1 file, -3/+12)
Later versions of the static linker support a more flexible flag to describe the OS, OS version and SDK used to build the code. This replaces the functionality of '-mmacosx_version_min' (which is now deprecated, leading to the diagnostic described in the PR). We now use the platform_version flag when available which avoids the diagnostic. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk> PR target/110624 gcc/ChangeLog: * config/darwin.h (DARWIN_PLATFORM_ID): New. (LINK_COMMAND_A): Use DARWIN_PLATFORM_ID to pass OS, OS version and SDK data to the static linker.
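For reference, the newer flag takes three arguments, the platform name, the minimum OS version, and the SDK version; a hypothetical invocation (version numbers invented for illustration) would be `ld ... -platform_version macos 13.0 13.3` in place of the deprecated version-min option mentioned above.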
2023-07-13  rs6000: Add return value to __builtin_set_fpscr_rn  (Carl Love, 6 files, -32/+214)
Change the return value from void to double for __builtin_set_fpscr_rn. The return value consists of the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI, RN bit positions. A new test file, powerpc/test_fpscr_rn_builtin_2.c, is added to test the new return value of the built-in. The value __SET_FPSCR_RN_RETURNS_FPSCR__ is defined if __builtin_set_fpscr_rn returns a double. gcc/ChangeLog: * config/rs6000/rs6000-builtins.def (__builtin_set_fpscr_rn): Update built-in definition return type. * config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Add check, define __SET_FPSCR_RN_RETURNS_FPSCR__ macro. * config/rs6000/rs6000.md (rs6000_set_fpscr_rn): Add return argument to return FPSCR fields. * doc/extend.texi (__builtin_set_fpscr_rn): Update description for the return value. Add description for __SET_FPSCR_RN_RETURNS_FPSCR__ macro. gcc/testsuite/ChangeLog: * gcc.target/powerpc/test_fpscr_rn_builtin.c: Rename to test_fpscr_rn_builtin_1.c. Add comment. * gcc.target/powerpc/test_fpscr_rn_builtin_2.c: New test for the return value of __builtin_set_fpscr_rn builtin.
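A hedged usage sketch (mine, not from the patch); the 0-3 argument selects the PowerPC rounding mode and, with the new behaviour, the previous FPSCR fields come back packed in a double:

  void
  set_round_to_nearest (void)
  {
  #ifdef __SET_FPSCR_RN_RETURNS_FPSCR__
    /* New behaviour: the prior DRN/VE/OE/UE/ZE/XE/NI/RN fields are
       returned packed in a double.  */
    double old_fpscr = __builtin_set_fpscr_rn (0);  /* 0 = round to nearest */
    (void) old_fpscr;
  #else
    __builtin_set_fpscr_rn (0);  /* older behaviour: returns void */
  #endif
  }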
2023-07-13  libstdc++: std::stoi etc. do not need C99 <stdlib.h> support [PR110653]  (Jonathan Wakely, 1 file, -5/+19)
std::stoi, std::stol, std::stoul, and std::stod only depend on C89 functions, so don't need to be guarded by _GLIBCXX_USE_C99_STDLIB std::stoll and std::stoull don't need C99 strtoll and strtoull if sizeof(long) == sizeof(long long). std::stold doesn't need C99 strtold if DBL_MANT_DIG == LDBL_MANT_DIG. This only applies to the narrow character overloads, the wchar_t overloads depend on a separate _GLIBCXX_USE_C99_WCHAR macro and none of them can be implemented in C89 easily. libstdc++-v3/ChangeLog: PR libstdc++/110653 * include/bits/basic_string.h (stoi, stol, stoul, stod): Do not depend on _GLIBCXX_USE_C99_STDLIB. [__LONG_WIDTH__ == __LONG_LONG_WIDTH__] (stoll, stoull): Define in terms of stol and stoul respectively. [__DBL_MANT_DIG__ == __LDBL_MANT_DIG__] (stold): Define in terms of stod.
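A minimal sketch of the width-based fallback idea (my paraphrase of the approach, not the exact libstdc++ code):

  #include <cstddef>
  #include <string>

  #if __LONG_WIDTH__ == __LONG_LONG_WIDTH__
  inline long long
  my_stoll (const std::string& s, std::size_t* idx = nullptr, int base = 10)
  {
    // Same width means the same value range, so C89 strtol (via std::stol)
    // already covers every representable long long value.
    return std::stol (s, idx, base);
  }
  #endif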
2023-07-13  alpha: Fix computation mode in alpha_emit_set_long_const [PR106966]  (Uros Bizjak, 2 files, -1/+19)
PR target/106966 gcc/ChangeLog: * config/alpha/alpha.cc (alpha_emit_set_long_const): Always use DImode when constructing long const. gcc/testsuite/ChangeLog: * gcc.target/alpha/pr106966.c: New test.
2023-07-13  RA+sched: Change TRUE/FALSE to true/false  (Uros Bizjak, 5 files, -18/+18)
gcc/ChangeLog: * haifa-sched.cc: Change TRUE/FALSE to true/false. * ira.cc: Ditto. * lra-assigns.cc: Ditto. * lra-constraints.cc: Ditto. * sel-sched.cc: Ditto.
2023-07-13  Fix part of PR 110293: `A NEEQ (A NEEQ CST)` part  (Andrew Pinski, 6 files, -4/+274)
This fixes part of PR 110293, for the outer comparison case being `!=` or `==`. In turn PR 110539 is able to be optimized again, as the if statement for `(a&1) == ((a & 1) != 0)` gets optimized to `false` early enough to allow FRE/DOM to do a CSE for the memory store/load. OK? Bootstrapped and tested on x86_64-linux with no regressions. gcc/ChangeLog: PR tree-optimization/110293 PR tree-optimization/110539 * match.pd: Expand the `x != (typeof x)(x == 0)` pattern to handle where the inner and outer comparisons are either `!=` or `==` and handle other constants than 0. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/pr110293-1.c: New test. * gcc.dg/tree-ssa/pr110539-1.c: New test. * gcc.dg/tree-ssa/pr110539-2.c: New test. * gcc.dg/tree-ssa/pr110539-3.c: New test. * gcc.dg/tree-ssa/pr110539-4.c: New test.
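A hedged worked example of the extended fold (my own illustration, not one of the new tests): `(typeof x)(x == 0)` evaluates to 1 exactly when x is 0 and to 0 otherwise, so it can never compare equal to x itself:

  int f (int x)
  {
    return x == (int)(x == 0);  /* always 0: either x is 0 and the RHS is 1,
                                   or x is nonzero and the RHS is 0 */
  }

With the outer comparison inverted, `x != (int)(x == 0)` folds to 1 in the same way.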
2023-07-13  [RA][PR109520]: Catch error when there are not enough registers for asm insn  (Vladimir N. Makarov, 5 files, -14/+81)
An asm insn, unlike other insns, can have so many operands that their constraints can not be satisfied. It results in LRA cycling for such a test case. The following patch catches such a situation and reports the problem. PR middle-end/109520 gcc/ChangeLog: * lra-int.h (lra_insn_recog_data): Add member asm_reloads_num. (lra_asm_insn_error): New prototype. * lra.cc: Include rtl_error.h. (lra_set_insn_recog_data): Initialize asm_reloads_num. (lra_asm_insn_error): New func whose code is taken from ... * lra-assigns.cc (lra_split_hard_reg_for): ... here. Use lra_asm_insn_error. * lra-constraints.cc (curr_insn_transform): Check reloads number for asm. gcc/testsuite/ChangeLog: * gcc.target/i386/pr109520.c: New test.
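A hedged sketch of the kind of asm that triggers this (my own reduction, not the committed pr109520.c): more distinct register operands than the target has allocatable registers, so no assignment can satisfy the constraints and LRA must report an error instead of cycling:

  void
  f (int *p)
  {
    /* Twenty "+r" operands each need their own register; x86-64 has only
       around 15 allocatable general registers.  */
    asm ("" : "+r" (p[0]), "+r" (p[1]), "+r" (p[2]), "+r" (p[3]),
              "+r" (p[4]), "+r" (p[5]), "+r" (p[6]), "+r" (p[7]),
              "+r" (p[8]), "+r" (p[9]), "+r" (p[10]), "+r" (p[11]),
              "+r" (p[12]), "+r" (p[13]), "+r" (p[14]), "+r" (p[15]),
              "+r" (p[16]), "+r" (p[17]), "+r" (p[18]), "+r" (p[19]));
  }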
2023-07-13  SSA MATH: Support COND_LEN_FMA for floating-point math optimization  (Ju-Zhe Zhong, 4 files, -23/+159)
Hi, Richard and Richi. The previous patch supported COND_LEN_* binary operations; however, we didn't support COND_LEN_* ternary. Now, this patch supports COND_LEN_* ternary. Consider this following case:

  __attribute__ ((noipa))                         \
  void ternop_##TYPE (TYPE *__restrict dst,       \
                      TYPE *__restrict a,         \
                      TYPE *__restrict b,         \
                      TYPE *__restrict c, int n)  \
  {                                               \
    for (int i = 0; i < n; i++)                   \
      dst[i] += a[i] * b[i];                      \
  }

  TEST_ALL ()

Before this patch:

  ...
  COND_LEN_MUL
  COND_LEN_ADD

After this patch:

  ...
  COND_LEN_FMA

gcc/ChangeLog: * genmatch.cc (commutative_op): Add COND_LEN_*. * internal-fn.cc (first_commutative_argument): Ditto. (CASE): Ditto. (get_unconditional_internal_fn): Ditto. (can_interpret_as_conditional_op_p): Ditto. (internal_fn_len_index): Ditto. * internal-fn.h (can_interpret_as_conditional_op_p): Ditto. * tree-ssa-math-opts.cc (convert_mult_to_fma_1): Ditto. (convert_mult_to_fma): Ditto. (math_opts_dom_walker::after_dom_children): Ditto.
2023-07-13  testsuite: dg-require LTO for libgomp LTO tests  (David Edelsohn, 3 files, -0/+3)
Some test cases in the libgomp testsuite pass -flto as an option, but the testcases do not require LTO target support. This patch adds the necessary DejaGNU requirement for LTO support to the testcases. libgomp/ChangeLog: * testsuite/libgomp.c++/target-map-class-2.C: Require LTO. * testsuite/libgomp.c-c++-common/requires-4.c: Require LTO. * testsuite/libgomp.c-c++-common/requires-4a.c: Require LTO. Signed-off-by: David Edelsohn <dje.gcc@gmail.com>