Age | Commit message | Author | Files | Lines |
|
libgcc/config/avr/libf7/
* libf7.h (F7_SIZEOF): New macro.
* libf7-asm.sx: Use F7_SIZEOF instead of magic number "10".
(F7MOD_D_fma_, __fma): New module and function.
(fma) [-mdouble=64]: Define as alias for __fma.
(fmal) [-mlong-double=64]: Define as alias for __fma.
* libf7-common.mk (F7_ASM_PARTS): Add D_fma.
|
|
and GENERAL_REGS.
For testcase
void __cond_swap(double* __x, double* __y) {
bool __r = (*__x < *__y);
auto __tmp = __r ? *__x : *__y;
*__y = __r ? *__y : *__x;
*__x = __tmp;
}
GCC-14 with -O2 and -march=x86-64 options generates the following code:
__cond_swap(double*, double*):
movsd xmm1, QWORD PTR [rdi]
movsd xmm0, QWORD PTR [rsi]
comisd xmm0, xmm1
jbe .L2
movq rax, xmm1
movapd xmm1, xmm0
movq xmm0, rax
.L2:
movsd QWORD PTR [rsi], xmm1
movsd QWORD PTR [rdi], xmm0
ret
rax is used to save and restore the DFmode value. In RA, both GENERAL_REGS
and SSE_REGS cost zero because the alternative in the movdf_internal
pattern was not disparaged, so GENERAL_REGS is allocated according to the
register allocation order. The patch adds '?' to alternatives (r,v) and
(v,r), just as was done for the movsf/hf/bf_internal patterns; after that
we get optimal RA:
__cond_swap:
.LFB0:
.cfi_startproc
movsd (%rdi), %xmm1
movsd (%rsi), %xmm0
comisd %xmm1, %xmm0
jbe .L2
movapd %xmm1, %xmm2
movapd %xmm0, %xmm1
movapd %xmm2, %xmm0
.L2:
movsd %xmm1, (%rsi)
movsd %xmm0, (%rdi)
ret
gcc/ChangeLog:
PR target/110170
* config/i386/i386.md (movdf_internal): Disparage slightly for
2 alternatives (r,v) and (v,r) by adding constraint modifier
'?'.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr110170-3.c: New test.
(cherry picked from commit 37a231cc7594d12ba0822077018aad751a6fb94e)
|
|
|
|
Merge up to r13-7954-gcc87aaeceea58389b681e3a6a63f95e54f2b59cd (16th Oct 2023)
|
|
gfc_match ("... %s ...", ...) matches a gfc_symbol but with
host_assoc = 0. This commit adds '%S' as a variant which matches
with host_assoc = 1.
gcc/fortran/ChangeLog:
* match.cc (gfc_match_char): Match with '%S' a symbol
with host_assoc = 1.
(cherry picked from commit 0607e93490058ec31b6ab57078c54771f139b870)
|
|
As PR111380 (and the discussion in related PRs) shows, the way
rs6000_can_inline_p currently treats a callee without any target
option node is wrong. It assumes it is always safe to inline such a
callee, but the callee's target flags actually come from the command
line options (target_option_default_node). The callee's flags may
therefore fail the inlining condition while the callee is still
inlined, with unexpected consequences.
As the associated test case pr111380-1.c shows, the caller
main is attributed with power8, but the callee foo is
compiled with power9 from the command line. Inlining foo into
main is unexpected, since foo can contain something that
requires power9 capability. Without this patch, LTO
(with -flto) gives an error message (as it forces the
callee to have a target option node), but without LTO foo is
inlined unexpectedly.
This patch makes the callee adopt target_option_default_node
when it has no target option node. This avoids the wrong
inlining decision and fixes the inconsistency between LTO and
non-LTO. It also aligns with what the other ports do.
PR target/111380
gcc/ChangeLog:
* config/rs6000/rs6000.cc (rs6000_can_inline_p): Adopt
target_option_default_node when the callee has no option
attributes, also simplify the existing code accordingly.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/pr111380-1.c: New test.
* gcc.target/powerpc/pr111380-2.c: New test.
(cherry picked from commit 266dfed68b881702e9660889f63408054b7fa9c0)
|
|
PR111366 exposes one thing that can be improved in function
rs6000_update_ipa_fn_target_info: skipping an empty inline asm
string, since such a string cannot adopt any hardware features
(so far HTM).
Since this rs6000_update_ipa_fn_target_info related approach
exists in GCC12 and later, the affected project highway has
updated its target pragma with ",htm", see the link:
https://github.com/google/highway/commit/15e63d61eb535f478bc
I'd not bother to consider an inline asm parser for now but
will file a separate PR for further enhancement.
PR target/111366
gcc/ChangeLog:
* config/rs6000/rs6000.cc (rs6000_update_ipa_fn_target_info): Skip
empty inline asm.
gcc/testsuite/ChangeLog:
* g++.target/powerpc/pr111366.C: New test.
(cherry picked from commit a65b38e361320e0aa45adbc969c704385ab1f45b)
|
|
|
|
|
|
|
|
PR tree-optimization/111622
* value-relation.cc (equiv_oracle::add_partial_equiv): Do not
register a partial equivalence if an operand has no uses.
|
|
|
|
libgcc/config/avr/libf7/
* libf7.c (F7MOD_atan2_, f7_atan2): New module and function.
* libf7.h: Adjust comments.
* libf7-common.mk (CALL_PROLOGUES): Add atan2.
|
|
|
|
A floating point equivalence may not properly reflect both signs of
zero, so be pessimistic and ensure both signs are included.
PR tree-optimization/111694
gcc/
* gimple-range-cache.cc (ranger_cache::fill_block_cache): Adjust
equivalence range.
* value-relation.cc (adjust_equivalence_range): New.
* value-relation.h (adjust_equivalence_range): New prototype.
gcc/testsuite/
* gcc.dg/pr111694.c: New.
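The signed-zero concern above can be illustrated with a minimal C++ sketch. It is not the ranger code; the helper names equal_but_sign_differs and sign_known_from_equality are hypothetical and only show why an equality x == y says nothing about the sign when the value is zero.

```cpp
#include <cassert>  // only needed for the checks shown in the usage note
#include <cmath>

// Returns true when a and b compare equal yet carry different sign bits.
// For IEEE doubles this is only possible for the pair +0.0 / -0.0.
bool equal_but_sign_differs(double a, double b) {
    return a == b && std::signbit(a) != std::signbit(b);
}

// Hypothetical helper: given that x == known holds, report whether the sign
// of x may be assumed to match the sign of known.
bool sign_known_from_equality(double known) {
    // When known is +0.0 or -0.0, x may be either zero, so stay pessimistic.
    return known != 0.0;
}
```

For example, assert(equal_but_sign_differs(0.0, -0.0)) holds, so a range derived from such an equivalence has to include both zeros.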
|
|
The following testcase is miscompiled, because count_nonzero_bytes incorrectly
uses get_strinfo information on a pointer from which an earlier instruction
loads SSA_NAME stored at the current instruction. get_strinfo shows a state
right before the current store though, so if there are some stores in between
the current store and the load, the string length information might have
changed.
The patch passes around gimple_vuse from the store and punts instead of using
strinfo on loads from MEM_REF which have different gimple_vuse from that.
2023-10-11 Richard Biener <rguenther@suse.de>
Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/111519
* tree-ssa-strlen.cc (strlen_pass::count_nonzero_bytes): Add vuse
argument and pass it through to recursive calls and
count_nonzero_bytes_addr calls. Don't shadow the stmt argument, but
change stmt for gimple_assign_single_p statements for which we don't
immediately punt.
(strlen_pass::count_nonzero_bytes_addr): Add vuse argument and pass
it through to recursive calls and count_nonzero_bytes calls. Don't
use get_strinfo if gimple_vuse (stmt) is different from vuse. Don't
shadow the stmt argument.
* gcc.dg/torture/pr111519.c: New testcase.
(cherry picked from commit e75bf1985fdc9a5d3a307882a9251d8fd6e93def)
|
|
|
|
This occurs when one of the types has an incomplete declaration in addition
to its full declaration in its package. In this case AI05-129 says that the
incomplete type is not part of the limited view of the package, i.e. only
the full view is. Now, in the GNAT implementation, it's the opposite in the
regular view of the package, i.e. the incomplete type is the visible one.
That's why the implementation needs to also swap the types on the visibility
chain while it is swapping the views when the clauses are either installed
or removed. This works correctly for the installation, but does not for the
removal, so this change rewrites the code doing the latter.
gcc/ada/
PR ada/111434
* sem_ch10.adb (Replace): New procedure to replace an entity with
another on the homonym chain.
(Install_Limited_With_Clause): Rename Non_Lim_View to Typ for the
sake of consistency. Call Replace to do the replacements and split
the code into the regular and the special cases. Add debugging
output controlled by -gnatdi.
(Install_With_Clause): Print the Parent_With and Implicit_With flags
in the debugging output controlled by -gnatdi.
(Remove_Limited_With_Unit.Restore_Chain_For_Shadow (Shadow)): Rewrite
using a direct replacement of E4 by E2. Call Replace to do the
replacements. Add debugging output controlled by -gnatdi.
|
|
|
|
For strictly structured blocks, a BLOCK was created but the code
was placed after the block, in the outer structured block. Additionally,
labelled blocks were mishandled. As the code is now properly in a
BLOCK, it solves additional issues.
gcc/fortran/ChangeLog:
* parse.cc (parse_omp_structured_block): Make the user code end
up inside of BLOCK construct for strictly structured blocks;
fix fallout for 'section' and 'teams'.
* openmp.cc (resolve_omp_target): Fix changed BLOCK handling
for teams in target checking.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/strictly-structured-block-1.f90: New test.
gcc/testsuite/ChangeLog:
* gfortran.dg/block_17.f90: New test.
* gfortran.dg/gomp/strictly-structured-block-5.f90: New test.
(cherry picked from commit 6a8edd50a149f10621b59798c887c24c81c8b9ea)
|
|
|
|
|
|
Match has a pattern which converts `vec_cond(vec_cond(a,b,0), c, d)`
into `vec_cond(a & b, c, d)`, but since in this case a is a comparison,
fold will change `a & b` back into `vec_cond(a,b,0)`, which causes an
infinite loop.
The best way to fix this is to enable the patterns for vec_cond(*,vec_cond,*)
only for GIMPLE so we don't get an infinite loop for fold any more.
Note this is a latent bug since these patterns were added in r11-2577-g229752afe3156a
and was exposed by r14-3350-g47b833a9abe1, which is now able to remove a VIEW_CONVERT_EXPR.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR middle-end/111699
gcc/ChangeLog:
* match.pd ((c ? a : b) op d, (c ? a : b) op (c ? d : e),
(v ? w : 0) ? a : b, c1 ? c2 ? a : b : b): Enable only for GIMPLE.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/pr111699-1.c: New test.
(cherry picked from commit e77428a9a336f57e3efe3eff95f2b491d7e9be14)
|
|
|
|
Merge up to r13-7936-g7c47e03df2a77f2e25e23887734a5e818aeca3f5 (6th Oct 2023)
|
|
|
|
|
|
2023-10-04 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/37336
PR fortran/111674
* trans-expr.cc (gfc_trans_scalar_assign): Finalize components
on deallocation if derived type is not finalizable.
gcc/testsuite/
PR fortran/37336
PR fortran/111674
* gfortran.dg/allocate_with_source_25.f90: Final count in tree
dump reverts from 4 to original 6.
* gfortran.dg/finalize_38.f90: Add test for fix of PR111674.
(cherry picked from commit 84284e1c490e9235fca5cb85269ecfcb87eef4f1)
|
|
|
|
|
|
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=1b6f0476837205932613ddb2b3429a55c26c409d
changed _Hash_node_value_base to no longer derive from _Hash_node_base, which means
that its member functions expect _M_storage to be at a different offset. So explosions
result if an out-of-line definition is emitted for any of the member functions (say,
in a non-optimized build) and the resulting object file is then linked with code built
using an older version of GCC/libstdc++.
libstdc++-v3/ChangeLog:
PR libstdc++/111050
* include/bits/hashtable_policy.h
(_Hash_node_value_base<>::_M_valptr(), _Hash_node_value_base<>::_M_v())
Add [[__gnu__::__always_inline__]].
(cherry picked from commit 2c1e3544a94c5d7354fad031e1f9731c3ce3af25)
|
|
It was recently discovered that the scalar modulo instructions can suffer
noticeable performance issues for certain input values. This patch disables
their generation since the equivalent div/mul/sub sequence does not suffer
the same problem.
gcc/
* config/rs6000/rs6000.cc (rs6000_rtx_costs): Check whether the
modulo instruction is disabled.
* config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
* config/rs6000/rs6000.md (mod<mode>3, *mod<mode>3): Check it.
(define_expand umod<mode>3): New.
(define_insn umod<mode>3): Rename to *umod<mode>3 and check if the modulo
instruction is disabled.
(umodti3, modti3): Check if the modulo instruction is disabled.
gcc/testsuite/
* gcc.target/powerpc/clone1.c: Add xfails.
* gcc.target/powerpc/clone3.c: Likewise.
* gcc.target/powerpc/mod-1.c: Update scan strings and add xfails.
* gcc.target/powerpc/mod-2.c: Likewise.
* gcc.target/powerpc/p10-vdivq-vmodq.c: Add xfails.
(cherry picked from commit 58ab38213b979811d314f68e3f455c28a1d44140)
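The div/mul/sub sequence mentioned above can be sketched in C++ as follows. This only illustrates the arithmetic identity, not the code the rs6000 backend emits, and the function name mod_via_div_mul_sub is a hypothetical one chosen for this example.

```cpp
#include <cassert>

// Remainder computed without a modulo operation: a - (a / b) * b.
// C++ integer division truncates toward zero, so the result matches a % b
// for every pair where a / b itself is defined.
long long mod_via_div_mul_sub(long long a, long long b) {
    assert(b != 0);        // division by zero is undefined, as for a % b
    long long q = a / b;   // div
    return a - q * b;      // mul, then sub
}
```

For example, mod_via_div_mul_sub(-17, 5) gives the same value as -17 % 5.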
|
|
|
|
The problem here is that after r6-7425-ga9fee7cdc3c62d0e51730,
the comparison to see if the transformation could be done was using the
wrong value. Instead of checking whether the inner value was LE (for MIN, GE for MAX)
the outer value, it was comparing the inner value to the value used in the comparison,
which was wrong.
Committed to GCC 13 branch after bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
PR tree-optimization/111331
* tree-ssa-phiopt.cc (minmax_replacement):
Fix the LE/GE comparison for the
`(a CMP CST1) ? max<a,CST2> : a` optimization.
gcc/testsuite/ChangeLog:
PR tree-optimization/111331
* gcc.c-torture/execute/pr111331-1.c: New test.
* gcc.c-torture/execute/pr111331-2.c: New test.
* gcc.c-torture/execute/pr111331-3.c: New test.
(cherry picked from commit 30e6ee074588bacefd2dfe745b188bb20c81fe5e)
|
|
The issue here is that when backprop tries to go
and strip sign ops, it skips over ABSU_EXPR but
ABSU_EXPR not only does an ABS, it also changes the
type to unsigned.
Since strip_sign_op_1 is only supposed to strip off
sign changing operands and not ones that change types,
removing ABSU_EXPR here is correct. We don't handle
nop conversions so this does not cause any missed optimizations either.
Committed to the GCC 13 branch after bootstrapped and
tested on x86_64-linux-gnu with no regressions.
PR tree-optimization/110386
gcc/ChangeLog:
* gimple-ssa-backprop.cc (strip_sign_op_1): Remove ABSU_EXPR.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/pr110386-1.c: New test.
* gcc.c-torture/compile/pr110386-2.c: New test.
(cherry picked from commit 2bbac12ea7bd8a3eef5382e1b13f6019df4ec03f)
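The point that ABSU_EXPR changes the type as well as the sign can be sketched in plain C++. The function abs_to_unsigned is a hypothetical stand-in written for this example, not GCC internals, and the INT_MIN check assumes the usual two's-complement int.

```cpp
#include <cassert>  // only needed for the checks shown in the usage note
#include <climits>
#include <type_traits>

// Absolute value with an unsigned result: it takes |x| and also changes the
// type to unsigned, so even INT_MIN has a representable result.
unsigned abs_to_unsigned(int x) {
    unsigned ux = static_cast<unsigned>(x);
    return x < 0 ? 0u - ux : ux;  // unsigned negation is well defined
}

// The result type differs from the operand type, which is why such an
// operation cannot be dropped as if it only affected the sign.
static_assert(std::is_same<decltype(abs_to_unsigned(0)), unsigned>::value,
              "result type is unsigned, not int");
```

For example, assert(abs_to_unsigned(INT_MIN) == static_cast<unsigned>(INT_MAX) + 1u) holds, a value that no int can represent.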
|
|
|
|
Adapt to different parameter count in comparison to gcc-14.
gcc/fortran/ChangeLog:
* trans-array.cc (gfc_trans_deferred_array): Use correct
position for statements to add to guarded block.
|
|
|
|
When freeing allocatable components of an allocatable coarray, add
a check that the coarray is still allocated, before accessing the
components.
This patch adds to PR fortran/37336, but does not fix it completely.
gcc/fortran/ChangeLog:
PR fortran/37336
* trans-array.cc (structure_alloc_comps): Deref coarray.
(gfc_trans_deferred_array): Add freeing of components after
check for allocated coarray.
gcc/testsuite/ChangeLog:
PR fortran/37336
* gfortran.dg/coarray/alloc_comp_6.f90: New test.
* gfortran.dg/coarray/alloc_comp_7.f90: New test.
(cherry picked from commit 9a63a62dfd73e159f1956e9b04b555c445de4e78)
|
|
Merge up to r13-7922-gb5b98a2d055d967d1fc92859827839c83c9368d7 (29th Sep 2023)
|
|
List official cores first so that -mcpu=native does not show a codename with
-v or in errors/warnings.
gcc/ChangeLog:
* config/aarch64/aarch64-cores.def (neoverse-n1): Place before ares.
(neoverse-v1): Place before zeus.
(neoverse-v2): Place before demeter.
* config/aarch64/aarch64-tune.md: Regenerate.
(cherry picked from commit 64d5bc35c8c2a66ac133a3e6ace820b0ad8a63fb)
|
|
A MOPS memmove may corrupt registers since there is no copy of the input
operands to temporary registers. Fix this by calling
aarch64_expand_cpymem_mops.
Reviewed-by: Richard Sandiford <richard.sandiford@arm.com>
gcc/ChangeLog/
PR target/111121
* config/aarch64/aarch64.md (aarch64_movmemdi): Add new expander.
(movmemdi): Call aarch64_expand_cpymem_mops for correct expansion.
* config/aarch64/aarch64.cc (aarch64_expand_cpymem_mops): Add support
for memmove.
* config/aarch64/aarch64-protos.h (aarch64_expand_cpymem_mops): Add new
function.
gcc/testsuite/ChangeLog/
PR target/111121
* gcc.target/aarch64/mops_4.c: Add memmove testcases.
(cherry picked from commit d8b56c95782aeeee79ec40932ca88d00fd9f2ee2)
|
|
|
|
|
|
libstdc++-v3/ChangeLog:
PR libstdc++/111102
* testsuite/std/format/string.cc: Check wide character format
strings with out-of-range widths.
(cherry picked from commit 7564fe98657ad5ede34bd08f5279778fa8698865)
|
|
GCC does not consider the inline namespace in friend function declarations.
This is PR c++/59526; we need to make this namespace explicit.
libstdc++-v3/ChangeLog:
* include/std/format (std::__format::_Arg_store): Explicit version
namespace on make_format_args friend declaration.
(cherry picked from commit 92456291849fe88303bbcab366f41dcd4a885ad5)
|
|
When parsing a format string, the width is parsed into an unsigned short
but the result is not checked in the case the format string is not a
char string (such as a wide string). In case the parse fails, a null
pointer is returned which is used for pointer arithmetic which is
undefined behaviour.
Signed-off-by: Paul Dreik <gccpatches@pauldreik.se>
libstdc++-v3/ChangeLog:
PR libstdc++/111102
* include/std/format (__format::__parse_integer): Check for
non-null pointer.
(cherry picked from commit dd4bdb9eea436bf06f175d8dbfc2190377455be4)
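The failure mode described above, a parser that signals failure with a null pointer which the caller must check before doing pointer arithmetic, can be sketched as follows. parse_width and digits_consumed are hypothetical names for this example; this is not the libstdc++ __parse_integer implementation.

```cpp
#include <cassert>  // only needed for the checks shown in the usage note
#include <climits>
#include <cstddef>

// Hypothetical stand-in for a width parser: reads decimal digits from
// [first, last) into an unsigned short. Returns a pointer past the last
// digit, or nullptr if there are no digits or the value does not fit.
template <typename CharT>
const CharT* parse_width(const CharT* first, const CharT* last,
                         unsigned short& out) {
    unsigned long value = 0;
    const CharT* p = first;
    for (; p != last && *p >= CharT('0') && *p <= CharT('9'); ++p) {
        value = value * 10 + static_cast<unsigned long>(*p - CharT('0'));
        if (value > USHRT_MAX)
            return nullptr;  // out of range: report failure
    }
    if (p == first)
        return nullptr;      // no digits at all
    out = static_cast<unsigned short>(value);
    return p;
}

// The caller checks for nullptr before subtracting pointers.
template <typename CharT>
std::ptrdiff_t digits_consumed(const CharT* first, const CharT* last,
                               unsigned short& out) {
    const CharT* end = parse_width(first, last, out);
    if (end == nullptr)
        return -1;           // never do arithmetic on a null result
    return end - first;
}
```

With a wide string such as L"99999999}", digits_consumed returns -1 instead of computing a distance from a null pointer.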
|
|
libstdc++-v3/ChangeLog:
* include/std/format: Fix some warnings.
(__format::__write(Ctx&, basic_string_view<CharT>)): Remove
unused function template.
(cherry picked from commit b9e5a4b4f035ba85b1a4065b751c2d583206b4e3)
|
|
A decimal point was being added to the end of the string for {:#.0}
because the __expc character was not being set for the _Pres_none
presentation type, so __s.find(__expc) didn't find the 'e' in "1e+01" and
we created "1e+01." by appending the radix char to the end.
This can be fixed by ensuring that __expc='e' is set for the _Pres_none
case. I realized we can also set __expc='P' and __expc='E' when needed,
to save a call to std::toupper later.
For the {:#.0g} format, __expc='e' was being set and so the 'e' was
found in "1e+10", but then __z = __prec - __sigfigs would wrap around to
SIZE_MAX. That meant we would decide not to add a radix char because the
number of extra characters to insert would be 1+SIZE_MAX, i.e. zero.
This can be fixed by using __z == 0 when __prec == 0.
libstdc++-v3/ChangeLog:
PR libstdc++/108046
* include/std/format (__formatter_fp::format): Ensure __expc is
always set for all presentation types. Set __z correctly for
zero precision.
* testsuite/std/format/functions/format.cc: Check problem cases.
(cherry picked from commit 50bc490c090cc95175e6068ed7438788d7fd7040)
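The wraparound described above is ordinary unsigned arithmetic and can be reproduced in isolation. The sketch below uses hypothetical helpers zeros_naive and zeros_guarded; it only mirrors the arithmetic in the description and is not the __formatter_fp::format code. The guard covers only the zero-precision case named in the text.

```cpp
#include <cassert>  // only needed for the checks shown in the usage note
#include <cstddef>
#include <cstdint>

// Naive count of extra zeros: the unsigned subtraction wraps when
// prec < sigfigs, e.g. prec == 0 and sigfigs == 1 gives SIZE_MAX.
std::size_t zeros_naive(std::size_t prec, std::size_t sigfigs) {
    return prec - sigfigs;
}

// Guarded variant following the description: use 0 when prec == 0.
std::size_t zeros_guarded(std::size_t prec, std::size_t sigfigs) {
    return prec == 0 ? 0 : prec - sigfigs;
}
```

Here zeros_naive(0, 1) equals SIZE_MAX, and adding 1 for the radix character wraps to zero, which matches the "1+SIZE_MAX i.e. zero" reasoning above.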
|
|
Some constexpr functions were inadvertently relying on relaxed constexpr
rules from later standards.
libstdc++-v3/ChangeLog:
* include/bits/chrono.h (duration_cast): Do not use braces
around statements for C++11 constexpr rules.
* include/bits/stl_algobase.h (__lg): Rewrite as a single
statement for C++11 constexpr rules.
* include/experimental/bits/fs_path.h (path::string): Use
_GLIBCXX17_CONSTEXPR not _GLIBCXX_CONSTEXPR for 'if constexpr'.
* include/std/charconv (__to_chars_8): Initialize variable for
C++17 constexpr rules.
(cherry picked from commit b3a2b307b9deea719fb725a86df43b82176fe459)
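As an illustration of the C++11 constexpr restriction mentioned above (the function body must essentially be a single return statement), here is a minimal sketch. floor_log2 is a hypothetical function written for this example and is not the actual __lg definition.

```cpp
#include <cassert>  // only needed for the checks shown in the usage note

// Floor of log2(n) for n >= 1, written as a single return statement so it
// is valid under C++11 constexpr rules (no loops, no local variables).
constexpr int floor_log2(unsigned long n) {
    return n <= 1 ? 0 : 1 + floor_log2(n >> 1);
}

// Evaluated at compile time, which confirms the function is usable in
// constant expressions.
static_assert(floor_log2(1) == 0, "log2(1) is 0");
static_assert(floor_log2(8) == 3, "log2(8) is 3");
static_assert(floor_log2(1000) == 9, "floor(log2(1000)) is 9");
```

A version using a loop or a local variable would need the relaxed constexpr rules of C++14 or later.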
|