|
Zero-length pack expansions are treated as if no list were provided
at all, that is, with
template<typename...> struct S { };
template<typename T, typename... Ts>
void g() {
S<std::is_same<T, Ts>...>;
}
g<int> will result in S<>. In the following test we have something
similar:
template <typename T, typename... Ts>
using IsOneOf = disjunction<is_same<T, Ts>...>;
and then we have "IsOneOf<OtherHolders>..." where OtherHolders is an
empty pack. Since r11-7931, we strip_typedefs in TYPE_PACK_EXPANSION.
In this test that results in "IsOneOf<OtherHolders>" being turned into
"disjunction<>". So the whole expansion is now "disjunction<>...". But
then we error in make_pack_expansion because find_parameter_packs_r won't
find the pack OtherHolders.
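The zero-length expansion described above can be checked in isolation. This is only a sketch: it substitutes std::disjunction and std::is_same (C++17) for the test's own disjunction and is_same, and the names Expanded, is_one_of and demo_pack_expansion are hypothetical helpers that do not appear in the PR.

```cpp
#include <cassert>
#include <type_traits>

template<typename...> struct S { };

// Expanding an empty Ts produces S<>, as if no argument list were given.
template<typename T, typename... Ts>
using Expanded = S<std::is_same<T, Ts>...>;

// Same shape as the alias from the test, built on the standard traits.
template<typename T, typename... Ts>
using IsOneOf = std::disjunction<std::is_same<T, Ts>...>;

static_assert(std::is_same_v<Expanded<int>, S<>>);
static_assert(!IsOneOf<int>::value);           // disjunction<> is false_type
static_assert(IsOneOf<int, char, int>::value);

// Hypothetical helper so the result can also be inspected at run time.
template<typename T, typename... Ts>
constexpr bool is_one_of() { return IsOneOf<T, Ts...>::value; }

// Small demo; the extra parentheses keep the commas away from the assert macro.
inline void demo_pack_expansion() {
    assert(!is_one_of<int>());
    assert((is_one_of<int, char, int>()));
}
```

Calling demo_pack_expansion() from main() exercises the same checks at run time.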
We strip the alias template because dependent_alias_template_spec_p says
it's not dependent. It is not dependent because this alias is not
TEMPLATE_DECL_COMPLEX_ALIAS_P. My understanding is that currently we
consider an alias complex if it
1) expands a pack from the enclosing class, as in
template<template<typename... U> typename... TT>
struct S {
template<typename... Args>
using X = P<TT<Args...>...>;
};
where the alias expands TT; or
2) the expansion does *not* name all the template parameters, as in
template<typename...> struct R;
template<typename T, typename... Ts>
using U = R<X<Ts>...>;
where T is not named in the expansion.
But IsOneOf is neither. And it can't know how it's going to be used.
Therefore I think we cannot make it complex (and in turn dependent) to fix
this bug.
After much gnashing of teeth, I think we simply want to avoid stripping
the alias if the new pattern doesn't have any parameter packs to expand.
PR c++/104008
gcc/cp/ChangeLog:
* tree.cc (strip_typedefs): Don't strip an alias template when
doing so would result in losing a parameter pack.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/variadic-alias3.C: New test.
* g++.dg/cpp0x/variadic-alias4.C: New test.
|
|
gfc_omp_predetermined_sharing causes the associate-name pointer variable
to be OMP_CLAUSE_DEFAULT_FIRSTPRIVATE, which is fine. However, the associated
selector is shared. Thus, the target of associate-name pointer should not get
copied. (It was before but because of gfc_omp_privatize_by_reference returning
false, the selector was not only wrongly copied but this was also not done
properly.)
gcc/fortran/ChangeLog:
PR fortran/103039
* trans-openmp.cc (gfc_omp_clause_copy_ctor, gfc_omp_clause_dtor):
Only privatize pointer for associate names.
libgomp/ChangeLog:
PR fortran/103039
* testsuite/libgomp.fortran/associate4.f90: New test.
|
|
Partially revert r12-4190-g6da36b7d0e43b6f9281c65c19a025d4888a25b2d
because using __and_<..., is_copy_constructible<T>> when T is incomplete
results in an error about deriving from is_copy_constructible<T> when
that is incomplete. I don't know how to fix that, so this simply
restores the previous constraint which worked in this case (even though
I think it's technically undefined to use is_copy_constructible<T> with
incomplete T). This doesn't restore exactly what we had before, but uses
the is_copy_constructible_v and __is_in_place_type_v variable templates
instead of the ::value member.
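For illustration only, a constructor constrained with the variable-template form of a trait might look like the sketch below. Holder is a hypothetical, heavily simplified stand-in and not the libstdc++ code for std::any; the in_place_type check mentioned above is omitted, and the incomplete-type case from the PR is not reproduced.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Hypothetical stand-in for a type-erasing holder; it only records
// whether a value was supplied.
class Holder {
public:
    Holder() = default;

    // VT fails to substitute when the argument is a Holder itself, so the
    // copy-constructibility check below is never evaluated for Holder.
    template<typename T,
             typename VT = std::enable_if_t<
                 !std::is_same_v<std::decay_t<T>, Holder>, std::decay_t<T>>,
             std::enable_if_t<std::is_copy_constructible_v<VT>, bool> = true>
    Holder(T&&) : has_value_(true) { }

    bool has_value() const { return has_value_; }

private:
    bool has_value_ = false;
};

inline void demo_holder() {
    static_assert(std::is_constructible_v<Holder, int>);
    // unique_ptr is move-only, so the constrained constructor is disabled.
    static_assert(!std::is_constructible_v<Holder, std::unique_ptr<int>>);
    Holder a(42);
    Holder b = a;   // uses the implicit copy constructor
    assert(a.has_value() && b.has_value());
}
```

The constraint is written with is_copy_constructible_v rather than the ::value member, which is the stylistic point made above; everything else in the sketch is an assumption.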
libstdc++-v3/ChangeLog:
PR libstdc++/104242
* include/std/any (any(T&&)): Revert change to constraints.
* testsuite/20_util/any/cons/104242.cc: New test.
|
|
Darwin versions <= 10 (macOS 10.6) emit different diagnostics for the failure
case being tested by bad-mapper-1.C. Adjust the dg- expressions to reflect this.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/testsuite/ChangeLog:
* g++.dg/modules/bad-mapper-1.C: Make dg- expressions that match the
diagnostics output by earlier Darwin too.
|
|
I accidentally committed an outdated version of patch "[openmp] Set location
for taskloop stmts".
Fix this by adding the missing changes.
gcc/ChangeLog:
2022-03-18 Tom de Vries <tdevries@suse.de>
* gimplify.cc (gimplify_omp_for): Set location using 'input_location'.
Set gfor location only when dealing with a OMP_TASKLOOP.
|
|
Some versions of the BSD getaddrinfo() call do not work with the specific
input of "0" for the servname entry (a segv results). Since we are making
the call with a dummy port number, the value is actually not important, other
than it should be in range. Work around the BSD bug by using "1" instead.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
c++tools/ChangeLog:
* server.cc (accept_from): Use "1" as the dummy port number.
|
|
getaddrinfo() requires either a non-null name for the server or
a port service / number. In the code that opens a connection we have
been calling this with a dummy port number of "0". Unfortunately this
triggers a bug in some BSD versions and OSes importing that code.
In this part of the code we do not really need a port number, since it
is not reasonable to open a connection to an unspecified host.
Setting hints info field to 0, and the servname parm to nullptr works
around the BSD bug in this case.
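A minimal sketch of such a call, for POSIX systems, is shown below. It is not the libcody code: lookup_without_port is a hypothetical helper, the hints are simply zeroed, and AI_NUMERICHOST is an assumption made here so that the example needs no name server.

```cpp
#include <cassert>
#include <cstring>
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper: resolve a numeric host name without passing any
// service/port string.  Returns the getaddrinfo() status (0 on success).
int lookup_without_port(const char* host) {
    addrinfo hints;
    std::memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST;   // assumption: avoid DNS traffic

    addrinfo* result = nullptr;
    const int status = getaddrinfo(host, /*servname=*/nullptr, &hints, &result);
    if (status == 0)
        freeaddrinfo(result);
    return status;
}

inline void demo_lookup() {
    // A host name alone is enough; no dummy port is needed.
    assert(lookup_without_port("127.0.0.1") == 0);
    // With neither a host nor a service the call is rejected.
    assert(lookup_without_port(nullptr) == EAI_NONAME);
}
```

The second check reflects the requirement stated above that at least one of the two arguments must be non-null.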
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
libcody/ChangeLog:
* netclient.cc (OpenInet6): Do not provide a dummy port number
in the getaddrinfo() call.
|
|
The test-case included in this patch contains:
...
#pragma omp taskloop simd shared(a) lastprivate(myId)
...
This is translated to 3 taskloop statements in gimple, visible with
-fdump-tree-gimple:
...
#pragma omp taskloop private(D.2124)
#pragma omp taskloop shared(a) shared(myId) private(i.0) firstprivate(a_h)
#pragma omp taskloop lastprivate(myId)
...
But when exposing the gimple statement locations using
-fdump-tree-gimple-lineno, we find that only the first one has location
information.
Fix this by adding the missing location information.
Tested gomp.exp on x86_64.
Tested libgomp testsuite on x86_64 with nvptx accelerator.
gcc/ChangeLog:
2022-03-18 Tom de Vries <tdevries@suse.de>
* gimplify.cc (gimplify_omp_for): Set taskloop location.
gcc/testsuite/ChangeLog:
2022-03-18 Tom de Vries <tdevries@suse.de>
* c-c++-common/gomp/pr104968.c: New test.
|
|
Consider test-case pr104952-1.c, included in this commit, containing:
...
#pragma omp target map(tofrom:result) map(to:arr)
#pragma omp simd reduction(||: result)
...
When run on x86_64 with nvptx accelerator, the test-case either aborts or
hangs.
The reduction clause is translated by the SIMT code (active for nvptx) as a
butterfly reduction loop with this butterfly shuffle / update pair:
...
D.2163 = D.2163 || .GOMP_SIMT_XCHG_BFLY (D.2163, D.2164)
...
in the loop body.
The problem is that the butterfly shuffle is possibly not executed, while it
needs to be executed unconditionally.
Fix this by translating instead as:
...
D.tmp_bfly = .GOMP_SIMT_XCHG_BFLY (D.2163, D.2164)
D.2163 = D.2163 || D.tmp_bfly
...
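The difference between the two forms can be modelled on the host. The sketch below is a toy, not the omp-low.cc lowering: it assumes the shuffle was skipped because || short-circuits, uses a hypothetical 8-lane "warp", and butterfly_exchange and lane_exchange are stand-ins for .GOMP_SIMT_XCHG_BFLY.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kLanes = 8;      // assumption: toy warp of 8 lanes
using Lanes = std::array<bool, kLanes>;

int lane_exchanges = 0;                // exchanges performed by a single lane

// Per-lane stand-in for the shuffle: counts the call, returns the partner value.
bool lane_exchange(bool partner) { ++lane_exchanges; return partner; }

// Original shape: the exchange is skipped when 'own' is already true.
bool fused(bool own, bool partner) { return own || lane_exchange(partner); }

// Fixed shape: exchange first, then combine.
bool split(bool own, bool partner) {
    const bool tmp = lane_exchange(partner);
    return own || tmp;
}

// Whole-warp stand-in: lane i reads the value held by lane (i ^ mask).
Lanes butterfly_exchange(const Lanes& v, std::size_t mask) {
    Lanes out{};
    for (std::size_t i = 0; i < kLanes; ++i)
        out[i] = v[i ^ mask];
    return out;
}

// Butterfly reduction with the exchange executed unconditionally each round.
bool reduce_or(Lanes v) {
    for (std::size_t mask = kLanes / 2; mask > 0; mask /= 2) {
        const Lanes partner = butterfly_exchange(v, mask);
        for (std::size_t i = 0; i < kLanes; ++i)
            v[i] = v[i] || partner[i];
    }
    return v[0];
}
```

In the fused form a lane whose value is already true never reaches the exchange, while the split form always performs it, matching the D.tmp_bfly translation above.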
Tested on x86_64-linux with nvptx accelerator.
gcc/ChangeLog:
2022-03-17 Tom de Vries <tdevries@suse.de>
PR target/104952
* omp-low.cc (lower_rec_input_clauses): Make sure GOMP_SIMT_XCHG_BFLY
is executed unconditionally.
libgomp/ChangeLog:
2022-03-17 Tom de Vries <tdevries@suse.de>
PR target/104952
* testsuite/libgomp.c/pr104952-1.c: New test.
* testsuite/libgomp.c/pr104952-2.c: New test.
|
|
gcc/fortran/ChangeLog:
PR fortran/103039
* openmp.cc (resolve_omp_clauses): Improve associate-name diagnostic
for select type/rank.
gcc/testsuite/ChangeLog:
PR fortran/103039
* gfortran.dg/gomp/associate1.f90: Update dg-error.
* gfortran.dg/gomp/associate2.f90: New test.
|
|
Set attr from HImode to HFmode which uses vmovsh instead of vmovw for
movement between sse registers.
gcc/ChangeLog:
PR target/104974
* config/i386/i386.md (*movhi_internal): Set attr type from HI
to HF for alternative 12 under TARGET_AVX512FP16.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr104974.c: New test.
|
|
This avoids including the whole of <functional> in <algorithm>, as the
<pstl/glue_algorithm_defs.h> header only actually needs std::pair.
This also avoids including <iterator> in <pstl/utils.h>, which only
needs <type_traits>, std::bad_alloc, and std::terminate (which can be
replaced with std::__terminate). This matters less, because
<pstl/utils.h> is only included by the <pstl/*_impl.h> headers and they
all use <iterator> anyway, and are only included by <execution>.
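The testsuite changes follow from this: code that relied on <algorithm> dragging in <functional> now has to include it itself. A minimal sketch of the pattern (sorted_descending is a hypothetical helper, not one of the adjusted tests):

```cpp
#include <algorithm>
#include <cassert>
#include <functional>   // std::greater: include it explicitly
#include <vector>

// Sort a copy of the input in descending order.
std::vector<int> sorted_descending(std::vector<int> values) {
    std::sort(values.begin(), values.end(), std::greater<int>{});
    return values;
}

inline void demo_sorted_descending() {
    const std::vector<int> expected{3, 2, 1};
    assert(sorted_descending({3, 1, 2}) == expected);
}
```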
libstdc++-v3/ChangeLog:
PR libstdc++/92546
* include/pstl/glue_algorithm_defs.h: Replace <functional> with
<bits/stl_pair.h>.
* include/pstl/utils.h: Replace <iterator> with <type_traits>.
(__pstl::__internal::__except_handler): Use std::__terminate
instead of std::terminate.
* src/c++17/fs_path.cc: Include <array>.
* testsuite/25_algorithms/adjacent_find/constexpr.cc: Include
<functional>.
* testsuite/25_algorithms/binary_search/constexpr.cc: Likewise.
* testsuite/25_algorithms/clamp/constrained.cc: Likewise.
* testsuite/25_algorithms/equal/constrained.cc: Likewise.
* testsuite/25_algorithms/for_each/constrained.cc: Likewise.
* testsuite/25_algorithms/includes/constrained.cc: Likewise.
* testsuite/25_algorithms/is_heap/constexpr.cc: Likewise.
* testsuite/25_algorithms/is_heap_until/constexpr.cc: Likewise.
* testsuite/25_algorithms/is_permutation/constrained.cc: Include
<iterator>.
* testsuite/25_algorithms/is_sorted/constexpr.cc: Include
<functional>.
* testsuite/25_algorithms/is_sorted_until/constexpr.cc:
Likewise.
* testsuite/25_algorithms/lexicographical_compare/constexpr.cc:
Likewise.
* testsuite/25_algorithms/lexicographical_compare/constrained.cc:
Likewise.
* testsuite/25_algorithms/lexicographical_compare_three_way/1.cc:
Include <array>.
* testsuite/25_algorithms/lower_bound/constexpr.cc: Include
<functional>.
* testsuite/25_algorithms/max/constrained.cc: Likewise.
* testsuite/25_algorithms/max_element/constrained.cc: Likewise.
* testsuite/25_algorithms/min/constrained.cc: Likewise.
* testsuite/25_algorithms/min_element/constrained.cc: Likewise.
* testsuite/25_algorithms/minmax_element/constrained.cc:
Likewise.
* testsuite/25_algorithms/mismatch/constexpr.cc: Likewise.
* testsuite/25_algorithms/move/93872.cc: Likewise.
* testsuite/25_algorithms/move_backward/93872.cc: Include
<iterator>.
* testsuite/25_algorithms/nth_element/constexpr.cc: Include
<functional>.
* testsuite/25_algorithms/partial_sort/constexpr.cc: Likewise.
* testsuite/25_algorithms/partial_sort_copy/constexpr.cc:
Likewise.
* testsuite/25_algorithms/search/constexpr.cc: Likewise.
* testsuite/25_algorithms/search_n/constrained.cc: Likewise.
* testsuite/25_algorithms/set_difference/constexpr.cc: Likewise.
* testsuite/25_algorithms/set_difference/constrained.cc:
Likewise.
* testsuite/25_algorithms/set_intersection/constexpr.cc:
Likewise.
* testsuite/25_algorithms/set_intersection/constrained.cc:
Likewise.
* testsuite/25_algorithms/set_symmetric_difference/constexpr.cc:
Likewise.
* testsuite/25_algorithms/set_union/constexpr.cc: Likewise.
* testsuite/25_algorithms/set_union/constrained.cc: Likewise.
* testsuite/25_algorithms/sort/constexpr.cc: Likewise.
* testsuite/25_algorithms/sort_heap/constexpr.cc: Likewise.
* testsuite/25_algorithms/transform/constrained.cc: Likewise.
* testsuite/25_algorithms/unique/constexpr.cc: Likewise.
* testsuite/25_algorithms/unique/constrained.cc: Likewise.
* testsuite/25_algorithms/unique_copy/constexpr.cc: Likewise.
* testsuite/25_algorithms/upper_bound/constexpr.cc: Likewise.
* testsuite/std/ranges/adaptors/elements.cc: Include <vector>.
* testsuite/std/ranges/adaptors/lazy_split.cc: Likewise.
* testsuite/std/ranges/adaptors/split.cc: Likewise.
|
|
On Thu, Nov 11, 2021 at 02:14:05PM +0100, Thomas Schwinge wrote:
> There appears to be yet another issue: there still are quite a number of
> 'FAIL: libgomp.c/places-10.c execution test' reports on
> <gcc-testresults@gcc.gnu.org>. Also in my testing, on a system
> where '/sys/devices/system/node/online' contains '0-1', I get a FAIL:
>
> [...]
> OPENMP DISPLAY ENVIRONMENT BEGIN
> _OPENMP = '201511'
> OMP_DYNAMIC = 'FALSE'
> OMP_NESTED = 'FALSE'
> OMP_NUM_THREADS = '8'
> OMP_SCHEDULE = 'DYNAMIC'
> OMP_PROC_BIND = 'TRUE'
> OMP_PLACES = '{0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30},{FAIL: libgomp.c/places-10.c execution test
I've finally managed to debug this (by dumping used /sys/ files from
an affected system in Fedora build system, replacing /sys/ with /tmp/
in gcc sources and populating there those files), I think following patch
ought to fix it.
2022-03-18 Jakub Jelinek <jakub@redhat.com>
* config/linux/affinity.c (gomp_affinity_init_numa_domains): Move seen
variable next to pl variable.
|
|
march=sapphirerapids should be based on icelake server not cooperlake.
gcc/ChangeLog:
PR target/104963
* config/i386/i386.h (PTA_SAPPHIRERAPIDS): change it to base on ICX.
* doc/invoke.texi: Update documents for Intel sapphirerapids.
gcc/testsuite/ChangeLog:
PR target/104963
* gcc.target/i386/pr104963.c: New test case.
|
|
|
|
gcc/analyzer/ChangeLog:
* state-purge.cc (state_purge_annotator::add_node_annotations):
Avoid duplicate before-supernode annotations when returning from
an interprocedural call. Show after-supernode annotations.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
gcc/analyzer/ChangeLog:
* program-point.cc (program_point::get_next): Fix missing
increment of index.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
Implementations of the x87 floating point instruction set have always
had some pretty strange characteristics. For example on the original
Intel Pentium the FLDPI instruction (to load 3.14159... into a register)
took 5 cycles, and the FLDZ instruction (to load 0.0) took 2 cycles,
when a regular FLD (load from memory) took just 1 cycle!? Given that
back then memory latencies were much lower (relatively) than they are
today, these instructions were all but useless except when optimizing
for size (impressively FLDZ/FLDPI require only two bytes).
Such was the world back in 2006 when Uros Bizjak first added support for
fldz https://gcc.gnu.org/pipermail/gcc-patches/2006-November/202589.html
and then shortly after sensibly disabled them for !optimize_size with
https://gcc.gnu.org/pipermail/gcc-patches/2006-November/204405.html
Alas this vestigial logic still persists in the compiler today,
so for example on x86_64 for the following function:
double foo(double x) { return x + 0.0; }
GCC generates with -O2
foo: addsd .LC0(%rip), %xmm0
ret
.LC0: .long 0
.long 0
preferring to read the constant 0.0 from memory [the constant pool],
except when optimizing for size. With -Os we get:
foo: xorps %xmm1, %xmm1
addsd %xmm1, %xmm0
ret
Which is not only smaller (the two instructions require seven bytes vs.
eight for the original addsd from mem, even without considering the
constant pool) but is also faster on modern hardware. The latter code
sequence is generated by both clang and msvc with -O2. Indeed Agner
Fog documents the set of floating point/SSE constants that it's
cheaper to materialize than to load from memory.
This patch shuffles the conditions on the i386 backend's *movtf_internal,
*movdf_internal and *movsf_internal define_insns to untangle the newer
TARGET_SSE_MATH clauses from the historical standard_80387_constant_p
conditions. Amongst the benefits of this are that it improves the code
generated for PR tree-optimization/90356 and resolves PR target/86722.
2022-03-17 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR target/86722
PR tree-optimization/90356
* config/i386/i386.md (*movtf_internal): Don't guard
standard_sse_constant_p clause by optimize_function_for_size_p.
(*movdf_internal): Likewise.
(*movsf_internal): Likewise.
gcc/testsuite/ChangeLog
PR target/86722
PR tree-optimization/90356
* gcc.target/i386/pr86722.c: New test case.
* gcc.target/i386/pr90356.c: New test case.
|
|
This patch adjusts range_from_dom to follow the dominator tree through the
cache until a value is found, then apply any outgoing ranges encountered
along the way. This reduces the amount of cache storage required.
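The idea can be sketched with a toy model. Nothing below is GCC code: Range, Block, intersect, demo_blocks and this range_from_dom are hypothetical simplifications (closed integer intervals, one optional edge refinement per block) standing in for the ranger's real data structures.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

struct Range { int lo; int hi; };      // closed interval [lo, hi]

Range intersect(Range a, Range b) {
    return { std::max(a.lo, b.lo), std::min(a.hi, b.hi) };
}

struct Block {
    int idom;                            // immediate dominator, -1 for the entry block
    std::optional<Range> cached;         // range already stored for this block
    std::optional<Range> edge_from_idom; // refinement implied by the incoming edge
};

// Walk up the dominator tree until a cached range is found, remembering the
// edge refinements passed on the way, then apply them to that range.
Range range_from_dom(const std::vector<Block>& blocks, int bb, Range varying) {
    std::vector<Range> pending;
    Range result = varying;              // used when nothing is cached at all
    for (int cur = bb; cur != -1; cur = blocks[cur].idom) {
        if (blocks[cur].cached) {
            result = *blocks[cur].cached;
            break;
        }
        if (blocks[cur].edge_from_idom)
            pending.push_back(*blocks[cur].edge_from_idom);
    }
    for (const Range& r : pending)
        result = intersect(result, r);
    return result;
}

// Three blocks in a chain; only the entry block has a cached range.
std::vector<Block> demo_blocks() {
    return {
        { -1, Range{0, 100}, std::nullopt },
        {  0, std::nullopt,  Range{10, 200} },
        {  1, std::nullopt,  Range{-5, 50} },
    };
}
```

In this model only block 0 needs a cache entry; the range for block 2 is recomputed on demand as [0, 100] narrowed by the two edge refinements, which is the storage saving the patch describes.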
PR tree-optimization/102943
* gimple-range-cache.cc (ranger_cache::range_from_dom): Find range via
dominators and apply intermediary outgoing edge ranges.
|
|
This only affects Windows, but reduces the preprocessed size of
<filesystem> significantly.
libstdc++-v3/ChangeLog:
PR libstdc++/92546
* include/bits/fs_path.h (path::make_preferred): Use
handwritten loop instead of std::replace.
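A handwritten replacement loop of this kind needs nothing beyond <string>. The sketch below is a free-function analogue written for illustration; make_preferred_copy and its default separators are assumptions, not the member function in fs_path.h.

```cpp
#include <cassert>
#include <string>

// Hypothetical analogue: replace every generic separator with the preferred
// one using a plain loop, so no <algorithm> machinery is required.
std::wstring make_preferred_copy(std::wstring text,
                                 wchar_t generic = L'/',
                                 wchar_t preferred = L'\\') {
    for (wchar_t& ch : text)
        if (ch == generic)
            ch = preferred;
    return text;
}

inline void demo_make_preferred() {
    assert(make_preferred_copy(L"a/b/c") == L"a\\b\\c");
}
```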
|
|
GCC thinks the following can lead to a buffer overflow when __ns.size()
equals zero:
const basic_string<_CharT>& __ns = __mp.negative_sign();
_M_negative_sign_size = __ns.size();
__negative_sign = new _CharT[_M_negative_sign_size];
__ns.copy(__negative_sign, _M_negative_sign_size);
This happens because operator new might be replaced with something that
writes to this->_M_negative_sign_size and so the basic_string::copy call
could use a non-zero size to write to a zero-length buffer.
The solution suggested by Richi is to cache the size in a local variable
so that the compiler knows it won't be changed between the allocation
and the copy.
This commit goes further and rewrites the whole function to use RAII and
delay all modifications of *this until after all allocations have
succeeded. The RAII helper type caches the size and copies the string
and owns the memory until told to release it.
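The shape of such a helper can be sketched as follows. ScopedCopy and SignCache are hypothetical names and this is not the code in locale_facets_nonio.tcc; the sketch only shows the size being cached locally and *this being modified after all allocations have succeeded.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical RAII helper: caches the size, copies the characters and owns
// the buffer until release() is called.
template<typename CharT>
class ScopedCopy {
public:
    explicit ScopedCopy(const std::basic_string<CharT>& s)
    : size_(s.size()), data_(new CharT[size_])
    { s.copy(data_, size_); }          // uses the cached size

    ~ScopedCopy() { delete[] data_; }
    ScopedCopy(const ScopedCopy&) = delete;
    ScopedCopy& operator=(const ScopedCopy&) = delete;

    std::size_t size() const { return size_; }
    CharT* release() { CharT* p = data_; data_ = nullptr; return p; }

private:
    std::size_t size_;                 // declared first, so initialised before data_
    CharT* data_;
};

// Hypothetical cache filled once from two strings.
struct SignCache {
    const char* negative = nullptr;
    std::size_t negative_size = 0;
    const char* positive = nullptr;
    std::size_t positive_size = 0;

    SignCache() = default;
    SignCache(const SignCache&) = delete;
    SignCache& operator=(const SignCache&) = delete;
    ~SignCache() { delete[] negative; delete[] positive; }

    void fill(const std::string& neg, const std::string& pos) {
        ScopedCopy<char> n(neg);
        ScopedCopy<char> p(pos);       // if this throws, n frees its buffer
        // All allocations succeeded: only now store into *this.
        negative_size = n.size();
        negative = n.release();
        positive_size = p.size();
        positive = p.release();
    }
};

inline void demo_sign_cache() {
    SignCache cache;
    cache.fill("-", "");
    assert(cache.negative_size == 1 && cache.negative[0] == '-');
    assert(cache.positive_size == 0);
}
```

Because the size lives in the helper rather than in the cache object, a replaced operator new cannot change it between the allocation and the copy.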
libstdc++-v3/ChangeLog:
PR middle-end/104966
* include/bits/locale_facets_nonio.tcc
(__moneypunct_cache::_M_cache): Replace try-catch with RAII and
make all string copies before any stores to *this.
|
|
As mentioned in the PR, the latest Intel SDM has added:
"Processors that enumerate support for Intel® AVX (by setting the feature flag CPUID.01H:ECX.AVX[bit 28])
guarantee that the 16-byte memory operations performed by the following instructions will always be
carried out atomically:
• MOVAPD, MOVAPS, and MOVDQA.
• VMOVAPD, VMOVAPS, and VMOVDQA when encoded with VEX.128.
• VMOVAPD, VMOVAPS, VMOVDQA32, and VMOVDQA64 when encoded with EVEX.128 and k0 (masking disabled).
(Note that these instructions require the linear addresses of their memory operands to be 16-byte
aligned.)"
The following patch deals with it just on the libatomic library side so far,
currently (since ~ 2017) we emit all the __atomic_* 16-byte builtins as
library calls, and this is something that we can hopefully backport.
The patch simply introduces yet another ifunc variant that takes priority
over the pure CMPXCHG16B one, one that checks AVX and CMPXCHG16B bits and
on non-Intel clears the AVX bit during detection for now (if AMD comes
with the same guarantee, we could revert the config/x86/init.c hunk),
which implements 16-byte atomic load as vmovdqa and 16-byte atomic store
as vmovdqa followed by mfence.
2022-03-17 Jakub Jelinek <jakub@redhat.com>
PR target/104688
* Makefile.am (IFUNC_OPTIONS): Change on x86_64 to -mcx16 -mcx16.
(libatomic_la_LIBADD): Add $(addsuffix _16_2_.lo,$(SIZEOBJS)) for
x86_64.
* Makefile.in: Regenerated.
* config/x86/host-config.h (IFUNC_COND_1): For x86_64 define to
both AVX and CMPXCHG16B bits.
(IFUNC_COND_2): Define.
(IFUNC_NCOND): For x86_64 define to 2 * (N == 16).
(MAYBE_HAVE_ATOMIC_CAS_16, MAYBE_HAVE_ATOMIC_EXCHANGE_16,
MAYBE_HAVE_ATOMIC_LDST_16): Define to IFUNC_COND_2 rather than
IFUNC_COND_1.
(HAVE_ATOMIC_CAS_16): Redefine to 1 whenever IFUNC_ALT != 0.
(HAVE_ATOMIC_LDST_16): Redefine to 1 whenever IFUNC_ALT == 1.
(atomic_compare_exchange_n): Define whenever IFUNC_ALT != 0
on x86_64 for N == 16.
(__atomic_load_n, __atomic_store_n): Redefine whenever IFUNC_ALT == 1
on x86_64 for N == 16.
(atomic_load_n, atomic_store_n): New functions.
* config/x86/init.c (__libat_feat1_init): On x86_64 clear bit_AVX
if CPU vendor is not Intel.
|
|
libstdc++-v3/ChangeLog:
* testsuite/util/testsuite_character.h: Fix comment.
|
|
Something went wrong when testing the earlier patch to move the
late sinking to before the late phiopt for PR102008. The following
makes sure to unsplit edges after the late sinking since the split
edges confuse the following phiopt leading to missed optimizations.
I've gone for a new pass parameter for this to avoid changing the
CFG after the early sinking pass at this point.
2022-03-17 Richard Biener <rguenther@suse.de>
PR tree-optimization/104960
* passes.def: Add pass parameter to pass_sink_code, mark
last one to unsplit edges.
* tree-ssa-sink.cc (pass_sink_code::set_pass_param): New.
(pass_sink_code::execute): Always execute TODO_cleanup_cfg
when we need to unsplit edges.
* gcc.dg/gimplefe-37.c: Adjust to allow either the true
or false edge to have a forwarder.
|
|
As mentioned in the PR, we emit a bogus uninitialized warning but
easily could emit wrong-code for it or similar testcases too.
The bug is that we emit clobber for a TARGET_EXPR_SLOT too early:
D.2499.e = B::qux (&h); [return slot optimization]
D.2516 = 1;
try
{
B::B (&D.2498, &h);
try
{
_2 = baz (&D.2498);
D.2499.f = _2;
D.2516 = 0;
try
{
try
{
bar (&D.2499);
}
finally
{
C::~C (&D.2499);
}
}
finally
{
D.2499 = {CLOBBER(eol)};
}
}
finally
{
D.2498 = {CLOBBER(eol)};
}
}
catch
{
if (D.2516 != 0) goto <D.2517>; else goto <D.2518>;
<D.2517>:
A::~A (&D.2499.e);
goto <D.2519>;
<D.2518>:
<D.2519>:
}
The CLOBBER for D.2499 is essentially only emitted on the non-exceptional
path, if B::B or baz throws, then there is no CLOBBER for it but there
is a conditional destructor A::~A (&D.2499.e). Now, ehcleanup1
sink_clobbers optimization assumes that clobbers in the EH cases are
emitted after last use and so sinks the D.2499 = {CLOBBER(eol)}; later,
so we then have
# _3 = PHI <1(3), 0(9)>
<L2>:
D.2499 ={v} {CLOBBER(eol)};
D.2498 ={v} {CLOBBER(eol)};
if (_3 != 0)
goto <bb 11>; [INV]
else
goto <bb 15>; [INV]
<bb 11> :
_35 = D.2499.a;
if (&D.2499.b != _35)
where that _35 = D.2499.a comes from inline expansion of the A::~A dtor,
and that is a load from a clobbered memory.
Now, what the gimplifier sees in this case is a CLEANUP_POINT_EXPR with
somewhere inside of it a TARGET_EXPR for D.2499 (with the C::~C (&D.2499)
cleanup) which in its TARGET_EXPR_INITIAL has another TARGET_EXPR for
D.2516 bool flag which has CLEANUP_EH_ONLY which performs that conditional
A::~A (&D.2499.e) call.
The following patch ensures that CLOBBERs (and asan poisoning) are emitted
after even those gimple_push_cleanup pushed cleanups from within the
TARGET_EXPR_INITIAL gimplification (i.e. the last point where the slot could
be in theory used). In my first version of the patch I've done it by just
moving the
/* Add a clobber for the temporary going out of scope, like
gimplify_bind_expr. */
if (gimplify_ctxp->in_cleanup_point_expr
&& needs_to_live_in_memory (temp))
{
...
}
block earlier in gimplify_target_expr, but that regressed a couple of tests
where temp is marked TREE_ADDRESSABLE only during (well, very early during
that) the gimplification of TARGET_EXPR_INITIAL, so we didn't emit e.g. on
pr80032.C or stack2.C tests any clobbers for the slots and thus stack slot
reuse wasn't performed.
So that we don't regress those tests, this patch gimplifies
TARGET_EXPR_INITIAL as before, but doesn't emit it directly into pre_p,
emits it into a temporary sequence. Then emits the CLOBBER cleanup
into pre_p, then asan poisoning if needed, then appends the
TARGET_EXPR_INITIAL temporary sequence and finally adds TARGET_EXPR_CLEANUP
gimple_push_cleanup. The earlier a GIMPLE_WCE appears in the sequence, the
further out its try/finally or try/catch is.
So, with this patch the part of the testcase in gimple dump cited above
looks instead like:
try
{
D.2499.e = B::qux (&h); [return slot optimization]
D.2516 = 1;
try
{
try
{
B::B (&D.2498, &h);
_2 = baz (&D.2498);
D.2499.f = _2;
D.2516 = 0;
try
{
bar (&D.2499);
}
finally
{
C::~C (&D.2499);
}
}
finally
{
D.2498 = {CLOBBER(eol)};
}
}
catch
{
if (D.2516 != 0) goto <D.2517>; else goto <D.2518>;
<D.2517>:
A::~A (&D.2499.e);
goto <D.2519>;
<D.2518>:
<D.2519>:
}
}
finally
{
D.2499 = {CLOBBER(eol)};
}
2022-03-17 Jakub Jelinek <jakub@redhat.com>
PR middle-end/103984
* gimplify.cc (gimplify_target_expr): Gimplify type sizes and
TARGET_EXPR_INITIAL into a temporary sequence, then push clobbers
and asan unpoisoning, then append the temporary sequence and
finally the TARGET_EXPR_CLEANUP clobbers.
* g++.dg/opt/pr103984.C: New test.
|
|
gcc/testsuite/
* c-c++-common/goacc-gomp/nesting-1.c: Enhance.
* c-c++-common/goacc/kernels-loop-g.c: Likewise.
* c-c++-common/goacc/nesting-1.c: Likewise.
* gcc.dg/goacc/nested-function-1.c: Likewise.
* gfortran.dg/goacc/common-block-3.f90: Likewise.
* gfortran.dg/goacc/nested-function-1.f90: Likewise.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c:
Enhance.
* testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c: Likewise.
* testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.
|
|
[PR90115]
As originally introduced in commit 11b8286a83289f5b54e813f14ff56d730c3f3185
"[OpenACC privatization] Largely extend diagnostics and corresponding testsuite
coverage [PR90115]".
PR middle-end/90115
gcc/testsuite/
* c-c++-common/goacc-gomp/nesting-1.c: Enhance.
* gfortran.dg/goacc/common-block-3.f90: Likewise.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Enhance.
* testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.
|
|
|
|
2022-03-16 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/sse.md: Delete corrupt character/typo.
|
|
This is Christophe Lyon's fix to PR c/98198, an ICE-on-invalid-code
regression affecting mainline, and a suitable testcase.
Tested on x86_64-pc-linux-gnu with make bootstrap and make -k check
with no new failures. Ok for mainline?
2022-03-16 Christophe Lyon <christophe.lyon@arm.com>
Roger Sayle <roger@nextmovesoftware.com>
gcc/c-family/ChangeLog
PR c/98198
* c-attribs.cc (decl_or_type_attrs): Add error_mark_node check.
gcc/testsuite/ChangeLog
PR c/98198
* gcc.dg/pr98198.c: New test case.
|
|
This simple i386 patch unblocks a more significant change. The testcase
gcc.target/i386/sse2-pr94680.c isn't quite testing what's intended, and
alas the fix for PR target/94680 doesn't (yet) handle V2DF mode.
For the first test from sse2-pr94680.c, below
v2df foo_v2df (v2df x) {
return __builtin_shuffle (x, (v2df) { 0, 0 }, (v2di) { 0, 2 });
}
GCC on x86_64-pc-linux-gnu with -O2 currently generates:
movhpd .LC0(%rip), %xmm0
ret
.LC0:
.long 0
.long 0
which passes the test as it contains a mov insn and no xor.
Alas reading a zero from the constant pool isn't quite the
desired implementation. With this patch we now generate:
movq %xmm0, %xmm0
ret
This is the same code as we generate for V2DI. The patch also adds a stricter
test case. This implementation generalizes the sse2_movq128
to V2DI and V2DF modes using a VI8F_128 mode iterator and
renames it *sse2_movq128_<mode>. A new define_expand is
introduced for sse2_movq128 so that the existing builtin
interface (CODE_FOR_sse2_movq128) remains the same.
2022-03-16 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR target/94680
* config/i386/sse.md (sse2_movq128): New define_expand to
preserve previous named instruction.
(*sse2_movq128_<mode>): Renamed from sse2_movq128, and
generalized to VI8F_128 (both V2DI and V2DF).
gcc/testsuite/ChangeLog
PR target/94680
* gcc.target/i386/sse2-pr94680-2.c: New stricter V2DF test case.
|
|
The new std::from_chars implementation means that those symbols are now
defined on Solaris 11.3, which lacks uselocale. They were not present in
gcc-11, but the linker script gives them the GLIBCXX_3.4.29 symbol
version because that is the version where they appeared for systems with
uselocale.
This makes the version for those symbols depend on whether uselocale is
available or not, so that they get version GLIBCXX_3.4.30 on targets
where they weren't defined in gcc-11.
In order to avoid needing separate ABI baseline files for Solaris 11.3
and 11.4, the ABI checker program now treats the floating-point
std::from_chars overloads as undesignated if they are not found in the
baseline symbols file. This means they can be left out of the Solaris
baseline without causing the check-abi target to fail.
libstdc++-v3/ChangeLog:
PR libstdc++/103407
* config/abi/pre/gnu.ver: Make version for std::from_chars
depend on HAVE_USELOCALE macro.
* testsuite/util/testsuite_abi.cc (compare_symbols): Treat
std::from_chars for floating-point types as undesignated if
not found in the baseline symbols file.
|
|
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/393377
|
|
Avoid generating execution paths for warnings that are ultimately
rejected due to -Wno-analyzer-* flags.
This improves the test case from taking at least several minutes
(before I killed it) to taking under a second.
This doesn't fix the slowdown seen in PR analyzer/104955 with large
numbers of warnings when the warnings are still enabled.
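The early rejection can be modelled in a few lines. The sketch below is not the analyzer's code: Option, Diagnostic, DoubleFree, FileLeak and DiagnosticManager are hypothetical, simplified stand-ins for the classes named in the ChangeLog that follows.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>
#include <utility>
#include <vector>

// Hypothetical option identifiers standing in for the -Wanalyzer-* flags.
enum class Option { double_free, file_leak };

struct Diagnostic {
    virtual ~Diagnostic() = default;
    // Each diagnostic reports which option controls it.
    virtual Option controlling_option() const = 0;
};

struct DoubleFree : Diagnostic {
    Option controlling_option() const override { return Option::double_free; }
};

struct FileLeak : Diagnostic {
    Option controlling_option() const override { return Option::file_leak; }
};

class DiagnosticManager {
public:
    explicit DiagnosticManager(std::set<Option> disabled)
    : disabled_(std::move(disabled)) { }

    // Reject a diagnostic up front when its option is disabled, so that no
    // further work (such as building a path) is spent on it.
    bool add(std::unique_ptr<Diagnostic> d) {
        if (disabled_.count(d->controlling_option()) != 0) {
            ++num_disabled_;
            return false;
        }
        saved_.push_back(std::move(d));
        return true;
    }

    std::size_t saved() const { return saved_.size(); }
    std::size_t disabled() const { return num_disabled_; }

private:
    std::set<Option> disabled_;
    std::vector<std::unique_ptr<Diagnostic>> saved_;
    std::size_t num_disabled_ = 0;
};

inline void demo_manager() {
    std::set<Option> off{Option::file_leak};
    DiagnosticManager mgr(off);
    assert(mgr.add(std::make_unique<DoubleFree>()));
    assert(!mgr.add(std::make_unique<FileLeak>()));
    assert(mgr.saved() == 1 && mgr.disabled() == 1);
}
```

Asking each diagnostic for its controlling option is what lets the manager decide before saving anything, and the counter mirrors the logged number of disabled diagnostics.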
gcc/analyzer/ChangeLog:
PR analyzer/104955
* diagnostic-manager.cc (get_emission_location): New.
(diagnostic_manager::diagnostic_manager): Initialize
m_num_disabled_diagnostics.
(diagnostic_manager::add_diagnostic): Reject diagnostics that
will eventually be rejected due to being disabled.
(diagnostic_manager::emit_saved_diagnostics): Log the number
of disabled diagnostics.
(diagnostic_manager::emit_saved_diagnostic): Split out logic for
determining emission location to get_emission_location.
* diagnostic-manager.h
(diagnostic_manager::m_num_disabled_diagnostics): New field.
* engine.cc (stale_jmp_buf::get_controlling_option): New.
(stale_jmp_buf::emit): Use it.
* pending-diagnostic.h
(pending_diagnostic::get_controlling_option): New vfunc.
* region-model.cc
(poisoned_value_diagnostic::get_controlling_option): New.
(poisoned_value_diagnostic::emit): Use it.
(shift_count_negative_diagnostic::get_controlling_option): New.
(shift_count_negative_diagnostic::emit): Use it.
(shift_count_overflow_diagnostic::get_controlling_option): New.
(shift_count_overflow_diagnostic::emit): Use it.
(dump_path_diagnostic::get_controlling_option): New.
(dump_path_diagnostic::emit): Use it.
(write_to_const_diagnostic::get_controlling_option): New.
(write_to_const_diagnostic::emit): Use it.
(write_to_string_literal_diagnostic::get_controlling_option): New.
(write_to_string_literal_diagnostic::emit): Use it.
* sm-file.cc (double_fclose::get_controlling_option): New.
(double_fclose::emit): Use it.
(file_leak::get_controlling_option): New.
(file_leak::emit): Use it.
* sm-malloc.cc (mismatching_deallocation::get_controlling_option):
New.
(mismatching_deallocation::emit): Use it.
(double_free::get_controlling_option): New.
(double_free::emit): Use it.
(possible_null_deref::get_controlling_option): New.
(possible_null_deref::emit): Use it.
(possible_null_arg::get_controlling_option): New.
(possible_null_arg::emit): Use it.
(null_deref::get_controlling_option): New.
(null_deref::emit): Use it.
(null_arg::get_controlling_option): New.
(null_arg::emit): Use it.
(use_after_free::get_controlling_option): New.
(use_after_free::emit): Use it.
(malloc_leak::get_controlling_option): New.
(malloc_leak::emit): Use it.
(free_of_non_heap::get_controlling_option): New.
(free_of_non_heap::emit): Use it.
* sm-pattern-test.cc (pattern_match::get_controlling_option): New.
(pattern_match::emit): Use it.
* sm-sensitive.cc
(exposure_through_output_file::get_controlling_option): New.
(exposure_through_output_file::emit): Use it.
* sm-signal.cc (signal_unsafe_call::get_controlling_option): New.
(signal_unsafe_call::emit): Use it.
* sm-taint.cc (tainted_array_index::get_controlling_option): New.
(tainted_array_index::emit): Use it.
(tainted_offset::get_controlling_option): New.
(tainted_offset::emit): Use it.
(tainted_size::get_controlling_option): New.
(tainted_size::emit): Use it.
(tainted_divisor::get_controlling_option): New.
(tainted_divisor::emit): Use it.
(tainted_allocation_size::get_controlling_option): New.
(tainted_allocation_size::emit): Use it.
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/many-disabled-diagnostics.c: New test.
* gcc.dg/plugin/analyzer_gil_plugin.c
(gil_diagnostic::get_controlling_option): New.
(double_save_thread::emit): Use it.
(fncall_without_gil::emit): Likewise.
(pyobject_usage_without_gil::emit): Likewise.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
This adjusts the declarations in <charconv> to match when the definition
is present. This solves the issue that std::from_chars is present on
Solaris 11.3 (using fast_float) but was not declared in the header
(because the declarations were guarded by _GLIBCXX_HAVE_USELOCALE).
Additionally, do not define __cpp_lib_to_chars unless both from_chars
and to_chars are supported (which is only true for IEEE float and
double). We might still provide from_chars (via strtold) but if to_chars
isn't provided, we shouldn't define the feature test macro.
Finally, this simplifies some of the preprocessor checks in the bodies
of std::from_chars in src/c++17/floating_from_chars.cc and hoists the
repeated code for the strtod version into a new function template.
N.B. the long double overload of std::from_chars will always be defined
if the float and double overloads are defined. We can always use one of
strtold or fast_float's binary64 routines (although the latter might
produce errors for some long double values if they are not representable
as binary64).
libstdc++-v3/ChangeLog:
* include/std/charconv (__cpp_lib_to_chars): Only define when
both from_chars and to_chars are supported for floating-point
types.
(from_chars, to_chars): Adjust preprocessor conditions guarding
declarations.
* include/std/version (__cpp_lib_to_chars): Adjust condition to
match <charconv> definition.
* src/c++17/floating_from_chars.cc (from_chars_strtod): New
function template.
(from_chars): Simplify preprocessor checks and use
from_chars_strtod when appropriate.
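A minimal sketch of how user code might rely on the feature test macro described above. The helper name parse_double and the strtod fallback are assumptions made for illustration; they are not part of the patch.

```cpp
#include <charconv>      // may or may not declare floating-point from_chars
#include <cstdlib>       // std::strtod, used as the fallback
#include <string>
#include <system_error>  // std::errc

// Hypothetical helper: parse the whole string as a double, using
// std::from_chars only when the library advertises full floating-point
// <charconv> support through __cpp_lib_to_chars.
inline bool parse_double(const std::string& text, double& out)
{
#if defined(__cpp_lib_to_chars)
  const char* first = text.data();
  const char* last = first + text.size();
  auto res = std::from_chars(first, last, out);
  return res.ec == std::errc() && res.ptr == last;
#else
  // Fallback path: unlike from_chars, strtod depends on the current locale.
  char* end = nullptr;
  out = std::strtod(text.c_str(), &end);
  return end != text.c_str() && *end == '\0';
#endif
}
```

Testing the macro rather than the mere presence of the header keeps the code portable to targets where only one of from_chars and to_chars is available.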
|
|
Assign the result of fold_convert to offset. Also make the useless
conversion check lighter since the two way check is not needed here.
gcc/ChangeLog:
PR tree-optimization/104941
* tree-object-size.cc (size_for_offset): Make useless conversion
check lighter and assign result of fold_convert to OFFSET.
gcc/testsuite/ChangeLog:
PR tree-optimization/104941
* gcc.dg/builtin-dynamic-object-size-0.c (S1, S2): New structs.
(test_alloc_nested_structs, g): New functions.
(main): Call test_alloc_nested_structs.
Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>
|
|
This patch fixes a small bug in the omp_set_num_teams implementation.
libgomp/ChangeLog:
* fortran.c (omp_set_num_teams_8_): Call omp_set_num_teams instead of
omp_set_max_active_levels.
* testsuite/libgomp.fortran/icv-8.f90: New test.
|
|
Push target("general-regs-only") in <x86gprintrin.h> if x87 is enabled.
gcc/
PR target/104890
* config/i386/x86gprintrin.h: Also check _SOFT_FLOAT before
pushing target("general-regs-only").
gcc/testsuite/
PR target/104890
* gcc.target/i386/pr104890.c: New test.
|
|
Previously we only expanded `zk`, `zkn` and `zks`, but version information is
also needed to combine them back.
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc (riscv_ext_version_table):
Add version info for zk, zks and zkn.
|
|
The crypto extension has several shorthand extensions that don't consist of any extra instructions.
Take zk, for example: the extension implies zkn, zkr and zkt.
These three extensions should also combine back into zk to maintain the canonical order in ISA strings.
This patch addresses the above.
If other extensions are in the same situation, they can be added to riscv_combine_info[].
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc
(riscv_combine_info): New.
(riscv_subset_list::handle_combine_ext): Combine back into zk to
maintain the canonical order in isa strings.
(riscv_subset_list::parse): Ditto.
* config/riscv/riscv-subset.h (handle_combine_ext): New.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/predef-17.c: New test.
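The combining step described above can be sketched as follows. This is a simplified model, not the code in riscv-common.cc: the combine_info struct, combine_table and combine_extensions are hypothetical names, and only the zk row comes from the commit message.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a combine table entry: a shorthand extension
// and the extensions it implies.
struct combine_info
{
  std::string shorthand;
  std::vector<std::string> implied;
};

// Only the zk entry is taken from the commit message above.
static const std::vector<combine_info> combine_table = {
  { "zk", { "zkn", "zkr", "zkt" } },
};

// Add each shorthand whose implied extensions are all present in EXTS.
inline void combine_extensions(std::set<std::string>& exts)
{
  for (const combine_info& info : combine_table)
    {
      bool all_present = true;
      for (const std::string& e : info.implied)
        if (exts.count(e) == 0)
          {
            all_present = false;
            break;
          }
      if (all_present)
        exts.insert(info.shorthand);
    }
}
```

A table-driven design means another shorthand in the same situation only needs a new row, which matches the note about extending riscv_combine_info[].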
|
|
The following re-orders the newly added code sinking pass before
the last phiopt pass which performs hoisting of adjacent loads
with the intent to enable if-conversion on those.
I've added the aarch64 specific testcase from the PR.
2022-03-16 Richard Biener <rguenther@suse.de>
PR tree-optimization/102008
* passes.def: Move the added code sinking pass before the
preceding phiopt pass.
* gcc.target/aarch64/pr102008.c: New testcase.
|
|
As a minor followup to r12-7656-gffe9c0a0d3564a, this condenses the
handling of ambiguity and access w.r.t. the value of 'protect' so that
the logic is more clear.
gcc/cp/ChangeLog:
* search.cc (lookup_member): Simplify by handling all values
of protect together in the ambiguous case. Don't modify protect.
|
|
A well-formed call to std::move/forward is equivalent to a cast, but the
former being a function call means the compiler generates debug info,
which persists even after the call gets inlined, for an operation that's
never interesting to debug.
This patch addresses this problem by folding calls to std::move/forward
and other cast-like functions into simple casts as part of the frontend's
general expression folding routine. This behavior is controlled by a
new flag -ffold-simple-inlines, and otherwise by -fno-inline, so that
users can enable this folding with -O0 (which implies -fno-inline).
After this patch with -O2 and a non-checking compiler, debug info size
for some testcases from range-v3 and cmcstl2 decreases by as much as ~10%
and overall compile time and memory usage decreases by ~2%.
PR c++/96780
gcc/ChangeLog:
* doc/invoke.texi (C++ Dialect Options): Document
-ffold-simple-inlines.
gcc/c-family/ChangeLog:
* c.opt: Add -ffold-simple-inlines.
gcc/cp/ChangeLog:
* cp-gimplify.cc (cp_fold) <case CALL_EXPR>: Fold calls to
std::move/forward and other cast-like functions into simple
casts.
gcc/testsuite/ChangeLog:
* g++.dg/opt/pr96780.C: New test.
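To illustrate why such calls can be folded, the sketch below checks that std::move yields the same type and value category as the equivalent static_cast. The function names take_with_move and take_with_cast are made up for this example.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// std::move(x) has the same type and value category as the explicit cast.
static_assert(
  std::is_same<decltype(std::move(std::declval<std::string&>())),
               decltype(static_cast<std::string&&>(
                 std::declval<std::string&>()))>::value,
  "std::move is equivalent to a cast to an rvalue reference");

// Both functions move-construct the result from S; only the spelling of
// the cast differs.
inline std::string take_with_move(std::string& s)
{
  return std::string(std::move(s));
}

inline std::string take_with_cast(std::string& s)
{
  return std::string(static_cast<std::string&&>(s));
}
```

Since the two spellings are interchangeable, replacing the call with the cast loses nothing except the debug info for the call itself.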
|
|
Retain the sizetype alloc_object_size to guarantee the assertion in
size_for_offset and to avoid adding a conversion there. nop conversions
are eliminated at the end anyway in dynamic object size computation.
gcc/ChangeLog:
PR tree-optimization/104942
* tree-object-size.cc (alloc_object_size): Remove STRIP_NOPS.
gcc/testsuite/ChangeLog:
PR tree-optimization/104942
* gcc.dg/builtin-dynamic-object-size-0.c (alloc_func_long,
test_builtin_malloc_long): New functions.
(main): Use it.
Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>
|
|
We unshare all RTL created during expansion, but when
aarch64_load_symref_appropriately is called after expansion like in the
following testcases, we use imm in both HIGH and LO_SUM operands.
If imm is some RTL that shouldn't be shared, like a non-sharable CONST,
we get a checking ICE, at least with --enable-checking=rtl; otherwise we might
just get silently wrong code.
The following patch fixes that by copying it if it can't be shared.
2022-03-16 Jakub Jelinek <jakub@redhat.com>
PR target/104910
* config/aarch64/aarch64.cc (aarch64_load_symref_appropriately): Copy
imm rtx.
* gcc.dg/pr104910.c: New test.
|
|
This patch improves the implementation of single_use as used in code
generated from match.pd for patterns using :s. The current implementation
contains the logic "has_zero_uses (t) || has_single_use (t)" which
performs a loop over the uses to first check if there are zero non-debug
uses [which is rare], then another loop over these uses to check if there
is exactly one non-debug use. This can be better implemented using a
single loop.
This function is currently inlined over 800 times in gimple-match.cc,
whose .o on x86_64-pc-linux-gnu is now up to 30 Mbytes, so speeding up
and shrinking this function should help offset the growth in match.pd
for GCC 12.
I've also done an analysis of the stage3 sizes of gimple-match.o on
x86_64-pc-linux-gnu, which I believe is dominated by debug information:
the .o file is 30MB in stage3, but only 4.8M in stage2. Before my
proposed patch gimple-match.o is 31385160 bytes. The patch as proposed
yesterday (using a single loop in single_use) reduces that to 31105040
bytes, saving 280120 bytes. The suggestion to remove the "inline"
keyword saves only 56 more bytes, but annotating ATTRIBUTE_PURE on a
function prototype was curiously effective, saving 1888 bytes.
before: 31385160
after: 31105040 saved 280120
-inline: 31104984 saved 56
+pure: 31103096 saved 1888
2022-03-16 Roger Sayle <roger@nextmovesoftware.com>
Richard Biener <rguenther@suse.de>
gcc/ChangeLog
* gimple-match-head.cc (single_use): Implement inline using a
single loop.
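The difference between the two approaches can be modelled outside of GCC as shown below. This is only an illustration: the use struct and both function names are hypothetical, and a plain vector replaces GCC's immediate-use iteration.

```cpp
#include <vector>

// Hypothetical model of a use: only the debug/non-debug distinction matters.
struct use
{
  bool is_debug;
};

// Two-loop version, mirroring "has_zero_uses (t) || has_single_use (t)".
inline bool single_use_two_loops(const std::vector<use>& uses)
{
  // First loop: are there zero non-debug uses?
  bool zero = true;
  for (const use& u : uses)
    if (!u.is_debug)
      {
        zero = false;
        break;
      }
  if (zero)
    return true;

  // Second loop: is there exactly one non-debug use?
  int count = 0;
  for (const use& u : uses)
    if (!u.is_debug)
      ++count;
  return count == 1;
}

// Single-loop version: give up as soon as a second non-debug use is seen.
inline bool single_use_one_loop(const std::vector<use>& uses)
{
  bool seen = false;
  for (const use& u : uses)
    {
      if (u.is_debug)
        continue;
      if (seen)
        return false;
      seen = true;
    }
  return true;
}
```

Both versions return the same answer; the single-loop form walks the list at most once and emits less code at each inlined site.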
|
|
Tweak the constant folding of X CMP X when X can't be a NaN.
2022-03-16 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* match.pd (X CMP X -> true): Test tree_expr_maybe_nan_p
instead of HONOR_NANS.
(X LTGT X -> false): Enable if X is not tree_expr_maybe_nan_p, as
this can't trap/signal.
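A small sketch of the underlying floating-point behaviour, assuming IEEE semantics and no -ffast-math. The helper names are invented for illustration, and the integer conversion is just one example of a value that can never be a NaN.

```cpp
#include <cmath>
#include <limits>

// X == X is false only when X is a NaN.
inline bool equals_self(double x)
{
  return x == x;
}

// LTGT (ordered and unequal) applied to X and itself is never true.
inline bool ltgt_self(double x)
{
  return std::islessgreater(x, x);
}

// A value converted from an integer cannot be a NaN, so X == X could be
// folded to true for it even when NaNs are honored in general.
inline double from_int(int i)
{
  return static_cast<double>(i);
}
```

For instance, equals_self(from_int(7)) is true, while equals_self applied to a quiet NaN is false.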
|
|
It's an orthogonal concern why these diagnostics do appear at all for
non-offloaded OpenACC constructs (where they're not relevant at all); PR90115.
Depending on how 'assert' is implemented, it may cause temporaries to be
created, and/or may lower into 'COND_EXPR's, and
'gcc/gimplify.cc:gimplify_cond_expr' uses 'create_tmp_var (type, "iftmp")'.
Fix-up for commit 11b8286a83289f5b54e813f14ff56d730c3f3185
"[OpenACC privatization] Largely extend diagnostics and
corresponding testsuite coverage [PR90115]".
PR testsuite/102841
libgomp/
* testsuite/libgomp.oacc-c-c++-common/host_data-7.c: Adjust.
|
|
__builtin_ia32_blendvpd is defined under sse4.1 and gimple folded
to ((v2di) c) < 0 ? b : a, but vec_cmpv2di requires sse4.2; without it,
the comparison is veclowered to scalar operations and not combined back in RTL.
gcc/ChangeLog:
PR target/104946
* config/i386/i386-builtin.def (BDESC): Add
CODE_FOR_sse4_1_blendvpd for IX86_BUILTIN_BLENDVPD.
* config/i386/i386.cc (ix86_gimple_fold_builtin): Don't fold
__builtin_ia32_blendvpd w/o sse4.2
gcc/testsuite/ChangeLog:
* gcc.target/i386/sse4_1-blendvpd-1.c: New test.
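The folded form ((v2di) c) < 0 ? b : a can be read per lane as "select b when the sign bit of c is set". Below is a scalar sketch of one lane; blendv_lane is a hypothetical name and this is a model of the semantics, not the intrinsic itself.

```cpp
#include <cstdint>
#include <cstring>

// Scalar model of one lane: pick B when the sign bit of C is set, else A.
inline double blendv_lane(double a, double b, double c)
{
  std::int64_t bits;
  std::memcpy(&bits, &c, sizeof bits);  // reinterpret C as a 64-bit integer
  return bits < 0 ? b : a;
}
```

Note that the comparison is on the integer bit pattern, so a mask of -0.0 selects b even though -0.0 < 0.0 is false as a floating-point comparison.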
|
|
ChangeLog:
* MAINTAINERS: Add myself to DCO section.
|