Age | Commit message (Collapse) | Author | Files | Lines |
|
gcc/ada/
* doc/gnat_ugn/gnat_and_program_execution.rst (gnatmem): Document
that it works only with fixed-position executables.
|
|
gcc/ada/
* libgnat/s-parame__vxworks.ads (time_t_bits): Change to
Long_Long_Integer'Size.
|
|
|
|
gcc/testsuite/ChangeLog
* gfortran.dg/c-interop/cf-descriptor-5-c.c: Include alloca.h.
|
|
Patch is adding Cortex-R52+ as 'cortex-r52plus' command line
flag for -mcpu option.
gcc/ChangeLog:
* config/arm/arm-cpus.in: Add Cortex-R52+ CPU.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Regenerate.
* doc/invoke.texi: Update docs.
|
|
is_xible_helper assumes only 0- and 1-argument ctors can be trivial, but
C++20 aggregate paren init means multi-arg ctors can now be trivial too.
This patch relaxes the relevant early exit check accordingly.
PR c++/102535
gcc/cp/ChangeLog:
* method.c (is_xible_helper): Don't exit early for multi-arg
ctors in C++20.
gcc/testsuite/ChangeLog:
* g++.dg/ext/is_trivially_constructible7.C: New test.
|
|
When parsing a variadic type trait intrinsic, we build up the list of
trailing arguments in reverse, but we neglect to reverse the list to
the true order afterwards. This causes us to confuse the meaning of
e.g. __is_xible(x, y, z) vs __is_xible(x, z, y).
Note that this bug doesn't affect the library traits because they pass a
pack expansion as the single trailing argument to __is_xible, which gets
expanded in the correct order by tsubst_tree_list.
gcc/cp/ChangeLog:
* parser.c (cp_parser_trait_expr): Call nreverse on the reversed
list of trailing arguments.
gcc/testsuite/ChangeLog:
* g++.dg/ext/is_constructible6.C: New test.
|
|
We need to explicitly skip over vptr fields when synthesizing a
defaulted comparison operator, because next_initializable_field
doesn't do so for us.
PR c++/95567
gcc/cp/ChangeLog:
* method.c (build_comparison_op): Skip DECL_VIRTUAL_P fields.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/spaceship-virtual1.C: New test.
|
|
This is a minor cleanup to ensure that the various Expression::do_type
methods don't have to worry about the possibility that the Expression
has not been lowered.
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/353140
|
|
gcc/fortran/ChangeLog:
PR fortran/102458
* simplify.c (simplify_size): Resolve expressions used in array
specifications so that SIZE can be simplified.
gcc/testsuite/ChangeLog:
PR fortran/102458
* gfortran.dg/pr102458b.f90: New test.
|
|
gcc/fortran/
* expr.c: The correct reference to Fortran standard is: F2018:10.1.12.
|
|
Convert (sign_extend:WIDE (any_logic:NARROW (memory, immediate)))
to (any_logic:WIDE (sign_extend (memory)), (sign_extend (immediate))).
This eliminates sign extension after logic operation.
2021-09-30 Uroš Bizjak <ubizjak@gmail.com>
gcc/
PR target/89954
* config/i386/i386.md
(sign_extend:WIDE (any_logic:NARROW (memory, immediate)) splitters):
New splitters.
gcc/testsuite/
PR target/89954
* gcc.target/i386/pr89954.c: New test.
|
|
A test for CLASS(*) + assumed rank was missing; adding a test to
unlimited_polymorphic_1.f03 showed an ICE as backend_decl wasn't
set. While gfc_get_symbol_decl would fix it, the code also assumed
that the class(*) was a variable and could not be a subobject of
a derived type.
PR fortran/71703
PR fortran/84007
gcc/fortran/ChangeLog:
* trans-intrinsic.c (gfc_conv_same_type_as): Fix handling
of UNLIMITED_POLY.
* trans.h (gfc_vtpr_hash_get): Renamed prototype to ...
(gfc_vptr_hash_get): ... this to match function name.
gcc/testsuite/ChangeLog:
* gfortran.dg/c-interop/c535b-1.f90: Remove wrong comment.
* gfortran.dg/unlimited_polymorphic_1.f03: Extend.
* gfortran.dg/unlimited_polymorphic_32.f90: New test.
|
|
This is analogous to __gdc_personality, which ignores in-flight
exceptions that we haven't collided with yet.
libphobos/ChangeLog:
* libdruntime/gcc/deh.d (ExceptionHeader.getClassInfo): Move to...
(getClassInfo): ...here as free function. Add lsda parameter.
(scanLSDA): Pass lsda to actionTableLookup.
(actionTableLookup): Add lsda parameter, pass to getClassInfo.
(__gdc_personality): Remove currentCfa variable.
|
|
exception.
By default, D run-time has a top level exception handler to catch
anything that was uncaught by user code. However when the
`rt_trapExceptions' flag is cleared, this handler would not be enabled,
and this termination would occur, aborting the program, but without any
information about the exception.
libphobos/ChangeLog:
* libdruntime/gcc/deh.d (_d_print_throwable): Declare.
(_d_throw): Print stacktrace before terminating program due to
uncaught exception.
|
|
The core.runtime module always overrides the default parameter value for
constructor calls. MaxAlignment is not required because a class can be
created on the stack with the `scope' keyword.
libphobos/ChangeLog:
* libdruntime/core/runtime.d (runModuleUnitTests): Use scope to new
LibBacktrace on the stack.
* libdruntime/gcc/backtrace.d (FIRSTFRAME): Remove.
(LibBacktrace.MaxAlignment): Remove.
(LibBacktrace.this): Remove default initialization of firstFrame.
(UnwindBacktrace.this): Likewise.
|
|
__attribute__((aligned))
For interoperability with C++ EH, the alignment should match, otherwise
D may not be able to intercept exceptions thrown from C++.
libphobos/ChangeLog:
* libdruntime/gcc/unwind/generic.d (__aligned__): Define.
(_Unwind_Exception): Align struct to __aligned__.
|
|
runtime (PR102476)
The default supplied main function as read when compiling with `-fmain'
has extern(D) linkage. However this does not work when mixing this
option together with `-fno-druntime'.
PR d/102476
gcc/testsuite/ChangeLog:
* gdc.dg/pr102476.d: New test.
libphobos/ChangeLog:
* libdruntime/__main.di: Define main function as extern(C) when
compiling without D runtime.
|
|
libgomp/
* testsuite/libgomp.fortran/alloc-7.f90: Add dg-prune-output
for -fintrinsic-modules-path= warning of the C compiler.
* testsuite/libgomp.fortran/alloc-9.f90: Likewise.
* testsuite/libgomp.fortran/alloc-10.f90: Likewise.
|
|
gcc/ChangeLog:
* omp-low.c (omp_runtime_api_call): Add omp_aligned_{,c}alloc and
omp_{c,re}alloc, fix omp_alloc/omp_free.
libgomp/ChangeLog:
* libgomp.texi (OpenMP 5.1): Set implementation status to Y for
omp_aligned_{,c}alloc and omp_{c,re}alloc routines.
* omp_lib.f90.in (omp_aligned_alloc, omp_aligned_calloc, omp_calloc,
omp_realloc): Add.
* omp_lib.h.in (omp_aligned_alloc, omp_aligned_calloc, omp_calloc,
omp_realloc): Add.
* testsuite/libgomp.fortran/alloc-10.f90: New test.
* testsuite/libgomp.fortran/alloc-6.f90: New test.
* testsuite/libgomp.fortran/alloc-7.c: New test.
* testsuite/libgomp.fortran/alloc-7.f90: New test.
* testsuite/libgomp.fortran/alloc-8.f90: New test.
* testsuite/libgomp.fortran/alloc-9.f90: New test.
|
|
PR testsuite/102509
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/attr-complex-method.c: Skip if LTO is
used.
* gcc.c-torture/compile/attr-complex-method-2.c: Likewise.
|
|
gcc/ChangeLog:
* defaults.h (ASM_OUTPUT_ASCII): Do not hide global variable
asm_out_file and stream directly to MYFILE.
|
|
This refines the previous fix further by reverting to the original
code since the API is a bit of a mess. It also fixes the vector type
used to query the misalignment - that was what triggered the original
bogus change.
2021-09-30 Richard Biener <rguenther@suse.de>
* tree-vect-data-refs.c (vect_update_misalignment_for_peel):
Restore and fix condition under which we apply npeel to
the DRs misalignment value.
|
|
I was mistaken in that npeel is -1 for variable peeling - it is 0.
2021-09-30 Richard Biener <rguenther@suse.de>
* tree-vect-data-refs.c (vect_update_misalignment_for_peel):
Fix npeel check for variable amount of peeling.
|
|
libstdc++-v3/ChangeLog:
* include/bits/regex.h (basic_regex::multiline): Fix #if
condition.
|
|
My upcoming improvements to the DOM threader triggered a warning in
this code. It looks like the format string is ".ltrans%u.ltrans", but
we're only writing a max of ".ltrans" + whatever the MAX_INT is here.
Tested on x86-64 Linux.
gcc/ChangeLog:
* lto-wrapper.c (run_gcc): Plug snprintf overflow.
|
|
This patch adds new OpenMP 5.1 allocator entrypoints and in addition to that
fixes an omp_alloc bug which is hard to test for - if the first allocator
fails but has a larger alignment trait and has a fallback allocator, either
the default behavior or a user fallback, then the extra alignment will be used
even in the fallback allocation, rather than just starting with whatever
alignment has been requested (in GOMP_alloc or the minimum one in omp_alloc).
Jonathan's comment on IRC this morning made me realize that I should add
alloc_align attributes to 2 of the prototypes and I still need to add testsuite
coverage for omp_realloc, will do that in a follow-up.
2021-09-30 Jakub Jelinek <jakub@redhat.com>
* omp.h.in (omp_aligned_alloc, omp_calloc, omp_aligned_calloc,
omp_realloc): New prototypes.
(omp_alloc): Move after omp_free prototype, add __malloc__ (omp_free)
attribute.
* allocator.c: Include string.h.
(omp_aligned_alloc): No longer static, add ialias. Add new_alignment
variable and use it instead of alignment so that when retrying the old
alignment is used again. Don't retry if new alignment is the same
as old alignment, unless allocator had pool size.
(omp_alloc, GOMP_alloc, GOMP_free): Use ialias_call.
(omp_aligned_calloc, omp_calloc, omp_realloc): New functions.
* libgomp.map (OMP_5.0.2): Export omp_aligned_alloc, omp_calloc,
omp_aligned_calloc and omp_realloc.
* testsuite/libgomp.c-c++-common/alloc-4.c (main): Add
omp_aligned_alloc, omp_calloc and omp_aligned_calloc tests.
* testsuite/libgomp.c-c++-common/alloc-5.c: New test.
* testsuite/libgomp.c-c++-common/alloc-6.c: New test.
* testsuite/libgomp.c-c++-common/alloc-7.c: New test.
* testsuite/libgomp.c-c++-common/alloc-8.c: New test.
|
|
I'm trying to add one debug() for each dump() to the dumping aids.
Tested on x86-64 Linux.
gcc/ChangeLog:
* gimple-range.cc (gimple_ranger::debug): New.
* gimple-range.h (class gimple_ranger): Add debug.
|
|
Tested on x86-64 Linux.
gcc/ChangeLog:
PR middle-end/102519
* tree-vrp.c (hybrid_threader::~hybrid_threader): Free m_query.
|
|
|
|
Fix the free up of btf_var_ids hash_map in btf_finalize ().
gcc/ChangeLog:
PR debug/102507
* btfout.c (GTY): Add GTY (()) albeit for cosmetic only purpose.
(btf_finalize): Empty the hash_map btf_var_ids.
|
|
ChangeLog:
* MAINTAINERS: Add myself to DCO section.
|
|
I really don't know what to do here. This is a bit of whack-o-mole.
The IL is sufficiently different for various architectures that any
tweak can cause the number of jump threads to vary.
For the pr7745-2.c testcase, we have less threading candidates because 2
of them now cross loop boundaries. Interestingly, this test matches
"Jumps threaded", not threads registered, so the block copier can
drop threads at copying time adding further confusion.
For example, we can register N threads, but the old copier can cancel
N-M threads while updating the CFG for a variety of different reasons
(removed edges, threading through loop exits, etc). This makes the
"Registering jump threads" not to match the total number of threads this
test checks for with "Jumps threaded".
The pr66752-3.c test OTOH, is just a matter of thread4 eliminating the
"if". I had erroneously thought it would always be eliminated by
thread3, but we really don't care where it gets cleaned up. All we know
is that DCE can't depend on the early threaders doing this work, because
it may cross loop boundaries. I've chosen thread4 arbitrarily, but we
could just as easily pick the ".optimized" dump.
Sorry, I'm really at my wits end here. I don't see any clean path
forward, except rewrite these tests as gimple IL. They're close to useless
as they sit.
gcc/testsuite/ChangeLog:
PR testsuite/102501
* gcc.dg/tree-ssa/pr66752-3.c: Adjust.
* gcc.dg/tree-ssa/pr77445-2.c: Adjust.
|
|
There is no need to update the CFG or SSAs if nothing has changed in VRP
threading.
gcc/ChangeLog:
* tree-vrp.c (thread_through_all_blocks): Return bool.
(execute_vrp_threader): Return TODO_* flags.
(pass_data_vrp_threader): Set todo_flags_finish to 0.
|
|
There seems to be a memory consumption issue on 32 bit hosts after the
hybrid threader patchset. I'm having a hard time reproducing, and in
the process I've noticed that the threader is using the TV_TREE_VRP
timer. Having a distinct one could help diagnose this and other
issues going forward.
gcc/ChangeLog:
* timevar.def (TV_TREE_VRP_THREADER): New.
* tree-vrp.c: Use TV_TREE_VRP_THREADER for VRP threader pass.
|
|
gcc/fortran/ChangeLog:
PR fortran/102520
* array.c (expand_constructor): Do not dereference NULL pointer.
gcc/testsuite/ChangeLog:
PR fortran/102520
* gfortran.dg/pr102520.f90: New test.
|
|
The BPF CO-RE support (commit 8bdabb37549f12ce727800a1c8aa182c0b1dd42a)
mistakenly overwrote bpf-*-* extra_headers in config.gcc, causing
bpf-helpers.h to not be installed. The redefinition with coreout.h is
unneeded, so delete it.
gcc/ChangeLog:
* config.gcc (bpf-*-*): Do not overwrite extra_headers.
|
|
gcc/testsuite
* gcc.c-torture/compile/920831-1.c: Fix computed goto types.
* gcc.c-torture/compile/pr27863.c: Likewise.
|
|
Fix type qualifiers for qtbl1 and qtbx1 Neon builtins and remove
casts from the Neon intrinsic function bodies that use these
builtins.
gcc/ChangeLog:
2021-09-23 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/aarch64-builtins.c (TYPES_BINOP_PPU): Define
new type qualifier enum.
(TYPES_TERNOP_SSSU): Likewise.
(TYPES_TERNOP_PPPU): Likewise.
* config/aarch64/aarch64-simd-builtins.def: Define PPU, SSU,
PPPU and SSSU builtin generator macros for qtbl1 and qtbx1
Neon builtins.
* config/aarch64/arm_neon.h (vqtbl1_p8): Use type-qualified
builtin and remove casts.
(vqtbl1_s8): Likewise.
(vqtbl1q_p8): Likewise.
(vqtbl1q_s8): Likewise.
(vqtbx1_s8): Likewise.
(vqtbx1_p8): Likewise.
(vqtbx1q_s8): Likewise.
(vqtbx1q_p8): Likewise.
(vtbl1_p8): Likewise.
(vtbl2_p8): Likewise.
(vtbx2_p8): Likewise.
|
|
This implements LWG 2503, which allows ^ and $ to match line terminator
characters, rather than only matching the beginning and end of the
entire input. The multiline option is only valid for ECMAScript, but
for other grammars we ignore it rather than throwing an exception.
This is related to PR libstdc++/102480, which incorrectly said that
ECMAscript should match the beginning of a line when match_prev_avail
is used. I think that's only supposed to happen when multiline is used.
The new regex_constants::multiline and basic_regex::multiline constants
are not defined for strict -std=c++11 and -std=c++14 modes, but
regex_constants::__multiline is always defined, so that the
implementation can use it internally.
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
* include/bits/regex.h (basic_regex::multiline): Define constant
for C++17.
* include/bits/regex_constants.h (regex_constants::multiline):
Define constant for C++17.
(regex_constants::__multiline): Define duplicate constant for
internal use in C++11 and C++14.
* include/bits/regex_executor.h (_Executor::_M_match_multiline()):
New member function.
(_Executor::_M_is_line_terminator(_CharT)): New member function.
(_Executor::_M_at_begin(), _Executor::_M_at_end()): Use new
member functions to support multiline matches.
* testsuite/28_regex/algorithms/regex_match/multiline.cc: New test.
|
|
The standard says that it is invalid for more than one grammar element
to be set in a value of type regex_constants::syntax_option_type. This
adds a check in the regex compiler andthrows an exception if an invalid
value is used.
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
* include/bits/regex_compiler.h (_Compiler::_S_validate): New
function.
* include/bits/regex_compiler.tcc (_Compiler::_Compiler): Use
_S_validate to check flags.
* include/bits/regex_error.h (_S_grammar): New error code for
internal use.
* testsuite/28_regex/basic_regex/ctors/grammar.cc: New test.
|
|
When the input sequence contains a _CharT(0) character, the strchr call
in _Scanner<_CharT>::_M_scan_normal() will search for '\0' and so return
a pointer to the terminating null at the end of the string. This makes
the scanner think it's found a special character. Because it doesn't
match any of the actual special characters, we fall off the end of the
function (or assert in debug mode).
We should check for a null character explicitly and either treat it as
an ordinary character (for the ECMAScript grammar) or an error (for all
others). I'm not 100% sure that's right, but it seems consistent with
the POSIX RE rules where a '\0' means the end of the regex pattern or
the end of the sequence being matched.
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
PR libstdc++/84110
* include/bits/regex_error.h (regex_constants::_S_null): New
error code for internal use.
* include/bits/regex_scanner.tcc (_Scanner::_M_scan_normal()):
Check for null character.
* testsuite/28_regex/basic_regex/84110.cc: New test.
|
|
Introduce a new _M_compile function which does the common work needed by
all constructors and assignment. Call that directly to avoid multiple
levels of constructor delegation or calls to basic_regex::assign
overloads.
For assignment, there is no need to construct a std::basic_string if we
already have a contiguous sequence of the correct character type, and no
need to construct a temporary basic_regex when assigning from an
existing basic_regex.
Also define the copy and move assignment operators as defaulted, which
does the right thing without constructing a temporary and swapping it.
Copying or moving the shared_ptr member cannot fail, so they can be
noexcept. The assign(const basic_regex&) and assign(basic_regex&&)
member can then be defined in terms of copy or move assignment.
The new _M_compile function takes pointer arguments, so the caller has
to convert arbitrary iterator ranges into a contiguous sequence of
characters. With that simplification, the __compile_nfa helpers are not
needed and can be removed.
This also fixes a bug where construction from a contiguous sequence with
the wrong character type would fail to compile, rather than converting
the elements to the regex character type.
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
* include/bits/regex.h (__detail::__is_contiguous_iter): Move
here from <bits/regex_compiler.h>.
(basic_regex::_M_compile): New function to compile an NFA from
a regular expression string.
(basic_regex::basic_regex): Use _M_compile instead of delegating
to other constructors.
(basic_regex::operator=(const basic_regex&)): Define as
defaulted.
(basic_regex::operator=(initializer_list<C>)): Use _M_compile.
(basic_regex::assign(const basic_regex&)): Use copy assignment.
(basic_regex::assign(basic_regex&&)): Use move assignment.
(basic_regex::assign(const C*, flag_type)): Use _M_compile
instead of constructing a temporary string.
(basic_regex::assign(const C*, size_t, flag_type)): Likewise.
(basic_regex::assign(const basic_string<C,T,A>&, flag_type)):
Use _M_compile instead of constructing a temporary basic_regex.
(basic_regex::assign(InputIter, InputIter, flag_type)): Avoid
constructing a temporary string for contiguous iterators of the
right value type.
* include/bits/regex_compiler.h (__is_contiguous_iter): Move to
<bits/regex.h>.
(__enable_if_contiguous_iter, __disable_if_contiguous_iter)
(__compile_nfa): Remove.
* testsuite/28_regex/basic_regex/assign/exception_safety.cc: New
test.
* testsuite/28_regex/basic_regex/ctors/char/other.cc: New test.
|
|
This fixes the testcase which looks for variants of memcpy after
memset folding which is disturbed when we expand the memcpy inline
earlier which in fact performs the desired optimization but makes
the dump file not match. For the ease of testing the following
adjusts the smaller structure size to be no longer power-of-two
which avoids the inline expansion.
2021-09-29 Richard Biener <rguenther@suse.de>
PR testsuite/102517
* gcc.dg/pr78408-1.c: Make S not power-of-two size.
|
|
The following fixes a regression causing us to no longer peel
negative step loops for alignment. With dr_misalignment now
applying the bias for negative step we have to do the reverse
when adjusting the misalignment for peeled DRs.
2021-09-29 Richard Biener <rguenther@suse.de>
* tree-vect-data-refs.c (vect_dr_misalign_for_aligned_access):
New helper.
(vect_update_misalignment_for_peel): Use it to update
misaligned to the value necessary for an aligned access.
(vect_get_peeling_costs_all_drs): Likewise.
(vect_enhance_data_refs_alignment): Likewise.
* gcc.target/i386/vect-alignment-peeling-1.c: New testcase.
* gcc.target/i386/vect-alignment-peeling-2.c: Likewise.
|
|
Similar to my previous patch for setmem this one does the same for the cpymem expansion.
We count the number of ops emitted and compare it against the alternative of just calling
the library function when optimising for size.
For the code:
void
cpy_127 (char *out, char *in)
{
__builtin_memcpy (out, in, 127);
}
void
cpy_128 (char *out, char *in)
{
__builtin_memcpy (out, in, 128);
}
we now emit a call to memcpy (with an extra MOV-immediate instruction for the size) instead of:
cpy_127(char*, char*):
ldp q0, q1, [x1]
stp q0, q1, [x0]
ldp q0, q1, [x1, 32]
stp q0, q1, [x0, 32]
ldp q0, q1, [x1, 64]
stp q0, q1, [x0, 64]
ldr q0, [x1, 96]
str q0, [x0, 96]
ldr q0, [x1, 111]
str q0, [x0, 111]
ret
cpy_128(char*, char*):
ldp q0, q1, [x1]
stp q0, q1, [x0]
ldp q0, q1, [x1, 32]
stp q0, q1, [x0, 32]
ldp q0, q1, [x1, 64]
stp q0, q1, [x0, 64]
ldp q0, q1, [x1, 96]
stp q0, q1, [x0, 96]
ret
which is a clear code size win. Speed optimisation heuristics remain unchanged.
2021-09-29 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
* config/aarch64/aarch64.c (aarch64_expand_cpymem): Count number of
emitted operations and adjust heuristic for code size.
2021-09-29 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
* gcc.target/aarch64/cpymem-size.c: New test.
|
|
This patch adjusts the setmem expansion in the backend to track the number of ops it generates
for the DUP + STR/STP inline sequences. This way we can compare the size/complexity of the sequence
against alternatives, notably just returning "false" and thus just emitting a call to memset.
The simple heuristic change here is that if we were going to emit more than 4 operations then
we shouldn't bother and just call memset. The number 4 is chosen because in the worst case for memset
we need to emit 4 instructions: 3 to move the arguments into the right registers and 1 for the call.
The speed optimisation decisions are not affected, though I do want to extend these expansions in a later
patch and I'd like to reuse this ops counting logic there. In any case this patch should make sense on its own.
For the code:
void __attribute__((__noinline__))
set127byte (int64_t *src, int c)
{
__builtin_memset (src, c, 127);
}
void __attribute__((__noinline__))
set128byte (int64_t *src, int c)
{
__builtin_memset (src, c, 128);
}
when optimising for size we now get just an immediate move + a call to memset (2 instructions) where before we'd have generated:
set127byte(long*, int):
dup v0.16b, w1
str q0, [x0, 96]
stp q0, q0, [x0]
stp q0, q0, [x0, 32]
stp q0, q0, [x0, 64]
str q0, [x0, 111]
ret
set128byte(long*, int):
dup v0.16b, w1
stp q0, q0, [x0]
stp q0, q0, [x0, 32]
stp q0, q0, [x0, 64]
stp q0, q0, [x0, 96]
ret
which is clearly undesirable for -Os.
I've adjusted the recently-added gcc.target/aarch64/memset-strict-align-1.c testcase to use a bigger struct
and switch to speed optimisation as with this patch we'll just call memset rather than expanding inline.
That is the right decision for size optimisation (the resulting code is indeed shorter).
With -O2 and the new struct size we still try the SIMD expansion and still trigger the path that the testcase is supposed to exercise.
2021-09-27 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
* config/aarch64/aarch64.c (aarch64_expand_setmem): Count number of
emitted operations and adjust heuristic for code size.
2021-09-27 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
* gcc.target/aarch64/memset-corner-cases-2.c: New test.
* gcc.target/aarch64/memset-strict-align-1.c: Adjust.
|
|
scope [PR102504]
The standard has a restriction:
"A list item that appears in a reduction clause of a scope construct must be
shared in the parallel region to which a corresponding scope region binds."
similar to the restriction for worksharing constructs, but we were checking
it only on worksharing constructs and not for scope and ICEd later on during
omp expansion.
2021-09-29 Jakub Jelinek <jakub@redhat.com>
PR middle-end/102504
* gimplify.c (gimplify_scan_omp_clauses): Use omp_check_private even
in OMP_SCOPE clauses, not just on worksharing construct clauses.
* c-c++-common/gomp/scope-4.c: New test.
|
|
For some reason I did not see these failures in my testing.
Sorry about that. Anyways this fixes the testcases by
adding a cast to __INTPTR_TYPE__ and then a cast to void*.
Committed after testing them on x86_64-linux-gnu.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/920826-1.c: Fix computed goto.
* gcc.c-torture/compile/pr27863.c: Likewise.
* gcc.c-torture/compile/pr70190.c: Likewise.
* gcc.dg/torture/pr89135.c: Likewise.
* gcc.dg/torture/pr90071.c: Likewise.
* gcc.dg/vect/bb-slp-pr97709.c: Likewise.
|
|
This avoids inline expansion to preserve the warning by making
the memcpy size a non-power-of-two as suggested by Martin Sebor.
2021-09-29 Richard Biener <rguenther@suse.de>
* gcc.dg/out-of-bounds-1.c: Make memcpied size not power-of-two.
|