Age | Commit message (Collapse) | Author | Files | Lines |
|
Resolves:
PR middle-end/98871 - Cannot silence -Wmaybe-uninitialized at declaration site
PR middle-end/98512 - #pragma GCC diagnostic ignored ineffective in conjunction with alias attribute
gcc/ChangeLog:
* builtins.c (warn_string_no_nul): Remove %G.
(maybe_warn_for_bound): Same.
(warn_for_access): Same.
(check_access): Same.
(check_strncat_sizes): Same.
(expand_builtin_strncat): Same.
(expand_builtin_strncmp): Same.
(expand_builtin): Same.
(expand_builtin_object_size): Same.
(warn_dealloc_offset): Same.
(maybe_emit_free_warning): Same.
* calls.c (maybe_warn_alloc_args_overflow): Same.
(maybe_warn_nonstring_arg): Same.
(maybe_warn_rdwr_sizes): Same.
* expr.c (expand_expr_real_1): Remove %K.
* gimple-fold.c (gimple_fold_builtin_strncpy): Remove %G.
(gimple_fold_builtin_strncat): Same.
* gimple-ssa-sprintf.c (format_directive): Same.
(handle_printf_call): Same.
* gimple-ssa-warn-alloca.c (pass_walloca::execute): Same.
* gimple-ssa-warn-restrict.c (maybe_diag_overlap): Same.
(maybe_diag_access_bounds): Same. Call gimple_location.
(check_bounds_or_overlap): Same.
* trans-mem.c (ipa_tm_scan_irr_block): Remove %K. Simplify.
* tree-ssa-ccp.c (pass_post_ipa_warn::execute): Remove %G.
* tree-ssa-strlen.c (maybe_warn_overflow): Same.
(maybe_diag_stxncpy_trunc): Same.
(handle_builtin_stxncpy_strncat): Same.
(maybe_warn_pointless_strcmp): Same.
* tree-ssa-uninit.c (maybe_warn_operand): Same.
gcc/testsuite/ChangeLog:
* gcc.dg/Wobjsize-1.c: Prune expected output.
* gcc.dg/Warray-bounds-71.c: New test.
* gcc.dg/Warray-bounds-71.h: New test header.
* gcc.dg/Warray-bounds-72.c: New test.
* gcc.dg/Warray-bounds-73.c: New test.
* gcc.dg/Warray-bounds-74.c: New test.
* gcc.dg/Warray-bounds-75.c: New test.
* gcc.dg/Wfree-nonheap-object-4.c: Adjust expected output.
* gcc.dg/Wfree-nonheap-object-5.c: New test.
* gcc.dg/Wfree-nonheap-object-6.c: New test.
* gcc.dg/pragma-diag-10.c: New test.
* gcc.dg/pragma-diag-9.c: New test.
* gcc.dg/uninit-suppress_3.c: New test.
* gcc.dg/pr79214.c: Xfail tests.
* gcc.dg/tree-ssa/builtin-sprintf-warn-27.c: New test.
* gcc.dg/format/c90-printf-1.c: Adjust expected output.
|
|
Since vec_duplicate only works on scalar, don't use it on vector in
store constructor expansion.
gcc/
PR middle-end/101294
* expr.c (store_constructor): Don't use vec_duplicate on vector.
gcc/testsuite/
PR middle-end/101294
* gcc.dg/pr101294.c: New test.
|
|
1. Replace PUSH_ARGS with a target calls hook, TARGET_PUSH_ARGUMENT, which
takes an integer argument. When it returns true, push instructions will
be used to pass outgoing arguments. If the argument is nonzero, it is
the number of bytes to push and indicates the PUSH instruction usage is
optional so that the backend can decide if PUSH instructions should be
generated. Otherwise, the argument is zero.
2. Implement x86 target hook which returns false when the number of bytes
to push is no less than 16 (8 for 32-bit targets) if vector load and store
can be used.
3. Remove target PUSH_ARGS definitions which return 0 as it is the same
as the default.
4. Define TARGET_PUSH_ARGUMENT of cr16 and m32c to always return true.
gcc/
PR target/100704
* calls.c (expand_call): Replace PUSH_ARGS with
targetm.calls.push_argument (0).
(emit_library_call_value_1): Likewise.
* defaults.h (PUSH_ARGS): Removed.
(PUSH_ARGS_REVERSED): Replace PUSH_ARGS with
targetm.calls.push_argument (0).
* expr.c (block_move_libcall_safe_for_call_parm): Likewise.
(emit_push_insn): Pass the number bytes to push to
targetm.calls.push_argument and pass 0 if ARGS_ADDR is 0.
* hooks.c (hook_bool_uint_true): New.
* hooks.h (hook_bool_uint_true): Likewise.
* rtlanal.c (nonzero_bits1): Replace PUSH_ARGS with
targetm.calls.push_argument (0).
* target.def (push_argument): Add a targetm.calls hook.
* targhooks.c (default_push_argument): New.
* targhooks.h (default_push_argument): Likewise.
* config/bpf/bpf.h (PUSH_ARGS): Removed.
* config/cr16/cr16.c (TARGET_PUSH_ARGUMENT): New.
* config/cr16/cr16.h (PUSH_ARGS): Removed.
* config/i386/i386.c (ix86_push_argument): New.
(TARGET_PUSH_ARGUMENT): Likewise.
* config/i386/i386.h (PUSH_ARGS): Removed.
* config/m32c/m32c.c (TARGET_PUSH_ARGUMENT): New.
* config/m32c/m32c.h (PUSH_ARGS): Removed.
* config/nios2/nios2.h (PUSH_ARGS): Likewise.
* config/pru/pru.h (PUSH_ARGS): Likewise.
* doc/tm.texi.in: Remove PUSH_ARGS documentation. Add
TARGET_PUSH_ARGUMENT hook.
* doc/tm.texi: Regenerated.
gcc/testsuite/
PR target/100704
* gcc.target/i386/pr100704-1.c: New test.
* gcc.target/i386/pr100704-2.c: Likewise.
* gcc.target/i386/pr100704-3.c: Likewise.
|
|
The following testcase ICEs, because we have a mode mismatch.
VEC_PACK_TRUNC_EXPR's operands have different modes from the result
(same vector mode size but twice as large element),
but we were passing non-NULL subtarget with the mode of the result
to the expansion of its arguments, so the VEC_PERM_EXPR in one of the
operands which had V8SImode operands and result had V16HImode target.
Fixed by clearing the subtarget if we are changing mode.
2021-06-15 Jakub Jelinek <jakub@redhat.com>
PR target/101046
* expr.c (expand_expr_real_2) <case VEC_PACK_FIX_TRUNC_EXPR,
case VEC_PACK_TRUNC_EXPR>: Clear subtarget when changing mode.
* gcc.target/i386/pr101046.c: New test.
|
|
[PR90115]
This patch implements a method to track the "private-ness" of
OpenACC variables declared in offload regions in gang-partitioned,
worker-partitioned or vector-partitioned modes. Variables declared
implicitly in scoped blocks and those declared "private" on enclosing
directives (e.g. "acc parallel") are both handled. Variables that are
e.g. gang-private can then be adjusted so they reside in GPU shared
memory.
The reason for doing this is twofold: correct implementation of OpenACC
semantics, and optimisation, since shared memory might be faster than
the main memory on a GPU. Handling of private variables is intimately
tied to the execution model for gangs/workers/vectors implemented by
a particular target: for current targets, we use (or on mainline, will
soon use) a broadcasting/neutering scheme.
That is sufficient for code that e.g. sets a variable in worker-single
mode and expects to use the value in worker-partitioned mode. The
difficulty (semantics-wise) comes when the user wants to do something like
an atomic operation in worker-partitioned mode and expects a worker-single
(gang private) variable to be shared across each partitioned worker.
Forcing use of shared memory for such variables makes that work properly.
In terms of implementation, the parallelism level of a given loop is
not fixed until the oaccdevlow pass in the offload compiler, so the
patch delays fixing the parallelism level of variables declared on or
within such loops until the same point. This is done by adding a new
internal UNIQUE function (OACC_PRIVATE) that lists (the address of) each
private variable as an argument, and other arguments set so as to be able
to determine the correct parallelism level to use for the listed
variables. This new internal function fits into the existing scheme for
demarcating OpenACC loops, as described in comments in the patch.
Two new target hooks are introduced: TARGET_GOACC_ADJUST_PRIVATE_DECL and
TARGET_GOACC_EXPAND_VAR_DECL. The first can tweak a variable declaration
at oaccdevlow time, and the second at expand time. The first or both
of these target hooks can be used by a given offload target, depending
on its strategy for implementing private variables.
This patch updates the TARGET_GOACC_ADJUST_PRIVATE_DECL target hook in
the AMD GCN backend to the current name and prototype. (An earlier
version of the hook was already present, but dormant.)
gcc/
PR middle-end/90115
* doc/tm.texi.in (TARGET_GOACC_EXPAND_VAR_DECL)
(TARGET_GOACC_ADJUST_PRIVATE_DECL): Add documentation hooks.
* doc/tm.texi: Regenerate.
* expr.c (expand_expr_real_1): Expand decls using the
expand_var_decl OpenACC hook if defined.
* internal-fn.c (expand_UNIQUE): Handle IFN_UNIQUE_OACC_PRIVATE.
* internal-fn.h (IFN_UNIQUE_CODES): Add OACC_PRIVATE.
* omp-low.c (omp_context): Add oacc_privatization_candidates
field.
(lower_oacc_reductions): Add PRIVATE_MARKER parameter. Insert
before fork.
(lower_oacc_head_tail): Add PRIVATE_MARKER parameter. Modify
private marker's gimple call arguments, and pass it to
lower_oacc_reductions.
(oacc_privatization_scan_clause_chain)
(oacc_privatization_scan_decl_chain, lower_oacc_private_marker):
New functions.
(lower_omp_for, lower_omp_target, lower_omp_1): Use these.
* omp-offload.c (convert.h): Include.
(oacc_loop_xform_head_tail): Treat private-variable markers like
fork/join when transforming head/tail sequences.
(struct var_decl_rewrite_info): Add struct.
(oacc_rewrite_var_decl, is_sync_builtin_call): New functions.
(execute_oacc_device_lower): Support rewriting gang-private
variables using target hook, and fix up addr_expr and var_decl
nodes afterwards.
* target.def (adjust_private_decl, expand_var_decl): New hooks.
* config/gcn/gcn-protos.h (gcn_goacc_adjust_gangprivate_decl):
Rename to...
(gcn_goacc_adjust_private_decl): ...this.
* config/gcn/gcn-tree.c (gcn_goacc_adjust_gangprivate_decl):
Rename to...
(gcn_goacc_adjust_private_decl): ...this. Add LEVEL parameter.
* config/gcn/gcn.c (TARGET_GOACC_ADJUST_GANGPRIVATE_DECL): Rename
definition using gcn_goacc_adjust_gangprivate_decl...
(TARGET_GOACC_ADJUST_PRIVATE_DECL): ...to this, using
gcn_goacc_adjust_private_decl.
* config/nvptx/nvptx.c (tree-pretty-print.h): Include.
(gang_private_shared_size): New global variable.
(gang_private_shared_align): Likewise.
(gang_private_shared_sym): Likewise.
(gang_private_shared_hmap): Likewise.
(nvptx_option_override): Initialize these.
(nvptx_file_end): Output gang_private_shared_sym.
(nvptx_goacc_adjust_private_decl, nvptx_goacc_expand_var_decl):
New functions.
(nvptx_set_current_function): Clear gang_private_shared_hmap.
(TARGET_GOACC_ADJUST_PRIVATE_DECL): Define hook.
(TARGET_GOACC_EXPAND_VAR_DECL): Likewise.
libgomp/
PR middle-end/90115
* testsuite/libgomp.oacc-c-c++-common/private-atomic-1-gang.c: New
test.
* testsuite/libgomp.oacc-fortran/private-atomic-1-gang.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/private-atomic-1-worker.f90:
Likewise.
Co-Authored-By: Chung-Lin Tang <cltang@codesourcery.com>
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
|
|
Elide expand_constructor when the constructor is static storage and not
mostly zeros and we can move it by pieces prefer to do so since that's
usually more efficient than performing a series of stores from immediates.
2021-05-21 Richard Biener <rguenther@suse.de>
H.J. Lu <hjl.tools@gmail.com>
gcc/
PR middle-end/90773
* expr.c (expand_constructor): Elide expand_constructor if
move by pieces is preferred.
gcc/testsuite/
* gcc.target/i386/pr90773-24.c: New test.
* gcc.target/i386/pr90773-25.c: Likewise.
|
|
Resolves:
PR middle-end/21433 - The COMPONENT_REF case of expand_expr_real_1 is probably wrong
gcc/ChangeLog:
PR middle-end/21433
* expr.c (expand_expr_real_1): Replace unreachable code with an assert.
|
|
The ldist pass turns even very short loops into memset calls. E.g.,
the TFmode emulation calls end with a loop of up to 3 iterations, to
zero out trailing words, and the loop distribution pass turns them
into calls of the memset builtin.
Though short constant-length clearing memsets are usually dealt with
efficiently, for non-constant-length ones, the options are setmemM, or
a function calls.
RISC-V doesn't have any setmemM pattern, so the loops above end up
"optimized" into memset calls, incurring not only the overhead of an
explicit call, but also discarding the information the compiler has
about the alignment of the destination, and that the length is a
multiple of the word alignment.
This patch handles variable lengths with multiple conditional
power-of-2-constant-sized stores-by-pieces, so as to reduce the
overhead of length compares.
It also changes the last copy-prop pass into ccp, so that pointer
alignment and length's nonzero bits are detected and made available
for the expander, even for ldist-introduced SSA_NAMEs.
for gcc/ChangeLog
* builtins.c (try_store_by_multiple_pieces): New.
(expand_builtin_memset_args): Use it. If target_char_cast
fails, proceed as for non-constant val. Pass len's ctz to...
* expr.c (clear_storage_hints): ... this. Try store by
multiple pieces after setmem.
(clear_storage): Adjust.
* expr.h (clear_storage_hints): Likewise.
(try_store_by_multiple_pieces): Declare.
* passes.def: Replace the last copy_prop with ccp.
|
|
alignment_for_piecewise_move is called only with MOVE_MAX_PIECES or
STORE_MAX_PIECES, which are the number of bytes at a time that we
can move or store efficiently. We should call mode_for_size without
limit to MAX_FIXED_MODE_SIZE, which is an integer expression for the
size in bits of the largest integer machine mode that should actually
be used, may be smaller than MOVE_MAX_PIECES or STORE_MAX_PIECES, which
may use vector.
* expr.c (alignment_for_piecewise_move): Call mode_for_size
without limit to MAX_FIXED_MODE_SIZE.
|
|
Add an overlap_op_by_pieces_p target hook for op_by_pieces operations
between two areas of memory to generate one offset adjusted operation
in the smallest integer mode for the remaining bytes on the last piece
operation of a memory region to avoid doing more than one smaller
operations.
Pass the RTL information from the previous iteration to m_constfn in
op_by_pieces operation so that builtin_memset_[read|gen]_str can
generate the new RTL from the previous RTL.
Tested on Linux/x86-64.
gcc/
PR middle-end/90773
* builtins.c (builtin_memcpy_read_str): Add a dummy argument.
(builtin_strncpy_read_str): Likewise.
(builtin_memset_read_str): Add an argument for the previous RTL
information and generate the new RTL from the previous RTL info.
(builtin_memset_gen_str): Likewise.
* builtins.h (builtin_strncpy_read_str): Update the prototype.
(builtin_memset_read_str): Likewise.
* expr.c (by_pieces_ninsns): If targetm.overlap_op_by_pieces_p()
returns true, round up size and alignment to the widest integer
mode for maximum size.
(pieces_addr::adjust): Add a pointer to by_pieces_prev argument
and pass it to m_constfn.
(op_by_pieces_d): Add m_push and m_overlap_op_by_pieces.
(op_by_pieces_d::op_by_pieces_d): Add a bool argument to
initialize m_push. Initialize m_overlap_op_by_pieces with
targetm.overlap_op_by_pieces_p ().
(op_by_pieces_d::run): Pass the previous RTL information to
pieces_addr::adjust and generate overlapping operations if
m_overlap_op_by_pieces is true.
(PUSHG_P): New.
(move_by_pieces_d::move_by_pieces_d): Updated for op_by_pieces_d
change.
(store_by_pieces_d::store_by_pieces_d): Updated for op_by_pieces_d
change.
(can_store_by_pieces): Use by_pieces_constfn on constfun.
(store_by_pieces): Use by_pieces_constfn on constfun. Updated
for op_by_pieces_d change.
(clear_by_pieces_1): Add a dummy argument.
(clear_by_pieces): Updated for op_by_pieces_d change.
(compare_by_pieces_d::compare_by_pieces_d): Likewise.
(string_cst_read_str): Add a dummy argument.
* expr.h (by_pieces_constfn): Add a dummy argument.
(by_pieces_prev): New.
* target.def (overlap_op_by_pieces_p): New target hook.
* config/i386/i386.c (TARGET_OVERLAP_OP_BY_PIECES_P): New.
* doc/tm.texi.in: Add TARGET_OVERLAP_OP_BY_PIECES_P.
* doc/tm.texi: Regenerated.
gcc/testsuite/
PR middle-end/90773
* g++.dg/pr90773-1.h: New test.
* g++.dg/pr90773-1a.C: Likewise.
* g++.dg/pr90773-1b.C: Likewise.
* g++.dg/pr90773-1c.C: Likewise.
* g++.dg/pr90773-1d.C: Likewise.
* gcc.target/i386/pr90773-1.c: Likewise.
* gcc.target/i386/pr90773-2.c: Likewise.
* gcc.target/i386/pr90773-3.c: Likewise.
* gcc.target/i386/pr90773-4.c: Likewise.
* gcc.target/i386/pr90773-5.c: Likewise.
* gcc.target/i386/pr90773-6.c: Likewise.
* gcc.target/i386/pr90773-7.c: Likewise.
* gcc.target/i386/pr90773-8.c: Likewise.
* gcc.target/i386/pr90773-9.c: Likewise.
* gcc.target/i386/pr90773-10.c: Likewise.
* gcc.target/i386/pr90773-11.c: Likewise.
* gcc.target/i386/pr90773-12.c: Likewise.
* gcc.target/i386/pr90773-13.c: Likewise.
* gcc.target/i386/pr90773-14.c: Likewise.
|
|
Change a while loop in op_by_pieces_d::run to a do-while loop to prepare
for offset adjusted operation for the remaining bytes on the last piece
operation of a memory region.
PR middle-end/90773
* expr.c (op_by_pieces_d::get_usable_mode): New member function.
(op_by_pieces_d::run): Cange a while loop to a do-while loop.
|
|
The following patch tests both x / y * y and x - x % y expansion for the
former GIMPLE code and chooses the cheaper of those sequences.
2021-04-27 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/96696
* expr.c (expand_expr_divmod): New function.
(expand_expr_real_2) <case TRUNC_DIV_EXPR>: Use it for truncations and
divisions. Formatting fixes.
<case MULT_EXPR>: Optimize x / y * y as x - x % y if the latter is
cheaper.
* gcc.target/i386/pr96696.c: New test.
|
|
This moves the legacy gimplify_buildN API to tree-vect-generic.c,
its only user and elides the gimplification step, making it a wrapper
around gimple_build, adjusting tree_vec_extract for this.
I've noticed that vector CTOR expansion doesn't deal with unfolded
{} and thus this makes it more resilent. I've also adjusted the
match.pd vector CTOR extraction code to make sure it doesn't
produce a CTOR when folding would make it a vector constant.
2021-04-15 Richard Biener <rguenther@suse.de>
* tree-cfg.h (gimplify_build1): Remove.
(gimplify_build2): Likewise.
(gimplify_build3): Likewise.
* tree-cfg.c (gimplify_build1): Move to tree-vect-generic.c.
(gimplify_build2): Likewise.
(gimplify_build3): Likewise.
* tree-vect-generic.c (gimplify_build1): Move from tree-cfg.c.
Modernize.
(gimplify_build2): Likewise.
(gimplify_build3): Likewise.
(tree_vec_extract): Use resimplify with following SSA edges.
(expand_vector_parallel): Avoid passing NULL size/bitpos
to tree_vec_extract.
* expr.c (store_constructor): Deal with zero-element CTORs.
* match.pd (bit_field_ref <vector CTOR>): Make sure to
produce vector constants when possible.
|
|
The gimplifier optimizes away COMPOUND_LITERAL_EXPRs, but they can remain
in the form of ADDR_EXPR of COMPOUND_LITERAL_EXPRs in static initializers.
By the TREE_STATIC check I meant to check that the underlying decl of
the compound literal is a global rather than automatic variable which
obviously can't be referenced in static initializers, but unfortunately
with LTO it might end up in another partition and thus be DECL_EXTERNAL
instead.
2021-04-10 Jakub Jelinek <jakub@redhat.com>
PR lto/99849
* expr.c (expand_expr_addr_expr_1): Test is_global_var rather than
just TREE_STATIC on COMPOUND_LITERAL_EXPR_DECLs.
* gcc.dg/lto/pr99849_0.c: New test.
|
|
This avoids doing bitfield stores into the return object of calls
when using return-slot optimization and the type is addressable.
Instead we have to pass down the original target RTX to the call
expansion which otherwise tries to create a new temporary.
2021-02-26 Richard Biener <rguenther@suse.de>
PR middle-end/99281
* expr.c (store_field): For calls with return-slot optimization
and addressable return type expand the store directly.
* g++.dg/pr99218.C: New testcase.
|
|
- Check `from` mode is not BLMmode before call store_expr, calling store_expr
with BLKmode will cause ICE.
- Verified with riscv64, x86_64 and aarch64, no introduce new regression.
Note: Those logic was introduced by 3e60ddeb8220ed388819bb3f14e8caa9309fd3c2,
so I cc Jakub for reivew.
Changes for V2:
- Checking mode of `from` rather than mode of `to`.
- Verified on riscv64, x86_64 and aarch64 again.
gcc/ChangeLog:
PR target/98743
* expr.c: Check mode before calling store_expr.
gcc/testsuite/ChangeLog:
PR target/98743
* g++.dg/opt/pr98743.C: New.
|
|
My earlier patch to simplify x - y < 0 etc. for signed subtraction
with undefined overflow into x < y in match.pd regressed some tests,
even when it was guarded to be post-IPA, the following patch thus
attempts to optimize that during expansion instead (which is the last
time we can do it, afterwards we lose the information whether it was
x - y < 0 or (int) ((unsigned) x - y) < 0 for which we couldn't
optimize it.
2021-01-05 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/94802
* expr.h (maybe_optimize_sub_cmp_0): Declare.
* expr.c: Include tree-pretty-print.h and flags.h.
(maybe_optimize_sub_cmp_0): New function.
(do_store_flag): Use it.
* cfgexpand.c (expand_gimple_cond): Likewise.
* gcc.target/i386/pr94802.c: New test.
* gcc.dg/Wstrict-overflow-25.c: Remove xfail.
|
|
|
|
constant_byte_string now uses a convert_to_bytes function, which doesn't
handle bitfields at all (don't punt on them, just puts them into wrong bits
or bytes). Furthermore, I don't see a reason why that function should exist
at all, it duplicates native_encode_initializer functionality.
Except that native_encode_initializer punted on flexible array members and 2
tests in the testsuite relied on constant_byte_string handling those.
So, this patch throws away convert_to_bytes, uses native_encode_initializer
instead, but teaches it to handle flexible array members (only in the
non-mask mode with off == -1 for now), furthermore, it adds various corner
case checks that the old implementation was missing (like that STRING_CSTs
use int as length and therefore we shouldn't try to build larger than that
strings, or that native_encode*/native_interpret* APIs require sane
host and target bytes (8-bit on both).
2020-12-19 Jakub Jelinek <jakub@redhat.com>
PR middle-end/98366
* fold-const.c (native_encode_initializer): Don't try to
memset more than total_bytes with off == -1 even if len is large.
Handle flexible array member initializers if off == -1 and mask is
NULL.
* expr.c (convert_to_bytes): Remove.
(constant_byte_string): Use native_encode_initializer instead of
convert_to_bytes. Remove extraneous semicolon. Punt on various
corner-cases the APIs don't handle, like sizes > INT_MAX,
BITS_PER_UNIT != 8, CHAR_BIT != 8.
* gcc.c-torture/execute/pr98366.c: New test.
|
|
SUBREG_PROMOTED_VAR_P [PR98190]
Some targets decide to promote certain scalar variables to wider mode,
so their DECL_RTL is a SUBREG with SUBREG_PROMOTED_VAR_P.
When storing to such vars, store_expr takes care of sign or zero extending,
but if we store e.g. through MEM_REF into them, no sign or zero extension
happens and that leads to wrong-code e.g. on the following testcase on
aarch64-linux.
The following patch uses store_expr if we overwrite all the bits and it is
not reversed storage order, i.e. something that store_expr handles normally,
and otherwise (if the most significant bit is (or for pdp11 might be, but
pdp11 doesn't promote) being modified), the code extends manually.
2020-12-11 Jakub Jelinek <jakub@redhat.com>
PR middle-end/98190
* expr.c (expand_assignment): If to_rtx is a promoted SUBREG,
ensure sign or zero extension either through use of store_expr
or by extending manually.
* gcc.dg/pr98190.c: New test.
|
|
Jeff has reported that my earlier patch broke rl78-elf, e.g. with
unsigned short foo (unsigned short x) { return x % 7; }
when compiled with -O2 -mg14. The problem is that rl78 is a BITS_PER_WORD
== 8 target which doesn't have 8-bit modulo or divmod optab, but has instead
16-bit divmod, so my patch attempted to optimize it, then called
expand_divmod to do 8-bit modulo and that in turn tried to do 16-bit modulo
again.
The following patch fixes it in two ways.
One is to not perform the optimization when we have {u,s}divmod_optab
handler for the double-word mode, in that case it is IMHO better to just
do whatever we used to do before. This alone should fix the infinite
recursion. But I'd be afraid some other target might have similar problem
and might not have a divmod pattern, but only say a library call.
So the patch also introduces a methods argument to expand_divmod such that
normally we allow everything that was allowed before (using libcalls and
widening), but when called from these expand_doubleword*mod routines we
restrict it to no widening and no libcalls.
2020-12-02 Jakub Jelinek <jakub@redhat.com>
* expmed.h (expand_divmod): Only declare if GCC_OPTABS_H is defined.
Add enum optabs_method argument defaulted to OPTAB_LIB_WIDEN.
* expmed.c: Include expmed.h after optabs.h.
(expand_divmod): Add methods argument, if it is not OPTAB_{,LIB_}WIDEN,
don't choose a wider mode, and pass it to other calls instead of
hardcoded OPTAB_LIB_WIDEN. Avoid emitting libcalls if not
OPTAB_LIB or OPTAB_LIB_WIDEN.
* optabs.c: Include expmed.h after optabs.h.
(expand_doubleword_mod, expand_doubleword_divmod): Pass OPTAB_DIRECT
as last argument to expand_divmod.
(expand_binop): Punt if {s,u}divmod_optab has handler for double-word
int_mode.
* expr.c: Include expmed.h after optabs.h.
* explow.c: Include expmed.h after optabs.h.
|
|
After building some larger codes using opaque types and some c++ codes
using opaque types it became clear I needed to go through and look for
places where opaque types and modes needed to be handled. A whole pile
of one-liners.
gcc/
* typeclass.h: Add opaque_type_class.
* builtins.c (type_to_class): Identify opaque type class.
* dwarf2out.c (is_base_type): Handle opaque types.
(gen_type_die_with_usage): Handle opaque types.
* expr.c (count_type_elements): Opaque types should
never have initializers.
* ipa-devirt.c (odr_types_equivalent_p): No type-specific handling
for opaque types is needed as it eventually checks the underlying
mode which is what is important.
* tree-streamer.c (record_common_node): Handle opaque types.
* tree.c (type_contains_placeholder_1): Handle opaque types.
(type_cache_hasher::equal): No additional comparison needed for
opaque types.
gcc/c-family
* c-pretty-print.c (c_pretty_printer::simple_type_specifier):
Treat opaque types like other types.
(c_pretty_printer::direct_abstract_declarator): Opaque types are
supported types.
gcc/c
* c-aux-info.c (gen_type): Support opaque types.
gcc/cp
* error.c (dump_type): Handle opaque types.
(dump_type_prefix): Handle opaque types.
(dump_type_suffix): Handle opaque types.
(dump_expr): Handle opaque types.
* pt.c (tsubst): Allow opaque types in templates.
(unify): Allow opaque types in templates.
* typeck.c (structural_comptypes): Handle comparison
of opaque types.
|
|
Add widening add, subtract patterns to tree-vect-patterns. Update the
widened code of patterns that detect PLUS_EXPR to also detect
WIDEN_PLUS_EXPR. These patterns take 2 vectors with N elements of size
S and perform an add/subtract on the elements, storing the results as N
elements of size 2*S (in 2 result vectors). This is implemented in the
aarch64 backend as addl,addl2 and subl,subl2 respectively. Add aarch64
tests for patterns.
gcc/ChangeLog:
* doc/generic.texi: Document new widen_plus/minus_lo/hi tree codes.
* doc/md.texi: Document new widenening add/subtract hi/lo optabs.
* expr.c (expand_expr_real_2): Add widen_add, widen_subtract cases.
* optabs-tree.c (optab_for_tree_code): Add case for widening optabs.
* optabs.def (OPTAB_D): Define vectorized widen add, subtracts.
* tree-cfg.c (verify_gimple_assign_binary): Add case for widening adds,
subtracts.
* tree-inline.c (estimate_operator_cost): Add case for widening adds,
subtracts.
* tree-vect-generic.c (expand_vector_operations_1): Add case for
widening adds, subtracts
* tree-vect-patterns.c (vect_recog_widen_add_pattern): New recog
pattern.
(vect_recog_widen_sub_pattern): New recog pattern.
(vect_recog_average_pattern): Update widened add code.
(vect_recog_average_pattern): Update widened add code.
* tree-vect-stmts.c (vectorizable_conversion): Add case for widened add,
subtract.
(supportable_widening_operation): Add case for widened add, subtract.
* tree.def
(WIDEN_PLUS_EXPR): New tree code.
(WIDEN_MINUS_EXPR): New tree code.
(VEC_WIDEN_ADD_HI_EXPR): New tree code.
(VEC_WIDEN_PLUS_LO_EXPR): New tree code.
(VEC_WIDEN_MINUS_HI_EXPR): New tree code.
(VEC_WIDEN_MINUS_LO_EXPR): New tree code.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/vect-widen-add.c: New test.
* gcc.target/aarch64/vect-widen-sub.c: New test.
|
|
This makes us always use a single-bit boolean type component type
for integer mode mask VECTOR_BOOLEAN_TYPE_P to match the RTL and target
representation. This aovids the need for magic translation and
the inconsistencies from the translation requirement now that
we expose temporaries of those types on the GIMPLE level.
2020-10-23 Richard Biener <rguenther@suse.de>
PR middle-end/97521
* expr.c (const_scalar_mask_from_tree): Remove.
(expand_expr_real_1): Always VIEW_CONVERT integer mode
vector constants to an integer type.
* tree.c (build_truth_vector_type_for_mode): Use a single-bit
boolean component type for non-vector-mode mask_mode.
* gcc.target/i386/pr97521.c: New testcase.
|
|
2020-10-23 Richard Biener <rguenther@suse.de>
PR middle-end/97521
* expr.c (expand_expr_real_1): Revert last change.
* gcc.target/i386/pr97521.c: Remove.
This reverts commit b960a9c83a93b58a84a7a370002990810675ac5d.
|
|
This fixes expansion of VECTOR_BOOLEAN_TYPE_P VECTOR_CSTs which
when using an integer mode are not always "mask-mode" but may
be using an integer mode when there's no supported vector mode.
The patch makes sure to only go the mask-mode expansion if
the elements do not line up to cover the full integer mode
(when they do and the mode was an actual mask-mode there's
no actual difference in both expansions).
2020-10-22 Richard Biener <rguenther@suse.de>
PR middle-end/97521
* expr.c (expand_expr_real_1): Be more careful when
expanding a VECTOR_BOOLEAN_TYPE_P VECTOR_CSTs.
* gcc.target/i386/pr97521.c: New testcase.
|
|
structure/PARALLEL return values.
In g:70cdb21e579191fe9f0f1d45e328908e59c0179e, DECL/global variable has handled
misaligned stores, but it didn't handle PARALLEL values, and I refer the
other part of this function, I found the PARALLEL need handled by
emit_group_* functions, so I add a check, and using emit_group_store if
storing a PARALLEL value, also checked this change didn't break the
testcase(gcc.target/arm/unaligned-argument-3.c) added by the orginal changes.
For riscv64 target, struct S {int a; double b;} will pack into a parallel
value to return and it has TImode when misaligned access is supported,
however TImode required 16-byte align, but it only 8-byte align, so it go to
the misaligned stores handling, then it will try to generate move
instruction from a PARALLEL value.
Tested on following target without introduced new reguression:
- riscv32/riscv64 elf
- x86_64-linux
- arm-eabi
v2 changes:
- Use maybe_emit_group_store instead of emit_group_store.
- Remove push_temp_slots/pop_temp_slots, emit_group_store only require
stack temp slot when dst is CONCAT or PARALLEL, however
maybe_emit_group_store will always use REG for dst if needed.
gcc/ChangeLog:
PR target/96759
* expr.c (expand_assignment): Handle misaligned stores with PARALLEL
value.
gcc/testsuite/ChangeLog:
PR target/96759
* g++.target/riscv/pr96759.C: New.
* gcc.target/riscv/pr96759.c: New.
|
|
The RTL expansion code for CTORs doesn't handle VECTOR_BOOLEAN_TYPE_P
with bit-precision elements correctly as the testcase shows before
the PR97085 fix. The following makes it do the correct thing
(not 100% sure for CTOR of sub-vectors due to the lack of a testcase).
The alternative would be to assert such CTORs do not happen (and also
add IL verification for this).
The GIMPLE FE needs a way to declare the VECTOR_BOOLEAN_TYPE_P vectors
(thus the C FE needs that).
2020-09-25 Richard Biener <rguenther@suse.de>
PR middle-end/96814
* expr.c (store_constructor): Handle VECTOR_BOOLEAN_TYPE_P
CTORs correctly.
* gcc.target/i386/pr96814.c: New testcase.
|
|
gcc/ada/ChangeLog:
* gcc-interface/trans.c (gigi): Set exact argument of a vector
growth function to true.
(Attribute_to_gnu): Likewise.
gcc/ChangeLog:
* alias.c (init_alias_analysis): Set exact argument of a vector
growth function to true.
* calls.c (internal_arg_pointer_based_exp_scan): Likewise.
* cfgbuild.c (find_many_sub_basic_blocks): Likewise.
* cfgexpand.c (expand_asm_stmt): Likewise.
* cfgrtl.c (rtl_create_basic_block): Likewise.
* combine.c (combine_split_insns): Likewise.
(combine_instructions): Likewise.
* config/aarch64/aarch64-sve-builtins.cc (function_expander::add_output_operand): Likewise.
(function_expander::add_input_operand): Likewise.
(function_expander::add_integer_operand): Likewise.
(function_expander::add_address_operand): Likewise.
(function_expander::add_fixed_operand): Likewise.
* df-core.c (df_worklist_dataflow_doublequeue): Likewise.
* dwarf2cfi.c (update_row_reg_save): Likewise.
* early-remat.c (early_remat::init_block_info): Likewise.
(early_remat::finalize_candidate_indices): Likewise.
* except.c (sjlj_build_landing_pads): Likewise.
* final.c (compute_alignments): Likewise.
(grow_label_align): Likewise.
* function.c (temp_slots_at_level): Likewise.
* fwprop.c (build_single_def_use_links): Likewise.
(update_uses): Likewise.
* gcc.c (insert_wrapper): Likewise.
* genautomata.c (create_state_ainsn_table): Likewise.
(add_vect): Likewise.
(output_dead_lock_vect): Likewise.
* genmatch.c (capture_info::capture_info): Likewise.
(parser::finish_match_operand): Likewise.
* genrecog.c (optimize_subroutine_group): Likewise.
(merge_pattern_info::merge_pattern_info): Likewise.
(merge_into_decision): Likewise.
(print_subroutine_start): Likewise.
(main): Likewise.
* gimple-loop-versioning.cc (loop_versioning::loop_versioning): Likewise.
* gimple.c (gimple_set_bb): Likewise.
* graphite-isl-ast-to-gimple.c (translate_isl_ast_node_user): Likewise.
* haifa-sched.c (sched_extend_luids): Likewise.
(extend_h_i_d): Likewise.
* insn-addr.h (insn_addresses_new): Likewise.
* ipa-cp.c (gather_context_independent_values): Likewise.
(find_more_contexts_for_caller_subset): Likewise.
* ipa-devirt.c (final_warning_record::grow_type_warnings): Likewise.
(ipa_odr_read_section): Likewise.
* ipa-fnsummary.c (evaluate_properties_for_edge): Likewise.
(ipa_fn_summary_t::duplicate): Likewise.
(analyze_function_body): Likewise.
(ipa_merge_fn_summary_after_inlining): Likewise.
(read_ipa_call_summary): Likewise.
* ipa-icf.c (sem_function::bb_dict_test): Likewise.
* ipa-prop.c (ipa_alloc_node_params): Likewise.
(parm_bb_aa_status_for_bb): Likewise.
(ipa_compute_jump_functions_for_edge): Likewise.
(ipa_analyze_node): Likewise.
(update_jump_functions_after_inlining): Likewise.
(ipa_read_edge_info): Likewise.
(read_ipcp_transformation_info): Likewise.
(ipcp_transform_function): Likewise.
* ipa-reference.c (ipa_reference_write_optimization_summary): Likewise.
* ipa-split.c (execute_split_functions): Likewise.
* ira.c (find_moveable_pseudos): Likewise.
* lower-subreg.c (decompose_multiword_subregs): Likewise.
* lto-streamer-in.c (input_eh_regions): Likewise.
(input_cfg): Likewise.
(input_struct_function_base): Likewise.
(input_function): Likewise.
* modulo-sched.c (set_node_sched_params): Likewise.
(extend_node_sched_params): Likewise.
(schedule_reg_moves): Likewise.
* omp-general.c (omp_construct_simd_compare): Likewise.
* passes.c (pass_manager::create_pass_tab): Likewise.
(enable_disable_pass): Likewise.
* predict.c (determine_unlikely_bbs): Likewise.
* profile.c (compute_branch_probabilities): Likewise.
* read-rtl-function.c (function_reader::parse_block): Likewise.
* read-rtl.c (rtx_reader::read_rtx_code): Likewise.
* reg-stack.c (stack_regs_mentioned): Likewise.
* regrename.c (regrename_init): Likewise.
* rtlanal.c (T>::add_single_to_queue): Likewise.
* sched-deps.c (init_deps_data_vector): Likewise.
* sel-sched-ir.c (sel_extend_global_bb_info): Likewise.
(extend_region_bb_info): Likewise.
(extend_insn_data): Likewise.
* symtab.c (symtab_node::create_reference): Likewise.
* tracer.c (tail_duplicate): Likewise.
* trans-mem.c (tm_region_init): Likewise.
(get_bb_regions_instrumented): Likewise.
* tree-cfg.c (init_empty_tree_cfg_for_function): Likewise.
(build_gimple_cfg): Likewise.
(create_bb): Likewise.
(move_block_to_fn): Likewise.
* tree-complex.c (tree_lower_complex): Likewise.
* tree-if-conv.c (predicate_rhs_code): Likewise.
* tree-inline.c (copy_bb): Likewise.
* tree-into-ssa.c (get_ssa_name_ann): Likewise.
(mark_phi_for_rewrite): Likewise.
* tree-object-size.c (compute_builtin_object_size): Likewise.
(init_object_sizes): Likewise.
* tree-predcom.c (initialize_root_vars_store_elim_1): Likewise.
(initialize_root_vars_store_elim_2): Likewise.
(prepare_initializers_chain_store_elim): Likewise.
* tree-ssa-address.c (addr_for_mem_ref): Likewise.
(multiplier_allowed_in_address_p): Likewise.
* tree-ssa-coalesce.c (ssa_conflicts_new): Likewise.
* tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
* tree-ssa-loop-ivopts.c (addr_offset_valid_p): Likewise.
(get_address_cost_ainc): Likewise.
* tree-ssa-loop-niter.c (discover_iteration_bound_by_body_walk): Likewise.
* tree-ssa-pre.c (add_to_value): Likewise.
(phi_translate_1): Likewise.
(do_pre_regular_insertion): Likewise.
(do_pre_partial_partial_insertion): Likewise.
(init_pre): Likewise.
* tree-ssa-propagate.c (ssa_prop_init): Likewise.
(update_call_from_tree): Likewise.
* tree-ssa-reassoc.c (optimize_range_tests_cmp_bitwise): Likewise.
* tree-ssa-sccvn.c (vn_reference_lookup_3): Likewise.
(vn_reference_lookup_pieces): Likewise.
(eliminate_dom_walker::eliminate_push_avail): Likewise.
* tree-ssa-strlen.c (set_strinfo): Likewise.
(get_stridx_plus_constant): Likewise.
(zero_length_string): Likewise.
(find_equal_ptrs): Likewise.
(printf_strlen_execute): Likewise.
* tree-ssa-threadedge.c (set_ssa_name_value): Likewise.
* tree-ssanames.c (make_ssa_name_fn): Likewise.
* tree-streamer-in.c (streamer_read_tree_bitfields): Likewise.
* tree-vect-loop.c (vect_record_loop_mask): Likewise.
(vect_get_loop_mask): Likewise.
(vect_record_loop_len): Likewise.
(vect_get_loop_len): Likewise.
* tree-vect-patterns.c (vect_recog_mask_conversion_pattern): Likewise.
* tree-vect-slp.c (vect_slp_convert_to_external): Likewise.
(vect_bb_slp_scalar_cost): Likewise.
(vect_bb_vectorization_profitable_p): Likewise.
(vectorizable_slp_permutation): Likewise.
* tree-vect-stmts.c (vectorizable_call): Likewise.
(vectorizable_simd_clone_call): Likewise.
(scan_store_can_perm_p): Likewise.
(vectorizable_store): Likewise.
* expr.c: Likewise.
* vec.c (test_safe_grow_cleared): Likewise.
* vec.h (vec_safe_grow): Likewise.
(vec_safe_grow_cleared): Likewise.
(vl_ptr>::safe_grow): Likewise.
(vl_ptr>::safe_grow_cleared): Likewise.
* config/c6x/c6x.c (insn_set_clock): Likewise.
gcc/c/ChangeLog:
* gimple-parser.c (c_parser_gimple_compound_statement): Set exact argument of a vector
growth function to true.
gcc/cp/ChangeLog:
* class.c (build_vtbl_initializer): Set exact argument of a vector
growth function to true.
* constraint.cc (get_mapped_args): Likewise.
* decl.c (cp_maybe_mangle_decomp): Likewise.
(cp_finish_decomp): Likewise.
* parser.c (cp_parser_omp_for_loop): Likewise.
* pt.c (canonical_type_parameter): Likewise.
* rtti.c (get_pseudo_ti_init): Likewise.
gcc/fortran/ChangeLog:
* trans-openmp.c (gfc_trans_omp_do): Set exact argument of a vector
growth function to true.
gcc/lto/ChangeLog:
* lto-common.c (lto_file_finalize): Set exact argument of a vector
growth function to true.
|
|
Related:
PR middle-end/78257 - missing memcmp optimization with constant arrays
gcc/ChangeLog:
PR middle-end/96665
PR middle-end/78257
* expr.c (convert_to_bytes): Replace statically allocated buffer with
a dynamically allocated one of sufficient size.
gcc/testsuite/ChangeLog:
PR middle-end/96665
PR middle-end/78257
* gcc.dg/memcmp-5.c: New test.
|
|
gcc/ChangeLog:
PR middle-end/78257
* builtins.c (expand_builtin_memory_copy_args): Rename called function.
(expand_builtin_stpcpy_1): Remove argument from call.
(expand_builtin_memcmp): Rename called function.
(inline_expand_builtin_bytecmp): Same.
* expr.c (convert_to_bytes): New function.
(constant_byte_string): New function (formerly string_constant).
(string_constant): Call constant_byte_string.
(byte_representation): New function.
* expr.h (byte_representation): Declare.
* fold-const-call.c (fold_const_call): Rename called function.
* fold-const.c (c_getstr): Remove an argument.
(getbyterep): Define a new function.
* fold-const.h (c_getstr): Remove an argument.
(getbyterep): Declare a new function.
* gimple-fold.c (gimple_fold_builtin_memory_op): Rename callee.
(gimple_fold_builtin_string_compare): Same.
(gimple_fold_builtin_memchr): Same.
gcc/testsuite/ChangeLog:
PR middle-end/78257
* gcc.dg/memchr.c: New test.
* gcc.dg/memcmp-2.c: New test.
* gcc.dg/memcmp-3.c: New test.
* gcc.dg/memcmp-4.c: New test.
|
|
At GIMPLE e.g. for __builtin_memmove we optimize away (to just the return
value) noop copies where src == dest, but at the RTL we don't, and as the
testcase shows, in some cases such copies can appear only at the RTL level
e.g. from trying to copy an aggregate by value argument to the same location
as it already has. If the block move is expanded e.g. piecewise, we
actually manage to optimize it away, as the individual memory copies are
seen as noop moves, but if the target optabs are used, often the sequences
stay until final.
2020-08-11 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/96539
* expr.c (emit_block_move_hints): Don't copy anything if x and y
are the same and neither is MEM_VOLATILE_P.
* gcc.target/i386/pr96539.c: New test.
|
|
Odd numbers are invertible in Z / 2^n Z, so X * C1 == C2 can be rewritten
as X == C2 * inv(C1) when overflow wraps.
mod_inv should probably be updated to better match the other wide_int
functions, but that's a separate issue.
2020-08-10 Marc Glisse <marc.glisse@inria.fr>
PR tree-optimization/95433
* match.pd (X * C1 == C2): Handle wrapping overflow.
* expr.c (maybe_optimize_mod_cmp): Qualify call to mod_inv.
(mod_inv): Move...
* wide-int.cc (mod_inv): ... here.
* wide-int.h (mod_inv): Declare it.
* gcc.dg/tree-ssa/pr95433-2.c: New file.
|
|
gcc/ChangeLog:
PR tree-optimization/96058
* expr.c (string_constant): Build string_constant only
for a type that has same precision as char_type_node
and is an integral type.
|
|
is_int_mode does not allow MODE_PARTIAL_INT modes, so convert_modes was
not allowing a constant value to be converted to a MODE_PARTIAL_INT for
use as operand 2 in patterns such as ashlpsi3. The constant had
to be copied into a register before it could be used, but now can be
used directly as an operand without any copying.
gcc/ChangeLog:
* expr.c (convert_modes): Allow a constant integer to be converted to
any scalar int mode.
|
|
Resolves:
PR middle-end/95189 - memcmp being wrongly stripped like strcm
PR middle-end/95886 - suboptimal memcpy with embedded zero bytes
gcc/ChangeLog:
PR middle-end/95189
PR middle-end/95886
* builtins.c (inline_expand_builtin_string_cmp): Rename...
(inline_expand_builtin_bytecmp): ...to this.
(builtin_memcpy_read_str): Don't expect data to be nul-terminated.
(expand_builtin_memory_copy_args): Handle object representations
with embedded nul bytes.
(expand_builtin_memcmp): Same.
(expand_builtin_strcmp): Adjust call to naming change.
(expand_builtin_strncmp): Same.
* expr.c (string_constant): Create empty strings with nonzero size.
* fold-const.c (c_getstr): Rename locals and update comments.
* tree.c (build_string): Accept null pointer argument.
(build_string_literal): Same.
* tree.h (build_string): Provide a default.
(build_string_literal): Same.
gcc/testsuite/ChangeLog:
PR middle-end/95189
PR middle-end/95886
* gcc.dg/memcmp-pr95189.c: New test.
* gcc.dg/strncmp-3.c: New test.
* gcc.target/i386/memcpy-pr95886.c: New test.
|
|
> > The store to the whole of each volatile object was picked apart
> > like there had been an individual assignment to each of the
> > fields. Reads were added as part of that; see PR for details.
> > The reads from volatile memory were a clear bug; individual
> > stores questionable. A separate patch clarifies the docs.
This breaks building of mesa on both the trunk and 10 branch.
The problem is that the middle-end may never create temporaries of non-POD
(TREE_ADDRESSABLE) types, those can be only created when the language says
so and thus only the FE is allowed to create those.
This patch just reverts the behavior to what we used to do before for the
stores to volatile non-PODs. Perhaps we want to do something else, but
definitely we can't create temporaries of the non-POD type. It is up to
discussions on what should happen in those cases.
2020-07-14 Jakub Jelinek <jakub@redhat.com>
PR middle-end/96194
* expr.c (expand_constructor): Don't create temporary for store to
volatile MEM if exp has an addressable type.
* g++.dg/opt/pr96194.C: New test.
|
|
The store to the whole of each volatile object was picked apart
like there had been an individual assignment to each of the
fields. Reads were added as part of that; see PR for details.
The reads from volatile memory were a clear bug; individual
stores questionable. A separate patch clarifies the docs.
gcc:
2020-07-09 Richard Biener <rguenther@suse.de>
PR middle-end/94600
* expr.c (expand_constructor): Make a temporary also if we're
storing to volatile memory.
gcc/testsuite:
2020-07-09 Hans-Peter Nilsson <hp@axis.com>
PR middle-end/94600
* gcc.dg/pr94600-1.c, gcc.dg/pr94600-2.c, gcc.dg/pr94600-3.c,
gcc.dg/pr94600-4.c, gcc.dg/pr94600-5.c, gcc.dg/pr94600-6.c,
gcc.dg/pr94600-7.c, gcc.dg/pr94600-8.c: New tests.
|
|
In some cases, expand_expr_real_2 prefers to use the mode of the
caller-suggested target instead of the mode of the expression when
passing values to reduce_to_bit_field_precision. E.g.:
else if (target == 0)
op0 = convert_to_mode (mode, op0,
TYPE_UNSIGNED (TREE_TYPE
(treeop0)));
else
{
convert_move (target, op0,
TYPE_UNSIGNED (TREE_TYPE (treeop0)));
op0 = target;
}
where “op0” might not have “mode” for the “else” branch,
but does for all the others.
reduce_to_bit_field_precision discards the suggested target if it
has the wrong mode. This patch moves that to expand_expr_real_2
instead (conditional on reduce_bit_field).
gcc/
PR middle-end/96151
* expr.c (expand_expr_real_2): When reducing bit fields,
clear the target if it has a different mode from the expression.
(reduce_to_bit_field_precision): Don't do that here. Instead
assert that the target already has the correct mode.
|
|
This is yet another PR caused by constant integer rtxes not storing
a mode. We were calling REDUCE_BIT_FIELD on a constant integer that
didn't fit in poly_int64, and then tripped the as_a<scalar_int_mode>
assert on VOIDmode.
AFAICT REDUCE_BIT_FIELD is always passed rtxes that have TYPE_MODE
(rather than some other mode) and it just fills in the redundant
sign bits of that TYPE_MODE value. So it should be safe to get
the mode from the type instead of the rtx. The patch does that
and asserts that the modes agree, where information is available.
That on its own is enough to fix the bug, but we might as well
extend the folding case to all constant integers, not just those
that fit poly_int64.
gcc/
PR middle-end/95694
* expr.c (expand_expr_real_2): Get the mode from the type rather
than the rtx, and assert that it is consistent with the mode of
the rtx (where known). Optimize all constant integers, not just
those that can be represented in poly_int64.
gcc/testsuite/
PR middle-end/95694
* gcc.dg/pr95694.c: New test.
|
|
gcc/ChangeLog:
* Makefile.in: Add new file.
* expr.c (expand_expr_real_2): Add gcc_unreachable as we should
not meet this condition.
(do_store_flag): Likewise.
* gimplify.c (gimplify_expr): Gimplify first argument of
VEC_COND_EXPR to be a SSA name.
* internal-fn.c (vec_cond_mask_direct): New.
(vec_cond_direct): Likewise.
(vec_condu_direct): Likewise.
(vec_condeq_direct): Likewise.
(expand_vect_cond_optab_fn): New.
(expand_vec_cond_optab_fn): Likewise.
(expand_vec_condu_optab_fn): Likewise.
(expand_vec_condeq_optab_fn): Likewise.
(expand_vect_cond_mask_optab_fn): Likewise.
(expand_vec_cond_mask_optab_fn): Likewise.
(direct_vec_cond_mask_optab_supported_p): Likewise.
(direct_vec_cond_optab_supported_p): Likewise.
(direct_vec_condu_optab_supported_p): Likewise.
(direct_vec_condeq_optab_supported_p): Likewise.
* internal-fn.def (VCOND): New OPTAB.
(VCONDU): Likewise.
(VCONDEQ): Likewise.
(VCOND_MASK): Likewise.
* optabs.c (get_rtx_code): Make it global.
(expand_vec_cond_mask_expr): Removed.
(expand_vec_cond_expr): Removed.
* optabs.h (expand_vec_cond_expr): Likewise.
(vector_compare_rtx): Make it global.
* passes.def: Add new pass_gimple_isel pass.
* tree-cfg.c (verify_gimple_assign_ternary): Add check
for VEC_COND_EXPR about first argument.
* tree-pass.h (make_pass_gimple_isel): New.
* tree-ssa-forwprop.c (pass_forwprop::execute): Prevent
propagation of the first argument of a VEC_COND_EXPR.
* tree-ssa-reassoc.c (ovce_extract_ops): Support SSA_NAME as
first argument of a VEC_COND_EXPR.
(optimize_vec_cond_expr): Likewise.
* tree-vect-generic.c (expand_vector_divmod): Make SSA_NAME
for a first argument of created VEC_COND_EXPR.
(expand_vector_condition): Fix coding style.
* tree-vect-stmts.c (vectorizable_condition): Gimplify
first argument.
* gimple-isel.cc: New file.
gcc/testsuite/ChangeLog:
* g++.dg/vect/vec-cond-expr-eh.C: New test.
|
|
In rtl expand, if we have a copy that matches one of the following patterns:
(set (subreg:M1 (reg:M2 ...)) (subreg:M1 (reg:M2 ...)))
(set (subreg:M1 (reg:M2 ...)) (mem:M1 ADDR))
(set (mem:M1 ADDR) (subreg:M1 (reg:M2 ...)))
(set (subreg:M1 (reg:M2 ...)) (constant C))
where mode M1 is equal in size to M2, try to detect whether the mode change
involves an implicit round trip through memory. If so, see if we can avoid
that by removing the subregs and doing the move in mode M2 instead.
2020-06-05 Felix Yang <felix.yang@huawei.com>
gcc/
PR target/95254
* expr.c (emit_move_insn): Check src and dest of the copy to see
if one or both of them are subregs, try to remove the subregs when
innermode and outermode are equal in size and the mode change involves
an implicit round trip through memory.
gcc/testsuite/
PR target/95254
* gcc.target/aarch64/pr95254.c: New test.
* gcc.target/i386/pr67609.c: Check "movq\t%xmm0" instead of "movdqa".
|
|
> Can't hurt, and debugging the assert tripping is likely a hell of a lot easier
> than debugging the resultant incorrect code. So if it passes, then I'd say go
> for it.
Testing passed, so I've committed it with those asserts (and thankfully I've
added them!) but it apparently broke Linux kernel build on arm.
The problem is that if the STRING_CST is very short, while the full object
has BLKmode, the short string could very well have
QImode/HImode/SImode/DImode and in that case it wouldn't take the path that
copies the string and then clears the remaining space, but different paths
in which it will ICE because of those asserts and without those it would
just emit wrong-code.
The following patch fixes it by enforcing BLKmode for the string MEM, even
if it is short, so that we copy it and memset the rest.
2020-05-31 Jakub Jelinek <jakub@redhat.com>
PR middle-end/95052
* expr.c (store_expr): For shortedned_string_cst, ensure temp has
BLKmode.
* gcc.dg/pr95052.c: New test.
|
|
In the following testcase, store_expr of e.g. 97 bytes long string literal
into 1MB long array is implemented by copying the 97 bytes from .rodata
section, followed by clearing the remaining bytes. But, as the STRING_CST
has type char[1024*1024], we actually allocate whole 1MB in .rodata section
for it, even when we only use the first 97 bytes from that.
The following patch tweaks it so that if we are going to initialize only the
small part from it, we don't emit all the zeros that we never use after it.
2020-05-29 Jakub Jelinek <jakub@redhat.com>
PR middle-end/95052
* expr.c (store_expr): If expr_size is constant and significantly
larger than TREE_STRING_LENGTH, set temp to just the
TREE_STRING_LENGTH portion of the STRING_CST.
* gcc.target/i386/pr95052.c: New test.
|
|
This adjusts emit_move_multi_word to handle moves into paradoxical
subregs parts that are not there and adjusts lower-subregs
CLOBBER resolving to deal with those as well.
2020-04-16 Richard Biener <rguenther@suse.de>
PR middle-end/94614
* expr.c (emit_move_multi_word): Do not generate code when
the destination part is undefined_operand_subword_p.
* lower-subreg.c (resolve_clobber): Look through a paradoxica
subreg.
|
|
From-SVN: r279813
|
|
PR middle-end/90840
* expr.c (expand_assignment): In the case of a CONCAT on the LHS, make
sure to pass a valid inner mode in calls to simplify_gen_subreg.
From-SVN: r279076
|
|
This patch adds AArch64 patterns for converting between 64-bit and
128-bit integer vectors, and makes the vectoriser and expand pass
use them.
2019-11-14 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-cfg.c (verify_gimple_assign_unary): Handle conversions
between vector types.
* tree-vect-stmts.c (vectorizable_conversion): Extend the
non-widening and non-narrowing path to handle standard
conversion codes, if the target supports them.
* expr.c (convert_move): Try using the extend and truncate optabs
for vectors.
* optabs-tree.c (supportable_convert_operation): Likewise.
* config/aarch64/iterators.md (Vnarroqw): New iterator.
* config/aarch64/aarch64-simd.md (<optab><Vnarrowq><mode>2)
(trunc<mode><Vnarrowq>2): New patterns.
gcc/testsuite/
* gcc.dg/vect/bb-slp-pr69907.c: Do not expect BB vectorization
to fail for aarch64 targets.
* gcc.dg/vect/no-scevccp-outer-12.c: Expect the test to pass
on aarch64 targets.
* gcc.dg/vect/vect-double-reduc-5.c: Likewise.
* gcc.dg/vect/vect-outer-4e.c: Likewise.
* gcc.target/aarch64/vect_mixed_sizes_5.c: New test.
* gcc.target/aarch64/vect_mixed_sizes_6.c: Likewise.
* gcc.target/aarch64/vect_mixed_sizes_7.c: Likewise.
* gcc.target/aarch64/vect_mixed_sizes_8.c: Likewise.
* gcc.target/aarch64/vect_mixed_sizes_9.c: Likewise.
* gcc.target/aarch64/vect_mixed_sizes_10.c: Likewise.
* gcc.target/aarch64/vect_mixed_sizes_11.c: Likewise.
* gcc.target/aarch64/vect_mixed_sizes_12.c: Likewise.
* gcc.target/aarch64/vect_mixed_sizes_13.c: Likewise.
From-SVN: r278245
|
|
The RISC-V backend wants to use a libcall when optimizing for size if
more than 6 instructions are needed. Emit_move_complex asks for no
libcalls. This case requires 8 insns for rv64 and 16 insns for rv32,
so we get fallback code that emits a loop. Commit_one_edge_insertion
doesn't allow code inserted for a phi node on an edge to end with a
branch, and so this triggers an assertion. This problem goes away if
we allow libcalls when optimizing for size, which gives the code the
RISC-V backend wants, and avoids triggering the assert.
gcc/
PR middle-end/92263
* expr.c (emit_move_complex): Only use BLOCK_OP_NO_LIBCALL when
optimize_insn_for_speed_p is true.
gcc/testsuite/
PR middle-end/92263
* gcc.dg/pr92263.c: New.
From-SVN: r277861
|
|
`build_personality_function` generates a declaration for a personality
function. The type it declares for these functions doesn't match the
type of the actual personality functions that are defined by the C++
unwinding ABI.
This doesn't cause any crashes since the compiler never generates a call
to these decl's, and hence the type of the function is never used.
Nonetheless, for the sake of consistency and readability we update the
type of this declaration.
gcc/ChangeLog:
2019-11-05 Matthew Malcomson <matthew.malcomson@arm.com>
* expr.c (build_personality_function): Fix generated type to
match actual personality functions.
From-SVN: r277846
|