path: root/gcc/expr.c
Age | Commit message | Author | Files | Lines
2021-09-12 | Also preserve SUBREG_PROMOTED_VAR_P in expr.c's convert_move. | Roger Sayle | 1 | -0/+19
This patch catches another place in the middle-end where it's possible to preserve the SUBREG_PROMOTED_VAR_P annotation on a subreg to the benefit of later RTL optimizations. This adds the same logic to expr.c's convert_move as recently added to convert_modes. On nvptx-none, the simple test program: short foo (char c) { return c; } currently generates three instructions: mov.u32 %r23, %ar0; cvt.u16.u32 %r24, %r23; cvt.s32.s16 %value, %r24; with this patch, we now generate just one: mov.u32 %value, %ar0; This patch should look familiar, it's almost identical to the recent patch https://gcc.gnu.org/pipermail/gcc-patches/2021-August/578331.html but with the fix https://gcc.gnu.org/pipermail/gcc-patches/2021-August/578519.html 2021-09-12 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * expr.c (convert_move): Preserve SUBREG_PROMOTED_VAR_P when creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P subreg.
2021-08-31 | [Committed] Fix subreg_promoted_mode breakage on various platforms. | Roger Sayle | 1 | -3/+6
My apologies for the inconvenience. My recent patch to preserve SUBREG_PROMOTED_VAR_P on (extend:HI (subreg/s:QI (reg:SI))), and other places in the middle-end, has broken the build on several targets. The change to convert_modes inadvertently used the same subreg_promoted_mode idiom for retrieving the mode of a SUBREG_REG as the existing code just a few lines earlier. Alas, in the meantime the original SUBREG gets replaced by one without SUBREG_PROMOTED_VAR_P, the whole raison-d'etre for my patch, and I had not realized/noticed that subreg_promoted_mode asserts on this. Evidently neither the bootstrap and regression test on x86_64-pc-linux-gnu nor my testing on nvptx-none hit this particular case. The logic of this transformation is sound; it's the implementation that has bitten me. This patch has been committed, after another "make bootstrap" on x86_64-pc-linux-gnu (just in case), and confirmation/pre-approval from Jeff Law that this indeed fixes the build failures seen on several platforms. My humble apologies again. 2021-08-31 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * expr.c (convert_modes): Don't use subreg_promoted_mode on a SUBREG if it can't be guaranteed to have SUBREG_PROMOTED_VAR_P set. Instead use the standard (safer) is_a <scalar_int_mode> idiom.
2021-08-31 | Preserve SUBREG_PROMOTED_VAR_P on (extend:HI (subreg/s:QI (reg:SI))). | Roger Sayle | 1 | -1/+18
SUBREG_PROMOTED_VAR_P is a mechanism for tracking that a partial subreg is correctly zero-extended or sign-extended in the parent register. For example, the RTL (subreg/s/v:QI (reg/v:SI 23 [ x ]) 0) indicates that the byte x is zero extended in reg:SI 23, which is useful for optimization. An example is that zero extending the above QImode value to HImode can simply use a wider subreg, i.e. (subreg:HI (reg/v:SI 23 [ x ]) 0). This patch addresses the oversight/missed optimization opportunity that the new HImode subreg above should retain its SUBREG_PROMOTED_VAR_P annotation as its value is guaranteed to be correctly extended in the SImode parent. The code below to preserve SUBREG_PROMOTED_VAR_P is already present in the middle-end (e.g. simplify-rtx.c:7232-7242) but missing from one or two (precisely three) places that (accidentally) strip it. Whilst there I also added another optimization. If we need to extend the above QImode value beyond the SImode register holding it, say to DImode, we can eliminate the SUBREG and simply extend from the SImode register to DImode. 2021-08-31 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * expr.c (convert_modes): Preserve SUBREG_PROMOTED_VAR_P when creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P subreg. * simplify-rtx.c (simplify_unary_operation_1) [SIGN_EXTEND]: Likewise, preserve SUBREG_PROMOTED_VAR_P when creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P subreg. Generate SIGN_EXTEND of the SUBREG_REG when a subreg would be paradoxical. [ZERO_EXTEND]: Likewise, preserve SUBREG_PROMOTED_VAR_P when creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P subreg. Generate ZERO_EXTEND of the SUBREG_REG when a subreg would be paradoxical.
2021-08-04 | by_pieces: Pass MAX_PIECES to op_by_pieces_d | H.J. Lu | 1 | -12/+14
Pass MAX_PIECES to op_by_pieces_d::op_by_pieces_d for move, store and compare. PR target/101742 * expr.c (op_by_pieces_d::op_by_pieces_d): Add a max_pieces argument to set m_max_size. (move_by_pieces_d): Pass MOVE_MAX_PIECES to op_by_pieces_d. (store_by_pieces_d): Pass STORE_MAX_PIECES to op_by_pieces_d. (compare_by_pieces_d): Pass COMPARE_MAX_PIECES to op_by_pieces_d.
2021-07-30 | Add QI vector mode support to by-pieces for memset | H.J. Lu | 1 | -52/+120
1. Replace scalar_int_mode with fixed_size_mode in the by-pieces infrastructure to allow non-integer mode. 2. Rename widest_int_mode_for_size to widest_fixed_size_mode_for_size to return QI vector mode for memset. 3. Add op_by_pieces_d::smallest_fixed_size_mode_for_size to return the smallest integer or QI vector mode. 4. Remove clear_by_pieces_1 and use builtin_memset_read_str in clear_by_pieces to support vector mode broadcast. 5. Add lowpart_subreg_regno, a wrapper around simplify_subreg_regno that uses subreg_lowpart_offset (mode, prev_mode) as the offset. 6. Add TARGET_GEN_MEMSET_SCRATCH_RTX to allow the backend to use a hard scratch register to avoid stack realignment when expanding memset. gcc/ PR middle-end/90773 * builtins.c (builtin_memcpy_read_str): Change the mode argument from scalar_int_mode to fixed_size_mode. (builtin_strncpy_read_str): Likewise. (gen_memset_value_from_prev): New function. (builtin_memset_read_str): Change the mode argument from scalar_int_mode to fixed_size_mode. Use gen_memset_value_from_prev and support CONST_VECTOR. (builtin_memset_gen_str): Likewise. (try_store_by_multiple_pieces): Use by_pieces_constfn to declare constfun. * builtins.h (builtin_strncpy_read_str): Replace scalar_int_mode with fixed_size_mode. (builtin_memset_read_str): Likewise. * expr.c (widest_int_mode_for_size): Renamed to ... (widest_fixed_size_mode_for_size): Add a bool argument to indicate if QI vector mode can be used. (by_pieces_ninsns): Call widest_fixed_size_mode_for_size instead of widest_int_mode_for_size. (pieces_addr::adjust): Change the mode argument from scalar_int_mode to fixed_size_mode. (op_by_pieces_d): Make m_len read-only. Add a bool member, m_qi_vector_mode, to indicate that QI vector mode can be used. (op_by_pieces_d::op_by_pieces_d): Add a bool argument to initialize m_qi_vector_mode. Call widest_fixed_size_mode_for_size instead of widest_int_mode_for_size. (op_by_pieces_d::get_usable_mode): Change the mode argument from scalar_int_mode to fixed_size_mode. Call widest_fixed_size_mode_for_size instead of widest_int_mode_for_size. (op_by_pieces_d::smallest_fixed_size_mode_for_size): New member function to return the smallest integer or QI vector mode. (op_by_pieces_d::run): Call widest_fixed_size_mode_for_size instead of widest_int_mode_for_size. Call smallest_fixed_size_mode_for_size instead of smallest_int_mode_for_size. (store_by_pieces_d::store_by_pieces_d): Add a bool argument to indicate that QI vector mode can be used and pass it to op_by_pieces_d::op_by_pieces_d. (can_store_by_pieces): Call widest_fixed_size_mode_for_size instead of widest_int_mode_for_size. Pass memsetp to widest_fixed_size_mode_for_size to support QI vector mode. Allow all CONST_VECTORs for memset if vec_duplicate is supported. (store_by_pieces): Pass memsetp to store_by_pieces_d::store_by_pieces_d. (clear_by_pieces_1): Removed. (clear_by_pieces): Replace clear_by_pieces_1 with builtin_memset_read_str and pass true to store_by_pieces_d to support vector mode broadcast. (string_cst_read_str): Change the mode argument from scalar_int_mode to fixed_size_mode. * expr.h (by_pieces_constfn): Change scalar_int_mode to fixed_size_mode. (by_pieces_prev): Likewise. * rtl.h (lowpart_subreg_regno): New. * rtlanal.c (lowpart_subreg_regno): New. A wrapper around simplify_subreg_regno. * target.def (gen_memset_scratch_rtx): New hook. * doc/tm.texi.in: Add TARGET_GEN_MEMSET_SCRATCH_RTX. * doc/tm.texi: Regenerated. gcc/testsuite/ * gcc.target/i386/pr100865-3.c: Expect vmovdqu8 instead of vmovdqu. 
* gcc.target/i386/pr100865-4b.c: Likewise.
2021-07-06 | Improve warning suppression for inlined functions. | Martin Sebor | 1 | -5/+5
Resolves: PR middle-end/98871 - Cannot silence -Wmaybe-uninitialized at declaration site PR middle-end/98512 - #pragma GCC diagnostic ignored ineffective in conjunction with alias attribute gcc/ChangeLog: * builtins.c (warn_string_no_nul): Remove %G. (maybe_warn_for_bound): Same. (warn_for_access): Same. (check_access): Same. (check_strncat_sizes): Same. (expand_builtin_strncat): Same. (expand_builtin_strncmp): Same. (expand_builtin): Same. (expand_builtin_object_size): Same. (warn_dealloc_offset): Same. (maybe_emit_free_warning): Same. * calls.c (maybe_warn_alloc_args_overflow): Same. (maybe_warn_nonstring_arg): Same. (maybe_warn_rdwr_sizes): Same. * expr.c (expand_expr_real_1): Remove %K. * gimple-fold.c (gimple_fold_builtin_strncpy): Remove %G. (gimple_fold_builtin_strncat): Same. * gimple-ssa-sprintf.c (format_directive): Same. (handle_printf_call): Same. * gimple-ssa-warn-alloca.c (pass_walloca::execute): Same. * gimple-ssa-warn-restrict.c (maybe_diag_overlap): Same. (maybe_diag_access_bounds): Same. Call gimple_location. (check_bounds_or_overlap): Same. * trans-mem.c (ipa_tm_scan_irr_block): Remove %K. Simplify. * tree-ssa-ccp.c (pass_post_ipa_warn::execute): Remove %G. * tree-ssa-strlen.c (maybe_warn_overflow): Same. (maybe_diag_stxncpy_trunc): Same. (handle_builtin_stxncpy_strncat): Same. (maybe_warn_pointless_strcmp): Same. * tree-ssa-uninit.c (maybe_warn_operand): Same. gcc/testsuite/ChangeLog: * gcc.dg/Wobjsize-1.c: Prune expected output. * gcc.dg/Warray-bounds-71.c: New test. * gcc.dg/Warray-bounds-71.h: New test header. * gcc.dg/Warray-bounds-72.c: New test. * gcc.dg/Warray-bounds-73.c: New test. * gcc.dg/Warray-bounds-74.c: New test. * gcc.dg/Warray-bounds-75.c: New test. * gcc.dg/Wfree-nonheap-object-4.c: Adjust expected output. * gcc.dg/Wfree-nonheap-object-5.c: New test. * gcc.dg/Wfree-nonheap-object-6.c: New test. * gcc.dg/pragma-diag-10.c: New test. * gcc.dg/pragma-diag-9.c: New test. * gcc.dg/uninit-suppress_3.c: New test. * gcc.dg/pr79214.c: Xfail tests. * gcc.dg/tree-ssa/builtin-sprintf-warn-27.c: New test. * gcc.dg/format/c90-printf-1.c: Adjust expected output.
2021-07-03 | Don't use vec_duplicate on vector in CTOR expansion | H.J. Lu | 1 | -1/+2
Since vec_duplicate only works on scalar, don't use it on vector in store constructor expansion. gcc/ PR middle-end/101294 * expr.c (store_constructor): Don't use vec_duplicate on vector. gcc/testsuite/ PR middle-end/101294 * gcc.dg/pr101294.c: New test.
2021-06-17 | Add a target calls hook: TARGET_PUSH_ARGUMENT | H.J. Lu | 1 | -3/+11
1. Replace PUSH_ARGS with a target calls hook, TARGET_PUSH_ARGUMENT, which takes an integer argument. When it returns true, push instructions will be used to pass outgoing arguments. If the argument is nonzero, it is the number of bytes to push and indicates the PUSH instruction usage is optional so that the backend can decide if PUSH instructions should be generated. Otherwise, the argument is zero. 2. Implement x86 target hook which returns false when the number of bytes to push is no less than 16 (8 for 32-bit targets) if vector load and store can be used. 3. Remove target PUSH_ARGS definitions which return 0 as it is the same as the default. 4. Define TARGET_PUSH_ARGUMENT of cr16 and m32c to always return true. gcc/ PR target/100704 * calls.c (expand_call): Replace PUSH_ARGS with targetm.calls.push_argument (0). (emit_library_call_value_1): Likewise. * defaults.h (PUSH_ARGS): Removed. (PUSH_ARGS_REVERSED): Replace PUSH_ARGS with targetm.calls.push_argument (0). * expr.c (block_move_libcall_safe_for_call_parm): Likewise. (emit_push_insn): Pass the number bytes to push to targetm.calls.push_argument and pass 0 if ARGS_ADDR is 0. * hooks.c (hook_bool_uint_true): New. * hooks.h (hook_bool_uint_true): Likewise. * rtlanal.c (nonzero_bits1): Replace PUSH_ARGS with targetm.calls.push_argument (0). * target.def (push_argument): Add a targetm.calls hook. * targhooks.c (default_push_argument): New. * targhooks.h (default_push_argument): Likewise. * config/bpf/bpf.h (PUSH_ARGS): Removed. * config/cr16/cr16.c (TARGET_PUSH_ARGUMENT): New. * config/cr16/cr16.h (PUSH_ARGS): Removed. * config/i386/i386.c (ix86_push_argument): New. (TARGET_PUSH_ARGUMENT): Likewise. * config/i386/i386.h (PUSH_ARGS): Removed. * config/m32c/m32c.c (TARGET_PUSH_ARGUMENT): New. * config/m32c/m32c.h (PUSH_ARGS): Removed. * config/nios2/nios2.h (PUSH_ARGS): Likewise. * config/pru/pru.h (PUSH_ARGS): Likewise. * doc/tm.texi.in: Remove PUSH_ARGS documentation. Add TARGET_PUSH_ARGUMENT hook. * doc/tm.texi: Regenerated. gcc/testsuite/ PR target/100704 * gcc.target/i386/pr100704-1.c: New test. * gcc.target/i386/pr100704-2.c: Likewise. * gcc.target/i386/pr100704-3.c: Likewise.
2021-06-15 | expr: Fix up VEC_PACK_TRUNC_EXPR expansion [PR101046] | Jakub Jelinek | 1 | -0/+2
The following testcase ICEs, because we have a mode mismatch. VEC_PACK_TRUNC_EXPR's operands have different modes from the result (same vector mode size but twice as large element), but we were passing non-NULL subtarget with the mode of the result to the expansion of its arguments, so the VEC_PERM_EXPR in one of the operands which had V8SImode operands and result had V16HImode target. Fixed by clearing the subtarget if we are changing mode. 2021-06-15 Jakub Jelinek <jakub@redhat.com> PR target/101046 * expr.c (expand_expr_real_2) <case VEC_PACK_FIX_TRUNC_EXPR, case VEC_PACK_TRUNC_EXPR>: Clear subtarget when changing mode. * gcc.target/i386/pr101046.c: New test.
2021-05-21 | openacc: Add support for gang local storage allocation in shared memory … | Julian Brown | 1 | -1/+12
[PR90115] This patch implements a method to track the "private-ness" of OpenACC variables declared in offload regions in gang-partitioned, worker-partitioned or vector-partitioned modes. Variables declared implicitly in scoped blocks and those declared "private" on enclosing directives (e.g. "acc parallel") are both handled. Variables that are e.g. gang-private can then be adjusted so they reside in GPU shared memory. The reason for doing this is twofold: correct implementation of OpenACC semantics, and optimisation, since shared memory might be faster than the main memory on a GPU. Handling of private variables is intimately tied to the execution model for gangs/workers/vectors implemented by a particular target: for current targets, we use (or on mainline, will soon use) a broadcasting/neutering scheme. That is sufficient for code that e.g. sets a variable in worker-single mode and expects to use the value in worker-partitioned mode. The difficulty (semantics-wise) comes when the user wants to do something like an atomic operation in worker-partitioned mode and expects a worker-single (gang private) variable to be shared across each partitioned worker. Forcing use of shared memory for such variables makes that work properly. In terms of implementation, the parallelism level of a given loop is not fixed until the oaccdevlow pass in the offload compiler, so the patch delays fixing the parallelism level of variables declared on or within such loops until the same point. This is done by adding a new internal UNIQUE function (OACC_PRIVATE) that lists (the address of) each private variable as an argument, and other arguments set so as to be able to determine the correct parallelism level to use for the listed variables. This new internal function fits into the existing scheme for demarcating OpenACC loops, as described in comments in the patch. Two new target hooks are introduced: TARGET_GOACC_ADJUST_PRIVATE_DECL and TARGET_GOACC_EXPAND_VAR_DECL. The first can tweak a variable declaration at oaccdevlow time, and the second at expand time. The first or both of these target hooks can be used by a given offload target, depending on its strategy for implementing private variables. This patch updates the TARGET_GOACC_ADJUST_PRIVATE_DECL target hook in the AMD GCN backend to the current name and prototype. (An earlier version of the hook was already present, but dormant.) gcc/ PR middle-end/90115 * doc/tm.texi.in (TARGET_GOACC_EXPAND_VAR_DECL) (TARGET_GOACC_ADJUST_PRIVATE_DECL): Add documentation hooks. * doc/tm.texi: Regenerate. * expr.c (expand_expr_real_1): Expand decls using the expand_var_decl OpenACC hook if defined. * internal-fn.c (expand_UNIQUE): Handle IFN_UNIQUE_OACC_PRIVATE. * internal-fn.h (IFN_UNIQUE_CODES): Add OACC_PRIVATE. * omp-low.c (omp_context): Add oacc_privatization_candidates field. (lower_oacc_reductions): Add PRIVATE_MARKER parameter. Insert before fork. (lower_oacc_head_tail): Add PRIVATE_MARKER parameter. Modify private marker's gimple call arguments, and pass it to lower_oacc_reductions. (oacc_privatization_scan_clause_chain) (oacc_privatization_scan_decl_chain, lower_oacc_private_marker): New functions. (lower_omp_for, lower_omp_target, lower_omp_1): Use these. * omp-offload.c (convert.h): Include. (oacc_loop_xform_head_tail): Treat private-variable markers like fork/join when transforming head/tail sequences. (struct var_decl_rewrite_info): Add struct. (oacc_rewrite_var_decl, is_sync_builtin_call): New functions. 
(execute_oacc_device_lower): Support rewriting gang-private variables using target hook, and fix up addr_expr and var_decl nodes afterwards. * target.def (adjust_private_decl, expand_var_decl): New hooks. * config/gcn/gcn-protos.h (gcn_goacc_adjust_gangprivate_decl): Rename to... (gcn_goacc_adjust_private_decl): ...this. * config/gcn/gcn-tree.c (gcn_goacc_adjust_gangprivate_decl): Rename to... (gcn_goacc_adjust_private_decl): ...this. Add LEVEL parameter. * config/gcn/gcn.c (TARGET_GOACC_ADJUST_GANGPRIVATE_DECL): Rename definition using gcn_goacc_adjust_gangprivate_decl... (TARGET_GOACC_ADJUST_PRIVATE_DECL): ...to this, using gcn_goacc_adjust_private_decl. * config/nvptx/nvptx.c (tree-pretty-print.h): Include. (gang_private_shared_size): New global variable. (gang_private_shared_align): Likewise. (gang_private_shared_sym): Likewise. (gang_private_shared_hmap): Likewise. (nvptx_option_override): Initialize these. (nvptx_file_end): Output gang_private_shared_sym. (nvptx_goacc_adjust_private_decl, nvptx_goacc_expand_var_decl): New functions. (nvptx_set_current_function): Clear gang_private_shared_hmap. (TARGET_GOACC_ADJUST_PRIVATE_DECL): Define hook. (TARGET_GOACC_EXPAND_VAR_DECL): Likewise. libgomp/ PR middle-end/90115 * testsuite/libgomp.oacc-c-c++-common/private-atomic-1-gang.c: New test. * testsuite/libgomp.oacc-fortran/private-atomic-1-gang.f90: Likewise. * testsuite/libgomp.oacc-fortran/private-atomic-1-worker.f90: Likewise. Co-Authored-By: Chung-Lin Tang <cltang@codesourcery.com> Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
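As a rough, hypothetical C/OpenACC sketch of why this matters (function and variable names are made up; this is not from the testsuite): an atomic update performed in worker-partitioned mode has to see a single copy of a gang-private variable, which is what placing it in GPU shared memory provides.

    /* "tmp" is gang-private (declared inside the gang-partitioned region) yet
       must be visible to every worker of its gang, so forcing it into GPU
       shared memory makes the atomic update below behave as intended.  */
    void
    count_per_gang (int *res, int n)
    {
      #pragma acc parallel loop gang copyout(res[0:n])
      for (int g = 0; g < n; g++)
        {
          int tmp = 0;                   /* gang-private variable */
          #pragma acc loop worker
          for (int w = 0; w < 32; w++)
            {
              #pragma acc atomic update
              tmp += 1;                  /* every worker of the gang updates tmp */
            }
          res[g] = tmp;                  /* expect 32 */
        }
    }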
2021-05-21 | Elide expand_constructor if move by pieces is preferred | H.J. Lu | 1 | -0/+13
Elide expand_constructor when the constructor is in static storage, is not mostly zeros, and we can move it by pieces; in that case prefer the piecewise move, since that's usually more efficient than performing a series of stores from immediates. 2021-05-21 Richard Biener <rguenther@suse.de> H.J. Lu <hjl.tools@gmail.com> gcc/ PR middle-end/90773 * expr.c (expand_constructor): Elide expand_constructor if move by pieces is preferred. gcc/testsuite/ * gcc.target/i386/pr90773-24.c: New test. * gcc.target/i386/pr90773-25.c: Likewise.
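A small hypothetical example (not one of the pr90773 tests) of the kind of initializer affected:

    /* The aggregate initializer is constant, lives in static storage and is
       not mostly zeros, so copying it by pieces from the constant pool can be
       cheaper than a series of stores from immediates.  */
    struct S { int a, b, c, d; };
    extern void use (struct S *);
    void
    f (void)
    {
      struct S s = { 11, 22, 33, 44 };
      use (&s);
    }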
2021-05-11 | Replace unreachable code with an assert. | Martin Sebor | 1 | -55/+2
Resolves: PR middle-end/21433 - The COMPONENT_REF case of expand_expr_real_1 is probably wrong gcc/ChangeLog: PR middle-end/21433 * expr.c (expand_expr_real_1): Replace unreachable code with an assert.
2021-05-03 | introduce try store by multiple pieces | Alexandre Oliva | 1 | -2/+7
The ldist pass turns even very short loops into memset calls. E.g., the TFmode emulation calls end with a loop of up to 3 iterations, to zero out trailing words, and the loop distribution pass turns them into calls of the memset builtin. Though short constant-length clearing memsets are usually dealt with efficiently, for non-constant-length ones, the options are setmemM, or a function calls. RISC-V doesn't have any setmemM pattern, so the loops above end up "optimized" into memset calls, incurring not only the overhead of an explicit call, but also discarding the information the compiler has about the alignment of the destination, and that the length is a multiple of the word alignment. This patch handles variable lengths with multiple conditional power-of-2-constant-sized stores-by-pieces, so as to reduce the overhead of length compares. It also changes the last copy-prop pass into ccp, so that pointer alignment and length's nonzero bits are detected and made available for the expander, even for ldist-introduced SSA_NAMEs. for gcc/ChangeLog * builtins.c (try_store_by_multiple_pieces): New. (expand_builtin_memset_args): Use it. If target_char_cast fails, proceed as for non-constant val. Pass len's ctz to... * expr.c (clear_storage_hints): ... this. Try store by multiple pieces after setmem. (clear_storage): Adjust. * expr.h (clear_storage_hints): Likewise. (try_store_by_multiple_pieces): Declare. * passes.def: Replace the last copy_prop with ccp.
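A minimal C sketch of the idea, assuming the residual length is word-aligned and at most a few words (as in the ldist-generated loops mentioned above); this illustrates the technique, it is not the GCC implementation:

    /* Clear 0..3 trailing words without calling memset: one conditional
       power-of-two-sized piece per bit of the length, so only two length
       tests are needed.  */
    static inline void
    clear_trailing_words (unsigned long *p, unsigned long nwords)
    {
      if (nwords & 2)
        {
          p[0] = 0;
          p[1] = 0;
          p += 2;
        }
      if (nwords & 1)
        p[0] = 0;
    }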
2021-04-30 | Update alignment_for_piecewise_move | H.J. Lu | 1 | -1/+1
alignment_for_piecewise_move is called only with MOVE_MAX_PIECES or STORE_MAX_PIECES, which are the number of bytes at a time that we can move or store efficiently. We should call mode_for_size without limiting it to MAX_FIXED_MODE_SIZE: that macro is an integer expression for the size in bits of the largest integer machine mode that should actually be used, and it may be smaller than MOVE_MAX_PIECES or STORE_MAX_PIECES, which may use vector modes. * expr.c (alignment_for_piecewise_move): Call mode_for_size without limit to MAX_FIXED_MODE_SIZE.
2021-04-29 | Generate offset adjusted operation for op_by_pieces operations | H.J. Lu | 1 | -20/+85
Add an overlap_op_by_pieces_p target hook for op_by_pieces operations between two areas of memory to generate one offset adjusted operation in the smallest integer mode for the remaining bytes on the last piece operation of a memory region to avoid doing more than one smaller operations. Pass the RTL information from the previous iteration to m_constfn in op_by_pieces operation so that builtin_memset_[read|gen]_str can generate the new RTL from the previous RTL. Tested on Linux/x86-64. gcc/ PR middle-end/90773 * builtins.c (builtin_memcpy_read_str): Add a dummy argument. (builtin_strncpy_read_str): Likewise. (builtin_memset_read_str): Add an argument for the previous RTL information and generate the new RTL from the previous RTL info. (builtin_memset_gen_str): Likewise. * builtins.h (builtin_strncpy_read_str): Update the prototype. (builtin_memset_read_str): Likewise. * expr.c (by_pieces_ninsns): If targetm.overlap_op_by_pieces_p() returns true, round up size and alignment to the widest integer mode for maximum size. (pieces_addr::adjust): Add a pointer to by_pieces_prev argument and pass it to m_constfn. (op_by_pieces_d): Add m_push and m_overlap_op_by_pieces. (op_by_pieces_d::op_by_pieces_d): Add a bool argument to initialize m_push. Initialize m_overlap_op_by_pieces with targetm.overlap_op_by_pieces_p (). (op_by_pieces_d::run): Pass the previous RTL information to pieces_addr::adjust and generate overlapping operations if m_overlap_op_by_pieces is true. (PUSHG_P): New. (move_by_pieces_d::move_by_pieces_d): Updated for op_by_pieces_d change. (store_by_pieces_d::store_by_pieces_d): Updated for op_by_pieces_d change. (can_store_by_pieces): Use by_pieces_constfn on constfun. (store_by_pieces): Use by_pieces_constfn on constfun. Updated for op_by_pieces_d change. (clear_by_pieces_1): Add a dummy argument. (clear_by_pieces): Updated for op_by_pieces_d change. (compare_by_pieces_d::compare_by_pieces_d): Likewise. (string_cst_read_str): Add a dummy argument. * expr.h (by_pieces_constfn): Add a dummy argument. (by_pieces_prev): New. * target.def (overlap_op_by_pieces_p): New target hook. * config/i386/i386.c (TARGET_OVERLAP_OP_BY_PIECES_P): New. * doc/tm.texi.in: Add TARGET_OVERLAP_OP_BY_PIECES_P. * doc/tm.texi: Regenerated. gcc/testsuite/ PR middle-end/90773 * g++.dg/pr90773-1.h: New test. * g++.dg/pr90773-1a.C: Likewise. * g++.dg/pr90773-1b.C: Likewise. * g++.dg/pr90773-1c.C: Likewise. * g++.dg/pr90773-1d.C: Likewise. * gcc.target/i386/pr90773-1.c: Likewise. * gcc.target/i386/pr90773-2.c: Likewise. * gcc.target/i386/pr90773-3.c: Likewise. * gcc.target/i386/pr90773-4.c: Likewise. * gcc.target/i386/pr90773-5.c: Likewise. * gcc.target/i386/pr90773-6.c: Likewise. * gcc.target/i386/pr90773-7.c: Likewise. * gcc.target/i386/pr90773-8.c: Likewise. * gcc.target/i386/pr90773-9.c: Likewise. * gcc.target/i386/pr90773-10.c: Likewise. * gcc.target/i386/pr90773-11.c: Likewise. * gcc.target/i386/pr90773-12.c: Likewise. * gcc.target/i386/pr90773-13.c: Likewise. * gcc.target/i386/pr90773-14.c: Likewise.
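For illustration only (a sketch, not generated code), the effect on a 13-byte copy with 8-byte pieces can be pictured in source terms as follows:

    /* Instead of an 8-byte, a 4-byte and a 1-byte move, emit two 8-byte moves
       where the second is shifted back so it ends at byte 13; the pieces
       overlap but the result is the same and one operation is saved.  */
    void
    copy13 (char *dst, const char *src)
    {
      __builtin_memcpy (dst, src, 8);           /* bytes 0..7  */
      __builtin_memcpy (dst + 5, src + 5, 8);   /* bytes 5..12 */
    }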
2021-04-27 | op_by_pieces_d::run: Change a while loop to a do-while loop | H.J. Lu | 1 | -23/+53
Change a while loop in op_by_pieces_d::run to a do-while loop to prepare for the offset-adjusted operation for the remaining bytes on the last piece operation of a memory region. PR middle-end/90773 * expr.c (op_by_pieces_d::get_usable_mode): New member function. (op_by_pieces_d::run): Change a while loop to a do-while loop.
2021-04-27 | expand: Expand x / y * y as x - x % y if the latter is cheaper [PR96696] | Jakub Jelinek | 1 | -58/+132
The following patch tests both x / y * y and x - x % y expansion for the former GIMPLE code and chooses the cheaper of those sequences. 2021-04-27 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/96696 * expr.c (expand_expr_divmod): New function. (expand_expr_real_2) <case TRUNC_DIV_EXPR>: Use it for truncations and divisions. Formatting fixes. <case MULT_EXPR>: Optimize x / y * y as x - x % y if the latter is cheaper. * gcc.target/i386/pr96696.c: New test.
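In source terms, the two forms being costed against each other are simply (hypothetical functions):

    /* Both compute the same value whenever the division is defined, because
       x == (x / y) * y + x % y; the expander now emits whichever sequence is
       cheaper on the target.  */
    int f1 (int x, int y) { return x / y * y; }
    int f2 (int x, int y) { return x - x % y; }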
2021-04-26 | Move gimplify_buildN API local to only remaining user | Richard Biener | 1 | -1/+3
This moves the legacy gimplify_buildN API to tree-vect-generic.c, its only user, and elides the gimplification step, making it a wrapper around gimple_build and adjusting tree_vec_extract for this. I've noticed that vector CTOR expansion doesn't deal with unfolded {}, and thus this makes it more resilient. I've also adjusted the match.pd vector CTOR extraction code to make sure it doesn't produce a CTOR when folding would make it a vector constant. 2021-04-15 Richard Biener <rguenther@suse.de> * tree-cfg.h (gimplify_build1): Remove. (gimplify_build2): Likewise. (gimplify_build3): Likewise. * tree-cfg.c (gimplify_build1): Move to tree-vect-generic.c. (gimplify_build2): Likewise. (gimplify_build3): Likewise. * tree-vect-generic.c (gimplify_build1): Move from tree-cfg.c. Modernize. (gimplify_build2): Likewise. (gimplify_build3): Likewise. (tree_vec_extract): Use resimplify with following SSA edges. (expand_vector_parallel): Avoid passing NULL size/bitpos to tree_vec_extract. * expr.c (store_constructor): Deal with zero-element CTORs. * match.pd (bit_field_ref <vector CTOR>): Make sure to produce vector constants when possible.
2021-04-10 | expand: Fix up LTO ICE with COMPOUND_LITERAL_EXPR [PR99849] | Jakub Jelinek | 1 | -1/+1
The gimplifier optimizes away COMPOUND_LITERAL_EXPRs, but they can remain in the form of ADDR_EXPR of COMPOUND_LITERAL_EXPRs in static initializers. By the TREE_STATIC check I meant to check that the underlying decl of the compound literal is a global rather than automatic variable which obviously can't be referenced in static initializers, but unfortunately with LTO it might end up in another partition and thus be DECL_EXTERNAL instead. 2021-04-10 Jakub Jelinek <jakub@redhat.com> PR lto/99849 * expr.c (expand_expr_addr_expr_1): Test is_global_var rather than just TREE_STATIC on COMPOUND_LITERAL_EXPR_DECLs. * gcc.dg/lto/pr99849_0.c: New test.
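A hedged example of the construct in question (not the PR testcase): the address of a compound literal used in a static initializer.

    /* The compound literal at file scope has static storage duration, so its
       address is a valid static initializer; with LTO its underlying decl may
       land in another partition and be DECL_EXTERNAL rather than TREE_STATIC,
       which is what the is_global_var check accounts for.  */
    static int *p = (int[]){ 1, 2, 3 };
    int first (void) { return p[0]; }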
2021-02-26 | middle-end/99281 - avoid bitfield stores into addressable types | Richard Biener | 1 | -1/+7
This avoids doing bitfield stores into the return object of calls when using return-slot optimization and the type is addressable. Instead we have to pass down the original target RTX to the call expansion which otherwise tries to create a new temporary. 2021-02-26 Richard Biener <rguenther@suse.de> PR middle-end/99281 * expr.c (store_field): For calls with return-slot optimization and addressable return type expand the store directly. * g++.dg/pr99218.C: New testcase.
2021-02-02 | PR target/98743: Fix ICE in convert_move for RISC-V | Kito Cheng | 1 | -0/+1
- Check that the mode of `from` is not BLKmode before calling store_expr; calling store_expr with BLKmode will cause an ICE. - Verified with riscv64, x86_64 and aarch64; no new regressions introduced. Note: this logic was introduced by 3e60ddeb8220ed388819bb3f14e8caa9309fd3c2, so I cc Jakub for review. Changes for V2: - Check the mode of `from` rather than the mode of `to`. - Verified on riscv64, x86_64 and aarch64 again. gcc/ChangeLog: PR target/98743 * expr.c: Check mode before calling store_expr. gcc/testsuite/ChangeLog: PR target/98743 * g++.dg/opt/pr98743.C: New.
2021-01-05 | expand: Fold x - y < 0 to x < y during expansion [PR94802] | Jakub Jelinek | 1 | -0/+41
My earlier patch to simplify x - y < 0 etc. for signed subtraction with undefined overflow into x < y in match.pd regressed some tests, even when it was guarded to be post-IPA, the following patch thus attempts to optimize that during expansion instead (which is the last time we can do it, afterwards we lose the information whether it was x - y < 0 or (int) ((unsigned) x - y) < 0 for which we couldn't optimize it. 2021-01-05 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/94802 * expr.h (maybe_optimize_sub_cmp_0): Declare. * expr.c: Include tree-pretty-print.h and flags.h. (maybe_optimize_sub_cmp_0): New function. (do_store_flag): Use it. * cfgexpand.c (expand_gimple_cond): Likewise. * gcc.target/i386/pr94802.c: New test. * gcc.dg/Wstrict-overflow-25.c: Remove xfail.
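For illustration, a hypothetical function with the shape of comparison this targets:

    /* With signed overflow undefined, "x - y < 0" is equivalent to "x < y";
       the rewrite is done at expansion time, the last point where it is still
       known whether the source really was x - y rather than a narrowed
       unsigned subtraction.  */
    int f (int x, int y) { return x - y < 0; }   /* may be expanded as x < y */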
2021-01-04 | Update copyright years. | Jakub Jelinek | 1 | -1/+1
2020-12-19 | expr: Fix up constant_byte_string bitfield handling [PR98366] | Jakub Jelinek | 1 | -112/+38
constant_byte_string now uses a convert_to_bytes function, which doesn't handle bitfields at all (don't punt on them, just puts them into wrong bits or bytes). Furthermore, I don't see a reason why that function should exist at all, it duplicates native_encode_initializer functionality. Except that native_encode_initializer punted on flexible array members and 2 tests in the testsuite relied on constant_byte_string handling those. So, this patch throws away convert_to_bytes, uses native_encode_initializer instead, but teaches it to handle flexible array members (only in the non-mask mode with off == -1 for now), furthermore, it adds various corner case checks that the old implementation was missing (like that STRING_CSTs use int as length and therefore we shouldn't try to build larger than that strings, or that native_encode*/native_interpret* APIs require sane host and target bytes (8-bit on both). 2020-12-19 Jakub Jelinek <jakub@redhat.com> PR middle-end/98366 * fold-const.c (native_encode_initializer): Don't try to memset more than total_bytes with off == -1 even if len is large. Handle flexible array member initializers if off == -1 and mask is NULL. * expr.c (convert_to_bytes): Remove. (constant_byte_string): Use native_encode_initializer instead of convert_to_bytes. Remove extraneous semicolon. Punt on various corner-cases the APIs don't handle, like sizes > INT_MAX, BITS_PER_UNIT != 8, CHAR_BIT != 8. * gcc.c-torture/execute/pr98366.c: New test.
2020-12-11 | expansion: Sign or zero extend on MEM_REF stores into SUBREG with … | Jakub Jelinek | 1 | -0/+24
SUBREG_PROMOTED_VAR_P [PR98190] Some targets decide to promote certain scalar variables to wider mode, so their DECL_RTL is a SUBREG with SUBREG_PROMOTED_VAR_P. When storing to such vars, store_expr takes care of sign or zero extending, but if we store e.g. through MEM_REF into them, no sign or zero extension happens and that leads to wrong-code e.g. on the following testcase on aarch64-linux. The following patch uses store_expr if we overwrite all the bits and it is not reversed storage order, i.e. something that store_expr handles normally, and otherwise (if the most significant bit is (or for pdp11 might be, but pdp11 doesn't promote) being modified), the code extends manually. 2020-12-11 Jakub Jelinek <jakub@redhat.com> PR middle-end/98190 * expr.c (expand_assignment): If to_rtx is a promoted SUBREG, ensure sign or zero extension either through use of store_expr or by extending manually. * gcc.dg/pr98190.c: New test.
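A hypothetical example of the affected pattern (details assumed, not the pr98190 testcase): a store through memory into a local whose DECL_RTL is a promoted subreg.

    /* On a target that promotes short locals to a wider register, the memcpy
       becomes a MEM_REF store into "y"; the later arithmetic relies on "y"
       being correctly zero-extended in that wider register.  */
    unsigned short
    f (const unsigned short *p)
    {
      unsigned short y;
      __builtin_memcpy (&y, p, sizeof y);
      return y / 3;
    }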
2020-12-02 | expansion: Fix up infinite recursion due to double-word modulo optimization | Jakub Jelinek | 1 | -1/+1
Jeff has reported that my earlier patch broke rl78-elf, e.g. with unsigned short foo (unsigned short x) { return x % 7; } when compiled with -O2 -mg14. The problem is that rl78 is a BITS_PER_WORD == 8 target which doesn't have 8-bit modulo or divmod optab, but has instead 16-bit divmod, so my patch attempted to optimize it, then called expand_divmod to do 8-bit modulo and that in turn tried to do 16-bit modulo again. The following patch fixes it in two ways. One is to not perform the optimization when we have {u,s}divmod_optab handler for the double-word mode, in that case it is IMHO better to just do whatever we used to do before. This alone should fix the infinite recursion. But I'd be afraid some other target might have similar problem and might not have a divmod pattern, but only say a library call. So the patch also introduces a methods argument to expand_divmod such that normally we allow everything that was allowed before (using libcalls and widening), but when called from these expand_doubleword*mod routines we restrict it to no widening and no libcalls. 2020-12-02 Jakub Jelinek <jakub@redhat.com> * expmed.h (expand_divmod): Only declare if GCC_OPTABS_H is defined. Add enum optabs_method argument defaulted to OPTAB_LIB_WIDEN. * expmed.c: Include expmed.h after optabs.h. (expand_divmod): Add methods argument, if it is not OPTAB_{,LIB_}WIDEN, don't choose a wider mode, and pass it to other calls instead of hardcoded OPTAB_LIB_WIDEN. Avoid emitting libcalls if not OPTAB_LIB or OPTAB_LIB_WIDEN. * optabs.c: Include expmed.h after optabs.h. (expand_doubleword_mod, expand_doubleword_divmod): Pass OPTAB_DIRECT as last argument to expand_divmod. (expand_binop): Punt if {s,u}divmod_optab has handler for double-word int_mode. * expr.c: Include expmed.h after optabs.h. * explow.c: Include expmed.h after optabs.h.
2020-11-21 | Additional small changes to support opaque modes | Aaron Sawdey | 1 | -0/+1
After building some larger codes using opaque types and some c++ codes using opaque types it became clear I needed to go through and look for places where opaque types and modes needed to be handled. A whole pile of one-liners. gcc/ * typeclass.h: Add opaque_type_class. * builtins.c (type_to_class): Identify opaque type class. * dwarf2out.c (is_base_type): Handle opaque types. (gen_type_die_with_usage): Handle opaque types. * expr.c (count_type_elements): Opaque types should never have initializers. * ipa-devirt.c (odr_types_equivalent_p): No type-specific handling for opaque types is needed as it eventually checks the underlying mode which is what is important. * tree-streamer.c (record_common_node): Handle opaque types. * tree.c (type_contains_placeholder_1): Handle opaque types. (type_cache_hasher::equal): No additional comparison needed for opaque types. gcc/c-family * c-pretty-print.c (c_pretty_printer::simple_type_specifier): Treat opaque types like other types. (c_pretty_printer::direct_abstract_declarator): Opaque types are supported types. gcc/c * c-aux-info.c (gen_type): Support opaque types. gcc/cp * error.c (dump_type): Handle opaque types. (dump_type_prefix): Handle opaque types. (dump_type_suffix): Handle opaque types. (dump_expr): Handle opaque types. * pt.c (tsubst): Allow opaque types in templates. (unify): Allow opaque types in templates. * typeck.c (structural_comptypes): Handle comparison of opaque types.
2020-11-19 | [2/3] [vect] Add widening add, subtract patterns | Joel Hutton | 1 | -0/+6
Add widening add, subtract patterns to tree-vect-patterns. Update the widened code of patterns that detect PLUS_EXPR to also detect WIDEN_PLUS_EXPR. These patterns take 2 vectors with N elements of size S and perform an add/subtract on the elements, storing the results as N elements of size 2*S (in 2 result vectors). This is implemented in the aarch64 backend as addl,addl2 and subl,subl2 respectively. Add aarch64 tests for patterns. gcc/ChangeLog: * doc/generic.texi: Document new widen_plus/minus_lo/hi tree codes. * doc/md.texi: Document new widenening add/subtract hi/lo optabs. * expr.c (expand_expr_real_2): Add widen_add, widen_subtract cases. * optabs-tree.c (optab_for_tree_code): Add case for widening optabs. * optabs.def (OPTAB_D): Define vectorized widen add, subtracts. * tree-cfg.c (verify_gimple_assign_binary): Add case for widening adds, subtracts. * tree-inline.c (estimate_operator_cost): Add case for widening adds, subtracts. * tree-vect-generic.c (expand_vector_operations_1): Add case for widening adds, subtracts * tree-vect-patterns.c (vect_recog_widen_add_pattern): New recog pattern. (vect_recog_widen_sub_pattern): New recog pattern. (vect_recog_average_pattern): Update widened add code. (vect_recog_average_pattern): Update widened add code. * tree-vect-stmts.c (vectorizable_conversion): Add case for widened add, subtract. (supportable_widening_operation): Add case for widened add, subtract. * tree.def (WIDEN_PLUS_EXPR): New tree code. (WIDEN_MINUS_EXPR): New tree code. (VEC_WIDEN_ADD_HI_EXPR): New tree code. (VEC_WIDEN_PLUS_LO_EXPR): New tree code. (VEC_WIDEN_MINUS_HI_EXPR): New tree code. (VEC_WIDEN_MINUS_LO_EXPR): New tree code. gcc/testsuite/ChangeLog: * gcc.target/aarch64/vect-widen-add.c: New test. * gcc.target/aarch64/vect-widen-sub.c: New test.
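A hedged example of the loop shape these patterns aim at (types and names chosen for illustration): each result element is twice the width of the input elements.

    /* Adds N 8-bit elements pairwise and stores N 16-bit results; on aarch64
       this is the kind of loop the addl/addl2-style widening additions cover.  */
    void
    widen_add (unsigned short *restrict r, const unsigned char *restrict a,
               const unsigned char *restrict b, int n)
    {
      for (int i = 0; i < n; i++)
        r[i] = (unsigned short) a[i] + (unsigned short) b[i];
    }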
2020-10-26 | middle-end/97521 - always use single-bit bools in mask vector types | Richard Biener | 1 | -35/+4
This makes us always use a single-bit boolean component type for integer mode mask VECTOR_BOOLEAN_TYPE_P to match the RTL and target representation. This avoids the need for magic translation and the inconsistencies from the translation requirement now that we expose temporaries of those types on the GIMPLE level. 2020-10-23 Richard Biener <rguenther@suse.de> PR middle-end/97521 * expr.c (const_scalar_mask_from_tree): Remove. (expand_expr_real_1): Always VIEW_CONVERT integer mode vector constants to an integer type. * tree.c (build_truth_vector_type_for_mode): Use a single-bit boolean component type for non-vector-mode mask_mode. * gcc.target/i386/pr97521.c: New testcase.
2020-10-23 | Revert "middle-end/97521 - fix VECTOR_CST expansion" | Richard Biener | 1 | -4/+1
2020-10-23 Richard Biener <rguenther@suse.de> PR middle-end/97521 * expr.c (expand_expr_real_1): Revert last change. * gcc.target/i386/pr97521.c: Remove. This reverts commit b960a9c83a93b58a84a7a370002990810675ac5d.
2020-10-22 | middle-end/97521 - fix VECTOR_CST expansion | Richard Biener | 1 | -1/+4
This fixes expansion of VECTOR_BOOLEAN_TYPE_P VECTOR_CSTs which when using an integer mode are not always "mask-mode" but may be using an integer mode when there's no supported vector mode. The patch makes sure to only go the mask-mode expansion if the elements do not line up to cover the full integer mode (when they do and the mode was an actual mask-mode there's no actual difference in both expansions). 2020-10-22 Richard Biener <rguenther@suse.de> PR middle-end/97521 * expr.c (expand_expr_real_1): Be more careful when expanding a VECTOR_BOOLEAN_TYPE_P VECTOR_CSTs. * gcc.target/i386/pr97521.c: New testcase.
2020-10-14 | PR target/96759 - Handle global variable assignment from misaligned … | Kito Cheng | 1 | -0/+2
structure/PARALLEL return values. In g:70cdb21e579191fe9f0f1d45e328908e59c0179e, misaligned stores to a DECL/global variable were handled, but PARALLEL values were not. Looking at the other parts of this function, PARALLEL values need to be handled by the emit_group_* functions, so add a check and use emit_group_store when storing a PARALLEL value; this change was also verified not to break the testcase (gcc.target/arm/unaligned-argument-3.c) added by the original change. For the riscv64 target, struct S {int a; double b;} is packed into a PARALLEL value for the return and has TImode when misaligned access is supported; however, TImode requires 16-byte alignment while the value is only 8-byte aligned, so it goes down the misaligned-stores path and then tries to generate a move instruction from a PARALLEL value. Tested on the following targets without introducing new regressions: - riscv32/riscv64 elf - x86_64-linux - arm-eabi v2 changes: - Use maybe_emit_group_store instead of emit_group_store. - Remove push_temp_slots/pop_temp_slots; emit_group_store only requires a stack temp slot when dst is CONCAT or PARALLEL, whereas maybe_emit_group_store will always use a REG for dst if needed. gcc/ChangeLog: PR target/96759 * expr.c (expand_assignment): Handle misaligned stores with PARALLEL value. gcc/testsuite/ChangeLog: PR target/96759 * g++.target/riscv/pr96759.C: New. * gcc.target/riscv/pr96759.c: New.
2020-09-25 | middle-end/96814 - fix VECTOR_BOOLEAN_TYPE_P CTOR RTL expansion | Richard Biener | 1 | -7/+11
The RTL expansion code for CTORs doesn't handle VECTOR_BOOLEAN_TYPE_P with bit-precision elements correctly as the testcase shows before the PR97085 fix. The following makes it do the correct thing (not 100% sure for CTOR of sub-vectors due to the lack of a testcase). The alternative would be to assert such CTORs do not happen (and also add IL verification for this). The GIMPLE FE needs a way to declare the VECTOR_BOOLEAN_TYPE_P vectors (thus the C FE needs that). 2020-09-25 Richard Biener <rguenther@suse.de> PR middle-end/96814 * expr.c (store_constructor): Handle VECTOR_BOOLEAN_TYPE_P CTORs correctly. * gcc.target/i386/pr96814.c: New testcase.
2020-08-27 | vec: add exact argument for various grow functions. | Martin Liska | 1 | -4/+4
gcc/ada/ChangeLog: * gcc-interface/trans.c (gigi): Set exact argument of a vector growth function to true. (Attribute_to_gnu): Likewise. gcc/ChangeLog: * alias.c (init_alias_analysis): Set exact argument of a vector growth function to true. * calls.c (internal_arg_pointer_based_exp_scan): Likewise. * cfgbuild.c (find_many_sub_basic_blocks): Likewise. * cfgexpand.c (expand_asm_stmt): Likewise. * cfgrtl.c (rtl_create_basic_block): Likewise. * combine.c (combine_split_insns): Likewise. (combine_instructions): Likewise. * config/aarch64/aarch64-sve-builtins.cc (function_expander::add_output_operand): Likewise. (function_expander::add_input_operand): Likewise. (function_expander::add_integer_operand): Likewise. (function_expander::add_address_operand): Likewise. (function_expander::add_fixed_operand): Likewise. * df-core.c (df_worklist_dataflow_doublequeue): Likewise. * dwarf2cfi.c (update_row_reg_save): Likewise. * early-remat.c (early_remat::init_block_info): Likewise. (early_remat::finalize_candidate_indices): Likewise. * except.c (sjlj_build_landing_pads): Likewise. * final.c (compute_alignments): Likewise. (grow_label_align): Likewise. * function.c (temp_slots_at_level): Likewise. * fwprop.c (build_single_def_use_links): Likewise. (update_uses): Likewise. * gcc.c (insert_wrapper): Likewise. * genautomata.c (create_state_ainsn_table): Likewise. (add_vect): Likewise. (output_dead_lock_vect): Likewise. * genmatch.c (capture_info::capture_info): Likewise. (parser::finish_match_operand): Likewise. * genrecog.c (optimize_subroutine_group): Likewise. (merge_pattern_info::merge_pattern_info): Likewise. (merge_into_decision): Likewise. (print_subroutine_start): Likewise. (main): Likewise. * gimple-loop-versioning.cc (loop_versioning::loop_versioning): Likewise. * gimple.c (gimple_set_bb): Likewise. * graphite-isl-ast-to-gimple.c (translate_isl_ast_node_user): Likewise. * haifa-sched.c (sched_extend_luids): Likewise. (extend_h_i_d): Likewise. * insn-addr.h (insn_addresses_new): Likewise. * ipa-cp.c (gather_context_independent_values): Likewise. (find_more_contexts_for_caller_subset): Likewise. * ipa-devirt.c (final_warning_record::grow_type_warnings): Likewise. (ipa_odr_read_section): Likewise. * ipa-fnsummary.c (evaluate_properties_for_edge): Likewise. (ipa_fn_summary_t::duplicate): Likewise. (analyze_function_body): Likewise. (ipa_merge_fn_summary_after_inlining): Likewise. (read_ipa_call_summary): Likewise. * ipa-icf.c (sem_function::bb_dict_test): Likewise. * ipa-prop.c (ipa_alloc_node_params): Likewise. (parm_bb_aa_status_for_bb): Likewise. (ipa_compute_jump_functions_for_edge): Likewise. (ipa_analyze_node): Likewise. (update_jump_functions_after_inlining): Likewise. (ipa_read_edge_info): Likewise. (read_ipcp_transformation_info): Likewise. (ipcp_transform_function): Likewise. * ipa-reference.c (ipa_reference_write_optimization_summary): Likewise. * ipa-split.c (execute_split_functions): Likewise. * ira.c (find_moveable_pseudos): Likewise. * lower-subreg.c (decompose_multiword_subregs): Likewise. * lto-streamer-in.c (input_eh_regions): Likewise. (input_cfg): Likewise. (input_struct_function_base): Likewise. (input_function): Likewise. * modulo-sched.c (set_node_sched_params): Likewise. (extend_node_sched_params): Likewise. (schedule_reg_moves): Likewise. * omp-general.c (omp_construct_simd_compare): Likewise. * passes.c (pass_manager::create_pass_tab): Likewise. (enable_disable_pass): Likewise. * predict.c (determine_unlikely_bbs): Likewise. * profile.c (compute_branch_probabilities): Likewise. 
* read-rtl-function.c (function_reader::parse_block): Likewise. * read-rtl.c (rtx_reader::read_rtx_code): Likewise. * reg-stack.c (stack_regs_mentioned): Likewise. * regrename.c (regrename_init): Likewise. * rtlanal.c (T>::add_single_to_queue): Likewise. * sched-deps.c (init_deps_data_vector): Likewise. * sel-sched-ir.c (sel_extend_global_bb_info): Likewise. (extend_region_bb_info): Likewise. (extend_insn_data): Likewise. * symtab.c (symtab_node::create_reference): Likewise. * tracer.c (tail_duplicate): Likewise. * trans-mem.c (tm_region_init): Likewise. (get_bb_regions_instrumented): Likewise. * tree-cfg.c (init_empty_tree_cfg_for_function): Likewise. (build_gimple_cfg): Likewise. (create_bb): Likewise. (move_block_to_fn): Likewise. * tree-complex.c (tree_lower_complex): Likewise. * tree-if-conv.c (predicate_rhs_code): Likewise. * tree-inline.c (copy_bb): Likewise. * tree-into-ssa.c (get_ssa_name_ann): Likewise. (mark_phi_for_rewrite): Likewise. * tree-object-size.c (compute_builtin_object_size): Likewise. (init_object_sizes): Likewise. * tree-predcom.c (initialize_root_vars_store_elim_1): Likewise. (initialize_root_vars_store_elim_2): Likewise. (prepare_initializers_chain_store_elim): Likewise. * tree-ssa-address.c (addr_for_mem_ref): Likewise. (multiplier_allowed_in_address_p): Likewise. * tree-ssa-coalesce.c (ssa_conflicts_new): Likewise. * tree-ssa-forwprop.c (simplify_vector_constructor): Likewise. * tree-ssa-loop-ivopts.c (addr_offset_valid_p): Likewise. (get_address_cost_ainc): Likewise. * tree-ssa-loop-niter.c (discover_iteration_bound_by_body_walk): Likewise. * tree-ssa-pre.c (add_to_value): Likewise. (phi_translate_1): Likewise. (do_pre_regular_insertion): Likewise. (do_pre_partial_partial_insertion): Likewise. (init_pre): Likewise. * tree-ssa-propagate.c (ssa_prop_init): Likewise. (update_call_from_tree): Likewise. * tree-ssa-reassoc.c (optimize_range_tests_cmp_bitwise): Likewise. * tree-ssa-sccvn.c (vn_reference_lookup_3): Likewise. (vn_reference_lookup_pieces): Likewise. (eliminate_dom_walker::eliminate_push_avail): Likewise. * tree-ssa-strlen.c (set_strinfo): Likewise. (get_stridx_plus_constant): Likewise. (zero_length_string): Likewise. (find_equal_ptrs): Likewise. (printf_strlen_execute): Likewise. * tree-ssa-threadedge.c (set_ssa_name_value): Likewise. * tree-ssanames.c (make_ssa_name_fn): Likewise. * tree-streamer-in.c (streamer_read_tree_bitfields): Likewise. * tree-vect-loop.c (vect_record_loop_mask): Likewise. (vect_get_loop_mask): Likewise. (vect_record_loop_len): Likewise. (vect_get_loop_len): Likewise. * tree-vect-patterns.c (vect_recog_mask_conversion_pattern): Likewise. * tree-vect-slp.c (vect_slp_convert_to_external): Likewise. (vect_bb_slp_scalar_cost): Likewise. (vect_bb_vectorization_profitable_p): Likewise. (vectorizable_slp_permutation): Likewise. * tree-vect-stmts.c (vectorizable_call): Likewise. (vectorizable_simd_clone_call): Likewise. (scan_store_can_perm_p): Likewise. (vectorizable_store): Likewise. * expr.c: Likewise. * vec.c (test_safe_grow_cleared): Likewise. * vec.h (vec_safe_grow): Likewise. (vec_safe_grow_cleared): Likewise. (vl_ptr>::safe_grow): Likewise. (vl_ptr>::safe_grow_cleared): Likewise. * config/c6x/c6x.c (insn_set_clock): Likewise. gcc/c/ChangeLog: * gimple-parser.c (c_parser_gimple_compound_statement): Set exact argument of a vector growth function to true. gcc/cp/ChangeLog: * class.c (build_vtbl_initializer): Set exact argument of a vector growth function to true. * constraint.cc (get_mapped_args): Likewise. 
* decl.c (cp_maybe_mangle_decomp): Likewise. (cp_finish_decomp): Likewise. * parser.c (cp_parser_omp_for_loop): Likewise. * pt.c (canonical_type_parameter): Likewise. * rtti.c (get_pseudo_ti_init): Likewise. gcc/fortran/ChangeLog: * trans-openmp.c (gfc_trans_omp_do): Set exact argument of a vector growth function to true. gcc/lto/ChangeLog: * lto-common.c (lto_file_finalize): Set exact argument of a vector growth function to true.
2020-08-18 | PR middle-end/96665 - memcmp of a constant string not folded | Martin Sebor | 1 | -8/+19
Related: PR middle-end/78257 - missing memcmp optimization with constant arrays gcc/ChangeLog: PR middle-end/96665 PR middle-end/78257 * expr.c (convert_to_bytes): Replace statically allocated buffer with a dynamically allocated one of sufficient size. gcc/testsuite/ChangeLog: PR middle-end/96665 PR middle-end/78257 * gcc.dg/memcmp-5.c: New test.
2020-08-14 | PR tree-optimization/78257 - missing memcmp optimization with constant arrays | Martin Sebor | 1 | -18/+162
gcc/ChangeLog: PR middle-end/78257 * builtins.c (expand_builtin_memory_copy_args): Rename called function. (expand_builtin_stpcpy_1): Remove argument from call. (expand_builtin_memcmp): Rename called function. (inline_expand_builtin_bytecmp): Same. * expr.c (convert_to_bytes): New function. (constant_byte_string): New function (formerly string_constant). (string_constant): Call constant_byte_string. (byte_representation): New function. * expr.h (byte_representation): Declare. * fold-const-call.c (fold_const_call): Rename called function. * fold-const.c (c_getstr): Remove an argument. (getbyterep): Define a new function. * fold-const.h (c_getstr): Remove an argument. (getbyterep): Declare a new function. * gimple-fold.c (gimple_fold_builtin_memory_op): Rename callee. (gimple_fold_builtin_string_compare): Same. (gimple_fold_builtin_memchr): Same. gcc/testsuite/ChangeLog: PR middle-end/78257 * gcc.dg/memchr.c: New test. * gcc.dg/memcmp-2.c: New test. * gcc.dg/memcmp-3.c: New test. * gcc.dg/memcmp-4.c: New test.
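A hedged illustration (not one of the new tests) of the folding this enables:

    /* Both operands are constant arrays, so their byte representations can be
       compared at compile time and the call folded to a constant.  */
    static const int a[] = { 1, 2, 3, 4 };
    static const int b[] = { 1, 2, 3, 4 };
    int f (void) { return __builtin_memcmp (a, b, sizeof a); }   /* folds to 0 */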
2020-08-11 | expr: Optimize noop copies [PR96539] | Jakub Jelinek | 1 | -0/+6
At GIMPLE e.g. for __builtin_memmove we optimize away (to just the return value) noop copies where src == dest, but at the RTL we don't, and as the testcase shows, in some cases such copies can appear only at the RTL level e.g. from trying to copy an aggregate by value argument to the same location as it already has. If the block move is expanded e.g. piecewise, we actually manage to optimize it away, as the individual memory copies are seen as noop moves, but if the target optabs are used, often the sequences stay until final. 2020-08-11 Jakub Jelinek <jakub@redhat.com> PR rtl-optimization/96539 * expr.c (emit_block_move_hints): Don't copy anything if x and y are the same and neither is MEM_VOLATILE_P. * gcc.target/i386/pr96539.c: New test.
2020-08-10 | Simplify X * C1 == C2 with wrapping overflow | Marc Glisse | 1 | -33/+1
Odd numbers are invertible in Z / 2^n Z, so X * C1 == C2 can be rewritten as X == C2 * inv(C1) when overflow wraps. mod_inv should probably be updated to better match the other wide_int functions, but that's a separate issue. 2020-08-10 Marc Glisse <marc.glisse@inria.fr> PR tree-optimization/95433 * match.pd (X * C1 == C2): Handle wrapping overflow. * expr.c (maybe_optimize_mod_cmp): Qualify call to mod_inv. (mod_inv): Move... * wide-int.cc (mod_inv): ... here. * wide-int.h (mod_inv): Declare it. * gcc.dg/tree-ssa/pr95433-2.c: New file.
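A worked example, assuming a 32-bit unsigned int (hypothetical function):

    /* 3 is odd, hence invertible mod 2^32: 3u * 0xAAAAAAABu == 1u.  Multiplying
       both sides of "x * 3u == 15u" by that inverse turns it into "x == 5u",
       leaving no multiplication at run time.  */
    int f (unsigned int x) { return x * 3u == 15u; }   /* becomes x == 5u */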
2020-07-27 | expr: build string_constant only for a char type | Martin Liska | 1 | -9/+14
gcc/ChangeLog: PR tree-optimization/96058 * expr.c (string_constant): Build string_constant only for a type that has same precision as char_type_node and is an integral type.
2020-07-22 | expr: Allow scalar_int_mode target mode when converting a constant | Jozef Lawrynowicz | 1 | -2/+2
is_int_mode does not allow MODE_PARTIAL_INT modes, so convert_modes was not allowing a constant value to be converted to a MODE_PARTIAL_INT for use as operand 2 in patterns such as ashlpsi3. The constant had to be copied into a register before it could be used, but now can be used directly as an operand without any copying. gcc/ChangeLog: * expr.c (convert_modes): Allow a constant integer to be converted to any scalar int mode.
2020-07-20 | Correct handling of constant representations containing embedded nuls. | Martin Sebor | 1 | -2/+2
Resolves: PR middle-end/95189 - memcmp being wrongly stripped like strcmp PR middle-end/95886 - suboptimal memcpy with embedded zero bytes gcc/ChangeLog: PR middle-end/95189 PR middle-end/95886 * builtins.c (inline_expand_builtin_string_cmp): Rename... (inline_expand_builtin_bytecmp): ...to this. (builtin_memcpy_read_str): Don't expect data to be nul-terminated. (expand_builtin_memory_copy_args): Handle object representations with embedded nul bytes. (expand_builtin_memcmp): Same. (expand_builtin_strcmp): Adjust call to naming change. (expand_builtin_strncmp): Same. * expr.c (string_constant): Create empty strings with nonzero size. * fold-const.c (c_getstr): Rename locals and update comments. * tree.c (build_string): Accept null pointer argument. (build_string_literal): Same. * tree.h (build_string): Provide a default. (build_string_literal): Same. gcc/testsuite/ChangeLog: PR middle-end/95189 PR middle-end/95886 * gcc.dg/memcmp-pr95189.c: New test. * gcc.dg/strncmp-3.c: New test. * gcc.target/i386/memcpy-pr95886.c: New test.
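A small hypothetical illustration of the embedded-nul case (not one of the new tests):

    /* The constant contains an embedded nul; memcmp must compare all four
       bytes of the representation rather than stopping at the first nul as a
       strcmp-style fold would.  */
    int f (const char *s) { return __builtin_memcmp (s, "a\0bc", 4); }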
2020-07-14 | expr: Unbreak build of mesa [PR96194] | Jakub Jelinek | 1 | -1/+3
> > The store to the whole of each volatile object was picked apart > > like there had been an individual assignment to each of the > > fields. Reads were added as part of that; see PR for details. > > The reads from volatile memory were a clear bug; individual > > stores questionable. A separate patch clarifies the docs. This breaks building of mesa on both the trunk and 10 branch. The problem is that the middle-end may never create temporaries of non-POD (TREE_ADDRESSABLE) types, those can be only created when the language says so and thus only the FE is allowed to create those. This patch just reverts the behavior to what we used to do before for the stores to volatile non-PODs. Perhaps we want to do something else, but definitely we can't create temporaries of the non-POD type. It is up to discussions on what should happen in those cases. 2020-07-14 Jakub Jelinek <jakub@redhat.com> PR middle-end/96194 * expr.c (expand_constructor): Don't create temporary for store to volatile MEM if exp has an addressable type. * g++.dg/opt/pr96194.C: New test.
2020-07-13 | PR94600: fix volatile access to the whole of a compound object. | Hans-Peter Nilsson | 1 | -1/+4
The store to the whole of each volatile object was picked apart like there had been an individual assignment to each of the fields. Reads were added as part of that; see PR for details. The reads from volatile memory were a clear bug; individual stores questionable. A separate patch clarifies the docs. gcc: 2020-07-09 Richard Biener <rguenther@suse.de> PR middle-end/94600 * expr.c (expand_constructor): Make a temporary also if we're storing to volatile memory. gcc/testsuite: 2020-07-09 Hans-Peter Nilsson <hp@axis.com> PR middle-end/94600 * gcc.dg/pr94600-1.c, gcc.dg/pr94600-2.c, gcc.dg/pr94600-3.c, gcc.dg/pr94600-4.c, gcc.dg/pr94600-5.c, gcc.dg/pr94600-6.c, gcc.dg/pr94600-7.c, gcc.dg/pr94600-8.c: New tests.
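A hedged sketch of the affected construct (the actual pr94600 tests differ in detail): assigning a compound literal to a volatile object.

    /* The whole-object store to "s" should not be picked apart into per-field
       read-modify-write accesses of the volatile memory; building the value in
       a temporary first keeps the volatile access whole.  */
    struct S { unsigned int a : 12; unsigned int b : 20; };
    volatile struct S s;
    void f (void) { s = (struct S){ 1, 2 }; }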
2020-07-10 | expr: Move reduce_bit_field target mode check [PR96151] | Richard Sandiford | 1 | -4/+5
In some cases, expand_expr_real_2 prefers to use the mode of the caller-suggested target instead of the mode of the expression when passing values to reduce_to_bit_field_precision. E.g.: else if (target == 0) op0 = convert_to_mode (mode, op0, TYPE_UNSIGNED (TREE_TYPE (treeop0))); else { convert_move (target, op0, TYPE_UNSIGNED (TREE_TYPE (treeop0))); op0 = target; } where “op0” might not have “mode” for the “else” branch, but does for all the others. reduce_to_bit_field_precision discards the suggested target if it has the wrong mode. This patch moves that to expand_expr_real_2 instead (conditional on reduce_bit_field). gcc/ PR middle-end/96151 * expr.c (expand_expr_real_2): When reducing bit fields, clear the target if it has a different mode from the expression. (reduce_to_bit_field_precision): Don't do that here. Instead assert that the target already has the correct mode.
2020-07-08expr: Fix REDUCE_BIT_FIELD for constants [PR95694]Richard Sandiford1-7/+8
This is yet another PR caused by constant integer rtxes not storing a mode. We were calling REDUCE_BIT_FIELD on a constant integer that didn't fit in poly_int64, and then tripped the as_a<scalar_int_mode> assert on VOIDmode. AFAICT REDUCE_BIT_FIELD is always passed rtxes that have TYPE_MODE (rather than some other mode) and it just fills in the redundant sign bits of that TYPE_MODE value. So it should be safe to get the mode from the type instead of the rtx. The patch does that and asserts that the modes agree, where information is available. That on its own is enough to fix the bug, but we might as well extend the folding case to all constant integers, not just those that fit poly_int64. gcc/ PR middle-end/95694 * expr.c (expand_expr_real_2): Get the mode from the type rather than the rtx, and assert that it is consistent with the mode of the rtx (where known). Optimize all constant integers, not just those that can be represented in poly_int64. gcc/testsuite/ PR middle-end/95694 * gcc.dg/pr95694.c: New test.
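The core of the fix can be restated as a short fragment sketch (this is not the committed hunk, only the idea as described in the ChangeLog, and it would only compile inside GCC itself): integer constant rtxes carry VOIDmode, so the mode used for the reduction has to come from the tree type, and it can only be cross-checked against the rtx when the rtx actually has a mode.

  /* Fragment sketch, GCC-internal context assumed; not the real patch.  */
  machine_mode mode = TYPE_MODE (type);
  gcc_assert (GET_MODE (op0) == VOIDmode || GET_MODE (op0) == mode);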
2020-06-17Lower VEC_COND_EXPR into internal functions.Martin Liska1-22/+3
gcc/ChangeLog: * Makefile.in: Add new file. * expr.c (expand_expr_real_2): Add gcc_unreachable as we should not meet this condition. (do_store_flag): Likewise. * gimplify.c (gimplify_expr): Gimplify first argument of VEC_COND_EXPR to be a SSA name. * internal-fn.c (vec_cond_mask_direct): New. (vec_cond_direct): Likewise. (vec_condu_direct): Likewise. (vec_condeq_direct): Likewise. (expand_vect_cond_optab_fn): New. (expand_vec_cond_optab_fn): Likewise. (expand_vec_condu_optab_fn): Likewise. (expand_vec_condeq_optab_fn): Likewise. (expand_vect_cond_mask_optab_fn): Likewise. (expand_vec_cond_mask_optab_fn): Likewise. (direct_vec_cond_mask_optab_supported_p): Likewise. (direct_vec_cond_optab_supported_p): Likewise. (direct_vec_condu_optab_supported_p): Likewise. (direct_vec_condeq_optab_supported_p): Likewise. * internal-fn.def (VCOND): New OPTAB. (VCONDU): Likewise. (VCONDEQ): Likewise. (VCOND_MASK): Likewise. * optabs.c (get_rtx_code): Make it global. (expand_vec_cond_mask_expr): Removed. (expand_vec_cond_expr): Removed. * optabs.h (expand_vec_cond_expr): Likewise. (vector_compare_rtx): Make it global. * passes.def: Add new pass_gimple_isel pass. * tree-cfg.c (verify_gimple_assign_ternary): Add check for VEC_COND_EXPR about first argument. * tree-pass.h (make_pass_gimple_isel): New. * tree-ssa-forwprop.c (pass_forwprop::execute): Prevent propagation of the first argument of a VEC_COND_EXPR. * tree-ssa-reassoc.c (ovce_extract_ops): Support SSA_NAME as first argument of a VEC_COND_EXPR. (optimize_vec_cond_expr): Likewise. * tree-vect-generic.c (expand_vector_divmod): Make SSA_NAME for a first argument of created VEC_COND_EXPR. (expand_vector_condition): Fix coding style. * tree-vect-stmts.c (vectorizable_condition): Gimplify first argument. * gimple-isel.cc: New file. gcc/testsuite/ChangeLog: * g++.dg/vect/vec-cond-expr-eh.C: New test.
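As a hedged, source-level picture of what gets lowered (a made-up example, not taken from the patch or its testsuite): with the GNU vector extension, a vector comparison feeding a vector select shows up as a VEC_COND_EXPR in GIMPLE, and after this change the new gimple-isel pass replaces it with a call to one of the VCOND*/VCOND_MASK internal functions before RTL expansion, instead of leaving the lowering to expr.c.

  // Made-up example using the GNU vector extension.
  typedef int v4si __attribute__ ((vector_size (16)));

  v4si
  select_min (v4si a, v4si b)
  {
    return a < b ? a : b;   // VEC_COND_EXPR <a < b, a, b> in GIMPLE
  }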
2020-06-05expand: Simplify removing subregs when expanding a copy [PR95254]Fei Yang1-0/+74
In rtl expand, if we have a copy that matches one of the following patterns:

  (set (subreg:M1 (reg:M2 ...)) (subreg:M1 (reg:M2 ...)))
  (set (subreg:M1 (reg:M2 ...)) (mem:M1 ADDR))
  (set (mem:M1 ADDR) (subreg:M1 (reg:M2 ...)))
  (set (subreg:M1 (reg:M2 ...)) (constant C))

where mode M1 is equal in size to M2, try to detect whether the mode change involves an implicit round trip through memory. If so, see if we can avoid that by removing the subregs and doing the move in mode M2 instead. 2020-06-05 Felix Yang <felix.yang@huawei.com> gcc/ PR target/95254 * expr.c (emit_move_insn): Check src and dest of the copy to see if one or both of them are subregs, try to remove the subregs when innermode and outermode are equal in size and the mode change involves an implicit round trip through memory. gcc/testsuite/ PR target/95254 * gcc.target/aarch64/pr95254.c: New test. * gcc.target/i386/pr67609.c: Check "movq\t%xmm0" instead of "movdqa".
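A hedged, user-level illustration of the kind of copy involved (hypothetical code, not the actual gcc.target/aarch64/pr95254.c or gcc.target/i386/pr67609.c tests): reinterpreting a 64-bit integer as a double is an equal-size mode change that, at the RTL level, can show up as one of the subreg patterns above; on targets where that subreg move would otherwise be implemented as a store and reload, emit_move_insn can now strip the subregs and perform the move in the inner mode instead.

  #include <cstring>

  static_assert (sizeof (unsigned long long) == sizeof (double),
                 "illustration assumes an equal-size reinterpretation");

  double
  bits_to_double (unsigned long long bits)
  {
    double d;
    std::memcpy (&d, &bits, sizeof d);   // DImode value viewed as DFmode
    return d;
  }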
2020-05-31expr: Fix fallout from optimize store_expr from STRING_CST [PR95052]Jakub Jelinek1-0/+5
> Can't hurt, and debugging the assert tripping is likely a hell of a lot easier > than debugging the resultant incorrect code. So if it passes, then I'd say go > for it. Testing passed, so I've committed it with those asserts (and thankfully I've added them!), but it apparently broke the Linux kernel build on arm. The problem is that if the STRING_CST is very short, then even though the full object has BLKmode, the shortened string could very well have QImode/HImode/SImode/DImode. In that case it wouldn't take the path that copies the string and then clears the remaining space, but different paths in which it would ICE because of those asserts, and without them it would just emit wrong code. The following patch fixes it by enforcing BLKmode for the string MEM, even if it is short, so that we copy it and memset the rest. 2020-05-31 Jakub Jelinek <jakub@redhat.com> PR middle-end/95052 * expr.c (store_expr): For shortened_string_cst, ensure temp has BLKmode. * gcc.dg/pr95052.c: New test.
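A hedged sketch of the breaking shape (made up; both the actual gcc.dg/pr95052.c test and the arm kernel code differ): once the padded STRING_CST has been trimmed, a literal of only a handful of bytes can be given a scalar integer mode such as SImode instead of BLKmode, which steers expansion away from the copy-then-clear path; forcing BLKmode keeps it on that path.

  extern void use (char *);

  void
  g (void)
  {
    char buf[64] = "abc";   /* 4 meaningful bytes after trimming, 60 to clear */
    use (buf);
  }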
2020-05-29expander: Optimize store_expr from STRING_CST [PR95052]Jakub Jelinek1-1/+33
In the following testcase, store_expr of e.g. a 97-byte string literal into a 1MB array is implemented by copying the 97 bytes from the .rodata section, followed by clearing the remaining bytes. But, as the STRING_CST has type char[1024*1024], we actually allocate the whole 1MB in the .rodata section for it, even though we only use the first 97 bytes of it. The following patch tweaks it so that if we are going to use only the small initial part of it, we don't emit all the trailing zeros that are never used. 2020-05-29 Jakub Jelinek <jakub@redhat.com> PR middle-end/95052 * expr.c (store_expr): If expr_size is constant and significantly larger than TREE_STRING_LENGTH, set temp to just the TREE_STRING_LENGTH portion of the STRING_CST. * gcc.target/i386/pr95052.c: New test.
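For a hedged picture of the optimization (illustrative only, not the actual gcc.target/i386/pr95052.c test): initializing a large local array from a short literal used to force the entire zero-padded STRING_CST into .rodata; with this change only the literal's meaningful bytes are kept and copied, and the remainder of the object is cleared.

  extern void use (char *);

  void
  f (void)
  {
    /* The initializer's STRING_CST has type char[1024*1024], but only a few
       dozen bytes of it are meaningful; the tail is cleared at run time
       instead of being materialized as a megabyte of zeros in .rodata.  */
    char buf[1024 * 1024]
      = "a short literal, much smaller than the one-megabyte destination";
    use (buf);
  }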
2020-04-16middle-end/94614 - avoid multiword moves to nothingRichard Biener1-0/+5
This adjusts emit_move_multi_word to handle moves into parts of a paradoxical subreg that are not there, and adjusts lower-subreg's CLOBBER resolving to deal with those as well. 2020-04-16 Richard Biener <rguenther@suse.de> PR middle-end/94614 * expr.c (emit_move_multi_word): Do not generate code when the destination part is undefined_operand_subword_p. * lower-subreg.c (resolve_clobber): Look through a paradoxical subreg.
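Going only by the ChangeLog entry, the change amounts to a guard in the word-by-word copy loop; a fragment sketch of the idea follows (GCC-internal context assumed, this is not the committed hunk, and the exact interface of undefined_operand_subword_p is assumed from its name): words of the destination that do not exist in the underlying register of a paradoxical subreg are simply skipped, since there is nothing to move into them.

  /* Fragment sketch only; the real loop in emit_move_multi_word differs.  */
  for (i = 0; i < nwords; i++)
    {
      if (undefined_operand_subword_p (x, i))
        continue;   /* destination word is not really there */
      emit_move_insn (operand_subword (x, i, 1, mode),
                      operand_subword_force (y, i, mode));
    }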