riscv-gnu-toolchain/gcc.git

Age	Commit message	Author	Files	Lines
2024-01-21	Re: [PATCH] Avoid ICE with m68k-elf -malign-int and libcalls	Mikael Pettersson	1	-2/+3
	>> emit_library_call_value_1 calls emit_push_insn with NULL_TREE >> for TYPE. Sometimes emit_push_insn needs to assign a temp with >> that TYPE, which causes a segfault. >> >> Fixed by computing the TYPE from MODE when needed. >> >> Original patch by Thorsten Otto. >> [ ... ] > This really needs to happen in the two call paths which pass in > NULL_TREE for the type. Note how the type is used to determine padding > earlier in emit_push_insn. That would also make the code more > consistent with the comment before emit_push_insn which implies that > both MODE and TYPE are valid. > > > Additionally you should bootstrap and regression test this patch on at > least one target. Updated as requested, and bootstrapped and tested on {x86_64,aarch64,m68k}-linux-gnu without regressions. gcc/ PR target/82420 PR target/111279 * calls.cc (emit_library_call_value_1): Pass valid TYPE to emit_push_insn. * expr.cc (emit_push_insn): Likewise. gcc/testsuite/ PR target/82420 * gcc.target/m68k/pr82420.c: New test. Co-authored-by: Thorsten Otto <admin@tho-otto.de>
2024-01-19	expansion: Fix ICEs with BLKmode VIEW_CONVERT_EXPR around non-BLKmode VAR_DECLs	Jakub Jelinek	1	-0/+4
	On aarch64 the backend decides to use non-BLKmode for some arrays like unsigned long[4] - OImode in that case, but the corresponding BITINT_TYPEs have BLKmode (like structures containing that many limb elements). This later causes ICEs during expansion when expanding VIEW_CONVERT_EXPR from non-BLKmode VAR_DECL to BLKmode BITINT_TYPE. The following fix contains two parts: the discover_nonconstant_array_refs_r change makes sure we force such variables into memory, and the expand_expr_real_1 change makes sure we don't try to extract a bitfield or something similar which doesn't really work for BLKmode - as op0 is a MEM, all we need is the op0 = adjust_address (op0, mode, 0); at the end to change the MEM's mode to BLKmode. 2024-01-19 Jakub Jelinek <jakub@redhat.com> Richard Biener <rguenther@suse.de> * cfgexpand.cc (discover_nonconstant_array_refs_r): Force non-BLKmode VAR_DECLs referenced in BLKmode VIEW_CONVERT_EXPRs into memory. * expr.cc (expand_expr_real_1) <case VIEW_CONVERT_EXPR>: Do nothing but adjust_address also for BLKmode mode and MEM op0.
2024-01-11	expr: Limit the store flag optimization for single bit to non-vectors [PR113322]	Andrew Pinski	1	-0/+2
	The problem here is after the recent vectorizer improvements, we end up with a comparison against a vector bool 0 which then tries expand_single_bit_test which is not expecting vector comparisons at all. The IR was: vector(4) <signed-boolean:1> mask_patt_5.13; _Bool _12; mask_patt_5.13_44 = vect_perm_even_41 != { 0.0, 1.0e+0, 2.0e+0, 3.0e+0 }; _12 = mask_patt_5.13_44 == { 0, 0, 0, 0 }; and we tried to call expand_single_bit_test for the last comparison. Rejecting the vector comparison is needed. Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR middle-end/113322 gcc/ChangeLog: * expr.cc (do_store_flag): Don't try single bit tests with comparison on vector types. gcc/testsuite/ChangeLog: * gcc.c-torture/compile/pr113322-1.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-01-11	middle-end/112740 - vector boolean CTOR expansion issue	Richard Biener	1	-3/+5
	The optimization to expand uniform boolean vectors by sign-extension works only for dense masks but it failed to check that. PR middle-end/112740 * expr.cc (store_constructor): Check the integer vector mask has a single bit per element before using sign-extension to expand a uniform vector. * gcc.dg/pr112740.c: New testcase.
2024-01-04	Improved RTL expansion of field assignments into promoted registers.	Roger Sayle	1	-5/+18
	This patch fixes PR rtl-optimization/104914 by tweaking/improving the way the fields are written into a pseudo register that needs to be kept sign extended. The motivating example from the bugzilla PR is: extern void ext(int); void foo(const unsigned char *buf) { int val; ((unsigned char*)&val)[0] = *buf++; ((unsigned char*)&val)[1] = *buf++; ((unsigned char*)&val)[2] = *buf++; ((unsigned char*)&val)[3] = *buf++; if(val > 0) ext(1); else ext(0); } which at the end of the tree optimization passes looks like: void foo (const unsigned char * buf) { int val; unsigned char _1; unsigned char _2; unsigned char _3; unsigned char _4; int val.5_5; <bb 2> [local count: 1073741824]: _1 = *buf_7(D); MEM[(unsigned char *)&val] = _1; _2 = MEM[(const unsigned char *)buf_7(D) + 1B]; MEM[(unsigned char *)&val + 1B] = _2; _3 = MEM[(const unsigned char *)buf_7(D) + 2B]; MEM[(unsigned char *)&val + 2B] = _3; _4 = MEM[(const unsigned char *)buf_7(D) + 3B]; MEM[(unsigned char *)&val + 3B] = _4; val.5_5 = val; if (val.5_5 > 0) goto <bb 3>; [59.00%] else goto <bb 4>; [41.00%] <bb 3> [local count: 633507681]: ext (1); goto <bb 5>; [100.00%] <bb 4> [local count: 440234144]: ext (0); <bb 5> [local count: 1073741824]: val ={v} {CLOBBER(eol)}; return; } Here four bytes are being sequentially written into the SImode value val. On some platforms, such as MIPS64, this SImode value is kept in a 64-bit register, suitably sign-extended. The function expand_assignment contains logic to handle this via SUBREG_PROMOTED_VAR_P (around line 6264 in expr.cc) which outputs an explicit extension operation after each store_field (typically insv) to such promoted/extended pseudos. The first observation is that there's no need to perform sign extension after each byte in the example above; the extension is only required after changes to the most significant byte (i.e. to a field that overlaps the most significant bit). 
The bug fix is actually a bit more subtle, but at this point during code expansion it's not safe to use a SUBREG when sign-extending this field. Currently, GCC generates (sign_extend:DI (subreg:SI (reg:DI) 0)) but combine (and other RTL optimizers) later realize that because SImode values are always sign-extended in their 64-bit hard registers that this is a no-op and eliminate it. The trouble is that it's unsafe to refer to the SImode lowpart of a 64-bit register using SUBREG at those critical points when temporarily the value isn't correctly sign-extended, and the usual backend invariants don't hold. At these critical points, the middle-end needs to use an explicit TRUNCATE rtx (as this isn't a TRULY_NOOP_TRUNCATION), so that the explicit sign-extension looks like (sign_extend:DI (truncate:SI (reg:DI))), which avoids the problem. 2024-01-04 Roger Sayle <roger@nextmovesoftware.com> Jeff Law <jlaw@ventanamicro.com> gcc/ChangeLog PR rtl-optimization/104914 * expr.cc (expand_assignment): When target is SUBREG_PROMOTED_VAR_P a sign or zero extension is only required if the modified field overlaps the SUBREG's most significant bit. On MODE_REP_EXTENDED targets, don't refer to the temporarily incorrectly extended value using a SUBREG, but instead generate an explicit TRUNCATE rtx.
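The first observation in the message above (re-extend only when the written field touches the most significant bit) can be modelled outside the compiler. The following is a minimal sketch, not GCC code: `field_overlaps_msb` and `store_byte` are hypothetical names, the 64-bit integer stands in for a promoted pseudo register, and byte index 0 is assumed to be the least significant byte.

```cpp
#include <cstdint>

// Hypothetical helper: does the field [bitpos, bitpos + bitsize) reach the
// most significant bit of a value that is `precision` bits wide?
constexpr bool field_overlaps_msb(unsigned bitpos, unsigned bitsize,
                                  unsigned precision) {
  return bitpos + bitsize >= precision;
}

// Model of a 32-bit value kept sign-extended in a 64-bit register.
// Writes one byte and re-extends only when the sign bit may have changed.
std::uint64_t store_byte(std::uint64_t reg, unsigned byte_index,
                         std::uint8_t value) {
  const unsigned bitpos = byte_index * 8;
  reg &= ~(std::uint64_t{0xff} << bitpos);          // clear the old byte
  reg |= std::uint64_t{value} << bitpos;            // insert the new byte
  if (field_overlaps_msb(bitpos, 8, 32)) {
    // Truncate to 32 bits first, then sign-extend back to 64 bits.
    const auto low = static_cast<std::int32_t>(static_cast<std::uint32_t>(reg));
    reg = static_cast<std::uint64_t>(static_cast<std::int64_t>(low));
  }
  return reg;
}
```

In this model, writes to bytes 0 through 2 leave the upper half untouched, and only a write to byte 3 pays for the extension. The explicit truncation before the extension loosely mirrors the TRUNCATE the commit introduces.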
2024-01-03	Update copyright years.	Jakub Jelinek	1	-1/+1

2023-12-11	-finline-stringops: avoid too-wide smallest_int_mode_for_size [PR112784]	Alexandre Oliva	1	-11/+9
	smallest_int_mode_for_size may abort when the requested mode is not available. Call int_mode_for_size instead, that signals the unsatisfiable request in a more graceful way. for gcc/ChangeLog PR middle-end/112784 * expr.cc (emit_block_move_via_loop): Call int_mode_for_size for maybe-too-wide sizes. (emit_block_cmp_via_loop): Likewise. for gcc/testsuite/ChangeLog PR middle-end/112784 * gcc.target/i386/avx512cd-inline-stringops-pr112784.c: New.
2023-12-11	expr: catch more `a*bool` while expanding [PR 112935]	Andrew Pinski	1	-2/+3
	After r14-1655-g52c92fb3f40050 (and the other commits which touch zero_one_valued_p), we end up with `bool * a` where the bool is an SSA name that might not have its non-zero bits set (to 0x1), even though the non-zero bits would be 0x1. In the case of coremark, it is only phiopt4 which adds the new SSA name, and nothing afterwards updates the non-zero bits on it. This fixes the regression by using gimple_zero_one_valued_p rather than tree_nonzero_bits to match the cases where the SSA_NAME didn't have the non-zero bits set. gimple_zero_one_valued_p handles one level of cast and also an `&`. Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: PR middle-end/112935 * expr.cc (expand_expr_real_2): Use gimple_zero_one_valued_p instead of tree_nonzero_bits to find boolean defined expressions. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
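For readers unfamiliar with why `a*bool` is worth recognising during expansion: when one operand is known to be 0 or 1, the multiplication can be replaced by a mask. The sketch below only illustrates that identity. `mul_by_zero_one` is a hypothetical name, and the code does not claim to show the RTL that GCC actually emits.

```cpp
#include <cstdint>

// Multiply `x` by `y` where `y` is known to be 0 or 1, without a multiply.
// Precondition (an assumption of this sketch): y is 0 or 1.
constexpr std::uint32_t mul_by_zero_one(std::uint32_t x, std::uint32_t y) {
  // 0 - 1 wraps to all ones, 0 - 0 stays zero, so the AND selects x or 0.
  return x & (0u - y);
}

static_assert(mul_by_zero_one(42u, 1u) == 42u * 1u, "y == 1 keeps x");
static_assert(mul_by_zero_one(42u, 0u) == 42u * 0u, "y == 0 yields 0");
```

The identity only holds when the second operand really is zero-or-one valued, which is why the commit cares about how reliably that property is detected.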
2023-12-07	expr: Handle BITINT_TYPE in count_type_elements [PR112881]	Jakub Jelinek	1	-0/+1
	The following testcase ICEs during gimplification, because count_type_elements doesn't handle BITINT_TYPE. It should handle it like other integral types. 2023-12-07 Jakub Jelinek <jakub@redhat.com> PR middle-end/112881 * expr.cc (count_type_elements): Handle BITINT_TYPE like INTEGER_TYPE. * gcc.dg/bitint-50.c: New test.
2023-11-29	Introduce -finline-stringops	Alexandre Oliva	1	-17/+379
	try_store_by_multiple_pieces was added not long ago, enabling variable-sized memset to be expanded inline when the worst-case in-range constant length would, using conditional blocks with powers of two to cover all possibilities of length and alignment. This patch introduces -finline-stringops[=fn] to request expansions to start with a loop, so as to still take advantage of known alignment even with long lengths, but without necessarily adding store blocks for every power of two. This makes it possible for the supported stringops (memset, memcpy, memmove, memcmp) to be expanded, even if storing a single byte per iteration. Surely efficient implementations can run faster, with a pre-loop to increase alignment, but that would likely be excessive for inline expansions. Still, in some cases, such as in freestanding environments, users prefer to inline such stringops, especially those that the compiler may introduce itself, even if the expansion is not as performant as a highly optimized C library implementation could be, to avoid depending on a C runtime library. for gcc/ChangeLog * expr.cc (emit_block_move_hints): Take ctz of len. Obey -finline-stringops. Use oriented or sized loop. (emit_block_move): Take ctz of len, and pass it on. (emit_block_move_via_sized_loop): New. (emit_block_move_via_oriented_loop): New. (emit_block_move_via_loop): Take incr. Move an incr-sized block per iteration. (emit_block_cmp_via_cmpmem): Take ctz of len. Obey -finline-stringops. (emit_block_cmp_via_loop): New. * expr.h (emit_block_move): Add ctz of len defaulting to zero. (emit_block_move_hints): Likewise. (emit_block_cmp_hints): Likewise. * builtins.cc (expand_builtin_memory_copy_args): Pass ctz of len to emit_block_move_hints. (try_store_by_multiple_pieces): Support starting with a loop. (expand_builtin_memcmp): Pass ctz of len to emit_block_cmp_hints. (expand_builtin): Allow inline expansion of memset, memcpy, memmove and memcmp if requested. * common.opt (finline-stringops): New. 
(ilsop_fn): New enum. * flag-types.h (enum ilsop_fn): New. * doc/invoke.texi (-finline-stringops): Add. for gcc/testsuite/ChangeLog * gcc.dg/torture/inline-mem-cmp-1.c: New. * gcc.dg/torture/inline-mem-cpy-1.c: New. * gcc.dg/torture/inline-mem-cpy-cmp-1.c: New. * gcc.dg/torture/inline-mem-move-1.c: New. * gcc.dg/torture/inline-mem-set-1.c: New.
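The ChangeLog above mentions a loop that moves an incr-sized block per iteration, with the block size informed by the count of trailing zeros (ctz) of the length. A rough source-level analogue is sketched below. It is an illustration under assumptions, not the expander's logic: `block_move_via_loop` here is an ordinary function with a hypothetical signature, the 8-byte cap is arbitrary, and the caller must guarantee that `len` is a multiple of `1 << ctz_len`.

```cpp
#include <cstddef>
#include <cstring>

// Copy `len` bytes in a plain loop, one fixed-size block per iteration.
// `ctz_len` is the number of low bits of `len` known to be zero, so `len`
// is a multiple of (1 << ctz_len); that lets the loop use wider blocks
// without needing a tail. The 8-byte cap is an assumption of this sketch.
void block_move_via_loop(unsigned char *dst, const unsigned char *src,
                         std::size_t len, unsigned ctz_len) {
  const std::size_t incr = std::size_t{1} << (ctz_len < 3 ? ctz_len : 3);
  for (std::size_t off = 0; off < len; off += incr)
    std::memcpy(dst + off, src + off, incr);  // one incr-sized block
}
```

With `ctz_len == 0` the loop degenerates to one byte per iteration, which matches the message's remark that expansion is still possible in that case, just slowly.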
2023-11-24	Clean up by_pieces_ninsns	Haochen Gui	1	-11/+8
	The by pieces compare can be implemented by overlapped operations, so it should be taken into consideration when doing the adjustment for overlap operations. The mode returned from widest_fixed_size_mode_for_size is already checked with mov_optab in by_pieces_mode_supported_p called by widest_fixed_size_mode_for_size, so there is no need to check mov_optab again in by_pieces_ninsns. The patch fixes these issues. gcc/ * expr.cc (by_pieces_ninsns): Include by pieces compare when doing the adjustment for overlap operations. Replace mov_optab checks with gcc assertion.
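To make "overlapped operations" concrete: a 7-byte equality compare can be done with two 4-byte pieces if the second piece is allowed to re-read one byte already covered by the first. The sketch below shows the idea in portable C++. `equal_by_overlapping_pieces` is a hypothetical name, the 4-byte piece size is fixed for simplicity, and it assumes `len >= 4`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Equality-compare `len` bytes using only 4-byte pieces (requires len >= 4).
// A tail shorter than one piece is handled by re-reading the final 4 bytes,
// which overlaps the previous piece instead of falling back to narrower ones.
bool equal_by_overlapping_pieces(const unsigned char *a,
                                 const unsigned char *b, std::size_t len) {
  constexpr std::size_t piece = sizeof(std::uint32_t);
  auto piece_equal = [&](std::size_t off) {
    std::uint32_t x, y;
    std::memcpy(&x, a + off, piece);
    std::memcpy(&y, b + off, piece);
    return x == y;
  };
  std::size_t off = 0;
  for (; off + piece <= len; off += piece)
    if (!piece_equal(off))
      return false;
  // Overlapping tail: compare the last `piece` bytes of the buffers.
  if (off < len && !piece_equal(len - piece))
    return false;
  return true;
}
```

For `len == 7` this performs two piece compares rather than the three (4 + 2 + 1) a non-overlapping decomposition would need, which is the kind of instruction-count difference the adjustment in by_pieces_ninsns accounts for.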
2023-11-23	expr: Fix &bitint_var handling in initializers [PR112336]	Jakub Jelinek	1	-0/+1
	As the following testcase shows, we ICE when trying to emit ADDR_EXPR of a bitint variable which doesn't have mode width. The problem is in the EXTEND_BITINT stuff which makes sure we treat the padding bits on memory reads from user bitint vars as undefined. When expanding ADDR_EXPR on such vars outside of initializers, expand_expr_addr* uses EXPAND_CONST_ADDRESS modifier and EXTEND_BITINT does nothing, but in initializers it keeps using EXPAND_INITIALIZER modifier. So, we need to treat EXPAND_INITIALIZER the same as EXPAND_CONST_ADDRESS for this regard. 2023-11-23 Jakub Jelinek <jakub@redhat.com> PR middle-end/112336 * expr.cc (EXTEND_BITINT): Don't call reduce_to_bit_field_precision if modifier is EXPAND_INITIALIZER. * gcc.dg/bitint-41.c: New test.
2023-11-14	Fix ICE generating uniform vector masks	Andrew Stubbs	1	-1/+1
	Most targets have an "and" instruction for their vector mask size, but RISC-V only has DImode "and". Fixed by allowing wider instruction modes. gcc/ChangeLog: PR target/112481 * expr.cc (store_constructor): Use OPTAB_WIDEN for mask adjustment.
2023-11-10	vect: Don't set excess bits in uniform masks	Andrew Stubbs	1	-2/+14
	AVX ignores any excess bits in the mask (at least for vector sizes >=8), but AMD GCN magically uses a larger vector than was intended (the smaller sizes are "fake"), leading to wrong-code. This patch fixes amdgcn execution failures in gcc.dg/vect/pr81740-1.c, gfortran.dg/c-interop/contiguous-1.f90, gfortran.dg/c-interop/ff-descriptor-7.f90, and others. gcc/ChangeLog: * expr.cc (store_constructor): Add "and" operation to uniform mask generation.
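The uniform-mask commits in this log (sign-extension for dense masks, then an added "and" to drop excess bits) can be pictured with ordinary integers. The following is a minimal sketch assuming a dense mask with one bit per element and at most 64 elements. `uniform_mask` is a hypothetical name and the code is not taken from store_constructor.

```cpp
#include <cstdint>

// Integer mask for a boolean vector of `nunits` one-bit elements that are
// all equal to `value`. Assumes 1 <= nunits <= 64.
constexpr std::uint64_t uniform_mask(bool value, unsigned nunits) {
  // Step 1: "sign-extend" the boolean to 0 or all ones.
  // Step 2: AND away the bits beyond the vector's element count, so a wider
  //         register never carries set bits for elements that do not exist.
  return (std::uint64_t{0} - static_cast<std::uint64_t>(value)) &
         (nunits >= 64 ? ~std::uint64_t{0}
                       : ((std::uint64_t{1} << nunits) - 1));
}

static_assert(uniform_mask(true, 4) == 0xF, "only four bits set");
static_assert(uniform_mask(false, 4) == 0, "all clear");
```

Without the second step, a 4-element true mask would come out as all ones in the containing integer. The commit message says AVX tolerates such excess bits, while amdgcn does not.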
2023-10-30	Expand: Checking available optabs for scalar modes in by pieces operations	Haochen Gui	1	-10/+13
	The former patch (f08ca5903c7) examines the scalar modes by target hook scalar_mode_supported_p. It causes some i386 regression cases as XImode and OImode are not enabled in i386 target function. This patch examines the scalar mode by checking if the corresponding optabs are available for the mode. gcc/ PR target/111449 * expr.cc (qi_vector_mode_supported_p): Rename to... (by_pieces_mode_supported_p): ...this, and extends it to do the checking for both scalar and vector mode. (widest_fixed_size_mode_for_size): Call by_pieces_mode_supported_p to examine the mode. (op_by_pieces_d::smallest_fixed_size_mode_for_size): Likewise.
2023-10-23	Expand: Enable vector mode for by pieces compares	Haochen Gui	1	-34/+61
	Vector mode compare instructions are efficient for equality compare on rs6000. This patch refactors the codes of by pieces operation to enable vector mode for compare. gcc/ PR target/111449 * expr.cc (can_use_qi_vectors): New function to return true if we know how to implement OP using vectors of bytes. (qi_vector_mode_supported_p): New function to check if optabs exists for the mode and certain by pieces operations. (widest_fixed_size_mode_for_size): Replace the second argument with the type of by pieces operations. Call can_use_qi_vectors and qi_vector_mode_supported_p to do the check. Call scalar_mode_supported_p to check if the scalar mode is supported. (by_pieces_ninsns): Pass the type of by pieces operation to widest_fixed_size_mode_for_size. (class op_by_pieces_d): Remove m_qi_vector_mode. Add m_op to record the type of by pieces operations. (op_by_pieces_d::op_by_pieces_d): Change last argument to the type of by pieces operations, initialize m_op with it. Pass m_op to function widest_fixed_size_mode_for_size. (op_by_pieces_d::get_usable_mode): Pass m_op to function widest_fixed_size_mode_for_size. (op_by_pieces_d::smallest_fixed_size_mode_for_size): Call can_use_qi_vectors and qi_vector_mode_supported_p to do the check. (op_by_pieces_d::run): Pass m_op to function widest_fixed_size_mode_for_size. (move_by_pieces_d::move_by_pieces_d): Set m_op to MOVE_BY_PIECES. (store_by_pieces_d::store_by_pieces_d): Set m_op with the op. (can_store_by_pieces): Pass the type of by pieces operations to widest_fixed_size_mode_for_size. (clear_by_pieces): Initialize class store_by_pieces_d with CLEAR_BY_PIECES. (compare_by_pieces_d::compare_by_pieces_d): Set m_op to COMPARE_BY_PIECES.
2023-10-18	Fix expansion of `(a & 2) != 1`	Andrew Pinski	1	-4/+5
	I had a thinko in r14-1600-ge60593f3881c72a96a3fa4844d73e8a2cd14f670 where we would remove the `& CST` part if we ended up not calling expand_single_bit_test. This fixes the problem by introducing a new variable that will be used for calling expand_single_bit_test. As far as I know this can only show up when disabling optimization passes as the above form would have been optimized away. Committed as obvious after a bootstrap/test on x86_64-linux-gnu. PR middle-end/111863 gcc/ChangeLog: * expr.cc (do_store_flag): Don't overwrite arg0 when stripping off `& POW2`. gcc/testsuite/ChangeLog: * gcc.c-torture/execute/pr111863-1.c: New test.
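A small worked example shows why dropping the `& CST` part changes the result. Since `a & 2` can only be 0 or 2, the expression `(a & 2) != 1` is always true, whereas the comparison left over after stripping the mask, `a != 1`, is false for `a == 1`. The function names below are hypothetical, and the second function is only a reading of what the overwritten operand would amount to, not a quote of the miscompiled code.

```cpp
// The expression from the commit title: always true, because a & 2 is 0 or 2.
constexpr bool with_mask(unsigned a) { return (a & 2) != 1; }

// What remains if the `& 2` is stripped but the comparison is kept
// (an assumed model of the bug, for illustration only).
constexpr bool mask_stripped(unsigned a) { return a != 1; }

static_assert(with_mask(0) && with_mask(1) && with_mask(2) && with_mask(3),
              "(a & 2) != 1 holds for every a");
static_assert(!mask_stripped(1), "the stripped form disagrees at a == 1");
```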
2023-10-16	expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg [target/111466]	Vineet Gupta	1	-7/+0
	RISC-V suffers from extraneous sign extensions, despite/given the ABI guarantee that 32-bit quantities are sign-extended into 64-bit registers, meaning incoming SI function args need not be explicitly sign extended (so do SI return values as most ALU insns implicitly sign-extend too.) Existing REE doesn't seem to handle this well and there are various ideas floating around to smarten REE about it. RISC-V also seems to correctly implement middle-end hook PROMOTE_MODE etc. Another approach would be to prevent EXPAND from generating the sign_extend in the first place which this patch tries to do. The hunk being removed was introduced way back in 1994 as 5069803972 ("expand_expr, case CONVERT_EXPR .. clear the promotion flag") This survived full testsuite run for RISC-V rv64gc with surprisingly no fallouts: test results before/after are exactly same. | | # of unexpected case / # of unique unexpected case | | gcc | g++ | gfortran | | rv64imafdc_zba_zbb_zbs_zicond/| 264 / 87 | 5 / 2 | 72 / 12 | | lp64d/medlow Granted for something so old to have survived, there must be a valid reason. Unfortunately the original change didn't have additional commentary or a test case. That is not to say it can't/won't possibly break things on other arches/ABIs, hence the RFC for someone to scream that this is just bonkers, don't do this 🙂 I've explicitly CC'ed Jakub and Roger who have last touched subreg promoted notes in expr.cc for insight and/or screaming 😉 Thanks to Robin for narrowing this down in an amazing debugging session @ GNU Cauldron. ``` foo2: sext.w a6,a1 <-- this goes away beq a1,zero,.L4 li a5,0 li a0,0 .L3: addw a4,a2,a5 addw a5,a3,a5 addw a0,a4,a0 bltu a5,a6,.L3 ret .L4: li a0,0 ret ``` Signed-off-by: Vineet Gupta <vineetg@rivosinc.com> Co-developed-by: Robin Dapp <rdapp.gcc@gmail.com> PR target/111466 gcc/ * expr.cc (expand_expr_real_2): Do not clear SUBREG_PROMOTED_VAR_P. gcc/testsuite * gcc.target/riscv/pr111466.c: New test.
2023-09-29	Remove poly_int_pod	Richard Sandiford	1	-4/+4
	poly_int was written before the switch to C++11 and so couldn't use explicit default constructors. This led to an awkward split between poly_int_pod and poly_int. poly_int simply inherited from poly_int_pod and added constructors, with the argumentless constructor having an empty body. But inheritance meant that poly_int had to repeat the assignment operators from poly_int_pod (again, no C++11, so no "using" to inherit base-class implementations). All that goes away if we switch to using default constructors. The main complication is ensuring that braced initialisation still gives a constexpr, so that static variables can be initialised without runtime code. The two problems here are: (1) When initialising a poly_int<N, wide_int> with fewer than N coefficients, the other coefficients need to be a zero of the same precision as the explicit coefficients. This was previously done in a for loop using wi::ints_for<...>::zero, but C++11 constexpr constructors can't have function bodies. The patch instead uses a series of delegated initialisers to fill in the implicit coefficients. (2) The initialisation in: void f(int x) { unsigned int foo {x}; } produces the warning: warning: narrowing conversion of 'x' from 'int' to 'unsigned int' [-Wnarrowing] whereas: void f(int x) { unsigned int foo = x; } does not. So switching to direct initialisation of the coeffs array would mean that: poly_uint64_t x = 0; would trigger a warning for using 0 rather than 0u. That seemed overly pedantic, so the patch adds explicit casts to the constructor. The complication is to do that without adding extra code to wide-int versions. The patch uses a new init_cast type for that. gcc/ * poly-int.h (poly_int_pod): Delete. (poly_coeff_traits::init_cast): New type. (poly_int_full, poly_int_hungry, poly_int_fullness): New structures. (poly_int): Replace constructors that take 1 and 2 coefficients with a general one that takes an arbitrary number of coefficients. 
Delegate initialization to two new private constructors, one of which uses the coefficients as-is and one of which adds an extra zero of the appropriate type (and precision, where applicable). (gt_ggc_mx, gt_pch_nx): Operate on poly_ints rather than poly_int_pods. * poly-int-types.h (poly_uint16_pod, poly_int64_pod, poly_uint64_pod) (poly_offset_int_pod, poly_wide_int_pod, poly_widest_int_pod): Delete. * gengtype.cc (main): Don't register poly_int64_pod. * calls.cc (initialize_argument_information): Use poly_int rather than poly_int_pod. (combine_pending_stack_adjustment_and_call): Likewise. * config/aarch64/aarch64.cc (pure_scalable_type_info): Likewise. * data-streamer.h (bp_unpack_poly_value): Likewise. * dwarf2cfi.cc (struct dw_trace_info): Likewise. (struct queued_reg_save): Likewise. * dwarf2out.h (struct dw_cfa_location): Likewise. * emit-rtl.h (struct incoming_args): Likewise. (struct rtl_data): Likewise. * expr.cc (get_bit_range): Likewise. (get_inner_reference): Likewise. * expr.h (get_bit_range): Likewise. * fold-const.cc (split_address_to_core_and_offset): Likewise. (ptr_difference_const): Likewise. * fold-const.h (ptr_difference_const): Likewise. * function.cc (try_fit_stack_local): Likewise. (instantiate_new_reg): Likewise. * function.h (struct expr_status): Likewise. (struct args_size): Likewise. * genmodes.cc (ZERO_COEFFS): Likewise. (mode_size_inline): Likewise. (mode_nunits_inline): Likewise. (emit_mode_precision): Likewise. (emit_mode_size): Likewise. (emit_mode_nunits): Likewise. * gimple-fold.cc (get_base_constructor): Likewise. * gimple-ssa-store-merging.cc (struct symbolic_number): Likewise. * inchash.h (class hash): Likewise. * ipa-modref-tree.cc (modref_access_node::dump): Likewise. * ipa-modref.cc (modref_access_analysis::merge_call_side_effects): Likewise. * ira-int.h (ira_spilled_reg_stack_slot): Likewise. * lra-eliminations.cc (self_elim_offsets): Likewise. * machmode.h (mode_size, mode_precision, mode_nunits): Likewise. 
* omp-low.cc (omplow_simd_context): Likewise. * pretty-print.cc (pp_wide_integer): Likewise. * pretty-print.h (pp_wide_integer): Likewise. * reload.cc (struct decomposition): Likewise. * reload.h (struct reload): Likewise. * reload1.cc (spill_stack_slot_width): Likewise. (struct elim_table): Likewise. (offsets_at): Likewise. (init_eliminable_invariants): Likewise. * rtl.h (union rtunion): Likewise. (poly_int_rtx_p): Likewise. (strip_offset): Likewise. (strip_offset_and_add): Likewise. * rtlanal.cc (strip_offset): Likewise. * tree-dfa.cc (get_ref_base_and_extent): Likewise. (get_addr_base_and_unit_offset_1): Likewise. (get_addr_base_and_unit_offset): Likewise. * tree-dfa.h (get_ref_base_and_extent): Likewise. (get_addr_base_and_unit_offset_1): Likewise. (get_addr_base_and_unit_offset): Likewise. * tree-ssa-loop-ivopts.cc (struct iv_use): Likewise. (strip_offset): Likewise. * tree-ssa-sccvn.h (struct vn_reference_op_struct): Likewise. * tree.cc (ptrdiff_tree_p): Likewise. * tree.h (poly_int_tree_p): Likewise. (ptrdiff_tree_p): Likewise. (get_inner_reference): Likewise. gcc/testsuite/ * gcc.dg/plugin/poly-int-tests.h (test_num_coeffs_extra): Use poly_int rather than poly_int_pod.
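The two techniques the poly_int message describes (delegating constructors that fill in missing coefficients with zero, and explicit casts so that `x = 0` does not trip -Wnarrowing) can be shown on a toy two-coefficient class. This is a simplified sketch, not poly-int.h: `poly2` and its members are hypothetical, and the real patch uses a variadic constructor and an init_cast trait that are not reproduced here.

```cpp
// Toy stand-in for a two-coefficient poly_int (hypothetical, simplified).
template <typename C>
struct poly2 {
  C coeffs[2];

  poly2() = default;  // defaulted constructor, as the commit message describes

  // One explicit coefficient: delegate, supplying a zero for the other one.
  template <typename T>
  constexpr poly2(T c0) : poly2(c0, C(0)) {}

  // Both coefficients given: brace-initialise the array, with explicit casts
  // so that e.g. an int literal initialising an unsigned coefficient is not
  // treated as a narrowing conversion.
  template <typename T0, typename T1>
  constexpr poly2(T0 c0, T1 c1)
      : coeffs{static_cast<C>(c0), static_cast<C>(c1)} {}
};

// Usable in constant expressions even though no constructor has a body.
static_assert(poly2<unsigned>(5).coeffs[0] == 5, "explicit coefficient");
static_assert(poly2<unsigned>(5).coeffs[1] == 0, "implicit zero coefficient");
```

With this shape, `poly2<unsigned> x = 0;` compiles without a narrowing diagnostic, which is the behaviour the message wants for `poly_uint64_t x = 0;`.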
2023-09-29	Simplify & expand c_readstr	Richard Sandiford	1	-5/+2
	c_readstr only operated on integer modes. It worked by reading the source string into an array of HOST_WIDE_INTs, converting that array into a wide_int, and from there to an rtx. It's simpler to do this by building a target memory image and using native_decode_rtx to convert that memory image into an rtx. It avoids all the endianness shenanigans because both the string and native_decode_rtx follow target memory order. It also means that the function can handle all fixed-size modes, which simplifies callers and allows vector modes to be used more widely. gcc/ * builtins.h (c_readstr): Take a fixed_size_mode rather than a scalar_int_mode. * builtins.cc (c_readstr): Likewise. Build a local array of bytes and use native_decode_rtx to get the rtx image. (builtin_memcpy_read_str): Simplify accordingly. (builtin_strncpy_read_str): Likewise. (builtin_memset_read_str): Likewise. (builtin_memset_gen_str): Likewise. * expr.cc (string_cst_read_str): Likewise.
2023-09-20	middle-end: use MAX_FIXED_MODE_SIZE instead of precision of TImode/DImode	Jakub Jelinek	1	-10/+4
	On Tue, Sep 19, 2023 at 05:50:59PM +0100, Richard Sandiford wrote: > How about using MAX_FIXED_MODE_SIZE for things like this? Seems like a good idea. The following patch does that. 2023-09-20 Jakub Jelinek <jakub@redhat.com> * match.pd ((x << c) >> c): Use MAX_FIXED_MODE_SIZE instead of GET_MODE_PRECISION of TImode or DImode depending on whether TImode is supported scalar mode. * gimple-lower-bitint.cc (bitint_precision_kind): Likewise. * expr.cc (expand_expr_real_1): Likewise. * tree-ssa-sccvn.cc (eliminate_dom_walker::eliminate_stmt): Likewise. * ubsan.cc (ubsan_encode_value, ubsan_type_descriptor): Likewise.
2023-09-07	middle-end: Avoid calling targetm.c.bitint_type_info inside of gcc_assert [PR102989]	Jakub Jelinek	1	-1/+2
	On Thu, Sep 07, 2023 at 10:36:02AM +0200, Thomas Schwinge wrote: > Minor comment/question: are we doing away with the property that > 'assert'-like "calls" must not have side effects? Per 'gcc/system.h', > this is "OK" for 'gcc_assert' for '#if ENABLE_ASSERT_CHECKING' or > '#elif (GCC_VERSION >= 4005)' -- that is, GCC 4.5, which is always-true, > thus the "offending" '#else' is never active. However, it's different > for standard 'assert' and 'gcc_checking_assert', so I'm not sure if > that's a good property for 'gcc_assert' only? For example, see also > <https://gcc.gnu.org/PR6906> "warn about asserts with side effects", or > recent <https://gcc.gnu.org/PR111144> > "RFE: could -fanalyzer warn about assertions that have side effects?". You're right, the #define gcc_assert(EXPR) ((void)(0 && (EXPR))) fallback definition is incompatible with the way I've used it, so for --disable-checking built by non-GCC it would not work properly. 2023-09-07 Jakub Jelinek <jakub@redhat.com> PR c/102989 * expr.cc (expand_expr_real_1): Don't call targetm.c.bitint_type_info inside gcc_assert, as later code relies on it filling info variable. * gimple-fold.cc (clear_padding_bitint_needs_padding_p, clear_padding_type): Likewise. * varasm.cc (output_constant): Likewise. * fold-const.cc (native_encode_int, native_interpret_int): Likewise. * stor-layout.cc (finish_bitfield_representative, layout_type): Likewise. * gimple-lower-bitint.cc (bitint_precision_kind): Likewise.
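The hazard discussed in this thread is easy to reproduce in isolation. With the fallback definition quoted in the message, `((void)(0 && (EXPR)))`, the expression is never evaluated because `&&` short-circuits, so any side effect inside the assertion silently disappears. In the sketch below only the macro body comes from the message. `FALLBACK_ASSERT`, `limb_info` and `fill_info` are hypothetical stand-ins for gcc_assert, the info structure and the target hook.

```cpp
// Same shape as the fallback gcc_assert quoted above: EXPR is type-checked
// but never evaluated, because `0 && ...` short-circuits.
#define FALLBACK_ASSERT(EXPR) ((void)(0 && (EXPR)))

struct limb_info {
  int limb_bits = 0;
};

// Hypothetical stand-in for a hook that fills `info` and reports success.
bool fill_info(limb_info *info) {
  info->limb_bits = 64;
  return true;
}

// Call placed inside the assertion: `info` is never filled in.
int limb_bits_call_inside_assert() {
  limb_info info;
  FALLBACK_ASSERT(fill_info(&info));
  return info.limb_bits;
}

// Pattern the commit moves to: call first, then assert on the result.
int limb_bits_call_outside_assert() {
  limb_info info;
  bool ok = fill_info(&info);
  FALLBACK_ASSERT(ok);
  return info.limb_bits;
}
```

In this sketch the first function returns the default value 0 and the second returns 64, which is the difference that later code relying on the filled structure would observe.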
2023-09-06	Middle-end _BitInt support [PR102989]	Jakub Jelinek	1	-5/+56
	The following patch introduces the middle-end part of the _BitInt support, a new BITINT_TYPE, handling it where needed, except the lowering pass and sanitizer support. 2023-09-06 Jakub Jelinek <jakub@redhat.com> PR c/102989 * tree.def (BITINT_TYPE): New type. * tree.h (TREE_CHECK6, TREE_NOT_CHECK6): Define. (NUMERICAL_TYPE_CHECK, INTEGRAL_TYPE_P): Include BITINT_TYPE. (BITINT_TYPE_P): Define. (CONSTRUCTOR_BITFIELD_P): Return true even for BLKmode bit-fields if they have BITINT_TYPE type. (tree_check6, tree_not_check6): New inline functions. (any_integral_type_check): Include BITINT_TYPE. (build_bitint_type): Declare. * tree.cc (tree_code_size, wide_int_to_tree_1, cache_integer_cst, build_zero_cst, type_hash_canon_hash, type_cache_hasher::equal, type_hash_canon): Handle BITINT_TYPE. (bitint_type_cache): New variable. (build_bitint_type): New function. (signed_or_unsigned_type_for, verify_type_variant, verify_type): Handle BITINT_TYPE. (tree_cc_finalize): Free bitint_type_cache. * builtins.cc (type_to_class): Handle BITINT_TYPE. (fold_builtin_unordered_cmp): Handle BITINT_TYPE like INTEGER_TYPE. * cfgexpand.cc (expand_debug_expr): Punt on BLKmode BITINT_TYPE INTEGER_CSTs. * convert.cc (convert_to_pointer_1, convert_to_real_1, convert_to_complex_1): Handle BITINT_TYPE like INTEGER_TYPE. (convert_to_integer_1): Likewise. For BITINT_TYPE don't check GET_MODE_PRECISION (TYPE_MODE (type)). * doc/generic.texi (BITINT_TYPE): Document. * doc/tm.texi.in (TARGET_C_BITINT_TYPE_INFO): New. * doc/tm.texi: Regenerated. * dwarf2out.cc (base_type_die, is_base_type, modified_type_die, gen_type_die_with_usage): Handle BITINT_TYPE. (rtl_for_decl_init): Punt on BLKmode BITINT_TYPE INTEGER_CSTs or handle those which fit into shwi. * expr.cc (expand_expr_real_1): Define EXTEND_BITINT macro, reduce to bitfield precision reads from BITINT_TYPE vars, parameters or memory locations. Expand large/huge BITINT_TYPE INTEGER_CSTs into memory. 
* fold-const.cc (fold_convert_loc, make_range_step): Handle BITINT_TYPE. (extract_muldiv_1): For BITINT_TYPE use TYPE_PRECISION rather than GET_MODE_SIZE (SCALAR_INT_TYPE_MODE). (native_encode_int, native_interpret_int, native_interpret_expr): Handle BITINT_TYPE. * gimple-expr.cc (useless_type_conversion_p): Make BITINT_TYPE to some other integral type or vice versa conversions non-useless. * gimple-fold.cc (gimple_fold_builtin_memset): Punt for BITINT_TYPE. (clear_padding_unit): Mention in comment that _BitInt types don't need to fit either. (clear_padding_bitint_needs_padding_p): New function. (clear_padding_type_may_have_padding_p): Handle BITINT_TYPE. (clear_padding_type): Likewise. * internal-fn.cc (expand_mul_overflow): For unsigned non-mode precision operands force pos_neg? to 1. (expand_MULBITINT, expand_DIVMODBITINT, expand_FLOATTOBITINT, expand_BITINTTOFLOAT): New functions. * internal-fn.def (MULBITINT, DIVMODBITINT, FLOATTOBITINT, BITINTTOFLOAT): New internal functions. * internal-fn.h (expand_MULBITINT, expand_DIVMODBITINT, expand_FLOATTOBITINT, expand_BITINTTOFLOAT): Declare. * match.pd (non-equality compare simplifications from fold_binary): Punt if TYPE_MODE (arg1_type) is BLKmode. * pretty-print.h (pp_wide_int): Handle printing of large precision wide_ints which would buffer overflow digit_buffer. * stor-layout.cc (finish_bitfield_representative): For bit-fields with BITINT_TYPE, prefer representatives with precisions in multiple of limb precision. (layout_type): Handle BITINT_TYPE. Handle COMPLEX_TYPE with BLKmode element type and assert it is BITINT_TYPE. * target.def (bitint_type_info): New C target hook. * target.h (struct bitint_info): New type. * targhooks.cc (default_bitint_type_info): New function. * targhooks.h (default_bitint_type_info): Declare. * tree-pretty-print.cc (dump_generic_node): Handle BITINT_TYPE. Handle printing large wide_ints which would buffer overflow digit_buffer. * tree-ssa-sccvn.cc: Include target.h. 
(eliminate_dom_walker::eliminate_stmt): Punt for large/huge BITINT_TYPE. * tree-switch-conversion.cc (jump_table_cluster::emit): For more than 64-bit BITINT_TYPE subtract low bound from expression and cast to 64-bit integer type both the controlling expression and case labels. * typeclass.h (enum type_class): Add bitint_type_class enumerator. * varasm.cc (output_constant): Handle BITINT_TYPE INTEGER_CSTs. * vr-values.cc (check_for_binary_op_overflow): Use widest2_int rather than widest_int. (simplify_using_ranges::simplify_internal_call_using_ranges): Use unsigned_type_for rather than build_nonstandard_integer_type.
2023-08-10	expr: Small optimization [PR102989]	Jakub Jelinek	1	-6/+4
	Small optimization to avoid testing modifier multiple times. 2023-08-10 Jakub Jelinek <jakub@redhat.com> PR c/102989 * expr.cc (expand_expr_real_1) <case MEM_REF>: Add an early return for EXPAND_WRITE or EXPAND_MEMORY modifiers to avoid testing it multiple times.
2023-07-28	PR rtl-optimization/110587: Reduce useless moves in compile-time hog.	Roger Sayle	1	-9/+4
	This patch is one of a series of fixes for PR rtl-optimization/110587, a compile-time regression with -O0, that attempts to address the underlying cause. As noted previously, the pathological test case pr28071.c contains a large number of useless register-to-register moves that can produce quadratic behaviour (in LRA). These moves are generated during RTL expansion in emit_group_load_1, where the middle-end attempts to simplify the source before calling extract_bit_field. This is reasonable if the source is a complex expression (from before the tree-ssa optimizers), or a SUBREG, or a hard register, but it's not particularly useful to copy a pseudo register into a new pseudo register. This patch eliminates that redundancy. The -fdump-tree-expand for pr28071.c compiled with -O0 currently contains 777K lines, with this patch it contains 717K lines, i.e. saving about 60K lines (admittedly of debugging text output, but it makes the point). 2023-07-28 Roger Sayle <roger@nextmovesoftware.com> Richard Biener <rguenther@suse.de> gcc/ChangeLog PR middle-end/28071 PR rtl-optimization/110587 * expr.cc (emit_group_load_1): Simplify logic for calling force_reg on ORIG_SRC, to avoid making a copy if the source is already in a pseudo register.
2023-06-29	cselib+expr+bitmap: Change return type of predicate functions from int to bool	Uros Bizjak	1	-52/+52
	gcc/ChangeLog: * cselib.h (rtx_equal_for_cselib_1): Change return type from int to bool. (references_value_p): Ditto. (rtx_equal_for_cselib_p): Ditto. * expr.h (can_store_by_pieces): Ditto. (try_casesi): Ditto. (try_tablejump): Ditto. (safe_from_p): Ditto. * sbitmap.h (bitmap_equal_p): Ditto. * cselib.cc (references_value_p): Change return type from int to bool and adjust function body accordingly. (rtx_equal_for_cselib_1): Ditto. * expr.cc (is_aligning_offset): Ditto. (can_store_by_pieces): Ditto. (mostly_zeros_p): Ditto. (all_zeros_p): Ditto. (safe_from_p): Ditto. (is_aligning_offset): Ditto. (try_casesi): Ditto. (try_tablejump): Ditto. (store_constructor): Change "need_to_clear" and "const_bounds_p" variables to bool. * sbitmap.cc (bitmap_equal_p): Change return type from int to bool.
2023-06-29	middle-end/110452 - bad code generation with AVX512 mask splat	Richard Biener	1	-0/+13
	The following adds an alternate way of expanding a uniform mask vector constructor like _55 = _2 ? -1 : 0; vect_cst__56 = {_55, _55, _55, _55, _55, _55, _55, _55}; when the mask mode is a scalar int mode like for AVX512 or GCN. Instead of piecewise building the result via shifts and ors we can take advantage of uniformity and signedness of the component and simply sign-extend to the result. Instead of cmpl $3, %edi sete %cl movl %ecx, %esi leal (%rsi,%rsi), %eax leal 0(,%rsi,4), %r9d leal 0(,%rsi,8), %r8d orl %esi, %eax orl %r9d, %eax movl %ecx, %r9d orl %r8d, %eax movl %ecx, %r8d sall $4, %r9d sall $5, %r8d sall $6, %esi orl %r9d, %eax orl %r8d, %eax movl %ecx, %r8d orl %esi, %eax sall $7, %r8d orl %r8d, %eax kmovb %eax, %k1 we then get cmpl $3, %edi sete %cl negl %ecx kmovb %ecx, %k1 Code generation for non-uniform masks remains bad, but at least I see no easy way out for the most general case here. PR middle-end/110452 * expr.cc (store_constructor): Handle uniform boolean vectors with integer mode specially.
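	The uniform-mask idea above can be sketched in plain C (not GCC internals). This is a minimal illustration under stated assumptions: an 8-lane mask held in a scalar integer, and a flag that is exactly 0 or 1. The function names are hypothetical and do not appear in expr.cc.

```c
#include <assert.h>
#include <stdint.h>

/* Piecewise construction: OR the same 0/1 flag into each of the 8 lane
   bits, which corresponds to the long shift/or sequence.  */
uint8_t splat_piecewise(unsigned flag)
{
  uint8_t mask = 0;
  for (int lane = 0; lane < 8; lane++)
    mask |= (uint8_t)(flag << lane);
  return mask;
}

/* Uniform construction: negate the 0/1 flag.  0 stays 0, and 1 becomes
   all ones once truncated to 8 bits, mirroring the sete/negl/kmovb
   sequence.  Assumes FLAG is exactly 0 or 1.  */
uint8_t splat_negate(unsigned flag)
{
  return (uint8_t)(0u - flag);
}

/* Hypothetical self-check: both forms agree for the two legal inputs.  */
void check_splat(void)
{
  assert(splat_piecewise(0) == splat_negate(0));
  assert(splat_piecewise(1) == splat_negate(1));
}
```

	The negation only works because every lane carries the same value; a non-uniform mask still needs per-lane handling, as the commit message notes.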
2023-06-13	Avoid duplicate vector initializations during RTL expansion.	Roger Sayle	1	-2/+5
	This middle-end patch avoids some redundant RTL for vector initialization during RTL expansion. For the simple test case: typedef __int128 v1ti __attribute__ ((__vector_size__ (16))); __int128 key; v1ti foo() { return (v1ti){key}; } the middle-end currently expands: (set (reg:V1TI 85) (const_vector:V1TI [ (const_int 0) ])) (set (reg:V1TI 85) (mem/c:V1TI (symbol_ref:DI ("key")))) where we create a dead instruction that initializes the vector to zero, immediately followed by a set of the entire vector. This patch skips this zeroing instruction when the vector has only a single element. It also updates the code to indicate when we've cleared the vector, so that we don't need to initialize zero elements. 2023-06-13 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * expr.cc (store_constructor) <case VECTOR_TYPE>: Don't bother clearing vectors with only a single element. Set CLEARED if the vector was initialized to zero.
2023-06-06	Handle const_int in expand_single_bit_test	Andrew Pinski	1	-3/+7
	After expanding directly to rtl instead of creating a tree, we could end up with a const_int which is not ready to be handled by extract_bit_field. So we need to do the constant folding here instead. OK? bootstrapped and tested on x86_64-linux-gnu with no regressions. PR middle-end/110117 gcc/ChangeLog: * expr.cc (expand_single_bit_test): Handle const_int from expand_expr. gcc/testsuite/ChangeLog: * gcc.dg/pr110117-1.c: New test. * gcc.dg/pr110117-2.c: New test.
2023-06-06	Improve do_store_flag for single bit when there is no non-zero bits	Andrew Pinski	1	-17/+11
	In r14-1534-g908e5ab5c11c, I forgot you could turn off CCP or turn off the bit tracking part of CCP so we would lose out on what TER was able to do beforehand. This moves around the TER code so that it is used instead of just the nonzerobits. It also makes it easier to remove the TER part of the code later on too. OK? Bootstrapped and tested on x86_64-linux-gnu. Note it reintroduces PR 110117 (which was accidentally fixed after r14-1534-g908e5ab5c11c). The next patch in series will fix that. gcc/ChangeLog: * expr.cc (do_store_flag): Rearrange the TER code so that it overrides the nonzero bits info if we had `a & POW2`.
2023-06-05	Remove widen_plus/minus_expr tree codes	Andre Vieira	1	-6/+0
	This patch removes the old widen plus/minus tree codes which have been replaced by internal functions. 2023-06-05 Andre Vieira <andre.simoesdiasvieira@arm.com> Joel Hutton <joel.hutton@arm.com> gcc/ChangeLog: * doc/generic.texi: Remove old tree codes. * expr.cc (expand_expr_real_2): Remove old tree code cases. * gimple-pretty-print.cc (dump_binary_rhs): Likewise. * optabs-tree.cc (optab_for_tree_code): Likewise. (supportable_half_widening_operation): Likewise. * tree-cfg.cc (verify_gimple_assign_binary): Likewise. * tree-inline.cc (estimate_operator_cost): Likewise. (op_symbol_code): Likewise. * tree-vect-data-refs.cc (vect_get_smallest_scalar_type): Likewise. (vect_analyze_data_ref_accesses): Likewise. * tree-vect-generic.cc (expand_vector_operations_1): Likewise. * cfgexpand.cc (expand_debug_expr): Likewise. * tree-vect-stmts.cc (vectorizable_conversion): Likewise. (supportable_widening_operation): Likewise. * gimple-range-op.cc (gimple_range_op_handler::maybe_non_standard): Likewise. * optabs.def (vec_widen_ssubl_hi_optab, vec_widen_ssubl_lo_optab, vec_widen_saddl_hi_optab, vec_widen_saddl_lo_optab, vec_widen_usubl_hi_optab, vec_widen_usubl_lo_optab, vec_widen_uaddl_hi_optab, vec_widen_uaddl_lo_optab): Remove optabs. * tree-pretty-print.cc (dump_generic_node): Remove tree code definition. * tree.def (WIDEN_PLUS_EXPR, WIDEN_MINUS_EXPR, VEC_WIDEN_PLUS_HI_EXPR, VEC_WIDEN_PLUS_LO_EXPR, VEC_WIDEN_MINUS_HI_EXPR, VEC_WIDEN_MINUS_LO_EXPR): Likewise.
2023-06-04	Improve do_store_flag for comparing single bit against that bit	Andrew Pinski	1	-3/+8
	This is a case which I noticed while working on the previous patch. Sometimes we end up with `a == CST` instead of comparing against 0. This happens in the following code: ``` unsigned f(unsigned t) { if (t & ~(1<<30)) __builtin_unreachable(); t ^= (1<<30); return t != 0; } ``` We should handle the case where the nonzero bits is the same as the comparison operand. Changes from v1: * v2: Updated for the bit extraction changes. OK? Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: * expr.cc (do_store_flag): Improve for single bit testing not against zero but against that single bit.
2023-06-04	Improve do_store_flag for single bit comparison against 0	Andrew Pinski	1	-5/+20
	While working on something else, I noticed we could improve the following function code generation: ``` unsigned f(unsigned t) { if (t & ~(1<<30)) __builtin_unreachable(); return t != 0; } ``` Right now we just emit a comparison against 0 instead of just a shift right by 30. There is code in do_store_flag which already optimizes `(t & 1<<30) != 0` to `(t >> 30) & 1` (using bit extraction if available). This patch extends it to handle the case where we know t has a nonzero of just one bit set. Changes from v1: * v2: Updated for the bit extraction improvements. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * expr.cc (do_store_flag): Extend the one bit checking case to handle the case where we don't have an and but rather still one bit is known to be non-zero.
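	The equivalence this change relies on can be checked in plain C. The sketch below is illustrative only and is not the do_store_flag code; the function names are hypothetical, and both functions assume the caller guarantees that only bit 30 of the argument may be set (the role played by __builtin_unreachable() in the example above).

```c
#include <assert.h>
#include <stdint.h>

/* What is emitted without the optimization: a comparison against zero.  */
unsigned test_by_compare(uint32_t t)
{
  return t != 0;
}

/* What the optimization allows: a shift right by 30 (or a bit extract).
   Only valid when every bit of T other than bit 30 is known to be zero.  */
unsigned test_by_shift(uint32_t t)
{
  return (t >> 30) & 1u;
}

/* Hypothetical self-check over the only two values T can take.  */
void check_single_bit(void)
{
  assert(test_by_compare(0) == test_by_shift(0));
  assert(test_by_compare(UINT32_C(1) << 30) == test_by_shift(UINT32_C(1) << 30));
}
```

	For any input with other bits set the two functions can disagree, which is why the transformation needs the nonzero-bits information.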
2023-05-20	Fix expand_single_bit_test for big-endian	Andrew Pinski	1	-1/+8
	I had thought extract_bit_field bitpos argument was the shifted position and not the bitposition like BIT_FIELD_REF so I had removed the code which would use the correct bitposition for BYTES_BIG_ENDIAN. Committed as obvious; I checked big-endian MIPS to make sure we are now producing the correct code. gcc/ChangeLog: * expr.cc (expand_single_bit_test): Correct bitpos for big-endian.
2023-05-21	Fix PR 109919: ICE in emit_move_insn with some bit tests	Andrew Pinski	1	-1/+1
	The problem is I used expand_expr with the target but we don't want to use the target here as it is the wrong mode for the original expression. The testcase would ICE deep down while trying to do a move to use the target. Anyways just calling expand_expr with NULL_EXPR fixes the issue. Committed as obvious after a bootstrap/test on x86_64-linux-gnu. PR middle-end/109919 gcc/ChangeLog: * expr.cc (expand_single_bit_test): Don't use the target for expand_expr. gcc/testsuite/ChangeLog: * gcc.c-torture/compile/pr109919-1.c: New test.
2023-05-20	Expand directly for single bit test	Andrew Pinski	1	-35/+28
	Instead of creating trees for the expansion, just expand directly, which makes the code a little simpler but also reduces how much GC memory will be used during the expansion. gcc/ChangeLog: * expr.cc (fold_single_bit_test): Rename to ... (expand_single_bit_test): This and expand directly. (do_store_flag): Update for the rename function.
2023-05-20	Use BIT_FIELD_REF inside fold_single_bit_test	Andrew Pinski	1	-11/+10
	Instead of depending on combine to do the extraction, let's create a tree which will expand directly into the extraction. This improves code generation on some targets. gcc/ChangeLog: * expr.cc (fold_single_bit_test): Use BIT_FIELD_REF instead of shift/and.
2023-05-20	Simplify fold_single_bit_test with respect to code	Andrew Pinski	1	-55/+53
	Since we know that fold_single_bit_test is now only passed NE_EXPR or EQ_EXPR, we can simplify it and just use a gcc_assert to assert that is the code that is being passed. gcc/ChangeLog: * expr.cc (fold_single_bit_test): Add an assert and simplify based on code being NE_EXPR or EQ_EXPR.
2023-05-20	Simplify fold_single_bit_test slightly	Andrew Pinski	1	-12/+10
	Now that the only use of fold_single_bit_test is in do_store_flag, we can change it to take the inner arg and bitnum instead of building a tree. There are no code generation changes due to this change, only a decrease in GC memory that is produced during expansion. gcc/ChangeLog: * expr.cc (fold_single_bit_test): Take inner and bitnum instead of arg0 and arg1. Update the code. (do_store_flag): Don't create a tree when calling fold_single_bit_test instead just call it with the bitnum and the inner tree.
2023-05-20	Use get_def_for_expr in fold_single_bit_test	Andrew Pinski	1	-5/+6
	The code in fold_single_bit_test checks if the inner was a right shift and improves the bitnum based on that. But since the inner will always be a SSA_NAME at this point, the code is dead. Move it over to use the helper function get_def_for_expr instead. gcc/ChangeLog: * expr.cc (fold_single_bit_test): Use get_def_for_expr instead of checking the inner's code.
2023-05-20	Inline and simplify fold_single_bit_test_into_sign_test into fold_single_bit_test	Andrew Pinski	1	-41/+10
	Since the last use of fold_single_bit_test_into_sign_test is in fold_single_bit_test, we can inline it and even simplify the inlined version. This has no behavior change. gcc/ChangeLog: * expr.cc (fold_single_bit_test_into_sign_test): Inline into ... (fold_single_bit_test): This and simplify.
2023-05-20	Move fold_single_bit_test to expr.cc from fold-const.cc	Andrew Pinski	1	-0/+113
	This is part 1 of N patch set that will change the expansion of `(A & C) != 0` from using trees to directly expanding so later on we can do some cost analysis. Since the only user of fold_single_bit_test is now expand, move it to there. gcc/ChangeLog: * fold-const.cc (fold_single_bit_test_into_sign_test): Move to expr.cc. (fold_single_bit_test): Likewise. * expr.cc (fold_single_bit_test_into_sign_test): Move from fold-const.cc (fold_single_bit_test): Likewise and make static. * fold-const.h (fold_single_bit_test): Remove declaration.
2023-05-18	gcc: use _P() defines from tree.h	Bernhard Reutner-Fischer	1	-1/+1
	gcc/ChangeLog: * alias.cc (ref_all_alias_ptr_type_p): Use _P() defines from tree.h. * attribs.cc (diag_attr_exclusions): Ditto. (decl_attributes): Ditto. (build_type_attribute_qual_variant): Ditto. * builtins.cc (fold_builtin_carg): Ditto. (fold_builtin_next_arg): Ditto. (do_mpc_arg2): Ditto. * cfgexpand.cc (expand_return): Ditto. * cgraph.h (decl_in_symtab_p): Ditto. (symtab_node::get_create): Ditto. * dwarf2out.cc (base_type_die): Ditto. (implicit_ptr_descriptor): Ditto. (gen_array_type_die): Ditto. (gen_type_die_with_usage): Ditto. (optimize_location_into_implicit_ptr): Ditto. * expr.cc (do_store_flag): Ditto. * fold-const.cc (negate_expr_p): Ditto. (fold_negate_expr_1): Ditto. (fold_convert_const): Ditto. (fold_convert_loc): Ditto. (constant_boolean_node): Ditto. (fold_binary_op_with_conditional_arg): Ditto. (build_fold_addr_expr_with_type_loc): Ditto. (fold_comparison): Ditto. (fold_checksum_tree): Ditto. (tree_unary_nonnegative_warnv_p): Ditto. (integer_valued_real_unary_p): Ditto. (fold_read_from_constant_string): Ditto. * gcc-rich-location.cc (maybe_range_label_for_tree_type_mismatch::get_text): Ditto. * gimple-expr.cc (useless_type_conversion_p): Ditto. (is_gimple_reg): Ditto. (is_gimple_asm_val): Ditto. (mark_addressable): Ditto. * gimple-expr.h (is_gimple_variable): Ditto. (virtual_operand_p): Ditto. * gimple-ssa-warn-access.cc (pass_waccess::check_dangling_stores): Ditto. * gimplify.cc (gimplify_bind_expr): Ditto. (gimplify_return_expr): Ditto. (gimple_add_padding_init_for_auto_var): Ditto. (gimplify_addr_expr): Ditto. (omp_add_variable): Ditto. (omp_notice_variable): Ditto. (omp_get_base_pointer): Ditto. (omp_strip_components_and_deref): Ditto. (omp_strip_indirections): Ditto. (omp_accumulate_sibling_list): Ditto. (omp_build_struct_sibling_lists): Ditto. (gimplify_adjust_omp_clauses_1): Ditto. (gimplify_adjust_omp_clauses): Ditto. (gimplify_omp_for): Ditto. (goa_lhs_expr_p): Ditto. (gimplify_one_sizepos): Ditto. 
* graphite-scop-detection.cc (scop_detection::graphite_can_represent_scev): Ditto. * ipa-devirt.cc (odr_types_equivalent_p): Ditto. * ipa-prop.cc (ipa_set_jf_constant): Ditto. (propagate_controlled_uses): Ditto. * ipa-sra.cc (type_prevails_p): Ditto. (scan_expr_access): Ditto. * optabs-tree.cc (optab_for_tree_code): Ditto. * toplev.cc (wrapup_global_declaration_1): Ditto. * trans-mem.cc (transaction_invariant_address_p): Ditto. * tree-cfg.cc (verify_types_in_gimple_reference): Ditto. (verify_gimple_comparison): Ditto. (verify_gimple_assign_binary): Ditto. (verify_gimple_assign_single): Ditto. * tree-complex.cc (get_component_ssa_name): Ditto. * tree-emutls.cc (lower_emutls_2): Ditto. * tree-inline.cc (copy_tree_body_r): Ditto. (estimate_move_cost): Ditto. (copy_decl_for_dup_finish): Ditto. * tree-nested.cc (convert_nonlocal_omp_clauses): Ditto. (note_nonlocal_vla_type): Ditto. (convert_local_omp_clauses): Ditto. (remap_vla_decls): Ditto. (fixup_vla_decls): Ditto. * tree-parloops.cc (loop_has_vector_phi_nodes): Ditto. * tree-pretty-print.cc (print_declaration): Ditto. (print_call_name): Ditto. * tree-sra.cc (compare_access_positions): Ditto. * tree-ssa-alias.cc (compare_type_sizes): Ditto. * tree-ssa-ccp.cc (get_default_value): Ditto. * tree-ssa-coalesce.cc (populate_coalesce_list_for_outofssa): Ditto. * tree-ssa-dom.cc (reduce_vector_comparison_to_scalar_comparison): Ditto. * tree-ssa-forwprop.cc (can_propagate_from): Ditto. * tree-ssa-propagate.cc (may_propagate_copy): Ditto. * tree-ssa-sccvn.cc (fully_constant_vn_reference_p): Ditto. * tree-ssa-sink.cc (statement_sink_location): Ditto. * tree-ssa-structalias.cc (type_must_have_pointers): Ditto. * tree-ssa-ter.cc (find_replaceable_in_bb): Ditto. * tree-ssa-uninit.cc (warn_uninit): Ditto. * tree-ssa.cc (maybe_rewrite_mem_ref_base): Ditto. (non_rewritable_mem_ref_base): Ditto. * tree-streamer-in.cc (lto_input_ts_type_non_common_tree_pointers): Ditto. 
* tree-streamer-out.cc (write_ts_type_non_common_tree_pointers): Ditto. * tree-vect-generic.cc (do_binop): Ditto. (do_cond): Ditto. * tree-vect-stmts.cc (vect_init_vector): Ditto. * tree-vector-builder.h (tree_vector_builder::note_representative): Ditto. * tree.cc (sign_mask_for): Ditto. (verify_type_variant): Ditto. (gimple_canonical_types_compatible_p): Ditto. (verify_type): Ditto. * ubsan.cc (get_ubsan_type_info_for_type): Ditto. * var-tracking.cc (prepare_call_arguments): Ditto. (vt_add_function_parameters): Ditto. * varasm.cc (decode_addr_const): Ditto.
2023-04-19	Introduce VIRTUAL_REGISTER_P and VIRTUAL_REGISTER_NUM_P predicates	Uros Bizjak	1	-2/+1
	These two predicates are similar to existing HARD_REGISTER_P and HARD_REGISTER_NUM_P predicates and return 1 if the given register corresponds to a virtual register. gcc/ChangeLog: * rtl.h (VIRTUAL_REGISTER_P): New predicate. (VIRTUAL_REGISTER_NUM_P): Ditto. (REGNO_PTR_FRAME_P): Use VIRTUAL_REGISTER_NUM_P predicate. * expr.cc (force_operand): Use VIRTUAL_REGISTER_P predicate. * function.cc (instantiate_decl_rtl): Ditto. * rtlanal.cc (rtx_addr_can_trap_p_1): Ditto. (nonzero_address_p): Ditto. (refers_to_regno_p): Use VIRTUAL_REGISTER_NUM_P predicate.
2023-03-14	Revert latest change to emit_group_store	Eric Botcazou	1	-10/+7
	This pessimizes on targets with insv instructions. gcc/ PR rtl-optimization/107762 * expr.cc (emit_group_store): Revert latest change.
2023-03-12	middle-end: Revert can_special_div_by_const changes [PR108583]	Tamar Christina	1	-14/+10
	This reverts the changes for the CAN_SPECIAL_DIV_BY_CONST hook. gcc/ChangeLog: PR target/108583 * doc/tm.texi (TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST): Remove. * doc/tm.texi.in: Likewise. * explow.cc (round_push, align_dynamic_address): Revert previous patch. * expmed.cc (expand_divmod): Likewise. * expmed.h (expand_divmod): Likewise. * expr.cc (force_operand, expand_expr_divmod): Likewise. * optabs.cc (expand_doubleword_mod, expand_doubleword_divmod): Likewise. * target.def (can_special_div_by_const): Remove. * target.h: Remove tree-core.h include * targhooks.cc (default_can_special_div_by_const): Remove. * targhooks.h (default_can_special_div_by_const): Remove. * tree-vect-generic.cc (expand_vector_operation): Remove hook. * tree-vect-patterns.cc (vect_recog_divmod_pattern): Remove hook. * tree-vect-stmts.cc (vectorizable_operation): Remove hook.
2023-01-03	expr: Fix up store_expr into SUBREG_PROMOTED_* target [PR108264]	Jakub Jelinek	1	-0/+3
	The following testcase ICEs on s390x-linux (e.g. with -march=z13). The problem is that target is (subreg/s/u:SI (reg/v:DI 66 [ x+-4 ]) 4) and we call convert_move from temp to the SUBREG_REG of that, expecting to extend the value properly. That works nicely if temp has some scalar integer mode (or partial one), but ICEs when temp has V4QImode on the assertion that from and to modes have the same bitsize. store_expr generally allows say store from V4QI to SI target because they have the same size and if temp is a CONST_INT, we already have code to convert the constant properly, so the following patch just adds handling of non-scalar integer modes by converting them to the mode of target first before convert_move extends them. 2023-01-03 Jakub Jelinek <jakub@redhat.com> PR middle-end/108264 * expr.cc (store_expr): For stores into SUBREG_PROMOTED_* targets from source which doesn't have scalar integral mode first convert it to outer_mode. * gcc.dg/pr108264.c: New test.
2023-01-02	Update copyright years.	Jakub Jelinek	1	-1/+1

2022-11-14	middle-end: Support not decomposing specific divisions during vectorization.	Tamar Christina	1	-10/+14
	In plenty of image and video processing code it's common to modify pixel values by a widening operation and then scale them back into range by dividing by 255. e.g.: x = y / (2 ^ (bitsize (y)/2)-1) This patch adds a new target hook can_special_div_by_const, similar to can_vec_perm which can be called to check if a target will handle a particular division in a special way in the back-end. The vectorizer will then vectorize the division using the standard tree code and at expansion time the hook is called again to generate the code for the division. A lot of the changes in the patch are to pass down the tree operands in all paths that can lead to the divmod expansion so that the target hook always has the type of the expression you're expanding since the types can change the expansion. gcc/ChangeLog: * expmed.h (expand_divmod): Pass tree operands down in addition to RTX. * expmed.cc (expand_divmod): Likewise. * explow.cc (round_push, align_dynamic_address): Likewise. * expr.cc (force_operand, expand_expr_divmod): Likewise. * optabs.cc (expand_doubleword_mod, expand_doubleword_divmod): Likewise. * target.h: Include tree-core.h. * target.def (can_special_div_by_const): New. * targhooks.cc (default_can_special_div_by_const): New. * targhooks.h (default_can_special_div_by_const): New. * tree-vect-generic.cc (expand_vector_operation): Use it. * doc/tm.texi.in: Document it. * doc/tm.texi: Regenerate. * tree-vect-patterns.cc (vect_recog_divmod_pattern): Check for support. * tree-vect-stmts.cc (vectorizable_operation): Likewise. gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-div-bitmask-1.c: New test. * gcc.dg/vect/vect-div-bitmask-2.c: New test. * gcc.dg/vect/vect-div-bitmask-3.c: New test. * gcc.dg/vect/vect-div-bitmask.h: New file.
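	To illustrate the kind of division being discussed, here is a scalar C sketch for 16-bit y, where the divisor 2 ^ (bitsize (y)/2) - 1 is 255 (reading ^ as exponentiation). The shift-and-add identity used below is an assumption chosen for illustration, not something stated in this commit, and the function names are hypothetical; a back-end may implement the special case differently.

```c
#include <assert.h>
#include <stdint.h>

/* Reference form: ordinary division by 255.  */
uint16_t div255_reference(uint16_t y)
{
  return (uint16_t)(y / 255u);
}

/* Division-free form: (y + ((y + 257) >> 8)) >> 8, evaluated in 32 bits
   so the intermediate sums cannot overflow.  This identity is an
   illustrative assumption, not taken from the patch.  */
uint16_t div255_shift_add(uint16_t y)
{
  uint32_t wide = y;
  return (uint16_t)((wide + ((wide + 257u) >> 8)) >> 8);
}

/* Hypothetical exhaustive self-check over every 16-bit input.  */
void check_div255(void)
{
  for (uint32_t y = 0; y <= 0xFFFFu; y++)
    assert(div255_reference((uint16_t)y) == div255_shift_add((uint16_t)y));
}
```

	The intermediate values need more than 16 bits, which is one reason the expansion benefits from knowing the operand types, as the commit message explains.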
2022-11-04	Do not use subword paradoxical subregs in emit_group_store	Eric Botcazou	1	-13/+13
	The goal of the trick is to make life easier for the combiner, but subword paradoxical subregs make it harder for the register allocator instead. gcc/ * expr.cc (emit_group_store): Do not use subword paradoxical subregs