Most targets have an "and" instruction for their vector mask size, but RISC-V
only has DImode "and". Fixed by allowing wider instruction modes.
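A minimal sketch of the shape of the fix (variable names other than the documented expand_binop/and_optab/OPTAB_WIDEN are assumptions):
```
/* Ask for OPTAB_WIDEN so the "and" may be performed in a wider mode
   (e.g. DImode on RISC-V) when the mask mode has no pattern itself.  */
masked = expand_binop (mode, and_optab, op0, mask_rtx,
                       NULL_RTX, true, OPTAB_WIDEN);
```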
gcc/ChangeLog:
PR target/112481
* expr.cc (store_constructor): Use OPTAB_WIDEN for mask adjustment.
|
|
AVX ignores any excess bits in the mask (at least for vector sizes >=8), but
AMD GCN magically uses a larger vector than was intended (the smaller sizes are
"fake"), leading to wrong-code.
This patch fixes amdgcn execution failures in gcc.dg/vect/pr81740-1.c,
gfortran.dg/c-interop/contiguous-1.f90,
gfortran.dg/c-interop/ff-descriptor-7.f90, and others.
gcc/ChangeLog:
* expr.cc (store_constructor): Add "and" operation to uniform mask
generation.
|
|
The previous patch (f08ca5903c7) examined the scalar modes via the target
hook scalar_mode_supported_p. That caused some i386 regressions, as
XImode and OImode are not enabled by the i386 target function. This
patch instead examines the scalar mode by checking if the corresponding
optabs are available for the mode.
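A simplified sketch of the optab-based check (the committed function also keys on the kind of by-pieces operation):
```
/* A mode is usable for by-pieces expansion only if it can at least be
   moved; XImode/OImode on i386 have mov patterns even though
   scalar_mode_supported_p rejects them.  */
static bool
by_pieces_mode_supported_p (fixed_size_mode mode, by_pieces_operation)
{
  return optab_handler (mov_optab, mode) != CODE_FOR_nothing;
}
```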
gcc/
PR target/111449
* expr.cc (qi_vector_mode_supported_p): Rename to...
(by_pieces_mode_supported_p): ...this, and extend it to do
the checking for both scalar and vector modes.
(widest_fixed_size_mode_for_size): Call
by_pieces_mode_supported_p to examine the mode.
(op_by_pieces_d::smallest_fixed_size_mode_for_size): Likewise.
|
|
Vector mode compare instructions are efficient for equality compares on
rs6000. This patch refactors the by-pieces operation code to enable
vector modes for compare.
gcc/
PR target/111449
* expr.cc (can_use_qi_vectors): New function to return true if
we know how to implement OP using vectors of bytes.
(qi_vector_mode_supported_p): New function to check if optabs
exist for the mode and certain by pieces operations.
(widest_fixed_size_mode_for_size): Replace the second argument
with the type of by pieces operations. Call can_use_qi_vectors
and qi_vector_mode_supported_p to do the check. Call
scalar_mode_supported_p to check if the scalar mode is supported.
(by_pieces_ninsns): Pass the type of by pieces operation to
widest_fixed_size_mode_for_size.
(class op_by_pieces_d): Remove m_qi_vector_mode. Add m_op to
record the type of by pieces operations.
(op_by_pieces_d::op_by_pieces_d): Change last argument to the
type of by pieces operations, initialize m_op with it. Pass
m_op to function widest_fixed_size_mode_for_size.
(op_by_pieces_d::get_usable_mode): Pass m_op to function
widest_fixed_size_mode_for_size.
(op_by_pieces_d::smallest_fixed_size_mode_for_size): Call
can_use_qi_vectors and qi_vector_mode_supported_p to do the
check.
(op_by_pieces_d::run): Pass m_op to function
widest_fixed_size_mode_for_size.
(move_by_pieces_d::move_by_pieces_d): Set m_op to MOVE_BY_PIECES.
(store_by_pieces_d::store_by_pieces_d): Set m_op with the op.
(can_store_by_pieces): Pass the type of by pieces operations to
widest_fixed_size_mode_for_size.
(clear_by_pieces): Initialize class store_by_pieces_d with
CLEAR_BY_PIECES.
(compare_by_pieces_d::compare_by_pieces_d): Set m_op to
COMPARE_BY_PIECES.
|
|
I had a thinko in r14-1600-ge60593f3881c72a96a3fa4844d73e8a2cd14f670
where we would remove the `& CST` part if we ended up not calling
expand_single_bit_test.
This fixes the problem by introducing a new variable that will be used
for calling expand_single_bit_test.
As far as I know this can only show up when disabling optimization
passes, as the above form would otherwise have been optimized away.
Committed as obvious after a bootstrap/test on x86_64-linux-gnu.
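A reduced sketch of the kind of expression involved (hypothetical; the committed pr111863-1.c may differ):
```
int f (int x)
{
  /* With optimization passes disabled, the `x & 8` form reaches
     do_store_flag intact; the old code stripped the `& CST` from arg0
     even when expand_single_bit_test ended up not being called.  */
  return (x & 8) != 0;
}
```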
PR middle-end/111863
gcc/ChangeLog:
* expr.cc (do_store_flag): Don't overwrite arg0
when stripping off `& POW2`.
gcc/testsuite/ChangeLog:
* gcc.c-torture/execute/pr111863-1.c: New test.
|
|
[target/111466]
RISC-V suffers from extraneous sign extensions despite the ABI
guarantee that 32-bit quantities are sign-extended into 64-bit registers,
meaning incoming SI function args need not be explicitly sign-extended
(nor do SI return values, as most ALU insns implicitly sign-extend too).
Existing REE doesn't seem to handle this well and there are various ideas
floating around to smarten REE about it.
RISC-V also seems to correctly implement the PROMOTE_MODE target macro
etc.
Another approach would be to prevent EXPAND from generating the
sign_extend in the first place, which is what this patch tries to do.
The hunk being removed was introduced way back in 1994 as
5069803972 ("expand_expr, case CONVERT_EXPR .. clear the promotion flag")
This survived a full testsuite run for RISC-V rv64gc with surprisingly no
fallout: test results before/after are exactly the same.
# of unexpected case / # of unique unexpected case:
| configuration                               | gcc      | g++   | gfortran |
| rv64imafdc_zba_zbb_zbs_zicond/lp64d/medlow  | 264 / 87 | 5 / 2 | 72 / 12  |
Granted, for something so old to have survived, there must be a valid
reason. Unfortunately the original change didn't have additional
commentary or a test case. That is not to say it can't/won't possibly
break things on other arches/ABIs, hence the RFC for someone to scream
that this is just bonkers, don't do this 🙂
I've explicitly CC'ed Jakub and Roger who have last touched subreg
promoted notes in expr.cc for insight and/or screaming 😉
Thanks to Robin for narrowing this down in an amazing debugging session
@ GNU Cauldron.
```
foo2:
sext.w a6,a1 <-- this goes away
beq a1,zero,.L4
li a5,0
li a0,0
.L3:
addw a4,a2,a5
addw a5,a3,a5
addw a0,a4,a0
bltu a5,a6,.L3
ret
.L4:
li a0,0
ret
```
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
Co-developed-by: Robin Dapp <rdapp.gcc@gmail.com>
PR target/111466
gcc/
* expr.cc (expand_expr_real_2): Do not clear SUBREG_PROMOTED_VAR_P.
gcc/testsuite
* gcc.target/riscv/pr111466.c: New test.
|
|
poly_int was written before the switch to C++11 and so couldn't
use explicit default constructors. This led to an awkward split
between poly_int_pod and poly_int. poly_int simply inherited from
poly_int_pod and added constructors, with the argumentless constructor
having an empty body. But inheritance meant that poly_int had to
repeat the assignment operators from poly_int_pod (again, no C++11,
so no "using" to inherit base-class implementations).
All that goes away if we switch to using default constructors.
The main complication is ensuring that braced initialisation still
gives a constexpr, so that static variables can be initialised without
runtime code. The two problems here are:
(1) When initialising a poly_int<N, wide_int> with fewer than N
coefficients, the other coefficients need to be a zero of
the same precision as the explicit coefficients. This was
previously done in a for loop using wi::ints_for<...>::zero,
but C++11 constexpr constructors can't have function bodies.
The patch instead uses a series of delegated initialisers to
fill in the implicit coefficients.
(2) The initialisation in:
void f(int x) {
unsigned int foo {x};
}
produces the warning:
warning: narrowing conversion of 'x' from 'int' to 'unsigned int' [-Wnarrowing]
whereas:
void f(int x) {
unsigned int foo = x;
}
does not. So switching to direct initialisation of the coeffs array
would mean that:
poly_uint64 x = 0;
would trigger a warning for using 0 rather than 0u. That seemed
overly pedantic, so the patch adds explicit casts to the constructor.
The complication is to do that without adding extra code to
wide-int versions. The patch uses a new init_cast type for that.
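A standalone sketch of the delegation trick for a two-coefficient case (all names hypothetical, not the poly_int code itself):
```
/* The one-argument constructor delegates to the two-argument one,
   supplying the implicit zero coefficient; the whole chain stays
   constexpr-valid under C++11 since no constructor needs a body.  */
template<typename C>
struct poly2_demo
{
  C coeffs[2];
  constexpr poly2_demo (C c0) : poly2_demo (c0, C (0)) {}
  constexpr poly2_demo (C c0, C c1) : coeffs { c0, c1 } {}
};

constexpr poly2_demo<int> p (5);  /* static initialisation, no runtime code */
```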
gcc/
* poly-int.h (poly_int_pod): Delete.
(poly_coeff_traits::init_cast): New type.
(poly_int_full, poly_int_hungry, poly_int_fullness): New structures.
(poly_int): Replace constructors that take 1 and 2 coefficients with
a general one that takes an arbitrary number of coefficients.
Delegate initialization to two new private constructors, one of
which uses the coefficients as-is and one of which adds an extra
zero of the appropriate type (and precision, where applicable).
(gt_ggc_mx, gt_pch_nx): Operate on poly_ints rather than poly_int_pods.
* poly-int-types.h (poly_uint16_pod, poly_int64_pod, poly_uint64_pod)
(poly_offset_int_pod, poly_wide_int_pod, poly_widest_int_pod): Delete.
* gengtype.cc (main): Don't register poly_int64_pod.
* calls.cc (initialize_argument_information): Use poly_int rather
than poly_int_pod.
(combine_pending_stack_adjustment_and_call): Likewise.
* config/aarch64/aarch64.cc (pure_scalable_type_info): Likewise.
* data-streamer.h (bp_unpack_poly_value): Likewise.
* dwarf2cfi.cc (struct dw_trace_info): Likewise.
(struct queued_reg_save): Likewise.
* dwarf2out.h (struct dw_cfa_location): Likewise.
* emit-rtl.h (struct incoming_args): Likewise.
(struct rtl_data): Likewise.
* expr.cc (get_bit_range): Likewise.
(get_inner_reference): Likewise.
* expr.h (get_bit_range): Likewise.
* fold-const.cc (split_address_to_core_and_offset): Likewise.
(ptr_difference_const): Likewise.
* fold-const.h (ptr_difference_const): Likewise.
* function.cc (try_fit_stack_local): Likewise.
(instantiate_new_reg): Likewise.
* function.h (struct expr_status): Likewise.
(struct args_size): Likewise.
* genmodes.cc (ZERO_COEFFS): Likewise.
(mode_size_inline): Likewise.
(mode_nunits_inline): Likewise.
(emit_mode_precision): Likewise.
(emit_mode_size): Likewise.
(emit_mode_nunits): Likewise.
* gimple-fold.cc (get_base_constructor): Likewise.
* gimple-ssa-store-merging.cc (struct symbolic_number): Likewise.
* inchash.h (class hash): Likewise.
* ipa-modref-tree.cc (modref_access_node::dump): Likewise.
* ipa-modref.cc (modref_access_analysis::merge_call_side_effects):
Likewise.
* ira-int.h (ira_spilled_reg_stack_slot): Likewise.
* lra-eliminations.cc (self_elim_offsets): Likewise.
* machmode.h (mode_size, mode_precision, mode_nunits): Likewise.
* omp-low.cc (omplow_simd_context): Likewise.
* pretty-print.cc (pp_wide_integer): Likewise.
* pretty-print.h (pp_wide_integer): Likewise.
* reload.cc (struct decomposition): Likewise.
* reload.h (struct reload): Likewise.
* reload1.cc (spill_stack_slot_width): Likewise.
(struct elim_table): Likewise.
(offsets_at): Likewise.
(init_eliminable_invariants): Likewise.
* rtl.h (union rtunion): Likewise.
(poly_int_rtx_p): Likewise.
(strip_offset): Likewise.
(strip_offset_and_add): Likewise.
* rtlanal.cc (strip_offset): Likewise.
* tree-dfa.cc (get_ref_base_and_extent): Likewise.
(get_addr_base_and_unit_offset_1): Likewise.
(get_addr_base_and_unit_offset): Likewise.
* tree-dfa.h (get_ref_base_and_extent): Likewise.
(get_addr_base_and_unit_offset_1): Likewise.
(get_addr_base_and_unit_offset): Likewise.
* tree-ssa-loop-ivopts.cc (struct iv_use): Likewise.
(strip_offset): Likewise.
* tree-ssa-sccvn.h (struct vn_reference_op_struct): Likewise.
* tree.cc (ptrdiff_tree_p): Likewise.
* tree.h (poly_int_tree_p): Likewise.
(ptrdiff_tree_p): Likewise.
(get_inner_reference): Likewise.
gcc/testsuite/
* gcc.dg/plugin/poly-int-tests.h (test_num_coeffs_extra): Use
poly_int rather than poly_int_pod.
|
|
c_readstr only operated on integer modes. It worked by reading
the source string into an array of HOST_WIDE_INTs, converting
that array into a wide_int, and from there to an rtx.
It's simpler to do this by building a target memory image and
using native_decode_rtx to convert that memory image into an rtx.
It avoids all the endianness shenanigans because both the string and
native_decode_rtx follow target memory order. It also means that the
function can handle all fixed-size modes, which simplifies callers
and allows vector modes to be used more widely.
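A loose sketch of the new scheme (GCC's exact container types and the LEN bound are abbreviated here; treat the details as assumptions):
```
/* Build a zero-padded, target-memory-order byte image of the string,
   then let native_decode_rtx produce an rtx for any fixed-size MODE.  */
auto_vec<target_unit, 128> bytes;
for (unsigned int i = 0; i < GET_MODE_SIZE (mode); ++i)
  bytes.safe_push (i < len ? (target_unit) str[i] : 0);
return native_decode_rtx (mode, bytes, 0);
```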
gcc/
* builtins.h (c_readstr): Take a fixed_size_mode rather than a
scalar_int_mode.
* builtins.cc (c_readstr): Likewise. Build a local array of
bytes and use native_decode_rtx to get the rtx image.
(builtin_memcpy_read_str): Simplify accordingly.
(builtin_strncpy_read_str): Likewise.
(builtin_memset_read_str): Likewise.
(builtin_memset_gen_str): Likewise.
* expr.cc (string_cst_read_str): Likewise.
|
|
On Tue, Sep 19, 2023 at 05:50:59PM +0100, Richard Sandiford wrote:
> How about using MAX_FIXED_MODE_SIZE for things like this?
Seems like a good idea.
The following patch does that.
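Roughly the shape of the substitution at each site (a sketch, not the literal diff):
```
/* Before (sketch): the guard spelled out the widest scalar mode.  */
bool fits = prec <= (targetm.scalar_mode_supported_p (TImode)
                     ? GET_MODE_PRECISION (TImode)
                     : GET_MODE_PRECISION (DImode));
/* After: MAX_FIXED_MODE_SIZE already encodes that precision.  */
bool fits_now = prec <= MAX_FIXED_MODE_SIZE;
```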
2023-09-20 Jakub Jelinek <jakub@redhat.com>
* match.pd ((x << c) >> c): Use MAX_FIXED_MODE_SIZE instead of
GET_MODE_PRECISION of TImode or DImode depending on whether
TImode is a supported scalar mode.
* gimple-lower-bitint.cc (bitint_precision_kind): Likewise.
* expr.cc (expand_expr_real_1): Likewise.
* tree-ssa-sccvn.cc (eliminate_dom_walker::eliminate_stmt): Likewise.
* ubsan.cc (ubsan_encode_value, ubsan_type_descriptor): Likewise.
|
|
[PR102989]
On Thu, Sep 07, 2023 at 10:36:02AM +0200, Thomas Schwinge wrote:
> Minor comment/question: are we doing away with the property that
> 'assert'-like "calls" must not have side effects? Per 'gcc/system.h',
> this is "OK" for 'gcc_assert' for '#if ENABLE_ASSERT_CHECKING' or
> '#elif (GCC_VERSION >= 4005)' -- that is, GCC 4.5, which is always-true,
> thus the "offending" '#else' is never active. However, it's different
> for standard 'assert' and 'gcc_checking_assert', so I'm not sure if
> that's a good property for 'gcc_assert' only? For example, see also
> <https://gcc.gnu.org/PR6906> "warn about asserts with side effects", or
> recent <https://gcc.gnu.org/PR111144>
> "RFE: could -fanalyzer warn about assertions that have side effects?".
You're right, the
#define gcc_assert(EXPR) ((void)(0 && (EXPR)))
fallback definition is incompatible with the way I've used it, so for
--disable-checking builds by a non-GCC compiler it would not work properly.
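A sketch of the fix pattern (local names assumed): hoist the side-effecting call out of the assert so it runs even when gcc_assert expands to nothing.
```
/* Keep the hook call unconditional so INFO is always filled; only the
   cheap check lives inside the assert.  */
bool ok = targetm.c.bitint_type_info (TYPE_PRECISION (type), &info);
gcc_assert (ok);
```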
2023-09-07 Jakub Jelinek <jakub@redhat.com>
PR c/102989
* expr.cc (expand_expr_real_1): Don't call targetm.c.bitint_type_info
inside gcc_assert, as later code relies on it filling info variable.
* gimple-fold.cc (clear_padding_bitint_needs_padding_p,
clear_padding_type): Likewise.
* varasm.cc (output_constant): Likewise.
* fold-const.cc (native_encode_int, native_interpret_int): Likewise.
* stor-layout.cc (finish_bitfield_representative, layout_type):
Likewise.
* gimple-lower-bitint.cc (bitint_precision_kind): Likewise.
|
|
The following patch introduces the middle-end part of the _BitInt
support, a new BITINT_TYPE, handling it where needed, except the lowering
pass and sanitizer support.
2023-09-06 Jakub Jelinek <jakub@redhat.com>
PR c/102989
* tree.def (BITINT_TYPE): New type.
* tree.h (TREE_CHECK6, TREE_NOT_CHECK6): Define.
(NUMERICAL_TYPE_CHECK, INTEGRAL_TYPE_P): Include
BITINT_TYPE.
(BITINT_TYPE_P): Define.
(CONSTRUCTOR_BITFIELD_P): Return true even for BLKmode bit-fields if
they have BITINT_TYPE type.
(tree_check6, tree_not_check6): New inline functions.
(any_integral_type_check): Include BITINT_TYPE.
(build_bitint_type): Declare.
* tree.cc (tree_code_size, wide_int_to_tree_1, cache_integer_cst,
build_zero_cst, type_hash_canon_hash, type_cache_hasher::equal,
type_hash_canon): Handle BITINT_TYPE.
(bitint_type_cache): New variable.
(build_bitint_type): New function.
(signed_or_unsigned_type_for, verify_type_variant, verify_type):
Handle BITINT_TYPE.
(tree_cc_finalize): Free bitint_type_cache.
* builtins.cc (type_to_class): Handle BITINT_TYPE.
(fold_builtin_unordered_cmp): Handle BITINT_TYPE like INTEGER_TYPE.
* cfgexpand.cc (expand_debug_expr): Punt on BLKmode BITINT_TYPE
INTEGER_CSTs.
* convert.cc (convert_to_pointer_1, convert_to_real_1,
convert_to_complex_1): Handle BITINT_TYPE like INTEGER_TYPE.
(convert_to_integer_1): Likewise. For BITINT_TYPE don't check
GET_MODE_PRECISION (TYPE_MODE (type)).
* doc/generic.texi (BITINT_TYPE): Document.
* doc/tm.texi.in (TARGET_C_BITINT_TYPE_INFO): New.
* doc/tm.texi: Regenerated.
* dwarf2out.cc (base_type_die, is_base_type, modified_type_die,
gen_type_die_with_usage): Handle BITINT_TYPE.
(rtl_for_decl_init): Punt on BLKmode BITINT_TYPE INTEGER_CSTs or
handle those which fit into shwi.
* expr.cc (expand_expr_real_1): Define EXTEND_BITINT macro, reduce
to bitfield precision reads from BITINT_TYPE vars, parameters or
memory locations. Expand large/huge BITINT_TYPE INTEGER_CSTs into
memory.
* fold-const.cc (fold_convert_loc, make_range_step): Handle
BITINT_TYPE.
(extract_muldiv_1): For BITINT_TYPE use TYPE_PRECISION rather than
GET_MODE_SIZE (SCALAR_INT_TYPE_MODE).
(native_encode_int, native_interpret_int, native_interpret_expr):
Handle BITINT_TYPE.
* gimple-expr.cc (useless_type_conversion_p): Make BITINT_TYPE
to some other integral type or vice versa conversions non-useless.
* gimple-fold.cc (gimple_fold_builtin_memset): Punt for BITINT_TYPE.
(clear_padding_unit): Mention in comment that _BitInt types don't need
to fit either.
(clear_padding_bitint_needs_padding_p): New function.
(clear_padding_type_may_have_padding_p): Handle BITINT_TYPE.
(clear_padding_type): Likewise.
* internal-fn.cc (expand_mul_overflow): For unsigned non-mode
precision operands force pos_neg? to 1.
(expand_MULBITINT, expand_DIVMODBITINT, expand_FLOATTOBITINT,
expand_BITINTTOFLOAT): New functions.
* internal-fn.def (MULBITINT, DIVMODBITINT, FLOATTOBITINT,
BITINTTOFLOAT): New internal functions.
* internal-fn.h (expand_MULBITINT, expand_DIVMODBITINT,
expand_FLOATTOBITINT, expand_BITINTTOFLOAT): Declare.
* match.pd (non-equality compare simplifications from fold_binary):
Punt if TYPE_MODE (arg1_type) is BLKmode.
* pretty-print.h (pp_wide_int): Handle printing of large precision
wide_ints which would buffer overflow digit_buffer.
* stor-layout.cc (finish_bitfield_representative): For bit-fields
with BITINT_TYPE, prefer representatives with precisions in
multiple of limb precision.
(layout_type): Handle BITINT_TYPE. Handle COMPLEX_TYPE with BLKmode
element type and assert it is BITINT_TYPE.
* target.def (bitint_type_info): New C target hook.
* target.h (struct bitint_info): New type.
* targhooks.cc (default_bitint_type_info): New function.
* targhooks.h (default_bitint_type_info): Declare.
* tree-pretty-print.cc (dump_generic_node): Handle BITINT_TYPE.
Handle printing large wide_ints which would buffer overflow
digit_buffer.
* tree-ssa-sccvn.cc: Include target.h.
(eliminate_dom_walker::eliminate_stmt): Punt for large/huge
BITINT_TYPE.
* tree-switch-conversion.cc (jump_table_cluster::emit): For more than
64-bit BITINT_TYPE subtract low bound from expression and cast to
64-bit integer type both the controlling expression and case labels.
* typeclass.h (enum type_class): Add bitint_type_class enumerator.
* varasm.cc (output_constant): Handle BITINT_TYPE INTEGER_CSTs.
* vr-values.cc (check_for_binary_op_overflow): Use widest2_int rather
than widest_int.
(simplify_using_ranges::simplify_internal_call_using_ranges): Use
unsigned_type_for rather than build_nonstandard_integer_type.
|
|
Small optimization to avoid testing the modifier multiple times.
2023-08-10 Jakub Jelinek <jakub@redhat.com>
PR c/102989
* expr.cc (expand_expr_real_1) <case MEM_REF>: Add an early return for
EXPAND_WRITE or EXPAND_MEMORY modifiers to avoid testing it multiple
times.
|
|
This patch is one of a series of fixes for PR rtl-optimization/110587,
a compile-time regression with -O0, that attempts to address the underlying
cause. As noted previously, the pathological test case pr28071.c contains
a large number of useless register-to-register moves that can produce
quadratic behaviour (in LRA). These moves are generated during RTL
expansion in emit_group_load_1, where the middle-end attempts to simplify
the source before calling extract_bit_field. This is reasonable if the
source is a complex expression (from before the tree-ssa optimizers), or
a SUBREG, or a hard register, but it's not particularly useful to copy
a pseudo register into a new pseudo register. This patch eliminates that
redundancy.
The -fdump-tree-expand output for pr28071.c compiled with -O0 currently
contains 777K lines; with this patch it contains 717K lines, i.e. a saving of
about 60K lines (admittedly of debugging text output, but it makes the point).
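A sketch of the tightened guard (the exact predicate in the commit may differ):
```
/* Only copy ORIG_SRC into a fresh pseudo when it is not already one;
   complex expressions and hard registers still get copied.  */
if (!REG_P (orig_src) || HARD_REGISTER_P (orig_src))
  src = force_reg (GET_MODE (orig_src), orig_src);
else
  src = orig_src;
```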
2023-07-28 Roger Sayle <roger@nextmovesoftware.com>
Richard Biener <rguenther@suse.de>
gcc/ChangeLog
PR middle-end/28071
PR rtl-optimization/110587
* expr.cc (emit_group_load_1): Simplify logic for calling
force_reg on ORIG_SRC, to avoid making a copy if the source
is already in a pseudo register.
|
|
gcc/ChangeLog:
* cselib.h (rtx_equal_for_cselib_1):
Change return type from int to bool.
(references_value_p): Ditto.
(rtx_equal_for_cselib_p): Ditto.
* expr.h (can_store_by_pieces): Ditto.
(try_casesi): Ditto.
(try_tablejump): Ditto.
(safe_from_p): Ditto.
* sbitmap.h (bitmap_equal_p): Ditto.
* cselib.cc (references_value_p): Change return type
from int to bool and adjust function body accordingly.
(rtx_equal_for_cselib_1): Ditto.
* expr.cc (is_aligning_offset): Ditto.
(can_store_by_pieces): Ditto.
(mostly_zeros_p): Ditto.
(all_zeros_p): Ditto.
(safe_from_p): Ditto.
(is_aligning_offset): Ditto.
(try_casesi): Ditto.
(try_tablejump): Ditto.
(store_constructor): Change "need_to_clear" and
"const_bounds_p" variables to bool.
* sbitmap.cc (bitmap_equal_p): Change return type from int to bool.
|
|
The following adds an alternate way of expanding a uniform
mask vector constructor like
_55 = _2 ? -1 : 0;
vect_cst__56 = {_55, _55, _55, _55, _55, _55, _55, _55};
when the mask mode is a scalar int mode like for AVX512 or GCN.
Instead of piecewise building the result via shifts and ors
we can take advantage of uniformity and signedness of the
component and simply sign-extend to the result.
Instead of
cmpl $3, %edi
sete %cl
movl %ecx, %esi
leal (%rsi,%rsi), %eax
leal 0(,%rsi,4), %r9d
leal 0(,%rsi,8), %r8d
orl %esi, %eax
orl %r9d, %eax
movl %ecx, %r9d
orl %r8d, %eax
movl %ecx, %r8d
sall $4, %r9d
sall $5, %r8d
sall $6, %esi
orl %r9d, %eax
orl %r8d, %eax
movl %ecx, %r8d
orl %esi, %eax
sall $7, %r8d
orl %r8d, %eax
kmovb %eax, %k1
we then get
cmpl $3, %edi
sete %cl
negl %ecx
kmovb %ecx, %k1
Code generation for non-uniform masks remains bad, but at least
I see no easy way out for the most general case here.
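A simplified sketch of the special case (names beyond the documented functions are assumptions):
```
/* For a uniform mask constructor in a scalar-int mask mode, expand the
   single boolean component and sign-extend it over the whole mask
   instead of assembling it bit by bit with shifts and iors.  */
rtx elt = expand_normal (CONSTRUCTOR_ELT (exp, 0)->value);
rtx mask = convert_modes (mode, GET_MODE (elt), elt, /*unsignedp=*/0);
```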
PR middle-end/110452
* expr.cc (store_constructor): Handle uniform boolean
vectors with integer mode specially.
|
|
This middle-end patch avoids some redundant RTL for vector initialization
during RTL expansion. For the simple test case:
typedef __int128 v1ti __attribute__ ((__vector_size__ (16)));
__int128 key;
v1ti foo() {
return (v1ti){key};
}
the middle-end currently expands:
(set (reg:V1TI 85) (const_vector:V1TI [ (const_int 0) ]))
(set (reg:V1TI 85) (mem/c:V1TI (symbol_ref:DI ("key"))))
where we create a dead instruction that initializes the vector to zero,
immediately followed by a set of the entire vector. This patch skips
this zeroing instruction when the vector has only a single element.
It also updates the code to indicate when we've cleared the vector,
so that we don't need to initialize zero elements.
2023-06-13 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* expr.cc (store_constructor) <case VECTOR_TYPE>: Don't bother
clearing vectors with only a single element. Set CLEARED if the
vector was initialized to zero.
|
|
After expanding directly to rtl instead of
creating a tree, we could end up with
a const_int which is not ready to be handled
by extract_bit_field.
So we need to do the constant folding here instead.
OK? bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR middle-end/110117
gcc/ChangeLog:
* expr.cc (expand_single_bit_test): Handle
const_int from expand_expr.
gcc/testsuite/ChangeLog:
* gcc.dg/pr110117-1.c: New test.
* gcc.dg/pr110117-2.c: New test.
|
|
In r14-1534-g908e5ab5c11c, I forgot you could turn off CCP or
turn off the bit tracking part of CCP, so we would lose out on
what TER was able to do beforehand. This moves the TER code
around so that it is used instead of just the nonzero bits.
It also makes it easier to remove the TER part of the code
later on.
OK? Bootstrapped and tested on x86_64-linux-gnu.
Note it reintroduces PR 110117 (which was accidentally fixed after
r14-1534-g908e5ab5c11c). The next patch in the series will fix that.
gcc/ChangeLog:
* expr.cc (do_store_flag): Rearrange the
TER code so that it overrides the nonzero bits
info if we had `a & POW2`.
|
|
This patch removes the old widen plus/minus tree codes which have been
replaced by internal functions.
2023-06-05 Andre Vieira <andre.simoesdiasvieira@arm.com>
Joel Hutton <joel.hutton@arm.com>
gcc/ChangeLog:
* doc/generic.texi: Remove old tree codes.
* expr.cc (expand_expr_real_2): Remove old tree code cases.
* gimple-pretty-print.cc (dump_binary_rhs): Likewise.
* optabs-tree.cc (optab_for_tree_code): Likewise.
(supportable_half_widening_operation): Likewise.
* tree-cfg.cc (verify_gimple_assign_binary): Likewise.
* tree-inline.cc (estimate_operator_cost): Likewise.
(op_symbol_code): Likewise.
* tree-vect-data-refs.cc (vect_get_smallest_scalar_type): Likewise.
(vect_analyze_data_ref_accesses): Likewise.
* tree-vect-generic.cc (expand_vector_operations_1): Likewise.
* cfgexpand.cc (expand_debug_expr): Likewise.
* tree-vect-stmts.cc (vectorizable_conversion): Likewise.
(supportable_widening_operation): Likewise.
* gimple-range-op.cc (gimple_range_op_handler::maybe_non_standard):
Likewise.
* optabs.def (vec_widen_ssubl_hi_optab, vec_widen_ssubl_lo_optab,
vec_widen_saddl_hi_optab, vec_widen_saddl_lo_optab,
vec_widen_usubl_hi_optab, vec_widen_usubl_lo_optab,
vec_widen_uaddl_hi_optab, vec_widen_uaddl_lo_optab): Remove optabs.
* tree-pretty-print.cc (dump_generic_node): Remove tree code definition.
* tree.def (WIDEN_PLUS_EXPR, WIDEN_MINUS_EXPR, VEC_WIDEN_PLUS_HI_EXPR,
VEC_WIDEN_PLUS_LO_EXPR, VEC_WIDEN_MINUS_HI_EXPR,
VEC_WIDEN_MINUS_LO_EXPR): Likewise.
|
|
This is a case which I noticed while working on the previous patch.
Sometimes we end up with `a == CST` instead of comparing against 0.
This happens in the following code:
```
unsigned f(unsigned t)
{
if (t & ~(1<<30)) __builtin_unreachable();
t ^= (1<<30);
return t != 0;
}
```
We should handle the case where the nonzero bits are the same as the
comparison operand.
Changes from v1:
* v2: Updated for the bit extraction changes.
OK? Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* expr.cc (do_store_flag): Improve for single bit testing
not against zero but against that single bit.
|
|
While working something else, I noticed we could improve
the following function code generation:
```
unsigned f(unsigned t)
{
if (t & ~(1<<30)) __builtin_unreachable();
return t != 0;
}
```
Right now we just emit a comparison against 0 instead
of just a shift right by 30.
There is code in do_store_flag which already optimizes
`(t & 1<<30) != 0` to `(t >> 30) & 1` (using bit extraction if available).
This patch extends it to handle the case where we know only a single
bit of t can be nonzero.
Changes from v1:
* v2: Updated for the bit extraction improvements.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* expr.cc (do_store_flag): Extend the one bit checking case
to handle the case where we don't have an and but rather still
one bit is known to be non-zero.
|
|
I had thought extract_bit_field's bitpos argument was the shifted position
and not the bit position like BIT_FIELD_REF's, so I had removed the code
which would use the correct bit position for BYTES_BIG_ENDIAN.
Committed as obvious; I checked big-endian MIPS to make sure we are now
producing the correct code.
gcc/ChangeLog:
* expr.cc (expand_single_bit_test): Correct bitpos for big-endian.
|
|
The problem is I used expand_expr with the target, but
we don't want to use the target here as it is the wrong
mode for the original expression. The testcase would ICE
deep down while trying to do a move to use the target.
Anyway, just calling expand_expr with a NULL_RTX target fixes
the issue.
Committed as obvious after a bootstrap/test on x86_64-linux-gnu.
PR middle-end/109919
gcc/ChangeLog:
* expr.cc (expand_single_bit_test): Don't use the
target for expand_expr.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/pr109919-1.c: New test.
|
|
Instead of creating trees for the expansion,
just expand directly, which makes the code a little simpler
and also reduces how much GC memory is used during the expansion.
gcc/ChangeLog:
* expr.cc (fold_single_bit_test): Rename to ...
(expand_single_bit_test): This and expand directly.
(do_store_flag): Update for the renamed function.
|
|
Instead of depending on combine to do the extraction,
let's create a tree which will expand directly into
the extraction. This improves code generation on some
targets.
gcc/ChangeLog:
* expr.cc (fold_single_bit_test): Use BIT_FIELD_REF
instead of shift/and.
|
|
Since we know that fold_single_bit_test is now only passed
NE_EXPR or EQ_EXPR, we can simplify it and just use a gcc_assert
to assert that is the code that is being passed.
gcc/ChangeLog:
* expr.cc (fold_single_bit_test): Add an assert
and simplify based on code being NE_EXPR or EQ_EXPR.
|
|
Now that the only use of fold_single_bit_test is in do_store_flag,
we can change it to take the inner arg and bitnum
instead of building a tree. There are no code generation changes
due to this change, only a decrease in the GC memory produced
during expansion.
gcc/ChangeLog:
* expr.cc (fold_single_bit_test): Take inner and bitnum
instead of arg0 and arg1. Update the code.
(do_store_flag): Don't create a tree when calling
fold_single_bit_test instead just call it with the bitnum
and the inner tree.
|
|
The code in fold_single_bit_test checks if
the inner was a right shift and improves the bitnum
based on that. But since the inner will always be an
SSA_NAME at this point, the code is dead. Move it over
to use the helper function get_def_for_expr instead.
gcc/ChangeLog:
* expr.cc (fold_single_bit_test): Use get_def_for_expr
instead of checking the inner's code.
|
|
Since the last use of fold_single_bit_test_into_sign_test is
fold_single_bit_test,
we can inline it and even simplify the inlined version. This has
no behavior change.
gcc/ChangeLog:
* expr.cc (fold_single_bit_test_into_sign_test): Inline into ...
(fold_single_bit_test): This and simplify.
|
|
This is part 1 of an N-part patch set that will change the expansion
of `(A & C) != 0` from using trees to expanding directly, so that later
on we can do some cost analysis.
Since the only user of fold_single_bit_test is now
expand, move it there.
gcc/ChangeLog:
* fold-const.cc (fold_single_bit_test_into_sign_test): Move to
expr.cc.
(fold_single_bit_test): Likewise.
* expr.cc (fold_single_bit_test_into_sign_test): Move from fold-const.cc
(fold_single_bit_test): Likewise and make static.
* fold-const.h (fold_single_bit_test): Remove declaration.
|
|
gcc/ChangeLog:
* alias.cc (ref_all_alias_ptr_type_p): Use _P() defines from tree.h.
* attribs.cc (diag_attr_exclusions): Ditto.
(decl_attributes): Ditto.
(build_type_attribute_qual_variant): Ditto.
* builtins.cc (fold_builtin_carg): Ditto.
(fold_builtin_next_arg): Ditto.
(do_mpc_arg2): Ditto.
* cfgexpand.cc (expand_return): Ditto.
* cgraph.h (decl_in_symtab_p): Ditto.
(symtab_node::get_create): Ditto.
* dwarf2out.cc (base_type_die): Ditto.
(implicit_ptr_descriptor): Ditto.
(gen_array_type_die): Ditto.
(gen_type_die_with_usage): Ditto.
(optimize_location_into_implicit_ptr): Ditto.
* expr.cc (do_store_flag): Ditto.
* fold-const.cc (negate_expr_p): Ditto.
(fold_negate_expr_1): Ditto.
(fold_convert_const): Ditto.
(fold_convert_loc): Ditto.
(constant_boolean_node): Ditto.
(fold_binary_op_with_conditional_arg): Ditto.
(build_fold_addr_expr_with_type_loc): Ditto.
(fold_comparison): Ditto.
(fold_checksum_tree): Ditto.
(tree_unary_nonnegative_warnv_p): Ditto.
(integer_valued_real_unary_p): Ditto.
(fold_read_from_constant_string): Ditto.
* gcc-rich-location.cc (maybe_range_label_for_tree_type_mismatch::get_text): Ditto.
* gimple-expr.cc (useless_type_conversion_p): Ditto.
(is_gimple_reg): Ditto.
(is_gimple_asm_val): Ditto.
(mark_addressable): Ditto.
* gimple-expr.h (is_gimple_variable): Ditto.
(virtual_operand_p): Ditto.
* gimple-ssa-warn-access.cc (pass_waccess::check_dangling_stores): Ditto.
* gimplify.cc (gimplify_bind_expr): Ditto.
(gimplify_return_expr): Ditto.
(gimple_add_padding_init_for_auto_var): Ditto.
(gimplify_addr_expr): Ditto.
(omp_add_variable): Ditto.
(omp_notice_variable): Ditto.
(omp_get_base_pointer): Ditto.
(omp_strip_components_and_deref): Ditto.
(omp_strip_indirections): Ditto.
(omp_accumulate_sibling_list): Ditto.
(omp_build_struct_sibling_lists): Ditto.
(gimplify_adjust_omp_clauses_1): Ditto.
(gimplify_adjust_omp_clauses): Ditto.
(gimplify_omp_for): Ditto.
(goa_lhs_expr_p): Ditto.
(gimplify_one_sizepos): Ditto.
* graphite-scop-detection.cc (scop_detection::graphite_can_represent_scev): Ditto.
* ipa-devirt.cc (odr_types_equivalent_p): Ditto.
* ipa-prop.cc (ipa_set_jf_constant): Ditto.
(propagate_controlled_uses): Ditto.
* ipa-sra.cc (type_prevails_p): Ditto.
(scan_expr_access): Ditto.
* optabs-tree.cc (optab_for_tree_code): Ditto.
* toplev.cc (wrapup_global_declaration_1): Ditto.
* trans-mem.cc (transaction_invariant_address_p): Ditto.
* tree-cfg.cc (verify_types_in_gimple_reference): Ditto.
(verify_gimple_comparison): Ditto.
(verify_gimple_assign_binary): Ditto.
(verify_gimple_assign_single): Ditto.
* tree-complex.cc (get_component_ssa_name): Ditto.
* tree-emutls.cc (lower_emutls_2): Ditto.
* tree-inline.cc (copy_tree_body_r): Ditto.
(estimate_move_cost): Ditto.
(copy_decl_for_dup_finish): Ditto.
* tree-nested.cc (convert_nonlocal_omp_clauses): Ditto.
(note_nonlocal_vla_type): Ditto.
(convert_local_omp_clauses): Ditto.
(remap_vla_decls): Ditto.
(fixup_vla_decls): Ditto.
* tree-parloops.cc (loop_has_vector_phi_nodes): Ditto.
* tree-pretty-print.cc (print_declaration): Ditto.
(print_call_name): Ditto.
* tree-sra.cc (compare_access_positions): Ditto.
* tree-ssa-alias.cc (compare_type_sizes): Ditto.
* tree-ssa-ccp.cc (get_default_value): Ditto.
* tree-ssa-coalesce.cc (populate_coalesce_list_for_outofssa): Ditto.
* tree-ssa-dom.cc (reduce_vector_comparison_to_scalar_comparison): Ditto.
* tree-ssa-forwprop.cc (can_propagate_from): Ditto.
* tree-ssa-propagate.cc (may_propagate_copy): Ditto.
* tree-ssa-sccvn.cc (fully_constant_vn_reference_p): Ditto.
* tree-ssa-sink.cc (statement_sink_location): Ditto.
* tree-ssa-structalias.cc (type_must_have_pointers): Ditto.
* tree-ssa-ter.cc (find_replaceable_in_bb): Ditto.
* tree-ssa-uninit.cc (warn_uninit): Ditto.
* tree-ssa.cc (maybe_rewrite_mem_ref_base): Ditto.
(non_rewritable_mem_ref_base): Ditto.
* tree-streamer-in.cc (lto_input_ts_type_non_common_tree_pointers): Ditto.
* tree-streamer-out.cc (write_ts_type_non_common_tree_pointers): Ditto.
* tree-vect-generic.cc (do_binop): Ditto.
(do_cond): Ditto.
* tree-vect-stmts.cc (vect_init_vector): Ditto.
* tree-vector-builder.h (tree_vector_builder::note_representative): Ditto.
* tree.cc (sign_mask_for): Ditto.
(verify_type_variant): Ditto.
(gimple_canonical_types_compatible_p): Ditto.
(verify_type): Ditto.
* ubsan.cc (get_ubsan_type_info_for_type): Ditto.
* var-tracking.cc (prepare_call_arguments): Ditto.
(vt_add_function_parameters): Ditto.
* varasm.cc (decode_addr_const): Ditto.
|
|
These two predicates are similar to existing HARD_REGISTER_P and
HARD_REGISTER_NUM_P predicates and return 1 if the given register
corresponds to a virtual register.
gcc/ChangeLog:
* rtl.h (VIRTUAL_REGISTER_P): New predicate.
(VIRTUAL_REGISTER_NUM_P): Ditto.
(REGNO_PTR_FRAME_P): Use VIRTUAL_REGISTER_NUM_P predicate.
* expr.cc (force_operand): Use VIRTUAL_REGISTER_P predicate.
* function.cc (instantiate_decl_rtl): Ditto.
* rtlanal.cc (rtx_addr_can_trap_p_1): Ditto.
(nonzero_address_p): Ditto.
(refers_to_regno_p): Use VIRTUAL_REGISTER_NUM_P predicate.
|
|
This pessimizes on targets with insv instructions.
gcc/
PR rtl-optimization/107762
* expr.cc (emit_group_store): Revert latest change.
|
|
This reverts the changes for the CAN_SPECIAL_DIV_BY_CONST hook.
gcc/ChangeLog:
PR target/108583
* doc/tm.texi (TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST): Remove.
* doc/tm.texi.in: Likewise.
* explow.cc (round_push, align_dynamic_address): Revert previous patch.
* expmed.cc (expand_divmod): Likewise.
* expmed.h (expand_divmod): Likewise.
* expr.cc (force_operand, expand_expr_divmod): Likewise.
* optabs.cc (expand_doubleword_mod, expand_doubleword_divmod): Likewise.
* target.def (can_special_div_by_const): Remove.
* target.h: Remove tree-core.h include.
* targhooks.cc (default_can_special_div_by_const): Remove.
* targhooks.h (default_can_special_div_by_const): Remove.
* tree-vect-generic.cc (expand_vector_operation): Remove hook.
* tree-vect-patterns.cc (vect_recog_divmod_pattern): Remove hook.
* tree-vect-stmts.cc (vectorizable_operation): Remove hook.
|
|
The following testcase ICEs on s390x-linux (e.g. with -march=z13).
The problem is that target is (subreg/s/u:SI (reg/v:DI 66 [ x+-4 ]) 4)
and we call convert_move from temp to the SUBREG_REG of that, expecting
to extend the value properly. That works nicely if temp has some
scalar integer mode (or partial one), but ICEs when temp has V4QImode
on the assertion that from and to modes have the same bitsize.
store_expr generally allows say store from V4QI to SI target because
they have the same size and if temp is a CONST_INT, we already have code
to convert the constant properly, so the following patch just adds handling
of non-scalar integer modes by converting them to the mode of target
first before convert_move extends them.
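A sketch of the added handling (assumed shape; the committed code may differ in detail):
```
/* TEMP has a non-scalar-int mode (e.g. V4QImode) of the same size as
   the SUBREG's outer mode: first view it as that integer mode, then
   let convert_move do the extension.  */
if (!SCALAR_INT_MODE_P (GET_MODE (temp)))
  temp = lowpart_subreg (outer_mode, temp, GET_MODE (temp));
```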
2023-01-03 Jakub Jelinek <jakub@redhat.com>
PR middle-end/108264
* expr.cc (store_expr): For stores into SUBREG_PROMOTED_* targets
from source which doesn't have scalar integral mode first convert
it to outer_mode.
gcc/testsuite/
* gcc.dg/pr108264.c: New test.
|
|
In plenty of image and video processing code it's common to modify pixel values
by a widening operation and then scale them back into range by dividing by 255.
e.g.:
x = y / (2 ^ (bitsize (y) / 2) - 1)
This patch adds a new target hook can_special_div_by_const, similar to
can_vec_perm which can be called to check if a target will handle a particular
division in a special way in the back-end.
The vectorizer will then vectorize the division using the standard tree code
and at expansion time the hook is called again to generate the code for the
division.
A lot of the changes in the patch are to pass down the tree operands on all
paths that can lead to the divmod expansion, so that the target hook always
has the type of the expression being expanded, since the type can change
the expansion.
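A representative source pattern (an illustration, not one of the committed testcases) that such a hook lets targets handle without a real division:
```
void
scale (unsigned char *__restrict dst, const unsigned char *__restrict src,
       unsigned char k, int n)
{
  /* Widen each pixel, scale, then divide back into range by 255.  */
  for (int i = 0; i < n; i++)
    dst[i] = (src[i] * k) / 255;
}
```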
gcc/ChangeLog:
* expmed.h (expand_divmod): Pass tree operands down in addition to RTX.
* expmed.cc (expand_divmod): Likewise.
* explow.cc (round_push, align_dynamic_address): Likewise.
* expr.cc (force_operand, expand_expr_divmod): Likewise.
* optabs.cc (expand_doubleword_mod, expand_doubleword_divmod):
Likewise.
* target.h: Include tree-core.h.
* target.def (can_special_div_by_const): New.
* targhooks.cc (default_can_special_div_by_const): New.
* targhooks.h (default_can_special_div_by_const): New.
* tree-vect-generic.cc (expand_vector_operation): Use it.
* doc/tm.texi.in: Document it.
* doc/tm.texi: Regenerate.
* tree-vect-patterns.cc (vect_recog_divmod_pattern): Check for support.
* tree-vect-stmts.cc (vectorizable_operation): Likewise.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-div-bitmask-1.c: New test.
* gcc.dg/vect/vect-div-bitmask-2.c: New test.
* gcc.dg/vect/vect-div-bitmask-3.c: New test.
* gcc.dg/vect/vect-div-bitmask.h: New file.
|
|
The goal of the trick is to make life easier for the combiner, but subword
paradoxical subregs make it harder for the register allocator instead.
gcc/
* expr.cc (emit_group_store): Do not use subword paradoxical subregs.
|
|
e034c5c8957 re PR target/78643 (ICE in convert_move, at expr.c:230)
fixed the case where DECL_MODE of a vector field is BLKmode and its
TYPE_MODE is a vector mode because of target attribute. Remove the
BLKmode check for the case where DECL_MODE of a vector field is a vector
mode and its TYPE_MODE isn't a vector mode because of target attribute.
gcc/
PR target/107304
* expr.cc (get_inner_reference): Always use TYPE_MODE for vector
field with vector raw mode.
gcc/testsuite/
PR target/107304
* gcc.target/i386/pr107304.c: New test.
|
|
I forgot to handle the case where lowpart_subreg returns a VOIDmode
CONST_INT; in that case convert_mode_scalar obviously doesn't work.
The following patch fixes that.
2022-10-19 Jakub Jelinek <jakub@redhat.com>
PR middle-end/107262
* expr.cc (convert_mode_scalar): For BFmode -> SFmode conversions
of constants, use simplify_unary_operation if fromi has VOIDmode
instead of recursive convert_mode_scalar.
gcc/testsuite/
* gcc.dg/pr107262.c: New test.
|
|
Here is a complete patch to add std::bfloat16_t support on
x86 (AArch64 and ARM left for later). Almost no BFmode optabs
are added by the patch, so for binops/unops it extends to SFmode
first and then truncates back to BFmode.
For {HF,SF,DF,XF,TF}mode -> BFmode conversions libgcc has implementations
of all those conversions so that we avoid double rounding, for
BFmode -> {DF,XF,TF}mode conversions to avoid growing libgcc too much
it emits BFmode -> SFmode conversion first and then converts to the even
wider mode, neither step should be imprecise.
For BFmode -> HFmode, it first emits a precise BFmode -> SFmode conversion
and then SFmode -> HFmode, because neither format is subset or superset
of the other, while SFmode is superset of both.
expr.cc then contains a -ffast-math optimization of the BF -> SF and
SF -> BF conversions if we don't optimize for space (and for the latter
if -frounding-math isn't enabled either).
For x86, perhaps truncsfbf2 optab could be defined for TARGET_AVX512BF16
but IMNSHO should FAIL if !flag_finite_math || flag_rounding_math
|| !flag_unsafe_math_optimizations, because I think the insn doesn't
raise on sNaNs, hardcodes round to nearest and flushes denormals to zero.
By default (unless x86 -fexcess-precision=16) we use float excess
precision for BFmode, so truncate only on explicit casts and assignments.
The patch introduces a single __bf16 builtin - __builtin_nansf16b,
because (__bf16) __builtin_nansf ("") will drop the sNaN into qNaN,
and uses f16b suffix instead of bf16 because there would be ambiguity on
log vs. logb - __builtin_logbf16 could be either log with bf16 suffix
or logb with f16 suffix. In other cases libstdc++ should mostly use
__builtin_*f for std::bfloat16_t overloads (we have a problem with
std::nextafter though but that one we have also for std::float16_t).
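A standalone sketch (not the GCC-internal expansion) of why the -ffast-math BF <-> SF paths are cheap: bfloat16 is the top half of an IEEE single.
```
#include <string.h>

/* Extending bf16 bits to float is a 16-bit left shift of the bit
   pattern; truncating back (ignoring rounding) keeps the top half.  */
float
bf16_bits_to_float (unsigned short bits)
{
  unsigned int u = (unsigned int) bits << 16;
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}
```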
2022-10-14 Jakub Jelinek <jakub@redhat.com>
gcc/
* tree-core.h (enum tree_index): Add TI_BFLOAT16_TYPE.
* tree.h (bfloat16_type_node): Define.
* tree.cc (excess_precision_type): Promote bfloat16_type_mode
like float16_type_mode.
(build_common_tree_nodes): Initialize bfloat16_type_node if
BFmode is supported.
* expmed.h (maybe_expand_shift): Declare.
* expmed.cc (maybe_expand_shift): No longer static.
* expr.cc (convert_mode_scalar): Don't ICE on BF -> HF or HF -> BF
conversions. If there is no optab, handle BF -> {DF,XF,TF,HF}
conversions as separate BF -> SF -> {DF,XF,TF,HF} conversions, add
-ffast-math generic implementation for BF -> SF and SF -> BF
conversions.
* builtin-types.def (BT_BFLOAT16, BT_FN_BFLOAT16_CONST_STRING): New.
* builtins.def (BUILT_IN_NANSF16B): New builtin.
* fold-const-call.cc (fold_const_call): Handle CFN_BUILT_IN_NANSF16B.
* config/i386/i386.cc (classify_argument): Handle E_BCmode.
(ix86_libgcc_floating_mode_supported_p): Also return true for BFmode
for -msse2.
(ix86_mangle_type): Mangle BFmode as DF16b.
(ix86_invalid_conversion, ix86_invalid_unary_op,
ix86_invalid_binary_op): Remove.
(TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP,
TARGET_INVALID_BINARY_OP): Don't redefine.
* config/i386/i386-builtins.cc (ix86_bf16_type_node): Remove.
(ix86_register_bf16_builtin_type): Use bfloat16_type_node rather than
ix86_bf16_type_node, only create it if still NULL.
* config/i386/i386-builtin-types.def (BFLOAT16): Likewise.
* config/i386/i386.md (cbranchbf4, cstorebf4): New expanders.
gcc/c-family/
* c-cppbuiltin.cc (c_cpp_builtins): If bfloat16_type_node,
predefine __BFLT16_*__ macros and for C++23 also
__STDCPP_BFLOAT16_T__. Predefine bfloat16_type_node related
macros for -fbuilding-libgcc.
* c-lex.cc (interpret_float): Handle CPP_N_BFLOAT16.
gcc/c/
* c-typeck.cc (convert_arguments): Don't promote __bf16 to
double.
gcc/cp/
* cp-tree.h (extended_float_type_p): Return true for
bfloat16_type_node.
* typeck.cc (cp_compare_floating_point_conversion_ranks): Set
extended{1,2} if mv{1,2} is bfloat16_type_node. Adjust comment.
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_bfloat16,
check_effective_target_bfloat16_runtime, add_options_for_bfloat16):
New.
* gcc.dg/torture/bfloat16-basic.c: New test.
* gcc.dg/torture/bfloat16-builtin.c: New test.
* gcc.dg/torture/bfloat16-builtin-issignaling-1.c: New test.
* gcc.dg/torture/bfloat16-complex.c: New test.
* gcc.dg/torture/builtin-issignaling-1.c: Allow to be includable
from bfloat16-builtin-issignaling-1.c.
* gcc.dg/torture/floatn-basic.h: Allow to be includable from
bfloat16-basic.c.
* gcc.target/i386/vect-bfloat16-typecheck_2.c: Adjust expected
diagnostics.
* gcc.target/i386/sse2-bfloat16-scalar-typecheck.c: Likewise.
* gcc.target/i386/vect-bfloat16-typecheck_1.c: Likewise.
* g++.target/i386/bfloat_cpp_typecheck.C: Likewise.
libcpp/
* include/cpplib.h (CPP_N_BFLOAT16): Define.
* expr.cc (interpret_float_suffix): Handle bf16 and BF16 suffixes for
C++.
libgcc/
* config/i386/t-softfp (softfp_extensions): Add bfsf.
(softfp_truncations): Add tfbf xfbf dfbf sfbf hfbf.
(CFLAGS-extendbfsf2.c, CFLAGS-truncsfbf2.c, CFLAGS-truncdfbf2.c,
CFLAGS-truncxfbf2.c, CFLAGS-trunctfbf2.c, CFLAGS-trunchfbf2.c): Add
-msse2.
* config/i386/libgcc-glibc.ver (GCC_13.0.0): Export
__extendbfsf2 and __trunc{s,d,x,t,h}fbf2.
* config/i386/sfp-machine.h (_FP_NANSIGN_B): Define.
* config/i386/64/sfp-machine.h (_FP_NANFRAC_B): Define.
* config/i386/32/sfp-machine.h (_FP_NANFRAC_B): Define.
* soft-fp/brain.h: New file.
* soft-fp/truncsfbf2.c: New file.
* soft-fp/truncdfbf2.c: New file.
* soft-fp/truncxfbf2.c: New file.
* soft-fp/trunctfbf2.c: New file.
* soft-fp/trunchfbf2.c: New file.
* soft-fp/truncbfhf2.c: New file.
* soft-fp/extendbfsf2.c: New file.
libiberty/
* cp-demangle.h (D_BUILTIN_TYPE_COUNT): Increment.
* cp-demangle.c (cplus_demangle_builtin_types): Add std::bfloat16_t
entry.
(cplus_demangle_type): Demangle DF16b.
* testsuite/demangle-expected (_Z3xxxDF16b): New test.
|
|
The recent optimization implemented for complex modes contains an oversight
for big-endian platforms: it uses a lowpart SUBREG when the integer modes
have different sizes, but this does not match the semantics of the PARALLELs
which have a bundled byte offset; this offset is always zero in the code
path and the lowpart is not at offset zero on big-endian platforms.
gcc/
* expr.cc (emit_group_store): Fix handling of modes of different
sizes for big-endian targets in latest change and add commentary.
|
|
The following preserves the (premature) redundant store removal
done in store_expr by appropriately guarding it with
mems_same_for_tbaa_p. The testcase added needs scheduling disabled
for now since there's a similar bug there still present.
PR middle-end/107115
* expr.cc (store_expr): Check mems_same_for_tbaa_p before
eliding a seemingly redundant store.
* gcc.dg/torture/pr107115.c: New testcase.
|
|
The initial commit that added opaque types thought that there couldn't
be any valid initializations for variables of these types, but the test
case in the bug report shows that isn't true. The solution is to handle
OPAQUE_TYPE initializations like the other scalar types.
2022-06-17 Peter Bergner <bergner@linux.ibm.com>
gcc/
PR c/106016
* expr.cc (count_type_elements): Handle OPAQUE_TYPE.
gcc/testsuite/
PR c/106016
* gcc.target/powerpc/pr106016.c: New test.
|
|
gcc/
* expr.cc (store_expr): Identify trailing NULs in a STRING_CST
initializer and use clear_storage rather than copying the
NULs to the destination array.
|
|
When lowering COMPLEX_EXPR we currently emit two VEC_EXTRACTs. One for the
lowpart and one for the highpart.
The problem with this is that in RTL the lvalue of the RTX is the only thing
tying the two instructions together.
This means that e.g. combine is unable to try to combine the two instructions
for setting the lowpart and highpart.
For ISAs that have bit extract instructions we can eliminate one of the
extracts if, and only if, we're setting the entire complex number.
This change makes the expand code generate a subreg for the lowpart
instead of a vec_extract when we're setting the entire complex number.
This allows us to optimize sequences such as:
_Complex int f(int a, int b) {
_Complex int t = a + b * 1i;
return t;
}
from:
f:
bfi x2, x0, 0, 32
bfi x2, x1, 32, 32
mov x0, x2
ret
into:
f:
bfi x0, x1, 32, 32
ret
I have also confirmed the codegen for x86_64 did not change.
gcc/ChangeLog:
* expmed.cc (store_bit_field_1): Add parameter that indicates if value is
still undefined and if so emit a subreg move instead.
(store_integral_bit_field): Likewise.
(store_bit_field): Likewise.
* expr.h (write_complex_part): Likewise.
* expmed.h (store_bit_field): Add new parameter.
* builtins.cc (expand_ifn_atomic_compare_exchange_into_call): Use new
parameter.
(expand_ifn_atomic_compare_exchange): Likewise.
* calls.cc (store_unaligned_arguments_into_pseudos): Likewise.
* emit-rtl.cc (validate_subreg): Likewise.
* expr.cc (emit_group_store): Likewise.
(copy_blkmode_from_reg): Likewise.
(copy_blkmode_to_reg): Likewise.
(clear_storage_hints): Likewise.
(write_complex_part): Likewise.
(emit_move_complex_parts): Likewise.
(expand_assignment): Likewise.
(store_expr): Likewise.
(store_field): Likewise.
(expand_expr_real_2): Likewise.
* ifcvt.cc (noce_emit_move_insn): Likewise.
* internal-fn.cc (expand_arith_set_overflow): Likewise.
(expand_arith_overflow_result_store): Likewise.
(expand_addsub_overflow): Likewise.
(expand_neg_overflow): Likewise.
(expand_mul_overflow): Likewise.
(expand_arith_overflow): Likewise.
gcc/testsuite/ChangeLog:
* g++.target/aarch64/complex-init.C: New test.
|
|
Sorry for the long delay getting back to this, but after deeper
investigation, it turns out that Jeff Law's tingling spider sense
that the original patch wasn't updating everything that was required
was spot on.
compiling the same tests with -O0 found several additional assertion
ICEs (exactly where he'd predicted they'd be).
Here's a revised patch that updates five locations (up from the
previous two). Finding any remaining locations (if any) might be
easier once folks are able to test things on their targets. This
also implements Jeff's suggestion to factor the common code into
helper routines.
2022-07-04 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR target/104489
* calls.cc (precompute_register_parameters): Allow promotion
of floating point values to be passed in wider integer modes
by calling new convert_float_to_wider_int.
(expand_call): Allow floating point results to be returned in
wider integer modes by calling new convert_wider_int_to_float.
* cfgexpand.cc (expand_value_return): Allow backends to promote
a scalar floating point return value to a wider integer mode
by calling new convert_float_to_wider_int.
* expr.cc (convert_float_to_wider_int): New function.
(convert_wider_int_to_float): Likewise.
(expand_expr_real_1) <expand_decl_rtl>: Allow backends to promote
scalar FP PARM_DECLs to wider integer modes, by calling new
convert_wider_int_to_float.
* expr.h (convert_modes): Name arguments for improved documentation.
(convert_float_to_wider_int): Prototype new function here.
(convert_wider_int_to_float): Likewise.
* function.cc (assign_parm_setup_stack): Allow floating point
values to be passed on the stack as wider integer modes by
calling new convert_wider_int_to_float.
|
|
The original fix is very likely too big a hammer.
gcc/
PR middle-end/105874
* expr.cc (expand_expr_real_1) <normal_inner_ref>: Force
EXPAND_MEMORY for the expansion of the inner reference only
in the usual cases where a memory reference is required.
|
|
gcc/ChangeLog:
* expr.cc: Add "final" and "override" to op_by_pieces_d vfunc
implementations as appropriate.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
If expand_cond_expr_using_cmove can't find a cmove optab for a particular
mode, it tries to promote the mode and perform the cmove in the promoted
mode.
The testcase in the patch ICEs on arm because in that case we pass temp which
has the promoted mode (SImode) as target to expand_operands where the
operands have the non-promoted mode (QImode).
Later on the function uses paradoxical subregs:
if (GET_MODE (op1) != mode)
op1 = gen_lowpart (mode, op1);
if (GET_MODE (op2) != mode)
op2 = gen_lowpart (mode, op2);
to change the operand modes.
The following patch fixes it by passing NULL_RTX as target if it has
promoted mode.
2022-06-21 Jakub Jelinek <jakub@redhat.com>
PR middle-end/106030
* expr.cc (expand_cond_expr_using_cmove): Pass NULL_RTX instead of
temp to expand_operands if mode has been promoted.
gcc/testsuite/
* gcc.c-torture/compile/pr106030.c: New test.
|