aboutsummaryrefslogtreecommitdiff
path: root/gcc/fold-const.cc
AgeCommit message (Collapse)AuthorFilesLines
2023-10-19PR111648: Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding.Prathamesh Kulkarni1-11/+103
gcc/ChangeLog: PR tree-optimization/111648 * fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): If a1 chooses base element from arg, ensure that it's a natural stepped sequence. (build_vec_cst_rand): New param natural_stepped and use it to construct a naturally stepped sequence. (test_nunits_min_2): Add new unit tests Case 6 and Case 7.
2023-10-16use more get_range_queryJiufu Guo1-5/+1
For "get_global_range_query" SSA_NAME_RANGE_INFO can be queried. For "get_range_query", it could get more context-aware range info. And look at the implementation of "get_range_query", it returns global range if no local fun info. So, if not quering for SSA_NAME and not chaning the IL, it would be ok to use get_range_query to replace get_global_range_query. gcc/ChangeLog: * fold-const.cc (expr_not_equal_to): Replace get_global_range_query by get_range_query. * gimple-fold.cc (size_must_be_zero_p): Likewise. * gimple-range-fold.cc (fur_source::fur_source): Likewise. * gimple-ssa-warn-access.cc (check_nul_terminated_array): Likewise. * tree-dfa.cc (get_ref_base_and_extent): Likewise.
2023-10-12wide-int: Allow up to 16320 bits wide_int and change widest_int precision to ↵Jakub Jelinek1-3/+11
32640 bits [PR102989] As mentioned in the _BitInt support thread, _BitInt(N) is currently limited by the wide_int/widest_int maximum precision limitation, which is depending on target 191, 319, 575 or 703 bits (one less than WIDE_INT_MAX_PRECISION). That is fairly low limit for _BitInt, especially on the targets with the 191 bit limitation. The following patch bumps that limit to 16319 bits on all arches (which support _BitInt at all), which is the limit imposed by INTEGER_CST representation (unsigned char members holding number of HOST_WIDE_INT limbs). In order to achieve that, wide_int is changed from a trivially copyable type which contained just an inline array of WIDE_INT_MAX_ELTS (3, 5, 9 or 11 limbs depending on target) limbs into a non-trivially copy constructible, copy assignable and destructible type which for the usual small cases (up to WIDE_INT_MAX_INL_ELTS which is the former WIDE_INT_MAX_ELTS) still uses an inline array of limbs, but for larger precisions uses heap allocated limb array. This makes wide_int unusable in GC structures, so for dwarf2out which was the only place which needed it there is a new rwide_int type (restricted wide_int) which supports only up to RWIDE_INT_MAX_ELTS limbs inline and is trivially copyable (dwarf2out should never deal with large _BitInt constants, those should have been lowered earlier). Similarly, widest_int has been changed from a trivially copyable type which contained also an inline array of WIDE_INT_MAX_ELTS limbs (but unlike wide_int didn't contain precision and assumed that to be WIDE_INT_MAX_PRECISION) into a non-trivially copy constructible, copy assignable and destructible type which has always WIDEST_INT_MAX_PRECISION precision (32640 bits currently, twice as much as INTEGER_CST limitation allows) and unlike wide_int decides depending on get_len () value whether it uses an inline array (again, up to WIDE_INT_MAX_INL_ELTS) or heap allocated one. In wide-int.h this means we need to estimate an upper bound on how many limbs will wide-int.cc (usually, sometimes wide-int.h) need to write, heap allocate if needed based on that estimation and upon set_len which is done at the end if we guessed over WIDE_INT_MAX_INL_ELTS and allocated dynamically, while we actually need less than that copy/deallocate. The unexact guesses are needed because the exact computation of the length in wide-int.cc is sometimes quite complex and especially canonicalize at the end can decrease it. widest_int is again because of this not usable in GC structures, so cfgloop.h has been changed to use fixed_wide_int_storage <WIDE_INT_MAX_INL_PRECISION> and punt if we'd have larger _BitInt based iterators, programs having more than 128-bit iterators will be hopefully rare and I think it is fine to treat loops with more than 2^127 iterations as effectively possibly infinite, omp-general.cc is changed to use fixed_wide_int_storage <1024>, as it better should support scores with the same precision on all arches. Code which used WIDE_INT_PRINT_BUFFER_SIZE sized buffers for printing wide_int/widest_int into buffer had to be changed to use XALLOCAVEC for larger lengths. On x86_64, the patch in --enable-checking=yes,rtl,extra configured bootstrapped cc1plus enlarges the .text section by 1.01% - from 0x25725a5 to 0x25e5555 and similarly at least when compiling insn-recog.cc with the usual bootstrap option slows compilation down by 1.01%, user 4m22.046s and 4m22.384s on vanilla trunk vs. 4m25.947s and 4m25.581s on patched trunk. I'm afraid some code size growth and compile time slowdown is unavoidable in this case, we use wide_int and widest_int everywhere, and while the rare cases are marked with UNLIKELY macros, it still means extra checks for it. The patch also regresses +FAIL: gm2/pim/fail/largeconst.mod, -O +FAIL: gm2/pim/fail/largeconst.mod, -O -g +FAIL: gm2/pim/fail/largeconst.mod, -O3 -fomit-frame-pointer +FAIL: gm2/pim/fail/largeconst.mod, -O3 -fomit-frame-pointer -finline-functions +FAIL: gm2/pim/fail/largeconst.mod, -Os +FAIL: gm2/pim/fail/largeconst.mod, -g +FAIL: gm2/pim/fail/largeconst2.mod, -O +FAIL: gm2/pim/fail/largeconst2.mod, -O -g +FAIL: gm2/pim/fail/largeconst2.mod, -O3 -fomit-frame-pointer +FAIL: gm2/pim/fail/largeconst2.mod, -O3 -fomit-frame-pointer -finline-functions +FAIL: gm2/pim/fail/largeconst2.mod, -Os +FAIL: gm2/pim/fail/largeconst2.mod, -g tests, which previously were rejected with error: constant literal ‘12345678912345678912345679123456789123456789123456789123456789123456791234567891234567891234567891234567891234567912345678912345678912345678912345678912345679123456789123456789’ exceeds internal ZTYPE range kind of errors, but now are accepted. Seems the FE tries to parse constants into widest_int in that case and only diagnoses if widest_int overflows, that seems wrong, it should at least punt if stuff doesn't fit into WIDE_INT_MAX_PRECISION, but perhaps far less than that, if it wants support for middle-end for precisions above 128-bit, it better should be using BITINT_TYPE. Will file a PR and defer to Modula2 maintainer. 2023-10-12 Jakub Jelinek <jakub@redhat.com> PR c/102989 * wide-int.h: Adjust file comment. (WIDE_INT_MAX_INL_ELTS): Define to former value of WIDE_INT_MAX_ELTS. (WIDE_INT_MAX_INL_PRECISION): Define. (WIDE_INT_MAX_ELTS): Change to 255. Assert that WIDE_INT_MAX_INL_ELTS is smaller than WIDE_INT_MAX_ELTS. (RWIDE_INT_MAX_ELTS, RWIDE_INT_MAX_PRECISION, WIDEST_INT_MAX_ELTS, WIDEST_INT_MAX_PRECISION): Define. (WI_BINARY_RESULT_VAR, WI_UNARY_RESULT_VAR): Change write_val callers to pass 0 as a new argument. (class widest_int_storage): Likewise. (widest_int, widest2_int): Change typedefs to use widest_int_storage rather than fixed_wide_int_storage. (enum wi::precision_type): Add INL_CONST_PRECISION enumerator. (struct binary_traits): Add partial specializations for INL_CONST_PRECISION. (generic_wide_int): Add needs_write_val_arg static data member. (int_traits): Likewise. (wide_int_storage): Replace val non-static data member with a union u of it and HOST_WIDE_INT *valp. Declare copy constructor, copy assignment operator and destructor. Add unsigned int argument to write_val. (wide_int_storage::wide_int_storage): Initialize precision to 0 in the default ctor. Remove unnecessary {}s around STATIC_ASSERTs. Assert in non-default ctor T's precision_type is not INL_CONST_PRECISION and allocate u.valp for large precision. Add copy constructor. (wide_int_storage::~wide_int_storage): New. (wide_int_storage::operator=): Add copy assignment operator. In assignment operator remove unnecessary {}s around STATIC_ASSERTs, assert ctor T's precision_type is not INL_CONST_PRECISION and if precision changes, deallocate and/or allocate u.valp. (wide_int_storage::get_val): Return u.valp rather than u.val for large precision. (wide_int_storage::write_val): Likewise. Add an unused unsigned int argument. (wide_int_storage::set_len): Use write_val instead of writing val directly. (wide_int_storage::from, wide_int_storage::from_array): Adjust write_val callers. (wide_int_storage::create): Allocate u.valp for large precisions. (wi::int_traits <wide_int_storage>::get_binary_precision): New. (fixed_wide_int_storage::fixed_wide_int_storage): Make default ctor defaulted. (fixed_wide_int_storage::write_val): Add unused unsigned int argument. (fixed_wide_int_storage::from, fixed_wide_int_storage::from_array): Adjust write_val callers. (wi::int_traits <fixed_wide_int_storage>::get_binary_precision): New. (WIDEST_INT): Define. (widest_int_storage): New template class. (wi::int_traits <widest_int_storage>): New. (trailing_wide_int_storage::write_val): Add unused unsigned int argument. (wi::get_binary_precision): Use wi::int_traits <WI_BINARY_RESULT (T1, T2)>::get_binary_precision rather than get_precision on get_binary_result. (wi::copy): Adjust write_val callers. Don't call set_len if needs_write_val_arg. (wi::bit_not): If result.needs_write_val_arg, call write_val again with upper bound estimate of len. (wi::sext, wi::zext, wi::set_bit): Likewise. (wi::bit_and, wi::bit_and_not, wi::bit_or, wi::bit_or_not, wi::bit_xor, wi::add, wi::sub, wi::mul, wi::mul_high, wi::div_trunc, wi::div_floor, wi::div_ceil, wi::div_round, wi::divmod_trunc, wi::mod_trunc, wi::mod_floor, wi::mod_ceil, wi::mod_round, wi::lshift, wi::lrshift, wi::arshift): Likewise. (wi::bswap, wi::bitreverse): Assert result.needs_write_val_arg is false. (gt_ggc_mx, gt_pch_nx): Remove generic template for all generic_wide_int, instead add functions and templates for each storage of generic_wide_int. Make functions for generic_wide_int <wide_int_storage> and templates for generic_wide_int <widest_int_storage <N>> deleted. (wi::mask, wi::shifted_mask): Adjust write_val calls. * wide-int.cc (zeros): Decrease array size to 1. (BLOCKS_NEEDED): Use CEIL. (canonize): Use HOST_WIDE_INT_M1. (wi::from_buffer): Pass 0 to write_val. (wi::to_mpz): Use CEIL. (wi::from_mpz): Likewise. Pass 0 to write_val. Use WIDE_INT_MAX_INL_ELTS instead of WIDE_INT_MAX_ELTS. (wi::mul_internal): Use WIDE_INT_MAX_INL_PRECISION instead of MAX_BITSIZE_MODE_ANY_INT in automatic array sizes, for prec above WIDE_INT_MAX_INL_PRECISION estimate precision from lengths of operands. Use XALLOCAVEC allocated buffers for prec above WIDE_INT_MAX_INL_PRECISION. (wi::divmod_internal): Likewise. (wi::lshift_large): For len > WIDE_INT_MAX_INL_ELTS estimate it from xlen and skip. (rshift_large_common): Remove xprecision argument, add len argument with len computed in caller. Don't return anything. (wi::lrshift_large, wi::arshift_large): Compute len here and pass it to rshift_large_common, for lengths above WIDE_INT_MAX_INL_ELTS using estimations from xlen if possible. (assert_deceq, assert_hexeq): For lengths above WIDE_INT_MAX_INL_ELTS use XALLOCAVEC allocated buffer. (test_printing): Use WIDE_INT_MAX_INL_PRECISION instead of WIDE_INT_MAX_PRECISION. * wide-int-print.h (WIDE_INT_PRINT_BUFFER_SIZE): Use WIDE_INT_MAX_INL_PRECISION instead of WIDE_INT_MAX_PRECISION. * wide-int-print.cc (print_decs, print_decu, print_hex): For lengths above WIDE_INT_MAX_INL_ELTS use XALLOCAVEC allocated buffer. * tree.h (wi::int_traits<extended_tree <N>>): Change precision_type to INL_CONST_PRECISION for N == ADDR_MAX_PRECISION. (widest_extended_tree): Use WIDEST_INT_MAX_PRECISION instead of WIDE_INT_MAX_PRECISION. (wi::ints_for): Use int_traits <extended_tree <N> >::precision_type instead of hard coded CONST_PRECISION. (widest2_int_cst): Use WIDEST_INT_MAX_PRECISION instead of WIDE_INT_MAX_PRECISION. (wi::extended_tree <N>::get_len): Use WIDEST_INT_MAX_PRECISION rather than WIDE_INT_MAX_PRECISION. (wi::ints_for::zero): Use wi::int_traits <wi::extended_tree <N> >::precision_type instead of wi::CONST_PRECISION. * tree.cc (build_replicated_int_cst): Formatting fix. Use WIDE_INT_MAX_INL_ELTS rather than WIDE_INT_MAX_ELTS. * print-tree.cc (print_node): Don't print TREE_UNAVAILABLE on INTEGER_CSTs, TREE_VECs or SSA_NAMEs. * double-int.h (wi::int_traits <double_int>::precision_type): Change to INL_CONST_PRECISION from CONST_PRECISION. * poly-int.h (struct poly_coeff_traits): Add partial specialization for wi::INL_CONST_PRECISION. * cfgloop.h (bound_wide_int): New typedef. (struct nb_iter_bound): Change bound type from widest_int to bound_wide_int. (struct loop): Change nb_iterations_upper_bound, nb_iterations_likely_upper_bound and nb_iterations_estimate type from widest_int to bound_wide_int. * cfgloop.cc (record_niter_bound): Return early if wi::min_precision of i_bound is too large for bound_wide_int. Adjustments for the widest_int to bound_wide_int type change in non-static data members. (get_estimated_loop_iterations, get_max_loop_iterations, get_likely_max_loop_iterations): Adjustments for the widest_int to bound_wide_int type change in non-static data members. * tree-vect-loop.cc (vect_transform_loop): Likewise. * tree-ssa-loop-niter.cc (do_warn_aggressive_loop_optimizations): Use XALLOCAVEC allocated buffer for i_bound len above WIDE_INT_MAX_INL_ELTS. (record_estimate): Return early if wi::min_precision of i_bound is too large for bound_wide_int. Adjustments for the widest_int to bound_wide_int type change in non-static data members. (wide_int_cmp): Use bound_wide_int instead of widest_int. (bound_index): Use bound_wide_int instead of widest_int. (discover_iteration_bound_by_body_walk): Likewise. Use widest_int::from to convert it to widest_int when passed to record_niter_bound. (maybe_lower_iteration_bound): Use widest_int::from to convert it to widest_int when passed to record_niter_bound. (estimate_numbers_of_iteration): Don't record upper bound if loop->nb_iterations has too large precision for bound_wide_int. (n_of_executions_at_most): Use widest_int::from. * tree-ssa-loop-ivcanon.cc (remove_redundant_iv_tests): Adjust for the widest_int to bound_wide_int changes. * match.pd (fold_sign_changed_comparison simplification): Use wide_int::from on wi::to_wide instead of wi::to_widest. * value-range.h (irange::maybe_resize): Avoid using memcpy on non-trivially copyable elements. * value-range.cc (irange_bitmask::dump): Use XALLOCAVEC allocated buffer for mask or value len above WIDE_INT_PRINT_BUFFER_SIZE. * fold-const.cc (fold_convert_const_int_from_int, fold_unary_loc): Use wide_int::from on wi::to_wide instead of wi::to_widest. * tree-ssa-ccp.cc (bit_value_binop): Zero extend r1max from width before calling wi::udiv_trunc. * lto-streamer-out.cc (output_cfg): Adjustments for the widest_int to bound_wide_int type change in non-static data members. * lto-streamer-in.cc (input_cfg): Likewise. (lto_input_tree_1): Use WIDE_INT_MAX_INL_ELTS rather than WIDE_INT_MAX_ELTS. For length above WIDE_INT_MAX_INL_ELTS use XALLOCAVEC allocated buffer. Formatting fix. * data-streamer-in.cc (streamer_read_wide_int, streamer_read_widest_int): Likewise. * tree-affine.cc (aff_combination_expand): Use placement new to construct name_expansion. (free_name_expansion): Destruct name_expansion. * gimple-ssa-strength-reduction.cc (struct slsr_cand_d): Change index type from widest_int to offset_int. (class incr_info_d): Change incr type from widest_int to offset_int. (alloc_cand_and_find_basis, backtrace_base_for_ref, restructure_reference, slsr_process_ref, create_mul_ssa_cand, create_mul_imm_cand, create_add_ssa_cand, create_add_imm_cand, slsr_process_add, cand_abs_increment, replace_mult_candidate, replace_unconditional_candidate, incr_vec_index, create_add_on_incoming_edge, create_phi_basis_1, replace_conditional_candidate, record_increment, record_phi_increments_1, phi_incr_cost_1, phi_incr_cost, lowest_cost_path, total_savings, ncd_with_phi, ncd_of_cand_and_phis, nearest_common_dominator_for_cands, insert_initializers, all_phi_incrs_profitable_1, replace_one_candidate, replace_profitable_candidates): Use offset_int rather than widest_int and wi::to_offset rather than wi::to_widest. * real.cc (real_to_integer): Use WIDE_INT_MAX_INL_ELTS rather than 2 * WIDE_INT_MAX_ELTS and for words above that use XALLOCAVEC allocated buffer. * tree-ssa-loop-ivopts.cc (niter_for_exit): Use placement new to construct tree_niter_desc and destruct it on failure. (free_tree_niter_desc): Destruct tree_niter_desc if value is non-NULL. * gengtype.cc (main): Remove widest_int handling. * graphite-isl-ast-to-gimple.cc (widest_int_from_isl_expr_int): Use WIDEST_INT_MAX_ELTS instead of WIDE_INT_MAX_ELTS. * gimple-ssa-warn-alloca.cc (pass_walloca::execute): Use WIDE_INT_MAX_INL_PRECISION instead of WIDE_INT_MAX_PRECISION and assert get_len () fits into it. * value-range-pretty-print.cc (vrange_printer::print_irange_bitmasks): For mask or value lengths above WIDE_INT_MAX_INL_ELTS use XALLOCAVEC allocated buffer. * gimple-ssa-sprintf.cc (adjust_range_for_overflow): Use wide_int::from on wi::to_wide instead of wi::to_widest. * omp-general.cc (score_wide_int): New typedef. (omp_context_compute_score): Use score_wide_int instead of widest_int and adjust for those changes. (struct omp_declare_variant_entry): Change score and score_in_declare_simd_clone non-static data member type from widest_int to score_wide_int. (omp_resolve_late_declare_variant, omp_resolve_declare_variant): Use score_wide_int instead of widest_int and adjust for those changes. (omp_lto_output_declare_variant_alt): Likewise. (omp_lto_input_declare_variant_alt): Likewise. * godump.cc (go_output_typedef): Assert get_len () is smaller than WIDE_INT_MAX_INL_ELTS. gcc/c-family/ * c-warn.cc (match_case_to_enum_1): Use wi::to_wide just once instead of 3 times, assert get_len () is smaller than WIDE_INT_MAX_INL_ELTS. gcc/testsuite/ * gcc.dg/bitint-38.c: New test.
2023-10-10tree-optimization/111751 - support 1024 bit vector constant reinterpretationRichard Biener1-2/+2
The following ups the limit in fold_view_convert_expr to handle 1024bit vectors as used by GCN and RVV. It also robustifies the handling in visit_reference_op_load to properly give up when constants cannot be re-interpreted. PR tree-optimization/111751 * fold-const.cc (fold_view_convert_expr): Up the buffer size to 128 bytes. * tree-ssa-sccvn.cc (visit_reference_op_load): Special case constants, giving up when re-interpretation to the target type fails.
2023-09-29Remove poly_int_podRichard Sandiford1-2/+2
poly_int was written before the switch to C++11 and so couldn't use explicit default constructors. This led to an awkward split between poly_int_pod and poly_int. poly_int simply inherited from poly_int_pod and added constructors, with the argumentless constructor having an empty body. But inheritance meant that poly_int had to repeat the assignment operators from poly_int_pod (again, no C++11, so no "using" to inherit base-class implementations). All that goes away if we switch to using default constructors. The main complication is ensuring that braced initialisation still gives a constexpr, so that static variables can be initialised without runtime code. The two problems here are: (1) When initialising a poly_int<N, wide_int> with fewer than N coefficients, the other coefficients need to be a zero of the same precision as the explicit coefficients. This was previously done in a for loop using wi::ints_for<...>::zero, but C++11 constexpr constructors can't have function bodies. The patch instead uses a series of delegated initialisers to fill in the implicit coefficients. (2) The initialisation in: void f(int x) { unsigned int foo {x}; } produces the warning: warning: narrowing conversion of 'x' from 'int' to 'unsigned int' [-Wnarrowing] whereas: void f(int x) { unsigned int foo = x; } does not. So switching to direct initialisation of the coeffs array would mean that: poly_uin64_t x = 0; would trigger a warning for using 0 rather than 0u. That seemed overly pedantic, so the patch adds explicit casts to the constructor. The complication is to do that without adding extra code to wide-int versions. The patch uses a new init_cast type for that. gcc/ * poly-int.h (poly_int_pod): Delete. (poly_coeff_traits::init_cast): New type. (poly_int_full, poly_int_hungry, poly_int_fullness): New structures. (poly_int): Replace constructors that take 1 and 2 coefficients with a general one that takes an arbitrary number of coefficients. Delegate initialization to two new private constructors, one of which uses the coefficients as-is and one of which adds an extra zero of the appropriate type (and precision, where applicable). (gt_ggc_mx, gt_pch_nx): Operate on poly_ints rather than poly_int_pods. * poly-int-types.h (poly_uint16_pod, poly_int64_pod, poly_uint64_pod) (poly_offset_int_pod, poly_wide_int_pod, poly_widest_int_pod): Delete. * gengtype.cc (main): Don't register poly_int64_pod. * calls.cc (initialize_argument_information): Use poly_int rather than poly_int_pod. (combine_pending_stack_adjustment_and_call): Likewise. * config/aarch64/aarch64.cc (pure_scalable_type_info): Likewise. * data-streamer.h (bp_unpack_poly_value): Likewise. * dwarf2cfi.cc (struct dw_trace_info): Likewise. (struct queued_reg_save): Likewise. * dwarf2out.h (struct dw_cfa_location): Likewise. * emit-rtl.h (struct incoming_args): Likewise. (struct rtl_data): Likewise. * expr.cc (get_bit_range): Likewise. (get_inner_reference): Likewise. * expr.h (get_bit_range): Likewise. * fold-const.cc (split_address_to_core_and_offset): Likewise. (ptr_difference_const): Likewise. * fold-const.h (ptr_difference_const): Likewise. * function.cc (try_fit_stack_local): Likewise. (instantiate_new_reg): Likewise. * function.h (struct expr_status): Likewise. (struct args_size): Likewise. * genmodes.cc (ZERO_COEFFS): Likewise. (mode_size_inline): Likewise. (mode_nunits_inline): Likewise. (emit_mode_precision): Likewise. (emit_mode_size): Likewise. (emit_mode_nunits): Likewise. * gimple-fold.cc (get_base_constructor): Likewise. * gimple-ssa-store-merging.cc (struct symbolic_number): Likewise. * inchash.h (class hash): Likewise. * ipa-modref-tree.cc (modref_access_node::dump): Likewise. * ipa-modref.cc (modref_access_analysis::merge_call_side_effects): Likewise. * ira-int.h (ira_spilled_reg_stack_slot): Likewise. * lra-eliminations.cc (self_elim_offsets): Likewise. * machmode.h (mode_size, mode_precision, mode_nunits): Likewise. * omp-low.cc (omplow_simd_context): Likewise. * pretty-print.cc (pp_wide_integer): Likewise. * pretty-print.h (pp_wide_integer): Likewise. * reload.cc (struct decomposition): Likewise. * reload.h (struct reload): Likewise. * reload1.cc (spill_stack_slot_width): Likewise. (struct elim_table): Likewise. (offsets_at): Likewise. (init_eliminable_invariants): Likewise. * rtl.h (union rtunion): Likewise. (poly_int_rtx_p): Likewise. (strip_offset): Likewise. (strip_offset_and_add): Likewise. * rtlanal.cc (strip_offset): Likewise. * tree-dfa.cc (get_ref_base_and_extent): Likewise. (get_addr_base_and_unit_offset_1): Likewise. (get_addr_base_and_unit_offset): Likewise. * tree-dfa.h (get_ref_base_and_extent): Likewise. (get_addr_base_and_unit_offset_1): Likewise. (get_addr_base_and_unit_offset): Likewise. * tree-ssa-loop-ivopts.cc (struct iv_use): Likewise. (strip_offset): Likewise. * tree-ssa-sccvn.h (struct vn_reference_op_struct): Likewise. * tree.cc (ptrdiff_tree_p): Likewise. * tree.h (poly_int_tree_p): Likewise. (ptrdiff_tree_p): Likewise. (get_inner_reference): Likewise. gcc/testsuite/ * gcc.dg/plugin/poly-int-tests.h (test_num_coeffs_extra): Use poly_int rather than poly_int_pod.
2023-09-12fold-const: Handle BITINT_TYPE in range_check_typeJakub Jelinek1-1/+6
When discussing PR111369 with Andrew Pinski, I've realized that I haven't added BITINT_TYPE handling to range_check_type. Right now (unsigned) max + 1 == (unsigned) min for signed _BitInt,l so I think we don't need to do the extra hops for BITINT_TYPE (though possibly we don't need them for INTEGER_TYPE either in the two's complement word and we don't support anything else, though I really don't know if Ada or some other FEs don't create weird INTEGER_TYPEs). 2023-09-12 Jakub Jelinek <jakub@redhat.com> * fold-const.cc (range_check_type): Handle BITINT_TYPE like OFFSET_TYPE.
2023-09-09Support folding min(poly,poly) to constLehua Ding1-0/+24
This patch adds support that tries to fold `MIN (poly, poly)` to a constant. Consider the following C Code: ``` void foo2 (int* restrict a, int* restrict b, int n) { for (int i = 0; i < 3; i += 1) a[i] += b[i]; } ``` Before this patch: ``` void foo2 (int * restrict a, int * restrict b, int n) { vector([4,4]) int vect__7.27; vector([4,4]) int vect__6.26; vector([4,4]) int vect__4.23; unsigned long _32; <bb 2> [local count: 268435456]: _32 = MIN_EXPR <3, POLY_INT_CST [4, 4]>; vect__4.23_20 = .MASK_LEN_LOAD (a_11(D), 32B, { -1, ... }, _32, 0); vect__6.26_15 = .MASK_LEN_LOAD (b_12(D), 32B, { -1, ... }, _32, 0); vect__7.27_9 = vect__6.26_15 + vect__4.23_20; .MASK_LEN_STORE (a_11(D), 32B, { -1, ... }, _32, 0, vect__7.27_9); [tail call] return; } ``` After this patch: ``` void foo2 (int * restrict a, int * restrict b, int n) { vector([4,4]) int vect__7.27; vector([4,4]) int vect__6.26; vector([4,4]) int vect__4.23; <bb 2> [local count: 268435456]: vect__4.23_20 = .MASK_LEN_LOAD (a_11(D), 32B, { -1, ... }, 3, 0); vect__6.26_15 = .MASK_LEN_LOAD (b_12(D), 32B, { -1, ... }, 3, 0); vect__7.27_9 = vect__6.26_15 + vect__4.23_20; .MASK_LEN_STORE (a_11(D), 32B, { -1, ... }, 3, 0, vect__7.27_9); [tail call] return; } ``` For RISC-V RVV, csrr and branch instructions can be reduced: Before this patch: ``` foo2: csrr a4,vlenb srli a4,a4,2 li a5,3 bleu a5,a4,.L5 mv a5,a4 .L5: vsetvli zero,a5,e32,m1,ta,ma ... ``` After this patch. ``` foo2: vsetivli zero,3,e32,m1,ta,ma ... ``` gcc/ChangeLog: * fold-const.cc (can_min_p): New function. (poly_int_binop): Try fold MIN_EXPR. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls/div-1.c: Adjust. * gcc.target/riscv/rvv/autovec/vls/shift-3.c: Adjust. * gcc.target/riscv/rvv/autovec/fold-min-poly.c: New test.
2023-09-07middle-end: Avoid calling targetm.c.bitint_type_info inside of gcc_assert ↵Jakub Jelinek1-4/+4
[PR102989] On Thu, Sep 07, 2023 at 10:36:02AM +0200, Thomas Schwinge wrote: > Minor comment/question: are we doing away with the property that > 'assert'-like "calls" must not have side effects? Per 'gcc/system.h', > this is "OK" for 'gcc_assert' for '#if ENABLE_ASSERT_CHECKING' or > '#elif (GCC_VERSION >= 4005)' -- that is, GCC 4.5, which is always-true, > thus the "offending" '#else' is never active. However, it's different > for standard 'assert' and 'gcc_checking_assert', so I'm not sure if > that's a good property for 'gcc_assert' only? For example, see also > <https://gcc.gnu.org/PR6906> "warn about asserts with side effects", or > recent <https://gcc.gnu.org/PR111144> > "RFE: could -fanalyzer warn about assertions that have side effects?". You're right, the #define gcc_assert(EXPR) ((void)(0 && (EXPR))) fallback definition is incompatible with the way I've used it, so for --disable-checking built by non-GCC it would not work properly. 2023-09-07 Jakub Jelinek <jakub@redhat.com> PR c/102989 * expr.cc (expand_expr_real_1): Don't call targetm.c.bitint_type_info inside gcc_assert, as later code relies on it filling info variable. * gimple-fold.cc (clear_padding_bitint_needs_padding_p, clear_padding_type): Likewise. * varasm.cc (output_constant): Likewise. * fold-const.cc (native_encode_int, native_interpret_int): Likewise. * stor-layout.cc (finish_bitfield_representative, layout_type): Likewise. * gimple-lower-bitint.cc (bitint_precision_kind): Likewise.
2023-09-06Middle-end _BitInt support [PR102989]Jakub Jelinek1-9/+66
The following patch introduces the middle-end part of the _BitInt support, a new BITINT_TYPE, handling it where needed, except the lowering pass and sanitizer support. 2023-09-06 Jakub Jelinek <jakub@redhat.com> PR c/102989 * tree.def (BITINT_TYPE): New type. * tree.h (TREE_CHECK6, TREE_NOT_CHECK6): Define. (NUMERICAL_TYPE_CHECK, INTEGRAL_TYPE_P): Include BITINT_TYPE. (BITINT_TYPE_P): Define. (CONSTRUCTOR_BITFIELD_P): Return true even for BLKmode bit-fields if they have BITINT_TYPE type. (tree_check6, tree_not_check6): New inline functions. (any_integral_type_check): Include BITINT_TYPE. (build_bitint_type): Declare. * tree.cc (tree_code_size, wide_int_to_tree_1, cache_integer_cst, build_zero_cst, type_hash_canon_hash, type_cache_hasher::equal, type_hash_canon): Handle BITINT_TYPE. (bitint_type_cache): New variable. (build_bitint_type): New function. (signed_or_unsigned_type_for, verify_type_variant, verify_type): Handle BITINT_TYPE. (tree_cc_finalize): Free bitint_type_cache. * builtins.cc (type_to_class): Handle BITINT_TYPE. (fold_builtin_unordered_cmp): Handle BITINT_TYPE like INTEGER_TYPE. * cfgexpand.cc (expand_debug_expr): Punt on BLKmode BITINT_TYPE INTEGER_CSTs. * convert.cc (convert_to_pointer_1, convert_to_real_1, convert_to_complex_1): Handle BITINT_TYPE like INTEGER_TYPE. (convert_to_integer_1): Likewise. For BITINT_TYPE don't check GET_MODE_PRECISION (TYPE_MODE (type)). * doc/generic.texi (BITINT_TYPE): Document. * doc/tm.texi.in (TARGET_C_BITINT_TYPE_INFO): New. * doc/tm.texi: Regenerated. * dwarf2out.cc (base_type_die, is_base_type, modified_type_die, gen_type_die_with_usage): Handle BITINT_TYPE. (rtl_for_decl_init): Punt on BLKmode BITINT_TYPE INTEGER_CSTs or handle those which fit into shwi. * expr.cc (expand_expr_real_1): Define EXTEND_BITINT macro, reduce to bitfield precision reads from BITINT_TYPE vars, parameters or memory locations. Expand large/huge BITINT_TYPE INTEGER_CSTs into memory. * fold-const.cc (fold_convert_loc, make_range_step): Handle BITINT_TYPE. (extract_muldiv_1): For BITINT_TYPE use TYPE_PRECISION rather than GET_MODE_SIZE (SCALAR_INT_TYPE_MODE). (native_encode_int, native_interpret_int, native_interpret_expr): Handle BITINT_TYPE. * gimple-expr.cc (useless_type_conversion_p): Make BITINT_TYPE to some other integral type or vice versa conversions non-useless. * gimple-fold.cc (gimple_fold_builtin_memset): Punt for BITINT_TYPE. (clear_padding_unit): Mention in comment that _BitInt types don't need to fit either. (clear_padding_bitint_needs_padding_p): New function. (clear_padding_type_may_have_padding_p): Handle BITINT_TYPE. (clear_padding_type): Likewise. * internal-fn.cc (expand_mul_overflow): For unsigned non-mode precision operands force pos_neg? to 1. (expand_MULBITINT, expand_DIVMODBITINT, expand_FLOATTOBITINT, expand_BITINTTOFLOAT): New functions. * internal-fn.def (MULBITINT, DIVMODBITINT, FLOATTOBITINT, BITINTTOFLOAT): New internal functions. * internal-fn.h (expand_MULBITINT, expand_DIVMODBITINT, expand_FLOATTOBITINT, expand_BITINTTOFLOAT): Declare. * match.pd (non-equality compare simplifications from fold_binary): Punt if TYPE_MODE (arg1_type) is BLKmode. * pretty-print.h (pp_wide_int): Handle printing of large precision wide_ints which would buffer overflow digit_buffer. * stor-layout.cc (finish_bitfield_representative): For bit-fields with BITINT_TYPE, prefer representatives with precisions in multiple of limb precision. (layout_type): Handle BITINT_TYPE. Handle COMPLEX_TYPE with BLKmode element type and assert it is BITINT_TYPE. * target.def (bitint_type_info): New C target hook. * target.h (struct bitint_info): New type. * targhooks.cc (default_bitint_type_info): New function. * targhooks.h (default_bitint_type_info): Declare. * tree-pretty-print.cc (dump_generic_node): Handle BITINT_TYPE. Handle printing large wide_ints which would buffer overflow digit_buffer. * tree-ssa-sccvn.cc: Include target.h. (eliminate_dom_walker::eliminate_stmt): Punt for large/huge BITINT_TYPE. * tree-switch-conversion.cc (jump_table_cluster::emit): For more than 64-bit BITINT_TYPE subtract low bound from expression and cast to 64-bit integer type both the controlling expression and case labels. * typeclass.h (enum type_class): Add bitint_type_class enumerator. * varasm.cc (output_constant): Handle BITINT_TYPE INTEGER_CSTs. * vr-values.cc (check_for_binary_op_overflow): Use widest2_int rather than widest_int. (simplify_using_ranges::simplify_internal_call_using_ranges): Use unsigned_type_for rather than build_nonstandard_integer_type.
2023-08-21PR111048: Set arg_npatterns correctly.Prathamesh Kulkarni1-7/+37
In valid_mask_for_fold_vec_perm_cst we set arg_npatterns always to VECTOR_CST_NPATTERNS (arg0) because of (q1 & 0) == 0: /* Ensure that the stepped sequence always selects from the same input pattern. */ unsigned arg_npatterns = ((q1 & 0) == 0) ? VECTOR_CST_NPATTERNS (arg0) : VECTOR_CST_NPATTERNS (arg1); resulting in wrong code-gen issues. The patch fixes this by changing the condition to (q1 & 1) == 0. gcc/ChangeLog: PR tree-optimization/111048 * fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): Set arg_npatterns correctly. (fold_vec_perm_cst): Remove workaround and again call valid_mask_fold_vec_perm_cst_p for both VLS and VLA vectors. (test_fold_vec_perm_cst::test_nunits_min_4): Add test-case.
2023-08-18tree-optimization/111048 - avoid flawed logic in fold_vec_permRichard Biener1-6/+6
The following avoids running into somehow flawed logic in fold_vec_perm for non-VLA vectors. PR tree-optimization/111048 * fold-const.cc (fold_vec_perm_cst): Check for non-VLA vectors first. * gcc.dg/torture/pr111048.c: New testcase.
2023-08-16Extend fold_vec_perm to handle VLA vector_cst.Prathamesh Kulkarni1-21/+778
The patch extends fold_vec_perm to fold VLA vector_csts. For eg: arg0 = {...}, npatterns = 1, nelts_per_pattern = 3, len = 4 + 4x arg1 = {...}, npatterns = 1, nelts_per_pattern = 3, len = 4 + 4x sel = { 0, len, ...} npatterns = 2, nelts_per_pattern = 1, len = 4 + 4x res = VEC_PERM_EXPR<arg0, arg1, sel> --> { arg0[0], arg1[0], ... }, npatterns = 2, nelts_per_pattern = 1 Eg 2: arg0 = {...}, npatterns = 1, nelts_per_pattern = 3, len = 2 + 2x arg1 = {...}, npatterns = 1, nelts_per_pattern = 3, len = 2 + 2x sel = {0, 1, 2, ...}, npatterns = 1, nelts_per_pattern = 3, len = 2 + 2x For this case the index 2 in sel is ambiguous for len 2 + 2x: if x = 0, runtime vector length = 2 and sel[i] will choose arg1[0] if x > 0, runtime vector length > 2 and sel[i] choose arg0[2]. So we return NULL_TREE for this case. This leads us to defining a constraint that a stepped sequence in sel, should only select a particular pattern from a particular input vector. Eg 3: arg0 = {...} npatterns = 1, nelts_per_pattern = 3, len = 4 + 4x arg1 = {...} npatterns = 1, nelts_per_pattern = 3, len = 4 + 4x sel = { len, 0, 2, ... } npatterns = 1, nelts_per_pattern = 3, len = 4 + 4x sel contains a single pattern with stepped sequence: {0, 2, ...}. Let, a1 = the first element of stepped part of sequence, which is 0. Let esel = number of total elements in stepped sequence. Thus, esel = len / sel_npatterns = (4 + 4x) / 1 = 4 + 4x Let S = step of the sequence, which is 2 in this case. Let ae = last element of the stepped sequence. Thus, ae = a1 + (esel - 2) * S = 0 + (4 + 4x - 2) * 2 = 4 + 8x To ensure that we select elements from the same input vector, a1 /trunc len = ae /trunc len. Let, q1 = a1 /trunc len = 0 / (4 + 4x) = 0 Let, qe = ae /trunc len = (4 + 8x) / (4 + 4x) = 1 Since q1 != qe, we cross input vectors, and return NULL_TREE for this case. However, if sel was: sel = {len, 0, 1, ...} The only change in this case is S = 1. So, ae = a1 + (esel - 2) * S = 0 + (4 + 4x - 2) * 1 = 2 + 4x In this case, a1/len == ae/len == 0, and the stepped sequence chooses all elements from arg0. Thus, res = {arg1[0], arg0[0], arg0[1], ...} For VLA folding, sel has to conform to constraints imposed in valid_mask_for_fold_vec_perm_cst_p. test_fold_vec_perm_cst defines several unit-tests for VLA folding. gcc/ChangeLog: * fold-const.cc (INCLUDE_ALGORITHM): Add Include. (valid_mask_for_fold_vec_perm_cst_p): New function. (fold_vec_perm_cst): Likewise. (fold_vec_perm): Adjust assert and call fold_vec_perm_cst. (test_fold_vec_perm_cst): New namespace. (test_fold_vec_perm_cst::build_vec_cst_rand): New function. (test_fold_vec_perm_cst::validate_res): Likewise. (test_fold_vec_perm_cst::validate_res_vls): Likewise. (test_fold_vec_perm_cst::builder_push_elems): Likewise. (test_fold_vec_perm_cst::test_vnx4si_v4si): Likewise. (test_fold_vec_perm_cst::test_v4si_vnx4si): Likewise. (test_fold_vec_perm_cst::test_all_nunits): Likewise. (test_fold_vec_perm_cst::test_nunits_min_2): Likewise. (test_fold_vec_perm_cst::test_nunits_min_4): Likewise. (test_fold_vec_perm_cst::test_nunits_min_8): Likewise. (test_fold_vec_perm_cst::test_nunits_max_4): Likewise. (test_fold_vec_perm_cst::is_simple_vla_size): Likewise. (test_fold_vec_perm_cst::test): Likewise. (fold_const_cc_tests): Call test_fold_vec_perm_cst::test. Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>
2023-07-18c++: constexpr bit_cast with empty fieldJason Merrill1-1/+2
The change to only cache constexpr calls that are reduced_constant_expression_p tripped on bit-cast3.C, which failed that predicate due to the presence of an empty field in the result of native_interpret_aggregate, which reduced_constant_expression_p rejects to avoid confusing output_constructor. This patch proposes to skip such fields in native_interpret_aggregate, since they aren't actually involved in the value representation. gcc/ChangeLog: * fold-const.cc (native_interpret_aggregate): Skip empty fields. gcc/cp/ChangeLog: * constexpr.cc (cxx_eval_bit_cast): Check that the result of native_interpret_aggregate doesn't need more evaluation.
2023-06-30fold-const+optabs: Change return type of predicate functions from int to boolUros Bizjak1-34/+34
Also change some internal variables and function argument from int to bool. gcc/ChangeLog: * fold-const.h (multiple_of_p): Change return type from int to bool. * fold-const.cc (split_tree): Change negl_p, neg_litp_p, neg_conp_p and neg_var_p variables to bool. (const_binop): Change sat_p variable to bool. (merge_ranges): Change no_overlap variable to bool. (extract_muldiv_1): Change same_p variable to bool. (tree_swap_operands_p): Update function body for bool return type. (fold_truth_andor): Change commutative variable to bool. (multiple_of_p): Change return type from int to void and adjust function body accordingly. * optabs.h (expand_twoval_unop): Change return type from int to bool. (expand_twoval_binop): Ditto. (can_compare_p): Ditto. (have_add2_insn): Ditto. (have_addptr3_insn): Ditto. (have_sub2_insn): Ditto. (have_insn_for): Ditto. * optabs.cc (add_equal_note): Ditto. (widen_operand): Change no_extend argument from int to bool. (expand_binop): Ditto. (expand_twoval_unop): Change return type from int to void and adjust function body accordingly. (expand_twoval_binop): Ditto. (can_compare_p): Ditto. (have_add2_insn): Ditto. (have_addptr3_insn): Ditto. (have_sub2_insn): Ditto. (have_insn_for): Ditto.
2023-06-23Fix tree_simple_nonnegative_warnv_p for VECTOR_TYPEsRichard Biener1-1/+2
tree_simple_nonnegative_warnv_p ends up being called on VECTOR_TYPEs which I think even gets the wrong answer here for tcc_comparison since vector bools are signed. The following properly guards that with !VECTOR_TYPE_P. * fold-const.cc (tree_simple_nonnegative_warnv_p): Guard the truth_value_p case with !VECTOR_TYPE_P.
2023-06-23Bogus and missed folding on vector comparesRichard Biener1-2/+2
fold_binary tries to transform (double)float1 CMP (double)float2 into float1 CMP float2 but ends up using TYPE_PRECISION on the argument types. For vector types that compares the number of lanes which should be always equal (so it's harmless as to not generating wrong code). The following instead properly uses element_precision. The same happens in the corresponding match.pd pattern. * fold-const.cc (fold_binary_loc): Use element_precision when trying (double)float1 CMP (double)float2 to float1 CMP float2 simplification. * match.pd: Likewise.
2023-06-16tree-optimization/110269 - restore missed condition foldingRichard Biener1-7/+0
The following makes sure we optimize x != 0 using range info via tree_expr_nonzero_p via match.pd. PR tree-optimization/110269 * fold-const.cc (fold_binary_loc): Merge x != 0 folding with tree_expr_nonzero_p ... * match.pd (cmp (convert? addr@0) integer_zerop): With this pattern. * gcc.dg/tree-ssa/pr110269.c: New testcase.
2023-06-13middle-end/110232 - fix native interpret of vector <signed-boolean:1>Richard Biener1-7/+4
The following fixes native interpretation of a buffer as boolean vector with bit-precision elements such as AVX512 vectors. The check whether the buffer covers the whole vector was broken for bit-precision elements and the following instead implements it based on the vector type size. PR middle-end/110232 * fold-const.cc (native_interpret_vector): Use TYPE_SIZE_UNIT to check whether the buffer covers the whole vector. * gcc.target/i386/pr110232.c: New testcase.
2023-05-30Add a != MIN/MAX_VALUE_CST ? CST-+1 : a to minmax_from_comparisonAndrew Pinski1-0/+26
This patch adds the support for match that was implemented for PR 87913 in phiopt. It implements it by adding support to minmax_from_comparison for the check. It uses the range information if available which allows to produce MIN/MAX expression when comparing against the lower/upper bound of the range instead of lower/upper of the type. minmax-20.c is the new testcase which tests the ranges part. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * fold-const.cc (minmax_from_comparison): Add support for NE_EXPR. * match.pd ((cond (cmp (convert1? x) c1) (convert2? x) c2) pattern): Add ne as a possible cmp. ((a CMP b) ? minmax<a, c> : minmax<b, c> pattern): Likewise. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/minmax-22.c: New test.
2023-05-23Fix handling of non-integral bit-fields in native_encode_initializerEric Botcazou1-10/+17
The encoder for CONSTRUCTORs assumes that all bit-fields (DECL_BIT_FIELD) have integral types, but that's not the case in Ada where they may have pretty much any type, resulting in a wrong encoding for them gcc/ * fold-const.cc (native_encode_initializer) <CONSTRUCTOR>: Apply the specific treatment for bit-fields only if they have an integral type and filter out non-integral bit-fields that do not start and end on a byte boundary. gcc/testsuite/ * gnat.dg/opt101.adb: New test. * gnat.dg/opt101_pkg.ads: New helper.
2023-05-20Move fold_single_bit_test to expr.cc from fold-const.ccAndrew Pinski1-112/+0
This is part 1 of N patch set that will change the expansion of `(A & C) != 0` from using trees to directly expanding so later on we can do some cost analysis. Since the only user of fold_single_bit_test is now expand, move it to there. gcc/ChangeLog: * fold-const.cc (fold_single_bit_test_into_sign_test): Move to expr.cc. (fold_single_bit_test): Likewise. * expr.cc (fold_single_bit_test_into_sign_test): Move from fold-const.cc (fold_single_bit_test): Likewise and make static. * fold-const.h (fold_single_bit_test): Remove declaration.
2023-05-18gcc: use _P() defines from tree.hBernhard Reutner-Fischer1-24/+22
gcc/ChangeLog: * alias.cc (ref_all_alias_ptr_type_p): Use _P() defines from tree.h. * attribs.cc (diag_attr_exclusions): Ditto. (decl_attributes): Ditto. (build_type_attribute_qual_variant): Ditto. * builtins.cc (fold_builtin_carg): Ditto. (fold_builtin_next_arg): Ditto. (do_mpc_arg2): Ditto. * cfgexpand.cc (expand_return): Ditto. * cgraph.h (decl_in_symtab_p): Ditto. (symtab_node::get_create): Ditto. * dwarf2out.cc (base_type_die): Ditto. (implicit_ptr_descriptor): Ditto. (gen_array_type_die): Ditto. (gen_type_die_with_usage): Ditto. (optimize_location_into_implicit_ptr): Ditto. * expr.cc (do_store_flag): Ditto. * fold-const.cc (negate_expr_p): Ditto. (fold_negate_expr_1): Ditto. (fold_convert_const): Ditto. (fold_convert_loc): Ditto. (constant_boolean_node): Ditto. (fold_binary_op_with_conditional_arg): Ditto. (build_fold_addr_expr_with_type_loc): Ditto. (fold_comparison): Ditto. (fold_checksum_tree): Ditto. (tree_unary_nonnegative_warnv_p): Ditto. (integer_valued_real_unary_p): Ditto. (fold_read_from_constant_string): Ditto. * gcc-rich-location.cc (maybe_range_label_for_tree_type_mismatch::get_text): Ditto. * gimple-expr.cc (useless_type_conversion_p): Ditto. (is_gimple_reg): Ditto. (is_gimple_asm_val): Ditto. (mark_addressable): Ditto. * gimple-expr.h (is_gimple_variable): Ditto. (virtual_operand_p): Ditto. * gimple-ssa-warn-access.cc (pass_waccess::check_dangling_stores): Ditto. * gimplify.cc (gimplify_bind_expr): Ditto. (gimplify_return_expr): Ditto. (gimple_add_padding_init_for_auto_var): Ditto. (gimplify_addr_expr): Ditto. (omp_add_variable): Ditto. (omp_notice_variable): Ditto. (omp_get_base_pointer): Ditto. (omp_strip_components_and_deref): Ditto. (omp_strip_indirections): Ditto. (omp_accumulate_sibling_list): Ditto. (omp_build_struct_sibling_lists): Ditto. (gimplify_adjust_omp_clauses_1): Ditto. (gimplify_adjust_omp_clauses): Ditto. (gimplify_omp_for): Ditto. (goa_lhs_expr_p): Ditto. (gimplify_one_sizepos): Ditto. * graphite-scop-detection.cc (scop_detection::graphite_can_represent_scev): Ditto. * ipa-devirt.cc (odr_types_equivalent_p): Ditto. * ipa-prop.cc (ipa_set_jf_constant): Ditto. (propagate_controlled_uses): Ditto. * ipa-sra.cc (type_prevails_p): Ditto. (scan_expr_access): Ditto. * optabs-tree.cc (optab_for_tree_code): Ditto. * toplev.cc (wrapup_global_declaration_1): Ditto. * trans-mem.cc (transaction_invariant_address_p): Ditto. * tree-cfg.cc (verify_types_in_gimple_reference): Ditto. (verify_gimple_comparison): Ditto. (verify_gimple_assign_binary): Ditto. (verify_gimple_assign_single): Ditto. * tree-complex.cc (get_component_ssa_name): Ditto. * tree-emutls.cc (lower_emutls_2): Ditto. * tree-inline.cc (copy_tree_body_r): Ditto. (estimate_move_cost): Ditto. (copy_decl_for_dup_finish): Ditto. * tree-nested.cc (convert_nonlocal_omp_clauses): Ditto. (note_nonlocal_vla_type): Ditto. (convert_local_omp_clauses): Ditto. (remap_vla_decls): Ditto. (fixup_vla_decls): Ditto. * tree-parloops.cc (loop_has_vector_phi_nodes): Ditto. * tree-pretty-print.cc (print_declaration): Ditto. (print_call_name): Ditto. * tree-sra.cc (compare_access_positions): Ditto. * tree-ssa-alias.cc (compare_type_sizes): Ditto. * tree-ssa-ccp.cc (get_default_value): Ditto. * tree-ssa-coalesce.cc (populate_coalesce_list_for_outofssa): Ditto. * tree-ssa-dom.cc (reduce_vector_comparison_to_scalar_comparison): Ditto. * tree-ssa-forwprop.cc (can_propagate_from): Ditto. * tree-ssa-propagate.cc (may_propagate_copy): Ditto. * tree-ssa-sccvn.cc (fully_constant_vn_reference_p): Ditto. * tree-ssa-sink.cc (statement_sink_location): Ditto. * tree-ssa-structalias.cc (type_must_have_pointers): Ditto. * tree-ssa-ter.cc (find_replaceable_in_bb): Ditto. * tree-ssa-uninit.cc (warn_uninit): Ditto. * tree-ssa.cc (maybe_rewrite_mem_ref_base): Ditto. (non_rewritable_mem_ref_base): Ditto. * tree-streamer-in.cc (lto_input_ts_type_non_common_tree_pointers): Ditto. * tree-streamer-out.cc (write_ts_type_non_common_tree_pointers): Ditto. * tree-vect-generic.cc (do_binop): Ditto. (do_cond): Ditto. * tree-vect-stmts.cc (vect_init_vector): Ditto. * tree-vector-builder.h (tree_vector_builder::note_representative): Ditto. * tree.cc (sign_mask_for): Ditto. (verify_type_variant): Ditto. (gimple_canonical_types_compatible_p): Ditto. (verify_type): Ditto. * ubsan.cc (get_ubsan_type_info_for_type): Ditto. * var-tracking.cc (prepare_call_arguments): Ditto. (vt_add_function_parameters): Ditto. * varasm.cc (decode_addr_const): Ditto.
2023-05-01Conversion to irange wide_int API.Aldy Hernandez1-2/+1
This converts the irange API to use wide_ints exclusively, along with its users. This patch will slow down VRP, as there will be more useless wide_int to tree conversions. However, this slowdown is only temporary, as a follow-up patch will convert the internal representation of iranges to wide_ints for a net overall gain in performance. gcc/ChangeLog: * fold-const.cc (expr_not_equal_to): Convert to irange wide_int API. * gimple-fold.cc (size_must_be_zero_p): Same. * gimple-loop-versioning.cc (loop_versioning::prune_loop_conditions): Same. * gimple-range-edge.cc (gcond_edge_range): Same. (gimple_outgoing_range::calc_switch_ranges): Same. * gimple-range-fold.cc (adjust_imagpart_expr): Same. (adjust_realpart_expr): Same. (fold_using_range::range_of_address): Same. (fold_using_range::relation_fold_and_or): Same. * gimple-range-gori.cc (gori_compute::gori_compute): Same. (range_is_either_true_or_false): Same. * gimple-range-op.cc (cfn_toupper_tolower::get_letter_range): Same. (cfn_clz::fold_range): Same. (cfn_ctz::fold_range): Same. * gimple-range-tests.cc (class test_expr_eval): Same. * gimple-ssa-warn-alloca.cc (alloca_call_type): Same. * ipa-cp.cc (ipa_value_range_from_jfunc): Same. (propagate_vr_across_jump_function): Same. (decide_whether_version_node): Same. * ipa-prop.cc (ipa_get_value_range): Same. * ipa-prop.h (ipa_range_set_and_normalize): Same. * range-op.cc (get_shift_range): Same. (value_range_from_overflowed_bounds): Same. (value_range_with_overflow): Same. (create_possibly_reversed_range): Same. (equal_op1_op2_relation): Same. (not_equal_op1_op2_relation): Same. (lt_op1_op2_relation): Same. (le_op1_op2_relation): Same. (gt_op1_op2_relation): Same. (ge_op1_op2_relation): Same. (operator_mult::op1_range): Same. (operator_exact_divide::op1_range): Same. (operator_lshift::op1_range): Same. (operator_rshift::op1_range): Same. (operator_cast::op1_range): Same. (operator_logical_and::fold_range): Same. (set_nonzero_range_from_mask): Same. (operator_bitwise_or::op1_range): Same. (operator_bitwise_xor::op1_range): Same. (operator_addr_expr::fold_range): Same. (pointer_plus_operator::wi_fold): Same. (pointer_or_operator::op1_range): Same. (INT): Same. (UINT): Same. (INT16): Same. (UINT16): Same. (SCHAR): Same. (UCHAR): Same. (range_op_cast_tests): Same. (range_op_lshift_tests): Same. (range_op_rshift_tests): Same. (range_op_bitwise_and_tests): Same. (range_relational_tests): Same. * range.cc (range_zero): Same. (range_nonzero): Same. * range.h (range_true): Same. (range_false): Same. (range_true_and_false): Same. * tree-data-ref.cc (split_constant_offset_1): Same. * tree-ssa-loop-ch.cc (entry_loop_condition_is_static): Same. * tree-ssa-loop-unswitch.cc (struct unswitch_predicate): Same. (find_unswitching_predicates_for_bb): Same. * tree-ssa-phiopt.cc (value_replacement): Same. * tree-ssa-threadbackward.cc (back_threader::find_taken_edge_cond): Same. * tree-ssanames.cc (ssa_name_has_boolean_range): Same. * tree-vrp.cc (find_case_label_range): Same. * value-query.cc (range_query::get_tree_range): Same. * value-range.cc (irange::set_nonnegative): Same. (frange::contains_p): Same. (frange::singleton_p): Same. (frange::internal_singleton_p): Same. (irange::irange_set): Same. (irange::irange_set_1bit_anti_range): Same. (irange::irange_set_anti_range): Same. (irange::set): Same. (irange::operator==): Same. (irange::singleton_p): Same. (irange::contains_p): Same. (irange::set_range_from_nonzero_bits): Same. (DEFINE_INT_RANGE_INSTANCE): Same. (INT): Same. (UINT): Same. (SCHAR): Same. (UINT128): Same. (UCHAR): Same. (range): New. (tree_range): New. (range_int): New. (range_uint): New. (range_uint128): New. (range_uchar): New. (range_char): New. (build_range3): Convert to irange wide_int API. (range_tests_irange3): Same. (range_tests_int_range_max): Same. (range_tests_strict_enum): Same. (range_tests_misc): Same. (range_tests_nonzero_bits): Same. (range_tests_nan): Same. (range_tests_signed_zeros): Same. * value-range.h (Value_Range::Value_Range): Same. (irange::set): Same. (irange::nonzero_p): Same. (irange::contains_p): Same. (range_includes_zero_p): Same. (irange::set_nonzero): Same. (irange::set_zero): Same. (contains_zero_p): Same. (frange::contains_p): Same. * vr-values.cc (simplify_using_ranges::op_with_boolean_value_range_p): Same. (bounds_of_var_in_loop): Same. (simplify_using_ranges::legacy_fold_cond_overflow): Same.
2023-04-28MATCH: Factor out code that for min max detection with constantsAndrew Pinski1-0/+44
This factors out some of the code from the min/max detection from match.pd into a function so it can be reused in other places. This is mainly used to detect the conversions of >= to > which causes the integer values to be changed by one. Changes since v1: * factor out the checks for INTEGER_CSTs so it is more obvious. OK? Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: * match.pd: Factor out the deciding the min/max from the "(cond (cmp (convert1? x) c1) (convert2? x) c2)" pattern to ... * fold-const.cc (minmax_from_comparison): this new function. * fold-const.h (minmax_from_comparison): New prototype.
2023-04-04sanitizer: missing signed integer overflow errors [PR109107]Marek Polacek1-1/+2
Here we're failing to detect a signed overflow with -O because match.pd, since r8-1516, transforms c = (a + 1) - (int) (short int) b; into c = (int) ((unsigned int) a + 4294946117); wrongly eliding the overflow. This kind of problems is usually avoided by using TYPE_OVERFLOW_SANITIZED in the appropriate place. The first match.pd hunk in the patch fixes it. I've constructed a testcase for each of the surrounding cases as well. Then I noticed that fold_binary_loc/associate has the same problem, so I've added a TYPE_OVERFLOW_SANITIZED there as well (it may be too coarse, sorry). Then I found yet another problem, but instead of fixing it now I've opened 109134. I could probably go on and find a dozen more. PR sanitizer/109107 gcc/ChangeLog: * fold-const.cc (fold_binary_loc): Use TYPE_OVERFLOW_SANITIZED when associating. * match.pd: Use TYPE_OVERFLOW_SANITIZED. gcc/testsuite/ChangeLog: * c-c++-common/ubsan/pr109107-1.c: New test. * c-c++-common/ubsan/pr109107-2.c: New test. * c-c++-common/ubsan/pr109107-3.c: New test. * c-c++-common/ubsan/pr109107-4.c: New test.
2023-03-23c: [PR84900] cast of compound literal does not cause the code to become a ↵Andrew Pinski1-0/+1
non-lvalue The problem here is after r0-92187-g2ec5deb5c3146c, maybe_lvalue_p would return false for compound literals which causes non_lvalue_loc not to wrap the expression with a NON_LVALUE_EXPR unlike before when it return true as it returns true for all language specific tree codes. This fixes that oversight and fixes the testcase to have the cast as a non-lvalue. Committed to the trunk as obvious after a bootstrap/test on x86_64-linux-gnu. PR c/84900 gcc/ChangeLog: * fold-const.cc (maybe_lvalue_p): Treat COMPOUND_LITERAL_EXPR as a lvalue. gcc/testsuite/ChangeLog: * gcc.dg/compound-literal-cast-lvalue-1.c: New test.
2023-03-09middle-end/108995 - avoid folding when sanitizing overflowRichard Biener1-4/+3
The following plugs one place in extract_muldiv where it should avoid folding when sanitizing overflow. PR middle-end/108995 * fold-const.cc (extract_muldiv_1): Avoid folding (CST * b) / CST2 when sanitizing overflow and we rely on overflow being undefined. * gcc.dg/ubsan/pr108995.c: New testcase.
2023-03-02fold-const: Ignore padding bits in native_interpret_expr REAL_CST reverse ↵Jakub Jelinek1-2/+4
verification [PR108934] In the following testcase we try to std::bit_cast a (pair of) integral value(s) which has some non-zero bits in the place of x86 long double (for 64-bit 16 byte type with 10 bytes actually loaded/stored by hw, for 32-bit 12 byte) and starting with my PR104522 change we reject that as native_interpret_expr fails on it. The PR104522 change extends what has been done before for MODE_COMPOSITE_P (but those don't have any padding bits) to all floating point types, because e.g. the exact x86 long double has various bit combinations we don't support, like pseudo-(denormals,infinities,NaNs) or unnormals. The HW handles some of those as exceptional cases and others similarly to the non-pseudo ones. But for the padding bits it actually doesn't load/store those bits at all, it loads/stores 10 bytes. So, I think we should exempt the padding bits from the reverse comparison (the native_encode_expr bits for the padding will be all zeros), which the following patch does. For bit_cast it is similar to e.g. ignoring padding bits if the destination is a structure which has padding bits in there. The change changed auto-init-4.c to how it has been behaving before the PR105259 change, where some more VCEs can be now done. 2023-03-02 Jakub Jelinek <jakub@redhat.com> PR c++/108934 * fold-const.cc (native_interpret_expr) <case REAL_CST>: Before memcmp comparison copy the bytes from ptr to a temporary buffer and clearing padding bits in there. * gcc.target/i386/auto-init-4.c: Revert PR105259 change. * g++.target/i386/pr108934.C: New test.
2023-01-04ubsan: Avoid narrowing of multiply for -fsanitize=signed-integer-overflow ↵Jakub Jelinek1-1/+3
[PR108256] We shouldn't narrow multiplications originally done in signed types, because the original multiplication might overflow but the narrowed one will be done in unsigned arithmetics and will never overflow. 2023-01-04 Jakub Jelinek <jakub@redhat.com> PR sanitizer/108256 * convert.cc (do_narrow): Punt for MULT_EXPR if original type doesn't wrap around and -fsanitize=signed-integer-overflow is on. * fold-const.cc (fold_unary_loc) <CASE_CONVERT>: Likewise. * c-c++-common/ubsan/pr108256.c: New test.
2023-01-02Update copyright years.Jakub Jelinek1-1/+1
2022-12-20fold-const: Treat fp conversion to a type with same mode as copyKewen Lin1-0/+9
In function fold_convert_const_real_from_real, when the modes of two types involved in fp conversion are the same, we can simply take it as copy, rebuild with the exactly same TREE_REAL_CST and the target type. It is more efficient and helps to avoid possible unexpected signalling bit clearing in [1]. [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608533.html gcc/ChangeLog: * fold-const.cc (fold_convert_const_real_from_real): Treat floating point conversion to a type with same mode as copy instead of normal convertFormat.
2022-12-20fold: fix use of protected_set_expr_location_unshareJason Merrill1-1/+1
Unlike protected_set_expr_location, this variant can return a different tree. gcc/ChangeLog: * fold-const.cc (fold_convert_loc): Check return value of protected_set_expr_location_unshare.
2022-12-20c++: source position of lambda captures [PR84471]Jason Merrill1-1/+1
If the DECL_VALUE_EXPR of a VAR_DECL has EXPR_LOCATION set, then any use of that variable looks like it has that location, which leads to the debugger jumping back and forth for both lambdas and structured bindings. Rather than fix all the uses, it seems simplest to remove any EXPR_LOCATION when setting DECL_VALUE_EXPR. So the cp/ hunks aren't necessary, but they avoid the need to unshare to remove the location. PR c++/84471 PR c++/107504 gcc/cp/ChangeLog: * coroutines.cc (transform_local_var_uses): Don't specify a location for DECL_VALUE_EXPR. * decl.cc (cp_finish_decomp): Likewise. gcc/ChangeLog: * fold-const.cc (protected_set_expr_location_unshare): Not static. * tree.h: Declare it. * tree.cc (decl_value_expr_insert): Use it. include/ChangeLog: * ansidecl.h (ATTRIBUTE_WARN_UNUSED_RESULT): Add __. gcc/testsuite/ChangeLog: * g++.dg/tree-ssa/value-expr1.C: New test. * g++.dg/tree-ssa/value-expr2.C: New test. * g++.dg/analyzer/pr93212.C: Move warning.
2022-12-12Revert parts of ADDR_EXPR/CONSTRUCTOR treatment change in match.pdRichard Biener1-0/+9
This reverts the part that substitutes from the definition of an SSA name to the capture, thus ADDR_EXPR@0 eventually yielding &y_1->a[i_2] instead of _3. That's because I didn't think of how to deal with substituting @0 in the result pattern. So the following re-instantiates the SSA def CONSTRUCTOR handling and in the ADDR_EXPR helpers used by match.pd handles SSA names defined to ADDR_EXPRs transparently. * genmatch.cc (dt_simplify::gen): Revert last change. * match.pd: Revert simplification of CONSTUCTOR leaf handling. (&x cmp SSA_NAME): Handle ADDR_EXPR in SSA defs. * fold-const.cc (split_address_to_core_and_offset): Handle ADDR_EXPRs in SSA defs. (address_compare): Likewise.
2022-12-02Fix a few incorrect accesses.Andrew MacLeod1-3/+3
This consists of 3 changes which stronger type checking has indicated are incorrect. gcc/ * fold-const.cc (fold_unary_loc): Check TREE_TYPE of node. (tree_invalid_nonnegative_warnv_p): Likewise. gcc/c-family/ * c-attribs.cc (handle_deprecated_attribute): Use type when using TYPE_NAME.
2022-11-24Remove ASSERT_EXPR.Aldy Hernandez1-6/+0
This removes all uses of ASSERT_EXPR except the internal one in ipa-*. gcc/ChangeLog: * doc/gimple.texi: Remove ASSERT_EXPR references. * fold-const.cc (tree_expr_nonzero_warnv_p): Same. (fold_binary_loc): Same. (tree_expr_nonnegative_warnv_p): Same. * gimple-array-bounds.cc (get_base_decl): Same. * gimple-pretty-print.cc (dump_unary_rhs): Same. * gimple.cc (get_gimple_rhs_num_ops): Same. * pointer-query.cc (handle_ssa_name): Same. * tree-cfg.cc (verify_gimple_assign_single): Same. * tree-pretty-print.cc (dump_generic_node): Same. * tree-scalar-evolution.cc (scev_dfs::follow_ssa_edge_expr):Same. (interpret_rhs_expr): Same. * tree-ssa-operands.cc (operands_scanner::get_expr_operands): Same. * tree-ssa-propagate.cc (substitute_and_fold_dom_walker::before_dom_children): Same. * tree-ssa-threadedge.cc: Same. * tree-vrp.cc (overflow_comparison_p): Same. * tree.def (ASSERT_EXPR): Add note. * tree.h (ASSERT_EXPR_VAR): Remove. (ASSERT_EXPR_COND): Remove. * vr-values.cc (simplify_using_ranges::vrp_visit_cond_stmt): Remove comment.
2022-11-04Fix recent thinko in operand_equal_pEric Botcazou1-14/+4
There is a thinko in a recent improvement made to operand_equal_p where the code just looks at operand 2 of COMPONENT_REF, if it is present, to compare addresses. That's wrong because operand 2 contains the number of DECL_OFFSET_ALIGN-bit-sized words so, when DECL_OFFSET_ALIGN > 8, not all the bytes are included and some of them are in DECL_FIELD_BIT_OFFSET, see get_inner_reference for the model computation. In other words, you would need to compare operand 2 and DECL_OFFSET_ALIGN and DECL_FIELD_BIT_OFFSET in this situation, but I'm not sure this is worth the hassle in practice so the fix just removes this alternate handling. gcc/ * fold-const.cc (operand_compare::operand_equal_p) <COMPONENT_REF>: Do not take into account operand 2. (operand_compare::hash_operand) <COMPONENT_REF>: Likewise. gcc/testsuite/ * gnat.dg/opt99.adb: New test. * gnat.dg/opt99_pkg1.ads, gnat.dg/opt99_pkg1.adb: New helper. * gnat.dg/opt99_pkg2.ads: Likewise.
2022-10-31builtins: Add various complex builtins for _Float{16,32,64,128,32x,64x,128x}Jakub Jelinek1-0/+37
The following patch adds some complex builtins which have libm implementation in glibc 2.26 and later on various arches. It is needed for libstdc++ _Float128 support when long double is not IEEE quad. 2022-10-31 Jakub Jelinek <jakub@redhat.com> * builtin-types.def (BT_COMPLEX_FLOAT16, BT_COMPLEX_FLOAT32, BT_COMPLEX_FLOAT64, BT_COMPLEX_FLOAT128, BT_COMPLEX_FLOAT32X, BT_COMPLEX_FLOAT64X, BT_COMPLEX_FLOAT128X, BT_FN_COMPLEX_FLOAT16_COMPLEX_FLOAT16, BT_FN_COMPLEX_FLOAT32_COMPLEX_FLOAT32, BT_FN_COMPLEX_FLOAT64_COMPLEX_FLOAT64, BT_FN_COMPLEX_FLOAT128_COMPLEX_FLOAT128, BT_FN_COMPLEX_FLOAT32X_COMPLEX_FLOAT32X, BT_FN_COMPLEX_FLOAT64X_COMPLEX_FLOAT64X, BT_FN_COMPLEX_FLOAT128X_COMPLEX_FLOAT128X, BT_FN_FLOAT16_COMPLEX_FLOAT16, BT_FN_FLOAT32_COMPLEX_FLOAT32, BT_FN_FLOAT64_COMPLEX_FLOAT64, BT_FN_FLOAT128_COMPLEX_FLOAT128, BT_FN_FLOAT32X_COMPLEX_FLOAT32X, BT_FN_FLOAT64X_COMPLEX_FLOAT64X, BT_FN_FLOAT128X_COMPLEX_FLOAT128X, BT_FN_COMPLEX_FLOAT16_COMPLEX_FLOAT16_COMPLEX_FLOAT16, BT_FN_COMPLEX_FLOAT32_COMPLEX_FLOAT32_COMPLEX_FLOAT32, BT_FN_COMPLEX_FLOAT64_COMPLEX_FLOAT64_COMPLEX_FLOAT64, BT_FN_COMPLEX_FLOAT128_COMPLEX_FLOAT128_COMPLEX_FLOAT128, BT_FN_COMPLEX_FLOAT32X_COMPLEX_FLOAT32X_COMPLEX_FLOAT32X, BT_FN_COMPLEX_FLOAT64X_COMPLEX_FLOAT64X_COMPLEX_FLOAT64X, BT_FN_COMPLEX_FLOAT128X_COMPLEX_FLOAT128X_COMPLEX_FLOAT128X): New. * builtins.def (CABS_TYPE, CACOSH_TYPE, CARG_TYPE, CASINH_TYPE, CPOW_TYPE, CPROJ_TYPE): Define and undefine later. (BUILT_IN_CABS, BUILT_IN_CACOSH, BUILT_IN_CACOS, BUILT_IN_CARG, BUILT_IN_CASINH, BUILT_IN_CASIN, BUILT_IN_CATANH, BUILT_IN_CATAN, BUILT_IN_CCOSH, BUILT_IN_CCOS, BUILT_IN_CEXP, BUILT_IN_CLOG, BUILT_IN_CPOW, BUILT_IN_CPROJ, BUILT_IN_CSINH, BUILT_IN_CSIN, BUILT_IN_CSQRT, BUILT_IN_CTANH, BUILT_IN_CTAN): Add DEF_EXT_LIB_FLOATN_NX_BUILTINS. * fold-const-call.cc (fold_const_call_sc, fold_const_call_cc, fold_const_call_ccc): Add various CASE_CFN_*_FN: cases when CASE_CFN_* is present. * gimple-ssa-backprop.cc (backprop::process_builtin_call_use): Likewise. * builtins.cc (expand_builtin, fold_builtin_1): Likewise. * fold-const.cc (negate_mathfn_p, tree_expr_finite_p, tree_expr_maybe_signaling_nan_p, tree_expr_maybe_nan_p, tree_expr_maybe_real_minus_zero_p, tree_call_nonnegative_warnv_p): Likewise.
2022-10-31builtins: Add various __builtin_*f{16,32,64,128,32x,64x,128x} builtinsJakub Jelinek1-0/+27
When working on libstdc++ extended float support in <cmath>, I found that we need various builtins for the _Float{16,32,64,128,32x,64x,128x} types. Glibc 2.26 and later provides the underlying libm routines (except for _Float16 and _Float128x for the time being) and in libstdc++ I think we need at least the _Float128 builtins on x86_64, i?86, powerpc64le and ia64 (when long double is IEEE quad, we can handle it by using __builtin_*l instead), because without the builtins the overloads couldn't be constexpr (say when it would declare the *f128 extern "C" routines itself and call them). The testcase covers just types of those builtins and their constant folding, so doesn't need actual libm support. 2022-10-31 Jakub Jelinek <jakub@redhat.com> * builtin-types.def (BT_FLOAT16_PTR, BT_FLOAT32_PTR, BT_FLOAT64_PTR, BT_FLOAT128_PTR, BT_FLOAT32X_PTR, BT_FLOAT64X_PTR, BT_FLOAT128X_PTR): New DEF_PRIMITIVE_TYPE. (BT_FN_INT_FLOAT16, BT_FN_INT_FLOAT32, BT_FN_INT_FLOAT64, BT_FN_INT_FLOAT128, BT_FN_INT_FLOAT32X, BT_FN_INT_FLOAT64X, BT_FN_INT_FLOAT128X, BT_FN_LONG_FLOAT16, BT_FN_LONG_FLOAT32, BT_FN_LONG_FLOAT64, BT_FN_LONG_FLOAT128, BT_FN_LONG_FLOAT32X, BT_FN_LONG_FLOAT64X, BT_FN_LONG_FLOAT128X, BT_FN_LONGLONG_FLOAT16, BT_FN_LONGLONG_FLOAT32, BT_FN_LONGLONG_FLOAT64, BT_FN_LONGLONG_FLOAT128, BT_FN_LONGLONG_FLOAT32X, BT_FN_LONGLONG_FLOAT64X, BT_FN_LONGLONG_FLOAT128X): New DEF_FUNCTION_TYPE_1. (BT_FN_FLOAT16_FLOAT16_FLOAT16PTR, BT_FN_FLOAT32_FLOAT32_FLOAT32PTR, BT_FN_FLOAT64_FLOAT64_FLOAT64PTR, BT_FN_FLOAT128_FLOAT128_FLOAT128PTR, BT_FN_FLOAT32X_FLOAT32X_FLOAT32XPTR, BT_FN_FLOAT64X_FLOAT64X_FLOAT64XPTR, BT_FN_FLOAT128X_FLOAT128X_FLOAT128XPTR, BT_FN_FLOAT16_FLOAT16_INT, BT_FN_FLOAT32_FLOAT32_INT, BT_FN_FLOAT64_FLOAT64_INT, BT_FN_FLOAT128_FLOAT128_INT, BT_FN_FLOAT32X_FLOAT32X_INT, BT_FN_FLOAT64X_FLOAT64X_INT, BT_FN_FLOAT128X_FLOAT128X_INT, BT_FN_FLOAT16_FLOAT16_INTPTR, BT_FN_FLOAT32_FLOAT32_INTPTR, BT_FN_FLOAT64_FLOAT64_INTPTR, BT_FN_FLOAT128_FLOAT128_INTPTR, BT_FN_FLOAT32X_FLOAT32X_INTPTR, BT_FN_FLOAT64X_FLOAT64X_INTPTR, BT_FN_FLOAT128X_FLOAT128X_INTPTR, BT_FN_FLOAT16_FLOAT16_LONG, BT_FN_FLOAT32_FLOAT32_LONG, BT_FN_FLOAT64_FLOAT64_LONG, BT_FN_FLOAT128_FLOAT128_LONG, BT_FN_FLOAT32X_FLOAT32X_LONG, BT_FN_FLOAT64X_FLOAT64X_LONG, BT_FN_FLOAT128X_FLOAT128X_LONG): New DEF_FUNCTION_TYPE_2. (BT_FN_FLOAT16_FLOAT16_FLOAT16_INTPTR, BT_FN_FLOAT32_FLOAT32_FLOAT32_INTPTR, BT_FN_FLOAT64_FLOAT64_FLOAT64_INTPTR, BT_FN_FLOAT128_FLOAT128_FLOAT128_INTPTR, BT_FN_FLOAT32X_FLOAT32X_FLOAT32X_INTPTR, BT_FN_FLOAT64X_FLOAT64X_FLOAT64X_INTPTR, BT_FN_FLOAT128X_FLOAT128X_FLOAT128X_INTPTR): New DEF_FUNCTION_TYPE_3. * builtins.def (ACOSH_TYPE, ATAN2_TYPE, ATANH_TYPE, COSH_TYPE, FDIM_TYPE, HUGE_VAL_TYPE, HYPOT_TYPE, ILOGB_TYPE, LDEXP_TYPE, LGAMMA_TYPE, LLRINT_TYPE, LOG10_TYPE, LRINT_TYPE, MODF_TYPE, NEXTAFTER_TYPE, REMQUO_TYPE, SCALBLN_TYPE, SCALBN_TYPE, SINH_TYPE): Define and undefine later. (FMIN_TYPE, SQRT_TYPE): Undefine at a later line. (INF_TYPE): Define at a later line. (BUILT_IN_ACOSH, BUILT_IN_ACOS, BUILT_IN_ASINH, BUILT_IN_ASIN, BUILT_IN_ATAN2, BUILT_IN_ATANH, BUILT_IN_ATAN, BUILT_IN_CBRT, BUILT_IN_COSH, BUILT_IN_COS, BUILT_IN_ERFC, BUILT_IN_ERF, BUILT_IN_EXP2, BUILT_IN_EXP, BUILT_IN_EXPM1, BUILT_IN_FDIM, BUILT_IN_FMOD, BUILT_IN_FREXP, BUILT_IN_HYPOT, BUILT_IN_ILOGB, BUILT_IN_LDEXP, BUILT_IN_LGAMMA, BUILT_IN_LLRINT, BUILT_IN_LLROUND, BUILT_IN_LOG10, BUILT_IN_LOG1P, BUILT_IN_LOG2, BUILT_IN_LOGB, BUILT_IN_LOG, BUILT_IN_LRINT, BUILT_IN_LROUND, BUILT_IN_MODF, BUILT_IN_NEXTAFTER, BUILT_IN_POW, BUILT_IN_REMAINDER, BUILT_IN_REMQUO, BUILT_IN_SCALBLN, BUILT_IN_SCALBN, BUILT_IN_SINH, BUILT_IN_SIN, BUILT_IN_TANH, BUILT_IN_TAN, BUILT_IN_TGAMMA): Add DEF_EXT_LIB_FLOATN_NX_BUILTINS. (BUILT_IN_HUGE_VAL): Use HUGE_VAL_TYPE instead of INF_TYPE in DEF_GCC_FLOATN_NX_BUILTINS. * fold-const-call.cc (fold_const_call_ss): Add various CASE_CFN_*_FN: cases when CASE_CFN_* is present. (fold_const_call_sss): Likewise. * builtins.cc (mathfn_built_in_2): Use CASE_MATHFN_FLOATN instead of CASE_MATHFN for various builtins in SEQ_OF_CASE_MATHFN macro. (builtin_with_linkage_p): Add CASE_FLT_FN_FLOATN_NX for various builtins next to CASE_FLT_FN. * fold-const.cc (tree_call_nonnegative_warnv_p): Add CASE_CFN_*_FN: next to CASE_CFN_*: for various builtins. * tree-call-cdce.cc (can_test_argument_range): Add CASE_FLT_FN_FLOATN_NX next to CASE_FLT_FN for various builtins. (edom_only_function): Likewise. * gcc.dg/torture/floatn-builtin.h: Add tests for newly added builtins.
2022-10-06c++, c: Implement C++23 P1774R8 - Portable assumptions [PR106654]Jakub Jelinek1-15/+13
The following patch implements C++23 P1774R8 - Portable assumptions paper, by introducing support for [[assume (cond)]]; attribute for C++. In addition to that the patch adds [[gnu::assume (cond)]]; and __attribute__((assume (cond))); support to both C and C++. As described in C++23, the attribute argument is conditional-expression rather than the usual assignment-expression for attribute arguments, the condition is contextually converted to bool (for C truthvalue conversion is done on it) and is never evaluated at runtime. For C++ constant expression evaluation, I only check the simplest conditions for undefined behavior, because otherwise I'd need to undo changes to *ctx->global which happened during the evaluation (but I believe the spec allows that and we can further improve later). The patch uses a new internal function, .ASSUME, to hold the condition in the FEs. At gimplification time, if the condition is simple/without side-effects, it is gimplified as if (cond) ; else __builtin_unreachable (); and otherwise for now dropped on the floor. The intent is to incrementally outline the conditions into separate artificial functions and use .ASSUME further to tell the ranger and perhaps other optimization passes about the assumptions, as detailed in the PR. When implementing it, I found that assume entry hasn't been added to https://eel.is/c++draft/cpp.cond#6 Jonathan said he'll file a NB comment about it, this patch assumes it has been added into the table as 202207L when the paper has been voted in. With the attributes for both C/C++, I'd say we don't need to add __builtin_assume with similar purpose, especially when __builtin_assume in LLVM is just weird. It is strange for side-effects in function call's argument not to be evaluated, and LLVM in that case (annoyingly) warns and ignores the side-effects (but doesn't do then anything with it), if there are no side-effects, it will work like our if (!cond) __builtin_unreachable (); 2022-10-06 Jakub Jelinek <jakub@redhat.com> PR c++/106654 gcc/ * internal-fn.def (ASSUME): New internal function. * internal-fn.h (expand_ASSUME): Declare. * internal-fn.cc (expand_ASSUME): Define. * gimplify.cc (gimplify_call_expr): Gimplify IFN_ASSUME. * fold-const.h (simple_condition_p): Declare. * fold-const.cc (simple_operand_p_2): Rename to ... (simple_condition_p): ... this. Remove forward declaration. No longer static. Adjust function comment and fix a typo in it. Adjust recursive call. (simple_operand_p): Adjust function comment. (fold_truth_andor): Adjust simple_operand_p_2 callers to call simple_condition_p. * doc/extend.texi: Document assume attribute. Move fallthrough attribute example to its section. gcc/c-family/ * c-attribs.cc (handle_assume_attribute): New function. (c_common_attribute_table): Add entry for assume attribute. * c-lex.cc (c_common_has_attribute): Handle __have_cpp_attribute (assume). gcc/c/ * c-parser.cc (handle_assume_attribute): New function. (c_parser_declaration_or_fndef): Handle assume attribute. (c_parser_attribute_arguments): Add assume_attr argument, if true, parse first argument as conditional expression. (c_parser_gnu_attribute, c_parser_std_attribute): Adjust c_parser_attribute_arguments callers. (c_parser_statement_after_labels) <case RID_ATTRIBUTE>: Handle assume attribute. gcc/cp/ * cp-tree.h (process_stmt_assume_attribute): Implement C++23 P1774R8 - Portable assumptions. Declare. (diagnose_failing_condition): Declare. (find_failing_clause): Likewise. * parser.cc (assume_attr): New enumerator. (cp_parser_parenthesized_expression_list): Handle assume_attr. Remove identifier variable, for id_attr push the identifier into expression_list right away instead of inserting it before all the others at the end. (cp_parser_conditional_expression): New function. (cp_parser_constant_expression): Use it. (cp_parser_statement): Handle assume attribute. (cp_parser_expression_statement): Likewise. (cp_parser_gnu_attribute_list): Use assume_attr for assume attribute. (cp_parser_std_attribute): Likewise. Handle standard assume attribute like gnu::assume. * cp-gimplify.cc (process_stmt_assume_attribute): New function. * constexpr.cc: Include fold-const.h. (find_failing_clause_r, find_failing_clause): New functions, moved from semantics.cc with ctx argument added and if non-NULL, call cxx_eval_constant_expression rather than fold_non_dependent_expr. (cxx_eval_internal_function): Handle IFN_ASSUME. (potential_constant_expression_1): Likewise. * pt.cc (tsubst_copy_and_build): Likewise. * semantics.cc (diagnose_failing_condition): New function. (find_failing_clause_r, find_failing_clause): Moved to constexpr.cc. (finish_static_assert): Use it. Add auto_diagnostic_group. gcc/testsuite/ * gcc.dg/attr-assume-1.c: New test. * gcc.dg/attr-assume-2.c: New test. * gcc.dg/attr-assume-3.c: New test. * g++.dg/cpp2a/feat-cxx2a.C: Add colon to C++20 features comment, add C++20 attributes comment and move C++20 new features after the attributes before them. * g++.dg/cpp23/feat-cxx2b.C: Likewise. Test __has_cpp_attribute(assume). * g++.dg/cpp23/attr-assume1.C: New test. * g++.dg/cpp23/attr-assume2.C: New test. * g++.dg/cpp23/attr-assume3.C: New test. * g++.dg/cpp23/attr-assume4.C: New test.
2022-08-09middle-end: Optimize ((X >> C1) & C2) != C3 for more cases.Roger Sayle1-54/+0
Following my middle-end patch for PR tree-optimization/94026, I'd promised Jeff Law that I'd clean up the dead-code in fold-const.cc now that these optimizations are handled in match.pd. Alas, I discovered things aren't quite that simple, as the transformations I'd added avoided cases where C2 overlapped with the new bits introduced by the shift, but the original code handled any value of C2 provided that it had a single-bit set (under the condition that C3 was always zero). This patch upgrades the transformations supported by match.pd to cover any values of C2 and C3, provided that C1 is a valid bit shift constant, for all three shift types (logical right, arithmetic right and left). This then makes the code in fold-const.cc fully redundant, and adds support for some new (corner) cases not previously handled. If the constant C1 is valid for the type's precision, the shift is now always eliminated (with C2 and C3 possibly updated to test the sign bit). Interestingly, the fold-const.cc code that I'm now deleting was originally added by me back in 2006 to resolve PR middle-end/21137. I've confirmed that those testcase(s) remain resolved with this patch (and I'll close 21137 in Bugzilla). This patch also implements most (but not all) of the examples mentioned in PR tree-optimization/98954, for which I have some follow-up patches. 2022-08-09 Roger Sayle <roger@nextmovesoftware.com> Richard Biener <rguenther@suse.de> gcc/ChangeLog PR middle-end/21137 PR tree-optimization/98954 * fold-const.cc (fold_binary_loc): Remove optimizations to optimize ((X >> C1) & C2) ==/!= 0. * match.pd (cmp (bit_and (lshift @0 @1) @2) @3): Remove wi::ctz check, and handle all values of INTEGER_CSTs @2 and @3. (cmp (bit_and (rshift @0 @1) @2) @3): Likewise, remove wi::clz checks, and handle all values of INTEGER_CSTs @2 and @3. gcc/testsuite/ChangeLog PR middle-end/21137 PR tree-optimization/98954 * gcc.dg/fold-eqandshift-4.c: New test case.
2022-06-20middle-end/106027 - fix types in needle foldingRichard Biener1-3/+7
The fold_to_nonsharp_ineq_using_bound folding ends up creating invalid typed IL which confuses later foldings. The following fixes that. 2022-06-20 Richard Biener <rguenther@suse.de> PR middle-end/106027 * fold-const.cc (fold_to_nonsharp_ineq_using_bound): Use the type of the prevailing comparison for the new comparison type. (fold_binary_loc): Use proper types for the A < X && A + 1 > Y to A < X && A >= Y folding. * gcc.dg/pr106027.c: New testcase.
2022-05-27fold-const: Fix up -fsanitize=null in C++ [PR105729]Jakub Jelinek1-0/+10
The following testcase triggers a false positive UBSan binding a reference to null diagnostics. In the FE we instrument conversions from pointer to reference type to diagnose at runtime if the operand of such a conversion is 0. The problem is that a GENERIC folding folds ((const struct Bar *) ((const struct Foo *) this)->data) + (sizetype) range_check (x) conversion to const struct Bar & by converting to that the first operand of the POINTER_PLUS_EXPR. But that changes when the -fsanitize=null binding to reference runtime check occurs. Without the optimization, it is invoked on the result of the POINTER_PLUS_EXPR, and as range_check call throws, that means it never triggers in the testcase. With the optimization, it checks whether this->data is NULL and it is. The following patch avoids that optimization during GENERIC folding when -fsanitize=null is enabled and it is a cast from non-REFERENCE_TYPE to REFERENCE_TYPE. 2022-05-27 Jakub Jelinek <jakub@redhat.com> PR sanitizer/105729 * fold-const.cc (fold_unary_loc): Don't optimize (X &) ((Y *) z + w) to (X &) z + w if -fsanitize=null during GENERIC folding. * g++.dg/ubsan/pr105729.C: New test.
2022-05-13Make gimple_build main workers more flexibleRichard Biener1-0/+1
The following makes the main gimple_build API take a gimple_stmt_iterator, whether to insert before or after and an iterator update argument to make it more convenient to use in certain situations (see the tree-vect-generic.cc hunks for an example). It also makes the case we insert into the IL somewhat distinct from inserting into a standalone sequence in that it simplifies built expressions the same way as inserting and calling fold_stmt (..., follow_all_ssa_edges) would. When inserting into a standalone sequence we restrict simplification to defs within the currently building sequence. The patch only amends the tree_code gimple_build API, I will followup with converting the rest as well. The patch got larger than intended because the template forwarders now use gsi_last which introduces a dependency on gimple-iterator.h requiring mass #include re-org across the tree. There are two frontend specific files including gimple-fold.h just for some padding clearing stuff - I've removed the include and instead moved the declarations to fold-const.h (but not the implementations). Otherwise I'd have to include half of the middle-end headers in those files which I didn't much like. 2022-05-12 Richard Biener <rguenther@suse.de> gcc/cp/ * constexpr.cc: Remove gimple-fold.h include. gcc/c-family/ * c-omp.cc: Remove gimple-fold.h include. gcc/analyzer/ * supergraph.cc: Re-order gimple-fold.h include. gcc/ * gimple-fold.cc (gimple_build): Adjust for new main API. * gimple-fold.h (gimple_build): New main APIs with iterator, insert direction and iterator update. (gimple_build): New forwarder template. (clear_padding_type_may_have_padding_p): Remove. (clear_type_padding_in_mask): Likewise. (arith_overflowed_p): Likewise. * fold-const.h (clear_padding_type_may_have_padding_p): Declare. (clear_type_padding_in_mask): Likewise. (arith_overflowed_p): Likewise. * tree-vect-generic.cc (gimplify_build3): Use main gimple_build API. (gimplify_build2): Likewise. (gimplify_build1): Likewise. * ubsan.cc (ubsan_expand_ptr_ifn): Likewise, avoid extra compare stmt. * gengtype.cc (open_base_files): Re-order includes. * builtins.cc: Re-order gimple-fold.h include. * calls.cc: Likewise. * cgraphbuild.cc: Likewise. * cgraphunit.cc: Likewise. * config/rs6000/rs6000-builtin.cc: Likewise. * config/rs6000/rs6000-call.cc: Likewise. * config/rs6000/rs6000.cc: Likewise. * config/s390/s390.cc: Likewise. * expr.cc: Likewise. * fold-const.cc: Likewise. * function-tests.cc: Likewise. * gimple-match-head.cc: Likewise. * gimple-range-fold.cc: Likewise. * gimple-ssa-evrp-analyze.cc: Likewise. * gimple-ssa-evrp.cc: Likewise. * gimple-ssa-sprintf.cc: Likewise. * gimple-ssa-warn-access.cc: Likewise. * gimplify.cc: Likewise. * graphite-isl-ast-to-gimple.cc: Likewise. * ipa-cp.cc: Likewise. * ipa-devirt.cc: Likewise. * ipa-prop.cc: Likewise. * omp-low.cc: Likewise. * pointer-query.cc: Likewise. * range-op.cc: Likewise. * tree-cfg.cc: Likewise. * tree-if-conv.cc: Likewise. * tree-inline.cc: Likewise. * tree-object-size.cc: Likewise. * tree-ssa-ccp.cc: Likewise. * tree-ssa-dom.cc: Likewise. * tree-ssa-forwprop.cc: Likewise. * tree-ssa-ifcombine.cc: Likewise. * tree-ssa-loop-ivcanon.cc: Likewise. * tree-ssa-math-opts.cc: Likewise. * tree-ssa-pre.cc: Likewise. * tree-ssa-propagate.cc: Likewise. * tree-ssa-reassoc.cc: Likewise. * tree-ssa-sccvn.cc: Likewise. * tree-ssa-strlen.cc: Likewise. * tree-ssa.cc: Likewise. * value-pointer-equiv.cc: Likewise. * vr-values.cc: Likewise. gcc/testsuite/ * gcc.dg/plugin/diagnostic_group_plugin.c: Reorder or remove gimple-fold.h include. * gcc.dg/plugin/diagnostic_plugin_show_trees.c: Likewise. * gcc.dg/plugin/diagnostic_plugin_test_inlining.c: Likewise. * gcc.dg/plugin/diagnostic_plugin_test_metadata.c: Likewise. * gcc.dg/plugin/diagnostic_plugin_test_paths.c: Likewise. * gcc.dg/plugin/diagnostic_plugin_test_show_locus.c: Likewise. * gcc.dg/plugin/diagnostic_plugin_test_string_literals.c: Likewise. * gcc.dg/plugin/diagnostic_plugin_test_tree_expression_range.c: Likewise. * gcc.dg/plugin/finish_unit_plugin.c: Likewise. * gcc.dg/plugin/ggcplug.c: Likewise. * gcc.dg/plugin/must_tail_call_plugin.c: Likewise. * gcc.dg/plugin/one_time_plugin.c: Likewise. * gcc.dg/plugin/selfassign.c: Likewise. * gcc.dg/plugin/start_unit_plugin.c: Likewise. * g++.dg/plugin/selfassign.c: Likewise.
2022-04-13tree-optimization/105250 - adjust fold_convertible_p PR105140 fixRichard Biener1-4/+3
The following reverts the original PR105140 fix and goes for instead applying the additional fold_convert constraint for VECTOR_TYPE conversions also to fold_convertible_p. I did not try sanitizing all of this at this point. 2022-04-13 Richard Biener <rguenther@suse.de> PR tree-optimization/105250 * fold-const.cc (fold_convertible_p): Revert r12-7979-geaaf77dd85c333, instead check for size equality of the vector types involved. * gcc.dg/pr105250.c: New testcase.
2022-04-08fold-const: Fix up make_range_step [PR105189]Jakub Jelinek1-1/+27
The following testcase is miscompiled, because fold_truth_andor incorrectly folds (unsigned) foo () >= 0U && 1 into foo () >= 0 For the unsigned comparison (which is useless in this case, as >= 0U is always true, but hasn't been folded yet), previous make_range_step derives exp (unsigned) foo () and +[0U, -] range for it. Next we process the NOP_EXPR. We have special code for unsigned to signed casts, already earlier punt if low or high aren't representable in arg0_type or if it is a narrowing conversion. For the signed to unsigned casts, I think if high is specified we are still fine, as we punt for non-representable values in arg0_type, n_high is then still representable and so was smaller or equal to signed maximum and either low is not present (equivalent to 0U), or low must be smaller or equal to high and so for unsigned exp +[low, high] the signed exp +[n_low, n_high] will be correct. Similarly, if both low and high aren't specified (always true or always false), it is ok too. But if we have for unsigned exp +[low, -] or -[low, -], using +[n_low, -] or -[n_high, -] is incorrect. Because low is smaller or equal to signed maximum and high is unspecified (i.e. unsigned maximum), when signed that range is a union of +[n_low, -] and +[-, -1] which is equivalent to -[0, n_low-1], unless low is 0, in that case we can treat it as [-, -]. 2022-04-08 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/105189 * fold-const.cc (make_range_step): Fix up handling of (unsigned) x +[low, -] ranges for signed x if low fits into typeof (x). * g++.dg/torture/pr105189.C: New test.
2022-04-04middle-end/105140 - fix bogus recursion in fold_convertible_pRichard Biener1-2/+3
fold_convertible_p expects an operand and a type to convert to but recurses with two vector component types. Fixed by allowing types instead of an operand as well. 2022-04-04 Richard Biener <rguenther@suse.de> PR middle-end/105140 * fold-const.cc (fold_convertible_p): Allow a TYPE_P arg. * gcc.dg/pr105140.c: New testcase.
2022-03-24fold-const: Handle C++ dependent COMPONENT_REFs in operand_equal_p [PR105035]Jakub Jelinek1-2/+5
As mentioned in the PR, operand_equal_p already contains some hacks so that it can be called already on pre-instantiation C++ trees from templates, but the recent change to compare DECL_FIELD_OFFSET in the COMPONENT_REF case broke this. Many such COMPONENT_REFs are already punted on earlier because they have NULL TREE_TYPE, but in this case the code knows what type they have but still uses an IDENTIFIER_NODE as second operand of COMPONENT_REF (I think SCOPE_REF is something that could be used too). The following patch looks at those DECL_FIELD_*OFFSET fields only if both field[01] args are FIELD_DECLs and otherwise keeps it to the earlier OP_SAME (1) check that guards this whole block. 2022-03-24 Jakub Jelinek <jakub@redhat.com> PR c++/105035 * fold-const.cc (operand_equal_p) <case COMPONENT_REF>: If either field0 or field1 is not a FIELD_DECL, return false. * g++.dg/warn/Wduplicated-cond2.C: New test.
2022-02-22Implement constant-folding simplifications of reductions.Roger Sayle1-0/+20
This patch addresses a code quality regression in GCC 12 by implementing some constant folding/simplification transformations for REDUC_PLUS_EXPR in match.pd. The motivating example is gcc.dg/vect/pr89440.c which with -O2 -ffast-math (with vectorization now enabled) gets optimized to: float f (float x) { vector(4) float vect_x_14.11; vector(4) float _2; float _32; _2 = {x_9(D), 0.0, 0.0, 0.0}; vect_x_14.11_29 = _2 + { 1.0e+1, 2.6e+1, 4.2e+1, 5.8e+1 }; _32 = .REDUC_PLUS (vect_x_14.11_29); [tail call] return _32; } With these proposed new transformations, we can simplify the above code even further. float f (float x) { float _32; _32 = x_9(D) + 1.36e+2; return _32; } [which happens to match what we'd produce with -fno-tree-vectorize, and with GCC 11]. 2022-02-22 Roger Sayle <roger@nextmovesoftware.com> Richard Biener <rguenther@suse.de> gcc/ChangeLog * fold-const.cc (ctor_single_nonzero_element): New function to return the single non-zero element of a (vector) constructor. * fold-const.h (ctor_single_nonzero_element): Prototype here. * match.pd (reduc (constructor@0)): Simplify reductions of a constructor containing a single non-zero element. (reduc (@0 op VECTOR_CST) -> (reduc @0) op CONST): Simplify reductions of vector operations of the same operator with constant vector operands. gcc/testsuite/ChangeLog * gcc.dg/fold-reduc-1.c: New test case.
2022-02-15fold, simplify-rtx: Punt on non-representable floating point constants ↵Jakub Jelinek1-15/+19
[PR104522] For IBM double double I've added in PR95450 and PR99648 verification that when we at the tree/GIMPLE or RTL level interpret target bytes as a REAL_CST or CONST_DOUBLE constant, we try to encode it back to target bytes and verify it is the same. This is because our real.c support isn't able to represent all valid values of IBM double double which has variable precision. In PR104522, it has been noted that we have similar problem with the Intel/Motorola extended XFmode formats, our internal representation isn't able to record pseudo denormals, pseudo infinities, pseudo NaNs and unnormal values. So, the following patch is an attempt to extend that verification to all floats. Unfortunately, it wasn't that straightforward, because the __builtin_clear_padding code exactly for the XFmode long doubles needs to discover what bits are padding and does that by interpreting memory of all 1s. That is actually a valid supported value, a qNaN with negative sign with all mantissa bits set, but the verification includes also the padding bits (exactly what __builtin_clear_padding wants to figure out) and so fails the comparison check and so we ICE. The patch fixes that case by moving that verification from native_interpret_real to its caller, so that clear_padding_type can call native_interpret_real and avoid that extra check. With this, the only thing that regresses in the testsuite is +FAIL: gcc.target/i386/auto-init-4.c scan-assembler-times long\\t-16843010 5 because it decides to use a pattern that has non-zero bits in the padding bits of the long double, so the simplify-rtx.cc change prevents folding a SUBREG into a constant. We emit (the testcase is -O0 but we emit worse code at all opt levels) something like: movabsq $-72340172838076674, %rax movabsq $-72340172838076674, %rdx movq %rax, -48(%rbp) movq %rdx, -40(%rbp) fldt -48(%rbp) fstpt -32(%rbp) instead of fldt .LC2(%rip) fstpt -32(%rbp) ... .LC2: .long -16843010 .long -16843010 .long 65278 .long 0 Note, neither of those sequences actually stores the padding bits, fstpt simply doesn't touch them. For vars with clear_padding_real_needs_padding_p types that are allocated to memory at expansion time, I'd say much better would be to do the stores using integral modes rather than XFmode, so do that: movabsq $-72340172838076674, %rax movq %rax, -32(%rbp) movq %rax, -24(%rbp) directly. That is the only way to ensure the padding bits are initialized (or expand __builtin_clear_padding, but then you initialize separately the value bits and padding bits). 2022-02-15 Jakub Jelinek <jakub@redhat.com> PR middle-end/104522 * fold-const.h (native_interpret_real): Declare. * fold-const.cc (native_interpret_real): No longer static. Don't perform MODE_COMPOSITE_P verification here. (native_interpret_expr) <case REAL_TYPE>: But perform it here instead for all modes. * gimple-fold.cc (clear_padding_type): Call native_interpret_real instead of native_interpret_expr. * simplify-rtx.cc (simplify_immed_subreg): Perform the native_encode_rtx and comparison verification for all FLOAT_MODE_P modes, not just MODE_COMPOSITE_P. * gcc.dg/pr104522.c: New test.