|
gcc/ChangeLog:
PR tree-optimization/111648
* fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): If a1
chooses base element from arg, ensure that it's a natural stepped
sequence.
(build_vec_cst_rand): New param natural_stepped and use it to
construct a naturally stepped sequence.
(test_nunits_min_2): Add new unit tests Case 6 and Case 7.
|
|
For "get_global_range_query" SSA_NAME_RANGE_INFO can be queried.
For "get_range_query", it could get more context-aware range info.
And look at the implementation of "get_range_query", it returns
global range if no local fun info.
So, if not quering for SSA_NAME and not chaning the IL, it would
be ok to use get_range_query to replace get_global_range_query.
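As a sketch of the replacement pattern (illustrative only, not an exact
hunk from the patch):
```
  /* Before: always consults the global ranges (SSA_NAME_RANGE_INFO).  */
  get_global_range_query ()->range_of_expr (vr, t);

  /* After: context-aware when local function range info is available;
     get_range_query itself falls back to the global query otherwise.  */
  get_range_query (cfun)->range_of_expr (vr, t);
```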
gcc/ChangeLog:
* fold-const.cc (expr_not_equal_to): Replace get_global_range_query
by get_range_query.
* gimple-fold.cc (size_must_be_zero_p): Likewise.
* gimple-range-fold.cc (fur_source::fur_source): Likewise.
* gimple-ssa-warn-access.cc (check_nul_terminated_array): Likewise.
* tree-dfa.cc (get_ref_base_and_extent): Likewise.
|
|
32640 bits [PR102989]
As mentioned in the _BitInt support thread, _BitInt(N) is currently limited
by the wide_int/widest_int maximum precision limitation, which depending
on the target is 191, 319, 575 or 703 bits (one less than WIDE_INT_MAX_PRECISION).
That is a fairly low limit for _BitInt, especially on the targets with the
191-bit limitation.
The following patch bumps that limit to 16319 bits on all arches (which support
_BitInt at all), which is the limit imposed by the INTEGER_CST representation
(unsigned char members holding the number of HOST_WIDE_INT limbs).
In order to achieve that, wide_int is changed from a trivially copyable type
which contained just an inline array of WIDE_INT_MAX_ELTS (3, 5, 9 or
11 limbs depending on target) limbs into a non-trivially copy constructible,
copy assignable and destructible type which for the usual small cases (up
to WIDE_INT_MAX_INL_ELTS which is the former WIDE_INT_MAX_ELTS) still uses
an inline array of limbs, but for larger precisions uses heap allocated
limb array. This makes wide_int unusable in GC structures, so for dwarf2out
which was the only place which needed it there is a new rwide_int type
(restricted wide_int) which supports only up to RWIDE_INT_MAX_ELTS limbs
inline and is trivially copyable (dwarf2out should never deal with large
_BitInt constants, those should have been lowered earlier).
Similarly, widest_int has been changed from a trivially copyable type which
contained also an inline array of WIDE_INT_MAX_ELTS limbs (but unlike
wide_int didn't contain precision and assumed that to be
WIDE_INT_MAX_PRECISION) into a non-trivially copy constructible, copy
assignable and destructible type which has always WIDEST_INT_MAX_PRECISION
precision (32640 bits currently, twice as much as INTEGER_CST limitation
allows) and unlike wide_int decides depending on get_len () value whether
it uses an inline array (again, up to WIDE_INT_MAX_INL_ELTS) or heap
allocated one. In wide-int.h this means we need to estimate an upper
bound on how many limbs wide-int.cc (usually; sometimes wide-int.h) will
need to write, heap allocate based on that estimate if needed, and upon
set_len, which is done at the end, copy and deallocate if we guessed above
WIDE_INT_MAX_INL_ELTS and allocated dynamically while actually needing
less than that. The inexact guesses are needed because the exact
computation of the length in wide-int.cc is sometimes quite complex and
especially canonicalize at the end can decrease it. Because of this,
widest_int is again not usable in GC structures, so cfgloop.h has been
changed to use fixed_wide_int_storage <WIDE_INT_MAX_INL_PRECISION> and to
punt if we'd have larger _BitInt based iterators; programs having more
than 128-bit iterators will hopefully be rare and I think it is fine to
treat loops with more than 2^127 iterations as effectively possibly
infinite. omp-general.cc is changed to use fixed_wide_int_storage <1024>,
as it should better support scores with the same precision on all arches.
Code which used WIDE_INT_PRINT_BUFFER_SIZE sized buffers for printing
wide_int/widest_int into buffer had to be changed to use XALLOCAVEC for
larger lengths.
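The storage scheme can be sketched as follows (a standalone illustration of
the small-buffer idea, not GCC's actual wide_int_storage; the inline limb
count below is a made-up stand-in for WIDE_INT_MAX_INL_ELTS):
```
static const unsigned int INL_ELTS = 9;   // stand-in for WIDE_INT_MAX_INL_ELTS

// Limbs live in an inline array up to INL_ELTS; above that they are
// heap allocated, which is what makes the type non-trivially copyable,
// assignable and destructible.
struct sbo_int
{
  bool on_heap = false;
  unsigned int len = 0;
  union { long val[INL_ELTS]; long *valp; } u;

  // Callers pass an upper-bound estimate of how many limbs they will
  // write (mirroring the new unsigned int argument of write_val); one
  // write_val call per object is assumed for brevity.
  long *write_val (unsigned int estimated_len)
  {
    if (estimated_len > INL_ELTS && !on_heap)
      {
        u.valp = new long[estimated_len];
        on_heap = true;
      }
    return on_heap ? u.valp : u.val;
  }

  ~sbo_int () { if (on_heap) delete[] u.valp; }
};
```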
On x86_64, the patch enlarges the .text section of an
--enable-checking=yes,rtl,extra configured, bootstrapped cc1plus by 1.01%
(from 0x25725a5 to 0x25e5555), and similarly, at least when compiling
insn-recog.cc with the usual bootstrap options, slows compilation down by
1.01%: user 4m22.046s and 4m22.384s on vanilla trunk vs.
4m25.947s and 4m25.581s on patched trunk. I'm afraid some code size growth
and compile time slowdown is unavoidable in this case; we use wide_int and
widest_int everywhere, and while the rare cases are marked with UNLIKELY
macros, it still means extra checks for them.
The patch also regresses
+FAIL: gm2/pim/fail/largeconst.mod, -O
+FAIL: gm2/pim/fail/largeconst.mod, -O -g
+FAIL: gm2/pim/fail/largeconst.mod, -O3 -fomit-frame-pointer
+FAIL: gm2/pim/fail/largeconst.mod, -O3 -fomit-frame-pointer -finline-functions
+FAIL: gm2/pim/fail/largeconst.mod, -Os
+FAIL: gm2/pim/fail/largeconst.mod, -g
+FAIL: gm2/pim/fail/largeconst2.mod, -O
+FAIL: gm2/pim/fail/largeconst2.mod, -O -g
+FAIL: gm2/pim/fail/largeconst2.mod, -O3 -fomit-frame-pointer
+FAIL: gm2/pim/fail/largeconst2.mod, -O3 -fomit-frame-pointer -finline-functions
+FAIL: gm2/pim/fail/largeconst2.mod, -Os
+FAIL: gm2/pim/fail/largeconst2.mod, -g
tests, which previously were rejected with
error: constant literal ‘12345678912345678912345679123456789123456789123456789123456789123456791234567891234567891234567891234567891234567912345678912345678912345678912345678912345679123456789123456789’ exceeds internal ZTYPE range
kind of errors, but now are accepted. It seems the FE tries to parse
constants into widest_int in that case and only diagnoses if widest_int
overflows; that seems wrong. It should at least punt if stuff doesn't fit
into WIDE_INT_MAX_PRECISION, but perhaps far less than that; if it wants
middle-end support for precisions above 128 bits, it should better be using
BITINT_TYPE. Will file a PR and defer to the Modula2 maintainer.
2023-10-12 Jakub Jelinek <jakub@redhat.com>
PR c/102989
* wide-int.h: Adjust file comment.
(WIDE_INT_MAX_INL_ELTS): Define to former value of WIDE_INT_MAX_ELTS.
(WIDE_INT_MAX_INL_PRECISION): Define.
(WIDE_INT_MAX_ELTS): Change to 255. Assert that WIDE_INT_MAX_INL_ELTS
is smaller than WIDE_INT_MAX_ELTS.
(RWIDE_INT_MAX_ELTS, RWIDE_INT_MAX_PRECISION, WIDEST_INT_MAX_ELTS,
WIDEST_INT_MAX_PRECISION): Define.
(WI_BINARY_RESULT_VAR, WI_UNARY_RESULT_VAR): Change write_val callers
to pass 0 as a new argument.
(class widest_int_storage): Likewise.
(widest_int, widest2_int): Change typedefs to use widest_int_storage
rather than fixed_wide_int_storage.
(enum wi::precision_type): Add INL_CONST_PRECISION enumerator.
(struct binary_traits): Add partial specializations for
INL_CONST_PRECISION.
(generic_wide_int): Add needs_write_val_arg static data member.
(int_traits): Likewise.
(wide_int_storage): Replace val non-static data member with a union
u of it and HOST_WIDE_INT *valp. Declare copy constructor, copy
assignment operator and destructor. Add unsigned int argument to
write_val.
(wide_int_storage::wide_int_storage): Initialize precision to 0
in the default ctor. Remove unnecessary {}s around STATIC_ASSERTs.
Assert in non-default ctor T's precision_type is not
INL_CONST_PRECISION and allocate u.valp for large precision. Add
copy constructor.
(wide_int_storage::~wide_int_storage): New.
(wide_int_storage::operator=): Add copy assignment operator. In
assignment operator remove unnecessary {}s around STATIC_ASSERTs,
assert ctor T's precision_type is not INL_CONST_PRECISION and
if precision changes, deallocate and/or allocate u.valp.
(wide_int_storage::get_val): Return u.valp rather than u.val for
large precision.
(wide_int_storage::write_val): Likewise. Add an unused unsigned int
argument.
(wide_int_storage::set_len): Use write_val instead of writing val
directly.
(wide_int_storage::from, wide_int_storage::from_array): Adjust
write_val callers.
(wide_int_storage::create): Allocate u.valp for large precisions.
(wi::int_traits <wide_int_storage>::get_binary_precision): New.
(fixed_wide_int_storage::fixed_wide_int_storage): Make default
ctor defaulted.
(fixed_wide_int_storage::write_val): Add unused unsigned int argument.
(fixed_wide_int_storage::from, fixed_wide_int_storage::from_array):
Adjust write_val callers.
(wi::int_traits <fixed_wide_int_storage>::get_binary_precision): New.
(WIDEST_INT): Define.
(widest_int_storage): New template class.
(wi::int_traits <widest_int_storage>): New.
(trailing_wide_int_storage::write_val): Add unused unsigned int
argument.
(wi::get_binary_precision): Use
wi::int_traits <WI_BINARY_RESULT (T1, T2)>::get_binary_precision
rather than get_precision on get_binary_result.
(wi::copy): Adjust write_val callers. Don't call set_len if
needs_write_val_arg.
(wi::bit_not): If result.needs_write_val_arg, call write_val
again with upper bound estimate of len.
(wi::sext, wi::zext, wi::set_bit): Likewise.
(wi::bit_and, wi::bit_and_not, wi::bit_or, wi::bit_or_not,
wi::bit_xor, wi::add, wi::sub, wi::mul, wi::mul_high, wi::div_trunc,
wi::div_floor, wi::div_ceil, wi::div_round, wi::divmod_trunc,
wi::mod_trunc, wi::mod_floor, wi::mod_ceil, wi::mod_round,
wi::lshift, wi::lrshift, wi::arshift): Likewise.
(wi::bswap, wi::bitreverse): Assert result.needs_write_val_arg
is false.
(gt_ggc_mx, gt_pch_nx): Remove generic template for all
generic_wide_int, instead add functions and templates for each
storage of generic_wide_int. Make functions for
generic_wide_int <wide_int_storage> and templates for
generic_wide_int <widest_int_storage <N>> deleted.
(wi::mask, wi::shifted_mask): Adjust write_val calls.
* wide-int.cc (zeros): Decrease array size to 1.
(BLOCKS_NEEDED): Use CEIL.
(canonize): Use HOST_WIDE_INT_M1.
(wi::from_buffer): Pass 0 to write_val.
(wi::to_mpz): Use CEIL.
(wi::from_mpz): Likewise. Pass 0 to write_val. Use
WIDE_INT_MAX_INL_ELTS instead of WIDE_INT_MAX_ELTS.
(wi::mul_internal): Use WIDE_INT_MAX_INL_PRECISION instead of
MAX_BITSIZE_MODE_ANY_INT in automatic array sizes, for prec
above WIDE_INT_MAX_INL_PRECISION estimate precision from
lengths of operands. Use XALLOCAVEC allocated buffers for
prec above WIDE_INT_MAX_INL_PRECISION.
(wi::divmod_internal): Likewise.
(wi::lshift_large): For len > WIDE_INT_MAX_INL_ELTS estimate
it from xlen and skip.
(rshift_large_common): Remove xprecision argument, add len
argument with len computed in caller. Don't return anything.
(wi::lrshift_large, wi::arshift_large): Compute len here
and pass it to rshift_large_common, for lengths above
WIDE_INT_MAX_INL_ELTS using estimations from xlen if possible.
(assert_deceq, assert_hexeq): For lengths above
WIDE_INT_MAX_INL_ELTS use XALLOCAVEC allocated buffer.
(test_printing): Use WIDE_INT_MAX_INL_PRECISION instead of
WIDE_INT_MAX_PRECISION.
* wide-int-print.h (WIDE_INT_PRINT_BUFFER_SIZE): Use
WIDE_INT_MAX_INL_PRECISION instead of WIDE_INT_MAX_PRECISION.
* wide-int-print.cc (print_decs, print_decu, print_hex): For
lengths above WIDE_INT_MAX_INL_ELTS use XALLOCAVEC allocated buffer.
* tree.h (wi::int_traits<extended_tree <N>>): Change precision_type
to INL_CONST_PRECISION for N == ADDR_MAX_PRECISION.
(widest_extended_tree): Use WIDEST_INT_MAX_PRECISION instead of
WIDE_INT_MAX_PRECISION.
(wi::ints_for): Use int_traits <extended_tree <N> >::precision_type
instead of hard coded CONST_PRECISION.
(widest2_int_cst): Use WIDEST_INT_MAX_PRECISION instead of
WIDE_INT_MAX_PRECISION.
(wi::extended_tree <N>::get_len): Use WIDEST_INT_MAX_PRECISION rather
than WIDE_INT_MAX_PRECISION.
(wi::ints_for::zero): Use
wi::int_traits <wi::extended_tree <N> >::precision_type instead of
wi::CONST_PRECISION.
* tree.cc (build_replicated_int_cst): Formatting fix. Use
WIDE_INT_MAX_INL_ELTS rather than WIDE_INT_MAX_ELTS.
* print-tree.cc (print_node): Don't print TREE_UNAVAILABLE on
INTEGER_CSTs, TREE_VECs or SSA_NAMEs.
* double-int.h (wi::int_traits <double_int>::precision_type): Change
to INL_CONST_PRECISION from CONST_PRECISION.
* poly-int.h (struct poly_coeff_traits): Add partial specialization
for wi::INL_CONST_PRECISION.
* cfgloop.h (bound_wide_int): New typedef.
(struct nb_iter_bound): Change bound type from widest_int to
bound_wide_int.
(struct loop): Change nb_iterations_upper_bound,
nb_iterations_likely_upper_bound and nb_iterations_estimate type from
widest_int to bound_wide_int.
* cfgloop.cc (record_niter_bound): Return early if wi::min_precision
of i_bound is too large for bound_wide_int. Adjustments for the
widest_int to bound_wide_int type change in non-static data members.
(get_estimated_loop_iterations, get_max_loop_iterations,
get_likely_max_loop_iterations): Adjustments for the widest_int to
bound_wide_int type change in non-static data members.
* tree-vect-loop.cc (vect_transform_loop): Likewise.
* tree-ssa-loop-niter.cc (do_warn_aggressive_loop_optimizations): Use
XALLOCAVEC allocated buffer for i_bound len above
WIDE_INT_MAX_INL_ELTS.
(record_estimate): Return early if wi::min_precision of i_bound is too
large for bound_wide_int. Adjustments for the widest_int to
bound_wide_int type change in non-static data members.
(wide_int_cmp): Use bound_wide_int instead of widest_int.
(bound_index): Use bound_wide_int instead of widest_int.
(discover_iteration_bound_by_body_walk): Likewise. Use
widest_int::from to convert it to widest_int when passed to
record_niter_bound.
(maybe_lower_iteration_bound): Use widest_int::from to convert it to
widest_int when passed to record_niter_bound.
(estimate_numbers_of_iteration): Don't record upper bound if
loop->nb_iterations has too large precision for bound_wide_int.
(n_of_executions_at_most): Use widest_int::from.
* tree-ssa-loop-ivcanon.cc (remove_redundant_iv_tests): Adjust for
the widest_int to bound_wide_int changes.
* match.pd (fold_sign_changed_comparison simplification): Use
wide_int::from on wi::to_wide instead of wi::to_widest.
* value-range.h (irange::maybe_resize): Avoid using memcpy on
non-trivially copyable elements.
* value-range.cc (irange_bitmask::dump): Use XALLOCAVEC allocated
buffer for mask or value len above WIDE_INT_PRINT_BUFFER_SIZE.
* fold-const.cc (fold_convert_const_int_from_int, fold_unary_loc):
Use wide_int::from on wi::to_wide instead of wi::to_widest.
* tree-ssa-ccp.cc (bit_value_binop): Zero extend r1max from width
before calling wi::udiv_trunc.
* lto-streamer-out.cc (output_cfg): Adjustments for the widest_int to
bound_wide_int type change in non-static data members.
* lto-streamer-in.cc (input_cfg): Likewise.
(lto_input_tree_1): Use WIDE_INT_MAX_INL_ELTS rather than
WIDE_INT_MAX_ELTS. For length above WIDE_INT_MAX_INL_ELTS use
XALLOCAVEC allocated buffer. Formatting fix.
* data-streamer-in.cc (streamer_read_wide_int,
streamer_read_widest_int): Likewise.
* tree-affine.cc (aff_combination_expand): Use placement new to
construct name_expansion.
(free_name_expansion): Destruct name_expansion.
* gimple-ssa-strength-reduction.cc (struct slsr_cand_d): Change
index type from widest_int to offset_int.
(class incr_info_d): Change incr type from widest_int to offset_int.
(alloc_cand_and_find_basis, backtrace_base_for_ref,
restructure_reference, slsr_process_ref, create_mul_ssa_cand,
create_mul_imm_cand, create_add_ssa_cand, create_add_imm_cand,
slsr_process_add, cand_abs_increment, replace_mult_candidate,
replace_unconditional_candidate, incr_vec_index,
create_add_on_incoming_edge, create_phi_basis_1,
replace_conditional_candidate, record_increment,
record_phi_increments_1, phi_incr_cost_1, phi_incr_cost,
lowest_cost_path, total_savings, ncd_with_phi, ncd_of_cand_and_phis,
nearest_common_dominator_for_cands, insert_initializers,
all_phi_incrs_profitable_1, replace_one_candidate,
replace_profitable_candidates): Use offset_int rather than widest_int
and wi::to_offset rather than wi::to_widest.
* real.cc (real_to_integer): Use WIDE_INT_MAX_INL_ELTS rather than
2 * WIDE_INT_MAX_ELTS and for words above that use XALLOCAVEC
allocated buffer.
* tree-ssa-loop-ivopts.cc (niter_for_exit): Use placement new
to construct tree_niter_desc and destruct it on failure.
(free_tree_niter_desc): Destruct tree_niter_desc if value is non-NULL.
* gengtype.cc (main): Remove widest_int handling.
* graphite-isl-ast-to-gimple.cc (widest_int_from_isl_expr_int): Use
WIDEST_INT_MAX_ELTS instead of WIDE_INT_MAX_ELTS.
* gimple-ssa-warn-alloca.cc (pass_walloca::execute): Use
WIDE_INT_MAX_INL_PRECISION instead of WIDE_INT_MAX_PRECISION and
assert get_len () fits into it.
* value-range-pretty-print.cc (vrange_printer::print_irange_bitmasks):
For mask or value lengths above WIDE_INT_MAX_INL_ELTS use XALLOCAVEC
allocated buffer.
* gimple-ssa-sprintf.cc (adjust_range_for_overflow): Use
wide_int::from on wi::to_wide instead of wi::to_widest.
* omp-general.cc (score_wide_int): New typedef.
(omp_context_compute_score): Use score_wide_int instead of widest_int
and adjust for those changes.
(struct omp_declare_variant_entry): Change score and
score_in_declare_simd_clone non-static data member type from widest_int
to score_wide_int.
(omp_resolve_late_declare_variant, omp_resolve_declare_variant): Use
score_wide_int instead of widest_int and adjust for those changes.
(omp_lto_output_declare_variant_alt): Likewise.
(omp_lto_input_declare_variant_alt): Likewise.
* godump.cc (go_output_typedef): Assert get_len () is smaller than
WIDE_INT_MAX_INL_ELTS.
gcc/c-family/
* c-warn.cc (match_case_to_enum_1): Use wi::to_wide just once instead
of 3 times, assert get_len () is smaller than WIDE_INT_MAX_INL_ELTS.
gcc/testsuite/
* gcc.dg/bitint-38.c: New test.
|
|
The following ups the limit in fold_view_convert_expr to handle
1024-bit vectors as used by GCN and RVV. It also robustifies
the handling in visit_reference_op_load to properly give up when
constants cannot be re-interpreted.
PR tree-optimization/111751
* fold-const.cc (fold_view_convert_expr): Up the buffer size
to 128 bytes.
* tree-ssa-sccvn.cc (visit_reference_op_load): Special case
constants, giving up when re-interpretation to the target type
fails.
|
|
poly_int was written before the switch to C++11 and so couldn't
use explicit default constructors. This led to an awkward split
between poly_int_pod and poly_int. poly_int simply inherited from
poly_int_pod and added constructors, with the argumentless constructor
having an empty body. But inheritance meant that poly_int had to
repeat the assignment operators from poly_int_pod (again, no C++11,
so no "using" to inherit base-class implementations).
All that goes away if we switch to using default constructors.
The main complication is ensuring that braced initialisation still
gives a constexpr, so that static variables can be initialised without
runtime code. The two problems here are:
(1) When initialising a poly_int<N, wide_int> with fewer than N
coefficients, the other coefficients need to be a zero of
the same precision as the explicit coefficients. This was
previously done in a for loop using wi::ints_for<...>::zero,
but C++11 constexpr constructors can't have function bodies.
The patch instead uses a series of delegated initialisers to
fill in the implicit coefficients.
(2) The initialisation in:
void f(int x) {
unsigned int foo {x};
}
produces the warning:
warning: narrowing conversion of 'x' from 'int' to 'unsigned int' [-Wnarrowing]
whereas:
void f(int x) {
unsigned int foo = x;
}
does not. So switching to direct initialisation of the coeffs array
would mean that:
poly_uint64 x = 0;
would trigger a warning for using 0 rather than 0u. That seemed
overly pedantic, so the patch adds explicit casts to the constructor.
The complication is to do that without adding extra code to
wide-int versions. The patch uses a new init_cast type for that.
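A rough, self-contained sketch of the constructor shape (not GCC's actual
poly_int; the class and member names are illustrative): the explicit
static_cast keeps braced initialisation constexpr-friendly while avoiding
the -Wnarrowing warning shown above.
```
template<unsigned int N, typename C>
struct poly_int_sketch
{
  C coeffs[N];

  template<typename... Cs>
  constexpr poly_int_sketch (Cs... cs)
    : coeffs { static_cast<C> (cs)... } {}   // cast done inside the ctor
};

// Initialising from a plain 0 no longer warns about narrowing:
constexpr poly_int_sketch<2, unsigned int> x = 0;
```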
gcc/
* poly-int.h (poly_int_pod): Delete.
(poly_coeff_traits::init_cast): New type.
(poly_int_full, poly_int_hungry, poly_int_fullness): New structures.
(poly_int): Replace constructors that take 1 and 2 coefficients with
a general one that takes an arbitrary number of coefficients.
Delegate initialization to two new private constructors, one of
which uses the coefficients as-is and one of which adds an extra
zero of the appropriate type (and precision, where applicable).
(gt_ggc_mx, gt_pch_nx): Operate on poly_ints rather than poly_int_pods.
* poly-int-types.h (poly_uint16_pod, poly_int64_pod, poly_uint64_pod)
(poly_offset_int_pod, poly_wide_int_pod, poly_widest_int_pod): Delete.
* gengtype.cc (main): Don't register poly_int64_pod.
* calls.cc (initialize_argument_information): Use poly_int rather
than poly_int_pod.
(combine_pending_stack_adjustment_and_call): Likewise.
* config/aarch64/aarch64.cc (pure_scalable_type_info): Likewise.
* data-streamer.h (bp_unpack_poly_value): Likewise.
* dwarf2cfi.cc (struct dw_trace_info): Likewise.
(struct queued_reg_save): Likewise.
* dwarf2out.h (struct dw_cfa_location): Likewise.
* emit-rtl.h (struct incoming_args): Likewise.
(struct rtl_data): Likewise.
* expr.cc (get_bit_range): Likewise.
(get_inner_reference): Likewise.
* expr.h (get_bit_range): Likewise.
* fold-const.cc (split_address_to_core_and_offset): Likewise.
(ptr_difference_const): Likewise.
* fold-const.h (ptr_difference_const): Likewise.
* function.cc (try_fit_stack_local): Likewise.
(instantiate_new_reg): Likewise.
* function.h (struct expr_status): Likewise.
(struct args_size): Likewise.
* genmodes.cc (ZERO_COEFFS): Likewise.
(mode_size_inline): Likewise.
(mode_nunits_inline): Likewise.
(emit_mode_precision): Likewise.
(emit_mode_size): Likewise.
(emit_mode_nunits): Likewise.
* gimple-fold.cc (get_base_constructor): Likewise.
* gimple-ssa-store-merging.cc (struct symbolic_number): Likewise.
* inchash.h (class hash): Likewise.
* ipa-modref-tree.cc (modref_access_node::dump): Likewise.
* ipa-modref.cc (modref_access_analysis::merge_call_side_effects):
Likewise.
* ira-int.h (ira_spilled_reg_stack_slot): Likewise.
* lra-eliminations.cc (self_elim_offsets): Likewise.
* machmode.h (mode_size, mode_precision, mode_nunits): Likewise.
* omp-low.cc (omplow_simd_context): Likewise.
* pretty-print.cc (pp_wide_integer): Likewise.
* pretty-print.h (pp_wide_integer): Likewise.
* reload.cc (struct decomposition): Likewise.
* reload.h (struct reload): Likewise.
* reload1.cc (spill_stack_slot_width): Likewise.
(struct elim_table): Likewise.
(offsets_at): Likewise.
(init_eliminable_invariants): Likewise.
* rtl.h (union rtunion): Likewise.
(poly_int_rtx_p): Likewise.
(strip_offset): Likewise.
(strip_offset_and_add): Likewise.
* rtlanal.cc (strip_offset): Likewise.
* tree-dfa.cc (get_ref_base_and_extent): Likewise.
(get_addr_base_and_unit_offset_1): Likewise.
(get_addr_base_and_unit_offset): Likewise.
* tree-dfa.h (get_ref_base_and_extent): Likewise.
(get_addr_base_and_unit_offset_1): Likewise.
(get_addr_base_and_unit_offset): Likewise.
* tree-ssa-loop-ivopts.cc (struct iv_use): Likewise.
(strip_offset): Likewise.
* tree-ssa-sccvn.h (struct vn_reference_op_struct): Likewise.
* tree.cc (ptrdiff_tree_p): Likewise.
* tree.h (poly_int_tree_p): Likewise.
(ptrdiff_tree_p): Likewise.
(get_inner_reference): Likewise.
gcc/testsuite/
* gcc.dg/plugin/poly-int-tests.h (test_num_coeffs_extra): Use
poly_int rather than poly_int_pod.
|
|
When discussing PR111369 with Andrew Pinski, I've realized that
I haven't added BITINT_TYPE handling to range_check_type. Right now
(unsigned) max + 1 == (unsigned) min for signed _BitInt, so I think we
don't need to do the extra hops for BITINT_TYPE (though possibly we don't
need them for INTEGER_TYPE either in the two's complement world, and we don't
support anything else, though I really don't know if Ada or some other
FEs don't create weird INTEGER_TYPEs).
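For instance, with a (hypothetical) signed _BitInt(8): max is 127 and min is
-128, and (unsigned _BitInt(8)) 127 + 1 == 128 == (unsigned _BitInt(8)) -128,
so the wrap-around identity above indeed holds.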
2023-09-12 Jakub Jelinek <jakub@redhat.com>
* fold-const.cc (range_check_type): Handle BITINT_TYPE like
OFFSET_TYPE.
|
|
This patch adds support for trying to fold `MIN (poly, poly)` to
a constant. Consider the following C code:
```
void foo2 (int* restrict a, int* restrict b, int n)
{
for (int i = 0; i < 3; i += 1)
a[i] += b[i];
}
```
Before this patch:
```
void foo2 (int * restrict a, int * restrict b, int n)
{
vector([4,4]) int vect__7.27;
vector([4,4]) int vect__6.26;
vector([4,4]) int vect__4.23;
unsigned long _32;
<bb 2> [local count: 268435456]:
_32 = MIN_EXPR <3, POLY_INT_CST [4, 4]>;
vect__4.23_20 = .MASK_LEN_LOAD (a_11(D), 32B, { -1, ... }, _32, 0);
vect__6.26_15 = .MASK_LEN_LOAD (b_12(D), 32B, { -1, ... }, _32, 0);
vect__7.27_9 = vect__6.26_15 + vect__4.23_20;
.MASK_LEN_STORE (a_11(D), 32B, { -1, ... }, _32, 0, vect__7.27_9); [tail call]
return;
}
```
After this patch:
```
void foo2 (int * restrict a, int * restrict b, int n)
{
vector([4,4]) int vect__7.27;
vector([4,4]) int vect__6.26;
vector([4,4]) int vect__4.23;
<bb 2> [local count: 268435456]:
vect__4.23_20 = .MASK_LEN_LOAD (a_11(D), 32B, { -1, ... }, 3, 0);
vect__6.26_15 = .MASK_LEN_LOAD (b_12(D), 32B, { -1, ... }, 3, 0);
vect__7.27_9 = vect__6.26_15 + vect__4.23_20;
.MASK_LEN_STORE (a_11(D), 32B, { -1, ... }, 3, 0, vect__7.27_9); [tail call]
return;
}
```
For RISC-V RVV, csrr and branch instructions can be reduced:
Before this patch:
```
foo2:
csrr a4,vlenb
srli a4,a4,2
li a5,3
bleu a5,a4,.L5
mv a5,a4
.L5:
vsetvli zero,a5,e32,m1,ta,ma
...
```
After this patch:
```
foo2:
vsetivli zero,3,e32,m1,ta,ma
...
```
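The folding criterion can be sketched as follows (assuming GCC's poly_int64
and known_le from poly-int.h; can_fold_min_p is an illustrative helper, not
the new can_min_p itself):
```
/* MIN (a, b) can be folded when one operand is known to be <= the other
   for every possible value of the runtime indeterminate.  */
static bool
can_fold_min_p (poly_int64 a, poly_int64 b, poly_int64 *res)
{
  if (known_le (a, b))
    {
      /* E.g. MIN_EXPR <3, 4 + 4x> folds to 3, since 3 <= 4 + 4x
	 for every x >= 0.  */
      *res = a;
      return true;
    }
  if (known_le (b, a))
    {
      *res = b;
      return true;
    }
  return false;
}
```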
gcc/ChangeLog:
* fold-const.cc (can_min_p): New function.
(poly_int_binop): Try to fold MIN_EXPR.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls/div-1.c: Adjust.
* gcc.target/riscv/rvv/autovec/vls/shift-3.c: Adjust.
* gcc.target/riscv/rvv/autovec/fold-min-poly.c: New test.
|
|
[PR102989]
On Thu, Sep 07, 2023 at 10:36:02AM +0200, Thomas Schwinge wrote:
> Minor comment/question: are we doing away with the property that
> 'assert'-like "calls" must not have side effects? Per 'gcc/system.h',
> this is "OK" for 'gcc_assert' for '#if ENABLE_ASSERT_CHECKING' or
> '#elif (GCC_VERSION >= 4005)' -- that is, GCC 4.5, which is always-true,
> thus the "offending" '#else' is never active. However, it's different
> for standard 'assert' and 'gcc_checking_assert', so I'm not sure if
> that's a good property for 'gcc_assert' only? For example, see also
> <https://gcc.gnu.org/PR6906> "warn about asserts with side effects", or
> recent <https://gcc.gnu.org/PR111144>
> "RFE: could -fanalyzer warn about assertions that have side effects?".
You're right, the
#define gcc_assert(EXPR) ((void)(0 && (EXPR)))
fallback definition is incompatible with the way I've used it, so for
--disable-checking builds by non-GCC compilers it would not work properly.
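The shape of the fix is (a sketch; the exact call sites are listed below):
```
  /* Problematic form: with the fallback
       #define gcc_assert(EXPR) ((void)(0 && (EXPR)))
     EXPR is never evaluated, so info stays uninitialized.  */
  struct bitint_info info;
  gcc_assert (targetm.c.bitint_type_info (TYPE_PRECISION (type), &info));

  /* Fixed form: perform the call unconditionally, assert only the result.  */
  bool ok = targetm.c.bitint_type_info (TYPE_PRECISION (type), &info);
  gcc_assert (ok);
```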
2023-09-07 Jakub Jelinek <jakub@redhat.com>
PR c/102989
* expr.cc (expand_expr_real_1): Don't call targetm.c.bitint_type_info
inside gcc_assert, as later code relies on it filling info variable.
* gimple-fold.cc (clear_padding_bitint_needs_padding_p,
clear_padding_type): Likewise.
* varasm.cc (output_constant): Likewise.
* fold-const.cc (native_encode_int, native_interpret_int): Likewise.
* stor-layout.cc (finish_bitfield_representative, layout_type):
Likewise.
* gimple-lower-bitint.cc (bitint_precision_kind): Likewise.
|
|
The following patch introduces the middle-end part of the _BitInt
support, a new BITINT_TYPE, handling it where needed, except the lowering
pass and sanitizer support.
2023-09-06 Jakub Jelinek <jakub@redhat.com>
PR c/102989
* tree.def (BITINT_TYPE): New type.
* tree.h (TREE_CHECK6, TREE_NOT_CHECK6): Define.
(NUMERICAL_TYPE_CHECK, INTEGRAL_TYPE_P): Include
BITINT_TYPE.
(BITINT_TYPE_P): Define.
(CONSTRUCTOR_BITFIELD_P): Return true even for BLKmode bit-fields if
they have BITINT_TYPE type.
(tree_check6, tree_not_check6): New inline functions.
(any_integral_type_check): Include BITINT_TYPE.
(build_bitint_type): Declare.
* tree.cc (tree_code_size, wide_int_to_tree_1, cache_integer_cst,
build_zero_cst, type_hash_canon_hash, type_cache_hasher::equal,
type_hash_canon): Handle BITINT_TYPE.
(bitint_type_cache): New variable.
(build_bitint_type): New function.
(signed_or_unsigned_type_for, verify_type_variant, verify_type):
Handle BITINT_TYPE.
(tree_cc_finalize): Free bitint_type_cache.
* builtins.cc (type_to_class): Handle BITINT_TYPE.
(fold_builtin_unordered_cmp): Handle BITINT_TYPE like INTEGER_TYPE.
* cfgexpand.cc (expand_debug_expr): Punt on BLKmode BITINT_TYPE
INTEGER_CSTs.
* convert.cc (convert_to_pointer_1, convert_to_real_1,
convert_to_complex_1): Handle BITINT_TYPE like INTEGER_TYPE.
(convert_to_integer_1): Likewise. For BITINT_TYPE don't check
GET_MODE_PRECISION (TYPE_MODE (type)).
* doc/generic.texi (BITINT_TYPE): Document.
* doc/tm.texi.in (TARGET_C_BITINT_TYPE_INFO): New.
* doc/tm.texi: Regenerated.
* dwarf2out.cc (base_type_die, is_base_type, modified_type_die,
gen_type_die_with_usage): Handle BITINT_TYPE.
(rtl_for_decl_init): Punt on BLKmode BITINT_TYPE INTEGER_CSTs or
handle those which fit into shwi.
* expr.cc (expand_expr_real_1): Define EXTEND_BITINT macro, reduce
to bitfield precision reads from BITINT_TYPE vars, parameters or
memory locations. Expand large/huge BITINT_TYPE INTEGER_CSTs into
memory.
* fold-const.cc (fold_convert_loc, make_range_step): Handle
BITINT_TYPE.
(extract_muldiv_1): For BITINT_TYPE use TYPE_PRECISION rather than
GET_MODE_SIZE (SCALAR_INT_TYPE_MODE).
(native_encode_int, native_interpret_int, native_interpret_expr):
Handle BITINT_TYPE.
* gimple-expr.cc (useless_type_conversion_p): Make BITINT_TYPE
to some other integral type or vice versa conversions non-useless.
* gimple-fold.cc (gimple_fold_builtin_memset): Punt for BITINT_TYPE.
(clear_padding_unit): Mention in comment that _BitInt types don't need
to fit either.
(clear_padding_bitint_needs_padding_p): New function.
(clear_padding_type_may_have_padding_p): Handle BITINT_TYPE.
(clear_padding_type): Likewise.
* internal-fn.cc (expand_mul_overflow): For unsigned non-mode
precision operands force pos_neg? to 1.
(expand_MULBITINT, expand_DIVMODBITINT, expand_FLOATTOBITINT,
expand_BITINTTOFLOAT): New functions.
* internal-fn.def (MULBITINT, DIVMODBITINT, FLOATTOBITINT,
BITINTTOFLOAT): New internal functions.
* internal-fn.h (expand_MULBITINT, expand_DIVMODBITINT,
expand_FLOATTOBITINT, expand_BITINTTOFLOAT): Declare.
* match.pd (non-equality compare simplifications from fold_binary):
Punt if TYPE_MODE (arg1_type) is BLKmode.
* pretty-print.h (pp_wide_int): Handle printing of large precision
wide_ints which would buffer overflow digit_buffer.
* stor-layout.cc (finish_bitfield_representative): For bit-fields
with BITINT_TYPE, prefer representatives with precisions in
multiple of limb precision.
(layout_type): Handle BITINT_TYPE. Handle COMPLEX_TYPE with BLKmode
element type and assert it is BITINT_TYPE.
* target.def (bitint_type_info): New C target hook.
* target.h (struct bitint_info): New type.
* targhooks.cc (default_bitint_type_info): New function.
* targhooks.h (default_bitint_type_info): Declare.
* tree-pretty-print.cc (dump_generic_node): Handle BITINT_TYPE.
Handle printing large wide_ints which would buffer overflow
digit_buffer.
* tree-ssa-sccvn.cc: Include target.h.
(eliminate_dom_walker::eliminate_stmt): Punt for large/huge
BITINT_TYPE.
* tree-switch-conversion.cc (jump_table_cluster::emit): For more than
64-bit BITINT_TYPE subtract low bound from expression and cast to
64-bit integer type both the controlling expression and case labels.
* typeclass.h (enum type_class): Add bitint_type_class enumerator.
* varasm.cc (output_constant): Handle BITINT_TYPE INTEGER_CSTs.
* vr-values.cc (check_for_binary_op_overflow): Use widest2_int rather
than widest_int.
(simplify_using_ranges::simplify_internal_call_using_ranges): Use
unsigned_type_for rather than build_nonstandard_integer_type.
|
|
In valid_mask_for_fold_vec_perm_cst_p we always set arg_npatterns
to VECTOR_CST_NPATTERNS (arg0), because (q1 & 0) == 0 is always true:
/* Ensure that the stepped sequence always selects from the same
input pattern. */
unsigned arg_npatterns
= ((q1 & 0) == 0) ? VECTOR_CST_NPATTERNS (arg0)
: VECTOR_CST_NPATTERNS (arg1);
resulting in wrong-code issues.
The patch fixes this by changing the condition to (q1 & 1) == 0.
gcc/ChangeLog:
PR tree-optimization/111048
* fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): Set arg_npatterns
correctly.
(fold_vec_perm_cst): Remove workaround and again call
valid_mask_for_fold_vec_perm_cst_p for both VLS and VLA vectors.
(test_fold_vec_perm_cst::test_nunits_min_4): Add test-case.
|
|
The following avoids running into somehow flawed logic in fold_vec_perm
for non-VLA vectors.
PR tree-optimization/111048
* fold-const.cc (fold_vec_perm_cst): Check for non-VLA
vectors first.
* gcc.dg/torture/pr111048.c: New testcase.
|
|
The patch extends fold_vec_perm to fold VLA vector_csts.
For eg:
arg0 = {...}, npatterns = 1, nelts_per_pattern = 3, len = 4 + 4x
arg1 = {...}, npatterns = 1, nelts_per_pattern = 3, len = 4 + 4x
sel = { 0, len, ...} npatterns = 2, nelts_per_pattern = 1, len = 4 + 4x
res = VEC_PERM_EXPR<arg0, arg1, sel>
--> { arg0[0], arg1[0], ... }, npatterns = 2, nelts_per_pattern = 1
Eg 2:
arg0 = {...}, npatterns = 1, nelts_per_pattern = 3, len = 2 + 2x
arg1 = {...}, npatterns = 1, nelts_per_pattern = 3, len = 2 + 2x
sel = {0, 1, 2, ...}, npatterns = 1, nelts_per_pattern = 3, len = 2 + 2x
For this case the index 2 in sel is ambiguous for len 2 + 2x:
if x = 0, the runtime vector length is 2 and sel[i] will choose arg1[0];
if x > 0, the runtime vector length is > 2 and sel[i] will choose arg0[2].
So we return NULL_TREE for this case.
This leads us to defining a constraint that a stepped sequence in sel,
should only select a particular pattern from a particular input vector.
Eg 3:
arg0 = {...} npatterns = 1, nelts_per_pattern = 3, len = 4 + 4x
arg1 = {...} npatterns = 1, nelts_per_pattern = 3, len = 4 + 4x
sel = { len, 0, 2, ... } npatterns = 1, nelts_per_pattern = 3, len = 4 + 4x
sel contains a single pattern with stepped sequence: {0, 2, ...}.
Let a1 = the first element of the stepped part of the sequence, which is 0.
Let esel = the total number of elements in the stepped sequence.
Thus,
esel = len / sel_npatterns
= (4 + 4x) / 1
= 4 + 4x
Let S = step of the sequence, which is 2 in this case.
Let ae = last element of the stepped sequence.
Thus,
ae = a1 + (esel - 2) * S
= 0 + (4 + 4x - 2) * 2
= 4 + 8x
To ensure that we select elements from the same input vector,
a1 /trunc len = ae /trunc len.
Let q1 = a1 /trunc len = 0 / (4 + 4x) = 0
Let qe = ae /trunc len = (4 + 8x) / (4 + 4x) = 1
Since q1 != qe, we cross input vectors, and return NULL_TREE for this case.
However, if sel was:
sel = {len, 0, 1, ...}
The only change in this case is S = 1.
So,
ae = a1 + (esel - 2) * S
= 0 + (4 + 4x - 2) * 1
= 2 + 4x
In this case, a1/len == ae/len == 0, and the stepped sequence chooses all elements
from arg0.
Thus,
res = {arg1[0], arg0[0], arg0[1], ...}
For VLA folding, sel has to conform to constraints imposed in
valid_mask_for_fold_vec_perm_cst_p.
test_fold_vec_perm_cst defines several unit-tests for VLA folding.
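The containment check above can be illustrated with a small standalone
program (plain integers stand in for the poly_int arithmetic; it replays
the sel = { len, 0, 2, ... } example for a few values of x):
```
#include <cstdio>

int main ()
{
  const long S = 2;                       // step of the stepped sequence
  for (long x = 0; x < 4; ++x)
    {
      long len = 4 + 4 * x;               // runtime vector length
      long esel = len / 1;                // len / sel_npatterns, npatterns = 1
      long a1 = 0;                        // first element of the stepped part
      long ae = a1 + (esel - 2) * S;      // last element of the stepped part
      long q1 = a1 / len, qe = ae / len;  // which input vector each selects
      printf ("x=%ld: q1=%ld qe=%ld -> %s\n", x, q1, qe,
	      q1 == qe ? "stays in arg0" : "crosses into arg1");
    }
  return 0;
}
```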
gcc/ChangeLog:
* fold-const.cc (INCLUDE_ALGORITHM): Add Include.
(valid_mask_for_fold_vec_perm_cst_p): New function.
(fold_vec_perm_cst): Likewise.
(fold_vec_perm): Adjust assert and call fold_vec_perm_cst.
(test_fold_vec_perm_cst): New namespace.
(test_fold_vec_perm_cst::build_vec_cst_rand): New function.
(test_fold_vec_perm_cst::validate_res): Likewise.
(test_fold_vec_perm_cst::validate_res_vls): Likewise.
(test_fold_vec_perm_cst::builder_push_elems): Likewise.
(test_fold_vec_perm_cst::test_vnx4si_v4si): Likewise.
(test_fold_vec_perm_cst::test_v4si_vnx4si): Likewise.
(test_fold_vec_perm_cst::test_all_nunits): Likewise.
(test_fold_vec_perm_cst::test_nunits_min_2): Likewise.
(test_fold_vec_perm_cst::test_nunits_min_4): Likewise.
(test_fold_vec_perm_cst::test_nunits_min_8): Likewise.
(test_fold_vec_perm_cst::test_nunits_max_4): Likewise.
(test_fold_vec_perm_cst::is_simple_vla_size): Likewise.
(test_fold_vec_perm_cst::test): Likewise.
(fold_const_cc_tests): Call test_fold_vec_perm_cst::test.
Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>
|
|
The change to only cache constexpr calls that are
reduced_constant_expression_p tripped on bit-cast3.C, which failed that
predicate due to the presence of an empty field in the result of
native_interpret_aggregate, which reduced_constant_expression_p rejects to
avoid confusing output_constructor.
This patch proposes to skip such fields in native_interpret_aggregate, since
they aren't actually involved in the value representation.
gcc/ChangeLog:
* fold-const.cc (native_interpret_aggregate): Skip empty fields.
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_bit_cast): Check that the result of
native_interpret_aggregate doesn't need more evaluation.
|
|
Also change some internal variables and function argument from int to bool.
gcc/ChangeLog:
* fold-const.h (multiple_of_p): Change return type from int to bool.
* fold-const.cc (split_tree): Change negl_p, neg_litp_p,
neg_conp_p and neg_var_p variables to bool.
(const_binop): Change sat_p variable to bool.
(merge_ranges): Change no_overlap variable to bool.
(extract_muldiv_1): Change same_p variable to bool.
(tree_swap_operands_p): Update function body for bool return type.
(fold_truth_andor): Change commutative variable to bool.
(multiple_of_p): Change return type
from int to bool and adjust function body accordingly.
* optabs.h (expand_twoval_unop): Change return type from int to bool.
(expand_twoval_binop): Ditto.
(can_compare_p): Ditto.
(have_add2_insn): Ditto.
(have_addptr3_insn): Ditto.
(have_sub2_insn): Ditto.
(have_insn_for): Ditto.
* optabs.cc (add_equal_note): Ditto.
(widen_operand): Change no_extend argument from int to bool.
(expand_binop): Ditto.
(expand_twoval_unop): Change return type
from int to bool and adjust function body accordingly.
(expand_twoval_binop): Ditto.
(can_compare_p): Ditto.
(have_add2_insn): Ditto.
(have_addptr3_insn): Ditto.
(have_sub2_insn): Ditto.
(have_insn_for): Ditto.
|
|
tree_simple_nonnegative_warnv_p ends up being called on VECTOR_TYPEs
which I think even gets the wrong answer here for tcc_comparison
since vector bools are signed. The following properly guards
that with !VECTOR_TYPE_P.
* fold-const.cc (tree_simple_nonnegative_warnv_p): Guard
the truth_value_p case with !VECTOR_TYPE_P.
|
|
fold_binary tries to transform (double)float1 CMP (double)float2
into float1 CMP float2 but ends up using TYPE_PRECISION on the
argument types. For vector types that compares the number of
lanes, which should always be equal (so it's harmless in that it
doesn't generate wrong code). The following instead properly
uses element_precision.
The same happens in the corresponding match.pd pattern.
* fold-const.cc (fold_binary_loc): Use element_precision
when trying (double)float1 CMP (double)float2 to
float1 CMP float2 simplification.
* match.pd: Likewise.
|
|
The following makes sure we optimize x != 0 using range info
via tree_expr_nonzero_p via match.pd.
PR tree-optimization/110269
* fold-const.cc (fold_binary_loc): Merge x != 0 folding
with tree_expr_nonzero_p ...
* match.pd (cmp (convert? addr@0) integer_zerop): With this
pattern.
* gcc.dg/tree-ssa/pr110269.c: New testcase.
|
|
The following fixes native interpretation of a buffer as boolean
vector with bit-precision elements such as AVX512 vectors. The
check whether the buffer covers the whole vector was broken for
bit-precision elements and the following instead implements it
based on the vector type size.
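The fixed check is essentially of the following shape (a sketch using the
usual tree accessors; the guard in the actual patch may differ slightly):
```
  /* Reject buffers that don't cover the whole vector, based on the
     vector type's size in bytes rather than on per-element precision.  */
  if (len < tree_to_uhwi (TYPE_SIZE_UNIT (type)))
    return NULL_TREE;
```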
PR middle-end/110232
* fold-const.cc (native_interpret_vector): Use TYPE_SIZE_UNIT
to check whether the buffer covers the whole vector.
* gcc.target/i386/pr110232.c: New testcase.
|
|
This patch adds to match.pd the support that was implemented for PR 87913 in phiopt.
It implements it by adding support to minmax_from_comparison for the check.
It uses the range information, if available, which allows producing a MIN/MAX
expression when comparing against the lower/upper bound of the range instead
of the lower/upper bound of the type.
minmax-20.c is the new testcase which tests the ranges part.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* fold-const.cc (minmax_from_comparison): Add support for NE_EXPR.
* match.pd ((cond (cmp (convert1? x) c1) (convert2? x) c2) pattern):
Add ne as a possible cmp.
((a CMP b) ? minmax<a, c> : minmax<b, c> pattern): Likewise.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/minmax-22.c: New test.
|
|
The encoder for CONSTRUCTORs assumes that all bit-fields (DECL_BIT_FIELD)
have integral types, but that's not the case in Ada where they may have
pretty much any type, resulting in a wrong encoding for them.
gcc/
* fold-const.cc (native_encode_initializer) <CONSTRUCTOR>: Apply the
specific treatment for bit-fields only if they have an integral type
and filter out non-integral bit-fields that do not start and end on
a byte boundary.
gcc/testsuite/
* gnat.dg/opt101.adb: New test.
* gnat.dg/opt101_pkg.ads: New helper.
|
|
This is part 1 of an N-part patch set that will change the expansion
of `(A & C) != 0` from using trees to direct expansion, so that later
on we can do some cost analysis.
Since the only user of fold_single_bit_test is now
expand, move it there.
gcc/ChangeLog:
* fold-const.cc (fold_single_bit_test_into_sign_test): Move to
expr.cc.
(fold_single_bit_test): Likewise.
* expr.cc (fold_single_bit_test_into_sign_test): Move from fold-const.cc
(fold_single_bit_test): Likewise and make static.
* fold-const.h (fold_single_bit_test): Remove declaration.
|
|
gcc/ChangeLog:
* alias.cc (ref_all_alias_ptr_type_p): Use _P() defines from tree.h.
* attribs.cc (diag_attr_exclusions): Ditto.
(decl_attributes): Ditto.
(build_type_attribute_qual_variant): Ditto.
* builtins.cc (fold_builtin_carg): Ditto.
(fold_builtin_next_arg): Ditto.
(do_mpc_arg2): Ditto.
* cfgexpand.cc (expand_return): Ditto.
* cgraph.h (decl_in_symtab_p): Ditto.
(symtab_node::get_create): Ditto.
* dwarf2out.cc (base_type_die): Ditto.
(implicit_ptr_descriptor): Ditto.
(gen_array_type_die): Ditto.
(gen_type_die_with_usage): Ditto.
(optimize_location_into_implicit_ptr): Ditto.
* expr.cc (do_store_flag): Ditto.
* fold-const.cc (negate_expr_p): Ditto.
(fold_negate_expr_1): Ditto.
(fold_convert_const): Ditto.
(fold_convert_loc): Ditto.
(constant_boolean_node): Ditto.
(fold_binary_op_with_conditional_arg): Ditto.
(build_fold_addr_expr_with_type_loc): Ditto.
(fold_comparison): Ditto.
(fold_checksum_tree): Ditto.
(tree_unary_nonnegative_warnv_p): Ditto.
(integer_valued_real_unary_p): Ditto.
(fold_read_from_constant_string): Ditto.
* gcc-rich-location.cc (maybe_range_label_for_tree_type_mismatch::get_text): Ditto.
* gimple-expr.cc (useless_type_conversion_p): Ditto.
(is_gimple_reg): Ditto.
(is_gimple_asm_val): Ditto.
(mark_addressable): Ditto.
* gimple-expr.h (is_gimple_variable): Ditto.
(virtual_operand_p): Ditto.
* gimple-ssa-warn-access.cc (pass_waccess::check_dangling_stores): Ditto.
* gimplify.cc (gimplify_bind_expr): Ditto.
(gimplify_return_expr): Ditto.
(gimple_add_padding_init_for_auto_var): Ditto.
(gimplify_addr_expr): Ditto.
(omp_add_variable): Ditto.
(omp_notice_variable): Ditto.
(omp_get_base_pointer): Ditto.
(omp_strip_components_and_deref): Ditto.
(omp_strip_indirections): Ditto.
(omp_accumulate_sibling_list): Ditto.
(omp_build_struct_sibling_lists): Ditto.
(gimplify_adjust_omp_clauses_1): Ditto.
(gimplify_adjust_omp_clauses): Ditto.
(gimplify_omp_for): Ditto.
(goa_lhs_expr_p): Ditto.
(gimplify_one_sizepos): Ditto.
* graphite-scop-detection.cc (scop_detection::graphite_can_represent_scev): Ditto.
* ipa-devirt.cc (odr_types_equivalent_p): Ditto.
* ipa-prop.cc (ipa_set_jf_constant): Ditto.
(propagate_controlled_uses): Ditto.
* ipa-sra.cc (type_prevails_p): Ditto.
(scan_expr_access): Ditto.
* optabs-tree.cc (optab_for_tree_code): Ditto.
* toplev.cc (wrapup_global_declaration_1): Ditto.
* trans-mem.cc (transaction_invariant_address_p): Ditto.
* tree-cfg.cc (verify_types_in_gimple_reference): Ditto.
(verify_gimple_comparison): Ditto.
(verify_gimple_assign_binary): Ditto.
(verify_gimple_assign_single): Ditto.
* tree-complex.cc (get_component_ssa_name): Ditto.
* tree-emutls.cc (lower_emutls_2): Ditto.
* tree-inline.cc (copy_tree_body_r): Ditto.
(estimate_move_cost): Ditto.
(copy_decl_for_dup_finish): Ditto.
* tree-nested.cc (convert_nonlocal_omp_clauses): Ditto.
(note_nonlocal_vla_type): Ditto.
(convert_local_omp_clauses): Ditto.
(remap_vla_decls): Ditto.
(fixup_vla_decls): Ditto.
* tree-parloops.cc (loop_has_vector_phi_nodes): Ditto.
* tree-pretty-print.cc (print_declaration): Ditto.
(print_call_name): Ditto.
* tree-sra.cc (compare_access_positions): Ditto.
* tree-ssa-alias.cc (compare_type_sizes): Ditto.
* tree-ssa-ccp.cc (get_default_value): Ditto.
* tree-ssa-coalesce.cc (populate_coalesce_list_for_outofssa): Ditto.
* tree-ssa-dom.cc (reduce_vector_comparison_to_scalar_comparison): Ditto.
* tree-ssa-forwprop.cc (can_propagate_from): Ditto.
* tree-ssa-propagate.cc (may_propagate_copy): Ditto.
* tree-ssa-sccvn.cc (fully_constant_vn_reference_p): Ditto.
* tree-ssa-sink.cc (statement_sink_location): Ditto.
* tree-ssa-structalias.cc (type_must_have_pointers): Ditto.
* tree-ssa-ter.cc (find_replaceable_in_bb): Ditto.
* tree-ssa-uninit.cc (warn_uninit): Ditto.
* tree-ssa.cc (maybe_rewrite_mem_ref_base): Ditto.
(non_rewritable_mem_ref_base): Ditto.
* tree-streamer-in.cc (lto_input_ts_type_non_common_tree_pointers): Ditto.
* tree-streamer-out.cc (write_ts_type_non_common_tree_pointers): Ditto.
* tree-vect-generic.cc (do_binop): Ditto.
(do_cond): Ditto.
* tree-vect-stmts.cc (vect_init_vector): Ditto.
* tree-vector-builder.h (tree_vector_builder::note_representative): Ditto.
* tree.cc (sign_mask_for): Ditto.
(verify_type_variant): Ditto.
(gimple_canonical_types_compatible_p): Ditto.
(verify_type): Ditto.
* ubsan.cc (get_ubsan_type_info_for_type): Ditto.
* var-tracking.cc (prepare_call_arguments): Ditto.
(vt_add_function_parameters): Ditto.
* varasm.cc (decode_addr_const): Ditto.
|
|
This converts the irange API to use wide_ints exclusively, along with
its users.
This patch will slow down VRP, as there will be more useless
wide_int to tree conversions. However, this slowdown is only
temporary, as a follow-up patch will convert the internal
representation of iranges to wide_ints for a net overall gain
in performance.
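As a rough illustration of the API style after this change (a sketch, not an
exact hunk from the patch):
```
  /* Ranges are now built from a type plus wide_ints instead of from trees.  */
  int_range<2> vr (TREE_TYPE (op), wi::to_wide (lo), wi::to_wide (hi));
  bool known_in_range = vr.contains_p (wi::to_wide (val));
```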
gcc/ChangeLog:
* fold-const.cc (expr_not_equal_to): Convert to irange wide_int API.
* gimple-fold.cc (size_must_be_zero_p): Same.
* gimple-loop-versioning.cc
(loop_versioning::prune_loop_conditions): Same.
* gimple-range-edge.cc (gcond_edge_range): Same.
(gimple_outgoing_range::calc_switch_ranges): Same.
* gimple-range-fold.cc (adjust_imagpart_expr): Same.
(adjust_realpart_expr): Same.
(fold_using_range::range_of_address): Same.
(fold_using_range::relation_fold_and_or): Same.
* gimple-range-gori.cc (gori_compute::gori_compute): Same.
(range_is_either_true_or_false): Same.
* gimple-range-op.cc (cfn_toupper_tolower::get_letter_range): Same.
(cfn_clz::fold_range): Same.
(cfn_ctz::fold_range): Same.
* gimple-range-tests.cc (class test_expr_eval): Same.
* gimple-ssa-warn-alloca.cc (alloca_call_type): Same.
* ipa-cp.cc (ipa_value_range_from_jfunc): Same.
(propagate_vr_across_jump_function): Same.
(decide_whether_version_node): Same.
* ipa-prop.cc (ipa_get_value_range): Same.
* ipa-prop.h (ipa_range_set_and_normalize): Same.
* range-op.cc (get_shift_range): Same.
(value_range_from_overflowed_bounds): Same.
(value_range_with_overflow): Same.
(create_possibly_reversed_range): Same.
(equal_op1_op2_relation): Same.
(not_equal_op1_op2_relation): Same.
(lt_op1_op2_relation): Same.
(le_op1_op2_relation): Same.
(gt_op1_op2_relation): Same.
(ge_op1_op2_relation): Same.
(operator_mult::op1_range): Same.
(operator_exact_divide::op1_range): Same.
(operator_lshift::op1_range): Same.
(operator_rshift::op1_range): Same.
(operator_cast::op1_range): Same.
(operator_logical_and::fold_range): Same.
(set_nonzero_range_from_mask): Same.
(operator_bitwise_or::op1_range): Same.
(operator_bitwise_xor::op1_range): Same.
(operator_addr_expr::fold_range): Same.
(pointer_plus_operator::wi_fold): Same.
(pointer_or_operator::op1_range): Same.
(INT): Same.
(UINT): Same.
(INT16): Same.
(UINT16): Same.
(SCHAR): Same.
(UCHAR): Same.
(range_op_cast_tests): Same.
(range_op_lshift_tests): Same.
(range_op_rshift_tests): Same.
(range_op_bitwise_and_tests): Same.
(range_relational_tests): Same.
* range.cc (range_zero): Same.
(range_nonzero): Same.
* range.h (range_true): Same.
(range_false): Same.
(range_true_and_false): Same.
* tree-data-ref.cc (split_constant_offset_1): Same.
* tree-ssa-loop-ch.cc (entry_loop_condition_is_static): Same.
* tree-ssa-loop-unswitch.cc (struct unswitch_predicate): Same.
(find_unswitching_predicates_for_bb): Same.
* tree-ssa-phiopt.cc (value_replacement): Same.
* tree-ssa-threadbackward.cc
(back_threader::find_taken_edge_cond): Same.
* tree-ssanames.cc (ssa_name_has_boolean_range): Same.
* tree-vrp.cc (find_case_label_range): Same.
* value-query.cc (range_query::get_tree_range): Same.
* value-range.cc (irange::set_nonnegative): Same.
(frange::contains_p): Same.
(frange::singleton_p): Same.
(frange::internal_singleton_p): Same.
(irange::irange_set): Same.
(irange::irange_set_1bit_anti_range): Same.
(irange::irange_set_anti_range): Same.
(irange::set): Same.
(irange::operator==): Same.
(irange::singleton_p): Same.
(irange::contains_p): Same.
(irange::set_range_from_nonzero_bits): Same.
(DEFINE_INT_RANGE_INSTANCE): Same.
(INT): Same.
(UINT): Same.
(SCHAR): Same.
(UINT128): Same.
(UCHAR): Same.
(range): New.
(tree_range): New.
(range_int): New.
(range_uint): New.
(range_uint128): New.
(range_uchar): New.
(range_char): New.
(build_range3): Convert to irange wide_int API.
(range_tests_irange3): Same.
(range_tests_int_range_max): Same.
(range_tests_strict_enum): Same.
(range_tests_misc): Same.
(range_tests_nonzero_bits): Same.
(range_tests_nan): Same.
(range_tests_signed_zeros): Same.
* value-range.h (Value_Range::Value_Range): Same.
(irange::set): Same.
(irange::nonzero_p): Same.
(irange::contains_p): Same.
(range_includes_zero_p): Same.
(irange::set_nonzero): Same.
(irange::set_zero): Same.
(contains_zero_p): Same.
(frange::contains_p): Same.
* vr-values.cc
(simplify_using_ranges::op_with_boolean_value_range_p): Same.
(bounds_of_var_in_loop): Same.
(simplify_using_ranges::legacy_fold_cond_overflow): Same.
|
|
This factors out some of the code for the min/max detection
in match.pd into a function so it can be reused in other
places. This is mainly used to detect the conversions
of >= to >, which cause the integer values to be changed by
one.
Changes since v1:
* factor out the checks for INTEGER_CSTs so it is more obvious.
OK? Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* match.pd: Factor out the deciding the min/max from
the "(cond (cmp (convert1? x) c1) (convert2? x) c2)"
pattern to ...
* fold-const.cc (minmax_from_comparison): this new function.
* fold-const.h (minmax_from_comparison): New prototype.
|
|
Here we're failing to detect a signed overflow with -O because match.pd,
since r8-1516, transforms
c = (a + 1) - (int) (short int) b;
into
c = (int) ((unsigned int) a + 4294946117);
wrongly eliding the overflow. This kind of problem is usually
avoided by using TYPE_OVERFLOW_SANITIZED in the appropriate place.
The first match.pd hunk in the patch fixes it. I've constructed
a testcase for each of the surrounding cases as well. Then I
noticed that fold_binary_loc/associate has the same problem, so I've
added a TYPE_OVERFLOW_SANITIZED there as well (it may be too coarse,
sorry). Then I found yet another problem, but instead of fixing it
now I've opened 109134. I could probably go on and find a dozen more.
PR sanitizer/109107
gcc/ChangeLog:
* fold-const.cc (fold_binary_loc): Use TYPE_OVERFLOW_SANITIZED
when associating.
* match.pd: Use TYPE_OVERFLOW_SANITIZED.
gcc/testsuite/ChangeLog:
* c-c++-common/ubsan/pr109107-1.c: New test.
* c-c++-common/ubsan/pr109107-2.c: New test.
* c-c++-common/ubsan/pr109107-3.c: New test.
* c-c++-common/ubsan/pr109107-4.c: New test.
|
|
non-lvalue
The problem here is that after r0-92187-g2ec5deb5c3146c, maybe_lvalue_p would
return false for compound literals, which causes non_lvalue_loc not
to wrap the expression with a NON_LVALUE_EXPR, unlike before when it
returned true, as it returned true for all language-specific tree codes.
This fixes that oversight and fixes the testcase to have the cast as
a non-lvalue.
Committed to the trunk as obvious after a bootstrap/test on x86_64-linux-gnu.
PR c/84900
gcc/ChangeLog:
* fold-const.cc (maybe_lvalue_p): Treat COMPOUND_LITERAL_EXPR
as an lvalue.
gcc/testsuite/ChangeLog:
* gcc.dg/compound-literal-cast-lvalue-1.c: New test.
|
|
The following plugs one place in extract_muldiv where it should avoid
folding when sanitizing overflow.
PR middle-end/108995
* fold-const.cc (extract_muldiv_1): Avoid folding
(CST * b) / CST2 when sanitizing overflow and we rely on
overflow being undefined.
* gcc.dg/ubsan/pr108995.c: New testcase.
|
|
verification [PR108934]
In the following testcase we try to std::bit_cast a (pair of) integral
value(s) which has some non-zero bits in the place of x86 long double
(for 64-bit 16 byte type with 10 bytes actually loaded/stored by hw,
for 32-bit 12 byte) and starting with my PR104522 change we reject that
as native_interpret_expr fails on it. The PR104522 change extends what
has been done before for MODE_COMPOSITE_P (but those don't have any padding
bits) to all floating point types, because e.g. the exact x86 long double
has various bit combinations we don't support, like
pseudo-(denormals,infinities,NaNs) or unnormals. The HW handles some of
those as exceptional cases and others similarly to the non-pseudo ones.
But for the padding bits it actually doesn't load/store those bits at all,
it loads/stores 10 bytes. So, I think we should exempt the padding bits
from the reverse comparison (the native_encode_expr bits for the padding
will be all zeros), which the following patch does. For bit_cast it is
similar to e.g. ignoring padding bits if the destination is a structure
which has padding bits in there.
The change reverts auto-init-4.c to how it behaved before the
PR105259 change, where some more VCEs can now be done.
2023-03-02 Jakub Jelinek <jakub@redhat.com>
PR c++/108934
* fold-const.cc (native_interpret_expr) <case REAL_CST>: Before memcmp
comparison copy the bytes from ptr to a temporary buffer and clear
the padding bits in there.
* gcc.target/i386/auto-init-4.c: Revert PR105259 change.
* g++.target/i386/pr108934.C: New test.
|
|
[PR108256]
We shouldn't narrow multiplications originally done in signed types,
because the original multiplication might overflow but the narrowed
one will be done in unsigned arithmetic and will never overflow.
2023-01-04 Jakub Jelinek <jakub@redhat.com>
PR sanitizer/108256
* convert.cc (do_narrow): Punt for MULT_EXPR if original
type doesn't wrap around and -fsanitize=signed-integer-overflow
is on.
* fold-const.cc (fold_unary_loc) <CASE_CONVERT>: Likewise.
* c-c++-common/ubsan/pr108256.c: New test.
|
|
In function fold_convert_const_real_from_real, when the modes of the
two types involved in the fp conversion are the same, we can simply
treat it as a copy, rebuilding with exactly the same TREE_REAL_CST and
the target type. It is more efficient and helps to avoid possible
unexpected signalling-bit clearing as in [1].
[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608533.html
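The shortcut is essentially (a sketch using the usual tree accessors; the
guard in the actual patch may differ slightly):
```
  /* Same mode on both sides: just rebuild the REAL_CST with the new type,
     bypassing the format conversion entirely.  */
  if (TYPE_MODE (type) == TYPE_MODE (TREE_TYPE (arg1)))
    return build_real (type, TREE_REAL_CST (arg1));
```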
gcc/ChangeLog:
* fold-const.cc (fold_convert_const_real_from_real): Treat floating
point conversion to a type with the same mode as a copy instead of a
normal convertFormat.
|
|
Unlike protected_set_expr_location, this variant can return
a different tree.
gcc/ChangeLog:
* fold-const.cc (fold_convert_loc): Check return value of
protected_set_expr_location_unshare.
|
|
If the DECL_VALUE_EXPR of a VAR_DECL has EXPR_LOCATION set, then any use of
that variable looks like it has that location, which leads to the debugger
jumping back and forth for both lambdas and structured bindings.
Rather than fix all the uses, it seems simplest to remove any EXPR_LOCATION
when setting DECL_VALUE_EXPR. So the cp/ hunks aren't necessary, but they
avoid the need to unshare to remove the location.
PR c++/84471
PR c++/107504
gcc/cp/ChangeLog:
* coroutines.cc (transform_local_var_uses): Don't
specify a location for DECL_VALUE_EXPR.
* decl.cc (cp_finish_decomp): Likewise.
gcc/ChangeLog:
* fold-const.cc (protected_set_expr_location_unshare): Not static.
* tree.h: Declare it.
* tree.cc (decl_value_expr_insert): Use it.
include/ChangeLog:
* ansidecl.h (ATTRIBUTE_WARN_UNUSED_RESULT): Add __.
gcc/testsuite/ChangeLog:
* g++.dg/tree-ssa/value-expr1.C: New test.
* g++.dg/tree-ssa/value-expr2.C: New test.
* g++.dg/analyzer/pr93212.C: Move warning.
|
|
This reverts the part that substitutes from the definition of an
SSA name to the capture, thus ADDR_EXPR@0 eventually yielding
&y_1->a[i_2] instead of _3. That's because I didn't think of
how to deal with substituting @0 in the result pattern. So
the following re-instantiates the SSA def CONSTRUCTOR handling
and in the ADDR_EXPR helpers used by match.pd handles SSA names
defined to ADDR_EXPRs transparently.
* genmatch.cc (dt_simplify::gen): Revert last change.
* match.pd: Revert simplification of CONSTRUCTOR leaf handling.
(&x cmp SSA_NAME): Handle ADDR_EXPR in SSA defs.
* fold-const.cc (split_address_to_core_and_offset): Handle
ADDR_EXPRs in SSA defs.
(address_compare): Likewise.
|
|
This consists of 3 changes which stronger type checking has indicated
are incorrect.
gcc/
* fold-const.cc (fold_unary_loc): Check TREE_TYPE of node.
(tree_invalid_nonnegative_warnv_p): Likewise.
gcc/c-family/
* c-attribs.cc (handle_deprecated_attribute): Use type when
using TYPE_NAME.
|
|
This removes all uses of ASSERT_EXPR except the internal one in ipa-*.
gcc/ChangeLog:
* doc/gimple.texi: Remove ASSERT_EXPR references.
* fold-const.cc (tree_expr_nonzero_warnv_p): Same.
(fold_binary_loc): Same.
(tree_expr_nonnegative_warnv_p): Same.
* gimple-array-bounds.cc (get_base_decl): Same.
* gimple-pretty-print.cc (dump_unary_rhs): Same.
* gimple.cc (get_gimple_rhs_num_ops): Same.
* pointer-query.cc (handle_ssa_name): Same.
* tree-cfg.cc (verify_gimple_assign_single): Same.
* tree-pretty-print.cc (dump_generic_node): Same.
* tree-scalar-evolution.cc (scev_dfs::follow_ssa_edge_expr): Same.
(interpret_rhs_expr): Same.
* tree-ssa-operands.cc (operands_scanner::get_expr_operands): Same.
* tree-ssa-propagate.cc
(substitute_and_fold_dom_walker::before_dom_children): Same.
* tree-ssa-threadedge.cc: Same.
* tree-vrp.cc (overflow_comparison_p): Same.
* tree.def (ASSERT_EXPR): Add note.
* tree.h (ASSERT_EXPR_VAR): Remove.
(ASSERT_EXPR_COND): Remove.
* vr-values.cc (simplify_using_ranges::vrp_visit_cond_stmt):
Remove comment.
|
|
There is a thinko in a recent improvement made to operand_equal_p where
the code just looks at operand 2 of COMPONENT_REF, if it is present, to
compare addresses. That's wrong because operand 2 contains the number of
DECL_OFFSET_ALIGN-bit-sized words, so when DECL_OFFSET_ALIGN > 8 not all
the bytes are included and some of them are in DECL_FIELD_BIT_OFFSET; see
get_inner_reference for how the offset is computed.
In other words, you would need to compare operand 2 and DECL_OFFSET_ALIGN
and DECL_FIELD_BIT_OFFSET in this situation, but I'm not sure this is worth
the hassle in practice so the fix just removes this alternate handling.
gcc/
* fold-const.cc (operand_compare::operand_equal_p) <COMPONENT_REF>:
Do not take into account operand 2.
(operand_compare::hash_operand) <COMPONENT_REF>: Likewise.
gcc/testsuite/
* gnat.dg/opt99.adb: New test.
* gnat.dg/opt99_pkg1.ads, gnat.dg/opt99_pkg1.adb: New helper.
* gnat.dg/opt99_pkg2.ads: Likewise.
|
|
The following patch adds some complex builtins which have a libm
implementation in glibc 2.26 and later on various arches.
It is needed for libstdc++ _Float128 support when long double is not
IEEE quad.
2022-10-31 Jakub Jelinek <jakub@redhat.com>
* builtin-types.def (BT_COMPLEX_FLOAT16, BT_COMPLEX_FLOAT32,
BT_COMPLEX_FLOAT64, BT_COMPLEX_FLOAT128, BT_COMPLEX_FLOAT32X,
BT_COMPLEX_FLOAT64X, BT_COMPLEX_FLOAT128X,
BT_FN_COMPLEX_FLOAT16_COMPLEX_FLOAT16,
BT_FN_COMPLEX_FLOAT32_COMPLEX_FLOAT32,
BT_FN_COMPLEX_FLOAT64_COMPLEX_FLOAT64,
BT_FN_COMPLEX_FLOAT128_COMPLEX_FLOAT128,
BT_FN_COMPLEX_FLOAT32X_COMPLEX_FLOAT32X,
BT_FN_COMPLEX_FLOAT64X_COMPLEX_FLOAT64X,
BT_FN_COMPLEX_FLOAT128X_COMPLEX_FLOAT128X,
BT_FN_FLOAT16_COMPLEX_FLOAT16, BT_FN_FLOAT32_COMPLEX_FLOAT32,
BT_FN_FLOAT64_COMPLEX_FLOAT64, BT_FN_FLOAT128_COMPLEX_FLOAT128,
BT_FN_FLOAT32X_COMPLEX_FLOAT32X, BT_FN_FLOAT64X_COMPLEX_FLOAT64X,
BT_FN_FLOAT128X_COMPLEX_FLOAT128X,
BT_FN_COMPLEX_FLOAT16_COMPLEX_FLOAT16_COMPLEX_FLOAT16,
BT_FN_COMPLEX_FLOAT32_COMPLEX_FLOAT32_COMPLEX_FLOAT32,
BT_FN_COMPLEX_FLOAT64_COMPLEX_FLOAT64_COMPLEX_FLOAT64,
BT_FN_COMPLEX_FLOAT128_COMPLEX_FLOAT128_COMPLEX_FLOAT128,
BT_FN_COMPLEX_FLOAT32X_COMPLEX_FLOAT32X_COMPLEX_FLOAT32X,
BT_FN_COMPLEX_FLOAT64X_COMPLEX_FLOAT64X_COMPLEX_FLOAT64X,
BT_FN_COMPLEX_FLOAT128X_COMPLEX_FLOAT128X_COMPLEX_FLOAT128X): New.
* builtins.def (CABS_TYPE, CACOSH_TYPE, CARG_TYPE, CASINH_TYPE,
CPOW_TYPE, CPROJ_TYPE): Define and undefine later.
(BUILT_IN_CABS, BUILT_IN_CACOSH, BUILT_IN_CACOS, BUILT_IN_CARG,
BUILT_IN_CASINH, BUILT_IN_CASIN, BUILT_IN_CATANH, BUILT_IN_CATAN,
BUILT_IN_CCOSH, BUILT_IN_CCOS, BUILT_IN_CEXP, BUILT_IN_CLOG,
BUILT_IN_CPOW, BUILT_IN_CPROJ, BUILT_IN_CSINH, BUILT_IN_CSIN,
BUILT_IN_CSQRT, BUILT_IN_CTANH, BUILT_IN_CTAN): Add
DEF_EXT_LIB_FLOATN_NX_BUILTINS.
* fold-const-call.cc (fold_const_call_sc, fold_const_call_cc,
fold_const_call_ccc): Add various CASE_CFN_*_FN: cases when
CASE_CFN_* is present.
* gimple-ssa-backprop.cc (backprop::process_builtin_call_use):
Likewise.
* builtins.cc (expand_builtin, fold_builtin_1): Likewise.
* fold-const.cc (negate_mathfn_p, tree_expr_finite_p,
tree_expr_maybe_signaling_nan_p, tree_expr_maybe_nan_p,
tree_expr_maybe_real_minus_zero_p, tree_call_nonnegative_warnv_p):
Likewise.
|
|
When working on libstdc++ extended float support in <cmath>, I found that
we need various builtins for the _Float{16,32,64,128,32x,64x,128x} types.
Glibc 2.26 and later provides the underlying libm routines (except for
_Float16 and _Float128x for the time being) and in libstdc++ I think we
need at least the _Float128 builtins on x86_64, i?86, powerpc64le and ia64
(when long double is IEEE quad, we can handle it by using __builtin_*l
instead), because without the builtins the overloads couldn't be constexpr
(say when it would declare the *f128 extern "C" routines itself and call
them).
The testcase covers just types of those builtins and their constant
folding, so doesn't need actual libm support.
2022-10-31 Jakub Jelinek <jakub@redhat.com>
* builtin-types.def (BT_FLOAT16_PTR, BT_FLOAT32_PTR, BT_FLOAT64_PTR,
BT_FLOAT128_PTR, BT_FLOAT32X_PTR, BT_FLOAT64X_PTR, BT_FLOAT128X_PTR):
New DEF_PRIMITIVE_TYPE.
(BT_FN_INT_FLOAT16, BT_FN_INT_FLOAT32, BT_FN_INT_FLOAT64,
BT_FN_INT_FLOAT128, BT_FN_INT_FLOAT32X, BT_FN_INT_FLOAT64X,
BT_FN_INT_FLOAT128X, BT_FN_LONG_FLOAT16, BT_FN_LONG_FLOAT32,
BT_FN_LONG_FLOAT64, BT_FN_LONG_FLOAT128, BT_FN_LONG_FLOAT32X,
BT_FN_LONG_FLOAT64X, BT_FN_LONG_FLOAT128X, BT_FN_LONGLONG_FLOAT16,
BT_FN_LONGLONG_FLOAT32, BT_FN_LONGLONG_FLOAT64,
BT_FN_LONGLONG_FLOAT128, BT_FN_LONGLONG_FLOAT32X,
BT_FN_LONGLONG_FLOAT64X, BT_FN_LONGLONG_FLOAT128X): New
DEF_FUNCTION_TYPE_1.
(BT_FN_FLOAT16_FLOAT16_FLOAT16PTR, BT_FN_FLOAT32_FLOAT32_FLOAT32PTR,
BT_FN_FLOAT64_FLOAT64_FLOAT64PTR, BT_FN_FLOAT128_FLOAT128_FLOAT128PTR,
BT_FN_FLOAT32X_FLOAT32X_FLOAT32XPTR,
BT_FN_FLOAT64X_FLOAT64X_FLOAT64XPTR,
BT_FN_FLOAT128X_FLOAT128X_FLOAT128XPTR, BT_FN_FLOAT16_FLOAT16_INT,
BT_FN_FLOAT32_FLOAT32_INT, BT_FN_FLOAT64_FLOAT64_INT,
BT_FN_FLOAT128_FLOAT128_INT, BT_FN_FLOAT32X_FLOAT32X_INT,
BT_FN_FLOAT64X_FLOAT64X_INT, BT_FN_FLOAT128X_FLOAT128X_INT,
BT_FN_FLOAT16_FLOAT16_INTPTR, BT_FN_FLOAT32_FLOAT32_INTPTR,
BT_FN_FLOAT64_FLOAT64_INTPTR, BT_FN_FLOAT128_FLOAT128_INTPTR,
BT_FN_FLOAT32X_FLOAT32X_INTPTR, BT_FN_FLOAT64X_FLOAT64X_INTPTR,
BT_FN_FLOAT128X_FLOAT128X_INTPTR, BT_FN_FLOAT16_FLOAT16_LONG,
BT_FN_FLOAT32_FLOAT32_LONG, BT_FN_FLOAT64_FLOAT64_LONG,
BT_FN_FLOAT128_FLOAT128_LONG, BT_FN_FLOAT32X_FLOAT32X_LONG,
BT_FN_FLOAT64X_FLOAT64X_LONG, BT_FN_FLOAT128X_FLOAT128X_LONG): New
DEF_FUNCTION_TYPE_2.
(BT_FN_FLOAT16_FLOAT16_FLOAT16_INTPTR,
BT_FN_FLOAT32_FLOAT32_FLOAT32_INTPTR,
BT_FN_FLOAT64_FLOAT64_FLOAT64_INTPTR,
BT_FN_FLOAT128_FLOAT128_FLOAT128_INTPTR,
BT_FN_FLOAT32X_FLOAT32X_FLOAT32X_INTPTR,
BT_FN_FLOAT64X_FLOAT64X_FLOAT64X_INTPTR,
BT_FN_FLOAT128X_FLOAT128X_FLOAT128X_INTPTR): New DEF_FUNCTION_TYPE_3.
* builtins.def (ACOSH_TYPE, ATAN2_TYPE, ATANH_TYPE, COSH_TYPE,
FDIM_TYPE, HUGE_VAL_TYPE, HYPOT_TYPE, ILOGB_TYPE, LDEXP_TYPE,
LGAMMA_TYPE, LLRINT_TYPE, LOG10_TYPE, LRINT_TYPE, MODF_TYPE,
NEXTAFTER_TYPE, REMQUO_TYPE, SCALBLN_TYPE, SCALBN_TYPE, SINH_TYPE):
Define and undefine later.
(FMIN_TYPE, SQRT_TYPE): Undefine at a later line.
(INF_TYPE): Define at a later line.
(BUILT_IN_ACOSH, BUILT_IN_ACOS, BUILT_IN_ASINH, BUILT_IN_ASIN,
BUILT_IN_ATAN2, BUILT_IN_ATANH, BUILT_IN_ATAN, BUILT_IN_CBRT,
BUILT_IN_COSH, BUILT_IN_COS, BUILT_IN_ERFC, BUILT_IN_ERF,
BUILT_IN_EXP2, BUILT_IN_EXP, BUILT_IN_EXPM1, BUILT_IN_FDIM,
BUILT_IN_FMOD, BUILT_IN_FREXP, BUILT_IN_HYPOT, BUILT_IN_ILOGB,
BUILT_IN_LDEXP, BUILT_IN_LGAMMA, BUILT_IN_LLRINT, BUILT_IN_LLROUND,
BUILT_IN_LOG10, BUILT_IN_LOG1P, BUILT_IN_LOG2, BUILT_IN_LOGB,
BUILT_IN_LOG, BUILT_IN_LRINT, BUILT_IN_LROUND, BUILT_IN_MODF,
BUILT_IN_NEXTAFTER, BUILT_IN_POW, BUILT_IN_REMAINDER, BUILT_IN_REMQUO,
BUILT_IN_SCALBLN, BUILT_IN_SCALBN, BUILT_IN_SINH, BUILT_IN_SIN,
BUILT_IN_TANH, BUILT_IN_TAN, BUILT_IN_TGAMMA): Add
DEF_EXT_LIB_FLOATN_NX_BUILTINS.
(BUILT_IN_HUGE_VAL): Use HUGE_VAL_TYPE instead of INF_TYPE in
DEF_GCC_FLOATN_NX_BUILTINS.
* fold-const-call.cc (fold_const_call_ss): Add various CASE_CFN_*_FN:
cases when CASE_CFN_* is present.
(fold_const_call_sss): Likewise.
* builtins.cc (mathfn_built_in_2): Use CASE_MATHFN_FLOATN instead of
CASE_MATHFN for various builtins in SEQ_OF_CASE_MATHFN macro.
(builtin_with_linkage_p): Add CASE_FLT_FN_FLOATN_NX for various
builtins next to CASE_FLT_FN.
* fold-const.cc (tree_call_nonnegative_warnv_p): Add CASE_CFN_*_FN:
next to CASE_CFN_*: for various builtins.
* tree-call-cdce.cc (can_test_argument_range): Add
CASE_FLT_FN_FLOATN_NX next to CASE_FLT_FN for various builtins.
(edom_only_function): Likewise.
* gcc.dg/torture/floatn-builtin.h: Add tests for newly added builtins.
|
|
The following patch implements the C++23 P1774R8 - Portable assumptions
paper by introducing support for the [[assume (cond)]]; attribute for C++.
In addition to that the patch adds [[gnu::assume (cond)]]; and
__attribute__((assume (cond))); support to both C and C++.
As described in C++23, the attribute argument is a conditional-expression
rather than the usual assignment-expression for attribute arguments,
the condition is contextually converted to bool (for C, truthvalue conversion
is done on it) and is never evaluated at runtime.
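A hedged usage sketch (the condition and function are invented, not taken
from the patch):

int
f (int x)
{
  __attribute__ ((assume (x > 0 && x % 16 == 0)));  /* GNU spelling, C and C++ */
  /* [[assume (x > 0 && x % 16 == 0)]];                C++23 spelling */
  return x / 16;  /* the optimizer may rely on the assumption; the condition
                     itself is never evaluated at run time */
}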
For C++ constant expression evaluation, I only check the simplest conditions
for undefined behavior, because otherwise I'd need to undo changes to
*ctx->global which happened during the evaluation (but I believe the spec
allows that and we can further improve later).
The patch uses a new internal function, .ASSUME, to hold the condition
in the FEs. At gimplification time, if the condition is simple/without
side-effects, it is gimplified as if (cond) ; else __builtin_unreachable ();
and otherwise for now dropped on the floor. The intent is to incrementally
outline the conditions into separate artificial functions and use
.ASSUME further to tell the ranger and perhaps other optimization passes
about the assumptions, as detailed in the PR.
When implementing it, I found that the assume entry hasn't been added to
https://eel.is/c++draft/cpp.cond#6
Jonathan said he'll file an NB comment about it; this patch assumes it
has been added into the table as 202207L when the paper was voted in.
With the attributes for both C and C++, I'd say we don't need to add
__builtin_assume with a similar purpose, especially when __builtin_assume
in LLVM is just weird. It is strange for side-effects in a function call's
argument not to be evaluated, and LLVM in that case (annoyingly) warns
and ignores the side-effects (but then doesn't do anything with them);
if there are no side-effects, it works like our
if (!cond) __builtin_unreachable ();
2022-10-06 Jakub Jelinek <jakub@redhat.com>
PR c++/106654
gcc/
* internal-fn.def (ASSUME): New internal function.
* internal-fn.h (expand_ASSUME): Declare.
* internal-fn.cc (expand_ASSUME): Define.
* gimplify.cc (gimplify_call_expr): Gimplify IFN_ASSUME.
* fold-const.h (simple_condition_p): Declare.
* fold-const.cc (simple_operand_p_2): Rename to ...
(simple_condition_p): ... this. Remove forward declaration.
No longer static. Adjust function comment and fix a typo in it.
Adjust recursive call.
(simple_operand_p): Adjust function comment.
(fold_truth_andor): Adjust simple_operand_p_2 callers to call
simple_condition_p.
* doc/extend.texi: Document assume attribute. Move fallthrough
attribute example to its section.
gcc/c-family/
* c-attribs.cc (handle_assume_attribute): New function.
(c_common_attribute_table): Add entry for assume attribute.
* c-lex.cc (c_common_has_attribute): Handle
__have_cpp_attribute (assume).
gcc/c/
* c-parser.cc (handle_assume_attribute): New function.
(c_parser_declaration_or_fndef): Handle assume attribute.
(c_parser_attribute_arguments): Add assume_attr argument,
if true, parse first argument as conditional expression.
(c_parser_gnu_attribute, c_parser_std_attribute): Adjust
c_parser_attribute_arguments callers.
(c_parser_statement_after_labels) <case RID_ATTRIBUTE>: Handle
assume attribute.
gcc/cp/
* cp-tree.h (process_stmt_assume_attribute): Implement C++23
P1774R8 - Portable assumptions. Declare.
(diagnose_failing_condition): Declare.
(find_failing_clause): Likewise.
* parser.cc (assume_attr): New enumerator.
(cp_parser_parenthesized_expression_list): Handle assume_attr.
Remove identifier variable, for id_attr push the identifier into
expression_list right away instead of inserting it before all the
others at the end.
(cp_parser_conditional_expression): New function.
(cp_parser_constant_expression): Use it.
(cp_parser_statement): Handle assume attribute.
(cp_parser_expression_statement): Likewise.
(cp_parser_gnu_attribute_list): Use assume_attr for assume
attribute.
(cp_parser_std_attribute): Likewise. Handle standard assume
attribute like gnu::assume.
* cp-gimplify.cc (process_stmt_assume_attribute): New function.
* constexpr.cc: Include fold-const.h.
(find_failing_clause_r, find_failing_clause): New functions,
moved from semantics.cc with ctx argument added and if non-NULL,
call cxx_eval_constant_expression rather than fold_non_dependent_expr.
(cxx_eval_internal_function): Handle IFN_ASSUME.
(potential_constant_expression_1): Likewise.
* pt.cc (tsubst_copy_and_build): Likewise.
* semantics.cc (diagnose_failing_condition): New function.
(find_failing_clause_r, find_failing_clause): Moved to constexpr.cc.
(finish_static_assert): Use it. Add auto_diagnostic_group.
gcc/testsuite/
* gcc.dg/attr-assume-1.c: New test.
* gcc.dg/attr-assume-2.c: New test.
* gcc.dg/attr-assume-3.c: New test.
* g++.dg/cpp2a/feat-cxx2a.C: Add colon to C++20 features
comment, add C++20 attributes comment and move C++20
new features after the attributes before them.
* g++.dg/cpp23/feat-cxx2b.C: Likewise. Test
__has_cpp_attribute(assume).
* g++.dg/cpp23/attr-assume1.C: New test.
* g++.dg/cpp23/attr-assume2.C: New test.
* g++.dg/cpp23/attr-assume3.C: New test.
* g++.dg/cpp23/attr-assume4.C: New test.
|
|
Following my middle-end patch for PR tree-optimization/94026, I'd promised
Jeff Law that I'd clean up the dead code in fold-const.cc now that these
optimizations are handled in match.pd. Alas, I discovered things aren't
quite that simple, as the transformations I'd added avoided cases where
C2 overlapped with the new bits introduced by the shift, but the original
code handled any value of C2 provided that it had a single bit set (under
the condition that C3 was always zero).
This patch upgrades the transformations supported by match.pd to cover
any values of C2 and C3, provided that C1 is a valid bit shift constant,
for all three shift types (logical right, arithmetic right and left).
This then makes the code in fold-const.cc fully redundant, and adds
support for some new (corner) cases not previously handled. If the
constant C1 is valid for the type's precision, the shift is now always
eliminated (with C2 and C3 possibly updated to test the sign bit).
Interestingly, the fold-const.cc code that I'm now deleting was originally
added by me back in 2006 to resolve PR middle-end/21137. I've confirmed
that those testcase(s) remain resolved with this patch (and I'll close
21137 in Bugzilla). This patch also implements most (but not all) of the
examples mentioned in PR tree-optimization/98954, for which I have some
follow-up patches.
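A hedged example of the sort of rewrite now performed (constants invented
for illustration):

int
f (unsigned int x)
{
  /* ((x >> 3) & 12) == 8 tests bits 5 and 6 of x; shifting the mask and the
     compared constant left by 3 gives the shift-free equivalent
     (x & 96) == 64.  */
  return ((x >> 3) & 12) == 8;
}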
2022-08-09 Roger Sayle <roger@nextmovesoftware.com>
Richard Biener <rguenther@suse.de>
gcc/ChangeLog
PR middle-end/21137
PR tree-optimization/98954
* fold-const.cc (fold_binary_loc): Remove optimizations to
optimize ((X >> C1) & C2) ==/!= 0.
* match.pd (cmp (bit_and (lshift @0 @1) @2) @3): Remove wi::ctz
check, and handle all values of INTEGER_CSTs @2 and @3.
(cmp (bit_and (rshift @0 @1) @2) @3): Likewise, remove wi::clz
checks, and handle all values of INTEGER_CSTs @2 and @3.
gcc/testsuite/ChangeLog
PR middle-end/21137
PR tree-optimization/98954
* gcc.dg/fold-eqandshift-4.c: New test case.
|
|
The fold_to_nonsharp_ineq_using_bound folding ends up creating invalidly
typed IL, which confuses later foldings. The following fixes that.
2022-06-20 Richard Biener <rguenther@suse.de>
PR middle-end/106027
* fold-const.cc (fold_to_nonsharp_ineq_using_bound): Use the
type of the prevailing comparison for the new comparison type.
(fold_binary_loc): Use proper types for the A < X && A + 1 > Y
to A < X && A >= Y folding.
* gcc.dg/pr106027.c: New testcase.
|
|
The following testcase triggers a false positive UBSan binding a reference
to null diagnostics.
In the FE we instrument conversions from pointer to reference type
to diagnose at runtime if the operand of such a conversion is 0.
The problem is that GENERIC folding folds the conversion of
((const struct Bar *) ((const struct Foo *) this)->data) + (sizetype) range_check (x)
to const struct Bar & by converting the first operand of the
POINTER_PLUS_EXPR to that type instead. But that changes when the
-fsanitize=null binding-to-reference runtime check occurs. Without the
optimization, it is invoked on the result of the POINTER_PLUS_EXPR, and as
the range_check call throws, it never triggers in the testcase.
With the optimization, it checks whether this->data is NULL, and it is.
The following patch avoids that optimization during GENERIC folding when
-fsanitize=null is enabled and it is a cast from non-REFERENCE_TYPE to
REFERENCE_TYPE.
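A hedged, self-contained reduction of the scenario (names, members and
bodies are invented; see the new testcase below for the real one):

struct Bar { int b; };
struct Foo
{
  const Bar *data = nullptr;
  unsigned long range_check (unsigned long i) const { throw i; }
  const Bar &get (unsigned long i) const
  {
    /* -fsanitize=null instruments the reference binding.  Unfolded, the
       check runs on the POINTER_PLUS_EXPR result, which is never reached
       because range_check throws; folded, data itself would be checked
       and a false positive emitted.  */
    return data[range_check (i)];
  }
};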
2022-05-27 Jakub Jelinek <jakub@redhat.com>
PR sanitizer/105729
* fold-const.cc (fold_unary_loc): Don't optimize (X &) ((Y *) z + w)
to (X &) z + w if -fsanitize=null during GENERIC folding.
* g++.dg/ubsan/pr105729.C: New test.
|
|
The following makes the main gimple_build API take a
gimple_stmt_iterator, a flag for whether to insert before or after, and
an iterator update argument, to make it more convenient to use
in certain situations (see the tree-vect-generic.cc hunks for
an example). It also makes the case we insert into the IL
somewhat distinct from inserting into a standalone sequence in
that it simplifies built expressions the same way as inserting
and calling fold_stmt (..., follow_all_ssa_edges) would. When
inserting into a standalone sequence we restrict simplification
to defs within the currently building sequence.
The patch only amends the tree_code gimple_build API; I will
follow up with converting the rest as well. The patch got larger
than intended because the template forwarders now use gsi_last
which introduces a dependency on gimple-iterator.h requiring
mass #include re-org across the tree. There are two frontend-specific
files that include gimple-fold.h just for some padding-clearing
stuff - I've removed the include and instead moved
the declarations to fold-const.h (but not the implementations).
Otherwise I'd have to include half of the middle-end headers in
those files which I didn't much like.
2022-05-12 Richard Biener <rguenther@suse.de>
gcc/cp/
* constexpr.cc: Remove gimple-fold.h include.
gcc/c-family/
* c-omp.cc: Remove gimple-fold.h include.
gcc/analyzer/
* supergraph.cc: Re-order gimple-fold.h include.
gcc/
* gimple-fold.cc (gimple_build): Adjust for new
main API.
* gimple-fold.h (gimple_build): New main APIs with
iterator, insert direction and iterator update.
(gimple_build): New forwarder template.
(clear_padding_type_may_have_padding_p): Remove.
(clear_type_padding_in_mask): Likewise.
(arith_overflowed_p): Likewise.
* fold-const.h (clear_padding_type_may_have_padding_p): Declare.
(clear_type_padding_in_mask): Likewise.
(arith_overflowed_p): Likewise.
* tree-vect-generic.cc (gimplify_build3): Use main gimple_build API.
(gimplify_build2): Likewise.
(gimplify_build1): Likewise.
* ubsan.cc (ubsan_expand_ptr_ifn): Likewise, avoid extra
compare stmt.
* gengtype.cc (open_base_files): Re-order includes.
* builtins.cc: Re-order gimple-fold.h include.
* calls.cc: Likewise.
* cgraphbuild.cc: Likewise.
* cgraphunit.cc: Likewise.
* config/rs6000/rs6000-builtin.cc: Likewise.
* config/rs6000/rs6000-call.cc: Likewise.
* config/rs6000/rs6000.cc: Likewise.
* config/s390/s390.cc: Likewise.
* expr.cc: Likewise.
* fold-const.cc: Likewise.
* function-tests.cc: Likewise.
* gimple-match-head.cc: Likewise.
* gimple-range-fold.cc: Likewise.
* gimple-ssa-evrp-analyze.cc: Likewise.
* gimple-ssa-evrp.cc: Likewise.
* gimple-ssa-sprintf.cc: Likewise.
* gimple-ssa-warn-access.cc: Likewise.
* gimplify.cc: Likewise.
* graphite-isl-ast-to-gimple.cc: Likewise.
* ipa-cp.cc: Likewise.
* ipa-devirt.cc: Likewise.
* ipa-prop.cc: Likewise.
* omp-low.cc: Likewise.
* pointer-query.cc: Likewise.
* range-op.cc: Likewise.
* tree-cfg.cc: Likewise.
* tree-if-conv.cc: Likewise.
* tree-inline.cc: Likewise.
* tree-object-size.cc: Likewise.
* tree-ssa-ccp.cc: Likewise.
* tree-ssa-dom.cc: Likewise.
* tree-ssa-forwprop.cc: Likewise.
* tree-ssa-ifcombine.cc: Likewise.
* tree-ssa-loop-ivcanon.cc: Likewise.
* tree-ssa-math-opts.cc: Likewise.
* tree-ssa-pre.cc: Likewise.
* tree-ssa-propagate.cc: Likewise.
* tree-ssa-reassoc.cc: Likewise.
* tree-ssa-sccvn.cc: Likewise.
* tree-ssa-strlen.cc: Likewise.
* tree-ssa.cc: Likewise.
* value-pointer-equiv.cc: Likewise.
* vr-values.cc: Likewise.
gcc/testsuite/
* gcc.dg/plugin/diagnostic_group_plugin.c: Reorder or remove
gimple-fold.h include.
* gcc.dg/plugin/diagnostic_plugin_show_trees.c:
Likewise.
* gcc.dg/plugin/diagnostic_plugin_test_inlining.c:
Likewise.
* gcc.dg/plugin/diagnostic_plugin_test_metadata.c:
Likewise.
* gcc.dg/plugin/diagnostic_plugin_test_paths.c:
Likewise.
* gcc.dg/plugin/diagnostic_plugin_test_show_locus.c:
Likewise.
* gcc.dg/plugin/diagnostic_plugin_test_string_literals.c: Likewise.
* gcc.dg/plugin/diagnostic_plugin_test_tree_expression_range.c:
Likewise.
* gcc.dg/plugin/finish_unit_plugin.c: Likewise.
* gcc.dg/plugin/ggcplug.c: Likewise.
* gcc.dg/plugin/must_tail_call_plugin.c: Likewise.
* gcc.dg/plugin/one_time_plugin.c: Likewise.
* gcc.dg/plugin/selfassign.c: Likewise.
* gcc.dg/plugin/start_unit_plugin.c: Likewise.
* g++.dg/plugin/selfassign.c: Likewise.
|
|
The following reverts the original PR105140 fix and instead applies
the additional fold_convert constraint for VECTOR_TYPE
conversions to fold_convertible_p as well. I did not try sanitizing
all of this at this point.
2022-04-13 Richard Biener <rguenther@suse.de>
PR tree-optimization/105250
* fold-const.cc (fold_convertible_p): Revert
r12-7979-geaaf77dd85c333, instead check for size equality
of the vector types involved.
* gcc.dg/pr105250.c: New testcase.
|
|
The following testcase is miscompiled, because fold_truth_andor
incorrectly folds
(unsigned) foo () >= 0U && 1
into
foo () >= 0
For the unsigned comparison (which is useless in this case,
as >= 0U is always true, but hasn't been folded yet), the previous
make_range_step call derives exp (unsigned) foo () and a +[0U, -]
range for it. Next we process the NOP_EXPR. We have special code
for unsigned to signed casts; we already punt earlier if low or high
aren't representable in arg0_type or if it is a narrowing conversion.
For the signed to unsigned casts, I think if high is specified we
are still fine, as we punt for non-representable values in arg0_type,
n_high is then still representable and so was smaller or equal to
signed maximum and either low is not present (equivalent to 0U), or
low must be smaller or equal to high and so for unsigned exp
+[low, high] the signed exp +[n_low, n_high] will be correct.
Similarly, if both low and high aren't specified (always true or
always false), it is ok too.
But if we have for unsigned exp +[low, -] or -[low, -], using
+[n_low, -] or -[n_high, -] is incorrect. Because low is smaller than
or equal to the signed maximum and high is unspecified (i.e. the unsigned
maximum), when signed that range is a union of +[n_low, -] and
+[-, -1], which is equivalent to -[0, n_low-1], unless low
is 0, in which case we can treat it as [-, -].
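A hedged, self-contained illustration of the wrong fold (not the original
testcase; names invented):

int foo (void) { return -1; }

int
g (void)
{
  /* (unsigned) foo () >= 0U is always true, so g must return 1.  The bogus
     fold to foo () >= 0 would make it return 0, because the signed range
     corresponding to the unsigned +[0U, -] is the full range, not +[0, -].  */
  return (unsigned) foo () >= 0U && 1;
}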
2022-04-08 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/105189
* fold-const.cc (make_range_step): Fix up handling of
(unsigned) x +[low, -] ranges for signed x if low fits into
typeof (x).
* g++.dg/torture/pr105189.C: New test.
|
|
fold_convertible_p expects an operand and a type to convert to,
but recurses with two vector component types. Fixed by allowing
types instead of an operand as well.
2022-04-04 Richard Biener <rguenther@suse.de>
PR middle-end/105140
* fold-const.cc (fold_convertible_p): Allow a TYPE_P arg.
* gcc.dg/pr105140.c: New testcase.
|
|
As mentioned in the PR, operand_equal_p already contains some hacks so that
it can be called on pre-instantiation C++ trees from templates,
but the recent change to compare DECL_FIELD_OFFSET in the COMPONENT_REF
case broke this. Many such COMPONENT_REFs are already punted on earlier
because they have NULL TREE_TYPE, but in this case the code knows what
type they have but still uses an IDENTIFIER_NODE as second operand
of COMPONENT_REF (I think SCOPE_REF is something that could be used too).
The following patch looks at those DECL_FIELD_*OFFSET fields only if
both field[01] args are FIELD_DECLs and otherwise keeps it to the
earlier OP_SAME (1) check that guards this whole block.
2022-03-24 Jakub Jelinek <jakub@redhat.com>
PR c++/105035
* fold-const.cc (operand_equal_p) <case COMPONENT_REF>: If either
field0 or field1 is not a FIELD_DECL, return false.
* g++.dg/warn/Wduplicated-cond2.C: New test.
|
|
This patch addresses a code quality regression in GCC 12 by implementing
some constant folding/simplification transformations for REDUC_PLUS_EXPR
in match.pd. The motivating example is gcc.dg/vect/pr89440.c which with
-O2 -ffast-math (with vectorization now enabled) gets optimized to:
float f (float x)
{
vector(4) float vect_x_14.11;
vector(4) float _2;
float _32;
_2 = {x_9(D), 0.0, 0.0, 0.0};
vect_x_14.11_29 = _2 + { 1.0e+1, 2.6e+1, 4.2e+1, 5.8e+1 };
_32 = .REDUC_PLUS (vect_x_14.11_29); [tail call]
return _32;
}
With these proposed new transformations, we can simplify the
above code even further.
float f (float x)
{
float _32;
_32 = x_9(D) + 1.36e+2;
return _32;
}
[which happens to match what we'd produce with -fno-tree-vectorize,
and with GCC 11].
2022-02-22 Roger Sayle <roger@nextmovesoftware.com>
Richard Biener <rguenther@suse.de>
gcc/ChangeLog
* fold-const.cc (ctor_single_nonzero_element): New function to
return the single non-zero element of a (vector) constructor.
* fold-const.h (ctor_single_nonzero_element): Prototype here.
* match.pd (reduc (constructor@0)): Simplify reductions of a
constructor containing a single non-zero element.
(reduc (@0 op VECTOR_CST) -> (reduc @0) op CONST): Simplify
reductions of vector operations of the same operator with
constant vector operands.
gcc/testsuite/ChangeLog
* gcc.dg/fold-reduc-1.c: New test case.
|
|
[PR104522]
For IBM double double I've added in PR95450 and PR99648 verification that
when we at the tree/GIMPLE or RTL level interpret target bytes as a REAL_CST
or CONST_DOUBLE constant, we try to encode it back to target bytes and
verify it is the same.
This is because our real.c support isn't able to represent all valid values
of IBM double double, which has variable precision.
In PR104522, it has been noted that we have a similar problem with the
Intel/Motorola extended XFmode formats: our internal representation isn't
able to record pseudo denormals, pseudo infinities, pseudo NaNs and unnormal
values.
So, the following patch is an attempt to extend that verification to all
floats.
Unfortunately, it wasn't that straightforward, because the
__builtin_clear_padding code, specifically for XFmode long doubles, needs to
discover which bits are padding and does that by interpreting memory of
all 1s. That is actually a valid supported value, a qNaN with negative
sign with all mantissa bits set, but the verification also includes the
padding bits (exactly what __builtin_clear_padding wants to figure out)
and so fails the comparison check, and we ICE.
The patch fixes that case by moving that verification from
native_interpret_real to its caller, so that clear_padding_type can
call native_interpret_real and avoid that extra check.
With this, the only thing that regresses in the testsuite is
+FAIL: gcc.target/i386/auto-init-4.c scan-assembler-times long\\t-16843010 5
because it decides to use a pattern that has non-zero bits in the padding
bits of the long double, so the simplify-rtx.cc change prevents folding
a SUBREG into a constant. We emit (the testcase is -O0 but we emit worse
code at all opt levels) something like:
movabsq $-72340172838076674, %rax
movabsq $-72340172838076674, %rdx
movq %rax, -48(%rbp)
movq %rdx, -40(%rbp)
fldt -48(%rbp)
fstpt -32(%rbp)
instead of
fldt .LC2(%rip)
fstpt -32(%rbp)
...
.LC2:
.long -16843010
.long -16843010
.long 65278
.long 0
Note, neither of those sequences actually stores the padding bits, fstpt
simply doesn't touch them.
For vars with clear_padding_real_needs_padding_p types that are allocated
to memory at expansion time, I'd say it would be much better to do the stores
using integral modes rather than XFmode, so do that:
movabsq $-72340172838076674, %rax
movq %rax, -32(%rbp)
movq %rax, -24(%rbp)
directly. That is the only way to ensure the padding bits are initialized
(or to expand __builtin_clear_padding, but then you initialize the value
bits and padding bits separately).
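For reference, a hedged sketch of how the padding bytes of such a variable
can be cleared explicitly from user code (assuming an x86 long double):

void
init_padding (long double *p)
{
  *p = 1.0L;                      /* fstpt leaves the padding bytes alone */
  __builtin_clear_padding (p);    /* zero the padding bytes; the value
                                     bytes are left untouched */
}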
2022-02-15 Jakub Jelinek <jakub@redhat.com>
PR middle-end/104522
* fold-const.h (native_interpret_real): Declare.
* fold-const.cc (native_interpret_real): No longer static. Don't
perform MODE_COMPOSITE_P verification here.
(native_interpret_expr) <case REAL_TYPE>: But perform it here instead
for all modes.
* gimple-fold.cc (clear_padding_type): Call native_interpret_real
instead of native_interpret_expr.
* simplify-rtx.cc (simplify_immed_subreg): Perform the native_encode_rtx
and comparison verification for all FLOAT_MODE_P modes, not just
MODE_COMPOSITE_P.
* gcc.dg/pr104522.c: New test.
|