|
|
[PR112788]
As PR112788 shows, on rs6000 with -mabi=ieeelongdouble type _Float128
has a different type precision (128) from that (127) of type long
double, but they actually have the same underlying mode, so they have
the same precision, as the mode indicates the same real type format
ieee_quad_format.
It's not sensible to have two such types which have the same mode but
different type precisions; a fix attempt was posted at [1].
As discussed there, there are some historical reasons and
practical issues. Considering we have passed stage 1 and it also affected
the build as reported, this patch tries to work around it temporarily.
I thought about introducing a hookpod but that seems a bit overkill;
assuming scalar float types with the same mode have the same
precision looks sensible.
[1] https://inbox.sourceware.org/gcc-patches/718677e7-614d-7977-312d-05a75e1fd5b4@linux.ibm.com/
PR tree-optimization/112788
gcc/ChangeLog:
* value-range.h (range_compatible_p): Workaround same type mode but
different type precision issue for rs6000 scalar float types
_Float128 and long double.
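The workaround described above can be sketched roughly as follows. This is a minimal illustration, not the GCC code: `toy_type` and `toy_range_compatible_p` are hypothetical stand-ins for GCC's `tree` type nodes and `range_compatible_p`, and the integer `mode` field merely models a machine mode.

```cpp
#include <cassert>

// Hypothetical stand-in for a GCC type node; not the real `tree` API.
struct toy_type {
  bool is_unsigned;    // sign of the type
  unsigned precision;  // type precision in bits
  bool scalar_float;   // true for scalar floating-point types
  int mode;            // stand-in for the machine mode
};

// Two types are range-compatible when sign and precision agree or,
// as the temporary workaround, when both are scalar floats sharing a mode.
inline bool toy_range_compatible_p(const toy_type &a, const toy_type &b) {
  if (a.is_unsigned != b.is_unsigned)
    return false;
  if (a.precision == b.precision)
    return true;
  return a.scalar_float && b.scalar_float && a.mode == b.mode;
}
```

With this rule, a 128-bit and a 127-bit scalar float type that share a mode compare as compatible, while types that differ in both precision and mode still do not.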
|
|
Instead of directly checking type precision, check_operands_p should
invoke range_compatible_p to keep the range checking centralized.
* gimple-range-fold.h (range_compatible_p): Relocate.
* value-range.h (range_compatible_p): Here.
* range-op-mixed.h (operand_equal::operand_check_p): Call
range_compatible_p rather than comparing precision.
(operand_not_equal::operand_check_p): Ditto.
(operand_not_lt::operand_check_p): Ditto.
(operand_not_le::operand_check_p): Ditto.
(operand_not_gt::operand_check_p): Ditto.
(operand_not_ge::operand_check_p): Ditto.
(operand_plus::operand_check_p): Ditto.
(operand_abs::operand_check_p): Ditto.
(operand_minus::operand_check_p): Ditto.
(operand_negate::operand_check_p): Ditto.
(operand_mult::operand_check_p): Ditto.
(operand_bitwise_not::operand_check_p): Ditto.
(operand_bitwise_xor::operand_check_p): Ditto.
(operand_bitwise_and::operand_check_p): Ditto.
(operand_bitwise_or::operand_check_p): Ditto.
(operand_min::operand_check_p): Ditto.
(operand_max::operand_check_p): Ditto.
* range-op.cc (operand_lshift::operand_check_p): Ditto.
(operand_rshift::operand_check_p): Ditto.
(operand_logical_and::operand_check_p): Ditto.
(operand_logical_or::operand_check_p): Ditto.
(operand_logical_not::operand_check_p): Ditto.
|
|
Check to see if a comparison to a constant can be determined to always
be not-equal based on the bitmask.
PR tree-optimization/111766
gcc/
* range-op.cc (operator_equal::fold_range): Check constants
against the bitmask.
(operator_not_equal::fold_range): Ditto.
* value-range.h (irange_bitmask::member_p): New.
gcc/testsuite/
* gcc.dg/pr111766.c: New.
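A rough sketch of the membership test behind this change, using 64-bit integers instead of wide_int. The convention that a 1 bit in the mask means "unknown" is an assumption of this sketch, and `toy_bitmask` and `always_not_equal` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Toy value/mask pair. Assumption: a 1 bit in `mask` means "unknown";
// where `mask` is 0 the bit is known and equals the same bit of `value`.
struct toy_bitmask {
  std::uint64_t value;
  std::uint64_t mask;

  // True if constant `c` agrees with every known bit.
  bool member_p(std::uint64_t c) const {
    return ((c ^ value) & ~mask) == 0;
  }
};

// A comparison against `c` can be folded to "never equal" when `c`
// cannot be a member of the bitmask.
inline bool always_not_equal(const toy_bitmask &bm, std::uint64_t c) {
  return !bm.member_p(c);
}
```

For example, if the low two bits are known to be zero, a comparison against 5 can never be equal, while a comparison against 4 remains undecided.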
|
|
During the intersection operation, it can be helpful to remove any
low-end ranges when the bitmask has trailing zeros. This prevents
obviously incorrect ranges from appearing without requiring a bitmask
check.
* value-range.cc (irange_bitmask::adjust_range): New.
(irange::intersect_bitmask): Call adjust_range.
* value-range.h (irange_bitmask::adjust_range): New prototype.
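The idea can be illustrated with a small sketch over 64-bit unsigned sub-ranges. It assumes the bitmask says that the low k bits are known to be zero, so every member is a multiple of 2^k and nothing in [1, 2^k - 1] can occur. `toy_range` and `drop_low_end` are hypothetical names, not the GCC interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Inclusive [lo, hi] sub-range.
using toy_range = std::pair<std::uint64_t, std::uint64_t>;

// Assumption: the low `k` bits are known to be zero.  Removes the
// impossible values [1, 2^k - 1] from a sorted list of disjoint sub-ranges.
inline std::vector<toy_range>
drop_low_end(const std::vector<toy_range> &ranges, unsigned k) {
  assert(k < 64);
  if (k == 0)
    return ranges;  // no trailing zeros, nothing to trim
  const std::uint64_t first_valid = std::uint64_t{1} << k;
  std::vector<toy_range> out;
  for (const toy_range &r : ranges) {
    if (r.first == 0) {
      out.push_back(toy_range(0, 0));  // zero itself stays possible
      if (r.second >= first_valid)
        out.push_back(toy_range(first_valid, r.second));
    } else if (r.second >= first_valid) {
      out.push_back(toy_range(std::max(r.first, first_valid), r.second));
    }
    // Otherwise the whole sub-range lies in [1, 2^k - 1] and is dropped.
  }
  return out;
}
```

The sketch only trims the low end; it does not try to round the remaining bounds to multiples of 2^k.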
|
|
A common pattern is to append a range to an existing range via union.
This optimizes that process.
* value-range.cc (irange::union_append): New.
(irange::union_): Call union_append when appropriate.
* value-range.h (irange::union_append): New prototype.
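A minimal sketch of such a fast path, assuming sub-ranges are kept sorted and disjoint. `toy_range` and `try_union_append` are hypothetical names; when the function returns false the caller would fall back to a general union.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Inclusive [lo, hi] sub-range.
using toy_range = std::pair<std::int64_t, std::int64_t>;

// Assumption: `ranges` is sorted and disjoint, and r.first <= r.second.
// If `r` starts beyond the end of the last sub-range, append it (or extend
// the last sub-range when adjacent) instead of running a general merge.
inline bool try_union_append(std::vector<toy_range> &ranges,
                             const toy_range &r) {
  assert(r.first <= r.second);
  if (ranges.empty()) {
    ranges.push_back(r);
    return true;
  }
  toy_range &last = ranges.back();
  if (r.first <= last.second)
    return false;  // overlaps existing ranges; needs the general union
  if (r.first == last.second + 1)
    last.second = r.second;  // adjacent: extend in place
  else
    ranges.push_back(r);
  return true;
}
```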
|
|
32640 bits [PR102989]
As mentioned in the _BitInt support thread, _BitInt(N) is currently limited
by the wide_int/widest_int maximum precision limitation, which is, depending
on target, 191, 319, 575 or 703 bits (one less than WIDE_INT_MAX_PRECISION).
That is a fairly low limit for _BitInt, especially on the targets with the 191
bit limitation.
The following patch bumps that limit to 16319 bits on all arches (which support
_BitInt at all), which is the limit imposed by INTEGER_CST representation
(unsigned char members holding number of HOST_WIDE_INT limbs).
In order to achieve that, wide_int is changed from a trivially copyable type
which contained just an inline array of WIDE_INT_MAX_ELTS (3, 5, 9 or
11 limbs depending on target) limbs into a non-trivially copy constructible,
copy assignable and destructible type which for the usual small cases (up
to WIDE_INT_MAX_INL_ELTS which is the former WIDE_INT_MAX_ELTS) still uses
an inline array of limbs, but for larger precisions uses heap allocated
limb array. This makes wide_int unusable in GC structures, so for dwarf2out
which was the only place which needed it there is a new rwide_int type
(restricted wide_int) which supports only up to RWIDE_INT_MAX_ELTS limbs
inline and is trivially copyable (dwarf2out should never deal with large
_BitInt constants, those should have been lowered earlier).
Similarly, widest_int has been changed from a trivially copyable type which
contained also an inline array of WIDE_INT_MAX_ELTS limbs (but unlike
wide_int didn't contain precision and assumed that to be
WIDE_INT_MAX_PRECISION) into a non-trivially copy constructible, copy
assignable and destructible type which has always WIDEST_INT_MAX_PRECISION
precision (32640 bits currently, twice as much as INTEGER_CST limitation
allows) and unlike wide_int decides depending on get_len () value whether
it uses an inline array (again, up to WIDE_INT_MAX_INL_ELTS) or heap
allocated one. In wide-int.h this means we need to estimate an upper
bound on how many limbs wide-int.cc (usually, sometimes wide-int.h) will
need to write, and heap allocate if needed based on that estimate. If we
guessed over WIDE_INT_MAX_INL_ELTS and allocated dynamically, but actually
need less than that, set_len, which is done at the end, copies and
deallocates. The inexact guesses are needed because the exact
computation of the length in wide-int.cc is sometimes quite complex, and
canonicalization at the end in particular can decrease it. Because of this,
widest_int is again not usable in GC structures, so cfgloop.h has been
changed to use fixed_wide_int_storage <WIDE_INT_MAX_INL_PRECISION> and punt
if we'd have larger _BitInt based iterators. Programs having more than
128-bit iterators will hopefully be rare, and I think it is fine to treat
loops with more than 2^127 iterations as effectively possibly infinite.
omp-general.cc is changed to use fixed_wide_int_storage <1024>, as it had
better support scores with the same precision on all arches.
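The storage scheme described above (inline limbs for small precisions, heap-allocated limbs for large ones, hence non-trivial copy operations and destructor) can be sketched as follows. This is a simplified model, not the wide_int_storage implementation: `toy_wide_storage`, `TOY_MAX_INL_ELTS` and the 64-bit limb size are assumptions of the sketch, and it ignores exception safety.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned TOY_MAX_INL_ELTS = 3;  // hypothetical inline limb count
constexpr unsigned TOY_LIMB_BITS = 64;    // assumed limb width

class toy_wide_storage {
  union limbs {
    std::int64_t val[TOY_MAX_INL_ELTS];  // small precisions live inline
    std::int64_t *valp;                  // large precisions live on the heap
  };
  unsigned precision_;
  limbs u_;

  unsigned num_limbs() const {
    return (precision_ + TOY_LIMB_BITS - 1) / TOY_LIMB_BITS;
  }

  // Set up storage for the current precision; copy from `src` if non-null.
  void init(const std::int64_t *src) {
    const unsigned n = uses_heap() ? num_limbs() : TOY_MAX_INL_ELTS;
    if (uses_heap())
      u_.valp = new std::int64_t[n];
    for (unsigned i = 0; i < n; ++i)
      data()[i] = src ? src[i] : 0;
  }

public:
  explicit toy_wide_storage(unsigned precision) : precision_(precision) {
    init(nullptr);
  }
  toy_wide_storage(const toy_wide_storage &o) : precision_(o.precision_) {
    init(o.data());
  }
  toy_wide_storage &operator=(const toy_wide_storage &o) {
    if (this != &o) {
      if (uses_heap())
        delete[] u_.valp;  // release old heap limbs before switching
      precision_ = o.precision_;
      init(o.data());
    }
    return *this;
  }
  ~toy_wide_storage() {
    if (uses_heap())
      delete[] u_.valp;
  }

  bool uses_heap() const { return num_limbs() > TOY_MAX_INL_ELTS; }
  std::int64_t *data() { return uses_heap() ? u_.valp : u_.val; }
  const std::int64_t *data() const { return uses_heap() ? u_.valp : u_.val; }
};
```

Because the object may own heap memory, it can no longer be copied with memcpy or placed in garbage-collected structures, which is the motivation given above for the separate trivially copyable types.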
Code which used WIDE_INT_PRINT_BUFFER_SIZE sized buffers for printing
wide_int/widest_int into buffer had to be changed to use XALLOCAVEC for
larger lengths.
On x86_64, the patch in --enable-checking=yes,rtl,extra configured
bootstrapped cc1plus enlarges the .text section by 1.01% - from
0x25725a5 to 0x25e5555 and similarly at least when compiling insn-recog.cc
with the usual bootstrap option slows compilation down by 1.01%,
user 4m22.046s and 4m22.384s on vanilla trunk vs.
4m25.947s and 4m25.581s on patched trunk. I'm afraid some code size growth
and compile time slowdown is unavoidable in this case, we use wide_int and
widest_int everywhere, and while the rare cases are marked with UNLIKELY
macros, it still means extra checks for it.
The patch also regresses
+FAIL: gm2/pim/fail/largeconst.mod, -O
+FAIL: gm2/pim/fail/largeconst.mod, -O -g
+FAIL: gm2/pim/fail/largeconst.mod, -O3 -fomit-frame-pointer
+FAIL: gm2/pim/fail/largeconst.mod, -O3 -fomit-frame-pointer -finline-functions
+FAIL: gm2/pim/fail/largeconst.mod, -Os
+FAIL: gm2/pim/fail/largeconst.mod, -g
+FAIL: gm2/pim/fail/largeconst2.mod, -O
+FAIL: gm2/pim/fail/largeconst2.mod, -O -g
+FAIL: gm2/pim/fail/largeconst2.mod, -O3 -fomit-frame-pointer
+FAIL: gm2/pim/fail/largeconst2.mod, -O3 -fomit-frame-pointer -finline-functions
+FAIL: gm2/pim/fail/largeconst2.mod, -Os
+FAIL: gm2/pim/fail/largeconst2.mod, -g
tests, which previously were rejected with
error: constant literal ‘12345678912345678912345679123456789123456789123456789123456789123456791234567891234567891234567891234567891234567912345678912345678912345678912345678912345679123456789123456789’ exceeds internal ZTYPE range
kind of errors, but now are accepted. Seems the FE tries to parse constants
into widest_int in that case and only diagnoses if widest_int overflows,
that seems wrong, it should at least punt if stuff doesn't fit into
WIDE_INT_MAX_PRECISION, but perhaps far less than that, if it wants support
for middle-end for precisions above 128-bit, it better should be using
BITINT_TYPE. Will file a PR and defer to Modula2 maintainer.
2023-10-12 Jakub Jelinek <jakub@redhat.com>
PR c/102989
* wide-int.h: Adjust file comment.
(WIDE_INT_MAX_INL_ELTS): Define to former value of WIDE_INT_MAX_ELTS.
(WIDE_INT_MAX_INL_PRECISION): Define.
(WIDE_INT_MAX_ELTS): Change to 255. Assert that WIDE_INT_MAX_INL_ELTS
is smaller than WIDE_INT_MAX_ELTS.
(RWIDE_INT_MAX_ELTS, RWIDE_INT_MAX_PRECISION, WIDEST_INT_MAX_ELTS,
WIDEST_INT_MAX_PRECISION): Define.
(WI_BINARY_RESULT_VAR, WI_UNARY_RESULT_VAR): Change write_val callers
to pass 0 as a new argument.
(class widest_int_storage): Likewise.
(widest_int, widest2_int): Change typedefs to use widest_int_storage
rather than fixed_wide_int_storage.
(enum wi::precision_type): Add INL_CONST_PRECISION enumerator.
(struct binary_traits): Add partial specializations for
INL_CONST_PRECISION.
(generic_wide_int): Add needs_write_val_arg static data member.
(int_traits): Likewise.
(wide_int_storage): Replace val non-static data member with a union
u of it and HOST_WIDE_INT *valp. Declare copy constructor, copy
assignment operator and destructor. Add unsigned int argument to
write_val.
(wide_int_storage::wide_int_storage): Initialize precision to 0
in the default ctor. Remove unnecessary {}s around STATIC_ASSERTs.
Assert in non-default ctor T's precision_type is not
INL_CONST_PRECISION and allocate u.valp for large precision. Add
copy constructor.
(wide_int_storage::~wide_int_storage): New.
(wide_int_storage::operator=): Add copy assignment operator. In
assignment operator remove unnecessary {}s around STATIC_ASSERTs,
assert ctor T's precision_type is not INL_CONST_PRECISION and
if precision changes, deallocate and/or allocate u.valp.
(wide_int_storage::get_val): Return u.valp rather than u.val for
large precision.
(wide_int_storage::write_val): Likewise. Add an unused unsigned int
argument.
(wide_int_storage::set_len): Use write_val instead of writing val
directly.
(wide_int_storage::from, wide_int_storage::from_array): Adjust
write_val callers.
(wide_int_storage::create): Allocate u.valp for large precisions.
(wi::int_traits <wide_int_storage>::get_binary_precision): New.
(fixed_wide_int_storage::fixed_wide_int_storage): Make default
ctor defaulted.
(fixed_wide_int_storage::write_val): Add unused unsigned int argument.
(fixed_wide_int_storage::from, fixed_wide_int_storage::from_array):
Adjust write_val callers.
(wi::int_traits <fixed_wide_int_storage>::get_binary_precision): New.
(WIDEST_INT): Define.
(widest_int_storage): New template class.
(wi::int_traits <widest_int_storage>): New.
(trailing_wide_int_storage::write_val): Add unused unsigned int
argument.
(wi::get_binary_precision): Use
wi::int_traits <WI_BINARY_RESULT (T1, T2)>::get_binary_precision
rather than get_precision on get_binary_result.
(wi::copy): Adjust write_val callers. Don't call set_len if
needs_write_val_arg.
(wi::bit_not): If result.needs_write_val_arg, call write_val
again with upper bound estimate of len.
(wi::sext, wi::zext, wi::set_bit): Likewise.
(wi::bit_and, wi::bit_and_not, wi::bit_or, wi::bit_or_not,
wi::bit_xor, wi::add, wi::sub, wi::mul, wi::mul_high, wi::div_trunc,
wi::div_floor, wi::div_ceil, wi::div_round, wi::divmod_trunc,
wi::mod_trunc, wi::mod_floor, wi::mod_ceil, wi::mod_round,
wi::lshift, wi::lrshift, wi::arshift): Likewise.
(wi::bswap, wi::bitreverse): Assert result.needs_write_val_arg
is false.
(gt_ggc_mx, gt_pch_nx): Remove generic template for all
generic_wide_int, instead add functions and templates for each
storage of generic_wide_int. Make functions for
generic_wide_int <wide_int_storage> and templates for
generic_wide_int <widest_int_storage <N>> deleted.
(wi::mask, wi::shifted_mask): Adjust write_val calls.
* wide-int.cc (zeros): Decrease array size to 1.
(BLOCKS_NEEDED): Use CEIL.
(canonize): Use HOST_WIDE_INT_M1.
(wi::from_buffer): Pass 0 to write_val.
(wi::to_mpz): Use CEIL.
(wi::from_mpz): Likewise. Pass 0 to write_val. Use
WIDE_INT_MAX_INL_ELTS instead of WIDE_INT_MAX_ELTS.
(wi::mul_internal): Use WIDE_INT_MAX_INL_PRECISION instead of
MAX_BITSIZE_MODE_ANY_INT in automatic array sizes, for prec
above WIDE_INT_MAX_INL_PRECISION estimate precision from
lengths of operands. Use XALLOCAVEC allocated buffers for
prec above WIDE_INT_MAX_INL_PRECISION.
(wi::divmod_internal): Likewise.
(wi::lshift_large): For len > WIDE_INT_MAX_INL_ELTS estimate
it from xlen and skip.
(rshift_large_common): Remove xprecision argument, add len
argument with len computed in caller. Don't return anything.
(wi::lrshift_large, wi::arshift_large): Compute len here
and pass it to rshift_large_common, for lengths above
WIDE_INT_MAX_INL_ELTS using estimations from xlen if possible.
(assert_deceq, assert_hexeq): For lengths above
WIDE_INT_MAX_INL_ELTS use XALLOCAVEC allocated buffer.
(test_printing): Use WIDE_INT_MAX_INL_PRECISION instead of
WIDE_INT_MAX_PRECISION.
* wide-int-print.h (WIDE_INT_PRINT_BUFFER_SIZE): Use
WIDE_INT_MAX_INL_PRECISION instead of WIDE_INT_MAX_PRECISION.
* wide-int-print.cc (print_decs, print_decu, print_hex): For
lengths above WIDE_INT_MAX_INL_ELTS use XALLOCAVEC allocated buffer.
* tree.h (wi::int_traits<extended_tree <N>>): Change precision_type
to INL_CONST_PRECISION for N == ADDR_MAX_PRECISION.
(widest_extended_tree): Use WIDEST_INT_MAX_PRECISION instead of
WIDE_INT_MAX_PRECISION.
(wi::ints_for): Use int_traits <extended_tree <N> >::precision_type
instead of hard coded CONST_PRECISION.
(widest2_int_cst): Use WIDEST_INT_MAX_PRECISION instead of
WIDE_INT_MAX_PRECISION.
(wi::extended_tree <N>::get_len): Use WIDEST_INT_MAX_PRECISION rather
than WIDE_INT_MAX_PRECISION.
(wi::ints_for::zero): Use
wi::int_traits <wi::extended_tree <N> >::precision_type instead of
wi::CONST_PRECISION.
* tree.cc (build_replicated_int_cst): Formatting fix. Use
WIDE_INT_MAX_INL_ELTS rather than WIDE_INT_MAX_ELTS.
* print-tree.cc (print_node): Don't print TREE_UNAVAILABLE on
INTEGER_CSTs, TREE_VECs or SSA_NAMEs.
* double-int.h (wi::int_traits <double_int>::precision_type): Change
to INL_CONST_PRECISION from CONST_PRECISION.
* poly-int.h (struct poly_coeff_traits): Add partial specialization
for wi::INL_CONST_PRECISION.
* cfgloop.h (bound_wide_int): New typedef.
(struct nb_iter_bound): Change bound type from widest_int to
bound_wide_int.
(struct loop): Change nb_iterations_upper_bound,
nb_iterations_likely_upper_bound and nb_iterations_estimate type from
widest_int to bound_wide_int.
* cfgloop.cc (record_niter_bound): Return early if wi::min_precision
of i_bound is too large for bound_wide_int. Adjustments for the
widest_int to bound_wide_int type change in non-static data members.
(get_estimated_loop_iterations, get_max_loop_iterations,
get_likely_max_loop_iterations): Adjustments for the widest_int to
bound_wide_int type change in non-static data members.
* tree-vect-loop.cc (vect_transform_loop): Likewise.
* tree-ssa-loop-niter.cc (do_warn_aggressive_loop_optimizations): Use
XALLOCAVEC allocated buffer for i_bound len above
WIDE_INT_MAX_INL_ELTS.
(record_estimate): Return early if wi::min_precision of i_bound is too
large for bound_wide_int. Adjustments for the widest_int to
bound_wide_int type change in non-static data members.
(wide_int_cmp): Use bound_wide_int instead of widest_int.
(bound_index): Use bound_wide_int instead of widest_int.
(discover_iteration_bound_by_body_walk): Likewise. Use
widest_int::from to convert it to widest_int when passed to
record_niter_bound.
(maybe_lower_iteration_bound): Use widest_int::from to convert it to
widest_int when passed to record_niter_bound.
(estimate_numbers_of_iteration): Don't record upper bound if
loop->nb_iterations has too large precision for bound_wide_int.
(n_of_executions_at_most): Use widest_int::from.
* tree-ssa-loop-ivcanon.cc (remove_redundant_iv_tests): Adjust for
the widest_int to bound_wide_int changes.
* match.pd (fold_sign_changed_comparison simplification): Use
wide_int::from on wi::to_wide instead of wi::to_widest.
* value-range.h (irange::maybe_resize): Avoid using memcpy on
non-trivially copyable elements.
* value-range.cc (irange_bitmask::dump): Use XALLOCAVEC allocated
buffer for mask or value len above WIDE_INT_PRINT_BUFFER_SIZE.
* fold-const.cc (fold_convert_const_int_from_int, fold_unary_loc):
Use wide_int::from on wi::to_wide instead of wi::to_widest.
* tree-ssa-ccp.cc (bit_value_binop): Zero extend r1max from width
before calling wi::udiv_trunc.
* lto-streamer-out.cc (output_cfg): Adjustments for the widest_int to
bound_wide_int type change in non-static data members.
* lto-streamer-in.cc (input_cfg): Likewise.
(lto_input_tree_1): Use WIDE_INT_MAX_INL_ELTS rather than
WIDE_INT_MAX_ELTS. For length above WIDE_INT_MAX_INL_ELTS use
XALLOCAVEC allocated buffer. Formatting fix.
* data-streamer-in.cc (streamer_read_wide_int,
streamer_read_widest_int): Likewise.
* tree-affine.cc (aff_combination_expand): Use placement new to
construct name_expansion.
(free_name_expansion): Destruct name_expansion.
* gimple-ssa-strength-reduction.cc (struct slsr_cand_d): Change
index type from widest_int to offset_int.
(class incr_info_d): Change incr type from widest_int to offset_int.
(alloc_cand_and_find_basis, backtrace_base_for_ref,
restructure_reference, slsr_process_ref, create_mul_ssa_cand,
create_mul_imm_cand, create_add_ssa_cand, create_add_imm_cand,
slsr_process_add, cand_abs_increment, replace_mult_candidate,
replace_unconditional_candidate, incr_vec_index,
create_add_on_incoming_edge, create_phi_basis_1,
replace_conditional_candidate, record_increment,
record_phi_increments_1, phi_incr_cost_1, phi_incr_cost,
lowest_cost_path, total_savings, ncd_with_phi, ncd_of_cand_and_phis,
nearest_common_dominator_for_cands, insert_initializers,
all_phi_incrs_profitable_1, replace_one_candidate,
replace_profitable_candidates): Use offset_int rather than widest_int
and wi::to_offset rather than wi::to_widest.
* real.cc (real_to_integer): Use WIDE_INT_MAX_INL_ELTS rather than
2 * WIDE_INT_MAX_ELTS and for words above that use XALLOCAVEC
allocated buffer.
* tree-ssa-loop-ivopts.cc (niter_for_exit): Use placement new
to construct tree_niter_desc and destruct it on failure.
(free_tree_niter_desc): Destruct tree_niter_desc if value is non-NULL.
* gengtype.cc (main): Remove widest_int handling.
* graphite-isl-ast-to-gimple.cc (widest_int_from_isl_expr_int): Use
WIDEST_INT_MAX_ELTS instead of WIDE_INT_MAX_ELTS.
* gimple-ssa-warn-alloca.cc (pass_walloca::execute): Use
WIDE_INT_MAX_INL_PRECISION instead of WIDE_INT_MAX_PRECISION and
assert get_len () fits into it.
* value-range-pretty-print.cc (vrange_printer::print_irange_bitmasks):
For mask or value lengths above WIDE_INT_MAX_INL_ELTS use XALLOCAVEC
allocated buffer.
* gimple-ssa-sprintf.cc (adjust_range_for_overflow): Use
wide_int::from on wi::to_wide instead of wi::to_widest.
* omp-general.cc (score_wide_int): New typedef.
(omp_context_compute_score): Use score_wide_int instead of widest_int
and adjust for those changes.
(struct omp_declare_variant_entry): Change score and
score_in_declare_simd_clone non-static data member type from widest_int
to score_wide_int.
(omp_resolve_late_declare_variant, omp_resolve_declare_variant): Use
score_wide_int instead of widest_int and adjust for those changes.
(omp_lto_output_declare_variant_alt): Likewise.
(omp_lto_input_declare_variant_alt): Likewise.
* godump.cc (go_output_typedef): Assert get_len () is smaller than
WIDE_INT_MAX_INL_ELTS.
gcc/c-family/
* c-warn.cc (match_case_to_enum_1): Use wi::to_wide just once instead
of 3 times, assert get_len () is smaller than WIDE_INT_MAX_INL_ELTS.
gcc/testsuite/
* gcc.dg/bitint-38.c: New test.
|
|
We can set_nan() with a nan_state so it's good form to have the
analogous form for update_nan().
gcc/ChangeLog:
* value-range.h (frange::update_nan): New.
|
|
In the conversion of iranges to wide_int (commit cb779afeff204f), I
mistakenly made contains_zero_p() return TRUE for undefined ranges.
This means the rest of the patch was adjusted for this stupidity.
For example, we ended up doing the following, to make up for the fact
that contains_zero_p was broken:
- if (!lhs.contains_p (build_zero_cst (lhs.type ())))
+ if (lhs.undefined_p () || !contains_zero_p (lhs))
This patch fixes the thinko and adjusts all callers.
In places where a caller is not checking undefined_p(), it is because
either the caller has already handled undefined ranges in the
preceding code, or the check is superfluous.
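The corrected behaviour can be sketched with a toy single-pair range. `toy_irange` is a hypothetical simplification; the real function operates on irange and wide_int.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical single sub-range model of an integer range.
struct toy_irange {
  bool undefined;       // true for the empty (undefined) range
  std::int64_t lo, hi;  // inclusive bounds, meaningful only if !undefined
  bool undefined_p() const { return undefined; }
};

// An undefined range contains no values at all, so it cannot contain zero.
inline bool contains_zero_p(const toy_irange &r) {
  if (r.undefined_p())
    return false;
  return r.lo <= 0 && 0 <= r.hi;
}
```

With this definition a caller that wants "zero is excluded or the range is empty" can write `!contains_zero_p (r)` without a separate undefined check.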
gcc/ChangeLog:
* value-range.h (contains_zero_p): Return false for undefined ranges.
* range-op-float.cc (operator_gt::op1_op2_relation): Adjust for
contains_zero_p change above.
(operator_ge::op1_op2_relation): Same.
(operator_equal::op1_op2_relation): Same.
(operator_not_equal::op1_op2_relation): Same.
(operator_lt::op1_op2_relation): Same.
(operator_le::op1_op2_relation): Same.
(operator_ge::op1_op2_relation): Same.
* range-op.cc (operator_equal::op1_op2_relation): Same.
(operator_not_equal::op1_op2_relation): Same.
(operator_lt::op1_op2_relation): Same.
(operator_le::op1_op2_relation): Same.
(operator_cast::op1_range): Same.
(set_nonzero_range_from_mask): Same.
(operator_bitwise_xor::op1_range): Same.
(operator_addr_expr::fold_range): Same.
(operator_addr_expr::op1_range): Same.
|
|
Previous reviews suggested that adding overflow APIs to range-op would be
useful. These APIs help to check whether overflow can happen when operating
on two ranges, e.g. for plus, minus, and mult.
Previous discussions are here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624067.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624701.html
gcc/ChangeLog:
* range-op-mixed.h (operator_plus::overflow_free_p): New declare.
(operator_minus::overflow_free_p): New declare.
(operator_mult::overflow_free_p): New declare.
* range-op.cc (range_op_handler::overflow_free_p): New function.
(range_operator::overflow_free_p): New default function.
(operator_plus::overflow_free_p): New function.
(operator_minus::overflow_free_p): New function.
(operator_mult::overflow_free_p): New function.
* range-op.h (range_op_handler::overflow_free_p): New declare.
(range_operator::overflow_free_p): New declare.
* value-range.cc (irange::nonnegative_p): New function.
(irange::nonpositive_p): New function.
* value-range.h (irange::nonnegative_p): New declare.
(irange::nonpositive_p): New declare.
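As a rough illustration of what an overflow_free_p query answers, the sketch below checks 32-bit signed ranges by evaluating the extreme results in 64-bit arithmetic. `toy_range32` and the free functions are hypothetical; the actual implementation works on wide_int bounds and is not shown in this log.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical inclusive range of 32-bit signed values.
struct toy_range32 {
  std::int32_t lo, hi;
};

inline bool fits_int32(std::int64_t v) {
  return v >= std::numeric_limits<std::int32_t>::min()
         && v <= std::numeric_limits<std::int32_t>::max();
}

// True if a + b cannot overflow for any a in A and b in B.
inline bool plus_overflow_free_p(const toy_range32 &a, const toy_range32 &b) {
  assert(a.lo <= a.hi && b.lo <= b.hi);
  return fits_int32(std::int64_t{a.lo} + b.lo)
         && fits_int32(std::int64_t{a.hi} + b.hi);
}

// True if a - b cannot overflow for any a in A and b in B.
inline bool minus_overflow_free_p(const toy_range32 &a, const toy_range32 &b) {
  assert(a.lo <= a.hi && b.lo <= b.hi);
  return fits_int32(std::int64_t{a.lo} - b.hi)
         && fits_int32(std::int64_t{a.hi} - b.lo);
}

// True if a * b cannot overflow; the extremes lie at the four corners.
inline bool mult_overflow_free_p(const toy_range32 &a, const toy_range32 &b) {
  assert(a.lo <= a.hi && b.lo <= b.hi);
  const std::int64_t c[4] = {
      std::int64_t{a.lo} * b.lo, std::int64_t{a.lo} * b.hi,
      std::int64_t{a.hi} * b.lo, std::int64_t{a.hi} * b.hi};
  return fits_int32(*std::min_element(c, c + 4))
         && fits_int32(*std::max_element(c, c + 4));
}
```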
|
|
Set routines which take a type shouldn't have to pre-set the type of the
underlying range as it is specified as a parameter already.
* value-range.h (Value_Range::set_varying): Set the type.
(Value_Range::set_zero): Ditto.
(Value_Range::set_nonzero): Ditto.
|
|
The bit twiddling in union/intersect for the value/mask pair must be
normalized to have the unknown bits with a value of 0 in order to make
the math simpler. Normalizing at construction slowed VRP by 1.5% so I
opted to normalize before updating the bitmask in range-ops, since it
was the only user. However, with upcoming changes there will be
multiple setters of the mask (IPA and CCP), so we need something more
general.
I played with various alternatives, and settled on normalizing before
union/intersect which were the ones needing the bits cleared. With
this patch, there's no noticeable difference in performance either in
VRP or in overall compilation.
gcc/ChangeLog:
* value-range.cc (irange_bitmask::verify_mask): Mask need not be
normalized.
* value-range.h (irange_bitmask::union_): Normalize beforehand.
(irange_bitmask::intersect): Same.
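A sketch of why normalizing first keeps the math simple, shown for union on 64-bit integers. It assumes a 1 bit in the mask means "unknown"; `toy_vm` is a hypothetical type and the formulas are an illustration rather than a copy of the GCC code.

```cpp
#include <cassert>
#include <cstdint>

// Toy value/mask pair.  Assumption: mask bit 1 = unknown, mask bit 0 = known
// and equal to the corresponding bit of `value`.
struct toy_vm {
  std::uint64_t value;
  std::uint64_t mask;

  // Force the value bits under unknown mask bits to 0.
  void normalize() { value &= ~mask; }

  // Union: a bit stays known only if both sides know it and agree.
  void union_(toy_vm src) {
    normalize();
    src.normalize();
    mask = mask | src.mask | (value ^ src.value);
    value = value & src.value;  // already normalized: disagreeing bits are 0
  }
};
```

Once both operands are normalized, a plain XOR of the two values flags exactly the known bits that disagree, so no extra masking is needed in the combination step.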
|
|
Integer ranges (irange) currently track known 0 bits. We've wanted to
track known 1 bits for some time, and instead of tracking known 0 and
known 1's separately, it has been suggested we track a value/mask pair
similarly to what we do for CCP and RTL. This patch implements such a
thing.
With this we now track a VALUE integer which holds the known bit values,
and a MASK which tells us which bits contain meaningful information. This
allows us to fix a handful of enhancement requests, such as PR107043
and PR107053.
There is a 4.48% performance penalty for VRP and 0.42% in overall
compilation for this entire patchset. It is expected and in line
with the loss incurred when we started tracking known 0 bits.
This patch just provides the value/mask tracking support. All the
nonzero users (range-op, IPA, CCP, etc), are still using the nonzero
nomenclature. For that matter, this patch reimplements the nonzero
accessors with the value/mask functionality. In follow-up patches I
will enhance these passes to use the value/mask information, and
fix the aforementioned PRs.
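One way the nonzero accessors can be expressed on top of a value/mask pair is sketched below. It assumes that a 1 bit in the mask means "unknown"; `toy_value_mask` is a hypothetical 64-bit model, not the irange_bitmask class itself.

```cpp
#include <cassert>
#include <cstdint>

// Toy value/mask pair.  Assumption: mask bit 1 = unknown, mask bit 0 = known
// and equal to the corresponding bit of `value`.
struct toy_value_mask {
  std::uint64_t value = 0;
  std::uint64_t mask = ~std::uint64_t{0};  // everything unknown by default

  // A bit may be nonzero if it is unknown or known to be 1.
  std::uint64_t get_nonzero_bits() const { return value | mask; }

  // Bits outside `bits` become known zeros; bits inside become unknown.
  void set_nonzero_bits(std::uint64_t bits) {
    value = 0;
    mask = bits;
  }

  // True if `bit` is known to be 1, which nonzero bits alone cannot express.
  bool known_one_p(unsigned bit) const {
    const std::uint64_t b = std::uint64_t{1} << bit;
    return (mask & b) == 0 && (value & b) != 0;
  }
};
```

The last accessor shows the extra information the pair carries: a nonzero-bits mask can only say "this bit may be set", while value/mask can also say "this bit is set".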
gcc/ChangeLog:
* data-streamer-in.cc (streamer_read_value_range): Adjust for
value/mask.
* data-streamer-out.cc (streamer_write_vrange): Same.
* range-op.cc (operator_cast::fold_range): Same.
* value-range-pretty-print.cc
(vrange_printer::print_irange_bitmasks): Same.
* value-range-storage.cc (irange_storage::write_lengths_address):
Same.
(irange_storage::set_irange): Same.
(irange_storage::get_irange): Same.
(irange_storage::size): Same.
(irange_storage::dump): Same.
* value-range-storage.h: Same.
* value-range.cc (debug): New.
(irange_bitmask::dump): New.
(add_vrange): Adjust for value/mask.
(irange::operator=): Same.
(irange::set): Same.
(irange::verify_range): Same.
(irange::operator==): Same.
(irange::contains_p): Same.
(irange::irange_single_pair_union): Same.
(irange::union_): Same.
(irange::intersect): Same.
(irange::invert): Same.
(irange::get_nonzero_bits_from_range): Rename to...
(irange::get_bitmask_from_range): ...this.
(irange::set_range_from_nonzero_bits): Rename to...
(irange::set_range_from_bitmask): ...this.
(irange::set_nonzero_bits): Rename to...
(irange::update_bitmask): ...this.
(irange::get_nonzero_bits): Rename to...
(irange::get_bitmask): ...this.
(irange::intersect_nonzero_bits): Rename to...
(irange::intersect_bitmask): ...this.
(irange::union_nonzero_bits): Rename to...
(irange::union_bitmask): ...this.
(irange_bitmask::verify_mask): New.
* value-range.h (class irange_bitmask): New.
(irange_bitmask::set_unknown): New.
(irange_bitmask::unknown_p): New.
(irange_bitmask::irange_bitmask): New.
(irange_bitmask::get_precision): New.
(irange_bitmask::get_nonzero_bits): New.
(irange_bitmask::set_nonzero_bits): New.
(irange_bitmask::operator==): New.
(irange_bitmask::union_): New.
(irange_bitmask::intersect): New.
(class irange): Friend vrange_printer.
(irange::varying_compatible_p): Adjust for bitmask.
(irange::set_varying): Same.
(irange::set_nonzero): Same.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr107009.c: Adjust irange dumping for
value/mask changes.
* gcc.dg/tree-ssa/vrp-unreachable.c: Same.
* gcc.dg/tree-ssa/vrp122.c: Same.
|
|
There are a few spots where a range is being altered in-place, but we
fail to normalize the range. This patch makes sure we always
call normalize_kind(), and that normalize_kind in turn calls
verify_range to make sure everything is canonical.
gcc/ChangeLog:
* value-range.cc (frange::set): Do not call verify_range.
(frange::normalize_kind): Verify range.
(frange::union_nans): Do not call verify_range.
(frange::union_): Same.
(frange::intersect): Same.
(irange::irange_single_pair_union): Call normalize_kind if
necessary.
(irange::union_): Same.
(irange::intersect): Same.
(irange::set_range_from_nonzero_bits): Verify range.
(irange::set_nonzero_bits): Call normalize_kind if necessary.
(irange::get_nonzero_bits): Tweak comment.
(irange::intersect_nonzero_bits): Call normalize_kind if
necessary.
(irange::union_nonzero_bits): Same.
* value-range.h (irange::normalize_kind): Verify range.
|
|
Simplify range_op_handler to have a single range_operator pointer and
provide a more flexible dispatch mechanism for calls via generic vrange
classes. This is more extensible for adding new classes of range support.
Any unsupported dispatch patterns will simply return FALSE now rather
than generating compile time exceptions, alleviating the need to
constantly check for supported types.
* gimple-range-op.cc
(gimple_range_op_handler::gimple_range_op_handler): Adjust.
(gimple_range_op_handler::maybe_builtin_call): Adjust.
* gimple-range-op.h (operand1, operand2): Use m_operator.
* range-op.cc (integral_table, pointer_table): Relocate.
(get_op_handler): Rename from get_handler and handle all types.
(range_op_handler::range_op_handler): Relocate.
(range_op_handler::set_op_handler): Relocate and adjust.
(range_op_handler::range_op_handler): Relocate.
(dispatch_trio): New.
(RO_III, RO_IFI, RO_IFF, RO_FFF, RO_FIF, RO_FII): New consts.
(range_op_handler::dispatch_kind): New.
(range_op_handler::fold_range): Relocate and Use new dispatch value.
(range_op_handler::op1_range): Ditto.
(range_op_handler::op2_range): Ditto.
(range_op_handler::lhs_op1_relation): Ditto.
(range_op_handler::lhs_op2_relation): Ditto.
(range_op_handler::op1_op2_relation): Ditto.
(range_op_handler::set_op_handler): Use m_operator member.
* range-op.h (range_op_handler::operator bool): Use m_operator.
(range_op_handler::dispatch_kind): New.
(range_op_handler::m_valid): Delete.
(range_op_handler::m_int): Delete.
(range_op_handler::m_float): Delete.
(range_op_handler::m_operator): New.
(range_op_table::operator[]): Relocate from .cc file.
(range_op_table::set): Ditto.
* value-range.h (class vrange): Make range_op_handler a friend.
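The dispatch idea can be sketched as follows: the range kinds of the result and both operands are packed into one value, a switch selects the matching handler, and any unlisted combination returns false. The names RO_III, RO_IFF and RO_FFF come from the ChangeLog above; the bit encoding, `toy_kind`, `toy_trio` and `toy_fold_range` are assumptions of this sketch.

```cpp
#include <cassert>

// Hypothetical discriminators for the range class of each operand.
enum toy_kind : unsigned { TK_UNSUPPORTED = 0, TK_INT = 1, TK_FLOAT = 2 };

// Pack (lhs, op1, op2) kinds into one dispatch value; encoding is assumed.
constexpr unsigned toy_trio(toy_kind lhs, toy_kind op1, toy_kind op2) {
  return (lhs << 8) | (op1 << 4) | op2;
}

constexpr unsigned RO_III = toy_trio(TK_INT, TK_INT, TK_INT);
constexpr unsigned RO_IFF = toy_trio(TK_INT, TK_FLOAT, TK_FLOAT);
constexpr unsigned RO_FFF = toy_trio(TK_FLOAT, TK_FLOAT, TK_FLOAT);

// Records which handler ran; stands in for calling an operator overload.
enum toy_handler { H_NONE, H_INT, H_FLOAT_COMPARE, H_FLOAT };

// Returns false for any combination that has no handler, instead of
// requiring callers to check for supported types up front.
inline bool toy_fold_range(toy_kind lhs, toy_kind op1, toy_kind op2,
                           toy_handler &ran) {
  switch (toy_trio(lhs, op1, op2)) {
  case RO_III:
    ran = H_INT;
    return true;
  case RO_IFF:
    ran = H_FLOAT_COMPARE;
    return true;
  case RO_FFF:
    ran = H_FLOAT;
    return true;
  default:
    ran = H_NONE;
    return false;
  }
}
```

Adding a new class of range then means adding a discriminator and the case labels for the combinations it supports.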
|
|
NANs don't have bounds, so there's no need to stream them out.
gcc/ChangeLog:
* data-streamer-in.cc (streamer_read_value_range): Handle NANs.
* data-streamer-out.cc (streamer_write_vrange): Same.
* value-range.h (class vrange): Make streamer_write_vrange a friend.
|
|
Generalize frange::set_nan() to take a nan_state and make current
set_nan() methods syntactic sugar.
This is in preparation for better streaming of NANs for LTO/IPA.
gcc/ChangeLog:
* value-range.h (frange::set_nan): New.
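The "syntactic sugar" arrangement can be sketched like this: one general setter takes the NAN state, and the convenience overloads forward to it. `toy_nan_state` and `toy_frange` are hypothetical simplifications that omit the type argument and the range bounds.

```cpp
#include <cassert>

// Hypothetical NAN state: which signs of NAN are possible.
struct toy_nan_state {
  bool pos;
  bool neg;
};

class toy_frange {
  bool pos_nan_ = false;
  bool neg_nan_ = false;

public:
  // General form: take the state as given.
  void set_nan(const toy_nan_state &s) {
    assert(s.pos || s.neg);  // a NAN range must allow at least one sign
    pos_nan_ = s.pos;
    neg_nan_ = s.neg;
  }
  // Sugar: NAN of either sign.
  void set_nan() { set_nan(toy_nan_state{true, true}); }
  // Sugar: NAN of a specific sign (true means negative).
  void set_nan(bool sign) { set_nan(toy_nan_state{!sign, sign}); }

  bool maybe_pos_nan() const { return pos_nan_; }
  bool maybe_neg_nan() const { return neg_nan_; }
};
```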
|
|
gcc/ChangeLog:
* value-range.h (vrange::kind): Remove.
|
|
gcc/ChangeLog:
PR tree-optimization/109920
* value-range.h (RESIZABLE>::~int_range): Use delete[].
|
|
This adds some missing accessors to the type agnostic Value_Range
class. They'll be used in the upcoming IPA work.
gcc/ChangeLog:
* value-range.h (class Value_Range): Implement set_zero,
set_nonzero, and nonzero_p.
|
|
gcc/ChangeLog:
* value-range.h (Value_Range::operator=): New.
|
|
The unsupported_range class is provided for completeness' sake. It is a
way to set VARYING/UNDEFINED ranges for unsupported ranges (currently
anything not float, integer, or pointer). You can't do anything with
them, except set_varying, and set_undefined. We will trap on any
other operation.
This patch provides a way to copy them, just in case they creep in.
This could happen in IPA under certain circumstances.
gcc/ChangeLog:
* value-range.cc (vrange::operator=): Add a stub to copy
unsupported ranges.
* value-range.h (is_a <unsupported_range>): New.
(Value_Range::operator=): Support copying unsupported ranges.
|
|
<tldr>
We can now have int_range<N, RESIZABLE=false> for automatically
resizable ranges. int_range_max is now int_range<3, true>
for a 69X reduction in size from current trunk, and 6.9X reduction from
GCC12. This incurs a 5% performance penalty for VRP that is more than
covered by our > 13% improvements recently.
</tldr>
int_range_max is the temporary range object we use in the ranger for
integers. With the conversion to wide_int, this structure bloated up
significantly because wide_ints are huge (80 bytes a piece) and are
about 10 times as big as a plain tree. Since the temporary object
requires 255 sub-ranges, that's 255 * 80 * 2, plus the control word.
This means the structure grew from 4112 bytes to 40912 bytes.
This patch adds the ability to resize ranges as needed, defaulting to
no resizing, while int_range_max now defaults to 3 sub-ranges (instead
of 255) and grows to 255 when the range being calculated does not fit.
For example:
int_range<1> foo; // 1 sub-range with no resizing.
int_range<5> foo; // 5 sub-ranges with no resizing.
int_range<5, true> foo; // 5 sub-ranges with resizing.
I ran some tests and found that 3 sub-ranges cover 99% of cases, so
I've set the int_range_max default to that:
typedef int_range<3, /*RESIZABLE=*/true> int_range_max;
We don't bother growing incrementally, since the default covers most
cases and we have a 255 hard-limit. This hard limit could be reduced
to 128, since my tests never saw a range needing more than 124, but we
could do that as a follow-up if needed.
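A minimal sketch of the grow-once policy described above, not the
actual int_range code. The names small_range, subrange and HARD_LIMIT
are hypothetical, and sub-ranges are plain long pairs rather than
wide_ints. It only illustrates N inline slots that jump straight to
the hard limit when a resizable range overflows.

```cpp
#include <cassert>

// Hypothetical stand-in for one sub-range [lo, hi].
struct subrange { long lo, hi; };

// N inline sub-ranges.  When RESIZABLE, overflow jumps straight to
// HARD_LIMIT slots on the heap; there is no incremental growth.
template <unsigned N, bool RESIZABLE = false>
class small_range
{
public:
  static constexpr unsigned HARD_LIMIT = 255;

  small_range () : m_base (m_inline), m_max (N), m_num (0) {}
  ~small_range () { if (m_base != m_inline) delete[] m_base; }
  small_range (const small_range &) = delete;
  small_range &operator= (const small_range &) = delete;

  // Append a sub-range.  Returns false if it does not fit and the
  // range cannot grow.
  bool push (subrange p)
  {
    if (m_num == m_max && !maybe_resize ())
      return false;
    m_base[m_num++] = p;
    return true;
  }

  unsigned num_pairs () const { return m_num; }
  unsigned max_pairs () const { return m_max; }

private:
  // Grow at most once, from N directly to HARD_LIMIT.
  bool maybe_resize ()
  {
    if (!RESIZABLE || m_max >= HARD_LIMIT)
      return false;
    subrange *heap = new subrange[HARD_LIMIT];
    for (unsigned i = 0; i < m_num; ++i)
      heap[i] = m_base[i];
    m_base = heap;
    m_max = HARD_LIMIT;
    return true;
  }

  subrange m_inline[N];
  subrange *m_base;
  unsigned m_max;
  unsigned m_num;
};
```

With this shape, small_range<3, true> stays inline for up to 3
sub-ranges and pays for a heap allocation only on the rare fourth one,
while small_range<3> simply refuses the fourth.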
With 3-subranges, int_range_max is now 592 bytes versus 40912 for
trunk, and versus 4112 bytes for GCC12! The penalty is 5.04% for VRP
and 3.02% for threading, with no noticeable change in overall
compilation (0.27%). This is more than covered by our 13.26%
improvements for the legacy removal + wide_int conversion.
I think this approach is a good alternative, while providing us with
flexibility going forward. For example, we could try defaulting to
8 sub-ranges for a noticeable improvement in VRP. We could also use
large sub-ranges for switch analysis to avoid resizing.
Another approach I tried was always resizing. With this, we could
drop the whole int_range<N> nonsense, and have irange just hold a
resizable range. This simplified things, but incurred a 7% penalty on
ipa_cp. This was hard to pinpoint, and I'm not entirely convinced
this wasn't some artifact of valgrind. However, until we're sure,
let's avoid massive changes, especially since IPA changes are coming
up.
For the curious, a particular hot spot for IPA in this area was:
ipcp_vr_lattice::meet_with_1 (const value_range *other_vr)
{
...
...
value_range save (m_vr);
m_vr.union_ (*other_vr);
return m_vr != save;
}
The problem isn't the resizing (since we do that at most once) but the
fact that for some functions with lots of callers we end up with a huge
range that gets copied and compared for every meet operation. Maybe
the IPA algorithm could be adjusted somehow?
Anywhooo... for now there is nothing to worry about, since value_range
still has 2 subranges and is not resizable. But we should probably
think about what, if anything, we want to do here, as I envision IPA
using infinite ranges here (well, int_range_max) and handling franges, etc.
gcc/ChangeLog:
PR tree-optimization/109695
* value-range.cc (irange::operator=): Resize range.
(irange::union_): Same.
(irange::intersect): Same.
(irange::invert): Same.
(int_range_max): Default to 3 sub-ranges and resize as needed.
* value-range.h (irange::maybe_resize): New.
(~int_range): New.
(int_range::int_range): Adjust for resizing.
(int_range::operator=): Same.
|
|
The previous patch just added basic intrinsic ranges for sqrt
([-0.0, +Inf] +-NAN being the general result range of the function
and [-0.0, +Inf] the general operand range if result isn't NAN etc.).
The following patch intersects those ranges with the particular range
computed from the argument's or result's exact range, with the expected
error in ulps taken into account, and adds a function (frange_arithmetic
variant) which can be used by other functions as a helper as well.
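A rough sketch of the idea for the forward direction, under loud
assumptions: it uses plain doubles with std::sqrt and std::nextafter
instead of the real frange machinery, ignores NANs, and the names
frange_sketch, step and sqrt_range are hypothetical. It widens the
sqrt of each bound by the expected error in ulps and then intersects
with the generic [-0.0, +Inf] result range.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical closed interval [lo, hi] of doubles.
struct frange_sketch { double lo, hi; };

// Move V by ULPS representable steps toward DIR.
double step (double v, unsigned ulps, double dir)
{
  for (unsigned i = 0; i < ulps; ++i)
    v = std::nextafter (v, dir);
  return v;
}

// Range of sqrt (x) for x in [op.lo, op.hi], assuming op.lo >= 0 and
// that the library result may be off by at most ULPS units in the
// last place.
frange_sketch sqrt_range (frange_sketch op, unsigned ulps)
{
  const double inf = std::numeric_limits<double>::infinity ();
  frange_sketch r;
  r.lo = step (std::sqrt (op.lo), ulps, -inf);
  r.hi = step (std::sqrt (op.hi), ulps, inf);
  // Intersect with the generic result range [-0.0, +Inf].
  r.lo = std::max (r.lo, -0.0);
  r.hi = std::min (r.hi, inf);
  return r;
}
```

For an operand range of [4.0, 9.0] and one ulp of error this gives a
result range slightly wider than [2.0, 3.0], rather than the generic
[-0.0, +Inf].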
2023-05-06 Jakub Jelinek <jakub@redhat.com>
* value-range.h (frange_arithmetic): Declare.
* range-op-float.cc (frange_arithmetic): No longer static.
* gimple-range-op.cc (frange_mpfr_arg1): New function.
(cfn_sqrt::fold_range): Intersect the generic boundaries range
with range computed from sqrt of the particular bounds.
(cfn_sqrt::op1_range): Intersect the generic boundaries range
with range computed from squared particular bounds.
* gcc.dg/tree-ssa/range-sqrt-2.c: New test.
|
|
gcc/ChangeLog:
* value-range.h (class int_range): Remove gt_ggc_mx and gt_pch_nx
friends.
|
|
irange::set_nonzero is used everywhere and benefits immensely from
inlining.
gcc/ChangeLog:
* value-range.h (irange::set_nonzero): Inline.
|
|
Now that anti-ranges are no more and iranges contain wide_ints instead
of trees, various cleanups are possible. This is one of a handful of
patches improving the performance of irange::set() which is not on a
hot path, but quite sensitive because it is so pervasive.
gcc/ChangeLog:
* gimple-range-op.cc (cfn_ffs::fold_range): Use the correct
precision.
* gimple-ssa-warn-alloca.cc (alloca_call_type): Use <2> for
invalid_range, as it is an inverse range.
* tree-vrp.cc (find_case_label_range): Avoid trees.
* value-range.cc (irange::irange_set): Delete.
(irange::irange_set_1bit_anti_range): Delete.
(irange::irange_set_anti_range): Delete.
(irange::set): Cleanup.
* value-range.h (class irange): Remove irange_set,
irange_set_anti_range, irange_set_1bit_anti_range.
(irange::set_undefined): Remove set to m_type.
|
|
gcc/ChangeLog:
* range-op.cc (update_known_bitmask): Adjust for irange containing
wide_ints internally.
* tree-ssanames.cc (set_nonzero_bits): Same.
* tree-ssanames.h (set_nonzero_bits): Same.
* value-range-storage.cc (irange_storage::set_irange): Same.
(irange_storage::get_irange): Same.
* value-range.cc (irange::operator=): Same.
(irange::irange_set): Same.
(irange::irange_set_1bit_anti_range): Same.
(irange::irange_set_anti_range): Same.
(irange::set): Same.
(irange::verify_range): Same.
(irange::contains_p): Same.
(irange::irange_single_pair_union): Same.
(irange::union_): Same.
(irange::irange_contains_p): Same.
(irange::intersect): Same.
(irange::invert): Same.
(irange::set_range_from_nonzero_bits): Same.
(irange::set_nonzero_bits): Same.
(mask_to_wi): Same.
(irange::intersect_nonzero_bits): Same.
(irange::union_nonzero_bits): Same.
(gt_ggc_mx): Same.
(gt_pch_nx): Same.
(tree_range): Same.
(range_tests_strict_enum): Same.
(range_tests_misc): Same.
(range_tests_nonzero_bits): Same.
* value-range.h (irange::type): Same.
(irange::varying_compatible_p): Same.
(irange::irange): Same.
(int_range::int_range): Same.
(irange::set_undefined): Same.
(irange::set_varying): Same.
(irange::lower_bound): Same.
(irange::upper_bound): Same.
|
|
This patch removes all uses of vrp_val_{min,max} in favor of
irange_val_*, which are wide_int based. This will leave only one use
of vrp_val_* which returns trees in range_of_ssa_name_with_loop_info()
because it needs to work with non-integers (floats, etc). In a
follow-up patch, this function will also be cleaned up such that
vrp_val_* can be deleted.
The functions min_limit and max_limit in range-op.cc are now useless
as they're basically irange_val*. I didn't rename them yet to avoid
churn. I'll do it in a later patch.
gcc/ChangeLog:
* gimple-range-fold.cc (adjust_pointer_diff_expr): Rewrite with
irange_val*.
(vrp_val_max): New.
(vrp_val_min): New.
* gimple-range-op.cc (cfn_strlen::fold_range): Use irange_val_*.
* range-op.cc (max_limit): Same.
(min_limit): Same.
(plus_minus_ranges): Same.
(operator_rshift::op1_range): Same.
(operator_cast::inside_domain_p): Same.
* value-range.cc (vrp_val_is_max): Delete.
(vrp_val_is_min): Delete.
(range_tests_misc): Use irange_val_*.
* value-range.h (vrp_val_is_min): Delete.
(vrp_val_is_max): Delete.
(vrp_val_max): Delete.
(irange_val_min): New.
(vrp_val_min): Delete.
(irange_val_max): New.
* vr-values.cc (check_for_binary_op_overflow): Use irange_val_*.
|
|
This converts the irange API to use wide_ints exclusively, along with
its users.
This patch will slow down VRP, as there will be more useless
wide_int to tree conversions. However, this slowdown is only
temporary, as a follow-up patch will convert the internal
representation of iranges to wide_ints for a net overall gain
in performance.
gcc/ChangeLog:
* fold-const.cc (expr_not_equal_to): Convert to irange wide_int API.
* gimple-fold.cc (size_must_be_zero_p): Same.
* gimple-loop-versioning.cc
(loop_versioning::prune_loop_conditions): Same.
* gimple-range-edge.cc (gcond_edge_range): Same.
(gimple_outgoing_range::calc_switch_ranges): Same.
* gimple-range-fold.cc (adjust_imagpart_expr): Same.
(adjust_realpart_expr): Same.
(fold_using_range::range_of_address): Same.
(fold_using_range::relation_fold_and_or): Same.
* gimple-range-gori.cc (gori_compute::gori_compute): Same.
(range_is_either_true_or_false): Same.
* gimple-range-op.cc (cfn_toupper_tolower::get_letter_range): Same.
(cfn_clz::fold_range): Same.
(cfn_ctz::fold_range): Same.
* gimple-range-tests.cc (class test_expr_eval): Same.
* gimple-ssa-warn-alloca.cc (alloca_call_type): Same.
* ipa-cp.cc (ipa_value_range_from_jfunc): Same.
(propagate_vr_across_jump_function): Same.
(decide_whether_version_node): Same.
* ipa-prop.cc (ipa_get_value_range): Same.
* ipa-prop.h (ipa_range_set_and_normalize): Same.
* range-op.cc (get_shift_range): Same.
(value_range_from_overflowed_bounds): Same.
(value_range_with_overflow): Same.
(create_possibly_reversed_range): Same.
(equal_op1_op2_relation): Same.
(not_equal_op1_op2_relation): Same.
(lt_op1_op2_relation): Same.
(le_op1_op2_relation): Same.
(gt_op1_op2_relation): Same.
(ge_op1_op2_relation): Same.
(operator_mult::op1_range): Same.
(operator_exact_divide::op1_range): Same.
(operator_lshift::op1_range): Same.
(operator_rshift::op1_range): Same.
(operator_cast::op1_range): Same.
(operator_logical_and::fold_range): Same.
(set_nonzero_range_from_mask): Same.
(operator_bitwise_or::op1_range): Same.
(operator_bitwise_xor::op1_range): Same.
(operator_addr_expr::fold_range): Same.
(pointer_plus_operator::wi_fold): Same.
(pointer_or_operator::op1_range): Same.
(INT): Same.
(UINT): Same.
(INT16): Same.
(UINT16): Same.
(SCHAR): Same.
(UCHAR): Same.
(range_op_cast_tests): Same.
(range_op_lshift_tests): Same.
(range_op_rshift_tests): Same.
(range_op_bitwise_and_tests): Same.
(range_relational_tests): Same.
* range.cc (range_zero): Same.
(range_nonzero): Same.
* range.h (range_true): Same.
(range_false): Same.
(range_true_and_false): Same.
* tree-data-ref.cc (split_constant_offset_1): Same.
* tree-ssa-loop-ch.cc (entry_loop_condition_is_static): Same.
* tree-ssa-loop-unswitch.cc (struct unswitch_predicate): Same.
(find_unswitching_predicates_for_bb): Same.
* tree-ssa-phiopt.cc (value_replacement): Same.
* tree-ssa-threadbackward.cc
(back_threader::find_taken_edge_cond): Same.
* tree-ssanames.cc (ssa_name_has_boolean_range): Same.
* tree-vrp.cc (find_case_label_range): Same.
* value-query.cc (range_query::get_tree_range): Same.
* value-range.cc (irange::set_nonnegative): Same.
(frange::contains_p): Same.
(frange::singleton_p): Same.
(frange::internal_singleton_p): Same.
(irange::irange_set): Same.
(irange::irange_set_1bit_anti_range): Same.
(irange::irange_set_anti_range): Same.
(irange::set): Same.
(irange::operator==): Same.
(irange::singleton_p): Same.
(irange::contains_p): Same.
(irange::set_range_from_nonzero_bits): Same.
(DEFINE_INT_RANGE_INSTANCE): Same.
(INT): Same.
(UINT): Same.
(SCHAR): Same.
(UINT128): Same.
(UCHAR): Same.
(range): New.
(tree_range): New.
(range_int): New.
(range_uint): New.
(range_uint128): New.
(range_uchar): New.
(range_char): New.
(build_range3): Convert to irange wide_int API.
(range_tests_irange3): Same.
(range_tests_int_range_max): Same.
(range_tests_strict_enum): Same.
(range_tests_misc): Same.
(range_tests_nonzero_bits): Same.
(range_tests_nan): Same.
(range_tests_signed_zeros): Same.
* value-range.h (Value_Range::Value_Range): Same.
(irange::set): Same.
(irange::nonzero_p): Same.
(irange::contains_p): Same.
(range_includes_zero_p): Same.
(irange::set_nonzero): Same.
(irange::set_zero): Same.
(contains_zero_p): Same.
(frange::contains_p): Same.
* vr-values.cc
(simplify_using_ranges::op_with_boolean_value_range_p): Same.
(bounds_of_var_in_loop): Same.
(simplify_using_ranges::legacy_fold_cond_overflow): Same.
|
|
gcc/ChangeLog:
* value-range.cc (irange::irange_union): Rename to...
(irange::union_): ...this.
(irange::irange_intersect): Rename to...
(irange::intersect): ...this.
* value-range.h (irange::union_): Delete.
(irange::intersect): Delete.
|
|
gcc/ChangeLog:
* value-range.cc (irange::irange_set_anti_range): Remove uses of
tree_lower_bound and tree_upper_bound.
(irange::verify_range): Same.
(irange::operator==): Same.
(irange::singleton_p): Same.
* value-range.h (irange::tree_lower_bound): Delete.
(irange::tree_upper_bound): Delete.
(irange::lower_bound): Delete.
(irange::upper_bound): Delete.
(irange::zero_p): Remove uses of tree_lower_bound and
tree_upper_bound.
|
|
gcc/ChangeLog:
* tree-ssa-loop-niter.cc (refine_value_range_using_guard): Remove
kind() call.
(determine_value_range): Same.
(record_nonwrapping_iv): Same.
(infer_loop_bounds_from_signedness): Same.
(scev_var_range_cant_overflow): Same.
* tree-vrp.cc (operand_less_p): Delete.
* tree-vrp.h (operand_less_p): Delete.
* value-range.cc (get_legacy_range): Remove uses of deprecated API.
(irange::value_inside_range): Delete.
* value-range.h (vrange::kind): Delete.
(irange::num_pairs): Remove check of m_kind.
(irange::min): Delete.
(irange::max): Delete.
|
|
[tl;dr: This is a rewrite of value-range-storage.* such that global
ranges and the internal ranger cache can use the same efficient
storage mechanism. It is optimized such that when wide_ints are
dropped into irange, the copying back and forth from storage will be
very fast, while being able to hold any number of sub-ranges
dynamically allocated at run-time. This replaces the global storage
mechanism which was limited to 6-subranges.]
Previously we had a vrange allocator for use in the ranger cache. It
worked with trees and could be used in place (fast), but it was not
memory efficient. With the upcoming switch to wide_ints for irange,
we can't afford to allocate ranges that can be used in place, because
an irange will be significantly larger, as it will hold full
wide_ints. We need a trailing_wide_int mechanism similar to what we
use for global ranges, but fast enough to use in the ranger's cache.
The global ranges had another allocation mechanism that was
trailing_wide_int based. It was memory efficient but slow given the
constant conversions from trees to wide_ints.
This patch gets us the best of both worlds by providing a storage
mechanism with a custom trailing wide int interface, while at the same
time being fast enough to use in the ranger cache.
We use a custom trailing wide_int mechanism that is more flexible than
trailing_wide_int, since the latter has compile-time fixed-sized
wide_ints. The original TWI structure has the current length of each
wide_int in a static portion preceding the variable length:
template <int N>
struct GTY((user)) trailing_wide_ints
{
...
...
/* The current length of each number.
Avoid char array so the whole structure is not a typeless storage
that will, in turn, turn off TBAA on gimple, trees and RTL. */
struct {unsigned char len;} m_len[N];
/* The variable-length part of the structure, which always contains
at least one HWI. Element I starts at index I * M_MAX_LEN. */
HOST_WIDE_INT m_val[1];
};
We need both m_len[] and m_val[] to be variable-length at run-time.
In the previous incarnation of the storage mechanism the limitation of
m_len[] being static meant that we were limited to whatever [N] could
use up the unused bits in the TWI control word. In practice this
meant we were limited to 6 sub-ranges. This worked fine for global
ranges, but is a no go for our internal cache, where we must represent
things exactly (ranges for switches, etc).
The new implementation removes this restriction by making both m_len[]
and m_val[] variable length. Also, rolling our own allows future
optimizations by using some of the leftover bits in the control word.
Also, in preparation for the wide_int conversion, vrange_storage is
now optimized to blast the bits directly into the ultimate irange
instead of going through the irange API. So ultimately copying back
and forth between the ranger cache and the storage mechanism is just a
matter of copying a few bits for the control word, and copying an
array of HOST_WIDE_INTs. These changes were heavily profiled, and
yielded a good chunk of the overall speedup for the wide_int
conversion.
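To illustrate the layout idea only: a minimal sketch of a single
allocation whose value words and length bytes are both sized at run
time, with reads and writes done as raw word copies. The class
packed_storage and its members are hypothetical, 64-bit integers stand
in for HOST_WIDE_INT, and this is not the irange_storage code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>

// One allocation: [header][n * max_len value words][n length bytes].
// Both trailing arrays are sized at run time.
class packed_storage
{
public:
  static packed_storage *alloc (unsigned n, unsigned max_len)
  {
    std::size_t bytes = sizeof (packed_storage)
			+ std::size_t (n) * max_len * sizeof (std::int64_t)
			+ n;
    void *mem = std::malloc (bytes);
    if (!mem)
      throw std::bad_alloc ();
    packed_storage *s = new (mem) packed_storage (n, max_len);
    std::memset (s->lengths (), 0, n);
    return s;
  }

  static void release (packed_storage *s) { std::free (s); }

  // Store number I as LEN words copied straight from WORDS.
  void set (unsigned i, const std::int64_t *words, unsigned char len)
  {
    assert (i < m_n && len <= m_max_len);
    std::memcpy (values () + std::size_t (i) * m_max_len, words,
		 len * sizeof (std::int64_t));
    lengths ()[i] = len;
  }

  // Copy number I into OUT and return its length in words.
  unsigned char get (unsigned i, std::int64_t *out)
  {
    assert (i < m_n);
    unsigned char len = lengths ()[i];
    std::memcpy (out, values () + std::size_t (i) * m_max_len,
		 len * sizeof (std::int64_t));
    return len;
  }

private:
  packed_storage (unsigned n, unsigned max_len)
    : m_n (n), m_max_len (max_len) {}

  // Value words start right after the header.
  std::int64_t *values ()
  { return reinterpret_cast<std::int64_t *> (this + 1); }

  // Length bytes follow the value words.
  unsigned char *lengths ()
  {
    return reinterpret_cast<unsigned char *>
      (values () + std::size_t (m_n) * m_max_len);
  }

  alignas (std::int64_t) unsigned m_n;
  unsigned m_max_len;
};
```

Putting the length bytes after the value words keeps the words
naturally aligned without padding, which is one plausible reason to
order the two trailing arrays this way.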
Finally, vrange_storage is now a first class structure with GTY
markers and all, thus alleviating the void * hack in struct
tree_ssa_name and friends. This removes a few warts in the API and
looks cleaner overall.
gcc/ChangeLog:
* gimple-fold.cc (maybe_fold_comparisons_from_match_pd): Adjust
for vrange_storage.
* gimple-range-cache.cc (sbr_vector::sbr_vector): Same.
(sbr_vector::grow): Same.
(sbr_vector::set_bb_range): Same.
(sbr_vector::get_bb_range): Same.
(sbr_sparse_bitmap::sbr_sparse_bitmap): Same.
(sbr_sparse_bitmap::set_bb_range): Same.
(sbr_sparse_bitmap::get_bb_range): Same.
(block_range_cache::block_range_cache): Same.
(ssa_global_cache::ssa_global_cache): Same.
(ssa_global_cache::get_global_range): Same.
(ssa_global_cache::set_global_range): Same.
* gimple-range-cache.h: Same.
* gimple-range-edge.cc
(gimple_outgoing_range::gimple_outgoing_range): Same.
(gimple_outgoing_range::switch_edge_range): Same.
(gimple_outgoing_range::calc_switch_ranges): Same.
* gimple-range-edge.h: Same.
* gimple-range-infer.cc
(infer_range_manager::infer_range_manager): Same.
(infer_range_manager::get_nonzero): Same.
(infer_range_manager::maybe_adjust_range): Same.
(infer_range_manager::add_range): Same.
* gimple-range-infer.h: Rename obstack_vrange_allocator to
vrange_allocator.
* tree-core.h (struct irange_storage_slot): Remove.
(struct tree_ssa_name): Remove irange_info and frange_info. Make
range_info a pointer to vrange_storage.
* tree-ssanames.cc (range_info_fits_p): Adjust for vrange_storage.
(range_info_alloc): Same.
(range_info_free): Same.
(range_info_get_range): Same.
(range_info_set_range): Same.
(get_nonzero_bits): Same.
* value-query.cc (get_ssa_name_range_info): Same.
* value-range-storage.cc (class vrange_internal_alloc): New.
(class vrange_obstack_alloc): New.
(class vrange_ggc_alloc): New.
(vrange_allocator::vrange_allocator): New.
(vrange_allocator::~vrange_allocator): New.
(vrange_storage::alloc_slot): New.
(vrange_allocator::alloc): New.
(vrange_allocator::free): New.
(vrange_allocator::clone): New.
(vrange_allocator::clone_varying): New.
(vrange_allocator::clone_undefined): New.
(vrange_storage::alloc): New.
(vrange_storage::set_vrange): Remove slot argument.
(vrange_storage::get_vrange): Same.
(vrange_storage::fits_p): Same.
(vrange_storage::equal_p): New.
(irange_storage::write_lengths_address): New.
(irange_storage::lengths_address): New.
(irange_storage_slot::alloc_slot): Remove.
(irange_storage::alloc): New.
(irange_storage_slot::irange_storage_slot): Remove.
(irange_storage::irange_storage): New.
(write_wide_int): New.
(irange_storage_slot::set_irange): Remove.
(irange_storage::set_irange): New.
(read_wide_int): New.
(irange_storage_slot::get_irange): Remove.
(irange_storage::get_irange): New.
(irange_storage_slot::size): Remove.
(irange_storage::equal_p): New.
(irange_storage_slot::num_wide_ints_needed): Remove.
(irange_storage::size): New.
(irange_storage_slot::fits_p): Remove.
(irange_storage::fits_p): New.
(irange_storage_slot::dump): Remove.
(irange_storage::dump): New.
(frange_storage_slot::alloc_slot): Remove.
(frange_storage::alloc): New.
(frange_storage_slot::set_frange): Remove.
(frange_storage::set_frange): New.
(frange_storage_slot::get_frange): Remove.
(frange_storage::get_frange): New.
(frange_storage_slot::fits_p): Remove.
(frange_storage::equal_p): New.
(frange_storage::fits_p): New.
(ggc_vrange_allocator): New.
(ggc_alloc_vrange_storage): New.
* value-range-storage.h (class vrange_storage): Rewrite.
(class irange_storage): Rewrite.
(class frange_storage): Rewrite.
(class obstack_vrange_allocator): Remove.
(class ggc_vrange_allocator): Remove.
(vrange_allocator::alloc_vrange): Remove.
(vrange_allocator::alloc_irange): Remove.
(vrange_allocator::alloc_frange): Remove.
(ggc_alloc_vrange_storage): New.
* value-range.h (class irange): Rename vrange_allocator to
irange_storage.
(class frange): Same.
|
|
On Tue, Apr 18, 2023 at 03:12:50PM +0200, Aldy Hernandez wrote:
> [I don't know why I keep poking at floats. I must really like the pain.
>
> This is the range-op entry for sin/cos. It is meant to serve as an
> example of what we can do for glibc math functions. It is by no means
> exhaustive, just a stub to restrict the return range from sin/cos to
> [-1.0, 1.0] with appropriate smarts of NANs.
>
> As can be seen in the testcase, we see sin() as well as
> __builtin_sin() in the IL, and can resolve the resulting range
> accordingly.
Here is an updated version of the patch on top of the
Add targetm.libm_function_max_error
patch with all my comments incorporated into your patch (but still no
handling of sin/cos ranges shorter than 2*M_PI).
2023-04-28 Aldy Hernandez <aldyh@redhat.com>
Jakub Jelinek <jakub@redhat.com>
* value-range.h (frange_nextafter): Declare.
* gimple-range-op.cc (class cfn_sincos): New.
(op_cfn_sin, op_cfn_cos): New variables.
(gimple_range_op_handler::maybe_builtin_call): Handle
CASE_CFN_{SIN,COS}{,_FN}.
* gcc.dg/tree-ssa/range-sincos.c: New test.
|
|
This patch removes all the code paths guarded by legacy_mode_p(), thus
allowing us to re-use the int_range<1> idiom for a range of one
sub-range. This allows us to represent these simple ranges in a more
efficient manner.
gcc/ChangeLog:
* range-op.cc (range_op_cast_tests): Remove legacy support.
* value-range-storage.h (vrange_allocator::alloc_irange): Same.
* value-range.cc (irange::operator=): Same.
(get_legacy_range): Same.
(irange::copy_legacy_to_multi_range): Delete.
(irange::copy_to_legacy): Delete.
(irange::irange_set_anti_range): Delete.
(irange::set): Remove legacy support.
(irange::verify_range): Same.
(irange::legacy_lower_bound): Delete.
(irange::legacy_upper_bound): Delete.
(irange::legacy_equal_p): Delete.
(irange::operator==): Remove legacy support.
(irange::singleton_p): Same.
(irange::value_inside_range): Same.
(irange::contains_p): Same.
(intersect_ranges): Delete.
(irange::legacy_intersect): Delete.
(union_ranges): Delete.
(irange::legacy_union): Delete.
(irange::legacy_verbose_union_): Delete.
(irange::legacy_verbose_intersect): Delete.
(irange::irange_union): Remove legacy support.
(irange::irange_intersect): Same.
(irange::intersect): Same.
(irange::invert): Same.
(ranges_from_anti_range): Delete.
(gt_pch_nx): Adjust for legacy removal.
(gt_ggc_mx): Same.
(range_tests_legacy): Delete.
(range_tests_misc): Adjust for legacy removal.
(range_tests): Same.
* value-range.h (class irange): Same.
(irange::legacy_mode_p): Delete.
(ranges_from_anti_range): Delete.
(irange::nonzero_p): Adjust for legacy removal.
(irange::lower_bound): Same.
(irange::upper_bound): Same.
(irange::union_): Same.
(irange::intersect): Same.
(irange::set_nonzero): Same.
(irange::set_zero): Same.
* vr-values.cc (simplify_using_ranges::legacy_fold_cond_overflow): Same.
|
|
gcc/ChangeLog:
* value-range.cc (irange::copy_legacy_to_multi_range): Rewrite use
of range_has_numeric_bounds_p with irange API.
(range_has_numeric_bounds_p): Delete.
* value-range.h (range_has_numeric_bounds_p): Delete.
|
|
This patch converts the users of the legacy API to a function called
get_legacy_range() which will return the pieces of the soon to be
removed API (min, max, and kind). This is a temporary measure while
these users are converted.
In upcoming patches I will convert most users, but most of the
middle-end warning uses will remain. Naive attempts to remove them
showed that a lot of these uses are quite dependent on the anti-range
idiom, and converting them to the new API broke the tests, even when
the conversion was conceptually correct. Perhaps someone who
understands these passes could take a stab at it. In the meantime,
the legacy uses can be trivially found by grepping for
get_legacy_range.
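For readers unfamiliar with the anti-range idiom, here is a
hypothetical model of what a "kind, min, max" view of a multi-range
looks like. It is an assumption-laden sketch over plain ints, not the
get_legacy_range implementation: to_legacy, kind_sketch and subrange
are made-up names, and the fallback to the hull for other multi-ranges
is a choice made for this sketch.

```cpp
#include <cassert>
#include <vector>

enum kind_sketch { K_UNDEFINED, K_RANGE, K_ANTI_RANGE, K_VARYING };

// Hypothetical sub-range [lo, hi]; a range is a sorted, disjoint list.
struct subrange { int lo, hi; };

// Reduce R, over a type spanning [TMIN, TMAX], to a legacy triple.
// MIN and MAX are written only for K_RANGE and K_ANTI_RANGE.
kind_sketch to_legacy (const std::vector<subrange> &r, int tmin, int tmax,
		       int &min, int &max)
{
  if (r.empty ())
    return K_UNDEFINED;
  if (r.size () == 1 && r[0].lo == tmin && r[0].hi == tmax)
    return K_VARYING;
  // [TMIN, a][b, TMAX] is the anti-range ~[a + 1, b - 1].
  if (r.size () == 2 && r[0].lo == tmin && r[1].hi == tmax)
    {
      min = r[0].hi + 1;
      max = r[1].lo - 1;
      return K_ANTI_RANGE;
    }
  // Anything else is approximated by its hull in this sketch.
  min = r.front ().lo;
  max = r.back ().hi;
  return K_RANGE;
}
```

For a signed 8-bit type, [-128, 4][10, 127] comes back as the
anti-range ~[5, 9], which is the shape many of the warning passes
still test for.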
gcc/ChangeLog:
* builtins.cc (determine_block_size): Convert use of legacy API to
get_legacy_range.
* gimple-array-bounds.cc (check_out_of_bounds_and_warn): Same.
(array_bounds_checker::check_array_ref): Same.
* gimple-ssa-warn-restrict.cc
(builtin_memref::extend_offset_range): Same.
* ipa-cp.cc (ipcp_store_vr_results): Same.
* ipa-fnsummary.cc (set_switch_stmt_execution_predicate): Same.
* ipa-prop.cc (struct ipa_vr_ggc_hash_traits): Same.
(ipa_write_jump_function): Same.
* pointer-query.cc (get_size_range): Same.
* tree-data-ref.cc (split_constant_offset): Same.
* tree-ssa-strlen.cc (get_range): Same.
(maybe_diag_stxncpy_trunc): Same.
(strlen_pass::get_len_or_size): Same.
(strlen_pass::count_nonzero_bytes_addr): Same.
* tree-vect-patterns.cc (vect_get_range_info): Same.
* value-range.cc (irange::maybe_anti_range): Remove.
(get_legacy_range): New.
(irange::copy_to_legacy): Use get_legacy_range.
(ranges_from_anti_range): Same.
* value-range.h (class irange): Remove maybe_anti_range.
(get_legacy_range): New.
* vr-values.cc (check_for_binary_op_overflow): Convert use of
legacy API to get_legacy_range.
(compare_ranges): Same.
(compare_range_with_value): Same.
(bounds_of_var_in_loop): Same.
(find_case_label_ranges): Same.
(simplify_using_ranges::simplify_switch_using_ranges): Same.
|
|
gcc/ChangeLog:
* value-range-pretty-print.cc (vrange_printer::visit): Remove
constant_p use.
* value-range.cc (irange::constant_p): Remove.
(irange::get_nonzero_bits_from_range): Remove constant_p use.
* value-range.h (class irange): Remove constant_p.
(irange::num_pairs): Remove constant_p use.
|
|
gcc/ChangeLog:
* value-range.cc (irange::copy_legacy_to_multi_range): Remove
symbolics support.
(irange::set): Same.
(irange::legacy_lower_bound): Same.
(irange::legacy_upper_bound): Same.
(irange::contains_p): Same.
(range_tests_legacy): Same.
(irange::normalize_addresses): Remove.
(irange::normalize_symbolics): Remove.
(irange::symbolic_p): Remove.
* value-range.h (class irange): Remove symbolic_p,
normalize_symbolics, and normalize_addresses.
* vr-values.cc (simplify_using_ranges::two_valued_val_range_p):
Remove symbolics support.
|
|
The deprecated irange::may_contain_p method differed from contains_p
in that it could handle symbolics, which no longer exist in VRP.
gcc/ChangeLog:
* value-range.cc (irange::may_contain_p): Remove.
* value-range.h (range_includes_zero_p): Rewrite may_contain_p
usage with contains_p.
* vr-values.cc (compare_range_with_value): Same.
|
|
I think it's best to specify the default behavior of nan_state, since
it's not obvious that nan_state() defaults to TRUE. Also, this avoids
the ugly nan_state(false, false) idiom.
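A small sketch of the constructor change, assuming a class with one
flag per NAN sign. nan_state_sketch is a hypothetical stand-in and not
the GCC class; the point is only that without a default constructor
the caller must state the NAN behavior explicitly.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical model of a NAN-state holder.
class nan_state_sketch
{
public:
  // No default constructor: callers must say whether NANs are possible.
  explicit nan_state_sketch (bool nan_p) : m_pos (nan_p), m_neg (nan_p) {}
  // Per-sign form for when only one sign of NAN is possible.
  nan_state_sketch (bool pos_nan, bool neg_nan)
    : m_pos (pos_nan), m_neg (neg_nan) {}

  bool pos_p () const { return m_pos; }
  bool neg_p () const { return m_neg; }

private:
  bool m_pos, m_neg;
};

// The implicit "defaults to true" spelling no longer compiles.
static_assert (!std::is_default_constructible<nan_state_sketch>::value,
	       "callers must pass the NAN state explicitly");
```

With this shape, nan_state_sketch (false) reads as "no NANs" at the
call site, replacing the two-argument (false, false) spelling.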
gcc/ChangeLog:
* value-range.cc (frange::set): Adjust constructor.
* value-range.h (nan_state::nan_state): Replace default
constructor with one taking an argument.
|
|
gcc/ChangeLog:
* value-range.h (Value_Range::Value_Range): Avoid pointer sharing.
|
|
IPA currently puts *some* irange's in GC memory. When I contribute
support for generic ranges in IPA, we'll need to change this to
vrange. This patch adds GTY support for both vrange and frange.
gcc/ChangeLog:
* value-range.cc (gt_ggc_mx): New.
(gt_pch_nx): New.
* value-range.h (class vrange): Add GTY marker.
(class frange): Same.
(gt_ggc_mx): Remove.
(gt_pch_nx): Remove.
|
|
This patch provides inchash support for vrange. It is along the lines
of the streaming support I just posted and will be used for IPA
hashing of ranges.
gcc/ChangeLog:
* inchash.cc (hash::add_real_value): New.
* inchash.h (class hash): Add add_real_value.
* value-range.cc (add_vrange): New.
* value-range.h (inchash::add_vrange): New.
|
|
This is for upcoming work in this area.
gcc/ChangeLog:
* value-range.h (Value_Range::Value_Range): New.
(Value_Range::contains_p): New.
|
|
The discriminator in vrange cannot change after construction, and
neither can the number of allocated ranges in an irange. It's best to
make them constant to avoid invalid changes.
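A minimal sketch of the pattern, with hypothetical names
(vrange_sketch, irange_sketch, range_discriminator) rather than the
real classes: const members are fixed in the constructor's initializer
list, so each derived class has to pass its discriminator up when it
is constructed.

```cpp
#include <cassert>
#include <type_traits>

enum range_discriminator { D_UNKNOWN, D_INT, D_FLOAT };

class vrange_sketch
{
public:
  range_discriminator discriminator () const { return m_discriminator; }

protected:
  // Derived classes choose the discriminator once, at construction.
  explicit vrange_sketch (range_discriminator d) : m_discriminator (d) {}

private:
  const range_discriminator m_discriminator;
};

class irange_sketch : public vrange_sketch
{
public:
  explicit irange_sketch (unsigned max_ranges)
    : vrange_sketch (D_INT), m_max_ranges (max_ranges) {}

  unsigned max_ranges () const { return m_max_ranges; }

private:
  const unsigned m_max_ranges;
};

// A const member deletes the implicit copy assignment operator.
static_assert (!std::is_copy_assignable<irange_sketch>::value,
	       "const members block implicit assignment");
```

One consequence worth noting: a class built this way needs a
user-written operator= that copies only the mutable state if
assignment is to remain available.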
gcc/ChangeLog:
* value-range.h (class vrange): Make m_discriminator const.
(class irange): Make m_max_ranges const. Adjust constructors
accordingly.
(class unsupported_range): Construct vrange appropriately.
(class frange): Same.
|
|
As discussed in the PR, flushing denormals to zero on every frange::set
might be harmful for e.g. x < 0.0 comparisons, because we then use
ranges that include zero on both sides: [-Inf, -0.0] on the true side and
[-0.0, +Inf] NAN on the false side, rather than [-Inf, nextafter (-0.0, -Inf)]
on the true side.
The following patch does it only in range_operator_float::fold_range
which is right now used for +-*/ (both normal and reverse ops of those).
That said, I don't see any difference on the testcase in the PR, but
I'm not sure what I should be looking at, and the reduced testcase
there has undefined behavior.
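To make the x < 0.0 case concrete, a sketch with plain doubles. The
names frange_sketch, lt_zero_true_side, flush_denormal and
contains_zero are hypothetical, and flush_denormal is only a model of
flushing a bound, assuming the usual IEEE double with subnormals
enabled.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical closed interval [lo, hi] of doubles.
struct frange_sketch { double lo, hi; };

// Range of x on the true side of `x < 0.0`: the upper bound is the
// largest value strictly below zero, which is a subnormal.
frange_sketch lt_zero_true_side ()
{
  const double inf = std::numeric_limits<double>::infinity ();
  return { -inf, std::nextafter (-0.0, -inf) };
}

// Model of flushing: a subnormal bound collapses to a zero of the
// same sign; everything else is left alone.
double flush_denormal (double v)
{
  if (std::fpclassify (v) == FP_SUBNORMAL)
    return std::copysign (0.0, v);
  return v;
}

// True if R includes zero (-0.0 and +0.0 compare equal).
bool contains_zero (frange_sketch r)
{
  return r.lo <= 0.0 && 0.0 <= r.hi;
}
```

Before flushing, the true-side range excludes zero. After flushing its
upper bound it becomes [-Inf, -0.0] and includes zero again, which is
the precision loss described above.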
2023-03-28 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/109154
* value-range.h (frange::flush_denormals_to_zero): Make it public
rather than private.
* value-range.cc (frange::set): Don't call flush_denormals_to_zero
here.
* range-op-float.cc (range_operator_float::fold_range): Call
flush_denormals_to_zero.
|
|
I've noticed a comment typo in tree-vrp.cc and decided to quickly
skim aspell -c on the ranger sources (with quick I on everything that
looked ok or roughly ok).
But not being a native English speaker, I could get stuff wrong.
2023-03-23 Jakub Jelinek <jakub@redhat.com>
* value-range.cc (irange::irange_union, irange::intersect): Fix
comment spelling bugs.
* gimple-range-trace.cc (range_tracer::do_header): Likewise.
* gimple-range-trace.h: Likewise.
* gimple-range-edge.cc: Likewise.
(gimple_outgoing_range_stmt_p,
gimple_outgoing_range::switch_edge_range,
gimple_outgoing_range::edge_range_p): Likewise.
* gimple-range.cc (gimple_ranger::prefill_stmt_dependencies,
gimple_ranger::fold_stmt, gimple_ranger::register_transitive_infer,
assume_query::assume_query, assume_query::calculate_phi): Likewise.
* gimple-range-edge.h: Likewise.
* value-range.h (Value_Range::set, Value_Range::lower_bound,
Value_Range::upper_bound, frange::set_undefined): Likewise.
* gimple-range-gori.h (range_def_chain::depend, gori_map::m_outgoing,
gori_compute): Likewise.
* gimple-range-fold.h (fold_using_range): Likewise.
* gimple-range-path.cc (path_range_query::compute_ranges_in_phis):
Likewise.
* gimple-range-gori.cc (range_def_chain::in_chain_p,
range_def_chain::dump, gori_map::calculate_gori,
gori_compute::compute_operand_range_switch,
gori_compute::logical_combine, gori_compute::refine_using_relation,
gori_compute::compute_operand1_range, gori_compute::may_recompute_p):
Likewise.
* gimple-range.h: Likewise.
(enable_ranger): Likewise.
* range-op.h (empty_range_varying): Likewise.
* value-query.h (value_query): Likewise.
* gimple-range-cache.cc (block_range_cache::set_bb_range,
block_range_cache::dump, ssa_global_cache::clear_global_range,
temporal_cache::temporal_value, temporal_cache::current_p,
ranger_cache::range_of_def, ranger_cache::propagate_updated_value,
ranger_cache::range_from_dom, ranger_cache::register_inferred_value):
Likewise.
* gimple-range-fold.cc (fur_edge::get_phi_operand,
fur_stmt::get_operand, gimple_range_adjustment,
fold_using_range::range_of_phi,
fold_using_range::relation_fold_and_or): Likewise.
* value-range-storage.h (irange_storage_slot::MAX_INTS): Likewise.
* value-query.cc (range_query::value_of_expr,
range_query::value_on_edge, range_query::query_relation): Likewise.
* tree-vrp.cc (remove_unreachable::remove_and_update_globals,
intersect_range_with_nonzero_bits): Likewise.
* gimple-range-infer.cc (gimple_infer_range::check_assume_func,
exit_range): Likewise.
* value-relation.h: Likewise.
(equiv_oracle, relation_trio::relation_trio, value_relation,
value_relation::value_relation, pe_min): Likewise.
* range-op-float.cc (range_operator_float::rv_fold,
frange_arithmetic, foperator_unordered_equal::op1_range,
foperator_div::rv_fold): Likewise.
* gimple-range-op.cc (cfn_clz::fold_range): Likewise.
* value-relation.cc (equiv_oracle::query_relation,
equiv_oracle::register_equiv, equiv_oracle::add_equiv_to_block,
value_relation::apply_transitive, relation_chain_head::find_relation,
dom_oracle::query_relation, dom_oracle::find_relation_block,
dom_oracle::find_relation_dom, path_oracle::register_equiv): Likewise.
* range-op.cc (range_operator::wi_fold_in_parts_equiv,
create_possibly_reversed_range, adjust_op1_for_overflow,
operator_mult::wi_fold, operator_exact_divide::op1_range,
operator_cast::lhs_op1_relation, operator_cast::fold_pair,
operator_cast::fold_range, operator_abs::wi_fold, range_op_cast_tests,
range_op_lshift_tests): Likewise.
|
|
This patch implements a nan_state class, that allows us to query or
pass around the NANness of an frange. We can store +NAN, -NAN, +-NAN,
or not-a-NAN with it.
I tried to touch as little as possible, leaving other cleanups to the
next release. For example, we should replace the m_*_nan fields in
frange with nan_state, and provide relevant accessors to nan_state
(isnan, etc).
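A sketch of how a NAN state can travel alongside the numeric bounds,
assuming one flag per sign. The types nan_state_sketch and
frange_sketch are hypothetical and use doubles; the four flag
combinations correspond to +NAN, -NAN, +-NAN and not-a-NAN.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical NAN state: which signs of NAN the range may contain.
struct nan_state_sketch { bool pos, neg; };

class frange_sketch
{
public:
  // Set the numeric bounds and the NAN state together.
  void set (double lo, double hi, nan_state_sketch nan)
  {
    m_lo = lo;
    m_hi = hi;
    m_nan = nan;
  }

  nan_state_sketch get_nan_state () const { return m_nan; }

  // NANs are matched against the NAN state, numbers against the bounds.
  bool contains_p (double v) const
  {
    if (std::isnan (v))
      return std::signbit (v) ? m_nan.neg : m_nan.pos;
    return m_lo <= v && v <= m_hi;
  }

private:
  double m_lo = 0, m_hi = 0;
  nan_state_sketch m_nan = { false, false };
};
```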
PR tree-optimization/109008
gcc/ChangeLog:
* value-range.cc (frange::set): Add nan_state argument.
* value-range.h (class nan_state): New.
(frange::get_nan_state): New.
|