path: root/gcc/fold-const.cc
Age  Commit message  Author  Files  Lines
2024-07-28  Fix hash of WIDEN_*_EXPR  Richard Biener  1 file  -1/+1
We're hashing operand 2 to the temporary hash. * fold-const.cc (operand_compare::hash_operand): Fix hash of WIDEN_*_EXPR.
2024-07-19  Treat boolean vector elements as 0/-1 [PR115406]  Richard Sandiford  1 file  -2/+3
Previously we built vector boolean constants using 1 for true elements and 0 for false elements. This matches the predicates produced by SVE's PTRUE instruction, but leads to a miscompilation on AVX, where all bits of a boolean element should be set. One option for RTL would be to make this target-configurable. But that isn't really possible at the tree level, where vectors should work in a more target-independent way. (There is currently no way to create a "generic" packed boolean vector, but never say never :)) And, if we were going to pick a generic behaviour, it would make sense to use 0/-1 rather than 0/1, for consistency with integer vectors. Both behaviours should work with SVE on read, since SVE ignores the upper bits in each predicate element. And the choice shouldn't make much difference for RTL, since all SVE predicate modes are expressed as vectors of BI, rather than of multi-bit booleans. I suspect there might be some fallout from this change on SVE. But I think we should at least give it a go, and see whether any fallout provides a strong counterargument against the approach. gcc/ PR middle-end/115406 * fold-const.cc (native_encode_vector_part): For vector booleans, check whether an element is nonzero and, if so, set all of the corresponding bits in the target image. * simplify-rtx.cc (native_encode_rtx): Likewise. gcc/testsuite/ PR middle-end/115406 * gcc.dg/torture/pr115406.c: New test.
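A quick illustration of the 0/-1 convention in plain GNU C (a hedged sketch, not the pr115406.c testcase; names and values are made up): with GCC's vector extensions a vector comparison already yields all-ones for true elements, and the constant encoding now follows the same rule.
```
typedef int v4si __attribute__ ((vector_size (16)));

v4si
cmp_const (void)
{
  v4si a = { 1, 2, 3, 4 };
  v4si b = { 1, 0, 3, 0 };
  return a == b;   /* element-wise { -1, 0, -1, 0 }: true is all-ones */
}
```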
2024-07-18  middle-end/115641 - invalid address construction  Richard Biener  1 file  -0/+3
fold_truth_andor_1 via make_bit_field_ref builds an address of a CALL_EXPR which isn't valid GENERIC and later causes an ICE. The following simply avoids the folding for f ().a != 1 || f ().b != 2 as it is a premature optimization anyway. The alternative would have been to build a TARGET_EXPR around the call. To get this far f () has to be const as otherwise the two calls are not semantically equivalent for the optimization. PR middle-end/115641 * fold-const.cc (decode_field_reference): If the inner reference isn't something we can take the address of, fail. * gcc.dg/torture/pr115641.c: New testcase.
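A minimal sketch of the shape of code described above (identifiers are illustrative, not the actual pr115641.c testcase): a const function returning a struct, with two field tests combined by ||; folding them into a single bit-field access would need the address of the CALL_EXPR.
```
struct S { int a, b; };

/* Must be const, otherwise the two calls are not interchangeable.  */
__attribute__ ((const)) struct S f (void);

int
g (void)
{
  return f ().a != 1 || f ().b != 2;
}
```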
2024-06-05  Don't simplify NAN/INF or out-of-range constant for FIX/UNSIGNED_FIX.  liuhongt  1 file  -1/+12
According to the IEEE standard, for conversions from floating point to integer: when a NaN or infinite operand cannot be represented in the destination format and this cannot otherwise be indicated, the invalid operation exception shall be signaled. When a numeric operand would convert to an integer outside the range of the destination format, the invalid operation exception shall be signaled if this situation cannot otherwise be indicated. The patch prevents simplification of the conversion from floating point to integer for NAN/INF/out-of-range constant when flag_trapping_math. gcc/ChangeLog: PR rtl-optimization/100927 PR rtl-optimization/115161 PR rtl-optimization/115115 * simplify-rtx.cc (simplify_const_unary_operation): Prevent simplification of FIX/UNSIGNED_FIX for NAN/INF/out-of-range constant when flag_trapping_math. * fold-const.cc (fold_convert_const_int_from_real): Don't fold for overflow value when flag_trapping_math. gcc/testsuite/ChangeLog: * gcc.dg/pr100927.c: New test. * c-c++-common/Wconversion-1.c: Add -fno-trapping-math. * c-c++-common/dfp/convert-int-saturate.c: Ditto. * g++.dg/ubsan/pr63956.C: Ditto. * g++.dg/warn/Wconversion-real-integer.C: Ditto. * gcc.c-torture/execute/20031003-1.c: Ditto. * gcc.dg/Wconversion-complex-c99.c: Ditto. * gcc.dg/Wconversion-real-integer.c: Ditto. * gcc.dg/c90-const-expr-11.c: Ditto. * gcc.dg/overflow-warn-8.c: Ditto.
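A hedged example of the kind of constant conversion that must now be left alone (not one of the listed testcases):
```
/* With -ftrapping-math, folding this conversion away would lose the
   IEEE invalid-operation exception for the out-of-range value.  */
int
convert_huge (void)
{
  double d = 1.0e30;
  return (int) d;   /* value does not fit in int; keep the conversion */
}
```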
2024-06-04  fold-const: Handle CTZ like CLZ in tree_call_nonnegative_warnv_p [PR115337]  Jakub Jelinek  1 file  -0/+1
I think we can handle CTZ exactly like CLZ in tree_call_nonnegative_warnv_p. Like CLZ, if it is UB at zero, the result range is [0, prec-1] and if it is well defined at zero, the second argument provides the value at zero. 2024-06-04 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/115337 * fold-const.cc (tree_call_nonnegative_warnv_p): Handle CASE_CFN_CTZ like CASE_CFN_CLZ.
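A small sketch of the property being relied on (illustrative only, not a testcase from the patch):
```
/* __builtin_ctz is undefined at zero, so its result is always in
   [0, precision-1] and therefore nonnegative, just like __builtin_clz.  */
int
ctz_nonneg (unsigned x)
{
  return __builtin_ctz (x) >= 0;   /* can be folded to 1 */
}
```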
2024-06-04  fold-const, gimple-fold: Some formatting cleanups  Jakub Jelinek  1 file  -4/+4
While looking into PR115337, I've spotted some badly formatted code, which the following patch fixes. 2024-06-04 Jakub Jelinek <jakub@redhat.com> * fold-const.cc (tree_call_nonnegative_warnv_p): Formatting fixes. (tree_invalid_nonnegative_warnv_p): Likewise. * gimple-fold.cc (gimple_call_nonnegative_warnv_p): Likewise.
2024-06-04  fold-const: Fix up CLZ handling in tree_call_nonnegative_warnv_p [PR115337]  Jakub Jelinek  1 file  -1/+5
The function currently incorrectly assumes all the __builtin_clz* and .CLZ calls have non-negative result. That is the case of the former which is UB on zero and has [0, prec-1] return value otherwise, and is the case of the single argument .CLZ as well (again, UB on zero), but for two argument .CLZ is the case only if the second argument is also nonnegative (or if we know the argument can't be zero, but let's do that just in the ranger IMHO). The following patch does that. 2024-06-04 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/115337 * fold-const.cc (tree_call_nonnegative_warnv_p) <CASE_CFN_CLZ>: If arg1 is non-NULL, RECURSE on it, otherwise return true. * gcc.dg/bitint-106.c: New test.
2024-06-03  Remove value_range typedef.  Aldy Hernandez  1 file  -2/+2
Now that pointers and integers have been disambiguated from irange, and all the pointer range temporaries use prange, we can reclaim value_range as a general purpose range container. This patch removes the typedef, in favor of int_range_max, thus providing slightly better ranges in places. I have also used int_range<1> or <2> when it's known ahead of time how big the range will be, thus saving a few words. In a follow-up patch I will rename the Value_Range temporary to value_range. No change in performance. gcc/ChangeLog: * builtins.cc (expand_builtin_strnlen): Replace value_range use with int_range_max or irange when appropriate. (determine_block_size): Same. * fold-const.cc (minmax_from_comparison): Same. * gimple-array-bounds.cc (check_out_of_bounds_and_warn): Same. (array_bounds_checker::check_array_ref): Same. * gimple-fold.cc (size_must_be_zero_p): Same. * gimple-predicate-analysis.cc (find_var_cmp_const): Same. * gimple-ssa-sprintf.cc (get_int_range): Same. (format_integer): Same. (try_substitute_return_value): Same. (handle_printf_call): Same. * gimple-ssa-warn-restrict.cc (builtin_memref::extend_offset_range): Same. * graphite-sese-to-poly.cc (add_param_constraints): Same. * internal-fn.cc (get_min_precision): Same. * match.pd: Same. * pointer-query.cc (get_size_range): Same. * range-op.cc (get_shift_range): Same. (operator_trunc_mod::op1_range): Same. (operator_trunc_mod::op2_range): Same. * range.cc (range_negatives): Same. * range.h (range_positives): Same. (range_negatives): Same. * tree-affine.cc (expr_to_aff_combination): Same. * tree-data-ref.cc (compute_distributive_range): Same. (nop_conversion_for_offset_p): Same. (split_constant_offset): Same. (split_constant_offset_1): Same. (dr_step_indicator): Same. * tree-dfa.cc (get_ref_base_and_extent): Same. * tree-scalar-evolution.cc (iv_can_overflow_p): Same. * tree-ssa-math-opts.cc (optimize_spaceship): Same. * tree-ssa-pre.cc (insert_into_preds_of_block): Same. * tree-ssa-reassoc.cc (optimize_range_tests_to_bit_test): Same. * tree-ssa-strlen.cc (compare_nonzero_chars): Same. (dump_strlen_info): Same. (get_range_strlen_dynamic): Same. (set_strlen_range): Same. (maybe_diag_stxncpy_trunc): Same. (strlen_pass::get_len_or_size): Same. (strlen_pass::handle_builtin_string_cmp): Same. (strlen_pass::count_nonzero_bytes_addr): Same. (strlen_pass::handle_integral_assign): Same. * tree-switch-conversion.cc (bit_test_cluster::emit): Same. * tree-vect-loop-manip.cc (vect_gen_vector_loop_niters): Same. (vect_do_peeling): Same. * tree-vect-patterns.cc (vect_get_range_info): Same. (vect_recog_divmod_pattern): Same. * tree.cc (get_range_pos_neg): Same. * value-range.cc (debug): Remove value_range variants. * value-range.h (value_range): Remove typedef. * vr-values.cc (simplify_using_ranges::op_with_boolean_value_range_p): Replace value_range use with int_range_max or irange when appropriate. (check_for_binary_op_overflow): Same. (simplify_using_ranges::legacy_fold_cond_overflow): Same. (find_case_label_ranges): Same. (simplify_using_ranges::simplify_abs_using_ranges): Same. (test_for_singularity): Same. (simplify_using_ranges::simplify_compare_using_ranges_1): Same. (simplify_using_ranges::simplify_casted_compare): Same. (simplify_using_ranges::simplify_switch_using_ranges): Same. (simplify_conversion_using_ranges): Same. (simplify_using_ranges::two_valued_val_range_p): Same.
2024-04-04  fold-const: Handle NON_LVALUE_EXPR in native_encode_initializer [PR114537]  Jakub Jelinek  1 file  -0/+2
The following testcase is incorrectly rejected. The problem is that for bit-fields native_encode_initializer expects the corresponding CONSTRUCTOR elt value must be INTEGER_CST, but that isn't the case here, it is wrapped into NON_LVALUE_EXPR by maybe_wrap_with_location. We could STRIP_ANY_LOCATION_WRAPPER as well, but as all we are looking for is INTEGER_CST inside, just looking through NON_LVALUE_EXPR seems easier. 2024-04-04 Jakub Jelinek <jakub@redhat.com> PR c++/114537 * fold-const.cc (native_encode_initializer): Look through NON_LVALUE_EXPR if val is INTEGER_CST. * g++.dg/cpp2a/bit-cast16.C: New test.
2024-03-26  fold-const: Punt on MULT_EXPR in extract_muldiv MIN/MAX_EXPR case [PR111151]  Jakub Jelinek  1 file  -0/+21
As I've tried to explain in the comments, the extract_muldiv_1 MIN/MAX_EXPR optimization is wrong for code == MULT_EXPR. If the multiplication is done in unsigned type or in signed type with -fwrapv, it is fairly obvious that max (a, b) * c in many cases isn't equivalent to max (a * c, b * c) (or min if c is negative) due to overflows, but even for signed with undefined overflow, the optimization could turn something without UB in it (where say a * c invokes UB, but max (or min) picks the other operand where b * c doesn't). As for division/modulo, I think it is in most cases safe, except if the problematic INT_MIN / -1 case could be triggered, but we can just punt for MAX_EXPR because for MIN_EXPR if one operand is INT_MIN, we'd pick that operand already. It is just for completeness, match.pd already has an optimization which turns x / -1 into -x, so the division by zero is mostly theoretical. That is also why in the testcase the i case isn't actually miscompiled without the patch, while the c and f cases are. 2024-03-26 Jakub Jelinek <jakub@redhat.com> PR middle-end/111151 * fold-const.cc (extract_muldiv_1) <case MAX_EXPR>: Punt for MULT_EXPR altogether, or for MAX_EXPR if c is -1. * gcc.c-torture/execute/pr111151.c: New test.
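A hedged illustration of why the MULT_EXPR case is unsafe (not the pr111151.c testcase; the values are made up):
```
unsigned
wraps (void)
{
  unsigned a = 1, b = 0x80000000u;
  /* max (a, b) * 2 wraps to 0, while max (a * 2, b * 2) is 2, so
     distributing the multiplication over MAX changes the result.  */
  return (a > b ? a : b) * 2;
}
```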
2024-03-10  Fold: Fix up merge_truthop_with_opposite_arm for NaNs [PR95351]  Andrew Pinski  1 file  -1/+2
The problem here is that merge_truthop_with_opposite_arm would use the type of the result of the comparison rather than the operands of the comparison to figure out if we are honoring NaNs. This fixes that oversight and now we get the correct results in this case. Committed as obvious after a bootstrap/test on x86_64-linux-gnu. PR middle-end/95351 gcc/ChangeLog: * fold-const.cc (merge_truthop_with_opposite_arm): Use the type of the operands of the comparison and not the type of the comparison. gcc/testsuite/ChangeLog: * gcc.dg/float_opposite_arm-1.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
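A rough sketch of the pitfall (illustrative, not the float_opposite_arm-1.c testcase): the two arms below are opposites only when NaNs cannot occur, so the decision must be based on the float operands rather than on the integer type of the comparison result.
```
int
opposite_arms (float x, float y)
{
  return x < y || x >= y;   /* 0 when x or y is NaN, not always 1 */
}
```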
2024-02-26  fold-const: Avoid infinite recursion in +-*&|^minmax reassociation [PR114084]  Jakub Jelinek  1 file  -10/+41
In the following testcase we infinitely recurse during BIT_IOR_EXPR reassociation. One operand is (unsigned _BitInt(31)) a << 4 and another operand 2147483647 >> 1 | 80 where both the right shift and the | 80 trees have TREE_CONSTANT set, but weren't folded because of delayed folding, where some foldings are apparently done even in that case unfortunately. Now, the fold_binary_loc reassociation code splits both operands into variable part, minus variable part, constant part, minus constant part, literal part and minus literal parts, and to prevent infinite recursion punts if there are just 2 parts altogether from the 2 operands, then goes on with reassociation, merges first the corresponding parts from both operands and then some further merges. The problem with the above expressions is that we get 3 different objects, var0 (the left shift), con1 (the right shift) and lit1 (80), so the infinite recursion prevention doesn't trigger, and we eventually merge con1 with lit1, which effectively reconstructs the original op1, and then associate that with var0 which is the original op0, and associate_trees for that case calls fold_binary. There are some casts involved there too (the T typedef type and the underlying _BitInt type which are stripped with STRIP_NOPS). The following patch attempts to prevent this infinite recursion by tracking the origin (whether a certain var comes from nothing - 0, op0 - 1, op1 - 2 or both - 3) and propagating it through all the associate_trees calls which merge the vars. If near the end we'd try to merge what comes solely from op0 with what comes solely from op1 (or vice versa), the patch punts, because then it isn't any kind of reassociation between the two operands; if anything it should be handled when folding the suboperands. 2024-02-26 Jakub Jelinek <jakub@redhat.com> PR middle-end/114084 * fold-const.cc (fold_binary_loc): Avoid the final associate_trees if all subtrees of var0 come from one of the op0 or op1 operands and all subtrees of con0 come from the other one. Don't clear variables which are never used afterwards. * gcc.dg/bitint-94.c: New test.
2024-01-25  fold-const: Handle AND, IOR, XOR with stepped vectors [PR112971].  Robin Dapp  1 file  -0/+31
Found in PR112971 this patch adds folding support for bitwise operations of const duplicate zero/one vectors with stepped vectors. On riscv we have the situation that a folding would perpetually continue without simplifying because e.g. {0, 0, 0, ...} & {7, 6, 5, ...} would not be folded to {0, 0, 0, ...}. gcc/ChangeLog: PR middle-end/112971 * fold-const.cc (simplify_const_binop): New function for binop simplification of two constant vectors when element-wise handling is not necessary. (const_binop): Call new function. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr112971.c: New test.
2024-01-23  fold-const: Fold larger VIEW_CONVERT_EXPRs [PR113462]  Jakub Jelinek  1 file  -5/+17
On Mon, Jan 22, 2024 at 11:27:52AM +0100, Richard Biener wrote: > We run into > > static tree > native_interpret_int (tree type, const unsigned char *ptr, int len) > { > ... > if (total_bytes > len > || total_bytes * BITS_PER_UNIT > HOST_BITS_PER_DOUBLE_INT) > return NULL_TREE; > > OTOH using a V_C_E to "truncate" a _BitInt looks wrong? OTOH the > check doesn't really handle native_encode_expr using the "proper" > wide_int encoding however that's exactly handled. So it might be > a pre-existing issue that's only uncovered by large _BitInts > (__int128 might show similar issues?) I guess the || total_bytes * BITS_PER_UNIT > HOST_BITS_PER_DOUBLE_INT conditions make no sense, all we care is whether it fits in the buffer or not. But then there is fold_view_convert_expr (and other spots) which use /* We support up to 1024-bit values (for GCN/RISC-V V128QImode). */ unsigned char buffer[128]; or something similar. This patch fixes even that by using a XALLOCAVEC allocated buffer if the type size is 129 .. 8192 bytes. 2024-01-22 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/113462 * fold-const.cc (native_interpret_int): Don't punt if total_bytes is larger than HOST_BITS_PER_DOUBLE_INT / BITS_PER_UNIT. (fold_view_convert_expr): Use XALLOCAVEC buffers for types with sizes between 129 and 8192 bytes.
2024-01-12  middle-end/113344 - is_truth_type_for vs GENERIC tcc_comparison  Richard Biener  1 file  -1/+1
On GENERIC tcc_comparison can have int type so restrict the PR113126 fix to vector types. PR middle-end/113344 * match.pd ((double)float CMP (double)float -> float CMP float): Perform result type check only for vectors. * fold-const.cc (fold_binary_loc): Likewise.
2024-01-11  tree-optimization/113126 - vector extension compare optimization  Richard Biener  1 file  -1/+2
The following makes sure the resulting boolean type is the same when eliding a float extension. PR tree-optimization/113126 * match.pd ((double)float CMP (double)float -> float CMP float): Make sure the boolean type is the same. * fold-const.cc (fold_binary_loc): Likewise. * gcc.dg/torture/pr113126.c: New testcase.
2024-01-03  Update copyright years.  Jakub Jelinek  1 file  -1/+1
2023-12-11  MATCH: (convert)(zero_one !=/== 0/1) for outer type and zero_one type are the same  Andrew Pinski  1 file  -27/+0
When I moved two_value to match.pd, I removed the check for the {0,+-1} as I had placed it after the {0,+-1} case for cond in match.pd. In the case of {0,+-1} and non boolean, before we would optimize those cases to just `(convert)a` but after we would get `(convert)(a != 0)` which was anyway not simplified further to `(convert)a`. So this adds a pattern to match `(convert)(zeroone != 0)` and simplify to `(convert)zeroone`. Also this optimizes (convert)(zeroone == 0) into (zeroone^1) if the types match. This also removes the opposite transformation from fold. The opposite transformation was added with https://gcc.gnu.org/pipermail/gcc-patches/2006-February/190514.html It is no longer considered the canonicalization either, even VRP will transform it back into `(~a) & 1` so removing it is a good idea. Note the testcase pr69270.c needed a slight update due to not matching exactly a scan pattern, this update makes it more robust and will match before and afterwards and if there are other changes in this area too. Note the testcase gcc.target/i386/pr110790-2.c needs a slight update for better code generation in LP64 bit mode. Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: PR tree-optimization/111972 PR tree-optimization/110637 * match.pd (`(convert)(zeroone !=/== CST)`): Match and simplify to ((convert)zeroone){,^1}. * fold-const.cc (fold_binary_loc): Remove transformation of `(~a) & 1` and `(a ^ 1) & 1` into `(convert)(a == 0)`. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/pr110637-1.c: New test. * gcc.dg/tree-ssa/pr110637-2.c: New test. * gcc.dg/tree-ssa/pr110637-3.c: New test. * gcc.dg/tree-ssa/pr111972-1.c: New test. * gcc.dg/tree-ssa/pr69270.c: Update testcase. * gcc.target/i386/pr110790-2.c: Update testcase. * gcc.dg/fold-even-1.c: Removed. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
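A minimal sketch of the simplification in C terms (illustrative; not one of the listed testcases):
```
int
ne_form (_Bool b)
{
  return b != 0;   /* b is known to be 0 or 1, so this is just (int) b */
}

int
eq_form (_Bool b)
{
  return b == 0;   /* roughly b ^ 1 (an XOR with 1) rather than a comparison */
}
```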
2023-11-29  fold-const: Fix up multiple_of_p [PR112733]  Jakub Jelinek  1 file  -1/+1
We ICE on the following testcase when wi::multiple_of_p is called on widest_int 1 and -128 with UNSIGNED. I still need to work on the actual wide-int.cc issue, the latest patch attached to the PR regressed bitint-{38,39}.c, so will need to debug that, but there is a clear bug on the fold-const.cc side as well - widest_int is a signed representation by definition, using UNSIGNED with it certainly doesn't match what was intended, because -128 as the second operand effectively means unsigned 131072 bit 0xfffff............ffff80 integer, not the signed char -128 that appeared in the source. In the INTEGER_CST case a few lines above this we already use case INTEGER_CST: if (TREE_CODE (bottom) != INTEGER_CST || integer_zerop (bottom)) return false; return wi::multiple_of_p (wi::to_widest (top), wi::to_widest (bottom), SIGNED); so I think using SIGNED with widest_int is best there (compared to the other choices in the PR). 2023-11-29 Jakub Jelinek <jakub@redhat.com> PR middle-end/112733 * fold-const.cc (multiple_of_p): Pass SIGNED rather than UNSIGNED for wi::multiple_of_p on widest_int arguments. * gcc.dg/pr112733.c: New test.
2023-11-27  PR111754: Rework encoding of result for VEC_PERM_EXPR with constant input vectors.  Prathamesh Kulkarni  1 file  -19/+91
gcc/ChangeLog: PR middle-end/111754 * fold-const.cc (fold_vec_perm_cst): Set result's encoding to sel's encoding, and set res_nelts_per_pattern to 2 if sel contains stepped sequence but input vectors do not. (test_nunits_min_2): New test Case 8. (test_nunits_min_4): New tests Case 8 and Case 9. gcc/testsuite/ChangeLog: PR middle-end/111754 * gcc.target/aarch64/sve/slp_3.c: Adjust code-gen. * gcc.target/aarch64/sve/slp_4.c: Likewise. * gcc.dg/vect/pr111754.c: New test. Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>
2023-11-10  Handle constant CONSTRUCTORs in operand_compare  Eric Botcazou  1 file  -7/+67
This teaches operand_compare to compare constant CONSTRUCTORs, which is quite helpful for so-called fat pointers in Ada, i.e. objects that are semantically pointers but are represented by structures made up of two pointers. This is modeled on the implementation present in the ICF pass. gcc/ * fold-const.cc (operand_compare::operand_equal_p) <CONSTRUCTOR>: Deal with nonempty constant CONSTRUCTORs. (operand_compare::hash_operand) <CONSTRUCTOR>: Hash DECL_FIELD_OFFSET and DECL_FIELD_BIT_OFFSET for FIELD_DECLs. gcc/testsuite/ * gnat.dg/opt103.ads, gnat.dg/opt103.adb: New test.
2023-10-19  PR111648: Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding.  Prathamesh Kulkarni  1 file  -11/+103
gcc/ChangeLog: PR tree-optimization/111648 * fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): If a1 chooses base element from arg, ensure that it's a natural stepped sequence. (build_vec_cst_rand): New param natural_stepped and use it to construct a naturally stepped sequence. (test_nunits_min_2): Add new unit tests Case 6 and Case 7.
2023-10-16  use more get_range_query  Jiufu Guo  1 file  -5/+1
For "get_global_range_query" SSA_NAME_RANGE_INFO can be queried. For "get_range_query", it could get more context-aware range info. And look at the implementation of "get_range_query", it returns global range if no local fun info. So, if not quering for SSA_NAME and not chaning the IL, it would be ok to use get_range_query to replace get_global_range_query. gcc/ChangeLog: * fold-const.cc (expr_not_equal_to): Replace get_global_range_query by get_range_query. * gimple-fold.cc (size_must_be_zero_p): Likewise. * gimple-range-fold.cc (fur_source::fur_source): Likewise. * gimple-ssa-warn-access.cc (check_nul_terminated_array): Likewise. * tree-dfa.cc (get_ref_base_and_extent): Likewise.
2023-10-12  wide-int: Allow up to 16320 bits wide_int and change widest_int precision to 32640 bits [PR102989]  Jakub Jelinek  1 file  -3/+11
As mentioned in the _BitInt support thread, _BitInt(N) is currently limited by the wide_int/widest_int maximum precision limitation, which is, depending on the target, 191, 319, 575 or 703 bits (one less than WIDE_INT_MAX_PRECISION). That is a fairly low limit for _BitInt, especially on the targets with the 191 bit limitation. The following patch bumps that limit to 16319 bits on all arches (which support _BitInt at all), which is the limit imposed by INTEGER_CST representation (unsigned char members holding number of HOST_WIDE_INT limbs). In order to achieve that, wide_int is changed from a trivially copyable type which contained just an inline array of WIDE_INT_MAX_ELTS (3, 5, 9 or 11 limbs depending on target) limbs into a non-trivially copy constructible, copy assignable and destructible type which for the usual small cases (up to WIDE_INT_MAX_INL_ELTS which is the former WIDE_INT_MAX_ELTS) still uses an inline array of limbs, but for larger precisions uses heap allocated limb array. This makes wide_int unusable in GC structures, so for dwarf2out which was the only place which needed it there is a new rwide_int type (restricted wide_int) which supports only up to RWIDE_INT_MAX_ELTS limbs inline and is trivially copyable (dwarf2out should never deal with large _BitInt constants, those should have been lowered earlier). Similarly, widest_int has been changed from a trivially copyable type which contained also an inline array of WIDE_INT_MAX_ELTS limbs (but unlike wide_int didn't contain precision and assumed that to be WIDE_INT_MAX_PRECISION) into a non-trivially copy constructible, copy assignable and destructible type which has always WIDEST_INT_MAX_PRECISION precision (32640 bits currently, twice as much as INTEGER_CST limitation allows) and unlike wide_int decides depending on get_len () value whether it uses an inline array (again, up to WIDE_INT_MAX_INL_ELTS) or heap allocated one. In wide-int.h this means we need to estimate an upper bound on how many limbs will wide-int.cc (usually, sometimes wide-int.h) need to write, heap allocate if needed based on that estimation and upon set_len which is done at the end if we guessed over WIDE_INT_MAX_INL_ELTS and allocated dynamically, while we actually need less than that copy/deallocate. The inexact guesses are needed because the exact computation of the length in wide-int.cc is sometimes quite complex and especially canonicalize at the end can decrease it. widest_int is again because of this not usable in GC structures, so cfgloop.h has been changed to use fixed_wide_int_storage <WIDE_INT_MAX_INL_PRECISION> and punt if we'd have larger _BitInt based iterators, programs having more than 128-bit iterators will be hopefully rare and I think it is fine to treat loops with more than 2^127 iterations as effectively possibly infinite, omp-general.cc is changed to use fixed_wide_int_storage <1024>, as it better should support scores with the same precision on all arches. Code which used WIDE_INT_PRINT_BUFFER_SIZE sized buffers for printing wide_int/widest_int into buffer had to be changed to use XALLOCAVEC for larger lengths. On x86_64, the patch in --enable-checking=yes,rtl,extra configured bootstrapped cc1plus enlarges the .text section by 1.01% - from 0x25725a5 to 0x25e5555 and similarly at least when compiling insn-recog.cc with the usual bootstrap option slows compilation down by 1.01%, user 4m22.046s and 4m22.384s on vanilla trunk vs. 4m25.947s and 4m25.581s on patched trunk.
I'm afraid some code size growth and compile time slowdown is unavoidable in this case, we use wide_int and widest_int everywhere, and while the rare cases are marked with UNLIKELY macros, it still means extra checks for it. The patch also regresses +FAIL: gm2/pim/fail/largeconst.mod, -O +FAIL: gm2/pim/fail/largeconst.mod, -O -g +FAIL: gm2/pim/fail/largeconst.mod, -O3 -fomit-frame-pointer +FAIL: gm2/pim/fail/largeconst.mod, -O3 -fomit-frame-pointer -finline-functions +FAIL: gm2/pim/fail/largeconst.mod, -Os +FAIL: gm2/pim/fail/largeconst.mod, -g +FAIL: gm2/pim/fail/largeconst2.mod, -O +FAIL: gm2/pim/fail/largeconst2.mod, -O -g +FAIL: gm2/pim/fail/largeconst2.mod, -O3 -fomit-frame-pointer +FAIL: gm2/pim/fail/largeconst2.mod, -O3 -fomit-frame-pointer -finline-functions +FAIL: gm2/pim/fail/largeconst2.mod, -Os +FAIL: gm2/pim/fail/largeconst2.mod, -g tests, which previously were rejected with error: constant literal ‘12345678912345678912345679123456789123456789123456789123456789123456791234567891234567891234567891234567891234567912345678912345678912345678912345678912345679123456789123456789’ exceeds internal ZTYPE range kind of errors, but now are accepted. Seems the FE tries to parse constants into widest_int in that case and only diagnoses if widest_int overflows, that seems wrong, it should at least punt if stuff doesn't fit into WIDE_INT_MAX_PRECISION, but perhaps far less than that, if it wants support for middle-end for precisions above 128-bit, it better should be using BITINT_TYPE. Will file a PR and defer to Modula2 maintainer. 2023-10-12 Jakub Jelinek <jakub@redhat.com> PR c/102989 * wide-int.h: Adjust file comment. (WIDE_INT_MAX_INL_ELTS): Define to former value of WIDE_INT_MAX_ELTS. (WIDE_INT_MAX_INL_PRECISION): Define. (WIDE_INT_MAX_ELTS): Change to 255. Assert that WIDE_INT_MAX_INL_ELTS is smaller than WIDE_INT_MAX_ELTS. (RWIDE_INT_MAX_ELTS, RWIDE_INT_MAX_PRECISION, WIDEST_INT_MAX_ELTS, WIDEST_INT_MAX_PRECISION): Define. (WI_BINARY_RESULT_VAR, WI_UNARY_RESULT_VAR): Change write_val callers to pass 0 as a new argument. (class widest_int_storage): Likewise. (widest_int, widest2_int): Change typedefs to use widest_int_storage rather than fixed_wide_int_storage. (enum wi::precision_type): Add INL_CONST_PRECISION enumerator. (struct binary_traits): Add partial specializations for INL_CONST_PRECISION. (generic_wide_int): Add needs_write_val_arg static data member. (int_traits): Likewise. (wide_int_storage): Replace val non-static data member with a union u of it and HOST_WIDE_INT *valp. Declare copy constructor, copy assignment operator and destructor. Add unsigned int argument to write_val. (wide_int_storage::wide_int_storage): Initialize precision to 0 in the default ctor. Remove unnecessary {}s around STATIC_ASSERTs. Assert in non-default ctor T's precision_type is not INL_CONST_PRECISION and allocate u.valp for large precision. Add copy constructor. (wide_int_storage::~wide_int_storage): New. (wide_int_storage::operator=): Add copy assignment operator. In assignment operator remove unnecessary {}s around STATIC_ASSERTs, assert ctor T's precision_type is not INL_CONST_PRECISION and if precision changes, deallocate and/or allocate u.valp. (wide_int_storage::get_val): Return u.valp rather than u.val for large precision. (wide_int_storage::write_val): Likewise. Add an unused unsigned int argument. (wide_int_storage::set_len): Use write_val instead of writing val directly. (wide_int_storage::from, wide_int_storage::from_array): Adjust write_val callers. 
(wide_int_storage::create): Allocate u.valp for large precisions. (wi::int_traits <wide_int_storage>::get_binary_precision): New. (fixed_wide_int_storage::fixed_wide_int_storage): Make default ctor defaulted. (fixed_wide_int_storage::write_val): Add unused unsigned int argument. (fixed_wide_int_storage::from, fixed_wide_int_storage::from_array): Adjust write_val callers. (wi::int_traits <fixed_wide_int_storage>::get_binary_precision): New. (WIDEST_INT): Define. (widest_int_storage): New template class. (wi::int_traits <widest_int_storage>): New. (trailing_wide_int_storage::write_val): Add unused unsigned int argument. (wi::get_binary_precision): Use wi::int_traits <WI_BINARY_RESULT (T1, T2)>::get_binary_precision rather than get_precision on get_binary_result. (wi::copy): Adjust write_val callers. Don't call set_len if needs_write_val_arg. (wi::bit_not): If result.needs_write_val_arg, call write_val again with upper bound estimate of len. (wi::sext, wi::zext, wi::set_bit): Likewise. (wi::bit_and, wi::bit_and_not, wi::bit_or, wi::bit_or_not, wi::bit_xor, wi::add, wi::sub, wi::mul, wi::mul_high, wi::div_trunc, wi::div_floor, wi::div_ceil, wi::div_round, wi::divmod_trunc, wi::mod_trunc, wi::mod_floor, wi::mod_ceil, wi::mod_round, wi::lshift, wi::lrshift, wi::arshift): Likewise. (wi::bswap, wi::bitreverse): Assert result.needs_write_val_arg is false. (gt_ggc_mx, gt_pch_nx): Remove generic template for all generic_wide_int, instead add functions and templates for each storage of generic_wide_int. Make functions for generic_wide_int <wide_int_storage> and templates for generic_wide_int <widest_int_storage <N>> deleted. (wi::mask, wi::shifted_mask): Adjust write_val calls. * wide-int.cc (zeros): Decrease array size to 1. (BLOCKS_NEEDED): Use CEIL. (canonize): Use HOST_WIDE_INT_M1. (wi::from_buffer): Pass 0 to write_val. (wi::to_mpz): Use CEIL. (wi::from_mpz): Likewise. Pass 0 to write_val. Use WIDE_INT_MAX_INL_ELTS instead of WIDE_INT_MAX_ELTS. (wi::mul_internal): Use WIDE_INT_MAX_INL_PRECISION instead of MAX_BITSIZE_MODE_ANY_INT in automatic array sizes, for prec above WIDE_INT_MAX_INL_PRECISION estimate precision from lengths of operands. Use XALLOCAVEC allocated buffers for prec above WIDE_INT_MAX_INL_PRECISION. (wi::divmod_internal): Likewise. (wi::lshift_large): For len > WIDE_INT_MAX_INL_ELTS estimate it from xlen and skip. (rshift_large_common): Remove xprecision argument, add len argument with len computed in caller. Don't return anything. (wi::lrshift_large, wi::arshift_large): Compute len here and pass it to rshift_large_common, for lengths above WIDE_INT_MAX_INL_ELTS using estimations from xlen if possible. (assert_deceq, assert_hexeq): For lengths above WIDE_INT_MAX_INL_ELTS use XALLOCAVEC allocated buffer. (test_printing): Use WIDE_INT_MAX_INL_PRECISION instead of WIDE_INT_MAX_PRECISION. * wide-int-print.h (WIDE_INT_PRINT_BUFFER_SIZE): Use WIDE_INT_MAX_INL_PRECISION instead of WIDE_INT_MAX_PRECISION. * wide-int-print.cc (print_decs, print_decu, print_hex): For lengths above WIDE_INT_MAX_INL_ELTS use XALLOCAVEC allocated buffer. * tree.h (wi::int_traits<extended_tree <N>>): Change precision_type to INL_CONST_PRECISION for N == ADDR_MAX_PRECISION. (widest_extended_tree): Use WIDEST_INT_MAX_PRECISION instead of WIDE_INT_MAX_PRECISION. (wi::ints_for): Use int_traits <extended_tree <N> >::precision_type instead of hard coded CONST_PRECISION. (widest2_int_cst): Use WIDEST_INT_MAX_PRECISION instead of WIDE_INT_MAX_PRECISION. 
(wi::extended_tree <N>::get_len): Use WIDEST_INT_MAX_PRECISION rather than WIDE_INT_MAX_PRECISION. (wi::ints_for::zero): Use wi::int_traits <wi::extended_tree <N> >::precision_type instead of wi::CONST_PRECISION. * tree.cc (build_replicated_int_cst): Formatting fix. Use WIDE_INT_MAX_INL_ELTS rather than WIDE_INT_MAX_ELTS. * print-tree.cc (print_node): Don't print TREE_UNAVAILABLE on INTEGER_CSTs, TREE_VECs or SSA_NAMEs. * double-int.h (wi::int_traits <double_int>::precision_type): Change to INL_CONST_PRECISION from CONST_PRECISION. * poly-int.h (struct poly_coeff_traits): Add partial specialization for wi::INL_CONST_PRECISION. * cfgloop.h (bound_wide_int): New typedef. (struct nb_iter_bound): Change bound type from widest_int to bound_wide_int. (struct loop): Change nb_iterations_upper_bound, nb_iterations_likely_upper_bound and nb_iterations_estimate type from widest_int to bound_wide_int. * cfgloop.cc (record_niter_bound): Return early if wi::min_precision of i_bound is too large for bound_wide_int. Adjustments for the widest_int to bound_wide_int type change in non-static data members. (get_estimated_loop_iterations, get_max_loop_iterations, get_likely_max_loop_iterations): Adjustments for the widest_int to bound_wide_int type change in non-static data members. * tree-vect-loop.cc (vect_transform_loop): Likewise. * tree-ssa-loop-niter.cc (do_warn_aggressive_loop_optimizations): Use XALLOCAVEC allocated buffer for i_bound len above WIDE_INT_MAX_INL_ELTS. (record_estimate): Return early if wi::min_precision of i_bound is too large for bound_wide_int. Adjustments for the widest_int to bound_wide_int type change in non-static data members. (wide_int_cmp): Use bound_wide_int instead of widest_int. (bound_index): Use bound_wide_int instead of widest_int. (discover_iteration_bound_by_body_walk): Likewise. Use widest_int::from to convert it to widest_int when passed to record_niter_bound. (maybe_lower_iteration_bound): Use widest_int::from to convert it to widest_int when passed to record_niter_bound. (estimate_numbers_of_iteration): Don't record upper bound if loop->nb_iterations has too large precision for bound_wide_int. (n_of_executions_at_most): Use widest_int::from. * tree-ssa-loop-ivcanon.cc (remove_redundant_iv_tests): Adjust for the widest_int to bound_wide_int changes. * match.pd (fold_sign_changed_comparison simplification): Use wide_int::from on wi::to_wide instead of wi::to_widest. * value-range.h (irange::maybe_resize): Avoid using memcpy on non-trivially copyable elements. * value-range.cc (irange_bitmask::dump): Use XALLOCAVEC allocated buffer for mask or value len above WIDE_INT_PRINT_BUFFER_SIZE. * fold-const.cc (fold_convert_const_int_from_int, fold_unary_loc): Use wide_int::from on wi::to_wide instead of wi::to_widest. * tree-ssa-ccp.cc (bit_value_binop): Zero extend r1max from width before calling wi::udiv_trunc. * lto-streamer-out.cc (output_cfg): Adjustments for the widest_int to bound_wide_int type change in non-static data members. * lto-streamer-in.cc (input_cfg): Likewise. (lto_input_tree_1): Use WIDE_INT_MAX_INL_ELTS rather than WIDE_INT_MAX_ELTS. For length above WIDE_INT_MAX_INL_ELTS use XALLOCAVEC allocated buffer. Formatting fix. * data-streamer-in.cc (streamer_read_wide_int, streamer_read_widest_int): Likewise. * tree-affine.cc (aff_combination_expand): Use placement new to construct name_expansion. (free_name_expansion): Destruct name_expansion. * gimple-ssa-strength-reduction.cc (struct slsr_cand_d): Change index type from widest_int to offset_int. 
(class incr_info_d): Change incr type from widest_int to offset_int. (alloc_cand_and_find_basis, backtrace_base_for_ref, restructure_reference, slsr_process_ref, create_mul_ssa_cand, create_mul_imm_cand, create_add_ssa_cand, create_add_imm_cand, slsr_process_add, cand_abs_increment, replace_mult_candidate, replace_unconditional_candidate, incr_vec_index, create_add_on_incoming_edge, create_phi_basis_1, replace_conditional_candidate, record_increment, record_phi_increments_1, phi_incr_cost_1, phi_incr_cost, lowest_cost_path, total_savings, ncd_with_phi, ncd_of_cand_and_phis, nearest_common_dominator_for_cands, insert_initializers, all_phi_incrs_profitable_1, replace_one_candidate, replace_profitable_candidates): Use offset_int rather than widest_int and wi::to_offset rather than wi::to_widest. * real.cc (real_to_integer): Use WIDE_INT_MAX_INL_ELTS rather than 2 * WIDE_INT_MAX_ELTS and for words above that use XALLOCAVEC allocated buffer. * tree-ssa-loop-ivopts.cc (niter_for_exit): Use placement new to construct tree_niter_desc and destruct it on failure. (free_tree_niter_desc): Destruct tree_niter_desc if value is non-NULL. * gengtype.cc (main): Remove widest_int handling. * graphite-isl-ast-to-gimple.cc (widest_int_from_isl_expr_int): Use WIDEST_INT_MAX_ELTS instead of WIDE_INT_MAX_ELTS. * gimple-ssa-warn-alloca.cc (pass_walloca::execute): Use WIDE_INT_MAX_INL_PRECISION instead of WIDE_INT_MAX_PRECISION and assert get_len () fits into it. * value-range-pretty-print.cc (vrange_printer::print_irange_bitmasks): For mask or value lengths above WIDE_INT_MAX_INL_ELTS use XALLOCAVEC allocated buffer. * gimple-ssa-sprintf.cc (adjust_range_for_overflow): Use wide_int::from on wi::to_wide instead of wi::to_widest. * omp-general.cc (score_wide_int): New typedef. (omp_context_compute_score): Use score_wide_int instead of widest_int and adjust for those changes. (struct omp_declare_variant_entry): Change score and score_in_declare_simd_clone non-static data member type from widest_int to score_wide_int. (omp_resolve_late_declare_variant, omp_resolve_declare_variant): Use score_wide_int instead of widest_int and adjust for those changes. (omp_lto_output_declare_variant_alt): Likewise. (omp_lto_input_declare_variant_alt): Likewise. * godump.cc (go_output_typedef): Assert get_len () is smaller than WIDE_INT_MAX_INL_ELTS. gcc/c-family/ * c-warn.cc (match_case_to_enum_1): Use wi::to_wide just once instead of 3 times, assert get_len () is smaller than WIDE_INT_MAX_INL_ELTS. gcc/testsuite/ * gcc.dg/bitint-38.c: New test.
2023-10-10  tree-optimization/111751 - support 1024 bit vector constant reinterpretation  Richard Biener  1 file  -2/+2
The following ups the limit in fold_view_convert_expr to handle 1024bit vectors as used by GCN and RVV. It also robustifies the handling in visit_reference_op_load to properly give up when constants cannot be re-interpreted. PR tree-optimization/111751 * fold-const.cc (fold_view_convert_expr): Up the buffer size to 128 bytes. * tree-ssa-sccvn.cc (visit_reference_op_load): Special case constants, giving up when re-interpretation to the target type fails.
2023-09-29  Remove poly_int_pod  Richard Sandiford  1 file  -2/+2
poly_int was written before the switch to C++11 and so couldn't use explicit default constructors. This led to an awkward split between poly_int_pod and poly_int. poly_int simply inherited from poly_int_pod and added constructors, with the argumentless constructor having an empty body. But inheritance meant that poly_int had to repeat the assignment operators from poly_int_pod (again, no C++11, so no "using" to inherit base-class implementations). All that goes away if we switch to using default constructors. The main complication is ensuring that braced initialisation still gives a constexpr, so that static variables can be initialised without runtime code. The two problems here are: (1) When initialising a poly_int<N, wide_int> with fewer than N coefficients, the other coefficients need to be a zero of the same precision as the explicit coefficients. This was previously done in a for loop using wi::ints_for<...>::zero, but C++11 constexpr constructors can't have function bodies. The patch instead uses a series of delegated initialisers to fill in the implicit coefficients. (2) The initialisation in: void f(int x) { unsigned int foo {x}; } produces the warning: warning: narrowing conversion of 'x' from 'int' to 'unsigned int' [-Wnarrowing] whereas: void f(int x) { unsigned int foo = x; } does not. So switching to direct initialisation of the coeffs array would mean that: poly_uin64_t x = 0; would trigger a warning for using 0 rather than 0u. That seemed overly pedantic, so the patch adds explicit casts to the constructor. The complication is to do that without adding extra code to wide-int versions. The patch uses a new init_cast type for that. gcc/ * poly-int.h (poly_int_pod): Delete. (poly_coeff_traits::init_cast): New type. (poly_int_full, poly_int_hungry, poly_int_fullness): New structures. (poly_int): Replace constructors that take 1 and 2 coefficients with a general one that takes an arbitrary number of coefficients. Delegate initialization to two new private constructors, one of which uses the coefficients as-is and one of which adds an extra zero of the appropriate type (and precision, where applicable). (gt_ggc_mx, gt_pch_nx): Operate on poly_ints rather than poly_int_pods. * poly-int-types.h (poly_uint16_pod, poly_int64_pod, poly_uint64_pod) (poly_offset_int_pod, poly_wide_int_pod, poly_widest_int_pod): Delete. * gengtype.cc (main): Don't register poly_int64_pod. * calls.cc (initialize_argument_information): Use poly_int rather than poly_int_pod. (combine_pending_stack_adjustment_and_call): Likewise. * config/aarch64/aarch64.cc (pure_scalable_type_info): Likewise. * data-streamer.h (bp_unpack_poly_value): Likewise. * dwarf2cfi.cc (struct dw_trace_info): Likewise. (struct queued_reg_save): Likewise. * dwarf2out.h (struct dw_cfa_location): Likewise. * emit-rtl.h (struct incoming_args): Likewise. (struct rtl_data): Likewise. * expr.cc (get_bit_range): Likewise. (get_inner_reference): Likewise. * expr.h (get_bit_range): Likewise. * fold-const.cc (split_address_to_core_and_offset): Likewise. (ptr_difference_const): Likewise. * fold-const.h (ptr_difference_const): Likewise. * function.cc (try_fit_stack_local): Likewise. (instantiate_new_reg): Likewise. * function.h (struct expr_status): Likewise. (struct args_size): Likewise. * genmodes.cc (ZERO_COEFFS): Likewise. (mode_size_inline): Likewise. (mode_nunits_inline): Likewise. (emit_mode_precision): Likewise. (emit_mode_size): Likewise. (emit_mode_nunits): Likewise. * gimple-fold.cc (get_base_constructor): Likewise. 
* gimple-ssa-store-merging.cc (struct symbolic_number): Likewise. * inchash.h (class hash): Likewise. * ipa-modref-tree.cc (modref_access_node::dump): Likewise. * ipa-modref.cc (modref_access_analysis::merge_call_side_effects): Likewise. * ira-int.h (ira_spilled_reg_stack_slot): Likewise. * lra-eliminations.cc (self_elim_offsets): Likewise. * machmode.h (mode_size, mode_precision, mode_nunits): Likewise. * omp-low.cc (omplow_simd_context): Likewise. * pretty-print.cc (pp_wide_integer): Likewise. * pretty-print.h (pp_wide_integer): Likewise. * reload.cc (struct decomposition): Likewise. * reload.h (struct reload): Likewise. * reload1.cc (spill_stack_slot_width): Likewise. (struct elim_table): Likewise. (offsets_at): Likewise. (init_eliminable_invariants): Likewise. * rtl.h (union rtunion): Likewise. (poly_int_rtx_p): Likewise. (strip_offset): Likewise. (strip_offset_and_add): Likewise. * rtlanal.cc (strip_offset): Likewise. * tree-dfa.cc (get_ref_base_and_extent): Likewise. (get_addr_base_and_unit_offset_1): Likewise. (get_addr_base_and_unit_offset): Likewise. * tree-dfa.h (get_ref_base_and_extent): Likewise. (get_addr_base_and_unit_offset_1): Likewise. (get_addr_base_and_unit_offset): Likewise. * tree-ssa-loop-ivopts.cc (struct iv_use): Likewise. (strip_offset): Likewise. * tree-ssa-sccvn.h (struct vn_reference_op_struct): Likewise. * tree.cc (ptrdiff_tree_p): Likewise. * tree.h (poly_int_tree_p): Likewise. (ptrdiff_tree_p): Likewise. (get_inner_reference): Likewise. gcc/testsuite/ * gcc.dg/plugin/poly-int-tests.h (test_num_coeffs_extra): Use poly_int rather than poly_int_pod.
2023-09-12  fold-const: Handle BITINT_TYPE in range_check_type  Jakub Jelinek  1 file  -1/+6
When discussing PR111369 with Andrew Pinski, I've realized that I haven't added BITINT_TYPE handling to range_check_type. Right now (unsigned) max + 1 == (unsigned) min for signed _BitInt, so I think we don't need to do the extra hops for BITINT_TYPE (though possibly we don't need them for INTEGER_TYPE either in the two's complement world and we don't support anything else, though I really don't know if Ada or some other FEs don't create weird INTEGER_TYPEs). 2023-09-12 Jakub Jelinek <jakub@redhat.com> * fold-const.cc (range_check_type): Handle BITINT_TYPE like OFFSET_TYPE.
2023-09-09  Support folding min(poly,poly) to const  Lehua Ding  1 file  -0/+24
This patch adds support that tries to fold `MIN (poly, poly)` to a constant. Consider the following C Code: ``` void foo2 (int* restrict a, int* restrict b, int n) { for (int i = 0; i < 3; i += 1) a[i] += b[i]; } ``` Before this patch: ``` void foo2 (int * restrict a, int * restrict b, int n) { vector([4,4]) int vect__7.27; vector([4,4]) int vect__6.26; vector([4,4]) int vect__4.23; unsigned long _32; <bb 2> [local count: 268435456]: _32 = MIN_EXPR <3, POLY_INT_CST [4, 4]>; vect__4.23_20 = .MASK_LEN_LOAD (a_11(D), 32B, { -1, ... }, _32, 0); vect__6.26_15 = .MASK_LEN_LOAD (b_12(D), 32B, { -1, ... }, _32, 0); vect__7.27_9 = vect__6.26_15 + vect__4.23_20; .MASK_LEN_STORE (a_11(D), 32B, { -1, ... }, _32, 0, vect__7.27_9); [tail call] return; } ``` After this patch: ``` void foo2 (int * restrict a, int * restrict b, int n) { vector([4,4]) int vect__7.27; vector([4,4]) int vect__6.26; vector([4,4]) int vect__4.23; <bb 2> [local count: 268435456]: vect__4.23_20 = .MASK_LEN_LOAD (a_11(D), 32B, { -1, ... }, 3, 0); vect__6.26_15 = .MASK_LEN_LOAD (b_12(D), 32B, { -1, ... }, 3, 0); vect__7.27_9 = vect__6.26_15 + vect__4.23_20; .MASK_LEN_STORE (a_11(D), 32B, { -1, ... }, 3, 0, vect__7.27_9); [tail call] return; } ``` For RISC-V RVV, csrr and branch instructions can be reduced: Before this patch: ``` foo2: csrr a4,vlenb srli a4,a4,2 li a5,3 bleu a5,a4,.L5 mv a5,a4 .L5: vsetvli zero,a5,e32,m1,ta,ma ... ``` After this patch. ``` foo2: vsetivli zero,3,e32,m1,ta,ma ... ``` gcc/ChangeLog: * fold-const.cc (can_min_p): New function. (poly_int_binop): Try fold MIN_EXPR. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls/div-1.c: Adjust. * gcc.target/riscv/rvv/autovec/vls/shift-3.c: Adjust. * gcc.target/riscv/rvv/autovec/fold-min-poly.c: New test.
2023-09-07  middle-end: Avoid calling targetm.c.bitint_type_info inside of gcc_assert [PR102989]  Jakub Jelinek  1 file  -4/+4
On Thu, Sep 07, 2023 at 10:36:02AM +0200, Thomas Schwinge wrote: > Minor comment/question: are we doing away with the property that > 'assert'-like "calls" must not have side effects? Per 'gcc/system.h', > this is "OK" for 'gcc_assert' for '#if ENABLE_ASSERT_CHECKING' or > '#elif (GCC_VERSION >= 4005)' -- that is, GCC 4.5, which is always-true, > thus the "offending" '#else' is never active. However, it's different > for standard 'assert' and 'gcc_checking_assert', so I'm not sure if > that's a good property for 'gcc_assert' only? For example, see also > <https://gcc.gnu.org/PR6906> "warn about asserts with side effects", or > recent <https://gcc.gnu.org/PR111144> > "RFE: could -fanalyzer warn about assertions that have side effects?". You're right, the #define gcc_assert(EXPR) ((void)(0 && (EXPR))) fallback definition is incompatible with the way I've used it, so for --disable-checking built by non-GCC it would not work properly. 2023-09-07 Jakub Jelinek <jakub@redhat.com> PR c/102989 * expr.cc (expand_expr_real_1): Don't call targetm.c.bitint_type_info inside gcc_assert, as later code relies on it filling info variable. * gimple-fold.cc (clear_padding_bitint_needs_padding_p, clear_padding_type): Likewise. * varasm.cc (output_constant): Likewise. * fold-const.cc (native_encode_int, native_interpret_int): Likewise. * stor-layout.cc (finish_bitfield_representative, layout_type): Likewise. * gimple-lower-bitint.cc (bitint_precision_kind): Likewise.
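A generic illustration of the hazard being avoided (plain C with the standard assert macro, not GCC source; the helper name is made up): when the assertion can expand to something that never evaluates its argument, a side effect inside it is silently lost.
```
#include <assert.h>

int fill_info (int *info);   /* hypothetical helper with a side effect */

int
use_info (void)
{
  int info;
  assert (fill_info (&info) != 0);   /* wrong: with NDEBUG, info stays uninitialized */
  return info;
}
```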
2023-09-06  Middle-end _BitInt support [PR102989]  Jakub Jelinek  1 file  -9/+66
The following patch introduces the middle-end part of the _BitInt support, a new BITINT_TYPE, handling it where needed, except the lowering pass and sanitizer support. 2023-09-06 Jakub Jelinek <jakub@redhat.com> PR c/102989 * tree.def (BITINT_TYPE): New type. * tree.h (TREE_CHECK6, TREE_NOT_CHECK6): Define. (NUMERICAL_TYPE_CHECK, INTEGRAL_TYPE_P): Include BITINT_TYPE. (BITINT_TYPE_P): Define. (CONSTRUCTOR_BITFIELD_P): Return true even for BLKmode bit-fields if they have BITINT_TYPE type. (tree_check6, tree_not_check6): New inline functions. (any_integral_type_check): Include BITINT_TYPE. (build_bitint_type): Declare. * tree.cc (tree_code_size, wide_int_to_tree_1, cache_integer_cst, build_zero_cst, type_hash_canon_hash, type_cache_hasher::equal, type_hash_canon): Handle BITINT_TYPE. (bitint_type_cache): New variable. (build_bitint_type): New function. (signed_or_unsigned_type_for, verify_type_variant, verify_type): Handle BITINT_TYPE. (tree_cc_finalize): Free bitint_type_cache. * builtins.cc (type_to_class): Handle BITINT_TYPE. (fold_builtin_unordered_cmp): Handle BITINT_TYPE like INTEGER_TYPE. * cfgexpand.cc (expand_debug_expr): Punt on BLKmode BITINT_TYPE INTEGER_CSTs. * convert.cc (convert_to_pointer_1, convert_to_real_1, convert_to_complex_1): Handle BITINT_TYPE like INTEGER_TYPE. (convert_to_integer_1): Likewise. For BITINT_TYPE don't check GET_MODE_PRECISION (TYPE_MODE (type)). * doc/generic.texi (BITINT_TYPE): Document. * doc/tm.texi.in (TARGET_C_BITINT_TYPE_INFO): New. * doc/tm.texi: Regenerated. * dwarf2out.cc (base_type_die, is_base_type, modified_type_die, gen_type_die_with_usage): Handle BITINT_TYPE. (rtl_for_decl_init): Punt on BLKmode BITINT_TYPE INTEGER_CSTs or handle those which fit into shwi. * expr.cc (expand_expr_real_1): Define EXTEND_BITINT macro, reduce to bitfield precision reads from BITINT_TYPE vars, parameters or memory locations. Expand large/huge BITINT_TYPE INTEGER_CSTs into memory. * fold-const.cc (fold_convert_loc, make_range_step): Handle BITINT_TYPE. (extract_muldiv_1): For BITINT_TYPE use TYPE_PRECISION rather than GET_MODE_SIZE (SCALAR_INT_TYPE_MODE). (native_encode_int, native_interpret_int, native_interpret_expr): Handle BITINT_TYPE. * gimple-expr.cc (useless_type_conversion_p): Make BITINT_TYPE to some other integral type or vice versa conversions non-useless. * gimple-fold.cc (gimple_fold_builtin_memset): Punt for BITINT_TYPE. (clear_padding_unit): Mention in comment that _BitInt types don't need to fit either. (clear_padding_bitint_needs_padding_p): New function. (clear_padding_type_may_have_padding_p): Handle BITINT_TYPE. (clear_padding_type): Likewise. * internal-fn.cc (expand_mul_overflow): For unsigned non-mode precision operands force pos_neg? to 1. (expand_MULBITINT, expand_DIVMODBITINT, expand_FLOATTOBITINT, expand_BITINTTOFLOAT): New functions. * internal-fn.def (MULBITINT, DIVMODBITINT, FLOATTOBITINT, BITINTTOFLOAT): New internal functions. * internal-fn.h (expand_MULBITINT, expand_DIVMODBITINT, expand_FLOATTOBITINT, expand_BITINTTOFLOAT): Declare. * match.pd (non-equality compare simplifications from fold_binary): Punt if TYPE_MODE (arg1_type) is BLKmode. * pretty-print.h (pp_wide_int): Handle printing of large precision wide_ints which would buffer overflow digit_buffer. * stor-layout.cc (finish_bitfield_representative): For bit-fields with BITINT_TYPE, prefer representatives with precisions in multiple of limb precision. (layout_type): Handle BITINT_TYPE. Handle COMPLEX_TYPE with BLKmode element type and assert it is BITINT_TYPE. 
* target.def (bitint_type_info): New C target hook. * target.h (struct bitint_info): New type. * targhooks.cc (default_bitint_type_info): New function. * targhooks.h (default_bitint_type_info): Declare. * tree-pretty-print.cc (dump_generic_node): Handle BITINT_TYPE. Handle printing large wide_ints which would buffer overflow digit_buffer. * tree-ssa-sccvn.cc: Include target.h. (eliminate_dom_walker::eliminate_stmt): Punt for large/huge BITINT_TYPE. * tree-switch-conversion.cc (jump_table_cluster::emit): For more than 64-bit BITINT_TYPE subtract low bound from expression and cast to 64-bit integer type both the controlling expression and case labels. * typeclass.h (enum type_class): Add bitint_type_class enumerator. * varasm.cc (output_constant): Handle BITINT_TYPE INTEGER_CSTs. * vr-values.cc (check_for_binary_op_overflow): Use widest2_int rather than widest_int. (simplify_using_ranges::simplify_internal_call_using_ranges): Use unsigned_type_for rather than build_nonstandard_integer_type.
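A trivial usage sketch of the feature (illustrative only): the point of the patch is that C code like this now has a middle-end representation via BITINT_TYPE.
```
_BitInt(256)
add_bitint (_BitInt(256) a, _BitInt(256) b)
{
  return a + b;
}
```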
2023-08-21  PR111048: Set arg_npatterns correctly.  Prathamesh Kulkarni  1 file  -7/+37
In valid_mask_for_fold_vec_perm_cst we set arg_npatterns always to VECTOR_CST_NPATTERNS (arg0) because of (q1 & 0) == 0: /* Ensure that the stepped sequence always selects from the same input pattern. */ unsigned arg_npatterns = ((q1 & 0) == 0) ? VECTOR_CST_NPATTERNS (arg0) : VECTOR_CST_NPATTERNS (arg1); resulting in wrong code-gen issues. The patch fixes this by changing the condition to (q1 & 1) == 0. gcc/ChangeLog: PR tree-optimization/111048 * fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): Set arg_npatterns correctly. (fold_vec_perm_cst): Remove workaround and again call valid_mask_fold_vec_perm_cst_p for both VLS and VLA vectors. (test_fold_vec_perm_cst::test_nunits_min_4): Add test-case.
2023-08-18  tree-optimization/111048 - avoid flawed logic in fold_vec_perm  Richard Biener  1 file  -6/+6
The following avoids running into somehow flawed logic in fold_vec_perm for non-VLA vectors. PR tree-optimization/111048 * fold-const.cc (fold_vec_perm_cst): Check for non-VLA vectors first. * gcc.dg/torture/pr111048.c: New testcase.
2023-08-16  Extend fold_vec_perm to handle VLA vector_cst.  Prathamesh Kulkarni  1 file  -21/+778
The patch extends fold_vec_perm to fold VLA vector_csts. For eg: arg0 = {...}, npatterns = 1, nelts_per_pattern = 3, len = 4 + 4x arg1 = {...}, npatterns = 1, nelts_per_pattern = 3, len = 4 + 4x sel = { 0, len, ...} npatterns = 2, nelts_per_pattern = 1, len = 4 + 4x res = VEC_PERM_EXPR<arg0, arg1, sel> --> { arg0[0], arg1[0], ... }, npatterns = 2, nelts_per_pattern = 1 Eg 2: arg0 = {...}, npatterns = 1, nelts_per_pattern = 3, len = 2 + 2x arg1 = {...}, npatterns = 1, nelts_per_pattern = 3, len = 2 + 2x sel = {0, 1, 2, ...}, npatterns = 1, nelts_per_pattern = 3, len = 2 + 2x For this case the index 2 in sel is ambiguous for len 2 + 2x: if x = 0, runtime vector length = 2 and sel[i] will choose arg1[0] if x > 0, runtime vector length > 2 and sel[i] choose arg0[2]. So we return NULL_TREE for this case. This leads us to defining a constraint that a stepped sequence in sel, should only select a particular pattern from a particular input vector. Eg 3: arg0 = {...} npatterns = 1, nelts_per_pattern = 3, len = 4 + 4x arg1 = {...} npatterns = 1, nelts_per_pattern = 3, len = 4 + 4x sel = { len, 0, 2, ... } npatterns = 1, nelts_per_pattern = 3, len = 4 + 4x sel contains a single pattern with stepped sequence: {0, 2, ...}. Let, a1 = the first element of stepped part of sequence, which is 0. Let esel = number of total elements in stepped sequence. Thus, esel = len / sel_npatterns = (4 + 4x) / 1 = 4 + 4x Let S = step of the sequence, which is 2 in this case. Let ae = last element of the stepped sequence. Thus, ae = a1 + (esel - 2) * S = 0 + (4 + 4x - 2) * 2 = 4 + 8x To ensure that we select elements from the same input vector, a1 /trunc len = ae /trunc len. Let, q1 = a1 /trunc len = 0 / (4 + 4x) = 0 Let, qe = ae /trunc len = (4 + 8x) / (4 + 4x) = 1 Since q1 != qe, we cross input vectors, and return NULL_TREE for this case. However, if sel was: sel = {len, 0, 1, ...} The only change in this case is S = 1. So, ae = a1 + (esel - 2) * S = 0 + (4 + 4x - 2) * 1 = 2 + 4x In this case, a1/len == ae/len == 0, and the stepped sequence chooses all elements from arg0. Thus, res = {arg1[0], arg0[0], arg0[1], ...} For VLA folding, sel has to conform to constraints imposed in valid_mask_for_fold_vec_perm_cst_p. test_fold_vec_perm_cst defines several unit-tests for VLA folding. gcc/ChangeLog: * fold-const.cc (INCLUDE_ALGORITHM): Add Include. (valid_mask_for_fold_vec_perm_cst_p): New function. (fold_vec_perm_cst): Likewise. (fold_vec_perm): Adjust assert and call fold_vec_perm_cst. (test_fold_vec_perm_cst): New namespace. (test_fold_vec_perm_cst::build_vec_cst_rand): New function. (test_fold_vec_perm_cst::validate_res): Likewise. (test_fold_vec_perm_cst::validate_res_vls): Likewise. (test_fold_vec_perm_cst::builder_push_elems): Likewise. (test_fold_vec_perm_cst::test_vnx4si_v4si): Likewise. (test_fold_vec_perm_cst::test_v4si_vnx4si): Likewise. (test_fold_vec_perm_cst::test_all_nunits): Likewise. (test_fold_vec_perm_cst::test_nunits_min_2): Likewise. (test_fold_vec_perm_cst::test_nunits_min_4): Likewise. (test_fold_vec_perm_cst::test_nunits_min_8): Likewise. (test_fold_vec_perm_cst::test_nunits_max_4): Likewise. (test_fold_vec_perm_cst::is_simple_vla_size): Likewise. (test_fold_vec_perm_cst::test): Likewise. (fold_const_cc_tests): Call test_fold_vec_perm_cst::test. Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>
2023-07-18c++: constexpr bit_cast with empty fieldJason Merrill1-1/+2
The change to only cache constexpr calls that are reduced_constant_expression_p tripped on bit-cast3.C, which failed that predicate due to the presence of an empty field in the result of native_interpret_aggregate, which reduced_constant_expression_p rejects to avoid confusing output_constructor. This patch proposes to skip such fields in native_interpret_aggregate, since they aren't actually involved in the value representation. gcc/ChangeLog: * fold-const.cc (native_interpret_aggregate): Skip empty fields. gcc/cp/ChangeLog: * constexpr.cc (cxx_eval_bit_cast): Check that the result of native_interpret_aggregate doesn't need more evaluation.
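A minimal shape that exercises an empty field on the constexpr bit_cast path might look as follows; this is only illustrative, the actual bit-cast3.C test is more elaborate, and the particular struct layout here is an assumption:

  #include <bit>

  struct Empty {};
  struct S { [[no_unique_address]] Empty e; int i; };
  static_assert (sizeof (S) == sizeof (int), "empty member takes no storage");

  // native_interpret_aggregate has to build a value for S including the empty
  // field, and the result must still count as a reduced constant expression.
  constexpr S s = std::bit_cast<S> (42);
  static_assert (s.i == 42);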
2023-06-30fold-const+optabs: Change return type of predicate functions from int to boolUros Bizjak1-34/+34
Also change some internal variables and function arguments from int to bool. gcc/ChangeLog: * fold-const.h (multiple_of_p): Change return type from int to bool. * fold-const.cc (split_tree): Change negl_p, neg_litp_p, neg_conp_p and neg_var_p variables to bool. (const_binop): Change sat_p variable to bool. (merge_ranges): Change no_overlap variable to bool. (extract_muldiv_1): Change same_p variable to bool. (tree_swap_operands_p): Update function body for bool return type. (fold_truth_andor): Change commutative variable to bool. (multiple_of_p): Change return type from int to bool and adjust function body accordingly. * optabs.h (expand_twoval_unop): Change return type from int to bool. (expand_twoval_binop): Ditto. (can_compare_p): Ditto. (have_add2_insn): Ditto. (have_addptr3_insn): Ditto. (have_sub2_insn): Ditto. (have_insn_for): Ditto. * optabs.cc (add_equal_note): Ditto. (widen_operand): Change no_extend argument from int to bool. (expand_binop): Ditto. (expand_twoval_unop): Change return type from int to bool and adjust function body accordingly. (expand_twoval_binop): Ditto. (can_compare_p): Ditto. (have_add2_insn): Ditto. (have_addptr3_insn): Ditto. (have_sub2_insn): Ditto. (have_insn_for): Ditto.
2023-06-23Fix tree_simple_nonnegative_warnv_p for VECTOR_TYPEsRichard Biener1-1/+2
tree_simple_nonnegative_warnv_p ends up being called on VECTOR_TYPEs, where I think it even gets the wrong answer for tcc_comparison, since vector booleans are signed. The following properly guards that with !VECTOR_TYPE_P. * fold-const.cc (tree_simple_nonnegative_warnv_p): Guard the truth_value_p case with !VECTOR_TYPE_P.
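The point about vector booleans being signed can be seen with the generic vector extensions, where a lane that compares true is all-ones, i.e. -1 (an illustrative example, not part of the patch):

  typedef int v4si __attribute__ ((vector_size (16)));

  v4si
  lanes_lt (v4si a, v4si b)
  {
    // Each lane of the result is 0 (false) or -1 (true), so the result is
    // a signed vector and "nonnegative" is the wrong answer for true lanes.
    return a < b;
  }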
2023-06-23Bogus and missed folding on vector comparesRichard Biener1-2/+2
fold_binary tries to transform (double)float1 CMP (double)float2 into float1 CMP float2 but ends up using TYPE_PRECISION on the argument types. For vector types that compares the number of lanes, which should always be equal (so it is harmless in that it does not generate wrong code). The following instead properly uses element_precision. The same happens in the corresponding match.pd pattern. * fold-const.cc (fold_binary_loc): Use element_precision when trying the (double)float1 CMP (double)float2 to float1 CMP float2 simplification. * match.pd: Likewise.
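The scalar shape of the fold in question is shown below as a hedged illustration; for vector operands the same transform applies lane-wise, which is why the check has to compare element precisions rather than the TYPE_PRECISION of the vector types:

  bool
  cmp (float f1, float f2)
  {
    // Both operands are widened exactly from float, so the comparison can
    // be folded to f1 < f2 in the narrower type.
    return (double) f1 < (double) f2;
  }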
2023-06-16tree-optimization/110269 - restore missed condition foldingRichard Biener1-7/+0
The following makes sure we optimize x != 0 using range info through tree_expr_nonzero_p in match.pd. PR tree-optimization/110269 * fold-const.cc (fold_binary_loc): Merge x != 0 folding with tree_expr_nonzero_p ... * match.pd (cmp (convert? addr@0) integer_zerop): With this pattern. * gcc.dg/tree-ssa/pr110269.c: New testcase.
2023-06-13middle-end/110232 - fix native interpret of vector <signed-boolean:1>Richard Biener1-7/+4
The following fixes native interpretation of a buffer as a boolean vector with bit-precision elements, such as AVX512 mask vectors. The check whether the buffer covers the whole vector was broken for bit-precision elements, so the following instead implements it based on the vector type size. PR middle-end/110232 * fold-const.cc (native_interpret_vector): Use TYPE_SIZE_UNIT to check whether the buffer covers the whole vector. * gcc.target/i386/pr110232.c: New testcase.
2023-05-30Add a != MIN/MAX_VALUE_CST ? CST-+1 : a to minmax_from_comparisonAndrew Pinski1-0/+26
This patch adds support for the match that was implemented for PR 87913 in phiopt. It implements it by adding support for the check to minmax_from_comparison. It uses the range information, if available, which allows producing a MIN/MAX expression when comparing against the lower/upper bound of the range instead of the lower/upper bound of the type. minmax-22.c is the new testcase which tests the ranges part. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * fold-const.cc (minmax_from_comparison): Add support for NE_EXPR. * match.pd ((cond (cmp (convert1? x) c1) (convert2? x) c2) pattern): Add ne as a possible cmp. ((a CMP b) ? minmax<a, c> : minmax<b, c> pattern): Likewise. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/minmax-22.c: New test.
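One plausible source shape this targets, inferred from the pattern in the subject line rather than taken from the actual testcase:

  #include <climits>

  int
  f (int a)
  {
    // a != INT_MAX ? INT_MAX - 1 : a is equivalent to MAX_EXPR <a, INT_MAX - 1>;
    // with range information the bound can come from a's known range instead
    // of the extreme value of the type.
    return a != INT_MAX ? INT_MAX - 1 : a;
  }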
2023-05-23Fix handling of non-integral bit-fields in native_encode_initializerEric Botcazou1-10/+17
The encoder for CONSTRUCTORs assumes that all bit-fields (DECL_BIT_FIELD) have integral types, but that's not the case in Ada, where they may have pretty much any type, resulting in a wrong encoding for them. gcc/ * fold-const.cc (native_encode_initializer) <CONSTRUCTOR>: Apply the specific treatment for bit-fields only if they have an integral type and filter out non-integral bit-fields that do not start and end on a byte boundary. gcc/testsuite/ * gnat.dg/opt101.adb: New test. * gnat.dg/opt101_pkg.ads: New helper.
2023-05-20Move fold_single_bit_test to expr.cc from fold-const.ccAndrew Pinski1-112/+0
This is part 1 of an N-part patch set that will change the expansion of `(A & C) != 0` from using trees to expanding directly, so that later on we can do some cost analysis. Since the only user of fold_single_bit_test is now expand, move it there. gcc/ChangeLog: * fold-const.cc (fold_single_bit_test_into_sign_test): Move to expr.cc. (fold_single_bit_test): Likewise. * expr.cc (fold_single_bit_test_into_sign_test): Move from fold-const.cc. (fold_single_bit_test): Likewise and make static. * fold-const.h (fold_single_bit_test): Remove declaration.
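The kind of expression involved is simply a single-bit test, for example:

  int
  has_bit2 (int x)
  {
    // (A & C) != 0 with C a power of two; fold_single_bit_test (now living
    // in expr.cc) can turn this into a shift or a sign test at expansion time.
    return (x & 4) != 0;
  }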
2023-05-18gcc: use _P() defines from tree.hBernhard Reutner-Fischer1-24/+22
gcc/ChangeLog: * alias.cc (ref_all_alias_ptr_type_p): Use _P() defines from tree.h. * attribs.cc (diag_attr_exclusions): Ditto. (decl_attributes): Ditto. (build_type_attribute_qual_variant): Ditto. * builtins.cc (fold_builtin_carg): Ditto. (fold_builtin_next_arg): Ditto. (do_mpc_arg2): Ditto. * cfgexpand.cc (expand_return): Ditto. * cgraph.h (decl_in_symtab_p): Ditto. (symtab_node::get_create): Ditto. * dwarf2out.cc (base_type_die): Ditto. (implicit_ptr_descriptor): Ditto. (gen_array_type_die): Ditto. (gen_type_die_with_usage): Ditto. (optimize_location_into_implicit_ptr): Ditto. * expr.cc (do_store_flag): Ditto. * fold-const.cc (negate_expr_p): Ditto. (fold_negate_expr_1): Ditto. (fold_convert_const): Ditto. (fold_convert_loc): Ditto. (constant_boolean_node): Ditto. (fold_binary_op_with_conditional_arg): Ditto. (build_fold_addr_expr_with_type_loc): Ditto. (fold_comparison): Ditto. (fold_checksum_tree): Ditto. (tree_unary_nonnegative_warnv_p): Ditto. (integer_valued_real_unary_p): Ditto. (fold_read_from_constant_string): Ditto. * gcc-rich-location.cc (maybe_range_label_for_tree_type_mismatch::get_text): Ditto. * gimple-expr.cc (useless_type_conversion_p): Ditto. (is_gimple_reg): Ditto. (is_gimple_asm_val): Ditto. (mark_addressable): Ditto. * gimple-expr.h (is_gimple_variable): Ditto. (virtual_operand_p): Ditto. * gimple-ssa-warn-access.cc (pass_waccess::check_dangling_stores): Ditto. * gimplify.cc (gimplify_bind_expr): Ditto. (gimplify_return_expr): Ditto. (gimple_add_padding_init_for_auto_var): Ditto. (gimplify_addr_expr): Ditto. (omp_add_variable): Ditto. (omp_notice_variable): Ditto. (omp_get_base_pointer): Ditto. (omp_strip_components_and_deref): Ditto. (omp_strip_indirections): Ditto. (omp_accumulate_sibling_list): Ditto. (omp_build_struct_sibling_lists): Ditto. (gimplify_adjust_omp_clauses_1): Ditto. (gimplify_adjust_omp_clauses): Ditto. (gimplify_omp_for): Ditto. (goa_lhs_expr_p): Ditto. (gimplify_one_sizepos): Ditto. * graphite-scop-detection.cc (scop_detection::graphite_can_represent_scev): Ditto. * ipa-devirt.cc (odr_types_equivalent_p): Ditto. * ipa-prop.cc (ipa_set_jf_constant): Ditto. (propagate_controlled_uses): Ditto. * ipa-sra.cc (type_prevails_p): Ditto. (scan_expr_access): Ditto. * optabs-tree.cc (optab_for_tree_code): Ditto. * toplev.cc (wrapup_global_declaration_1): Ditto. * trans-mem.cc (transaction_invariant_address_p): Ditto. * tree-cfg.cc (verify_types_in_gimple_reference): Ditto. (verify_gimple_comparison): Ditto. (verify_gimple_assign_binary): Ditto. (verify_gimple_assign_single): Ditto. * tree-complex.cc (get_component_ssa_name): Ditto. * tree-emutls.cc (lower_emutls_2): Ditto. * tree-inline.cc (copy_tree_body_r): Ditto. (estimate_move_cost): Ditto. (copy_decl_for_dup_finish): Ditto. * tree-nested.cc (convert_nonlocal_omp_clauses): Ditto. (note_nonlocal_vla_type): Ditto. (convert_local_omp_clauses): Ditto. (remap_vla_decls): Ditto. (fixup_vla_decls): Ditto. * tree-parloops.cc (loop_has_vector_phi_nodes): Ditto. * tree-pretty-print.cc (print_declaration): Ditto. (print_call_name): Ditto. * tree-sra.cc (compare_access_positions): Ditto. * tree-ssa-alias.cc (compare_type_sizes): Ditto. * tree-ssa-ccp.cc (get_default_value): Ditto. * tree-ssa-coalesce.cc (populate_coalesce_list_for_outofssa): Ditto. * tree-ssa-dom.cc (reduce_vector_comparison_to_scalar_comparison): Ditto. * tree-ssa-forwprop.cc (can_propagate_from): Ditto. * tree-ssa-propagate.cc (may_propagate_copy): Ditto. * tree-ssa-sccvn.cc (fully_constant_vn_reference_p): Ditto. 
* tree-ssa-sink.cc (statement_sink_location): Ditto. * tree-ssa-structalias.cc (type_must_have_pointers): Ditto. * tree-ssa-ter.cc (find_replaceable_in_bb): Ditto. * tree-ssa-uninit.cc (warn_uninit): Ditto. * tree-ssa.cc (maybe_rewrite_mem_ref_base): Ditto. (non_rewritable_mem_ref_base): Ditto. * tree-streamer-in.cc (lto_input_ts_type_non_common_tree_pointers): Ditto. * tree-streamer-out.cc (write_ts_type_non_common_tree_pointers): Ditto. * tree-vect-generic.cc (do_binop): Ditto. (do_cond): Ditto. * tree-vect-stmts.cc (vect_init_vector): Ditto. * tree-vector-builder.h (tree_vector_builder::note_representative): Ditto. * tree.cc (sign_mask_for): Ditto. (verify_type_variant): Ditto. (gimple_canonical_types_compatible_p): Ditto. (verify_type): Ditto. * ubsan.cc (get_ubsan_type_info_for_type): Ditto. * var-tracking.cc (prepare_call_arguments): Ditto. (vt_add_function_parameters): Ditto. * varasm.cc (decode_addr_const): Ditto.
2023-05-01Conversion to irange wide_int API.Aldy Hernandez1-2/+1
This converts the irange API to use wide_ints exclusively, along with its users. This patch will slow down VRP, as there will be more useless wide_int to tree conversions. However, this slowdown is only temporary, as a follow-up patch will convert the internal representation of iranges to wide_ints for a net overall gain in performance. gcc/ChangeLog: * fold-const.cc (expr_not_equal_to): Convert to irange wide_int API. * gimple-fold.cc (size_must_be_zero_p): Same. * gimple-loop-versioning.cc (loop_versioning::prune_loop_conditions): Same. * gimple-range-edge.cc (gcond_edge_range): Same. (gimple_outgoing_range::calc_switch_ranges): Same. * gimple-range-fold.cc (adjust_imagpart_expr): Same. (adjust_realpart_expr): Same. (fold_using_range::range_of_address): Same. (fold_using_range::relation_fold_and_or): Same. * gimple-range-gori.cc (gori_compute::gori_compute): Same. (range_is_either_true_or_false): Same. * gimple-range-op.cc (cfn_toupper_tolower::get_letter_range): Same. (cfn_clz::fold_range): Same. (cfn_ctz::fold_range): Same. * gimple-range-tests.cc (class test_expr_eval): Same. * gimple-ssa-warn-alloca.cc (alloca_call_type): Same. * ipa-cp.cc (ipa_value_range_from_jfunc): Same. (propagate_vr_across_jump_function): Same. (decide_whether_version_node): Same. * ipa-prop.cc (ipa_get_value_range): Same. * ipa-prop.h (ipa_range_set_and_normalize): Same. * range-op.cc (get_shift_range): Same. (value_range_from_overflowed_bounds): Same. (value_range_with_overflow): Same. (create_possibly_reversed_range): Same. (equal_op1_op2_relation): Same. (not_equal_op1_op2_relation): Same. (lt_op1_op2_relation): Same. (le_op1_op2_relation): Same. (gt_op1_op2_relation): Same. (ge_op1_op2_relation): Same. (operator_mult::op1_range): Same. (operator_exact_divide::op1_range): Same. (operator_lshift::op1_range): Same. (operator_rshift::op1_range): Same. (operator_cast::op1_range): Same. (operator_logical_and::fold_range): Same. (set_nonzero_range_from_mask): Same. (operator_bitwise_or::op1_range): Same. (operator_bitwise_xor::op1_range): Same. (operator_addr_expr::fold_range): Same. (pointer_plus_operator::wi_fold): Same. (pointer_or_operator::op1_range): Same. (INT): Same. (UINT): Same. (INT16): Same. (UINT16): Same. (SCHAR): Same. (UCHAR): Same. (range_op_cast_tests): Same. (range_op_lshift_tests): Same. (range_op_rshift_tests): Same. (range_op_bitwise_and_tests): Same. (range_relational_tests): Same. * range.cc (range_zero): Same. (range_nonzero): Same. * range.h (range_true): Same. (range_false): Same. (range_true_and_false): Same. * tree-data-ref.cc (split_constant_offset_1): Same. * tree-ssa-loop-ch.cc (entry_loop_condition_is_static): Same. * tree-ssa-loop-unswitch.cc (struct unswitch_predicate): Same. (find_unswitching_predicates_for_bb): Same. * tree-ssa-phiopt.cc (value_replacement): Same. * tree-ssa-threadbackward.cc (back_threader::find_taken_edge_cond): Same. * tree-ssanames.cc (ssa_name_has_boolean_range): Same. * tree-vrp.cc (find_case_label_range): Same. * value-query.cc (range_query::get_tree_range): Same. * value-range.cc (irange::set_nonnegative): Same. (frange::contains_p): Same. (frange::singleton_p): Same. (frange::internal_singleton_p): Same. (irange::irange_set): Same. (irange::irange_set_1bit_anti_range): Same. (irange::irange_set_anti_range): Same. (irange::set): Same. (irange::operator==): Same. (irange::singleton_p): Same. (irange::contains_p): Same. (irange::set_range_from_nonzero_bits): Same. (DEFINE_INT_RANGE_INSTANCE): Same. (INT): Same. (UINT): Same. (SCHAR): Same. 
(UINT128): Same. (UCHAR): Same. (range): New. (tree_range): New. (range_int): New. (range_uint): New. (range_uint128): New. (range_uchar): New. (range_char): New. (build_range3): Convert to irange wide_int API. (range_tests_irange3): Same. (range_tests_int_range_max): Same. (range_tests_strict_enum): Same. (range_tests_misc): Same. (range_tests_nonzero_bits): Same. (range_tests_nan): Same. (range_tests_signed_zeros): Same. * value-range.h (Value_Range::Value_Range): Same. (irange::set): Same. (irange::nonzero_p): Same. (irange::contains_p): Same. (range_includes_zero_p): Same. (irange::set_nonzero): Same. (irange::set_zero): Same. (contains_zero_p): Same. (frange::contains_p): Same. * vr-values.cc (simplify_using_ranges::op_with_boolean_value_range_p): Same. (bounds_of_var_in_loop): Same. (simplify_using_ranges::legacy_fold_cond_overflow): Same.
2023-04-28MATCH: Factor out code for min/max detection with constantsAndrew Pinski1-0/+44
This factors out some of the code for min/max detection from match.pd into a function so it can be reused in other places. This is mainly used to detect conversions of >= to >, which cause the integer values to be changed by one. Changes since v1: * factor out the checks for INTEGER_CSTs so it is more obvious. OK? Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: * match.pd: Factor out deciding the min/max from the "(cond (cmp (convert1? x) c1) (convert2? x) c2)" pattern to ... * fold-const.cc (minmax_from_comparison): this new function. * fold-const.h (minmax_from_comparison): New prototype.
2023-04-04sanitizer: missing signed integer overflow errors [PR109107]Marek Polacek1-1/+2
Here we're failing to detect a signed overflow with -O because match.pd, since r8-1516, transforms c = (a + 1) - (int) (short int) b; into c = (int) ((unsigned int) a + 4294946117); wrongly eliding the overflow. This kind of problem is usually avoided by using TYPE_OVERFLOW_SANITIZED in the appropriate place. The first match.pd hunk in the patch fixes it. I've constructed a testcase for each of the surrounding cases as well. Then I noticed that fold_binary_loc/associate has the same problem, so I've added a TYPE_OVERFLOW_SANITIZED there as well (it may be too coarse, sorry). Then I found yet another problem, but instead of fixing it now I've opened PR109134. I could probably go on and find a dozen more. PR sanitizer/109107 gcc/ChangeLog: * fold-const.cc (fold_binary_loc): Use TYPE_OVERFLOW_SANITIZED when associating. * match.pd: Use TYPE_OVERFLOW_SANITIZED. gcc/testsuite/ChangeLog: * c-c++-common/ubsan/pr109107-1.c: New test. * c-c++-common/ubsan/pr109107-2.c: New test. * c-c++-common/ubsan/pr109107-3.c: New test. * c-c++-common/ubsan/pr109107-4.c: New test.
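An input of the shape quoted above, which must keep its runtime diagnostic under -fsanitize=signed-integer-overflow; this is only an illustration, not the literal pr109107-1.c contents:

  int
  f (int a, short b)
  {
    // a + 1 may overflow; associating this into unsigned arithmetic with a
    // folded constant would silently drop the sanitizer check.
    return (a + 1) - (int) (short) b;
  }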
2023-03-23c: [PR84900] cast of compound literal does not cause the code to become a non-lvalueAndrew Pinski1-0/+1
The problem here is that after r0-92187-g2ec5deb5c3146c, maybe_lvalue_p would return false for compound literals, which causes non_lvalue_loc not to wrap the expression with a NON_LVALUE_EXPR, unlike before, when it returned true for all language-specific tree codes. This fixes that oversight and fixes the testcase to have the cast as a non-lvalue. Committed to the trunk as obvious after a bootstrap/test on x86_64-linux-gnu. PR c/84900 gcc/ChangeLog: * fold-const.cc (maybe_lvalue_p): Treat COMPOUND_LITERAL_EXPR as an lvalue. gcc/testsuite/ChangeLog: * gcc.dg/compound-literal-cast-lvalue-1.c: New test.
2023-03-09middle-end/108995 - avoid folding when sanitizing overflowRichard Biener1-4/+3
The following plugs one place in extract_muldiv where it should avoid folding when sanitizing overflow. PR middle-end/108995 * fold-const.cc (extract_muldiv_1): Avoid folding (CST * b) / CST2 when sanitizing overflow and we rely on overflow being undefined. * gcc.dg/ubsan/pr108995.c: New testcase.
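An illustrative shape of the fold being suppressed (the actual pr108995.c testcase may differ):

  int
  f (int b)
  {
    // Without sanitization, (b * 20) / 10 may be folded to b * 2 by relying
    // on signed overflow being undefined; with -fsanitize=signed-integer-overflow
    // the overflow of b * 20 must still be reported, so the fold is skipped.
    return (b * 20) / 10;
  }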
2023-03-02fold-const: Ignore padding bits in native_interpret_expr REAL_CST reverse verification [PR108934]Jakub Jelinek1-2/+4
In the following testcase we try to std::bit_cast a (pair of) integral value(s) which has some non-zero bits in the padding of the x86 long double (a 16-byte type for 64-bit, a 12-byte type for 32-bit, with only 10 bytes actually loaded/stored by hw) and starting with my PR104522 change we reject that as native_interpret_expr fails on it. The PR104522 change extends what has been done before for MODE_COMPOSITE_P (but those don't have any padding bits) to all floating point types, because e.g. the x86 long double has various bit combinations we don't support, like pseudo-(denormals,infinities,NaNs) or unnormals. The HW handles some of those as exceptional cases and others similarly to the non-pseudo ones. But for the padding bits it actually doesn't load/store those bits at all, it loads/stores 10 bytes. So, I think we should exempt the padding bits from the reverse comparison (the native_encode_expr bits for the padding will be all zeros), which the following patch does. For bit_cast it is similar to e.g. ignoring padding bits if the destination is a structure which has padding bits in there. The change restores auto-init-4.c to how it behaved before the PR105259 change, where some more VCEs can now be done. 2023-03-02 Jakub Jelinek <jakub@redhat.com> PR c++/108934 * fold-const.cc (native_interpret_expr) <case REAL_CST>: Before memcmp comparison copy the bytes from ptr to a temporary buffer and clear the padding bits in there. * gcc.target/i386/auto-init-4.c: Revert PR105259 change. * g++.target/i386/pr108934.C: New test.
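On x86-64, where long double is a 16-byte type with only 10 value bytes, a constexpr round trip like the following should now be accepted even when the source sets bits in the padding bytes; this is an illustration only, not the literal pr108934.C test, and the chosen bit pattern and x86-64 layout are assumptions:

  #include <bit>
  #include <cstdint>

  struct Pair { std::uint64_t lo, hi; };
  static_assert (sizeof (Pair) == sizeof (long double), "x86-64 layout assumed");

  // lo/hi encode 1.0L in the low 80 bits; the high bytes of hi land in the
  // 6 padding bytes of long double and are not part of the value representation.
  constexpr long double ld
    = std::bit_cast<long double> (Pair{0x8000000000000000ULL, 0xdead000000003fffULL});
  static_assert (ld == 1.0L);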
2023-01-04ubsan: Avoid narrowing of multiply for -fsanitize=signed-integer-overflow [PR108256]Jakub Jelinek1-1/+3
We shouldn't narrow multiplications originally done in signed types, because the original multiplication might overflow but the narrowed one will be done in unsigned arithmetic and will never overflow. 2023-01-04 Jakub Jelinek <jakub@redhat.com> PR sanitizer/108256 * convert.cc (do_narrow): Punt for MULT_EXPR if original type doesn't wrap around and -fsanitize=signed-integer-overflow is on. * fold-const.cc (fold_unary_loc) <CASE_CONVERT>: Likewise. * c-c++-common/ubsan/pr108256.c: New test.
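An illustrative shape of the narrowing in question (not the literal pr108256.c test): compiled with -fsanitize=signed-integer-overflow, the multiplication must stay in int so that its overflow can be reported:

  unsigned short
  mul (int a, int b)
  {
    // Narrowing this to an unsigned short multiply would make the operation
    // wrap silently instead of triggering the signed-overflow diagnostic.
    return a * b;
  }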