aboutsummaryrefslogtreecommitdiff
path: root/gcc
AgeCommit message (Collapse)AuthorFilesLines
2022-10-14middle-end, c++, i386, libgcc: std::bfloat16_t and __bf16 arithmetic supportJakub Jelinek29-301/+721
Here is a complete patch to add std::bfloat16_t support on x86 (AArch64 and ARM left for later). Almost no BFmode optabs are added by the patch, so for binops/unops it extends to SFmode first and then truncates back to BFmode. For {HF,SF,DF,XF,TF}mode -> BFmode conversions libgcc has implementations of all those conversions so that we avoid double rounding, for BFmode -> {DF,XF,TF}mode conversions to avoid growing libgcc too much it emits BFmode -> SFmode conversion first and then converts to the even wider mode, neither step should be imprecise. For BFmode -> HFmode, it first emits a precise BFmode -> SFmode conversion and then SFmode -> HFmode, because neither format is subset or superset of the other, while SFmode is superset of both. expr.cc then contains a -ffast-math optimization of the BF -> SF and SF -> BF conversions if we don't optimize for space (and for the latter if -frounding-math isn't enabled either). For x86, perhaps truncsfbf2 optab could be defined for TARGET_AVX512BF16 but IMNSHO should FAIL if !flag_finite_math || flag_rounding_math || !flag_unsafe_math_optimizations, because I think the insn doesn't raise on sNaNs, hardcodes round to nearest and flushes denormals to zero. By default (unless x86 -fexcess-precision=16) we use float excess precision for BFmode, so truncate only on explicit casts and assignments. The patch introduces a single __bf16 builtin - __builtin_nansf16b, because (__bf16) __builtin_nansf ("") will drop the sNaN into qNaN, and uses f16b suffix instead of bf16 because there would be ambiguity on log vs. logb - __builtin_logbf16 could be either log with bf16 suffix or logb with f16 suffix. In other cases libstdc++ should mostly use __builtin_*f for std::bfloat16_t overloads (we have a problem with std::nextafter though but that one we have also for std::float16_t). 2022-10-14 Jakub Jelinek <jakub@redhat.com> gcc/ * tree-core.h (enum tree_index): Add TI_BFLOAT16_TYPE. * tree.h (bfloat16_type_node): Define. * tree.cc (excess_precision_type): Promote bfloat16_type_mode like float16_type_mode. (build_common_tree_nodes): Initialize bfloat16_type_node if BFmode is supported. * expmed.h (maybe_expand_shift): Declare. * expmed.cc (maybe_expand_shift): No longer static. * expr.cc (convert_mode_scalar): Don't ICE on BF -> HF or HF -> BF conversions. If there is no optab, handle BF -> {DF,XF,TF,HF} conversions as separate BF -> SF -> {DF,XF,TF,HF} conversions, add -ffast-math generic implementation for BF -> SF and SF -> BF conversions. * builtin-types.def (BT_BFLOAT16, BT_FN_BFLOAT16_CONST_STRING): New. * builtins.def (BUILT_IN_NANSF16B): New builtin. * fold-const-call.cc (fold_const_call): Handle CFN_BUILT_IN_NANSF16B. * config/i386/i386.cc (classify_argument): Handle E_BCmode. (ix86_libgcc_floating_mode_supported_p): Also return true for BFmode for -msse2. (ix86_mangle_type): Mangle BFmode as DF16b. (ix86_invalid_conversion, ix86_invalid_unary_op, ix86_invalid_binary_op): Remove. (TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP, TARGET_INVALID_BINARY_OP): Don't redefine. * config/i386/i386-builtins.cc (ix86_bf16_type_node): Remove. (ix86_register_bf16_builtin_type): Use bfloat16_type_node rather than ix86_bf16_type_node, only create it if still NULL. * config/i386/i386-builtin-types.def (BFLOAT16): Likewise. * config/i386/i386.md (cbranchbf4, cstorebf4): New expanders. gcc/c-family/ * c-cppbuiltin.cc (c_cpp_builtins): If bfloat16_type_node, predefine __BFLT16_*__ macros and for C++23 also __STDCPP_BFLOAT16_T__. Predefine bfloat16_type_node related macros for -fbuilding-libgcc. * c-lex.cc (interpret_float): Handle CPP_N_BFLOAT16. gcc/c/ * c-typeck.cc (convert_arguments): Don't promote __bf16 to double. gcc/cp/ * cp-tree.h (extended_float_type_p): Return true for bfloat16_type_node. * typeck.cc (cp_compare_floating_point_conversion_ranks): Set extended{1,2} if mv{1,2} is bfloat16_type_node. Adjust comment. gcc/testsuite/ * lib/target-supports.exp (check_effective_target_bfloat16, check_effective_target_bfloat16_runtime, add_options_for_bfloat16): New. * gcc.dg/torture/bfloat16-basic.c: New test. * gcc.dg/torture/bfloat16-builtin.c: New test. * gcc.dg/torture/bfloat16-builtin-issignaling-1.c: New test. * gcc.dg/torture/bfloat16-complex.c: New test. * gcc.dg/torture/builtin-issignaling-1.c: Allow to be includable from bfloat16-builtin-issignaling-1.c. * gcc.dg/torture/floatn-basic.h: Allow to be includable from bfloat16-basic.c. * gcc.target/i386/vect-bfloat16-typecheck_2.c: Adjust expected diagnostics. * gcc.target/i386/sse2-bfloat16-scalar-typecheck.c: Likewise. * gcc.target/i386/vect-bfloat16-typecheck_1.c: Likewise. * g++.target/i386/bfloat_cpp_typecheck.C: Likewise. libcpp/ * include/cpplib.h (CPP_N_BFLOAT16): Define. * expr.cc (interpret_float_suffix): Handle bf16 and BF16 suffixes for C++. libgcc/ * config/i386/t-softfp (softfp_extensions): Add bfsf. (softfp_truncations): Add tfbf xfbf dfbf sfbf hfbf. (CFLAGS-extendbfsf2.c, CFLAGS-truncsfbf2.c, CFLAGS-truncdfbf2.c, CFLAGS-truncxfbf2.c, CFLAGS-trunctfbf2.c, CFLAGS-trunchfbf2.c): Add -msse2. * config/i386/libgcc-glibc.ver (GCC_13.0.0): Export __extendbfsf2 and __trunc{s,d,x,t,h}fbf2. * config/i386/sfp-machine.h (_FP_NANSIGN_B): Define. * config/i386/64/sfp-machine.h (_FP_NANFRAC_B): Define. * config/i386/32/sfp-machine.h (_FP_NANFRAC_B): Define. * soft-fp/brain.h: New file. * soft-fp/truncsfbf2.c: New file. * soft-fp/truncdfbf2.c: New file. * soft-fp/truncxfbf2.c: New file. * soft-fp/trunctfbf2.c: New file. * soft-fp/trunchfbf2.c: New file. * soft-fp/truncbfhf2.c: New file. * soft-fp/extendbfsf2.c: New file. libiberty/ * cp-demangle.h (D_BUILTIN_TYPE_COUNT): Increment. * cp-demangle.c (cplus_demangle_builtin_types): Add std::bfloat16_t entry. (cplus_demangle_type): Demangle DF16b. * testsuite/demangle-expected (_Z3xxxDF16b): New test.
2022-10-14c++: Excess precision for ? int : float or int == float [PR107097, PR82071, ↵Jakub Jelinek9-53/+99
PR87390] The following incremental patch implements the C11 behavior (for all C++ versions) for cond ? int : float cond ? float : int int cmp float float cmp int where int is any integral type, float any floating point type with excess precision and cmp ==, !=, >, <, >=, <= and <=>. 2022-10-14 Jakub Jelinek <jakub@redhat.com> PR c/82071 PR c/87390 PR c++/107097 gcc/cp/ * cp-tree.h (cp_ep_convert_and_check): Remove. * cvt.cc (cp_ep_convert_and_check): Remove. * call.cc (build_conditional_expr): Use excess precision for ?: with one arm floating and another integral. Don't convert first to semantic result type from integral types. (convert_like_internal): Don't call cp_ep_convert_and_check, instead just strip EXCESS_PRECISION_EXPR before calling cp_convert_and_check or cp_convert. * typeck.cc (cp_build_binary_op): Set may_need_excess_precision for comparisons or SPACESHIP_EXPR with at least one operand integral. Don't compute semantic_result_type if build_type is non-NULL. Call cp_convert_and_check instead of cp_ep_convert_and_check. gcc/testsuite/ * gcc.target/i386/excess-precision-8.c: For C++ wrap abort and exit declarations into extern "C" block. * gcc.target/i386/excess-precision-10.c: Likewise. * g++.target/i386/excess-precision-7.C: Remove. * g++.target/i386/excess-precision-8.C: New test. * g++.target/i386/excess-precision-9.C: Remove. * g++.target/i386/excess-precision-10.C: New test. * g++.target/i386/excess-precision-12.C: New test.
2022-10-14c++: Implement excess precision support for C++ [PR107097, PR323]Jakub Jelinek38-86/+613
The following patch implements excess precision support for C++. Like for C, it uses EXCESS_PRECISION_EXPR tree to say that its operand is evaluated in excess precision and what the semantic type of the expression is. In most places I've followed what the C FE does in similar spots, so e.g. for binary ops if one or both operands are already EXCESS_PRECISION_EXPR, strip those away or for operations that might need excess precision (+, -, *, /) check if the operands should use excess precision and convert to that type and at the end wrap into EXCESS_PRECISION_EXPR with the common semantic type. This patch follows the C99 handling where it differs from C11 handling. There are some cases which needed to be handled differently, the C FE can just strip EXCESS_PRECISION_EXPR (replace it with its operand) when handling explicit cast, but that IMHO isn't right for C++ - the discovery what exact conversion should be used (e.g. if user conversion or standard or their sequence) should be decided based on the semantic type (i.e. type of EXCESS_PRECISION_EXPR), and that decision continues in convert_like* where we pick the right user conversion, again, if say some class has ctor from double and long double and we are on ia32 with standard excess precision promoting float/double to long double, then we should pick the ctor from double. Or when some other class has ctor from just double, and EXCESS_PRECISION_EXPR semantic type is float, we should choose the user ctor from double, but actually just convert the long double excess precision to double and not to float first. We need to make sure even identity conversion converts from excess precision to the semantic one though, but if identity is chained with other conversions, we don't want the identity next_conversion to drop to semantic precision only to widen afterwards. The existing testcases tweaks were for cases on i686-linux where excess precision breaks those tests, e.g. if we have double d = 4.2; if (d == 4.2) then it does the expected thing only with -fexcess-precision=fast, because with -fexcess-precision=standard it is actually double d = 4.2; if ((long double) d == 4.2L) where 4.2L is different from 4.2. I've added -fexcess-precision=fast to some tests and changed other tests to use constants that are exactly representable and don't suffer from these excess precision issues. There is one exception, pr68180.C looks like a bug in the patch which is also present in the C FE (so I'd like to get it resolved incrementally in both). Reduced testcase: typedef float __attribute__((vector_size (16))) float32x4_t; float32x4_t foo(float32x4_t x, float y) { return x + y; } with -m32 -std=c11 -Wno-psabi or -m32 -std=c++17 -Wno-psabi it is rejected with: pr68180.c:2:52: error: conversion of scalar ‘long double’ to vector ‘float32x4_t’ {aka ‘__vector(4) float’} involves truncation but without excess precision (say just -std=c11 -Wno-psabi or -std=c++17 -Wno-psabi) it is accepted. Perhaps we should pass down the semantic type to scalar_to_vector and use the semantic type rather than excess precision type in the diagnostics. 2022-10-14 Jakub Jelinek <jakub@redhat.com> PR middle-end/323 PR c++/107097 gcc/ * doc/invoke.texi (-fexcess-precision=standard): Mention that the option now also works in C++. gcc/c-family/ * c-common.def (EXCESS_PRECISION_EXPR): Remove comment part about the tree being specific to C/ObjC. * c-opts.cc (c_common_post_options): Handle flag_excess_precision in C++ the same as in C. * c-lex.cc (interpret_float): Set const_type to excess_precision () even for C++. gcc/cp/ * parser.cc (cp_parser_primary_expression): Handle EXCESS_PRECISION_EXPR with REAL_CST operand the same as REAL_CST. * cvt.cc (cp_ep_convert_and_check): New function. * call.cc (build_conditional_expr): Add excess precision support. When type_after_usual_arithmetic_conversions returns error_mark_node, use gcc_checking_assert that it is because of uncomparable floating point ranks instead of checking all those conditions and make it work also with complex types. (convert_like_internal): Likewise. Add NESTED_P argument, pass true to recursive calls to convert_like. (convert_like): Add NESTED_P argument, pass it through to convert_like_internal. For other overload pass false to it. (convert_like_with_context): Pass false to NESTED_P. (convert_arg_to_ellipsis): Add excess precision support. (magic_varargs_p): For __builtin_is{finite,inf,inf_sign,nan,normal} and __builtin_fpclassify return 2 instead of 1, document what it means. (build_over_call): Don't handle former magic 2 which is no longer used, instead for magic 1 remove EXCESS_PRECISION_EXPR. (perform_direct_initialization_if_possible): Pass false to NESTED_P convert_like argument. * constexpr.cc (cxx_eval_constant_expression): Handle EXCESS_PRECISION_EXPR. (potential_constant_expression_1): Likewise. * pt.cc (tsubst_copy, tsubst_copy_and_build): Likewise. * cp-tree.h (cp_ep_convert_and_check): Declare. * cp-gimplify.cc (cp_fold): Handle EXCESS_PRECISION_EXPR. * typeck.cc (cp_common_type): For COMPLEX_TYPEs, return error_mark_node if recursive call returned it. (convert_arguments): For magic 1 remove EXCESS_PRECISION_EXPR. (cp_build_binary_op): Add excess precision support. When cp_common_type returns error_mark_node, use gcc_checking_assert that it is because of uncomparable floating point ranks instead of checking all those conditions and make it work also with complex types. (cp_build_unary_op): Likewise. (cp_build_compound_expr): Likewise. (build_static_cast_1): Remove EXCESS_PRECISION_EXPR. gcc/testsuite/ * gcc.target/i386/excess-precision-1.c: For C++ wrap abort and exit declarations into extern "C" block. * gcc.target/i386/excess-precision-2.c: Likewise. * gcc.target/i386/excess-precision-3.c: Likewise. Remove check_float_nonproto and check_double_nonproto tests for C++. * gcc.target/i386/excess-precision-7.c: For C++ wrap abort and exit declarations into extern "C" block. * gcc.target/i386/excess-precision-9.c: Likewise. * g++.target/i386/excess-precision-1.C: New test. * g++.target/i386/excess-precision-2.C: New test. * g++.target/i386/excess-precision-3.C: New test. * g++.target/i386/excess-precision-4.C: New test. * g++.target/i386/excess-precision-5.C: New test. * g++.target/i386/excess-precision-6.C: New test. * g++.target/i386/excess-precision-7.C: New test. * g++.target/i386/excess-precision-9.C: New test. * g++.target/i386/excess-precision-11.C: New test. * c-c++-common/dfp/convert-bfp-10.c: Add -fexcess-precision=fast as dg-additional-options. * c-c++-common/dfp/compare-eq-const.c: Likewise. * g++.dg/cpp1z/constexpr-96862.C: Likewise. * g++.dg/cpp1z/decomp12.C (main): Use 2.25 instead of 2.3 to avoid excess precision differences. * g++.dg/other/thunk1.C: Add -fexcess-precision=fast as dg-additional-options. * g++.dg/vect/pr64410.cc: Likewise. * g++.dg/cpp1y/pr68180.C: Likewise. * g++.dg/vect/pr89653.cc: Likewise. * g++.dg/cpp0x/variadic-tuple.C: Likewise. * g++.dg/cpp0x/nsdmi-union1.C: Use 4.25 instead of 4.2 to avoid excess precision differences. * g++.old-deja/g++.brendan/copy9.C: Add -fexcess-precision=fast as dg-additional-options. * g++.old-deja/g++.brendan/overload7.C: Likewise.
2022-10-14c: C2x storage class specifiers in compound literalsJoseph Myers16-14/+369
Implement the C2x feature of storage class specifiers in compound literals. Such storage class specifiers (static, register or thread_local; also constexpr, but we don't yet have C2x constexpr support implemented) can be used before the type name (not mixed with type specifiers, unlike in declarations) and have the same semantics and constraints as for declarations of named objects. Also allow GNU __thread to be used, given that thread_local can be. Bootstrapped with no regressions for x86_64-pc-linux-gnu. gcc/c/ * c-decl.cc (build_compound_literal): Add parameter scspecs. Handle storage class specifiers. * c-parser.cc (c_token_starts_compound_literal) (c_parser_compound_literal_scspecs): New. (c_parser_postfix_expression_after_paren_type): Add parameter scspecs. Call pedwarn_c11 for use of storage class specifiers. Update call to build_compound_literal. (c_parser_cast_expression, c_parser_sizeof_expression) (c_parser_alignof_expression): Handle storage class specifiers for compound literals. Update calls to c_parser_postfix_expression_after_paren_type. (c_parser_postfix_expression): Update syntax comment. * c-tree.h (build_compound_literal): Update prototype. * c-typeck.cc (c_mark_addressable): Diagnose taking address of register compound literal. gcc/testsuite/ * gcc.dg/c11-complit-1.c, gcc.dg/c11-complit-2.c, gcc.dg/c11-complit-3.c, gcc.dg/c2x-complit-2.c, gcc.dg/c2x-complit-3.c, gcc.dg/c2x-complit-4.c, gcc.dg/c2x-complit-5.c, gcc.dg/c2x-complit-6.c, gcc.dg/c2x-complit-7.c, gcc.dg/c90-complit-2.c, gcc.dg/gnu2x-complit-1.c, gcc.dg/gnu2x-complit-2.c: New tests.
2022-10-14Daily bump.GCC Administrator6-1/+220
2022-10-14Fix bogus -Wstringop-overflow warningEric Botcazou2-2/+22
If you compile the testcase with -O2 -fno-inline -Wall, you get: In function 'process_array3': cc1: warning: 'process_array4' accessing 4 bytes in a region of size 3 [- Wstringop-overflow=] cc1: note: referencing argument 1 of type 'char[4]' t.c:6:6: note: in a call to function 'process_array4' 6 | void process_array4 (char a[4], int n) | ^~~~~~~~~~~~~~ cc1: warning: 'process_array4' accessing 4 bytes in a region of size 3 [- Wstringop-overflow=] cc1: note: referencing argument 1 of type 'char[4]' t.c:6:6: note: in a call to function 'process_array4' That's because the ICF IPA pass has identified the two functions and turned process_array3 into a wrapper of process_array4. gcc/ * gimple-ssa-warn-access.cc (pass_waccess::check_call): Return early for calls made from thunks. gcc/testsuite/ * gcc.dg/Wstringop-overflow-89.c: New test.
2022-10-13c++: trivial formatting cleanupsJason Merrill5-15/+17
Split out from the C++ contracts patch. gcc/cp/ChangeLog: * cp-tree.h: Fix whitespace. * parser.h: Fix whitespace. * decl.cc: Fix whitespace. * parser.cc: Fix whitespace. * pt.cc: Fix whitespace.
2022-10-13analyzer: fix ICE introduced in r13-3168 [PR107210]David Malcolm2-1/+18
gcc/analyzer/ChangeLog: PR analyzer/107210 * svalue.cc (constant_svalue::maybe_fold_bits_within): Only attempt to extract individual bits when tree_fits_uhwi_p. gcc/testsuite/ChangeLog: PR analyzer/107210 * gfortran.dg/analyzer/pr107210.f90: New test. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2022-10-13Fix emit_group_store regression on big-endianEric Botcazou1-12/+32
The recent optimization implemented for complex modes contains an oversight for big-endian platforms: it uses a lowpart SUBREG when the integer modes have different sizes, but this does not match the semantics of the PARALLELs which have a bundled byte offset; this offset is always zero in the code path and the lowpart is not at offset zero on big-endian platforms. gcc/ * expr.cc (emit_group_stote): Fix handling of modes of different sizes for big-endian targets in latest change and add commentary.
2022-10-13use proper DECL_INITIAL for VTVMartin Liska3-23/+12
gcc/cp/ChangeLog: * vtable-class-hierarchy.cc (vtv_generate_init_routine): Emit an artificial variable that would be put into .preinit_array section. gcc/ChangeLog: * output.h (assemble_vtv_preinit_initializer): Remove. * varasm.cc (assemble_vtv_preinit_initializer): Remove.
2022-10-13propagate partial equivs in the cache.Andrew MacLeod3-6/+66
Adjust on-entry cache propagation to look for and propagate both full and partial equivalences. gcc/ PR tree-optimization/102540 PR tree-optimization/102872 * gimple-range-cache.cc (ranger_cache::fill_block_cache): Handle partial equivs. (ranger_cache::range_from_dom): Cleanup dump output. gcc/testsuite/ * gcc.dg/pr102540.c: New. * gcc.dg/pr102872.c: New.
2022-10-13Add partial equivalence recognition to cast and bitwise and.Andrew MacLeod1-0/+65
This provides the hooks that will register partial equivalencies for casts and bitwise AND operations with the appropriate bit pattern. * range-op.cc (operator_cast::lhs_op1_relation): New. (operator_bitwise_and::lhs_op1_relation): New.
2022-10-13Add equivalence iterator to relation oracle.Andrew MacLeod3-11/+118
Instead of looping over an exposed equivalence bitmap, provide iterators to loop over equivalences, partial equivalences, or both. * gimple-range-cache.cc (ranger_cache::fill_block_cache): Use iterator. * value-relation.cc (equiv_relation_iterator::equiv_relation_iterator): New. (equiv_relation_iterator::next): New. (equiv_relation_iterator::get_name): New. * value-relation.h (class relation_oracle): Privatize some methods. (class equiv_relation_iterator): New. (FOR_EACH_EQUIVALENCE): New. (FOR_EACH_PARTIAL_EQUIV): New. (FOR_EACH_PARTIAL_AND_FULL_EQUIV): New.
2022-10-13Add partial equivalence support to the relation oracle.Andrew MacLeod2-14/+229
This provides enhancements to the equivalence oracle to also track partial equivalences. They are tracked similar to equivalences, except it tracks a 'slice' of another ssa name. 8, 16, 32 and 64 bit slices are tracked. This will allow casts and mask of the same value to compare equal. * value-relation.cc (equiv_chain::dump): Don't print empty equivalences. (equiv_oracle::equiv_oracle): Allocate a partial equiv table. (equiv_oracle::~equiv_oracle): Release the partial equiv table. (equiv_oracle::add_partial_equiv): New. (equiv_oracle::partial_equiv_set): New. (equiv_oracle::partial_equiv): New. (equiv_oracle::query_relation): Check for partial equivs too. (equiv_oracle::dump): Also dump partial equivs. (dom_oracle::register_relation): Handle partial equivs. (dom_oracle::query_relation): Check for partial equivs. * value-relation.h (enum relation_kind_t): Add partial equivs. (relation_partial_equiv_p): New. (relation_equiv_p): New. (class pe_slice): New. (class equiv_oracle): Add prototypes. (pe_to_bits): New. (bits_to_pe): New. (pe_min): New.
2022-10-13c++: ICE with VEC_INIT_EXPR and defarg [PR106925]Marek Polacek2-2/+18
Since r12-8066, in cxx_eval_vec_init we perform expand_vec_init_expr while processing the default argument in this test. At this point start_preparsed_function hasn't yet set current_function_decl. expand_vec_init_expr then leads to maybe_splice_retval_cleanup which checks DECL_CONSTRUCTOR_P (current_function_decl) without checking that c_f_d is non-null first. It seems correct that c_f_d is null here, so it seems to me that maybe_splice_retval_cleanup should check c_f_d as in the following patch. PR c++/106925 gcc/cp/ChangeLog: * except.cc (maybe_splice_retval_cleanup): Check current_function_decl. Make the bool const. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/initlist-defarg3.C: New test.
2022-10-13tree-optimization/107247 - reduce SLP reduction accumulatorRichard Biener1-1/+13
The following makes sure to reduce a multi-vector SLP reduction accumulator to a single vector using vector operations if easily possible (if the number of lanes in the vector type is a multiple of the number of scalar accumulators). PR tree-optimization/107247 * tree-vect-loop.cc (vect_create_epilog_for_reduction): Reduce multi vector SLP reduction accumulators. Check the adjusted number of accumulator vectors against one for the re-use in the epilogue.
2022-10-13machmode: Introduce GET_MODE_NEXT_MODE with previous GET_MODE_WIDER_MODE ↵Jakub Jelinek5-14/+101
meaning, add new GET_MODE_WIDER_MODE On Wed, Oct 05, 2022 at 04:02:25PM -0400, Jason Merrill wrote: > > > > @@ -5716,7 +5716,13 @@ emit_store_flag_1 (rtx target, enum rtx_ > > > > { > > > > machine_mode optab_mode = mclass == MODE_CC ? CCmode : compare_mode; > > > > icode = optab_handler (cstore_optab, optab_mode); > > > > - if (icode != CODE_FOR_nothing) > > > > + if (icode != CODE_FOR_nothing > > > > + /* Don't consider [BH]Fmode as usable wider mode, as neither is > > > > + a subset or superset of the other. */ > > > > + && (compare_mode == mode > > > > + || !SCALAR_FLOAT_MODE_P (compare_mode) > > > > + || maybe_ne (GET_MODE_PRECISION (compare_mode), > > > > + GET_MODE_PRECISION (mode)))) > > > > > > Why do you need to do this here (and in prepare_cmp_insn, and similarly in > > > can_compare_p)? Shouldn't get_wider skip over modes that are not actually > > > wider? > > > > I'm afraid too many places rely on all modes of a certain class to be > > visible when walking from "narrowest" to "widest" mode, say > > FOR_EACH_MODE_IN_CLASS/FOR_EACH_MODE/FOR_EACH_MODE_UNTIL/FOR_EACH_WIDER_MODE > > etc. wouldn't work at all if GET_MODE_WIDER_MODE (BFmode) == SFmode > > && GET_MODE_WIDER_MODE (HFmode) == SFmode. > > Yes, it seems they need to change now that their assumptions have been > violated. I suppose FOR_EACH_MODE_IN_CLASS would need to change to not use > get_wider, and users of FOR_EACH_MODE/FOR_EACH_MODE_UNTIL need to decide > whether they want an iteration that uses get_wider (likely with a new name) > or not. Here is a patch which does that. Though I admit I didn't go carefully through all 24 GET_MODE_WIDER_MODE uses, 54 FOR_EACH_MODE_IN_CLASS uses, 3 FOR_EACH_MODE uses, 24 FOR_EACH_MODE_FROM, 6 FOR_EACH_MODE_UNTIL and 15 FOR_EACH_WIDER_MODE uses. It is more important to go through the GET_MODE_WIDER_MODE and FOR_EACH_WIDER_MODE uses because the patch changes behavior for those, the rest keep their previous meaning and so can be changed incrementally if the other meaning is desirable to them (I've of course changed the 3 spots I had to change in the previous BFmode patch and whatever triggered during the bootstraps). 2022-10-13 Jakub Jelinek <jakub@redhat.com> * genmodes.cc (emit_mode_wider): Emit previous content of mode_wider array into mode_next array and for mode_wider emit always VOIDmode for !CLASS_HAS_WIDER_MODES_P classes, otherwise skip through modes with the same precision. * machmode.h (mode_next): Declare. (GET_MODE_NEXT_MODE): New inline function. (mode_iterator::get_next, mode_iterator::get_known_next): New function templates. (FOR_EACH_MODE_IN_CLASS): Use get_next instead of get_wider. (FOR_EACH_MODE): Use get_known_next instead of get_known_wider. (FOR_EACH_MODE_FROM): Use get_next instead of get_wider. (FOR_EACH_WIDER_MODE_FROM): Define. (FOR_EACH_NEXT_MODE): Define. * expmed.cc (emit_store_flag_1): Use FOR_EACH_WIDER_MODE_FROM instead of FOR_EACH_MODE_FROM. * optabs.cc (prepare_cmp_insn): Likewise. Remove redundant !CLASS_HAS_WIDER_MODES_P check. (prepare_float_lib_cmp): Use FOR_EACH_WIDER_MODE_FROM instead of FOR_EACH_MODE_FROM. * config/i386/i386-expand.cc (get_mode_wider_vector): Use GET_MODE_NEXT_MODE instead of GET_MODE_WIDER_MODE.
2022-10-13[AArch64] Improve bit tests [PR105773]Wilco Dijkstra7-96/+107
Since AArch64 sets all flags on logical operations, comparisons with zero can be combined into an AND even if the condition is LE or GT. Add a new CC_NZV mode used by ANDS/BICS/TST instructions. gcc/ PR target/105773 * config/aarch64/aarch64.cc (aarch64_select_cc_mode): Allow GT/LE for merging compare with zero into AND. (aarch64_get_condition_code_1): Add CC_NZVmode support. * config/aarch64/aarch64-modes.def: Add CC_NZV. * config/aarch64/aarch64.md: Use CC_NZV in cmp+and patterns. gcc/testsuite/ PR target/105773 * gcc.target/aarch64/ands_2.c: Test for ANDS. * gcc.target/aarch64/bics_2.c: Test for BICS. * gcc.target/aarch64/tst_2.c: Test for TST. * gcc.target/aarch64/tst_imm_split_1.c: Fix test.
2022-10-13Merge branch 'master' into devel/sphinxMartin Liska168-1417/+6044
2022-10-13tree-optimization/107160 - avoid reusing multiple accumulatorsRichard Biener2-1/+43
Epilogue vectorization is not set up to re-use a vectorized accumulator consisting of more than one vector. For non-SLP we always reduce to a single but for SLP that isn't happening. In such case we currenlty miscompile the epilog so avoid this. PR tree-optimization/107160 * tree-vect-loop.cc (vect_create_epilog_for_reduction): Do not register accumulator if we failed to reduce it to a single vector. * gcc.dg/vect/pr107160.c: New testcase.
2022-10-13Add op1_op2_relation for float operands.Aldy Hernandez3-1/+16
op1_op2_relation can be called for relops (bool = a < b) as well as regular binary operators (z = a + b). This patch adds the overloaded method for floating point results. gcc/ChangeLog: * range-op-float.cc (range_operator_float::op1_op2_relation): New. (class foperator_equal): Add using. (class foperator_not_equal): Same. (class foperator_lt): Same. (class foperator_le): Same. (class foperator_gt): Same. (class foperator_ge): Same. * range-op.cc (range_op_handler::op1_op2_relation): New. * range-op.h (range_operator_float::op1_op2_relation): New.
2022-10-13diagnose return statement in match.pd (with { ... } expressionsRichard Biener2-150/+150
The expression in (with { ... } is used like a statement expression which means control flow that leaves it is not allowed. The following explicitely diagnoses 'return' and fixes up the few cases that crept into match.pd (oops). Any such return will prematurely end matching the current expression. * genmatch.cc (parser::parse_c_expr): Diagnose 'return'. * match.pd: Replace 'return' statements in with expressions with appropriate variants.
2022-10-13ifcvt: Fix bitpos calculation in bitfield lowering [PR107229]Andre Vieira4-4/+81
The bitposition calculation for the bitfield lowering in loop if conversion was not taking DECL_FIELD_OFFSET into account, which meant that it would result in wrong bitpositions for bitfields that did not end up having representations starting at the beginning of the struct. gcc/ChangeLog: PR tree-optimization/107229 * tree-if-conv.cc (get_bitfield_rep): Fix bitposition calculation. gcc/testsuite/ChangeLog: * gcc.dg/vect/pr107229-1.c: New test. * gcc.dg/vect/pr107229-2.c: New test. * gcc.dg/vect/pr107229-3.c: New test.
2022-10-13vect: Don't pattern match BITFIELD_REF's of non-integrals [PR107226]Andre Vieira1-19/+2
The original patch supported matching the vect_recog_bitfield_ref_pattern for BITFIELD_REF's where the first operand didn't have a INTEGRAL_TYPE_P type. That means it would also match vectors, leading to regressions in targets that supported vectorization of those. gcc/ChangeLog: PR tree-optimization/107226 * tree-vect-patterns.cc (vect_recog_bitfield_ref_pattern): Reject BITFIELD_REF's with non integral typed first operands.
2022-10-13c: Do not use *_IS_IEC_60559 == 2Joseph Myers3-12/+7
A late change for C2x (addressing comments from the second round of editorial review before the CD ballot, postdating the most recent public working draft) removed the value 2 for *_IS_IEC_60559 (a new <float.h> macro added in C2x). Adjust the implementation accordingly not to use this value. Bootstrapped with no regressions for x86_64-pc-linux-gnu. gcc/ * ginclude/float.h (FLT_IS_IEC_60559, DBL_IS_IEC_60559) (LDBL_IS_IEC_60559): Update comment. gcc/c-family/ * c-cppbuiltin.cc (builtin_define_float_constants): Do not use value 2 for *_IS_IEC_60559. gcc/testsuite/ * gcc.dg/c2x-float-10.c: Do not expect value 2 for *_IS_IEC_60559.
2022-10-13Daily bump.GCC Administrator6-1/+164
2022-10-12preprocessor: Fix tracking of system header state [PR60014,PR60723]Lewis Hyatt7-7/+52
The token_streamer class (which implements gcc mode -E and -save-temps/-no-integrated-cpp) needs to keep track whether the last tokens output were in a system header, so that it can generate line marker annotations as necessary for a downstream consumer to reconstruct the state. The logic for tracking it, which was added by r5-1863 to resolve PR60723, has some edge case issues as revealed by the three new test cases. The first, coming from the original PR60014, was incidentally fixed by r9-1926 for unrelated reasons. The other two were still failing on master prior to this commit. Such code paths were not realizable prior to r13-1544, which made it possible for the token streamer to see CPP_PRAGMA tokens in more contexts. The two main issues being corrected here are: 1) print.prev_was_system_token needs to indicate whether the previous token output was in a system location. However, it was not being set on every token, only on those that triggered the main code path; specifically it was not triggered on a CPP_PRAGMA token. Testcase 2 covers this case. 2) The token_streamer uses a variable "line_marker_emitted" to remember whether a line marker has been emitted while processing a given token, so that it wouldn't be done more than once in case multiple conditions requiring a line marker are true. There was no reason for this to be a member variable that retains its value from token to token, since it is just needed for tracking the state locally while processing a single given token. The fact that it could retain its value for a subsequent token is rather difficult to observe, but testcase 3 demonstrates incorrect behavior resulting from that. Moving this to a local variable also simplifies understanding the control flow going forward. gcc/c-family/ChangeLog: PR preprocessor/60014 PR preprocessor/60723 * c-ppoutput.cc (class token_streamer): Remove member line_marker_emitted to... (token_streamer::stream): ...a local variable here. Set print.prev_was_system_token on all code paths. gcc/testsuite/ChangeLog: PR preprocessor/60014 PR preprocessor/60723 * gcc.dg/cpp/pr60014-1.c: New test. * gcc.dg/cpp/pr60014-1.h: New test. * gcc.dg/cpp/pr60014-2.c: New test. * gcc.dg/cpp/pr60014-2.h: New test. * gcc.dg/cpp/pr60014-3.c: New test. * gcc.dg/cpp/pr60014-3.h: New test.
2022-10-12c++: Remove maybe-rvalue OR in implicit moveMarek Polacek10-97/+41
This patch removes the two-stage overload resolution when performing implicit move, whereby the compiler does two separate overload resolutions: one treating the operand as an rvalue, and then (if that resolution fails) another one treating the operand as an lvalue. In the standard this was introduced via CWG 1579 and implemented in gcc in r251035. In r11-2412, we disabled the fallback OR in C++20 (but not in C++17). Then C++23 P2266 removed the fallback overload resolution, and changed the implicit move rules once again. So we wound up with three different behaviors. The two overload resolutions approach was complicated and quirky, so users should transition to the newer model. Removing the maybe-rvalue OR also allows us to simplify our code, for instance, now we can get rid of LOOKUP_PREFER_RVALUE altogether. This change means that code that previously didn't compile in C++17 will now compile, for example: struct S1 { S1(S1 &&); }; struct S2 : S1 {}; S1 f (S2 s) { return s; // OK, derived-to-base, use S1::S1(S1&&) } And conversely, code that used to work in C++17 may not compile anymore: struct W { W(); }; struct F { F(W&); F(W&&) = delete; }; F fn () { W w; return w; // use w as rvalue -> use of deleted function F::F(W&&) } I plan to add a note to porting_to.html. gcc/cp/ChangeLog: * call.cc (standard_conversion): Remove LOOKUP_PREFER_RVALUE code. (reference_binding): Honor clk_implicit_rval even pre-C++20. (implicit_conversion_1): Remove LOOKUP_PREFER_RVALUE code. (build_user_type_conversion_1): Likewise. (convert_like_internal): Likewise. (build_over_call): Likewise. * cp-tree.h (LOOKUP_PREFER_RVALUE): Remove. (LOOKUP_NO_NARROWING): Adjust definition. * except.cc (build_throw): Don't perform two overload resolutions. * typeck.cc (maybe_warn_pessimizing_move): Don't use LOOKUP_PREFER_RVALUE. (check_return_expr): Don't perform two overload resolutions. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/Wredundant-move10.C: Adjust dg-warning. * g++.dg/cpp0x/Wredundant-move7.C: Likewise. * g++.dg/cpp0x/move-return2.C: Remove dg-error. * g++.dg/cpp0x/move-return4.C: Likewise. * g++.dg/cpp0x/ref-qual20.C: Adjust expected return value. * g++.dg/cpp0x/move-return5.C: New test.
2022-10-12Add range-op entry for floating point NEGATE_EXPR.Aldy Hernandez1-0/+62
gcc/ChangeLog: * range-op-float.cc (class foperator_negate): New. (floating_op_table::floating_op_table): Add NEGATE_EXPR (range_op_float_tests): Add negate tests.
2022-10-12Fortran: check types of operands of arithmetic binary operations [PR107217]Harald Anlauf2-0/+33
gcc/fortran/ChangeLog: PR fortran/107217 * arith.cc (gfc_arith_plus): Compare consistency of types of operands. (gfc_arith_minus): Likewise. (gfc_arith_times): Likewise. (gfc_arith_divide): Likewise. (arith_power): Check that both operands are of numeric type. gcc/testsuite/ChangeLog: PR fortran/107217 * gfortran.dg/pr107217.f90: New test.
2022-10-12c++: defer all consteval in default args [DR2631]Jason Merrill10-111/+95
The proposed resolution of CWG2631 extends our current handling of source_location::current to all consteval functions: default arguments are not evaluated until they're used in a call, the same should apply to evaluation of immediate invocations. And similarly for default member initializers. Previously we folded source_location::current in cp_fold_r; now we fold all consteval calls in default arguments/member initializers in bot_replace. DR 2631 gcc/cp/ChangeLog: * cp-tree.h (source_location_current_p): Remove. * name-lookup.h (struct cp_binding_level): Remove immediate_fn_ctx_p. * call.cc (in_immediate_context): All default args and DMI are potentially immediate context. (immediate_invocation_p): Don't treat source_location specially. (struct in_consteval_if_p_temp_override): Move to cp-tree.h. * constexpr.cc (get_nth_callarg): Move to cp-tree.h. * cp-gimplify.cc (cp_fold_r): Don't fold consteval. * name-lookup.cc (begin_scope): Don't set immediate_fn_ctx_p. * parser.cc (cp_parser_lambda_declarator_opt): Likewise. (cp_parser_direct_declarator): Likewise. * pt.cc (tsubst_default_argument): Open sk_function_parms level. * tree.cc (source_location_current_p): Remove. (bot_replace): Fold consteval here. (break_out_target_exprs): Handle errors. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/consteval-defarg3.C: New test.
2022-10-12RISC-V: Remove TUPLE size macro define. [NFC]Ju-Zhe Zhong1-3/+0
gcc/ChangeLog: * config/riscv/riscv-vector-builtins.h: Remove unused macro.
2022-10-12RISC-V: Apply clang-format for riscv-vector-builtins.* [NFC]Ju-Zhe Zhong3-7/+6
gcc/ChangeLog: * config/riscv/riscv-vector-builtins.cc (DEF_RVV_TYPE): Apply clang-format. (add_vector_type_attribute): Ditto. * config/riscv/riscv-vector-builtins.def (DEF_RVV_TYPE): Apply clang-format. * config/riscv/riscv-vector-builtins.h (DEF_RVV_TYPE): Apply clang-format.
2022-10-12RISC-V: Refine register_builtin_types function. [NFC]Ju-Zhe Zhong2-40/+50
gcc/ChangeLog: * config/riscv/riscv-vector-builtins.cc (builtin_types): Redefine vector types. (build_const_pointer): New function. (register_builtin_type): Ditto. (DEF_RVV_TYPE): Simplify macro. (register_vector_type): Refine implementation. * config/riscv/riscv-vector-builtins.h (rvv_builtin_types_t): New.
2022-10-12RISC-V: Move function place to make it looks better. [NFC]Ju-Zhe Zhong2-19/+19
gcc/ChangeLog: * config/riscv/riscv-vector-builtins.h (class rvv_switcher): Move to this to .... * config/riscv/riscv-vector-builtins.cc (class rvv_switcher): here.
2022-10-12Remove AVX512_VP2INTERSECT from PTA_SAPPHIRERAPIDSCui,Lili3-16/+12
gcc/ChangeLog: * config/i386/driver-i386.cc (host_detect_local_cpu): Move sapphirerapids out of AVX512_VP2INTERSECT. * config/i386/i386.h: Remove AVX512_VP2INTERSECT from PTA_SAPPHIRERAPIDS * doc/invoke.texi: Remove AVX512_VP2INTERSECT from SAPPHIRERAPIDS
2022-10-12gcov: rename gcov_write_summaryMartin Liska2-5/+5
gcc/ChangeLog: * gcov-io.cc (gcov_write_summary): Rename to ... (gcov_write_object_summary): ... this. * gcov-io.h (GCOV_TAG_OBJECT_SUMMARY_LENGTH): Rename from ... (GCOV_TAG_SUMMARY_LENGTH): ... this. libgcc/ChangeLog: * libgcov-driver.c: Use new function. * libgcov.h (gcov_write_summary): Rename to ... (gcov_write_object_summary): ... this.
2022-10-12regenerate configure filesMartin Liska1-2/+2
Needed after a recent change. gcc/ChangeLog: * configure: Regenerate. libatomic/ChangeLog: * configure: Regenerate. libbacktrace/ChangeLog: * configure: Regenerate. libcc1/ChangeLog: * configure: Regenerate. libffi/ChangeLog: * configure: Regenerate. libgfortran/ChangeLog: * configure: Regenerate. libgomp/ChangeLog: * configure: Regenerate. libitm/ChangeLog: * configure: Regenerate. libobjc/ChangeLog: * configure: Regenerate. liboffloadmic/ChangeLog: * configure: Regenerate. * plugin/configure: Regenerate. libphobos/ChangeLog: * configure: Regenerate. libquadmath/ChangeLog: * configure: Regenerate. libsanitizer/ChangeLog: * configure: Regenerate. libssp/ChangeLog: * configure: Regenerate. libstdc++-v3/ChangeLog: * configure: Regenerate. libvtv/ChangeLog: * configure: Regenerate. lto-plugin/ChangeLog: * configure: Regenerate. zlib/ChangeLog: * configure: Regenerate.
2022-10-12Add stubs for floating point range-op tests.Aldy Hernandez2-0/+29
gcc/ChangeLog: * range-op-float.cc (frange_float): New. (range_op_float_tests): New. * range-op.cc (range_op_tests): Call range_op_float_tests.
2022-10-12Add method to query the sign of a NAN.Aldy Hernandez1-0/+17
In writing some range-op entries I noticed we don't have a way to query the sign of the NAN in a range, unless the range only contains NAN, in which case you can just use frange::signbit_p. This patch adds a method that returns TRUE if there exists the possiblity of a NAN and we know its sign. gcc/ChangeLog: * value-range.h (frange::nan_signbit_p): New.
2022-10-12Disable tree to bool conversion in frange::update_nan.Aldy Hernandez2-1/+2
We have a set_nan(type) method which can be confused with update_nan(bool) because of the silent conversion of pointers to bool. Currently, if you call update_nan(tree), you'll set the possibility of NAN with a sign of true if tree is non-null. This is prone to error and this patch disallows this behavior. gcc/ChangeLog: * value-range.cc (frange::set_nonnegative): Pass bool to update_nan. * value-range.h: Disallow conversion to bool in update_nan().
2022-10-12Add an frange(type) constructor analogous to the irange version.Aldy Hernandez1-0/+8
gcc/ChangeLog: * value-range.h (frange::frange): Add constructor taking type.
2022-10-12Add default relation_kind to floating point range-op entries.Aldy Hernandez1-40/+40
The methods from which these derive all have a default relation_kind. This patch just adds the default, to make it easier to write unit tests later. gcc/ChangeLog: * range-op-float.cc: Add relation_kind = VREL_VARYING to all methods.
2022-10-12Daily bump.GCC Administrator6-1/+353
2022-10-12Enable support for atomic primitives on SPARC/LinuxEric Botcazou1-0/+1
The SPARC/Linux port is very similar to the SPARC/Solaris port nowadays so it makes sense to copy the setting of the support for atomic primitives. This fixes the single regression in the gnat.dg testsuite: FAIL: gnat.dg/prot7.adb (test for excess errors) gcc/ada/ * libgnat/system-linux-sparc.ads (Support_Atomic_Primitives): New constant set to True.
2022-10-11Fortran: check types of source expressions before conversion [PR107215]Harald Anlauf2-0/+50
gcc/fortran/ChangeLog: PR fortran/107215 * arith.cc (gfc_int2int): Check validity of type of source expr. (gfc_int2real): Likewise. (gfc_int2complex): Likewise. (gfc_real2int): Likewise. (gfc_real2real): Likewise. (gfc_complex2int): Likewise. (gfc_complex2real): Likewise. (gfc_complex2complex): Likewise. (gfc_log2log): Likewise. (gfc_log2int): Likewise. (gfc_int2log): Likewise. gcc/testsuite/ChangeLog: PR fortran/107215 * gfortran.dg/pr107215.f90: New test.
2022-10-11c++ modules: ICE with templated friend and std namespace [PR100134]Patrick Palka3-0/+25
The function depset::hash::add_binding_entity has an assert verifying that if a namespace contains an exported entity, then the namespace must have been opened in the module purview: if (data->hash->add_namespace_entities (decl, data->partitions)) { /* It contains an exported thing, so it is exported. */ gcc_checking_assert (DECL_MODULE_PURVIEW_P (decl)); DECL_MODULE_EXPORT_P (decl) = true; } We're tripping over this assert in the below testcase because by instantiating and exporting std::A<int>, we in turn define and export the hidden friend std::f(A<int>) without ever having opened the enclosing namespace std within the module purview, and thus DECL_MODULE_PURVIEW_P for std is false. It's important that the enclosing namespace is std here: if we use a different namespace then the ICE disappears. This probably has something to do with us predefining std via push_namespace from cxx_init_decl_processing (which makes it look like we've opened it within the TU), whereas with another namespace we would instead lazily create its NAMESPACE_DECL from add_imported_namespace. Since templated friend functions are special in that they give us a way to introduce a namespace-scope function without having to explicitly open the namespace, this patch proposes to fix this ICE by propagating DECL_MODULE_PURVIEW_P from the introduced function to the enclosing namespace during tsubst_friend_function. PR c++/100134 gcc/cp/ChangeLog: * pt.cc (tsubst_friend_function): Propagate DECL_MODULE_PURVIEW_P from the introduced namespace-scope function to the namespace. gcc/testsuite/ChangeLog: * g++.dg/modules/tpl-friend-8_a.H: New test. * g++.dg/modules/tpl-friend-8_b.C: New test.
2022-10-11c++ modules: lazy loading from within template [PR99377]Patrick Palka3-0/+22
Here when lazily loading the binding for f due to its first use from the template g, processing_template_decl is set which causes the call to note_vague_linkage_fn from module_state::read_cluster to have no effect, and thus we never push f onto deferred_fns and end up never emitting its definition despite needing it. The behavior of the lazy loading machinery shouldn't be sensitive to whether we're inside a template, so to that end this patch makes us clear processing_template_decl in the entrypoints lazy_load_binding and lazy_load_pendings. PR c++/99377 gcc/cp/ChangeLog: * module.cc (lazy_load_binding): Clear processing_template_decl. (lazy_load_pendings): Likewise. gcc/testsuite/ChangeLog: * g++.dg/modules/pr99377-2_a.C: New test. * g++.dg/modules/pr99377-2_b.C: New test.
2022-10-11Avoid calling tracer.trailer() twice.Aldy Hernandez1-2/+6
gcc/ChangeLog: * gimple-range-gori.cc (gori_compute::logical_combine): Avoid calling tracer.trailer().
2022-10-11i386: Fix up RTL checking ICE [PR107185]Jakub Jelinek1-1/+1
On Tue, Oct 11, 2022 at 04:03:16PM +0800, liuhongt via Gcc-patches wrote: > gcc/ChangeLog: > > * config/i386/i386.md (*notxor<mode>_1): New post_reload > define_insn_and_split. > (*notxorqi_1): Ditto. > --- a/gcc/config/i386/i386.md > +++ b/gcc/config/i386/i386.md > @@ -10826,6 +10826,39 @@ (define_insn "*<code><mode>_1" > (set_attr "type" "alu, alu, msklog") > (set_attr "mode" "<MODE>")]) > > +(define_insn_and_split "*notxor<mode>_1" > + [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm,r,?k") > + (not:SWI248 > + (xor:SWI248 > + (match_operand:SWI248 1 "nonimmediate_operand" "%0,0,k") > + (match_operand:SWI248 2 "<general_operand>" "r<i>,<m>,k")))) > + (clobber (reg:CC FLAGS_REG))] > + "ix86_binary_operator_ok (XOR, <MODE>mode, operands)" > + "#" > + "&& reload_completed" > + [(parallel > + [(set (match_dup 0) > + (xor:SWI248 (match_dup 1) (match_dup 2))) > + (clobber (reg:CC FLAGS_REG))]) > + (set (match_dup 0) > + (not:SWI248 (match_dup 1)))] > +{ > + if (MASK_REGNO_P (REGNO (operands[0]))) This causes --enable-checking=yes,rtl,extra regression on gcc.dg/store_merging_13.c test on x86_64-linux: .../gcc/testsuite/gcc.dg/store_merging_13.c: In function 'f13': .../gcc/testsuite/gcc.dg/store_merging_13.c:189:1: internal compiler error: RTL check: expected code 'reg', have 'mem' in rhs_regno, at rtl.h:1932 0x7b0c8f rtl_check_failed_code1(rtx_def const*, rtx_code, char const*, int, char const*) ../../gcc/rtl.cc:916 0x8e74be rhs_regno ../../gcc/rtl.h:1932 0x9785fd rhs_regno ./genrtl.h:120 0x9785fd gen_split_260(rtx_insn*, rtx_def**) ../../gcc/config/i386/i386.md:10846 0x23596dc split_insns(rtx_def*, rtx_insn*) ../../gcc/config/i386/i386.md:16392 0xfccd5a try_split(rtx_def*, rtx_insn*, int) ../../gcc/emit-rtl.cc:3799 0x132e9d8 split_insn ../../gcc/recog.cc:3384 0x13359d5 split_all_insns() ../../gcc/recog.cc:3488 0x1335ae8 execute ../../gcc/recog.cc:4412 Please submit a full bug report, with preprocessed source (by using -freport-bug). Please include the complete backtrace with any bug report. See <https://gcc.gnu.org/bugs/> for instructions. Fixed thusly. 2022-10-11 Jakub Jelinek <jakub@redhat.com> PR target/107185 * config/i386/i386.md (*notxor<mode>_1): Use MASK_REG_P (x) instead of MASK_REGNO_P (REGNO (x)).