riscv-gnu-toolchain/gcc.git

Age	Commit message	Author	Files	Lines
2024-12-18	noncontiguous ifcombine: skip marking of non-SSA_NAMEs [PR117915]	Alexandre Oliva	1	-0/+9
	When ifcombine_mark_ssa_name is called directly, rather than by ifcombine_mark_ssa_name_walk, we need to check that name is an SSA_NAME at the caller or in the function itself. For convenience and safety, I'm moving the checks from _walk to the implementation proper. for gcc/ChangeLog PR tree-optimization/117915 * tree-ssa-ifcombine.cc (ifcombine_mark_ssa_name): Move preconditions from... (ifcombine_mark_ssa_name_walk): ... here. for gcc/testsuite/ChangeLog PR tree-optimization/117915 * gcc.dg/pr117915.c: New.
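The fix follows a common defensive pattern: the precondition moves out of the walker callback and into the function proper, so direct callers get the same protection. This is a minimal C sketch; the `node` type and `NODE_*` kinds are hypothetical stand-ins for GCC trees, not GCC's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC tree codes; not GCC's real types.  */
enum node_kind { NODE_SSA_NAME, NODE_CONSTANT, NODE_DECL };

struct node
{
  enum node_kind kind;
  bool marked;
};

/* The precondition lives here, so every caller is covered, whether it
   comes through the walker or calls this function directly.  */
void
mark_ssa_name (struct node *n)
{
  if (n == NULL || n->kind != NODE_SSA_NAME)
    return;
  n->marked = true;
}

/* The walker callback no longer needs a check of its own.  */
void
mark_ssa_name_walk (struct node *n)
{
  mark_ssa_name (n);
}
```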
2024-12-18	ifcombine field merge: adjust testcases [PR118025]	Alexandre Oliva	2	-3/+3
	There was a thinko in the testcase field-merge-9.c: I overcorrected it for big-endian. As a bonus, I'm including stdbool.h in field-merge-12.c, because I used bool without the header there. for gcc/testsuite/ChangeLog PR testsuite/118025 * gcc.dg/field-merge-9.c (q): Drop overcorrection for big-endian. * gcc.dg/field-merge-12.c: Include stdbool.h.
2024-12-18	ifcombine field merge: do not follow a second conversion [PR118046]	Alexandre Oliva	1	-0/+26
	The testcase shows that conversions that would negatively impact the ifcombine field merging implementation won't always have been optimized out by the time we reach ifcombine. There's probably room to support multiple conversions with extra logic, but this workaround should avoid codegen errors until that logic is figured out. for gcc/ChangeLog PR tree-optimization/118046 * gimple-fold.cc (decode_field_reference): Don't follow more than one conversion. for gcc/testsuite/ChangeLog PR tree-optimization/118046 * gcc.dg/field-merge-14.c: New.
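Why a second conversion matters: a narrowing conversion followed by a widening one does not preserve the value, so treating the outer conversion as if it applied directly to the inner object can change which bits are compared. A small C illustration with ordinary integer types (the function names are illustrative only, not from the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Widen directly: all 16 bits of X survive.  */
uint32_t
widen_once (uint16_t x)
{
  return (uint32_t) x;
}

/* Narrow, then widen: the high byte of X is lost in the middle step.  */
uint32_t
narrow_then_widen (uint16_t x)
{
  return (uint32_t) (uint8_t) x;
}
```

Looking through both conversions of `narrow_then_widen` would wrongly treat it like `widen_once`.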
2024-12-18	ifcombine field merge: stricten loads tests, swap compare to match	Alexandre Oliva	1	-0/+93
	ACATS-4 ca11d02 exposed an error in the logic for recognizing and identifying the inner object in decode_field_reference: a view-converting load, inserted in a previous successful field merging operation, was recognized by gimple_convert_def_p within decode_field_reference, and as a result we took its operand as the expression and failed to take note of the load location. Without that load, we couldn't compare vuses, and then we ended up inserting a wider load before relevant parts of the object were initialized. This patch makes gimple_convert_def_p recognize loads only when requested, and requires that either both or neither part of a potentially merged operand have associated loads. As a bonus, it enables additional optimizations by swapping the operands of the second compare when that makes the left-hand operands of both compares match. for gcc/ChangeLog * gimple-fold.cc (gimple_convert_def_p): Reject load stmts unless requested. (decode_field_reference): Accept a converting load at the last conversion matcher, subsuming the load identification. (fold_truth_andor_for_ifcombine): Refuse to merge operands when only one of them has an associated load stmt. Swap operands of one of the compares if that helps them match. for gcc/testsuite/ChangeLog * gcc.dg/field-merge-13.c: New.
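The operand swap can be pictured as a normalization step. If the second compare has the shared operand on its right-hand side, swapping its operands (valid for EQ/NE, which are symmetric) gives both compares the same left-hand operand. This is a hedged C sketch with a hypothetical `compare` struct whose operands are abstract integer ids. It is not GCC's implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical representation of "lhs CODE rhs"; operands are just ids.  */
enum cmp_code { CMP_EQ, CMP_NE };

struct compare
{
  enum cmp_code code;
  int lhs;
  int rhs;
};

/* Swap the operands of SECOND when that makes its left-hand operand match
   FIRST's.  EQ and NE are symmetric, so the swap keeps the meaning.
   Returns true if the left-hand operands match afterwards.  */
bool
match_lhs_by_swapping (const struct compare *first, struct compare *second)
{
  if (second->lhs != first->lhs && second->rhs == first->lhs)
    {
      int tmp = second->lhs;
      second->lhs = second->rhs;
      second->rhs = tmp;
    }
  return second->lhs == first->lhs;
}
```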
2024-12-18	gimple-fold: Fix up decode_field_reference xor handling [PR118081]	Jakub Jelinek	1	-0/+28
	The function comment says: XOR_P is to be FALSE if EXP might be a XOR used in a compare, in which case, if XOR_CMP_OP is a zero constant, it will be overridden with PEXP, XOR_P will be set to TRUE, and the left-hand operand of the XOR will be decoded. If XOR_P is TRUE, XOR_CMP_OP is supposed to be NULL, and then the right-hand operand of the XOR will be decoded. and the comment right above the xor_p handling says /* Turn (a ^ b) [!]= 0 into a [!]= b. */ but I don't see anything that would actually check that the other operand is 0; in the testcase below it happily optimizes (a ^ 1) == 8 into a == 1. The following patch adds that check. Note, there are various other parts of the function I'm worried about, but haven't had time to construct counterexamples yet. One worrying thing is the /* Drop casts, only save the outermost type. We need not worry about narrowing then widening casts, or vice-versa, for those that are not essential for the compare have already been optimized out at this point. */ comment: while obviously there are various optimizations which do optimize nested casts and the like, I'm not really sure it is safe to rely on them always happening before this optimization. There are various options to disable certain optimizations, and some IL could appear right before ifcombine without being optimized yet the way this routine expects. Plus, the 3 casts are looked through in between various optimizations which might make those narrowing/widening or vice versa cases necessary. Also, e.g. for the xor optimization, I think there is a difference between int a and (a ^ 0x23) == 0 and ((int) (((unsigned char) a) ^ (unsigned char) 0x23)) == 0 etc. Another thing I'm worried about is mixing up the different patterns together: there is the BIT_AND_EXPR handling, BIT_XOR_EXPR handling, RSHIFT_EXPR handling and then load handling. What if all 4 appear together, or 3 of them, 2 of them? Is the xor optimization still valid if there is BIT_AND_EXPR in between? I.e. instead of (a ^ 123) == 0 there is ((a ^ 123) & 234) == 0 ? 2024-12-18 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/118081 * gimple-fold.cc (decode_field_reference): Only set xor_p to true if xor_cmp_op is integer_zerop. * gcc.dg/pr118081.c: New test.
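The missing check is easy to confirm by brute force. Rewriting `(a ^ c) == k` into `a == c` is only the `(a ^ b) == 0` to `a == b` transform when `k` is zero. Because XOR is invertible, the always-valid rewrite would be `a == (c ^ k)`, but the patch does not do that. It only restricts the existing transform to a zero constant. Function names here are illustrative:

```c
#include <assert.h>
#include <stdbool.h>

/* Return true if "(a ^ c) == k" and "a == c" agree for every A in
   [-1024, 1024].  */
bool
xor_rewrite_matches (int c, int k)
{
  for (int a = -1024; a <= 1024; a++)
    if (((a ^ c) == k) != (a == c))
      return false;
  return true;
}

/* The general rewrite "(a ^ c) == k" -> "a == (c ^ k)" always holds.  */
bool
general_rewrite_matches (int c, int k)
{
  for (int a = -1024; a <= 1024; a++)
    if (((a ^ c) == k) != (a == (c ^ k)))
      return false;
  return true;
}
```

For the testcase's `(a ^ 1) == 8`, the compare holds for `a == 9`, not `a == 1`, so `xor_rewrite_matches (1, 8)` is false.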
2024-12-17	ipa: Skip widening type conversions in jump function constructions	Martin Jambor	1	-0/+48
	Originally, we did not stream any formal parameter types into WPA and were generally very conservative when it came to type mismatches in IPA-CP. Over time, mismatches that happened in code and blew up in WPA made us much more resilient, and also made us stream the types of the parameters, which we now use commonly. With that information, we can safely skip conversions when looking at the IL from which we build jump functions and then simply fold-convert the constants and ranges to the resulting type, as long as we are careful that performing the corresponding folding of constants gives the corresponding results. In order to do that, we must ensure that the old value can be represented in the new one without any loss. With this change, we can nicely propagate non-NULLness in IPA-VR as demonstrated with the new test case. I have gone through all other uses of (all components of) jump functions which could be affected by this and verified they do indeed check types and can handle mismatches. gcc/ChangeLog: 2024-12-11 Martin Jambor <mjambor@suse.cz> * ipa-prop.cc: Include vr-values.h. (skip_a_safe_conversion_op): New function. (ipa_compute_jump_functions_for_edge): Use it. gcc/testsuite/ChangeLog: 2024-11-01 Martin Jambor <mjambor@suse.cz> * gcc.dg/ipa/vrp9.c: New test.
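The condition "the old value can be represented in the new one without any loss" can be stated in terms of precision and signedness. This is a sketch of such a predicate for integer types. It assumes a type is described only by its bit precision and a signedness flag. The function name is hypothetical, and the body of GCC's `skip_a_safe_conversion_op` is not shown in this log.

```c
#include <assert.h>
#include <stdbool.h>

/* Return true if every value of an integer type with FROM_PREC bits and
   signedness FROM_UNSIGNED is representable in a type with TO_PREC bits
   and signedness TO_UNSIGNED.  */
bool
conversion_is_lossless (unsigned from_prec, bool from_unsigned,
                        unsigned to_prec, bool to_unsigned)
{
  if (from_unsigned == to_unsigned)
    /* Same signedness: the target just needs at least as many bits.  */
    return to_prec >= from_prec;
  if (from_unsigned)
    /* Unsigned to signed: one extra bit is needed for the sign.  */
    return to_prec > from_prec;
  /* Signed to unsigned: negative values can never be represented.  */
  return false;
}
```

When the predicate holds, a constant or range computed in the narrower type keeps the same numeric bounds after being fold-converted to the wider one.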
2024-12-16	testsuite: Force max-completely-peeled-insns=300 for CRIS, PR118055	Hans-Peter Nilsson	2	-2/+2
	This handles fallout from r15-6097-gee2f19b0937b5e. A brief analysis shows that the metric used in that code is computed by estimate_move_cost, differentiating on the target macro MOVE_MAX_PIECES (which defaults to MOVE_MAX) which for most "32-bit targets" is 4 and for "64-bit targets" is 8. There are some outliers, like pru, with MOVE_MAX set to 8 but counting as a 32-bit target. So, the main difference for this test-case, which is heavy on 64-bit moves (most targets have "double" mapped to IEEE 64-bit), is between "32-bit" and "64-bit", with the cost up to twice as high for the former as for the latter. I see no effective_target_move_max_is_4 or equivalent, and this instance falls below the threshold of adding one, so I'm sticking to a list of targets. For CRIS, 210 would suffice, but there's no need to be this specific, and it would make the test even more brittle. PR tree-optimization/118055 * gcc.dg/tree-ssa/pr83403-1.c, gcc.dg/tree-ssa/pr83403-2.c: Add cris-*-* to targets passing --param=max-completely-peeled-insns=300.
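This is a simplified model of the cost difference described above. Assume a move of SIZE bytes costs as many units as the MOVE_MAX_PIECES-sized chunks it needs. Then an 8-byte double costs 2 where MOVE_MAX_PIECES is 4 and 1 where it is 8. The formula is an approximation assumed for illustration, not a transcription of `estimate_move_cost`:

```c
#include <assert.h>

/* Number of MOVE_MAX_PIECES-sized chunks needed to move SIZE bytes,
   rounding up.  */
unsigned
approx_move_cost (unsigned size, unsigned move_max_pieces)
{
  assert (move_max_pieces > 0);
  return (size + move_max_pieces - 1) / move_max_pieces;
}
```

Under this model, a loop body dominated by 8-byte moves roughly doubles in estimated size on "32-bit" targets. That is consistent with a peeling threshold tuned on 64-bit targets being exceeded there.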
2024-12-16	diagnostics: move libgdiagnostics dc from sinks into diagnostic_manager	David Malcolm	3	-2/+5
	libgdiagnostics was written before the fixes for PR other/116613 allowed a diagnostic_context to have multiple output sinks. Hence each libgdiagnostics sink had its own diagnostic_context with just one diagnostic_output_format. This wart is no longer necessary and makes it harder to move state into the manager/context; in particular for quoting source code from the .sarif file (PR sarif-replay/117943). Simplify, by making libgdiagnostics' implementation more similar to GCC's implementation, by moving the diagnostic_context from sink into diagnostic_manager. Doing so requires generalizing where the diagnostic_source_printing_options comes from in class diagnostic_text_output_format: for GCC we use the instance within the diagnostic_context, whereas for libgdiagnostics each diagnostic_text_sink has its own instance. No functional change intended. gcc/c-family/ChangeLog: PR sarif-replay/117943 * c-format.cc (selftest::test_type_mismatch_range_labels): Use dc.m_source_printing. * c-opts.cc (c_diagnostic_text_finalizer): Use source-printing options from text_output. gcc/cp/ChangeLog: PR sarif-replay/117943 * error.cc (auto_context_line::~auto_context_line): Use source-printing options from text_output. gcc/ChangeLog: PR sarif-replay/117943 * diagnostic-format-text.cc (diagnostic_text_output_format::append_note): Use source-printing options from text_output. (diagnostic_text_output_format::update_printer): Copy source-printing options from dc. (default_diagnostic_text_finalizer): Use source-printing options from text_output. * diagnostic-format-text.h (diagnostic_text_output_format::diagnostic_text_output_format): Add optional diagnostic_source_printing_options param, using the context's if null. (diagnostic_text_output_format::get_source_printing_options): New accessor. (diagnostic_text_output_format::m_source_printing): New field. * diagnostic-path.cc (event_range::print): Use source-printing options from text_output. 
(selftest::test_interprocedural_path_1): Use source-printing options from dc. * diagnostic-show-locus.cc (gcc_rich_location::add_location_if_nearby): Likewise. (diagnostic_context::maybe_show_locus): Add "opts" param and use in place of m_source_printing. Pass it to source_policy ctor. (diagnostic_source_print_policy::diagnostic_source_print_policy): Add overload taking a const diagnostic_source_printing_options &. * diagnostic.cc (diagnostic_context::initialize): Pass nullptr for source options when creating text sink, so that it uses the dc's options. (diagnostic_context::dump): Add an "output sinks:" heading and print "(none)" if there aren't any. (diagnostic_context::set_output_format): Split out code into... (diagnostic_context::remove_all_output_sinks): ...this new function. * diagnostic.h (diagnostic_source_print_policy::diagnostic_source_print_policy): Add overload taking a const diagnostic_source_printing_options &. (diagnostic_context::maybe_show_locus): Add "opts" param. (diagnostic_context::remove_all_output_sinks): New decl. (diagnostic_context::m_source_printing): New field. (diagnostic_show_locus): Add "opts" param and pass to maybe_show_locus. * libgdiagnostics.cc (sink::~sink): Delete. (sink::begin_group): Delete. (sink::end_group): Delete. (sink::emit): Delete. (sink::m_dc): Drop field. (diagnostic_text_sink::on_begin_text_diagnostic): Delete. (diagnostic_text_sink::get_source_printing_options): Use m_source_printing. (diagnostic_text_sink::m_current_logical_loc): Drop field. (diagnostic_text_sink::m_inner_sink): New field. (diagnostic_text_sink::m_source_printing): New field. (diagnostic_manager::diagnostic_manager): Update for changes to fields. Initialize m_dc. (diagnostic_manager::~diagnostic_manager): Call diagnostic_finish. (diagnostic_manager::get_file_cache): Drop. (diagnostic_manager::get_dc): New accessor. (diagnostic_manager::begin_group): Reimplement. (diagnostic_manager::end_group): Reimplement. 
(diagnostic_manager::get_prev_diag_logical_loc): New accessor. (diagnostic_manager::m_dc): New field. (diagnostic_manager::m_file_cache): Drop field. (diagnostic_manager::m_edit_context): Convert to a std::unique_ptr so that object can be constructed after m_dc is initialized. (diagnostic_manager::m_prev_diag_logical_loc): New field. (diagnostic_text_sink::diagnostic_text_sink): Reimplement. (get_color_rule): Delete. (diagnostic_text_sink::set_colorize): Reimplement. (diagnostic_text_sink::text_starter): New. (sarif_sink::sarif_sink): Reimplement. (diagnostic_manager::write_patch): Update for change to m_edit_context. (diagnostic_manager::emit): Update now that each sink has a corresponding diagnostic_output_format object within m_dc. gcc/fortran/ChangeLog: PR sarif-replay/117943 * error.cc (gfc_diagnostic_text_starter): Use source-printing options from text_output. gcc/testsuite/ChangeLog: PR sarif-replay/117943 * gcc.dg/plugin/diagnostic_plugin_test_show_locus.cc (custom_diagnostic_text_finalizer): Use source-printing options from text_output. * gcc.dg/plugin/diagnostic_plugin_xhtml_format.cc (xhtml_builder::make_element_for_diagnostic): Use source-printing options from diagnostic_context. * gcc.dg/plugin/expensive_selftests_plugin.cc (test_richloc): Likewise. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
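The ownership change can be sketched with plain C structs. The manager owns a single context, and each text sink refers to that context. A sink either supplies its own source-printing options or falls back to the context's when given none, mirroring the "using the context's if null" behaviour described for the constructor. All type and field names here are hypothetical simplifications of the C++ classes named in the ChangeLog.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct source_printing_options
{
  bool enabled;
  int context_lines;
};

/* One context shared by all sinks, owned by the manager.  */
struct context
{
  struct source_printing_options source_printing;
};

struct text_sink
{
  struct context *ctx;
  const struct source_printing_options *source_printing;
};

struct manager
{
  struct context ctx;
};

/* Bind SINK to MGR's context.  If OPTS is NULL, use the context's own
   options; otherwise the sink keeps its own instance.  */
void
text_sink_init (struct text_sink *sink, struct manager *mgr,
                const struct source_printing_options *opts)
{
  sink->ctx = &mgr->ctx;
  sink->source_printing = opts ? opts : &mgr->ctx.source_printing;
}
```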
2024-12-16	testsuite: Require int32plus target for gcc.dg/pr117816.c	Dimitar Dimitrov	1	-1/+1
	Memmove destination overflows if size of int is less than 3, resulting in spurious test failures. Fix by adding a requirement for effective target int32plus. gcc/testsuite/ChangeLog: * gcc.dg/pr117816.c: Require effective target int32plus. Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
2024-12-15	testsuite: Enable TImode tests on hppa64	John David Anglin	1	-1/+1
	2024-12-15 John David Anglin <danglin@gcc.gnu.org> gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/ivopts-1.c: Enable TImode tests on hppa64.
2024-12-14	[PATCH v3] match.pd: Add pattern to simplify `(a - 1) & -a` to `0`	Jovan Vukic	1	-0/+117
	Thank you for the feedback. I have made the minor changes that were requested. Additionally, I extracted the repetitive code into a reusable helper function, match_plus_neg_pattern, making the code much more readable. Otherwise, the logic, code, and tests remain the same as in version 2 of the patch. gcc/ChangeLog: * match.pd: New pattern. * simplify-rtx.cc (match_plus_neg_pattern): New helper function. (simplify_context::simplify_binary_operation_1): New code to handle (a - 1) & -a, (a - 1) | -a and (a - 1) ^ -a. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/bitops-11.c: New test.
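The identities behind these patterns follow from two's-complement arithmetic. Since -a equals ~(a - 1), (a - 1) & -a is always 0, while (a - 1) | -a and (a - 1) ^ -a are all-ones. The following exhaustive check covers every 8-bit unsigned value with wrapping arithmetic, so a == 0 is included (the function name is illustrative):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Check (a - 1) & -a == 0, (a - 1) | -a == ~0 and (a - 1) ^ -a == ~0 for
   every 8-bit value, using modular (wrapping) unsigned arithmetic.  */
bool
check_plus_neg_identities_u8 (void)
{
  for (unsigned v = 0; v <= UINT8_MAX; v++)
    {
      uint8_t a = (uint8_t) v;
      uint8_t am1 = (uint8_t) (a - 1u);
      uint8_t neg = (uint8_t) -a;
      if ((uint8_t) (am1 & neg) != 0
          || (uint8_t) (am1 | neg) != UINT8_MAX
          || (uint8_t) (am1 ^ neg) != UINT8_MAX)
        return false;
    }
  return true;
}
```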
2024-12-14	gimple-fold: Fix the recent ifcombine optimization for _BitInt [PR118023]	Jakub Jelinek	1	-0/+11
	The BIT_FIELD_REF verifier has: if (INTEGRAL_TYPE_P (TREE_TYPE (op)) && !type_has_mode_precision_p (TREE_TYPE (op))) { error ("%qs of non-mode-precision operand", code_name); return true; } check among other things, so one can't extract something out of say _BitInt(63) or _BitInt(4096). The new ifcombine optimization happily creates such BIT_FIELD_REFs and ICEs during their verification. The following patch fixes that by rejecting those in decode_field_reference. 2024-12-14 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/118023 * gimple-fold.cc (decode_field_reference): Return NULL_TREE if inner has non-type_has_mode_precision_p integral type. * gcc.dg/bitint-119.c: New test.
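The rejected case boils down to "the integral type's precision is not the precision of a machine mode". This is a hedged sketch of that test. For illustration it assumes the target's integer modes are exactly 8, 16, 32, 64 and 128 bits wide. Real targets differ, and GCC's type_has_mode_precision_p compares the type's precision with that of its own mode rather than with a fixed list.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed integer mode widths for this illustration only.  */
static const unsigned assumed_mode_bits[] = { 8, 16, 32, 64, 128 };

/* Return true if PRECISION matches one of the assumed mode widths, i.e. a
   BIT_FIELD_REF could be taken from an operand of that precision.  */
bool
has_mode_precision (unsigned precision)
{
  for (unsigned i = 0;
       i < sizeof assumed_mode_bits / sizeof assumed_mode_bits[0]; i++)
    if (assumed_mode_bits[i] == precision)
      return true;
  return false;
}
```

Under these assumptions, _BitInt(63) and _BitInt(4096) both fail the test, so decode_field_reference would return NULL_TREE for them.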
2024-12-14	warn-access: Fix up matching_alloc_calls_p [PR118024]	Jakub Jelinek	1	-0/+15
	The following testcase ICEs because of a bug in matching_alloc_calls_p. The loop was apparently meant to walk the two attribute chains in lock-step, but doesn't really do that. If the first lookup_attribute returns non-NULL, the second one is not done, so rmats in that case can be some random unrelated attribute rather than a "malloc" attribute; the body assumes that rmats, if non-NULL, is also a "malloc" attribute and relies on its argument being a "malloc" argument, and if it is some other attribute with incompatible arguments, it just crashes. Now, fixing that in the obvious way (instead of doing (amats = lookup_attribute ("malloc", amats)) || (rmats = lookup_attribute ("malloc", rmats)) in the condition, doing ((amats = lookup_attribute ("malloc", amats)), (rmats = lookup_attribute ("malloc", rmats)), (amats || rmats))) fixes the testcase but regresses Wmismatched-dealloc-{2,3}.c tests. The problem is that walking the attribute lists in lock-step is obviously a very bad idea: there is no requirement that the same deallocators are present in the same order on both decls, e.g. there could be an extra malloc attribute without argument in just one of the lists, or the order of say free/realloc could be swapped, etc. We don't generally document nor enforce any particular ordering of attributes (even when for some attributes we just handle the first one rather than all). So, this patch instead simply splits it into two loops: the first one walks alloc_decl attributes, the second one walks dealloc_decl attributes. If the malloc attribute argument is a built-in, that doesn't change anything, and otherwise we have the chance to populate the whole common_deallocs hash_set in the first loop and then can check it in the second one (and don't need to use the more expensive add method on it, can just check contains there). 
Not to mention that it also fixes the case when the function would incorrectly return true if there wasn't a common deallocator between the two, but dealloc_decl had 2 malloc attributes with the same deallocator. 2024-12-14 Jakub Jelinek <jakub@redhat.com> PR middle-end/118024 * gimple-ssa-warn-access.cc (matching_alloc_calls_p): Walk malloc attributes of alloc_decl and dealloc_decl in separate loops rather than in lock-step. Use common_deallocs.contains rather than common_deallocs.add in the second loop. * gcc.dg/pr118024.c: New test.
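The difference between lock-step walking and the two-loop approach can be shown with plain arrays of deallocator names standing in for the malloc attribute arguments. This is a simplified sketch with hypothetical function names. A linear search over a small fixed-capacity array replaces the hash_set, and it is not the code of matching_alloc_calls_p.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Flawed approach: compare the two lists position by position.  */
bool
common_dealloc_lockstep (const char *const *a, size_t na,
                         const char *const *b, size_t nb)
{
  for (size_t i = 0; i < na && i < nb; i++)
    if (strcmp (a[i], b[i]) == 0)
      return true;
  return false;
}

/* Two-loop approach: the first loop collects every deallocator of the
   allocation function (SEEN stands in for the hash_set), the second loop
   only checks membership, so duplicates in B cannot cause a match.  */
bool
common_dealloc_two_loops (const char *const *a, size_t na,
                          const char *const *b, size_t nb)
{
  const char *seen[16];
  size_t nseen = 0;

  assert (na <= 16);
  for (size_t i = 0; i < na; i++)
    seen[nseen++] = a[i];
  for (size_t j = 0; j < nb; j++)
    for (size_t k = 0; k < nseen; k++)
      if (strcmp (b[j], seen[k]) == 0)
        return true;
  return false;
}
```

With the same deallocators listed in a different order on the two declarations, the lock-step walk finds no match, while the two-loop version does.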
2024-12-12	OpenMP: Enable has_device_addr clause for 'dispatch' in C/C++	Tobias Burnus	1	-0/+5
	The 'has_device_addr' of 'dispatch' has to be seen in conjunction with the 'need_device_addr' modifier to the 'adjust_args' clause of 'declare variant'. As the latter has not yet been implemented, 'has_device_addr' has no real effect. However, to prepare for 'need_device_addr' and as a service to the user: For C, where 'need_device_addr' is not permitted (contrary to C++ and Fortran), a note is output when the user tries to use it (alongside the existing error that either 'nothing' or 'need_device_ptr' was expected). And, on the ME side, it is lightly handled by diagnosing when - for the same argument - there is a mismatch between the variant's adjust_args 'need_device_ptr' modifier and dispatch having a 'has_device_addr' clause (or likewise for need_device_addr with is_device_ptr) as, according to the spec, those are completely separate. Thus, 'dispatch' will still do the host to device pointer conversion for a 'need_device_ptr' argument, even if it appeared in a 'has_device_addr' clause. gcc/c/ChangeLog: * c-parser.cc (OMP_DISPATCH_CLAUSE_MASK): Add has_device_addr clause. (c_finish_omp_declare_variant): Add an 'inform' telling the user that 'need_device_addr' is invalid for C. gcc/cp/ChangeLog: * parser.cc (OMP_DISPATCH_CLAUSE_MASK): Add has_device_addr clause. gcc/ChangeLog: * gimplify.cc (gimplify_call_expr): When handling OpenMP's dispatch, add diagnostic when there is a ptr vs. addr mismatch between need_device_{addr,ptr} and {is,has}_device_{ptr,addr}, respectively. gcc/testsuite/ChangeLog: * c-c++-common/gomp/adjust-args-3.c: New test. * gcc.dg/gomp/adjust-args-2.c: New test.
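The new middle-end check pairs each argument's adjust_args modifier on the variant with the clause naming that argument on dispatch, and flags ptr-versus-addr combinations. This is a hypothetical C sketch of that pairing. The enums and function are illustrative, not gimplify.cc code.

```c
#include <assert.h>
#include <stdbool.h>

/* Modifier on the variant's adjust_args clause for one argument.  */
enum adjust_kind { ADJUST_NONE, NEED_DEVICE_PTR, NEED_DEVICE_ADDR };

/* Clause on the dispatch construct that names the same argument.  */
enum dispatch_clause { CLAUSE_NONE, IS_DEVICE_PTR, HAS_DEVICE_ADDR };

/* Return true if the combination should be diagnosed: a ptr-style modifier
   paired with an addr-style clause, or vice versa.  */
bool
dispatch_arg_mismatch (enum adjust_kind adj, enum dispatch_clause clause)
{
  return (adj == NEED_DEVICE_PTR && clause == HAS_DEVICE_ADDR)
         || (adj == NEED_DEVICE_ADDR && clause == IS_DEVICE_PTR);
}
```

The diagnosis does not change code generation. A need_device_ptr argument is still converted even if it also appears in has_device_addr.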
2024-12-12	fold fold_truth_andor field merging into ifcombine	Alexandre Oliva	12	-0/+424
	This patch introduces various improvements to the logic that merges field compares, while moving it into ifcombine. Before the patch, we could merge: (a.x1 EQNE b.x1) ANDOR (a.y1 EQNE b.y1) into something like: (((type *)&a)[Na] & MASK) EQNE (((type *)&b)[Nb] & MASK) if both of A's fields live within the same alignment boundaries, and so do B's, at the same relative positions. Constants may be used instead of the object B. The initial goal of this patch was to enable such combinations when a field crossed alignment boundaries, e.g. for packed types. We can't generally access such fields with a single memory access, so when we come across such a compare, we will attempt to combine each access separately. Some merging opportunities were missed because of right-shifts, compares expressed as e.g. ((a.x1 ^ b.x1) & MASK) EQNE 0, and narrowing conversions, especially after earlier merges. This patch introduces handlers for several cases involving these. The merging of multiple field accesses into wider bitfield-like accesses is undesirable to do too early in compilation, so we move it from folding to ifcombine, and guard its warnings with -Wtautological-compare, turned into a common flag. When the second of a noncontiguous pair of compares is the first that accesses a word, we may merge the first compare with part of the second compare that refers to the same word, keeping the compare of the remaining bits at the spot where the second compare used to be. Handling compares with non-constant fields was somewhat generalized from what fold used to do, now handling non-adjacent fields, even if a field of one object crosses an alignment boundary but the other doesn't. for gcc/ChangeLog * fold-const.cc (make_bit_field): Export. (unextend, all_ones_mask_p): Drop. (decode_field_reference, fold_truth_andor_1): Move field compare merging logic... * gimple-fold.cc: (fold_truth_andor_for_ifcombine) ... here, with -Wtautological-compare warning guards, and... 
(decode_field_reference): ... here. Rework for gimple. (gimple_convert_def_p, gimple_binop_def_p): New. (compute_split_boundary_from_align): New. (make_bit_field_load, build_split_load): New. (reuse_split_load): New. * fold-const.h: (make_bit_field_ref): Declare (fold_truth_andor_for_ifcombine): Declare. * tree-ssa-ifcombine.cc (ifcombine_ifandif): Try fold_truth_andor_for_ifcombine. * common.opt (Wtautological-compare): Move here. for gcc/c-family/ChangeLog * c.opt (Wtautological-compare): Move to ../common.opt. for gcc/testsuite/ChangeLog * gcc.dg/field-merge-1.c: New. * gcc.dg/field-merge-2.c: New. * gcc.dg/field-merge-3.c: New. * gcc.dg/field-merge-4.c: New. * gcc.dg/field-merge-5.c: New. * gcc.dg/field-merge-6.c: New. * gcc.dg/field-merge-7.c: New. * gcc.dg/field-merge-8.c: New. * gcc.dg/field-merge-9.c: New. * gcc.dg/field-merge-10.c: New. * gcc.dg/field-merge-11.c: New. * gcc.dg/field-merge-12.c: New. * gcc.target/aarch64/long_branch_1.c: Disable ifcombine.
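The basic merge that this commit generalizes, turning (a.x1 == b.x1) && (a.y1 == b.y1) into one masked compare of the containing word, can be checked in portable C. The sketch assumes a 4-byte struct with no padding between its byte-sized members. It builds the mask from the struct layout itself, so it does not depend on endianness. The struct and function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct rec
{
  uint8_t x1;
  uint8_t other;  /* Not part of the compare.  */
  uint8_t y1;
  uint8_t more;   /* Not part of the compare.  */
};

_Static_assert (sizeof (struct rec) == sizeof (uint32_t),
                "sketch assumes no padding");

/* Field-by-field form.  */
bool
fields_equal (const struct rec *a, const struct rec *b)
{
  return a->x1 == b->x1 && a->y1 == b->y1;
}

/* Merged form: load the whole word from each object and compare only the
   bits covered by x1 and y1.  */
bool
words_equal_masked (const struct rec *a, const struct rec *b)
{
  struct rec m = { 0xff, 0, 0xff, 0 };
  uint32_t mask, wa, wb;

  memcpy (&mask, &m, sizeof mask);
  memcpy (&wa, a, sizeof wa);
  memcpy (&wb, b, sizeof wb);
  return (wa & mask) == (wb & mask);
}
```

The patch's additions (fields crossing alignment boundaries, split loads, right shifts and XOR forms) go beyond this simple single-word case.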
2024-12-12	Fix type compatibility for types with flexible array member 2/2 [PR113688,PR114713,PR117724]	Martin Uecker	4	-0/+73
	For checking or computing TYPE_CANONICAL, ignore the array size when it is the last element of a structure or union. To avoid errors because of an inconsistent number of members, zero-sized arrays which are the last element are no longer ignored when checking the fields of a struct. PR c/113688 PR c/114014 PR c/114713 PR c/117724 gcc/ChangeLog: * tree.cc (gimple_canonical_types_compatible_p): Add exception. gcc/lto/ChangeLog: * lto-common.cc (hash_canonical_type): Add exception. gcc/testsuite/ChangeLog: * gcc.dg/pr113688.c: New test. * gcc.dg/pr114014.c: New test. * gcc.dg/pr114713.c: New test. * gcc.dg/pr117724.c: New test.
2024-12-12	testsuite: arm: Use -mcpu=unset when overriding -march	Torbjörn SVENSSON	2	-2/+2
	Update test cases to use -mcpu=unset/-march=unset feature introduced in r15-3606-g7d6c6a0d15c. gcc/testsuite/ChangeLog: * gcc.dg/pr41574.c: Added option "-mcpu=unset". * gcc.dg/pr59418.c: Likewise. * lib/target-supports.exp (add_options_for_vect_early_break): Likewise. (add_options_for_arm_v8_neon): Likewise. (check_effective_target_arm_neon_ok_nocache): Likewise. (check_effective_target_arm_simd32_ok_nocache): Likewise. (check_effective_target_arm_sat_ok_nocache): Likewise. (check_effective_target_arm_dsp_ok_nocache): Likewise. (check_effective_target_arm_crc_ok_nocache): Likewise. (check_effective_target_arm_v8_neon_ok_nocache): Likewise. (check_effective_target_arm_v8_1m_mve_fp_ok_nocache): Likewise. (check_effective_target_arm_v8_1a_neon_ok_nocache): Likewise. (check_effective_target_arm_v8_2a_fp16_scalar_ok_nocache): Likewise. (check_effective_target_arm_v8_2a_fp16_neon_ok_nocache): Likewise. (check_effective_target_arm_v8_2a_dotprod_neon_ok_nocache): Likewise. (check_effective_target_arm_v8_1m_mve_ok_nocache): Likewise. (check_effective_target_arm_v8_2a_i8mm_ok_nocache): Likewise. (check_effective_target_arm_fp16fml_neon_ok_nocache): Likewise. (check_effective_target_arm_v8_2a_bf16_neon_ok_nocache): Likewise. (check_effective_target_arm_v8m_main_cde_ok_nocache): Likewise. (check_effective_target_arm_v8m_main_cde_fp_ok_nocache): Likewise. (check_effective_target_arm_v8_1m_main_cde_mve_ok_nocache): Likewise. (check_effective_target_arm_v8_1m_main_cde_mve_fp_ok_nocache): Likewise. (check_effective_target_arm_v8_3a_complex_neon_ok_nocache): Likewise. (check_effective_target_arm_v8_3a_fp16_complex_neon_ok_nocache): Likewise. (check_effective_target_arm_v8_1_lob_ok): Likewise. Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
2024-12-11	diagnostics: suppress "note: " prefix in nested diagnostics [PR116253]	David Malcolm	3	-36/+36
	This patch is a followup to: "c++: use diagnostic nesting [PR116253]" This patch tweaks how text output with experimental-nesting=yes prints nested diagnostics, by omitting the leading "note: " from nested notes. This reduces the amount of visual cruft the user has to ignore when reading C++ template errors; see the examples in the testsuite. This doesn't affect the output for users who have not opted-in to nested diagnostic-printing. gcc/ChangeLog: PR other/116253 * diagnostic-format-text.cc (build_prefix): Don't add the "note: " prefix when showing nested diagnostics. gcc/testsuite/ChangeLog: PR other/116253 * g++.dg/concepts/nested-diagnostics-1-truncated.C: Update expected output. * g++.dg/concepts/nested-diagnostics-1.C: Likewise. * g++.dg/concepts/nested-diagnostics-2.C: Likewise. * gcc.dg/plugin/diagnostic-test-nesting-text-indented-show-levels.c: Likewise. * gcc.dg/plugin/diagnostic-test-nesting-text-indented-unicode.c: Likewise. * gcc.dg/plugin/diagnostic-test-nesting-text-indented.c: Likewise. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
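In effect, the prefix now depends on both the diagnostic kind and whether the diagnostic is being printed nested. This is a hypothetical C sketch of that decision. GCC's build_prefix is C++ and handles more than shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum diag_kind { DK_ERROR, DK_WARNING, DK_NOTE };

/* Return the text prefix for a diagnostic of kind KIND.  When NESTED is
   true (nesting enabled and this diagnostic is nested), notes get no
   "note: " prefix; other kinds are unchanged.  */
const char *
diag_prefix (enum diag_kind kind, bool nested)
{
  switch (kind)
    {
    case DK_ERROR:
      return "error: ";
    case DK_WARNING:
      return "warning: ";
    case DK_NOTE:
      return nested ? "" : "note: ";
    }
  return "";
}
```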
2024-12-10	Fix inaccuracy in cunroll/cunrolli when considering what's innermost loop.	liuhongt	4	-0/+110
	r15-919-gef27b91b62c3aa removed the 1 / 3 size reduction for innermost loops, but it doesn't accurately remember what's "innermost" for 2 testcases in PR117888. 1) For pass_cunroll, the "innermost" loop could be an originally outer loop whose inner loop was completely unrolled by cunrolli. The patch moves the local variable cunrolli to a parameter of tree_unroll_loops_completely and passes it directly from the pass's execute. 2) For pass_cunrolli, cunrolli is set to false when the sibling loop of an innermost loop is completely unrolled, and it then inaccurately treats the innermost loop as an "outer" loop. The patch adds another parameter, innermost, to help recognize the "original" innermost loops. gcc/ChangeLog: PR tree-optimization/117888 * tree-ssa-loop-ivcanon.cc (try_unroll_loop_completely): Use cunrolli instead of cunrolli && !loop->inner to check if it's innermost loop. (canonicalize_loop_induction_variables): Add new parameter const_sbitmap innermost, and pass cunrolli && (unsigned) loop->num < SBITMAP_SIZE (innermost) && bitmap_bit_p (innermost, loop->num) as "cunrolli" to try_unroll_loop_completely. (canonicalize_induction_variables): Pass innermost to canonicalize_loop_induction_variables. (tree_unroll_loops_completely_1): Add new parameter const_sbitmap innermost. (tree_unroll_loops_completely): Move local variable cunrolli to parameter to indicate it's from pass cunrolli, also track all "original" innermost loop at the beginning. gcc/testsuite/ChangeLog: * gcc.dg/pr117888-2.c: New test. * gcc.dg/vect/pr117888-1.c: Ditto. * gcc.dg/tree-ssa/pr83403-1.c: Add --param max-completely-peeled-insns=300 for arm-*-*. * gcc.dg/tree-ssa/pr83403-2.c: Ditto.
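The value passed as "cunrolli" to try_unroll_loop_completely combines the pass flag with a lookup in a bitmap of loops that were innermost when the pass started. The lookup is bounds-checked, presumably because loops created after the bitmap was allocated can have larger numbers. This C sketch of that guard uses a hand-rolled bitmap as a stand-in for GCC's sbitmap, and all names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Minimal fixed-size bitmap standing in for an sbitmap.  */
#define LOOP_BITMAP_SIZE 64

struct loop_bitmap
{
  unsigned char bits[LOOP_BITMAP_SIZE / CHAR_BIT];
};

void
loop_bitmap_set (struct loop_bitmap *b, unsigned n)
{
  assert (n < LOOP_BITMAP_SIZE);
  b->bits[n / CHAR_BIT] |= (unsigned char) (1u << (n % CHAR_BIT));
}

bool
loop_bitmap_test (const struct loop_bitmap *b, unsigned n)
{
  return (b->bits[n / CHAR_BIT] >> (n % CHAR_BIT)) & 1u;
}

/* Treat LOOP_NUM as an original innermost loop only when running as
   cunrolli and the loop was recorded at the start of the pass.  The size
   check comes first so the bitmap is never read out of bounds.  */
bool
treat_as_innermost (bool cunrolli, const struct loop_bitmap *innermost,
                    unsigned loop_num)
{
  return cunrolli && loop_num < LOOP_BITMAP_SIZE
         && loop_bitmap_test (innermost, loop_num);
}
```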
2024-12-10	testsuite/gcc.dg/tree-ssa/pr117973-1.c: New test	Hans-Peter Nilsson	1	-0/+7
	PR117973 covers the aspect of non-LOGICAL_OP_NON_SHORT_CIRCUIT targets for PR111456, for which the test-case gcc.dg/tree-ssa/pr111456-1.c started failing as described in PR117954. * gcc.dg/tree-ssa/pr117973-1.c: New test.
2024-12-10	testsuite/gcc.dg/tree-ssa/pr111456-1.c: Handle fallout	Hans-Peter Nilsson	1	-1/+1
	This is expected fallout from r15-5646-gd1cf0d7a0f27fd as described by that commit. The =0 case is covered by PR117973. PR tree-optimization/117954 * gcc.dg/tree-ssa/pr111456-1.c: Pass --param=logical-op-non-short-circuit=1.
2024-12-08	Support for 64-bit location_t: Activate 64-bit location_t	Lewis Hyatt	2	-7/+8
	Change location_t to be a 64-bit integer instead of a 32-bit integer in libcpp. Also included in this change are the two other patches in the original series which depended on this one; I am committing them all at once in case it needs to be reverted later: -Support for 64-bit location_t: gimple parts The size of struct gimple increased by 8 bytes with the change in size of location_t from 32- to 64-bit; adjust the WORD markings in the comments accordingly. It seems that most of the WORD markings were off by one already, probably not having been updated after a previous reduction in the size of a gimple, so they have become retroactively correct again, and only a couple needed adjustment actually. Also add a comment that there is now 32 bits of unused padding available in struct gimple for 64-bit hosts. -Support for 64-bit location_t: Remove -flarge-source-files The option -flarge-source-files became unnecessary with 64-bit location_t and harms performance compared to the new default setting, so silently ignore it. libcpp/ChangeLog: * include/cpplib.h (struct cpp_token): Adjust comment about the struct size. * include/line-map.h (location_t): Change typedef from 32-bit to 64-bit integer. (LINE_MAP_MAX_COLUMN_NUMBER): Increase size to be appropriate for 64-bit location_t. (LINE_MAP_MAX_LOCATION_WITH_PACKED_RANGES): Likewise. (LINE_MAP_MAX_LOCATION_WITH_COLS): Likewise. (LINE_MAP_MAX_LOCATION): Likewise. (MAX_LOCATION_T): Likewise. (line_map_suggested_range_bits): Likewise. (struct line_map): Adjust comment about the struct size. (struct line_map_macro): Likewise. (struct line_map_ordinary): Likewise. Rearrange fields to optimize padding. gcc/testsuite/ChangeLog: * g++.dg/diagnostic/pr77949.C: Adapt the test for 64-bit location_t, when the previously expected failure doesn't actually happen. * g++.dg/modules/loc-prune-4.C: Adjust the expected output for the 64-bit location_t case. 
* gcc.dg/plugin/expensive_selftests_plugin.cc: Don't try to test the maximum supported column number in 64-bit location_t mode. * gcc.dg/plugin/location_overflow_plugin.cc: Adjust the base_location so it can effectively test 64-bit location_t. gcc/ChangeLog: * gimple.h (struct gphi): Update word marking comments to reflect the new size of location_t. (struct gimple): Likewise. Add a comment about padding. * common.opt: Mark -flarge-source-files as Ignored. * common.opt.urls: Regenerate. * doc/invoke.texi: Remove -flarge-source-files. * toplev.cc (process_options): Remove support for -flarge-source-files.
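Widening a field from 32 to 64 bits can introduce padding, which is why the commit rearranges line_map_ordinary's fields and notes 32 bits of unused padding in struct gimple. This is a generic illustration with hypothetical structs, not GCC's. Exact sizes depend on the ABI. On common 64-bit hosts such as x86-64, where uint64_t is 8-byte aligned, the first struct is 24 bytes and the second 16.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t loc64_t;  /* Stand-in for a 64-bit location_t.  */

/* Poor order: where uint64_t is 8-byte aligned, padding follows START
   (before LOC) and FLAGS (at the end of the struct).  */
struct map_unordered
{
  uint32_t start;
  loc64_t loc;
  uint32_t flags;
};

/* Same fields, largest first: the two 32-bit fields share one 8-byte
   slot, so no padding is needed.  */
struct map_reordered
{
  loc64_t loc;
  uint32_t start;
  uint32_t flags;
};
```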
2024-12-06	avoid-store-forwarding: bail when an instruction may throw [PR117816]	kelefth	1	-0/+11
	Avoid-store-forwarding doesn't handle the case where an instruction in the store-load sequence contains a REG_EH_REGION note, leading to the insertion of instructions after it, while it should be the last instruction in the basic block. This causes an ICE when compiling using `-O -fnon-call-exceptions -favoid-store-forwarding -fno-forward-propagate -finstrument-functions`. This patch rejects the transformation when there are instructions in the sequence that may throw an exception. PR rtl-optimization/117816 gcc/ChangeLog: * avoid-store-forwarding.cc (store_forwarding_analyzer::avoid_store_forwarding): Reject the transformation when having instructions that may throw exceptions in the sequence. gcc/testsuite/ChangeLog: * gcc.dg/pr117816.c: New test.
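The bail-out amounts to a scan over the candidate sequence. This hypothetical C sketch uses a plain flag in place of the REG_EH_REGION check. GCC's actual test uses its own RTL predicates, which are not shown in this log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instruction record: only the property we care about.  */
struct insn
{
  int uid;
  bool may_throw;  /* e.g. carries a REG_EH_REGION note */
};

/* Return true if the store-load sequence INSNS[0..N) may be transformed.
   A throwing instruction must stay last in its basic block, so no new
   instructions may be inserted after it.  */
bool
sequence_ok_for_transform (const struct insn *insns, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (insns[i].may_throw)
      return false;
  return true;
}
```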
2024-12-06	testsuite/117714 - gcc.dg/vect/slp-reduc-4.c FAILs on 32-bit SPARC	Richard Biener	1	-1/+1
	The testcase tries to ensure we can elide all permutations when vectorizing a MAX reduction. For SPARC, the issue is that the MAX reduction isn't supported, and since we're trying to fall back to single-lane SLP, the dumps contain VEC_PERM_EXPR for the interleaving permute lowering. Before all-SLP, these wouldn't appear in the dumps when doing non-SLP vectorization, and we'd eventually fail to vectorize anyway, so no VEC_PERM_EXPRs would appear in the dumps either. The following adds vect_no_int_min_max to the set of xfails for this particular scan as well, like the existing check for vectorizing. PR testsuite/117714 * gcc.dg/vect/slp-reduc-4.c: Add vect_no_int_min_max to the XFAIL for the VEC_PERM_EXPR scan.
2024-12-05	c: Diagnose unexpected va_start arguments in C23 [PR107980]	Jakub Jelinek	8	-3/+132
	The va_start macro was changed in C23 from the C17 form va_start (va_list ap, parmN), where parmN is the identifier of the last parameter, to va_start (va_list ap, ...), where arguments after ap aren't evaluated. Late in C23 development, "If any additional arguments expand to include unbalanced parentheses, or a preprocessing token that does not convert to a token, the behavior is undefined." was added, plus there is "NOTE The macro allows additional arguments to be passed for va_start for compatibility with older versions of the library only." and "Additional arguments beyond the first given to the va_start macro may be expanded and used in unspecified contexts where they are unevaluated. For example, an implementation diagnoses potentially erroneous input for an invocation of va_start such as:" ... va_start(vl, 1, 3.0, "12", xd); // diagnostic encouraged ... "Simultaneously, va_start usage consistent with older revisions of this document should not produce a diagnostic:" ... void neigh (int last_arg, ...) { va_list vl; va_start(vl, last_arg); // no diagnostic The following patch implements the recommended diagnostics. Until now, in C23 mode va_start(v, ...) was defined to __builtin_va_start(v, 0) and the extra arguments were silently ignored. The following patch adds a new builtin in the form of a keyword which parses the first argument and is silent about the __builtin_c23_va_start (ap) form. For __builtin_c23_va_start (ap, identifier), it looks the identifier up and is silent if it is the last named parameter (except that it diagnoses if that parameter has the register keyword); otherwise it diagnoses that it isn't the last named parameter but something else. If there is just __builtin_c23_va_start (ap, ), or if __builtin_c23_va_start (ap, is followed by tokens other than an identifier followed by ), it skips over the tokens (handling balanced ()s) until ) and diagnoses the extra tokens. All of these diagnostics are warnings. 
2024-12-05 Jakub Jelinek <jakub@redhat.com> PR c/107980 gcc/ * ginclude/stdarg.h (va_start): For C23+ change parameters from v, ... to just ... and define to __builtin_c23_va_start(__VA_ARGS__) rather than __builtin_va_start(v, 0). gcc/c-family/ * c-common.h (enum rid): Add RID_C23_VA_START. * c-common.cc (c_common_reswords): Add __builtin_c23_va_start. gcc/c/ * c-parser.cc (c_parser_postfix_expression): Handle RID_C23_VA_START. gcc/testsuite/ * gcc.dg/c23-stdarg-4.c: Expect extra warning. * gcc.dg/c23-stdarg-6.c: Likewise. * gcc.dg/c23-stdarg-7.c: Likewise. * gcc.dg/c23-stdarg-8.c: Likewise. * gcc.dg/c23-stdarg-10.c: New test. * gcc.dg/c23-stdarg-11.c: New test. * gcc.dg/torture/c23-stdarg-split-1a.c: Expect extra warning. * gcc.dg/torture/c23-stdarg-split-1b.c: Likewise.
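A minimal sketch of the two va_start forms discussed above, using a hypothetical variadic helper sum_ints (not part of the patch). It assumes a compiler that sets __STDC_VERSION__ to 202311L in C23 mode; otherwise it falls back to the C17 form, which per the message above should also stay diagnostic-free in C23:

```c
#include <stdarg.h>

/* Hypothetical helper: sum COUNT int arguments.  */
static int sum_ints (int count, ...)
{
  va_list ap;
#if defined __STDC_VERSION__ && __STDC_VERSION__ >= 202311L
  /* C23 form: only the va_list argument is required.  */
  va_start (ap);
#else
  /* C17 form naming the last named parameter; still accepted without a
     diagnostic in C23 mode according to the commit message.  */
  va_start (ap, count);
#endif
  int total = 0;
  for (int i = 0; i < count; i++)
    total += va_arg (ap, int);
  va_end (ap);
  return total;
}
```

For example, sum_ints (3, 1, 2, 3) evaluates to 6. A call such as va_start (ap, 1, 3.0) inside the function would instead be one of the newly warned-about forms.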
2024-12-03	phiopt: Reset the number of iterations information of a loop when changing an exit from the loop [PR117243]	Andrew Pinski	2	-0/+64
	After r12-5300-gf98f373dd822b3, phiopt could get the following bb structure: [CFG diagram: middle-bb, then a block containing phi<1, 2> and cond, with back edges to both blocks] This was considered 2 loops. The inner loop had an upper_bound estimate of 8, due to the original `for (b = 0; b <= 7; b++)`. The outer loop was already an infinite one. So phiopt would come along and change the condition to be unconditionally true; we change the inner loop into an infinite one but don't reset the estimate on the loop, and cleanup cfg comes along and changes it into one loop but also does not reset the estimate of the loop. Then loop unrolling uses the old estimate and decides to add an unreachable there. So the fix is, when phiopt changes an exit from a loop, to reset the estimates, similar to how cleanup cfg does it when merging some basic blocks. Bootstrapped and tested on x86_64-linux-gnu. PR tree-optimization/117243 PR tree-optimization/116749 gcc/ChangeLog: * tree-ssa-phiopt.cc (replace_phi_edge_with_variable): Reset loop estimates if the cond_block was an exit to a loop. gcc/testsuite/ChangeLog: * gcc.dg/torture/pr117243-1.c: New test. * gcc.dg/torture/pr117243-2.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-12-03	Rectify some test cases.	Georg-Johann Lay	10	-8/+17
	PR testsuite/52641 PR testsuite/109123 PR testsuite/114661 PR testsuite/117828 PR testsuite/116481 PR testsuite/91069 gcc/testsuite/ * gcc.dg/Wuse-after-free-pr109123.c: Use size_t instead of long unsigned int. * gcc.dg/c23-tag-bitfields-1.c: Requires int32plus. * gcc.dg/pr114661.c: Same. * gcc.dg/pr117828.c: Same. * gcc.dg/flex-array-counted-by-2.c: Use uintptr_t instead of unsigned long. * gcc.dg/pr116481.c: Same. * gcc.dg/lto/tag-1_0.c: Use int32_t instead of int. * gcc.dg/lto/tag-1_1.c: Use int16_t instead of short. * gcc.dg/pr91069.c: Require double64. * gcc.dg/type-convert-var.c: Require double64plus.
2024-12-03	AVR: Skip some test cases that don't work for it.	Georg-Johann Lay	2	-0/+2
	gcc/testsuite/ * gcc.c-torture/execute/ieee/cdivchkd.x: New file. * gcc.c-torture/execute/ieee/cdivchkf.x: New file. * gcc.dg/flex-array-counted-by.c: Require wchar. * gcc.dg/fold-copysign-1.c [avr]: Add -mdouble=64.
2024-12-03	AVR: Improve location of late diagnostics.	Georg-Johann Lay	5	-10/+6
	Some diagnostics are issued late, e.g. in avr_print_operand(). This patch uses the insn's location as a proxy for the operand location. Without the patch, the location is usually input_location, which points to the closing } of the function body. gcc/ * config/avr/avr.cc (avr_insn_location): New variable. (avr_final_prescan_insn): Set avr_insn_location. (avr_asm_final_postscan_insn): Unset avr_insn_location after last insn. (avr_print_operand): Pass avr_insn_location to warning_at. gcc/testsuite/ * gcc.dg/Warray-bounds-33.c: Adjust for avr diagnostics. * gcc.dg/pr56228.c: Same. * gcc.dg/pr86124.c: Same. * gcc.dg/pr94291.c: Same. * gcc.dg/tree-ssa/pr82059.c: Same.
2024-12-03	Move some CRC tests into the gcc.dg/torture directory	Jeff Law	22	-0/+0
	Jakub noted that these tests were using dg-skip-if directives that implied the tests were expected to run under multiple optimization options, which means they probably should be in gcc.dg/torture rather than in the gcc.dg directory. This moves the relevant tests from gcc.dg to gcc.dg/torture. gcc/testsuite * gcc.dg/crc-linux-1.c: Moved to gcc.dg/torture. * gcc.dg/crc-linux-2.c: Likewise. * gcc.dg/crc-linux-4.c: Likewise. * gcc.dg/crc-linux-5.c: Likewise. * gcc.dg/crc-not-crc-15.c: Likewise. * gcc.dg/crc-side-instr-1.c: Likewise. * gcc.dg/crc-side-instr-2.c: Likewise. * gcc.dg/crc-side-instr-3.c: Likewise. * gcc.dg/crc-side-instr-4.c: Likewise. * gcc.dg/crc-side-instr-5.c: Likewise. * gcc.dg/crc-side-instr-6.c: Likewise. * gcc.dg/crc-side-instr-7.c: Likewise. * gcc.dg/crc-side-instr-8.c: Likewise. * gcc.dg/crc-side-instr-9.c: Likewise. * gcc.dg/crc-side-instr-10.c: Likewise. * gcc.dg/crc-side-instr-11.c: Likewise. * gcc.dg/crc-side-instr-12.c: Likewise. * gcc.dg/crc-side-instr-13.c: Likewise. * gcc.dg/crc-side-instr-14.c: Likewise. * gcc.dg/crc-side-instr-15.c: Likewise. * gcc.dg/crc-side-instr-16.c: Likewise. * gcc.dg/crc-side-instr-17.c: Likewise.
2024-12-03	preprocessor: Adjust C rules on UCNs for C23 [PR117162]	Joseph Myers	9	-632/+5625
	As noted in bug 117162, C23 changed some rules on UCNs to match C++ (this was a late change agreed in the resolution to CD2 comment US-032, implementing changes from N3124), which we need to implement. Allow UCNs below 0xa0 outside identifiers for C, with a pedwarn-if-pedantic before C23 (and a warning with -Wc11-c23-compat), except for the always-allowed cases of UCNs for $ @ `. Also as part of that change, do not allow \u0024 in identifiers as equivalent to $ for C23. Bootstrapped with no regressions for x86_64-pc-linux-gnu. PR c/117162 libcpp/ * include/cpplib.h (struct cpp_options): Add low_ucns. * init.cc (struct lang_flags, lang_defaults): Add low_ucns. (cpp_set_lang): Set low_ucns. * charset.cc (_cpp_valid_ucn): For C, allow UCNs below 0xa0 outside identifiers, with a pedwarn if pedantic before C23 or a warning with -Wc11-c23-compat. Do not allow \u0024 in identifiers for C23. gcc/testsuite/ * gcc.dg/cpp/c17-ucn-1.c, gcc.dg/cpp/c17-ucn-2.c, gcc.dg/cpp/c17-ucn-3.c, gcc.dg/cpp/c17-ucn-4.c, gcc.dg/cpp/c23-ucn-2.c, gcc.dg/cpp/c23-ucnid-2.c: New tests. * c-c++-common/cpp/delimited-escape-seq-3.c, c-c++-common/cpp/named-universal-char-escape-3.c, gcc.dg/cpp/c23-ucn-1.c, gcc.dg/cpp/c2y-delimited-escape-seq-3.c: Update expected messages. * gcc.dg/cpp/ucs.c: Use -pedantic-errors. Update expected messages.
2024-12-03	tree-ssanames, match.pd: get_nonzero_bits/with_*_nonzero_bits cleanups and improvements [PR117420]	Jakub Jelinek	1	-0/+16
	The following patch implements the with_*_nonzero_bits cleanups and improvements I was talking about. get_nonzero_bits is extended to also handle BIT_AND_EXPR (as a tree or as an SSA_NAME with a BIT_AND_EXPR def_stmt), a new function is added for the bits known to be set (get_known_nonzero_bits), and the match.pd predicates are renamed and adjusted so that there is no confusion about which one to use (one is named and documented to be internal), changed so that they can be used only as simple predicates rather than matching some operands, and so that they don't try to match twice in the GIMPLE case (where an SSA_NAME with integral or pointer type matches, but an SSA_NAME with a BIT_AND_EXPR def_stmt was matched differently). Furthermore, get_nonzero_bits just returns the all-bits-set fallback (or get_known_nonzero_bits the no-bits-set fallback) if the argument isn't an SSA_NAME (nor an INTEGER_CST or whatever else the functions handle explicitly). 2024-12-03 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/117420 * tree-ssanames.h (get_known_nonzero_bits): Declare. * tree-ssanames.cc (get_nonzero_bits): New wrapper function. Move old definition to ... (get_nonzero_bits_1): ... here, add static. Change widest_int in function comment to wide_int. (get_known_nonzero_bits_1, get_known_nonzero_bits): New functions. * match.pd (with_possible_nonzero_bits2): Rename to ... (with_possible_nonzero_bits): ... this. Guard the bit_and case with #if GENERIC. Change to a normal match predicate without parameters. Rename the old with_possible_nonzero_bits match to ... (with_possible_nonzero_bits_1): ... this. (with_certain_nonzero_bits2): Remove. (with_known_nonzero_bits_1, with_known_nonzero_bits): New match predicates. (X == C (or X & Z == Y | C) is impossible if ~nonzero(X) & C != 0): Use with_known_nonzero_bits@0 instead of (with_certain_nonzero_bits2 @1), use with_possible_nonzero_bits@0 instead of (with_possible_nonzero_bits2 @0) and get_known_nonzero_bits (@1) instead of wi::to_wide (@1). 
* gcc.dg/tree-ssa/pr117420.c: New test.
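The match.pd rule named above (X == C is impossible if ~nonzero(X) & C != 0) can be sanity-checked with a small brute-force model. This is a sketch in plain C, not GCC internals: eq_possible and and_can_equal are hypothetical helpers, and the model assumes 8-bit values with X = Y & MASK, so that MASK plays the role of X's possibly-nonzero bits (the BIT_AND_EXPR case that get_nonzero_bits now handles):

```c
#include <stdbool.h>

/* The rule: with NONZERO the mask of bits that may be set in X,
   X == C can only hold if C sets no bit outside that mask.  */
static bool eq_possible (unsigned nonzero, unsigned c)
{
  return (~nonzero & c) == 0;
}

/* Brute force (hypothetical model): does some 8-bit Y give (Y & MASK) == C?  */
static bool and_can_equal (unsigned mask, unsigned c)
{
  for (unsigned y = 0; y < 256; y++)
    if ((y & mask) == c)
      return true;
  return false;
}
```

For instance, with X = Y & 0xF0, the comparison X == 0x0F can be folded to false, because ~0xF0 & 0x0F is nonzero.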
2024-12-03	bitintlower: Fix up ?ROTATE_EXPR lowering [PR117847]	Jakub Jelinek	1	-0/+27
	In the ?ROTATE_EXPR lowering I forgot to handle rotation by 0 correctly. An INTEGER_CST 0 is very unlikely, as it would probably be folded away, but a non-constant count can't use just p - n, because then the shift count is out of bounds for a count of zero. In the FE I use n == 0 ? x : (x << n) | (x >> (p - n)), but bitintlower isn't prepared at this point to have the bb split, and I am not sure if using COND_EXPR is a good idea either, so the patch uses (p - n) % p. Perhaps I should just disable lowering the rotate in the FE for the non-mode precision BITINT_TYPEs too. 2024-12-03 Jakub Jelinek <jakub@redhat.com> PR middle-end/117847 * gimple-lower-bitint.cc (gimple_lower_bitint) <case LROTATE_EXPR>: Use m = (p - n) % p instead of m = p - n for the other shift count. * gcc.dg/torture/bitint-75.c: New test.
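The shift-count fix can be illustrated on an ordinary fixed-width type. This is a sketch assuming a 32-bit unsigned value rather than a _BitInt, with a hypothetical rotl32 helper; it shows why (p - n) % p stays in range when n == 0, whereas plain p - n would give a shift by 32:

```c
#include <stdint.h>

/* Rotate X left by N, for 0 <= N < 32 (hypothetical helper).  For N == 0,
   p - n would be 32, an out-of-range shift count for a 32-bit type;
   (p - n) % p is 0 instead, and (x << 0) | (x >> 0) is just x.  */
static uint32_t rotl32 (uint32_t x, unsigned n)
{
  const unsigned p = 32;
  return (x << n) | (x >> ((p - n) % p));
}
```

The modulo avoids a branch, which matches the reason given above for not splitting the basic block or using a COND_EXPR.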
2024-12-03	replace atoi with strtoul in varasm.cc (decode_reg_name_and_count) [PR114540]	Heiko Eißfeldt	1	-0/+31
	The function uses atoi, which can silently return valid-looking numbers even for some numbers in the string that are too large. Furthermore, the verification that all the characters in asmspec are decimal digits can be simplified when using strtoul: we can check just the first digit and whether the end pointer points to '\0'. 2024-12-03 Heiko Eißfeldt <heiko@hexco.de> PR middle-end/114540 * varasm.cc (decode_reg_name_and_count): Use strtoul instead of atoi and simplify verification that the whole asmspec contains just decimal digits. * gcc.dg/pr114540.c: New test. Signed-off-by: Heiko Eißfeldt <heiko@hexco.de> Co-authored-by: Jakub Jelinek <jakub@redhat.com>
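The strtoul-based check described above can be sketched as a standalone function. This is a hypothetical helper (parse_reg_number, with an assumed register count nregs and assumed return codes), not the actual varasm.cc code:

```c
#include <ctype.h>
#include <stdlib.h>

/* Parse ASMSPEC as a decimal register number below NREGS.  Returns the
   number, -1 if ASMSPEC is not entirely decimal digits, or -2 if it is
   numeric but out of range.  */
static long parse_reg_number (const char *asmspec, unsigned long nregs)
{
  /* Checking the first character rules out empty strings, leading
     whitespace and signs, all of which strtoul would otherwise accept.  */
  if (!isdigit ((unsigned char) asmspec[0]))
    return -1;

  char *end;
  unsigned long n = strtoul (asmspec, &end, 10);
  /* Any non-digit stops strtoul before the terminating NUL.  */
  if (*end != '\0')
    return -1;
  /* On overflow strtoul returns ULONG_MAX, which is also >= NREGS, so huge
     inputs are rejected rather than yielding a bogus value.  */
  if (n >= nregs)
    return -2;
  return (long) n;
}
```

With nregs = 32, "12" yields 12, "12a" and " 5" yield -1, and "40" or a 30-digit number yields -2.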
2024-12-03	tree-optimization/117874 - missed vectorization that's formerly hybrid	Richard Biener	1	-0/+50
	With SLP forced we fail to consider using single-lane SLP for a case that we still end up discovering as hybrid (in the PR in question this is because we run into the SLP discovery limit due to excessive association). PR tree-optimization/117874 * tree-vect-loop.cc (vect_analyze_loop_2): When non-SLP analysis fails, try single-lane SLP. * gcc.dg/vect/pr117874.c: New testcase.
2024-12-02	Add trailing newlines where needed	Jakub Jelinek	60	-60/+60
	Especially in the recent CRC commits, I see \ No newline at end of file in almost every second file. So, I went through the diff between r15-1 and current trunk in gcc/, looking for additions of such problems which aren't intentional (e.g. Wtrailing-whitespace* tests had it there intentionally) and just added the missing newline elsewhere. 2024-12-02 Jakub Jelinek <jakub@redhat.com> gcc/ * config/mingw/mingw-stdint.h: Add newline at the end of the file. * config/mingw/winnt-dll.cc: Likewise. * sym-exec/sym-exec-expression.h: Likewise. * sym-exec/sym-exec-expression.cc: Likewise. * sym-exec/sym-exec-condition.cc: Likewise. * sym-exec/sym-exec-expr-is-a-helper.h: Likewise. * sym-exec/sym-exec-condition.h: Likewise. * hwint.cc: Likewise. * crc-verification.cc: Likewise. * sarif-spec-urls.def: Likewise. gcc/testsuite/ * g++.target/aarch64/pr94515-2.C: Add newline at the end of the file. * g++.target/aarch64/return_address_sign_ab_exception.C: Likewise. * gcc.target/arm/thumb2-switchstatement.c: Likewise. * gcc.target/riscv/rvv/base/vssubu-2.c: Likewise. * gcc.target/riscv/rvv/base/vssubu-1.c: Likewise. * gcc.target/riscv/and-shift32.c: Likewise. * gcc.target/riscv/crc-builtin-zbc32.c: Likewise. * gcc.target/riscv/and-shift64.c: Likewise. * gcc.target/riscv/xtheadbb-extu-4.c: Likewise. * gcc.target/i386/avx2-bf16-vec-absneg.c: Likewise. * gcc.target/i386/avx512f-bf16-vec-absneg.c: Likewise. * gcc.target/aarch64/cpunative/native_cpu_26.c: Likewise. * gcc.target/aarch64/cpunative/info_26: Likewise. * gcc.target/aarch64/cpunative/info_25: Likewise. * g++.dg/contracts/pr116607.C: Likewise. * gfortran.dg/pr108889.f90: Likewise. * gcc.dg/crc-not-crc-14.c: Likewise. * gcc.dg/crc-from-fedora-packages-13.c: Likewise. * gcc.dg/crc-not-crc-25.c: Likewise. * gcc.dg/crc-from-fedora-packages-29.c: Likewise. * gcc.dg/crc-from-fedora-packages-10.c: Likewise. * gcc.dg/crc-side-instr-10.c: Likewise. * gcc.dg/crc-side-instr-1.c: Likewise. * gcc.dg/crc-side-instr-3.c: Likewise. 
* gcc.dg/crc-side-instr-2.c: Likewise. * gcc.dg/crc-not-crc-17.c: Likewise. * gcc.dg/crc-from-fedora-packages-7.c: Likewise. * gcc.dg/crc-side-instr-12.c: Likewise. * gcc.dg/crc-side-instr-16.c: Likewise. * gcc.dg/crc-not-crc-16.c: Likewise. * gcc.dg/crc-from-fedora-packages-4.c: Likewise. * gcc.dg/crc-not-crc-20.c: Likewise. * gcc.dg/crc-linux-3.c: Likewise. * gcc.dg/crc-from-fedora-packages-27.c: Likewise. * gcc.dg/pr109393.c: Likewise. * gcc.dg/crc-side-instr-7.c: Likewise. * gcc.dg/crc-side-instr-4.c: Likewise. * gcc.dg/tree-ssa/ldexp.c: Likewise. * gcc.dg/tree-ssa/pr114760-2.c: Likewise. * gcc.dg/tree-ssa/pr114760-1.c: Likewise. * gcc.dg/crc-side-instr-15.c: Likewise. * gcc.dg/crc-side-instr-9.c: Likewise. * gcc.dg/crc-not-crc-26.c: Likewise. * gcc.dg/crc-side-instr-8.c: Likewise. * gcc.dg/crc-not-crc-23.c: Likewise. * gcc.dg/crc-not-crc-19.c: Likewise. * gcc.dg/crc-from-fedora-packages-22.c: Likewise. * gcc.dg/crc-from-fedora-packages-16.c: Likewise. * gcc.dg/crc-side-instr-11.c: Likewise. * gcc.dg/crc-from-fedora-packages-5.c: Likewise. * gcc.dg/crc-not-crc-22.c: Likewise. * gcc.dg/crc-side-instr-17.c: Likewise. * gcc.dg/crc-linux-4.c: Likewise. * gcc.dg/crc-side-instr-14.c: Likewise. * gcc.dg/crc-not-crc-18.c: Likewise. * gcc.dg/crc-from-fedora-packages-23.c: Likewise. * gcc.dg/crc-not-crc-21.c: Likewise. * gcc.dg/crc-linux-2.c: Likewise. * gcc.dg/crc-from-fedora-packages-1.c: Likewise. * gcc.dg/crc-from-fedora-packages-30.c: Likewise. * gcc.dg/torture/crc-11.c: Likewise. * gcc.dg/torture/crc-27.c: Likewise. * gcc.dg/torture/crc-2.c: Likewise. * gcc.dg/torture/crc-24.c: Likewise. * gcc.dg/torture/crc-crc8.c: Likewise. * gcc.dg/torture/crc-crc8-data8-xorOustideFor.c: Likewise. * gcc.dg/torture/crc-16.c: Likewise. * gcc.dg/torture/crc-crc64-data64.c: Likewise. * gcc.dg/crc-from-fedora-packages-32.c: Likewise. * gcc.dg/crc-side-instr-6.c: Likewise. * gcc.dg/crc-side-instr-5.c: Likewise. * gcc.dg/crc-side-instr-13.c: Likewise. 
* gcc.dg/crc-not-crc-15.c: Likewise. * gcc.dg/crc-not-crc-13.c: Likewise. * gcc.dg/crc-from-fedora-packages-6.c: Likewise. * gcc.dg/crc-not-crc-24.c: Likewise.
2024-12-02	tree-optimization/116352 - SLP scheduling and stmt order	Richard Biener	2	-2/+35
	The PR uncovers unchecked constraints on the ability to code-generate with SLP, but also latent issues with regard to stmt order checking, since loop (early-break) and BB (for quite some time) vectorization are no longer constrained to single BBs. In particular, get_later_stmt simply compares UIDs of stmts, but that's only reliable when they are in the same BB. For the PR in question, the problematic case is demoting an SLP node to external, which fails to check that we can actually code generate this in the way we do (using get_later_stmt). The following thus adds checking that we demote to external only when all defs are from the same BB. We no longer vectorize gcc.dg/vect/bb-slp-49.c, but the testcase was for a wrong-code issue and the vectorization done is a no-op. PR tree-optimization/116352 PR tree-optimization/117876 * tree-vect-slp.cc (vect_slp_can_convert_to_external): New. (vect_slp_convert_to_external): Call it. (vect_build_slp_tree_2): Likewise. * gcc.dg/vect/pr116352.c: New testcase. * gcc.dg/vect/bb-slp-49.c: Remove vectorization check.
2024-12-01	Thanks for the feedback on the first version of the patch. Accordingly:	Jovan Vukic	1	-1/+30
	I have corrected the code formatting as requested. I added new tests to the existing file phi-opt-11.c, instead of creating a new one. I performed testing before and after applying the patch on the x86 architecture, and I confirm that there are no new regressions. The logic and general code of the patch itself have not been changed. > So the A EQ/NE B expression, we can reverse A and B in the expression > and still get the same result. But don't we have to be more careful for > the TRUE/FALSE arms of the ternary? For BIT_AND we need ? a : b for > BIT_IOR we need ? b : a. > > I don't see that gets verified in the existing code or after your > change. I suspect I'm just missing something here. Can you clarify how > we verify that BIT_AND gets ? a : b for the true/false arms and that > BIT_IOR gets ? b : a for the true/false arms? I did not communicate this clearly last time, but the existing optimization simplifies the expression "(cond & (a == b)) ? a : b" to the simpler "b". Similarly, the expression "(cond & (a == b)) ? b : a" simplifies to "a". Thus, the existing optimization and mine together perform the following simplifications: (cond & (a == b)) ? a : b -> b (cond & (a == b)) ? b : a -> a (cond | (a != b)) ? a : b -> a (cond | (a != b)) ? b : a -> b For this reason, for BIT_AND_EXPR when we have A EQ B, it is sufficient to confirm that one operand matches the true/false arm and the other matches the false/true arm. In both cases, we simplify the expression to the third operand of the ternary operation (i.e., OP0 ? OP1 : OP2 simplifies to OP2). This is achieved in the value_replacement function after successfully setting the value of code within the rhs_is_fed_for_value_replacement function to EQ_EXPR. For BIT_IOR_EXPR, the same check is performed for A NE B, except now code remains NE_EXPR, and then value_replacement returns the second operand (i.e., OP0 ? OP1 : OP2 simplifies to OP1). 
2024-10-30 Jovan Vukic <Jovan.Vukic@rt-rk.com> gcc/ChangeLog: * tree-ssa-phiopt.cc (rhs_is_fed_for_value_replacement): Add a new optimization opportunity for BIT_IOR_EXPR and a != b. (operand_equal_for_value_replacement): Ditto. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/phi-opt-11.c: Add more tests.
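The four simplifications listed above can be checked exhaustively over small inputs. The functions below are hypothetical C renderings of the source forms (not the contents of phi-opt-11.c); each comment gives the value the expression reduces to:

```c
#include <stdbool.h>

/* If the condition holds, a == b, so either arm gives the same value.  */
static int and_eq_ab (bool c, int a, int b) { return (c & (a == b)) ? a : b; } /* -> b */
static int and_eq_ba (bool c, int a, int b) { return (c & (a == b)) ? b : a; } /* -> a */

/* If the condition fails, a == b, so either arm gives the same value.  */
static int ior_ne_ab (bool c, int a, int b) { return (c | (a != b)) ? a : b; } /* -> a */
static int ior_ne_ba (bool c, int a, int b) { return (c | (a != b)) ? b : a; } /* -> b */
```

In the BIT_AND_EXPR forms the result is always the false arm (OP2), and in the BIT_IOR_EXPR forms it is always the true arm (OP1), which is why only the pairing of operands with arms needs to be checked.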
2024-12-01	[PATCH v7 12/12] Add tests for CRC detection and generation	Mariam Arutunian	129	-0/+5146
	gcc/testsuite * gcc.dg/crc-from-fedora-packages-1.c: New test. * gcc.dg/crc-from-fedora-packages-2.c: Likewise. * gcc.dg/crc-from-fedora-packages-3.c: Likewise. * gcc.dg/crc-from-fedora-packages-4.c: Likewise. * gcc.dg/crc-from-fedora-packages-5.c: Likewise. * gcc.dg/crc-from-fedora-packages-6.c: Likewise. * gcc.dg/crc-from-fedora-packages-7.c: Likewise. * gcc.dg/crc-from-fedora-packages-8.c: Likewise. * gcc.dg/crc-from-fedora-packages-9.c: Likewise. * gcc.dg/crc-from-fedora-packages-10.c: Likewise. * gcc.dg/crc-from-fedora-packages-11.c: Likewise. * gcc.dg/crc-from-fedora-packages-12.c: Likewise. * gcc.dg/crc-from-fedora-packages-13.c: Likewise. * gcc.dg/crc-from-fedora-packages-14.c: Likewise. * gcc.dg/crc-from-fedora-packages-15.c: Likewise. * gcc.dg/crc-from-fedora-packages-16.c: Likewise. * gcc.dg/crc-from-fedora-packages-17.c: Likewise. * gcc.dg/crc-from-fedora-packages-18.c: Likewise. * gcc.dg/crc-from-fedora-packages-19.c: Likewise. * gcc.dg/crc-from-fedora-packages-20.c: Likewise. * gcc.dg/crc-from-fedora-packages-21.c: Likewise. * gcc.dg/crc-from-fedora-packages-22.c: Likewise. * gcc.dg/crc-from-fedora-packages-23.c: Likewise. * gcc.dg/crc-from-fedora-packages-24.c: Likewise. * gcc.dg/crc-from-fedora-packages-25.c: Likewise. * gcc.dg/crc-from-fedora-packages-26.c: Likewise. * gcc.dg/crc-from-fedora-packages-27.c: Likewise. * gcc.dg/crc-from-fedora-packages-28.c: Likewise. * gcc.dg/crc-from-fedora-packages-29.c: Likewise. * gcc.dg/crc-from-fedora-packages-30.c: Likewise. * gcc.dg/crc-from-fedora-packages-31.c: Likewise. * gcc.dg/crc-from-fedora-packages-32.c: Likewise. * gcc.dg/crc-linux-1.c: Likewise. * gcc.dg/crc-linux-2.c: Likewise. * gcc.dg/crc-linux-3.c: Likewise. * gcc.dg/crc-linux-4.c: Likewise. * gcc.dg/crc-linux-5.c: Likewise. * gcc.dg/crc-not-crc-1.c: Likewise. * gcc.dg/crc-not-crc-2.c: Likewise. * gcc.dg/crc-not-crc-3.c: Likewise. * gcc.dg/crc-not-crc-4.c: Likewise. * gcc.dg/crc-not-crc-5.c: Likewise. * gcc.dg/crc-not-crc-6.c: Likewise. 
* gcc.dg/crc-not-crc-7.c: Likewise. * gcc.dg/crc-not-crc-8.c: Likewise. * gcc.dg/crc-not-crc-9.c: Likewise. * gcc.dg/crc-not-crc-10.c: Likewise. * gcc.dg/crc-not-crc-11.c: Likewise. * gcc.dg/crc-not-crc-12.c: Likewise. * gcc.dg/crc-not-crc-13.c: Likewise. * gcc.dg/crc-not-crc-14.c: Likewise. * gcc.dg/crc-not-crc-15.c: Likewise. * gcc.dg/crc-not-crc-16.c: Likewise. * gcc.dg/crc-not-crc-17.c: Likewise. * gcc.dg/crc-not-crc-18.c: Likewise. * gcc.dg/crc-not-crc-19.c: Likewise. * gcc.dg/crc-not-crc-20.c: Likewise. * gcc.dg/crc-not-crc-21.c: Likewise. * gcc.dg/crc-not-crc-22.c: Likewise. * gcc.dg/crc-not-crc-23.c: Likewise. * gcc.dg/crc-not-crc-24.c: Likewise. * gcc.dg/crc-not-crc-25.c: Likewise. * gcc.dg/crc-not-crc-26.c: Likewise. * gcc.dg/crc-side-instr-1.c: Likewise. * gcc.dg/crc-side-instr-2.c: Likewise. * gcc.dg/crc-side-instr-3.c: Likewise. * gcc.dg/crc-side-instr-4.c: Likewise. * gcc.dg/crc-side-instr-5.c: Likewise. * gcc.dg/crc-side-instr-6.c: Likewise. * gcc.dg/crc-side-instr-7.c: Likewise. * gcc.dg/crc-side-instr-8.c: Likewise. * gcc.dg/crc-side-instr-9.c: Likewise. * gcc.dg/crc-side-instr-10.c: Likewise. * gcc.dg/crc-side-instr-11.c: Likewise. * gcc.dg/crc-side-instr-12.c: Likewise. * gcc.dg/crc-side-instr-13.c: Likewise. * gcc.dg/crc-side-instr-14.c: Likewise. * gcc.dg/crc-side-instr-15.c: Likewise. * gcc.dg/crc-side-instr-16.c: Likewise. * gcc.dg/crc-side-instr-17.c: Likewise. * gcc.dg/torture/crc-1.c: Likewise. * gcc.dg/torture/crc-2.c: Likewise. * gcc.dg/torture/crc-3.c: Likewise. * gcc.dg/torture/crc-4.c: Likewise. * gcc.dg/torture/crc-5.c: Likewise. * gcc.dg/torture/crc-6.c: Likewise. * gcc.dg/torture/crc-7.c: Likewise. * gcc.dg/torture/crc-8.c: Likewise. * gcc.dg/torture/crc-9.c: Likewise. * gcc.dg/torture/crc-10.c: Likewise. * gcc.dg/torture/crc-11.c: Likewise. * gcc.dg/torture/crc-12.c: Likewise. * gcc.dg/torture/crc-13.c: Likewise. * gcc.dg/torture/crc-14.c: Likewise. * gcc.dg/torture/crc-15.c: Likewise. * gcc.dg/torture/crc-16.c: Likewise. 
* gcc.dg/torture/crc-17.c: Likewise. * gcc.dg/torture/crc-18.c: Likewise. * gcc.dg/torture/crc-19.c: Likewise. * gcc.dg/torture/crc-20.c: Likewise. * gcc.dg/torture/crc-21.c: Likewise. * gcc.dg/torture/crc-22.c: Likewise. * gcc.dg/torture/crc-23.c: Likewise. * gcc.dg/torture/crc-24.c: Likewise. * gcc.dg/torture/crc-25.c: Likewise. * gcc.dg/torture/crc-26.c: Likewise. * gcc.dg/torture/crc-27.c: Likewise. * gcc.dg/torture/crc-28.c: Likewise. * gcc.dg/torture/crc-29.c: Likewise. * gcc.dg/torture/crc-CCIT-data16-xorOutside_InsideFor.c: Likewise. * gcc.dg/torture/crc-coremark16-data16.c: Likewise. * gcc.dg/torture/crc-coremark32-data16.c: Likewise. * gcc.dg/torture/crc-coremark32-data32.c: Likewise. * gcc.dg/torture/crc-coremark32-data8.c: Likewise. * gcc.dg/torture/crc-coremark64-data64.c: Likewise. * gcc.dg/torture/crc-coremark8-data8.c: Likewise. * gcc.dg/torture/crc-CCIT-data16.c: Likewise. * gcc.dg/torture/crc-CCIT-data8.c: Likewise. * gcc.dg/torture/crc-crc32-data16.c: Likewise. * gcc.dg/torture/crc-crc32-data24.c: Likewise. * gcc.dg/torture/crc-crc32-data8.c: Likewise. * gcc.dg/torture/crc-crc32.c: Likewise. * gcc.dg/torture/crc-crc64-data32.c: Likewise. * gcc.dg/torture/crc-crc64-data64.c: Likewise. * gcc.dg/torture/crc-crc8-data8-loop-xorInFor.c: Likewise. * gcc.dg/torture/crc-crc8-data8-xorOustideFor.c: Likewise. * gcc.dg/torture/crc-crc8.c: Likewise. Co-Authored: Jeff Law <jlaw@ventanamicro.com>
2024-12-01	testsuite: Silence gcc.dg/pr117806.c for default_packed	Dimitar Dimitrov	1	-0/+1
	On default_packed targets like PRU, spurious warnings are emitted: ...workspace/gcc/gcc/testsuite/gcc.dg/pr117806.c:5:3: warning: 'packed' attribute ignored for field of type 'double' [-Wattributes] Fix by annotating the excess warnings for default_packed targets. gcc/testsuite/ChangeLog: * gcc.dg/pr117806.c: Test can spill excess errors for default_packed targets. Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
2024-11-30	VN: Don't recurse on for the same value of `a != 0` [PR117859]	Andrew Pinski	2	-0/+74
	Like r15-5063-g6e84a41622f56c, but this is for the `a != 0` case. After adding vn_valueize to handle the `a ==/!= 0` case of insert_predicates_for_cond, it would go into an infinite loop, as the value number for a could be the same as that of the whole expression. This avoids that recursion, so there is no infinite loop here. Note that lim was originally introducing `bool_var2 = bool_var1 != 0`, but with the GIMPLE testcase in pr117859-2.c there is no dependency on what earlier passes do. Bootstrapped and tested on x86_64-linux-gnu. PR tree-optimization/117859 gcc/ChangeLog: * tree-ssa-sccvn.cc (insert_predicates_for_cond): If the valueization for the new lhs for `lhs != 0` is the same as the old ones, don't recurse. gcc/testsuite/ChangeLog: * gcc.dg/torture/pr117859-1.c: New test. * gcc.dg/torture/pr117859-2.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-11-30	[PATCH v3] zero_extend(not) -> xor optimization [PR112398]	Alexey Merzlyakov	2	-0/+32
	This patch adds optimization of the following patterns: (zero_extend:M (subreg:N (not:O==M (X:Q==M)))) -> (xor:M (zero_extend:M (subreg:N (X:M)), mask)) ... where the mask is GET_MODE_MASK (N). For the cases when X:M doesn't have any non-zero bits outside of mode N, (zero_extend:M (subreg:N (X:M)) could be simplified to just (X:M), and the whole optimization becomes: (zero_extend:M (subreg:N (not:M (X:M)))) -> (xor:M (X:M, mask)) The patch targets code patterns like: not a0,a0 andi a0,a0,0xff which are then optimized to: xori a0,a0,255 PR rtl-optimization/112398 PR rtl-optimization/117476 gcc/ChangeLog: * simplify-rtx.cc (simplify_context::simplify_unary_operation_1): Simplify ZERO_EXTEND (SUBREG (NOT X)) to XOR (X, GET_MODE_MASK(SUBREG)) when X doesn't have any non-zero bits outside of SUBREG mode. gcc/testsuite/ChangeLog: * gcc.target/riscv/pr112398.c: New test. * gcc.dg/torture/pr117476-1.c: New test. From Zhendong Su. * gcc.dg/torture/pr117476-2.c: New test. From Zdenek Sojka.
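The identity behind this simplification can be checked at the C level. This is a sketch, not GCC code: it assumes M is a 32-bit mode and N an 8-bit mode (so mask = 0xff), and zext_not and xor_form are hypothetical names. Truncating ~x to 8 bits and zero-extending it equals XOR-ing the zero-extended low byte of x with 0xff:

```c
#include <stdint.h>

/* (zero_extend:SI (subreg:QI (not:SI x))) modelled in C.  */
static uint32_t zext_not (uint32_t x) { return (uint8_t) ~x; }

/* (xor:SI (zero_extend:SI (subreg:QI x)) 0xff) modelled in C.  */
static uint32_t xor_form (uint32_t x) { return (uint8_t) x ^ 0xffu; }
```

When x already has no bits set above bit 7, (uint8_t) x is just x, so both forms reduce to x ^ 0xff, the single xori shown in the RISC-V example above.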
2024-11-30	gimplify: Handle void expression as asm input [PR100501, PR100792]	Joseph Myers	5	-0/+39
	As reported in bug 100501 (plus duplicates), the gimplifier ICEs for C tests involving a statement expression not returning a value as an asm input; this includes the variant bug 100792 where the statement expression ends with another asm statement. The expected diagnostic for this case (as seen for C++ input) is one coming from the gimplifier, and so it seems reasonable to fix the gimplifier to handle the GENERIC generated for this case by the C front end, rather than trying to make the C front end detect it earlier. Thus, change the gimplifier to handle a void expression like other non-lvalues for such a memory input. Bootstrapped with no regressions for x86_64-pc-linux-gnu. PR c/100501 PR c/100792 gcc/ * gimplify.cc (gimplify_asm_expr): Handle void expressions for memory inputs like other non-lvalues. gcc/testsuite/ * gcc.dg/pr100501-1.c, gcc.dg/pr100792-1.c: New tests. * gcc.dg/pr48552-1.c, gcc.dg/pr48552-2.c, gcc.dg/torture/pr98601.c: Update expected errors. Co-authored-by: Richard Biener <rguenther@suse.de>
2024-11-30	strlen: Handle vector CONSTRUCTORs [PR117057]	Jakub Jelinek	2	-2/+43
	The following patch handles VECTOR_TYPE_P CONSTRUCTORs in count_nonzero_bytes, including handling them if some of their elements are non-constant. If there are still some constant elements before those (in the queried range), we derive info at least from those bytes and consider the rest unknown. The first 3 hunks just punt in what are IMHO problematic cases: the spaghetti code treats a byte_size of 0 as "unknown size, determine it yourself", so if offset is equal to the size of exp, there are 0 bytes to consider (nothing useful to determine), but using byte_size 0 would mean "use any size". Similarly, native_encode_expr uses int type for offset (and size), so passing it an offset larger than INT_MAX could result in silent miscompilation. I've guarded the test to just a couple of targets known to handle it, because e.g. on ia32 without -msse forwprop1 seems to lower the CONSTRUCTOR into 4 BIT_FIELD_REF stores, and I haven't figured out what exactly that depends on (e.g. powerpc* is fine on any CPU, even with -mno-altivec -mno-vsx, even -m32). 2024-11-30 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/117057 * tree-ssa-strlen.cc (strlen_pass::count_nonzero_bytes): Punt also when byte_size is equal to offset or nchars. Punt if offset is bigger than INT_MAX. Handle vector CONSTRUCTOR with some elements constant, possibly followed by non-constant. * gcc.dg/strlenopt-32.c: Remove xfail and vect_slp_v2qi_store_unalign specific scan-tree-dump-times directive. * gcc.dg/strlenopt-96.c: New test.
2024-11-30	c: Set attributes for fields when forming a composite type [PR117806]	Martin Uecker	1	-0/+13
	We need to call decl_attributes when creating the fields for a composite type. PR c/117806 gcc/c/ChangeLog: * c-typeck.cc (composite_type_internal): Call decl_attributes. gcc/testsuite/ChangeLog: * gcc.dg/pr117806.c: New test.
2024-11-29	gimplefe: Error recovery for invalid declarations [PR117749]	Andrew Pinski	1	-0/+11
	c_parser_declarator can return null if there was an error, but c_parser_gimple_declaration was not ready for that. This fixes that oversight so we don't get an ICE after the error. Bootstrapped and tested on x86_64-linux-gnu. PR c/117749 gcc/c/ChangeLog: * gimple-parser.cc (c_parser_gimple_declaration): Check declarator to be non-null. gcc/testsuite/ChangeLog: * gcc.dg/gimplefe-55.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-11-29	c: Correct type compatibility for bit-fields [PR117828]	Martin Uecker	2	-0/+65
	Add the missing check for consistency of bit-fields when comparing tagged types for compatibility. PR c/117828 gcc/c/ChangeLog: * c-typeck.cc (tagged_types_tu_compatible_p): Add check. gcc/testsuite/ChangeLog: * gcc.dg/c23-tag-bitfields-1.c: New test. * gcc.dg/pr117828.c: New test.
2024-11-29	gimple-fold: Fix up type_has_padding_at_level_p [PR117065]	Jakub Jelinek	1	-0/+12
	The following testcase used to ICE on trunk from the introduction of the "clear small object if it has padding" optimization until my r15-5746 change; it no longer ICEs only because type_has_padding_at_level_p isn't called on the testcase anymore. Though, as the testcase shows, structures/unions in which one or more members have an erroneous type can have a FIELD_DECL whose TREE_TYPE is error_mark_node, on which we can crash. E.g. the __builtin_clear_padding lowering just ignores those: if (TREE_TYPE (field) == error_mark_node) continue; and if (ftype == error_mark_node) continue; It doesn't matter much what exactly we do in those cases, as compilation is going to fail anyway, but we shouldn't crash. So, the following patch ignores those in type_has_padding_at_level_p. For RECORD_TYPE, we already return if !DECL_SIZE (f), which I think should already cover the erroneous fields (and we don't use TYPE_SIZE on those). 2024-11-29 Jakub Jelinek <jakub@redhat.com> PR middle-end/117065 * gimple-fold.cc (type_has_padding_at_level_p) <case UNION_TYPE>: Also continue if f has error_mark_node type. * gcc.dg/pr117065.c: New test.
2024-11-29	__builtin_prefetch fixes [PR117608]	Jakub Jelinek	1	-0/+4
	The r15-4833-ge9ab41b79933 patch, among tons of config/i386-specific changes, also made an important change to the generic code, allowing 2 as a valid value of the second argument of __builtin_prefetch: - /* Argument 1 must be either zero or one. */ - if (INTVAL (op1) != 0 && INTVAL (op1) != 1) + /* Argument 1 must be 0, 1 or 2. */ + if (INTVAL (op1) < 0 || INTVAL (op1) > 2) But the patch failed to document that change in the __builtin_prefetch documentation and, more importantly, didn't adjust any of the other backends to deal with it (my understanding is that the expected behavior is that 2 is silently handled as 0 unless a backend has some more specific way to handle it). Some of the backends would ICE on it, in some cases through gcc_assert failures/gcc_unreachable, in other cases by crashing later (e.g. using that value as an array index and crashing at final.cc time due to accessing garbage after the array); others treated 2 silently as 0, and others treated 2 silently as 1. And even in the i386 backend there were bugs which caused ICEs. The patch added some if (write == 0) and write 2 handling into the body of an if (write == 1) (badly indented, maybe that is the reason) rather than into the else side, so those conditions would always be false. The new prefetch_rst2 define_insn only accepts parameters 2 1 (i.e. read-shared with a moderate degree of locality), so in order not to ICE, this patch uses it only for __builtin_prefetch (ptr, 2, 1); or __builtin_ia32_prefetch (ptr, 2, 1, 0); and not for other values of the parameter. If that isn't what we want and we want it to be used also for all or some of __builtin_prefetch (ptr, 2, {0,2,3}); and the corresponding __builtin_ia32_prefetch, maybe the define_insn could match other values. And there was another problem: -mno-mmx -mno-sse -mmovrs compilation would ICE on most of the prefetches, so I had to add the FAIL; cases.
2024-11-29 Jakub Jelinek <jakub@redhat.com> PR target/117608 * doc/extend.texi (__builtin_prefetch): Document that second argument may be also 2 and its meaning. * config/i386/i386.md (prefetch): Remove unreachable code. Clear write (set operands[1] to const0_rtx) if !TARGET_MOVRS or if locality is not 1. Formatting fixes. * config/i386/i386-expand.cc (ix86_expand_builtin): Use IN_RANGE. Call gen_prefetch even for TARGET_MOVRS. * config/alpha/alpha.md (prefetch): Treat read_or_write 2 like 0. * config/mips/mips.md (prefetch): Likewise. * config/arc/arc.md (prefetch_1, prefetch_2, prefetch_3): Likewise. * config/riscv/riscv.md (prefetch): Likewise. * config/loongarch/loongarch.md (prefetch): Likewise. * config/sparc/sparc.md (prefetch): Likewise. Use IN_RANGE. * config/ia64/ia64.md (prefetch): Likewise. * config/pa/pa.md (prefetch): Likewise. * config/aarch64/aarch64.md (prefetch): Likewise. * config/rs6000/rs6000.md (prefetch): Likewise. * gcc.dg/builtin-prefetch-1.c (good): Add tests with second argument 2. * gcc.target/i386/pr117608-1.c: New test. * gcc.target/i386/pr117608-2.c: New test.
2024-11-29	ifcombine: avoid unsound forwarder-enabled combinations [PR117723]	Alexandre Oliva	1	-0/+63
	When ifcombining contiguous blocks, we can follow forwarder blocks and reverse conditions to enable combinations, but when there are intervening blocks, we have to constrain ourselves to paths to the exit that share the PHI args with all intervening blocks. Not considering forwarders at all when intervening blocks are present would match the preexisting test, but we can do better: record whether a forwarded path corresponds to the outer block's exit path, and refuse to combine through any path other than the one verified to correspond. The latter is what this patch implements. While at that, I've fixed some typos and introduced early testing before computing the exit path, to avoid computing it when that would be wasteful, or when avoiding it can enable other sound combinations. for gcc/ChangeLog PR tree-optimization/117723 * tree-ssa-ifcombine.cc (tree_ssa_ifcombine_bb): Record forwarder blocks in path to exit, and stick to them. Avoid computing the exit if obviously not needed, and if that enables additional optimizations. (tree_ssa_ifcombine_bb_1): Fix typos. for gcc/testsuite/ChangeLog PR tree-optimization/117723 * gcc.dg/torture/ifcmb-1.c: New.