path: root/gcc
Age | Commit message | Author | Files | Lines
14 days | docs: fix a typo in used attribute documentation | Tamar Christina | 1 | -1/+1
This fixes a small typo in the Label attributes docs. gcc/ChangeLog: * doc/extend.texi: Fix typo in used attribute docs.
2025-06-27 | x86: Handle vector broadcast source | H.J. Lu | 2 | -0/+220
Use the inner scalar mode of vector broadcast source in: (set (reg:V8DF 394) (vec_duplicate:V8DF (reg:V2DF 190 [ alpha ]))) to compute the vector mode for broadcast from vector source. gcc/ PR target/120830 * config/i386/i386-features.cc (ix86_get_vector_cse_mode): Handle vector broadcast source. gcc/testsuite/ PR target/120830 * g++.target/i386/pr120830.C: New test. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-06-26 | [lra] catch all to-sp eliminations with nonzero offsets [PR120424] | Alexandre Oliva | 1 | -21/+25
An x86_64-linux-gnu native with ix86_frame_pointer_required modified to return true for nonzero frames, to exercise lra_update_fp2sp_elimination, reveals in stage1 testing that wrong code is generated for gcc.c-torture/execute/ieee/fp-cmp-8l.c: argp-to-sp eliminations are used for one_test to pass its arguments on to *pos, and the sp offsets survive the disabling of that elimination. We didn't really have to disable that elimination, but the x86 backend disables eliminations to sp if frame_pointer_needed. This change extends the catching of fp2sp eliminations to all (?) eliminations to sp with nonzero offsets, since none of them can be properly reversed and would silently lead to wrong code. By accepting nonzero offsets, we bootstrap with -maccumulate-outgoing-args on x86_64-linux-gnu (with ix86_frame_pointer_required modified to return true on nonzero frame size). for gcc/ChangeLog PR rtl-optimization/120424 * lra-eliminations.cc (elimination_2sp_occurred_p): Rename from... (elimination_fp2sp_occured_p): ... this. Adjust all uses. (lra_eliminate_regs_1): Don't require a from-frame-pointer elimination to set it. (update_reg_eliminate): Likewise to test it.
2025-06-26 | [lra] apply elimination offsets to MEM in autoinc address [PR120424] | Alexandre Oliva | 1 | -0/+6
When attempting to bootstrap arm-linux-gnueabihf with {BOOT_C,T}FLAGS='-g -O2 -fnon-call-exceptions -fstack-clash-protection', gmp fails to build in stage2: gen-fac's mpz_and gets miscompiled. A pseudo is initialized before a loop and used in a PRE_INC load inside a loop. It gets spilled just as the fp2sp elimination is disabled, and only the initialization gets adjusted with elimination offsets. The unadjusted stack slot within the PRE_INC load ends up reloaded later, but only when the FP offset has already missed its chance to be adjusted. Arrange for lra_eliminate_regs_1 to adjust autoinc addresses that are MEMs themselves. for gcc/ChangeLog PR rtl-optimization/120424 * lra-eliminations.cc (lra_eliminate_regs_1): Adjust autoinc addresses that are MEMs.
2025-06-26 | [lra] reorder operations in lra_update_fp2sp_elimination [PR120424] | Alexandre Oliva | 1 | -7/+5
The various recent additions to lra_update_fp2sp_elimination rendered it somewhat confusing, with intermixed groups of statements pertaining to three different major actions: disabling the elimination, recomputing live ranges, and spilling uses of the frame pointer. Reorder them for readability. for gcc/ChangeLog PR rtl-optimization/120424 * lra-eliminations.cc (lra_update_fp2sp_elimination): Reorder and regroup related statements.
2025-06-26 | [lra] rework deactivation of fp2sp elimination [PR120424] | Alexandre Oliva | 1 | -2/+16
Deactivating the fp2sp elimination in lra_update_fp2sp_elimination prevents update_reg_eliminate from propagating the fp2sp elimination offset to the next chosen elimination, so it may retain -1 as the prev_offset, and prev_offset will be taken as an already-applied offset that needs to be compensated in the next round of spilling and reloading. This affects, for example, crtbegin.o's __do_global_dtors_aux on arm-linux-gnueabihf in a {BOOT_C,T}FLAGS='-O2 -g -fnon-call-exceptions -fstack-clash-protection' bootstrap. Alas, just retaining that elimination causes spills to use the fp2sp elimination, including applying sp offsets, which breaks e.g. an x86_64-linux-gnu native bootstrap with ix86_frame_pointer_required modified to return true on nonzero frame size. The middle-ground solution is to keep the elimination active, so that its offsets are applied and propagated on to the subsequent fp elimination, but without introducing sp offsets, so that e.g. pr103973-18.c on the modified x86_64-linux-gnu doesn't get adjacent argument pushes of two adjacent on-stack temporaries ending up pushing the same temporary because of undesired adjustments. for gcc/ChangeLog PR rtl-optimization/120424 * lra-eliminations.cc (lra_update_fp2sp_elimination): Avoid sp offsets in further fp2sp eliminations... (update_reg_eliminate): ... and restore to_rtx before assert checking.
2025-06-26 | [lra] recompute ranges upon disabling fp2sp elimination [PR120424] | Alexandre Oliva | 4 | -0/+70
If the frame size grows to nonzero, arm_frame_pointer_required may flip to true under -fstack-clash-protection -fnon-call-exceptions, and that may disable the fp2sp elimination part-way through lra. If pseudos had got assigned to the frame pointer register before that, they have to be spilled, and that requires complete live range information. If !lra_reg_spill_p, lra_spill won't have live ranges for such pseudos, and they could end up sharing spill slots with other pseudos whose live ranges actually overlap. This affects at least Ada.Strings.Wide_Superbounded.Super_Insert and .Super_Replace_Slice in libgnat/a-stwisu.adb, when compiled with -O2 -fstack-clash-protection -march=armv7 (implied Thumb2), causing acats-4's cdd2a01 to fail. Recomputing live ranges including registers may renumber and compress points, so we have to recompute the aggregated live ranges for already-assigned spill slots as well. As a safety net, reject empty live ranges when computing slot sharing. for gcc/ChangeLog PR rtl-optimization/120424 * lra-eliminations.cc (lra_update_fp2sp_elimination): Compute complete live ranges and recompute slots' live ranges if needed. * lra-lives.cc (lra_reset_live_range_list): New. (lra_complete_live_ranges): New. * lra-spills.cc (assign_spill_hard_regs): Reject empty live ranges. (add_pseudo_to_slot): Likewise. (lra_recompute_slots_live_ranges): New. * lra-int.h (lra_reset_live_range_list): Declare. (lra_complete_live_ranges): Declare. (lra_recompute_slots_live_ranges): Declare.
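The slot-sharing rule in the message above can be sketched in a few lines. This is not LRA's code: `LiveRange`, `ranges_overlap` and `can_share_slot` are hypothetical names, and live ranges are modeled simply as closed intervals of program points. The only point illustrated is why an empty range has to be rejected instead of being read as "never live".

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: a live range is a closed interval [start, finish]
// of program points.  This is NOT GCC's lra_live_range type.
struct LiveRange {
  int start;
  int finish;
};

// True if any interval in A intersects any interval in B.
static bool ranges_overlap(const std::vector<LiveRange> &a,
                           const std::vector<LiveRange> &b) {
  for (const LiveRange &x : a)
    for (const LiveRange &y : b)
      if (x.start <= y.finish && y.start <= x.finish)
        return true;
  return false;
}

// Two pseudos may share a spill slot only if both have known (non-empty)
// live ranges and those ranges do not overlap.  An empty list means the
// liveness data is missing, so sharing is refused as a safety net.
static bool can_share_slot(const std::vector<LiveRange> &a,
                           const std::vector<LiveRange> &b) {
  if (a.empty() || b.empty())
    return false;
  return !ranges_overlap(a, b);
}
```

With ranges {1..4} and {5..9} the sketch allows sharing; with {1..4} and {3..6}, or with either list empty, it refuses.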
2025-06-26 | [genoutput] mark scratch outputs as eliminable [PR120424] | Alexandre Oliva | 1 | -1/+1
acats' fdd2a00.read is miscompiled on arm-linux-gnu with -O2 -fstack-clash-protection -march=armv7-a -marm: a clobbered scratch register in a *iorsi3_compare0_scratch pattern gets initially assigned to the frame pointer register, but at some point during lra the frame size grows to nonzero, arm_frame_pointer_required flips to true, and the fp2sp elimination has to be disabled, so the scratch register gets spilled to a stack slot. It needs to get the sfp elimination at that point, because later rounds of elimination will assume the previous round's offset has already been applied. But since scratch matches are not regarded as eliminable by genoutput, we don't attempt elimination in the clobbered stack slot MEM rtx. Later on, lra issues a reload for that slot, using a new pseudo allocated to a hardware register, that gets stored in the stack slot after the original insn. Elimination in that reload store insn eventually updates the elimination offset, but it's an incremental update, assuming that the offset so far has already been applied. Without applying the initial offset, the store ends up overlapping with the function's register save area, corrupting a caller's call-saved register. AFAICT the old reload's elimination wouldn't be harmed by allowing elimination in scratch operands, so I'm enabling eliminable for them regardless. Should it be found to make a difference, we could presumably set a different bit in eliminable to enable reload and lra to tell them apart and behave accordingly. for gcc/ChangeLog PR rtl-optimization/120424 * genoutput.cc (scan_operands): Make MATCH_SCRATCHes eliminable.
2025-06-26 | [lra] inactivate disabled fp2sp elimination [PR120424] | Alexandre Oliva | 1 | -3/+12
Even after we disable the fp2sp elimination when it is the active elimination for the fp, spilling might use it before update_reg_eliminate runs and inactivates it for good. If it is used, update_reg_eliminate will fail the check that fp2sp was not used. Since we keep track of uses of this specific elimination, and lra_update_fp2sp_elimination checks it before disabling it, we know it hasn't been used, so we can inactivate it without any ill effects. This fixes the pr118591-1.c avr-none regression exposed by the PR120424 fix. for gcc/ChangeLog PR rtl-optimization/120424 * lra-eliminations.cc (lra_update_fp2sp_elimination): Inactivate the unused fp2sp elimination right away.
2025-06-26 | pru: Split 64-bit moves into a sequence of 32-bit moves | Dimitar Dimitrov | 3 | -0/+94
The 64-bit register-to-register moves on PRU are implemented with two instructions moving 32-bit registers. Defining a split for the 64-bit moves allows this to be described in RTL, and thus one of the 32-bit moves to be eliminated if the destination register is dead.

Also, split the loading of non-trivial 64-bit integer constants. The resulting 32-bit integer constants have better chance to be loaded with something more optimal than an "ldi32".

For now do the splits only after register allocation, because LRA does not yet efficiently handle subregs. See https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651366.html

This patch shows slight improvement for wikisort benchmark from embench-iot:

Benchmark        size-before  size-after  difference
---------        -----------  ----------  ----------
aha-mont64             1,648       1,648           0
crc32                    104         104           0
depthconv              1,172       1,172           0
edn                    3,040       3,040           0
huffbench              1,616       1,616           0
matmult-int              748         748           0
md5sum                   700         700           0
nettle-aes             2,664       2,664           0
nettle-sha256          5,732       5,732           0
nsichneu              21,372      21,372           0
picojpeg               9,716       9,716           0
qrduino                8,556       8,556           0
sglib-combined         3,724       3,724           0
slre                   3,488       3,488           0
statemate              1,132       1,132           0
tarfind                  652         652           0
ud                     1,004       1,004           0
wikisort              18,120      18,092         -28
xgboost                  300         300           0

gcc/ChangeLog:

        * config/pru/pru.md (reg move splitter): New splitter for 64-bit register moves into two 32-bit moves.
        (const_int move splitter): New splitter for 64-bit constant integer moves into two 32-bit moves.

gcc/testsuite/ChangeLog:

        * gcc.target/pru/mov64-subreg-1.c: New test.
        * gcc.target/pru/mov64-subreg-2.c: New test.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
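As a source-level illustration of the split described above (not the RTL splitter itself), a 64-bit constant can be viewed as two independent 32-bit halves. The names `Halves`, `split64` and `join64` are hypothetical. Once the halves are separate values, each one can be materialized, or dropped when unused, on its own.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: the two 32-bit halves of a 64-bit value.
struct Halves {
  std::uint32_t lo;  // bits 0..31
  std::uint32_t hi;  // bits 32..63
};

// Split a 64-bit constant into two 32-bit constants.
static constexpr Halves split64(std::uint64_t v) {
  return Halves{static_cast<std::uint32_t>(v),
                static_cast<std::uint32_t>(v >> 32)};
}

// Recombine the halves; used here only to check the split is lossless.
static constexpr std::uint64_t join64(Halves h) {
  return (static_cast<std::uint64_t>(h.hi) << 32) | h.lo;
}
```

For a constant such as 0x0000000100000000 the low half is zero and the high half is one, so each half is a trivial 32-bit constant even though the 64-bit value is not.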
2025-06-26 | diagnostics: make 5 more fields of diagnostic_context private | David Malcolm | 6 | -17/+40
No functional change intended. gcc/ada/ChangeLog: * gcc-interface/misc.cc (gnat_init): Use diagnostic_context::set_internal_error_callback. gcc/c-family/ChangeLog: * c-opts.cc (c_common_diagnostics_set_defaults): Use diagnostic_context::set_permissive_option. gcc/cp/ChangeLog: * error.cc (cxx_initialize_diagnostics): Use diagnostic_context::set_adjust_diagnostic_info_callback. gcc/ChangeLog: * diagnostic.h (diagnostic_context::set_permissive_option): New. (diagnostic_context::set_fatal_errors): New. (diagnostic_context::set_internal_error_callback): New. (diagnostic_context::set_adjust_diagnostic_info_callback): New. (diagnostic_context::inhibit_notes): New. (diagnostic_context::m_opt_permissive): Make private. (diagnostic_context::m_fatal_errors): Likewise. (diagnostic_context::m_internal_error): Likewise. (diagnostic_context::m_adjust_diagnostic_info): Likewise. (diagnostic_context::m_inhibit_notes_p): Likewise. (diagnostic_inhibit_notes): Delete. * opts.cc (common_handle_option): Use diagnostic_context::set_fatal_errors. * toplev.cc (internal_error_function): Use diagnostic_context::set_internal_error_callback. (general_init): Likewise. (process_options): Use diagnostic_context::inhibit_notes. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2025-06-26 | diagnostics, testsuite: don't assume host has "dot" [PR120809] | David Malcolm | 4 | -10/+41
gcc/ChangeLog: PR analyzer/120809 * diagnostic-format-html.cc (html_builder::maybe_make_state_diagram): Bulletproof against the SVG generation failing. * xml.cc (xml::printer::push_element): Assert that the ptr is nonnull. (xml::printer::append): Likewise. gcc/testsuite/ChangeLog: PR analyzer/120809 * gcc.dg/analyzer/state-diagram-5.c: Split out into... * gcc.dg/analyzer/state-diagram-5-html.c: ...this, adding dg-require-dot... * gcc.dg/analyzer/state-diagram-5-sarif.c: ...and this. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2025-06-26 | diagnostics: refactor sarif_scheme_handler::make_sink | David Malcolm | 1 | -11/+32
No functional change intended. gcc/ChangeLog: * diagnostic-output-spec.cc (sarif_scheme_handler::make_sink): Split out creation of sarif_generation_options and sarif_serialization_format into... (sarif_scheme_handler::make_sarif_gen_opts): ...this... (sarif_scheme_handler::make_sarif_serialization_object): ...and this. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2025-06-26 | RISC-V: update prepare_ternary_operands to handle vector-scalar case [PR120828] | Paul-Antoine Arras | 1 | -3/+5
This is a followup to 92e1893e0 "RISC-V: Add patterns for vector-scalar multiply-(subtract-)accumulate" that caused an ICE in some cases where the mult operands were wrongly swapped. This patch ensures that operands are not swapped in the vector-scalar case. PR target/120828 gcc/ChangeLog: * config/riscv/riscv-v.cc (prepare_ternary_operands): Handle the vector-scalar case.
2025-06-26 | rust: Silence a clang warning in borrow-checker-diagnostics | Martin Jambor | 1 | -1/+1
When compiling gcc/rust/checks/errors/borrowck/rust-borrow-checker-diagnostics.cc with clang, it emits the following warning: gcc/rust/checks/errors/borrowck/rust-borrow-checker-diagnostics.cc:145:46: warning: non-constant-expression cannot be narrowed from type 'Polonius::Loan' (aka 'unsigned long') to 'uint32_t' (aka 'unsigned int') in initializer list [-Wc++11-narrowing] I'd hope that for indexing that is never really a problem, nevertheless if narrowing is taking place, I guess it can be argued it should be made explicit. gcc/rust/ChangeLog: 2025-06-23 Martin Jambor <mjambor@suse.cz> * checks/errors/borrowck/rust-borrow-checker-diagnostics.cc (BorrowCheckerDiagnostics::get_loan): Type cast loan to uint32_t.
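The kind of change described above can be sketched as follows. The types here are hypothetical stand-ins chosen to match the warning text (`Loan` as `unsigned long`, a `uint32_t` field); they are not the definitions from the Rust front end.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the types named in the warning.
using Loan = unsigned long;

struct LoanIndex {
  std::uint32_t value;
};

// Brace initialization rejects an implicit unsigned long -> uint32_t
// narrowing of a non-constant, so the conversion is spelled out with a
// cast.  Values above UINT32_MAX would still be truncated; the cast only
// makes that explicit.
static LoanIndex make_index(Loan loan) {
  return LoanIndex{static_cast<std::uint32_t>(loan)};
}
```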
2025-06-26 | c++, libstdc++: Implement C++26 P2830R10 - Constexpr Type Ordering | Jakub Jelinek | 8 | -1/+145
The following patch attempts to implement the C++26 P2830R10 - Constexpr Type Ordering paper, with a minor change that std::type_order<T, U> class template doesn't derive from integral_constant, because std::strong_ordering is not a structural type (except in MSVC), so instead it is just a class template with static constexpr strong_ordering value member and also value_type, type and 2 operators. The paper mostly talks about using something other than mangled names for the ordering, but given that the mangler is part of the GCC C++ FE, using the mangler seems to be the best ordering choice to me. 2025-06-26 Jakub Jelinek <jakub@redhat.com> gcc/cp/ * cp-trait.def: Implement C++26 P2830R10 - Constexpr Type Ordering. (TYPE_ORDER): New. * method.cc (type_order_value): Define. * cp-tree.h (type_order_value): Declare. * semantics.cc (trait_expr_value): Use gcc_unreachable also for CPTK_TYPE_ORDER, adjust comment. (finish_trait_expr): Handle CPTK_TYPE_ORDER. * constraint.cc (diagnose_trait_expr): Likewise. gcc/testsuite/ * g++.dg/cpp26/type-order1.C: New test. * g++.dg/cpp26/type-order2.C: New test. * g++.dg/cpp26/type-order3.C: New test. libstdc++-v3/ * include/bits/version.def (type_order): New. * include/bits/version.h: Regenerate. * libsupc++/compare: Define __glibcxx_want_type_order before including bits/version.h. (std::type_order, std::type_order_v): New trait and template variable. * src/c++23/std.cc.in (std::type_order, std::type_order_v): Export. * testsuite/18_support/comparisons/type_order/1.cc: New test.
2025-06-26 | i386: Introduce crc_rev<mode>si4 expanders [PR120719] | Uros Bizjak | 2 | -0/+39
Introduce crc_rev<mode>si4 expanders to generate the CRC32 instruction when using __builtin_rev_crc32_data* builtins with the 0x1EDC6F41 polynomial and -mcrc32. PR target/120719 gcc/ChangeLog: * config/i386/i386.md (crc_rev<SWI124:mode>si4): New expander. gcc/testsuite/ChangeLog: * gcc.target/i386/crc-builtin-crc32.c: New test.
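For readers unfamiliar with the polynomial, 0x1EDC6F41 is the CRC-32C (Castagnoli) polynomial, and a bit-reflected CRC over it can be written in portable software as below. This is a reference sketch only: it does not call the GCC builtins, and the function names `crc32c_update` and `crc32c` are made up for the example. The initial value and final XOR follow the usual CRC-32C convention and are an assumption about how a caller would frame the raw per-byte update.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 0x1EDC6F41 with its 32 bits reversed, as used by reflected CRC-32C.
static const std::uint32_t kPolyReflected = 0x82F63B78u;

// Fold one byte into a running reflected CRC, one bit at a time.
static std::uint32_t crc32c_update(std::uint32_t crc, std::uint8_t byte) {
  crc ^= byte;
  for (int bit = 0; bit < 8; ++bit)
    crc = (crc >> 1) ^ ((crc & 1u) ? kPolyReflected : 0u);
  return crc;
}

// Conventional CRC-32C framing: initial value and final XOR of 0xFFFFFFFF.
static std::uint32_t crc32c(const void *data, std::size_t len) {
  const std::uint8_t *p = static_cast<const std::uint8_t *>(data);
  std::uint32_t crc = 0xFFFFFFFFu;
  for (std::size_t i = 0; i < len; ++i)
    crc = crc32c_update(crc, p[i]);
  return crc ^ 0xFFFFFFFFu;
}
```

The standard check input "123456789" yields 0xE3069283 with this framing, which is a convenient way to validate any faster implementation against the bitwise one.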
2025-06-26 | RISC-V: Fix build issue | Kito Cheng | 1 | -1/+1
Apparently I forgot to squash this fix into the previous commit before I push... gcc/ChangeLog: * config/riscv/riscv.md: Fix build issue.
2025-06-26 | lto-ltrans-cache: Remove unused private member | Martin Jambor | 2 | -4/+2
When building GCC with clang, it warns that the private member suffix in class ltrans_file_cache (defined in lto-ltrans-cache.h) is not used which indeed looks like it is the case. This patch therefore removes it along with its initialization in the constructor. gcc/ChangeLog: 2025-06-24 Martin Jambor <mjambor@suse.cz> * lto-ltrans-cache.h (class ltrans_file_cache): Remove member prefix. * lto-ltrans-cache.cc (ltrans_file_cache::ltrans_file_cache): Do not initialize member prefix.
2025-06-26 | RISC-V: Add comment and reorder the include files in riscv.md [NFC] | Kito Cheng | 1 | -8/+11
This patch adds a comment to the riscv.md file to clarify the purpose of the file and reorders the include files for better organization. gcc/ChangeLog: * config/riscv/riscv.md: Add comment and reorder include files.
2025-06-26 | tree-vect-stmts.cc: Remove an unused shadowed variable | Martin Jambor | 1 | -1/+0
When compiling tree-vect-stmts.cc with clang, it emits a warning: gcc/tree-vect-stmts.cc:14930:19: warning: unused variable 'mode_iter' [-Wunused-variable] And indeed, there are two mode_iter local variables in function supportable_indirect_convert_operation and the first one is not used at all. This patch removes it. gcc/ChangeLog: 2025-06-24 Martin Jambor <mjambor@suse.cz> * tree-vect-stmts.cc (supportable_indirect_convert_operation): Remove an unused shadowed variable.
2025-06-26 | Silence a clang warning in tree-vect-slp.cc about an unused variable | Martin Jambor | 1 | -5/+0
Since r15-4695-gd17e672ce82e69 (Richard Biener: Assert finished vectorizer pattern COND_EXPR transition), the static const array cond_expr_maps is unused and when GCC is compiled with clang, it warns about that. This patch simply removes the variable. gcc/ChangeLog: 2025-06-24 Martin Jambor <mjambor@suse.cz> * tree-vect-slp.cc (cond_expr_maps): Remove.
2025-06-26 | fortran: Avoid freeing uninitialized value | Martin Jambor | 1 | -1/+1
When compiling fortran/match.cc, clang emits a warning: fortran/match.cc:5301:7: warning: variable 'p' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized] which looks accurate, so this patch adds an initialization of p to avoid the use. gcc/fortran/ChangeLog: 2025-06-23 Martin Jambor <mjambor@suse.cz> * match.cc (gfc_match_nullify): Initialize p to NULL.
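The pattern behind this warning can be shown with a small stand-alone function. `copy_if_nonempty` is a hypothetical example and has nothing to do with `gfc_match_nullify` beyond the shape of the control flow: an early exit jumps to a cleanup that frees `p`, so `p` must hold a defined value on every path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical function mirroring the bug's shape: the early exit reaches
// the cleanup label before p has been assigned.
static bool copy_if_nonempty(const char *src, char *out, std::size_t out_size) {
  char *p = nullptr;  // initialized, so the cleanup below is always safe
  bool ok = false;
  std::size_t len = 0;

  if (src == nullptr || *src == '\0')
    goto cleanup;  // p is still null here

  len = std::strlen(src);
  p = static_cast<char *>(std::malloc(len + 1));
  if (p == nullptr || len + 1 > out_size)
    goto cleanup;

  std::memcpy(p, src, len + 1);  // stage the string in the temporary
  std::memcpy(out, p, len + 1);  // then hand it to the caller's buffer
  ok = true;

cleanup:
  std::free(p);  // free(nullptr) is a no-op
  return ok;
}
```

Without the initializer, the first `goto cleanup` would pass an indeterminate pointer to `free`, which is exactly the situation the warning describes.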
2025-06-26 | Add testcase for afdo offlining and fix two bugs | Jan Hubicka | 4 | -43/+90
This patch adds a testcase checking that offlining works and profile info is not lost. While doing it I noticed a copy-and-paste error that named the dump "afdo" instead of "afdo_offline", and also that not all functions were processed, because a range-based for loop does not expect new values to be added to the vector it iterates over. Both are fixed. gcc/ChangeLog: * auto-profile.cc (function_instance::merge): Add TODO. (autofdo_source_profile::offline_external_functions): Do not use range for on the worklist. * timevar.def (TV_IPA_AUTOFDO_OFFLINE): New timevar. gcc/testsuite/ChangeLog: * gcc.dg/tree-prof/afdo-crossmodule-1.c: New test. * gcc.dg/tree-prof/afdo-crossmodule-1b.c: New test.
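The worklist problem mentioned above is a general C++ pitfall and can be shown without any GCC internals. In the sketch below, `drain` is a hypothetical function in which processing one item may enqueue another. A range-based for over a `std::vector` fixes its end iterator up front, and `push_back` may invalidate the iterators altogether, so the loop indexes the vector and re-reads `size()` on every step.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical worklist: processing item n enqueues n - 1 while n > 0.
// Returns the items in the order they were processed.
static std::vector<int> drain(std::vector<int> worklist) {
  std::vector<int> processed;
  for (std::size_t i = 0; i < worklist.size(); ++i) {
    int item = worklist[i];          // copy before push_back may reallocate
    processed.push_back(item);
    if (item > 0)
      worklist.push_back(item - 1);  // new work discovered while iterating
  }
  return processed;
}
```

Starting from a worklist holding only 2, the index-based loop also visits the 1 and the 0 that are appended during iteration.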
2025-06-26 | Fortran: Prevent creation of unused tree. | Andre Vehreschild | 1 | -4/+7
gcc/fortran/ChangeLog: * trans.cc (gfc_allocate_using_malloc): Prevent possible memory leak when allocation was already done.
2025-06-26 | Fortran: Fix wasting memory in coarray single mode. | Andre Vehreschild | 2 | -3/+3
gcc/fortran/ChangeLog: * resolve.cc (resolve_fl_derived0): Do not create the token component when not in coarray lib mode. * trans-types.cc: Do not access the token when not in coarray lib mode.
2025-06-26 | Fortran: Fix out of bounds access in structure constructor's clean up [PR120711] | Andre Vehreschild | 2 | -4/+29
A structure constructor's generated clean up code was using an offset variable, which was manipulated before the clean up was run leading to an out of bounds access. PR fortran/120711 gcc/fortran/ChangeLog: * trans-array.cc (gfc_trans_array_ctor_element): Store the value of the offset for reuse. gcc/testsuite/ChangeLog: * gfortran.dg/asan/array_constructor_1.f90: New test.
2025-06-26 | Avoid some lost AFDO profiles with LTO | Jan Hubicka | 4 | -33/+587
This patch fixes some of the cases where we lose profile info because we do not perform inlining that happened in the train run before AFDO annotation is done. This is a common problem with LTO when cross-module inlining happened.

I added an afdo_offline pass that does two things:
1) collect the set of all functions defined in the current unit
2) walk all toplevel function instances. If a function instance corresponds to a defined symbol, walk everything inlined to it. If cross-module inlining is seen, remove the inline instances and recursively look into inline instances that go back to the current unit and turn them into offline ones. If a function instance corresponds to an external symbol, remove it but also look for functions inlined to it that belong to the current module.

When merging profiles we also need to recursively merge profiles of inlined functions and, if the inlining decisions do not match, offline the bodies. This is somewhat fragile since recursive calls may trigger modifications of functions currently being merged, but I hope I chased away problems with that - will give it a second thought to see if this can be reorganized into a worklist fashion that is safer.

I noticed that functions may appear in the afdo data either by their symbol name or by their dwarf name (since inline functions may not have a known symbol name). There is already some logic to handle that, but it is broken in the case both names are used. To mitigate the problem I also added logic to translate dwarf names to symbol names in case both are used. This prevents profile loss e.g. in exchange2. Here the digits_2 function appears by its dwarf name (digits_2) but is also cloned, which makes it appear by its symbol name (__*digits_2).

All profile massaging is done before early optimization so the VPT targets of offline bodies are correct. We will still lose the profile if early inlining fails. I will add a second pass to afdo to offline these.

The last problem is that in case we early inlined more than expected (which now happens more often due to offlining) the profile will be lost and filled by the static profile. The problem here is that we need to somehow scale the profile of the inline instance, but I do not see how to determine invocation counts. Will try to look into that incrementally - perhaps we can keep some info from offlining.

There is also now a dump infrastructure that prints the profile in the same format as the dump_gcov tool.

Autoprofiledbootstrapped, regtested x86_64-linux, will commit it shortly.

Honza

gcc/ChangeLog:

        * auto-profile.cc (name_index_set, name_index_map): New types.
        (dump_afdo_loc): New function.
        (dump_inline_stack): Simplify.
        (function_instance::merge): Merge recursively inlined functions; offline if necessary; collect new functions.
        (function_instance::offline): New member function.
        (function_instance::offline_if_in_set): New member function.
        (function_instance::remove_external_functions): New member function.
        (function_instance::dump): New member function.
        (function_instance::debug): New member function.
        (function_instance::dump_inline_stack): New member function.
        (function_instance::find_icall_target_map): Use removed_icall_target.
        (function_instance::remove_icall_target): Only mark icall target removed.
        (autofdo_source_profile::offline_external_functions): New function.
        (function_instance::read_function_instance): Record inlined_to pointers; use -1 for unknown head counts.
        (autofdo_source_profile::get_function_instance_by_name_index): New function.
        (autofdo_source_profile::add_function_instance): New member function.
        (autofdo_source_profile::read): Do not leak memory; fix formatting.
        (read_profile): Fix formatting.
        (afdo_annotate_cfg): Likewise.
        (class pass_ipa_auto_profile_offline): New pass.
        (make_pass_ipa_auto_profile_offline): New function.
        * passes.def (pass_ipa_auto_profile_offline): Add.
        * tree-pass.h (make_pass_ipa_auto_profile): Declare.

gcc/testsuite/ChangeLog:

        * gcc.dg/tree-prof/indir-call-prof-2.c: Update template.
2025-06-26 | x86: Also handle all 1s float vector constant | H.J. Lu | 2 | -2/+41
Since float vector constant

(const_vector:V4SF [(const_double:SF -QNaN [-QNaN]) repeated x4])

is an all 1s float vector constant, update the remove_redundant_vector pass to replace

(insn 20 18 21 2 (set (reg:V4SF 124)
        (const_vector:V4SF [
                (const_double:SF -QNaN [-QNaN]) repeated x4
            ])) "x.cc":26:5 2426 {movv4sf_internal}
     (nil))

with

(insn 49 2 5 2 (set (reg:V16QI 135)
        (const_vector:V16QI [
                (const_int -1 [0xffffffffffffffff]) repeated x16
            ])) -1
     (nil))
...
(insn 20 18 21 2 (set (reg:V4SF 124)
        (subreg:V4SF (reg:V16QI 135) 0)) "x.cc":26:5 2426 {movv4sf_internal}
     (nil))

gcc/

        PR target/120819
        * config/i386/i386-features.cc (ix86_broadcast_inner): Also handle all 1s float vector constant.

gcc/testsuite/

        PR target/120819
        * g++.target/i386/pr120819.C: New test.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
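The equivalence this commit relies on, that a float whose 32 bits are all ones is a NaN with the sign bit set, can be checked directly. The helper name `float_from_bits` is made up for this sketch, and it assumes `float` is IEEE-754 binary32, which holds on x86.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit pattern as a float without violating aliasing
// rules.  Assumes float is IEEE-754 binary32.
static float float_from_bits(std::uint32_t bits) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "binary32 expected");
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}
```

All exponent bits set with a nonzero mantissa is the NaN encoding, and the top mantissa bit being set makes it a quiet NaN, which is why the RTL dump prints the element as -QNaN.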
2025-06-26 | x86: Handle REG_EH_REGION note in DEF_INSN | H.J. Lu | 1 | -0/+32
For tcpsock_test.go in libgo tests,

commit aba3b9d3a48a0703fd565f7c5f0caf604f59970b
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Fri May 9 07:17:07 2025 +0800

    x86: Extend the remove_redundant_vector pass

added an instruction:

(insn 501 101 102 21 (set (reg:V2DI 234)
        (vec_duplicate:V2DI (reg:DI 111 [ _46 ]))) "tcpsock_test.go":691:12 discrim 1 -1
     (nil))

after

(insn 101 100 501 21 (set (reg:DI 111 [ _46 ])
        (mem:DI (reg/f:DI 110 [ _45 ]) [5 *_45+0 S8 A64])) "tcpsock_test.go":691:12 discrim 1 99 {*movdi_internal}
     (expr_list:REG_DEAD (reg/f:DI 110 [ _45 ])
        (expr_list:REG_EH_REGION (const_int 1 [0x1])
            (nil))))

which resulted in

(insn 101 100 501 21 (set (reg:DI 111 [ _46 ])
        (mem:DI (reg/f:DI 110 [ _45 ]) [5 *_45+0 S8 A64])) "tcpsock_test.go":691:12 discrim 1 99 {*movdi_internal}
     (expr_list:REG_DEAD (reg/f:DI 110 [ _45 ])
        (expr_list:REG_EH_REGION (const_int 1 [0x1])
            (nil))))
(insn 501 101 102 21 (set (reg:V2DI 234)
        (vec_duplicate:V2DI (reg:DI 111 [ _46 ]))) "tcpsock_test.go":691:12 discrim 1 -1
     (nil))

and caused:

tcpsock_test.go: In function 'net.TestTCPBig..func2':
tcpsock_test.go:684:28: error: in basic block 21:
  684 | go func() {
      |                            ^
tcpsock_test.go:684:28: error: flow control insn inside a basic block
(insn 101 100 501 21 (set (reg:DI 111 [ _46 ])
        (mem:DI (reg/f:DI 110 [ _45 ]) [5 *_45+0 S8 A64])) "tcpsock_test.go":691:12 discrim 1 99 {*movdi_internal}
     (expr_list:REG_DEAD (reg/f:DI 110 [ _45 ])
        (expr_list:REG_EH_REGION (const_int 1 [0x1])
            (nil))))
during RTL pass: rrvl
tcpsock_test.go:684:28: internal compiler error: in rtl_verify_bb_insns, at cfgrtl.cc:2834

Copy the REG_EH_REGION note to the newly added instruction and split the block after the previous instruction.

        PR target/120816
        * config/i386/i386-features.cc (remove_redundant_vector_load): Handle REG_EH_REGION note in DEF_INSN.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-06-26 | x86: Add preserve_none and update no_caller_saved_registers attributes | H.J. Lu | 43 | -51/+1680
Add preserve_none attribute which is similar to no_callee_saved_registers attribute, except on x86-64, r12, r13, r14, r15, rdi and rsi registers are used for integer parameter passing. This can be used in an interpreter to avoid saving/restoring the registers in functions which process byte codes. It improved the pystones benchmark by 6-7%: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119628#c15 Remove -mgeneral-regs-only restriction on no_caller_saved_registers attribute. Only SSE is allowed since SSE XMM register load preserves the upper bits in YMM/ZMM register while YMM register load zeros the upper 256 bits of ZMM register, and preserving 32 ZMM registers can be quite expensive. gcc/ PR target/119628 * config/i386/i386-expand.cc (ix86_expand_call): Call ix86_type_no_callee_saved_registers_p instead of looking up no_callee_saved_registers attribute. * config/i386/i386-options.cc (ix86_set_func_type): Look up preserve_none attribute. Check preserve_none attribute for interrupt attribute. Don't check no_caller_saved_registers nor no_callee_saved_registers conflicts here. (ix86_set_func_type): Check no_callee_saved_registers before checking no_caller_saved_registers attribute. (ix86_set_current_function): Allow SSE with no_caller_saved_registers attribute. (ix86_handle_call_saved_registers_attribute): Check preserve_none, no_callee_saved_registers and no_caller_saved_registers conflicts. (ix86_gnu_attributes): Add preserve_none attribute. * config/i386/i386-protos.h (ix86_type_no_callee_saved_registers_p): New. * config/i386/i386.cc (x86_64_preserve_none_int_parameter_registers): New. (ix86_using_red_zone): Don't use red-zone when there are no caller-saved registers with SSE. (ix86_type_no_callee_saved_registers_p): New. (ix86_function_ok_for_sibcall): Also check TYPE_PRESERVE_NONE and call ix86_type_no_callee_saved_registers_p instead of looking up no_callee_saved_registers attribute. 
(ix86_comp_type_attributes): Call ix86_type_no_callee_saved_registers_p instead of looking up no_callee_saved_registers attribute. Return 0 if preserve_none attribute doesn't match in 64-bit mode. (ix86_function_arg_regno_p): For cfun with TYPE_PRESERVE_NONE, use x86_64_preserve_none_int_parameter_registers. (init_cumulative_args): Set preserve_none_abi. (function_arg_64): Use x86_64_preserve_none_int_parameter_registers with preserve_none attribute. (setup_incoming_varargs_64): Use x86_64_preserve_none_int_parameter_registers with preserve_none attribute. (ix86_save_reg): Treat TYPE_PRESERVE_NONE like TYPE_NO_CALLEE_SAVED_REGISTERS. (ix86_nsaved_sseregs): Allow saving XMM registers for no_caller_saved_registers attribute. (ix86_compute_frame_layout): Likewise. (x86_this_parameter): Use x86_64_preserve_none_int_parameter_registers with preserve_none attribute. * config/i386/i386.h (ix86_args): Add preserve_none_abi. (call_saved_registers_type): Add TYPE_PRESERVE_NONE. (machine_function): Change call_saved_registers to 3 bits. * doc/extend.texi: Add preserve_none attribute. Update no_caller_saved_registers attribute to remove -mgeneral-regs-only restriction. gcc/testsuite/ PR target/119628 * gcc.target/i386/no-callee-saved-3.c: Adjust error location. * gcc.target/i386/no-callee-saved-19a.c: New test. * gcc.target/i386/no-callee-saved-19b.c: Likewise. * gcc.target/i386/no-callee-saved-19c.c: Likewise. * gcc.target/i386/no-callee-saved-19d.c: Likewise. * gcc.target/i386/no-callee-saved-19e.c: Likewise. * gcc.target/i386/preserve-none-1.c: Likewise. * gcc.target/i386/preserve-none-2.c: Likewise. * gcc.target/i386/preserve-none-3.c: Likewise. * gcc.target/i386/preserve-none-4.c: Likewise. * gcc.target/i386/preserve-none-5.c: Likewise. * gcc.target/i386/preserve-none-6.c: Likewise. * gcc.target/i386/preserve-none-7.c: Likewise. * gcc.target/i386/preserve-none-8.c: Likewise. * gcc.target/i386/preserve-none-9.c: Likewise. * gcc.target/i386/preserve-none-10.c: Likewise. 
* gcc.target/i386/preserve-none-11.c: Likewise. * gcc.target/i386/preserve-none-12.c: Likewise. * gcc.target/i386/preserve-none-13.c: Likewise. * gcc.target/i386/preserve-none-14.c: Likewise. * gcc.target/i386/preserve-none-15.c: Likewise. * gcc.target/i386/preserve-none-16.c: Likewise. * gcc.target/i386/preserve-none-17.c: Likewise. * gcc.target/i386/preserve-none-18.c: Likewise. * gcc.target/i386/preserve-none-19.c: Likewise. * gcc.target/i386/preserve-none-20.c: Likewise. * gcc.target/i386/preserve-none-21.c: Likewise. * gcc.target/i386/preserve-none-22.c: Likewise. * gcc.target/i386/preserve-none-23.c: Likewise. * gcc.target/i386/preserve-none-24.c: Likewise. * gcc.target/i386/preserve-none-25.c: Likewise. * gcc.target/i386/preserve-none-26.c: Likewise. * gcc.target/i386/preserve-none-27.c: Likewise. * gcc.target/i386/preserve-none-28.c: Likewise. * gcc.target/i386/preserve-none-29.c: Likewise. * gcc.target/i386/preserve-none-30a.c: Likewise. * gcc.target/i386/preserve-none-30b.c: Likewise. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
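As a rough illustration of the interpreter use case mentioned in this commit, the sketch below marks bytecode handlers with `preserve_none` when the compiler reports support for it and falls back to plain functions otherwise. Everything except the attribute name is hypothetical: the `PRESERVE_NONE` macro, the opcodes and the handler functions are invented for the example, and in a toy this small the compiler will most likely inline the handlers, so it shows the syntax, not the performance effect.

```cpp
#include <cassert>

// Use the attribute only where the compiler advertises it; elsewhere the
// macro expands to nothing so the sketch still builds.
#if defined(__has_attribute)
#  if __has_attribute(preserve_none)
#    define PRESERVE_NONE __attribute__((preserve_none))
#  endif
#endif
#ifndef PRESERVE_NONE
#  define PRESERVE_NONE
#endif

// Hypothetical three-opcode bytecode; none of these names come from GCC.
enum Op : unsigned char { OP_INC, OP_DOUBLE, OP_HALT };

// Handlers that process one bytecode each.
static PRESERVE_NONE long op_inc(long acc) { return acc + 1; }
static PRESERVE_NONE long op_double(long acc) { return acc * 2; }

// Dispatch loop: runs until OP_HALT and returns the accumulator.
static long run(const Op *code) {
  long acc = 0;
  for (;; ++code) {
    switch (*code) {
    case OP_INC:    acc = op_inc(acc);    break;
    case OP_DOUBLE: acc = op_double(acc); break;
    case OP_HALT:   return acc;
    }
  }
}
```

The handlers are called directly instead of through a function-pointer table because, per the ChangeLog above, a `preserve_none` mismatch makes function types incompatible in 64-bit mode, so a pointer type would need the attribute as well.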
2025-06-26Daily bump.GCC Administrator4-1/+238
2025-06-26x86: Add debug dump for the remove_redundant_vector passH.J. Lu1-5/+60
Add debug dump for the remove_redundant_vector pass with the following output: Replace: (insn 7 4 8 2 (set (reg:V2DI 103) (const_vector:V2DI [ (const_int 0 [0]) repeated x2 ])) "x.c":8:13 2406 {movv2di_internal} (nil)) with: (insn 7 4 8 2 (set (reg:V2DI 103) (subreg:V2DI (reg:V32QI 109) 0)) "x.c":8:13 2406 {movv2di_internal} (nil)) ... Replace: (insn 16 15 17 3 (set (reg:V4DI 105) (const_vector:V4DI [ (const_int 0 [0]) repeated x4 ])) "x.c":13:28 2405 {movv4di_internal} (nil)) with: (insn 16 15 17 3 (set (reg:V4DI 105) (subreg:V4DI (reg:V32QI 109) 0)) "x.c":13:28 2405 {movv4di_internal} (nil)) ... Place: (insn 25 5 23 2 (set (reg:V32QI 109) (const_vector:V32QI [ (const_int 0 [0]) repeated x32 ])) -1 (nil)) after: (insn 23 25 24 2 (set (reg/f:DI 107 [ mem1 ]) (reg:DI 5 di [ mem1 ])) "x.c":5:1 95 {*movdi_internal} (expr_list:REG_DEAD (reg:DI 5 di [ mem1 ]) (nil))) in the *.309r.rrvl debug dump. * config/i386/i386-features.cc (ix86_place_single_vector_set): Add debug dump. (replace_vector_const): Likewise. (remove_redundant_vector_load): Likewise. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-06-25arc: Use intrinsics for __builtin_mul_overflow ()Luis Silva1-0/+33
This patch handles both signed and unsigned builtin multiplication overflow. Uses the "mpy.f" instruction to set the condition codes based on the result. In the event of an overflow, the V flag is set, triggering a conditional move depending on the V flag status. For example, set "1" to "r0" in case of overflow: mov_s r0,1 mpy.f r0,r0,r1 j_s.d [blink] mov.nv r0,0 gcc/ChangeLog: * config/arc/arc.md (<su_optab>mulvsi4): New define_expand. (<su_optab>mulsi3_Vcmp): New define_insn. Signed-off-by: Luis Silva <luiss@synopsys.com>
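For illustration, the source-level construct this commit targets can be sketched as below. This is a minimal sketch, not taken from the patch: the helper names `mul_overflows` and `umul_checked` are hypothetical, and only the GCC/Clang builtin `__builtin_mul_overflow` is assumed to exist. Whether a given ARC compiler actually emits `mpy.f` followed by a conditional move for this code depends on the target configuration and is not verified here.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: returns 1 if a * b overflows int, 0 otherwise.
// This mirrors the commit's example of setting "1" in r0 on overflow.
int mul_overflows(int a, int b) {
  int result;  // receives the (possibly wrapped) product
  return __builtin_mul_overflow(a, b, &result) ? 1 : 0;
}

// Hypothetical helper: unsigned variant that also hands back the product.
// Returns true when the mathematical product does not fit in unsigned.
bool umul_checked(unsigned a, unsigned b, unsigned *out) {
  return __builtin_mul_overflow(a, b, out);
}
```

The builtin returns a boolean overflow indication and always stores the wrapped result through the pointer, which is why a flag-setting multiply is a natural fit for it.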
2025-06-25arc: Add commutative multiplication patternsLuis Silva3-3/+102
This patch introduces two new instruction patterns: `*mulsi3_cmp0`: This pattern performs a multiplication and sets the CC_Z register based on the result, while also storing the result of the multiplication in a general-purpose register. `*mulsi3_cmp0_noout`: This pattern performs a multiplication and sets the CC_Z register based on the result without storing the result in a general-purpose register. These patterns are optimized to generate code using the `mpy.f` instruction, specifically used where the result is compared to zero. In addition, the previous commutative multiplication implementation was removed because it incorrectly took the negative flag into account. The new implementation considers only the zero flag. A test case has been added to verify the correctness of these changes. gcc/ChangeLog: * config/arc/arc.cc (arc_select_cc_mode): Handle multiplication results compared against zero, selecting CC_Zmode. * config/arc/arc.md (*mulsi3_cmp0): New define_insn. (*mulsi3_cmp0_noout): New define_insn. gcc/testsuite/ChangeLog: * gcc.target/arc/mult-cmp0.c: New test. Signed-off-by: Luis Silva <luiss@synopsys.com>
2025-06-25arc: testsuite: Scan rlc instead of mov.hsLuis Silva1-5/+3
Due to the patch by Roger Sayle, 09881218137f4af9b7c894c2d350cf2ff8e0ee23, which introduces the use of the `rlc rX,0` instruction in place of the `mov.hs`, the add overflow test case needs to be updated. The previous test case was validating the `mov.hs` instruction, but now it must validate the `rlc` instruction as the new behavior. gcc/testsuite/ChangeLog: * gcc.target/arc/overflow-1.c: Replace mov.hs with rlc. Signed-off-by: Luis Silva <luiss@synopsys.com>
2025-06-25ARC: Use intrinsics for __builtin_sub_overflow*()Shahab Vahedi2-0/+145
This patch covers signed and unsigned subtractions. The generated code would be something along these lines: signed: sub.f r0, r1, r2 b.v @label unsigned: sub.f r0, r1, r2 b.c @label gcc/ * config/arc/arc.md (subsi3_v, subvsi4, subsi3_c): New patterns. gcc/testsuite/ * gcc.target/arc/overflow-2.c: New file.
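The signed and unsigned cases named in this message differ in which condition is tested: signed overflow (branch on V) versus unsigned borrow (branch on carry). A minimal source-level sketch follows, with hypothetical wrapper names `ssub_overflows` and `usub_overflows`; only the builtin `__builtin_sub_overflow` is assumed, and the exact ARC instructions selected for it are not verified here.

```cpp
#include <cassert>
#include <climits>

// Hypothetical wrapper: signed subtraction. Overflow here means the
// mathematical result does not fit in int (the "b.v" case above).
bool ssub_overflows(int a, int b, int *out) {
  return __builtin_sub_overflow(a, b, out);
}

// Hypothetical wrapper: unsigned subtraction. Overflow here means a
// borrow occurred, i.e. a < b (the "b.c" case above).
bool usub_overflows(unsigned a, unsigned b, unsigned *out) {
  return __builtin_sub_overflow(a, b, out);
}
```

Note that `5 - 7` is fine as a signed operation but overflows as an unsigned one, which is why the two variants need separate patterns.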
2025-06-25ARC: Use intrinsics for __builtin_add_overflow*()Shahab Vahedi6-1/+182
This patch covers signed and unsigned additions. The generated code would be something along these lines: signed: add.f r0, r1, r2 b.v @label unsigned: add.f r0, r1, r2 b.c @label gcc/ * config/arc/arc-modes.def (CC_V): New mode. * config/arc/arc-protos.h (arc_gen_unlikely_cbranch): New function declaration. * config/arc/arc.cc (arc_gen_unlikely_cbranch): New function. (get_arc_condition_code): Handle new mode. * config/arc/arc.md (addvsi3_v, addvsi4, addsi3_c, uaddvsi4): New patterns. * config/arc/predicates.md (proper_comparison_operator): Handle the new V_mode. (equality_comparison_operator): Likewise. gcc/testsuite/ * gcc.target/arc/overflow-1.c: New file.
2025-06-25diagnostics: Mark path_label::get_effects as final overrideMartin Jambor2-2/+2
When compiling diagnostic-path-output.cc with clang, it warns that path_label::get_effects should be marked as override. That looks like a good idea and from a brief look I also believe it should be marked as final (the other override in the class is marked as both), so this patch does that. Likewise for html_output_format::after_diagnostic in diagnostic-format-html.cc which also already has quite a few member functions marked as final override. gcc/ChangeLog: 2025-06-24 Martin Jambor <mjambor@suse.cz> * diagnostic-path-output.cc (path_label::get_effects): Mark as final override. * diagnostic-format-html.cc (html_output_format::after_diagnostic): Likewise.
2025-06-25ranger-op: Use CFN_ constant instead of plain BUILTIN_ oneMartin Jambor1-1/+1
When compiling gimple-range-op.cc, clang issues a warning: gimple-range-op.cc:1419:18: warning: comparison of different enumeration types in switch statement ('combined_fn' and 'built_in_function') [-Wenum-compare-switch] which I hope is harmless, but all other switch cases use CFN_ prefixed constants, so I guess the ISINF case should too. gcc/ChangeLog: 2025-06-23 Martin Jambor <mjambor@suse.cz> * gimple-range-op.cc (gimple_range_op_handler::maybe_builtin_call): Use CFN_BUILT_IN_ISINF instead of BUILT_IN_ISINF.
2025-06-25value-relation.h: Mark dom_oracle::next_relation as overrideMartin Jambor1-1/+1
When GCC is compiled with clang, it emits a warning that dom_oracle::next_relation is not marked as override even though it does override a virtual function of its ancestor. This patch marks it as such to silence the warning and for the sake of consistency. There are other member functions in the class which are marked as final override but this particular function is in the protected section so I decided to just mark it as override. gcc/ChangeLog: 2025-06-24 Martin Jambor <mjambor@suse.cz> * value-relation.h (class dom_oracle): Mark member function next_relation as override.
2025-06-25tree-ssa-propagate.h: Mark two functions as overrideMartin Jambor1-2/+2
When tree-ssa-propagate.h is compiled with clang, it complains that member functions value_of_expr and range_of_expr of class substitute_and_fold_engine are not marked as override even though they do override virtual functions of the ancestor class. This patch merely adds the keyword to silence the warning and for consistency's sake. I did not make this part of the previous patch because I wanted to point out that the first case is quite unusual, a virtual function with a functional body (range_query::value_of_expr) is being overridden with a pure virtual function. I assume it was a conscious decision but adding the override keyword seems even more important then. gcc/ChangeLog: 2025-06-24 Martin Jambor <mjambor@suse.cz> * tree-ssa-propagate.h (class substitute_and_fold_engine): Mark member functions value_of_expr and range_of_expr as override.
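The unusual case described here, a virtual function with a body being overridden by a pure virtual one, can be sketched in isolation. This is a simplified sketch and not GCC's code: the class names `query_base`, `engine_base` and `doubling_engine` and the `int` signature are hypothetical stand-ins for range_query, substitute_and_fold_engine and a concrete engine.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the ancestor: the virtual has a real body.
struct query_base {
  virtual ~query_base() = default;
  virtual int value_of_expr(int expr) { return expr; }
};

// Hypothetical stand-in for the engine: it re-declares the function as
// pure, so `override` documents that this is not a brand-new virtual.
struct engine_base : query_base {
  virtual int value_of_expr(int expr) override = 0;
};

// A concrete class must now supply its own body again.
struct doubling_engine final : engine_base {
  int value_of_expr(int expr) final override { return expr * 2; }
};

// The intermediate class became abstract even though its base is not.
static_assert(!std::is_abstract<query_base>::value, "base is concrete");
static_assert(std::is_abstract<engine_base>::value, "engine is abstract");
```

With `override` present, a later signature change in the ancestor would make the pure declaration fail to compile instead of silently introducing an unrelated virtual function.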
2025-06-25ranger: Mark several member functions as final overrideMartin Jambor3-42/+44
When GCC is built with clang, it emits warnings that several member functions of various ranger classes override a virtual function of an ancestor but are not marked with the override keyword. After inspecting the cases, I found that all these classes had other member functions marked as final override, so I added the final keyword everywhere too. In some cases other such overrides were not explicitly marked as virtual, which made formatting easier. For that reason and also for consistency, in such cases I removed the virtual keyword from the functions I marked as final override too. gcc/ChangeLog: 2025-06-24 Martin Jambor <mjambor@suse.cz> * range-op-mixed.h (class operator_plus): Mark member function overflow_free_p as final override. (class operator_minus): Likewise. (class operator_mult): Likewise. * range-op-ptr.cc (class pointer_plus_operator): Mark member function lhs_op1_relation as final override. * range-op.cc (class operator_div): Mark member functions op2_range and update_bitmask as final override. (class operator_logical_and): Mark member functions fold_range, op1_range and op2_range as final override. Remove unnecessary virtual. (class operator_logical_or): Likewise. (class operator_logical_not): Mark member functions fold_range and op1_range as final override. Remove unnecessary virtual. (class operator_absu): Mark member function wi_fold as final override.
2025-06-25coroutines: Remove unused private member in cp_coroutine_transformMartin Jambor1-1/+0
When building GCC with clang, it warns that the private member suffix in class cp_coroutine_transform (defined in gcc/cp/coroutines.h) is not used which indeed looks like it is the case. This patch therefore removes it. gcc/cp/ChangeLog: 2025-06-24 Martin Jambor <mjambor@suse.cz> * coroutines.h (class cp_coroutine_transform): Remove member orig_fn_body.
2025-06-25Mark pass_sccopy gate and execute functions as final overrideMartin Jambor1-2/+2
It is customary to mark the gate and execute functions of the classes representing passes as final override but this is missing in pass_sccopy. This patch adds it which also silences clang warnings about it. gcc/ChangeLog: 2025-06-24 Martin Jambor <mjambor@suse.cz> * gimple-ssa-sccopy.cc (class pass_sccopy): Mark member functions gate and execute as final override.
2025-06-25Mark rtl_avoid_store_forwarding functions final overrideMartin Jambor1-2/+2
It is customary to mark the gate and execute functions of the classes representing passes as final override but this is missing in pass_rtl_avoid_store_forwarding. This patch adds it which also silences a clang warning about it. gcc/ChangeLog: 2025-06-24 Martin Jambor <mjambor@suse.cz> * avoid-store-forwarding.cc (class pass_rtl_avoid_store_forwarding): Mark member function gate as final override.
2025-06-25Remove unused vector in value-relation.cc.Andrew MacLeod1-6/+0
The relation_to_code vector in value-relation is now unused, so we can remove it. * value-relation.cc (relation_to_code): Remove.
2025-06-25Promote verify_range to vrange.Andrew MacLeod2-6/+7
Most range classes had a verify_range, but they were all private. Make it a supported routine from vrange. * value-range.cc (frange::verify_range): Constify. (irange::verify_range): Constify. * value-range.h (vrange::verify_range): New. (irange::verify_range): Make public. (prange::verify_range): Make public. (prange::verify_range): Make public. (value_range::verify_range): New.
2025-06-25get_bitmask is sometimes less refined.Andrew MacLeod1-1/+116
get_bitmask intersects the current mask with a mask generated from the range. If the two masks are incompatible, it currently returns UNKNOWN. Instead, it should return the original mask; otherwise information is lost. * value-range.cc (irange::get_bitmask): Return original mask if result is unknown. (assert_snap_result): New. (test_irange_snap_bounds): New. (range_tests_misc): Call test_irange_snap_bounds.
2025-06-25tree-optimization/109892 - SLP reduction of fmaRichard Biener4-0/+68
The following adds the ability to vectorize a fma reduction pair as SLP reduction (we cannot yet handle ternary association in reduction vectorization). PR tree-optimization/109892 * tree-vect-loop.cc (check_reduction_path): Handle fma. (vectorizable_reduction): Apply FOLD_LEFT_REDUCTION code generation constraints. * gcc.dg/vect/vect-reduc-fma-1.c: New testcase. * gcc.dg/vect/vect-reduc-fma-2.c: Likewise. * gcc.dg/vect/vect-reduc-fma-3.c: Likewise.
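As a rough illustration of what a "fma reduction pair" can look like in source form, consider the loop below. This is a sketch under assumptions, not one of the new testcases: the function name `fma_pair_sum` and the array layout are hypothetical, and whether GCC vectorizes this particular loop depends on the target and options, which is not verified here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical example: two independent accumulators, each updated by a
// fused multiply-add per iteration. The two accumulator chains form the
// pair that an SLP reduction could process in adjacent vector lanes.
double fma_pair_sum(const double *a, const double *b, std::size_t n_pairs) {
  double s0 = 0.0;
  double s1 = 0.0;
  for (std::size_t i = 0; i < n_pairs; ++i) {
    s0 = std::fma(a[2 * i], b[2 * i], s0);          // even elements
    s1 = std::fma(a[2 * i + 1], b[2 * i + 1], s1);  // odd elements
  }
  return s0 + s1;
}
```

Each accumulator carries its own dependency chain, so the order of additions within a chain stays as written, which is consistent with the message's mention of FOLD_LEFT_REDUCTION code generation constraints.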