path: root/gcc/testsuite/gcc.dg
Age  Commit message  Author  Files  Lines
2025-07-01  Use the counted_by attribute of pointers in array bound checker.  Qing Zhao  5  -0/+221
The current array bound checker only instruments ARRAY_REF, where the INDEX information is the 2nd operand of the ARRAY_REF. When extending the array bound checker to pointer references with counted_by attributes, the hardest part is to get the INDEX of the corresponding array ref from the offset computation expression of the pointer ref, i.e., given an OFFSET expression and the ELEMENT_SIZE, get the index expression from the OFFSET. For example:
  OFFSET: ((long unsigned int) m * (long unsigned int) SAVE_EXPR <n>) * 4
  ELEMENT_SIZE: (sizetype) SAVE_EXPR <n> * 4
get the index as (long unsigned int) m.
gcc/c-family/ChangeLog:
  * c-gimplify.cc (is_address_with_access_with_size): New function.
  (ubsan_walk_array_refs_r): Instrument an INDIRECT_REF whose base address is .ACCESS_WITH_SIZE or an address computation whose base address is .ACCESS_WITH_SIZE.
  * c-ubsan.cc (ubsan_instrument_bounds_pointer_address): New function.
  (struct factor_t): New structure.
  (get_factors_from_mul_expr): New function.
  (get_index_from_offset): New function.
  (get_index_from_pointer_addr_expr): New function.
  (is_instrumentable_pointer_array_address): New function.
  (ubsan_array_ref_instrumented_p): Change prototype. Handle MEM_REF in addition to ARRAY_REF.
  (ubsan_maybe_instrument_array_ref): Handle MEM_REF in addition to ARRAY_REF.
gcc/testsuite/ChangeLog:
  * gcc.dg/ubsan/pointer-counted-by-bounds-2.c: New test.
  * gcc.dg/ubsan/pointer-counted-by-bounds-3.c: New test.
  * gcc.dg/ubsan/pointer-counted-by-bounds-4.c: New test.
  * gcc.dg/ubsan/pointer-counted-by-bounds-5.c: New test.
  * gcc.dg/ubsan/pointer-counted-by-bounds.c: New test.
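The index recovery described above can be pictured as cancelling the factors of ELEMENT_SIZE out of OFFSET. The following is only a sketch under that assumption: it models a multiplication tree as a flat list of opaque factor ids, and the names `struct product`, `remove_factor` and `index_from_offset` are hypothetical, not the functions added to c-ubsan.cc.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a MULT_EXPR tree: a product of opaque
   factors, each identified by a small integer id.  */
enum { MAX_FACTORS = 8 };

struct product {
  int factor[MAX_FACTORS];
  size_t n;
};

/* Remove one occurrence of ID from P.  Return 1 on success and 0 if
   ID is not a factor of P.  */
static int
remove_factor (struct product *p, int id)
{
  assert (p->n <= MAX_FACTORS);
  for (size_t i = 0; i < p->n; i++)
    if (p->factor[i] == id)
      {
        /* Order does not matter for a product, so fill the hole with
           the last factor.  */
        p->factor[i] = p->factor[p->n - 1];
        p->n--;
        return 1;
      }
  return 0;
}

/* Cancel every factor of ELEMENT_SIZE from OFFSET.  On success the
   factors left in *INDEX form the index expression.  Return 0 when
   some factor of ELEMENT_SIZE does not appear in OFFSET.  */
static int
index_from_offset (const struct product *offset,
                   const struct product *element_size,
                   struct product *index)
{
  *index = *offset;
  for (size_t i = 0; i < element_size->n; i++)
    if (!remove_factor (index, element_size->factor[i]))
      return 0;
  return 1;
}
```

With ids 1, 2 and 3 standing for m, SAVE_EXPR <n> and the constant 4, cancelling {2, 3} from {1, 2, 3} leaves {1}, which matches the "(long unsigned int) m" result in the example.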
2025-07-01  Use the counted_by attribute of pointers in builtin-object-size.  Qing Zhao  8  -0/+253
gcc/ChangeLog: * tree-object-size.cc (access_with_size_object_size): Update comments for pointers with .ACCESS_WITH_SIZE. (collect_object_sizes_for): Propagate size info through GIMPLE_ASSIGN for pointers with .ACCESS_WITH_SIZE. gcc/testsuite/ChangeLog: * gcc.dg/pointer-counted-by-4-char.c: New test. * gcc.dg/pointer-counted-by-4-float.c: New test. * gcc.dg/pointer-counted-by-4-struct.c: New test. * gcc.dg/pointer-counted-by-4-union.c: New test. * gcc.dg/pointer-counted-by-4.c: New test. * gcc.dg/pointer-counted-by-5.c: New test. * gcc.dg/pointer-counted-by-6.c: New test. * gcc.dg/pointer-counted-by-7.c: New test.
2025-07-01  Extend "counted_by" attribute to pointer fields of structures.  Qing Zhao  5  -1/+283
Convert a pointer reference with counted_by attribute to .ACCESS_WITH_SIZE. For example:
  struct PP {
    size_t count2;
    char other1;
    char *array2 __attribute__ ((counted_by (count2)));
    int other2;
  } *pp;
specifies that "array2" is an array pointed to by the pointer field, and that its number of elements is given by the field "count2" in the same structure.
gcc/c-family/ChangeLog:
  * c-attribs.cc (handle_counted_by_attribute): Accept counted_by attribute for pointer fields.
gcc/c/ChangeLog:
  * c-decl.cc (verify_counted_by_attribute): Change the 2nd argument to a vector of fields with counted_by attribute. Verify all fields in this vector.
  (finish_struct): Collect all the fields with counted_by attribute to a vector and pass this vector to verify_counted_by_attribute.
  * c-typeck.cc (build_counted_by_ref): Handle pointers with counted_by. Add one more argument, issue error when the pointee type is a structure or union including a flexible array member.
  (build_access_with_size_for_counted_by): Handle pointers with counted_by.
  (handle_counted_by_for_component_ref): Call build_counted_by_ref with the new prototype.
gcc/ChangeLog:
  * doc/extend.texi: Extend counted_by attribute to pointer fields in structures. Add one more requirement to pointers with counted_by attribute.
gcc/testsuite/ChangeLog:
  * gcc.dg/flex-array-counted-by.c: Update test.
  * gcc.dg/pointer-counted-by-1.c: New test.
  * gcc.dg/pointer-counted-by-2.c: New test.
  * gcc.dg/pointer-counted-by-3.c: New test.
  * gcc.dg/pointer-counted-by.c: New test.
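A possible way to use the struct PP example in portable code is sketched below. It assumes that only GCC 16 and later accept counted_by on pointer fields, so the attribute is hidden behind a hypothetical COUNTED_BY macro that expands to nothing elsewhere; the helper pp_alloc is also made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumption: counted_by on pointer fields needs GCC 16 or later, so
   compile the attribute out for every other compiler.  */
#if defined (__GNUC__) && !defined (__clang__) && __GNUC__ >= 16
# define COUNTED_BY(member) __attribute__ ((counted_by (member)))
#else
# define COUNTED_BY(member)
#endif

struct PP {
  size_t count2;
  char other1;
  char *array2 COUNTED_BY (count2);
  int other2;
};

/* Hypothetical helper: allocate a struct PP whose array2 points to N
   zeroed chars.  The counter is stored before the pointer so that the
   pair is consistent as soon as array2 becomes usable.  */
static struct PP *
pp_alloc (size_t n)
{
  struct PP *pp = malloc (sizeof *pp);
  if (!pp)
    return NULL;
  char *buf = calloc (n ? n : 1, 1);
  if (!buf)
    {
      free (pp);
      return NULL;
    }
  pp->other1 = 0;
  pp->other2 = 0;
  pp->count2 = n;
  pp->array2 = buf;
  return pp;
}

/* Release a struct PP obtained from pp_alloc.  */
static void
pp_free (struct PP *pp)
{
  assert (pp != NULL);
  free (pp->array2);
  free (pp);
}
```

On a compiler that understands the attribute, an access such as pp->array2[pp->count2] is the kind of out-of-bounds reference the bounds sanitizer work in this series is meant to catch; elsewhere the code behaves like an ordinary counted buffer.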
2025-07-01  testsuite: Fix up pr119318.c test for big-endian [PR120082]  Jakub Jelinek  1  -2/+8
The test is not endianness clean: x[0] is supposed to be ((__int128)0x19)<<32 on little endian - 0x19 is in the second vector elt - but ((__int128)0x19)<<64 on big endian. I've also added verification of int and __int128 sizes just in case we have say a 16-bit or 64-bit int target with __int128 type, or pdp endian gets __int128 support.
2025-07-01 Jakub Jelinek <jakub@redhat.com>
  PR ipa/119318
  PR testsuite/120082
  * gcc.dg/ipa/pr119318.c (main): Expect different result on big endian from little endian, on unexpected endianness or int/int128 sizes don't test anything. Formatting fixes.
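The byte-order dependence can be reproduced outside the testsuite. The sketch below is not the pr119318.c test itself; it assumes a compiler that provides __int128, __SIZEOF_INT__ and the __BYTE_ORDER__ macros (as GCC does), and the function name second_elt_position_ok is hypothetical. It places 0x19 in the second of four ints and checks where it lands in the 128-bit value.

```c
#include <assert.h>
#include <string.h>

#if defined (__SIZEOF_INT128__) && __SIZEOF_INT__ == 4
/* Return 1 if 0x19 stored in the second int shows up at the bit
   position expected for this byte order, 0 if it does not, and -1
   when the byte order is neither little nor big endian.  */
static int
second_elt_position_ok (void)
{
  int v[4] = { 0, 0x19, 0, 0 };
  unsigned __int128 x;
  assert (sizeof v == sizeof x);
  memcpy (&x, v, sizeof x);
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  /* Bytes 4..7 hold bits 32..63.  */
  return x == ((unsigned __int128) 0x19 << 32);
#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
  /* Bytes 4..7 hold bits 64..95.  */
  return x == ((unsigned __int128) 0x19 << 64);
#else
  return -1;
#endif
}
#else
/* Unexpected int or __int128 size: do not test anything.  */
static int
second_elt_position_ok (void)
{
  return -1;
}
#endif
```

Returning -1 instead of failing mirrors the approach described in the commit message, where unexpected endianness or type sizes simply skip the check.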
2025-07-01  gcc: middle-end opt for trigonometric pi-based functions builtins  Yuao Ma  1  -0/+86
This patch partially handled PR118592. This patch builds upon r16-710-g591d3d02664c7b and r16-711-g89935d56f768b4. It introduces middle-end optimizations, such as constant folding, for our trigonometric pi-based function built-ins. gcc/ChangeLog: * fold-const-call.cc (fold_const_call_ss): Constant fold for single arg pi-based trigonometric builtins. (fold_const_call_sss): Constant fold for double arg pi-based trigonometric builtins. * fold-const.cc (negate_mathfn_p): asinpi/atanpi is odd func. (tree_call_nonnegative_warnv_p): acospi always non-neg, asinpi/atanpi non-neg iff arg non-neg. * tree-call-cdce.cc (can_test_argument_range): Add acospi/asinpi. (edom_only_function): Add acospi/asinpi/cospi/sinpi. (get_no_error_domain): Add acospi/asinpi. gcc/testsuite/ChangeLog: * lib/target-supports.exp (foldable_pi_based_trigonometry): New effective target. * gcc.dg/torture/builtin-math-9.c: New test. Signed-off-by: Yuao Ma <c8ef@outlook.com>
2025-06-30  [testsuite] restore default action from dfp.exp [PR120631]  Alexandre Oliva  3  -3/+2
dfp.exp tests for dfprt before deciding whether to default to run or compile, and the PR120631 tests override that without checking for dfprt. Rework them to avoid attempting to link and run programs when dfp runtime support isn't available. for gcc/testsuite/ChangeLog PR middle-end/120631 * gcc.dg/dfp/pr120631.c: Drop overrider of dg-do default action. * gcc.dg/dfp/bitint-9.c: Likewise. * gcc.dg/dfp/bitint-10.c: Likewise.
2025-06-30  [committed] [PR rtl-optimization/120242] Fix SUBREG_PROMOTED_VAR_P after ext-dce's actions  Jeff Law  4  -0/+105
I've gone back and forth on these problems multiple times. We have two passes, ext-dce and combine, which eliminate extensions using totally different mechanisms.
ext-dce looks for cases where the state of upper bits in an object isn't observable and, if it isn't observable, eliminates extensions which set those bits.
combine looks for cases where we know the state of the upper bits and can prove an extension is just setting those bits to their prior value. Combine also looks for cases where the precise extension isn't really important, just the knowledge that the upper bits are zero or sign extended from a narrower mode is needed.
Combine relies heavily on the SUBREG_PROMOTED_VAR state to do its job. If the actions of ext-dce (or any other pass for that matter) make SUBREG_PROMOTED_VAR's state inconsistent with combine's expectations, then combine can end up generating incorrect code.
--
When ext-dce eliminates an extension, it turns it into a subreg copy (without any known SUBREG_PROMOTED_VAR state). Since we can no longer guarantee the destination object has any known extension state, we scurry around and wipe SUBREG_PROMOTED_VAR state for the destination object.
That's fine and dandy, but ultimately insufficient. Consider if the destination of the optimized extension was used as a source in a simple copy insn. Furthermore assume that the destination of that copy is used within a SUBREG expression with SUBREG_PROMOTED_VAR set. ext-dce's actions have clobbered the SUBREG_PROMOTED_VAR state on the destination of that copy, albeit indirectly.
This patch addresses this problem by taking the set of pseudos directly impacted by ext-dce's actions and expanding that set by building a transitive closure for pseudos connected via copies. We then scurry around finding SUBREG_PROMOTED_VAR state to wipe for everything in that expanded set of pseudos. Voila, everything just works.
--
The other approach here would be to further expand the liveness sets inside ext-dce. That's a simpler path forward, but ultimately regresses the quality of code we do care about. One good piece of news is that with the transitive closure bits in place, we can eliminate a bit of the live set expansion we had in place for SUBREG_PROMOTED_VAR objects.
--
So let's take one case of the 5 that have been reported. In ext-dce we have this insn:
> (insn 29 27 30 3 (set (reg:DI 134 [ al_lsm.9 ])
>         (zero_extend:DI (subreg:HI (reg:DI 162) 0))) "j.c":17:17 552 {*zero_extendhidi2_bitmanip}
>      (expr_list:REG_DEAD (reg:DI 162)
>         (nil)))
There are reachable uses of (reg 134):
> (insn 49 47 52 6 (set (mem/c:HI (lo_sum:DI (reg/f:DI 186)
>                 (symbol_ref:DI ("al") [flags 0x86] <var_decl 0x7ffff73c2da8 al>)) [2 al+0 S2 A16])
>         (subreg/s/v:HI (reg:DI 134 [ al_lsm.9 ]) 0)) 279 {*movhi_internal}
>      (expr_list:REG_DEAD (reg/f:DI 186)
>         (nil)))
Obviously safe if we were to remove the extension.
> (insn 52 49 53 6 (set (reg:DI 176)
>         (and:DI (reg:DI 134 [ al_lsm.9 ])
>             (const_int 5 [0x5]))) "j.c":21:12 106 {*anddi3}
>      (expr_list:REG_DEAD (reg:DI 134 [ al_lsm.9 ])
>         (nil)))
> (insn 53 52 56 6 (set (reg:SI 177 [ _8 ])
>         (zero_extend:SI (subreg:HI (reg:DI 176) 0))) "j.c":21:12 551 {*zero_extendhisi2_bitmanip}
>      (expr_list:REG_DEAD (reg:DI 176)
>         (nil)))
Safe to remove the extension as we only read the low 16 bits from the destination register (reg 176) in insn 53.
> (insn 27 26 29 3 (set (reg:DI 162)
>         (sign_extend:DI (plus:SI (subreg/s/v:SI (reg:DI 134 [ al_lsm.9 ]) 0)
>                 (const_int 1 [0x1])))) "j.c":17:17 8 {addsi3_extended}
>      (expr_list:REG_DEAD (reg:DI 134 [ al_lsm.9 ])
>         (nil)))
> (insn 29 27 30 3 (set (reg:DI 134 [ al_lsm.9 ])
>         (zero_extend:DI (subreg:HI (reg:DI 162) 0))) "j.c":17:17 552 {*zero_extendhidi2_bitmanip}
>      (expr_list:REG_DEAD (reg:DI 162)
>         (nil)))
Again, not as obvious as the first case, but we only read the low 16 bits from (reg 162) in insn 29. So those upper bits in (reg 134) don't matter.
> (insn 26 92 27 3 (set (reg:DI 144 [ ivtmp.17 ])
>         (reg:DI 134 [ al_lsm.9 ])) 277 {*movdi_64bit}
>      (nil))
> (insn 30 29 31 3 (set (reg:DI 135 [ al.2_3 ])
>         (sign_extend:DI (subreg/s/v:HI (reg:DI 144 [ ivtmp.17 ]) 0))) "j.c":17:9 558 {*extendhidi2_bitmanip}
>      (expr_list:REG_DEAD (reg:DI 144 [ ivtmp.17 ])
>         (nil)))
Also safe in isolation. But worth noting that if we remove the extension at insn 29, then the promoted status on (reg:DI 144) in insn 30 is no longer valid.
Setting aside the promoted state of (reg:DI 144) at insn 30 for a minute, let's look into combine.
> (insn 26 92 27 3 (set (reg:DI 144 [ ivtmp.17 ])
>         (reg:DI 134 [ al_lsm.9 ])) 277 {*movdi_64bit}
>      (nil))
[ ... ]
> (insn 30 29 31 3 (set (reg:DI 135 [ al.2_3 ])
>         (sign_extend:DI (subreg/s/v:HI (reg:DI 144 [ ivtmp.17 ]) 0))) "j.c":17:9 558 {*extendhidi2_bitmanip}
>      (expr_list:REG_DEAD (reg:DI 144 [ ivtmp.17 ])
>         (nil)))
> (jump_insn 31 30 32 3 (set (pc)
>         (if_then_else (eq (reg:DI 135 [ al.2_3 ])
>                 (const_int 0 [0]))
>             (label_ref:DI 41)
>             (pc))) "j.c":4:55 371 {*branchdi}
>      (int_list:REG_BR_PROB 536870913 (nil))
>  -> 41)
Combine will do its thing on insns 30/31. Essentially the sign extension is not necessary in this context, assuming the promoted subreg status in insn 30 -- the equality test doesn't really care about the kind of extension, just knowing the value is extended is enough to safely elide the extension.
And now we've come to the crux of the problem. That promotion state needs to be adjusted. The new ext-dce code will see that copy at insn 26 and add (reg 144) to the set of registers that need promotion state wiped. And everything is happy after that.
The other cases are similar in nature.
--
This has been bootstrapped and regression tested on x86_64 and aarch64. Variants have bootstrapped & regression tested on several other platforms and it's survived testing on the crosses as well.
Pushing to the trunk...
PR rtl-optimization/120242 PR rtl-optimization/120627 PR rtl-optimization/120736 PR rtl-optimization/120813 gcc/ * ext-dce.cc (ext_dce_process_uses): Remove some cases where we unnecessarily expanded live sets for promoted subregs. (expand_changed_pseudos): New function. (reset_subreg_promoted_p): Use it. gcc/testsuite/ * gcc.dg/torture/pr120242.c: New test. * gcc.dg/torture/pr120627.c: Likewise. * gcc.dg/torture/pr120736.c: Likewise. * gcc.dg/torture/pr120813.c: Likewise.
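The "transitive closure for pseudos connected via copies" idea can be sketched as a fixed-point iteration. This is a simplification rather than the code in ext-dce.cc: registers are plain array indexes, a copy is a (dest, src) pair, and only the source-to-destination direction discussed in the walkthrough above is propagated. The names NREGS, struct copy and expand_changed are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { NREGS = 256 };

/* A simple copy insn: (set (reg dest) (reg src)).  */
struct copy {
  unsigned dest;
  unsigned src;
};

/* Expand CHANGED so that it also contains every pseudo that receives,
   directly or through a chain of copies, the value of a pseudo that
   is already in the set.  Iterate until nothing new is added.  */
static void
expand_changed (bool changed[NREGS], const struct copy *copies,
                size_t ncopies)
{
  bool progress;
  do
    {
      progress = false;
      for (size_t i = 0; i < ncopies; i++)
        {
          assert (copies[i].dest < NREGS && copies[i].src < NREGS);
          if (changed[copies[i].src] && !changed[copies[i].dest])
            {
              changed[copies[i].dest] = true;
              progress = true;
            }
        }
    }
  while (progress);
}
```

Starting from a set containing only register 134 and a copy from 134 to 144, the expanded set also contains 144, which corresponds to the register whose promotion state has to be wiped in the example.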
2025-06-30  diagnostics: convert diagnostic_event::meaning enums to enum class  David Malcolm  1  -4/+4
Modernization; no functional change intended. gcc/analyzer/ChangeLog: * checker-event.cc (function_entry_event::get_meaning): Convert diagnostic_event::meaning enums to enum class. (cfg_edge_event::get_meaning): Likewise. (call_event::get_meaning): Likewise. (return_event::get_meaning): Likewise. (start_consolidated_cfg_edges_event::get_meaning): Likewise. (inlined_call_event::get_meaning): Likewise. (warning_event::get_meaning): Likewise. * sm-fd.cc (fd_diagnostic::get_meaning_for_state_change): Likewise. * sm-file.cc (file_diagnostic::get_meaning_for_state_change): Likewise. * sm-malloc.cc (malloc_diagnostic::get_meaning_for_state_change): Likewise. * sm-sensitive.cc (exposure_through_output_file::get_meaning_for_state_change): Likewise. * sm-taint.cc (taint_diagnostic::get_meaning_for_state_change): Likewise. * varargs.cc (va_list_sm_diagnostic::get_meaning_for_state_change): Likewise. gcc/ChangeLog: * diagnostic-format-sarif.cc (sarif_builder::maybe_make_kinds_array): Convert diagnostic_event::meaning enums to enum class. * diagnostic-path-output.cc (path_label::get_text): Likewise. * diagnostic-path.cc (diagnostic_event::meaning::maybe_get_verb_str): Likewise. (diagnostic_event::meaning::maybe_get_noun_str): Likewise. (diagnostic_event::meaning::maybe_get_property_str): Likewise. * diagnostic-path.h (diagnostic_event::verb): Likewise. (diagnostic_event::noun): Likewise. (diagnostic_event::property): Likewise. (diagnostic_event::meaning): Likewise. gcc/testsuite/ChangeLog: * gcc.dg/plugin/analyzer_gil_plugin.cc (gil_diagnostic::get_meaning_for_state_change): Convert diagnostic_event::meaning enums to enum class. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2025-06-30  diagnostics: remove "json" output format  David Malcolm  2  -76/+0
The "json" output format for diagnostics was deprecated in GCC 15, with advice to users seeking machine-readable diagnostics from GCC to use SARIF instead. This patch eliminates it from GCC 16, simplifying the diagnostics subsystem somewhat. Note that the Ada frontend seems to have its own implementation of this in errout.adb (Output_JSON_Message), and documented in gnat_ugn.texi. This patch does not touch Ada. gcc/ChangeLog: * Makefile.in (OBJS-libcommon): Drop diagnostic-format-json.o. * common.opt (fdiagnostics-format=): Drop "json|json-stderr|json-file". (diagnostics_output_format): Drop values "json", "json-stderr", and "json-file". * diagnostic-format-json.cc: Delete file. * diagnostic-format.h (diagnostic_output_format_init_json_stderr): Delete. (diagnostic_output_format_init_json_file): Delete. * diagnostic.cc (diagnostic_output_format_init): Delete cases for DIAGNOSTICS_OUTPUT_FORMAT_JSON_STDERR and DIAGNOSTICS_OUTPUT_FORMAT_JSON_FILE. * diagnostic.h (DIAGNOSTICS_OUTPUT_FORMAT_JSON_STDERR): Delete. (DIAGNOSTICS_OUTPUT_FORMAT_JSON_FILE): Delete. * doc/invoke.texi: Remove references to json output format. * doc/ux.texi: Likewise. * selftest-run-tests.cc (selftest::run_tests): Drop call to deleted selftest::diagnostic_format_json_cc_tests. * selftest.h (selftest::diagnostic_format_json_cc_tests): Delete. gcc/testsuite/ChangeLog: * c-c++-common/analyzer/out-of-bounds-diagram-1-json.c: Deleted test. * c-c++-common/diagnostic-format-json-1.c: Deleted test. * c-c++-common/diagnostic-format-json-2.c: Deleted test. * c-c++-common/diagnostic-format-json-3.c: Deleted test. * c-c++-common/diagnostic-format-json-4.c: Deleted test. * c-c++-common/diagnostic-format-json-5.c: Deleted test. * c-c++-common/diagnostic-format-json-file-1.c: Deleted test. * c-c++-common/diagnostic-format-json-stderr-1.c: Deleted test. * c-c++-common/pr106133.c: Deleted test. * g++.dg/pr90462.C: Deleted test. * gcc.dg/plugin/diagnostic-test-paths-3.c: Deleted test. 
* gcc.dg/plugin/plugin.exp (plugin_test_list): Remove deleted test. * gfortran.dg/diagnostic-format-json-1.F90: Deleted test. * gfortran.dg/diagnostic-format-json-2.F90: Deleted test. * gfortran.dg/diagnostic-format-json-3.F90: Deleted test. * gfortran.dg/diagnostic-format-json-pr105916.F90: Deleted test. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2025-06-30  Extend nonnull_if_nonzero attribute [PR120520]  Jakub Jelinek  4  -10/+332
C2Y voted in the https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3466.pdf paper, which clarifies some of the conditional nonnull cases. For strncat/__strncat_chk no changes are necessary, we already use __attribute__((nonnull (1), nonnull_if_nonzero (2, 3))) attributes on the builtin and glibc can do the same too, meaning that first argument must be nonnull always and second must be nonnull if the third one is nonzero. The problem is with the fread/fwrite changes, where the paper adds: If size or nmemb is zero, +ptr may be a null pointer, fread returns zero and the contents of the array and the state of the stream remain unchanged. and ditto for fwrite, so the two argument nonnull_if_nonzero attribute isn't usable to express that, because whether the pointer can be null depends on 2 integral arguments rather than one. The following patch extends the nonnull_if_nonzero attribute, so that instead of requiring 2 arguments it allows 2 or 3, the first one is still the pointer argument index which sometimes must not be null and the other one or two are integral arguments, if there are 2, the invalid case is only if pointer is null and both the integral arguments are nonzero. 2025-06-30 Jakub Jelinek <jakub@redhat.com> PR c/120520 PR c/117023 gcc/ * builtin-attrs.def (DEF_LIST_INT_INT_INT): Define it and use for 1,2,3. (ATTR_NONNULL_IF123_LIST): New DEF_ATTR_TREE_LIST. (ATTR_NONNULL_4_IF123_LIST): Likewise. * builtins.def (BUILT_IN_FWRITE): Use ATTR_NONNULL_4_IF123_LIST instead of ATTR_NONNULL_LIST. (BUILT_IN_FWRITE_UNLOCKED): Likewise. * gimple.h (infer_nonnull_range_by_attribute): Add another optional tree * argument defaulted to NULL. * gimple.cc (infer_nonnull_range_by_attribute): Add OP3 argument, handle 3 argument nonnull_if_nonzero attribute. * builtins.cc (validate_arglist): Handle 3 argument nonnull_if_nonzero attribute. * tree-ssa-ccp.cc (pass_post_ipa_warn::execute): Likewise. 
* ubsan.cc (instrument_nonnull_arg): Adjust infer_nonnull_range_by_attribute caller, handle 3 argument nonnull_if_nonzero attribute. * gimple-range-infer.cc (gimple_infer_range::gimple_infer_range): Handle 3 argument nonnull_if_nonzero attribute. * doc/extend.texi (nonnull_if_nonzero): Document 3 argument version of the attribute. gcc/c-family/ * c-attribs.cc (c_common_gnu_attributes): Allow 2 or 3 arguments for nonnull_if_nonzero attribute instead of only 2. (handle_nonnull_if_nonzero_attribute): Handle 3 argument nonnull_if_nonzero. * c-common.cc (struct nonnull_arg_ctx): Rename other member to other1, add other2 member. (check_function_nonnull): Clear a if nonnull attribute has an argument. Adjust for nonnull_arg_ctx changes. Handle 3 argument nonnull_if_nonzero attribute. (check_nonnull_arg): Adjust for nonnull_arg_ctx changes, emit different diagnostics for 3 argument nonnull_if_nonzero attributes. (check_function_arguments): Adjust ctx var initialization. gcc/analyzer/ * sm-malloc.cc (malloc_state_machine::on_stmt): Handle 3 argument nonnull_if_nonzero attribute. gcc/testsuite/ * gcc.dg/nonnull-9.c: Tweak for 3 argument nonnull_if_nonzero attribute support, add further tests. * gcc.dg/nonnull-12.c: New test. * gcc.dg/nonnull-13.c: New test. * gcc.dg/nonnull-14.c: New test. * c-c++-common/ubsan/nonnull-8.c: New test. * c-c++-common/ubsan/nonnull-9.c: New test.
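A declaration using the three-argument form might look like the sketch below. It assumes the extended attribute is only accepted by GCC 16 and later, so it sits behind a hypothetical NONNULL_IF_BOTH_NONZERO macro that expands to nothing elsewhere; copy_items is a made-up fwrite-like helper, not a GCC or libc function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: the 3-argument nonnull_if_nonzero needs GCC 16 or
   later, so compile it out for every other compiler.  */
#if defined (__GNUC__) && !defined (__clang__) && __GNUC__ >= 16
# define NONNULL_IF_BOTH_NONZERO(ptr, a, b) \
  __attribute__ ((nonnull_if_nonzero (ptr, a, b)))
#else
# define NONNULL_IF_BOTH_NONZERO(ptr, a, b)
#endif

/* PTR (argument 1) may be null only if SIZE (argument 2) or NMEMB
   (argument 3) is zero.  */
static size_t copy_items (const void *ptr, size_t size, size_t nmemb,
                          unsigned char *out, size_t out_len)
  NONNULL_IF_BOTH_NONZERO (1, 2, 3);

/* Copy NMEMB items of SIZE bytes from PTR to OUT and return the
   number of items copied, or 0 if nothing was copied.  */
static size_t
copy_items (const void *ptr, size_t size, size_t nmemb,
            unsigned char *out, size_t out_len)
{
  /* Like the fread/fwrite wording quoted above: a zero size or count
     means PTR is never touched.  */
  if (size == 0 || nmemb == 0)
    return 0;
  /* Reject requests that overflow or do not fit in OUT.  */
  if (nmemb > out_len / size)
    return 0;
  memcpy (out, ptr, size * nmemb);
  return nmemb;
}
```

A call such as copy_items (NULL, 0, 5, buf, sizeof buf) is valid under this contract, while passing a null pointer with both integers nonzero is the case the attribute describes as invalid.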
2025-06-28  Fix handling of dwarf name and duplicated names  Jan Hubicka  2  -2/+65
I have tested Kugan's patch on exchange2 and noticed multiple problems:
1) With LTO the translation from dwarf names to symbol names is disabled since we free lang data sooner. I moved the offline pass upstream, which however also may make us miss clones introduced between free lang data and annotation. This is not very important right now and may be further fixed by splitting off auto-profile-read and offline passes.
2) I noticed that we miss a lot of AFDO inlines because some code compares name indexes for equality in the belief that it compares symbol names. This is not true if we drop prefixes. For this reason I integrated get_original_name into the renaming machinery, which actually updates indexes so the string table continues to work as a symbol table. This lets me drop the
   afdo_string_table->get_index (afdo_string_table->get_name (other->name ()))
hops that were introduced at some places. Now after renaming all afdo instances should go by DECL_ASSEMBLER_NAME names.
3) Detection of realized offline instances had an ordering issue where we omitted marking of those that were offlined later. Since we can now look up assembler names, I simplified the logic into a single pass.
autoprofiledbootstrapped/regtested x86_64-linux, committed.
gcc/ChangeLog:
  * auto-profile.cc (get_original_name): Only strip suffixes introduced after auto-fdo annotation.
  (string_table::get_index_by_decl): Simplify.
  (string_table::add_name): New member function.
  (string_table::read): Micro-optimize allocation.
  (function_instance::get_function_instance_by_decl): Dump reasons for failure; try to compensate lost discriminators.
  (function_instance::merge): Simplify sanity check; do not check for realized flag; fix merging of targets.
  (function_instance::offline_if_in_set): Simplify.
  (function_instance::dump): Sanity check that names are consistent.
  (autofdo_source_profile::offline_external_functions): Also handle stripping suffixes.
  (walk_block): Move up in source.
  (autofdo_source_profile::offline_unrealized_inlines): Also compute realized functions.
  (autofdo_source_profile::get_function_instance_by_name_index): Simplify.
  (autofdo_source_profile::add_function_instance): Simplify.
  (autofdo_source_profile::read): Do not strip suffixes; error on duplicates.
  (mark_realized_functions): Remove.
  (auto_profile): Do not call mark_realized_functions.
  * passes.def: Move auto_profile_offline before free_lang_data.
gcc/testsuite/ChangeLog:
  * gcc.dg/tree-prof/clone-test.c: New test.
  * gcc.dg/tree-prof/clone-merge-1.c: Update template.
Co-authored-by: Kugan Vivekanandarajah <kvivekananda@nvidia.com>
2025-06-27  Fix afdo profiles for functions that were not early-inlined  Jan Hubicka  1  -1/+1
This patch should finish the offlining infrastructure by offlining (prior to AFDO annotation) all inline function instances that were not early inlined. This is mostly the case of recursive inlining or when -fno-auto-profile-inlining is used, which should now produce comparable code.
I also cleaned up offlining of self-recursive functions, which now happens through the worklist and reduces problems with recursive invocation of the function merging modifying data structures at unexpected places.
gcc/ChangeLog:
  * auto-profile.cc (function_instance::set_name, function_instance::set_realized, function_instance::realized_p, function_instance::set_in_worklist, function_instance::clear_in_worklist, function_instance::in_worklist_p): New member functions.
  (function_instance::in_worklist, function_instance::realized_): New.
  (get_relative_location_for_locus): Break out from ....
  (get_relative_location_for_stmt): ... here.
  (function_instance::~function_instance): Sanity check that removed function is not in worklist.
  (function_instance::merge): Do not offline realized instances.
  (function_instance::offline): Make private; add duplicate functions to worklist rather than merging immediately.
  (function_instance::offline_if_in_set): Cleanup.
  (function_instance::remove_external_functions): Likewise.
  (function_instance::offline_if_not_realized): New member function.
  (autofdo_source_profile::offline_external_functions): Handle delayed functions.
  (autofdo_source_profile::offline_unrealized_inlines): New member function.
  (walk_block): New function.
  (mark_realized_functions): New function.
  (afdo_annotate_cfg): Fix dump.
  (auto_profile): Mark realized functions and offline rest; do not compute fn summary.
gcc/testsuite/ChangeLog:
  * gcc.dg/tree-prof/afdo-crossmodule-1.c: Update template.
2025-06-27  tree-optimization/120808 - SLP patterns with FMA/FMS  Richard Biener  1  -2/+2
The following amends the SLP addsub pattern to also match blends of .FMA/.FMS and form .FMADDSUB even when -ffp-contract=off. PR tree-optimization/120808 * tree-vect-slp-patterns.cc (vect_match_expression_p): Take a code_helper and also match calls. (addsub_pattern::recognize): Handle .FMA/.FMS pairs in addition to PLUS/MINUS. (addsub_pattern::build): Adjust. * gcc.dg/vect/bb-slp-pr120808.c: Now also expect FMADDSUB patterns to be matched.
2025-06-26  diagnostics, testsuite: don't assume host has "dot" [PR120809]  David Malcolm  2  -9/+37
gcc/ChangeLog: PR analyzer/120809 * diagnostic-format-html.cc (html_builder::maybe_make_state_diagram): Bulletproof against the SVG generation failing. * xml.cc (xml::printer::push_element): Assert that the ptr is nonnull. (xml::printer::append): Likewise. gcc/testsuite/ChangeLog: PR analyzer/120809 * gcc.dg/analyzer/state-diagram-5.c: Split out into... * gcc.dg/analyzer/state-diagram-5-html.c: ...this, adding dg-require-dot... * gcc.dg/analyzer/state-diagram-5-sarif.c: ...and this. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2025-06-26  Add testcase for afdo offlining and fix two bugs  Jan Hubicka  2  -0/+39
This patch adds a testcase that offlining works and profile info is not lost. While doing it I noticed a pasto that made the dump to be "afdo" and not "afdo_offline" and also that not all functions are processed as the range for does not expect new values to be put to the vector. Fixed thus. gcc/ChangeLog: * auto-profile.cc (function_instance::merge): Add TODO. (autofdo_source_profile::offline_external_functions): Do not use range for on the worklist. * timevar.def (TV_IPA_AUTOFDO_OFFLINE): New timevar. gcc/testsuite/ChangeLog: * gcc.dg/tree-prof/afdo-crossmodule-1.c: New test. * gcc.dg/tree-prof/afdo-crossmodule-1b.c: New test.
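The worklist problem mentioned above (entries appended while the list is being walked are skipped when the loop bound is fixed up front) is easy to show in miniature. The sketch below is plain C with hypothetical names (struct worklist, push, drain) and a made-up processing rule; it is not the auto-profile.cc code, only an illustration of re-reading the length on every iteration.

```c
#include <assert.h>
#include <stddef.h>

enum { CAP = 64 };

struct worklist {
  int item[CAP];
  size_t n;
};

/* Append V to the end of W.  */
static void
push (struct worklist *w, int v)
{
  assert (w->n < CAP);
  w->item[w->n++] = v;
}

/* Process every item of W, including items pushed while processing.
   The made-up rule: processing V pushes V / 2 as long as V > 1.
   Return the number of items processed.  */
static size_t
drain (struct worklist *w)
{
  size_t processed = 0;
  /* w->n is re-read on every iteration; a bound captured before the
     loop would miss the items appended by push below.  */
  for (size_t i = 0; i < w->n; i++)
    {
      int v = w->item[i];
      if (v > 1)
        push (w, v / 2);
      processed++;
    }
  return processed;
}
```

Seeding the list with 8 processes 8, 4, 2 and 1, four items in total, whereas a loop bounded by the initial length would have stopped after the first.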
2025-06-26  Avoid some lost AFDO profiles with LTO  Jan Hubicka  1  -3/+3
This patch fixes some of the cases where we lose profile info because we do not perform inlining that happened at train run before AFDO annotation is done. This is a common problem with LTO in the case cross-module inlining happened.
I added an afdo_offline pass that does two things:
1) collect the set of all functions defined in the current unit
2) walk all toplevel function instances.
   If a function instance corresponds to a defined symbol, walk everything inlined to it. If crossmodule inlining is seen, remove the inline instances and recursively look into inline instances that go back to the current unit and turn them to offline ones.
   If a function instance corresponds to an external symbol, remove it but also look for functions inlined to it that belong to the current module.
When merging profiles we also need to recursively merge profiles of inlined functions and, if the inlining decisions do not match, offline the bodies. This is somewhat fragile since recursive calls may trigger modifications of functions currently being merged, but I hope I chased away problems with that - will give it a second thought to see if this can be reorganized into a worklist fashion that is more safe.
I noticed that functions may appear in the afdo data either as their symbol name or dwarf name (since inline functions may not have a known symbol name). There is already some logic to handle that but it is broken in the case both names are used. To mitigate the problem I also added logic to translate dwarf names to symbol names in case both are used. This prevents profile loss e.g. in exchange2. Here the digits_2 function appears by its dwarf name (digits_2) but is also cloned, which makes it appear by its symbol name (__*digits_2).
All profile massaging is done before early optimization so the VPT targets of offline bodies are correct.
We still will lose profile if early inlining fails. I will add a second pass to afdo to offline these. The last problem is that in case we early inlined more than expected (which now happens more often due to offlining) the profile will be lost and filled by static profile. The problem here is that we need to somehow scale the profile of the inline instance but I do not see how to determine invocation counts. Will try to look into that incrementally - perhaps we can keep some info from offlining.
There is also now a dump infrastructure that prints the profile in the same format as the dump_gcov tool.
autoprofiledbootstrapped, regtested x86_64-linux, will commit it shortly.
Honza
gcc/ChangeLog:
  * auto-profile.cc (name_index_set, name_index_map): New types.
  (dump_afdo_loc): New function.
  (dump_inline_stack): Simplify.
  (function_instance::merge): Merge recursively inlined functions; offline if necessary; collect new functions.
  (function_instance::offline): New member function.
  (function_instance::offline_if_in_set): New member function.
  (function_instance::remove_external_functions): New member function.
  (function_instance::dump): New member function.
  (function_instance::debug): New member function.
  (function_instance::dump_inline_stack): New member function.
  (function_instance::find_icall_target_map): Use removed_icall_target.
  (function_instance::remove_icall_target): Only mark icall target removed.
  (autofdo_source_profile::offline_external_functions): New function.
  (function_instance::read_function_instance): Record inlined_to pointers; use -1 for unknown head counts.
  (autofdo_source_profile::get_function_instance_by_name_index): New function.
  (autofdo_source_profile::add_function_instance): New member function.
  (autofdo_source_profile::read): Do not leak memory; fix formatting.
  (read_profile): Fix formatting.
  (afdo_annotate_cfg): Likewise.
  (class pass_ipa_auto_profile_offline): New pass.
  (make_pass_ipa_auto_profile_offline): New function.
  * passes.def (pass_ipa_auto_profile_offline): Add.
  * tree-pass.h (make_pass_ipa_auto_profile): Declare.
gcc/testsuite/ChangeLog:
  * gcc.dg/tree-prof/indir-call-prof-2.c: Update template.
2025-06-25  tree-optimization/109892 - SLP reduction of fma  Richard Biener  3  -0/+51
The following adds the ability to vectorize a fma reduction pair as SLP reduction (we cannot yet handle ternary association in reduction vectorization).
  PR tree-optimization/109892
  * tree-vect-loop.cc (check_reduction_path): Handle fma.
  (vectorizable_reduction): Apply FOLD_LEFT_REDUCTION code generation constraints.
  * gcc.dg/vect/vect-reduc-fma-1.c: New testcase.
  * gcc.dg/vect/vect-reduc-fma-2.c: Likewise.
  * gcc.dg/vect/vect-reduc-fma-3.c: Likewise.
2025-06-25  tree-optimization/120808 - SLP build with mixed .FMA/.FMS  Richard Biener  1  -0/+12
The following allows SLP build to succeed when mixing .FMA/.FMS in different lanes like we handle mixed plus/minus. This does not yet address SLP pattern matching not being able to form a FMADDSUB from this.
  PR tree-optimization/120808
  * tree-vectorizer.h (compatible_calls_p): Add flag to indicate a FMA/FMS pair is allowed.
  * tree-vect-slp.cc (compatible_calls_p): Likewise.
  (vect_build_slp_tree_1): Allow mixed .FMA/.FMS as two-operator.
  (vect_build_slp_tree_2): Handle calls in two-operator SLP build.
  * tree-vect-slp-patterns.cc (compatible_complex_nodes_p): Adjust.
  * gcc.dg/vect/bb-slp-pr120808.c: New testcase.
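A source shape that leads to mixed multiply-subtract and multiply-add lanes is sketched below. This is an illustrative kernel, not the contents of bb-slp-pr120808.c; whether a given compiler contracts the expressions into .FMA/.FMS and then vectorizes them depends on target and options, which this sketch does not try to pin down. The name addsub_fma is hypothetical.

```c
#include <assert.h>

/* Even lanes compute a * b - c and odd lanes compute a * b + c.
   When the products are contracted, even lanes become multiply-subtract
   and odd lanes multiply-add, i.e. different operations in adjacent
   lanes of one store group.  */
static void
addsub_fma (double *restrict x, const double *a, const double *b,
            const double *c)
{
  assert (x && a && b && c);
  x[0] = a[0] * b[0] - c[0];
  x[1] = a[1] * b[1] + c[1];
  x[2] = a[2] * b[2] - c[2];
  x[3] = a[3] * b[3] + c[3];
}
```

The inputs used to exercise it are small integers, so the results are exact whether or not the multiplications and additions are fused.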
2025-06-23diagnostics: add state diagrams to analyzer experimental-html output [PR116792]David Malcolm13-0/+286
This patch adds various support for debugging diagnostic paths and events, intended initially for myself to help with debugging -fanalyzer. It adds the optional ability for a diagnostic_event to supply a description of the predicted state of the program at that point along the diagnostic_path. To isolate the diagnostic subsystem from the analyzer, this representation is currently an xml::document with custom elements. The XML representation is similar to the analyzer's internal state but can be easier to read - for example, rather than storing the contents of memory via byte offsets, it uses fields for structs and element indexes for arrays, recursively. These states are handled by the HTML and SARIF diagnostic sinks. The SARIF sink simply embeds the XML as a string in a property bag of the threadFlowLocation object (SARIF v2.1.0 section 3.38). For HTML output, the "experimental-html" sink gains a new "show-state-diagrams=yes" option i.e.: -fdiagnostics-add-output=experimental-html:show-state-diagrams=yes which converts the state XML into SVG diagrams visualizing the state of memory at each event, inspired by the "ddd" debugger. These can be seen by pressing 'j' and 'k' to single-step forward and backward through events, making it *much* easier to debug -fanalyzer. An example of output can be seen here: https://dmalcolm.fedorapeople.org/gcc/2025-06-23/state-diagram-1.c.html showing an issue in a singly-linked list; there are various other examples in the parent directory. Generating the SVG diagrams requires an invocation of "dot" per event, so it noticeably slows down diagnostic emission, hence the opt-in command-line flag. However, I'm already finding bugs in -fanalyzer with this that I hadn't seen before. Given that the UI is rather clunky and there is lots of room for improvement to the visualizations, for now this feature is marked as being for GCC developers, not end-users.
The patch also adds a dot::ast_node class hierarchy to make it easy to create GraphViz dot files with the correct escaping, and adds a C++ wrapper around pex adding some syntactic sugar for invoking subprocesses. gcc/ChangeLog: PR other/116792 * Makefile.in (ANALYZER_OBJS): Add analyzer/ana-state-to-diagnostic-state.o. (OBJS): Move graphviz.o to... (OBJS-libcommon): ...here. Add diagnostic-state-to-dot.o and pex.o. * diagnostic-format-html.cc: Include "diagnostic-state.h" and "graphviz.h". (html_generation_options::html_generation_options): Initialize the new flags. (HTML_SCRIPT): Add function "get_any_state_diagram". Use it when changing current focus id to update the visibility of the pertinent diagram, if any. (print_pre_source): New. (html_builder::maybe_make_state_diagram): New. (html_path_label_writer::html_path_label_writer): Add "path" param. Initialize m_path and m_curr_event_id. (html_path_label_writer::begin_label): Store current event id. (html_path_label_writer::end_label): Attempt to make a state diagram and add it if successful. (html_path_label_writer::get_element_id): New. (html_path_label_writer::m_path): New field. (html_path_label_writer::m_curr_event_id): New field. (html_builder::make_element_for_diagnostic): Pass path to label writer. * diagnostic-format-html.h (html_generation_options::m_show_state_diagrams): New field. (html_generation_options::m_show_state_diagram_xml): New field. (html_generation_options::m_show_state_diagram_dot_src): New field. * diagnostic-format-sarif.cc: Include "xml.h". (populate_thread_flow_location_object): If requested, attempt to generate xml state and add it to the property bag as "gcc/diagnostic_event/xml_state" in xml source form. (sarif_generation_options::sarif_generation_options): Initialize m_xml_state. * diagnostic-format-sarif.h (sarif_generation_options::m_xml_state): New field. * diagnostic-path.cc: Define INCLUDE_MAP. Include "xml.h". (diagnostic_event::maybe_make_xml_state): New.
* diagnostic-path.h (class xml::document): New forward decl. (diagnostic_event::maybe_make_xml_state): New vfunc decl. * diagnostic-state-to-dot.cc: New file. * diagnostic-state.h: New file. * digraph.cc: Define INCLUDE_STRING and INCLUDE_VECTOR. * doc/analyzer.texi: Document state diagrams in html output. (__analyzer_dump_dot): New. (__analyzer_dump_xml): New. * doc/invoke.texi (sarif): Add "xml-state" key. (experimental-html): Add keys "show-state-diagrams", "show-state-diagrams-dot-src" and "show-state-diagrams-xml". * graphviz.cc: Define INCLUDE_MAP, INCLUDE_STRING, and INCLUDE_VECTOR. Include "xml.h", "xml-printer.h", "pex.h" and "selftest.h". (graphviz_out::graphviz_out): Extract... (dot::writer::writer): ...this. (graphviz_out::write_indent): Convert to... (dot::writer::write_indent): ...this. (graphviz_out::print): Use get_pp. (graphviz_out::println): Likewise. (graphviz_out::begin_tr): Likewise. (graphviz_out::end_tr): Likewise. (graphviz_out::begin_td): Likewise. (graphviz_out::end_td): Likewise. (graphviz_out::begin_trtd): Likewise. (graphviz_out::end_tdtr): Likewise. (dot::ast_node::dump): New. (dot::id::id): New. (dot::id::print): New. (dot::id::is_identifier_p): New. (dot::kv_pair::print): New. (dot::attr_list::print): New. (dot::stmt_list::print): New. (dot::stmt_list::add_edge): New. (dot::stmt_list::add_attr): New. (dot::graph::print): New. (dot::stmt_with_attr_list::set_label): New. (dot::node_stmt::print): New. (dot::attr_stmt::print): New. (dot::kv_stmt::print): New. (dot::node_id::print): New. (dot::port::print): New. (dot::edge_stmt::print): New. (dot::subgraph::print): New. (dot::make_svg_document_buffer_from_graph): New. (dot::make_svg_from_graph): New. (selftest:test_ids): New. (selftest:test_trivial_graph): New. (selftest:test_layout_example): New. (selftest:graphviz_cc_tests): New. * graphviz.h (xml::node): New forward decl. (class graphviz_out): Split out into... (class dot::writer): ...this new class (struct dot::ast_node): New. 
(struct dot::id): New. (struct dot::kv_pair): New. (struct dot::attr_list): New. (struct dot::stmt_list): New. (struct dot::graph): New. (struct dot::stmt): New. (struct dot::stmt_with_attr_list): New. (struct dot::node_stmt): New. (struct dot::attr_stmt): New. (struct dot::kv_stmt): New. (enum class dot::compass_pt): New. (struct dot::port): New. (struct dot::node_id): New. (struct dot::edge_stmt): New. (struct dot::subgraph): New. (dot::make_svg_from_graph): New. * opts-diagnostic.cc (sarif_scheme_handler::make_sink): Add "xml-state" flag. (html_scheme_handler::make_sink): Add flags "show-state-diagrams", "show-state-diagram-dot-src", and "show-state-diagram-xml". * pex.cc: New file. * pex.h: New file. * selftest-run-tests.cc (selftest::run_tests): Call graphviz_cc_tests. * selftest.h (selftest::graphviz_cc_tests): New decl. * xml.cc (xml::node_with_children::add_comment): New. (xml::node_with_children::find_child_element): New. (xml::element::get_attr): New. (xml::comment::write_as_xml): New. (selftest::test_printer): Add coverage of find_child_element and get_attr. (selftest::test_comment): New. (selftest::xml_cc_tests): Call test_comment. * xml.h: New forward decls. (xml::node::dyn_cast_text): Use nullptr. (xml::node::dyn_cast_element): New vfunc. (xml::node_with_children::add_comment): New decl. (xml::node_with_children::find_child_element): New decl. (xml::element::dyn_cast_element): New vfunc impl. (xml::element::get_attr): New decl. (struct xml::comment): New xml::node subclass. gcc/analyzer/ChangeLog: PR other/116792 * ana-state-to-diagnostic-state.cc: New file. * ana-state-to-diagnostic-state.h: New file. * checker-event.cc: Include "xml.h". (checker_event::checker_event): Initialize m_path. (checker_event::prepare_for_emission): Store the path pointer into m_path. (checker_event::maybe_make_xml_state): New. (function_entry_event::function_entry_event): Add "state" param and use it to initialize m_state. (superedge_event::get_program_state): New. 
(call_event::get_program_state): New. (warning_event::get_program_state): New. * checker-event.h (checker_event::get_program_state): New vfunc. (checker_event::maybe_make_xml_state): New decl. (checker_event::m_path): New field. (statement_event::get_program_state): New vfunc impl. (function_entry_event::function_entry_event): Add "state" param. (function_entry_event::get_program_state): New vfunc impl. (function_entry_event::m_state): New field. (state_change_event::get_program_state): New vfunc impl. (superedge_event::get_program_state): New vfunc decl. (warning_event::warning_event): Add "program_state_" param and copy it. (warning_event::get_program_state): New vfunc decl. (warning_event::m_program_state): New field. * checker-path.h (checker_path::checker_path): Add ext_state param. (checker_path::get_ext_state): New accessor. (checker_path::m_ext_state): New field. * common.h: Define INCLUDE_MAP and INCLUDE_STRING. * diagnostic-manager.cc (saved_diagnostic::operator==): Don't deduplicate dump_path_diagnostic instances. (diagnostic_manager::emit_saved_diagnostic): Pass ext_state to checker_path ctor. * engine.cc: (impl_region_model_context::on_state_leak): Pass old and new state to state_machine::on_leak. (exploded_node::on_stmt_pre): Implement __analyzer_dump_xml and __analyzer_dump_dot. * exploded-graph.h (impl_region_model_context::get_state): New. * infinite-recursion.cc (recursive_function_entry_event::recursive_function_entry_event): Add "dst_state" param and pass to function_entry_event ctor. (infinite_recursion_diagnostic::add_function_entry_event): Pass state to event ctor. * kf-analyzer.cc: Include "analyzer/program-state.h" (dump_path_diagnostic::dump_path_diagnostic): Add "state" param. (dump_path_diagnostic::get_final_state): New. (dump_path_diagnostic::m_state): New field. (kf_analyzer_dump_path::impl_call_pre): Pass state to warning. * pending-diagnostic.cc (pending_diagnostic::add_function_entry_event): Pass state to function_entry_event. 
(pending_diagnostic::add_final_event): Likewise to warning_event. * pending-diagnostic.h (pending_diagnostic::get_final_state): New vfunc decl. * program-state.cc: Include "diagnostic-state.h", "graphviz.h" and "analyzer/ana-state-to-diagnostic-state.h". (program_state::dump_dot): New. * program-state.h: Include "text-art/tree-widget.h" and "analyzer/store.h". (class xml::document): New forward decl. (make_xml): New. (dump_xml_to_pp): New. (dump_xml_to_file): New. (dump_xml): New. (dump_dot): New. * record-layout.cc (record_layout::record_layout): Make param const_tree. * record-layout.h (item::item): Likewise. (item::m_field): Likewise. (record_layout::record_layout): Likewise. (record_layout::begin): New. (record_layout::end): New. * region-model.cc (exposure_through_uninit_copy::complain_about_fully_uninit_item): Use const_tree. (exposure_through_uninit_copy::complain_about_partially_uninit_item): Likewise. * region-model.h (region_model_context::get_state): New vfunc. (noop_region_model_context::get_state): New. (region_model_context_decorator::get_state): New. * sm-fd.cc (fd_leak::fd_leak): Add "final_state" param and capture it if present. (fd_leak::get_final_state): New. (fd_leak::m_final_state): New. (fd_state_machine::on_open): Pass nullptr for new "final_state" param. (fd_state_machine::on_creat): Likewise. (fd_state_machine::on_socket): Likewise. (fd_state_machine::on_accept): Likewise. (fd_state_machine::on_leak): Add state params and pass new state as final state to fd_leak ctor. * sm-file.cc: Include "analyzer/program-state.h". (file_leak::file_leak): Add "final_state" param and capture it if present. (file_leak::get_final_state): New. (file_leak::m_final_state): New. (fileptr_state_machine::on_leak): Add state params and pass new state as final state to fd_leak ctor. * sm-malloc.cc: Include "analyzer/ana-state-to-diagnostic-state.h". (malloc_leak::malloc_leak): Add "final_state" param and use it. (malloc_leak::get_final_state): New vfunc impl. 
(malloc_leak::m_final_state): New field. (malloc_state_machine::on_leak): Add state params; capture final state. (malloc_state_machine::add_state_to_xml): New. * sm.cc (state_machine::on_leak): Add "old_state" and "new_state" params. Use nullptr. (state_machine::add_state_to_xml): New. (state_machine::add_global_state_to_xml): New. * sm.h (class xml_state): New forward decl. (state_machine::on_leak): Add state params. (state_machine::add_state_to_xml): New vfunc decl. (state_machine::add_global_state_to_xml): New vfunc decl. * store.h (bit_range::operator<): New. * varargs.cc (va_list_leak::va_list_leak): Add final_state param and capture it if non-null. (va_list_leak::get_final_state): New. (va_list_leak::m_final_state): New. (va_list_state_machine::on_leak): Add state params and pass final state to va_list_leak ctor. gcc/testsuite/ChangeLog: PR other/116792 * g++.dg/analyzer/state-diagram.C: New test. * gcc.dg/analyzer/analyzer-decls.h (__analyzer_dump_dot): New decl. (__analyzer_dump_xml): New decl. * gcc.dg/analyzer/state-diagram-1-sarif.py: New test script. * gcc.dg/analyzer/state-diagram-1.c: New test. * gcc.dg/analyzer/state-diagram-2.c: New test. * gcc.dg/analyzer/state-diagram-3.c: New test. * gcc.dg/analyzer/state-diagram-4.c: New test. * gcc.dg/analyzer/state-diagram-5-html.py: New test script. * gcc.dg/analyzer/state-diagram-5-sarif.py: New test script. * gcc.dg/analyzer/state-diagram-5.c: New test. * gcc.dg/plugin/analyzer_cpython_plugin.cc: Define INCLUDE_STRING. * gcc.dg/plugin/analyzer_gil_plugin.cc: Likewise. * gcc.dg/plugin/analyzer_kernel_plugin.cc: Likewise. * gcc.dg/plugin/analyzer_known_fns_plugin.cc: Likewise. * lib/htmltest.py (ns): Add SVG namespace. * lib/sarif.py (get_result_by_index): New. (get_xml_state): New. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2025-06-23vect: Use combined peeling and versioning for mutually aligned DRsPengfei Li1-1/+1
Current GCC uses either peeling or versioning, but not in combination, to handle unaligned data references (DRs) during vectorization. This limitation causes some loops with early break to fall back to scalar code at runtime. Consider the following loop with DRs in its early break condition: for (int i = start; i < end; i++) { if (a[i] == b[i]) break; count++; } In the loop, references to a[] and b[] need to be strictly aligned for vectorization because speculative reads that may cross page boundaries are not allowed. Current GCC does versioning for this loop by creating a runtime check like: ((&a[start] | &b[start]) & mask) == 0 to see if two initial addresses both have lower bits zeros. If above runtime check fails, the loop will fall back to scalar code. However, it's often possible that DRs are all unaligned at the beginning but they become all aligned after a few loop iterations. We call this situation DRs being "mutually aligned". This patch enables combined peeling and versioning to avoid loops with mutually aligned DRs falling back to scalar code. Specifically, the function vect_peeling_supportable is updated in this patch to return a three-state enum indicating how peeling can make all unsupportable DRs aligned. In addition to previous true/false return values, a new state peeling_maybe_supported is used to indicate that peeling may be able to make these DRs aligned but we are not sure about it at compile time. In this case, peeling should be combined with versioning so that a runtime check will be generated to guard the peeled vectorized loop. A new type of runtime check is also introduced for combined peeling and versioning. It's enabled when LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT is true. The new check tests if all DRs recorded in LOOP_VINFO_MAY_MISALIGN_STMTS have the same lower address bits. 
For above loop case, the new test will generate an XOR between two addresses, like: ((&a[start] ^ &b[start]) & mask) == 0 Therefore, if a and b have the same alignment step (element size) and the same offset from an alignment boundary, a peeled vectorized loop will run. This new runtime check also works for >2 DRs, with the LHS expression being: ((a1 ^ a2) | (a2 ^ a3) | (a3 ^ a4) | ... | (an-1 ^ an)) & mask where ai is the address of i'th DR. This patch is bootstrapped and regression tested on x86_64-linux-gnu, arm-linux-gnueabihf and aarch64-linux-gnu. gcc/ChangeLog: * tree-vect-data-refs.cc (vect_peeling_supportable): Return new enum values to indicate if combined peeling and versioning can potentially support vectorization. (vect_enhance_data_refs_alignment): Support combined peeling and versioning in vectorization analysis. * tree-vect-loop-manip.cc (vect_create_cond_for_align_checks): Add a new type of runtime check for mutually aligned DRs. * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Set default value of allow_mutual_alignment in the initializer list. * tree-vectorizer.h (enum peeling_support): Define type of peeling support for function vect_peeling_supportable. (LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT): New access macro. gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-early-break_133_pfa6.c: Adjust test.
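The difference between the two runtime checks described above can be sketched in plain C. This is only an illustration of the arithmetic, assuming addresses are passed as `uintptr_t` values and `mask` is the required alignment in bytes minus one; the helper names `all_aligned` and `mutually_aligned` are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Versioning-only check: OR all addresses together, so the test passes
   only if every address already has its low bits clear.  */
static bool
all_aligned (const uintptr_t *addrs, int n, uintptr_t mask)
{
  assert (n >= 1);
  uintptr_t acc = 0;
  for (int i = 0; i < n; i++)
    acc |= addrs[i];
  return (acc & mask) == 0;
}

/* Mutual-alignment check: OR together the XOR of adjacent addresses,
   so the test passes if all addresses share the same low bits,
   whatever those bits are.  */
static bool
mutually_aligned (const uintptr_t *addrs, int n, uintptr_t mask)
{
  assert (n >= 1);
  uintptr_t acc = 0;
  for (int i = 0; i + 1 < n; i++)
    acc |= addrs[i] ^ addrs[i + 1];
  return (acc & mask) == 0;
}
```

For example, addresses 0x1008 and 0x2008 with a 16-byte requirement (mask 15) fail the first check but pass the second: both are 8 bytes past a boundary, so peeling the same number of iterations can align both.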
2025-06-21Extend afdo inliner to introduce speculative callsJan Hubicka2-7/+13
This patch makes AFDO's VPT happen during early inlining. This should make the einline pass inside afdo pass unnecessary, but some inlining still happens there - I will need to debug why that happens and will try to drop the afdo's inliner incrementally. get_inline_stack_in_node can now be used to produce inline stack out of callgraph nodes which are marked as inline clones, so we do not need to iterate tree-inline and IPA decisions phases like old code did. I also added some debug facilities - dumping of decisions and inline stacks, so one can match them with data in gcov profile. Former VPT pass identified all cases where in train run indirect call was inlined and the inlined callee collected some samples. In this case it forced inline without doing any checks, such as whether inlining is possible. New code simply introduces speculative edges into callgraph and lets afdo inlining decide. Old code also marked statements that were introduced during promotion to prevent doing double speculation i.e. if (ptr == foo) .. inlined foo ... else ptr (); to if (ptr == foo) .. inlined foo ... else if (ptr == foo) foo (); // for IPA inlining else ptr (); Since inlining now happens much earlier, tracking the statements would be quite hard. Instead I simply remove the targets from profile data which should have same effect. I also noticed that there is nothing setting max_count so all non-0 profile is considered hot which I fixed too. Training with ref run I now get 500.perlbench_r 1 160 9.93 * 1 162 9.84 * 502.gcc_r NR NR 505.mcf_r 1 186 8.68 * 1 194 8.34 * 520.omnetpp_r 1 183 7.15 * 1 208 6.32 * 523.xalancbmk_r NR NR 525.x264_r 1 85.2 20.5 * 1 85.8 20.4 * 531.deepsjeng_r 1 165 6.93 * 1 176 6.51 * 541.leela_r 1 268 6.18 * 1 282 5.87 * 548.exchange2_r 1 86.3 30.4 * 1 88.9 29.5 * 557.xz_r 1 224 4.81 * 1 224 4.82 * Est. SPECrate2017_int_base 9.72 Est.
SPECrate2017_int_peak 9.33 503.bwaves_r NR NR 507.cactuBSSN_r 1 107 11.9 * 1 105 12.0 * 508.namd_r 1 108 8.79 * 1 116 8.18 * 510.parest_r 1 143 18.3 * 1 156 16.8 * 511.povray_r 1 188 12.4 * 1 163 14.4 * 519.lbm_r 1 72.0 14.6 * 1 75.0 14.1 * 521.wrf_r 1 106 21.1 * 1 106 21.1 * 526.blender_r 1 147 10.3 * 1 147 10.4 * 527.cam4_r 1 110 15.9 * 1 118 14.8 * 538.imagick_r 1 104 23.8 * 1 105 23.7 * 544.nab_r 1 146 11.6 * 1 143 11.8 * 549.fotonik3d_r 1 134 29.0 * 1 169 23.1 * 554.roms_r 1 86.6 18.4 * 1 89.3 17.8 * Est. SPECrate2017_fp_base 15.4 Est. SPECrate2017_fp_peak 14.9 Base is without profile feedback and peak is AFDO. gcc/ChangeLog: * auto-profile.cc (dump_inline_stack): New function. (get_inline_stack_in_node): New function. (get_relative_location_for_stmt): Add FN parameter. (has_indirect_call): Remove. (function_instance::find_icall_target_map): Add FN parameter. (function_instance::remove_icall_target): New function. (function_instance::read_function_instance): Set sum_max. (autofdo_source_profile::get_count_info): Add NODE parameter. (autofdo_source_profile::update_inlined_ind_target): Add NODE parameter. (autofdo_source_profile::remove_icall_target): New function. (afdo_indirect_call): Add INDIRECT_EDGE parameter; dump reason for failure; do not check for recursion; do not inline call. (afdo_vpt): Add INDIRECT_EDGE parameter. (afdo_set_bb_count): Do not take PROMOTED set. (afdo_vpt_for_early_inline): Remove. (afdo_annotate_cfg): Do not take PROMOTED set. (auto_profile): Do not call afdo_vpt_for_early_inline. (afdo_callsite_hot_enough_for_early_inline): Dump count. (remove_afdo_speculative_target): New function. * auto-profile.h (afdo_vpt_for_early_inline): Declare. (remove_afdo_speculative_target): Declare. * ipa-inline.cc (inline_functions_by_afdo): Do VPT. (early_inliner): Redirecct edges if inlining happened. * tree-inline.cc (expand_call_inline): Add sanity check. gcc/testsuite/ChangeLog: * gcc.dg/tree-prof/afdo-vpt-earlyinline.c: Update template. 
* gcc.dg/tree-prof/indir-call-prof-2.c: Update template.
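The speculative call transformation discussed in this commit can be pictured at source level as follows. This is a hand-written sketch, assuming a profile that says `ptr` usually points to `foo`; the function names are hypothetical and the compiler performs the equivalent rewrite on its callgraph and GIMPLE, not on source.

```c
#include <assert.h>

typedef int (*fn_t) (int);

static int foo (int x) { return x + 1; }
static int bar (int x) { return x * 2; }

/* The call as written: the target is unknown at compile time.  */
static int
call_indirect (fn_t ptr, int x)
{
  assert (ptr);
  return ptr (x);
}

/* After speculation: a guarded direct call to the likely target,
   which the inliner may then inline, with the indirect call kept
   as the fallback.  */
static int
call_speculative (fn_t ptr, int x)
{
  assert (ptr);
  if (ptr == foo)
    return foo (x);
  return ptr (x);
}
```

Both versions compute the same result for every target; only the guarded one exposes a direct call for inlining. Speculating a second time on the fallback branch would add a redundant `ptr == foo` test, which is the double speculation the message describes avoiding.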
2025-06-21Implement afdo inlinerJan Hubicka3-5/+32
This patch moves afdo inlining from early inliner into specialized one. The reason is that early inliner is by design non-recursive while afdo inliner needs to recurse. In the past google handled it by increasing early inliner iterations, but it can be done easily and cheaply without it by simply recursing into inlined functions. I will also look into moving VPT to early inliner now. Bootstrapped/regtested x86_64-linux, committed. gcc/ChangeLog: * auto-profile.cc (get_inline_stack): Add fn parameter. * ipa-inline.cc (want_early_inline_function_p): Do not care about AFDO. (inline_functions_by_afdo): New function. (early_inliner): Use it. gcc/testsuite/ChangeLog: * gcc.dg/tree-prof/afdo-vpt-earlyinline.c: Update template. * gcc.dg/tree-prof/indir-call-prof-2.c: Likewise. * gcc.dg/tree-prof/afdo-inline.c: New test.
2025-06-20Fix range wrap check and enhance verify_range.Andrew MacLeod1-0/+40
When snapping range bounds to satisfy bitmask constraints, end bound overflow and underflow checks were not working properly. Also adjust some comments, and enhance verify_range to make sure range pairs are sorted properly. PR tree-optimization/120701 gcc/ * value-range.cc (irange::verify_range): Verify range pairs are sorted properly. (irange::snap): Check for over/underflow properly. gcc/testsuite/ * gcc.dg/pr120701.c: New.
2025-06-20tree-optimization/120654 - ICE with range query from IVOPTsRichard Biener1-0/+24
The following ICEs as we hand down an UNDEFINED range to where it isn't expected. Put the guard that's there earlier. PR tree-optimization/120654 * vr-values.cc (range_fits_type_p): Check for undefined_p () before accessing type (). * gcc.dg/torture/pr120654.c: New testcase.
2025-06-19dfp: Further decimal_real_to_integer fixes [PR120631]Jakub Jelinek2-0/+74
Unfortunately, the following further testcase shows that there aren't problems only with very large precisions and large exponents, but pretty much anything larger than 64-bits. After all, before _BitInt support dfp didn't even have {,unsigned }__int128 <-> _Decimal{32,64,128,64x} support, and the testcase again shows some of the conversions yielding zeros. While the pr120631.c test worked even without the earlier patch. So, this patch assumes 64-bit precision at most is ok and for anything larger it just uses exponent 0 and multiplies afterwards. 2025-06-19 Jakub Jelinek <jakub@redhat.com> PR middle-end/120631 * dfp.cc (decimal_real_to_integer): Use result multiplication not just when precision > 128 and dn.exponent > 19, but when precision > 64 and dn.exponent > 0. * gcc.dg/dfp/bitint-10.c: New test. * gcc.dg/dfp/pr120631.c: New test.
2025-06-18Add space after foo in testcaseAndrew MacLeod1-1/+1
gcc/testsuite/ * gcc.dg/pr119039-1.c: Add space in search criteria.
2025-06-18Improve contains_p and intersect with bitmasks.Andrew MacLeod1-0/+60
Improve the way contains_p (wide_int) and intersect behave with singletons and bitmasks. Also fix a buglet in bitmask_intersect when the result is a singleton which is not in the current range. PR tree-optimization/119039 gcc/ * value-range.cc (irange::contains_p): Call wide_int version of contains_p for singleton ranges. (irange::intersect): If either range is a singleton, use contains_p. gcc/testsuite/ * gcc.dg/pr119039-2.c: New.
2025-06-18Simplify switches utilizing subranges.Andrew MacLeod2-2/+34
Adjust simplify_switch_using_ranges to use irange rather than relying on the older legacy_range mechanism. PR tree-optimization/119039 gcc/ * vr-values.cc (simplify_using_ranges::legacy_fold_cond): Remove. (simplify_using_ranges::simplify_switch_using_ranges): Adjust. gcc/testsuite/ * gcc.dg/pr119039-1.c: New. * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust thread counts.
2025-06-18dfp, real: Fix up FLOAT_EXPR/FIX_TRUNC_EXPR constant folding between dfp and ↵Jakub Jelinek1-0/+29
large _BitInt [PR120631] The following testcase shows that while at runtime we handle conversions between _Decimal{64,128} and large _BitInt correctly, at compile time we mishandle them in both directions, in one direction we end up in ICE in decimal_from_integer callee because the char buffer is too short for the needed number of decimal digits, in the conversion of dfp to large _BitInt we return 0 in the wide_int. The following patch fixes the ICE by using larger buffer (XALLOCAVEC allocated, it will be never larger than 65536 / 3 bytes) in the larger _BitInt case, and the other direction by setting exponent to exp % 19 and instead multiplying the result by needed powers of 10^19 (10^19 chosen as largest power of ten that can fit into UHWI). 2025-06-18 Jakub Jelinek <jakub@redhat.com> PR middle-end/120631 * real.cc (decimal_from_integer): Add digits argument, if larger than 256, use XALLOCAVEC allocated buffer. (real_from_integer): Pass val_in's precision divided by 3 to decimal_from_integer. * dfp.cc (decimal_real_to_integer): For precision > 128 if finite and exponent is large, decrease exponent and multiply resulting wide_int by powers of 10^19. * gcc.dg/dfp/bitint-9.c: New test.
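The exponent-splitting idea (reduce the exponent to exp % 19, then multiply by 10^19 the remaining exp / 19 times) can be sketched with ordinary integer arithmetic. This is only an illustration of the arithmetic, not the code in dfp.cc: the function name `scale_by_pow10` is hypothetical, and `unsigned __int128` is a GCC/Clang extension on 64-bit targets standing in here for a wide_int.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;

/* Compute coeff * 10^exp without ever needing a single power of ten
   larger than 10^19, the largest power of ten that fits in 64 bits.
   The caller must ensure the result fits in 128 bits.  */
static u128
scale_by_pow10 (uint64_t coeff, unsigned exp)
{
  const uint64_t pow10_19 = 10000000000000000000ULL;
  u128 result = coeff;

  /* Small part of the exponent: exp % 19 single factors of ten.  */
  for (unsigned i = 0; i < exp % 19; i++)
    result *= 10;

  /* Remaining part: exp / 19 factors of 10^19.  */
  for (unsigned i = 0; i < exp / 19; i++)
    result *= pow10_19;

  assert (coeff == 0 || result >= coeff);
  return result;
}
```

With `exp = 20`, for instance, the first loop runs once and the second once, giving coeff * 10 * 10^19.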
2025-06-17Snap subrange boundries to bitmask constraints.Andrew MacLeod2-0/+90
Ensure all subrange endpoints conform to the bitmask. PR tree-optimization/120661 gcc/ * value-range.cc (irange::snap): New. (irange::snap_subranges): New. (irange::set_range_from_bitmask): Call snap_subranges. * value-range.h (snap, snap_subranges): New prototypes. gcc/testsuite/ * gcc.dg/pr120661-1.c: New. * gcc.dg/pr120661-2.c: New.
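To make "endpoints conform to the bitmask" concrete, here is a deliberately simple sketch. It assumes a bitmask is a pair (value, mask) in which set bits of `mask` are unknown and `value` holds the known bits, so `x` conforms when `(x & ~mask) == value`. The helpers `snap_lower` and `snap_upper` are hypothetical, use a linear scan for clarity, and are not the algorithm in irange::snap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Find the smallest conforming x in [lb, ub].  Returns false if the
   subrange contains no conforming value.  The x == ub test happens
   before the increment, so x never wraps past UINT32_MAX.  */
static bool
snap_lower (uint32_t lb, uint32_t ub, uint32_t value, uint32_t mask,
            uint32_t *out)
{
  assert (lb <= ub);
  for (uint32_t x = lb;; x++)
    {
      if ((x & ~mask) == value)
        {
          *out = x;
          return true;
        }
      if (x == ub)
        return false;
    }
}

/* Find the largest conforming x in [lb, ub], scanning downward.
   The x == lb test happens before the decrement, so x never wraps
   below zero.  */
static bool
snap_upper (uint32_t lb, uint32_t ub, uint32_t value, uint32_t mask,
            uint32_t *out)
{
  assert (lb <= ub);
  for (uint32_t x = ub;; x--)
    {
      if ((x & ~mask) == value)
        {
          *out = x;
          return true;
        }
      if (x == lb)
        return false;
    }
}
```

With the low two bits known to be zero (value 0, mask 0xfffffffc), the subrange [5, 10] snaps to [8, 8], while [5, 7] contains no conforming value and would be dropped.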
2025-06-17Add testcase for AFDO early inlining and indirect call promotionJan Hubicka1-0/+32
gcc/testsuite/ChangeLog: * gcc.dg/tree-prof/afdo-vpt-earlyinline.c: New test.
2025-06-13testsuite: Fix pr119160.c for non-glibc targets [PR119862]Konstantinos Eleftheriou1-0/+13
Testcase pr119160.c fails with symbol referencing errors for `__cyg_profile_func_enter` and `__cyg_profile_func_exit` on non-glibc systems. This patch adds empty definitions for `__cyg_profile_func_enter` and `__cyg_profile_func_exit` in order to prevent those errors. PR testsuite/119862 gcc/testsuite/ChangeLog: * gcc.dg/pr119160.c: Added empty definitions for `__cyg_profile_func_enter` and `__cyg_profile_func_exit` functions.
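Empty definitions of this kind might look like the sketch below. The signatures follow the GCC documentation for -finstrument-functions; the exact text added to pr119160.c may differ, and the `no_instrument_function` attribute is an assumption made here so that the hooks are not themselves instrumented.

```c
#include <assert.h>

/* Called on function entry when compiling with -finstrument-functions.
   Deliberately does nothing.  */
void __attribute__ ((no_instrument_function))
__cyg_profile_func_enter (void *this_fn, void *call_site)
{
  (void) this_fn;
  (void) call_site;
}

/* Called on function exit.  Deliberately does nothing.  */
void __attribute__ ((no_instrument_function))
__cyg_profile_func_exit (void *this_fn, void *call_site)
{
  (void) this_fn;
  (void) call_site;
}
```

Providing the two symbols in the test itself means the link no longer depends on the C library supplying them.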
2025-06-12recip: Reset range info when replacing sqrt with rsqrt [PR120638]Jakub Jelinek1-0/+31
This pass reuses a SSA_NAME on the lhs of sqrt etc. call as lhs of .RSQRT etc. call. The following testcase is miscompiled since my recent ranger cast changes, because we compute (correct) range for sqrtf argument as well as result but then recip pass keeps using that range for the .RSQRT call which returns 1. / sqrt, so the function then returns 0.5f unconditionally. Note, on foo this is a regression from GCC 15, but on bar it regressed already with the r14-536 change. 2025-06-12 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/120638 * tree-ssa-math-opts.cc (pass_cse_reciprocals::execute): Call reset_flow_sensitive_info on arg1. * gcc.dg/pr120638.c: New test.
2025-06-12testsuite: Add testcase for already fixed PR [PR120630]Jakub Jelinek1-0/+25
These tests were broken by my r16-1398 PR120434 change and fixed by r16-1482 PR120629 change. Committing these to increase testsuite coverage. 2025-06-12 Jakub Jelinek <jakub@redhat.com> PR middle-end/120630 * gcc.dg/pr120630.c: New test. * gcc.c-torture/execute/pr120630.c: New test.
2025-06-12Fix test case for PR117811 which failed for int < 32 bit.Georg-Johann Lay1-0/+5
PR middle-end/117811 PR testsuite/52641 gcc/testsuite/ * gcc.dg/torture/pr117811.c: Fix for int < 32 bit.
2025-06-11c: remaining fix for the composite type inconsistency [PR120510]Martin Uecker1-0/+25
There is an old GNU extension which allows overriding the promoted old-style arguments when there is an earlier prototype. An example (from a test added for PR16666) is the following. float dremf (float, float); float dremf (x, y) float x, y; { return x + y; } The types of the two declarations are not compatible, because the arguments are not self-promoting. Add a special case to function_types_compatible_p that can be toggled via a flag for comptypes_internal and add a helper function to be able to add the checking assertions to composite_type. PR c/120510 gcc/c/ChangeLog: * c-typeck.cc (composite_type_internal): Activate checking assertions for all types and also inputs. (comptypes_for_composite_check): New helper function. (function_types_compatible_p): Add exception. gcc/testsuite/ChangeLog: * gcc.dg/old-style-prom-4.c: New test.
2025-06-11c: fix ICE for invalid code in generic selection [PR120303]Martin Uecker1-0/+5
Fix an error recovery ICE that occurs when a typename can not be parsed correctly in the controlling expression of a generic selection. PR c/120303 gcc/c/ChangeLog: * c-parser.cc (c_parser_generic_selection): Handle error condition. gcc/testsuite/ChangeLog: * gcc.dg/pr120303.c: New test.
2025-06-10diagnostics: make experimental-html sink prettier [PR116792]David Malcolm6-192/+107
This patch to the "experimental-html" diagnostic sink: * adds use of the PatternFly 3 CSS library (via an optional link in the generated html to a copy in a CDN) * uses PatternFly's "alert" pattern to show severities for diagnostics, properly nesting "note" diagnostics for diagnostic groups. Example: before: https://dmalcolm.fedorapeople.org/gcc/2025-06-10/before/diagnostic-ranges.c.html after: https://dmalcolm.fedorapeople.org/gcc/2025-06-10/after/diagnostic-ranges.c.html * adds initial support for logical locations and physical locations * adds initial support for multi-level nested diagnostics such as those for C++ concepts diagnostics. Ideally this would show a clickable disclosure widget to expand/collapse a level, but for now it uses nested <ul> elements with <li> for the child diagnostics. Example: before: https://dmalcolm.fedorapeople.org/gcc/2025-06-10/before/nested-diagnostics-1.C.html after: https://dmalcolm.fedorapeople.org/gcc/2025-06-10/after/nested-diagnostics-1.C.html gcc/ChangeLog: PR other/116792 * diagnostic-format-html.cc: Include "diagnostic-path.h" and "diagnostic-client-data-hooks.h". (html_builder::m_logical_loc_mgr): New field. (html_builder::m_cur_nesting_levels): New field. (html_builder::m_last_logical_location): New field. (html_builder::m_last_location): New field. (html_builder::m_last_expanded_location): New field. (HTML_STYLE): Add "white-space: pre;" to .source and .annotation. Add "gcc-quoted-text" CSS class. (html_builder::html_builder): Initialize the new fields. If CSS is enabled, add CDN links to PatternFly 3 stylesheets. (html_builder::add_stylesheet): New. (html_builder::on_report_diagnostic): Add "alert" param to make_element_for_diagnostic, setting it by default, but unsetting it for nested diagnostics below the top level. Use add_at_nesting_level for nested diagnostics. (add_nesting_level_attr): New. (html_builder::add_at_nesting_level): New. (get_pf_class_for_alert_div): New. (get_pf_class_for_alert_icon): New. 
(get_label_for_logical_location_kind): New. (add_labelled_value): New. (html_builder::make_element_for_diagnostic): Add leading comment. Add "alert" param. Drop class="gcc-diagnostic" from <div> tag, instead adding the class for a PatternFly 3 alert if "alert" is true, and adding a <span> with an alert icon, both according to the diagnostic severity. Add a severity prefix to the message for alerts. Add any metadata/option text as suffixes to the message. Show any logical location. Show any physical location. Don't show the locus if the last location is unchanged within the diagnostic_group. Wrap any execution path element in a <div id="execution-path"> and add a label to it. Wrap any generated patch in a <div id="suggested-fix"> and add a label to it. (selftest::test_simple_log): Update expected HTML. gcc/testsuite/ChangeLog: PR other/116792 * gcc.dg/html-output/missing-semicolon.py: Update for changes to diagnostic elements. * gcc.dg/format/diagnostic-ranges-html.py: Likewise. * gcc.dg/plugin/diagnostic-test-metadata-html.py: Likewise. Drop out-of-date comment. * gcc.dg/plugin/diagnostic-test-paths-2.py: Likewise. * gcc.dg/plugin/diagnostic-test-paths-4.py: Likewise. Drop out-of-date comment. * gcc.dg/plugin/diagnostic-test-show-locus.py: Likewise. * lib/htmltest.py (get_diag_by_index): Update to use search by id. (get_message_within_diag): Update to use search by class. libcpp/ChangeLog: PR other/116792 * include/line-map.h (typedef expanded_location): Convert to... (struct expanded_location): ...this. (operator==): New decl, for expanded_location. (operator!=): Likewise. * line-map.cc (operator==): New decl, for expanded_location. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2025-06-10diagnostics: fix tag nesting issues in experimental-html sink [PR120610]David Malcolm1-23/+0
I've been seeing issues in the experimental-html sink where the nesting of tags goes wrong. The two issues I've seen are: * the pp_token_list from the diagnostic message that reaches the html_token_printer doesn't always have matching pairs of begin/end tokens (PR other/120610) * a bug in diagnostic-show-locus where there was a stray xp.pop_tag, in print_trailing_fixits. This patch: * changes the xml::printer::pop_tag API so that it now takes the expected name of the element being popped (rather than expressing this in comments), and that, by default, the xml::printer asserts that this matches. * gives the html_token_printer its own xml::printer instance to restrict the affected area of the DOM tree; this xml::printer doesn't enforce nesting (PR other/120610) * adds RAII sentinel classes that automatically check for pushes/pops being balanced within a scope, using them in various places * fixes the bug in print_trailing_fixits for html output gcc/ChangeLog: PR other/120610 * diagnostic-format-html.cc (html_builder::html_builder): Update for new param of xml::printer::pop_tag. (html_path_label_writer::end_label): Likewise. (html_builder::make_element_for_diagnostic::html_token_printer): Give the instance its own xml::printer. Update for new param of xml::printer::pop_tag. (html_builder::make_element_for_diagnostic): Give the instance its own xml::printer. (html_builder::make_metadata_element): Update for new param of xml::printer::pop_tag. (html_builder::flush_to_file): Likewise. * diagnostic-path-output.cc (begin_html_stack_frame): Likewise. (begin_html_stack_frame): Likewise. (end_html_stack_frame): Likewise. (print_path_summary_as_html): Likewise. * diagnostic-show-locus.cc (struct to_text::auto_check_tag_nesting): New. (struct to_html:: auto_check_tag_nesting): New. (to_text::pop_html_tag): Change param to const char *. (to_html::pop_html_tag): Likewise; rename param to "expected_name". 
(default_diagnostic_start_span_fn<to_html>): Update for new param of xml::printer::pop_tag. (layout_printer<to_html>::end_label): Likewise. (layout_printer<Sink>::print_trailing_fixits): Add RAII sentinel to check tag nesting for the HTML case. Delete stray popping of "td" in the presence of fix-it hints. (layout_printer<Sink>::print_line): Add RAII sentinel to check tag nesting for the HTML case. (diagnostic_source_print_policy::print_as_html): Likewise. (layout_printer<Sink>::print): Likewise. * xml-printer.h (xml::printer::printer): Add optional "check_popped_tags" param. (xml::printer::pop_tag): Add "expected_name" param. (xml::printer::get_num_open_tags): New accessor. (xml::printer::dump): New decl. (xml::printer::m_check_popped_tags): New field. (class xml::auto_check_tag_nesting): New. (class xml::auto_print_element): Update for new param of pop_tag. * xml.cc: Move pragma pop so that the pragma also covers xml::printer's member functions, "dump" in particular. (xml::printer::printer): Add param "check_popped_tags". (xml::printer::pop_tag): Add param "expected_name" and use it to assert that the popped tag is as expected. Assert that we have a tag to pop. (xml::printer::dump): New. (selftest::test_printer): Update for new param of pop_tag. (selftest::test_attribute_ordering): Likewise. gcc/testsuite/ChangeLog: PR other/120610 * gcc.dg/format/diagnostic-ranges-html.py: Remove out-of-date comment. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
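The two mechanisms described above (a pop that names the element it expects to close, and an RAII sentinel that checks pushes and pops are balanced within a scope) can be sketched as follows. This is only an illustration under stated assumptions: tiny_printer, nesting_check and emit_row are hypothetical names invented for the sketch, not GCC's xml::printer or xml::auto_check_tag_nesting, and the sketch reports problems through return values rather than assertions.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a tag-stack printer; this is not GCC's xml::printer.
class tiny_printer
{
public:
  void push_tag (const std::string &name)
  {
    m_out += "<" + name + ">";
    m_open.push_back (name);
  }

  // Pop the innermost tag, but only if it is the one the caller expects.
  // Returns false (and changes nothing) on a mismatch or an empty stack.
  bool pop_tag (const std::string &expected_name)
  {
    if (m_open.empty () || m_open.back () != expected_name)
      return false;
    m_out += "</" + expected_name + ">";
    m_open.pop_back ();
    return true;
  }

  std::size_t num_open_tags () const { return m_open.size (); }
  const std::string &str () const { return m_out; }

private:
  std::vector<std::string> m_open;
  std::string m_out;
};

// Hypothetical RAII sentinel: remembers the number of open tags on entry
// and, on scope exit, records whether the same number is open again.
class nesting_check
{
public:
  nesting_check (const tiny_printer &p, bool &balanced)
    : m_printer (p), m_depth (p.num_open_tags ()), m_balanced (balanced)
  {
  }
  ~nesting_check ()
  {
    m_balanced = (m_printer.num_open_tags () == m_depth);
  }

private:
  const tiny_printer &m_printer;
  std::size_t m_depth;
  bool &m_balanced;
};

// Emit a <tr> holding one <td>.  If STRAY_POP is set, the checked scope
// also closes the enclosing <tr>, mimicking a stray pop.  Returns whether
// the checked scope left the tag stack balanced.
bool emit_row (bool stray_pop)
{
  tiny_printer p;
  p.push_tag ("tr");
  bool balanced = false;
  {
    nesting_check check (p, balanced);
    p.push_tag ("td");
    p.pop_tag ("td");
    if (stray_pop)
      p.pop_tag ("tr"); // the bug: closes an element opened by the caller
  }
  p.pop_tag ("tr"); // a no-op returning false if the stray pop already ran
  return balanced;
}
```

Here emit_row (false) leaves the scope balanced, while emit_row (true) is flagged by the sentinel because the scope exits with one fewer open tag than it started with.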
2025-06-10c: partial fix for qualifier inconsistency [PR120510]Martin Uecker1-0/+7
Checking assertions revealed that we sometimes produce composite types with incorrect qualifiers, e.g. the example int f(int [_Atomic]); int f(int [_Atomic]); int f(int [_Atomic]); was rejected because atomic was lost in the second declaration. PR c/120510 gcc/c/ChangeLog: * c-typeck.cc (composite_types_internal): Handle arrays declared with atomic for function arguments. gcc/testsuite/ChangeLog: * gcc.dg/pr120510.c
2025-06-09diagnostics: fix <title> of experimental-html output [PR116792]David Malcolm1-24/+1
Add a new vfunc diagnostic_output_format::set_main_input_filename so that we can separate setting the <title> of HTML output and the diagnostic_artifact_role::analysis_target of SARIF output from creation of the sinks. Calling it is done by the various creators of the sinks. gcc/ChangeLog: PR other/116792 * diagnostic-format-html.cc (html_builder::m_title_element): New field. (html_builder::html_builder): Initialize it. Don't add placeholder text. (html_builder::set_main_input_filename): New. (html_output_format::set_main_input_filename): New. (test_html_diagnostic_context::test_html_diagnostic_context): Call set_main_input_filename on the new sink. (selftest::test_simple_log): Update expected <title> text. * diagnostic-format-json.cc (diagnostic_output_format_init_json): Return a reference to the new sink. (diagnostic_output_format_init_json_stderr): Likewise. (diagnostic_output_format_init_json_file): Likewise. * diagnostic-format-sarif.cc (sarif_builder::sarif_builder): Drop "main_input_filename_" param, and move adding an artifact for it with diagnostic_artifact_role::analysis_target to... (sarif_builder::set_main_input_filename): ...this new function. (sarif_output_format::set_main_input_filename): New. (sarif_output_format::sarif_output_format): Drop "main_input_filename_" param. (sarif_stream_output_format::sarif_stream_output_format): Likewise. (sarif_file_output_format::sarif_file_output_format): Likewise. (diagnostic_output_format_init_sarif): Return a reference to *FMT. (diagnostic_output_format_init_sarif_stderr): Return a reference to the new sink. Drop "main_input_filename_" param. (diagnostic_output_format_init_sarif_file): Likewise. (diagnostic_output_format_init_sarif_stream): Likewise. (make_sarif_sink): Drop "main_input_filename_" param. (selftest::test_sarif_diagnostic_context::test_sarif_diagnostic_context): Likewise. Call set_main_input_filename on the new format. 
(selftest::test_sarif_diagnostic_context::buffered_output_format::buffered_output_format): Drop "main_input_filename_" param. (selftest::test_make_location_object): Likewise. * diagnostic-format-sarif.h (diagnostic_output_format_init_sarif_stderr): Return a reference to the new sink. Drop "main_input_filename_" param. (diagnostic_output_format_init_sarif_file): Likewise. (diagnostic_output_format_init_sarif_stream): Likewise. (make_sarif_sink): Drop "main_input_filename_" param. * diagnostic-format.h (diagnostic_output_format::set_main_input_filename): New vfunc. (diagnostic_output_format_init_json_stderr): Return a reference to the new sink. (diagnostic_output_format_init_json_file): Likewise. * diagnostic.cc (diagnostic_output_format_init): Likewise. Call set_main_input_filename on the new sink. * libgdiagnostics.cc (sarif_sink::sarif_sink): Update for above changes. * opts-diagnostic.cc (sarif_scheme_handler::make_sink): Likewise. (handle_OPT_fdiagnostics_add_output_): Likewise. (handle_OPT_fdiagnostics_set_output_): Likewise. gcc/testsuite/ChangeLog: PR other/116792 * gcc.dg/html-output/missing-semicolon.py: Update expected <title> text. Drop out-of-date comment. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2025-06-09[AutoFDO][testsuite] Enable clone-merge-1.c only for fauto-profileKugan Vivekanandarajah1-0/+2
This patch enables clone-merge-1.c only for -fauto-profile, as it fails otherwise. This also fixes a typo in merge. gcc/testsuite/ChangeLog: * gcc.dg/tree-prof/clone-merge-1.c: Enable only for -fauto-profile. gcc/ChangeLog: * auto-profile.cc (function_instance::merge): Fix typo. Signed-off-by: Kugan Vivekanandarajah <kvivekananda@nvidia.com>
2025-06-08phi-opt: Do limited form of cselim from phiopt [PR120533]Andrew Pinski4-6/+5
Currently cselim is limited to targets which have conditional moves, and it also happens later in the pipeline. This adds a limited form of cselim to phiopt, for the case where there is only one store on each of the two sides and no loads after the store. This fixes phiprop-2.c for the gcn target; since the MIN can now be matched in phiopt1, the matching of MIN in the test moves to phiopt1. The other testcases already disable cselim, so they need to disable phiopt too. Bootstrapped and tested on x86_64-linux-gnu. PR tree-optimization/120533 gcc/ChangeLog: * tree-ssa-phiopt.cc (cond_if_else_store_replacement_limited): New function. (pass_phiopt::execute): Call cond_if_else_store_replacement_limited for diamond case. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/pr35286.c: Add -fno-ssa-phiopt. * gcc.dg/tree-ssa/split-path-6.c: Likewise. * gcc.dg/tree-ssa/split-path-7.c: Likewise. * gcc.dg/tree-ssa/phiprop-2.c: Move the check for MIN_EXPR to phiopt1. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
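As a rough source-level picture of the transformation described above, the pair of functions below shows a diamond with one store on each side and the single merged store it is equivalent to. The function names are made up for illustration, and the second function only models the result conceptually; it is not a claim about the exact GIMPLE that phiopt emits.

```cpp
// Before: each arm of the if/else diamond performs exactly one store to *p,
// and nothing loads from *p afterwards inside the diamond.
void store_min_branchy (int *p, int a, int b)
{
  if (a < b)
    *p = a;
  else
    *p = b;
}

// After (conceptually): the two stores are merged into one unconditional
// store of the selected value, which can then be recognized as a minimum.
void store_min_merged (int *p, int a, int b)
{
  int v = a < b ? a : b;
  *p = v;
}

// Helper for checking that both forms store the same value.
bool same_result (int a, int b)
{
  int x = 0;
  int y = 0;
  store_min_branchy (&x, a, b);
  store_min_merged (&y, a, b);
  return x == y;
}
```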
2025-06-05ranger: Add support for float <-> int casts [PR120231]Jakub Jelinek2-0/+147
The following patch adds support for float <-> integer conversions in ranger. The patch reverts part of the r16-571 changes, those changes were right for fold_range, but not for op1_range, where RO_IFI and RO_FIF are actually called rather than RO_IFF and RO_FII that the patch expected. Also, the float -> int operation actually uses FIX_TRUNC_EXPR tree code rather than NOP_EXPR or CONVERT_EXPR and int -> float uses FLOAT_EXPR, but I think we can just handle all of them using operator_cast, at least as long as we don't try to use VIEW_CONVERT_EXPR using that too; not really sure handling VCE at least for floating to integral or vice versa would be actually useful though. The patch "regressed" two tests, gfortran.dg/inline_matmul_16.f90 and g++.dg/tree-ssa/loop-split-1.C. In the first case, there is a loop doing matmul on various sizes of matrices, up to 10x10 matrices, and Fortran FE given the options emits two implementations of the matmul, one inline for the case where the matmul has less than 1000 elements and one for larger matmuls. The check for whatever reason uses floating point calculations and before this patch we weren't able to prove that all the matrices will be smaller than the cutoff and the test was checking for presence of the fallback call; with the patch we are able to figure it out and only keep the inline copy. I've duplicated the test, once unmodified source which doesn't expect _gfortran_matmul string in optimized dump anymore, and another copy which uses volatile ten instead of 10 in loop upper bounds so that it has to keep the fallback and scans for it. The other test is g++.dg/tree-ssa/loop-split-1.C, which does constexpr unsigned s = 100000000; ... 
for(unsigned i = 0; i < s; ++i) { if(i == 0) a[i] = b[i] * c[i]; else a[i] = (b[i] + c[i]) * c[i-1] * std::log(i); } and for some reason the successful loop splitting for which the test searches in a dump file is dependent on the errno fallback of std::log, where we do t = std::log((double)i); if ((double)i) u> 0); else log ((double)i); But i goes only from 1 to 100000000, so (double)i has the range [1.0, 100000000.0] with the patch and so we see it will never need errno nor raise exception. I've tested adding + d for it where d is 0.0 but modifiable in some other TU, and tested it also with r14-2851 and r14-2852, where the former FAILed the test both unmodified and modified, while the latter PASSed both versions. 2025-06-05 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/120231 * range-op.cc (range_op_table::range_op_table): Register op_cast also for FLOAT_EXPR and FIX_TRUNC_EXPR. (RO_III): Adjust comment. (range_op_handler::op1_range): Handle RO_IFI rather than RO_IFF. Don't handle RO_FII. (range_operator::op1_range): Remove overload with irange &, tree, const frange &, const frange &, relation_trio and frange &, tree, const irange &, const irange &, relation_trio arguments. Add overload with irange &, tree, const frange &, const irange &, relation_trio arguments. * range-op-mixed.h (operator_cast::op1_range): Remove overload with irange &, tree, const frange &, const frange &, relation_trio and frange &, tree, const irange &, const irange &, relation_trio arguments. Add overload with irange &, tree, const frange &, const irange &, relation_trio and frange &, tree, const irange &, const frange &, relation_trio arguments. * range-op.h (range_operator::op1_cast): Remove overload with irange &, tree, const frange &, const frange &, relation_trio and frange &, tree, const irange &, const irange &, relation_trio arguments. Add overload with irange &, tree, const frange &, const irange &, relation_trio arguments. 
* range-op-float.cc (operator_cast::fold_range): Implement float to int and int to float casts. (operator_cast::op1_range): Remove overload with irange &, tree, const frange &, const frange &, relation_trio and frange &, tree, const irange &, const irange &, relation_trio arguments. Add overload with irange &, tree, const frange &, const irange &, relation_trio and frange &, tree, const irange &, const frange &, relation_trio arguments and implement reverse op of float to int and int to float cast there. * gcc.dg/tree-ssa/pr120231-2.c: New test. * gcc.dg/tree-ssa/pr120231-3.c: New test. * gfortran.dg/inline_matmul_16.f90: Don't expect any _gfortran_matmul strings in optimized dump. * gfortran.dg/inline_matmul_26.f90: New test. * g++.dg/tree-ssa/loop-split-1.C (d): New variable. (main): Use std::log (i + d) instead of std::log (i).
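A minimal sketch of the kind of code that benefits from an int -> float range, modelled loosely on the loop-split-1.C discussion above: once the integer operand is known to lie in a bounded interval, the converted value lies in the corresponding floating-point interval, so a later comparison on it has a known outcome. The function name and the bounds are invented for illustration; whether a particular GCC version actually folds this exact function is not something the sketch asserts.

```cpp
// On the path that reaches the conversion, i is known to be in [1, 100],
// so (double) i lies in [1.0, 100.0] and the comparison below is always
// true on that path.
int always_positive (unsigned i)
{
  if (i < 1 || i > 100)
    return -1;
  double d = static_cast<double> (i);
  return d > 0.0 ? 1 : 0;
}
```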
2025-06-05real: Fix up real_from_integer [PR120547]Jakub Jelinek1-0/+26
The function has 2 problems, one is _BitInt specific and the other is most likely also reproducible only with it. The first issue is that I've missed updating the function for _BitInt, maxbitlen as MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT obviously isn't guaranteed to be larger than any integral type we might want to convert at compile time from wide_int to REAL_VALUE_FORMAT. Just using len instead of it works fine, at least when used after HOST_BITS_PER_WIDE_INT is added to it and it is truncated to multiples of HOST_BITS_PER_WIDE_INT. The other bug is that if the value has too many significant bits (formerly maxbitlen - cnt_l_z, now len - cnt_l_z), the code just shifts it right and adds the shift count to the future exponent. That isn't correct for rounding, as the testcase attempts to show: the internal real format has more bits than the precision of any supported format, but we still need to distinguish between values exactly half way between representable floating point values (those should be rounded to even) and the case when we've shifted away some non-zero bits, so the value was a tiny bit larger than half way and then we should round up. The patch uses something like what e.g. soft-fp uses in these cases, a right shift with a sticky bit in the least significant bit. 2025-06-05 Jakub Jelinek <jakub@redhat.com> PR middle-end/120547 * real.cc (real_from_integer): Remove maxbitlen variable, use len instead of that. When shifting right, or in 1 if any of the shifted away bits are non-zero. Formatting fix. * gcc.dg/bitint-123.c: New test.
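The sticky-bit idea can be illustrated on plain 64-bit integers. The sketch below is not the real.cc code: the function names are hypothetical, and the 3-bit and 2-bit shift widths are chosen only to keep the numbers small. It shows why a plain right shift loses the difference between "exactly half way" and "slightly more than half way", and how ORing the lost bits into the least significant bit preserves it for a later round-to-nearest-even step.

```cpp
#include <cstdint>

// Shift V right by N bits; if any of the shifted-away bits were non-zero,
// OR a 1 into the least significant bit of the result (the "sticky" bit).
std::uint64_t shift_right_sticky (std::uint64_t v, unsigned n)
{
  if (n == 0)
    return v;
  if (n >= 64)
    return v != 0 ? 1 : 0;
  std::uint64_t lost = v & ((std::uint64_t{1} << n) - 1);
  return (v >> n) | (lost != 0 ? 1 : 0);
}

// Round away the two lowest bits of V using round-to-nearest, ties-to-even.
std::uint64_t round_to_even_2 (std::uint64_t v)
{
  std::uint64_t q = v >> 2;
  std::uint64_t rem = v & 3;
  if (rem > 2 || (rem == 2 && (q & 1) != 0))
    ++q;
  return q;
}

// Compute X / 32 rounded to nearest even in two steps: first drop 3 bits
// (with or without the sticky bit), then round away the remaining 2 bits.
std::uint64_t round_drop5 (std::uint64_t x, bool use_sticky)
{
  std::uint64_t t = use_sticky ? shift_right_sticky (x, 3) : (x >> 3);
  return round_to_even_2 (t);
}
```

With these definitions, 80 / 32 = 2.5 is an exact tie and rounds to the even value 2 either way, whereas 81 / 32 is slightly above the tie: round_drop5 (81, true) gives 3, but round_drop5 (81, false) gives 2 because the plain shift discarded the bit that made the value larger than half way.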
2025-06-05middle-end: Fix operation_could_trap_p for FIX_TRUNC expressionsSpencer Abson2-0/+25
Floating-point to integer conversions can be inexact or invalid (e.g., due to overflow or NaN). However, since users of operation_could_trap_p infer the bool FP_OPERATION argument from the expression's type, the FIX_TRUNC family are considered non-trapping here. This patch handles them explicitly. gcc/ChangeLog: * tree-eh.cc (operation_could_trap_helper_p): Cover FIX_TRUNC expressions explicitly. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/pr96357.c: Change to avoid producing a conditional FIX_TRUNC_EXPR, whilst still reproducing the bug in PR96357. * gcc.dg/tree-ssa/ifcvt-fix-trunc-1.c: New test. * gcc.dg/tree-ssa/ifcvt-fix-trunc-2.c: Likewise.
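To make the "inexact or invalid" cases concrete, here is a small guarded conversion that spells out which inputs a float -> int truncation cannot represent. It is a sketch only: fix_trunc_or is a hypothetical helper, not anything in GCC, and it assumes int is narrow enough (as with 32-bit int) that its limits are exactly representable as double. The test names above suggest the practical concern is if-conversion, where a conversion guarded by a condition would otherwise be executed unconditionally; that reading is an inference, not something the message states.

```cpp
#include <cmath>
#include <limits>

// Truncate D toward zero and return it as an int, or return FALLBACK when
// the conversion would be invalid: D is NaN, or the truncated value lies
// outside the range of int (this also covers +/- infinity).
// Assumes the limits of int are exactly representable as double.
int fix_trunc_or (double d, int fallback)
{
  if (std::isnan (d))
    return fallback;
  double t = std::trunc (d);
  if (t < static_cast<double> (std::numeric_limits<int>::min ())
      || t > static_cast<double> (std::numeric_limits<int>::max ()))
    return fallback;
  return static_cast<int> (t);
}
```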
2025-06-05[AutoFDO] Profile merging for clone testKugan Vivekanandarajah1-0/+32
This patch introduces a new testcase to verify the merging of profiles is performed for cloned functions. Since this is invoked very early, before the pass manager, we need to set up the dumping explicitly. This is similar to the handling in finish_optimization_passes. gcc/ChangeLog: * auto-profile.cc (autofdo_source_profile::read): Dump message while merging profile. * pass_manager.h (get_pass_auto_profile): New. gcc/testsuite/ChangeLog: * gcc.dg/tree-prof/clone-merge-1.c: New test. Signed-off-by: Kugan Vivekanandarajah <kvivekananda@nvidia.com>
2025-06-04gimple-fold: Implement simple copy propagation for aggregates [PR14295]Andrew Pinski5-3/+89
This implements a simple copy propagation for aggregates in a similar fashion to what we already do for copy prop of zeroing. Right now this only looks at the previous vdef statement, but this allows us to catch a lot of cases that show up in C++ code. This used to delete aggregate copies that are to the same location (PR57361), but that was found to delete statements that are needed for aliasing-marker reasons. So we need to keep them around until that is solved. Note DSE will delete the statements anyway, so there is no testcase added since we expose the latent bug in the same way. See https://gcc.gnu.org/pipermail/gcc-patches/2025-May/685003.html for the testcase and explanation there. Also adds a variant of pr22237.c which was found while working on this patch. Changes since v1: * v2: change check for vuse to use default definition. Remove dest/src arguments for optimize_agr_copyprop. Changed dump messages slightly. Added stats. Don't delete `a = a` until aliasing markers are added. PR tree-optimization/14295 PR tree-optimization/108358 PR tree-optimization/114169 gcc/ChangeLog: * tree-ssa-forwprop.cc (optimize_agr_copyprop): New function. (pass_forwprop::execute): Call optimize_agr_copyprop for load/store statements. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/20031106-6.c: Un-xfail. Add scan for forwprop1. * g++.dg/opt/pr66119.C: Disable forwprop since that does the copy prop now. * gcc.dg/tree-ssa/pr108358-a.c: New test. * gcc.dg/tree-ssa/pr114169-1.c: New test. * gcc.c-torture/execute/builtins/pr22237-1-lib.c: New test. * gcc.c-torture/execute/builtins/pr22237-1.c: New test. * gcc.dg/tree-ssa/pr57361.c: Disable forwprop1. * gcc.dg/tree-ssa/pr57361-1.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
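At the source level, the pattern being targeted looks roughly like the following: an aggregate is copied into a temporary and the temporary is immediately copied on, so the second copy can read from the original source instead. The struct and function names are invented for the sketch, and the "direct" form only models the intended effect; it does not claim to be the exact output of forwprop for this input.

```cpp
struct big
{
  int v[16];
};

// As written: copy *src into a temporary, then the temporary into *dst.
// The second copy immediately follows the store it reads from.
void copy_via_temp (big *dst, const big *src)
{
  big tmp = *src;
  *dst = tmp;
}

// What aggregate copy propagation conceptually produces: the second copy
// takes its value from the original source, leaving tmp unused.
void copy_direct (big *dst, const big *src)
{
  *dst = *src;
}

// Helper for checking that both forms leave the same contents in dst.
bool same_copy (int seed)
{
  big src;
  for (int i = 0; i < 16; ++i)
    src.v[i] = seed + i;
  big a = {};
  big b = {};
  copy_via_temp (&a, &src);
  copy_direct (&b, &src);
  for (int i = 0; i < 16; ++i)
    if (a.v[i] != b.v[i] || a.v[i] != seed + i)
      return false;
  return true;
}
```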
2025-06-04ranger: Add support for float <-> float casts [PR120231]Jakub Jelinek1-0/+67
I've noticed we don't even support say float -> double and other scalar floating point to scalar floating point conversions in the ranger, we just end up with VARYING for those. The following patch attempts to fix that. The reverse cast case uses float_binary_op_range_finish e.g. because if the result isn't infinite, then the source couldn't be infinite either even if the reverse fold_range would suggest that. And special cases the case of guaranteed widening cast (where we have assurance that all the source type values are exactly representable in the destination type; using ieee_bits for that). 2025-06-04 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/120231 * range-op-mixed.h (operator_cast::fold_range): Add overload with 3 {,const} frange & operands. Change parameter names and add final override keywords for float <-> integer cast overloads. (operator_cast::op1_range): Likewise. * range-op-float.cc (operator_cast::fold_range): New overload with 3 {,const} frange & operands. (operator_cast::op1_range): Likewise. * gcc.dg/tree-ssa/pr120231-1.c: New test.
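The two properties the message relies on for a widening cast can be checked directly for float -> double: every float value is exactly representable as a double, and the widened result is infinite only if the source was. The helper names below are hypothetical and the sketch says nothing about how the ranger implements this; it only demonstrates the arithmetic facts.

```cpp
#include <cmath>
#include <limits>

// A widening cast is exact: converting back yields the original value.
// (Returns false for NaN, since NaN never compares equal to itself.)
bool widening_is_exact (float f)
{
  double d = static_cast<double> (f);
  return static_cast<float> (d) == f;
}

// The widened value is infinite exactly when the source was infinite.
bool is_inf_after_widening (float f)
{
  return std::isinf (static_cast<double> (f));
}
```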
2025-06-04emit-rtl: Tweak validate_subreg ordered_p condition [PR120447]Richard Sandiford1-0/+24
In the comment trail for PR119966, I'd said that the validate_subreg condition: /* The outer size must be ordered wrt the register size, otherwise we wouldn't know at compile time how many registers the outer mode occupies. */ if (!ordered_p (osize, regsize)) return false; "is also potentially relevant" for paradoxical subregs. But I'd forgotten an important caveat. If the inner size is smaller than a register, we know that the inner value will only occupy a single register. Although the paradoxical subreg might extend that single register to multiple registers by padding with undefined bits, the register size that matters for the extension is: REGMODE_NATURAL_SIZE (omode) rather than regsize's: REGMODE_NATURAL_SIZE (imode) The ordered check is still relevant if the inner value spans multiple registers. Enabling the check above for paradoxical subregs led to an ICE in the testcase, where we tried to generate a VNx4QI paradoxical subreg of a QI scalar. This was previously allowed, and AFAIK worked correctly. The patch doesn't have the effect of relaxing the condition for non-paradoxical subregs, since: known_le (osize, isize) && known_le (isize, regsize) => known_le (osize, regsize) => ordered_p (osize, regsize) So even before the patch for PR119966, the condition only existed for the maybe_gt (isize, regsize) case. The term "block" used in the comment is taken from the rtl.texi documentation of subregs. gcc/ PR rtl-optimization/120447 * emit-rtl.cc (validate_subreg): Restrict ordered_p test between osize and regsize to cases where the inner value occupies multiple blocks. gcc/testsuite/ PR rtl-optimization/120447 * gcc.dg/pr120447.c: New test.
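The ordered_p test is easier to follow with a toy model of sizes that depend on a runtime parameter. The sketch below is a hypothetical model, not GCC's poly_int: a size is c0 + c1 * N bytes for some unknown N >= 0, and two sizes are "ordered" when one is known to be no larger than the other for every N. The concrete numbers are assumptions made purely for illustration: a VNx4QI size of 4 + 4N bytes, a fixed scalar register size of 8 bytes, and a scalable vector register size of 16 + 16N bytes.

```cpp
// Toy model of a size c0 + c1 * N bytes, where N >= 0 is unknown at
// compile time.  Hypothetical; GCC's poly_int is more general.
struct poly_size
{
  long c0;
  long c1;
};

// A <= B for every N >= 0 exactly when both coefficients are <=:
// c0 decides the case N == 0, c1 decides the case of large N.
bool model_known_le (poly_size a, poly_size b)
{
  return a.c0 <= b.c0 && a.c1 <= b.c1;
}

// Two sizes are ordered if one of them is known to be <= the other.
bool model_ordered_p (poly_size a, poly_size b)
{
  return model_known_le (a, b) || model_known_le (b, a);
}
```

Under these assumptions, model_ordered_p ({4, 4}, {8, 0}) is false, since 4 + 4N is below 8 for N = 0 but above it for N >= 2, which is the shape of comparison that rejected the paradoxical subreg. Comparing against the scalable size instead, model_ordered_p ({4, 4}, {16, 16}) is true.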