Age | Commit message (Collapse) | Author | Files | Lines |
|
Current array bound checker only instruments ARRAY_REF, and the INDEX
information is the 2nd operand of the ARRAY_REF.
When extending the array bound checker to pointer references with
counted_by attributes, the hardest part is to get the INDEX of the
corresponding array ref from the offset computation expression of
the pointer ref. I.e.
Given an OFFSET expression, and the ELEMENT_SIZE,
get the index expression from the OFFSET.
For example:
OFFSET:
((long unsigned int) m * (long unsigned int) SAVE_EXPR <n>) * 4
ELEMENT_SIZE:
(sizetype) SAVE_EXPR <n> * 4
get the index as (long unsigned int) m.
gcc/c-family/ChangeLog:
* c-gimplify.cc (is_address_with_access_with_size): New function.
(ubsan_walk_array_refs_r): Instrument an INDIRECT_REF whose base
address is .ACCESS_WITH_SIZE or an address computation whose base
address is .ACCESS_WITH_SIZE.
* c-ubsan.cc (ubsan_instrument_bounds_pointer_address): New function.
(struct factor_t): New structure.
(get_factors_from_mul_expr): New function.
(get_index_from_offset): New function.
(get_index_from_pointer_addr_expr): New function.
(is_instrumentable_pointer_array_address): New function.
(ubsan_array_ref_instrumented_p): Change prototype.
Handle MEM_REF in addtional to ARRAY_REF.
(ubsan_maybe_instrument_array_ref): Handle MEM_REF in addtional
to ARRAY_REF.
gcc/testsuite/ChangeLog:
* gcc.dg/ubsan/pointer-counted-by-bounds-2.c: New test.
* gcc.dg/ubsan/pointer-counted-by-bounds-3.c: New test.
* gcc.dg/ubsan/pointer-counted-by-bounds-4.c: New test.
* gcc.dg/ubsan/pointer-counted-by-bounds-5.c: New test.
* gcc.dg/ubsan/pointer-counted-by-bounds.c: New test.
|
|
gcc/ChangeLog:
* tree-object-size.cc (access_with_size_object_size): Update comments
for pointers with .ACCESS_WITH_SIZE.
(collect_object_sizes_for): Propagate size info through GIMPLE_ASSIGN
for pointers with .ACCESS_WITH_SIZE.
gcc/testsuite/ChangeLog:
* gcc.dg/pointer-counted-by-4-char.c: New test.
* gcc.dg/pointer-counted-by-4-float.c: New test.
* gcc.dg/pointer-counted-by-4-struct.c: New test.
* gcc.dg/pointer-counted-by-4-union.c: New test.
* gcc.dg/pointer-counted-by-4.c: New test.
* gcc.dg/pointer-counted-by-5.c: New test.
* gcc.dg/pointer-counted-by-6.c: New test.
* gcc.dg/pointer-counted-by-7.c: New test.
|
|
pointer reference with counted_by attribute to .ACCESS_WITH_SIZE.
For example:
struct PP {
size_t count2;
char other1;
char *array2 __attribute__ ((counted_by (count2)));
int other2;
} *pp;
specifies that the "array2" is an array that is pointed by the
pointer field, and its number of elements is given by the field
"count2" in the same structure.
gcc/c-family/ChangeLog:
* c-attribs.cc (handle_counted_by_attribute): Accept counted_by
attribute for pointer fields.
gcc/c/ChangeLog:
* c-decl.cc (verify_counted_by_attribute): Change the 2nd argument
to a vector of fields with counted_by attribute. Verify all fields
in this vector.
(finish_struct): Collect all the fields with counted_by attribute
to a vector and pass this vector to verify_counted_by_attribute.
* c-typeck.cc (build_counted_by_ref): Handle pointers with counted_by.
Add one more argument, issue error when the pointee type is a structure
or union including a flexible array member.
(build_access_with_size_for_counted_by): Handle pointers with counted_by.
(handle_counted_by_for_component_ref): Call build_counted_by_ref
with the new prototype.
gcc/ChangeLog:
* doc/extend.texi: Extend counted_by attribute to pointer fields in
structures. Add one more requirement to pointers with counted_by
attribute.
gcc/testsuite/ChangeLog:
* gcc.dg/flex-array-counted-by.c: Update test.
* gcc.dg/pointer-counted-by-1.c: New test.
* gcc.dg/pointer-counted-by-2.c: New test.
* gcc.dg/pointer-counted-by-3.c: New test.
* gcc.dg/pointer-counted-by.c: New test.
|
|
The test is not endianess clean, x[0] is supposed to be ((__int128)0x19)<<32
on little endian - 0x19 is in the second vector elt - but ((__int128)0x19)<<64
on big endian. I've added also verification of int and __int128 sizes just
in case we have say 16-bit or 64-bit int target with __int128 type, or
pdp endian gets __int128 support.
2025-07-01 Jakub Jelinek <jakub@redhat.com>
PR ipa/119318
PR testsuite/120082
* gcc.dg/ipa/pr119318.c (main): Expect different result on big endian
from little endian, on unexpected endianness or int/int128 sizes don't
test anything. Formatting fixes.
|
|
This patch partially handled PR118592.
This patch builds upon r16-710-g591d3d02664c7b and r16-711-g89935d56f768b4. It
introduces middle-end optimizations, such as constant folding, for our
trigonometric pi-based function built-ins.
gcc/ChangeLog:
* fold-const-call.cc (fold_const_call_ss): Constant fold for
single arg pi-based trigonometric builtins.
(fold_const_call_sss): Constant fold for double arg pi-based
trigonometric builtins.
* fold-const.cc (negate_mathfn_p): asinpi/atanpi is odd func.
(tree_call_nonnegative_warnv_p): acospi always non-neg,
asinpi/atanpi non-neg iff arg non-neg.
* tree-call-cdce.cc (can_test_argument_range): Add acospi/asinpi.
(edom_only_function): Add acospi/asinpi/cospi/sinpi.
(get_no_error_domain): Add acospi/asinpi.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp (foldable_pi_based_trigonometry): New
effective target.
* gcc.dg/torture/builtin-math-9.c: New test.
Signed-off-by: Yuao Ma <c8ef@outlook.com>
|
|
dfp.exp tests for dfprt before deciding whether to default to run or
compile, and the PR120631 tests override that without checking for
dfprt. Rework them to avoid attempting to link and run programs
when dfp runtime support isn't available.
for gcc/testsuite/ChangeLog
PR middle-end/120631
* gcc.dg/dfp/pr120631.c: Drop overrider of dg-do default action.
* gcc.dg/dfp/bitint-9.c: Likewise.
* gcc.dg/dfp/bitint-10.c: Likewise.
|
|
ext-dce's actions
I've gone back and forth of these problems multiple times. We have two passes,
ext-dce and combine which eliminate extensions using totally different
mechanisms.
ext-dce looks for cases where the state of upper bits in an object aren't
observable and if they aren't observable, then eliminates extensions which set
those bits.
combine looks for cases where we know the state of the upper bits and can prove
an extension is just setting those bits to their prior value. Combine also
looks for cases where the precise extension isn't really important, just the
knowledge that the upper bits are zero or sign extended from a narrower mode
is needed.
Combine relies heavily on the SUBREG_PROMOTED_VAR state to do its job. If the
actions of ext-dce (or any other pass for that matter) make
SUBREG_PROMOTED_VAR's state inconsistent with combine's expectations, then
combine can end up generating incorrect code.
--
When ext-dce eliminates an extension and turns it into a subreg copy (without
any known SUBREG_PROMOTED_VAR state). Since we can no longer guarantee the
destination object has any known extension state, we scurry around and wipe
SUBREG_PROMOTED_VAR state for the destination object.
That's fine and dandy, but ultimately insufficient. Consider if the
destination of the optimized extension was used as a source in a simple copy
insn. Furthermore assume that the destination of that copy is used within a
SUBREG expression with SUBREG_PROMOTED_VAR set. ext-dce's actions have
clobbered the SUBREG_PROMOTED_VAR state on the destination of that copy, albeit
indirectly.
This patch addresses this problem by taking the set of pseudos directly
impacted by ext-dce's actions and expands that set by building a transitive
closure for pseudos connected via copies. We then scurry around finding
SUBREG_PROMOTED_VAR state to wipe for everything in that expanded set of
pseudos. Voila, everything just works.
--
The other approach here would be to further expand the liveness sets inside
ext-dce. That's a simpler path forward, but ultimately regresses the quality
of codes we do care about.
One good piece of news is that with the transitive closure bits in place, we
can eliminate a bit of the live set expansion we had in place for
SUBREG_PROMOTED_VAR objects.
--
So let's take one case of the 5 that have been reported.
In ext-dce we have this insn:
> (insn 29 27 30 3 (set (reg:DI 134 [ al_lsm.9 ])
> (zero_extend:DI (subreg:HI (reg:DI 162) 0))) "j.c":17:17 552 {*zero_extendhidi2_bitmanip}
> (expr_list:REG_DEAD (reg:DI 162)
> (nil)))
There are reachable uses of (reg 134):
> (insn 49 47 52 6 (set (mem/c:HI (lo_sum:DI (reg/f:DI 186)
> (symbol_ref:DI ("al") [flags 0x86] <var_decl 0x7ffff73c2da8 al>)) [2 al+0 S2 A16])
> (subreg/s/v:HI (reg:DI 134 [ al_lsm.9 ]) 0)) 279 {*movhi_internal}
> (expr_list:REG_DEAD (reg/f:DI 186)
> (nil)))Obviously safe if we were to remove the extension.
> (insn 52 49 53 6 (set (reg:DI 176)
> (and:DI (reg:DI 134 [ al_lsm.9 ])
> (const_int 5 [0x5]))) "j.c":21:12 106 {*anddi3}
> (expr_list:REG_DEAD (reg:DI 134 [ al_lsm.9 ])
> (nil)))
> (insn 53 52 56 6 (set (reg:SI 177 [ _8 ])
> (zero_extend:SI (subreg:HI (reg:DI 176) 0))) "j.c":21:12 551 {*zero_extendhisi2_bitmanip}
> (expr_list:REG_DEAD (reg:DI 176)
> (nil))) Safe to remove the extension as we only read the low 16 bits from the destination register (reg 176) in insn 53.
> (insn 27 26 29 3 (set (reg:DI 162)
> (sign_extend:DI (plus:SI (subreg/s/v:SI (reg:DI 134 [ al_lsm.9 ]) 0)
> (const_int 1 [0x1])))) "j.c":17:17 8 {addsi3_extended}
> (expr_list:REG_DEAD (reg:DI 134 [ al_lsm.9 ])
> (nil)))
> (insn 29 27 30 3 (set (reg:DI 134 [ al_lsm.9 ])
> (zero_extend:DI (subreg:HI (reg:DI 162) 0))) "j.c":17:17 552 {*zero_extendhidi2_bitmanip}
> (expr_list:REG_DEAD (reg:DI 162)
> (nil)))
Again, not as obvious as the first case, but we only read the low 16 bits from
(reg 162) in insn 29. So those upper bits in (reg 134) don't matter.
> (insn 26 92 27 3 (set (reg:DI 144 [ ivtmp.17 ])
> (reg:DI 134 [ al_lsm.9 ])) 277 {*movdi_64bit}
> (nil))
> (insn 30 29 31 3 (set (reg:DI 135 [ al.2_3 ])
> (sign_extend:DI (subreg/s/v:HI (reg:DI 144 [ ivtmp.17 ]) 0))) "j.c":17:9 558 {*extendhidi2_bitmanip}
> (expr_list:REG_DEAD (reg:DI 144 [ ivtmp.17 ])
> (nil)))Also safe in isolation. But worth noting that if we remove the extension at insn 29, then the promoted status on (reg:DI 144) in insn 30 is no longer valid.
Setting aside the promoted state of (reg:DI 144) at insn 30 for a minute, let's
look into combine.
> (insn 26 92 27 3 (set (reg:DI 144 [ ivtmp.17 ])
> (reg:DI 134 [ al_lsm.9 ])) 277 {*movdi_64bit}
> (nil)) [ ... ]
> (insn 30 29 31 3 (set (reg:DI 135 [ al.2_3 ])
> (sign_extend:DI (subreg/s/v:HI (reg:DI 144 [ ivtmp.17 ]) 0))) "j.c":17:9 558 {*extendhidi2_bitmanip}
> (expr_list:REG_DEAD (reg:DI 144 [ ivtmp.17 ])
> (nil)))
> (jump_insn 31 30 32 3 (set (pc)
> (if_then_else (eq (reg:DI 135 [ al.2_3 ])
> (const_int 0 [0]))
> (label_ref:DI 41)
> (pc))) "j.c":4:55 371 {*branchdi}
> (int_list:REG_BR_PROB 536870913 (nil))
> -> 41)
Combine will do its thing on insns 30/31. Essentially the sign extension is
not necessary in this context, assuming the promoted subreg status in insn 30
-- the equality test doesn't really care about the kind of extension, just
knowing the value is extended is enough to safely elide the extension.
And now we've come to the crux the problem. That promotion state needs to be
adjusted. The new ext-dce code will see that copy at insn 26 and add (reg 144)
to the set of registers that need promotion state wiped. And everything is
happy after that.
The other cases are similar in nature.
--
This has been bootstrapped and regression tested on x86_64 and aarch64.
Variants have bootstrapped & regression tested on several other platforms and
it's survived testing on the crosses as well.
Pushing to the trunk...
PR rtl-optimization/120242
PR rtl-optimization/120627
PR rtl-optimization/120736
PR rtl-optimization/120813
gcc/
* ext-dce.cc (ext_dce_process_uses): Remove some cases where we
unnecessarily expanded live sets for promoted subregs.
(expand_changed_pseudos): New function.
(reset_subreg_promoted_p): Use it.
gcc/testsuite/
* gcc.dg/torture/pr120242.c: New test.
* gcc.dg/torture/pr120627.c: Likewise.
* gcc.dg/torture/pr120736.c: Likewise.
* gcc.dg/torture/pr120813.c: Likewise.
|
|
Modernization; no functional change intended.
gcc/analyzer/ChangeLog:
* checker-event.cc (function_entry_event::get_meaning): Convert
diagnostic_event::meaning enums to enum class.
(cfg_edge_event::get_meaning): Likewise.
(call_event::get_meaning): Likewise.
(return_event::get_meaning): Likewise.
(start_consolidated_cfg_edges_event::get_meaning): Likewise.
(inlined_call_event::get_meaning): Likewise.
(warning_event::get_meaning): Likewise.
* sm-fd.cc (fd_diagnostic::get_meaning_for_state_change):
Likewise.
* sm-file.cc (file_diagnostic::get_meaning_for_state_change):
Likewise.
* sm-malloc.cc (malloc_diagnostic::get_meaning_for_state_change):
Likewise.
* sm-sensitive.cc
(exposure_through_output_file::get_meaning_for_state_change):
Likewise.
* sm-taint.cc (taint_diagnostic::get_meaning_for_state_change):
Likewise.
* varargs.cc
(va_list_sm_diagnostic::get_meaning_for_state_change): Likewise.
gcc/ChangeLog:
* diagnostic-format-sarif.cc
(sarif_builder::maybe_make_kinds_array): Convert
diagnostic_event::meaning enums to enum class.
* diagnostic-path-output.cc (path_label::get_text): Likewise.
* diagnostic-path.cc
(diagnostic_event::meaning::maybe_get_verb_str): Likewise.
(diagnostic_event::meaning::maybe_get_noun_str): Likewise.
(diagnostic_event::meaning::maybe_get_property_str): Likewise.
* diagnostic-path.h (diagnostic_event::verb): Likewise.
(diagnostic_event::noun): Likewise.
(diagnostic_event::property): Likewise.
(diagnostic_event::meaning): Likewise.
gcc/testsuite/ChangeLog:
* gcc.dg/plugin/analyzer_gil_plugin.cc
(gil_diagnostic::get_meaning_for_state_change): Convert
diagnostic_event::meaning enums to enum class.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
The "json" output format for diagnostics was deprecated in GCC 15, with
advice to users seeking machine-readable diagnostics from GCC to use
SARIF instead.
This patch eliminates it from GCC 16, simplifying the diagnostics
subsystem somewhat.
Note that the Ada frontend seems to have its own implementation of this
in errout.adb (Output_JSON_Message), and documented in
gnat_ugn.texi. This patch does not touch Ada.
gcc/ChangeLog:
* Makefile.in (OBJS-libcommon): Drop diagnostic-format-json.o.
* common.opt (fdiagnostics-format=): Drop
"json|json-stderr|json-file".
(diagnostics_output_format): Drop values "json", "json-stderr",
and "json-file".
* diagnostic-format-json.cc: Delete file.
* diagnostic-format.h
(diagnostic_output_format_init_json_stderr): Delete.
(diagnostic_output_format_init_json_file): Delete.
* diagnostic.cc (diagnostic_output_format_init): Delete cases for
DIAGNOSTICS_OUTPUT_FORMAT_JSON_STDERR and
DIAGNOSTICS_OUTPUT_FORMAT_JSON_FILE.
* diagnostic.h (DIAGNOSTICS_OUTPUT_FORMAT_JSON_STDERR): Delete.
(DIAGNOSTICS_OUTPUT_FORMAT_JSON_FILE): Delete.
* doc/invoke.texi: Remove references to json output format.
* doc/ux.texi: Likewise.
* selftest-run-tests.cc (selftest::run_tests): Drop call to
deleted selftest::diagnostic_format_json_cc_tests.
* selftest.h (selftest::diagnostic_format_json_cc_tests): Delete.
gcc/testsuite/ChangeLog:
* c-c++-common/analyzer/out-of-bounds-diagram-1-json.c: Deleted test.
* c-c++-common/diagnostic-format-json-1.c: Deleted test.
* c-c++-common/diagnostic-format-json-2.c: Deleted test.
* c-c++-common/diagnostic-format-json-3.c: Deleted test.
* c-c++-common/diagnostic-format-json-4.c: Deleted test.
* c-c++-common/diagnostic-format-json-5.c: Deleted test.
* c-c++-common/diagnostic-format-json-file-1.c: Deleted test.
* c-c++-common/diagnostic-format-json-stderr-1.c: Deleted test.
* c-c++-common/pr106133.c: Deleted test.
* g++.dg/pr90462.C: Deleted test.
* gcc.dg/plugin/diagnostic-test-paths-3.c: Deleted test.
* gcc.dg/plugin/plugin.exp (plugin_test_list): Remove deleted
test.
* gfortran.dg/diagnostic-format-json-1.F90: Deleted test.
* gfortran.dg/diagnostic-format-json-2.F90: Deleted test.
* gfortran.dg/diagnostic-format-json-3.F90: Deleted test.
* gfortran.dg/diagnostic-format-json-pr105916.F90: Deleted test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
C2Y voted in the
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3466.pdf
paper, which clarifies some of the conditional nonnull cases.
For strncat/__strncat_chk no changes are necessary, we already
use __attribute__((nonnull (1), nonnull_if_nonzero (2, 3))) attributes
on the builtin and glibc can do the same too, meaning that first
argument must be nonnull always and second must be nonnull if
the third one is nonzero.
The problem is with the fread/fwrite changes, where the paper adds:
If size or nmemb is zero,
+ptr may be a null pointer,
fread returns zero and the contents of the array and the state of
the stream remain unchanged.
and ditto for fwrite, so the two argument nonnull_if_nonzero attribute
isn't usable to express that, because whether the pointer can be null
depends on 2 integral arguments rather than one.
The following patch extends the nonnull_if_nonzero attribute, so that
instead of requiring 2 arguments it allows 2 or 3, the first one
is still the pointer argument index which sometimes must not be null
and the other one or two are integral arguments, if there are 2, the
invalid case is only if pointer is null and both the integral arguments
are nonzero.
2025-06-30 Jakub Jelinek <jakub@redhat.com>
PR c/120520
PR c/117023
gcc/
* builtin-attrs.def (DEF_LIST_INT_INT_INT): Define it and
use for 1,2,3.
(ATTR_NONNULL_IF123_LIST): New DEF_ATTR_TREE_LIST.
(ATTR_NONNULL_4_IF123_LIST): Likewise.
* builtins.def (BUILT_IN_FWRITE): Use ATTR_NONNULL_4_IF123_LIST
instead of ATTR_NONNULL_LIST.
(BUILT_IN_FWRITE_UNLOCKED): Likewise.
* gimple.h (infer_nonnull_range_by_attribute): Add another optional
tree * argument defaulted to NULL.
* gimple.cc (infer_nonnull_range_by_attribute): Add OP3 argument,
handle 3 argument nonnull_if_nonzero attribute.
* builtins.cc (validate_arglist): Handle 3 argument nonnull_if_nonzero
attribute.
* tree-ssa-ccp.cc (pass_post_ipa_warn::execute): Likewise.
* ubsan.cc (instrument_nonnull_arg): Adjust
infer_nonnull_range_by_attribute caller, handle 3 argument
nonnull_if_nonzero attribute.
* gimple-range-infer.cc (gimple_infer_range::gimple_infer_range):
Handle 3 argument nonnull_if_nonzero attribute.
* doc/extend.texi (nonnull_if_nonzero): Document 3 argument version
of the attribute.
gcc/c-family/
* c-attribs.cc (c_common_gnu_attributes): Allow 2 or 3 arguments for
nonnull_if_nonzero attribute instead of only 2.
(handle_nonnull_if_nonzero_attribute): Handle 3 argument
nonnull_if_nonzero.
* c-common.cc (struct nonnull_arg_ctx): Rename other member to other1,
add other2 member.
(check_function_nonnull): Clear a if nonnull attribute has an
argument. Adjust for nonnull_arg_ctx changes. Handle 3 argument
nonnull_if_nonzero attribute.
(check_nonnull_arg): Adjust for nonnull_arg_ctx changes, emit different
diagnostics for 3 argument nonnull_if_nonzero attributes.
(check_function_arguments): Adjust ctx var initialization.
gcc/analyzer/
* sm-malloc.cc (malloc_state_machine::on_stmt): Handle 3 argument
nonnull_if_nonzero attribute.
gcc/testsuite/
* gcc.dg/nonnull-9.c: Tweak for 3 argument nonnull_if_nonzero
attribute support, add further tests.
* gcc.dg/nonnull-12.c: New test.
* gcc.dg/nonnull-13.c: New test.
* gcc.dg/nonnull-14.c: New test.
* c-c++-common/ubsan/nonnull-8.c: New test.
* c-c++-common/ubsan/nonnull-9.c: New test.
|
|
I have tested Kugan's patch on exchange2 and noticed multiple problems:
1) with LTO the translation from dwarf names to symbol names is disabled
since we free lang data sooner. I moved the offline pass upstream which
however also may make us miss clones intorduced betwen free lang data
and annotation. This is not very important right now and may be furhter
fixed by splitting off auto-profile-read and offline passes.
2) I noticed that we miss a lot of AFDO inlines because some code compares
name indexes for equality in belief that it compares symbol names. This
is not ture if we drop prefixes. For this reason I integrated get_original_name
into the renaming machinery which actually updates indexes so string table
conitnues to work as symbol table.
This lets me to drop
afdo_string_table->get_index (afdo_string_table->get_name (other->name ()))
hops that were introduced at some places
Now after renaming all afdo instances should go by DECL_ASSEMBLER_NAME
names.
3) Detection of realized offline instances had an ordering issue where we
omitted marking of those that were offlined later. Since we can now
lookup assembler names, I simplified the logic into single-pass.
autoprofiledbootstrapped/regteted x86_64-linux, comitted.
gcc/ChangeLog:
* auto-profile.cc (get_original_name): Only strip suffixes introduced
after auto-fdo annotation.
(string_table::get_index_by_decl): Simplify.
(string_table::add_name): New member function.
(string_table::read): Micro-optimize allocation.
(function_instance::get_function_instance_by_decl): Dump reasons
for failure; try to compensate lost discriminators.
(function_instance::merge): Simplify sanity check; do not check
for realized flag; fix merging of targets.
(function_instance::offline_if_in_set): Simplify.
(function_instance::dump): Sanity check that names are consistent.
(autofdo_source_profile::offline_external_functions): Also handle
stripping suffixes.
(walk_block): Move up in source.
(autofdo_source_profile::offline_unrealized_inlines): Also compute
realized functions.
(autofdo_source_profile::get_function_instance_by_name_index): Simplify.
(autofdo_source_profile::add_function_instance): Simplify.
(autofdo_source_profile::read): Do not strip suffxies; error on duplicates.
(mark_realized_functions): Remove.
(auto_profile): Do not call mark_realized_functions.
* passes.def: Move auto_profile_offline before free_lang_data.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-prof/clone-test.c: New test.
* gcc.dg/tree-prof/clone-merge-1.c: Updae template.
Co-authored-by: Kugan Vivekanandarajah <kvivekananda@nvidia.com>
|
|
This patch should finish the oflining infrastructure by offlining
(prior AFDO annotation) all inline function instances that was not
early inlined. This is mostly the case of recursive inlining or when
-fno-auto-profile-inlining is used which sould now produce comparable
code.
I also cleaned up offlining of self-recursive functions which now
happens through the worklist and reduces problem with recursive ivocation
of the funciton merging modifying datastructures at unexpected places.
gcc/ChangeLog:
* auto-profile.cc (function_instance::set_name,
function_instance::set_realized, function_instnace::realized_p,
function_instance::set_in_worklist,
function_instance::clear_in_worklist,
function_instance::in_worklist_p): New member functions.
(function_instance::in_worklist, function_instance::realized_):
new.
(get_relative_location_for_locus): Break out from ....
(get_relative_location_for_stmt): ... here.
(function_instance::~function_instance): Sanity check that
removed function is not in worklist.
(function_instance::merge): Do not offline realized instances.
(function_instance::offline): Make private; add duplicate functions
to worklist rather then merging immediately.
(function_instance::offline_if_in_set): Cleanup.
(function_instance::remove_external_functions): Likewise.
(function_instance::offline_if_not_realized): New member function.
(autofdo_source_profile::offline_external_functions): Handle delayed
functions.
(autofdo_source_profile::offline_unrealized_inlines): New member function.
(walk_block): New function.
(mark_realized_functions): New function.
(afdo_annotate_cfg): Fix dump.
(auto_profile): Mark realized functions and offline rest; do not compute
fn summary.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-prof/afdo-crossmodule-1.c: Update template.
|
|
The following amends the SLP addsub pattern to also match blends
of .FMA/.FMS and form .FMADDSUB even when -ffp-contract=off.
PR tree-optimization/120808
* tree-vect-slp-patterns.cc (vect_match_expression_p):
Take a code_helper and also match calls.
(addsub_pattern::recognize): Handle .FMA/.FMS pairs
in addition to PLUS/MINUS.
(addsub_pattern::build): Adjust.
* gcc.dg/vect/bb-slp-pr120808.c: Now also expect FMADDSUB
patterns to be matched.
|
|
gcc/ChangeLog:
PR analyzer/120809
* diagnostic-format-html.cc
(html_builder::maybe_make_state_diagram): Bulletproof against the
SVG generation failing.
* xml.cc (xml::printer::push_element): Assert that the ptr is
nonnull.
(xml::printer::append): Likewise.
gcc/testsuite/ChangeLog:
PR analyzer/120809
* gcc.dg/analyzer/state-diagram-5.c: Split out into...
* gcc.dg/analyzer/state-diagram-5-html.c: ...this, adding
dg-require-dot...
* gcc.dg/analyzer/state-diagram-5-sarif.c: ...and this.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
This patch adds a testcase that offlining works and profile info is not lost.
While doing it I noticed a pasto that made the dump to be "afdo" and not
"afdo_offline" and also that not all functions are processed as the range
for does not expect new values to be put to the vector. Fixed thus.
gcc/ChangeLog:
* auto-profile.cc (function_instance::merge): Add TODO.
(autofdo_source_profile::offline_external_functions):
Do not use range for on the worklist.
* timevar.def (TV_IPA_AUTOFDO_OFFLINE): New timevar.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-prof/afdo-crossmodule-1.c: New test.
* gcc.dg/tree-prof/afdo-crossmodule-1b.c: New test.
|
|
This patch fixes some of cases where we lose profile info because we do not
perform inlining that happened at train run before AFDO annotation is done.
This is a common problem with LTO in the case cross-module inlining happened.
I added afdo_offline pass that does two things:
1) collect set of all functions defined in current unit
2) walk all toplevel function instances. If function instance correspond
to a defined symbol, walk everything inlined to it. If crossmodule
inlining is seen, remove the inline instances and recursively look into
inline instnaces that go back to the current unit and turn them to offline
ones
If function instance corresponds to external symbol, remove it but
also look for functions inlined to it that belong to current module.
When merging profile we also need to recursively merge profiles of inlined
functions and if the inlining decisins does not match, offline the bodies.
This is somewhat fragile since recursive calls may trigger modifications of
functions currently being merged, but I hope I chased away problems with that -
will give it a second tought to see if this can be reorganized into a worklist
fashion that is more safe.
I noticed that functions may appear in the afdo data either as their
symbol name or dwarf name (since inline functions may not have known symbol
name). There is already some logic to handle that but it is broken in the
case both names are used.
To mitigate the problem I also added logic to translate dwarf names
to symbol names in case both are used. This prevents profile loss i.e.
in exchange2. Here digits_2 function appears by its dwarf name (digits_2)
but also is clonned which makes it to appear by its symbol name (__*digits_2)
All profile massaging is done before early optimization so the VPT targets of
offline bodies are correct. We still will lose profile if early inlining
fails. I will add second pass to afdo to offline these.
Last problem is that in case we early inlined more than expected (which now
happens more often due to offlining) the profile will be lost and filled by
static profile. Problem here is that we need to somehow scale the profile of
inline instance but I do not see how to determine invocation counts. Will try
to look into that incrementally - perhaps we can keep some info from offlining.
There is also now a dump infrastructure that prints the proflie in a
the same format as dump_gcov tool.
autoprofiledbootstraped, regsted x86_64-linux, will commit it shortly.
Honza
gcc/ChangeLog:
* auto-profile.cc (name_index_set, name_index_map): New types.
(dump_afdo_loc): New function.
(dump_inline_stack): Simplify.
(function_instance::merge): Merge recursively inlined functions;
offline if necessary; collect new fnctions.
(function_instance::offline): New member function.
(function_instance::offline_if_in_set): New member function.
(function_instance::remove_external_functions): New member function.
(function_instance::dump): New member function.
(function_instance::debug): New member function.
(function_instance::dump_inline_stack): New member function.
(function_instance::find_icall_target_map): Use removed_icall_target.
(function_instance::remove_icall_target): Only mark icall target removed.
(autofdo_source_profile::offline_external_functions): New function.
(function_instance::read_function_instance): Record inlined_to pointers;
use -1 for unknown head counts.
(autofdo_source_profile::get_function_instance_by_name_index): New
function.
(autofdo_source_profile::add_function_instance): New member function.
(autofdo_source_profile::read): Do not leak memory; fix formatting.
(read_profile): Fix formatting.
(afdo_annotate_cfg): LIkewise.
(class pass_ipa_auto_profile_offline): New pass.
(make_pass_ipa_auto_profile_offline): New function.
* passes.def (pass_ipa_auto_profile_offline): Add
* tree-pass.h (make_pass_ipa_auto_profile): Declare
gcc/testsuite/ChangeLog:
* gcc.dg/tree-prof/indir-call-prof-2.c: Update template.
|
|
The following adds the ability to vectorize a fma reduction pair
as SLP reduction (we cannot yet handle ternary association in
reduction vectorization yet).
PR tree-optimization/109892
* tree-vect-loop.cc (check_reduction_path): Handle fma.
(vectorizable_reduction): Apply FOLD_LEFT_REDUCTION code
generation constraints.
* gcc.dg/vect/vect-reduc-fma-1.c: New testcase.
* gcc.dg/vect/vect-reduc-fma-2.c: Likewise.
* gcc.dg/vect/vect-reduc-fma-3.c: Likewise.
|
|
The following allows SLP build to succeed when mixing .FMA/.FMS
in different lanes like we handle mixed plus/minus. This does not
yet address SLP pattern matching to not being able to form
a FMADDSUB from this.
PR tree-optimization/120808
* tree-vectorizer.h (compatible_calls_p): Add flag to
indicate a FMA/FMS pair is allowed.
* tree-vect-slp.cc (compatible_calls_p): Likewise.
(vect_build_slp_tree_1): Allow mixed .FMA/.FMS as two-operator.
(vect_build_slp_tree_2): Handle calls in two-operator SLP build.
* tree-vect-slp-patterns.cc (compatible_complex_nodes_p):
Adjust.
* gcc.dg/vect/bb-slp-pr120808.c: New testcase.
|
|
This patch adds various support for debugging diagnostic paths and
events, intended initially for myself to help with debugging -fanalyzer.
It adds the optional ability for a diagnostic_event to supply a
description of the predicted state of the program at that point along
the diagnostic_path. To isolate the diagnostic subsystem from the
analyzer, this representation is currently an xml::document with custom
elements. The XML representation is similar to the analyzer's internal
state but can be easier to read - for example, rather than storing the
contents of memory via byte offsets, it uses fields for structs and
element indexes for arrays, recursively.
These states are handled by the HTML and SARIF diagnostic sinks.
The SARIF sink simply embeds the XML as a string in a property bag of the
threadFlowLocation object (SARIF v2.1.0 section 3.38).
For HTML output, the "experimental-html" sink gains a new
"show-state-diagrams=yes" option i.e.:
-fdiagnostics-add-output=experimental-html:show-state-diagrams=yes
which converts the state XML into SVG diagrams visualizing the state of
memory at each event, inspired by the "ddd" debugger. These can be seen
by pressing 'j' and 'k' to single-step forward and backward through
events, making it *much* easier to debug -fanalyzer.
An example of output can be seen here:
https://dmalcolm.fedorapeople.org/gcc/2025-06-23/state-diagram-1.c.html
showing an issue in a singly-linked list; there are various other
examples in the parent directory.
Generating the SVG diagrams requires an invocation of "dot" per event,
so it noticeable slows down diagnostic emission, hence the opt-in
command-line flag. However, I'm already finding bugs in -fanalyzer with
this that I hadn't seen before.
Given that the UI is rather clunky and there is lots of room for
improvement to the visualizations, for now this feature is marked
as being for GCC developers, not end-users.
The patch also adds a dot::ast_node class hierarachy to make it easy to
create GraphViz dot files with the correct escaping, and adds a C++
wrapper around pex adding some syntactic sugar for invoking
subprocesses.
gcc/ChangeLog:
PR other/116792
* Makefile.in (ANALYZER_OBJS): Add
analyzer/ana-state-to-diagnostic-state.o.
(OBJS): Move graphviz.o to...
(OBJS-libcommon): ...here. Add diagnostic-state-to-dot.o and pex.o.
* diagnostic-format-html.cc: Include "diagnostic-state.h" and
"graphviz.h".
(html_generation_options::html_generation_options): Initialize the
new flags.
(HTML_SCRIPT): Add function "get_any_state_diagram". Use it
when changing current focus id to update the visibility of the
pertinent diagram, if any.
(print_pre_source): New.
(html_builder::maybe_make_state_diagram): New.
(html_path_label_writer::html_path_label_writer): Add "path" param.
Initialize m_path and m_curr_event_id.
(html_path_label_writer::begin_label): Store current event id.
(html_path_label_writer::end_label): Attempt to make a state
diagram and add it if successful.
(html_path_label_writer::get_element_id): New.
(html_path_label_writer::m_path): New field.
(html_path_label_writer::m_curr_event_id): New field.
(html_builder::make_element_for_diagnostic): Pass path to label
writer.
* diagnostic-format-html.h
(html_generation_options::m_show_state_diagrams): New field.
(html_generation_options::m_show_state_diagram_xml): New field.
(html_generation_options::m_show_state_diagram_dot_src): New field.
* diagnostic-format-sarif.cc: Include "xml.h".
(populate_thread_flow_location_object): If requested, attempt to
generate xml state and add it to the proeprty bag as
"gcc/diagnostic_event/xml_state" in xml source form.
(sarif_generation_options::sarif_generation_options): Initialize
m_xml_state.
* diagnostic-format-sarif.h
(sarif_generation_options::m_xml_state): New field.
* diagnostic-path.cc: Define INCLUDE_MAP. Include "xml.h".
(diagnostic_event::maybe_make_xml_state): New.
* diagnostic-path.h (class xml::document): New forward decl.
(diagnostic_event::maybe_make_xml_state): New vfunc decl.
* diagnostic-state-to-dot.cc: New file.
* diagnostic-state.h: New file.
* digraph.cc: Define INCLUDE_STRING and INCLUDE_VECTOR.
* doc/analyzer.texi: Document state diagrams in html output.
(__analyzer_dump_dot): New.
(__analyzer_dump_xml): New.
* doc/invoke.texi (sarif): Add "xml-state" key.
(experimental-html): Add keys "show-state-diagrams",
"show-state-diagrams-dot-src" and "show-state-diagrams-xml".
* graphviz.cc: Define INCLUDE_MAP, INCLUDE_STRING, and
INCLUDE_VECTOR. Include "xml.h", "xml-printer.h", "pex.h" and
"selftest.h".
(graphviz_out::graphviz_out): Extract...
(dot::writer::writer): ...this.
(graphviz_out::write_indent): Convert to...
(dot::writer::write_indent): ...this.
(graphviz_out::print): Use get_pp.
(graphviz_out::println): Likewise.
(graphviz_out::begin_tr): Likewise.
(graphviz_out::end_tr): Likewise.
(graphviz_out::begin_td): Likewise.
(graphviz_out::end_td): Likewise.
(graphviz_out::begin_trtd): Likewise.
(graphviz_out::end_tdtr): Likewise.
(dot::ast_node::dump): New.
(dot::id::id): New.
(dot::id::print): New.
(dot::id::is_identifier_p): New.
(dot::kv_pair::print): New.
(dot::attr_list::print): New.
(dot::stmt_list::print): New.
(dot::stmt_list::add_edge): New.
(dot::stmt_list::add_attr): New.
(dot::graph::print): New.
(dot::stmt_with_attr_list::set_label): New.
(dot::node_stmt::print): New.
(dot::attr_stmt::print): New.
(dot::kv_stmt::print): New.
(dot::node_id::print): New.
(dot::port::print): New.
(dot::edge_stmt::print): New.
(dot::subgraph::print): New.
(dot::make_svg_document_buffer_from_graph): New.
(dot::make_svg_from_graph): New.
(selftest:test_ids): New.
(selftest:test_trivial_graph): New.
(selftest:test_layout_example): New.
(selftest:graphviz_cc_tests): New.
* graphviz.h (xml::node): New forward decl.
(class graphviz_out): Split out into...
(class dot::writer): ...this new class
(struct dot::ast_node): New.
(struct dot::id): New.
(struct dot::kv_pair): New.
(struct dot::attr_list): New.
(struct dot::stmt_list): New.
(struct dot::graph): New.
(struct dot::stmt): New.
(struct dot::stmt_with_attr_list): New.
(struct dot::node_stmt): New.
(struct dot::attr_stmt): New.
(struct dot::kv_stmt): New.
(enum class dot::compass_pt): New.
(struct dot::port): New.
(struct dot::node_id): New.
(struct dot::edge_stmt): New.
(struct dot::subgraph): New.
(dot::make_svg_from_graph): New.
* opts-diagnostic.cc (sarif_scheme_handler::make_sink): Add
"xml-state" flag.
(html_scheme_handler::make_sink): Add flags "show-state-diagrams",
"show-state-diagram-dot-src", and "show-state-diagram-xml".
* pex.cc: New file.
* pex.h: New file.
* selftest-run-tests.cc (selftest::run_tests): Call
graphviz_cc_tests.
* selftest.h (selftest::graphviz_cc_tests): New decl.
* xml.cc (xml::node_with_children::add_comment): New.
(xml::node_with_children::find_child_element): New.
(xml::element::get_attr): New.
(xml::comment::write_as_xml): New.
(selftest::test_printer): Add coverage of find_child_element and
get_attr.
(selftest::test_comment): New.
(selftest::xml_cc_tests): Call test_comment.
* xml.h: New forward decls.
(xml::node::dyn_cast_text): Use nullptr.
(xml::node::dyn_cast_element): New vfunc.
(xml::node_with_children::add_comment): New decl.
(xml::node_with_children::find_child_element): New decl.
(xml::element::dyn_cast_element): New vfunc impl.
(xml::element::get_attr): New decl.
(struct xml::comment): New xml::node subclass.
gcc/analyzer/ChangeLog:
PR other/116792
* ana-state-to-diagnostic-state.cc: New file.
* ana-state-to-diagnostic-state.h: New file.
* checker-event.cc: Include "xml.h".
(checker_event::checker_event): Initialize m_path.
(checker_event::prepare_for_emission): Store the path pointer into
m_path.
(checker_event::maybe_make_xml_state): New.
(function_entry_event::function_entry_event): Add "state" param
and use it to initialize m_state.
(superedge_event::get_program_state): New.
(call_event::get_program_state): New.
(warning_event::get_program_state): New.
* checker-event.h (checker_event::get_program_state): New vfunc.
(checker_event::maybe_make_xml_state): New decl.
(checker_event::m_path): New field.
(statement_event::get_program_state): New vfunc impl.
(function_entry_event::function_entry_event): Add "state" param.
(function_entry_event::get_program_state): New vfunc impl.
(function_entry_event::m_state): New field.
(state_change_event::get_program_state): New vfunc impl.
(superedge_event::get_program_state): New vfunc decl.
(warning_event::warning_event): Add "program_state_" param and
copy it.
(warning_event::get_program_state): New vfunc decl.
(warning_event::m_program_state): New field.
* checker-path.h (checker_path::checker_path): Add ext_state param.
(checker_path::get_ext_state): New accessor.
(checker_path::m_ext_state): New field.
* common.h: Define INCLUDE_MAP and INCLUDE_STRING.
* diagnostic-manager.cc (saved_diagnostic::operator==): Don't
deduplicate dump_path_diagnostic instances.
(diagnostic_manager::emit_saved_diagnostic): Pass ext_state to
checker_path ctor.
* engine.cc:
(impl_region_model_context::on_state_leak): Pass old and new state
to state_machine::on_leak.
(exploded_node::on_stmt_pre): Implement __analyzer_dump_xml and
__analyzer_dump_dot.
* exploded-graph.h (impl_region_model_context::get_state): New.
* infinite-recursion.cc
(recursive_function_entry_event::recursive_function_entry_event):
Add "dst_state" param and pass to function_entry_event ctor.
(infinite_recursion_diagnostic::add_function_entry_event): Pass state
to event ctor.
* kf-analyzer.cc: Include "analyzer/program-state.h"
(dump_path_diagnostic::dump_path_diagnostic): Add "state" param.
(dump_path_diagnostic::get_final_state): New.
(dump_path_diagnostic::m_state): New field.
(kf_analyzer_dump_path::impl_call_pre): Pass state to warning.
* pending-diagnostic.cc
(pending_diagnostic::add_function_entry_event): Pass state to
function_entry_event.
(pending_diagnostic::add_final_event): Likewise to warning_event.
* pending-diagnostic.h (pending_diagnostic::get_final_state): New
vfunc decl.
* program-state.cc: Include "diagnostic-state.h", "graphviz.h" and
"analyzer/ana-state-to-diagnostic-state.h".
(program_state::dump_dot): New.
* program-state.h: Include "text-art/tree-widget.h" and
"analyzer/store.h".
(class xml::document): New forward decl.
(make_xml): New.
(dump_xml_to_pp): New.
(dump_xml_to_file): New.
(dump_xml): New.
(dump_dot): New.
* record-layout.cc (record_layout::record_layout): Make param
const_tree.
* record-layout.h (item::item): Likewise.
(item::m_field): Likewise.
(record_layout::record_layout): Likewise.
(record_layout::begin): New.
(record_layout::end): New.
* region-model.cc
(exposure_through_uninit_copy::complain_about_fully_uninit_item):
Use const_tree.
(exposure_through_uninit_copy::complain_about_partially_uninit_item):
Likewise.
* region-model.h (region_model_context::get_state): New vfunc.
(noop_region_model_context::get_state): New.
(region_model_context_decorator::get_state): New.
* sm-fd.cc (fd_leak::fd_leak): Add "final_state" param and capture
it if present.
(fd_leak::get_final_state): New.
(fd_leak::m_final_state): New.
(fd_state_machine::on_open): Pass nullptr for new "final_state"
param.
(fd_state_machine::on_creat): Likewise.
(fd_state_machine::on_socket): Likewise.
(fd_state_machine::on_accept): Likewise.
(fd_state_machine::on_leak): Add state params and pass new state
as final state to fd_leak ctor.
* sm-file.cc: Include "analyzer/program-state.h".
(file_leak::file_leak): Add "final_state" param and capture it if
present.
(file_leak::get_final_state): New.
(file_leak::m_final_state): New.
(fileptr_state_machine::on_leak): Add state params and pass new
state as final state to fd_leak ctor.
* sm-malloc.cc: Include
"analyzer/ana-state-to-diagnostic-state.h".
(malloc_leak::malloc_leak): Add "final_state" param and use it.
(malloc_leak::get_final_state): New vfunc impl.
(malloc_leak::m_final_state): New field.
(malloc_state_machine::on_leak): Add state params; capture final
state.
(malloc_state_machine::add_state_to_xml): New.
* sm.cc (state_machine::on_leak): Add "old_state" and "new_state"
params. Use nullptr.
(state_machine::add_state_to_xml): New.
(state_machine::add_global_state_to_xml): New.
* sm.h (class xml_state): New forward decl.
(state_machine::on_leak): Add state params.
(state_machine::add_state_to_xml): New vfunc decl.
(state_machine::add_global_state_to_xml): New vfunc decl.
* store.h (bit_range::operator<): New.
* varargs.cc (va_list_leak::va_list_leak): Add final_state param
and capture it if non-null.
(va_list_leak::get_final_state): New.
(va_list_leak::m_final_state): New.
(va_list_state_machine::on_leak): Add state params and pass final
state to va_list_leak ctor.
gcc/testsuite/ChangeLog:
PR other/116792
* g++.dg/analyzer/state-diagram.C: New test.
* gcc.dg/analyzer/analyzer-decls.h (__analyzer_dump_dot): New
decl.
(__analyzer_dump_xml): New decl.
* gcc.dg/analyzer/state-diagram-1-sarif.py: New test script.
* gcc.dg/analyzer/state-diagram-1.c: New test.
* gcc.dg/analyzer/state-diagram-2.c: New test.
* gcc.dg/analyzer/state-diagram-3.c: New test.
* gcc.dg/analyzer/state-diagram-4.c: New test.
* gcc.dg/analyzer/state-diagram-5-html.py: New test script.
* gcc.dg/analyzer/state-diagram-5-sarif.py: New test script.
* gcc.dg/analyzer/state-diagram-5.c: New test.
* gcc.dg/plugin/analyzer_cpython_plugin.cc: Define INCLUDE_STRING.
* gcc.dg/plugin/analyzer_gil_plugin.cc: Likewise.
* gcc.dg/plugin/analyzer_kernel_plugin.cc: Likewise.
* gcc.dg/plugin/analyzer_known_fns_plugin.cc: Likewise.
* lib/htmltest.py (ns): Add SVG namespace.
* lib/sarif.py (get_result_by_index): New.
(get_xml_state): New.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
Current GCC uses either peeling or versioning, but not in combination,
to handle unaligned data references (DRs) during vectorization. This
limitation causes some loops with early break to fall back to scalar
code at runtime.
Consider the following loop with DRs in its early break condition:
for (int i = start; i < end; i++) {
if (a[i] == b[i])
break;
count++;
}
In the loop, references to a[] and b[] need to be strictly aligned for
vectorization because speculative reads that may cross page boundaries
are not allowed. Current GCC does versioning for this loop by creating a
runtime check like:
((&a[start] | &b[start]) & mask) == 0
to see if two initial addresses both have lower bits zeros. If above
runtime check fails, the loop will fall back to scalar code. However,
it's often possible that DRs are all unaligned at the beginning but they
become all aligned after a few loop iterations. We call this situation
DRs being "mutually aligned".
This patch enables combined peeling and versioning to avoid loops with
mutually aligned DRs falling back to scalar code. Specifically, the
function vect_peeling_supportable is updated in this patch to return a
three-state enum indicating how peeling can make all unsupportable DRs
aligned. In addition to previous true/false return values, a new state
peeling_maybe_supported is used to indicate that peeling may be able to
make these DRs aligned but we are not sure about it at compile time. In
this case, peeling should be combined with versioning so that a runtime
check will be generated to guard the peeled vectorized loop.
A new type of runtime check is also introduced for combined peeling and
versioning. It's enabled when LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT is true.
The new check tests if all DRs recorded in LOOP_VINFO_MAY_MISALIGN_STMTS
have the same lower address bits. For above loop case, the new test will
generate an XOR between two addresses, like:
((&a[start] ^ &b[start]) & mask) == 0
Therefore, if a and b have the same alignment step (element size) and
the same offset from an alignment boundary, a peeled vectorized loop
will run. This new runtime check also works for >2 DRs, with the LHS
expression being:
((a1 ^ a2) | (a2 ^ a3) | (a3 ^ a4) | ... | (an-1 ^ an)) & mask
where ai is the address of i'th DR.
This patch is bootstrapped and regression tested on x86_64-linux-gnu,
arm-linux-gnueabihf and aarch64-linux-gnu.
gcc/ChangeLog:
* tree-vect-data-refs.cc (vect_peeling_supportable): Return new
enum values to indicate if combined peeling and versioning can
potentially support vectorization.
(vect_enhance_data_refs_alignment): Support combined peeling and
versioning in vectorization analysis.
* tree-vect-loop-manip.cc (vect_create_cond_for_align_checks):
Add a new type of runtime check for mutually aligned DRs.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Set
default value of allow_mutual_alignment in the initializer list.
* tree-vectorizer.h (enum peeling_support): Define type of
peeling support for function vect_peeling_supportable.
(LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT): New access macro.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-early-break_133_pfa6.c: Adjust test.
|
|
This patch makes the AFDO's VPT to happen during early inlining. This should
make the einline pass inside afdo pass unnecesary, but some inlining still
happens there - I will need to debug why that happens and will try to drop the
afdo's inliner incrementally.
get_inline_stack_in_node can now be used to produce inline stack out of
callgraph nodes which are marked as inline clones, so we do not need to iterate
tree-inline and IPA decisions phases like old code did. I also added some
debug facilities - dumping of decisions and inline stacks, so one can match
them with data in gcov profile.
Former VPT pass identified all caes where in train run indirect call was inlined
and the inlined callee collected some samples. In this case it forced inline without
doing any checks, such as whether inlining is possible.
New code simply introduces speculative edges into callgraph and lets afdo inlining
to decide. Old code also marked statements that were introduced during promotion
to prevent doing double speculation i.e.
if (ptr == foo)
.. inlined foo ...
else
ptr ();
to
if (ptr == foo)
.. inlined foo ...
else if (ptr == foo)
foo (); // for IPA inlining
else
ptr ();
Since inlning now happens much earlier, tracking the statements would be quite hard.
Instead I simply remove the targets from profile data which sould have same effect.
I also noticed that there is nothing setting max_count so all non-0 profile is
considered hot which I fixed too.
Training with ref run I now get
500.perlbench_r 1 160 9.93 * 1 162 9.84 *
502.gcc_r NR NR
505.mcf_r 1 186 8.68 * 1 194 8.34 *
520.omnetpp_r 1 183 7.15 * 1 208 6.32 *
523.xalancbmk_r NR NR
525.x264_r 1 85.2 20.5 * 1 85.8 20.4 *
531.deepsjeng_r 1 165 6.93 * 1 176 6.51 *
541.leela_r 1 268 6.18 * 1 282 5.87 *
548.exchange2_r 1 86.3 30.4 * 1 88.9 29.5 *
557.xz_r 1 224 4.81 * 1 224 4.82 *
Est. SPECrate2017_int_base 9.72
Est. SPECrate2017_int_peak 9.33
503.bwaves_r NR NR
507.cactuBSSN_r 1 107 11.9 * 1 105 12.0 *
508.namd_r 1 108 8.79 * 1 116 8.18 *
510.parest_r 1 143 18.3 * 1 156 16.8 *
511.povray_r 1 188 12.4 * 1 163 14.4 *
519.lbm_r 1 72.0 14.6 * 1 75.0 14.1 *
521.wrf_r 1 106 21.1 * 1 106 21.1 *
526.blender_r 1 147 10.3 * 1 147 10.4 *
527.cam4_r 1 110 15.9 * 1 118 14.8 *
538.imagick_r 1 104 23.8 * 1 105 23.7 *
544.nab_r 1 146 11.6 * 1 143 11.8 *
549.fotonik3d_r 1 134 29.0 * 1 169 23.1 *
554.roms_r 1 86.6 18.4 * 1 89.3 17.8 *
Est. SPECrate2017_fp_base 15.4
Est. SPECrate2017_fp_peak 14.9
Base is without profile feedback and peak is AFDO.
gcc/ChangeLog:
* auto-profile.cc (dump_inline_stack): New function.
(get_inline_stack_in_node): New function.
(get_relative_location_for_stmt): Add FN parameter.
(has_indirect_call): Remove.
(function_instance::find_icall_target_map): Add FN parameter.
(function_instance::remove_icall_target): New function.
(function_instance::read_function_instance): Set sum_max.
(autofdo_source_profile::get_count_info): Add NODE parameter.
(autofdo_source_profile::update_inlined_ind_target): Add NODE parameter.
(autofdo_source_profile::remove_icall_target): New function.
(afdo_indirect_call): Add INDIRECT_EDGE parameter; dump reason
for failure; do not check for recursion; do not inline call.
(afdo_vpt): Add INDIRECT_EDGE parameter.
(afdo_set_bb_count): Do not take PROMOTED set.
(afdo_vpt_for_early_inline): Remove.
(afdo_annotate_cfg): Do not take PROMOTED set.
(auto_profile): Do not call afdo_vpt_for_early_inline.
(afdo_callsite_hot_enough_for_early_inline): Dump count.
(remove_afdo_speculative_target): New function.
* auto-profile.h (afdo_vpt_for_early_inline): Declare.
(remove_afdo_speculative_target): Declare.
* ipa-inline.cc (inline_functions_by_afdo): Do VPT.
(early_inliner): Redirecct edges if inlining happened.
* tree-inline.cc (expand_call_inline): Add sanity check.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-prof/afdo-vpt-earlyinline.c: Update template.
* gcc.dg/tree-prof/indir-call-prof-2.c: Update template.
|
|
This patch moves afdo inlining from early inliner into specialized one.
The reason is that early inliner is by design non-recursive while afdo
inliner needs to recurse. In the past google handled it by increasing
early inliner iterations, but it can be done easily and cheaply without
it by simply recusing into inlined functions.
I will also look into moving VPT to early inliner now.
Bootstrapped/regtested x86_64-linux, comitted.
gcc/ChangeLog:
* auto-profile.cc (get_inline_stack): Add fn parameter.
* ipa-inline.cc (want_early_inline_function_p): Do not care
about AFDO.
(inline_functions_by_afdo): New function.
(early_inliner): Use it.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-prof/afdo-vpt-earlyinline.c: Update template.
* gcc.dg/tree-prof/indir-call-prof-2.c: Likewise.
* gcc.dg/tree-prof/afdo-inline.c: New test.
|
|
when snapping range bounds to satidsdaybitmask constraints, end bound overflow
and underflow checks were not working properly.
Also Adjust some comments, and enhance verify_range to make sure range pairs
are sorted properly.
PR tree-optimization/120701
gcc/
* value-range.cc (irange::verify_range): Verify range pairs are
sorted properly.
(irange::snap): Check for over/underflow properly.
gcc/testsuite/
* gcc.dg/pr120701.c: New.
|
|
The following ICEs as we hand down an UNDEFINED range to where it
isn't expected. Put the guard that's there earlier.
PR tree-optimization/120654
* vr-values.cc (range_fits_type_p): Check for undefined_p ()
before accessing type ().
* gcc.dg/torture/pr120654.c: New testcase.
|
|
Unfortunately, the following further testcase shows that there aren't
problems only with very large precisions and large exponents, but pretty
much anything larger than 64-bits. After all, before _BitInt support dfp
didn't even have {,unsigned }__int128 <-> _Decimal{32,64,128,64x} support,
and the testcase again shows some of the conversions yielding zeros.
While the pr120631.c test worked even without the earlier patch.
So, this patch assumes 64-bit precision at most is ok and for anything
larger it just uses exponent 0 and multiplies afterwards.
2025-06-19 Jakub Jelinek <jakub@redhat.com>
PR middle-end/120631
* dfp.cc (decimal_real_to_integer): Use result multiplication not just
when precision > 128 and dn.exponent > 19, but when precision > 64
and dn.exponent > 0.
* gcc.dg/dfp/bitint-10.c: New test.
* gcc.dg/dfp/pr120631.c: New test.
|
|
gcc/testsuite/
* gcc.dg/pr119039-1.c: Add space in search criteria.
|
|
Improve the way contains_p (wide_int) and intersect behave wioth
singletons and bitmasks. Also fix a buglet in bitmask_intersect when the
result is a singleton which is not in the current range.
PR tree-optimization/119039
gcc/
* value-range.cc (irange::contains_p): Call wide_int version of
contains_p for singleton ranges.
(irange::intersect): If either range is a singleton, use
contains_p.
gcc/testsuite/
* gcc.dg/pr119039-2.c: New.
|
|
Adjust simplify_switch_using_ranges to use irange rather than relying
on the older legacy_range mechaism.
PR tree-optimization/119039
gcc/
* vr-values.cc (simplify_using_ranges::legacy_fold_cond): Remove.
(simplify_using_ranges::simplify_switch_using_ranges): Adjust.
gcc/testsuite/
* gcc.dg/pr119039-1.c: New.
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust thread counts.
|
|
large _BitInt [PR120631]
The following testcase shows that while at runtime we handle conversions
between _Decimal{64,128} and large _BitInt correctly, at compile time we
mishandle them in both directions, in one direction we end up in ICE in
decimal_from_integer callee because the char buffer is too short for the
needed number of decimal digits, in the conversion of dfp to large _BitInt
we return 0 in the wide_int.
The following patch fixes the ICE by using larger buffer (XALLOCAVEC
allocated, it will be never larger than 65536 / 3 bytes) in the larger
_BitInt case, and the other direction by setting exponent to exp % 19
and instead multiplying the result by needed powers of 10^19 (10^19 chosen
as largest power of ten that can fit into UHWI).
2025-06-18 Jakub Jelinek <jakub@redhat.com>
PR middle-end/120631
* real.cc (decimal_from_integer): Add digits argument, if larger than
256, use XALLOCAVEC allocated buffer.
(real_from_integer): Pass val_in's precision divided by 3 to
decimal_from_integer.
* dfp.cc (decimal_real_to_integer): For precision > 128 if finite
and exponent is large, decrease exponent and multiply resulting
wide_int by powers of 10^19.
* gcc.dg/dfp/bitint-9.c: New test.
|
|
Ensure all subrange endpoints conform to the bitmask.
PR tree-optimization/120661
gcc/
* value-range.cc (irange::snap): New.
(irange::snap_subranges): New.
(irange::set_range_from_bitmask): Call snap_subranges.
* value-range.h (snap, snap_subranges): New prototypes.
gcc/testsuite/
* gcc.dg/pr120661-1.c: New.
* gcc.dg/pr120661-2.c: New.
|
|
gcc/testsuite/ChangeLog:
* gcc.dg/tree-prof/afdo-vpt-earlyinline.c: New test.
|
|
Testcase pr119160.c fails with symbol referencing errors for
`__cyg_profile_func_enter` and `__cyg_profile_func_exit` on non-glibc
systems.
This patch adds empty definitions for `__cyg_profile_func_enter`
and `__cyg_profile_func_exit` in order to prevent those errors.
PR testsuite/119862
gcc/testsuite/ChangeLog:
* gcc.dg/pr119160.c: Added empty definitions for
`__cyg_profile_func_enter` and `__cyg_profile_func_exit`
functions.
|
|
This pass reuses a SSA_NAME on the lhs of sqrt etc. call as lhs
of .RSQRT etc. call. The following testcase is miscompiled since my recent
ranger cast changes, because we compute (correct) range for sqrtf argument
as well as result but then recip pass keeps using that range for the .RQSRT
call which returns 1. / sqrt, so the function then returns 0.5f
unconditionally.
Note, on foo this is a regression from GCC 15, but on bar it regressed
already with the r14-536 change.
2025-06-12 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/120638
* tree-ssa-math-opts.cc (pass_cse_reciprocals::execute): Call
reset_flow_sensitive_info on arg1.
* gcc.dg/pr120638.c: New test.
|
|
These tests were broken by my r16-1398 PR120434 change and
fixed by r16-1482 PR120629 change.
Committing these to increase testsuite coverage.
2025-06-12 Jakub Jelinek <jakub@redhat.com>
PR middle-end/120630
* gcc.dg/pr120630.c: New test.
* gcc.c-torture/execute/pr120630.c: New test.
|
|
PR middle-end/117811
PR testsuite/52641
gcc/testsuite/
* gcc.dg/torture/pr117811.c: Fix for int < 32 bit.
|
|
There is an old GNU extension which allows overriding the
promoted old-style arguments when there is an earlier prototype
An example (from a test added for PR16666) is the following.
float dremf (float, float);
float
dremf (x, y)
float x, y;
{
return x + y;
}
The types of the two declarations are not compatible, because
the arguments are not self-promoting. Add a special case
to function_types_compatible_p that can be toggled via a flag
for comptypes_internal and add a helper function to be able to
add the checking assertions to composite_type.
PR c/120510
gcc/c/ChangeLog:
* c-typeck.cc (composite_type_internal): Activate checking
assertions for all types and also inputs.
(comptypes_for_composite_check): New helper function.
(function_types_compatible_p): Add exception.
gcc/testsuite/ChangeLog:
* gcc.dg/old-style-prom-4.c: New test.
|
|
Fix an error recovery ICE that occurs when a typename
can not be parsed correctly in the controlling expression
of a generic selection.
PR c/120303
gcc/c/ChangeLog:
* c-parser.cc (c_parser_generic_selection): Handle error
condition.
gcc/testsuite/ChangeLog:
* gcc.dg/pr120303.c: New test.
|
|
This patch to the "experimental-html" diagnostic sink:
* adds use of the PatternFly 3 CSS library (via an optional link
in the generated html to a copy in a CDN)
* uses PatternFly's "alert" pattern to show severities for diagnostics,
properly nesting "note" diagnostics for diagnostic groups.
Example:
before: https://dmalcolm.fedorapeople.org/gcc/2025-06-10/before/diagnostic-ranges.c.html
after: https://dmalcolm.fedorapeople.org/gcc/2025-06-10/after/diagnostic-ranges.c.html
* adds initial support for logical locations and physical locations
* adds initial support for multi-level nested diagnostics such as those
for C++ concepts diagnostics. Ideally this would show a clickable
disclosure widget to expand/collapse a level, but for now it uses
nested <ul> elements with <li> for the child diagnostics.
Example:
before: https://dmalcolm.fedorapeople.org/gcc/2025-06-10/before/nested-diagnostics-1.C.html
after: https://dmalcolm.fedorapeople.org/gcc/2025-06-10/after/nested-diagnostics-1.C.html
gcc/ChangeLog:
PR other/116792
* diagnostic-format-html.cc: Include "diagnostic-path.h" and
"diagnostic-client-data-hooks.h".
(html_builder::m_logical_loc_mgr): New field.
(html_builder::m_cur_nesting_levels): New field.
(html_builder::m_last_logical_location): New field.
(html_builder::m_last_location): New field.
(html_builder::m_last_expanded_location): New field.
(HTML_STYLE): Add "white-space: pre;" to .source and .annotation.
Add "gcc-quoted-text" CSS class.
(html_builder::html_builder): Initialize the new fields. If CSS
is enabled, add CDN links to PatternFly 3 stylesheets.
(html_builder::add_stylesheet): New.
(html_builder::on_report_diagnostic): Add "alert" param to
make_element_for_diagnostic, setting it by default, but unsetting
it for nested diagnostics below the top level. Use
add_at_nesting_level for nested diagnostics.
(add_nesting_level_attr): New.
(html_builder::add_at_nesting_level): New.
(get_pf_class_for_alert_div): New.
(get_pf_class_for_alert_icon): New.
(get_label_for_logical_location_kind): New.
(add_labelled_value): New.
(html_builder::make_element_for_diagnostic): Add leading comment.
Add "alert" param. Drop class="gcc-diagnostic" from <div> tag,
instead adding the class for a PatternFly 3 alert if "alert" is
true, and adding a <span> with an alert icon, both according to
the diagnostic severity. Add a severity prefix to the message for
alerts. Add any metadata/option text as suffixes to the message.
Show any logical location. Show any physical location. Don't
show the locus if the last location is unchanged within the
diagnostic_group. Wrap any execution path element in a
<div id="execution-path"> and add a label to it. Wrap any
generated patch in a <div id="suggested-fix"> and add a label
to it.
(selftest::test_simple_log): Update expected HTML.
gcc/testsuite/ChangeLog:
PR other/116792
* gcc.dg/html-output/missing-semicolon.py: Update for changes
to diagnostic elements.
* gcc.dg/format/diagnostic-ranges-html.py: Likewise.
* gcc.dg/plugin/diagnostic-test-metadata-html.py: Likewise. Drop
out-of-date comment.
* gcc.dg/plugin/diagnostic-test-paths-2.py: Likewise.
* gcc.dg/plugin/diagnostic-test-paths-4.py: Likewise. Drop
out-of-date comment.
* gcc.dg/plugin/diagnostic-test-show-locus.py: Likewise.
* lib/htmltest.py (get_diag_by_index): Update to use search by id.
(get_message_within_diag): Update to use search by class.
libcpp/ChangeLog:
PR other/116792
* include/line-map.h (typedef expanded_location): Convert to...
(struct expanded_location): ...this.
(operator==): New decl, for expanded_location.
(operator!=): Likewise.
* line-map.cc (operator==): New decl, for expanded_location.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
I've been seeing issues in the experimental-html sink where the nesting
of tags goes wrong.
The two issues I've seen are:
* the pp_token_list from the diagnostic message that reaches the
html_token_printer doesn't always have matching pairs of begin/end
tokens (PR other/120610)
* a bug in diagnostic-show-locus where there was a stray xp.pop_tag,
in print_trailing_fixits.
This patch:
* changes the xml::printer::pop_tag API so that it now takes the
expected name of the element being popped (rather than expressing this
in comments), and that, by default, the xml::printer asserts that this
matches.
* gives the html_token_printer its own xml::printer instance to restrict
the affected area of the DOM tree; this xml::printer doesn't enforce
nesting (PR other/120610)
* adds RAII sentinel classes that automatically check for pushes/pops
being balanced within a scope, using them in various places
* fixes the bug in print_trailing_fixits for html output
gcc/ChangeLog:
PR other/120610
* diagnostic-format-html.cc (html_builder::html_builder): Update
for new param of xml::printer::pop_tag.
(html_path_label_writer::end_label): Likewise.
(html_builder::make_element_for_diagnostic::html_token_printer):
Give the instance its own xml::printer. Update for new param of
xml::printer::pop_tag.
(html_builder::make_element_for_diagnostic): Give the instance its
own xml::printer.
(html_builder::make_metadata_element): Update for new param of
xml::printer::pop_tag.
(html_builder::flush_to_file): Likewise.
* diagnostic-path-output.cc (begin_html_stack_frame): Likewise.
(begin_html_stack_frame): Likewise.
(end_html_stack_frame): Likewise.
(print_path_summary_as_html): Likewise.
* diagnostic-show-locus.cc
(struct to_text::auto_check_tag_nesting): New.
(struct to_html:: auto_check_tag_nesting): New.
(to_text::pop_html_tag): Change param to const char *.
(to_html::pop_html_tag): Likewise; rename param to
"expected_name".
(default_diagnostic_start_span_fn<to_html>): Update for new param
of xml::printer::pop_tag.
(layout_printer<to_html>::end_label): Likewise.
(layout_printer<Sink>::print_trailing_fixits): Add RAII sentinel
to check tag nesting for the HTML case. Delete stray popping
of "td" in the presence of fix-it hints.
(layout_printer<Sink>::print_line): Add RAII sentinel
to check tag nesting for the HTML case.
(diagnostic_source_print_policy::print_as_html): Likewise.
(layout_printer<Sink>::print): Likewise.
* xml-printer.h (xml::printer::printer): Add optional
"check_popped_tags" param.
(xml::printer::pop_tag): Add "expected_name" param.
(xml::printer::get_num_open_tags): New accessor.
(xml::printer::dump): New decl.
(xml::printer::m_check_popped_tags): New field.
(class xml::auto_check_tag_nesting): New.
(class xml::auto_print_element): Update for new param of pop_tag.
* xml.cc: Move pragma pop so that the pragma also covers
xml::printer's member functions, "dump" in particular.
(xml::printer::printer): Add param "check_popped_tags".
(xml::printer::pop_tag): Add param "expected_name" and use it to assert
that the popped tag is as expected. Assert that we have a tag to
pop.
(xml::printer::dump): New.
(selftest::test_printer): Update for new param of pop_tag.
(selftest::test_attribute_ordering): Likewise.
gcc/testsuite/ChangeLog:
PR other/120610
* gcc.dg/format/diagnostic-ranges-html.py: Remove out-of-date
comment.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
Checking assertions revealed that we sometimes produce
composite types with incorrect qualifiers, e.g. the example
int f(int [_Atomic]);
int f(int [_Atomic]);
int f(int [_Atomic]);
was rejected because atomic was lost in the second declaration.
PR c/120510
gcc/c/ChangeLog:
* c-typeck.cc (composite_types_internal): Handle arrays
declared with atomic for function arguments.
gcc/testsuite/ChangeLog:
* gcc.dg/pr120510.c
|
|
Add a new vfunc diagnostic_output_format::set_main_input_filename
so that we can separate setting the <title> of HTML output and
the diagnostic_artifact_role::analysis_target of SARIF output from
creation of the sinks. Calling it is done by the various creators
of the sinks.
gcc/ChangeLog:
PR other/116792
* diagnostic-format-html.cc (html_builder::m_title_element): New
field.
(html_builder::html_builder): Initialize it. Don't add
placeholder text.
(html_builder::set_main_input_filename): New.
(html_output_format::set_main_input_filename): New.
(test_html_diagnostic_context::test_html_diagnostic_context): Call
set_main_input_filename on the new sink.
(seldtest::test_simple_log): Update expected <title> text.
* diagnostic-format-json.cc (diagnostic_output_format_init_json):
Return a reference to the new sink.
(diagnostic_output_format_init_json_stderr): Likewise.
(diagnostic_output_format_init_json_file): Likewise.
* diagnostic-format-sarif.cc (sarif_builder::sarif_builder): Drop
"main_input_filename_" param, and move adding an artifact for it
with diagnostic_artifact_role::analysis_target to...
(sarif_builder::set_main_input_filename): ...this new function.
(sarif_output_format::set_main_input_filename): New.
(sarif_output_format::sarif_output_format): Drop
"main_input_filename_" param.
(sarif_stream_output_format::sarif_stream_output_format):
Likewise.
(sarif_file_output_format::sarif_file_output_format): Likewise.
(diagnostic_output_format_init_sarif): Return a reference to *FMT.
(diagnostic_output_format_init_sarif_stderr): Return a refererence
to the new sink. Drop "main_input_filename_" param.
(diagnostic_output_format_init_sarif_file): Likewise.
(diagnostic_output_format_init_sarif_stream): Likewise.
(make_sarif_sink): Drop "main_input_filename_" param.
(selftest::test_sarif_diagnostic_context::test_sarif_diagnostic_context):
Likewise. Call set_main_input_filename on the new format.
(selftest::test_sarif_diagnostic_context::buffered_output_format::buffered_output_format):
Drop "main_input_filename_" param.
(selftest::test_make_location_object): Likewise.
* diagnostic-format-sarif.h
(diagnostic_output_format_init_sarif_stderr): Return a refererence
to the new sink. Drop "main_input_filename_" param.
(diagnostic_output_format_init_sarif_file): Likewise.
(diagnostic_output_format_init_sarif_stream): Likewise.
(make_sarif_sink): Drop "main_input_filename_" param.
* diagnostic-format.h
(diagnostic_output_format::set_main_input_filename): New vfunc.
(diagnostic_output_format_init_json_stderr): Return a refererence
to the new sink.
(diagnostic_output_format_init_json_file): Likewise.
* diagnostic.cc (diagnostic_output_format_init): Likewise. Call
set_main_input_filename on the new sink.
* libgdiagnostics.cc (sarif_sink::sarif_sink): Update for above
changes.
* opts-diagnostic.cc (sarif_scheme_handler::make_sink): Likewise.
(handle_OPT_fdiagnostics_add_output_): Likewise.
(handle_OPT_fdiagnostics_set_output_): Likewise.
gcc/testsuite/ChangeLog:
PR other/116792
* gcc.dg/html-output/missing-semicolon.py: Update expected <title>
text. Drop out-of-date comment.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
This patch enables clone-merge-1.c only for fauto-profile as it is
failing otherwise. This also fixes a yupo in merge.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-prof/clone-merge-1.c: Enable only for
-fauto-profile.
gcc/ChangeLog:
* auto-profile.cc (function_instance::merge): Fix typo.
Signed-off-by: Kugan Vivekanandarajah <kvivekananda@nvidia.com>
|
|
So currently cselim is limited to targets which have conditional move
and also happens later in the pipeline. This adds the limited form of cselim;
where there is only one store in the two sides and no loads after the store.
This fixes phiprop-2.c for gcn target and now can match the MIN in phiopt1
so it moves the matching of MIN to phiopt1.
The other testcases already disable cselim so they need to disable phiopt too.
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/120533
gcc/ChangeLog:
* tree-ssa-phiopt.cc (cond_if_else_store_replacement_limited): New function.
(pass_phiopt::execute): Call cond_if_else_store_replacement_limited
for diamand case.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr35286.c: Add -fno-ssa-phiopt.
* gcc.dg/tree-ssa/split-path-6.c: Likewise.
* gcc.dg/tree-ssa/split-path-7.c: Likewise.
* gcc.dg/tree-ssa/phiprop-2.c: Move the check for MIN_EXPR to phiopt1.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
The following patch adds support for float <-> integer conversions in
ranger.
The patch reverts part of the r16-571 changes, those changes were right
for fold_range, but not for op1_range, where RO_IFI and RO_FIF are actually
called rather than RO_IFF and RO_FII that the patch expected.
Also, the float -> int operation actually uses FIX_TRUNC_EXPR tree code
rather than NOP_EXPR or CONVERT_EXPR and int -> float uses FLOAT_EXPR,
but I think we can just handle all of them using operator_cast, at least
as long as we don't try to use VIEW_CONVERT_EXPR using that too; not really
sure handling VCE at least for floating to integral or vice versa would
be actually useful though.
The patch "regressed" two tests, gfortran.dg/inline_matmul_16.f90 and
g++.dg/tree-ssa/loop-split-1.C. In the first case, there is a loop doing
matmul on various sizes of matrices, up to 10x10 matrices, and Fortran
FE given the options emits two implementations of the matmul, one inline
for the case where the matmul has less than 1000 elements and one for
larger matmuls. The check for whatever reason uses floating point
calculations and before this patch we weren't able to prove that all the
matrices will be smaller than the cutoff and the test was checking for
presence of the fallback call; with the patch we are able to figure it
out and only keep the inline copy. I've duplicated the test, once
unmodified source which doesn't expect _gfortran_matmul string in optimized
dump anymore, and another copy which uses volatile ten instead of 10 in
loop upper bounds so that it has to keep the fallback and scans for it.
The other test is g++.dg/tree-ssa/loop-split-1.C, which does
constexpr unsigned s = 100000000;
...
for(unsigned i = 0; i < s; ++i)
{
if(i == 0)
a[i] = b[i] * c[i];
else
a[i] = (b[i] + c[i]) * c[i-1] * std::log(i);
}
and for some reason the successful loop splitting for which the test
searches in a dump file is dependent on the errno fallback of std::log,
where we do t = std::log((double)i); if ((double)i) u> 0); else log ((double)i);
But i goes only from 1 to 100000000, so (double)i has the range
[1.0, 100000000.0] with the patch and so we see it will never need errno
nor raise exception. I've tested adding + d for it where d is 0.0 but
modifiable in some other TU, and tested it also with r14-2851 and r14-2852,
where the former FAILed the test both unmodified and modified, while
the latter PASSed both versions.
2025-06-05 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/120231
* range-op.cc (range_op_table::range_op_table): Register op_cast
also for FLOAT_EXPR and FIX_TRUNC_EXPR.
(RO_III): Adjust comment.
(range_op_handler::op1_range): Handle RO_IFI rather than RO_IFF.
Don't handle RO_FII.
(range_operator::op1_range): Remove overload with
irange &, tree, const frange &, const frange &, relation_trio
and frange &, tree, const irange &, const irange &, relation_trio
arguments. Add overload with
irange &, tree, const frange &, const irange &, relation_trio
arguments.
* range-op-mixed.h (operator_cast::op1_range): Remove overload with
irange &, tree, const frange &, const frange &, relation_trio
and frange &, tree, const irange &, const irange &, relation_trio
arguments. Add overload with
irange &, tree, const frange &, const irange &, relation_trio and
frange &, tree, const irange &, const frange &, relation_trio
arguments.
* range-op.h (range_operator::op1_cast): Remove overload with
irange &, tree, const frange &, const frange &, relation_trio
and frange &, tree, const irange &, const irange &, relation_trio
arguments. Add overload with
irange &, tree, const frange &, const irange &, relation_trio
arguments.
* range-op-float.cc (operator_cast::fold_range): Implement
float to int and int to float casts.
(operator_cast::op1_range): Remove overload with
irange &, tree, const frange &, const frange &, relation_trio
and frange &, tree, const irange &, const irange &, relation_trio
arguments. Add overload with
irange &, tree, const frange &, const irange &, relation_trio and
frange &, tree, const irange &, const frange &, relation_trio
arguments and implement reverse op of float to int and int to float
cast there.
* gcc.dg/tree-ssa/pr120231-2.c: New test.
* gcc.dg/tree-ssa/pr120231-3.c: New test.
* gfortran.dg/inline_matmul_16.f90: Don't expect any _gfortran_matmul
strings in optimized dump.
* gfortran.dg/inline_matmul_26.f90: New test.
* g++.dg/tree-ssa/loop-split-1.C (d): New variable.
(main): Use std::log (i + d) instead of std::log (i).
|
|
The function has 2 problems, one is _BitInt specific and the other is
most likely also reproduceable only with it.
The first issue is that I've missed updating the function for _BitInt,
maxbitlen as MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT
obviously isn't guaranteed to be larger than any integral type we might
want to convert at compile time from wide_int to REAL_VALUE_FORMAT.
Just using len instead of it works fine, at least when used after
HOST_BITS_PER_WIDE_INT is added to it and it is truncated to multiples
of HOST_BITS_PER_WIDE_INT.
The other bug is that if the value has too many significant bits (formerly
maxbitlen - cnt_l_z, now len - cnt_l_z), the code just shifts it right and
adds the shift count to the future exponent. That isn't correct for
rounding as the testcase attempts to show, the internal real format has more
bits than any precision in supported format, but we still need to
distinguish bewtween values exactly half way between representable floating
point values (those should be rounded to even) and the case when we've
shifted away some non-zero bits, so the value was tiny bit larger than half
way and then we should round up.
The patch uses something like e.g. soft-fp uses in these cases, right shift
with sticky bit in the least significant bit.
2025-06-05 Jakub Jelinek <jakub@redhat.com>
PR middle-end/120547
* real.cc (real_from_integer): Remove maxbitlen variable, use
len instead of that. When shifting right, or in 1 if any of the
shifted away bits are non-zero. Formatting fix.
* gcc.dg/bitint-123.c: New test.
|
|
Floating-point to integer conversions can be inexact or invalid (e.g., due to
overflow or NaN). However, since users of operation_could_trap_p infer the
bool FP_OPERATION argument from the expression's type, the FIX_TRUNC family
are considered non-trapping here.
This patch handles them explicitly.
gcc/ChangeLog:
* tree-eh.cc (operation_could_trap_helper_p): Cover FIX_TRUNC
expressions explicitly.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/pr96357.c: Change to avoid producing
a conditional FIX_TRUNC_EXPR, whilst still reproducing the bug
in PR96357.
* gcc.dg/tree-ssa/ifcvt-fix-trunc-1.c: New test.
* gcc.dg/tree-ssa/ifcvt-fix-trunc-2.c: Likewise.
|
|
This patch introduces a new testcase to verify the merging of profiles
is performed for cloned functions.
Since this is invoked very early, before the pass manager, we need to
set up the dumping explicitly. This is similar to the handling in
finish_optimization_passes.
gcc/ChangeLog:
* auto-profile.cc (autofdo_source_profile::read): Dump message
while merging profile.
* pass_manager.h (get_pass_auto_profile): New.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-prof/clone-merge-1.c: New test.
Signed-off-by: Kugan Vivekanandarajah <kvivekananda@nvidia.com>
|
|
This implements a simple copy propagation for aggregates in the similar
fashion as we already do for copy prop of zeroing.
Right now this only looks at the previous vdef statement but this allows us
to catch a lot of cases that show up in C++ code.
This used to deleted aggregate copies that are to the same location (PR57361)
But that was found to delete statements that are needed for aliasing markers reason.
So we need to keep them around until that is solved. Note DSE will delete the statements
anyways so there is no testcase added since we expose the latent bug in the same way.
See https://gcc.gnu.org/pipermail/gcc-patches/2025-May/685003.html for the testcase and
explaintation there.
Also adds a variant of pr22237.c which was found while working on this patch.
Changes since v1:
* v2: change check for vuse to use default definition.
Remove dest/src arguments for optimize_agr_copyprop
Changed dump messages slightly.
Added stats
Don't delete `a = a` until aliasing markers are added.
PR tree-optimization/14295
PR tree-optimization/108358
PR tree-optimization/114169
gcc/ChangeLog:
* tree-ssa-forwprop.cc (optimize_agr_copyprop): New function.
(pass_forwprop::execute): Call optimize_agr_copyprop for load/store statements.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/20031106-6.c: Un-xfail. Add scan for forwprop1.
* g++.dg/opt/pr66119.C: Disable forwprop since that does
the copy prop now.
* gcc.dg/tree-ssa/pr108358-a.c: New test.
* gcc.dg/tree-ssa/pr114169-1.c: New test.
* gcc.c-torture/execute/builtins/pr22237-1-lib.c: New test.
* gcc.c-torture/execute/builtins/pr22237-1.c: New test.
* gcc.dg/tree-ssa/pr57361.c: Disable forwprop1.
* gcc.dg/tree-ssa/pr57361-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
I've noticed we don't even support say float -> double and other
scalar floating point to scalar floating point conversions in the
ranger, we just end up with VARYING for those.
The following patch attempts to fix that.
The reverse cast case uses float_binary_op_range_finish e.g. because
if the result isn't infinite, then the source couldn't be infinite
either even if the reverse fold_range would suggest that.
And special cases the case of guaranteed widening cast (where
we have assurance that all the source type values are exactly
representable in the destination type; using ieee_bits for that).
2025-06-04 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/120231
* range-op-mixed.h (operator_cast::fold_range): Add overload
with 3 {,const} frange & operands. Change parameter names and
add final override keywords for float <-> integer cast overloads.
(operator_cast::op1_range): Likewise.
* range-op-float.cc (operator_cast::fold_range): New overload
with 3 {,const} frange & operands.
(operator_cast::op1_range): Likewise.
* gcc.dg/tree-ssa/pr120231-1.c: New test.
|
|
In the comment trail for PR119966, I'd said that the validate_subreg
condition:
/* The outer size must be ordered wrt the register size, otherwise
we wouldn't know at compile time how many registers the outer
mode occupies. */
if (!ordered_p (osize, regsize))
return false;
"is also potentially relevant" for paradoxical subregs. But I'd
forgotten an important caveat. If the inner size is smaller than
a register, we know that the inner value will only occupy a single
register. Although the paradoxical subreg might extend that single
register to multiple registers by padding with undefined bits,
the register size that matters for the extension is:
REGMODE_NATURAL_SIZE (omode)
rather than regsize's:
REGMODE_NATURAL_SIZE (imode)
The ordered check is still relevant if the inner value spans
multiple registers.
Enabling the check above for paradoxical subregs led to an ICE in the
testcase, where we tried to generate a VNx4QI paradoxical subreg of a
QI scalar. This was previously allowed, and AFAIK worked correctly.
The patch doesn't have the effect of relaxing the condition for
non-paradoxical subregs, since:
known_le (osize, isize) && known_le (isize, regsize)
=> known_le (osize, regsize)
=> ordered_p (osize, regsize)
So even before the patch for PR119966, the condition only existed for
the maybe_gt (isize, regsize) case.
The term "block" used in the comment is taken from the rtl.texi
documentation of subregs.
gcc/
PR rtl-optimization/120447
* emit-rtl.cc (validate_subreg): Restrict ordered_p test
between osize and regsize to cases where the inner value
occupies multiple blocks.
gcc/testsuite/
PR rtl-optimization/120447
* gcc.dg/pr120447.c: New test.
|