|
gcc.dg/vect/slp-28.c is now vectorized as expected even on targets
without vect32.
* gcc.dg/vect/slp-28.c: Adjust.
|
|
I have seen this in a few places, though the testcase from PR 95906
is an obvious place where this shows up for sure.
This converts `cmp - 1` into `-icmp`, as that form is more useful
in many cases.
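A minimal source-level sketch of the equivalence behind the pattern, assuming `icmp` means the inverted comparison. The function names are hypothetical and this is plain C, not the match.pd pattern itself:
```
/* Original form: a comparison result (0 or 1) minus one.  */
int
cmp_minus_one (int a, int b)
{
  return (a < b) - 1;
}

/* Rewritten form: the negated inverted comparison.  The inverse of
   a < b is a >= b, so both functions yield 0 when a < b and -1
   otherwise.  */
int
neg_inverted_cmp (int a, int b)
{
  return -(a >= b);
}
```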
Changes since v1:
* v2: Add check for outer type's precision being greater than 1.
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/110949
PR tree-optimization/95906
gcc/ChangeLog:
* match.pd (cmp - 1): New pattern.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/cmp-2.c: New test.
* gcc.dg/tree-ssa/max-bitcmp-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
With bools we can have the usual mismatch between mask and data
use. Catch that, like we do elsewhere.
PR tree-optimization/121194
* tree-vect-loop.cc (vectorizable_lc_phi): Verify
vector types are compatible.
* gcc.dg/torture/pr121194.c: New testcase.
|
|
This implements error handling for hard register constraints including
potential conflicts with register asm operands.
In contrast to register asm operands, hard register constraints allow
more than just one register per operand. Even more than just one
register per alternative. For example, a valid constraint for an
operand is "{r0}{r1}m,{r2}". However, this also means that we have to
make sure that each register is used at most once in each alternative
over all outputs and likewise over all inputs. For asm statements this
is done by this patch during gimplification. For hard register
constraints used in machine descriptions, error handling is still a TODO;
I haven't investigated it so far and consider it a rather low
priority.
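The "at most once per alternative" rule can be sketched with a toy checker. This is not the code added to gimplify.cc or stmt.cc: the helper names are hypothetical, and it assumes that a conflict means two different operands of the same kind naming the same register in the same alternative, while several registers inside one operand's alternative are merely choices:
```
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return the start of alternative ALT (0-based) inside CONSTRAINT,
   or NULL if the constraint has fewer alternatives.  */
static const char *
toy_find_alternative (const char *constraint, int alt)
{
  const char *p = constraint;
  while (alt-- > 0)
    {
      p = strchr (p, ',');
      if (p == NULL)
        return NULL;
      p++;
    }
  return p;
}

/* Return true if alternative ALT of CONSTRAINT contains {NAME},
   where NAME has length LEN.  */
static bool
toy_alternative_mentions (const char *constraint, int alt,
                          const char *name, size_t len)
{
  const char *p = toy_find_alternative (constraint, alt);
  if (p == NULL)
    return false;
  for (; *p != '\0' && *p != ','; p++)
    {
      if (*p != '{')
        continue;
      const char *end = strchr (p, '}');
      if (end == NULL)
        return false;  /* Malformed input: the toy checker gives up.  */
      if ((size_t) (end - (p + 1)) == len
          && strncmp (p + 1, name, len) == 0)
        return true;
      p = end;
    }
  return false;
}

/* Return true if two different operands among the N constraint
   strings name the same hard register in alternative ALT.  */
bool
toy_alternative_conflicts (const char *const *constraints, int n, int alt)
{
  for (int i = 0; i < n; i++)
    {
      const char *p = toy_find_alternative (constraints[i], alt);
      if (p == NULL)
        continue;
      for (; *p != '\0' && *p != ','; p++)
        {
          if (*p != '{')
            continue;
          const char *end = strchr (p, '}');
          if (end == NULL)
            break;
          size_t len = (size_t) (end - (p + 1));
          for (int j = i + 1; j < n; j++)
            if (toy_alternative_mentions (constraints[j], alt, p + 1, len))
              return true;
          p = end;
        }
    }
  return false;
}
```
With the operands "{r0}{r1}m,{r2}" and "{r1},{r3}", the first alternative conflicts on r1 while the second one does not.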
gcc/ada/ChangeLog:
* gcc-interface/trans.cc (gnat_to_gnu): Pass null pointer to
parse_{input,output}_constraint().
gcc/analyzer/ChangeLog:
* region-model-asm.cc (region_model::on_asm_stmt): Pass null
pointer to parse_{input,output}_constraint().
gcc/c/ChangeLog:
* c-typeck.cc (build_asm_expr): Pass null pointer to
parse_{input,output}_constraint().
gcc/ChangeLog:
* cfgexpand.cc (n_occurrences): Move this ...
(check_operand_nalternatives): and this ...
(expand_asm_stmt): and the call to gimplify.cc.
* config/s390/s390.cc (s390_md_asm_adjust): Pass null pointer to
parse_{input,output}_constraint().
* gimple-walk.cc (walk_gimple_asm): Pass null pointer to
parse_{input,output}_constraint().
(walk_stmt_load_store_addr_ops): Ditto.
* gimplify-me.cc (gimple_regimplify_operands): Ditto.
* gimplify.cc (num_occurrences): Moved from cfgexpand.cc.
(num_alternatives): Ditto.
(gimplify_asm_expr): Deal with hard register constraints.
* stmt.cc (eliminable_regno_p): New helper.
(hardreg_ok_p): Perform a similar check as done in
make_decl_rtl().
(parse_output_constraint): Add parameter for gimplify_reg_info
and validate hard register constrained operands.
(parse_input_constraint): Ditto.
* stmt.h (class gimplify_reg_info): Forward declaration.
(parse_output_constraint): Add parameter.
(parse_input_constraint): Ditto.
* tree-ssa-operands.cc
(operands_scanner::get_asm_stmt_operands): Pass null pointer
to parse_{input,output}_constraint().
* tree-ssa-structalias.cc (find_func_aliases): Pass null pointer
to parse_{input,output}_constraint().
* varasm.cc (assemble_asm): Pass null pointer to
parse_{input,output}_constraint().
* gimplify_reg_info.h: New file.
gcc/cp/ChangeLog:
* semantics.cc (finish_asm_stmt): Pass null pointer to
parse_{input,output}_constraint().
gcc/d/ChangeLog:
* toir.cc: Pass null pointer to
parse_{input,output}_constraint().
gcc/testsuite/ChangeLog:
* gcc.dg/pr87600-2.c: Split test into two files since errors for
functions test{0,1} are thrown during expand, and for
test{2,3} during gimplification.
* lib/scanasm.exp: On s390, skip lines beginning with #.
* gcc.dg/asm-hard-reg-error-1.c: New test.
* gcc.dg/asm-hard-reg-error-2.c: New test.
* gcc.dg/asm-hard-reg-error-3.c: New test.
* gcc.dg/asm-hard-reg-error-4.c: New test.
* gcc.dg/asm-hard-reg-error-5.c: New test.
* gcc.dg/pr87600-3.c: New test.
* gcc.target/aarch64/asm-hard-reg-2.c: New test.
* gcc.target/s390/asm-hard-reg-7.c: New test.
|
|
Implement hard register constraints of the form {regname} where regname
must be a valid register name for the target. Such constraints may be
used in asm statements as a replacement for register asm and in machine
descriptions. A more verbose description is given in extend.texi.
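As a rough illustration of the "replacement for register asm" idea, the sketch below pins an operand to one register on x86_64 in both styles. The function name, the choice of rax/eax, the `incl` instruction, and in particular the `__GNUC__ >= 16` check are assumptions made for the example; nothing here states which releases accept the new syntax:
```
/* Return X + 1 while forcing the operand into a specific register.  */
int
pinned_increment (int x)
{
#if defined (__x86_64__) && defined (__GNUC__) && !defined (__clang__) \
    && __GNUC__ >= 16
  /* Hard register constraint: the operand itself names the register.
     The version check above is only a guess.  */
  __asm__ ("incl %0" : "+{rax}" (x));
#elif defined (__x86_64__) && defined (__GNUC__)
  /* Classic register asm: a local variable is bound to the register
     and then passed with a generic "r" constraint.  */
  register int y __asm__ ("eax") = x;
  __asm__ ("incl %0" : "+r" (y));
  x = y;
#else
  /* Other targets or compilers: plain C fallback so the example still
     builds.  */
  x += 1;
#endif
  return x;
}
```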
It is expected and desired that optimizations coalesce multiple pseudos
into one whenever possible. However, in case of hard register
constraints we may have to undo this and introduce copies since
otherwise we would constrain a single pseudo to multiple hard
registers. This is done prior to RA during asmcons in
match_asm_constraints_2(). While IRA tries to reduce live ranges, it
also replaces some register-register moves. That in turn might undo
those copies of a pseudo which we just introduced during asmcons. Thus,
check in decrease_live_ranges_number() via
valid_replacement_for_asm_input_p() whether it is valid to perform a
replacement.
The remainder of the patch mostly deals with parsing and decoding hard
register constraints. The actual work is done by LRA in
process_alt_operands() where a register filter, according to the
constraint, is installed.
For the sake of "reviewability" and in order to show the beauty of LRA,
error handling (which gets pretty involved) is spread out into a
subsequent patch.
Limitation
----------
Currently, a fixed register cannot be used as hard register constraint.
For example, loading the stack pointer on x86_64 via
void *
foo (void)
{
void *y;
__asm__ ("" : "={rsp}" (y));
return y;
}
leads to an error.
Asm Adjust Hook
---------------
The following targets implement TARGET_MD_ASM_ADJUST:
- aarch64
- arm
- avr
- cris
- i386
- mn10300
- nds32
- pdp11
- rs6000
- s390
- vax
Most of them only add the CC register to the list of clobbered registers.
However, cris, i386, and s390 need some minor adjustment.
gcc/ChangeLog:
* config/cris/cris.cc (cris_md_asm_adjust): Deal with hard
register constraint.
* config/i386/i386.cc (map_egpr_constraints): Ditto.
* config/s390/s390.cc (f_constraint_p): Ditto.
* doc/extend.texi: Document hard register constraints.
* doc/md.texi: Ditto.
* function.cc (match_asm_constraints_2): Have a unique pseudo
for each operand with a hard register constraint.
(pass_match_asm_constraints::execute): Calling into new helper
match_asm_constraints_2().
* genoutput.cc (mdep_constraint_len): Return the length of a
hard register constraint.
* genpreds.cc (write_insn_constraint_len): Support hard register
constraints for insn_constraint_len().
* ira.cc (valid_replacement_for_asm_input_p_1): New helper.
(valid_replacement_for_asm_input_p): New helper.
(decrease_live_ranges_number): Similar to
match_asm_constraints_2() ensure that each operand has a unique
pseudo if constrained by a hard register.
* lra-constraints.cc (process_alt_operands): Install hard
register filter according to constraint.
* recog.cc (asm_operand_ok): Accept register type for hard
register constrained asm operands.
(constrain_operands): Validate hard register constraints.
* stmt.cc (decode_hard_reg_constraint): Parse a hard register
constraint into the corresponding register number or bail out.
(parse_output_constraint): Parse hard register constraint and
set *ALLOWS_REG.
(parse_input_constraint): Ditto.
* stmt.h (decode_hard_reg_constraint): Declaration of new
function.
gcc/testsuite/ChangeLog:
* gcc.dg/asm-hard-reg-1.c: New test.
* gcc.dg/asm-hard-reg-2.c: New test.
* gcc.dg/asm-hard-reg-3.c: New test.
* gcc.dg/asm-hard-reg-4.c: New test.
* gcc.dg/asm-hard-reg-5.c: New test.
* gcc.dg/asm-hard-reg-6.c: New test.
* gcc.dg/asm-hard-reg-7.c: New test.
* gcc.dg/asm-hard-reg-8.c: New test.
* gcc.target/aarch64/asm-hard-reg-1.c: New test.
* gcc.target/i386/asm-hard-reg-1.c: New test.
* gcc.target/i386/asm-hard-reg-2.c: New test.
* gcc.target/s390/asm-hard-reg-1.c: New test.
* gcc.target/s390/asm-hard-reg-2.c: New test.
* gcc.target/s390/asm-hard-reg-3.c: New test.
* gcc.target/s390/asm-hard-reg-4.c: New test.
* gcc.target/s390/asm-hard-reg-5.c: New test.
* gcc.target/s390/asm-hard-reg-6.c: New test.
* gcc.target/s390/asm-hard-reg-longdouble.h: New test.
|
|
The following removes the minimum VF compute from dataref analysis
which does not take into account SLP at all, leaving the testcase
vectorized with V2SImode instead of V4SImode on x86. With SLP
the only minimum VF we can compute this early is 1.
* tree-vectorizer.h (vect_analyze_data_refs): Remove min_vf
output.
* tree-vect-data-refs.cc (vect_analyze_data_refs): Likewise.
* tree-vect-loop.cc (vect_analyze_loop_2): Remove early
out based on bogus min_vf.
* tree-vect-slp.cc (vect_slp_analyze_bb_1): Adjust.
* gcc.dg/vect/vect-127.c: New testcase.
|
|
The problem here is that the testcase is part of another
testcase but dg-final does not work across source files
so it needs its own dg-* headers that match up with
afdo-crossmodule-1.c.
Pushed as preapproved in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120859#c4 .
PR testsuite/120859
gcc/testsuite/ChangeLog:
* gcc.dg/tree-prof/afdo-crossmodule-1b.c: Add some dg-*
commands like what is in afdo-crossmodule-1.c
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
[PR121153]
I missed this when I added the two testcases vect-reduc-cond-[12].c. These testcases
require support for vectorization of `a ? b : c`, which some targets (e.g. sparc) do
not support.
Pushed as obvious after a quick test.
PR testsuite/121153
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-reduc-cond-1.c: Require vect_condition.
* gcc.dg/vect/vect-reduc-cond-2.c: Likewise.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
Testcase of PR 117423 shows a flaw in the fancy way we do "total
scalarization" in SRA now. We use the types encountered in the
function body and not in type declaration (allowing us to totally
scalarize when only one union field is ever used, since we effectively
"skip" the union then) and can accommodate pre-existing accesses that
happen to fall into padding.
In this case, we skipped the union (bypassing the
totally_scalarizable_type_p check) and the access falling into the
"padding" is an aggregate and so not a candidate for SRA but actually
containing data. Arguably total scalarization should just bail out
when it encounters this situation (but I decided not to depend on this
mainly because we'd need to detect all cases when we eventually cannot
scalarize, such as when a scalar access has children accesses) but the
actual bug is that the detection if all data in an aggregate is indeed
covered by replacements just assumes that is always the case if total
scalarization triggers which however may not be the case in cases like
this - and perhaps more.
This patch fixes the bug by just assuming that all padding is taken
care of when total scalarization triggered, not that every access was
actually scalarized.
gcc/ChangeLog:
2025-07-17 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/117423
* tree-sra.cc (analyze_access_subtree): Fix computation of grp_covered
flag.
gcc/testsuite/ChangeLog:
2025-07-17 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/117423
* gcc.dg/tree-ssa/pr117423.c: New test.
|
|
The following makes sure we analyze live LC PHIs not part of
a double reduction.
PR tree-optimization/121126
* tree-vect-stmts.cc (vect_analyze_stmt): Analyze the
live lane extract for LC PHIs that are vect_internal_def.
* gcc.dg/vect/pr121126.c: New testcase.
|
|
The PR shows that the uninit analysis limits are set too low in
cases we lower switches to ifs as happens on s390x for a linux
kernel TU. This causes false positive uninit diagnostics as we
abort the attempt to prove that a value is initialized on all
paths. The new testcase alone would only require raising the limit to 9.
PR tree-optimization/120924
* params.opt (uninit-max-chain-len): Up from 8 to 12.
* gcc.dg/uninit-pr120924.c: New testcase.
|
|
The following testcase ICEs because SCALAR_INT_TYPE_MODE of course
doesn't work for large BITINT_TYPE types which have BLKmode.
native_encode* as well as e.g. r14-8276 use in cases like these
GET_MODE_SIZE (SCALAR_INT_TYPE_MODE ()) and TREE_INT_CST_LOW (TYPE_SIZE_UNIT
()) for the BLKmode ones.
In this case, it wants bits rather than bytes, so I've used
GET_MODE_BITSIZE like before and TYPE_SIZE otherwise.
Furthermore, the patch only computes encoding_size for big endian
targets, for little endian we don't really adjust anything, so there
is no point computing it.
2025-07-18 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/121131
* gimple-fold.cc (fold_nonarray_ctor_reference): Use
TREE_INT_CST_LOW (TYPE_SIZE ()) instead of
GET_MODE_BITSIZE (SCALAR_INT_TYPE_MODE ()) for BLKmode BITINT_TYPEs.
Don't compute encoding_size at all for little endian targets.
* gcc.dg/bitint-124.c: New test.
|
|
The following makes us never consider vector(1) T types for
vectorization and ensures this during SLP build. This is a
long-standing issue for BB vectorization and when we remove
early loop vector type setting we lose the single place we have
that rejects this for loops.
Once we implement partial loop vectorization we should revisit
this, but then use the original scalar types for the unvectorized
parts.
* tree-vect-slp.cc (vect_build_slp_tree_1): Reject
single-lane vector types.
* gcc.dg/vect/bb-slp-39.c: Adjust.
|
|
When VN iterates we can end up with unreachable inserted expressions
in the expression tables which in turn will not be added to their
value by PRE's compute_avail. This will later ICE when we pick
them up and want to generate them. Deal with this by giving up.
PR tree-optimization/121035
* tree-ssa-pre.cc (find_or_generate_expression): Handle
values without expression.
* gcc.dg/pr121035.c: New testcase.
|
|
Since only glibc targets support -mfentry, warn -pg without -mfentry only
on glibc targets.
gcc/
PR target/120881
PR testsuite/121078
* config/i386/i386-options.cc (ix86_option_override_internal):
Warn -pg without -mfentry only on glibc targets.
gcc/testsuite/
PR target/120881
PR testsuite/121078
* gcc.dg/20021014-1.c (dg-additional-options): Add -mfentry
-fno-pic only on gnu/x86 targets.
* gcc.dg/aru-2.c (dg-additional-options): Likewise.
* gcc.dg/nest.c (dg-additional-options): Likewise.
* gcc.dg/pr32450.c (dg-additional-options): Likewise.
* gcc.dg/pr43643.c (dg-additional-options): Likewise.
* gcc.target/i386/pr104447.c (dg-additional-options): Likewise.
* gcc.target/i386/pr113122-3.c(dg-additional-options): Likewise.
* gcc.target/i386/pr119386-1.c (dg-additional-options): Add
-mfentry only on gnu targets.
* gcc.target/i386/pr119386-2.c (dg-additional-options): Likewise.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
The following disables loop masking when we are using an even/odd
widening operation in a reduction because the loop mask then aligns
to the wrong elements.
PR tree-optimization/121049
* internal-fn.h (widening_evenodd_fn_p): Declare.
* internal-fn.cc (widening_evenodd_fn_p): New function.
* tree-vect-stmts.cc (vectorizable_conversion): When using
an even/odd widening function disable loop masking.
* gcc.dg/vect/pr121049.c: New testcase.
|
|
For possible reductions, ifconv currently handles if the addition
is on one side of the if. But in the case of PR 119920, the reduction
addition is on both sides of the if.
E.g.
```
if (_27 == 0)
goto <bb 14>; [50.00%]
else
goto <bb 13>; [50.00%]
<bb 14>
a_29 = b_14(D) + a_17;
goto <bb 15>; [100.00%]
<bb 13>
a_28 = c_12(D) + a_17;
<bb 15>
# a_30 = PHI <a_28(13), a_29(14)>
```
Which ifcvt converts into:
```
_34 = _32 + _33;
a_15 = (int) _34;
_23 = _4 == 0;
_37 = _33 + _35;
a_13 = (int) _37;
a_5 = _23 ? a_15 : a_13;
```
But the vectorizer does not recognize this as a reduction.
To fix this, we should factor out the addition from the `if`.
This allows us to get:
```
iftmp.0_7 = _22 ? b_13(D) : c_12(D);
a_14 = iftmp.0_7 + a_18;
```
Which then the vectorizer recognizes as a reduction.
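At the source level, the two shapes correspond roughly to the loops below. This is a hand-written sketch with hypothetical function names, not one of the new testcases; it only shows that factoring the addition out of the `if` preserves the result:
```
/* The reduction addition appears in both arms of the if, as in the
   first GIMPLE dump above.  */
int
sum_both_arms (const int *cond, const int *b, const int *c, int n)
{
  int a = 0;
  for (int i = 0; i < n; i++)
    {
      if (cond[i] == 0)
        a = b[i] + a;
      else
        a = c[i] + a;
    }
  return a;
}

/* The addition is factored out: only the addend is selected
   conditionally, leaving one addition per iteration.  */
int
sum_factored (const int *cond, const int *b, const int *c, int n)
{
  int a = 0;
  for (int i = 0; i < n; i++)
    {
      int tmp = cond[i] == 0 ? b[i] : c[i];
      a = tmp + a;
    }
  return a;
}
```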
In the case of PR 112324 and PR 110015, it is similar but with MAX_EXPR reduction
instead of an addition.
Note while this should be done in phiopt, there are regressions
due to other passes not able to handle the factored out cases
(see linked bug to PR 64700). I have not had time to fix all of the passes
that could handle the addition being in the if/then/else rather than being outside yet.
So I thought it would be useful just to have a localized version in ifconv which
is then only used for the vectorizer.
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/119920
PR tree-optimization/112324
PR tree-optimization/110015
gcc/ChangeLog:
* tree-if-conv.cc (find_different_opnum): New function.
(factor_out_operators): New function.
(predicate_scalar_phi): Call factor_out_operators when
there is only 2 elements of a phi.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-reduc-cond-1.c: New test.
* gcc.dg/vect/vect-reduc-cond-2.c: New test.
* gcc.dg/vect/vect-reduc-cond-3.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
When having a _BitInt induction we should make sure to not create
the step vector elements as _BitInts but as vector element typed.
PR tree-optimization/121116
* tree-vect-loop.cc (vectorizable_induction): Use the
step vector element type for further processing.
* gcc.dg/torture/pr121116.c: New testcase.
|
|
When we opportunistically mask an operand of an AND with an already
available loop mask we need to query that set with the correct number
of masks we expect.
PR tree-optimization/121059
* tree-vect-stmts.cc (vectorizable_operation): Query
scalar_cond_masked_set with the correct number of masks.
* gcc.dg/vect/pr121059.c: New testcase.
Co-Authored-By: Richard Sandiford <richard.sandiford@arm.com>
|
|
The -Wunused-but-set-* warnings work by using 2 bits on VAR_DECLs &
PARM_DECLs, TREE_USED and DECL_READ_P. If neither is set, we typically
emit -Wunused-variable or -Wunused-parameter warning, that is for variables
which are just declared (including initializer) and completely unused.
If TREE_USED is set and DECL_READ_P is unset, -Wunused-but-set-* warnings
are emitted, i.e. for variables which can appear on the lhs of an assignment
expression but aren't actually used elsewhere. The DECL_READ_P marking is
done through mark_exp_read called from lots of places (e.g. lvalue to rvalue
conversions etc.).
LLVM has an extension on top of that in that it doesn't count pre/post
inc/decrements as use (i.e. DECL_READ_P for GCC).
The following patch does that too, though because we had the current
behavior for 11+ years already and a lot of people are -Wunused-but-set-*
warning free in the current GCC behavior and not in the clang one (including
GCC sources), it allows users to choose.
Furthermore, it implements another level, where also var @= expr uses of var
(except when it is also used in expr) aren't counted as DECL_READ_P.
I think it would be nice to also handle var = var @ expr or var = expr @ var
but unfortunately mark_exp_read is then done in both FEs during parsing of
var @ expr or expr @ var and the code doesn't know it is rhs of an
assignment with var as lhs.
The patch works mostly by checking if DECL_READ_P is clear at some point and
then clearing it again after some operation which might have set it.
-Wunused or -Wall or -Wunused -Wextra or -Wall -Wextra turn on level 3
of the new warning (i.e. the one which ignores also var++, ++var etc. as
well as var @= expr), so does -Wunused-but-set-{variable,parameter}, but
users can use explicit -Wunused-but-set-{variable,parameter}={1,2} to select
a different level.
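A small sketch of what each level is expected to diagnose, based only on the description above. The function names are hypothetical and the comments describe the expected diagnostics, not verified compiler output:
```
/* Assigned but never read: expected to be diagnosed at every level,
   including -Wunused-but-set-variable=1 (the previous behavior).  */
int
set_only (int x)
{
  int a;
  a = x;
  return x;
}

/* Only incremented: the increments no longer count as reads at
   levels 2 and 3, so a warning is expected there but not at level 1.  */
int
increment_only (int x)
{
  int b = 0;
  b++;
  ++b;
  return x;
}

/* Only updated through var @= expr, with var not used in expr: a
   warning is expected at level 3 only.  */
int
compound_assign_only (int x)
{
  int c = 0;
  c += x;
  return x;
}
```
Compiling with -Wunused-but-set-variable=1, =2 and =3 in turn should show the difference between the three functions.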
2025-07-15 Jakub Jelinek <jakub@redhat.com>
Jason Merrill <jason@redhat.com>
PR c/44677
gcc/
* common.opt (Wunused-but-set-parameter=, Wunused-but-set-variable=):
New options.
(Wunused-but-set-parameter, Wunused-but-set-variable): Turn into
aliases.
* common.opt.urls: Regenerate.
* diagnostic-spec.cc (nowarn_spec_t::nowarn_spec_t): Use
OPT_Wunused_but_set_variable_ instead of OPT_Wunused_but_set_variable
and OPT_Wunused_but_set_parameter_ instead of
OPT_Wunused_but_set_parameter.
* gimple-ssa-store-merging.cc (find_bswap_or_nop_1): Remove unused
but set variable tmp.
* ipa-strub.cc (pass_ipa_strub::execute): Cast named_args to
(void) if ATTR_FNSPEC_DECONST_WATERMARK is not defined.
* doc/invoke.texi (Wunused-but-set-parameter=,
Wunused-but-set-variable=): Document new options.
(Wunused-but-set-parameter, Wunused-but-set-variable): Adjust
documentation now that they are just aliases.
gcc/c-family/
* c-opts.cc (c_common_post_options): Change
warn_unused_but_set_parameter and warn_unused_but_set_variable
from 1 to 3 if they were set only implicitly.
* c-attribs.cc (build_attr_access_from_parms): Remove unused
but set variable nelts.
gcc/c/
* c-parser.cc (c_parser_unary_expression): Clear DECL_READ_P
after default_function_array_read_conversion for
-Wunused-but-set-{parameter,variable}={2,3} on
PRE{IN,DE}CREMENT_EXPR argument.
(c_parser_postfix_expression_after_primary): Similarly for
POST{IN,DE}CREMENT_EXPR.
* c-decl.cc (pop_scope): Use OPT_Wunused_but_set_variable_
instead of OPT_Wunused_but_set_variable.
(finish_function): Use OPT_Wunused_but_set_parameter_
instead of OPT_Wunused_but_set_parameter.
* c-typeck.cc (mark_exp_read): Handle {PRE,POST}{IN,DE}CREMENT_EXPR
and don't handle it when cast to void.
(build_modify_expr): Clear DECL_READ_P after build_binary_op
for -Wunused-but-set-{parameter,variable}=3.
gcc/cp/
* cp-gimplify.cc (cp_fold): Clear DECL_READ_P on lhs of MODIFY_EXPR
after cp_fold_rvalue if it wasn't set before.
* decl.cc (poplevel): Use OPT_Wunused_but_set_variable_
instead of OPT_Wunused_but_set_variable.
(finish_function): Use OPT_Wunused_but_set_parameter_
instead of OPT_Wunused_but_set_parameter.
* expr.cc (mark_use): Clear read_p for {PRE,POST}{IN,DE}CREMENT_EXPR
cast to void on {VAR,PARM}_DECL for
-Wunused-but-set-{parameter,variable}={2,3}.
(mark_exp_read): Handle {PRE,POST}{IN,DE}CREMENT_EXPR and don't handle
it when cast to void.
* module.cc (trees_in::fn_parms_fini): Remove unused but set variable
ix.
* semantics.cc (finish_unary_op_expr): Return early for
PRE{IN,DE}CREMENT_EXPR.
* typeck.cc (cp_build_unary_op): Clear DECL_READ_P
after mark_lvalue_use for -Wunused-but-set-{parameter,variable}={2,3}
on PRE{IN,DE}CREMENT_EXPR argument.
(cp_build_modify_expr): Clear DECL_READ_P after cp_build_binary_op
for -Wunused-but-set-{parameter,variable}=3.
gcc/go/
* gofrontend/gogo.cc (Function::export_func_with_type): Remove
unused but set variable i.
gcc/cobol/
* gcobolspec.cc (lang_specific_driver): Remove unused but set variable
n_cobol_files.
gcc/testsuite/
* c-c++-common/Wunused-parm-1.c: New test.
* c-c++-common/Wunused-parm-2.c: New test.
* c-c++-common/Wunused-parm-3.c: New test.
* c-c++-common/Wunused-parm-4.c: New test.
* c-c++-common/Wunused-parm-5.c: New test.
* c-c++-common/Wunused-parm-6.c: New test.
* c-c++-common/Wunused-var-7.c (bar, baz): Expect warning on a.
* c-c++-common/Wunused-var-19.c: New test.
* c-c++-common/Wunused-var-20.c: New test.
* c-c++-common/Wunused-var-21.c: New test.
* c-c++-common/Wunused-var-22.c: New test.
* c-c++-common/Wunused-var-23.c: New test.
* c-c++-common/Wunused-var-24.c: New test.
* g++.dg/cpp26/name-independent-decl1.C (foo): Expect one
set but not used warning.
* g++.dg/warn/Wunused-parm-12.C: New test.
* g++.dg/warn/Wunused-parm-13.C: New test.
* g++.dg/warn/Wunused-var-2.C (f2): Expect set but not used warning
on parameter x and variable a.
* g++.dg/warn/Wunused-var-40.C: New test.
* g++.dg/warn/Wunused-var-41.C: New test.
* gcc.dg/memchr-3.c (test_find): Change return type from void to int,
and add return n; statement.
* gcc.dg/unused-9.c (g): Move dg-bogus to the correct line and expect
a warning on i.
|
|
This reverts commit 66346b6d800fc4baae876e0fe4e932401bcc85fa.
|
|
For loop masking we need to mask a mask AND operation with the loop
mask. The following makes sure we have a corresponding mask
available. There's no good way to distinguish loop masking from
len masking here, so assume we have recorded a mask for the operands'
mask producers.
PR tree-optimization/121059
* tree-vect-stmts.cc (vectorizable_operation): Record a
loop mask for mask AND operations.
* gcc.dg/vect/pr121059.c: New testcase.
|
|
When profiling is enabled with shrink wrapping, the mcount call may not
be placed at the function entry after
pushq %rbp
movq %rsp,%rbp
As a result, the profile data may be skewed which makes PGO less
effective.
Add --enable-x86-64-mfentry to enable -mfentry by default to use
__fentry__, added to glibc in 2010 by:
commit d22e4cc9397ed41534c9422d0b0ffef8c77bfa53
Author: Andi Kleen <ak@linux.intel.com>
Date: Sat Aug 7 21:24:05 2010 -0700
x86: Add support for frame pointer less mcount
instead of mcount, which is placed before the prologue so that -pg can
be used with -fshrink-wrap-separate enabled at -O1. This option is
64-bit only because __fentry__ doesn't support PIC in 32-bit mode. The
default is to enable -mfentry when targeting glibc.
Also warn about -pg without -mfentry with shrink wrapping enabled. The warning
is disabled for PIC in 32-bit mode.
gcc/
PR target/120881
* config.in: Regenerated.
* configure: Likewise.
* configure.ac: Add --enable-x86-64-mfentry.
* config/i386/i386-options.cc (ix86_option_override_internal):
Enable __fentry__ in 64-bit mode if ENABLE_X86_64_MFENTRY is set
to 1. Warn -pg without -mfentry with shrink wrapping enabled.
* doc/install.texi: Document --enable-x86-64-mfentry.
gcc/testsuite/
PR target/120881
* gcc.dg/20021014-1.c: Add additional -mfentry -fno-pic options
for x86.
* gcc.dg/aru-2.c: Likewise.
* gcc.dg/nest.c: Likewise.
* gcc.dg/pr32450.c: Likewise.
* gcc.dg/pr43643.c: Likewise.
* gcc.target/i386/pr104447.c: Likewise.
* gcc.target/i386/pr113122-3.c: Likewise.
* gcc.target/i386/pr119386-1.c: Add additional -mfentry if not
ia32.
* gcc.target/i386/pr119386-2.c: Likewise.
* gcc.target/i386/pr120881-1a.c: New test.
* gcc.target/i386/pr120881-1b.c: Likewise.
* gcc.target/i386/pr120881-1c.c: Likewise.
* gcc.target/i386/pr120881-1d.c: Likewise.
* gcc.target/i386/pr120881-2a.c: Likewise.
* gcc.target/i386/pr120881-2b.c: Likewise.
* gcc.target/i386/pr82699-1.c: Add additional -mfentry.
* lib/target-supports.exp (check_effective_target_fentry): New.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
darwin25 will be named macOS 26 (codename Tahoe). This is a change from
darwin24, which was macOS 15. We need to adapt the driver to this new
numbering scheme.
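The change in numbering can be sketched as a toy mapping from the Darwin major version to the macOS major version. This is not the code in darwin-driver.cc; the function name is hypothetical, and treating every version after darwin25 as "Darwin major plus one" is an assumption about future releases:
```
/* Map a Darwin kernel major version to the macOS major version.  */
int
toy_macos_major_for_darwin (int darwin_major)
{
  if (darwin_major >= 25)
    return darwin_major + 1;  /* darwin25 -> macOS 26 (assumed onwards).  */
  if (darwin_major >= 20)
    return darwin_major - 9;  /* darwin20 -> macOS 11, darwin24 -> macOS 15.  */
  return 10;                  /* darwin19 and earlier: macOS 10.x.  */
}
```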
2025-07-14 François-Xavier Coudert <fxcoudert@gcc.gnu.org>
gcc/ChangeLog:
PR target/120645
* config/darwin-driver.cc: Account for latest macOS numbering
scheme.
gcc/testsuite/ChangeLog:
* gcc.dg/darwin-minversion-link.c: Account for macOS 26.
|
|
I'm going to refine a part of the PR 87600 fix which seems to trigger
PR 120983, from which LoongArch particularly suffers. Enable the PR 87600
tests so I'll not regress PR 87600.
gcc/testsuite/ChangeLog:
PR rtl-optimization/87600
PR rtl-optimization/120983
* gcc.dg/pr87600.h [__loongarch__]: Define REG0 and REG1.
* gcc.dg/pr87600-1.c (dg-do): Add loongarch.
* gcc.dg/pr87600-2.c (dg-do): Likewise.
|
|
In r16-1631-g2334d30cd8feac I added support for capturing state
information from -fanalyzer in XML form, and adding a way to visualize
these states in HTML output. The data was optionally captured in SARIF
output (with "xml-state=yes"), stashing the XML in string form in
a property bag.
This worked, but there was no way to round-trip the stored data back
from SARIF without adding an XML parser to GCC, which I don't want to
do.
SARIF supports capturing directed graphs, so this patch:
(a) adds a new namespace diagnostics::digraphs, with classes digraph,
node, and edge, representing directed graphs in a form similar to
what SARIF can serialize
(b) adds support to GCC's diagnostic subsystem for reporting graphs,
either "globally" or as part of a diagnostic. An example in a testsuite
plugin emits an error that has a couple of dummy graphs associated with
it, and captures the optimization passes as a digraph "globally".
Graphs are ignored by text sinks, but are captured by sarif sinks,
and the "experimental-html" sink gains SVG-based rendering of any graphs
using dot. This HTML output is rather crude; an example can be seen
here:
https://dmalcolm.fedorapeople.org/gcc/2025-07-10/diagnostic-test-graphs-html.c.html
(c) adds support to libgdiagnostics for the above
(d) adds support to sarif-replay for the above (round-tripping any
graph information)
(e) replaces the XML representation of state with a representation
based on the above directed graphs, using property bags to stash
additional information (e.g. "this is an on-stack buffer")
(f) implements round-tripping of this information in sarif-replay
To summarize:
- previously we could generate HTML diagrams for debugging
-fanalyzer directly from gcc, but not from stored .sarif output.
- with this patch, we can generate such HTML diagrams both directly
*and* from stored .sarif output (provided the SARIF sink was created
with "state-graphs=yes")
Examples of HTML output can be seen here:
https://dmalcolm.fedorapeople.org/gcc/2025-07-10/
where, as before, j/k can be used to cycle through the events. This output
is almost identical to the output from the old XML-based
implementation seen at:
https://dmalcolm.fedorapeople.org/gcc/2025-06-23/
gcc/ChangeLog:
* Makefile.in (OBJS-libcommon): Add diagnostic-digraphs.o and
diagnostic-state-graphs.o.
gcc/ChangeLog:
* diagnostic-format-html.cc: Include "diagnostic-format-sarif.h",
Replace include of "diagnostic-state.h" with includes of
"diagnostic-digraphs.h" and "diagnostic-state-graphs.h".
(html_generation_options::html_generation_options): Update for
field renaming.
(html_builder::m_body_element): New field.
(html_builder::html_builder): Initialize m_body_element.
(html_builder::maybe_make_state_diagram): Port from XML
implementation to state graph implementation.
(html_builder::make_element_for_diagnostic): Add any
per-diagnostic graphs.
(html_builder::add_graph): New.
(html_builder::emit_global_graph): New.
(html_output_format::report_global_digraph): New.
* diagnostic-format-html.h
(html_generation_options::m_show_state_diagram_xml): Replace
with...
(html_generation_options::m_show_state_diagrams_sarif): ...this.
(html_generation_options::m_show_state_diagram_dot_src): Rename
to...
(html_generation_options::m_show_state_diagrams_dot_src): ...this.
* diagnostic-format-sarif.cc: Include "diagnostic-digraphs.h" and
"diagnostic-state-graphs.h".
(sarif_builder::m_run_graphs): New field.
(sarif_result::on_nested_diagnostic): Update call to
make_location_object to pass arg by pointer.
(sarif_builder::sarif_builder): Initialize m_run_graphs.
(sarif_builder::report_global_digraph): New.
(sarif_builder::make_result_object): Add any graphs to
the result object.
(sarif_builder::make_locations_arr): Update call to
make_location_object to pass arg by pointer.
(sarif_builder::make_location_object): Pass param "loc_mgr" by
pointer rather than by reference so that it can be null, and
handle this case.
(copy_any_property_bag): New.
(make_sarif_graph): New.
(make_sarif_node): New.
(make_sarif_edge): New.
(sarif_property_bag::set_graph): New.
(populate_thread_flow_location_object): Port from XML
implementation to state graph implementation.
(make_run_object): Store any graphs.
(sarif_output_format::report_global_digraph): New.
(sarif_generation_options::sarif_generation_options): Rename
m_xml_state to m_state_graph.
(selftest::test_make_location_object): Update for change to
make_location_object.
* diagnostic-format-sarif.h:
(sarif_generation_options::m_xml_state): Replace with...
(sarif_generation_options::m_state_graph): ...this.
(class sarif_location_manager): Add forward decl.
(diagnostics::digraphs::digraph): New forward decl.
(diagnostics::digraphs::node): New forward decl.
(diagnostics::digraphs::edge): New forward decl.
(sarif_property_bag::set_graph): New decl.
(class sarif_graph): New.
(class sarif_node): New.
(class sarif_edge): New.
(make_sarif_graph): New decl.
(make_sarif_node): New decl.
(make_sarif_edge): New decl.
* diagnostic-format-text.h
(diagnostic_text_output_format::report_global_digraph): New.
* diagnostic-format.h
(diagnostic_output_format::report_global_digraph): New vfunc.
* diagnostic-digraphs.cc: New file.
* diagnostic-digraphs.h: New file.
* diagnostic-metadata.h (diagnostics::digraphs::lazy_digraphs):
New forward decl.
(diagnostic_metadata::diagnostic_metadata): Initialize
m_lazy_digraphs.
(diagnostic_metadata::set_lazy_digraphs): New.
(diagnostic_metadata::get_lazy_digraphs): New.
(diagnostic_metadata::m_lazy_digraphs): New field.
* diagnostic-output-spec.cc (sarif_scheme_handler::make_sink):
Update for XML to state graph changes.
(sarif_scheme_handler::make_sarif_gen_opts): Likewise.
(html_scheme_handler::make_sink): Rename "show-state-diagram-xml"
to "show-state-diagrams-sarif" and use pluralization consistently.
* diagnostic-path.cc: Replace include of "xml.h" with
"diagnostic-state-graphs.h".
(diagnostic_event::maybe_make_xml_state): Replace with...
(diagnostic_event::maybe_make_diagnostic_state_graph): ...this.
* diagnostic-path.h (diagnostics::digraphs::digraph): New forward
decl.
(diagnostic_event::maybe_make_xml_state): Replace with...
(diagnostic_event::maybe_make_diagnostic_state_graph): ...this.
* diagnostic-state-graphs.cc: New file.
* diagnostic-state-graphs.h: New file.
* diagnostic-state-to-dot.cc: Port implementation from XML to
state graphs.
* diagnostic-state.h: Deleted file.
* diagnostic.cc (diagnostic_context::report_global_digraph): New.
* diagnostic.h (diagnostics::digraphs::lazy_digraph): New forward
decl.
(diagnostic_context::report_global_digraph): New decl.
* doc/analyzer.texi (Debugging the Analyzer): Update to reflect
change from XML to state graphs.
* doc/invoke.texi ("sarif" diagnostics sink): Replace "xml-state"
with "state-graphs".
("experimental-html" diagnostics sink): Replace
"show-state-diagrams-xml" with "show-state-diagrams-sarif"
* doc/libgdiagnostics/topics/compatibility.rst
(LIBGDIAGNOSTICS_ABI_3): New.
* doc/libgdiagnostics/topics/graphs.rst: New file.
* doc/libgdiagnostics/topics/index.rst: Add graphs.rst.
* graphviz.h (node_id::operator=): New.
* json.h (json::value::dyn_cast_string): New.
(json::object::get_num_keys): New accessor.
(json::object::get_key): New accessor.
(json::string::dyn_cast_string): New.
* libgdiagnostics++.h (class libgdiagnostics::graph): New.
(class libgdiagnostics::node): New.
(class libgdiagnostics::edge): New.
(class libgdiagnostics::diagnostic::take_graph): New.
(class libgdiagnostics::manager::take_global_graph): New.
(class libgdiagnostics::graph::set_description): New.
(class libgdiagnostics::graph::get_node_by_id): New.
(class libgdiagnostics::graph::get_edge_by_id): New.
(class libgdiagnostics::graph::add_edge): New.
(class libgdiagnostics::node::set_label): New.
(class libgdiagnostics::node::set_location): New.
(class libgdiagnostics::node::set_logical_location): New.
* libgdiagnostics-private.h: New file.
* libgdiagnostics.cc: Define INCLUDE_STRING. Include
"diagnostic-digraphs.h", "diagnostic-state-graphs.h", and
"libgdiagnostics-private.h".
(struct diagnostic_graph): New.
(struct diagnostic_node): New.
(struct diagnostic_edge): New.
(libgdiagnostics_path_event::libgdiagnostics_path_event): Add
state_graph param.
(libgdiagnostics_path_event::maybe_make_diagnostic_state_graph):
New.
(libgdiagnostics_path_event::m_state_graph): New field.
(diagnostic_execution_path::add_event_va): Add state_graph param.
(class prebuilt_digraphs): New.
(diagnostic::diagnostic): Use m_graphs in m_metadata.
(diagnostic::take_graph): New.
(diagnostic::get_graphs): New accessor.
(diagnostic::m_graphs): New field.
(diagnostic_manager::take_global_graph): New.
(diagnostic_execution_path_add_event): Update for new param to
add_event_va.
(diagnostic_execution_path_add_event_va): Likewise.
(diagnostic_graph::add_node_with_id): New public entrypoint.
(diagnostic_graph::add_edge_with_label): New public entrypoint.
(diagnostic_manager_new_graph): New public entrypoint.
(diagnostic_manager_take_global_graph): New public entrypoint.
(diagnostic_take_graph): New public entrypoint.
(diagnostic_graph_release): New public entrypoint.
(diagnostic_graph_set_description): New public entrypoint.
(diagnostic_graph_add_node): New public entrypoint.
(diagnostic_graph_add_edge): New public entrypoint.
(diagnostic_graph_get_node_by_id): New public entrypoint.
(diagnostic_graph_get_edge_by_id): New public entrypoint.
(diagnostic_node_set_location): New public entrypoint.
(diagnostic_node_set_label): New public entrypoint.
(diagnostic_node_set_logical_location): New public entrypoint.
(private_diagnostic_execution_path_add_event_2): New private
entrypoint.
(private_diagnostic_graph_set_property_bag): New private
entrypoint.
(private_diagnostic_node_set_property_bag): New private
entrypoint.
(private_diagnostic_edge_set_property_bag): New private
entrypoint.
* libgdiagnostics.h (diagnostic_graph): New typedef.
(diagnostic_node): New typedef.
(diagnostic_edge): New typedef.
(diagnostic_manager_new_graph): New decl.
(diagnostic_manager_take_global_graph): New decl.
(diagnostic_take_graph): New decl.
(diagnostic_graph_release): New decl.
(diagnostic_graph_set_description): New decl.
(diagnostic_graph_add_node): New decl.
(diagnostic_graph_add_edge): New decl.
(diagnostic_graph_get_node_by_id): New decl.
(diagnostic_graph_get_edge_by_id): New decl.
(diagnostic_node_set_label): New decl.
(diagnostic_node_set_location): New decl.
(diagnostic_node_set_logical_location): New decl.
* libgdiagnostics.map (LIBGDIAGNOSTICS_ABI_3): New.
* libsarifreplay.cc: Include "libgdiagnostics-private.h".
(id_map): New "using".
(sarif_replayer::report_invalid_sarif): Update for change to
report_problem params.
(sarif_replayer::report_unhandled_sarif): Likewise.
(sarif_replayer::report_note): New.
(sarif_replayer::report_problem): Pass param "ref" by
pointer rather than reference and handle it being null.
(sarif_replayer::maybe_get_property_bag): New.
(sarif_replayer::maybe_get_property_bag_value): New.
(sarif_replayer::handle_run_obj): Handle run-level "graphs" as per
§3.14.20.
(sarif_replayer::handle_result_obj): Handle result-level "graphs"
as per §3.27.19.
(handle_thread_flow_location_object): Optionally handle graphs
stored in property "gcc/diagnostic_event/state_graph" as state
graphs.
(sarif_replayer::handle_graph_object): New.
(sarif_replayer::handle_node_object): New.
(sarif_replayer::handle_edge_object): New.
(sarif_replayer::get_graph_node_by_id_property): New.
* selftest-run-tests.cc (selftest::run_tests): Call
selftest::diagnostic_graph_cc_tests and
selftest::diagnostic_state_graph_cc_tests.
* selftest.h (selftest::diagnostic_graph_cc_tests): New decl.
(selftest::diagnostic_state_graph_cc_tests): New decl.
gcc/analyzer/ChangeLog:
* ana-state-to-diagnostic-state.cc: Reimplement, replacing
XML-based implementation with one based on state graphs.
* ana-state-to-diagnostic-state.h: Likewise.
* checker-event.cc: Replace include of "xml.h" with include of
"diagnostic-state-graphs.h".
(checker_event::maybe_make_xml_state): Replace with...
(checker_event::maybe_make_diagnostic_state_graph): ...this.
* checker-event.h: Add include of "diagnostic-digraphs.h".
(checker_event::maybe_make_xml_state): Replace decl with...
(checker_event::maybe_make_diagnostic_state_graph): ...this.
* engine.cc (exploded_node::on_stmt_pre): Replace
"_analyzer_dump_xml" with "__analyzer_dump_sarif".
* program-state.cc: Replace include of "diagnostic-state.h" with
"diagnostic-state-graphs.h".
(program_state::dump_dot): Port from XML to state graphs.
* program-state.h: Drop redundant forward decl of xml::document.
(program_state::make_xml): Replace decl with...
(program_state::make_diagnostic_state_graph): ...this.
(program_state::dump_xml_to_pp): Drop decl.
(program_state::dump_xml_to_file): Drop decl.
(program_state::dump_xml): Drop decl.
(program_state::dump_dump_sarif): New decl.
* sm-malloc.cc (get_dynalloc_state_for_state): New.
(malloc_state_machine::add_state_to_xml): Replace with...
(malloc_state_machine::add_state_to_state_graph): ...this.
* sm.cc (state_machine::add_state_to_xml): Replace with...
(state_machine::add_state_to_state_graph): ...this.
(state_machine::add_global_state_to_xml): Replace with...
(state_machine::add_global_state_to_state_graph): ...this.
* sm.h (class xml_state): Drop forward decl.
(class analyzer_state_graph): New forward decl.
(state_machine::add_state_to_xml): Replace decl with...
(state_machine::add_state_to_state_graph): ...this.
(state_machine::add_global_state_to_xml): Replace decl with...
(state_machine::add_global_state_to_state_graph): ...this.
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/state-diagram-1-sarif.py (test_xml_state):
Rename to...
(test_state_graph): ...this. Port from XML to SARIF graphs.
* gcc.dg/analyzer/state-diagram-1.c: Update sink option
from "sarif:xml-state=yes" to "sarif:state-graphs=yes".
* gcc.dg/analyzer/state-diagram-5-sarif.c: Likewise.
* gcc.dg/analyzer/state-diagram-5-sarif.py: Drop import of ET.
(test_nested_types_in_xml_state): Rename to...
(test_nested_types_in_state_graph): ...this. Port from XML to
SARIF graphs.
* gcc.dg/plugin/diagnostic-test-graphs-html.c: New test.
* gcc.dg/plugin/diagnostic-test-graphs-html.py: New test script.
* gcc.dg/plugin/diagnostic-test-graphs-sarif.c: New test.
* gcc.dg/plugin/diagnostic-test-graphs-sarif.py: New test script.
* gcc.dg/plugin/diagnostic-test-graphs.c: New test.
* gcc.dg/plugin/diagnostic_plugin_test_graphs.cc: New test plugin.
* gcc.dg/plugin/plugin.exp (plugin_test_list): Add the above.
* lib/sarif.py (get_xml_state): Delete.
(get_state_graph): New.
(def get_state_node_attr): New.
(get_state_node_kind): New.
(get_state_node_name): New.
(get_state_node_type): New.
(get_state_node_value): New.
* sarif-replay.dg/2.1.0-invalid/3.40.2-duplicate-node-id.sarif:
New test.
* sarif-replay.dg/2.1.0-invalid/3.41.4-unrecognized-node-id.sarif:
New test.
* sarif-replay.dg/2.1.0-valid/graphs-check-html.py: New test
script.
* sarif-replay.dg/2.1.0-valid/graphs-check-sarif-roundtrip.py: New
test script.
* sarif-replay.dg/2.1.0-valid/graphs.sarif: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
The following fixes the loop following the reduction chain to
properly visit all SLP nodes involved and makes the stmt info
and the SLP node we track match.
PR tree-optimization/121034
* tree-vect-loop.cc (vectorizable_reduction): Cleanup
reduction chain following code.
* gcc.dg/vect/pr121034.c: New testcase.
|
|
.ACCESS_WITH_SIZE (PR121000)
The size of the element of the FAM _cannot_ reliably depend on the original
TYPE of the FAM that we passed as the 6th parameter to the .ACCESS_WITH_SIZE:
TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (gimple_call_arg (call, 5))))
when the element of the FAM has a variable length type. Since the variable
that represents TYPE_SIZE_UNIT has no explicit usage in the original IL,
compiler transformations (such as DSE) that are applied before object_size
phase might eliminate the whole definition of the variable that represents
the TYPE_SIZE_UNIT of the element of the FAM.
In order to resolve this issue, instead of passing the original TYPE of the
FAM as the 6th argument to .ACCESS_WITH_SIZE, we should explicitly pass the
original TYPE_SIZE_UNIT of the element TYPE of the FAM as the 6th argument
to the call to .ACCESS_WITH_SIZE.
PR middle-end/121000
gcc/c/ChangeLog:
* c-typeck.cc (build_access_with_size_for_counted_by): Update comments.
Pass TYPE_SIZE_UNIT of the element as the 6th argument.
gcc/ChangeLog:
* internal-fn.cc (expand_ACCESS_WITH_SIZE): Update comments.
* internal-fn.def (ACCESS_WITH_SIZE): Update comments.
* tree-object-size.cc (access_with_size_object_size): Update comments.
Get the element_size from the 6th argument directly.
gcc/testsuite/ChangeLog:
* gcc.dg/flex-array-counted-by-pr121000.c: New test.
|
|
This patch fixes several issues I noticed in gimple matching and the -Wauto-profile
warning. One problem is that we mismatched symbols with user names, such as
"*strlen" instead of "strlen". I added raw_symbol_name to strip the extra '*', which
is OK on ELF targets, which are the only targets we support with auto-profile, but
eventually we will want to add the user prefix. There is a sorry about this.
Also I think dwarf2out is wrong:
static void
add_linkage_attr (dw_die_ref die, tree decl)
{
const char *name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
/* Mimic what assemble_name_raw does with a leading '*'. */
if (name[0] == '*')
name = &name[1];
The patch also fixes the locations of warnings. I used the location of the problematic
statement as the warning_at parameter but also included info about the containing
function. This made warning_at ignore the first location; that is fixed now.
I also fixed the ICE with -Wno-auto-profile discussed earlier.
Bootstrapped/regtested x86_64-linux. Autoprofiled bootstrap now fails for
weird reasons for me (it does not build the training stage), so I will try to
debug this before committing.
gcc/ChangeLog:
* auto-profile.cc: Include output.h.
(function_instance::set_call_location): Also sanity check
that location is known.
(raw_symbol_name): Two new static functions.
(dump_inline_stack): Use it.
(string_table::get_index_by_decl): Likewise.
(function_instance::get_cgraph_node): Likewise.
(function_instance::get_function_instance_by_decl): Fix typo
in warning; use raw names; fix lineno decoding.
(match_with_target): Add containing function parameter;
correctly output function and call location in warning.
(function_instance::lookup_count): Fix warning locations.
(function_instance::match): Fix warning locations; avoid
crash with mismatched callee; do not warn about broken callsites
twice.
(autofdo_source_profile::offline_external_functions): Use
raw_assembler_name.
(walk_block): Use raw_assembler_name.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-prof/afdo-inline.c: Add user symbol names.
|
|
When using SVE INDEX to load an Advanced SIMD vector, we need to
take account of the different element ordering for big-endian
targets. For example, when big-endian targets store the V4SI
constant { 0, 1, 2, 3 } in registers, 0 becomes the most
significant element, whereas INDEX always operates from the
least significant element. A big-endian target would therefore
load V4SI { 0, 1, 2, 3 } using:
INDEX Z0.S, #3, #-1
rather than little-endian's:
INDEX Z0.S, #0, #1
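As a rough illustration of the two INDEX forms above, the following C sketch
computes the base and step operands for a linear series. The names
index_args and index_for_series are hypothetical and not part of aarch64.cc,
and the generalization beyond the { 0, 1, 2, 3 } case is an assumption
drawn from the description:

```c
#include <assert.h>

/* Hypothetical pair of immediates for "INDEX Zd.S, #base, #step".  */
struct index_args
{
  long base;
  long step;
};

/* Sketch: choose INDEX operands for the Advanced SIMD constant
   { first, first + step, ..., first + (nelts - 1) * step }.
   INDEX always fills lanes starting from the least significant one.
   On big-endian, element 0 of the constant is the most significant
   lane, so the series has to be generated in reverse.  */
static struct index_args
index_for_series (long first, long step, int nelts, int big_endian)
{
  struct index_args args;
  assert (nelts > 0);
  if (big_endian)
    {
      args.base = first + (long) (nelts - 1) * step;
      args.step = -step;
    }
  else
    {
      args.base = first;
      args.step = step;
    }
  return args;
}
```

For V4SI { 0, 1, 2, 3 } this yields #3, #-1 on big-endian and #0, #1 on
little-endian, matching the two instructions shown above.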
While there, I noticed that we would only check the first vector
in a multi-vector SVE constant, which would trigger an ICE if the
other vectors turned out to be invalid. This is pretty difficult to
trigger at the moment, since we only allow single-register modes to be
used as frontend & middle-end vector modes, but it can be seen using
the RTL frontend.
gcc/
* config/aarch64/aarch64.cc (aarch64_sve_index_series_p): New
function, split out from...
(aarch64_simd_valid_imm): ...here. Account for the different
SVE and Advanced SIMD element orders on big-endian targets.
Check each vector in a structure mode.
gcc/testsuite/
* gcc.dg/rtl/aarch64/vec-series-1.c: New test.
* gcc.dg/rtl/aarch64/vec-series-2.c: Likewise.
* gcc.target/aarch64/sve/acle/general/dupq_2.c: Fix expected
output for this big-endian test.
* gcc.target/aarch64/sve/acle/general/dupq_4.c: Likewise.
* gcc.target/aarch64/sve/vec_init_3.c: Restrict to little-endian
targets and add more tests.
* gcc.target/aarch64/sve/vec_init_4.c: New big-endian version
of vec_init_3.c.
|
|
The following changes noinline to noipa to avoid having IPA-CP clones
confusing the vectorized loop counting.
PR testsuite/120093
* gcc.dg/vect/pr101145.c: Use noipa instead of noinline
attribute.
|
|
Fix-up for commit 72e85d46472716e670cbe6e967109473b8d12d38
"tree-optimization/120780: Support object size for containing objects".
'size_t sz' is unused here, and GCC/nvptx doesn't accept this:
spawn -ignore SIGHUP [...]/nvptx-none-run ./builtin-dynamic-object-size-pr120780.exe
error : Prototype doesn't match for 'main' in 'input file 1 at offset 1924', first defined in 'input file 1 at offset 1924'
nvptx-run: cuLinkAddData failed: unknown error (CUDA_ERROR_UNKNOWN, 999)
FAIL: gcc.dg/builtin-dynamic-object-size-pr120780.c execution test
gcc/testsuite/
* gcc.dg/builtin-dynamic-object-size-pr120780.c: Fix 'main' function.
|
|
Before the change in g:309dbcea2cabb31bde1a65cdfd30bb7f87b170a2 we would never
set a range for loops with a constant VF that require partial vectors.
I think a range could be set, since I think the number of latch executions is the
ceiling division TYPE_MAX_VALUE / vf, which accounts for the partial iteration.
This would also then deal with the ICE in the PR, where the chosen VF was
much higher than TYPE_MAX_VALUE and a mask is relied upon to make it safe.
Since the patch was not supposed to change behavior, I've added an additional
partial vector check to the const_vf > 0 check to make it explicit that we only
set it on non-partial vectors (the alternative would have been to swap the order of
the vf.constant(&const_vf) check, but that would have hidden the requirement
sneakily).
The second patch adds support for ranges for partial masks.
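For reference, the ceiling division mentioned above can be sketched in C as
follows. The helper name ceil_div_u64 is hypothetical; this is not the code in
vect_gen_vector_loop_niters, only the arithmetic the message describes:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: ceiling of max_value / vf, written so that it cannot overflow
   (the usual (a + b - 1) / b form can wrap when a is near the maximum).  */
static uint64_t
ceil_div_u64 (uint64_t max_value, uint64_t vf)
{
  assert (vf != 0);
  return max_value / vf + (max_value % vf != 0);
}
```

With an 8-bit IV type (TYPE_MAX_VALUE 255) and vf 8 this gives 32; with a VF
larger than TYPE_MAX_VALUE, such as 512, it gives 1, i.e. a single masked
iteration.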
gcc/ChangeLog:
PR tree-optimization/120922
* tree-vect-loop-manip.cc (vect_gen_vector_loop_niters): Don't set range
for partial vectors.
gcc/testsuite/ChangeLog:
PR tree-optimization/120922
* gcc.dg/vect/pr120922.c: New test.
|
|
The following avoids inlining the actual main() (renamed to
guality_main) into the guality plumbing. This can cause
jump threading opportunities to appear and generally increase
the chance that what we actually test isn't what we think. Likewise
make guality_check noipa instead of just noinline.
gcc/testsuite/
* gcc.dg/guality/guality.h (guality_main): Declare noipa.
(guality_check): Likewise.
|
|
I don't recall which port complained, but pr120654.c was failing on one or more
of the embedded targets due to the use of malloc/free. This change just turns
them into the __builtin variants which makes everyone happy again.
gcc/testsuite
* gcc.dg/torture/pr120654.c: Use __builtin variants of malloc and free.
|
|
Convert a pointer reference with counted_by attribute to .ACCESS_WITH_SIZE." due to PR120929.
This reverts commit 687727375769dd41971bad369f3553f1163b3e7a.
|
|
due to PR120929
This reverts commit 7165ca43caf47007f5ceaa46c034618d397d42ec.
|
|
due to PR120929
This reverts commit 9d579c522d551eaa807e438206e19a91a3def67f.
|
|
Drop down from SVE2 to SVE1 as that's the minimum
required for the test, and since it's a mid-end test
add the aarch64_sve_hw check.
gcc/testsuite/ChangeLog:
PR tree-optimization/120817
* gcc.dg/vect/pr120817.c: Add SVE HW check.
|
|
DSE used ao_ref_init_from_ptr_and_size for .MASK_STORE but
alias-analysis will use the specified size to disambiguate
against smaller objects. For .MASK_STORE we instead have to
make the access size unspecified but we can still constrain
the access extent based on the maximum size possible.
PR tree-optimization/120817
* tree-ssa-dse.cc (initialize_ao_ref_for_dse): Use
ao_ref_init_from_ptr_and_range with unknown size for
.MASK_STORE and .MASK_LEN_STORE.
* gcc.dg/vect/pr120817.c: New testcase.
|
|
These builtins require a constant integer for the third argument but currently
there is an assert rather than an error. This fixes that and updates the documentation too.
Uses the same terms as are used for the __builtin_prefetch arguments.
Bootstrapped and tested on x86_64-linux-gnu.
PR middle-end/120709
gcc/ChangeLog:
* builtins.cc (expand_builtin_crc_table_based): Error out
instead of asserting the 3rd argument is an integer constant.
* internal-fn.cc (expand_crc_optab_fn): Likewise.
* doc/extend.texi (crc): Document requirement of the poly argument
being a constant.
gcc/testsuite/ChangeLog:
* gcc.dg/crc-non-cst-poly-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
The cdce code introduces a test for a NaN using the EQ_EXPR code.
The problem is EQ_EXPR can cause an exception with non-call exceptions
and signaling NaNs turned on. This is now correctly rejected by the verifier
since r16-241-g4c40e3d7b9152f.
The fix is to separate out the comparison into its own statement from the GIMPLE_COND.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR tree-optimization/120951
gcc/ChangeLog:
* tree-call-cdce.cc (use_internal_fn): If EQ_EXPR can throw for
floating point types with non-call exceptions, create
the EQ_EXPR separately.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr120951-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
The main difference between normal profile feedback and auto-fdo is that with profile
feedback every basic block with a non-zero profile has an incoming edge with a non-zero
profile. With auto-profile it is possible that none of the predecessors was sampled,
and the tool also has a cutoff parameter which makes it ignore small counts.
This becomes a problem when one tries to specialize code and scale the profile.
For example, if an inline function happens to have a hot loop with non-zero counts
but its entry count is zero and we want to inline it to a call with a non-zero
count X, we want to scale the body by X/0, which we currently turn into X/1.
This is a problem since I added logic to scale up the auto-profiles (to get
some extra bits of precision), so X is often a large value and multiplying by X
is not the right answer at all. The multiply factor should be <= 1.
Iterating this a few times will make counts cap and we will lose any useful info.
The original implementation avoided this by doing all inlines before AFDO readback,
but this is not possible with LTO (unless we move AFDO readback to WPA or add
support for context sensitive profiles). I think I can get the scaling to work
reasonably well and then we can look into possible benefits of context sensitive
profiling, which can be implemented atop both AFDO and FDO.
This patch adds a cutoff value to profile_info which is initialized by profile
feedback to 1 and by auto-profile to the scale factor (since we do not know the
cutoff create_gcov used; llvm's tool streams it and we probably should too).
Then force_nonzero forces every value smaller than cutoff/2 to cutoff/2, which
should keep scaling factors in reasonable ranges.
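The clamping described above can be sketched in plain C as follows. This is
only an illustration under stated assumptions: force_nonzero_sketch is a
hypothetical stand-in for profile_count::force_nonzero, counts are modelled as
plain 64-bit integers, and the extra clamp to 1 (for cutoff values below 2) is
an assumption made here so that the result is never zero:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: raise any count below cutoff / 2 to cutoff / 2.
   Assumption: when cutoff / 2 would be 0 (e.g. cutoff == 1, the value
   used for normal profile feedback), use 1 so the result is non-zero.  */
static uint64_t
force_nonzero_sketch (uint64_t count, uint64_t cutoff)
{
  uint64_t floor_val = cutoff / 2;
  if (floor_val == 0)
    floor_val = 1;
  return count < floor_val ? floor_val : count;
}
```

With a cutoff of 1000, scaling an inlined body whose entry count is 0 by a
call count X then divides by 500 rather than by 1, which keeps the scale
factor far smaller than X itself.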
gcc/ChangeLog:
* auto-profile.cc
(autofdo_source_profile::read): Scale cutoff.
(read_autofdo_file): Initialize cutoff.
* coverage.cc (read_counts_file): Initialize cutoff to 1.
* gcov-io.h (struct gcov_summary): Add cutoff field.
* ipa-inline.cc (inline_small_functions): max_count can be non-zero
also with auto_profile.
* lto-cgraph.cc (output_profile_summary): Write cutoff
and sum_max.
(input_profile_summary): Read cutoff and sum max.
(merge_profile_summaries): Initialize and scale global cutoffs
and sum max.
* profile-count.cc: Include profile.h.
(profile_count::force_nonzero): Move here from ...; use cutoff.
* profile-count.h (profile_count::force_nonzero): ... here.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-prof/clone-merge-1.c:
|
|
tree_expr_nonnegative_warnv_p [PR118948]
This is an obvious fix for this small regression. Basically after r15-328-g5726de79e2154a,
there is a call to tree_expr_nonnegative_warnv_p where the type of the expression is now
error_mark_node. Though there was only a check if the expression was error_mark_node.
Bootstrapped and tested on x86_64-linux-gnu.
PR c/118948
gcc/ChangeLog:
* fold-const.cc (tree_expr_nonnegative_warnv_p): Use
error_operand_p instead of checking for error_mark_node directly.
gcc/testsuite/ChangeLog:
* gcc.dg/pr118948-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
The following avoids translating expressions through volatile
copies.
PR tree-optimization/120944
* tree-ssa-sccvn.cc (vn_reference_lookup_3): Gate optimizations
invalid when volatile is involved.
* gcc.dg/torture/pr120944.c: New testcase.
|
|
The following fixes bad alignment computation for epilog vectorization
when as in this case for 510.parest_r and masked epilog vectorization
with AVX512 we end up choosing AVX to vectorize the main loop and
masked AVX512 (sic!) to vectorize the epilog. In that case alignment
analysis for the epilog tries to force alignment of the base to 64,
but that cannot possibly help the epilog when the main loop had used
a vector mode with smaller alignment requirement.
There's another issue, that the check whether the step preserves
alignment needs to consider possibly previously involved VFs
(here, the main loop's smaller VF) as well.
These might not be the only case with problems for such a mode mix
but at least there it seems wise to never use DR alignment forcing
when analyzing an epilog.
We get to choose this mode setup because the iteration over epilog
modes doesn't prevent this, the maybe_ge (cached_vf_per_mode[0],
first_vinfo_vf) skip is conditional on !supports_partial_vectors
and it is also conditional on having a cached VF. Further nothing
in vect_analyze_loop_1 rejects this setup - it might be conceivable
that a target can do masking only for larger modes. There is a
second reason we end up with this mode setup, which is that
vect_need_peeling_or_partial_vectors_p says we do not need
peeling or partial vectors when analyzing the main loop with
AVX512 (if it would say so we'd have chosen a masked AVX512
epilog-only vectorization). It does that because it looks at
LOOP_VINFO_COST_MODEL_THRESHOLD (which is not yet computed, so
always zero at this point), and compares max_niter (5) against
the VF (8), but not with equality as the comment says but with
greater. This also needs looking at, PR120939.
PR tree-optimization/120927
* tree-vect-data-refs.cc (vect_compute_data_ref_alignment):
Do not force a DR's base alignment when analyzing an
epilog loop. Check whether the step preserves alignment
for all VFs possibly involved so far.
* gcc.dg/vect/vect-pr120927.c: New testcase.
* gcc.dg/vect/vect-pr120927-2.c: Likewise.
|
|
The following testcase is miscompiled with -fsanitize=undefined but we
introduce UB into the IL even without that flag.
The optimization ptr +- (expr +- cst) when expr/cst have undefined
overflow into (ptr +- cst) +- expr is sometimes simply not valid,
without careful analysis on what ptr points to we don't know if it
is valid to do (ptr +- cst) pointer arithmetics.
E.g. on the testcase, ptr points to start of an array (actually
conditionally one or another) and cst is -1, so ptr - 1 is invalid
pointer arithmetics, while ptr + (expr - 1) can be valid if expr
is at runtime always > 1 and smaller than size of the array ptr points
to + 1.
Unfortunately, removing this 1992-ish optimization altogether causes
FAIL: c-c++-common/restrict-2.c -Wc++-compat scan-tree-dump-times lim2 "Moving statement" 11
FAIL: gcc.dg/tree-ssa/copy-headers-5.c scan-tree-dump ch2 "is now do-while loop"
FAIL: gcc.dg/tree-ssa/copy-headers-5.c scan-tree-dump-times ch2 " if " 3
FAIL: gcc.dg/vect/pr57558-2.c scan-tree-dump vect "vectorized 1 loops"
FAIL: gcc.dg/vect/pr57558-2.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
regressions (restrict-2.c also for C++ in all std modes). I've been thinking
about some match.pd optimization for signed integer addition/subtraction of
constant followed by widening integral conversion followed by multiplication
or left shift, but that wouldn't help 32-bit arches.
So, instead at least for now, the following patch keeps doing the
optimization, just doesn't perform it in pointer arithmetics.
pointer_int_sum itself actually adds the multiplication by size_exp,
so ptr + expr is turned into ptr p+ expr * size_exp,
so this patch will try to optimize
ptr + (expr +- cst)
into
ptr p+ ((sizetype)expr * size_exp +- (sizetype)cst * size_exp)
and
ptr - (expr +- cst)
into
ptr p+ -((sizetype)expr * size_exp +- (sizetype)cst * size_exp)
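To make the validity argument concrete, here is a small C sketch. The function
names are hypothetical and this is not the pr120837.c testcase. The first
function only ever forms in-bounds pointers; the second mirrors the rewritten
form, where both operands are extended to an unsigned size type and multiplied
by the element size separately, so the addition wraps instead of overflowing:

```c
#include <assert.h>
#include <stddef.h>

/* Valid for 1 <= i <= n: the index i - 1 is computed first, so the
   pointer arr - 1 is never formed.  */
static int
element_before (const int *arr, size_t n, int i)
{
  assert (i >= 1 && (size_t) i <= n);
  return *(arr + (i - 1));
}

/* Sketch of the byte offset for ptr + (expr + cst):
   (sizetype) expr * size_exp + (sizetype) cst * size_exp.
   Unsigned arithmetic wraps, so a negative cst is harmless here.  */
static size_t
byte_offset (int expr, int cst, size_t size_exp)
{
  return (size_t) expr * size_exp + (size_t) cst * size_exp;
}
```

For expr == 3 and cst == -1 with 4-byte elements, byte_offset returns 8, the
same offset as index 2, without an intermediate ptr - 1.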
2025-07-04 Jakub Jelinek <jakub@redhat.com>
PR c/120837
* c-common.cc (pointer_int_sum): Rewrite the intop PLUS_EXPR or
MINUS_EXPR optimization into extension of both intop operands,
their separate multiplication and then addition/subtraction followed
by rest of pointer_int_sum handling after the multiplication.
* gcc.dg/ubsan/pr120837.c: New test.
|
|
gcc.dg/ipa/pr120295.c FAILs on Solaris:
FAIL: gcc.dg/ipa/pr120295.c (test for excess errors)
Excess errors:
ld: warning: symbol 'glob' has differing types:
(file /var/tmp//ccsDR59c.o type=OBJT; file /lib/libc.so type=FUNC);
/var/tmp//ccsDR59c.o definition taken
Fixed by renaming the glob variable to glob_ to avoid the conflict.
Tested on i386-pc-solaris2.11 and x86_64-pc-linux-gnu.
gcc/testsuite:
* gcc.dg/ipa/pr120295.c (glob): Rename to glob_.
|
|
MEM_REF cast of a subobject to its containing object has negative
offsets, which objsz sees as an invalid access. Support this use case
by peeking into the structure to validate that the containing object
indeed contains a type of the subobject at that offset and if present,
adjust the wholesize for the object to allow the negative offset.
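The source-level pattern that produces such a negative offset is the familiar
"container of" idiom. A minimal sketch, with hypothetical struct and function
names that are not taken from the new testcase:

```c
#include <assert.h>
#include <stddef.h>

struct inner
{
  int value;
};

struct outer
{
  int header;
  struct inner in;
  char tail[8];
};

/* Step back from a pointer to the subobject to its containing object.
   Relative to the subobject, the containing object starts at the
   negative offset -offsetof (struct outer, in).  */
static struct outer *
outer_from_inner (struct inner *p)
{
  return (struct outer *) ((char *) p - offsetof (struct outer, in));
}
```

This is only well defined when p really points at the in member of a
struct outer, which corresponds to the check the patch performs by peeking
into the structure at that offset.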
gcc/ChangeLog:
PR tree-optimization/120780
* tree-object-size.cc (inner_at_offset,
get_wholesize_for_memref): New functions.
(addr_object_size): Call get_wholesize_for_memref.
gcc/testsuite/ChangeLog:
PR tree-optimization/120780
* gcc.dg/builtin-dynamic-object-size-pr120780.c: New test case.
Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>
|
|
Current array bound checker only instruments ARRAY_REF, and the INDEX
information is the 2nd operand of the ARRAY_REF.
When extending the array bound checker to pointer references with
counted_by attributes, the hardest part is to get the INDEX of the
corresponding array ref from the offset computation expression of
the pointer ref. I.e.
Given an OFFSET expression, and the ELEMENT_SIZE,
get the index expression from the OFFSET.
For example:
OFFSET:
((long unsigned int) m * (long unsigned int) SAVE_EXPR <n>) * 4
ELEMENT_SIZE:
(sizetype) SAVE_EXPR <n> * 4
get the index as (long unsigned int) m.
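The patch works on expression trees like the ones in the example above. Purely
as a numeric illustration of the same relationship, and not of the
implementation, the index is the offset divided by the element size. The
helper name index_from_offset is hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Numeric sketch: with OFFSET == m * n * 4 and ELEMENT_SIZE == n * 4,
   the index is m.  The offset is assumed to be an exact multiple of
   the element size.  */
static size_t
index_from_offset (size_t offset, size_t element_size)
{
  assert (element_size != 0 && offset % element_size == 0);
  return offset / element_size;
}
```

For m == 3 and n == 5 the offset is 60 bytes and the element size is 20 bytes,
giving index 3.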
gcc/c-family/ChangeLog:
* c-gimplify.cc (is_address_with_access_with_size): New function.
(ubsan_walk_array_refs_r): Instrument an INDIRECT_REF whose base
address is .ACCESS_WITH_SIZE or an address computation whose base
address is .ACCESS_WITH_SIZE.
* c-ubsan.cc (ubsan_instrument_bounds_pointer_address): New function.
(struct factor_t): New structure.
(get_factors_from_mul_expr): New function.
(get_index_from_offset): New function.
(get_index_from_pointer_addr_expr): New function.
(is_instrumentable_pointer_array_address): New function.
(ubsan_array_ref_instrumented_p): Change prototype.
Handle MEM_REF in addition to ARRAY_REF.
(ubsan_maybe_instrument_array_ref): Handle MEM_REF in addition
to ARRAY_REF.
gcc/testsuite/ChangeLog:
* gcc.dg/ubsan/pointer-counted-by-bounds-2.c: New test.
* gcc.dg/ubsan/pointer-counted-by-bounds-3.c: New test.
* gcc.dg/ubsan/pointer-counted-by-bounds-4.c: New test.
* gcc.dg/ubsan/pointer-counted-by-bounds-5.c: New test.
* gcc.dg/ubsan/pointer-counted-by-bounds.c: New test.
|