Age | Commit message (Collapse) | Author | Files | Lines |
|
The following testcase ICEs since r15-1579 (addition of late combiner),
because *clrmem_short can't be split.
The problem is that the define_insn uses
(use (match_operand 1 "nonmemory_operand" "n,a,a,a"))
(use (match_operand 2 "immediate_operand" "X,R,X,X"))
(clobber (match_scratch:P 3 "=X,X,X,&a"))
and define_split assumed that if operands[1] is const_int_operand,
match_scratch will be always scratch, and it will be reg only if
it was the last alternative where operands[1] is a reg.
The pattern doesn't guarantee it though, of course RA will not try to
uselessly assign a reg there if it is not needed, but during RA
on the testcase below we match the last alternative, but then comes
late combiner and propagates const_int 3 into operands[1]. And that
matches fine, match_scratch matches either scratch or reg and the constraint
in that case is X for the first variant, so still just fine. But we won't
split that because the splitters only expect scratch.
The following patch fixes it by using match_scratch instead of scratch,
so that it accepts either.
2025-04-17 Jakub Jelinek <jakub@redhat.com>
PR target/119834
* config/s390/s390.md (define_split after *cpymem_short): Use
(clobber (match_scratch N)) instead of (clobber (scratch)). Use
(match_dup 4) and operands[4] instead of (match_dup 3) and operands[3]
in the last of those.
(define_split after *clrmem_short): Use (clobber (match_scratch N))
instead of (clobber (scratch)).
(define_split after *cmpmem_short): Likewise.
* g++.target/s390/pr119834.C: New test.
|
|
Unused; remnant of an (internal) experiment, before we had nvptx 'as'.
gcc/
* config/nvptx/nvptx.cc (TARGET_ASM_NEED_VAR_DECL_BEFORE_USE):
Don't '#define'.
|
|
An infinite loop was introduced by a previous refactoring in the
semantic pass for DeclarationExp nodes. Ensure the loop properly
terminates and add tests cases.
gcc/d/ChangeLog:
* dmd/MERGE: Merge upstream dmd 956e73d64e.
gcc/testsuite/ChangeLog:
* gdc.test/fail_compilation/test21247.d: New test.
* gdc.test/fail_compilation/test21247b.d: New test.
Reviewed-on: https://github.com/dlang/dmd/pull/21248
|
|
Fix misleading comments. That function only determines whether
replacements cost more; it doesn't actually *validate* costs as being
cheaper.
For example, it returns true also if it for various reasons cannot
determine the costs, or if the new cost is the same, like when doing
an identity replacement. The code has been the same since
r0-59417-g64b8935d4809f3.
* combine.cc: Correct comments about combine_validate_cost.
|
|
If we already gave an error while parsing a function, we don't also need to
try to explain what's wrong with it when we later try to use it in a
constant-expression. In the new testcase explain_invalid_constexpr_fn
couldn't find anything still in the function to complain about, so it said
because: followed by nothing.
We still try to constant-evaluate it to reduce error cascades, but we
shouldn't complain if it doesn't work very well.
This flag is similar to CLASSTYPE_ERRONEOUS that I added a while back.
PR c++/113360
gcc/cp/ChangeLog:
* cp-tree.h (struct language_function): Add erroneous bit.
* constexpr.cc (explain_invalid_constexpr_fn): Return if set.
(cxx_eval_call_expression): Quiet if set.
* parser.cc (cp_parser_function_definition_after_declarator)
* pt.cc (instantiate_body): Set it.
gcc/testsuite/ChangeLog:
* g++.dg/cpp23/constexpr-nonlit18.C: Remove redundant message.
* g++.dg/cpp1y/constexpr-diag2.C: New test.
* g++.dg/cpp1y/pr63996.C: Adjust expected errors.
* g++.dg/template/explicit-args6.C: Likewise.
* g++.dg/cpp0x/constexpr-ice21.C: Likewise.
|
|
|
|
Like other ppc targets, powerpc-*-elf needs -Wno-psabi to compile
gcc.dg/ipa/ipa-sra-19.c without an undesired warning about vector
argument passing.
for gcc/testsuite/ChangeLog
* gcc.dg/ipa/ipa-sra-19.c: Add -Wno-psabi on ppc-elf too.
|
|
gcc/ChangeLog
PR c/88382
* doc/extend.texi (Syntax Extensions): Adjust menu.
(Raw String Literals): New section.
|
|
Usage of the altivec vector attribute requires use of the -maltivec option.
Replace with a generic equivalent which allows building the test case on
multiple other targets and non-altivec ppc cpus, but still diagnoses the
ICE on unfixed compilers.
2025-04-16 Peter Bergner <bergner@linux.ibm.com>
gcc/testsuite/
PR tree-optimization/112822
* g++.dg/pr112822.C: Replace altivec vector attribute with a generic
vector attribute.
|
|
gcc/cobol
PR cobol/119759
* LICENSE: Deleted.
|
|
pattern using rx_cmpstrn is cmpstrsi for which len is a constant -1,
so we'll be moving the setpsw instructions from rx_cmpstrn to
cmpstrnsi as follows:
1. Adjust the predicate on the length operand from "register_operand"
to "nonmemory_operand". This will allow constants to appear here,
instead of having them already transferred into a register.
2. Check to see if the len value is constant, and then check if it is
actually zero. In that case, short-circuit the rest of the pattern
and set the result register to 0.
3. Emit 'setpsw c' and 'setpsw z' instructions when the len is not a
constant, in case it turns out to be zero at runtime.
4. Remove the two 'setpsw' instructions from rx_cmpstrn.
gcc/
* config/rx/rx.md (cmpstrnsi): Allow constant length. For
static length 0, just store 0 into the output register.
For dynamic zero, set C/Z appropriately.
(rxcmpstrn): No longer set C/Z.
|
|
This is a regression introduced on the mainline and 14 branch by:
https://gcc.gnu.org/pipermail/gcc-cvs/2023-October/391658.html
The change bypasses int_fits_type_p (essentially) to work around the
signedness constraints, but in doing so disregards the peculiarities
of boolean types whose precision is not 1 dealt with by the predicate,
leading to the creation of a problematic conversion here.
Fixed by special-casing boolean types whose precision is not 1, as done
in several other places.
gcc/
* tree-ssa-phiopt.cc (factor_out_conditional_operation): Do not
bypass the int_fits_type_p test for boolean types whose precision
is not 1.
gcc/testsuite/
* gnat.dg/opt105.adb: New test.
* gnat.dg/opt105_pkg.ads, gnat.dg/opt105_pkg.adb: New helper.
|
|
gcc/ChangeLog
* common.opt.urls: Regenerated.
|
|
Since r12-5426 apply_late_template_attributes suppresses various global
state to avoid applying active pragmas to earlier declarations; we also
need to override target_option_current_node.
PR c++/114772
PR c++/101180
gcc/cp/ChangeLog:
* pt.cc (apply_late_template_attributes): Also override
target_option_current_node.
gcc/testsuite/ChangeLog:
* g++.dg/ext/pragma-target2.C: New test.
|
|
Here when merging the two decls, remove_contract_attributes loses
ATTR_IS_DEPENDENT on the format attribute, so apply_late_template_attributes
just returns, so the attribute doesn't get propagated to the type where the
warning looks for it.
Fixed by using copy_node instead of tree_cons to preserve flags.
PR c++/116954
gcc/cp/ChangeLog:
* contracts.cc (remove_contract_attributes): Preserve flags
on the attribute list.
gcc/testsuite/ChangeLog:
* g++.dg/warn/Wformat-3.C: New test.
|
|
-mnop-mcount can be trivially enabled for -fPIC codegen as long as PLTs
are being used, given that the instruction encodings are identical, only
the target may resolve differently depending on how the linker decides
to incorporate the object file.
So relax the option check, and add a test to ensure that 5-byte NOPs are
emitted when -mnop-mcount is being used.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
gcc/ChangeLog:
PR target/119386
* config/i386/i386-options.cc: Permit -mnop-mcount when
using -fpic with PLTs.
gcc/testsuite/ChangeLog:
PR target/119386
* gcc.target/i386/pr119386-3.c: New test.
|
|
Commit bde21de1205 ("i386: Honour -mdirect-extern-access when calling
__fentry__") updated the logic that emits mcount() / __fentry__() calls
into function prologues when profiling is enabled, to avoid GOT-based
indirect calls when a direct call would suffice.
There are two problems with that change:
- it relies on -mdirect-extern-access rather than -fno-plt to decide
whether or not a direct [PLT based] call is appropriate;
- for the PLT case, it falls through to x86_print_call_or_nop(), which
does not emit the @PLT suffix, resulting in the wrong relocation to be
used (R_X86_64_PC32 instead of R_X86_64_PLT32)
Fix this by testing flag_plt instead of ix86_direct_extern_access, and
updating x86_print_call_or_nop() to take flag_pic and flag_plt into
account. This also ensures that -mnop-mcount works as expected when
emitting the PLT based profiling calls.
While at it, fix the 32-bit logic as well, and issue a PLT call unless
PLTs are explicitly disabled.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119386
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
gcc/ChangeLog:
PR target/119386
* config/i386/i386.cc (x86_print_call_or_nop): Add @PLT suffix
where appropriate.
(x86_function_profiler): Fall through to x86_print_call_or_nop()
for PIC codegen when flag_plt is set.
gcc/testsuite/ChangeLog:
PR target/119386
* gcc.target/i386/pr119386-1.c: New test.
* gcc.target/i386/pr119386-2.c: New test.
|
|
-Q does something completely different in conjunction with --help than it
does otherwise; its main entry in the manual didn't mention that, nor did
-Q have an entry in the index for the --help usage.
gcc/ChangeLog
PR driver/90465
* doc/invoke.texi (Overall Options): Add a @cindex for -Q in
connection with --help=.
(Developer Options): Point at --help= documentation for the
other use of -Q.
|
|
PR fortran/106948
gcc/fortran/ChangeLog:
* resolve.cc (gfc_pure_function): If a function has been resolved,
but esym is not yet set, look at its attributes to see whether it
is pure or elemental.
gcc/testsuite/ChangeLog:
* gfortran.dg/pure_formal_proc_4.f90: New test.
|
|
[PR97106]
PR target/97106
gcc/
* config/nvptx/nvptx.cc (nvptx_asm_output_def_from_decls)
[ACCEL_COMPILER]: Make sure to emit C++ constructor, destructor
aliases.
libgomp/
* testsuite/libgomp.c++/pr96390.C: Un-XFAIL nvptx offloading.
* testsuite/libgomp.c-c++-common/pr96390.c: Adjust.
|
|
Add streaming of return summaries from compile time to ltrans
which are now needed for vrp to not ouput false errors on musttail.
Co-authored-by: Jakub Jelinek <jakub@redhat.com>
gcc/ChangeLog:
PR tree-optimization/119614
* ipa-prop.cc (ipa_write_return_summaries): New function.
(ipa_record_return_value_range_1): Break out from ....
(ipa_record_return_value_range): ... here.
(ipa_read_return_summaries): New function.
(ipa_prop_read_section): Read return summaries.
(read_ipcp_transformation_info): Read return summaries.
(ipcp_write_transformation_summaries): Write return summaries;
do not stream stray 0.
gcc/testsuite/ChangeLog:
* g++.dg/lto/pr119614_0.C: New test.
|
|
architecture [PR119286]
The given test is intended to test vectorization of a strided access done by
having a step of > 1.
GCN target doesn't support load lanes, so the testcase is expected to fail,
other targets create a permuted load here which we then then reject.
However some GCN arch don't seem to support the permuted loads either, so the
vectorizer tries a gather/scatter. But the indices aren't supported by some
target, so instead the vectorizer scalarizes the loads.
I can't really test for which architecture is being used by the compiler, so
instead this updates the testcase to use one single architecture so we get a
consistent result.
gcc/testsuite/ChangeLog:
PR target/119286
* gcc.dg/vect/vect-early-break_18.c: Force -march=gfx908 for amdgcn.
|
|
The following example:
#define N 512
#define START 2
#define END 505
int x[N] __attribute__((aligned(32)));
int __attribute__((noipa))
foo (void)
{
for (signed int i = START; i < END; ++i)
{
if (x[i] == 0)
return i;
}
return -1;
}
generates incorrect code with fixed length SVE because for early break we need
to know which value to start the scalar loop with if we take an early exit.
Historically this means that we take the first element of every induction.
this is because there's an assumption in place, that even with masked loops the
masks come from a whilel* instruction.
As such we reduce using a BIT_FIELD_REF <, 0>.
When PFA was added this assumption was correct for non-masked loop, however we
assumed that PFA for VLA wouldn't work for now, and disabled it using the
alignment requirement checks. We also expected VLS to PFA using scalar loops.
However as this PR shows, for VLS the vectorizer can, and does in some
circumstances choose to peel using masks by masking the first iteration of the
loop with an additional alignment mask.
When this is done, the first elements of the predicate can be inactive. In this
example element 1 is inactive based on the calculated misalignment. hence the
-1 value in the first vector IV element.
When we reduce using BIT_FIELD_REF we get the wrong value.
This patch updates it by creating a new scalar PHI that keeps track of whether
we are the first iteration of the loop (with the additional masking) or whether
we have taken a loop iteration already.
The generated sequence:
pre-header:
bb1:
i_1 = <number of leading inactive elements>
header:
bb2:
i_2 = PHI <i_1(bb1), 0(latch)>
…
early-exit:
bb3:
i_3 = iv_step * i_2 + PHI<vector-iv>
Which eliminates the need to do an expensive mask based reduction.
This fixes gromacs with one OpenMP thread. But with > 1 there is still an issue.
gcc/ChangeLog:
PR tree-optimization/119351
* tree-vectorizer.h (LOOP_VINFO_MASK_NITERS_PFA_OFFSET,
LOOP_VINFO_NON_LINEAR_IV): New.
(class _loop_vec_info): Add mask_skip_niters_pfa_offset and
nonlinear_iv.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize them.
(vect_analyze_scalar_cycles_1): Record non-linear inductions.
(vectorizable_induction): If early break and PFA using masking create a
new phi which tracks where the scalar code needs to start...
(vectorizable_live_operation): ...and generate the adjustments here.
(vect_use_loop_mask_for_alignment_p): Reject non-linear inductions and
early break needing peeling.
gcc/testsuite/ChangeLog:
PR tree-optimization/119351
* gcc.target/aarch64/sve/peel_ind_10.c: New test.
* gcc.target/aarch64/sve/peel_ind_10_run.c: New test.
* gcc.target/aarch64/sve/peel_ind_5.c: New test.
* gcc.target/aarch64/sve/peel_ind_5_run.c: New test.
* gcc.target/aarch64/sve/peel_ind_6.c: New test.
* gcc.target/aarch64/sve/peel_ind_6_run.c: New test.
* gcc.target/aarch64/sve/peel_ind_7.c: New test.
* gcc.target/aarch64/sve/peel_ind_7_run.c: New test.
* gcc.target/aarch64/sve/peel_ind_8.c: New test.
* gcc.target/aarch64/sve/peel_ind_8_run.c: New test.
* gcc.target/aarch64/sve/peel_ind_9.c: New test.
* gcc.target/aarch64/sve/peel_ind_9_run.c: New test.
|
|
has_single_use [PR119808]
The following testcase is miscompiled, because we emit a CLOBBER in a place
where it shouldn't be emitted.
Before lowering we have:
b_5 = 0;
b.0_6 = b_5;
b.1_1 = (unsigned _BitInt(129)) b.0_6;
...
<retval> = b_5;
The bitint coalescing assigns the same partition/underlying variable
for both b_5 and b.0_6 (possible because there is a copy assignment)
and of course a different one for b.1_1 (and other SSA_NAMEs in between).
This is -O0 so stmts aren't DCEd and aren't propagated that much etc.
It is -O0 so we also don't try to optimize and omit some names from m_names
and handle multiple stmts at once, so the expansion emits essentially
bitint.4 = {};
bitint.4 = bitint.4;
bitint.2 = cast of bitint.4;
bitint.4 = CLOBBER;
...
<retval> = bitint.4;
and the CLOBBER is the problem because bitint.4 is still live afterwards.
We emit the clobbers to improve code generation, but do it only for
(initially) has_single_use SSA_NAMEs (remembered in m_single_use_names)
being used, if they don't have the same partition on the lhs and a few
other conditions.
The problem above is that b.0_6 which is used in the cast has_single_use
and so was in m_single_use_names bitmask and the lhs in that case is
bitint.2, so a different partition. But there is gimple_assign_copy_p
with SSA_NAME rhs1 and the partitioning special cases those and while
b.0_6 is single use, b_5 has multiple uses. I believe this ought to be
a problem solely in the case of such copy stmts and its special case
by the partitioning, if instead of b.0_6 = b_5; there would be
b.0_6 = b_5 + 1; or whatever other stmts that performs or may perform
changes on the value, partitioning couldn't assign the same partition
to b.0_6 and b_5 if b_5 is used later, it couldn't have two different
(or potentially different) values in the same bitint.N var. With
copy that is possible though.
So the following patch fixes it by being more careful when we set
m_single_use_names, don't set it if it is a has_single_use SSA_NAME
but SSA_NAME_DEF_STMT of it is a copy stmt with SSA_NAME rhs1 and that
rhs1 doesn't have single use, or has_single_use but SSA_NAME_DEF_STMT of it
is a copy stmt etc.
Just to make sure it doesn't change code generation too much, I've gathered
statistics how many times
if (m_first
&& m_single_use_names
&& m_vars[p] != m_lhs
&& m_after_stmt
&& bitmap_bit_p (m_single_use_names, SSA_NAME_VERSION (op)))
{
tree clobber = build_clobber (TREE_TYPE (m_vars[p]),
CLOBBER_STORAGE_END);
g = gimple_build_assign (m_vars[p], clobber);
gimple_stmt_iterator gsi = gsi_for_stmt (m_after_stmt);
gsi_insert_after (&gsi, g, GSI_SAME_STMT);
}
emits a clobber on
make check-gcc GCC_TEST_RUN_EXPENSIVE=1 RUNTESTFLAGS="--target_board=unix\{-m64,-m32\} GCC_TEST_RUN_EXPENSIVE=1 dg.exp='*bitint* pr112673.c builtin-stdc-bit-*.c pr112566-2.c pr112511.c pr116588.c pr116003.c pr113693.c pr113602.c flex-array-counted-by-7.c' dg-torture.exp='*bitint* pr116480-2.c pr114312.c pr114121.c' dfp.exp=*bitint* i386.exp='pr118017.c pr117946.c apx-ndd-x32-2a.c' vect.exp='vect-early-break_99-pr113287.c' tree-ssa.exp=pr113735.c"
and before this patch it was 41010 clobbers and after it is 40968,
so difference is 42 clobbers, 0.1% fewer.
2025-04-16 Jakub Jelinek <jakub@redhat.com>
PR middle-end/119808
* gimple-lower-bitint.cc (gimple_lower_bitint): Don't set
m_single_use_names bits for SSA_NAMEs which have single use but
their SSA_NAME_DEF_STMT is a copy from another SSA_NAME which doesn't
have a single use, or single use which is such a copy etc.
* gcc.dg/bitint-121.c: New test.
|
|
Codegen is incorrectly emitting a ".p2align 3" that coerces the
alignment of the .note.gnu.property section from 4 to 8 on rv32.
2025-04-11 Jesse Huang <jesse.huang@sifive.com>
gcc/ChangeLog
* config/riscv/riscv.cc (riscv_file_end): Fix .p2align value.
gcc/testsuite/ChangeLog
* gcc.target/riscv/gnu-property-align-rv32.c: New file.
* gcc.target/riscv/gnu-property-align-rv64.c: New file.
|
|
Large code model assume the data or rodata may put far away from
text section. So we need to put jump table in text section for
large code model.
gcc/ChangeLog:
* config/riscv/riscv.h (JUMP_TABLES_IN_TEXT_SECTION): Check if
large code model.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/jump-table-large-code-model.c: New test.
|
|
This testcase got fixed with r15-9397 PR119722 fix.
2025-04-16 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/116093
* gcc.dg/bitint-122.c: New test.
|
|
The operand order to gen_vcond_mask call in the vec_extract pattern is wrong.
Fix the order where predicate is operand 3.
Tested and bootstrapped on aarch64-linux-gnu. OK for trunk?
gcc/ChangeLog
* config/aarch64/aarch64-sve.md (vec_extract<vpred><Vel>): Fix operand
order to gen_vcond_mask_*.
|
|
This applies to the sysreg read/write intrinsics __arm_[wr]sr*. It does
not depend on changes to Binutils, because GCC converts recognised
sysreg names to an encoding based form, which is already ungated in Binutils.
We have, however, agreed to make an equivalent change in Binutils (which
would then disable feature gating for sysreg accesses in inline
assembly), but this has not yet been posted upstream.
In the future we may introduce a new flag to renable some checking,
but these checks could not be comprehensive because many system
registers depend on architecture features that don't have corresponding
GCC/GAS --march options. This would also depend on addressing numerous
inconsistencies in the existing list of sysreg feature dependencies.
gcc/ChangeLog:
* config/aarch64/aarch64.cc
(aarch64_valid_sysreg_name_p): Remove feature check.
(aarch64_retrieve_sysreg): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/acle/rwsr-ungated.c: New test.
|
|
|
|
Forward referenced enum types were never fixed up after the main
ENUMERAL_TYPE was finished. All flags set are now propagated to all
variants after its mode, size, and alignment has been calculated.
PR d/119826
gcc/d/ChangeLog:
* types.cc (TypeVisitor::visit (TypeEnum *)): Propagate flags of main
enum types to all forward-referenced variants.
gcc/testsuite/ChangeLog:
* gdc.dg/debug/imports/pr119826b.d: New test.
* gdc.dg/debug/pr119826.d: New test.
|
|
Currently, pruned lambda captures are still leftover in the function's
BLOCK and topmost BIND_EXPR; this doesn't cause any issues for normal
compilation, but does break modules streaming as we try to reconstruct a
FIELD_DECL that no longer exists on the type itself.
PR c++/119755
gcc/cp/ChangeLog:
* lambda.cc (prune_lambda_captures): Remove pruned capture from
function's BLOCK_VARS and BIND_EXPR_VARS.
gcc/testsuite/ChangeLog:
* g++.dg/modules/lambda-10_a.H: New test.
* g++.dg/modules/lambda-10_b.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
|
|
The r15-9487 change has added -flto-partition=default, which broke
the completion-2.c testcase because that case is now also printed
during completion.
2025-04-16 Jakub Jelinek <jakub@redhat.com>
* gcc.dg/completion-2.c: Expect also -flto-partition=default line.
|
|
C_MAYBE_CONST_EXPR is a C FE operator that will be removed by c_fully_fold.
In c_fully_fold, it assumes that operands of function calls have already
been folded. However, when we build call to .ACCESS_WITH_SIZE, all its
operands are not fully folded. therefore the C FE specific operator is
passed to middle-end.
In order to fix this issue, fully fold the parameters before building the
call to .ACCESS_WITH_SIZE.
PR c/119717
gcc/c/ChangeLog:
* c-typeck.cc (build_access_with_size_for_counted_by): Fully fold the
parameters for call to .ACCESS_WITH_SIZE.
gcc/testsuite/ChangeLog:
* gcc.dg/pr119717.c: New test.
|
|
ix86_add_cfa_restore_note omits the REG_CFA_RESTORE REG note for registers
pushed in red-zone. Since
commit 0a074b8c7e79f9d9359d044f1499b0a9ce9d2801
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Sun Apr 13 12:20:42 2025 -0700
APX: Don't use red-zone with 32 GPRs and no caller-saved registers
disabled red-zone, update gcc.target/i386/apx-interrupt-1.c to expect
31 .cfi_restore directives.
PR target/119784
* gcc.target/i386/apx-interrupt-1.c: Expect 31 .cfi_restore
directives.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
There's a blurb at the top of the "Optimize Options" node telling
people that most optimization options are completely disabled at -O0
and a similar blurb in the entry for -Og, but nothing at the entry for
-O0. Since this is a continuing point of confusion it seems wise to
duplicate the information in all the places users are likely to look
for it.
gcc/ChangeLog
PR tree-optimization/71094
* doc/invoke.texi (Optimize Options): Document that -fivopts is
enabled at -O1 and higher. Add blurb about -O0 causing GCC to
completely ignore most optimization options.
|
|
On Darwin and other targets with !can_alias_cdtor, we instead go to
maybe_thunk_ctor, which builds a thunk function that calls the general
constructor. And then cp_fold tries to constant-evaluate that call, and we
ICE because we don't expect to ever be asked to constant-evaluate a call to
a trivial function.
No new test because this fixes g++.dg/torture/tail-padding1.C on affected
targets.
PR c++/111075
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_call_expression): Allow trivial
call from a thunk.
|
|
The latest editions of XCode have altered the identify reported by 'ld -v'
(again). This means that GCC configure no longer detects the version.
Fixed by adding the new name to the set checked.
gcc/ChangeLog:
* configure: Regenerate.
* configure.ac: Recognise PROJECT:ld-mmmm.nn.aa as an identifier
for Darwin's static linker.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
|
|
Recent changes to the OS SDKs have altered the way in which include guards
are used for a number of headers when C++ modules are enabled. Instead of
placing the guards in the included header, they are being placed in the
including header. This breaks the assumptions in the current GCC stddef.h
specifically, that the presence of __PTRDIFF_T and __SIZE_T means that the
relevant defs are already made. However in the case of the module-enabled
C++ with these SDKs, that is no longer true.
stddef.h has a large body of special-cases already, but it seems that the
only viable solution here is to add a new one specifically for __APPLE__
and modular code.
This fixes around 280 new fails in the modules test-suite; it is needed on
all open branches that support modules.
PR target/116827
gcc/ChangeLog:
* ginclude/stddef.h: Undefine __PTRDIFF_T and __SIZE_T for module-
enabled c++ on Darwin/macOS platforms.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
|
|
Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
* common.opt.urls: Regenerate.
|
|
The following installs gcobol.3 as gcobol-io.3 and applies
program-transform-name to the gcobol-io part. This follows
naming of the pdf and the html variants.
It also uses $(man1ext) and $(man3ext) consistently.
PR cobol/119302
gcc/cobol/
* Make-lang.in (GCOBOLIO_INSTALL_NAME): Define.
Use $(GCOBOLIO_INSTALL_NAME) for gcobol.3 manpage source
upon install.
|
|
this patch sets issue rate of znver5 to 4. With current model, unless a reservation is
missing, we will never issue more than 4 instructions per cycle since that is the limit
of decoders and the model does not take into acount the fact that typically code is run
from op cache.
gcc/ChangeLog:
* config/i386/x86-tune-sched.cc (ix86_issue_rate): Set
to 4 for znver5.
|
|
Znver5 has latency of addss 2 in typical case while all earlier versions has latency 3.
Unforunately addss cost is used to cost many other SSE instructions than just addss and
setting the cost to 2 makes us to vectorize 4 64bit stores into one 256bit store which
in turn regesses imagemagick.
This patch sets the cost back to 3. Next stage1 we can untie addss from the other operatoins
and set it correctly.
bootstrapped/regtested x86_64-linux and also benchmarked on SPEC2k17
gcc/ChangeLog:
PR target/119298
* config/i386/x86-tune-costs.h (znver5_cost): Set ADDSS cost to 3.
|
|
vsetvl phase4 uses LCM guided info to insert VSETVL insns, including a
straggler loop for "mising vsetvls" on certain edges. Currently it
asserts on encountering EDGE_ABNORMAL.
When enabling go frontend with V enabled, libgo build hits the assert.
The solution is to prevent abnormal edges from getting into LCM at all
(my prior attempt at this just ignored them after LCM which is not
right). Existing invalid_opt_bb_p () current does this for BB predecessors
but not for successors which is what the patch adds.
Crucially, the ICE/fix also depends on avoiding vsetvl hoisting past
non-transparent blocks: That is taken care of by Robin's patch
"RISC-V: Do not lift up vsetvl into non-transparent blocks [PR119547]"
for a different yet related issue.
Reported-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com>
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
PR target/119533
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (invalid_opt_bb_p): Check for
EDGE_ABNOMAL.
(pre_vsetvl::compute_lcm_local_properties): Initialize kill
bitmap.
Debug dump skipped edge.
gcc/testsuite/ChangeLog:
* go.dg/pr119533-riscv.go: New test.
* go.dg/pr119533-riscv-2.go: New test.
|
|
When lifting up a vsetvl into a block we currently don't consider the
block's transparency with respect to the vsetvl as in other parts of the
pass. This patch does not perform the lift when transparency is not
guaranteed.
This condition is more restrictive than necessary as we can still
perform a vsetvl lift if the conflicting register is only every used
in vsetvls and no regular insns but given how late we are in the GCC 15
cycle it seems better to defer this. Therefore
gcc.target/riscv/rvv/vsetvl/avl_single-68.c is XFAILed for now.
This issue was found in OpenCV where it manifests as a runtime error.
Zhijin Zeng debugged PR119547 and provided an initial patch.
Reported-By: 曾治金 <zhijin.zeng@spacemit.com>
PR target/119547
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (pre_vsetvl::earliest_fuse_vsetvl_info):
Do not perform lift if block is not transparent.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/vsetvl/avl_single-68.c: xfail.
* g++.target/riscv/rvv/autovec/pr119547.C: New test.
* g++.target/riscv/rvv/autovec/pr119547-2.C: New test.
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-10.c: Adjust.
|
|
When mapping an allocatable variable (or derived-type component), explicitly
or implicitly, all its allocated allocatable components will automatically be
mapped. The patch implements the target hooks, added for this feature to
omp-low.cc with commit r15-3895-ge4a58b6f28383c.
Namely, there is a check whether there are allocatable components at all:
gfc_omp_deep_mapping_p. Then gfc_omp_deep_mapping_cnt, counting the number
of required mappings; this is a dynamic value as it depends on array
bounds and whether an allocatable is allocated or not.
And, finally, the actual mapping: gfc_omp_deep_mapping.
Polymorphic variables are partially supported: the mapping of the _data
component is fully supported, but only components of the declared type
are processed for additional allocatables. Additionally, _vptr is not
touched. This means that everything needing _vtab information requires
unified shared memory; in particular, _size data is required when
accessing elements of polymorphic arrays.
However, for scalar arrays, accessing components of the declare type
should work just fine.
As polymorphic variables are not (really) supported and OpenMP 6
explicitly disallows them, there is now a warning (-Wopenmp) when
they are encountered. Unlimited polymorphics are rejected (error).
Additionally, PRIVATE and FIRSTPRIVATE are not quite supported for
allocatable components, polymorphic components and as polymorphic
variable. Thus, those are now rejected as well.
gcc/fortran/ChangeLog:
* f95-lang.cc (LANG_HOOKS_OMP_DEEP_MAPPING,
LANG_HOOKS_OMP_DEEP_MAPPING_P, LANG_HOOKS_OMP_DEEP_MAPPING_CNT):
Define.
* openmp.cc (gfc_match_omp_clause_reduction): Fix location setting.
(resolve_omp_clauses): Permit allocatable components, reject
them and polymorphic variables in PRIVATE/FIRSTPRIVATE.
* trans-decl.cc (add_clause): Set clause location.
* trans-openmp.cc (gfc_has_alloc_comps): Add ptr_ok and
shallow_alloc_only Boolean arguments.
(gfc_omp_replace_alloc_by_to_mapping): New.
(gfc_omp_private_outer_ref, gfc_walk_alloc_comps,
gfc_omp_clause_default_ctor, gfc_omp_clause_copy_ctor,
gfc_omp_clause_assign_op, gfc_omp_clause_dtor): Update call to it.
(gfc_omp_finish_clause): Minor cleanups, improve location data,
handle allocatable components.
(gfc_omp_deep_mapping_map, gfc_omp_deep_mapping_item,
gfc_omp_deep_mapping_comps, gfc_omp_gen_simple_loop,
gfc_omp_get_array_size, gfc_omp_elmental_loop,
gfc_omp_deep_map_kind_p, gfc_omp_deep_mapping_int_p,
gfc_omp_deep_mapping_p, gfc_omp_deep_mapping_do,
gfc_omp_deep_mapping_cnt, gfc_omp_deep_mapping): New.
(gfc_trans_omp_array_section): Save array descriptor in case
deep-mapping lang hook will need it.
(gfc_trans_omp_clauses): Likewise; use better clause location data.
* trans.h (gfc_omp_deep_mapping_p, gfc_omp_deep_mapping_cnt,
gfc_omp_deep_mapping): Add function prototypes.
libgomp/ChangeLog:
* libgomp.texi (5.0 Impl. Status): Mark mapping alloc comps as 'Y'.
* testsuite/libgomp.fortran/allocatable-comp.f90: New test.
* testsuite/libgomp.fortran/map-alloc-comp-3.f90: New test.
* testsuite/libgomp.fortran/map-alloc-comp-4.f90: New test.
* testsuite/libgomp.fortran/map-alloc-comp-5.f90: New test.
* testsuite/libgomp.fortran/map-alloc-comp-6.f90: New test.
* testsuite/libgomp.fortran/map-alloc-comp-7.f90: New test.
* testsuite/libgomp.fortran/map-alloc-comp-8.f90: New test.
* testsuite/libgomp.fortran/map-alloc-comp-9.f90: New test.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/map-alloc-comp-1.f90: Remove dg-error.
* gfortran.dg/gomp/polymorphic-mapping-2.f90: Update warn wording.
* gfortran.dg/gomp/polymorphic-mapping.f90: Change expected
diagnostic; some tests moved to ...
* gfortran.dg/gomp/polymorphic-mapping-1.f90: ... here as new test.
* gfortran.dg/gomp/polymorphic-mapping-3.f90: New test.
* gfortran.dg/gomp/polymorphic-mapping-4.f90: New test.
* gfortran.dg/gomp/polymorphic-mapping-5.f90: New test.
|
|
Implement partitioning and cloning in the callgraph to help locality.
A new -fipa-reorder-for-locality flag is used to enable this.
The majority of the logic is in the new IPA pass in ipa-locality-cloning.cc
The optimization has two components:
* Partitioning the callgraph so as to group callers and callees that frequently
call each other in the same partition
* Cloning functions that straddle multiple callchains and allowing each clone
to be local to the partition of its callchain.
The majority of the logic is in the new IPA pass in ipa-locality-cloning.cc.
It creates a partitioning plan and does the prerequisite cloning.
The partitioning is then implemented during the existing LTO partitioning pass.
To guide these locality heuristics we use PGO data.
In the absence of PGO data we use a static heuristic that uses the accumulated
estimated edge frequencies of the callees for each function to guide the
reordering.
We are investigating some more elaborate static heuristics, in particular using
the demangled C++ names to group template instantiatios together.
This is promising but we are working out some kinks in the implementation
currently and want to send that out as a follow-up once we're more confident
in it.
A new bootstrap-lto-locality bootstrap config is added that allows us to test
this on GCC itself with either static or PGO heuristics.
GCC bootstraps with both (normal LTO bootstrap and profiledbootstrap).
As this new pass enables a new partitioning scheme it is incompatible with
explicit -flto-partition= options so an error is introduced when the user
uses both flags explicitly.
With this optimization we are seeing good performance gains on some large
internal workloads that stress the parts of the processor that is sensitive
to code locality, but we'd appreciate wider performance evaluation.
Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for mainline?
Thanks,
Kyrill
Signed-off-by: Prachi Godbole <pgodbole@nvidia.com>
Co-authored-by: Kyrylo Tkachov <ktkachov@nvidia.com>
config/ChangeLog:
* bootstrap-lto-locality.mk: New file.
gcc/ChangeLog:
* Makefile.in (OBJS): Add ipa-locality-cloning.o.
* cgraph.h (set_new_clone_decl_and_node_flags): Declare prototype.
* cgraphclones.cc (set_new_clone_decl_and_node_flags): Remove static
qualifier.
* common.opt (fipa-reorder-for-locality): New flag.
(LTO_PARTITION_DEFAULT): Declare.
(flto-partition): Change default to LTO_PARTITION_DFEAULT.
* doc/invoke.texi: Document -fipa-reorder-for-locality.
* flag-types.h (enum lto_locality_cloning_model): Declare.
(lto_partitioning_model): Add LTO_PARTITION_DEFAULT.
* lto-cgraph.cc (lto_set_symtab_encoder_in_partition): Add dumping of
node and index.
* opts.cc (validate_ipa_reorder_locality_lto_partition): Define.
(finish_options): Handle LTO_PARTITION_DEFAULT.
* params.opt (lto_locality_cloning_model): New enum.
(lto-partition-locality-cloning): New param.
(lto-partition-locality-frequency-cutoff): Likewise.
(lto-partition-locality-size-cutoff): Likewise.
(lto-max-locality-partition): Likewise.
* passes.def: Register pass_ipa_locality_cloning.
* timevar.def (TV_IPA_LC): New timevar.
* tree-pass.h (make_pass_ipa_locality_cloning): Declare.
* ipa-locality-cloning.cc: New file.
* ipa-locality-cloning.h: New file.
gcc/lto/ChangeLog:
* lto-partition.cc (add_node_references_to_partition): Define.
(create_partition): Likewise.
(lto_locality_map): Likewise.
(lto_promote_cross_file_statics): Add extra dumping.
* lto-partition.h (lto_locality_map): Declare prototype.
* lto.cc (do_whole_program_analysis): Handle
flag_ipa_reorder_for_locality.
|
|
In my fix for PR 119318 I put mask calculation in
ipcp_bits_lattice::meet_with_1 above a final fix to value so that all
the bits in the value which are meaningless according to mask have
value zero, which has tripped a validator in PR 119803. This patch
fixes that by moving the adjustment down.
Even thought the fix for PR 119318 did a similar thing in
ipcp_bits_lattice::meet_with, the same is not necessary because that
code path then feeds the new value and mask to
ipcp_bits_lattice::set_to_constant which does the final adjustment
correctly.
In both places, however, Jakup proposed a better way of calculating
cap_mask and so I have changed it accordingly.
gcc/ChangeLog:
2025-04-15 Martin Jambor <mjambor@suse.cz>
PR ipa/119803
* ipa-cp.cc (ipcp_bits_lattice::meet_with_1): Move m_value adjustmed
according to m_mask below the adjustment of the latter according to
cap_mask. Optimize the calculation of cap_mask a bit.
(ipcp_bits_lattice::meet_with): Optimize the calculation of cap_mask a
bit.
gcc/testsuite/ChangeLog:
2025-04-15 Martin Jambor <mjambor@suse.cz>
PR ipa/119803
* gcc.dg/ipa/pr119803.c: New test.
Co-authored-by: Jakub Jelinek <jakub@redhat.com>
|
|
This was caused by a check in the D front-end disallowing static
VAR_DECLs with a size `0'.
While empty structs in D are give the size `1', the same symbol coming
from ImportC modules do infact have no size, so allow C variables to
pass the check as well as array objects.
PR d/119799
gcc/d/ChangeLog:
* decl.cc (DeclVisitor::visit (VarDeclaration *)): Check front-end
type size before building the VAR_DECL. Allow C symbols to have a
size of `0'.
gcc/testsuite/ChangeLog:
* gdc.dg/import-c/pr119799.d: New test.
* gdc.dg/import-c/pr119799c.c: New test.
|
|
When remapping existing specializations of a hidden template friend from
a previous declaration to the new definition, we must remap only those
specializations that match this new definition, but currently we
remap all specializations (since they all appear in the same
DECL_TEMPLATE_INSTANTIATIONS list of the most general template).
Concretely, in the first testcase below, we form two specializations of
the friend A::f, one with arguments {{0},{bool}} and another with
arguments {{1},{bool}}. Later when instantiating B, we need to remap
these specializations. During the B<0> instantiation we only want to
remap the first specialization, and during the B<1> instantiation only
the second specialization, but currently we remap both specializations
twice.
tsubst_friend_function needs to determine if an existing specialization
matches the shape of the new definition, which is tricky in general,
e.g. if the outer template parameters may not match up. Fortunately we
don't have to reinvent the wheel here since is_specialization_of_friend
seems to do exactly what we need. We can check this unconditionally,
but I think it's only necessary when dealing with specializations formed
from a class template scope previous declaration, hence the
TMPL_ARGS_HAVE_MULTIPLE_LEVELS check.
PR c++/119807
PR c++/112288
gcc/cp/ChangeLog:
* pt.cc (tsubst_friend_function): Skip remapping an
existing specialization if it doesn't match the shape of
the new friend definition.
gcc/testsuite/ChangeLog:
* g++.dg/template/friend86.C: New test.
* g++.dg/template/friend87.C: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>
|