|
This patch extends the expander for unconditional fma, fnma, fms, and
fnms, so that it supports partial SVE FP modes.
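For reference, the four operations differ only in the signs applied to the product and to the addend. The scalar sketch below models them with std::fma; the function names are illustrative only, and the sign conventions are the ones described for the fma, fms, fnma and fnms optabs in the GCC internals manual.

```cpp
#include <cassert>
#include <cmath>

// Scalar models of the four fused operations (a single rounding each).
inline double fma_op (double a, double b, double c)  { return std::fma (a, b, c); }   // a*b + c
inline double fms_op (double a, double b, double c)  { return std::fma (a, b, -c); }  // a*b - c
inline double fnma_op (double a, double b, double c) { return std::fma (-a, b, c); }  // -(a*b) + c
inline double fnms_op (double a, double b, double c) { return std::fma (-a, b, -c); } // -(a*b) - c
```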
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md (<optab><mode>4): Extend from
SVE_FULL_F_B16B16 to SVE_F_B16B16. Use aarch64_sve_fp_pred instead
of aarch64_ptrue_reg.
(@aarch64_pred_<optab><mode>): Extend from SVE_FULL_F_B16B16 to
SVE_F_B16B16. Use aarch64_predicate_operand.
gcc/testsuite/ChangeLog:
* g++.target/aarch64/sve/unpacked_ternary_bf16_1.C: New test.
* g++.target/aarch64/sve/unpacked_ternary_bf16_2.C: Likewise.
* gcc.target/aarch64/sve/unpacked_fmla_1.c: Likewise.
* gcc.target/aarch64/sve/unpacked_fmla_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_fmls_1.c: Likewise.
* gcc.target/aarch64/sve/unpacked_fmls_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_fnmla_1.c: Likewise.
* gcc.target/aarch64/sve/unpacked_fnmla_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_fnmls_1.c: Likewise.
* gcc.target/aarch64/sve/unpacked_fnmls_2.c: Likewise.
|
|
It's needed by avx5124vnniw/avx5124fmaps which have been removed by
r15-656-ge1a7e2c54d52d0.
gcc/ChangeLog:
* config/i386/i386-modes.def: Remove VECTOR_MODES(FLOAT, 256)
and VECTOR_MODE (INT, SI, 64).
* config/i386/i386.cc (ix86_hard_regno_nregs): Remove related
code for V64SF/V64SImode.
|
|
r14-1902-g96c3539f2a3813 splits a TImode move into 2 DImode moves. It is
supposed to optimize TImode in parameter/return, since according to the
psABI it is stored in 2 general registers.
But when TImode is not in parameter/return, it can create the redundancy
seen in the PR.
The patch adds a splitter to handle that,
i.e.
(insn 10 9 14 2 (set (subreg:V2DI (reg:V4SI 98 [ <retval> ]) 0)
(vec_concat:V2DI (subreg:DI (reg:TI 101) 0)
(subreg:DI (reg:TI 101) 8)))
8442 {vec_concatv2di}
(expr_list:REG_DEAD (reg:TI 101)
gcc/ChangeLog:
PR target/121274
* config/i386/sse.md (*vec_concatv2di_0): Add a splitter
before it.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr121274.c: New test.
|
|
The unsigned avg ceil shares the vaaddu.vx for the vx combine,
so add the test case to make sure it works as expected.
The following test suites passed for this patch series.
* The rv64gcv full regression test.
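As a reminder of the operation being tested, an unsigned average with ceiling rounding computes (a + b + 1) >> 1 without dropping the carry. A minimal scalar sketch follows; the function name is hypothetical, and widening to 16 bits is just one way to avoid overflow, not necessarily the form used by the test helper macros.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned average with ceiling rounding for 8-bit inputs.  The sum is
// formed in a wider type so that the carry out of bit 7 is not lost.
inline uint8_t avg_ceil_u8 (uint8_t a, uint8_t b)
{
  return static_cast<uint8_t> ((static_cast<uint16_t> (a) + b + 1) >> 1);
}
```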
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Add asm check
for unsigned avg ceil.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h: Add test
helper macros.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h: Add
test data.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vaadd-run-2-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vaadd-run-2-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vaadd-run-2-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vaadd-run-2-u8.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
|
|
r16-2614-g965564eafb721f had a typo where it would assume byte==0
rather than use the byte (offset) that was passed.
This fixes that typo and also fixes the comment since it is not just
about lowerpart subregs but all non-paradoxical subregs.
Pushed as obvious after bootstrap/test on x86_64-linux-gnu.
PR rtl-optimization/121302
gcc/ChangeLog:
* simplify-rtx.cc (simplify_context::simplify_subreg): Use
byte instead of 0 when calling simplify_subreg.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
supported [PR121215]
The problem here is that tree-prof.exp does not clean up when a testcase requires
auto-profile, auto-profile is not supported, and the testcase uses dg-additional-sources.
Currently additional_sources is not reset to "", so when another testcase comes along
it thinks that this is an additional source to be added.
Committed as obvious after testing:
make check-gcc RUNTESTFLAGS="tree-prof.exp=afdo-crossmodule-1.c tree-ssa.exp=pr67891.c"
to make sure pr67891.c now no longer uses the additional source.
PR testsuite/121215
gcc/testsuite/ChangeLog:
* lib/profopt.exp (profopt-execute): Call cleanup-after-saved-dg-test
when returning early for the failing -fauto-profile case.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
This patch extends the expander for conditional smax, smin, add, sub, mul,
min, max, and div to support partial SVE FP modes.
If exceptions from undefined vector elements must be suppressed, this
expansion converts the container-level predicate to an element-level one, and
ensures that these elements are inactive for the operation. In practice, this
is a predicate AND with the existing mask and a container-size PTRUE.
gcc/ChangeLog:
* config/aarch64/aarch64-protos.h (aarch64_sve_emit_masked_fp_pred):
Declare.
* config/aarch64/aarch64-sve.md (and<mode>3): Change this to...
(@and<mode>3): ...this, so that we can use gen_and3.
(@cond_<optab><mode>): Extend from SVE_FULL_F_B16B16 to SVE_F_B16B16,
use aarch64_predicate_operand.
(*cond_<optab><mode>_2_strict): Likewise.
(*cond_<optab><mode>_3_strict): Likewise.
(*cond_<optab><mode>_any_strict): Likewise.
(*cond_<optab><mode>_2_const_strict): Extend from SVE_FULL_F to SVE_F,
use aarch64_predicate_operand.
(*cond_<optab><mode>_any_const_strict): Likewise.
(*cond_sub<mode>_3_const_strict): Likewise.
(*cond_sub<mode>_const_strict): Likewise.
(*vcond_mask_<mode><vpred>): Use aarch64_predicate_operand, and update
the comment here.
* config/aarch64/aarch64.cc (aarch64_sve_emit_masked_fp_pred): New
function. Helper to mask the predicate in conditional expanders.
gcc/testsuite/ChangeLog:
* g++.target/aarch64/sve/unpacked_cond_binary_bf16_2.C: New test.
* gcc.target/aarch64/sve/unpacked_cond_builtin_fmax_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_builtin_fmin_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fadd_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fdiv_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fmaxnm_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fminnm_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fmul_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fsubr_2.c: Likewise.
|
|
Pass -mno-80387 to compile pr121208-1(a|b).c to silence
.../pr121208-1a.c:11:1: sorry, unimplemented: 80387 instructions aren’t allowed in a function with the ‘no_caller_saved_registers’ attribute
PR target/121208
* gcc.target/i386/pr121208-1a.c (dg-options): Add -mno-80387.
* gcc.target/i386/pr121208-1b.c (dg-options): Likewise.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
Loop peeling and minimal loop vectorization threshold prevented loop
vectorization in these examples. Adjust parameters in the test to
make the test pass.
Signed-off-by: Juergen Christ <jchrist@linux.ibm.com>
PR testsuite/121286
PR testsuite/121288
gcc/testsuite/ChangeLog:
* gcc.dg/vect/pr112325.c: Adjust parameters for s390.
* gcc.dg/vect/pr117888-1.c: Ditto.
|
|
Automatically generate -mcpu and -mtune options in invoke.texi from
the unified riscv-cores.def metadata, ensuring documentation stays in sync
with definitions and reducing manual maintenance.
gcc/ChangeLog:
* Makefile.in: Add riscv-mcpu.texi and riscv-mtune.texi to the list
of files to be processed by the Texinfo generator.
* config/riscv/t-riscv: Add rule for generating riscv-mcpu.texi
and riscv-mtune.texi.
* doc/invoke.texi: Replace hand‑written extension table with
`@include riscv-mcpu.texi` and `@include riscv-mtune.texi` to
pull in auto‑generated entries.
* config/riscv/gen-riscv-mcpu-texi.cc: New file.
* config/riscv/gen-riscv-mtune-texi.cc: New file.
* doc/riscv-mcpu.texi: New file.
* doc/riscv-mtune.texi: New file.
|
|
This patch adds a new rule for distributing lowpart subregs through
ANDs, IORs, and XORs with a constant, in cases where one of the terms
then disappears. For example:
(lowpart-subreg:QI (and:HI x 0x100))
simplifies to zero and
(lowpart-subreg:QI (and:HI x 0xff))
simplifies to (lowpart-subreg:QI x).
This would often be handled at some point using nonzero bits. However,
the specific case I want the optimisation for is SVE predicates,
where nonzero bit tracking isn't currently an option. Specifically:
the predicate modes VNx8BI, VNx4BI and VNx2BI have the same size as
VNx16BI, but treat only every second, fourth, or eighth bit as
significant. Thus if we have:
(subreg:VNx8BI (and:VNx16BI x C))
where C is the repeating constant { 1, 0, 1, 0, ... }, then the
AND only clears bits that are made insignificant by the subreg,
and so the result is equal to (subreg:VNx8BI x). Later patches
rely on this.
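The two integer examples and the predicate case can be checked with a small model. The sketch below is only an illustration under stated assumptions: it treats HImode as a 16-bit integer, QImode as its low 8 bits, and a VNx16BI predicate as a fixed 16-bit value (real SVE predicates are scalable), with a VNx8BI view that looks only at every second bit. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Model of (lowpart-subreg:QI x) for a 16-bit (HI) value.
inline uint8_t lowpart_qi (uint16_t x) { return static_cast<uint8_t> (x); }

// Toy model of a VNx8BI view of a 16-bit predicate: only bits 0, 2, 4, ...
// are significant.
inline uint16_t significant_vnx8bi (uint16_t p) { return p & 0x5555; }

// Exhaustively check the three simplifications for every 16-bit input.
inline bool rules_hold_for_all_hi_values ()
{
  for (uint32_t i = 0; i <= 0xffff; ++i)
    {
      uint16_t x = static_cast<uint16_t> (i);
      // The AND keeps nothing in the low byte, so the result is zero.
      if (lowpart_qi (x & 0x100) != 0)
        return false;
      // The AND keeps the whole low byte, so it can be dropped.
      if (lowpart_qi (x & 0xff) != lowpart_qi (x))
        return false;
      // C = { 1, 0, 1, 0, ... } only clears bits the narrower view ignores.
      if (significant_vnx8bi (x & 0x5555) != significant_vnx8bi (x))
        return false;
    }
  return true;
}
```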
gcc/
* simplify-rtx.cc (simplify_context::simplify_subreg): Distribute
lowpart subregs through AND/IOR/XOR, if doing so eliminates one
of the terms.
(test_scalar_int_ext_ops): Add some tests of the above for integers.
* config/aarch64/aarch64.cc (aarch64_test_sve_folding): Likewise
add tests for predicate modes.
|
|
gcc.target/aarch64/saturating_arithmetic_{1,2}.c expect w0 and w1 to
be duplicated into vectors. The tests expected the duplication of w1
to happen first, but the other order would be fine too. A later
simplify-rtx.cc patch happens to change the order.
gcc/testsuite/
* gcc.target/aarch64/saturating_arithmetic_1.c: Allow w0 and w1
to be duplicated in either order.
* gcc.target/aarch64/saturating_arithmetic_2.c: Likewise.
|
|
The 8-bit and 16-bit tests in cmpbr.c assumed an inverted operand
order ("w1, w0"), but it's possible to use the uninverted operand
order too. This patch generalises the tests to support both forms.
This is a prerequisite for a later patch that adds a new
simplify-rtx.cc rule.
gcc/testsuite/
* gcc.target/aarch64/cmpbr.c: Support both operand orders
for 8-bit and 16-bit comparisons.
|
|
function_expander::get_reg_target didn't actually check for a register,
meaning that it could return a memory target instead. That doesn't
really matter for the current direct and indirect uses (svundef*,
svcreate*, and svset*) but it will for later patches.
gcc/
* config/aarch64/aarch64-sve-builtins.cc
(function_expander::get_reg_target): Check whether the target
is a valid register_operand.
|
|
This patch removes unused local variables from three procedures.
gcc/m2/ChangeLog:
* gm2-compiler/M2GenGCC.mod (FoldBecomes): Remove all
local variables.
(CodeIndrX): Remove length.
Remove newstr.
* gm2-compiler/M2Range.mod (FoldTypeIndrX): Remove desType.
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
|
|
When having multiple stores with the same offset as the load, in the
case that we are eliminating the load, we were generating a mov instruction
for both of them, leading to the overwrite of the register containing the
loaded value.
This patch fixes this issue by generating a mov instruction only for the
first store in the store-load sequence that has the same offset as the load.
For the next ones that might be encountered, we use bit-field insertion.
Bootstrapped/regtested on AArch64 and x86_64.
PR rtl-optimization/120660
gcc/ChangeLog:
* avoid-store-forwarding.cc (process_store_forwarding):
Fix instruction generation when having multiple stores with
base offset.
gcc/testsuite/ChangeLog:
* gcc.dg/pr120660.c: New test.
|
|
This addresses a TODO from the test file.
libstdc++-v3/ChangeLog:
* testsuite/std/format/ranges/format_kind.cc: New test.
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
|
|
Function riscv_ext_is_subset () uses structured bindings to iterate over
all keys and values of an unordered map. However, this is only
available since C++17 and causes a warning like this:
warning: structured bindings only available with ‘-std=c++17’
This patch addresses the warning.
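One common way to avoid the warning is to bind the map entry itself and read its first and second members. The sketch below shows that idiom on a made-up function; it is not the code from riscv_ext_is_subset, and the map's key and value types are assumptions.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Count the entries whose value is true, written without structured
// bindings so that it also compiles as C++14.
inline int count_enabled (const std::unordered_map<std::string, bool> &exts)
{
  int n = 0;
  // C++17 form would be: for (const auto &[name, enabled] : exts) ...
  for (const auto &entry : exts)
    {
      const std::string &name = entry.first;
      bool enabled = entry.second;
      if (enabled && !name.empty ())
        ++n;
    }
  return n;
}
```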
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc (riscv_ext_is_subset):
Remove use of structured binding to fix compiler warning.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
|
|
While scanning the instructions and upon reaching an instruction that
doesn't satisfy the constraints that we have set, we were removing the
already detected stores, but we were continuing adding stores from that
point onward. This was causing issues when the address ranges from later
stores overlapped with the load's address, leading to partial and wrong
update of the register containing the loaded value.
With this patch, we are skipping the transformation for stores that operate
on the load's address range, when stores that operate on the same range
have been deleted due to constraint violations.
PR rtl-optimization/119795
gcc/ChangeLog:
* avoid-store-forwarding.cc
(store_forwarding_analyzer::avoid_store_forwarding): Skip
transformations for stores that operate on the same address
range as deleted ones.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr119795.c: New test.
|
|
Add run and tree-optimized check for mul based unsigned scalar SAT_MUL
instead of the widen_mul.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat/sat_u_mul-run-1-u16-from-u64.c: Add rv64
target for run.
* gcc.target/riscv/sat/sat_u_mul-run-1-u32-from-u64.c: Ditto.
* gcc.target/riscv/sat/sat_u_mul-run-1-u8-from-u64.c: Ditto.
* gcc.target/riscv/sat/sat_u_mul-1-u16-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-1-u8-from-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-1-u8-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-2-u16-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-2-u32-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-2-u8-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-1-u16-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-1-u8-from-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-1-u8-from-u32.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
As with the widen_mul based pattern, we would like to introduce the mul
based pattern as well. The pattern is quite simple compared to the
widen_mul one, so add a new pattern instead of using the for loop in match.pd.
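For orientation, an unsigned saturating multiply clamps the product to the maximum of the narrow type. The sketch below shows those semantics for 8-bit values with the product formed in a 16-bit type; the function name is hypothetical, and the exact source form that the new match.pd pattern recognises is not spelled out here, so this should not be read as that form.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned saturating multiply for 8-bit values.  The product is computed
// in a wider unsigned type and clamped to UINT8_MAX on overflow.
inline uint8_t sat_u_mul_u8 (uint8_t a, uint8_t b)
{
  uint16_t wide = static_cast<uint16_t> (static_cast<uint16_t> (a) * b);
  return wide > UINT8_MAX ? UINT8_MAX : static_cast<uint8_t> (wide);
}
```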
gcc/ChangeLog:
* match.pd: Add mul based unsigned SAT_MUL.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This shows reassoc is harmful even with len == 3.
PR tree-optimization/120687
* gcc.dg/vect/pr120687-3.c: New testcase.
|
|
I hadn't validated this test worked in C++14 before submitting, fixed
thusly.
PR testsuite/121285
gcc/testsuite/ChangeLog:
* g++.dg/modules/class-11_a.H: Make static_asserts valid for
C++14.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
|
|
Reassoc carefully ranks operands to form reduction chains for
vectorization so we are careful to not apply any width related
changes in the early pass. Unfortunately we are not careful
enough. The following gates fma related re-ordering and also
the >= 3 ops tail "optimization" which is the culprit here.
This does not fix the reported inefficient vectorization when
using signed integer reductions yet.
PR tree-optimization/120687
* tree-ssa-reassoc.cc (reassociate_bb): Do not disturb
the sorted operand order in the early pass.
* tree-vect-slp.cc (vect_analyze_slp): Dump when a detected
reduction chain fails SLP discovery.
* gcc.dg/vect/pr120687-1.c: New testcase.
* gcc.dg/vect/pr120687-2.c: Likewise.
|
|
This adds a nullptr check to fix a regression where it is possible to call
`memcmp (NULL, NULL, 0)` which is UB prior to C26.
This fixes the bootstrap-ubsan build.
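A guard of this kind can be sketched as a small wrapper that never passes a null pointer to memcmp. The helper below is hypothetical and only illustrates the idea; the actual check added to vec.h may be shaped differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Compare n bytes without ever handing a null pointer to memcmp.
inline bool bytes_equal (const void *a, const void *b, std::size_t n)
{
  // With a null pointer there is no storage to read, so the ranges can
  // only compare equal if they are empty.
  if (a == nullptr || b == nullptr)
    return n == 0;
  return std::memcmp (a, b, n) == 0;
}
```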
gcc/ChangeLog:
PR middle-end/121261
* vec.h: Add null ptr check.
|
|
This patch adds a token location parameter to CheckVariableAgainstKeyword
and dependants ensuring that the warning is generated from the
token associated with the variable rather than the end of the statement.
gcc/m2/ChangeLog:
PR modula2/121289
* gm2-compiler/M2Students.def (CheckVariableAgainstKeyword): New
parameter tok.
* gm2-compiler/M2Students.mod (CheckVariableAgainstKeyword): New
parameter tok.
Pass tok to PerformVariableKeywordCheck.
(PerformVariableKeywordCheck): New parameter tok.
Pass tok to MetaErrorStringT0.
* gm2-compiler/P2SymBuild.mod (BuildVariable): Pass tok to
CheckVariableAgainstKeyword.
* gm2-libs-iso/LowLong.mod (except): Replace with ...
(exceptSrc): ... this.
* gm2-libs-iso/LowReal.mod (except): Replace with ...
(exceptSrc): ... this.
* gm2-libs-iso/LowShort.mod (except): Replace with ...
(exceptSrc): ... this.
* gm2-libs-iso/Processes.mod (Wait): Replace from with fromCor.
* gm2-libs-iso/RndFile.mod (EndPos): Replace end with endP.
* gm2-libs/SCmdArgs.mod (GetArg): Replace start with startPos.
Replace end with endPos.
(NArg): Replace start with startPos.
Replace end with endPos.
gcc/testsuite/ChangeLog:
PR modula2/121289
* gm2/warnings/style/fail/badvarname.mod: New test.
* gm2/warnings/style/fail/warnings-style-fail.exp: New test.
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
|
|
Commit r15-7152-g57b706d141b87c removed
/* { dg-do run { target *-*-linux* *-*-gnu* *-*-uclinux* } } */
from these tests, turning them into 'compile' only tests, even when
they could be executed.
This patch adds
/* { dg-do run } */
which is OK since the tests are correctly skipped if needed thanks to
the following effective-targets (alarm and signal).
With this patch we have again two entries for these tests on linux targets:
* compile (test for excess errors)
* execution test
gcc/testsuite/ChangeLog:
* gcc.dg/pr116906-1.c: Add 'dg-do run'.
* gcc.dg/pr116906-2.c: Likewise.
* gcc.dg/pr78185.c: Likewise.
|
|
In the PR119483 r15-9003 change we've allowed musttail calls to noreturn
functions, after all the decision not to normally tail call noreturn
functions is not because it is not possible to tail call those, but because
it screws up backtraces. As the following testcase shows, we've done that
only for functions not declared [[noreturn]]/_Noreturn but later on
discovered through IPA as noreturn. Functions explicitly declared
[[noreturn]] have (for historical reasons) volatile FUNCTION_TYPE and
the FUNCTION_DECLs are volatile as well, so in order to support those
we shouldn't complain on ECF_NORETURN (we've stopped doing so for musttail
in PR119483) but also shouldn't complain about TYPE_VOLATILE on their
FUNCTION_TYPE (something that IPA doesn't change, I think it only sets
TREE_THIS_VOLATILE on the FUNCTION_DECL). volatile on function type
really means noreturn as well, it has no other meaning.
2025-07-29 Jakub Jelinek <jakub@redhat.com>
PR middle-end/121159
* calls.cc (can_implement_as_sibling_call_p): Don't reject declared
noreturn functions in musttail calls.
* c-c++-common/pr121159.c: New test.
* gcc.dg/plugin/must-tail-call-2.c (test_5): Don't expect an error.
|
|
This is a follow-up to the review of the CSWTCH mergeability patch
located at https://gcc.gnu.org/pipermail/gcc-patches/2025-July/690810.html.
It moves the special number (256) to a macro, so that it is not used bare in the
source and only needs to be changed in one place.
This special number was added with r0-37392-g201556f0e00580, which added the original
mergeable section support to GCC.
Pushed as obvious after build and test on x86_64.
gcc/ChangeLog:
* output.h (MAX_ALIGN_MERGABLE): New define.
* tree-switch-conversion.cc (switch_conversion::build_one_array):
Use MAX_ALIGN_MERGABLE instead of 256.
* varasm.cc (mergeable_string_section): Likewise
(mergeable_constant_section): Likewise
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
When I did r16-1067-gaa935ce40a7, I thought it would be
enough to mark the decl as mergeable to get it to merge on
all targets. It turns out a few things needed to be changed
to support it being mergeable on all targets.
The first thing is to improve the selection of the mergeable
section: instead of basing it on the DECL's mode, it
should be based on the size.
The second thing that needed to happen is to change the
alignment of the CSWTCH decl so that it is aligned to the next power
of 2 relative to its size, if the size is less than 32 bytes
(the max mergeable size that is supported).
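The alignment rule can be sketched as below. This is a minimal illustration working in bytes, with hypothetical helper names and the 32-byte limit taken from the description above; the real change operates on the decl's size and alignment inside tree-switch-conversion.cc and may differ in detail.

```cpp
#include <cassert>
#include <cstdint>

// Smallest power of two that is >= size.  Intended for small nonzero
// sizes; very large inputs (above 2^63) are not handled.
inline uint64_t next_pow2 (uint64_t size)
{
  uint64_t p = 1;
  while (p < size)
    p <<= 1;
  return p;
}

// Hypothetical helper: alignment in bytes for a table of `size` bytes whose
// current alignment is `cur_align`.  Only small tables are bumped.
inline uint64_t mergeable_align (uint64_t size, uint64_t cur_align)
{
  const uint64_t max_mergeable_bytes = 32;  // limit quoted in the text
  if (size == 0 || size >= max_mergeable_bytes)
    return cur_align;
  uint64_t wanted = next_pow2 (size);
  return wanted > cur_align ? wanted : cur_align;
}
```

For example, a 12-byte table with 4-byte alignment would be raised to 16-byte alignment, while a 40-byte table is left alone.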
With these changes, cswtch-6.c passes on ia32 and other targets.
And the new testcase cswtch-7.c will pass now too.
Note that Darwin's darwin_mergeable_constant_section could
be "fixed" up to use DECL_SIZE instead of DECL_MODE, but I am not
sure it makes a huge difference.
Bootstrapped and tested on x86_64-linux-gnu.
PR middle-end/120523
gcc/ChangeLog:
* output.h (mergeable_constant_section): New declaration taking
unsigned HOST_WIDE_INT for the size.
* tree-switch-conversion.cc (switch_conversion::build_one_array):
Increase the alignment of CSWTCH for sizes less than 32bytes.
* varasm.cc (mergeable_constant_section): Split out twice.
One that takes the size in unsigned HOST_WIDE_INT and the
other size in a tree.
(default_elf_select_section): Pass DECL_SIZE instead of
DECL_MODE to mergeable_constant_section.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/cswtch-7.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
When the costing refactoring happened we ended up with some strange
inter-mixing of VMAT unrelated code. The following moves stuff
closer to where it's actually used, at the expense of duplicating
some lines.
* tree-vect-stmts.cc (vectorizable_load): Un-factor VMAT
specific code to their handling blocks.
|
|
The following removes this only set member. Slightly complicated
by the hoops get_group_load_store_type jumps through. I've simplified
that, noting the offset vector type that's relevant is that of the
actual offset SLP node, not of what vect_check_gather_scatter (re-)computes.
* tree-vectorizer.h (gather_scatter_info::offset_dt): Remove.
* tree-vect-data-refs.cc (vect_describe_gather_scatter_call):
Do not set it.
(vect_check_gather_scatter): Likewise.
* tree-vect-stmts.cc (vect_truncate_gather_scatter_offset):
Likewise.
(get_group_load_store_type): Use the vector type of the offset
SLP child. Do not re-check vect_is_simple_use validated by
SLP build.
|
|
|
|
Converting from generic AS to __flashx used the same rule as
for __memx, which tags RAM (generic AS) locations by setting bit 23.
The justification was that generic isn't a subset of __flashx, though
that led to surprises with code like const __flashx *x = NULL.
The natural thing to do is to just load 0x000000 in that case,
so that the null pointer works in __flashx as expected.
Apart from that, converting NULL to __flashx (or __flash) no longer
raises a -Waddr-space-convert diagnostic.
gcc/
PR target/121277
* config/avr/avr.cc (avr_addr_space_convert): When converting
from generic AS to __flashx, don't set bit 23.
(avr_convert_to_type): Don't -Waddr-space-convert when NULL
is converted to __flashx or to __flash.
|
|
When I added the factor operations to ifcvt, I messed up the handling of removing
the phi nodes. The fix is to remove the phi node that was factored out
as we factor out the operator, because otherwise scev can go wrong when it comes
to detecting whether the new args are from a reduction.
The interface of is_cond_scalar_reduction also needs to change: the
phi node that was being passed no longer exists after the factoring, so the
parts of it that were being used need to be passed instead.
PR tree-optimization/121236
gcc/ChangeLog:
* tree-if-conv.cc (is_cond_scalar_reduction): Instead of phi argument,
pass bb and res of the phi.
(factor_out_operators): Add iterator for the phi. Remove the phi
if this is the first time. Return if we had removed the phi.
(predicate_scalar_phi): Add the phi iterator argument.
Update call to is_cond_scalar_reduction.
Update call to factor_out_operators and set the return value to true
when factor_out_operators returns true.
(predicate_all_scalar_phis): Don't remove the phi if predicate_scalar_phi
already removed it.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr121236-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
__tls_get_addr doesn't preserve vector registers. When a function
with no_caller_saved_registers attribute calls __tls_get_addr, YMM
and ZMM registers will be clobbered. Issue an error and suggest
-mtls-dialect=gnu2 in this case.
gcc/
PR target/121208
* config/i386/i386.cc (ix86_tls_get_addr): Issue an error for
-mtls-dialect=gnu with no_caller_saved_registers attribute and
suggest -mtls-dialect=gnu2.
gcc/testsuite/
PR target/121208
* gcc.target/i386/pr121208-1a.c: New test.
* gcc.target/i386/pr121208-1b.c: Likewise.
* gcc.target/i386/pr121208-2a.c: Likewise.
* gcc.target/i386/pr121208-2b.c: Likewise.
* gcc.target/i386/pr121208-3a.c: Likewise.
* gcc.target/i386/pr121208-3b.c: Likewise.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
When the C++98 std::distance and std::advance functions (and C++11
std::next and std::prev) are used with C++20 iterators there can be
unexpected results, ranging from compilation failure to decreased
performance to undefined behaviour.
An iterator which satisfies std::input_iterator but does not meet the
Cpp17InputIterator requirements might have std::output_iterator_tag for
its std::iterator_traits<I>::iterator_category, which means it currently
cannot be used with std::advance at all. However, the implementation of
std::advance for a Cpp17InputIterator doesn't do anything that isn't
valid for iterator types satisfying C++20 std::input_iterator.
Similarly, a type satisfying C++20 std::bidirectional_iterator might be
usable with std::prev, if it weren't for the fact that its C++17
iterator_category is std::input_iterator_tag.
Finally, a type satisfying C++20 std::random_access_iterator might use a
slower implementation for std::distance or std::advance if its C++17
iterator_category is not std::random_access_iterator_tag.
This commit adds a __promotable_iterator concept to detect C++20
iterators which explicitly define an iterator_concept member, and which
either have no iterator_category, or their iterator_category is weaker
than their iterator_concept. This is used by std::distance and
std::advance to detect iterators which should dispatch based on their
iterator_concept instead of their iterator_category. This means that
those functions just work and do the right thing for C++20 iterators
which would otherwise fail to compile or have suboptimal performance.
This is related to LWG 3197, which considers making it undefined to use
std::prev with types which do not meet the Cpp17BidirectionalIterator
requirements. I think making it work, as in this commit, is a better
solution than banning it (or rejecting it at compile-time as libc++
does).
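The dispatch idea can be sketched outside the library. The example below is a deliberate simplification and not the libstdc++ implementation: it uses C++17 member detection instead of a C++20 concept, it prefers iterator_concept whenever the member exists (the real __promotable_iterator also requires the category to be absent or weaker), and every name in it is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <type_traits>

// Tag to dispatch on: iterator_concept if the type declares one,
// otherwise the classic iterator_category.
template <typename It, typename = void>
struct dispatch_tag
{
  using type = typename std::iterator_traits<It>::iterator_category;
};

template <typename It>
struct dispatch_tag<It, std::void_t<typename It::iterator_concept>>
{
  using type = typename It::iterator_concept;
};

template <typename It>
std::ptrdiff_t distance_impl (It first, It last, std::random_access_iterator_tag)
{
  return last - first;              // constant time
}

template <typename It>
std::ptrdiff_t distance_impl (It first, It last, std::input_iterator_tag)
{
  std::ptrdiff_t n = 0;
  for (; first != last; ++first)    // linear time
    ++n;
  return n;
}

template <typename It>
std::ptrdiff_t sketch_distance (It first, It last)
{
  return distance_impl (first, last, typename dispatch_tag<It>::type ());
}

// Hypothetical iterator: random access in the C++20 sense, but only an
// input iterator by its C++17 category because operator* returns a prvalue.
struct counting_iterator
{
  using iterator_concept = std::random_access_iterator_tag;
  using iterator_category = std::input_iterator_tag;
  using value_type = int;
  using difference_type = std::ptrdiff_t;
  using pointer = void;
  using reference = int;

  int value;

  int operator* () const { return value; }
  counting_iterator &operator++ () { ++value; return *this; }
  friend bool operator!= (counting_iterator a, counting_iterator b)
  { return a.value != b.value; }
  friend std::ptrdiff_t operator- (counting_iterator a, counting_iterator b)
  { return a.value - b.value; }
};
```

With this dispatch, sketch_distance on two counting_iterator values takes the constant-time path even though the iterator_category alone would have selected the loop.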
PR libstdc++/102181
libstdc++-v3/ChangeLog:
* include/bits/stl_iterator_base_funcs.h (distance, advance):
Check C++20 iterator concepts and handle appropriately.
(__detail::__iter_category_converts_to_concept): New concept.
(__detail::__promotable_iterator): New concept.
* testsuite/24_iterators/operations/cxx20_iterators.cc: New
test.
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
|
|
contrib/ChangeLog
* gcc-changelog/git_commit.py: Add "diagnostics" to bug
components.
|
|
Current trunk doesn't bootstrap with --enable-checking=release
due to improper nesting of namespaces and #if CHECKING_P blocks.
This corrects that.
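Properly nested code keeps each namespace opened inside a conditional block closed inside that same block. The sketch below illustrates the shape with made-up names; it defines CHECKING_P itself only so that it is self-contained, whereas in GCC the macro comes from the build configuration.

```cpp
#include <cassert>

#ifndef CHECKING_P
#define CHECKING_P 1  /* assumption for this sketch only */
#endif

namespace diagnostics_sketch {

inline int answer () { return 42; }

#if CHECKING_P
namespace selftest {

// Self-test code lives entirely inside the conditional block, and the
// namespace opened here is closed again before the #endif.
inline bool run_tests () { return answer () == 42; }

} // namespace selftest
#endif /* CHECKING_P */

} // namespace diagnostics_sketch
```

If the closing brace of a namespace sat on the other side of the #endif, a build with CHECKING_P set to 0 would see unbalanced braces, which is the kind of failure a release-checking bootstrap exposes.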
gcc/
PR other/121260
* diagnostics/changes.cc: Correct nesting of namespaces
and #if CHECKING_P blocks.
* diagnostics/context.cc: Likewise.
* diagnostics/html-sink.cc: Likewise.
* diagnostics/output-spec.cc: Likewise.
* diagnostics/sarif-sink.cc: Likewise.
Signed-off-by: Mikael Pettersson <mikpelinux@gmail.com>
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
Test cases for commit 60ba2b61af23e6d561c5cbab8df57ea093ade3b3
"nvptx/nvptx.opt: Update -march-map= for newer sm_xxx".
gcc/testsuite/
* gcc.target/nvptx/march-map=sm_100.c: New.
* gcc.target/nvptx/march-map=sm_100a.c: Likewise.
* gcc.target/nvptx/march-map=sm_100f.c: Likewise.
* gcc.target/nvptx/march-map=sm_101.c: Likewise.
* gcc.target/nvptx/march-map=sm_101a.c: Likewise.
* gcc.target/nvptx/march-map=sm_101f.c: Likewise.
* gcc.target/nvptx/march-map=sm_103.c: Likewise.
* gcc.target/nvptx/march-map=sm_103a.c: Likewise.
* gcc.target/nvptx/march-map=sm_103f.c: Likewise.
* gcc.target/nvptx/march-map=sm_120.c: Likewise.
* gcc.target/nvptx/march-map=sm_120a.c: Likewise.
* gcc.target/nvptx/march-map=sm_120f.c: Likewise.
* gcc.target/nvptx/march-map=sm_121.c: Likewise.
* gcc.target/nvptx/march-map=sm_121a.c: Likewise.
* gcc.target/nvptx/march-map=sm_121f.c: Likewise.
|
|
Usage of the -march-map=: "Select the closest available '-march=' value
that is not more capable."
As PTX ISA 8.6/8.7 (= unreleased CUDA 12.7 + CUDA 12.8) added the
Nvidia Blackwell GPUs SM_100, SM_101, and SM_120, it makes sense to
add them as well. Note that all three come as sm_XXX and sm_XXXa.
PTX ISA 8.8 (CUDA 12.9) added SM_103 and SM_121 and the new 'f' suffix
for all SM_1xx.
Internally, GCC currently generates the same code for >= sm_80 (Ampere);
however, as GCC's -march= also supports sm_89 (Ada), the sm_1xx values
(Blackwell) added here will map to sm_89.
[Naming note: while ptx code generated for sm_X can also run with sm_Y
if Y > X, code generated for sm_XXXa can (generally) only run on
the specific hardware; and sm_XXXf implies compatibility with only
subsequent targets in the same family.]
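The selection rule quoted above amounts to picking the largest supported value that does not exceed the requested one. A minimal sketch follows; the function name is hypothetical, the supported list is illustrative only (apart from 80 and 89, its entries are assumptions rather than a statement of what GCC accepts), and the 'a' and 'f' suffixes are ignored.

```cpp
#include <cassert>

// Map a requested sm_<n> number to the closest supported value that is not
// more capable, or return -1 if every supported value is more capable.
inline int map_march (int requested)
{
  // Illustrative list; only 80 and 89 are taken from the text above.
  static const int supported[] = { 30, 35, 53, 70, 75, 80, 89 };
  int best = -1;
  for (int sm : supported)
    if (sm <= requested && sm > best)
      best = sm;
  return best;
}
```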
gcc/ChangeLog:
* config/nvptx/nvptx.opt (march-map=): Add sm_100{,f,a},
sm_101{,f,a}, sm_103{,a,f}, sm_120{,a,f} and sm_121{,f,a}.
|
|
For device (agent) scope atomics, as needed when there is more than one team,
a buffer_wbl2 followed by s_waitcnt is required. During the initial porting,
the pre-atomic instruction was accidentally replaced by buffer_inv sc1, which is
not quite the right instruction.
gcc/ChangeLog:
* config/gcn/gcn.md (atomic_load, atomic_store, atomic_exchange):
Fix CDNA3 L2 cache write-back before atomic instructions.
|
|
The following adds const qualification to gather_scatter_info *
parameters for various APIs in the vectorizer.
* tree-vect-stmts.cc (check_load_store_for_partial_vectors):
Make *gs_info const.
(vect_build_one_gather_load_call): Likewise.
(vect_build_one_scatter_store_call): Likewise.
(vect_get_gather_scatter_ops): Likewise.
(vect_get_strided_load_store_ops): Likewise.
|
|
Implement another case where the CDNA3 ISA documentation requires s_nop, and
add a comment explaining why another case does not need to be handled. Also add
one case where an s_nop is required by MI300A hardware but does not seem to be
mentioned in the CDNA3 ISA documentation.
gcc/ChangeLog:
* config/gcn/gcn.md (define_attr "vcmp"): Add with values
vcmp/vcmpx/no.
(*movbi, cstoredi4.., cstore<mode>4): Set it.
* config/gcn/gcn-valu.md (vec_cmp<mode>...): Likewise.
* config/gcn/gcn.cc (gcn_cmpx_insn_p): Remove.
(gcn_md_reorg): Add two new conditions for MI300.
|
|
Use 's_nops' with a number instead of multiple 's_nop' instructions when
manually adding 1 to 5 wait states. This helps with
the instruction cache and helps a tiny bit with PR119367, where
a two-byte variable overflows in the debugging location view handling.
Add a comment about 'sc0' to TARGET_GLC_NAME as for atomics it is
unrelated to the scope but to whether the result is stored; i.e.
using e.g. 'sc1' instead of 'sc0' will have undesired consequences!
Update the comment above print_operand_address to document 'R' and 'V';
those are used below as "Temporary hack.", but it makes sense to see
them in the list.
gcc/ChangeLog:
* config/gcn/gcn-opts.h (enum hsaco_attr_type): Add comment
about 'sc0'.
* config/gcn/gcn.cc (gcn_md_reorg): Use gen_nops instead of gen_nop.
(print_operand_address): Document 'R' and 'V' in the
pre-function comment as well.
* config/gcn/gcn.md (nops): Add.
|
|
This adds the new bitset constructor from string_view
defined in P2697 to the debug version of the type.
libstdc++-v3/ChangeLog:
PR libstdc++/119742
* include/debug/bitset: Add new ctor.
|
|
We failed to build the correct initialization vector. For VLA
vectors and a non-uniform initialization vector this rejects
vectorization for now.
PR tree-optimization/121256
* tree-vect-loop.cc (vectorizable_recurr): Build a correct
initialization vector for SLP_TREE_LANES > 1.
* gcc.dg/vect/vect-recurr-pr121256.c: New testcase.
* gcc.dg/vect/vect-recurr-pr121256-2.c: Likewise.
|
|
libstdc++-v3/ChangeLog:
* include/std/mdspan: Small stylistic adjustments.
Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
|
|
I am at a point where I want to store additional information from
analysis (from loads and stores) to re-use them at transform stage
without repeating the analysis. I do not want to add to
stmt_vec_info at this point, so this starts adding kind specific
sub-structures by moving the STMT_VINFO_TYPE field to the SLP
tree and adding a (dummy for now) union tagged by it to receive
such data.
The change is largely mechanical after RISC-V has been prepared
to have a SLP node around.
I have settled for a union (supposed to get pointers to data).
As followup this enables getting rid of SLP_TREE_CODE and making
VEC_PERM therein a separate type, unifying its handling.
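The shape being described, a kind field on the node that tags a union of pointers to kind-specific data, can be sketched as follows. All type, enumerator and field names below are hypothetical placeholders; the real _slp_tree::type and _slp_tree::u members use the vectorizer's own enumeration, and the union is still a dummy at this stage.

```cpp
#include <cassert>

// Hypothetical kinds; the real enumeration is much larger.
enum node_kind { kind_undef, kind_load, kind_store };

// Hypothetical per-kind analysis results kept for the transform stage.
struct load_data { int alignment; };
struct store_data { int n_stores; };

struct node_sketch
{
  node_kind type;          // tag saying which union member is meaningful
  union
  {
    void *undef;
    load_data *load;       // valid when type == kind_load
    store_data *store;     // valid when type == kind_store
  } u;
};

// Read kind-specific data only after checking the tag.
inline int load_alignment (const node_sketch &n)
{
  return n.type == kind_load ? n.u.load->alignment : -1;
}
```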
* tree-vectorizer.h (_slp_tree::type): Add.
(_slp_tree::u): Likewise.
(_stmt_vec_info::type): Remove.
(STMT_VINFO_TYPE): Likewise.
(SLP_TREE_TYPE): New.
* tree-vectorizer.cc (vec_info::new_stmt_vec_info): Do not
initialize type.
* tree-vect-slp.cc (_slp_tree::_slp_tree): Initialize type.
(vect_slp_analyze_node_operations): Adjust.
(vect_schedule_slp_node): Likewise.
* tree-vect-patterns.cc (vect_init_pattern_stmt): Do not
copy STMT_VINFO_TYPE.
* tree-vect-loop.cc: Set SLP_TREE_TYPE instead of
STMT_VINFO_TYPE everywhere.
(vect_create_loop_vinfo): Do not set STMT_VINFO_TYPE on
loop conditions.
* tree-vect-stmts.cc: Set SLP_TREE_TYPE instead of
STMT_VINFO_TYPE everywhere.
(vect_analyze_stmt): Adjust.
(vect_transform_stmt): Likewise.
* config/aarch64/aarch64.cc (aarch64_vector_costs::count_ops):
Access SLP_TREE_TYPE instead of STMT_VINFO_TYPE.
* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
Remove non-SLP element-wise load/store matching.
* config/rs6000/rs6000.cc
(rs6000_cost_data::update_target_cost_per_stmt): Pass in
the SLP node. Use that to get at the memory access
kind and type.
(rs6000_cost_data::add_stmt_cost): Pass down SLP node.
* config/riscv/riscv-vector-costs.cc (variable_vectorized_p):
Use SLP_TREE_TYPE.
(costs::need_additional_vector_vars_p): Likewise.
(costs::update_local_live_ranges): Likewise.
|