Align move_max with prefer_vector_width for SPR/GNR/DMR, similarly to
the commit below.
commit 6ea25c041964bf63014fcf7bb68fb1f5a0a4e123
Author: liuhongt <hongtao.liu@intel.com>
Date: Thu Aug 15 12:54:07 2024 +0800
Align ix86_{move_max,store_max} with vectorizer.
When none of mprefer-vector-width, avx256_optimal/avx128_optimal,
avx256_store_by_pieces/avx512_store_by_pieces is specified, GCC will
set ix86_{move_max,store_max} as max available vector length except
for AVX part.
  if (TARGET_AVX512F_P (opts->x_ix86_isa_flags)
      && TARGET_EVEX512_P (opts->x_ix86_isa_flags2))
    opts->x_ix86_move_max = PVW_AVX512;
  else
    opts->x_ix86_move_max = PVW_AVX128;
So for -mavx2 the vectorizer chooses 256-bit vectors for
vectorization, but 128-bit moves are used for struct copies; this
"misalignment" can cause a potential store-to-load-forwarding (STLF)
issue.
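As an illustration (a hedged sketch; the struct and function names
are made up, not from the patch or its tests), a plain struct copy
like the one below is expanded via the move-by-pieces machinery,
while loops over the same data go through the vectorizer:

  /* With -mavx2 the vectorizer works on 256-bit vectors, so after
     this change the 32-byte copy should also be expanded with
     256-bit moves instead of two 128-bit halves, keeping store
     widths aligned with later vector loads and avoiding STLF
     stalls.  */
  struct blob { char bytes[32]; };

  void
  copy_blob (struct blob *dst, const struct blob *src)
  {
    *dst = *src;  /* expanded via move-by-pieces */
  }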
gcc/ChangeLog:
* config/i386/x86-tune.def (X86_TUNE_AVX512_MOVE_BY_PIECES):
Remove SPR/GNR/DMR.
(X86_TUNE_AVX512_STORE_BY_PIECES): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pieces-memcpy-18.c: Use -mtune=znver5
instead of -mtune=sapphirerapids.
* gcc.target/i386/pieces-memcpy-21.c: Ditto.
* gcc.target/i386/pieces-memset-46.c: Ditto.
* gcc.target/i386/pieces-memset-49.c: Ditto.
(cherry picked from commit dd713d0f3fc88778a9b3d4f8f1895a3cd6c145ca)
|
Reduce the fp16-aapcs testcases to return-value testing, since
parameter passing is already tested in aapcs/vfp*.c.
gcc/testsuite/ChangeLog:
* gcc.target/arm/fp16-aapcs.c: New test.
* gcc.target/arm/fp16-aapcs-1.c: Removed.
* gcc.target/arm/fp16-aapcs-2.c: Likewise.
* gcc.target/arm/fp16-aapcs-3.c: Likewise.
* gcc.target/arm/fp16-aapcs-4.c: Likewise.
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
(cherry picked from commit 1cf8cb45d872a5f09d65c63c891c091710c37432)
|
Shreya's work to add the addptr pattern on the RISC-V port exposed a latent bug
in LRA.
We lazily allocate/reallocate the ira_reg_equiv structure, and when
we do the (re)allocation we over-allocate and zero-fill so that we
don't have to reallocate and copy the data so often.
In the case exposed by Shreya's work we had N requested entries at
the last reallocation step, so we actually allocated N+M entries.
During LRA we then created enough new pseudos to have N+M+1 pseudos.
In get_equiv we read ira_reg_equiv[regno] without bounds checking, so
we read past the allocated part of the array and get back junk, which
we then use; depending on the precise contents we fault in various
fun and interesting ways.
We could arrange to re-allocate ira_reg_equiv again on some path
through LRA (possibly in get_equiv itself), or we could just insert
the bounds check in get_equiv as is done elsewhere in LRA. Vlad
indicated no strong preference in an email last week.
So this just adds the bounds check in a manner similar to what's done elsewhere
in LRA. Bootstrapped and regression tested on x86_64 as well as RISC-V with
Shreya's work enabled and regtested across the various embedded targets.
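A minimal self-contained sketch of the bug pattern and the shape of
the fix (all names below are hypothetical stand-ins, not GCC's
internals):

  #include <stdlib.h>
  #include <string.h>

  static int *equiv;      /* stands in for ira_reg_equiv */
  static int equiv_len;   /* number of allocated entries */

  /* Lazy (re)allocation: over-allocate and zero-fill the tail so
     that growth happens rarely.  */
  static void
  grow_equiv (int needed)
  {
    if (needed <= equiv_len)
      return;
    int new_len = needed + 64;  /* over-allocate by M entries */
    equiv = realloc (equiv, new_len * sizeof *equiv);
    memset (equiv + equiv_len, 0,
            (new_len - equiv_len) * sizeof *equiv);
    equiv_len = new_len;
  }

  /* The fix: pseudos created after the last grow_equiv call have no
     entry yet, so bounds-check before indexing instead of reading
     junk past the allocated part of the array.  */
  static int
  get_equiv (int regno)
  {
    return regno < equiv_len ? equiv[regno] : 0;
  }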
gcc/
* lra-constraints.cc (get_equiv): Bounds check before accessing
data in ira_reg_equiv.
(cherry picked from commit 0c6ad3f5dfbd45150eeef2474899ba7ef0d8e592)
|
Committing as obvious.
Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
gcc/testsuite/
PR target/121749
* gcc.target/aarch64/simd/pr121749.c: Use dg-assemble directive.
(cherry picked from commit 2b8256d0ce18ed4d00868c78f5128d32884ccfa1)
|
With g:d20b2ad845876eec0ee80a3933ad49f9f6c4ee30 the narrowing shift instructions
are now represented with standard RTL and more merging optimisations occur.
This exposed a wrong predicate for the shift amount operand.
The valid shift amount is bounded by the element width of the narrow
destination, not of the input source.
Correct this by using the vn_mode attribute, which exists for this
purpose, when specifying the predicate.
I've spotted a few more narrowing shift patterns that need the restriction, so
they are updated as well.
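For illustration (a sketch using public intrinsics, not taken from
the patch or its testcase), narrowing 32-bit lanes to 16-bit lanes
accepts immediates in [1, 16], bounded by the destination element
width:

  #include <arm_neon.h>

  /* SHRN narrowing int32x4_t to int16x4_t: the immediate must be in
     [1, 16], i.e. bounded by the 16-bit destination elements, not by
     the 32-bit source elements.  */
  int16x4_t
  narrow (int32x4_t x)
  {
    return vshrn_n_s32 (x, 16);  /* 16 is valid; 17 would be rejected */
  }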
Bootstrapped and tested on aarch64-none-linux-gnu.
Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
gcc/
PR target/121749
* config/aarch64/aarch64-simd.md (aarch64_<shrn_op>shrn_n<mode>):
Use aarch64_simd_shift_imm_offset_<vn_mode> instead of
aarch64_simd_shift_imm_offset_<ve_mode> predicate.
(aarch64_<shrn_op>shrn_n<mode> VQN define_expand): Likewise.
(*aarch64_<shrn_op>rshrn_n<mode>_insn): Likewise.
(aarch64_<shrn_op>rshrn_n<mode>): Likewise.
(aarch64_<shrn_op>rshrn_n<mode> VQN define_expand): Likewise.
(aarch64_sqshrun_n<mode>_insn): Likewise.
(aarch64_sqshrun_n<mode>): Likewise.
(aarch64_sqshrun_n<mode> VQN define_expand): Likewise.
(aarch64_sqrshrun_n<mode>_insn): Likewise.
(aarch64_sqrshrun_n<mode>): Likewise.
(aarch64_sqrshrun_n<mode> VQN define_expand): Likewise.
* config/aarch64/iterators.md (vn_mode): Handle DI, SI, HI modes.
gcc/testsuite/
PR target/121749
* gcc.target/aarch64/simd/pr121749.c: New test.
(cherry picked from commit cb508e54140687a50790059fac548d87515df6be)
|
Add support for some recent AVR devices.
gcc/
* config/avr/avr-mcus.def: Add avr32eb14, avr32eb20,
avr32eb28, avr32eb32.
* doc/avr-mmcu.texi: Rebuild.
(cherry picked from commit 45f605a74fd7e96294477db064cc58033c3fba49)
|
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
gcc/testsuite/ChangeLog:
PR c++/121801
* g++.dg/abi/pr121801.C: New test.
gcc/cp/ChangeLog:
PR c++/121801
* mangle.cc (write_real_cst): Handle 16-bit real and assert
that reals have 16 bits or a multiple of 32 bits.
(cherry picked from commit 19d1c7c28f4fd0557dd868a7a4041b00ceada890)
|
Mode iterator V_HW enables V1TI for target VXE, which makes
vec_cmpv1tiv1ti available and leads to an ICE since there is no
corresponding insn.
Fixed by emulating comparisons and enabling mode V1TI unconditionally
for V_HW. For the sake of symmetry, I also added TI mode to V_HW since
TF mode is already included. As a consequence, the V_HW consumers
vec_{splat,slb,sld,sldw,sldb,srdb,srab,srb,test_mask_int,test_mask}
also become available for 128-bit integers.
This fixes gcc.c-torture/execute/pr105613.c and gcc.dg/pr106063.c.
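As an illustration of the trigger (a hedged sketch, not the PR
testcase), a generic-vector comparison of 128-bit elements is enough
to request vec_cmpv1tiv1ti:

  /* With vector extensions enabled on z, the middle end can ask for
     a V1TI comparison like this one, which previously ICEd for lack
     of a matching insn and is now emulated.  */
  typedef __int128 v1ti __attribute__ ((vector_size (16)));

  v1ti
  cmpeq (v1ti a, v1ti b)
  {
    return a == b;
  }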
gcc/ChangeLog:
* config/s390/vector.md (V_HW): Enable V1TI unconditionally and
add TI.
(vec_cmpu<VIT_HW:mode><VIT_HW:mode>): Add 128-bit integer
variants.
(*vec_cmpeq<mode><mode>_nocc_emu): Emulate operation.
(*vec_cmpgt<mode><mode>_nocc_emu): Emulate operation.
(*vec_cmpgtu<mode><mode>_nocc_emu): Emulate operation.
gcc/testsuite/ChangeLog:
* gcc.target/s390/vector/vec-cmp-emu-1.c: New test.
* gcc.target/s390/vector/vec-cmp-emu-2.c: New test.
* gcc.target/s390/vector/vec-cmp-emu-3.c: New test.
(cherry picked from commit 1b575bb24a7a3d2b00197dd5deb4c26b313f442b)
|
There are at least two cases where tree-switch-conversion leads
to unpleasant resource allocation:
PR49857
The lookup table lives in RAM. This is the case for all
devices that locate .rodata in RAM, which is almost all
AVR devices.
PR81540
Code is bloated for 64-bit inputs.
As far as PR49857 is concerned, a target hook that may add an
address-space qualifier to the lookup table is the obvious
solution, though such a patch has always been rejected by
global maintainers for non-technical reasons.
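For illustration (a hypothetical example, not from either PR),
PR49857 concerns switches like the following, which
tree-switch-conversion turns into a load from a constant lookup
table in .rodata:

  /* On most AVR devices .rodata lives in (or is copied to) RAM, so
     the generated lookup table occupies scarce RAM.  */
  unsigned char
  map (unsigned char x)
  {
    switch (x)
      {
      case 0: return 12;
      case 1: return 34;
      case 2: return 56;
      case 3: return 78;
      default: return 0;
      }
  }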
gcc/
PR target/81540
PR target/49857
* common/config/avr/avr-common.cc: Disable -ftree-switch-conversion.
(cherry picked from commit 912159d2b5429c3126756b56723dd4f32dd56bdd)
|
gcc/ChangeLog:
* gimple.h (GTMA_DOES_GO_IRREVOCABLE): Fix typo.
(cherry picked from commit 356250630abd876ae592bc3d2b4cc171bc834b79)
|
This reverts commit e645728e9de64d019661c8f92bb487e06d95644a.
|
libgcc/config/libbid/ChangeLog:
PR target/120691
* bid128_div.c: Fix _Decimal128 arithmetic error under
FE_UPWARD.
* bid128_rem.c: Ditto.
* bid128_sqrt.c: Ditto.
* bid64_div.c (bid64_div): Ditto.
* bid64_sqrt.c (bid64_sqrt): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr120691.c: New test.
(cherry picked from commit 50064b2898edfb83bc37f2597a35cbd3c1c853e3)
|
PR fortran/121145
gcc/fortran/ChangeLog:
* trans-expr.cc (gfc_conv_procedure_call): Do not create pointer
check for proc-pointer actual passed to optional dummy.
gcc/testsuite/ChangeLog:
* gfortran.dg/pointer_check_15.f90: New test.
(cherry picked from commit 8f9450505f8244d262f8b4ff274f113f99cdc7e2)
|
Commit r16-1633 introduced a regression for imported interfaces that were
not renamed-on-use, since the related logic did not take into account that
the absence of renaming could be represented by an empty string.
PR fortran/120784
gcc/fortran/ChangeLog:
* interface.cc (gfc_match_end_interface): Detect empty local_name.
gcc/testsuite/ChangeLog:
* gfortran.dg/interface_63.f90: Extend testcase.
(cherry picked from commit ddff83b3dde4a8308d0e156f85693e7176b85524)
|
PR fortran/120784
gcc/fortran/ChangeLog:
* interface.cc (gfc_match_end_interface): If a use-associated
symbol is renamed, use the local_name for checking.
gcc/testsuite/ChangeLog:
* gfortran.dg/interface_63.f90: New test.
(cherry picked from commit 6dd1659cf10a7ad51576f902ef3bc007db30c990)
|
We currently use the types encountered in the function body, not in
the type declarations, to perform total scalarization. Bug PR 119085
uncovered that we lack a check that, when the same data is accessed
through different aggregate types, those types are actually
compatible. Without it, we can base total scalarization on a type
that does not "cover" all live data in a different part of the
function. This patch adds the check.
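A hypothetical shape of the problem (a sketch, not the PR's
testcase):

  /* The same bytes are accessed through two incompatible aggregate
     types; total scalarization must not be based on just one of
     them.  */
  struct A { short lo; short hi; };
  struct B { int whole; };
  union U { struct A a; struct B b; };

  int
  f (struct B x)
  {
    union U u;
    u.b = x;            /* written via struct B */
    struct A a = u.a;   /* read back via incompatible struct A */
    return a.lo + a.hi;
  }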
gcc/ChangeLog:
2025-07-21 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/119085
* tree-sra.cc (sort_and_splice_var_accesses): Prevent total
scalarization if two incompatible aggregates access the same place.
gcc/testsuite/ChangeLog:
2025-07-21 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/119085
* gcc.dg/tree-ssa/pr119085.c: New test.
(cherry picked from commit 171fcc80ede596442712e559c4fc787aa4636694)
|
The testcase of PR 117423 shows a flaw in the fancy way we now do
"total scalarization" in SRA. We use the types encountered in the
function body, not in the type declarations (allowing us to totally
scalarize when only one union field is ever used, since we
effectively "skip" the union then), and we can accommodate
pre-existing accesses that happen to fall into padding.
In this case, we skipped the union (bypassing the
totally_scalarizable_type_p check), and the access falling into the
"padding" is an aggregate, so it is not an SRA candidate even though
it actually contains data. Arguably total scalarization should just
bail out when it encounters this situation (I decided not to depend
on that, mainly because we would need to detect all cases in which we
eventually cannot scalarize, such as when a scalar access has child
accesses). The actual bug, however, is that the check whether all
data in an aggregate is covered by replacements simply assumes this
is always the case whenever total scalarization triggers, which need
not hold in cases like this one, and perhaps others.
This patch fixes the bug by assuming only that all padding is taken
care of when total scalarization triggered, not that every access was
actually scalarized.
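For illustration, a heavily hedged sketch of the situation described
above (not the PR testcase):

  /* Only the fields of member s are used as scalars, so total
     scalarization can "skip" the union and follow struct s's layout,
     while the aggregate copy through raw keeps live data in what
     that layout considers padding after c.  */
  union U
  {
    struct { char c; int i; } s;  /* 3 bytes of padding after c */
    char raw[8];
  };

  int
  f (const union U *p)
  {
    union U u;
    __builtin_memcpy (u.raw, p->raw, sizeof u.raw);  /* aggregate access */
    u.s.c = 1;
    return u.s.i + u.raw[1];  /* raw[1] sits in s's padding */
  }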
gcc/ChangeLog:
2025-07-17 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/117423
* tree-sra.cc (analyze_access_subtree): Fix computation of grp_covered
flag.
gcc/testsuite/ChangeLog:
2025-07-17 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/117423
* gcc.dg/tree-ssa/pr117423.c: New test.
(cherry picked from commit 7375909e9d9e7de23acb4b1e0a965d8faf1943c4)
|
The linker rejects --relax in relocatable links (-r), hence only
add --relax when -r is not specified.
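Plausibly the shape of the change (a sketch only; the exact spec text
lives in the avr backend, and this assumes --relax is driven by
-mrelax):

  /* Nest a %{!r:...} condition so --relax is only passed to the
     linker when -r (relocatable link) is absent.  */
  #define LINK_RELAX_SPEC "%{mrelax:%{!r:--relax}}"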
gcc/
PR target/121608
* config/avr/specs.h (LINK_RELAX_SPEC): Wrap in %{!r...}.
(cherry picked from commit 0f15ff7b511493e9197e6153b794081c1557ba02)
|
This PR is about a case where we used aarch64_expand_sve_const_pred_trn
to combine two predicates, one of which was constructed using
aarch64_sve_move_pred_via_while. The former requires the inputs
to have mode VNx16BI, but the latter returned VNx8BI for a .H
WHILELO.
The proper fix, used on trunk, is to make the pattern emitted by
aarch64_sve_move_pred_via_while produce a VNx16BI for all element
sizes, since every bit of the result is significant. However,
that required some target-independent changes that are too invasive
to backport. This patch goes for the simpler (but less robust)
approach of using the original pattern and casting it to VNx16BI
after the fact.
Since the WHILELO pattern is an unspec, the chances of something
optimising it in a way that changes the undefined bits of the output
should be very low, especially on a release branch. It is still a less
satisfactory fix though.
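For illustration (a sketch using the public ACLE, not the PR's
testcase), a .H WHILELO produces a predicate whose bits correspond to
16-bit elements; with this change the pattern behind it is cast to
VNx16BI before predicate-combining code such as
aarch64_expand_sve_const_pred_trn consumes it:

  #include <arm_sve.h>

  /* Each .H WHILELO result bit covers a 16-bit element.  */
  svbool_t
  first_three_halves (void)
  {
    return svwhilelt_b16_s32 (0, 3);
  }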
gcc/
PR target/121118
* config/aarch64/aarch64.cc (aarch64_sve_move_pred_via_while):
Return a VNx16BI predicate.
gcc/testsuite/
PR target/121118
* gcc.target/aarch64/sve/acle/general/pr121118_1.c: New test.
(cherry picked from commit 58a9717df098defb7f595fbc56122107e952a46b)
|