|
|
|
This reverts commit 9ddef25c1812bf0b9c75634013b1fbcd94eca5a4.
|
|
This reverts commit cb4b73da237153871fb840a3a31a79354933a8bb.
|
|
Like the previous commit, but for the copy in strlen, so that we can backport
this commit. The loads should have the correct alignment on them,
so we need to create unaligned types when the alignment of the
pointer is less than the alignment of the current type.
Pushed as pre-approved by https://gcc.gnu.org/pipermail/gcc-patches/2025-September/694016.html
after a bootstrap/test on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-strlen.cc (strlen_pass::handle_builtin_memcmp): Create
unaligned types if the alignment of the pointers is less
than the alignment of the new type.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
|
|
I noticed this while looking into the g++.dg/tree-ssa/vector-compare-1.C
failure on arm: the wrong alignment was being used for the load.
There needs to be an unaligned type here to get the correct alignment.
NOTE: this means the code in strlen is also wrong, but that code is on its
way out, so I am not sure whether we should update it in order to backport
to the release branches; it could be generating wrong code too.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-forwprop.cc (simplify_builtin_memcmp): Create
unaligned types if the alignment of the pointers is less
than the alignment of the new type.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
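At the source level, both commits concern replacing a small equality memcmp with wide loads and one comparison. The following is a minimal C sketch of the idea, assuming a 4-byte comparison; the function names are hypothetical and this is not GCC's implementation. Portable C spells "a load that may not be aligned" with memcpy, which is the role the unaligned type plays inside GCC.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load 4 bytes from P, which may have any alignment.
   memcpy lets the compiler pick a load that is valid for byte-aligned
   data (a plain load on targets with cheap unaligned access).  */
static uint32_t load_u32_any_align(const void *p)
{
  uint32_t v;
  memcpy(&v, p, sizeof v);
  return v;
}

/* Same result as memcmp (a, b, 4) == 0: two 4-byte loads and one compare.
   Writing *(const uint32_t *) a instead would claim 4-byte alignment,
   which is the kind of wrong alignment these fixes avoid.  */
int eq4(const void *a, const void *b)
{
  return load_u32_any_align(a) == load_u32_any_align(b);
}
```

Only the equality form maps this directly; an ordering result would also depend on byte order.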
|
|
2025-09-02 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/89707
* decl.cc (gfc_get_pdt_instance): Copy the typebound procedure
field from the PDT template. If the template interface has
kind=0, provide the new instance with an interface with a type
spec that points to that of the parameterized component.
(match_ppc_decl): When 'saved_kind_expr' is set, this is a PDT and the
expression should be copied to the component kind_expr.
* gfortran.h: Define gfc_get_tbp.
gcc/testsuite/
PR fortran/89707
* gfortran.dg/pdt_43.f03: New test.
|
|
2025-09-02 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/87669
* expr.cc (gfc_spec_list_type): If no LEN components are seen,
unconditionally return 'SPEC_ASSUMED'. This suppresses an
invalid error in match.cc (gfc_match_type_is).
gcc/testsuite/
PR fortran/87669
* gfortran.dg/pdt_42.f03: New test.
libgfortran/
PR fortran/87669
* intrinsics/extends_type_of.c (is_extension_of): Use the vptr
rather than the hash value to identify the types.
|
|
On arm, overriding -march can lead to warnings if the testsuite
options try to pass -mcpu. Avoid these by ensuring the -mcpu is unset
before adding the architecture.
Also, improve the compatibility of asm-hard-reg-error-3.c with
hard-float environments by allowing FP instructions in the
architecture.
gcc/testsuite:
* gcc.dg/asm-hard-reg-4.c: On Arm, unset the CPU before
setting the arch.
* gcc.dg/asm-hard-reg-error-3.c: Similarly. Also add
floating-point instructions to aid hard-float variants.
Match on arm* not just arm.
|
|
The recent change to vect_synth_mult_by_constant failed to handle
the synth_shift_p case for alg_shift, so we still changed c * 4
to c + c + c + c. The following also amends alg_add_t2_m, alg_sub_t2_m,
alg_add_factor and alg_sub_factor appropriately.
PR tree-optimization/121753
* tree-vect-patterns.cc (vect_synth_mult_by_constant): Properly
bail when synth_shift_p and an alg_shift use. Handle other
problematic cases.
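The arithmetic behind the fix can be sketched in scalar C. The function names are hypothetical, and this shows only the identities involved, not GCC's cost-driven search in vect_synth_mult_by_constant: for a power of two the natural sequence is a single shift, while the add chain is equivalent but longer.

```c
#include <stdint.h>

/* c * 4 as one shift: the form an alg_shift step stands for.  */
uint32_t mul4_shift(uint32_t c) { return c << 2; }

/* c * 4 as the add chain the bug still produced.  */
uint32_t mul4_adds(uint32_t c) { return c + c + c + c; }

/* Other constants decompose into a shift plus an add or a subtract.
   Unsigned arithmetic wraps modulo 2^32, so these hold for every c.  */
uint32_t mul5(uint32_t c) { return (c << 2) + c; }   /* 4c + c */
uint32_t mul7(uint32_t c) { return (c << 3) - c; }   /* 8c - c */
```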
|
|
This patch changes is_vlmax_len_p to handle VLS modes properly.
Before, we would check if len == GET_MODE_NUNITS (mode). This works for
VLA modes but not necessarily for VLS modes. We regularly have e.g.
small VLS modes where LEN equals their number of units but which do not
span a full vector. Therefore now check if len * GET_MODE_UNIT_SIZE
(mode) equals BYTES_PER_RISCV_VECTOR * TARGET_MAX_LMUL.
Changing this uncovered an oversight in avlprop where we used
GET_MODE_NUNITS as AVL when GET_MODE_NUNITS / NF would be correct.
The testsuite is unchanged. I didn't bother to add a dedicated test
because we would see the fallout anyway once the gather patch
lands.
gcc/ChangeLog:
* config/riscv/riscv-v.cc (is_vlmax_len_p): Properly handle VLS
modes.
(imm_avl_p): Fix VLS length check.
(expand_strided_load): Use is_vlmax_len_p.
(expand_strided_store): Ditto.
* config/riscv/riscv-avlprop.cc (pass_avlprop::execute):
Use GET_MODE_NUNITS / NF as avl.
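A worked example of the difference between the two checks, assuming VLEN = 128 bits (16 bytes per vector register) and TARGET_MAX_LMUL = 1. The struct and function names are hypothetical stand-ins for GET_MODE_NUNITS, GET_MODE_UNIT_SIZE and the RISC-V macros named above. For a 4 x 16-bit VLS mode with LEN = 4, the old check reports VLMAX, but 4 x 2 = 8 bytes is only half a 16-byte register, so the new check says no.

```c
#include <stdbool.h>

/* Hypothetical mode description; GCC would query a machine_mode.  */
struct vmode { unsigned nunits; unsigned unit_size; };

/* Old test: LEN equals the number of units in the mode.  */
bool vlmax_len_old(unsigned len, struct vmode m)
{
  return len == m.nunits;
}

/* New test: LEN elements fill BYTES_PER_VECTOR * MAX_LMUL bytes,
   i.e. a whole register group, which a small VLS mode may not do.  */
bool vlmax_len_new(unsigned len, struct vmode m,
                   unsigned bytes_per_vector, unsigned max_lmul)
{
  return len * m.unit_size == bytes_per_vector * max_lmul;
}
```

A mode with 16 one-byte units still passes both checks, since it really does span the full register.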
|
|
In a two-source gather, we unconditionally overwrite target with the
first gather's result. If op1 == target, this clobbers the
source operand for the second gather. This patch uses a temporary in
that case.
PR target/121742
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_vec_perm): Use temporary if
op1 and target overlap.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr121742.c: New test.
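The hazard is ordinary aliasing, and a scalar C model makes it visible. All names and N = 4 are hypothetical; each "gather" reads all of its inputs before writing its result, like a vector instruction, and the second one merges the op1 lanes into the first result. When op1 is the same buffer as target, the first write destroys op1 before the second gather reads it; staging the result in a temporary, as the fix does, avoids that.

```c
#include <string.h>

#define N 4

/* Gather from op0 for lanes with sel[i] < N; other lanes get 0 for now.
   All inputs are read before DST is written, as with a vector insn.  */
static void gather_op0(int *dst, const int *op0, const unsigned *sel)
{
  int res[N];
  for (unsigned i = 0; i < N; i++)
    res[i] = sel[i] < N ? op0[sel[i]] : 0;
  memcpy(dst, res, sizeof res);
}

/* Masked gather from op1 for lanes with sel[i] >= N, merged into DST.  */
static void gather_op1(int *dst, const int *op1, const unsigned *sel)
{
  int res[N];
  for (unsigned i = 0; i < N; i++)
    res[i] = sel[i] >= N ? op1[sel[i] - N] : dst[i];
  memcpy(dst, res, sizeof res);
}

/* Writes the first result straight into TARGET: wrong if op1 == target.  */
void perm2_clobbering(int *target, const int *op0, const int *op1,
                      const unsigned *sel)
{
  gather_op0(target, op0, sel);
  gather_op1(target, op1, sel);
}

/* Uses a temporary when op1 and target overlap.  */
void perm2_safe(int *target, const int *op0, const int *op1,
                const unsigned *sel)
{
  int tmp[N];
  int *dst = (op1 == target) ? tmp : target;
  gather_op0(dst, op0, sel);
  gather_op1(dst, op1, sel);
  if (dst != target)
    memcpy(target, dst, sizeof tmp);
}
```

With sel = {0, 4, 1, 5} the expected result interleaves op0[0], op1[0], op0[1], op1[1]; the clobbering version instead reads op1 lanes from the half-written target.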
|
|
The NoOffload flag was introduced recently (commit "Don't pass vector params
through to offload targets").
gcc/ChangeLog:
* doc/options.texi: Document NoOffload.
|
|
In r16-3414 libstdc++ changed the ABI of std::partial_ordering (still
experimental C++20) and now uses the unordered value -128 instead of 2.
Generally the change improved code generation on all targets tested, see
https://gcc.gnu.org/pipermail/gcc-patches/2025-August/693534.html
for details.
In r16-3474 I've adjusted the middle-end and backends to use that value.
This apparently broke the gcc.target/s390/spaceship-fp-2.c test:
with -ffast-math the value 2 is unreachable, so the last .SPACESHIP
argument in that case is the default, which changed from 2 to -128.
The spaceship-fp-1.c test also no longer tests what libstdc++ uses,
so the following patch uses -128 in all the spots.
2025-09-02 Jakub Jelinek <jakub@redhat.com>
* gcc.target/s390/spaceship-fp-1.c: Expect .SPACESHIP call with
-128 as last argument instead of 2.
(TEST): Use -128 instead of 2.
* gcc.target/s390/spaceship-fp-2.c: Expect .SPACESHIP call with
-128 as last argument instead of 2.
(TEST): Use -128 instead of 2.
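The result encoding the tests now expect can be sketched as a plain C three-way comparison of doubles. The function name is hypothetical and this illustrates the values only, not libstdc++'s or GCC's code: -1, 0 and 1 as usual, and -128 (formerly 2) when the operands are unordered because at least one is a NaN.

```c
#include <math.h>

/* Hypothetical three-way comparison using the new unordered value.  */
int fp_three_way(double a, double b)
{
  if (isnan(a) || isnan(b))
    return -128;            /* unordered (was 2 before r16-3414) */
  if (a < b)
    return -1;
  if (a > b)
    return 1;
  return 0;                 /* equal, including 0.0 vs -0.0 */
}
```

Under -ffast-math the compiler may assume no NaNs, making the first branch unreachable, which mirrors why spaceship-fp-2.c ends up seeing the default value.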
|
|
We have contracts-related declarations and macros split between contracts.h
and cp-tree.h, and contracts.h is then included in the latter, which means
that it is included in all C++ front-end files.
This patch:
- moves all the contracts-related material to contracts.h.
- makes some functions that are only used in contracts.cc static.
- tries to group the external API for contracts into related topics.
- includes contracts.h in the front end sources that need it.
gcc/cp/ChangeLog:
* constexpr.cc: Include contracts.h
* coroutines.cc: Likewise.
* cp-gimplify.cc: Likewise.
* decl.cc: Likewise.
* decl2.cc: Likewise.
* mangle.cc: Likewise.
* module.cc: Likewise.
* pt.cc: Likewise.
* search.cc: Likewise.
* semantics.cc: Likewise.
* contracts.cc (validate_contract_role, setup_default_contract_role,
add_contract_role, get_concrete_axiom_semantic,
get_default_contract_role): Make static.
* cp-tree.h (make_postcondition_variable, grok_contract,
finish_contract_condition, find_contract, set_decl_contracts,
get_contract_semantic, set_contract_semantic): Move to contracts.h.
* contracts.h (get_contract_role, add_contract_role,
validate_contract_role, setup_default_contract_role,
lookup_concrete_semantic, get_default_contract_role): Remove.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
|
|
osthread.d is trying to use PPC_THREAD_STATE32, which is not defined
in thread_act.d (PPC_THREAD_STATE is defined for the 32-bit case). This
leads to a build failure for libdruntime.
libphobos/ChangeLog:
* libdruntime/core/thread/osthread.d: Use PPC_THREAD_STATE
instead of PPC_THREAD_STATE32.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
|
|
This patch updates the testing of RISC-V Zbb 'sext' instruction
generation, adding checks for the generation of 'sext.h' and
'sext.b'.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/zbb-sext.c: New test.
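For reference, C functions of the following shape are the kind of input where RV64 with Zbb can use a single sext.b or sext.h. This is an illustration with hypothetical names, not the contents of zbb-sext.c. Converting an out-of-range value to a signed type is implementation-defined in C; GCC keeps the low bits.

```c
/* Sign-extend the low 8 bits of X: one sext.b with Zbb.  */
long sext_b(long x) { return (signed char) x; }

/* Sign-extend the low 16 bits of X: one sext.h with Zbb.  */
long sext_h(long x) { return (short) x; }
```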
|
|
This patch updates the testing of RISC-V Zba 'shNadd.uw' instruction
generation, adding checks for the generation of 'sh1add.uw' and
'sh3add.uw'.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/zba-shadd.c: New test functions.
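The .uw forms zero-extend the low 32 bits of rs1 before shifting and adding, which is what lets them implement indexing with a 32-bit unsigned index. A C model of that semantics follows; the function names are hypothetical, and load_at only shows the kind of source where sh3add.uw applies, not the contents of zba-shadd.c.

```c
#include <stdint.h>

/* Reference semantics: rd = rs2 + (zero_extend(rs1[31:0]) << N).  */
uint64_t sh1add_uw(uint64_t rs1, uint64_t rs2)
{ return rs2 + ((uint64_t)(uint32_t) rs1 << 1); }
uint64_t sh2add_uw(uint64_t rs1, uint64_t rs2)
{ return rs2 + ((uint64_t)(uint32_t) rs1 << 2); }
uint64_t sh3add_uw(uint64_t rs1, uint64_t rs2)
{ return rs2 + ((uint64_t)(uint32_t) rs1 << 3); }

/* p[i] with 8-byte elements and an unsigned 32-bit index computes
   p + ((uint64_t) i << 3), the sh3add.uw pattern.  */
int64_t load_at(const int64_t *p, uint32_t i) { return p[i]; }
```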
|
|
As preparation for implementing std::constant_wrapper that's part of the
C++26 version of the <type_traits> header, the two classes _Index_tuple
and _Build_index_tuple are moved to <type_traits>. These two helpers are
needed by std::constant_wrapper to initialize the elements of one C
array with another.
Since <bits/utility.h> already includes <type_traits>, this solution
avoids creating a very small header file for just these two internal
classes. This approach doesn't move std::index_sequence and related code
to <type_traits> and therefore doesn't change which headers provide
user-facing features.
libstdc++-v3/ChangeLog:
* include/bits/utility.h (_Index_tuple): Move to <type_traits>.
(_Build_index_tuple): Ditto.
* include/std/type_traits (_Index_tuple): Ditto.
(_Build_index_tuple): Ditto.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
|
|
The new gcc.target/i386/memset-strategy-1[03].c tests FAIL on
Solaris/x86:
FAIL: gcc.target/i386/memset-strategy-10.c check-function-bodies foo
FAIL: gcc.target/i386/memset-strategy-13.c check-function-bodies foo
The issue is the same as several times previously: they need to be
compiled with -fasynchronous-unwind-tables -fdwarf2-cfi-asm, which this
patch does.
Tested on i386-pc-solaris2.11 (as and gas) and x86_64-pc-linux-gnu.
2025-09-01 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
gcc/testsuite:
* gcc.target/i386/memset-strategy-10.c (dg-options): Add
-fasynchronous-unwind-tables -fdwarf2-cfi-asm.
* gcc.target/i386/memset-strategy-13.c: Likewise.
|
|
The following makes vect_analyze_stmt call vectorizable_* with
STMT_VINFO_VECTYPE set to NULL_TREE for all stmts, but restores the value
for a possible later iteration with single-lane SLP. It also clears it
for every stmt during vect_transform_stmt.
* tree-vect-stmts.cc (vect_transform_stmt): Clear
STMT_VINFO_VECTYPE for all stmts.
(vect_analyze_stmt): Likewise. But restore at the end again.
|
|
The reduction guard isn't correct: STMT_VINFO_REDUC_DEF also exists
for nested cycles that are not part of reductions, but there's no
reduction info for them.
PR tree-optimization/121754
* tree-vectorizer.h (vect_reduc_type): Simplify to not ICE
on nested cycles.
* gcc.dg/vect/pr121754.c: New testcase.
* gcc.target/aarch64/vect-pr121754.c: Likewise.
|
|
bump is always specified, so remove the path that touches
STMT_VINFO_VECTYPE.
* tree-vect-data-refs.cc (bump_vector_ptr): Remove the
STMT_VINFO_VECTYPE use, bump is always specified.
|
|
The strided-store path needs the SLP tree's vector type, so
the following patch passes down the vector type to be used to
vect_check_gather_scatter and adjusts all other callers. This
removes one of the last pieces requiring STMT_VINFO_VECTYPE
during SLP stmt analysis.
* tree-vectorizer.h (vect_check_gather_scatter): Add
vectype parameter.
* tree-vect-data-refs.cc (vect_check_gather_scatter): Get
vectype as parameter.
(vect_analyze_data_refs): Adjust.
* tree-vect-patterns.cc (vect_recog_gather_scatter_pattern): Likewise.
* tree-vect-slp.cc (vect_get_and_check_slp_defs): Get vectype
as parameter, pass down.
(vect_build_slp_tree_2): Adjust.
* tree-vect-stmts.cc (vect_mark_stmts_to_be_vectorized): Likewise.
(vect_use_strided_gather_scatters_p): Likewise.
|
|
This slightly improves the readability of the error message by suggesting
that a literal 0 is expected as the argument:
invalid conversion from 'int' to 'std::__cmp_cat::__literal_zero*'
libstdc++-v3/ChangeLog:
* libsupc++/compare (__cmp_cat::__literal_zero): Rename
from __unspec.
(__cmp_cat::__unspec): Rename to __literal_zero.
(operator==, operator<, operator>, operator<=, operator>=):
Replace __cmp_cat::__unspec with __cmp_cat::__literal_zero.
|
|
gcc/ChangeLog:
* doc/extend.texi (Common Variable Attributes): Put counted_by
in alphabetical order.
|
|
As mentioned in the PR, LOCATION_LINE is represented in an int,
and while we have -pedantic diagnostics (and a -pedantic-errors error)
for too-large #line values, we can still overflow into negative line
numbers, including -2 and -1. We could overflow to those even with valid
source that has #line 2147483640 followed by
2G+ lines.
Now, the ICE happens because assign_discriminator{,s} uses a hash_map
with int_hash <int64_t, -1, -2>, so the values -2 and -1 are reserved
for deleted and empty entries. We just need to make sure those are never
used as keys. One possible fix would be just
- discrim_entry &e = map.get_or_insert (LOCATION_LINE (loc), &existed);
+ discrim_entry &e
+ = map.get_or_insert ((unsigned) LOCATION_LINE (loc), &existed);
By adding an unsigned cast while the key is signed 64-bit, it will never
be -1 or -2.
But I think that is wasteful: discrim_entry is a struct with 2 unsigned
non-static data members, so for lines which can only be 0 to 0xffffffff
(sure, with wrap-around), I think just using a hash_map with 96-bit elts
is better than one with 128-bit elts.
So, the following patch just doesn't assign any discriminators for lines
-1U and -2U; I think that is fine, since normal programs never do that.
Another possibility would be to handle lines -1U and -2U as if they were,
say, -3U.
2025-09-02 Jakub Jelinek <jakub@redhat.com>
PR middle-end/121663
* tree-cfg.cc (assign_discriminator): Change map argument type
from hash_map with int_hash <int64_t, -1, -2> to one with
int_hash <unsigned, -1U, -2U>. Cast LOCATION_LINE to unsigned.
Return early for (unsigned) LOCATION_LINE above -3U.
(assign_discriminators): Change map type from hash_map with
int_hash <int64_t, -1, -2> to one with int_hash <unsigned, -1U, -2U>.
* gcc.dg/pr121663.c: New test.
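Both halves of the reasoning can be checked in a few lines of C. This is a sketch with hypothetical names: entry_sketch stands in for discrim_entry (whose real member names are not shown here), and line_gets_discriminator mirrors the early return described in the ChangeLog, under which only lines up to -3U after the cast may become keys. The element sizes are those of common ABIs (x86-64, i386, AArch64).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for discrim_entry: two unsigned members.  */
struct entry_sketch { unsigned a, b; };

/* Hash-table elements with the old and new key types.  */
struct elt_int64_key    { int64_t key;  struct entry_sketch e; };  /* 16 bytes */
struct elt_unsigned_key { unsigned key; struct entry_sketch e; };  /* 12 bytes */

/* With int_hash <unsigned, -1U, -2U>, -1U (empty) and -2U (deleted) are
   reserved, so the wrapped lines -1 and -2 get no discriminator.  */
bool line_gets_discriminator(int line)
{
  return (unsigned) line <= -3U;
}
```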
|
|
The gcc.dg/tree-ssa/cswtch-[67].c tests FAIL on Solaris/SPARC with the
native as:
FAIL: gcc.dg/tree-ssa/cswtch-6.c scan-assembler .rodata.cst16
FAIL: gcc.dg/tree-ssa/cswtch-7.c scan-assembler .rodata.cst32
The issue is the same in both cases: compared to the gas output, the
only difference with the native as is
- .section .rodata.cst32,"aM",@progbits,32
+ .section ".rodata"
It turns out that varasm.cc (mergeable_constant_section) only emits the
former if HAVE_GAS_SHF_MERGE, which is 0 with the native as.
Fixed by xfailing the tests in this case.
Tested on sparc-sun-solaris2.11 with both as and gas.
2025-07-30 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
gcc/testsuite:
* gcc.dg/tree-ssa/cswtch-6.c (dg-final): xfail on
sparc*-*-solaris2* && !gas.
* gcc.dg/tree-ssa/cswtch-7.c: Likewise.
|
|
The print_ext_doc_entry function and associated version_t struct in
gen-riscv-ext-opt.cc were not being used anywhere in the codebase.
Remove them to clean up the code.
gcc/
* config/riscv/gen-riscv-ext-opt.cc (version_t): Remove unused
struct.
(print_ext_doc_entry): Remove unused function.
|
|
This testcase will fail on strict-alignment targets because it
requires a possibly unaligned load. This fixes that.
Note that this testcase still fails on arm (and maybe riscv) targets:
while they have unaligned loads, theirs are slow.
Pushed as obvious after testing on x86_64-linux-gnu to make sure the
test still runs there.
gcc/testsuite/ChangeLog:
* g++.dg/tree-ssa/vector-compare-1.C: Restrict to
non_strict_align targets.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
|
|
|
|
gcc:
* doc/install.texi (Configuration): Fix spelling of "support"
and "floating-point arithmetic".
Signed-off-by: Jonathan Grant <jg@jguk.org>
|
|
The LF_ARRAY CodeView type represents a C- or C++-style array, with a
length known at compile time. We were crashing when using -gcodeview
with Ada (bug #121157), as the DW_AT_upper_bound value is not an
unsigned integer but something more complicated:
0x00000123: DW_TAG_array_type
DW_AT_type (0x0000014d "character")
DW_AT_sibling (0x00000142)
0x0000012c: DW_TAG_subrange_type
DW_AT_type (0x00000142 "integer")
DW_AT_lower_bound (DW_OP_push_object_address, DW_OP_plus_uconst 0x8, DW_OP_deref, DW_OP_deref_size 0x4)
DW_AT_upper_bound (DW_OP_push_object_address, DW_OP_plus_uconst 0x8, DW_OP_deref, DW_OP_plus_uconst 0x4, DW_OP_deref_size 0x4)
It doesn't look like we can represent Ada arrays in CodeView, so return
0 in get_type_num_array_type so that they come through as an unknown
type.
gcc/
* dwarf2codeview.cc (get_type_num_array_type): Don't try to
encode non-C-style arrays.
|
|
rsync is generally the more commonly used tool for syncing data: among
other things, it retains timestamps and can remove orphaned files on
the receiving side.
We just need to exclude some directories and a symlink from being
removed as "orphaned", since they originate elsewhere.
maintainer-scripts:
* update_web_docs_libstdcxx_git: Copy our "inner" documentation
into the web area using rsync instead of cpio and remove orphaned
files.
|
|
The following patch implements the
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3457.htm
paper without the first 3 lines in Recommended practice.
It seems GCC's behavior already matches the expected behavior, except
for diagnosing more than 2147483648 __COUNTER__ expansions, so the
patch adds a diagnostic for that (but no testcase, because
#define A __COUNTER__ __COUNTER__ __COUNTER__ __COUNTER__ __COUNTER__ __COUNTER__ __COUNTER__ __COUNTER__
#define B A A A A A A A A
#define C B B B B B B B B
#define D C C C C C C C C
#define E D D D D D D D D
#define F E E E E E E E E
#define G F F F F F F F F
#define H G G G G G G G G
#define I H H H H H H H H
#define J I I I I I I I I
J J J J
__COUNTER__
just takes too long to preprocess).
Plus I've included all the snippets from the paper into one testcase.
2025-09-01 Jakub Jelinek <jakub@redhat.com>
* macro.cc: Implement C2Y N3457 - The __COUNTER__ predefined macro.
(_cpp_builtin_macro_text): Diagnose if __COUNTER__ reaches
2147483648 value.
* gcc.dg/cpp/c2y-counter-1.c: New test.
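The count behind "just takes too long" is easy to work out: A contains eight __COUNTER__ uses, and each macro from B to J expands to eight copies of the previous one.

```latex
\begin{align*}
|A| &= 8, \qquad |B| = 8\,|A| = 8^{2}, \qquad \dots, \qquad |J| = 8^{10} = 2^{30} = 1073741824,\\
4\,|J| + 1 &= 2^{32} + 1 = 4294967297 \;>\; 2^{31} = 2147483648 .
\end{align*}
```

Since __COUNTER__ starts at 0, the value 2147483648 is produced by expansion number 2^31 + 1, i.e. at the start of the third J, so the diagnostic fires only after more than two billion expansions.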
|
|
The following patch implements
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3577.txt
No big deal on the GCC side: we just won't recognize uimaxabs as a
builtin anymore, and I don't think it's worth preserving
__builtin_uimaxabs; I doubt anything but the GCC testsuite used
it.
But on the glibc side I think it will need to remain exported
for ABI compatibility :(
2025-09-01 Jakub Jelinek <jakub@redhat.com>
* builtins.def: Implement C2Y N3577 - Rename s/uimaxabs/umaxabs/.
(BUILT_IN_UIMAXABS): Rename to ...
(BUILT_IN_UMAXABS): ... this. Change second argument to "umaxabs".
* builtins.cc (fold_builtin_1): Use BUILT_IN_UMAXABS rather than
BUILT_IN_UIMAXABS.
* gcc.c-torture/execute/builtins/lib/abs.c (uimaxabs): Rename to ...
(umaxabs): ... this.
* gcc.c-torture/execute/builtins/uabs-2.c (uimaxabs): Rename to ...
(umaxabs): ... this.
(main_test): Use umaxabs instead of uimaxabs.
* gcc.c-torture/execute/builtins/uabs-3.c (main_test): Use umaxabs
instead of uimaxabs.
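What the renamed function computes can be sketched as follows. The name my_umaxabs is hypothetical, chosen so it cannot clash with a libc declaration, and this is not glibc's implementation: the absolute value of an intmax_t returned as uintmax_t, which stays well defined for INTMAX_MIN, where imaxabs would overflow.

```c
#include <stdint.h>

/* Hypothetical stand-in for C2Y umaxabs (formerly uimaxabs).  */
uintmax_t my_umaxabs(intmax_t x)
{
  /* Convert first, then negate in unsigned arithmetic (modulo 2^N),
     so INTMAX_MIN maps to INTMAX_MAX + 1 without overflow.  */
  return x < 0 ? -(uintmax_t) x : (uintmax_t) x;
}
```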
|
|
PR fortran/121727
gcc/fortran/ChangeLog:
* trans-expr.cc (gfc_const_length_character_type_p): New helper
function.
(conv_dummy_value): Use it to determine if a character actual
argument has a constant length. If a character actual argument is
constant and longer than the dummy, truncate it at compile time.
gcc/testsuite/ChangeLog:
* gfortran.dg/value_10.f90: New test.
|
|
gcc:
* doc/invoke.texi (Optimize Options): Update the perfwiki web
address.
|
|
The use of HOST_SIZE_T_PRINT_HEX needs to be paired with a C-style
cast to (fmt_size_t); otherwise the detection mechanisms in hwint.h
are not sufficient to deal with size_t defined as 'long unsigned int',
which is done on Darwin (and I think on Windows).
This patch just makes that update.
gcc/ChangeLog:
* diagnostics/logging.h (log_param_location_t): Cast
location_t value to fmt_size_t.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
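The pairing rule can be shown with stand-ins. MY_SIZE_T_PRINT_HEX and my_fmt_size_t are hypothetical names, not GCC's actual hwint.h definitions: a printf length modifier matches exactly one type, while size_t may be unsigned int, unsigned long or unsigned long long depending on the host, so the argument is cast to the type the format string was written for.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical format macro and the one type it is valid for.  */
typedef unsigned long long my_fmt_size_t;
#define MY_SIZE_T_PRINT_HEX "%#llx"

/* Format VALUE in hex into BUF; returns snprintf's result.  */
int format_size_hex(char *buf, size_t bufsz, size_t value)
{
  /* The cast keeps the argument type in agreement with "ll",
     whatever size_t is on this host.  */
  return snprintf(buf, bufsz, MY_SIZE_T_PRINT_HEX, (my_fmt_size_t) value);
}
```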
|
|
While the LLVM-based assemblers used by Darwin do
support .cfi_ instructions, their use triggers production of
compact unwind info, which currently does not interwork properly with
GCC's output.
When the system objdump is used in the configure process, this
currently works by good fortune (the objdump does not recognise
the command, so we fail to detect the cfi_advance).
However, if a user has binutils objdump earlier in their PATH, then
we will detect support and try to use .cfi_, which will cause later
and hard-to-diagnose issues.
Until we have this resolved, force cfi instruction use off for
Darwin.
gcc/ChangeLog:
* configure: Regenerate.
* configure.ac: Do not claim cfi instruction support even
if the assembler has it.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
|
|
The problem was caused by an erroneous note about creating a stack frame,
which caused an assertion on the cur_cfa register to fail because it held
a value other than the frame pointer.
This fix generates notes that correctly update cur_cfa.
v2 changes.
Add testcase.
All tests that failed with
"internal compiler error: in dwarf2out_frame_debug_adjust_cfa, at dwarf2cfi.cc"
now pass.
PR target/89828
gcc
* config/rx/rx.cc (add_pop_cfi_notes): Release the frame pointer if it is
used.
(rx_expand_prologue): Redesigned stack pointer and frame pointer update
process.
gcc/testsuite/
* gcc.dg/pr89828.c: New.
|
|
This keeps them from failing during testsuite runs with an overridden arch
or tuning.
gcc/testsuite/ChangeLog:
* gcc.target/i386/shift-gf2p8affine-1.c: Use -march=x86-64
-mtune=generic.
* gcc.target/i386/shift-gf2p8affine-2.c: Ditto.
* gcc.target/i386/shift-gf2p8affine-3.c: Ditto.
* gcc.target/i386/shift-gf2p8affine-5.c: Ditto.
* gcc.target/i386/shift-gf2p8affine-6.c: Ditto.
* gcc.target/i386/shift-gf2p8affine-7.c: Ditto.
|
|
Like we do in other effective-targets, add
"-mcpu=unset -march=armv8-a"
directly when setting et_arm_v8_neon_flags in arm_v8_neon_ok_nocache,
to avoid having to add these two flags in all users of arm_v8_neon_ok.
This avoids duplication and possible typos / oversights.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp
(check_effective_target_arm_v8_neon_ok_nocache): Add "-mcpu=unset
-march=armv8-a" to et_arm_v8_neon_flags.
(add_options_for_vect_early_break): Remove useless "-mcpu=unset
-march=armv8-a".
(add_options_for_arm_v8_neon): Likewise.
|
|
A few arm effective-targets call check_effective_target_arm32 even
though they would force a -march=XXX flag which supports Arm and/or
Thumb-2, thus making the arm32 check useless. This has an impact when
the toolchain is configured with a default -march or -mcpu which
supports Thumb-1 only: in such a case, arm32 is false and we skip many
tests, thus reducing coverage.
This patch removes the call to check_effective_target_arm32 where it
is useless, enabling about 2000 tests.
In addition, add an early exit if the target is not an arm one, thus
saving a few compilation cycles where not needed. In all callers of
arm_neon_ok, remove the now useless "istarget arm*-*-*" check.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp
(check_effective_target_arm_neon_ok_nocache): Remove arm32 check.
Add istarget arm*-*-* check.
(check_effective_target_arm_neon_fp16_ok_nocache): Likewise.
(check_effective_target_arm_neon_softfp_fp16_ok_nocache): Likewise.
(check_effective_target_arm_v8_neon_ok_nocache): Likewise.
(check_effective_target_arm_neonv2_ok_nocache): Likewise.
(check_effective_target_vect_pack_trunc): Remove istarget arm*-*-*
check.
(check_effective_target_vect_unpack): Likewise.
(check_effective_target_vect_condition): Likewise.
(check_effective_target_vect_cond_mixed): Likewise.
(available_vector_sizes): Likewise.
|
|
We currently do not handle promotion/demotion of 'var' when the
left operand of a variable shift is constant. There's no good
reason why, so the following fixes this omission.
PR tree-optimization/121744
* tree-vect-patterns.cc (vect_recog_vector_vector_shift_pattern):
Allow constant left operand.
* gcc.dg/vect/pr121744-1.c: New testcase.
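A loop of roughly the shape the commit describes might look like the following. This is an assumption for illustration, not the contents of pr121744-1.c, and the function name is hypothetical: the left operand of the shift is a constant and the shift amount varies, with byte-sized elements, so C's promotion to int has to be undone to vectorize in narrow lanes.

```c
#include <stdint.h>

/* Hypothetical loop: constant left operand, variable shift amount.
   Assumes every amt[i] < 8 so that the result fits in a byte.  */
void pow2_bytes(uint8_t *restrict out, const uint8_t *restrict amt, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = (uint8_t) (1 << amt[i]);   /* 1 and amt[i] are promoted to int */
}
```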
|
|
The following uses SLP_TREE_REDUC_IDX where it looks more appropriate.
* tree-vect-loop.cc (vect_create_epilog_for_reduction):
Use SLP_TREE_REDUC_IDX for following the SLP graph and
for identifying whether we use the 'else' in a COND.
(vectorizable_lane_reducing): Simplify check of whether
we are in a reduction.
(vectorizable_reduction): Add sanity checking around
SLP_TREE_REDUC_IDX and use it where it looks appropriate.
(vect_transform_reduction): Use SLP_TREE_REDUC_IDX.
* tree-vect-stmts.cc (vectorizable_call): Likewise.
(vectorizable_operation): Likewise.
(vectorizable_condition): Likewise.
|
|
The following removes no longer needed extra sets of STMT_VINFO_REDUC_DEF
and replaces a single remaining one with a more appropriate check.
* tree-vect-loop.cc (vectorizable_live_operation): Check
vect_is_reduction on the SLP node rather than
STMT_VINFO_REDUC_DEF on the stmt.
(vectorizable_reduction): Do not set STMT_VINFO_REDUC_DEF
on live stmts.
|
|
While we already have the accessor info_for_reduction, its result
is a plain stmt_vec_info. The following turns that into a class
for the purpose of changing accesses to reduction info to a new
set of accessors prefixed with VECT_REDUC_INFO and removes
the corresponding STMT_VINFO prefixed accessors where possible.
There are a few reduction-related things that are used by scalar
cycle detection and thus have to stay as-is for now, and as
copies in the future.
This also separates reduction info into one object per reduction
and associates it with SLP nodes, splitting it out from
stmt_vec_info, retaining (and duplicating) parts used by scalar
cycle analysis. The data is then associated with SLP nodes
forming reduction cycles and accessible via info_for_reduction.
The data is created at SLP discovery time as we look at it even
pre-vectorizable_reduction analysis, but most of the data is
only populated by the latter. There is no reduction info with
nested cycles that are not part of an outer reduction.
In the process this adds cycle info to each SLP tree, notably
the reduc-idx and a way to identify the reduction info.
* tree-vectorizer.h (vect_reduc_info): New.
(create_info_for_reduction): Likewise.
(VECT_REDUC_INFO_TYPE): Likewise.
(VECT_REDUC_INFO_CODE): Likewise.
(VECT_REDUC_INFO_FN): Likewise.
(VECT_REDUC_INFO_SCALAR_RESULTS): Likewise.
(VECT_REDUC_INFO_INITIAL_VALUES): Likewise.
(VECT_REDUC_INFO_REUSED_ACCUMULATOR): Likewise.
(VECT_REDUC_INFO_INDUC_COND_INITIAL_VAL): Likewise.
(VECT_REDUC_INFO_EPILOGUE_ADJUSTMENT): Likewise.
(VECT_REDUC_INFO_FORCE_SINGLE_CYCLE): Likewise.
(VECT_REDUC_INFO_RESULT_POS): Likewise.
(VECT_REDUC_INFO_VECTYPE): Likewise.
(STMT_VINFO_VEC_INDUC_COND_INITIAL_VAL): Remove.
(STMT_VINFO_REDUC_EPILOGUE_ADJUSTMENT): Likewise.
(STMT_VINFO_FORCE_SINGLE_CYCLE): Likewise.
(STMT_VINFO_REDUC_FN): Likewise.
(STMT_VINFO_REDUC_VECTYPE): Likewise.
(vect_reusable_accumulator::reduc_info): Adjust.
(vect_reduc_type): Adjust.
(_slp_tree::cycle_info): New member.
(SLP_TREE_REDUC_IDX): Likewise.
(vect_reduc_info_s): Move/copy data from ...
(_stmt_vec_info): ... here.
(_loop_vec_info::redcu_infos): New member.
(info_for_reduction): Adjust to take SLP node.
(vect_reduc_type): Adjust.
(vect_is_reduction): Add overload for SLP node.
* tree-vectorizer.cc (vec_info::new_stmt_vec_info):
Do not initialize removed members.
(vec_info::free_stmt_vec_info): Do not release them.
* tree-vect-stmts.cc (vectorizable_condition): Adjust.
* tree-vect-slp.cc (_slp_tree::_slp_tree): Initialize
cycle info.
(vect_build_slp_tree_2): Compute SLP reduc_idx and store
it. Create, populate and propagate reduction info.
(vect_print_slp_tree): Print cycle info.
(vect_analyze_slp_reduc_chain): Set cycle info on the
manual added conversion node.
(vect_optimize_slp_pass::start_choosing_layouts): Adjust.
* tree-vect-loop.cc (_loop_vec_info::~_loop_vec_info):
Release reduction infos.
(info_for_reduction): Get the reduction info from
the vector in the loop_vinfo.
(vect_create_epilog_for_reduction): Adjust.
(vectorizable_reduction): Likewise.
(vect_transform_reduction): Likewise.
(vect_transform_cycle_phi): Likewise, deal with nested
cycles not part of a double reduction have no reduction info.
* config/aarch64/aarch64.cc (aarch64_force_single_cycle):
Use VECT_REDUC_INFO_FORCE_SINGLE_CYCLE, get SLP node and use
that.
(aarch64_vector_costs::count_ops): Adjust.
|
|
Add two Newlib commits to the recommended Newlib version,
fixing two other SIMD issues.
Cf. PR target/121392 and Newlib Bug 33272.
gcc/ChangeLog:
PR target/121392
* doc/install.texi (amdgcn): Mention Newlib commit
that fixes another SIMD issue.
|
|
The following simplifies the flow of IV analysis a bit.
* tree-vect-loop.cc (vect_is_simple_iv_evolution): Get
stmt_info and store into STMT_VINFO_LOOP_PHI_EVOLUTION_BASE_UNCHANGED
and STMT_VINFO_LOOP_PHI_EVOLUTION_PART here. Drop unused
output parameters.
(vect_is_nonlinear_iv_evolution): Likewise.
(vect_analyze_scalar_cycles_1): Remove redundant setting
of STMT_VINFO_LOOP_PHI_EVOLUTION_BASE_UNCHANGED and
STMT_VINFO_LOOP_PHI_EVOLUTION_PART.
|
|
The original intention of this code was to allow more allocnos
to share the same register, but this led to expensive allocno
overflows. I extracted a small case (still somewhat large; see
PR117838 in Bugzilla for details) from 548.exchange2_r to analyze
this register allocation issue.
Before improve_allocation function:
a537 (cost 1896, reg42)
a20 (cost 270, reg1)
a13 (cost 144, spill)
a551 (cost 70, reg40)
a5 (cost 43, spill)
a493 (cost 30, reg42)
a499 (cost 12, reg40)
------------------------------
Dump info in improve_allocation function:
Base:
Spilling a493r125 for a5r113
Spilling a573r202 for a5r113
Spilling a499r248 for a13r106
Spilling a551r120 for a13r106
Spilling a20r237 for a551r120
With patch:
Spilling a499r248 for a13r106
Spilling a551r120 for a13r106
Spilling a493r125 for a551r120
------------------------------
After assign_hard_reg (at the end of improve_allocation):
Base:
a537 (cost 1896, reg1)
a20 (cost 270, spill) -----> This is unreasonable
a13 (cost 144, reg40)
a551 (cost 70, reg1)
a5 (cost 43, reg42)
a493 (cost 30, spill)
a499 (cost 12, reg1)
With patch:
a537 (cost 1896, reg42)
a20 (cost 270, reg1)
a13 (cost 144, reg40)
a551 (cost 70, reg42)
a5 (cost 43, spill)
a493 (cost 30, spill)
a499 (cost 12, reg42)
-----------------------------
Collected spec2017 performance on Znver3/Graviton4/EMR/SRF for O2 and Ofast.
No performance regression was observed.
For multi-copy O2:
SRF: 548.exchange2_r increased by 7.5%, 500.perlbench_r increased by 2.0%.
EMR: 548.exchange2_r increased by 4.5%, 500.perlbench_r increased by 1.7%.
Graviton4: 548.exchange2_r increased by 2.2%, 511.povray_r increased by 2.8%.
Znver3: 500.perlbench_r increased by 2.0%.
gcc/ChangeLog:
PR rtl-optimization/117838
* ira-color.cc (improve_allocation): Remove soft conflict related code.
|