Age | Commit message | Author | Files | Lines |
|
For x86, the option is -momit-leaf-frame-pointer, not -fomit-leaf-frame-pointer.
gcc/ChangeLog:
* doc/invoke.texi (x86 Options): Fix '-momit-leaf-frame-pointer' typo.
|
|
As simplify_builtin_call adds more and more optimization, it is
getting bigger and bigger and easier to misunderstand, so this
factors out the memcpy followed by memset optimization (which
was the original optimization added).
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-forwprop.cc (simplify_builtin_call): Factor out
the memcpy followed by a memset optimization to ...
(simplify_builtin_memcpy_memset): Here. New function.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
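For readers unfamiliar with the memcpy-followed-by-memset optimization, the sketch below shows the kind of source pattern it targets, assuming the usual case of a constant string copied into a buffer and then padded in the adjacent bytes. The function names are hypothetical, and the exact conditions under which forwprop merges the two calls are those in tree-ssa-forwprop.cc, not this example.

```cpp
#include <cstring>
#include <string>

// The pattern as a programmer might write it: copy a constant string,
// then pad the bytes immediately after it with memset.
std::string copy_then_pad() {
    char buf[7];
    std::memcpy(buf, "abcd", 4);
    std::memset(buf + 4, ' ', 3);
    return std::string(buf, sizeof buf);
}

// The equivalent single call: one memcpy from a longer constant string.
std::string merged_copy() {
    char buf[7];
    std::memcpy(buf, "abcd   ", 7);
    return std::string(buf, sizeof buf);
}
```

Both functions fill the buffer with the same seven bytes, which is what makes replacing the pair of calls with one safe.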
|
|
As more optimizations are added to forwprop's simplify_builtin_call,
this function is becoming harder and harder to understand. To help
simplify things, this factors out the memchr optimization to its own
function like what was done when memcmp optimization was added.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-forwprop.cc (simplify_builtin_call): Factor out the memchr
optimization to ...
(simplify_builtin_memchr): Here. New function.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
|
|
The build is broken on macOS since r16-3581-g1da3c4d90e678a because
ipa-inline-transform.cc uses std::max but does not include <algorithm>.
This patch fixes it by defining INCLUDE_ALGORITHM in that file.
gcc/ChangeLog:
* ipa-inline-transform.cc: Define INCLUDE_ALGORITHM.
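As a standalone illustration of the underlying portability issue: std::max is declared in <algorithm>, and code that uses it without that header only compiles where some other header happens to pull it in. The sketch below includes the header directly; the function name is hypothetical, and within GCC itself the header is requested through INCLUDE_ALGORITHM as described above.

```cpp
#include <algorithm>  // declares std::max; do not rely on it arriving transitively

// Hypothetical helper that depends on std::max.
int larger_count(int a, int b) {
    return std::max(a, b);
}
```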
|
|
gcc:
PR target/69374
* doc/install.texi (Prerequisites): Properly capitalize
GNU Binutils.
(Configuration): Ditto.
(Building): Ditto.
(Specific): Ditto.
|
|
Unchanged instances are deliberate.
gcc/ChangeLog:
* doc/invoke.texi: Say 'whole-program' consistently where
appropriate.
|
|
gcc/ChangeLog:
* doc/invoke.texi: Capitalize 'GNU Binutils' consistently.
|
|
GNU Binutils now supports linking LTO and non-LTO objects into a single
mixed object file as of 2.44. Update the text to reflect this and fix
some minor grammar issues while at it.
gcc/ChangeLog:
PR ipa/116410
* doc/invoke.texi (Link Options): Update -flinker-output= text
to reflect GNU Binutils changes. Fix grammar.
|
|
This extension defines vector load instructions to move sign-extended or
zero-extended INT4 data into 8-bit vector register elements.
gcc/ChangeLog:
* config/riscv/andes-vector-builtins-bases.cc
(nds_nibbleload): New class.
* config/riscv/andes-vector-builtins-bases.h (nds_vln8): New def.
(nds_vlnu8): Ditto.
* config/riscv/andes-vector-builtins-functions.def (nds_vln8): Ditto.
(nds_vlnu8): Ditto.
* config/riscv/andes-vector.md (@pred_intload_mov<su><mode>): New pattern.
* config/riscv/riscv-vector-builtins-types.def (DEF_RVV_Q_OPS): New def.
(DEF_RVV_QU_OPS): Ditto.
* config/riscv/riscv-vector-builtins.cc
(q_v_void_const_ptr_ops): New operand information.
(qu_v_void_const_ptr_ops): Ditto.
* config/riscv/riscv-vector-builtins.def (void_const_ptr): New def.
* config/riscv/riscv-vector-builtins.h (enum required_ext): Ditto.
(required_ext_to_isa_name): Add case XANDESVSINTLOAD_EXT.
(required_extensions_specified): Ditto.
* config/riscv/vector-iterators.md (NDS_QVI): New iterator.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vln8.c: New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vln8.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vln8.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vln8.c: New test.
|
|
This patch adds support for the XAndesvbfhcvt ISA extension.
This extension defines instructions to perform vector floating-point
conversion between the BFLOAT16 floating-point data and the IEEE-754 32-bit
single-precision floating-point (SP) data in a vector register.
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc:
Turn on VECTOR_ELEN_BF_16 for XAndesvbfhcvt.
* config.gcc: Add extra_objs andes-vector-builtins-bases.o
and extra_headers andes_vector.h.
* config/riscv/riscv-vector-builtins-shapes.cc
(BASE_NAME_MAX_LEN): Increase size to 20.
* config/riscv/riscv-vector-builtins.cc
(f32_to_bf16_nf_w_ops): New operand information.
(f32_to_bf16_nf_w_ops): New operand information.
(DEF_RVV_FUNCTION): New def.
* config/riscv/riscv-vector-builtins.def (bf16): Ditto.
* config/riscv/riscv-vector-builtins.h (enum required_ext): Ditto.
(required_ext_to_isa_name): Add case XANDESVBFHCVT_EXT.
(required_extensions_specified): Ditto.
* config/riscv/t-riscv: Add andes-vector-builtins-functions.def,
andes-vector-builtins-bases.h and andes-vector-builtins-bases.o.
* config/riscv/vector-iterators.md (NDS_VWEXTBF): New iterator.
(NDS_V_DOUBLE_TRUNC_BF): New attr.
* config/riscv/andes-vector-builtins-bases.cc: New file.
* config/riscv/andes-vector-builtins-bases.h: New file.
* config/riscv/andes-vector-builtins-functions.def: New file.
* config/riscv/andes_vector.h: New file.
* config/riscv/andes-vector.md: New file.
* config/riscv/vector.md: Include andes_vector.md.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/rvv.exp: Add regression for xandesvector.
* gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vfncvtbf16s.c: New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vfwcvtsbf16.c: New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vfncvtbf16s.c: New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vfwcvtsbf16.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vfncvtbf16s.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vfwcvtsbf16.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vfncvtbf16s.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vfwcvtsbf16.c: New test.
|
|
Add pipeline description for the Tenstorrent Ascalon 8 wide CPU.
gcc/ChangeLog:
* config/riscv/riscv-cores.def (RISCV_TUNE): Update.
* config/riscv/riscv-opts.h (enum riscv_microarchitecture_type):
Add tt_ascalon_d8.
* config/riscv/riscv.md: Update tune attribute and include
tt-ascalon-d8.md.
* config/riscv/tt-ascalon-d8.md: New file.
|
|
2025-09-06 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/84119
* resolve.cc (reset_array_ref_to_scalar): New function using
chunk broken out from gfc_resolve_ref.
(gfc_resolve_ref): Call the new function, the first time for
PDT type parameters and the second time for LEN inquiry refs.
gcc/testsuite/
PR fortran/84119
* gfortran.dg/pdt_20.f03: Modify to deal with scalar type parm.
|
|
This improves the locations for the phi args and the newly created statement.
Since this is a factorization/commonizing in one case the location for the new
statement will not always be set correctly either way.
The new locations on the new phi will either be the old location of the argument
to the phi or the location of the defining statement (if it exists).
The new statement will be either the location of the phi or the location
of the defining statements if the location of the phi is unknown.
This fixes the location of an uninitialized variable warning too.
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/108466
gcc/ChangeLog:
* tree-ssa-phiopt.cc (factor_out_conditional_operation): Give better
locations to the new phi args and the new statement.
gcc/testsuite/ChangeLog:
* gcc.dg/uninit-pr108466-1.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
|
|
The recently added zbb-sext test includes stdint.h and explicitly asks for the
lp64 abi (not lp64d!).
This will fail on a native riscv system as the system headers don't support
lp64 -- they assume "d" is included.
It looks like most tests are including stdint-gcc instead of stdint. Not a fan
of that, but it seems to be how we've been handling this kind of issue to-date.
gcc/testsuite
* gcc.target/riscv/zbb-sext.c: Include stdint-gcc.h instead of
stdint.h.
|
|
Currently we represent exported using-directives as a list of indices
into the namespace array that we stream. However this list of
namespaces doesn't include any namespaces that we don't expose in this
module's purview, and so we ICE.
This patch reworks the handling to instead use the existing depset
tracking for namespaces directly. This means that we don't need to
build up a second lookup map when streaming, and we can reuse the logic
in {read,write}_namespace. We do need to make sure that we create a
depset for namespaces only referenced by a using-directive, though.
I don't expect to be exporting large numbers of using-directives from a
namespace, so for simplicity we stream the names as {parent, target}
pairs.
This also adjusts read handling so that we load the using-directives for
any import (including indirect) if it's in the import list for the
current TU. Otherwise we run into issues if the using-directive is in
a namespace that is otherwise never referenced in the 'export import'ing
module, because we never walk this namespace and so never know that we
need to emit it. To do this the patch ensures that we calculate the
import list before read_language is called.
As a drive-by fix, I noticed that with modules 'add_using_namespace'
will add duplicate using-directives because we compare usings against
the target namespace, but we then push a wrapping USING_DECL instead.
This reworks so that the contents of the structure is equivalent between
modules and non-modules code.
PR c++/121702
gcc/cp/ChangeLog:
* module.cc (enum module_state_counts): New counter.
(depset::hash::add_namespace_entities): Seed using-directive
targets for later streaming.
(module_state::write_namespaces): Don't handle using-directives
here.
(module_state::read_namespaces): Likewise.
(module_state::write_using_directives): New function.
(module_state::read_using_directives): New function.
(module_state::write_counts): Log using-directives.
(module_state::read_counts): Likewise.
(module_state::write_begin): Stream using-directives.
(module_state::read_language): Read using-directives if
directly importing.
(module_state::direct_import): Update current TU import list
before calling read_language.
* name-lookup.cc (add_using_namespace): Fix lookup of previous
using-directives.
* parser.cc (cp_parser_import_declaration): Don't set
MK_EXPORTING when performing import_module.
gcc/testsuite/ChangeLog:
* g++.dg/modules/namespace-10_c.C: Add check for log dump.
* g++.dg/modules/namespace-13_a.C: New test.
* g++.dg/modules/namespace-13_b.C: New test.
* g++.dg/modules/namespace-13_c.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
|
|
[basic.lookup.argdep] p4 says that ADL also finds declarations of
functions or function templates from a point of lookup within the
module, only ignoring discarded (or internal) GM entities.
To implement this we need to create bindings for these entities so that
we can guarantee that name lookup will discover they exist. This raises
some complications, though, as we ideally would like to avoid having
bindings that contain no declarations, or emitting GM namespaces that
only contain discarded or internal functions.
This patch does this by additionally creating a new binding whenever we
call make_dependency on a non-EK_FOR_BINDING decl. We don't do this for
using-decls, as at the point of use of a GM entity we no longer know
whether we called through a using-decl or the declaration directly;
however, this behaviour is explicitly supported by [module.global.frag]
p3.6.
Creating these bindings caused g++.dg/modules/default-arg-4_* to fail.
It turns out that this makes the behaviour look identical to
g++.dg/modules/default-arg-5, which is incorrectly dg-error-ing default
value redeclarations (we only currently error because of PR c++/99000).
This patch removes the otherwise identical test and turns the dg-errors
into xfailed dg-bogus.
As a drive-by fix this also fixes an ICE when debug printing friend
function instantiations.
PR c++/121705
PR c++/117658
gcc/cp/ChangeLog:
* module.cc (depset::hash::make_dependency): Make bindings for
GM functions.
(depset::hash::add_binding_entity): Adjust comment.
(depset::hash::add_deduction_guides): Add log.
* ptree.cc (cxx_print_xnode): Handle friend functions where
TI_TEMPLATE is an OVERLOAD or IDENTIFIER.
gcc/testsuite/ChangeLog:
* g++.dg/modules/default-arg-4_a.C: XFAIL bogus errors.
* g++.dg/modules/default-arg-4_b.C: Likewise.
* g++.dg/modules/default-arg-5_a.C: Remove duplicate test.
* g++.dg/modules/default-arg-5_b.C: Likewise.
* g++.dg/modules/adl-9_a.C: New test.
* g++.dg/modules/adl-9_b.C: New test.
* g++.dg/modules/gmf-5.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
|
|
|
|
gcc/testsuite/ChangeLog:
PR rtl-optimization/121757
* g++.dg/pr121757.C: Add dg-require-effective-target for lto.
|
|
gcc/ChangeLog:
PR middle-end/121806
* gcc.cc (for_each_path): Initialize return value.
|
|
For Zvfhmin a vector mode exists but the corresponding vec_extract does
not. This patch checks that a vec_extract is available and otherwise
falls back to standard handling.
PR target/121510
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_legitimize_move): Check if we can
vec_extract.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr121510.c: New test.
|
|
I've noticed a lot of diagnostic messages in the C FE aren't marked
for translation.
The reason is some weird coding style which wraps the string
literals into (), especially when they don't fit on a single line.
With that fixed, there were 83 unique similar messages
"both %<something%> and %<something%> in declaration specifiers"
marked for translation, which is very unfriendly to translators,
the patch brings that down to 4 (if it was ok to change order,
it could be even 3):
msgid "both %qs and %qs in declaration specifiers"
msgid "both %qs and %<__int%d%> in declaration specifiers"
msgid "both %qs and %<_Float%d%s%> in declaration specifiers"
msgid "both %<__int%d%> and %qs in declaration specifiers"
2025-09-05 Jakub Jelinek <jakub@redhat.com>
* c-decl.cc (pushtag): Remove ()s around string literal
in call to diagnostic function.
(diagnose_mismatched_decls): Likewise.
(c_check_switch_jump_warnings): Likewise.
(grokdeclarator): Likewise.
(warn_cxx_compat_finish_struct): Likewise.
(build_enumerator): Formatting fix.
(declspecs_add_type): Remove ()s around string literal
in call to diagnostic function, simplify
"both %<something%> and %<something%>" starting format
strings to "both %qs and %qs" with appropriate arguments.
Formatting fixes.
* c-typeck.cc (build_external_ref): Remove ()s around string
literal in call to diagnostic function.
(build_conditional_expr): Likewise.
* c-parser.cc (c_parser_transaction): Use G_() around string
literals. Formatting fix.
(c_parser_transaction_expression): Likewise.
|
|
In order to reduce time complexity, rtl-ssa groups consecutive
clobbers together. Each group of clobbers has a splay tree for
lookup and manipulation purposes.
This arrangement means that we might need to split a group (when
inserting a new non-clobber definition between two clobbers) or
to join consecutive groups together (when deleting an intervening
non-clobber definition). To reduce the time complexity of these updates,
the back pointer from a clobber to its group is only updated lazily.
The invariant is supposed to be that the first clobber, last clobber,
and splay tree root have the right group at all times, whereas other
members of the group can have identifiably stale group pointers.
However, a lack of abstraction meant that only some splay tree lookups
correctly maintained this invariant. Others did not update the group
pointer after installing a new root.
This patch adds a helper that maintains the invariant and uses it in
three places, one that was already correct and two that were wrong.
The original lookup_clobber is still used in other code that
manipulates groups as a whole.
gcc/
PR rtl-optimization/121757
* rtl-ssa/accesses.h (clobber_group::lookup_clobber): New member
function.
* rtl-ssa/accesses.cc (clobber_group::lookup_clobber): Likewise.
(clobber_group::prev_clobber, clobber_group::next_clobber)
(function_info::add_clobber): Use it.
gcc/testsuite/
PR rtl-optimization/121757
* g++.dg/pr121757.C: New test.
|
|
COBOL Special Registers (e.g., RETURN-CODE; DEBUG-ITEM) are implemented as
global variables. These changes define them with the prefix "__ggsr__" in
their variable names so that the GDB-COBOL debugger can identify them.
The creation and handling of such variables has been streamlined with the
introduction of the "register_e" cbl_field_t::attr bit.
gcc/cobol/ChangeLog:
* genapi.cc (trace1_init): Prepend two internal variables with
underscore.
(initialize_variable_internal): Use new register_e attribute.
(psa_global): Use "__ggsr__" prefix to identify special registers
(parser_symbol_add): Use new register_e attribute.
* symbols.cc (cbl_field_attr_str): Likewise.
(symbol_table_init): Likewise.
(is_register_field): Eliminated in favor of (attr & register_e).
* symbols.h (is_register_field): Likewise.
libgcobol/ChangeLog:
* common-defs.h (enum cbl_field_attr_t): Define register_e.
* constants.cc (struct cblc_field_t): Define special registers with
"__ggsr__" prefix.
|
|
This test case fails on platforms where int is narrower than 32 bits.
This patch undoes the macro expansion from stdint.h.
gcc/testsuite/
PR testsuite/121695
PR testsuite/52641
* gcc.dg/torture/pr121695-1.c: int -> int32_t etc.
|
|
There are some cases where involving zero_reg is not needed and
where there are other sequences with the same efficiency.
An example is to use SBCI R,0 instead of SBC R,__zero_reg__
when R >= R16. This may turn out to be better for small ISRs.
PR target/121794
gcc/
* config/avr/avr.cc (avr_out_compare): Only use zero_reg
when there is no other sequence of the same length.
(avr_out_plus_ext): Same.
(avr_out_plus_1): Same.
|
|
This avoids confusing the backends.
* tree-vect-slp.cc (vectorizable_bb_reduc_epilogue): Do not
cost zero remaining scalar stmts.
(vectorizable_slp_permutation): Do not cost zero actual
permutations.
* tree-vect-stmts.cc (vectorizable_load): Likewise.
|
|
The following avoids looking at STMT_VINFO_VECTYPE in
vect_setup_realignment and instead passes down the relevant vector
type.
PR tree-optimization/121802
* tree-vectorizer.h (vect_setup_realignment): Add vectype
argument.
* tree-vect-data-refs.cc (vect_setup_realignment): Replace
local vectype with argument.
* tree-vect-stmts.cc (vectorizable_load): Adjust.
|
|
Marek Polacek reported to me internally that I've messed up one diagnostic
message in this function, with one word before final double quote on one
line and another word right after opening double quote on the next line,
with no space in between.
Fixed thusly.
2025-09-05 Jakub Jelinek <jakub@redhat.com>
* constexpr.cc (cxx_eval_cxa_builtin_fn): Add missing word separating
space into invalid_nargs diagnostics.
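The bug class is easy to reproduce in isolation: adjacent string literals are concatenated with nothing inserted between them, so a message split across lines needs an explicit space on one side of the break. The message text and function names below are hypothetical, not the actual diagnostic from constexpr.cc.

```cpp
#include <string>

// Missing separator: the two literals are glued together as "ofarguments".
std::string broken_message() {
    return "wrong number of"
           "arguments";
}

// Fixed: a trailing space before the closing quote of the first literal.
std::string fixed_message() {
    return "wrong number of "
           "arguments";
}
```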
|
|
This test was written without _BitInt support on any target with
fixed-point support as well, so was actually never tested.
Now that it can be tested on loongarch64-linux, there is a missing
expected error, so this patch adds it.
2025-09-05 Jakub Jelinek <jakub@redhat.com>
* gcc.dg/fixed-point/bitint-1.c: Expect also error about _Sat used
without _Fract/_Accum.
|
|
2025-09-05 Jakub Jelinek <jakub@redhat.com>
* J: Remove.
|
|
On Tue, Jul 01, 2025 at 02:50:40PM -0500, Segher Boessenkool wrote:
> No tests become good tests without effort. And tests that are not good
> tests require constant maintenance!
Here are two patches, either just the first one or both can be used
and both were tested on powerpc64le-linux.
The second one adds 8 further tests, which are dg-do run, #include
the former tests, don't do any dump tests and just define the checking/main
for those.
2025-09-05 Jakub Jelinek <jakub@redhat.com>
PR testsuite/118567
* gcc.target/powerpc/vsx-vectorize-9.c: New test.
* gcc.target/powerpc/vsx-vectorize-10.c: New test.
* gcc.target/powerpc/vsx-vectorize-11.c: New test.
* gcc.target/powerpc/vsx-vectorize-12.c: New test.
* gcc.target/powerpc/vsx-vectorize-13.c: New test.
* gcc.target/powerpc/vsx-vectorize-14.c: New test.
* gcc.target/powerpc/vsx-vectorize-15.c: New test.
* gcc.target/powerpc/vsx-vectorize-16.c: New test.
|
|
On Tue, Jul 01, 2025 at 02:50:40PM -0500, Segher Boessenkool wrote:
> No tests become good tests without effort. And tests that are not good
> tests require constant maintenance!
Here are two patches, either just the first one or both can be used
and both were tested on powerpc64le-linux.
The first one removes all the checking etc. stuff from the testcases,
as they are just dg-do compile, for the vectorize dump checks all we
care about are the vectorized loops they want to test.
2025-09-05 Jakub Jelinek <jakub@redhat.com>
PR testsuite/118567
* gcc.target/powerpc/vsx-vectorize-1.c: Remove includes, checking
part of main1 and main.
* gcc.target/powerpc/vsx-vectorize-2.c: Remove includes, replace
bar definition with declaration, remove main.
* gcc.target/powerpc/vsx-vectorize-3.c: Likewise.
* gcc.target/powerpc/vsx-vectorize-4.c: Likewise.
* gcc.target/powerpc/vsx-vectorize-5.c: Likewise.
* gcc.target/powerpc/vsx-vectorize-6.c: Likewise.
* gcc.target/powerpc/vsx-vectorize-7.c: Likewise.
* gcc.target/powerpc/vsx-vectorize-8.c: Likewise.
|
|
Unlike Advanced SIMD, SVE has instructions to perform smin, smax, umin, umax
on 64-bit elements. Thus, we can use them with the fixed-width V2DImode
expander. Most of the machinery is already there on the define_insn side,
supporting V2DImode operands of the SVE pattern. We just need to wire up
the RTL emission to the v2di standard names for the TARGET_SVE case.
So for the smin case we now generate:
min_di:
ldr q30, [x0]
ptrue p7.b, all
ldr q31, [x1]
smin z30.d, p7/m, z30.d, z31.d
str q30, [x2]
ret
min_imm_di:
ldr q31, [x0]
smin z31.d, z31.d, #5
str q31, [x2]
ret
instead of the previous:
min_di:
ldr q30, [x0]
ldr q31, [x1]
cmgt v29.2d, v30.2d, v31.2d
bsl v29.16b, v31.16b, v30.16b
str q29, [x2]
ret
min_imm_di:
ldr q31, [x0]
mov z30.d, #5
cmgt v29.2d, v30.2d, v31.2d
bsl v29.16b, v31.16b, v30.16b
str q29, [x2]
ret
The register operand case is the same length, though the new ptrue can now be
shared and moved away. But the immediate operand case is obviously better
as the SVE immediate form doesn't require a predicate operand.
Bootstrapped and tested on aarch64-none-linux-gnu.
Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
gcc/
* config/aarch64/iterators.md (sve_di_suf): New mode attribute.
* config/aarch64/aarch64-sve.md (<optab><mode>3 SVE_INT_BINARY_MULTI):
Rename to...
(<optab><mode>3<sve_di_suf>): ... This. Use SVE_I_SIMD_DI mode
iterator.
* config/aarch64/aarch64-simd.md (<su><maxmin>v2di3): Use the above
for TARGET_SVE.
gcc/testsuite/
* gcc.target/aarch64/sve/usminmax_di.c: New test.
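For context, here is a sketch of source that could give rise to the two functions shown in the assembly above. The names min_di and min_imm_di come from the assembly labels; the signatures and loop form are assumptions, not a copy of the new test file.

```cpp
#include <cstdint>

// Element-wise signed minimum of two vectors of two 64-bit elements.
void min_di(const std::int64_t *a, const std::int64_t *b, std::int64_t *out) {
    for (int i = 0; i < 2; ++i)
        out[i] = a[i] < b[i] ? a[i] : b[i];
}

// Element-wise signed minimum against the immediate 5.
void min_imm_di(const std::int64_t *a, std::int64_t *out) {
    for (int i = 0; i < 2; ++i)
        out[i] = a[i] < 5 ? a[i] : 5;
}
```

Whether a given compiler vectorizes these loops into the instructions quoted above depends on the target options and optimization level.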
|
|
2025-09-04 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/84432
PR fortran/114815
* expr.cc (gfc_check_assign_symbol): Check that components in a
PDT with a default initializer have type and length parameters
that reduce to constant integer expressions.
* trans-expr.cc (gfc_trans_assignment_1): Parameterized
components cannot have default initializers so they must be
allocated after initialization.
gcc/testsuite/
PR fortran/84432
PR fortran/114815
* gfortran.dg/pdt_26.f03: Update with default no initializer.
* gfortran.dg/pdt_27.f03: Change to test non-conforming
initializers.
|
|
2025-09-05 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/83762
PR fortran/102457
* decl.cc (gfc_get_pdt_instance): Check that variable PDT parm
expressions are of type integer. Note that the symbol must be
tested since the expression often appears as BT_PROCEDURE.
gcc/testsuite/
PR fortran/83762
PR fortran/102457
* gfortran.dg/pdt_44.f03: New test.
* gfortran.dg/pr95090.f90: Give the PDT parameter a value to
suppress the type error.
|
|
|
|
cost 0, 1 and 15
Add an asm dump check and a run test for combining vec_duplicate +
vmadd.vvm into vmadd.vx, with GR2VR costs of 0, 2 and 15.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check
for vmadd.vx.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_ternary.h: Add test
helper macros.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_ternary_data.h: Add test
data for run test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmadd-run-1-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmadd-run-1-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmadd-run-1-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmadd-run-1-u8.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
0, 1 and 15
Add an asm dump check and a run test for combining vec_duplicate +
vmadd.vv into vmadd.vx, with GR2VR costs of 0, 2 and 15.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check
for vmadd.vx.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_ternary.h: Add test
helper macros.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_ternary_data.h: Add test
data for run test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmadd-run-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmadd-run-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmadd-run-1-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmadd-run-1-i8.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
To avoid generating the vmadd.vx code.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx_ternary.h: Adjust the
vmacc.vx to avoid generating vmadd.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch combines vec_duplicate + vmadd.vv into vmadd.vx, as in the
example code below. The related pattern depends on the cost of
vec_duplicate from GR2VR. Late-combine will perform the combination
if the cost of GR2VR is zero, and reject it if the GR2VR cost is
greater than zero.
Assume we have example code like below, where the GR2VR cost is 0.
Before this patch:
beq a3,zero,.L8
vsetvli a5,zero,e32,m1,ta,ma
vmv.v.x v2,a2
...
.L3:
vsetvli a5,a3,e32,m1,ta,ma
...
vmadd.vv v1,v2,v3
...
bne a3,zero,.L3
After this patch:
beq a3,zero,.L8
...
.L3:
vsetvli a5,a3,e32,m1,ta,ma
...
vmadd.vx v1,a2,v3
...
bne a3,zero,.L3
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*vmacc_vx_<mode>): Rename to
handle both the macc and madd.
(*mul_plus_vx_<mode>): Add madd pattern.
* config/riscv/vector.md (@pred_mul_plus_vx_<mode>): Rename to
handle both the macc and madd.
(*pred_macc_<mode>_scalar_undef): Remove.
(*pred_nmsac_<mode>_scalar_undef): Remove.
(*pred_mul_plus_vx<mode>_undef): Add new pattern to handle
both the vmacc and vmadd.
(@pred_mul_plus_vx<mode>): Ditto.
Signed-off-by: Pan Li <pan2.li@intel.com>
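A scalar loop of the following shape is the sort of input this combination is aimed at. This is a sketch only: the function name and signature are hypothetical, and unsigned elements are used so that wraparound is well defined. Per the RVV definition, vmadd.vx computes vd[i] = (x[rs1] * vd[i]) + vs2[i], which is what each iteration does.

```cpp
#include <cstddef>
#include <cstdint>

// Multiply-add where the multiplier is a scalar and the destination is
// also a source operand, matching the vmadd operand order.
void madd_scalar(std::uint32_t *vd, const std::uint32_t *vs2,
                 std::uint32_t x, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        vd[i] = x * vd[i] + vs2[i];
}
```

Without the combination, the scalar x is first broadcast into a vector register (the vmv.v.x in the "before" listing); with it, the scalar register is used directly.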
|
|
In r16-3414 libstdc++ changed ABI for (still experimental C++20) and uses
unordered value -128 instead of 2. Generally the change improved code
generation on all targets tested, see
https://gcc.gnu.org/pipermail/gcc-patches/2025-August/693534.html
for details.
In r16-3474 I've adjusted the middle-end and backends to use that value.
This apparently broke the spaceship_1.C test on aarch64 which scans the
exact function bodies which are now different.
The following patch adjusts the full body patterns to match. On these
2 routines, the generated code is 1 insn longer than in the past, so if
you have ideas how to change the code generation for the common case of
-1, 0, 1, -128 value, maybe it could be improved.
2025-09-04 Jakub Jelinek <jakub@redhat.com>
PR testsuite/121732
PR target/117013
* g++.target/aarch64/spaceship_1.C: Adjust expected fn bodies
for _Z8ss_floatff and _Z9ss_doubledd.
|
|
With -fpartial-profling we ICE building perlbench and gcc from spec2k17 since
afdo_annotate_cfg applies knowledge about zero profiles too early. This patch
moves it after the early exit when profile is 0 everywhere and also fixes
a formatting issue in the next block.
gcc/ChangeLog:
* auto-profile.cc (afdo_annotate_cfg): Apply zero_bbs after early
exit for missing profile; fix formatting.
|
|
With auto-fdo it is possible that function bar with non-zero profile is inlined
into foo with zero profile and foo is the only caller of it. In this case
we currently scale bar to also have zero profile which makes it optimized
for size. With normal profiles this does not happen, since basic blocks with
non-zero count must have some way to be reached.
This patch makes the inliner scale the caller in this case, which mitigates
the problem (to some degree).
Bootstrapped/regtested x86_64-linux, plan to commit it shortly.
gcc/ChangeLog:
* ipa-inline-transform.cc (inline_call): If function with
AFDO profile is inlined into function with
GUESSED_GLOBAL0_AFDO or GUESSED_GLOBAL0_ADJUSTED, scale
caller to AFDO profile.
* profile-count.h (profile_count::apply_scale): If num is AFDO
and den is not GUESSED, make result AFDO rather than GUESSED.
|
|
Add an optab for isnan. This requires changes to the existing folding code
to extend the interclass_mathfn infrastructure to support BUILT_IN_ISNAN.
It now checks for a valid optab before emitting the generic expansion.
There is no change if no optab is defined. Update documentation.
gcc:
* builtins.cc (interclass_mathfn_icode): Add support for isnan
optab.
(expand_builtin): Add BUILT_IN_ISNAN to expand isnan optab.
(fold_builtin_interclass_mathfn): Expand BUILT_IN_ISNAN only after
checking for a valid optab.
(fold_builtin_classify): Move generic BUILT_IN_ISNAN expansion
to fold_builtin_interclass_mathfn.
(fold_builtin_1): For BUILT_IN_ISNAN first try fold_builtin_classify,
then fold_builtin_interclass_mathfn.
* optabs.def: Add isnan optab.
* doc/md.texi: Document isnan.
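As background on what a target-independent NaN test can look like, the sketch below uses a self-comparison: NaN is the only value that compares unequal to itself. This is offered as an illustration of the idea, with hypothetical function names, and assumes IEEE semantics (no -ffast-math or similar); it is not a transcription of the expansion in builtins.cc.

```cpp
#include <limits>

// NaN is unordered with respect to everything, including itself.
bool isnan_by_self_compare(double x) {
    return x != x;
}

// Convenience helper producing a quiet NaN for experiments.
double quiet_nan() {
    return std::numeric_limits<double>::quiet_NaN();
}
```

A target that defines the isnan optab can replace a sequence like this with a dedicated instruction.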
|
|
The following removes back-and-forth of state in
vect_create_epilog_for_reduction and code that's pointless, in
particular around double reduction handling which isn't that
special as it seems.
* tree-vect-loop.cc (vect_create_epilog_for_reduction):
Remove unnecessary code around double reductions.
|
|
Insufficient validation of the operands in vec_set_<mode>_internal
means that the optimizers can transform the expanded code into
something that is invalid. We then emit code based on the incorrect
RTL assuming that it is still valid. A valid pattern can only have a
single bit set in the immediate operand, representing the lane to be
written.
gcc/ChangeLog:
PR target/121775
* config/arm/neon.md (vec_set<mode>_internal, all variants):
validate the immediate operand that indicates the lane to
modify.
gcc/testsuite/ChangeLog:
PR target/121775
* gcc.target/arm/simd/vset_lane_u8.c: New test.
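The validity condition stated above, a single bit set in the immediate, can be checked with a standard bit trick. The sketch below is a generic illustration with hypothetical helper names; it is not the predicate added to neon.md.

```cpp
#include <cstdint>

// True when exactly one bit of mask is set.
bool is_single_lane_mask(std::uint32_t mask) {
    return mask != 0 && (mask & (mask - 1)) == 0;
}

// Index of the set bit, or -1 if mask does not select exactly one lane.
int lane_from_mask(std::uint32_t mask) {
    if (!is_single_lane_mask(mask))
        return -1;
    int lane = 0;
    while ((mask >>= 1) != 0)
        ++lane;
    return lane;
}
```

Rejecting masks with zero or several bits set up front avoids emitting code for a lane number that was never well defined.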
|
|
The following removes never taken paths and consolidates the
nested_cycle and double_reduc variables which are the same.
* tree-vect-loop.cc (vectorizable_reduction): Eliminate
nested_cycle in favor of double_reduc and set that where
it makes most sense. Remove never taken paths and always
true conditions.
|
|
This fixes a glaring mistake in yesterday's change to the expansion of
vec_perm. We should of course move tmp_target into the real target
and not the other way around. I wonder why my testing hasn't
caught this...
PR target/121742
PR target/121780
PR target/121781
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_vec_perm): Swap target and
tmp_target.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr121780.c: New test.
* gcc.target/riscv/rvv/autovec/pr121781.c: New test.
|
|
Since peeling and versioning for alignment for VLA modes was introduced
(r16-3065-geee51f9a4b6) we have been seeing a lot of test suite failures
like
internal compiler error: in apply_scale, at profile-count.h:1187
This is because vect_gen_prolog_loop_niters sets the prolog bound to -1
in case align_in_elems is a non-constant poly_int.
bound - 1 is later used to scale the loop profile in scale_loop_profile
so we try to calculate with an assumed -2 iterations.
This patch changes bound_prolog to poly_int64, using a poly estimate for
frequency scaling but only records an iteration bound for the prolog if
the bound is a scalar.
PR tree-optimization/121523
gcc/ChangeLog:
* tree-vect-loop-manip.cc (vect_gen_prolog_loop_niters):
Change prolog bound to poly_int64.
(vect_gen_scalar_loop_niters): Ditto.
(vect_do_peeling): Use poly estimate for frequency scaling.
|
|
The following changes how we detect double reductions, in particular
not setting vect_double_reduction_def on the outer PHIs when the inner
loop doesn't satisfy double reduction constraints. It also simplifies
the setup a bit by not having to detect whether we process an inner
loop of a double reduction.
PR tree-optimization/121768
* tree-vect-loop.cc (vect_inner_phi_in_double_reduction_p): Remove.
(vect_analyze_scalar_cycles_1): Analyze inner loops of
double reductions immediately and only mark fully recognized
double reductions. Skip already analyzed inner loops.
(vect_is_simple_reduction): Change double_reduc from a flag
to an output of the inner loop PHI and to whether we are
processing an inner loop of a double reduction.
* gcc.dg/vect/pr121768.c: New testcase.
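For readers new to the term, a double reduction is a reduction whose accumulator is carried through both the outer loop and the inner loop of a nest. The sketch below shows the simplest such shape; the function name and the fixed row width of 4 are assumptions for illustration, and it is unrelated to the new testcase.

```cpp
#include <cstddef>

// The accumulator sum is updated in the inner loop and carried across
// iterations of the outer loop, so both loops reduce into the same value.
unsigned sum_2d(const unsigned (*a)[4], std::size_t rows) {
    unsigned sum = 0;
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < 4; ++j)
            sum += a[i][j];
    return sum;
}
```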
|