Age | Commit message (Collapse) | Author | Files | Lines |
|
order_nodes are used to implement ordered comparisons between
two insns with the same program point number. remove_insn would
remove an order_node from its splay tree, but didn't remove it
from the insn. This caused confusion if the insn was later
reinserted somewhere else that also needed an order_node.
gcc/
PR rtl-optimization/115929
* rtl-ssa/insns.cc (function_info::remove_insn): Remove an
order_node from the instruction as well as from the splay tree.
gcc/testsuite/
PR rtl-optimization/115929
* gcc.dg/torture/pr115929-1.c: New test.
|
|
In g:44fc801e97a8dc626a4806ff4124439003420b20 I'd extended
insn_propagation to handle simple cases of hard-reg mode punning.
One of the checks was that the new use mode occupied the same
number of registers as the original definition mode. However,
as PR115901 shows, we need to avoid increasing the size of any
registers in the punned "to" expression as well.
Specifically, the test includes a DImode move from GPR x0 to
a vector register, followed by a V2DI use of the vector register.
The simplification would then create a V2DI spanning x0 and x1,
manufacturing a new, unwanted use of x1.
Checking for that kind of thing directly seems too cumbersome,
and is not related to the original motivation (which was to improve
handling of shared vector zeros on aarch64). This patch therefore
restricts the paradoxical case to constants.
gcc/
PR rtl-optimization/115901
* recog.cc (insn_propagation::apply_to_rvalue_1): Restrict
paradoxical mode punning to cases where "to" is constant.
gcc/testsuite/
PR rtl-optimization/115901
* gcc.dg/torture/pr115901.c: New test.
|
|
The asm in the testcase has a memory operand and also clobbers ax.
The clobber means that ax cannot be used to hold inputs, which
extends to the address of the memory.
I think I had an implicit assumption that constrain_operands
would enforce this, but in hindsight, that clearly wasn't going
to be true. constrain_operands only looks at constraints, and
these clobbers are by definition outside the constraint system.
(And that's why they have to be handled conservatively, since there's
no way to distinguish the earlyclobber and non-earlyclobber cases.)
The semantics of hard-coded clobbers are generic enough that I think
they should be handled directly by rtl-ssa, rather than by consumers.
And in the context of rtl-ssa, the easiest way to check for a clash is
to walk the list of input registers, which we already have to hand.
It therefore seemed better not to push this down to a more generic
rtl helper.
The patch detects hard-coded clobbers in the same way as regrename:
by temporarily stubbing out the operands with pc_rtx.
gcc/
PR rtl-optimization/115891
* rtl-ssa/changes.cc (find_clobbered_access): New function.
(recog_level2): Use it to check for overlap between input
registers and hard-coded clobbers. Conditionally reset
recog_data.insn after changing the insn code.
gcc/testsuite/
PR rtl-optimization/115891
* gcc.target/i386/pr115891.c: New test.
|
|
These are insns of the forms
(set (regA:M)
(plus:M (extend:M (regB:L))
(regA:M)))
and
(set (regA:M)
(minus:M (regA:M)
(extend:M (regB:L))))
where "extend" may be a sign-extend or zero-extend,
and the integer modes are SImode >= M > L >= QImode.
The existing patterns are now represented in terms of insns
with mode iterators and a code iterator over any_extend,
and these new insn support all valid combinations of M and L
(which previously was not the case).
gcc/
* config/avr/avr.cc (avr_out_minus): Assimilate into...
(avr_out_plus_ext): ...this new function.
(avr_adjust_insn_length) [ADJUST_LEN_PLUS_EXT]: Handle case.
(avr_rtx_costs_1) [PLUS, MINUS]: Adjust RTX costs.
* config/avr/avr.md (adjust_len) <plus_ext>: Add new attribute value.
(*addpsi3_zero_extend.hi_split): Assimilate...
(*addpsi3_zero_extend.qi_split): Assimilate...
(*addsi3_zero_extend_split): Assimilate...
(*addsi3_zero_extend.hi_split): Assimilate...
(*addpsi3_sign_extend.hi_split): Assimilate...
(*addhi3.sign_extend1_split): Assimilate...
(*add<PSISI:mode>3.<code>.<QIPSI:mode>_split): ...into this
new insn-and-split.
(*addpsi3_zero_extend.hi): Assimilate...
(*addpsi3_zero_extend.qi): Assimilate...
(*addsi3_zero_extend): Assimilate...
(*addsi3_zero_extend.hi): Assimilate...
(*addpsi3_sign_extend.hi): Assimilate...
(*addhi3.sign_extend1): Assimilate...
(*add<PSISI:mode>3.<code>.<QIPSI:mode>): ...into this new insn.
(*subpsi3_sign_extend.hi_split): Assimilate...
(*subhi3.sign_extend2_split): Assimilate...
(*sub<HISI:mode>3.zero_extend.<QIPSI:mode>_split): Assimilate...
(*sub<HISI:mode>3.<code><QIPSI:mode>_split): ...into this new
insn-and-split.
(*subpsi3_sign_extend.hi): Assimilate...
(*subhi3.sign_extend2): Assimilate...
(*sub<HISI:mode>3.zero_extend.<QIPSI:mode>): Assimilate...
(*sub<HISI:mode>3.<code>.<QIPSI:mode>): ...into this new insn.
(*sub<HISI:mode>3.zero_extend.<QIPSI:mode>): Use avr_out_plus_ext
for asm out.
* config/avr/avr-protos.h (avr_out_minus): Remove.
(avr_out_plus_ext): New proto.
gcc/testsuite/
* gcc.target/avr/torture/add-extend.c: New test.
* gcc.target/avr/torture/sub-extend.c: New test.
|
|
An ICE would occur if a constant was declared using a variable term.
This fix catches variable terms in constant expressions and generates
an unrecoverable error.
gcc/m2/ChangeLog:
PR modula2/115957
* gm2-compiler/M2StackAddress.mod (PopAddress): Detect tail=NIL
and generate an internal error.
* gm2-compiler/PCBuild.bnf (InConstParameter): New variable.
(InConstBlock): New variable.
(ErrorString): Rewrite using MetaErrorStringT0.
(ErrorArrayAt): Rewrite using MetaErrorStringT0.
(WarnMissingToken): Use MetaErrorStringT0.
(CompilationUnit): Set seenError FALSE.
(init): Initialize InConstParameter and InConstBlock.
(ConstantDeclaration): Set InConstBlock.
(ConstSetOrQualidentOrFunction): Call CheckNotVar if not
InConstParameter and InConstBlock.
(ConstActualParameters): Set InConstParameter TRUE and restore
value at the end.
* gm2-compiler/PCSymBuild.def (CheckNotVar): New procedure.
Remove all unnecessary export qualified list.
* gm2-compiler/PCSymBuild.mod (CheckNotVar): New procedure.
gcc/testsuite/ChangeLog:
PR modula2/115957
* gm2/errors/fail/badconst.mod: New test.
* gm2/pim/fail/tinyadr.mod: New test.
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
|
|
gcc/fortran/ChangeLog:
* trans-expr.cc (gfc_trans_zero_assign): Handle allocatable arrays.
gcc/testsuite/ChangeLog:
* gfortran.dg/array_memset_3.f90: New test.
Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
|
|
When emitting the compensation to the vectorized main loop for
a vector reduction value to be re-used in the vectorized epilogue
we fail to place it in the correct block when the main loop is
known to be entered (no loop_vinfo->main_loop_edge) but the
epilogue is not (a loop_vinfo->skip_this_loop_edge). The code
currently disregards this situation.
With the recent znver4 cost fix I couldn't trigger this situation
with the testcase but I adjusted it so it could eventually trigger
on other targets.
PR tree-optimization/115841
* tree-vect-loop.cc (vect_transform_cycle_phi): Correctly
place the partial vector reduction for the accumulator
re-use when the main loop cannot be skipped but the
epilogue can.
* gcc.dg/vect/pr115841.c: New testcase.
|
|
This patch takes some existing patterns that have QImode as one
input and uses a mode iterator to allow for more modes to match.
These insns are split after reload into *xorqi3 resp. *iorqi3 insn(s).
gcc/
* config/avr/avr-protos.h (avr_emit_xior_with_shift): New proto.
* config/avr/avr.cc (avr_emit_xior_with_shift): New function.
* config/avr/avr.md (any_lshift): New code iterator.
(*<xior:code><mode>.<any_lshift:code>): New insn-and-split.
(<code><HISI:mode><QIPSI:mode>.0): Replaces...
(*<code_stdname><mode>qi.byte0): ...this one.
(*<xior:code><HISI:mode><QIPSI:mode>.<any_lshift:code>): Replaces...
(*<code_stdname><mode>qi.byte1-3): ...this one.
|
|
gcc/testsuite/ChangeLog:
* gcc.target/i386/indirect-thunk-extern-1.c: Replace character with
invalid encoding with `?`.
|
|
Code attribute bhfgq is missing a mapping for TF. This results in
unresolved iterators in assembler templates for *bswaptf.
With the TF mapping added the base mnemonics vlbr and vstbr are not
"used" anymore but only the extended mnemonics (vlbr<bhfgq> was
interpreted as vlbr; likewise for vstbr). Therefore, remove the base
mnemonics from the scheduling description, otherwise, genattrtab would
error about unknown mnemonics.
Similarly, we end up with unresolved iterators in assembler templates
for mulfprx23 since code attribute xdee is missing a mapping for FPRX2.
gcc/ChangeLog:
* config/s390/3931.md (vlbr, vstbr): Remove.
* config/s390/s390.md (xdee): Add FPRX2 mapping.
* config/s390/vector.md (bhfgq): Add TF mapping.
|
|
Currently unaligned YMM and ZMM load and store costs are cheaper than
aligned which causes the vectorizer to purposely mis-align accesses
by adding an alignment prologue. It looks like the unaligned costs
were simply copied from the bogus znver4 costs. The following makes
the unaligned costs equal to the aligned costs like in the fixed znver4
version.
* config/i386/x86-tune-costs.h (znver5_cost): Update unaligned
load and store cost from the aligned costs.
|
|
Optabs vcond{,u} will be removed for GCC 15. Since regtest shows no
fallout, dropping the expanders, now.
gcc/ChangeLog:
PR target/114189
* config/s390/vector.md (V_HW2): Remove.
(vcond<V_HW:mode><V_HW2:mode>): Remove.
(vcondu<V_HW:mode><V_HW2:mode>): Remove.
|
|
In preparation of dropping vcond{,u,eq} optabs
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654690.html
enable 128-bit operands for vcond_mask---including integer as well as
floating point.
This fixes partially PR115519 w.r.t. autovec-long-double-signaling-*.c
tests.
gcc/ChangeLog:
* config/s390/vector.md: Enable vcond_mask for 128-bit ops.
|
|
Mode iterator V_HW enables V1TI for target VXE which means
vec_cmpv1tiv1ti becomes available which leads to an ICE since there is
no corresponding insn.
Fixed by emulating comparisons and enabling mode V1TI unconditionally
for V_HW. For the sake of symmetry, I also added TI mode to V_HW since
TF mode is already included. As a consequence the consumers of V_HW
vec_{splat,slb,sld,sldw,sldb,srdb,srab,srb,test_mask_int,test_mask}
also become available for 128-bit integers.
This fixes gcc.c-torture/execute/pr105613.c and gcc.dg/pr106063.c.
gcc/ChangeLog:
* config/s390/vector.md (V_HW): Enable V1TI unconditionally and
add TI.
(vec_cmpu<VIT_HW:mode><VIT_HW:mode>): Add 128-bit integer
variants.
(*vec_cmpeq<mode><mode>_nocc_emu): Emulate operation.
(*vec_cmpgt<mode><mode>_nocc_emu): Emulate operation.
(*vec_cmpgtu<mode><mode>_nocc_emu): Emulate operation.
gcc/testsuite/ChangeLog:
* gcc.target/s390/vector/vec-cmp-emu-1.c: New test.
* gcc.target/s390/vector/vec-cmp-emu-2.c: New test.
* gcc.target/s390/vector/vec-cmp-emu-3.c: New test.
|
|
When AVX512 uses a fully masked loop and peeling we fail to create the
correct initial loop mask when the mask is composed of multiple
components in some cases. The following fixes this by properly applying
the bias for the component to the shift amount.
PR tree-optimization/115843
* tree-vect-loop-manip.cc
(vect_set_loop_condition_partial_vectors_avx512): Properly
bias the shift of the initial mask for alignment peeling.
* gcc.dg/vect/pr115843.c: New testcase.
|
|
Currently unaligned YMM and ZMM load and store costs are cheaper than
aligned which causes the vectorizer to purposely mis-align accesses
by adding an alignment prologue. It looks like the unaligned costs
were simply left untouched from znver3 where they equate the aligned
costs when tweaking aligned costs for znver4. The following makes
the unaligned costs equal to the aligned costs.
This avoids the miscompile seen in PR115843 but it's of course not
a real fix for the issue uncovered there. But it makes it qualify
as a regression fix.
PR tree-optimization/115843
* config/i386/x86-tune-costs.h (znver4_cost): Update unaligned
load and store cost from the aligned costs.
|
|
This patch resolves PR tree-optimization/114661, by generalizing the set
of expressions that we canonicalize to multiplication. This extends the
optimization(s) contributed (by me) back in July 2021.
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575999.html
The existing transformation folds (X*C1)^(X<<C2) into X*C3 when
allowed. A subtlety is that for non-wrapping integer types, we
actually fold this into (int)((unsigned)X*C3) so that we don't
introduce an undefined overflow that wasn't in the original.
Unfortunately, this transformation confuses itself, as the type-cast
multiplication isn't recognized when further combining bit operations.
Fixed here by allowing optional useless type conversions in transforms
to turn (int)((unsigned)X*C1)^(X<<C2) into (int)((unsigned)X*C3) so
that match.pd and EVRP can continue to construct multiplications.
For the example given in the PR:
unsigned mul(unsigned char c) {
if (c > 3) __builtin_unreachable();
return c << 18 | c << 15 |
c << 12 | c << 9 |
c << 6 | c << 3 | c;
}
GCC on x86_64 with -O2 previously generated:
mul: movzbl %dil, %edi
leal (%rdi,%rdi,8), %edx
leal 0(,%rdx,8), %eax
movl %edx, %ecx
sall $15, %edx
orl %edi, %eax
sall $9, %ecx
orl %ecx, %eax
orl %edx, %eax
ret
with this patch we now generate:
mul: movzbl %dil, %eax
imull $299593, %eax, %eax
ret
2024-07-16 Roger Sayle <roger@nextmovesoftware.com>
Richard Biener <rguenther@suse.de>
gcc/ChangeLog
PR tree-optimization/114661
* match.pd ((X*C1)|(X*C2) to X*(C1+C2)): Allow optional useless
type conversions around multiplications, such as those inserted
by this transformation.
gcc/testsuite/ChangeLog
PR tree-optimization/114661
* gcc.dg/pr114661.c: New test case.
|
|
Based on actual usage, trunc{128}2{16,32,64} use some instructions from
sse/sse3, so extend their scope to extend the scope of optimization.
gcc/ChangeLog:
PR target/107432
* config/i386/sse.md
(PMOV_SRC_MODE_3_AVX2): Add TARGET_AVX2 for V4DI and V8SI.
(PMOV_SRC_MODE_4): Add TARGET_AVX2 for V4DI.
(trunc<mode><pmov_dst_3_lower>2): Change constraint from TARGET_AVX2 to
TARGET_SSSE3.
(trunc<mode><pmov_dst_4_lower>2): Ditto.
(truncv2div2si2): Change constraint from TARGET_AVX2 to TARGET_SSE.
gcc/testsuite/ChangeLog:
PR target/107432
* gcc.target/i386/pr107432-10.c: New test.
|
|
|
|
So as I've noted before I believe the control flow in ext-dce.cc is horribly
messy. While investigating a fix for 115877 I came across another problem
related to control flow handling.
Specifically, if we have an binary op which implies the 2nd operand is fully
live, then we'd actually fail to mark that operand as live.
We essentially broke out of the loop which was supposed to be safe. But Y was
a REG and if Y is a REG or CONST_INT we skip sub-rtxs and thus failed to
process that operand (the shift count) at all.
Rather than muck around with control flow, we can just set all the bits as live
in DST_MASK and let normal processing continue. With all the bits live IN
DST_MASK all the bits implied by the mode of the argument will also be live.
No testcase.
Bootstrapped and regression tested on x86. Pushing to the trunk.
gcc/
* ext-dce.cc (ext_dce_process_uses): Simplify control flow and fix
liveness computation for shift/rotate counts.
|
|
My change to fix a ubsan issue broke handling propagation of the carry/sign bit
down through a right shift. Thanks to Andreas for the analysis and proposed
fix and Sergei for the testcase.
PR rtl-optimization/115876
PR rtl-optimization/115916
gcc/
* ext-dce.cc (carry_backpropagate): Make return type unsigned as well.
Cast to signed for right shift to preserve sign bit.
gcc/testsuite/
* g++.dg/torture/pr115916.C: New test.
Co-author: Andreas Schwab <schwab@linux-m68k.org>
Co-author: Sergei Trofimovich <slyfox at gentoo dot org>
|
|
Here we're prematurely stripping the dependent alias template-id A<T> to
its defining-type-id T when used as a template argument, which in turn
causes us to essentially ignore A's vector_size attribute in the outer
template-id.
This has always been a problem for class template-ids it seems, and after
r14-2170 variable template-ids are affected as well.
This patch marks alias templates that have a dependent attribute as
complex (as with e.g. constrained alias templates) so that we don't look
through them prematurely.
PR c++/115897
gcc/cp/ChangeLog:
* pt.cc (complex_alias_template_p): Return true for an alias
template with attributes.
(get_underlying_template): Don't look through an alias template
with attributes.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/alias-decl-77.C: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>
|
|
This reverts commit 5040c273484d7123a40a99cdeb434cecbd17a2e9.
|
|
The set of enabled extensions can be extended via target arch function
attributes by listing each extension with a '+' prefix and a comma as
list separator. E.g.:
__attribute__((target("arch=+zba,+zbb"))) void foo();
The programmer intends to ensure that one or more extensions
are enabled when building the code. This is independent of the arch
string that is passed at build time via the -march= option.
Therefore, it is reasonable to allow enabling extensions via target arch
attributes, which have already been enabled via the -march= string.
The subset list code already supports such duplication for implied
extensions. This patch adds an interface so the subset list
parser can be switched into a mode where duplication is allowed.
This commit fixes the following regressed test cases:
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-39.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-42.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-43.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-44.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-45.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-46.c
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc (riscv_subset_list::add):
Allow adding enabled extension if m_allow_adding_dup is set.
* config/riscv/riscv-subset.h: Add m_allow_adding_dup and setter.
* config/riscv/riscv-target-attr.cc (riscv_target_attr_parser::parse_arch):
Allow adding enabled extensions.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr115554.c: Change expected fail to expected pass.
* gcc.target/riscv/target-attr-16.c: New test.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
|
|
The target-arch attribute handling in RISC-V is only a few months old,
but already saw a rewrite (9941f0295a14), which addressed an important
issue. This rewrite introduced a hash table in the backend, which is
used to keep track of target-arch attributes of all functions.
The index of this hash table is the pointer to the function declaration
object (fndecl). However, objects like these don't have the lifetime
that is assumed here, which resulted in observing two fndecl objects
with the same address for different objects (triggering the assertion
in riscv_func_target_put() -- see also PR115562).
This patch removes the hash table approach in favor of storing target
specific options using the DECL_FUNCTION_SPECIFIC_TARGET() macro, which
is also used by other backends and is specifically designed for this
purpose (https://gcc.gnu.org/onlinedocs/gccint/Function-Properties.html).
To have an accessible field in the target options, we need to
adjust riscv.opt and introduce the field riscv_arch_string
(for the already existing option '-march=').
Using this macro allows to remove much code from riscv-common.cc, which
controls access to the objects 'func_target_table' and 'current_subset_list'.
One thing to mention is, that we had two subset lists:
current_subset_list and cmdline_subset_list, with the latter being
introduced recently for target attribute handling.
This patch reduces them back to one (cmdline_subset_list) which
contains the list of extensions that have been enabled by the command
line arguments.
Note that the patch keeps the existing behavior of rejecting
duplications of extensions when added via the '+' operator in a function
target attribute. E.g. "-march=rv64gc_zbb" and "arch=+zbb" will trigger
an error (see pr115554.c). However, at the same time this patch breaks
the acceptance of adding implied extensions, which causes the following
six regressions (with the error "extension 'EXT' appear more than one time"):
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-39.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-42.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-43.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-44.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-45.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-46.c
New tests were added to document the behavior and to ensure it won't
regress. This patch did not show any regressions for rv32/rv64
and fixes the ICEs from PR115554 and PR115562.
PR target/115554
PR target/115562
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc (struct riscv_func_target_info):
Remove.
(struct riscv_func_target_hasher): Likewise.
(riscv_func_decl_hash): Likewise.
(riscv_func_target_hasher::hash): Likewise.
(riscv_func_target_hasher::equal): Likewise.
(riscv_current_subset_list): Likewise.
(riscv_cmdline_subset_list): Remove obsolete space.
(riscv_func_target_table_lazy_init): Remove.
(riscv_func_target_get): Likewise.
(riscv_func_target_put): Likewise.
(riscv_func_target_remove_and_destory): Likewise.
(riscv_arch_str): Generate from cmdline_subset_list.
(riscv_set_arch_by_subset_list): Don't set current_subset_list.
(riscv_parse_arch_string): Remove current_subset_list.
* config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins):
Get subset list via riscv_cmdline_subset_list().
* config/riscv/riscv-subset.h (riscv_current_subset_list):
Remove prototype.
(riscv_func_target_get): Likewise.
(riscv_func_target_put): Likewise.
(riscv_func_target_remove_and_destory): Likewise.
* config/riscv/riscv-target-attr.cc (riscv_target_attr_parser::parse_arch):
Build base arch string from existing target options, if any.
(riscv_target_attr_parser::update_settings): Store new arch
string in target options.
(riscv_process_one_target_attr): Whitespace fix.
(riscv_process_target_attr): Drop opts argument.
(riscv_option_valid_attribute_p): Properly save, change and restore
target options.
* config/riscv/riscv.cc (get_arch_str): New function.
(riscv_declare_function_name): Get arch string for option-arch
directive from function's target options.
* config/riscv/riscv.opt: Add riscv_arch_string variable to
march option.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/target-attr-01.c: Add test for option-arch directive.
* gcc.target/riscv/target-attr-02.c: Likewise.
* gcc.target/riscv/target-attr-03.c: Likewise.
* gcc.target/riscv/target-attr-04.c: Likewise.
* gcc.target/riscv/target-attr-05.c: Fix formatting.
* gcc.target/riscv/target-attr-06.c: Likewise.
* gcc.target/riscv/target-attr-07.c: Likewise.
* gcc.target/riscv/pr115554.c: New test.
* gcc.target/riscv/pr115562.c: New test.
* gcc.target/riscv/target-attr-08.c: New test.
* gcc.target/riscv/target-attr-09.c: New test.
* gcc.target/riscv/target-attr-10.c: New test.
* gcc.target/riscv/target-attr-11.c: New test.
* gcc.target/riscv/target-attr-12.c: New test.
* gcc.target/riscv/target-attr-13.c: New test.
* gcc.target/riscv/target-attr-14.c: New test.
* gcc.target/riscv/target-attr-15.c: New test.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
|
|
Allocating an object on the heap with new, wrapping it in a
std::unique_ptr and finally getting the buffer via buf.get()
is a correct way to allocate a buffer that is automatically
freed on return. However, a simple invocation of alloca()
does the same with less overhead.
gcc/ChangeLog:
* config/riscv/riscv-target-attr.cc (riscv_target_attr_parser::parse_arch):
Replace new + std::unique_ptr by alloca().
(riscv_process_one_target_attr): Likewise.
(riscv_process_target_attr): Likewise.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
|
|
The first two patches for PR113719 have each regressed
gcc.dg/ipa/iinline-attr.c on a different target. The reason for this
instability is that there are competing flag_omit_frame_pointer
overriders on x86:
- ix86_recompute_optlev_based_flags computes and sets a
-f[no-]omit-frame-pointer default depending on
USE_IX86_FRAME_POINTER and, in 32-bit mode, optimize_size
- ix86_option_override_internal enables flag_omit_frame_pointer for
-momit-leaf-frame-pointer to take effect
ix86_option_override[_internal] calls
ix86_recompute_optlev_based_flags before setting
flag_omit_frame_pointer. It is called during global process_options.
But ix86_recompute_optlev_based_flags is also called by
parse_optimize_options, during attribute processing, and at that
point, ix86_option_override is not called, so the final overrider for
global options is not applied to the optimize attributes. If they
differ, the testcase fails.
In order to fix this, we need to process all overriders of this option
whenever we process any of them. Since this setting is affected by
optimization options, it makes sense to compute it in
parse_optimize_options, rather than in process_options.
for gcc/ChangeLog
PR target/113719
* config/i386/i386-options.cc (ix86_option_override_internal):
Move flag_omit_frame_pointer final overrider...
(ix86_recompute_optlev_based_flags): ... here.
|
|
The following testcase was not properly testing anything due to an
uninitialized variable. As a result, the loop was not iterating through
the testing data, but instead on undefined values which could cause an
unexpected abort.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h:
initialize variable
Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
|
|
gcc/
* config/avr/avr.md: Simplify mode usage.
(GET_MODE_SIZE (<MODE>mode)): Use <SIZE> instead.
(GET_MODE_BITSIZE (<MODE>mode) - 1): Use <MSB> instead.
(GET_MODE_MASK (QImode)): Use 0xff instead.
* config/avr/avr-fixed.md: Same.
|
|
Nick has implemented a new .base64 directive in gas (to be shipped in
the upcoming binutils 2.43; big thanks for that).
See https://sourceware.org/bugzilla/show_bug.cgi?id=31964
The following patch adjusts default_elf_asm_output_ascii (i.e.
ASM_OUTPUT_ASCII elfos.h implementation) to use it if it detects binary
data and gas supports it.
Without this patch, we emit stuff like:
.string "\177ELF\002\001\001\003"
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string "\002"
.string ">"
...
.string "\324\001\236 0FS\202\002E\n0@\203\004\005&\202\021\337)\021\203C\020A\300\220I\004\t\b\206(\234\0132l\004b\300\bK\006\220$0\303\020P$\233\211\002D\f"
etc., with this patch more compact
.base64 "f0VMRgIBAQMAAAAAAAAAAAIAPgABAAAAABf3AAAAAABAAAAAAAAAAACneB0AAAAAAAAAAEAAOAAOAEAALAArAAYAAAAEAAAAQAAAAAAAAABAAEAAAAAAAEAAQAAAAAAAEAMAAAAAAAAQAwAAAAAAAAgAAAAAAAAAAwAAAAQAAABQAwAAAAAAAFADQAAAAAAAUANAAAAAAAAcAAAAAAAAABwAAAAAAAAAAQAAAAAAAAABAAAABAAAAAAAAAAAAAAAAABAAAAAAAAAAEAAAAAAADBwOQAAAAAAMHA5AAAAAAAAEAAAAAAAAAEAAAAFAAAAAIA5AAAAAAAAgHkAAAAA"
.base64 "AACAeQAAAAAAxSSgAgAAAADFJKACAAAAAAAQAAAAAAAAAQAAAAQAAAAAsNkCAAAAAACwGQMAAAAAALAZAwAAAADMtc0AAAAAAMy1zQAAAAAAABAAAAAAAAABAAAABgAAAGhmpwMAAAAAaHbnAwAAAABoducDAAAAAOAMAQAAAAAA4MEeAAAAAAAAEAAAAAAAAAIAAAAGAAAAkH2nAwAAAACQjecDAAAAAJCN5wMAAAAAQAIAAAAAAABAAgAAAAAAAAgAAAAAAAAABAAAAAQAAABwAwAAAAAAAHADQAAAAAAAcANAAAAAAABAAAAAAAAAAEAAAAAAAAAACAAAAAAA"
.base64 "AAAEAAAABAAAALADAAAAAAAAsANAAAAAAACwA0AAAAAAACAAAAAAAAAAIAAAAAAAAAAEAAAAAAAAAAcAAAAEAAAAaGanAwAAAABoducDAAAAAGh25wMAAAAAAAAAAAAAAAAQAAAAAAAAAAgAAAAAAAAAU+V0ZAQAAABwAwAAAAAAAHADQAAAAAAAcANAAAAAAABAAAAAAAAAAEAAAAAAAAAACAAAAAAAAABQ5XRkBAAAAAw/WAMAAAAADD+YAwAAAAAMP5gDAAAAAPy7CgAAAAAA/LsKAAAAAAAEAAAAAAAAAFHldGQGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
.base64 "AAAAAAAAAAAAAAAAAAAAAAAAABAAAAAAAAAAUuV0ZAQAAABoZqcDAAAAAGh25wMAAAAAaHbnAwAAAACYGQAAAAAAAJgZAAAAAAAAAQAAAAAAAAAvbGliNjQvbGQtbGludXgteDg2LTY0LnNvLjIAAAAAAAQAAAAwAAAABQAAAEdOVQACgADABAAAAAEAAAAAAAAAAQABwAQAAAAJAAAAAAAAAAIAAcAEAAAAAwAAAAAAAAAEAAAAEAAAAAEAAABHTlUAAAAAAAMAAAACAAAAAAAAAAOAAACsqAAAgS0AAOJWAAAjNwAAXjAAAAAAAAAAAAAAF1gAAHsxAABBBwAA"
.base64 "G0kAALGmAACwoAAAAAAAAAAAAACQhAAAAAAAAOw1AACNYgAAAAAAAFQoAAAAAAAAx3UAALZAAAAAAAAAiIUAALGeAABBlAAAWEsAAPmRAACmOgAAAAAAADh3AAAAAAAAlCAAAAAAAABymgAAaosAAMIjAAAKMQAAMkIAADU0AAAAAAAA5ZwAAAAAAAAAAAAAAAAAAFIdAAAIGQAAAAAAAMFbAAAoTQAAGDcAAIRgAAA6HgAAlxwAAAAAAADOlgAAAAAAAEhPAAARiwAAMGgAAOVtAADMFgAAAAAAAAAAAACrjgAAYl4AACZVAAA/HgAAAAAAAAAAAABqPwAAAAAA"
The patch attempts to juggle between readability and compactness, so
if it detects some hunk of the initializer that would be shorter to be
emitted as .string/.ascii directive, it does so, but if it previously
used .base64 directive it switches mode only if there is a 16+ char
ASCII-ish string.
On my #embed testcase from yesterday
unsigned char a[] = {
#embed "cc1plus"
};
without this patch it emits 2.4GB of assembly, while with this
patch 649M.
Compile times (trunk, so yes,rtl,extra checking) are:
time ./xgcc -B ./ -S -std=c23 -O2 embed-11.c
real 0m13.647s
user 0m7.157s
sys 0m2.597s
time ./xgcc -B ./ -c -std=c23 -O2 embed-11.c
real 0m28.649s
user 0m26.653s
sys 0m1.958s
without the patch and
time ./xgcc -B ./ -S -std=c23 -O2 embed-11.c
real 0m4.283s
user 0m2.288s
sys 0m0.859s
time ./xgcc -B ./ -c -std=c23 -O2 embed-11.c
real 0m6.888s
user 0m5.876s
sys 0m1.002s
with the patch, so that feels like significant improvement.
The resulting embed-11.o is identical between the two ways of expressing
the mostly binary data in the assembly. But note that there are portions
like:
.base64 "nAAAAAAAAAAvZRcAIgAOAFAzMwEAAAAABgAAAAAAAACEQBgAEgAOAFBHcAIAAAAA7AAAAAAAAAAAX19nbXB6X2dldF9zaQBtcGZyX3NldF9zaV8yZXhwAG1wZnJfY29zaABtcGZyX3RhbmgAbXBmcl9zZXRfbmFuAG1wZnJfc3ViAG1wZnJfdGFuAG1wZnJfc3RydG9mcgBfX2dtcHpfc3ViX3VpAF9fZ21wX2dldF9tZW1vcnlfZnVuY3Rpb25zAF9fZ21wel9zZXRfdWkAbXBmcl9wb3cAX19nbXB6X3N1YgBfX2dtcHpfZml0c19zbG9uZ19wAG1wZnJfYXRh"
.base64 "bjIAX19nbXB6X2RpdmV4YWN0AG1wZnJfc2V0X2VtaW4AX19nbXB6X3NldABfX2dtcHpfbXVsAG1wZnJfY2xlYXIAbXBmcl9sb2cAbXBmcl9hdGFuaABfX2dtcHpfc3dhcABtcGZyX2FzaW5oAG1wZnJfYXNpbgBtcGZyX2NsZWFycwBfX2dtcHpfbXVsXzJleHAAX19nbXB6X2FkZG11bABtcGZyX3NpbmgAX19nbXB6X2FkZF91aQBfX2dtcHFfY2xlYXIAX19nbW9uX3N0YXJ0X18AbXBmcl9hY29zAG1wZnJfc2V0X2VtYXgAbXBmcl9jb3MAbXBmcl9zaW4A"
.string "__gmpz_ui_pow_ui"
.string "mpfr_get_str"
.string "mpfr_acosh"
.string "mpfr_sub_ui"
.string "__gmpq_set_ui"
.string "mpfr_set_inf"
...
.string "GLIBC_2.14"
.string "GLIBC_2.11"
.base64 "AAABAAIAAQADAAMAAwADAAMAAwAEAAUABgADAAEAAQADAAMABwABAAEAAwADAAMAAwAIAAEAAwADAAEAAwABAAMAAwABAAMAAQADAAMAAwADAAMAAwADAAYAAwADAAEAAQAIAAMAAwADAAMAAwABAAMAAQADAAMAAQABAAEAAwAIAAEAAwADAAEAAwABAAMAAQADAAEABgADAAMAAQAHAAMAAwADAAMAAwABAAMAAQABAAMAAwADAAkAAQABAAEAAwAKAAEAAwADAAMAAQABAAMAAwALAAEAAwADAAEAAQADAAMAAwABAAMAAwABAAEAAwADAAMABwABAAMAAwAB"
.base64 "AAEAAwADAAEAAwABAAMAAQADAAMAAwADAAEAAQABAAEAAwADAAMAAQABAAEAAQABAAEAAQADAAMAAwADAAMAAQABAAwAAwADAA0AAwADAAMAAwADAAEAAQADAAMAAQABAAMAAwADAAEAAwADAAEAAwAIAAMAAwADAAMABgABAA4ACwAGAAEAAQADAAEAAQADAAEAAwABAAMAAwABAAEAAwABAAMAAwABAAEAAwADAAMAAwABAAMAAQABAAEAAQABAAMADwABAAMAAQADAAMAAwABAAEAAQAIAAEADAADAAMAAQABAAMAAwADAAEAAQABAAEAAQADAAEAAwADAAEA"
.base64 "AwABAAMAAQADAAMAAQABAAEAAwADAAMAAwADAAMAAQADAAMACAAQAA8AAQABAAEAAQABAAEAAQABAAEAAQABAAEAAQABAAEAAQABAAEAAQABAAEAAQABAAEAAQABAAEAAQABAAEAAQABAAEAAQABAAEAAQABAAEAAQABAAEAAQABAAEAAQABAAEAAQA="
so it isn't all just totally unreadable stuff.
2024-07-15 Jakub Jelinek <jakub@redhat.com>
* configure.ac (HAVE_GAS_BASE64): New check.
* config/elfos.h (BASE64_ASM_OP): Define if HAVE_GAS_BASE64 is
defined.
* varasm.cc (assemble_string): Bump maximum from 2000 to 16384 if
BASE64_ASM_OP is defined.
(default_elf_asm_output_limited_string): Emit opening '"' together
with STRING_ASM_OP.
(default_elf_asm_output_ascii): Use BASE64_ASM_OP if defined and
beneficial. Remove UB when last_null is NULL.
* configure: Regenerate.
* config.in: Regenerate.
|
|
- _5 = __atomic_fetch_or_8 (&set_work_pending_p, 1, 0);
- # DEBUG old => (long int) _5
+ _6 = .ATOMIC_BIT_TEST_AND_SET (&set_work_pending_p, 0, 1, 0, __atomic_fetch_or_8);
+ # DEBUG old => NULL
# DEBUG BEGIN_STMT
- # DEBUG D#2 => _5 & 1
+ # DEBUG D#2 => NULL
...
- _10 = ~_5;
- _8 = (_Bool) _10;
- # DEBUG ret => _8
+ _8 = _6 == 0;
+ # DEBUG ret => (_Bool) _10
confirmed. convert_atomic_bit_not does this, it checks for single_use
and removes the def, failing to release the name (which would fix this up
IIRC).
Note the function removes stmts in "wrong" order (before uses of LHS
are removed), so it requires larger surgery. And it leaks SSA names.
gcc/ChangeLog:
PR target/115872
* tree-ssa-ccp.cc (convert_atomic_bit_not): Remove use_stmt after use_nop_stmt is removed.
(optimize_atomic_bit_test_and): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr115872.c: New test.
|
|
For APX ccmp, current infrastructure will always generate cstore for
the ccmp flag user, like
cmpe %rcx, %r8
ccmpnel %rax, %rbx
seta %dil
add %rcx, %r9
add %r9, %rdx
testb %dil, %dil
je .L2
For such case, the legacy add clobbers FLAGS_REG so there should have
extra cstore to avoid the flag be reset before using it. If the
instructions between flag producer and user are NF insns, the setcc/
test sequence is not required.
Add a pass to convert legacy flag clobber insns to their NF counterpart.
The convertion only happens when
1. APX_NF enabled.
2. For a BB, cstore was find, and there are insns between such cstore
and next explicit set insn to FLAGS_REG (test or cmp).
3. All the insns found should have NF counterpart.
The pass was added after rtl-ifcvt which eliminates some branch when
profitable, which could cause some flag-clobbering insn put between
cstore and jcc.
gcc/ChangeLog:
* config/i386/i386.md (has_nf): New define_attr, add to all
nf related patterns.
* config/i386/i386-features.cc (apx_nf_convert): New function
to convert Non-NF insns to their NF counterparts.
(class pass_apx_nf_convert): New pass class.
(make_pass_apx_nf_convert): New.
* config/i386/i386-passes.def: Add pass_apx_nf_convert after
rtl_ifcvt.
* config/i386/i386-protos.h (make_pass_apx_nf_convert): Declare.
gcc/testsuite/ChangeLog:
* gcc.target/i386/apx-nf-2.c: New test.
|
|
With r15-1619-g3b9b8d6cfdf593, pr111235.c fails due to different
registers used in ldrexd instruction. The key part of this test is that
the compiler generates LDREXD. The registers used for that are pretty
much irrelevant as they are not matched with any other operations within
the test. This patch changes the test to test only for the mnemonic and
not for any of the operands.
2024-07-15 Surya Kumari Jangala <jskumari@linux.ibm.com>
gcc/testsuite:
PR testsuite/115894
* gcc.target/arm/pr111235.c: Update expected output.
|
|
The patch add the Zihintntl instructions in the prefetch pattern.
Zicbop has prefetch instructions. Zihintntl has NTL instructions.
Insert NTL instructions before prefetch instruction, if target
has Zihintntl extension.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_print_operand): Add 'L' letter
to print zihintntl instructions string.
* config/riscv/riscv.md (prefetch): Add zihintntl instructions.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/prefetch-zicbop.c: New test.
* gcc.target/riscv/prefetch-zihintntl.c: New test.
|
|
The fix at r15-1619-g3b9b8d6cfdf593 results in a rearrangement of
instructions generated for cpy_1.c. This patch fixes the expected output.
2024-07-12 Surya Kumari Jangala <jskumari@linux.ibm.com>
gcc/testsuite:
PR testsuite/115892
* gcc.target/aarch64/sve/acle/general/cpy_1.c: Update expected
output.
|
|
With r15-1619-g3b9b8d6cfdf593, there's a XPASS and a FAIL
for this test-case for cris-elf. Looking at the generated
code, _foo is indeed no longer saved in a register for CRIS.
While that looks like a regression, coremark results are the
same around this revision, so simply adjust the test-case:
remove the target-specific exceptions for cris-*-*.
* gcc.dg/tree-ssa/loop-1.c: Remove target-specific test
and xfail to adjust for recent changes in register allocation.
|
|
V3: Add Bfloat16 vector insn in generic-vector-ooo.md
v2: Rebase
Accroding to the BFloat16 spec, some vector iterators and new pattern
are added in md files.
Signed-off-by: Feng Wang <wangfeng@eswincomputing.com>
gcc/ChangeLog:
* config/riscv/generic-vector-ooo.md: Add def_insn_reservation for vector BFloat16.
* config/riscv/riscv.md: Add new insn name for vector BFloat16.
* config/riscv/vector-iterators.md: Add some iterators for vector BFloat16.
* config/riscv/vector.md: Add some attribute for vector BFloat16.
* config/riscv/vector-bfloat16.md: New file. Add insn pattern vector BFloat16.
|
|
v3: Modify warning message in riscv.cc
v2: Rebase
Accroding to the intrinsic doc, the 'Zvfbfmin' and 'Zvfbfwma' intrinsic
functions are added by this patch.
Signed-off-by: Feng Wang <wangfeng@eswincomputing.com>
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-bases.cc (class vfncvtbf16_f):
Add 'Zvfbfmin' intrinsic in bases.
(class vfwcvtbf16_f): Ditto.
(class vfwmaccbf16): Add 'Zvfbfwma' intrinsic in bases.
(BASE): Add BASE macro for 'Zvfbfmin' and 'Zvfbfwma'.
* config/riscv/riscv-vector-builtins-bases.h: Add declaration for 'Zvfbfmin' and 'Zvfbfwma'.
* config/riscv/riscv-vector-builtins-functions.def (REQUIRED_EXTENSIONS):
Add builtins def for 'Zvfbfmin' and 'Zvfbfwma'.
(vfncvtbf16_f): Ditto.
(vfncvtbf16_f_frm): Ditto.
(vfwcvtbf16_f): Ditto.
(vfwmaccbf16): Ditto.
(vfwmaccbf16_frm): Ditto.
* config/riscv/riscv-vector-builtins-shapes.cc (supports_vectype_p):
Add vector intrinsic build judgment for BFloat16.
(build_all): Ditto.
(BASE_NAME_MAX_LEN): Adjust max length.
* config/riscv/riscv-vector-builtins-types.def (DEF_RVV_F32_OPS):
Add new operand type for BFloat16.
(vfloat32mf2_t): Ditto.
(vfloat32m1_t): Ditto.
(vfloat32m2_t): Ditto.
(vfloat32m4_t): Ditto.
(vfloat32m8_t): Ditto.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_F32_OPS): Ditto.
(validate_instance_type_required_extensions):
Add required_ext checking for 'Zvfbfmin' and 'Zvfbfwma'.
* config/riscv/riscv-vector-builtins.h (enum required_ext):
Add required_ext declaration for 'Zvfbfmin' and 'Zvfbfwma'.
(reqired_ext_to_isa_name): Ditto.
(required_extensions_specified): Ditto.
(struct function_group_info): Add match case for 'Zvfbfmin' and 'Zvfbfwma'.
* config/riscv/riscv.cc (riscv_validate_vector_type):
Add required_ext checking for 'Zvfbfmin' and 'Zvfbfwma'.
|
|
According to the instruction spec of AVX512BF16, the convert from float
to BF16 is not a simple truncation. It has special handling for
denormal/nan, even for normal float it will add an extra bias according
to the least significant bit for bf number. This means we cannot use the
vcvtne2ps2bf16 for any bf16 vector shuffle.
The optimization introduced in r15-1368 adds a specific split to convert
HImode permutation with this instruction, so remove it and treat the
BFmode permutation same as HFmode.
gcc/ChangeLog:
PR target/115889
* config/i386/predicates.md (vcvtne2ps2bf_parallel): Remove.
* config/i386/sse.md (hi_cvt_bf): Remove.
(HI_CVT_BF): Likewise.
(vpermt2_sepcial_bf16_shuffle_<mode>):Likewise.
gcc/testsuite/ChangeLog:
PR target/115889
* gcc.target/i386/vpermt2-special-bf16-shufflue.c: Adjust output
scan.
|
|
v3: Rebase
v2: Rebase
The vector type of BFloat16 format is added in this patch,
subsequent extensions to zvfbfmin and zvfwma need to be based
on this patch.
Signed-off-by: Feng Wang <wangfeng@eswincomputing.com>
gcc/ChangeLog:
* config/riscv/genrvv-type-indexer.cc (bfloat16_type):
Generate bf16 vector_type and scalar_type in DEF_RVV_TYPE_INDEX.
(bfloat16_wide_type): Ditto.
(same_ratio_eew_bf16_type): Ditto.
(main): Ditto.
* config/riscv/riscv-modes.def (ADJUST_BYTESIZE):
Add vector type for BFloat16.
(RVV_WHOLE_MODES): Add vector type for BFloat16.
(RVV_FRACT_MODE): Ditto.
(RVV_NF4_MODES): Ditto.
(RVV_NF8_MODES): Ditto.
(RVV_NF2_MODES): Ditto.
* config/riscv/riscv-vector-builtins-types.def (vbfloat16mf4_t):
Add builtin vector type for BFloat16.
(vbfloat16mf2_t): Add builtin vector type for BFloat16.
(vbfloat16m1_t): Ditto.
(vbfloat16m2_t): Ditto.
(vbfloat16m4_t): Ditto.
(vbfloat16m8_t): Ditto.
(vbfloat16mf4x2_t): Ditto.
(vbfloat16mf4x3_t): Ditto.
(vbfloat16mf4x4_t): Ditto.
(vbfloat16mf4x5_t): Ditto.
(vbfloat16mf4x6_t): Ditto.
(vbfloat16mf4x7_t): Ditto.
(vbfloat16mf4x8_t): Ditto.
(vbfloat16mf2x2_t): Ditto.
(vbfloat16mf2x3_t): Ditto.
(vbfloat16mf2x4_t): Ditto.
(vbfloat16mf2x5_t): Ditto.
(vbfloat16mf2x6_t): Ditto.
(vbfloat16mf2x7_t): Ditto.
(vbfloat16mf2x8_t): Ditto.
(vbfloat16m1x2_t): Ditto.
(vbfloat16m1x3_t): Ditto.
(vbfloat16m1x4_t): Ditto.
(vbfloat16m1x5_t): Ditto.
(vbfloat16m1x6_t): Ditto.
(vbfloat16m1x7_t): Ditto.
(vbfloat16m1x8_t): Ditto.
(vbfloat16m2x2_t): Ditto.
(vbfloat16m2x3_t): Ditto.
(vbfloat16m2x4_t): Ditto.
(vbfloat16m4x2_t): Ditto.
* config/riscv/riscv-vector-builtins.cc (check_required_extensions):
Add required_ext checking for BFloat16.
* config/riscv/riscv-vector-builtins.def (vbfloat16mf4_t):
Add vector_type for BFloat16 in builtins.def.
(vbfloat16mf4x2_t): Ditto.
(vbfloat16mf4x3_t): Ditto.
(vbfloat16mf4x4_t): Ditto.
(vbfloat16mf4x5_t): Ditto.
(vbfloat16mf4x6_t): Ditto.
(vbfloat16mf4x7_t): Ditto.
(vbfloat16mf4x8_t): Ditto.
(vbfloat16mf2_t): Ditto.
(vbfloat16mf2x2_t): Ditto.
(vbfloat16mf2x3_t): Ditto.
(vbfloat16mf2x4_t): Ditto.
(vbfloat16mf2x5_t): Ditto.
(vbfloat16mf2x6_t): Ditto.
(vbfloat16mf2x7_t): Ditto.
(vbfloat16mf2x8_t): Ditto.
(vbfloat16m1_t): Ditto.
(vbfloat16m1x2_t): Ditto.
(vbfloat16m1x3_t): Ditto.
(vbfloat16m1x4_t): Ditto.
(vbfloat16m1x5_t): Ditto.
(vbfloat16m1x6_t): Ditto.
(vbfloat16m1x7_t): Ditto.
(vbfloat16m1x8_t): Ditto.
(vbfloat16m2_t): Ditto.
(vbfloat16m2x2_t): Ditto.
(vbfloat16m2x3_t): Ditto.
(vbfloat16m2x4_t): Ditto.
(vbfloat16m4_t): Ditto.
(vbfloat16m4x2_t): Ditto.
(vbfloat16m8_t): Ditto.
(double_trunc_bfloat_scalar): Add scalar_type def for BFloat16.
(double_trunc_bfloat_vector): Add vector_type def for BFloat16.
* config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_ELEN_BF_16):
Add required defination of BFloat16 ext.
* config/riscv/riscv-vector-switch.def (ENTRY):
Add vector_type information for BFloat16.
(TUPLE_ENTRY): Add tuple vector_type information for BFloat16.
|
|
|
|
This is a minor change to restore bootstrap on systems using gcc 4.8
as a host compiler. The fatal error is:
In file included from gcc/gcc/coretypes.h:471:0,
from gcc/gcc/config/i386/i386-expand.cc:23:
gcc/gcc/config/i386/i386-expand.cc: In function 'void ix86_expand_fp_absneg_operator(rtx_code, machine_mode, rtx_def**)':
./insn-modes.h:315:75: error: temporary of non-literal type 'scalar_float_mode' in a constant expression
#define HFmode (scalar_float_mode ((scalar_float_mode::from_int) E_HFmode))
^
gcc/gcc/config/i386/i386-expand.cc:2179:8: note: in expansion of macro 'HFmode'
case HFmode:
^
The solution is to use the E_?Fmode enumeration constants as case values
in switch statements.
2024-07-14 Roger Sayle <roger@nextmovesoftware.com>
* config/i386/i386-expand.cc (ix86_expand_fp_absneg_operator):
Use E_?Fmode enumeration constants in switch statement.
(ix86_expand_copysign): Likewise.
(ix86_expand_xorsign): Likewise.
|
|
Warn about the following:
char s[3] = "foo";
Initializing a char array with a string literal of the same length as
the size of the array is usually a mistake. Rarely is the case where
one wants to create a non-terminated character sequence from a string
literal.
In some cases, for writing faster code, one may want to use arrays
instead of pointers, since that removes the need for storing an array of
pointers apart from the strings themselves.
char *log_levels[] = { "info", "warning", "err" };
vs.
char log_levels[][7] = { "info", "warning", "err" };
This forces the programmer to specify a size, which might change if a
new entry is later added. Having no way to enforce null termination is
very dangerous, however, so it is useful to have a warning for this, so
that the compiler can make sure that the programmer didn't make any
mistakes. This warning catches the bug above, so that the programmer
will be able to fix it and write:
char log_levels[][8] = { "info", "warning", "err" };
This warning already existed as part of -Wc++-compat, but this patch
allows enabling it separately. It is also included in -Wextra, since
it may not always be desired (when unterminated character sequences are
wanted), but it's likely to be desired in most cases.
Since Wc++-compat now includes this warning, the test has to be modified
to expect the text of the new warning too, in <gcc.dg/Wcxx-compat-14.c>.
Link: https://lists.gnu.org/archive/html/groff/2022-11/msg00059.html
Link: https://lists.gnu.org/archive/html/groff/2022-11/msg00063.html
Link: https://inbox.sourceware.org/gcc/36da94eb-1cac-5ae8-7fea-ec66160cf413@gmail.com/T/
PR c/115185
gcc/c-family/ChangeLog:
* c.opt: Add -Wunterminated-string-initialization.
gcc/c/ChangeLog:
* c-typeck.cc (digest_init): Separate warnings about character
arrays being initialized as unterminated character sequences
with string literals, from -Wc++-compat, into a new warning,
-Wunterminated-string-initialization.
gcc/ChangeLog:
* doc/invoke.texi: Document the new
-Wunterminated-string-initialization.
gcc/testsuite/ChangeLog:
* gcc.dg/Wcxx-compat-14.c: Adapt the test to match the new text
of the warning, which doesn't say anything about C++ anymore.
* gcc.dg/Wunterminated-string-initialization.c: New test.
Acked-by: Doug McIlroy <douglas.mcilroy@dartmouth.edu>
Acked-by: Mike Stump <mikestump@comcast.net>
Reviewed-by: Sandra Loosemore <sloosemore@baylibre.com>
Reviewed-by: Martin Uecker <uecker@tugraz.at>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
Reviewed-by: Marek Polacek <polacek@redhat.com>
|
|
* config/cris/cris.cc (cris_option_override_after_change): Fix up
comment regarding disabling late_combine.
|
|
With late-combine, performance for coremark compiled for cris-elf
regresses 2.6% by performance and by size 0.4%, measured at
r15-2005-g13757e50ff0b, when compiled with "-O2 -march=v10".
Earlier, at r15-1880-gce34fcc572a0, numbers were by performance 3.2%
and by size 0.4%, even with the proposed patch to PR115883 (TL;DR: a
presumed bug in LRA or combine exposed by late-combine). Without that
patch, about the same performance results (at that revision).
Similarly around the late-combine commit (r15-1579-g792f97b44ffc5e).
I briefly looked at the performance regression for coremark at
r15-2005-g13757e50ff0b (with/without this patch) as far as seeing that
the stack-frame grew larger (maxing out on hard registers and needing
one more slot) for at least two of the top three* functions that
regressed the most in terms of cycles per call:
matrix_mul_matrix_bitextract (in coremark, 17% slower) and __subdf3
(in libgcc, 6.7% slower). That makes sense when considering that
late-combine "naturally" stretches register life-times. But, looking
at late_combine::combine_into_uses and late_combine::optimizable_set,
nothing stood out to me. I guess there's improvement opportunities in
late_combine::check_register_pressure.
(*) I opted not to look at _dtoa_r (in newlib) mostly because it's
boring and always shows up when something in gcc goes sideways. (It
maxes out on hard registers and is big. End of story.)
Note that the change of default is done in the
TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE worker, not in the
TARGET_OPTION_OVERRIDE worker for reasons stated in the
comment.
* config/cris/cris.cc (cris_option_override_after_change): New
function. Disable late-combine by default.
(cris_option_override): Call the new function.
|
|
|
|
gcc/
* dwarf2codeview.cc (write_lf_modifier): Expand upon comment.
|
|
CodeView symbols have to be multiples of four bytes; add an alignment
directive to write_data_symbol to ensure this.
Note that these can be zeroes, so we can rely on GAS to do this for us;
it's only types that need f3, f2, f1 values.
gcc/
* dwarf2codeview.cc (write_data_symbol): Add alignment directive.
|
|
Adds names for the padding magic numbers to enum cv_leaf_type.
gcc/
* dwarf2codeview.cc (enum cv_leaf_type): Add padding constants.
(write_cv_padding): Use names for padding constants.
|
|
Make everything more gdb-friendly by using an enum for symbol constants
rather than #defines.
gcc/
* dwarf2codeview.cc (S_LDATA32, S_GDATA32, S_COMPILE3): Undefine.
(enum cv_sym_type): Define.
(struct codeview_symbol): Use enum cv_sym_type.
(write_codeview_symbols): Add default to switch.
|