|
With respective two-operand bitwise operations now expressible by a
single VPTERNLOG, add splitters to also deal with ior and xor
counterparts of the original and-only case. Note that the splitters need
to be separate, as the placement of "not" differs in the final insns
(*iornot<mode>3, *xnor<mode>3) which are intended to pick up one half of
the result.
gcc/
PR target/100711
* config/i386/sse.md: New splitters to simplify
not;vec_duplicate;{ior,xor} as vec_duplicate;{iornot,xnor}.
gcc/testsuite/
PR target/100711
* gcc.target/i386/pr100711-4.c: New test.
* gcc.target/i386/pr100711-5.c: New test.
|
|
The intended broadcast (with AVX512) can very well be done right from
memory.
gcc/
PR target/100711
* config/i386/sse.md: Permit non-immediate operand 1 in AVX2
form of splitter for PR target/100711.
|
|
The following adjusts the tree.def documentation about VEC_PERM_EXPR
which wasn't adjusted when the restrictions of permutes with constant
mask were relaxed.
PR middle-end/110541
* tree.def (VEC_PERM_EXPR): Adjust documentation to reflect
reality.
|
|
When it's the memory operand which is to be inverted, using VPANDN*
requires a further load instruction. The same can be achieved by a
single VPTERNLOG*. Add two new alternatives (for plain memory and
embedded broadcast), adjusting the predicate for the first operand
accordingly.
Two pre-existing testcases actually end up being affected (improved) by
the change, which is reflected in updated expectations there.
gcc/
PR target/93768
* config/i386/sse.md (*andnot<mode>3): Add new alternatives
for memory form operand 1.
gcc/testsuite/
PR target/93768
* gcc.target/i386/avx512f-andn-di-zmm-2.c: New test.
* gcc.target/i386/avx512f-andn-si-zmm-2.c: Adjust expectations
towards generated code.
* gcc.target/i386/pr100711-3.c: Adjust expectations for 32-bit
code.
|
|
All combinations of and, ior, xor, and not involving two operands can be
expressed that way in a single insn.
gcc/
PR target/93768
* config/i386/i386.cc (ix86_rtx_costs): Further special-case
bitwise vector operations.
* config/i386/sse.md (*iornot<mode>3): New insn.
(*xnor<mode>3): Likewise.
(*<nlogic><mode>3): Likewise.
(andor): New code iterator.
(nlogic): New code attribute.
(ternlog_nlogic): Likewise.
gcc/testsuite/
PR target/93768
* gcc.target/i386/avx512-binop-not-1.h: New.
* gcc.target/i386/avx512-binop-not-2.h: New.
* gcc.target/i386/avx512f-orn-si-zmm-1.c: New test.
* gcc.target/i386/avx512f-orn-si-zmm-2.c: New test.
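As a hedged illustration of why every two-operand and/ior/xor/not combination fits a single VPTERNLOG: the instruction's 8-bit immediate is a truth table indexed by one bit from each of its three inputs. The scalar C model below is only a sketch, not GCC code; `ternlog64`, the `TL_*` constants and the `imm_*` helpers are names made up for this example.

```c
#include <stdint.h>

/* Truth-table columns of the three inputs: evaluating a boolean
   expression on these constants yields the immediate for it.  */
enum { TL_A = 0xF0, TL_B = 0xCC, TL_C = 0xAA };

/* Scalar model of a 64-bit ternary-logic operation: for each bit
   position, the bits of a, b and c form a 3-bit index into imm.  */
static uint64_t ternlog64(uint64_t a, uint64_t b, uint64_t c, uint8_t imm)
{
    uint64_t r = 0;
    for (int i = 0; i < 64; i++) {
        unsigned idx = (unsigned)((((a >> i) & 1) << 2)
                                  | (((b >> i) & 1) << 1)
                                  | ((c >> i) & 1));
        r |= (uint64_t)((imm >> idx) & 1) << i;
    }
    return r;
}

/* Immediates for two-operand forms; the first input is ignored.  */
static uint8_t imm_andnot(void) { return (uint8_t)(~TL_B & TL_C); }
static uint8_t imm_iornot(void) { return (uint8_t)(~TL_B | TL_C); }
static uint8_t imm_xnor(void)   { return (uint8_t)~(TL_B ^ TL_C); }
```

For instance, `ternlog64(0, b, c, imm_iornot())` yields `~b | c` whatever the first argument is, which is the property the new insns rely on.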
|
|
* tree-vect-stmts.cc (vect_mark_relevant): Fix typo.
|
|
gcc/ChangeLog:
* config/riscv/vector.md: Add float16 attr at sew, vlmul and ratio.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/abi-10.c: Add float16 tuple type case.
* gcc.target/riscv/rvv/base/abi-11.c: Ditto.
* gcc.target/riscv/rvv/base/abi-12.c: Ditto.
* gcc.target/riscv/rvv/base/abi-15.c: Ditto.
* gcc.target/riscv/rvv/base/abi-8.c: Ditto.
* gcc.target/riscv/rvv/base/abi-9.c: Ditto.
* gcc.target/riscv/rvv/base/abi-17.c: New test.
* gcc.target/riscv/rvv/base/abi-18.c: New test.
|
|
This patch adds support for the float16 tuple type.
gcc/ChangeLog:
* config/riscv/genrvv-type-indexer.cc (valid_type): Enable FP16 tuple.
* config/riscv/riscv-modes.def (RVV_TUPLE_MODES): New macro.
(ADJUST_ALIGNMENT): Ditto.
(RVV_TUPLE_PARTIAL_MODES): Ditto.
(ADJUST_NUNITS): Ditto.
* config/riscv/riscv-vector-builtins-types.def (vfloat16mf4x2_t):
New types.
(vfloat16mf4x3_t): Ditto.
(vfloat16mf4x4_t): Ditto.
(vfloat16mf4x5_t): Ditto.
(vfloat16mf4x6_t): Ditto.
(vfloat16mf4x7_t): Ditto.
(vfloat16mf4x8_t): Ditto.
(vfloat16mf2x2_t): Ditto.
(vfloat16mf2x3_t): Ditto.
(vfloat16mf2x4_t): Ditto.
(vfloat16mf2x5_t): Ditto.
(vfloat16mf2x6_t): Ditto.
(vfloat16mf2x7_t): Ditto.
(vfloat16mf2x8_t): Ditto.
(vfloat16m1x2_t): Ditto.
(vfloat16m1x3_t): Ditto.
(vfloat16m1x4_t): Ditto.
(vfloat16m1x5_t): Ditto.
(vfloat16m1x6_t): Ditto.
(vfloat16m1x7_t): Ditto.
(vfloat16m1x8_t): Ditto.
(vfloat16m2x2_t): Ditto.
(vfloat16m2x3_t): Ditto.
(vfloat16m2x4_t): Ditto.
(vfloat16m4x2_t): Ditto.
* config/riscv/riscv-vector-builtins.def (vfloat16mf4x2_t): New macro.
(vfloat16mf4x3_t): Ditto.
(vfloat16mf4x4_t): Ditto.
(vfloat16mf4x5_t): Ditto.
(vfloat16mf4x6_t): Ditto.
(vfloat16mf4x7_t): Ditto.
(vfloat16mf4x8_t): Ditto.
(vfloat16mf2x2_t): Ditto.
(vfloat16mf2x3_t): Ditto.
(vfloat16mf2x4_t): Ditto.
(vfloat16mf2x5_t): Ditto.
(vfloat16mf2x6_t): Ditto.
(vfloat16mf2x7_t): Ditto.
(vfloat16mf2x8_t): Ditto.
(vfloat16m1x2_t): Ditto.
(vfloat16m1x3_t): Ditto.
(vfloat16m1x4_t): Ditto.
(vfloat16m1x5_t): Ditto.
(vfloat16m1x6_t): Ditto.
(vfloat16m1x7_t): Ditto.
(vfloat16m1x8_t): Ditto.
(vfloat16m2x2_t): Ditto.
(vfloat16m2x3_t): Ditto.
(vfloat16m2x4_t): Ditto.
(vfloat16m4x2_t): Ditto.
* config/riscv/riscv-vector-switch.def (TUPLE_ENTRY): New.
* config/riscv/riscv.md: New.
* config/riscv/vector-iterators.md: New.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/tuple-28.c: New test.
* gcc.target/riscv/rvv/base/tuple-29.c: New test.
* gcc.target/riscv/rvv/base/tuple-30.c: New test.
* gcc.target/riscv/rvv/base/tuple-31.c: New test.
* gcc.target/riscv/rvv/base/tuple-32.c: New test.
|
|
A mips16e2-related test fails after the ifcvt change. The mips16e2
addition also causes a test for an unrelated module to fail.
This patch adjusts branch costs when running the two affected tests.
These tests should not require the -mbranch-cost option, and
this issue needs to be addressed.
gcc/testsuite/ChangeLog:
* gcc.target/mips/mips16e2-cmov.c: Adjust branch cost to
encourage if-conversion.
* gcc.target/mips/movcc-3.c: Same as above.
|
|
|
|
The problem here is that we might produce values outside the type's
min/max range (and/or its valid values, e.g. for signed booleans). The fix is to
use an integer type which has the same precision and signedness
as the original type.
Note two_value_replacement in phiopt had the same issue in previous
versions; though I don't know if a problem will show up there.
OK? Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
PR tree-optimization/110487
* match.pd (a !=/== CST1 ? CST2 : CST3): Always
build a nonstandard integer and use that.
|
|
This fixes the first part of this bug, where `a ? -1 : 0`
would put a value of 1 into a signed boolean.
The fix casts to an integer type of the same size and
signedness before negating, and then casts the result
to the type of the expression.
OK? Bootstrapped and tested on x86_64.
gcc/ChangeLog:
* match.pd (a?-1:0): Cast to an integer type
rather than to the type before the negation.
(a?0:-1): Likewise.
|
|
gcc/ChangeLog:
* config/xtensa/xtensa.cc (machine_function, xtensa_expand_prologue):
Change to use HARD_REG_BIT and its macros.
* config/xtensa/xtensa.md
(peephole2: regmove elimination during DFmode input reload):
Likewise.
|
|
The following makes sure to not make conditional undefs in PHI arguments
unconditional by folding cond ? arg1 : arg2.
PR tree-optimization/110491
* tree-ssa-phiopt.cc (match_simplify_replacement): Check
whether the PHI args are possibly undefined before folding
the COND_EXPR.
* gcc.dg/torture/pr110491.c: New testcase.
|
|
The machine mode was already extended from 8 to 16 bits, but one place
in the streamer was missed: it still has a hard-coded array of size 256
for the machine mode table.
In the LTO pass, we memset the array by MAX_MACHINE_MODE entries, but the
value of MAX_MACHINE_MODE grows as more and more modes are added, while
the machine mode array in tree-streamer stays at 256 entries.
Then, when MAX_MACHINE_MODE is greater than 256, the memset in
lto_output_init_mode_table writes out of bounds.
This patch takes MAX_MACHINE_MODE as the size of the array in the
streamer, so that no such out-of-bounds memory access can happen in
future. It also adjusts the places that assume
MAX_MACHINE_MODE <= 256.
Care is taken that for offload compilation, we interpret the stream-in
data in terms of the host 'MAX_MACHINE_MODE' ('file_data->mode_bits'),
which very likely is different from the offload device
'MAX_MACHINE_MODE'.
gcc/
* lto-streamer-in.cc (lto_input_mode_table): Stream in the mode
bits for machine mode table.
* lto-streamer-out.cc (lto_write_mode_table): Stream out the
HOST machine mode bits.
* lto-streamer.h (struct lto_file_decl_data): New field mode_bits.
* tree-streamer.cc (streamer_mode_table): Take MAX_MACHINE_MODE
as the table size.
* tree-streamer.h (streamer_mode_table): Ditto.
(bp_pack_machine_mode): Take 1 << ceil_log2 (MAX_MACHINE_MODE)
as the packing limit.
(bp_unpack_machine_mode): Ditto with 'file_data->mode_bits'.
gcc/lto/
* lto-common.cc (lto_file_finalize) [!ACCEL_COMPILER]: Initialize
'file_data->mode_bits'.
Signed-off-by: Pan Li <pan2.li@intel.com>
Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
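The packing limit named in the ChangeLog, 1 << ceil_log2 (MAX_MACHINE_MODE), can be pictured with a small stand-alone sketch. This is not the streamer code: `ceil_log2_u` and `mode_packing_limit` are hypothetical helpers written for this example, and the mode counts used with them are arbitrary.

```c
/* Smallest n such that (1u << n) >= x, for 1 <= x <= 2^31.  */
static unsigned ceil_log2_u(unsigned x)
{
    unsigned n = 0;
    while ((1u << n) < x)
        n++;
    return n;
}

/* Exclusive upper bound for packing a mode number, derived from the
   number of modes rather than from a fixed 8-bit assumption.  */
static unsigned mode_packing_limit(unsigned max_machine_mode)
{
    return 1u << ceil_log2_u(max_machine_mode);
}
```

With 256 modes the limit stays at 256 (8 bits); with 257 modes it becomes 512 (9 bits), which is why a table fixed at 256 entries stops being sufficient once more modes are added.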
|
|
... instead of just 'unsigned char *mode_table'. Preparation for a forthcoming
change, where we need to capture an additional 'file_data' item, so it seems
easier to just capture that one proper.
gcc/
* lto-streamer.h (class lto_input_block): Capture
'lto_file_decl_data *file_data' instead of just
'unsigned char *mode_table'.
* ipa-devirt.cc (ipa_odr_read_section): Adjust.
* ipa-fnsummary.cc (inline_read_section): Likewise.
* ipa-icf.cc (sem_item_optimizer::read_section): Likewise.
* ipa-modref.cc (read_section): Likewise.
* ipa-prop.cc (ipa_prop_read_section, read_replacements_section):
Likewise.
* ipa-sra.cc (isra_read_summary_section): Likewise.
* lto-cgraph.cc (input_cgraph_opt_section): Likewise.
* lto-section-in.cc (lto_create_simple_input_block): Likewise.
* lto-streamer-in.cc (lto_read_body_or_constructor)
(lto_input_toplevel_asms): Likewise.
* tree-streamer.h (bp_unpack_machine_mode): Likewise.
gcc/lto/
* lto-common.cc (lto_read_decls): Adjust.
|
|
The following removes gimple_uses_undefined_value_p and instead
uses the conservative mark_ssa_maybe_undefs in PHI-OPT, the last
user of the other API.
* tree-ssa-phiopt.cc (pass_phiopt::execute): Mark SSA undefs.
(empty_bb_or_one_feeding_into_p): Check for them.
* tree-ssa.h (gimple_uses_undefined_value_p): Remove.
* tree-ssa.cc (gimple_uses_undefined_value_p): Likewise.
|
|
The following removes an unnecessary check.
* tree-vect-loop.cc (vect_analyze_loop_costing): Remove
check guarding scalar_niter underflow.
|
|
This is a new testcase for the fixed bug.
PR tree-optimization/110376
* gcc.dg/torture/pr110376.c: New testcase.
|
|
slp_done_for_suggested_uf is used directly in vect_analyze_loop_2
without initialization, which is undefined behavior. Initialize it to false
according to the discussion.
gcc/ChangeLog:
PR tree-optimization/110531
* tree-vect-loop.cc (vect_analyze_loop_1): Initialize
slp_done_for_suggested_uf to false.
|
|
The following replaces the simplistic gimple_uses_undefined_value_p
with the conservative mark_ssa_maybe_undefs approach as already
used by LIM and IVOPTs. This is to avoid exposing an unconditional
uninitialized read on a path from entry by if-combine.
PR tree-optimization/110228
* tree-ssa-ifcombine.cc (pass_tree_ifcombine::execute):
Mark SSA may-undefs.
(bb_no_side_effects_p): Check stmt uses for undefs.
* gcc.dg/torture/pr110228.c: New testcase.
* gcc.dg/uninit-pr101912.c: Un-XFAIL.
|
|
When we compute liveness and relevance we have to make sure to
handle live but not relevant stmts in a way we can later vectorize
them. When the stmt uses only operands that do not need vectorization
we can just leave such stmts in place - but not in the case they
are recognized as patterns. Since we don't have a way to cancel
pattern recognition we have to force mark such stmts as relevant.
PR tree-optimization/110436
* tree-vect-stmts.cc (vect_mark_relevant): Expand dumping,
force live but not relevant pattern stmts relevant.
* gcc.dg/pr110436.c: New testcase.
|
|
Enable ENQCMD and UINTR for march=sierraforest according to Intel ISE
https://cdrdv2.intel.com/v1/dl/getContent/671368
gcc/ChangeLog
* config/i386/i386.h: Add PTA_ENQCMD and PTA_UINTR to PTA_SIERRAFOREST.
* doc/invoke.texi: Update new isa to march=sierraforest and grandridge.
|
|
This relaxes the condition under which Expand_Assign_Array leaves the
assignment to or from an array slice untouched. The main prerequisite
for the code generator is that everything be aligned on byte boundaries
and Is_Possibly_Unaligned_Slice is too strong a predicate for this, so
it is replaced by the combination of Possible_Bit_Aligned_Component and
Is_Bit_Packed_Array, modulo a change to Possible_Bit_Aligned_Component
to take into account the specific case of slices.
gcc/ada/
* exp_ch5.adb (Expand_Assign_Array): Adjust comment above the
calls to Possible_Bit_Aligned_Component on the LHS and RHS. Do not
call Is_Possibly_Unaligned_Slice in the slice case.
* exp_util.ads (Component_May_Be_Bit_Aligned): Add For_Slice
boolean parameter.
(Possible_Bit_Aligned_Component): Likewise.
* exp_util.adb (Component_May_Be_Bit_Aligned): Do not return False
for the slice of a small record or bit-packed array component.
(Possible_Bit_Aligned_Component): Pass For_Slice in recursive
calls, except in the slice case where True is passed, as well as
in call to Component_May_Be_Bit_Aligned.
|
|
The procedure is not stable under repeated invocation. Now it may be called
twice on the same node, for example during the expansion of the renaming of
the predefined equality operator after the unchecked union type is frozen.
gcc/ada/
* exp_ch4.ads (Expand_Unchecked_Union_Equality): Only take a
single parameter.
* exp_ch4.adb (Expand_Unchecked_Union_Equality): Add guard against
repeated invocation on the same node.
* exp_ch6.adb (Expand_Call): Only pass a single actual parameter
in the call to Expand_Unchecked_Union_Equality.
|
|
gcc/ada/
* doc/gnat_rm/standard_and_implementation_defined_restrictions.rst:
add No_Use_Of_Attribute & No_Use_Of_Pragma restrictions.
* gnat_rm.texi: Regenerate.
* gnat_ugn.texi: Regenerate.
|
|
The query Inherited_Subprograms was returning a list containing
some subprograms whose overriding was also in the list when
interfaces were present. This was an issue for GNATprove. Add
a mode to this function that filters out overridden primitives.
gcc/ada/
* sem_disp.adb (Inherited_Subprograms): Add parameter to filter
out results.
* sem_disp.ads: Likewise.
|
|
When trying to associate (v + INT_MAX) + INT_MAX we are using
the TREE_OVERFLOW bit to check for correctness. That isn't
working for VECTOR_CSTs and it can't in general when one considers
VL vectors. It looks like it should work for COMPLEX_CSTs but
I didn't try to single out _Complex int in this change.
The following makes sure that for vectors we use the fallback of
using unsigned arithmetic when associating the above to
v + (INT_MAX + INT_MAX).
PR middle-end/110495
* tree.h (TREE_OVERFLOW): Do not mention VECTOR_CSTs
since we do not set TREE_OVERFLOW on those since the
introduction of VL vectors.
* match.pd (x +- CST +- CST): For VECTOR_CST do not look
at TREE_OVERFLOW to determine validity of association.
* gcc.dg/tree-ssa/addadd-2.c: Amend.
* gcc.dg/tree-ssa/forwprop-27.c: Adjust.
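The unsigned-arithmetic fallback can be shown at source level with a scalar analogy. GCC performs this on trees in match.pd; the function below is merely a sketch with a made-up name, and it assumes that `unsigned` has the same width as `int`.

```c
#include <limits.h>

/* Computes (v + INT_MAX) + INT_MAX as v + (INT_MAX + INT_MAX).
   The constant sum would overflow in signed arithmetic, so both the
   constant and the addition are done in unsigned arithmetic, where
   wraparound is well defined.  The final conversion back to int is
   value-preserving whenever the original expression did not overflow.  */
static int add_int_max_twice(int v)
{
    unsigned c = (unsigned)INT_MAX + (unsigned)INT_MAX;
    return (int)((unsigned)v + c);
}
```

The original expression is only defined for `v == INT_MIN` and `v == -INT_MAX`; for those inputs the reassociated form returns `INT_MAX - 1` and `INT_MAX` respectively, matching the unreassociated result.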
|
|
The following moves the late decision to elide vectorized epilogues to
the analysis phase and also avoids altering the epilogue's niter.
The costing part from vect_determine_partial_vectors_and_peeling is
moved to vect_analyze_loop_costing where we use the main loop
analysis to constrain the epilogue scalar iterations.
I have not tried to integrate this with vect_known_niters_smaller_than_vf.
It seems the for_epilogue_p parameter in
vect_determine_partial_vectors_and_peeling is largely useless and
we could compute that in the function itself.
PR tree-optimization/110310
* tree-vect-loop.cc (vect_determine_partial_vectors_and_peeling):
Move costing part ...
(vect_analyze_loop_costing): ... here. Integrate better
estimate for epilogues from ...
(vect_analyze_loop_2): Call vect_determine_partial_vectors_and_peeling
with actual epilogue status.
* tree-vect-loop-manip.cc (vect_do_peeling): ... here and
avoid cancelling epilogue vectorization.
(vect_update_epilogue_niters): Remove. No longer update
epilogue LOOP_VINFO_NITERS.
* gcc.target/i386/pr110310.c: New testcase.
* gcc.dg/vect/slp-perm-12.c: Disable epilogue vectorization.
|
|
This reverts commit 3d95a524d4746ceb3065f92f30a5679afb88d16a.
gcc/ChangeLog:
* config/riscv/vector.md: Revert changes.
|
|
Hi, Richi and Richard.
Based on the review comments from Richard:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623405.html
I changed the len_mask_gather_load/len_mask_scatter_store operand order to:
{len,bias,mask}
The len and mask arguments are now added using add_len_and_mask_args,
the same as for partial_load/partial_store.
The code is now more reasonable and easier to maintain.
This patch adds LEN_MASK_{GATHER_LOAD,SCATTER_STORE} to allow targets to
handle flow control by mask and loop control by length on gather/scatter memory
operations. Consider the following case:
void
f (uint8_t *restrict a,
uint8_t *restrict b, int n,
int base, int step,
int *restrict cond)
{
for (int i = 0; i < n; ++i)
{
if (cond[i])
a[i * step + base] = b[i * step + base];
}
}
We hope RVV can vectorize such a case into the following IR:
loop_len = SELECT_VL
control_mask = comparison
v = LEN_MASK_GATHER_LOAD (.., loop_len, bias, control_mask)
LEN_MASK_SCATTER_STORE (... v, ..., loop_len, bias, control_mask)
This patch doesn't use these patterns in the vectorizer yet; it just adds
the patterns and updates the documentation.
I will send a patch that uses these patterns in the vectorizer soon after
this patch is approved.
Ok for trunk?
gcc/ChangeLog:
* doc/md.texi: Add len_mask_gather_load/len_mask_scatter_store.
* internal-fn.cc (expand_scatter_store_optab_fn): Ditto.
(expand_gather_load_optab_fn): Ditto.
(internal_load_fn_p): Ditto.
(internal_store_fn_p): Ditto.
(internal_gather_scatter_fn_p): Ditto.
(internal_fn_len_index): Ditto.
(internal_fn_mask_index): Ditto.
(internal_fn_stored_value_index): Ditto.
* internal-fn.def (LEN_MASK_GATHER_LOAD): Ditto.
(LEN_MASK_SCATTER_STORE): Ditto.
* optabs.def (OPTAB_CD): Ditto.
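A scalar model may help to picture what the length and the mask each control in these operations. It is a sketch under assumptions: the `emu_*` names are hypothetical, offsets are plain element indices, and lanes that are inactive are simply left untouched here, which the real patterns do not promise for loads.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Lane i is active only if i < len + bias (loop control by length)
   and mask[i] is set (flow control by mask).  */
static void emu_len_mask_gather_load(uint8_t *dst, const uint8_t *base,
                                     const size_t *offsets, size_t len,
                                     int bias, const bool *mask)
{
    size_t active = (size_t)((ptrdiff_t)len + bias);
    for (size_t i = 0; i < active; i++)
        if (mask[i])
            dst[i] = base[offsets[i]];
}

/* Store counterpart: active lanes write src[i] to base[offsets[i]].  */
static void emu_len_mask_scatter_store(uint8_t *base, const size_t *offsets,
                                       const uint8_t *src, size_t len,
                                       int bias, const bool *mask)
{
    size_t active = (size_t)((ptrdiff_t)len + bias);
    for (size_t i = 0; i < active; i++)
        if (mask[i])
            base[offsets[i]] = src[i];
}
```

In the loop from the commit message, `mask[i]` would correspond to `cond[i]` and `offsets[i]` to `i * step + base`, with the length coming from SELECT_VL.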
|
|
I recently noticed that the current VSETVL pass has an unnecessary restriction on local
AVL propagation.
Consider the following case:
+ insn 1: vsetvli a5,a3,e8,mf4,ta,mu
+ insn 2: vsetvli zero,a5,e32,m1,ta,ma
+ ...
+ vle32.v v1,0(a1)
+ vsetvli a2,zero,e32,m1,ta,ma
+ vadd.vv v1,v1,v1
+ vsetvli zero,a5,e32,m1,ta,ma
+ vse32.v v1,0(a0)
+ ...
+ insn 3: sub a3,a3,a5
+ ...
We failed to elide insn 2 (a vsetvl insn) because insn 3 modifies the AVL "a3".
Actually, insn 3 is irrelevant: we only need to make sure that
no insn between insn 1 and insn 2 modifies the AVL "a3". Then we can propagate
the AVL "a3" from insn 1 to insn 2, and insn 2 is eliminated.
After this patch:
+ insn 1: vsetvli a5,a3,e8,mf4,ta,ma
+ ...
+ vle32.v v1,0(a1)
+ vsetvli a2,zero,e32,m1,ta,ma
+ vadd.vv v1,v1,v1
+ vsetvli zero,a5,e32,m1,ta,ma
+ vse32.v v1,0(a0)
+ ...
+ insn 3: sub a3,a3,a5
+ ...
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc
(vector_insn_info::parse_insn): Add early break.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/vsetvl/avl_prop-1.c: New test.
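The check described above, looking only at insns between the two vsetvl insns, can be sketched on a toy instruction list. This is not the riscv-vsetvl.cc implementation; `struct toy_insn` and `avl_unmodified_between` are hypothetical names, and each insn is reduced to the single register it writes.

```c
#include <stddef.h>

struct toy_insn {
    int dest_reg;   /* register written by this insn, or -1 for none */
};

/* Returns 1 if no insn strictly between def_idx and use_idx writes
   avl_reg.  The scan stops at use_idx, so later writes to avl_reg
   (like insn 3 in the example) do not block the propagation.  */
static int avl_unmodified_between(const struct toy_insn *insns,
                                  size_t def_idx, size_t use_idx,
                                  int avl_reg)
{
    for (size_t i = def_idx + 1; i < use_idx; i++)
        if (insns[i].dest_reg == avl_reg)
            return 0;
    return 1;
}
```

Stopping the scan at the consuming vsetvl is the toy equivalent of the early break mentioned in the ChangeLog entry.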
|
|
This is just expected to be a change in representation.
No code is expected to change; no new tests are added.
* config/cris/cris.md (CRIS_UNSPEC_SWAP_BITS): Remove.
("cris_swap_bits", "ctzsi2"): Use bitreverse instead.
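Why a count-trailing-zeros pattern can be written in terms of a bit reversal: reversing the bits turns trailing zeros into leading zeros. The portable C sketch below shows that identity only; the helper names are made up here, and the actual RTL of the CRIS patterns is not reproduced.

```c
#include <stdint.h>

/* Reverse the bit order of a 32-bit word by swapping ever larger groups.  */
static uint32_t bitreverse32(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
    x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);
    return (x >> 16) | (x << 16);
}

/* Count leading zeros; returns 32 for a zero input.  */
static unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Trailing zeros of x are the leading zeros of the reversed word.  */
static unsigned ctz32_via_bitreverse(uint32_t x)
{
    return clz32(bitreverse32(x));
}
```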
|
|
This seems to have just been overlooked when introducing
BITREVERSE. Note that the function name mem_loc_descriptor
is a misnomer; it'd better be called rtx_loc_descriptor or
any_loc_descriptor, because any RTX can end up here.
To wit, when introducing new RTL that ends up as code or for
other reasons appears in debug expressions, don't forget to
update this function. This was observed by building
libstdc++ for cris-elf with a patch replacing the
CRIS_UNSPEC_SWAP_BITS by bitreverse, which hit the
internal-error-generating default case.
Looking at the BSWAP, POPCOUNT and ROTATE cases, BITREVERSE
can probably be fully expressed as DWARF code if need be,
but let's start with not throwing an internal error.
gcc:
* dwarf2out.cc (mem_loc_descriptor): Handle BITREVERSE.
|
|
|
|
This series adds basic support for the vector crypto extensions:
* Zvbb
* Zvbc
* Zvkg
* Zvkned
* Zvkhn[a,b]
* Zvksed
* Zvksh
* Zvkn
* Zvknc
* Zvkng
* Zvks
* Zvksc
* Zvksg
* Zvkt
This patch is based on the v20230620 version of the Vector Cryptography
specification. The specification is frozen and can be found here:
https://github.com/riscv/riscv-crypto/releases/tag/v20230620
Binutils support is merged as 9fdc1b157b6e72f7dd98851a240c5fdb386a558e.
All extensions come with (passing) tests for the feature test macros.
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc: Add support for zvbb,
zvbc, zvkg, zvkned, zvknha, zvknhb, zvksed, zvksh, zvkn,
zvknc, zvkng, zvks, zvksc, zvksg, zvkt and the implied subsets.
* config/riscv/arch-canonicalize: Add canonicalization info for
zvkn, zvknc, zvkng, zvks, zvksc, zvksg.
* config/riscv/riscv-opts.h (MASK_ZVBB): New macro.
(MASK_ZVBC): Likewise.
(TARGET_ZVBB): Likewise.
(TARGET_ZVBC): Likewise.
(MASK_ZVKG): Likewise.
(MASK_ZVKNED): Likewise.
(MASK_ZVKNHA): Likewise.
(MASK_ZVKNHB): Likewise.
(MASK_ZVKSED): Likewise.
(MASK_ZVKSH): Likewise.
(MASK_ZVKN): Likewise.
(MASK_ZVKNC): Likewise.
(MASK_ZVKNG): Likewise.
(MASK_ZVKS): Likewise.
(MASK_ZVKSC): Likewise.
(MASK_ZVKSG): Likewise.
(MASK_ZVKT): Likewise.
(TARGET_ZVKG): Likewise.
(TARGET_ZVKNED): Likewise.
(TARGET_ZVKNHA): Likewise.
(TARGET_ZVKNHB): Likewise.
(TARGET_ZVKSED): Likewise.
(TARGET_ZVKSH): Likewise.
(TARGET_ZVKN): Likewise.
(TARGET_ZVKNC): Likewise.
(TARGET_ZVKNG): Likewise.
(TARGET_ZVKS): Likewise.
(TARGET_ZVKSC): Likewise.
(TARGET_ZVKSG): Likewise.
(TARGET_ZVKT): Likewise.
* config/riscv/riscv.opt: Introduction of riscv_zv{b,k}_subext.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/zvbb.c: New test.
* gcc.target/riscv/zvbc.c: New test.
* gcc.target/riscv/zvkg.c: New test.
* gcc.target/riscv/zvkn-1.c: New test.
* gcc.target/riscv/zvkn.c: New test.
* gcc.target/riscv/zvknc-1.c: New test.
* gcc.target/riscv/zvknc-2.c: New test.
* gcc.target/riscv/zvknc.c: New test.
* gcc.target/riscv/zvkned.c: New test.
* gcc.target/riscv/zvkng-1.c: New test.
* gcc.target/riscv/zvkng-2.c: New test.
* gcc.target/riscv/zvkng.c: New test.
* gcc.target/riscv/zvknha.c: New test.
* gcc.target/riscv/zvknhb.c: New test.
* gcc.target/riscv/zvks-1.c: New test.
* gcc.target/riscv/zvks.c: New test.
* gcc.target/riscv/zvksc-1.c: New test.
* gcc.target/riscv/zvksc-2.c: New test.
* gcc.target/riscv/zvksc.c: New test.
* gcc.target/riscv/zvksed.c: New test.
* gcc.target/riscv/zvksg-1.c: New test.
* gcc.target/riscv/zvksg-2.c: New test.
* gcc.target/riscv/zvksg.c: New test.
* gcc.target/riscv/zvksh.c: New test.
* gcc.target/riscv/zvkt.c: New test.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
|
|
The backtrace in the bug report suggests that the stack runs out
during garbage collection because of a long chain of eh_landing_pad_d.
This might fix that by adding chain_next to eh_landing_pad_d's GTY marker.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
PR middle-end/110510
* except.h (struct eh_landing_pad_d): Add chain_next GTY.
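The effect of a chain_next marker can be pictured as the difference between marking a linked list recursively and walking it in a loop. The sketch below is a toy model, not the gengtype output: `struct toy_pad` and both functions are hypothetical, and the real structure has more fields.

```c
#include <stddef.h>

struct toy_pad {
    struct toy_pad *next;
    int marked;
};

/* Recursive marking: uses one stack frame per list element, so a
   very long chain can exhaust the stack.  */
static void mark_recursive(struct toy_pad *p)
{
    if (p == NULL || p->marked)
        return;
    p->marked = 1;
    mark_recursive(p->next);
}

/* Iterative marking along the chain: constant stack depth regardless
   of list length.  Returns the number of newly marked elements.  */
static size_t mark_chain(struct toy_pad *p)
{
    size_t n = 0;
    for (; p != NULL && !p->marked; p = p->next) {
        p->marked = 1;
        n++;
    }
    return n;
}
```

With the chain_next option the generated marker follows the second shape, which is why it may help with the stack exhaustion in the bug report.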
|
|
The addition of the multiply_defined suppress flag has been handled for some
considerable time now in the Darwin specs; remove it from the testsuite libs.
Avoid duplicates in the specs.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/ChangeLog:
* config/darwin.h: Avoid duplicate multiply_defined specs on
earlier Darwin versions with shared libgcc.
libstdc++-v3/ChangeLog:
* testsuite/lib/libstdc++.exp: Remove additional flag handled
by Darwin specs.
gcc/testsuite/ChangeLog:
* lib/g++.exp: Remove additional flag handled by Darwin specs.
* lib/obj-c++.exp: Likewise.
|
|
Also change internal variable from int to bool.
gcc/ChangeLog:
* tree.h (tree_int_cst_equal): Change return type from int to bool.
(operand_equal_for_phi_arg_p): Ditto.
(tree_map_base_marked_p): Ditto.
* tree.cc (contains_placeholder_p): Update function body
for bool return type.
(type_cache_hasher::equal): Ditto.
(tree_map_base_hash): Change return type
from int to void and adjust function body accordingly.
(tree_int_cst_equal): Ditto.
(operand_equal_for_phi_arg_p): Ditto.
(get_narrower): Change "first" variable to bool.
(cl_option_hasher::equal): Update function body for bool return type.
* ggc.h (ggc_set_mark): Change return type from int to bool.
(ggc_marked_p): Ditto.
* ggc-page.cc (gt_ggc_mx): Change return type
from int to void and adjust function body accordingly.
(ggc_set_mark): Ditto.
|
|
Hi, Richard. I fixed the order as you suggested.
Before this patch, the order is {len,mask,bias}.
Now, after this patch, the order becomes {len,bias,mask}.
Since you said we should not need 'internal_fn_bias_index', the bias index should always be the len index + 1.
I noticed that the LEN_STORE order is {len,vector,bias}; to make them consistent, I reordered LEN_STORE into {len,bias,vector}.
Just like MASK_STORE {mask,vector}.
Ok for trunk?
gcc/ChangeLog:
* config/riscv/autovec.md: Change order of
LEN_MASK_LOAD/LEN_MASK_STORE/LEN_LOAD/LEN_STORE arguments.
* config/riscv/riscv-v.cc (expand_load_store): Ditto.
* doc/md.texi: Ditto.
* gimple-fold.cc (gimple_fold_partial_load_store_mem_ref): Ditto.
* internal-fn.cc (len_maskload_direct): Ditto.
(len_maskstore_direct): Ditto.
(add_len_and_mask_args): New function.
(expand_partial_load_optab_fn): Change order of
LEN_MASK_LOAD/LEN_MASK_STORE/LEN_LOAD/LEN_STORE arguments.
(expand_partial_store_optab_fn): Ditto.
(internal_fn_len_index): New function.
(internal_fn_mask_index): Change order of
LEN_MASK_LOAD/LEN_MASK_STORE/LEN_LOAD/LEN_STORE arguments.
(internal_fn_stored_value_index): Ditto.
(internal_len_load_store_bias): Ditto.
* internal-fn.h (internal_fn_len_index): New function.
* tree-ssa-dse.cc (initialize_ao_ref_for_dse): Change order of
LEN_MASK_LOAD/LEN_MASK_STORE/LEN_LOAD/LEN_STORE arguments.
* tree-vect-stmts.cc (vectorizable_store): Ditto.
(vectorizable_load): Ditto.
|
|
The problem is that the predefined equality operator for unchecked union
types is implemented out of line by invoking a function that takes more
parameters than the two operands, which means that the renaming is not
seen as type conforming with this function and, therefore, is rejected.
The way out is to implement these additional parameters as "extra" formal
parameters, since this kind of parameter is not taken into account for
semantic checks. The change also factors out the duplicated generation
of actuals for these additional parameters into a single procedure.
gcc/ada/
* exp_ch3.ads (Build_Variant_Record_Equality): Add Spec_Id as second
parameter.
* exp_ch3.adb (Build_Variant_Record_Equality): For unchecked union
types, build the additional parameters as extra formal parameters.
(Expand_Freeze_Record_Type.Build_Variant_Record_Equality): Pass
Empty as Spec_Id in call to Build_Variant_Record_Equality.
* exp_ch4.ads (Expand_Unchecked_Union_Equality): New procedure.
* exp_ch4.adb (Expand_Composite_Equality): In the presence of a
function implementing composite equality, do not special case the
unchecked union types, and only convert the operands if the base
types are not the same like in Build_Equality_Call.
(Build_Equality_Call): Do not special case the unchecked union types
and relocate the operands only once.
(Expand_N_Op_Eq): Do not special case the unchecked union types.
(Expand_Unchecked_Union_Equality): New procedure implementing the
specific expansion of calls to the predefined equality function.
* exp_ch6.adb (Is_Unchecked_Union_Equality): New predicate.
(Expand_Call): Call Is_Unchecked_Union_Equality to determine whether
to call Expand_Unchecked_Union_Equality or Expand_Call_Helper.
* exp_ch8.adb (Build_Body_For_Renaming): Set Has_Delayed_Freeze flag
earlier on Id and pass Id in call to Build_Variant_Record_Equality.
|
|
The expansion of the predefined equality operator for untagged record types
can be done either in line, i.e. into the component-wise comparison of the
operands, or out of line, i.e. into a call to a function implementing this
comparison, and the heuristics of the selection are essentially based on the
complexity of the implementation.
For discriminated record types with a variant part, which comprise unchecked
union types, the expansion is always done out of line. For nondiscriminated
types, the expansion is done in line, unless one of the components is of a
record type for which a user-defined equality operator exists, in which case
the expansion is done out of line.
For the third case, i.e. discriminated record types without a variant part,
the expansion is always done in line. Now given that the discriminants are
considered as mere components for the purpose of predefined equality in this
case, there does not seem to be any reason for treating it differently from
the second case above.
gcc/ada/
* exp_ch3.adb (Build_Untagged_Equality): Rename into...
(Build_Untagged_Record_Equality): ...this.
(Expand_Freeze_Record_Type): Adjust to above renaming and invoke
the procedure also for discriminated types without a variant part.
|
|
This is the clause about inferable discriminants in unchecked unions.
gcc/ada/
* sem_util.adb (Has_Inferable_Discriminants): In the case of a
component with a per-object constraint, also return true if the
enclosing object is not of an unchecked union type.
In the default case, remove a useless call to Base_Type.
|
|
The Modula-2 static analysis incorrectly identifies variables as
uninitialized if they are initialized within a WITH statement. This bug
fix re-implements the variable static analysis and will detect simple
pointer record fields being accessed before being initialized.
The static analysis is limited to the first basic block in a procedure.
It does not check variant records, arrays or sets. A new option
-Wuninit-variable-checking will turn on the new semantic checking
(-Wall also enables the new checking).
gcc/ChangeLog:
PR modula2/110125
* doc/gm2.texi (Semantic checking): Include examples using
-Wuninit-variable-checking.
gcc/m2/ChangeLog:
PR modula2/110125
* Make-lang.in (GM2-COMP-BOOT-DEFS): Add M2SymInit.def.
(GM2-COMP-BOOT-MODS): Add M2SymInit.mod.
* gm2-compiler/M2BasicBlock.mod: Formatting changes.
* gm2-compiler/M2Code.mod: Remove import of VariableAnalysis from
M2Quads. Import VariableAnalysis from M2SymInit.mod.
* gm2-compiler/M2GCCDeclare.mod (PrintVerboseFromList):
Add debugging print for a component.
(TypeConstFullyDeclared): Call RememberType for every type.
* gm2-compiler/M2GenGCC.mod (CodeReturnValue): Add parameter to
GetQuadOtok.
(CodeBecomes): Add parameter to GetQuadOtok.
(CodeXIndr): Add parameter to GetQuadOtok.
* gm2-compiler/M2Optimize.mod (ReduceBranch): Reformat and
preserve operand token positions when reducing the branch
quadruples.
(ReduceGoto): Reformat.
(FoldMultipleGoto): Reformat.
(KnownReachable): Reformat.
* gm2-compiler/M2Options.def (UninitVariableChecking): New
variable declared and exported.
(SetUninitVariableChecking): New procedure.
* gm2-compiler/M2Options.mod (SetWall): Set
UninitVariableChecking.
(SetUninitVariableChecking): New procedure.
* gm2-compiler/M2Quads.def (PutQuadOtok): Exported and declared.
(VariableAnalysis): Removed.
* gm2-compiler/M2Quads.mod (PutQuadOtok): New procedure.
(doVal): Reformatted.
(MarkAsWrite): Reformatted.
(MarkArrayAsWritten): Reformatted.
(doIndrX): Use PutQuadOtok.
(MakeRightValue): Use GenQuadOtok.
(MakeLeftValue): Use GenQuadOtok.
(CheckReadBeforeInitialized): Remove.
(IsNeverAltered): Reformat.
(DebugLocation): New procedure.
(BuildDesignatorPointer): Use GenQuadO to preserve operand token
position.
(BuildRelOp): Use GenQuadOtok ditto.
* gm2-compiler/SymbolTable.def (VarCheckReadInit): New procedure.
(VarInitState): New procedure.
(PutVarInitialized): New procedure.
(PutVarFieldInitialized): New procedure function.
(GetVarFieldInitialized): New procedure function.
(PrintInitialized): New procedure.
* gm2-compiler/SymbolTable.mod (VarCheckReadInit): New procedure.
(VarInitState): New procedure.
(PutVarInitialized): New procedure.
(PutVarFieldInitialized): New procedure function.
(GetVarFieldInitialized): New procedure function.
(PrintInitialized): New procedure.
(LRInitDesc): New type.
(SymVar): InitState new field.
(MakeVar): Initialize InitState.
* gm2-gcc/m2options.h (M2Options_SetUninitVariableChecking):
New function declaration.
* gm2-lang.cc (gm2_langhook_handle_option): Detect
OPT_Wuninit_variable_checking and call SetUninitVariableChecking.
* lang.opt: Add Wuninit-variable-checking.
* gm2-compiler/M2SymInit.def: New file.
* gm2-compiler/M2SymInit.mod: New file.
gcc/testsuite/ChangeLog:
PR modula2/110125
* gm2/switches/uninit-variable-checking/fail/testinit.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testlarge.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testlarge2.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testrecinit.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testrecinit2.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testrecinit5.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testsmallrec.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testsmallrec2.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testsmallvec.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testvarinit.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testwithnoptr.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testwithptr.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testwithptr2.mod: New test.
* gm2/switches/uninit-variable-checking/fail/testwithptr3.mod: New test.
* gm2/switches/uninit-variable-checking/pass/testrecinit3.mod: New test.
* gm2/switches/uninit-variable-checking/pass/testrecinit5.mod: New test.
* gm2/switches/uninit-variable-checking/pass/testsmallrec.mod: New test.
* gm2/switches/uninit-variable-checking/pass/testsmallrec2.mod: New test.
* gm2/switches/uninit-variable-checking/pass/testvarinit.mod: New test.
* gm2/switches/uninit-variable-checking/pass/testwithptr.mod: New test.
* gm2/switches/uninit-variable-checking/pass/testwithptr2.mod: New test.
* gm2/switches/uninit-variable-checking/pass/testwithptr3.mod: New test.
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
|
|
Similar to vfwmacc. Add combine patterns as follows:
For vfwnmsac:
1. (set (reg) (fma (neg (float_extend (reg))) (float_extend (reg)) (reg)))
2. (set (reg) (fma (neg (float_extend (reg))) (reg) (reg)))
For vfwmsac:
1. (set (reg) (fma (float_extend (reg)) (float_extend (reg)) (neg (reg))))
2. (set (reg) (fma (float_extend (reg)) (reg) (neg (reg))))
For vfwnmacc:
1. (set (reg) (fma (neg (float_extend (reg))) (float_extend (reg)) (neg (reg))))
2. (set (reg) (fma (neg (float_extend (reg))) (reg) (neg (reg))))
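As a rough source-level illustration of the three shapes above, the loops below are a minimal sketch, not taken from the added tests. The function names are hypothetical, and whether the vectorizer and combine actually form the widening FMA patterns depends on the target options and on floating-point contraction being permitted.

```c
#include <stddef.h>

/* vfwnmsac shape: acc = -(widen(a) * widen(b)) + acc.  */
void
widen_fnma (double *restrict acc, const float *restrict a,
            const float *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    acc[i] = -((double) a[i] * (double) b[i]) + acc[i];
}

/* vfwmsac shape: acc = widen(a) * widen(b) - acc.  */
void
widen_fms (double *restrict acc, const float *restrict a,
           const float *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    acc[i] = (double) a[i] * (double) b[i] - acc[i];
}

/* vfwnmacc shape: acc = -(widen(a) * widen(b)) - acc.  */
void
widen_fnms (double *restrict acc, const float *restrict a,
            const float *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    acc[i] = -((double) a[i] * (double) b[i]) - acc[i];
}
```

In each case the negations sit where the corresponding RTL places neg: on the product for the "n" forms, and on the accumulator for the forms that subtract it.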
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*double_widen_fnma<mode>): New pattern.
(*single_widen_fnma<mode>): Ditto.
(*double_widen_fms<mode>): Ditto.
(*single_widen_fms<mode>): Ditto.
(*double_widen_fnms<mode>): Ditto.
(*single_widen_fnms<mode>): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/widen/widen-10.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-11.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-12.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-7.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-8.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-9.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-11.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-12.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-10.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-11.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-12.c: New test.
|
|
Consider the following complicated case:
__attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 ( \
TYPE1 *__restrict dst, TYPE1 *__restrict dst2, TYPE1 *__restrict dst3, \
TYPE1 *__restrict dst4, TYPE2 *__restrict a, TYPE2 *__restrict b, \
TYPE2 *__restrict a2, TYPE2 *__restrict b2, int n) \
{ \
for (int i = 0; i < n; i++) \
{ \
dst[i] = (TYPE1) a[i] * (TYPE1) b[i]; \
dst2[i] = (TYPE1) a2[i] * (TYPE1) b[i]; \
dst3[i] = (TYPE1) a2[i] * (TYPE1) a[i]; \
dst4[i] = (TYPE1) a[i] * (TYPE1) b2[i]; \
} \
}
TEST_TYPE (double, float)
In such a complicated situation, the combine pass cannot combine the extensions of both operands in one step.
Instead, it first combines one of the extensions and then the other.
The combine flow is as follows:
Original IR:
(set (reg 0) (float_extend: (reg 1)))
(set (reg 3) (float_extend: (reg 2)))
(set (reg 4) (mult: (reg 0) (reg 3)))
First step of combine:
(set (reg 3) (float_extend: (reg 2)))
(set (reg 4) (mult: (float_extend: (reg 1)) (reg 3)))
Second step of combine:
(set (reg 4) (mult: (float_extend: (reg 1)) (float_extend: (reg 2))))
So, to enhance the combine optimization, we add a "pseudo vwfmul.wv" RTL pattern in autovec-opt.md
which is (set (reg 0) (mult (float_extend (reg 1)) (reg 2))).
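In source terms, the two combine steps correspond roughly to the two loops below. This is only a sketch to show the intermediate shape the new pattern has to accept; the function names are hypothetical and do not appear in the patch or its tests.

```c
#include <stddef.h>

/* Intermediate shape (after the first combine step): one operand is
   already wide, the other is still extended inside the multiply.  */
void
mul_single_widen (double *restrict dst, const double *restrict wide,
                  const float *restrict narrow, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = (double) narrow[i] * wide[i];
}

/* Final shape (after the second combine step): both operands are
   extended inside the multiply.  */
void
mul_double_widen (double *restrict dst, const float *restrict a,
                  const float *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = (double) a[i] * (double) b[i];
}
```

Without a pattern that matches the first shape, combine would presumably reject the intermediate result and never reach the second.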
gcc/ChangeLog:
* config/riscv/autovec-opt.md (@pred_single_widen_mul<any_extend:su><mode>): Change "@"
into "*" in pattern name which simplifies build files.
(*pred_single_widen_mul<any_extend:su><mode>): Ditto.
(*pred_single_widen_mul<mode>): New pattern.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/widen/widen-3.c: Add floating-point.
* gcc.target/riscv/rvv/autovec/widen/widen-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-3.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-7.c: New test.
|
|
The documentation says:
-------------------------------------------------------------------------
@cindex @code{vec_extract@var{m}@var{n}} instruction pattern
@item @samp{vec_extract@var{m}@var{n}}
Extract given field from the vector value. [...] The
@var{n} mode is the mode of the field or vector of fields that should be
extracted, [...]
If @var{n} is a vector mode, the index is counted in units of that mode.
-------------------------------------------------------------------------
However, Robin pointed out that, in practice, the index is counted
in whole multiples of @var{n}. These are the semantics that x86
and target-independent code follow.
This patch updates the aarch64 pattern to match, which also removes
the FAIL. I think Robin has patches that update the documentation
and make more use of the de facto semantics.
I haven't found an existing testcase that shows the difference.
We do now use the pattern for:
union u { int32x4_t x; int32x2_t y[2]; };
int32x2_t f(int32x4_t x) { union u u = { x }; return u.y[1]; }
but we were already generating perfect code for it. Because of that,
it didn't really seem worth adding a specific dump test.
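The de facto index semantics can be modelled with plain arrays. The helper below is a hypothetical sketch, not GCC code: it treats the index as selecting the index-th group of sub_len elements, so index 1 with a two-element result picks elements 2 and 3 of the source.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy the INDEX-th group of SUB_LEN elements from SRC into DST.
   The element offset is INDEX * SUB_LEN, i.e. the index counts
   whole subvectors rather than single elements.  */
void
extract_subvector (int32_t *dst, const int32_t *src,
                   size_t sub_len, size_t index)
{
  memcpy (dst, src + index * sub_len, sub_len * sizeof *dst);
}
```

For a four-element source and a two-element result, the only valid indices are 0 (low half) and 1 (high half), which matches what the aarch64 pattern now expects.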
gcc/
* config/aarch64/aarch64-simd.md (vec_extract<mode><Vhalf>): Expect
the index to be 0 or 1.
|
|
This reverts commit 47e6dcb597b2d4abcab13c9dea0cc7d2131b6419.
|
|
Similar to vfwmacc. Add combine patterns as follows:
For vfwnmsac:
1. (set (reg) (fma (neg (float_extend (reg))) (float_extend (reg)) (reg)))
2. (set (reg) (fma (neg (float_extend (reg))) (reg) (reg)))
For vfwmsac:
1. (set (reg) (fma (float_extend (reg)) (float_extend (reg)) (neg (reg))))
2. (set (reg) (fma (float_extend (reg)) (reg) (neg (reg))))
For vfwnmacc:
1. (set (reg) (fma (neg (float_extend (reg))) (float_extend (reg)) (neg (reg))))
2. (set (reg) (fma (neg (float_extend (reg))) (reg) (neg (reg))))
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*double_widen_fnma<mode>): New pattern.
(*single_widen_fnma<mode>): Ditto.
(*double_widen_fms<mode>): Ditto.
(*single_widen_fms<mode>): Ditto.
(*double_widen_fnms<mode>): Ditto.
(*single_widen_fnms<mode>): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/widen/widen-10.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-11.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-12.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-7.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-8.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-9.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-11.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-12.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-10.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-11.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-12.c: New test.
|
|
This patch fixes a typo where rdn was used instead of dyn by
mistake.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/vector.md: Fix typo.
|