|
Fix test that uses -fPIC without stating the requirement for PIC
support.
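For reference, the requirement is normally stated with a dejagnu directive near
the top of the test, roughly as below; the dg-options shown are only
illustrative of what pr103074.c might already use.
/* { dg-require-effective-target fpic } */
/* { dg-options "-O2 -fPIC" } */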
for gcc/testsuite/ChangeLog
* gcc.target/i386/pr103074.c: Require fpic support.
|
|
tsvc tests all fail on systems that don't offer a malloc.h, other than
those that explicitly rule that out. Use the preprocessor to test for
malloc.h's availability.
tsvc.h also expects a definition for struct timeval, but it doesn't
include sys/time.h. Add a conditional include thereof.
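A minimal sketch of the kind of guard described, assuming __has_include is
available; the exact form used in tsvc.h may differ.
#ifdef __has_include
# if __has_include(<malloc.h>)
#  include <malloc.h>
# endif
# if __has_include(<sys/time.h>)
#  include <sys/time.h>
# endif
#endif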
for gcc/testsuite/ChangeLog
* gcc.dg/vect/tsvc/tsvc.h: Test for and conditionally include
malloc.h and sys/time.h.
|
|
Various x86 tests fail if the toolchain is configured with
--enable-frame-pointer, because the unexpected extra insns mess with
the expected asm counts. Add -fomit-frame-pointer so that they can
still pass.
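The change amounts to appending the flag to each test's dg-options, e.g. (the
other flags stand in for whatever a given test already passes):
/* { dg-options "-O2 -fomit-frame-pointer" } */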
for gcc/testsuite/ChangeLog
* gcc.target/i386/pieces-memcpy-7.c: Add -fomit-frame-pointer.
* gcc.target/i386/pieces-memcpy-8.c: Likewise.
* gcc.target/i386/pieces-memcpy-9.c: Likewise.
* gcc.target/i386/pieces-memset-1.c: Likewise.
* gcc.target/i386/pieces-memset-36.c: Likewise.
* gcc.target/i386/pieces-memset-4.c: Likewise.
* gcc.target/i386/pieces-memset-40.c: Likewise.
* gcc.target/i386/pieces-memset-41.c: Likewise.
* gcc.target/i386/pieces-memset-7.c: Likewise.
* gcc.target/i386/pieces-memset-8.c: Likewise.
* gcc.target/i386/pieces-memset-9.c: Likewise.
* gcc.target/i386/pr102230.c: Likewise.
* gcc.target/i386/pr78103-2.c: Likewise.
|
|
|
|
Provide a PHI analyzer framework to provide better initial values for
PHI nodes which form groups with initial values and single statements
which modify the PHI values in some predictable way.
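A hypothetical C loop illustrating the kind of PHI group meant here: the PHI
for i has a known initial value and is updated by a single statement in a
predictable way, so a better-than-VARYING starting range can be derived.
int f (int n)
{
  int i = 0;          /* initial value of the PHI group */
  while (i < n)
    i = i + 3;        /* single statement modifying the PHI value predictably */
  return i;           /* i can be known to be non-negative */
}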
PR tree-optimization/107822
PR tree-optimization/107986
gcc/
* Makefile.in (OBJS): Add gimple-range-phi.o.
* gimple-range-cache.h (ranger_cache::m_estimate): New
phi_analyzer pointer member.
* gimple-range-fold.cc (fold_using_range::range_of_phi): Use
phi_analyzer if no loop info is available.
* gimple-range-phi.cc: New file.
* gimple-range-phi.h: New file.
* tree-vrp.cc (execute_ranger_vrp): Utilize a phi_analyzer.
gcc/testsuite/
* gcc.dg/pr107822.c: New.
* gcc.dg/pr107986-1.c: New.
|
|
Allow fur_list and fold_stmt to be provided a range_query rather than
always defaulting to NULL (which becomes a global query).
Also provide a fold_relations () routine which can provide a range_trio
for an arbitrary statement using any range_query.
* gimple-range-fold.cc (fur_list::fur_list): Add range_query param
to constructors.
(fold_range): Add range_query parameter.
(fur_relation::fur_relation): New.
(fur_relation::trio): New.
(fur_relation::register_relation): New.
(fold_relations): New.
* gimple-range-fold.h (fold_range): Adjust prototypes.
(fold_relations): New.
|
|
By providing range_of_expr as a range_query, we can fold and do other
interesting things using values from the global table. Make ranger's
known globals available via const_query.
* gimple-range-cache.cc (ssa_cache::range_of_expr): New.
* gimple-range-cache.h (class ssa_cache): Inherit from range_query.
(ranger_cache::const_query): New.
* gimple-range.cc (gimple_ranger::const_query): New.
* gimple-range.h (gimple_ranger::const_query): New prototype.
|
|
Making them virtual allows us to interchangeably use the caches.
* gimple-range-cache.cc (ssa_cache::dump): Use get_range.
(ssa_cache::dump_range_query): Delete.
(ssa_lazy_cache::dump_range_query): Delete.
(ssa_lazy_cache::get_range): Move from header file.
(ssa_lazy_cache::clear_range): Ditto.
(ssa_lazy_cache::clear): Ditto.
* gimple-range-cache.h (class ssa_cache): Virtualize.
(class ssa_lazy_cache): Inherit and virtualize.
|
|
gcc/fortran/ChangeLog:
PR fortran/104350
* simplify.cc (simplify_size): Reject DIM argument of intrinsic SIZE
with error when out of valid range.
gcc/testsuite/ChangeLog:
PR fortran/104350
* gfortran.dg/size_dim_2.f90: New test.
|
|
gcc/fortran/ChangeLog:
PR fortran/103794
* check.cc (gfc_check_reshape): Expand constant arguments SHAPE and
ORDER before checking.
* gfortran.h (gfc_is_constant_array_expr): Add prototype.
* iresolve.cc (gfc_resolve_reshape): Expand constant argument SHAPE.
* simplify.cc (is_constant_array_expr): If array is determined to be
constant, expand small array constructors if needed.
(gfc_is_constant_array_expr): Wrapper for is_constant_array_expr.
(gfc_simplify_reshape): Fix check for insufficient elements in SOURCE
when no padding specified.
gcc/testsuite/ChangeLog:
PR fortran/103794
* gfortran.dg/reshape_10.f90: New test.
* gfortran.dg/reshape_11.f90: New test.
|
|
gcc/ChangeLog:
* value-range.h (vrange::kind): Remove.
|
|
PR middle-end/109840 is a regression introduced by my recent patch to
fold popcount(bswap(x)) as popcount(x). When the bswap and the popcount
have the same precision, everything works fine, but this optimization also
allowed a zero-extension between the two. The oversight is that we need
to be strict with type conversions, both to avoid accidentally changing
the argument type to popcount, and also to reflect the effects of
argument/return-value promotion in the call to bswap, so this zero extension
needs to be preserved/explicit in the optimized form.
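A hedged illustration of the precision mismatch, using __builtin_bswap16 and
__builtin_popcount; the new tests may use different forms.
int f (unsigned short x)
{
  /* __builtin_bswap16 yields a 16-bit value that is zero-extended to the
     int argument of __builtin_popcount; folding towards popcount(x) must
     keep that extension explicit rather than change the argument type.  */
  return __builtin_popcount (__builtin_bswap16 (x));
}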
Interestingly, match.pd should (in theory) be able to narrow calls to
popcount and parity, removing a zero-extension from its argument, but
that is an independent optimization, that needs to check IFN_ support.
Many thanks to Andrew Pinski for his help/fixes with these transformations.
2023-05-24 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR middle-end/109840
* match.pd <popcount optimizations>: Preserve zero-extension when
optimizing popcount((T)bswap(x)) and popcount((T)rotate(x,y)) as
popcount((T)x), so the popcount's argument keeps the same type.
<parity optimizations>: Likewise preserve extensions when
simplifying parity((T)bswap(x)) and parity((T)rotate(x,y)) as
parity((T)x), so that the parity's argument type is the same.
gcc/testsuite/ChangeLog
PR middle-end/109840
* gcc.dg/fold-parity-8.c: New test.
* gcc.dg/fold-popcount-11.c: Likewise.
|
|
This patch encapsulates the ipa_vr internals into an API. It also
makes it type agnostic, in preparation for upcoming changes to IPA.
Interestingly, there's a 0.44% improvement to IPA-cp, which I'm sure
we'll soak up with future changes in this area :).
gcc/ChangeLog:
* ipa-cp.cc (ipa_value_range_from_jfunc): Use new ipa_vr API.
(ipcp_store_vr_results): Same.
* ipa-prop.cc (ipa_vr::ipa_vr): New.
(ipa_vr::get_vrange): New.
(ipa_vr::set_unknown): New.
(ipa_vr::streamer_read): New.
(ipa_vr::streamer_write): New.
(write_ipcp_transformation_info): Use new ipa_vr API.
(read_ipcp_transformation_info): Same.
(ipa_vr::nonzero_p): Delete.
(ipcp_update_vr): Use new ipa_vr API.
* ipa-prop.h (class ipa_vr): Provide an API and hide internals.
* ipa-sra.cc (zap_useless_ipcp_results): Use new ipa_vr API.
gcc/testsuite/ChangeLog:
* gcc.dg/ipa/pr78121.c: Adjust for vrange::dump use.
* gcc.dg/ipa/vrp1.c: Same.
* gcc.dg/ipa/vrp2.c: Same.
* gcc.dg/ipa/vrp3.c: Same.
* gcc.dg/ipa/vrp4.c: Same.
* gcc.dg/ipa/vrp5.c: Same.
* gcc.dg/ipa/vrp6.c: Same.
* gcc.dg/ipa/vrp7.c: Same.
* gcc.dg/ipa/vrp8.c: Same.
|
|
One of the supplied argument strings is unnecessarily long (c-sky, using
basically the same code, fixed it to a shorter length), and shrinking it
fixes overflow warnings, as GCC fails to deduce that the full 256 bytes
of load_op[] are never used.
gcc/ChangeLog:
* config/mcore/mcore.cc (output_inline_const): Make buffer smaller to
silence overflow warnings later on.
|
|
Also, move v<any_shift:insn>v8qi3 expander to a better place and enable
it with TARGET_MMX_WITH_SSE. Remove handling of V8QImode from
ix86_expand_vecop_qihi2 since all partial QI->HI vector modes expand
via ix86_expand_vecop_qihi_partial.
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_vecop_qihi2):
Remove handling of V8QImode.
* config/i386/mmx.md (v<insn>v8qi3): Move from sse.md.
Call ix86_expand_vecop_qihi_partial. Enable for TARGET_MMX_WITH_SSE.
(v<insn>v4qi3): Ditto.
* config/i386/sse.md (v<insn>v8qi3): Remove.
gcc/testsuite/ChangeLog:
* gcc.target/i386/vect-shiftv4qi.c (dg-options):
Remove -ftree-vectorize.
* gcc.target/i386/vect-shiftv8qi.c (dg-options): Ditto.
* gcc.target/i386/vect-vshiftv4qi.c: New test.
* gcc.target/i386/vect-vshiftv8qi.c: New test.
|
|
Continuing the series of straightforward annotations, this one handles the normal (not widening or narrowing) vector shifts.
Tests included.
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
gcc/ChangeLog:
PR target/99195
* config/aarch64/aarch64-simd.md (aarch64_simd_lshr<mode>): Rename to...
(aarch64_simd_lshr<mode><vczle><vczbe>): ... This.
(aarch64_simd_ashr<mode>): Rename to...
(aarch64_simd_ashr<mode><vczle><vczbe>): ... This.
(aarch64_simd_imm_shl<mode>): Rename to...
(aarch64_simd_imm_shl<mode><vczle><vczbe>): ... This.
(aarch64_simd_reg_sshl<mode>): Rename to...
(aarch64_simd_reg_sshl<mode><vczle><vczbe>): ... This.
(aarch64_simd_reg_shl<mode>_unsigned): Rename to...
(aarch64_simd_reg_shl<mode>_unsigned<vczle><vczbe>): ... This.
(aarch64_simd_reg_shl<mode>_signed): Rename to...
(aarch64_simd_reg_shl<mode>_signed<vczle><vczbe>): ... This.
(vec_shr_<mode>): Rename to...
(vec_shr_<mode><vczle><vczbe>): ... This.
(aarch64_<sur>shl<mode>): Rename to...
(aarch64_<sur>shl<mode><vczle><vczbe>): ... This.
(aarch64_<sur>q<r>shl<mode>): Rename to...
(aarch64_<sur>q<r>shl<mode><vczle><vczbe>): ... This.
gcc/testsuite/ChangeLog:
PR target/99195
* gcc.target/aarch64/simd/pr99195_1.c: Add testing for shifts.
* gcc.target/aarch64/simd/pr99195_6.c: Likewise.
* gcc.target/aarch64/simd/pr99195_8.c: New test.
|
|
The following dispatches to V2DImode CTOR expansion instead of
using sets of (subreg:DI (reg:V16QI 146) [08]) which causes
LRA to spill DImode and reload V16QImode. The same applies for
V8QImode or V4HImode construction from SImode parts which happens
during 32bit libgcc build.
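A hedged sketch of the kind of general vector construction involved; the
actual pr109944 testcases may differ.
typedef char v16qi __attribute__ ((vector_size (16)));

v16qi
f (char *p)
{
  /* General init from scalars: the final composition now goes through a
     V2DImode CTOR instead of DImode subreg sets that made LRA spill.  */
  return (v16qi) { p[0], p[1], p[2],  p[3],  p[4],  p[5],  p[6],  p[7],
                   p[8], p[9], p[10], p[11], p[12], p[13], p[14], p[15] };
}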
PR target/109944
* config/i386/i386-expand.cc (ix86_expand_vector_init_general):
Perform final vector composition using
ix86_expand_vector_init_general instead of setting
the highpart and lowpart which causes spilling.
* gcc.target/i386/pr109944-1.c: New testcase.
* gcc.target/i386/pr109944-2.c: Likewise.
|
|
Do not update and propagate a global value if it hasn't changed.
PR tree-optimization/109695
* gimple-range-cache.cc (ranger_cache::get_global_range): Add
changed param.
* gimple-range-cache.h (ranger_cache::get_global_range): Ditto.
* gimple-range.cc (gimple_ranger::range_of_stmt): Pass changed
flag to set_global_range.
(gimple_ranger::prefill_stmt_dependencies): Ditto.
|
|
Instead of using 0, use negative timestamps to reflect always_current state.
If the value doesn't change, keep the timestamp rather than creating a new
one and invalidating any dependencies.
PR tree-optimization/109695
* gimple-range-cache.cc (temporal_cache::temporal_value): Return
a positive int.
(temporal_cache::current_p): Check always_current method.
(temporal_cache::set_always_current): Add param and set value
appropriately.
(temporal_cache::always_current_p): New.
(ranger_cache::get_global_range): Adjust.
(ranger_cache::set_global_range): Set always_current first.
|
|
Instead of defaulting to VARYING, fold the stmt using just global ranges.
PR tree-optimization/109695
* gimple-range-cache.cc (ranger_cache::get_global_range): Call
fold_range with global query to choose an initial value.
|
|
An obvious fix to make all enum naming consistent.
gcc/ChangeLog:
* config/riscv/riscv-protos.h (enum frm_field_enum): Add FRM_
prefix.
Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
|
|
The PR109849 fix made us no longer hoist some memory loads because
of the expression set intersection. We can still avoid computing
the union by simply taking the first set's expressions and leaving
the pruning of expressions whose values are not suitable for hoisting
to sorted_array_from_bitmap_set.
PR tree-optimization/109849
* tree-ssa-pre.cc (do_hoist_insertion): Do not intersect
expressions but take the first sets.
* gcc.dg/tree-ssa/ssa-hoist-9.c: New testcase.
|
|
This patch fixes the case when a single character constant literal is
passed as a string actual parameter to an ARRAY OF CHAR formal parameter.
To be consistent a single character is promoted to a string and nul
terminated (and its high value is 1). Previously a single character
string would not be nul terminated and the high value was 0.
The documentation now includes a section describing the expected behavior
and included in this patch is some regression test code matching the
table inside the documentation.
gcc/ChangeLog:
PR modula2/109952
* doc/gm2.texi (High procedure function): New node.
(Using): New menu entry for High procedure function.
gcc/m2/ChangeLog:
PR modula2/109952
* Make-maintainer.in: Change header to include emacs file mode.
* gm2-compiler/M2GenGCC.mod (BuildHighFromChar): Check whether
operand is a constant string and is nul terminated then return one.
* gm2-compiler/PCSymBuild.mod (WalkFunction): Add default return
TRUE. Static analysis missing return path fix.
* gm2-libs/IO.mod (Init): Rewrite to help static analysis.
* target-independent/m2/gm2-libs.texi: Rebuild.
gcc/testsuite/ChangeLog:
PR modula2/109952
* gm2/pim/run/pass/hightests.mod: New test.
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
|
|
When I wrote early-remat, the DF_FORWARD block order was a postorder
of a reverse/backward walk (i.e. of the inverted cfg), rather than a
reverse postorder of a forward walk. A postorder of a backward walk
lacked the important property that dominators come before the blocks
they dominate; instead it ensures that postdominators come after
the blocks that they postdominate.
The DF_BACKWARD block order was similarly a postorder of a forward
walk. Since early-remat wanted a standard postorder and reverse
postorder with normal dominator properties, it used the DF_BACKWARD
order instead of the DF_FORWARD order.
g:53dddbfeb213ac4ec39f fixed the DF orders so that DF_FORWARD was
an RPO of a forward walk and so that DF_BACKWARD was an RPO of a
backward walk. This meant that iterating backwards over the
DF_BACKWARD order had the exact problem that the original DF_FORWARD
order had, triggering a flurry of ICEs for SVE.
This fixes the build with SVE enabled. It also fixes an ICE
in g++.target/aarch64/sve/pr99766.C with normal builds. I've
included the test from the PR as well, for extra coverage.
gcc/
PR rtl-optimization/109940
* early-remat.cc (postorder_index): Rename to...
(rpo_index): ...this.
(compare_candidates): Sort by decreasing rpo_index rather than
increasing postorder_index.
(early_remat::sort_candidates): Calculate the forward RPO from
DF_FORWARD.
(early_remat::local_phase): Follow forward RPO using DF_FORWARD,
rather than DF_BACKWARD in reverse.
gcc/testsuite/
* gcc.dg/torture/pr109940.c: New test.
|
|
As the PR says we shouldn't be using qualifier_unsigned for the return type of the __ssat intrinsics.
UNSIGNED_SAT_BINOP_UNSIGNED_IMM_QUALIFIERS already exists for that.
This was just a thinko.
This patch fixes this and the warning with -Wconversion goes away.
Bootstrapped and tested on arm-none-linux-gnueabihf.
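For illustration, the kind of use that triggered the warning; this assumes
the ACLE __ssat intrinsic from arm_acle.h.
#include <arm_acle.h>

int
f (int x)
{
  /* With the return type wrongly treated as unsigned, returning it as int
     warned under -Wconversion; with qualifier_none it no longer does.  */
  return __ssat (x, 8);
}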
gcc/ChangeLog:
PR target/109939
* config/arm/arm-builtins.cc (SAT_BINOP_UNSIGNED_IMM_QUALIFIERS): Use
qualifier_none for the return operand.
gcc/testsuite/ChangeLog:
PR target/109939
* gcc.target/arm/pr109939.c: New test.
|
|
This patch adds mask-logic auto-vectorization, defining the patterns
as "define_insn_and_split" so that the combine pass can easily combine
series of instructions.
For example:
combine vmxor.mm + vmnot.m into vmxnor.mm
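A hypothetical scalar loop whose vectorized mask operations give combine a
vmxor.mm followed by vmnot.m to fuse; the committed vcond tests may differ.
void
f (int *restrict r, float *restrict a, float *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    r[i] = (a[i] < 0.0f) == (b[i] < 0.0f);   /* xnor of two comparison masks */
}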
Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
gcc/ChangeLog:
* config/riscv/autovec.md (<optab><mode>3): New pattern.
(one_cmpl<mode>2): Ditto.
(*<optab>not<mode>): Ditto.
(*n<optab><mode>): Ditto.
* config/riscv/riscv-v.cc (expand_vec_cmp_float): Change to
one_cmpl.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/cmp/vcond-4.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-4.c: New test.
|
|
The bogus warning is present on 32-bit ppc-vx7r2 too, so drop the 64
from the powerpc xfail triplet.
for gcc/testsuite/ChangeLog
* gcc.dg/uninit-pred-9_b.c: Xfail bogus warning on 32-bit ppc
as well.
|
|
The expected results for signbit-2 only arise on x86 with avx512f
disabled and sse2 enabled. The patch already disables avx512f
explicitly, but it fails to enable sse2.
for gcc/testsuite/ChangeLog
* gcc.dg/signbit-2.c: Add -msse2 on x86.
|
|
The sysconf function is only available in rtp mode on vxworks. In
kernel mode, it is not even declared, but the feature test macro in
the testsuite doesn't notice its absence because it's a link test, and
vxworks kernel mode uses partial linking.
This patch introduces an alternate test on vxworks targets to check
for a declaration and for an often-used sysconf parameter.
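Roughly, the new probe is a compile check of this shape (sketch only; the
real check lives in target-supports.exp as tcl wrapping a snippet like this):
#include <unistd.h>

long
f (void)
{
  /* Requires sysconf to be declared and _SC_PAGESIZE to be defined,
     instead of relying on a link test.  */
  return sysconf (_SC_PAGESIZE);
}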
for gcc/testsuite/ChangeLog
* lib/target-supports.exp (check_effective_target_sysconf):
Check for declaration and _SC_PAGESIZE on vxworks.
|
|
Following Richi's suggestion in [1], I'm working on deferring
cost evaluation until next to the transformation. This patch
enhances vect_transform_slp_perm_load_1, which could under-cost
vector permutations: the costing doesn't consider
nvectors_per_build, so it is inconsistent with the transformation part.
Basically it changes the below
if (index == count)
{
if (!noop_p)
{
// A ...
// ++*n_perms;
if (!analyze_only)
{
// B1 ...
// B2 ...
for ...
// B3 building VEC_PERM_EXPR
}
}
else if (!analyze_only)
{
// no B2 since no any further uses here.
for ...
// B4 building nothing
}
// B5 ...
}
to:
if (index == count)
{
if (!noop_p)
{
// A ...
if (!analyze_only)
// B1 ...
// B2 ... (trivial computations during analyze_only or not)
for ...
{
// now n_perms is consistent with building VEC_PERM_EXPR
// ++*n_perms;
if (analyze_only)
continue;
// B3 building VEC_PERM_EXPR
}
}
else if (!analyze_only)
{
// no B2 since no any further uses here.
for ...
// B4 building nothing
}
// B5 ...
}
[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563624.html
gcc/ChangeLog:
* tree-vect-slp.cc (vect_transform_slp_perm_load_1): Adjust the
calculation on n_perms by considering nvectors_per_build.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c: New test.
|
|
This patch enables RVV auto-vectorization, including floating-point
unordered and ordered comparisons.
The testcases are leveraged from Richard, so Richard is included as co-author.
This patch is also a prerequisite for my current middle-end work:
without it, I can't support the len_mask_xxx middle-end patterns,
since the mask is generated by a comparison.
For example:
for (int i = 0; i < n; i++)
  if (cond[i])
    a[i] = b[i];
We need len_mask_load/len_mask_store for such code, and I am going to
support them in the middle-end after this patch is merged.
Both integer and floating-point (ordered and unordered) comparisons are tested.
Built and regression tested.
Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Co-Authored-By: Richard Sandiford <richard.sandiford@arm.com>
gcc/ChangeLog:
* config/riscv/autovec.md (@vcond_mask_<mode><vm>): New pattern.
(vec_cmp<mode><vm>): New pattern.
(vec_cmpu<mode><vm>): New pattern.
(vcond<V:mode><VI:mode>): New pattern.
(vcondu<V:mode><VI:mode>): New pattern.
* config/riscv/riscv-protos.h (enum insn_type): Add new enum.
(emit_vlmax_merge_insn): New function.
(emit_vlmax_cmp_insn): Ditto.
(emit_vlmax_cmp_mu_insn): Ditto.
(expand_vec_cmp): Ditto.
(expand_vec_cmp_float): Ditto.
(expand_vcond): Ditto.
* config/riscv/riscv-v.cc (emit_vlmax_merge_insn): Ditto.
(emit_vlmax_cmp_insn): Ditto.
(emit_vlmax_cmp_mu_insn): Ditto.
(get_cmp_insn_code): Ditto.
(expand_vec_cmp): Ditto.
(expand_vec_cmp_float): Ditto.
(expand_vcond): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/rvv.exp:
* gcc.target/riscv/rvv/autovec/cmp/vcond-1.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond-2.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond-3.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-3.c: New test.
|
|
This patch supports the RVV VREINTERPRET from vbool*_t to vuint*m1_t. Aka:
vuint*m1_t __riscv_vreinterpret_x_x(vbool*_t);
These APIs help the user convert a vbool*_t vector to the LMUL=1 unsigned
integer vuint*m1_t. According to the RVV intrinsic SPEC below, the
reinterpret intrinsics only change the type of the underlying contents.
https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#reinterpret-vbool-o-vintm1
For example, given below code.
vuint8m1_t test_vreinterpret_v_b1_vuint8m1 (vbool1_t src) {
return __riscv_vreinterpret_v_b1_u8m1 (src);
}
It will generate the assembly code similar as below:
vsetvli a5,zero,e8,m8,ta,ma
vlm.v v1,0(a1)
vs1r.v v1,0(a0)
ret
Please NOTE the test files don't cover all the possible combinations
of the intrinsic APIs introduced by this PATCH, as there are too many.
This is the last PATCH for the reinterpret between the signed/unsigned
and the bool vector types.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/genrvv-type-indexer.cc (main): Add
unsigned_eew*_lmul1_interpret for indexer.
* config/riscv/riscv-vector-builtins-functions.def (vreinterpret):
Register vuint*m1_t interpret function.
* config/riscv/riscv-vector-builtins-types.def (DEF_RVV_UNSIGNED_EEW8_LMUL1_INTERPRET_OPS):
New macro for vuint8m1_t.
(DEF_RVV_UNSIGNED_EEW16_LMUL1_INTERPRET_OPS): Likewise.
(DEF_RVV_UNSIGNED_EEW32_LMUL1_INTERPRET_OPS): Likewise.
(DEF_RVV_UNSIGNED_EEW64_LMUL1_INTERPRET_OPS): Likewise.
(vbool1_t): Add to unsigned_eew*_interpret_ops.
(vbool2_t): Likewise.
(vbool4_t): Likewise.
(vbool8_t): Likewise.
(vbool16_t): Likewise.
(vbool32_t): Likewise.
(vbool64_t): Likewise.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_UNSIGNED_EEW8_LMUL1_INTERPRET_OPS):
New macro for vuint*m1_t.
(DEF_RVV_UNSIGNED_EEW16_LMUL1_INTERPRET_OPS): Likewise.
(DEF_RVV_UNSIGNED_EEW32_LMUL1_INTERPRET_OPS): Likewise.
(DEF_RVV_UNSIGNED_EEW64_LMUL1_INTERPRET_OPS): Likewise.
(required_extensions_p): Add vuint*m1_t interpret case.
* config/riscv/riscv-vector-builtins.def (unsigned_eew8_lmul1_interpret):
Add vuint*m1_t interpret to base type.
(unsigned_eew16_lmul1_interpret): Likewise.
(unsigned_eew32_lmul1_interpret): Likewise.
(unsigned_eew64_lmul1_interpret): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/misc_vreinterpret_vbool_vint.c:
Enrich test cases.
|
|
This patch supports the RVV VREINTERPRET from vbool*_t to vint*m1_t. Aka:
vint*m1_t __riscv_vreinterpret_x_x(vbool*_t);
These APIs help the user convert a vbool*_t vector to the LMUL=1 signed
integer vint*m1_t. According to the RVV intrinsic SPEC below, the
reinterpret intrinsics only change the type of the underlying contents.
https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#reinterpret-vbool-o-vintm1
For example, given below code.
vint8m1_t test_vreinterpret_v_b1_vint8m1 (vbool1_t src) {
return __riscv_vreinterpret_v_b1_i8m1 (src);
}
It will generate the assembly code similar as below:
vsetvli a5,zero,e8,m8,ta,ma
vlm.v v1,0(a1)
vs1r.v v1,0(a0)
ret
Please NOTE the test files don't cover all the possible combinations
of the intrinsic APIs introduced by this PATCH, as there are too many.
The reinterpret from vbool*_t to vuint*m1_t with lmul=1 will be covered
in another PATCH.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/genrvv-type-indexer.cc (EEW_SIZE_LIST): New macro
for the eew size list.
(LMUL1_LOG2): New macro for the log2 value of lmul=1.
(main): Add signed_eew*_lmul1_interpret for indexer.
* config/riscv/riscv-vector-builtins-functions.def (vreinterpret):
Register vint*m1_t interpret function.
* config/riscv/riscv-vector-builtins-types.def (DEF_RVV_SIGNED_EEW8_LMUL1_INTERPRET_OPS):
New macro for vint8m1_t.
(DEF_RVV_SIGNED_EEW16_LMUL1_INTERPRET_OPS): Likewise.
(DEF_RVV_SIGNED_EEW32_LMUL1_INTERPRET_OPS): Likewise.
(DEF_RVV_SIGNED_EEW64_LMUL1_INTERPRET_OPS): Likewise.
(vbool1_t): Add to signed_eew*_interpret_ops.
(vbool2_t): Likewise.
(vbool4_t): Likewise.
(vbool8_t): Likewise.
(vbool16_t): Likewise.
(vbool32_t): Likewise.
(vbool64_t): Likewise.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_SIGNED_EEW8_LMUL1_INTERPRET_OPS):
New macro for vint*m1_t.
(DEF_RVV_SIGNED_EEW16_LMUL1_INTERPRET_OPS): Likewise.
(DEF_RVV_SIGNED_EEW32_LMUL1_INTERPRET_OPS): Likewise.
(DEF_RVV_SIGNED_EEW64_LMUL1_INTERPRET_OPS): Likewise.
(required_extensions_p): Add vint8m1_t interpret case.
* config/riscv/riscv-vector-builtins.def (signed_eew8_lmul1_interpret):
Add vint*m1_t interpret to base type.
(signed_eew16_lmul1_interpret): Likewise.
(signed_eew32_lmul1_interpret): Likewise.
(signed_eew64_lmul1_interpret): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/misc_vreinterpret_vbool_vint.c:
Enrich the test cases.
|
|
To fix this issue, we separate the VL operand from the normal operands.
gcc/ChangeLog:
* config/riscv/autovec.md: Adjust for new interface.
* config/riscv/riscv-protos.h (emit_vlmax_insn): Add VL operand.
(emit_nonvlmax_insn): Add AVL operand.
* config/riscv/riscv-v.cc (emit_vlmax_insn): Add VL operand.
(emit_nonvlmax_insn): Add AVL operand.
(sew64_scalar_helper): Adjust for new interface.
(expand_tuple_move): Ditto.
* config/riscv/vector.md: Ditto.
Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
|
|
This simple patch removes magic numbers, making the code
more reasonable.
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_vec_series): Remove magic number.
(expand_const_vector): Ditto.
(legitimize_move): Ditto.
(sew64_scalar_helper): Ditto.
(expand_tuple_move): Ditto.
(expand_vector_init_insert_elems): Ditto.
* config/riscv/riscv.cc (vector_zero_call_used_regs): Ditto.
Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
|
|
Also for 64-bit vector abs intrinsics _mm_abs_{pi8,pi16,pi32}.
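For illustration, one of the intrinsics now folded at the gimple level (a
minimal sketch; the pr109900 test presumably scans the gimple dump):
#include <immintrin.h>

__m128i
f (__m128i x)
{
  /* Now folded to a gimple ABS_EXPR instead of a target builtin call.  */
  return _mm_abs_epi8 (x);
}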
gcc/ChangeLog:
PR target/109900
* config/i386/i386.cc (ix86_gimple_fold_builtin): Fold
_mm{,256,512}_abs_{epi8,epi16,epi32,epi64} and
_mm_abs_{pi8,pi16,pi32} into gimple ABS_EXPR.
(ix86_masked_all_ones): Handle 64-bit mask.
* config/i386/i386-builtin.def: Replace icode of related
non-mask simd abs builtins with CODE_FOR_nothing.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr109900.c: New test.
|
|
|
|
Size expressions were sometimes lost and not gimplified correctly,
leading to ICEs and incorrect evaluation order. Fix this by 1) not
recursing into pointers when gimplifying parameters, which was incorrect
because it might access variables declared later for incomplete
structs, and 2) adding a decl expr for variably-modified arrays
that are pointed to by parameters declared as arrays.
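A hedged example of the kind of declaration affected: a parameter declared as
an array whose element type is variably modified, so the size expression for
n now gets a DECL_EXPR.
void
f (int n, char buf[][n])
{
  /* sizeof (*buf) needs the size expression for n to have been gimplified
     at the right point; previously it could be lost, causing ICEs.  */
  (void) sizeof (*buf);
}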
PR c/109450
gcc/
* function.cc (gimplify_parm_type): Remove function.
(gimplify_parameters): Call gimplify_type_sizes.
gcc/c/
* c-decl.cc (add_decl_expr): New function.
(grokdeclarator): Add decl expr for size expression in
types pointed to by parameters declared as arrays.
gcc/testsuite/
* gcc.dg/pr109450-1.c: New test.
* gcc.dg/pr109450-2.c: New test.
* gcc.dg/vla-26.c: New test.
|
|
Size expressions were sometimes lost and not gimplified correctly, leading to
ICEs and incorrect evaluation order. Fix this by 1) not recursing into
pointers when gimplifying parameters in the middle-end (the code is merged with
gimplify_type_sizes), which is incorrect because it might access variables
declared later for incomplete structs, and 2) tracking size expressions for
struct/union members correctly, 3) emitting code to evaluate size expressions
for missing cases (nested functions, empty declarations, and structs/unions).
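A hedged illustration of one of the previously mishandled cases: a size
expression inside a struct member, which now gets the DECL_EXPR needed to
evaluate it in the right context.
void
f (int n)
{
  /* The size expression 'n' in the member type must be evaluated when the
     struct is declared, not lost or re-evaluated out of order.  */
  struct s { char (*p)[n]; } v;
  (void) v;
}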
PR c/70418
PR c/106465
PR c/107557
PR c/108423
gcc/c/
* c-decl.cc (start_decl): Make sure size expressions are
evaluated only in the correct context.
(grokdeclarator): Size expression in fields may need a bind
expression, make sure DECL_EXPR is always created.
(grokfield, declspecs_add_type): Pass along size expressions.
(finish_struct): Remove unneeded DECL_EXPR.
(start_function): Evaluate size expressions for nested functions.
* c-parser.cc (c_parser_struct_declarations,
c_parser_struct_or_union_specifier): Pass along size expressions.
(c_parser_declaration_or_fndef): Evaluate size expression.
(c_parser_objc_at_property_declaration,
c_parser_objc_class_instance_variables): Adapt.
* c-tree.h (grokfield): Adapt declaration.
gcc/testsuite/
* gcc.dg/nested-vla-1.c: New test.
* gcc.dg/nested-vla-2.c: New test.
* gcc.dg/nested-vla-3.c: New test.
* gcc.dg/pr70418.c: New test.
* gcc.dg/pr106465.c: New test.
* gcc.dg/pr107557-1.c: New test.
* gcc.dg/pr107557-2.c: New test.
* gcc.dg/pr108423-1.c: New test.
* gcc.dg/pr108423-2.c: New test.
* gcc.dg/pr108423-3.c: New test.
* gcc.dg/pr108423-4.c: New test.
* gcc.dg/pr108423-5.c: New test.
* gcc.dg/pr108423-6.c: New test.
* gcc.dg/typename-vla-2.c: New test.
* gcc.dg/typename-vla-3.c: New test.
* gcc.dg/typename-vla-4.c: New test.
* gcc.misc-tests/gcov-pr85350.c: Adapt.
|
|
By making use of the 'addsub_operator' added in the last patch.
gcc/ChangeLog:
* config/xtensa/xtensa.md (*addsubx): Rename from '*addx',
and change to also accept '*subx' pattern.
(*subx): Remove.
|
|
This patch removes one machine instruction from the "single bit extraction
with shifting" operation, and tries to eliminate the conditional
branch if CST2_POW2 doesn't fit into signed 12 bits, with the help
of the ifcvt optimization.
/* example #1 */
int test0(int x) {
return (x & 1048576) != 0 ? 1024 : 0;
}
extern int foo(void);
int test1(void) {
return (foo() & 1048576) != 0 ? 16777216 : 0;
}
;; before
test0:
movi a9, 0x400
srai a2, a2, 10
and a2, a2, a9
ret.n
test1:
addi sp, sp, -16
s32i.n a0, sp, 12
call0 foo
extui a2, a2, 20, 1
slli a2, a2, 20
beqz.n a2, .L2
movi.n a2, 1
slli a2, a2, 24
.L2:
l32i.n a0, sp, 12
addi sp, sp, 16
ret.n
;; after
test0:
extui a2, a2, 20, 1
slli a2, a2, 10
ret.n
test1:
addi sp, sp, -16
s32i.n a0, sp, 12
call0 foo
l32i.n a0, sp, 12
extui a2, a2, 20, 1
slli a2, a2, 24
addi sp, sp, 16
ret.n
In addition, if the left shift amount ('exact_log2(CST2_POW2)') is
between 1 and 3 and either an addition or a subtraction with another
register follows, emit an ADDX[248] or SUBX[248] machine instruction
instead of separate left-shift and add/subtract ones.
/* example #2 */
int test2(int x, int y) {
return ((x & 1048576) != 0 ? 4 : 0) + y;
}
int test3(int x, int y) {
return ((x & 2) != 0 ? 8 : 0) - y;
}
;; before
test2:
movi.n a9, 4
srai a2, a2, 18
and a2, a2, a9
add.n a2, a2, a3
ret.n
test3:
movi.n a9, 8
slli a2, a2, 2
and a2, a2, a9
sub a2, a2, a3
ret.n
;; after
test2:
extui a2, a2, 20, 1
addx4 a2, a2, a3
ret.n
test3:
extui a2, a2, 1, 1
subx8 a2, a2, a3
ret.n
gcc/ChangeLog:
* config/xtensa/predicates.md (addsub_operator): New.
* config/xtensa/xtensa.md (*extzvsi-1bit_ashlsi3,
*extzvsi-1bit_addsubx): New insn_and_split patterns.
* config/xtensa/xtensa.cc (xtensa_rtx_costs):
Add a special case about ifcvt 'noce_try_cmove()' to handle
constant loads that do not fit into signed 12 bits in the
patterns added above.
|
|
The x86 backend looks at the SLP node passed to the add_stmt_cost
hook when costing vec_construct, looking for elements that require
a move from a GPR to a vector register and cost that. But since
vect_prologue_cost_for_slp decomposes the cost for an external
SLP node into individual pieces this cost gets applied N times
without a chance for the backend to know it's just dealing with
a part of the SLP node. Just looking at a part is also not perfect
since the GPR to XMM move cost applies only once per distinct
element so handling the whole SLP node one more correctly reflects
cost (albeit without considering other external SLP nodes).
The following addresses the issue by passing down the SLP node
only for one piece and nullptr for the rest. The x86 backend
is currently the only one looking at it.
In the future the cost of external elements is something to deal
with globally but that would require the full SLP tree be available
to costing.
It's difficult to write a testcase, at the tipping point not
vectorizing is better so I'll followup with x86 specific adjustments
and will see to add a testcase later.
PR tree-optimization/109747
* tree-vect-slp.cc (vect_prologue_cost_for_slp): Pass down
the SLP node only once to the cost hook.
|
|
Some miscomputation of rtx_costs led to sub-optimal code for
single-bit bit insertions. This patch implements TARGET_INSN_COST,
which has a chance to see the whole insn during insn combination;
in particular the SET_DEST of (set (zero_extract (...) ...)).
gcc/
* config/avr/avr.cc (avr_insn_cost): New static function.
(TARGET_INSN_COST): Define to that function.
|
|
The following also accounts for a GPR->XMM move cost for splat
operations and properly guards eliding the cost when moving from
memory, doing so only for SSE4.1 or for HImode or larger operands. This
doesn't fix the PR fully yet.
PR target/109944
* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
For vector construction or splats apply GPR->XMM move
costing. QImode memory can be handled directly only
with SSE4.1 pinsrb.
|
|
This is a small adjustment to the work done for PR108752 and
better reflects the cost of the generated sequence.
PR tree-optimization/108752
* tree-vect-stmts.cc (vectorizable_operation): For bit
operations with generic word_mode vectors do not cost
an extra stmt. For plus, minus and negate also cost the
constant materialization.
|
|
Add V8QImode and V4QImode vector shift patterns that call into
ix86_expand_vecop_qihi_partial. Generate special sequences
for constant count operands.
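A hedged sketch of the kind of operation these patterns expand, written with
the GNU vector extension; the new vect-shiftv8qi.c test may differ.
typedef unsigned char v8qi __attribute__ ((vector_size (8)));

v8qi
shl2 (v8qi x)
{
  /* Constant shift count, so ix86_expand_vecop_qihi_partial can use the
     special constant-count sequence.  */
  return x << 2;
}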
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_vecop_qihi_partial):
Call ix86_expand_vec_shift_qihi_constant for shifts
with constant count operand.
* config/i386/i386.cc (ix86_shift_rotate_cost):
Handle V4QImode and V8QImode.
* config/i386/mmx.md (<insn>v8qi3): New insn pattern.
(<insn>v4qi3): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/vect-shiftv4qi.c: New test.
* gcc.target/i386/vect-shiftv8qi.c: New test.
|
|
I just noticed the warning:
../../../riscv-gcc/gcc/config/riscv/vector.md:618:1: warning: source
missing a mode?
gcc/ChangeLog:
* config/riscv/vector.md: Add mode.
Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
|
|
This patch removes a buggy special case in irange::invert which seems
to have been broken for a while, and probably never triggered because
the legacy code was handled elsewhere, and the non-legacy code was
using an int_range_max of int_range<255> which made it extremely
likely for num_ranges == 255. However, with auto-resizing ranges,
int_range_max will start off at 3 and can hit this bogus code in the
unswitching code.
PR tree-optimization/109934
gcc/ChangeLog:
* value-range.cc (irange::invert): Remove buggy special case.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr109934.c: New test.
|
|
This dumps ANTIC_OUT before pruning clobbered mems from it as part
of the ANTIC_IN compute.
* tree-ssa-pre.cc (compute_antic_aux): Dump the correct
ANTIC_OUT.
|
|
At -O2, and so with SLP vectorisation enabled:
struct complx_t { float re, im; };
complx_t add(complx_t a, complx_t b) {
return {a.re + b.re, a.im + b.im};
}
generates:
fmov w3, s1
fmov x0, d0
fmov x1, d2
fmov w2, s3
bfi x0, x3, 32, 32
fmov d31, x0
bfi x1, x2, 32, 32
fmov d30, x1
fadd v31.2s, v31.2s, v30.2s
fmov x1, d31
lsr x0, x1, 32
fmov s1, w0
lsr w0, w1, 0
fmov s0, w0
ret
This is because complx_t is passed and returned in FPRs, but GCC gives
it DImode. We therefore “need” to assemble a DImode pseudo from the
two individual floats, bitcast it to a vector, do the arithmetic,
bitcast it back to a DImode pseudo, then extract the individual floats.
There are many problems here. The most basic is that we shouldn't
use SLP for such a trivial example. But SLP should in principle be
beneficial for more complicated examples, so preventing SLP for the
example above just changes the reproducer needed. A more fundamental
problem is that it doesn't make sense to use single DImode pseudos in a
testcase like this. I have a WIP patch to allow re and im to be stored
in individual SFmode pseudos instead, but it's quite an invasive change
and might end up going nowhere.
A simpler problem to tackle is that we allow DImode pseudos to be stored
in FPRs, but we don't provide any patterns for inserting values into
them, even though INS makes that easy for element-like insertions.
This patch adds some patterns for that.
Doing that showed that aarch64_modes_tieable_p was too strict:
it didn't allow SFmode and DImode values to be tied, even though
both of them occupy a single GPR and FPR, and even though we allow
both classes to change between the modes.
The *aarch64_bfidi<ALLX:mode>_subreg_<SUBDI_BITS> pattern is
especially ugly, but it's not clear what target-independent
code ought to simplify it to, if it was going to simplify it.
We should probably do the same thing for extractions, but that's left
as future work.
After the patch we generate:
ins v0.s[1], v1.s[0]
ins v2.s[1], v3.s[0]
fadd v0.2s, v0.2s, v2.2s
fmov x0, d0
ushr d1, d0, 32
lsr w0, w0, 0
fmov s0, w0
ret
which seems like a step in the right direction.
All in all, there's nothing elegant about this patch. It just
seems like the least worst option.
gcc/
PR target/109632
* config/aarch64/aarch64.cc (aarch64_modes_tieable_p): Allow
subregs between any scalars that are 64 bits or smaller.
* config/aarch64/iterators.md (SUBDI_BITS): New int iterator.
(bits_etype): New int attribute.
* config/aarch64/aarch64.md (*insv_reg<mode>_<SUBDI_BITS>)
(*aarch64_bfi<GPI:mode><ALLX:mode>_<SUBDI_BITS>): New patterns.
(*aarch64_bfidi<ALLX:mode>_subreg_<SUBDI_BITS>): Likewise.
gcc/testsuite/
* gcc.target/aarch64/ins_bitfield_1.c: New test.
* gcc.target/aarch64/ins_bitfield_2.c: Likewise.
* gcc.target/aarch64/ins_bitfield_3.c: Likewise.
* gcc.target/aarch64/ins_bitfield_4.c: Likewise.
* gcc.target/aarch64/ins_bitfield_5.c: Likewise.
* gcc.target/aarch64/ins_bitfield_6.c: Likewise.
|