Age | Commit message (Collapse) | Author | Files | Lines |
|
|
|
This patch address comments from Richard && Richi and rebase to trunk.
This patch is adding SELECT_VL middle-end support
allow target have target dependent optimization in case of
length calculation.
This patch is inspired by RVV ISA and LLVM:
https://reviews.llvm.org/D99750
The SELECT_VL is same behavior as LLVM "get_vector_length" with
these following properties:
1. Only apply on single-rgroup.
2. non SLP.
3. adjust loop control IV.
4. adjust data reference IV.
5. allow non-vf elements processing in non-final iteration
Code
# void vvaddint32(size_t n, const int*x, const int*y, int*z)
# { for (size_t i=0; i<n; i++) { z[i]=x[i]+y[i]; } }
Take RVV codegen for example:
Before this patch:
vvaddint32:
ble a0,zero,.L6
csrr a4,vlenb
srli a6,a4,2
.L4:
mv a5,a0
bleu a0,a6,.L3
mv a5,a6
.L3:
vsetvli zero,a5,e32,m1,ta,ma
vle32.v v2,0(a1)
vle32.v v1,0(a2)
vsetvli a7,zero,e32,m1,ta,ma
sub a0,a0,a5
vadd.vv v1,v1,v2
vsetvli zero,a5,e32,m1,ta,ma
vse32.v v1,0(a3)
add a2,a2,a4
add a3,a3,a4
add a1,a1,a4
bne a0,zero,.L4
.L6:
ret
After this patch:
vvaddint32:
vsetvli t0, a0, e32, ta, ma # Set vector length based on 32-bit vectors
vle32.v v0, (a1) # Get first vector
sub a0, a0, t0 # Decrement number done
slli t0, t0, 2 # Multiply number done by 4 bytes
add a1, a1, t0 # Bump pointer
vle32.v v1, (a2) # Get second vector
add a2, a2, t0 # Bump pointer
vadd.vv v2, v0, v1 # Sum vectors
vse32.v v2, (a3) # Store result
add a3, a3, t0 # Bump pointer
bnez a0, vvaddint32 # Loop back
ret # Finished
Co-authored-by: Richard Sandiford<richard.sandiford@arm.com>
Co-authored-by: Richard Biener <rguenther@suse.de>
gcc/ChangeLog:
* doc/md.texi: Add SELECT_VL support.
* internal-fn.def (SELECT_VL): Ditto.
* optabs.def (OPTAB_D): Ditto.
* tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Ditto.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Ditto.
* tree-vect-stmts.cc (get_select_vl_data_ref_ptr): Ditto.
(vectorizable_store): Ditto.
(vectorizable_load): Ditto.
* tree-vectorizer.h (LOOP_VINFO_USING_SELECT_VL_P): Ditto.
|
|
PR analyzer/110112 notes that -fanalyzer is extremely slow on a source
file with large read-only static arrays, repeatedly building the
same compound_svalue representing the full initializer, and repeatedly
building svalues representing parts of the the full initialiazer.
This patch adds caches for both of these; together they reduce the time
taken by -fanalyzer -O2 on the testcase in the bug for an optimized
build:
91.2s : no caches (status quo)
32.4s : cache in decl_region::get_svalue_for_constructor
3.7s : cache in region::get_initial_value_at_main
3.1s : both caches (this patch)
gcc/analyzer/ChangeLog:
PR analyzer/110112
* region-model.cc (region_model::get_initial_value_for_global):
Move code to region::calc_initial_value_at_main.
* region.cc (region::get_initial_value_at_main): New function.
(region::calc_initial_value_at_main): New function, based on code
in region_model::get_initial_value_for_global.
(region::region): Initialize m_cached_init_sval_at_main.
(decl_region::get_svalue_for_constructor): Add a cache, splitting
out body to...
(decl_region::calc_svalue_for_constructor): ...this new function.
* region.h (region::get_initial_value_at_main): New decl.
(region::calc_initial_value_at_main): New decl.
(region::m_cached_init_sval_at_main): New field.
(decl_region::decl_region): Initialize m_ctor_svalue.
(decl_region::calc_svalue_for_constructor): New decl.
(decl_region::m_ctor_svalue): New field.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
before casting into an irange, make sure the type being cast into
is also supported.
PR ipa/109886
* ipa-prop.cc (ipa_compute_jump_functions_for_edge): Check param
type as well.
|
|
Make range_cast inlinable by moving it to the header file.
Also trap if the destination is not capable of representing the cast type.
Add a generic version which can change range classes.. ie float to int.
* range-op.cc (range_cast): Move to...
* range-op.h (range_cast): Here and add generic a version.
|
|
Various spaceship tests failed after r14-1624. This turned out to be
because the comparison category classes return in memory on 32-bit targets,
and the synthesized operator<=> looks something like
if (auto v = a.x <=> b.x, v == 0); else return v;
if (auto v = a.y <=> b.y, v == 0); else return v;
etc.
so check_return_expr was trying to do NRVO for all the 'v' variables, and
now on subsequent returns we check to see if the previous NRV is still in
scope. But the NRVs didn't have names, so looking up name bindings crashed.
Fixed both by giving 'v' a name so we can NRVO the first one, and fixing the
test to give up if the old NRV has no name.
PR c++/110185
PR c++/58487
gcc/cp/ChangeLog:
* method.cc (build_comparison_op): Give retval a name.
* typeck.cc (check_return_expr): Fix for nameless variables.
|
|
We were failing to diagnose this Concepts TS feature that didn't make it
into C++20 because the 'auto' was getting converted to a template parameter
before we checked for it. So also check in cp_parser_simple_type_specifier.
The code in cp_parser_template_type_arg that I initially expected to
diagnose this seems unreachable because cp_parser_type_id_1 already checks
auto.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_simple_type_specifier): Check for auto
in template argument.
(cp_parser_template_type_arg): Remove auto checking.
gcc/testsuite/ChangeLog:
* g++.dg/concepts/auto7.C: New test.
* g++.dg/concepts/auto7a.C: New test.
|
|
The maybe_init_list_as_range optimization is a form of copy elision, but we
can only elide well-formed copies.
PR c++/110102
gcc/cp/ChangeLog:
* call.cc (maybe_init_list_as_array): Check that the element type is
copyable.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/initlist-opt1.C: New test.
|
|
The manual is incorrect in saying that the option does not warn
about designated initializers, which it does in C++. Whether the
divergence in behavior is desirable is another thing, but let's
at least make the manual match the reality.
PR c/39589
PR c++/96868
gcc/ChangeLog:
* doc/invoke.texi: Clarify that -Wmissing-field-initializers doesn't
warn about designated initializers in C only.
|
|
This adds plus to the op list of `(zero_one == 0) ? y : z <op> y` patterns
which currently has bit_ior and bit_xor.
This shows up now in GCC after the boolization work that Uroš has been doing.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR tree-optimization/97711
PR tree-optimization/110155
gcc/ChangeLog:
* match.pd ((zero_one == 0) ? y : z <op> y): Add plus to the op.
((zero_one != 0) ? z <op> y : y): Likewise.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/branchless-cond-add-2.c: New test.
* gcc.dg/tree-ssa/branchless-cond-add.c: New test.
|
|
rather than `(-zero_one) & z`
Since there is a pattern to convert `(-zero_one) & z` into `zero_one * z` already,
it is better if we don't do a secondary transformation. This reduces the extra
statements produced by match-and-simplify on the gimple level too.
gcc/ChangeLog:
* match.pd ((zero_one ==/!= 0) ? y : z <op> y): Use
multiply rather than negation/bit_and.
|
|
This allows unsigned types if the inner type where the negation is
located has greater than or equal to precision than the outer type.
branchless-cond.c needs to be updated since now we change it to
use a multiply rather than still having (-a)&c in there.
OK? Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* match.pd (`X & -Y -> X * Y`): Allow for truncation
and the same type for unsigned types.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/branchless-cond.c: Update testcase.
|
|
So for the attached testcase, we assumed that zero_one_valued_p would
be the value [0,1] but currently zero_one_valued_p matches also
signed 1 bit integers.
This changes that not to match that and fixes the 2 new testcases at
all optimization levels.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
Note the GCC 13 patch will be slightly different due to the changes
made to zero_one_valued_p.
PR tree-optimization/110165
PR tree-optimization/110166
gcc/ChangeLog:
* match.pd (zero_one_valued_p): Don't accept
signed 1-bit integers.
gcc/testsuite/ChangeLog:
* gcc.c-torture/execute/pr110165-1.c: New test.
* gcc.c-torture/execute/pr110166-1.c: New test.
|
|
This patch fixes the problem that the loop in the tsvc s176 function is
optimized and removed because `iterations/LEN_1D` is 0 (where iterations
is set to 10000, LEN_1D is set to 32000 in tsvc.h).
This testcase passed on x86 and AArch64 system.
Best,
Lehua
gcc/testsuite/ChangeLog:
* gcc.dg/vect/tsvc/vect-tsvc-s176.c: Adjust iterations.
* gcc.dg/vect/tsvc/tsvc.h: Adjust expected rsult for s176.
|
|
When folding two conversions in a row we use TYPE_PRECISION but
that's invalid for VECTOR_TYPE. The following fixes this by
using element_precision instead.
* match.pd (two conversions in a row): Use element_precision
to DTRT for VECTOR_TYPE.
|
|
This patch would like to refactor the requirement of both the ZVFH
and ZVFHMIN. By default, the ZVFHMIN will enable FP16 for all the
iterators of RVV. And then the ZVFH will leverage one define attr as
the gate for FP16 supported or not.
Please note the ZVFH will cover the ZVFHMIN instructions. This patch
add one test for this.
Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Co-Authored by: Kito Cheng <kito.cheng@sifive.com>
gcc/ChangeLog:
* config/riscv/riscv.md (enabled): Move to another place, and
add fp_vector_disabled to the cond.
(fp_vector_disabled): New attr defined for disabling fp.
* config/riscv/vector-iterators.md: Fix V_WHOLE and V_FRACT.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: Add vle16 test
for ZVFHMIN.
|
|
This patch would like to fix one warning similar as below, and add the
link for where the values comes from.
./gcc/config/riscv/riscv-protos.h:260:13: warning: binary constants are
a C++14 feature or GCC extension
FRM_RNE = 0b000,
^~~~~
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/riscv-protos.h (enum frm_field_enum): Adjust
literal to int.
|
|
The pr96024.f90 testcase ICEs on big-endian hosts. The problem is
that length->val.integer is accessed after checking
length->expr_type == EXPR_CONSTANT, but it is a CHARACTER constant
which uses length->val.character union member instead and on big-endian
we end up reading constant 0x100000000 rather than some small number
on little-endian and if target doesn't have enough memory for 4 times
that (i.e. 16GB allocation), it ICEs.
2023-06-09 Jakub Jelinek <jakub@redhat.com>
PR fortran/96024
* primary.cc (gfc_convert_to_structure_constructor): Only do
constant string ctor length verification and truncation/padding
if constant length has INTEGER type.
|
|
Since mask < 0 will be always false for vector char when
-funsigned-char, but vpblendvb needs to check the most significant
bit. The patch explicitly VCE to vector signed char.
gcc/ChangeLog:
PR target/110108
* config/i386/i386.cc (ix86_gimple_fold_builtin): Explicitly
view_convert_expr mask to signed type when folding pblendvb
builtins.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr110108-2.c: New test.
|
|
r14-1145 fold the intrinsics into gimple ABS_EXPR which has UB for
TYPE_MIN, but PABSB will store unsigned result into dst. The patch
uses ABSU_EXPR + VCE instead of ABS_EXPR.
Also don't fold _mm_abs_{pi8,pi16,pi32} w/o TARGET_64BIT since 64-bit
vector absm2 is guarded with TARGET_MMX_WITH_SSE.
gcc/ChangeLog:
PR target/110108
* config/i386/i386.cc (ix86_gimple_fold_builtin): Fold
_mm{,256,512}_abs_{epi8,epi16,epi32,epi64} into gimple
ABSU_EXPR + VCE, don't fold _mm_abs_{pi8,pi16,pi32} w/o
TARGET_64BIT.
* config/i386/i386-builtin.def: Replace CODE_FOR_nothing with
real codename for __builtin_ia32_pabs{b,w,d}.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr110108.c: New test.
* gcc.target/i386/pr110108-3.c: New test.
* gcc.target/i386/pr109900.c: Adjust testcase.
|
|
|
|
This patches fixes two problems with the asm statement.
gm2 -Wall -c fooasm3.mod generates an incorrect warning and
gm2 cannot concatenate strings before an ASM statement.
The asm statement now accepts a constant expression (rather than
a string) and it updates the variable read/write use lists as
appropriate.
gcc/m2/ChangeLog:
PR modula2/110126
* gm2-compiler/M2GenGCC.mod (BuildTreeFromInterface): Remove
tokenno parameter. Use object tok instead of tokenno.
(BuildTrashTreeFromInterface): Use object tok instead of
GetDeclaredMod.
(CodeInline): Remove tokenno from parameter list to BuildTreeFromInterface.
* gm2-compiler/M2Quads.def (BuildAsmElement): Exported and
defined.
* gm2-compiler/M2Quads.mod (BuildOptimizeOff): Reformatted.
(BuildInline): Reformatted.
(BuildLineNo): Reformatted.
(UseLineNote): Reformatted.
(BuildAsmElement): New procedure.
* gm2-compiler/P0SyntaxCheck.bnf (AsmOperands): Use
ConstExpression instead of string.
(AsmElement): Use ConstExpression instead of string.
(TrashList): Use ConstExpression instead of string.
* gm2-compiler/P1Build.bnf (AsmOperands): Use
ConstExpression instead of string.
(AsmElement): Use ConstExpression instead of string.
(TrashList): Use ConstExpression instead of string.
* gm2-compiler/P2Build.bnf (AsmOperands): Use
ConstExpression instead of string.
(AsmElement): Use ConstExpression instead of string.
(TrashList): Use ConstExpression instead of string.
* gm2-compiler/P3Build.bnf (AsmOperands): Rewrite.
(AsmOperandSpec): Rewrite.
(AsmOutputList): New rule.
(AsmInputList): New rule.
(TrashList): Rewrite.
* gm2-compiler/PCBuild.bnf (AsmOperands): Use
ConstExpression instead of string.
(AsmElement): Use ConstExpression instead of string.
(TrashList): Use ConstExpression instead of string.
* gm2-compiler/PHBuild.bnf (AsmOperands): Use
ConstExpression instead of string.
(AsmElement): Use ConstExpression instead of string.
(TrashList): Use ConstExpression instead of string.
* gm2-compiler/SymbolTable.def (PutRegInterface):
Rewrite interface.
(GetRegInterface): Rewrite interface.
* gm2-compiler/SymbolTable.mod (SetFirstUsed): New procedure.
(PutFirstUsed): New procedure.
(PutRegInterface): Rewrite.
(GetRegInterface): Rewrite.
gcc/testsuite/ChangeLog:
PR modula2/110126
* gm2/pim/pass/fooasm3.mod: New test.
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
|
|
Simplify range_op_handler to have a single range_operator pointer and
provide a more flexible dispatch mechanism for calls via generic vrange
classes. This is more extensible for adding new classes of range support.
Any unsupported dispatch patterns will simply return FALSE now rather
than generating compile time exceptions, aleviating the need to
constantly check for supoprted types.
* gimple-range-op.cc
(gimple_range_op_handler::gimple_range_op_handler): Adjust.
(gimple_range_op_handler::maybe_builtin_call): Adjust.
* gimple-range-op.h (operand1, operand2): Use m_operator.
* range-op.cc (integral_table, pointer_table): Relocate.
(get_op_handler): Rename from get_handler and handle all types.
(range_op_handler::range_op_handler): Relocate.
(range_op_handler::set_op_handler): Relocate and adjust.
(range_op_handler::range_op_handler): Relocate.
(dispatch_trio): New.
(RO_III, RO_IFI, RO_IFF, RO_FFF, RO_FIF, RO_FII): New consts.
(range_op_handler::dispatch_kind): New.
(range_op_handler::fold_range): Relocate and Use new dispatch value.
(range_op_handler::op1_range): Ditto.
(range_op_handler::op2_range): Ditto.
(range_op_handler::lhs_op1_relation): Ditto.
(range_op_handler::lhs_op2_relation): Ditto.
(range_op_handler::op1_op2_relation): Ditto.
(range_op_handler::set_op_handler): Use m_operator member.
* range-op.h (range_op_handler::operator bool): Use m_operator.
(range_op_handler::dispatch_kind): New.
(range_op_handler::m_valid): Delete.
(range_op_handler::m_int): Delete
(range_op_handler::m_float): Delete
(range_op_handler::m_operator): New.
(range_op_table::operator[]): Relocate from .cc file.
(range_op_table::set): Ditto.
* value-range.h (class vrange): Make range_op_handler a friend.
|
|
Range_operator and range_operator_float are 2 different classes, making
generalized dispatch difficult. The distinction between what is a float
operator and what is an integral operator also blurs when some methods
have multiple types. ie, casts : INT = FLOAT and FLOAT = INT
This patch unifies all possible invocation patterns in one class, and
switches the float table to use the general range_op_table.
* gimple-range-op.cc (cfn_constant_float_p): Change base class.
(cfn_pass_through_arg1): Adjust using statemenmt.
(cfn_signbit): Change base class, adjust using statement.
(cfn_copysign): Ditto.
(cfn_sqrt): Ditto.
(cfn_sincos): Ditto.
* range-op-float.cc (fold_range): Change class to range_operator.
(rv_fold): Ditto.
(op1_range): Ditto
(op2_range): Ditto
(lhs_op1_relation): Ditto.
(lhs_op2_relation): Ditto.
(op1_op2_relation): Ditto.
(foperator_*): Ditto.
(class float_table): New. Inherit from range_op_table.
(floating_tree_table) Change to range_op_table pointer.
(class floating_op_table): Delete.
* range-op.cc (operator_equal): Adjust using statement.
(operator_not_equal): Ditto.
(operator_lt, operator_le, operator_gt, operator_ge): Ditto.
(operator_minus, operator_cast): Ditto.
(operator_bitwise_and, pointer_plus_operator): Ditto.
(get_float_handle): Change return type.
* range-op.h (range_operator_float): Delete. Relocate all methods
into class range_operator.
(range_op_handler::m_float): Change type to range_operator.
(floating_op_table): Delete.
(floating_tree_table): Change type.
|
|
Range_operator had a tree code added last release to facilitate
bitmask operations. This removes the tree_code and replaces it with a
virtual routine to peform the masking. Remove any duplicate instances
which are no longer needed.
* range-op.cc (range_operator::fold_range): Call virtual routine.
(range_operator::update_bitmask): New.
(operator_equal::update_bitmask): New.
(operator_not_equal::update_bitmask): New.
(operator_lt::update_bitmask): New.
(operator_le::update_bitmask): New.
(operator_gt::update_bitmask): New.
(operator_ge::update_bitmask): New.
(operator_ge::update_bitmask): New.
(operator_plus::update_bitmask): New.
(operator_minus::update_bitmask): New.
(operator_pointer_diff::update_bitmask): New.
(operator_min::update_bitmask): New.
(operator_max::update_bitmask): New.
(operator_mult::update_bitmask): New.
(operator_div:operator_div):New.
(operator_div::update_bitmask): New.
(operator_div::m_code): New member.
(operator_exact_divide::operator_exact_divide): New constructor.
(operator_lshift::update_bitmask): New.
(operator_rshift::update_bitmask): New.
(operator_bitwise_and::update_bitmask): New.
(operator_bitwise_or::update_bitmask): New.
(operator_bitwise_xor::update_bitmask): New.
(operator_trunc_mod::update_bitmask): New.
(op_ident, op_unknown, op_ptr_min_max): New.
(op_nop, op_convert): Delete.
(op_ssa, op_paren, op_obj_type): Delete.
(op_realpart, op_imagpart): Delete.
(op_ptr_min, op_ptr_max): Delete.
(pointer_plus_operator:update_bitmask): New.
(range_op_table::set): Do not use m_code.
(integral_table::integral_table): Adjust to single instances.
* range-op.h (range_operator::range_operator): Delete.
(range_operator::m_code): Delete.
(range_operator::update_bitmask): New.
|
|
We currently do not have any floating point operators where operand 1 is
a different type than the LHS. When we eventually do there is a bug
in fold_range. If either operand is a known NAN, it returns a NAN
of the type of operand 1 instead of the result type.
* range-op-float.cc (range_operator_float::fold_range): Return
NAN of the result type.
|
|
This patch would like to add new test cases to make sure the
RVV FP16 works well as expected.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/zvfh-intrinsic.c: Add new cases.
* gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c: New test.
|
|
This patch enhances -Wanalyzer-out-of-bounds that is no longer paired
with a -Wanalyzer-use-of-uninitialized-value on out-of-bounds-read.
This also fixes PR analyzer/109437.
Before there could always be at most one OOB-read warning per frame because
-Wanalyzer-use-of-uninitialized-value always terminates the analysis
path.
PR 109439
gcc/analyzer/ChangeLog:
* bounds-checking.cc (region_model::check_symbolic_bounds): Returns whether the BASE_REG
region access was OOB.
(region_model::check_region_bounds): Likewise.
* region-model.cc (region_model::get_store_value): Creates an
unknown svalue on OOB-read access to REG.
(region_model::check_region_access): Returns whether an unknown svalue needs be created.
(region_model::check_region_for_read): Passes check_region_access return value.
* region-model.h: Update prior function definitions.
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/out-of-bounds-2.c: Cleaned test for uninitialized-value warning
* gcc.dg/analyzer/out-of-bounds-5.c: Likewise.
* gcc.dg/analyzer/pr101962.c: Likewise.
* gcc.dg/analyzer/realloc-5.c: Likewise.
* gcc.dg/analyzer/pr109439.c: New test.
|
|
We have expand_doubleword_clz for a couple of years, where we emit
double-word CLZ as if (high_word == 0) return CLZ (low_word) + word_size;
else return CLZ (high_word);
We can do something similar for CTZ and FFS IMHO, just with the 2
words swapped. So if (low_word == 0) return CTZ (high_word) + word_size;
else return CTZ (low_word); for CTZ and
if (low_word == 0) { return high_word ? FFS (high_word) + word_size : 0;
else return FFS (low_word);
The following patch implements that.
Note, on some targets which implement both word_mode ctz and ffs patterns,
it might be better to incrementally implement those double-word ffs expansion
patterns in md files, because we aren't able to optimize it correctly;
nothing can detect we have just made sure that argument is not 0 and so
don't need to bother with handling that case. So, on ia32 just using
CTZ patterns would be better there, but I think we can even do better and
instead of doing the comparisons of the operands against 0 do the CTZ
expansion followed by testing of flags.
2023-06-08 Jakub Jelinek <jakub@redhat.com>
* optabs.cc (expand_ffs): Add forward declaration.
(expand_doubleword_clz): Rename to ...
(expand_doubleword_clz_ctz_ffs): ... this. Add UNOPTAB argument,
handle also doubleword CTZ and FFS in addition to CLZ.
(expand_unop): Adjust caller. Also call it for doubleword
ctz_optab and ffs_optab.
* gcc.target/i386/ctzll-1.c: New test.
* gcc.target/i386/ffsll-1.c: New test.
|
|
[PR110152]
I'm getting
+FAIL: gcc.target/i386/3dnow-1.c (internal compiler error: Segmentation fault signal terminated program cc1)
+FAIL: gcc.target/i386/3dnow-1.c (test for excess errors)
+FAIL: gcc.target/i386/3dnow-2.c (internal compiler error: Segmentation fault signal terminated program cc1)
+FAIL: gcc.target/i386/3dnow-2.c (test for excess errors)
+FAIL: gcc.target/i386/mmx-1.c (internal compiler error: Segmentation fault signal terminated program cc1)
+FAIL: gcc.target/i386/mmx-1.c (test for excess errors)
+FAIL: gcc.target/i386/mmx-2.c (internal compiler error: Segmentation fault signal terminated program cc1)
+FAIL: gcc.target/i386/mmx-2.c (test for excess errors)
regressions on i686-linux since r14-1166. The problem is when
ix86_expand_vector_init_general is called with mmx_ok = true and
mode = V4HImode, it newly recurses with mmx_ok = false and mode = V2SImode,
but as mmx_ok is false and !TARGET_SSE, we recurse again with the same
arguments (ok, fresh new tmp and vals) infinitely.
The following patch fixes that by passing mmx_ok to that recursive call.
For n_words == 4 it isn't needed, because we only care about mmx_ok for
V2SImode or V2SFmode and no other modes.
2023-06-08 Jakub Jelinek <jakub@redhat.com>
PR target/110152
* config/i386/i386-expand.cc (ix86_expand_vector_init_general): For
n_words == 2 recurse with mmx_ok as first argument rather than false.
|
|
2023-06-08 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/87477
PR fortran/99350
PR fortran/107821
PR fortran/109451
* decl.cc (char_len_param_value): Simplify a copy of the expr
and replace the original if there is no error.
* gfortran.h : Remove the redundant field 'rankguessed' from
'gfc_association_list'.
* resolve.cc (resolve_assoc_var): Remove refs to 'rankguessed'.
(resolve_variable): Associate names with constant or structure
constructor targets cannot have array refs.
* trans-array.cc (gfc_conv_expr_descriptor): Guard expression
character length backend decl before using it. Suppress the
assignment if lhs equals rhs.
* trans-io.cc (gfc_trans_transfer): Scalarize transfer of
associate variables pointing to a variable. Add comment.
* trans-stmt.cc (trans_associate_var): Remove requirement that
the character length be deferred before assigning the value
returned by gfc_conv_expr_descriptor. Also, guard the backend
decl before testing with VAR_P.
gcc/testsuite/
PR fortran/99350
* gfortran.dg/pr99350.f90 : New test.
PR fortran/107821
* gfortran.dg/associate_5.f03 : Changed error message.
* gfortran.dg/pr107821.f90 : New test.
PR fortran/109451
* gfortran.dg/associate_61.f90 : New test
|
|
Several tests are timing out when targeting x86-*-vxworks with qemu.
Bump their timeout factor.
for gcc/testsuite/ChangeLog
* gcc.dg/vect/tsvc/vect-tsvc-s116.c: Bump timeout factor.
* gcc.dg/vect/tsvc/vect-tsvc-s241.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s254.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s271.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s2711.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s2712.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s276.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-vdotr.c: Likewise.
|
|
|
|
Richard Sandiford was, of course, right to be warry of new code without
much test coverage. Converting the nvptx backend to use the BITREVERSE
rtx infrastructure, has resulted in far more exhaustive testing and
revealed a subtle bug in the new wi::bitreverse implementation. The
code needs to use HOST_WIDE_INT_1U (instead of 1) to avoid unintended
sign extension.
This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
(with a minor tweak to use BITREVERSE), where it fixes regressions of
the 32-bit test vectors in gcc.target/nvptx/brev-2.c and the 64-bit
test vectors in gcc.target/nvptx/brevll-2.c. Committed as obvious.
2023-06-07 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* wide-int.cc (wi::bitreverse_large): Use HOST_WIDE_INT_1U to
avoid sign extension/undefined behaviour when setting each bit.
|
|
This patch is the latest revision of my patch to add support for the
STC (set carry flag) and CMC (complement carry flag) instructions to
the i386 backend, incorporating Uros' previous feedback. The significant
changes are (i) the inclusion of CMC, (ii) the use of UNSPEC for pattern,
(iii) Use of a new X86_TUNE_SLOW_STC tuning flag to use alternate
implementations on pentium4 (which has a notoriously slow STC) when
not optimizing for size.
An example of the use of the stc instruction is:
unsigned int foo (unsigned int a, unsigned int b, unsigned int *c) {
return __builtin_ia32_addcarryx_u32 (1, a, b, c);
}
which previously generated:
movl $1, %eax
addb $-1, %al
adcl %esi, %edi
setc %al
movl %edi, (%rdx)
movzbl %al, %eax
ret
with this patch now generates:
stc
adcl %esi, %edi
setc %al
movl %edi, (%rdx)
movzbl %al, %eax
ret
An example of the use of the cmc instruction (where the carry from
a first adc is inverted/complemented as input to a second adc) is:
unsigned int bar (unsigned int a, unsigned int b,
unsigned int c, unsigned int d)
{
unsigned int c1 = __builtin_ia32_addcarryx_u32 (1, a, b, &o1);
return __builtin_ia32_addcarryx_u32 (c1 ^ 1, c, d, &o2);
}
which previously generated:
movl $1, %eax
addb $-1, %al
adcl %esi, %edi
setnc %al
movl %edi, o1(%rip)
addb $-1, %al
adcl %ecx, %edx
setc %al
movl %edx, o2(%rip)
movzbl %al, %eax
ret
and now generates:
stc
adcl %esi, %edi
cmc
movl %edi, o1(%rip)
adcl %ecx, %edx
setc %al
movl %edx, o2(%rip)
movzbl %al, %eax
ret
This version implements Uros' suggestions/refinements. (i) Avoid the
UNSPEC_CMC by using the canonical RTL idiom for *x86_cmc, (ii) Use
peephole2s to convert x86_stc and *x86_cmc into alternate forms on
TARGET_SLOW_STC CPUs (pentium4), when a suitable QImode register is
available, (iii) Prefer the addqi_cconly_overflow idiom (addb $-1,%al)
over negqi_ccc_1 (neg %al) for setting the carry from a QImode value,
These changes required two minor edits to i386.cc: ix86_cc_mode had
to be tweaked to suggest CCCmode for the new *x86_cmc pattern, and
*x86_cmc needed to be handled/parameterized in ix86_rtx_costs so that
combine would appreciate that this complex RTL expression was actually
a fast, single byte instruction [i.e. preferable].
2022-06-07 Roger Sayle <roger@nextmovesoftware.com>
Uros Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_builtin) <handlecarry>:
Use new x86_stc instruction when the carry flag must be set.
* config/i386/i386.cc (ix86_cc_mode): Use CCCmode for *x86_cmc.
(ix86_rtx_costs): Provide accurate rtx_costs for *x86_cmc.
* config/i386/i386.h (TARGET_SLOW_STC): New define.
* config/i386/i386.md (UNSPEC_STC): New UNSPEC for stc.
(x86_stc): New define_insn.
(define_peephole2): Convert x86_stc into alternate implementation
on pentium4 without -Os when a QImode register is available.
(*x86_cmc): New define_insn.
(define_peephole2): Convert *x86_cmc into alternate implementation
on pentium4 without -Os when a QImode register is available.
(*setccc): New define_insn_and_split for a no-op CCCmode move.
(*setcc_qi_negqi_ccc_1_<mode>): New define_insn_and_split to
recognize (and eliminate) the carry flag being copied to itself.
(*setcc_qi_negqi_ccc_2_<mode>): Likewise.
* config/i386/x86-tune.def (X86_TUNE_SLOW_STC): New tuning flag.
gcc/testsuite/ChangeLog
* gcc.target/i386/cmc-1.c: New test case.
* gcc.target/i386/stc-1.c: Likewise.
|
|
Now that we support NRV from an inner block, we can also support non-NRV
returns from other blocks, since once the NRV is out of scope a later return
expression can't possibly alias it.
This fixes 58487 and half-fixes 53637: now one of the returns is elided, but
not the other.
Fixing the remaining xfails in these testcases will require a very different
approach, probably involving a full tree/block walk from finalize_nrv, and
check_return_expr only adding to a list of potential return variables.
PR c++/58487
PR c++/53637
gcc/cp/ChangeLog:
* cp-tree.h (INIT_EXPR_NRV_P): New.
* semantics.cc (finalize_nrv_r): Check it.
* name-lookup.h (decl_in_scope_p): Declare.
* name-lookup.cc (decl_in_scope_p): New.
* typeck.cc (check_return_expr): Allow non-NRV
returns if the NRV is no longer in scope.
gcc/testsuite/ChangeLog:
* g++.dg/opt/nrv26.C: New test.
* g++.dg/opt/nrv26a.C: New test.
* g++.dg/opt/nrv27.C: New test.
|
|
The patterns match more than just `a & 1` so change the comment
for these two patterns to say that.
Committed as obvious after a bootstrap/test on x86_64-linux-gnu.
gcc/ChangeLog:
* match.pd: Fix comment for the
`(zero_one ==/!= 0) ? y : z <op> y` patterns.
|
|
This patch tries to prevent generating unnecessary sign extension
after *w instructions like "addiw" or "divw".
The main idea of it is to add SUBREG_PROMOTED fields during expanding.
I have tested on SPEC2017 there is no regression.
Only gcc.dg/pr30957-1.c test failed.
To solve that I did some changes in loop-iv.cc, but not sure that it is
suitable.
gcc/ChangeLog:
* config/riscv/bitmanip.md (rotrdi3, rotrsi3, rotlsi3): New expanders.
(rotrsi3_sext): Expose generator.
(rotlsi3 pattern): Hide generator.
* config/riscv/riscv-protos.h (riscv_emit_binary): New function
declaration.
* config/riscv/riscv.cc (riscv_emit_binary): Removed static
* config/riscv/riscv.md (addsi3, subsi3, negsi2): Hide generator.
(mulsi3, <optab>si3): Likewise.
(addsi3, subsi3, negsi2, mulsi3, <optab>si3): New expanders.
(addv<mode>4, subv<mode>4, mulv<mode>4): Use riscv_emit_binary.
(<u>mulsidi3): Likewise.
(addsi3_extended, subsi3_extended, negsi2_extended): Expose generator.
(mulsi3_extended, <optab>si3_extended): Likewise.
(splitter for shadd feeding divison): Update RTL pattern to account
for changes in how 32 bit ops are expanded for TARGET_64BIT.
* loop-iv.cc (get_biv_step_1): Process src of extension when it PLUS.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/shift-and-2.c: New tests.
* gcc.target/riscv/shift-shift-2.c: Adjust expected output.
* gcc.target/riscv/sign-extend.c: New test.
* gcc.target/riscv/zbb-rol-ror-03.c: Adjust expected output.
Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
|
|
During libgcc configure stage for riscv32-none-elf, when
"--enable-checking=yes,rtl" has been activated, the following error
is observed:
during RTL pass: final
conftest.c: In function 'main':
conftest.c:16:1: internal compiler error: RTL check: expected code 'const_int', have 'reg' in riscv_print_operand, at config/riscv/riscv.cc:4462
16 | }
| ^
0x843c4d rtl_check_failed_code1(rtx_def const*, rtx_code, char const*, int, char const*)
/mnt/nvme/dinux/local-workspace/gcc/gcc/rtl.cc:916
0x8ea823 riscv_print_operand
/mnt/nvme/dinux/local-workspace/gcc/gcc/config/riscv/riscv.cc:4462
0xde84b5 output_operand(rtx_def*, int)
/mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:3632
0xde8ef8 output_asm_insn(char const*, rtx_def**)
/mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:3544
0xded33b output_asm_insn(char const*, rtx_def**)
/mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:3421
0xded33b final_scan_insn_1
/mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:2841
0xded6cb final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*)
/mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:2887
0xded8b7 final_1
/mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:1979
0xdee518 rest_of_handle_final
/mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:4240
0xdee518 execute
/mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:4318
Fix by moving the calculation of memmodel to the cases where it is used.
Regression tested for riscv32-none-elf. No changes in gcc.sum and
g++.sum.
PR target/109725
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_print_operand): Calculate
memmodel only when it is valid.
Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
|
|
When building riscv32-none-elf with "--enable-checking=yes,rtl", the
following ICE is observed:
cc1: internal compiler error: RTL check: expected code 'const_int', have 'const_double' in riscv_const_insns, at config/riscv/riscv.cc:1313
0x843c4d rtl_check_failed_code1(rtx_def const*, rtx_code, char const*, int, char const*)
/mnt/nvme/dinux/local-workspace/gcc/gcc/rtl.cc:916
0x8eab61 riscv_const_insns(rtx_def*)
/mnt/nvme/dinux/local-workspace/gcc/gcc/config/riscv/riscv.cc:1313
0x15443bb riscv_legitimate_constant_p
/mnt/nvme/dinux/local-workspace/gcc/gcc/config/riscv/riscv.cc:826
0xdd3c71 emit_move_insn(rtx_def*, rtx_def*)
/mnt/nvme/dinux/local-workspace/gcc/gcc/expr.cc:4310
0x15f28e5 run_const_vector_selftests
/mnt/nvme/dinux/local-workspace/gcc/gcc/config/riscv/riscv-selftests.cc:285
0x15f37bd selftest::riscv_run_selftests()
/mnt/nvme/dinux/local-workspace/gcc/gcc/config/riscv/riscv-selftests.cc:364
0x1f6fba9 selftest::run_tests()
/mnt/nvme/dinux/local-workspace/gcc/gcc/selftest-run-tests.cc:111
0x11d1f39 toplev::run_self_tests()
/mnt/nvme/dinux/local-workspace/gcc/gcc/toplev.cc:2185
Fix by following the spirit of the adjacent comment, and using the
dedicated riscv_const_insns() function to calculate cost for loading a
constant element. Infinite recursion is not possible because the first
invocation is on a CONST_VECTOR, whereas the second is on a single
element of the vector (e.g. CONST_INT or CONST_DOUBLE).
Regression tested for riscv32-none-elf. No changes in gcc.sum and
g++.sum.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_const_insns): Recursively call
for constant element of a vector.
Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
|
|
Recently zero_one_valued_p was changed to handle integer_zerop
case specially, because tree_nonzero_bits (@0) == 1 only returns
true for non-constant values with range [0, 1] or constant 1,
constant 0 has tree_nonzero_bits (integer_zero_node) == 0.
The following patch reverts that change and instead checks
that tree_nonzero_bits is <= 1U.
2023-06-07 Jakub Jelinek <jakub@redhat.com>
* match.pd (zero_one_valued_p): Don't handle integer_zerop specially,
instead compare tree_nonzero_bits <= 1U rather than just == 1.
|
|
This patch refactors the ls64 builtins to allow the compiler to define them
directly instead of having wrapper functions in arm_acle.h. This should be not
only easier to maintain, but it makes two important correctness fixes:
- It fixes PR110132, where the builtins ended up getting declared with
invisible bindings in the C FE, so the FE ended up synthesizing
incompatible implicit definitions for these builtins.
- It allows the builtins to be used with LTO, which didn't work previously.
We also take the opportunity to add test coverage from C++ for these
builtins.
gcc/ChangeLog:
PR target/110132
* config/aarch64/aarch64-builtins.cc (aarch64_general_simulate_builtin):
New. Use it ...
(aarch64_init_ls64_builtins): ... here. Switch to declaring public ACLE
names for builtins.
(aarch64_general_init_builtins): Ensure we invoke the arm_acle.h
setup if in_lto_p, just like we do for SVE.
* config/aarch64/arm_acle.h: (__arm_ld64b): Delete.
(__arm_st64b): Delete.
(__arm_st64bv): Delete.
(__arm_st64bv0): Delete.
gcc/testsuite/ChangeLog:
PR target/110132
* lib/target-supports.exp (check_effective_target_aarch64_asm_FUNC_ok):
Extend to ls64.
* g++.target/aarch64/acle/acle.exp: New.
* g++.target/aarch64/acle/ls64.C: New test.
* g++.target/aarch64/acle/ls64_lto.C: New test.
* gcc.target/aarch64/acle/ls64_lto.c: New test.
* gcc.target/aarch64/acle/pr110132.c: New test.
|
|
The st64b pattern incorrectly had an output constraint on the register
operand containing the destination address for the store, leading to
wrong code. This patch fixes that.
gcc/ChangeLog:
PR target/110100
* config/aarch64/aarch64-builtins.cc (aarch64_expand_builtin_ls64):
Use input operand for the destination address.
* config/aarch64/aarch64.md (st64b): Fix constraint on address
operand.
gcc/testsuite/ChangeLog:
PR target/110100
* gcc.target/aarch64/acle/pr110100.c: New test.
|
|
The ls64 builtin code was using incorrect GNU style with eight spaces where
there should be a tab. Fixed thusly.
gcc/ChangeLog:
PR target/110100
* config/aarch64/aarch64-builtins.cc (aarch64_init_ls64_builtins_types):
Replace eight consecutive spaces with tabs.
(aarch64_init_ls64_builtins): Likewise.
(aarch64_expand_builtin_ls64): Likewise.
* config/aarch64/aarch64.md (ld64b): Likewise.
(st64b): Likewise.
(st64bv): Likewise
(st64bv0): Likewise.
|
|
On some targets an integer pseudo can be assigned to a FP reg. For
pic offset table pseudo it means we will reload the pseudo in this
case and, as a consequence, memory containing the pseudo might be
recognized as wrong one. The patch fix this problem.
PR target/109541
gcc/ChangeLog:
* ira-costs.cc: (find_costs_and_classes): Constrain classes of pic
offset table pseudo to a general reg subset.
gcc/testsuite/ChangeLog:
* gcc.target/sparc/pr109541.c: New.
|
|
This patch removes UNSPEC_SQXTUN and uses organic RTL codes to represent the operation.
SQXTUN is an odd one. It's described in the architecture as "Signed saturating extract Unsigned Narrow".
It's not a straightforward ss_truncate nor a us_truncate.
It is a sort of truncating signed clamp operation with limits derived from the unsigned extrema of the narrow mode:
(truncate:N
(smin:M
(smax:M (reg:M) (const_int 0))
(const_int <unsigned-max-for-mode-N>)))
This patch implements these semantics. I've checked that the vqmovun tests in advsimd-intrinsics.exp
now get constant-folded and still pass validation, so I'm pretty confident in the semantics.
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (aarch64_sqmovun<mode><vczle><vczbe>):
Rename to...
(*aarch64_sqmovun<mode>_insn<vczle><vczbe>): ... This. Reimplement
with RTL codes.
(aarch64_sqmovun<mode> [SD_HSDI]): Reimplement with RTL codes.
(aarch64_sqxtun2<mode>_le): Likewise.
(aarch64_sqxtun2<mode>_be): Likewise.
(aarch64_sqxtun2<mode>): Adjust for the above.
(aarch64_sqmovun<mode>): New define_expand.
* config/aarch64/iterators.md (UNSPEC_SQXTUN): Delete.
(half_mask): New mode attribute.
* config/aarch64/predicates.md (aarch64_simd_umax_half_mode):
New predicate.
|
|
Similar to the ADDLP instructions the non-widening ADDP ones can be
represented by adding the odd lanes with the even lanes of a vector.
These instructions take two vector inputs and the architecture spec
describes the operation as concatenating them together before going
through it with pairwise additions.
This patch chooses to represent ADDP on 64-bit and 128-bit input
vectors slightly differently, reasons explained in the comments
in aarhc64-simd.md.
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (aarch64_addp<mode><vczle><vczbe>):
Reimplement as...
(aarch64_addp<mode>_insn): ... This...
(aarch64_addp<mode><vczle><vczbe>_insn): ... And this.
(aarch64_addp<mode>): New define_expand.
|
|
Recent changes in the hoisting code change the optimized gimple for the
shadd-3 testcase on the PA. That in turn changes the number of expected
shadd instructions.
I'm not entirely sure the test is actually testing what we want anymore
since I don't see a CSE for postreload to discover. But I did verify
that the number of shadd instructions is sane, so I just changed the
count in the obvious way.
gcc/testsuite
* gcc.target/hppa/shadd-3.c: Update expected output.
|
|
internal-fn.h since yesterday includes insn-opinit.h, which is a generated
header.
One of my bootstraps today failed because some m2 sources started compiling
before insn-opinit.h has been generated.
Normally, gcc/Makefile.in has
# In order for parallel make to really start compiling the expensive
# objects from $(OBJS) as early as possible, build all their
# prerequisites strictly before all objects.
$(ALL_HOST_OBJS) : | $(generated_files)
rule which ensures that all the generated files are generated before
any $(ALL_HOST_OBJS) objects start, but use order-only dependency for
this because we don't want to rebuild most of the objects whenever one
generated header is regenerated. After the initial build in an empty
directory we'll have .deps/ files contain the detailed dependencies.
$(ALL_HOST_OBJS) includes even some FE files, I think in the m2 case
would be m2_OBJS, but m2/Make-lang.in doesn't define those.
The following patch just adds a similar rule to m2/Make-lang.in.
Another option would be to set m2_OBJS variable in m2/Make-lang.in to
something, but not really sure to which exactly and why it isn't
done.
2023-06-07 Jakub Jelinek <jakub@redhat.com>
* Make-lang.in: Build $(generated_files) before building
all $(GM2_C_OBJS).
|
|
This patch enables basic VLA SLP auto-vectorization.
Consider this following case:
void
f (uint8_t *restrict a, uint8_t *restrict b)
{
for (int i = 0; i < 100; ++i)
{
a[i * 8 + 0] = b[i * 8 + 7] + 1;
a[i * 8 + 1] = b[i * 8 + 7] + 2;
a[i * 8 + 2] = b[i * 8 + 7] + 8;
a[i * 8 + 3] = b[i * 8 + 7] + 4;
a[i * 8 + 4] = b[i * 8 + 7] + 5;
a[i * 8 + 5] = b[i * 8 + 7] + 6;
a[i * 8 + 6] = b[i * 8 + 7] + 7;
a[i * 8 + 7] = b[i * 8 + 7] + 3;
}
}
To enable VLA SLP auto-vectorization, we should be able to handle this following const vector:
1. NPATTERNS = 8, NELTS_PER_PATTERN = 3.
{ 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... }
2. NPATTERNS = 8, NELTS_PER_PATTERN = 1.
{ 1, 2, 8, 4, 5, 6, 7, 3, ... }
And these vector can be generated at prologue.
After this patch, we end up with this following codegen:
Prologue:
...
vsetvli a7,zero,e16,m2,ta,ma
vid.v v4
vsrl.vi v4,v4,3
li a3,8
vmul.vx v4,v4,a3 ===> v4 = { 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... }
...
li t1,67633152
addi t1,t1,513
li a3,50790400
addi a3,a3,1541
slli a3,a3,32
add a3,a3,t1
vsetvli t1,zero,e64,m1,ta,ma
vmv.v.x v3,a3 ===> v3 = { 1, 2, 8, 4, 5, 6, 7, 3, ... }
...
LoopBody:
...
min a3,...
vsetvli zero,a3,e8,m1,ta,ma
vle8.v v2,0(a6)
vsetvli a7,zero,e8,m1,ta,ma
vrgatherei16.vv v1,v2,v4
vadd.vv v1,v1,v3
vsetvli zero,a3,e8,m1,ta,ma
vse8.v v1,0(a2)
add a6,a6,a4
add a2,a2,a4
mv a3,a5
add a5,a5,t1
bgtu a3,a4,.L3
...
Note: we need to use "vrgatherei16.vv" instead of "vrgather.vv" for SEW = 8
since "vrgatherei16.vv" can cover larger range than "vrgather.vv" (which
only can maximum element index = 255).
Epilogue:
lbu a5,799(a1)
addiw a4,a5,1
sb a4,792(a0)
addiw a4,a5,2
sb a4,793(a0)
addiw a4,a5,8
sb a4,794(a0)
addiw a4,a5,4
sb a4,795(a0)
addiw a4,a5,5
sb a4,796(a0)
addiw a4,a5,6
sb a4,797(a0)
addiw a4,a5,7
sb a4,798(a0)
addiw a5,a5,3
sb a5,799(a0)
ret
There is one more last thing we need to do is the "Epilogue auto-vectorization"
which needs VLS modes support. I will support VLS modes for
"Epilogue auto-vectorization" in the future.
gcc/ChangeLog:
* config/riscv/riscv-protos.h (expand_vec_perm_const): New function.
* config/riscv/riscv-v.cc
(rvv_builder::can_duplicate_repeating_sequence_p): Support POLY
handling.
(rvv_builder::single_step_npatterns_p): New function.
(rvv_builder::npatterns_all_equal_p): Ditto.
(const_vec_all_in_range_p): Support POLY handling.
(gen_const_vector_dup): Ditto.
(emit_vlmax_gather_insn): Add vrgatherei16.
(emit_vlmax_masked_gather_mu_insn): Ditto.
(expand_const_vector): Add VLA SLP const vector support.
(expand_vec_perm): Support POLY.
(struct expand_vec_perm_d): New struct.
(shuffle_generic_patterns): New function.
(expand_vec_perm_const_1): Ditto.
(expand_vec_perm_const): Ditto.
* config/riscv/riscv.cc (riscv_vectorize_vec_perm_const): Ditto.
(TARGET_VECTORIZE_VEC_PERM_CONST): New targethook.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/scalable-1.c: Adapt testcase for VLA
vectorizer.
* gcc.target/riscv/rvv/autovec/v-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64d-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64f-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-1.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-2.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-3.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-4.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-5.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-6.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-7.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-7.c: New test.
|