|
cost 0, 1 and 2 for QI, HI and SI mode
Add an asm dump check test for the vec_duplicate + vaaddu.vv combine to
vaaddu.vx, with the GR2VR cost being 0, 1 and 2. Please note DImode
is not included.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Add asm check.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u8.c: Ditto.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
cost 0, 2 and 15 for QI, HI and SI mode
Add an asm dump check and run test for the vec_duplicate + vaaddu.vv
combine to vaaddu.vx, with the GR2VR cost being 0, 2 and 15. Please
note DImode is not included here.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h: Add test
helper macros.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h: Add test
data for run test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vaadd-run-1-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vaadd-run-1-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vaadd-run-1-u8.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
For inserting zero into a vector lane we usually use an instruction like:
ins v0.h[2], wzr
This, however, has not-so-great performance on some CPUs.
On Grace, for example, it has a latency of 5 and a throughput of 1.
The alternative sequence:
movi v31.8b, #0
ins v0.h[2], v31.h[0]
is preferable because the MOVI-0 is often a zero-latency operation that is
eliminated by the CPU frontend, and the lane-to-lane INS has a latency of 2 and
a throughput of 4.
We can avoid merging the two instructions into aarch64_simd_vec_set_zero<mode>
by disabling that pattern when optimizing for speed.
Thanks to wider benchmarking from Tamar, it makes sense to make this change for
all tunings, so no RTX costs or tuning flags are introduced to control this
in a more fine-grained manner. They can be easily added in the future if needed
for a particular CPU.
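As a minimal illustration (a hypothetical snippet, not one of the testcases
below), source like the following is the kind of code that previously emitted
the ins-from-wzr form:
```c
#include <arm_neon.h>

/* Set lane 2 of a 16-bit-element vector to zero; before this change the
   lane write could be emitted as a single "ins v0.h[2], wzr".  */
uint16x8_t
set_lane2_zero (uint16x8_t v)
{
  return vsetq_lane_u16 (0, v, 2);
}
```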
Bootstrapped and tested on aarch64-none-linux-gnu.
Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
gcc/
* config/aarch64/aarch64-simd.md (aarch64_simd_vec_set_zero<mode>):
Enable only when optimizing for size.
gcc/testsuite/
* gcc.target/aarch64/simd/mf8_data_1.c (test_set_lane4,
test_setq_lane4): Relax allowed assembly.
* gcc.target/aarch64/vec-set-zero.c: Use -Os in flags.
* gcc.target/aarch64/inszero_split_1.c: New test.
|
|
With bools we can have the usual mismatch between mask and data
use. Catch that, like we do elsewhere.
PR tree-optimization/121194
* tree-vect-loop.cc (vectorizable_lc_phi): Verify
vector types are compatible.
* gcc.dg/torture/pr121194.c: New testcase.
|
|
This implements error handling for hard register constraints including
potential conflicts with register asm operands.
In contrast to register asm operands, hard register constraints allow
more than just one register per operand. Even more than just one
register per alternative. For example, a valid constraint for an
operand is "{r0}{r1}m,{r2}". However, this also means that we have to
make sure that each register is used at most once in each alternative
over all outputs and likewise over all inputs. For asm statements this
is done by this patch during gimplification. For hard register
constraints used in machine descriptions, error handling is still a todo;
I haven't investigated this so far and consider it rather low
priority.
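As an illustration of the kind of conflict that is now diagnosed (a
hypothetical snippet with made-up register names, not one of the testcases
below):
```c
void
foo (void)
{
  int x, y;

  /* Two outputs constrained to the same hard register in the same
     alternative; this is rejected during gimplification.  */
  __asm__ ("" : "={r0}" (x), "={r0}" (y));
}
```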
gcc/ada/ChangeLog:
* gcc-interface/trans.cc (gnat_to_gnu): Pass null pointer to
parse_{input,output}_constraint().
gcc/analyzer/ChangeLog:
* region-model-asm.cc (region_model::on_asm_stmt): Pass null
pointer to parse_{input,output}_constraint().
gcc/c/ChangeLog:
* c-typeck.cc (build_asm_expr): Pass null pointer to
parse_{input,output}_constraint().
gcc/ChangeLog:
* cfgexpand.cc (n_occurrences): Move this ...
(check_operand_nalternatives): and this ...
(expand_asm_stmt): and the call to gimplify.cc.
* config/s390/s390.cc (s390_md_asm_adjust): Pass null pointer to
parse_{input,output}_constraint().
* gimple-walk.cc (walk_gimple_asm): Pass null pointer to
parse_{input,output}_constraint().
(walk_stmt_load_store_addr_ops): Ditto.
* gimplify-me.cc (gimple_regimplify_operands): Ditto.
* gimplify.cc (num_occurrences): Moved from cfgexpand.cc.
(num_alternatives): Ditto.
(gimplify_asm_expr): Deal with hard register constraints.
* stmt.cc (eliminable_regno_p): New helper.
(hardreg_ok_p): Perform a similar check as done in
make_decl_rtl().
(parse_output_constraint): Add parameter for gimplify_reg_info
and validate hard register constrained operands.
(parse_input_constraint): Ditto.
* stmt.h (class gimplify_reg_info): Forward declaration.
(parse_output_constraint): Add parameter.
(parse_input_constraint): Ditto.
* tree-ssa-operands.cc
(operands_scanner::get_asm_stmt_operands): Pass null pointer
to parse_{input,output}_constraint().
* tree-ssa-structalias.cc (find_func_aliases): Pass null pointer
to parse_{input,output}_constraint().
* varasm.cc (assemble_asm): Pass null pointer to
parse_{input,output}_constraint().
* gimplify_reg_info.h: New file.
gcc/cp/ChangeLog:
* semantics.cc (finish_asm_stmt): Pass null pointer to
parse_{input,output}_constraint().
gcc/d/ChangeLog:
* toir.cc: Pass null pointer to
parse_{input,output}_constraint().
gcc/testsuite/ChangeLog:
* gcc.dg/pr87600-2.c: Split test into two files since errors for
functions test{0,1} are thrown during expand, and for
test{2,3} during gimplification.
* lib/scanasm.exp: On s390, skip lines beginning with #.
* gcc.dg/asm-hard-reg-error-1.c: New test.
* gcc.dg/asm-hard-reg-error-2.c: New test.
* gcc.dg/asm-hard-reg-error-3.c: New test.
* gcc.dg/asm-hard-reg-error-4.c: New test.
* gcc.dg/asm-hard-reg-error-5.c: New test.
* gcc.dg/pr87600-3.c: New test.
* gcc.target/aarch64/asm-hard-reg-2.c: New test.
* gcc.target/s390/asm-hard-reg-7.c: New test.
|
|
Implement hard register constraints of the form {regname} where regname
must be a valid register name for the target. Such constraints may be
used in asm statements as a replacement for register asm and in machine
descriptions. A more verbose description is given in extend.texi.
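A minimal usage sketch (hypothetical; the register name r5 is only an
example, and the full semantics are documented in extend.texi):
```c
int
foo (void)
{
  int x;

  /* The output is constrained to hard register r5, as a replacement for
     "register int x asm ("r5")" combined with an "r" constraint.  */
  __asm__ ("" : "={r5}" (x));
  return x;
}
```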
It is expected and desired that optimizations coalesce multiple pseudos
into one whenever possible. However, in case of hard register
constraints we may have to undo this and introduce copies since
otherwise we would constrain a single pseudo to multiple hard
registers. This is done prior to RA during asmcons in
match_asm_constraints_2(). While IRA tries to reduce live ranges, it
also replaces some register-register moves. That in turn might undo
those copies of a pseudo which we just introduced during asmcons. Thus,
check in decrease_live_ranges_number() via
valid_replacement_for_asm_input_p() whether it is valid to perform a
replacement.
The remainder of the patch mostly deals with parsing and decoding hard
register constraints. The actual work is done by LRA in
process_alt_operands() where a register filter, according to the
constraint, is installed.
For the sake of "reviewability" and in order to show the beauty of LRA,
error handling (which gets pretty involved) is spread out into a
subsequent patch.
Limitation
----------
Currently, a fixed register cannot be used as a hard register constraint.
For example, loading the stack pointer on x86_64 via
void *
foo (void)
{
  void *y;
  __asm__ ("" : "={rsp}" (y));
  return y;
}
leads to an error.
Asm Adjust Hook
---------------
The following targets implement TARGET_MD_ASM_ADJUST:
- aarch64
- arm
- avr
- cris
- i386
- mn10300
- nds32
- pdp11
- rs6000
- s390
- vax
Most of them only add the CC register to the list of clobbered registers.
However, cris, i386, and s390 need some minor adjustments.
gcc/ChangeLog:
* config/cris/cris.cc (cris_md_asm_adjust): Deal with hard
register constraint.
* config/i386/i386.cc (map_egpr_constraints): Ditto.
* config/s390/s390.cc (f_constraint_p): Ditto.
* doc/extend.texi: Document hard register constraints.
* doc/md.texi: Ditto.
* function.cc (match_asm_constraints_2): Have a unique pseudo
for each operand with a hard register constraint.
(pass_match_asm_constraints::execute): Calling into new helper
match_asm_constraints_2().
* genoutput.cc (mdep_constraint_len): Return the length of a
hard register constraint.
* genpreds.cc (write_insn_constraint_len): Support hard register
constraints for insn_constraint_len().
* ira.cc (valid_replacement_for_asm_input_p_1): New helper.
(valid_replacement_for_asm_input_p): New helper.
(decrease_live_ranges_number): Similar to
match_asm_constraints_2() ensure that each operand has a unique
pseudo if constrained by a hard register.
* lra-constraints.cc (process_alt_operands): Install hard
register filter according to constraint.
* recog.cc (asm_operand_ok): Accept register type for hard
register constrained asm operands.
(constrain_operands): Validate hard register constraints.
* stmt.cc (decode_hard_reg_constraint): Parse a hard register
constraint into the corresponding register number or bail out.
(parse_output_constraint): Parse hard register constraint and
set *ALLOWS_REG.
(parse_input_constraint): Ditto.
* stmt.h (decode_hard_reg_constraint): Declaration of new
function.
gcc/testsuite/ChangeLog:
* gcc.dg/asm-hard-reg-1.c: New test.
* gcc.dg/asm-hard-reg-2.c: New test.
* gcc.dg/asm-hard-reg-3.c: New test.
* gcc.dg/asm-hard-reg-4.c: New test.
* gcc.dg/asm-hard-reg-5.c: New test.
* gcc.dg/asm-hard-reg-6.c: New test.
* gcc.dg/asm-hard-reg-7.c: New test.
* gcc.dg/asm-hard-reg-8.c: New test.
* gcc.target/aarch64/asm-hard-reg-1.c: New test.
* gcc.target/i386/asm-hard-reg-1.c: New test.
* gcc.target/i386/asm-hard-reg-2.c: New test.
* gcc.target/s390/asm-hard-reg-1.c: New test.
* gcc.target/s390/asm-hard-reg-2.c: New test.
* gcc.target/s390/asm-hard-reg-3.c: New test.
* gcc.target/s390/asm-hard-reg-4.c: New test.
* gcc.target/s390/asm-hard-reg-5.c: New test.
* gcc.target/s390/asm-hard-reg-6.c: New test.
* gcc.target/s390/asm-hard-reg-longdouble.h: New test.
|
|
The following removes the minimum VF compute from dataref analysis
which does not take into account SLP at all, leaving the testcase
vectorized with V2SImode instead of V4SImode on x86. With SLP
the only minimum VF we can compute this early is 1.
* tree-vectorizer.h (vect_analyze_data_refs): Remove min_vf
output.
* tree-vect-data-refs.cc (vect_analyze_data_refs): Likewise.
* tree-vect-loop.cc (vect_analyze_loop_2): Remove early
out based on bogus min_vf.
* tree-vect-slp.cc (vect_slp_analyze_bb_1): Adjust.
* gcc.dg/vect/vect-127.c: New testcase.
|
|
PR fortran/119106
gcc/fortran/ChangeLog:
* expr.cc (simplify_constructor): Do not simplify constants.
(gfc_simplify_expr): Continue to simplify expression when an
iterator is present.
gcc/testsuite/ChangeLog:
* gfortran.dg/array_constructor_58.f90: New test.
|
|
This patch adds testcases for form8 and form9, as shown below:
T __attribute__((noinline))               \
sat_u_add_##T##_fmt_8(T x, T y)           \
{                                         \
  return x <= (T)(x + y) ? (x + y) : -1;  \
}
T __attribute__((noinline))               \
sat_u_add_##T##_fmt_9(T x, T y)           \
{                                         \
  return x > (T)(x + y) ? -1 : (x + y);   \
}
Passed the rv64gc regression test.
Signed-off-by: Ciyan Pan <panciyan@eswincomputing.com>
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat/sat_arith.h: Unsigned testcase form8 form9.
* gcc.target/riscv/sat/sat_u_add-8-u16.c: New test.
* gcc.target/riscv/sat/sat_u_add-8-u32.c: New test.
* gcc.target/riscv/sat/sat_u_add-8-u64.c: New test.
* gcc.target/riscv/sat/sat_u_add-8-u8.c: New test.
* gcc.target/riscv/sat/sat_u_add-9-u16.c: New test.
* gcc.target/riscv/sat/sat_u_add-9-u32.c: New test.
* gcc.target/riscv/sat/sat_u_add-9-u64.c: New test.
* gcc.target/riscv/sat/sat_u_add-9-u8.c: New test.
* gcc.target/riscv/sat/sat_u_add-run-8-u16.c: New test.
* gcc.target/riscv/sat/sat_u_add-run-8-u32.c: New test.
* gcc.target/riscv/sat/sat_u_add-run-8-u64.c: New test.
* gcc.target/riscv/sat/sat_u_add-run-8-u8.c: New test.
* gcc.target/riscv/sat/sat_u_add-run-9-u16.c: New test.
* gcc.target/riscv/sat/sat_u_add-run-9-u32.c: New test.
* gcc.target/riscv/sat/sat_u_add-run-9-u64.c: New test.
* gcc.target/riscv/sat/sat_u_add-run-9-u8.c: New test.
|
|
|
|
The problem here is that the testcase is part of another
testcase, but dg-final does not work across source files,
so it needs its own dg-* headers that match up with
afdo-crossmodule-1.c.
Pushed as preapproved in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120859#c4 .
PR testsuite/120859
gcc/testsuite/ChangeLog:
* gcc.dg/tree-prof/afdo-crossmodule-1b.c: Add some dg-*
commands like what is in afdo-crossmodule-1.c
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
The previous test case doesn't leverage the right test helper macro;
it should be DEF_AVG_0_WRAP instead of DEF_AVG_0. We prefer the
test function name to be test_avg_floor_int64_t_int32_t_0 instead
of test_avg_floor_WT_NT_0 for DEF_AVG_0(WT, NT).
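A hedged sketch of the underlying preprocessor issue (the macro bodies below
are assumptions, not the exact test helpers): arguments adjacent to the ##
paste operator are not macro-expanded, so the extra level of indirection in
the _WRAP macro is what makes the pasted function name use the underlying
type names.
```c
/* Assumed shapes, for illustration only.  */
#define DEF_AVG_0(WT, NT)                      \
  NT                                           \
  test_avg_floor_##WT##_##NT##_0 (WT x, WT y)  \
  {                                            \
    return (NT) ((x + y) >> 1);                \
  }

/* The wrapper expands WT/NT first, so DEF_AVG_0_WRAP (WT, NT) with WT
   defined as int64_t yields test_avg_floor_int64_t_..., whereas
   DEF_AVG_0 (WT, NT) would paste the literal tokens WT and NT.  */
#define DEF_AVG_0_WRAP(WT, NT) DEF_AVG_0 (WT, NT)
```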
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/avg_floor-1-i16-from-i32.c:
Leverage DEF_AVG_0_WRAP to generate the correct func name.
* gcc.target/riscv/rvv/autovec/avg_floor-1-i16-from-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_floor-1-i32-from-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_floor-1-i64-from-i128.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_floor-1-i8-from-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_floor-1-i8-from-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_floor-1-i8-from-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_ceil-1-i16-from-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_ceil-1-i16-from-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_ceil-1-i32-from-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_ceil-1-i8-from-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_ceil-1-i8-from-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_ceil-1-i8-from-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_ceil-1-i64-from-i128.c: Ditto.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
|
|
The ctable base address for SBCO/LBCO load/store patterns was
incorrectly stored as an unsigned integer. That prevented matching
addresses with bit 31 set, because a const_int RTL expression is expected
to be sign-extended.
Fix by using sign-extended 32-bit values for ctable base addresses.
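As a worked illustration (hypothetical values, not code from the patch):
keeping the base as a sign-extended 32-bit value makes a base such as
0x80001000 compare equal to its canonical const_int representation on a
64-bit host.
```c
#include <stdint.h>

/* Hedged sketch: canonicalize a 32-bit ctable base address the way a
   sign-extended const_int represents it.  */
int64_t
canonical_base (uint32_t raw)
{
  return (int32_t) raw;   /* 0x80001000 -> 0xffffffff80001000 */
}
```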
PR target/121124
gcc/ChangeLog:
* config/pru/pru-pragma.cc (pru_pragma_ctable_entry): Handle the
ctable base address as signed 32-bit value, and sign-extend to
HOST_WIDE_INT.
* config/pru/pru-protos.h (struct pru_ctable_entry): Store the
ctable base address as signed.
(pru_get_ctable_exact_base_index): Pass base address as signed.
(pru_get_ctable_base_index): Ditto.
(pru_get_ctable_base_offset): Ditto.
* config/pru/pru.cc (pru_get_ctable_exact_base_index): Ditto.
(pru_get_ctable_base_index): Ditto.
(pru_get_ctable_base_offset): Ditto.
(pru_print_operand_address): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/pru/pragma-ctable_entry-2.c: New test.
|
|
[PR119100]
This pattern enables the combine pass (or late-combine, depending on the case)
to merge a float_extend'ed vec_duplicate into a (possibly negated) minus-mult
RTL instruction.
Before this patch, we have six instructions, e.g.:
vsetivli zero,4,e32,m1,ta,ma
fcvt.s.h fa5,fa5
vfmv.v.f v4,fa5
vfwcvt.f.f.v v1,v3
vsetvli zero,zero,e32,m1,ta,ma
vfnmadd.vv v1,v4,v2
After, we get only one:
vfwnmacc.vf v1,fa5,v2
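A hedged example of the kind of source that can now match the new pattern
(a hypothetical loop, not one of the testcases below):
```c
/* Widening negate-multiply-accumulate with a scalar _Float16 operand
   broadcast across the vector.  */
void
f (float *restrict acc, const _Float16 *restrict a, _Float16 x, int n)
{
  for (int i = 0; i < n; i++)
    acc[i] = -((float) a[i] * (float) x) - acc[i];
}
```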
PR target/119100
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*vfwnmacc_vf_<mode>): New pattern.
(*vfwnmsac_vf_<mode>): New pattern.
* config/riscv/riscv.cc (get_vector_binary_rtx_cost): Add support for a
vec_duplicate in a neg.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfwnmacc and
vfwnmsac.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwnmacc-run-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwnmacc-run-1-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwnmsac-run-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwnmsac-run-1-f32.c: New test.
|
|
|
|
PR fortran/121145
gcc/fortran/ChangeLog:
* trans-expr.cc (gfc_conv_procedure_call): Do not create pointer
check for proc-pointer actual passed to optional dummy.
gcc/testsuite/ChangeLog:
* gfortran.dg/pointer_check_15.f90: New test.
|
|
[PR121153]
I missed this when I added the two testcases vect-reduc-cond-[12].c. These testcases
require support for vectorization of `a ? b : c`, which some targets (e.g. sparc) do
not support.
Pushed as obvious after a quick test.
PR testsuite/121153
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-reduc-cond-1.c: Require vect_condition.
* gcc.dg/vect/vect-reduc-cond-2.c: Likewise.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
Like the avg3_floor pattern, avg3_ceil has a
similar issue in that it lacks RVV DImode support.
Thus, this patch would like to support DImode via
the standard name, with the iterator V_VLSI_D.
The below test suites are passed for this patch series.
* The rv64gcv fully regression test.
gcc/ChangeLog:
* config/riscv/autovec.md (avg<mode>3_ceil): Add new pattern
of avg3_ceil for RVV DImode.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/avg_data.h: Adjust the test data.
* gcc.target/riscv/rvv/autovec/avg_ceil-1-i64-from-i128.c: New test.
* gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i64-from-i128.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
The testcase of PR 117423 shows a flaw in the fancy way we do "total
scalarization" in SRA now. We use the types encountered in the
function body and not in the type declaration (allowing us to totally
scalarize when only one union field is ever used, since we effectively
"skip" the union then) and can accommodate pre-existing accesses that
happen to fall into padding.
In this case, we skipped the union (bypassing the
totally_scalarizable_type_p check) and the access falling into the
"padding" is an aggregate and so not a candidate for SRA but actually
contains data. Arguably, total scalarization should just bail out
when it encounters this situation (but I decided not to depend on that,
mainly because we'd need to detect all cases in which we eventually
cannot scalarize, such as when a scalar access has child accesses), but
the actual bug is that the detection of whether all data in an aggregate
is indeed covered by replacements just assumes that this is always the
case if total scalarization triggers, which, however, may not hold in
cases like this - and perhaps more.
This patch fixes the bug by just assuming that all padding is taken
care of when total scalarization triggered, not that every access was
actually scalarized.
gcc/ChangeLog:
2025-07-17 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/117423
* tree-sra.cc (analyze_access_subtree): Fix computation of grp_covered
flag.
gcc/testsuite/ChangeLog:
2025-07-17 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/117423
* gcc.dg/tree-ssa/pr117423.c: New test.
|
|
The following makes sure we analyze live LC PHIs not part of
a double reduction.
PR tree-optimization/121126
* tree-vect-stmts.cc (vect_analyze_stmt): Analyze the
live lane extract for LC PHIs that are vect_internal_def.
* gcc.dg/vect/pr121126.c: New testcase.
|
|
The PR shows that the uninit analysis limits are set too low in
cases where we lower switches to ifs, as happens on s390x for a linux
kernel TU. This causes false positive uninit diagnostics as we
abort the attempt to prove that a value is initialized on all
paths. The new testcase alone would only require upping the limit to 9.
PR tree-optimization/120924
* params.opt (uninit-max-chain-len): Up from 8 to 12.
* gcc.dg/uninit-pr120924.c: New testcase.
|
|
The following testcase ICEs because SCALAR_INT_TYPE_MODE of course
doesn't work for large BITINT_TYPE types which have BLKmode.
native_encode* as well as e.g. r14-8276 use GET_MODE_SIZE
(SCALAR_INT_TYPE_MODE ()) in cases like these, and TREE_INT_CST_LOW
(TYPE_SIZE_UNIT ()) for the BLKmode ones.
In this case, it wants bits rather than bytes, so I've used
GET_MODE_BITSIZE like before and TYPE_SIZE otherwise.
Furthermore, the patch only computes encoding_size for big endian
targets; for little endian we don't really adjust anything, so there
is no point computing it.
2025-07-18 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/121131
* gimple-fold.cc (fold_nonarray_ctor_reference): Use
TREE_INT_CST_LOW (TYPE_SIZE ()) instead of
GET_MODE_BITSIZE (SCALAR_INT_TYPE_MODE ()) for BLKmode BITINT_TYPEs.
Don't compute encoding_size at all for little endian targets.
* gcc.dg/bitint-124.c: New test.
|
|
|
|
This seems to have been fixed by r15-7260 for PR118285, but is sufficiently
different to merit its own test.
PR c++/87097
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/constexpr-array29.C: New test.
|
|
SME uses a lazy save system to manage ZA. The idea is that,
if a function with ZA state wants to call a "normal" function,
it can leave its state in ZA and instead set up a lazy save buffer.
If, unexpectedly, that normal function contains a nested use of ZA,
that nested use of ZA must commit the lazy save first.
This lazy save system uses a special system register called TPIDR2_EL0.
See:
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#66the-za-lazy-saving-scheme
for details.
The ABI specifies that, on entry to an exception handler, the following
things must be true:
* PSTATE.SM must be 0 (the processor must be in non-streaming mode)
* PSTATE.ZA must be 0 (ZA must be off)
* TPIDR2_EL0 must be 0 (there must be no uncommitted lazy save)
This is normally done by making _Unwind_RaiseException & friends
commit any lazy save before they unwind. This also has the side
effect of ensuring that TPIDR2_EL0 is never left pointing to a
lazy save buffer that has been unwound.
However, things get more complicated with signals. If:
(a) a signal is raised while ZA is dormant (that is, while there is an
uncommitted lazy save);
(b) the signal handler throws an exception; and
(c) that exception is caught outside the signal handler
something must ensure that the lazy save from (a) is committed.
This would be simple if the signal handler was entered with ZA and
TPIDR2_EL0 intact. However, for various good reasons that are out
of scope here, this is not done. Instead, Linux now clears both
TPIDR2_EL0 and PSTATE.ZA before entering a signal handler, see:
https://lore.kernel.org/all/20250417190113.3778111-1-mark.rutland@arm.com/
for details.
Therefore, it is the unwinder that must simulate a commit of the lazy
save from (a). It can do this by reading the previous values of
TPIDR2_EL0 and ZA from the sigcontext.
The SME-related sigcontext structures were only added to linux's
asm/sigcontext.h relatively recently and we can't rely on GCC being
built against such recent kernel header files. The patch therefore
defines the relevant macros if they are not already defined and provides
types that comply with the ABI layout of the corresponding linux types.
The patch includes some ugly casting in an attempt to support big-endian
ILP32, even though SME on big-endian ILP32 linux should never be a thing.
We can remove it if we also remove ILP32 support from GCC.
Co-authored-by: Yury Khrustalev <yury.khrustalev@arm.com>
Reviewed-by: Tamar Christina <tamar.christina@arm.com>
gcc/
* doc/sourcebuild.texi (aarch64_sme_hw): Document.
gcc/testsuite/
* lib/target-supports.exp (add_options_for_aarch64_sme)
(check_effective_target_aarch64_sme_hw): New procedures.
* g++.target/aarch64/sme/sme_throw_1.C: New test.
* g++.target/aarch64/sme/sme_throw_2.C: Likewise.
libgcc/
* config/aarch64/linux-unwind.h (aarch64_fallback_frame_state):
If a signal was raised while there was an uncommitted lazy save,
commit the save as part of the unwind process.
|
|
Currently, for a signbit operation, the instructions tc{f,d,x}b + ipm + srl
are emitted. If the source operand is a MEM, then a load precedes the
sequence. A faster implementation is to issue a load, either from a
REG or a MEM, into a GPR, followed by a shift.
In the spirit of the signbit function of the C standard, the signbit optab
only guarantees that the resulting value is nonzero if the signbit is
set. The common code implementation computes a value where the signbit
is stored in the most significant bit, i.e., all other bits are just
masked out, whereas the current implementation for s390 results in a
value where the signbit is stored in the least significant bit.
Although there is no guarantee where the signbit is stored, keep the
current behaviour and, therefore, implement the signbit optab manually.
Since z10, the instruction lgdr can be used effectively for a 64-bit
FPR-to-GPR load. However, there exists no 32-bit counterpart. Thus, for
target z10 make use of post-reload splitters which emit either a 64-bit
or a 32-bit load, depending on whether the source operand is a REG or a
MEM, and a corresponding 63-bit or 31-bit shift. We can do without
post-reload splitters in the case of vector extensions since there we also
have a 32-bit VR-to-GPR load via the instruction vlgvf.
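A minimal example of the kind of function affected (hypothetical, not one of
the new tests): signbit only has to return a nonzero value when the sign bit
is set, so a GPR load plus shift is sufficient.
```c
int
sb (double x)
{
  return __builtin_signbit (x);
}
```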
gcc/ChangeLog:
* config/s390/s390.md (signbit_tdc): Rename expander.
(signbit<mode>2): New expander.
(signbit<mode>2_z10): New expander.
gcc/testsuite/ChangeLog:
* gcc.target/s390/isfinite-isinf-isnormal-signbit-2.c: Adapt
scan assembler directives.
* gcc.target/s390/isfinite-isinf-isnormal-signbit-3.c: Ditto.
* gcc.target/s390/signbit-1.c: New test.
* gcc.target/s390/signbit-2.c: New test.
* gcc.target/s390/signbit-3.c: New test.
* gcc.target/s390/signbit-4.c: New test.
* gcc.target/s390/signbit-5.c: New test.
* gcc.target/s390/signbit.h: New test.
|
|
Exploit the fact that the instruction VLGV zeros the excess bits of a GPR.
gcc/ChangeLog:
* config/s390/vector.md (bhfgq): Add scalar modes.
(*movdi<mode>_zero_extend_A): New insn.
(*movsi<mode>_zero_extend_A): New insn.
(*movdi<mode>_zero_extend_B): New insn.
(*movsi<mode>_zero_extend_B): New insn.
gcc/testsuite/ChangeLog:
* gcc.target/s390/vector/vlgv-zero-extend-1.c: New test.
|
|
[PR121064]
When TARGET_VECTORIZE_VEC_PERM_CONST is called, target may be the
same pseudo as op0 and/or op1. Loading the selector into target
would clobber the input, producing wrong code like
vld $vr0, $t0
vshuf.w $vr0, $vr0, $vr1
So don't load the selector into d->target; use a new pseudo to hold the
selector instead. The reload pass will load the pseudo for the selector and
the pseudo for the target into the same hard register (following our
constraint '0' on the shuf instructions) anyway.
gcc/ChangeLog:
PR target/121064
* config/loongarch/lsx.md (lsx_vshuf_<lsxfmt_f>): Add '@' to
generate a mode-aware helper. Use <VIMODE> as the mode of the
operand 1 (selector).
* config/loongarch/lasx.md (lasx_xvshuf_<lasxfmt_f>): Likewise.
* config/loongarch/loongarch.cc
(loongarch_try_expand_lsx_vshuf_const): Create a new pseudo for
the selector. Use the mode-aware helper to simplify the code.
(loongarch_expand_vec_perm_const): Likewise.
gcc/testsuite/ChangeLog:
PR target/121064
* gcc.target/loongarch/pr121064.c: New test.
|
|
The following makes us never consider vector(1) T types for
vectorization and ensures this during SLP build. This is a
long-standing issue for BB vectorization and when we remove
early loop vector type setting we lose the single place we have
that rejects this for loops.
Once we implement partial loop vectorization we should revisit
this, but then use the original scalar types for the unvectorized
parts.
* tree-vect-slp.cc (vect_build_slp_tree_1): Reject
single-lane vector types.
* gcc.dg/vect/bb-slp-39.c: Adjust.
|
|
When VN iterates we can end up with unreachable inserted expressions
in the expression tables which in turn will not be added to their
value by PRE's compute_avail. This will later ICE when we pick
them up and want to generate them. Deal with this by giving up.
PR tree-optimization/121035
* tree-ssa-pre.cc (find_or_generate_expression): Handle
values without expression.
* gcc.dg/pr121035.c: New testcase.
|
|
|
|
For MMX 16-bit, 32-bit and 64-bit constant vector loads from constant
vector pool:
(insn 6 2 7 2 (set (reg:V1SI 5 di)
(mem/u/c:V1SI (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0 S4 A32])) "pr121062-2.c":10:3 2036 {*movv1si_internal}
(expr_list:REG_EQUAL (const_vector:V1SI [
(const_int -1 [0xffffffffffffffff])
])
(nil)))
we can convert it to
(insn 12 2 7 2 (set (reg:SI 5 di)
(const_int -1 [0xffffffffffffffff])) "pr121062-2.c":10:3 100 {*movsi_internal}
(nil))
Co-Developed-by: H.J. Lu <hjl.tools@gmail.com>
gcc/
PR target/121062
* config/i386/i386.cc (ix86_convert_const_vector_to_integer):
Handle E_V1SImode and E_V1DImode.
* config/i386/mmx.md (V_16_32_64): Add V1SI, V2BF and V1DI.
(mmxinsnmode): Add V1DI and V1SI.
Add V_16_32_64 splitter for constant vector loads from constant
vector pool.
(V_16_32_64:*mov<mode>_imm): Moved after V_16_32_64 splitter.
Replace lowpart_subreg with adjust_address.
gcc/testsuite/
PR target/121062
* gcc.target/i386/pr121062-1.c: New test.
* gcc.target/i386/pr121062-2.c: Likewise.
* gcc.target/i386/pr121062-3a.c: Likewise.
* gcc.target/i386/pr121062-3b.c: Likewise.
* gcc.target/i386/pr121062-3c.c: Likewise.
* gcc.target/i386/pr121062-4.c: Likewise.
* gcc.target/i386/pr121062-5.c: Likewise.
* gcc.target/i386/pr121062-6.c: Likewise.
* gcc.target/i386/pr121062-7.c: Likewise.
|
|
Since only glibc targets support -mfentry, warn about -pg without -mfentry
only on glibc targets.
gcc/
PR target/120881
PR testsuite/121078
* config/i386/i386-options.cc (ix86_option_override_internal):
Warn -pg without -mfentry only on glibc targets.
gcc/testsuite/
PR target/120881
PR testsuite/121078
* gcc.dg/20021014-1.c (dg-additional-options): Add -mfentry
-fno-pic only on gnu/x86 targets.
* gcc.dg/aru-2.c (dg-additional-options): Likewise.
* gcc.dg/nest.c (dg-additional-options): Likewise.
* gcc.dg/pr32450.c (dg-additional-options): Likewise.
* gcc.dg/pr43643.c (dg-additional-options): Likewise.
* gcc.target/i386/pr104447.c (dg-additional-options): Likewise.
* gcc.target/i386/pr113122-3.c (dg-additional-options): Likewise.
* gcc.target/i386/pr119386-1.c (dg-additional-options): Add
-mfentry only on gnu targets.
* gcc.target/i386/pr119386-2.c (dg-additional-options): Likewise.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
The following disables loop masking when we are using an even/odd
widening operation in a reduction because the loop mask then aligns
to the wrong elements.
PR tree-optimization/121049
* internal-fn.h (widening_evenodd_fn_p): Declare.
* internal-fn.cc (widening_evenodd_fn_p): New function.
* tree-vect-stmts.cc (vectorizable_conversion): When using
an even/odd widening function disable loop masking.
* gcc.dg/vect/pr121049.c: New testcase.
|
|
For possible reductions, ifconv currently handles the case where the addition
is on one side of the if. But in the case of PR 119920, the reduction
addition is on both sides of the if.
E.g.
```
if (_27 == 0)
goto <bb 14>; [50.00%]
else
goto <bb 13>; [50.00%]
<bb 14>
a_29 = b_14(D) + a_17;
goto <bb 15>; [100.00%]
<bb 13>
a_28 = c_12(D) + a_17;
<bb 15>
# a_30 = PHI <a_28(13), a_29(14)>
```
Which ifcvt converts into:
```
_34 = _32 + _33;
a_15 = (int) _34;
_23 = _4 == 0;
_37 = _33 + _35;
a_13 = (int) _37;
a_5 = _23 ? a_15 : a_13;
```
But the vectorizer does not recognize this as a reduction.
To fix this, we should factor out the addition from the `if`.
This allows us to get:
```
iftmp.0_7 = _22 ? b_13(D) : c_12(D);
a_14 = iftmp.0_7 + a_18;
```
Which then the vectorizer recognizes as a reduction.
In the case of PR 112324 and PR 110015, it is similar but with a MAX_EXPR
reduction instead of an addition.
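For reference, a source-level shape for the addition case above (a hedged
sketch, not the exact testcase from the PR):
```c
int
f (int *x, int b, int c, int n)
{
  int a = 0;
  for (int i = 0; i < n; i++)
    {
      /* The reduction addition appears on both arms of the if.  */
      if (x[i] == 0)
        a = b + a;
      else
        a = c + a;
    }
  return a;
}
```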
Note that while this should be done in phiopt, there are regressions
due to other passes not being able to handle the factored-out cases
(see the linked bug PR 64700). I have not yet had time to fix all of the
passes so that they can handle the addition being in the if/then/else
rather than outside it. So I thought it would be useful to have a
localized version in ifconv which is then only used for the vectorizer.
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/119920
PR tree-optimization/112324
PR tree-optimization/110015
gcc/ChangeLog:
* tree-if-conv.cc (find_different_opnum): New function.
(factor_out_operators): New function.
(predicate_scalar_phi): Call factor_out_operators when
there is only 2 elements of a phi.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-reduc-cond-1.c: New test.
* gcc.dg/vect/vect-reduc-cond-2.c: New test.
* gcc.dg/vect/vect-reduc-cond-3.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
When having a _BitInt induction we should make sure not to create
the step vector elements as _BitInts but with the vector element type.
PR tree-optimization/121116
* tree-vect-loop.cc (vectorizable_induction): Use the
step vector element type for further processing.
* gcc.dg/torture/pr121116.c: New testcase.
|
|
Add a fold at gimple_fold_builtin to prefer the highpart variant of a builtin
if at least one argument is a vector highpart and all others are VECTOR_CSTs
that we can extend to 128-bits.
For example, we prefer to duplicate f0 and use UMULL2 here over DUP+UMULL:
uint16x8_t
foo (const uint8x16_t s)
{
const uint8x8_t f0 = vdup_n_u8 (4);
return vmull_u8 (vget_high_u8 (s), f0);
}
gcc/ChangeLog:
PR target/117850
* config/aarch64/aarch64-builtins.cc (LO_HI_PAIRINGS): New, group the
lo/hi pairs from aarch64-builtin-pairs.def.
(aarch64_get_highpart_builtin): New function.
(aarch64_v128_highpart_ref): New function, helper to look for vector
highparts.
(aarch64_build_vector_cst): New function, helper to build duplicated
VECTOR_CSTs.
(aarch64_fold_lo_call_to_hi): New function.
(aarch64_general_gimple_fold_builtin): Add cases for the lo builtins
in aarch64-builtin-pairs.def.
* config/aarch64/aarch64-builtin-pairs.def: New file, declare the
pairs of lowpart-operating and highpart-operating builtins.
gcc/testsuite/ChangeLog:
PR target/117850
* gcc.target/aarch64/simd/vabal_combine.c: Removed. This is
covered by fold_to_highpart_1.c
* gcc.target/aarch64/simd/fold_to_highpart_1.c: New test.
* gcc.target/aarch64/simd/fold_to_highpart_2.c: Likewise.
* gcc.target/aarch64/simd/fold_to_highpart_3.c: Likewise.
* gcc.target/aarch64/simd/fold_to_highpart_4.c: Likewise.
* gcc.target/aarch64/simd/fold_to_highpart_5.c: Likewise.
* gcc.target/aarch64/simd/fold_to_highpart_6.c: Likewise.
|
|
The string_slice class inherits from array_slice and is used to refer to a
substring of an array that is memory-managed elsewhere, without modifying
the underlying array.
For example, this is useful in cases such as when needing to refer to a
substring of an attribute in the syntax tree.
Adds some minimal helper functions for string_slice,
such as a strtok alternative, equality operators, strcmp, and a function
to strip whitespace from the beginning and end of a string_slice.
gcc/c-family/ChangeLog:
* c-format.cc (local_string_slice_node): New node type.
(asm_fprintf_char_table): New entry.
(init_dynamic_diag_info): Add support for string_slice.
* c-format.h (T_STRING_SLICE): New node type.
gcc/ChangeLog:
* pretty-print.cc (format_phase_2): Add support for string_slice.
* vec.cc (string_slice::tokenize): New static method.
(string_slice::strcmp): New static method.
(string_slice::strip): New method.
(test_string_slice_initializers): New test.
(test_string_slice_tokenize): Ditto.
(test_string_slice_strcmp): Ditto.
(test_string_slice_equality): Ditto.
(test_string_slice_inequality): Ditto.
(test_string_slice_invalid): Ditto.
(test_string_slice_strip): Ditto.
(vec_cc_tests): Add new tests.
* vec.h (class string_slice): New class.
gcc/testsuite/ChangeLog
* g++.dg/warn/Wformat-gcc_diag-1.C: Add string_slice "%B" format tests.
|
|
r16-2175-g5aa21765236730 introduced an assert for floating-point modes
when expanding an RDIV_EXPR but forgot fixed-point modes. This patch
adds ALL_FIXED_POINT_MODE_P to the assert.
PR middle-end/121065
gcc/ChangeLog:
* cfgexpand.cc (expand_debug_expr): Allow fixed-point modes for
RDIV_EXPR.
* optabs-tree.cc (optab_for_tree_code): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/arm/pr121065.c: New test.
|
|
In PR120297 we fuse
vsetvl e8,mf2,...
vsetvl e64,m1,...
into
vsetvl e64,m4,...
Individually, that's ok but we also change the new vsetvl's demand to
"SEW only" even though the first original one demanded SEW >= 8 and
ratio = 16.
As we forget the ratio after the merge we find that the vsetvl following
the merged one has ratio = 64 demand and we fuse into
vsetvl e64,m1,..
which obviously doesn't have ratio = 16 any more.
Regtested on rv64gcv_zvl512b.
PR target/120297
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.def: Do not forget ratio demand of
previous vsetvl.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/pr120297.c: New test.
|
|
SVE2 BSL2N (x, y, z) = (x & z) | (~y & ~z). When x == y this computes:
(x & z) | (~x & ~z) which is ~(x ^ z).
Thus, we can use it to match RTL patterns (not (xor (...) (...))) for both
Advanced SIMD and SVE modes when TARGET_SVE2.
This patch does that.
For code like:
uint64x2_t eon_q(uint64x2_t a, uint64x2_t b) { return EON(a, b); }
svuint64_t eon_z(svuint64_t a, svuint64_t b) { return EON(a, b); }
We now generate:
eon_q:
bsl2n z0.d, z0.d, z0.d, z1.d
ret
eon_z:
bsl2n z0.d, z0.d, z0.d, z1.d
ret
instead of the previous:
eon_q:
eor v0.16b, v0.16b, v1.16b
not v0.16b, v0.16b
ret
eon_z:
eor z0.d, z0.d, z1.d
ptrue p3.b, all
not z0.d, p3/m, z0.d
ret
Bootstrapped and tested on aarch64-none-linux-gnu.
Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
gcc/
* config/aarch64/aarch64-sve2.md (*aarch64_sve2_bsl2n_eon<mode>):
New pattern.
(*aarch64_sve2_eon_bsl2n_unpred<mode>): Likewise.
gcc/testsuite/
* gcc.target/aarch64/sve2/eon_bsl2n.c: New test.
|
|
We already have patterns to use the NBSL instruction to implement vector
NOR and NAND operations for SVE types and modes. It is straightforward to
have similar patterns for the fixed-width Advanced SIMD modes as well, though
it requires combine patterns without the predicate operand and an explicit 'Z'
output modifier. This patch does so.
So now for example we generate for:
uint64x2_t nand_q(uint64x2_t a, uint64x2_t b) { return NAND(a, b); }
uint64x2_t nor_q(uint64x2_t a, uint64x2_t b) { return NOR(a, b); }
nand_q:
nbsl z0.d, z0.d, z1.d, z1.d
ret
nor_q:
nbsl z0.d, z0.d, z1.d, z0.d
ret
instead of the previous:
nand_q:
and v0.16b, v0.16b, v1.16b
not v0.16b, v0.16b
ret
nor_q:
orr v0.16b, v0.16b, v1.16b
not v0.16b, v0.16b
ret
The tied operand requirements for NBSL mean that we can generate the MOVPRFX
when the operands fall that way, but I guess having a 2-insn MOVPRFX form is
not worse than the current 2-insn codegen at least, and the MOVPRFX can be
fused by many cores.
Bootstrapped and tested on aarch64-none-linux-gnu.
Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
gcc/
* config/aarch64/aarch64-sve2.md (*aarch64_sve2_unpred_nor<mode>):
New define_insn.
(*aarch64_sve2_nand_unpred<mode>): Likewise.
gcc/testsuite/
* gcc.target/aarch64/sve2/nbsl_nor_nand_neon.c: New test.
|
|
2025-07-16 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/121060
* interface.cc (matching_typebound_op): Defer determination of
specific procedure until resolution by returning NULL.
gcc/testsuite/
PR fortran/121060
* gfortran.dg/associate_75.f90: New test.
|
|
2025-07-16 Steve Kargl <sgk@troutmask.apl.washington.edu>
gcc/fortran
* decl.cc (gfc_match_import): Correct minor whitespace snafu
and fix NULL pointer dereferences in two places.
gcc/testsuite/
* gfortran.dg/import13.f90: New test.
|
|
Hi,
This fixes PR c/82134 which concerns gcc emitting an incorrect unused
result diagnostic for empty types. This diagnostic is emitted from
tree-cfg.cc because of a couple of code paths which attempt to avoid
copying empty types, resulting in GIMPLE that isn't using the returned
value of a call. To fix this I've added suppress_warning in three locations
and a corresponding check in do_warn_unused_result.
Cheers,
Jeremy
PR c/82134
gcc/cp/ChangeLog:
* call.cc (build_call_a): Add suppress_warning.
* cp-gimplify.cc (cp_gimplify_expr): Add suppress_warning.
gcc/ChangeLog:
* gimplify.cc (gimplify_modify_expr): Add suppress_warning.
* tree-cfg.cc (do_warn_unused_result): Check warning_suppressed_p.
gcc/testsuite/ChangeLog:
* c-c++-common/attr-warn-unused-result-2.c: New test.
Signed-off-by: Jeremy Rifkin <jeremy@rifkin.dev>
|
|
In ISE058, the AVX10.2 implication is removed from AMX-AVX512. This
leads to a reconsideration of the implications for AMX-AVX512.
Since it uses zmm registers, and zmm registers only, we
need to at least imply AVX512F. AVX512VL is not needed.
On the other hand, if we implied AVX10.1 for AMX-AVX512, it would
cause -mno-avx10.1 to disable AMX-AVX512. This would be a surprise
for users.
Based on the two reasons above, the patch decouples AMX-AVX512
from AVX10.2 and implies AVX512F.
gcc/ChangeLog:
* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_AMX_AVX512_SET): Do not set AVX10.2.
(OPTION_MASK_ISA2_AVX10_2_UNSET): Remove AMX-AVX512 unset.
(OPTION_MASK_ISA2_AVX512F_UNSET): Unset AMX-AVX512.
(ix86_handle_option): Imply AVX512F for AMX-AVX512.
gcc/testsuite/ChangeLog:
* gcc.target/i386/amxavx512-cvtrowd2ps-2.c: Add -mavx512fp16 to
use FP16 related intrinsics for conversion.
* gcc.target/i386/amxavx512-cvtrowps2bf16-2.c: Ditto.
* gcc.target/i386/amxavx512-cvtrowps2ph-2.c: Ditto.
* gcc.target/i386/amxavx512-movrow-2.c: Ditto.
|
|
Per a previous discussion with Jeff, we don't do complicated
asm checks for the scalar saturation alu. They are
not easy to maintain, as well as fragile. Thus, we
remove these function-body checks and introduce a
jmp label asm check instead. The code-gen of SAT_*
will never have a jmp, and the existing run tests
ensure the correctness of the SAT_* code-gen.
The below test suites are passed for this patch series.
* The rv64gcv fully regression test.
The below failed test cases are resolved:
FAIL: gcc.target/riscv/sat/sat_s_add_imm-2-i8.c -Oz
check-function-bodies sat_s_add_imm_int8_t_fmt_2_1
FAIL: gcc.target/riscv/sat/sat_s_add_imm-2-i8.c -Os
check-function-bodies sat_s_add_imm_int8_t_fmt_2_1
FAIL: gcc.target/riscv/sat/sat_s_add_imm-2-i8.c -O3
check-function-bodies sat_s_add_imm_int8_t_fmt_2_1
FAIL: gcc.target/riscv/sat/sat_s_add_imm-2-i8.c -Ofast
check-function-bodies sat_s_add_imm_int8_t_fmt_2_1
FAIL: gcc.target/riscv/sat/sat_s_add_imm-2-i8.c -O2
check-function-bodies sat_s_add_imm_int8_t_fmt_2_1
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat/sat_s_add-1-i16.c: Remove function-body
check and add no jmp label asm check.
* gcc.target/riscv/sat/sat_s_add-1-i32.c: Ditto.
* gcc.target/riscv/sat/sat_s_add-1-i64.c: Ditto.
* gcc.target/riscv/sat/sat_s_add-1-i8.c: Ditto.
* gcc.target/riscv/sat/sat_s_add-2-i16.c: Ditto.
* gcc.target/riscv/sat/sat_s_add-2-i32.c: Ditto.
* gcc.target/riscv/sat/sat_s_add-2-i64.c: Ditto.
* gcc.target/riscv/sat/sat_s_add-2-i8.c: Ditto.
* gcc.target/riscv/sat/sat_s_add-3-i16.c: Ditto.
* gcc.target/riscv/sat/sat_s_add-3-i32.c: Ditto.
* gcc.target/riscv/sat/sat_s_add-3-i64.c: Ditto.
* gcc.target/riscv/sat/sat_s_add-3-i8.c: Ditto.
* gcc.target/riscv/sat/sat_s_add-4-i16.c: Ditto.
* gcc.target/riscv/sat/sat_s_add-4-i32.c: Ditto.
* gcc.target/riscv/sat/sat_s_add-4-i64.c: Ditto.
* gcc.target/riscv/sat/sat_s_add-4-i8.c: Ditto.
* gcc.target/riscv/sat/sat_s_add_imm-1-i16.c: Ditto.
* gcc.target/riscv/sat/sat_s_add_imm-1-i32.c: Ditto.
* gcc.target/riscv/sat/sat_s_add_imm-1-i64.c: Ditto.
* gcc.target/riscv/sat/sat_s_add_imm-1-i8.c: Ditto.
* gcc.target/riscv/sat/sat_s_add_imm-2-i16.c: Ditto.
* gcc.target/riscv/sat/sat_s_add_imm-2-i32.c: Ditto.
* gcc.target/riscv/sat/sat_s_add_imm-2-i64.c: Ditto.
* gcc.target/riscv/sat/sat_s_add_imm-2-i8.c: Ditto.
* gcc.target/riscv/sat/sat_s_sub-1-i16.c: Ditto.
* gcc.target/riscv/sat/sat_s_sub-1-i32.c: Ditto.
* gcc.target/riscv/sat/sat_s_sub-1-i64.c: Ditto.
* gcc.target/riscv/sat/sat_s_sub-1-i8.c: Ditto.
* gcc.target/riscv/sat/sat_s_sub-2-i16.c: Ditto.
* gcc.target/riscv/sat/sat_s_sub-2-i32.c: Ditto.
* gcc.target/riscv/sat/sat_s_sub-2-i64.c: Ditto.
* gcc.target/riscv/sat/sat_s_sub-2-i8.c: Ditto.
* gcc.target/riscv/sat/sat_s_sub-3-i16.c: Ditto.
* gcc.target/riscv/sat/sat_s_sub-3-i32.c: Ditto.
* gcc.target/riscv/sat/sat_s_sub-3-i64.c: Ditto.
* gcc.target/riscv/sat/sat_s_sub-3-i8.c: Ditto.
* gcc.target/riscv/sat/sat_s_sub-4-i16.c: Ditto.
* gcc.target/riscv/sat/sat_s_sub-4-i32.c: Ditto.
* gcc.target/riscv/sat/sat_s_sub-4-i64.c: Ditto.
* gcc.target/riscv/sat/sat_s_sub-4-i8.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-1-i16-to-i8.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-1-i32-to-i16.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-1-i32-to-i8.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-1-i64-to-i16.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-1-i64-to-i32.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-1-i64-to-i8.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-2-i16-to-i8.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-2-i32-to-i16.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-2-i32-to-i8.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-2-i64-to-i16.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-2-i64-to-i32.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-2-i64-to-i8.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-3-i16-to-i8.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-3-i32-to-i16.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-3-i32-to-i8.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-3-i64-to-i16.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-3-i64-to-i32.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-3-i64-to-i8.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-4-i16-to-i8.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-4-i32-to-i16.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-4-i32-to-i8.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-4-i64-to-i16.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-4-i64-to-i32.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-4-i64-to-i8.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-5-i16-to-i8.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-5-i32-to-i16.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-5-i32-to-i8.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-5-i64-to-i16.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-5-i64-to-i32.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-5-i64-to-i8.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-6-i16-to-i8.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-6-i32-to-i16.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-6-i32-to-i8.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-6-i64-to-i16.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-6-i64-to-i32.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-6-i64-to-i8.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-7-i16-to-i8.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-7-i32-to-i16.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-7-i32-to-i8.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-7-i64-to-i16.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-7-i64-to-i32.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-7-i64-to-i8.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-8-i16-to-i8.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-8-i32-to-i16.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-8-i32-to-i8.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-8-i64-to-i16.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-8-i64-to-i32.c: Ditto.
* gcc.target/riscv/sat/sat_s_trunc-8-i64-to-i8.c: Ditto.
* gcc.target/riscv/sat/sat_u_add-1-u16.c: Ditto.
* gcc.target/riscv/sat/sat_u_add-1-u32.c: Ditto.
* gcc.target/riscv/sat/sat_u_add-1-u64.c: Ditto.
* gcc.target/riscv/sat/sat_u_add-1-u8.c: Ditto.
* gcc.target/riscv/sat/sat_u_add-2-u16.c: Ditto.
* gcc.target/riscv/sat/sat_u_add-2-u32.c: Ditto.
* gcc.target/riscv/sat/sat_u_add-2-u64.c: Ditto.
* gcc.target/riscv/sat/sat_u_add-2-u8.c: Ditto.
* gcc.target/riscv/sat/sat_u_add-3-u16.c: Ditto.
* gcc.target/riscv/sat/sat_u_add-3-u32.c: Ditto.
* gcc.target/riscv/sat/sat_u_add-3-u64.c: Ditto.
* gcc.target/riscv/sat/sat_u_add-3-u8.c: Ditto.
* gcc.target/riscv/sat/sat_u_add-4-u16.c: Ditto.
* gcc.target/riscv/sat/sat_u_add-4-u32.c: Ditto.
* gcc.target/riscv/sat/sat_u_add-4-u64.c: Ditto.
* gcc.target/riscv/sat/sat_u_add-4-u8.c: Ditto.
* gcc.target/riscv/sat/sat_u_add-5-u16.c: Ditto.
* gcc.target/riscv/sat/sat_u_add-5-u32.c: Ditto.
* gcc.target/riscv/sat/sat_u_add-5-u64.c: Ditto.
* gcc.target/riscv/sat/sat_u_add-5-u8.c: Ditto.
* gcc.target/riscv/sat/sat_u_add-6-u16.c: Ditto.
* gcc.target/riscv/sat/sat_u_add-6-u32.c: Ditto.
* gcc.target/riscv/sat/sat_u_add-6-u64.c: Ditto.
* gcc.target/riscv/sat/sat_u_add-6-u8.c: Ditto.
* gcc.target/riscv/sat/sat_u_add-7-u16-from-u32.c: Ditto.
* gcc.target/riscv/sat/sat_u_add-7-u16-from-u64.c: Ditto.
* gcc.target/riscv/sat/sat_u_add-7-u32-from-u64.c: Ditto.
* gcc.target/riscv/sat/sat_u_add-7-u8-from-u16.c: Ditto.
* gcc.target/riscv/sat/sat_u_add-7-u8-from-u32.c: Ditto.
* gcc.target/riscv/sat/sat_u_add-7-u8-from-u64.c: Ditto.
* gcc.target/riscv/sat/sat_u_add_imm-1-u16.c: Ditto.
* gcc.target/riscv/sat/sat_u_add_imm-1-u32.c: Ditto.
* gcc.target/riscv/sat/sat_u_add_imm-1-u64.c: Ditto.
* gcc.target/riscv/sat/sat_u_add_imm-1-u8.c: Ditto.
* gcc.target/riscv/sat/sat_u_add_imm-2-u16.c: Ditto.
* gcc.target/riscv/sat/sat_u_add_imm-2-u32.c: Ditto.
* gcc.target/riscv/sat/sat_u_add_imm-2-u64.c: Ditto.
* gcc.target/riscv/sat/sat_u_add_imm-2-u8.c: Ditto.
* gcc.target/riscv/sat/sat_u_add_imm-3-u16.c: Ditto.
* gcc.target/riscv/sat/sat_u_add_imm-3-u32.c: Ditto.
* gcc.target/riscv/sat/sat_u_add_imm-3-u64.c: Ditto.
* gcc.target/riscv/sat/sat_u_add_imm-3-u8.c: Ditto.
* gcc.target/riscv/sat/sat_u_add_imm-4-u16.c: Ditto.
* gcc.target/riscv/sat/sat_u_add_imm-4-u32.c: Ditto.
* gcc.target/riscv/sat/sat_u_add_imm-4-u64.c: Ditto.
* gcc.target/riscv/sat/sat_u_add_imm-4-u8.c: Ditto.
* gcc.target/riscv/sat/sat_u_mul-1-u16-from-u128.c: Ditto.
* gcc.target/riscv/sat/sat_u_mul-1-u32-from-u128.c: Ditto.
* gcc.target/riscv/sat/sat_u_mul-1-u64-from-u128.c: Ditto.
* gcc.target/riscv/sat/sat_u_mul-1-u8-from-u128.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-1-u16.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-1-u32.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-1-u64.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-1-u8.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-10-u16.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-10-u32.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-10-u64.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-10-u8.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-11-u16.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-11-u32.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-11-u64.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-11-u8.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-12-u16.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-12-u32.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-12-u64.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-12-u8.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-2-u16.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-2-u32.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-2-u64.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-2-u8.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-3-u16.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-3-u32.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-3-u64.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-3-u8.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-4-u16.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-4-u32.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-4-u64.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-4-u8.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-5-u16.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-5-u32.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-5-u64.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-5-u8.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-6-u16.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-6-u32.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-6-u64.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-6-u8.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-7-u16.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-7-u32.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-7-u64.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-7-u8.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-8-u16.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-8-u32.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-8-u64.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-8-u8.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-9-u16.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-9-u32.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-9-u64.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub-9-u8.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-1-u16-1.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-1-u16-2.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-1-u16-3.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-1-u16-4.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-1-u16.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-1-u32-1.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-1-u32-2.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-1-u32-3.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-1-u32-4.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-1-u32.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-1-u64-1.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-1-u64-2.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-1-u64.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-1-u8-1.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-1-u8-2.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-1-u8-3.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-1-u8-4.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-1-u8.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-2-u16-1.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-2-u16-2.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-2-u16-3.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-2-u16.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-2-u32-1.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-2-u32-2.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-2-u32-3.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-2-u32.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-2-u64-1.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-2-u64.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-2-u8-1.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-2-u8-2.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-2-u8-3.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-2-u8.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-3-u16-1.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-3-u16-2.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-3-u16.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-3-u32-1.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-3-u32-2.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-3-u32.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-3-u64.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-3-u8-1.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-3-u8-2.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-3-u8.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-4-u16-1.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-4-u16-2.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-4-u16.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-4-u32-1.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-4-u32-2.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-4-u32.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-4-u64.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-4-u8-1.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-4-u8-2.c: Ditto.
* gcc.target/riscv/sat/sat_u_sub_imm-4-u8.c: Ditto.
* gcc.target/riscv/sat/sat_u_trunc-1-u16.c: Ditto.
* gcc.target/riscv/sat/sat_u_trunc-1-u32.c: Ditto.
* gcc.target/riscv/sat/sat_u_trunc-1-u64.c: Ditto.
* gcc.target/riscv/sat/sat_u_trunc-1-u8.c: Ditto.
* gcc.target/riscv/sat/sat_u_trunc-2-u16.c: Ditto.
* gcc.target/riscv/sat/sat_u_trunc-2-u32.c: Ditto.
* gcc.target/riscv/sat/sat_u_trunc-2-u64.c: Ditto.
* gcc.target/riscv/sat/sat_u_trunc-2-u8.c: Ditto.
* gcc.target/riscv/sat/sat_u_trunc-3-u16.c: Ditto.
* gcc.target/riscv/sat/sat_u_trunc-3-u32.c: Ditto.
* gcc.target/riscv/sat/sat_u_trunc-3-u64.c: Ditto.
* gcc.target/riscv/sat/sat_u_trunc-3-u8.c: Ditto.
* gcc.target/riscv/sat/sat_u_trunc-4-u16.c: Ditto.
* gcc.target/riscv/sat/sat_u_trunc-4-u32.c: Ditto.
* gcc.target/riscv/sat/sat_u_trunc-4-u64.c: Ditto.
* gcc.target/riscv/sat/sat_u_trunc-4-u8.c: Ditto.
* gcc.target/riscv/sat/sat_u_trunc-5-u16.c: Ditto.
* gcc.target/riscv/sat/sat_u_trunc-5-u32.c: Ditto.
* gcc.target/riscv/sat/sat_u_trunc-5-u64.c: Ditto.
* gcc.target/riscv/sat/sat_u_trunc-5-u8.c: Ditto.
* gcc.target/riscv/sat/sat_u_trunc-6-u16.c: Ditto.
* gcc.target/riscv/sat/sat_u_trunc-6-u32.c: Ditto.
* gcc.target/riscv/sat/sat_u_trunc-6-u64.c: Ditto.
* gcc.target/riscv/sat/sat_u_trunc-6-u8.c: Ditto.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
The avg3_floor pattern leverages the add and shift rtl
with the DOUBLE_TRUNC mode iterator. That is, the RVVDImode
iterator will generate avg3rvvsimode_floor; only the
element sizes QI, HI and SI are allowed.
Thus, this patch would like to support DImode via
the standard name, with the iterator V_VLSI_D.
The below test suites are passed for this patch series.
* The rv64gcv fully regression test.
gcc/ChangeLog:
* config/riscv/autovec.md (avg<mode>3_floor): Add new
pattern of avg3_floor for rvv DImode.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/avg.h: Add int128 type when
xlen == 64.
* gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i16-from-i32.c:
Suppress __int128 warning for run test.
* gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i16-from-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i32-from-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i8-from-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i8-from-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i8-from-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_data.h: Fix one incorrect
test data.
* gcc.target/riscv/rvv/autovec/avg_floor-run-1-i16-from-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_floor-run-1-i16-from-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_floor-run-1-i32-from-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_floor-run-1-i8-from-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_floor-run-1-i8-from-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_floor-run-1-i8-from-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_floor-1-i64-from-i128.c: New test.
* gcc.target/riscv/rvv/autovec/avg_floor-run-1-i64-from-i128.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|