Age | Commit message (Collapse) | Author | Files | Lines |
|
Newer llvm-mc assemblers support the gfx*-generic targets, permitting to
generate code for all GPUs belonging to the same generation, even if not
optimal code. This requires LLVM 19.
This patch adds the compiler-side support for generic gfx and also
adds -march=gfx10-3-generic and -march=gfx-11. However, those -march= are
not documented nor used anywhere, yet.
Disclaimer: Not tested (as my ROCm does not support it); additionally,
libgomp/plugin/plugin-gcn.c has to be updated before it becomes useful.
For better compatibility with LLVM's Clang, this commit additionally adds
the macro definitions __GFX<9|10|11>__ for the architecture family,
__AMDGPU__ besides the existing __AMDGCN__ and the two strings-containing
macros __amdgcn_processor__ and __amdgcn_target_id__, where the former has
'-' replaced by '_' but otherwise both contain the lower case name. For the
new generic targets, the same happens, yielding, e.g., __gfx10_3_generic__.
gcc/ChangeLog:
* config/gcn/gcn-devices.def: Add generic version/flag as additional
value and architecture family entry; update; add gfx-10-3-generic
and gfx11-generic.
* config/gcn/gcn-hsa.h (ABI_VERSION_SPEC): Remove
(ASM_SPEC): Use generated ABI_VERSION_OPT instead.
* config/gcn/gcn-tables.opt: Regenerate
* config/gcn/gcn.h (gcn_device_def): Add generic_version and
arch_family members.
(TARGET_CPU_CPP_BUILTINS): Fix allocation bug, handle '-' in the
name and add additional macro defines.
* config/gcn/gcn.cc (gcn_devices): Handle it.
* config/gcn/gen-gcn-device-macros.awk: Likewise; use ELF name
for the macro name; generate ABI_VERSION_OPT.
* config/gcn/mkoffload.cc (ELFABIVERSION_AMDGPU_HSA_V6,
EF_AMDGPU_GENERIC_VERSION_V, EF_AMDGPU_GENERIC_VERSION_OFFSET,
GET_GENERIC_VERSION, SET_GENERIC_VERSION): Define.
(get_arch): Call SET_GENERIC_VERSION flag on elf_flags.
(copy_early_debug_info): If the arch sets the generic version,
use ELFABIVERSION_AMDGPU_HSA_V6.
|
|
Converted the tests to use check-function-bodies in order to ensure that
the sequence is correct.
gcc/testsuite/ChangeLog:
* gcc.target/arm/fp16-aapcs-1.c: Use check-function-bodies.
* gcc.target/arm/fp16-aapcs-2.c: Likewise.
* gcc.target/arm/fp16-aapcs-3.c: Likewise.
* gcc.target/arm/fp16-aapcs-4.c: Likewise.
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
|
|
Below -O2, lsls/lsrs are prefered. For -O2 and above, lsl/lsr are
prefered.
gcc/testsuite/ChangeLog:
* gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c: Allow lsl and
lsr instructions.
* gcc.target/arm/cmse/mainline/8_1m/bitfield-6.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/bitfield-8.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/bitfield-and-union.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/union-2.c: Likewise.
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
|
|
Converted the tests to use check-function-bodies in order to ensure that
the sequence is correct.
This also allows both APSR_nzcvq and APSR_nzcvqg as target selector does
not work when the -march and/or -mcpu overrides the target to test.
gcc/testsuite/ChangeLog:
* gcc.target/arm/cmse/mainline/8m/hard-sp/cmse-5.c: Use
check-function-bodies.
* gcc.target/arm/cmse/mainline/8m/hard/cmse-5.c: Likewise.
* gcc.target/arm/cmse/mainline/8m/soft/cmse-5.c: Likewise.
* gcc.target/arm/cmse/mainline/8m/softfp-sp/cmse-5.c: Likewise.
* gcc.target/arm/cmse/mainline/8m/softfp/cmse-5.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-5.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/hard/cmse-5.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/soft/cmse-5.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-5.c:
Likewise.
* gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-5.c: Likewise.
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
|
|
This test needs a directive checking the removal of the link_error.
Committed as obvious.
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/testsuite/
* gcc.dg/tree-ssa/log_ident.c: Add scan for removal of
link_error in optimized tree dump.
|
|
After r15-4050-g5dad738c1dd164 register_specialization needs to set
elt.hash to the (maybe) precomputed hash so that the lookup uses it
rather than redundantly computing it from scratch.
gcc/cp/ChangeLog:
* pt.cc (register_specialization): Set elt.hash.
Reviewed-by: Jason Merrill <jason@redhat.com>
|
|
gcc.dg/torture/pr112305.c contains an inner loop that executes
0x8000_0014 times and an outer loop that executes 5 times, giving about
10 billion total executions of the inner loop body. At -O2 and above we
are able to remove the inner loop, but at -O1 we keep a no-op loop:
dls lr, r3
.L3:
subs r3, r3, #1
le lr, .L3
and at -O0 we of course don't optimise.
This can lead to long execution times on simulators, possibly
triggering a timeout.
gcc/testsuite
* gcc.dg/torture/pr112305.c: Skip at -O0 and -O1 for simulators.
|
|
In some cases we can access members of a namespace-scope class without
ever having performed name-lookup on it; this can occur when a
forward-declaration of the class is used as a return type, for
instance, or with PIMPL.
One possible approach would be to do name lookup in complete_type to
force lazy loading to occur, but this seems overly expensive for a
relatively rare case. Instead, this patch generalises the existing
pending-entity support to handle this case as well.
Unfortunately this does mean that almost every class definition will be
added to the pending-entity table, and almost always unnecessarily, but
I don't see a good way to avoid this.
gcc/cp/ChangeLog:
* module.cc (depset::DB_IS_MEMBER_BIT): Rename to...
(depset::DB_IS_PENDING_BIT): ...this.
(depset::is_member): Remove.
(depset::is_pending_entity): New function.
(depset::hash::make_dependency): Mark definitions of
namespace-scope types as maybe-pending entities.
(depset::hash::add_class_entities): Rename DB_IS_MEMBER_BIT to
DB_IS_PENDING_BIT.
(depset::hash::find_dependencies): Use is_pending_entity
instead of is_member.
(module_state::write_pendings): Likewise; adjust comment.
gcc/testsuite/ChangeLog:
* g++.dg/modules/inst-4_b.C: Adjust pending-entity count.
* g++.dg/modules/member-def-1_c.C: Likewise.
* g++.dg/modules/member-def-2_c.C: Likewise.
* g++.dg/modules/tpl-spec-3_b.C: Likewise.
* g++.dg/modules/tpl-spec-4_b.C: Likewise.
* g++.dg/modules/tpl-spec-5_b.C: Likewise.
* g++.dg/modules/class-9_a.H: New test.
* g++.dg/modules/class-9_b.H: New test.
* g++.dg/modules/class-9_c.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
|
|
The diagnostics code fails to handle non-constant domain max.
PR tree-optimization/117254
* gimple-ssa-warn-access.cc (maybe_warn_nonstring_arg):
Check the array domain max is constant before using it.
* gcc.dg/pr117254.c: New testcase.
|
|
Almost all device-specific settings are now centralised into gcn-devices.def
for the compiler, mkoffload, and libgomp. No longer will we have to touch 10
files in multiple places just to add another device without any exotic
features. (New ISAs and devices with incompatible metadata will continue to
need a bit more.)
In order to remove the device-specific conditionals in the code a new value
HSACO_ATTR_UNSUPPORTED has been added, indicating that the assembler will
reject any setting of that option.
This incorporates some of Tobias's patch from March 2024.
Co-Authored-By: Tobias Burnus <tburnus@baylibre.com>
gcc/ChangeLog:
* config.gcc (amdgcn): Add gcn-device-macros.h to tm_file.
Add gcn-tables.opt to extra_options.
* config/gcn/gcn-hsa.h (NO_XNACK): Delete.
(NO_SRAM_ECC): Delete.
(SRAMOPT): Move definition to generated file gcn-device-macros.h.
(XNACKOPT): Likewise.
(ASM_SPEC): Redefine using generated values from gcn-device-macros.h.
* config/gcn/gcn-opts.h
(enum processor_type): Generate from gcn-devices.def.
(TARGET_VEGA10): Delete.
(TARGET_VEGA20): Delete.
(TARGET_GFX908): Delete.
(TARGET_GFX90a): Delete.
(TARGET_GFX90c): Delete.
(TARGET_GFX1030): Delete.
(TARGET_GFX1036): Delete.
(TARGET_GFX1100): Delete.
(TARGET_GFX1103): Delete.
(TARGET_XNACK): Redefine to allow for HSACO_ATTR_UNSUPPORTED.
(enum hsaco_attr_type): Add HSACO_ATTR_UNSUPPORTED.
(TARGET_TGSPLIT): New define.
* config/gcn/gcn.cc (gcn_devices): New constant table.
(gcn_option_override): Rework to use gcn_devices table.
(gcn_omp_device_kind_arch_isa): Likewise.
(output_file_start): Likewise.
(gcn_hsa_declare_function_name): Rework using TARGET_* macros.
* config/gcn/gcn.h (gcn_devices): Declare struct and table.
(TARGET_CPU_CPP_BUILTINS): Rework using gcn_devices.
* config/gcn/gcn.opt: Move enum data to generated file gcn-tables.opt.
Use new names for the default values.
* config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX900): Delete.
(EF_AMDGPU_MACH_AMDGCN_GFX906): Delete.
(EF_AMDGPU_MACH_AMDGCN_GFX908): Delete.
(EF_AMDGPU_MACH_AMDGCN_GFX90a): Delete.
(EF_AMDGPU_MACH_AMDGCN_GFX90c): Delete.
(EF_AMDGPU_MACH_AMDGCN_GFX1030): Delete.
(EF_AMDGPU_MACH_AMDGCN_GFX1036): Delete.
(EF_AMDGPU_MACH_AMDGCN_GFX1100): Delete.
(EF_AMDGPU_MACH_AMDGCN_GFX1103): Delete.
(enum elf_arch_code): Define using gcn-devices.def.
(get_arch): Rework using gcn-devices.def.
(main): Rework using gcn-devices.def
* config/gcn/t-gcn-hsa (gcn-tables.opt): Generate file.
(gcn-device-macros.h): Generate file.
* config/gcn/t-omp-device: Generate isa list from gcn-devices.def.
* config/gcn/gcn-devices.def: New file.
* config/gcn/gcn-tables.opt: New file.
* config/gcn/gcn-tables.opt.urls: New file.
* config/gcn/gen-gcn-device-macros.awk: New file.
* config/gcn/gen-opt-tables.awk: New file.
libgomp/ChangeLog:
* plugin/plugin-gcn.c (EF_AMDGPU_MACH): Generate from gcn-devices.def.
(gcn_gfx803_s): Delete.
(gcn_gfx900_s): Delete.
(gcn_gfx906_s): Delete.
(gcn_gfx908_s): Delete.
(gcn_gfx90a_s): Delete.
(gcn_gfx90c_s): Delete.
(gcn_gfx1030_s): Delete.
(gcn_gfx1036_s): Delete.
(gcn_gfx1100_s): Delete.
(gcn_gfx1103_s): Delete.
(gcn_isa_name_len): Delete.
(isa_hsa_name): Rename ...
(isa_name): ... to this, and rework using gcn-devices.def.
(isa_gcc_name): Delete.
(isa_code): Rework using gcn-devices.def.
(max_isa_vgprs): Rework using gcn-devices.def.
(isa_matches_agent): Update isa_name usage.
(GOMP_OFFLOAD_init_device): Improve diagnostic using the name.
|
|
Value-numbering can use its set of equivalences to prove that
a PHI node with args <a_1, 5, 10> is equal to a_1 iff on the
edges with the constants a_1 == 5 and a_1 == 10 hold. This
breaks down when the order of PHI args is <5, 10, a_1> as then
we drop to VARYING early. The following mitigates this by
shuffling a copy of the edge vector to always process a SSA name
argument first. Which should also handle the special-case of
a two argument <5, a_1> we already had.
PR tree-optimization/117123
* tree-ssa-sccvn.cc (visit_phi): First process a non-constant
argument edge to handle more equivalences. Remove the
two-arg special case.
* g++.dg/tree-ssa/pr117123.C: New testcase.
|
|
gcc/testsuite/ChangeLog:
* g++.dg/cpp23/ext-floating19.C: Fix typo for bfloat16 guard.
|
|
form 1:
T __attribute__((noinline)) \
sat_u_sub_imm##IMM##_##T##_fmt_1 (T y) \
{ \
return (T)IMM >= y ? (T)IMM - y : 0; \
}
Passed the rv64gcv regression test.
Change-Id: I8805225b445cdbbc685f4f54a4d66c7ee8f748e1
Signed-off-by: Li Xu <xuli1@eswincomputing.com>
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_u_sub_imm-1_4.c: New test.
* gcc.target/riscv/sat_u_sub_imm-2_4.c: New test.
* gcc.target/riscv/sat_u_sub_imm-3_4.c: New test.
* gcc.target/riscv/sat_u_sub_imm-4_2.c: New test.
|
|
This patch would like to support .SAT_SUB when one of the op
is IMM = 1 of form1.
Form 1:
#define DEF_SAT_U_SUB_IMM_FMT_1(T, IMM) \
T __attribute__((noinline)) \
sat_u_sub_imm##IMM##_##T##_fmt_1 (T y) \
{ \
return IMM >= y ? IMM - y : 0; \
}
Take below form 1 as example:
DEF_SAT_U_SUB_IMM_FMT_1(uint8_t, 1)
Before this patch:
__attribute__((noinline))
uint8_t sat_u_sub_imm1_uint8_t_fmt_1 (uint8_t y)
{
uint8_t _1;
uint8_t _3;
<bb 2> [local count: 1073741824]:
if (y_2(D) <= 1)
goto <bb 3>; [41.00%]
else
goto <bb 4>; [59.00%]
<bb 3> [local count: 440234144]:
_3 = y_2(D) ^ 1;
<bb 4> [local count: 1073741824]:
# _1 = PHI <0(2), _3(3)>
return _1;
}
After this patch:
__attribute__((noinline))
uint8_t sat_u_sub_imm1_uint8_t_fmt_1 (uint8_t y)
{
uint8_t _1;
;; basic block 2, loop depth 0
;; pred: ENTRY
_1 = .SAT_SUB (1, y_2(D)); [tail call]
return _1;
;; succ: EXIT
}
The below test suites are passed for this patch:
1. The rv64gcv fully regression tests.
2. The x86 bootstrap tests.
3. The x86 fully regression tests.
Signed-off-by: Li Xu <xuli1@eswincomputing.com>
gcc/ChangeLog:
* match.pd: Support IMM=1.
|
|
form 1:
T __attribute__((noinline)) \
sat_u_sub_imm##IMM##_##T##_fmt_1 (T y) \
{ \
return (T)IMM >= y ? (T)IMM - y : 0; \
}
Passed the rv64gcv regression test.
Change-Id: Idaa1ab41f2a5785112279ea8ee2c93236457b740
Signed-off-by: Li Xu <xuli1@eswincomputing.com>
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_u_sub_imm-1_3.c: New test.
* gcc.target/riscv/sat_u_sub_imm-2_3.c: New test.
* gcc.target/riscv/sat_u_sub_imm-3_3.c: New test.
* gcc.target/riscv/sat_u_sub_imm-4_1.c: New test.
|
|
This patch would like to support .SAT_SUB when one of the op
is IMM = max - 1 of form1.
Form 1:
#define DEF_SAT_U_SUB_IMM_FMT_1(T, IMM) \
T __attribute__((noinline)) \
sat_u_sub_imm##IMM##_##T##_fmt_1 (T y) \
{ \
return IMM >= y ? IMM - y : 0; \
}
Take below form 1 as example:
DEF_SAT_U_SUB_IMM_FMT_1(uint8_t, 254)
Before this patch:
__attribute__((noinline))
uint8_t sat_u_sub_imm254_uint8_t_fmt_1 (uint8_t y)
{
uint8_t _1;
uint8_t _3;
<bb 2> [local count: 1073741824]:
if (y_2(D) != 255)
goto <bb 3>; [66.00%]
else
goto <bb 4>; [34.00%]
<bb 3> [local count: 708669600]:
_3 = 254 - y_2(D);
<bb 4> [local count: 1073741824]:
# _1 = PHI <0(2), _3(3)>
return _1;
}
After this patch:
__attribute__((noinline))
uint8_t sat_u_sub_imm254_uint8_t_fmt_1 (uint8_t y)
{
uint8_t _1;
<bb 2> [local count: 1073741824]:
_1 = .SAT_SUB (254, y_2(D)); [tail call]
return _1;
}
The below test suites are passed for this patch:
1. The rv64gcv fully regression tests.
2. The x86 bootstrap tests.
3. The x86 fully regression tests.
Signed-off-by: Li Xu <xuli1@eswincomputing.com>
gcc/ChangeLog:
* match.pd: Support IMM=max-1.
|
|
|
|
ext-dce
A while back I noticed that the code to call carry_backpropagate was being
called after the optimization step. Which seemed wrong, but at the time I
didn't have a testcase showing it as a problem. Now I have 4 :-)
The way things used to work, the extension would be stripped away before
calling carry_backpropagte, meaning carry_backpropagate would never see a
SIGN_EXTENSION. Thus the code trying to account for the sign extended bit was
never reached.
Getting that bit marked live is what's needed to fix these testcases. Fallout
is minor with just an adjustment needed to sensibly deal with vector modes in a
place where we didn't have them before.
I'm still somewhat concerned about this code. Specifically whether or not we
can get in here with arbitrarily complex RTL, and if so do we need to recurse
down and look at those sub-expressions.
So while this patch fixes the most pressing issue, I wouldn't be terribly
surprised if we're back inside this code at some point.
Bootstrapped and regression tested on x86_64, ppc64le, riscv64, s390x, mips64,
loongarch, aarch64, m68k, alpha, hppa, sh4, sh4eb, perhaps something else that
I've forgotten... Also tested on all the crosses in my tester.
PR rtl-optimization/116488
PR rtl-optimization/116579
PR rtl-optimization/116915
PR rtl-optimization/117226
gcc/
* ext-dce.cc (carry_backpropagate): Properly handle SIGN_EXTEND, add
ZERO_EXTEND handling as well.
(ext_dce_process_uses): Call carry_backpropagate before the optimization
step.
gcc/testsuite/
* gcc.dg/torture/pr116488.c: New test.
* gcc.dg/torture/pr116579.c: New test.
* gcc.dg/torture/pr116915.c: New test.
* gcc.dg/torture/pr117226.c: New test.
|
|
Form 8:
#define DEF_VEC_SAT_S_TRUNC_FMT_8(NT, WT, NT_MIN, NT_MAX) \
void __attribute__((noinline)) \
vec_sat_s_trunc_##NT##_##WT##_fmt_8 (NT *out, WT *in, unsigned limit) \
{ \
unsigned i; \
for (i = 0; i < limit; i++) \
{ \
WT x = in[i]; \
NT trunc = (NT)x; \
out[i] = (WT)NT_MIN >= x || x >= (WT)NT_MAX \
? x < 0 ? NT_MIN : NT_MAX \
: trunc; \
} \
}
The below test are passed for this patch.
* The rv64gcv fully regression test.
It is test only patch and obvious up to a point, will commit it
directly if no comments in next 48H.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-8-i16-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-8-i32-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-8-i32-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-8-i64-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-8-i64-to-i32.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-8-i64-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-8-i16-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-8-i32-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-8-i32-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-8-i64-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-8-i64-to-i32.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-8-i64-to-i8.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
Form 7:
#define DEF_VEC_SAT_S_TRUNC_FMT_7(NT, WT, NT_MIN, NT_MAX) \
void __attribute__((noinline)) \
vec_sat_s_trunc_##NT##_##WT##_fmt_7 (NT *out, WT *in, unsigned limit) \
{ \
unsigned i; \
for (i = 0; i < limit; i++) \
{ \
WT x = in[i]; \
NT trunc = (NT)x; \
out[i] = (WT)NT_MIN > x || x >= (WT)NT_MAX \
? x < 0 ? NT_MIN : NT_MAX \
: trunc; \
} \
}
The below test are passed for this patch.
* The rv64gcv fully regression test.
It is test only patch and obvious up to a point, will commit it
directly if no comments in next 48H.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-7-i16-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-7-i32-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-7-i32-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-7-i64-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-7-i64-to-i32.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-7-i64-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-7-i16-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-7-i32-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-7-i32-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-7-i64-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-7-i64-to-i32.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-7-i64-to-i8.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
Form 6:
#define DEF_VEC_SAT_S_TRUNC_FMT_6(NT, WT, NT_MIN, NT_MAX) \
void __attribute__((noinline)) \
vec_sat_s_trunc_##NT##_##WT##_fmt_6 (NT *out, WT *in, unsigned limit) \
{ \
unsigned i; \
for (i = 0; i < limit; i++) \
{ \
WT x = in[i]; \
NT trunc = (NT)x; \
out[i] = (WT)NT_MIN >= x || x > (WT)NT_MAX \
? x < 0 ? NT_MIN : NT_MAX \
j: trunc; \
} \
}
The below test are passed for this patch.
* The rv64gcv fully regression test.
It is test only patch and obvious up to a point, will commit it
directly if no comments in next 48H.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-6-i16-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-6-i32-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-6-i32-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-6-i64-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-6-i64-to-i32.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-6-i64-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-6-i16-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-6-i32-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-6-i32-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-6-i64-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-6-i64-to-i32.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-6-i64-to-i8.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
Form 5:
#define DEF_VEC_SAT_S_TRUNC_FMT_5(NT, WT, NT_MIN, NT_MAX) \
void __attribute__((noinline)) \
vec_sat_s_trunc_##NT##_##WT##_fmt_5 (NT *out, WT *in, unsigned limit) \
{ \
unsigned i; \
for (i = 0; i < limit; i++) \
{ \
WT x = in[i]; \
NT trunc = (NT)x; \
out[i] = (WT)NT_MIN > x || x > (WT)NT_MAX \
? x < 0 ? NT_MIN : NT_MAX \
: trunc; \
} \
}
The below test are passed for this patch.
* The rv64gcv fully regression test.
It is test only patch and obvious up to a point, will commit it
directly if no comments in next 48H.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-5-i16-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-5-i32-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-5-i32-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-5-i64-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-5-i64-to-i32.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-5-i64-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-5-i16-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-5-i32-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-5-i32-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-5-i64-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-5-i64-to-i32.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-5-i64-to-i8.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
Form 4:
#define DEF_VEC_SAT_S_TRUNC_FMT_4(NT, WT, NT_MIN, NT_MAX) \
void __attribute__((noinline)) \
vec_sat_s_trunc_##NT##_##WT##_fmt_4 (NT *out, WT *in, unsigned limit) \
{ \
unsigned i; \
for (i = 0; i < limit; i++) \
{ \
WT x = in[i]; \
NT trunc = (NT)x; \
out[i] = (WT)NT_MIN <= x && x < (WT)NT_MAX \
? trunc \
: x < 0 ? NT_MIN : NT_MAX; \
} \
}
The below test are passed for this patch.
* The rv64gcv fully regression test.
It is test only patch and obvious up to a point, will commit it
directly if no comments in next 48H.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-4-i16-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-4-i32-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-4-i32-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-4-i64-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-4-i64-to-i32.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-4-i64-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-4-i16-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-4-i32-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-4-i32-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-4-i64-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-4-i64-to-i32.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-4-i64-to-i8.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
Form 3:
#define DEF_VEC_SAT_S_TRUNC_FMT_3(NT, WT, NT_MIN, NT_MAX) \
void __attribute__((noinline)) \
vec_sat_s_trunc_##NT##_##WT##_fmt_3 (NT *out, WT *in, unsigned limit) \
{ \
unsigned i; \
for (i = 0; i < limit; i++) \
{ \
WT x = in[i]; \
NT trunc = (NT)x; \
out[i] = (WT)NT_MIN < x && x < (WT)NT_MAX \
? trunc \
: x < 0 ? NT_MIN : NT_MAX; \
} \
}
The below test are passed for this patch.
* The rv64gcv fully regression test.
It is test only patch and obvious up to a point, will commit it
directly if no comments in next 48H.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-3-i16-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-3-i32-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-3-i32-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-3-i64-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-3-i64-to-i32.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-3-i64-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-3-i16-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-3-i32-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-3-i32-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-3-i64-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-3-i64-to-i32.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-3-i64-to-i8.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
Form 2:
#define DEF_VEC_SAT_S_TRUNC_FMT_2(NT, WT, NT_MIN, NT_MAX) \
void __attribute__((noinline)) \
vec_sat_s_trunc_##NT##_##WT##_fmt_2 (NT *out, WT *in, unsigned limit) \
{ \
unsigned i; \
for (i = 0; i < limit; i++) \
{ \
WT x = in[i]; \
NT trunc = (NT)x; \
out[i] = (WT)NT_MIN < x && x < (WT)NT_MAX \
? trunc \
: x < 0 ? NT_MIN : NT_MAX; \
} \
}
The below test are passed for this patch.
* The rv64gcv fully regression test.
It is test only patch and obvious up to a point, will commit it
directly if no comments in next 48H.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-2-i16-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-2-i32-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-2-i32-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-2-i64-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-2-i64-to-i32.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-2-i64-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-2-i16-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-2-i32-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-2-i32-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-2-i64-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-2-i64-to-i32.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-2-i64-to-i8.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
Form 1:
#define DEF_VEC_SAT_S_TRUNC_FMT_1(NT, WT, NT_MIN, NT_MAX) \
void __attribute__((noinline)) \
vec_sat_s_trunc_##NT##_##WT##_fmt_1 (NT *out, WT *in, unsigned limit) \
{ \
unsigned i; \
for (i = 0; i < limit; i++) \
{ \
WT x = in[i]; \
NT trunc = (NT)x; \
out[i] = (WT)NT_MIN <= x && x <= (WT)NT_MAX \
? trunc \
: x < 0 ? NT_MIN : NT_MAX; \
} \
}
The below test are passed for this patch.
* The rv64gcv fully regression test.
It is test only patch and obvious up to a point, will commit it
directly if no comments in next 48H.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/unop/vec_sat_data.h: Add test data for
signed SAT_TRUNC.
* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-1-i16-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-1-i32-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-1-i32-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-1-i64-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-1-i64-to-i32.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-1-i64-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-1-i16-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-1-i32-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-1-i32-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-1-i64-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-1-i64-to-i32.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-1-i64-to-i8.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch would like to implement the sstrunc for vector signed integer.
Form 1:
#define DEF_VEC_SAT_S_TRUNC_FMT_1(NT, WT, NT_MIN, NT_MAX) \
void __attribute__((noinline)) \
vec_sat_s_trunc_##NT##_##WT##_fmt_1 (NT *out, WT *in, unsigned limit) \
{ \
unsigned i; \
for (i = 0; i < limit; i++) \
{ \
WT x = in[i]; \
NT trunc = (NT)x; \
out[i] = (WT)NT_MIN <= x && x <= (WT)NT_MAX \
? trunc \
: x < 0 ? NT_MIN : NT_MAX; \
} \
}
DEF_VEC_SAT_S_TRUNC_FMT_1(int32_t, int64_t, INT32_MIN, INT32_MAX)
Before this patch:
27 │ vsetvli a5,a2,e64,m1,ta,ma
28 │ vle64.v v1,0(a1)
29 │ slli a3,a5,3
30 │ slli a4,a5,2
31 │ sub a2,a2,a5
32 │ add a1,a1,a3
33 │ vadd.vv v0,v1,v5
34 │ vsetvli zero,zero,e32,mf2,ta,ma
35 │ vnsrl.wx v2,v1,a6
36 │ vncvt.x.x.w v1,v1
37 │ vsetvli zero,zero,e64,m1,ta,ma
38 │ vmsgtu.vv v0,v0,v4
39 │ vsetvli zero,zero,e32,mf2,ta,mu
40 │ vneg.v v2,v2
41 │ vxor.vv v1,v2,v3,v0.t
42 │ vse32.v v1,0(a0)
43 │ add a0,a0,a4
44 │ bne a2,zero,.L3
After this patch:
16 │ vsetvli a5,a2,e32,mf2,ta,ma
17 │ vle64.v v1,0(a1)
18 │ slli a3,a5,3
19 │ slli a4,a5,2
20 │ sub a2,a2,a5
21 │ add a1,a1,a3
22 │ vnclip.wi v1,v1,0
23 │ vse32.v v1,0(a0)
24 │ add a0,a0,a4
25 │ bne a2,zero,.L3
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
gcc/ChangeLog:
* config/riscv/autovec.md (sstrunc<mode><v_double_trunc>2): Add
new pattern sstrunc for double trunc.
(sstrunc<mode><v_quad_trunc>2): Ditto but for quad trunc.
(sstrunc<mode><v_oct_trunc>2): Ditto but for oct trunc.
* config/riscv/riscv-protos.h (expand_vec_double_sstrunc): Add
new func decl to expand double trunc.
(expand_vec_quad_sstrunc): Ditto but for quad trunc.
(expand_vec_oct_sstrunc): Ditto but for oct trunc.
* config/riscv/riscv-v.cc (expand_vec_double_sstrunc): Add new
func to expand double trunc.
(expand_vec_quad_sstrunc): Ditto but for quad trunc.
(expand_vec_oct_sstrunc): Ditto but for oct trunc.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
Almost the same as vector unsigned integer SAT_TRUNC, try to match
the signed version during the vector pattern matching.
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.
gcc/ChangeLog:
* tree-vect-patterns.cc (gimple_signed_integer_sat_trunc): Add
new func decl for signed SAT_TRUNC.
(vect_recog_sat_trunc_pattern): Try signed match pattern for
the SAT_TRUNC.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch would like to support the form 1 of the vector signed
integer SAT_TRUNC. Aka below example:
Form 1:
#define DEF_VEC_SAT_S_TRUNC_FMT_1(NT, WT, NT_MIN, NT_MAX) \
void __attribute__((noinline)) \
vec_sat_s_trunc_##NT##_##WT##_fmt_1 (NT *out, WT *in, unsigned limit) \
{ \
unsigned i; \
for (i = 0; i < limit; i++) \
{ \
WT x = in[i]; \
NT trunc = (NT)x; \
out[i] = (WT)NT_MIN <= x && x <= (WT)NT_MAX \
? trunc \
: x < 0 ? NT_MIN : NT_MAX; \
} \
}
DEF_VEC_SAT_S_TRUNC_FMT_1(int32_t, int64_t, INT32_MIN, INT32_MAX)
Before this patch:
48 │ _87 = .SELECT_VL (ivtmp_85, POLY_INT_CST [2, 2]);
49 │ ivtmp_64 = _87 * 8;
50 │ vect_x_14.10_67 = .MASK_LEN_LOAD (vectp_in.8_65, 64B, { -1, ... }, _87, 0);
51 │ vect_trunc_15.21_78 = (vector([2,2]) int) vect_x_14.10_67;
52 │ _61 = VIEW_CONVERT_EXPR<vector([2,2]) unsigned long>(vect_x_14.10_67);
53 │ _32 = _61 >> 63;
54 │ vect_patt_52.16_73 = (vector([2,2]) int) _32;
55 │ vect__46.17_74 = VIEW_CONVERT_EXPR<vector([2,2]) unsigned int>(vect_patt_52.16_73);
56 │ vect__47.18_75 = -vect__46.17_74;
57 │ vect__21.19_76 = VIEW_CONVERT_EXPR<vector([2,2]) int>(vect__47.18_75);
58 │ vect_x.11_68 = VIEW_CONVERT_EXPR<vector([2,2]) unsigned long>(vect_x_14.10_67);
59 │ vect__5.12_69 = vect_x.11_68 + { 2147483648, ... };
60 │ mask__34.13_70 = vect__5.12_69 > { 4294967295, ... };
61 │ _25 = .COND_XOR (mask__34.13_70, vect__21.19_76, { 2147483647, ... }, vect_trunc_15.21_78);
62 │ ivtmp_80 = _87 * 4;
63 │ .MASK_LEN_STORE (vectp_out.23_81, 32B, { -1, ... }, _87, 0, _25);
64 │ vectp_in.8_66 = vectp_in.8_65 + ivtmp_64;
65 │ vectp_out.23_82 = vectp_out.23_81 + ivtmp_80;
66 │ ivtmp_86 = ivtmp_85 - _87;
After this patch:
38 │ _77 = .SELECT_VL (ivtmp_75, POLY_INT_CST [2, 2]);
39 │ ivtmp_65 = _77 * 8;
40 │ vect_x_14.10_68 = .MASK_LEN_LOAD (vectp_in.8_66, 64B, { -1, ... }, _77, 0);
41 │ vect_patt_53.11_69 = .SAT_TRUNC (vect_x_14.10_68);
42 │ ivtmp_70 = _77 * 4;
43 │ .MASK_LEN_STORE (vectp_out.12_71, 32B, { -1, ... }, _77, 0, vect_patt_53.11_69);
44 │ vectp_in.8_67 = vectp_in.8_66 + ivtmp_65;
45 │ vectp_out.12_72 = vectp_out.12_71 + ivtmp_70;
46 │ ivtmp_76 = ivtmp_75 - _77;
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.
gcc/ChangeLog:
* match.pd: Refine matching for vector signed SAT_TRUNC form 1.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This is necessary to prevent reload assuming that a direct FP->FPMR move
is valid.
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_register_move_cost):
Increase costs involving MOVEABLE_SYSREGS.
|
|
FIRST_SGPR_REG is register zero so the compiler always claims this comparison
is redundant. It's right, of course, but I'd have preferred to keep the
comparison for completeness. Probably the "correct" solution is to use an enum
for these values.
gcc/ChangeLog:
* config/gcn/gcn.h (SGPR_REGNO_P): Silence warning.
|
|
As the PR shows, pair-fusion was tricking memory_modified_in_insn_p into
returning false when a common base register (in this case, x1) was
modified between the mem and the store insn. This lead to wrong code as
the accesses really did alias.
To avoid this sort of problem, this patch avoids invoking RTL alias
analysis altogether (and assume an alias conflict) if the two insns to
be compared share a common address register R, and the insns see different
definitions of R (i.e. it was modified in between).
gcc/ChangeLog:
PR rtl-optimization/116783
* pair-fusion.cc (def_walker::cand_addr_uses): New.
(def_walker::def_walker): Add parameter for candidate address
uses.
(def_walker::alias_conflict_p): Declare.
(def_walker::addr_reg_conflict_p): New.
(def_walker::conflict_p): New.
(store_walker::store_walker): Add parameter for candidate
address uses and pass to base ctor.
(store_walker::conflict_p): Rename to ...
(store_walker::alias_conflict_p): ... this.
(load_walker::load_walker): Add parameter for candidate
address uses and pass to base ctor.
(load_walker::conflict_p): Rename to ...
(load_walker::alias_conflict_p): ... this.
(pair_fusion_bb_info::try_fuse_pair): Collect address register
uses for candidate insns and pass down to alias walkers.
gcc/testsuite/ChangeLog:
PR rtl-optimization/116783
* g++.dg/torture/pr116783.C: New test.
|
|
Corrected the function code for the Atomic Memory Operation "Fetch and Decrement
Bounded", changing it from 0x1A to 0x1C.
2024-10-11 Jeevitha Palanisamy <jeevitha@linux.ibm.com>
gcc/
* config/rs6000/amo.h (enum _AMO_LD): Correct the function code for
_AMO_LD_DEC_BOUNDED.
|
|
From ISE, it shows that we will have family 0x13 for Diamond Rapids.
Therefore, we need to refactor the get_intel_cpu to accept new families.
Also I did some reorder in the switch for clearness by putting earlier
added products on top for search convenience.
gcc/ChangeLog:
* common/config/i386/cpuinfo.h (get_intel_cpu): Refactor the
function for future expansion on different family.
|
|
Skip flat -flto to address UNRESOLVED cases as follows:
gcc.target/riscv/sat_s_add-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects: output file does not exist
UNRESOLVED: gcc.target/riscv/sat_s_add-1.c
Change-Id: I7ff55197b6294cd473dfaa6cc350c5e2eb5960fe
Signed-off-by: Li Xu <xuli1@eswincomputing.com>
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_s_add-1.c: Skip flag -flto.
* gcc.target/riscv/sat_s_add-10.c: Ditto.
* gcc.target/riscv/sat_s_add-11.c: Ditto.
* gcc.target/riscv/sat_s_add-12.c: Ditto.
* gcc.target/riscv/sat_s_add-13.c: Ditto.
* gcc.target/riscv/sat_s_add-14.c: Ditto.
* gcc.target/riscv/sat_s_add-15.c: Ditto.
* gcc.target/riscv/sat_s_add-16.c: Ditto.
* gcc.target/riscv/sat_s_add-2.c: Ditto.
* gcc.target/riscv/sat_s_add-3.c: Ditto.
* gcc.target/riscv/sat_s_add-4.c: Ditto.
* gcc.target/riscv/sat_s_add-5.c: Ditto.
* gcc.target/riscv/sat_s_add-6.c: Ditto.
* gcc.target/riscv/sat_s_add-7.c: Ditto.
* gcc.target/riscv/sat_s_add-8.c: Ditto.
* gcc.target/riscv/sat_s_add-9.c: Ditto.
* gcc.target/riscv/sat_s_sub-1-i16.c: Ditto.
* gcc.target/riscv/sat_s_sub-1-i32.c: Ditto.
* gcc.target/riscv/sat_s_sub-1-i64.c: Ditto.
* gcc.target/riscv/sat_s_sub-1-i8.c: Ditto.
* gcc.target/riscv/sat_s_sub-2-i16.c: Ditto.
* gcc.target/riscv/sat_s_sub-2-i32.c: Ditto.
* gcc.target/riscv/sat_s_sub-2-i64.c: Ditto.
* gcc.target/riscv/sat_s_sub-2-i8.c: Ditto.
* gcc.target/riscv/sat_s_sub-3-i16.c: Ditto.
* gcc.target/riscv/sat_s_sub-3-i32.c: Ditto.
* gcc.target/riscv/sat_s_sub-3-i64.c: Ditto.
* gcc.target/riscv/sat_s_sub-3-i8.c: Ditto.
* gcc.target/riscv/sat_s_sub-4-i16.c: Ditto.
* gcc.target/riscv/sat_s_sub-4-i32.c: Ditto.
* gcc.target/riscv/sat_s_sub-4-i64.c: Ditto.
* gcc.target/riscv/sat_s_sub-4-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-1-i16-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-1-i32-to-i16.c: Ditto.
* gcc.target/riscv/sat_s_trunc-1-i32-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-1-i64-to-i16.c: Ditto.
* gcc.target/riscv/sat_s_trunc-1-i64-to-i32.c: Ditto.
* gcc.target/riscv/sat_s_trunc-1-i64-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-2-i16-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-2-i32-to-i16.c: Ditto.
* gcc.target/riscv/sat_s_trunc-2-i32-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-2-i64-to-i16.c: Ditto.
* gcc.target/riscv/sat_s_trunc-2-i64-to-i32.c: Ditto.
* gcc.target/riscv/sat_s_trunc-2-i64-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-3-i16-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-3-i32-to-i16.c: Ditto.
* gcc.target/riscv/sat_s_trunc-3-i32-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-3-i64-to-i16.c: Ditto.
* gcc.target/riscv/sat_s_trunc-3-i64-to-i32.c: Ditto.
* gcc.target/riscv/sat_s_trunc-3-i64-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-4-i16-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-4-i32-to-i16.c: Ditto.
* gcc.target/riscv/sat_s_trunc-4-i32-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-4-i64-to-i16.c: Ditto.
* gcc.target/riscv/sat_s_trunc-4-i64-to-i32.c: Ditto.
* gcc.target/riscv/sat_s_trunc-4-i64-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-5-i16-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-5-i32-to-i16.c: Ditto.
* gcc.target/riscv/sat_s_trunc-5-i32-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-5-i64-to-i16.c: Ditto.
* gcc.target/riscv/sat_s_trunc-5-i64-to-i32.c: Ditto.
* gcc.target/riscv/sat_s_trunc-5-i64-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-6-i16-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-6-i32-to-i16.c: Ditto.
* gcc.target/riscv/sat_s_trunc-6-i32-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-6-i64-to-i16.c: Ditto.
* gcc.target/riscv/sat_s_trunc-6-i64-to-i32.c: Ditto.
* gcc.target/riscv/sat_s_trunc-6-i64-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-7-i16-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-7-i32-to-i16.c: Ditto.
* gcc.target/riscv/sat_s_trunc-7-i32-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-7-i64-to-i16.c: Ditto.
* gcc.target/riscv/sat_s_trunc-7-i64-to-i32.c: Ditto.
* gcc.target/riscv/sat_s_trunc-7-i64-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-8-i16-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-8-i32-to-i16.c: Ditto.
* gcc.target/riscv/sat_s_trunc-8-i32-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-8-i64-to-i16.c: Ditto.
* gcc.target/riscv/sat_s_trunc-8-i64-to-i32.c: Ditto.
* gcc.target/riscv/sat_s_trunc-8-i64-to-i8.c: Ditto.
* gcc.target/riscv/sat_u_add-1.c: Ditto.
* gcc.target/riscv/sat_u_add-10.c: Ditto.
* gcc.target/riscv/sat_u_add-11.c: Ditto.
* gcc.target/riscv/sat_u_add-12.c: Ditto.
* gcc.target/riscv/sat_u_add-13.c: Ditto.
* gcc.target/riscv/sat_u_add-14.c: Ditto.
* gcc.target/riscv/sat_u_add-15.c: Ditto.
* gcc.target/riscv/sat_u_add-16.c: Ditto.
* gcc.target/riscv/sat_u_add-17.c: Ditto.
* gcc.target/riscv/sat_u_add-18.c: Ditto.
* gcc.target/riscv/sat_u_add-19.c: Ditto.
* gcc.target/riscv/sat_u_add-2.c: Ditto.
* gcc.target/riscv/sat_u_add-20.c: Ditto.
* gcc.target/riscv/sat_u_add-21.c: Ditto.
* gcc.target/riscv/sat_u_add-22.c: Ditto.
* gcc.target/riscv/sat_u_add-23.c: Ditto.
* gcc.target/riscv/sat_u_add-24.c: Ditto.
* gcc.target/riscv/sat_u_add-3.c: Ditto.
* gcc.target/riscv/sat_u_add-4.c: Ditto.
* gcc.target/riscv/sat_u_add-5.c: Ditto.
* gcc.target/riscv/sat_u_add-6.c: Ditto.
* gcc.target/riscv/sat_u_add-7.c: Ditto.
* gcc.target/riscv/sat_u_add-8.c: Ditto.
* gcc.target/riscv/sat_u_add-9.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-1.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-10.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-11.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-12.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-13.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-14.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-15.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-16.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-2.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-3.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-4.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-5.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-6.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-7.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-8.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-9.c: Ditto.
* gcc.target/riscv/sat_u_sub-1.c: Ditto.
* gcc.target/riscv/sat_u_sub-10.c: Ditto.
* gcc.target/riscv/sat_u_sub-11.c: Ditto.
* gcc.target/riscv/sat_u_sub-12.c: Ditto.
* gcc.target/riscv/sat_u_sub-13.c: Ditto.
* gcc.target/riscv/sat_u_sub-14.c: Ditto.
* gcc.target/riscv/sat_u_sub-15.c: Ditto.
* gcc.target/riscv/sat_u_sub-16.c: Ditto.
* gcc.target/riscv/sat_u_sub-17.c: Ditto.
* gcc.target/riscv/sat_u_sub-18.c: Ditto.
* gcc.target/riscv/sat_u_sub-19.c: Ditto.
* gcc.target/riscv/sat_u_sub-2.c: Ditto.
* gcc.target/riscv/sat_u_sub-20.c: Ditto.
* gcc.target/riscv/sat_u_sub-21.c: Ditto.
* gcc.target/riscv/sat_u_sub-22.c: Ditto.
* gcc.target/riscv/sat_u_sub-23.c: Ditto.
* gcc.target/riscv/sat_u_sub-24.c: Ditto.
* gcc.target/riscv/sat_u_sub-25.c: Ditto.
* gcc.target/riscv/sat_u_sub-26.c: Ditto.
* gcc.target/riscv/sat_u_sub-27.c: Ditto.
* gcc.target/riscv/sat_u_sub-28.c: Ditto.
* gcc.target/riscv/sat_u_sub-29.c: Ditto.
* gcc.target/riscv/sat_u_sub-3.c: Ditto.
* gcc.target/riscv/sat_u_sub-30.c: Ditto.
* gcc.target/riscv/sat_u_sub-31.c: Ditto.
* gcc.target/riscv/sat_u_sub-32.c: Ditto.
* gcc.target/riscv/sat_u_sub-33.c: Ditto.
* gcc.target/riscv/sat_u_sub-34.c: Ditto.
* gcc.target/riscv/sat_u_sub-35.c: Ditto.
* gcc.target/riscv/sat_u_sub-36.c: Ditto.
* gcc.target/riscv/sat_u_sub-37.c: Ditto.
* gcc.target/riscv/sat_u_sub-38.c: Ditto.
* gcc.target/riscv/sat_u_sub-39.c: Ditto.
* gcc.target/riscv/sat_u_sub-4.c: Ditto.
* gcc.target/riscv/sat_u_sub-40.c: Ditto.
* gcc.target/riscv/sat_u_sub-41.c: Ditto.
* gcc.target/riscv/sat_u_sub-42.c: Ditto.
* gcc.target/riscv/sat_u_sub-43.c: Ditto.
* gcc.target/riscv/sat_u_sub-44.c: Ditto.
* gcc.target/riscv/sat_u_sub-45.c: Ditto.
* gcc.target/riscv/sat_u_sub-46.c: Ditto.
* gcc.target/riscv/sat_u_sub-47.c: Ditto.
* gcc.target/riscv/sat_u_sub-48.c: Ditto.
* gcc.target/riscv/sat_u_sub-5.c: Ditto.
* gcc.target/riscv/sat_u_sub-6.c: Ditto.
* gcc.target/riscv/sat_u_sub-7.c: Ditto.
* gcc.target/riscv/sat_u_sub-8.c: Ditto.
* gcc.target/riscv/sat_u_sub-9.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-1.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-10.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-10_1.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-10_2.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-11.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-11_1.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-11_2.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-12.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-13.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-13_1.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-13_2.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-14.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-14_1.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-14_2.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-15.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-15_1.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-15_2.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-16.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-1_1.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-1_2.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-2.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-2_1.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-2_2.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-3.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-3_1.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-3_2.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-4.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-5.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-5_1.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-5_2.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-6.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-6_1.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-6_2.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-7.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-7_1.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-7_2.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-8.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-9.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-9_1.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-9_2.c: Ditto.
* gcc.target/riscv/sat_u_trunc-1.c: Ditto.
* gcc.target/riscv/sat_u_trunc-10.c: Ditto.
* gcc.target/riscv/sat_u_trunc-11.c: Ditto.
* gcc.target/riscv/sat_u_trunc-12.c: Ditto.
* gcc.target/riscv/sat_u_trunc-13.c: Ditto.
* gcc.target/riscv/sat_u_trunc-14.c: Ditto.
* gcc.target/riscv/sat_u_trunc-15.c: Ditto.
* gcc.target/riscv/sat_u_trunc-16.c: Ditto.
* gcc.target/riscv/sat_u_trunc-17.c: Ditto.
* gcc.target/riscv/sat_u_trunc-18.c: Ditto.
* gcc.target/riscv/sat_u_trunc-19.c: Ditto.
* gcc.target/riscv/sat_u_trunc-2.c: Ditto.
* gcc.target/riscv/sat_u_trunc-20.c: Ditto.
* gcc.target/riscv/sat_u_trunc-21.c: Ditto.
* gcc.target/riscv/sat_u_trunc-22.c: Ditto.
* gcc.target/riscv/sat_u_trunc-23.c: Ditto.
* gcc.target/riscv/sat_u_trunc-24.c: Ditto.
* gcc.target/riscv/sat_u_trunc-3.c: Ditto.
* gcc.target/riscv/sat_u_trunc-4.c: Ditto.
* gcc.target/riscv/sat_u_trunc-5.c: Ditto.
* gcc.target/riscv/sat_u_trunc-6.c: Ditto.
* gcc.target/riscv/sat_u_trunc-7.c: Ditto.
* gcc.target/riscv/sat_u_trunc-8.c: Ditto.
* gcc.target/riscv/sat_u_trunc-9.c: Ditto.
|
|
arm pac and bti tests that use -march=armv8.1-m.main get an implicit
-mthumb, that is incompatible with vxworks kernel mode. Declaring the
requirement for a 8.1-m.main-compatible toolchain is enough to avoid
those fails, because the toolchain feature test fails in kernel mode,
but taking the -march options from the standardized arch tests, after
testing for support for the corresponding effective target, makes it
generally safer, and enables us to drop skip directives and extraneous
option variants.
for gcc/testsuite/ChangeLog
* gcc.target/arm/bti-1.c: Require arch, use its opts, drop skip.
* gcc.target/arm/bti-2.c: Likewise.
* gcc.target/arm/acle/pacbti-m-predef-11.c: Likewise.
* gcc.target/arm/acle/pacbti-m-predef-12.c: Likewise.
* gcc.target/arm/acle/pacbti-m-predef-7.c: Likewise.
* g++.target/arm/pac-1.C: Likewise. Drop +mve.
|
|
r12-6103-g1a7ce8570997eb combines vpcmpuw + zero_extend to vpcmpuw
with the pre_reload splitter, but the splitter transforms the
zero_extend into a subreg which make reload think the upper part is
garbage, it's not correct.
The patch adjusts the zero_extend define_insn_and_split to
define_insn to keep zero_extend.
gcc/ChangeLog:
PR target/117159
* config/i386/sse.md
(*<avx512>_cmp<V48H_AVX512VL:mode>3_zero_extend<SWI248x:mode>):
Change from define_insn_and_split to define_insn.
(*<avx512>_cmp<VI12_AVX512VL:mode>3_zero_extend<SWI248x:mode>):
Ditto.
(*<avx512>_ucmp<VI12_AVX512VL:mode>3_zero_extend<SWI248x:mode>):
Ditto.
(*<avx512>_ucmp<VI48_AVX512VL:mode>3_zero_extend<SWI248x:mode>):
Ditto.
(*<avx512>_cmp<V48H_AVX512VL:mode>3_zero_extend<SWI248x:mode>_2):
Split to the zero_extend pattern.
(*<avx512>_cmp<VI12_AVX512VL:mode>3_zero_extend<SWI248x:mode>_2):
Ditto.
(*<avx512>_ucmp<VI12_AVX512VL:mode>3_zero_extend<SWI248x:mode>_2):
Ditto.
(*<avx512>_ucmp<VI48_AVX512VL:mode>3_zero_extend<SWI248x:mode>_2):
Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr117159.c: New test.
* gcc.target/i386/avx512bw-pr103750-1.c: Remove xfail.
* gcc.target/i386/avx512bw-pr103750-2.c: Remove xfail.
|
|
|
|
UNITS_PER_WORD"
This reverts commit 72ceddbfb78dbb95f0808c3eca1765e8cd48b023.
|
|
Further cleanups and improve the wording of an error message.
gcc/m2/ChangeLog:
* gm2-compiler/M2MetaError.mod (op): Corrected ordering.
* gm2-compiler/P2SymBuild.def: Remove comment.
* gm2-compiler/P2SymBuild.mod (GetComparison): Replace
the word less with fewer.
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
|
|
|
|
libcpp is not currently set up to be able to generate valid
locations for tokens lexed from a _Pragma string. Instead, after obtaining
the tokens, it sets their locations all to the location of the _Pragma
operator itself. This makes things like _Pragma("GCC diagnostic") work well
enough, but if any diagnostics are issued during lexing, prior to resetting
the token locations, those diagnostics get issued at the invalid
locations. Fix that up by adding a new field pfile->diagnostic_override_loc
that instructs libcpp to issue diagnostics at the alternate location.
libcpp/ChangeLog:
PR preprocessor/114423
* internal.h (struct cpp_reader): Add DIAGNOSTIC_OVERRIDE_LOC
field.
* directives.cc (destringize_and_run): Set the new field to the
location of the _Pragma operator.
* errors.cc (cpp_diagnostic_at): Support DIAGNOSTIC_OVERRIDE_LOC to
temporarily issue diagnostics at a different location.
(cpp_diagnostic_with_line): Likewise.
gcc/testsuite/ChangeLog:
PR preprocessor/114423
* c-c++-common/cpp/pragma-diagnostic-loc.c: New test.
* c-c++-common/cpp/diagnostic-pragma-1.c: Adjust expected output.
* g++.dg/pch/operator-1.C: Likewise.
|
|
This patch is a tidyup for gm2-compiler/M2MetaError.mod.
gcc/m2/ChangeLog:
* gm2-compiler/M2MetaError.mod (op): Alphabetically order
each case label and comment.
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
|
|
Sometimes factor_out_conditional_operation can factor out
an operation that causes a phi node to become the same element.
Other times, we want to factor out a binary operator because
it can improve code generation, an example is PR 110015 (openjpeg).
Note this includes a heuristic to decide if factoring out the operation
is profitable or not. It can be expanded to include a better live range
extend detector. Right now it has a simple one where if it is live on a
dominating path, it is considered a live or if there are a small # of
assign statements (defaults to 5), then it does not extend the live range
too much.
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/112418
gcc/ChangeLog:
* tree-ssa-phiopt.cc (is_factor_profitable): New function.
(factor_out_conditional_operation): Add merge argument. Remove
arg0/arg1 arguments. Return bool instead of the new phi.
Early return for virtual ops. Call is_factor_profitable to
check if the factoring would be profitable.
(pass_phiopt::execute): Call factor_out_conditional_operation
on all phis instead of just singleton phi.
* doc/invoke.texi (--param phiopt-factor-max-stmts-live=): Document.
* params.opt (--param=phiopt-factor-max-stmts-live=): New opt.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/factor_op_phi-1.c: New test.
* gcc.dg/tree-ssa/factor_op_phi-2.c: New test.
* gcc.dg/tree-ssa/factor_op_phi-3.c: New test.
* gcc.dg/tree-ssa/factor_op_phi-4.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
Add option -m(no-)autovec-segment to enable/disable autovectorizer
from emitting vector segment load/store instructions. This is useful for
performance experiments.
gcc/ChangeLog:
* config/riscv/autovec.md (vec_mask_len_load_lanes, vec_mask_len_store_lanes):
Predicate with TARGET_VECTOR_AUTOVEC_SEGMENT
* config/riscv/riscv-opts.h (TARGET_VECTOR_AUTOVEC_SEGMENT): New macro.
* config/riscv/riscv.opt (-m(no-)autovec-segment): New option.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-1.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-2.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-3.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-4.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-5.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-6.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-7.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg_run-1.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg_run-2.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg_run-3.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg_run-4.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg_run-5.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg_run-6.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg_run-7.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg-1.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg-2.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg-3.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg-4.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg-5.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg-6.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg-7.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg_run-1.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg_run-2.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg_run-3.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg_run-4.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg_run-5.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg_run-6.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg_run-7.c:
New test.
* gcc.target/riscv/rvv/autovec/no-segment.c: New test.
|
|
gcc/testsuite/ChangeLog:
PR fortran/117225
* gfortran.dg/unsigned_38.f90: Add missing dg-error directive.
|
|
For fast unaligned access targets, by pieces uses up to UNITS_PER_WORD
size pieces resulting in more store instructions than needed. For
example gcc.target/riscv/rvv/base/setmem-1.c:f1 built with
`-O3 -march=rv64gcv -mtune=thead-c906`:
```
f1:
vsetivli zero,8,e8,mf2,ta,ma
vmv.v.x v1,a1
vsetivli zero,0,e32,mf2,ta,ma
sb a1,14(a0)
vmv.x.s a4,v1
vsetivli zero,8,e16,m1,ta,ma
vmv.x.s a5,v1
vse8.v v1,0(a0)
sw a4,8(a0)
sh a5,12(a0)
ret
```
The slow unaligned access version built with `-O3 -march=rv64gcv` used
15 sb instructions:
```
f1:
sb a1,0(a0)
sb a1,1(a0)
sb a1,2(a0)
sb a1,3(a0)
sb a1,4(a0)
sb a1,5(a0)
sb a1,6(a0)
sb a1,7(a0)
sb a1,8(a0)
sb a1,9(a0)
sb a1,10(a0)
sb a1,11(a0)
sb a1,12(a0)
sb a1,13(a0)
sb a1,14(a0)
ret
```
After this patch, the following is generated in both cases:
```
f1:
vsetivli zero,15,e8,m1,ta,ma
vmv.v.x v1,a1
vse8.v v1,0(a0)
ret
```
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_use_by_pieces_infrastructure_p):
New function.
(TARGET_USE_BY_PIECES_INFRASTRUCTURE_P): Define.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr113469.c: Expect mf2 setmem.
* gcc.target/riscv/rvv/base/setmem-2.c: Update f1 to expect
straight-line vector memset.
* gcc.target/riscv/rvv/base/setmem-3.c: Likewise.
|
|
[NFC]
This moves the code for deciding whether to generate a vectorized
memcpy, what vector mode to use and whether a loop is needed out of
riscv_vector::expand_block_move and into a new function
riscv_vector::use_stringop_p so that it can be reused for other string
operations.
gcc/ChangeLog:
* config/riscv/riscv-string.cc (struct stringop_info): New.
(expand_block_move): Move decision making code to...
(use_vector_stringop_p): ...here.
|
|
Unlike the other vector string ops, expand_block_move was using max LMUL
m8 regardless of TARGET_MAX_LMUL.
The check for whether to generate inline vector code for movmem has been
moved from movmem<mode> to riscv_vector::expand_block_move to avoid
maintaining multiple versions of similar logic. They already differed
on the minimum length for which they would generate vector code. Now
that the expand_block_move value is used, movmem will be generated for
smaller lengths.
Limiting memcpy to m1 caused some memcpy loops to be generated in
the calling convention tests which makes it awkward to add suitable scan
assembler tests checking the return value being set, so
-mrvv-max-lmul=m8 has been added to these tests. Other tests have been
adjusted to expect the new memcpy m1 generation where reasonably
straight-forward, otherwise -mrvv-max-lmul=m8 has been added.
pr111720-[0-9].c regressed because a memcpy loop is generated instead
of straight-line. This reveals an existing issue where a redundant
straight-line memcpy gets eliminated but a memcpy loop does not
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117205).
For example, on pr111720-0.c after this patch:
-mrvv-max-lmul=m8:
test:
lui a5,%hi(.LANCHOR0)
li a4,32
addi sp,sp,-32
addi a5,a5,%lo(.LANCHOR0)
vsetvli zero,a4,e8,m1,ta,ma
vle8.v v8,0(a5)
addi sp,sp,32
jr ra
-mrvv-max-lmul=m1:
test:
addi sp,sp,-32
lui a5,%hi(.LANCHOR0)
addi a5,a5,%lo(.LANCHOR0)
mv a2,sp
li a3,32
.L2:
vsetvli a4,a3,e8,m1,ta,ma
vle8.v v8,0(a5)
sub a3,a3,a4
add a5,a5,a4
vse8.v v8,0(a2)
add a2,a2,a4
bne a3,zero,.L2
li a5,32
vsetvli zero,a5,e8,m1,ta,ma
vle8.v v8,0(sp)
addi sp,sp,32
jr ra
I have added -mrvv-max-lmul=m8 to pr111720-[0-9].c so that we continue
to test the elimination of straight-line memcpy.
gcc/ChangeLog:
* config/riscv/riscv-protos.h (get_lmul_mode): New prototype.
(expand_block_move): Add bool parameter for movmem_p.
* config/riscv/riscv-string.cc (riscv_expand_block_move_scalar):
Pass movmem_p as false to riscv_vector::expand_block_move.
(expand_block_move): Add movmem_p parameter. Return false if
loop needed and movmem_p is true. Respect TARGET_MAX_LMUL.
* config/riscv/riscv-v.cc (get_lmul_mode): New function.
* config/riscv/riscv.md (movmem<mode>): Move checking for
whether to generate inline vector code to
riscv_vector::expand_block_move by passing movmem_p as true.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr113206-1.c: Add
-mrvv-max-lmul=m8.
* gcc.target/riscv/rvv/autovec/pr113206-2.c: Likewise.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-1.c: Add
-mrvv-max-lmul=m8 and adjust assembly scans.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-2.c:
Likewise.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-3.c:
Likewise.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-4.c:
Likewise.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-5.c:
Likewise.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-6.c:
Likewise.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-7.c:
Likewise.
* gcc.target/riscv/rvv/autovec/vls/spill-4.c: Add
-mrvv-max-lmul=m8.
* gcc.target/riscv/rvv/autovec/vls/spill-7.c: Likewise.
* gcc.target/riscv/rvv/base/cpymem-1.c: Expect m1 in f1 and f2.
* gcc.target/riscv/rvv/base/cpymem-2.c: Add -mrvv-max-lmul=m8.
* gcc.target/riscv/rvv/base/movmem-1.c: Adjust f1 to a length
that will not get vectorized.
* gcc.target/riscv/rvv/base/pr111720-0.c: Add -mrvv-max-lmul=m8.
* gcc.target/riscv/rvv/base/pr111720-1.c: Likewise.
* gcc.target/riscv/rvv/base/pr111720-2.c: Likewise.
* gcc.target/riscv/rvv/base/pr111720-3.c: Likewise.
* gcc.target/riscv/rvv/base/pr111720-4.c: Likewise.
* gcc.target/riscv/rvv/base/pr111720-5.c: Likewise.
* gcc.target/riscv/rvv/base/pr111720-6.c: Likewise.
* gcc.target/riscv/rvv/base/pr111720-7.c: Likewise.
* gcc.target/riscv/rvv/base/pr111720-8.c: Likewise.
* gcc.target/riscv/rvv/base/pr111720-9.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/pr112929-1.c: Expect memcpy m1
loops.
* gcc.target/riscv/rvv/vsetvl/pr112988-1.c: Likewise.
|
|
This patch implements the FORWARD keyword found in the ISO standard.
The patch checks incoming parameters against the prior declaration
found in definition/forward sections and will issue an error based
on virtual tokens highlighing the full parameter declaration.
gcc/m2/ChangeLog:
PR modula2/115328
* gm2-compiler/M2MetaError.def: Extend comment documentating
new format specifiers.
* gm2-compiler/M2MetaError.mod (GetTokProcedure): New declaration.
(doErrorScopeModule): New procedure.
(doErrorScopeForward): Ditto.
(doErrorScopeMod): Reimplement.
(doErrorScopeFor): New procedure.
(declarationMod): Ditto.
(doErrorScopeDefinition): Ditto.
(doErrorScopeDef): Reimplement.
(declaredDef): New procedure.
(declaredFor): Ditto.
(doErrorScopeProc): Ditto.
(declaredVar): Ditto.
(declaredType): Ditto.
(declaredFull): Ditto.
* gm2-compiler/M2Options.mod (SetAutoInit): Add missing
return type.
(GetDumpGimple): Remove duplicate implementation.
* gm2-compiler/M2Quads.def (DupFrame): New procedure.
* gm2-compiler/M2Quads.mod (DupFrame): New procedure.
* gm2-compiler/M2Reserved.def (ForwardTok): New variable.
* gm2-compiler/M2Reserved.mod (ForwardTok): Initialize variable.
* gm2-compiler/M2Scaffold.mod (DeclareArgEnvParams): Add
tokno parameter for call to PutParam.
* gm2-compiler/P0SymBuild.def (EndForward): New procedure.
* gm2-compiler/P0SymBuild.mod (EndForward): New procedure.
* gm2-compiler/P0SyntaxCheck.bnf (BlockAssert): New procedure.
(ProcedureDeclaration): Reimplement rule.
(PostProcedureHeading): New rule.
(ForwardDeclaration): Ditto.
(ProperProcedure): Ditto.
* gm2-compiler/P1Build.bnf (ProcedureDeclaration): Reimplement rule.
(PostProcedureHeading): New rule.
(ForwardDeclaration): Ditto.
(ProperProcedure): Ditto.
* gm2-compiler/P1SymBuild.def (Export): Removed unnecessary
export.
(EndBuildForward): New procedure.
* gm2-compiler/P1SymBuild.mod (StartBuildProcedure): Reimplement.
(EndBuildProcedure): Ditto.
(EndBuildForward): Ditto.
* gm2-compiler/P2Build.bnf (ProcedureDeclaration): Reimplement rule.
(PostProcedureHeading): New rule.
(ForwardDeclaration): Ditto.
(ProperProcedure): Ditto.
* gm2-compiler/P2SymBuild.def (BuildProcedureDefinedByForward):
New procedure.
(BuildProcedureDefinedByProper): Ditto.
(CheckProcedure): Ditto.
(EndBuildForward): Ditto.
* gm2-compiler/P2SymBuild.mod (EndBuildProcedure): Reimplement.
(EndBuildForward): New procedure.
(BuildFPSection): Reimplement to allow forward declaration or
checking of parameters.
(BuildProcedureDefinedByProper): New procedure.
(BuildProcedureDefinedByForward): Ditto
(FailParameter): Remove.
(ParameterError): New procedure.
(ParameterMismatch): Ditto.
(EndBuildFormalParameters): Add parameter number check.
(GetComparison): New procedure function.
(GetSourceDesc): Ditto.
(GetCurSrcDesc): Ditto.
(GetDeclared): New procedure.
(ReturnTypeMismatch): Ditto.
(BuildFunction): Reimplement.
(CheckProcedure): New procedure.
(CheckFormalParameterSection): Reimplement using ParameterError.
* gm2-compiler/P3Build.bnf (ProcedureDeclaration): Reimplement rule.
(PostProcedureHeading): New rule.
(ForwardDeclaration): Ditto.
(ProperProcedure): Ditto.
* gm2-compiler/P3SymBuild.def (Export): Remove unnecessary export.
(EndBuildForward): New procedure.
* gm2-compiler/P3SymBuild.mod (EndBuildForward): New procedure.
* gm2-compiler/PCBuild.bnf (ProcedureDeclaration): Reimplement rule.
(PostProcedureHeading): New rule.
(ForwardDeclaration): Ditto.
(ProperProcedure): Ditto.
* gm2-compiler/PCSymBuild.def (EndBuildForward): New procedure.
* gm2-compiler/PCSymBuild.mod (EndBuildForward): Ditto.
* gm2-compiler/PHBuild.bnf (ProcedureDeclaration): Reimplement rule.
(PostProcedureHeading): New rule.
(ForwardDeclaration): Ditto.
(ProperProcedure): Ditto.
* gm2-compiler/SymbolTable.def (PutVarTok): New procedure.
(PutParam): Add typetok parameter.
(PutVarParam): Ditto.
(PutParamName): Ditto.
(GetDeclaredFor): New procedure function.
(AreParametersDefinedInDefinition): Ditto.
(PutParametersDefinedByForward): New procedure.
(GetParametersDefinedByForward): New procedure function.
(PutParametersDefinedByProper): New procedure.
(GetParametersDefinedByProper): New procedure function.
(GetProcedureDeclaredForward): Ditto.
(PutProcedureDeclaredForward): New procedure.
(GetProcedureDeclaredProper): New procedure function.
(PutProcedureDeclaredProper): New procedure.
(GetProcedureDeclaredDefinition): New procedure function.
(PutProcedureDeclaredDefinition): New procedure.
(GetVarDeclTypeTok): Ditto.
(PutVarDeclTypeTok): New procedure.
(GetVarDeclTok): Ditto.
(PutVarDeclTok): New procedure.
(GetVarDeclFullTok): Ditto.
* gm2-compiler/SymbolTable.mod (ProcedureDecl): New record type.
(VarDecl): Ditto.
(SymProcedure): Add new field Declared.
(SymVar): Add new field Declared.
(PutVarTok): New procedure.
(PutParam): Add typetok parameter.
(PutVarParam): Ditto.
(PutParamName): Ditto.
(GetDeclaredFor): New procedure function.
(AreParametersDefinedInDefinition): Ditto.
(PutParametersDefinedByForward): New procedure.
(GetParametersDefinedByForward): New procedure function.
(PutParametersDefinedByProper): New procedure.
(GetParametersDefinedByProper): New procedure function.
(GetProcedureDeclaredForward): Ditto.
(PutProcedureDeclaredForward): New procedure.
(GetProcedureDeclaredProper): New procedure function.
(PutProcedureDeclaredProper): New procedure.
(GetProcedureDeclaredDefinition): New procedure function.
(PutProcedureDeclaredDefinition): New procedure.
(GetVarDeclTypeTok): Ditto.
(PutVarDeclTypeTok): New procedure.
(GetVarDeclTok): Ditto.
(PutVarDeclTok): New procedure.
(GetVarDeclFullTok): Ditto.
(MakeProcedure): Initialize Declared field.
(MakeVar): Initialize Declared field.
* gm2-libs-log/FileSystem.def (FileNameChar): Add
missing return type.
* m2.flex: Add FORWARD keyword.
gcc/testsuite/ChangeLog:
PR modula2/115328
* gm2/iso/fail/badparam.def: New test.
* gm2/iso/fail/badparam.mod: New test.
* gm2/iso/fail/badparam2.def: New test.
* gm2/iso/fail/badparam2.mod: New test.
* gm2/iso/fail/badparam3.def: New test.
* gm2/iso/fail/badparam3.mod: New test.
* gm2/iso/fail/badparamarray.def: New test.
* gm2/iso/fail/badparamarray.mod: New test.
* gm2/iso/fail/simpledef1.def: New test.
* gm2/iso/fail/simpledef1.mod: New test.
* gm2/iso/fail/simpleforward.mod: New test.
* gm2/iso/fail/simpleforward2.mod: New test.
* gm2/iso/fail/simpleforward3.mod: New test.
* gm2/iso/fail/simpleforward4.mod: New test.
* gm2/iso/fail/simpleforward5.mod: New test.
* gm2/iso/fail/simpleforward7.mod: New test.
* gm2/iso/pass/simpleforward.mod: New test.
* gm2/iso/pass/simpleforward6.mod: New test.
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
|