author | Ju-Zhe Zhong <juzhe.zhong@rivai.ai> | 2023-07-12 21:17:39 +0800 |
---|---|---|
committer | Pan Li <pan2.li@intel.com> | 2023-07-12 22:26:33 +0800 |
commit | 0d4dd7e07a879d6c07a33edb2799710faa95651e (patch) | |
tree | caabf9a182c552c07f985008527476861d512152 /gcc/tree-ssa-loop-ch.cc | |
parent | 13c3e29d47e359b2f05ea98d61710fc162ba6d31 (diff) | |
VECT: Apply COND_LEN_* into vectorizable_operation
Hi, Richard and Richi.
As we discussed before, COND_LEN_* patterns were added for multiple situations.
This patch applies COND_LEN_* to the following situation in "vectorizable_operation":
/* If operating on inactive elements could generate spurious traps,
we need to restrict the operation to active lanes. Note that this
specifically doesn't apply to unhoisted invariants, since they
operate on the same value for every lane.
Similarly, if this operation is part of a reduction, a fully-masked
loop should only change the active lanes of the reduction chain,
keeping the inactive lanes as-is. */
bool mask_out_inactive = ((!is_invariant && gimple_could_trap_p (stmt))
|| reduc_idx >= 0);
That is, the case where mask_out_inactive is true and the loop is controlled by a length rather than only by a mask.
The patch therefore handles the following two cases:
1. Integer division (here, the remainder operator %):
#include <stdint.h>
#define TEST_TYPE(TYPE) \
__attribute__((noipa)) \
void vrem_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n) \
{ \
for (int i = 0; i < n; i++) \
dst[i] = a[i] % b[i]; \
}
#define TEST_ALL() \
TEST_TYPE(int8_t)
TEST_ALL()
With this patch:
_61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
ivtmp_45 = _61 * 4;
vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, vect__4.8_48, _61, 0);
.LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, vect__8.12_53);
2. Floating-point arithmetic **WITHOUT** -ffast-math:
#define TEST_TYPE(TYPE) \
__attribute__((noipa)) \
void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n) \
{ \
for (int i = 0; i < n; i++) \
dst[i] = a[i] + b[i]; \
}
#define TEST_ALL() \
TEST_TYPE(float)
TEST_ALL()
With this patch:
_61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
ivtmp_45 = _61 * 4;
vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, vect__4.8_48, _61, 0);
.LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, vect__8.12_53);
With this patch, we can ensure that, when mask_out_inactive is true, operations do not trap on the inactive elements (the lanes beyond the current length).
gcc/ChangeLog:
* internal-fn.cc (FOR_EACH_CODE_MAPPING): Adapt for COND_LEN_* support.
(CASE): Ditto.
(get_conditional_len_internal_fn): New function.
* internal-fn.h (get_conditional_len_internal_fn): Ditto.
* tree-vect-stmts.cc (vectorizable_operation): Adapt for COND_LEN_*
support.
Diffstat (limited to 'gcc/tree-ssa-loop-ch.cc')
0 files changed, 0 insertions, 0 deletions