author | Feng Xue <fxue@os.amperecomputing.com> | 2024-07-02 17:12:00 +0800 |
---|---|---|
committer | Feng Xue <fxue@os.amperecomputing.com> | 2024-07-17 21:54:05 +0800 |
commit | 8b59fa9d8ca25bdf0792390a8bdeae151532a530 (patch) | |
tree | 62bf8669e56c168e14e2bde52801ca888a0a7fe2 /gcc/tree-vectorizer.h | |
parent | e7fbae834f8db2508d3161d88efe7ddbb702e437 (diff) | |
vect: Refit lane-reducing to be normal operation
The number of vector stmts for an operation is calculated from its output
vectype. This over-estimates the count for a lane-reducing operation, which
would cause a vector def/use mismatch when we want to support a loop reduction
that mixes lane-reducing and normal operations. One solution is to refit the
lane-reducing operation so that it behaves like a normal one, by adding new
pass-through copies to fill any def/use gap. The resulting superfluous
statements can be optimized away after vectorization. For example:
  int sum = 1;
  for (i)
    {
      sum += d0[i] * d1[i];      // dot-prod <vector(16) char>
    }
The vector size is 128 bits and the vectorization factor is 16. The reduction
statements would be transformed as:
  vector<4> int sum_v0 = { 0, 0, 0, 1 };
  vector<4> int sum_v1 = { 0, 0, 0, 0 };
  vector<4> int sum_v2 = { 0, 0, 0, 0 };
  vector<4> int sum_v3 = { 0, 0, 0, 0 };
  for (i / 16)
    {
      sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0);
      sum_v1 = sum_v1;  // copy
      sum_v2 = sum_v2;  // copy
      sum_v3 = sum_v3;  // copy
    }
  sum_v = sum_v0 + sum_v1 + sum_v2 + sum_v3;   // = sum_v0
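
The transformation above can be checked with a small scalar model. The following is only an illustrative sketch, not vectorizer code: every name in it (`dot_prod`, `pass_through`, `vector_sum`, and so on) is hypothetical, the array length `N` and the input values are arbitrary, and the grouping of four adjacent char products into each int lane is an assumption about how DOT_PROD distributes its partial sums. It models the four accumulators implied by the output vectype, of which only the first receives the dot-product while the rest are pass-through copies.

```c
#include <assert.h>
#include <stdint.h>

enum {
  N = 64,                 /* element count; assumed to be a multiple of VF */
  VF = 16,                /* vectorization factor from the example */
  LANES = 4,              /* lanes in the output vectype, vector(4) int */
  NCOPIES = VF / LANES,   /* accumulators implied by the output vectype */
  PER_LANE = VF / LANES   /* char products folded into one int lane */
};

/* Arbitrary small test inputs (assumption, chosen to avoid overflow).  */
void fill_inputs(int8_t d0[N], int8_t d1[N])
{
  for (int i = 0; i < N; i++) {
    d0[i] = (int8_t)(i % 7 - 3);
    d1[i] = (int8_t)(i % 5 - 2);
  }
}

/* The original scalar reduction, starting from sum = 1.  */
int scalar_sum(const int8_t *d0, const int8_t *d1)
{
  int sum = 1;
  for (int i = 0; i < N; i++)
    sum += d0[i] * d1[i];
  return sum;
}

/* Hypothetical model of DOT_PROD on one 16-element chunk: each int lane
   accumulates PER_LANE adjacent products.  */
static void dot_prod(const int8_t *a, const int8_t *b, int acc[LANES])
{
  for (int lane = 0; lane < LANES; lane++)
    for (int k = 0; k < PER_LANE; k++)
      acc[lane] += a[lane * PER_LANE + k] * b[lane * PER_LANE + k];
}

/* Pass-through copy: the accumulator is carried around the loop unchanged.  */
static void pass_through(int v[LANES])
{
  for (int lane = 0; lane < LANES; lane++) {
    int t = v[lane];
    v[lane] = t;
  }
}

/* Model of the transformed loop: sum_v[0] starts as { 0, 0, 0, 1 } and the
   remaining accumulators start as zero vectors.  */
int vector_sum(const int8_t *d0, const int8_t *d1)
{
  int sum_v[NCOPIES][LANES] = { { 0, 0, 0, 1 } };

  for (int i = 0; i < N; i += VF) {
    dot_prod(d0 + i, d1 + i, sum_v[0]);   /* the only real operation */
    for (int c = 1; c < NCOPIES; c++)
      pass_through(sum_v[c]);             /* copies that fill the def/use gap */
  }

  /* Epilogue: add all accumulators, then reduce across lanes.  */
  int sum = 0;
  for (int c = 0; c < NCOPIES; c++)
    for (int lane = 0; lane < LANES; lane++)
      sum += sum_v[c][lane];
  return sum;
}

/* Both forms must produce the same reduction result.  */
void check_example(void)
{
  int8_t d0[N], d1[N];
  fill_inputs(d0, d1);
  assert(scalar_sum(d0, d1) == vector_sum(d0, d1));
}
```

Calling `check_example()` from a test driver exercises both paths. Because the copied accumulators stay zero throughout, the epilogue sum equals the first accumulator alone, which is why the copies can be optimized away later.
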
2024-07-02  Feng Xue  <fxue@os.amperecomputing.com>

gcc/
	* tree-vect-loop.cc (vect_reduction_update_partial_vector_usage):
	Calculate effective vector stmts number with generic
	vect_get_num_copies.
	(vect_transform_reduction): Insert copies for lane-reducing so as to
	fix over-estimated vector stmts number.
	(vect_transform_cycle_phi): Calculate vector PHI number only based on
	output vectype.
	* tree-vect-slp.cc (vect_slp_analyze_node_operations_1): Remove
	adjustment on vector stmts number specific to slp reduction.