author     Feng Xue <fxue@os.amperecomputing.com>  2024-07-02 17:12:00 +0800
committer  Feng Xue <fxue@os.amperecomputing.com>  2024-07-17 21:54:05 +0800
commit     8b59fa9d8ca25bdf0792390a8bdeae151532a530
tree       62bf8669e56c168e14e2bde52801ca888a0a7fe2
parent     e7fbae834f8db2508d3161d88efe7ddbb702e437
vect: Refit lane-reducing to be normal operation

The number of vector stmts for an operation is calculated from its output
vectype.  This overestimates the count for a lane-reducing operation, which
would cause a vector def/use mismatch when we want to support loop
reductions that mix lane-reducing and normal operations.  One solution is
to refit the lane-reducing operation so that it behaves like a normal one,
by adding new pass-through copies to close any def/use gap.  The resulting
superfluous statements can be optimized away after vectorization.

For example:

  int sum = 1;
  for (i)
    {
      sum += d0[i] * d1[i];      // dot-prod <vector(16) char>
    }

The vector size is 128-bit and the vectorization factor is 16.  The
reduction statements would be transformed as:

  vector<4> int sum_v0 = { 0, 0, 0, 1 };
  vector<4> int sum_v1 = { 0, 0, 0, 0 };
  vector<4> int sum_v2 = { 0, 0, 0, 0 };
  vector<4> int sum_v3 = { 0, 0, 0, 0 };

  for (i / 16)
    {
      sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0);
      sum_v1 = sum_v1;  // copy
      sum_v2 = sum_v2;  // copy
      sum_v3 = sum_v3;  // copy
    }

  sum_v = sum_v0 + sum_v1 + sum_v2 + sum_v3;   // = sum_v0

2024-07-02  Feng Xue  <fxue@os.amperecomputing.com>

gcc/
	* tree-vect-loop.cc (vect_reduction_update_partial_vector_usage):
	Calculate effective vector stmts number with generic
	vect_get_num_copies.
	(vect_transform_reduction): Insert copies for lane-reducing so as to
	fix over-estimated vector stmts number.
	(vect_transform_cycle_phi): Calculate vector PHI number only based on
	output vectype.
	* tree-vect-slp.cc (vect_slp_analyze_node_operations_1): Remove
	adjustment on vector stmts number specific to slp reduction.
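
The transformation described above can be modelled in plain C to check that the pass-through copies do not change the result.  This is only an illustrative sketch, not code from the patch: the v4si struct, dot_prod, scalar_dot and refit_dot are hypothetical names, and the way dot_prod groups four adjacent products per output lane is an assumption made for the model (the final sum does not depend on that grouping).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VF       16             /* vectorization factor: chars per 128-bit vector */
#define LANES    4              /* ints per 128-bit vector */
#define PER_LANE (VF / LANES)   /* products folded into each output lane */
#define NCOPIES  (VF / LANES)   /* vector stmts counted from the output vectype */

/* Hypothetical stand-in for a 128-bit vector of four ints.  */
typedef struct { int32_t lane[LANES]; } v4si;

/* Scalar reference loop from the commit message (sum starts at 1).  */
int32_t scalar_dot(const int8_t *d0, const int8_t *d1, size_t n)
{
    int32_t sum = 1;
    for (size_t i = 0; i < n; i++)
        sum += d0[i] * d1[i];
    return sum;
}

/* Model of DOT_PROD: consumes 16 chars from each input, yields 4 ints.
   The lane grouping here is an assumption of this sketch.  */
static v4si dot_prod(const int8_t *a, const int8_t *b, v4si acc)
{
    for (int l = 0; l < LANES; l++)
        for (int k = 0; k < PER_LANE; k++)
            acc.lane[l] += a[l * PER_LANE + k] * b[l * PER_LANE + k];
    return acc;
}

/* Refitted reduction: NCOPIES accumulators, but only the first one is
   fed by the lane-reducing operation; the rest are pass-through copies.  */
int32_t refit_dot(const int8_t *d0, const int8_t *d1, size_t n)
{
    assert(n % VF == 0);        /* the sketch ignores any epilogue loop */

    /* sum_v[0] = { 0, 0, 0, 1 }; the other accumulators are all zero.  */
    v4si sum_v[NCOPIES] = { { { 0, 0, 0, 1 } } };

    for (size_t i = 0; i < n; i += VF) {
        sum_v[0] = dot_prod(d0 + i, d1 + i, sum_v[0]);
        for (int c = 1; c < NCOPIES; c++) {
            v4si copy = sum_v[c];   /* pass-through copy */
            sum_v[c] = copy;
        }
    }

    /* Epilogue: add all accumulators, then reduce across lanes.  */
    int32_t sum = 0;
    for (int c = 0; c < NCOPIES; c++)
        for (int l = 0; l < LANES; l++)
            sum += sum_v[c].lane[l];
    return sum;
}
```

For any input whose length is a multiple of 16, refit_dot (d0, d1, n) should equal scalar_dot (d0, d1, n).  The accumulators sum_v[1] to sum_v[3] stay zero throughout, which is why the copies and their additions in the epilogue are superfluous and can be removed by later optimization.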