author | Hanhan Wang <hanchung@google.com> | 2023-04-20 14:46:40 -0700 |
---|---|---|
committer | Hanhan Wang <hanchung@google.com> | 2023-04-23 11:05:41 -0700 |
commit | 8d163e5045073a5ac570225cc8e14cc9f6d72f09 (patch) | |
tree | 486a1b2e5637ae2cf12dd1e38b38efe598aaf903 /clang/lib/Frontend/CompilerInvocation.cpp | |
parent | 1746c7838ee05c3b293e3866ea35038b13e090f7 (diff) | |
[mlir][Vector] Add 16x16 strategy to vector.transpose lowering.
It adds a `shuffle_16x16` strategy to LowerVectorTranspose and renames `shuffle` to `shuffle_1d`. The idea is similar to the 8x8 cases in x86Vector::avx2. The general algorithm is:
```
interleave 32-bit lanes using
    8x _mm512_unpacklo_epi32
    8x _mm512_unpackhi_epi32
interleave 64-bit lanes using
    8x _mm512_unpacklo_epi64
    8x _mm512_unpackhi_epi64
permute 128-bit lanes using
    16x _mm512_shuffle_i32x4
permute 256-bit lanes, again using
    16x _mm512_shuffle_i32x4
```
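The four stages above can be sketched in plain Python to check that they really produce a transpose. This is an emulation on lists, not the MLIR lowering or real intrinsics. The helper names (`unpack32`, `unpack64`, `shuffle_i32x4`, `transpose_16x16`), the choice of which rows are paired in each stage, and the immediates `0x88` / `0xDD` are assumptions made for this sketch; the commit message does not spell them out.

```python
# Emulation of the four-stage 16x16 transpose on plain Python lists.
# Each "register" is a list of 16 values (16 x 32-bit elements = 512 bits),
# split into four 128-bit lanes of 4 elements each.
# Row pairing and the 0x88 / 0xDD immediates are assumptions of this sketch.

def unpack32(a, b, hi):
    """Emulate _mm512_unpacklo_epi32 (hi=False) / _mm512_unpackhi_epi32 (hi=True)."""
    off = 2 if hi else 0
    out = []
    for lane in range(0, 16, 4):
        out += [a[lane + off], b[lane + off], a[lane + off + 1], b[lane + off + 1]]
    return out

def unpack64(a, b, hi):
    """Emulate _mm512_unpacklo_epi64 (hi=False) / _mm512_unpackhi_epi64 (hi=True)."""
    off = 2 if hi else 0
    out = []
    for lane in range(0, 16, 4):
        out += [a[lane + off], a[lane + off + 1], b[lane + off], b[lane + off + 1]]
    return out

def shuffle_i32x4(a, b, imm):
    """Emulate _mm512_shuffle_i32x4: two 128-bit lanes from a, then two from b."""
    out = []
    for src, shift in ((a, 0), (a, 2), (b, 4), (b, 6)):
        sel = (imm >> shift) & 3
        out += src[4 * sel:4 * sel + 4]
    return out

def transpose_16x16(rows):
    # Stage 1: interleave 32-bit elements of adjacent row pairs (8 lo + 8 hi).
    t = []
    for i in range(0, 16, 2):
        t.append(unpack32(rows[i], rows[i + 1], hi=False))
        t.append(unpack32(rows[i], rows[i + 1], hi=True))
    # Stage 2: interleave 64-bit pairs within each group of four rows (8 lo + 8 hi).
    u = []
    for g in range(0, 16, 4):
        u.append(unpack64(t[g], t[g + 2], hi=False))
        u.append(unpack64(t[g], t[g + 2], hi=True))
        u.append(unpack64(t[g + 1], t[g + 3], hi=False))
        u.append(unpack64(t[g + 1], t[g + 3], hi=True))
    # Stage 3: permute 128-bit lanes within each group of eight rows (16 shuffles).
    v = [None] * 16
    for h in (0, 8):
        for k in range(4):
            v[h + k] = shuffle_i32x4(u[h + k], u[h + 4 + k], 0x88)      # lanes 0 and 2
            v[h + 4 + k] = shuffle_i32x4(u[h + k], u[h + 4 + k], 0xDD)  # lanes 1 and 3
    # Stage 4: same shuffle across the two halves of the matrix (16 shuffles).
    w = [None] * 16
    for c in range(8):
        w[c] = shuffle_i32x4(v[c], v[8 + c], 0x88)
        w[8 + c] = shuffle_i32x4(v[c], v[8 + c], 0xDD)
    return w

# Input rows numbered by row-major index: row i holds 16*i .. 16*i + 15.
rows = [[16 * i + j for j in range(16)] for i in range(16)]
result = transpose_16x16(rows)
```

With this pairing, `result[r]` holds column `r` of the input, so the emulation can be compared directly against `zip(*rows)`. Returning `t`, `u`, and `v` as well would expose the intermediate layouts after each stage.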
After the first stage, the rows hold the following elements (numbered by their row-major index in the 16x16 input):
```
0 16 1 17 4 20 5 21 8 24 9 25 12 28 13 29
2 18 3 19 6 22 7 23 10 26 11 27 14 30 15 31
32 48 33 49 ...
34 50 35 51 ...
64 80 65 81 ...
...
```
After the second stage, the rows hold
```
0 16 32 48 ...
1 17 33 49 ...
2 18 34 50 ...
3 19 35 51 ...
64 80 96 112 ...
65 81 97 113 ...
66 82 98 114 ...
67 83 99 115 ...
...
```
After the third stage, the rows hold
```
0 16 32 48 8 24 40 56 64 80 96 112 ...
1 17 33 49 ...
2 18 34 50 ...
3 19 35 51 ...
4 20 36 52 ...
5 21 37 53 ...
6 22 38 54 ...
7 23 39 55 ...
128 144 160 176 ...
129 145 161 177 ...
...
```
After the last stage, the matrix is fully transposed:
```
0 16 32 48 64 80 96 112 ... 240
1 17 33 49 65 81 97 113 ... 241
2 18 34 50 66 82 98 114 ... 242
...
15 31 47 63 79 95 111 127 ... 255
```
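As a quick cross-check of the final layout, each output row `r` should contain the elements `r, r + 16, ..., r + 240`, i.e. column `r` of the row-major input. A minimal sketch, where the helper name `transposed_row` is hypothetical:

```python
def transposed_row(r, n=16):
    """Row r of the transposed n x n matrix, using row-major input indices."""
    return [n * c + r for c in range(n)]

# Rows 0, 1, 2 and 15 of the fully transposed 16x16 matrix.
final_rows = {r: transposed_row(r) for r in (0, 1, 2, 15)}
```

Comparing a hand-written listing against this closed form is an easy way to catch off-by-one slips in individual entries.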
Reviewed By: dcaballe
Differential Revision: https://reviews.llvm.org/D148685