path: root/llvm/lib/CodeGen/MachineCopyPropagation.cpp
author: David Green <david.green@arm.com> 2019-09-09 16:35:49 +0000
committer: David Green <david.green@arm.com> 2019-09-09 16:35:49 +0000
commit: 2b7089949eda508203eb23c835d6a295eb00b46b
tree: 908c30aa1eea566a45985615c326dbe1d3e3fde6 /llvm/lib/CodeGen/MachineCopyPropagation.cpp
parent: 63e6d8db1cbfe75142669c55819c655c600f00a5
[ARM] Fix loads and stores for predicate vectors
These predicate vectors can usually be loaded and stored with a single instruction, a VSTR_P0. However, this instruction will store the entire P0 predicate, 16 bits, zero-extended to 32 bits, with each lane of the v4i1/v8i1/v16i1 represented by 4/2/1 bits.

As far as I understand, when llvm says "store this v4i1", it really does need to store 4 bits (or 8, that being the size of a byte, with the bottom 4 as the interesting bits). For example, a bitcast from a v8i1 to an i8 is defined as a store followed by a load, which is how the code is expanded.

So this instead lowers the v4i1/v8i1 load/store through some shuffles to get the bits into the correct positions. This, as you might imagine, is not as efficient as a single instruction, but I believe it is needed for correctness. v16i1 equally should not load/store 32 bits, only storing the 16 bits of data.

Stack loads/stores are still using the VSTR_P0 (as can be seen by the test not changing). This is fine as they are self-consistent; it is only "externally observable loads/stores" (from our point of view) that need to be corrected.

Differential revision: https://reviews.llvm.org/D67085
llvm-svn: 371419
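
The difference between the two layouts described in the message can be modelled with plain integer arithmetic. The sketch below is only an illustration, not the lowering added by this commit (which goes through vector shuffles in the ARM backend). The helper names `compactPredicate` and `expandPredicate` are hypothetical, and the sketch assumes that lane i of the vector maps to bit i of the compact in-memory value and that all P0 bits belonging to one lane agree.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative model only: these helpers are hypothetical and not part of LLVM.
// P0 holds 16 predicate bits. For a vector with `lanes` lanes (4, 8 or 16),
// each lane owns 16 / lanes adjacent bits of P0 (4, 2 or 1 bits).

// Pack one bit per lane into the low `lanes` bits of the result.
// Assumption: lane i lands in bit i of the compact form.
uint16_t compactPredicate(uint16_t p0, unsigned lanes) {
  assert(lanes == 4 || lanes == 8 || lanes == 16);
  const unsigned bitsPerLane = 16 / lanes;
  uint16_t packed = 0;
  for (unsigned lane = 0; lane < lanes; ++lane) {
    // Sample the lowest bit of the lane's group; the other bits are assumed equal.
    const unsigned bit = (p0 >> (lane * bitsPerLane)) & 1u;
    packed |= static_cast<uint16_t>(bit << lane);
  }
  return packed;
}

// Inverse direction: replicate each compact bit across its lane's group in P0.
uint16_t expandPredicate(uint16_t packed, unsigned lanes) {
  assert(lanes == 4 || lanes == 8 || lanes == 16);
  const unsigned bitsPerLane = 16 / lanes;
  const unsigned laneMask = (1u << bitsPerLane) - 1u;
  uint16_t p0 = 0;
  for (unsigned lane = 0; lane < lanes; ++lane) {
    if ((packed >> lane) & 1u)
      p0 |= static_cast<uint16_t>(laneMask << (lane * bitsPerLane));
  }
  return p0;
}
```

Under these assumptions, a v4i1 whose lanes 1 and 3 are set occupies P0 as 0xF0F0 but compacts to the 4-bit value 0b1010, which is why storing the raw P0 contents would not match what a "store this v4i1" is expected to write.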
Diffstat (limited to 'llvm/lib/CodeGen/MachineCopyPropagation.cpp')
0 files changed, 0 insertions, 0 deletions