[GlobalISel] Combine (a[0]) | (a[1] << k1) | ...| (a[m] << kn) into a wide load

This is a restricted version of the combine in `DAGCombiner::MatchLoadCombine`. (See D27861)

This tries to recognize patterns like the ones below (assuming a little-endian target):

```
s8* a = ...
s32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
->
s32 val = *((s32*)a)

s8* a = ...
s32 val = a[3] | (a[2] << 8) | (a[1] << 16) | (a[0] << 24)
->
s32 val = BSWAP(*((s32*)a))
```

(This patch also handles the big-endian target case, in which the first example above has a BSWAP and the second example does not.)

To recognize the pattern, this searches from the last G_OR in the expression tree. E.g.

```
    Reg   Reg
     \    /
      OR_1   Reg
       \    /
        OR_2
          \     Reg
           .. /
          Root
```

Each non-OR register in the tree is put in a list. Each register in the list is then checked to see if it is an appropriate load + shift. If every register is a load, potentially followed by a shift, the combine checks whether those loads + shifts, when OR'd together, are equivalent to a wide load (possibly with a BSWAP).

To simplify things, this patch

(1) Only handles G_ZEXTLOADs (which appear to be the common case)
(2) Only works in a single MachineBasicBlock
(3) Only handles G_SHL as the bit twiddling to stick the small load into a specific location

An IR example of this is here: https://godbolt.org/z/4sP9Pj (lifted from test/CodeGen/AArch64/load-combine.ll)

At -Os on AArch64, this is a 0.5% code size improvement for CTMark/sqlite3, and a 0.4% improvement for CTMark/7zip-benchmark.

Also fix a bug in `isPredecessor` which caused it to fail whenever `DefMI` was the first instruction in the block.

Differential Revision: https://reviews.llvm.org/D94350
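The final check described above (do the collected loads + shifts spell out one wide load, and is a BSWAP needed?) can be sketched in isolation. This is a minimal illustration, not the code from the patch: `ShiftedLoad`, `WideLoadKind`, `classifyLeaves`, and `needsByteSwap` are hypothetical names, the sketch assumes the narrow element indices start at 0 (no base offset), and it needs C++17 for `std::optional`.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for one leaf of the OR tree: a zero-extended
// narrow load of a[Index], shifted left by Shift bits.
struct ShiftedLoad {
  unsigned Index; // which narrow element is loaded
  unsigned Shift; // shift amount in bits (0 when there is no shift)
};

// Byte order spelled out by the OR'd leaves.
enum class WideLoadKind { LittleEndian, BigEndian };

// Returns the byte order that the leaves describe, or std::nullopt when
// they do not form a single contiguous wide value.
std::optional<WideLoadKind>
classifyLeaves(const std::vector<ShiftedLoad> &Leaves, unsigned NarrowBits) {
  assert(NarrowBits > 0 && "narrow element width must be nonzero");
  const unsigned N = static_cast<unsigned>(Leaves.size());
  if (N < 2)
    return std::nullopt;

  // PosToIndex[P] is the element index whose bits land at position P
  // (that is, at bit offset P * NarrowBits) in the wide value.
  std::vector<int> PosToIndex(N, -1);
  for (const ShiftedLoad &L : Leaves) {
    if (L.Shift % NarrowBits != 0)
      return std::nullopt; // not aligned to an element boundary
    const unsigned Pos = L.Shift / NarrowBits;
    if (Pos >= N || L.Index >= N || PosToIndex[Pos] != -1)
      return std::nullopt; // out of range, or two leaves overlap
    PosToIndex[Pos] = static_cast<int>(L.Index);
  }

  // N leaves with N distinct positions means every position is filled.
  bool Little = true, Big = true;
  for (unsigned P = 0; P < N; ++P) {
    Little = Little && PosToIndex[P] == static_cast<int>(P);
    Big = Big && PosToIndex[P] == static_cast<int>(N - 1 - P);
  }
  if (Little)
    return WideLoadKind::LittleEndian;
  if (Big)
    return WideLoadKind::BigEndian;
  return std::nullopt;
}

// A byte swap is needed exactly when the pattern's byte order differs
// from the target's byte order.
bool needsByteSwap(WideLoadKind Kind, bool TargetIsLittleEndian) {
  return (Kind == WideLoadKind::LittleEndian) != TargetIsLittleEndian;
}
```

For the first example in the message, the leaves are `{0, 0}, {1, 8}, {2, 16}, {3, 24}` with `NarrowBits == 8`; the sketch classifies them as little-endian, so a little-endian target needs no swap and a big-endian target needs one. Reversing the indices gives the second example and the opposite answer.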
author: Jessica Paquette <jpaquette@apple.com> 2020-12-18 12:56:14 -0800
committer: Jessica Paquette <jpaquette@apple.com> 2021-01-19 10:24:27 -0800
commit: cfc60730179042a93cb9cb338982e71d20707a24 (patch)
tree: 8cf5babe4bba4844440dfe89c7772200eb5d0a66 /llvm/lib/CodeGen/TargetLoweringBase.cpp
parent: 9c6a00fe99c4bbe329dd1933515f1a1a430fd5d7 (diff)
download: llvm-cfc60730179042a93cb9cb338982e71d20707a24.zip
llvm-cfc60730179042a93cb9cb338982e71d20707a24.tar.gz
llvm-cfc60730179042a93cb9cb338982e71d20707a24.tar.bz2
1 files changed, 8 insertions, 0 deletions
diff --git a/llvm/lib/CodeGen/TargetLoweringBase.cpp b/llvm/lib/CodeGen/TargetLoweringBase.cpp
index 6cacb10..f639f72 100644
--- a/llvm/lib/CodeGen/TargetLoweringBase.cpp
+++ b/llvm/lib/CodeGen/TargetLoweringBase.cpp
@@ -1756,6 +1756,14 @@ bool TargetLoweringBase::allowsMemoryAccess(LLVMContext &Context,
                             MMO.getFlags(), Fast);
 }
 
+bool TargetLoweringBase::allowsMemoryAccess(LLVMContext &Context,
+                                            const DataLayout &DL, LLT Ty,
+                                            const MachineMemOperand &MMO,
+                                            bool *Fast) const {
+  return allowsMemoryAccess(Context, DL, getMVTForLLT(Ty), MMO.getAddrSpace(),
+                            MMO.getAlign(), MMO.getFlags(), Fast);
+}
+
 BranchProbability TargetLoweringBase::getPredictableBranchThreshold() const {
   return BranchProbability(MinPercentageForPredictableBranch, 100);
 }