path: root/llvm/lib/Transforms/Utils/LoopUnroll.cpp
author: Bruno Cardoso Lopes <bruno.cardoso@gmail.com> 2014-12-22 19:45:43 +0000
committer: Bruno Cardoso Lopes <bruno.cardoso@gmail.com> 2014-12-22 19:45:43 +0000
commit: 811c173523bf585fa55270a93d262a7ad8e82995
tree: d95aa2a3b9e1d6a94eef509ee42d9f7949376163 /llvm/lib/Transforms/Utils/LoopUnroll.cpp
parent: 54c75900738a09216e9e399ff70c9ad56d60aef0
[x86] Add vector @llvm.ctpop intrinsic custom lowering

Currently, when ctpop is supported for scalar types, the expansion of
@llvm.ctpop.vXiY uses vector element extractions, insertions and
individual calls to @llvm.ctpop.iY. When it is not supported, the scalar
calls are expanded with bit-math operations.

Local Haswell measurements show that we can improve vector
@llvm.ctpop.vXiY expansion in some cases by using a vector parallel
bit twiddling approach, based on:

    v = v - ((v >> 1) & 0x55555555);
    v = (v & 0x33333333) + ((v >> 2) & 0x33333333);
    v = (v + (v >> 4)) & 0xF0F0F0F;
    v = v + (v >> 8);
    v = v + (v >> 16);
    v = v & 0x0000003F;

(from http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel)

When scalar ctpop isn't supported, the approach above performs better
for v2i64, v4i32, v4i64 and v8i32 (see numbers below). And even when
scalar ctpop is supported, this approach performs ~2x better for v8i32.

Here, x86_64 implies -march=corei7-avx without ctpop and x86_64h
includes ctpop support with -march=core-avx2.

== [x86_64h - new]
v8i32: 0.661685
v4i32: 0.514678
v4i64: 0.652009
v2i64: 0.324289

== [x86_64h - old]
v8i32: 1.29578
v4i32: 0.528807
v4i64: 0.65981
v2i64: 0.330707

== [x86_64 - new]
v8i32: 1.003
v4i32: 0.656273
v4i64: 1.11711
v2i64: 0.754064

== [x86_64 - old]
v8i32: 2.34886
v4i32: 1.72053
v4i64: 1.41086
v2i64: 1.0244

More work for other vector types will come next.

llvm-svn: 224725
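
For illustration only, the bit twiddling sequence quoted in the commit message can be written out as a small standalone C++ sketch. This is not the lowering code added by the commit. The names `popcount32_parallel`, `popcount32_reference`, `V4I32`, `popcount_v4i32` and `self_check` are hypothetical, and the four-element struct merely stands in for a v4i32 value, with a plain loop applying the same steps to each lane.

```cpp
#include <cassert>
#include <cstdint>

// Parallel bit count of one 32-bit lane, following the steps quoted in the
// commit message (with the parentheses of the third step balanced).
inline uint32_t popcount32_parallel(uint32_t v) {
  v = v - ((v >> 1) & 0x55555555u);                 // counts per 2-bit field
  v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u); // counts per 4-bit field
  v = (v + (v >> 4)) & 0x0F0F0F0Fu;                 // counts per byte
  v = v + (v >> 8);                                 // fold bytes together
  v = v + (v >> 16);                                // total lands in low byte
  return v & 0x0000003Fu;                           // keep the 0..32 result
}

// Straightforward reference: clear the lowest set bit until none remain.
inline uint32_t popcount32_reference(uint32_t v) {
  uint32_t n = 0;
  for (; v != 0; v &= v - 1)
    ++n;
  return n;
}

// Hypothetical stand-in for a v4i32 vector; real lowering would operate on
// vector registers, not on a struct.
struct V4I32 {
  uint32_t lane[4];
};

// Applies the same sequence to every lane. Each step uses only shifts, masks,
// adds and subtracts, which is what lets the whole vector be processed at once.
inline V4I32 popcount_v4i32(V4I32 x) {
  V4I32 r;
  for (int i = 0; i < 4; ++i)
    r.lane[i] = popcount32_parallel(x.lane[i]);
  return r;
}

// Compares the parallel version against the reference on a few values.
inline void self_check() {
  const uint32_t samples[] = {0u, 1u, 0xFFu, 0x80000001u, 0xDEADBEEFu,
                              0xFFFFFFFFu};
  for (uint32_t s : samples)
    assert(popcount32_parallel(s) == popcount32_reference(s));
}
```

The sketch avoids any per-element call to a scalar population count, which mirrors the motivation given in the message: no element extraction or insertion is needed when every step is an ordinary lane-wise operation.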