aboutsummaryrefslogtreecommitdiff
path: root/llvm/lib/Transforms/Utils/LoopUtils.cpp
diff options
context:
space:
mode:
authorAlexey Bataev <5361294+alexey-bataev@users.noreply.github.com>2024-03-08 13:57:02 -0500
committerAlexey Bataev <a.bataev@outlook.com>2024-03-14 06:23:14 -0700
commit7f2167868d8c1cedd3915883412b9c787a2f01db (patch)
tree6ae488cc7043c76e7e242a427bdccfff139af88c /llvm/lib/Transforms/Utils/LoopUtils.cpp
parent21d80859df3fb416efac13ce8178fdf6d6489292 (diff)
downloadllvm-7f2167868d8c1cedd3915883412b9c787a2f01db.zip
llvm-7f2167868d8c1cedd3915883412b9c787a2f01db.tar.gz
llvm-7f2167868d8c1cedd3915883412b9c787a2f01db.tar.bz2
[SLP]Improve minbitwidth analysis.
This improves overall analysis for minbitwidth in SLP. It allows to analyze the trees with store/insertelement root nodes. Also, instead of using single minbitwidth, detected from the very first analysis stage, it tries to detect the best one for each trunc/ext subtree in the graph and use it for the subtree. Results in better code and less vector register pressure. Metric: size..text Program size..text results results0 diff test-suite :: SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant.test 92549.00 92609.00 0.1% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 663381.00 663493.00 0.0% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 663381.00 663493.00 0.0% test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 307182.00 307214.00 0.0% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1394420.00 1394484.00 0.0% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1394420.00 1394484.00 0.0% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 2040257.00 2040273.00 0.0% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12396098.00 12395858.00 -0.0% test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test 909944.00 909768.00 -0.0% SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant - 4 scalar instructions remain scalar (good). Spec2017/x264 - the whole function idct4x4dc is vectorized using <16 x i16> instead of <16 x i32>, also zext/trunc are removed. In other places last vector zext/sext removed and replaced by extractelement + scalar zext/sext pair. MultiSource/Benchmarks/Bullet/bullet - reduce or <4 x i32> replaced by reduce or <4 x i8> Spec2017/imagick - Removed extra zext from 2 packs of the operations. Spec2017/parest - Removed extra zext, replaced by extractelement+scalar zext Spec2017/blender - the whole bunch of vector zext/sext replaced by extractelement+scalar zext/sext, some extra code vectorized in smaller types. Spec2006/gobmk - fixed cost estimation, some small code remains scalar. Original Pull Request: https://github.com/llvm/llvm-project/pull/84334 The patch has the same functionality (no test changes, no changes in benchmarks) as the original patch, just has some compile time improvements + fixes for xxhash unittest, discovered earlier in the previous version of the patch. Reviewers: Pull Request: https://github.com/llvm/llvm-project/pull/84536
Diffstat (limited to 'llvm/lib/Transforms/Utils/LoopUtils.cpp')
0 files changed, 0 insertions, 0 deletions