author Alexey Bataev <a.bataev@outlook.com> 2024-09-17 06:57:47 -0400
committer Alexey Bataev <a.bataev@outlook.com> 2024-09-21 15:41:06 -0700
commit 1833d418a04123916c1dbeb0c41c8bc7d06b779b
tree c2ac87d48f7df944f58a20243a284576bf14f96d /llvm/lib/ExecutionEngine/Orc/ObjectLinkingLayer.cpp
parent e588fd994fe8ce0fa7804284f2a2a9a6922980fd
[SLP]Vectorize gathered loads
Final gather/buildvector nodes may contain scalar loads that are not
vectorized (because they are part of the gather nodes) but that, when
combined, may form full vector loads. This patch walks over all gather
nodes, "gathering" and sorting the gathered scalar loads, and then tries
to build vector loads, which are later reshuffled between the gather
nodes. This makes it possible to add support for segmented loads later
(an AOS-to-SOA kind of load for RISC-V RVV) and may help with removing the
alternate opcodes support. Currently, alternate nodes may depend on each
other because of the consecutive loads between their operands, so
alternate vectorization cannot simply be removed. This approach may help
to remove most of that code, since we'll be able to vectorize loads in
between lanes.

Metric: size..text, AVX512

Program  results  results0  diff
test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test  238381.00  250669.00  5.2%
test-suite :: SingleSource/UnitTests/Vectorizer/VPlanNativePath/outer-loop-vect.test  25753.00  26329.00  2.2%
test-suite :: SingleSource/UnitTests/Vector/AVX512BWVL/Vector-AVX512BWVL-psadbw.test  3028.00  3092.00  2.1%
test-suite :: MultiSource/Benchmarks/Rodinia/hotspot/hotspot.test  4243.00  4275.00  0.8%
test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test  649765.00  653877.00  0.6%
test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test  649765.00  653877.00  0.6%
test-suite :: SingleSource/Benchmarks/BenchmarkGame/n-body.test  4199.00  4222.00  0.5%
test-suite :: SingleSource/UnitTests/Vector/AVX512BWVL/Vector-AVX512BWVL-mask_set_bw.test  12933.00  12997.00  0.5%
test-suite :: SingleSource/Benchmarks/Misc/flops.test  8282.00  8314.00  0.4%
test-suite :: SingleSource/UnitTests/Vector/AVX512BWVL/Vector-AVX512BWVL-unpack_msasm.test  10065.00  10097.00  0.3%
test-suite :: SingleSource/Benchmarks/Misc-C++/Large/ray.test  5160.00  5176.00  0.3%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test  12472220.00  12509612.00  0.3%
test-suite :: MultiSource/Benchmarks/Prolangs-C++/city/city.test  6908.00  6924.00  0.2%
test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test  202830.00  203278.00  0.2%
test-suite :: SingleSource/Benchmarks/CoyoteBench/fftbench.test  9133.00  9149.00  0.2%
test-suite :: MultiSource/Benchmarks/Olden/power/power.test  6792.00  6803.00  0.2%
test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  1395585.00  1397473.00  0.1%
test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  1395585.00  1397473.00  0.1%
test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test  97662.00  97758.00  0.1%
test-suite :: External/SPEC/CFP2006/447.dealII/447.dealII.test  595179.00  595739.00  0.1%
test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniAMR/miniAMR.test  70603.00  70667.00  0.1%
test-suite :: MultiSource/Benchmarks/Prolangs-C/unix-smail/unix-smail.test  19877.00  19893.00  0.1%
test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/PENNANT/PENNANT.test  90231.00  90279.00  0.1%
test-suite :: External/SPEC/CINT2006/473.astar/473.astar.test  33738.00  33754.00  0.0%
test-suite :: External/SPEC/CFP2017speed/619.lbm_s/619.lbm_s.test  13262.00  13268.00  0.0%
test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test  1139964.00  1140460.00  0.0%
test-suite :: MultiSource/Applications/JM/lencod/lencod.test  849507.00  849875.00  0.0%
test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  1158379.00  1158859.00  0.0%
test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CoMD.test  38724.00  38740.00  0.0%
test-suite :: External/SPEC/CFP2006/470.lbm/470.lbm.test  15180.00  15186.00  0.0%
test-suite :: External/SPEC/CFP2017rate/519.lbm_r/519.lbm_r.test  15484.00  15490.00  0.0%
test-suite :: External/SPEC/CINT2006/456.hmmer/456.hmmer.test  167391.00  167455.00  0.0%
test-suite :: MultiSource/Benchmarks/TSVC/ControlFlow-dbl/ControlFlow-dbl.test  137448.00  137496.00  0.0%
test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test  2030254.00  2030766.00  0.0%
test-suite :: MicroBenchmarks/LCALS/SubsetALambdaLoops/lcalsALambda.test  302870.00  302934.00  0.0%
test-suite :: MicroBenchmarks/LCALS/SubsetARawLoops/lcalsARaw.test  303126.00  303190.00  0.0%
test-suite :: External/SPEC/CFP2006/444.namd/444.namd.test  241107.00  241155.00  0.0%
test-suite :: External/SPEC/CFP2006/482.sphinx3/482.sphinx3.test  162974.00  163006.00  0.0%
test-suite :: MultiSource/Applications/siod/siod.test  167168.00  167200.00  0.0%
test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  1048796.00  1048988.00  0.0%
test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/CLAMR.test  201623.00  201655.00  0.0%
test-suite :: MultiSource/Applications/sqlite3/sqlite3.test  501734.00  501798.00  0.0%
test-suite :: MultiSource/Applications/ClamAV/clamscan.test  580888.00  580952.00  0.0%
test-suite :: MultiSource/Benchmarks/MallocBench/gs/gs.test  168319.00  168335.00  0.0%
test-suite :: MicroBenchmarks/ImageProcessing/Interpolation/Interpolation.test  226022.00  226038.00  0.0%
test-suite :: MultiSource/Benchmarks/TSVC/StatementReordering-flt/StatementReordering-flt.test  118011.00  118015.00  0.0%
test-suite :: External/SPEC/CINT2006/471.omnetpp/471.omnetpp.test  550589.00  550605.00  0.0%
test-suite :: External/SPEC/CINT2006/403.gcc/403.gcc.test  3072477.00  3072541.00  0.0%
test-suite :: External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test  2385563.00  2385579.00  0.0%
test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test  389171.00  389155.00  -0.0%
test-suite :: MultiSource/Applications/lua/lua.test  234764.00  234748.00  -0.0%
test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test  227694.00  227678.00  -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/NodeSplitting-flt/NodeSplitting-flt.test  119819.00  119807.00  -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/Recurrences-flt/Recurrences-flt.test  117995.00  117983.00  -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/InductionVariable-flt/InductionVariable-flt.test  123610.00  123594.00  -0.0%
test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test  81414.00  81398.00  -0.0%
test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test  782040.00  781880.00  -0.0%
test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test  9597420.00  9595292.00  -0.0%
test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test  9597420.00  9595292.00  -0.0%
test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test  911832.00  911608.00  -0.0%
test-suite :: MultiSource/Applications/oggenc/oggenc.test  192507.00  192459.00  -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/LoopRestructuring-flt/LoopRestructuring-flt.test  122843.00  122811.00  -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/CrossingThresholds-flt/CrossingThresholds-flt.test  122292.00  122260.00  -0.0%
test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test  777363.00  777155.00  -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/Expansion-flt/Expansion-flt.test  123265.00  123205.00  -0.0%
test-suite :: MultiSource/Benchmarks/Bullet/bullet.test  315534.00  315358.00  -0.1%
test-suite :: MultiSource/Benchmarks/TSVC/ControlFlow-flt/ControlFlow-flt.test  128163.00  128083.00  -0.1%
test-suite :: MultiSource/Benchmarks/mediabench/g721/g721encode/encode.test  6562.00  6555.00  -0.1%
test-suite :: MultiSource/Benchmarks/Prolangs-C/compiler/compiler.test  23428.00  23396.00  -0.1%
test-suite :: MultiSource/Benchmarks/FreeBench/fourinarow/fourinarow.test  22749.00  22717.00  -0.1%
test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test  39549.00  39485.00  -0.2%
test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test  39546.00  39482.00  -0.2%
test-suite :: MultiSource/Benchmarks/Prolangs-C/bison/mybison.test  57214.00  57118.00  -0.2%
test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test  413668.00  412804.00  -0.2%
test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test  1044047.00  1041487.00  -0.2%
test-suite :: MultiSource/Benchmarks/McCat/18-imp/imp.test  12414.00  12382.00  -0.3%
test-suite :: MultiSource/Benchmarks/Prolangs-C/gnugo/gnugo.test  31161.00  30969.00  -0.6%
test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test  224726.00  223254.00  -0.7%
test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test  93512.00  92824.00  -0.7%
test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test  281151.00  278463.00  -1.0%
test-suite :: MultiSource/Benchmarks/Olden/tsp/tsp.test  2820.00  2788.00  -1.1%
test-suite :: External/SPEC/CFP2006/433.milc/433.milc.test  156819.00  154739.00  -1.3%
test-suite :: MultiSource/Benchmarks/MiBench/security-blowfish/security-blowfish.test  11560.00  11160.00  -3.5%
test-suite :: MultiSource/Benchmarks/McCat/08-main/main.test  6734.00  6382.00  -5.2%

ASCI_Purple/SMG2000 - extra vector code
VPlanNativePath/outer-loop-vect - extra vectorization, better vector code
AVX512BWVL/Vector-AVX512BWVL-psadbw - better vector code
Rodinia/hotspot - small variations
CINT2017speed/625.x264_s, CINT2017rate/525.x264_r - extra vector code, better vectorization
BenchmarkGame/n-body - better vector code.
AVX512BWVL/Vector-AVX512BWVL-unpack_msasm - small variations
Misc/flops - extra vector code
AVX512BWVL/Vector-AVX512BWVL-mask_set_bw - small variations
Misc-C++/Large - better vector code
CFP2017rate/526.blender_r - extra vector code
Prolangs-C++/city - extra vector code
MiBench/consumer-lame - extra vector code
CoyoteBench/fftbench - extra vector code
Olden/power - better vector code
CFP2017rate/538.imagick_r, CFP2017speed/638.imagick_s - extra vector code
CINT2017rate/531.deepsjeng_r - extra vector code
CFP2006/447.dealII - small variations
DOE-ProxyApps-C/miniAMR - small variations
Prolangs-C/unix-smail - small variations
DOE-ProxyApps-C++/PENNANT - small variations
CINT2006/473.astar - small variations
CFP2006/453.povray - small variations
JM/lencod - extra vector code
CFP2017rate/511.povray_r - small variations
DOE-ProxyApps-C/CoMD - small variations
CFP2006/470.lbm - extra vector code
CFP2017speed/619.lbm_s, CFP2017rate/519.lbm_r - extra vector code
CINT2006/456.hmmer - extra code vectorized
TSVC/ControlFlow-dbl - extra vector code
CFP2017rate/510.parest_r - better vector code
LCALS/SubsetALambdaLoops - extra code vectorized
LCALS/SubsetARawLoops - extra code vectorized
CFP2006/444.namd - extra code vectorized
CFP2006/482.sphinx3 - better vector code
Applications/siod - better vector code
Benchmarks/7zip - better vector code
DOE-ProxyApps-C++/CLAMR - extra code vectorized
Applications/sqlite3 - extra code vectorized
Applications/ClamAV - smaller vector code
MallocBench/gs - small variations
MicroBenchmarks/ImageProcessing - small variations
TSVC/StatementReordering-flt - extra code vectorized
CINT2006/471.omnetpp - small variations
CINT2006/403.gcc - extra code vectorized
CINT2006/483.xalancbmk - extra code vectorized
JM/ldecod - small variations
Applications/lua - extra code vectorized
mafft/pairlocalalign - small variations
TSVC/NodeSplitting-flt - extra code vectorized
TSVC/Recurrences-flt - extra code vectorized
TSVC/InductionVariable-flt - extra code vectorized
FreeBench/pifft - small variations
CINT2006/464.h264ref - extra code vectorized
CINT2017speed/602.gcc_s, CINT2017rate/502.gcc_r - some extra code vectorized, extra code inlined
CINT2006/445.gobmk - small variations
Applications/oggenc - small variations
TSVC/LoopRestructuring-flt - extra code vectorized
TSVC/CrossingThresholds-flt - extra code vectorized
CFP2017rate/508.namd_r - small variations
TSVC/ControlFlow-flt - extra code vectorized
mediabench/g721 - small variations
Prolangs-C/compiler - small variations
FreeBench/fourinarow - better vector code
MiBench/telecomm-gsm - small variation in vector code
mediabench/gsm - same
Prolangs-C/bison - small variations
Adobe-C++/loop_unroll - extra code vectorized
Benchmarks/tramp3d-v4 - extra code gets inlined, small changes in vector code
McCat/18-imp - variations in vector code
Prolangs-C/gnugo - variations in vector code
MallocBench/espresso - extra code vectorized
DOE-ProxyApps-C++/miniFE - small variations in vector code
Prolangs-C/TimberWolfMC - extra code vectorized, small changes in previously vectorized code.
Olden/tsp - small changes in vector code
CFP2006/433.milc - extra code gets inlined, vectorized 2 x stores to 4 x stores
MiBench/security-blowfish - extra code vectorized
McCat/08-main - better vector code.

Metric: size..text, RISCV, sifive-p670

Program  results  results0  diff
test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test  63580.00  64020.00  0.7%
test-suite :: MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan.test  21388.00  21406.00  0.1%
test-suite :: MultiSource/Benchmarks/Bullet/bullet.test  296992.00  297088.00  0.0%
test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  968112.00  968208.00  0.0%
test-suite :: MultiSource/Benchmarks/TSVC/StatementReordering-dbl/StatementReordering-dbl.test  45160.00  45164.00  0.0%
test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test  2635902.00  2635854.00  -0.0%
test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test  2635902.00  2635854.00  -0.0%
test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test  7568730.00  7568578.00  -0.0%
test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test  7568730.00  7568578.00  -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/CrossingThresholds-flt/CrossingThresholds-flt.test  49764.00  49762.00  -0.0%
test-suite :: MultiSource/Applications/sqlite3/sqlite3.test  449132.00  449108.00  -0.0%
test-suite :: MultiSource/Applications/JM/lencod/lencod.test  695932.00  695892.00  -0.0%
test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test  508820.00  508788.00  -0.0%
test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test  508820.00  508788.00  -0.0%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test  9594152.00  9593336.00  -0.0%
test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test  166522.00  166490.00  -0.0%
test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test  722252.00  722092.00  -0.0%
test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test  27554.00  27546.00  -0.0%
test-suite :: SingleSource/UnitTests/Vectorizer/VPlanNativePath/outer-loop-vect.test  10900.00  10896.00  -0.0%
test-suite :: MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl.test  46754.00  46732.00  -0.0%
test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test  631570.00  631226.00  -0.1%
test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  850698.00  850218.00  -0.1%
test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test  24816.00  24800.00  -0.1%
test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test  24814.00  24798.00  -0.1%
test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test  1599946.00  1598394.00  -0.1%
test-suite :: MultiSource/Applications/hbd/hbd.test  27236.00  27204.00  -0.1%
test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test  293848.00  293480.00  -0.1%
test-suite :: MultiSource/Benchmarks/Prolangs-C/compiler/compiler.test  20160.00  20048.00  -0.6%
test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test  182088.00  181040.00  -0.6%
test-suite :: MultiSource/Benchmarks/mediabench/g721/g721encode/encode.test  4788.00  4748.00  -0.8%

DOE-ProxyApps-C++/miniFE - extra vector code
MiBench/automotive-susan - small variations
Benchmarks/Bullet - extra vector code
CFP2017rate/511.povray_r - slightly better vector code
TSVC/StatementReordering-dbl - small variations
CINT2017rate/523.xalancbmk_r, CINT2017speed/623.xalancbmk_s - extra vector code
CINT2017rate/502.gcc_r, CINT2017speed/602.gcc_s - extra vector code
TSVC/CrossingThresholds-flt - small variations
Applications/sqlite3 - extra vector code
JM/lencod - extra vector code, small variations
CINT2017rate/525.x264_r, CINT2017speed/625.x264_s - small variations
CFP2017rate/526.blender_r - extra vector code, small variations
DOE-ProxyApps-C/miniGMG - small variations
Vectorizer/VPlanNativePath/outer-loop-vect - small variations
TSVC/CrossingThresholds-dbl - small variations
Benchmarks/tramp3d-v4 - small variations
Benchmarks/7zip - extra vector code
MiBench/telecomm-gsm - small variations
mediabench/gsm/toast - small variations
CFP2017rate/510.parest_r - extra vector code
Applications/hbd - extra vector code
JM/ldecod - better vector code
Prolangs-C/compiler - extra vector code
MallocBench/espresso - extra vector code
mediabench/g721/g721encode - extra vectorization

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/107461
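
The "gather, sort, then try to build vector loads" idea from the commit message can be illustrated with a small standalone sketch. This is not the SLP vectorizer's code: ScalarLoad, VectorLoadCandidate and findVectorLoadCandidates are hypothetical names, loads are modelled as plain (base, offset) pairs instead of LLVM IR, and cost modelling and the later reshuffle between gather nodes are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for a scalar load that sits inside a gather node.
struct ScalarLoad {
  int base;            // id of the base pointer the load reads from
  std::int64_t offset; // element offset from that base
  int gatherNode;      // id of the gather node that holds the load
};

// A run of consecutive loads that could be replaced by one vector load.
struct VectorLoadCandidate {
  int base;
  std::int64_t firstOffset;
  std::vector<int> gatherNodes; // owning gather node of each lane, in lane order
};

// Collect loads from all gather nodes, group them by base pointer, sort each
// group by offset, and report every run of `vectorWidth` consecutive offsets.
std::vector<VectorLoadCandidate>
findVectorLoadCandidates(const std::vector<ScalarLoad> &loads,
                         std::size_t vectorWidth) {
  assert(vectorWidth > 1 && "a vector load needs at least two lanes");

  std::map<int, std::vector<ScalarLoad>> byBase;
  for (const ScalarLoad &load : loads)
    byBase[load.base].push_back(load);

  std::vector<VectorLoadCandidate> candidates;
  for (auto &entry : byBase) {
    std::vector<ScalarLoad> &group = entry.second;
    std::sort(group.begin(), group.end(),
              [](const ScalarLoad &a, const ScalarLoad &b) {
                return a.offset < b.offset;
              });
    // Keep one load per offset so duplicates do not break a run.
    group.erase(std::unique(group.begin(), group.end(),
                            [](const ScalarLoad &a, const ScalarLoad &b) {
                              return a.offset == b.offset;
                            }),
                group.end());

    std::size_t i = 0;
    while (i + vectorWidth <= group.size()) {
      bool consecutive = true;
      for (std::size_t k = 1; k < vectorWidth; ++k) {
        if (group[i + k].offset !=
            group[i].offset + static_cast<std::int64_t>(k)) {
          consecutive = false;
          break;
        }
      }
      if (!consecutive) {
        ++i;
        continue;
      }
      VectorLoadCandidate candidate{entry.first, group[i].offset, {}};
      for (std::size_t k = 0; k < vectorWidth; ++k)
        candidate.gatherNodes.push_back(group[i + k].gatherNode);
      candidates.push_back(candidate);
      i += vectorWidth;
    }
  }
  return candidates;
}
```

In this sketch the gatherNodes list records which gather node each lane came from, which is the information a later shuffle step would need to route lanes of the vector load back to their users.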
Diffstat (limited to 'llvm/lib/ExecutionEngine/Orc/ObjectLinkingLayer.cpp')
0 files changed, 0 insertions, 0 deletions