path: root/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
Age  Commit message  Author  Files  Lines
3 daysReapply "[CodeGen][NPM] Stitch up loop passes in codegen pipeline" (#151098)Vikram Hegde1-1/+8
Reapplies https://github.com/llvm/llvm-project/pull/148114; includes shared library build failure fixes for AMDGPU and X86.
5 days[HIPSTDPAR] Add handling for math builtins (#140158)Alex Voicu1-2/+6
When compiling in `--hipstdpar` mode, the builtins corresponding to the standard library might end up in code that is expected to execute on the accelerator (e.g. by using the `std::` prefixed functions from `<cmath>`). We do not have uniform handling for this in AMDGPU, and the errors that obtain are quite arcane. Furthermore, the user-space changes required to work around this tend to be rather intrusive. This patch adds an additional `--hipstdpar` specific pass which forwards to the run time component of HIPSTDPAR the intrinsics / libcalls which result from the use of the math builtins, and which are not properly handled. In the long run we will want to stop relying on this and handle things in the compiler, but it is going to be a rather lengthy journey, which makes this medium term escape hatch necessary. The paired change in the run time component is here <https://github.com/ROCm/rocThrust/pull/551>.
6 daysRevert "[CodeGen][NPM] Stitch up loop passes in codegen pipeline" (#150883)Vikram Hegde1-8/+1
Reverts llvm/llvm-project#148114; will update with a fixed PR.
6 days[CodeGen][NPM] Stitch up loop passes in codegen pipeline (#148114)Vikram Hegde1-1/+8
Same as https://github.com/llvm/llvm-project/pull/133050 Co-authored-by: Oke, Akshat <Akshat.Oke@amd.com>
2025-07-18AMDGPU: Add pass to replace constant materialize with AV pseudos (#149292)Matt Arsenault1-0/+13
If we have a v_mov_b32 or v_accvgpr_write_b32 with an inline immediate, replace it with a pseudo which writes to the combined AV_* class. This relaxes the operand constraints, which will allow the allocator to inflate the register class to AV_* to potentially avoid spilling. The allocator does not know how to replace an instruction to enable the change of register class. I originally tried to do this by changing all of the places we introduce v_mov_b32 with immediate, but it's a long tail of niche cases that require manual updating. Plus we can restrict this to only run on functions where we know we will be allocating AGPRs.
2025-07-17[AMDGPU][NPM] Fill in addPreSched2 passes (#148112)Vikram Hegde1-0/+6
Same as https://github.com/llvm/llvm-project/pull/139516 Co-authored-by: Oke, Akshat <Akshat.Oke@amd.com>
2025-07-10[AMDGPU][NewPM] Port "AMDGPUResourceUsageAnalysis" to NPM (#130959)Vikram Hegde1-1/+2
2025-07-09[CodeGen][NPM] Differentiate pipeline-required and opt-required passes (#135752)Akshat Oke1-1/+2
"Required" passes relate to actually running the pass on the IR, regardless of whether they are in the pipeline. CGPassBuilder was mistakenly still adding them to the pipeline. The test `llc -stop-after=greedy -enable-new-pm` would still add `greedy` to the pipeline otherwise.
2025-07-09[AMDGPU][NPM] Complete optimized regalloc pipeline (#138491)Akshat Oke1-3/+38
Also fill in some other passes.
2025-07-09[CodeGen][NPM] Support CodeGenSCCOrder in pipeline (#136818)Akshat Oke1-0/+2
Wrap passes into Post order CGSCC pass manager in codegen pass builder. I am adding the pipeline test in this but it is not yet complete.
2025-06-27AMDGPU: Introduce a pass to replace VGPR MFMAs with AGPR (#145024)Matt Arsenault1-0/+3
In gfx90a-gfx950, it's possible to emit MFMAs which use AGPRs or VGPRs for vdst and src2. We do not want to use the AGPR form unless required by register pressure, as it requires cross-bank register copies from most other instructions. Currently we select the AGPR or VGPR version depending on a crude heuristic for whether it's possible AGPRs will be required. We really need the register allocation to be complete to make a good decision, which is what this pass is for. This adds the pass, but does not yet remove the selection patterns for AGPRs. This is a WIP, and NFC-ish. It should be a no-op on any currently selected code. It also does not yet trigger on the real examples of interest, which require handling batches of MFMAs at once.
2025-06-23AMDGPU: Remove legacy pass manager version of AMDGPUAttributor (#145262)Matt Arsenault1-1/+0
2025-06-22AMDGPU: Use reportFatalUsageError for regalloc flag error (#145198)Matt Arsenault1-2/+2
2025-06-21AMDGPU: Really delete AMDGPUAnnotateKernelFeatures (#145136)Matt Arsenault1-3/+0
2025-06-20AMDGPU: Remove legacy pass manager version of AMDGPUUnifyMetadata (#144985)Matt Arsenault1-1/+0
This is only run in the new pass manager now.
2025-06-20AMDGPU: Remove legacy PM version of AMDGPUPromoteAllocaToVector (#144986)Matt Arsenault1-1/+0
This is only run in the middle end with the new pass manager now, so garbage collect the old PM version.
2025-06-17[llvm] annotate interfaces in llvm/Target for DLL export (#143615)Andrew Rogers1-1/+2
## Purpose
This patch is one in a series of code-mods that annotate LLVM’s public interface for export. This patch annotates the `llvm/Target` library. These annotations currently have no meaningful impact on the LLVM build; however, they are a prerequisite to support an LLVM Windows DLL (shared library) build.
## Background
This effort is tracked in #109483. Additional context is provided in [this discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307), and documentation for `LLVM_ABI` and related annotations is found in the LLVM repo [here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).
A subset of these changes was generated automatically using the [Interface Definition Scanner (IDS)](https://github.com/compnerd/ids) tool, followed by formatting with `git clang-format`.
The bulk of this change is manual additions of `LLVM_ABI` to `LLVMInitializeX` functions defined in .cpp files under llvm/lib/Target. Adding `LLVM_ABI` to the function implementation is required here because they do not `#include "llvm/Support/TargetSelect.h"`, which contains the declarations for these functions and was already updated with `LLVM_ABI` in a previous patch. I considered patching these files with `#include "llvm/Support/TargetSelect.h"` instead, but since TargetSelect.h is a large file with a bunch of preprocessor x-macro stuff in it I was concerned it would unnecessarily impact compile times.
In addition, a number of unit tests under llvm/unittests/Target required additional dependencies to make them build correctly against the LLVM DLL on Windows using MSVC.
## Validation
Local builds and tests to validate cross-platform compatibility. This included llvm, clang, and lldb on the following configurations:
- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang
2025-06-05[AMDGPU] Remove duplicated/confusing helpers. NFCI (#142598)Diana Picus1-1/+3
Move canGuaranteeTCO and mayTailCallThisCC into AMDGPUBaseInfo instead of keeping two copies for DAG/Global ISel. Also remove isKernelCC, which doesn't agree with isKernel and doesn't seem very useful. While at it, also move all the CC-related helpers into AMDGPUBaseInfo.h and mark them constexpr.
2025-06-03[MISched] Add templates for creating custom schedulers (#141935)Pengcheng Wang1-1/+1
We rename `createGenericSchedLive` and `createGenericSchedPostRA` to `createSchedLive` and `createSchedPostRA`, and add a template parameter `Strategy` which is the generic implementation by default. This can simplify some code for targets that have custom scheduler strategy.
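The shape of that change can be pictured with a minimal sketch. Everything below (`Context`, `StrategyBase`, `GenericStrategy`, `CustomStrategy`, `Scheduler`, `createSchedLiveSketch`) is a hypothetical stand-in rather than LLVM's actual classes or signatures; the only thing taken from the commit message is the idea of a `Strategy` template parameter that defaults to the generic implementation.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins for illustration only; these are not LLVM types.
struct Context {};

struct StrategyBase {
  virtual ~StrategyBase() = default;
  virtual std::string name() const = 0;
};

struct GenericStrategy : StrategyBase {
  explicit GenericStrategy(const Context &) {}
  std::string name() const override { return "generic"; }
};

struct CustomStrategy : StrategyBase {
  explicit CustomStrategy(const Context &) {}
  std::string name() const override { return "custom"; }
};

struct Scheduler {
  std::unique_ptr<StrategyBase> Strategy;
};

// A single factory template: callers that name no strategy get the generic
// one, and a target with its own strategy passes it as a template argument.
template <typename Strategy = GenericStrategy>
Scheduler createSchedLiveSketch(const Context &C) {
  Scheduler S{std::make_unique<Strategy>(C)};
  assert(S.Strategy && "factory always installs a strategy");
  return S;
}
```

With this shape, `createSchedLiveSketch(C)` and `createSchedLiveSketch<CustomStrategy>(C)` share one code path, which is presumably the simplification the commit has in mind for targets with a custom scheduler strategy.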
2025-05-30Reapply "Reapply "[AMDGPU] Make `getAssumedAddrSpace` return AS1 for pointer kernel arguments (#137488)""Shilei Tian1-0/+4
This reverts commit 37ea3b32cdcb6c0dcecbcc4bf844f5190c7378dd.
2025-05-30Revert "Reapply "[AMDGPU] Make `getAssumedAddrSpace` return AS1 for pointer kernel arguments (#137488)""Shilei Tian1-4/+0
This reverts commit 4efc13f8ff1eaf4f9fb1fcea8d4552b3eca052ca.
2025-05-30Reapply "[AMDGPU] Make `getAssumedAddrSpace` return AS1 for pointer kernel arguments (#137488)"Shilei Tian1-0/+4
This reverts commit 3c6211c183885afb5d89259a53c4f4f46a6bf399.
2025-05-30Revert "[AMDGPU] Make `getAssumedAddrSpace` return AS1 for pointer kernel arguments (#137488)"Shilei Tian1-4/+0
This reverts commit 9bf6b2a8cb0467b62173659306e43a0346f063a2.
2025-05-30[AMDGPU] Make `getAssumedAddrSpace` return AS1 for pointer kernel arguments (#137488)Shilei Tian1-0/+4
2025-05-29[AMDGPU] Move InferAddressSpacesPass to middle end optimization pipeline (#138604)Shilei Tian1-6/+22
It will run twice in the non-LTO pipeline with `O1` or higher. In LTO post link pipeline, it will be run once with `O2` or higher, since inline and SROA don't run in `O1`.
2025-05-26[AMDGPU] Cluster export instructions in PostRA Scheduler (#141399)Carl Ritson1-0/+1
DAG mutation needs to be applied post-RA to maintain order established during pre-RA scheduler.
2025-05-19[AMDGPU] Set AS8 address width to 48 bitsAlexander Richardson1-4/+3
Of the 128 bits of the buffer descriptor only 48 bits are address bits, so following the discussion on https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54, the logical conclusion is to set the index width to 48 bits instead of the current value of 128. Most of the test changes are mechanical datalayout updates, but there is one actual change: the ptrmask test now uses .i48 instead of .i128 and I had to update SelectionDAGBuilder to correctly extend the mask. Reviewed By: krzysz00 Pull Request: https://github.com/llvm/llvm-project/pull/139419
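To illustrate what masking with an index width narrower than the pointer width can look like, here is a minimal sketch. It is not the SelectionDAGBuilder code; the `Ptr128` struct, the function name, and the assumption that the 48 index bits sit in the low bits of the value are all hypothetical choices made for the example. The sketch applies a 48-bit mask to the index bits and leaves every bit above the index width untouched, which is equivalent to extending the mask with ones.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 128-bit value split into two 64-bit halves (illustration only).
struct Ptr128 {
  std::uint64_t Lo;
  std::uint64_t Hi;
};

// Assumption for this sketch: the index bits are the low 48 bits of the value.
const unsigned IndexBits = 48;
const std::uint64_t IndexMask = (std::uint64_t{1} << IndexBits) - 1;

// Apply a 48-bit mask to the index bits only. Bits 48..127 are preserved,
// which is the same as extending the mask with ones before the AND.
inline Ptr128 ptrMaskSketch(Ptr128 P, std::uint64_t Mask48) {
  assert(Mask48 <= IndexMask && "mask must fit in the 48-bit index width");
  std::uint64_t Extended = Mask48 | ~IndexMask;
  return Ptr128{P.Lo & Extended, P.Hi};
}
```

Zero-extending the mask instead would clear the non-index bits of the descriptor, which is the kind of mistake a correct extension has to avoid.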
2025-05-11[AMDGPU] Move kernarg preload logic to separate pass (#130434)Austin Kerbow1-0/+8
Moves kernarg preload logic to its own module pass. Cloned function declarations are removed when preloading hidden arguments. The inreg attribute is now added in this pass instead of AMDGPUAttributor. The rest of the logic is copied from AMDGPULowerKernelArguments, which now only checks whether an argument is marked inreg to avoid replacing direct uses of preloaded arguments. This change requires test updates to remove inreg from lit tests with kernels that don't actually want preloading.
2025-05-06Register assembly printer passes (#138348)Matthias Braun1-0/+1
Register assembly printer passes in the pass registry. This makes it possible to use `llc -start-before=<target>-asm-printer ...` in tests. Adds a `char &ID` parameter to the AssemblyPrinter constructor to allow targets to use the `INITIALIZE_PASS` macros and register the pass in the pass registry. This currently has a default parameter so it won't break any targets that have not been updated.
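The defaulted `char &ID` parameter can be sketched as follows. The class names (`PassSketch`, `AsmPrinterSketch`, `MyTargetAsmPrinter`) and the single-argument constructor are hypothetical simplifications; the real constructor takes other arguments that are omitted here. The sketch only shows why a default argument keeps targets that have not been updated compiling, while an updated target can pass its own ID.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-ins; not the real LLVM classes.
class PassSketch {
  const void *PassID; // the address of a static char identifies the pass
public:
  explicit PassSketch(char &ID) : PassID(&ID) {}
  virtual ~PassSketch() = default;
  const void *getPassID() const { return PassID; }
};

class AsmPrinterSketch : public PassSketch {
public:
  static char ID;
  // The default argument means existing subclasses need no change.
  explicit AsmPrinterSketch(char &PassID = AsmPrinterSketch::ID)
      : PassSketch(PassID) {}
};
char AsmPrinterSketch::ID = 0;

// An updated target passes its own ID so it can be registered separately.
class MyTargetAsmPrinter : public AsmPrinterSketch {
public:
  static char ID;
  MyTargetAsmPrinter() : AsmPrinterSketch(ID) {}
};
char MyTargetAsmPrinter::ID = 0;

// Small self-check covering both construction paths.
inline void checkAsmPrinterIDs() {
  AsmPrinterSketch NotUpdated;
  MyTargetAsmPrinter Updated;
  assert(NotUpdated.getPassID() == &AsmPrinterSketch::ID);
  assert(Updated.getPassID() == &MyTargetAsmPrinter::ID);
}
```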
2025-05-02[AMDGPU][Attributor] Add `ThinOrFullLTOPhase` as an argument (#123994)Shilei Tian1-3/+6
2025-04-30[CodeGen][NPM] Port VirtRegRewriter to NPM (#130564)Akshat Oke1-0/+46
2025-04-26[TTI] Simplify implementation (NFCI) (#136674)Sergei Barannikov1-1/+1
Replace "concept based polymorphism" with simpler PImpl idiom. This pursues two goals:
* Enforce static type checking. Previously, target implementations hid base class methods and type checking was impossible. Now that they override the methods, the compiler will complain on mismatched signatures.
* Make the code easier to navigate. Previously, if you asked your favorite LSP server to show a method (e.g. `getInstructionCost()`), it would show you methods from `TTI`, `TTI::Concept`, `TTI::Model`, `TTIImplBase`, and target overrides. Now it is two fewer :)
There are three commits to hopefully simplify the review.
The first commit removes `TTI::Model`. This is done by deriving `TargetTransformInfoImplBase` from `TTI::Concept`. This is possible because they implement the same set of interfaces with identical signatures. The first commit makes `TargetTransformInfoImplBase` polymorphic, which means all derived classes should `override` its methods. This is done in the second commit to make the first one smaller. It appeared infeasible to extract this into a separate PR because the first commit landed separately would result in tons of `-Woverloaded-virtual` warnings (and break `-Werror` builds).
The third commit eliminates `TTI::Concept` by merging it with the only derived class `TargetTransformInfoImplBase`. This commit could be extracted into a separate PR, but it touches the same lines in `TargetTransformInfoImpl.h` (removes `override` added by the second commit and adds `virtual`), so I thought it may make sense to land these two commits together.
Pull Request: https://github.com/llvm/llvm-project/pull/136674
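A minimal sketch of the resulting PImpl shape, under stated assumptions: the names `CostImplBase`, `MyTargetCostImpl`, and `CostInfo` are hypothetical and the cost values are made up. It only illustrates the two goals listed in the commit message, namely that the target implementation overrides a virtual method (so a mismatched signature fails to compile) and that the wrapper forwards through a single implementation pointer.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical names throughout; a sketch of the idiom, not LLVM's classes.
class CostImplBase {
public:
  virtual ~CostImplBase() = default;
  // Default answer used when a target does not override the hook.
  virtual int getInstructionCost(int /*Opcode*/) const { return 1; }
};

class MyTargetCostImpl final : public CostImplBase {
public:
  // `override` makes the compiler reject a signature that does not match the
  // base class, instead of silently hiding the base method.
  int getInstructionCost(int Opcode) const override {
    return Opcode == 0 ? 0 : 4;
  }
};

// The public wrapper owns one implementation pointer and forwards to it.
class CostInfo {
  std::unique_ptr<CostImplBase> Impl;

public:
  explicit CostInfo(std::unique_ptr<CostImplBase> I) : Impl(std::move(I)) {
    assert(Impl && "wrapper requires an implementation");
  }
  int getInstructionCost(int Opcode) const {
    return Impl->getInstructionCost(Opcode);
  }
};
```

Compared with a concept/model pair, there is one virtual hierarchy to read, so a "go to definition" on `getInstructionCost` lands on the base implementation or a target override.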
2025-04-21[AMDGPU][NewPM] Make the pass flow consistent with the legacy pipeline. (#136551)Christudasan Devadasan1-5/+3
2025-04-17[AMDGPU][NPM] Cleanup AMDGPUPassRegistry.def (#130071)Akshat Oke1-0/+1
Finishing up AMDGPU specific passes. Only ones remaining are assembly printer, virt reg rewriter and PEI.
2025-04-15[AMDGPU] Remove the AnnotateKernelFeatures pass (#130198)Jun Wang1-7/+0
Previously the AnnotateKernelFeatures pass inferred two attributes, amdgpu-calls and amdgpu-stack-objects, which were used to help determine if flat scratch init is allowed. PR #118907 created the amdgpu-no-flat-scratch-init attribute. Continuing with that work, this patch makes use of this attribute to determine flat scratch init, replacing amdgpu-calls and amdgpu-stack-objects. This also leads to the removal of the AnnotateKernelFeatures pass.
2025-04-15[HIP][HIPSTDPAR][NFC] Re-order & adapt `hipstdpar` specific passes (#134753)Alex Voicu1-7/+13
The `hipstdpar` specific passes were not ordered ideally, especially for `fgpu-rdc` compilations, which meant that we'd eagerly run accelerator code selection and remove symbols that might end up used. This change corrects that aspect by ensuring that accelerator code selection is only done after linking (this will have to be revisited in the future once the closed-world assumption no longer holds). Furthermore, we take the opportunity to move allocation interposition so that it properly gets printed when print-pipeline-passes is requested. NFC.
2025-04-14[CodeGen][NPM] Port BranchRelaxation to NPM (#130067)Akshat Oke1-1/+2
This completes the PreEmitPasses.
2025-04-11[AMDGPU] Teach iterative schedulers about IGLP (#134953)Jeffrey Byrnes1-2/+6
This adds IGLP mutation to the iterative schedulers (`gcn-iterative-max-occupancy-experimental`, `gcn-iterative-minreg`, and `gcn-iterative-ilp`). The `gcn-iterative-minreg` and `gcn-iterative-ilp` schedulers never actually applied the mutations added, so this also has the effect of teaching them about mutations in general. The `gcn-iterative-max-occupancy-experimental` scheduler has calls to `ScheduleDAGMILive::schedule()`, so, before this, mutations were applied at this point. Now this is done during calls to `BuildDAG`, with IGLP superseding other mutations (similar to the other schedulers). We may end up scheduling regions multiple times, with mutations being applied each time, so we need to track for `AMDGPU::SchedulingPhase::PreRAReentry`
2025-04-10[AMDGPU] Make the iterative schedulers selectable via amdgpu-sched-strategy (#135042)Jeffrey Byrnes1-0/+9
Currently, the only way for users to try these schedulers is via `-misched=`. However, this overrides the default scheduler for all targets. This causes problems for various toolchains / drivers which spawn jobs for both x86 and AMDGPU, e.g. hipcc. On the other hand, `amdgpu-sched-strategy` only changes the scheduler for the AMDGPU target.
2025-04-09[CodeGen][NPM] Port PostRAHazardRecognizer to NPM (#130066)Akshat Oke1-1/+2
2025-04-08[AMDGPU][NPM] Port SIPreEmitPeephole to NPM (#130065)Akshat Oke1-4/+3
2025-04-07[NFC][LLVM][AMDGPU] Cleanup pass initialization for AMDGPU (#134410)Rahul Joshi1-0/+2
- Remove calls to pass initialization from pass constructors. - https://github.com/llvm/llvm-project/issues/111767
2025-04-02[AMDGPU][NPM] Port AMDGPUSetWavePriority to NPM (#130064)Akshat Oke1-3/+2
2025-03-26[AMDGPU][NPM] Port SILateBranchLowering to NPM (#130063)Akshat Oke1-2/+3
2025-03-25[AMDGPU][NPM] Port SIInsertHardClauses to NPM (#130062)Akshat Oke1-1/+1
2025-03-24[AMDGPU][NPM] Port SIInsertWaitcnts to NPM (#130061)Akshat Oke1-2/+2
2025-03-19[AMDGPU][NPM] Port AMDGPUMarkLastScratchLoad to NPM (#131738)Akshat Oke1-1/+1
This finishes all passes for the optimized regalloc path. --------- Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
2025-03-14[NFC][AMDGPU] Replace direct arch comparison with `isAMDGCN()` (#131357)Shilei Tian1-11/+9
2025-03-14[AMDGPU][NPM] Port GCNCreateVOPD to NPM (#130059)Akshat Oke1-2/+2
2025-03-12AMDGPU/GlobalISel: Disable LCSSA pass (#124297)Petar Avramovic1-2/+8
Disable LCSSA pass in preparation for implementing temporal divergence lowering in amdgpu divergence lowering. Breaks all cases where sgpr or i1 values are used outside of the cycle with divergent exit. Regenerate regression tests for amdgpu divergence lowering with LCSSA disabled. Update IntrinsicLaneMaskAnalyzer to stop tracking lcssa phis that are lane masks.