Age | Commit message (Collapse) | Author | Files | Lines |
|
Saves around 125-210 MB of compilation memory usage per source for
roughly one third of our backend sources, ~60 MB on average.
|
|
|
|
using wrapper llvm::sort (NFC)
|
|
|
|
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
|
|
This reverts commit aa9f8596b01fef013ab62c20e61fc96d165f60f7 because it made
some assumptions that may not be valid.
|
|
FullLTO has to run this early, before module splitting occurs; otherwise
module splitting won't work as expected. There was a targeted fix for
Fortran on another branch that disables the LTO run, but that would break
full LTO module splitting entirely.
Test changes are due to metadata indexes shifting.
See #122891
|
|
|
|
Currently, if noalias metadata is already present on loads and
stores, the lower module LDS pass generates a more conservative aliasing
set. This inhibits scheduling intrinsics that would otherwise have
generated better-pipelined instructions.
The fix is to stop always intersecting the existing noalias metadata
with the noalias metadata created for LDS lowering. Instead, intersect
only if the noalias scopes are from the same domain; otherwise,
concatenate the existing noalias sets with the LDS noalias.
There have been a few patches for scopedAA in the past. The following
three should be enough background information.
https://reviews.llvm.org/D91576
https://reviews.llvm.org/D108315
https://reviews.llvm.org/D110049
Essentially, after a pass that might change aliasing info, one should
check whether that pass changes the number of MayAlias or ModRef results
using the following:
`opt -S -aa-pipeline=basic-aa,scoped-noalias-aa -passes=aa-eval
-evaluate-aa-metadata -print-all-alias-modref-info -disable-output`
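The intersect-or-concatenate rule described above can be sketched in plain C++. This is only an illustrative model, not the pass's code: `Scope`, `ScopeList`, `domainsOf`, and `mergeNoAlias` are hypothetical names, and a scope is reduced to a (domain, name) pair of strings rather than LLVM metadata nodes.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a noalias scope: (domain name, scope name).
using Scope = std::pair<std::string, std::string>;
using ScopeList = std::vector<Scope>;

// Collect the set of domains that the scopes in a list belong to.
static std::set<std::string> domainsOf(const ScopeList &List) {
  std::set<std::string> Domains;
  for (const Scope &S : List)
    Domains.insert(S.first);
  return Domains;
}

// Merge an existing noalias list with the one created for LDS lowering:
// intersect when the two lists share a domain, concatenate otherwise.
ScopeList mergeNoAlias(const ScopeList &Existing, const ScopeList &LDS) {
  if (Existing.empty())
    return LDS; // Nothing to merge with; attach the LDS list as-is.

  const std::set<std::string> ExistingDomains = domainsOf(Existing);
  const std::set<std::string> LDSDomains = domainsOf(LDS);
  const bool SharesDomain = std::any_of(
      ExistingDomains.begin(), ExistingDomains.end(),
      [&](const std::string &D) { return LDSDomains.count(D) != 0; });

  ScopeList Result;
  if (SharesDomain) {
    // Same domain: keep only the scopes present in both lists.
    for (const Scope &S : Existing)
      if (std::find(LDS.begin(), LDS.end(), S) != LDS.end())
        Result.push_back(S);
  } else {
    // Disjoint domains: keep everything from both lists.
    Result = Existing;
    Result.insert(Result.end(), LDS.begin(), LDS.end());
  }
  return Result;
}
```

With disjoint domains the merged list grows, so no existing noalias information is thrown away; only lists from the same domain are narrowed.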
|
|
- Remove calls to pass initialization from pass constructors.
- https://github.com/llvm/llvm-project/issues/111767
|
|
Add a new `CreateIntrinsic` overload with no `Types`, useful for
creating calls to non-overloaded intrinsics that don't need additional
mangling.
|
|
This patch fixes:
llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp:1031:17: error:
unused variable 'F' [-Werror,-Wunused-variable]
|
|
Use a local pointer type to represent the named barrier in the builtin and
intrinsic. This makes the definitions more user friendly
because users do not need to worry about the hardware ID assignment. This
approach is also closer to other popular GPU programming languages.
Named barriers should be represented as global variables of addrspace(3)
in LLVM IR. The compiler assigns the special LDS offsets for those variables
during the AMDGPULowerModuleLDS pass. Those addresses are converted to hw
barrier IDs during instruction selection. The rest of the
instruction-selection changes are primarily due to the
intrinsic-definition changes.
|
|
Convert many instances of:
Fn = Intrinsic::getOrInsertDeclaration(...);
CreateCall(Fn, ...)
to the equivalent CreateIntrinsic call.
|
|
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation for
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e., lookup only, no creation).
|
|
Generated automatically with:
$ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find
lib/Target/AMDGPU/ -type f)
|
|
(#107092)
Dynamic LDS and table LDS both use the amdgpu_lds_kernel_id intrinsic.
Kernels and functions that make an indirect use of this should not have
the "amdgpu-no-lds-kernel-id" attribute.
For the latter, this was done. For the dynamic LDS case, this was
missing. This patch fixes it.
|
|
Fixes 106412. The logic that skips the pass on already-lowered variables
doesn't cover the path that increases alignment of variables. If a
variable is allocated at 24 and then given 16 byte alignment, the
backend notices and fatal-errors on the inconsistency.
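The inconsistency in that example comes down to an address that is not a multiple of its alignment. A minimal sketch of the check, using a hypothetical helper `isAddressAligned` (not a function from the pass or the backend):

```cpp
#include <cassert>
#include <cstdint>

// Returns true when Address is a multiple of Align.
// Align must be a non-zero power of two.
bool isAddressAligned(std::uint64_t Address, std::uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 &&
         "alignment must be a power of two");
  return (Address & (Align - 1)) == 0;
}
```

An address of 24 satisfies 8-byte alignment but not 16-byte alignment, so raising the alignment of an already-placed variable to 16 leaves its address inconsistent with its declared alignment.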
|
|
It is only used by CodeGen so does not need to be shared with the
assembler/disassembler.
|
|
|
|
Co-authored-by: Akshat Oke <Akshat.Oke@amd.com>
|
|
Reverts the above commit, as it updates a common header function and
did not update all callsites:
https://lab.llvm.org/buildbot/#/builders/29/builds/382
This reverts commit 6481dc57612671ebe77fe9c34214fba94e1b3b27.
|
|
Uses the new InsertPosition class (added in #94226) to simplify some of
the IRBuilder interface, and removes the need to pass a BasicBlock
alongside a BasicBlock::iterator, using the fact that we can now get the
parent basic block from the iterator even if it points to the sentinel.
This patch removes the BasicBlock argument from each constructor or call
to setInsertPoint.
This has no functional effect, but later on as we look to remove the
`Instruction *InsertBefore` argument from instruction-creation (discussed
[here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)),
this will simplify the process by allowing us to deprecate the
InsertPosition constructor directly and catch all the cases where we use
instructions rather than iterators.
|
|
This PR updates removeFnAttrFromReachable in AMDGPUMemoryUtils to accept
an array of function attributes as an argument.
This helps remove multiple attributes in one CallGraph walk.
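The benefit of passing an array is that the call graph is traversed once, no matter how many attributes are removed. The following is a simplified model, not the LLVM implementation: `CallGraph`, `AttrMap`, and `removeAttrsFromReachable` are hypothetical names, functions and attributes are plain strings, and removing from the root as well as its callees is an assumption of this sketch.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: function name -> names of the functions it calls.
using CallGraph = std::map<std::string, std::vector<std::string>>;
// Hypothetical model: function name -> set of attribute names.
using AttrMap = std::map<std::string, std::set<std::string>>;

// Remove every attribute in AttrsToRemove from Root and from all functions
// reachable from Root, visiting each function exactly once.
void removeAttrsFromReachable(const CallGraph &CG, const std::string &Root,
                              const std::vector<std::string> &AttrsToRemove,
                              AttrMap &Attrs) {
  std::set<std::string> Visited{Root};
  std::vector<std::string> Worklist{Root};
  while (!Worklist.empty()) {
    const std::string F = Worklist.back();
    Worklist.pop_back();

    // Strip all requested attributes during this single visit.
    auto AttrIt = Attrs.find(F);
    if (AttrIt != Attrs.end())
      for (const std::string &A : AttrsToRemove)
        AttrIt->second.erase(A);

    // Queue callees that have not been seen yet (handles cycles).
    auto CalleesIt = CG.find(F);
    if (CalleesIt == CG.end())
      continue;
    for (const std::string &Callee : CalleesIt->second)
      if (Visited.insert(Callee).second)
        Worklist.push_back(Callee);
  }
}
```

Removing N attributes with a single-attribute interface would need N walks over the same reachable set; here the inner loop over `AttrsToRemove` replaces the repeated walks.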
|
|
|
|
AMDGPUMemoryUtils (#88002)
This moves some of the utility methods from amdgpu-lower-module-lds pass to AMDGPUMemoryUtils.
|
|
(#88278)
- Resolve Static Analyzer Check Failure: Pointer Dereferencing After
Null Check.
- Minor naming and style improvement
|
|
Refactor the logic that checks if a module contains mixed
absolute/non-lowered LDS GVs.
The check now happens later, when the "worklists" are formed. This is
because in some cases (OpenMP) we can have non-lowered GVs in a lowered
module, and this is normal because those GVs are just unused and removed
from the list at some point before the end of `getUsesOfLDSByFunction`.
Doing the check later ensures that if a mixed module is spotted, then
it's a _real_ mixed module that needs rejection, not a module containing
an intentionally ignored GV.
|
|
If all variables in the module are absolute, this means we're running
the pass again on an already lowered module, and that works.
If none of them are absolute, lowering can proceed as usual.
Only diagnose cases where we have a mix of absolute/non-absolute GVs,
which means we added LDS GVs after lowering, which is broken.
See #81491
Split from #75333
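The three cases above (all absolute, none absolute, mixed) can be sketched as a small classifier. This is an illustrative model only: `LDSModuleState` and `classifyLDSVariables` are hypothetical names, and each LDS variable is reduced to a single flag saying whether it already carries an absolute address.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

enum class LDSModuleState { AlreadyLowered, NotLowered, Mixed };

// IsAbsolute holds one flag per LDS variable in the module.
LDSModuleState classifyLDSVariables(const std::vector<bool> &IsAbsolute) {
  const auto NumAbsolute =
      std::count(IsAbsolute.begin(), IsAbsolute.end(), true);
  if (NumAbsolute == 0)
    return LDSModuleState::NotLowered; // Also covers a module with no LDS.
  if (static_cast<std::size_t>(NumAbsolute) == IsAbsolute.size())
    return LDSModuleState::AlreadyLowered; // Rerunning the pass is fine.
  return LDSModuleState::Mixed; // Variables were added after lowering.
}
```

Only the `Mixed` result corresponds to the case that gets diagnosed; the other two let the pass proceed.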
|
|
This is in preparation for moving the run of AMDGPUAttributor earlier.
Currently it infers the lack of the corresponding intrinsic calls,
so if we introduce new ones we need to remove the attribute from any
possible transitive callers. This is more conservative than necessary;
we could try to identify specific subgraphs where LDS globals are not
used.
Other options include teaching the attributor to avoid adding it in
cases where the lowering may choose the table, but this seems more complex.
Alternatively, we could add a second run, which doesn't seem worth it.
Depends on #71349
|
|
Identified with clangd.
|
|
Identified with clangd.
|
|
Replace this with PointerType::getUnqual().
Followup to the opaque pointer transition. Fixes an in-code TODO item.
|
|
This patch adds a two-argument SetInsertPoint method to IRBuilder that
takes a block/iterator instead of an instruction, and updates many call
sites to use it. The motivating reason for doing this is given here [0]:
we'd like to pass around more information about the position of debug-info
in the iterator object. That necessitates passing iterators around most of
the time.
[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939
Differential Revision: https://reviews.llvm.org/D152468
|
|
https://reviews.llvm.org/D157660
|
|
|
|
* Use shorter versions of the LLVM API
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D155589
|
|
* Do a single lookup when querying the map
* Use shorter versions of the LLVM API
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D155588
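As a generic illustration of the "single lookup" point, the usual pattern is to call `find` once and reuse the iterator instead of testing for the key and then indexing again. The sketch below uses `std::map` and a hypothetical helper `lookupOffset`; it is not the code from the patch and says nothing about which map type the pass uses.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Look a key up once and reuse the iterator, rather than calling
// count() and then operator[] (two lookups for the same key).
std::optional<int> lookupOffset(const std::map<std::string, int> &Offsets,
                                const std::string &Name) {
  auto It = Offsets.find(Name);
  if (It == Offsets.end())
    return std::nullopt; // Key absent; no second lookup needed.
  return It->second;
}
```

Besides saving a lookup, `find` never inserts a default-constructed value the way `operator[]` does on a missing key.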
|
|
PromoteAlloca
|
|
Requires D155190
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D155238
|
|
Do the LDS frame calculation once, in the IR pass, instead of repeating the work in the backend.
Prior to this patch:
The IR lowering pass sets up a per-kernel LDS frame and annotates the variables with absolute_symbol
metadata so that the assembler can build lookup tables out of it. There is a fragile association between
kernel functions and named structs which is used to recompute the frame layout in the backend, with
fatal_errors catching inconsistencies in the second calculation.
After this patch:
The IR lowering pass additionally sets a frame size attribute on kernels. The backend uses the same
absolute_symbol metadata that the assembler uses to place objects within that frame size.
Deleted the now dead allocation code from the backend. Left for a later cleanup:
- enabling lowering for anonymous functions
- removing the elide-module-lds attribute (test churn, it's not used by llc any more)
- adjusting the dynamic alignment check to not use symbol names
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D155190
|
|
variables
These aren't implemented. They could be at moderate implementation
complexity. Raising an error is better than silently miscompiling.
Patching now because the patch at D155125 is a step towards using this metadata
more extensively as part of the lowering path and that will interact badly with
input variables with this annotation.
Lowering user defined variables at specific addresses would drop this error,
put them at the requested position in the frame during this pass, and then
use the same codegen that will be used for the kernel specific struct shortly.
Reviewed By: jmmartinez
Differential Revision: https://reviews.llvm.org/D155132
|
|
It seems that the sanitizer-x86_64-linux-android wasn't able to deduce
the template argument:
AMDGPULowerModuleLDSPass.cpp:1192:53: error: no viable constructor or
deduction guide for deduction of template arguments of 'vector'
auto TableLookupVariablesOrdered = sortByName(std::vector(
This patch makes the template argument explicit.
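A minimal sketch of the workaround, assuming a container of names that is copied into a vector and sorted. `sortedNames` and the use of `std::unordered_set<std::string>` are hypothetical and chosen only for illustration; the point is that the element type is spelled out instead of being left to class template argument deduction.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Copy the names into a vector and sort them.
std::vector<std::string>
sortedNames(const std::unordered_set<std::string> &Names) {
  // Relying on deduction would look like:
  //   std::vector(Names.begin(), Names.end())
  // Naming the element type works even where deduction guides are
  // unavailable in the standard library being used.
  std::vector<std::string> Result(Names.begin(), Names.end());
  std::sort(Result.begin(), Result.end());
  return Result;
}
```

The explicit form costs a few more characters but does not depend on C++17 deduction-guide support in the toolchain's standard library.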
|
|
Fixed compilation error and redundant copy warning
Differential Revision: https://reviews.llvm.org/D154977
|
|
More robust association between the kernels and lds struct.
Use poison instead of value() for lookup table elements introduced by dynamic lds lowering.
Extracted from D154946, new test from there verbatim. Segv fixed.
Fixes issues/63338
Fixes SWDEV-404491
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D154972
|
|
This reverts commit 125b90749a98d6dc6b492883c9617f9e91ab60e0.
|
|
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D154970
|
|
Moving out some changes not related to the bugfix in https://reviews.llvm.org/D154946
Reviewed By: JonChesterfield, arsenm
Differential Revision: https://reviews.llvm.org/D154959
|
|
|
|
|