Age | Commit message | Author | Files | Lines |
|
|
|
the global AS (#160129)
Mostly NFC, and adds an assertion for gfx12 to ensure that no atomic scratch
instructions are present in the case of GloballyAddressableScratch. This should
always hold because of #154710.
|
|
A fence release could be followed by a barrier, so it should wait for
the relevant memory accesses to complete, even if it is mmra-limited to
LDS. So far, that would be skipped for non-global fence releases.
Fixes SWDEV-554932.
|
|
The removed function SIGfx90ACacheControl::enableLoadCacheBypass() does
not actually do anything except one assert and one unreachable.
|
|
Defaults to "agent" for targets that do not support it.
- Add documentation
- Register it in MachineModuleInfo
- Add MemoryLegalizer support
|
|
|
|
This reverts commit be17791f2624f22b3ed24a2539406164a379125d.
This is not necessary for gfx1250 anymore.
|
|
Implements the base of the MemoryLegalizer for a roughly correct GFX1250 memory model.
Documentation will come later, and some remaining changes still have to be added, but this is the backbone of the model.
|
|
- Add clang built-ins + sema/codegen
- Add IR Intrinsic + verifier
- Add DAG/GlobalISel codegen for the intrinsics
- Add lowering in SIMemoryLegalizer using a MMO flag.
|
|
Re-enable it by adding a wait on vm_vsrc before every barrier "start"
instruction in GFX10/11/12 CU mode.
This is a weaker wait than what we do without BackOffBarrier, so it
shouldn't introduce any new guarantees that can be abused. Instead, it
relaxes the guarantees we have now to the bare minimum needed to support
the behavior users want (fence release + barrier works).
There is an exact memory model in the works which will be documented
separately.
|
|
The new instruction represents the unknown number of waitcnts needed at a
release operation to ensure that prior direct loads to LDS (formerly called LDS
DMA) are completed. The instruction is replaced in SIInsertWaitcnts with a
suitable value for vmcnt().
Co-authored-by: Austin Kerbow <austin.kerbow@amd.com>.
|
|
Determines whether we can use `SCOPE_CU` stores (on by default), or
whether all stores must be done at `SCOPE_SE` minimum.
|
|
(#150587)
We can do it all in finalizeStore if we ensure it always sees the
stores.
For that, I needed to fix a hidden bug where finalizeStore wouldn't see
all stores because sometimes the iterator got out of sync and no longer
pointed to the store.
This also removes the waits before volatile LDS stores, which never
needed them; that was a bug until now.
|
|
|
|
(#148630)
Directly plug it into the MMO instead, which is much cleaner.
|
|
"amdgpu-as" is way too vague and doesn't give enough context.
We may want to support it on normal atomics too, to control the synchronized (ordered) AS.
If we do that, the name has to be less vague.
|
|
(#145084)
No tests yet, but it will allow further tests not to be
polluted with these waits.
|
|
This annotates the `Twine` passed to the constructors of the various
DiagnosticInfo subclasses with `[[clang::lifetimebound]]`, which causes
us to warn when we would try to print the twine after it had already
been destructed.
We also update `DiagnosticInfoUnsupported` to hold a `const Twine &`
like all of the other DiagnosticInfo classes, since this warning allows
us to clean up all of the places where it was being used incorrectly.
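A minimal sketch of the pattern described above, assuming a plain `std::string` as a stand-in for `Twine`. The class `Diag`, the macro `SKETCH_LIFETIMEBOUND`, and the function `renderSafely` are hypothetical names for illustration and are not taken from the patch. The attribute is only applied when the compiler reports support for it.

```cpp
#include <cassert>
#include <string>

// Apply [[clang::lifetimebound]] only where the compiler knows it.
#if defined(__has_cpp_attribute)
#if __has_cpp_attribute(clang::lifetimebound)
#define SKETCH_LIFETIMEBOUND [[clang::lifetimebound]]
#endif
#endif
#ifndef SKETCH_LIFETIMEBOUND
#define SKETCH_LIFETIMEBOUND
#endif

// Hypothetical diagnostic class that stores a reference to its message,
// so the message object must outlive the diagnostic.
class Diag {
  const std::string &Msg;

public:
  explicit Diag(const std::string &M SKETCH_LIFETIMEBOUND) : Msg(M) {}
  std::string render() const { return "error: " + Msg; }
};

// Correct usage: the message is a named object that outlives the Diag.
// Constructing Diag from a temporary string and rendering it later is the
// kind of misuse the annotation lets Clang warn about.
inline std::string renderSafely() {
  std::string Text = "unsupported feature";
  Diag D(Text);
  return D.render();
}
```

Keeping the message in a named local, as `renderSafely` does, is the shape that stays valid when the diagnostic class only holds a reference.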
|
|
|
|
This was only used for gfx940 and gfx941, which have since been removed.
For SWDEV-512631
|
|
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.
This PR removes all non-documentation occurrences of gfx940/gfx941 from
the llvm directory, and the remaining occurrences in clang.
Documentation changes will follow.
For SWDEV-512631
|
|
global_wb with scopes lower than SCOPE_SYS is unnecessary for
correctness.
I was initially optimistic that they would be very cheap no-ops, but they
can actually be quite expensive, so let's avoid them.
|
|
Documents the memory model implemented as of #98591, with some
fixes/optimizations to the implementation.
|
|
|
|
|
|
- Emit GLOBAL_WB instructions
- Reflect synscope on the instruction's `scope:` operand
Fixes SWDEV-468508
Fixes SWDEV-470735
Fixes SWDEV-468392
Fixes SWDEV-469622
|
|
Using MMRAs, allow `builtin_amdgcn_fence` to emit fences that only
target one or more address spaces, instead of fencing all address spaces
at once.
This is done through an `amdgpu-as` MMRA. Currently focused on OpenCL
fences, but can very easily support more AS names and codegen on more
than just fences.
|
|
Convert !amdgpu.last.use metadata into MachineMemOperand for last use
and handle it in SIMemoryLegalizer similar to nontemporal and volatile.
|
|
The iterator MI can advance in insertWait(), but we need the original
instruction to set the temporal hint. Just move it before handling volatile.
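The ordering issue can be sketched on a plain `std::list`, under the assumption that a helper both inserts after the current position and advances the caller's iterator. `InstList`, `insertWaitAfter`, and `legalizeVolatile` are hypothetical stand-ins, not the functions in SIMemoryLegalizer.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

using InstList = std::list<std::string>;

// Hypothetical helper: inserts a "wait" after the current instruction and
// leaves the caller's iterator pointing at the inserted wait.
inline void insertWaitAfter(InstList &Insts, InstList::iterator &It) {
  It = Insts.insert(std::next(It), "wait");
}

// Annotate the original instruction first, while the iterator still names
// it, and only then call the helper that may advance the iterator.
inline void legalizeVolatile(InstList &Insts, InstList::iterator &It) {
  *It += " [temporal-hint]";
  insertWaitAfter(Insts, It);
}
```

Doing the annotation after `insertWaitAfter` would tag the inserted wait instead of the original instruction, which is the kind of mistake the reordering avoids.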
|
|
Insert waitcnts for loads and atomics before stores with system scope.
Scope is a field in the instruction encoding and corresponds to the desired
coherence level in the cache hierarchy.
Intrinsic stores can set the scope in the cache policy operand.
If the volatile keyword is used on generic stores, the memory legalizer will
set the scope to system. Generic stores, by default, get the lowest scope level.
Waitcnts are not required if it is guaranteed that memory is cached.
For example, Vulkan shaders can guarantee this.
TODO: implement a flag for frontends to give us a hint not to insert
waits.
Expecting the Vulkan flag to be implemented as a vulkan:private MMRA.
|
|
Fixes SWDEV-443292
|
|
Update SIMemoryLegalizer and SIInsertWaitcnts to use separate wait
instructions per counter (e.g. S_WAIT_LOADCNT) and split VMCNT into
separate LOADCNT, SAMPLECNT and BVHCNT counters.
|
|
Co-authored-by: Diana Picus <Diana-Magda.Picus@amd.com>
|
|
to be 0 already (#72830)
Co-authored-by: Juan Manuel MARTINEZ CAAMAÑO <juamarti@amd.com>
|
|
Differential Revision: https://reviews.llvm.org/D149986
|
|
Memory models for gfx90a and gfx940 do not require buffer_wbl2
before the fence for acquire ordering, but we do insert the full
release.
Fixes: SWDEV-386785
Differential Revision: https://reviews.llvm.org/D145524
|
|
|
|
value() has undesired exception checking semantics and calls
__throw_bad_optional_access in libc++. Moreover, the API is unavailable without
_LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see
_LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS).
This fixes clang.
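As an illustration of the difference, here is a small sketch that reads an optional with `operator*` after an explicit check instead of calling `value()`. The helpers `parseNonNegative` and `valueOrZero` are hypothetical and only serve the example.

```cpp
#include <cassert>
#include <optional>

// Hypothetical helper: yields a value only for non-negative inputs.
inline std::optional<int> parseNonNegative(int Raw) {
  if (Raw < 0)
    return std::nullopt;
  return Raw;
}

// Reads the contained value with operator* after checking for presence.
// Unlike value(), operator* has no throwing path, so nothing here refers
// to the bad_optional_access machinery.
inline int valueOrZero(int Raw) {
  std::optional<int> Parsed = parseNonNegative(Raw);
  if (!Parsed)
    return 0;
  return *Parsed;
}
```

The check before the dereference matters: `operator*` on an empty optional is undefined behavior, so the explicit test replaces the exception that `value()` would have thrown.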
|
|
C++17 allows us to call the pair and tuple constructors directly instead of
the helper functions make_pair and make_tuple.
Differential Revision: https://reviews.llvm.org/D139828
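A short sketch of the two spellings side by side, relying on C++17 class template argument deduction. The function names are hypothetical and chosen only for the comparison.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <utility>

// Pre-C++17 style: the helper function deduces the element types.
inline std::pair<int, std::string> makeEntryOld(int Id,
                                                const std::string &Name) {
  return std::make_pair(Id, Name);
}

// C++17 style: the constructor deduces std::pair<int, std::string>.
inline std::pair<int, std::string> makeEntryNew(int Id,
                                                const std::string &Name) {
  return std::pair(Id, Name);
}

// The same applies to std::tuple.
inline std::tuple<int, int, bool> makeTriple(int A, int B, bool C) {
  return std::tuple(A, B, C);
}
```

Both spellings produce the same object here; the constructor form simply drops the helper call.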
|
|
|
|
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
|
|
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
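For illustration, this is roughly what code looks like after such a migration, using only the standard `std::optional` API. The function `findScopeRank` and its tag values are hypothetical and do not come from the patch.

```cpp
#include <cassert>
#include <optional>

// Hypothetical lookup: maps a one-letter tag to a rank and returns
// std::nullopt (where older code would have returned None) for unknown tags.
inline std::optional<unsigned> findScopeRank(char Tag) {
  switch (Tag) {
  case 'w':
    return 0u;
  case 'a':
    return 1u;
  case 's':
    return 2u;
  default:
    return std::nullopt;
  }
}
```

Callers can test the result in a boolean context, dereference it once it is known to hold a value, or fall back with `value_or`.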
|
|
|
|
|
|
This reverts commit aa8feeefd3ac6c78ee8f67bf033976fc7d68bc6d.
|
|
|
|
|
|
In GFX10, dlc controlled L1 cache bypass. In GFX11, it has been repurposed
to control MALL NOALLOC, and glc controls L1 as well as L0 cache bypass.
Update the documentation and SIMemoryLegalizer accordingly. Set dlc for
nontemporal and volatile accesses.
Differential Revision: https://reviews.llvm.org/D127405
|
|
This is a (fixed) recommit of https://reviews.llvm.org/D121169
after: 1061034926
before: 1063332844
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121681
|
|
Differential Revision: https://reviews.llvm.org/D121242
|