Age | Commit message (Collapse) | Author | Files | Lines |
|
- Added OpenCL pipe builtins
- Pipe instructions were added in TableGen and lowered in
SPIRVBuiltins.cpp
---------
Co-authored-by: Michal Paszkowski <michal@michalpaszkowski.com>
Co-authored-by: Dmitry Sidorov <dmitry.sidorov@intel.com>
|
|
Added constraints related to the addressing model, as required by the
specification.
This conforms to the implementation in the translator.
Same as PR #160089, with all issues resolved.
|
|
SPV_EXT_relaxed_printf_string_address_space (#160245)
Added support for the extension, which allows more storage classes for
printf strings.
|
|
Mapped the llvm.debugtrap intrinsic to OpNop in the SPIR-V backend, since
SPIR-V has no direct equivalent. Tests are included.
|
|
Added SPIR-V support for constrained floating-point comparison
intrinsics (fcmp, fcmps) with lowering and tests.
|
|
The code was incorrectly converting all `undef` arguments to `i32`,
while the `spv_insertv` intrinsic only expects that for the first
operand, representing the aggregate type.
Fixes https://github.com/llvm/llvm-project/issues/127977
---------
Co-authored-by: Michal Paszkowski <michal@michalpaszkowski.com>
|
|
Only VPlan pattern matching is used in the file, so move the using
statement to the top level.
|
|
This commit adds patterns for LSX and LASX to support generating
`[x]vadda.{b/h/w/d}` instructions.
Note: For convenience, this commit also sets `ISD::ABS` as legal. As
shown in the tests, this does not change the results, which are the same
as those obtained from expanding it before. However, setting it as legal
exposes more vectorization opportunities to IR transformations, which may
in turn give later stages and the backend more chances to optimize vectors.
|
|
In order to avoid conflating the legacy CSE with the VPlan-based one,
rename the legacy CSE and insert a FIXME to clarify the nature of the
legacy CSE.
|
|
Try to remove `UnsafeFPMath` uses in the ARM backend. These global flags
block some improvements like
https://discourse.llvm.org/t/rfc-honor-pragmas-with-ffp-contract-fast/80797.
Remove them incrementally.
|
|
This patch aims at making the combination of single-float and N32/N64
ABI properly work.
Right now when both options are enabled the compiler chooses an
incorrect ABI and in some cases even generates wrong instructions.
The floating-point behavior on MIPS is controlled through three flags:
soft-float, single-float, and fp64. This makes things complicated because
fp64 indicates the presence of 64-bit floating-point registers but
cannot easily be disabled (the mips3 feature requires it, yet mips3 CPUs
with only 32-bit floating point exist). Also, a missing fp64 doesn't
actually disable 64-bit floating-point operations, because certain
MIPS1/2 CPUs support 64-bit floating point with 32-bit registers; hence
the single-float option.
I'm guessing that originally single-float was only intended for the
latter case, and that's the reason why it doesn't properly work on 64bit
targets.
So this patch does the following:
- Make single-float a "master disable": even if fp64 is enabled, it
should completely disable generation of 64-bit floating-point operations,
making it available on targets which hard-require fp64.
- Add proper calling conventions for N32/N64 single-float combinations.
- Fix up codegen to not generate certain 64-bit floating-point operations;
apparently, not assigning a register class to f64 values is not enough to
prevent them from showing up.
- Add tests for the new calling conventions and codegen.
|
|
Fixes #160981
The exponent of a floating-point number is signed. This patch
prevents it from being treated as unsigned.
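As a small illustration of why the signedness matters, the sketch below extracts the unbiased exponent of an IEEE-754 binary32 value. It is independent of the patched code, which is not shown here; `unbiasedExponent` is a hypothetical helper, and the sketch assumes `float` is a 32-bit IEEE-754 type.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Hypothetical helper: returns the unbiased exponent of a normal binary32
// value. The result is signed, because values below 1.0 have negative
// exponents.
inline std::int32_t unbiasedExponent(float value) {
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  const std::uint32_t biased = (bits >> 23) & 0xFFu; // 8-bit exponent field
  assert(biased != 0 && biased != 0xFFu && "only normal values are handled");
  return static_cast<std::int32_t>(biased) - 127;    // remove the bias
}
```

Reinterpreting a negative result such as the exponent of `0.25f` as an unsigned 32-bit integer yields a very large number, which is the kind of mistake a signed type avoids.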
|
|
CSE may replace multiple redundant broadcasts of EVL with a single
broadcast which may have more than 1 user. Adjust the verifier to allow
this.
Fixes a crash when building llvm-test-suite with EVL:
https://lab.llvm.org/buildbot/#/builders/210/builds/3303
|
|
This enables additional DCE/CSE opportunities and ensures that we don't
end up with multiple redundant users of a VPInstruction using EVL. It
fixes a verifier error in the added test_3_inductions test.
|
|
If we have an AVX512 target capable of AVXIFMA but not AVX512IFMA, then we must split 512-bit (or larger) types to 256 bits.
Fixes #160928
|
|
This is a follow-up to #156140, which deprecated one form of write.
We have two forms of read:
  template <typename value_type, std::size_t alignment>
  [[nodiscard]] inline value_type read(const void *memory, endianness endian)

  template <typename value_type, endianness endian, std::size_t alignment>
  [[nodiscard]] inline value_type read(const void *memory)
The difference is that endian is a function parameter in the former
but a template parameter in the latter.
This patch streamlines the code by migrating the use of the latter to
the former while deprecating the latter.
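The difference between the two forms can be sketched with simplified stand-ins. This is not the LLVM implementation: the `sketch` namespace, the byte-by-byte loop, and `kSample` are assumptions made for illustration, the `alignment` parameter is omitted, and only unsigned integer types are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

namespace sketch {
enum class endianness { big, little };

// Form 1: endianness is a function parameter, chosen at run time.
template <typename value_type>
[[nodiscard]] inline value_type read(const void *memory, endianness endian) {
  static_assert(std::is_unsigned<value_type>::value,
                "sketch handles unsigned integers only");
  assert(memory && "memory must not be null");
  const unsigned char *bytes = static_cast<const unsigned char *>(memory);
  value_type result = 0;
  // Consume bytes from most significant to least significant.
  for (std::size_t i = 0; i < sizeof(value_type); ++i) {
    const std::size_t index =
        endian == endianness::little ? sizeof(value_type) - 1 - i : i;
    result = static_cast<value_type>((result << 8) | bytes[index]);
  }
  return result;
}

// Form 2: endianness is a template parameter. Here it simply forwards to
// form 1, which is why call sites can migrate without changing behavior.
template <typename value_type, endianness endian>
[[nodiscard]] inline value_type read(const void *memory) {
  return read<value_type>(memory, endian);
}

// Sample buffer used to exercise both forms.
const unsigned char kSample[4] = {0x12, 0x34, 0x56, 0x78};
} // namespace sketch
```

A call such as `sketch::read<std::uint32_t, sketch::endianness::little>(p)` becomes `sketch::read<std::uint32_t>(p, sketch::endianness::little)` after migration.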
|
|
declarations. (#160749)
This code doesn't work very well, but this makes it work when intrinsic
definitions are present. It now discounts function declarations from
the set of attributes it looks at.
The code would have worked better before
0ab5b5b8581d9f2951575f7245824e6e4fc57dec when module-level attributes
could provide the information used to construct build-attributes.
|
|
This reverts commits 56a1cbb ([LAA] Fix non-NFC parts of 1aded51),
1aded51 ([LAA] Prepare to handle diff type sizes (NFC)). The original
NFC patch caused some regressions, which the later patch tried to fix.
However, the later patch is the cause of some crashes, and it would be
best to revert both for now, and re-land after thorough testing.
|
|
(#160628)
The rotate transformation from
https://github.com/llvm/llvm-project/blob/72c04bb882ad70230bce309c3013d9cc2c99e9a7/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L10312-L10337
has no middle-end equivalent in InstCombine. The following is a port of
that transformation to InstCombine.
---------
Co-authored-by: Yingwei Zheng <dtcxzyw@qq.com>
|
|
This reverts commit aa08b1a9963f33ded658d3ee655429e1121b5212.
|
|
This is primarily to avoid folding a frame index materialized
into an SGPR into the pseudo; this would end up looking like:
%sreg = s_mov_b32 %stack.0
%av_32 = av_mov_b32_imm_pseudo %sreg
Which is not useful.
Match the check used for the b64 case. This is limited to the
pseudo to avoid regression due to gfx908's special case - it
is expecting to pass here with v_accvgpr_write_b32 for illegal
cases, and stay in the intermediate state with an sgpr input.
This avoids regressions in a future patch.
|
|
The naive char-by-char lookup performed OK, but we can skip ahead to the
next match, avoiding all the extra hash lookups in the key map. Likely
there is a faster method than this, but it's already a 42% win in the
BM_Mustache_StringRendering/Escaped benchmark, and an order of magnitude
improvement for BM_Mustache_LargeOutputString.
| Benchmark | Before (ns) | After (ns) | Speedup |
| :--- | ---: | ---: | ---: |
| `StringRendering/Escaped` | 29,440,922 | 16,583,603 | ~44% |
| `LargeOutputString` | 15,139,251 | 929,891 | ~94% |
| `HugeArrayIteration` | 102,148,245 | 95,943,960 | ~6% |
| `PartialsRendering` | 308,330,014 | 303,556,563 | ~1.6% |
Unreported benchmarks, like those for parsing, had no significant
change.
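The skip-ahead idea can be sketched as follows. This is not the code from the patch: `escapeFor`, `escapeSkipAhead`, and the set of escaped characters are assumptions made for illustration. Instead of testing every character, the loop jumps to the next character that needs escaping and copies everything in between as one chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical escape table; the real set of escaped characters may differ.
inline const char *escapeFor(char c) {
  switch (c) {
  case '&':  return "&amp;";
  case '<':  return "&lt;";
  case '>':  return "&gt;";
  case '"':  return "&quot;";
  case '\'': return "&#39;";
  default:   return nullptr;
  }
}

inline std::string escapeSkipAhead(const std::string &input) {
  static const char Special[] = "&<>\"'";
  std::string out;
  out.reserve(input.size());
  std::size_t pos = 0;
  while (pos < input.size()) {
    // Jump straight to the next character that needs escaping.
    const std::size_t next = input.find_first_of(Special, pos);
    if (next == std::string::npos) {
      out.append(input, pos, std::string::npos); // no more matches: copy tail
      break;
    }
    out.append(input, pos, next - pos);          // copy the untouched run
    const char *replacement = escapeFor(input[next]);
    assert(replacement && "Special and escapeFor must agree");
    out += replacement;
    pos = next + 1;
  }
  return out;
}
```

The win comes from copying unescaped runs in bulk, so inputs with few special characters do almost no per-character work.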
|
|
Propagate `!prof` from `switch` instructions.
Issue #147390
|
|
Since v2f32 is legal but v2i32 is not, this causes some sequences of
operations like bitcast (build_vector) to be lowered inefficiently.
|
|
This PR uses the VFS to create the OpenMP target entry instead of going
straight to the real file system. This matches the behavior of other
input files of the compiler.
|
|
This patch is based on https://github.com/llvm/llvm-project/pull/159713
This patch extends AddressSanitizer to support indexed/segment
instructions in RVV. It enables proper instrumentation for these memory
operations.
A new member, `MaybeOffset`, is added to `InterestingMemoryOperand` to
describe the offset between the base pointer and the actual memory
reference address.
Co-authored-by: Yeting Kuo <yeting.kuo@sifive.com>
|
|
This is a temporary fix for a regression from #154875.
The new pattern sets the hi part of V_BFI result and that confuses
si-fix-sgpr-copies - where the proper fix is likely to be.
During si-fix-sgpr-copies, an incorrect fold happens:
%86:vgpr_32 = V_BFI_B32_e64
%87:sreg_32 = COPY %86.hi16:vgpr_32
%95:vgpr_32 = nofpexcept V_PACK_B32_F16_t16_e64 0, killed %87:sreg_32,
0, %63:vgpr_16, 0, 0
into
%86:vgpr_32 = V_BFI_B32_e64
%95:vgpr_32 = nofpexcept V_PACK_B32_F16_t16_e64 0, %86.lo16:vgpr_32, 0,
%63:vgpr_16, 0, 0
Fixes: Vulkan CTS dEQP-VK.glsl.builtin.precision_fp16_storage32b.*.
|
|
Fix M68k build failures caused by
https://github.com/llvm/llvm-project/pull/160797
|
|
permutation instructions (#160763)
In newer SiFive7 cores like X390, permutation instructions like
vrgather.vv operating on an LMUL smaller than a single DLEN can take a
constant number of cycles. For slightly larger data that satisfies the
constraint `log2(SEW/8) + log2(LMUL) <= log2(DLEN / 32)`, these
instructions can take a number of cycles proportional to the square of
LMUL, rather than proportional to VL.
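The constraint can be checked without logarithms, since `log2` is monotonic. The helper below is a hypothetical sketch, not code from the scheduling model; it assumes SEW and DLEN are given in bits and LMUL may be fractional, and the DLEN value in the usage note is illustrative only.

```cpp
#include <cassert>

// Hypothetical helper. log2(SEW/8) + log2(LMUL) <= log2(DLEN/32) is
// equivalent to (SEW/8) * LMUL <= DLEN/32 for positive operands.
inline bool fitsQuadraticLMULConstraint(unsigned sew, double lmul,
                                        unsigned dlen) {
  assert(sew >= 8 && lmul > 0.0 && dlen >= 32 && "operands must be positive");
  return (sew / 8.0) * lmul <= dlen / 32.0;
}
```

With an assumed DLEN of 256, `DLEN / 32` is 8, so SEW=8 with LMUL=8 and SEW=64 with LMUL=1 both satisfy the constraint, while SEW=16 with LMUL=8 does not.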
Co-authored-by: Michael Maitland <michaeltmaitland@gmail.com>
|
|
Selecting vgpr for the uniform version of this pattern may lead to
unnecessary vgpr and waterfall loops.
|
|
Root signature flags can block compilation of certain shader stages.
This PR implements a validation that notifies the user if they compile
a root signature that denies such a shader stage.
Closes: https://github.com/llvm/llvm-project/issues/153062
Previously approved: https://github.com/llvm/llvm-project/pull/153287
---------
Co-authored-by: joaosaffran <joao.saffran@microsoft.com>
Co-authored-by: Joao Saffran <{ID}+{username}@users.noreply.github.com>
Co-authored-by: Joao Saffran <jderezende@microsoft.com>
|
|
values
- Currently, an intrinsic can have at most 9 return values. If new
intrinsics require more than 9 return values, additional IIT_STRUCTxxx
values would need to be added to support them. Instead, this patch
unifies them into a single IIT_STRUCT followed by a byte specifying the
number of return values, from the minimum of 2 (encoded as 0) to the
maximum of 257 (encoded as 255).
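The count byte described above is a simple offset encoding. The two helpers below are hypothetical names used for illustration and are not taken from the patch; they only show the mapping from the range 2..257 to a single byte.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers: map a return-value count in [2, 257] to the byte
// that follows IIT_STRUCT, and back.
inline std::uint8_t encodeStructReturnCount(unsigned count) {
  assert(count >= 2 && count <= 257 && "count out of encodable range");
  return static_cast<std::uint8_t>(count - 2);
}

inline unsigned decodeStructReturnCount(std::uint8_t byte) {
  return static_cast<unsigned>(byte) + 2;
}
```

Offsetting by 2 spends no encodings on counts of 0 or 1, which a struct return never needs, so one byte covers up to 257 values.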
|
|
(#160184)
This PR is updating `Object/DXContainer.h` so that we can read data from
root signature version 1.2, which adds flags into static samplers.
|
|
This patch adds an Offload Wrapper for the SYCL kind. This is an
essential step for SYCL offloading and the compilation flow. The usage
of offload wrapping is added to the clang-linker-wrapper tool.
Modifications:
Implemented `bundleSYCL()` function to handle SYCL image bundling.
Implemented `wrapSYCLBinaries()` function that is invoked from
clang-linker-wrapper.
SYCL Offload Wrapping uses specific data structures such as
`__sycl.tgt_device_image` and `__sycl.tgt_bin_desc`. Each SYCL image
maintains its own symbol table (unlike shared global tables in other
targets). Therefore, symbols are encoded explicitly during the offload
wrapping. Also, images refer to their own Offloading Entries arrays
unlike other targets.
The proposed `__sycl.tgt_device_image` uses Version 3 to differentiate
from images generated by Intel DPC++. The structure proposed in this
patch doesn't have fields deprecated in DPC++.
|
|
Consolidate predicate definitions into top level entry point for PowerPC
target `PPC.td` and
remove duplicate definitions for 32/64 bit sub-target checks.
|
|
This PR updates the YAML representation of DXContainer to support Root
Signature 1.2. This also requires updating the write logic to support
testing.
|
|
This patch adds two small validations to the DirectX backend. First, it
checks that registers in descriptor tables do not overflow, meaning they
don't try to bind registers beyond the maximum allowed value; this is
checked both on the offset and on the number of descriptors inside the
range. Second, it checks whether samplers are being mixed with other
resource types.
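The first check amounts to an overflow-safe range comparison. The sketch below is an assumption about its general shape, not the backend's code: `lastRegister` and `rangeOverflows` are hypothetical names, the maximum defaults to the largest 32-bit value, and any special count values the real format may define are ignored.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: index of the last register bound by a non-empty range.
// Computed in 64 bits so the addition itself cannot wrap around.
inline std::uint64_t lastRegister(std::uint32_t base, std::uint32_t count) {
  assert(count > 0 && "range must be non-empty");
  return static_cast<std::uint64_t>(base) + count - 1;
}

// Returns true when the range would bind a register past maxValue.
inline bool rangeOverflows(std::uint32_t base, std::uint32_t count,
                           std::uint32_t maxValue = UINT32_MAX) {
  if (count == 0)
    return false; // an empty range binds nothing
  return lastRegister(base, count) > maxValue;
}
```

The same comparison could be applied to an offset within the table by passing the offset as `base`.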
Closes: #153057, #153058
---------
Co-authored-by: joaosaffran <joao.saffran@microsoft.com>
Co-authored-by: Joao Saffran <{ID}+{username}@users.noreply.github.com>
Co-authored-by: Joao Saffran <jderezende@microsoft.com>
|
|
SMLoc itself encapsulates just a pointer, so there is no need to pass or
return it by reference.
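The reasoning can be illustrated with a stand-in type. `Loc` and `advanceBy` below are hypothetical and only mimic the shape described in the message (a class wrapping a single pointer); they are not the definition of `SMLoc`.

```cpp
#include <cassert>
#include <type_traits>

// Stand-in for a location type that wraps one pointer.
class Loc {
  const char *Ptr = nullptr;

public:
  Loc() = default;
  static Loc fromPointer(const char *P) {
    Loc L;
    L.Ptr = P;
    return L;
  }
  const char *getPointer() const { return Ptr; }
  bool isValid() const { return Ptr != nullptr; }
};

// The wrapper is exactly pointer-sized and trivially copyable, so passing it
// by value costs the same as passing the raw pointer.
static_assert(sizeof(Loc) == sizeof(const char *), "Loc is pointer-sized");
static_assert(std::is_trivially_copyable<Loc>::value,
              "Loc is trivially copyable");

// Taking and returning by value avoids an extra indirection.
inline Loc advanceBy(Loc L, unsigned N) {
  assert(L.isValid() && "cannot advance an invalid location");
  return Loc::fromPointer(L.getPointer() + N);
}
```

A `const Loc &` parameter would instead pass a pointer to a pointer, adding a level of indirection without saving any copying.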
|
|
Fix line endings to Unix style by running dos2unix on this file.
|
|
This PR changes `llvm::FileCollector` to use the `llvm::vfs::FileSystem`
API for making file paths absolute instead of using
`llvm::sys::fs::make_absolute()` directly. This matches the behavior of
the compiler on most other input files.
|
|
(#160294)
This is essentially the same patch as
116ca9522e89f1e4e02676b5bbe505e80c4d4933; when trying to match a physreg
hint, try to find a compatible physreg if there is a subregister copy.
This has the slight difference of using getSubReg on the hint instead of
getMatchingSuperReg (the other use should also use getSubReg instead,
it's faster).
At the moment this turns out to have very little effect. The adjacent
code needs better handling of subregisters, so continue adding this
piecemeal. The X86 test shows a net reduction in real instructions, plus
a few new kills.
|
|
[nfc]" (#160897)
Reverts llvm/llvm-project#160765. Failures on buildbot indicate that the
second assertion does not in fact hold.
|
|
Uses the existing format of the LiveRange printer, and just factors it
out so that you can do vni->dump() when debugging, or log a vni in a
debug print statement.
|
|
We should always be able to find the VNInfo in the original live
interval which corresponds to the subset we're trying to spill, and the
only cases where we have a VNInfo without a definition instruction are
if the vni is unused, or corresponds to a phi. Adjust the code structure
to explicitly check for PHIDef, and assert the stronger conditions.
|
|
We didn't have trace logging for two cases in this routine which makes
it sometimes hard to tell what is going on. In addition to debug trace
statements, add comments to explain the logic behind the early exits
which don't mark the virtual register live. Suggestions on how to word
these more precisely very welcome; I'm not clear I understand all the
intricacies of this code myself.
|
|
This heuristic was originally added in 40c4aa with the stated purpose of
avoiding global split on long live ranges created by MachineLICM
hoisting trivially rematerializable instructions. In the meantime,
various backends have introduced non-trivial rematerialization cases,
MachineLICM gained an explicit triviality check, and we've reworked
our APIs to match naming-wise. Let's move this heuristic back to truly
trivial remat only.
This is a functional change, though somewhat hard to hit. This change
will cause non-trivially rematerializable instructions to be globally
split more often. This is likely a good thing since non-trivial remat
may not be legal at all possible points in the live interval, but may
cost slightly more compile time.
I don't have a motivating example; I found it when reviewing the callers
of isReMaterializable(MI).
|
|
Enable the generation of no-loop kernels for Fortran OpenMP code. target
teams distribute parallel do pragmas can be promoted to no-loop kernels
if the user adds the -fopenmp-assume-teams-oversubscription and
-fopenmp-assume-threads-oversubscription flags.
If the OpenMP kernel contains reduction or num_teams clauses, it is not
promoted to no-loop mode.
The global OpenMP device RTL oversubscription flags no longer force
no-loop code generation for Fortran.
|