Age | Commit message (Collapse) | Author | Files | Lines |
|
Created using spr 1.3.4
[skip ci]
|
|
|
|
This reverts commit d6713ad80d6907210c629f22babaf12177fa329c.
This change was reverted because of greendragon failures such as:
Unresolved Tests (2):
lldb-api :: debuginfod/Normal/TestDebuginfod.py
lldb-api :: debuginfod/SplitDWARF/TestDebuginfodDWP.py
|
|
tail-folding mode using EVL. (#76172)
This patch introduces generating VP intrinsics in the Loop Vectorizer.
Currently the Loop Vectorizer supports vector predication in a very
limited capacity via tail-folding and masked load/store/gather/scatter
intrinsics. However, this does not let architectures with active vector
length predication support take advantage of their capabilities.
Architectures with general masked predication support also can only take
advantage of predication on memory operations. By giving the Loop
Vectorizer a way to generate Vector Predication intrinsics, which (will)
provide a target-independent way to model predicated vector
instructions, these architectures can make better use of their
predication capabilities.
Our first approach (implemented in this patch) builds on top of the
existing tail-folding mechanism in the LV (just adds a new tail-folding
mode using EVL), but instead of generating masked intrinsics for memory
operations it generates VP intrinsics for load/store instructions. The
patch adds a new VPlanTransforms transformation to replace the wide
header predicate compare with EVL and updates codegen for loads/stores
to use VP store/load with EVL.
Another important part of this approach is how the Explicit Vector Length
(EVL) is computed. (VP intrinsics take this vector length as a
parameter.) We use an experimental intrinsic, `get_vector_length`, that
can be lowered to architecture-specific instruction(s) to compute EVL.
The patch also adds a new recipe to emit instructions for computing EVL. Using
VPlan in this way will eventually help build and compare VPlans
corresponding to different strategies and alternatives.
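As a rough illustration of EVL-based tail folding, here is a scalar C++ model of the loop shape. It is a sketch, not the vectorizer's output. The helper names `explicitVectorLength` and `addWithEVL` are hypothetical, and modelling EVL as `min(remaining, VF)` is an assumption, since a real target may lower `get_vector_length` differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the value get_vector_length would yield:
// the number of lanes to process in this iteration, capped at VF.
// Using min() here is an assumption made for illustration.
std::size_t explicitVectorLength(std::size_t remaining, std::size_t vf) {
  return std::min(remaining, vf);
}

// Computes c[i] = a[i] + b[i] in chunks of at most `vf` lanes.
// There is no scalar remainder loop: the last chunk runs with a smaller EVL.
std::vector<int> addWithEVL(const std::vector<int> &a,
                            const std::vector<int> &b, std::size_t vf) {
  assert(vf > 0 && "vectorization factor must be non-zero");
  const std::size_t n = std::min(a.size(), b.size());
  std::vector<int> c(n);
  for (std::size_t i = 0; i < n;) {
    const std::size_t evl = explicitVectorLength(n - i, vf);
    // Models VP loads and a VP store that touch only the first `evl` lanes.
    for (std::size_t lane = 0; lane < evl; ++lane)
      c[i + lane] = a[i + lane] + b[i + lane];
    i += evl; // the induction variable advances by EVL, not by VF
  }
  return c;
}
```

With five elements and `vf` equal to 4, the loop runs twice: once with an EVL of 4 and once with an EVL of 1.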
Differential Revision: https://reviews.llvm.org/D99750
|
|
This is required to set target-specific code generation options for
Android, like using the TLS slot for the stack protector.
|
|
This adds build configuration for building LLDB on macOS and Linux. It
uses a default subset of features that should work out of the box with
macOS + Ubuntu. It is notably missing python support right now, although
some of the scaffolding is there, because of the complexity of linking a
python dylib, especially if you plan to distribute the resulting
liblldb.so.
Most of this build file is pretty simple. One of the unfortunate
patterns I had to use was to split the header and sources cc_library
targets to break circular dependencies.
|
|
The `--gcc-toolchain` and `--gcc-install-dir` options were previously only visible to the Clang driver, but not Flang. These determine which assembler, linker, and libraries to use, e.g. for cross-compiling, and therefore are relevant for Flang as well.
Tests are implemented using a mock GCC installation in `basic_cross_linux_tree` copied over from Clang's tests. The Clang driver already contains tests with `--driver-mode=flang` but `flang-new` is an entirely different executable (containing the `-fc1` stage) that should be tested as well. While not all files in `basic_cross_linux_tree` are strictly needed for testing those two driver flags, they will be needed for flags added in the future, such as `--rtlib`.
Also remove the entry `*.o` in flang's `.gitignore` since `crt*.o` files are needed in the GCC mock installation.
Fixes #86729
|
|
|
|
This adds support for using the L and H argument modifiers for twinword
operands in inline asm code, such as in:
```
%1 = tail call i64 asm sideeffect "rd %pc, ${0:L} ; srlx ${0:L}, 32, ${0:H}", "={o4}"()
```
This is needed by the Linux kernel.
|
|
Presence of `cutoff-hot` or `random-skip-rate`
should be enough to trigger optimization.
|
|
They are supposed to be used with `getNumOccurrences`.
|
|
https://github.com/llvm/llvm-project/commit/5aeb604c7ce417eea110f9803a6c5cb1cdbc5372
https://buildkite.com/llvm-project/upstream-bazel/builds/93859
|
|
Downstream is having some problems caused by math-macros.h. These will
be fixed properly soon.
See https://github.com/llvm/llvm-project/issues/87683 for tracking this
tech debt.
|
|
The default cutoff is not useful here. The decision whether
to enable the sanitizer has a more significant
performance impact than typical optimizations
that rely on `profile-summary-cutoff-hot`.
|
|
The header file includes windows.h in a mean-and-lean way to avoid
bringing in names that may conflict with Flang code.
|
|
to desugars_to.h (#87337)
This improves compile times and memory usage slightly and removes some
boilerplate.
|
|
iter_args (#87019)
As part of this extension this change also does some general cleanup
1) Make all the methods take `RewriterBase` as arguments instead of
creating their own builders that tend to crash when used within
pattern rewrites
2) Split `coalesePerfectlyNestedLoops` into two separate methods, one
for `scf.for` and other for `affine.for`. The templatization didnt
seem to be buying much there.
Also general clean up of tests.
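For readers unfamiliar with loop coalescing, the index arithmetic behind it can be sketched in plain C++. This is only an illustration of the idea. The function names are hypothetical, and the real utility rewrites `scf.for` or `affine.for` operations in IR rather than running loops.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using Index2D = std::pair<std::size_t, std::size_t>;

// Iteration order of a perfectly nested two-level loop.
std::vector<Index2D> visitNested(std::size_t n, std::size_t m) {
  std::vector<Index2D> order;
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < m; ++j)
      order.emplace_back(i, j);
  return order;
}

// The same iteration space as a single loop over n * m iterations.
// The original indices are recovered with a division and a remainder.
std::vector<Index2D> visitCoalesced(std::size_t n, std::size_t m) {
  std::vector<Index2D> order;
  for (std::size_t k = 0; k < n * m; ++k)
    order.emplace_back(k / m, k % m); // not reached when m == 0
  return order;
}
```

Both functions visit the same index pairs in the same order, which is the property a coalescing rewrite has to preserve.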
|
|
This patch refactors the serialization of MemProf data to a switch
statement style:
  switch (Version) {
  case Version0:
    return ...;
  case Version1:
    return ...;
  }
just like IndexedMemProfRecord::serialize.
A reasonable amount of code is shared and factored out to helper
functions between writeMemProfV0 and writeMemProfV1 to the extent that
it doesn't hamper readability.
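A minimal sketch of this dispatch style, with one writer per version and a shared helper, might look as follows. All types and function names here are hypothetical stand-ins, not the real MemProf code. In particular, the version tag that `writeV1` emits is invented purely to make the two formats differ.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical version tags and record type (not the real MemProf types).
enum class Version : std::uint64_t { Version0 = 0, Version1 = 1 };
struct Record {
  std::uint64_t id;
  std::uint64_t size;
};
using Buffer = std::vector<std::uint64_t>;

// Helper shared by both per-version writers.
static void writeRecords(Buffer &out, const std::vector<Record> &records) {
  out.push_back(records.size());
  for (const Record &r : records) {
    out.push_back(r.id);
    out.push_back(r.size);
  }
}

static bool writeV0(Buffer &out, const std::vector<Record> &records) {
  writeRecords(out, records);
  return true;
}

static bool writeV1(Buffer &out, const std::vector<Record> &records) {
  // Invented difference: this sketch's V1 prefixes the payload with a tag.
  out.push_back(static_cast<std::uint64_t>(Version::Version1));
  writeRecords(out, records);
  return true;
}

// Top-level entry point: one switch, one case per supported version.
bool writeProfile(Buffer &out, const std::vector<Record> &records,
                  Version version) {
  switch (version) {
  case Version::Version0:
    return writeV0(out, records);
  case Version::Version1:
    return writeV1(out, records);
  }
  return false; // unknown version
}
```

Keeping the switch at the top level makes it easy to see where a future version would be added.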
|
|
boundaries" (#87699)
Reverts llvm/llvm-project#79173
The testcase fails in non-asserts builds.
|
|
Even if __need_unreachable is set, stddef.h should not declare
unreachable() in C++ because it conflicts with the declaration in
<utility>.
|
|
|
|
We don't need a Clang builtin for this one.
It was copy-pasted from `__builtin_allow_runtime_check`.
RFC:
https://discourse.llvm.org/t/rfc-add-llvm-experimental-hot-intrinsic-or-llvm-hot/77641
|
|
According to
https://docs.nvidia.com/hpc-sdk/compilers/cuda-fortran-prog-guide/#cfpg-var-qual-attr-device
> A device array may be an explicit-shape array, an allocatable array,
or an assumed-shape dummy array.
Assumed-size arrays are not supported. This patch adds an error for that
case.
|
|
ODS was still generating the old `Operation::setAttr` hooks for ODS
methods for setting attributes, when the backing implementation of the
attributes was changed to properties. No idea how this wasn't noticed
until now.
|
|
|
|
This PR should fix the parsing bug reported in
https://github.com/llvm/llvm-project/issues/87430. It allows using
result number as the `cf.switch` operand.
|
|
There is no reason to limit the minimum to two pages.
|
|
Main change is replacing DEFAULT with HOT99.
I'll remove DEFAULT related functionality in the followup patches.
|
|
|
|
We need to check that the externally used value can be represented with
the BitWidth before applying it; otherwise, the wider type must be kept.
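The kind of representability check described here can be sketched in standalone C++. This is an illustration under assumptions, not the code from the patch. The function name `fitsInBitWidth` and the use of `int64_t` as the value type are hypothetical.

```cpp
#include <cstdint>

// Returns true if `value` can be truncated to `bitWidth` bits and then
// sign-extended (isSigned) or zero-extended (!isSigned) back without loss.
bool fitsInBitWidth(std::int64_t value, unsigned bitWidth, bool isSigned) {
  if (bitWidth == 0)
    return false;
  if (bitWidth >= 64)
    return true; // every 64-bit pattern fits in 64 or more bits
  if (isSigned) {
    const std::int64_t min = -(std::int64_t{1} << (bitWidth - 1));
    const std::int64_t max = (std::int64_t{1} << (bitWidth - 1)) - 1;
    return value >= min && value <= max;
  }
  // Unsigned: all bits above bitWidth must be zero, which also rejects
  // negative inputs because their high bits are set.
  return (static_cast<std::uint64_t>(value) >> bitWidth) == 0;
}
```

A caller would narrow to `bitWidth` only when the check succeeds and keep the wider type otherwise.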
|
|
|
|
Clang 16 is no longer actively supported. However,
the FreeBSD runner is still using it, so some tests still guard
against Clang 16.
|
|
I believe I've got the tests properly configured to only run on Linux
x86(_64), as I don't have a Linux AArch64/Arm device to diagnose what's
going wrong with the tests (I suspect there's some issue with generating
`.note.gnu.build-id` sections...)
The actual code fixes have now been reviewed 3 times:
https://github.com/llvm/llvm-project/pull/79181 (moved shell tests to
API tests), https://github.com/llvm/llvm-project/pull/85693 (Changed
some of the testing infra), and
https://github.com/llvm/llvm-project/pull/86812 (didn't get the tests
configured quite right). The Debuginfod integration for symbol
acquisition in LLDB now works with the `executable` and `debuginfo`
Debuginfod network requests working properly for normal, `objcopy
--only-keep-debug` stripped, split-dwarf, and `objcopy
--only-keep-debug` stripped *plus* split-dwarf symbols/binaries.
The reasons for the multiple attempts have been tests on platforms I
don't have access to (Linux AArch64/Arm + MacOS x86_64). I believe I've
got the tests properly disabled for everything except for Linux x86(_64)
now. I've built & tested on MacOS AArch64 and Linux x86_64.
---------
Co-authored-by: Kevin Frei <freik@meta.com>
|
|
Summary:
This synchronization should be done before we handle the logic relating
to closing the port. This isn't majorly important now but it would break
if we ever decided to run a server on the GPU.
|
|
|
|
The existing heuristics were assuming that every core behaves like an
Apple A7, where any extend/shift costs an extra micro-op... but in
reality, nothing else behaves like that.
On some older Cortex designs, shifts by 1 or 4 cost extra, but all other
shifts/extensions are free. On all other cores, as far as I can tell,
all shifts/extensions for integer loads are free (i.e. the same cost as
an unshifted load).
To reflect this, this patch:
- Enables aggressive folding of shifts into loads by default.
- Removes the old AddrLSLFast feature, since it applies to everything
except A7 (and even if you are explicitly targeting A7, we want to
assume extensions are free because the code will almost always run on a
newer core).
- Adds a new feature AddrLSLSlow14 that applies specifically to the
Cortex cores where shifts by 1 or 4 cost extra.
I didn't add support for AddrLSLSlow14 on the GlobalISel side because it
would require a bunch of refactoring to work correctly. Someone can pick
this up as a followup.
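The cost model described above can be summarized in a small C++ sketch. Only the feature name AddrLSLSlow14 comes from this message. The `Subtarget` struct and the function `isShiftFoldFree` are hypothetical and do not mirror the backend's actual interfaces.

```cpp
// Hypothetical subtarget description. The single flag corresponds to the
// AddrLSLSlow14 feature named in the commit message.
struct Subtarget {
  bool addrLSLSlow14 = false;
};

// Returns true if folding a left shift by `shift` into a load's addressing
// mode is modelled as free, i.e. the same cost as an unshifted load.
bool isShiftFoldFree(const Subtarget &st, unsigned shift) {
  // On the affected older Cortex cores, shifts by 1 or 4 cost extra.
  if (st.addrLSLSlow14 && (shift == 1 || shift == 4))
    return false;
  // Everywhere else, folding is assumed to be free.
  return true;
}
```

Under this model a default subtarget always folds, and only the flagged cores decline for the two slow shift amounts.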
|
|
load/store/gather/scatter
Noticed while starting triage for #87640
|
|
Depends on #87545
Emit `GNU_PROPERTY_AARCH64_FEATURE_PAUTH` property in
`.note.gnu.property` section depending on
`aarch64-elf-pauthabi-platform` and `aarch64-elf-pauthabi-version` llvm
module flags.
|
|
It matches up with other _attribute_ adding member functions and helps
simplify InterfaceFile assignment for InstallAPI.
|
|
There is one notable "regression". This patch replaces the bespoke `or
disjoint` logic with a direct match. This means we fail some
simplifications during `instsimplify`.
All the cases we fail in `instsimplify` we do handle in `instcombine`
as we add `disjoint` flags.
Other than that, just some basic cases.
See proofs: https://alive2.llvm.org/ce/z/_-g7C8
Closes #86083
|
|
|
|
In `(icmp eq (and x,y), C)` all 1s in `C` must also be set in both
`x`/`y`.
In `(icmp eq (or x,y), C)` all 0s in `C` must also be clear in both
`x`/`y`.
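Both implications can be checked exhaustively on 8-bit values. The following C++ sketch does that. The struct `KnownBits8` and the function names are hypothetical and only loosely echo LLVM's known-bits terminology.

```cpp
#include <cstdint>

// Bits proven to be 1 (`ones`) or 0 (`zeros`) in an operand.
struct KnownBits8 {
  std::uint8_t ones;
  std::uint8_t zeros;
};

// Given (x & y) == c, every 1 bit of c is a 1 bit in both x and y.
KnownBits8 knownFromAndEq(std::uint8_t c) { return {c, 0}; }

// Given (x | y) == c, every 0 bit of c is a 0 bit in both x and y.
KnownBits8 knownFromOrEq(std::uint8_t c) {
  return {0, static_cast<std::uint8_t>(~c)};
}

// Verifies both deductions for every pair of 8-bit operands.
bool verifyExhaustively() {
  for (unsigned x = 0; x < 256; ++x) {
    for (unsigned y = 0; y < 256; ++y) {
      const KnownBits8 a = knownFromAndEq(static_cast<std::uint8_t>(x & y));
      if ((x & a.ones) != a.ones || (y & a.ones) != a.ones)
        return false;
      const KnownBits8 o = knownFromOrEq(static_cast<std::uint8_t>(x | y));
      if ((x & o.zeros) != 0 || (y & o.zeros) != 0)
        return false;
    }
  }
  return true;
}
```

The exhaustive loop is a sanity check on the reasoning and is not a replacement for the Alive2-style proofs used for the actual transforms.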
Closes #87143
|
|
x,y), C)`; NFC
|
|
`PromotableOpInterface` (#86792)
Add `requiresReplacedValues` and `visitReplacedValues` methods to
`PromotableOpInterface`. These methods allow `PromotableOpInterface` ops
to transform definitions mutated by a `store`.
This change is necessary to correctly handle the promotion of
`LLVM_DbgDeclareOp`.
---------
Co-authored-by: Théo Degioanni <30992420+Moxinilian@users.noreply.github.com>
|
|
LLVMgold.so can be used with GNU ar, gold, ld, and nm to process LLVM
bitcode files. Install it in LLVM_INSTALL_TOOLCHAIN_ONLY=on builds like
we install libLTO.so.
Suggested by @emelife
Fix #84271
|
|
commit d89914f30bc7c180fe349a5aa0f03438ae6c20a4
Author: Kazu Hirata <kazu@google.com>
Date: Wed Apr 3 21:48:38 2024 -0700
changed RecordWriterTrait to a template class with IndexedVersion as a
template parameter. This patch changes the class back to a
non-template one while retaining the ability to serialize multiple
versions.
The reason I changed RecordWriterTrait to a template class was
because, even if RecordWriterTrait had IndexedVersion as a member
variable, RecordWriterTrait::EmitKeyDataLength, being a static
function, would not have access to the variable.
Since OnDiskChainedHashTableGenerator calls EmitKeyDataLength as:
  const std::pair<offset_type, offset_type> &Len =
      InfoObj.EmitKeyDataLength(Out, I->Key, I->Data);
we can make EmitKeyDataLength a member function, but we have one
problem. InstrProfWriter::writeImpl calls:
  void insert(typename Info::key_type_ref Key,
              typename Info::data_type_ref Data) {
    Info InfoObj;
    insert(Key, Data, InfoObj);
  }
which default-constructs RecordWriterTrait without a specific version
number. This patch fixes the problem by adjusting
InstrProfWriter::writeImpl to call the other form of insert instead:
  void insert(typename Info::key_type_ref Key,
              typename Info::data_type_ref Data, Info &InfoObj)
To prevent an accidental invocation of the default constructor of
RecordWriterTrait, this patch deletes the default constructor.
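The resulting design can be sketched in self-contained C++. The classes `WriterTrait` and `TableGenerator` are simplified, hypothetical stand-ins for RecordWriterTrait and OnDiskChainedHashTableGenerator. The length computation, including the extra word for `Version1`, is invented for illustration.

```cpp
#include <cstdint>
#include <type_traits>
#include <utility>
#include <vector>

enum IndexedVersion : std::uint64_t { Version0 = 0, Version1 = 1 };

// Simplified stand-in for RecordWriterTrait: a non-template class that
// carries the version as a member.
class WriterTrait {
public:
  WriterTrait() = delete; // a trait without a version must not exist
  explicit WriterTrait(IndexedVersion v) : version(v) {}

  // Non-static, so it can read `version`.
  std::pair<std::uint64_t, std::uint64_t>
  EmitKeyDataLength(std::uint64_t key,
                    const std::vector<std::uint64_t> &data) const {
    const std::uint64_t keyLen = sizeof(key);
    std::uint64_t dataLen = data.size() * sizeof(std::uint64_t);
    if (version == Version1)
      dataLen += sizeof(std::uint64_t); // invented: one extra word in V1
    return {keyLen, dataLen};
  }

private:
  IndexedVersion version;
};

static_assert(!std::is_default_constructible<WriterTrait>::value,
              "default construction is deliberately disabled");

// Simplified stand-in for the table generator. Only the insert overload
// that takes an existing Info object is provided.
template <typename Info> class TableGenerator {
public:
  void insert(std::uint64_t key, const std::vector<std::uint64_t> &data,
              Info &info) {
    lengths.push_back(info.EmitKeyDataLength(key, data));
  }
  std::vector<std::pair<std::uint64_t, std::uint64_t>> lengths;
};
```

Because the default constructor is deleted, any attempt to create the trait without a version fails at compile time instead of silently picking one.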
|
|
|
|
|
|
|
|
|