Age | Commit message (Collapse) | Author | Files | Lines |
|
|
|
|
|
|
|
|
|
This patch reworks the way that wsloop reduction operations function to better
match the expected semantics from the OpenMP specification, following the rework
of parallel reductions.
The new semantics create a private reduction variable as a block argument which
should be used normally for all operations on that variable in the region; this
private variable is then combined with the others into the shared variable. This
way no special omp.reduction operations are needed inside the region. These
block arguments follow the loop control block arguments.
|
|
This patch changes how the macro __ARM_ARCH is defined to match its
defintion in the ACLE. In ACLE 5.4.1, __ARM_ARCH is defined as equal to
the major architecture version for ISAs up to and including v8. From
v8.1 onwards, its definition is changed to include minor versions, such
that for an architecture vX.Y, __ARM_ARCH = X*100 + Y. Before this
patch, LLVM defined __ARM_ARCH using only the major architecture version
for all architecture versions. This patch adds functionality to define
__ARM_ARCH correctly for architectures greater than or equal to v8.1.
|
|
This uses
https://pygithub.readthedocs.io/en/stable/github_objects/Repository.html?highlight=get_collaborator_permission#github.Repository.Repository.get_collaborator_permission.
Which does
https://docs.github.com/en/rest/collaborators/collaborators?apiVersion=2022-11-28#get-repository-permissions-for-a-user
and returns the top level "permission" key.
This is less detailed than the user/permissions key but should be fine
for this
use case.
When a review is submitted we check:
* If it's an approval.
* Whether we have already left a merge on behalf comment (by looking for
a hidden HTML comment).
* Whether the author has permissions to merge their own PR.
* Whether the reviewer has permissions to merge.
If needed we leave a comment tagging the reviewer. If the reviewer also
doesn't have merge permission, then it asks them to find someone else
who does.
|
|
|
|
Apparently, some compilers [correctly] warn that the variable that was
created prior to this change is unused.
This reemoves the variable.
|
|
Historically TableGen has used `A.swap(B)` to move containers without
the expense of copying them. Perhaps this predated rvalue references. In
any case `A = std::move(B)` seems like a more direct way to implement
this when only A is required after the operation.
|
|
This reapplies commit bdde5f9 by undoing the revert bc66e0c.
The previous reapplication 5c9f768 was reverted due to a crash
(reproducer in comments for 5c9f768) which was fixed in #81595.
As noted in the original commit, this commit may break downstream tests.
If this commit is breaking your downstream tests, please see comment 12 in
[0], which documents the kind of variation in tests we'd expect to see from
this change and what to do about it.
[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939
|
|
(#81600)
It's not interesting for majority of downstream users.
|
|
|
|
|
|
'serial', 'parallel', and 'kernel' constructs are all considered
'Compute' constructs. This patch creates the AST type, plus the required
infrastructure for such a type, plus some base types that will be useful
in the future for breaking this up.
The only difference between the three is the 'kind'( plus some minor
clause legalization rules, but those can be differentiated easily
enough), so rather than representing them as separate AST nodes, it
seems
to make sense to make them the same.
Additionally, no clause AST functionality is being implemented yet, as
that fits better in a separate patch, and this is enough to get the
'naked' constructs implemented.
This is otherwise an 'NFC' patch, as it doesn't alter execution at all,
so there aren't any tests. I did this to break up the review workload
and to get feedback on the layout.
|
|
Just emit their satisfaction state, which is what the current
interpreter does as well.
|
|
In a few places we test whether sets (i.e. sorted ranges) intersect by
computing the set_intersection and then testing whether it is empty. For
this purpose it should be more efficient to use a std:vector instead of
a std::set to hold the result of the set_intersection, since insertion
is simpler.
|
|
Introduce `mcdc::DecisionParameters` and `mcdc::BranchParameters` and make
sure them not initialized as zero.
FIXME: Could we make `CoverageMappingRegion` as a smart tagged union?
|
|
zOS doesn't support aligned allocation, so mark these testcases as
unsupported.
Continuation of https://reviews.llvm.org/D102798
|
|
Allocate storage and initialize it with the given APValue contents.
|
|
This fixes a crash when lowering an extract_subvector like:
t0:v1i64 = extract_subvector t1:v2i64, 1
Whilst we never need a vslidedown with M1 on scalable vector types, we might
need to do it for v1i64/v1f64, since the smallest container type for it is
nxv1i64/nxv1f64.
The lowering code is still correct for this case, but the assertion was too
strict. The actual invariant we're relying on is that ContainerSubVecVT's LMUL
<= M1, not < M1. Hence why we handled v2i32 fine, because its container type
was nxv1i32 and MF2.
|
|
|
|
Common backends (LLVM, SPIR-V) only supports 1D vectors, LLVM conversion
handles ND vectors (N >= 2) as `array<array<... vector>>` and SPIR-V
conversion doesn't handle them at all at the moment. Sometimes it's
preferable to treat multidim vectors as linearized 1D. Add pass to do
this. Only constants and simple elementwise ops are supported for now.
@krzysz00 I've extracted yours result type conversion code from
LegalizeToF32 and moved it to common place.
Also, add ConversionPattern class operating on traits.
|
|
Adds a test to help document Linalg Ops that are currently not supported
by the vectoriser (i.e. the logic to vectorise these is missing). The
list is not exhaustive.
|
|
Fix crash raised in comments for 5c9f7682b090124d9a8b69f92d3f7c269dca25fc
|
|
|
|
directives (#81081)
This patch adds support for the depend clause in a number of OpenMP
directives/constructs related to offloading. Specifically, it adds the
handling of the depend clause when it is used with the following
constructs
- target
- target enter data
- target update data
- target exit data
|
|
Without this I would hit errors with libstdc++-12 like:
/usr/include/c++/12/bits/stl_iterator_base_funcs.h:230:5: note:
candidate template ignored: substitution failure [with _InputIterator =
llvm::const_set_bits_iterator_impl<llvm::BitVector>]: argument may not
have 'void' type
next(_InputIterator __x, typename
^
|
|
|
|
The 1-D case directly maps to LLVM intrinsics. The n-D case will be
handled by unrolling to 1-D first (in a later patch).
Depends on: #80965
|
|
Although in a normal implementation the assumption is reasonable, it
seems that some esoteric implementation are not returning a T&. This
should be handled correctly and the values be propagated.
---------
Co-authored-by: martinboehme <mboehme@google.com>
|
|
The motivation here was a suggestion over in Compiler Explorer. You can
use `-mllvm` already to do this but since gfortran supports `-masm`, I
figured I'd try to add it.
This is done by flang expanding `-masm` into `-mllvm x86-asm-syntax=`,
then passing that to fc1. Which then collects all the `-mllvm` options
and forwards them on.
The code to expand it comes from clang `Clang::AddX86TargetArgs` (there
are some other places doing the same thing too). However I've removed
the `-inline-asm` that clang adds, as fortran doesn't have inline
assembly.
So `-masm` for flang purely changes the style of assembly output.
```
$ ./bin/flang-new /tmp/test.f90 -o - -S -target x86_64-linux-gnu
<...>
pushq %rbp
$ ./bin/flang-new /tmp/test.f90 -o - -S -target x86_64-linux-gnu -masm=att
<...>
pushq %rbp
$ ./bin/flang-new /tmp/test.f90 -o - -S -target x86_64-linux-gnu -masm=intel
<...>
push rbp
```
The test is adapted from `clang/test/Driver/masm.c` by removing the
clang-cl related lines and changing the 32 bit triples to 64 bit triples
since flang doesn't support 32 bit targets.
|
|
This patch adds full support for linking SystemZ (ELF s390x) object
files. Support should be generally complete:
- All relocation types are supported.
- Full shared library support (DYNAMIC, GOT, PLT, ifunc).
- Relaxation of TLS and GOT relocations where appropriate.
- Platform-specific test cases.
In addition to new platform code and the obvious changes, there were a
few additional changes to common code:
- Add three new RelExpr members (R_GOTPLT_OFF, R_GOTPLT_PC, and
R_PLT_GOTREL) needed to support certain s390x relocations. I chose not
to use a platform-specific name since nothing in the definition of these
relocs is actually platform-specific; it is well possible that other
platforms will need the same.
- A couple of tweaks to TLS relocation handling, as the particular
semantics of the s390x versions differ slightly. See comments in the
code.
This was tested by building and testing >1500 Fedora packages, with only
a handful of failures; as these also have issues when building with LLD
on other architectures, they seem unrelated.
Co-authored-by: Tulio Magno Quites Machado Filho <tuliom@redhat.com>
|
|
Use templates instead.
Part of <https://github.com/llvm/llvm-project/issues/62629>.
|
|
|
|
|
|
bugprone-unused-local-non-trivial-variable (#81563)
|
|
This reverts commit a034e65e972175a2465deacb8c78bc7efc99bd23.
Some protobuf users reported that this patch caused a significant
compile-time regression because `TailDuplicator` works poorly with a
specific pattern.
We will reland it once the codegen issue is fixed.
|
|
The strictfp attribute has the requirement that "LLVM will not introduce
any new floating-point instructions that may trap". The llvm.is.fpclass
intrinsic is documented as "The function never raises floating-point
exceptions", and the fcmp instruction may raise one, so we can't
transform the former into the latter in functions with the strictfp
attribute.
|
|
llvm.dbg.assign intrinsics have 2 {value, expression} pairs; fix hwasan to
update the second expression.
Fixes #76545. This is #78606 rebased and with the addition of DPValue handling.
Note the addition of --try-experimental-debuginfo-iterators in the tests and
some shuffling of code in MemoryTaggingSupport.cpp.
|
|
|
|
This function will be useful when we change the behavior of record-type
prvalues
so that they directly initialize the associated result object. See also
the
comment here for more details:
https://github.com/llvm/llvm-project/blob/9e73656af524a2c592978aec91de67316c5ce69f/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h#L354
As part of this patch, we document and assert that synthetic fields may
not have
reference type.
There is no practical use case for this: A `StorageLocation` may not
have
reference type, and a synthetic field of the corresponding non-reference
type
can serve the same purpose.
|
|
|
|
Currently, `phaseParity` argument of `nvgpu.mbarrier.try_wait.parity` is
index. This can cause a problem if it's passed any value different than
0 or 1. Because the PTX instruction only accepts even or odd phase. This
PR makes phaseParity argument i1 to avoid misuse.
Here is the information from PTX doc:
```
The .parity variant of the instructions test for the completion of the phase indicated
by the operand phaseParity, which is the integer parity of either the current phase or
the immediately preceding phase of the mbarrier object. An even phase has integer
parity 0 and an odd phase has integer parity of 1. So the valid values of phaseParity
operand are 0 and 1.
```
See for more information:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-mbarrier-test-wait-mbarrier-try-wait
|
|
|
|
They can be also used in `clang`.
Introduce the lightweight header instead of `CoverageMapping.h`.
This includes for now:
* `mcdc::ConditionID`
* `mcdc::Parameters`
|
|
This is one option for attempting to move genMinMaxlocReductionLoop to a
better location. It moves it into Transforms and makes HLFIRTranforms
depend upon FIRTransforms.
It passes a build locally, both with and without -DBUILD_SHARED_LIBS,
and does OK on the windows CI.
|
|
If we have something like G_TRUNC from v2s32 to v2s16, then lowering
this to a concat of two G_TRUNC s32 to s16 followed by G_TRUNC from
v2s16 to v2s8 does not bring us any closer to legality. In fact, the
first part of that is a G_BUILD_VECTOR whose legalization will produce a
new G_TRUNC from v2s32 to v2s16, and both G_TRUNCs will then get
combined to the original, causing a legalization cycle.
Make the lowering condition more precise, by requiring that the original
vector is >128 bits, which is I believe the only case where this
specific splitting approach is useful.
Note that this doesn't actually produce a legal result (the alwaysLegal
is a lie, as before), but it will cause a proper globalisel abort
instead of an infinite legalization loop.
Fixes https://github.com/llvm/llvm-project/issues/81244.
|
|
Fix discrepancy from when this was forked from the SkylakeServer model
This also fixes VRANGEPS/VRANGEPD instructions which typically match FADD characteristics
Confirmed with Agner + uops.info
Fixes #81504
|
|
|