Age | Commit message (Collapse) | Author | Files | Lines |
|
|
|
|
|
Make sure we don't push the break and continue scope for a while
loop until after we have evaluated the condition.
|
|
Change from GFX12: Relax S_CLAUSE rules to all all non-flat memory types
in
the same clause, and all Flat types in the same.
For VMEM/FLAT clause types now look like:
- Non-Flat (load, store, atomic): buffer, global, scratch, TDM, Async
- Flat: load, store, atomic
|
|
Scan reductions are supported in OpenMP with the help of scan directive.
Reduction clause of the for loop/simd directive can take an `inscan`
modifier along with the body of the directive specifying a `scan`
directive. This PR implements the lowering logic for scan reductions in
workshare loops of OpenMP.
The body of the for loop is split into two loops (Input phase loop and
Scan Phase loop) and a scan reduction loop is added in the middle. The
Input phase loop populates a temporary buffer with initial values that
are to be reduced. The buffer is used by the reduction loop to perform
scan reduction. Scan phase loop copies the values of the buffer to the
reduction variable before executing the scan phase. Below is a high
level view of the code generated.
```
<declare pointer to buffer> ptr
omp parallel {
size num_iters = <num_iters>
// temp buffer allocation
omp masked {
buff = malloc(num_iters*scanvarstype)
*ptr = buff
}
barrier;
// input phase loop
for (i: 0..<num_iters>) {
<input phase>;
buffer = *ptr;
buffer[i] = red;
}
// scan reduction
omp masked
{
for (int k = 0; k != ceil(log2(num_iters)); ++k) {
i=pow(2,k)
for (size cnt = last_iter; cnt >= i; --cnt) {
buffer = *ptr;
buffer[cnt] op= buffer[cnt-i];
}
}
}
barrier;
// scan phase loop
for (0..<num_iters>) {
buffer = *ptr;
red = buffer[i] ;
<scan phase>;
}
// temp buffer deletion
omp masked {
free(*ptr)
}
barrier;
}
```
The temporary buffer needs to be shared between all threads performing
reduction since it is read/written in Input and Scan workshare Loops.
This is achieved by declaring a pointer to the buffer in the shared
region and dynamically allocating the buffer by the master thread.
This is the reason why allocation, deallocation and scan reduction are
performed within `masked`. The code is verified to produce correct
results for Fortran programs with the code changes in the PR
https://github.com/llvm/llvm-project/pull/133149
|
|
This warns the user of incompatible configurations, such as 39-bit and
42-bit VMAs for AArch64 non-Android Linux ASan
(https://github.com/llvm/llvm-project/issues/145259).
|
|
(#152559)
The requirement that the LDS operand is contiguous is overly restrictive
because it's perfectly valid to have a subview depend on subgroup IDs
that is still subgroup contiguous. We could continue trying to do this
verification based on the number of copied elements, but instead this
change just opts to clarify the semantics on the op definition.
|
|
Starting in gfx1250, voffset and immoffset are zero-extended from 32
bits
to 45 bits before being added together.
|
|
This patch makes all of the ninja commands in the monolithic-* scripts
write to log files in the current working directory. The plan is to use
this to feed the ninja log into generate_test_report_github.py so we can
surface compilation errors.
Related to #152246.
Reviewers: Keenuts, lnihlen, cmtice, dschuff, gburgessiv
Reviewed By: Keenuts, cmtice
Pull Request: https://github.com/llvm/llvm-project/pull/152331
|
|
Some DIA PDB tests pass with the native plugin already, but didn't test
this. This adds test runs with the native plugin - no functional
changes.
In addition to the x86 calling convention test, there's also
https://github.com/llvm/llvm-project/blob/9f102a90042fd3757c207112cfe64ee10182ace5/lldb/test/Shell/SymbolFile/PDB/calling-conventions-arm.test,
but I can't test this.
|
|
|
|
Adds the `isHLSLResourceRecordArray()` method to the `Type` class. This method returns `true` if the `Type` represents an array of HLSL resource records. Defining this method on `Type` makes it accessible from both sema and codegen.
|
|
The function ResolveOmpObject had a lot of highly-indented code in two
variant visitors. Extract the visitors into their own functions, and
reformat the code. Replace !(||) with !&&! in a couple of places to make
the formatting a bit nicer. Use llvm::enumerate instead of manually
maintaining iteration index.
|
|
The processing of `-oso_prefix` uses `llvm::sys::fs::real_path` from the
user value, but it is later tried to be matched with the result of
`make_absolute`. While `real_path` resolves special symbols like `.`,
`..` and `~`, and resolves symlinks along the path, `make_absolute` does
neither, causing an incompatibility in some situations.
In macOS, temporary directories would normally be reported as
`/var/folders/<random>`, but `/var` is in fact a symlink to
`private/var`. If own is working on a temporary directory and uses
`-oso_prefix .`, it will be expanded to `/private/var/folder/<random>`,
while `make_absolute` will expand to `/var/folder/<random>` instead, and
`-oso_prefix` will fail to remove the prefix from the `N_OSO` entries,
leaving absolute paths to the temporary directory in the resulting file.
This would happen in any situation in which the working directory
includes a symlink, not only in temporary directories.
One can change the usage of `make_absolute` to use `real_path` as well,
but `real_path` will mean checking the file system for each `N_OSO`
entry. The other solution is stop using `real_path` when processing
`-oso_prefix` and manually expand an input of `.` like `make_absolute`
will do.
This second option is the one implemented here, since it is the closest
to the visible behaviour of ld64 (like the removed comment notes), so it
is the better one for compatibility. This means that a test that checked
the usage of the tilde as `-oso_prefix` needs to be removed (since it
was done by using `real_path`), and two new tests are provided checking
that symlinks do not affect the result. The second test checks a change
in behaviour, in which if one provides the input files with a prefix of
`./`, even when using `-oso_prefix .` because the matching is textual,
the `./` prefix will stay in the `N_OSO` entries. This matches the
observed behaviour of ld64.
|
|
test_function_pointer_signature (#150921)
I fixed support for varargs functions
(previously it didn't crash but the codegen was incorrect).
I added tests for structs and unions which already work. With the
multivalue abi they crash in the backend, so I added a sema check that
rejects structs and unions for that abi.
It will also crash in the backend if passed an int128 or float128 type.
|
|
|
|
|
|
Adds this large patch that merely moved Pack/Unpack Ops from the Tensor
to Linalg dialects:
* https://github.com/llvm/llvm-project/pull/123902
|
|
This reverts commit 229d86026fa0e5d9412a0d5004532f0d9733aac6.
|
|
When the cleanup handling code was initially upstreamed, a SmallVector
was used to simplify the handling of the stack of cleanup objects.
However, that mechanism won't scale well enough for the rate at which
cleanup handlers are going to be pushed and popped while compiling a
large program. This change introduces the custom memory allocator which
is used in classic codegen and the CIR incubator.
Thiis does not otherwise change the cleanup handling implementation and
many parts of the infrastructure are still missing.
This is not intended to have any observable effect on the generated CIR,
but it does change the internal implementation significantly, so it's
not exactly an NFC change. The functionality is covered by existing
tests.
|
|
This patch drastically speed ups dumping .gdb_index for large indexes
|
|
|
|
options (#151579)
Adding mlir to llvm support for atomic control options.
Atomic Control Options are used to specify architectural characteristics
to help lowering of atomic operations. The options used are:
`-f[no-]atomic-remote-memory`, `-f[no-]atomic-fine-grained-memory`,
`-f[no-]atomic-ignore-denormal-mode`.
Legacy option `-m[no-]unsafe-fp-atomics` is aliased to
`-f[no-]ignore-denormal-mode`.
More details can be found in
https://github.com/llvm/llvm-project/pull/102569. This PR implements the
MLIR to LLVM lowering support of atomic control attributes specified
with OpenMP `atomicUpdateOp`.
Initial support can be found in PR:
https://github.com/llvm/llvm-project/pull/150860
|
|
In the large mode, SmallPtrSet uses quadratic probing with ProbeAmt++
just like DenseMap.
|
|
Add to the VOP patterns to recognise when or/xor/and are masking only the most significant bit of i32/v2i32/i64 and replace with the corresponding FP source modifier.
|
|
This implements very basic support for RISC-V mapping symbols in
llvm-objdump, sharing the implementation with how Arm/AArch64/CSKY
implement this feature.
This only supports the `$x` (instruction) and `$d` (data) mapping
symbols for RISC-V, and not the version of `$x` which includes an
architecture string suffix.
|
|
This change adds the definition of VTableAddrPointOp and the related
AddressPointAttr to the CIR dialect, along with tests for the parsing
and verification of these elements.
Code to generate this operation will be added in a later change.
|
|
|
|
By profiling LLDB debugging a Swift application without a dSYM and a
large amount of .o files, I identified that querying shared modules was
the biggest bottleneck when running "frame variable", and Clang types
need to be searched.
One of the reasons for that slowness is that the shared module list can
grow very large, and the search through it is O(n).
To solve this issue, this patch adds a new hashmap to the shared module
list whose key is the name of the module, and the value is all the
modules that share that name. This should speed up any search where the
query contains the module name.
rdar://156753350
|
|
The landing of #126324 made it so that __has_builtin returns false for
aux triple builtins. CUDA offloading can sometimes compile where the
host is in the aux triple (ie x86_64). This patch explicitly carves out
NVPTX so that we do not run into redefinition errors.
|
|
The code that checks for overlapping binding did not compare register space when one of the bindings was for an unbounded resource array, leading to false errors. This change fixes it.
|
|
Needed in #152308
|
|
This patch fixes:
llvm/lib/Target/PowerPC/PPCSelectionDAGInfo.h:25:3: error:
'EmitTargetCodeForMemcmp' overrides a member function but is not
marked 'override' [-Werror,-Winconsistent-missing-override]
|
|
This change adds support for Mul CompoundAssignment for ComplexType
https://github.com/llvm/llvm-project/issues/141365
|
|
|
|
The limit 'dfa-max-num-paths' that is used to control number of
enumerated paths was not checked against inside getPathsFromStateDefMap.
It may lead to large memory consumption for complex enough switch
statements.
|
|
AIX has "millicode" routines, which are functions loaded at boot time
into fixed addresses in kernel memory. This allows them to be customized
for the processor. The __memcmp routine is a millicode implementation;
we use millicode for the memcmp function instead of a library call to
improve performance.
|
|
This PR adds a support for file scope assembly in CIR.
|
|
prefetch-inferas-test.ll (#152492)
prefetch-inferas-test.ll was added in #146203 , but due to missing ptxas
version the CI is defaulting to sm 60.
This patch adds the arg in ptxas-verify check.
|
|
Fix typo in ComplexRangeKind comment
Catched in https://github.com/llvm/clangir/pull/1779
|
|
Fixes #152220.
- Adds `dx_imad` and `dx_umad` to
`isTargetIntrinsicTriviallyScalarizable`
- Adds tests that confirm the intrinsic is now scalarizing
|
|
This patchset:
* propagates alignment attributes from memref operations into the SPIR-V
dialect,
* fixes an error in the logic which previously propagated alignment
attributes but did not add other MemoryAccess attributes.
* adds a failure condition in the case where the alignment attribute
from the memref dialect (64-bit wide) does not fit in SPIR-V's alignment
attribute (specified to be 32-bit wide).
|
|
This is a continuation of #152188, which started splitting up the MCP
implementation into a generic implementation in Protocol/MCP that will
be shared between LLDB and lldb-mcp.
For now I kept all the networking code in the MCP server plugin. Once
the changes to JSONTransport land, we might be able to move more of it
into the Protocol library.
|
|
This per-file macro definition on the command line breaks caching of
modules. See discussion in #150677
Instead we use a constexpr function that processes the __FILE__ macro,
but prefer also the __FILE_NAME__ macro when available (clang/gcc) to spare
compile-time in the frontend.
If the constexpr function isn't const-evaluated, it'll be only evaluated when
printing the debug message.
|
|
|
|
The `constFoldBinaryOp` helper function had limited support for
different input and output types, but the static type of the underlying
value (e.g. `APInt`) had to match between the inputs and the output.
This worked fine for int comparisons of the form `(intN, intN) -> int1`,
as the static type signature was `(APInt, APInt) -> APInt`. However,
float comparisons map `(floatN, floatN) -> int1`, with a static type
signature of `(APFloat, APFloat) -> APInt`. This use case wasn't
supported by `constFoldBinaryOp`.
`constFoldBinaryOp` now accepts an optional template argument overriding
the return type in case it differs from the input type. If the new
template argument isn't provided, the default behavior is unchanged
(i.e. the return type will be assumed to match the input type).
`constFoldUnaryOp` received similar changes in order to support folding
non-cast ops of the form `(T1) -> T2` (e.g. a `sign` op mapping
`(floatN) -> sint32`).
|
|
This is for #151855, to make the changes more obvious.
|
|
(#89590)
…face
In order to have a consistent implementation for getMatchingYieldValue
for linalg generic with buffer/tensor semantics, we should assert the
opOperand index based on the numDpsInits and not numOfResults which may
be zero in the buffer semantics.
|
|
The fixed-point layout algorithm handles linker scripts, thunks, and
relaxOnce (to suppress out-of-range GOT-indirect-to-PC-relative
optimization). These passes are not needed for relocatable links because
they require address information that is not yet available.
Since we don't scan relocations for relocatable links, the
`createThunks` and `relaxOnce` functions are no-ops anyway, making these
passes redundant.
To prevent cluttering the line history, I place the `if (...) break;`
inside the for loop.
Pull Request: https://github.com/llvm/llvm-project/pull/152240
|
|
(#152230)
Splat is deprecated, and being prepared for removal in a future release.
https://discourse.llvm.org/t/rfc-mlir-vector-deprecate-then-remove-vector-splat/87143/5
The command I used, catches almost every splat op:
```
perl -i -pe
's/vector\.splat\s+(\S+)\s*:\s*vector<((?:\[?\d+\]?x)*)\s*([^>]+)>/vector.broadcast
$1 : $3 to vector<$2$3>/g' filename
```
|