Age | Commit message (Collapse) | Author | Files | Lines |
|
Fixes #78754
|
|
As of 4d20cfcf4eb08217ed37c4d4c38dc395d7a66d26, `__bit_reference`
contains a template `__fill_n` with a bool `_FillValue` parameter.
Unfortunately there is a relatively widely used piece of scientific
software called NetCDF, which exposes a (C) macro `_FillValue` in its
public headers.
When building the NetCDF C++ bindings, this quickly leads to compilation
errors when the macro interferes with the template in `__bit_reference`.
Rename the parameter to `_FillVal` to avoid the conflict.
|
|
This patch adds a simple pass to replace the uses inside compute
operation. It replaces the `varPtr` values with their corresponding
`accPtr` values gathered through the dataClauseOperands.
private and reductions variables are not included in this pass since
they will normally be replace when they are materialized.
---------
Co-authored-by: Slava Zakharin <szakharin@nvidia.com>
|
|
|
|
Handle masked predicated movss/movsd in addConstantComments now that we can generically handle the destination + mask register
This will more significantly help improve 'fixup constant' comments from #73509
|
|
Handle masked predicated load/broadcasts in addConstantComments now that we can generically handle the destination + mask register
This will more significantly help improve 'fixup constant' comments from #73509
|
|
Remove handling from EmitAnyX86InstComments and handle all VPMOVSX/VPMOVZX comments in addConstantComments now that we can generically handle the destination + mask register and shuffle mask comment
|
|
printDstRegisterName helpers. NFC.
This will allow us to easily use printDstRegisterName for other mask predicate destination registers, and printout shuffle masks from other instruction types.
|
|
We strive not to use raw assert(...) anymore in libc++abi in preparation
for using the hardening framework.
|
|
Pass `DeallocationOptions` instead of `privateFuncDynamicOwnership`.
This will make it easier to add new options in the future.
|
|
This was incorrectly removed when I split up the header.
|
|
This PR implements the `insque` and `remque` entrypoint functions.
|
|
This commit improves alpha.security.ArrayBoundV2 in two connected areas:
(1) It calls `markInteresting()` on the symbolic values that are
responsible for the out of bounds access.
(2) Its index-is-in-bounds assumptions are reported in note tags if they
provide information about the value of an interesting symbol.
This commit is limited to "display" changes: it introduces new
diagnostic pieces (potentially to bugs found by other checkers), but
ArrayBoundV2 will make the same assumptions and detect the same bugs
before and after this change.
As a minor unrelated change, this commit also updates/removes some very
old comments which became obsolete due to my previous changes.
|
|
We noticed that some feature-test macros were not conditional on
configuration flags like _LIBCPP_HAS_NO_FILESYSTEM. As a result, code
attempting to use FTMs would not work as intended.
This patch adds conditionals for a few feature-test macros, but more
issues may exist.
rdar://122020466
|
|
"__builtin_assume"
|
|
Summary:
This function is always false in the current implementation and is not
even considered required. Just remove it and if someone needs it in the
future they can add it back in. This is done to simplify the interface
prior to other changes
|
|
This adds a conversion from an externaly defined `func.func`, a
`func.func` without function body, to an `emitc.func` with an `extern`
specifier.
|
|
After PR#68727 the source for both the fir.box_addr and a box became the
same. Thus the detection that only one of the sources was direct and the
special logic around it was being skipped. As a result, the test
included would show a "MayAlias" result instead of a "NoAlias" result.
|
|
|
|
This fixes a crash in the attached test case due to missing type
inference for ICmp VPInstructions.
|
|
This is similar to c1ad363e6eba308fa94c47374ee98b3c79693a35, but with
the additional twist that initializing an existing value from a
`MemberExpr` was not working correctly.
|
|
If the pattern of a pack indexing type did not contain a pack, we would
still construct a pack indexing type (to improve error messages) but we
would fail to make the type as dependent, leading to infinite recursion
when trying to extract a canonical type.
|
|
Fold ((cst << x) & 1) to zext(x == 0) when cst is odd.
Fixes: https://github.com/llvm/llvm-project/issues/73384
Alive2: https://alive2.llvm.org/ce/z/5RbaK6
|
|
|
|
Summary:
These tests likely always failed but was hidden by the expected return
value. Simply make them require AMDGPU as a registered target so they
don't fail on other machines.
|
|
Initialize both elements to 0.
|
|
|
|
The current interpreter does this, so follow suit to match its
diagnostics.
|
|
Summary:
Currently we cannot compile `__builtin_amdgcn_ballot_w64` on non-wave64
targets even though it is valid. This is relevant for making library
code that can handle both without needing to check the wavefront size.
This patch relaxes the semantic check for w64 so it can be used
normally.
|
|
Summary:
The PTX language rejects globals with `.` in the name. We need to change
the global name if we are targeting NVPTX to prevent the toolchain from
complaining.
|
|
Correct AMDGPU strictfp tests to follow the rules documented in the
LangRef:
https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics
These tests needed the strictfp attribute added to function calls and
some declarations.
Some of the tests now pass with D146845, others get farther along and
fail with D146845. The tests revealed that further work is required
in mostly AMDGPU atomics to get the tests passing.
Since I was here anyway I removed the strictfp attribute from some
constrained intrinsic declarations. They have this attribute by default.
Test changes verified with D146845.
|
|
predicates instead of src index offsets.
|
|
|
|
This just avoids useless work of adding NoRegister to BaseSet, for
consistency with other places that iterate over all physical registers.
|
|
In mixed matmul lowering (e.g., i8 to i32) we're seeing the following
sequence:
%0 = arith.extsi %src : vector<4x[8]xi8> to vector<4x[8]xi32>
%1 = vector.extract %0[0] : vector<[8]xi32> from vector<4x[8]xi32>
%lhs = vector.scalable.extract %1[0] : vector<[4]xi32> from
vector<[8]xi32>
... (same for rhs)
%2 = vector.outerproduct %lhs, %rhs, %acc vector<[4]xi32>,
vector<[4]xi32>
// x4 chained by accumulator
This chain of 4 outer products can be fused into a single 4-way widening
variant but the pass doesn't match on the IR, as it expects the source
of the inputs to be an extend and it can't look through the extracts.
This patch fixes this with two rewrites that swaps extract(extend) into
extend(extract).
Related to #78975, #79288.
|
|
The "Refer to" and table shouldn't be in the example code sequence.
|
|
|
|
Fixes #77659
Fixes #46357
Picked up from https://reviews.llvm.org/D114119
|
|
This adds the support to add the target-feature to outline atomic operations (calling the
runtime library instead).
|
|
repeated switch cases etc. NFC.
|
|
masked predicates. NFC.
|
|
This is to add test coverage for crash report in #80340
|
|
This enables IR expansion for i128 divisions. The vector case is still
broken because ExpandLargeDivRem doesn't try to handle them.
Fixes: SWDEV-426193
|
|
Fixes #80366
|
|
|
|
Implement PhiLoweringHelper for GlobalISel in DivergenceLoweringHelper.
Use machine uniformity analysis to find divergent i1 phis and select
them as lane mask phis in same way SILowerI1Copies select VReg_1 phis.
Note that divergent i1 phis include phis created by LCSSA and all cases
of uses outside of cycle are actually covered by "lowering LCSSA phis".
GlobalISel lane masks are registers with sgpr register class and S1 LLT.
TODO: General goal is that instructions created in this pass are fully
instruction-selected so that selection of lane mask phis is not split
across multiple passes.
patch 3 from: https://github.com/llvm/llvm-project/pull/73337
|
|
The SGPR registers used for preserving EXEC mask while lowering the
whole-wave register spills and copies should be preserved at the prolog
and epilog if they are in the CSR range. It isn't happening when there
is only wwm-copy lowered and there are no wwm-spills. This patch
addresses that problem.
|
|
|
|
|
|
|