Age | Commit message (Collapse) | Author | Files | Lines |
|
This addresses the post-commit review comment in
https://github.com/llvm/llvm-project/pull/100775
|
|
The test now seems to pass so remove the XFAIL.
|
|
Fixes #101859.
If we have at least 2 ranges, we have to try to merge the last and first
ones to handle the wrap range.
|
|
`is_null_pointer` can be implemented very efficiently as
`__is_same(__remove_cv(T), decltype(nullptr))`. Since GCC supports both
of these builtins as well, libc++ has no interest in using
`__is_nullptr` instead. Furthermore, I could find only a single use in
the wild
(https://sourcegraph.com/search?q=context:global+__is_nullptr%28+-file:clang&patternType=keyword&sm=0).
Because of these reasons I don't think it's worth keeping this builtin
around.
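As a rough illustration of the approach described above, the trait can be sketched with standard library facilities in place of the compiler builtins. The name `my_is_null_pointer` is hypothetical, and `std::is_same`/`std::remove_cv_t` stand in for `__is_same`/`__remove_cv`; this is not the libc++ source.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical portable equivalent (C++17): strip cv-qualifiers, then compare
// the result against the type of nullptr.
template <class T>
struct my_is_null_pointer
    : std::is_same<std::remove_cv_t<T>, decltype(nullptr)> {};

template <class T>
inline constexpr bool my_is_null_pointer_v = my_is_null_pointer<T>::value;

static_assert(my_is_null_pointer_v<std::nullptr_t>);
static_assert(my_is_null_pointer_v<const volatile std::nullptr_t>);
static_assert(!my_is_null_pointer_v<int*>);
static_assert(!my_is_null_pointer_v<void*>);
```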
|
|
|
|
If the comparison operation is equivalent to < and that is a total
order, we know that we can use equality comparison on that type instead
to extract some information. Furthermore, if equality comparison on that
type is trivial, the user can't observe that we're calling it. So
instead of using the user-provided total order, we use std::mismatch,
which uses equality comparison (and is vectorized). Additionally, if the
type is trivially lexicographically comparable, we can go one step
further and use std::memcmp directly instead of calling std::mismatch.
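The two fast paths described above might look roughly like the following sketch (C++17). The helper names `lex_less_mismatch` and `lex_less_memcmp` are hypothetical, the element types are fixed to `int` and `unsigned char` for simplicity, and the dispatch logic that libc++ uses to decide when each path is legal is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: find the first differing position with equality
// comparison (std::mismatch), then apply the ordering only once.
inline bool lex_less_mismatch(const int* first1, const int* last1,
                              const int* first2, const int* last2) {
  auto [it1, it2] = std::mismatch(first1, last1, first2, last2);
  if (it2 == last2) return false;  // range 2 is equal to or a prefix of range 1
  if (it1 == last1) return true;   // range 1 is a proper prefix of range 2
  return *it1 < *it2;              // order decided by the first mismatch
}

// Hypothetical helper: for unsigned char, byte-wise memcmp order matches
// operator< order, so the common prefix can be compared directly.
inline bool lex_less_memcmp(const unsigned char* first1, const unsigned char* last1,
                            const unsigned char* first2, const unsigned char* last2) {
  const std::size_t n1 = static_cast<std::size_t>(last1 - first1);
  const std::size_t n2 = static_cast<std::size_t>(last2 - first2);
  const std::size_t n = std::min(n1, n2);
  const int r = n != 0 ? std::memcmp(first1, first2, n) : 0;
  if (r != 0) return r < 0;
  return n1 < n2;  // equal common prefix: the shorter range compares less
}
```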
Benchmarks:
```
-------------------------------------------------------------------------------------
Benchmark old new
-------------------------------------------------------------------------------------
bm_lexicographical_compare<unsigned char>/1 1.17 ns 2.34 ns
bm_lexicographical_compare<unsigned char>/2 1.64 ns 2.57 ns
bm_lexicographical_compare<unsigned char>/3 2.23 ns 2.58 ns
bm_lexicographical_compare<unsigned char>/4 2.82 ns 2.57 ns
bm_lexicographical_compare<unsigned char>/5 3.34 ns 2.11 ns
bm_lexicographical_compare<unsigned char>/6 3.94 ns 2.21 ns
bm_lexicographical_compare<unsigned char>/7 4.56 ns 2.11 ns
bm_lexicographical_compare<unsigned char>/8 5.25 ns 2.11 ns
bm_lexicographical_compare<unsigned char>/16 9.88 ns 2.11 ns
bm_lexicographical_compare<unsigned char>/64 38.9 ns 2.36 ns
bm_lexicographical_compare<unsigned char>/512 317 ns 6.54 ns
bm_lexicographical_compare<unsigned char>/4096 2517 ns 41.4 ns
bm_lexicographical_compare<unsigned char>/32768 20052 ns 488 ns
bm_lexicographical_compare<unsigned char>/262144 159579 ns 4409 ns
bm_lexicographical_compare<unsigned char>/1048576 640456 ns 20342 ns
bm_lexicographical_compare<signed char>/1 1.18 ns 2.37 ns
bm_lexicographical_compare<signed char>/2 1.65 ns 2.60 ns
bm_lexicographical_compare<signed char>/3 2.23 ns 2.83 ns
bm_lexicographical_compare<signed char>/4 2.81 ns 3.06 ns
bm_lexicographical_compare<signed char>/5 3.35 ns 3.30 ns
bm_lexicographical_compare<signed char>/6 3.90 ns 3.99 ns
bm_lexicographical_compare<signed char>/7 4.56 ns 3.78 ns
bm_lexicographical_compare<signed char>/8 5.20 ns 4.02 ns
bm_lexicographical_compare<signed char>/16 9.80 ns 6.21 ns
bm_lexicographical_compare<signed char>/64 39.0 ns 3.16 ns
bm_lexicographical_compare<signed char>/512 318 ns 7.58 ns
bm_lexicographical_compare<signed char>/4096 2514 ns 47.4 ns
bm_lexicographical_compare<signed char>/32768 20096 ns 504 ns
bm_lexicographical_compare<signed char>/262144 156617 ns 4146 ns
bm_lexicographical_compare<signed char>/1048576 624265 ns 19810 ns
bm_lexicographical_compare<int>/1 1.15 ns 2.12 ns
bm_lexicographical_compare<int>/2 1.60 ns 2.36 ns
bm_lexicographical_compare<int>/3 2.21 ns 2.59 ns
bm_lexicographical_compare<int>/4 2.74 ns 2.83 ns
bm_lexicographical_compare<int>/5 3.26 ns 3.06 ns
bm_lexicographical_compare<int>/6 3.81 ns 4.53 ns
bm_lexicographical_compare<int>/7 4.41 ns 4.72 ns
bm_lexicographical_compare<int>/8 5.08 ns 2.36 ns
bm_lexicographical_compare<int>/16 9.54 ns 3.08 ns
bm_lexicographical_compare<int>/64 37.8 ns 4.71 ns
bm_lexicographical_compare<int>/512 309 ns 24.6 ns
bm_lexicographical_compare<int>/4096 2422 ns 204 ns
bm_lexicographical_compare<int>/32768 19362 ns 1947 ns
bm_lexicographical_compare<int>/262144 155727 ns 19793 ns
bm_lexicographical_compare<int>/1048576 623614 ns 80180 ns
bm_ranges_lexicographical_compare<unsigned char>/1 1.07 ns 2.35 ns
bm_ranges_lexicographical_compare<unsigned char>/2 1.72 ns 2.13 ns
bm_ranges_lexicographical_compare<unsigned char>/3 2.46 ns 2.12 ns
bm_ranges_lexicographical_compare<unsigned char>/4 3.17 ns 2.12 ns
bm_ranges_lexicographical_compare<unsigned char>/5 3.86 ns 2.12 ns
bm_ranges_lexicographical_compare<unsigned char>/6 4.55 ns 2.12 ns
bm_ranges_lexicographical_compare<unsigned char>/7 5.25 ns 2.12 ns
bm_ranges_lexicographical_compare<unsigned char>/8 5.95 ns 2.13 ns
bm_ranges_lexicographical_compare<unsigned char>/16 11.7 ns 2.13 ns
bm_ranges_lexicographical_compare<unsigned char>/64 45.5 ns 2.36 ns
bm_ranges_lexicographical_compare<unsigned char>/512 366 ns 6.35 ns
bm_ranges_lexicographical_compare<unsigned char>/4096 2886 ns 40.9 ns
bm_ranges_lexicographical_compare<unsigned char>/32768 23054 ns 489 ns
bm_ranges_lexicographical_compare<unsigned char>/262144 185302 ns 4339 ns
bm_ranges_lexicographical_compare<unsigned char>/1048576 741576 ns 19430 ns
bm_ranges_lexicographical_compare<signed char>/1 1.10 ns 2.12 ns
bm_ranges_lexicographical_compare<signed char>/2 1.66 ns 2.35 ns
bm_ranges_lexicographical_compare<signed char>/3 2.23 ns 2.58 ns
bm_ranges_lexicographical_compare<signed char>/4 2.82 ns 2.82 ns
bm_ranges_lexicographical_compare<signed char>/5 3.34 ns 3.06 ns
bm_ranges_lexicographical_compare<signed char>/6 3.92 ns 3.99 ns
bm_ranges_lexicographical_compare<signed char>/7 4.64 ns 4.10 ns
bm_ranges_lexicographical_compare<signed char>/8 5.21 ns 4.61 ns
bm_ranges_lexicographical_compare<signed char>/16 9.79 ns 7.42 ns
bm_ranges_lexicographical_compare<signed char>/64 38.9 ns 2.93 ns
bm_ranges_lexicographical_compare<signed char>/512 317 ns 7.31 ns
bm_ranges_lexicographical_compare<signed char>/4096 2500 ns 47.5 ns
bm_ranges_lexicographical_compare<signed char>/32768 19940 ns 496 ns
bm_ranges_lexicographical_compare<signed char>/262144 159166 ns 4393 ns
bm_ranges_lexicographical_compare<signed char>/1048576 638206 ns 19786 ns
bm_ranges_lexicographical_compare<int>/1 1.10 ns 2.12 ns
bm_ranges_lexicographical_compare<int>/2 1.64 ns 3.04 ns
bm_ranges_lexicographical_compare<int>/3 2.23 ns 2.58 ns
bm_ranges_lexicographical_compare<int>/4 2.81 ns 2.81 ns
bm_ranges_lexicographical_compare<int>/5 3.35 ns 3.05 ns
bm_ranges_lexicographical_compare<int>/6 3.94 ns 4.60 ns
bm_ranges_lexicographical_compare<int>/7 4.60 ns 4.81 ns
bm_ranges_lexicographical_compare<int>/8 5.19 ns 2.35 ns
bm_ranges_lexicographical_compare<int>/16 9.85 ns 2.87 ns
bm_ranges_lexicographical_compare<int>/64 38.9 ns 4.70 ns
bm_ranges_lexicographical_compare<int>/512 318 ns 24.5 ns
bm_ranges_lexicographical_compare<int>/4096 2494 ns 202 ns
bm_ranges_lexicographical_compare<int>/32768 20000 ns 1939 ns
bm_ranges_lexicographical_compare<int>/262144 160433 ns 19730 ns
bm_ranges_lexicographical_compare<int>/1048576 642636 ns 80760 ns
```
|
|
|
|
Implement a new transformation that folds the bit-testing expression
(icmp ne (and (lshr V B) 1) 0) to (icmp ne (and V (shl 1 B)) 0) for
constant V. This rule already existed for non-constant V and constants
other than 1; this restriction to non-constant V has been added in
commit c3b2111d975a39d19f0c5d635e2961a4449c5a71 to fix an infinite loop.
Avoid the infinite loop by allowing constant V only if the shift
instruction is an lshr and the constant is 1. Also fold the negated
variant of the LHS.
This transformation necessitates an adaptation of existing tests in
`icmp-and-shift.ll` and `load-cmp.ll`. One test in `icmp-and-shift.ll`,
which previously was a negative test, now gets folded. Rename it to
indicate that it is a positive test.
Alive proof: https://alive2.llvm.org/ce/z/vcJJTx
Relates to issue #86813.
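The equivalence behind the fold can be sanity-checked outside the compiler. The sketch below (C++14 or later) models both forms on 32-bit unsigned values for in-range shift amounts; the function names are hypothetical, and it is an informal check, not a substitute for the Alive proof.

```cpp
#include <cassert>
#include <cstdint>

// Form before the fold: shift the value right, then test bit 0.
constexpr bool bit_test_shift_right(std::uint32_t v, unsigned b) {
  return ((v >> b) & 1u) != 0;
}

// Form after the fold: build a one-bit mask and test it against the value.
constexpr bool bit_test_mask(std::uint32_t v, unsigned b) {
  return (v & (std::uint32_t{1} << b)) != 0;
}

// Compare both forms for every in-range shift amount (0..31).
constexpr bool forms_agree(std::uint32_t v) {
  for (unsigned b = 0; b < 32; ++b)
    if (bit_test_shift_right(v, b) != bit_test_mask(v, b)) return false;
  return true;
}

static_assert(forms_agree(0xDEADBEEFu), "both forms test the same bit");
```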
|
|
|
|
(#101816)
None of the Python files were committed with the executable bit set, and
only yaml_to_classes.py was intended to be executed.
Also sets the executable bit on yaml_to_classes.py and changes the
shebang to run python3 instead of python.
|
|
Disable `strfroml` entrypoint on aarch64 to please clang-11 buildbots.
Detailed in https://github.com/llvm/llvm-project/issues/101846. This is
not a fix for #101846 so I will keep the issue open until our buildbot
is updated or other mitigation is applied.
|
|
reference (#86761)
|
|
We can increase the number of Bits passed to the users by adding
the shift amount.
|
|
RISCVDAGToDAGISel::hasAllNBitUsers. NFC
Make "break" consistently the "if" body and the "return false" the
last thing in each case.
This makes it easier to add different conditions for different operands
of some instructions and makes everything more consistent.
|
|
|
|
|
|
```
UBSan-Standalone-sparc :: TestCases/Misc/Linux/diag-stacktrace.cpp
```
`FAIL`s on 32 and 64-bit Linux/sparc64 (and on Solaris/sparcv9, too: the
test isn't Linux-specific at all). With
`UBSAN_OPTIONS=fast_unwind_on_fatal=1`, the stack trace shows a
duplicate innermost frame:
```
compiler-rt/test/ubsan/TestCases/Misc/Linux/diag-stacktrace.cpp:14:31: runtime error: execution reached the end of a value-returning function without returning a value
#0 0x7003a708 in f() compiler-rt/test/ubsan/TestCases/Misc/Linux/diag-stacktrace.cpp:14:35
#1 0x7003a708 in f() compiler-rt/test/ubsan/TestCases/Misc/Linux/diag-stacktrace.cpp:14:35
#2 0x7003a714 in g() compiler-rt/test/ubsan/TestCases/Misc/Linux/diag-stacktrace.cpp:17:38
```
which isn't seen with `fast_unwind_on_fatal=0`.
This turns out to be another fallout from fixing
`__builtin_return_address`/`__builtin_extract_return_addr` on SPARC. In
`sanitizer_stacktrace_sparc.cpp` (`BufferedStackTrace::UnwindFast`) the
`pc` arg is the return address, while `pc1` from the stack frame
(`fr_savpc`) is the address of the `call` insn, leading to a double
entry for the innermost frame in `trace_buffer[]`.
This patch fixes this by moving the adjustment before all uses.
Tested on `sparc64-unknown-linux-gnu` and `sparcv9-sun-solaris2.11`
(with the `ubsan/TestCases/Misc/Linux` tests enabled).
|
|
`compiler-rt/lib/builtins/divtc3.c` and `multc3.c` don't compile on
Solaris/sparcv9 with `gcc -m32`:
```
FAILED: projects/compiler-rt/lib/builtins/CMakeFiles/clang_rt.builtins-sparc.dir/divtc3.c.o
[...]
compiler-rt/lib/builtins/divtc3.c: In function ‘__divtc3’:
compiler-rt/lib/builtins/divtc3.c:22:18: error: implicit declaration of function ‘__compiler_rt_logbtf’ [-Wimplicit-function-declaration]
22 | fp_t __logbw = __compiler_rt_logbtf(
| ^~~~~~~~~~~~~~~~~~~~
```
and many more. It turns out that while the definition of `__divtc3` is
guarded with `CRT_HAS_F128`, the `__compiler_rt_logbtf` and other
declarations use `CRT_HAS_128BIT && CRT_HAS_F128` as guard. This only
shows up with `gcc` since, as documented in Issue #41838, `clang`
violates the SPARC psABI in not using 128-bit `long double`, so this
code path isn't used.
Fixed by changing the guards to match.
Tested on `sparcv9-sun-solaris2.11`.
|
|
This patch allows sequences like:
`__asan_before_dynamic_init`
`__asan_before_dynamic_init`
...
`__asan_before_dynamic_init`
to do minimal incremental poisoning.
It's NFC because callbacks are currently invoked in pairs:
`__asan_before_dynamic_init`
`__asan_after_dynamic_init`
`__asan_before_dynamic_init`
`__asan_after_dynamic_init`
and `__asan_after_dynamic_init` unpoisons
everything anyway.
For #101837
|
|
This is a non-feature change that enables most of the entrypoints for
aarch64 based runtime builds. It fixes an additional problem that some
compiler-rt targets are not defined at the time of dependency checking
thus leading to false-negatives.
|
|
This patch implements tracking for the insertion of a
sandboxir::Instruction into a sandboxir::BasicBlock.
|
|
Use const SCEV * explicitly in more places to prepare for
https://github.com/llvm/llvm-project/pull/91961. Split off as suggested.
|
|
Ctx was introduced in March 2022 as a more suitable place for such
singletons.
|
|
Ctx was introduced in March 2022 as a more suitable place for such
singletons. ctx's hidden visibility optimizes generated instructions.
This change fixes a pitfall: certain ElfSym members (e.g.
globalOffsetTable, tlsModuleBase) were not zeroed and might be stale
when lld::elf::link was invoked the second time.
|
|
Ctx was introduced in March 2022 as a more suitable place for such
singletons. ctx's hidden visibility optimizes generated instructions.
bufferStart and tlsPhdr, which are not OutputSection, can now be moved
outside of `Out`.
|
|
The test is intended to check the order of modules
in DynInitPoison, only relative to
DynInitUnpoison.
Followup to #101584
|
|
The patch just switches container from plain
list of globals, to a map grouped by module.
Prepare for incremental poisoning in #101837
|
|
(#101774)
python3 wasn't able to see modules installed by pip, so we need to use
the setup-python action to ensure that the default pip and python3 both
use the same prefix.
See https://github.com/actions/runner-images/issues/10385
|
|
RHS first.
|
|
We need to evaluate the callee before the arguments.
|
|
#98489 resurrected an [old patch](https://reviews.llvm.org/D10833) that
was adding new libclang functions. That PR got merged with old `LLVM_13`
symbol versions for new functions. This patch fixes this oversight.
|
|
Basically, these operations are equivalent to a loop that iterates all
elements and then does a `getelementptr` (without `inbounds`!) plus
`load`/`store` only for the masked-on elements.
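A scalar model of that description, as a minimal sketch: the function names are hypothetical, the element type is fixed to `int`, and the pass-through operand of the masked load is an assumption that the message above does not mention. Note that C++ pointer arithmetic is closer to `inbounds` semantics, so this model forms an address only for masked-on lanes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar model of a masked load: only masked-on lanes touch
// memory; masked-off lanes keep the pass-through value.
template <std::size_t N>
std::array<int, N> masked_load_sketch(const int* base,
                                      const std::array<bool, N>& mask,
                                      const std::array<int, N>& passthru) {
  std::array<int, N> result = passthru;
  for (std::size_t i = 0; i < N; ++i) {
    if (mask[i]) result[i] = base[i];  // address formed and loaded only if enabled
  }
  return result;
}

// Hypothetical scalar model of a masked store: masked-off lanes write nothing.
template <std::size_t N>
void masked_store_sketch(int* base,
                         const std::array<bool, N>& mask,
                         const std::array<int, N>& values) {
  for (std::size_t i = 0; i < N; ++i) {
    if (mask[i]) base[i] = values[i];
  }
}
```

In this model a masked-off lane may correspond to memory outside the underlying buffer without any access taking place.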
|
|
This patch tries to clean up some of the existing values in
getMemOpInfo. All values should now be in bytes (not bits), and the
MinOffset/MaxOffset are now always represented unscaled (the immediate
that will be present in the final instruction).
Although I could not find a place where it altered codegen, the offset
of a post-index instruction will be 0, not scale*imm. An
IsPostIndexLdStOpcode method has been added to try and make sure that
case is handled properly.
|
|
|
|
Always evaluate LHS first, then RHS.
|
|
Investigating #96612 shows our implementation was different from the
Standard and could cause UB. Testing the codegen showed quite a bit of
assembly generated for these functions. The functions have been written
differently which allows Clang to optimize the code to use simple CPU
rotate instructions.
Fixes: https://github.com/llvm/llvm-project/issues/96612
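For illustration, a rotate written to follow the wording of the Standard (reduce the shift modulo the bit width, return the input unchanged for a zero remainder, and never shift by the full width) can be sketched as follows. `rotl32` is a hypothetical name fixed to 32-bit values (C++14 or later); it is not the libc++ implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical 32-bit left rotate; every shift amount stays in [1, 31],
// so no shift is undefined.
constexpr std::uint32_t rotl32(std::uint32_t x, int s) {
  constexpr int N = std::numeric_limits<std::uint32_t>::digits;  // 32
  const int r = s % N;         // truncating remainder, in (-N, N)
  if (r == 0) return x;        // avoids shifting by the full width
  if (r > 0) return (x << r) | (x >> (N - r));
  return (x >> -r) | (x << (N + r));  // negative count rotates right
}

static_assert(rotl32(0x80000001u, 1) == 0x00000003u, "rotate left by one");
static_assert(rotl32(0x00000001u, -1) == 0x80000000u, "negative count rotates right");
static_assert(rotl32(0x12345678u, 32) == 0x12345678u, "full-width rotate is identity");
```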
|
|
The generator makes a few changes to the output
- removes the synopsis, it did not really show what was implemented
correctly.
- the output now is clang-format clean.
This code uses the new FTM data structure. Since the contents of this
structure are not up-to-date the code is only used in its tests.
|
|
This is to extract the NFC change in #96878 into a separate PR.
|
|
logic (#98805)
This commit moves the argument materialization logic from
`legalizeConvertedArgumentTypes` to
`legalizeUnresolvedMaterializations`.
Before this change:
- Argument materializations were created in
`legalizeConvertedArgumentTypes` (which used to call
`materializeLiveConversions`).
After this change:
- `legalizeConvertedArgumentTypes` creates a "placeholder"
`unrealized_conversion_cast`.
- The placeholder `unrealized_conversion_cast` is replaced with an
argument materialization (using the type converter) in
`legalizeUnresolvedMaterializations`.
- All argument and target materializations now take place in the same
location (`legalizeUnresolvedMaterializations`).
This commit brings us closer towards creating all source/target/argument
materializations in one central step, which can then be made optional
(and delegated to the user) in the future. (There is one more source
materialization step that has not been moved yet.)
This commit also consolidates all `build*UnresolvedMaterialization`
functions into a single `buildUnresolvedMaterialization` function.
This is a re-upload of #96329.
|
|
Change scope handling to allow multiple Destroy calls for a given scope,
provided each is preceded by an InitScope call. This is necessary to
properly allow nested scopes in loops.
|
|
This doesn't fix the attached test case, but at least we're not crashing
anymore.
|
|
|
|
Proof (Please run alive-tv with larger smt-to):
https://alive2.llvm.org/ce/z/-aqixk
FMF propagation: https://alive2.llvm.org/ce/z/zyKK_p
```
sqrt(X) < 0.0 --> false
sqrt(X) u>= 0.0 --> true
sqrt(X) u< 0.0 --> X u< 0.0
sqrt(X) u<= 0.0 --> X u<= 0.0
sqrt(X) > 0.0 --> X > 0.0
sqrt(X) >= 0.0 --> X >= 0.0
sqrt(X) == 0.0 --> X == 0.0
sqrt(X) u!= 0.0 --> X u!= 0.0
sqrt(X) <= 0.0 --> X == 0.0
sqrt(X) u> 0.0 --> X u!= 0.0
sqrt(X) u== 0.0 --> X u<= 0.0
sqrt(X) != 0.0 --> X > 0.0
!isnan(sqrt(X)) --> X >= 0.0
isnan(sqrt(X)) --> X u< 0.0
```
In most cases, `sqrt` cannot be eliminated since it has multiple uses.
But this patch will break data dependencies and allow the optimizer to sink
expensive `sqrt` calls into successor blocks.
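A few of the folds listed above can be spot-checked numerically. The sketch below assumes IEEE 754 `double` arithmetic without fast-math flags; the helper names are hypothetical, and the unordered predicate `u<` is modelled as "either operand is NaN, or less than". It is an informal check on sample values, not a replacement for the Alive proofs.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Models the unordered predicate `a u< b`: true if either side is NaN or a < b.
inline bool unordered_lt(double a, double b) {
  return std::isnan(a) || std::isnan(b) || a < b;
}

// Checks a subset of the listed folds for one input value.
inline bool folds_hold(double x) {
  const double s = std::sqrt(x);
  bool ok = true;
  ok &= !(s < 0.0);                                 // sqrt(X) <  0.0 --> false
  ok &= (s <= 0.0) == (x == 0.0);                   // sqrt(X) <= 0.0 --> X == 0.0
  ok &= (s > 0.0) == (x > 0.0);                     // sqrt(X) >  0.0 --> X >  0.0
  ok &= (s >= 0.0) == (x >= 0.0);                   // sqrt(X) >= 0.0 --> X >= 0.0
  ok &= (!std::isnan(s) && s != 0.0) == (x > 0.0);  // ordered != 0.0 --> X > 0.0
  ok &= std::isnan(s) == unordered_lt(x, 0.0);      // isnan(sqrt(X)) --> X u< 0.0
  return ok;
}
```

Inputs worth trying include `-0.0`, negative values, infinities, and NaN, since those are the cases where ordered and unordered predicates diverge.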
|
|
|
|
Add more relational operators.
|
|
(#101569)
The old version in the llvm/actions repo stopped working after the
version variables were moved out of llvm/CMakeLists.txt. Composite
actions are simpler and don't require javascript, which is why I
reimplemented it as a composite action.
This will fix the failing abi checks on the release branch.
|
|
This patch adds sandboxir::LoadInst::setVolatile() and sandboxir::StoreInst::setVolatile()
and the corresponding tracking class.
|
|
This PR adds the length intrinsic and an HLSL function that uses it.
The SPIRV implementation is left for a future PR.
This PR addresses #99134, though some SPIR-V changes still need to be
made to complete the task. Below is how this PR addresses #99134.
- "Implement `length` clang builtin" was done by defining `HLSLLength`
in Builtins.td
- "Link `length` clang builtin with hlsl_intrinsics.h" was done by using
the alias attribute to make `length` an alias of
`__builtin_hlsl_elementwise_length` in hlsl_intrinsics.h
- "Add sema checks for `length` to `CheckHLSLBuiltinFunctionCall` in
`SemaChecking.cpp`" was done, but in this case not in SemaChecking.cpp,
rather SemaHLSL.cpp. A case was added to the builtin to check for
semantic failures, and set `TheCall` up to have the right return type.
- "Add codegen for `length` to `EmitHLSLBuiltinExpr` in `CGBuiltin.cpp`"
was done. For scalars, fabs is emitted, otherwise, length is emitted.
- "Add codegen tests to `clang/test/CodeGenHLSL/builtins/length.hlsl`"
was done to test that `length` in HLSL emits the right intrinsic.
- "Add sema tests to `clang/test/SemaHLSL/BuiltIns/length-errors.hlsl`"
was done to test for diagnostics emitted in SemaHLSL.cpp
- "Create the `int_dx_length` intrinsic in `IntrinsicsDirectX.td`" was
done. Specifying return types and parameter types was difficult, but
`idot` was used for reference, and `llvm\include\llvm\IR\Intrinsics.td`
contains all the ways to express return / parameter types.
- "Create an intrinsic expansion of `int_dx_length` in
`llvm/lib/Target/DirectX/DXILIntrinsicExpansion.cpp`" was done, and was
mostly derived by looking at `TranslateLength` in `HLOperationLower.cpp`
in the DXC codebase.
- "Create the `length.ll` and `length_errors.ll` tests in
`llvm/test/CodeGen/DirectX/`" was done by taking the DXIL output of
`clang/test/CodeGenHLSL/builtins/length.hlsl` and running `opt -S
-dxil-intrinsic-expansion` and `opt -S -dxil-op-lower` on it, checking
for how the length intrinsic was either expanded or lowered.
- "Create the `int_spv_length` intrinsic in `IntrinsicsSPIRV.td`" was
done by copying `IntrinsicsDirectX.td`.
---------
Co-authored-by: Justin Bogner <mail@justinbogner.com>
|
|
|
|
|