Age | Commit message | Author | Files | Lines
2024-04-01run git merge mainusers/minglotus-6/spr/main.nfcmingmingl14557-335794/+787098
2024-04-01[workflows] issue-write: Avoid race condition when PR branch is deleted (#87118)Tom Stellard1-0/+9
Fixes #87102 .
2024-04-01[ThinLTO][TypeProf] Implement vtable def import (#79381)Mingming Liu6-36/+120
Add annotated vtable GUID as referenced variables in per function summary, and update bitcode writer to create value-ids for these referenced vtables. - This is the part3 of type profiling work, and described in the "Virtual Table Definition Import" [1] section of the RFC. [1] https://github.com/llvm/llvm-project/pull/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
2024-04-01[GISEL][NFC] Fix comment for widenScalarToNextPow2Michael Maitland1-1/+2
The docstring for this function incorrectly specified when a widening is not performed. This patch adds the additional specification for what happens when the type size is a power of two but it is less than MinSize.
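As a rough illustration of the rule this docstring fix describes, the following Python sketch models the widening decision on plain bit widths. The names `next_pow2` and `widen_to_next_pow2` are hypothetical and are not GlobalISel API; the sketch only assumes that the result is the next power of two, raised to `min_size` when that is larger.

```python
def next_pow2(n: int) -> int:
    """Return the smallest power of two that is >= n (n must be positive)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return 1 << (n - 1).bit_length()


def widen_to_next_pow2(size_in_bits: int, min_size: int = 0) -> int:
    """Hypothetical model of the widening rule, not the LLVM implementation.

    A size that is already a power of two is left alone unless it is
    smaller than min_size, in which case it is still widened to min_size.
    """
    return max(next_pow2(size_in_bits), min_size)
```

Under these assumptions a 17-bit scalar widens to 32 bits, a 16-bit scalar stays at 16, and an 8-bit scalar with `min_size=32` widens to 32 even though 8 is already a power of two.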
2024-04-01[Object,ELFTypes] Remove TargetEndiannessFangrui Song1-1/+0
Finish the rename by #86604
2024-04-01[workflows] issue-write: Exit early if there are no comments (#87114)Tom Stellard1-1/+1
This will eliminate some unnecessary REST API calls.
2024-04-01[libc] fixup ftello test (#87282)Nick Desaulniers1-1/+1
Use a seek offset that fits within the file size. This was missed in presubmit because the FILE based stdio tests aren't run in overlay mode; fullbuild is not tested in presubmit. WRITE_SIZE == 11, so using a value of 42 for offseto would cause the expression `WRITE_SIZE - offseto` to evaluate to -31 as an unsigned 64b integer (18446744073709551585ULL). Fixes #86928
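The wraparound arithmetic in this message can be checked with a small Python sketch. Python integers do not overflow, so the helper `sub_u64` (a hypothetical name) masks the result to 64 bits to imitate unsigned machine arithmetic; `WRITE_SIZE` and the offset 42 are the values quoted above.

```python
UINT64_MASK = (1 << 64) - 1


def sub_u64(a: int, b: int) -> int:
    """Subtract the way a 64-bit unsigned integer would, modulo 2**64."""
    return (a - b) & UINT64_MASK


WRITE_SIZE = 11

if __name__ == "__main__":
    # An offset larger than the file size wraps around to a huge value.
    print(sub_u64(WRITE_SIZE, 42))
    # An offset that fits within the file size behaves as expected.
    print(sub_u64(WRITE_SIZE, 5))
```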
2024-04-01[BOLT][NFC] Fix typoMaksim Panchenko1-1/+1
2024-04-01[lldb] Don't crash when attempting to parse breakpoint id `N.` as `N.*` (#87263)Jordan Rupprecht2-26/+28
We check if the next character after `N.` is `*` before we check its length. Using `split` on the string is cleaner and less error prone than using indices with `find` and `substr`.

Note: this does not make `N.` mean anything, it just prevents assertion failures. `N.` is treated the same as an unrecognized breakpoint name:

```
(lldb) breakpoint enable 1
1 breakpoints enabled.
(lldb) breakpoint enable 1.*
1 breakpoints enabled.
(lldb) breakpoint enable 1.
0 breakpoints enabled.
(lldb) breakpoint enable xyz
0 breakpoints enabled.
```

Found via LLDB fuzzers.
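The split-based approach can be sketched in Python as follows. This is only an illustration of the idea, not LLDB code: `parse_breakpoint_id` is a hypothetical helper, and `str.partition` stands in for the `split` call mentioned in the message. Splitting first means an empty minor part is just an empty string to reject, with no index arithmetic that could run past the end of the input.

```python
from typing import Optional, Tuple, Union

MinorId = Union[int, str, None]


def _is_number(text: str) -> bool:
    """True for a non-empty run of ASCII digits."""
    return text.isascii() and text.isdigit()


def parse_breakpoint_id(text: str) -> Optional[Tuple[int, MinorId]]:
    """Parse 'N', 'N.M' or 'N.*'; return None for anything else."""
    major, dot, minor = text.partition(".")
    if not _is_number(major):
        return None
    if not dot:
        return (int(major), None)  # plain 'N'
    if minor == "*":
        return (int(major), "*")   # 'N.*' means every location
    if _is_number(minor):
        return (int(major), int(minor))
    return None                    # 'N.' and other malformed input
```

With this sketch, `1.` is rejected in the same way as `xyz`, which mirrors the session shown in the message.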
2024-04-01[llvm][Support] Use `thread_local` caching for llvm::get_threadid() query on ↵Jeff Niu1-2/+5
Apple systems (#87219) I was profiling our compiler and noticed that `llvm::get_threadid` was at the top of the hotlist, taking up a surprising 5% (7 seconds) in the profile trace. It seems that computing this on macOS systems is non-trivial, so cache the result in a thread_local. Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
2024-04-01[PseudoProbe] Extend to skip instrumenting probe into the dests of invoke ↵Lei Wang6-20/+276
(#79919) Previously we only skipped instrumenting a probe in the `unwind` (`KnownColdBlock`) block. This PR extends that to skip both EH flows from `invoke`, i.e. it also skips the `normal` dest. For more context: when doing call-to-invoke conversion, the block is split by the `invoke` and two extra blocks (`normal` and `unwind`) are added. With this PR, the instrumentation is the same as the one before the call-to-invoke conversion. One significant benefit is that this can help mitigate the "unstable IR" issue (https://discourse.llvm.org/t/ipo-for-linkonce-odr-functions/69404): the two versions now have the same probe instrumentation and are expected to have the same checksum. To achieve the same checksum, some tweaks are needed: - Now it also skips incrementing the probe ID for the skipped probe. - The checksum is also computed based on the CFG that skips the EH edges. We observed this fixes ~5% mismatched samples.
2024-04-01[ubsan][NFC] Remove recently added `cl::init(false)`Vitaly Buka2-6/+4
Extracted from #84858
2024-04-01[scudo] Change isPowerOfTwo macro to return false for zero. (#87120)Christopher Ferris3-7/+9
Clean up all of the calls and remove the redundant == 0 checks. There is only one small visible change: for non-Android, the memalign function will now fail if alignment is zero. Previously this would have passed.
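The effect of the change can be shown with the usual bit trick, sketched here in Python rather than as the scudo macro. Both function names are hypothetical, and the "old" form is an assumption about what a check without zero handling looks like, not a quote of the previous macro.

```python
def is_power_of_two_without_zero_check(x: int) -> bool:
    """Assumed older form: the bit trick alone also accepts zero."""
    return (x & (x - 1)) == 0


def is_power_of_two(x: int) -> bool:
    """Form described in the message: zero is not a power of two."""
    return x != 0 and (x & (x - 1)) == 0
```

With the zero case handled inside the predicate, callers no longer need a separate `== 0` test, and an alignment of zero is rejected along with other non-powers of two.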
2024-04-01[clang] Fix bitfield access unit for vbase corner case (#87238)Nathan Sidwell2-48/+113
This fixes #87227: a vbase can be placed below nvsize when empty members and/or bases are in play. We must account for that.
2024-04-01[mlir] Remove ``dataclasses`` package from mlir ``requirements.txt`` (#87223)Kirill Podoprigora1-2/+1
The ``dataclasses`` package only makes sense for Python 3.6, because ``dataclasses`` has been included in the standard library since version 3.7. Python 3.6 has reached EOL, so all currently supported versions of Python (3.8, 3.9, 3.10, 3.11, 3.12) have this feature in their standard libraries. Therefore there is no need to install the ``dataclasses`` package now.
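For reference, on Python 3.7 and later the module is importable without any extra requirement, as this minimal sketch shows. The `Requirement` class is a made-up example and does not come from the MLIR Python sources.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Requirement:
    """Hypothetical record type, used only to exercise the stdlib module."""
    name: str
    markers: List[str] = field(default_factory=list)


if __name__ == "__main__":
    # The generated __repr__ and __eq__ come from the standard library.
    print(Requirement("numpy"))
```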
2024-04-01[C99] Claim conformance to "conversion of array to pointer not limited to ↵Aaron Ballman2-1/+39
lvalues" We don't have a document number for this, but the change was called out explicitly in the editor's comments in the C99 foreword.
2024-04-01[NFC]Precommit test for vtable import (#79363)Mingming Liu1-0/+61
A precommit test case to show function summary and global values when a function has instructions annotated with vtable profiles and indirect call profiles. - This is a precommit test for https://github.com/llvm/llvm-project/pull/79381
2024-04-01[clang] Factor out OpenACC part of `Sema` (#84184)Vlad Serebrennikov7-83/+125
This patch moves OpenACC parts of `Sema` into a separate class `SemaOpenACC` that is placed in a separate header `Sema/SemaOpenACC.h`. This patch is intended to be a model of factoring things out of `Sema`, so I picked a small OpenACC part. Goals are the following: 1) Split `Sema` into manageable parts. 2) Make dependencies between parts visible. 3) Improve Clang development cycle by avoiding recompiling unrelated parts of the compiler. 4) Avoid compile-time regressions. 5) Avoid notational regressions in the code that uses Sema.
2024-04-01Revert "[CodeGen] Fix register pressure computation in MachinePipeliner ↵Gulfem Savrun Yeniceri3-180/+166
(#87030)" This reverts commit a4dec9d6bc67c4d8fbd4a4f54ffaa0399def9627 because the test failed in the following builder: https://luci-milo.appspot.com/ui/p/fuchsia/builders/prod/clang-linux-x64/b8751864477467126481/overview
2024-04-01[GOFF] Wrap debug output with LLVM_DEBUG (#87252)Kai Nacke1-6/+3
The content of a GOFF record is always dumped if NDEBUG is not defined, which produces rather confusing output. This change wraps the dumping code in LLVM_DEBUG, so the dump is only done when debug output of this module is requested.
2024-04-01[clang-cl] Allow a colon after /Fo option (#87209)Russell Greene2-0/+4
Modeled after https://github.com/llvm/llvm-project/commit/8513a681f7d8d1188706762e712168aebc3119dd# According to https://learn.microsoft.com/en-us/cpp/build/reference/fo-object-file-name?view=msvc-170, `/Fo` accepts a trailing-colon variant. This is also tested in practice. This allows clang-cl to parse this. I just copied one of the existing tests; let me know if this is not the best way to do this. I tested that the test does not pass before the Options.td change, and that it does after. See also #46065
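The accepted spellings can be pictured with a toy parser. This is a sketch only: clang-cl handles options through its Options.td tables, not through code like this, and `parse_fo` is a hypothetical helper. The one point it illustrates is that the colon form has to be tried before the bare prefix, otherwise the colon would end up in the file name.

```python
from typing import Optional


def parse_fo(arg: str) -> Optional[str]:
    """Return the object-file name from '/Fo<name>' or '/Fo:<name>'."""
    # Try the longer prefix first so the colon is not kept in the name.
    for prefix in ("/Fo:", "/Fo"):
        if arg.startswith(prefix):
            return arg[len(prefix):]
    return None
```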
2024-04-01[C99] Claim conformance to WG14 N570Aaron Ballman2-1/+32
2024-04-01[libc][math] Implement atan2f correctly rounded to all rounding modes. (#86716)lntue22-73/+634
We compute atan2f(y, x) in 2 stages:
- Fast step: perform computations in double precision, with relative errors < 2^-50.
- Accurate step: if the result from the Fast step fails Ziv's rounding test, then we perform computations in double-double precision, with relative errors < 2^-100.

On Ryzen 5900X, worst-case latency is ~ 200 clocks, compared to average latency ~ 60 clocks, and average reciprocal throughput ~ 20 clocks.
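The two-stage structure can be sketched in Python under several assumptions. The rounding test below is the common form of Ziv's test: round both ends of the error interval to float32 and accept the fast result only if they agree. It models round-to-nearest only, whereas the commit covers all rounding modes. `math.atan2` stands in for the fast step, the 2^-50 bound is taken from the message rather than proven for `math.atan2`, and `accurate` is a hypothetical callback for the double-double step, which is not implemented here.

```python
import math
import struct
from typing import Callable, Optional


def to_f32(x: float) -> float:
    """Round a double to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def ziv_round(approx: float, rel_err: float) -> Optional[float]:
    """Return the float32 result if the error interval rounds one way only."""
    delta = abs(approx) * rel_err
    lo = to_f32(approx - delta)
    hi = to_f32(approx + delta)
    return lo if lo == hi else None


def atan2f_two_stage(y: float, x: float,
                     accurate: Callable[[float, float], float]) -> float:
    """Fast step first; call the accurate step only if the test fails."""
    fast = math.atan2(y, x)             # stand-in for the fast step
    rounded = ziv_round(fast, 2.0 ** -50)
    if rounded is not None:
        return rounded
    return to_f32(accurate(y, x))       # hypothetical accurate step
```

The test fails only when the approximation lies within its error bound of a float32 rounding boundary, which is why the accurate step affects worst-case latency rather than the average.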
2024-04-01[mlir][sparse] allow YieldOp to yield multiple values. (#87261)Peiming Liu4-23/+29
2024-04-01[libc] fixup missing include for fullbuild (#87266)Nick Desaulniers2-0/+2
Fixes #86928
2024-04-01Update the "Current Status" section of the website to be current. (#84507)Eric1-0/+19
The section discussed the reasons for the library's inception more than a decade ago. Now it discusses the progress libc++ has made, and the many impressive accomplishments our contributors have brought it. The initial section remains below. --------- Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
2024-04-01[HLSL] Implement array temporary support (#79382)Chris B49-31/+606
HLSL constant sized array function parameters do not decay to pointers. Instead constant sized array types are preserved as unique types for overload resolution, template instantiation and name mangling. This implements the change by adding a new `ArrayParameterType` which represents a non-decaying `ConstantArrayType`. The new type behaves the same as `ConstantArrayType` except that it does not decay to a pointer. Values of `ConstantArrayType` in HLSL decay during overload resolution via a new `HLSLArrayRValue` cast to `ArrayParameterType`. `ArrayParameterType` values are passed indirectly by-value to functions in IR generation resulting in callee generated memcpy instructions. The behavior of HLSL function calls is documented in the [draft language specification](https://microsoft.github.io/hlsl-specs/specs/hlsl.pdf) under the Expr.Post.Call heading. Additionally the design of this implementation approach is documented in [Clang's documentation](https://clang.llvm.org/docs/HLSL/FunctionCalls.html). Resolves #70123
2024-04-01[scudo] Do a M_PURGE call before changing release interval on Android (#87110)ChiaHungDuan1-0/+5
2024-04-01[libc][POSIX] implement fseeko, ftello (#86928)Shourya Goel11-4/+130
Fixes: #85287
2024-04-01[nfc] Disable a cpp compiler-rt test on ppc bigendian systems due to ↵Mingming Liu1-0/+5
build errors (#87262) `Linux/instrprof-vtable-value-prof.cpp` needs to be built for the test to run. However, cpp compile & link failed with undefined-ABI error [1]. See original failure in https://lab.llvm.org/buildbot/#/builders/18/builds/16429

[1]
```
FAIL: Profile-powerpc64 :: Linux/instrprof-vtable-value-prof.cpp (2406 of 2414)
******************** TEST 'Profile-powerpc64 :: Linux/instrprof-vtable-value-prof.cpp' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 3: /home/buildbots/llvm-external-buildbots/workers/ppc64be-sanitizer/sanitizer-ppc64be/build/build_debug/./bin/clang --driver-mode=g++ -m64 -ldl -fprofile-generate -fuse-ld=lld -O2 -g -fprofile-generate=. -mllvm -enable-vtable-value-profiling /home/buildbots/llvm-external-buildbots/workers/ppc64be-sanitizer/sanitizer-ppc64be/build/llvm-project/compiler-rt/test/profile/Linux/instrprof-vtable-value-prof.cpp -o /home/buildbots/llvm-external-buildbots/workers/ppc64be-sanitizer/sanitizer-ppc64be/build/build_debug/runtimes/runtimes-bins/compiler-rt/test/profile/Profile-powerpc64/Linux/Output/instrprof-vtable-value-prof.cpp.tmp-test
+ /home/buildbots/llvm-external-buildbots/workers/ppc64be-sanitizer/sanitizer-ppc64be/build/build_debug/./bin/clang --driver-mode=g++ -m64 -ldl -fprofile-generate -fuse-ld=lld -O2 -g -fprofile-generate=. -mllvm -enable-vtable-value-profiling /home/buildbots/llvm-external-buildbots/workers/ppc64be-sanitizer/sanitizer-ppc64be/build/llvm-project/compiler-rt/test/profile/Linux/instrprof-vtable-value-prof.cpp -o /home/buildbots/llvm-external-buildbots/workers/ppc64be-sanitizer/sanitizer-ppc64be/build/build_debug/runtimes/runtimes-bins/compiler-rt/test/profile/Profile-powerpc64/Linux/Output/instrprof-vtable-value-prof.cpp.tmp-test
ld.lld: error: /lib/../lib64/Scrt1.o: ABI version 1 is not supported
clang: error: linker command failed with exit code 1 (use -v to see invocation)
```
2024-04-01[libc] Include algorithm.h to parser.h (#87125)Caslyn Tonelli3-0/+3
This includes algorithm.h directly to provide the definition for `cpp::max` in parser.h. This will define `max(...)` in the libc namespace for build systems that pull in parser.h explicitly.
2024-04-01[mlir][TD] Allow op printing flags as `transform.print` attrs (#86846)Jakub Kuderski4-10/+94
Introduce 3 new optional attributes to the `transform.print` ops:
* `assume_verified`
* `use_local_scope`
* `skip_regions`

The primary motivation is to allow printing on large inputs that otherwise take forever to print and verify. For the full context, see this IREE issue: https://github.com/openxla/iree/issues/16901. Also add some tests and fix the op description.
2024-04-01[libc++] Optimize the two range overload of mismatch (#86853)Nikolas Klauser7-17/+74
```
----------------------------------------------------------------------
Benchmark                                              old         new
----------------------------------------------------------------------
bm_mismatch_two_range_overload<char>/1            0.941 ns     1.88 ns
bm_mismatch_two_range_overload<char>/2             1.43 ns     2.15 ns
bm_mismatch_two_range_overload<char>/3             1.95 ns     2.55 ns
bm_mismatch_two_range_overload<char>/4             2.58 ns     2.90 ns
bm_mismatch_two_range_overload<char>/5             3.75 ns     3.31 ns
bm_mismatch_two_range_overload<char>/6             5.00 ns     3.83 ns
bm_mismatch_two_range_overload<char>/7             5.59 ns     4.35 ns
bm_mismatch_two_range_overload<char>/8             6.37 ns     4.84 ns
bm_mismatch_two_range_overload<char>/16            11.8 ns     6.72 ns
bm_mismatch_two_range_overload<char>/64            45.5 ns     2.59 ns
bm_mismatch_two_range_overload<char>/512            366 ns     12.6 ns
bm_mismatch_two_range_overload<char>/4096          2890 ns     91.6 ns
bm_mismatch_two_range_overload<char>/32768        23038 ns      758 ns
bm_mismatch_two_range_overload<char>/262144      142813 ns     6573 ns
bm_mismatch_two_range_overload<char>/1048576     366679 ns    26710 ns
bm_mismatch_two_range_overload<short>/1           0.934 ns     1.88 ns
bm_mismatch_two_range_overload<short>/2            1.30 ns     2.58 ns
bm_mismatch_two_range_overload<short>/3            1.76 ns     3.28 ns
bm_mismatch_two_range_overload<short>/4            2.24 ns     3.98 ns
bm_mismatch_two_range_overload<short>/5            2.80 ns     4.92 ns
bm_mismatch_two_range_overload<short>/6            3.58 ns     6.01 ns
bm_mismatch_two_range_overload<short>/7            4.29 ns     7.03 ns
bm_mismatch_two_range_overload<short>/8            4.67 ns     7.39 ns
bm_mismatch_two_range_overload<short>/16           9.86 ns     13.1 ns
bm_mismatch_two_range_overload<short>/64           38.9 ns     4.55 ns
bm_mismatch_two_range_overload<short>/512           348 ns     27.7 ns
bm_mismatch_two_range_overload<short>/4096         2881 ns      225 ns
bm_mismatch_two_range_overload<short>/32768       23111 ns     1715 ns
bm_mismatch_two_range_overload<short>/262144     184846 ns    14416 ns
bm_mismatch_two_range_overload<short>/1048576    742885 ns    57264 ns
bm_mismatch_two_range_overload<int>/1             0.838 ns     1.19 ns
bm_mismatch_two_range_overload<int>/2              1.19 ns     1.65 ns
bm_mismatch_two_range_overload<int>/3              1.83 ns     2.06 ns
bm_mismatch_two_range_overload<int>/4              2.38 ns     2.42 ns
bm_mismatch_two_range_overload<int>/5              3.60 ns     2.47 ns
bm_mismatch_two_range_overload<int>/6              3.68 ns     3.05 ns
bm_mismatch_two_range_overload<int>/7              4.32 ns     3.36 ns
bm_mismatch_two_range_overload<int>/8              5.18 ns     3.58 ns
bm_mismatch_two_range_overload<int>/16             10.6 ns     2.84 ns
bm_mismatch_two_range_overload<int>/64             39.0 ns     7.78 ns
bm_mismatch_two_range_overload<int>/512             247 ns     53.9 ns
bm_mismatch_two_range_overload<int>/4096           1927 ns      429 ns
bm_mismatch_two_range_overload<int>/32768         15569 ns     3393 ns
bm_mismatch_two_range_overload<int>/262144       125413 ns    28504 ns
bm_mismatch_two_range_overload<int>/1048576      504549 ns   112729 ns
```
2024-04-01[InstrFDO][TypeProf] Implement binary instrumentation and profile read/write ↵Mingming Liu17-192/+1419
(#66825) (The profile format change is split out into a standalone change in https://github.com/llvm/llvm-project/pull/81691)
* For InstrFDO value profiling, implement instrumentation and lowering for virtual table address.
  * This is controlled by `-enable-vtable-value-profiling` and off by default.
  * When the option is on, raw profiles will carry serialized `VTableProfData` structs and compressed vtables as payloads.
* Implement profile reader and writer support.
  * Raw profile reader is used by `llvm-profdata` but not the compiler. Raw profile reader will construct InstrProfSymtab with symbol names, and map profiled runtime addresses to vtable symbols.
  * Indexed profile reader is used by `llvm-profdata` and the compiler. When initialized, the reader stores a pointer to the beginning of the in-memory compressed vtable names and the length of the string. When used in `llvm-profdata`, the reader decompresses the string to show symbols of a profiled site. When used in the compiler, string decompression doesn't happen since IR is used to construct InstrProfSymtab.
  * Indexed profile writer collects the list of vtable names, and stores that to indexed profiles.
  * Text profile reader and writer support are added but mostly follow the implementation for indirect-call value type.
* `llvm-profdata show -show-vtables <args> <profile>` is implemented.

rfc in https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600#pick-instrumentation-points-and-instrument-runtime-types-7
2024-04-01[mlir][NFC] Simplify type checks with isa predicates (#87183)Jakub Kuderski28-118/+83
For more context on isa predicates, see: https://github.com/llvm/llvm-project/pull/83753.
2024-04-01[RISCV] ReadStoreData is read later in the pipeline for SiFive7 (#86454)Michael Maitland1-1/+1
Store data is read later in the pipeline, so we use SiFive7AnyToGPRBypass to model that a store instruction can begin some cycles before that data is ready.
2024-04-01[SLP]Fix PR87011: Missing sign extension of demoted type before zero extensionAlexey Bataev6-51/+19
Need to stop skipping the first zext/sext nodes; skipping them leads to incorrect and less profitable code.
2024-04-01[GISEL] G_SPLAT_VECTOR can take a splat that is larger than the vector ↵Michael Maitland3-7/+18
element (#86974) This is what SelectionDAG does. We'd like to reuse SelectionDAG patterns.
2024-04-01[Libomptarget] Fix resizing the buffer of RPC handlesJoseph Huber3-2/+8
Summary: The previous code would potentially make it smaller if a device with a lower ID touched it later. Also we should minimize changes to the state for multi threaded reasons. This just sets up an owned slot for each at initialization time.
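The failure mode described here can be reproduced with a small Python model. It is a sketch under assumptions: `resize_to_device` imitates a resize to `device_id + 1` (which is how the shrinking is assumed to have happened), and `HandleTable` is a hypothetical stand-in for allocating one owned slot per device at initialization time.

```python
from typing import List, Optional


def resize_to_device(slots: List[Optional[str]],
                     device_id: int) -> List[Optional[str]]:
    """Assumed buggy pattern: always resize to device_id + 1.

    A lower device ID arriving later truncates the list and drops the
    handles of higher-numbered devices.
    """
    resized = slots[: device_id + 1]
    resized.extend([None] * (device_id + 1 - len(resized)))
    return resized


class HandleTable:
    """One slot per device, allocated once up front and never resized."""

    def __init__(self, num_devices: int) -> None:
        self._slots: List[Optional[str]] = [None] * num_devices

    def set(self, device_id: int, handle: str) -> None:
        self._slots[device_id] = handle

    def get(self, device_id: int) -> Optional[str]:
        return self._slots[device_id]
```

Because the table in the second form never changes size after construction, a device touching it later cannot invalidate another device's slot.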
2024-04-01[Flang] Relaxing an error when contiguous pointer is assigned to a ↵harishch41-1/+1
non-contig… (#86781) …uous function. Fix from [thtsikas](https://github.com/thtsikas) based on a discussion in [slack](https://flang-compiler.slack.com/archives/C5C58TT32/p1711124374836079). Example:

```
Program test
  Integer, Pointer, Contiguous :: cont(:)
  Interface
    Function f()
      Integer, Pointer :: f(:)
    End Function
  End Interface
  cont => f()
  Print *, cont(3)
End Program

Function f()
  Integer, Pointer :: f(:)
  Allocate (f(4),Source=[1,1,42,1])
  ! f => f(4:1:-1) !! not contiguous, runtime error
End Function f
```

Understanding is that the standard intended to allow this pattern. The restriction 10.2.2.3 p6 Data pointer assignment "If the pointer object has the CONTIGUOUS attribute, the pointer target shall be contiguous." is not associated with a numbered constraint. If there is a mechanism for injecting runtime checks, this would be a place to do it. Absent that, a warning is the best we can do. No other compiler treats contigPtr => func() as an error when func() is not CONTIGUOUS, so a warning would probably be better for consistency. https://godbolt.org/z/5cM6roeEE
2024-04-01[VPlan] Use recipe's debug loc for VPWidenMemoryInstructionRecipe (NFCI)Florian Hahn4-17/+21
Now that VPRecipeBase manages debug locations for recipes, use it in VPWidenMemoryInstructionRecipe.
2024-04-01[TableGen] Fix MacroFusion.tdWang Pengcheng1-4/+4
We are missing `[[maybe_unused]]`.
2024-04-01[VPlan] Inline addVPValue into single caller (NFCI).Florian Hahn1-8/+3
Inline the function into its single caller.
2024-04-01[TableGen][NFC] Add maybe_unused to MRI (#87044)Pengcheng Wang1-1/+2
This suppresses warning `unused variable 'MRI' [-Wunused-variable]` for those fusions that don't need `MRI`.
2024-04-01[MLIR][Arith] Add rounding mode attribute to `truncf` (#86152)Victor Perez13-14/+309
Add rounding mode attribute to `arith`. This attribute can be used in different FP `arith` operations to control rounding mode. Rounding modes correspond to IEEE 754-specified rounding modes. Use in `arith.truncf` folding. As this is not supported in dialects other than LLVM, conversion should fail for now in case this attribute is present. --------- Signed-off-by: Victor Perez <victor.perez@codeplay.com>
2024-04-01[TableGen] Introduce a less aggressive suppression for HwMode Decoder… ↵superZWT1233-47/+137
(#86060)
1. Remove 'AllModes' and 'DefaultMode' suffixes for DecoderTables under default HwMode.
2. Introduce a less aggressive suppression for HwMode DecoderTable, only reduce necessary tables duplications. This allows encodings under different HwModes to retain the original DecoderNamespace.
3. Change 'suppress-per-hwmode-duplicates' command option from bool type to enum type, allowing users to choose what level of suppression to use.
2024-04-01[NFC] [Serialization] Reordering lexical and visible TU block after type ↵Chuanqi Xu2-31/+42
decl offsets This patch reorders the lexical block for the translation unit, the visible update block for the TU and the visible update block for the extern C context after the type decl offsets block. This should be an NFC patch. This is helpful for later optimizations for eliding unreachable declarations in the global module fragment. See the comments in https://github.com/llvm/llvm-project/pull/76930. Simply put, if we want to get the reachable set of declarations during the writing process, we need to write the file-level context later than the process of writing declarations (which is the main process to determine the reachable set).
2024-04-01[CodeGen] Fix register pressure computation in MachinePipeliner (#87030)Ryotaro KASUGA3-166/+180
`RegisterClassInfo::getRegPressureSetLimit` has been changed to return a smaller value than before so the limit may become negative in later calculations. As a workaround, change to use `TargetRegisterInfo::getRegPressureSetLimit`. Also improve tests.
2024-04-01[clang-tidy] add new check readability-enum-initial-value (#86129)Congcong Cai9-0/+431
Fixes: #85243.
2024-04-01[mlir][math] Expand powfI operation for constant power operand. (#87081)Prashant Kumar4-5/+183
-- Convert `math.fpowi` to a series of `arith.mulf` operations. -- If the power is negative, we divide 1 by the result.
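The arithmetic behind this expansion can be sketched in Python. This models only the numeric result for a constant integer power, using repeated multiplication and a reciprocal for negative powers; it is not the MLIR pass, the name `fpowi_expanded` is hypothetical, and the number of multiplications the real pass emits is not stated in the message.

```python
def fpowi_expanded(base: float, power: int) -> float:
    """Compute base**power with multiplications only, then a reciprocal."""
    result = 1.0
    for _ in range(abs(power)):
        result *= base          # one multiplication per unit of power
    if power < 0:
        result = 1.0 / result   # negative power: take the reciprocal
    return result
```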