Age | Commit message | Author | Files | Lines
2025-08-14Actually commit the codegen fixes this timeusers/ojhunt/while-loop-scopeOliver Hunt2-14/+35
2025-08-08other loop kinds in progressOliver Hunt5-40/+38
2025-08-07[clang][Sema] Fix the continue and break scope for while loopsOliver Hunt3-1/+15
Make sure we don't push the break and continue scope for a while loop until after we have evaluated the condition.
2025-08-07[AMDGPU] Adjust hard clause rules for gfx1250 (#152592)Stanislav Mekhanoshin5-7/+617
Change from GFX12: Relax S_CLAUSE rules to allow all non-flat memory types in the same clause, and all Flat types in the same clause. For VMEM/FLAT, clause types now look like:
- Non-Flat (load, store, atomic): buffer, global, scratch, TDM, Async
- Flat: load, store, atomic
2025-08-07[OpenMP] [IR Builder] Changes to Support Scan Operation (#136035)Anchu Rajendran S3-2/+765
Scan reductions are supported in OpenMP with the help of the `scan` directive. The reduction clause of the for loop/simd directive can take an `inscan` modifier, with the body of the directive specifying a `scan` directive. This PR implements the lowering logic for scan reductions in workshare loops of OpenMP. The body of the for loop is split into two loops (the Input phase loop and the Scan phase loop) and a scan reduction loop is added in the middle. The Input phase loop populates a temporary buffer with the initial values that are to be reduced. The buffer is used by the reduction loop to perform the scan reduction. The Scan phase loop copies the values of the buffer to the reduction variable before executing the scan phase. Below is a high-level view of the generated code.

```
<declare pointer to buffer> ptr
omp parallel {
  size num_iters = <num_iters>
  // temp buffer allocation
  omp masked {
    buff = malloc(num_iters*scanvarstype)
    *ptr = buff
  }
  barrier;
  // input phase loop
  for (i: 0..<num_iters>) {
    <input phase>;
    buffer = *ptr;
    buffer[i] = red;
  }
  // scan reduction
  omp masked {
    for (int k = 0; k != ceil(log2(num_iters)); ++k) {
      i=pow(2,k)
      for (size cnt = last_iter; cnt >= i; --cnt) {
        buffer = *ptr;
        buffer[cnt] op= buffer[cnt-i];
      }
    }
  }
  barrier;
  // scan phase loop
  for (0..<num_iters>) {
    buffer = *ptr;
    red = buffer[i];
    <scan phase>;
  }
  // temp buffer deletion
  omp masked {
    free(*ptr)
  }
  barrier;
}
```

The temporary buffer needs to be shared between all threads performing the reduction, since it is read and written in the Input and Scan workshare loops. This is achieved by declaring a pointer to the buffer in the shared region and having the master thread allocate the buffer dynamically. This is why allocation, deallocation and the scan reduction are performed within `masked`. The code is verified to produce correct results for Fortran programs with the code changes in the PR https://github.com/llvm/llvm-project/pull/133149
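The scan reduction step in the pseudocode above can be tried out in isolation. The following is a minimal single-threaded Python sketch, not the OpenMP IR Builder lowering itself; the function name `inclusive_scan` and the use of a plain list as the shared buffer are assumptions made for illustration.

```python
import operator


def inclusive_scan(values, op=operator.add):
    """Log-step inclusive scan over a copy of `values` (hypothetical helper)."""
    buffer = list(values)            # stands in for the shared temporary buffer
    n = len(buffer)
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without floating point.
    steps = (n - 1).bit_length() if n > 0 else 0
    for k in range(steps):
        stride = 2 ** k              # `i = pow(2, k)` in the pseudocode
        # Walk from the last element down to `stride`, so that
        # buffer[cnt - stride] still holds the value from the previous step.
        for cnt in range(n - 1, stride - 1, -1):
            buffer[cnt] = op(buffer[cnt], buffer[cnt - stride])
    return buffer


print(inclusive_scan([1, 2, 3, 4]))  # prints [1, 3, 6, 10]
```

Iterating `cnt` downward is what allows the update to be done in place: each element only reads a lower index that has not yet been modified in the current step.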
2025-08-07[sanitizer] Warn if allocator size exceeds max user virtual address (#152428)Thurston Dang1-0/+18
This warns the user of incompatible configurations, such as 39-bit and 42-bit VMAs for AArch64 non-Android Linux ASan (https://github.com/llvm/llvm-project/issues/145259).
2025-08-07[mlir][AMDGPU] Allow non-contiguous destination memrefs for gather_to_lds ↵Quinn Dawkins4-4/+15
(#152559) The requirement that the LDS operand is contiguous is overly restrictive because it's perfectly valid to have a subview depend on subgroup IDs that is still subgroup contiguous. We could continue trying to do this verification based on the number of copied elements, but instead this change just opts to clarify the semantics on the op definition.
2025-08-07[AMDGPU] Fix buffer addressing mode matching (#152584)Stanislav Mekhanoshin30-4399/+9513
Starting in gfx1250, voffset and immoffset are zero-extended from 32 bits to 45 bits before being added together.
2025-08-07[CI] Tee Ninja Output to Log FilesAiden Grossman2-5/+7
This patch makes all of the ninja commands in the monolithic-* scripts write to log files in the current working directory. The plan is to use this to feed the ninja log into generate_test_report_github.py so we can surface compilation errors. Related to #152246. Reviewers: Keenuts, lnihlen, cmtice, dschuff, gburgessiv Reviewed By: Keenuts, cmtice Pull Request: https://github.com/llvm/llvm-project/pull/152331
2025-08-07[LLDB] Run a few more PDB tests with native PDB as well (#152580)nerix3-4/+15
Some DIA PDB tests already pass with the native plugin, but this was not being tested. This adds test runs with the native plugin - no functional changes. In addition to the x86 calling convention test, there's also https://github.com/llvm/llvm-project/blob/9f102a90042fd3757c207112cfe64ee10182ace5/lldb/test/Shell/SymbolFile/PDB/calling-conventions-arm.test, but I can't test this.
2025-08-07[SLP][NFC]Cleanup undefs and the whole test, NFCAlexey Bataev1-61/+76
2025-08-07[HLSL] Add `isHLSLResourceRecordArray` method to `clang::Type` (#152450)Helena Kotas4-23/+16
Adds the `isHLSLResourceRecordArray()` method to the `Type` class. This method returns `true` if the `Type` represents an array of HLSL resource records. Defining this method on `Type` makes it accessible from both sema and codegen.
2025-08-07[flang][OpenMP] Break up ResolveOmpObject for readability, NFC (#151957)Krzysztof Parzyszek1-187/+176
The function ResolveOmpObject had a lot of highly-indented code in two variant visitors. Extract the visitors into their own functions, and reformat the code. Replace !(||) with !&&! in a couple of places to make the formatting a bit nicer. Use llvm::enumerate instead of manually maintaining iteration index.
2025-08-07[lld-macho] Process OSO prefix only textually in both input and output (#152063)Daniel Rodríguez Troitiño2-19/+23
The processing of `-oso_prefix` applies `llvm::sys::fs::real_path` to the user-provided value, but the result is later matched against the output of `make_absolute`. While `real_path` resolves special symbols like `.`, `..` and `~`, and resolves symlinks along the path, `make_absolute` does neither, causing an incompatibility in some situations.

On macOS, temporary directories are normally reported as `/var/folders/<random>`, but `/var` is in fact a symlink to `private/var`. If one is working in a temporary directory and uses `-oso_prefix .`, the prefix is expanded to `/private/var/folders/<random>`, while `make_absolute` expands to `/var/folders/<random>` instead, so `-oso_prefix` fails to remove the prefix from the `N_OSO` entries, leaving absolute paths to the temporary directory in the resulting file. This happens whenever the working directory includes a symlink, not only in temporary directories.

One could change the usage of `make_absolute` to use `real_path` as well, but `real_path` would mean checking the file system for each `N_OSO` entry. The other solution is to stop using `real_path` when processing `-oso_prefix` and manually expand an input of `.` the way `make_absolute` does. This second option is the one implemented here, since it is the closest to the visible behaviour of ld64 (as the removed comment notes), so it is the better one for compatibility.

This means that a test that checked the usage of the tilde as `-oso_prefix` needs to be removed (since it relied on `real_path`), and two new tests are provided checking that symlinks do not affect the result. The second test checks a change in behaviour: if one provides the input files with a prefix of `./`, then even when using `-oso_prefix .` the `./` prefix will stay in the `N_OSO` entries, because the matching is textual. This matches the observed behaviour of ld64.
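The mismatch described above comes down to comparing a symlink-resolved prefix against paths that were made absolute only textually. Here is a minimal Python sketch of the textual approach, assuming hypothetical helper names (`expand_oso_prefix`, `strip_oso_prefix`) and made-up directory names; it does not reproduce lld's actual code, and details such as separator handling are not modeled.

```python
def expand_oso_prefix(prefix, cwd):
    """Expand '.' to the working directory textually; leave other values as given."""
    return cwd if prefix == "." else prefix


def strip_oso_prefix(path, prefix):
    """Drop `prefix` from `path` only if it matches character for character."""
    return path[len(prefix):] if path.startswith(prefix) else path


# Hypothetical example values: the working directory as reported by the system,
# and the same directory after resolving the /var -> private/var symlink.
cwd = "/var/folders/tmp123"
resolved = "/private/var/folders/tmp123"
obj = cwd + "/foo.o"

print(strip_oso_prefix(obj, expand_oso_prefix(".", cwd)))  # prefix removed
print(strip_oso_prefix(obj, resolved))                     # prefix not removed
print(strip_oso_prefix("./foo.o", expand_oso_prefix(".", cwd)))  # './' stays
```

Because no file system lookup is involved, the cost per `N_OSO` entry is a single string comparison, which is the trade-off the commit message argues for.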
2025-08-07[clang][WebAssembly] Support reftypes & varargs in ↵Hood Chatham9-70/+151
test_function_pointer_signature (#150921) I fixed support for varargs functions (previously it didn't crash but the codegen was incorrect). I added tests for structs and unions which already work. With the multivalue abi they crash in the backend, so I added a sema check that rejects structs and unions for that abi. It will also crash in the backend if passed an int128 or float128 type.
2025-08-07[lldb] Fix UBSan complaints for #151460Igor Kudrin2-5/+5
2025-08-07[AMDGPU] bf16 clamp folding (#152573)Stanislav Mekhanoshin3-21/+22
2025-08-07Update .git-blame-ignore-revs for Pack/Unpack move (#152469)Andrzej Warzyński1-0/+3
Adds this large patch that merely moved Pack/Unpack Ops from the Tensor to Linalg dialects: * https://github.com/llvm/llvm-project/pull/123902
2025-08-07Revert "[NFC][lldb] Speed up lookup of shared modules (#152054)" (#152582)Augusto Noronha1-235/+7
This reverts commit 229d86026fa0e5d9412a0d5004532f0d9733aac6.
2025-08-07[CIR] Upstream EHScopeStack memory allocator (#152215)Andy Kaylor8-25/+219
When the cleanup handling code was initially upstreamed, a SmallVector was used to simplify the handling of the stack of cleanup objects. However, that mechanism won't scale well enough for the rate at which cleanup handlers are going to be pushed and popped while compiling a large program. This change introduces the custom memory allocator which is used in classic codegen and the CIR incubator. This does not otherwise change the cleanup handling implementation, and many parts of the infrastructure are still missing. This is not intended to have any observable effect on the generated CIR, but it does change the internal implementation significantly, so it's not exactly an NFC change. The functionality is covered by existing tests.
2025-08-07[DWARF] Speedup .gdb_index dumping (#151806)itrofimow1-7/+15
This patch drastically speeds up dumping .gdb_index for large indexes.
2025-08-07[bazel] Port #151410: constFoldBinaryOp (#152568)Jordan Rupprecht1-0/+1
2025-08-07 [flang][OMPIRBuilder][MLIR][llvm] Backend support for atomic control ↵Anchu Rajendran S6-15/+165
options (#151579) Adds MLIR-to-LLVM support for atomic control options. Atomic control options are used to specify architectural characteristics that help the lowering of atomic operations. The options used are: `-f[no-]atomic-remote-memory`, `-f[no-]atomic-fine-grained-memory`, `-f[no-]atomic-ignore-denormal-mode`. The legacy option `-m[no-]unsafe-fp-atomics` is aliased to `-f[no-]atomic-ignore-denormal-mode`. More details can be found in https://github.com/llvm/llvm-project/pull/102569. This PR implements the MLIR to LLVM lowering support of atomic control attributes specified with OpenMP `atomicUpdateOp`. Initial support can be found in PR: https://github.com/llvm/llvm-project/pull/150860
2025-08-07[ADT] Fix a comment typo in SmallPtrSet (NFC) (#152565)Kazu Hirata1-1/+1
In the large mode, SmallPtrSet uses quadratic probing with ProbeAmt++ just like DenseMap.
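To make the probing scheme concrete, here is a small Python sketch of quadratic probing with an incrementing probe amount over a power-of-two table. The function name `probe_sequence` and its parameters are hypothetical; this illustrates the general technique and is not SmallPtrSet's code.

```python
def probe_sequence(start, num_buckets):
    """Yield the buckets visited when the step grows by one on every probe."""
    # The technique relies on a power-of-two table size.
    assert num_buckets > 0 and num_buckets & (num_buckets - 1) == 0
    mask = num_buckets - 1
    bucket = start & mask
    probe_amt = 1
    for _ in range(num_buckets):
        yield bucket
        bucket = (bucket + probe_amt) & mask
        probe_amt += 1               # the `ProbeAmt++` step


print(list(probe_sequence(0, 8)))    # prints [0, 1, 3, 6, 2, 7, 5, 4]
```

The offsets from the start are the triangular numbers 0, 1, 3, 6, ..., which visit every bucket exactly once when the table size is a power of two, so a lookup in a non-full table always reaches an empty slot.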
2025-08-07[AMDGPU] Recognise bitmask operations as srcmods on select (#152119)Chris Jackson6-282/+1331
Add to the VOP patterns to recognise when or/xor/and are masking only the most significant bit of i32/v2i32/i64 and replace with the corresponding FP source modifier.
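The reason such bitmask operations can be treated as floating-point modifiers is that the most significant bit of an IEEE-754 value is its sign bit. The Python sketch below demonstrates this for 32-bit floats; the helper names are hypothetical, and the mapping shown (xor as negate, and as absolute value, or as negated absolute value) is the standard bit-level identity assumed here, not a transcription of the AMDGPU patterns.

```python
import struct

SIGN_BIT = 0x80000000


def float_to_bits(x):
    """Reinterpret a float as its 32-bit IEEE-754 bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as a float."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def neg_via_xor(x):
    return bits_to_float(float_to_bits(x) ^ SIGN_BIT)    # flips the sign bit


def abs_via_and(x):
    return bits_to_float(float_to_bits(x) & ~SIGN_BIT)   # clears the sign bit


def neg_abs_via_or(x):
    return bits_to_float(float_to_bits(x) | SIGN_BIT)    # sets the sign bit


print(neg_via_xor(1.5), abs_via_and(-2.0), neg_abs_via_or(3.0))  # prints -1.5 2.0 -3.0
```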
2025-08-07[RISCV] Basic Objdump Mapping Symbol Support (#151452)Sam Elliott8-50/+110
This implements very basic support for RISC-V mapping symbols in llvm-objdump, sharing the implementation with how Arm/AArch64/CSKY implement this feature. This only supports the `$x` (instruction) and `$d` (data) mapping symbols for RISC-V, and not the version of `$x` which includes an architecture string suffix.
2025-08-07[CIR] Add VTableAddrPointOp (#148730)Andy Kaylor6-0/+129
This change adds the definition of VTableAddrPointOp and the related AddressPointAttr to the CIR dialect, along with tests for the parsing and verification of these elements. Code to generate this operation will be added in a later change.
2025-08-07[mlir][rocdl] Add `readfirstlane` intrinsic (#152551)Ivan Butygin3-1/+39
2025-08-07[NFC][lldb] Speed up lookup of shared modules (#152054)Augusto Noronha1-7/+235
By profiling LLDB debugging a Swift application without a dSYM and with a large number of .o files, I identified that querying shared modules was the biggest bottleneck when running "frame variable" and Clang types need to be searched. One of the reasons for that slowness is that the shared module list can grow very large, and the search through it is O(n). To solve this issue, this patch adds a new hashmap to the shared module list whose key is the name of the module, and whose value is all the modules that share that name. This should speed up any search where the query contains the module name. rdar://156753350
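The idea of indexing the module list by name can be sketched in a few lines of Python. The class and method names below (`SharedModuleList`, `find_linear`, `find_indexed`) are hypothetical and the modules are plain strings; this is a toy model of the data structure described above, not LLDB's implementation.

```python
from collections import defaultdict


class SharedModuleList:
    """Toy module list with a name -> modules index next to the flat list."""

    def __init__(self):
        self._modules = []                    # (name, module) pairs, insertion order
        self._by_name = defaultdict(list)     # name -> all modules with that name

    def add(self, name, module):
        self._modules.append((name, module))
        self._by_name[name].append(module)

    def find_linear(self, name):
        # O(n): scans every module, as a plain list search would.
        return [m for n, m in self._modules if n == name]

    def find_indexed(self, name):
        # Average O(1) lookup plus the size of the result.
        return list(self._by_name.get(name, []))


modules = SharedModuleList()
modules.add("a.o", "a.o (arm64)")
modules.add("b.o", "b.o (arm64)")
modules.add("a.o", "a.o (x86_64)")
print(modules.find_indexed("a.o"))  # prints ['a.o (arm64)', 'a.o (x86_64)']
```

Storing a list per key matters because several modules can share one name, so the index narrows the candidates rather than identifying a single module.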
2025-08-07[Clang] Fix __cpuidex conflict with CUDA (#152556)Aiden Grossman2-0/+6
The landing of #126324 made it so that __has_builtin returns false for aux triple builtins. CUDA offloading can sometimes compile where the host is in the aux triple (i.e. x86_64). This patch explicitly carves out NVPTX so that we do not run into redefinition errors.
2025-08-07[DirectX] Overlapping binding detection - check register space first (#152250)Helena Kotas2-1/+38
The code that checks for overlapping binding did not compare register space when one of the bindings was for an unbounded resource array, leading to false errors. This change fixes it.
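A short Python sketch of why the register space has to be compared before the register ranges: an unbounded array covers every register from its lower bound upward, so a range-only check reports an overlap with bindings in other spaces. The tuple layout `(space, lower, upper)` with `None` for an unbounded upper bound and the function name `bindings_overlap` are assumptions for illustration, not the DirectX backend's data structures.

```python
import math


def bindings_overlap(a, b):
    """Each binding is (space, lower, upper); upper=None means unbounded."""
    space_a, lo_a, hi_a = a
    space_b, lo_b, hi_b = b
    # Bindings in different register spaces can never overlap.
    if space_a != space_b:
        return False
    hi_a = math.inf if hi_a is None else hi_a
    hi_b = math.inf if hi_b is None else hi_b
    # Standard closed-interval overlap test.
    return lo_a <= hi_b and lo_b <= hi_a


unbounded = (0, 0, None)     # unbounded resource array in space 0
other_space = (1, 5, 5)      # single register in space 1
same_space = (0, 5, 5)       # single register in space 0

print(bindings_overlap(unbounded, other_space))  # prints False
print(bindings_overlap(unbounded, same_space))   # prints True
```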
2025-08-07[ADT] Make `getAutoSenseRadix` in `StringRef` global (#152503)Ilia Kuklin3-3/+18
Needed in #152308
2025-08-07[PowerPC] Fix a warningKazu Hirata1-1/+1
This patch fixes: llvm/lib/Target/PowerPC/PPCSelectionDAGInfo.h:25:3: error: 'EmitTargetCodeForMemcmp' overrides a member function but is not marked 'override' [-Werror,-Winconsistent-missing-override]
2025-08-07[CIR] Mul CompoundAssignment support for ComplexType (#152354)Amr Hesham2-1/+213
This change adds support for Mul CompoundAssignment for ComplexType https://github.com/llvm/llvm-project/issues/141365
2025-08-07[clang][bytecode] Handle more invalid member pointer casts (#152546)Timm Baeder5-2/+64
2025-08-07[DFAJumpThreading] Prevent pass from using too much memory. (#145482)Bushev Dmitry1-14/+24
The limit 'dfa-max-num-paths', which controls the number of enumerated paths, was not checked inside getPathsFromStateDefMap. This could lead to large memory consumption for sufficiently complex switch statements.
2025-08-07[PowerPC][AIX] Using milicode for memcmp instead of libcall (#147093)zhijian lin11-9/+62
AIX has "millicode" routines, which are functions loaded at boot time into fixed addresses in kernel memory. This allows them to be customized for the processor. The __memcmp routine is a millicode implementation; we use millicode for the memcmp function instead of a library call to improve performance.
2025-08-07[CIR] add support for file scope assembly (#152093)gitoleg6-0/+43
This PR adds support for file scope assembly in CIR.
2025-08-07[NVPTX][Test-only] Add proper sm-version to ptxas-verify in ↵Abhilash Majumder1-1/+3
prefetch-inferas-test.ll (#152492) prefetch-inferas-test.ll was added in #146203, but because the ptxas version was missing, the CI defaults to sm 60. This patch adds the arg in the ptxas-verify check.
2025-08-07[CIR][NFC] Fix typo in ComplexRangeKind comment (#152535)Amr Hesham1-1/+1
Fix typo in ComplexRangeKind comment. Caught in https://github.com/llvm/clangir/pull/1779
2025-08-07Scalarize vector `mad` operations for integer types (#152228)Kaitlin Peng3-12/+194
Fixes #152220.
- Adds `dx_imad` and `dx_umad` to `isTargetIntrinsicTriviallyScalarizable`
- Adds tests that confirm the intrinsic is now scalarizing
2025-08-07[mlir] MemRefToSPIRV propagate alignment attributes from MemRef ops. (#151723)Erick Ochoa Lopez2-12/+38
This patchset:
* propagates alignment attributes from memref operations into the SPIR-V dialect,
* fixes an error in the logic which previously propagated alignment attributes but did not add other MemoryAccess attributes,
* adds a failure condition for the case where the alignment attribute from the memref dialect (64-bit wide) does not fit in SPIR-V's alignment attribute (specified to be 32-bit wide).
2025-08-07[lldb] Move the generic MCP server code into Protocol/MCP (NFC) (#152396)Jonas Devlieghere6-296/+320
This is a continuation of #152188, which started splitting up the MCP implementation into a generic implementation in Protocol/MCP that will be shared between LLDB and lldb-mcp. For now I kept all the networking code in the MCP server plugin. Once the changes to JSONTransport land, we might be able to move more of it into the Protocol library.
2025-08-07Remove __SHORT_FILE__ macro definition in CMake (#152344)Mehdi Amini3-30/+11
This per-file macro definition on the command line breaks caching of modules. See discussion in #150677. Instead, we use a constexpr function that processes the __FILE__ macro, but prefer the __FILE_NAME__ macro when available (clang/gcc) to save compile time in the frontend. If the constexpr function isn't const-evaluated, it is only evaluated when printing the debug message.
2025-08-07[mlir][emitc] Simplify emitc::isSupportedFloatType (NFC) (#152464)Andrey Timonin1-5/+2
2025-08-07[MLIR] Allow `constFoldBinaryOp` to fold `(T1, T1) -> T2` (#151410)Matthias Guenther4-33/+237
The `constFoldBinaryOp` helper function had limited support for different input and output types, but the static type of the underlying value (e.g. `APInt`) had to match between the inputs and the output. This worked fine for int comparisons of the form `(intN, intN) -> int1`, as the static type signature was `(APInt, APInt) -> APInt`. However, float comparisons map `(floatN, floatN) -> int1`, with a static type signature of `(APFloat, APFloat) -> APInt`. This use case wasn't supported by `constFoldBinaryOp`. `constFoldBinaryOp` now accepts an optional template argument overriding the return type in case it differs from the input type. If the new template argument isn't provided, the default behavior is unchanged (i.e. the return type will be assumed to match the input type). `constFoldUnaryOp` received similar changes in order to support folding non-cast ops of the form `(T1) -> T2` (e.g. a `sign` op mapping `(floatN) -> sint32`).
2025-08-07[AArch64] Move tryCombineToBSL. NFCDavid Green1-100/+100
This is for #151855, to make the changes more obvious.
2025-08-07[mlir][linalg]-Fix wrong assertion in the getMatchingYieldValue inter… ↵Amir Bishara1-1/+1
(#89590) …face In order to have a consistent implementation for getMatchingYieldValue for linalg generic with buffer/tensor semantics, we should assert the opOperand index based on the numDpsInits and not numOfResults which may be zero in the buffer semantics.
2025-08-07ELF: -r: Call assignAddresses only onceFangrui Song3-7/+3
The fixed-point layout algorithm handles linker scripts, thunks, and relaxOnce (to suppress out-of-range GOT-indirect-to-PC-relative optimization). These passes are not needed for relocatable links because they require address information that is not yet available. Since we don't scan relocations for relocatable links, the `createThunks` and `relaxOnce` functions are no-ops anyway, making these passes redundant. To prevent cluttering the line history, I place the `if (...) break;` inside the for loop. Pull Request: https://github.com/llvm/llvm-project/pull/152240
2025-08-07[mlir][vector] Replace vector.splat with vector.broadcast in some tests ↵James Newling16-53/+53
(#152230) Splat is deprecated, and being prepared for removal in a future release. https://discourse.llvm.org/t/rfc-mlir-vector-deprecate-then-remove-vector-splat/87143/5 The command I used catches almost every splat op:

```
perl -i -pe 's/vector\.splat\s+(\S+)\s*:\s*vector<((?:\[?\d+\]?x)*)\s*([^>]+)>/vector.broadcast $1 : $3 to vector<$2$3>/g' filename
```
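To see what the substitution above does to a line of IR, the same regular expression can be exercised from Python's `re` module. The pattern and replacement are taken from the Perl command; the wrapper name `splat_to_broadcast` and the sample input lines are assumptions made for illustration.

```python
import re

# Same pattern as the Perl one-liner: operand, shape prefix (e.g. "4x8x" or "[4]x"),
# and element type are captured as groups 1, 2 and 3.
SPLAT_RE = re.compile(
    r"vector\.splat\s+(\S+)\s*:\s*vector<((?:\[?\d+\]?x)*)\s*([^>]+)>"
)


def splat_to_broadcast(line):
    """Rewrite every vector.splat in `line` to the equivalent vector.broadcast."""
    return SPLAT_RE.sub(r"vector.broadcast \1 : \3 to vector<\2\3>", line)


print(splat_to_broadcast("%0 = vector.splat %a : vector<4x8xf32>"))
# prints %0 = vector.broadcast %a : f32 to vector<4x8xf32>
```

The element type appears twice in the output because `vector.broadcast` spells out both the source type and the result vector type, whereas `vector.splat` only names the result.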