aboutsummaryrefslogtreecommitdiff
path: root/llvm/lib/AsmParser/LLLexer.cpp
AgeCommit message (Collapse)AuthorFilesLines
2022-11-15Revert "[MemProf] ThinLTO summary support"Teresa Johnson1-8/+0
This reverts commit 47459455009db4790ffc3765a2ec0f8b4934c2a4. Revert while I try to fix a couple of non-Linux build failures.
2022-11-15[MemProf] ThinLTO summary supportTeresa Johnson1-0/+8
Implements the ThinLTO summary support for memprof related metadata. This includes support for the assembly format, and for building the summary from IR during ModuleSummaryAnalysis. To reduce space in both the bitcode format and the in memory index, we do 2 things: 1. We keep a single vector of all uniq stack id hashes, and record the index into this vector in the callsite and allocation memprof summaries. 2. When building the combined index during the LTO link, the callsite and allocation memprof summaries are only kept on the FunctionSummary of the prevailing copy. Differential Revision: https://reviews.llvm.org/D135714
2022-11-04[IR] Switch everything to use memory attributeNikita Popov1-0/+3
This switches everything to use the memory attribute proposed in https://discourse.llvm.org/t/rfc-unify-memory-effect-attributes/65579. The old argmemonly, inaccessiblememonly and inaccessiblemem_or_argmemonly attributes are dropped. The readnone, readonly and writeonly attributes are restricted to parameters only. The old attributes are auto-upgraded both in bitcode and IR. The bitcode upgrade is a policy requirement that has to be retained indefinitely. The IR upgrade is mainly there so it's not necessary to update all tests using memory attributes in this patch, which is already large enough. We could drop that part after migrating tests, or retain it longer term, to make it easier to import IR from older LLVM versions. High-level Function/CallBase APIs like doesNotAccessMemory() or setDoesNotAccessMemory() are mapped transparently to the memory attribute. Code that directly manipulates attributes (e.g. via AttributeList) on the other hand needs to switch to working with the memory attribute instead. Differential Revision: https://reviews.llvm.org/D135780
2022-10-21[IR] Add support for memory attributeNikita Popov1-0/+6
This implements IR and bitcode support for the memory attribute, as specified in https://reviews.llvm.org/D135597. The new attribute is not used for anything yet (and as such, the old memory attributes are unaffected). Differential Revision: https://reviews.llvm.org/D135592
2022-09-15[AArch64][SME] Fix lowering of llvm.aarch64.get.pstatesm()Sander de Smalen1-0/+2
A thread may not have access to SME or TPIDR2_EL0, so in order to safely query PSTATE.SM in a streaming-compatible function, the code should call `__arm_sme_state()`, as described in the ABI: https://github.com/ARM-software/abi-aa/pull/123/commits/c2bb09c4d4ee60a5787baf1ccc7e92e67e4240b7 This means that the value of pstate.sm is: * 0 if the function is non-streaming. * 1 if the function has `arm_streaming` or `arm_locally_streaming`. * evaluated at runtime by a call to __arm_sme_state() otherwise. This patch also adds a calling convention for calls to SME support routines. At some point we can remove the need for the llvm.aarch64.get.pstatesm() intrinsic and use function calls (with the corresponding cc) directly instead. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D131571
2022-07-13Remove 'no_sanitize_memtag'. Add 'sanitize_memtag'.Mitch Phillips1-1/+0
For MTE globals, we should have clang emit the attribute for all GV's that it creates, and then use that in the upcoming AArch64 global tagging IR pass. We need a positive attribute for this sanitizer (rather than implicit sanitization of all globals) because it needs to interact with other parts of LLVM, including: 1. Suppressing certain global optimisations (like merging), 2. Emitting extra directives by the ASM writer, and 3. Putting extra information in the symbol table entries. While this does technically make the LLVM IR / bitcode format non-backwards-compatible, nobody should have used this attribute yet, because it's a no-op. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D128950
2022-07-06[LLVM] Add the support for fmax and fmin in atomicrmw instructionShilei Tian1-1/+1
This patch adds the support for `fmax` and `fmin` operations in `atomicrmw` instruction. For now (at least in this patch), the instruction will be expanded to CAS loop. There are already a couple of targets supporting the feature. I'll create another patch(es) to enable them accordingly. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D127041
2022-06-10Add sanitizer-specific GlobalValue attributes.Mitch Phillips1-0/+5
Plan is the migrate the global variable metadata for sanitizers, that's currently carried around generally in the 'llvm.asan.globals' section, onto the global variable itself. This patch adds the attribute and plumbs it through the LLVM IR and bitcode formats, but is a no-op other than that so far. Reviewed By: vitalybuka, kstoimenov Differential Revision: https://reviews.llvm.org/D126100
2022-04-27[AsmParser] Automatically declare and lex attribute keywords (NFC)Nikita Popov1-82/+5
Rather than listing these by hand, include all enum attribute keywords from Attributes.inc. This reduces the number of places one has to update whenever an enum attribute is added. Differential Revision: https://reviews.llvm.org/D124465
2022-04-26Attributes: add a new `allocptr` attributeAugie Fackler1-0/+1
This continues the push away from hard-coded knowledge about functions towards attributes. We'll use this to annotate free(), realloc() and cousins and obviate the hard-coded list of free functions. Differential Revision: https://reviews.llvm.org/D123083
2022-04-05[LLVMContext] Replace enableOpaquePointers() with setOpaquePointers()Nikita Popov1-2/+2
This allows both explicitly enabling and explicitly disabling opaque pointers, in anticipation of the default switching at some point. This also slightly changes the rules by allowing calls if either the opaque pointer mode has not yet been set (explicitly or implicitly) or if the value remains unchanged.
2022-03-21Revert "Revert "[OpaquePtr][LLParser] Automatically detect opaque pointers ↵Arthur Eubanks1-1/+4
in .ll files"" This reverts commit 9c96a6bbfdde665b5c2389100a15acdeea0f4145. Issues were already fixed at head.
2022-03-21Revert "[OpaquePtr][LLParser] Automatically detect opaque pointers in .ll files"Mitch Phillips1-4/+1
This reverts commit 295172ef51c6b9a73bc0fdcfd25f8c41ead9034a. Reason: Broke the ASan buildbot. More details are available on the original Phab review at https://reviews.llvm.org/D119482.
2022-03-17[OpaquePtr][LLParser] Automatically detect opaque pointers in .ll filesArthur Eubanks1-1/+4
This allows us to not have to specify -opaque-pointers when updating IR tests from typed pointers to opaque pointers. We detect opaque pointers in .ll files by looking for relevant tokens, either "ptr" or "*". Reviewed By: #opaque-pointers, nikic Differential Revision: https://reviews.llvm.org/D119482
2022-03-04Attributes: add a new allocalign attributeAugie Fackler1-0/+1
This will let us start moving away from hard-coded attributes in MemoryBuiltins.cpp and put the knowledge about various attribute functions in the compilers that emit those calls where it probably belongs. Differential Revision: https://reviews.llvm.org/D117921
2022-03-01[SanitizerBounds] Add support for NoSanitizeBounds functionTong Zhang1-0/+1
Currently adding attribute no_sanitize("bounds") isn't disabling -fsanitize=local-bounds (also enabled in -fsanitize=bounds). The Clang frontend handles fsanitize=array-bounds which can already be disabled by no_sanitize("bounds"). However, instrumentation added by the BoundsChecking pass in the middle-end cannot be disabled by the attribute. The fix is very similar to D102772 that added the ability to selectively disable sanitizer pass on certain functions. In this patch, if no_sanitize("bounds") is provided, an additional function attribute (NoSanitizeBounds) is attached to IR to let the BoundsChecking pass know we want to disable local-bounds checking. In order to support this feature, the IR is extended (similar to D102772) to make Clang able to preserve the information and let BoundsChecking pass know bounds checking is disabled for certain function. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D119816
2022-02-14Extend the `uwtable` attribute with unwind table kindMomchil Velikov1-0/+2
We have the `clang -cc1` command-line option `-funwind-tables=1|2` and the codegen option `VALUE_CODEGENOPT(UnwindTables, 2, 0) ///< Unwind tables (1) or asynchronous unwind tables (2)`. However, this is encoded in LLVM IR by the presence or the absence of the `uwtable` attribute, i.e. we lose the information whether to generate want just some unwind tables or asynchronous unwind tables. Asynchronous unwind tables take more space in the runtime image, I'd estimate something like 80-90% more, as the difference is adding roughly the same number of CFI directives as for prologues, only a bit simpler (e.g. `.cfi_offset reg, off` vs. `.cfi_restore reg`). Or even more, if you consider tail duplication of epilogue blocks. Asynchronous unwind tables could also restrict code generation to having only a finite number of frame pointer adjustments (an example of *not* having a finite number of `SP` adjustments is on AArch64 when untagging the stack (MTE) in some cases the compiler can modify `SP` in a loop). Having the CFI precise up to an instruction generally also means one cannot bundle together CFI instructions once the prologue is done, they need to be interspersed with ordinary instructions, which means extra `DW_CFA_advance_loc` commands, further increasing the unwind tables size. That is to say, async unwind tables impose a non-negligible overhead, yet for the most common use cases (like C++ exceptions), they are not even needed. This patch extends the `uwtable` attribute with an optional value: - `uwtable` (default to `async`) - `uwtable(sync)`, synchronous unwind tables - `uwtable(async)`, asynchronous (instruction precise) unwind tables Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D114543
2021-12-20[llvm][IR] Add no_cfi constantSami Tolvanen1-0/+1
With Control-Flow Integrity (CFI), the LowerTypeTests pass replaces function references with CFI jump table references, which is a problem for low-level code that needs the address of the actual function body. For example, in the Linux kernel, the code that sets up interrupt handlers needs to take the address of the interrupt handler function instead of the CFI jump table, as the jump table may not even be mapped into memory when an interrupt is triggered. This change adds the no_cfi constant type, which wraps function references in a value that LowerTypeTestsModule::replaceCfiUses does not replace. Link: https://github.com/ClangBuiltLinux/linux/issues/1353 Reviewed By: nickdesaulniers, pcc Differential Revision: https://reviews.llvm.org/D108478
2021-12-14[LTO] Ignore unreachable virtual functions in WPD in hybrid LTO.Mingming Liu1-0/+1
Differential Revision: https://reviews.llvm.org/D115492
2021-09-27[ThinLTO] Add noRecurse and noUnwind thinlink function attribute propagationmodimo1-0/+3
Thinlink provides an opportunity to propagate function attributes across modules, enabling additional propagation opportunities. This change propagates (currently default off, turn on with `disable-thinlto-funcattrs=1`) noRecurse and noUnwind based off of function summaries of the prevailing functions in bottom-up call-graph order. Testing on clang self-build: 1. There's a 35-40% increase in noUnwind functions due to the additional propagation opportunities. 2. Throughput is measured at 10-15% increase in thinlink time which itself is 1.5% of E2E link time. Implementation-wise this adds the following summary function attributes: 1. noUnwind: function is noUnwind 2. mayThrow: function contains a non-call instruction that `Instruction::mayThrow` returns true on (e.g. windows SEH instructions) 3. hasUnknownCall: function contains calls that don't make it into the summary call-graph thus should not be propagated from (e.g. indirect for now, could add no-opt functions as well) Testing: Clang self-build passes and 2nd stage build passes check-all ninja check-all with newly added tests passing Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D36850
2021-09-10[OpaquePtr] Forbid mixing typed and opaque pointersNikita Popov1-1/+9
Currently, opaque pointers are supported in two forms: The -force-opaque-pointers mode, where all pointers are opaque and typed pointers do not exist. And as a simple ptr type that can coexist with typed pointers. This patch removes support for the mixed mode. You either get typed pointers, or you get opaque pointers, but not both. In the (current) default mode, using ptr is forbidden. In -opaque-pointers mode, all pointers are opaque. The motivation here is that the mixed mode introduces additional issues that don't exist in fully opaque mode. D105155 is an example of a design problem. Looking at D109259, it would probably need additional work to support mixed mode (e.g. to generate GEPs for typed base but opaque result). Mixed mode will also end up inserting many casts between i8* and ptr, which would require significant additional work to consistently avoid. I don't think the mixed mode is particularly valuable, as it doesn't align with our end goal. The only thing I've found it to be moderately useful for is adding some opaque pointer tests in between typed pointer tests, but I think we can live without that. Differential Revision: https://reviews.llvm.org/D109290
2021-08-20[clang][Codegen] Introduce the disable_sanitizer_instrumentation attributeAlexander Potapenko1-0/+1
The purpose of __attribute__((disable_sanitizer_instrumentation)) is to prevent all kinds of sanitizer instrumentation applied to a certain function, Objective-C method, or global variable. The no_sanitize(...) attribute drops instrumentation checks, but may still insert code preventing false positive reports. In some cases though (e.g. when building Linux kernel with -fsanitize=kernel-memory or -fsanitize=thread) the users may want to avoid any kind of instrumentation. Differential Revision: https://reviews.llvm.org/D108029
2021-07-20[IR] Rename `comdat noduplicates` to `comdat nodeduplicate`Fangrui Song1-1/+1
In the textual format, `noduplicates` means no COMDAT/section group deduplication is performed. Therefore, if both sets of sections are retained, and they happen to define strong external symbols with the same names, there will be a duplicate definition linker error. In PE/COFF, the selection kind lowers to `IMAGE_COMDAT_SELECT_NODUPLICATES`. The name describes the corollary instead of the immediate semantics. The name can cause confusion to other binary formats (ELF, wasm) which have implemented/ want to implement the "no deduplication" selection kind. Rename it to be clearer. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D106319
2021-07-15[IR] Add elementtype attributeNikita Popov1-0/+1
This implements the elementtype attribute specified in D105407. It just adds the attribute and the specified verifier rules, but doesn't yet make use of it anywhere. Differential Revision: https://reviews.llvm.org/D106008
2021-06-14[LLParser] Remove outdated deplibsXuanda Yang1-1/+0
The comment mentions deplibs should be removed in 4.0. Removing it in this patch. Reviewed By: compnerd, dexonsmith, lattner Differential Revision: https://reviews.llvm.org/D102763
2021-05-25[SanitizeCoverage] Add support for NoSanitizeCoverage function attributeMarco Elver1-0/+1
We really ought to support no_sanitize("coverage") in line with other sanitizers. This came up again in discussions on the Linux-kernel mailing lists, because we currently do workarounds using objtool to remove coverage instrumentation. Since that support is only on x86, to continue support coverage instrumentation on other architectures, we must support selectively disabling coverage instrumentation via function attributes. Unfortunately, for SanitizeCoverage, it has not been implemented as a sanitizer via fsanitize= and associated options in Sanitizers.def, but rolls its own option fsanitize-coverage. This meant that we never got "automatic" no_sanitize attribute support. Implement no_sanitize attribute support by special-casing the string "coverage" in the NoSanitizeAttr implementation. To keep the feature as unintrusive to existing IR generation as possible, define a new negative function attribute NoSanitizeCoverage to propagate the information through to the instrumentation pass. Fixes: https://bugs.llvm.org/show_bug.cgi?id=49035 Reviewed By: vitalybuka, morehouse Differential Revision: https://reviews.llvm.org/D102772
2021-05-17IR/AArch64/X86: add "swifttailcc" calling convention.Tim Northover1-0/+1
Swift's new concurrency features are going to require guaranteed tail calls so that they don't consume excessive amounts of stack space. This would normally mean "tailcc", but there are also Swift-specific ABI desires that don't naturally go along with "tailcc" so this adds another calling convention that's the combination of "swiftcc" and "tailcc". Support is added for AArch64 and X86 for now.
2021-05-14IR+AArch64: add a "swiftasync" argument attribute.Tim Northover1-0/+1
This extends any frame record created in the function to include that parameter, passed in X22. The new record looks like [X22, FP, LR] in memory, and FP is stored with 0b0001 in bits 63:60 (CodeGen assumes they are 0b0000 in normal operation). The effect of this is that tools walking the stack should expect to see one of three values there: * 0b0000 => a normal, non-extended record with just [FP, LR] * 0b0001 => the extended record [X22, FP, LR] * 0b1111 => kernel space, and a non-extended record. All other values are currently reserved. If compiling for arm64e this context pointer is address-discriminated with the discriminator 0xc31a and the DB (process-specific) key. There is also an "i8** @llvm.swift.async.context.addr()" intrinsic providing front-ends access to this slot (and forcing its creation initialized to nullptr if necessary).
2021-05-13[IR] Introduce the opaque pointer typeArthur Eubanks1-0/+1
The opaque pointer type is essentially just a normal pointer type with a null pointee type. This also adds support for the opaque pointer type to the bitcode reader/writer, as well as to textual IR. To avoid confusion with existing pointer types, we disallow creating a pointer to an opaque pointer. Opaque pointer types should not be widely used at this point since many parts of LLVM still do not support them. The next steps are to add some very simple use cases of opaque pointers to make sure they work, then start pretending that all pointers are opaque pointers and see what breaks. https://lists.llvm.org/pipermail/llvm-dev/2021-May/150359.html Reviewed By: dblaikie, dexonsmith, pcc Differential Revision: https://reviews.llvm.org/D101704
2021-04-26[Lexer] Allow LLLexer to be used as an APIWilliam S. Moses1-1/+1
Explose LLVM Lexer for usage externally as an API Differential Revision: https://reviews.llvm.org/D100920
2021-03-22[IR] Add vscale_range IR function attributeBradley Smith1-0/+1
This attribute represents the minimum and maximum values vscale can take. For now this attribute is not hooked up to anything during codegen, this will be added in the future when such codegen is considered stable. Additionally hook up the -msve-vector-bits=<x> clang option to emit this attribute. Differential Revision: https://reviews.llvm.org/D98030
2021-01-27[ThinLTO] Add Visibility bits to GlobalValueSummary::GVFlagsFangrui Song1-0/+1
Imported functions and variable get the visibility from the module supplying the definition. However, non-imported definitions do not get the visibility from (ELF) the most constraining visibility among all modules (Mach-O) the visibility of the prevailing definition. This patch * adds visibility bits to GlobalValueSummary::GVFlags * computes the result visibility and propagates it to all definitions Protected/hidden can imply dso_local which can enable some optimizations (this is stronger than GVFlags::DSOLocal because the implied dso_local can be leveraged for ELF -shared while default visibility dso_local has to be cleared for ELF -shared). Note: we don't have summaries for declarations, so for ELF if a declaration has the most constraining visibility, the result visibility may not be that one. Differential Revision: https://reviews.llvm.org/D92900
2021-01-26Support for instrumenting only selected files or functionsPetr Hosek1-0/+1
This change implements support for applying profile instrumentation only to selected files or functions. The implementation uses the sanitizer special case list format to select which files and functions to instrument, and relies on the new noprofile IR attribute to exclude functions from instrumentation. Differential Revision: https://reviews.llvm.org/D94820
2021-01-26Revert "Support for instrumenting only selected files or functions"Petr Hosek1-1/+0
This reverts commit 4edf35f11a9e20bd5df3cb47283715f0ff38b751 because the test fails on Windows bots.
2021-01-26Support for instrumenting only selected files or functionsPetr Hosek1-0/+1
This change implements support for applying profile instrumentation only to selected files or functions. The implementation uses the sanitizer special case list format to select which files and functions to instrument, and relies on the new noprofile IR attribute to exclude functions from instrumentation. Differential Revision: https://reviews.llvm.org/D94820
2020-12-30[X86] Add x86_amx type for intel AMX.Luo, Yuanke1-0/+1
The x86_amx is used for AMX intrisics. <256 x i32> is bitcast to x86_amx when it is used by AMX intrinsics, and x86_amx is bitcast to <256 x i32> when it is used by load/store instruction. So amx intrinsics only operate on type x86_amx. It can help to separate amx intrinsics from llvm IR instructions (+-*/). Thank Craig for the idea. This patch depend on https://reviews.llvm.org/D87981. Differential Revision: https://reviews.llvm.org/D91927
2020-12-14[clang][IR] Add support for leaf attributeGulfem Savrun Yeniceri1-0/+1
This patch adds support for leaf attribute as an optimization hint in Clang/LLVM. Differential Revision: https://reviews.llvm.org/D90275
2020-11-25Adding PoisonValue for representing poison value explicitly in IRZhengyang Liu1-0/+1
Define ConstantData::PoisonValue. Add support for poison value to LLLexer/LLParser/BitcodeReader/BitcodeWriter. Add support for poison value to llvm-c interface. Add support for poison value to OCaml binding. Add m_Poison in PatternMatch. Differential Revision: https://reviews.llvm.org/D71126
2020-11-19[llvm][IR] Add dso_local_equivalent ConstantLeonard Chan1-0/+1
The `dso_local_equivalent` constant is a wrapper for functions that represents a value which is functionally equivalent to the global passed to this. That is, if this accepts a function, calling this constant should have the same effects as calling the function directly. This could be a direct reference to the function, the `@plt` modifier on X86/AArch64, a thunk, or anything that's equivalent to the resolved function as a call target. When lowered, the returned address must have a constant offset at link time from some other symbol defined within the same binary. The address of this value is also insignificant. The name is leveraged from `dso_local` where use of a function or variable is resolved to a symbol in the same linkage unit. In this patch: - Addition of `dso_local_equivalent` and handling it - Update Constant::needsRelocation() to strip constant inbound GEPs and take advantage of `dso_local_equivalent` for relative references This is useful for the [Relative VTables C++ ABI](https://reviews.llvm.org/D72959) which makes vtables readonly. This works by replacing the dynamic relocations for function pointers in them with static relocations that represent the offset between the vtable and virtual functions. If a function is externally defined, `dso_local_equivalent` can be used as a generic wrapper for the function to still allow for this static offset calculation to be done. See [RFC](http://lists.llvm.org/pipermail/llvm-dev/2020-August/144469.html) for more details. Differential Revision: https://reviews.llvm.org/D77248
2020-11-17Revert "[IR] add fn attr for no_stack_protector; prevent inlining on mismatch"Nick Desaulniers1-1/+0
This reverts commit b7926ce6d7a83cdf70c68d82bc3389c04009b841. Going with a simpler approach.
2020-11-09[AMDGPU] Add amdgpu_gfx calling conventionSebastian Neubauer1-0/+1
Add a calling convention called amdgpu_gfx for real function calls within graphics shaders. For the moment, this uses the same calling convention as other calls in amdgpu, with registers excluded for return address, stack pointer and stack buffer descriptor. Differential Revision: https://reviews.llvm.org/D88540
2020-10-23[IR] add fn attr for no_stack_protector; prevent inlining on mismatchNick Desaulniers1-0/+1
It's currently ambiguous in IR whether the source language explicitly did not want a stack a stack protector (in C, via function attribute no_stack_protector) or doesn't care for any given function. It's common for code that manipulates the stack via inline assembly or that has to set up its own stack canary (such as the Linux kernel) would like to avoid stack protectors in certain functions. In this case, we've been bitten by numerous bugs where a callee with a stack protector is inlined into an __attribute__((__no_stack_protector__)) caller, which generally breaks the caller's assumptions about not having a stack protector. LTO exacerbates the issue. While developers can avoid this by putting all no_stack_protector functions in one translation unit together and compiling those with -fno-stack-protector, it's generally not very ergonomic or as ergonomic as a function attribute, and still doesn't work for LTO. See also: https://lore.kernel.org/linux-pm/20200915172658.1432732-1-rkir@google.com/ https://lore.kernel.org/lkml/20200918201436.2932360-30-samitolvanen@google.com/T/#u Typically, when inlining a callee into a caller, the caller will be upgraded in its level of stack protection (see adjustCallerSSPLevel()). By adding an explicit attribute in the IR when the function attribute is used in the source language, we can now identify such cases and prevent inlining. Block inlining when the callee and caller differ in the case that one contains `nossp` when the other has `ssp`, `sspstrong`, or `sspreq`. Fixes pr/47479. Reviewed By: void Differential Revision: https://reviews.llvm.org/D87956
2020-10-20[IR] Adds mustprogress as a LLVM IR attributeAtmn Patel1-0/+1
This adds the LLVM IR attribute `mustprogress` as defined in LangRef through D86233. This attribute will be applied to functions with in languages like C++ where forward progress is guaranteed. Functions without this attribute are not required to make progress. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D85393
2020-07-20IR: Define byref parameter attributeMatt Arsenault1-0/+1
This allows tracking the in-memory type of a pointer argument to a function for ABI purposes. This is essentially a stripped down version of byval to remove some of the stack-copy implications in its definition. This includes the base IR changes, and some tests for places where it should be treated similarly to byval. Codegen support will be in a future patch. My original attempt at solving some of these problems was to repurpose byval with a different address space from the stack. However, it is technically permitted for the callee to introduce a write to the argument, although nothing does this in reality. There is also talk of removing and replacing the byval attribute, so a new attribute would need to take its place anyway. This is intended avoid some optimization issues with the current handling of aggregate arguments, as well as fixes inflexibilty in how frontends can specify the kernel ABI. The most honest representation of the amdgpu_kernel convention is to expose all kernel arguments as loads from constant memory. Today, these are raw, SSA Argument values and codegen is responsible for turning these into loads. Background: There currently isn't a satisfactory way to represent how arguments for the amdgpu_kernel calling convention are passed. In reality, arguments are passed in a single, flat, constant memory buffer implicitly passed to the function. It is also illegal to call this function in the IR, and this is only ever invoked by a driver of some kind. It does not make sense to have a stack passed parameter in this context as is implied by byval. It is never valid to write to the kernel arguments, as this would corrupt the inputs seen by other dispatches of the kernel. These argumets are also not in the same address space as the stack, so a copy is needed to an alloca. From a source C-like language, the kernel parameters are invisible. Semantically, a copy is always required from the constant argument memory to a mutable variable. The current clang calling convention lowering emits raw values, including aggregates into the function argument list, since using byval would not make sense. This has some unfortunate consequences for the optimizer. In the aggregate case, we end up with an aggregate store to alloca, which both SROA and instcombine turn into a store of each aggregate field. The optimizer never pieces this back together to see that this is really just a copy from constant memory, so we end up stuck with expensive stack usage. This also means the backend dictates the alignment of arguments, and arbitrarily picks the LLVM IR ABI type alignment. By allowing an explicit alignment, frontends can make better decisions. For example, there's real no advantage to an aligment higher than 4, so a frontend could choose to compact the argument layout. Similarly, there is a high penalty to using an alignment lower than 4, so a frontend could opt into more padding for small arguments. Another design consideration is when it is appropriate to expose the fact that these arguments are all really passed in adjacent memory. Currently we have a late IR optimization pass in codegen to rewrite the kernel argument values into explicit loads to enable vectorization. In most programs, unrelated argument loads can be merged together. However, exposing this property directly from the frontend has some disadvantages. We still need a way to track the original argument sizes and alignments to report to the driver. I find using some side-channel, metadata mechanism to track this unappealing. If the kernel arguments were exposed as a single buffer to begin with, alias analysis would be unaware that the padding bits betewen arguments are meaningless. Another family of problems is there are still some gaps in replacing all of the available parameter attributes with metadata equivalents once lowered to loads. The immediate plan is to start using this new attribute to handle all aggregate argumets for kernels. Long term, it makes sense to migrate all kernel arguments, including scalars, to be passed indirectly in the same manner. Additional context is in D79744.
2020-07-08[LLVM] Accept `noundef` attribute in function definitions/callsGui Andrade1-0/+1
The `noundef` attribute indicates an argument or return value which may never have an undef value representation. This patch allows LLVM to parse the attribute. Differential Revision: https://reviews.llvm.org/D83412
2020-06-10[StackSafety] Add info into function summaryVitaly Buka1-0/+2
Summary: This patch adds optional field into function summary, implements asm and bitcode serialization. YAML serialization is omitted and can be added later if needed. This patch includes this information into summary only if module contains at least one sanitize_memtag function. In a near future MTE is the user of the analysis. Later if needed we can provede more direct control on when information is included into summary. Reviewers: eugenis Subscribers: hiraditya, steven_wu, dexonsmith, arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80908
2020-05-28[ThinLTO] Compute the basic block count across modules.Hiroshi Yamauchi1-0/+1
Summary: Count the per-module number of basic blocks when the module summary is computed and sum them up during Thin LTO indexing. This is used to estimate the working set size under the partial sample PGO. This is split off of D79831. Reviewers: davidxl, espindola Subscribers: emaste, inglorion, hiraditya, MaskRay, steven_wu, dexonsmith, arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80403
2020-05-15[IR] Convert null-pointer-is-valid into an enum attributeNikita Popov1-0/+1
The "null-pointer-is-valid" attribute needs to be checked by many pointer-related combines. To make the check more efficient, convert it from a string into an enum attribute. In the future, this attribute may be replaced with data layout properties. Differential Revision: https://reviews.llvm.org/D78862
2020-05-15[IR][BFloat] Add BFloat IR typeTies Stuij1-2/+10
Summary: The BFloat IR type is introduced to provide support for, initially, the BFloat16 datatype introduced with the Armv8.6 architecture (optional from Armv8.2 onwards). It has an 8-bit exponent and a 7-bit mantissa and behaves like an IEEE 754 floating point IR type. This is part of a patch series upstreaming Armv8.6 features. Subsequent patches will upstream intrinsics support and C-lang support for BFloat. Reviewers: SjoerdMeijer, rjmccall, rsmith, liutianle, RKSimon, craig.topper, jfb, LukeGeeson, sdesmalen, deadalnix, ctetreau Subscribers: hiraditya, llvm-commits, danielkiss, arphaman, kristof.beyls, dexonsmith Tags: #llvm Differential Revision: https://reviews.llvm.org/D78190
2020-05-12Add nomerge function attribute to supress tail merge optimization in simplifyCFGZequan Wu1-0/+1
We want to add a way to avoid merging identical calls so as to keep the separate debug-information for those calls. There is also an asan usecase where having this attribute would be beneficial to avoid alternative work-arounds. Here is the link to the feature request: https://bugs.llvm.org/show_bug.cgi?id=42783. `nomerge` is different from `noline`. `noinline` prevents function from inlining at callsites, but `nomerge` prevents multiple identical calls from being merged into one. This patch adds `nomerge` to disable the optimization in IR level. A followup patch will be needed to let backend understands `nomerge` and avoid tail merge at backend. Reviewed By: asbirlea, rnk Differential Revision: https://reviews.llvm.org/D78659