rocket-tools/riscv-gnu-toolchain/llvm.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author	Files	Lines
3 days	[AMDGPU] Allow readonly features to be written to IR when there is no target ↵	Changpeng Fang	1	-2/+5
	(#148141) Fixes: SWDEV-541399
2025-07-09	[Clang] Respect MS layout attributes during CUDA/HIP device compilation ↵	Yaxun (Sam) Liu	1	-2/+3
	(#146620) This patch fixes an issue where Microsoft-specific layout attributes, such as __declspec(empty_bases), were ignored during CUDA/HIP device compilation on a Windows host. This caused a critical memory layout mismatch between host and device objects, breaking libraries that rely on these attributes for ABI compatibility. The fix introduces a centralized hasMicrosoftRecordLayout() check within the TargetInfo class. This check is aware of the auxiliary (host) target and is set during TargetInfo::adjust if the host uses a Microsoft ABI. The empty_bases, layout_version, and msvc::no_unique_address attributes now use this centralized flag, ensuring device code respects them and maintains layout consistency with the host. Fixes: https://github.com/llvm/llvm-project/issues/146047
2025-06-04	[HLSL][SPIR-V] Implement vk::ext_builtin_input attribute (#138530)	Nathan Gauër	1	-0/+2
	This variable attribute is used in HLSL to add Vulkan specific builtins in a shader. The attribute is documented here: https://github.com/microsoft/hlsl-specs/blob/17727e88fd1cb09013cb3a144110826af05f4dd5/proposals/0011-inline-spirv.md Those variable, even if marked as `static` are externally initialized by the pipeline/driver/GPU. This is handled by moving them to a specific address space `hlsl_input`, also added by this commit. The design for input variables in Clang can be found here: https://github.com/llvm/wg-hlsl/blob/355771361ef69259fef39a65caef8bff9cb4046d/proposals/0019-spirv-input-builtin.md Co-authored-by: Justin Bogner <mail@justinbogner.com>
2025-05-31	[Basic] Remove unused includes (NFC) (#142295)	Kazu Hirata	1	-1/+0
	These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.
2025-05-19	[AMDGPU] Set AS8 address width to 48 bits	Alexander Richardson	1	-4/+3
	Of the 128-bits of buffer descriptor only 48 bits are address bits, so following the discussion on https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54, the logic conclusion is to set the index width to 48 bits instead of the current value of 128. Most of the test changes are mechanical datalayout updates, but there is one actual change: the ptrmask test now uses .i48 instead of .i128 and I had to update SelectionDAGBuilder to correctly extend the mask. Reviewed By: krzysz00 Pull Request: https://github.com/llvm/llvm-project/pull/139419
2025-05-16	[clang] Use llvm::replace (NFC) (#140264)	Kazu Hirata	1	-2/+2

2025-04-22	[HLSL] Use hlsl_device address space for getpointer. (#127675)	Steven Perron	1	-0/+2
	We add the hlsl_device address space to represent the device memory space as defined in section 1.7.1.3 of the [HLSL spec](https://microsoft.github.io/hlsl-specs/specs/hlsl.pdf). Fixes https://github.com/llvm/llvm-project/issues/127075
2025-04-10	[HLSL][SPIR-V] Add hlsl_private address space for SPIR-V (#133464)	Nathan Gauër	1	-0/+4
	This is an alternative to https://github.com/llvm/llvm-project/pull/122103 In SPIR-V, private global variables have the Private storage class. This PR adds a new address space which allows frontend to emit variable with this storage class when targeting this backend. This is covered in this proposal: llvm/wg-hlsl@4c9e11a This PR will cause addrspacecast to show up in several cases, like class member functions or assignment. Those will have to be handled in the backend later on, particularly to fixup pointer storage classes in some functions. Before this change, global variable were emitted with the 'Function' storage class, which was wrong.
2025-04-02	[AMDGPU] Use a target feature to enable __builtin_amdgcn_global_load_lds on ↵	Juan Manuel Martinez Caamaño	1	-1/+1
	gfx9/10 (#133055) This patch introduces the `vmem-to-lds-load-insts` target feature, which can be used to enable builtins `__builtin_amdgcn_global_load_lds` and `__builtin_amdgcn_raw_ptr_buffer_load_lds` on platforms which have this feature. This feature is only available on gfx9/10. A limitation of using a common target feature for both builtins is that we could have made `__builtin_amdgcn_raw_ptr_buffer_load_lds` available on gfx6,7,8.
2025-02-27	Add clang atomic control options and attribute (#114841)	Yaxun (Sam) Liu	1	-2/+3
	Add option and statement attribute for controlling emitting of target-specific metadata to atomicrmw instructions in IR. The RFC for this attribute and option is https://discourse.llvm.org/t/rfc-add-clang-atomic-control-options-and-pragmas/80641, Originally a pragma was proposed, then it was changed to clang attribute. This attribute allows users to specify one, two, or all three options and must be applied to a compound statement. The attribute can also be nested, with inner attributes overriding the options specified by outer attributes or the target's default options. These options will then determine the target-specific metadata added to atomic instructions in the IR. In addition to the attribute, three new compiler options are introduced: `-f[no-]atomic-remote-memory`, `-f[no-]atomic-fine-grained-memory`, `-f[no-]atomic-ignore-denormal-mode`. These compiler options allow users to override the default options through the Clang driver and front end. `-m[no-]unsafe-fp-atomics` is aliased to `-f[no-]ignore-denormal-mode`. In terms of implementation, the atomic attribute is represented in the AST by the existing AttributedStmt, with minimal changes to AST and Sema. During code generation in Clang, the CodeGenModule maintains the current atomic options, which are used to emit the relevant metadata for atomic instructions. RAII is used to manage the saving and restoring of atomic options when entering and exiting nested AttributedStmt.
2025-02-04	[StrTable] Switch Clang builtins to use string tables	Chandler Carruth	1	-9/+17
	This both reapplies #118734, the initial attempt at this, and updates it significantly. First, it uses the newly added `StringTable` abstraction for string tables, and simplifies the construction to build the string table and info arrays separately. This should reduce any `constexpr` compile time memory or CPU cost of the original PR while significantly improving the APIs throughout. It also restructures the builtins to support sharding across several independent tables. This accomplishes two improvements from the original PR: 1) It improves the APIs used significantly. 2) When builtins are defined from different sources (like SVE vs MVE in AArch64), this allows each of them to build their own string table independently rather than having to merge the string tables and info structures. 3) It allows each shard to factor out a common prefix, often cutting the size of the strings needed for the builtins by a factor two. The second point is important both to allow different mechanisms of construction (for example a `.def` file and a tablegen'ed `.inc` file, or different tablegen'ed `.inc files), it also simply reduces the sizes of these tables which is valuable given how large they are in some cases. The third builds on that size reduction. Initially, we use this new sharding rather than merging tables in AArch64, LoongArch, RISCV, and X86. Mostly this helps ensure the system works, as without further changes these still push scaling limits. Subsequent commits will more deeply leverage the new structure, including using the prefix capabilities which cannot be easily factored out here and requires deep changes to the targets.
2025-01-24	[HLSL] Introduce address space `hlsl_constant(2)` for constant buffer ↵	Helena Kotas	1	-10/+11
	declarations (#123411) Introduces a new address space `hlsl_constant(2)` for constant buffer declarations. This address space is applied to declarations inside `cbuffer` block. Later on, it will also be applied to `ConstantBuffer<T>` syntax and the default `$Globals` constant buffer. Clang codegen translates constant buffer declarations to global variables and loads from `hlsl_constant(2)` address space. More work coming soon will include addition of metadata that will map these globals to individual constant buffers and enable their transformation to appropriate constant buffer load intrinsics later on in an LLVM pass. Fixes #123406
2024-12-13	Revert "Switch builtin strings to use string tables" (#119638)	Chandler Carruth	1	-14/+9
	Reverts llvm/llvm-project#118734 There are currently some specific versions of MSVC that are miscompiling this code (we think). We don't know why as all the other build bots and at least some folks' local Windows builds work fine. This is a candidate revert to help the relevant folks catch their builders up and have time to debug the issue. However, the expectation is to roll forward at some point with a workaround if at all possible.
2024-12-08	Switch builtin strings to use string tables (#118734)	Chandler Carruth	1	-9/+14
	The Clang binary (and any binary linking Clang as a library), when built using PIE, ends up with a pretty shocking number of dynamic relocations to apply to the executable image: roughly 400k. Each of these takes up binary space in the executable, and perhaps most interestingly takes start-up time to apply the relocations. The largest pattern I identified were the strings used to describe target builtins. The addresses of these string literals were stored into huge arrays, each one requiring a dynamic relocation. The way to avoid this is to design the target builtins to use a single large table of strings and offsets within the table for the individual strings. This switches the builtin management to such a scheme. This saves over 100k dynamic relocations by my measurement, an over 25% reduction. Just looking at byte size improvements, using the `bloaty` tool to compare a newly built `clang` binary to an old one: ``` FILE SIZE VM SIZE -------------- -------------- +1.4% +653Ki +1.4% +653Ki .rodata +0.0% +960 +0.0% +960 .text +0.0% +197 +0.0% +197 .dynstr +0.0% +184 +0.0% +184 .eh_frame +0.0% +96 +0.0% +96 .dynsym +0.0% +40 +0.0% +40 .eh_frame_hdr +114% +32 [ = ] 0 [Unmapped] +0.0% +20 +0.0% +20 .gnu.hash +0.0% +8 +0.0% +8 .gnu.version +0.9% +7 +0.9% +7 [LOAD #2 [R]] [ = ] 0 -75.4% -3.00Ki .relro_padding -16.1% -802Ki -16.1% -802Ki .data.rel.ro -27.3% -2.52Mi -27.3% -2.52Mi .rela.dyn -1.6% -2.66Mi -1.6% -2.66Mi TOTAL ``` We get a 16% reduction in the `.data.rel.ro` section, and nearly 30% reduction in `.rela.dyn` where those reloctaions are stored. This is also visible in my benchmarking of binary start-up overhead at least: ``` Benchmark 1: ./old_clang --version Time (mean ± σ): 17.6 ms ± 1.5 ms [User: 4.1 ms, System: 13.3 ms] Range (min … max): 14.2 ms … 22.8 ms 162 runs Benchmark 2: ./new_clang --version Time (mean ± σ): 15.5 ms ± 1.4 ms [User: 3.6 ms, System: 11.8 ms] Range (min … max): 12.4 ms … 20.3 ms 216 runs Summary './new_clang --version' ran 1.13 ± 0.14 times faster than './old_clang --version' ``` We get about 2ms faster `--version` runs. While there is a lot of noise in binary execution time, this delta is pretty consistent, and represents over 10% improvement. This is particularly interesting to me because for very short source files, repeatedly starting the `clang` binary is actually the dominant cost. For example, `configure` scripts running against the `clang` compiler are slow in large part because of binary start up time, not the time to process the actual inputs to the compiler. ---- This PR implements the string tables using `constexpr` code and the existing macro system. I understand that the builtins are moving towards a TableGen model, and if complete that would provide more options for modeling this. Unfortunately, that migration isn't complete, and even the parts that are migrated still rely on the ability to break out of the TableGen model and directly expand an X-macro style `BUILTIN(...)` textually. I looked at trying to complete the move to TableGen, but it would both require the difficult migration of the remaining targets, and solving some tricky problems with how to move away from any macro-based expansion. I was also able to find a reasonably clean and effective way of doing this with the existing macros and some `constexpr` code that I think is clean enough to be a pretty good intermediate state, and maybe give a good target for the eventual TableGen solution. I was also able to factor the macros into set of consistent patterns that avoids a significant regression in overall boilerplate.
2024-12-02	Revert "[SPIR-V] Fixup storage class for global private (#116636)" (#118312)	Nathan Gauër	1	-2/+0
	This reverts commit aa7fe1c10e5d6d0d3aacdb345fed995de413e142.
2024-12-02	[SPIR-V] Fixup storage class for global private (#116636)	Nathan Gauër	1	-0/+2
	Adds a new address spaces: `hlsl_private`. Variables with such address space will be emitted with a `Private` storage class. This is useful for variables global to a SPIR-V module, since up to now, they were still emitted with a `Function` storage class, which is wrong. --------- Signed-off-by: Nathan Gauër <brioche@google.com>
2024-11-11	[Clang][HIP] Reapply: Deprecate the AMDGCN_WAVEFRONT_SIZE macros (#115507)	Fabian Ritter	1	-3/+6
	So far, these macros can be used in contexts where no meaningful wavefront size is available. We therefore deprecate these macros, to replace them with a more resilient interface to access wavefront size information where it is available. Reapplies #112849 with a fix for the non-hermetic clang test that failed on Mac after the revert in #115499. For SWDEV-491529.
2024-11-08	Revert "[Clang][HIP] Deprecate the AMDGCN_WAVEFRONT_SIZE macros" (#115499)	Fabian Ritter	1	-6/+3
	Reverts llvm/llvm-project#112849 due to test failure on Mac, reported by @nico
2024-11-08	[Clang][HIP] Deprecate the AMDGCN_WAVEFRONT_SIZE macros (#112849)	Fabian Ritter	1	-3/+6
	So far, these macros can be used in contexts where no meaningful wavefront size is available. We therefore deprecate these macros, to replace them with a more resilient interface to access wavefront size information where it is available. For SWDEV-491529.
2024-10-22	[clang][OpenCL][CodeGen][AMDGPU] Do not use `private` as the default AS for ↵	Alex Voicu	1	-3/+3
	when `generic` is available (#112442) Currently, for AMDGPU, when compiling for OpenCL, we unconditionally use `private` as the default address space. This is wrong for cases where the `generic` address space is available, and is corrected via this patch. In general, this AS map abuse is a bad hack and we should re-work it altogether, but at least after this patch we will stop being incorrect for e.g. OpenCL 2.0.
2024-07-09	[AMDGPU] Report error in clang if wave32 is requested where unsupported (#97633)	Stanislav Mekhanoshin	1	-3/+9

2024-05-04	Add clarifying parenthesis around non-trivial conditions in ternary ↵	luolent	1	-1/+1
	expressions. (#90391) Fixes [#85868](https://github.com/llvm/llvm-project/issues/85868) Parenthesis are added as requested on ternary operators with non trivial conditions. I used this [precedence table](https://en.cppreference.com/w/cpp/language/operator_precedence) for reference, to make sure we get the expected behavior on each change.
2024-02-14	[AMDGPU] Replace '.' with '-' in generic target names (#81718)	Pierre van Houtryve	1	-2/+1
	The dot is too confusing for tools. Output temporaries would have '10.3-generic' so tools could parse it as an extension, device libs & the associated clang driver logic are also confused by the dot. After discussions, we decided it's better to just remove the '.' from the target name than fix each issue one by one.
2024-02-12	[AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (#76955)	Pierre van Houtryve	1	-4/+17
	These generic targets include multiple GPUs and will, in the future, provide a way to build once and run on multiple GPU, at the cost of less optimization opportunities. Note that this is just doing the compiler side of things, device libs an runtimes/loader/etc. don't know about these targets yet, so none of them actually work in practice right now. This is just the initial commit to make LLVM aware of them. This contains the documentation changes for both this change and #76954 as well.
2024-01-30	[AMDGPU] Correctly exclude the HIP host from arch macros	Joseph Huber	1	-2/+3
	Summary: This logic was wrong and accidentally appling to OpenCL.
2024-01-30	[AMDGPU] Do not emit arch dependent macros with unspecified cpu (#80035)	Joseph Huber	1	-24/+24
	Summary: Currently, the AMDGPU toolchain accepts not passing `-mcpu` as a means to create a sort of "generic" IR. The resulting IR will not contain any target dependent attributes and can then be inserted into another program via `-mlink-builtin-bitcode` to inherit its attributes. However, there are a handful of macros that can leak incorrect information when compiling for an unspecified architecture. Currently, things like the wavefront size will default to 64, which is actually variable. We should not expose these macros unless it is known.
2024-01-29	Revert "[AMDGPU] Do not emit arch dependent macros with unspecified cpu ↵	Joseph Huber	1	-23/+24
	(#79660)" This reverts commit c9a6e993f7b349405b6c8f9244cd9cf0f56a6a81. This breaks HIP code that incorrectly depended on GPU-specific macros to be set. The code is totally wrong as using `__WAVEFRTONSIZE__` on the host is absolutely meaningless, but it seems this entire corner of the toolchain is fundmentally broken. Reverting for now to avoid breakages.
2024-01-29	[AMDGPU] Do not emit arch dependent macros with unspecified cpu (#79660)	Joseph Huber	1	-24/+23
	Summary: Currently, the AMDGPU toolchain accepts not passing `-mcpu` as a means to create a sort of "generic" IR. The resulting IR will not contain any target dependent attributes and can then be inserted into another program via `-mlink-builtin-bitcode` to inherit its attributes. However, there are a handful of macros that can leak incorrect information when compiling for an unspecified architecture. Currently, things like the wavefront size will default to 64, which is actually variable. We should not expose these macros unless it is known.
2023-12-15	[AMDGPU] - Add address space for strided buffers (#74471)	Jessica Del	1	-2/+3
	This is an experimental address space for strided buffers. These buffers can have structs as elements and a stride > 1. These pointers allow the indexed access in units of stride, i.e., they point at `buffer[index * stride]`. Thus, we can use the `idxen` modifier for buffer loads. We assign address space 9 to 192-bit buffer pointers which contain a 128-bit descriptor, a 32-bit offset and a 32-bit index. Essentially, they are fat buffer pointers with an additional 32-bit index.
2023-12-13	[clang] Use StringRef::{starts,ends}_with (NFC) (#75149)	Kazu Hirata	1	-1/+1
	This patch replaces uses of StringRef::{starts,ends}with with StringRef::{starts,ends}_with for consistency with std::{string,string_view}::{starts,ends}_with in C++20. I'm planning to deprecate and eventually remove StringRef::{starts,ends}with.
2023-12-11	[NFC][AMDGPU] Unify AMDGPU address space enum (#73944)	Dominik Adamski	1	-40/+40
	Types of AMDGPU address space were defined not only in Clang-specific class but also in LLVM header. If we unify the AMD GPU address space enumeration, then we can reuse it in Clang, Flang and LLVM.
2023-08-25	[AMDGPU] Add target feature gws to clang	Yaxun (Sam) Liu	1	-1/+2
	Reviewed by: Matt Arsenault Differential Revision: https://reviews.llvm.org/D158367
2023-08-01	[HIP] Fix regression about `__fp16` args and return value	Yaxun (Sam) Liu	1	-0/+1
	HIP allows __fp16 as function arguments and return value by passing -fallow-half-arguments-and-returns to clang through hipcc. https://reviews.llvm.org/D133885 removed -fallow-half-arguments-and-returns and add a TargetInfo member to control it. This caused regressions in some HIP apps (https://github.com/ROCm-Developer-Tools/HIP/issues/3178). Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D145345 Fixes: https://github.com/ROCm-Developer-Tools/HIP/issues/3178
2023-07-02	[AMDGPU] Rename predefined macro __AMDGCN_WAVEFRONT_SIZE	Yaxun (Sam) Liu	1	-0/+2
	rename it to __AMDGCN_WAVEFRONT_SIZE__ for consistency. __AMDGCN_WAVEFRONT_SIZE will be deprecated in the future. Reviewed by: Matt Arsenault, Johannes Doerfert Differential Revision: https://reviews.llvm.org/D154207
2023-06-14	[HIP] emit macro `__HIP_NO_IMAGE_SUPPORT`	Yaxun (Sam) Liu	1	-0/+1
	HIP texture/image support is optional as some devices do not have image instructions. A macro __HIP_NO_IMAGE_SUPPORT is defined for device not supporting images (https://github.com/ROCm-Developer-Tools/HIP/blob/d0448aa4c4dd0f4b29ccf6a663b7f5ad9f5183e0/docs/reference/kernel_language.md?plain=1#L426 ) Currently the macro is defined by HIP header based on predefined macros for GPU, e.g __gfx*__ , which is error prone. This patch let clang emit the predefined macro. Reviewed by: Matt Arsenault, Artem Belevich Differential Revision: https://reviews.llvm.org/D151349
2023-05-12	[AMDGPU] Emit predefined macro `__AMDGCN_CUMODE__`	Yaxun (Sam) Liu	1	-1/+2
	Predefine __AMDGCN_CUMODE__ as 1 or 0 when compilation assumes CU or WGP modes. If WGP mode is not supported, ignore -mno-cumode and emit a warning. This is needed for implementing device functions like __smid (https://github.com/ROCm-Developer-Tools/hipamd/blob/312dff7b794337aa040be0691acc78e9f968a8d2/include/hip/amd_detail/amd_device_functions.h#L957) Reviewed by: Matt Arsenault, Artem Belevich, Brian Sumner Differential Revision: https://reviews.llvm.org/D145343
2023-05-03	Re-land "[AMDGPU] Define data layout entries for buffers""	Krzysztof Drewniak	1	-2/+2
	Re-land D145441 with data layout upgrade code fixed to not break OpenMP. This reverts commit 3f2fbe92d0f40bcb46db7636db9ec3f7e7899b27. Differential Revision: https://reviews.llvm.org/D149776
2023-05-03	Revert "[AMDGPU] Define data layout entries for buffers"	Krzysztof Drewniak	1	-2/+2
	This reverts commit f9c1ede2543b37fabe9f2d8f8fed5073c475d850. Differential Revision: https://reviews.llvm.org/D149758
2023-05-03	[AMDGPU] Define data layout entries for buffers	Krzysztof Drewniak	1	-2/+2
	Per discussion at https://discourse.llvm.org/t/representing-buffer-descriptors-in-the-amdgpu-target-call-for-suggestions/68798, we define two new address spaces for AMDGCN targets. The first is address space 7, a non-integral address space (which was already in the data layout) that has 160-bit pointers (which are 256-bit aligned) and uses a 32-bit offset. These pointers combine a 128-bit buffer descriptor and a 32-bit offset, and will be usable with normal LLVM operations (load, store, GEP). However, they will be rewritten out of existence before code generation. The second of these is address space 8, the address space for "buffer resources". These will be used to represent the resource arguments to buffer instructions, and new buffer intrinsics will be defined that take them instead of <4 x i32> as resource arguments. ptr addrspace(8). These pointers are 128-bits long (with the same alignment). They must not be used as the arguments to getelementptr or otherwise used in address computations, since they can have arbitrarily complex inherent addressing semantics that can't be represented in LLVM. Even though, like their address space 7 cousins, these pointers have deterministic ptrtoint/inttoptr semantics, they are defined to be non-integral in order to prevent optimizations that rely on pointers being a [0, [addr_max]] value from applying to them. Future work includes: - Defining new buffer intrinsics that take ptr addrspace(8) resources. - A late rewrite to turn address space 7 operations into buffer intrinsics and offset computations. This commit also updates the "fallback address space" for buffer intrinsics to the buffer resource, and updates the alias analysis table. Depends on D143437 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D145441
2023-03-29	[Clang][Flang][AMDGPU] Add support for AMDGPU to Flang driver	Dominik Adamski	1	-182/+4
	Scope of changes: 1) Extract common code between Clang and Flang for parsing AMDGPU features 2) Add function which adds implicit target features for AMDGPU as Clang does 3) Add AMDGPU target as one of valid targets for Flang Differential Revision: https://reviews.llvm.org/D145579 Reviewed By: yaxunl, awarzynski
2023-03-28	[AMDGPU] Replace target feature for global fadd32	Anshil Gandhi	1	-0/+2
	Change target feature of __builtin_amdgcn_global_atomic_fadd_f32 to atomic-fadd-rtn-insts. Enable atomic-fadd-rtn-insts for gfx90a, gfx940 and gfx1100 as they all support the return variant of `global_atomic_add_f32`. Fixes https://github.com/llvm/llvm-project/issues/61331. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D146840
2023-03-24	[AMDGPU] Create Subtarget Features for some of 16 bits atomic fadd instructions	Mariusz Sikora	1	-0/+4
	Introducing Subtarget Features for instructions: - ds_pk_add_bf16 - ds_pk_add_f16 - ds_pk_add_rtn_bf16 - ds_pk_add_rtn_f16 - flat_atomic_pk_add_f16 - flat_atomic_pk_add_bf16 - global_atomic_pk_add_f16 - global_atomic_pk_add_bf16 - buffer_atomic_pk_add_f16 Differential Revision: https://reviews.llvm.org/D146701
2023-01-26	[AMDGPU] Split dot7 feature	Stanislav Mekhanoshin	1	-0/+4
	Differential Revision: https://reviews.llvm.org/D142507
2023-01-24	[AMDGPU] Split dot8 feature	Stanislav Mekhanoshin	1	-0/+1
	Differential Revision: https://reviews.llvm.org/D142407
2023-01-24	[AMDGPU] Remove dot1 and dot6 features from clang for gfx11	Stanislav Mekhanoshin	1	-2/+0
	These are unsupported. Differential Revision: https://reviews.llvm.org/D142493
2023-01-23	[clang] Optimize clang::Builtin::Info density	serge-sans-paille	1	-2/+2
	Reorganize clang::Builtin::Info to have them naturally align on 4 bytes boundaries. Instead of storing builtin headers as a straight char pointer, enumerate them and store the enum. It allows to use a small enum instead of a pointer to reference them. On a 64 bit machine, this brings sizeof(clang::Builtin::Info) from 56 down to 48 bytes. On a release build on my Linux 64 bit machine, it shrinks the size of libclang-cpp.so by 193kB. The impact on performance is negligible in terms of instruction count, but the wall time seems better, see https://llvm-compile-time-tracker.com/compare.php?from=b3d8639f3536a4876b511aca9fb7948ff9266cee&to=a89b56423f98b550260a58c41e64aff9e56b76be&stat=task-clock Differential Revision: https://reviews.llvm.org/D142024
2023-01-09	Move from llvm::makeArrayRef to ArrayRef deduction guides - clang/ part	serge-sans-paille	1	-3/+3
	This is a follow-up to https://reviews.llvm.org/D140896, split into several parts as it touches a lot of files. Differential Revision: https://reviews.llvm.org/D141139
2023-01-05	clang/AMDGPU: Remove flat-address-space from feature map	Matt Arsenault	1	-4/+0
	This was only used for checking if is_shared/is_private were legal, which we're not bothering to do anymore. This is apparently visible to more than the target attribute (which seems to silently ignore unrecognized features), so this has the potential to break something (i.e. see the OpenMP test change)
2022-12-29	AMDGPU/clang: Add builtins for llvm.amdgcn.ballot	Matt Arsenault	1	-1/+32
	Use explicit _w32/_w64 suffixes for the wave size to be consistent with the existing other wave dependent intrinsics. Also start diagnosing trying to use both wave32 and wave64. I would have preferred to avoid the +wavefrontsize64 spam on targets where that's the only option, but avoiding this seems to be more work than I expected.
2022-12-27	[clang] Use a StringRef instead of a raw char pointer to store builtin and ↵	serge-sans-paille	1	-1/+1
	call information This avoids recomputing string length that is already known at compile time. It has a slight impact on preprocessing / compile time, see https://llvm-compile-time-tracker.com/compare.php?from=3f36d2d579d8b0e8824d9dd99bfa79f456858f88&to=e49640c507ddc6615b5e503144301c8e41f8f434&stat=instructions:u This a recommit of e953ae5bbc313fd0cc980ce021d487e5b5199ea4 and the subsequent fixes caa713559bd38f337d7d35de35686775e8fb5175 and 06b90e2e9c991e211fecc97948e533320a825470. The above patchset caused some version of GCC to take eons to compile clang/lib/Basic/Targets/AArch64.cpp, as spotted in aa171833ab0017d9732e82b8682c9848ab25ff9e. The fix is to make BuiltinInfo tables a compilation unit static variable, instead of a private static variable. Differential Revision: https://reviews.llvm.org/D139881