| Age | Commit message (Collapse) | Author | Files | Lines |
|
(#180116)
Previous commit message:
>Previous commit message:
>
>> Original commit message:
>>
>>>When users explicitly specify a PTX version via -mattr=+ptxNN that's
insufficient for their target SM, we now emit a fatal error. Previously,
we silently upgraded the PTX version to the minimum required for the
target SM.
>>>
>>>When no SM or PTX version is specified, we now use PTX 3.2 (the
minimum for the default SM 3.0) instead of PTX 6.0.
>>
>>The following commits should fix the failures that arose when I
previously tried to land this commit:
>>
>>
>>https://github.com/llvm/llvm-project/commit/9fc5fd0ad689eed94f65b1d6d10f9c5642935e68
should address the llvm-nvptx*-nvidia-* build failures:
https://github.com/llvm/llvm-project/pull/174834#issuecomment-3742242651
>>
>>
>>https://github.com/llvm/llvm-project/commit/600514a63760c6730e4cd970d2fcead9c5a897b3
should address the MLIR failures
>
>The previous commit was reverted with
https://github.com/llvm/llvm-project/commit/d23cb79ba497281de050ef609cb91b91058bf323
because the
[mlir-nvidia](https://lab.llvm.org/buildbot/#/builders/138/builds/24797)
and
[mlir-nvidia-gcc7](https://lab.llvm.org/buildbot/#/builders/116/builds/23929)
Buildbots were failing.
>
>Those tests failed because MLIR's default SM was 5.0, which caused
NVPTX
to target PTX ISA v4.0, which did not support the intrinsics used in the
failing tests.
>
>https://github.com/llvm/llvm-project/commit/243f011577193c99358ccc4142b296d4fa80ea11
should address this by bumping MLIR's default SM to 7.5. Now, using
MLIR's new default SM, NVPTX
targets the PTX ISA v6.3, which supports the intrinsics used in the
failing tests.
---
The previous commit was reverted with
e9b578a4d77025e18318efedd0f3f3764338d859
[because](https://github.com/llvm/llvm-project/pull/179304#issuecomment-3856301333)
the clang driver set the default PTX ISA version to v4.2 when no CUDA
installation is found. However, given our patch, we should not set a
default; instead, let the LLVM backend select the appropriate PTX ISA
version for the target SM.
|
|
version"" (#180035)
Reverts llvm/llvm-project#179304 due to
https://github.com/llvm/llvm-project/pull/179304#issuecomment-3856100622
|
|
(#179304)
Previous commit message:
> Original commit message:
>
>>When users explicitly specify a PTX version via -mattr=+ptxNN that's
insufficient for their target SM, we now emit a fatal error. Previously,
we silently upgraded the PTX version to the minimum required for the
target SM.
>>
>>When no SM or PTX version is specified, we now use PTX 3.2 (the
minimum for the default SM 3.0) instead of PTX 6.0.
>
>The following commits should fix the failures that arose when I
previously tried to land this commit:
>
>https://github.com/llvm/llvm-project/commit/9fc5fd0ad689eed94f65b1d6d10f9c5642935e68
should address the llvm-nvptx*-nvidia-* build failures:
https://github.com/llvm/llvm-project/pull/174834#issuecomment-3742242651
>
>https://github.com/llvm/llvm-project/commit/600514a63760c6730e4cd970d2fcead9c5a897b3
should address the MLIR failures
---
The previous commit was reverted with
d23cb79ba497281de050ef609cb91b91058bf323 because the
[mlir-nvidia](https://lab.llvm.org/buildbot/#/builders/138/builds/24797)
and
[mlir-nvidia-gcc7](https://lab.llvm.org/buildbot/#/builders/116/builds/23929)
Buildbots were failing.
Those tests failed because MLIR's default SM was 5.0, which caused NVPTX
to target PTX ISA v4.0, which did not support the intrinsics used in the
failing tests.
243f011577193c99358ccc4142b296d4fa80ea11 should address this by bumping
MLIR's default SM to 7.5. Now, using MLIR's new default SM, NVPTX
targets the PTX ISA v6.3, which supports the intrinsics used in the
failing tests.
|
|
This just added unnecessary work to the IR, since they are only used for
load and store, which just causes some IR noise. Tests updated by UTC
script to remove the extra lines.
|
|
version"" (#178046)
Reverts llvm/llvm-project#177459
`mlir-nvidia` and `mlir-nvidia-gcc7` Buildbots are failing.
The blamelist is small and likely because of my change. Preemptively
reverting.
|
|
(#177459)
Original commit message:
> When users explicitly specify a PTX version via -mattr=+ptxNN that's
insufficient for their target SM, we now emit a fatal error. Previously,
we silently upgraded the PTX version to the minimum required for the
target SM.
>
>When no SM or PTX version is specified, we now use PTX 3.2 (the minimum
for the default SM 3.0) instead of PTX 6.0.
---
The following commits should fix the failures that arose when I
previously tried to land this commit:
- 9fc5fd0ad689eed94f65b1d6d10f9c5642935e68 should address the
`llvm-nvptx*-nvidia-*` build failures:
https://github.com/llvm/llvm-project/pull/174834#issuecomment-3742242651
- 600514a63760c6730e4cd970d2fcead9c5a897b3 should address the MLIR
failures
|
|
(#175760)
Reverts llvm/llvm-project#174834
Bots are broken.
|
|
When users explicitly specify a PTX version via `-mattr=+ptxNN` that's
insufficient for their target SM, we now emit a fatal error. Previously,
we silently upgraded the PTX version to the minimum required for the
target SM.
When no SM or PTX version is specified, we now use PTX 3.2 (the minimum
for the default SM 3.0) instead of PTX 6.0.
|
|
address-space-conversions.cpp (#167261)
Calling convention is irrelevant to address space verification and adds
complixity for other target triples.
|
|
(#161773)
LLVM models ConstantPointerNull as all-zero, but some GPUs (e.g. AMDGPU
and our downstream GPU target) use a non-zero sentinel for null in
private / local address spaces.
SPIR-V is a supported input for our GPU target. This PR preserves a
canonical zero form in the generic AS while allowing later lowering to
substitute the target’s real sentinel.
|
|
(#140282)
This patch is part of the upstreaming effort for supporting SYCL
language front end.
It makes the following changes:
1. Adds sycl_external attribute for functions with external linkage,
which is intended for use to implement the SYCL_EXTERNAL macro as
specified by the SYCL 2020 specification
2. Adds checks to avoid emitting device code when sycl_external and
sycl_kernel_entry_point attributes are not enabled
3. Fixes test failures caused by the above changes
This patch is missing diagnostics for the following diagnostics listed
in the SYCL 2020 specification's section 5.10.1, which will be addressed
in a subsequent PR:
Functions that are declared using SYCL_EXTERNAL have the following
additional restrictions beyond those imposed on other device functions:
1. If the SYCL backend does not support the generic address space then
the function cannot use raw pointers as parameter or return types.
Explicit pointer classes must be used instead;
2. The function cannot call group::parallel_for_work_item;
3. The function cannot be called from a parallel_for_work_group scope.
In addition to that, the subsequent PR will also implement diagnostics
for inline functions including virtual functions defined as inline.
---------
Co-authored-by: Mariya Podchishchaeva <mariya.podchishchaeva@intel.com>
|
|
The test checks x86 target in fsycl-is-device mode. x86 is not a valid
offloading target and IMO test meant to check fsycl-is-host mode to make
sure host invocations of __builtin_sycl_unique_stable_name work
correctly. This also switches the test to use sycl_entry_point attribute
instead of sycl_kernel because the intention for the future SYCL changes
is to use sycl_entry_point and eventually remove sycl_kernel attribute
when sycl_entry_point supports enough SYCL cases.
Fixes https://github.com/llvm/llvm-project/issues/146566
---------
Co-authored-by: Tom Honermann <tom@honermann.net>
Co-authored-by: Victor Lomuller <victor@codeplay.com>
|
|
Entry point functions such as main, wmain etc. should not be mangled for
UEFI targets.
|
|
functions. (#133030)
A function declared with the `sycl_kernel_entry_point` attribute,
sometimes called a SYCL kernel entry point function, specifies a pattern
from which the parameters and body of an offload entry point function,
sometimes called a SYCL kernel caller function, are derived.
SYCL kernel caller functions are emitted during SYCL device compilation.
Their parameters and body are derived from the `SYCLKernelCallStmt`
statement and `OutlinedFunctionDecl` declaration associated with their
corresponding SYCL kernel entry point function. A distinct SYCL kernel
caller function is generated for each SYCL kernel entry point function
defined as a non-inline function or ODR-used in the translation unit.
The name of each SYCL kernel caller function is parameterized by the
SYCL kernel name type specified by the `sycl_kernel_entry_point`
attribute attached to the corresponding SYCL kernel entry point
function. For the moment, the Itanium ABI mangled name for typeinfo data
(`_ZTS<type>`) is used to name these functions; a future change will
switch to a more appropriate naming scheme.
The calling convention used for a SYCL kernel caller function is target
dependent. Support for AMDGCN, NVPTX, and SPIR targets is currently
provided. These functions are required to observe the language
restrictions for SYCL devices as specified by the SYCL 2020
specification; this includes a forward progress guarantee and prohibits
recursion.
Only SYCL kernel caller functions, functions declared as
`SYCL_EXTERNAL`, and functions directly or indirectly referenced from
those functions should be emitted during device compilation. Pruning of
other declarations has not yet been implemented.
---------
Co-authored-by: Elizabeth Andrews <elizabeth.andrews@intel.com>
|
|
|
|
Add nuw attribute to inbounds GEPs where the expression used to form the
GEP is an addition of unsigned indices.
Relands #105496, which was reverted because it exposed a miscompilation
arising from #98608. This is now fixed by #106512.
|
|
Reverts llvm/llvm-project#105496
This patch breaks:
https://lab.llvm.org/buildbot/#/builders/25/builds/1952
https://lab.llvm.org/buildbot/#/builders/52/builds/1775
Somehow output is different with sanitizers.
Maybe non-determinism in the code?
|
|
Add nuw attribute to inbounds GEPs where the expression used to form the
GEP is an addition of unsigned indices.
|
|
Generate nuw GEPs for struct member accesses, as inbounds + non-negative
implies nuw.
Regression tests are updated using update scripts where possible, and by
find + replace where not.
|
|
This patch makes the final major change of the RemoveDIs project, changing the
default IR output from debug intrinsics to debug records. This is expected to
break a large number of tests: every single one that tests for uses or
declarations of debug intrinsics and does not explicitly disable writing
records.
If this patch has broken your downstream tests (or upstream tests on a
configuration I wasn't able to run):
1. If you need to immediately unblock a build, pass
`--write-experimental-debuginfo=false` to LLVM's option processing for all
failing tests (remember to use `-mllvm` for clang/flang to forward arguments to
LLVM).
2. For most test failures, the changes are trivial and mechanical, enough that
they can be done by script; see the migration guide for a guide on how to do
this: https://llvm.org/docs/RemoveDIsDebugInfo.html#test-updates
3. If any tests fail for reasons other than FileCheck check lines that need
updating, such as assertion failures, that is most likely a real bug with this
patch and should be reported as such.
For more information, see the recent PSA:
https://discourse.llvm.org/t/psa-ir-output-changing-from-debug-intrinsics-to-debug-records/79578
|
|
ParseFrontendArgs takes the last OPT_Action_Group option. The other
actions are overridden.
|
|
|
|
local/private `nullptr` value for AMDGPU. (#78759)
- Address space cast of nullptr in local_space into a generic_space for
the CUDA backend. The reason for this cast was having invalid local
memory base address for the associated variable.
- In the context of AMD GPU, assigns a NULL value as ~0 for the address
spaces of sycl_local and sycl_private to match the ones for opencl_local
and opencl_private.
|
|
|
|
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D150520
|
|
Like CUDA and OpenCL, the SYCL specification says that throwing and
catching exceptions in device functions is not supported, so this change
extends the logic for adding the NoUnwind attribute to SYCL.
The existing convergent.cpp test, which tests that the convergent
attribute is added to functions by default, is renamed and reused to
test that the nounwind attribute is added by default. This test now has
-fexceptions added to it, which the driver adds by default as well.
The obvious question here is why not simply change the driver to remove
-fexceptions. This change follows the direction given by the TODO
comment because removing -fexceptions would also disable the
__EXCEPTIONS macro, which should reflect whether exceptions are enabled
on the host, rather than on the device, to avoid conflicts in types
shared between host and device.
Reviewed By: bader
Differential Revision: https://reviews.llvm.org/D147097
|
|
These were all tests where no manual fixup was required.
|
|
The global constant arguments could be in a different address space
than the first argument, so we have to add another overloaded argument.
This patch was originally made for CHERI LLVM (where globals can be in
address space 200), but it also appears to be useful for in-tree targets
as can be seen from the test diffs.
Differential Revision: https://reviews.llvm.org/D138722
|
|
Clang language-level address spaces and LLVM pointer address spaces are
not the same thing (even though they will both have a numeric value of
zero in many cases). LangAS is a enum class to avoid implicit conversions,
but eba69b59d1a30dead07da2c279c8ecfd2b62ba9f avoided the compiler error by
adding a `static_cast<>`. While touching this code, simplify it by using
CreatePointerBitCastOrAddrSpaceCast() which is already a no-op if the types
match.
This changes the code generation for spir64 to place the globals in
the sycl_global addreds space, which maps to `addrspace(1)`.
Reviewed By: bader
Differential Revision: https://reviews.llvm.org/D138284
|
|
This adds -no-opaque-pointers to clang tests whose output will
change when opaque pointers are enabled by default. This is
intended to be part of the migration approach described in
https://discourse.llvm.org/t/enabling-opaque-pointers-by-default/61322/9.
The patch has been produced by replacing %clang_cc1 with
%clang_cc1 -no-opaque-pointers for tests that fail with opaque
pointers enabled. Worth noting that this doesn't cover all tests,
there's a remaining ~40 tests not using %clang_cc1 that will need
a followup change.
Differential Revision: https://reviews.llvm.org/D123115
|
|
Reviewed By: bader
Differential Revision: https://reviews.llvm.org/D118935
|
|
turn it off by default
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.
Test updates are made as a separate patch: D108453
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D105169
|
|
Functions pointers should be created with program address space. This
patch introduces program address space in TargetInfo. Targets with
non-default (default is 0) address space for functions should explicitly
set this value. This patch fixes a crash on lvalue reference to function
pointer (in device code) when using oneAPI DPC++ compiler.
Differential Revision: https://reviews.llvm.org/D111566
|
|
at the start of the entry block, which in turn would aid better code transformation/optimization.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D110257
|
|
disable-noundef-analysis and turn it off by default"
This reverts commit aacfbb953eb705af2ecfeb95a6262818fa85dd92.
Revert "Fix lit test failures in CodeGenCoroutines"
This reverts commit 63fff0f5bffe20fa2c84a45a41161afa0043cb34.
|
|
turn it off by default
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.
Test updates are made as a separate patch: D108453
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D105169
[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default (2)
This patch updates test files after D105169.
Autogenerated test codes are changed by `utils/update_cc_test_checks.py,` and non-autogenerated test codes are changed as follows:
(1) I wrote a python script that (partially) updates the tests using regex: {F18594904} The script is not perfect, but I believe it gives hints about which patterns are updated to have `noundef` attached.
(2) The remaining tests are updated manually.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D108453
Resolve lit failures in clang after 8ca4b3e's land
Fix lit test failures in clang-ppc* and clang-x64-windows-msvc
Fix missing failures in clang-ppc64be* and retry fixing clang-x64-windows-msvc
Fix internal_clone(aarch64) inline assembly
|
|
disable-noundef-analysis and turn it off by default"
This reverts commit 7584ef766a7219b6ee5a400637206d26e0fa98ac.
|
|
turn it off by default
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.
Test updates are made as a separate patch: D108453
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D105169
|
|
for parameters and locals need to refer to the stack
allocation in the alloca address space.
|
|
This reverts the following commits:
37ca7a795b277c20c02a218bf44052278c03344b
9aa6c72b92b6c89cc6d23b693257df9af7de2d15
705387c5074bcca36d626882462ebbc2bcc3bed4
8ca4b3ef19fe82d7ad6a6e1515317dcc01b41515
80dba72a669b5416e97a42fd2c2a7bc5a6d3f44a
|
|
turn it off by default (2)
This patch updates test files after D105169.
Autogenerated test codes are changed by `utils/update_cc_test_checks.py,` and non-autogenerated test codes are changed as follows:
(1) I wrote a python script that (partially) updates the tests using regex: {F18594904} The script is not perfect, but I believe it gives hints about which patterns are updated to have `noundef` attached.
(2) The remaining tests are updated manually.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D108453
|
|
which is essentially required as a pre-commit for https://reviews.llvm.org/D110257.
Reviewed By: yaxunl
Differential Revision: https://reviews.llvm.org/D110676
|
|
After significant problems in our downstream with the previous
implementation, the SYCL standard has opted to make using macros/etc to
change kernel-naming-lambdas in any way UB (even passively). As a
result, we are able to just emit the itanium mangling.
However, this DOES require a little work in the CXXABI, as the microsoft
and itanium mangler use different numbering schemes for lambdas. This
patch adds a pair of mangling contexts that use the normal 'itanium'
mangling strategy to fill in the "DeviceManglingNumber" used previously
by CUDA.
Differential Revision: https://reviews.llvm.org/D110281
|
|
Discovered in SYCL, the field annotations were always cast to an i8*,
which is an invalid bitcast for a pointer type with an address space.
This patch makes sure that we create an intrinsic that takes a pointer
to the correct address-space and properly do our casts.
Differential Revision: https://reviews.llvm.org/D109003
|
|
In the case where the device is an itanium target, and the host is a
windows target, we were getting the names wrong, since in the itanium
case we filter by lambda-signature.
The fix is to always filter by the signature rather than just on
non-windows builds. I considered doing the reverse (that is, checking
the aux-triple), but doing so would result in duplicate lambda mangling
numbers (from linux reusing the same number for different signatures).
|
|
The original version of this was reverted, and @rjmcall provided some
advice to architect a new solution. This is that solution.
This implements a builtin to provide a unique name that is stable across
compilations of this TU for the purposes of implementing the library
component of the unnamed kernel feature of SYCL. It does this by
running the Itanium mangler with a few modifications.
Because it is somewhat common to wrap non-kernel-related lambdas in
macros that aren't present on the device (such as for logging), this
uniquely generates an ID for all lambdas involved in the naming of a
kernel. It uses the lambda-mangling number to do this, except replaces
this with its own number (starting at 10000 for readabililty reasons)
for lambdas used to name a kernel.
Additionally, this implements itself as constexpr with a slight catch:
if a name would be invalidated by the use of this lambda in a later
kernel invocation, it is diagnosed as an error (see the Sema tests).
Differential Revision: https://reviews.llvm.org/D103112
|
|
Differential Revision: https://reviews.llvm.org/D100396
|
|
Default address space (applies when no explicit address space was
specified) maps to generic (4) address space.
Added SYCL named address spaces `sycl_global`, `sycl_local` and
`sycl_private` defined as sub-sets of the default address space.
Static variables without address space now reside in global address
space when compile for SPIR target, unless they have an explicit address
space qualifier in source code.
Differential Revision: https://reviews.llvm.org/D89909
|
|
|
|
SYCL compilations initiated by the driver will spawn off one or more
frontend compilation jobs (one for device and one for host). This patch
reworks the driver options to make upstreaming this from the downstream
SYCL fork easier.
This patch introduces a language option to identify host executions
(SYCLIsHost) and a -cc1 frontend option to enable this mode. -fsycl and
-fno-sycl become driver-only options that are rejected when passed to
-cc1. This is because the frontend and beyond should be looking at
whether the user is doing a device or host compilation specifically.
Because the frontend should only ever be in one mode or the other,
-fsycl-is-device and -fsycl-is-host are mutually exclusive options.
|