Age | Commit message (Collapse) | Author | Files | Lines |
|
data (#78879)
The new CLANG_PGO_TRAINING_DATA_SOURCE_DIR allows users to specify a
CMake project to use for generating the profile data. For example, to
use the llvm-test-suite to generate profile data you would do:
$ cmake -G Ninja -B build -S llvm -C <path to
source>/clang/cmake/caches/PGO.cmake \
-DBOOTSTRAP_CLANG_PGO_TRAINING_DATA_SOURCE_DIR=<path to llvm-test-suite>
\
-DBOOTSTRAP_CLANG_PGO_TRAINING_DEPS=runtimes
Note that the CLANG_PERF_TRAINING_DEPS has been renamed to
CLANG_PGO_TRAINING_DEPS.
---------
Co-authored-by: Petr Hosek <phosek@google.com>
(cherry picked from commit dd0356d741aefa25ece973d6cc4b55dcb73b84b4)
|
|
Similar to what we did for ZA, this patch adds diagnostics to flag when
using a ZT0 builtin in a function that does not have ZT0 state.
|
|
options (#69133)"
This reverts commit b83b8d3fd17885438b0ea154e07088d877d293a8.
Breaks buildbots e.g.
https://lab.llvm.org/buildbot/#/builders/225/builds/29950
|
|
(#69133)
This reverts commit 6c47419703acfcd7dcca9e30ab9dba6a7a42f977.
Default to CLANG_BOLT=OFF
Test Plan:
Build a regular Clang build.
|
|
|
|
(#69133)"
This reverts commit 745883bba69007f1d2c5135f3d5b0f1efcfc82cd.
This is failing to configure on many of our bots:
https://lab.llvm.org/buildbot/#/builders/245/builds/19468
This did not get caught right away because generally bots only
clean the build every so often.
|
|
Split up and refactor CLANG_BOLT_INSTRUMENT into support for
BOLT instrumentation, perf no-LBR and perf with LBR profiling.
Differential Revision: https://reviews.llvm.org/D143617
|
|
|
|
The arm_sme.td file was still using `IsSharedZA` and `IsPreservesZA`,
which should be changed to match the new state attributes added in
#76971.
This patch adds `IsInZA`, `IsOutZA` and `IsInOutZA` as the state for the
Clang builtins and fixes up the code in SemaChecking and SveEmitter to
match.
Note that the code is written in such a way that it can be easily
extended with ZT0 state (to follow in a future patch).
|
|
GCC 14 defines `__arm_streaming` as a macro expanding to
`[[arm::streaming]]`. Due to the nested macro use, this gets expanded
prior to concatenation.
It doesn't look like C++ has a really clean way to prevent macro
expansion. The best I have found is to use `EMPTY ## X` where `EMPTY` is
an empty macro argument, so this is the hack I'm implementing here.
Fixes https://github.com/llvm/llvm-project/issues/78691.
|
|
This upstreams more of the Clang API Notes functionality that is
currently implemented in the Apple fork:
https://github.com/apple/llvm-project/tree/next/clang/lib/APINotes
This is the largest chunk of the API Notes functionality in the
upstreaming process. I will soon submit a follow-up patch to actually
enable usage of this functionality by having a Clang driver flag that
enables API Notes, along with tests.
|
|
These attributes were using the GNU attribute syntax, rather than the new
keyword attribute syntax, and they are no longer required as we have code
in SemaChecking to verify whether a builtin is compatible with its caller.
|
|
|
|
This patch replaces the `__arm_new_za`, `__arm_shared_za` and
`__arm_preserves_za` attributes in favour of:
* `__arm_new("za")`
* `__arm_in("za")`
* `__arm_out("za")`
* `__arm_inout("za")`
* `__arm_preserves("za")`
As described in https://github.com/ARM-software/acle/pull/276.
One change is that `__arm_in/out/inout/preserves(S)` are all mutually
exclusive, whereas previously it was fine to write `__arm_shared_za
__arm_preserves_za`. This case is now represented with `__arm_in("za")`.
The current implementation uses the same LLVM attributes under the hood,
since `__arm_in/out/inout` are all variations of "shared ZA", so can use
the existing `aarch64_pstate_za_shared` attribute in LLVM.
#77941 will add support for the new "zt0" state as introduced
with SME2.
|
|
to Zvfhmin (#77866)
From #75735, Zvfh implies Zvfhmin.
|
|
This patch adds IsStreamingOrSVE2p1 to the applicable builtins and a
warning for when those builtins are not used in a streaming or sve2p1
function.
|
|
Reverts llvm/llvm-project#75958
I mistakenly included a commit from my local main after rebasing.
|
|
This patch adds IsStreamingOrSVE2p1 to the applicable builtins and a
warning for when those builtins are not used in a streaming or sve2p1
function.
|
|
Optimize castToDeclContext for 2% improvement in build times
castToDeclContext is a heavily used function, and as such, it needs to
be kept as slim as feasible to preserve as much performance as possible.
To this end, it was observed that the function was generating suboptimal
assembly code, and putting the most common execution path in the longest
sequence of instructions. This patch addresses this by guiding the
compiler towards generating a lookup table of offsets, which can be used
to perform an addition on the pointer. This results in a 1-2%
improvement on debug builds (and a negligible improvement on release).
To achieve this, the switch was simplified to flatten the if statements
in the default branch. In order to make the values of the switch more
compact, encouraging LLVM to generate a look-up table instead of a jump
table, the AST TableGen generator was modified so it can take order
priority based on class inheritance. This combination allowed for a more
optimal generation of the function. Of note, 2 other functions with an
equivalent structure also needed to be modified.
Fixes #76824
|
|
This includes:
* __arm_in_streaming_mode()
* __arm_has_sme()
* __arm_za_disable()
* __svundef_za()
|
|
Builtins.def says that bfloat should be represented by the 'y'
character, not the 'b' character. The 'b' character is specified to
represent boolean. The implementation currently uses 'b' correctly for
boolean and incorrectly re-uses 'b' for bfloat.
This was not caught since no builtins are emitted in
build/tools/clang/include/clang/Basic/riscv_sifive_vector_builtins.inc.
Don't know that we can test this without creating builtins that expose
this issue, although I'm not sure we really want to do that.
|
|
The RISC-V vector crypto extensions have been ratified. This patch
updates the Clang and LLVM support for these extensions to be
non-experimental, while leaving the C intrinsics as experimental since
the C intrinsics are not yet standardized.
Co-authored-by: Brandon Wu <brandon.wu@sifive.com>
|
|
This patch adds a warning that's emitted when a builtin call uses ZA
state but the calling function doesn't provide any.
Patch by David Sherwood <david.sherwood@arm.com>.
|
|
function (#75487)
This PR adds a warning that's emitted when a non-streaming or
non-streaming-compatible builtin is called in an unsuitable function.
Uses work by Kerry McLaughlin.
This is a re-upload of #74064 and fixes a compile time increase.
|
|
non-streaming function" (#75449)
Reverts llvm/llvm-project#74064
|
|
function (#74064)
This PR adds a warning that's emitted when a non-streaming or
non-streaming-compatible builtin is called in an unsuitable function.
Uses work by Kerry McLaughlin.
|
|
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
|
|
This patch implements the builtins in Clang
and the LLVM-IR intrinsic for the following:
// Variants are also available for:
// _s8, _s16, _u16, _s32, _u32, _s64, _u64,
// _f16, _f32, _f64uint8x16_t svaddqv[_u8](svbool_t pg, svuint8_t zn);
// Variants are also available for:
// _s8, _u16, _s16, _u32, _s32, _u64, _s64
uint8x16_t svandqv[_u8](svbool_t pg, svuint8_t zn); uint8x16_t
sveorqv[_u8](svbool_t pg, svuint8_t zn); uint8x16_t svorqv[_u8](svbool_t
pg, svuint8_t zn);
// Variants are also available for:
// _s8, _u16, _s16, _u32, _s32, _u64, _s64;
uint8x16_t svmaxqv[_u8](svbool_t pg, svuint8_t zn); uint8x16_t
svminqv[_u8](svbool_t pg, svuint8_t zn);
// Variants are also available for _f32, _f64
float16x8_t svmaxnmqv[_f16](svbool_t pg, svfloat16_t zn); float16x8_t
svminnmqv[_f16](svbool_t pg, svfloat16_t zn);
According to the PR#257[1]
The reduction instruction uses scalable vectors as input and fixed
vectors as output, therefore we changed SVEEmitter to emit fixed vector
types in case the neon header(arm_neon.h) is not present.
[1]https://github.com/ARM-software/acle/pull/257
Co-author: Dinar Temirbulatov <dinar.temirbulatov@arm.com>
|
|
This patch is needed for the reduction instructions in sve2.1
It add a new header to sve with all the fixed vector types.
The new types are only added if neon is not declared.
|
|
See https://github.com/ARM-software/acle/pull/217
Patch by: Hassnaa Hamdi <hassnaa.hamdi@arm.com>
|
|
Remove ptr-to-ptr bitcast which was added back in
939352b6ec31db4e8defe07856868438fbc5340d . With opaque pointers, the
bitcast is now redundant.
Opaque ptr cleanup effort.
|
|
|
|
This PR attempts to recommit the PR (#72216) with a safe-bounded TypeID
that will not cause indeterminate results for the compiler.
|
|
This reverts commit 8434b0b9d39b7ffcd1f7f7b5746151e293620e0d. #72216
This commit broke the multiple buildbots, looks like the extension in
`NUM_PREDEF_TYPE_IDS` might have broken some inheriting usages, causing
indeterminate results for the compiler. Investigating the issue now.
|
|
The first commit extends the capacity from the compiler infrastructure,
and the second commit continues the effort in #71140 to introduce tuple
types for bfloat16.
|
|
|
|
|
|
Adds the following SME2 builtins:
- sv(add|sub)
- sv(add|sub)_za32/za64,
- sv(add|sub)_write_za32/za64
Other changes in this patch:
- CGBuiltin.cpp: The GetAArch64SMEProcessedOperands function is created
to avoid duplicating existing code from EmitAArch64SVEBuiltinExpr.
- arm_sve.td: The add/sub SME2 builtins which do not operate on ZA have
been added to arm_sve.td, matching the corrosponding LLVM IR intrinsic
names which start with @llvm.aarch64.sve for this reason.
- SveEmitter.cpp: Adds the createCoreHeaderIntrinsics function to remove
duplicated code in createHeader & createSMEHeader. Uses a new enum
(ACLEKind) to choose either "__builtin_sme_" or "__builtin_sve_" when
emitting the intrinsics.
See https://github.com/ARM-software/acle/pull/217/files
|
|
Previous representation used an enumeration combined to a switch to
dispatch to the appropriate lexer.
Use function pointer so that the dispatching is just an indirect call,
which is actually better because lexing is a costly task compared to a
function call.
This also makes the code slightly cleaner, speedup on compile time
tracker are consistent and range form -0.05% to -0.20% for NewPM-O0-g,
see
https://llvm-compile-time-tracker.com/compare.php?from=f9906508bc4f05d3950e2219b4c56f6c078a61ef&to=608c85ec1283638db949d73e062bcc3355001ce4&stat=instructions:u
Considering just the preprocessing task, preprocessing the sqlite
amalgametion takes -0.6% instructions (according to valgrind
--tool=callgrind)
---------
Co-authored-by: serge-sans-paille <sguelton@mozilla.com>
Co-authored-by: cor3ntin <corentinjabot@gmail.com>
|
|
BF16 implementation based on @joshua-arch1's
https://reviews.llvm.org/D152498
Fixed the incorrect f16 type introduced in
https://github.com/llvm/llvm-project/pull/68296
---------
Co-authored-by: Jun Sha (Joshua) <cooper.joshua@linux.alibaba.com>
|
|
startswith/endswith. NFC.
startswith/endswith wrap starts_with/ends_with and will eventually go away (to more closely match string_view)
|
|
This patch converts TagTypeKind into scoped enum. Among other benefits,
this allows us to forward-declare it where necessary.
|
|
This patch adds reinterpret builtins as proposed here:
https://github.com/ARM-software/acle/pull/275.
The builtins take the form:
sv<dst>x<N>_t svreinterpret_<dst>_<src>_x<N>(sv<src>x<N>_t op)
where
- <src> and <dst> designate the source and the destination type,
respectively, all pairs chosen from {s8, u8, s16, u8, s32, u32, s64,
u64, bf16, f16, f32, f64}
- <N> designated the number of tuple elements, 2, 3 or 4
A short (overloaded) for is also provided, where the destination type is
explicitly designated and the source type is deduced from the parameter
type. These take the form
sv<dst>x<N>_t svreinterpret_<dst>(sv<src>x<N>_t op)
For example:
svuin16x2_t svreinterpret_u16_s32_x2(svint32x2_t op);
svuin16x2_t svreinterpret_u16(svint32x2_t op);
|
|
FP32-to-int8 Ranged Clip Instructions
https://sifive.cdn.prismic.io/sifive/0aacff47-f530-43dc-8446-5caa2260ece0_xsfvfnrclipxfqf-spec.pdf
|
|
Bfloat16 Matrix Multiply Accumulate Instruction
https://sifive.cdn.prismic.io/sifive/c391d53e-ffcf-4091-82f6-c37bf3e883ed_xsfvfwmaccqqq-spec.pdf
|
|
|
|
Rather than write a bunch of logic to shepherd between enums with the
same sets of values, add the ability for EnumArgument to refer to an
external enum in the first place.
|
|
names. (#69460)
* Mark SVE ACLE types as substitution candidates.
* Change mangling of svbfloat16_t from __SVBFloat16_t to
__SVBfloat16_t.
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst
This is an ABI break with the old behaviour available via
"-fclang-abi-compat=17".
|
|
|
|
analysis
- For a better understand of what the unsupported cases are, we add
more information to the debug note---a string of ancestor AST nodes
of the unclaimed DRE. For example, an unclaimed DRE p in an
expression `*(p++)` will result in a string starting with
`DRE ==> UnaryOperator(++) ==> Paren ==> UnaryOperator(*)`.
- To find out the most common patterns of those unsupported use cases,
we add a simple script to build a prefix tree over those strings and
count each prefix. The script reads input line by line, assumes a
line is a list of words separated by `==>`s, and builds a prefix tree
over those lists.
Reviewed by: t-rasmud (Rashmi Mudduluru), NoQ (Artem Dergachev)
Differential revision: https://reviews.llvm.org/D158561
|