Age | Commit message (Collapse) | Author | Files | Lines |
|
This is a follow-up to #156140 and #160979, which deprecated one form of
write and read, respectively.
We have two forms of byte_swap:
template <typename value_type>
[[nodiscard]] inline value_type byte_swap(value_type value, endianness
endian)
template <typename value_type, endianness endian>
[[nodiscard]] inline value_type byte_swap(value_type value)
The difference is that endian is a function parameter in the former
but a template parameter in the latter.
This patch streamlines the code by migrating the use of the latter to
the former while deprecating the latter because the latter is just
forwarded to the former.
|
|
This patch adds an Offload Wrapper for the SYCL kind. This is an
essential step for SYCL offloading and the compilation flow. The usage
of offload wrapping is added to the clang-linker-wrapper tool.
Modifications:
Implemented `bundleSYCL()` function to handle SYCL image bundling.
Implemented `wrapSYCLBinaries()` function that is invoked from
clang-linker-wrapper.
SYCL Offload Wrapping uses specific data structures such as
`__sycl.tgt_device_image` and `__sycl.tgt_bin_desc`. Each SYCL image
maintains its own symbol table (unlike shared global tables in other
targets). Therefore, symbols are encoded explicitly during the offload
wrapping. Also, images refer to their own Offloading Entries arrays
unlike other targets.
The proposed `__sycl.tgt_device_image` uses Version 3 to differentiate
from images generated by Intel DPC++. The structure proposed in this
patch doesn't have fields deprecated in DPC++.
|
|
This PR changes `llvm::FileCollector` to use the `llvm::vfs::FileSystem`
API for making file paths absolute instead of using
`llvm::sys::fs::make_absolute()` directly. This matches the behavior of
the compiler on most other input files.
|
|
FreeBSD libc has a lot of symbols that are ifuncs, which makes TLI
checker believe they are not available. This change makes the tool
consider symbols with the STT_GNU_IFUNC type.
|
|
Some LLVM passes need access to the filesystem to read configuration
files and similar. In some places, this is achieved by grabbing the VFS
from `PGOOptions`, but some passes don't have access to these and resort
to just calling `vfs::getRealFileSystem()`. This PR allows setting the
VFS directly on `PassBuilder` that's able to pass it down to all passes
that need it.
|
|
It adds a new JSON array with the list of test vectors to the end of the mcdc_records.
I also bumped the json format version accordingly, which I believe wasn’t done properly in the past when new fields were added.
This PR adds tests and comment docs for
https://github.com/llvm/llvm-project/pull/105511
---------
Co-authored-by: Arpad Borsos <swatinem@swatinem.de>
|
|
Add a filter command to llvm-remarkutil. This can be used to extract
remarks for a certain function, pass, type, etc. from a large remarks
file to a new remarks file. This uses the same filter arguments as the
count command.
Depends on #156715. Thanks to this change, we don't need to buffer all
remarks before reserializing them, so we should be able to process
arbitrarily large files.
Pull Request: https://github.com/llvm/llvm-project/pull/159784
|
|
This patch makes it so that InstrPostProcess::postProcessInstruction
takes in a reference to a mca::Instruction rather than a reference to a
std::unique_ptr. Without this, InstrPostProcess cannot be used with MCA
instruction recycling because it needs to be called on both newly
created instructions and instructions that have been recycled. We only
have access to a raw pointer for instructions that have been recycled
rather than a reference to the std::unique_ptr that owns them.
This patch adds a call in the existing instruction recycling unit test
to ensure the API remains compatible with this use case.
|
|
This patch adds the MIR parsing and serialization support for save and
restore points with subsets of callee saved registers. That is, it
syntactically allows a function to contain two or more distinct
sub-regions in which distinct subsets of registers are spilled/filled as
callee save. This is useful if e.g. one of the CSRs isn't modified in
one of the sub-regions, but is in the other(s).
Support for actually using this capability in code generation is still
forthcoming. This patch is the next logical step for multiple
save/restore points support.
All points are now stored in DenseMap from MBB to vector of
CalleeSavedInfo.
Shrink-Wrap points split Part 4.
RFC:
https://discourse.llvm.org/t/shrink-wrap-save-restore-points-splitting/83581
Part 1: https://github.com/llvm/llvm-project/pull/117862 (landed)
Part 2: https://github.com/llvm/llvm-project/pull/119355 (landed)
Part 3: https://github.com/llvm/llvm-project/pull/119357 (landed)
Part 5: https://github.com/llvm/llvm-project/pull/119359 (likely to be
further split)
|
|
Summary:
This was originally kept separate so it didn't pollute the name space,
but now I'm thinking it's just easier to bundle it in with the default
interface. This means that we'll have a bit of extra code for people
using the server.h file to handle libc opcodes, but it's minimal (3
functions) and it simplifies this.
I'm doing this because I'm hoping to move the GPU tester binary to
liboffload which handles `libc` opcodes internally except these. This is
the easier option compared to adding a hook to register custom handlers
there.
|
|
Currently there are two serialization modes for bitstream Remarks:
standalone and separate. The separate mode splits remark metadata (e.g.
the string table) from actual remark data. The metadata is written into
the object file by the AsmPrinter, while the remark data is stored in a
separate remarks file. This means we can't use bitstream remarks with
tools like opt that don't generate an object file. Also, it is confusing
to post-process bitstream remarks files, because only the standalone
files can be read by llvm-remarkutil. We always need to use dsymutil
to convert the separate files to standalone files, which only works for
MachO. It is not possible for clang/opt to directly emit bitstream
remark files in standalone mode, because the string table can only be
serialized after all remarks were emitted.
Therefore, this change completely removes the separate serialization
mode. Instead, the remark string table is now always written to the end
of the remarks file. This requires us to tell the serializer when to
finalize remark serialization. This automatically happens when the
serializer goes out of scope. However, often the remark file goes out of
scope before the serializer is destroyed. To diagnose this, I have added
an assert to alert users that they need to explicitly call
finalizeLLVMOptimizationRemarks.
This change paves the way for further improvements to the remark
infrastructure, including more tooling (e.g. #159784), size optimizations
for bitstream remarks, and more.
Pull Request: https://github.com/llvm/llvm-project/pull/156715
|
|
`EM_INTEL205` was renamed to `EM_INTELGT` (ref
[here](https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=7b9f985957798ba4dacc454f22c9e426c6897cb8))
and is used for Intel GPU images.
We will be using this type for offloading to Intel GPUs.
---------
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
|
|
Currently MCA takes instruction properties from scheduling model.
However, some instructions may execute differently depending on external
factors - for example, latency of memory instructions may vary
differently depending on whether the load comes from L1 cache, L2 or
DRAM. While MCA as a static analysis tool cannot model such differences
(and currently takes some static decision, e.g. all memory ops are
treated as L1 accesses), it makes sense to allow manual modification of
instruction properties to model different behavior (e.g. sensitivity of
code performance to cache misses in particular load instruction). This
patch addresses this need.
The library modification is intentionally generic - arbitrary
modifications to InstrDesc are allowed. The tool support is currently
limited to changing instruction latencies (single number applies to all
output arguments and MaxLatency) via coments in the input assembler
code; the format is the like this:
add (%eax), eax // LLVM-MCA-LATENCY:100
Users of MCA library can already make additional customizations; command
line tool can be extended in the future.
Note that InstructionView currently shows per-instruction information
according to scheduling model and is not affected by this change.
See https://github.com/llvm/llvm-project/issues/133429 for additional
clarifications (including explanation why existing customization
mechanisms do not provide required functionality)
---------
Co-authored-by: Min-Yih Hsu <min@myhsu.dev>
|
|
Support -mcpu=native by querying the Host CPU Name and Feature flags.
---------
Co-authored-by: Cameron McInally <cmcinally@nvidia.com>
|
|
Planning to add to the list in
https://github.com/llvm/llvm-project/pull/159791, so format it.
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
|
|
- The output for `--output-sort=id` matches `--output-sort=offset` for
the available readers. Tests were updated accordingly.
- For `--output-sort=none`, and per `LVReader::sortScopes()`,
`LVScope::sort()` is called on the root scope.
`LVScope::sort()` has no effect if `getSortFunction() == nullptr`, and
thus the elements are currently traversed in the order in which they
were initially added. This should change, however, after
`LVScope::Children` is removed.
|
|
Explicitly round up the reciprocal calculation,
so that .125 is displayed as 0.13 consistently across all hosts.
Fix buildbot failure
https://lab.llvm.org/buildbot/#/builders/193/builds/10666
since https://github.com/llvm/llvm-project/pull/154495
|
|
This PR passes the VFS down to `llvm::cl` functions so that they don't
assume the real file system.
|
|
The --totals option was not working for Mach-O files because the Darwin
segment size calculation skipped the totals accumulation.
---------
Co-authored-by: James Henderson <James.Henderson@sony.com>
|
|
When dumping a PDB with an inlinesite that had a ChangeFile annotation,
the `Filename` wasn't passed to the format string. This hit an assertion
in debug mode and silently failed in release mode.
|
|
|
|
Summary:
Turns out the new CUDA ABI now applies retroactively to all the other
SMs if you upgrade to CUDA 13.0. This patch changes the scheme, keeping
all the SM flags consistent but using an offset.
Fixes: https://github.com/llvm/llvm-project/issues/159088
|
|
This avoids the following kind of warning with GCC:
../tools/llvm-lipo/llvm-lipo.cpp: In function ‘void printInfo(llvm::LLVMContext&, llvm::ArrayRef<llvm::object::OwningBinary<llvm::object::Binary> >)’:
../tools/llvm-lipo/llvm-lipo.cpp:464:34: warning: suggest parentheses around ‘& ’ within ‘||’ [-Wparentheses]
464 | Binary->isArchive() && "expected MachO binary");
| ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
We have inconsistent handling of null returns on target MC constructors.
Most tools report an error, but some assert. It's currently not possible
to test this with any in-tree targets. Make this a clean error so in the
future targets can fail the construction.
|
|
|
|
Replace assert with an error.
|
|
Replace assert with a handled error. The same should probably
be done for all of the MC constructors.
|
|
|
|
This PR is a reapplication of
https://github.com/llvm/llvm-project/pull/142686
|
|
(#158764)
This reverts commit 895cda70a95529fd22aac05eee7c34f7624996af.
And fix attempt: 06f671e57a574ba1c5127038eff8e8773273790e.
Performance regressions and broken sanitizers, see #142686.
|
|
access events for non context-sensitive profiles using debug info (#148013)
An RFC is in
https://discourse.llvm.org/t/rfc-vtable-type-profiling-for-samplefdo/87283
This change extends to process perf data with Intel
[MEM_INST_RETIRED.ALL_LOADS](https://perfmon-events.intel.com/index.html?pltfrm=skylake_server.html&evnt=MEM_INST_RETIRED.ALL_LOADS)
samples and produce sample profiles with vtable information for non
context-sensitive SampleFDO profiles.
* For feature parity across different hardwares, future work could
incorporate support for AMD Instruction-Based Sampling (IBS) and Arm
Statistical Profiling Extension (SPE).
---------
Co-authored-by: Paschalis Mpeis <paschalis.mpeis@arm.com>
|
|
This patch adds the flag -fexperimental-loop-fuse to the clang and flang
drivers. This is primarily useful for experiments as we envision to
enable the pass one day.
The options are based on the same principles and reason on which we have
`floop-interchange`.
---------
Co-authored-by: Madhur Amilkanthwar <madhura@nvidia.com>
|
|
This is a revised PR of #148309 (closed due to some git issues). The
changes do the following:
- Adds description for the modes of `-unified-lto=mode` option
- Changes parsing of `unified-lto` descriptions from string to cEnumValN
for continuity of description formatting
- Adds testing of error output in `unified-lto-check.ll`
|
|
pdb2yaml dumps the public symbols, but yaml2pdb didn't create these in
the exported PDB. With this PR, they're added to the final PDB.
|
|
(#156501)
Add a new option (under `--mark-all-context-preinlined`) that marks all
function samples with the `ContextShouldBeInlined ` attribute during
post-processing to make the profile as preinlined. This can be useful
for experiments outside of the CS preinliner, e.g. to fully replay the
inlining for a given profile.
|
|
`DXContainer` (#154804)
This pr adds the `extract-section` option to `llvm-objcopy` as a common
option. It differs from `dump-section` as it will produce a standalone
object with just one section, as opposed to just the section contents.
For more context as to other options considered, see
https://github.com/llvm/llvm-project/pull/153265#issuecomment-3195696003.
This difference in behaviour is used for DXC compatibility with
`extract-rootsignature` and `/Frs`.
This pr then implements this functionality for `DXContainer` objects.
This is the second step of
https://github.com/llvm/llvm-project/issues/150277 to implement as a
compiler action that invokes `llvm-objcopy` for functionality.
This also completes the implementation of `extract-rootsignature` as
described in https://github.com/llvm/llvm-project/issues/149560.
|
|
|
|
Avoids more Triple->string->Triple round trip. This
is a continuation of f137c3d592e96330e450a8fd63ef7e8877fc1908
|
|
Previously, llvm-readelf dumped hex format values in different ways.
Some of them were printed in upper-case, while the others were in
lower-case format. This change switches the format to lower-case in all
cases.
Why is this useful? As an example, FileCheck comparisons are
case-sensitive by default. This change means it's easier to compare
those values, because they have the same format.
|
|
(#156735)
Reverts llvm/llvm-project#156300
Need to look at the X86 test failures.
|
|
In the serial snippet generator and function that computes the aliasing
instructions, we don't want to include load/store instructions
to create a chain as that could make the results more unreliable.
There is a hasMemoryOperands() check, but currently that looks like a X86
way for checking for loads/stores. For AArch64 and other architectures, we
should check mayLoad() and mayStore().
|
|
by snippet generator" (#156423)
### Reland #142529 (Resolving "not all operands are initialized by
snippet generator")
Introduced changes in implementation of `randomizeTargetMCOperand()` for
AArch64 that omitting `OPERAND_SHIFT_MSL`, `OPERAND_PCREL` to an
immediate value of 264 and 8 respectively.
PS: Omitting
`MCOI::OPERAND_FIRST_TARGET/llvm:AArch64:OPERAND_IMPLICIT_IMM_0`
similarly, to value 0. It was low hanging change thus added in this PR
only.
For any future operand type of AArch64 if not initialised will exit with
error "`Unimplemented operand type: MCOI::OperandType:<#Number>`".
#### [Reland Updates]
Updated `tools/llvm-exegesis/AArch64/error-resolution.s` which caused
problem.
Test case was failing when there is uninitialised operands error coming
from secondary/consumer instruction used by exegesis in latency mode
required to chain up the assembly to ensure serial execution.
i.e. We get error message like `UMOVvi16_idx0: Not all operands were
initialized by the snippet generator for <<<any opcode other than
UMOVvi16_idx0>>> opcode.` but test case want to check like
`# UMOVvi16_idx0_latency: ---`. Thus the testcase fails.
```+ /llvm-project/build/bin/FileCheck /llvm-project/llvm/test/tools/llvm-exegesis/AArch64/error-resolution.s --check-prefix=UMOVvi16_idx0_latency
/llvm-project/llvm/test/tools/llvm-exegesis/AArch64/error-resolution.s:65:26: error: UMOVvi16_idx0_latency: expected string not found in input
# UMOVvi16_idx0_latency: ---
^
<stdin>:1:1: note: scanning from here
UMOVvi16_idx0: Not all operands were initialized by the snippet generator for LD1W_D_IMM opcode.
^
Input file: <stdin>
Check file: /llvm-project/llvm/test/tools/llvm-exegesis/AArch64/error-resolution.s
-dump-input=help explains the following input dump.
Input was:
<<<<<<
1: UMOVvi16_idx0: Not all operands were initialized by the snippet generator for LD1W_D_IMM opcode.
check:65 X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
>>>>>>
--
********************
********************
Failed Tests (1):
LLVM :: tools/llvm-exegesis/AArch64/error-resolution.s
```
#### [Why it fails (only sometimes)]
Exegesis in latency mode require the generated assembly to be chained to
ensure serial execution,
For this exegesis add an additional consumer instruction for some
instruction, which is chosen via a random seed.
Thus, it randomly fails whenever there is secondary consumer instruction
(which is unsupported/throws error) added in generated assembly.
|
|
Directly write to the output instead of building a string to
print.
Closes #142538
|
|
This avoids subclassing std::vector and a static constructor.
This started as a refactor to make TargetLibraryInfo available during
printing so a custom name could be reported. It turns out this struct
wasn't doing anything, other than providing a hacky way of printing the
standard name instead of the target's custom name. Just remove this and
stop hacking on the TargetLibraryInfo to falsely report the function
is available later.
|
|
This patch reverts 44e791c6ff1a982de9651aad7d1c83d1ad96da8a,
3cc1031a827d319c6cb48df1c3aafc9ba7e96d72 and
adbd43250ade1d5357542d8bd7c3dfed212ddec0. Which breaks debuginfod build
and tests when httplib is used.
|
|
initialized by snippet generator" (#142529)"
This reverts commit 2e8ecf7d5fbb4e0d029b0baf94f57f8161b396be.
It is causing clang-aarch64-quick buildbot to fail.
(see:https://lab.llvm.org/buildbot/#/builders/65/builds/22035)
|
|
Fixes the bug of the missing terminator for CallBrInst.
|
|
snippet generator" (#142529)
Exegesis for AArch64 arch, before this patch only handles/initialise
Immediate and Register Operands.
For opcodes requiring rest operand types exegesis exits with Error:
`"not all operands are initialised by snippet generator"`.
To resolve a given error we have to initialise required operand types.
i.e., For `"not all operands are initialised by snippet generator"` init
`OPERAND_SHIFT_MSL`, `OPERAND_PCREL`,
And For `"targets with target-specific operands should implement this"`
init `OPERAND_FIRST_TARGET`.
This PR adds support to the following opcodes:-
- OPERAND_SHIFT_MSL: `[MOVI|MVNI]_[2s|4s]_msl`.
- OPERAND_PCREL: `LDR[R|X|W|SW|D|S|Q]l`
- OPERAND_IMPLICIT_IMM_0:
`[UMOV|SMOV]v[i8|i16|i32|i64|i8to32|i8to64|i16to32|i32to64|i16to64]_idx0`
---
### [Experiment/Learnings]
Moreover, We found out we can similarly omit `OPERAND_UNKNOWN` with
immediate value of 0. This brute force fix helps us get major part of
(`~1000`) opcodes which throw un-init operands error. But, The correct
way to resolve is to introduce `OperandType` in AArch64 tablegen files
for opcode which have `OPERND_UNKNOWN` in `AArch64GenInstrInfo.inc`. And
add switch case for those `OperandType` in the
`randomizeTargetMCOperand()`.
As, side-effect to this temporary fix that we explored is listed below
system-level instructions throws `illegal instruction` i.e. for`MRS,
MSR, MSRpstatesvcrImm1, SYSLxt, SYSxt, UDF`.
This patch in `--mode=inverse_throughput` for opcodes `MRS, MSR,
MSRpstatesvcrImm1, SYSLxt, SYSxt, UDF` exits with handled error of
`snippet crashed while running: Illegal instruction`, they previously
used to exits with error `not all operands initialized by snippet
generator`.
[For completeness] Additionally, exegesis beforehand and with this patch
too, throws illegal instruction in throughput mode, for these opcodes
too (`APAS, DCPS1, DCPS2, DCPS3, HLT, HVC, SMC, STGM, STZGM`). Will look
into them later.
---
### [Summary]
Thus, Only introduced changes in implementation of
`randomizeTargetMCOperand()` for AArch64 that omitting
`OPERAND_SHIFT_MSL`, `OPERAND_PCREL` to an immediate value of 264 and 8
respectively.
PS: Omitting
`MCOI::OPERAND_FIRST_TARGET`/`llvm:AArch64:OPERAND_IMPLICIT_IMM_0`
similarly, to value 0. It was low hanging change thus added in this PR
only.
For any future operand type of AArch64 if not initialised will exit with
error `"Unimplemented operand type: MCOI::OperandType:<#Number>"`.
|
|
This is a follow on to #154537, which quoted the CMAKE_SYSTEM_NAME to avoid expanding it again when that CMAKE_SYSTEM_NAME expands to AIX.
But by the same logic, we also need to quote the plain string AIX as well.
|