Age | Commit message (Collapse) | Author | Files | Lines |
|
Summary:
The GPU target has recently been changed to support standard `libc`
build rules. This means we should be able to build for it both in
`LLVM_ENABLE_PROJECTS` mode, or targeting the runtimes directory
directly as in the LLVM `libc` documentation. Previously this failed
because the version check on the compiler was too strict and the
`--target=` options were not being set on the link jobs unless in CMake
cross compiliation mode. This patch fixes those so the following config
should work now to build the GPU target directly if using NVPTX.
```
cmake ../runtimes -DCMAKE_BUILD_TYPE=Release \
-DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_COMPILER=clang \
-DLLVM_ENABLE_RUNTIMES=libc -DLLVM_RUNTIMES_TARGET=nvptx64-nvidia-cuda \
-DLLVM_DEFAULT_TARGET_TRIPLE=nvptx64-nvidia-cuda \
-DLIBC_HDRGEN_EXE=/path/to/hdrgen/libc-hdrgen \
-DLLVM_LIBC_FULL_BUILD=ON -GNinja
```
|
|
Summary:
We need some extra tools for the GPU build. Normally we search for these
from the build itself, but in the case of a `LLVM_PROJECTS_BUILD` or
some other kind of external build, this directory will not be populated.
However, the GPU build already requires that the compiler is an
up-to-date clang, which should always have these present next to the
binary. Simply add this as a fallback search path. Generally we want it
to be the second, because it would pick up someone install and then
become stale.
|
|
Summary:
This patch silences two warnings that may occur during the building of
GPU tests. These are not informative or helpful and just make the test
output longer.
|
|
Summary:
This is a massive patch because it reworks the entire build and
everything that depends on it. This is not split up because various bots
would fail otherwise. I will attempt to describe the necessary changes
here.
This patch completely reworks how the GPU build is built and targeted.
Previously, we used a standard runtimes build and handled both NVPTX and
AMDGPU in a single build via multi-targeting. This added a lot of
divergence in the build system and prevented us from doing various
things like building for the CPU / GPU at the same time, or exporting
the startup libraries or running tests without a full rebuild.
The new appraoch is to handle the GPU builds as strict cross-compiling
runtimes. The first step required
https://github.com/llvm/llvm-project/pull/81557 to allow the `LIBC`
target to build for the GPU without touching the other targets. This
means that the GPU uses all the same handling as the other builds in
`libc`.
The new expected way to build the GPU libc is with
`LLVM_LIBC_RUNTIME_TARGETS=amdgcn-amd-amdhsa;nvptx64-nvidia-cuda`.
The second step was reworking how we generated the embedded GPU library
by moving it into the library install step. Where we previously had one
`libcgpu.a` we now have `libcgpu-amdgpu.a` and `libcgpu-nvptx.a`. This
patch includes the necessary clang / OpenMP changes to make that not
break the bots when this lands.
We unfortunately still require that the NVPTX target has an `internal`
target for tests. This is because the NVPTX target needs to do LTO for
the provided version (The offloading toolchain can handle it) but cannot
use it for the native toolchain which is used for making tests.
This approach is vastly superior in every way, allowing us to treat the
GPU as a standard cross-compiling target. We can now install the GPU
utilities to do things like use the offload tests and other fun things.
Some certain utilities need to be built with
`--target=${LLVM_HOST_TRIPLE}` as well. I think this is a fine
workaround as we
will always assume that the GPU `libc` is a cross-build with a
functioning host.
Depends on https://github.com/llvm/llvm-project/pull/81557
|
|
Summary:
This cleans up the handling of hermetic test flags. Primarily done to
simplify the GPU rework patch.
|
|
The cmake test generator needed to be updated to support the same flags
as the source. To simplify the code, I moved it to a new file.
|
|
|
|
according to ISO/IEC TR 18037:2008 standard, and add fixed point type support detection. (#81255)
Fixed point extension standard:
https://standards.iso.org/ittf/PubliclyAvailableStandards/c051126_ISO_IEC_TR_18037_2008.zip
|
|
Summary:
I neglected the fact that `activemask` is a 6.2 or 6.3 feature, so
building this on older machines is incorrect. Bump this up to 6.3 for
now so it works. In the future we will try to get rid of the N
architecture business.
|
|
|
|
|
|
|
|
fixes #79474.
|
|
|
|
entry points and unit tests. (#79254)
|
|
operator to HermeticTestUtils.cpp. (#78906)
`-nostdlib++` is a clang-only flag. Replacing it with `-nostdlib` when
building with gcc.
|
|
Usage of uninitialized memory is a top memory safety issue in C++ codebases.
Help mitigate this somewhat by default initialize stack allocations to a
pattern (0xAA repeating).
Clang has received optimizations to sink these into control flow paths that
access such values to minimize the overhead of these added initializations.
If there's a measurable slowdown, we can add
-ftrivial-auto-var-init-max-size=<N> for some value N bytes if we have any
large stack allocations, or add attribute uninitialized to any variable
declarations.
Unsupported until GCC 12.1 / Clang 8.
Increases file size of libc.a from a full build by +8.79Ki (+0.2%).
|
|
fixes https://github.com/llvm/llvm-project/issues/78743
- For normal objects, the patch removes `RTTI` and exceptions in `fullbuild`
- For FP tests, the patch adds links to `stdc++` and `gcc_s` if `MPFR` is used.
|
|
This reverts commit 6886a52d6dbefff77f33de12ff85d654e2557f81.
Most of the errors observed in postsubmit have been addressed. We can
fix-forward the remaining ones.
Link: https://lab.llvm.org/buildbot/#/changes/117129
|
|
Summary:
A previous patch added a dependency on the stack protectors, this was
not built on the GPU targets so every test was disabled. It turns out
that disabled tests still get targets so we need to specifically check
if the it is in the target's set of entrypoints before we can use it.
Another patch, because the build-bot was down, snuck in that prevented
the new math tests from being run. The problem is that the `signal.h`
header requires target specific definitions but was being used
unconditionally. I have made changes that disable building this header
if the file is not defined in the config. This required disbaling the
signal_to_string utility, so that will simply be missing from targets
that don't define it.
|
|
This allows individual object files to override the common compile
commands in
their local CMakeLists' add_object_library call.
For example, the common compile commands contain -Wall and -Wextra.
Before
this patch, the per object COMPILE_OPTIONS were prepended to these, so
that
builds of individual object files could not individually disable
specific
diagnostics from those groups explicitly.
After this patch, the per-object file compile objects are appended to
the list
of compiler flags, enabling this use case.
ARGN is a bit of cmake magic; let's be explicit in the APPEND that we're
appending the compile options.
Link: #77007
|
|
* separate initialization routines into _start and do_start for all
architectures.
* lift do_start as a separate object library to avoid code duplication.
* (addtionally) address the problem of building hermetic libc with
-fstack-pointer-*
The `crt1.o` is now a merged result of three components:
```
___
|___ x86_64
| |_______ start.cpp.o <- _start (loads process initial stack and aligns stack pointer)
| |_______ tls.cpp.o <- init_tls, cleanup_tls, set_thread_pointer (TLS related routines)
|___ do_start.cpp.o <- do_start (sets up global variables and invokes the main function)
```
|
|
This adds the ROCm device libs defines for both target architectures so
that we an compile libc on such GPUs.
|
|
tests. (#75552)
Profiling cmake shows that a significant time configuring `libc` folder
is spent on running `get_object_files_for_test` in the `test` folder (13
sec in `libc/test` folder / 16 sec in `libc` folder). By caching all
needed objects for each target instead of resolving every time, the time
cmake spends on configuring `libc/test` folder is reduced to ~1s.
|
|
Summary:
The GPU tests tend to fail when run massively in parallel. This is why
we use a CMake job pool to limit it to 1 in most cases. We should
default to the configuration that is most likely to work, that being a
single thread. There aren't enough GPU tests for this to be a massive
increase in test time on the bots, so we should default to what works
guaranteed.
|
|
This reverts commit 606653091d1a66d1a83a1bfdea2883cc8d46687e.
Post submit buildbots are now red. We can use these explicit errors to better
clean up existing warnings, then reland this.
Link: #73966
|
|
A recent commit introduced warnings observable when building unit tests.
If the
unit tests don't fail when warnings are introduced into the build, then
we
might fail to notice them in the stream of output from check-libc.
Link: https://github.com/llvm/llvm-project/pull/72763/files#r1410932348
|
|
There are some basic vectorization features in standard architecture
specifications. Such as SSE/SSE2 for x86-64, or NEON for aarch64. Even
though such features are almost always available, we still need some
methods to test fallback routines without any vectorization.
Previous attempt in hsearch adds a DISABLE_SSE2_OPT flag that tries to
compile the code with -mno-sse2 in order to test specific table scanning
routines. However, it turns out that such flag may have some unwanted
side effects hindering portability.
This PR introduces PREFER_GENERIC as an alternative. When a target is
built with PREFER_GENERIC, cmake will define a macro
__LIBC_PREFER_GENERIC such that developers can selectively choose the
fallback routine based on the macro.
|
|
This patch implements `hcreate(_r)/hsearch(_r)/hdestroy(_r)` as
specified in https://man7.org/linux/man-pages/man3/hsearch.3.html.
Notice that `neon/asimd` extension is not yet added in this patch.
- The implementation is largely simplified from rust's
[`hashbrown`](https://github.com/rust-lang/hashbrown/blob/master/src/raw/mod.rs)
as we only consider fix-sized insertion-only hashtables. Technical
details are provided in code comments.
- This patch also contains a portable string hash function, which is
derived from [`aHash`](https://github.com/tkaitchuck/aHash)'s fallback
routine. Not using any SIMD acceleration, it has a good enough quality
(passing all SMHasher tests) and is not too bad in speed.
- Some general functionalities are added, such as `memory_size`,
`offset_to`(alignment), `next_power_of_two`, `is_power_of_two`.
`ctz/clz` are extended to support shorter integers.
|
|
(#73311)
Floating point properties are a combination of target OS, target
architecture and compiler support.
- Adding target OS detection,
- Moving floating point type detection to its own file.
This is in preparation of adding support for `_Float16` which requires
testing compiler **version** and target architecture.
|
|
Summary:
This patch includes the necessary changes to make the `libc` tests
running on AMD GPUs run using the newer code object version. The 'code
object version' is AMD's internal ABI for making kernel calls. The move
from 4 to 5 changed how we handle arguments for builtins such as
obtaining the grid size or setting up the size of the private stack.
Fixes: https://github.com/llvm/llvm-project/issues/72517
|
|
Currently the only way to add or remove entrypoints is to modify the
entrypoints.txt file for the current target. This isn't ideal since
a user would have to carry a diff for this file when updating their
checkout. This patch adds a basic mechanism to allow the user to remove
entrypoints without modifying the repository.
|
|
(#72351)
|
|
Summray:
A recent patch upgrades the NVPTX ctor / dtor lowering pass to emit
kernels so other languages can call them. We do this manually in `libc`
so we do not need this. Use the provided flag to disable this step to
keep the created kernels cleaner.
|
|
with copysignf128. (#71731)
|
|
Summary:
This patch fixes some code duplication on the GPU. The GPU build wanted
to enable timing for hermetic tests so it built some special case
handling into the test suite. Now that `clock` is supported on the
target we can simply link against the external interface. Because we
include `clock.h` for the CLOCKS_PER_SEC macro we remap the C entrypoint
to the internal one if it ends up called. This should allow hermetic
tests to run with timing if it is supported.
|
|
Summary:
This patch simply adds the `-fconvergent-functions` flag to the GPU
compilation. This is in relation to the behaviour of SIMT
architectures under divergence. With the flag, we assume every function
is convergent by default and rely on the compiler's divergence analysis
to transform it if possible.
Fixes: https://github.com/llvm/llvm-project/issues/63853
|
|
Summary:
There were a few tests that weren't enabled on the GPU. This is because
the logic caused them to be skipped as we don't use CPU featured on the
host. This also disables the logic making multiple versions of the
memory functions.
|
|
Summary:
Previously this code was applied to the integration tests but did not
copy the logic that stopped this from being passed to the GPU build.
Copy the full line to avoid the warnings and prevent any libraries from
being included.
|
|
64 bits versions
This patch enables the compilation of libc for rv32 by unifying the
current rv64 and rv32 implementation into a single rv implementation.
We updated the cmake file to match the new riscv32 arch and force
LIBC_TARGET_ARCHITECTURE to be "riscv" whenever we find "riscv32" or
"riscv64". This is required as LIBC_TARGET_ARCHITECTURE is used in the
path for several platform specific implementations.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D148797
|
|
(#66979)
printf_core.parser is not yet updated to use the printf config options. It
does not use them currently anyway and the corresponding parser_test
should be updated to respect the config options.
|
|
This patch set the integration test's linking options to be the same one
used in the hermetic tests.
In particular, by removing -nostdlib the tests are linked with
libgcc/compiler-rt and this fixes an issue undefined reference to
__udivdi3 and __umoddi3 in rv32.
|
|
${CMAKE_CROSSCOMPILING_EMULATOR} will be used in the new rv32 buildbot
and is prepended automatically when we call add_custom_target in CMake,
except when we use a custom command.
There are two places where custom commands are used in libc, so we
explicitly add the ${CMAKE_CROSSCOMPILING_EMULATOR} variable there.
Other systems that don't use ${CMAKE_CROSSCOMPILING_EMULATOR} are
unaffected
|
|
This is a reland of #66783 a35a3b75b219247eb9ff6784d1a0fe562f72d415
fixing the benchmark breakage.
|
|
This reverts commit a35a3b75b219247eb9ff6784d1a0fe562f72d415. This broke
libc benchmarks.
|
|
We want to activate `llvm-header-guard` (#66477) but the current CMake
configuration includes paths that should be `isystem`. This PR restricts
the number of `-I` passed to the clang command line and correctly marks
the llvm libc include path as `isystem`.
|
|
Summary:
The GPU build has a lot of magic around how we package the output.
Generally, the GPU needs to exist as a secondary fatbinary image for
offloading languages. This is because offloading languages pretend like
offloading to an accelerator is a single file. This then needs to be put
into a single file to make it mesh with the existing build
infrastructure. To work with this, the `libc` makes an installed version
of the library that simply embeds the GPU code into an empty stub file.
This wasn't being updated correctly, which lead to the installed `libc`
static library not being updated correctly when the underlying file was
changed. The previous behaviour only updated when the entrypoint itself
was modified, but not any of its headers. By adding a dependcy on the
actual *object* file we should now capture the regular CMake semantics.
|
|
This fixes the broken overlay builders.
|
|
(#66226)
The name has been changed to adhere to the config option naming format.
The necessary build changes to use the new option have also been made.
|
|
This is part of a libc wide CMake cleanup which aims to eliminate
certain explicitly duplicated logic which is available in CMake-3.20.
This change in particular makes the entrypoint aliases real library
targets so that they can be treated as normal library targets by other
libc build rules.
|