|
|
|
This reverts commit e645728e9de64d019661c8f92bb487e06d95644a.
|
|
|
|
libgcc/config/libbid/ChangeLog:
PR target/120691
* bid128_div.c: Fix _Decimal128 arithmetic error under
FE_UPWARD.
* bid128_rem.c: Ditto.
* bid128_sqrt.c: Ditto.
* bid64_div.c (bid64_div): Ditto.
* bid64_sqrt.c (bid64_sqrt): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr120691.c: New test.
(cherry picked from commit 50064b2898edfb83bc37f2597a35cbd3c1c853e3)
|
|
|
|
SME uses a lazy save system to manage ZA. The idea is that,
if a function with ZA state wants to call a "normal" function,
it can leave its state in ZA and instead set up a lazy save buffer.
If, unexpectedly, that normal function contains a nested use of ZA,
that nested use of ZA must commit the lazy save first.
This lazy save system uses a special system register called TPIDR2_EL0.
See:
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#66the-za-lazy-saving-scheme
for details.
The ABI specifies that, on entry to an exception handler, the following
things must be true:
* PSTATE.SM must be 0 (the processor must be in non-streaming mode)
* PSTATE.ZA must be 0 (ZA must be off)
* TPIDR2_EL0 must be 0 (there must be no uncommitted lazy save)
This is normally done by making _Unwind_RaiseException & friends
commit any lazy save before they unwind. This also has the side
effect of ensuring that TPIDR2_EL0 is never left pointing to a
lazy save buffer that has been unwound.
However, things get more complicated with signals. If:
(a) a signal is raised while ZA is dormant (that is, while there is an
uncommitted lazy save);
(b) the signal handler throws an exception; and
(c) that exception is caught outside the signal handler
something must ensure that the lazy save from (a) is committed.
This would be simple if the signal handler was entered with ZA and
TPIDR2_EL0 intact. However, for various good reasons that are out
of scope here, this is not done. Instead, Linux now clears both
TPIDR2_EL0 and PSTATE.ZA before entering a signal handler, see:
https://lore.kernel.org/all/20250417190113.3778111-1-mark.rutland@arm.com/
for details.
Therefore, it is the unwinder that must simulate a commit of the lazy
save from (a). It can do this by reading the previous values of
TPIDR2_EL0 and ZA from the sigcontext.
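As a rough illustration of what "simulating a commit" means, the sketch below copies saved ZA slices into the lazy save buffer named by a TPIDR2 block. It is a simplified model, not the code in linux-unwind.h: the demo_* names are hypothetical, the struct assumes an LP64 layout (8-byte pointer, 16-bit slice count, reserved padding), and the saved ZA contents are passed in as a flat byte array rather than parsed from a real sigcontext.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the TPIDR2 block used by the lazy save scheme
   (LP64 layout assumed).  */
struct demo_tpidr2_block
{
  void *za_save_buffer;          /* where a committed save must be written */
  uint16_t num_za_save_slices;   /* how many ZA slices the buffer holds */
  uint8_t reserved[6];
};

/* Simulate committing a lazy save.  BLK models the TPIDR2_EL0 value found
   in the signal frame (NULL means there was no uncommitted save), SAVED_ZA
   models the ZA contents found there, and SVL_BYTES is the size of one ZA
   slice in bytes.  Returns the number of slices written.  */
static unsigned
demo_commit_lazy_save (const struct demo_tpidr2_block *blk,
                       const uint8_t *saved_za, size_t svl_bytes)
{
  if (blk == NULL || blk->za_save_buffer == NULL)
    return 0;   /* nothing to commit */

  /* ZA has at most SVL_BYTES slices of SVL_BYTES bytes each.  */
  assert (blk->num_za_save_slices <= svl_bytes);

  memcpy (blk->za_save_buffer, saved_za,
          (size_t) blk->num_za_save_slices * svl_bytes);
  return blk->num_za_save_slices;
}
```

In this model the unwinder would call demo_commit_lazy_save once per signal frame it unwinds through, and only when the saved TPIDR2_EL0 value is nonzero.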
The SME-related sigcontext structures were only added to Linux's
asm/sigcontext.h relatively recently and we can't rely on GCC being
built against such recent kernel header files. The patch therefore
defines the relevant macros if they are not already defined and provides
types that comply with the ABI layout of the corresponding Linux types.
The patch includes some ugly casting in an attempt to support big-endian
ILP32, even though SME on big-endian ILP32 Linux should never be a thing.
We can remove it if we also remove ILP32 support from GCC.
Co-authored-by: Yury Khrustalev <yury.khrustalev@arm.com>
Reviewed-by: Tamar Christina <tamar.christina@arm.com>
gcc/
* doc/sourcebuild.texi (aarch64_sme_hw): Document.
gcc/testsuite/
* lib/target-supports.exp (add_options_for_aarch64_sme)
(check_effective_target_aarch64_sme_hw): New procedures.
* g++.target/aarch64/sme/sme_throw_1.C: New test.
* g++.target/aarch64/sme/sme_throw_2.C: Likewise.
libgcc/
* config/aarch64/linux-unwind.h (aarch64_fallback_frame_state):
If a signal was raised while there was an uncommitted lazy save,
commit the save as part of the unwind process.
(cherry picked from commit b5ffc8e75a81bab7ee7554483447c27be438464e)
|
|
|
|
f7_exp limited exponents to 512, but 1023 * ln2 ≈ 709,
hence 1024 is a correct limit.
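The arithmetic can be checked with a small sketch. It assumes the limit is expressed as the smallest power-of-two exponent e with |x| < 2^e; how libf7 actually encodes aa->expo is not shown here, and the demo_* names are hypothetical.

```c
#include <assert.h>

/* ln(2), written out so that the sketch does not need libm.  */
#define DEMO_LN2 0.6931471805599453

/* Largest argument for which exp (x) is still about 2^1023.  */
static double
demo_exp_arg_limit (void)
{
  return 1023 * DEMO_LN2;   /* roughly 709 */
}

/* Smallest e such that x < 2^e, for non-negative x.  */
static int
demo_min_exponent_bound (double x)
{
  assert (x >= 0);
  int e = 0;
  double p = 1.0;
  while (p <= x)
    {
      p *= 2;
      e++;
    }
  return e;
}
```

Since 512 < 709 < 1024, demo_min_exponent_bound (demo_exp_arg_limit ()) yields 10: a bound of 2^9 would reject arguments whose exponential is still representable.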
libgcc/config/avr/libf7/
PR target/120441
* libf7.c (f7_exp): Limit aa->expo to 10 (not to 9).
(cherry picked from commit 672569cee76a1927d14b5eb754a5ff0b9cee1bc8)
|
|
|
|
|
|
From the macOS 15 SDK, the unwinder no longer exports some of the symbols used
in the legacy libgcc_s.1 library, which (a) causes a bootstrap failure and
(b) means that the legacy library is no longer useful.
No open branch of GCC emits references to this library, and any already-built
code that depends on the symbols would need rework anyway.
We have been asked to extend this back to the earliest OS version supported
by the SDK (10.12).
PR target/116809
libgcc/ChangeLog:
* config.host: Build legacy libgcc_s.1 on hosts before macOS 10.12.
* config/i386/t-darwin: Remove reference to legacy libgcc_s.1
* config/rs6000/t-darwin: Likewise.
* config/t-darwin-libgccs1: New file.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
(cherry picked from commit d9cafa0c4f0a81304d9b95a78ccc8e9003c6d7a3)
|
|
|
|
The following testcase shows a bug in unwind-dw2-btree.h.
In short, the header provides a lock-free btree data structure. Nodes have no
parent link; both insertion and deletion are done in top-down walks with
some locking of just a few nodes at a time, so that lookups can notice
concurrent modifications and retry. Non-leaf (inner) nodes contain keys
which are initially the base address of the left-most leaf entry of the
following child (or all ones if there is none) minus one. Insertion ensures
balancing of the tree to ensure [d/2, d] entries filled through aggressive
splitting if it sees a full tree while walking; deletion performs various
operations like merging neighbour trees, merging into parent or moving some
nodes from neighbour to the current one.
What differs from the textbook implementations is mostly that the leaf nodes
don't include just address as a key, but address range, address + size
(where we don't insert any ranges with zero size) and the lookups can be
performed for any address in the [address, address + size) range. The keys
on inner nodes are still just address-1, so the child covers all nodes
where addr <= key unless it is covered already in children to the left.
The user (static executables or JIT) should always ensure there is no
overlap in between any of the ranges.
In the testcase a bunch of insertions are done, always followed by one
removal, followed by one insertion of a range slightly different from the
removed one. E.g. in the first case [&code[0x50], &code[0x59]] range
is removed and then we insert [&code[0x4c], &code[0x53]] range instead.
This is valid, it doesn't overlap anything. But the problem is that some
non-leaf (inner) node used the &code[0x4f] key (after the 11 insertions
completely correctly). On removal, nothing adjusts the keys on the parent
nodes (it really can't in the top-down only walk, the keys could be many nodes
above it and unlike insertion, removal only knows the start address, doesn't
know the removed size and so will discover it only when reaching the leaf
node which contains it; plus even if it knew the address and size, it still
doesn't know what the second left-most leaf node will be (i.e. the one after
removal)). And on insertion, if nodes aren't split at a level, nothing
adjusts the inner keys either. If a range is inserted and is either fully
below key (keys are - 1, so having address + size - 1 being equal to key is
fine) or fully after key (i.e. address > key), it works just fine, but if
the key is in the middle of the range like in this case, &code[0x4f] is in the
middle of the [&code[0x4c], &code[0x53]] range, then insertion works fine
(we only use size on the leaf nodes), and lookup of the addresses below
the key works fine too (i.e. [&code[0x4c], &code[0x4f]] will succeed).
The problem is with lookups after the key (i.e. [&code[0x50], &code[0x53]]),
the lookup looks for them in different children of the btree and doesn't
find an entry and returns NULL.
As users need to ensure non-overlapping entries at any time, the following
patch fixes it by adjusting keys during insertion where we know not just
the address but also size; if we find during the top-down walk a key
which is in the middle of the range being inserted, we simply increase the
key to be equal to address + size - 1 of the range being inserted.
There can't be any existing leaf nodes overlapping the range in correct
programs and the btree rebalancing done on deletion ensures we don't have
any empty nodes which would also cause problems.
The patch adjusts the keys in two spots, once for the current node being
walked (the last hunk in the header, with large comment trying to explain
it) and once during inner node splitting in a parent node if we'd otherwise
try to add that key in the middle of the range being inserted into the
parent node (in that case it would be missed in the last hunk).
The testcase covers both of those spots, so succeeds with GCC 12 (which
didn't have btrees) and fails with vanilla GCC trunk and also fails if
either the
if (fence < base + size - 1)
fence = iter->content.children[slot].separator = base + size - 1;
or
if (left_fence >= target && left_fence < target + size - 1)
left_fence = target + size - 1;
hunk is removed (of course, only with the current node sizes, i.e. up to
15 children of inner nodes and up to 10 entries in leaf nodes).
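The failure mode and the fix can be reproduced in a much smaller model. The sketch below is not the btree from unwind-dw2-btree.h: it is a hypothetical single inner node with two leaf children, no locking, no splitting and no deletion, and all demo_* names are made up. It only shows why a separator that falls in the middle of an inserted range hides the upper part of that range from lookups, and how raising the separator to base + size - 1 during insertion avoids that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CHILDREN 2
#define DEMO_ENTRIES 8

struct demo_range { uintptr_t base, size; };
struct demo_leaf { struct demo_range entry[DEMO_ENTRIES]; size_t count; };
struct demo_inner
{
  uintptr_t separator[DEMO_CHILDREN];   /* the last one is all ones */
  struct demo_leaf child[DEMO_CHILDREN];
};

/* Pick the first child whose separator is >= ADDR.  */
static size_t
demo_find_slot (const struct demo_inner *n, uintptr_t addr)
{
  size_t slot = 0;
  while (slot + 1 < DEMO_CHILDREN && n->separator[slot] < addr)
    slot++;
  return slot;
}

/* Insert [BASE, BASE + SIZE).  With ADJUST set, a separator that lies in
   the middle of the new range is raised to BASE + SIZE - 1.  */
static void
demo_insert (struct demo_inner *n, uintptr_t base, uintptr_t size,
             bool adjust)
{
  assert (size != 0);
  size_t slot = demo_find_slot (n, base);
  if (adjust && n->separator[slot] < base + size - 1)
    n->separator[slot] = base + size - 1;
  struct demo_leaf *leaf = &n->child[slot];
  assert (leaf->count < DEMO_ENTRIES);
  leaf->entry[leaf->count++] = (struct demo_range) { base, size };
}

/* Return true if some inserted range contains ADDR.  */
static bool
demo_lookup (const struct demo_inner *n, uintptr_t addr)
{
  const struct demo_leaf *leaf = &n->child[demo_find_slot (n, addr)];
  for (size_t i = 0; i < leaf->count; i++)
    if (addr - leaf->entry[i].base < leaf->entry[i].size)
      return true;
  return false;
}

/* Insert the 8-byte range starting at 0x4c under a stale separator of
   0x4f, then look up ADDR.  */
static bool
demo_scenario (bool adjust, uintptr_t addr)
{
  struct demo_inner n = { .separator = { 0x4f, UINTPTR_MAX } };
  demo_insert (&n, 0x4c, 8, adjust);
  return demo_lookup (&n, addr);
}
```

Without the adjustment, demo_scenario (false, 0x50) misses because the lookup descends into the second child; with it, every address from 0x4c to 0x53 is found and 0x54 is still correctly rejected.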
2025-03-10 Jakub Jelinek <jakub@redhat.com>
Michael Leuchtenburg <michael@slashhome.org>
PR libgcc/119151
* unwind-dw2-btree.h (btree_split_inner): Add size argument. If
left_fence is in the middle of [target,target + size - 1] range,
increase it to target + size - 1.
(btree_insert): Adjust btree_split_inner caller. If fence is smaller
than base + size - 1, increase it and separator of the slot to
base + size - 1.
* gcc.dg/pr119151.c: New test.
(cherry picked from commit 21109b37e8585a7a1b27650fcbf1749380016108)
|
|
|
|
[PR118844].
Due to the presence of R_LARCH_B26 in
/usr/lib/gcc/loongarch64-linux-gnu/14/crtbeginS.o, its addressing
range is [PC-128MiB, PC+128MiB-4]. This means that when the code
segment size exceeds 128MiB, linking with lld will definitely fail
(ld does not fail because it lays out the sections in a different order).
The linking order:
lld: crtbeginS.o + .text + .plt
ld : .plt + crtbeginS.o + .text
To solve this issue, add '-mcmodel=extreme' when compiling crtbeginS.o.
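The reach limit quoted above can be expressed as a simple range check. This is only an illustration of the arithmetic, assuming a signed 26-bit offset counted in 4-byte units; demo_fits_b26 is a hypothetical helper, not linker code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MIB (INT64_C (1) << 20)

/* A signed 26-bit offset in 4-byte units spans 2^27 bytes each way.  */
static_assert ((128 * DEMO_MIB) == (INT64_C (1) << 27),
               "128 MiB is 2^27 bytes");

/* Return true if a branch at PC can reach TARGET, i.e. if the
   displacement is 4-byte aligned and lies in
   [-128MiB, 128MiB - 4].  */
static bool
demo_fits_b26 (uint64_t pc, uint64_t target)
{
  int64_t disp = (int64_t) (target - pc);
  if (disp % 4 != 0)
    return false;
  return disp >= -128 * DEMO_MIB && disp <= 128 * DEMO_MIB - 4;
}
```

With crtbeginS.o placed before a .text section larger than 128MiB and the .plt after it, a call from crtbeginS.o to a PLT entry has a displacement for which this check fails, which matches the lld layout described above.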
PR target/118844
libgcc/ChangeLog:
* config/loongarch/t-crtstuff: Add '-mcmodel=extreme'
to CRTSTUFF_T_CFLAGS_S.
(cherry picked from commit ae14d7d04da8c6cb542269722638071f999f94d8)
|
|
|
|
Add crtbeginT.o to extra_parts on FreeBSD. This ensures we use GCC's
crt objects for static linking. Otherwise it could mix crtbeginT.o
from the base system with libgcc's crtend.o, possibly leading to
segfaults.
libgcc:
PR target/118685
* config.host (*-*-freebsd*): Add crtbeginT.o to extra_parts.
Signed-off-by: Dimitry Andric <dimitry@andric.com>
|
|
|
|
Unlike crtoffload{begin,end}.o which just define some symbols at the start/end
of the various .gnu.offload* sections, crtoffloadtable.o contains
const void *const __OFFLOAD_TABLE__[]
__attribute__ ((__visibility__ ("hidden"))) =
{
&__offload_func_table, &__offload_funcs_end,
&__offload_var_table, &__offload_vars_end,
&__offload_ind_func_table, &__offload_ind_funcs_end,
};
The problem is that linking this into PIEs or shared libraries doesn't
work when it is compiled without -fpic/-fpie - __OFFLOAD_TABLE__ for non-PIC
code is put into .rodata section, but it really needs relocations, so for
PIC it should go into .data.rel.ro/.data.rel.ro.local.
As I think we don't want .data.rel.ro section in non-PIE binaries, this patch
follows the path of e.g. crtbegin.o vs. crtbeginS.o and adds crtoffloadtableS.o
next to crtoffloadtable.o, where crtoffloadtableS.o is compiled with -fpic.
2024-11-30 Jakub Jelinek <jakub@redhat.com>
PR libgomp/117851
gcc/
* lto-wrapper.cc (find_crtoffloadtable): Add PIE_OR_SHARED argument,
search for crtoffloadtableS.o rather than crtoffloadtable.o if
true.
(run_gcc): Add pie_or_shared variable. If OPT_pie or OPT_shared or
OPT_static_pie is seen, set pie_or_shared to true, if OPT_no_pie is
seen, set pie_or_shared to false. Pass it to find_crtoffloadtable.
libgcc/
* configure.ac (extra_parts): Add crtoffloadtableS.o.
* Makefile.in (crtoffloadtableS$(objext)): New goal.
* configure: Regenerated.
(cherry picked from commit f089ef880e385e2193237b1f53ec81dac4141680)
|
|
|
|
In the OpenRISC build we get the following warning:
ld: warning: __modsi3_s.o: missing .note.GNU-stack section implies executable stack
ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
Fix this by adding a .note.GNU-stack to indicate the stack does not need to be
executable for the lib1funcs.
Note, this is also needed for the upcoming glibc 2.41.
libgcc/
* config/or1k/lib1funcs.S: Add .note.GNU-stack section on linux.
|
|
|
|
For libgcc, we have (so far) supported building a DSO that supports
earlier versions of the OS than the target. From macOS 11, there are
APIs that do not exist on earlier OS versions, so limit the libgcc
range to macOS11..current.
libgcc/ChangeLog:
* config.host: From macOS 11, limit earliest macOS support
to macOS 11.
* config/t-darwin-min-11: New file.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
(cherry picked from commit 43eab54939d37d4e634a692910d31adafc053e38)
|
|
|
|
mips16.S was missing since
commit 29b74545531f6afbee9fc38c267524326dbfbedf
Date: Thu Jun 1 10:14:24 2023 +0800
MIPS: Add speculation_barrier support
Without mips16.S included, some symbols will be missing for mips16, and
so some software will fail to build.
libgcc/ChangeLog:
* config/mips/lib1funcs.S: Includes mips16.S.
(cherry picked from commit 9522fc8bb7812f2ad50eb038e0938bfd958e730f)
|
|
|
|
|
|
The CPU features initialization code uses CPUID registers (rather than
HWCAP). The equality comparisons it uses are incorrect: for example FEAT_SVE
is not set if SVE2 is available. Using HWCAPs for these is both simpler and
correct. The initialization must also be done atomically to avoid multiple
threads causing corruption due to non-atomic RMW accesses to the global.
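Both problems can be illustrated with a small model. It assumes a hypothetical 4-bit version field in which 1 means a base feature and 2 means the base feature plus an extension; the demo_* names and bit assignments are made up and do not correspond to the real cpuinfo.c or to any real ID register. The sketch shows why an equality test drops the base feature when the extension is present, and how building the mask locally and publishing it with a single atomic store avoids read-modify-write accesses to the global.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_FEAT_BASE (UINT64_C (1) << 0)
#define DEMO_FEAT_EXT  (UINT64_C (1) << 1)
#define DEMO_FEAT_INIT (UINT64_C (1) << 63)   /* marks "initialised" */

/* Global feature mask; 0 means "not initialised yet".  */
static _Atomic uint64_t demo_features;

/* Translate the hypothetical version FIELD into a feature mask.  With
   USE_EQUALITY set, the base feature is tested with == 1, which is the
   kind of comparison that goes wrong when FIELD is 2.  */
static uint64_t
demo_detect (unsigned field, bool use_equality)
{
  assert (field < 16);
  uint64_t feat = DEMO_FEAT_INIT;
  if (use_equality ? field == 1 : field >= 1)
    feat |= DEMO_FEAT_BASE;
  if (field >= 2)
    feat |= DEMO_FEAT_EXT;
  return feat;
}

/* Initialise on first use.  The mask is computed in a local variable and
   written once, so concurrent callers may repeat the work but can never
   observe or produce a partially updated value.  */
static uint64_t
demo_get_features (unsigned field)
{
  uint64_t feat = atomic_load_explicit (&demo_features,
                                        memory_order_relaxed);
  if (feat == 0)
    {
      feat = demo_detect (field, false);
      atomic_store_explicit (&demo_features, feat, memory_order_relaxed);
    }
  return feat;
}
```

The design choice here is that duplicated initialisation is harmless as long as every thread stores the same complete value, so no lock is needed.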
libgcc:
PR target/115342
* config/aarch64/cpuinfo.c (__init_cpu_features_constructor):
Use HWCAP where possible. Use atomic write for initialization.
Fix FEAT_PREDRES comparison.
(__init_cpu_features_resolver): Use atomic load for correct
initialization.
(__init_cpu_features): Likewise.
(cherry picked from commit d7cbcfe7c33645eaf95f175f19884d443817857b)
|
|
|
|
This patch adds missing assembly directives to the CMSE library wrapper to call
functions with attribute cmse_nonsecure_call. Without the .type directive the
linker will fail to produce the correct veneer if a call to this wrapper
function is too far from the wrapper itself. The .size was added for
completeness, though we don't necessarily have a use case for it.
libgcc/ChangeLog:
PR target/115360
* config/arm/cmse_nonsecure_call.S: Add .type and .size directives.
(cherry picked from commit c559353af49fe5743d226ac3112a285b27a50f6a)
|
|
|
|
Much like AT_HWCAP is already provided in case the platform headers
don't have the value (yet).
libgcc/
* config/aarch64/cpuinfo.c: Provide AT_HWCAP2.
|
|
|
|
PR target/115317
libgcc/config/avr/libf7/
* libf7-asm.sx (__isinf): Map -Inf to -1.
gcc/testsuite/
* gcc.target/avr/torture/pr115317-isinf.c: New test.
(cherry picked from commit f12454278dc725fec3520a5d870e967d79292ee6)
|
|
|
|
The libgcc implementation of __clzhi2 can be sped up by
one cycle in some situations by re-arranging the instructions.
It also reduces the WCET by 1 cycle.
libgcc/
PR target/115065
* config/avr/lib1funcs.S (__clzhi2): Tweak.
(cherry picked from commit 988838da722dea09bd81ee9d49800a6f24980372)
|
|
|
|
Implement __powisf2 in assembly.
PR target/114981
libgcc/
* config/avr/t-avr (LIB2FUNCS_EXCLUDE): Add _powisf2.
(LIB1ASMFUNCS) [!avrtiny]: Add _powif.
* config/avr/lib1funcs.S (mov4): New .macro.
(L_powif, __powisf2) [!avrtiny]: New module and function.
gcc/testsuite/
* gcc.target/avr/pr114981-powif.c: New test.
(cherry picked from commit af64af69c3cc85dbe00c520651a54850bf5cadc1)
|
|
|
|
This supports __powidf2 by means of a double wrapper for already
existing f7_powi (renamed to __f7_powi by f7-renames.h).
It tweaks the implementation so that it does not perform trivial
multiplications with 1.0 any more, but instead uses a move.
It also fixes the last statement of f7_powi, which was wrong.
Notice that f7_powi was unused until now.
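For readers unfamiliar with powi, the idea of replacing the trivial multiplication by 1.0 with a move can be sketched in plain C using square-and-multiply. This is a generic illustration on double, not the libf7 code; demo_powi and demo_powi_check are hypothetical names.

```c
#include <assert.h>

/* Compute X raised to the integer power N by repeated squaring.  The
   first factor is moved into RESULT instead of being multiplied with
   1.0.  */
static double
demo_powi (double x, int n)
{
  /* Take the magnitude in unsigned arithmetic so INT_MIN is handled.  */
  unsigned un = n < 0 ? 0u - (unsigned) n : (unsigned) n;
  double result = 1.0;
  int have_result = 0;

  while (un)
    {
      if (un & 1)
        {
          if (have_result)
            result *= x;
          else
            {
              result = x;   /* a move, not 1.0 * x */
              have_result = 1;
            }
        }
      un >>= 1;
      if (un)
        x *= x;             /* skip the last, unused squaring */
    }
  return n < 0 ? 1.0 / result : result;
}

/* Small usage example.  */
static void
demo_powi_check (void)
{
  assert (demo_powi (2.0, 10) == 1024.0);
  assert (demo_powi (2.0, -2) == 0.25);
  assert (demo_powi (3.0, 0) == 1.0);
}
```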
PR target/114981
libgcc/config/avr/libf7/
* libf7-common.mk (F7_ASM_PARTS): Add D_powi
* libf7-asm.sx (F7MOD_D_powi_, __powidf2): New module and function.
* libf7.c (f7_powi): Fix last (wrong) statement.
Tweak trivial multiplications with 1.0.
gcc/testsuite/
* gcc.target/avr/pr114981-powil.c: New test.
(cherry picked from commit de4eea7d7ea86e54843507c68d6672eca9d8c7bb)
|
|
|
|
|
|
On Mon, Apr 29, 2024 at 01:44:24PM +0000, Joseph Myers wrote:
> > glibc 2.34 and later doesn't have separate libpthread (libpthread.so.0 is a
> > dummy shared library with just some symbol versions for compatibility, but
> > all the pthread_* APIs are in libc.so.6).
>
> I suspect this has caused link failures in the glibc testsuite for Hurd,
> which still has separate libpthread.
>
> https://sourceware.org/pipermail/libc-testresults/2024q2/012556.html
So like this then?
2024-04-30 Jakub Jelinek <jakub@redhat.com>
* gthr.h (GTHREAD_USE_WEAK): Don't redefine to 0 for glibc 2.34+
on GNU Hurd.
(cherry picked from commit 3146a92a77f1fccec71a880c7f890a1251aeab41)
|
|
|
|
glibc 2.34 and later doesn't have separate libpthread (libpthread.so.0 is a
dummy shared library with just some symbol versions for compatibility, but
all the pthread_* APIs are in libc.so.6).
So, we don't need to do the .weakref dances to check whether a program
has been linked with -lpthread or not, in dynamically linked apps those
will be always true anyway.
In -static linking, this fixes various issues people had when only linking
some parts of libpthread.a and getting weird crashes. A hack for that was
what e.g. some Fedora glibcs used, where libpthread.a was a library
containing just one giant *.o file which had all the normal libpthread.a
*.o files linked with -r together.
libstdc++-v3 actually does something like this already since r10-10928,
the following patch is meant to fix it even for libgfortran, libobjc and
whatever else uses gthr.h.
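The kind of check involved can be sketched as follows. DEMO_USE_WEAK is a hypothetical stand-in for GTHREAD_USE_WEAK, and the sketch only models the glibc version test; the GNU Hurd exemption made by a later change in this log is not modelled.

```c
#include <assert.h>
#include <stdio.h>   /* any libc header; on glibc it provides __GLIBC_PREREQ */

/* Default: use weak references to detect whether libpthread is linked.  */
#define DEMO_USE_WEAK 1

#ifdef __GLIBC_PREREQ
# if __GLIBC_PREREQ (2, 34)
/* All pthread_* functions live in libc itself, so the weak reference
   checks are unnecessary.  */
#  undef DEMO_USE_WEAK
#  define DEMO_USE_WEAK 0
# endif
#endif

static_assert (DEMO_USE_WEAK == 0 || DEMO_USE_WEAK == 1,
               "DEMO_USE_WEAK is a boolean");

/* Report which strategy was selected at compile time.  */
static int
demo_uses_weak_refs (void)
{
  return DEMO_USE_WEAK;
}
```

On a system with glibc 2.34 or later demo_uses_weak_refs () returns 0; on other C libraries, or older glibc, it returns 1.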
2024-04-25 Jakub Jelinek <jakub@redhat.com>
* gthr.h (GTHREAD_USE_WEAK): Redefine to 0 for GLIBC 2.34 or later.
|
|
|
|
libgcc/
PR target/114794
* config/avr/lib1funcs.S (__udivmodqi4): Tweak.
|
|
|
|
The following testcase is miscompiled because the code to decrement
vn on negative value with all ones in most significant limb (even partial)
and 0 in most significant bit of the second most significant limb doesn't
take into account the case where all bits below the most significant limb
are zero. This has been a problem both in the version before yesterday's
commit where it has been done only if un was one shorter than vn before this
decrement, and is now a problem even more often when it is done earlier.
When we decrement vn in such case and negate it, we end up with all 0s in
the v2 value, so have both the problems with UB on __builtin_clz* and the
expectations of the algorithm that the divisor has most significant bit set
after shifting, plus when the decremented vn is 1 it can SIGFPE on division
by zero even when it is not division by zero etc. Other values shouldn't
get 0 in the new most significant limb after negation, because the
bitint_reduce_prec canonicalization should reduce prec if the second most
significant limb is all ones and if that limb is all zeros, if at least
one limb below it is non-zero, carry in will make it non-zero.
The following patch fixes it by checking if at least one bit below the
most significant limb is non-zero, in that case it decrements, otherwise
it will do nothing (but e.g. for the un < vn case that also means the
divisor is large enough that the result should be q 0 r u).
2024-04-18 Jakub Jelinek <jakub@redhat.com>
PR libgcc/114762
* libgcc2.c (__divmodbitint4): Perform the decrement on negative
v with most significant limb all ones and the second least
significant limb with most significant bit clear always, regardless of
un < vn.
* gcc.dg/torture/bitint-70.c: New test.
|
|
|