path: root/libgcc
Age  Commit message  Author  Files  Lines
2024-02-16  libgcc: fix Win32 CV abnormal spurious wakeups in timed wait [PR113850]  Matteo Italia  1  -1/+1
Fix a typo in __gthr_win32_abs_to_rel_time that caused it to return a relative time in seconds instead of milliseconds. As a consequence, __gthr_win32_cond_timedwait called SleepConditionVariableCS with a 1000x shorter timeout; this caused ~1000x more spurious wakeups in CV timed waits such as std::condition_variable::wait_for or wait_until, generally resulting in much higher CPU usage.

This can be demonstrated by this sample program:

```
#include <chrono>
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>

int main() {
    std::condition_variable cv;
    std::mutex mx;
    bool pass = false;

    auto thread_fn = [&](bool timed) {
        int wakeups = 0;
        using sc = std::chrono::system_clock;
        auto before = sc::now();
        std::unique_lock<std::mutex> ml(mx);
        if (timed) {
            cv.wait_for(ml, std::chrono::seconds(2), [&]{
                ++wakeups;
                return pass;
            });
        } else {
            cv.wait(ml, [&]{
                ++wakeups;
                return pass;
            });
        }
        printf("pass: %d; wakeups: %d; elapsed: %d ms\n", pass, wakeups,
               int((sc::now() - before) / std::chrono::milliseconds(1)));
        pass = false;
    };

    {
        // timed wait, let expire
        std::thread t(thread_fn, true);
        t.join();
    }

    {
        // timed wait, wake up explicitly after 1 second
        std::thread t(thread_fn, true);
        std::this_thread::sleep_for(std::chrono::seconds(1));
        {
            std::unique_lock<std::mutex> ml(mx);
            pass = true;
        }
        cv.notify_all();
        t.join();
    }

    {
        // non-timed wait, wake up explicitly after 1 second
        std::thread t(thread_fn, false);
        std::this_thread::sleep_for(std::chrono::seconds(1));
        {
            std::unique_lock<std::mutex> ml(mx);
            pass = true;
        }
        cv.notify_all();
        t.join();
    }
    return 0;
}
```

On builds based on non-affected threading models (e.g. POSIX on Linux, or winpthreads or MCF on Win32) the output is something like

```
pass: 0; wakeups: 2; elapsed: 2000 ms
pass: 1; wakeups: 2; elapsed: 991 ms
pass: 1; wakeups: 2; elapsed: 996 ms
```

while with the Win32 threading model we get

```
pass: 0; wakeups: 1418; elapsed: 2000 ms
pass: 1; wakeups: 479; elapsed: 988 ms
pass: 1; wakeups: 2; elapsed: 992 ms
```

(notice the huge number of wakeups in the timed wait cases only).
This commit fixes the conversion, adjusting the final division by NSEC100_PER_SEC to use NSEC100_PER_MSEC instead (already defined in the file and not used in any other place, so probably just a typo). libgcc/ChangeLog: PR libgcc/113850 * config/i386/gthr-win32-cond.c (__gthr_win32_abs_to_rel_time): fix absolute timespec to relative milliseconds count conversion (it incorrectly returned seconds instead of milliseconds); this avoids spurious wakeups in __gthr_win32_cond_timedwait
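The unit mix-up described above can be sketched as follows. This is a minimal illustration and not the code from gthr-win32-cond.c: the helper names, signatures and the 100 ns tick inputs are assumptions; only the two constant names appear in the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Constant names are taken from the commit message; values follow from 100 ns ticks.
constexpr std::int64_t NSEC100_PER_SEC = 10000000;  // 100 ns ticks per second
constexpr std::int64_t NSEC100_PER_MSEC = 10000;    // 100 ns ticks per millisecond

// Hypothetical helper: relative timeout in milliseconds (the intended behaviour).
std::uint32_t rel_timeout_ms(std::int64_t now_100ns, std::int64_t deadline_100ns)
{
  assert(now_100ns >= 0);  // assumption: tick counts since the epoch are non-negative
  const std::int64_t remaining = deadline_100ns - now_100ns;
  if (remaining <= 0)
    return 0;  // deadline already passed
  return static_cast<std::uint32_t>(remaining / NSEC100_PER_MSEC);
}

// Hypothetical helper reproducing the typo: dividing by the per-second constant
// yields seconds, which the caller then treats as milliseconds.
std::uint32_t rel_timeout_buggy(std::int64_t now_100ns, std::int64_t deadline_100ns)
{
  const std::int64_t remaining = deadline_100ns - now_100ns;
  if (remaining <= 0)
    return 0;
  return static_cast<std::uint32_t>(remaining / NSEC100_PER_SEC);
}
```

For a deadline two seconds away, the first helper yields 2000 and the second yields 2, which is the 1000x shorter timeout the message describes.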
2024-02-15  Daily bump.  GCC Administrator  1  -0/+8
2024-02-14  x86: Support x32 and IBT in heap trampoline  H.J. Lu  1  -3/+39
Add x32 and IBT support to x86 heap trampoline implementation with a testcase. 2024-02-13 Jakub Jelinek <jakub@redhat.com> H.J. Lu <hjl.tools@gmail.com> libgcc/ PR target/113855 * config/i386/heap-trampoline.c (trampoline_insns): Add IBT support and pad to the multiple of 4 bytes. Use movabsq instead of movabs in comments. Add -mx32 variant. gcc/testsuite/ PR target/113855 * gcc.dg/heap-trampoline-1.c: New test. * lib/target-supports.exp (check_effective_target_heap_trampoline): New.
2024-02-14  Daily bump.  GCC Administrator  1  -0/+5
2024-02-13  libgcc: Fix UB in FP_FROM_BITINT  Jakub Jelinek  1  -1/+5
As I wrote earlier, I was seeing
FAIL: gcc.dg/torture/bitint-24.c -O0 execution test
FAIL: gcc.dg/torture/bitint-24.c -O2 execution test
with the ia32 _BitInt enablement patch on i686-linux. I thought floatbitintxf.c was miscompiled with -O2 -march=i686 -mtune=generic, but it turned out to be UB in it.

If a signed _BitInt to be converted to binary floating point has (after sign extension from possible partial limb to full limb) one or more most significant limbs equal to all ones and then the limb below (the most significant non-~(UBILtype)0 limb) has the most significant bit cleared, like for 32-bit limbs 0x81582c05U, 0x0a8b01e4U, 0xc1b8b18fU, 0x2aac2a08U, -1U, -1U then bitint_reduce_prec can't reduce it to that 0x2aac2a08U limb, so msb is all ones and precision is negative (so it reduced precision from 161 to 192 bits down to 160 bits, in theory could go as low as 129 bits but that wouldn't change anything on the following behavior). But still iprec is negative, -160 here.

For that case (i.e. where we are dealing with a negative input), the code was using 65 - __builtin_clzll (~msb) to compute how many relevant bits we have from the msb. Unfortunately that invokes UB for msb all ones. The right number of relevant bits in that case is 1 though (like for -2 it is 2 and -4 or -3 3 as already computed) - all we care about from that is that the most significant bit is set (i.e. the number is negative) and the bits below that should be supplied from the limbs below. So, the following patch fixes it by special casing it not to invoke UB.

For msb 0 we already have a special case from before (but that is also different because msb 0 implies the whole number is 0 given the way bitint_reduce_prec works - even if we have limbs like ..., 0x80000000U, 0U the reduction can skip the most significant limb and msb then would be the one below it), so if iprec > 0, we already don't call __builtin_clzll on 0.
2024-02-13 Jakub Jelinek <jakub@redhat.com> * soft-fp/bitint.h (FP_FROM_BITINT): If iprec < 0 and msb is all ones, just set n to 1 instead of using __builtin_clzll (~msb).
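The special case can be sketched in isolation. This is an illustration under assumptions, not the FP_FROM_BITINT macro itself: the function name is hypothetical and a 64-bit limb is assumed.

```cpp
#include <cassert>

// Hypothetical helper: number of relevant bits in the most significant limb
// of a negative value, assuming a 64-bit limb.
int relevant_msb_bits_negative(unsigned long long msb)
{
  static_assert(sizeof(unsigned long long) == 8, "sketch assumes a 64-bit limb");
  assert(msb >> 63);  // precondition: the limb is negative (top bit set)
  if (msb == ~0ULL)
    return 1;  // ~msb would be 0 and __builtin_clzll (0) is undefined
  return 65 - __builtin_clzll(~msb);
}
```

With this guard, an all-ones limb yields 1, while -2 yields 2 and both -3 and -4 yield 3, matching the values quoted in the message.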
2024-02-13  Daily bump.  GCC Administrator  1  -0/+10
2024-02-12  x86, libgcc: Implement ia32 basic heap trampoline [PR113855].  Iain Sandoe  2  -3/+38
The initial heap trampoline implementation was targeting 64b platforms. As the PR demonstrates this creates an issue where it is expected that the same symbols are exported for 32 and 64b. Rather than conditionalize the exports and code-gen on x86_64, this patch provides a basic implementation of the IA32 trampoline. This also avoids potential user confusion, when a 32b target has 64b multilibs, and vice versa; which is the case for Darwin. PR target/113855 gcc/ChangeLog: * config/i386/darwin.h (DARWIN_HEAP_T_LIB): Moved to be available to all sub-targets. * config/i386/darwin32-biarch.h (DARWIN_HEAP_T_LIB): Delete. * config/i386/darwin64-biarch.h (DARWIN_HEAP_T_LIB): Delete. libgcc/ChangeLog: * config.host: Add trampoline support to x?86-linux. * config/i386/heap-trampoline.c (trampoline_insns): Provide a variant for IA32. (union ix86_trampoline): Likewise. (__gcc_nested_func_ptr_created): Implement a basic trampoline for IA32.
2024-02-11  Daily bump.  GCC Administrator  1  -0/+16
2024-02-10  libgcc: Fix a bug in _BitInt -> dfp conversions  Jakub Jelinek  3  -3/+3
The ia32 _BitInt support revealed a bug in floatbitint?d.c. As can be even guessed from how the code is written in the loop, the intention was to set inexact to non-zero whenever the remainder after division wasn't zero, but I've ended up just checking whether the 2 least significant limbs of the remainder were non-zero. Now, in the dfp/bitint-4.c test in one case the remainder happens to have least significant 64 bits zero and then the higher limbs are non-zero; with 32-bit limbs that means 2 least significant limbs are zero and so the code acted as if it was exactly divisible. Fixed thusly. 2024-02-10 Jakub Jelinek <jakub@redhat.com> * soft-fp/floatbitintdd.c (__bid_floatbitintdd): Or in all remainder limbs into inexact rather than just first two. * soft-fp/floatbitintsd.c (__bid_floatbitintsd): Likewise. * soft-fp/floatbitinttd.c (__bid_floatbitinttd): Likewise.
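The difference between the two checks can be sketched like this. Function names are hypothetical, and the limb array is assumed to be stored least significant limb first with 32-bit limbs; the real code lives in the floatbitint?d.c loops.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: the remainder is non-zero if any limb is non-zero.
bool inexact_all_limbs(const std::uint32_t *rem, std::size_t n)
{
  std::uint32_t acc = 0;
  for (std::size_t i = 0; i < n; ++i)
    acc |= rem[i];  // OR every limb into the accumulator
  return acc != 0;
}

// Hypothetical helper mirroring the old behaviour: only the two least
// significant limbs are inspected.
bool inexact_first_two(const std::uint32_t *rem, std::size_t n)
{
  assert(n >= 2);
  return (rem[0] | rem[1]) != 0;
}
```

For a three-limb remainder {0, 0, 5} the first helper reports a non-zero remainder while the second reports zero, which is the case the dfp/bitint-4.c test ran into.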
2024-02-10  libgcc: Fix BIL_TYPE_SIZE == 32 support in _BitInt <-> dfp support  Jakub Jelinek  5  -16/+20
I've tried last night to enable _BitInt support for i?86-linux, and a few spots in libgcc emitted -Wshift-count-overflow warnings and clearly didn't do what it was supposed to do. Fixed thusly. 2024-02-10 Jakub Jelinek <jakub@redhat.com> * soft-fp/fixddbitint.c (__bid_fixddbitint): Fix up BIL_TYPE_SIZE == 32 shifts. * soft-fp/fixsdbitint.c (__bid_fixsdbitint): Likewise. * soft-fp/fixtdbitint.c (__bid_fixtdbitint): Likewise. * soft-fp/floatbitintdd.c (__bid_floatbitintdd): Likewise. * soft-fp/floatbitinttd.c (__bid_floatbitinttd): Likewise.
2024-02-10  Daily bump.  GCC Administrator  1  -0/+5
2024-02-09  libgcc, Darwin: Update symbol exports to include bitint and bf.  Iain Sandoe  1  -1/+23
Some exports were missed from the GCC-13 cycle; these are added here along with the bitint-related ones added in GCC-14. libgcc/ChangeLog: * config/i386/libgcc-darwin.ver: Export bf and bitint-related symbols.
2024-02-07  Daily bump.  GCC Administrator  1  -0/+15
2024-02-06  libgcc: Export i386 symbols added after GCC_7.0.0 on Solaris [PR113700]  Rainer Orth  2  -0/+40
As reported in the PR, all libgcc x86 symbol versions added after GCC_7.0.0 were only added to i386/libgcc-glibc.ver, missing all of libgcc-sol2.ver, libgcc-bsd.ver, and libgcc-darwin.ver. This patch fixes this for Solaris/x86, adding all of them (GCC_1[234].0.0) as GCC_14.0.0 to not retroactively change history. Since this isn't the first time this happens, I've added a note to the end of libgcc-glibc.ver to request notifying other maintainers in case of additions. Tested on i386-pc-solaris2.11. 2024-02-01 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> libgcc: PR target/113700 * config/i386/libgcc-sol2.ver (GCC_14.0.0): Added all symbols from i386/libgcc-glibc.ver (GCC_12.0.0, GCC_13.0.0, GCC_14.0.0). * config/i386/libgcc-glibc.ver: Request notifications on updates.
2024-02-06  libgcc: fix SEH C++ rethrow semantics [PR113337]  Matteo Italia  1  -3/+3
SEH _Unwind_Resume_or_Rethrow invokes abort directly if _Unwind_RaiseException doesn't manage to find a handler for the rethrown exception; this is incorrect, as in this case std::terminate should be invoked, allowing an application-provided terminate handler to handle the situation instead of straight crashing the application through abort.

The bug can be demonstrated with this simple test case:
===
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <exception>

static void custom_terminate_handler() {
    fprintf(stderr, "custom_terminate_handler invoked\n");
    std::exit(1);
}

int main(int argc, char *argv[]) {
    std::set_terminate(&custom_terminate_handler);
    if (argc < 2) return 1;
    const char *mode = argv[1];
    fprintf(stderr, "%s\n", mode);
    if (strcmp(mode, "throw") == 0) {
        throw std::exception();
    } else if (strcmp(mode, "rethrow") == 0) {
        try {
            throw std::exception();
        } catch (...) {
            throw;
        }
    } else {
        return 1;
    }
    return 0;
}
===

On all gcc builds with non-SEH exceptions, this will print "custom_terminate_handler invoked" both if launched as ./a.out throw or as ./a.out rethrow; on SEH builds it will instead work as expected only with ./a.exe throw, and will crash with the "built-in" abort message with ./a.exe rethrow.

This patch fixes the problem, forwarding back the error code to the caller (__cxa_rethrow), that calls std::terminate if _Unwind_Resume_or_Rethrow returns. The change makes the code path coherent with SEH _Unwind_RaiseException, and with the generic _Unwind_Resume_or_Rethrow from libgcc/unwind.inc (used for SjLj and Dw2 exception backend).

libgcc/ChangeLog: PR libgcc/113337 * unwind-seh.c (_Unwind_Resume_or_Rethrow): forward _Unwind_RaiseException return code back to caller instead of calling abort, allowing __cxa_rethrow to invoke std::terminate in case of uncaught rethrown exception
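The control-flow change can be modelled with a small stand-alone sketch. Every name here is hypothetical and the model is simplified: in the real unwinder a successful rethrow transfers control to the handler and never returns, and the terminate call is made by __cxa_rethrow.

```cpp
#include <cassert>
#include <functional>

// Hypothetical result codes; the real unwinder uses _Unwind_Reason_Code.
enum class UnwindResult { HandlerFound, EndOfStack };

// Stand-in for the unwinder entry point: it reports failure to its caller
// instead of calling abort() itself.
UnwindResult resume_or_rethrow_sketch(bool handler_exists)
{
  return handler_exists ? UnwindResult::HandlerFound : UnwindResult::EndOfStack;
}

// Stand-in for the C++ runtime side: if the unwinder returns without finding
// a handler, the terminate handler gets a chance to run.
bool rethrow_sketch(bool handler_exists,
                    const std::function<void()> &terminate_handler)
{
  assert(terminate_handler);  // a callable must be supplied
  if (resume_or_rethrow_sketch(handler_exists) == UnwindResult::HandlerFound)
    return true;  // simplification: really, control would not come back here
  terminate_handler();
  return false;
}
```

Passing the handler as a callback keeps the sketch testable; it is only meant to show why returning the error code lets an application-provided terminate handler run.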
2024-02-03  Daily bump.  GCC Administrator  1  -0/+18
2024-02-02  libgcc: Fix up _BitInt division [PR113604]  Jakub Jelinek  1  -6/+40
The following testcase ends up with SIGFPE in __divmodbitint4. The problem is a thinko in my attempt to implement Knuth's algorithm. The algorithm does (where b is 65536, i.e. one larger than what fits in their unsigned short word): // Compute estimate qhat of q[j]. qhat = (un[j+n]*b + un[j+n-1])/vn[n-1]; rhat = (un[j+n]*b + un[j+n-1]) - qhat*vn[n-1]; again: if (qhat >= b || qhat*vn[n-2] > b*rhat + un[j+n-2]) { qhat = qhat - 1; rhat = rhat + vn[n-1]; if (rhat < b) goto again; } The problem is that it uses a double-word / word -> double-word division (and modulo), while all we have is udiv_qrnnd unless we'd want to do further library calls, and udiv_qrnnd is a double-word / word -> word division and modulo. Now, as the algorithm description says, it can produce at most word bits + 1 bit quotient. And I believe that actually the highest qhat the original algorithm can produce is (1 << word_bits) + 1. The algorithm performs earlier canonicalization where both the divisor and dividend are shifted left such that divisor has msb set. If it has msb set already before, no shifting occurs but we start with added 0 limb, so in the first uv1:uv0 double-word uv1 is 0 and so we can't get too high qhat, if shifting occurs, the first limb of dividend is shifted right by UWtype bits - shift count into a new limb, so again in the first iteration in the uv1:uv0 double-word uv1 doesn't have msb set while vv1 does and qhat has to fit into word. In the following iterations, previous iteration should guarantee that the previous quotient digit is correct. Even if the divisor was the maximal possible vv1:all_ones_in_all_lower_limbs, if the old uv0:lower_limbs would be larger or equal to the divisor, the previous quotient digit would increase and another divisor would be subtracted, which I think implies that in the next iteration in uv1:uv0 double-word uv1 <= vv1, but uv0 could be up to all ones, e.g. 
in case of all lower limbs of divisor being all ones and at least one dividend limb below uv0 being not all ones. So, we can e.g. for 64-bit UWtype see uv1:uv0 / vv1 0x8000000000000000UL:0xffffffffffffffffUL / 0x8000000000000000UL or 0xffffffffffffffffUL:0xffffffffffffffffUL / 0xffffffffffffffffUL In all these cases (when uv1 == vv1 && uv0 >= uv1), qhat is 0x10000000000000001UL, i.e. 2 more than fits into UWtype result, if uv1 == vv1 && uv0 < uv1 it would be 0x10000000000000000UL, i.e. 1 more than fits into UWtype result. Because we only have udiv_qrnnd which can't deal with those too large cases (SIGFPEs or otherwise invokes undefined behavior on those), I've tried to handle the uv1 >= vv1 case separately, but for one thing I thought it would be at most 1 larger than what fits, and for two have actually subtracted vv1:vv1 from uv1:uv0 instead of subtracting 0:vv1 from uv1:uv0. For the uv1 < vv1 case, the implementation already performs roughly what the algorithm does. Now, let's see what happens with the two possible extra cases in the original algorithm. If uv1 == vv1 && uv0 < uv1, qhat above would be b, so we take if (qhat >= b, decrement qhat by 1 (it becomes b - 1), add vn[n-1] aka vv1 to rhat and goto again if rhat < b (but because qhat already fits we can goto to the again label in the uv1 < vv1 code). rhat in this case is uv0 and rhat + vv1 can but doesn't have to overflow, say for uv0 42UL and vv1 0x8000000000000000UL it will not (and so we should goto again), while for uv0 0x8000000000000000UL and vv1 0x8000000000000001UL it will (and we shouldn't goto again). If uv1 == vv1 && uv0 >= uv1, qhat above would be b + 1, so we take if (qhat >= b, decrement qhat by 1 (it becomes b), add vn[n-1] aka vv1 to rhat. 
But because vv1 has msb set and rhat in this case is uv0 - vv1, the rhat + vv1 addition certainly doesn't overflow, because (uv0 - vv1) + vv1 is uv0, so in the algorithm we goto again, again take if (qhat >= b and decrement qhat so it finally becomes b - 1, and add vn[n-1] aka vv1 to rhat again. But this time I believe it must always overflow, simply because we added (uv0 - vv1) + vv1 + vv1 and vv1 has msb set, so already vv1 + vv1 must overflow. And because it overflowed, it will not goto again. So, I believe the following patch implements this correctly, by subtracting vv1 from uv1:uv0 double-word once, then comparing again if uv1 >= vv1. If that is true, subtract vv1 from uv1:uv0 again and add 2 * vv1 to rhat, no __builtin_add_overflow is needed as we know it always overflowed and so won't goto again. If after the first subtraction uv1 < vv1, use __builtin_add_overflow when adding vv1 to rhat, because it can but doesn't have to overflow. I've added an extra testcase which tests the behavior of all the changed cases, so it has a case where uv1:uv0 / vv1 is 1:1, where it is 1:0 and rhat + vv1 overflows and where it is 1:0 and rhat + vv1 does not overflow, and includes tests also from Zdenek's other failing tests. 2024-02-02 Jakub Jelinek <jakub@redhat.com> PR libgcc/113604 * libgcc2.c (__divmodbitint4): If uv1 >= vv1, subtract vv1 from uv1:uv0 once or twice as needed, rather than subtracting vv1:vv1. * gcc.dg/torture/bitint-53.c: New test. * gcc.dg/torture/bitint-55.c: New test.
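The qhat >= b corner cases discussed above can be reproduced with a scaled-down sketch. It assumes 32-bit words (so b is 2^32) and uses 64-bit arithmetic as the double-word division that udiv_qrnnd cannot provide; the struct and function names are hypothetical, and the second half of Knuth's test (the qhat*vn[n-2] comparison) is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>

struct QhatEstimate {
  std::uint32_t qhat;    // quotient digit after the qhat >= b corrections
  std::uint64_t rhat;    // remainder estimate after the same corrections
  bool rhat_overflowed;  // true when rhat >= b, i.e. no further "goto again"
};

// Hypothetical helper: estimate one quotient digit from uv1:uv0 / vv1.
QhatEstimate estimate_qhat(std::uint32_t uv1, std::uint32_t uv0, std::uint32_t vv1)
{
  const std::uint64_t b = 1ULL << 32;
  assert(vv1 >> 31);   // divisor is normalized: msb set
  assert(uv1 <= vv1);  // guaranteed by the previous iteration
  const std::uint64_t num = (static_cast<std::uint64_t>(uv1) << 32) | uv0;
  std::uint64_t qhat = num / vv1;  // can be b or b + 1 when uv1 == vv1
  std::uint64_t rhat = num % vv1;
  while (qhat >= b) {  // at most two iterations under the preconditions
    --qhat;
    rhat += vv1;
  }
  return {static_cast<std::uint32_t>(qhat), rhat, rhat >= b};
}
```

With uv1 == vv1 the digit always ends up as b - 1; what differs is whether rhat overflowed, which decides whether the second part of the correction test still has to run.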
2024-02-02  [PATCH] libgcc: Include stdlib.h for abort() on mingw32  Khem Raj  1  -0/+1
libgcc/ * config/i386/enable-execute-stack-mingw32.c: Include stdlib.h for abort() definition.
2024-02-02  libgcc: Export XF, TF, HF and BFmode specific _BitInt symbols from libgcc_s.so.1 [PR113700]  Jakub Jelinek  1  -6/+6
Rainer pointed out that __PFX__ and __FIXPTPFX__ prefix replacement is done solely for libgcc-std.ver.in and not for the *.ver files in config. I've used the __PFX__ prefix even in config/i386/libgcc-glibc.ver because it was used for similar symbols in libgcc-std.ver.in, and that results in those symbols being STB_LOCAL in libgcc_s.so.1. Tests still work because gcc by default uses -static-libgcc when linking (unlike g++ etc.), but would have failed when using -shared-libgcc (but I see nothing in the testsuite actually testing with -shared-libgcc, so am not adding tests).

With the patch, libgcc_s.so.1 now exports
__fixtfbitint@@GCC_14.0.0 FUNC GLOBAL DEFAULT
__fixxfbitint@@GCC_14.0.0 FUNC GLOBAL DEFAULT
__floatbitintbf@@GCC_14.0.0 FUNC GLOBAL DEFAULT
__floatbitinthf@@GCC_14.0.0 FUNC GLOBAL DEFAULT
__floatbitinttf@@GCC_14.0.0 FUNC GLOBAL DEFAULT
__floatbitintxf@@GCC_14.0.0 FUNC GLOBAL DEFAULT
on x86_64-linux which it wasn't before.

2024-02-02 Jakub Jelinek <jakub@redhat.com> PR target/113700 * config/i386/libgcc-glibc.ver (GCC_14.0.0): Remove __PFX prefixes from symbol names.
2024-02-02  Daily bump.  GCC Administrator  1  -0/+17
2024-02-01  libgcc: Avoid warnings on __gcc_nested_func_ptr_created [PR113402]  Jakub Jelinek  3  -7/+7
I'm seeing hundreds of
In file included from ../../../libgcc/libgcc2.c:56:
../../../libgcc/libgcc2.h:32:13: warning: conflicting types for built-in function ‘__gcc_nested_func_ptr_created’; expected ‘void(void *, void *, void *)’ [-Wbuiltin-declaration-mismatch]
32 | extern void __gcc_nested_func_ptr_created (void *, void *, void **);
   | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
warnings. Either we need to add like in r14-6218 #pragma GCC diagnostic ignored "-Wbuiltin-declaration-mismatch" (but in that case because of the libgcc2.h prototype (why is it there?) it would need to be also with #pragma GCC diagnostic push/pop around), or we could go with just following how the builtins are prototyped on the compiler side and only cast to void ** when dereferencing (which is in a single spot in each TU).

2024-02-01 Jakub Jelinek <jakub@redhat.com> PR libgcc/113402 * libgcc2.h (__gcc_nested_func_ptr_created): Change type of last argument from void ** to void *. * config/i386/heap-trampoline.c (__gcc_nested_func_ptr_created): Change type of dst from void ** to void * and cast dst to void ** before dereferencing it. * config/aarch64/heap-trampoline.c (__gcc_nested_func_ptr_created): Likewise.
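The second option (keep the prototype at three void * arguments and cast only at the dereference) can be sketched as follows. The function is a hypothetical stand-in: the real __gcc_nested_func_ptr_created allocates a heap trampoline and stores its address, whereas this sketch simply stores func as a placeholder.

```cpp
#include <cassert>

// Hypothetical stand-in with the same parameter types as the builtin's
// prototype on the compiler side: void (void *, void *, void *).
void nested_func_ptr_created_sketch(void *chain, void *func, void *dst)
{
  (void) chain;  // unused in this sketch
  assert(dst != nullptr);
  // The single spot where dst is treated as a pointer to a pointer.
  *(void **) dst = func;  // placeholder for the trampoline address
}
```

A caller still passes the address of a void * variable; only the declared parameter type changes, so the declaration no longer conflicts with the builtin.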
2024-02-01  libgcc: Fix up i386/t-heap-trampoline [PR113403]  Jakub Jelinek  1  -1/+1
I'm seeing ../../../libgcc/shared-object.mk:14: warning: overriding recipe for target 'heap-trampoline.o' ../../../libgcc/shared-object.mk:14: warning: ignoring old recipe for target 'heap-trampoline.o' ../../../libgcc/shared-object.mk:17: warning: overriding recipe for target 'heap-trampoline_s.o' ../../../libgcc/shared-object.mk:17: warning: ignoring old recipe for target 'heap-trampoline_s.o' This patch fixes that. 2024-02-01 Jakub Jelinek <jakub@redhat.com> PR libgcc/113403 * config/i386/t-heap-trampoline: Add to LIB2ADDEHSHARED i386/heap-trampoline.c rather than aarch64/heap-trampoline.c.
2024-02-01  Daily bump.  GCC Administrator  1  -0/+6
2024-02-01  aarch64: libgcc: Cleanup ELF marking in asm  Szabolcs Nagy  3  -51/+1
Use aarch64-asm.h in asm code consistently, this was started in commit c608ada288ced0268bbbbc1fd4136f56c34b24d4 Author: Zac Walker <zacwalker@microsoft.com> CommitDate: 2024-01-23 15:32:30 +0000 Ifdef `.hidden`, `.type`, and `.size` pseudo-ops for `aarch64-w64-mingw32` target But that commit failed to remove some existing markings from asm files, which means some objects got double marked with gnu property notes. libgcc/ChangeLog: * config/aarch64/crti.S: Remove stack marking. * config/aarch64/crtn.S: Remove stack marking, include aarch64-asm.h * config/aarch64/lse.S: Remove stack and GNU property markings.
2024-01-31  Daily bump.  GCC Administrator  1  -0/+20
2024-01-30  libgcc: Make heap trampoline support dynamic [PR113403].  Iain Sandoe  4  -2/+34
In order to handle system security constraints during GCC build and test, and the fact that most platform versions cannot link to libgcc_eh (since the unwinder there is incompatible with the system one):
1. We make the support functions weak definitions.
2. We include them as a CRT for platform conditions that do not allow libgcc_eh.
3. We ensure that the weak symbols are exported from DSOs (which includes exes on Darwin) so that the dynamic linker will pick one instance (which avoids duplication of trampoline caches).

PR libgcc/113403 gcc/ChangeLog: * config/darwin.h (DARWIN_SHARED_WEAK_ADDS, DARWIN_WEAK_CRTS): New. (REAL_LIBGCC_SPEC): Move weak CRT handling to separate spec. * config/i386/darwin.h (DARWIN_HEAP_T_LIB): New. * config/i386/darwin32-biarch.h (DARWIN_HEAP_T_LIB): New. * config/i386/darwin64-biarch.h (DARWIN_HEAP_T_LIB): New. * config/rs6000/darwin.h (DARWIN_HEAP_T_LIB): New. libgcc/ChangeLog: * config.host: Build libheap_t.a for i686/x86_64 Darwin. * config/aarch64/heap-trampoline.c (HEAP_T_ATTR): New. (allocate_tramp_ctrl): Allow a target to build this as a weak def. (__gcc_nested_func_ptr_created): Likewise. * config/i386/heap-trampoline.c (HEAP_T_ATTR): New. (allocate_tramp_ctrl): Allow a target to build this as a weak def. (__gcc_nested_func_ptr_created): Likewise. * config/t-darwin: Build libheap_t.a (a CRT with heap trampoline support).
2024-01-30  libgcc: Make heap trampoline support dynamic [PR113403].  Iain Sandoe  2  -2/+4
This removes the heap trampoline support functions from libgcc.a and adds them to libgcc_eh.a. They are also present in libgcc_s. PR libgcc/113403 libgcc/ChangeLog: * config/aarch64/t-heap-trampoline: Move the heap trampoline support functions from libgcc.a to libgcc_eh.a. * config/i386/t-heap-trampoline: Likewise.
2024-01-29  Daily bump.  GCC Administrator  1  -0/+14
2024-01-28  Fix __builtin_nested_func_ptr_{created,deleted} symbol versions [PR113402]  Iain Sandoe  4  -13/+12
The symbols for the functions supporting heap-based trampolines were exported at an incorrect symbol version; the following patch fixes that. As requested in the PR, this also renames __builtin_nested_func_ptr* to __gcc_nested_func_ptr*. In carrying out the rename, we move the builtins to use DEF_EXT_LIB_BUILTIN. PR libgcc/113402 gcc/ChangeLog: * builtins.cc (expand_builtin): Handle BUILT_IN_GCC_NESTED_PTR_CREATED and BUILT_IN_GCC_NESTED_PTR_DELETED. * builtins.def (BUILT_IN_GCC_NESTED_PTR_CREATED, BUILT_IN_GCC_NESTED_PTR_DELETED): Make these builtins LIB-EXT and rename the library fallbacks to __gcc_nested_func_ptr_created and __gcc_nested_func_ptr_deleted. * doc/invoke.texi: Rename these to __gcc_nested_func_ptr_created and __gcc_nested_func_ptr_deleted. * tree-nested.cc (finalize_nesting_tree_1): Use builtin_explicit for BUILT_IN_GCC_NESTED_PTR_CREATED and BUILT_IN_GCC_NESTED_PTR_DELETED. * tree.cc (build_common_builtin_nodes): Build the BUILT_IN_GCC_NESTED_PTR_CREATED and BUILT_IN_GCC_NESTED_PTR_DELETED local builtins only for non-explicit. libgcc/ChangeLog: * config/aarch64/heap-trampoline.c: Rename __builtin_nested_func_ptr_created to __gcc_nested_func_ptr_created and __builtin_nested_func_ptr_deleted to __gcc_nested_func_ptr_deleted. * config/i386/heap-trampoline.c: Likewise. * libgcc2.h: Likewise. * libgcc-std.ver.in (GCC_7.0.0): Likewise and then move __gcc_nested_func_ptr_created and __gcc_nested_func_ptr_deleted from this symbol version to ... (GCC_14.0.0): ... this one. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk> Co-authored-by: Jakub Jelinek <jakub@redhat.com>
2024-01-27  Daily bump.  GCC Administrator  1  -0/+4
2024-01-26  amdgcn: additional gfx1030/gfx1100 support  Andrew Stubbs  1  -1/+1
This is enough to get gfx1030 and gfx1100 working; there are still some test failures to investigate, and probably some tuning to do. gcc/ChangeLog: * config/gcn/gcn-opts.h (TARGET_PACKED_WORK_ITEMS): Add TARGET_RDNA3. * config/gcn/gcn-valu.md (all_convert): New iterator. (<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>): New define_expand, and rename the old one to ... (*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>): ... this. (extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>): Likewise, to ... (extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>): .. this. (*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_shift<exec>): New. * config/gcn/gcn.cc (gcn_global_address_p): Use "offsetbits" correctly. (gcn_hsa_declare_function_name): Update the vgpr counting for gfx1100. * config/gcn/gcn.md (<u>mulhisi3): Disable on RDNA3. (<u>mulqihi3_scalar): Likewise. libgcc/ChangeLog: * config/gcn/amdgcn_veclib.h (CDNA3_PLUS): Handle RDNA3. libgomp/ChangeLog: * config/gcn/time.c (RTC_TICKS): Configure RDNA3. (omp_get_wtime): Add RDNA3-compatible variant. * plugin/plugin-gcn.c (max_isa_vgprs): Tune for gfx1030 and gfx1100. Signed-off-by: Andrew Stubbs <ams@baylibre.com>
2024-01-24  Daily bump.  GCC Administrator  1  -0/+10
2024-01-23  Ifdef `.hidden`, `.type`, and `.size` pseudo-ops for `aarch64-w64-mingw32` target  Zac Walker  6  -20/+27
A recent change (https://gcc.gnu.org/pipermail/gcc-cvs/2023-December/394915.html) added generic SME support using `.hidden`, `.type`, and `.size` pseudo-ops in the assembly sources; however, `aarch64-w64-mingw32` does not support these pseudo-ops. This patch wraps usage of those pseudo-ops in macros and ifdefs them on the `__ELF__` define. libgcc/ * config/aarch64/aarch64-asm.h (HIDDEN, SYMBOL_SIZE, SYMBOL_TYPE) (ENTRY_ALIGN, GNU_PROPERTY): New macros. * config/aarch64/__arm_sme_state.S: Use them. * config/aarch64/__arm_tpidr2_save.S: Likewise. * config/aarch64/__arm_za_disable.S: Likewise. * config/aarch64/crti.S: Likewise. * config/aarch64/lse.S: Likewise.
2024-01-13  Daily bump.  GCC Administrator  1  -0/+16
2024-01-12  libgcc: Use may_alias attribute in bitint handlers  Jakub Jelinek  3  -25/+27
As discussed on IRC, the following patch uses may_alias attribute, so that on targets like aarch64 where abi_limb_mode != limb_mode the library accesses the limbs (half limbs of the ABI) in the arrays with conservative alias set. 2024-01-12 Jakub Jelinek <jakub@redhat.com> * libgcc2.h (UBILtype): New typedef with may_alias attribute. (__mulbitint3, __divmodbitint4): Use UBILtype * instead of UWtype * and const UBILtype * instead of const UWtype *. * libgcc2.c (bitint_reduce_prec, bitint_mul_1, bitint_addmul_1, __mulbitint3, bitint_negate, bitint_submul_1, __divmodbitint4): Likewise. * soft-fp/bitint.h (UBILtype): Change define into a typedef with may_alias attribute.
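A typedef carrying the may_alias attribute can be sketched as below. The typedef and function names are hypothetical (the commit's actual typedef is UBILtype), and the example reads a 64-bit ABI limb as two 32-bit halves, combining them with XOR so the result does not depend on endianness.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical half-limb type; may_alias tells GCC that accesses through it
// may alias objects of any other type.
typedef std::uint32_t __attribute__((__may_alias__)) half_limb_alias;

// Hypothetical helper: read both 32-bit halves of a 64-bit limb.
std::uint32_t xor_half_limbs(const std::uint64_t *limb)
{
  assert(limb != nullptr);
  const half_limb_alias *halves =
      reinterpret_cast<const half_limb_alias *>(limb);
  return halves[0] ^ halves[1];
}
```

Without the attribute, reading a uint64_t object through a plain uint32_t pointer would violate the strict aliasing rules that the optimizers rely on.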
2024-01-12  libgcc, nios2: Fix exception handling on nios2 with -fpic  Sandra Loosemore  1  -2/+3
Exception handling on nios2-linux-gnu with -fpic has been broken since revision 790854ea7670f11c14d431c102a49181d2915965, "Use _dl_find_object in _Unwind_Find_FDE". For whatever reason, this doesn't work on nios2. Nios2 uses the GOT address as the base for DW_EH_PE_datarel relocations in PIC; see my previous fix to make this work, revision 2d33dcfe9f0494c9b56a8d704c3d27c5a4329ebc, "Support for GOT-relative DW_EH_PE_datarel encoding". So this may be a horrible bug in the ABI or in my interpretation of it or just glibc's implementation of _dl_find_object for this target, but there's existing code out there that does things this way; and realistically, nobody is going to re-engineer this now that the vendor has EOL'ed the nios2 architecture. So, just skip over the code trying to use _dl_find_object on this target and fall back to the way that works. I plan to backport this patch to the GCC 12 and GCC 13 branches as well. libgcc/ChangeLog * unwind-dw2-fde-dip.c (_Unwind_Find_FDE): Do not try to use _dl_find_object on nios2; it doesn't work.
2024-01-03  Update copyright years.  Jakub Jelinek  1085  -1085/+1085
2024-01-03  Update Copyright year in ChangeLog files  Jakub Jelinek  2  -2/+2
2023 -> 2024
2023-12-24  Daily bump.  GCC Administrator  1  -0/+7
2023-12-23  GCN, nvptx: Basic '__cxa_guard_{acquire,abort,release}' for C++ static local variables support  Thomas Schwinge  4  -0/+105
For now, for single-threaded GCN, nvptx target use only; extension for multi-threaded offloading use is to follow later. Eventually switch to libstdc++-v3/libsupc++ proper. libgcc/ * c++-minimal/README: New. * c++-minimal/guard.c: New. * config/gcn/t-amdgcn (LIB2ADD): Add it. * config/nvptx/t-nvptx (LIB2ADD): Likewise.
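What a single-threaded guard has to do can be sketched as follows. This is not the content of c++-minimal/guard.c, which is not shown here; the function names are hypothetical, and the sketch assumes the Itanium C++ ABI convention of a 64-bit guard object whose first byte becomes non-zero once initialization has completed.

```cpp
#include <cassert>
#include <cstdint>

// Returns 1 if the caller must run the initializer, 0 if it already ran.
int guard_acquire_sketch(std::uint64_t *guard)
{
  assert(guard != nullptr);
  return *reinterpret_cast<unsigned char *>(guard) == 0 ? 1 : 0;
}

// Marks initialization as complete by setting the first byte.
void guard_release_sketch(std::uint64_t *guard)
{
  assert(guard != nullptr);
  *reinterpret_cast<unsigned char *>(guard) = 1;
}

// Initialization threw: nothing to undo without locking, so a later
// acquire will simply retry the initializer.
void guard_abort_sketch(std::uint64_t *)
{
}
```

A multi-threaded version would additionally need a lock or an atomic "in progress" state, which is the extension the message defers to later.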
2023-12-21  Daily bump.  GCC Administrator  1  -0/+6
2023-12-20  strub: sparc: omit frame in strub_leave [PR112917]  Alexandre Oliva  2  -0/+6
If we allow __strub_leave to allocate a frame on sparc, it will overlap with a lot of the stack range we're supposed to scrub, because of the large fixed-size outgoing args and register save area. Unfortunately, setting up the PIC register seems to prevent the frame pointer from being omitted. Since the strub runtime doesn't issue calls or use global variables, at least on sparc, disabling PIC to compile strub.c seems to do the right thing. for libgcc/ChangeLog PR middle-end/112917 * config.host (sparc, sparc64): Enable... * config/sparc/t-sparc: ... this new fragment.
2023-12-20  Daily bump.  GCC Administrator  1  -0/+6
2023-12-19  strub: avoid lto inlining  Alexandre Oliva  1  -2/+6
The strub builtins are not suited for cross-unit inlining, they should only be inlined by the builtin expanders, if at all. While testing on sparc64, it occurred to me that, if libgcc was built with LTO enabled, lto1 might inline them, and that would likely break things. So, make sure they're clearly marked as not inlinable. for libgcc/ChangeLog * strub.c (ATTRIBUTE_NOINLINE): New. (ATTRIBUTE_STRUB_CALLABLE): Add it. (__strub_dummy_force_no_leaf): Drop it.
2023-12-17  Daily bump.  GCC Administrator  1  -0/+14
2023-12-16  [aarch64] Add function multiversioning support  Andrew Carlotti  1  -67/+2
This adds initial support for function multiversioning on aarch64 using the target_version and target_clones attributes. This loosely follows the Beta specification in the ACLE [1], although with some differences that still need to be resolved (possibly as follow-up patches).

Existing function multiversioning implementations are broken in various ways when used across translation units. This includes placing resolvers in the wrong translation units, and using symbol mangling that causes callers to unintentionally bypass the resolver in some circumstances. Fixing these issues for aarch64 will require modifications to our ACLE specification. It will also require further adjustments to existing middle end code, to facilitate different mangling and resolver placement while preserving existing target behaviours.

The list of function multiversioning features specified in the ACLE is also inconsistent with the list of features supported in target option extensions. I intend to resolve some or all of these inconsistencies at a later stage.

The target_version attribute is currently only supported in C++, since this is the only frontend with existing support for multiversioning using the target attribute. On the other hand, this patch happens to enable multiversioning with the target_clones attribute in Ada and D, as well as the entire C family, using their existing frontend support.

This patch also does not support the following aspects of the Beta specification:
- The target_clones attribute should allow an implicit unlisted "default" version.
- There should be an option to disable function multiversioning at compile time.
- Unrecognised target names in a target_clones attribute should be ignored (with an optional warning). This current patch raises an error instead.
[1] https://github.com/ARM-software/acle/blob/main/main/acle.md#function-multi-versioning gcc/ChangeLog: * config/aarch64/aarch64-feature-deps.h (fmv_deps_<FEAT_NAME>): Define aarch64_feature_flags mask for each FMV feature. * config/aarch64/aarch64-option-extensions.def: Use new macros to define FMV feature extensions. * config/aarch64/aarch64.cc (aarch64_option_valid_attribute_p): Check for target_version attribute after processing target attribute. (aarch64_fmv_feature_data): New. (aarch64_parse_fmv_features): New. (aarch64_process_target_version_attr): New. (aarch64_option_valid_version_attribute_p): New. (get_feature_mask_for_version): New. (compare_feature_masks): New. (aarch64_compare_version_priority): New. (build_ifunc_arg_type): New. (make_resolver_func): New. (add_condition_to_bb): New. (dispatch_function_versions): New. (aarch64_generate_version_dispatcher_body): New. (aarch64_get_function_versions_dispatcher): New. (aarch64_common_function_versions): New. (aarch64_mangle_decl_assembler_name): New. (TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P): New implementation. (TARGET_OPTION_EXPANDED_CLONES_ATTRIBUTE): New implementation. (TARGET_OPTION_FUNCTION_VERSIONS): New implementation. (TARGET_COMPARE_VERSION_PRIORITY): New implementation. (TARGET_GENERATE_VERSION_DISPATCHER_BODY): New implementation. (TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): New implementation. (TARGET_MANGLE_DECL_ASSEMBLER_NAME): New implementation. * config/aarch64/aarch64.h (TARGET_HAS_FMV_TARGET_ATTRIBUTE): Set target macro. * config/arm/aarch-common.h (enum aarch_parse_opt_result): Add new value to report duplicate FMV feature. * common/config/aarch64/cpuinfo.h: New file. libgcc/ChangeLog: * config/aarch64/cpuinfo.c (enum CPUFeatures): Move to shared copy in gcc/common gcc/testsuite/ChangeLog: * gcc.target/aarch64/options_set_17.c: Reorder expected flags. * gcc.target/aarch64/cpunative/native_cpu_0.c: Ditto. * gcc.target/aarch64/cpunative/native_cpu_13.c: Ditto. 
* gcc.target/aarch64/cpunative/native_cpu_16.c: Ditto. * gcc.target/aarch64/cpunative/native_cpu_17.c: Ditto. * gcc.target/aarch64/cpunative/native_cpu_18.c: Ditto. * gcc.target/aarch64/cpunative/native_cpu_19.c: Ditto. * gcc.target/aarch64/cpunative/native_cpu_20.c: Ditto. * gcc.target/aarch64/cpunative/native_cpu_21.c: Ditto. * gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto. * gcc.target/aarch64/cpunative/native_cpu_6.c: Ditto. * gcc.target/aarch64/cpunative/native_cpu_7.c: Ditto.
2023-12-16aarch64: Add cpu feature detection to libgccAndrew Carlotti2-0/+501
This is added to enable function multiversioning, but can also be used directly. The interface is chosen to match that used in LLVM's compiler-rt, to facilitate cross-compiler compatibility. The content of the patch is derived almost entirely from Pavel's prior contributions to compiler-rt/lib/builtins/cpu_model.c. I have made minor changes to align more closely with GCC coding style, and to exclude any code from other LLVM contributors, and am adding this to GCC with Pavel's approval. libgcc/ChangeLog: * config/aarch64/t-aarch64: Include cpuinfo.c * config/aarch64/cpuinfo.c: New file (__init_cpu_features_constructor) New. (__init_cpu_features_resolver) New. (__init_cpu_features) New. Co-authored-by: Pavel Iliin <Pavel.Iliin@arm.com>
2023-12-12Daily bump.GCC Administrator1-0/+10
2023-12-11libgfortran: Replace mutex with rwlockLipeng Zhu1-0/+60
This patch introduces an rwlock and uses it instead of the mutex to separate reads from writes of the unit_root tree and unit_cache, in order to increase CPU efficiency. In the get_gfc_unit function, only around 30% of calls step into the insert_unit function; in most instances, the unit is found while reading the unit_cache or the unit_root tree. Splitting the read and write phases with an rwlock therefore makes the lookup more parallel. The IPC metric improves by around 9x on our test server with 220 cores. The benchmark we used is https://github.com/rwesson/NEAT libgcc/ChangeLog: * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro. (__gthrw): New function. (__gthread_rwlock_rdlock): New function. (__gthread_rwlock_tryrdlock): New function. (__gthread_rwlock_wrlock): New function. (__gthread_rwlock_trywrlock): New function. (__gthread_rwlock_unlock): New function. libgfortran/ChangeLog: * io/async.c (DEBUG_LINE): New macro. * io/async.h (RWLOCK_DEBUG_ADD): New macro. (CHECK_RDLOCK): New macro. (CHECK_WRLOCK): New macro. (TAIL_RWLOCK_DEBUG_QUEUE): New macro. (IN_RWLOCK_DEBUG_QUEUE): New macro. (RDLOCK): New macro. (WRLOCK): New macro. (RWUNLOCK): New macro. (RD_TO_WRLOCK): New macro. (INTERN_RDLOCK): New macro. (INTERN_WRLOCK): New macro. (INTERN_RWUNLOCK): New macro. * io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in a comment. (unit_lock): Remove including associated internal_proto. (unit_rwlock): New declarations including associated internal_proto. (dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock instead of __gthread_mutex_lock and __gthread_mutex_unlock on unit_lock. * io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. (st_write_done_worker): Likewise. * io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules' comment. Use unit_rwlock variable instead of unit_lock variable. (get_gfc_unit_from_unit_root): New function. 
(get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. (close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. (close_units): Likewise. (newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on unit_lock. * io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. (flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock.
2023-12-09Daily bump.GCC Administrator1-0/+38