path: root/malloc
2025-04-15  malloc: Use tailcalls in __libc_free  (Wilco Dijkstra; 1 file, -7/+24)

Use tailcalls to avoid the overhead of a frame on the free fastpath. Move tcache initialization to _int_free_chunk(). Add malloc_printerr_tail(), which can be tailcalled without forcing a frame the way no-return functions do. Change tcache_double_free_verify() to retry via __libc_free() after clearing the key.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
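
A rough illustration of the pattern (a sketch with hypothetical helper names, not the actual glibc code): keeping every slow-path exit in tail position lets the compiler turn those calls into jumps, so the fast path never sets up a stack frame.

    #include <stddef.h>

    struct chunk;
    /* Hypothetical stand-ins for the glibc internals named above.  */
    extern struct chunk *mem_to_chunk (void *mem);
    extern int tcache_put_fast (struct chunk *p);
    extern void free_chunk_slow (struct chunk *p);

    void
    free_sketch (void *mem)
    {
      if (mem == NULL)
        return;
      struct chunk *p = mem_to_chunk (mem);
      if (tcache_put_fast (p))    /* fast path: done, nothing left to do */
        return;
      free_chunk_slow (p);        /* call in tail position: compiled as a
                                     jump, so no frame is needed here */
    }
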
2025-04-15  malloc: Inline tcache_free  (Wilco Dijkstra; 1 file, -30/+14)

Inline tcache_free since it's only used by __libc_free. Add __glibc_likely for the tcache checks.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2025-04-15  malloc: Improve free checks  (Wilco Dijkstra; 1 file, -9/+6)

The checks on size can be merged and use __builtin_add_overflow. Since tcache only handles small sizes (and rejects sizes < MINSIZE), delay this check until after tcache.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
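
A minimal sketch of the merged check, with illustrative names and bounds (the real test in malloc.c differs in detail): __builtin_add_overflow lets a single test catch both a wrapped pointer+size sum and an absurdly large size.

    #include <stddef.h>
    #include <stdint.h>

    static inline int
    chunk_size_ok (uintptr_t chunk, size_t size, uintptr_t heap_end)
    {
      uintptr_t end;
      if (__builtin_add_overflow (chunk, size, &end))
        return 0;                 /* chunk + size wrapped around */
      return end <= heap_end;     /* and it must stay inside the heap */
    }
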
2025-04-15  malloc: Inline _int_free_check  (Wilco Dijkstra; 1 file, -19/+12)

Inline _int_free_check since it is only used by __libc_free.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2025-04-14  malloc: Inline _int_free  (Wilco Dijkstra; 2 files, -28/+12)

Inline _int_free since it is a small function and only really used by __libc_free.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2025-04-14  malloc: Move mmap code out of __libc_free hotpath  (Wilco Dijkstra; 1 file, -30/+40)

Currently __libc_free checks for a freed mmap chunk in the fast path. Also errno is always saved and restored to preserve it. Since mmap chunks are larger than the largest tcache chunk, it is safe to delay this and handle tcache, smallbin and medium bin blocks first. Move saving of errno to cases that actually need it. Remove a safety check that fails on mmap chunks and a check that mmap chunks cannot be added to tcache.

Performance of bench-malloc-thread improves by 9.2% for 1 thread and 6.9% for 32 threads on Neoverse V2.

Reviewed-by: DJ Delorie <dj@redhat.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
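
A sketch of the reordering, with hypothetical helpers and an illustrative tcache limit: because an mmapped chunk is always larger than the largest tcache chunk, the tcache test can run first, and the mmap test (with its errno save/restore) moves off the hot path.

    #include <errno.h>
    #include <stddef.h>

    struct chunk;
    extern size_t chunk_size (struct chunk *p);   /* hypothetical externs */
    extern int chunk_is_mmapped (struct chunk *p);
    extern void tcache_put (struct chunk *p);
    extern void munmap_chunk (struct chunk *p);
    extern void free_chunk_slow (struct chunk *p);
    #define TCACHE_MAX_BYTES 1032                 /* illustrative value */

    static void
    free_dispatch (struct chunk *p)
    {
      if (chunk_size (p) <= TCACHE_MAX_BYTES)   /* hot path: never mmapped */
        {
          tcache_put (p);
          return;
        }
      if (chunk_is_mmapped (p))                 /* cold path only */
        {
          int save = errno;                     /* munmap may clobber errno */
          munmap_chunk (p);
          errno = save;
          return;
        }
      free_chunk_slow (p);
    }
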
2025-03-28  malloc: Improve performance of __libc_malloc  (Wilco Dijkstra; 1 file, -13/+20)

Improve performance of __libc_malloc by splitting it into 2 parts: first handle the tcache fastpath, then do the rest in a separate tailcalled function. This results in significant performance gains since __libc_malloc doesn't need to set up a frame and we delay tcache initialization and setting of errno until later.

On Neoverse V2, bench-malloc-simple improves by 6.7% overall (up to 8.5% for the ST case) and bench-malloc-thread improves by 20.3% for 1 thread and 14.4% for 32 threads.

Reviewed-by: DJ Delorie <dj@redhat.com>
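
A sketch of the split under assumed names (tcache_get, usize2tidx and friends are simplified stand-ins, not the actual glibc code): the tcache hit path is small enough to compile without a frame, and everything else, including tcache initialization and errno, is deferred into a tailcalled slow path.

    #include <stddef.h>

    extern void *tcache_get (size_t tc_idx);       /* assumed helpers */
    extern size_t usize2tidx (size_t bytes);
    extern int tcache_ready (void);
    extern void *malloc_slowpath (size_t bytes);   /* inits tcache, sets errno */
    #define TCACHE_BINS 64                         /* illustrative value */

    void *
    malloc_sketch (size_t bytes)
    {
      size_t tc_idx = usize2tidx (bytes);
      if (tc_idx < TCACHE_BINS && tcache_ready ())
        {
          void *mem = tcache_get (tc_idx);
          if (mem != NULL)
            return mem;                            /* frameless fast path */
        }
      return malloc_slowpath (bytes);              /* tail call */
    }
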
2025-03-26  malloc: Use __always_inline for simple functions  (Wilco Dijkstra; 2 files, -11/+11)

Use __always_inline for small helper functions that are critical for performance. This ensures inlining always happens when expected. Performance of bench-malloc-simple improves by 0.6% on average on Neoverse V2.

Reviewed-by: DJ Delorie <dj@redhat.com>
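
For reference, glibc's __always_inline (defined in sys/cdefs.h) is the GNU attribute spelled out; the helper below is a simplified example of the kind of small, hot function the commit means, not one of the actual call sites.

    #include <stddef.h>

    #ifndef __always_inline
    # define __always_inline __inline __attribute__ ((__always_inline__))
    #endif

    /* Simplified example of a hot helper (not the real request2size).  */
    static __always_inline size_t
    round_up_16 (size_t req)
    {
      return (req + 15) & ~(size_t) 15;
    }
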
2025-03-25  malloc: Use _int_free_chunk for remainders  (Wilco Dijkstra; 1 file, -9/+6)

When splitting a chunk, release the tail part by calling _int_free_chunk. This avoids inserting random blocks into tcache that were never requested by the user. Fragmentation will be worse if they are never used again. Note if the tail is fairly small, we could avoid splitting it at all. Also remove an oddly placed initialization of tcache in __libc_realloc.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-03-21  malloc: missing initialization of tcache in _mid_memalign  (Cupertino Miranda; 1 file, -0/+2)

_mid_memalign includes tcache code but does not attempt to initialize tcaches.

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-03-18  malloc: Improve csize2tidx  (Wilco Dijkstra; 1 file, -1/+1)

Remove the alignment rounding up from csize2tidx - this makes no sense since the input should be a chunk size. Removing it enables further optimizations, for example chunksize_nomask can be safely used and invalid sizes < MINSIZE are not mapped to a valid tidx.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
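
Paraphrasing the change (constants shown with their typical 64-bit values; the exact macro lives in malloc.c): with the rounding removed, a size below MINSIZE underflows to a huge unsigned index and fails the bounds check instead of aliasing a valid bin.

    #define MALLOC_ALIGNMENT 16   /* typical 64-bit values, for illustration */
    #define MINSIZE 32

    /* before: rounds the input up, as if it were a request size */
    #define csize2tidx_old(x) (((x) - MINSIZE + MALLOC_ALIGNMENT - 1) / MALLOC_ALIGNMENT)
    /* after: the input is already a chunk size, so no rounding is needed */
    #define csize2tidx_new(x) (((x) - MINSIZE) / MALLOC_ALIGNMENT)
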
2025-03-18  malloc: Improve arena_for_chunk()  (Wilco Dijkstra; 1 file, -6/+7)

Change heap_max_size() to improve performance of arena_for_chunk(). Instead of a complex calculation, use a simple mask operation to get the arena base pointer. HEAP_MAX_SIZE should be larger than the huge page size, otherwise heaps will not use huge pages. On AArch64 this removes 6 instructions from arena_for_chunk(), and bench-malloc-thread improves by 1.1% - 1.8%.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
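
A sketch of the mask operation with an illustrative HEAP_MAX_SIZE: because every non-main heap is aligned to HEAP_MAX_SIZE (a power of two), the owning heap descriptor is just the chunk address with the low bits cleared.

    #include <stdint.h>

    struct heap_info;
    #define HEAP_MAX_SIZE (64 * 1024 * 1024)   /* illustrative; must be a
                                                  power of two */

    static inline struct heap_info *
    heap_for_ptr_sketch (void *ptr)
    {
      return (struct heap_info *)
        ((uintptr_t) ptr & ~((uintptr_t) HEAP_MAX_SIZE - 1));
    }
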
2025-03-03  malloc: Add integrity check to largebin nextsizes  (Ben Kallus; 1 file, -0/+3)

If an attacker overwrites the bk_nextsize link in the first chunk of a largebin that later has a smaller chunk inserted into it, malloc will write a heap pointer into an attacker-controlled address [0]. This patch adds an integrity check to mitigate this attack.

[0]: https://github.com/shellphish/how2heap/blob/master/glibc_2.39/large_bin_attack.c

Signed-off-by: Ben Kallus <benjamin.p.kallus.gr@dartmouth.edu>
Reviewed-by: DJ Delorie <dj@redhat.com>
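
A sketch of the kind of check added (field names follow malloc.c's chunk layout, but the code below is illustrative, not the patch itself): before splicing a new smaller chunk in front of fwd in the nextsize list, verify that fwd's backward neighbor still points back at fwd.

    struct chunk
    {
      struct chunk *fd_nextsize;
      struct chunk *bk_nextsize;
    };
    extern void malloc_printerr (const char *str);

    static void
    nextsize_insert_before (struct chunk *victim, struct chunk *fwd)
    {
      if (fwd->bk_nextsize->fd_nextsize != fwd)   /* list corrupted?  */
        malloc_printerr ("malloc(): largebin nextsize list corrupted");
      victim->fd_nextsize = fwd;
      victim->bk_nextsize = fwd->bk_nextsize;
      fwd->bk_nextsize->fd_nextsize = victim;
      fwd->bk_nextsize = victim;
    }
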
2025-02-13  malloc: Add size check when moving fastbin->tcache  (Ben Kallus; 1 file, -0/+3)

By overwriting a forward link in a fastbin chunk that is subsequently moved into the tcache, it's possible to get malloc to return an arbitrary address [0]. When a chunk is fetched from a fastbin, its size is checked against the expected chunk size for that fastbin (see malloc.c:3991). This patch adds a similar check for chunks being moved from a fastbin to tcache, which renders obsolete the exploitation technique described above.

Now updated to use __glibc_unlikely instead of __builtin_expect, as requested.

[0]: https://github.com/shellphish/how2heap/blob/master/glibc_2.39/fastbin_reverse_into_tcache.c

Signed-off-by: Ben Kallus <benjamin.p.kallus.gr@dartmouth.edu>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
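
A sketch of the check described above (names modelled on malloc.c, but simplified): each chunk drained from fastbin idx into the tcache must have a size that maps back to that same fastbin.

    #include <stddef.h>

    struct chunk;
    extern size_t chunksize (struct chunk *p);    /* hypothetical externs */
    extern size_t fastbin_index (size_t size);
    extern void malloc_printerr (const char *str);

    static void
    check_fastbin_victim (struct chunk *victim, size_t idx)
    {
      /* A forged fd pointer almost never lands on a chunk whose size
         field matches the bin being drained.  */
      if (fastbin_index (chunksize (victim)) != idx)
        malloc_printerr ("malloc(): chunk size mismatch in fastbin");
    }
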
2025-02-02  elf: Merge __dl_libc_freemem into __rtld_libc_freeres  (Florian Weimer; 1 file, -2/+0)

The functions serve very similar purposes. The advantage of __rtld_libc_freeres is that it is located within ld.so, so it is more natural to poke at link map internals there. This slightly regresses cleanup capabilities for statically linked binaries. If that becomes a problem, we should start calling __rtld_libc_freeres from __libc_freeres (perhaps after renaming it).
2025-01-25  malloc: cleanup casts in tst-calloc  (Sam James; 1 file, -2/+2)

Followup to c3d1dac96bdd10250aa37bb367d5ef8334a093a1. As pointed out by Maciej W. Rozycki, the casts are obviously useless now.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2025-01-10  malloc: obscure calloc use in tst-calloc  (Sam James; 1 file, -4/+8)

Similar to a9944a52c967ce76a5894c30d0274b824df43c7a and f9493a15ea9cfb63a815c00c23142369ec09d8ce, we need to hide calloc use from the compiler to accommodate GCC's r15-6566-g804e9d55d9e54c change. First, include tst-malloc-aux.h, but then use `volatile` variables for size. The test passes without the tst-malloc-aux.h change but IMO we want it there for consistency and to avoid future problems (possibly silent).

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
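
A sketch of the volatile-size idiom: because the operands are re-read through a volatile qualifier, the compiler cannot prove what calloc is asked for, so the allocation cannot be removed as dead.

    #include <stdlib.h>

    static volatile size_t nmemb = 4;    /* opaque to the optimizer */
    static volatile size_t size = 16;

    static int
    do_test_sketch (void)
    {
      void *p = calloc (nmemb, size);
      if (p == NULL)
        return 1;
      free (p);
      return 0;
    }
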
2025-01-01  Update copyright dates not handled by scripts/update-copyrights  (Paul Eggert; 1 file, -1/+1)

I've updated copyright dates in glibc for 2025. This is the patch for the changes not generated by scripts/update-copyrights and subsequent build / regeneration of generated files.
2025-01-01  Update copyright dates with scripts/update-copyrights  (Paul Eggert; 97 files, -97/+97)
2024-12-23  include/sys/cdefs.h: Add __attribute_optimization_barrier__  (Adhemerval Zanella; 3 files, -3/+3)

Add __attribute_optimization_barrier__ to disable inlining and cloning on a function. For Clang, expand it to __attribute__ ((optnone)); otherwise, expand it to __attribute__ ((noinline, noclone)).

Co-Authored-By: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
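
Spelled out, the macro described above looks roughly like this (the canonical definition is in sys/cdefs.h):

    #if defined __clang__
    # define __attribute_optimization_barrier__ \
        __attribute__ ((optnone))
    #else
    # define __attribute_optimization_barrier__ \
        __attribute__ ((noinline, noclone))
    #endif
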
2024-12-22  malloc: Only enable -Waggressive-loop-optimizations suppression for gcc  (Adhemerval Zanella; 1 file, -2/+2)

Reviewed-by: Sam James <sam@gentoo.org>
2024-12-17  Hide all malloc functions from compiler [BZ #32366]  (H.J. Lu; 6 files, -5/+30)

Since -1 isn't a power of two, the compiler may reject it; hide memalign from Clang 19, which issues an error:

tst-memalign.c:86:31: error: requested alignment is not a power of 2 [-Werror,-Wnon-power-of-two-alignment]
   86 |   p = memalign (-1, pagesize);
      |                 ^~
tst-memalign.c:86:31: error: requested alignment must be 4294967296 bytes or smaller; maximum alignment assumed [-Werror,-Wbuiltin-assume-aligned-alignment]
   86 |   p = memalign (-1, pagesize);
      |                 ^~

Update tst-malloc-aux.h to hide all malloc functions and include it in all malloc tests to prevent the compiler from optimizing out any malloc functions.

Tested with Clang 19.1.5 and GCC 15 20241206 for BZ #32366.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
2024-12-11  malloc: Add tcache path for calloc  (Wangyang Guo; 2 files, -38/+132)

This commit adds tcache support in calloc(), which can largely improve the performance of small size allocation, especially in multi-thread scenarios. tcache_available() and tcache_try_malloc() are split out as helper functions for better code reuse.

Also fix a tst-safe-linking failure after enabling tcache. Previously, calloc() was used as a way to bypass tcache in memory allocation and trigger the safe-linking check in the fastbins path. With tcache enabled, extra workarounds are needed to bypass tcache.

Result of bench-calloc-thread benchmark

Test Platform: Xeon-8380
Ratio: New / Original time_per_iteration (Lower is Better)

Threads#   | Ratio
-----------|------
1 thread   | 0.656
4 threads  | 0.470

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-12-10  malloc: add indirection for malloc(-like) functions in tests [BZ #32366]  (Sam James; 7 files, -0/+52)

GCC 15 introduces allocation dead code removal (DCE) for PR117370 in r15-5255-g7828dc070510f8. This breaks various glibc tests which want to assert various properties of the allocator without doing anything obviously useful with the allocated memory.

Alexander Monakov rightly pointed out that we can and should do better than passing -fno-malloc-dce to paper over the problem. Not least because GCC 14 already does such DCE where there's no testing of malloc's return value against NULL, and LLVM has such optimisations too.

Handle this by providing malloc (and friends) wrappers with a volatile function pointer to obscure that we're calling malloc (et al.) from the compiler.

Reviewed-by: Paul Eggert <eggert@cs.ucla.edu>
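
A sketch of the indirection (the real wrappers live in malloc/tst-malloc-aux.h; the names here are illustrative): a call through a volatile function pointer cannot be identified as malloc by the optimizer, so allocation DCE leaves it alone.

    #include <stdlib.h>

    static void *(*volatile malloc_ptr) (size_t) = malloc;
    static void (*volatile free_ptr) (void *) = free;

    /* Route the test's calls through the opaque pointers.  */
    #define malloc(size) malloc_ptr (size)
    #define free(ptr)    free_ptr (ptr)
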
2024-12-04  malloc: Optimize small memory clearing for calloc  (H.J. Lu; 2 files, -35/+2)

Add calloc-clear-memory.h to clear memory size up to 36 bytes (72 bytes on 64-bit targets) for calloc. Use repeated stores with 1 branch, instead of up to 3 branches. On x86-64, it is faster than memset since calling memset needs 1 indirect branch, 1 broadcast, and up to 4 branches.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
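
A simplified sketch of the idea, assuming (for illustration) that every cleared block is at least 4 words: clear the first 4 words unconditionally, and if more remain, clear the last 5 words with stores that may overlap the first 4. One branch then covers every size from 32 to 72 bytes; the real header differs in detail.

    #include <stddef.h>
    #include <stdint.h>

    /* csz is a multiple of 8 with 32 <= csz <= 72 (sketch assumption).  */
    static inline void
    clear_small (uint64_t *d, size_t csz)
    {
      d[0] = 0;
      d[1] = 0;
      d[2] = 0;
      d[3] = 0;
      if (csz > 4 * sizeof (uint64_t))
        {
          uint64_t *e = (uint64_t *) ((char *) d + csz);
          e[-1] = 0;   /* the tail stores may overlap the head stores, */
          e[-2] = 0;   /* which is cheaper than branching per size     */
          e[-3] = 0;
          e[-4] = 0;
          e[-5] = 0;
        }
    }
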
2024-11-29  malloc: send freed small chunks to smallbin  (k4lizen; 1 file, -19/+34)

Large chunks get added to the unsorted bin since sorting them takes time; for small chunks the benefit of adding them to the unsorted bin is non-existent, actually hurting performance. Splitting and malloc_consolidate still add small chunks to unsorted, but we can hint the compiler that this is a relatively rare occurrence. Benchmarking shows this to be consistently good.

Authored-by: k4lizen <k4lizen@proton.me>
Signed-off-by: Aleksa Siriški <sir@tmina.org>
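
A sketch of the dispatch, with hypothetical helpers: smallbins hold exactly one size each, so a freed small chunk can be binned directly with nothing to sort.

    #include <stddef.h>

    struct chunk;
    extern int in_smallbin_range (size_t size);   /* hypothetical externs */
    extern void smallbin_insert (struct chunk *p, size_t size);
    extern void unsorted_insert (struct chunk *p);

    static void
    bin_freed_chunk (struct chunk *p, size_t size)
    {
      if (in_smallbin_range (size))
        smallbin_insert (p, size);   /* exact-size bin: nothing to sort */
      else
        unsorted_insert (p);         /* sorted lazily on a later malloc */
    }
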
2024-11-27  malloc: Avoid func call for tcache quick path in free()  (Wangyang Guo; 1 file, -1/+1)

Tcache is an important optimization to accelerate memory free(); things within this code path should be kept as simple as possible. This commit removes the function call when free() invokes the tcache code path by inlining _int_free().

Result of bench-malloc-thread benchmark

Test Platform: Xeon-8380
Ratio: New / Original time_per_iteration (Lower is Better)

Threads#   | Ratio
-----------|------
1 thread   | 0.879
4 threads  | 0.874

The performance data shows it can improve the bench-malloc-thread benchmark by ~12% in both single-thread and multi-thread scenarios.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-11-25  Silence most -Wzero-as-null-pointer-constant diagnostics  (Alejandro Colomar; 5 files, -46/+46)

Replace 0 by NULL and {0} by {}. Omit a few cases that aren't so trivial to fix.

Link: <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117059>
Link: <https://software.codidact.com/posts/292718/292759#answer-292759>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
2024-11-25  malloc: Split _int_free() into 3 sub functions  (Wangyang Guo; 1 file, -49/+86)

Split _int_free() into 3 smaller functions for flexible combination:

* _int_free_check -- sanity check for free
* tcache_free -- free memory to tcache (quick path)
* _int_free_chunk -- free memory chunk (slow path)
2024-11-12  linux: Add support for getrandom vDSO  (Adhemerval Zanella; 1 file, -2/+2)

Linux 6.11 has getrandom() in the vDSO. It operates on a thread-local opaque state allocated with mmap using flags specified by the vDSO. Multiple states are allocated at once, as many as fit into a page, and these are held in an array of available states to be doled out to each thread upon first use, and recycled when a thread terminates. As these states run low, more are allocated.

To make this procedure async-signal-safe, a simple guard is used in the LSB of the opaque state address, falling back to the syscall if there's reentrancy contention. Also, _Fork() is handled by blocking signals on opaque state allocation (so _Fork() always sees a consistent state even if it interrupts a getrandom() call) and by iterating over the thread stack cache on reclaim_stack. Each opaque state will be in the free states list (grnd_alloc.states) or allocated to a running thread.

Cancellation is handled by always using the GRND_NONBLOCK flag while calling the vDSO, and falling back to the cancellable syscall if the kernel returns EAGAIN (would block). Since getrandom is not defined by POSIX and cancellation is supported as an extension, the cancellation is handled as 'may occur' instead of 'shall occur' [1], meaning that if the vDSO does not block (the expected behavior) getrandom will not act as a cancellation entrypoint. This avoids a pthread_testcancel call on the fast path (different from 'shall occur' functions, like sem_wait()).

It is currently enabled for x86_64, which is available in Linux 6.11, and aarch64, powerpc32, powerpc64, loongarch64, and s390x, which are available in Linux 6.12.

Link: https://pubs.opengroup.org/onlinepubs/9799919799/nframe.html [1]
Co-developed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Tested-by: Jason A. Donenfeld <Jason@zx2c4.com> # x86_64
Tested-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> # x86_64, aarch64
Tested-by: Xi Ruoyao <xry111@xry111.site> # x86_64, aarch64, loongarch64
Tested-by: Stefan Liebler <stli@linux.ibm.com> # s390x
2024-08-20  malloc: Link threading tests with $(shared-thread-library)  (Samuel Thibault; 1 file, -0/+6)

Fixes build failures on Hurd.
2024-07-27  malloc: Link threading tests with $(shared-thread-library)  (Florian Weimer; 1 file, -0/+2)

Fixes build failures on Hurd.
2024-07-22  malloc: add multi-threaded tests for aligned_alloc/calloc/malloc  (Miguel Martín; 3 files, -0/+172)

Improve aligned_alloc/calloc/malloc test coverage by adding multi-threaded tests with random memory allocations and with/without cross-thread memory deallocations. Perform a number of memory allocation calls with random sizes limited to 0xffff. Use the existing DSO ('malloc/tst-aligned_alloc-lib.c') to randomize allocator selection.

The multi-threaded allocation/deallocation is staged as described below:

- Stage 1: Half of the threads will be allocating memory and the other half will be waiting for them to finish the allocation.
- Stage 2: Half of the threads will be allocating memory and the other half will be deallocating memory.
- Stage 3: Half of the threads will be deallocating memory and the second half waiting on them to finish.

Add 'malloc/tst-aligned-alloc-random-thread.c' where each thread will deallocate only the memory that was previously allocated by itself.

Add 'malloc/tst-aligned-alloc-random-thread-cross.c' where each thread will deallocate memory that was previously allocated by another thread.

The intention is to be able to utilize existing malloc testing to ensure that similar allocation APIs are also exposed to the same rigors.

Reviewed-by: Arjun Shankar <arjun@redhat.com>
2024-07-22  malloc: avoid global locks in tst-aligned_alloc-lib.c  (Miguel Martín; 1 file, -19/+20)

Make sure the DSO used by aligned_alloc/calloc/malloc tests does not get a global lock on multithreaded tests.

Reviewed-by: Arjun Shankar <arjun@redhat.com>
2024-07-19  Fix usage of _STACK_GROWS_DOWN and _STACK_GROWS_UP defines [BZ 31989]  (John David Anglin; 1 file, -1/+1)

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Reviewed-By: Andreas K. Hüttel <dilfridge@gentoo.org>
2024-06-24  mtrace: make shell commands robust against meta characters  (Andreas Schwab; 1 file, -2/+2)

Use the list form of the open function to avoid interpreting meta characters in the arguments.
2024-06-20  malloc: Replace shell/Perl gate in mtrace  (Florian Weimer; 1 file, -4/+17)

The previous version expanded $0 and $@ twice. The new version defines a q no-op shell command. The Perl syntax error is masked by the eval Perl function. The q { … } construct is executed by the shell without errors because the q shell function was defined, but is treated as a non-expanding quoted string by Perl, effectively hiding its contents from the Perl interpreter. As before, the script is read by require instead of executed directly, to avoid infinite recursion because the #! line contains /bin/sh.

Introduce the “fatal” function to produce diagnostics that are not suppressed by “do”. Use “do” instead of “require” because it has fewer requirements on the executed script than “require”. Prefix relative paths with './' because “do” (and “require” before it) searches for the script in @INC if the path is relative and does not start with './'. Use $_ to make the trampoline shorter. Add an Emacs mode marker to identify the script as a Perl script.
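
A minimal, hypothetical illustration of the gate (not the actual mtrace.pl header, and assuming the script is invoked by a relative path): the shell defines q, runs the block line by line, and reaches the exec; Perl parses the same block as a q{...} string literal and skips it, while the string eval masks the shell function definition.

    #! /bin/sh
    eval 'q() { : ; }';
    q {
    exec perl -e 'do "./$ARGV[0]" or die $@' "$0" "$@"
    };
    # ... the actual Perl code follows here ...
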
2024-06-20  malloc: Always install mtrace (bug 31892)  (Florian Weimer; 2 files, -7/+4)

Generation of the Perl script does not depend on Perl, so we can always install it even if $(PERL) is not set during the build. Change the malloc/mtrace.pl text substitution not to rely on $(PERL). Instead use PATH at run time to find the Perl interpreter.

The Perl interpreter cannot directly execute a script that starts with “#! /bin/sh”: it always executes it with /bin/sh. There is no perl command line switch to disable this behavior. Instead, use the Perl require function to execute the script. The additional shift calls remove the “.” shell arguments. Perl interprets the “.” as a string concatenation operator, making the expression syntactically valid.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2024-06-04  malloc: New test to check malloc alternate path using memory obstruction  (sayan paul; 2 files, -0/+73)

The test aims to ensure that malloc uses the alternate path to allocate memory when sbrk() or brk() fails. To achieve this, the test first creates an obstruction at the current program break, tests that obstruction with a failing sbrk(), then checks if malloc still returns a valid pointer, thus inferring that malloc() used mmap() instead of brk() or sbrk() to allocate the memory.

Reviewed-by: Arjun Shankar <arjun@redhat.com>
Reviewed-by: Zack Weinberg <zack@owlfolio.org>
2024-05-14  malloc: Improve aligned_alloc and calloc test coverage.  (Joe Simmons-Talbott; 5 files, -0/+151)

Add a DSO (malloc/tst-aligned_alloc-lib.so) that can be used during testing to interpose malloc with a call that randomly uses either aligned_alloc, __libc_malloc, or __libc_calloc in the place of malloc. Use LD_PRELOAD with the DSO to mirror malloc/tst-malloc.c testing as an example in malloc/tst-malloc-random.c. Add malloc/tst-aligned-alloc-random.c as another example that does a number of malloc calls with randomly sized, but limited to 0xffff, requests.

The intention is to be able to utilize existing malloc testing to ensure that similar allocation APIs are also exposed to the same rigors.

Reviewed-by: DJ Delorie <dj@redhat.com>
2024-05-10  malloc/Makefile: Split and sort tests  (H.J. Lu; 1 file, -64/+102)

Put each test on a separate line and sort tests.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2024-01-12  Make __getrandom_nocancel set errno and add a _nostatus version  (Xi Ruoyao; 1 file, -1/+3)

The __getrandom_nocancel function returns errors as negative values instead of errno. This is inconsistent with other _nocancel functions and it breaks "TEMP_FAILURE_RETRY (__getrandom_nocancel (p, n, 0))" in __arc4random_buf. Use INLINE_SYSCALL_CALL instead of INTERNAL_SYSCALL_CALL to fix this issue.

But __getrandom_nocancel has been avoiding touching errno for a reason; see BZ 29624. So add a __getrandom_nocancel_nostatus function and use it in tcache_key_initialize.

Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
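
For context, TEMP_FAILURE_RETRY (from unistd.h, shown here in its usual glibc form) only retries when the call returns -1 and errno is EINTR; a function that instead returns -EINTR and leaves errno alone never triggers the retry loop.

    #include <errno.h>

    #define TEMP_FAILURE_RETRY(expression)            \
      (__extension__                                  \
        ({ long int __result;                         \
           do __result = (long int) (expression);     \
           while (__result == -1L && errno == EINTR); \
           __result; }))
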
2024-01-01  Update copyright dates not handled by scripts/update-copyrights  (Paul Eggert; 3 files, -3/+3)

I've updated copyright dates in glibc for 2024. This is the patch for the changes not generated by scripts/update-copyrights and subsequent build / regeneration of generated files.
2024-01-01  Update copyright dates with scripts/update-copyrights  (Paul Eggert; 90 files, -90/+90)
2023-11-29  malloc: Improve MAP_HUGETLB with glibc.malloc.hugetlb=2  (Adhemerval Zanella; 1 file, -3/+10)

Even with explicit large page support, allocation might use mmap without the hugepage bit set if the requested size is smaller than mmap_threshold. For this case where mmap is issued, MAP_HUGETLB is set iff the allocation size is larger than the used large page. To force such allocations to use large pages, also tune mmap_threshold (if it is not explicitly set by a tunable). This forces allocation to follow the sbrk path, which will fall back to mmap (which will try large pages before falling back to the default mmap).

Checked on x86_64-linux-gnu.

Reviewed-by: DJ Delorie <dj@redhat.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
2023-11-22  malloc: Use __get_nprocs on arena_get2 (BZ 30945)  (Adhemerval Zanella; 1 file, -1/+1)

This restores the 2.33 semantics for arena_get2. It was changed by 11a02b035b46 to avoid arena_get2 calling malloc (back when __get_nprocs was refactored to use a scratch_buffer - 903bc7dcc2acafc). The __get_nprocs has since been refactored and now also avoids calling malloc. The 11a02b035b46 change did not take into consideration any performance implication, which should have been discussed properly. The __get_nprocs_sched is still used as a fallback mechanism if procfs and sysfs are not accessible.

Checked on x86_64-linux-gnu.

Reviewed-by: DJ Delorie <dj@redhat.com>
2023-11-07  malloc: Decorate malloc maps  (Adhemerval Zanella; 2 files, -0/+9)

Add anonymous mmap annotations on loader malloc, malloc when it allocates memory with mmap, and on malloc arena. The /proc/self/maps will now print:

  [anon: glibc: malloc arena]
  [anon: glibc: malloc]
  [anon: glibc: loader malloc]

On arena allocation, glibc annotates only the read/write mapping.

Checked on x86_64-linux-gnu and aarch64-linux-gnu.

Reviewed-by: DJ Delorie <dj@redhat.com>
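
The annotations rely on the Linux PR_SET_VMA_ANON_NAME prctl (available since Linux 5.17); a sketch of the mechanism, with the constants defined manually in case the installed headers lack them:

    #include <stddef.h>
    #include <sys/prctl.h>

    #ifndef PR_SET_VMA
    # define PR_SET_VMA 0x53564d41
    # define PR_SET_VMA_ANON_NAME 0
    #endif

    static void
    name_mapping (void *addr, size_t size, const char *name)
    {
      /* Best effort: fails silently on kernels built without
         CONFIG_ANON_VMA_NAME.  */
      prctl (PR_SET_VMA, PR_SET_VMA_ANON_NAME,
             (unsigned long) addr, size, (unsigned long) name);
    }
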
2023-10-23  malloc: Fix tst-tcfree3 build on csky-linux-gnuabiv2 with fortify source  (Adhemerval Zanella; 2 files, -3/+2)

With gcc 13.1 with --enable-fortify-source=2, tst-tcfree3 fails to build on csky-linux-gnuabiv2 with:

../string/bits/string_fortified.h: In function ‘do_test’:
../string/bits/string_fortified.h:26:8: error: inlining failed in call to ‘always_inline’ ‘memcpy’: target specific option mismatch
   26 | __NTH (memcpy (void *__restrict __dest, const void *__restrict __src,
      |        ^~~~~~
../misc/sys/cdefs.h:81:62: note: in definition of macro ‘__NTH’
   81 | # define __NTH(fct) __attribute__ ((__nothrow__ __LEAF)) fct
      |                                                          ^~~
tst-tcfree3.c:45:3: note: called from here
   45 |   memcpy (c, a, 32);
      |   ^~~~~~~~~~~~~~~~~

Instead of relying on -O0 to avoid malloc/free being optimized away, disable the builtin.

Reviewed-by: DJ Delorie <dj@redhat.com>
2023-08-15  malloc: Remove bin scanning from memalign (bug 30723)  (Florian Weimer; 2 files, -166/+10)

On the test workload (mpv --cache=yes with VP9 video decoding), the bin scanning has a very poor success rate (less than 2%). The tcache scanning has about a 50% success rate, so keep that. Update comments in malloc/tst-memalign-2 to indicate the purpose of the tests. Even with the scanning removed, the additional merging opportunities since commit 542b1105852568c3ebc712225ae78b ("malloc: Enable merging of remainders in memalign (bug 30723)") are sufficient to pass the existing large bins test.

Remove leftover variables from _int_free from refactoring in the same commit.

Reviewed-by: DJ Delorie <dj@redhat.com>
2023-08-11  malloc: Enable merging of remainders in memalign (bug 30723)  (Florian Weimer; 1 file, -76/+121)

Previously, calling _int_free from _int_memalign could put remainders into the tcache or into fastbins, where they are invisible to the low-level allocator. This results in missed merge opportunities because once these freed chunks become available to the low-level allocator, further memalign allocations (even of the same size) are likely obstructing merges.

Furthermore, during forwards merging in _int_memalign, do not completely give up when the remainder is too small to serve as a chunk on its own. We can still give it back if it can be merged with the following unused chunk. This makes it more likely that memalign calls in a loop achieve a compact memory layout, independently of initial heap layout.

Drop some useless (unsigned long) casts along the way, and tweak the style to more closely match GNU on changed lines.

Reviewed-by: DJ Delorie <dj@redhat.com>