|
This patch restores support for the AArch64 ILP32 ABI that was
removed in commit de479a54e22e8fcb6262639a8e67fe8b00a27c37 but is needed
to ensure source-code compatibility with GCC 14.
The change in newlib/libc/machine/aarch64/asmdefs.h makes it
out of sync with the current upstream implementation in the
https://github.com/ARM-software/optimized-routines repository.
Signed-off-by: Radek Bartoň <radek.barton@microsoft.com>
|
|
This was what I was using locally before Radek Bartoň
<radek.barton@microsoft.com> had his version of the patch. Revert in
favor of his final version.
Revert 70c5505766ad4ae62e4d045835ed2a6b928d5760
|
|
ILP32 support was removed prematurely. It is still present in GCC 15,
which is the latest GCC release.
From: <radek.barton@microsoft.com>
Date: Thu, 5 Jun 2025 11:32:08 +0200
Subject: [PATCH] newlib: libc: update asmdefs.h compatible with Cygwin AArch64
This patch synchronizes the newlib/libc/machine/aarch64/asmdefs.h header with
the version from commit
https://github.com/ARM-software/optimized-routines/commit/4352245388a55a836f3ac9ac5907022c24ab8e4c,
which added support for AArch64 Cygwin.
That version of the header removed the PTR_ARG and SIZE_ARG macros after ILP32
was deprecated, which required changes in many .S files, so this patch also
removes all uses of those macros.
On top of that, setjmp.S and rawmemchr.S were refactored to use the
ENTRY/ENTRY_ALIGN and END macros.
Signed-off-by: Radek Bartoň <radek.barton@microsoft.com>
|
|
This patch introduces dummy implementations of fegetprec and fesetprec for the
Cygwin build, as those symbols are exported by cygwin1.dll and AArch64 does not
support changing the floating-point precision at runtime.
Signed-off-by: Radek Bartoň <radek.barton@microsoft.com>
|
|
This commit changes the return type of the read() and write() syscalls for
nvptx to ssize_t. This allows large files to be handled properly by
these syscalls when, for example, the read/write buffer length exceeds
INT_MAX. It also makes the syscall signatures fully compliant
with their current POSIX specifications.
We additionally define two macros: '_READ_WRITE_RETURN_TYPE' as _ssize_t and
'_READ_WRITE_BUFSIZE_TYPE' as __size_t in libc/include/sys/config.h under
__nvptx__ for consistency.
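A sketch of the resulting typing (ssize_t/size_t stand in for newlib's internal _ssize_t/__size_t spellings, and my_write is an illustrative stub, not the nvptx syscall itself):

```c
#include <sys/types.h>   /* ssize_t */
#include <stddef.h>      /* size_t */

/* Stand-ins for the sys/config.h macros described above; newlib spells
 * the underlying types _ssize_t and __size_t under __nvptx__. */
#define _READ_WRITE_RETURN_TYPE ssize_t
#define _READ_WRITE_BUFSIZE_TYPE size_t

/* A write()-shaped stub built from the macros.  With an int return
 * type, a successful transfer longer than INT_MAX could not be
 * reported back to the caller. */
static _READ_WRITE_RETURN_TYPE
my_write(int fd, const void *buf, _READ_WRITE_BUFSIZE_TYPE len)
{
    (void)fd; (void)buf;
    return (_READ_WRITE_RETURN_TYPE)len;  /* pretend everything was written */
}
```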
Signed-off-by: Arijit Kumar Das <arijitkdgit.official@gmail.com>
|
|
This patch synchronizes the newlib/libc/machine/aarch64/asmdefs.h header with
the version from commit
https://github.com/ARM-software/optimized-routines/commit/4352245388a55a836f3ac9ac5907022c24ab8e4c,
which added support for AArch64 Cygwin.
That version of the header removed the PTR_ARG and SIZE_ARG macros after ILP32
was deprecated, which required changes in many .S files, so this patch also
removes all uses of those macros.
On top of that, setjmp.S and rawmemchr.S were refactored to use the
ENTRY/ENTRY_ALIGN and END macros.
Signed-off-by: Radek Bartoň <radek.barton@microsoft.com>
|
|
Redirect to memcpy() if the source and destination memory areas
do not overlap. Only redirect if the length is > SZREG in order to
reduce the overhead on very short copies.
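A C sketch of the dispatch described above (my_memmove and the SZREG stand-in are illustrative; the actual change is in RISC-V assembly):

```c
#include <string.h>
#include <stdint.h>

#define SZREG sizeof(uintptr_t)  /* register width in bytes, stand-in for the asm constant */

/* Fall back to memcpy() when the regions cannot overlap and the copy
 * is long enough to amortize the overlap check. */
static void *my_memmove(void *dst, const void *src, size_t n)
{
    uintptr_t d = (uintptr_t)dst, s = (uintptr_t)src;
    int overlap = (d < s + n) && (s < d + n);
    if (!overlap && n > SZREG)
        return memcpy(dst, src, n);
    /* conservative byte copy handling overlap in either direction */
    unsigned char *dp = dst;
    const unsigned char *sp = src;
    if (d < s) { while (n--) *dp++ = *sp++; }
    else       { dp += n; sp += n; while (n--) *--dp = *--sp; }
    return dst;
}
```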
Signed-off-by: m fally <marlene.fally@gmail.com>
|
|
If misaligned accesses are slow or prohibited, either the
source or the destination address is unaligned, and the number
of bytes to be copied is > SZREG*2, align the source address
to xlen. This speeds up the function when at least one address
is unaligned, since one word (or doubleword for rv64) is now
loaded at a time, reducing the number of memory accesses
needed.
We still need to store back individual bytes since the
destination address might (still) be unaligned after aligning
the source.
The threshold of SZREG*2 was chosen to keep the negative effect
on shorter copies caused by the additional overhead from aligning
the source low.
This change also affects the case where both addresses are xlen-
aligned, the memory areas overlap destructively, and the length is not
a multiple of SZREG. In the destructive-overlap case, the copying
needs to be done in reverse order, so the length is added
to the addresses first, which causes them to become unaligned.
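The scheme above can be sketched in C, assuming a little-endian target as RISC-V is (copy_src_aligned and the SZREG stand-in are illustrative; the actual change is in assembly):

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define SZREG sizeof(uintptr_t)  /* xlen in bytes, stand-in for the asm constant */

/* Align the *source* to xlen, then load one word at a time but store
 * byte by byte, since the destination may still be unaligned. */
static void *copy_src_aligned(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    if (n > 2 * SZREG) {
        /* byte-copy until the source is word-aligned */
        while ((uintptr_t)s % SZREG) { *d++ = *s++; n--; }
        while (n >= SZREG) {
            uintptr_t w;
            memcpy(&w, s, SZREG);      /* one aligned word load */
            for (size_t i = 0; i < SZREG; i++)
                *d++ = (unsigned char)(w >> (8 * i));  /* little-endian byte stores */
            s += SZREG;
            n -= SZREG;
        }
    }
    while (n--) *d++ = *s++;           /* unaligned head/tail */
    return dst;
}
```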
Reviewed-by: Christian Herber <christian.herber@oss.nxp.com>
Signed-off-by: m fally <marlene.fally@gmail.com>
|
|
Add loop unrolling for the case where both the source and destination
addresses are aligned in the case of a destructive overlap, and
increase the unroll factor from 4 to 9 for the word-by-word
copy loop in the non-destructive case.
This matches the loop unrolling done in memcpy() and increases
performance for lengths >= SZREG*9 while barely degrading
performance for shorter lengths.
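The unrolled word loop can be sketched in C (UNROLL and copy_unrolled are illustrative names; the real code is assembly):

```c
#include <stdint.h>
#include <stddef.h>

#define UNROLL 9   /* unroll factor from the commit, matching memcpy() */

/* Copy UNROLL words per iteration, then mop up the remainder
 * word by word.  The fixed-trip inner loop is fully unrolled by
 * optimizing compilers. */
static void copy_unrolled(uintptr_t *dst, const uintptr_t *src, size_t nwords)
{
    while (nwords >= UNROLL) {
        for (int i = 0; i < UNROLL; i++)
            dst[i] = src[i];
        dst += UNROLL;
        src += UNROLL;
        nwords -= UNROLL;
    }
    while (nwords--)
        *dst++ = *src++;
}
```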
Reviewed-by: Christian Herber <christian.herber@oss.nxp.com>
Signed-off-by: m fally <marlene.fally@gmail.com>
|
|
Replace macros with static inline functions or RISC-V-specific
macros in order to keep consistency between all functions in the port.
Change data types to fixed-width and/or RISC-V specific types.
Reviewed-by: Christian Herber <christian.herber@oss.nxp.com>
Signed-off-by: m fally <marlene.fally@gmail.com>
|
|
Copy the common implementation of memmove() to the RISC-V port.
Rename memmove.S to memmove-asm.S to keep naming of files
consistent between functions. Update Makefile.inc with the changed
filenames.
Reviewed-by: Christian Herber <christian.herber@oss.nxp.com>
Signed-off-by: m fally <marlene.fally@gmail.com>
|
|
strcmp gives an incorrect result for little-endian targets under
the following conditions:
1. The length of the first string is 1 less than a multiple of 4 (i.e. len % 4 == 3)
2. The first string is a prefix of the second string
3. The first differing character in the second string is extended
ASCII (that is, > 127)
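A minimal C check exercising the three reported conditions (the string contents are illustrative):

```c
#include <string.h>

/* strlen(s1) == 3, so len % 4 == 3; s1 is a prefix of s2; and the first
 * differing character in s2 (0x80) is extended ASCII.  strcmp() compares
 * as unsigned char, so a correct implementation must report s1 < s2,
 * while the buggy little-endian path reported an incorrect ordering. */
static const char s1[] = "abc";
static const char s2[] = "abc\x80xyz";
```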
Signed-off-by: Jovan Dmitrović <jovan.dmitrovic@htecgroup.com>
|
|
Improve `strcmp` by using `ext` instruction, if available.
Signed-off-by: Jovan Dmitrović <jovan.dmitrovic@htecgroup.com>
|
|
Fix the prefetching in the core loop to avoid reading past the
memory region being operated on. Revert an accidentally changed
prefetch hint back to streaming mode. Refactor various bits and
provide preprocessor checks to allow parameters to be overridden
from the compiler command line.
Signed-off-by: Jovan Dmitrović <jovan.dmitrovic@htecgroup.com>
|
|
Signed-off-by: Jovan Dmitrović <jovan.dmitrovic@htecgroup.com>
|
|
Implement abstract interface for MIPS, including unified hosting
interface (UHI).
Signed-off-by: Jovan Dmitrović <jovan.dmitrovic@htecgroup.com>
|
|
For aarch64 on ELF targets, the library does not export
fe{enable,disable,get}except as symbols from the library, relying on
static inline functions to provide suitable definitions if required.
But for Cygwin we need to create real definitions to satisfy the DLL
export script.
So arrange for real definitions of these functions when building on Cygwin.
Signed-off-by: Radek Bartoň <radek.barton@microsoft.com>
|
|
GCC 13 does not define the __riscv_misaligned_* built-in macros; they are
supported by GCC 14 and later. Test for __riscv_misaligned_fast to select an
always-correct memcpy() implementation for GCC 13.
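A sketch of the guard this implies (the macro name FAST_UNALIGNED is illustrative, not newlib's):

```c
/* __riscv_misaligned_fast is only defined by GCC 14 and later, so its
 * absence (including on fast-misaligned hardware compiled with GCC 13)
 * must select the always-correct byte-wise path. */
#if defined(__riscv) && defined(__riscv_misaligned_fast)
#define FAST_UNALIGNED 1   /* may use unaligned word accesses */
#else
#define FAST_UNALIGNED 0   /* byte-wise fallback, correct everywhere */
#endif
```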
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
|
|
Zb* extension support
Reworks the mismatch handling to use Zbb's ctz/clz instructions for
faster byte difference detection, significantly improving performance on
Zbb-capable targets. Non-Zbb targets retain the original logic for
compatibility.
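The mismatch handling described above can be sketched in portable C; __builtin_ctzll (GCC/Clang) stands in for Zbb's ctz instruction, and little-endian byte order, as on RISC-V, is assumed:

```c
#include <stdint.h>

/* Find the index of the first differing byte between two words.  On a
 * little-endian target the lowest differing bit lies in the lowest
 * differing memory byte, so a count-trailing-zeros of the XOR divided
 * by 8 gives the byte index directly. */
static int first_diff_byte(uintptr_t a, uintptr_t b)
{
    uintptr_t x = a ^ b;
    if (x == 0)
        return -1;                                       /* words are equal */
    return __builtin_ctzll((unsigned long long)x) / 8;   /* bit index -> byte index */
}
```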
Signed-off-by: puranikvinit <kvp933.vinit@gmail.com>
Reviewed-by: Christian Herber <christian.herber@oss.nxp.com>
|
|
with Zb* extension support
Introduces conditional use of the orc.b instruction from the Zbb
extension for null byte detection, falling back to the original logic
for non-Zbb targets. This reduces cycles in the hot path for supported
architectures.
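A C sketch of both sides of that choice (word_has_nul is an illustrative name). On Zbb, orc.b maps every nonzero byte to 0xFF and every zero byte to 0x00, so a word contains a NUL iff orc.b(w) != ~0; the classic bit trick below computes the same predicate without Zbb:

```c
#include <stdint.h>

/* Portable fallback for the null-byte scan: (w - 0x01..01) & ~w & 0x80..80
 * is nonzero exactly when some byte of w is zero. */
static int word_has_nul(uintptr_t w)
{
    const uintptr_t ones  = (uintptr_t)-1 / 0xFF;  /* 0x0101...01 */
    const uintptr_t highs = ones << 7;             /* 0x8080...80 */
    return ((w - ones) & ~w & highs) != 0;
}
```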
Signed-off-by: puranikvinit <kvp933.vinit@gmail.com>
Reviewed-by: Christian Herber <christian.herber@oss.nxp.com>
|
|
Replaces temporary registers (t0) with compressed registers (a4) in the
null detection loop, reducing instruction count and code size in
speed-optimized builds while maintaining identical logic.
Signed-off-by: puranikvinit <kvp933.vinit@gmail.com>
Reviewed-by: Christian Herber <christian.herber@oss.nxp.com>
|
|
Renames labels in strcmp.S to use descriptive .L prefixes (e.g.,
.Lcompare, .Lreturn_diff) instead of numeric labels (e.g., 1f, 2f). This
improves maintainability and aligns with modern assembly conventions
without affecting functionality.
Signed-off-by: puranikvinit <kvp933.vinit@gmail.com>
Reviewed-by: Christian Herber <christian.herber@oss.nxp.com>
|
|
This patch optimizes the RISC-V setjmp implementation in
newlib/libc/machine/riscv/setjmp.S for 32-bit targets. It reduces code
size by using doubleword store/load instructions (sd/ld) when the Zilsd
or Zclsd extensions are available for saving and
restoring s0-s11 registers, while preserving the original
single-word instructions (REG_S/REG_L) for compatibility with other
configurations.
Signed-off-by: puranikvinit <kvp933.vinit@gmail.com>
Reviewed-by: Christian Herber <christian.herber@oss.nxp.com>
RISC-V: setjmp: reduce code size for register load/store with Zilsd
|
|
Pointer arithmetic overflow is undefined behavior, so use a signed type
to avoid it.
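A hedged sketch of the pattern (find_byte is illustrative, not newlib's code): instead of forming a possibly out-of-range end pointer such as `s + n`, iterate with a signed count so no overflowing pointer is ever computed:

```c
#include <stddef.h>

/* Scan at most n bytes of s for c.  Indexing with a signed ptrdiff_t
 * keeps every computed pointer inside the valid buffer, avoiding the
 * undefined behavior of pointer arithmetic that wraps past the end. */
static const char *find_byte(const char *s, ptrdiff_t n, char c)
{
    for (ptrdiff_t i = 0; i < n; i++)
        if (s[i] == c)
            return s + i;
    return NULL;
}
```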
Signed-off-by: Eric Salem <ericsalem@gmail.com>
|
|
Align the whitespace of the size optimized implementation of memset()
to match the speed optimized version.
Reviewed-by: Christian Herber <christian.herber@oss.nxp.com>
Signed-off-by: Eric Salem <ericsalem@gmail.com>
|
|
The RISC-V Zba, Zbkb, and Zilsd/Zclsd extensions provide instructions
optimized for bit and load/store operations. Use them when available for
the RISC-V port. Also increase loop unrolling for faster performance.
Reviewed-by: Christian Herber <christian.herber@oss.nxp.com>
Signed-off-by: Eric Salem <ericsalem@gmail.com>
|
|
Add a code path for when source and dest are differently aligned.
If misaligned access is slow or prohibited, and the alignments of the
source and destination are different, we align the destination to do
XLEN stores. This uses only one aligned store for every four (or eight
for XLEN == 64) bytes of data.
Reviewed-by: Christian Herber <christian.herber@oss.nxp.com>
Signed-off-by: Mahmoud Abumandour <ma.mandourr@gmail.com>
|
|
Reviewed-by: Christian Herber <christian.herber@oss.nxp.com>
Signed-off-by: Mahmoud Abumandour <ma.mandourr@gmail.com>
|
|
Reviewed-by: Christian Herber <christian.herber@oss.nxp.com>
Signed-off-by: Mahmoud Abumandour <ma.mandourr@gmail.com>
|
|
The RISC-V Zbb, Zbkb, and Zilsd extensions provide instructions
optimized for bit and load/store operations. Use them when available for
the RISC-V port.
Reviewed-by: Christian Herber <christian.herber@oss.nxp.com>
Signed-off-by: Eric Salem <ericsalem@gmail.com>
|
|
Copy stock implementations of memchr() and memrchr() to the RISC-V port.
Reviewed-by: Christian Herber <christian.herber@oss.nxp.com>
Signed-off-by: Eric Salem <ericsalem@gmail.com>
|
|
For architectures where XLEN is 32 bits, a word is read at a time when
detecting a null byte. Once a null is found in the word, its precise
location is then determined. Make clear to the compiler that, if the
first three bytes are not null, the last byte must be null and does not
need to be read from the string, since its value is always zero.
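The observation can be sketched in C for XLEN == 32 (nul_index_in_word is an illustrative name; little-endian byte order, as on RISC-V, is assumed):

```c
#include <stdint.h>

/* Precondition: w is known to contain a zero byte.  Test bytes 0..2
 * explicitly; if none of them is the NUL, byte 3 must be, so it is
 * never loaded or compared. */
static int nul_index_in_word(uint32_t w)
{
    if ((w & 0xFFu) == 0)       return 0;   /* lowest-addressed byte */
    if ((w & 0xFF00u) == 0)     return 1;
    if ((w & 0xFF0000u) == 0)   return 2;
    return 3;                               /* zero by elimination */
}
```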
Reviewed-by: Christian Herber <christian.herber@oss.nxp.com>
Signed-off-by: Eric Salem <ericsalem@gmail.com>
|
|
Update the macro check so that rv64e builds successfully.
Signed-off-by: Eric Salem <ericsalem@gmail.com>
|
|
For architectures where XLEN is 32 bits, a word is read at a time when
detecting a null byte. Once a null is found in the word, its precise
location is then determined. Make clear to the compiler that, if the
first three bytes are not null, the last byte must be null and does not
need to be read from the source string, since its value is always zero.
Reviewed-by: Christian Herber <christian.herber@oss.nxp.com>
Signed-off-by: Eric Salem <ericsalem@gmail.com>
|
|
Replace add instructions with addi where applicable in
the size optimized versions of memmove(), memset(), memcpy(),
and strcmp(). This change does not affect the functions themselves
and is only done to improve syntactic accuracy.
Reviewed-by: Christian Herber <christian.herber@oss.nxp.com>
Signed-off-by: m fally <marlene.fally@gmail.com>
|
|
Rename local labels to improve readability.
Reviewed-by: Christian Herber <christian.herber@oss.nxp.com>
Signed-off-by: m fally <marlene.fally@gmail.com>
|
|
Swap register t1 with a3, so that the affected instructions can be compressed.
Reviewed-by: Christian Herber <christian.herber@oss.nxp.com>
Reviewed-by: m fally <marlene.fally@gmail.com>
Signed-off-by: Eric Salem <ericsalem@gmail.com>
|
|
Replace registers t1 and t2 with registers a3 and a4 respectively,
so that the affected instructions can be compressed.
Reviewed-by: Christian Herber <christian.herber@oss.nxp.com>
Signed-off-by: Mahmoud Abumandour <ma.mandourr@gmail.com>
|
|
Replace lb with lbu to avoid unnecessary sign extension.
Reviewed-by: Christian Herber <christian.herber@oss.nxp.com>
Signed-off-by: Mahmoud Abumandour <ma.mandourr@gmail.com>
|
|
Move the instruction that increments the remaining number of
bytes to be copied between the load and store instructions.
This is done in order to relax the RAW dependency between the
load and store instructions.
Reviewed-by: Christian Herber <christian.herber@oss.nxp.com>
Signed-off-by: m fally <marlene.fally@gmail.com>
|
|
Replace lb with lbu to avoid unnecessary sign extension.
Reviewed-by: Christian Herber <christian.herber@oss.nxp.com>
Signed-off-by: m fally <marlene.fally@gmail.com>
|
|
Since the algorithm in this version of memmove() is different
from the original version, add comments to give a description.
Reviewed-by: Christian Herber <christian.herber@oss.nxp.com>
Reviewed-by: Eric Salem <ericsalem@gmail.com>
Signed-off-by: m fally <marlene.fally@gmail.com>
|
|
Rename local labels so that the structure of the function is clearer.
Reviewed-by: Christian Herber <christian.herber@oss.nxp.com>
Signed-off-by: m fally <marlene.fally@gmail.com>
|
|
Change register t1 to register a4, so that the affected instructions
can be compressed. Since fewer registers are now available, the
following changes need to be made:
In the previous version of this function, a4 was used to hold the offset
that needs to be added to the source and destination addresses before copying
any data in the case of source address > destination address.
Since a4 now holds the destination address, this offset is no longer
calculated. Instead, the value in a2 (the number of bytes to be copied) is
added to the source and destination addresses. Therefore, in the case of
source address > destination address, a value of 1 needs to be subtracted
from both addresses before starting the copying process.
Reviewed-by: Christian Herber <christian.herber@oss.nxp.com>
Signed-off-by: m fally <marlene.fally@gmail.com>
|
|
Replace register t2 with register a5, so that lb/sb instructions can be compressed.
Reviewed-by: Christian Herber <christian.herber@oss.nxp.com>
Signed-off-by: m fally <marlene.fally@gmail.com>
|
|
The sys/asm.h header file is included for certain assembly files, so
move the typedef to a separate header file due to the build breaking on
some systems. Also include the port's string header file (and move and
rename) instead of the system's version.
Addresses: https://sourceware.org/pipermail/newlib/2025/021591.html
Fixes: c3b9bb173c8c ("newlib: riscv: Add XLEN typedef and clean up types")
Reported-by: Jeff Law <jlaw@ventanamicro.com>
Suggested-by: Kito Cheng <kito.cheng@gmail.com>
Signed-off-by: Eric Salem <ericsalem@gmail.com>
|
|
The large code model assumes that data may be far away from the code, so we
must put the address of the target data within the `.text` section;
normally we just put it within the function or near the function to
prevent it from going out of range.
Report from riscv-gnu-toolchain:
https://github.com/riscv-collab/riscv-gnu-toolchain/issues/1699
Verified with riscv-gnu-toolchain with rv64gc.
|
|
Add implementation of stpcpy() to the RISC-V port. Also refactor shared
code between strcpy() and stpcpy() to a common function.
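The refactor can be sketched as follows (copy_string, my_strcpy, and my_stpcpy are illustrative names, not newlib's):

```c
#include <string.h>

/* Shared copy routine returning a pointer to the written terminator;
 * both entry points become thin wrappers over it. */
static char *copy_string(char *dst, const char *src)
{
    while ((*dst = *src++) != '\0')
        dst++;
    return dst;                    /* points at the '\0' just stored */
}

static char *my_strcpy(char *dst, const char *src)
{
    copy_string(dst, src);
    return dst;                    /* strcpy returns the start */
}

static char *my_stpcpy(char *dst, const char *src)
{
    return copy_string(dst, src);  /* stpcpy returns the terminator */
}
```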
Reviewed-by: Christian Herber <christian.herber@oss.nxp.com>
Signed-off-by: Eric Salem <ericsalem@gmail.com>
|
|
The RISC-V Zbb extension provides instructions optimized for bit
operations. Use them when available for the RISC-V port.
Reviewed-by: Christian Herber <christian.herber@oss.nxp.com>
Signed-off-by: Eric Salem <ericsalem@gmail.com>
|
|
The size of the long data type isn't precisely defined in the C
standard, so create a new typedef that uses either uint32_t or uint64_t
based on XLEN. The fixed width types are more robust against any ABI
changes and fit the data types of the intrinsic functions. Use the new
uintxlen_t type instead of long and uintptr_t.
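A sketch of the typedef, under the assumption that __riscv_xlen gives the register width in bits (the non-RISC-V fallback exists only so the sketch compiles on other hosts):

```c
#include <stdint.h>

/* Pick a fixed-width type from XLEN instead of relying on the size of
 * `long`, which the C standard does not pin down. */
#if defined(__riscv) && (__riscv_xlen == 64)
typedef uint64_t uintxlen_t;
#elif defined(__riscv)
typedef uint32_t uintxlen_t;
#else
typedef uintptr_t uintxlen_t;   /* host stand-in for illustration */
#endif
```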
Reviewed-by: Christian Herber <christian.herber@oss.nxp.com>
Signed-off-by: Eric Salem <ericsalem@gmail.com>
|