aboutsummaryrefslogtreecommitdiff
path: root/src/stdio
AgeCommit message (Collapse)AuthorFilesLines
2022-12-14prevent invalid reads of nl_arg in printf_coreMarkus Wichmann1-6/+8
printf_core() runs twice, and during its first run, nl_arg is uninitialized and must not be read. It gets initialized at the end of the first run. Conversely, nl_type does not need to be set during the second run, as its useful life has ended at that point, since the only time it is read is during that exact same initialization. Therefore we can simply alternate the assignments. p and w do still need to get values assigned to them, since at least one line in the same if-statement depends on that, but they can be dummy values. arg does not need to be assigned, since in the first run, we encounter a continue statement before using the argument.
2022-10-19fgets: avoid arithmetic overflow when n==INT_MIN is passedRich Felker1-2/+3
performing n-- is not a safe operation for arbitrary signed input n. only perform the decrement in the code path where the initial n is greater than 1, and adjust the condition in the n<=1 code path to compensate for it not having been decremented.
2022-10-19remove LFS64 symbol aliases; replace with dynamic linker remappingRich Felker7-14/+0
originally the namespace-infringing "large file support" interfaces were included as part of glibc-ABI-compat, with the intent that they not be used for linking, since our off_t is and always has been unconditionally 64-bit and since we usually do not aim to support nonstandard interfaces when there is an equivalent standard interface. unfortunately, having the symbols present and available for linking caused configure scripts to detect them and attempt to use them without declarations, producing all the expected ill effects that entails. as a result, commit 2dd8d5e1b8ba1118ff1782e96545cb8a2318592c was made to prevent this, using macros to redirect the LFS64 names to the standard names, conditional on _GNU_SOURCE or _LARGEFILE64_SOURCE. however, this has turned out to be a source of further problems, especially since g++ defines _GNU_SOURCE by default. in particular, the presence of these names as macros breaks a lot of valid code. this commit removes all the LFS64 symbols and replaces them with a mechanism in the dynamic linker symbol lookup failure path to retry with the spurious "64" removed from the symbol name. in the future, if/when the rest of glibc-ABI-compat is moved out of libc, this can be removed.
2022-09-07fix fwprintf missing output to open_wmemstream FILEsRich Felker1-1/+5
open_wmemstream's write method was written assuming no buffering, since it sets the FILE up with buf_len of zero in order to avoid issues with position/seeking. however, as a consequence of commit bd57e2b43a5b56c00a82adbde0e33e5820c81164, a FILE being written to by the printf core has a temporary local buffer for the duration of the operation if it was unbuffered to begin with. since this was disregarded by the wide memstream's write method, output produced through this code path, particularly numeric fields, was missing from the output wchar buffer. copy the equivalent logic for using the buffered data from the byte-oriented open_memstream.
2022-08-17freopen: reset stream orientation (byte/wide) and encoding ruleRich Felker1-0/+2
this is a requirement of the C language (orientation) and POSIX (encoding rule) that was somehow overlooked. we rely on the fact that the buffer pointers have been reset by fflush, so that any future stdio operations on the stream will go through the same code paths they would on a newly-opened file without an orientation set, thereby setting the orientation as they should.
2022-05-01drop use of stat operation in temporary file name generationRich Felker2-10/+6
this change serves two purposes: 1. it eliminates one of the few remaining uses of the kernel stat structure which will not be present in future archs, avoiding the need for growing ifdef logic here. 2. it potentially makes the operations less expensive when the candidate exists as a non-symlink by avoiding the need to read the inode (assuming the directory tables suffice to distinguish symlinks). this uses the idiom I discovered while rewriting realpath for commit 29ff7599a448232f2527841c2362643d246cee36 of being able to use the readlink operation as an inexpensive probe for file existence that doesn't following symlinks.
2022-02-20fix spurious failures by fgetws when buffer ends with partial characterRich Felker1-6/+1
commit a90d9da1d1b14d81c4f93e1a6d1a686c3312e4ba made fgetws look for changes to errno by fgetwc to detect encoding errors, since ISO C did not allow the implementation to set the stream's error flag in this case, and the fgetwc interface did not admit any other way to detect the error. however, the possibility of fgetwc setting errno to EILSEQ in the success path was overlooked, and in fact this can happen if the buffer ends with a partial character, causing mbtowc to be called with only part of the character available. since that change was made, the C standard was amended to specify that fgetwc set the stream error flag on encoding errors, and commit 511d70738bce11a67219d0132ce725c323d00e4e made it do so. thus, there is no longer any need for fgetws to poke at errno to handle encoding errors. this commit reverts commit a90d9da1d1b14d81c4f93e1a6d1a686c3312e4ba and thereby fixes the problem.
2022-01-09make fseek detect and produce an error for invalid whence argumentsRich Felker1-0/+7
this is a POSIX requirement. we previously relied on the underlying fd (or other backend) seek operation to produce the error, but since linux lseek now supports other seek modes (SEEK_DATA and SEEK_HOLE) which do not interact well with stdio buffering, this is insufficient. instead, explicitly check whence before performing any operations.
2021-09-11fix undefined behavior in getdelim via null pointer arithmetic and memcpyRich Felker1-3/+5
both passing a null pointer to memcpy with length 0, and adding 0 to a null pointer, are undefined. in some sense this is 'benign' UB, but having it precludes use of tooling that strictly traps on UB. there may be better ways to fix it, but conditioning the operations which are intended to be no-ops in the k==0 case on k being nonzero is a simple and safe solution.
2021-04-20fix popen not to leak pipes from one child to anotherRich Felker1-0/+6
POSIX places an obscure requirement on popen which is like a limited version of close-on-exec: "The popen() function shall ensure that any streams from previous popen() calls that remain open in the parent process are closed in the new child process." if the POSIX-future 'e' mode flag is passed, producing a pipe FILE with FD_CLOEXEC on the underlying pipe, this requirement is automatically satisfied. however, for applications which use multiple concurrent popen pipes but don't request close-on-exec, fd leaks from earlier popen calls to later ones could produce deadlock situations where processes are waiting for a pipe EOF that will never happen. to fix this, iterate through all open FILEs and add close actions for those obtained from popen. this requires holding a lock on the open file list across the posix_spawn call so that additional popen FILEs are not created after the list is traversed. note that it's still possible for another popen call to start and create its pipe while the lock is held, but such pipes are created with O_CLOEXEC and only drop close-on-exec status (when 'e' flag is omitted) under control of the lock.
2021-04-20remove spurious lock in popenRich Felker1-2/+0
the newly allocated FILE * has not yet leaked to the application and is only visible to stdio internals until popen returns. since we do not change any fields of the structure observed by libc internals, only the pipe_pid member, locking is not necessary.
2021-03-15remove no-longer-needed special case handling in popenRich Felker1-16/+0
popen was special-casing the possibility (only possible when the parent closed stdin and/or stdout) that the child's end of the pipe was already on the final desired fd number, in which case there was no way to get rid of its close-on-exec flag in the child. commit 6fc6ca1a323bc0b6b9e9cdc8fa72221ae18fe206 made this unnecessary by implementing the POSIX-future requirement that dup2 file actions with equal source and destination fd values remove the close-on-exec flag.
2020-11-11lift child restrictions after multi-threaded forkRich Felker1-0/+2
as the outcome of Austin Group tracker issue #62, future editions of POSIX have dropped the requirement that fork be AS-safe. this allows but does not require implementations to synchronize fork with internal locks and give forked children of multithreaded parents a partly or fully unrestricted execution environment where they can continue to use the standard library (per POSIX, they can only portably use AS-safe functions). up until recently, taking this allowance did not seem desirable. however, commit 8ed2bd8bfcb4ea6448afb55a941f4b5b2b0398c0 exposed the extent to which applications and libraries are depending on the ability to use malloc and other non-AS-safe interfaces in MT-forked children, by converting latent very-low-probability catastrophic state corruption into predictable deadlock. dealing with the fallout has been a huge burden for users/distros. while it looks like most of the non-portable usage in applications could be fixed given sufficient effort, at least some of it seems to occur in language runtimes which are exposing the ability to run unrestricted code in the child as part of the contract with the programmer. any attempt at fixing such contracts is not just a technical problem but a social one, and is probably not tractable. this patch extends the fork function to take locks for all libc singletons in the parent, and release or reset those locks in the child, so that when the underlying fork operation takes place, the state protected by these locks is consistent and ready for the child to use. locking is skipped in the case where the parent is single-threaded so as not to interfere with legacy AS-safety property of fork in single-threaded programs. lock order is mostly arbitrary, but the malloc locks (including bump allocator in case it's used) must be taken after the locks on any subsystems that might use malloc, and non-AS-safe locks cannot be taken while the thread list lock is held, imposing a requirement that it be taken last.
2020-10-14move aio implementation details to a proper internal headerRich Felker1-0/+1
also fix the lack of declaration (and thus hidden visibility) in __stdio_close's use of __aio_close.
2020-08-30clean up overinclusion in files using TIOCGWINSZRich Felker2-2/+0
now that struct winsize is available via sys/ioctl.h once again, including termios.h is not needed.
2020-08-24add tcgetwinsize and tcsetwinsize functions, move struct winsizeRich Felker2-0/+2
these have been adopted for future issue of POSIX as the outcome of Austin Group issue 1151, and are simply functions performing the roles of the historical ioctls. since struct winsize is being standardized along with them, its definition is moved to the appropriate header. there is some chance this will break source files that expect struct winsize to be defined by sys/ioctl.h without including termios.h. if this happens, further changes will be needed to have sys/ioctl.h expose it too.
2020-07-02vfscanf: fix possible invalid free due to uninitialized variable useJulien Ramseier1-1/+1
vfscanf() may use the variable 'alloc' uninitialized when taking the branch introduced by commit b287cd745c2243f8e5114331763a5a9813b5f6ee. Spotted by clang.
2020-04-17move __string_read into vsscanf source fileRich Felker2-19/+13
apparently this function was intended at some point to be used by strto* family as well, and thus was put in its own file; however, as far as I can tell, it's only ever been used by vsscanf. move it to the same file to reduce the number of source files and external symbols.
2020-04-17remove spurious repeated semicolon in fmemopenRich Felker1-1/+1
2020-04-17combine two calls to memset in fmemopenRich Felker1-2/+2
this idea came up when I thought we might need to zero the UNGET portion of buf as well, but it seems like a useful improvement even when that turned out not to be necessary.
2020-04-17fix undefined behavior in scanf coreRich Felker1-0/+3
as reported/analyzed by Pascal Cuoq, the shlim and shcnt macros/functions are called by the scanf core (vfscanf) with f->rpos potentially null (if the FILE is not yet activated for reading at the time of the call). in this case, they compute differences between a null pointer (f->rpos) and a non-null one (f->buf), resulting in undefined behavior. it's unlikely that any observably wrong behavior occurred in practice, at least without LTO, due to limits on what's visible to the compiler from translation unit boundaries, but this has not been checked. fix is simply ensuring that the FILE is activated for read mode before entering the main scanf loop, and erroring out early if it can't be.
2020-02-21remove wrap_write helper from vdprintfRich Felker1-6/+1
this reverts commit 4ee039f3545976f9e3e25a7e5d7b58f1f2316dc3, which added the helper as a hack to make vdprintf usable before relocation, contingent on strong assumptions about the arch and tooling, back when the dynamic linker did not have a real staged model for self-relocation. since commit f3ddd173806fd5c60b3f034528ca24542aecc5b9 this has been unnecessary and the function was just wasting size and execution time.
2020-02-12fix remaining direct use of stat syscalls outside fstatat.cRich Felker2-4/+6
because struct stat is no longer assumed to correspond to the structure used by the stat-family syscalls, it's not valid to make any of these syscalls directly using a buffer of type struct stat. commit 9493892021eac4edf1776d945bcdd3f7a96f6978 moved all logic around this change for stat-family functions into fstatat.c, making the others wrappers for it. but a few other direct uses of the syscall were overlooked. the ones in tmpnam/tempnam are harmless since the syscalls are just used to test for file existence. however, the uses in fchmodat and __map_file depend on getting accurate file properties, and these functions may actually have been broken one or more mips variants due to removal of conversion hacks from syscall_arch.h. as a low-risk fix, simply use struct kstat in place of struct stat in the affected places.
2019-10-18fix return value of ungetc when argument is outside unsigned char rangeRich Felker1-1/+1
aside from the special value EOF, ungetc is specified to accept and convert values outside the range of unsigned char. conversion takes place automatically as part of assignment when storing into the buffer, but the return value is also required to be the resulting converted value, and this requirement was not satisfied. simplified from patch by Wang Jianjian.
2019-09-13fix %lf in wprintfBrion Vibber1-0/+2
commit cc3a4466605fe8dfc31f3b75779110ac93055bc1 fixed this for printf but neglected to fix wprintf. Previously, %lf caused a failure to output.
2019-07-16use namespace-safe __lseek for __stdio_seek instead of direct syscallRich Felker1-8/+2
this probably saves a few bytes, avoids duplicating the clunky lseek/_llseek syscall convention in two places, and sets the stage for fixing broken seeks on x32 and mipsn32.
2019-06-25allow fmemopen with zero sizeRich Felker1-1/+1
previously, POSIX erroneously required this to fail with EINVAL despite the traditional glibc implementation, on which the POSIX interface was based, allowing it. the resolution of Austin Group issue 818 removes the requirement to fail.
2019-05-05make fgetwc set error indicator for stream on encoding errorsRich Felker1-2/+8
this is a requirement in POSIX that's omitted, and seemed potentially non-conforming, in the C standard. as such it was omitted here. however, as part of Austin Group issue #1170, the discrepancy was raised with WG14 and determined to be unintended; future versions of the C standard will require the error indicator to be set, as POSIX does.
2019-03-21support archs with no renameat syscall, only renameat2Drew DeVault1-2/+4
2019-03-12setvbuf: return failure if mode is invalidA. Wilcox1-1/+3
POSIX requires setvbuf to return non-zero if `mode` is not one of _IONBF, _IOLBF, or _IOFBF.
2019-02-13fix behavior of gets when input line contains a null byteRich Felker1-3/+8
the way gets was implemented in terms of fgets, it used the location of the null termination to determine where to find and remove the newline, if any. an embedded null byte prevented this from working. this also fixes a one-byte buffer overflow, whereby when gets read an N-byte line (not counting newline), it would store two null terminators for a total of N+2 bytes. it's unlikely that anyone would care that a function whose use is pretty much inherently a buffer overflow writes too much, but it could break the only possible correct uses of this function, in conjunction with input of known format from a trusted/same-privilege-domain source, where the buffer length may have been selected to exactly match a line length contract. there seems to be no correct way to implement gets in terms of a single call to fgets or scanf, and using multiple calls would require explicit locking, so we might as well just write the logic out explicitly character-at-a-time. this isn't fast, but nobody cares if a catastrophically unsafe function that's so bad it was removed from the C language is fast.
2018-11-02fix failure to flush stderr when fflush(0) is calledRich Felker1-1/+4
commit ddc947eda311331959c73dbc4491afcfe2326346 fixed the corresponding bug for exit which was introduced when commit 0b80a7b0404b6e49b0b724e3e3fe0ed5af3b08ef added support for caller-provided buffers, making it possible for stderr to be a buffered stream.
2018-11-02fix deadlock and buffered data loss race in fcloseRich Felker1-13/+19
fflush(NULL) and __stdio_exit lock individual FILEs while holding the open file list lock to walk the list. since fclose first locked the FILE to be closed, then the ofl lock, it could deadlock with these functions. also, because fclose removed the FILE to be closed from the open file list before flushing and closing it, a concurrent fclose or exit could complete successfully before fclose flushed the FILE it was closing, resulting in data loss. reorder the body of fclose to first flush and close the file, then remove it from the open file list only after unlocking it. this creates a window where consumers of the open file list can see dead FILE objects, but in the absence of undefined behavior on the part of the application, such objects will be in an inactive-buffer state and processing them will have no side effects. __unlist_locked_file is also moved so that it's performed only for non-permanent files. this change is not necessary, but preserves consistency (and thereby provides safety/hardening) in the case where an application uses one of the standard streams after closing it while holding an explicit lock on it. such usage is of course undefined behavior.
2018-10-18further optimize getc/putc when locking is neededRich Felker2-10/+10
check whether the lock is free before loading the calling thread's tid. if so, just use a dummy tid value that cannot compare equal to any actual thread id (because it's one bit wider). this also avoids the need to save the tid and pass it to locking_getc or locking_putc, reducing register pressure. this change might slightly hurt the case where the caller already holds the lock, but it does not affect the single-threaded case, and may significantly improve the multi-threaded case, especially on archs where loading the thread pointer is disproportionately expensive like early mips and arm ISA levels. but even on i386 it helps, at least on some machines; I measured roughly a 10-15% improvement.
2018-10-18fix build regression due to missing file for putc changesRich Felker1-0/+22
commit d664061adb4d7f6647ab2059bc351daa394bf5da inadvertently omitted the new file putc.h.
2018-10-18bypass indirection through pointer objects to access stdin/out/errRich Felker3-9/+15
by ABI, the public stdin/out/err macros use extern pointer objects, and this is necessary to avoid copy relocations that would be expensive and make the size of the FILE structure part of the ABI. however, internally it makes sense to access the underlying FILE objects directly. this avoids both an indirection through the GOT to find the address of the stdin/out/err pointer objects (which can't be computed PC-relative because they may have been moved to the main program by copy relocations) and an indirection through the resulting pointer object. in most places this is just a minor optimization, but in the case of getchar and putchar (and the unlocked versions thereof), ipa constant propagation makes all accesses to members of stdin/out PC-relative or GOT-relative, possibly reducing register pressure as well.
2018-10-17optimize hot paths of putc with manual shrink-wrappingRich Felker3-13/+8
this is the analog of commit dd8f02b7dce53d6b1c4282439f1636a2d63bee01, but for putc.
2018-10-17optimize hot paths of getc with manual shrink-wrappingRich Felker4-15/+30
with these changes, in a program that has not created any threads besides the main thread and that has not called f[try]lockfile, getc performs indistinguishably from getc_unlocked. this was measured on several i386 and x86_64 models, and should hold on other archs too simply by the properties of the code generation. the case where the caller already holds the lock (via flockfile) is improved significantly as well (40-60% reduction in time on machines tested) and the case where locking is needed is improved somewhat (roughly 10%). the key technique used here is forcing the non-hot path out-of-line and enabling it to be a tail call. a static noinline function (conditional on __GNUC__) is used rather than the extern hiddens used elsewhere for this purpose, so that the compiler can choose non-default calling conventions, making it possible to tail-call to a callee that takes more arguments than the caller on archs where arguments are passed on the stack or must have space reserved on the stack for spilling the. the tid could just be reloaded via the thread pointer in locking_getc, but that would be ridiculously expensive on some archs where thread pointer load requires a trap or syscall.
2018-10-16move stdio locking MAYBE_WAITERS definition to stdio_impl.hRich Felker2-4/+0
don't repeat definition in two places.
2018-09-18fix race condition in file lockingKaarle Ritvanen1-6/+6
The condition occurs when - thread #1 is holding the lock - thread #2 is waiting for it on __futexwait - thread #1 is about to release the lock and performs a_swap - thread #3 enters the __lockfile function and manages to grab the lock before thread #1 calls __wake, resetting the MAYBE_WAITERS flag - thread #1 calls __wake - thread #2 wakes up but goes again to __futexwait as the lock is held by thread #3 - thread #3 releases the lock but does not call __wake as the MAYBE_WAITERS flag is not set This condition results in thread #2 not being woken up. This patch fixes the problem by making the woken up thread ensure that the flag is properly set before going to sleep again. Mainainer's note: This fixes a regression introduced in commit c21f750727515602a9e84f2a190ee8a0a2aeb2a1.
2018-09-16getdelim: only grow buffer when necessary, improve OOM behaviorRich Felker1-10/+17
commit b114190b29417fff6f701eea3a3b3b6030338280 introduced spurious realloc of the output buffer in cases where the result would exactly fit in the caller-provided buffer. this is contrary to a strict reading of the spec, which only allows realloc when the provided buffer is "of insufficient size". revert the adjustment of the realloc threshold, and instead push the byte read by getc_unlocked (for which the adjustment was made) back into the stdio buffer if it does not fit in the output buffer, to be read in the next loop iteration. in order not to leave a pushed-back byte in the stdio buffer if realloc fails (which would violate the invariant that logical FILE position and underlying open file description offset match for unbuffered FILEs), the OOM code path must be changed. it would suffice move just one byte in this case, but from a QoI perspective, in the event of ENOMEM the entire output buffer (up to the allocated length reported via *n) should contain bytes read from the FILE stream. otherwise the caller has no way to distinguish trunated data from uninitialized buffer space. the SIZE_MAX/2 check is removed since the sum of disjoint object sizes is assumed not to be able to overflow, leaving just one OOM code path.
2018-09-16fix null pointer subtraction and comparison in stdioRich Felker13-29/+39
morally, for null pointers a and b, a-b, a<b, and a>b should all be defined as 0; however, C does not define any of them. the stdio implementation makes heavy use of such pointer comparison and subtraction for buffer logic, and also uses null pos/base/end pointers to indicate that the FILE is not in the corresponding (read or write) mode ready for accesses through the buffer. all of the comparisons are fixed trivially by using != in place of the relational operators, since the opposite relation (e.g. pos>end) is logically impossible. the subtractions have been reviewed to check that they are conditional the stream being in the appropriate reading- or writing-through-buffer mode, with checks added where needed. in fgets and getdelim, the checks added should improve performance for unbuffered streams by avoiding a do-nothing call to memchr, and should be negligible for buffered streams.
2018-09-16fix failure of getdelim to set stream orientation on errorRich Felker1-0/+2
if EINVAL or ENOMEM happened before the first getc_unlocked, it was possible that the stream orientation had not yet been set.
2018-09-12split internal lock API out of libc.h, creating lock.hRich Felker1-1/+1
this further reduces the number of source files which need to include libc.h and thereby be potentially exposed to libc global state and internals. this will also facilitate further improvements like adding an inline fast-path, if we want to do so later.
2018-09-12remove spurious inclusion of libc.h for LFS64 ABI aliasesRich Felker7-14/+7
the LFS64 macro was not self-documenting and barely saved any characters. simply use weak_alias directly so that it's clear what's being done, and doesn't depend on a header to provide a strange macro.
2018-09-12reduce spurious inclusion of libc.hRich Felker27-12/+19
libc.h was intended to be a header for access to global libc state and related interfaces, but ended up included all over the place because it was the way to get the weak_alias macro. most of the inclusions removed here are places where weak_alias was needed. a few were recently introduced for hidden. some go all the way back to when libc.h defined CANCELPT_BEGIN and _END, and all (wrongly implemented) cancellation points had to include it. remaining spurious users are mostly callers of the LOCK/UNLOCK macros and files that use the LFS64 macro to define the awful *64 aliases. in a few places, new inclusion of libc.h is added because several internal headers no longer implicitly include libc.h. declarations for __lockfile and __unlockfile are moved from libc.h to stdio_impl.h so that the latter does not need libc.h. putting them in libc.h made no sense at all, since the macros in stdio_impl.h are needed to use them correctly anyway.
2018-09-12hide purely dependency-triggering functions in stdio __toread & __towriteRich Felker2-2/+2
2018-09-12overhaul internally-public declarations using wrapper headersRich Felker4-8/+4
commits leading up to this one have moved the vast majority of libc-internal interface declarations to appropriate internal headers, allowing them to be type-checked and setting the stage to limit their visibility. the ones that have not yet been moved are mostly namespace-protected aliases for standard/public interfaces, which exist to facilitate implementing plain C functions in terms of POSIX functionality, or C or POSIX functionality in terms of extensions that are not standardized. some don't quite fit this description, but are "internally public" interfacs between subsystems of libc. rather than create a number of newly-named headers to declare these functions, and having to add explicit include directives for them to every source file where they're needed, I have introduced a method of wrapping the corresponding public headers. parallel to the public headers in $(srcdir)/include, we now have wrappers in $(srcdir)/src/include that come earlier in the include path order. they include the public header they're wrapping, then add declarations for namespace-protected versions of the same interfaces and any "internally public" interfaces for the subsystem they correspond to. along these lines, the wrapper for features.h is now responsible for the definition of the hidden, weak, and weak_alias macros. this means source files will no longer need to include any special headers to access these features. over time, it is my expectation that the scope of what is "internally public" will expand, reducing the number of source files which need to include *_impl.h and related headers down to those which are actually implementing the corresponding subsystems, not just using them.
2018-09-12move __stdio_exit_needed to stdio_impl.hRich Felker2-4/+0
this functions is glue for linking dependency logic.
2018-09-12make internal declarations for flockfile tracking functions checkableRich Felker2-4/+0
logically these belong to the intersection of the stdio and pthread subsystems, and either place the declarations could go (stdio_impl.h or pthread_impl.h) requires a forward declaration for one of the argument types.