aboutsummaryrefslogtreecommitdiff
path: root/io
AgeCommit message (Collapse)AuthorFilesLines
2023-09-07io: follow coroutine AioContext in qio_channel_yield()Stefan Hajnoczi7-41/+134
The ongoing QEMU multi-queue block layer effort makes it possible for multiple threads to process I/O in parallel. The nbd block driver is not compatible with the multi-queue block layer yet because QIOChannel cannot be used easily from coroutines running in multiple threads. This series changes the QIOChannel API to make that possible. In the current API, calling qio_channel_attach_aio_context() sets the AioContext where qio_channel_yield() installs an fd handler prior to yielding: qio_channel_attach_aio_context(ioc, my_ctx); ... qio_channel_yield(ioc); // my_ctx is used here ... qio_channel_detach_aio_context(ioc); This API design has limitations: reading and writing must be done in the same AioContext and moving between AioContexts involves a cumbersome sequence of API calls that is not suitable for doing on a per-request basis. There is no fundamental reason why a QIOChannel needs to run within the same AioContext every time qio_channel_yield() is called. QIOChannel only uses the AioContext while inside qio_channel_yield(). The rest of the time, QIOChannel is independent of any AioContext. In the new API, qio_channel_yield() queries the AioContext from the current coroutine using qemu_coroutine_get_aio_context(). There is no need to explicitly attach/detach AioContexts anymore and qio_channel_attach_aio_context() and qio_channel_detach_aio_context() are gone. One coroutine can read from the QIOChannel while another coroutine writes from a different AioContext. This API change allows the nbd block driver to use QIOChannel from any thread. It's important to keep in mind that the block driver already synchronizes QIOChannel access and ensures that two coroutines never read simultaneously or write simultaneously. This patch updates all users of qio_channel_attach_aio_context() to the new API. Most conversions are simple, but vhost-user-server requires a new qemu_coroutine_yield() call to quiesce the vu_client_trip() coroutine when not attached to any AioContext. While the API is has become simpler, there is one wart: QIOChannel has a special case for the iohandler AioContext (used for handlers that must not run in nested event loops). I didn't find an elegant way preserve that behavior, so I added a new API called qio_channel_set_follow_coroutine_ctx(ioc, true|false) for opting in to the new AioContext model. By default QIOChannel uses the iohandler AioHandler. Code that formerly called qio_channel_attach_aio_context() now calls qio_channel_set_follow_coroutine_ctx(ioc, true) once after the QIOChannel is created. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Acked-by: Daniel P. Berrangé <berrange@redhat.com> Message-ID: <20230830224802.493686-5-stefanha@redhat.com> [eblake: also fix migration/rdma.c] Signed-off-by: Eric Blake <eblake@redhat.com>
2023-09-07io: check there are no qio_channel_yield() coroutines during ->finalize()Stefan Hajnoczi1-0/+4
Callers must clean up their coroutines before calling object_unref(OBJECT(ioc)) to prevent an fd handler leak. Add an assertion to check this. This patch is preparation for the fd handler changes that follow. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-ID: <20230830224802.493686-4-stefanha@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2023-08-01io: remove io watch if TLS channel is closed during handshakeDaniel P. Berrangé1-6/+12
The TLS handshake make take some time to complete, during which time an I/O watch might be registered with the main loop. If the owner of the I/O channel invokes qio_channel_close() while the handshake is waiting to continue the I/O watch must be removed. Failing to remove it will later trigger the completion callback which the owner is not expecting to receive. In the case of the VNC server, this results in a SEGV as vnc_disconnect_start() tries to shutdown a client connection that is already gone / NULL. CVE-2023-3354 Reported-by: jiangyegen <jiangyegen@huawei.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2023-05-30aio: remove aio_disable_external() APIStefan Hajnoczi3-8/+4
All callers now pass is_external=false to aio_set_fd_handler() and aio_set_event_notifier(). The aio_disable_external() API that temporarily disables fd handlers that were registered is_external=true is therefore dead code. Remove aio_disable_external(), aio_enable_external(), and the is_external arguments to aio_set_fd_handler() and aio_set_event_notifier(). The entire test-fdmon-epoll test is removed because its sole purpose was testing aio_disable_external(). Parts of this patch were generated using the following coccinelle (https://coccinelle.lip6.fr/) semantic patch: @@ expression ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque; @@ - aio_set_fd_handler(ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque) + aio_set_fd_handler(ctx, fd, io_read, io_write, io_poll, io_poll_ready, opaque) @@ expression ctx, notifier, is_external, io_read, io_poll, io_poll_ready; @@ - aio_set_event_notifier(ctx, notifier, is_external, io_read, io_poll, io_poll_ready) + aio_set_event_notifier(ctx, notifier, io_read, io_poll, io_poll_ready) Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230516190238.8401-21-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-19nbd/server: Fix drained_poll to wake coroutine in right AioContextKevin Wolf1-6/+27
nbd_drained_poll() generally runs in the main thread, not whatever iothread the NBD server coroutine is meant to run in, so it can't directly reenter the coroutines to wake them up. The code seems to have the right intention, it specifies the correct AioContext when it calls qemu_aio_coroutine_enter(). However, this functions doesn't schedule the coroutine to run in that AioContext, but it assumes it is already called in the home thread of the AioContext. To fix this, add a new thread-safe qio_channel_wake_read() that can be called in the main thread to wake up the coroutine in its AioContext, and use this in nbd_drained_poll(). Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230517152834.277483-3-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-04-20io: mark mixed functions that can suspendPaolo Bonzini1-39/+39
There should be no paths from a coroutine_fn to aio_poll, however in practice coroutine_mixed_fn will call aio_poll in the !qemu_in_coroutine() path. By marking mixed functions, we can track accurately the call paths that execute entirely in coroutine context, and find more missing coroutine_fn markers. This results in more accurate checks that coroutine code does not end up blocking. If the marking were extended transitively to all functions that call these ones, static analysis could be done much more efficiently. However, this is a start and makes it possible to use vrc's path-based searches to find potential bugs where coroutine_fns call blocking functions. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-04-12io: tls: Inherit QIO_CHANNEL_FEATURE_SHUTDOWN on server sidePeter Xu1-0/+3
TLS iochannel will inherit io_shutdown() from the master ioc, however we missed to do that on the server side. This will e.g. allow qemu_file_shutdown() to work on dest QEMU too for migration. Acked-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-03-14io/channel-tls: plug memory leakage on GSourceMatheus Tavares Bernardino1-0/+1
This leakage can be seen through test-io-channel-tls: $ ../configure --target-list=aarch64-softmmu --enable-sanitizers $ make ./tests/unit/test-io-channel-tls $ ./tests/unit/test-io-channel-tls Indirect leak of 104 byte(s) in 1 object(s) allocated from: #0 0x7f81d1725808 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144 #1 0x7f81d135ae98 in g_malloc (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x57e98) #2 0x55616c5d4c1b in object_new_with_propv ../qom/object.c:795 #3 0x55616c5d4a83 in object_new_with_props ../qom/object.c:768 #4 0x55616c5c5415 in test_tls_creds_create ../tests/unit/test-io-channel-tls.c:70 #5 0x55616c5c5a6b in test_io_channel_tls ../tests/unit/test-io-channel-tls.c:158 #6 0x7f81d137d58d (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x7a58d) Indirect leak of 32 byte(s) in 1 object(s) allocated from: #0 0x7f81d1725a06 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:153 #1 0x7f81d1472a20 in gnutls_dh_params_init (/lib/x86_64-linux-gnu/libgnutls.so.30+0x46a20) #2 0x55616c6485ff in qcrypto_tls_creds_x509_load ../crypto/tlscredsx509.c:634 #3 0x55616c648ba2 in qcrypto_tls_creds_x509_complete ../crypto/tlscredsx509.c:694 #4 0x55616c5e1fea in user_creatable_complete ../qom/object_interfaces.c:28 #5 0x55616c5d4c8c in object_new_with_propv ../qom/object.c:807 #6 0x55616c5d4a83 in object_new_with_props ../qom/object.c:768 #7 0x55616c5c5415 in test_tls_creds_create ../tests/unit/test-io-channel-tls.c:70 #8 0x55616c5c5a6b in test_io_channel_tls ../tests/unit/test-io-channel-tls.c:158 #9 0x7f81d137d58d (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x7a58d) ... SUMMARY: AddressSanitizer: 49143 byte(s) leaked in 184 allocation(s). The docs for `g_source_add_child_source(source, child_source)` says "source will hold a reference on child_source while child_source is attached to it." Therefore, we should unreference the child source at `qio_channel_tls_read_watch()` after attaching it to `source`. With this change, ./tests/unit/test-io-channel-tls shows no leakages. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Matheus Tavares Bernardino <quic_mathbern@quicinc.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2023-03-13win32: replace closesocket() with close() wrapperMarc-André Lureau1-5/+5
Use a close() wrapper instead, so that we don't need to worry about closesocket() vs close() anymore, let's hope. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Message-Id: <20230221124802.4103554-17-marcandre.lureau@redhat.com>
2023-03-13win32: avoid mixing SOCKET and file descriptor spaceMarc-André Lureau1-3/+3
Until now, a win32 SOCKET handle is often cast to an int file descriptor, as this is what other OS use for sockets. When necessary, QEMU eventually queries whether it's a socket with the help of fd_is_socket(). However, there is no guarantee of conflict between the fd and SOCKET space. Such conflict would have surprising consequences, we shouldn't mix them. Also, it is often forgotten that SOCKET must be closed with closesocket(), and not close(). Instead, let's make the win32 socket wrapper functions return and take a file descriptor, and let util/ wrappers do the fd/SOCKET conversion as necessary. A bit of adaptation is necessary in io/ as well. Unfortunately, we can't drop closesocket() usage, despite _open_osfhandle() documentation claiming transfer of ownership, testing shows bad behaviour if you forget to call closesocket(). Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Message-Id: <20230221124802.4103554-15-marcandre.lureau@redhat.com>
2023-03-13win32/socket: introduce qemu_socket_unselect() helperMarc-André Lureau1-2/+2
A more explicit version of qemu_socket_select() with no events. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Message-Id: <20230221124802.4103554-8-marcandre.lureau@redhat.com>
2023-03-13win32/socket: introduce qemu_socket_select() helperMarc-André Lureau2-5/+5
This is a wrapper for WSAEventSelect, with Error handling. By default, it will produce a warning, so callers don't have to be modified now, and yet we can spot potential mis-use. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Message-Id: <20230221124802.4103554-7-marcandre.lureau@redhat.com>
2023-03-13io: use closesocket()Marc-André Lureau1-3/+3
Because they are actually sockets... Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-Id: <20230221124802.4103554-4-marcandre.lureau@redhat.com>
2023-02-15io/channel-tls: fix handling of bigger read buffersAntoine Damhet1-1/+65
Since the TLS backend can read more data from the underlying QIOChannel we introduce a minimal child GSource to notify if we still have more data available to be read. Signed-off-by: Antoine Damhet <antoine.damhet@shadow.tech> Signed-off-by: Charles Frey <charles.frey@shadow.tech> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2023-02-06io: Add support for MSG_PEEK for socket channelmanish.mishra8-5/+36
MSG_PEEK peeks at the channel, The data is treated as unread and the next read shall still return this data. This support is currently added only for socket class. Extra parameter 'flags' is added to io_readv calls to pass extra read flags like MSG_PEEK. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Suggested-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: manish.mishra <manish.mishra@nutanix.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-10-26io/channel-watch: Fix socket watch on WindowsBin Meng1-4/+0
Random failure was observed when running qtests on Windows due to "Broken pipe" detected by qmp_fd_receive(). What happened is that the qtest executable sends testing data over a socket to the QEMU under test but no response is received. The errno of the recv() call from the qtest executable indicates ETIMEOUT, due to the qmp chardev's tcp_chr_read() is never called to receive testing data hence no response is sent to the other side. tcp_chr_read() is registered as the callback of the socket watch GSource. The reason of the callback not being called by glib, is that the source check fails to indicate the source is ready. There are two socket watch sources created to monitor the same socket event object from the char-socket backend in update_ioc_handlers(). During the source check phase, qio_channel_socket_source_check() calls WSAEnumNetworkEvents() to discover occurrences of network events for the indicated socket, clear internal network event records, and reset the event object. Testing shows that if we don't reset the event object by not passing the event handle to WSAEnumNetworkEvents() the symptom goes away and qtest runs very stably. It seems we don't need to call WSAEnumNetworkEvents() at all, as we don't parse the result of WSANETWORKEVENTS returned from this API. We use select() to poll the socket status. Fix this instability by dropping the WSAEnumNetworkEvents() call. Some side notes: During the testing, I removed the following codes in update_ioc_handlers(): remove_hup_source(s); s->hup_source = qio_channel_create_watch(s->ioc, G_IO_HUP); g_source_set_callback(s->hup_source, (GSourceFunc)tcp_chr_hup, chr, NULL); g_source_attach(s->hup_source, chr->gcontext); and such change also makes the symptom go away. And if I moved the above codes to the beginning, before the call to io_add_watch_poll(), the symptom also goes away. It seems two sources watching on the same socket event object is the key that leads to the instability. The order of adding a source watch seems to also play a role but I can't explain why. Hopefully a Windows and glib expert could explain this behavior. Signed-off-by: Bin Meng <bin.meng@windriver.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2022-10-26io/channel-watch: Drop the unnecessary castBin Meng1-3/+3
There is no need to do a type cast on ssource->socket as it is already declared as a SOCKET. Suggested-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Bin Meng <bin.meng@windriver.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2022-10-26io/channel-watch: Drop a superfluous '#ifdef WIN32'Bin Meng1-2/+0
In the win32 version qio_channel_create_socket_watch() body there is no need to do a '#ifdef WIN32'. Signed-off-by: Bin Meng <bin.meng@windriver.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2022-10-12io/command: implement support for win32Marc-André Lureau1-21/+59
The initial implementation was changing the pipe state created by GLib to PIPE_NOWAIT, but it turns out it doesn't work (read/write returns an error). Since reading may return less than the requested amount, it seems to be non-blocking already. However, the IO operation may block until the FD is ready, I can't find good sources of information, to be safe we can just poll for readiness before. Alternatively, we could setup the FDs ourself, and use UNIX sockets on Windows, which can be used in blocking/non-blocking mode. I haven't tried it, as I am not sure it is necessary. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20221006113657.2656108-6-marcandre.lureau@redhat.com>
2022-10-12io/command: use glib GSpawn, instead of open-coding fork/execMarc-André Lureau1-87/+18
Simplify qio_channel_command_new_spawn() with GSpawn API. This will allow to build for WIN32 in the following patches. As pointed out by Daniel Berrangé: there is a change in semantics here too. The current code only touches stdin/stdout/stderr. Any other FDs which do NOT have O_CLOEXEC set will be inherited. With the new code, all FDs except stdin/out/err will be explicitly closed, because we don't set the flag G_SPAWN_LEAVE_DESCRIPTORS_OPEN. The only place we use QIOChannelCommand today is the migration exec: protocol, and that is only declared to use stdin/stdout. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20221006113657.2656108-5-marcandre.lureau@redhat.com>
2022-09-22io/channel-websock: Replace strlen(const_str) by sizeof(const_str) - 1Philippe Mathieu-Daudé1-1/+1
The combined_key[... QIO_CHANNEL_WEBSOCK_GUID_LEN ...] array in qio_channel_websock_handshake_send_res_ok() expands to a call to strlen(QIO_CHANNEL_WEBSOCK_GUID), and the compiler doesn't realize the string is const, so consider combined_key[] being a variable-length array. To remove the variable-length array, we provide it a hint to the compiler by using sizeof() - 1 instead of strlen(). Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20220819153931.3147384-5-peter.maydell@linaro.org
2022-08-05QIOChannelSocket: Add support for MSG_ZEROCOPY + IPV6Leonardo Bras1-2/+2
For using MSG_ZEROCOPY, there are two steps: 1 - io_writev() the packet, which enqueues the packet for sending, and 2 - io_flush(), which gets confirmation that all packets got correctly sent Currently, if MSG_ZEROCOPY is used to send packets over IPV6, no error will be reported in (1), but it will fail in the first time (2) happens. This happens because (2) currently checks for cmsg_level & cmsg_type associated with IPV4 only, before reporting any error. Add checks for cmsg_level & cmsg_type associated with IPV6, and thus enable support for MSG_ZEROCOPY + IPV6 Fixes: 2bc58ffc29 ("QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX") Signed-off-by: Leonardo Bras <leobras@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2022-07-20QIOChannelSocket: Fix zero-copy flush returning code 1 when nothing sentLeonardo Bras1-1/+7
If flush is called when no buffer was sent with MSG_ZEROCOPY, it currently returns 1. This return code should be used only when Linux fails to use MSG_ZEROCOPY on a lot of sendmsg(). Fix this by returning early from flush if no sendmsg(...,MSG_ZEROCOPY) was attempted. Fixes: 2bc58ffc2926 ("QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX") Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Acked-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20220711211112.18951-2-leobras@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-06-22io: add a QIOChannelNull equivalent to /dev/nullDaniel P. Berrangé3-0/+241
This is for code which needs a portable equivalent to a QIOChannelFile connected to /dev/null. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-06-22QIOChannelSocket: Fix zero-copy send so socket flush worksLeonardo Bras1-0/+5
Somewhere between v6 and v7 the of the zero-copy-send patchset a crucial part of the flushing mechanism got missing: incrementing zero_copy_queued. Without that, the flushing interface becomes a no-op, and there is no guarantee the buffer is really sent. This can go as bad as causing a corruption in RAM during migration. Fixes: 2bc58ffc2926 ("QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX") Reported-by: 徐闯 <xuchuangxclwt@bytedance.com> Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-06-22QIOChannelSocket: Introduce assert and reduce ifdefs to improve readabilityLeonardo Bras1-5/+9
During implementation of MSG_ZEROCOPY feature, a lot of #ifdefs were introduced, particularly at qio_channel_socket_writev(). Rewrite some of those changes so it's easier to read. Also, introduce an assert to help detect incorrect zero-copy usage is when it's disabled on build. Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> dgilbert: Fixed up thinko'd g_assert_unreachable->g_assert_not_reached
2022-05-16QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUXLeonardo Bras1-4/+112
For CONFIG_LINUX, implement the new zero copy flag and the optional callback io_flush on QIOChannelSocket, but enables it only when MSG_ZEROCOPY feature is available in the host kernel, which is checked on qio_channel_socket_connect_sync() qio_channel_socket_flush() was implemented by counting how many times sendmsg(...,MSG_ZEROCOPY) was successfully called, and then reading the socket's error queue, in order to find how many of them finished sending. Flush will loop until those counters are the same, or until some error occurs. Notes on using writev() with QIO_CHANNEL_WRITE_FLAG_ZERO_COPY: 1: Buffer - As MSG_ZEROCOPY tells the kernel to use the same user buffer to avoid copying, some caution is necessary to avoid overwriting any buffer before it's sent. If something like this happen, a newer version of the buffer may be sent instead. - If this is a problem, it's recommended to call qio_channel_flush() before freeing or re-using the buffer. 2: Locked memory - When using MSG_ZERCOCOPY, the buffer memory will be locked after queued, and unlocked after it's sent. - Depending on the size of each buffer, and how often it's sent, it may require a larger amount of locked memory than usually available to non-root user. - If the required amount of locked memory is not available, writev_zero_copy will return an error, which can abort an operation like migration, - Because of this, when an user code wants to add zero copy as a feature, it requires a mechanism to disable it, so it can still be accessible to less privileged users. Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20220513062836.965425-4-leobras@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-05-16QIOChannel: Add flags on io_writev and introduce io_flush callbackLeonardo Bras7-10/+46
Add flags to io_writev and introduce io_flush as optional callback to QIOChannelClass, allowing the implementation of zero copy writes by subclasses. How to use them: - Write data using qio_channel_writev*(...,QIO_CHANNEL_WRITE_FLAG_ZERO_COPY), - Wait write completion with qio_channel_flush(). Notes: As some zero copy write implementations work asynchronously, it's recommended to keep the write buffer untouched until the return of qio_channel_flush(), to avoid the risk of sending an updated buffer instead of the buffer state during write. As io_flush callback is optional, if a subclass does not implement it, then: - io_flush will return 0 without changing anything. Also, some functions like qio_channel_writev_full_all() were adapted to receive a flag parameter. That allows shared code between zero copy and non-zero copy writev, and also an easier implementation on new flags. Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20220513062836.965425-3-leobras@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-05-03util: rename qemu_*block() socket functionsMarc-André Lureau1-3/+3
The qemu_*block() functions are meant to be be used with sockets (the win32 implementation expects SOCKET) Over time, those functions where used with Win32 SOCKET or file-descriptors interchangeably. But for portability, they must only be used with socket-like file-descriptors. FDs can use g_unix_set_fd_nonblocking() instead. Rename the functions with "socket" in the name to prevent bad usages. This is effectively reverting commit f9e8cacc5557e43 ("oslib-posix: rename socket_set_nonblock() to qemu_set_nonblock()"). Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-05-03io: replace qemu_set{_non}block()Marc-André Lureau2-11/+18
Those calls are non-socket fd, or are POSIX-specific. Use the dedicated GLib API. (qemu_set_nonblock() is for socket-like) (this is a preliminary patch before renaming qemu_set_nonblock()) Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2022-05-03io: make qio_channel_command_new_pid() staticMarc-André Lureau1-4/+22
The function isn't used outside of qio_channel_command_new_spawn(), which is !win32-specific. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2022-05-03io: replace pipe() with g_unix_open_pipe(CLOEXEC)Marc-André Lureau1-2/+2
Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2022-04-06Remove qemu-common.h include from most unitsMarc-André Lureau1-1/+0
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20220323155743.1585078-33-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-22Replace GCC_FMT_ATTR with G_GNUC_PRINTFMarc-André Lureau1-1/+1
One less qemu-specific macro. It also helps to make some headers/units only depend on glib, and thus moved in standalone projects eventually. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
2022-03-22Drop qemu_foo() socket API wrapperMarc-André Lureau1-3/+3
The socket API wrappers were initially introduced in commit 00aa0040 ("Wrap recv to avoid warnings"), but made redundant with commit a2d96af4 ("osdep: add wrappers for socket functions") which fixes the win32 declarations and thus removed the earlier warnings. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2022-01-12aio-posix: split poll check from ready handlerStefan Hajnoczi3-4/+8
Adaptive polling measures the execution time of the polling check plus handlers called when a polled event becomes ready. Handlers can take a significant amount of time, making it look like polling was running for a long time when in fact the event handler was running for a long time. For example, on Linux the io_submit(2) syscall invoked when a virtio-blk device's virtqueue becomes ready can take 10s of microseconds. This can exceed the default polling interval (32 microseconds) and cause adaptive polling to stop polling. By excluding the handler's execution time from the polling check we make the adaptive polling calculation more accurate. As a result, the event loop now stays in polling mode where previously it would have fallen back to file descriptor monitoring. The following data was collected with virtio-blk num-queues=2 event_idx=off using an IOThread. Before: 168k IOPS, IOThread syscalls: 9837.115 ( 0.020 ms): IO iothread1/620155 io_submit(ctx_id: 140512552468480, nr: 16, iocbpp: 0x7fcb9f937db0) = 16 9837.158 ( 0.002 ms): IO iothread1/620155 write(fd: 103, buf: 0x556a2ef71b88, count: 8) = 8 9837.161 ( 0.001 ms): IO iothread1/620155 write(fd: 104, buf: 0x556a2ef71b88, count: 8) = 8 9837.163 ( 0.001 ms): IO iothread1/620155 ppoll(ufds: 0x7fcb90002800, nfds: 4, tsp: 0x7fcb9f1342d0, sigsetsize: 8) = 3 9837.164 ( 0.001 ms): IO iothread1/620155 read(fd: 107, buf: 0x7fcb9f939cc0, count: 512) = 8 9837.174 ( 0.001 ms): IO iothread1/620155 read(fd: 105, buf: 0x7fcb9f939cc0, count: 512) = 8 9837.176 ( 0.001 ms): IO iothread1/620155 read(fd: 106, buf: 0x7fcb9f939cc0, count: 512) = 8 9837.209 ( 0.035 ms): IO iothread1/620155 io_submit(ctx_id: 140512552468480, nr: 32, iocbpp: 0x7fca7d0cebe0) = 32 174k IOPS (+3.6%), IOThread syscalls: 9809.566 ( 0.036 ms): IO iothread1/623061 io_submit(ctx_id: 140539805028352, nr: 32, iocbpp: 0x7fd0cdd62be0) = 32 9809.625 ( 0.001 ms): IO iothread1/623061 write(fd: 103, buf: 0x5647cfba5f58, count: 8) = 8 9809.627 ( 0.002 ms): IO iothread1/623061 write(fd: 104, buf: 0x5647cfba5f58, count: 8) = 8 9809.663 ( 0.036 ms): IO iothread1/623061 io_submit(ctx_id: 140539805028352, nr: 32, iocbpp: 0x7fd0d0388b50) = 32 Notice that ppoll(2) and eventfd read(2) syscalls are eliminated because the IOThread stays in polling mode instead of falling back to file descriptor monitoring. As usual, polling is not implemented on Windows so this patch ignores the new io_poll_read() callback in aio-win32.c. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20211207132336.36627-2-stefanha@redhat.com [Fixed up aio_set_event_notifier() calls in tests/unit/test-fdmon-epoll.c added after this series was queued. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-09-30build-sys: add HAVE_IPPROTO_MPTCPMarc-André Lureau1-1/+1
The QAPI schema shouldn't rely on C system headers #define, but on configure-time project #define, so we can express the build condition in a C-independent way. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20210907121943.3498701-3-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-14io: use GDateTime for formatting timestamp for websock headersDaniel P. Berrangé1-8/+2
The GDateTime APIs provided by GLib avoid portability pitfalls, such as some platforms where 'struct timeval.tv_sec' field is still 'long' instead of 'time_t'. When combined with automatic cleanup, GDateTime often results in simpler code too. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2021-06-08sockets: Support multipath TCPDr. David Alan Gilbert1-0/+4
Multipath TCP allows combining multiple interfaces/routes into a single socket, with very little work for the user/admin. It's enabled by 'mptcp' on most socket addresses: ./qemu-system-x86_64 -nographic -incoming tcp:0:4444,mptcp Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210421112834.107651-6-dgilbert@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-06-08io/net-listener: Call the notifier during finalizeDr. David Alan Gilbert1-0/+3
Call the notifier during finalize; it's currently only called if we change it, which is not the intent. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210421112834.107651-3-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-06-08channel-socket: Only set CLOEXEC if we have space for fdsDr. David Alan Gilbert1-4/+4
MSG_CMSG_CLOEXEC cleans up received fd's; it's really only for Unix sockets, but currently we enable it for everything; some socket types (IP_MPTCP) don't like this. Only enable it when we're giving the recvmsg room to receive fd's anyway. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210421112834.107651-2-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-06-02docs: fix references to docs/devel/tracing.rstStefano Garzarella1-1/+1
Commit e50caf4a5c ("tracing: convert documentation to rST") converted docs/devel/tracing.txt to docs/devel/tracing.rst. We still have several references to the old file, so let's fix them with the following command: sed -i s/tracing.txt/tracing.rst/ $(git grep -l docs/devel/tracing.txt) Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210517151702.109066-2-sgarzare@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2021-02-12io: error_prepend() in qio_channel_readv_full_all() causes segfaultJagannathan Raman1-2/+1
Using error_prepend() in qio_channel_readv_full_all() causes a segfault as errp is not set when ret is 0. This results in the failure of iotest 83. Replacing with error_setg() fixes the problem. Additionally, removes a full stop at the end of error message Reported-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Fixes: bebab91ebdfc591f8793a9a17370df1bfbe8b2ca (io: add qio_channel_readv_full_all_eof & qio_channel_readv_full_all helpers) Message-Id: <be476bcdb99e820fec0fa09fe8f04c9dd3e62473.1613128220.git.jag.raman@oracle.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-10io: add qio_channel_readv_full_all_eof & qio_channel_readv_full_all helpersElena Ufimtseva1-20/+81
Adds qio_channel_readv_full_all_eof() and qio_channel_readv_full_all() to read both data and FDs. Refactors existing code to use these helpers. Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Signed-off-by: John G Johnson <john.g.johnson@oracle.com> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Acked-by: Daniel P. Berrangé <berrange@redhat.com> Message-id: b059c4cc0fb741e794d644c144cc21372cad877d.1611938319.git.jag.raman@oracle.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-02-10io: add qio_channel_writev_full_all helperElena Ufimtseva1-1/+14
Adds qio_channel_writev_full_all() to transmit both data and FDs. Refactors existing code to use this helper. Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Signed-off-by: John G Johnson <john.g.johnson@oracle.com> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Daniel P. Berrangé <berrange@redhat.com> Message-id: 480fbf1fe4152495d60596c9b665124549b426a5.1611938319.git.jag.raman@oracle.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-01-13Merge remote-tracking branch 'remotes/armbru/tags/pull-yank-2021-01-13' into ↵Peter Maydell1-2/+4
staging Yank patches patches for 2021-01-13 # gpg: Signature made Wed 13 Jan 2021 09:25:46 GMT # gpg: using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653 # gpg: issuer "armbru@redhat.com" # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full] # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" [full] # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * remotes/armbru/tags/pull-yank-2021-01-13: tests/test-char.c: Wait for the chardev to connect in char_socket_client_dupid_test io: Document qmp oob suitability of qio_channel_shutdown and io_shutdown io/channel-tls.c: make qio_channel_tls_shutdown thread-safe migration: Add yank feature chardev/char-socket.c: Add yank feature block/nbd.c: Add yank feature Introduce yank feature Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-01-13io/channel-tls.c: make qio_channel_tls_shutdown thread-safeLukas Straub1-2/+4
Make qio_channel_tls_shutdown thread-safe by using atomics when accessing tioc->shutdown. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <5bd8733f583f3558b32250fd0eb576b7aa756485.1609167865.git.lukasstraub2@web.de> Signed-off-by: Markus Armbruster <armbru@redhat.com>
2021-01-12meson: Propagate gnutls dependencyRoman Bolshakov1-1/+1
crypto/tlscreds.h includes GnuTLS headers if CONFIG_GNUTLS is set, but GNUTLS_CFLAGS, that describe include path, are not propagated transitively to all users of crypto and build fails if GnuTLS headers reside in non-standard directory (which is a case for homebrew on Apple Silicon). Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com> Message-Id: <20210102125213.41279-1-r.bolshakov@yadro.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-29io: Don't use '#' flag of printf formatAlexChen1-1/+1
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: AlexChen <alex.chen@huawei.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-10-29io: Fix Lesser GPL version numberChetan Pant11-11/+11
There is no "version 2" of the "Lesser" General Public License. It is either "GPL version 2.0" or "Lesser GPL version 2.1". This patch replaces all occurrences of "Lesser GPL version 2" with "Lesser GPL version 2.1" in comment section. Signed-off-by: Chetan Pant <chetan4windows@gmail.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>