Age | Commit message (Collapse) | Author | Files | Lines |
|
This helper will be useful in the listener handlers to extract the
VFIO device from a memory region using memory_region_owner(). At the
moment, we only care for PCI passthrough devices. If the need arises,
we will add more.
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250206131438.1505542-5-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
|
|
Rephrase a bit the ending comment about how errors are handled
depending on the phase in which vfio_listener_region_add() is called.
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250206131438.1505542-4-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
|
|
This is to be consistent with other reported errors related to vIOMMU
devices.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250206131438.1505542-3-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
|
|
When iommufd_cdev_ram_block_discard_disable() fails for whatever reason,
errp should be set or else SIGSEV is triggered in vfio_realize() when
error_prepend() is called.
By this chance, use the same error message for both legacy and iommufd
backend.
Fixes: 5ee3dc7af785 ("vfio/iommufd: Implement the iommufd backend")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20250116102307.260849-1-zhenzhong.duan@intel.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
|
|
With the introduction of config_offset field, VFIOConfigMirrorQuirk can
now be used for those mirrored register in igd bar0. This eliminates
the need for the macro intoduced in 1a2623b5c9e7 ("vfio/igd: add macro
for declaring mirrored registers").
Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20250104154219.7209-4-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
|
|
Device may only expose a specific portion of PCI config space through a
region in a BAR, such behavior is seen in igd GGC and BDSM mirrors in
BAR0. To handle these, config_offset is introduced to allow mirroring
arbitrary region in PCI config space.
Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20250104154219.7209-3-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
|
|
Declare generic vfio_generic_{window_address,window_data,mirror}_quirk
in newly created pci_quirks.h so that they can be used elsewhere, like
igd.c.
Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20250104154219.7209-2-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
|
|
The risk is mainly theoretical since the applied bit mask will keep
the 'ggms' shift value below 3. Nevertheless, let's use a 64 bit
integer type and resolve the coverity issue.
Resolves: Coverity CID 1585908
Fixes: 1e1eac5f3dcd ("vfio/igd: canonicalize memory size calculations")
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20250107130604.669697-1-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
|
|
The general expectation is that header files should follow the same
file/path naming scheme as the corresponding source file. There are
various historical exceptions to this practice in QEMU, with one of
the most notable being the include/qapi/qmp/ directory. Most of the
headers there correspond to source files in qobject/.
This patch corrects most of that inconsistency by creating
include/qobject/ and moving the headers for qobject/ there.
This also fixes MAINTAINERS for include/qapi/qmp/dispatch.h:
scripts/get_maintainer.pl now reports "QAPI" instead of "No
maintainers found".
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com> #s390x
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20241118151235.2665921-2-armbru@redhat.com>
[Rebased]
|
|
vfio_pci_size_rom() distinguishes whether rombar is explicitly set to 1
by checking dev->opts, bypassing the QOM property infrastructure.
Use -1 as the default value for rombar to tell if the user explicitly
set it to 1. The property is also converted from unsigned to signed.
-1 is signed so it is safe to give it a new meaning. The values in
[2 ^ 31, 2 ^ 32) become invalid, but nobody should have typed these
values by chance.
Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250104-reuse-v18-13-c349eafd8673@daynix.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
|
|
vfio_devices_all_dirty_tracking() is used to check if dirty page log
sync is needed. However, besides checking the dirty page tracking
status, it also checks the pre_copy_dirty_page_tracking flag.
Rename it to vfio_devices_log_sync_needed() which reflects its purpose
more accurately and makes the code clearer as there are already several
helpers with similar names.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Tested-by: Joao Martins <joao.m.martins@oracle.com>
Link: https://lore.kernel.org/r/20241218134022.21264-5-avihaih@nvidia.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
|
|
During DMA unmap with vIOMMU, vfio_devices_all_running_and_mig_active()
is used to check whether a dirty page log sync of the unmapped pages is
required. Such log sync is needed during migration pre-copy phase, and
the current logic detects it by checking if migration is active and if
the VFIO devices are running.
However, recently there has been an effort to simplify the migration
status API and reduce it to a single migration_is_running() function.
To accommodate this, refactor vfio_devices_all_running_and_mig_active()
logic so it won't use migration_is_active(). Do it by simply checking if
dirty tracking has been started using internal VFIO flags.
This should be equivalent to the previous logic as during migration
dirty tracking is active and when the guest is stopped there shouldn't
be DMA unmaps coming from it.
As a side effect, now that migration status is no longer used, DMA unmap
log syncs are untied from migration. This will make calc-dirty-rate more
accurate as now it will also include VFIO dirty pages that were DMA
unmapped.
Also rename the function to properly reflect its new logic and extract
common code from vfio_devices_all_dirty_tracking().
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Tested-by: Joao Martins <joao.m.martins@oracle.com>
Link: https://lore.kernel.org/r/20241218134022.21264-4-avihaih@nvidia.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
|
|
During dirty page log sync, vfio_devices_all_dirty_tracking() is used to
check if dirty tracking has been started in order to avoid errors. The
current logic checks if migration is in ACTIVE or DEVICE states to
ensure dirty tracking has been started.
However, recently there has been an effort to simplify the migration
status API and reduce it to a single migration_is_running() function.
To accommodate this, refactor vfio_devices_all_dirty_tracking() logic so
it won't use migration_is_active() and migration_is_device(). Instead,
use internal VFIO dirty tracking flags.
As a side effect, now that migration status is no longer used to detect
dirty tracking status, VFIO log syncs are untied from migration. This
will make calc-dirty-rate more accurate as now it will also include VFIO
dirty pages.
While at it, as VFIODevice->dirty_tracking is now used to detect dirty
tracking status, add a comment that states how it's protected.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Tested-by: Joao Martins <joao.m.martins@oracle.com>
Link: https://lore.kernel.org/r/20241218134022.21264-3-avihaih@nvidia.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
|
|
Add a flag to VFIOContainerBase that indicates whether dirty tracking
has been started for the container or not.
This will be used in the following patches to allow dirty page syncs
only if dirty tracking has been started.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Tested-by: Joao Martins <joao.m.martins@oracle.com>
Link: https://lore.kernel.org/r/20241218134022.21264-2-avihaih@nvidia.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
|
|
DSM region is likely to store framebuffer in Windows, a small DSM
region may cause display issues (e.g. half of the screen is black).
Since 971ca22f041b ("vfio/igd: don't set stolen memory size to zero"),
the x-igd-gms option was functionally removed, QEMU uses host's
original value, which is determined by DVMT Pre-Allocated option in
Intel FSP of host bios.
However, some vendors do not expose this config item to users. In
such cases, x-igd-gms option can be used to manually set the data
stolen memory size for guest. So this commit brings this option back,
keeping its old behavior. When it is not specified, QEMU uses host's
value.
When DVMT Pre-Allocated option is available in host BIOS, user should
set DSM region size there instead of using x-igd-gms option.
Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20241206122749.9893-11-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
|
|
A recent commit in i915 driver [1] claims the BDSM register at 0x1080c0
of mmio bar0 has been there since gen 6. Mirror this register to the 32
bit BDSM register at 0x5c in pci config space for gen6-10 devices.
[1] https://patchwork.freedesktop.org/patch/msgid/20240202224340.30647-7-ville.syrjala@linux.intel.com
Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com>
Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20241206122749.9893-10-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
|
|
The GGC register at 0x50 of pci config space is a mirror of the same
register at 0x108040 of mmio bar0 [1]. i915 driver also reads that
register from mmio bar0 instead of config space. As GGC is programmed
and emulated by qemu, the mmio address should also be emulated, in the
same way of BDSM register.
[1] 4.1.28, 12th Generation Intel Core Processors Datasheet Volume 2
https://www.intel.com/content/www/us/en/content-details/655259
Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20241206122749.9893-9-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
|
|
igd devices have multipe registers mirroring mmio address and pci
config space, more than a single BDSM register. To support this,
the read/write functions are made common and a macro is defined to
simplify the declaration of MemoryRegionOps.
Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20241206122749.9893-8-tomitamoeko@gmail.com
[ clg : Fixed conversion specifier on 32-bit platform ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
|
|
All gen 11 and 12 igd devices have 64 bit BDSM register at 0xC0 in its
config space, add them to the list to support igd passthrough on Alder/
Raptor/Rocket/Ice/Jasper Lake platforms.
Tested legacy mode of igd passthrough works properly on both linux and
windows guests with AlderLake-S GT1 (8086:4680).
Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com>
Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20241206122749.9893-7-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
|
|
Both Gemini Lake and Comet Lake are gen 9 devices. Many user reports
on internet shows legacy mode of igd passthrough works as qemu treats
them as gen 8 devices by default before e433f208973f ("vfio/igd:
return an invalid generation for unknown devices").
Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com>
Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20241206122749.9893-6-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
|
|
Add helper functions igd_gtt_memory_size() and igd_stolen_size() for
calculating GTT stolen memory and Data stolen memory size in bytes,
and use macros to replace the hardware-related magic numbers for
better readability.
Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20241206122749.9893-5-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
|
|
Define the igd device generations according to i915 kernel driver to
avoid confusion, and adjust comment placement to clearly reflect the
relationship between ids and devices.
The condition of how GTT stolen memory size is calculated is changed
accordingly as GGMS is in multiple of 2 starting from gen 8.
Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com>
Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20241206122749.9893-4-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
|
|
Since e433f208973f ("vfio/igd: return an invalid generation for unknown
devices"), the default return of igd_gen() was changed to unsupported.
There is no need to filter out those unsupported devices.
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com>
Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Link: https://lore.kernel.org/r/20241206122749.9893-3-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
|
|
On gen 8 and later devices, the GTT stolen memory size when GGMS equals
0 is 0 (no preallocated memory) rather than 1MB [1].
[1] 3.1.13, 5th Generation Intel Core Processor Family Datasheet Vol. 2
https://www.intel.com/content/www/us/en/content-details/330835
Fixes: c4c45e943e51 ("vfio/pci: Intel graphics legacy mode assignment")
Reported-By: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20241206122749.9893-2-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
|
|
Accel & Exec patch queue
- Ignore writes to CNTP_CTL_EL0 on HVF ARM (Alexander)
- Add '-d invalid_mem' logging option (Zoltan)
- Create QOM containers explicitly (Peter)
- Rename sysemu/ -> system/ (Philippe)
- Re-orderning of include/exec/ headers (Philippe)
Move a lot of declarations from these legacy mixed bag headers:
. "exec/cpu-all.h"
. "exec/cpu-common.h"
. "exec/cpu-defs.h"
. "exec/exec-all.h"
. "exec/translate-all"
to these more specific ones:
. "exec/page-protection.h"
. "exec/translation-block.h"
. "user/cpu_loop.h"
. "user/guest-host.h"
. "user/page-protection.h"
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmdlnyAACgkQ4+MsLN6t
# wN6mBw//QFWi7CrU+bb8KMM53kOU9C507tjn99LLGFb5or73/umDsw6eo/b8DHBt
# KIwGLgATel42oojKfNKavtAzLK5rOrywpboPDpa3SNeF1onW+99NGJ52LQUqIX6K
# A6bS0fPdGG9ZzEuPpbjDXlp++0yhDcdSgZsS42fEsT7Dyj5gzJYlqpqhiXGqpsn8
# 4Y0UMxSL21K3HEexlzw2hsoOBFA3tUm2ujNDhNkt8QASr85yQVLCypABJnuoe///
# 5Ojl5wTBeDwhANET0rhwHK8eIYaNboiM9fHopJYhvyw1bz6yAu9jQwzF/MrL3s/r
# xa4OBHBy5mq2hQV9Shcl3UfCQdk/vDaYaWpgzJGX8stgMGYfnfej1SIl8haJIfcl
# VMX8/jEFdYbjhO4AeGRYcBzWjEJymkDJZoiSWp2NuEDi6jqIW+7yW1q0Rnlg9lay
# ShAqLK5Pv4zUw3t0Jy3qv9KSW8sbs6PQxtzXjk8p97rTf76BJ2pF8sv1tVzmsidP
# 9L92Hv5O34IqzBu2oATOUZYJk89YGmTIUSLkpT7asJZpBLwNM2qLp5jO00WVU0Sd
# +kAn324guYPkko/TVnjC/AY7CMu55EOtD9NU35k3mUAnxXT9oDUeL4NlYtfgrJx6
# x1Nzr2FkS68+wlPAFKNSSU5lTjsjNaFM0bIJ4LCNtenJVP+SnRo=
# =cjz8
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 20 Dec 2024 11:45:20 EST
# gpg: using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE
# gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: FAAB E75E 1291 7221 DCFD 6BB2 E3E3 2C2C DEAD C0DE
* tag 'exec-20241220' of https://github.com/philmd/qemu: (59 commits)
util/qemu-timer: fix indentation
meson: Do not define CONFIG_DEVICES on user emulation
system/accel-ops: Remove unnecessary 'exec/cpu-common.h' header
system/numa: Remove unnecessary 'exec/cpu-common.h' header
hw/xen: Remove unnecessary 'exec/cpu-common.h' header
target/mips: Drop left-over comment about Jazz machine
target/mips: Remove tswap() calls in semihosting uhi_fstat_cb()
target/xtensa: Remove tswap() calls in semihosting simcall() helper
accel/tcg: Un-inline translator_is_same_page()
accel/tcg: Include missing 'exec/translation-block.h' header
accel/tcg: Move tcg_cflags_has/set() to 'exec/translation-block.h'
accel/tcg: Restrict curr_cflags() declaration to 'internal-common.h'
qemu/coroutine: Include missing 'qemu/atomic.h' header
exec/translation-block: Include missing 'qemu/atomic.h' header
accel/tcg: Declare cpu_loop_exit_requested() in 'exec/cpu-common.h'
exec/cpu-all: Include 'cpu.h' earlier so MMU_USER_IDX is always defined
target/sparc: Move sparc_restore_state_to_opc() to cpu.c
target/sparc: Uninline cpu_get_tb_cpu_state()
target/loongarch: Declare loongarch_cpu_dump_state() locally
user: Move various declarations out of 'exec/exec-all.h'
...
Conflicts:
hw/char/riscv_htif.c
hw/intc/riscv_aplic.c
target/s390x/cpu.c
Apply sysemu header path changes to not in the pull request.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
|
|
Headers in include/sysemu/ are not only related to system
*emulation*, they are also used by virtualization. Rename
as system/ which is clearer.
Files renamed manually then mechanical change using sed tool.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Lei Yang <leiyang@redhat.com>
Message-Id: <20241203172445.28576-1-philmd@linaro.org>
|
|
Via sed "s/ Property [*]/ const Property */".
The opaque pointers passed to ObjectProperty callbacks are
the last instances of non-const Property pointers in the tree.
For the most part, these callbacks only use object_field_prop_ptr,
which now takes a const pointer itself.
This logically should have accompanied d36f165d952 which
allowed const Property to be registered.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Lei Yang <leiyang@redhat.com>
Link: https://lore.kernel.org/r/20241218134251.4724-25-richard.henderson@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Now that all of the Property arrays are counted, we can remove
the terminator object from each array. Update the assertions
in device_class_set_props to match.
With struct Property being 88 bytes, this was a rather large
form of terminator. Saves 30k from qemu-system-aarch64.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Lei Yang <leiyang@redhat.com>
Link: https://lore.kernel.org/r/20241218134251.4724-21-richard.henderson@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
staging
* Fixes & doc updates for the new "boot order" s390x bios feature
* Provide a "loadparm" property for scsi-hd & scsi-cd devices on s390x
(required for the "boot order" feature)
* Fix the floating-point multiply-and-add NaN rules on s390x
* Raise timeout on cross-accel build jobs to 60m
# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAmc7ercRHHRodXRoQHJl
# ZGhhdC5jb20ACgkQLtnXdP5wLbVjyg//ZuhSDCj+oBSU6vwM7Lwh3CS6GwZvGECU
# h60V3tizKypiRNtTJRXHoWcx95brXmoZgI+QQhDEXe3fFLkOEKT6AIlDhrKZRUsd
# rpLPr6O8TVKO+rSE7JVJAP3X1tpOOQDxnq83uWBv53b0S+Da0VwDRtI9gcugRMmh
# d58P8Q1bV344fQdcrebejstpSUG7RxSA4Plj2uSQx4mSHT7cy/hN+vA34Ha7reE3
# tcN9yfQq3Rmfvt0MV5I9Umd6JXEoDlEAwjSNsWRsCzo69jBZwiMtXSH8LyLtwRTp
# C919G/MIRuhvImF74dStLVCr82sNq54YR1NP6CGcmqPH76FOH8Mx3vmx9Cxj9ckA
# 6NI6SvIg++bW2O1efG2apz8p5fjbDzYXSAbHnaWTcEu3gPgH4PQ5QXoyKaDymvWV
# JIh5/gXEy+twEXgIBsdWQ44A9E06lL/tNfKnqGdXK4ZYF2JIrI+Lq7AKBee7tebP
# +72I4PljHLSHQ3GxdkoOeJ8ahu70IBdSz2/VEIwOWK1wIf5C5WFNBerLJyDmkyx8
# xIvIm0vlRLwPcuOC711nlaMaKqTNT+8W4DIqIY6fHs2Jy0psMdgey1uHQxYEj9Kh
# fg7CvalK8n3MkGAwTqAvRJIwMFe0a4Ss6c6CaemSaYa38ud/pCNnv+IT+Eqr+mjq
# 6y5PZWNrZi0=
# =UaDH
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 18 Nov 2024 17:34:47 GMT
# gpg: using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5
# gpg: issuer "thuth@redhat.com"
# gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full]
# gpg: aka "Thomas Huth <thuth@redhat.com>" [full]
# gpg: aka "Thomas Huth <huth@tuxfamily.org>" [full]
# gpg: aka "Thomas Huth <th.huth@posteo.de>" [unknown]
# Primary key fingerprint: 27B8 8847 EEE0 2501 18F3 EAB9 2ED9 D774 FE70 2DB5
* tag 'pull-request-2024-11-18' of https://gitlab.com/thuth/qemu:
.gitlab-ci.d: Raise timeout on cross-accel build jobs to 60m
pc-bios: Update the s390 bios images with the recent fixes
pc-bios/s390-ccw: Re-initialize receive queue index before each boot attempt
pc-bios/s390x: Initialize machine loadparm before probing IPL devices
pc-bios/s390x: Initialize cdrom type to false for each IPL device
hw: Add "loadparm" property to scsi disk devices for booting on s390x
hw/s390x: Restrict "loadparm" property to devices that can be used for booting
docs/system/bootindex: Make it clear that s390x can also boot from virtio-net
docs/system/s390x/bootdevices: Update loadparm documentation
tests/tcg/s390x: Add the floating-point multiply-and-add test
target/s390x: Fix the floating-point multiply-and-add NaN rules
hw/usb: Use __attribute__((packed)) vs __packed
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
|
|
Commit bb185de423 ("s390x: Add individual loadparm assignment to
CCW device") added a "loadparm" property to all CCW devices. This
was a little bit unfortunate, since this property is only useful
for devices that can be used for booting, but certainly it is not
useful for devices like virtio-gpu or virtio-tablet.
Thus let's restrict the property to CCW devices that we can boot from
(i.e. virtio-block, virtio-net and vfio-ccw devices).
Message-ID: <20241113114741.681096-1-thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Jared Rossi <jrossi@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
|
|
When commit 96b7af4388b3 intoduced a .instance_finalize() handler,
it did not take into account that the container was not necessarily
inserted into the container list of the address space. Hence, if
the container object is destroyed, by calling object_unref() for
example, before vfio_address_space_insert() is called, QEMU may
crash when removing the container from the list as done in
vfio_container_instance_finalize(). This was seen with an SEV-SNP
guest for which discarding of RAM fails.
To resolve this issue, use the safe version of QLIST_REMOVE().
Cc: Zhenzhong Duan <zhenzhong.duan@intel.com>
Cc: Eric Auger <eric.auger@redhat.com>
Fixes: 96b7af4388b3 ("vfio/container: Move vfio_container_destroy() to an instance_finalize() handler")
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
|
|
When copying the calculation of the stolen memory size for Intels integrated
graphics device of gen 9 and later from the Linux kernel [1], we missed
subtracting 0xf0 from the graphics mode select value for values above 0xf0.
This leads to QEMU reporting a very large size of the graphics stolen memory
area. That's just a waste of memory. Additionally the guest firmware might be
unable to allocate such a large buffer.
[1] https://github.com/torvalds/linux/blob/7c626ce4bae1ac14f60076d00eafe71af30450ba/arch/x86/kernel/early-quirks.c#L455-L460
Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Fixes: 871922416683 ("vfio/igd: correctly calculate stolen memory size for gen 9 and later")
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
[ clg: Changed commit subject ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
|
|
I've tested and verified that Coffee Lake devices are working properly.
Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
|
|
This way it is clearly known when there's no more data to send for that
device.
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
|
|
This way both the start and end points of migrating a particular VFIO
device are known.
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
|
|
This helper is mostly the same as migration_is_running(), except that one
has COLO reported as true, the other has CANCELLING reported as true.
Per my past years experience on the state changes, none of them should
matter.
To make it slightly safer, report both COLO || CANCELLING to be true in
migration_is_running(), then drop the other one. We kept the 1st only
because the name is simpler, and clear enough.
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20241024213056.1395400-5-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
|
|
Thanks to work by Peter Xu, support is introduced in Linux v6.12 to
allow pfnmap insertions at PMD and PUD levels of the page table. This
means that provided a properly aligned mmap, the vfio driver is able
to map MMIO at significantly larger intervals than PAGE_SIZE. For
example on x86_64 (the only architecture currently supporting huge
pfnmaps for PUD), rather than 4KiB mappings, we can map device MMIO
using 2MiB and even 1GiB page table entries.
Typically mmap will already provide PMD aligned mappings, so devices
with moderately sized MMIO ranges, even GPUs with standard 256MiB BARs,
will already take advantage of this support. However in order to better
support devices exposing multi-GiB MMIO, such as 3D accelerators or GPUs
with resizable BARs enabled, we need to manually align the mmap.
There doesn't seem to be a way for userspace to easily learn about PMD
and PUD mapping level sizes, therefore this takes the simple approach
to align the mapping to the power-of-two size of the region, up to 1GiB,
which is currently the maximum alignment we care about.
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
|
|
Move error handling code to the end of the function so that it can more
easily be shared by new mmap failure conditions. No functional change
intended.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
|
|
Data sizes in VFIO migration trace events are printed in hex format
while in migration core trace events they are printed in decimal format.
This inconsistency makes it less readable when using both trace event
types. Hence, change the data sizes print format to decimal in VFIO
migration trace events.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
|
|
vfio_state_pending_exact() is used to update migration core how much
device data is left for the device migration. Currently, the sum of
pre-copy and stop-copy sizes of the VFIO device are reported.
The pre-copy size is obtained via the VFIO_MIG_GET_PRECOPY_INFO ioctl,
which returns the amount of device data available to be transferred
while the device is in the PRE_COPY states.
The stop-copy size is obtained via the VFIO_DEVICE_FEATURE_MIG_DATA_SIZE
ioctl, which returns the total amount of device data left to be
transferred in order to complete the device migration.
According to the above, current implementation is wrong -- it reports
extra overlapping data because pre-copy size is already contained in
stop-copy size. Fix it by reporting only stop-copy size.
Fixes: eda7362af959 ("vfio/migration: Add VFIO migration pre-copy support")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
|
|
We have to update the calculation of the stolen memory size because
we've seen devices using values of 0xf0 and above for the graphics mode
select field. The new calculation was taken from the linux kernel [1].
[1] https://github.com/torvalds/linux/blob/7c626ce4bae1ac14f60076d00eafe71af30450ba/arch/x86/kernel/early-quirks.c#L455-L460
Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
|
|
The stolen memory is required for the GOP (EFI) driver and the Windows
driver. While the GOP driver seems to work with any stolen memory size,
the Windows driver will crash if the size doesn't match the size
allocated by the host BIOS. For that reason, it doesn't make sense to
overwrite the stolen memory size. It's true that this wastes some VM
memory. In the worst case, the stolen memory can take up more than a GB.
However, that's uncommon. Additionally, it's likely that a bunch of RAM
is assigned to VMs making use of GPU passthrough.
Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
|
|
ElkhartLake and TigerLake devices were tested in legacy mode with Linux
and Windows VMs. Both are working properly. It's likely that other Intel
GPUs of gen 11 and 12 like IceLake device are working too. However,
we're only adding known good devices for now.
Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
|
|
The BDSM register is mirrored into MMIO space at least for gen 11 and
later devices. Unfortunately, the Windows driver reads the register
value from MMIO space instead of PCI config space for those devices [1].
Therefore, we either have to keep a 1:1 mapping for the host and guest
address or we have to emulate the MMIO register too. Using the igd in
legacy mode is already hard due to it's many constraints. Keeping a 1:1
mapping may not work in all cases and makes it even harder to use. An
MMIO emulation has to trap the whole MMIO page. This makes accesses to
this page slower compared to using second level address translation.
Nevertheless, it doesn't have any constraints and I haven't noticed any
performance degradation yet making it a better solution.
[1] https://github.com/projectacrn/acrn-hypervisor/blob/5c351bee0f6ae46250eefc07f44b4a31e770f3cf/devicemodel/hw/pci/passthrough.c#L650-L653
Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
|
|
Intel changed the location and size of the BDSM register for gen 11
devices and later. We have to adjust our emulation for these devices to
properly support them.
Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
|
|
We're soon going to add support for legacy mode to ElkhartLake and
TigerLake devices. Those are gen 11 and 12 devices. At the moment, all
devices identified by our igd_gen function do support legacy mode. This
won't change when adding our new devices of gen 11 and 12. Therefore, it
makes more sense to accept legacy mode for all known devices instead of
maintaining a long list of known good generations. If we add a new
generation to igd_gen which doesn't support legacy mode for some reason,
it'll be easy to advance the check to reject legacy mode for this
specific generation.
Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
|
|
Intel changes it's specification quite often e.g. the location and size
of the BDSM register has change for gen 11 devices and later. This
causes our emulation to fail on those devices. So, it's impossible for
us to use a suitable default value for unknown devices. Instead of
returning a random generation value and hoping that everthing works
fine, we should verify that different devices are working and add them
to our list of known devices.
Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
|
|
The tracepoint trace_vfio_msix_early_setup() uses "int" for the type
of the table_bar argument, but we use this to print a uint32_t.
Coverity warns that this means that we could end up treating it as a
negative number.
We only use this in printing the value in the tracepoint, so
mishandling it as a negative number would be harmless, but it's
better to use the right type in the tracepoint. Use uint64_t to
match how we print the table_offset in the vfio_msix_relo()
tracepoint.
Resolves: Coverity CID 1547690
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
|
|
Use device_class_set_legacy_reset() instead of opencoding an
assignment to DeviceClass::reset. This change was produced
with:
spatch --macro-file scripts/cocci-macro-file.h \
--sp-file scripts/coccinelle/device-reset.cocci \
--keep-comments --smpl-spacing --in-place --dir hw
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20240830145812.1967042-8-peter.maydell@linaro.org
|