path: root/hw/nvme/ctrl.c
Age | Commit message | Author | Files | Lines
2023-09-21 | Merge tag 'pull-trivial-patches' of https://gitlab.com/mjt0k/qemu into staging | Stefan Hajnoczi | 1 file | -3/+3
trivial patches for 2023-09-21
# gpg: Signature made Thu 21 Sep 2023 04:33:18 EDT
# gpg: using RSA key 7B73BAD68BE7A2C289314B22701B4F6B1A693E59
# gpg: issuer "mjt@tls.msk.ru"
# gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>" [full]
# gpg: aka "Michael Tokarev <mjt@corpit.ru>" [full]
# gpg: aka "Michael Tokarev <mjt@debian.org>" [full]
# Primary key fingerprint: 6EE1 95D1 886E 8FFB 810D 4324 457C E0A0 8044 65C5
# Subkey fingerprint: 7B73 BAD6 8BE7 A2C2 8931 4B22 701B 4F6B 1A69 3E59
* tag 'pull-trivial-patches' of https://gitlab.com/mjt0k/qemu:
  docs/devel/reset.rst: Correct function names
  docs/cxl: Cleanout some more aarch64 examples.
  hw/mem/cxl_type3: Add missing copyright and license notice
  hw/cxl: Fix out of bound array access
  docs/cxl: Change to lowercase as others
  hw/cxl/cxl_device: Replace magic number in CXLError definition
  hw/pci-bridge/cxl_upstream: Fix bandwidth entry base unit for SSLBIS
  hw/cxl: Fix CFMW config memory leak
  hw/i386/pc: fix code comment on cumulative flash size
  subprojects: Use the correct .git suffix in the repository URLs
  hw/other: spelling fixes
  hw/tpm: spelling fixes
  hw/pci: spelling fixes
  hw/net: spelling fixes
  i386: spelling fixes
  bsd-user: spelling fixes
  ppc: spelling fixes
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-09-21 | hw/other: spelling fixes | Michael Tokarev | 1 file | -3/+3
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
2023-09-20 | block: remove AIOCBInfo->get_aio_context() | Stefan Hajnoczi | 1 file | -7/+0
The synchronous bdrv_aio_cancel() function needs the acb's AioContext so it can call aio_poll() to wait for cancellation. It turns out that all users run under the BQL in the main AioContext, so this callback is not needed. Remove the callback, mark bdrv_aio_cancel() GLOBAL_STATE_CODE just like its blk_aio_cancel() caller, and poll the main loop AioContext. The purpose of this cleanup is to identify bdrv_aio_cancel() as an API that does not work with the multi-queue block layer. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20230912231037.826804-2-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-09-12 | hw/nvme: Avoid dynamic stack allocation | Peter Maydell | 1 file | -1/+1
Instead of using a variable-length array in nvme_map_prp(), allocate the buffer on the heap with a g_autofree pointer. The codebase has very few VLAs, and if we can get rid of them all we can make the compiler error on new additions. This is a defensive measure against security bugs where an on-stack dynamic allocation isn't correctly size-checked (e.g. CVE-2021-3527). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
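A minimal sketch of the pattern this commit describes, assuming GLib's g_new/g_autofree; the function and variable names are illustrative, not the actual nvme_map_prp() code:

    #include <glib.h>
    #include <stdint.h>

    /* Replace a variable-length array with a heap buffer that GLib frees
     * automatically when the pointer goes out of scope. */
    static void fill_prp_list_sketch(unsigned int nents)
    {
        /* before: uint64_t prp_list[nents];   (VLA, triggers -Wvla) */
        g_autofree uint64_t *prp_list = g_new(uint64_t, nents);

        for (unsigned int i = 0; i < nents; i++) {
            prp_list[i] = 0;   /* stand-in for reading the PRP entries */
        }
        /* no explicit g_free(): g_autofree releases prp_list on return */
    }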
2023-09-12 | hw/nvme: Use #define to avoid variable length array | Philippe Mathieu-Daudé | 1 file | -1/+1
In nvme_map_sgl() we create an array segment[] whose size is the 'const int SEG_CHUNK_SIZE'. Since this is C, rather than C++, a "const int foo" is not a true constant, it's merely a variable with a constant value, and so semantically segment[] is a variable-length array. Switch SEG_CHUNK_SIZE to a #define so that we can make the segment[] array truly fixed-size, in the sense that it doesn't trigger the -Wvla warning. The codebase has very few VLAs, and if we can get rid of them all we can make the compiler error on new additions. This is a defensive measure against security bugs where an on-stack dynamic allocation isn't correctly size-checked (e.g. CVE-2021-3527). [PMM: rebased (function has moved file), expand commit message based on discussion from previous version of patch] Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
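A short sketch of the C rule at play; the names and the descriptor type are illustrative stand-ins (the real SEG_CHUNK_SIZE and NvmeSglDescriptor live in the QEMU sources):

    /* With 'const int', the array is formally a VLA in C:
     *   const int SEG_CHUNK_SIZE = 256;
     *   NvmeSglDescriptor segment[SEG_CHUNK_SIZE];    // -Wvla fires
     */

    /* With a macro, the size is an integer constant expression and the
     * array is a true fixed-size array: */
    #define SEG_CHUNK_SIZE 256

    struct sgl_desc_sketch { unsigned char raw[16]; };   /* stand-in type */

    static struct sgl_desc_sketch segment[SEG_CHUNK_SIZE];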
2023-08-09 | hw/nvme: fix null pointer access in ruh update | Klaus Jensen | 1 file | -1/+7
The Reclaim Unit Update operation in I/O Management Receive does not verify the presence of a configured endurance group prior to accessing it. Fix this. Cc: qemu-stable@nongnu.org Fixes: 73064edfb864 ("hw/nvme: flexible data placement emulation") Reviewed-by: Jesper Wendel Devantier <j.devantier@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-08-09 | hw/nvme: fix null pointer access in directive receive | Klaus Jensen | 1 file | -1/+1
nvme_directive_receive() does not check if an endurance group has been configured (set) prior to testing if flexible data placement is enabled or not. Fix this. Cc: qemu-stable@nongnu.org Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1815 Fixes: 73064edfb864 ("hw/nvme: flexible data placement emulation") Reviewed-by: Jesper Wendel Devantier <j.devantier@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
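A simplified sketch of the fix pattern (guard the endurance-group pointer before dereferencing it); the struct layout and status value are stand-ins, not the real ctrl.c definitions:

    #include <stdint.h>

    struct endgrp_sketch { int fdp_enabled; };
    struct ns_sketch     { struct endgrp_sketch *endgrp; };

    static uint16_t directive_receive_sketch(struct ns_sketch *ns)
    {
        /* before: if (!ns->endgrp->fdp_enabled) { ... }  -- NULL dereference
         * when no endurance group has been configured */
        if (!ns->endgrp || !ns->endgrp->fdp_enabled) {
            return 0x0002;   /* stand-in for an "invalid field" status code */
        }
        return 0;            /* success: the rest of the handler runs here */
    }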
2023-08-07 | hw/nvme: fix compliance issue wrt. iosqes/iocqes | Klaus Jensen | 1 file | -34/+12
Prior to this patch, the controller checks the value of CC.IOCQES and CC.IOSQES before enabling the controller. As reported by Ben in GitLab issue #1691, this is not spec compliant. The controller should only check these values when queues are created. This patch moves these checks to nvme_create_cq(). We do not need to check it in nvme_create_sq() since that will error out if the completion queue is not already created. Also, since the controller exclusively supports SQEs of size 64 bytes and CQEs of size 16 bytes, hard code that. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1691 Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-08-07 | hw/nvme: fix oob memory read in fdp events log | Klaus Jensen | 1 file | -0/+5
As reported by Trend Micro's Zero Day Initiative, an oob memory read vulnerability exists in nvme_fdp_events(). The host-provided offset is not verified. Fix this. This is only exploitable when Flexible Data Placement mode (fdp=on) is enabled. Fixes: CVE-2023-4135 Fixes: 73064edfb864 ("hw/nvme: flexible data placement emulation") Reported-by: Trend Micro's Zero Day Initiative Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
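A hedged sketch of this class of fix: validate the host-supplied offset against the log size before copying; the names and error convention are illustrative, not the actual nvme_fdp_events() code:

    #include <stdint.h>
    #include <string.h>

    static int fdp_events_copy_sketch(uint8_t *dst, size_t dst_len,
                                      const uint8_t *log, size_t log_len,
                                      uint64_t host_off)
    {
        if (host_off > log_len) {
            return -1;                     /* reject out-of-range offsets */
        }
        size_t avail = log_len - host_off;
        size_t n = avail < dst_len ? avail : dst_len;
        memcpy(dst, log + host_off, n);    /* read stays inside the log buffer */
        return 0;
    }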
2023-07-30 | hw/nvme: use stl/ldl pci dma api | Klaus Jensen | 1 file | -29/+13
Use the stl/ldl pci dma api for writing/reading doorbells. This removes the explicit endian conversions. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Cédric Le Goater <clg@redhat.com> Tested-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
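A sketch assuming QEMU's PCI DMA load/store helpers (ldl_le_pci_dma/stl_le_pci_dma); it is not standalone, the exact signatures can vary between QEMU versions, and the function and variable names here are illustrative:

    /* The _le_ helpers read/write a 32-bit little-endian value and do the
     * byte swapping internally, so explicit le32_to_cpu()/cpu_to_le32()
     * conversions around pci_dma_read()/pci_dma_write() can be dropped. */
    static uint32_t read_doorbell_sketch(PCIDevice *pci, dma_addr_t addr)
    {
        uint32_t v;

        /* before: pci_dma_read(pci, addr, &v, sizeof(v)); v = le32_to_cpu(v); */
        ldl_le_pci_dma(pci, addr, &v, MEMTXATTRS_UNSPECIFIED);
        return v;
    }

    static void write_eventidx_sketch(PCIDevice *pci, dma_addr_t addr, uint32_t v)
    {
        /* before: v = cpu_to_le32(v); pci_dma_write(pci, addr, &v, sizeof(v)); */
        stl_le_pci_dma(pci, addr, v, MEMTXATTRS_UNSPECIFIED);
    }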
2023-07-19 | hw/nvme: fix endianness issue for shadow doorbells | Klaus Jensen | 1 file | -5/+13
In commit 2fda0726e514 ("hw/nvme: fix missing endian conversions for doorbell buffers"), we fixed shadow doorbells for big-endian guests running on little endian hosts. But I did not fix little-endian guests on big-endian hosts. Fix this. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1765 Fixes: 3f7fe8de3d49 ("hw/nvme: Implement shadow doorbell buffer support") Cc: qemu-stable@nongnu.org Reported-by: Thomas Huth <thuth@redhat.com> Tested-by: Cédric Le Goater <clg@redhat.com> Tested-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-07-10 | pcie: Use common ARI next function number | Akihiko Odaki | 1 file | -1/+1
Currently the only implementers of ARI are SR-IOV devices, and they behave similarly. Share the ARI next function number. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Ani Sinha <anisinha@redhat.com> Message-Id: <20230710153838.33917-2-akihiko.odaki@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-28 | hw/nvme: check maximum copy length (MCL) for COPY | Minwoo Im | 1 file | -0/+24
MCL (Maximum Copy Length) in the Identify Namespace data structure limits the number of LBAs that can be copied inside the controller. We have not checked it at all, so add the check and return the proper error status. Signed-off-by: Minwoo Im <minwoo.im@samsung.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
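A simplified sketch of the added check: sum the source-range lengths of the Copy command and fail if the total exceeds the advertised MCL; names, types and the error convention are stand-ins:

    #include <stdint.h>

    static int check_mcl_sketch(const uint32_t *range_nlb, int nr, uint32_t mcl)
    {
        uint64_t total = 0;

        for (int i = 0; i < nr; i++) {
            total += range_nlb[i];         /* LBAs requested by each source range */
        }
        return total > mcl ? -1 : 0;       /* -1: fail the command with an error status */
    }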
2023-06-28 | hw/nvme: consider COPY command in nvme_aio_err | Minwoo Im | 1 file | -0/+1
If the switch statement in nvme_aio_err() does not handle NVME_CMD_COPY, the command falls through to NVME_INTERNAL_DEV_ERROR and `req->status` is overwritten with it. During the aio callback, the NVMe status field may already have been set to something like NVME_CMD_SIZE_LIMIT, but nvme_aio_err() then overwrites it. Handle NVME_CMD_COPY so the status is not overwritten at the end of the function. Signed-off-by: Minwoo Im <minwoo.im@samsung.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
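A hedged, simplified sketch of the described behaviour: give the copy opcode its own case so an error status recorded earlier by the copy state machine is not clobbered by the generic internal-error fallback. All names and constants are stand-ins, not the real nvme_aio_err() logic:

    enum { OP_READ, OP_WRITE, OP_COPY };
    enum { ST_SUCCESS = 0, ST_SIZE_LIMIT, ST_INTERNAL_ERROR };

    struct req_sketch { int opcode; int status; };

    static void aio_err_sketch(struct req_sketch *req)
    {
        switch (req->opcode) {
        case OP_READ:
        case OP_WRITE:
            req->status = ST_INTERNAL_ERROR;   /* generic fallback */
            break;
        case OP_COPY:
            /* keep the status already recorded (e.g. ST_SIZE_LIMIT) */
            break;
        default:
            req->status = ST_INTERNAL_ERROR;
            break;
        }
    }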
2023-06-28 | hw/nvme: add comment for nvme-ns properties | Minwoo Im | 1 file | -1/+8
Add more comments for the existing properties of the nvme-ns device. Signed-off-by: Minwoo Im <minwoo.im@samsung.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-04-28 | hw: replace most qemu_bh_new calls with qemu_bh_new_guarded | Alexander Bulekov | 1 file | -2/+4
This protects devices from bh->mmio reentrancy issues. Thanks: Thomas Huth <thuth@redhat.com> for diagnosing OS X test failure. Signed-off-by: Alexander Bulekov <alxndr@bu.edu> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Paul Durrant <paul@xen.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-Id: <20230427211013.2994127-5-alxndr@bu.edu> Signed-off-by: Thomas Huth <thuth@redhat.com>
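A sketch assuming QEMU's bottom-half API (qemu_bh_new_guarded and the DeviceState reentrancy guard); it is not standalone, and ExampleIocb/example_bh_cb are hypothetical names introduced here for illustration:

    typedef struct ExampleIocb { QEMUBH *bh; } ExampleIocb;   /* illustrative */

    static void example_bh_cb(void *opaque) { (void)opaque; } /* illustrative */

    static void arm_bh_sketch(DeviceState *dev, ExampleIocb *iocb)
    {
        /* before: iocb->bh = qemu_bh_new(example_bh_cb, iocb); */
        iocb->bh = qemu_bh_new_guarded(example_bh_cb, iocb,
                                       &dev->mem_reentrancy_guard);
        qemu_bh_schedule(iocb->bh);   /* BH cannot re-enter the device's MMIO handler */
    }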
2023-04-20 | nvme: remove constant argument to tracepoint | Paolo Bonzini | 1 file | -3/+1
The last argument to the pci_nvme_err_startfail_virt_state tracepoint is always "OFFLINE" due to the enclosing "if" condition requiring !sctrl->scs. Reported by Coverity. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-04-12 | hw/nvme: fix memory leak in nvme_dsm | Klaus Jensen | 1 file | -0/+3
The iocb (and the allocated memory to hold LBA ranges) leaks if reading the LBA ranges fails. Fix this by adding a free and an unref of the iocb. Reported-by: Coverity (CID 1508281) Fixes: d7d1474fd85d ("hw/nvme: reimplement dsm to allow cancellation") Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-03-27 | hw/nvme: fix missing DNR on compare failure | Klaus Jensen | 1 file | -3/+3
Even if the host is somehow using compare to do compare-and-write, the host should be notified immediately about the compare failure and not have to wait for the driver to potentially retry the command. Fixes: 0a384f923f51 ("hw/block/nvme: add compare command") Reported-by: Jim Harris <james.r.harris@intel.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-03-27 | hw/nvme: Change alignment in dma functions for nvme_blk_* | Mateusz Kozlowski | 1 file | -10/+10
Since nvme_blk_read/write are used by both the data and metadata portions of the IO, they can't have the 512B alignment requirement. Without this change, any metadata transfer whose length isn't a multiple of 512B and is bigger than 512B will result in only a partial transfer. Signed-off-by: Mateusz Kozlowski <kozlowski.mateuszpl@gmail.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-03-06 | hw/nvme: flexible data placement emulation | Jesper Devantier | 1 file | -3/+695
Add emulation of TP4146 ("Flexible Data Placement"). Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jesper Devantier <j.devantier@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-03-06 | hw/nvme: basic directives support | Gollu Appalanaidu | 1 file | -1/+39
Add support for the Directive Send and Recv commands and the Identify directive. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-03-06 | hw/nvme: add basic endurance group support | Klaus Jensen | 1 file | -1/+51
Add the mandatory Endurance Group identify data structures and log pages. For now, all namespaces in a subsystem belong to a single Endurance Group. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-03-06 | hw/nvme: move adjustment of data_units{read,written} | Joel Granados | 1 file | -6/+8
Move the rounding of bytes read/written into nvme_smart_log which reports in units of 512 bytes, rounded up in thousands. This is in preparation for adding the Endurance Group Information log page which reports in units of billions, rounded up. Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-01-11 | hw/nvme: cleanup error reporting in nvme_init_pci() | Klaus Jensen | 1 file | -13/+12
Replace the local Error variable with errp and ERRP_GUARD() and change the return value to bool. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-01-11 | hw/nvme: clean up confusing use of errp/local_err | Klaus Jensen | 1 file | -25/+23
Remove an unnecessary local Error value in nvme_realize(). In the process, change nvme_check_constraints() to return a bool. Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-01-09 | hw/nvme: fix missing cq eventidx update | Klaus Jensen | 1 file | -0/+10
Prior to reading the shadow doorbell cq head, we have to update the eventidx. Otherwise, we risk that the driver will skip an mmio doorbell write. This happens on riscv64, as reported by Guenter. Adding the missing update to the cq eventidx fixes the issue. Fixes: 3f7fe8de3d49 ("hw/nvme: Implement shadow doorbell buffer support") Cc: qemu-stable@nongnu.org Cc: qemu-riscv@nongnu.org Reported-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-01-09 | hw/nvme: fix missing endian conversions for doorbell buffers | Klaus Jensen | 1 file | -6/+13
The eventidx and doorbell values are not handled with the correct endianness. Fix this. Fixes: 3f7fe8de3d49 ("hw/nvme: Implement shadow doorbell buffer support") Cc: qemu-stable@nongnu.org Reported-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-01-09 | hw/nvme: rename shadow doorbell related trace events | Klaus Jensen | 1 file | -4/+7
Rename the trace events related to writing the event index and reading the doorbell value to make it more clear that the event is associated with an actual update (write or read respectively). Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-01-09 | hw/nvme: use QOM accessors | Klaus Jensen | 1 file | -41/+48
Replace various ->parent_obj use with the equivalent QOM accessors. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2022-12-14 | Drop more useless casts from void * to pointer | Markus Armbruster | 1 file | -2/+2
Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20221123133811.1398562-1-armbru@redhat.com>
2022-12-01 | hw/nvme: remove copy bh scheduling | Klaus Jensen | 1 file | -49/+14
Fix a potential use-after-free by removing the bottom half and enqueuing the completion directly. Fixes: 796d20681d9b ("hw/nvme: reimplement the copy command to allow aio cancellation") Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2022-12-01 | hw/nvme: fix aio cancel in dsm | Klaus Jensen | 1 file | -26/+8
When the DSM operation is cancelled asynchronously, we set iocb->ret to -ECANCELED. However, the callback function only checks the return value of the completed aio, which may have completed successfully prior to the cancellation, and thus the callback ends up continuing the dsm operation instead of bailing out. Fix this. Secondly, fix a potential use-after-free by removing the bottom half and enqueuing the completion directly. Fixes: d7d1474fd85d ("hw/nvme: reimplement dsm to allow cancellation") Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
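A simplified, self-contained sketch of the cancellation pattern these dsm/zone-reset/flush fixes converge on: the per-command iocb records -ECANCELED, and each per-AIO callback checks it before issuing the next step instead of only checking its own return value. All names are stand-ins:

    #include <errno.h>

    struct iocb_sketch {
        int ret;          /* set to -ECANCELED when the command is cancelled */
        int next_step;
        int nsteps;
    };

    static void step_cb_sketch(struct iocb_sketch *iocb, int aio_ret)
    {
        if (iocb->ret < 0) {
            goto done;                /* cancelled: do not continue the chain */
        } else if (aio_ret < 0) {
            iocb->ret = aio_ret;      /* record the first real error */
            goto done;
        }

        if (++iocb->next_step < iocb->nsteps) {
            /* submit the next unmap/reset/flush step here */
            return;
        }
    done:
        /* complete the request directly; no bottom half is scheduled */
        ;
    }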
2022-12-01 | hw/nvme: fix aio cancel in zone reset | Klaus Jensen | 1 file | -25/+11
If the zone reset operation is cancelled but the block unmap operation completes normally, the callback will continue resetting the next zone since it neglects to check iocb->ret which will have been set to -ECANCELED. Make sure that this is checked and bail out if an error is present. Secondly, fix a potential use-after-free by removing the bottom half and enqueuing the completion directly. Fixes: 63d96e4ffd71 ("hw/nvme: reimplement zone reset to allow cancellation") Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2022-12-01 | hw/nvme: fix aio cancel in flush | Klaus Jensen | 1 file | -15/+6
Make sure that iocb->aiocb is NULL'ed when cancelling. Fix a potential use-after-free by removing the bottom half and enqueuing the completion directly. Fixes: 38f4ac65ac88 ("hw/nvme: reimplement flush to allow cancellation") Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2022-12-01 | hw/nvme: fix aio cancel in format | Klaus Jensen | 1 file | -16/+12
There are several bugs in the async cancel code for the Format command. Firstly, cancelling a format operation neglects to set iocb->ret as well as clearing iocb->aiocb after cancelling the underlying aiocb, which causes the aio callback to ignore the cancellation. Trivial fix. Secondly, and worse, because the request is queued up for posting to the CQ in a bottom half, if the cancellation is due to the submission queue being deleted (which calls blk_aio_cancel), the req structure is deallocated in nvme_del_sq prior to the bottom half being scheduled. Fix this by simply removing the bottom half; there is no reason to defer it anyway. Fixes: 3bcf26d3d619 ("hw/nvme: reimplement format nvm to allow cancellation") Reported-by: Jonathan Derrick <jonathan.derrick@linux.dev> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2022-11-07 | Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging | Stefan Hajnoczi | 1 file | -4/+1
pci,pc,virtio: features, tests, fixes, cleanups
lots of acpi rework
first version of biosbits infrastructure
ASID support in vhost-vdpa
core_count2 support in smbios
PCIe DOE emulation
virtio vq reset
HMAT support
part of infrastructure for viommu support in vhost-vdpa
VTD PASID support
fixes, tests all over the place
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Mon 07 Nov 2022 14:27:53 EST
# gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg: issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (83 commits)
  checkpatch: better pattern for inline comments
  hw/virtio: introduce virtio_device_should_start
  tests/acpi: update tables for new core count test
  bios-tables-test: add test for number of cores > 255
  tests/acpi: allow changes for core_count2 test
  bios-tables-test: teach test to use smbios 3.0 tables
  hw/smbios: add core_count2 to smbios table type 4
  vhost-user: Support vhost_dev_start
  vhost: Change the sequence of device start
  intel-iommu: PASID support
  intel-iommu: convert VTD_PE_GET_FPD_ERR() to be a function
  intel-iommu: drop VTDBus
  intel-iommu: don't warn guest errors when getting rid2pasid entry
  vfio: move implement of vfio_get_xlat_addr() to memory.c
  tests: virt: Update expected *.acpihmatvirt tables
  tests: acpi: aarch64/virt: add a test for hmat nodes with no initiators
  hw/arm/virt: Enable HMAT on arm virt machine
  tests: Add HMAT AArch64/virt empty table files
  tests: acpi: q35: update expected blobs *.hmat-noinitiators expected HMAT:
  tests: acpi: q35: add test for hmat nodes without initiators
  ...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-11-07 | msix: Assert that specified vector is in range | Akihiko Odaki | 1 file | -4/+1
There were several different ways to deal with the situation where the vector specified for an msix function is out of bounds:
- return early from the function and keep progressing
- propagate the error to the caller
- mark msix unusable
- assert it is in bounds
- just ignore it
An out-of-bounds vector should not be specified if the device implementation is correct, so let msix functions always assert that the specified vector is in range. An exceptional case is virtio-pci, which allows the guest to configure vectors. For virtio-pci, it is more appropriate to introduce its own checks because it is sometimes too late to check the vector range in msix functions. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-Id: <20220829083524.143640-1-akihiko.odaki@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Yuval Shaia <yuval.shaia.ml@gmail.com>
2022-11-02 | hw/nvme: Abort copy command when format is one while pif | Francis Pravin Antony Michael Raj | 1 file | -1/+2
As per the NVMe Command Set specification, Section 3.2.2, if i) the namespace is formatted to use 16b Guard Protection Information (i.e., pif = 0), and ii) the Descriptor Format is not cleared to 0h, then the copy command should be aborted with the status code Invalid Namespace or Format. Fixes: 44219b6029fc ("hw/nvme: 64-bit pi support") Signed-off-by: Francis Pravin Antony Michael Raj <francis.michael@solidigm.com> Signed-off-by: Jonathan Derrick <jonathan.derrick@solidigm.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2022-11-02 | hw/nvme: reenable cqe batching | Klaus Jensen | 1 file | -15/+11
Commit 2e53b0b45024 ("hw/nvme: Use ioeventfd to handle doorbell updates") had the unintended effect of disabling batching of CQEs. This patch changes the sq/cq timers to bottom halves and, instead of calling nvme_post_cqes() immediately (causing an interrupt per cqe), we defer the call.

                 |  iops
  ---------------+------
  baseline       |  138k
  +cqe batching  |  233k

Fixes: 2e53b0b45024 ("hw/nvme: Use ioeventfd to handle doorbell updates") Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Jinhao Fan <fanjinhao21s@ict.ac.cn> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
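A sketch assuming QEMU's bottom-half API (qemu_bh_schedule); it is not standalone, and ExampleCQueue is a hypothetical stand-in for the real completion-queue structure:

    typedef struct ExampleCQueue { QEMUBH *bh; /* plus a pending-request list */ } ExampleCQueue;

    static void enqueue_cqe_sketch(ExampleCQueue *cq)
    {
        /* put the completed request on the cq's pending list here, then: */
        qemu_bh_schedule(cq->bh);   /* one BH run later drains all pending CQEs,
                                       batching interrupts instead of one per CQE */
    }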
2022-08-01 | hw/nvme: do not enable ioeventfd by default | Klaus Jensen | 1 file | -1/+1
Do not enable ioeventfd by default. Let the feature mature a bit before we consider enabling it by default. Fixes: 2e53b0b45024 ("hw/nvme: Use ioeventfd to handle doorbell updates") Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Jinhao Fan <fanjinhao21s@ict.ac.cn> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2022-08-01 | hw/nvme: unregister the event notifier handler on the main loop | Klaus Jensen | 1 file | -0/+2
Make sure the notifier handler is unregistered in the main loop prior to cleaning it up. Fixes: 2e53b0b45024 ("hw/nvme: Use ioeventfd to handle doorbell updates") Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Jinhao Fan <fanjinhao21s@ict.ac.cn> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2022-08-01 | hw/nvme: skip queue processing if notifier is cleared | Klaus Jensen | 1 file | -2/+6
While it is safe to process the queues when they are empty, skip it if the event notifier callback was invoked spuriously. Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Jinhao Fan <fanjinhao21s@ict.ac.cn> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
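A sketch assuming QEMU's EventNotifier API (event_notifier_test_and_clear, container_of); not standalone, and ExampleSQueue/example_process_sq are hypothetical names used here for illustration:

    typedef struct ExampleSQueue { EventNotifier notifier; } ExampleSQueue;   /* illustrative */

    static void example_process_sq(ExampleSQueue *sq) { (void)sq; }           /* illustrative */

    static void sq_notifier_sketch(EventNotifier *e)
    {
        ExampleSQueue *sq = container_of(e, ExampleSQueue, notifier);

        if (!event_notifier_test_and_clear(e)) {
            return;                 /* spurious callback: nothing was signalled */
        }
        example_process_sq(sq);     /* only process the queue on a real event */
    }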
2022-07-15 | hw/nvme: Use ioeventfd to handle doorbell updates | Jinhao Fan | 1 file | -1/+112
Add property "ioeventfd" which is enabled by default. When this is enabled, updates on the doorbell registers will cause KVM to signal an event to the QEMU main loop to handle the doorbell updates. Therefore, instead of letting the vcpu thread run both guest VM and IO emulation, we now use the main loop thread to do IO emulation and thus the vcpu thread has more cycles for the guest VM. Since ioeventfd does not tell us the exact value that is written, it is only useful when shadow doorbell buffer is enabled, where we check for the value in the shadow doorbell buffer when we get the doorbell update event. IOPS comparison on Linux 5.19-rc2: (Unit: KIOPS) qd 1 4 16 64 qemu 35 121 176 153 ioeventfd 41 133 258 313 Changes since v3: - Do not deregister ioeventfd when it was not enabled on a SQ/CQ Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2022-07-15 | hw/nvme: Add trace events for shadow doorbell buffer | Jinhao Fan | 1 file | -0/+5
When shadow doorbell buffer is enabled, doorbell registers are lazily updated. The actual queue head and tail pointers are stored in Shadow Doorbell buffers. Add trace events for updates on the Shadow Doorbell buffers and EventIdx buffers. Also add trace event for the Doorbell Buffer Config command. Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org> [k.jensen: rebased] Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2022-07-15 | hw/nvme: Implement shadow doorbell buffer support | Jinhao Fan | 1 file | -1/+114
Implement the Doorbell Buffer Config command (Section 5.7 in NVMe Spec 1.3) and the Shadow Doorbell buffer & EventIdx buffer handling logic (Section 7.13 in NVMe Spec 1.3). For queues created before the Doorbell Buffer Config command, the nvme_dbbuf_config function tries to associate each existing SQ and CQ with its Shadow Doorbell buffer and EventIdx buffer address. Queues created after the Doorbell Buffer Config command will have the doorbell buffers associated with them when they are initialized. In nvme_process_sq and nvme_post_cqe, proactively check for Shadow Doorbell buffer changes instead of waiting for doorbell register changes. This reduces the number of MMIOs. In nvme_process_db(), update the shadow doorbell buffer value with the doorbell register value if it is the admin queue. This is a hack since hosts like the Linux NVMe driver and SPDK do not use the shadow doorbell buffer for the admin queue. Copying the doorbell register value to the shadow doorbell buffer allows us to support these hosts as well as spec-compliant hosts that use the shadow doorbell buffer for the admin queue. Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org> [k.jensen: rebased] Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
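A simplified, self-contained sketch of the shadow-doorbell/eventidx idea described above (the real logic lives in the sq/cq processing paths of ctrl.c); all names are stand-ins:

    #include <stdint.h>

    struct sq_sketch {
        uint32_t head, tail;        /* device-side view of the queue */
        uint32_t *shadow_db;        /* guest-written shadow doorbell (new tail) */
        uint32_t *eventidx;         /* device-written: "only ring me past this value" */
    };

    static void process_sq_sketch(struct sq_sketch *sq)
    {
        while (sq->head != sq->tail) {
            /* fetch and execute the SQE at sq->head, then advance */
            sq->head++;

            *sq->eventidx = sq->tail;     /* advertise what we have consumed */
            sq->tail = *sq->shadow_db;    /* pick up entries queued without an MMIO write */
        }
    }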
2022-06-28 | trivial typos: namesapce | Dr. David Alan Gilbert | 1 file | -1/+1
'namespace' is misspelled in a bunch of places. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Message-Id: <20220614104045.85728-3-dgilbert@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-06-23 | hw/nvme: clear aen mask on reset | Klaus Jensen | 1 file | -0/+1
The internally maintained AEN mask is not cleared on reset. Fix this. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2022-06-23Revert "hw/block/nvme: add support for sgl bit bucket descriptor"Klaus Jensen1-23/+6
This reverts commit d97eee64fef35655bd06f5c44a07fdb83a6274ae. The emulated controller correctly accounts for not including bit buckets in the controller-to-host data transfer, however it doesn't correctly account for the holes for the on-disk data offsets. Reported-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2022-06-23 | hw/nvme: clean up CC register write logic | Klaus Jensen | 1 file | -22/+16
The SRIOV series exposed an issue with how CC register writes are handled and how CSTS is set in response to that. Specifically, after applying the SRIOV series, the controller could end up in a state with CC.EN set to '1' but with CSTS.RDY cleared to '0', causing drivers to expect CSTS.RDY to transition to '1' but timing out. Clean this up. Reviewed-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com> Reviewed-by: Lukasz Maniak <lukasz.maniak@linux.intel.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>