aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2016-10-17vfio/pci: Handle host oversightEric Auger1-0/+11
In case the end-user calls qemu with -vfio-pci option without passing either sysfsdev or host property value, the device is interpreted as 0000:00:00.0. Let's create a specific error message to guide the end-user. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/pci: Remove vfio_populate_device returned valueEric Auger1-13/+10
The returned value (either -errno or -1) is not used anymore by the caller, vfio_realize, since the error now is stored in the error object. So let's remove it. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/pci: Remove vfio_msix_early_setup returned valueEric Auger1-10/+10
The returned value is not used anymore by the caller, vfio_realize, since the error now is stored in the error object. So let's remove it. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/pci: Conversion to realizeEric Auger2-39/+27
This patch converts VFIO PCI to realize function. Also original initfn errors now are propagated using QEMU error objects. All errors are formatted with the same pattern: "vfio: %s: the error description" Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/platform: Pass an error object to vfio_base_device_initEric Auger1-23/+27
This patch propagates errors encountered during vfio_base_device_init up to the realize function. In case the host value is not set or badly formed we now report an error. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/platform: fix a wrong returned value in vfio_populate_deviceEric Auger1-0/+1
In case the vfio_init_intp fails we currently do not return an error value. This patch fixes the bug. The returned value is not explicit but in practice the error object is the one used to report the error to the end-user and the actual returned error value is not used. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/platform: Pass an error object to vfio_populate_deviceEric Auger1-12/+13
Propagate the vfio_populate_device errors up to vfio_base_device_init. The error object also is passed to vfio_init_intp. At the moment we only report the error. Subsequent patches will propagate the error up to the realize function. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio: Pass an error object to vfio_get_deviceEric Auger4-12/+11
Pass an error object to prepare for migration to VFIO-PCI realize. In vfio platform vfio_base_device_init we currently just report the error. Subsequent patches will propagate the error up to the realize function. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio: Pass an error object to vfio_get_groupEric Auger4-18/+22
Pass an error object to prepare for migration to VFIO-PCI realize. For the time being let's just simply report the error in vfio platform's vfio_base_device_init(). A subsequent patch will duly propagate the error up to vfio_platform_realize. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio: Pass an Error object to vfio_connect_containerEric Auger1-15/+25
The error is currently simply reported in vfio_get_group. Don't bother too much with the prefix which will be handled at upper level, later on. Also return an error value in case container->error is not 0 and the container is teared down. On vfio_spapr_remove_window failure, we also report an error whereas it was silent before. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/pci: Pass an error object to vfio_pci_igd_opregion_initEric Auger3-8/+8
Pass an error object to prepare for migration to VFIO-PCI realize. In vfio_probe_igd_bar4_quirk, simply report the error. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/pci: Pass an error object to vfio_add_capabilitiesEric Auger1-28/+31
Pass an error object to prepare for migration to VFIO-PCI realize. The error is cascaded downto vfio_add_std_cap and then vfio_msi(x)_setup, vfio_setup_pcie_cap. vfio_add_ext_cap does not return anything else than 0 so let's transform it into a void function. Also use pci_add_capability2 which takes an error object. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/pci: Pass an error object to vfio_intx_enableEric Auger1-12/+29
Pass an error object to prepare for migration to VFIO-PCI realize. The error object is propagated down to vfio_intx_enable_kvm(). The three other callers, vfio_intx_enable_kvm(), vfio_msi_disable_common() and vfio_pci_post_reset() do not propagate the error and simply call error_reportf_err() with the ERR_PREFIX formatting. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/pci: Pass an error object to vfio_msix_early_setupEric Auger1-5/+8
Pass an error object to prepare for migration to VFIO-PCI realize. The returned value will be removed later on. We now format an error in case of reading failure for - the MSIX flags - the MSIX table, - the MSIX PBA. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/pci: Pass an error object to vfio_populate_deviceEric Auger1-14/+12
Pass an error object to prepare for migration to VFIO-PCI realize. The returned value will be removed later on. The case where error recovery cannot be enabled is not converted into an error object but directly reported through error_report, as before. Populating an error instead would cause the future realize function to fail, which is not wanted. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/pci: Pass an error object to vfio_populate_vgaEric Auger3-9/+16
Pass an error object to prepare for the same operation in vfio_populate_device. Eventually this contributes to the migration to VFIO-PCI realize. We now report an error on vfio_get_region_info failure. vfio_probe_igd_bar4_quirk is not involved in the migration to realize and simply calls error_reportf_err. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/pci: Use local error object in vfio_initfnEric Auger2-30/+45
To prepare for migration to realize, let's use a local error object in vfio_initfn. Also let's use the same error prefix for all error messages. On top of the 1-1 conversion, we start using a common error prefix for all error messages. We also introduce a similar warning prefix which will be used later on. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17Merge remote-tracking branch 'remotes/gkurz/tags/for-upstream' into stagingPeter Maydell12-190/+245
This pull request contains: - a patch to add a vdc->reset() handler to virtio-9p - a bunch of patches to fix various memory leaks (thanks to Li Qiang) - some code cleanups for 9pfs # gpg: Signature made Mon 17 Oct 2016 16:01:46 BST # gpg: using DSA key 0x02FC3AEB0101DBC2 # gpg: Good signature from "Greg Kurz <groug@kaod.org>" # gpg: aka "Greg Kurz <groug@free.fr>" # gpg: aka "Greg Kurz <gkurz@fr.ibm.com>" # gpg: aka "Greg Kurz <gkurz@linux.vnet.ibm.com>" # gpg: aka "Gregory Kurz (Groug) <groug@free.fr>" # gpg: aka "Gregory Kurz (Cimai Technology) <gkurz@cimai.com>" # gpg: aka "Gregory Kurz (Meiosys Technology) <gkurz@meiosys.com>" # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 2BD4 3B44 535E C0A7 9894 DBA2 02FC 3AEB 0101 DBC2 * remotes/gkurz/tags/for-upstream: 9pfs: fix memory leak in v9fs_write 9pfs: fix memory leak in v9fs_link 9pfs: fix memory leak in v9fs_xattrcreate 9pfs: fix information leak in xattr read virtio-9p: add reset handler 9pfs: only free completed request if not flushed 9pfs: drop useless check in pdu_free() 9pfs: use coroutine_fn annotation in hw/9pfs/9p.[ch] 9pfs: use coroutine_fn annotation in hw/9pfs/co*.[ch] 9pfs: fsdev: drop useless extern annotation for functions 9pfs: fix potential host memory leak in v9fs_read 9pfs: allocate space for guest originated empty strings Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-10-179pfs: fix memory leak in v9fs_writeLi Qiang1-1/+1
If an error occurs when marshalling the transfer length to the guest, the v9fs_write() function doesn't free an IO vector, thus leading to a memory leak. This patch fixes the issue. Signed-off-by: Li Qiang <liqiang6-s@360.cn> Reviewed-by: Greg Kurz <groug@kaod.org> [groug, rephrased the changelog] Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-179pfs: fix memory leak in v9fs_linkLi Qiang1-0/+1
The v9fs_link() function keeps a reference on the source fid object. This causes a memory leak since the reference never goes down to 0. This patch fixes the issue. Signed-off-by: Li Qiang <liqiang6-s@360.cn> Reviewed-by: Greg Kurz <groug@kaod.org> [groug, rephrased the changelog] Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-179pfs: fix memory leak in v9fs_xattrcreateLi Qiang1-0/+1
The 'fs.xattr.value' field in V9fsFidState object doesn't consider the situation that this field has been allocated previously. Every time, it will be allocated directly. This leads to a host memory leak issue if the client sends another Txattrcreate message with the same fid number before the fid from the previous time got clunked. Signed-off-by: Li Qiang <liqiang6-s@360.cn> Reviewed-by: Greg Kurz <groug@kaod.org> [groug, updated the changelog to indicate how the leak can occur] Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-179pfs: fix information leak in xattr readLi Qiang1-1/+1
9pfs uses g_malloc() to allocate the xattr memory space, if the guest reads this memory before writing to it, this will leak host heap memory to the guest. This patch avoid this. Signed-off-by: Li Qiang <liqiang6-s@360.cn> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-17virtio-9p: add reset handlerGreg Kurz3-0/+39
Virtio devices should implement the VirtIODevice->reset() function to perform necessary cleanup actions and to bring the device to a quiescent state. In the case of the virtio-9p device, this means: - emptying the list of active PDUs (i.e. draining all in-flight I/O) - freeing all fids (i.e. close open file descriptors and free memory) That's what this patch does. The reset handler first waits for all active PDUs to complete. Since completion happens in the QEMU global aio context, we just have to loop around aio_poll() until the active list is empty. The freeing part involves some actions to be performed on the backend, like closing file descriptors or flushing extended attributes to the underlying filesystem. The virtfs_reset() function already does the job: it calls free_fid() for all open fids not involved in an ongoing I/O operation. We are sure this is the case since we have drained the PDU active list. The current code implements all backend accesses with coroutines, but we want to stay synchronous on the reset path. We can either change the current code to be able to run when not in coroutine context, or create a coroutine context and wait for virtfs_reset() to complete. This patch goes for the latter because it results in simpler code. Note that we also need to create a dummy PDU because it is also an API to pass the FsContext pointer to all backend callbacks. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-10-179pfs: only free completed request if not flushedGreg Kurz1-11/+7
If a PDU has a flush request pending, the current code calls pdu_free() twice: 1) pdu_complete()->pdu_free() with pdu->cancelled set, which does nothing 2) v9fs_flush()->pdu_free() with pdu->cancelled cleared, which moves the PDU back to the free list. This works but it complexifies the logic of pdu_free(). With this patch, pdu_complete() only calls pdu_free() if no flush request is pending, i.e. qemu_co_queue_next() returns false. Since pdu_free() is now supposed to be called with pdu->cancelled cleared, the check in pdu_free() is dropped and replaced by an assertion. Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-179pfs: drop useless check in pdu_free()Greg Kurz1-10/+8
Out of the three users of pdu_free(), none ever passes a NULL pointer to this function. Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-179pfs: use coroutine_fn annotation in hw/9pfs/9p.[ch]Greg Kurz2-57/+62
All these functions either call the v9fs_co_* functions which have the coroutine_fn annotation, or pdu_complete() which calls qemu_co_queue_next(). Let's mark them to make it obvious they execute in coroutine context. Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-179pfs: use coroutine_fn annotation in hw/9pfs/co*.[ch]Greg Kurz5-95/+109
All these functions use the v9fs_co_run_in_worker() macro, and thus always call qemu_coroutine_self() and qemu_coroutine_yield(). Let's mark them to make it obvious they execute in coroutine context. Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-179pfs: fsdev: drop useless extern annotation for functionsGreg Kurz5-65/+65
Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-179pfs: fix potential host memory leak in v9fs_readLi Qiang1-2/+3
In 9pfs read dispatch function, it doesn't free two QEMUIOVector object thus causing potential memory leak. This patch avoid this. Signed-off-by: Li Qiang <liqiang6-s@360.cn> Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-179pfs: allocate space for guest originated empty stringsLi Qiang2-2/+2
If a guest sends an empty string paramater to any 9P operation, the current code unmarshals it into a V9fsString equal to { .size = 0, .data = NULL }. This is unfortunate because it can cause NULL pointer dereference to happen at various locations in the 9pfs code. And we don't want to check str->data everywhere we pass it to strcmp() or any other function which expects a dereferenceable pointer. This patch enforces the allocation of genuine C empty strings instead, so callers don't have to bother. Out of all v9fs_iov_vunmarshal() users, only v9fs_xattrwalk() checks if the returned string is empty. It now uses v9fs_string_size() since name.data cannot be NULL anymore. Signed-off-by: Li Qiang <liqiang6-s@360.cn> [groug, rewritten title and changelog, fix empty string check in v9fs_xattrwalk()] Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-17Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.8-20161017' ↵Peter Maydell28-337/+642
into staging ppc patch queue 2016-10-17 Highlights: * Significant rework of how PCI IO windows are placed for the pseries machine type * A number of extra tests added for ppc * Other tests clean up / fixed * Some cleanups to the XICS interrupt controller in preparation for the 'powernv' machine type A number of the test changes aren't strictly in ppc related code, but are included via my tree because they're primarily focused on improving test coverage for ppc. # gpg: Signature made Mon 17 Oct 2016 03:42:41 BST # gpg: using RSA key 0x6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dgibson/tags/ppc-for-2.8-20161017: spapr: Improved placement of PCI host bridges in guest memory map spapr_pci: Add a 64-bit MMIO window spapr: Adjust placement of PCI host bridge to allow > 1TiB RAM spapr_pci: Delegate placement of PCI host bridges to machine type libqos: Limit spapr-pci to 32-bit MMIO for now libqos: Correct error in PCI hole sizing for spapr libqos: Isolate knowledge of spapr memory map to qpci_init_spapr() ppc/xics: Split ICS into ics-base and ics class ppc/xics: Make the ICSState a list spapr: fix inheritance chain for default machine options target-ppc: implement vexts[bh]2w and vexts[bhw]2d tests/boot-sector: Increase time-out to 90 seconds tests/boot-sector: Use mkstemp() to create a unique file name tests/boot-sector: Use minimum length for the Forth boot script qtest: ask endianness of the target in qtest_init() tests: minor cleanups in usb-hcd-uhci-test Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-10-17Merge remote-tracking branch 'remotes/famz/tags/for-upstream' into stagingPeter Maydell4-24/+66
# gpg: Signature made Mon 17 Oct 2016 03:08:28 BST # gpg: using RSA key 0xCA35624C6A9171C6 # gpg: Good signature from "Fam Zheng <famz@redhat.com>" # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 5003 7CB7 9706 0F76 F021 AD56 CA35 624C 6A91 71C6 * remotes/famz/tags/for-upstream: tests/docker/Makefile.include: add a generic docker-run target tests/docker: make test-mingw honour TARGET_LIST tests/docker: test-build script tests/docker: add travis dockerfile Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-10-17Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20161014' ↵Peter Maydell15-182/+281
into staging migration/next for 20161014 # gpg: Signature made Fri 14 Oct 2016 16:24:13 BST # gpg: using RSA key 0xF487EF185872D723 # gpg: Good signature from "Juan Quintela <quintela@redhat.com>" # gpg: aka "Juan Quintela <quintela@trasno.org>" # Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723 * remotes/juanquintela/tags/migration/20161014: docs/xbzrle: correction migrate: move max-bandwidth and downtime-limit to migrate_set_parameter migration: Fix seg with missing port migration/postcopy: Explicitly disallow huge pages RAMBlocks: Store page size Postcopy vs xbzrle: Don't send xbzrle pages once in postcopy [for 2.8] migrate: Fix bounds check for migration parameters in migration.c migrate: Use boxed qapi for migrate-set-parameters migrate: Share common MigrationParameters struct migrate: Fix cpu-throttle-increment regression in HMP migration/rdma: Don't flag an error when we've been told about one migration: Make failed migration load set file error migration/rdma: Pass qemu_file errors across link migration: Report values for comparisons migration: report an error giving the failed field Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-10-17tests/docker/Makefile.include: add a generic docker-run targetAlex Bennée1-23/+38
This re-factors the docker makefile to include a docker-run target which can be controlled entirely from environment variables specified on the make command line. This allows us to run against any given docker image we may have in our repository, for example: make docker-run TEST="test-quick" IMAGE="debian:arm64" \ EXECUTABLE=./aarch64-linux-user/qemu-aarch64 The existing docker-foo@bar targets still work but the inline verification has been dropped because we already don't hit that due to other pattern rules in rules.mak. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20161011161625.9070-5-alex.bennee@linaro.org> Message-Id: <20161011161625.9070-6-alex.bennee@linaro.org> [Squash in the verification removal patch. - Fam] Signed-off-by: Fam Zheng <famz@redhat.com>
2016-10-17tests/docker: make test-mingw honour TARGET_LISTAlex Bennée1-1/+2
The other builders honour this variable, so should the mingw build. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20161011161625.9070-4-alex.bennee@linaro.org> Signed-off-by: Fam Zheng <famz@redhat.com>
2016-10-17tests/docker: test-build scriptAlex Bennée1-0/+20
Much like test-quick but only builds. This is useful for some of the build targets like ThreadSanitizer that don't yet pass "make check". Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20161011161625.9070-3-alex.bennee@linaro.org> Signed-off-by: Fam Zheng <famz@redhat.com>
2016-10-17tests/docker: add travis dockerfileAlex Bennée1-0/+6
This target grabs the latest Travis containers from their repository at quay.io and then installs QEMU's build dependencies. With this it is possible to run on broadly the same setup as they have on travis-ci.org. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20161011161625.9070-2-alex.bennee@linaro.org> Signed-off-by: Fam Zheng <famz@redhat.com>
2016-10-16spapr: Improved placement of PCI host bridges in guest memory mapDavid Gibson6-40/+109
Currently, the MMIO space for accessing PCI on pseries guests begins at 1 TiB in guest address space. Each PCI host bridge (PHB) has a 64 GiB chunk of address space in which it places its outbound PIO and 32-bit and 64-bit MMIO windows. This scheme as several problems: - It limits guest RAM to 1 TiB (though we have a limited fix for this now) - It limits the total MMIO window to 64 GiB. This is not always enough for some of the large nVidia GPGPU cards - Putting all the windows into a single 64 GiB area means that naturally aligning things within there will waste more address space. In addition there was a miscalculation in some of the defaults, which meant that the MMIO windows for each PHB actually slightly overran the 64 GiB region for that PHB. We got away without nasty consequences because the overrun fit within an unused area at the beginning of the next PHB's region, but it's not pretty. This patch implements a new scheme which addresses those problems, and is also closer to what bare metal hardware and pHyp guests generally use. Because some guest versions (including most current distro kernels) can't access PCI MMIO above 64 TiB, we put all the PCI windows between 32 TiB and 64 TiB. This is broken into 1 TiB chunks. The first 1 TiB contains the PIO (64 kiB) and 32-bit MMIO (2 GiB) windows for all of the PHBs. Each subsequent TiB chunk contains a naturally aligned 64-bit MMIO window for one PHB each. This reduces the number of allowed PHBs (without full manual configuration of all the windows) from 256 to 31, but this should still be plenty in practice. We also change some of the default window sizes for manually configured PHBs to saner values. Finally we adjust some tests and libqos so that it correctly uses the new default locations. Ideally it would parse the device tree given to the guest, but that's a more complex problem for another time. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-10-16spapr_pci: Add a 64-bit MMIO windowDavid Gibson4-19/+72
On real hardware, and under pHyp, the PCI host bridges on Power machines typically advertise two outbound MMIO windows from the guest's physical memory space to PCI memory space: - A 32-bit window which maps onto 2GiB..4GiB in the PCI address space - A 64-bit window which maps onto a large region somewhere high in PCI address space (traditionally this used an identity mapping from guest physical address to PCI address, but that's not always the case) The qemu implementation in spapr-pci-host-bridge, however, only supports a single outbound MMIO window, however. At least some Linux versions expect the two windows however, so we arranged this window to map onto the PCI memory space from 2 GiB..~64 GiB, then advertised it as two contiguous windows, the "32-bit" window from 2G..4G and the "64-bit" window from 4G..~64G. This approach means, however, that the 64G window is not naturally aligned. In turn this limits the size of the largest BAR we can map (which does have to be naturally aligned) to roughly half of the total window. With some large nVidia GPGPU cards which have huge memory BARs, this is starting to be a problem. This patch adds true support for separate 32-bit and 64-bit outbound MMIO windows to the spapr-pci-host-bridge implementation, each of which can be independently configured. The 32-bit window always maps to 2G.. in PCI space, but the PCI address of the 64-bit window can be configured (it defaults to the same as the guest physical address). So as not to break possible existing configurations, as long as a 64-bit window is not specified, a large single window can be specified. This will appear the same way to the guest as the old approach, although it's now implemented by two contiguous memory regions rather than a single one. For now, this only adds the possibility of 64-bit windows. The default configuration still uses the legacy mode. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-10-16spapr: Adjust placement of PCI host bridge to allow > 1TiB RAMDavid Gibson1-2/+14
Currently the default PCI host bridge for the 'pseries' machine type is constructed with its IO windows in the 1TiB..(1TiB + 64GiB) range in guest memory space. This means that if > 1TiB of guest RAM is specified, the RAM will collide with the PCI IO windows, causing serious problems. Problems won't be obvious until guest RAM goes a bit beyond 1TiB, because there's a little unused space at the bottom of the area reserved for PCI, but essentially this means that > 1TiB of RAM has never worked with the pseries machine type. This patch fixes this by altering the placement of PHBs on large-RAM VMs. Instead of always placing the first PHB at 1TiB, it is placed at the next 1 TiB boundary after the maximum RAM address. Technically, this changes behaviour in a migration-breaking way for existing machines with > 1TiB maximum memory, but since having > 1 TiB memory was broken anyway, this seems like a reasonable trade-off. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-10-16spapr_pci: Delegate placement of PCI host bridges to machine typeDavid Gibson4-24/+42
The 'spapr-pci-host-bridge' represents the virtual PCI host bridge (PHB) for a PAPR guest. Unlike on x86, it's routine on Power (both bare metal and PAPR guests) to have numerous independent PHBs, each controlling a separate PCI domain. There are two ways of configuring the spapr-pci-host-bridge device: first it can be done fully manually, specifying the locations and sizes of all the IO windows. This gives the most control, but is very awkward with 6 mandatory parameters. Alternatively just an "index" can be specified which essentially selects from an array of predefined PHB locations. The PHB at index 0 is automatically created as the default PHB. The current set of default locations causes some problems for guests with large RAM (> 1 TiB) or PCI devices with very large BARs (e.g. big nVidia GPGPU cards via VFIO). Obviously, for migration we can only change the locations on a new machine type, however. This is awkward, because the placement is currently decided within the spapr-pci-host-bridge code, so it breaks abstraction to look inside the machine type version. So, this patch delegates the "default mode" PHB placement from the spapr-pci-host-bridge device back to the machine type via a public method in sPAPRMachineClass. It's still a bit ugly, but it's about the best we can do. For now, this just changes where the calculation is done. It doesn't change the actual location of the host bridges, or any other behaviour. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-10-16libqos: Limit spapr-pci to 32-bit MMIO for nowDavid Gibson1-17/+15
Currently the functions in pci-spapr.c (like pci-pc.c on which it's based) don't distinguish between 32-bit and 64-bit PCI MMIO. At the moment, the qemu side implementation is a bit weird and has a single MMIO window straddling 32-bit and 64-bit regions, but we're likely to change that in future. In any case, pci-pc.c - and therefore the testcases using PCI - only handle 32-bit MMIOs for now. For spapr despite whatever changes might happen with the MMIO windows, the 32-bit window is likely to remain at 2..4 GiB in PCI space. So, explicitly limit pci-spapr.c to 32-bit MMIOs for now, we can add 64-bit MMIO support back in when and if we need it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-10-16libqos: Correct error in PCI hole sizing for spaprDavid Gibson1-2/+4
In pci-spapr.c (as in pci-pc.c from which it was derived), the pci_hole_start/pci_hole_size and pci_iohole_start/pci_iohole_size pairs[1] essentially define the region of PCI (not CPU) addresses in which MMIO or PIO BARs respectively will be allocated. The size value is relative to the start value. But in pci-spapr.c it is set to the entire size of the window supported by the (emulated) hardware, but the start values are *not* at the beginning of the emulated windows. That means if you tried to map enough PCI BARs, we'd messily overrun the IO windows, instead of failing in iomap as we should. This patch corrects this by calculating the hole sizes from the location of the window in PCI space and the hole start. [1] Those are bad names, but that's a problem for another time. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-10-16libqos: Isolate knowledge of spapr memory map to qpci_init_spapr()David Gibson1-49/+64
The libqos code for accessing PCI on the spapr machine type uses IOBASE() and MMIOBASE() macros to determine the address in the CPU memory map of the windows to PCI address space. This is a detail of the implementation of PCI in the machine type, it's not specified by the PAPR standard. Real guests would get the addresses of the PCI windows from the device tree. Finding the device tree in libqos would be awkward, but we can at least localize this knowledge of the implementation to the init function, saving it in the QPCIBusSPAPR structure for use by the accessors. That leaves only one place to fix if we alter the location of the PCI windows, as we're planning to do. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-10-14ppc/xics: Split ICS into ics-base and ics classBenjamin Herrenschmidt5-91/+138
The existing implementation remains same and ics-base is introduced. The type name "ics" is retained, and all the related functions renamed as ics_simple_* This will allow different implementations for the source controllers such as the MSI support of PHB3 on Power8 which uses in-memory state tables for example. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [ clg: added ICS_BASE_GET_CLASS and related fixes, based on : http://patchwork.ozlabs.org/patch/646010/ ] Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-14ppc/xics: Make the ICSState a listBenjamin Herrenschmidt8-86/+139
Instead of an array of fixed sized blocks, use a list, as we will need to have sources with variable number of interrupts. SPAPR only uses a single entry. Native will create more. If performance becomes an issue we can add some hashed lookup but for now this will do fine. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [ move the initialization of list to xics_common_initfn, restore xirr_owner after migration and move restoring to icp_post_load] Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> [ clg: removed the icp_post_load() changes from nikunj patchset v3: http://patchwork.ozlabs.org/patch/646008/ ] Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-14spapr: fix inheritance chain for default machine optionsMichael Roth1-0/+3
Rather than machine instances having backward-compatible option defaults that need to be repeatedly re-enabled for every new machine type we introduce, we set the defaults appropriate for newer machine types, then add code to explicitly disable instance options as needed to maintain compatibility with older machine types. Currently pseries-2.5 does not inherit from pseries-2.6 in this fashion, which is okay at the moment since we do not have any instance compatibility options for pseries-2.6+ currently. We will make use of this in future patches though, so fix it here. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> [dwg: Extended to make 2.7 inherit from 2.8 as well] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-14target-ppc: implement vexts[bh]2w and vexts[bhw]2dNikunj A Dadhania4-0/+30
Vector Extend Sign Instructions: vextsb2w: Vector Extend Sign Byte To Word vextsh2w: Vector Extend Sign Halfword To Word vextsb2d: Vector Extend Sign Byte To Doubleword vextsh2d: Vector Extend Sign Halfword To Doubleword vextsw2d: Vector Extend Sign Word To Doubleword Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-14tests/boot-sector: Increase time-out to 90 secondsThomas Huth1-2/+2
Since the PXE tester runs rather slow on ppc64 with tcg, there is a chance that we hit the 60 seconds timeout on machines that have a heavy CPU load. So let's increase the timeout to ease the situation. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-14tests/boot-sector: Use mkstemp() to create a unique file nameThomas Huth4-9/+16
The pxe-test is run for three different targets now (x86_64, i386 and ppc64), and the bios-tables-test is run for two targets (x86_64 and i386). But each of the tests is using an invariant name for the disk image with the boot sector code - so if the tests are running in parallel, there is a race condition that they destroy the disk image of a parallel test program. Let's use mkstemp() to create unique temporary files here instead - and since mkstemp() is returning an integer file descriptor instead of a FILE pointer, we also switch the fwrite() and fclose() to write() and close() instead. Reported-by: Sascha Silbe <x-qemu@se-silbe.de> Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>