path: root/include
Age  Commit message  (Author, files changed, lines -/+)
2016-06-16  scsi: esp: make cmdbuf big enough for maximum CDB size  (Prasad J Pandit, 1 file, -1/+2)
While doing a DMA read into the ESP command buffer 's->cmdbuf', the device could write past the 's->cmdbuf' area if it was transferring more than 16 bytes. Increase the command buffer size to 32, which is the maximum when 's->do_cmd' is set, and add a check on 'len' to avoid OOB access. Reported-by: Li Qiang <liqiang6-s@360.cn> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
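A rough C sketch of the kind of guard this describes (the struct, helper name and field names here are illustrative stand-ins, not the actual QEMU code):

    #include <stdint.h>
    #include <string.h>

    #define ESP_CMDBUF_SZ 32   /* was 16; 32 is the maximum needed when do_cmd is set */

    struct esp_state_sketch {          /* stand-in for the real ESPState */
        uint8_t cmdbuf[ESP_CMDBUF_SZ];
        uint32_t cmdlen;
    };

    /* Clamp 'len' before copying DMA data into cmdbuf, as the fix describes. */
    static void esp_cmdbuf_fill(struct esp_state_sketch *s, const uint8_t *buf, uint32_t len)
    {
        if (len > ESP_CMDBUF_SZ - s->cmdlen) {
            len = ESP_CMDBUF_SZ - s->cmdlen;   /* never write past cmdbuf */
        }
        memcpy(&s->cmdbuf[s->cmdlen], buf, len);
        s->cmdlen += len;
    }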
2016-06-16  nbd: Avoid magic number for NBD max name size  (Eric Blake, 1 file, -0/+6)
Declare a constant and use that when determining if an export name fits within the constraints we are willing to support. Note that upstream NBD recently documented that clients MUST support export names of 256 bytes (not including trailing NUL), and SHOULD support names up to 4096 bytes. 4096 is a bit big (we would lose benefits of stack-allocation of a name array), and we already have other limits in place (for example, qcow2 snapshot names are clamped around 1024). So for now, just stick to the required minimum, as that's easier to audit than a full-scale support for larger names. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1463006384-7734-12-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
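A hedged sketch of the pattern (constant name and value follow the commit text; the exact definition lives in the NBD headers, and 'name'/'errp' are illustrative):

    /* Upstream NBD requires clients to support export names of at least 256 bytes. */
    #define NBD_MAX_NAME_SIZE 256

    /* Reject over-long export names up front instead of comparing against a magic number inline. */
    if (strlen(name) > NBD_MAX_NAME_SIZE) {
        error_setg(errp, "export name too long to send to server");
        return -EINVAL;
    }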
2016-06-16  nbd: simplify the nbd_request and nbd_reply structs  (Paolo Bonzini, 1 file, -6/+7)
These structs are never used to represent the bytes that go over the network. The big-endian network data is built into a uint8_t array in nbd_{receive,send}_{request,reply}. Remove the unused magic field, reorder the struct to avoid holes, and remove the packed attribute. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
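The host-side structs after this change would look roughly like the following (a sketch based on the description above — no magic field, fields ordered to avoid padding holes, no packed attribute):

    #include <stdint.h>

    /* Host-side only: the on-the-wire big-endian layout is built separately
     * in nbd_{receive,send}_{request,reply}, so no 'packed' is needed. */
    struct nbd_request {
        uint64_t handle;
        uint64_t from;
        uint32_t len;
        uint32_t type;
    };

    struct nbd_reply {
        uint64_t handle;
        uint32_t error;
    };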
2016-06-16  clean-includes: run it once more  (Paolo Bonzini, 2 files, -2/+0)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16  os-posix: include sys/mman.h  (Paolo Bonzini, 2 files, -2/+1)
qemu/osdep.h checks whether MAP_ANONYMOUS is defined, but this check is bogus without a previous inclusion of sys/mman.h. Include it in sysemu/os-posix.h and remove it from everywhere else. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16  block/mirror: Fix target backing BDS  (Max Reitz, 1 file, -1/+17)
Currently, we are trying to move the backing BDS from the source to the target in bdrv_replace_in_backing_chain() which is called from mirror_exit(). However, mirror_complete() already tries to open the target's backing chain with a call to bdrv_open_backing_file(). First, we should only set the target's backing BDS once. Second, the mirroring block job has a better idea of what to set it to than the generic code in bdrv_replace_in_backing_chain() (in fact, the latter's conditions on when to move the backing BDS from source to target are not really correct). Therefore, remove that code from bdrv_replace_in_backing_chain() and leave it to mirror_complete(). Depending on what kind of mirroring is performed, we furthermore want to use different strategies to open the target's backing chain: - If blockdev-mirror is used, we can assume the user made sure that the target already has the correct backing chain. In particular, we should not try to open a backing file if the target does not have any yet. - If drive-mirror with mode=absolute-paths is used, we can and should reuse the already existing chain of nodes that the source BDS is in. In case of sync=full, no backing BDS is required; with sync=top, we just link the source's backing BDS to the target, and with sync=none, we use the source BDS as the target's backing BDS. We should not try to open these backing files anew because this would lead to two BDSs existing per physical file in the backing chain, and we would like to avoid such concurrent access. - If drive-mirror with mode=existing is used, we have to use the information provided in the physical image file which means opening the target's backing chain completely anew, just as it has been done already. If the target's backing chain shares images with the source, this may lead to multiple BDSs per physical image file. But since we cannot reliably ascertain this case, there is nothing we can do about it. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20160610185750.30956-3-mreitz@redhat.com Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-06-16  block: Remove bs->zero_beyond_eof  (Kevin Wolf, 1 file, -3/+0)
It is always true for open images now. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16  block: Make bdrv_load/save_vmstate coroutine_fns  (Kevin Wolf, 1 file, -4/+6)
This allows drivers to share code between normal I/O and vmstate accesses. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16  block: Make .bdrv_load_vmstate() vectored  (Kevin Wolf, 2 files, -2/+3)
This brings it in line with .bdrv_save_vmstate(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16  block: Introduce bdrv_preadv()  (Kevin Wolf, 1 file, -0/+1)
We already have a byte-based bdrv_pwritev(), but the read counterpart was still missing. This commit adds it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16  block: Byte-based bdrv_co_do_copy_on_readv()  (Kevin Wolf, 1 file, -3/+7)
In a first step to convert the common I/O path to work on bytes rather than sectors, this converts the copy-on-read logic that is used by bdrv_aligned_preadv(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16  block: Assert that flags are in range  (Eric Blake, 1 file, -0/+3)
Add a new BDRV_REQ_MASK constant, and use it to make sure that caller flags are always valid. Tested with 'make check' and with qemu-iotests on both '-raw' and '-qcow2'; the only failure turned up was fixed in the previous commit. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
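In practice this boils down to a mask plus an assertion at the top of the request paths; a sketch (the mask value here is illustrative — the real one is the OR of all defined BDRV_REQ_* bits):

    #include <assert.h>

    /* OR of every defined BDRV_REQ_* flag; anything outside it is a caller bug. */
    #define BDRV_REQ_MASK 0x3f

    /* At the start of the common read/write request functions: */
    assert(!(flags & ~BDRV_REQ_MASK));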
2016-06-16  Merge remote-tracking branch 'remotes/amit-migration/tags/migration-for-2.7-4' into staging  (Peter Maydell, 1 file, -13/+15)
Migration: - Fixes for TLS series - Postcopy: Add stats, fix, test case # gpg: Signature made Thu 16 Jun 2016 05:40:09 BST # gpg: using RSA key 0xEB0B4DFC657EF670 # gpg: Good signature from "Amit Shah <amit@amitshah.net>" # gpg: aka "Amit Shah <amit@kernel.org>" # gpg: aka "Amit Shah <amitshah@gmx.net>" # Primary key fingerprint: 48CA 3722 5FE7 F4A8 B337 2735 1E9A 3B5F 8540 83B6 # Subkey fingerprint: CC63 D332 AB8F 4617 4529 6534 EB0B 4DFC 657E F670 * remotes/amit-migration/tags/migration-for-2.7-4: migration: rename functions to starting migrations migration: fix typos in qapi-schema from latest migration additions Postcopy: Check for support when setting the capability tests: fix libqtest socket timeouts test: Postcopy Postcopy: Add stats on page requests Migration: Split out ram part of qmp_query_migrate Postcopy: Avoid 0 length discards Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-06-16  migration: rename functions to starting migrations  (Daniel P. Berrange, 1 file, -13/+13)
Apply the following renames for starting incoming migration: process_incoming_migration -> migration_fd_process_incoming migration_set_incoming_channel -> migration_channel_process_incoming migration_tls_set_incoming_channel -> migration_tls_channel_process_incoming and for starting outgoing migration: migration_set_outgoing_channel -> migration_channel_connect migration_tls_set_outgoing_channel -> migration_tls_channel_connect Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1464776234-9910-3-git-send-email-berrange@redhat.com Message-Id: <1464776234-9910-3-git-send-email-berrange@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-06-16  Postcopy: Add stats on page requests  (Dr. David Alan Gilbert, 1 file, -0/+2)
On the source, add a count of page requests received from the destination. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Message-id: 1465816605-29488-4-git-send-email-dgilbert@redhat.com Message-Id: <1465816605-29488-4-git-send-email-dgilbert@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-06-14  target-i386: Implement CPUID[0xB] (Extended Topology Enumeration)  (Radim Krčmář, 1 file, -1/+6)
I looked at a dozen Intel CPUs that have this CPUID and all of them always had Core offset as 1 (a wasted bit when hyperthreading is disabled) and Package offset at least 4 (wasted bits at <= 4 cores). QEMU uses more compact IDs and it doesn't make much sense to change it now. I keep the SMT and Core sub-leaves even if there is just one thread/core; it makes the code simpler and there should be no harm. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-06-14  pc: Add 2.7 machine  (Igor Mammedov, 1 file, -0/+4)
Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-06-14  Merge remote-tracking branch 'remotes/sstabellini/tags/xen-20160614-tag' into staging  (Peter Maydell, 1 file, -1/+0)
Xen 2016/06/14 # gpg: Signature made Tue 14 Jun 2016 16:01:52 BST # gpg: using RSA key 0x894F8F4870E1AE90 # gpg: Good signature from "Stefano Stabellini <stefano.stabellini@eu.citrix.com>" # Primary key fingerprint: D04E 33AB A51F 67BA 07D3 0AEA 894F 8F48 70E1 AE90 * remotes/sstabellini/tags/xen-20160614-tag: xen: Clean up includes xen/blkif: avoid double access to any shared ring request fields Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-06-14  arm: xlnx-zynqmp: Add xlnx-dp and xlnx-dpdma  (KONRAD Frederic, 1 file, -0/+4)
This adds the DP and the DPDMA to the Zynq MP platform. Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com> Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com> Reviewed-by: Alistair Francis <alistair.francis@xilinx.com> Tested-By: Hyun Kwon <hyun.kwon@xilinx.com> Message-id: 1465833014-21982-10-git-send-email-fred.konrad@greensocs.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-06-14  introduce xlnx-dp  (KONRAD Frederic, 1 file, -0/+109)
This is the implementation of the DisplayPort. It has an aux-bus to access dpcd and edid. The graphics plane is connected to channel 3, the video plane to channel 0, and the audio streams to channels 4 and 5. Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com> Tested-By: Hyun Kwon <hyun.kwon@xilinx.com> Reviewed-by: Alistair Francis <alistair.francis@xilinx.com> Message-id: 1465833014-21982-9-git-send-email-fred.konrad@greensocs.com [PMM: fixed format strings] Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-06-14  introduce xlnx-dpdma  (KONRAD Frederic, 1 file, -0/+85)
This is the implementation of the DPDMA. Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com> Reviewed-by: Alistair Francis <alistair.francis@xilinx.com> Tested-By: Hyun Kwon <hyun.kwon@xilinx.com> Message-id: 1465833014-21982-8-git-send-email-fred.konrad@greensocs.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-06-14  hw/i2c-ddc.c: Implement DDC I2C slave  (Peter Maydell, 1 file, -0/+38)
Implement an I2C slave which implements DDC and returns the EDID data for an attached monitor. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Alistair Francis <alistair.francis@xilinx.com> Tested-by: Hyun Kwon <hyun.kwon@xilinx.com> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com> Message-id: 1465833014-21982-7-git-send-email-fred.konrad@greensocs.com - Rebased on the current master. - Modified for QOM. Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com> Reviewed-by: Alistair Francis <alistair.francis@xilinx.com> Tested-By: Hyun Kwon <hyun.kwon@xilinx.com> [PMM: actually wire up the vmstate to dc->vmsd] Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-06-14  introduce dpcd module  (KONRAD Frederic, 1 file, -0/+105)
This introduces the dpcd module. It sits on an aux-bus and can be accessed by the driver to get lane-speed, etc. Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com> Reviewed-by: Alistair Francis <alistair.francis@xilinx.com> Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com> Tested-By: Hyun Kwon <hyun.kwon@xilinx.com> Message-id: 1465833014-21982-6-git-send-email-fred.konrad@greensocs.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-06-14  introduce aux-bus  (KONRAD Frederic, 1 file, -0/+128)
This introduces a new bus: aux-bus. It contains an address space for aux slave devices and a bridge to an I2C bus for I2C-over-AUX transactions. Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com> Tested-By: Hyun Kwon <hyun.kwon@xilinx.com> Reviewed-by: Alistair Francis <alistair.francis@xilinx.com> Message-id: 1465833014-21982-5-git-send-email-fred.konrad@greensocs.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-06-14  i2c: Factor out send() and recv() common logic  (Peter Crosthwaite, 1 file, -0/+1)
Most of the control flow logic between send and recv (error checking etc) is the same. Factor this out into a common send_recv() API. This is then usable by clients, where the control logic for send and receive differs only by a boolean. E.g. if (send) i2c_send(...); else i2c_recv(...); becomes: i2c_send_recv(... , send); Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com> Reviewed-by: Alistair Francis <alistair.francis@xilinx.com> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com> Message-id: 1465833014-21982-4-git-send-email-fred.konrad@greensocs.com Changes from FK: * Rebased on master. * Rebased on my i2c broadcast patch. Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com> Reviewed-by: Alistair Francis <alistair.francis@xilinx.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
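Spelled out, the calling pattern the commit describes looks like this (a sketch; see include/hw/i2c/i2c.h for the authoritative prototype):

    /* Before: direction handled by duplicated control flow at every call site. */
    if (send) {
        ret = i2c_send(bus, data);
    } else {
        data = i2c_recv(bus);
    }

    /* After: one call, with the direction passed as a boolean. */
    ret = i2c_send_recv(bus, &data, send);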
2016-06-14  hw/arm/virt: Add PMU node for virt machine  (Shannon Zhao, 1 file, -0/+4)
Add a virtual PMU device for the virt machine, using PPI 7 as the PMU overflow interrupt number. Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Reviewed-by: Andrew Jones <drjones@redhat.com> Message-id: 1465267577-1808-3-git-send-email-zhaoshenglong@huawei.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-06-14  xen: Clean up includes  (Peter Maydell, 1 file, -1/+0)
Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
2016-06-14  s390x/css: introduce property type for device ids  (Cornelia Huck, 1 file, -0/+17)
Let's introduce a CssDevId to handle device ids of the xx.x.xxxx type used for channel devices. This has some benefits: - We can use them in virtio-ccw and split the validity checks for a channel device id in general from the constraint checking within the virtio-ccw scope. - We can reuse the device id type for future non-virtio channel devices. While we're at it, improve the validity checks and disallow e.g. trailing characters. Suggested-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Acked-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
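A self-contained sketch of parsing the xx.x.xxxx form (cssid.ssid.devno, all hexadecimal) with a trailing-character check; the type, field and function names here are illustrative stand-ins — the real property code lives in hw/s390x/css.c and does stricter validation:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef struct CssDevIdSketch {
        uint8_t  cssid;
        uint8_t  ssid;     /* only 0..3 are architecturally valid */
        uint16_t devno;
        bool     valid;
    } CssDevIdSketch;

    static bool css_devid_parse_sketch(const char *str, CssDevIdSketch *id)
    {
        unsigned int cssid, ssid, devno;
        int consumed = 0;

        /* %n plus an explicit NUL check rejects trailing characters, e.g. "fe.0.0001x". */
        if (sscanf(str, "%2x.%1x.%4x%n", &cssid, &ssid, &devno, &consumed) != 3 ||
            str[consumed] != '\0' || ssid > 3) {
            return false;
        }
        id->cssid = cssid;
        id->ssid = ssid;
        id->devno = devno;
        id->valid = true;
        return true;
    }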
2016-06-14  s390x/kvm: add interface for clearing IO irqs  (Halil Pasic, 1 file, -0/+2)
According to the platform specification, under certain conditions, pending IO interruptions have to be cleared. Let's add an interface for that. Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-06-14  linux-headers: update  (Cornelia Huck, 2 files, -1/+21)
Update to 4.7-rc2. Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-06-14  spapr: Ensure all LMBs are represented in ibm,dynamic-memory  (Bharata B Rao, 1 file, -2/+4)
Memory hotplug can fail for some combinations of RAM and maxmem when DDW is enabled in the presence of devices like nec-usb-xhci. DDW depends on maximum addressable memory returned by guest and this value is currently being calculated wrongly by the guest kernel routine memory_hotplug_max(). While there is an attempt to fix the guest kernel, this patch works around the problem within QEMU itself. memory_hotplug_max() routine in the guest kernel arrives at max addressable memory by multiplying lmb-size with the lmb-count obtained from ibm,dynamic-memory property. There are two assumptions here: - All LMBs are part of ibm,dynamic memory: This is not true for PowerKVM where only hot-pluggable LMBs are present in this property. - The memory area comprising of RAM and hotplug region is contiguous: This needn't be true always for PowerKVM as there can be gap between boot time RAM and hotplug region. To work around this guest kernel bug, ensure that ibm,dynamic-memory has information about all the LMBs (RMA, boot-time LMBs, future hotpluggable LMBs, and dummy LMBs to cover the gap between RAM and hotpluggable region). RMA is represented separately by memory@0 node. Hence mark RMA LMBs and also the LMBs for the gap b/n RAM and hotpluggable region as reserved and as having no valid DRC so that these LMBs are not considered by the guest. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-14  macio: call dma_memory_unmap() at the end of each DMA transfer  (Mark Cave-Ayland, 1 file, -0/+5)
This ensures that the underlying memory is marked dirty once the transfer is complete and resolves cache coherency problems under MacOS 9. Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-14  Add PowerPC AT_HWCAP2 definitions  (Anton Blanchard, 1 file, -0/+13)
We need the PPC_FEATURE2_HAS_HTM bit in a subsequent patch, so add the PowerPC AT_HWCAP2 definitions. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-13  Merge remote-tracking branch 'remotes/kraxel/tags/pull-usb-20160613-1' into staging  (Peter Maydell, 1 file, -1/+0)
usb: misc fixes. # gpg: Signature made Mon 13 Jun 2016 14:09:15 BST # gpg: using RSA key 0x4CB6D8EED3E87138 # gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>" # gpg: aka "Gerd Hoffmann <gerd@kraxel.org>" # gpg: aka "Gerd Hoffmann (private) <kraxel@gmail.com>" # Primary key fingerprint: A032 8CFF B93A 17A7 9901 FE7D 4CB6 D8EE D3E8 7138 * remotes/kraxel/tags/pull-usb-20160613-1: vl: Eliminate usb_enabled() pxa2xx: Unconditionally enable USB controller hw/usb/dev-network.c: Use ldl_le_p() and stl_le_p() usb-host: add special case for bus+addr Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-06-13  crypto: aes: always rename internal symbols  (Mike Frysinger, 1 file, -3/+2)
OpenSSL's libcrypto always defines AES symbols with the same names as qemu's local aes code. This is problematic when enabling at least curl as that frequently also uses libcrypto. It might not be noticed when running, but if you try to statically link, everything falls down. An example snippet:
  LINK  qemu-nbd
  .../libcrypto.a(aes-x86_64.o): In function 'AES_encrypt':
  (.text+0x460): multiple definition of 'AES_encrypt'
  crypto/aes.o:aes.c:(.text+0x670): first defined here
  .../libcrypto.a(aes-x86_64.o): In function 'AES_decrypt':
  (.text+0x9f0): multiple definition of 'AES_decrypt'
  crypto/aes.o:aes.c:(.text+0xb30): first defined here
  .../libcrypto.a(aes-x86_64.o): In function 'AES_cbc_encrypt':
  (.text+0xf90): multiple definition of 'AES_cbc_encrypt'
  crypto/aes.o:aes.c:(.text+0xff0): first defined here
  collect2: error: ld returned 1 exit status
  .../qemu-2.6.0/rules.mak:105: recipe for target 'qemu-nbd' failed
  make: *** [qemu-nbd] Error 1
The aes.h header has redefines already for FreeBSD, but go ahead and enable that for everyone since there's no real good reason to not use a namespace all the time. Signed-off-by: Mike Frysinger <vapier@chromium.org> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
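The usual fix, and roughly what qemu's crypto/aes.h does, is to prefix the built-in implementation so its symbol names can never clash with libcrypto's exports (the exact macro list is a sketch):

    /* Give the built-in AES implementation its own namespace unconditionally,
     * instead of only on FreeBSD as before. */
    #define AES_set_encrypt_key qemu_AES_set_encrypt_key
    #define AES_set_decrypt_key qemu_AES_set_decrypt_key
    #define AES_encrypt         qemu_AES_encrypt
    #define AES_decrypt         qemu_AES_decrypt
    #define AES_cbc_encrypt     qemu_AES_cbc_encrypt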
2016-06-13  vl: Eliminate usb_enabled()  (Eduardo Habkost, 1 file, -1/+0)
This wrapper for machine_usb(current_machine) is not necessary, replace all usages of usb_enabled() with machine_usb(). Cc: Peter Maydell <peter.maydell@linaro.org> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alexander Graf <agraf@suse.de> Cc: qemu-arm@nongnu.org Cc: qemu-ppc@nongnu.org Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-id: 1465419025-21519-3-git-send-email-ehabkost@redhat.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-06-11  tb hash: track translated blocks with qht  (Emilio G. Cota, 3 files, -7/+5)
Having a fixed-size hash table for keeping track of all translation blocks is suboptimal: some workloads are just too big or too small to get maximum performance from the hash table. The MRU promotion policy helps improve performance when the hash table is a little undersized, but it cannot make up for severely undersized hash tables. Furthermore, frequent MRU promotions result in writes that are a scalability bottleneck. For scalability, lookups should only perform reads, not writes. This is not a big deal for now, but it will become one once MTTCG matures. The appended fixes these issues by using qht as the implementation of the TB hash table. This solution is superior to other alternatives considered, namely: - master: implementation in QEMU before this patchset - xxhash: before this patch, i.e. fixed buckets + xxhash hashing + MRU. - xxhash-rcu: fixed buckets + xxhash + RCU list + MRU. MRU is implemented here by adding an intermediate struct that contains the u32 hash and a pointer to the TB; this allows us, on an MRU promotion, to copy said struct (that is not at the head), and put this new copy at the head. After a grace period, the original non-head struct can be eliminated, and after another grace period, freed. - qht-fixed-nomru: fixed buckets + xxhash + qht without auto-resize + no MRU for lookups; MRU for inserts. The appended solution is the following: - qht-dyn-nomru: dynamic number of buckets + xxhash + qht w/ auto-resize + no MRU for lookups; MRU for inserts. The plots below compare the considered solutions. The Y axis shows the boot time (in seconds) of a debian jessie image with arm-softmmu; the X axis sweeps the number of buckets (or initial number of buckets for qht-autoresize). The plots in PNG format (and with errorbars) can be seen here: http://imgur.com/a/Awgnq Each test runs 5 times, and the entire QEMU process is pinned to a single core for repeatability of results. 
[Two ASCII plots from the original commit message are unreadable here after being collapsed onto one line; they show boot time in seconds (Y axis) against log2 number of buckets (X axis) for two hosts, "Host: Intel Xeon E5-2690" and "Host: Intel i7-4790K", comparing master, xxhash, xxhash-rcu, qht-fixed-nomru, qht-dyn-mru and qht-dyn-nomru.] Note that the original point before this patch series is X=15 for "master"; the little sensitivity to the increased number of buckets is due to the poor hashing function in master. xxhash-rcu has significant overhead due to the constant churn of allocating and deallocating intermediate structs for implementing MRU. An alternative would be to consider failed lookups as "maybe not there", and then acquire the external lock (tb_lock in this case) to really confirm that there was indeed a failed lookup. This, however, would not be enough to implement dynamic resizing--this is more complex: see "Resizable, Scalable, Concurrent Hash Tables via Relativistic Programming" by Triplett, McKenney and Walpole. This solution was discarded due to the very coarse RCU read critical sections that we have in MTTCG; resizing requires waiting for readers after every pointer update, and resizes require many pointer updates, so this would quickly become prohibitive. qht-fixed-nomru shows that MRU promotion is advisable for undersized hash tables. However, qht-dyn-mru shows that MRU promotion is not important if the hash table is properly sized: there is virtually no difference in performance between qht-dyn-nomru and qht-dyn-mru. Before this patch, we're at X=15 on "xxhash"; after this patch, we're at X=15 @ qht-dyn-nomru. This patch thus matches the best performance that we can achieve with optimum sizing of the hash table, while keeping the hash table scalable for readers. The improvement we get before and after this patch for booting debian jessie with arm-softmmu is: - Intel Xeon E5-2690: 10.5% less time - Intel i7-4790K: 5.2% less time We could get this same improvement _for this particular workload_ by statically increasing the size of the hash table. But this would hurt workloads that do not need a large hash table. The dynamic (upward) resizing allows us to start small and enlarge the hash table as needed.
A quick note on downsizing: the table is resized back to 2**15 buckets on every tb_flush; this makes sense because it is not guaranteed that the table will reach the same number of TBs later on (e.g. most bootup code is thrown away after boot); it makes sense to grow the hash table as more code blocks are translated. This also avoids the complication of having to build downsizing hysteresis logic into qht. Reviewed-by: Sergey Fedorov <serge.fedorov@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <1465412133-3029-15-git-send-email-cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-06-11  qht: QEMU's fast, resizable and scalable Hash Table  (Emilio G. Cota, 1 file, -0/+183)
This is a fast, scalable chained hash table with optional auto-resizing, allowing reads that are concurrent with reads, and reads/writes that are concurrent with writes to separate buckets. A hash table with these features will be necessary for the scalability of the ongoing MTTCG work; before those changes arrive we can already benefit from the single-threaded speedup that qht also provides. Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <1465412133-3029-11-git-send-email-cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-06-11  qdist: add module to represent frequency distributions of data  (Emilio G. Cota, 1 file, -0/+63)
Sometimes it is useful to have a quick histogram to represent a certain distribution -- for example, when investigating a performance regression in a hash table due to inadequate hashing. The appended allows us to easily represent a distribution using Unicode characters. Further, the data structure keeping track of the distribution is so simple that obtaining its values for off-line processing is trivial. Example, taking the last 10 commits to QEMU:
  Characters in commit title   Count
  -----------------------------------
          39                     1
          48                     1
          53                     1
          54                     2
          57                     1
          61                     1
          67                     1
          78                     1
          80                     1
  qdist_init(&dist);
  qdist_inc(&dist, 39);
  [...]
  qdist_inc(&dist, 80);
  char *str = qdist_pr(&dist, 9, QDIST_PR_LABELS);
  // -> [39.0,43.6)▂▂ █▂ ▂ ▄[75.4,80.0]
  g_free(str);
  char *str = qdist_pr(&dist, 4, QDIST_PR_LABELS);
  // -> [39.0,49.2)▁█▁▁[69.8,80.0]
  g_free(str);
Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <1465412133-3029-9-git-send-email-cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-06-11  tb hash: hash phys_pc, pc, and flags with xxhash  (Emilio G. Cota, 1 file, -2/+6)
For some workloads such as arm bootup, tb_phys_hash is performance-critical. This is due to the high frequency of accesses to the hash table, originated by (frequent) TLB flushes that wipe out the cpu-private tb_jmp_cache's. More info: https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg05098.html To dig further into this I modified an arm image booting debian jessie to immediately shut down after boot. Analysis revealed that quite a bit of time is unnecessarily spent in tb_phys_hash: the cause is poor hashing that results in very uneven loading of chains in the hash table's buckets; the longest observed chain had ~550 elements. The appended addresses this with two changes: 1) Use xxhash as the hash table's hash function. xxhash is a fast, high-quality hashing function. 2) Feed the hashing function with not just tb_phys, but also pc and flags. This improves performance over using just tb_phys for hashing, since that resulted in some hash buckets having many TB's, while others getting very few; with these changes, the longest observed chain on a single hash bucket is brought down from ~550 to ~40. Tests show that the other element checked for in tb_find_physical, cs_base, is always a match when tb_phys+pc+flags are a match, so hashing cs_base is wasteful. It could be that this is an ARM-only thing, though. UPDATE: On Tue, Apr 05, 2016 at 08:41:43 -0700, Richard Henderson wrote: > The cs_base field is only used by i386 (in 16-bit modes), and sparc (for a TB > consisting of only a delay slot). > It may well still turn out to be reasonable to ignore cs_base for hashing. BTW, after this change the hash table should not be called "tb_hash_phys" anymore; this is addressed later in this series. This change gives consistent bootup time improvements. I tested two host machines: - Intel Xeon E5-2690: 11.6% less time - Intel i7-4790K: 19.2% less time Increasing the number of hash buckets yields further improvements. However, using a larger, fixed number of buckets can degrade performance for other workloads that do not translate as many blocks (600K+ for debian-jessie arm bootup). This is dealt with later in this series. Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <1465412133-3029-8-git-send-email-cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-06-11  exec: add tb_hash_func5, derived from xxhash  (Emilio G. Cota, 1 file, -0/+94)
This will be used by upcoming changes for hashing the tb hash. Add this into a separate file to include the copyright notice from xxhash. Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <1465412133-3029-7-git-send-email-cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-06-11  qemu-thread: add simple test-and-set spinlock  (Guillaume Delbergue, 1 file, -0/+35)
Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org> Signed-off-by: Guillaume Delbergue <guillaume.delbergue@greensocs.com> [Rewritten. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [Emilio's additions: use TAS instead of atomic_xchg; emit acquire/release barriers; return bool from trylock; call cpu_relax() while spinning; optimize for uncontended locks by acquiring the lock with TAS instead of TATAS; add qemu_spin_locked().] Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <1465412133-3029-6-git-send-email-cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
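A minimal, self-contained version of such a lock, written here with C11 atomics rather than QEMU's own atomic helpers (illustrative only; the real qemu_spin_* API lives in include/qemu/thread.h):

    #include <stdatomic.h>
    #include <stdbool.h>

    typedef struct SpinSketch {
        atomic_int value;               /* 0 = unlocked, 1 = locked */
    } SpinSketch;

    static inline void spin_lock_sketch(SpinSketch *s)
    {
        /* Try TAS first (fast path for uncontended locks), then spin on plain
         * reads so contended waiters do not keep invalidating the cache line. */
        while (atomic_exchange_explicit(&s->value, 1, memory_order_acquire)) {
            while (atomic_load_explicit(&s->value, memory_order_relaxed)) {
                /* cpu_relax() would go here (see the next entry) */
            }
        }
    }

    static inline bool spin_trylock_sketch(SpinSketch *s)
    {
        return !atomic_exchange_explicit(&s->value, 1, memory_order_acquire);
    }

    static inline bool spin_locked_sketch(SpinSketch *s)
    {
        return atomic_load_explicit(&s->value, memory_order_relaxed);
    }

    static inline void spin_unlock_sketch(SpinSketch *s)
    {
        atomic_store_explicit(&s->value, 0, memory_order_release);
    }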
2016-06-11  include/processor.h: define cpu_relax()  (Emilio G. Cota, 1 file, -0/+30)
Taken from the linux kernel. Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <1465412133-3029-5-git-send-email-cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
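What such a macro typically expands to, per host architecture (a sketch; QEMU's include/qemu/processor.h has the authoritative list):

    #if defined(__i386__) || defined(__x86_64__)
    # define cpu_relax() __asm__ __volatile__("rep; nop" ::: "memory")  /* x86 PAUSE */
    #elif defined(__aarch64__)
    # define cpu_relax() __asm__ __volatile__("yield" ::: "memory")
    #else
    # define cpu_relax() __asm__ __volatile__("" ::: "memory")          /* compiler barrier only */
    #endif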
2016-06-11  seqlock: rename write_lock/unlock to write_begin/end  (Emilio G. Cota, 1 file, -2/+2)
It is a more appropriate name, now that the mutex embedded in the seqlock is gone. Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <1465412133-3029-4-git-send-email-cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
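Typical reader/writer usage after the rename (a sketch; the writer's own lock, the seqlock variable and the data being read are illustrative stand-ins):

    /* Writer: serialize against other writers with your own lock (the seqlock
     * no longer embeds one, see the entry below), then bracket the update. */
    qemu_mutex_lock(&writer_lock);
    seqlock_write_begin(&state_seqlock);
    shared_counter += delta;
    seqlock_write_end(&state_seqlock);
    qemu_mutex_unlock(&writer_lock);

    /* Reader: retry the read if a writer was active in between. */
    unsigned start;
    int64_t snapshot;
    do {
        start = seqlock_read_begin(&state_seqlock);
        snapshot = shared_counter;
    } while (seqlock_read_retry(&state_seqlock, start));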
2016-06-11  seqlock: remove optional mutex  (Emilio G. Cota, 1 file, -9/+1)
This option is unused; besides, it bloats the struct when not needed. Let's just let writers define their own locks elsewhere. Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <1465412133-3029-3-git-send-email-cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-06-11  compiler.h: add QEMU_ALIGNED() to enforce struct alignment  (Emilio G. Cota, 1 file, -0/+2)
Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <1465412133-3029-2-git-send-email-cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
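The macro is a thin wrapper over the GCC/Clang aligned attribute; a sketch of its shape and a typical use (the example struct is illustrative):

    #define QEMU_ALIGNED(X) __attribute__((aligned(X)))

    /* Example: pad a per-CPU structure out to its own cache line
     * to avoid false sharing between CPUs. */
    struct QEMU_ALIGNED(64) counter_sketch {
        uint64_t value;
    };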
2016-06-09  cpu-exec: Rename cpu_resume_from_signal() to cpu_loop_exit_noexc()  (Peter Maydell, 1 file, -1/+1)
The function cpu_resume_from_signal() is now always called with a NULL puc argument, and is rather misnamed since it is never called from a signal handler. It is essentially forcing an exit to the top level cpu loop but without raising any exception, so rename it to cpu_loop_exit_noexc() and drop the useless unused argument. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org> Acked-by: Eduardo Habkost <ehabkost@redhat.com> Acked-by: Riku Voipio <riku.voipio@linaro.org> Message-id: 1463494687-25947-4-git-send-email-peter.maydell@linaro.org
2016-06-08  Merge remote-tracking branch 'remotes/riku/tags/pull-linux-user-20160608' into staging  (Peter Maydell, 1 file, -1/+0)
linux-user pull request for June 2016 # gpg: Signature made Wed 08 Jun 2016 14:27:14 BST # gpg: using RSA key 0xB44890DEDE3C9BC0 # gpg: Good signature from "Riku Voipio <riku.voipio@iki.fi>" # gpg: aka "Riku Voipio <riku.voipio@linaro.org>" * remotes/riku/tags/pull-linux-user-20160608: (44 commits) linux-user: In fork_end(), remove correct CPUs from CPU list linux-user: Special-case ERESTARTSYS in target_strerror() linux-user: Make target_strerror() return 'const char *' linux-user: Correct signedness of target_flock l_start and l_len fields linux-user: Use safe_syscall wrapper for ioctl linux-user: Use safe_syscall wrapper for accept and accept4 syscalls linux-user: Use safe_syscall wrapper for semop linux-user: Use safe_syscall wrapper for epoll_wait syscalls linux-user: Use safe_syscall wrapper for poll and ppoll syscalls linux-user: Use safe_syscall wrapper for sleep syscalls linux-user: Use safe_syscall wrapper for rt_sigtimedwait syscall linux-user: Use safe_syscall wrapper for flock linux-user: Use safe_syscall wrapper for mq_timedsend and mq_timedreceive linux-user: Use safe_syscall wrapper for msgsnd and msgrcv linux-user: Use safe_syscall wrapper for send* and recv* syscalls linux-user: Use safe_syscall wrapper for connect syscall linux-user: Use safe_syscall wrapper for readv and writev syscalls linux-user: Fix error conversion in 64-bit fadvise syscall linux-user: Fix NR_fadvise64 and NR_fadvise64_64 for 32-bit guests linux-user: Fix handling of arm_fadvise64_64 syscall ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Conflicts: configure scripts/qemu-binfmt-conf.sh
2016-06-08  block: Kill bdrv_co_write_zeroes()  (Eric Blake, 1 file, -2/+0)
Now that all drivers have been converted to a byte interface, we no longer need a sector interface. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-08  block: Switch bdrv_write_zeroes() to byte interface  (Eric Blake, 1 file, -5/+5)
Rename to bdrv_pwrite_zeroes() to let the compiler ensure we cater to the updated semantics. Do the same for bdrv_co_write_zeroes(). Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
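Call sites convert along these lines (a sketch; variable names are illustrative, BDRV_SECTOR_SIZE is 512):

    /* Before: sector-based interface */
    ret = bdrv_write_zeroes(bs, sector_num, nb_sectors, flags);

    /* After: byte-based interface */
    ret = bdrv_pwrite_zeroes(bs, sector_num * BDRV_SECTOR_SIZE,
                             nb_sectors * BDRV_SECTOR_SIZE, flags);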