aboutsummaryrefslogtreecommitdiff
path: root/util/bufferiszero.c
AgeCommit message (Collapse)AuthorFilesLines
2023-05-23util/bufferiszero: Use i386 host/cpuinfo.hRichard Henderson1-80/+45
Use cpuinfo_init() during init_accel(), and the variable cpuinfo during test_buffer_is_zero_next_accel(). Adjust the logic that cycles through the set of accelerators for testing. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-03-05include/qemu/cpuid: Introduce xgetbv_lowRichard Henderson1-2/+1
Replace the two uses of asm to expand xgetbv with an inline function. Since one of the two has been using the mnemonic, assume that the comment about "older versions of the assember" is obsolete, as even that is 4 years old. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-01-16util/bufferiszero: Use __attribute__((target)) for avx2/avx512Richard Henderson1-35/+6
Use the attribute, which is supported by clang, instead of the #pragma, which is not supported and, for some reason, also not detected by the meson probe, so we fail by -Werror. Include only <immintrin.h> as that is the outermost "official" header for these intrinsics -- emmintrin.h and smmintrin -- are older SSE2 and SSE4 specific headers, while the immintrin.h includes all of the Intel intrinsics. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2022-02-04cpuid: use unsigned for max cpuidMichael S. Tsirkin1-1/+1
__get_cpuid_max returns an unsigned value. For consistency, store the result in an unsigned variable. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2020-04-01util/bufferiszero: improve avx2 acceleratorRobert Hoo1-17/+9
By increasing avx2 length_to_accel to 128, we can simplify its logic and reduce a branch. The authorship of this patch actually belongs to Richard Henderson <richard.henderson@linaro.org>, I just fixed a boundary case on his original patch. Suggested-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Robert Hoo <robert.hu@linux.intel.com> Message-Id: <1585119021-46593-2-git-send-email-robert.hu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-01util/bufferiszero: assign length_to_accel value for each accelerator caseRobert Hoo1-0/+3
Because in unit test, init_accel() will be called several times, each with different accelerator type. Signed-off-by: Robert Hoo <robert.hu@linux.intel.com> Message-Id: <1585119021-46593-1-git-send-email-robert.hu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16util: add util function buffer_zero_avx512()Robert Hoo1-10/+61
And intialize buffer_is_zero() with it, when Intel AVX512F is available on host. This function utilizes Intel AVX512 fundamental instructions which is faster than its implementation with AVX2 (in my unit test, with 4K buffer, on CascadeLake SP, ~36% faster, buffer_zero_avx512() V.S. buffer_zero_avx2()). Signed-off-by: Robert Hoo <robert.hu@linux.intel.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-06-12Include qemu-common.h exactly where neededMarkus Armbruster1-1/+0
No header includes qemu-common.h after this commit, as prescribed by qemu-common.h's file comment. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190523143508.25387-5-armbru@redhat.com> [Rebased with conflicts resolved automatically, except for include/hw/arm/xlnx-zynqmp.h hw/arm/nrf51_soc.c hw/arm/msf2-soc.c block/qcow2-refcount.c block/qcow2-cluster.c block/qcow2-cache.c target/arm/cpu.h target/lm32/cpu.h target/m68k/cpu.h target/mips/cpu.h target/moxie/cpu.h target/nios2/cpu.h target/openrisc/cpu.h target/riscv/cpu.h target/tilegx/cpu.h target/tricore/cpu.h target/unicore32/cpu.h target/xtensa/cpu.h; bsd-user/main.c and net/tap-bsd.c fixed up]
2017-07-24util: Introduce include/qemu/cpuid.hRichard Henderson1-4/+3
Clang 3.9 passes the CONFIG_AVX2_OPT configure test. However, the supplied <cpuid.h> does not contain the bit_AVX2 define that we use when detecting whether the routine can be enabled. Introduce a qemu-specific header that uses the compiler's definition of __cpuid et al, but supplies any missing bit_* definitions needed. This avoids introducing any extra ifdefs to util/bufferiszero.c, and allows quite a few to be removed from tcg/i386/tcg-target.inc.c. Signed-off-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20170719044018.18063-1-rth@twiddle.net Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-09-14cutils: Rewrite x86 buffer zero checkingRichard Henderson1-75/+156
Handle alignment of buffers, so that the vector paths can be used more often. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1473800239-13841-1-git-send-email-rth@twiddle.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-13cutils: Add generic prefetchRichard Henderson1-0/+5
There's no real knowledge of the cacheline size, just prefetching one loop ahead. Signed-off-by: Richard Henderson <rth@twiddle.net> Message-Id: <1472496380-19706-7-git-send-email-rth@twiddle.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-13cutils: Add SSE4 versionPaolo Bonzini1-0/+10
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-13cutils: Add test for buffer_is_zeroRichard Henderson1-0/+20
Signed-off-by: Richard Henderson <rth@twiddle.net> Message-Id: <1472496380-19706-6-git-send-email-rth@twiddle.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-13cutils: Remove ppc buffer zero checkingRichard Henderson1-25/+1
For ppc64le, gcc6 does extremely poorly with the Altivec code. Moreover, on POWER7 and POWER8, a hand-optimized Altivec version turns out to be no faster than the revised integer version, and therefore not worth the effort. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-13cutils: Remove aarch64 buffer zero checkingRichard Henderson1-15/+0
The revised integer version is 4 times faster than the neon version on an AppliedMicro Mustang. Even with hand scheduling and additional unrolling I cannot make any neon version run as fast as the integer. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-13cutils: Rearrange buffer_is_zero accelerationRichard Henderson1-191/+157
Allow selection of several acceleration functions based on the size and alignment of the buffer. Do not require ifunc support for AVX2 acceleration. Signed-off-by: Richard Henderson <rth@twiddle.net> Message-Id: <1472496380-19706-5-git-send-email-rth@twiddle.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-13cutils: Export only buffer_is_zeroRichard Henderson1-4/+4
Since the two users don't make use of the returned offset, beyond ensuring that the entire buffer is zero, consider the can_use_buffer_find_nonzero_offset and buffer_find_nonzero_offset functions internal. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Richard Henderson <rth@twiddle.net> Message-Id: <1472496380-19706-4-git-send-email-rth@twiddle.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-13cutils: Remove SPLAT macroRichard Henderson1-4/+0
This is unused and complicates the vector interface. Signed-off-by: Richard Henderson <rth@twiddle.net> Message-Id: <1472496380-19706-3-git-send-email-rth@twiddle.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-13cutils: Move buffer_is_zero and subroutines to a new fileRichard Henderson1-0/+272
Signed-off-by: Richard Henderson <rth@twiddle.net> Message-Id: <1472496380-19706-2-git-send-email-rth@twiddle.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>