riscv-gnu-toolchain/gcc.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author	Files	Lines
2024-07-07	ada: Make the names of uninstalled cross-gnattools consistent across builds	Maciej W. Rozycki	3	-11/+31
	We suffer from an inconsistency in the names of uninstalled gnattools executables in cross-compiler configurations. The cause is a recipe we have: ada.all.cross: for tool in $(ADA_TOOLS) ; do \ if [ -f $$tool$(exeext) ] ; \ then \ $(MV) $$tool$(exeext) $$tool-cross$(exeext); \ fi; \ done the intent of which is to give the names of gnattools executables the '-cross' suffix, consistently with the compiler drivers: 'gcc-cross', 'g++-cross', etc. A problem with the recipe is that this 'make' target is called too early in the build process, before gnattools have been made. Consequently no renames happen and owing to that they are conditional on the presence of the individual executables the recipe succeeds doing nothing. However if a target is requested later on such as 'make pdf' that does not cause gnattools executables to be rebuilt, then 'ada.all.cross' does succeed in renaming the executables already present in the build tree. Then if the 'gnat' testsuite is run later on which expects non-suffixed 'gnatmake' executable, it does not find the 'gnatmake-cross' executable in the build tree and may either catastrophically fail or incorrectly use a system-installed copy of 'gnatmake'. Of course if a target is requested such as `make all' that does cause gnattools executables to be rebuilt, then both suffixed and non-suffixed uninstalled executables result. Fix the problem by moving the renaming of gnattools to a separate 'make' recipe, pasted into a new 'gnattools-cross-mv' target and the existing legacy 'cross-gnattools' target. Then invoke the new target explicitly from the 'gnattools-cross' recipe in gnattools/. Update the test harness accordingly, so that suffixed gnattools are used in cross-compilation testsuite runs. gcc/ada/ * gcc-interface/Make-lang.in (ada.all.cross): Move recipe to... (GNATTOOLS_CROSS_MV): ... this new variable. (cross-gnattools): Paste it here. (gnattools-cross-mv): New target. gnattools/ * Makefile.in (gnattools-cross): Also build 'gnattools-cross-mv' in GCC_DIR. gcc/testsuite/ * lib/gnat.exp (local_find_gnatmake, find_gnatclean): Use '-cross' suffix where testing a cross-compiler.
2024-07-07	libstdc++: Fix std::find for non-contiguous iterators [PR115799]	Jonathan Wakely	2	-24/+27
	The r15-1857 change didn't correctly restrict the new optimization to contiguous iterators. libstdc++-v3/ChangeLog: PR libstdc++/115799 * include/bits/stl_algo.h (find): Use 'if constexpr' so that memchr optimization is a discarded statement for non-contiguous iterators. * testsuite/25_algorithms/find/bytes.cc: Check with input iterators.
2024-07-07	libstdc++: Fix memchr path in std::ranges::find for non-common range [PR115799]	Jonathan Wakely	2	-10/+19
	The memchr optimization introduced in r15-1857 needs to advance the start iterator instead of returning the sentinel. libstdc++-v3/ChangeLog: PR libstdc++/115799 * include/bits/ranges_util.h (__find_fn): Return iterator instead of sentinel. * testsuite/25_algorithms/find/constrained.cc: Check non-common contiguous sized range of char.
2024-07-07	Daily bump.	GCC Administrator	4	-1/+128

2024-07-06	libstdc++: Remove redundant 17_intro/headers tests	Jonathan Wakely	35	-1278/+26
	We have several nearly identical tests under 17_intro/headers which only differ in a -std option set using dg-options. Since the testsuite now supports running tests with multiple -std options (and I test that regularly) we don't need these duplicated tests. We can remove most of them and let the testsuite decide which -std option to use. In the all_attributes.cc case the content of the tests is slightly different, but they can be combined into one test that defines macros conditionally based on __cplusplus checks. The stdc++.cc tests could also be combined this way, but for now I've just kept one version for c++98 and one for all later standards. For stdc++_multiple_inclusion.cc we can remove the body of the files and just include stdc++.cc twice. This means we don't need to add includes to both stdc++.cc and stdc++_multiple_inclusion.cc, we only need to update one place. libstdc++-v3/ChangeLog: * testsuite/17_intro/headers/c++1998/all_attributes.cc: Add attribute names from later standards and remove dg-options. * testsuite/17_intro/headers/c++1998/stdc++.cc: Add c++98_only target selector. * testsuite/17_intro/headers/c++1998/stdc++_multiple_inclusion.cc: Remove content and include stdc++.cc twice instead. * testsuite/17_intro/headers/c++2011/stdc++.cc: Replace dg-options with c++11 target selector. * testsuite/17_intro/headers/c++2011/stdc++_multiple_inclusion.cc: Remove content and include stdc++.cc twice instead. * testsuite/17_intro/headers/c++2011/all_attributes.cc: Removed. * testsuite/17_intro/headers/c++2011/all_no_exceptions.cc: Removed. * testsuite/17_intro/headers/c++2011/all_no_rtti.cc: Removed. * testsuite/17_intro/headers/c++2011/all_pedantic_errors.cc: Removed. * testsuite/17_intro/headers/c++2011/charset.cc: Removed. * testsuite/17_intro/headers/c++2011/operator_names.cc: Removed. * testsuite/17_intro/headers/c++2014/all_attributes.cc: Removed. * testsuite/17_intro/headers/c++2014/all_no_exceptions.cc: Removed. * testsuite/17_intro/headers/c++2014/all_no_rtti.cc: Removed. * testsuite/17_intro/headers/c++2014/all_pedantic_errors.cc: Removed. * testsuite/17_intro/headers/c++2014/charset.cc: Removed. * testsuite/17_intro/headers/c++2014/operator_names.cc: Removed. * testsuite/17_intro/headers/c++2014/stdc++.cc: Removed. * testsuite/17_intro/headers/c++2014/stdc++_multiple_inclusion.cc: Removed. * testsuite/17_intro/headers/c++2017/all_attributes.cc: Removed. * testsuite/17_intro/headers/c++2017/all_no_exceptions.cc: Removed. * testsuite/17_intro/headers/c++2017/all_no_rtti.cc: Removed. * testsuite/17_intro/headers/c++2017/all_pedantic_errors.cc: Removed. * testsuite/17_intro/headers/c++2017/charset.cc: Removed. * testsuite/17_intro/headers/c++2017/operator_names.cc: Removed. * testsuite/17_intro/headers/c++2017/stdc++.cc: Removed. * testsuite/17_intro/headers/c++2017/stdc++_multiple_inclusion.cc: Removed. * testsuite/17_intro/headers/c++2020/all_attributes.cc: Removed. * testsuite/17_intro/headers/c++2020/all_no_exceptions.cc: Removed. * testsuite/17_intro/headers/c++2020/all_no_rtti.cc: Removed. * testsuite/17_intro/headers/c++2020/all_pedantic_errors.cc: Removed. * testsuite/17_intro/headers/c++2020/charset.cc: Removed. * testsuite/17_intro/headers/c++2020/operator_names.cc: Removed. * testsuite/17_intro/headers/c++2020/stdc++.cc: Removed. * testsuite/17_intro/headers/c++2020/stdc++_multiple_inclusion.cc: Removed.
2024-07-06	libstdc++: Use reserved form of [[__likely__]] in <variant>	Jonathan Wakely	1	-1/+1
	We should not use [[unlikely]] before C++20, so use [[__unlikely__]] instead. libstdc++-v3/ChangeLog: * include/std/variant (_Variant_storage::_M_reset): Use __unlikely__ form of attribute instead of unlikely.
2024-07-06	libstdc++: Restore support for including <name.h> in extern "C" [PR115797]	Jonathan Wakely	2	-2/+4
	The r15-1857 change means that <type_traits> is included by <cmath> for C++17 and up, which breaks code including <math.h> inside an extern "C" block. Although doing that is not allowed by the C++ standard, there's lots of existing code which incorrectly thinks it's a good idea and so we try to support it. libstdc++-v3/ChangeLog: PR libstdc++/115797 * include/std/type_traits: Ensure "C++" language linkage. * testsuite/17_intro/headers/c++2011/linkage.cc: Replace dg-options with c++11 target selector.
2024-07-06	[to-be-committed][v3][RISC-V] Handle bit manipulation of SImode values	Jeff Law	4	-16/+192
	Last patch in this round of bitmanip work... At least I think I'm going to pause here and switch gears to other projects that need attention 🙂 This patch introduces the ability to generate bitmanip instructions for rv64 when operating on SI objects when we know something about the range of the bit position (due to masking of the position). I've got note that the (7-pos % 8) bit position form was discovered by RAU in 500.perl. I took that and expanded it to the simple (pos & mask) form as well as covering bset, binv and bclr. As far as the implementation is concerned.... This turns the recently added define_splits into define_insn_and_split constructs. This allows combine to "see" enough RTL to realize a sign extension is unnecessary. Otherwise we get undesirable sign extensions for the new testcases. Second it adds new patterns for the logical operations. Two patterns for IOR/XOR and two patterns for AND. I think a key concept to keep in mind is that once we determine a Zbs operation is safe to perform on a SI value, we can rewrite the RTL in 64bit form. If we were ever to try and use range information at expand time for this stuff (and we probably should investigate that), that's the path I'd suggest. This is notably cleaner than my original implementation which actually kept the more complex RTL form through final and emitted 2/3 instructions (mask the bit position, then the bset/bclr/binv). Tested in my tester, but waiting for pre-commit CI to report back before taking further action. gcc/ * config/riscv/bitmanip.md (bset splitters): Turn into define_and_splits. Don't depend on combine splitting the "andn with constant" form. (bset, binv, bclr with masked bit position): New patterns. gcc/testsuite * gcc.target/riscv/binv-for-simode-1.c: New test. * gcc.target/riscv/bset-for-simode-1.c: New test. * gcc.target/riscv/bclr-for-simode-1.c: New test.
2024-07-06	testsuite/52641 - Fix more sloppy tests.	Georg-Johann Lay	14	-11/+25
	PR testsuite/52641 gcc/testsuite/ * gcc.dg/analyzer/torture/boxed-ptr-1.c: Requires size24plus. * gcc.dg/analyzer/torture/pr102692.c: Use intptr_t instead of long. * gcc.dg/ipa/pr102714.c: Use uintptr_t instead of unsigned long. * gcc.dg/torture/pr115387-1.c: Same. * gcc.dg/torture/pr113895-1.c : Same. * gcc.dg/ipa/pr108007.c: Require int32plus. * gcc.dg/ipa/pr109318.c: Same. * gcc.dg/ipa/pr96040.c: Use size_t instead of unsigned long. * gcc.dg/torture/pr113126.c: Use vectors of same dimension. * gcc.dg/tree-ssa/builtin-sprintf-9.c: Requires double64. * gcc.dg/spellcheck-inttypes.c [avr]: Avoid include of inttypes.h. * gcc.dg/analyzer/torture/pr104159.c [avr]: Skip. * gcc.dg/torture/pr84682-2.c [avr]: Skip. * gcc.dg/wtr-conversion-1.c [avr]: Remove avr selector since long double is a 64-bit type by now.
2024-07-06	[committed] Fix various sh define_insn_and_split predicates	Jeff Law	1	-50/+50
	The sh4-linux-gnu port has failed to bootstrap since the introduction of late combine due to failures to split certain insns. This is caused by incorrect predicates in various define_insn_and_split patterns. Essentially the insn's predicate is something like "TARGET_SH1". The split predicate is "&& can_create_pseudos_p ()". So these patterns will match post-reload, but be un-splittable. So at assembly output time, we get the failure as the output template is "#". This patch fixes the most obvious & egregious cases by bringing the split condition into the insn's predicate and leaving "&& 1" as the split condition. That's enough to get sh4-linux-gnu bootstrapping again and I'm hoping it does the same for sh4eb-linux-gnu. Pushing to the trunk. gcc/ * config/sh/sh.md (adddi3): Only allow matching when we can still create new pseudos. (subdi3, rotcl, rotcr, rotcr_neg_t, negdi2): Likewise. (abs<mode>2, negabs<mode>2, negdi_cond): Likewise. (swapbisi2_and_shl8, swapbhisi2, movsi_index_disp_load): Likewise. (movhi_index_disp_load, mov<mode>index_disp_store): Likewise. (mov_t_msb_neg, negt_msb, clipu_one): Likewise.
2024-07-06	AVR: Create more opportunities for -mfuse-add optimization.	Georg-Johann Lay	3	-21/+80
	avr_split_tiny_move() was only run for AVR_TINY because it has no PLUS addressing modes. Same applies to the X register on ordinary cores, and also to the Z register when used with [E]LPM. For example, without this patch long long addLL (long long a, long long b) { return a + b; } compiles with "-mmcu=atmgea128 -Os -dp" to: ... movw r26,r24 ; 80 [c=4 l=1] movhi/0 movw r30,r22 ; 81 [c=4 l=1] movhi/0 ld r18,X ; 82 [c=4 l=1] movqi_insn/3 adiw r26,1 ; 83 [c=4 l=3] movqi_insn/3 ld r19,X sbiw r26,1 adiw r26,2 ; 84 [c=4 l=3] movqi_insn/3 ld r20,X sbiw r26,2 adiw r26,3 ; 85 [c=4 l=3] movqi_insn/3 ld r21,X sbiw r26,3 adiw r26,4 ; 86 [c=4 l=3] movqi_insn/3 ld r22,X sbiw r26,4 adiw r26,5 ; 87 [c=4 l=3] movqi_insn/3 ld r23,X sbiw r26,5 adiw r26,6 ; 88 [c=4 l=3] movqi_insn/3 ld r24,X sbiw r26,6 adiw r26,7 ; 89 [c=4 l=2] movqi_insn/3 ld r25,X ld r10,Z ; 90 [c=4 l=1] movqi_insn/3 ... whereas with this patch it becomes: ... movw r26,r24 ; 80 [c=4 l=1] movhi/0 movw r30,r22 ; 81 [c=4 l=1] movhi/0 ld r18,X+ ; 140 [c=4 l=1] movqi_insn/3 ld r19,X+ ; 142 [c=4 l=1] movqi_insn/3 ld r20,X+ ; 144 [c=4 l=1] movqi_insn/3 ld r21,X+ ; 146 [c=4 l=1] movqi_insn/3 ld r22,X+ ; 148 [c=4 l=1] movqi_insn/3 ld r23,X+ ; 150 [c=4 l=1] movqi_insn/3 ld r24,X+ ; 152 [c=4 l=1] movqi_insn/3 ld r25,X ; 109 [c=4 l=1] movqi_insn/3 ld r10,Z ; 111 [c=4 l=1] movqi_insn/3 ... gcc/ * config/avr/avr.md: Also split with avr_split_tiny_move() for non-AVR_TINY. * config/avr/avr.cc (avr_split_tiny_move): Don't change memory references with base regs that can do PLUS addressing. (avr_out_lpm_no_lpmx) [POST_INC]: Don't output final ADIW when the address register is unused after. gcc/testsuite/ * gcc.target/avr/torture/fuse-add.c: New test.
2024-07-06	RISC-V: fix internal error on global variable-length array	Eric Botcazou	3	-1/+45
	This is an ICE in the RISC-V back-end calling tree_to_uhwi on the DECL_SIZE of a global variable-length array. gcc/ PR target/115591 * config/riscv/riscv.cc (riscv_valid_lo_sum_p): Add missing test on tree_fits_uhwi_p before calling tree_to_uhwi. gcc/testsuite/ * gnat.dg/array41.ads, gnat.dg/array41.adb: New test.
2024-07-06	PR target/115751: Avoid force_reg in ix86_expand_ternlog.	Roger Sayle	1	-2/+13
	This patch fixes a problem with splitting of complex AVX512 ternlog instructions on x86_64. A recent change allows the ternlog pattern to have multiple mem-like operands prior to reload, by emitting any "reloads" as necessary during split1, before register allocation. The issue is that this code calls force_reg to place the mem-like operand into a register, but unfortunately the vec_duplicate (broadcast) form of operands supported by ternlog isn't considered a "general_operand", i.e. supported by all instructions. This mismatch triggers an ICE in the middle-end's force_reg, even though the x86 supports loading these vec_duplicate operands into a vector register in a single (move) instruction. This patch resolves this problem by replacing force_reg with calls to gen_reg_rtx and emit_move (as the i386 backend, unlike the middle-end, knows these will be recognized by recog). 2024-07-06 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog PR target/115751 * config/i386/i386-expand.cc (ix86_expand_ternlog): Avoid use of force_reg to "reload" non-register operands, as these may contain vec_duplicate (broadcast) operands that aren't supported by force_reg. Use (safer) gen_reg_rtx and emit_move instead.
2024-07-06	Daily bump.	GCC Administrator	6	-1/+194

2024-07-05	x86, Darwin: Fix bootstrap for 32b multilibs/hosts.	Iain Sandoe	1	-0/+23
	r15-1735-ge62ea4fb8ffcab06ddd contained changes that altered the codegen for 32b Darwin (whether hosted on 64b or as 32b host) such that the per function picbase load is called multiple times in some cases. Darwin's back end is not expecting this (and indeed some of the handling depends on a single instance). The fixes the issue by marking those instructions as not copyable (as suggested by Andrew Pinski). The change is Darwin-specific. gcc/ChangeLog: * config/i386/i386.cc (ix86_cannot_copy_insn_p): New. (TARGET_CANNOT_COPY_INSN_P): New. Signed-off-by: Iain Sandoe <iains@gcc.gnu.org>
2024-07-06	Fortran: switch test to use issignaling() built-in	Francois-Xavier Coudert	2	-10/+3
	The macro may not be present in all libc's, but the built-in is always available. gcc/testsuite/ChangeLog: * gfortran.dg/ieee/signaling_2.f90: Adjust test. * gfortran.dg/ieee/signaling_2_c.c: Adjust test.
2024-07-05	Arm: Fix ldrd offset range [PR115153]	Wilco Dijkstra	3	-29/+48
	The valid offset range of LDRD in arm_legitimate_index_p is increased to -1024..1020 if NEON is enabled since VALID_NEON_DREG_MODE includes DImode. Fix this by moving the LDRD check earlier. gcc: PR target/115153 * config/arm/arm.cc (arm_legitimate_index_p): Move LDRD case before NEON. (thumb2_legitimate_index_p): Update comments. (output_move_neon): Use DFmode for vldr/vstr and non-checking adjust_address. gcc/testsuite: PR target/115153 * gcc.target/arm/pr115153.c: Add new test. * lib/target-supports.exp: Add arm_arch_v7ve_neon target support.
2024-07-05	libgccjit: Allow comparing array types	Antoni Boucher	3	-0/+23
	gcc/jit/ChangeLog: * jit-common.h: Add array_type class. * jit-recording.h (type::dyn_cast_array_type, memento_of_get_aligned::dyn_cast_array_type, array_type::dyn_cast_array_type, array_type::is_same_type_as): New methods. gcc/testsuite/ChangeLog: * jit.dg/test-types.c: Add array type comparison to the test.
2024-07-05	libgccjit: Add support for the type bfloat16	Antoni Boucher	8	-2/+67
	gcc/jit/ChangeLog: PR jit/112574 * docs/topics/types.rst: Document GCC_JIT_TYPE_BFLOAT16. * jit-common.h: Update NUM_GCC_JIT_TYPES. * jit-playback.cc (get_tree_node_for_type): Support bfloat16. * jit-recording.cc (recording::memento_of_get_type::get_size, recording::memento_of_get_type::dereference, recording::memento_of_get_type::is_int, recording::memento_of_get_type::is_signed, recording::memento_of_get_type::is_float, recording::memento_of_get_type::is_bool): Support bfloat16. * libgccjit.h (enum gcc_jit_types): Add GCC_JIT_TYPE_BFLOAT16. gcc/testsuite/ChangeLog: PR jit/112574 * jit.dg/all-non-failing-tests.h: New test test-bfloat16.c. * jit.dg/test-types.c: Test GCC_JIT_TYPE_BFLOAT16. * jit.dg/test-bfloat16.c: New test.
2024-07-05	MAINTAINERS: Fix order in DCO	Filip Kastl	1	-1/+1
	ChangeLog: * MAINTAINERS: Fix order in Contributing under the DCO. Signed-off-by: Filip Kastl <fkastl@suse.cz>
2024-07-05	RISC-V: Use tu policy for first-element vec_set [PR115725].	Robin Dapp	6	-33/+22
	This patch changes the tail policy for vmv.s.x from ta to tu. By default the bug does not show up with qemu because qemu's current vmv.s.x implementation always uses the tail-undisturbed policy. With a local qemu version that overwrites the tail with ones when the tail-agnostic policy is specified, the bug shows. gcc/ChangeLog: * config/riscv/autovec.md: Add TU policy. * config/riscv/riscv-protos.h (enum insn_type): Define SCALAR_MOVE_MERGED_OP_TU. gcc/testsuite/ChangeLog: PR target/115725 * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c: Adjust test expectation. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-3.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-4.c: Ditto.
2024-07-05	AVR: target/87376 - Use nop_general_operand for DImode inputs.	Georg-Johann Lay	2	-13/+73
	The avr-dimode.md expanders have code like emit_move_insn(acc_a, operands[1]) where acc_a is a hard register and operands[1] might be a non-generic address-space memory reference. Such loads may clobber hard regs since some of them are implemented as libgcc calls /and/ 64-moves are expanded as eight byte-moves, so that acc_a or acc_b might be clobbered by such a load. This patch simply denies non-generic address-space references by using nop_general_operand for all avr-dimode.md input predicates. With the patch, all memory loads that require library calls are issued before the expander codes from avr-dimode.md are run. PR target/87376 gcc/ * config/avr/avr-dimode.md: Use "nop_general_operand" instead of "general_operand" as predicate for all input operands. gcc/testsuite/ * gcc.target/avr/torture/pr87376.c: New test.
2024-07-05	libstdc++: Add dg-error for new -Wdelete-incomplete diagnostics [PR115747]	Jonathan Wakely	1	-0/+3
	Since r15-1794-gbeb7a418aaef2e the -Wdelete-incomplete diagnostic is a permerror instead of a (suppressed in system headers) warning. Add dg-error directives. libstdc++-v3/ChangeLog: PR c++/115747 * testsuite/tr1/2_general_utilities/shared_ptr/cons/43820_neg.cc: Add dg-error for new C++26 diagnostics.
2024-07-05	libstdc++: Use RAII in <bits/stl_uninitialized.h>	Jonathan Wakely	1	-209/+156
	This adds an _UninitDestroyGuard class template, similar to ranges::_DestroyGuard used in <bits/ranges_uninitialized.h>. This allows us to remove all the try-catch blocks and rethrows, because any required cleanup gets done in the guard destructor. libstdc++-v3/ChangeLog: * include/bits/stl_uninitialized.h (_UninitDestroyGuard): New class template and partial specialization. (__do_uninit_copy, __do_uninit_fill, __do_uninit_fill_n) (__uninitialized_copy_a, __uninitialized_fill_a) (__uninitialized_fill_n_a, __uninitialized_copy_move) (__uninitialized_move_copy, __uninitialized_fill_move) (__uninitialized_move_fill, __uninitialized_default_1) (__uninitialized_default_n_a, __uninitialized_default_novalue_1) (__uninitialized_default_novalue_n_1, __uninitialized_copy_n) (__uninitialized_copy_n_pair): Use it.
2024-07-05	libstdc++: Use memchr to optimize std::find [PR88545]	Jonathan Wakely	4	-2/+202
	This optimizes std::find to use memchr when searching for an integer in a range of bytes. libstdc++-v3/ChangeLog: PR libstdc++/88545 PR libstdc++/115040 * include/bits/cpp_type_traits.h (__can_use_memchr_for_find): New variable template. * include/bits/ranges_util.h (__find_fn): Use memchr when possible. * include/bits/stl_algo.h (find): Likewise. * testsuite/25_algorithms/find/bytes.cc: New test.
2024-07-05	AArch64: lower 2 reg TBL permutes with one zero register to 1 reg TBL.	Tamar Christina	3	-6/+94
	When a two reg TBL is performed with one operand being a zero vector we can instead use a single reg TBL and map the indices for accessing the zero vector to an out of range constant. On AArch64 out of range indices into a TBL have a defined semantics of setting the element to zero. Many uArches have a slower 2-reg TBL than 1-reg TBL. Before this change we had: typedef unsigned int v4si __attribute__ ((vector_size (16))); v4si f1 (v4si a) { v4si zeros = {0,0,0,0}; return __builtin_shufflevector (a, zeros, 0, 5, 1, 6); } which generates: f1: mov v30.16b, v0.16b movi v31.4s, 0 adrp x0, .LC0 ldr q0, [x0, #:lo12:.LC0] tbl v0.16b, {v30.16b - v31.16b}, v0.16b ret .LC0: .byte 0 .byte 1 .byte 2 .byte 3 .byte 20 .byte 21 .byte 22 .byte 23 .byte 4 .byte 5 .byte 6 .byte 7 .byte 24 .byte 25 .byte 26 .byte 27 and with the patch: f1: adrp x0, .LC0 ldr q31, [x0, #:lo12:.LC0] tbl v0.16b, {v0.16b}, v31.16b ret .LC0: .byte 0 .byte 1 .byte 2 .byte 3 .byte -1 .byte -1 .byte -1 .byte -1 .byte 4 .byte 5 .byte 6 .byte 7 .byte -1 .byte -1 .byte -1 .byte -1 This sequence is generated often by openmp and aside from the strict performance impact of this change, it also gives better register allocation as we no longer have the consecutive register limitation. gcc/ChangeLog: * config/aarch64/aarch64.cc (struct expand_vec_perm_d): Add zero_op0_p and zero_op_p1. (aarch64_evpc_tbl): Implement register value remapping. (aarch64_vectorize_vec_perm_const): Detect if operand is a zero dup before it's forced to a reg. gcc/testsuite/ChangeLog: * gcc.target/aarch64/tbl_with_zero_1.c: New test. * gcc.target/aarch64/tbl_with_zero_2.c: New test.
2024-07-05	AArch64: remove aarch64_simd_vec_unpack<su>_lo_	Tamar Christina	2	-18/+5
	The fix for PR18127 reworked the uxtl to zip optimization. In doing so it undid the changes in aarch64_simd_vec_unpack<su>_lo_ and this now no longer matches aarch64_simd_vec_unpack<su>_hi_. It still works because the RTL generated by aarch64_simd_vec_unpack<su>_lo_ overlaps with the general zero extend RTL and so because that one is listed before the lo pattern recog picks it instead. This removes aarch64_simd_vec_unpack<su>_lo_. gcc/ChangeLog: * config/aarch64/aarch64-simd.md (aarch64_simd_vec_unpack<su>_lo_<mode>): Remove. (vec_unpack<su>_lo_<mode): Simplify. * config/aarch64/aarch64.cc (aarch64_gen_shareable_zero): Update comment.
2024-07-05	middle-end: Add debug functions to dump dominator tree in dot format	Alex Coplan	1	-0/+30
	This adds debug functions to dump the dominator tree in dot format. There are two overloads: one which takes a FILE * and another which takes a const char fname and wraps the first with fopen/fclose for convenience. gcc/ChangeLog: dominance.cc (dot_dominance_tree): New.
2024-07-05	i386: Refactor ssedoublemode	Hu, Lin1	1	-10/+9
	ssedoublemode's double should mean double type, like SI -> DI. And we need to refactor some patterns with <ssedoublemode> instead of <ssedoublevecmode>. gcc/ChangeLog: * config/i386/sse.md (ssedoublemode): Remove mappings to twice the number of same-sized elements. Add mappings to the same number of double-sized elements. (define_split for vec_concat_minus_plus): Change mode_attr from ssedoublemode to ssedoublevecmode. (define_split for vec_concat_plus_minus): Ditto. (<mask_codefor>avx512dq_shuf_<shuffletype>64x2_1<mask_name>): Ditto. (avx512f_shuf_<shuffletype>64x2_1<mask_name>): Ditto. (avx512vl_shuf_<shuffletype>32x4_1<mask_name>): Ditto. (avx512f_shuf_<shuffletype>32x4_1<mask_name>): Ditto.
2024-07-05	MIPS: Support more cases with alien mode of SHF.DF	YunQiang Su	3	-15/+170
	Currently, we support the cases that strictly fit for the instructions. For example, for V16QImode, we only support shuffle like (0<=N0, N1, N2, N3<=3 here) N0, N1, N2, N3 N0+4 N1+4 N2+4, N3+4 N0+8 N1+8 N2+8, N3+8 N0+12 N1+12 N2+12, N3+12 While in fact we can support more cases to try use other SHF.DF instructions not strictly fitting the mode. 1) We can use SHF.H to support more cases for V16QImode: (M0/M1/M2/M3 are 0 or 2 or 4 or 6) M0 M0+1, M1, M1+1 M2 M2+1, M3, M3+1 M0+8 M0+9, M1+8, M1+9 M2+8 M2+9, M3+8, M3+9 2) We can use SHF.W to support some cases for V16QImode: (M0/M1/M2/M3 are 0 or 4 or 8 or 12) M0, M0+1, M0+2, M0+3 M1, M1+1, M1+2, M1+3 M2, M2+1, M2+2, M2+3 M3, M3+1, M3+2, M3+3 3) We can use SHF.W to support some cases for V8HImode: (M0/M1/M2/M3 are 0 or 2 or 4 or 6) M0, M0+1 M1, M1+1 M2, M2+1 M3, M3+1 4) We can also use SHF.W to swap the 2 parts of V2DF or V2DI. gcc * config/mips/mips-protos.h: New function mips_msa_shf_i8. * config/mips/mips-msa.md(MSA_WHB_W): Not used anymore; (msa_shf_<msafmt_f>): Use mips_msa_shf_i8. * config/mips/mips.cc(mips_const_vector_shuffle_set_p): Support more cases try to use alien mode instruction; (mips_msa_shf_i8): New function to get the correct MSA SHF instruction and IMM.
2024-07-05	Testsuite/MIPS: Fix msa.c: test7_v2f64, test7_v4f32, test43_v2i64	YunQiang Su	1	-3/+3
	BNEGI.W/D are used for test7_v2f64 and test7_v4f32 now. It is an improvment since that we can save a instruction. ILVR.D is used for test43_v2i64 now, instead of INSVE.D. gcc/testsuite * gcc.target/mips/msa.c: Fix test7_v2f64, test7_v4f32 and test43_v2i64.
2024-07-05	MIPS/testsuite: Add -mfpxx to call-clobbered-1.c	YunQiang Su	1	-1/+1
	The scan-assembler-times rules only fit for -mfp32 and -mfpxx. It fails if we are configured as FP64 by default, as it has one less sdc1/ldc1 pair. gcc/testsuite * gcc.target/mips/call-clobbered-1.c: Add -mfpxx.
2024-07-05	MIPS/testsuite: Fix umips-save-restore-1.c	YunQiang Su	1	-4/+6
	With some recent optimization, -O1/-O2/-O3 can archive almost same performace/size by stack load/store. Thus lwm/swm will save/store less callee-saved register. In fact only $16 is saved with swm. To be sure that this optimization does exist, let's add 2 more function calls. So that lwm/swm can be much more profitable. If we add only once more, -O1 will still use stack load/store. gcc/testsuite * gcc.target/mips/umips-save-restore-1.c: Be sure lwm/swm are used for more callee-saved registers with addtional 2 more function calls.
2024-07-05	Support group size of three in SLP store permute lowering	Richard Biener	3	-1/+97
	The following implements the group-size three scheme from vect_permute_store_chain in SLP grouped store permute lowering and extends it to power-of-two multiples of group size three. The scheme goes from vectors A, B and C to { A[0], B[0], C[0], A[1], B[1], C[1], ... } by first producing { A[0], B[0], X, A[1], B[1], X, ... } (with X random but chosen to A[n]) and then permuting in C[n] in the appropriate places. The extension goes as to replace vector elements with a power-of-two number of lanes and you'd get pairwise interleaving until the final three input permutes happen. The last permute step could be seen as extending C to { C[0], C[0], C[0], ... } and then performing a blend. VLA archs will want to use store-lanes here I guess, I'm not sure if the three vector interleave operation is also available with a register source and destination and thus available for a shuffle. * tree-vect-slp.cc (vect_build_slp_instance): Special case three input permute with the same number of lanes in store permute lowering. * gcc.dg/vect/slp-53.c: New testcase. * gcc.dg/vect/slp-54.c: New testcase.
2024-07-05	Daily bump.	GCC Administrator	6	-1/+160

2024-07-04	analyzer: convert sm_context * to sm_context &	David Malcolm	12	-427/+419
	These are never nullptr and never change, so use a reference rather than a pointer. No functional change intended. gcc/analyzer/ChangeLog: * diagnostic-manager.cc (diagnostic_manager::add_events_for_eedge): Pass sm_ctxt by reference. * engine.cc (impl_region_model_context::on_condition): Likewise. (impl_region_model_context::on_bounded_ranges): Likewise. (impl_region_model_context::on_phi): Likewise. (exploded_node::on_stmt): Likewise. * sm-fd.cc: Update all uses of sm_context * to sm_context &. * sm-file.cc: Likewise. * sm-malloc.cc: Likewise. * sm-pattern-test.cc: Likewise. * sm-sensitive.cc: Likewise. * sm-signal.cc: Likewise. * sm-taint.cc: Likewise. * sm.h: Likewise. * varargs.cc: Likewise. gcc/testsuite/ChangeLog: * gcc.dg/plugin/analyzer_gil_plugin.c: Update all uses of sm_context * to sm_context &. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2024-07-04	analyzer: handle <error.h> at -O0 [PR115724]	David Malcolm	2	-0/+90
	At -O0, glibc's: __extern_always_inline void error (int __status, int __errnum, const char __format, ...) { if (__builtin_constant_p (__status) && __status != 0) __error_noreturn (__status, __errnum, __format, __builtin_va_arg_pack ()); else __error_alias (__status, __errnum, __format, __builtin_va_arg_pack ()); } becomes just: __extern_always_inline void error (int __status, int __errnum, const char __format, ...) { if (0) __error_noreturn (__status, __errnum, __format, __builtin_va_arg_pack ()); else __error_alias (__status, __errnum, __format, __builtin_va_arg_pack ()); } and thus calls to "error" are calls to "__error_alias" by the time -fanalyzer "sees" them. Handle them with more special-casing in kf.cc. gcc/analyzer/ChangeLog: PR analyzer/115724 * kf.cc (register_known_functions): Add __error_alias and __error_at_line_alias. gcc/testsuite/ChangeLog: PR analyzer/115724 * c-c++-common/analyzer/error-pr115724.c: New test. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2024-07-04	[committed][RISC-V] Fix test expectations after recent late-combine changes	Jeff Law	1	-3/+3
	With the recent DCE related adjustment to late-combine the rvv/base/vcreate.c test no longer has those undesirable vmvNr statements. It's a bit unclear why this wasn't written as a scan-assembler-not and xfailed given the comment says we don't want to see vmvNr insructions. I must have missed that during review. This patch adjusts the test to expect no vmvNr statements and if they're ever re-introduced, we'll get a nice unexpected failure. gcc/testsuite * gcc.target/riscv/rvv/base/vcreate.c: Update expected output.
2024-07-04	Skip 30_threads/future/members/poll.cc on hppa--linux*	John David Anglin	1	-0/+1
	hppa--linux* lacks high resolution timer support. Timer resolution ranges from 1 to 10ms. As a result, a large number of iterations are needed for the wait_for_0 and ready loops. This causes the wait_until_sys_epoch and wait_until_steady_epoch loops to timeout. There the loop wait time is determined by the timer resolution. 2024-07-04 John David Anglin <danglin@gcc.gnu.org> libstdc++-v3/ChangeLog: PR libstdc++/98678 * testsuite/30_threads/future/members/poll.cc: Skip on hppa--linux*.
2024-07-04	testsuite: Update test for PR115537 to use SVE .	Tamar Christina	1	-1/+1
	The PR was about SVE codegen, the testcase accidentally used neoverse-n1 instead of neoverse-v1 as was the original report. This updates the tool options. gcc/testsuite/ChangeLog: PR tree-optimization/115537 * gcc.dg/vect/pr115537.c: Update flag from neoverse-n1 to neoverse-v1.
2024-07-04	c++ frontend: check for missing condition for novector [PR115623]	Tamar Christina	2	-1/+11
	It looks like I forgot to check in the C++ frontend if a condition exist for the loop being adorned with novector. This causes a segfault because cond isn't expected to be null. This fixes it by issuing ignoring the pragma when there's no loop condition the same way we do in the C frontend. gcc/cp/ChangeLog: PR c++/115623 * semantics.cc (finish_for_cond): Add check for C++ cond. gcc/testsuite/ChangeLog: PR c++/115623 * g++.dg/vect/vect-novector-pragma_2.cc: New test.
2024-07-04	arm: Use LDMIA/STMIA for thumb1 DI/DF loads/stores	Siarhei Volkau	4	-8/+58
	If the address register is dead after load/store operation it looks beneficial to use LDMIA/STMIA instead of pair of LDR/STR instructions, at least if optimizing for size. gcc/ChangeLog: * config/arm/arm.cc (thumb_load_double_from_address): Emit ldmia when address reg rewritten by load. * config/arm/thumb1.md (peephole2 to rewrite DI/DF load): New. (peephole2 to rewrite DI/DF store): New. * config/arm/iterators.md (DIDF): New. gcc/testsuite: * gcc.target/arm/thumb1-load-store-64bit.c: Add new test. Signed-off-by: Siarhei Volkau <lis8215@gmail.com>
2024-07-04	Aarch64, bugfix: Fix NEON bigendian addp intrinsic [PR114890]	Alfie Richards	2	-3/+1
	This change removes code that switches the operands in bigendian mode erroneously. This fixes the related test also. gcc/ChangeLog: PR target/114890 * config/aarch64/aarch64-simd.md: Remove bigendian operand swap. gcc/testsuite/ChangeLog: PR target/114890 * gcc.target/aarch64/vector_intrinsics_asm.c: Remove xfail.
2024-07-04	Aarch64: Add test for non-commutative SIMD intrinsic	Alfie Richards	1	-0/+371
	This adds a test for non-commutative SIMD NEON intrinsics. Specifically addp is non-commutative and has a bug in the current big-endian implementation. gcc/testsuite/ChangeLog: * gcc.target/aarch64/vector_intrinsics_asm.c: New test.
2024-07-04	middle-end/115426 - wrong gimplification of "rm" asm output operand	Richard Biener	2	-0/+22
	When the operand is gimplified to an extract of a register or a register we have to disallow memory as we otherwise fail to gimplify it properly. Instead of __asm__("" : "=rm" __imag <r>); we want __asm__("" : "=rm" D.2772); _1 = REALPART_EXPR <r>; r = COMPLEX_EXPR <_1, D.2772>; otherwise SSA rewrite will fail and generate wrong code with 'r' left bare in the asm output. PR middle-end/115426 * gimplify.cc (gimplify_asm_expr): Handle "rm" output constraint gimplified to a register (operation). * gcc.dg/pr115426.c: New testcase.
2024-07-04	Use __builtin_cpu_support instead of __get_cpuid_count.	liuhongt	1	-26/+20
	gcc/testsuite/ChangeLog: PR target/115748 * gcc.target/i386/avx512-check.h: Use __builtin_cpu_support instead of __get_cpuid_count.
2024-07-04	i386: Add additional variant of bswaphisi2_lowpart peephole2.	Roger Sayle	2	-0/+35
	This patch adds an additional variation of the peephole2 used to convert bswaphisi2_lowpart into rotlhi3_1_slp, which converts xchgb %ah,%al into rotw if the flags register isn't live. The motivating example is: void ext(int x); void foo(int x) { ext((x&~0xffff)\|((x>>8)&0xff)\|((x&0xff)<<8)); } where GCC with -O2 currently produces: foo: movl %edi, %eax rolw $8, %ax movl %eax, %edi jmp ext The issue is that the original xchgb (bswaphisi2_lowpart) can only be performed in "Q" registers that allow the %?h register to be used, so reload generates the above two movl. However, it's later in peephole2 where we see that CC_FLAGS can be clobbered, so we can use a rotate word, which is more forgiving with register allocations. With the additional peephole2 proposed here, we now generate: foo: rolw $8, %di jmp ext 2024-07-04 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * config/i386/i386.md (bswaphisi2_lowpart peephole2): New peephole2 variant to eliminate register shuffling. gcc/testsuite/ChangeLog * gcc.target/i386/xchg-4.c: New test case.
2024-07-03	[committed] Fix newlib build failure with rx as well as several dozen ↵	Jeff Law	1	-3/+2
	testsuite failures The rx port has been failing to build newlib for a bit over a week. I can't remember if it was the late-combine work or the IRA costing twiddle, regardless the real bug is in the rx backend. Basically dwarf2cfi is blowing up because of inconsistent state caused by the failure to mark a stack adjustment as frame related. This instance in the epilogue looks like a simple goof. With the port building again, the testsuite would run and it showed a number of regressions, again related to CFI handling. The common thread was a failure to mark a copy from FP to SP in the prologue as frame related. The change which introduced this bug as supposed to just be changing promotions of vector types. It's unclear if Nick included the hunk accidentally or just goof'd on the logic. Regardless it looks quite incorrect. Reverting that hunk fixes the regressions and fixes 94 pre-existing failures. The net is rx-elf is regression free and has moved forward in terms of its testsuite status. Pushing to the trunk momentarily. gcc/ * config/rx/rx.cc (rx_expand_prologue): Mark the copy from FP to SP as frame related. (rx_expand_epilogue): Mark the stack pointer adjustment as frame related.
2024-07-04	[APX PPX] Avoid generating unmatched pushp/popp in pro/epilogue	Hongyu Wang	4	-4/+34
	According to APX spec, the pushp/popp pairs should be matched, otherwise the PPX hint cannot take effect and cause performance loss. In the ix86_expand_epilogue, there are several optimizations that may cause the epilogue using mov to restore the regs. Check if PPX applied and prevent usage of mov/leave in the epilogue. Also do not use PPX for eh_return. gcc/ChangeLog: * config/i386/i386.cc (ix86_expand_prologue): Set apx_ppx_used flag in m.fs with TARGET_APX_PPX && !crtl->calls_eh_return. (ix86_emit_save_regs): Emit ppx is available only when TARGET_APX_PPX && !crtl->calls_eh_return. (ix86_expand_epilogue): Don't restore reg using mov when apx_ppx_used flag is true. * config/i386/i386.h (struct machine_frame_state): Add apx_ppx_used flag. gcc/testsuite/ChangeLog: * gcc.target/i386/apx-ppx-2.c: New test. * gcc.target/i386/apx-ppx-3.c: Likewise.
2024-07-03	c++: OVERLOAD in diagnostics	Jason Merrill	2	-5/+3
	In modules we can get an OVERLOAD around a non-function, so let's tail recurse instead of falling through. As a result we start printing the template header in this testcase. gcc/cp/ChangeLog: * error.cc (dump_decl) [OVERLOAD]: Recurse on single case. gcc/testsuite/ChangeLog: * g++.dg/warn/pr61945.C: Adjust diagnostic.