path: root/gcc
Age  Commit message  Author  Files  Lines
2020-12-13  Revert "Arm: Add NEON and MVE RTL patterns for Complex Addition, Multiply and FMA."  Tamar Christina  8  -216/+208
This reverts commit 3b8a82f97dd48e153ce93b317c44254839e11461. Has a dependency on the AArch64 patch which hasn't been approved yet.
2020-12-13  varasm: Reject soft frame or arg pointer registers for register vars [PR92469]  Jakub Jelinek  4  -4/+54
The following patch rejects frame, argp and retarg registers (unless they are equal to hard frame pointer registers or if they aren't eliminable) from local or global register vars. These are just internal implementation details that are later eliminated into the hard frame pointer or stack pointer, and using them as register variables leads to numerous ICEs.
2020-12-13 Jakub Jelinek <jakub@redhat.com>
PR target/92469
* varasm.c (eliminable_regno_p): New function.
(make_decl_rtl): Reject asm vars for frame and argp if they are different from hard frame pointer.
* gcc.target/i386/pr92469.c: New test.
* gcc.target/i386/pr79804.c: Adjust expected diagnostics.
* gcc.target/i386/pr88178.c: Expect an error.
2020-12-13  Arm: Add NEON and MVE RTL patterns for Complex Addition, Multiply and FMA.  Tamar Christina  8  -208/+216
This adds implementation for the optabs for complex additions. With this the following C code:
    void f90 (float complex a[restrict N], float complex b[restrict N],
              float complex c[restrict N])
    {
      for (int i=0; i < N; i++)
        c[i] = a[i] + (b[i] * I);
    }
generates
    f90:
            add     r3, r2, #1600
    .L2:
            vld1.32 {q8}, [r0]!
            vld1.32 {q9}, [r1]!
            vcadd.f32       q8, q8, q9, #90
            vst1.32 {q8}, [r2]!
            cmp     r3, r2
            bne     .L2
            bx      lr
instead of
    f90:
            add     r3, r2, #1600
    .L2:
            vld2.32 {d24-d27}, [r0]!
            vld2.32 {d20-d23}, [r1]!
            vsub.f32        q8, q12, q11
            vadd.f32        q9, q13, q10
            vst2.32 {d16-d19}, [r2]!
            cmp     r3, r2
            bne     .L2
            bx      lr
gcc/ChangeLog:
* config/arm/arm_mve.h (__arm_vcaddq_rot90_u8, __arm_vcaddq_rot270_u8, __arm_vcaddq_rot90_s8, __arm_vcaddq_rot270_s8, __arm_vcaddq_rot90_u16, __arm_vcaddq_rot270_u16, __arm_vcaddq_rot90_s16, __arm_vcaddq_rot270_s16, __arm_vcaddq_rot90_u32, __arm_vcaddq_rot270_u32, __arm_vcaddq_rot90_s32, __arm_vcaddq_rot270_s32, __arm_vcmulq_rot90_f16, __arm_vcmulq_rot270_f16, __arm_vcmulq_rot180_f16, __arm_vcmulq_f16, __arm_vcaddq_rot90_f16, __arm_vcaddq_rot270_f16, __arm_vcmulq_rot90_f32, __arm_vcmulq_rot270_f32, __arm_vcmulq_rot180_f32, __arm_vcmulq_f32, __arm_vcaddq_rot90_f32, __arm_vcaddq_rot270_f32, __arm_vcmlaq_f16, __arm_vcmlaq_rot180_f16, __arm_vcmlaq_rot270_f16, __arm_vcmlaq_rot90_f16, __arm_vcmlaq_f32, __arm_vcmlaq_rot180_f32, __arm_vcmlaq_rot270_f32, __arm_vcmlaq_rot90_f32): Update builtin calls.
* config/arm/arm_mve_builtins.def (vcaddq_rot90_u, vcaddq_rot270_u, vcaddq_rot90_s, vcaddq_rot270_s, vcaddq_rot90_f, vcaddq_rot270_f, vcmulq_f, vcmulq_rot90_f, vcmulq_rot180_f, vcmulq_rot270_f, vcmlaq_f, vcmlaq_rot90_f, vcmlaq_rot180_f, vcmlaq_rot270_f): Removed.
(vcaddq_rot90, vcaddq_rot270, vcmulq, vcmulq_rot90, vcmulq_rot180, vcmulq_rot270, vcmlaq, vcmlaq_rot90, vcmlaq_rot180, vcmlaq_rot270): New.
* config/arm/constraints.md (Dz): Include MVE.
* config/arm/iterators.md (mve_rotsplit1, mve_rotsplit2): New.
(rot): Add UNSPEC_VCMLS, UNSPEC_VCMUL and UNSPEC_VCMUL180.
(rot_op, rotsplit1, rotsplit2, fcmac1, VCMLA_OP, VCMUL_OP): New. * config/arm/mve.md (VCADDQ_ROT270_S, VCADDQ_ROT90_S, VCADDQ_ROT270_U, VCADDQ_ROT90_U, VCADDQ_ROT270_F, VCADDQ_ROT90_F, VCMULQ_F, VCMULQ_ROT180_F, VCMULQ_ROT270_F, VCMULQ_ROT90_F, VCMLAQ_F, VCMLAQ_ROT180_F, VCMLAQ_ROT90_F, VCMLAQ_ROT270_F, VCADDQ_ROT270_S, VCADDQ_ROT270, VCADDQ_ROT90): Removed. (mve_rot, VCMUL): New. (mve_vcaddq_rot270_<supf><mode, mve_vcaddq_rot90_<supf><mode>, mve_vcaddq_rot270_f<mode>, mve_vcaddq_rot90_f<mode>, mve_vcmulq_f<mode, mve_vcmulq_rot180_f<mode>, mve_vcmulq_rot270_f<mode>, mve_vcmulq_rot90_f<mode>, mve_vcmlaq_f<mode>, mve_vcmlaq_rot180_f<mode>, mve_vcmlaq_rot270_f<mode>, mve_vcmlaq_rot90_f<mode>): Removed. (mve_vcmlaq<mve_rot><mode>, mve_vcmulq<mve_rot><mode>, mve_vcaddq<mve_rot><mode>, cadd<rot><mode>3, mve_vcaddq<mve_rot><mode>): New. (cmul<rot_op><mode>3): Exclude MVE types. * config/arm/unspecs.md (UNSPEC_VCMUL90, UNSPEC_VCMUL270): New. * config/arm/vec-common.md (cadd<rot><mode>3, cmul<rot_op><mode>3, arm_vcmla<rot><mode>, cml<fcmac1><rot_op><mode>4): New. * config/arm/unspecs.md (UNSPEC_VCMUL, UNSPEC_VCMUL180, UNSPEC_VCMLS, UNSPEC_VCMLS180): New. * config/arm/neon.md (cmul<rot_op><mode>3): New.
2020-12-13  Arm: Add support for auto-vectorization using HF mode.  Tamar Christina  2  -0/+16
This adds HFmode vectorization support to the auto-vectorizer for AArch32. This is supported when +fp16 is used. I wonder if I should disable the returning of the type if the option isn't enabled. At the moment it will be returned but the vectorizer will try and fail to use it. It wastes a few compile cycles but doesn't result in bad code.
gcc/ChangeLog:
* config/arm/arm.c (arm_preferred_simd_mode): Add E_HFmode.
gcc/testsuite/ChangeLog:
* gcc.target/arm/vect-half-floats.c: New test.
2020-12-13  middle-end: Support complex Addition  Tamar Christina  43  -21/+2078
This patch adds support for
* Complex Addition with rotation of 90 and 270.
Addition with rotation of the second argument around the Argand plane. Supported rotations are 90 and 270.
    c = a + (b * I) and c = a + (b * I * I * I)
gcc/ChangeLog:
* tree-vect-slp-patterns.c: New file.
* Makefile.in: Add it.
* doc/passes.texi: Document it.
* internal-fn.def (COMPLEX_ADD_ROT90, COMPLEX_ADD_ROT270): New.
* optabs.def (cadd90_optab, cadd270_optab): New.
* doc/md.texi: Document them.
* tree-vect-loop.c (vect_analyze_loop_2): Add dissolve code.
* tree-vect-slp.c (vect_free_slp_instance, vect_create_new_slp_node): Export.
(vect_match_slp_patterns_2, vect_match_slp_patterns): New.
(vect_analyze_slp): Use it.
* tree-vectorizer.h (vect_free_slp_tree): Export.
(enum _complex_operation): Forward declare.
(class vect_pattern): New.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp (check_effective_target_arm_v8_3a_complex_neon_ok_nocache): Fix it.
(check_effective_target_vect_complex_add_byte, check_effective_target_vect_complex_add_int, check_effective_target_vect_complex_add_short, check_effective_target_vect_complex_add_long, check_effective_target_vect_complex_add_half, check_effective_target_vect_complex_add_float, check_effective_target_vect_complex_add_double): New.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-byte.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-int.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-long.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-short.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-byte.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-int.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-long.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-short.c: New test.
* gcc.dg/vect/complex/complex-add-pattern-template.c: New test.
* gcc.dg/vect/complex/complex-add-template.c: New test.
* gcc.dg/vect/complex/complex-operations-run.c: New test.
* gcc.dg/vect/complex/complex-operations.c: New test.
* gcc.dg/vect/complex/complex.exp: New test.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-double.c: New test.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c: New test.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-half-float.c: New test.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-double.c: New test.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-float.c: New test.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c: New test.
* gcc.dg/vect/complex/fast-math-complex-add-double.c: New test.
* gcc.dg/vect/complex/fast-math-complex-add-float.c: New test.
* gcc.dg/vect/complex/fast-math-complex-add-half-float.c: New test.
* gcc.dg/vect/complex/fast-math-complex-add-pattern-double.c: New test.
* gcc.dg/vect/complex/fast-math-complex-add-pattern-float.c: New test.
* gcc.dg/vect/complex/fast-math-complex-add-pattern-half-float.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-byte.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-int.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-long.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-short.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-byte.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-int.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-long.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-short.c: New test.
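The two rotations in this entry can be illustrated with a minimal scalar sketch. This is not the vectorizer's implementation: the type `cplx` and the helpers `cadd_rot90` and `cadd_rot270` are hypothetical names used only here. Multiplying b by I maps (re, im) to (-im, re), and multiplying by I * I * I maps it to (im, -re).

```cpp
// Hypothetical scalar model of complex addition with rotation.
// Names are illustrative only and do not appear in GCC.
struct cplx { float re, im; };

// c = a + (b * I): b rotated by 90 degrees is (-b.im, b.re).
inline cplx cadd_rot90 (cplx a, cplx b)
{
  return cplx{a.re - b.im, a.im + b.re};
}

// c = a + (b * I * I * I): b rotated by 270 degrees is (b.im, -b.re).
inline cplx cadd_rot270 (cplx a, cplx b)
{
  return cplx{a.re + b.im, a.im - b.re};
}
```

In both cases each output lane mixes the real part of one operand with the imaginary part of the other, which is presumably why a single instruction such as the `vcadd.f32 ... #90` shown in the Arm entry can replace a separate subtract and add.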
2020-12-13  middle-end: Refactor and expose some vectorizer helper functions.  Tamar Christina  4  -9/+32
This is a small refactoring which exposes some helper functions in the vectorizer so they can be used in other places.
gcc/ChangeLog:
* tree-vect-patterns.c (vect_mark_pattern_stmts): Remove static inline.
* tree-vect-slp.c (vect_create_new_slp_node): Remove static and only set stmts if valid.
* tree-vectorizer.c (vec_info::add_pattern_stmt): New.
(vec_info::set_vinfo_for_stmt): Optionally enforce read-only.
* tree-vectorizer.h (struct _slp_tree): Use new types.
(lane_permutation_t, lane_permutation_t): New.
(vect_create_new_slp_node, vect_mark_pattern_stmts): New.
2020-12-13  Show coarrays on parse tree dump, implement debug for array references.  Thomas Koenig  1  -0/+36
gcc/fortran/ChangeLog: * dump-parse-tree.c (show_array_ref): Also show coarrays. (debug): Implement for array reference.
2020-12-13  testsuite: Fix various scan-assembler-symbol-section issues  Rainer Orth  5  -23/+47
This patch addresses some of the issues that I found when looking into the failures of the scan-assembler-symbol-section tests on Solaris/SPARC.
* The first issue was that on Solaris/SPARC, section names are double-quoted, both with as and gas:
    .section ".text"
  When using as, the section flag and type syntax is completely different from other ELF targets:
    .section "my_named_section",#alloc,#execinstr,#progbits
  This patch fixes this by stripping double quotes from section names.
* However, this didn't work initially (only the leading quote was stripped), which is due to David's recent AIX patch: with the introduction of the new capturing group to handle both .section (ELF) and .csect (XCOFF), $full_section_directive would never be empty on ELF and Mach-O targets, so the extraction of the section name didn't work any longer. This had also broken the Darwin tests completely.
* With working double quote stripping, all but one of the tests PASSed on Solaris/SPARC, the exception being:
    FAIL: gcc.dg/20021029-1.c scan-assembler-symbol-section symbol ar (found __sparc_get_pc_thunk.l7) has section ^\\\\.(const|rodata)|\\\\[RO\\\\] (found .text.__sparc_get_pc_thunk.l7%__sparc_get_pc_thunk.l7)
  This is due to the symbol name (ar) not being anchored in the test and unexpectedly matching __sparc_get_pc_thunk.l7.
* Next, I ran the tests on Darwin 11 and found two failing tests:
    FAIL: gcc.dg/darwin-sections.c scan-assembler-symbol-section symbol ^_a\$ (symbol not found) has section \\\\.data
    FAIL: gcc.dg/darwin-sections.c scan-assembler-symbol-section symbol ^_b\$ (symbol not found) has section \\\\.data
  This is due to Iain's recent "Darwin : Begin rework of zero-fill sections." patch which emits
    .globl _a
    .zerofill __DATA,__common,_a,1,0
  This is already scanned for, so the two scans above can just go.
  The other failing test is
    FAIL: g++.dg/gomp/tls-5.C -std=c++14 scan-assembler-symbol-section symbol ^_?_ZGR2ir_\$ (symbol not found) has section ^\\\\.tdata|\\\\[TL\\\\]
    FAIL: g++.dg/gomp/tls-5.C -std=c++14 scan-assembler-symbol-section symbol ^_?ir\$ (symbol not found) has section ^\\\\.tbss|\\\\[TL\\\\]
  Other scans are guarded by target tls_native, and indeed the assembler output has
    ___emutls_v._ZGR2ir_:
    ___emutls_t._ZGR2ir_:
    ___emutls_v.ir:
  Unfortunately scan-assembler-symbol-section doesn't support selectors yet, which this patch implements both for the benefit of this test and for symmetry.
With those changes, test results are clean now on sparc-sun-solaris2.11, i386-pc-solaris2.11, i386-apple-darwin11.4.2, and powerpc-ibm-aix7.2.4.0.
2020-12-03 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
gcc:
* doc/sourcebuild.texi (Commands for use in dg-final, Scan the assembly output, scan-assembler-symbol-section): Document.
(scan-symbol-section): Document.
gcc/testsuite:
* lib/scanasm.exp (scan-symbol-section): Pass args to dg-scan-symbol-section.
(scan-assembler-symbol-section): Likewise.
(dg-scan-symbol-section): Handle selector from orig_args. Get patterns from orig_args.
(parse_section_of_symbols): Fix section_pattern. Strip double quotes from section name.
* g++.dg/gomp/tls-5.C: Restrict ir, _ZGR2ir_ scans to tls_native.
* gcc.dg/20021029-1.c: Anchor ar symbol.
* gcc.dg/darwin-sections.c: Remove obsolete scans for _a, _b in .data.
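The quote-stripping step described in this entry lives in Tcl (lib/scanasm.exp). Purely as an illustration of the idea, and not a transcription of that code, a C++ sketch could look as follows; the function name `strip_double_quotes` is hypothetical.

```cpp
#include <string>

// Illustrative only: remove every double quote from a section name, so that
// a Solaris-style name such as "\".text\"" compares equal to ".text".
inline std::string strip_double_quotes (const std::string &name)
{
  std::string out;
  out.reserve (name.size ());
  for (char c : name)
    if (c != '"')
      out += c;
  return out;
}
```

Stripping both the leading and the trailing quote matters here: the entry notes that an earlier attempt removed only the leading one, which left the section name unmatched.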
2020-12-13  Tweak the way that is_a is implemented  Richard Sandiford  1  -18/+63
At the moment, class hierarchies that use is_a are expected to define specialisations like:
    template <>
    template <>
    inline bool
    is_a_helper <cgraph_node *>::test (symtab_node *p)
    {
      return p->type == SYMTAB_FUNCTION;
    }
But this doesn't scale well to larger hierarchies, because it only defines ::test for an argument that is exactly “symtab_node *” (and not for example “const symtab_node *” or something that comes between cgraph_node and symtab_node in the hierarchy). For example:
    struct A { int x; };
    struct B : A {};
    struct C : B {};

    template <>
    template <>
    inline bool
    is_a_helper <C *>::test (A *a)
    {
      return a->x == 1;
    }

    bool f(B *b) { return is_a<C *> (b); }
gives:
    warning: inline function ‘static bool is_a_helper<T>::test(U*) [with U = B; T = C*]’ used but never defined
and:
    bool f(const A *a) { return is_a<const C *> (a); }
gives:
    warning: inline function ‘static bool is_a_helper<T>::test(U*) [with U = const A; T = const C*]’ used but never defined
This patch instead allows is_a to be implemented by specialising is_a_helper as a whole, for example:
    template<>
    struct is_a_helper<C *> : static_is_a_helper<C *>
    {
      static inline bool test (const A *a) { return a->x == 1; }
    };
It also adds a general specialisation of is_a_helper for const pointers. Together, this makes both of the above examples work.
gcc/
* is-a.h (reinterpret_is_a_helper): New class.
(static_is_a_helper): Likewise.
(is_a_helper): Inherit from reinterpret_is_a_helper.
(is_a_helper<const T *>): New specialization.
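To see why specialising the helper as a whole handles both failing calls, here is a self-contained sketch of the technique. It is a simplified stand-in, not the contents of is-a.h: `is_a_helper_sketch` and `is_a_sketch` are hypothetical names, and the real helpers also provide cast functions that are omitted here.

```cpp
struct A { int x; };
struct B : A {};
struct C : B {};

// Primary template is left undefined: each target type supplies a whole
// specialisation instead of specialising a member function template.
template <typename T> struct is_a_helper_sketch;

// Whole-class specialisation. Because test takes "const A *", it accepts
// A *, B *, const A *, and anything else convertible to const A *.
template <> struct is_a_helper_sketch<C *>
{
  static bool test (const A *a) { return a->x == 1; }
};

// General specialisation for const pointers: reuse the non-const helper.
template <typename T>
struct is_a_helper_sketch<const T *> : is_a_helper_sketch<T *> {};

template <typename T, typename U>
inline bool is_a_sketch (U *p)
{
  return is_a_helper_sketch<T>::test (p);
}
```

With this shape, `is_a_sketch<C *> (b)` for a `B *b` and `is_a_sketch<const C *> (a)` for a `const A *a` both resolve to the same `test` function through ordinary pointer conversions.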
2020-12-13  Move iterator_range to a new iterator-utils.h file  Richard Sandiford  2  -17/+45
A later patch will add more iterator-related utilities. Rather than putting them all directly in coretypes.h, it seemed better to add a new header file, here called "iterator-utils.h". This preliminary patch moves the existing iterator_range class there too. I used the same copyright date range as coretypes.h “just to be sure”. gcc/ * coretypes.h (iterator_range): Move to... * iterator-utils.h: ...this new file.
2020-12-13  rtlanal: Remove noop_move_p REG_EQUAL condition  Richard Sandiford  1  -4/+0
noop_move_p currently keeps any instruction that has a REG_EQUAL note, on the basis that the equality might be useful in future. But this creates a perverse incentive not to add potentially-useful REG_EQUAL notes, in case they prevent an instruction from later being removed as dead. The condition originates from flow.c:life_analysis_1 and predates the changes tracked by the current repository (1992). It probably made sense when most optimisations were done on RTL rather than FE trees, but it seems counterproductive now. gcc/ * rtlanal.c (noop_move_p): Don't check for REG_EQUAL notes.
2020-12-13  vec: Silence clang warning  Richard Sandiford  1  -1/+1
I noticed during compatibility testing that clang warns that this operator won't be implicitly const in C++14 onwards. gcc/ * vec.h (vnull::operator vec<T, A, L>): Make const.
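For background: in C++11 a constexpr non-static member function was implicitly const, and C++14 removed that rule, so spelling out `const` keeps the meaning the same under both standards. The sketch below shows the shape of such a fix on stand-in types; `vec_sketch`, `vnull_sketch`, `vnull_value` and `make_empty` are hypothetical and are not the declarations in vec.h.

```cpp
// Stand-in for a vector type; not GCC's vec<T, A, L>.
struct vec_sketch { int len; };

// Stand-in for vnull: converts to an empty vector.
struct vnull_sketch
{
  // The explicit "const" lets the conversion be used on const objects
  // under C++14 and later, where constexpr no longer implies const.
  constexpr operator vec_sketch () const { return vec_sketch{0}; }
};

constexpr vnull_sketch vnull_value{};

// Converting a const vnull_sketch object relies on the const qualifier above.
inline vec_sketch make_empty () { return vnull_value; }
```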
2020-12-13  Daily bump.  GCC Administrator  4  -1/+56
2020-12-12  Fortran: Enable inquiry references in data statements [PR98022].  Paul Thomas  2  -13/+94
2020-12-12 Paul Thomas <pault@gcc.gnu.org> gcc/fortran PR fortran/98022 * data.c (gfc_assign_data_value): Handle inquiry references in the data statement object list. gcc/testsuite/ PR fortran/98022 * gfortran.dg/data_inquiry_ref.f90: New test.
2020-12-12  match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685]  Jakub Jelinek  4  -0/+163
This patch adds the ~(X - Y) -> ~X + Y simplification requested in the PR (plus also ~(X + C) -> ~X + (-C) for constants C that can be safely negated). The first two simplify blocks are what has been requested in the PR and that makes the first testcase pass. Unfortunately, that change also breaks the second testcase, because while the same expressions appearing in the same stmt and split across multiple stmts has been folded (not really) before, with this optimization fold-const.c optimizes ~X + Y further into (Y - X) - 1 in fold_binary_loc associate: code, but we have nothing like that in GIMPLE and so end up with different expressions. The last simplify is an attempt to deal with just this case; I had to rule out the Y == -1U case there, because then we reached infinite recursion as ~X + -1U was canonicalized by the pattern into (-1U - X) + -1U but there is a canonicalization -1 - A -> ~A that turns it back. Furthermore, I had to make it #if GIMPLE only, because it otherwise resulted in infinite recursion when interacting with the associate: optimization. The end result is that we pass all 3 testcases and thus canonicalize the 3 possible forms of writing the same thing.
2020-12-12 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/96685
* match.pd (~(X - Y) -> ~X + Y): New optimization.
(~X + Y -> (Y - X) - 1): Likewise.
* gcc.dg/tree-ssa/pr96685-1.c: New test.
* gcc.dg/tree-ssa/pr96685-2.c: New test.
* gcc.dg/tree-ssa/pr96685-3.c: New test.
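The three forms named in this entry are equal in wrapping unsigned arithmetic, since ~V equals -V - 1, so ~(X - Y) = Y - X - 1 = ~X + Y. A small sketch that spells the three forms out as functions (the function names are hypothetical, chosen only for this illustration):

```cpp
// Three ways of writing the same value for unsigned (wrapping) operands.
inline unsigned not_of_difference (unsigned x, unsigned y)
{
  return ~(x - y);          // ~(X - Y)
}

inline unsigned not_plus (unsigned x, unsigned y)
{
  return ~x + y;            // ~X + Y
}

inline unsigned sub_then_decrement (unsigned x, unsigned y)
{
  return (y - x) - 1u;      // (Y - X) - 1
}
```

The sketch only demonstrates the arithmetic identity; which form GCC picks as canonical in GIMPLE is decided by the match.pd patterns described in the entry.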
2020-12-12  widening_mul: Recognize another form of ADD_OVERFLOW [PR96272]  Jakub Jelinek  2  -21/+111
The following patch recognizes another form of hand written __builtin_add_overflow (this time _p), in particular when the code does unsigned
    if (x > ~0U - y)
or
    if (x <= ~0U - y)
it can be optimized (if the subtraction turned into ~y is single use) into
    if (__builtin_add_overflow_p (x, y, 0U))
or
    if (!__builtin_add_overflow_p (x, y, 0U))
and generate better code, e.g. for the first function in the testcase:
    -	movl	%esi, %eax
     	addl	%edi, %esi
    -	notl	%eax
    -	cmpl	%edi, %eax
    -	movl	$-1, %eax
    -	cmovnb	%esi, %eax
    +	jc	.L3
    +	movl	%esi, %eax
    +	ret
    +.L3:
    +	orl	$-1, %eax
     	ret
on x86_64. As for the jumps vs. conditional move case, that is some CE issue with complex branch patterns we should fix up no matter what, but in this case I'm actually not sure if branchy code isn't better, overflow is something that isn't that common.
2020-12-12 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/96272
* tree-ssa-math-opts.c (uaddsub_overflow_check_p): Add OTHER argument. Handle BIT_NOT_EXPR.
(match_uaddsub_overflow): Optimize unsigned a > ~b into __imag__ .ADD_OVERFLOW (a, b).
(math_opts_dom_walker::after_dom_children): Call match_uaddsub_overflow even for BIT_NOT_EXPR.
* gcc.dg/tree-ssa/pr96272.c: New test.
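The equivalence this entry relies on can be checked with a short sketch. Two assumptions are worth flagging: the sketch uses `__builtin_add_overflow` (a GCC and Clang extension) rather than the `__builtin_add_overflow_p` form named above, and `saturating_add` is only a guess at the kind of function the assembly diff suggests, not the actual testcase. All function names are hypothetical.

```cpp
// Hand-written overflow check in the form the patch recognizes.
inline bool add_overflows_by_hand (unsigned x, unsigned y)
{
  return x > ~0U - y;
}

// The same question asked through the compiler builtin (GCC/Clang extension).
inline bool add_overflows_builtin (unsigned x, unsigned y)
{
  unsigned result;
  return __builtin_add_overflow (x, y, &result);
}

// Assumed example user of the check: clamp to the maximum on overflow.
inline unsigned saturating_add (unsigned x, unsigned y)
{
  return x > ~0U - y ? ~0U : x + y;
}
```

The hand-written comparison works because `~0U - y` is the largest value that can be added to `y` without wrapping.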
2020-12-12  openmp, openacc: Fix up handling of data regions [PR98183]  Jakub Jelinek  4  -22/+43
While the data regions (target data and OpenACC counterparts) aren't standalone directives, unlike most other OpenMP/OpenACC constructs we allow (apparently as an extension) exceptions and goto out of the block. During gimplification we place an *end* call into a finally block so that it is reached even on exceptions, goto out, etc. During the omplower pass we then add a paired #pragma omp return for them, but because of the exceptions the region is not SESE, so we can end up with #pragma omp return appearing only conditionally in the CFG, which the ompexp pass can't handle. For the ompexp pass, we actually don't care about the end part or about target data nesting, so we can treat it as a standalone directive.
2020-12-12 Jakub Jelinek <jakub@redhat.com>
PR middle-end/98183
* omp-low.c (lower_omp_target): Don't add OMP_RETURN for data regions.
* omp-expand.c (expand_omp_target): Don't try to remove OMP_RETURN for data regions.
(build_omp_regions_1, omp_make_gimple_edges): Don't expect OMP_RETURN for data regions.
* gcc.dg/gomp/pr98183.c: New test.
* gcc.dg/goacc/pr98183.c: New test.
2020-12-12  Daily bump.  GCC Administrator  4  -1/+39
2020-12-11  c++: Avoid considering some conversion ops [PR97600]  Jason Merrill  3  -1/+59
Patrick's earlier patch to check convertibility before constraints for conversion ops wasn't suitable because checking convertibility can also lead to unwanted instantiations, but it occurs to me that there's a smaller check we can do to avoid doing normal consideration of the conversion ops in this case: since we're in the middle of a user-defined conversion, we can exclude from consideration any conversion ops that return a type that would need an additional user-defined conversion to reach the desired type: namely, a type that differs in class-ness from the desired type. [temp.inst]/9 allows optimizations like this: "If the function selected by overload resolution can be determined without instantiating a class template definition, it is unspecified whether that instantiation actually takes place." gcc/cp/ChangeLog: PR libstdc++/97600 * call.c (build_user_type_conversion_1): Avoid considering conversion functions that return a clearly unsuitable type. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/concepts-conv3.C: New test.
2020-12-11  c++: Fix build with --enable-gather-detailed-mem-stats.  Jason Merrill  1  -1/+1
Nathan's recent patch added make_binding_vec defined with MEM_STAT_DECL, but didn't add the parallel decoration to the forward declaration. gcc/cp/ChangeLog: * cp-tree.h (make_binding_vec): Add CXX_MEM_STAT_INFO.
2020-12-11  c++: Final module preparations  Nathan Sidwell  7  -4/+60
This adds the final few preparations to drop modules in. I'd missed a couple of changes to core compiler -- a new pair of preprocessor options, and marking the boundary of fixed and lazy global trees. For C++, we need to add module.cc to the GTY scanner. Parsing final cleanups needs a few tweaks for modules. Lambdas used to initialize a global (for instance) get an extra scope, but we now need to point that object to the lambda too. Finally template instantiation needs to do lazy loading before looking at the available instantiations and specializations. gcc/ * gcc.c (cpp_unique_options): Add Mmodules, Mno-modules. * tree-core.h (enum tree_index): Add TI_MODULE_HWM. gcc/cp/ * config-lang.in (gtfiles): Add cp/module.cc. * decl2.c (c_parse_final_cleanups): Add module support. * lambda.c (record_lambda_scope): Call maybe_attach_decl. * module.cc (maybe_attach_decl, lazy_load_specializations): Stubs. (finish_module_procesing): Stub. * pt.c (lookup_template_class_1): Lazy load specializations. (instantiate_template_1): Likewise.
2020-12-11  c++: Refactor final cleanup  Nathan Sidwell  1  -15/+6
This is a small refactor of the end of decl processing, into which dropping module support will be simpler. gcc/cp/ * decl2.c (c_parse_final_cleanups): Refactor loop.
2020-12-11  Add missing varasm DECL_P check.  Jim Wilson  1  -0/+1
This fixes a riscv64-linux bootstrap failure. get_constant_section calls the select_section target hook, and select_section calls get_named_section which calls get_section. So it is possible to have a constant not a decl in both of these functions. They already call DECL_P checks everywhere except for the new code HJ recently added. This adds the missing DECL_P check. gcc/ * varasm.c (get_section): Add DECL_P check before DECL_PRESERVE_P.
2020-12-11  Daily bump.  GCC Administrator  5  -1/+537
2020-12-11  compiler: encode user visible names if necessary  Ian Lance Taylor  5  -59/+144
Avoid putting weird characters into the user visible name. It breaks stabs in particular, and may also cause debugger problems. Instead, encode those names, and use a "g." prefix to tell the debugger. Also dereference the type for the name of a recover thunk, to avoid a pointless '*' that gets encoded. Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/277232
2020-12-11  arm: Auto-vectorization for MVE clean condition for vand and vorr expanders  Christophe Lyon  2  -10/+3
The patch restores the unconditional definition of the VDQ iterator, and changes the conditions of the vand and vorr expanders to use ARM_HAVE_<MODE>_ARITH. 2020-12-11 Christophe Lyon <christophe.lyon@linaro.org> gcc/ * config/arm/iterators.md (VDQ): Remove TARGET_HAVE_MVE conditions. * config/arm/vec-common.md (and<mode>3): Use ARM_HAVE_<MODE>_ARITH. (ior<mode>3): Likewise.
2020-12-11  arc: Update ARC700 cache hazard detection.  Claudiu Zissulescu  1  -29/+23
Replace/update ARC700 cache hazard detection. The following situations are handled:
- There are 2 back-to-back stores, then 3 loads in the next 3 or 4 instructions. If there are 3 loads in 3 instructions, we insert 2 nops after the stores. If there are 3 loads in 4 instructions, we insert 1 nop after the stores.
- 2 back-to-back stores, followed by at least 3 loads in the next 4 instructions:
    st st ld ld ld ##
    st st ## ld ld ld
    st st ld ## ld ld
    st st ld ld ## ld
  ## - any instruction
- A store between non-store instructions, followed by 3 loads:
    $$ st SS ld ld ld
  $$ - non-store instruction, even load.
gcc/
2020-12-11 Claudiu Zissulescu <claziss@synopsys.com>
* config/arc/arc.c (arc_active_insn): Ignore all non essential instructions when getting the next active instruction.
(check_store_cacheline_hazard): Update.
(workaround_arc_anomaly): Remove obsolete cache hazard code.
Signed-off-by: Claudiu Zissulescu <claziss@gmail.com>
2020-12-11  arc: Avoid generating brcc instructions with limm  Claudiu Zissulescu  1  -0/+1
BRcc instructions are generated quite late in the compilation process. These instructions combine a compare with a regular conditional branch if the result of the compare is no longer used. However, when compiling for size, it is better to avoid BRcc instructions that introduce a 32-bit long immediate.
gcc/
2020-12-11 Claudiu Zissulescu <claziss@synopsys.com>
* config/arc/arc.c (arc_reorg): Avoid limm in BRcc.
2020-12-11  arc: Refurbish adc/sbc patterns  Claudiu Zissulescu  3  -122/+29
The adc/sbc patterns were splitting unnecessarily; remove that and the associated functions.
gcc/
2020-12-11 Claudiu Zissulescu <claziss@synopsys.com>
* config/arc/arc-protos.h (arc_scheduling_not_expected): Remove it.
(arc_sets_cc_p): Likewise.
(arc_need_delay): Likewise.
* config/arc/arc.c (arc_sets_cc_p): Likewise.
(arc_need_delay): Likewise.
(arc_scheduling_not_expected): Likewise.
* config/arc/arc.md: Convert adc/sbc patterns to simple instruction definitions.
Signed-off-by: Claudiu Zissulescu <claziss@synopsys.com>
2020-12-11  c++: module test harness  Nathan Sidwell  1  -0/+376
Here is the module test harness -- but no tests. gcc/testsuite/ * g++.dg/modules/modules.exp: New.
2020-12-11  c++: cp_tree_equal tweaks  Nathan Sidwell  3  -4/+27
When comparing streamed trees we can encounter NON_LVALUE_EXPR and VIEW_CONVERT_EXPRs with null types. Also, when checking a potential duplicate we don't want to reject PARM_DECLs with different contexts, if those two contexts are the two decls of interest. gcc/cp/ * cp-tree.h (map_context_from, map_context_to): Declare. * module.cc (map_context_from, map_context_to): Define. * tree.c (cp_tree_equal): Check map_context_{from,to} for parm context difference. Allow NON_LVALUE_EXPR and VIEW_CONVERT_EXPR with null types.
2020-12-11  arm: Auto-vectorization for MVE: vorr  Christophe Lyon  7  -17/+97
This patch enables MVE vorrq instructions for auto-vectorization. MVE vorrq insns in mve.md are modified to use ior instead of unspec expression to support ior<mode>3. The ior<mode>3 expander is added to vec-common.md 2020-12-03 Christophe Lyon <christophe.lyon@linaro.org> gcc/ * config/arm/iterators.md (supf): Remove VORRQ_S and VORRQ_U. (VORRQ): Remove. * config/arm/mve.md (mve_vorrq_s<mode>): New entry for vorr instruction using expression ior. (mve_vorrq_u<mode>): New expander. (mve_vorrq_f<mode>): Use ior code instead of unspec. * config/arm/neon.md (ior<mode>3): Renamed into ior<mode>3_neon. * config/arm/predicates.md (imm_for_neon_logic_operand): Enable for MVE. * config/arm/unspecs.md (VORRQ_S, VORRQ_U, VORRQ_F): Remove. * config/arm/vec-common.md (ior<mode>3): New expander. gcc/testsuite/ * gcc.target/arm/simd/mve-vorr.c: Add vorr tests.
2020-12-11  arc: Use separate predicated patterns for mpyd(u)  Claudiu Zissulescu  3  -51/+67
The compiler can match mpyd.eq r0,r1,r0 as a predicated instruction, which is incorrect. The mpyd(u) instruction takes two 32-bit registers as input and returns the 64-bit result in an even-odd register pair. For the predicated case, the ARC instruction decoder expects the destination register to be the same as the first input register. In the big-endian case the result is swapped in the destination register pair; however, the instruction encoding remains the same. Refurbish the mpyd(u) patterns to take the above observation into account.
gcc/
2020-12-11 Claudiu Zissulescu <claziss@synopsys.com>
* config/arc/arc.md (mpyd<su_optab>_arcv2hs): New template pattern.
(*pmpyd<su_optab>_arcv2hs): Likewise.
(*pmpyd<su_optab>_imm_arcv2hs): Likewise.
(mpyd_arcv2hs): Moved into above template.
(mpyd_imm_arcv2hs): Moved into above template.
(mpydu_arcv2hs): Likewise.
(mpydu_imm_arcv2hs): Likewise.
(su_optab): New optab prefix for sign/zero-extending operations.
gcc/testsuite/
2020-12-11 Claudiu Zissulescu <claziss@synopsys.com>
* gcc.target/arc/pmpyd.c: New test.
* gcc.target/arc/tmac-1.c: Update.
Signed-off-by: Claudiu Zissulescu <claziss@gmail.com>
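As a rough model of the arithmetic only (it says nothing about the instruction encoding or about which register of the pair holds which half), a widening 32x32 to 64-bit multiply can be sketched as below. The type `reg_pair` and the functions `mpyd_model` and `mpydu_model` are hypothetical names for this illustration.

```cpp
#include <cstdint>

// Two 32-bit halves of a 64-bit product. Which half lands in the even or
// odd register depends on endianness and is deliberately not modelled.
struct reg_pair { std::uint32_t lo, hi; };

// Signed widening multiply: sign-extend both inputs, then multiply.
inline reg_pair mpyd_model (std::int32_t a, std::int32_t b)
{
  std::uint64_t p
    = static_cast<std::uint64_t> (static_cast<std::int64_t> (a) * b);
  return reg_pair{static_cast<std::uint32_t> (p),
                  static_cast<std::uint32_t> (p >> 32)};
}

// Unsigned widening multiply: zero-extend both inputs, then multiply.
inline reg_pair mpydu_model (std::uint32_t a, std::uint32_t b)
{
  std::uint64_t p = static_cast<std::uint64_t> (a) * b;
  return reg_pair{static_cast<std::uint32_t> (p),
                  static_cast<std::uint32_t> (p >> 32)};
}
```

The signed and unsigned variants differ only in how the inputs are extended, which matches the entry's description of su_optab as a prefix for sign/zero-extending operations.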
2020-12-11  x86: Update user interrupt handler stack frame  H.J. Lu  9  -9/+233
User interrupt handler stack frame is similar to exception interrupt handler stack frame. Instead of error code, the second argument is user interrupt request register vector. gcc/ PR target/98219 * config/i386/uintrintrin.h (__uintr_frame): Remove uirrv. gcc/testsuite/ PR target/98219 * gcc.dg/guality/pr98219-1.c: New test. * gcc.dg/guality/pr98219-2.c: Likewise. * gcc.dg/torture/pr98219-1.c: Likewise. * gcc.dg/torture/pr98219-2.c: Likewise. * gcc.target/i386/uintr-2.c: Scan "add[lq] $8, %[er]sp". (uword_t): New. (foo): Add a uword_t argument. (UINTR_hanlder): Likewise. * gcc.target/i386/uintr-3.c: Scan "add[lq] $8, %[er]sp". (uword_t): New. (UINTR_hanlder): Add a uword_t argument. * gcc.target/i386/uintr-4.c (uword_t): New. (UINTR_hanlder): Add a uword_t argument. * gcc.target/i386/uintr-5.c (uword_t): New. (UINTR_hanlder): Add a uword_t argument.
2020-12-11  c++: Module lang hook overriding  Nathan Sidwell  5  -1/+44
This installs stub lang hooks for modules and creates the module dump file. gcc/cp/ * cp-lang.c (LANG_HOOKS_PREPROCESS_MAIN_FILE): Override. (LANG_HOOKS_PREPROCESS_OPTIONS): Override. (LANG_HOOKS_PREPROCESS_TOKEN): Override. * cp-objcp-common.c (cp_register_dumps): Add module dump. (cp_handle_option): New. * cp-objcp-common.h (cp_handle_option): Declare. (LANG_HOOKS_HANDLE_OPTION): Override. * cp-tree.h (module_dump_id): Declare. * module.cc (module_dump_id): Define. (module_begin_main_file, handle_module_option) (module_preproces_options): Stubs.
2020-12-11  c++: name lookup API for modules  Nathan Sidwell  3  -1/+443
This adds a set of calls to name lookup that are needed by modules, generally for installing imported bindings or walking the current TU's bindings. One note about template instantiations, though: when we're about to instantiate a template we have to know about all the maybe-partial specializations that exist. These can be in any imported module -- not necessarily the module defining the template. Thus we key such foreign templates to the innermost namespace and identifier of the containing entity -- that's the only thing we have a handle on. That's why we note and load pending specializations here.
gcc/cp/
* module.cc (lazy_specializations_p): Stub.
* name-lookup.h (append_imported_binding_slot)
(mergeable_namespacE_slots, lookup_class_binding)
(walk_module_binding, import_module_binding, set_module_binding)
(note_pending_specializations, load_pending_specializations)
(add_module_decl, add_imported_namespace): Declare.
(get_cxx_dialect_name): Declare.
(enum WMB_flags): New.
* name-lookup.c (append_imported_binding_slot)
(mergeable_namespacE_slots, lookup_class_binding)
(walk_module_binding, import_module_binding, set_module_binding)
(note_pending_specializations, load_pending_specializations)
(add_module_decl, add_imported_namespace): New.
(get_cxx_dialect_name): Make extern.
2020-12-11c++: missing SFINAE with pointer subtraction [PR78173]Patrick Palka2-1/+10
This fixes a missed SFINAE when subtracting pointers to an incomplete type. gcc/cp/ChangeLog: PR c++/78173 * typeck.c (pointer_diff): Use complete_type_or_maybe_complain instead of complete_type_or_else. gcc/testsuite/ChangeLog: PR c++/78173 * g++.dg/cpp2a/concepts-pr78173.C: New test.
2020-12-11arm: Improve documentation for effective target 'arm_softfloat'Andrea Corallo2-4/+3
gcc/ChangeLog 2020-12-01 Andrea Corallo <andrea.corallo@arm.com> * doc/sourcebuild.texi (arm_softfloat): Improve documentation. gcc/testsuite/ChangeLog 2020-12-01 Andrea Corallo <andrea.corallo@arm.com> * lib/target-supports.exp (check_effective_target_arm_softfloat): Improve documentation.
2020-12-11arm: [testsuite] fix lob tests for -mfloat-abi=hardAndrea Corallo4-4/+4
2020-11-26 Andrea Corallo <andrea.corallo@arm.com> * gcc.target/arm/lob2.c: Use '-march=armv8.1-m.main+fp'. * gcc.target/arm/lob3.c: Skip with '-mfloat-abi=hard'. * gcc.target/arm/lob4.c: Likewise. * gcc.target/arm/lob5.c: Use '-march=armv8.1-m.main+fp'.
2020-12-11testsuite/98244 - amend gcc.dg/vect/vect-live-6.cRichard Biener1-1/+1
Committed. 2020-12-11 Richard Biener <rguenther@suse.de> PR testsuite/98244 * gcc.dg/vect/vect-live-6.c: Require vect_condition.
2020-12-11testsuite/98242 - amend gcc.dg/vect/bb-slp-subgroups-3.cRichard Biener1-0/+1
Committed. 2020-12-11 Richard Biener <rguenther@suse.de> PR testsuite/98242 * gcc.dg/vect/bb-slp-subgroups-3.c: Require vect_int_mult.
2020-12-11testsuite/98240 - amend gcc.dg/vect/pr97678.cRichard Biener1-0/+2
Committed. 2020-12-11 Richard Biener <rguenther@suse.de> PR testsuite/98240 * gcc.dg/vect/pr97678.c: Require vect_int_mult and vect_pack_trunc.
2020-12-11testsuite/98239 - require vect_condition for gcc.dg/vect/bb-slp-69.cRichard Biener1-0/+1
Committed. 2020-12-11 Richard Biener <rguenther@suse.de> PR testsuite/98239 * gcc.dg/vect/bb-slp-69.c: Require vect_condition.
2020-12-11expand: Fix up expand_doubleword_mod on 32-bit targets [PR98229]Jakub Jelinek2-2/+11
As the testcase shows, for 32-bit word size we can end up with op1 up to 0xffffffff (0x100000000 % 0xffffffff == 1 and so we use bit == 32 for that), but the CONST_INT we got from caller is for DImode in that case and not valid for SImode operations. The following patch canonicalizes the two spots where the constant needs canonicalization. 2020-12-10 Jakub Jelinek <jakub@redhat.com> PR rtl-optimization/98229 * optabs.c (expand_doubleword_mod): Canonicalize op1 and 1 - INTVAL (op1) as word_mode constants when used in word_mode arithmetics. * gcc.c-torture/compile/pr98229.c: New test.
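The arithmetic behind this fix can be checked with a small C sketch. It is only an illustration under stated assumptions: the helper names `pow2_mod` and `canonicalize_word32` are hypothetical and are not GCC functions. The second helper models the sign-extended form that a constant takes when it is treated as a 32-bit word rather than a 64-bit doubleword.

```c
#include <stdint.h>

/* Hypothetical helper: 2^bit modulo d, for bit < 64 and d != 0. */
uint64_t pow2_mod(unsigned bit, uint64_t d)
{
    return (UINT64_C(1) << bit) % d;
}

/* Hypothetical helper: reinterpret the low 32 bits of v as a signed
   32-bit word, i.e. sign-extend from bit 31.  This models canonicalizing
   a doubleword constant for use in 32-bit word arithmetic. */
int64_t canonicalize_word32(uint64_t v)
{
    uint64_t low = v & UINT64_C(0xffffffff);
    /* Flip the sign bit, then subtract it back: a portable sign extension. */
    return (int64_t)(low ^ UINT64_C(0x80000000)) - INT64_C(0x80000000);
}
```

With these helpers, `pow2_mod(32, 0xffffffff)` gives 1, which is why a divisor as large as 0xffffffff can reach this code path with bit == 32. The doubleword value 0xffffffff becomes -1 when viewed as a 32-bit word, and 1 - 0xffffffff becomes 2.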
2020-12-11tree-optimization/98235 - limit SLP discoveryRichard Biener2-31/+77
With following backedges and the SLP discovery cache not being permute aware we have to put some discovery limits in place again. That's also the opportunity to ditch the separate limit on the number of permutes we try, so the patch limits the overall work done (as in vect_build_slp_tree cache misses) to what we compute as max_tree_size which is based on the number of scalar stmts in the vectorized region. Note the limit is global and there's no attempt to divide the allowed work evenly amongst opportunities, so one degenerate can eat it all up. That's probably only relevant for BB vectorization where the limit is based on up to the size of the whole function. 2020-12-11 Richard Biener <rguenther@suse.de> PR tree-optimization/98235 * tree-vect-slp.c (vect_build_slp_tree): Exchange npermutes for limit. Decrement that for each cache miss and fail discovery when it reaches zero. (vect_build_slp_tree_2): Remove npermutes handling and simply pass down limit. (vect_build_slp_instance): Use pass down limit. (vect_analyze_slp_instance): Likewise. (vect_analyze_slp): Base the SLP discovery limit on max_tree_size and pass it down. * gcc.dg/torture/pr98235.c: New testcase.
2020-12-11expansion: Sign or zero extend on MEM_REF stores into SUBREG with SUBREG_PROMOTED_VAR_P [PR98190]Jakub Jelinek2-0/+57
Some targets decide to promote certain scalar variables to a wider mode, so their DECL_RTL is a SUBREG with SUBREG_PROMOTED_VAR_P. When storing to such vars, store_expr takes care of sign or zero extending, but if we store e.g. through MEM_REF into them, no sign or zero extension happens and that leads to wrong-code e.g. on the following testcase on aarch64-linux. The following patch uses store_expr if we overwrite all the bits and it is not reversed storage order, i.e. something that store_expr handles normally. Otherwise, if the most significant bit is being modified (or, for pdp11, might be, but pdp11 doesn't promote), the code extends manually. 2020-12-11 Jakub Jelinek <jakub@redhat.com> PR middle-end/98190 * expr.c (expand_assignment): If to_rtx is a promoted SUBREG, ensure sign or zero extension either through use of store_expr or by extending manually. * gcc.dg/pr98190.c: New test.
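The failure mode described here can be modelled in plain C. This is a rough sketch, not the testcase from the commit: it assumes a hypothetical 32-bit register holding a promoted 16-bit signed variable, and the function names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical model: overwrite only the low 16 bits of a 32-bit register
   and leave the upper bits alone.  This mimics a store into the narrow
   part of a promoted variable with no extension afterwards. */
uint32_t store_low16_raw(uint32_t reg, uint16_t val)
{
    return (reg & UINT32_C(0xffff0000)) | val;
}

/* Hypothetical model: store the 16-bit value and sign-extend it to the
   full register, which is the invariant a promoted signed variable
   is expected to keep. */
uint32_t store_low16_sext(uint16_t val)
{
    uint32_t v = val;
    /* Flip bit 15, then subtract it back: sign extension from 16 bits. */
    return (v ^ UINT32_C(0x8000)) - UINT32_C(0x8000);
}
```

Storing 0xffff (that is, -1 as a 16-bit value) into a zeroed register with the raw variant leaves 0x0000ffff, which later code relying on the promotion would read as 65535. The extending variant yields 0xffffffff, the value such code expects.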
2020-12-11ira.c: Fix ICE in ira-color [PR97092]Andrea Corallo2-2/+28
gcc/ChangeLog 2020-12-10 Andrea Corallo <andrea.corallo@arm.com> PR rtl-optimization/97092 * ira-color.c (update_costs_from_allocno): Do not carry over mode between subsequent iterations. gcc/testsuite/ChangeLog 2020-12-10 Andrea Corallo <andrea.corallo@arm.com> * gcc.target/aarch64/sve/pr97092.c: New test.
2020-12-11tree-optimization/95582 - fix vector pattern with bool conversionsRichard Biener1-1/+1
The pattern recognizer fends off against recognizing conversions from VECT_SCALAR_BOOLEAN_TYPE_P to precision one types but what it really needs to fend off is conversions between VECT_SCALAR_BOOLEAN_TYPE_P types - the Ada FE uses an 8 bit boolean type that satisfies this predicate. 2020-12-11 Richard Biener <rguenther@suse.de> PR tree-optimization/95582 * tree-vect-patterns.c (vect_recog_bool_pattern): Check for VECT_SCALAR_BOOLEAN_TYPE_P, not just precision one.
2020-12-11Fix feature check for HRESET/AVX_VNNI/UINTRHongyu1-10/+15
gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_available_features): Move check for HRESET/AVX_VNNI/UINTR out of avx512_usable.
2020-12-11dojump: Fix up probabilities splitting in dojump.c comparison splitting [PR98212]Jakub Jelinek2-6/+34
When compiling:

  void foo (void);
  void bar (float a, float b) { if (__builtin_expect (a != b, 1)) foo (); }
  void baz (float a, float b) { if (__builtin_expect (a == b, 1)) foo (); }
  void qux (float a, float b) { if (__builtin_expect (a != b, 0)) foo (); }
  void corge (float a, float b) { if (__builtin_expect (a == b, 0)) foo (); }

on x86_64, we get (unimportant cruft removed):

  bar:    ucomiss %xmm1, %xmm0
          jp      .L4
          je      .L1
  .L4:    jmp     foo
  .L1:    ret
  baz:    ucomiss %xmm1, %xmm0
          jp      .L6
          jne     .L6
          jmp     foo
  .L6:    ret
  qux:    ucomiss %xmm1, %xmm0
          jp      .L13
          jne     .L13
          ret
  .L13:   jmp     foo
  corge:  ucomiss %xmm1, %xmm0
          jnp     .L18
  .L14:   ret
  .L18:   jne     .L14
          jmp     foo

(note for bar and qux that changed with a patch I've posted earlier today).

This is all reasonable, except the last function, the overall jump to the tail call is predicted unlikely (10%), so it is good jmp foo isn't on the straight line path, but NaNs are (or should be) considered very unlikely in the programs, so IMHO the right code (and one emitted with the following patch) is:

  corge:  ucomiss %xmm1, %xmm0
          jp      .L14
          je      .L18
  .L14:   ret
  .L18:   jmp     foo

Let's discuss the probabilities in the above testcase: for !and_them it looks all correct, so for bar we split

  if (a != b) goto t; // prob 90%
  goto f;

into:

  if (a unord b) goto t; // first_prob = prob * cprob = 90% * 1% = 0.9%
  if (a ltgt b) goto t; // adjusted prob = (prob - first_prob) / (1 - first_prob) = (90% - 0.9%) / (1 - 0.9%) = 89.909%

and for qux we split

  if (a != b) goto t; // prob 10%
  goto f;

into:

  if (a unord b) goto t; // first_prob = prob * cprob = 10% * 1% = 0.1%
  if (a ltgt b) goto t; // adjusted prob = (prob - first_prob) / (1 - first_prob) = (10% - 0.1%) / (1 - 0.1%) = 9.910%

Now, the and_them cases should be probability wise exactly the same if we swap the f and t labels, because baz

  if (a == b) goto t; // prob 90%
  goto f;

is equivalent to:

  if (a != b) goto f; // prob 10%
  goto t;

which is in qux.
This means we could expand baz as:

  if (a unord b) goto f; // 0.1%
  if (a ltgt b) goto f; // 9.910%
  goto t;

But we don't expand it exactly that way, but instead (as the comment says) as:

  if (a ord b) ; else goto f; // first_prob as probability of ;
  if (a uneq b) goto t; // adjusted prob
  goto f;

So, first_prob.invert () should be 0.1% and adjusted prob should be 1 - 9.910%.

Thus, the right thing is 4 inverts:

  prob = prob.invert (); // baz is equivalent to qux with swap(t, f) and thus inverted original prob
  first_prob = prob.split (cprob.invert ()).invert ();
  // cprob.invert because by doing if (cond) ; else goto f; we effectively invert the condition
  // the second invert because first_prob is probability of ; rather than goto f
  prob = prob.invert (); // lastly because adjusted prob we want is
  // probability of goto t;, while the one from corresponding !and_them case
  // would be if (...) goto f; goto t;

2020-12-11  Jakub Jelinek  <jakub@redhat.com>

  PR rtl-optimization/98212
  * dojump.c (do_compare_rtx_and_jump): Change computation of first_prob for and_them. Add comment explaining and_them case.
  * gcc.dg/predict-8.c: Adjust expected probability.
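The percentages quoted in this message can be reproduced with a few lines of C. This is a sketch using plain doubles and hypothetical helper names; it does not use GCC's profile_probability class, and the 1% chance of unordered operands is taken from the figures in the message.

```c
/* Hypothetical helpers, not GCC's profile_probability API. */

/* Probability of the first of two chained jumps: prob scaled by cprob. */
double split_first(double prob, double cprob)
{
    return prob * cprob;
}

/* Probability of the second jump, given that the first was not taken. */
double split_adjusted(double prob, double first_prob)
{
    return (prob - first_prob) / (1.0 - first_prob);
}

/* Result of the and_them split. */
struct and_them_probs {
    double first_prob; /* probability of falling through ";" on "a ord b" */
    double adjusted;   /* probability of "goto t" on the second test */
};

/* and_them case: swap the t and f labels, split as in the !and_them case,
   then invert both results back. */
struct and_them_probs split_and_them(double prob, double unord_prob)
{
    double inv = 1.0 - prob;                     /* probability of reaching f */
    double first = split_first(inv, unord_prob); /* "else goto f" taken */
    struct and_them_probs r;
    r.first_prob = 1.0 - first;
    r.adjusted = 1.0 - split_adjusted(inv, first);
    return r;
}
```

For bar, `split_adjusted(0.9, split_first(0.9, 0.01))` comes to about 0.89909, and for qux the same call with 0.1 gives about 0.09910. For baz, `split_and_them(0.9, 0.01)` gives a first_prob of 0.999, whose inverse is the 0.1% from the message, and an adjusted probability of about 0.90090, which is 1 - 9.910%.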