aboutsummaryrefslogtreecommitdiff
path: root/gcc/config/rs6000/mma.md
AgeCommit message (Collapse)AuthorFilesLines
2022-08-16rs6000: Adjust mov optabs for opaque modes [PR103353]Kewen.Lin1-6/+33
As PR103353 shows, we may want to continue to expand built-in function __builtin_vsx_lxvp, even if we have already emitted error messages about some missing required conditions. As shown in that PR, without one explicit mov optab on OOmode provided, it would call emit_move_insn recursively. So this patch is to allow the mov pattern to be generated during expanding phase if compiler has already seen errors. PR target/103353 gcc/ChangeLog: * config/rs6000/mma.md (define_expand movoo): Move TARGET_MMA condition check to preparation statements and add handlings for !TARGET_MMA. (define_expand movxo): Likewise. gcc/testsuite/ChangeLog: * gcc.target/powerpc/pr103353.c: New test.
2022-05-17rs6000: Prefer assigning the MMA vector operands to altivec registers [PR105556]Peter Bergner1-75/+75
When optimizing the DGEMM kernel in OpenBLAS to use MMA, the MMA code uses all 8 accumulators, which overlap all vs0-vs31 vector registers. Current trunk assigns one of the normal vector inputs to one of the MMA instructions, which forces us to spill one of the accumulators to memory, leading to poor performance. The solution here is to replace the "wa" constraints for the vector input operands in the MMA instruction patterns with "v,?wa" so that we prefer using the altivec registers vs32-vs63 over the vs0-vs31 registers. 2022-05-17 Peter Bergner <bergner@linux.ibm.com> Segher Boessenkool <segher@kernel.crashing.org> gcc/ PR target/105556 * config/rs6000/mma.md (mma_<vv>, mma_<avv>, mma_<pv>, mma_<apv>, mma_<vvi4i4i8>, mma_<avvi4i4i8>, mma_<vvi4i4i2>, mma_<avvi4i4i2>, mma_<vvi4i4>, mma_<avvi4i4>, mma_<pvi4i2>, mma_<apvi4i2>, mma_<vvi4i4i4>, mma_<avvi4i4i4>): Replace "wa" constraints with "v,?wa". Update other operands accordingly.
2022-01-03Update copyright years.Jakub Jelinek1-1/+1
2021-12-14rs6000: Do not allow combining of multiple assemble quads [PR103548]Peter Bergner1-18/+20
The compiler will gladly CSE the result of two __builtin_mma_build_acc calls with the same four vector arguments, leading to illegal MMA code being generated. The fix here is to make the mma_assemble_acc pattern use a unspec_volatile to stop the CSE from happening. 2021-12-14 Peter Bergner <bergner@linux.ibm.com> gcc/ PR target/103548 * config/rs6000/mma.md (UNSPEC_MMA_ASSEMBLE): Rename unspec from this... (UNSPEC_VSX_ASSEMBLE): ...to this. (UNSPECV_MMA_ASSEMBLE): New unspecv. (vsx_assemble_pair): Use UNSPEC_VSX_ASSEMBLE. (*vsx_assemble_pair): Likewise. (mma_assemble_acc): Use UNSPECV_MMA_ASSEMBLE. (*mma_assemble_acc): Likewise. * config/rs6000/rs6000.c (rs6000_split_multireg_move): Handle UNSPEC_VOLATILE. Use UNSPEC_VSX_ASSEMBLE and UNSPECV_MMA_ASSEMBLE. gcc/testsuite/ PR target/103548 * gcc.target/powerpc/mma-builtin-10-pair.c: New test. * gcc.target/powerpc/mma-builtin-10-quad.c: New test.
2021-11-16rs6000: MMA test case emits wrong code when building a vector pair [PR102976]Peter Bergner1-2/+8
PR102976 shows a test case where we generate wrong code when building a vector pair from 2 vector registers. The bug here is that with unlucky register assignments, we can clobber one of the input operands before we write both registers of the output operand. The solution is to use early-clobbers in the assemble pair and accumulator patterns. 2021-11-16 Peter Bergner <bergner@linux.ibm.com> gcc/ PR target/102976 * config/rs6000/mma.md (*vsx_assemble_pair): Add early-clobber for output operand. (*mma_assemble_acc): Likewise. gcc/testsuite/ PR target/102976 * gcc.target/powerpc/pr102976.c: New test.
2021-09-14rs6000: Disable optimizing multiple xxsetaccz instructions into one xxsetacczPeter Bergner1-21/+10
Fwprop will happily optimize two xxsetaccz instructions into one xxsetaccz by propagating the results of the first to the uses of the second. We really don't want that to happen given the late priming/depriming of accumulators. I fixed this by making the xxsetaccz source operand an unspec volatile. I also removed the mma_xxsetaccz define_expand and define_insn_and_split and replaced it with a simple define_insn. The expand and splitter patterns were leftovers from the pre opaque mode code when the xxsetaccz code was part of the movpxi pattern, and we don't need them now. Rather than a new test case, I was able to just modify the current test case to add another __builtin_mma_xxsetaccz call which shows the bad code gen with unpatched compilers. 2021-09-14 Peter Bergner <bergner@linux.ibm.com> gcc/ * config/rs6000/mma.md (unspec): Delete UNSPEC_MMA_XXSETACCZ. (unspecv): Add UNSPECV_MMA_XXSETACCZ. (*mma_xxsetaccz): Delete. (mma_xxsetaccz): Change to define_insn. Remove operand 1. Use UNSPECV_MMA_XXSETACCZ. Update comment. * config/rs6000/rs6000.c (rs6000_rtx_costs): Use UNSPECV_MMA_XXSETACCZ. gcc/testsuite/ * gcc.target/powerpc/mma-builtin-6.c: Add second call to xxsetacc built-in. Update instruction counts.
2021-03-31Update prefixed attribute for Power10.Pat Haugen1-10/+20
This patch creates a new attribute, "maybe_prefixed", which is used to mark those instructions that may have a prefixed form. The existing "prefixed" attribute is now used to mark all instructions that are prefixed form. 2021-03-31 Pat Haugen <pthaugen@linux.ibm.com> gcc/ PR target/99133 * config/rs6000/altivec.md (xxspltiw_v4si, xxspltiw_v4sf_inst, xxspltidp_v2df_inst, xxsplti32dx_v4si_inst, xxsplti32dx_v4sf_inst, xxblend_<mode>, xxpermx_inst, xxeval): Mark prefixed. * config/rs6000/mma.md (mma_<vvi4i4i8>, mma_<avvi4i4i8>, mma_<vvi4i4i2>, mma_<avvi4i4i2>, mma_<vvi4i4>, mma_<avvi4i4>, mma_<pvi4i2>, mma_<apvi4i2>, mma_<vvi4i4i4>, mma_<avvi4i4i4>): Likewise. * config/rs6000/rs6000.c (rs6000_final_prescan_insn): Adjust test. * config/rs6000/rs6000.md (define_attr "maybe_prefixed"): New. (define_attr "prefixed"): Update initializer.
2021-03-03Update size attribute for Power10.Pat Haugen1-0/+1
2021-03-03 Pat Haugen <pthaugen@linux.ibm.com> gcc/ * config/rs6000/dfp.md (extendddtd2, trunctddd2, *cmp<mode>_internal1, floatditd2, ftrunc<mode>2, fix<mode>di2, dfp_ddedpd_<mode>, dfp_denbcd_<mode>, dfp_dxex_<mode>, dfp_diex_<mode>, *dfp_sgnfcnc_<mode>, dfp_dscli_<mode>, dfp_dscri_<mode>): Update size attribute for Power10. * config/rs6000/mma.md (*movoo): Likewise. * config/rs6000/rs6000.md (define_attr "size"): Add 256. (define_mode_attr bits): Add DD/TD modes. * config/rs6000/sync.md (load_quadpti, store_quadpti, load_lockedpti, store_conditionalpti): Update size attribute for Power10.
2021-02-23rs6000: Add support for compatibility built-insPeter Bergner1-4/+4
The LLVM and GCC teams agreed to rename the __builtin_mma_assemble_pair and __builtin_mma_disassemble_pair built-ins to __builtin_vsx_assemble_pair and __builtin_vsx_disassemble_pair respectively. It's too late to remove the old names, so this patch renames the built-ins to the new names and then adds support for creating compatibility built-ins (ie, multiple built-in functions generate the same code) and then creates compatibility built-ins using the old names. 2021-02-23 Peter Bergner <bergner@linux.ibm.com> gcc/ * config/rs6000/mma.md (mma_assemble_pair): Rename from this... (vsx_assemble_pair): ...to this. (*mma_assemble_pair): Rename from this... (*vsx_assemble_pair): ...to this. (mma_disassemble_pair): Rename from this... (vsx_disassemble_pair): ...to this. (*mma_disassemble_pair): Rename from this... (*vsx_disassemble_pair): ...to this. * config/rs6000/rs6000-builtin.def (BU_MMA_V2, BU_MMA_V3, BU_COMPAT): New macros. (mma_assemble_pair): Rename from this... (vsx_assemble_pair): ...to this. (mma_disassemble_pair): Rename from this... (vsx_disassemble_pair): ...to this. (mma_assemble_pair): New compatibility built-in. (mma_disassemble_pair): Likewise. * config/rs6000/rs6000-call.c (struct builtin_compatibility): New. (RS6000_BUILTIN_COMPAT): Define. (bdesc_compat): New. (mma_expand_builtin): Use VSX_BUILTIN_DISASSEMBLE_PAIR_INTERNAL. (rs6000_gimple_fold_mma_builtin): Use MMA_BUILTIN_DISASSEMBLE_PAIR and VSX_BUILTIN_ASSEMBLE_PAIR. (rs6000_init_builtins): Register compatibility built-ins. (mma_init_builtins): Use VSX_BUILTIN_ASSEMBLE_PAIR, VSX_BUILTIN_ASSEMBLE_PAIR_INTERNAL, VSX_BUILTIN_DISASSEMBLE_PAIR and VSX_BUILTIN_DISASSEMBLE_PAIR_INTERNAL. * doc/extend.texi (__builtin_mma_assemble_pair): Rename from this... (__builtin_vsx_assemble_pair): ...to this. (__builtin_mma_disassemble_pair): Rename from this... (__builtin_vsx_disassemble_pair): ...to this. gcc/testsuite/ * gcc.target/powerpc/mma-builtin-4.c: Add tests for __builtin_vsx_assemble_pair and __builtin_vsx_disassemble_pair. Add __has_builtin tests for built-ins. Update expected instruction counts.
2021-01-04Update copyright years.Jakub Jelinek1-1/+1
2020-12-16Fix instruction length for MMA insns.Pat Haugen1-21/+11
Prefixed instructions should not have their length explicitly set to '8'. The function get_attr_length() will adjust the length appropriately based on the value of the "prefixed" attribute. 2020-12-16 Pat Haugen <pthaugen@linux.ibm.com> gcc/ * config/rs6000/mma.md (*movxo, mma_<vvi4i4i8>, mma_<avvi4i4i8>, mma_<vvi4i4i2>, mma_<avvi4i4i2>, mma_<vvi4i4>, mma_<avvi4i4>, mma_<pvi4i2>, mma_<apvi4i2>, mma_<vvi4i4i4>, mma_<avvi4i4i4>): Remove explicit setting of length attribute.
2020-11-21Make MMA builtins use opaque modesAaron Sawdey1-176/+245
This patch changes powerpc MMA builtins to use the new opaque mode class and use modes OO (32 bytes) and XO (64 bytes) instead of POI/PXI. Using the opaque modes prevents optimization from trying to do anything with vector pair/quad, which was the problem we were seeing with the partial integer modes. gcc/ * config/rs6000/mma.md (unspec): Add assemble/extract UNSPECs. (movoi): Change to movoo. (*movpoi): Change to *movoo. (movxi): Change to movxo. (*movpxi): Change to *movxo. (mma_assemble_pair): Change to OO mode. (*mma_assemble_pair): New define_insn_and_split. (mma_disassemble_pair): New define_expand. (*mma_disassemble_pair): New define_insn_and_split. (mma_assemble_acc): Change to XO mode. (*mma_assemble_acc): Change to XO mode. (mma_disassemble_acc): New define_expand. (*mma_disassemble_acc): New define_insn_and_split. (mma_<acc>): Change to XO mode. (mma_<vv>): Change to XO mode. (mma_<avv>): Change to XO mode. (mma_<pv>): Change to OO mode. (mma_<apv>): Change to XO/OO mode. (mma_<vvi4i4i8>): Change to XO mode. (mma_<avvi4i4i8>): Change to XO mode. (mma_<vvi4i4i2>): Change to XO mode. (mma_<avvi4i4i2>): Change to XO mode. (mma_<vvi4i4>): Change to XO mode. (mma_<avvi4i4>): Change to XO mode. (mma_<pvi4i2>): Change to XO/OO mode. (mma_<apvi4i2>): Change to XO/OO mode. (mma_<vvi4i4i4>): Change to XO mode. (mma_<avvi4i4i4>): Change to XO mode. * config/rs6000/predicates.md (input_operand): Allow opaque. (mma_disassemble_output_operand): New predicate. * config/rs6000/rs6000-builtin.def: Changes to disassemble builtins. * config/rs6000/rs6000-call.c (rs6000_return_in_memory): Disallow __vector_pair/__vector_quad as return types. (rs6000_promote_function_mode): Remove function return type check because we can't test it here any more. (rs6000_function_arg): Do not allow __vector_pair/__vector_quad as as function arguments. (rs6000_gimple_fold_mma_builtin): Handle mma_disassemble_* builtins. (rs6000_init_builtins): Create types for XO/OO modes. * config/rs6000/rs6000-modes.def: DElete OI, XI, POI, and PXI modes, and create XO and OO modes. * config/rs6000/rs6000-string.c (expand_block_move): Update to OO mode. * config/rs6000/rs6000.c (rs6000_hard_regno_mode_ok_uncached): Update for XO/OO modes. (rs6000_rtx_costs): Make UNSPEC_MMA_XXSETACCZ cost 0. (rs6000_modes_tieable_p): Update for XO/OO modes. (rs6000_debug_reg_global): Update for XO/OO modes. (rs6000_setup_reg_addr_masks): Update for XO/OO modes. (rs6000_init_hard_regno_mode_ok): Update for XO/OO modes. (reg_offset_addressing_ok_p): Update for XO/OO modes. (rs6000_emit_move): Update for XO/OO modes. (rs6000_preferred_reload_class): Update for XO/OO modes. (rs6000_split_multireg_move): Update for XO/OO modes. (rs6000_mangle_type): Update for opaque types. (rs6000_invalid_conversion): Update for XO/OO modes. * config/rs6000/rs6000.h (VECTOR_ALIGNMENT_P): Update for XO/OO modes. * config/rs6000/rs6000.md (RELOAD): Update for XO/OO modes. gcc/testsuite/ * gcc.target/powerpc/mma-double-test.c (main): Call abort for failure. * gcc.target/powerpc/mma-single-test.c (main): Call abort for failure. * gcc.target/powerpc/pr96506.c: Rename to pr96506-1.c. * gcc.target/powerpc/pr96506-2.c: New test.
2020-08-06rs6000: Don't ICE when spilling an MMA accumulatorPeter Bergner1-8/+14
When we spill an accumulator that has a known zero value, LRA will emit a new (set (reg:PXI ...) 0) insn, but it does not use the mma_xxsetaccz pattern to do it, leading to an unrecognized insn ICE. The solution here is to move the xxsetaccz instruction into the movpxi pattern and have the xxsetaccz pattern call the move pattern. 2020-08-06 Peter Bergner <bergner@linux.ibm.com> gcc/ PR target/96446 * config/rs6000/mma.md (*movpxi): Add xxsetaccz generation. Disable split for zero constant source operand. (mma_xxsetaccz): Change to define_expand. Call gen_movpxi. gcc/testsuite/ PR target/96446 * gcc.target/powerpc/pr96446.c: New test.
2020-06-21rs6000: Add MMA built-in function definitions and test cases.Peter Bergner1-6/+484
Add the Matrix-Multiply Assist (MMA) built-ins. The MMA accumulators are INOUT operands for most MMA instructions, but they are also very expensive to move around. For this reason, we have implemented a built-in API where the accumulators are passed using pass-by-reference/pointers, so the user won't use one accumulator as input and another as output, which wouldentail a lot of copies. However, using pointers gives us poor code generation when we expand the built-ins at normal expand time. We therefore expand the MMA built-ins early into gimple, converting the pass-by-reference calls to an internal built-in that uses pass-by-value calling convention, where we can enforce the input and output accumulators are the same. This gives us much better code generation. 2020-06-20 Peter Bergner <bergner@linux.ibm.com> gcc/ * config/rs6000/predicates.md (mma_assemble_input_operand): New. * config/rs6000/rs6000-builtin.def (BU_MMA_1, BU_MMA_V2, BU_MMA_3, BU_MMA_5, BU_MMA_6, BU_VSX_1): Add support macros for defining MMA built-in functions. (ASSEMBLE_ACC, ASSEMBLE_PAIR, DISASSEMBLE_ACC, DISASSEMBLE_PAIR, PMXVBF16GER2, PMXVBF16GER2NN, PMXVBF16GER2NP, PMXVBF16GER2PN, PMXVBF16GER2PP, PMXVF16GER2, PMXVF16GER2NN, PMXVF16GER2NP, PMXVF16GER2PN, PMXVF16GER2PP, PMXVF32GER, PMXVF32GERNN, PMXVF32GERNP, PMXVF32GERPN, PMXVF32GERPP, PMXVF64GER, PMXVF64GERNN, PMXVF64GERNP, PMXVF64GERPN, PMXVF64GERPP, PMXVI16GER2, PMXVI16GER2PP, PMXVI16GER2S, PMXVI16GER2SPP, PMXVI4GER8, PMXVI4GER8PP, PMXVI8GER4, PMXVI8GER4PP, PMXVI8GER4SPP, XVBF16GER2, XVBF16GER2NN, XVBF16GER2NP, XVBF16GER2PN, XVBF16GER2PP, XVCVBF16SP, XVCVSPBF16, XVF16GER2, XVF16GER2NN, XVF16GER2NP, XVF16GER2PN, XVF16GER2PP, XVF32GER, XVF32GERNN, XVF32GERNP, XVF32GERPN, XVF32GERPP, XVF64GER, XVF64GERNN, XVF64GERNP, XVF64GERPN, XVF64GERPP, XVI16GER2, XVI16GER2PP, XVI16GER2S, XVI16GER2SPP, XVI4GER8, XVI4GER8PP, XVI8GER4, XVI8GER4PP, XVI8GER4SPP, XXMFACC, XXMTACC, XXSETACCZ): Add MMA built-ins. * config/rs6000/rs6000.c (rs6000_emit_move): Use CONST_INT_P. Allow zero constants. (print_operand) <case 'A'>: New output modifier. (rs6000_split_multireg_move): Add support for inserting accumulator priming and depriming instructions. Add support for splitting an assemble accumulator pattern. * config/rs6000/rs6000-call.c (mma_init_builtins, mma_expand_builtin, rs6000_gimple_fold_mma_builtin): New functions. (RS6000_BUILTIN_M): New macro. (def_builtin): Handle RS6000_BTC_QUAD and RS6000_BTC_PAIR attributes. (bdesc_mma): Add new MMA built-in support. (htm_expand_builtin): Use RS6000_BTC_OPND_MASK. (rs6000_invalid_builtin): Add handling of RS6000_BTM_FUTURE and RS6000_BTM_MMA. (rs6000_builtin_valid_without_lhs): Handle RS6000_BTC_VOID attribute. (rs6000_gimple_fold_builtin): Call rs6000_builtin_is_supported_p and rs6000_gimple_fold_mma_builtin. (rs6000_expand_builtin): Call mma_expand_builtin. Use RS6000_BTC_OPND_MASK. (rs6000_init_builtins): Adjust comment. Call mma_init_builtins. (htm_init_builtins): Use RS6000_BTC_OPND_MASK. (builtin_function_type): Handle VSX_BUILTIN_XVCVSPBF16 and VSX_BUILTIN_XVCVBF16SP. * config/rs6000/rs6000.h (RS6000_BTC_QUINARY, RS6000_BTC_SENARY, RS6000_BTC_OPND_MASK, RS6000_BTC_QUAD, RS6000_BTC_PAIR, RS6000_BTC_QUADPAIR, RS6000_BTC_GIMPLE): New defines. (RS6000_BTC_PREDICATE, RS6000_BTC_ABS, RS6000_BTC_DST, RS6000_BTC_TYPE_MASK, RS6000_BTC_ATTR_MASK): Adjust values. * config/rs6000/mma.md (MAX_MMA_OPERANDS): New define_constant. (UNSPEC_MMA_ASSEMBLE_ACC, UNSPEC_MMA_PMXVBF16GER2, UNSPEC_MMA_PMXVBF16GER2NN, UNSPEC_MMA_PMXVBF16GER2NP, UNSPEC_MMA_PMXVBF16GER2PN, UNSPEC_MMA_PMXVBF16GER2PP, UNSPEC_MMA_PMXVF16GER2, UNSPEC_MMA_PMXVF16GER2NN, UNSPEC_MMA_PMXVF16GER2NP, UNSPEC_MMA_PMXVF16GER2PN, UNSPEC_MMA_PMXVF16GER2PP, UNSPEC_MMA_PMXVF32GER, UNSPEC_MMA_PMXVF32GERNN, UNSPEC_MMA_PMXVF32GERNP, UNSPEC_MMA_PMXVF32GERPN, UNSPEC_MMA_PMXVF32GERPP, UNSPEC_MMA_PMXVF64GER, UNSPEC_MMA_PMXVF64GERNN, UNSPEC_MMA_PMXVF64GERNP, UNSPEC_MMA_PMXVF64GERPN, UNSPEC_MMA_PMXVF64GERPP, UNSPEC_MMA_PMXVI16GER2, UNSPEC_MMA_PMXVI16GER2PP, UNSPEC_MMA_PMXVI16GER2S, UNSPEC_MMA_PMXVI16GER2SPP, UNSPEC_MMA_PMXVI4GER8, UNSPEC_MMA_PMXVI4GER8PP, UNSPEC_MMA_PMXVI8GER4, UNSPEC_MMA_PMXVI8GER4PP, UNSPEC_MMA_PMXVI8GER4SPP, UNSPEC_MMA_XVBF16GER2, UNSPEC_MMA_XVBF16GER2NN, UNSPEC_MMA_XVBF16GER2NP, UNSPEC_MMA_XVBF16GER2PN, UNSPEC_MMA_XVBF16GER2PP, UNSPEC_MMA_XVF16GER2, UNSPEC_MMA_XVF16GER2NN, UNSPEC_MMA_XVF16GER2NP, UNSPEC_MMA_XVF16GER2PN, UNSPEC_MMA_XVF16GER2PP, UNSPEC_MMA_XVF32GER, UNSPEC_MMA_XVF32GERNN, UNSPEC_MMA_XVF32GERNP, UNSPEC_MMA_XVF32GERPN, UNSPEC_MMA_XVF32GERPP, UNSPEC_MMA_XVF64GER, UNSPEC_MMA_XVF64GERNN, UNSPEC_MMA_XVF64GERNP, UNSPEC_MMA_XVF64GERPN, UNSPEC_MMA_XVF64GERPP, UNSPEC_MMA_XVI16GER2, UNSPEC_MMA_XVI16GER2PP, UNSPEC_MMA_XVI16GER2S, UNSPEC_MMA_XVI16GER2SPP, UNSPEC_MMA_XVI4GER8, UNSPEC_MMA_XVI4GER8PP, UNSPEC_MMA_XVI8GER4, UNSPEC_MMA_XVI8GER4PP, UNSPEC_MMA_XVI8GER4SPP, UNSPEC_MMA_XXMFACC, UNSPEC_MMA_XXMTACC): New. (MMA_ACC, MMA_VV, MMA_AVV, MMA_PV, MMA_APV, MMA_VVI4I4I8, MMA_AVVI4I4I8, MMA_VVI4I4I2, MMA_AVVI4I4I2, MMA_VVI4I4, MMA_AVVI4I4, MMA_PVI4I2, MMA_APVI4I2, MMA_VVI4I4I4, MMA_AVVI4I4I4): New define_int_iterator. (acc, vv, avv, pv, apv, vvi4i4i8, avvi4i4i8, vvi4i4i2, avvi4i4i2, vvi4i4, avvi4i4, pvi4i2, apvi4i2, vvi4i4i4, avvi4i4i4): New define_int_attr. (*movpxi): Add zero constant alternative. (mma_assemble_pair, mma_assemble_acc): New define_expand. (*mma_assemble_acc): New define_insn_and_split. (mma_<acc>, mma_xxsetaccz, mma_<vv>, mma_<avv>, mma_<pv>, mma_<apv>, mma_<vvi4i4i8>, mma_<avvi4i4i8>, mma_<vvi4i4i2>, mma_<avvi4i4i2>, mma_<vvi4i4>, mma_<avvi4i4>, mma_<pvi4i2>, mma_<apvi4i2>, mma_<vvi4i4i4>, mma_<avvi4i4i4>): New define_insn. * config/rs6000/rs6000.md (define_attr "type"): New type mma. * config/rs6000/vsx.md (UNSPEC_VSX_XVCVBF16SP): New. (UNSPEC_VSX_XVCVSPBF16): Likewise. (XVCVBF16): New define_int_iterator. (xvcvbf16): New define_int_attr. (vsx_<xvcvbf16>): New define_insn. * doc/extend.texi: Document the mma built-ins.
2020-06-21rs6000: Add base support and types for defining MMA built-ins.Peter Bergner1-0/+108
Add the new -mmma option as well as the initial MMA support, which includes the target specific __vector_pair and __vector_quad types, the POImode and PXImode partial integer modes they are mapped to, and their associated move patterns. Support for the restrictions on the registers these modes can be assigned to as also been added. 2020-06-20 Peter Bergner <bergner@linux.ibm.com> Michael Meissner <meissner@linux.ibm.com> gcc/ * config/rs6000/mma.md: New file. * config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Define __MMA__ for mma. * config/rs6000/rs6000-call.c (rs6000_init_builtins): Add support for __vector_pair and __vector_quad types. * config/rs6000/rs6000-cpus.def (OTHER_FUTURE_MASKS): Add OPTION_MASK_MMA. (POWERPC_MASKS): Likewise. * config/rs6000/rs6000-modes.def (OI, XI): New integer modes. (POI, PXI): New partial integer modes. * config/rs6000/rs6000.c (TARGET_INVALID_CONVERSION): Define. (rs6000_hard_regno_nregs_internal): Use VECTOR_ALIGNMENT_P. (rs6000_hard_regno_mode_ok_uncached): Likewise. Add support for POImode being allowed in VSX registers and PXImode being allowed in FP registers. (rs6000_modes_tieable_p): Adjust comment. Add support for POImode and PXImode. (rs6000_debug_reg_global) <print_tieable_modes>: Add OImode, POImode XImode, PXImode, V2SImode, V2SFmode and CCFPmode.. (rs6000_setup_reg_addr_masks): Use VECTOR_ALIGNMENT_P. Set up appropriate addr_masks for vector pair and vector quad addresses. (rs6000_init_hard_regno_mode_ok): Add support for vector pair and vector quad registers. Setup reload handlers for POImode and PXImode. (rs6000_builtin_mask_calculate): Add support for RS6000_BTM_MMA. (rs6000_option_override_internal): Error if -mmma is specified without -mcpu=future. (rs6000_slow_unaligned_access): Use VECTOR_ALIGNMENT_P. (quad_address_p): Change size test to less than 16 bytes. (reg_offset_addressing_ok_p): Add support for ISA 3.1 vector pair and vector quad instructions. (avoiding_indexed_address_p): Likewise. (rs6000_emit_move): Disallow POImode and PXImode moves involving constants. (rs6000_preferred_reload_class): Prefer VSX registers for POImode and FP registers for PXImode. (rs6000_split_multireg_move): Support splitting POImode and PXImode move instructions. (rs6000_mangle_type): Adjust comment. Add support for mangling __vector_pair and __vector_quad types. (rs6000_opt_masks): Add entry for mma. (rs6000_builtin_mask_names): Add RS6000_BTM_MMA and RS6000_BTM_FUTURE. (rs6000_function_value): Use VECTOR_ALIGNMENT_P. (address_to_insn_form): Likewise. (reg_to_non_prefixed): Likewise. (rs6000_invalid_conversion): New function. * config/rs6000/rs6000.h (MASK_MMA): Define. (BIGGEST_ALIGNMENT): Set to 512 if MMA support is enabled. (VECTOR_ALIGNMENT_P): New helper macro. (ALTIVEC_VECTOR_MODE): Use VECTOR_ALIGNMENT_P. (RS6000_BTM_MMA): Define. (RS6000_BTM_COMMON): Add RS6000_BTM_MMA and RS6000_BTM_FUTURE. (rs6000_builtin_type_index): Add RS6000_BTI_vector_pair and RS6000_BTI_vector_quad. (vector_pair_type_node): New. (vector_quad_type_node): New. * config/rs6000/rs6000.md: Include mma.md. (define_mode_iterator RELOAD): Add POI and PXI. * config/rs6000/t-rs6000 (MD_INCLUDES): Add mma.md. * config/rs6000/rs6000.opt (-mmma): New. * doc/invoke.texi: Document -mmma.