author | Peter Bergner <bergner@linux.ibm.com> | 2020-06-20 23:23:02 -0500 |
---|---|---|
committer | Peter Bergner <bergner@linux.ibm.com> | 2020-06-21 00:26:13 -0500 |
commit | 8ee2640bfdc62f835ec9740278f948034bc7d9f1 (patch) | |
tree | 9375290dee7275b930336364140634ebd6a5fe08 /gcc/doc | |
parent | f002c046e37d0027513af5297d9259e1fad29c27 (diff) | |
rs6000: Add MMA built-in function definitions and test cases.
Add the Matrix-Multiply Assist (MMA) built-ins. The MMA accumulators are
INOUT operands for most MMA instructions, but they are also very expensive
to move around. For this reason, we have implemented a built-in API where
the accumulators are passed using pass-by-reference/pointers, so the user
won't use one accumulator as input and another as output, which would entail
a lot of copies. However, using pointers gives us poor code generation
when we expand the built-ins at normal expand time. We therefore expand
the MMA built-ins early into gimple, converting the pass-by-reference calls
to an internal built-in that uses pass-by-value calling convention, where
we can enforce the input and output accumulators are the same. This gives
us much better code generation.
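
As a rough illustration of the pass-by-reference style described above, the sketch below models an accumulating rank-1 update in portable C. The `acc_model` type and the `model_*` helpers are hypothetical stand-ins, not part of GCC; they assume a 4x4 single-precision layout in which element (i, j) receives a[i] * b[j]. The guarded wrapper `mma_rank1_update` is likewise a hypothetical name, and it assumes that the compiler defines `__MMA__` when `-mmma` is in effect; it calls `__builtin_mma_xvf32gerpp` with one accumulator pointer acting as both input and output.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar stand-in for a __vector_quad viewed as 4x4 floats. */
typedef struct { float v[4][4]; } acc_model;

/* Models the effect of __builtin_mma_xxsetaccz: zero the accumulator in place. */
static inline void model_xxsetaccz(acc_model *acc)
{
    assert(acc != NULL);
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            acc->v[i][j] = 0.0f;
}

/* Models an accumulating outer product (the "pp" form):
   acc[i][j] += a[i] * b[j].  The same object is read and written,
   mirroring the INOUT accumulator operand. */
static inline void model_xvf32gerpp(acc_model *acc,
                                    const float a[4], const float b[4])
{
    assert(acc != NULL);
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            acc->v[i][j] += a[i] * b[j];
}

#if defined(__MMA__)
/* Only compiled when MMA support is enabled (assumed to be signalled by
   __MMA__); vec_t follows the definition given in the documentation. */
typedef __vector unsigned char vec_t;

static inline void mma_rank1_update(__vector_quad *acc, vec_t a, vec_t b)
{
    /* One pointer names both the input and the output accumulator. */
    __builtin_mma_xvf32gerpp(acc, a, b);
}
#endif
```

Calling `model_xxsetaccz` once and then `model_xvf32gerpp` repeatedly on the same `acc_model` mirrors the intended usage pattern: the caller never names a second accumulator, so no accumulator-to-accumulator copy is implied.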
2020-06-20 Peter Bergner <bergner@linux.ibm.com>
gcc/
* config/rs6000/predicates.md (mma_assemble_input_operand): New.
* config/rs6000/rs6000-builtin.def (BU_MMA_1, BU_MMA_V2, BU_MMA_3,
BU_MMA_5, BU_MMA_6, BU_VSX_1): Add support macros for defining MMA
built-in functions.
(ASSEMBLE_ACC, ASSEMBLE_PAIR, DISASSEMBLE_ACC, DISASSEMBLE_PAIR,
PMXVBF16GER2, PMXVBF16GER2NN, PMXVBF16GER2NP, PMXVBF16GER2PN,
PMXVBF16GER2PP, PMXVF16GER2, PMXVF16GER2NN, PMXVF16GER2NP,
PMXVF16GER2PN, PMXVF16GER2PP, PMXVF32GER, PMXVF32GERNN,
PMXVF32GERNP, PMXVF32GERPN, PMXVF32GERPP, PMXVF64GER, PMXVF64GERNN,
PMXVF64GERNP, PMXVF64GERPN, PMXVF64GERPP, PMXVI16GER2, PMXVI16GER2PP,
PMXVI16GER2S, PMXVI16GER2SPP, PMXVI4GER8, PMXVI4GER8PP, PMXVI8GER4,
PMXVI8GER4PP, PMXVI8GER4SPP, XVBF16GER2, XVBF16GER2NN, XVBF16GER2NP,
XVBF16GER2PN, XVBF16GER2PP, XVCVBF16SP, XVCVSPBF16, XVF16GER2,
XVF16GER2NN, XVF16GER2NP, XVF16GER2PN, XVF16GER2PP, XVF32GER,
XVF32GERNN, XVF32GERNP, XVF32GERPN, XVF32GERPP, XVF64GER, XVF64GERNN,
XVF64GERNP, XVF64GERPN, XVF64GERPP, XVI16GER2, XVI16GER2PP, XVI16GER2S,
XVI16GER2SPP, XVI4GER8, XVI4GER8PP, XVI8GER4, XVI8GER4PP, XVI8GER4SPP,
XXMFACC, XXMTACC, XXSETACCZ): Add MMA built-ins.
* config/rs6000/rs6000.c (rs6000_emit_move): Use CONST_INT_P.
Allow zero constants.
(print_operand) <case 'A'>: New output modifier.
(rs6000_split_multireg_move): Add support for inserting accumulator
priming and depriming instructions. Add support for splitting an
assemble accumulator pattern.
* config/rs6000/rs6000-call.c (mma_init_builtins, mma_expand_builtin,
rs6000_gimple_fold_mma_builtin): New functions.
(RS6000_BUILTIN_M): New macro.
(def_builtin): Handle RS6000_BTC_QUAD and RS6000_BTC_PAIR attributes.
(bdesc_mma): Add new MMA built-in support.
(htm_expand_builtin): Use RS6000_BTC_OPND_MASK.
(rs6000_invalid_builtin): Add handling of RS6000_BTM_FUTURE and
RS6000_BTM_MMA.
(rs6000_builtin_valid_without_lhs): Handle RS6000_BTC_VOID attribute.
(rs6000_gimple_fold_builtin): Call rs6000_builtin_is_supported_p
and rs6000_gimple_fold_mma_builtin.
(rs6000_expand_builtin): Call mma_expand_builtin.
Use RS6000_BTC_OPND_MASK.
(rs6000_init_builtins): Adjust comment. Call mma_init_builtins.
(htm_init_builtins): Use RS6000_BTC_OPND_MASK.
(builtin_function_type): Handle VSX_BUILTIN_XVCVSPBF16 and
VSX_BUILTIN_XVCVBF16SP.
* config/rs6000/rs6000.h (RS6000_BTC_QUINARY, RS6000_BTC_SENARY,
RS6000_BTC_OPND_MASK, RS6000_BTC_QUAD, RS6000_BTC_PAIR,
RS6000_BTC_QUADPAIR, RS6000_BTC_GIMPLE): New defines.
(RS6000_BTC_PREDICATE, RS6000_BTC_ABS, RS6000_BTC_DST,
RS6000_BTC_TYPE_MASK, RS6000_BTC_ATTR_MASK): Adjust values.
* config/rs6000/mma.md (MAX_MMA_OPERANDS): New define_constant.
(UNSPEC_MMA_ASSEMBLE_ACC, UNSPEC_MMA_PMXVBF16GER2,
UNSPEC_MMA_PMXVBF16GER2NN, UNSPEC_MMA_PMXVBF16GER2NP,
UNSPEC_MMA_PMXVBF16GER2PN, UNSPEC_MMA_PMXVBF16GER2PP,
UNSPEC_MMA_PMXVF16GER2, UNSPEC_MMA_PMXVF16GER2NN,
UNSPEC_MMA_PMXVF16GER2NP, UNSPEC_MMA_PMXVF16GER2PN,
UNSPEC_MMA_PMXVF16GER2PP, UNSPEC_MMA_PMXVF32GER,
UNSPEC_MMA_PMXVF32GERNN, UNSPEC_MMA_PMXVF32GERNP,
UNSPEC_MMA_PMXVF32GERPN, UNSPEC_MMA_PMXVF32GERPP,
UNSPEC_MMA_PMXVF64GER, UNSPEC_MMA_PMXVF64GERNN,
UNSPEC_MMA_PMXVF64GERNP, UNSPEC_MMA_PMXVF64GERPN,
UNSPEC_MMA_PMXVF64GERPP, UNSPEC_MMA_PMXVI16GER2,
UNSPEC_MMA_PMXVI16GER2PP, UNSPEC_MMA_PMXVI16GER2S,
UNSPEC_MMA_PMXVI16GER2SPP, UNSPEC_MMA_PMXVI4GER8,
UNSPEC_MMA_PMXVI4GER8PP, UNSPEC_MMA_PMXVI8GER4,
UNSPEC_MMA_PMXVI8GER4PP, UNSPEC_MMA_PMXVI8GER4SPP,
UNSPEC_MMA_XVBF16GER2, UNSPEC_MMA_XVBF16GER2NN,
UNSPEC_MMA_XVBF16GER2NP, UNSPEC_MMA_XVBF16GER2PN,
UNSPEC_MMA_XVBF16GER2PP, UNSPEC_MMA_XVF16GER2, UNSPEC_MMA_XVF16GER2NN,
UNSPEC_MMA_XVF16GER2NP, UNSPEC_MMA_XVF16GER2PN, UNSPEC_MMA_XVF16GER2PP,
UNSPEC_MMA_XVF32GER, UNSPEC_MMA_XVF32GERNN, UNSPEC_MMA_XVF32GERNP,
UNSPEC_MMA_XVF32GERPN, UNSPEC_MMA_XVF32GERPP, UNSPEC_MMA_XVF64GER,
UNSPEC_MMA_XVF64GERNN, UNSPEC_MMA_XVF64GERNP, UNSPEC_MMA_XVF64GERPN,
UNSPEC_MMA_XVF64GERPP, UNSPEC_MMA_XVI16GER2, UNSPEC_MMA_XVI16GER2PP,
UNSPEC_MMA_XVI16GER2S, UNSPEC_MMA_XVI16GER2SPP, UNSPEC_MMA_XVI4GER8,
UNSPEC_MMA_XVI4GER8PP, UNSPEC_MMA_XVI8GER4, UNSPEC_MMA_XVI8GER4PP,
UNSPEC_MMA_XVI8GER4SPP, UNSPEC_MMA_XXMFACC, UNSPEC_MMA_XXMTACC): New.
(MMA_ACC, MMA_VV, MMA_AVV, MMA_PV, MMA_APV, MMA_VVI4I4I8,
MMA_AVVI4I4I8, MMA_VVI4I4I2, MMA_AVVI4I4I2, MMA_VVI4I4,
MMA_AVVI4I4, MMA_PVI4I2, MMA_APVI4I2, MMA_VVI4I4I4,
MMA_AVVI4I4I4): New define_int_iterator.
(acc, vv, avv, pv, apv, vvi4i4i8, avvi4i4i8, vvi4i4i2,
avvi4i4i2, vvi4i4, avvi4i4, pvi4i2, apvi4i2, vvi4i4i4,
avvi4i4i4): New define_int_attr.
(*movpxi): Add zero constant alternative.
(mma_assemble_pair, mma_assemble_acc): New define_expand.
(*mma_assemble_acc): New define_insn_and_split.
(mma_<acc>, mma_xxsetaccz, mma_<vv>, mma_<avv>, mma_<pv>, mma_<apv>,
mma_<vvi4i4i8>, mma_<avvi4i4i8>, mma_<vvi4i4i2>, mma_<avvi4i4i2>,
mma_<vvi4i4>, mma_<avvi4i4>, mma_<pvi4i2>, mma_<apvi4i2>,
mma_<vvi4i4i4>, mma_<avvi4i4i4>): New define_insn.
* config/rs6000/rs6000.md (define_attr "type"): New type mma.
* config/rs6000/vsx.md (UNSPEC_VSX_XVCVBF16SP): New.
(UNSPEC_VSX_XVCVSPBF16): Likewise.
(XVCVBF16): New define_int_iterator.
(xvcvbf16): New define_int_attr.
(vsx_<xvcvbf16>): New define_insn.
* doc/extend.texi: Document the mma built-ins.
Diffstat (limited to 'gcc/doc')
-rw-r--r-- | gcc/doc/extend.texi | 95 |
1 files changed, 95 insertions, 0 deletions
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 10dc32e..95f7192 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -13858,6 +13858,7 @@ instructions, but allow the compiler to schedule those calls.
 * PowerPC AltiVec/VSX Built-in Functions::
 * PowerPC Hardware Transactional Memory Built-in Functions::
 * PowerPC Atomic Memory Operation Functions::
+* PowerPC Matrix-Multiply Assist Built-in Functions::
 * RX Built-in Functions::
 * S/390 System z Built-in Functions::
 * SH Built-in Functions::
@@ -21359,6 +21360,100 @@ void amo_stdat_smax (int64_t *, int64_t);
 void amo_stdat_smin (int64_t *, int64_t);
 @end smallexample
 
+@node PowerPC Matrix-Multiply Assist Built-in Functions
+@subsection PowerPC Matrix-Multiply Assist Built-in Functions
+ISA 3.1 of the PowerPC added new Matrix-Multiply Assist (MMA) instructions.
+GCC provides support for these instructions through the following built-in
+functions which are enabled with the @code{-mmma} option. The vec_t type
+below is defined to be a normal vector unsigned char type. The uint2, uint4
+and uint8 parameters are 2-bit, 4-bit and 8-bit unsigned integer constants
+respectively. The compiler will verify that they are constants and that
+their values are within range.
+
+The built-in functions supported are:
+
+@smallexample
+void __builtin_mma_xvi4ger8 (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvi8ger4 (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvi16ger2 (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvi16ger2s (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf16ger2 (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvbf16ger2 (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf32ger (__vector_quad *, vec_t, vec_t);
+
+void __builtin_mma_xvi4ger8pp (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvi8ger4pp (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvi8ger4spp(__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvi16ger2pp (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvi16ger2spp (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf16ger2pp (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf16ger2pn (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf16ger2np (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf16ger2nn (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvbf16ger2pp (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvbf16ger2pn (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvbf16ger2np (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvbf16ger2nn (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf32gerpp (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf32gerpn (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf32gernp (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf32gernn (__vector_quad *, vec_t, vec_t);
+
+void __builtin_mma_pmxvi4ger8 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint8);
+void __builtin_mma_pmxvi4ger8pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint8);
+
+void __builtin_mma_pmxvi8ger4 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint4);
+void __builtin_mma_pmxvi8ger4pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint4);
+void __builtin_mma_pmxvi8ger4spp(__vector_quad *, vec_t, vec_t, uint4, uint4, uint4);
+
+void __builtin_mma_pmxvi16ger2 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvi16ger2s (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvf16ger2 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvbf16ger2 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+
+void __builtin_mma_pmxvi16ger2pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvi16ger2spp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvf16ger2pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvf16ger2pn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvf16ger2np (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvf16ger2nn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvbf16ger2pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvbf16ger2pn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvbf16ger2np (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvbf16ger2nn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+
+void __builtin_mma_pmxvf32ger (__vector_quad *, vec_t, vec_t, uint4, uint4);
+void __builtin_mma_pmxvf32gerpp (__vector_quad *, vec_t, vec_t, uint4, uint4);
+void __builtin_mma_pmxvf32gerpn (__vector_quad *, vec_t, vec_t, uint4, uint4);
+void __builtin_mma_pmxvf32gernp (__vector_quad *, vec_t, vec_t, uint4, uint4);
+void __builtin_mma_pmxvf32gernn (__vector_quad *, vec_t, vec_t, uint4, uint4);
+
+void __builtin_mma_xvf64ger (__vector_quad *, __vector_pair, vec_t);
+void __builtin_mma_xvf64gerpp (__vector_quad *, __vector_pair, vec_t);
+void __builtin_mma_xvf64gerpn (__vector_quad *, __vector_pair, vec_t);
+void __builtin_mma_xvf64gernp (__vector_quad *, __vector_pair, vec_t);
+void __builtin_mma_xvf64gernn (__vector_quad *, __vector_pair, vec_t);
+
+void __builtin_mma_pmxvf64ger (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
+void __builtin_mma_pmxvf64gerpp (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
+void __builtin_mma_pmxvf64gerpn (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
+void __builtin_mma_pmxvf64gernp (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
+void __builtin_mma_pmxvf64gernn (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
+
+void __builtin_mma_xxmtacc (__vector_quad *);
+void __builtin_mma_xxmfacc (__vector_quad *);
+void __builtin_mma_xxsetaccz (__vector_quad *);
+
+void __builtin_mma_assemble_acc (__vector_quad *, vec_t, vec_t, vec_t, vec_t);
+void __builtin_mma_disassemble_acc (void *, __vector_quad *);
+
+void __builtin_mma_assemble_pair (__vector_pair *, vec_t, vec_t);
+void __builtin_mma_disassemble_pair (void *, __vector_pair *);
+
+vec_t __builtin_vsx_xvcvspbf16 (vec_t);
+vec_t __builtin_vsx_xvcvbf16sp (vec_t);
+@end smallexample
+
 @node RX Built-in Functions
 @subsection RX Built-in Functions
 GCC supports some of the RX instructions which cannot be expressed in
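
The documentation text added by this diff states that the uint2, uint4 and uint8 arguments must be constants whose values are within range. Assuming "within range" means the value fits in the stated number of unsigned bits, the accepted values would be 0 to 3, 0 to 15 and 0 to 255 respectively. The helper below, `fits_unsigned_bits`, is a hypothetical illustration of that check and is not GCC's implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: returns true when `value` fits in an unsigned
   field that is `bits` bits wide (2, 4 or 8 for the MMA mask operands). */
static inline bool fits_unsigned_bits(unsigned long value, unsigned bits)
{
    /* Guard the shift below against an out-of-range width. */
    assert(bits < sizeof(unsigned long) * CHAR_BIT);
    return value < (1UL << bits);
}
```

Under that assumption, a uint4 argument of 15 passes the check while 16 does not, which is the kind of value the compiler is documented to reject.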