Diffstat (limited to 'gcc/doc/extend.texi')
| -rw-r--r-- | gcc/doc/extend.texi | 871 |
1 files changed, 758 insertions, 113 deletions
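Reader's note on the half-precision hunk in this diff: the range figures it quotes (normalized values from 2^-14 up to 65504 for the ieee format, and up to 131008 for the alternative format) follow from a 16-bit layout with an 11-bit significand. The sketch below reproduces that arithmetic in plain C. The helper names (pow2, half_max, half_min_normal, check_half_ranges) are illustrative assumptions and are not part of GCC or of this patch. It assumes a 5-bit exponent field with bias 15, where the alternative format reuses the top exponent code for ordinary values instead of infinities and NaNs.

```c
#include <assert.h>

/* 2^e for small integer exponents; avoids depending on libm. */
static double pow2(int e)
{
    double r = 1.0;
    for (; e > 0; --e)
        r *= 2.0;
    for (; e < 0; ++e)
        r /= 2.0;
    return r;
}

/* Largest finite value when the largest usable exponent is max_exp.
   The significand is 1.1111111111 in binary: 10 stored bits plus the
   implicit leading 1, which equals 2 - 2^-10. */
static double half_max(int max_exp)
{
    return (2.0 - pow2(-10)) * pow2(max_exp);
}

/* Smallest positive normalized value, shared by both formats. */
static double half_min_normal(void)
{
    return pow2(-14);
}

/* Compare the computed figures with the values quoted in the hunk. */
static void check_half_ranges(void)
{
    /* ieee: the top exponent code is reserved for infinities and NaNs. */
    assert(half_max(15) == 65504.0);
    /* alternative: the top exponent code holds ordinary values. */
    assert(half_max(16) == 131008.0);
    assert(half_min_normal() == 1.0 / 16384.0);
}
```

Calling check_half_ranges() from any test driver exercises all three figures. The "approximately 3 decimal digits" remark in the hunk corresponds to 11 x log10(2), which is about 3.3.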
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 8aaedae..5f36510 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -31,7 +31,6 @@ extensions, accepted by GCC in C90 mode and in C++. * Thread-Local:: Per-thread variables. * OpenMP:: Multiprocessing extensions. * OpenACC:: Extensions for offloading code to accelerator devices. -* _Countof:: The number of elements of arrays. * Inline:: Defining inline functions (as fast as macros). * Volatiles:: What constitutes an access to a volatile object. * Using Assembly Language with C:: Instructions and extensions for interfacing C with assembler. @@ -300,49 +299,19 @@ typedef _Complex float __attribute__((mode(IC))) _Complex_ibm128; @node Half-Precision @subsection Half-Precision Floating Point @cindex half-precision floating point -@cindex @code{__fp16} data type -@cindex @code{__Float16} data type +@cindex @code{_Float16} data type -On ARM and AArch64 targets, GCC supports half-precision (16-bit) floating -point via the @code{__fp16} type defined in the ARM C Language Extensions. -On ARM systems, you must enable this type explicitly with the -@option{-mfp16-format} command-line option in order to use it. -On x86 targets with SSE2 enabled, GCC supports half-precision (16-bit) -floating point via the @code{_Float16} type. For C++, x86 provides a builtin -type named @code{_Float16} which contains same data format as C. +GCC supports half-precision (16-bit) floating point on several targets. -ARM targets support two incompatible representations for half-precision -floating-point values. You must choose one of the representations and -use it consistently in your program. +It is recommended that portable code use the @code{_Float16} type defined +by ISO/IEC TS 18661-3:2015. @xref{Floating Types}. -Specifying @option{-mfp16-format=ieee} selects the IEEE 754-2008 format. -This format can represent normalized values in the range of @math{2^{-14}} to 65504. 
-There are 11 bits of significand precision, approximately 3 -decimal digits. +Some targets have peculiarities as follows. -Specifying @option{-mfp16-format=alternative} selects the ARM -alternative format. This representation is similar to the IEEE -format, but does not support infinities or NaNs. Instead, the range -of exponents is extended, so that this format can represent normalized -values in the range of @math{2^{-14}} to 131008. - -The GCC port for AArch64 only supports the IEEE 754-2008 format, and does -not require use of the @option{-mfp16-format} command-line option. - -The @code{__fp16} type may only be used as an argument to intrinsics defined -in @code{<arm_fp16.h>}, or as a storage format. For purposes of -arithmetic and other operations, @code{__fp16} values in C or C++ -expressions are automatically promoted to @code{float}. - -The ARM target provides hardware support for conversions between -@code{__fp16} and @code{float} values -as an extension to VFP and NEON (Advanced SIMD), and from ARMv8-A provides -hardware support for conversions between @code{__fp16} and @code{double} -values. GCC generates code using these hardware instructions if you -compile with options to select an FPU that provides them; -for example, @option{-mfpu=neon-fp16 -mfloat-abi=softfp}, -in addition to the @option{-mfp16-format} option to select -a half-precision format. +@cindex @code{__fp16} data type +On Arm and AArch64 targets, GCC supports half-precision (16-bit) +floating point via the @code{__fp16} type defined in the Arm +C-Language Extensions (ACLE). Language-level support for the @code{__fp16} data type is independent of whether GCC generates code using hardware floating-point @@ -350,17 +319,77 @@ instructions. In cases where hardware support is not specified, GCC implements conversions between @code{__fp16} and other types as library calls. -It is recommended that portable code use the @code{_Float16} type defined -by ISO/IEC TS 18661-3:2015. 
@xref{Floating Types}. +Arm targets support two mutually incompatible half-precision +floating-point formats: + +@itemize @bullet +@item +A format that implements IEEE 754-2008 16-bit floating point types, +enabled with the @option{-mfp16-format=ieee} command-line option; this +format can represent normalized values in the range of @math{2^{-14}} +to 65504. There are 11 bits of significand precision, approximately 3 +decimal digits. + +@item +An alternative format that sacrifices NaNs and infinity values, but +has a larger range of values that can be represented: @math{2^{-14}} +to 131008. This is enabled with the +@option{-mfp16-format=alternative} option. +@end itemize + +You must choose one of the formats and use it consistently in your +program. + +GCC only supports the @samp{alternative} format on implementations +that support it in hardware; there is no support for conversions to +and from this format using library functions. Furthermore, you cannot +link together code compiled with one format and code compiled for the +other. GCC also supports the @option{-mfp16-format=none} option, +which disables all support for half-precision floating-point types. +Code compiled with this option can be linked safely with code compiled +for either format. + +The Arm architecture extension @code{FEAT_FP16} (enabled, for example, +with @option{-march=armv8.2-a+fp16}, or +@option{-march=armv8.1-m.main+mve.fp}) defines data processing +instructions that only support the @samp{ieee} format. The compiler +rejects attempts to use the @samp{alternative} format when this +architecture extension is enabled. + +Note that the ACLE has deprecated use of the @samp{alternative} format +and recommends that only the @samp{ieee} format be used. + +The default is to compile with @option{-mfp16-format=ieee}. + +In C and C++ there are two related data types: +@itemize @bullet +@item + +@code{__fp16}, as defined by the Arm C-Language Extensions (ACLE). 
+This can be used to hold either format; + +@item +@code{_Float16}, which is defined by ISO/IEC TS 18661-3:2015. This is +only defined when the format selected is @samp{ieee}. +@end itemize + +The GCC port for AArch64 only supports the IEEE 754-2008 format, and +does not have the @option{-mfp16-format} command-line option. + + +On x86 targets with SSE2 enabled, GCC supports half-precision (16-bit) +floating point via the @code{_Float16} type. For C++, x86 provides a +builtin type named @code{_Float16} which contains same data format as C. + On x86 targets with SSE2 enabled, without @option{-mavx512fp16}, -all operations will be emulated by software emulation and the @code{float} +all operations are emulated by software emulation and the @code{float} instructions. The default behavior for @code{FLT_EVAL_METHOD} is to keep the intermediate result of the operation as 32-bit precision. This may lead to inconsistent behavior between software emulation and AVX512-FP16 instructions. -Using @option{-fexcess-precision=16} will force round back after each operation. +Using @option{-fexcess-precision=16} forces round back after each operation. -Using @option{-mavx512fp16} will generate AVX512-FP16 instructions instead of +Using @option{-mavx512fp16} generates AVX512-FP16 instructions instead of software emulation. The default behavior of @code{FLT_EVAL_METHOD} is to round after each operation. The same is true with @option{-fexcess-precision=standard} and @option{-mfpmath=sse}. If there is no @option{-mfpmath=sse}, @@ -1131,6 +1160,14 @@ such an initializer, as shown here: char **foo = (char *[]) @{ "x", "y", "z" @}; @end smallexample +As a GNU extension, GCC allows compound literals with a variable size. +In this case, only empty initialization is allowed. + +@smallexample +int n = 4; +char (*p)[n] = &(char[n])@{ @}; +@end smallexample + Compound literals for scalar types and union types are also allowed. 
In the following example the variable @code{i} is initialized to the value @code{2}, the result of incrementing the unnamed object created by @@ -3463,12 +3500,41 @@ Function Attributes}, @ref{PowerPC Function Attributes}, @ref{ARM Function Attributes}, @ref{AArch64 Function Attributes}, and @ref{S/390 Function Attributes} for details. +On targets supporting @code{target} function multiversioning (x86), when using +C++, you can declare multiple functions with the same signatures but different +@code{target} attribute values, and the correct version is chosen by the +dynamic linker. In the example below, two function versions are produced +with differing mangling. Additionally an ifunc resolver is created to +select the correct version to populate the @code{func} symbol. + +@smallexample +int func (void) __attribute__ ((target ("arch=core2"))) @{ return 1; @} +int func (void) __attribute__ ((target ("sse3"))) @{ return 2; @} +@end smallexample + +Declarations annotated with @code{target} cannot be used in combination with +declarations annotated with @code{target_clones} in a single multiversioned +function definition. + +@xref{Function Multiversioning} for more details. + +@cindex @code{target_version} function attribute +@item target_version (@var{option}) +On targets with @code{target_version} function multiversioning (AArch64 and +RISC-V) in C or C++, you can declare multiple functions with +@code{target_version} or @code{target_clones} attributes to define a function +version set. + +@xref{Function Multiversioning} for more details. + @cindex @code{target_clones} function attribute @item target_clones (@var{options}) The @code{target_clones} attribute is used to specify that a function be cloned into multiple versions compiled with different target options -than specified on the command line. The supported options and restrictions -are the same as for @code{target} attribute. +than specified on the command line. 
+ +For the x86 and PowerPC targets, the supported options and restrictions +are the same as for the @code{target} attribute. For instance, on an x86, you could compile a function with @code{target_clones("sse4.1,avx")}. GCC creates two function clones, @@ -3480,16 +3546,20 @@ function clones, one compiled with @option{-mcpu=power9} and another with the default options. GCC must be configured to use GLIBC 2.23 or newer in order to use the @code{target_clones} attribute. -It also creates a resolver function (see -the @code{ifunc} attribute above) that dynamically selects a clone -suitable for current architecture. The resolver is created only if there -is a usage of a function with @code{target_clones} attribute. +@code{target_clones} works similarly for targets that support the +@code{target_version} attribute (AArch64 and RISC-V). The attribute takes +multiple arguments, and generates a versioned clone for each. A function +annotated with @code{target_clones} is equivalent to the same function +duplicated for each valid version string in the argument, where each +version is instead annotated with @code{target_version}. This means that a +@code{target_clones} annotated function definition can be used in combination +with @code{target_version} annotated functions definitions and other +@code{target_clones} annotated function definitions. -Note that any subsequent call of a function without @code{target_clone} -from a @code{target_clone} caller will not lead to copying -(target clone) of the called function. -If you want to enforce such behavior, -we recommend declaring the calling function with the @code{flatten} attribute? +For these targets the supported options and restrictions are the same as for +the @code{target_version} attribute. + +@xref{Function Multiversioning} for more details. @cindex @code{unavailable} function attribute @item unavailable @@ -3930,6 +4000,27 @@ threads, such as the POSIX @code{swapcontext} function. 
This attribute adds a @code{BTI J} instruction when BTI is enabled e.g. via @option{-mbranch-protection}. +@cindex @code{preserve_none} function attribute, AArch64 +@item preserve_none +Use this attribute to change the procedure call standard of the specified +function to the preserve-none variant. + +The preserve-none ABI variant modifies the AAPCS such that it has no +callee-saved registers (including SIMD and floating-point registers). That is, +with the exception of the stack register, link register (r30), and frame pointer +(r29), all registers are changed to caller saved, and can be used as scratch +registers by the callee. + +Additionally, registers r20--r28, r0--r7, r10--r14, r9 and r15 are used for +argument passing, in that order. For Microsoft Windows targets +r15 is not used for argument passing. + +The return value registers remain r0 and r1 in both cases. + +All other details are the same as for the AAPCS ABI. + +This ABI has not been stabilized, and may be subject to change in future +versions. @end table The above target attributes can be specified as follows: @@ -4767,7 +4858,16 @@ Calls to @code{foo} are mapped to calls to @code{foo@{20040821@}}. @node LoongArch Function Attributes @subsubsection LoongArch Function Attributes -These function attributes are supported by the LoongArch end: +The following attributes are supported by LoongArch end: + +@table @code + +@cindex @code{target (option,...)} loongarch function attribute target +@item target (option,...) + +The following target-specific function attributes are available for the +LoongArch target. These options mirror the behavior of similar +command-line options (@pxref{LoongArch Options}), but on a per-function basis. @table @code @cindex @code{strict-align} function attribute, LoongArch @@ -4836,6 +4936,200 @@ But the following method cannot perform 128-bit vectorization. 
$ gcc test.c -o test.s -O2 -mlasx -mno-lasx @end smallexample +@cindex @code{recipe} function attribute, LoongArch +@item recipe +@itemx no-recipe +@code{recipe} indicates that frecipe.@{s/d@} and frsqrt.@{s/d@} instruction generation +is allowed (not allowed) when compiling the function. The behavior is the same as for +the command-line options +@option{-mrecipe} and @option{-mno-recipe}. + +@cindex @code{div32} function attribute, LoongArch +@item div32 +@itemx no-div32 +@code{div32} determines whether div.w[u] and mod.w[u] instructions on 64-bit machines +are evaluated based only on the lower 32 bits of the input registers. The behavior is the same as for the command-line options +@option{-mdiv32} and @option{-mno-div32}. + +@cindex @code{lam-bh} function attribute, LoongArch +@item lam-bh +@itemx no-lam-bh +@code{lam-bh} indicates that am@{swap/add@}[_db].@{b/h@} instruction generation +is allowed (not allowed) when compiling the function. The behavior is the same as for +the command-line options +@option{-mlam-bh} and @option{-mno-lam-bh}. + +@cindex @code{lamcas} function attribute, LoongArch +@item lamcas +@itemx no-lamcas +@code{lamcas} indicates that amcas[_db].@{b/h/w/d@} instruction generation +is allowed (not allowed) when compiling the function. The behavior is the same as for +the command-line options +@option{-mlamcas} and @option{-mno-lamcas}. + +@cindex @code{scq} function attribute, LoongArch +@item scq +@itemx no-scq +@code{scq} indicates that sc.q instruction generation is allowed (not allowed) when +compiling the function. The behavior is the same as for the command-line options +@option{-mscq} and @option{-mno-scq}. + +@cindex @code{ld-seq-sa} function attribute, LoongArch +@item ld-seq-sa +@itemx no-ld-seq-sa +@code{ld-seq-sa} indicates whether load-load barriers (dbar 0x700) are needed. The behavior is the same as for the command-line options +@option{-mld-seq-sa} and @option{-mno-ld-seq-sa}. + +@end table + +Multiple target function attributes can be specified by separating them with +a comma. 
For example: + +@smallexample +__attribute__((target("arch=la64v1.1,lasx"))) +int +foo (int a) +@{ + return a + 5; +@} +@end smallexample + +is valid and compiles function @code{foo} for LA64V1.1 with @code{lasx}. + +@subsubheading Inlining rules +Specifying target attributes on individual functions or performing link-time +optimization across translation units compiled with different target options +can affect function inlining rules: + +In particular, a caller function can inline a callee function only if the +architectural features available to the callee are a subset of the features +available to the caller. + +Note that when the callee function does not have the always_inline attribute, +it will not be inlined if the code model of the caller function is different +from the code model of the callee function. + +@cindex @code{target_clones (string,...)} loongarch function attribute target_clones +@item target_clones (string,...) + +Like attribute @code{target}, these options also reflect the behavior of +similar command line options. + +@code{string} can take the following values: + +@itemize @bullet +@item default +@item strict-align +@item arch= +@item lsx +@item lasx +@item frecipe +@item div32 +@item lam-bh +@item lamcas +@item scq +@item ld-seq-sa +@end itemize +You can set the priority of attributes in target_clones (except @code{default}). +For example: + +@smallexample +__attribute__((target_clones ("default","arch=la64v1.1","lsx;priority=1"))) +int +foo (int a) +@{ + return a + 5; +@} +@end smallexample + +The priority is from low to high: +@itemize @bullet +@item default +@item arch=loongarch64 +@item strict-align +@item frecipe = div32 = lam-bh = lamcas = scq = ld-seq-sa +@item lsx +@item arch=la64v1.0 +@item arch=la64v1.1 +@item lasx +@end itemize + +Note that the option values on the gcc command line are not considered when +calculating the priority. 
+ +If a priority is set for a feature in target_clones, then the priority of this +feature will be higher than @code{lasx}. + +For example: + +@smallexample +__attribute__((target_clones ("default","arch=la64v1.1","lsx;priority=1"))) +int +foo (int a) +@{ + return a + 5; +@} +@end smallexample + +In this test case, the priority of @code{lsx} is higher than that of +@code{arch=la64v1.1}. + +If the same priority is explicitly set for two features, the priority is still +calculated according to the priority list above. + +For example: + +@smallexample +__attribute__((target_clones ("default","arch=la64v1.1;priority=1","lsx;priority=1"))) +int +foo (int a) +@{ + return a + 5; +@} +@end smallexample + +In this test case, the priority of @code{arch=la64v1.1;priority=1} is higher +than that of @code{lsx;priority=1}. + +@cindex @code{target_version (string)} loongarch function attribute target_versions +@item target_version (string) +Support attributes and priorities are the same as @code{target_clones}. +Note that this attribute requires GLIBC2.38 and newer that support HWCAP. + +For example: + +@code{test1.C} +@smallexample +__attribute__((target_clones ("default","arch=la64v1.1","lsx;priority=1"))) +int +foo (int a) +@{ + return a + 5; +@} +@end smallexample + +@code{test2.C} +@smallexample +__attribute__((target_version ("default"))) +int +foo (int a) +@{ + return a + 5; +@} +__attribute__((target_version ("arch=la64v1.1"))) +int +foo (int a) +@{ + return a + 5; +@} +__attribute__((target_version ("lsx;priority=1"))) +int +foo (int a) +@{ + return a + 5; +@} +@end smallexample +The implementations of @code{test1.C} and @code{test2.C} are equivalent. @end table @node M32C Function Attributes @@ -5713,6 +6007,16 @@ Specifies the core for which to tune the performance of this function and also whose architectural features to use. The behavior and valid arguments are the same as for the @option{-mcpu=} command-line option. 
+@cindex @code{max-vectorization} function attribute, RISC-V +@item max-vectorization +@itemx no-max-vectorization +@code{max-vectorization} tells GCC's vectorizer to treat all vector +loops as being more profitable than the original scalar loops when +optimizing the current function. @code{no-max-vectorization} disables +this behavior. +This corresponds to the behavior of the command-line options +@option{-mmax-vectorization} and @option{-mno-max-vectorization}. + @end table The above target attributes can be specified as follows: @@ -10428,7 +10732,7 @@ for more information about the @code{target} attribute and the attribute syntax. The @code{#pragma GCC target} pragma is presently implemented for -x86, ARM, AArch64, PowerPC, and S/390 targets only. +x86, ARM, AArch64, PowerPC, RISC-V, and S/390 targets only. @cindex pragma GCC optimize @item #pragma GCC optimize (@var{string}, @dots{}) @@ -10849,36 +11153,6 @@ library. @xref{OpenMP and OpenACC Options}, for additional options useful with @option{-fopenacc}. -@node _Countof -@section Determining the Number of Elements of Arrays -@cindex _Countof -@cindex number of elements - -The keyword @code{_Countof} determines -the number of elements of an array operand. -Its syntax is similar to @code{sizeof}. -The operand must be -a parenthesized complete array type name -or an expression of such a type. -For example: - -@smallexample -int a[n]; -_Countof (a); // returns n -_Countof (int [7][3]); // returns 7 -@end smallexample - -The result of this operator is an integer constant expression, -unless the array has a variable number of elements. -The operand is only evaluated -if the array has a variable number of elements. 
-For example: - -@smallexample -_Countof (int [7][n++]); // integer constant expression -_Countof (int [n++][7]); // run-time value; n++ is evaluated -@end smallexample - @node Inline @section An Inline Function is As Fast As a Macro @cindex inline functions @@ -13300,6 +13574,8 @@ C and/or C++ standards, while others remain specific to GNU C. * Labels as Values:: Getting pointers to labels, and computed gotos. * Nested Functions:: Nested functions in GNU C. * Typeof:: @code{typeof}: referring to the type of an expression. +* _Countof:: Determining the number of elements of arrays +* _Maxof and _Minof:: The maximum and minimum representable values of a type. * Offsetof:: Special syntax for @code{offsetof}. * Alignment:: Determining the alignment of a function, type or variable. * Enum Extensions:: Forward declarations and specifying the underlying type. @@ -13936,6 +14212,55 @@ evaluated only once when using @code{__auto_type}, but twice if @code{typeof} is used. @end itemize +@node _Countof +@subsection Determining the Number of Elements of Arrays +@findex _Countof +@findex number of elements + +The keyword @code{_Countof} determines +the number of elements of an array operand. +Its syntax is similar to @code{sizeof}. +The operand must be +a parenthesized complete array type name +or an expression of such a type. +For example: + +@smallexample +int a[n]; +_Countof (a); // returns n +_Countof (int [7][3]); // returns 7 +@end smallexample + +The result of this operator is an integer constant expression, +unless the array has a variable number of elements. +The operand is only evaluated +if the array has a variable number of elements. 
+For example: + +@smallexample +_Countof (int [7][n++]); // integer constant expression +_Countof (int [n++][7]); // run-time value; n++ is evaluated +@end smallexample + +@node _Maxof and _Minof +@subsection The maximum and minimum representable values of a type +@findex _Maxof +@findex _Minof + +The keywords @code{_Maxof} and @code{_Minof} determine +the maximum and minimum representable values of an integer type. +Their syntax is similar to @code{sizeof}. +The operand must be +a parenthesized integer type. +The result of these operators is an integer constant expression +of the same type as the operand. +For example: + +@smallexample +_Maxof (int); // returns '(int) INT_MAX' +_Minof (short); // returns '(short) SHRT_MIN' +@end smallexample + @node Offsetof @subsection Support for @code{offsetof} @findex __builtin_offsetof @@ -18267,7 +18592,7 @@ instructions, but allow the compiler to schedule those calls. * Alpha Built-in Functions:: * ARC Built-in Functions:: * ARC SIMD Built-in Functions:: -* ARM C Language Extensions (ACLE):: +* Arm C Language Extensions (ACLE):: * ARM Floating Point Status and Control Intrinsics:: * ARM ARMv8-M Security Extensions:: * AVR Built-in Functions:: @@ -18882,26 +19207,22 @@ _v4hi __builtin_arc_vaddsub4h (__v4hi, __v4hi); _v4hi __builtin_arc_vsubadd4h (__v4hi, __v4hi); @end example -@node ARM C Language Extensions (ACLE) -@subsection ARM C Language Extensions (ACLE) +@node Arm C Language Extensions (ACLE) +@subsection Arm C Language Extensions (ACLE) -GCC implements extensions for C as described in the ARM C Language +GCC implements extensions for C and C++ as described in the Arm C Language Extensions (ACLE) specification, which can be found at -@uref{https://developer.arm.com/documentation/ihi0053/latest/}. - -As a part of ACLE, GCC implements extensions for Advanced SIMD as described in -the ARM C Language Extensions Specification. 
The complete list of Advanced SIMD -intrinsics can be found at -@uref{https://developer.arm.com/documentation/ihi0073/latest/}. -The built-in intrinsics for the Advanced SIMD extension are available when -NEON is enabled. - -Currently, ARM and AArch64 back ends do not support ACLE 2.0 fully. Both -back ends support CRC32 intrinsics and the ARM back end supports the -Coprocessor intrinsics, all from @file{arm_acle.h}. The ARM back end's 16-bit -floating-point Advanced SIMD intrinsics currently comply to ACLE v1.1. -AArch64's back end does not have support for 16-bit floating point Advanced SIMD -intrinsics yet. +@uref{https://arm-software.github.io/acle/main/}. + +As a part of ACLE, GCC implements extensions for Arm Vector extensions +as described in the Arm C Language Extensions Specification. The complete +list of Arm Vector extension intrinsics is available at +@uref{https://arm-software.github.io/acle/main/}. +The built-in intrinsics for the Arm vector extensions are available when +the respective extensions are enabled. + +Not all aspects of ACLE are supported. Support for each feature of the ACLE +is determined with the @code{__ARM_FEATURE_@var{X}} macros. See @ref{ARM Options} and @ref{AArch64 Options} for more information on the availability of extensions. @@ -19693,7 +20014,16 @@ into the data cache. The instruction is issued in slot I1@. These built-in functions are available for LoongArch. 
-Data Type Description: +@menu +* Data Types:: +* Directly-mapped Builtin Functions:: +* Directly-mapped Division Builtin Functions:: +* Other Builtin Functions:: +@end menu + +@node Data Types +@subsubsection Data Types + @itemize @item @code{imm0_31}, a compile-time constant in range 0 to 31; @item @code{imm0_16383}, a compile-time constant in range 0 to 16383; @@ -19701,6 +20031,9 @@ Data Type Description: @item @code{imm_n2048_2047}, a compile-time constant in range -2048 to 2047; @end itemize +@node Directly-mapped Builtin Functions +@subsubsection Directly-mapped Builtin Functions + The intrinsics provided are listed below: @smallexample unsigned int __builtin_loongarch_movfcsr2gr (imm0_31) @@ -19824,6 +20157,9 @@ function you need to include @code{larchintrin.h}. void __break (imm0_32767) @end smallexample +@node Directly-mapped Division Builtin Functions +@subsubsection Directly-mapped Division Builtin Functions + These intrinsic functions are available by including @code{larchintrin.h} and using @option{-mfrecipe}. @smallexample @@ -19833,6 +20169,9 @@ using @option{-mfrecipe}. double __frsqrte_d (double); @end smallexample +@node Other Builtin Functions +@subsubsection Other Builtin Functions + Additional built-in functions are available for LoongArch family processors to efficiently use 128-bit floating-point (__float128) values. @@ -19859,6 +20198,15 @@ GCC provides intrinsics to access the LSX (Loongson SIMD Extension) instructions The interface is made available by including @code{<lsxintrin.h>} and using @option{-mlsx}. +@menu +* SX Data Types:: +* Directly-mapped SX Builtin Functions:: +* Directly-mapped SX Division Builtin Functions:: +@end menu + +@node SX Data Types +@subsubsection SX Data Types + The following vectors typedefs are included in @code{lsxintrin.h}: @itemize @@ -19886,6 +20234,9 @@ input/output values manipulated: @item @code{imm_n2048_2047}, an integer literal in range -2048 to 2047. 
@end itemize +@node Directly-mapped SX Builtin Functions +@subsubsection Directly-mapped SX Builtin Functions + For convenience, GCC defines functions @code{__lsx_vrepli_@{b/h/w/d@}} and @code{__lsx_b[n]z_@{v/b/h/w/d@}}, which are implemented as follows: @@ -20669,6 +21020,9 @@ __m128i __lsx_vxori_b (__m128i, imm0_255); __m128i __lsx_vxor_v (__m128i, __m128i); @end smallexample +@node Directly-mapped SX Division Builtin Functions +@subsubsection Directly-mapped SX Division Builtin Functions + These intrinsic functions are available by including @code{lsxintrin.h} and using @option{-mfrecipe} and @option{-mlsx}. @smallexample @@ -20685,6 +21039,16 @@ GCC provides intrinsics to access the LASX (Loongson Advanced SIMD Extension) instructions. The interface is made available by including @code{<lasxintrin.h>} and using @option{-mlasx}. +@menu +* ASX Data Types:: +* Directly-mapped ASX Builtin Functions:: +* Directly-mapped ASX Division Builtin Functions:: +* Directly-mapped SX and ASX Conversion Builtin Functions:: +@end menu + +@node ASX Data Types +@subsubsection ASX Data Types + The following vectors typedefs are included in @code{lasxintrin.h}: @itemize @@ -20713,6 +21077,9 @@ input/output values manipulated: @item @code{imm_n2048_2047}, an integer literal in range -2048 to 2047. @end itemize +@node Directly-mapped ASX Builtin Functions +@subsubsection Directly-mapped ASX Builtin Functions + For convenience, GCC defines functions @code{__lasx_xvrepli_@{b/h/w/d@}} and @code{__lasx_b[n]z_@{v/b/h/w/d@}}, which are implemented as follows: @@ -21517,6 +21884,9 @@ __m256i __lasx_xvxori_b (__m256i, imm0_255); __m256i __lasx_xvxor_v (__m256i, __m256i); @end smallexample +@node Directly-mapped ASX Division Builtin Functions +@subsubsection Directly-mapped ASX Division Builtin Functions + These intrinsic functions are available by including @code{lasxintrin.h} and using @option{-mfrecipe} and @option{-mlasx}. 
@smallexample @@ -21526,6 +21896,213 @@ __m256d __lasx_xvfrsqrte_d (__m256d); __m256 __lasx_xvfrsqrte_s (__m256); @end smallexample +@node Directly-mapped SX and ASX Conversion Builtin Functions +@subsubsection Directly-mapped SX and ASX Conversion Builtin Functions + +For convenience, the @code{lsxintrin.h} file was imported into @code{ +lasxintrin.h} and 18 new interface functions for 128 and 256 vector +conversions were added, using the @option{-mlasx} option. +@smallexample +__m256 __lasx_cast_128_s (__m128); +__m256d __lasx_cast_128_d (__m128d); +__m256i __lasx_cast_128 (__m128i); +__m256 __lasx_concat_128_s (__m128, __m128); +__m256d __lasx_concat_128_d (__m128d, __m128d); +__m256i __lasx_concat_128 (__m128i, __m128i); +__m128 __lasx_extract_128_lo_s (__m256); +__m128 __lasx_extract_128_hi_s (__m256); +__m128d __lasx_extract_128_lo_d (__m256d); +__m128d __lasx_extract_128_hi_d (__m256d); +__m128i __lasx_extract_128_lo (__m256i); +__m128i __lasx_extract_128_hi (__m256i); +__m256 __lasx_insert_128_lo_s (__m256, __m128); +__m256 __lasx_insert_128_hi_s (__m256, __m128); +__m256d __lasx_insert_128_lo_d (__m256d, __m128d); +__m256d __lasx_insert_128_hi_d (__m256d, __m128d); +__m256i __lasx_insert_128_lo (__m256i, __m128i); +__m256i __lasx_insert_128_hi (__m256i, __m128i); +@end smallexample + +When gcc does not support interfaces for 128 and 256 conversions, +use the following code for equivalent substitution. 
+ +@smallexample + + #ifndef __loongarch_asx_sx_conv + + #include <lasxintrin.h> + #include <lsxintrin.h> + __m256 inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) + __lasx_cast_128_s (__m128 src) + @{ + __m256 dest; + asm ("" : "=f"(dest) : "0"(src)); + return dest; + @} + + __m256d inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) + __lasx_cast_128_d (__m128d src) + @{ + __m256d dest; + asm ("" : "=f"(dest) : "0"(src)); + return dest; + @} + + __m256i inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) + __lasx_cast_128 (__m128i src) + @{ + __m256i dest; + asm ("" : "=f"(dest) : "0"(src)); + return dest; + @} + + __m256 inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) + __lasx_concat_128_s (__m128 src1, __m128 src2) + @{ + __m256 dest; + asm ("xvpermi.q %u0,%u2,0x02\n" + : "=f"(dest) + : "0"(src1), "f"(src2)); + return dest; + @} + + __m256d inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) + __lasx_concat_128_d (__m128d src1, __m128d src2) + @{ + __m256d dest; + asm ("xvpermi.q %u0,%u2,0x02\n" + : "=f"(dest) + : "0"(src1), "f"(src2)); + return dest; + @} + + __m256i inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) + __lasx_concat_128 (__m128i src1, __m128i src2) + @{ + __m256i dest; + asm ("xvpermi.q %u0,%u2,0x02\n" + : "=f"(dest) + : "0"(src1), "f"(src2)); + return dest; + @} + + __m128 inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) + __lasx_extract_128_lo_s (__m256 src) + @{ + __m128 dest; + asm ("" : "=f"(dest) : "0"(src)); + return dest; + @} + + __m128d inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) + __lasx_extract_128_lo_d (__m256d src) + @{ + __m128d dest; + asm ("" : "=f"(dest) : "0"(src)); + return dest; + @} + + __m128i inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) + __lasx_extract_128_lo (__m256i src) + @{ + __m128i dest; + asm 
("" : "=f"(dest) : "0"(src)); + return dest; + @} + + __m128 inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) + __lasx_extract_128_hi_s (__m256 src) + @{ + __m128 dest; + asm ("xvpermi.d %u0,%u1,0xe\n" + : "=f"(dest) + : "f"(src)); + return dest; + @} + + __m128d inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) + __lasx_extract_128_hi_d (__m256d src) + @{ + __m128d dest; + asm ("xvpermi.d %u0,%u1,0xe\n" + : "=f"(dest) + : "f"(src)); + return dest; + @} + + __m128i inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) + __lasx_extract_128_hi (__m256i src) + @{ + __m128i dest; + asm ("xvpermi.d %u0,%u1,0xe\n" + : "=f"(dest) + : "f"(src)); + return dest; + @} + + __m256 inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) + __lasx_insert_128_lo_s (__m256 src1, __m128 src2) + @{ + __m256 dest; + asm ("xvpermi.q %u0,%u2,0x30\n" + : "=f"(dest) + : "0"(src1), "f"(src2)); + return dest; + @} + + __m256d inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) + __lasx_insert_128_lo_d (__m256d src1, __m128d src2) + @{ + __m256d dest; + asm ("xvpermi.q %u0,%u2,0x30\n" + : "=f"(dest) + : "0"(src1), "f"(src2)); + return dest; + @} + + __m256i inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) + __lasx_insert_128_lo (__m256i src1, __m128i src2) + @{ + __m256i dest; + asm ("xvpermi.q %u0,%u2,0x30\n" + : "=f"(dest) + : "0"(src1), "f"(src2)); + return dest; + @} + + __m256 inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) + __lasx_insert_128_hi_s (__m256 src1, __m128 src2) + @{ + __m256 dest; + asm ("xvpermi.q %u0,%u2,0x02\n" + : "=f"(dest) + : "0"(src1), "f"(src2)); + return dest; + @} + + __m256d inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) + __lasx_insert_128_hi_d (__m256d src1, __m128d src2) + @{ + __m256d dest; + asm ("xvpermi.q %u0,%u2,0x02\n" + : "=f"(dest) + : "0"(src1), "f"(src2)); + return dest; 
+ @} + + __m256i inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) + __lasx_insert_128_hi (__m256i src1, __m128i src2) + @{ + __m256i dest; + asm ("xvpermi.q %u0,%u2,0x02\n" + : "=f"(dest) + : "0"(src1), "f"(src2)); + return dest; + @} + #endif + +@end smallexample + @node MIPS DSP Built-in Functions @subsection MIPS DSP Built-in Functions @@ -30674,11 +31251,79 @@ For the effects of the @code{hot} attribute on functions, see @section Function Multiversioning @cindex function versions -With the GNU C++ front end, for x86 targets, you may specify multiple -versions of a function, where each function is specialized for a -specific target feature. At runtime, the appropriate version of the -function is automatically executed depending on the characteristics of -the execution platform. Here is an example. +Function multiversioning is a mechanism that enables compiling multiple +versions of a function, each specialized for different combinations of +architecture extensions. Additionally, the compiler generates a resolver that +the dynamic linker uses to detect architecture support and choose the +appropriate version at runtime. + +Function multiversioning relies on the indirect function extension to the ELF +standard, and therefore Binutils version 2.20.1 or higher and GNU C Library +version 2.11.1 are required to use this feature. + +There are two versions of function multiversioning supported by GCC. + +For targets supporting the @code{target_version} attribute (AArch64 and RISC-V), +when compiling for C or C++, a function version set can be defined by a +combination of function definitions with @code{target_version} and +@code{target_clones} attributes, across translation units. + +For example: + +@smallexample +// fmv.h: +int foo (); +int foo [[gnu::target_clones("sve", "sve2")]] (); +int foo [[gnu::target_version("dotprod;priority=1")]] (); + +// fmv1.cc +#include "fmv.h" + +int foo () +@{ + // The default version of foo. 
+ return 0; +@} + +// fmv2.cc: +#include "fmv.h" + +int foo [[gnu::target_clones("sve", "sve2")]] () +@{ + // foo versions for sve and sve2 + return 1; +@} + +int foo [[gnu::target_version("dotprod")]] () +@{ + // foo version for dotprod extension + return 2; +@} + +// main.cc +#include <cassert> +#include "fmv.h" + +int main () +@{ + int (*p)() = &foo; + assert ((*p) () == foo ()); + return 0; +@} +@end smallexample + +This example results in four versions of the @code{foo} function being generated +(default, sve, sve2 and dotprod), and a resolver which is used by the dynamic +linker to choose the correct version. + +For the AArch64 target, GCC implements function multiversioning with the +semantics and version strings specified in the +@ref{Arm C Language Extensions (ACLE)}. + +For targets that support multiversioning with the @code{target} attribute +(x86), a multiversioned function can be defined either with multiple function +definitions with the @code{target} attribute (in C++) within a translation unit, +or with a single definition with the @code{target_clones} attribute. + +Here is an example. @smallexample __attribute__ ((target ("default"))) |
