Diffstat (limited to 'gcc/doc/extend.texi')
-rw-r--r-- gcc/doc/extend.texi 871
1 files changed, 758 insertions, 113 deletions
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 8aaedae..5f36510 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -31,7 +31,6 @@ extensions, accepted by GCC in C90 mode and in C++.
* Thread-Local:: Per-thread variables.
* OpenMP:: Multiprocessing extensions.
* OpenACC:: Extensions for offloading code to accelerator devices.
-* _Countof:: The number of elements of arrays.
* Inline:: Defining inline functions (as fast as macros).
* Volatiles:: What constitutes an access to a volatile object.
* Using Assembly Language with C:: Instructions and extensions for interfacing C with assembler.
@@ -300,49 +299,19 @@ typedef _Complex float __attribute__((mode(IC))) _Complex_ibm128;
@node Half-Precision
@subsection Half-Precision Floating Point
@cindex half-precision floating point
-@cindex @code{__fp16} data type
-@cindex @code{__Float16} data type
+@cindex @code{_Float16} data type
-On ARM and AArch64 targets, GCC supports half-precision (16-bit) floating
-point via the @code{__fp16} type defined in the ARM C Language Extensions.
-On ARM systems, you must enable this type explicitly with the
-@option{-mfp16-format} command-line option in order to use it.
-On x86 targets with SSE2 enabled, GCC supports half-precision (16-bit)
-floating point via the @code{_Float16} type. For C++, x86 provides a builtin
-type named @code{_Float16} which contains same data format as C.
+GCC supports half-precision (16-bit) floating point on several targets.
-ARM targets support two incompatible representations for half-precision
-floating-point values. You must choose one of the representations and
-use it consistently in your program.
+It is recommended that portable code use the @code{_Float16} type defined
+by ISO/IEC TS 18661-3:2015. @xref{Floating Types}.
-Specifying @option{-mfp16-format=ieee} selects the IEEE 754-2008 format.
-This format can represent normalized values in the range of @math{2^{-14}} to 65504.
-There are 11 bits of significand precision, approximately 3
-decimal digits.
+Some targets have peculiarities as follows.
-Specifying @option{-mfp16-format=alternative} selects the ARM
-alternative format. This representation is similar to the IEEE
-format, but does not support infinities or NaNs. Instead, the range
-of exponents is extended, so that this format can represent normalized
-values in the range of @math{2^{-14}} to 131008.
-
-The GCC port for AArch64 only supports the IEEE 754-2008 format, and does
-not require use of the @option{-mfp16-format} command-line option.
-
-The @code{__fp16} type may only be used as an argument to intrinsics defined
-in @code{<arm_fp16.h>}, or as a storage format. For purposes of
-arithmetic and other operations, @code{__fp16} values in C or C++
-expressions are automatically promoted to @code{float}.
-
-The ARM target provides hardware support for conversions between
-@code{__fp16} and @code{float} values
-as an extension to VFP and NEON (Advanced SIMD), and from ARMv8-A provides
-hardware support for conversions between @code{__fp16} and @code{double}
-values. GCC generates code using these hardware instructions if you
-compile with options to select an FPU that provides them;
-for example, @option{-mfpu=neon-fp16 -mfloat-abi=softfp},
-in addition to the @option{-mfp16-format} option to select
-a half-precision format.
+@cindex @code{__fp16} data type
+On Arm and AArch64 targets, GCC supports half-precision (16-bit)
+floating point via the @code{__fp16} type defined in the Arm
+C-Language Extensions (ACLE).
Language-level support for the @code{__fp16} data type is
independent of whether GCC generates code using hardware floating-point
@@ -350,17 +319,77 @@ instructions. In cases where hardware support is not specified, GCC
implements conversions between @code{__fp16} and other types as library
calls.
-It is recommended that portable code use the @code{_Float16} type defined
-by ISO/IEC TS 18661-3:2015. @xref{Floating Types}.
+Arm targets support two mutually incompatible half-precision
+floating-point formats:
+
+@itemize @bullet
+@item
+A format that implements IEEE 754-2008 16-bit floating point types,
+enabled with the @option{-mfp16-format=ieee} command-line option; this
+format can represent normalized values in the range of @math{2^{-14}}
+to 65504. There are 11 bits of significand precision, approximately 3
+decimal digits.
+
+@item
+An alternative format that sacrifices NaNs and infinity values, but
+has a larger range of values that can be represented: @math{2^{-14}}
+to 131008. This is enabled with the
+@option{-mfp16-format=alternative} option.
+@end itemize
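The two ranges quoted in the list above follow from the bit layout. The sketch below assumes a 10-bit stored fraction (11 bits of significand precision) and a largest unbiased exponent of 15 for the @samp{ieee} format and 16 for the @samp{alternative} format. The helper names @code{pow2}, @code{fp16_max_normal} and @code{fp16_min_normal} are hypothetical and exist only for this illustration.

```c
/* Exact power of two as a double, for small exponents.  */
double
pow2 (int e)
{
  double r = 1.0;
  for (; e > 0; e--)
    r *= 2.0;
  for (; e < 0; e++)
    r /= 2.0;
  return r;
}

/* Largest finite value of a 16-bit format with a 10-bit fraction:
   the significand 1.1111111111b equals 2 - 2^-10.  */
double
fp16_max_normal (int max_exponent)
{
  return (2.0 - pow2 (-10)) * pow2 (max_exponent);
}

/* Smallest positive normalized value, 2^-14 in both formats.  */
double
fp16_min_normal (void)
{
  return pow2 (-14);
}
```

With these assumptions, @code{fp16_max_normal (15)} gives 65504 and @code{fp16_max_normal (16)} gives 131008. The extra exponent value is the encoding that the @samp{ieee} format reserves for infinities and NaNs.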
+
+You must choose one of the formats and use it consistently in your
+program.
+
+GCC only supports the @samp{alternative} format on implementations
+that support it in hardware; there is no support for conversions to
+and from this format using library functions. Furthermore, you cannot
+link together code compiled with one format and code compiled for the
+other. GCC also supports the @option{-mfp16-format=none} option,
+which disables all support for half-precision floating-point types.
+Code compiled with this option can be linked safely with code compiled
+for either format.
+
+The Arm architecture extension @code{FEAT_FP16} (enabled, for example,
+with @option{-march=armv8.2-a+fp16}, or
+@option{-march=armv8.1-m.main+mve.fp}) defines data processing
+instructions that only support the @samp{ieee} format. The compiler
+rejects attempts to use the @samp{alternative} format when this
+architecture extension is enabled.
+
+Note that the ACLE has deprecated use of the @samp{alternative} format
+and recommends that only the @samp{ieee} format be used.
+
+The default is to compile with @option{-mfp16-format=ieee}.
+
+In C and C++ there are two related data types:
+@itemize @bullet
+@item
+
+@code{__fp16}, as defined by the Arm C-Language Extensions (ACLE).
+This can be used to hold either format;
+
+@item
+@code{_Float16}, which is defined by ISO/IEC TS 18661-3:2015. This is
+only defined when the format selected is @samp{ieee}.
+@end itemize
+
+The GCC port for AArch64 only supports the IEEE 754-2008 format, and
+does not have the @option{-mfp16-format} command-line option.
+
+
+On x86 targets with SSE2 enabled, GCC supports half-precision (16-bit)
+floating point via the @code{_Float16} type. For C++, x86 provides a
+builtin type named @code{_Float16} which has the same data format as in C.
+
On x86 targets with SSE2 enabled, without @option{-mavx512fp16},
-all operations will be emulated by software emulation and the @code{float}
+all operations are emulated in software, using @code{float}
instructions. The default behavior for @code{FLT_EVAL_METHOD} is to keep the
intermediate result of the operation as 32-bit precision. This may lead to
inconsistent behavior between software emulation and AVX512-FP16 instructions.
-Using @option{-fexcess-precision=16} will force round back after each operation.
+Using @option{-fexcess-precision=16} forces rounding back to 16-bit precision after each operation.
-Using @option{-mavx512fp16} will generate AVX512-FP16 instructions instead of
+Using @option{-mavx512fp16} generates AVX512-FP16 instructions instead of
software emulation. The default behavior of @code{FLT_EVAL_METHOD} is to round
after each operation. The same is true with @option{-fexcess-precision=standard}
and @option{-mfpmath=sse}. If there is no @option{-mfpmath=sse},
@@ -1131,6 +1160,14 @@ such an initializer, as shown here:
char **foo = (char *[]) @{ "x", "y", "z" @};
@end smallexample
+As a GNU extension, GCC allows compound literals with a variable size.
+In this case, only empty initialization is allowed.
+
+@smallexample
+int n = 4;
+char (*p)[n] = &(char[n])@{ @};
+@end smallexample
+
Compound literals for scalar types and union types are also allowed. In
the following example the variable @code{i} is initialized to the value
@code{2}, the result of incrementing the unnamed object created by
@@ -3463,12 +3500,41 @@ Function Attributes}, @ref{PowerPC Function Attributes},
@ref{ARM Function Attributes}, @ref{AArch64 Function Attributes},
and @ref{S/390 Function Attributes} for details.
+On targets supporting @code{target} function multiversioning (x86), when using
+C++, you can declare multiple functions with the same signatures but different
+@code{target} attribute values, and the correct version is chosen by the
+dynamic linker. In the example below, two function versions are produced
+with differing mangling. Additionally, an ifunc resolver is created to
+select the correct version to populate the @code{func} symbol.
+
+@smallexample
+int func (void) __attribute__ ((target ("arch=core2"))) @{ return 1; @}
+int func (void) __attribute__ ((target ("sse3"))) @{ return 2; @}
+@end smallexample
+
+Declarations annotated with @code{target} cannot be used in combination with
+declarations annotated with @code{target_clones} in a single multiversioned
+function definition.
+
+@xref{Function Multiversioning}, for more details.
+
+@cindex @code{target_version} function attribute
+@item target_version (@var{option})
+On targets with @code{target_version} function multiversioning (AArch64 and
+RISC-V) in C or C++, you can declare multiple functions with
+@code{target_version} or @code{target_clones} attributes to define a function
+version set.
+
+@xref{Function Multiversioning}, for more details.
+
@cindex @code{target_clones} function attribute
@item target_clones (@var{options})
The @code{target_clones} attribute is used to specify that a function
be cloned into multiple versions compiled with different target options
-than specified on the command line. The supported options and restrictions
-are the same as for @code{target} attribute.
+than specified on the command line.
+
+For the x86 and PowerPC targets, the supported options and restrictions
+are the same as for the @code{target} attribute.
For instance, on an x86, you could compile a function with
@code{target_clones("sse4.1,avx")}. GCC creates two function clones,
@@ -3480,16 +3546,20 @@ function clones, one compiled with @option{-mcpu=power9} and another
with the default options. GCC must be configured to use GLIBC 2.23 or
newer in order to use the @code{target_clones} attribute.
-It also creates a resolver function (see
-the @code{ifunc} attribute above) that dynamically selects a clone
-suitable for current architecture. The resolver is created only if there
-is a usage of a function with @code{target_clones} attribute.
+@code{target_clones} works similarly for targets that support the
+@code{target_version} attribute (AArch64 and RISC-V). The attribute takes
+multiple arguments, and generates a versioned clone for each. A function
+annotated with @code{target_clones} is equivalent to the same function
+duplicated for each valid version string in the argument, where each
+version is instead annotated with @code{target_version}. This means that a
+@code{target_clones} annotated function definition can be used in combination
+with @code{target_version} annotated function definitions and other
+@code{target_clones} annotated function definitions.
-Note that any subsequent call of a function without @code{target_clone}
-from a @code{target_clone} caller will not lead to copying
-(target clone) of the called function.
-If you want to enforce such behavior,
-we recommend declaring the calling function with the @code{flatten} attribute?
+For these targets the supported options and restrictions are the same as for
+the @code{target_version} attribute.
+
+@xref{Function Multiversioning}, for more details.
@cindex @code{unavailable} function attribute
@item unavailable
@@ -3930,6 +4000,27 @@ threads, such as the POSIX @code{swapcontext} function. This attribute
adds a @code{BTI J} instruction when BTI is enabled e.g. via
@option{-mbranch-protection}.
+@cindex @code{preserve_none} function attribute, AArch64
+@item preserve_none
+Use this attribute to change the procedure call standard of the specified
+function to the preserve-none variant.
+
+The preserve-none ABI variant modifies the AAPCS such that it has no
+callee-saved registers (including SIMD and floating-point registers). That is,
+with the exception of the stack register, link register (r30), and frame pointer
+(r29), all registers are changed to caller saved, and can be used as scratch
+registers by the callee.
+
+Additionally, registers r20--r28, r0--r7, r10--r14, r9 and r15 are used for
+argument passing, in that order. For Microsoft Windows targets
+r15 is not used for argument passing.
+
+The return value registers remain r0 and r1 in both cases.
+
+All other details are the same as for the AAPCS ABI.
+
+This ABI has not been stabilized, and may be subject to change in future
+versions.
@end table
The above target attributes can be specified as follows:
@@ -4767,7 +4858,16 @@ Calls to @code{foo} are mapped to calls to @code{foo@{20040821@}}.
@node LoongArch Function Attributes
@subsubsection LoongArch Function Attributes
-These function attributes are supported by the LoongArch end:
+The following attributes are supported by the LoongArch back end:
+
+@table @code
+
+@cindex @code{target (option,...)} loongarch function attribute target
+@item target (option,...)
+
+The following target-specific function attributes are available for the
+LoongArch target. These options mirror the behavior of similar
+command-line options (@pxref{LoongArch Options}), but on a per-function basis.
@table @code
@cindex @code{strict-align} function attribute, LoongArch
@@ -4836,6 +4936,200 @@ But the following method cannot perform 128-bit vectorization.
$ gcc test.c -o test.s -O2 -mlasx -mno-lasx
@end smallexample
+@cindex @code{recipe} function attribute, LoongArch
+@item recipe
+@itemx no-recipe
+@code{recipe} indicates that frecipe.@{s/d@} and frsqrte.@{s/d@} instruction generation
+is allowed (not allowed) when compiling the function. The behavior is the same as for
+the command-line options
+@option{-mfrecipe} and @option{-mno-frecipe}.
+
+@cindex @code{div32} function attribute, LoongArch
+@item div32
+@itemx no-div32
+@code{div32} determines whether div.w[u] and mod.w[u] instructions on 64-bit machines
+are evaluated based only on the lower 32 bits of the input registers. The behavior
+is the same as for the command-line options @option{-mdiv32} and @option{-mno-div32}.
+
+@cindex @code{lam-bh} function attribute, LoongArch
+@item lam-bh
+@itemx no-lam-bh
+@code{lam-bh} indicates that am@{swap/add@}[_db].@{b/h@} instruction generation
+is allowed (not allowed) when compiling the function. The behavior is the same as for
+the command-line options
+@option{-mlam-bh} and @option{-mno-lam-bh}.
+
+@cindex @code{lamcas} function attribute, LoongArch
+@item lamcas
+@itemx no-lamcas
+@code{lamcas} indicates that amcas[_db].@{b/h/w/d@} instruction generation
+is allowed (not allowed) when compiling the function. The behavior is the same as for
+the command-line options
+@option{-mlamcas} and @option{-mno-lamcas}.
+
+@cindex @code{scq} function attribute, LoongArch
+@item scq
+@itemx no-scq
+@code{scq} indicates that sc.q instruction generation is allowed (not allowed) when
+compiling the function. The behavior is the same as for the command-line options
+@option{-mscq} and @option{-mno-scq}.
+
+@cindex @code{ld-seq-sa} function attribute, LoongArch
+@item ld-seq-sa
+@itemx no-ld-seq-sa
+@code{ld-seq-sa} determines whether load-load barriers (dbar 0x700) are needed. The behavior
+is the same as for the command-line options @option{-mld-seq-sa} and @option{-mno-ld-seq-sa}.
+
+@end table
+
+Multiple target function attributes can be specified by separating them with
+a comma. For example:
+
+@smallexample
+__attribute__((target("arch=la64v1.1,lasx")))
+int
+foo (int a)
+@{
+ return a + 5;
+@}
+@end smallexample
+
+is valid and compiles function @code{foo} for LA64V1.1 with @code{lasx}.
+
+@subsubheading Inlining rules
+Specifying target attributes on individual functions or performing link-time
+optimization across translation units compiled with different target options
+can affect function inlining rules:
+
+In particular, a caller function can inline a callee function only if the
+architectural features available to the callee are a subset of the features
+available to the caller.
+
+Note that when the callee function does not have the @code{always_inline} attribute,
+it is not inlined if the code model of the caller function is different
+from the code model of the callee function.
+
+@cindex @code{target_clones (string,...)} loongarch function attribute target_clones
+@item target_clones (string,...)
+
+Like attribute @code{target}, these options also reflect the behavior of
+similar command line options.
+
+@code{string} can take the following values:
+
+@itemize @bullet
+@item default
+@item strict-align
+@item arch=
+@item lsx
+@item lasx
+@item frecipe
+@item div32
+@item lam-bh
+@item lamcas
+@item scq
+@item ld-seq-sa
+@end itemize
+You can set the priority of attributes in @code{target_clones} (except @code{default}).
+For example:
+
+@smallexample
+__attribute__((target_clones ("default","arch=la64v1.1","lsx;priority=1")))
+int
+foo (int a)
+@{
+ return a + 5;
+@}
+@end smallexample
+
+The default priority order, from low to high, is:
+@itemize @bullet
+@item default
+@item arch=loongarch64
+@item strict-align
+@item frecipe = div32 = lam-bh = lamcas = scq = ld-seq-sa
+@item lsx
+@item arch=la64v1.0
+@item arch=la64v1.1
+@item lasx
+@end itemize
+
+Note that the option values on the GCC command line are not considered when
+calculating the priority.
+
+If a priority is set for a feature in @code{target_clones}, then the priority of this
+feature is higher than @code{lasx}.
+
+For example:
+
+@smallexample
+__attribute__((target_clones ("default","arch=la64v1.1","lsx;priority=1")))
+int
+foo (int a)
+@{
+ return a + 5;
+@}
+@end smallexample
+
+In this test case, the priority of @code{lsx} is higher than that of
+@code{arch=la64v1.1}.
+
+If the same priority is explicitly set for two features, the priority is still
+calculated according to the priority list above.
+
+For example:
+
+@smallexample
+__attribute__((target_clones ("default","arch=la64v1.1;priority=1","lsx;priority=1")))
+int
+foo (int a)
+@{
+ return a + 5;
+@}
+@end smallexample
+
+In this test case, the priority of @code{arch=la64v1.1;priority=1} is higher
+than that of @code{lsx;priority=1}.
+
+@cindex @code{target_version (string)} loongarch function attribute target_versions
+@item target_version (string)
+The supported attributes and priorities are the same as for @code{target_clones}.
+Note that this attribute requires GLIBC 2.38 or newer, with HWCAP support.
+
+For example:
+
+@code{test1.C}
+@smallexample
+__attribute__((target_clones ("default","arch=la64v1.1","lsx;priority=1")))
+int
+foo (int a)
+@{
+ return a + 5;
+@}
+@end smallexample
+
+@code{test2.C}
+@smallexample
+__attribute__((target_version ("default")))
+int
+foo (int a)
+@{
+ return a + 5;
+@}
+__attribute__((target_version ("arch=la64v1.1")))
+int
+foo (int a)
+@{
+ return a + 5;
+@}
+__attribute__((target_version ("lsx;priority=1")))
+int
+foo (int a)
+@{
+ return a + 5;
+@}
+@end smallexample
+The implementations of @code{test1.C} and @code{test2.C} are equivalent.
@end table
@node M32C Function Attributes
@@ -5713,6 +6007,16 @@ Specifies the core for which to tune the performance of this function and also
whose architectural features to use. The behavior and valid arguments are the
same as for the @option{-mcpu=} command-line option.
+@cindex @code{max-vectorization} function attribute, RISC-V
+@item max-vectorization
+@itemx no-max-vectorization
+@code{max-vectorization} tells GCC's vectorizer to treat all vector
+loops as being more profitable than the original scalar loops when
+optimizing the current function. @code{no-max-vectorization} disables
+this behavior.
+This corresponds to the behavior of the command-line options
+@option{-mmax-vectorization} and @option{-mno-max-vectorization}.
+
@end table
The above target attributes can be specified as follows:
@@ -10428,7 +10732,7 @@ for more information about the @code{target} attribute and the attribute
syntax.
The @code{#pragma GCC target} pragma is presently implemented for
-x86, ARM, AArch64, PowerPC, and S/390 targets only.
+x86, ARM, AArch64, PowerPC, RISC-V, and S/390 targets only.
@cindex pragma GCC optimize
@item #pragma GCC optimize (@var{string}, @dots{})
@@ -10849,36 +11153,6 @@ library.
@xref{OpenMP and OpenACC Options}, for additional options useful with
@option{-fopenacc}.
-@node _Countof
-@section Determining the Number of Elements of Arrays
-@cindex _Countof
-@cindex number of elements
-
-The keyword @code{_Countof} determines
-the number of elements of an array operand.
-Its syntax is similar to @code{sizeof}.
-The operand must be
-a parenthesized complete array type name
-or an expression of such a type.
-For example:
-
-@smallexample
-int a[n];
-_Countof (a); // returns n
-_Countof (int [7][3]); // returns 7
-@end smallexample
-
-The result of this operator is an integer constant expression,
-unless the array has a variable number of elements.
-The operand is only evaluated
-if the array has a variable number of elements.
-For example:
-
-@smallexample
-_Countof (int [7][n++]); // integer constant expression
-_Countof (int [n++][7]); // run-time value; n++ is evaluated
-@end smallexample
-
@node Inline
@section An Inline Function is As Fast As a Macro
@cindex inline functions
@@ -13300,6 +13574,8 @@ C and/or C++ standards, while others remain specific to GNU C.
* Labels as Values:: Getting pointers to labels, and computed gotos.
* Nested Functions:: Nested functions in GNU C.
* Typeof:: @code{typeof}: referring to the type of an expression.
+* _Countof:: Determining the number of elements of arrays.
+* _Maxof and _Minof:: The maximum and minimum representable values of a type.
* Offsetof:: Special syntax for @code{offsetof}.
* Alignment:: Determining the alignment of a function, type or variable.
* Enum Extensions:: Forward declarations and specifying the underlying type.
@@ -13936,6 +14212,55 @@ evaluated only once when using @code{__auto_type}, but twice if
@code{typeof} is used.
@end itemize
+@node _Countof
+@subsection Determining the Number of Elements of Arrays
+@findex _Countof
+@findex number of elements
+
+The keyword @code{_Countof} determines
+the number of elements of an array operand.
+Its syntax is similar to @code{sizeof}.
+The operand must be
+a parenthesized complete array type name
+or an expression of such a type.
+For example:
+
+@smallexample
+int a[n];
+_Countof (a); // returns n
+_Countof (int [7][3]); // returns 7
+@end smallexample
+
+The result of this operator is an integer constant expression,
+unless the array has a variable number of elements.
+The operand is only evaluated
+if the array has a variable number of elements.
+For example:
+
+@smallexample
+_Countof (int [7][n++]); // integer constant expression
+_Countof (int [n++][7]); // run-time value; n++ is evaluated
+@end smallexample
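For compilers that do not yet provide @code{_Countof}, the counts in the examples above can be approximated with the classic @code{sizeof} idiom. This is only a sketch: @code{ARRAY_COUNT}, @code{count_rows} and @code{count_vla} are hypothetical names, not part of GCC, and unlike @code{_Countof} the macro silently accepts pointer operands and then yields a meaningless result.

```c
#include <stddef.h>

/* Hypothetical helper, not provided by GCC: number of elements of the
   outermost array dimension.  The operand must be an array, not a
   pointer; the macro cannot diagnose the difference.  */
#define ARRAY_COUNT(a) (sizeof (a) / sizeof ((a)[0]))

/* Counts the outer dimension of a fixed-size two-dimensional array.  */
size_t
count_rows (void)
{
  int grid[7][3];
  return ARRAY_COUNT (grid);
}

/* Counts the elements of a variable-length array; n must be nonzero.
   Here sizeof is evaluated at run time.  */
size_t
count_vla (size_t n)
{
  int a[n];
  return ARRAY_COUNT (a);
}
```

Under these assumptions @code{count_rows ()} yields 7 and @code{count_vla (n)} yields @code{n}, matching the values described for @code{_Countof} above.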
+
+@node _Maxof and _Minof
+@subsection The Maximum and Minimum Representable Values of a Type
+@findex _Maxof
+@findex _Minof
+
+The keywords @code{_Maxof} and @code{_Minof} determine
+the maximum and minimum representable values of an integer type.
+Their syntax is similar to @code{sizeof}.
+The operand must be
+a parenthesized integer type.
+The result of these operators is an integer constant expression
+of the same type as the operand.
+For example:
+
+@smallexample
+_Maxof (int); // returns '(int) INT_MAX'
+_Minof (short); // returns '(short) SHRT_MIN'
+@end smallexample
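Where these keywords are not available, comparable constant expressions can be written in standard C. The sketch below is an illustration, not how GCC implements the operators: @code{IS_SIGNED}, @code{MAXOF} and @code{MINOF} are hypothetical macros, and they assume two's complement integer types without padding bits.

```c
#include <limits.h>

/* Hypothetical helpers, not provided by GCC.  T must be an integer
   type name other than _Bool.  */
#define IS_SIGNED(T) ((T) -1 < (T) 0)

/* Signed case: build 2^(N-1) - 1 without overflowing, where N is the
   width of T.  Unsigned case: all bits set.  */
#define MAXOF(T)                                                    \
  ((T) (IS_SIGNED (T)                                               \
        ? ((((T) 1 << (sizeof (T) * CHAR_BIT - 2)) - 1) * 2 + 1)    \
        : ~(T) 0))

/* Signed case: -MAXOF - 1.  Unsigned case: zero.  */
#define MINOF(T) ((T) (IS_SIGNED (T) ? -MAXOF (T) - 1 : 0))

/* Both macros expand to integer constant expressions.  */
_Static_assert (MAXOF (int) == INT_MAX, "MAXOF (int) is INT_MAX");
_Static_assert (MINOF (short) == SHRT_MIN, "MINOF (short) is SHRT_MIN");
```

As with the keywords described above, the result has the type of the operand because of the outer cast.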
+
@node Offsetof
@subsection Support for @code{offsetof}
@findex __builtin_offsetof
@@ -18267,7 +18592,7 @@ instructions, but allow the compiler to schedule those calls.
* Alpha Built-in Functions::
* ARC Built-in Functions::
* ARC SIMD Built-in Functions::
-* ARM C Language Extensions (ACLE)::
+* Arm C Language Extensions (ACLE)::
* ARM Floating Point Status and Control Intrinsics::
* ARM ARMv8-M Security Extensions::
* AVR Built-in Functions::
@@ -18882,26 +19207,22 @@ _v4hi __builtin_arc_vaddsub4h (__v4hi, __v4hi);
_v4hi __builtin_arc_vsubadd4h (__v4hi, __v4hi);
@end example
-@node ARM C Language Extensions (ACLE)
-@subsection ARM C Language Extensions (ACLE)
+@node Arm C Language Extensions (ACLE)
+@subsection Arm C Language Extensions (ACLE)
-GCC implements extensions for C as described in the ARM C Language
+GCC implements extensions for C and C++ as described in the Arm C Language
Extensions (ACLE) specification, which can be found at
-@uref{https://developer.arm.com/documentation/ihi0053/latest/}.
-
-As a part of ACLE, GCC implements extensions for Advanced SIMD as described in
-the ARM C Language Extensions Specification. The complete list of Advanced SIMD
-intrinsics can be found at
-@uref{https://developer.arm.com/documentation/ihi0073/latest/}.
-The built-in intrinsics for the Advanced SIMD extension are available when
-NEON is enabled.
-
-Currently, ARM and AArch64 back ends do not support ACLE 2.0 fully. Both
-back ends support CRC32 intrinsics and the ARM back end supports the
-Coprocessor intrinsics, all from @file{arm_acle.h}. The ARM back end's 16-bit
-floating-point Advanced SIMD intrinsics currently comply to ACLE v1.1.
-AArch64's back end does not have support for 16-bit floating point Advanced SIMD
-intrinsics yet.
+@uref{https://arm-software.github.io/acle/main/}.
+
+As a part of ACLE, GCC implements intrinsics for the Arm vector extensions
+as described in the Arm C Language Extensions Specification. The complete
+list of Arm vector extension intrinsics is available at
+@uref{https://arm-software.github.io/acle/main/}.
+The built-in intrinsics for the Arm vector extensions are available when
+the respective extensions are enabled.
+
+Not all aspects of ACLE are supported. Support for each feature of the ACLE
+is determined with the @code{__ARM_FEATURE_@var{X}} macros.
See @ref{ARM Options} and @ref{AArch64 Options} for more information on the
availability of extensions.
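A feature test with one of these macros might look like the sketch below. @code{__ARM_FEATURE_CRC32} is the ACLE macro for the CRC32 intrinsics; the function name @code{have_acle_crc32} is hypothetical and used only for illustration.

```c
/* Returns 1 when the compiler advertises the ACLE CRC32 intrinsics for
   the current target and options, and 0 otherwise.  On non-Arm targets
   the macro is not defined, so the function returns 0.  */
int
have_acle_crc32 (void)
{
#if defined (__ARM_FEATURE_CRC32)
  return 1;
#else
  return 0;
#endif
}
```

Code that calls the intrinsics themselves would normally place the calls and the inclusion of @file{arm_acle.h} inside the same preprocessor guard.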
@@ -19693,7 +20014,16 @@ into the data cache. The instruction is issued in slot I1@.
These built-in functions are available for LoongArch.
-Data Type Description:
+@menu
+* Data Types::
+* Directly-mapped Builtin Functions::
+* Directly-mapped Division Builtin Functions::
+* Other Builtin Functions::
+@end menu
+
+@node Data Types
+@subsubsection Data Types
+
@itemize
@item @code{imm0_31}, a compile-time constant in range 0 to 31;
@item @code{imm0_16383}, a compile-time constant in range 0 to 16383;
@@ -19701,6 +20031,9 @@ Data Type Description:
@item @code{imm_n2048_2047}, a compile-time constant in range -2048 to 2047;
@end itemize
+@node Directly-mapped Builtin Functions
+@subsubsection Directly-mapped Builtin Functions
+
The intrinsics provided are listed below:
@smallexample
unsigned int __builtin_loongarch_movfcsr2gr (imm0_31)
@@ -19824,6 +20157,9 @@ function you need to include @code{larchintrin.h}.
void __break (imm0_32767)
@end smallexample
+@node Directly-mapped Division Builtin Functions
+@subsubsection Directly-mapped Division Builtin Functions
+
These intrinsic functions are available by including @code{larchintrin.h} and
using @option{-mfrecipe}.
@smallexample
@@ -19833,6 +20169,9 @@ using @option{-mfrecipe}.
double __frsqrte_d (double);
@end smallexample
+@node Other Builtin Functions
+@subsubsection Other Builtin Functions
+
Additional built-in functions are available for LoongArch family
processors to efficiently use 128-bit floating-point (__float128)
values.
@@ -19859,6 +20198,15 @@ GCC provides intrinsics to access the LSX (Loongson SIMD Extension) instructions
The interface is made available by including @code{<lsxintrin.h>} and using
@option{-mlsx}.
+@menu
+* SX Data Types::
+* Directly-mapped SX Builtin Functions::
+* Directly-mapped SX Division Builtin Functions::
+@end menu
+
+@node SX Data Types
+@subsubsection SX Data Types
+
The following vectors typedefs are included in @code{lsxintrin.h}:
@itemize
@@ -19886,6 +20234,9 @@ input/output values manipulated:
@item @code{imm_n2048_2047}, an integer literal in range -2048 to 2047.
@end itemize
+@node Directly-mapped SX Builtin Functions
+@subsubsection Directly-mapped SX Builtin Functions
+
For convenience, GCC defines functions @code{__lsx_vrepli_@{b/h/w/d@}} and
@code{__lsx_b[n]z_@{v/b/h/w/d@}}, which are implemented as follows:
@@ -20669,6 +21020,9 @@ __m128i __lsx_vxori_b (__m128i, imm0_255);
__m128i __lsx_vxor_v (__m128i, __m128i);
@end smallexample
+@node Directly-mapped SX Division Builtin Functions
+@subsubsection Directly-mapped SX Division Builtin Functions
+
These intrinsic functions are available by including @code{lsxintrin.h} and
using @option{-mfrecipe} and @option{-mlsx}.
@smallexample
@@ -20685,6 +21039,16 @@ GCC provides intrinsics to access the LASX (Loongson Advanced SIMD Extension)
instructions. The interface is made available by including @code{<lasxintrin.h>}
and using @option{-mlasx}.
+@menu
+* ASX Data Types::
+* Directly-mapped ASX Builtin Functions::
+* Directly-mapped ASX Division Builtin Functions::
+* Directly-mapped SX and ASX Conversion Builtin Functions::
+@end menu
+
+@node ASX Data Types
+@subsubsection ASX Data Types
+
The following vectors typedefs are included in @code{lasxintrin.h}:
@itemize
@@ -20713,6 +21077,9 @@ input/output values manipulated:
@item @code{imm_n2048_2047}, an integer literal in range -2048 to 2047.
@end itemize
+@node Directly-mapped ASX Builtin Functions
+@subsubsection Directly-mapped ASX Builtin Functions
+
For convenience, GCC defines functions @code{__lasx_xvrepli_@{b/h/w/d@}} and
@code{__lasx_b[n]z_@{v/b/h/w/d@}}, which are implemented as follows:
@@ -21517,6 +21884,9 @@ __m256i __lasx_xvxori_b (__m256i, imm0_255);
__m256i __lasx_xvxor_v (__m256i, __m256i);
@end smallexample
+@node Directly-mapped ASX Division Builtin Functions
+@subsubsection Directly-mapped ASX Division Builtin Functions
+
These intrinsic functions are available by including @code{lasxintrin.h} and
using @option{-mfrecipe} and @option{-mlasx}.
@smallexample
@@ -21526,6 +21896,213 @@ __m256d __lasx_xvfrsqrte_d (__m256d);
__m256 __lasx_xvfrsqrte_s (__m256);
@end smallexample
+@node Directly-mapped SX and ASX Conversion Builtin Functions
+@subsubsection Directly-mapped SX and ASX Conversion Builtin Functions
+
+For convenience, @code{lasxintrin.h} includes the @code{lsxintrin.h} file
+and provides 18 new interface functions for conversions between 128-bit and
+256-bit vectors. They are available when using the @option{-mlasx} option.
+@smallexample
+__m256 __lasx_cast_128_s (__m128);
+__m256d __lasx_cast_128_d (__m128d);
+__m256i __lasx_cast_128 (__m128i);
+__m256 __lasx_concat_128_s (__m128, __m128);
+__m256d __lasx_concat_128_d (__m128d, __m128d);
+__m256i __lasx_concat_128 (__m128i, __m128i);
+__m128 __lasx_extract_128_lo_s (__m256);
+__m128 __lasx_extract_128_hi_s (__m256);
+__m128d __lasx_extract_128_lo_d (__m256d);
+__m128d __lasx_extract_128_hi_d (__m256d);
+__m128i __lasx_extract_128_lo (__m256i);
+__m128i __lasx_extract_128_hi (__m256i);
+__m256 __lasx_insert_128_lo_s (__m256, __m128);
+__m256 __lasx_insert_128_hi_s (__m256, __m128);
+__m256d __lasx_insert_128_lo_d (__m256d, __m128d);
+__m256d __lasx_insert_128_hi_d (__m256d, __m128d);
+__m256i __lasx_insert_128_lo (__m256i, __m128i);
+__m256i __lasx_insert_128_hi (__m256i, __m128i);
+@end smallexample
+
+When GCC does not support the interfaces for 128-bit and 256-bit conversions,
+the following code can be used as an equivalent substitute.
+
+@smallexample
+
+ #ifndef __loongarch_asx_sx_conv
+
+ #include <lasxintrin.h>
+ #include <lsxintrin.h>
+ __m256 inline __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+ __lasx_cast_128_s (__m128 src)
+ @{
+ __m256 dest;
+ asm ("" : "=f"(dest) : "0"(src));
+ return dest;
+ @}
+
+ __m256d inline __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+ __lasx_cast_128_d (__m128d src)
+ @{
+ __m256d dest;
+ asm ("" : "=f"(dest) : "0"(src));
+ return dest;
+ @}
+
+ __m256i inline __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+ __lasx_cast_128 (__m128i src)
+ @{
+ __m256i dest;
+ asm ("" : "=f"(dest) : "0"(src));
+ return dest;
+ @}
+
+ __m256 inline __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+ __lasx_concat_128_s (__m128 src1, __m128 src2)
+ @{
+ __m256 dest;
+ asm ("xvpermi.q %u0,%u2,0x02\n"
+ : "=f"(dest)
+ : "0"(src1), "f"(src2));
+ return dest;
+ @}
+
+ __m256d inline __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+ __lasx_concat_128_d (__m128d src1, __m128d src2)
+ @{
+ __m256d dest;
+ asm ("xvpermi.q %u0,%u2,0x02\n"
+ : "=f"(dest)
+ : "0"(src1), "f"(src2));
+ return dest;
+ @}
+
+ __m256i inline __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+ __lasx_concat_128 (__m128i src1, __m128i src2)
+ @{
+ __m256i dest;
+ asm ("xvpermi.q %u0,%u2,0x02\n"
+ : "=f"(dest)
+ : "0"(src1), "f"(src2));
+ return dest;
+ @}
+
+ __m128 inline __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+ __lasx_extract_128_lo_s (__m256 src)
+ @{
+ __m128 dest;
+ asm ("" : "=f"(dest) : "0"(src));
+ return dest;
+ @}
+
+ __m128d inline __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+ __lasx_extract_128_lo_d (__m256d src)
+ @{
+ __m128d dest;
+ asm ("" : "=f"(dest) : "0"(src));
+ return dest;
+ @}
+
+ __m128i inline __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+ __lasx_extract_128_lo (__m256i src)
+ @{
+ __m128i dest;
+ asm ("" : "=f"(dest) : "0"(src));
+ return dest;
+ @}
+
+ __m128 inline __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+ __lasx_extract_128_hi_s (__m256 src)
+ @{
+ __m128 dest;
+ asm ("xvpermi.d %u0,%u1,0xe\n"
+ : "=f"(dest)
+ : "f"(src));
+ return dest;
+ @}
+
+ __m128d inline __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+ __lasx_extract_128_hi_d (__m256d src)
+ @{
+ __m128d dest;
+ asm ("xvpermi.d %u0,%u1,0xe\n"
+ : "=f"(dest)
+ : "f"(src));
+ return dest;
+ @}
+
+ __m128i inline __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+ __lasx_extract_128_hi (__m256i src)
+ @{
+ __m128i dest;
+ asm ("xvpermi.d %u0,%u1,0xe\n"
+ : "=f"(dest)
+ : "f"(src));
+ return dest;
+ @}
+
+ __m256 inline __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+ __lasx_insert_128_lo_s (__m256 src1, __m128 src2)
+ @{
+ __m256 dest;
+ asm ("xvpermi.q %u0,%u2,0x30\n"
+ : "=f"(dest)
+ : "0"(src1), "f"(src2));
+ return dest;
+ @}
+
+ __m256d inline __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+ __lasx_insert_128_lo_d (__m256d src1, __m128d src2)
+ @{
+ __m256d dest;
+ asm ("xvpermi.q %u0,%u2,0x30\n"
+ : "=f"(dest)
+ : "0"(src1), "f"(src2));
+ return dest;
+ @}
+
+ __m256i inline __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+ __lasx_insert_128_lo (__m256i src1, __m128i src2)
+ @{
+ __m256i dest;
+ asm ("xvpermi.q %u0,%u2,0x30\n"
+ : "=f"(dest)
+ : "0"(src1), "f"(src2));
+ return dest;
+ @}
+
+ __m256 inline __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+ __lasx_insert_128_hi_s (__m256 src1, __m128 src2)
+ @{
+ __m256 dest;
+ asm ("xvpermi.q %u0,%u2,0x02\n"
+ : "=f"(dest)
+ : "0"(src1), "f"(src2));
+ return dest;
+ @}
+
+ __m256d inline __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+ __lasx_insert_128_hi_d (__m256d src1, __m128d src2)
+ @{
+ __m256d dest;
+ asm ("xvpermi.q %u0,%u2,0x02\n"
+ : "=f"(dest)
+ : "0"(src1), "f"(src2));
+ return dest;
+ @}
+
+ __m256i inline __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+ __lasx_insert_128_hi (__m256i src1, __m128i src2)
+ @{
+ __m256i dest;
+ asm ("xvpermi.q %u0,%u2,0x02\n"
+ : "=f"(dest)
+ : "0"(src1), "f"(src2));
+ return dest;
+ @}
+ #endif
+
+@end smallexample
+
@node MIPS DSP Built-in Functions
@subsection MIPS DSP Built-in Functions
@@ -30674,11 +31251,79 @@ For the effects of the @code{hot} attribute on functions, see
@section Function Multiversioning
@cindex function versions
-With the GNU C++ front end, for x86 targets, you may specify multiple
-versions of a function, where each function is specialized for a
-specific target feature. At runtime, the appropriate version of the
-function is automatically executed depending on the characteristics of
-the execution platform. Here is an example.
+Function multiversioning is a mechanism that enables compiling multiple
+versions of a function, each specialized for different combinations of
+architecture extensions. Additionally, the compiler generates a resolver that
+the dynamic linker uses to detect architecture support and choose the
+appropriate version at runtime.
+
+Function multiversioning relies on the indirect function extension to the ELF
+standard, and therefore requires Binutils version 2.20.1 or higher and
+GNU C Library version 2.11.1 or higher.
+
+GCC supports two forms of function multiversioning.
+
+For targets supporting the @code{target_version} attribute (AArch64 and RISC-V),
+when compiling for C or C++, a function version set can be defined by a
+combination of function definitions with @code{target_version} and
+@code{target_clones} attributes, across translation units.
+
+For example:
+
+@smallexample
+// fmv.h:
+int foo ();
+int foo [[gnu::target_clones("sve", "sve2")]] ();
+int foo [[gnu::target_version("dotprod;priority=1")]] ();
+
+// fmv1.cc:
+#include "fmv.h"
+
+int foo ()
+@{
+ // The default version of foo.
+ return 0;
+@}
+
+// fmv2.cc:
+#include "fmv.h"
+
+int foo [[gnu::target_clones("sve", "sve2")]] ()
+@{
+ // foo versions for sve and sve2
+ return 1;
+@}
+
+int foo [[gnu::target_version("dotprod")]] ()
+@{
+ // foo version for dotprod extension
+ return 2;
+@}
+
+// main.cc:
+#include <cassert>
+#include "fmv.h"
+
+int main ()
+@{
+ int (*p)() = &foo;
+ assert ((*p) () == foo ());
+ return 0;
+@}
+@end smallexample
+
+This example results in four versions of the @code{foo} function being
+generated, along with a resolver that the dynamic linker uses to choose the
+correct version.
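The role of the resolver can be illustrated with a portable C emulation. This is a conceptual sketch only, not the code GCC generates: the @code{cpu_features} structure, the @code{resolve_foo} function, the per-version function names and the order in which features are tested are all hypothetical, chosen to mirror the four versions of @code{foo} in the example above.

```c
#include <stdbool.h>

/* Hypothetical record of the features detected at load time. */
struct cpu_features { bool sve; bool sve2; bool dotprod; };

typedef int (*foo_fn)(void);

/* One function per version; names are illustrative, not GCC's mangling. */
int foo_default(void) { return 0; }
int foo_sve(void)     { return 1; }
int foo_sve2(void)    { return 1; }
int foo_dotprod(void) { return 2; }

/* Emulated resolver: returns the version to bind foo to.  The test
   order below is an assumption made for this sketch.  */
foo_fn resolve_foo(const struct cpu_features *cpu)
{
    if (cpu->sve2)
        return foo_sve2;
    if (cpu->sve)
        return foo_sve;
    if (cpu->dotprod)
        return foo_dotprod;
    return foo_default;   /* Always available as the fallback. */
}
```

In this emulation the caller invokes @code{resolve_foo} once and keeps the returned pointer. With real function multiversioning the selection is made when the symbol is bound, so direct calls and calls through a function pointer reach the same version, which is what the assertion in @code{main} above checks.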
+
+For the AArch64 target, GCC implements function multiversioning with the
+semantics and version strings specified in the
+@ref{Arm C Language Extensions (ACLE)}.
+
+For targets that support multiversioning with the @code{target} attribute
+(x86), a multiversioned function can be defined either with multiple function
+definitions using the @code{target} attribute (in C++) within a translation
+unit, or with a single definition using the @code{target_clones} attribute.
+
+Here is an example.
@smallexample
__attribute__ ((target ("default")))