author | Sandra Loosemore <sandra@codesourcery.com> | 2015-01-31 21:11:30 -0500 |
---|---|---|
committer | Sandra Loosemore <sandra@gcc.gnu.org> | 2015-01-31 21:11:30 -0500 |
commit | b4fbcb1bf2f569af3e57e91132f3573f37ad3800 (patch) | |
tree | bae709a7cfaad39f410356107f39eff5748c18e4 | |
parent | 0353c564debb7e8ab17e53bb92127d8e1d6fe010 (diff) | |
md.texi (Machine Constraints): Alphabetize table by target.
2015-01-31 Sandra Loosemore <sandra@codesourcery.com>
gcc/
* doc/md.texi (Machine Constraints): Alphabetize table by target.
* doc/extend.texi (x86 Variable Attributes): Move section to
correct alphabetization after renaming.
(x86 Type Attributes): Likewise.
(Target Builtins): Re-alphabetize menu.
(x86 Built-in Functions): Move section to correct alphabetization
after renaming.
(x86 transactional memory intrinsics): Likewise.
* doc/invoke.texi (Option Summary): Re-alphabetize x86 Options
and x86 Windows Options in table and menu.
(x86 Options): Move section to correct alphabetization after
renaming.
(x86 Windows Options): Likewise.
From-SVN: r220315
-rw-r--r-- | gcc/ChangeLog | 16 |
-rw-r--r-- | gcc/doc/extend.texi | 3034 |
-rw-r--r-- | gcc/doc/invoke.texi | 2514 |
-rw-r--r-- | gcc/doc/md.texi | 1411 |
4 files changed, 3496 insertions, 3479 deletions
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 7c06f05..0618d83 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,21 @@
 2015-01-31 Sandra Loosemore <sandra@codesourcery.com>
 
+	* doc/md.texi (Machine Constraints): Alphabetize table by target.
+	* doc/extend.texi (x86 Variable Attributes): Move section to
+	correct alphabetization after renaming.
+	(x86 Type Attributes): Likewise.
+	(Target Builtins): Re-alphabetize menu.
+	(x86 Built-in Functions): Move section to correct alphabetization
+	after renaming.
+	(x86 transactional memory intrinsics): Likewise.
+	* doc/invoke.texi (Option Summary): Re-alphabetize x86 Options
+	and x86 Windows Options in table and menu.
+	(x86 Options): Move section to correct alphabetization after
+	renaming.
+	(x86 Windows Options): Likewise.
+
+2015-01-31 Sandra Loosemore <sandra@codesourcery.com>
+
 	* doc/extend.texi: Use "x86", "x86-32", and "x86-64" as the preferred
 	names of the architecture and its 32- and 64-bit variants.
 
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 681812e..1806850 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -5521,6 +5521,23 @@ int cpu_clock __attribute__((cb(0x123)));
 
 @end table
 
+@subsection PowerPC Variable Attributes
+
+Three attributes currently are defined for PowerPC configurations:
+@code{altivec}, @code{ms_struct} and @code{gcc_struct}.
+
+For full documentation of the struct attributes please see the
+documentation in @ref{x86 Variable Attributes}.
+
+For documentation of @code{altivec} attribute please see the
+documentation in @ref{PowerPC Type Attributes}.
+
+@subsection SPU Variable Attributes
+
+The SPU supports the @code{spu_vector} attribute for variables. For
+documentation of this attribute please see the documentation in
+@ref{SPU Type Attributes}.
+
 @anchor{x86 Variable Attributes}
 @subsection x86 Variable Attributes
 
@@ -5659,23 +5676,6 @@ Here, @code{t5} takes up 2 bytes.
 @end enumerate
 @end table
 
-@subsection PowerPC Variable Attributes
-
-Three attributes currently are defined for PowerPC configurations:
-@code{altivec}, @code{ms_struct} and @code{gcc_struct}.
-
-For full documentation of the struct attributes please see the
-documentation in @ref{x86 Variable Attributes}.
-
-For documentation of @code{altivec} attribute please see the
-documentation in @ref{PowerPC Type Attributes}.
-
-@subsection SPU Variable Attributes
-
-The SPU supports the @code{spu_vector} attribute for variables. For
-documentation of this attribute please see the documentation in
-@ref{SPU Type Attributes}.
-
 @subsection Xstormy16 Variable Attributes
 
 One attribute is currently defined for xstormy16 configurations:
@@ -6078,30 +6078,6 @@ Specifically, the @code{based}, @code{tiny}, @code{near}, and
 @code{far} attributes may be applied to either. The @code{io} and
 @code{cb} attributes may not be applied to types.
 
-@anchor{x86 Type Attributes}
-@subsection x86 Type Attributes
-
-Two attributes are currently defined for x86 configurations:
-@code{ms_struct} and @code{gcc_struct}.
-
-@table @code
-
-@item ms_struct
-@itemx gcc_struct
-@cindex @code{ms_struct}
-@cindex @code{gcc_struct}
-
-If @code{packed} is used on a structure, or if bit-fields are used
-it may be that the Microsoft ABI packs them differently
-than GCC normally packs them. Particularly when moving packed
-data between functions compiled with GCC and the native Microsoft compiler
-(either via function call or as data in a file), it may be necessary to access
-either format.
-
-Currently @option{-m[no-]ms-bitfields} is provided for the Microsoft Windows x86
-compilers to match the native Microsoft compiler.
-@end table
-
 @anchor{PowerPC Type Attributes}
 @subsection PowerPC Type Attributes
 
@@ -6134,6 +6110,30 @@ allows one to declare vector data types supported by the Sony/Toshiba/IBM SPU
 Language Extensions Specification. It is intended to support the
 @code{__vector} keyword.
 
+@anchor{x86 Type Attributes}
+@subsection x86 Type Attributes
+
+Two attributes are currently defined for x86 configurations:
+@code{ms_struct} and @code{gcc_struct}.
+
+@table @code
+
+@item ms_struct
+@itemx gcc_struct
+@cindex @code{ms_struct}
+@cindex @code{gcc_struct}
+
+If @code{packed} is used on a structure, or if bit-fields are used
+it may be that the Microsoft ABI packs them differently
+than GCC normally packs them. Particularly when moving packed
+data between functions compiled with GCC and the native Microsoft compiler
+(either via function call or as data in a file), it may be necessary to access
+either format.
+
+Currently @option{-m[no-]ms-bitfields} is provided for the Microsoft Windows x86
+compilers to match the native Microsoft compiler.
+@end table
+
 @node Alignment
 @section Inquiring on Alignment of Types or Variables
 @cindex alignment
@@ -10113,8 +10113,6 @@ instructions, but allow the compiler to schedule those calls.
 * AVR Built-in Functions::
 * Blackfin Built-in Functions::
 * FR-V Built-in Functions::
-* x86 Built-in Functions::
-* x86 transactional memory intrinsics::
 * MIPS DSP Built-in Functions::
 * MIPS Paired-Single Support::
 * MIPS Loongson Built-in Functions::
@@ -10133,6 +10131,8 @@ instructions, but allow the compiler to schedule those calls.
 * TI C6X Built-in Functions::
 * TILE-Gx Built-in Functions::
 * TILEPro Built-in Functions::
+* x86 Built-in Functions::
+* x86 transactional memory intrinsics::
 @end menu
 
 @node AArch64 Built-in Functions
@@ -11484,1480 +11484,6 @@ Use the @code{nldub} instruction to load the contents of address @var{x}
 into the data cache. The instruction is issued in slot I1@.
 @end table
 
-@node x86 Built-in Functions
-@subsection x86 Built-in Functions
-
-These built-in functions are available for the x86-32 and x86-64 family
-of computers, depending on the command-line switches used.
- -If you specify command-line switches such as @option{-msse}, -the compiler could use the extended instruction sets even if the built-ins -are not used explicitly in the program. For this reason, applications -that perform run-time CPU detection must compile separate files for each -supported architecture, using the appropriate flags. In particular, -the file containing the CPU detection code should be compiled without -these options. - -The following machine modes are available for use with MMX built-in functions -(@pxref{Vector Extensions}): @code{V2SI} for a vector of two 32-bit integers, -@code{V4HI} for a vector of four 16-bit integers, and @code{V8QI} for a -vector of eight 8-bit integers. Some of the built-in functions operate on -MMX registers as a whole 64-bit entity, these use @code{V1DI} as their mode. - -If 3DNow!@: extensions are enabled, @code{V2SF} is used as a mode for a vector -of two 32-bit floating-point values. - -If SSE extensions are enabled, @code{V4SF} is used for a vector of four 32-bit -floating-point values. Some instructions use a vector of four 32-bit -integers, these use @code{V4SI}. Finally, some instructions operate on an -entire vector register, interpreting it as a 128-bit integer, these use mode -@code{TI}. - -In 64-bit mode, the x86-64 family of processors uses additional built-in -functions for efficient use of @code{TF} (@code{__float128}) 128-bit -floating point and @code{TC} 128-bit complex floating-point values. - -The following floating-point built-in functions are available in 64-bit -mode. All of them implement the function that is part of the name. - -@smallexample -__float128 __builtin_fabsq (__float128) -__float128 __builtin_copysignq (__float128, __float128) -@end smallexample - -The following built-in function is always available. - -@table @code -@item void __builtin_ia32_pause (void) -Generates the @code{pause} machine instruction with a compiler memory -barrier. 
-@end table - -The following floating-point built-in functions are made available in the -64-bit mode. - -@table @code -@item __float128 __builtin_infq (void) -Similar to @code{__builtin_inf}, except the return type is @code{__float128}. -@findex __builtin_infq - -@item __float128 __builtin_huge_valq (void) -Similar to @code{__builtin_huge_val}, except the return type is @code{__float128}. -@findex __builtin_huge_valq -@end table - -The following built-in functions are always available and can be used to -check the target platform type. - -@deftypefn {Built-in Function} void __builtin_cpu_init (void) -This function runs the CPU detection code to check the type of CPU and the -features supported. This built-in function needs to be invoked along with the built-in functions -to check CPU type and features, @code{__builtin_cpu_is} and -@code{__builtin_cpu_supports}, only when used in a function that is -executed before any constructors are called. The CPU detection code is -automatically executed in a very high priority constructor. - -For example, this function has to be used in @code{ifunc} resolvers that -check for CPU type using the built-in functions @code{__builtin_cpu_is} -and @code{__builtin_cpu_supports}, or in constructors on targets that -don't support constructor priority. -@smallexample - -static void (*resolve_memcpy (void)) (void) -@{ - // ifunc resolvers fire before constructors, explicitly call the init - // function. - __builtin_cpu_init (); - if (__builtin_cpu_supports ("ssse3")) - return ssse3_memcpy; // super fast memcpy with ssse3 instructions. - else - return default_memcpy; -@} - -void *memcpy (void *, const void *, size_t) - __attribute__ ((ifunc ("resolve_memcpy"))); -@end smallexample - -@end deftypefn - -@deftypefn {Built-in Function} int __builtin_cpu_is (const char *@var{cpuname}) -This function returns a positive integer if the run-time CPU -is of type @var{cpuname} -and returns @code{0} otherwise. 
The following CPU names can be detected: - -@table @samp -@item intel -Intel CPU. - -@item atom -Intel Atom CPU. - -@item core2 -Intel Core 2 CPU. - -@item corei7 -Intel Core i7 CPU. - -@item nehalem -Intel Core i7 Nehalem CPU. - -@item westmere -Intel Core i7 Westmere CPU. - -@item sandybridge -Intel Core i7 Sandy Bridge CPU. - -@item amd -AMD CPU. - -@item amdfam10h -AMD Family 10h CPU. - -@item barcelona -AMD Family 10h Barcelona CPU. - -@item shanghai -AMD Family 10h Shanghai CPU. - -@item istanbul -AMD Family 10h Istanbul CPU. - -@item btver1 -AMD Family 14h CPU. - -@item amdfam15h -AMD Family 15h CPU. - -@item bdver1 -AMD Family 15h Bulldozer version 1. - -@item bdver2 -AMD Family 15h Bulldozer version 2. - -@item bdver3 -AMD Family 15h Bulldozer version 3. - -@item bdver4 -AMD Family 15h Bulldozer version 4. - -@item btver2 -AMD Family 16h CPU. -@end table - -Here is an example: -@smallexample -if (__builtin_cpu_is ("corei7")) - @{ - do_corei7 (); // Core i7 specific implementation. - @} -else - @{ - do_generic (); // Generic implementation. - @} -@end smallexample -@end deftypefn - -@deftypefn {Built-in Function} int __builtin_cpu_supports (const char *@var{feature}) -This function returns a positive integer if the run-time CPU -supports @var{feature} -and returns @code{0} otherwise. The following features can be detected: - -@table @samp -@item cmov -CMOV instruction. -@item mmx -MMX instructions. -@item popcnt -POPCNT instruction. -@item sse -SSE instructions. -@item sse2 -SSE2 instructions. -@item sse3 -SSE3 instructions. -@item ssse3 -SSSE3 instructions. -@item sse4.1 -SSE4.1 instructions. -@item sse4.2 -SSE4.2 instructions. -@item avx -AVX instructions. -@item avx2 -AVX2 instructions. -@item avx512f -AVX512F instructions. -@end table - -Here is an example: -@smallexample -if (__builtin_cpu_supports ("popcnt")) - @{ - asm("popcnt %1,%0" : "=r"(count) : "rm"(n) : "cc"); - @} -else - @{ - count = generic_countbits (n); //generic implementation. 
- @} -@end smallexample -@end deftypefn - - -The following built-in functions are made available by @option{-mmmx}. -All of them generate the machine instruction that is part of the name. - -@smallexample -v8qi __builtin_ia32_paddb (v8qi, v8qi) -v4hi __builtin_ia32_paddw (v4hi, v4hi) -v2si __builtin_ia32_paddd (v2si, v2si) -v8qi __builtin_ia32_psubb (v8qi, v8qi) -v4hi __builtin_ia32_psubw (v4hi, v4hi) -v2si __builtin_ia32_psubd (v2si, v2si) -v8qi __builtin_ia32_paddsb (v8qi, v8qi) -v4hi __builtin_ia32_paddsw (v4hi, v4hi) -v8qi __builtin_ia32_psubsb (v8qi, v8qi) -v4hi __builtin_ia32_psubsw (v4hi, v4hi) -v8qi __builtin_ia32_paddusb (v8qi, v8qi) -v4hi __builtin_ia32_paddusw (v4hi, v4hi) -v8qi __builtin_ia32_psubusb (v8qi, v8qi) -v4hi __builtin_ia32_psubusw (v4hi, v4hi) -v4hi __builtin_ia32_pmullw (v4hi, v4hi) -v4hi __builtin_ia32_pmulhw (v4hi, v4hi) -di __builtin_ia32_pand (di, di) -di __builtin_ia32_pandn (di,di) -di __builtin_ia32_por (di, di) -di __builtin_ia32_pxor (di, di) -v8qi __builtin_ia32_pcmpeqb (v8qi, v8qi) -v4hi __builtin_ia32_pcmpeqw (v4hi, v4hi) -v2si __builtin_ia32_pcmpeqd (v2si, v2si) -v8qi __builtin_ia32_pcmpgtb (v8qi, v8qi) -v4hi __builtin_ia32_pcmpgtw (v4hi, v4hi) -v2si __builtin_ia32_pcmpgtd (v2si, v2si) -v8qi __builtin_ia32_punpckhbw (v8qi, v8qi) -v4hi __builtin_ia32_punpckhwd (v4hi, v4hi) -v2si __builtin_ia32_punpckhdq (v2si, v2si) -v8qi __builtin_ia32_punpcklbw (v8qi, v8qi) -v4hi __builtin_ia32_punpcklwd (v4hi, v4hi) -v2si __builtin_ia32_punpckldq (v2si, v2si) -v8qi __builtin_ia32_packsswb (v4hi, v4hi) -v4hi __builtin_ia32_packssdw (v2si, v2si) -v8qi __builtin_ia32_packuswb (v4hi, v4hi) - -v4hi __builtin_ia32_psllw (v4hi, v4hi) -v2si __builtin_ia32_pslld (v2si, v2si) -v1di __builtin_ia32_psllq (v1di, v1di) -v4hi __builtin_ia32_psrlw (v4hi, v4hi) -v2si __builtin_ia32_psrld (v2si, v2si) -v1di __builtin_ia32_psrlq (v1di, v1di) -v4hi __builtin_ia32_psraw (v4hi, v4hi) -v2si __builtin_ia32_psrad (v2si, v2si) -v4hi __builtin_ia32_psllwi (v4hi, int) 
-v2si __builtin_ia32_pslldi (v2si, int) -v1di __builtin_ia32_psllqi (v1di, int) -v4hi __builtin_ia32_psrlwi (v4hi, int) -v2si __builtin_ia32_psrldi (v2si, int) -v1di __builtin_ia32_psrlqi (v1di, int) -v4hi __builtin_ia32_psrawi (v4hi, int) -v2si __builtin_ia32_psradi (v2si, int) - -@end smallexample - -The following built-in functions are made available either with -@option{-msse}, or with a combination of @option{-m3dnow} and -@option{-march=athlon}. All of them generate the machine -instruction that is part of the name. - -@smallexample -v4hi __builtin_ia32_pmulhuw (v4hi, v4hi) -v8qi __builtin_ia32_pavgb (v8qi, v8qi) -v4hi __builtin_ia32_pavgw (v4hi, v4hi) -v1di __builtin_ia32_psadbw (v8qi, v8qi) -v8qi __builtin_ia32_pmaxub (v8qi, v8qi) -v4hi __builtin_ia32_pmaxsw (v4hi, v4hi) -v8qi __builtin_ia32_pminub (v8qi, v8qi) -v4hi __builtin_ia32_pminsw (v4hi, v4hi) -int __builtin_ia32_pmovmskb (v8qi) -void __builtin_ia32_maskmovq (v8qi, v8qi, char *) -void __builtin_ia32_movntq (di *, di) -void __builtin_ia32_sfence (void) -@end smallexample - -The following built-in functions are available when @option{-msse} is used. -All of them generate the machine instruction that is part of the name. 
- -@smallexample -int __builtin_ia32_comieq (v4sf, v4sf) -int __builtin_ia32_comineq (v4sf, v4sf) -int __builtin_ia32_comilt (v4sf, v4sf) -int __builtin_ia32_comile (v4sf, v4sf) -int __builtin_ia32_comigt (v4sf, v4sf) -int __builtin_ia32_comige (v4sf, v4sf) -int __builtin_ia32_ucomieq (v4sf, v4sf) -int __builtin_ia32_ucomineq (v4sf, v4sf) -int __builtin_ia32_ucomilt (v4sf, v4sf) -int __builtin_ia32_ucomile (v4sf, v4sf) -int __builtin_ia32_ucomigt (v4sf, v4sf) -int __builtin_ia32_ucomige (v4sf, v4sf) -v4sf __builtin_ia32_addps (v4sf, v4sf) -v4sf __builtin_ia32_subps (v4sf, v4sf) -v4sf __builtin_ia32_mulps (v4sf, v4sf) -v4sf __builtin_ia32_divps (v4sf, v4sf) -v4sf __builtin_ia32_addss (v4sf, v4sf) -v4sf __builtin_ia32_subss (v4sf, v4sf) -v4sf __builtin_ia32_mulss (v4sf, v4sf) -v4sf __builtin_ia32_divss (v4sf, v4sf) -v4sf __builtin_ia32_cmpeqps (v4sf, v4sf) -v4sf __builtin_ia32_cmpltps (v4sf, v4sf) -v4sf __builtin_ia32_cmpleps (v4sf, v4sf) -v4sf __builtin_ia32_cmpgtps (v4sf, v4sf) -v4sf __builtin_ia32_cmpgeps (v4sf, v4sf) -v4sf __builtin_ia32_cmpunordps (v4sf, v4sf) -v4sf __builtin_ia32_cmpneqps (v4sf, v4sf) -v4sf __builtin_ia32_cmpnltps (v4sf, v4sf) -v4sf __builtin_ia32_cmpnleps (v4sf, v4sf) -v4sf __builtin_ia32_cmpngtps (v4sf, v4sf) -v4sf __builtin_ia32_cmpngeps (v4sf, v4sf) -v4sf __builtin_ia32_cmpordps (v4sf, v4sf) -v4sf __builtin_ia32_cmpeqss (v4sf, v4sf) -v4sf __builtin_ia32_cmpltss (v4sf, v4sf) -v4sf __builtin_ia32_cmpless (v4sf, v4sf) -v4sf __builtin_ia32_cmpunordss (v4sf, v4sf) -v4sf __builtin_ia32_cmpneqss (v4sf, v4sf) -v4sf __builtin_ia32_cmpnltss (v4sf, v4sf) -v4sf __builtin_ia32_cmpnless (v4sf, v4sf) -v4sf __builtin_ia32_cmpordss (v4sf, v4sf) -v4sf __builtin_ia32_maxps (v4sf, v4sf) -v4sf __builtin_ia32_maxss (v4sf, v4sf) -v4sf __builtin_ia32_minps (v4sf, v4sf) -v4sf __builtin_ia32_minss (v4sf, v4sf) -v4sf __builtin_ia32_andps (v4sf, v4sf) -v4sf __builtin_ia32_andnps (v4sf, v4sf) -v4sf __builtin_ia32_orps (v4sf, v4sf) -v4sf __builtin_ia32_xorps (v4sf, 
v4sf) -v4sf __builtin_ia32_movss (v4sf, v4sf) -v4sf __builtin_ia32_movhlps (v4sf, v4sf) -v4sf __builtin_ia32_movlhps (v4sf, v4sf) -v4sf __builtin_ia32_unpckhps (v4sf, v4sf) -v4sf __builtin_ia32_unpcklps (v4sf, v4sf) -v4sf __builtin_ia32_cvtpi2ps (v4sf, v2si) -v4sf __builtin_ia32_cvtsi2ss (v4sf, int) -v2si __builtin_ia32_cvtps2pi (v4sf) -int __builtin_ia32_cvtss2si (v4sf) -v2si __builtin_ia32_cvttps2pi (v4sf) -int __builtin_ia32_cvttss2si (v4sf) -v4sf __builtin_ia32_rcpps (v4sf) -v4sf __builtin_ia32_rsqrtps (v4sf) -v4sf __builtin_ia32_sqrtps (v4sf) -v4sf __builtin_ia32_rcpss (v4sf) -v4sf __builtin_ia32_rsqrtss (v4sf) -v4sf __builtin_ia32_sqrtss (v4sf) -v4sf __builtin_ia32_shufps (v4sf, v4sf, int) -void __builtin_ia32_movntps (float *, v4sf) -int __builtin_ia32_movmskps (v4sf) -@end smallexample - -The following built-in functions are available when @option{-msse} is used. - -@table @code -@item v4sf __builtin_ia32_loadups (float *) -Generates the @code{movups} machine instruction as a load from memory. -@item void __builtin_ia32_storeups (float *, v4sf) -Generates the @code{movups} machine instruction as a store to memory. -@item v4sf __builtin_ia32_loadss (float *) -Generates the @code{movss} machine instruction as a load from memory. -@item v4sf __builtin_ia32_loadhps (v4sf, const v2sf *) -Generates the @code{movhps} machine instruction as a load from memory. -@item v4sf __builtin_ia32_loadlps (v4sf, const v2sf *) -Generates the @code{movlps} machine instruction as a load from memory -@item void __builtin_ia32_storehps (v2sf *, v4sf) -Generates the @code{movhps} machine instruction as a store to memory. -@item void __builtin_ia32_storelps (v2sf *, v4sf) -Generates the @code{movlps} machine instruction as a store to memory. -@end table - -The following built-in functions are available when @option{-msse2} is used. -All of them generate the machine instruction that is part of the name. 
- -@smallexample -int __builtin_ia32_comisdeq (v2df, v2df) -int __builtin_ia32_comisdlt (v2df, v2df) -int __builtin_ia32_comisdle (v2df, v2df) -int __builtin_ia32_comisdgt (v2df, v2df) -int __builtin_ia32_comisdge (v2df, v2df) -int __builtin_ia32_comisdneq (v2df, v2df) -int __builtin_ia32_ucomisdeq (v2df, v2df) -int __builtin_ia32_ucomisdlt (v2df, v2df) -int __builtin_ia32_ucomisdle (v2df, v2df) -int __builtin_ia32_ucomisdgt (v2df, v2df) -int __builtin_ia32_ucomisdge (v2df, v2df) -int __builtin_ia32_ucomisdneq (v2df, v2df) -v2df __builtin_ia32_cmpeqpd (v2df, v2df) -v2df __builtin_ia32_cmpltpd (v2df, v2df) -v2df __builtin_ia32_cmplepd (v2df, v2df) -v2df __builtin_ia32_cmpgtpd (v2df, v2df) -v2df __builtin_ia32_cmpgepd (v2df, v2df) -v2df __builtin_ia32_cmpunordpd (v2df, v2df) -v2df __builtin_ia32_cmpneqpd (v2df, v2df) -v2df __builtin_ia32_cmpnltpd (v2df, v2df) -v2df __builtin_ia32_cmpnlepd (v2df, v2df) -v2df __builtin_ia32_cmpngtpd (v2df, v2df) -v2df __builtin_ia32_cmpngepd (v2df, v2df) -v2df __builtin_ia32_cmpordpd (v2df, v2df) -v2df __builtin_ia32_cmpeqsd (v2df, v2df) -v2df __builtin_ia32_cmpltsd (v2df, v2df) -v2df __builtin_ia32_cmplesd (v2df, v2df) -v2df __builtin_ia32_cmpunordsd (v2df, v2df) -v2df __builtin_ia32_cmpneqsd (v2df, v2df) -v2df __builtin_ia32_cmpnltsd (v2df, v2df) -v2df __builtin_ia32_cmpnlesd (v2df, v2df) -v2df __builtin_ia32_cmpordsd (v2df, v2df) -v2di __builtin_ia32_paddq (v2di, v2di) -v2di __builtin_ia32_psubq (v2di, v2di) -v2df __builtin_ia32_addpd (v2df, v2df) -v2df __builtin_ia32_subpd (v2df, v2df) -v2df __builtin_ia32_mulpd (v2df, v2df) -v2df __builtin_ia32_divpd (v2df, v2df) -v2df __builtin_ia32_addsd (v2df, v2df) -v2df __builtin_ia32_subsd (v2df, v2df) -v2df __builtin_ia32_mulsd (v2df, v2df) -v2df __builtin_ia32_divsd (v2df, v2df) -v2df __builtin_ia32_minpd (v2df, v2df) -v2df __builtin_ia32_maxpd (v2df, v2df) -v2df __builtin_ia32_minsd (v2df, v2df) -v2df __builtin_ia32_maxsd (v2df, v2df) -v2df __builtin_ia32_andpd (v2df, v2df) -v2df 
__builtin_ia32_andnpd (v2df, v2df) -v2df __builtin_ia32_orpd (v2df, v2df) -v2df __builtin_ia32_xorpd (v2df, v2df) -v2df __builtin_ia32_movsd (v2df, v2df) -v2df __builtin_ia32_unpckhpd (v2df, v2df) -v2df __builtin_ia32_unpcklpd (v2df, v2df) -v16qi __builtin_ia32_paddb128 (v16qi, v16qi) -v8hi __builtin_ia32_paddw128 (v8hi, v8hi) -v4si __builtin_ia32_paddd128 (v4si, v4si) -v2di __builtin_ia32_paddq128 (v2di, v2di) -v16qi __builtin_ia32_psubb128 (v16qi, v16qi) -v8hi __builtin_ia32_psubw128 (v8hi, v8hi) -v4si __builtin_ia32_psubd128 (v4si, v4si) -v2di __builtin_ia32_psubq128 (v2di, v2di) -v8hi __builtin_ia32_pmullw128 (v8hi, v8hi) -v8hi __builtin_ia32_pmulhw128 (v8hi, v8hi) -v2di __builtin_ia32_pand128 (v2di, v2di) -v2di __builtin_ia32_pandn128 (v2di, v2di) -v2di __builtin_ia32_por128 (v2di, v2di) -v2di __builtin_ia32_pxor128 (v2di, v2di) -v16qi __builtin_ia32_pavgb128 (v16qi, v16qi) -v8hi __builtin_ia32_pavgw128 (v8hi, v8hi) -v16qi __builtin_ia32_pcmpeqb128 (v16qi, v16qi) -v8hi __builtin_ia32_pcmpeqw128 (v8hi, v8hi) -v4si __builtin_ia32_pcmpeqd128 (v4si, v4si) -v16qi __builtin_ia32_pcmpgtb128 (v16qi, v16qi) -v8hi __builtin_ia32_pcmpgtw128 (v8hi, v8hi) -v4si __builtin_ia32_pcmpgtd128 (v4si, v4si) -v16qi __builtin_ia32_pmaxub128 (v16qi, v16qi) -v8hi __builtin_ia32_pmaxsw128 (v8hi, v8hi) -v16qi __builtin_ia32_pminub128 (v16qi, v16qi) -v8hi __builtin_ia32_pminsw128 (v8hi, v8hi) -v16qi __builtin_ia32_punpckhbw128 (v16qi, v16qi) -v8hi __builtin_ia32_punpckhwd128 (v8hi, v8hi) -v4si __builtin_ia32_punpckhdq128 (v4si, v4si) -v2di __builtin_ia32_punpckhqdq128 (v2di, v2di) -v16qi __builtin_ia32_punpcklbw128 (v16qi, v16qi) -v8hi __builtin_ia32_punpcklwd128 (v8hi, v8hi) -v4si __builtin_ia32_punpckldq128 (v4si, v4si) -v2di __builtin_ia32_punpcklqdq128 (v2di, v2di) -v16qi __builtin_ia32_packsswb128 (v8hi, v8hi) -v8hi __builtin_ia32_packssdw128 (v4si, v4si) -v16qi __builtin_ia32_packuswb128 (v8hi, v8hi) -v8hi __builtin_ia32_pmulhuw128 (v8hi, v8hi) -void __builtin_ia32_maskmovdqu 
(v16qi, v16qi) -v2df __builtin_ia32_loadupd (double *) -void __builtin_ia32_storeupd (double *, v2df) -v2df __builtin_ia32_loadhpd (v2df, double const *) -v2df __builtin_ia32_loadlpd (v2df, double const *) -int __builtin_ia32_movmskpd (v2df) -int __builtin_ia32_pmovmskb128 (v16qi) -void __builtin_ia32_movnti (int *, int) -void __builtin_ia32_movnti64 (long long int *, long long int) -void __builtin_ia32_movntpd (double *, v2df) -void __builtin_ia32_movntdq (v2df *, v2df) -v4si __builtin_ia32_pshufd (v4si, int) -v8hi __builtin_ia32_pshuflw (v8hi, int) -v8hi __builtin_ia32_pshufhw (v8hi, int) -v2di __builtin_ia32_psadbw128 (v16qi, v16qi) -v2df __builtin_ia32_sqrtpd (v2df) -v2df __builtin_ia32_sqrtsd (v2df) -v2df __builtin_ia32_shufpd (v2df, v2df, int) -v2df __builtin_ia32_cvtdq2pd (v4si) -v4sf __builtin_ia32_cvtdq2ps (v4si) -v4si __builtin_ia32_cvtpd2dq (v2df) -v2si __builtin_ia32_cvtpd2pi (v2df) -v4sf __builtin_ia32_cvtpd2ps (v2df) -v4si __builtin_ia32_cvttpd2dq (v2df) -v2si __builtin_ia32_cvttpd2pi (v2df) -v2df __builtin_ia32_cvtpi2pd (v2si) -int __builtin_ia32_cvtsd2si (v2df) -int __builtin_ia32_cvttsd2si (v2df) -long long __builtin_ia32_cvtsd2si64 (v2df) -long long __builtin_ia32_cvttsd2si64 (v2df) -v4si __builtin_ia32_cvtps2dq (v4sf) -v2df __builtin_ia32_cvtps2pd (v4sf) -v4si __builtin_ia32_cvttps2dq (v4sf) -v2df __builtin_ia32_cvtsi2sd (v2df, int) -v2df __builtin_ia32_cvtsi642sd (v2df, long long) -v4sf __builtin_ia32_cvtsd2ss (v4sf, v2df) -v2df __builtin_ia32_cvtss2sd (v2df, v4sf) -void __builtin_ia32_clflush (const void *) -void __builtin_ia32_lfence (void) -void __builtin_ia32_mfence (void) -v16qi __builtin_ia32_loaddqu (const char *) -void __builtin_ia32_storedqu (char *, v16qi) -v1di __builtin_ia32_pmuludq (v2si, v2si) -v2di __builtin_ia32_pmuludq128 (v4si, v4si) -v8hi __builtin_ia32_psllw128 (v8hi, v8hi) -v4si __builtin_ia32_pslld128 (v4si, v4si) -v2di __builtin_ia32_psllq128 (v2di, v2di) -v8hi __builtin_ia32_psrlw128 (v8hi, v8hi) -v4si 
__builtin_ia32_psrld128 (v4si, v4si) -v2di __builtin_ia32_psrlq128 (v2di, v2di) -v8hi __builtin_ia32_psraw128 (v8hi, v8hi) -v4si __builtin_ia32_psrad128 (v4si, v4si) -v2di __builtin_ia32_pslldqi128 (v2di, int) -v8hi __builtin_ia32_psllwi128 (v8hi, int) -v4si __builtin_ia32_pslldi128 (v4si, int) -v2di __builtin_ia32_psllqi128 (v2di, int) -v2di __builtin_ia32_psrldqi128 (v2di, int) -v8hi __builtin_ia32_psrlwi128 (v8hi, int) -v4si __builtin_ia32_psrldi128 (v4si, int) -v2di __builtin_ia32_psrlqi128 (v2di, int) -v8hi __builtin_ia32_psrawi128 (v8hi, int) -v4si __builtin_ia32_psradi128 (v4si, int) -v4si __builtin_ia32_pmaddwd128 (v8hi, v8hi) -v2di __builtin_ia32_movq128 (v2di) -@end smallexample - -The following built-in functions are available when @option{-msse3} is used. -All of them generate the machine instruction that is part of the name. - -@smallexample -v2df __builtin_ia32_addsubpd (v2df, v2df) -v4sf __builtin_ia32_addsubps (v4sf, v4sf) -v2df __builtin_ia32_haddpd (v2df, v2df) -v4sf __builtin_ia32_haddps (v4sf, v4sf) -v2df __builtin_ia32_hsubpd (v2df, v2df) -v4sf __builtin_ia32_hsubps (v4sf, v4sf) -v16qi __builtin_ia32_lddqu (char const *) -void __builtin_ia32_monitor (void *, unsigned int, unsigned int) -v4sf __builtin_ia32_movshdup (v4sf) -v4sf __builtin_ia32_movsldup (v4sf) -void __builtin_ia32_mwait (unsigned int, unsigned int) -@end smallexample - -The following built-in functions are available when @option{-mssse3} is used. -All of them generate the machine instruction that is part of the name. 
- -@smallexample -v2si __builtin_ia32_phaddd (v2si, v2si) -v4hi __builtin_ia32_phaddw (v4hi, v4hi) -v4hi __builtin_ia32_phaddsw (v4hi, v4hi) -v2si __builtin_ia32_phsubd (v2si, v2si) -v4hi __builtin_ia32_phsubw (v4hi, v4hi) -v4hi __builtin_ia32_phsubsw (v4hi, v4hi) -v4hi __builtin_ia32_pmaddubsw (v8qi, v8qi) -v4hi __builtin_ia32_pmulhrsw (v4hi, v4hi) -v8qi __builtin_ia32_pshufb (v8qi, v8qi) -v8qi __builtin_ia32_psignb (v8qi, v8qi) -v2si __builtin_ia32_psignd (v2si, v2si) -v4hi __builtin_ia32_psignw (v4hi, v4hi) -v1di __builtin_ia32_palignr (v1di, v1di, int) -v8qi __builtin_ia32_pabsb (v8qi) -v2si __builtin_ia32_pabsd (v2si) -v4hi __builtin_ia32_pabsw (v4hi) -@end smallexample - -The following built-in functions are available when @option{-mssse3} is used. -All of them generate the machine instruction that is part of the name. - -@smallexample -v4si __builtin_ia32_phaddd128 (v4si, v4si) -v8hi __builtin_ia32_phaddw128 (v8hi, v8hi) -v8hi __builtin_ia32_phaddsw128 (v8hi, v8hi) -v4si __builtin_ia32_phsubd128 (v4si, v4si) -v8hi __builtin_ia32_phsubw128 (v8hi, v8hi) -v8hi __builtin_ia32_phsubsw128 (v8hi, v8hi) -v8hi __builtin_ia32_pmaddubsw128 (v16qi, v16qi) -v8hi __builtin_ia32_pmulhrsw128 (v8hi, v8hi) -v16qi __builtin_ia32_pshufb128 (v16qi, v16qi) -v16qi __builtin_ia32_psignb128 (v16qi, v16qi) -v4si __builtin_ia32_psignd128 (v4si, v4si) -v8hi __builtin_ia32_psignw128 (v8hi, v8hi) -v2di __builtin_ia32_palignr128 (v2di, v2di, int) -v16qi __builtin_ia32_pabsb128 (v16qi) -v4si __builtin_ia32_pabsd128 (v4si) -v8hi __builtin_ia32_pabsw128 (v8hi) -@end smallexample - -The following built-in functions are available when @option{-msse4.1} is -used. All of them generate the machine instruction that is part of the -name. 
- -@smallexample -v2df __builtin_ia32_blendpd (v2df, v2df, const int) -v4sf __builtin_ia32_blendps (v4sf, v4sf, const int) -v2df __builtin_ia32_blendvpd (v2df, v2df, v2df) -v4sf __builtin_ia32_blendvps (v4sf, v4sf, v4sf) -v2df __builtin_ia32_dppd (v2df, v2df, const int) -v4sf __builtin_ia32_dpps (v4sf, v4sf, const int) -v4sf __builtin_ia32_insertps128 (v4sf, v4sf, const int) -v2di __builtin_ia32_movntdqa (v2di *); -v16qi __builtin_ia32_mpsadbw128 (v16qi, v16qi, const int) -v8hi __builtin_ia32_packusdw128 (v4si, v4si) -v16qi __builtin_ia32_pblendvb128 (v16qi, v16qi, v16qi) -v8hi __builtin_ia32_pblendw128 (v8hi, v8hi, const int) -v2di __builtin_ia32_pcmpeqq (v2di, v2di) -v8hi __builtin_ia32_phminposuw128 (v8hi) -v16qi __builtin_ia32_pmaxsb128 (v16qi, v16qi) -v4si __builtin_ia32_pmaxsd128 (v4si, v4si) -v4si __builtin_ia32_pmaxud128 (v4si, v4si) -v8hi __builtin_ia32_pmaxuw128 (v8hi, v8hi) -v16qi __builtin_ia32_pminsb128 (v16qi, v16qi) -v4si __builtin_ia32_pminsd128 (v4si, v4si) -v4si __builtin_ia32_pminud128 (v4si, v4si) -v8hi __builtin_ia32_pminuw128 (v8hi, v8hi) -v4si __builtin_ia32_pmovsxbd128 (v16qi) -v2di __builtin_ia32_pmovsxbq128 (v16qi) -v8hi __builtin_ia32_pmovsxbw128 (v16qi) -v2di __builtin_ia32_pmovsxdq128 (v4si) -v4si __builtin_ia32_pmovsxwd128 (v8hi) -v2di __builtin_ia32_pmovsxwq128 (v8hi) -v4si __builtin_ia32_pmovzxbd128 (v16qi) -v2di __builtin_ia32_pmovzxbq128 (v16qi) -v8hi __builtin_ia32_pmovzxbw128 (v16qi) -v2di __builtin_ia32_pmovzxdq128 (v4si) -v4si __builtin_ia32_pmovzxwd128 (v8hi) -v2di __builtin_ia32_pmovzxwq128 (v8hi) -v2di __builtin_ia32_pmuldq128 (v4si, v4si) -v4si __builtin_ia32_pmulld128 (v4si, v4si) -int __builtin_ia32_ptestc128 (v2di, v2di) -int __builtin_ia32_ptestnzc128 (v2di, v2di) -int __builtin_ia32_ptestz128 (v2di, v2di) -v2df __builtin_ia32_roundpd (v2df, const int) -v4sf __builtin_ia32_roundps (v4sf, const int) -v2df __builtin_ia32_roundsd (v2df, v2df, const int) -v4sf __builtin_ia32_roundss (v4sf, v4sf, const int) -@end 
smallexample - -The following built-in functions are available when @option{-msse4.1} is -used. - -@table @code -@item v4sf __builtin_ia32_vec_set_v4sf (v4sf, float, const int) -Generates the @code{insertps} machine instruction. -@item int __builtin_ia32_vec_ext_v16qi (v16qi, const int) -Generates the @code{pextrb} machine instruction. -@item v16qi __builtin_ia32_vec_set_v16qi (v16qi, int, const int) -Generates the @code{pinsrb} machine instruction. -@item v4si __builtin_ia32_vec_set_v4si (v4si, int, const int) -Generates the @code{pinsrd} machine instruction. -@item v2di __builtin_ia32_vec_set_v2di (v2di, long long, const int) -Generates the @code{pinsrq} machine instruction in 64bit mode. -@end table - -The following built-in functions are changed to generate new SSE4.1 -instructions when @option{-msse4.1} is used. - -@table @code -@item float __builtin_ia32_vec_ext_v4sf (v4sf, const int) -Generates the @code{extractps} machine instruction. -@item int __builtin_ia32_vec_ext_v4si (v4si, const int) -Generates the @code{pextrd} machine instruction. -@item long long __builtin_ia32_vec_ext_v2di (v2di, const int) -Generates the @code{pextrq} machine instruction in 64bit mode. -@end table - -The following built-in functions are available when @option{-msse4.2} is -used. All of them generate the machine instruction that is part of the -name. 
- -@smallexample -v16qi __builtin_ia32_pcmpestrm128 (v16qi, int, v16qi, int, const int) -int __builtin_ia32_pcmpestri128 (v16qi, int, v16qi, int, const int) -int __builtin_ia32_pcmpestria128 (v16qi, int, v16qi, int, const int) -int __builtin_ia32_pcmpestric128 (v16qi, int, v16qi, int, const int) -int __builtin_ia32_pcmpestrio128 (v16qi, int, v16qi, int, const int) -int __builtin_ia32_pcmpestris128 (v16qi, int, v16qi, int, const int) -int __builtin_ia32_pcmpestriz128 (v16qi, int, v16qi, int, const int) -v16qi __builtin_ia32_pcmpistrm128 (v16qi, v16qi, const int) -int __builtin_ia32_pcmpistri128 (v16qi, v16qi, const int) -int __builtin_ia32_pcmpistria128 (v16qi, v16qi, const int) -int __builtin_ia32_pcmpistric128 (v16qi, v16qi, const int) -int __builtin_ia32_pcmpistrio128 (v16qi, v16qi, const int) -int __builtin_ia32_pcmpistris128 (v16qi, v16qi, const int) -int __builtin_ia32_pcmpistriz128 (v16qi, v16qi, const int) -v2di __builtin_ia32_pcmpgtq (v2di, v2di) -@end smallexample - -The following built-in functions are available when @option{-msse4.2} is -used. - -@table @code -@item unsigned int __builtin_ia32_crc32qi (unsigned int, unsigned char) -Generates the @code{crc32b} machine instruction. -@item unsigned int __builtin_ia32_crc32hi (unsigned int, unsigned short) -Generates the @code{crc32w} machine instruction. -@item unsigned int __builtin_ia32_crc32si (unsigned int, unsigned int) -Generates the @code{crc32l} machine instruction. -@item unsigned long long __builtin_ia32_crc32di (unsigned long long, unsigned long long) -Generates the @code{crc32q} machine instruction. -@end table - -The following built-in functions are changed to generate new SSE4.2 -instructions when @option{-msse4.2} is used. - -@table @code -@item int __builtin_popcount (unsigned int) -Generates the @code{popcntl} machine instruction. 
-@item int __builtin_popcountl (unsigned long) -Generates the @code{popcntl} or @code{popcntq} machine instruction, -depending on the size of @code{unsigned long}. -@item int __builtin_popcountll (unsigned long long) -Generates the @code{popcntq} machine instruction. -@end table - -The following built-in functions are available when @option{-mavx} is -used. All of them generate the machine instruction that is part of the -name. - -@smallexample -v4df __builtin_ia32_addpd256 (v4df,v4df) -v8sf __builtin_ia32_addps256 (v8sf,v8sf) -v4df __builtin_ia32_addsubpd256 (v4df,v4df) -v8sf __builtin_ia32_addsubps256 (v8sf,v8sf) -v4df __builtin_ia32_andnpd256 (v4df,v4df) -v8sf __builtin_ia32_andnps256 (v8sf,v8sf) -v4df __builtin_ia32_andpd256 (v4df,v4df) -v8sf __builtin_ia32_andps256 (v8sf,v8sf) -v4df __builtin_ia32_blendpd256 (v4df,v4df,int) -v8sf __builtin_ia32_blendps256 (v8sf,v8sf,int) -v4df __builtin_ia32_blendvpd256 (v4df,v4df,v4df) -v8sf __builtin_ia32_blendvps256 (v8sf,v8sf,v8sf) -v2df __builtin_ia32_cmppd (v2df,v2df,int) -v4df __builtin_ia32_cmppd256 (v4df,v4df,int) -v4sf __builtin_ia32_cmpps (v4sf,v4sf,int) -v8sf __builtin_ia32_cmpps256 (v8sf,v8sf,int) -v2df __builtin_ia32_cmpsd (v2df,v2df,int) -v4sf __builtin_ia32_cmpss (v4sf,v4sf,int) -v4df __builtin_ia32_cvtdq2pd256 (v4si) -v8sf __builtin_ia32_cvtdq2ps256 (v8si) -v4si __builtin_ia32_cvtpd2dq256 (v4df) -v4sf __builtin_ia32_cvtpd2ps256 (v4df) -v8si __builtin_ia32_cvtps2dq256 (v8sf) -v4df __builtin_ia32_cvtps2pd256 (v4sf) -v4si __builtin_ia32_cvttpd2dq256 (v4df) -v8si __builtin_ia32_cvttps2dq256 (v8sf) -v4df __builtin_ia32_divpd256 (v4df,v4df) -v8sf __builtin_ia32_divps256 (v8sf,v8sf) -v8sf __builtin_ia32_dpps256 (v8sf,v8sf,int) -v4df __builtin_ia32_haddpd256 (v4df,v4df) -v8sf __builtin_ia32_haddps256 (v8sf,v8sf) -v4df __builtin_ia32_hsubpd256 (v4df,v4df) -v8sf __builtin_ia32_hsubps256 (v8sf,v8sf) -v32qi __builtin_ia32_lddqu256 (pcchar) -v32qi __builtin_ia32_loaddqu256 (pcchar) -v4df __builtin_ia32_loadupd256 
(pcdouble) -v8sf __builtin_ia32_loadups256 (pcfloat) -v2df __builtin_ia32_maskloadpd (pcv2df,v2df) -v4df __builtin_ia32_maskloadpd256 (pcv4df,v4df) -v4sf __builtin_ia32_maskloadps (pcv4sf,v4sf) -v8sf __builtin_ia32_maskloadps256 (pcv8sf,v8sf) -void __builtin_ia32_maskstorepd (pv2df,v2df,v2df) -void __builtin_ia32_maskstorepd256 (pv4df,v4df,v4df) -void __builtin_ia32_maskstoreps (pv4sf,v4sf,v4sf) -void __builtin_ia32_maskstoreps256 (pv8sf,v8sf,v8sf) -v4df __builtin_ia32_maxpd256 (v4df,v4df) -v8sf __builtin_ia32_maxps256 (v8sf,v8sf) -v4df __builtin_ia32_minpd256 (v4df,v4df) -v8sf __builtin_ia32_minps256 (v8sf,v8sf) -v4df __builtin_ia32_movddup256 (v4df) -int __builtin_ia32_movmskpd256 (v4df) -int __builtin_ia32_movmskps256 (v8sf) -v8sf __builtin_ia32_movshdup256 (v8sf) -v8sf __builtin_ia32_movsldup256 (v8sf) -v4df __builtin_ia32_mulpd256 (v4df,v4df) -v8sf __builtin_ia32_mulps256 (v8sf,v8sf) -v4df __builtin_ia32_orpd256 (v4df,v4df) -v8sf __builtin_ia32_orps256 (v8sf,v8sf) -v2df __builtin_ia32_pd_pd256 (v4df) -v4df __builtin_ia32_pd256_pd (v2df) -v4sf __builtin_ia32_ps_ps256 (v8sf) -v8sf __builtin_ia32_ps256_ps (v4sf) -int __builtin_ia32_ptestc256 (v4di,v4di,ptest) -int __builtin_ia32_ptestnzc256 (v4di,v4di,ptest) -int __builtin_ia32_ptestz256 (v4di,v4di,ptest) -v8sf __builtin_ia32_rcpps256 (v8sf) -v4df __builtin_ia32_roundpd256 (v4df,int) -v8sf __builtin_ia32_roundps256 (v8sf,int) -v8sf __builtin_ia32_rsqrtps_nr256 (v8sf) -v8sf __builtin_ia32_rsqrtps256 (v8sf) -v4df __builtin_ia32_shufpd256 (v4df,v4df,int) -v8sf __builtin_ia32_shufps256 (v8sf,v8sf,int) -v4si __builtin_ia32_si_si256 (v8si) -v8si __builtin_ia32_si256_si (v4si) -v4df __builtin_ia32_sqrtpd256 (v4df) -v8sf __builtin_ia32_sqrtps_nr256 (v8sf) -v8sf __builtin_ia32_sqrtps256 (v8sf) -void __builtin_ia32_storedqu256 (pchar,v32qi) -void __builtin_ia32_storeupd256 (pdouble,v4df) -void __builtin_ia32_storeups256 (pfloat,v8sf) -v4df __builtin_ia32_subpd256 (v4df,v4df) -v8sf __builtin_ia32_subps256 (v8sf,v8sf) -v4df 
__builtin_ia32_unpckhpd256 (v4df,v4df) -v8sf __builtin_ia32_unpckhps256 (v8sf,v8sf) -v4df __builtin_ia32_unpcklpd256 (v4df,v4df) -v8sf __builtin_ia32_unpcklps256 (v8sf,v8sf) -v4df __builtin_ia32_vbroadcastf128_pd256 (pcv2df) -v8sf __builtin_ia32_vbroadcastf128_ps256 (pcv4sf) -v4df __builtin_ia32_vbroadcastsd256 (pcdouble) -v4sf __builtin_ia32_vbroadcastss (pcfloat) -v8sf __builtin_ia32_vbroadcastss256 (pcfloat) -v2df __builtin_ia32_vextractf128_pd256 (v4df,int) -v4sf __builtin_ia32_vextractf128_ps256 (v8sf,int) -v4si __builtin_ia32_vextractf128_si256 (v8si,int) -v4df __builtin_ia32_vinsertf128_pd256 (v4df,v2df,int) -v8sf __builtin_ia32_vinsertf128_ps256 (v8sf,v4sf,int) -v8si __builtin_ia32_vinsertf128_si256 (v8si,v4si,int) -v4df __builtin_ia32_vperm2f128_pd256 (v4df,v4df,int) -v8sf __builtin_ia32_vperm2f128_ps256 (v8sf,v8sf,int) -v8si __builtin_ia32_vperm2f128_si256 (v8si,v8si,int) -v2df __builtin_ia32_vpermil2pd (v2df,v2df,v2di,int) -v4df __builtin_ia32_vpermil2pd256 (v4df,v4df,v4di,int) -v4sf __builtin_ia32_vpermil2ps (v4sf,v4sf,v4si,int) -v8sf __builtin_ia32_vpermil2ps256 (v8sf,v8sf,v8si,int) -v2df __builtin_ia32_vpermilpd (v2df,int) -v4df __builtin_ia32_vpermilpd256 (v4df,int) -v4sf __builtin_ia32_vpermilps (v4sf,int) -v8sf __builtin_ia32_vpermilps256 (v8sf,int) -v2df __builtin_ia32_vpermilvarpd (v2df,v2di) -v4df __builtin_ia32_vpermilvarpd256 (v4df,v4di) -v4sf __builtin_ia32_vpermilvarps (v4sf,v4si) -v8sf __builtin_ia32_vpermilvarps256 (v8sf,v8si) -int __builtin_ia32_vtestcpd (v2df,v2df,ptest) -int __builtin_ia32_vtestcpd256 (v4df,v4df,ptest) -int __builtin_ia32_vtestcps (v4sf,v4sf,ptest) -int __builtin_ia32_vtestcps256 (v8sf,v8sf,ptest) -int __builtin_ia32_vtestnzcpd (v2df,v2df,ptest) -int __builtin_ia32_vtestnzcpd256 (v4df,v4df,ptest) -int __builtin_ia32_vtestnzcps (v4sf,v4sf,ptest) -int __builtin_ia32_vtestnzcps256 (v8sf,v8sf,ptest) -int __builtin_ia32_vtestzpd (v2df,v2df,ptest) -int __builtin_ia32_vtestzpd256 (v4df,v4df,ptest) -int __builtin_ia32_vtestzps 
(v4sf,v4sf,ptest) -int __builtin_ia32_vtestzps256 (v8sf,v8sf,ptest) -void __builtin_ia32_vzeroall (void) -void __builtin_ia32_vzeroupper (void) -v4df __builtin_ia32_xorpd256 (v4df,v4df) -v8sf __builtin_ia32_xorps256 (v8sf,v8sf) -@end smallexample - -The following built-in functions are available when @option{-mavx2} is -used. All of them generate the machine instruction that is part of the -name. - -@smallexample -v32qi __builtin_ia32_mpsadbw256 (v32qi,v32qi,int) -v32qi __builtin_ia32_pabsb256 (v32qi) -v16hi __builtin_ia32_pabsw256 (v16hi) -v8si __builtin_ia32_pabsd256 (v8si) -v16hi __builtin_ia32_packssdw256 (v8si,v8si) -v32qi __builtin_ia32_packsswb256 (v16hi,v16hi) -v16hi __builtin_ia32_packusdw256 (v8si,v8si) -v32qi __builtin_ia32_packuswb256 (v16hi,v16hi) -v32qi __builtin_ia32_paddb256 (v32qi,v32qi) -v16hi __builtin_ia32_paddw256 (v16hi,v16hi) -v8si __builtin_ia32_paddd256 (v8si,v8si) -v4di __builtin_ia32_paddq256 (v4di,v4di) -v32qi __builtin_ia32_paddsb256 (v32qi,v32qi) -v16hi __builtin_ia32_paddsw256 (v16hi,v16hi) -v32qi __builtin_ia32_paddusb256 (v32qi,v32qi) -v16hi __builtin_ia32_paddusw256 (v16hi,v16hi) -v4di __builtin_ia32_palignr256 (v4di,v4di,int) -v4di __builtin_ia32_andsi256 (v4di,v4di) -v4di __builtin_ia32_andnotsi256 (v4di,v4di) -v32qi __builtin_ia32_pavgb256 (v32qi,v32qi) -v16hi __builtin_ia32_pavgw256 (v16hi,v16hi) -v32qi __builtin_ia32_pblendvb256 (v32qi,v32qi,v32qi) -v16hi __builtin_ia32_pblendw256 (v16hi,v16hi,int) -v32qi __builtin_ia32_pcmpeqb256 (v32qi,v32qi) -v16hi __builtin_ia32_pcmpeqw256 (v16hi,v16hi) -v8si __builtin_ia32_pcmpeqd256 (c8si,v8si) -v4di __builtin_ia32_pcmpeqq256 (v4di,v4di) -v32qi __builtin_ia32_pcmpgtb256 (v32qi,v32qi) -v16hi __builtin_ia32_pcmpgtw256 (16hi,v16hi) -v8si __builtin_ia32_pcmpgtd256 (v8si,v8si) -v4di __builtin_ia32_pcmpgtq256 (v4di,v4di) -v16hi __builtin_ia32_phaddw256 (v16hi,v16hi) -v8si __builtin_ia32_phaddd256 (v8si,v8si) -v16hi __builtin_ia32_phaddsw256 (v16hi,v16hi) -v16hi __builtin_ia32_phsubw256 
(v16hi,v16hi) -v8si __builtin_ia32_phsubd256 (v8si,v8si) -v16hi __builtin_ia32_phsubsw256 (v16hi,v16hi) -v32qi __builtin_ia32_pmaddubsw256 (v32qi,v32qi) -v16hi __builtin_ia32_pmaddwd256 (v16hi,v16hi) -v32qi __builtin_ia32_pmaxsb256 (v32qi,v32qi) -v16hi __builtin_ia32_pmaxsw256 (v16hi,v16hi) -v8si __builtin_ia32_pmaxsd256 (v8si,v8si) -v32qi __builtin_ia32_pmaxub256 (v32qi,v32qi) -v16hi __builtin_ia32_pmaxuw256 (v16hi,v16hi) -v8si __builtin_ia32_pmaxud256 (v8si,v8si) -v32qi __builtin_ia32_pminsb256 (v32qi,v32qi) -v16hi __builtin_ia32_pminsw256 (v16hi,v16hi) -v8si __builtin_ia32_pminsd256 (v8si,v8si) -v32qi __builtin_ia32_pminub256 (v32qi,v32qi) -v16hi __builtin_ia32_pminuw256 (v16hi,v16hi) -v8si __builtin_ia32_pminud256 (v8si,v8si) -int __builtin_ia32_pmovmskb256 (v32qi) -v16hi __builtin_ia32_pmovsxbw256 (v16qi) -v8si __builtin_ia32_pmovsxbd256 (v16qi) -v4di __builtin_ia32_pmovsxbq256 (v16qi) -v8si __builtin_ia32_pmovsxwd256 (v8hi) -v4di __builtin_ia32_pmovsxwq256 (v8hi) -v4di __builtin_ia32_pmovsxdq256 (v4si) -v16hi __builtin_ia32_pmovzxbw256 (v16qi) -v8si __builtin_ia32_pmovzxbd256 (v16qi) -v4di __builtin_ia32_pmovzxbq256 (v16qi) -v8si __builtin_ia32_pmovzxwd256 (v8hi) -v4di __builtin_ia32_pmovzxwq256 (v8hi) -v4di __builtin_ia32_pmovzxdq256 (v4si) -v4di __builtin_ia32_pmuldq256 (v8si,v8si) -v16hi __builtin_ia32_pmulhrsw256 (v16hi, v16hi) -v16hi __builtin_ia32_pmulhuw256 (v16hi,v16hi) -v16hi __builtin_ia32_pmulhw256 (v16hi,v16hi) -v16hi __builtin_ia32_pmullw256 (v16hi,v16hi) -v8si __builtin_ia32_pmulld256 (v8si,v8si) -v4di __builtin_ia32_pmuludq256 (v8si,v8si) -v4di __builtin_ia32_por256 (v4di,v4di) -v16hi __builtin_ia32_psadbw256 (v32qi,v32qi) -v32qi __builtin_ia32_pshufb256 (v32qi,v32qi) -v8si __builtin_ia32_pshufd256 (v8si,int) -v16hi __builtin_ia32_pshufhw256 (v16hi,int) -v16hi __builtin_ia32_pshuflw256 (v16hi,int) -v32qi __builtin_ia32_psignb256 (v32qi,v32qi) -v16hi __builtin_ia32_psignw256 (v16hi,v16hi) -v8si __builtin_ia32_psignd256 (v8si,v8si) -v4di 
__builtin_ia32_pslldqi256 (v4di,int) -v16hi __builtin_ia32_psllwi256 (16hi,int) -v16hi __builtin_ia32_psllw256(v16hi,v8hi) -v8si __builtin_ia32_pslldi256 (v8si,int) -v8si __builtin_ia32_pslld256(v8si,v4si) -v4di __builtin_ia32_psllqi256 (v4di,int) -v4di __builtin_ia32_psllq256(v4di,v2di) -v16hi __builtin_ia32_psrawi256 (v16hi,int) -v16hi __builtin_ia32_psraw256 (v16hi,v8hi) -v8si __builtin_ia32_psradi256 (v8si,int) -v8si __builtin_ia32_psrad256 (v8si,v4si) -v4di __builtin_ia32_psrldqi256 (v4di, int) -v16hi __builtin_ia32_psrlwi256 (v16hi,int) -v16hi __builtin_ia32_psrlw256 (v16hi,v8hi) -v8si __builtin_ia32_psrldi256 (v8si,int) -v8si __builtin_ia32_psrld256 (v8si,v4si) -v4di __builtin_ia32_psrlqi256 (v4di,int) -v4di __builtin_ia32_psrlq256(v4di,v2di) -v32qi __builtin_ia32_psubb256 (v32qi,v32qi) -v32hi __builtin_ia32_psubw256 (v16hi,v16hi) -v8si __builtin_ia32_psubd256 (v8si,v8si) -v4di __builtin_ia32_psubq256 (v4di,v4di) -v32qi __builtin_ia32_psubsb256 (v32qi,v32qi) -v16hi __builtin_ia32_psubsw256 (v16hi,v16hi) -v32qi __builtin_ia32_psubusb256 (v32qi,v32qi) -v16hi __builtin_ia32_psubusw256 (v16hi,v16hi) -v32qi __builtin_ia32_punpckhbw256 (v32qi,v32qi) -v16hi __builtin_ia32_punpckhwd256 (v16hi,v16hi) -v8si __builtin_ia32_punpckhdq256 (v8si,v8si) -v4di __builtin_ia32_punpckhqdq256 (v4di,v4di) -v32qi __builtin_ia32_punpcklbw256 (v32qi,v32qi) -v16hi __builtin_ia32_punpcklwd256 (v16hi,v16hi) -v8si __builtin_ia32_punpckldq256 (v8si,v8si) -v4di __builtin_ia32_punpcklqdq256 (v4di,v4di) -v4di __builtin_ia32_pxor256 (v4di,v4di) -v4di __builtin_ia32_movntdqa256 (pv4di) -v4sf __builtin_ia32_vbroadcastss_ps (v4sf) -v8sf __builtin_ia32_vbroadcastss_ps256 (v4sf) -v4df __builtin_ia32_vbroadcastsd_pd256 (v2df) -v4di __builtin_ia32_vbroadcastsi256 (v2di) -v4si __builtin_ia32_pblendd128 (v4si,v4si) -v8si __builtin_ia32_pblendd256 (v8si,v8si) -v32qi __builtin_ia32_pbroadcastb256 (v16qi) -v16hi __builtin_ia32_pbroadcastw256 (v8hi) -v8si __builtin_ia32_pbroadcastd256 (v4si) -v4di 
__builtin_ia32_pbroadcastq256 (v2di) -v16qi __builtin_ia32_pbroadcastb128 (v16qi) -v8hi __builtin_ia32_pbroadcastw128 (v8hi) -v4si __builtin_ia32_pbroadcastd128 (v4si) -v2di __builtin_ia32_pbroadcastq128 (v2di) -v8si __builtin_ia32_permvarsi256 (v8si,v8si) -v4df __builtin_ia32_permdf256 (v4df,int) -v8sf __builtin_ia32_permvarsf256 (v8sf,v8sf) -v4di __builtin_ia32_permdi256 (v4di,int) -v4di __builtin_ia32_permti256 (v4di,v4di,int) -v4di __builtin_ia32_extract128i256 (v4di,int) -v4di __builtin_ia32_insert128i256 (v4di,v2di,int) -v8si __builtin_ia32_maskloadd256 (pcv8si,v8si) -v4di __builtin_ia32_maskloadq256 (pcv4di,v4di) -v4si __builtin_ia32_maskloadd (pcv4si,v4si) -v2di __builtin_ia32_maskloadq (pcv2di,v2di) -void __builtin_ia32_maskstored256 (pv8si,v8si,v8si) -void __builtin_ia32_maskstoreq256 (pv4di,v4di,v4di) -void __builtin_ia32_maskstored (pv4si,v4si,v4si) -void __builtin_ia32_maskstoreq (pv2di,v2di,v2di) -v8si __builtin_ia32_psllv8si (v8si,v8si) -v4si __builtin_ia32_psllv4si (v4si,v4si) -v4di __builtin_ia32_psllv4di (v4di,v4di) -v2di __builtin_ia32_psllv2di (v2di,v2di) -v8si __builtin_ia32_psrav8si (v8si,v8si) -v4si __builtin_ia32_psrav4si (v4si,v4si) -v8si __builtin_ia32_psrlv8si (v8si,v8si) -v4si __builtin_ia32_psrlv4si (v4si,v4si) -v4di __builtin_ia32_psrlv4di (v4di,v4di) -v2di __builtin_ia32_psrlv2di (v2di,v2di) -v2df __builtin_ia32_gathersiv2df (v2df, pcdouble,v4si,v2df,int) -v4df __builtin_ia32_gathersiv4df (v4df, pcdouble,v4si,v4df,int) -v2df __builtin_ia32_gatherdiv2df (v2df, pcdouble,v2di,v2df,int) -v4df __builtin_ia32_gatherdiv4df (v4df, pcdouble,v4di,v4df,int) -v4sf __builtin_ia32_gathersiv4sf (v4sf, pcfloat,v4si,v4sf,int) -v8sf __builtin_ia32_gathersiv8sf (v8sf, pcfloat,v8si,v8sf,int) -v4sf __builtin_ia32_gatherdiv4sf (v4sf, pcfloat,v2di,v4sf,int) -v4sf __builtin_ia32_gatherdiv4sf256 (v4sf, pcfloat,v4di,v4sf,int) -v2di __builtin_ia32_gathersiv2di (v2di, pcint64,v4si,v2di,int) -v4di __builtin_ia32_gathersiv4di (v4di, pcint64,v4si,v4di,int) -v2di 
__builtin_ia32_gatherdiv2di (v2di, pcint64,v2di,v2di,int) -v4di __builtin_ia32_gatherdiv4di (v4di, pcint64,v4di,v4di,int) -v4si __builtin_ia32_gathersiv4si (v4si, pcint,v4si,v4si,int) -v8si __builtin_ia32_gathersiv8si (v8si, pcint,v8si,v8si,int) -v4si __builtin_ia32_gatherdiv4si (v4si, pcint,v2di,v4si,int) -v4si __builtin_ia32_gatherdiv4si256 (v4si, pcint,v4di,v4si,int) -@end smallexample - -The following built-in functions are available when @option{-maes} is -used. All of them generate the machine instruction that is part of the -name. - -@smallexample -v2di __builtin_ia32_aesenc128 (v2di, v2di) -v2di __builtin_ia32_aesenclast128 (v2di, v2di) -v2di __builtin_ia32_aesdec128 (v2di, v2di) -v2di __builtin_ia32_aesdeclast128 (v2di, v2di) -v2di __builtin_ia32_aeskeygenassist128 (v2di, const int) -v2di __builtin_ia32_aesimc128 (v2di) -@end smallexample - -The following built-in function is available when @option{-mpclmul} is -used. - -@table @code -@item v2di __builtin_ia32_pclmulqdq128 (v2di, v2di, const int) -Generates the @code{pclmulqdq} machine instruction. -@end table - -The following built-in function is available when @option{-mfsgsbase} is -used. All of them generate the machine instruction that is part of the -name. - -@smallexample -unsigned int __builtin_ia32_rdfsbase32 (void) -unsigned long long __builtin_ia32_rdfsbase64 (void) -unsigned int __builtin_ia32_rdgsbase32 (void) -unsigned long long __builtin_ia32_rdgsbase64 (void) -void _writefsbase_u32 (unsigned int) -void _writefsbase_u64 (unsigned long long) -void _writegsbase_u32 (unsigned int) -void _writegsbase_u64 (unsigned long long) -@end smallexample - -The following built-in function is available when @option{-mrdrnd} is -used. All of them generate the machine instruction that is part of the -name. 
- -@smallexample -unsigned int __builtin_ia32_rdrand16_step (unsigned short *) -unsigned int __builtin_ia32_rdrand32_step (unsigned int *) -unsigned int __builtin_ia32_rdrand64_step (unsigned long long *) -@end smallexample - -The following built-in functions are available when @option{-msse4a} is used. -All of them generate the machine instruction that is part of the name. - -@smallexample -void __builtin_ia32_movntsd (double *, v2df) -void __builtin_ia32_movntss (float *, v4sf) -v2di __builtin_ia32_extrq (v2di, v16qi) -v2di __builtin_ia32_extrqi (v2di, const unsigned int, const unsigned int) -v2di __builtin_ia32_insertq (v2di, v2di) -v2di __builtin_ia32_insertqi (v2di, v2di, const unsigned int, const unsigned int) -@end smallexample - -The following built-in functions are available when @option{-mxop} is used. -@smallexample -v2df __builtin_ia32_vfrczpd (v2df) -v4sf __builtin_ia32_vfrczps (v4sf) -v2df __builtin_ia32_vfrczsd (v2df) -v4sf __builtin_ia32_vfrczss (v4sf) -v4df __builtin_ia32_vfrczpd256 (v4df) -v8sf __builtin_ia32_vfrczps256 (v8sf) -v2di __builtin_ia32_vpcmov (v2di, v2di, v2di) -v2di __builtin_ia32_vpcmov_v2di (v2di, v2di, v2di) -v4si __builtin_ia32_vpcmov_v4si (v4si, v4si, v4si) -v8hi __builtin_ia32_vpcmov_v8hi (v8hi, v8hi, v8hi) -v16qi __builtin_ia32_vpcmov_v16qi (v16qi, v16qi, v16qi) -v2df __builtin_ia32_vpcmov_v2df (v2df, v2df, v2df) -v4sf __builtin_ia32_vpcmov_v4sf (v4sf, v4sf, v4sf) -v4di __builtin_ia32_vpcmov_v4di256 (v4di, v4di, v4di) -v8si __builtin_ia32_vpcmov_v8si256 (v8si, v8si, v8si) -v16hi __builtin_ia32_vpcmov_v16hi256 (v16hi, v16hi, v16hi) -v32qi __builtin_ia32_vpcmov_v32qi256 (v32qi, v32qi, v32qi) -v4df __builtin_ia32_vpcmov_v4df256 (v4df, v4df, v4df) -v8sf __builtin_ia32_vpcmov_v8sf256 (v8sf, v8sf, v8sf) -v16qi __builtin_ia32_vpcomeqb (v16qi, v16qi) -v8hi __builtin_ia32_vpcomeqw (v8hi, v8hi) -v4si __builtin_ia32_vpcomeqd (v4si, v4si) -v2di __builtin_ia32_vpcomeqq (v2di, v2di) -v16qi __builtin_ia32_vpcomequb (v16qi, v16qi) -v4si 
__builtin_ia32_vpcomequd (v4si, v4si) -v2di __builtin_ia32_vpcomequq (v2di, v2di) -v8hi __builtin_ia32_vpcomequw (v8hi, v8hi) -v8hi __builtin_ia32_vpcomeqw (v8hi, v8hi) -v16qi __builtin_ia32_vpcomfalseb (v16qi, v16qi) -v4si __builtin_ia32_vpcomfalsed (v4si, v4si) -v2di __builtin_ia32_vpcomfalseq (v2di, v2di) -v16qi __builtin_ia32_vpcomfalseub (v16qi, v16qi) -v4si __builtin_ia32_vpcomfalseud (v4si, v4si) -v2di __builtin_ia32_vpcomfalseuq (v2di, v2di) -v8hi __builtin_ia32_vpcomfalseuw (v8hi, v8hi) -v8hi __builtin_ia32_vpcomfalsew (v8hi, v8hi) -v16qi __builtin_ia32_vpcomgeb (v16qi, v16qi) -v4si __builtin_ia32_vpcomged (v4si, v4si) -v2di __builtin_ia32_vpcomgeq (v2di, v2di) -v16qi __builtin_ia32_vpcomgeub (v16qi, v16qi) -v4si __builtin_ia32_vpcomgeud (v4si, v4si) -v2di __builtin_ia32_vpcomgeuq (v2di, v2di) -v8hi __builtin_ia32_vpcomgeuw (v8hi, v8hi) -v8hi __builtin_ia32_vpcomgew (v8hi, v8hi) -v16qi __builtin_ia32_vpcomgtb (v16qi, v16qi) -v4si __builtin_ia32_vpcomgtd (v4si, v4si) -v2di __builtin_ia32_vpcomgtq (v2di, v2di) -v16qi __builtin_ia32_vpcomgtub (v16qi, v16qi) -v4si __builtin_ia32_vpcomgtud (v4si, v4si) -v2di __builtin_ia32_vpcomgtuq (v2di, v2di) -v8hi __builtin_ia32_vpcomgtuw (v8hi, v8hi) -v8hi __builtin_ia32_vpcomgtw (v8hi, v8hi) -v16qi __builtin_ia32_vpcomleb (v16qi, v16qi) -v4si __builtin_ia32_vpcomled (v4si, v4si) -v2di __builtin_ia32_vpcomleq (v2di, v2di) -v16qi __builtin_ia32_vpcomleub (v16qi, v16qi) -v4si __builtin_ia32_vpcomleud (v4si, v4si) -v2di __builtin_ia32_vpcomleuq (v2di, v2di) -v8hi __builtin_ia32_vpcomleuw (v8hi, v8hi) -v8hi __builtin_ia32_vpcomlew (v8hi, v8hi) -v16qi __builtin_ia32_vpcomltb (v16qi, v16qi) -v4si __builtin_ia32_vpcomltd (v4si, v4si) -v2di __builtin_ia32_vpcomltq (v2di, v2di) -v16qi __builtin_ia32_vpcomltub (v16qi, v16qi) -v4si __builtin_ia32_vpcomltud (v4si, v4si) -v2di __builtin_ia32_vpcomltuq (v2di, v2di) -v8hi __builtin_ia32_vpcomltuw (v8hi, v8hi) -v8hi __builtin_ia32_vpcomltw (v8hi, v8hi) -v16qi __builtin_ia32_vpcomneb 
(v16qi, v16qi) -v4si __builtin_ia32_vpcomned (v4si, v4si) -v2di __builtin_ia32_vpcomneq (v2di, v2di) -v16qi __builtin_ia32_vpcomneub (v16qi, v16qi) -v4si __builtin_ia32_vpcomneud (v4si, v4si) -v2di __builtin_ia32_vpcomneuq (v2di, v2di) -v8hi __builtin_ia32_vpcomneuw (v8hi, v8hi) -v8hi __builtin_ia32_vpcomnew (v8hi, v8hi) -v16qi __builtin_ia32_vpcomtrueb (v16qi, v16qi) -v4si __builtin_ia32_vpcomtrued (v4si, v4si) -v2di __builtin_ia32_vpcomtrueq (v2di, v2di) -v16qi __builtin_ia32_vpcomtrueub (v16qi, v16qi) -v4si __builtin_ia32_vpcomtrueud (v4si, v4si) -v2di __builtin_ia32_vpcomtrueuq (v2di, v2di) -v8hi __builtin_ia32_vpcomtrueuw (v8hi, v8hi) -v8hi __builtin_ia32_vpcomtruew (v8hi, v8hi) -v4si __builtin_ia32_vphaddbd (v16qi) -v2di __builtin_ia32_vphaddbq (v16qi) -v8hi __builtin_ia32_vphaddbw (v16qi) -v2di __builtin_ia32_vphadddq (v4si) -v4si __builtin_ia32_vphaddubd (v16qi) -v2di __builtin_ia32_vphaddubq (v16qi) -v8hi __builtin_ia32_vphaddubw (v16qi) -v2di __builtin_ia32_vphaddudq (v4si) -v4si __builtin_ia32_vphadduwd (v8hi) -v2di __builtin_ia32_vphadduwq (v8hi) -v4si __builtin_ia32_vphaddwd (v8hi) -v2di __builtin_ia32_vphaddwq (v8hi) -v8hi __builtin_ia32_vphsubbw (v16qi) -v2di __builtin_ia32_vphsubdq (v4si) -v4si __builtin_ia32_vphsubwd (v8hi) -v4si __builtin_ia32_vpmacsdd (v4si, v4si, v4si) -v2di __builtin_ia32_vpmacsdqh (v4si, v4si, v2di) -v2di __builtin_ia32_vpmacsdql (v4si, v4si, v2di) -v4si __builtin_ia32_vpmacssdd (v4si, v4si, v4si) -v2di __builtin_ia32_vpmacssdqh (v4si, v4si, v2di) -v2di __builtin_ia32_vpmacssdql (v4si, v4si, v2di) -v4si __builtin_ia32_vpmacsswd (v8hi, v8hi, v4si) -v8hi __builtin_ia32_vpmacssww (v8hi, v8hi, v8hi) -v4si __builtin_ia32_vpmacswd (v8hi, v8hi, v4si) -v8hi __builtin_ia32_vpmacsww (v8hi, v8hi, v8hi) -v4si __builtin_ia32_vpmadcsswd (v8hi, v8hi, v4si) -v4si __builtin_ia32_vpmadcswd (v8hi, v8hi, v4si) -v16qi __builtin_ia32_vpperm (v16qi, v16qi, v16qi) -v16qi __builtin_ia32_vprotb (v16qi, v16qi) -v4si __builtin_ia32_vprotd (v4si, v4si) 
-v2di __builtin_ia32_vprotq (v2di, v2di) -v8hi __builtin_ia32_vprotw (v8hi, v8hi) -v16qi __builtin_ia32_vpshab (v16qi, v16qi) -v4si __builtin_ia32_vpshad (v4si, v4si) -v2di __builtin_ia32_vpshaq (v2di, v2di) -v8hi __builtin_ia32_vpshaw (v8hi, v8hi) -v16qi __builtin_ia32_vpshlb (v16qi, v16qi) -v4si __builtin_ia32_vpshld (v4si, v4si) -v2di __builtin_ia32_vpshlq (v2di, v2di) -v8hi __builtin_ia32_vpshlw (v8hi, v8hi) -@end smallexample - -The following built-in functions are available when @option{-mfma4} is used. -All of them generate the machine instruction that is part of the name. - -@smallexample -v2df __builtin_ia32_vfmaddpd (v2df, v2df, v2df) -v4sf __builtin_ia32_vfmaddps (v4sf, v4sf, v4sf) -v2df __builtin_ia32_vfmaddsd (v2df, v2df, v2df) -v4sf __builtin_ia32_vfmaddss (v4sf, v4sf, v4sf) -v2df __builtin_ia32_vfmsubpd (v2df, v2df, v2df) -v4sf __builtin_ia32_vfmsubps (v4sf, v4sf, v4sf) -v2df __builtin_ia32_vfmsubsd (v2df, v2df, v2df) -v4sf __builtin_ia32_vfmsubss (v4sf, v4sf, v4sf) -v2df __builtin_ia32_vfnmaddpd (v2df, v2df, v2df) -v4sf __builtin_ia32_vfnmaddps (v4sf, v4sf, v4sf) -v2df __builtin_ia32_vfnmaddsd (v2df, v2df, v2df) -v4sf __builtin_ia32_vfnmaddss (v4sf, v4sf, v4sf) -v2df __builtin_ia32_vfnmsubpd (v2df, v2df, v2df) -v4sf __builtin_ia32_vfnmsubps (v4sf, v4sf, v4sf) -v2df __builtin_ia32_vfnmsubsd (v2df, v2df, v2df) -v4sf __builtin_ia32_vfnmsubss (v4sf, v4sf, v4sf) -v2df __builtin_ia32_vfmaddsubpd (v2df, v2df, v2df) -v4sf __builtin_ia32_vfmaddsubps (v4sf, v4sf, v4sf) -v2df __builtin_ia32_vfmsubaddpd (v2df, v2df, v2df) -v4sf __builtin_ia32_vfmsubaddps (v4sf, v4sf, v4sf) -v4df __builtin_ia32_vfmaddpd256 (v4df, v4df, v4df) -v8sf __builtin_ia32_vfmaddps256 (v8sf, v8sf, v8sf) -v4df __builtin_ia32_vfmsubpd256 (v4df, v4df, v4df) -v8sf __builtin_ia32_vfmsubps256 (v8sf, v8sf, v8sf) -v4df __builtin_ia32_vfnmaddpd256 (v4df, v4df, v4df) -v8sf __builtin_ia32_vfnmaddps256 (v8sf, v8sf, v8sf) -v4df __builtin_ia32_vfnmsubpd256 (v4df, v4df, v4df) -v8sf 
__builtin_ia32_vfnmsubps256 (v8sf, v8sf, v8sf) -v4df __builtin_ia32_vfmaddsubpd256 (v4df, v4df, v4df) -v8sf __builtin_ia32_vfmaddsubps256 (v8sf, v8sf, v8sf) -v4df __builtin_ia32_vfmsubaddpd256 (v4df, v4df, v4df) -v8sf __builtin_ia32_vfmsubaddps256 (v8sf, v8sf, v8sf) - -@end smallexample - -The following built-in functions are available when @option{-mlwp} is used. - -@smallexample -void __builtin_ia32_llwpcb16 (void *); -void __builtin_ia32_llwpcb32 (void *); -void __builtin_ia32_llwpcb64 (void *); -void * __builtin_ia32_llwpcb16 (void); -void * __builtin_ia32_llwpcb32 (void); -void * __builtin_ia32_llwpcb64 (void); -void __builtin_ia32_lwpval16 (unsigned short, unsigned int, unsigned short) -void __builtin_ia32_lwpval32 (unsigned int, unsigned int, unsigned int) -void __builtin_ia32_lwpval64 (unsigned __int64, unsigned int, unsigned int) -unsigned char __builtin_ia32_lwpins16 (unsigned short, unsigned int, unsigned short) -unsigned char __builtin_ia32_lwpins32 (unsigned int, unsigned int, unsigned int) -unsigned char __builtin_ia32_lwpins64 (unsigned __int64, unsigned int, unsigned int) -@end smallexample - -The following built-in functions are available when @option{-mbmi} is used. -All of them generate the machine instruction that is part of the name. -@smallexample -unsigned int __builtin_ia32_bextr_u32(unsigned int, unsigned int); -unsigned long long __builtin_ia32_bextr_u64 (unsigned long long, unsigned long long); -@end smallexample - -The following built-in functions are available when @option{-mbmi2} is used. -All of them generate the machine instruction that is part of the name. 
-@smallexample -unsigned int _bzhi_u32 (unsigned int, unsigned int) -unsigned int _pdep_u32 (unsigned int, unsigned int) -unsigned int _pext_u32 (unsigned int, unsigned int) -unsigned long long _bzhi_u64 (unsigned long long, unsigned long long) -unsigned long long _pdep_u64 (unsigned long long, unsigned long long) -unsigned long long _pext_u64 (unsigned long long, unsigned long long) -@end smallexample - -The following built-in functions are available when @option{-mlzcnt} is used. -All of them generate the machine instruction that is part of the name. -@smallexample -unsigned short __builtin_ia32_lzcnt_16(unsigned short); -unsigned int __builtin_ia32_lzcnt_u32(unsigned int); -unsigned long long __builtin_ia32_lzcnt_u64 (unsigned long long); -@end smallexample - -The following built-in functions are available when @option{-mfxsr} is used. -All of them generate the machine instruction that is part of the name. -@smallexample -void __builtin_ia32_fxsave (void *) -void __builtin_ia32_fxrstor (void *) -void __builtin_ia32_fxsave64 (void *) -void __builtin_ia32_fxrstor64 (void *) -@end smallexample - -The following built-in functions are available when @option{-mxsave} is used. -All of them generate the machine instruction that is part of the name. -@smallexample -void __builtin_ia32_xsave (void *, long long) -void __builtin_ia32_xrstor (void *, long long) -void __builtin_ia32_xsave64 (void *, long long) -void __builtin_ia32_xrstor64 (void *, long long) -@end smallexample - -The following built-in functions are available when @option{-mxsaveopt} is used. -All of them generate the machine instruction that is part of the name. -@smallexample -void __builtin_ia32_xsaveopt (void *, long long) -void __builtin_ia32_xsaveopt64 (void *, long long) -@end smallexample - -The following built-in functions are available when @option{-mtbm} is used. -Both of them generate the immediate form of the bextr machine instruction. 
-@smallexample -unsigned int __builtin_ia32_bextri_u32 (unsigned int, const unsigned int); -unsigned long long __builtin_ia32_bextri_u64 (unsigned long long, const unsigned long long); -@end smallexample - - -The following built-in functions are available when @option{-m3dnow} is used. -All of them generate the machine instruction that is part of the name. - -@smallexample -void __builtin_ia32_femms (void) -v8qi __builtin_ia32_pavgusb (v8qi, v8qi) -v2si __builtin_ia32_pf2id (v2sf) -v2sf __builtin_ia32_pfacc (v2sf, v2sf) -v2sf __builtin_ia32_pfadd (v2sf, v2sf) -v2si __builtin_ia32_pfcmpeq (v2sf, v2sf) -v2si __builtin_ia32_pfcmpge (v2sf, v2sf) -v2si __builtin_ia32_pfcmpgt (v2sf, v2sf) -v2sf __builtin_ia32_pfmax (v2sf, v2sf) -v2sf __builtin_ia32_pfmin (v2sf, v2sf) -v2sf __builtin_ia32_pfmul (v2sf, v2sf) -v2sf __builtin_ia32_pfrcp (v2sf) -v2sf __builtin_ia32_pfrcpit1 (v2sf, v2sf) -v2sf __builtin_ia32_pfrcpit2 (v2sf, v2sf) -v2sf __builtin_ia32_pfrsqrt (v2sf) -v2sf __builtin_ia32_pfsub (v2sf, v2sf) -v2sf __builtin_ia32_pfsubr (v2sf, v2sf) -v2sf __builtin_ia32_pi2fd (v2si) -v4hi __builtin_ia32_pmulhrw (v4hi, v4hi) -@end smallexample - -The following built-in functions are available when both @option{-m3dnow} -and @option{-march=athlon} are used. All of them generate the machine -instruction that is part of the name. - -@smallexample -v2si __builtin_ia32_pf2iw (v2sf) -v2sf __builtin_ia32_pfnacc (v2sf, v2sf) -v2sf __builtin_ia32_pfpnacc (v2sf, v2sf) -v2sf __builtin_ia32_pi2fw (v2si) -v2sf __builtin_ia32_pswapdsf (v2sf) -v2si __builtin_ia32_pswapdsi (v2si) -@end smallexample - -The following built-in functions are available when @option{-mrtm} is used -They are used for restricted transactional memory. These are the internal -low level functions. Normally the functions in -@ref{x86 transactional memory intrinsics} should be used instead. 
- -@smallexample -int __builtin_ia32_xbegin () -void __builtin_ia32_xend () -void __builtin_ia32_xabort (status) -int __builtin_ia32_xtest () -@end smallexample - -@node x86 transactional memory intrinsics -@subsection x86 transaction memory intrinsics - -Hardware transactional memory intrinsics for x86. These allow to use -memory transactions with RTM (Restricted Transactional Memory). -For using HLE (Hardware Lock Elision) see @ref{x86 specific memory model extensions for transactional memory} instead. -This support is enabled with the @option{-mrtm} option. - -A memory transaction commits all changes to memory in an atomic way, -as visible to other threads. If the transaction fails it is rolled back -and all side effects discarded. - -Generally there is no guarantee that a memory transaction ever succeeds -and suitable fallback code always needs to be supplied. - -@deftypefn {RTM Function} {unsigned} _xbegin () -Start a RTM (Restricted Transactional Memory) transaction. -Returns _XBEGIN_STARTED when the transaction -started successfully (note this is not 0, so the constant has to be -explicitely tested). When the transaction aborts all side effects -are undone and an abort code is returned. There is no guarantee -any transaction ever succeeds, so there always needs to be a valid -tested fallback path. -@end deftypefn - -@smallexample -#include <immintrin.h> - -if ((status = _xbegin ()) == _XBEGIN_STARTED) @{ - ... transaction code... - _xend (); -@} else @{ - ... non transactional fallback path... -@} -@end smallexample - -Valid abort status bits (when the value is not @code{_XBEGIN_STARTED}) are: - -@table @code -@item _XABORT_EXPLICIT -Transaction explicitely aborted with @code{_xabort}. The parameter passed -to @code{_xabort} is available with @code{_XABORT_CODE(status)} -@item _XABORT_RETRY -Transaction retry is possible. 
-@item _XABORT_CONFLICT -Transaction abort due to a memory conflict with another thread. -@item _XABORT_CAPACITY -Transaction abort due to the transaction using too much memory. -@item _XABORT_DEBUG -Transaction abort due to a debug trap. -@item _XABORT_NESTED -Transaction abort in an inner nested transaction. -@end table - -@deftypefn {RTM Function} {void} _xend () -Commit the current transaction. When no transaction is active this will -fault. All memory side effects of the transaction will become visible -to other threads in an atomic manner. -@end deftypefn - -@deftypefn {RTM Function} {int} _xtest () -Return a nonzero value when a transaction is currently active, otherwise 0. -@end deftypefn - -@deftypefn {RTM Function} {void} _xabort (status) -Abort the current transaction. When no transaction is active this is a no-op. -@code{status} must be an 8-bit constant that is included in the status code returned -by @code{_xbegin}. -@end deftypefn - @node MIPS DSP Built-in Functions @subsection MIPS DSP Built-in Functions @@ -17266,6 +15792,1480 @@ The intrinsic @code{void __tile_network_barrier (void)} is used to guarantee that no network operations before it are reordered with those after it. +@node x86 Built-in Functions +@subsection x86 Built-in Functions + +These built-in functions are available for the x86-32 and x86-64 family +of computers, depending on the command-line switches used. + +If you specify command-line switches such as @option{-msse}, +the compiler could use the extended instruction sets even if the built-ins +are not used explicitly in the program. For this reason, applications +that perform run-time CPU detection must compile separate files for each +supported architecture, using the appropriate flags. In particular, +the file containing the CPU detection code should be compiled without +these options.
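Editorial note, not part of the commit: the run-time CPU detection paragraph just above can be illustrated with a small dispatcher. The following is only a sketch under stated assumptions. The names sum_generic, sum_sse, sum_fn and pick_sum are hypothetical, and both kernels are plain C so that the example fits in one file. In a real build the SSE kernel would sit in its own file compiled with -msse, while the dispatcher would be compiled without it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernels.  In a real build each would sit in its own file:
   the SSE one compiled with -msse, the generic one compiled without it.
   Both are plain C here so that the sketch stays in a single file.  */
float
sum_generic (const float *v, size_t n)
{
  float s = 0.0f;
  assert (v != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    s += v[i];
  return s;
}

float
sum_sse (const float *v, size_t n)
{
  /* Stand-in body; a real version would be built with -msse.  */
  return sum_generic (v, n);
}

typedef float (*sum_fn) (const float *, size_t);

/* Dispatcher.  This is the code that belongs in the file compiled
   WITHOUT -msse, so that it can run safely on any x86 CPU.  On other
   targets the built-ins are skipped and the generic kernel is used.  */
sum_fn
pick_sum (void)
{
#if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
  __builtin_cpu_init ();
  if (__builtin_cpu_supports ("sse"))
    return sum_sse;
#endif
  return sum_generic;
}
```

A caller would typically call pick_sum once, keep the returned pointer, and invoke it for every subsequent sum, so that the feature check is not repeated on each call.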
+ +The following machine modes are available for use with MMX built-in functions +(@pxref{Vector Extensions}): @code{V2SI} for a vector of two 32-bit integers, +@code{V4HI} for a vector of four 16-bit integers, and @code{V8QI} for a +vector of eight 8-bit integers. Some of the built-in functions operate on +MMX registers as a whole 64-bit entity, these use @code{V1DI} as their mode. + +If 3DNow!@: extensions are enabled, @code{V2SF} is used as a mode for a vector +of two 32-bit floating-point values. + +If SSE extensions are enabled, @code{V4SF} is used for a vector of four 32-bit +floating-point values. Some instructions use a vector of four 32-bit +integers, these use @code{V4SI}. Finally, some instructions operate on an +entire vector register, interpreting it as a 128-bit integer, these use mode +@code{TI}. + +In 64-bit mode, the x86-64 family of processors uses additional built-in +functions for efficient use of @code{TF} (@code{__float128}) 128-bit +floating point and @code{TC} 128-bit complex floating-point values. + +The following floating-point built-in functions are available in 64-bit +mode. All of them implement the function that is part of the name. + +@smallexample +__float128 __builtin_fabsq (__float128) +__float128 __builtin_copysignq (__float128, __float128) +@end smallexample + +The following built-in function is always available. + +@table @code +@item void __builtin_ia32_pause (void) +Generates the @code{pause} machine instruction with a compiler memory +barrier. +@end table + +The following floating-point built-in functions are made available in the +64-bit mode. + +@table @code +@item __float128 __builtin_infq (void) +Similar to @code{__builtin_inf}, except the return type is @code{__float128}. +@findex __builtin_infq + +@item __float128 __builtin_huge_valq (void) +Similar to @code{__builtin_huge_val}, except the return type is @code{__float128}. 
+@findex __builtin_huge_valq +@end table + +The following built-in functions are always available and can be used to +check the target platform type. + +@deftypefn {Built-in Function} void __builtin_cpu_init (void) +This function runs the CPU detection code to check the type of CPU and the +features supported. This built-in function needs to be invoked along with the built-in functions +to check CPU type and features, @code{__builtin_cpu_is} and +@code{__builtin_cpu_supports}, only when used in a function that is +executed before any constructors are called. The CPU detection code is +automatically executed in a very high priority constructor. + +For example, this function has to be used in @code{ifunc} resolvers that +check for CPU type using the built-in functions @code{__builtin_cpu_is} +and @code{__builtin_cpu_supports}, or in constructors on targets that +don't support constructor priority. +@smallexample + +static void (*resolve_memcpy (void)) (void) +@{ + // ifunc resolvers fire before constructors, explicitly call the init + // function. + __builtin_cpu_init (); + if (__builtin_cpu_supports ("ssse3")) + return ssse3_memcpy; // super fast memcpy with ssse3 instructions. + else + return default_memcpy; +@} + +void *memcpy (void *, const void *, size_t) + __attribute__ ((ifunc ("resolve_memcpy"))); +@end smallexample + +@end deftypefn + +@deftypefn {Built-in Function} int __builtin_cpu_is (const char *@var{cpuname}) +This function returns a positive integer if the run-time CPU +is of type @var{cpuname} +and returns @code{0} otherwise. The following CPU names can be detected: + +@table @samp +@item intel +Intel CPU. + +@item atom +Intel Atom CPU. + +@item core2 +Intel Core 2 CPU. + +@item corei7 +Intel Core i7 CPU. + +@item nehalem +Intel Core i7 Nehalem CPU. + +@item westmere +Intel Core i7 Westmere CPU. + +@item sandybridge +Intel Core i7 Sandy Bridge CPU. + +@item amd +AMD CPU. + +@item amdfam10h +AMD Family 10h CPU. 
+ +@item barcelona +AMD Family 10h Barcelona CPU. + +@item shanghai +AMD Family 10h Shanghai CPU. + +@item istanbul +AMD Family 10h Istanbul CPU. + +@item btver1 +AMD Family 14h CPU. + +@item amdfam15h +AMD Family 15h CPU. + +@item bdver1 +AMD Family 15h Bulldozer version 1. + +@item bdver2 +AMD Family 15h Bulldozer version 2. + +@item bdver3 +AMD Family 15h Bulldozer version 3. + +@item bdver4 +AMD Family 15h Bulldozer version 4. + +@item btver2 +AMD Family 16h CPU. +@end table + +Here is an example: +@smallexample +if (__builtin_cpu_is ("corei7")) + @{ + do_corei7 (); // Core i7 specific implementation. + @} +else + @{ + do_generic (); // Generic implementation. + @} +@end smallexample +@end deftypefn + +@deftypefn {Built-in Function} int __builtin_cpu_supports (const char *@var{feature}) +This function returns a positive integer if the run-time CPU +supports @var{feature} +and returns @code{0} otherwise. The following features can be detected: + +@table @samp +@item cmov +CMOV instruction. +@item mmx +MMX instructions. +@item popcnt +POPCNT instruction. +@item sse +SSE instructions. +@item sse2 +SSE2 instructions. +@item sse3 +SSE3 instructions. +@item ssse3 +SSSE3 instructions. +@item sse4.1 +SSE4.1 instructions. +@item sse4.2 +SSE4.2 instructions. +@item avx +AVX instructions. +@item avx2 +AVX2 instructions. +@item avx512f +AVX512F instructions. +@end table + +Here is an example: +@smallexample +if (__builtin_cpu_supports ("popcnt")) + @{ + asm("popcnt %1,%0" : "=r"(count) : "rm"(n) : "cc"); + @} +else + @{ + count = generic_countbits (n); //generic implementation. + @} +@end smallexample +@end deftypefn + + +The following built-in functions are made available by @option{-mmmx}. +All of them generate the machine instruction that is part of the name. 
+ +@smallexample +v8qi __builtin_ia32_paddb (v8qi, v8qi) +v4hi __builtin_ia32_paddw (v4hi, v4hi) +v2si __builtin_ia32_paddd (v2si, v2si) +v8qi __builtin_ia32_psubb (v8qi, v8qi) +v4hi __builtin_ia32_psubw (v4hi, v4hi) +v2si __builtin_ia32_psubd (v2si, v2si) +v8qi __builtin_ia32_paddsb (v8qi, v8qi) +v4hi __builtin_ia32_paddsw (v4hi, v4hi) +v8qi __builtin_ia32_psubsb (v8qi, v8qi) +v4hi __builtin_ia32_psubsw (v4hi, v4hi) +v8qi __builtin_ia32_paddusb (v8qi, v8qi) +v4hi __builtin_ia32_paddusw (v4hi, v4hi) +v8qi __builtin_ia32_psubusb (v8qi, v8qi) +v4hi __builtin_ia32_psubusw (v4hi, v4hi) +v4hi __builtin_ia32_pmullw (v4hi, v4hi) +v4hi __builtin_ia32_pmulhw (v4hi, v4hi) +di __builtin_ia32_pand (di, di) +di __builtin_ia32_pandn (di,di) +di __builtin_ia32_por (di, di) +di __builtin_ia32_pxor (di, di) +v8qi __builtin_ia32_pcmpeqb (v8qi, v8qi) +v4hi __builtin_ia32_pcmpeqw (v4hi, v4hi) +v2si __builtin_ia32_pcmpeqd (v2si, v2si) +v8qi __builtin_ia32_pcmpgtb (v8qi, v8qi) +v4hi __builtin_ia32_pcmpgtw (v4hi, v4hi) +v2si __builtin_ia32_pcmpgtd (v2si, v2si) +v8qi __builtin_ia32_punpckhbw (v8qi, v8qi) +v4hi __builtin_ia32_punpckhwd (v4hi, v4hi) +v2si __builtin_ia32_punpckhdq (v2si, v2si) +v8qi __builtin_ia32_punpcklbw (v8qi, v8qi) +v4hi __builtin_ia32_punpcklwd (v4hi, v4hi) +v2si __builtin_ia32_punpckldq (v2si, v2si) +v8qi __builtin_ia32_packsswb (v4hi, v4hi) +v4hi __builtin_ia32_packssdw (v2si, v2si) +v8qi __builtin_ia32_packuswb (v4hi, v4hi) + +v4hi __builtin_ia32_psllw (v4hi, v4hi) +v2si __builtin_ia32_pslld (v2si, v2si) +v1di __builtin_ia32_psllq (v1di, v1di) +v4hi __builtin_ia32_psrlw (v4hi, v4hi) +v2si __builtin_ia32_psrld (v2si, v2si) +v1di __builtin_ia32_psrlq (v1di, v1di) +v4hi __builtin_ia32_psraw (v4hi, v4hi) +v2si __builtin_ia32_psrad (v2si, v2si) +v4hi __builtin_ia32_psllwi (v4hi, int) +v2si __builtin_ia32_pslldi (v2si, int) +v1di __builtin_ia32_psllqi (v1di, int) +v4hi __builtin_ia32_psrlwi (v4hi, int) +v2si __builtin_ia32_psrldi (v2si, int) +v1di __builtin_ia32_psrlqi 
(v1di, int) +v4hi __builtin_ia32_psrawi (v4hi, int) +v2si __builtin_ia32_psradi (v2si, int) + +@end smallexample + +The following built-in functions are made available either with +@option{-msse}, or with a combination of @option{-m3dnow} and +@option{-march=athlon}. All of them generate the machine +instruction that is part of the name. + +@smallexample +v4hi __builtin_ia32_pmulhuw (v4hi, v4hi) +v8qi __builtin_ia32_pavgb (v8qi, v8qi) +v4hi __builtin_ia32_pavgw (v4hi, v4hi) +v1di __builtin_ia32_psadbw (v8qi, v8qi) +v8qi __builtin_ia32_pmaxub (v8qi, v8qi) +v4hi __builtin_ia32_pmaxsw (v4hi, v4hi) +v8qi __builtin_ia32_pminub (v8qi, v8qi) +v4hi __builtin_ia32_pminsw (v4hi, v4hi) +int __builtin_ia32_pmovmskb (v8qi) +void __builtin_ia32_maskmovq (v8qi, v8qi, char *) +void __builtin_ia32_movntq (di *, di) +void __builtin_ia32_sfence (void) +@end smallexample + +The following built-in functions are available when @option{-msse} is used. +All of them generate the machine instruction that is part of the name. 
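Editorial note, not part of the commit: as a hedged sketch of how one of the -msse built-ins in this group might be called, the following wraps __builtin_ia32_maxps, which appears in the listing for this option. The typedef follows the generic vector extension syntax, and the wrapper name max4 is an assumption made for illustration. The built-in is used only when the compiler is GCC with SSE enabled; otherwise an element-wise fallback is used.

```c
#include <assert.h>

/* Vector of four 32-bit floats, matching the V4SF mode.  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Element-wise maximum of two vectors (hypothetical wrapper).  */
v4sf
max4 (v4sf a, v4sf b)
{
#if defined (__GNUC__) && !defined (__clang__) && defined (__SSE__)
  /* Expected to expand to the maxps instruction.  */
  return __builtin_ia32_maxps (a, b);
#else
  /* Portable fallback using vector element access.  */
  v4sf r;
  for (int i = 0; i < 4; i++)
    r[i] = a[i] > b[i] ? a[i] : b[i];
  return r;
#endif
}
```

In application code the intrinsics from xmmintrin.h are usually a better choice than calling the built-ins directly, since the built-in names are a compiler-internal interface.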
+ +@smallexample +int __builtin_ia32_comieq (v4sf, v4sf) +int __builtin_ia32_comineq (v4sf, v4sf) +int __builtin_ia32_comilt (v4sf, v4sf) +int __builtin_ia32_comile (v4sf, v4sf) +int __builtin_ia32_comigt (v4sf, v4sf) +int __builtin_ia32_comige (v4sf, v4sf) +int __builtin_ia32_ucomieq (v4sf, v4sf) +int __builtin_ia32_ucomineq (v4sf, v4sf) +int __builtin_ia32_ucomilt (v4sf, v4sf) +int __builtin_ia32_ucomile (v4sf, v4sf) +int __builtin_ia32_ucomigt (v4sf, v4sf) +int __builtin_ia32_ucomige (v4sf, v4sf) +v4sf __builtin_ia32_addps (v4sf, v4sf) +v4sf __builtin_ia32_subps (v4sf, v4sf) +v4sf __builtin_ia32_mulps (v4sf, v4sf) +v4sf __builtin_ia32_divps (v4sf, v4sf) +v4sf __builtin_ia32_addss (v4sf, v4sf) +v4sf __builtin_ia32_subss (v4sf, v4sf) +v4sf __builtin_ia32_mulss (v4sf, v4sf) +v4sf __builtin_ia32_divss (v4sf, v4sf) +v4sf __builtin_ia32_cmpeqps (v4sf, v4sf) +v4sf __builtin_ia32_cmpltps (v4sf, v4sf) +v4sf __builtin_ia32_cmpleps (v4sf, v4sf) +v4sf __builtin_ia32_cmpgtps (v4sf, v4sf) +v4sf __builtin_ia32_cmpgeps (v4sf, v4sf) +v4sf __builtin_ia32_cmpunordps (v4sf, v4sf) +v4sf __builtin_ia32_cmpneqps (v4sf, v4sf) +v4sf __builtin_ia32_cmpnltps (v4sf, v4sf) +v4sf __builtin_ia32_cmpnleps (v4sf, v4sf) +v4sf __builtin_ia32_cmpngtps (v4sf, v4sf) +v4sf __builtin_ia32_cmpngeps (v4sf, v4sf) +v4sf __builtin_ia32_cmpordps (v4sf, v4sf) +v4sf __builtin_ia32_cmpeqss (v4sf, v4sf) +v4sf __builtin_ia32_cmpltss (v4sf, v4sf) +v4sf __builtin_ia32_cmpless (v4sf, v4sf) +v4sf __builtin_ia32_cmpunordss (v4sf, v4sf) +v4sf __builtin_ia32_cmpneqss (v4sf, v4sf) +v4sf __builtin_ia32_cmpnltss (v4sf, v4sf) +v4sf __builtin_ia32_cmpnless (v4sf, v4sf) +v4sf __builtin_ia32_cmpordss (v4sf, v4sf) +v4sf __builtin_ia32_maxps (v4sf, v4sf) +v4sf __builtin_ia32_maxss (v4sf, v4sf) +v4sf __builtin_ia32_minps (v4sf, v4sf) +v4sf __builtin_ia32_minss (v4sf, v4sf) +v4sf __builtin_ia32_andps (v4sf, v4sf) +v4sf __builtin_ia32_andnps (v4sf, v4sf) +v4sf __builtin_ia32_orps (v4sf, v4sf) +v4sf __builtin_ia32_xorps (v4sf, 
v4sf) +v4sf __builtin_ia32_movss (v4sf, v4sf) +v4sf __builtin_ia32_movhlps (v4sf, v4sf) +v4sf __builtin_ia32_movlhps (v4sf, v4sf) +v4sf __builtin_ia32_unpckhps (v4sf, v4sf) +v4sf __builtin_ia32_unpcklps (v4sf, v4sf) +v4sf __builtin_ia32_cvtpi2ps (v4sf, v2si) +v4sf __builtin_ia32_cvtsi2ss (v4sf, int) +v2si __builtin_ia32_cvtps2pi (v4sf) +int __builtin_ia32_cvtss2si (v4sf) +v2si __builtin_ia32_cvttps2pi (v4sf) +int __builtin_ia32_cvttss2si (v4sf) +v4sf __builtin_ia32_rcpps (v4sf) +v4sf __builtin_ia32_rsqrtps (v4sf) +v4sf __builtin_ia32_sqrtps (v4sf) +v4sf __builtin_ia32_rcpss (v4sf) +v4sf __builtin_ia32_rsqrtss (v4sf) +v4sf __builtin_ia32_sqrtss (v4sf) +v4sf __builtin_ia32_shufps (v4sf, v4sf, int) +void __builtin_ia32_movntps (float *, v4sf) +int __builtin_ia32_movmskps (v4sf) +@end smallexample + +The following built-in functions are available when @option{-msse} is used. + +@table @code +@item v4sf __builtin_ia32_loadups (float *) +Generates the @code{movups} machine instruction as a load from memory. +@item void __builtin_ia32_storeups (float *, v4sf) +Generates the @code{movups} machine instruction as a store to memory. +@item v4sf __builtin_ia32_loadss (float *) +Generates the @code{movss} machine instruction as a load from memory. +@item v4sf __builtin_ia32_loadhps (v4sf, const v2sf *) +Generates the @code{movhps} machine instruction as a load from memory. +@item v4sf __builtin_ia32_loadlps (v4sf, const v2sf *) +Generates the @code{movlps} machine instruction as a load from memory +@item void __builtin_ia32_storehps (v2sf *, v4sf) +Generates the @code{movhps} machine instruction as a store to memory. +@item void __builtin_ia32_storelps (v2sf *, v4sf) +Generates the @code{movlps} machine instruction as a store to memory. +@end table + +The following built-in functions are available when @option{-msse2} is used. +All of them generate the machine instruction that is part of the name. 
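Editorial note, not part of the commit: a minimal sketch of one -msse2 built-in from this group, __builtin_ia32_movmskpd, which collects the sign bits of a two-element double vector into an integer. The wrapper name sign_mask2 is hypothetical. The fallback branch, taken when the compiler is not GCC or SSE2 is not enabled, is an assumption about equivalent behavior based on the signbit macro.

```c
#include <assert.h>
#include <math.h>

/* Vector of two 64-bit doubles, matching the V2DF mode.  */
typedef double v2df __attribute__ ((vector_size (16)));

/* Return a 2-bit mask: bit i is set when element i has its sign bit set
   (hypothetical wrapper).  */
int
sign_mask2 (v2df x)
{
#if defined (__GNUC__) && !defined (__clang__) && defined (__SSE2__)
  /* Expected to expand to the movmskpd instruction.  */
  return __builtin_ia32_movmskpd (x);
#else
  /* Portable fallback; signbit also reports negative zero.  */
  return (signbit (x[0]) ? 1 : 0) | (signbit (x[1]) ? 2 : 0);
#endif
}
```

Because the mask is taken from the sign bit itself, negative zero counts as negative, which differs from a plain comparison against zero.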
+ +@smallexample +int __builtin_ia32_comisdeq (v2df, v2df) +int __builtin_ia32_comisdlt (v2df, v2df) +int __builtin_ia32_comisdle (v2df, v2df) +int __builtin_ia32_comisdgt (v2df, v2df) +int __builtin_ia32_comisdge (v2df, v2df) +int __builtin_ia32_comisdneq (v2df, v2df) +int __builtin_ia32_ucomisdeq (v2df, v2df) +int __builtin_ia32_ucomisdlt (v2df, v2df) +int __builtin_ia32_ucomisdle (v2df, v2df) +int __builtin_ia32_ucomisdgt (v2df, v2df) +int __builtin_ia32_ucomisdge (v2df, v2df) +int __builtin_ia32_ucomisdneq (v2df, v2df) +v2df __builtin_ia32_cmpeqpd (v2df, v2df) +v2df __builtin_ia32_cmpltpd (v2df, v2df) +v2df __builtin_ia32_cmplepd (v2df, v2df) +v2df __builtin_ia32_cmpgtpd (v2df, v2df) +v2df __builtin_ia32_cmpgepd (v2df, v2df) +v2df __builtin_ia32_cmpunordpd (v2df, v2df) +v2df __builtin_ia32_cmpneqpd (v2df, v2df) +v2df __builtin_ia32_cmpnltpd (v2df, v2df) +v2df __builtin_ia32_cmpnlepd (v2df, v2df) +v2df __builtin_ia32_cmpngtpd (v2df, v2df) +v2df __builtin_ia32_cmpngepd (v2df, v2df) +v2df __builtin_ia32_cmpordpd (v2df, v2df) +v2df __builtin_ia32_cmpeqsd (v2df, v2df) +v2df __builtin_ia32_cmpltsd (v2df, v2df) +v2df __builtin_ia32_cmplesd (v2df, v2df) +v2df __builtin_ia32_cmpunordsd (v2df, v2df) +v2df __builtin_ia32_cmpneqsd (v2df, v2df) +v2df __builtin_ia32_cmpnltsd (v2df, v2df) +v2df __builtin_ia32_cmpnlesd (v2df, v2df) +v2df __builtin_ia32_cmpordsd (v2df, v2df) +v2di __builtin_ia32_paddq (v2di, v2di) +v2di __builtin_ia32_psubq (v2di, v2di) +v2df __builtin_ia32_addpd (v2df, v2df) +v2df __builtin_ia32_subpd (v2df, v2df) +v2df __builtin_ia32_mulpd (v2df, v2df) +v2df __builtin_ia32_divpd (v2df, v2df) +v2df __builtin_ia32_addsd (v2df, v2df) +v2df __builtin_ia32_subsd (v2df, v2df) +v2df __builtin_ia32_mulsd (v2df, v2df) +v2df __builtin_ia32_divsd (v2df, v2df) +v2df __builtin_ia32_minpd (v2df, v2df) +v2df __builtin_ia32_maxpd (v2df, v2df) +v2df __builtin_ia32_minsd (v2df, v2df) +v2df __builtin_ia32_maxsd (v2df, v2df) +v2df __builtin_ia32_andpd (v2df, v2df) +v2df 
__builtin_ia32_andnpd (v2df, v2df) +v2df __builtin_ia32_orpd (v2df, v2df) +v2df __builtin_ia32_xorpd (v2df, v2df) +v2df __builtin_ia32_movsd (v2df, v2df) +v2df __builtin_ia32_unpckhpd (v2df, v2df) +v2df __builtin_ia32_unpcklpd (v2df, v2df) +v16qi __builtin_ia32_paddb128 (v16qi, v16qi) +v8hi __builtin_ia32_paddw128 (v8hi, v8hi) +v4si __builtin_ia32_paddd128 (v4si, v4si) +v2di __builtin_ia32_paddq128 (v2di, v2di) +v16qi __builtin_ia32_psubb128 (v16qi, v16qi) +v8hi __builtin_ia32_psubw128 (v8hi, v8hi) +v4si __builtin_ia32_psubd128 (v4si, v4si) +v2di __builtin_ia32_psubq128 (v2di, v2di) +v8hi __builtin_ia32_pmullw128 (v8hi, v8hi) +v8hi __builtin_ia32_pmulhw128 (v8hi, v8hi) +v2di __builtin_ia32_pand128 (v2di, v2di) +v2di __builtin_ia32_pandn128 (v2di, v2di) +v2di __builtin_ia32_por128 (v2di, v2di) +v2di __builtin_ia32_pxor128 (v2di, v2di) +v16qi __builtin_ia32_pavgb128 (v16qi, v16qi) +v8hi __builtin_ia32_pavgw128 (v8hi, v8hi) +v16qi __builtin_ia32_pcmpeqb128 (v16qi, v16qi) +v8hi __builtin_ia32_pcmpeqw128 (v8hi, v8hi) +v4si __builtin_ia32_pcmpeqd128 (v4si, v4si) +v16qi __builtin_ia32_pcmpgtb128 (v16qi, v16qi) +v8hi __builtin_ia32_pcmpgtw128 (v8hi, v8hi) +v4si __builtin_ia32_pcmpgtd128 (v4si, v4si) +v16qi __builtin_ia32_pmaxub128 (v16qi, v16qi) +v8hi __builtin_ia32_pmaxsw128 (v8hi, v8hi) +v16qi __builtin_ia32_pminub128 (v16qi, v16qi) +v8hi __builtin_ia32_pminsw128 (v8hi, v8hi) +v16qi __builtin_ia32_punpckhbw128 (v16qi, v16qi) +v8hi __builtin_ia32_punpckhwd128 (v8hi, v8hi) +v4si __builtin_ia32_punpckhdq128 (v4si, v4si) +v2di __builtin_ia32_punpckhqdq128 (v2di, v2di) +v16qi __builtin_ia32_punpcklbw128 (v16qi, v16qi) +v8hi __builtin_ia32_punpcklwd128 (v8hi, v8hi) +v4si __builtin_ia32_punpckldq128 (v4si, v4si) +v2di __builtin_ia32_punpcklqdq128 (v2di, v2di) +v16qi __builtin_ia32_packsswb128 (v8hi, v8hi) +v8hi __builtin_ia32_packssdw128 (v4si, v4si) +v16qi __builtin_ia32_packuswb128 (v8hi, v8hi) +v8hi __builtin_ia32_pmulhuw128 (v8hi, v8hi) +void __builtin_ia32_maskmovdqu 
(v16qi, v16qi) +v2df __builtin_ia32_loadupd (double *) +void __builtin_ia32_storeupd (double *, v2df) +v2df __builtin_ia32_loadhpd (v2df, double const *) +v2df __builtin_ia32_loadlpd (v2df, double const *) +int __builtin_ia32_movmskpd (v2df) +int __builtin_ia32_pmovmskb128 (v16qi) +void __builtin_ia32_movnti (int *, int) +void __builtin_ia32_movnti64 (long long int *, long long int) +void __builtin_ia32_movntpd (double *, v2df) +void __builtin_ia32_movntdq (v2df *, v2df) +v4si __builtin_ia32_pshufd (v4si, int) +v8hi __builtin_ia32_pshuflw (v8hi, int) +v8hi __builtin_ia32_pshufhw (v8hi, int) +v2di __builtin_ia32_psadbw128 (v16qi, v16qi) +v2df __builtin_ia32_sqrtpd (v2df) +v2df __builtin_ia32_sqrtsd (v2df) +v2df __builtin_ia32_shufpd (v2df, v2df, int) +v2df __builtin_ia32_cvtdq2pd (v4si) +v4sf __builtin_ia32_cvtdq2ps (v4si) +v4si __builtin_ia32_cvtpd2dq (v2df) +v2si __builtin_ia32_cvtpd2pi (v2df) +v4sf __builtin_ia32_cvtpd2ps (v2df) +v4si __builtin_ia32_cvttpd2dq (v2df) +v2si __builtin_ia32_cvttpd2pi (v2df) +v2df __builtin_ia32_cvtpi2pd (v2si) +int __builtin_ia32_cvtsd2si (v2df) +int __builtin_ia32_cvttsd2si (v2df) +long long __builtin_ia32_cvtsd2si64 (v2df) +long long __builtin_ia32_cvttsd2si64 (v2df) +v4si __builtin_ia32_cvtps2dq (v4sf) +v2df __builtin_ia32_cvtps2pd (v4sf) +v4si __builtin_ia32_cvttps2dq (v4sf) +v2df __builtin_ia32_cvtsi2sd (v2df, int) +v2df __builtin_ia32_cvtsi642sd (v2df, long long) +v4sf __builtin_ia32_cvtsd2ss (v4sf, v2df) +v2df __builtin_ia32_cvtss2sd (v2df, v4sf) +void __builtin_ia32_clflush (const void *) +void __builtin_ia32_lfence (void) +void __builtin_ia32_mfence (void) +v16qi __builtin_ia32_loaddqu (const char *) +void __builtin_ia32_storedqu (char *, v16qi) +v1di __builtin_ia32_pmuludq (v2si, v2si) +v2di __builtin_ia32_pmuludq128 (v4si, v4si) +v8hi __builtin_ia32_psllw128 (v8hi, v8hi) +v4si __builtin_ia32_pslld128 (v4si, v4si) +v2di __builtin_ia32_psllq128 (v2di, v2di) +v8hi __builtin_ia32_psrlw128 (v8hi, v8hi) +v4si 
__builtin_ia32_psrld128 (v4si, v4si) +v2di __builtin_ia32_psrlq128 (v2di, v2di) +v8hi __builtin_ia32_psraw128 (v8hi, v8hi) +v4si __builtin_ia32_psrad128 (v4si, v4si) +v2di __builtin_ia32_pslldqi128 (v2di, int) +v8hi __builtin_ia32_psllwi128 (v8hi, int) +v4si __builtin_ia32_pslldi128 (v4si, int) +v2di __builtin_ia32_psllqi128 (v2di, int) +v2di __builtin_ia32_psrldqi128 (v2di, int) +v8hi __builtin_ia32_psrlwi128 (v8hi, int) +v4si __builtin_ia32_psrldi128 (v4si, int) +v2di __builtin_ia32_psrlqi128 (v2di, int) +v8hi __builtin_ia32_psrawi128 (v8hi, int) +v4si __builtin_ia32_psradi128 (v4si, int) +v4si __builtin_ia32_pmaddwd128 (v8hi, v8hi) +v2di __builtin_ia32_movq128 (v2di) +@end smallexample + +The following built-in functions are available when @option{-msse3} is used. +All of them generate the machine instruction that is part of the name. + +@smallexample +v2df __builtin_ia32_addsubpd (v2df, v2df) +v4sf __builtin_ia32_addsubps (v4sf, v4sf) +v2df __builtin_ia32_haddpd (v2df, v2df) +v4sf __builtin_ia32_haddps (v4sf, v4sf) +v2df __builtin_ia32_hsubpd (v2df, v2df) +v4sf __builtin_ia32_hsubps (v4sf, v4sf) +v16qi __builtin_ia32_lddqu (char const *) +void __builtin_ia32_monitor (void *, unsigned int, unsigned int) +v4sf __builtin_ia32_movshdup (v4sf) +v4sf __builtin_ia32_movsldup (v4sf) +void __builtin_ia32_mwait (unsigned int, unsigned int) +@end smallexample + +The following built-in functions are available when @option{-mssse3} is used. +All of them generate the machine instruction that is part of the name. 
+ +@smallexample +v2si __builtin_ia32_phaddd (v2si, v2si) +v4hi __builtin_ia32_phaddw (v4hi, v4hi) +v4hi __builtin_ia32_phaddsw (v4hi, v4hi) +v2si __builtin_ia32_phsubd (v2si, v2si) +v4hi __builtin_ia32_phsubw (v4hi, v4hi) +v4hi __builtin_ia32_phsubsw (v4hi, v4hi) +v4hi __builtin_ia32_pmaddubsw (v8qi, v8qi) +v4hi __builtin_ia32_pmulhrsw (v4hi, v4hi) +v8qi __builtin_ia32_pshufb (v8qi, v8qi) +v8qi __builtin_ia32_psignb (v8qi, v8qi) +v2si __builtin_ia32_psignd (v2si, v2si) +v4hi __builtin_ia32_psignw (v4hi, v4hi) +v1di __builtin_ia32_palignr (v1di, v1di, int) +v8qi __builtin_ia32_pabsb (v8qi) +v2si __builtin_ia32_pabsd (v2si) +v4hi __builtin_ia32_pabsw (v4hi) +@end smallexample + +The following built-in functions are available when @option{-mssse3} is used. +All of them generate the machine instruction that is part of the name. + +@smallexample +v4si __builtin_ia32_phaddd128 (v4si, v4si) +v8hi __builtin_ia32_phaddw128 (v8hi, v8hi) +v8hi __builtin_ia32_phaddsw128 (v8hi, v8hi) +v4si __builtin_ia32_phsubd128 (v4si, v4si) +v8hi __builtin_ia32_phsubw128 (v8hi, v8hi) +v8hi __builtin_ia32_phsubsw128 (v8hi, v8hi) +v8hi __builtin_ia32_pmaddubsw128 (v16qi, v16qi) +v8hi __builtin_ia32_pmulhrsw128 (v8hi, v8hi) +v16qi __builtin_ia32_pshufb128 (v16qi, v16qi) +v16qi __builtin_ia32_psignb128 (v16qi, v16qi) +v4si __builtin_ia32_psignd128 (v4si, v4si) +v8hi __builtin_ia32_psignw128 (v8hi, v8hi) +v2di __builtin_ia32_palignr128 (v2di, v2di, int) +v16qi __builtin_ia32_pabsb128 (v16qi) +v4si __builtin_ia32_pabsd128 (v4si) +v8hi __builtin_ia32_pabsw128 (v8hi) +@end smallexample + +The following built-in functions are available when @option{-msse4.1} is +used. All of them generate the machine instruction that is part of the +name. 
+ +@smallexample +v2df __builtin_ia32_blendpd (v2df, v2df, const int) +v4sf __builtin_ia32_blendps (v4sf, v4sf, const int) +v2df __builtin_ia32_blendvpd (v2df, v2df, v2df) +v4sf __builtin_ia32_blendvps (v4sf, v4sf, v4sf) +v2df __builtin_ia32_dppd (v2df, v2df, const int) +v4sf __builtin_ia32_dpps (v4sf, v4sf, const int) +v4sf __builtin_ia32_insertps128 (v4sf, v4sf, const int) +v2di __builtin_ia32_movntdqa (v2di *); +v16qi __builtin_ia32_mpsadbw128 (v16qi, v16qi, const int) +v8hi __builtin_ia32_packusdw128 (v4si, v4si) +v16qi __builtin_ia32_pblendvb128 (v16qi, v16qi, v16qi) +v8hi __builtin_ia32_pblendw128 (v8hi, v8hi, const int) +v2di __builtin_ia32_pcmpeqq (v2di, v2di) +v8hi __builtin_ia32_phminposuw128 (v8hi) +v16qi __builtin_ia32_pmaxsb128 (v16qi, v16qi) +v4si __builtin_ia32_pmaxsd128 (v4si, v4si) +v4si __builtin_ia32_pmaxud128 (v4si, v4si) +v8hi __builtin_ia32_pmaxuw128 (v8hi, v8hi) +v16qi __builtin_ia32_pminsb128 (v16qi, v16qi) +v4si __builtin_ia32_pminsd128 (v4si, v4si) +v4si __builtin_ia32_pminud128 (v4si, v4si) +v8hi __builtin_ia32_pminuw128 (v8hi, v8hi) +v4si __builtin_ia32_pmovsxbd128 (v16qi) +v2di __builtin_ia32_pmovsxbq128 (v16qi) +v8hi __builtin_ia32_pmovsxbw128 (v16qi) +v2di __builtin_ia32_pmovsxdq128 (v4si) +v4si __builtin_ia32_pmovsxwd128 (v8hi) +v2di __builtin_ia32_pmovsxwq128 (v8hi) +v4si __builtin_ia32_pmovzxbd128 (v16qi) +v2di __builtin_ia32_pmovzxbq128 (v16qi) +v8hi __builtin_ia32_pmovzxbw128 (v16qi) +v2di __builtin_ia32_pmovzxdq128 (v4si) +v4si __builtin_ia32_pmovzxwd128 (v8hi) +v2di __builtin_ia32_pmovzxwq128 (v8hi) +v2di __builtin_ia32_pmuldq128 (v4si, v4si) +v4si __builtin_ia32_pmulld128 (v4si, v4si) +int __builtin_ia32_ptestc128 (v2di, v2di) +int __builtin_ia32_ptestnzc128 (v2di, v2di) +int __builtin_ia32_ptestz128 (v2di, v2di) +v2df __builtin_ia32_roundpd (v2df, const int) +v4sf __builtin_ia32_roundps (v4sf, const int) +v2df __builtin_ia32_roundsd (v2df, v2df, const int) +v4sf __builtin_ia32_roundss (v4sf, v4sf, const int) +@end 
smallexample + +The following built-in functions are available when @option{-msse4.1} is +used. + +@table @code +@item v4sf __builtin_ia32_vec_set_v4sf (v4sf, float, const int) +Generates the @code{insertps} machine instruction. +@item int __builtin_ia32_vec_ext_v16qi (v16qi, const int) +Generates the @code{pextrb} machine instruction. +@item v16qi __builtin_ia32_vec_set_v16qi (v16qi, int, const int) +Generates the @code{pinsrb} machine instruction. +@item v4si __builtin_ia32_vec_set_v4si (v4si, int, const int) +Generates the @code{pinsrd} machine instruction. +@item v2di __builtin_ia32_vec_set_v2di (v2di, long long, const int) +Generates the @code{pinsrq} machine instruction in 64bit mode. +@end table + +The following built-in functions are changed to generate new SSE4.1 +instructions when @option{-msse4.1} is used. + +@table @code +@item float __builtin_ia32_vec_ext_v4sf (v4sf, const int) +Generates the @code{extractps} machine instruction. +@item int __builtin_ia32_vec_ext_v4si (v4si, const int) +Generates the @code{pextrd} machine instruction. +@item long long __builtin_ia32_vec_ext_v2di (v2di, const int) +Generates the @code{pextrq} machine instruction in 64bit mode. +@end table + +The following built-in functions are available when @option{-msse4.2} is +used. All of them generate the machine instruction that is part of the +name. 
+ +@smallexample +v16qi __builtin_ia32_pcmpestrm128 (v16qi, int, v16qi, int, const int) +int __builtin_ia32_pcmpestri128 (v16qi, int, v16qi, int, const int) +int __builtin_ia32_pcmpestria128 (v16qi, int, v16qi, int, const int) +int __builtin_ia32_pcmpestric128 (v16qi, int, v16qi, int, const int) +int __builtin_ia32_pcmpestrio128 (v16qi, int, v16qi, int, const int) +int __builtin_ia32_pcmpestris128 (v16qi, int, v16qi, int, const int) +int __builtin_ia32_pcmpestriz128 (v16qi, int, v16qi, int, const int) +v16qi __builtin_ia32_pcmpistrm128 (v16qi, v16qi, const int) +int __builtin_ia32_pcmpistri128 (v16qi, v16qi, const int) +int __builtin_ia32_pcmpistria128 (v16qi, v16qi, const int) +int __builtin_ia32_pcmpistric128 (v16qi, v16qi, const int) +int __builtin_ia32_pcmpistrio128 (v16qi, v16qi, const int) +int __builtin_ia32_pcmpistris128 (v16qi, v16qi, const int) +int __builtin_ia32_pcmpistriz128 (v16qi, v16qi, const int) +v2di __builtin_ia32_pcmpgtq (v2di, v2di) +@end smallexample + +The following built-in functions are available when @option{-msse4.2} is +used. + +@table @code +@item unsigned int __builtin_ia32_crc32qi (unsigned int, unsigned char) +Generates the @code{crc32b} machine instruction. +@item unsigned int __builtin_ia32_crc32hi (unsigned int, unsigned short) +Generates the @code{crc32w} machine instruction. +@item unsigned int __builtin_ia32_crc32si (unsigned int, unsigned int) +Generates the @code{crc32l} machine instruction. +@item unsigned long long __builtin_ia32_crc32di (unsigned long long, unsigned long long) +Generates the @code{crc32q} machine instruction. +@end table + +The following built-in functions are changed to generate new SSE4.2 +instructions when @option{-msse4.2} is used. + +@table @code +@item int __builtin_popcount (unsigned int) +Generates the @code{popcntl} machine instruction. 
+@item int __builtin_popcountl (unsigned long) +Generates the @code{popcntl} or @code{popcntq} machine instruction, +depending on the size of @code{unsigned long}. +@item int __builtin_popcountll (unsigned long long) +Generates the @code{popcntq} machine instruction. +@end table + +The following built-in functions are available when @option{-mavx} is +used. All of them generate the machine instruction that is part of the +name. + +@smallexample +v4df __builtin_ia32_addpd256 (v4df,v4df) +v8sf __builtin_ia32_addps256 (v8sf,v8sf) +v4df __builtin_ia32_addsubpd256 (v4df,v4df) +v8sf __builtin_ia32_addsubps256 (v8sf,v8sf) +v4df __builtin_ia32_andnpd256 (v4df,v4df) +v8sf __builtin_ia32_andnps256 (v8sf,v8sf) +v4df __builtin_ia32_andpd256 (v4df,v4df) +v8sf __builtin_ia32_andps256 (v8sf,v8sf) +v4df __builtin_ia32_blendpd256 (v4df,v4df,int) +v8sf __builtin_ia32_blendps256 (v8sf,v8sf,int) +v4df __builtin_ia32_blendvpd256 (v4df,v4df,v4df) +v8sf __builtin_ia32_blendvps256 (v8sf,v8sf,v8sf) +v2df __builtin_ia32_cmppd (v2df,v2df,int) +v4df __builtin_ia32_cmppd256 (v4df,v4df,int) +v4sf __builtin_ia32_cmpps (v4sf,v4sf,int) +v8sf __builtin_ia32_cmpps256 (v8sf,v8sf,int) +v2df __builtin_ia32_cmpsd (v2df,v2df,int) +v4sf __builtin_ia32_cmpss (v4sf,v4sf,int) +v4df __builtin_ia32_cvtdq2pd256 (v4si) +v8sf __builtin_ia32_cvtdq2ps256 (v8si) +v4si __builtin_ia32_cvtpd2dq256 (v4df) +v4sf __builtin_ia32_cvtpd2ps256 (v4df) +v8si __builtin_ia32_cvtps2dq256 (v8sf) +v4df __builtin_ia32_cvtps2pd256 (v4sf) +v4si __builtin_ia32_cvttpd2dq256 (v4df) +v8si __builtin_ia32_cvttps2dq256 (v8sf) +v4df __builtin_ia32_divpd256 (v4df,v4df) +v8sf __builtin_ia32_divps256 (v8sf,v8sf) +v8sf __builtin_ia32_dpps256 (v8sf,v8sf,int) +v4df __builtin_ia32_haddpd256 (v4df,v4df) +v8sf __builtin_ia32_haddps256 (v8sf,v8sf) +v4df __builtin_ia32_hsubpd256 (v4df,v4df) +v8sf __builtin_ia32_hsubps256 (v8sf,v8sf) +v32qi __builtin_ia32_lddqu256 (pcchar) +v32qi __builtin_ia32_loaddqu256 (pcchar) +v4df __builtin_ia32_loadupd256 
(pcdouble) +v8sf __builtin_ia32_loadups256 (pcfloat) +v2df __builtin_ia32_maskloadpd (pcv2df,v2df) +v4df __builtin_ia32_maskloadpd256 (pcv4df,v4df) +v4sf __builtin_ia32_maskloadps (pcv4sf,v4sf) +v8sf __builtin_ia32_maskloadps256 (pcv8sf,v8sf) +void __builtin_ia32_maskstorepd (pv2df,v2df,v2df) +void __builtin_ia32_maskstorepd256 (pv4df,v4df,v4df) +void __builtin_ia32_maskstoreps (pv4sf,v4sf,v4sf) +void __builtin_ia32_maskstoreps256 (pv8sf,v8sf,v8sf) +v4df __builtin_ia32_maxpd256 (v4df,v4df) +v8sf __builtin_ia32_maxps256 (v8sf,v8sf) +v4df __builtin_ia32_minpd256 (v4df,v4df) +v8sf __builtin_ia32_minps256 (v8sf,v8sf) +v4df __builtin_ia32_movddup256 (v4df) +int __builtin_ia32_movmskpd256 (v4df) +int __builtin_ia32_movmskps256 (v8sf) +v8sf __builtin_ia32_movshdup256 (v8sf) +v8sf __builtin_ia32_movsldup256 (v8sf) +v4df __builtin_ia32_mulpd256 (v4df,v4df) +v8sf __builtin_ia32_mulps256 (v8sf,v8sf) +v4df __builtin_ia32_orpd256 (v4df,v4df) +v8sf __builtin_ia32_orps256 (v8sf,v8sf) +v2df __builtin_ia32_pd_pd256 (v4df) +v4df __builtin_ia32_pd256_pd (v2df) +v4sf __builtin_ia32_ps_ps256 (v8sf) +v8sf __builtin_ia32_ps256_ps (v4sf) +int __builtin_ia32_ptestc256 (v4di,v4di,ptest) +int __builtin_ia32_ptestnzc256 (v4di,v4di,ptest) +int __builtin_ia32_ptestz256 (v4di,v4di,ptest) +v8sf __builtin_ia32_rcpps256 (v8sf) +v4df __builtin_ia32_roundpd256 (v4df,int) +v8sf __builtin_ia32_roundps256 (v8sf,int) +v8sf __builtin_ia32_rsqrtps_nr256 (v8sf) +v8sf __builtin_ia32_rsqrtps256 (v8sf) +v4df __builtin_ia32_shufpd256 (v4df,v4df,int) +v8sf __builtin_ia32_shufps256 (v8sf,v8sf,int) +v4si __builtin_ia32_si_si256 (v8si) +v8si __builtin_ia32_si256_si (v4si) +v4df __builtin_ia32_sqrtpd256 (v4df) +v8sf __builtin_ia32_sqrtps_nr256 (v8sf) +v8sf __builtin_ia32_sqrtps256 (v8sf) +void __builtin_ia32_storedqu256 (pchar,v32qi) +void __builtin_ia32_storeupd256 (pdouble,v4df) +void __builtin_ia32_storeups256 (pfloat,v8sf) +v4df __builtin_ia32_subpd256 (v4df,v4df) +v8sf __builtin_ia32_subps256 (v8sf,v8sf) +v4df 
__builtin_ia32_unpckhpd256 (v4df,v4df) +v8sf __builtin_ia32_unpckhps256 (v8sf,v8sf) +v4df __builtin_ia32_unpcklpd256 (v4df,v4df) +v8sf __builtin_ia32_unpcklps256 (v8sf,v8sf) +v4df __builtin_ia32_vbroadcastf128_pd256 (pcv2df) +v8sf __builtin_ia32_vbroadcastf128_ps256 (pcv4sf) +v4df __builtin_ia32_vbroadcastsd256 (pcdouble) +v4sf __builtin_ia32_vbroadcastss (pcfloat) +v8sf __builtin_ia32_vbroadcastss256 (pcfloat) +v2df __builtin_ia32_vextractf128_pd256 (v4df,int) +v4sf __builtin_ia32_vextractf128_ps256 (v8sf,int) +v4si __builtin_ia32_vextractf128_si256 (v8si,int) +v4df __builtin_ia32_vinsertf128_pd256 (v4df,v2df,int) +v8sf __builtin_ia32_vinsertf128_ps256 (v8sf,v4sf,int) +v8si __builtin_ia32_vinsertf128_si256 (v8si,v4si,int) +v4df __builtin_ia32_vperm2f128_pd256 (v4df,v4df,int) +v8sf __builtin_ia32_vperm2f128_ps256 (v8sf,v8sf,int) +v8si __builtin_ia32_vperm2f128_si256 (v8si,v8si,int) +v2df __builtin_ia32_vpermil2pd (v2df,v2df,v2di,int) +v4df __builtin_ia32_vpermil2pd256 (v4df,v4df,v4di,int) +v4sf __builtin_ia32_vpermil2ps (v4sf,v4sf,v4si,int) +v8sf __builtin_ia32_vpermil2ps256 (v8sf,v8sf,v8si,int) +v2df __builtin_ia32_vpermilpd (v2df,int) +v4df __builtin_ia32_vpermilpd256 (v4df,int) +v4sf __builtin_ia32_vpermilps (v4sf,int) +v8sf __builtin_ia32_vpermilps256 (v8sf,int) +v2df __builtin_ia32_vpermilvarpd (v2df,v2di) +v4df __builtin_ia32_vpermilvarpd256 (v4df,v4di) +v4sf __builtin_ia32_vpermilvarps (v4sf,v4si) +v8sf __builtin_ia32_vpermilvarps256 (v8sf,v8si) +int __builtin_ia32_vtestcpd (v2df,v2df,ptest) +int __builtin_ia32_vtestcpd256 (v4df,v4df,ptest) +int __builtin_ia32_vtestcps (v4sf,v4sf,ptest) +int __builtin_ia32_vtestcps256 (v8sf,v8sf,ptest) +int __builtin_ia32_vtestnzcpd (v2df,v2df,ptest) +int __builtin_ia32_vtestnzcpd256 (v4df,v4df,ptest) +int __builtin_ia32_vtestnzcps (v4sf,v4sf,ptest) +int __builtin_ia32_vtestnzcps256 (v8sf,v8sf,ptest) +int __builtin_ia32_vtestzpd (v2df,v2df,ptest) +int __builtin_ia32_vtestzpd256 (v4df,v4df,ptest) +int __builtin_ia32_vtestzps 
(v4sf,v4sf,ptest) +int __builtin_ia32_vtestzps256 (v8sf,v8sf,ptest) +void __builtin_ia32_vzeroall (void) +void __builtin_ia32_vzeroupper (void) +v4df __builtin_ia32_xorpd256 (v4df,v4df) +v8sf __builtin_ia32_xorps256 (v8sf,v8sf) +@end smallexample + +The following built-in functions are available when @option{-mavx2} is +used. All of them generate the machine instruction that is part of the +name. + +@smallexample +v32qi __builtin_ia32_mpsadbw256 (v32qi,v32qi,int) +v32qi __builtin_ia32_pabsb256 (v32qi) +v16hi __builtin_ia32_pabsw256 (v16hi) +v8si __builtin_ia32_pabsd256 (v8si) +v16hi __builtin_ia32_packssdw256 (v8si,v8si) +v32qi __builtin_ia32_packsswb256 (v16hi,v16hi) +v16hi __builtin_ia32_packusdw256 (v8si,v8si) +v32qi __builtin_ia32_packuswb256 (v16hi,v16hi) +v32qi __builtin_ia32_paddb256 (v32qi,v32qi) +v16hi __builtin_ia32_paddw256 (v16hi,v16hi) +v8si __builtin_ia32_paddd256 (v8si,v8si) +v4di __builtin_ia32_paddq256 (v4di,v4di) +v32qi __builtin_ia32_paddsb256 (v32qi,v32qi) +v16hi __builtin_ia32_paddsw256 (v16hi,v16hi) +v32qi __builtin_ia32_paddusb256 (v32qi,v32qi) +v16hi __builtin_ia32_paddusw256 (v16hi,v16hi) +v4di __builtin_ia32_palignr256 (v4di,v4di,int) +v4di __builtin_ia32_andsi256 (v4di,v4di) +v4di __builtin_ia32_andnotsi256 (v4di,v4di) +v32qi __builtin_ia32_pavgb256 (v32qi,v32qi) +v16hi __builtin_ia32_pavgw256 (v16hi,v16hi) +v32qi __builtin_ia32_pblendvb256 (v32qi,v32qi,v32qi) +v16hi __builtin_ia32_pblendw256 (v16hi,v16hi,int) +v32qi __builtin_ia32_pcmpeqb256 (v32qi,v32qi) +v16hi __builtin_ia32_pcmpeqw256 (v16hi,v16hi) +v8si __builtin_ia32_pcmpeqd256 (c8si,v8si) +v4di __builtin_ia32_pcmpeqq256 (v4di,v4di) +v32qi __builtin_ia32_pcmpgtb256 (v32qi,v32qi) +v16hi __builtin_ia32_pcmpgtw256 (16hi,v16hi) +v8si __builtin_ia32_pcmpgtd256 (v8si,v8si) +v4di __builtin_ia32_pcmpgtq256 (v4di,v4di) +v16hi __builtin_ia32_phaddw256 (v16hi,v16hi) +v8si __builtin_ia32_phaddd256 (v8si,v8si) +v16hi __builtin_ia32_phaddsw256 (v16hi,v16hi) +v16hi __builtin_ia32_phsubw256 
(v16hi,v16hi) +v8si __builtin_ia32_phsubd256 (v8si,v8si) +v16hi __builtin_ia32_phsubsw256 (v16hi,v16hi) +v32qi __builtin_ia32_pmaddubsw256 (v32qi,v32qi) +v16hi __builtin_ia32_pmaddwd256 (v16hi,v16hi) +v32qi __builtin_ia32_pmaxsb256 (v32qi,v32qi) +v16hi __builtin_ia32_pmaxsw256 (v16hi,v16hi) +v8si __builtin_ia32_pmaxsd256 (v8si,v8si) +v32qi __builtin_ia32_pmaxub256 (v32qi,v32qi) +v16hi __builtin_ia32_pmaxuw256 (v16hi,v16hi) +v8si __builtin_ia32_pmaxud256 (v8si,v8si) +v32qi __builtin_ia32_pminsb256 (v32qi,v32qi) +v16hi __builtin_ia32_pminsw256 (v16hi,v16hi) +v8si __builtin_ia32_pminsd256 (v8si,v8si) +v32qi __builtin_ia32_pminub256 (v32qi,v32qi) +v16hi __builtin_ia32_pminuw256 (v16hi,v16hi) +v8si __builtin_ia32_pminud256 (v8si,v8si) +int __builtin_ia32_pmovmskb256 (v32qi) +v16hi __builtin_ia32_pmovsxbw256 (v16qi) +v8si __builtin_ia32_pmovsxbd256 (v16qi) +v4di __builtin_ia32_pmovsxbq256 (v16qi) +v8si __builtin_ia32_pmovsxwd256 (v8hi) +v4di __builtin_ia32_pmovsxwq256 (v8hi) +v4di __builtin_ia32_pmovsxdq256 (v4si) +v16hi __builtin_ia32_pmovzxbw256 (v16qi) +v8si __builtin_ia32_pmovzxbd256 (v16qi) +v4di __builtin_ia32_pmovzxbq256 (v16qi) +v8si __builtin_ia32_pmovzxwd256 (v8hi) +v4di __builtin_ia32_pmovzxwq256 (v8hi) +v4di __builtin_ia32_pmovzxdq256 (v4si) +v4di __builtin_ia32_pmuldq256 (v8si,v8si) +v16hi __builtin_ia32_pmulhrsw256 (v16hi, v16hi) +v16hi __builtin_ia32_pmulhuw256 (v16hi,v16hi) +v16hi __builtin_ia32_pmulhw256 (v16hi,v16hi) +v16hi __builtin_ia32_pmullw256 (v16hi,v16hi) +v8si __builtin_ia32_pmulld256 (v8si,v8si) +v4di __builtin_ia32_pmuludq256 (v8si,v8si) +v4di __builtin_ia32_por256 (v4di,v4di) +v16hi __builtin_ia32_psadbw256 (v32qi,v32qi) +v32qi __builtin_ia32_pshufb256 (v32qi,v32qi) +v8si __builtin_ia32_pshufd256 (v8si,int) +v16hi __builtin_ia32_pshufhw256 (v16hi,int) +v16hi __builtin_ia32_pshuflw256 (v16hi,int) +v32qi __builtin_ia32_psignb256 (v32qi,v32qi) +v16hi __builtin_ia32_psignw256 (v16hi,v16hi) +v8si __builtin_ia32_psignd256 (v8si,v8si) +v4di 
__builtin_ia32_pslldqi256 (v4di,int) +v16hi __builtin_ia32_psllwi256 (16hi,int) +v16hi __builtin_ia32_psllw256(v16hi,v8hi) +v8si __builtin_ia32_pslldi256 (v8si,int) +v8si __builtin_ia32_pslld256(v8si,v4si) +v4di __builtin_ia32_psllqi256 (v4di,int) +v4di __builtin_ia32_psllq256(v4di,v2di) +v16hi __builtin_ia32_psrawi256 (v16hi,int) +v16hi __builtin_ia32_psraw256 (v16hi,v8hi) +v8si __builtin_ia32_psradi256 (v8si,int) +v8si __builtin_ia32_psrad256 (v8si,v4si) +v4di __builtin_ia32_psrldqi256 (v4di, int) +v16hi __builtin_ia32_psrlwi256 (v16hi,int) +v16hi __builtin_ia32_psrlw256 (v16hi,v8hi) +v8si __builtin_ia32_psrldi256 (v8si,int) +v8si __builtin_ia32_psrld256 (v8si,v4si) +v4di __builtin_ia32_psrlqi256 (v4di,int) +v4di __builtin_ia32_psrlq256(v4di,v2di) +v32qi __builtin_ia32_psubb256 (v32qi,v32qi) +v32hi __builtin_ia32_psubw256 (v16hi,v16hi) +v8si __builtin_ia32_psubd256 (v8si,v8si) +v4di __builtin_ia32_psubq256 (v4di,v4di) +v32qi __builtin_ia32_psubsb256 (v32qi,v32qi) +v16hi __builtin_ia32_psubsw256 (v16hi,v16hi) +v32qi __builtin_ia32_psubusb256 (v32qi,v32qi) +v16hi __builtin_ia32_psubusw256 (v16hi,v16hi) +v32qi __builtin_ia32_punpckhbw256 (v32qi,v32qi) +v16hi __builtin_ia32_punpckhwd256 (v16hi,v16hi) +v8si __builtin_ia32_punpckhdq256 (v8si,v8si) +v4di __builtin_ia32_punpckhqdq256 (v4di,v4di) +v32qi __builtin_ia32_punpcklbw256 (v32qi,v32qi) +v16hi __builtin_ia32_punpcklwd256 (v16hi,v16hi) +v8si __builtin_ia32_punpckldq256 (v8si,v8si) +v4di __builtin_ia32_punpcklqdq256 (v4di,v4di) +v4di __builtin_ia32_pxor256 (v4di,v4di) +v4di __builtin_ia32_movntdqa256 (pv4di) +v4sf __builtin_ia32_vbroadcastss_ps (v4sf) +v8sf __builtin_ia32_vbroadcastss_ps256 (v4sf) +v4df __builtin_ia32_vbroadcastsd_pd256 (v2df) +v4di __builtin_ia32_vbroadcastsi256 (v2di) +v4si __builtin_ia32_pblendd128 (v4si,v4si) +v8si __builtin_ia32_pblendd256 (v8si,v8si) +v32qi __builtin_ia32_pbroadcastb256 (v16qi) +v16hi __builtin_ia32_pbroadcastw256 (v8hi) +v8si __builtin_ia32_pbroadcastd256 (v4si) +v4di 
__builtin_ia32_pbroadcastq256 (v2di) +v16qi __builtin_ia32_pbroadcastb128 (v16qi) +v8hi __builtin_ia32_pbroadcastw128 (v8hi) +v4si __builtin_ia32_pbroadcastd128 (v4si) +v2di __builtin_ia32_pbroadcastq128 (v2di) +v8si __builtin_ia32_permvarsi256 (v8si,v8si) +v4df __builtin_ia32_permdf256 (v4df,int) +v8sf __builtin_ia32_permvarsf256 (v8sf,v8sf) +v4di __builtin_ia32_permdi256 (v4di,int) +v4di __builtin_ia32_permti256 (v4di,v4di,int) +v4di __builtin_ia32_extract128i256 (v4di,int) +v4di __builtin_ia32_insert128i256 (v4di,v2di,int) +v8si __builtin_ia32_maskloadd256 (pcv8si,v8si) +v4di __builtin_ia32_maskloadq256 (pcv4di,v4di) +v4si __builtin_ia32_maskloadd (pcv4si,v4si) +v2di __builtin_ia32_maskloadq (pcv2di,v2di) +void __builtin_ia32_maskstored256 (pv8si,v8si,v8si) +void __builtin_ia32_maskstoreq256 (pv4di,v4di,v4di) +void __builtin_ia32_maskstored (pv4si,v4si,v4si) +void __builtin_ia32_maskstoreq (pv2di,v2di,v2di) +v8si __builtin_ia32_psllv8si (v8si,v8si) +v4si __builtin_ia32_psllv4si (v4si,v4si) +v4di __builtin_ia32_psllv4di (v4di,v4di) +v2di __builtin_ia32_psllv2di (v2di,v2di) +v8si __builtin_ia32_psrav8si (v8si,v8si) +v4si __builtin_ia32_psrav4si (v4si,v4si) +v8si __builtin_ia32_psrlv8si (v8si,v8si) +v4si __builtin_ia32_psrlv4si (v4si,v4si) +v4di __builtin_ia32_psrlv4di (v4di,v4di) +v2di __builtin_ia32_psrlv2di (v2di,v2di) +v2df __builtin_ia32_gathersiv2df (v2df, pcdouble,v4si,v2df,int) +v4df __builtin_ia32_gathersiv4df (v4df, pcdouble,v4si,v4df,int) +v2df __builtin_ia32_gatherdiv2df (v2df, pcdouble,v2di,v2df,int) +v4df __builtin_ia32_gatherdiv4df (v4df, pcdouble,v4di,v4df,int) +v4sf __builtin_ia32_gathersiv4sf (v4sf, pcfloat,v4si,v4sf,int) +v8sf __builtin_ia32_gathersiv8sf (v8sf, pcfloat,v8si,v8sf,int) +v4sf __builtin_ia32_gatherdiv4sf (v4sf, pcfloat,v2di,v4sf,int) +v4sf __builtin_ia32_gatherdiv4sf256 (v4sf, pcfloat,v4di,v4sf,int) +v2di __builtin_ia32_gathersiv2di (v2di, pcint64,v4si,v2di,int) +v4di __builtin_ia32_gathersiv4di (v4di, pcint64,v4si,v4di,int) +v2di 
__builtin_ia32_gatherdiv2di (v2di, pcint64,v2di,v2di,int) +v4di __builtin_ia32_gatherdiv4di (v4di, pcint64,v4di,v4di,int) +v4si __builtin_ia32_gathersiv4si (v4si, pcint,v4si,v4si,int) +v8si __builtin_ia32_gathersiv8si (v8si, pcint,v8si,v8si,int) +v4si __builtin_ia32_gatherdiv4si (v4si, pcint,v2di,v4si,int) +v4si __builtin_ia32_gatherdiv4si256 (v4si, pcint,v4di,v4si,int) +@end smallexample + +The following built-in functions are available when @option{-maes} is +used. All of them generate the machine instruction that is part of the +name. + +@smallexample +v2di __builtin_ia32_aesenc128 (v2di, v2di) +v2di __builtin_ia32_aesenclast128 (v2di, v2di) +v2di __builtin_ia32_aesdec128 (v2di, v2di) +v2di __builtin_ia32_aesdeclast128 (v2di, v2di) +v2di __builtin_ia32_aeskeygenassist128 (v2di, const int) +v2di __builtin_ia32_aesimc128 (v2di) +@end smallexample + +The following built-in function is available when @option{-mpclmul} is +used. + +@table @code +@item v2di __builtin_ia32_pclmulqdq128 (v2di, v2di, const int) +Generates the @code{pclmulqdq} machine instruction. +@end table + +The following built-in functions are available when @option{-mfsgsbase} is +used. All of them generate the machine instruction that is part of the +name. + +@smallexample +unsigned int __builtin_ia32_rdfsbase32 (void) +unsigned long long __builtin_ia32_rdfsbase64 (void) +unsigned int __builtin_ia32_rdgsbase32 (void) +unsigned long long __builtin_ia32_rdgsbase64 (void) +void _writefsbase_u32 (unsigned int) +void _writefsbase_u64 (unsigned long long) +void _writegsbase_u32 (unsigned int) +void _writegsbase_u64 (unsigned long long) +@end smallexample + +The following built-in functions are available when @option{-mrdrnd} is +used. All of them generate the machine instruction that is part of the +name.
+ +@smallexample +unsigned int __builtin_ia32_rdrand16_step (unsigned short *) +unsigned int __builtin_ia32_rdrand32_step (unsigned int *) +unsigned int __builtin_ia32_rdrand64_step (unsigned long long *) +@end smallexample + +The following built-in functions are available when @option{-msse4a} is used. +All of them generate the machine instruction that is part of the name. + +@smallexample +void __builtin_ia32_movntsd (double *, v2df) +void __builtin_ia32_movntss (float *, v4sf) +v2di __builtin_ia32_extrq (v2di, v16qi) +v2di __builtin_ia32_extrqi (v2di, const unsigned int, const unsigned int) +v2di __builtin_ia32_insertq (v2di, v2di) +v2di __builtin_ia32_insertqi (v2di, v2di, const unsigned int, const unsigned int) +@end smallexample + +The following built-in functions are available when @option{-mxop} is used. +@smallexample +v2df __builtin_ia32_vfrczpd (v2df) +v4sf __builtin_ia32_vfrczps (v4sf) +v2df __builtin_ia32_vfrczsd (v2df) +v4sf __builtin_ia32_vfrczss (v4sf) +v4df __builtin_ia32_vfrczpd256 (v4df) +v8sf __builtin_ia32_vfrczps256 (v8sf) +v2di __builtin_ia32_vpcmov (v2di, v2di, v2di) +v2di __builtin_ia32_vpcmov_v2di (v2di, v2di, v2di) +v4si __builtin_ia32_vpcmov_v4si (v4si, v4si, v4si) +v8hi __builtin_ia32_vpcmov_v8hi (v8hi, v8hi, v8hi) +v16qi __builtin_ia32_vpcmov_v16qi (v16qi, v16qi, v16qi) +v2df __builtin_ia32_vpcmov_v2df (v2df, v2df, v2df) +v4sf __builtin_ia32_vpcmov_v4sf (v4sf, v4sf, v4sf) +v4di __builtin_ia32_vpcmov_v4di256 (v4di, v4di, v4di) +v8si __builtin_ia32_vpcmov_v8si256 (v8si, v8si, v8si) +v16hi __builtin_ia32_vpcmov_v16hi256 (v16hi, v16hi, v16hi) +v32qi __builtin_ia32_vpcmov_v32qi256 (v32qi, v32qi, v32qi) +v4df __builtin_ia32_vpcmov_v4df256 (v4df, v4df, v4df) +v8sf __builtin_ia32_vpcmov_v8sf256 (v8sf, v8sf, v8sf) +v16qi __builtin_ia32_vpcomeqb (v16qi, v16qi) +v8hi __builtin_ia32_vpcomeqw (v8hi, v8hi) +v4si __builtin_ia32_vpcomeqd (v4si, v4si) +v2di __builtin_ia32_vpcomeqq (v2di, v2di) +v16qi __builtin_ia32_vpcomequb (v16qi, v16qi) +v4si 
__builtin_ia32_vpcomequd (v4si, v4si) +v2di __builtin_ia32_vpcomequq (v2di, v2di) +v8hi __builtin_ia32_vpcomequw (v8hi, v8hi) +v8hi __builtin_ia32_vpcomeqw (v8hi, v8hi) +v16qi __builtin_ia32_vpcomfalseb (v16qi, v16qi) +v4si __builtin_ia32_vpcomfalsed (v4si, v4si) +v2di __builtin_ia32_vpcomfalseq (v2di, v2di) +v16qi __builtin_ia32_vpcomfalseub (v16qi, v16qi) +v4si __builtin_ia32_vpcomfalseud (v4si, v4si) +v2di __builtin_ia32_vpcomfalseuq (v2di, v2di) +v8hi __builtin_ia32_vpcomfalseuw (v8hi, v8hi) +v8hi __builtin_ia32_vpcomfalsew (v8hi, v8hi) +v16qi __builtin_ia32_vpcomgeb (v16qi, v16qi) +v4si __builtin_ia32_vpcomged (v4si, v4si) +v2di __builtin_ia32_vpcomgeq (v2di, v2di) +v16qi __builtin_ia32_vpcomgeub (v16qi, v16qi) +v4si __builtin_ia32_vpcomgeud (v4si, v4si) +v2di __builtin_ia32_vpcomgeuq (v2di, v2di) +v8hi __builtin_ia32_vpcomgeuw (v8hi, v8hi) +v8hi __builtin_ia32_vpcomgew (v8hi, v8hi) +v16qi __builtin_ia32_vpcomgtb (v16qi, v16qi) +v4si __builtin_ia32_vpcomgtd (v4si, v4si) +v2di __builtin_ia32_vpcomgtq (v2di, v2di) +v16qi __builtin_ia32_vpcomgtub (v16qi, v16qi) +v4si __builtin_ia32_vpcomgtud (v4si, v4si) +v2di __builtin_ia32_vpcomgtuq (v2di, v2di) +v8hi __builtin_ia32_vpcomgtuw (v8hi, v8hi) +v8hi __builtin_ia32_vpcomgtw (v8hi, v8hi) +v16qi __builtin_ia32_vpcomleb (v16qi, v16qi) +v4si __builtin_ia32_vpcomled (v4si, v4si) +v2di __builtin_ia32_vpcomleq (v2di, v2di) +v16qi __builtin_ia32_vpcomleub (v16qi, v16qi) +v4si __builtin_ia32_vpcomleud (v4si, v4si) +v2di __builtin_ia32_vpcomleuq (v2di, v2di) +v8hi __builtin_ia32_vpcomleuw (v8hi, v8hi) +v8hi __builtin_ia32_vpcomlew (v8hi, v8hi) +v16qi __builtin_ia32_vpcomltb (v16qi, v16qi) +v4si __builtin_ia32_vpcomltd (v4si, v4si) +v2di __builtin_ia32_vpcomltq (v2di, v2di) +v16qi __builtin_ia32_vpcomltub (v16qi, v16qi) +v4si __builtin_ia32_vpcomltud (v4si, v4si) +v2di __builtin_ia32_vpcomltuq (v2di, v2di) +v8hi __builtin_ia32_vpcomltuw (v8hi, v8hi) +v8hi __builtin_ia32_vpcomltw (v8hi, v8hi) +v16qi __builtin_ia32_vpcomneb 
(v16qi, v16qi) +v4si __builtin_ia32_vpcomned (v4si, v4si) +v2di __builtin_ia32_vpcomneq (v2di, v2di) +v16qi __builtin_ia32_vpcomneub (v16qi, v16qi) +v4si __builtin_ia32_vpcomneud (v4si, v4si) +v2di __builtin_ia32_vpcomneuq (v2di, v2di) +v8hi __builtin_ia32_vpcomneuw (v8hi, v8hi) +v8hi __builtin_ia32_vpcomnew (v8hi, v8hi) +v16qi __builtin_ia32_vpcomtrueb (v16qi, v16qi) +v4si __builtin_ia32_vpcomtrued (v4si, v4si) +v2di __builtin_ia32_vpcomtrueq (v2di, v2di) +v16qi __builtin_ia32_vpcomtrueub (v16qi, v16qi) +v4si __builtin_ia32_vpcomtrueud (v4si, v4si) +v2di __builtin_ia32_vpcomtrueuq (v2di, v2di) +v8hi __builtin_ia32_vpcomtrueuw (v8hi, v8hi) +v8hi __builtin_ia32_vpcomtruew (v8hi, v8hi) +v4si __builtin_ia32_vphaddbd (v16qi) +v2di __builtin_ia32_vphaddbq (v16qi) +v8hi __builtin_ia32_vphaddbw (v16qi) +v2di __builtin_ia32_vphadddq (v4si) +v4si __builtin_ia32_vphaddubd (v16qi) +v2di __builtin_ia32_vphaddubq (v16qi) +v8hi __builtin_ia32_vphaddubw (v16qi) +v2di __builtin_ia32_vphaddudq (v4si) +v4si __builtin_ia32_vphadduwd (v8hi) +v2di __builtin_ia32_vphadduwq (v8hi) +v4si __builtin_ia32_vphaddwd (v8hi) +v2di __builtin_ia32_vphaddwq (v8hi) +v8hi __builtin_ia32_vphsubbw (v16qi) +v2di __builtin_ia32_vphsubdq (v4si) +v4si __builtin_ia32_vphsubwd (v8hi) +v4si __builtin_ia32_vpmacsdd (v4si, v4si, v4si) +v2di __builtin_ia32_vpmacsdqh (v4si, v4si, v2di) +v2di __builtin_ia32_vpmacsdql (v4si, v4si, v2di) +v4si __builtin_ia32_vpmacssdd (v4si, v4si, v4si) +v2di __builtin_ia32_vpmacssdqh (v4si, v4si, v2di) +v2di __builtin_ia32_vpmacssdql (v4si, v4si, v2di) +v4si __builtin_ia32_vpmacsswd (v8hi, v8hi, v4si) +v8hi __builtin_ia32_vpmacssww (v8hi, v8hi, v8hi) +v4si __builtin_ia32_vpmacswd (v8hi, v8hi, v4si) +v8hi __builtin_ia32_vpmacsww (v8hi, v8hi, v8hi) +v4si __builtin_ia32_vpmadcsswd (v8hi, v8hi, v4si) +v4si __builtin_ia32_vpmadcswd (v8hi, v8hi, v4si) +v16qi __builtin_ia32_vpperm (v16qi, v16qi, v16qi) +v16qi __builtin_ia32_vprotb (v16qi, v16qi) +v4si __builtin_ia32_vprotd (v4si, v4si) 
+v2di __builtin_ia32_vprotq (v2di, v2di) +v8hi __builtin_ia32_vprotw (v8hi, v8hi) +v16qi __builtin_ia32_vpshab (v16qi, v16qi) +v4si __builtin_ia32_vpshad (v4si, v4si) +v2di __builtin_ia32_vpshaq (v2di, v2di) +v8hi __builtin_ia32_vpshaw (v8hi, v8hi) +v16qi __builtin_ia32_vpshlb (v16qi, v16qi) +v4si __builtin_ia32_vpshld (v4si, v4si) +v2di __builtin_ia32_vpshlq (v2di, v2di) +v8hi __builtin_ia32_vpshlw (v8hi, v8hi) +@end smallexample + +The following built-in functions are available when @option{-mfma4} is used. +All of them generate the machine instruction that is part of the name. + +@smallexample +v2df __builtin_ia32_vfmaddpd (v2df, v2df, v2df) +v4sf __builtin_ia32_vfmaddps (v4sf, v4sf, v4sf) +v2df __builtin_ia32_vfmaddsd (v2df, v2df, v2df) +v4sf __builtin_ia32_vfmaddss (v4sf, v4sf, v4sf) +v2df __builtin_ia32_vfmsubpd (v2df, v2df, v2df) +v4sf __builtin_ia32_vfmsubps (v4sf, v4sf, v4sf) +v2df __builtin_ia32_vfmsubsd (v2df, v2df, v2df) +v4sf __builtin_ia32_vfmsubss (v4sf, v4sf, v4sf) +v2df __builtin_ia32_vfnmaddpd (v2df, v2df, v2df) +v4sf __builtin_ia32_vfnmaddps (v4sf, v4sf, v4sf) +v2df __builtin_ia32_vfnmaddsd (v2df, v2df, v2df) +v4sf __builtin_ia32_vfnmaddss (v4sf, v4sf, v4sf) +v2df __builtin_ia32_vfnmsubpd (v2df, v2df, v2df) +v4sf __builtin_ia32_vfnmsubps (v4sf, v4sf, v4sf) +v2df __builtin_ia32_vfnmsubsd (v2df, v2df, v2df) +v4sf __builtin_ia32_vfnmsubss (v4sf, v4sf, v4sf) +v2df __builtin_ia32_vfmaddsubpd (v2df, v2df, v2df) +v4sf __builtin_ia32_vfmaddsubps (v4sf, v4sf, v4sf) +v2df __builtin_ia32_vfmsubaddpd (v2df, v2df, v2df) +v4sf __builtin_ia32_vfmsubaddps (v4sf, v4sf, v4sf) +v4df __builtin_ia32_vfmaddpd256 (v4df, v4df, v4df) +v8sf __builtin_ia32_vfmaddps256 (v8sf, v8sf, v8sf) +v4df __builtin_ia32_vfmsubpd256 (v4df, v4df, v4df) +v8sf __builtin_ia32_vfmsubps256 (v8sf, v8sf, v8sf) +v4df __builtin_ia32_vfnmaddpd256 (v4df, v4df, v4df) +v8sf __builtin_ia32_vfnmaddps256 (v8sf, v8sf, v8sf) +v4df __builtin_ia32_vfnmsubpd256 (v4df, v4df, v4df) +v8sf 
__builtin_ia32_vfnmsubps256 (v8sf, v8sf, v8sf) +v4df __builtin_ia32_vfmaddsubpd256 (v4df, v4df, v4df) +v8sf __builtin_ia32_vfmaddsubps256 (v8sf, v8sf, v8sf) +v4df __builtin_ia32_vfmsubaddpd256 (v4df, v4df, v4df) +v8sf __builtin_ia32_vfmsubaddps256 (v8sf, v8sf, v8sf) + +@end smallexample + +The following built-in functions are available when @option{-mlwp} is used. + +@smallexample +void __builtin_ia32_llwpcb16 (void *); +void __builtin_ia32_llwpcb32 (void *); +void __builtin_ia32_llwpcb64 (void *); +void * __builtin_ia32_llwpcb16 (void); +void * __builtin_ia32_llwpcb32 (void); +void * __builtin_ia32_llwpcb64 (void); +void __builtin_ia32_lwpval16 (unsigned short, unsigned int, unsigned short) +void __builtin_ia32_lwpval32 (unsigned int, unsigned int, unsigned int) +void __builtin_ia32_lwpval64 (unsigned __int64, unsigned int, unsigned int) +unsigned char __builtin_ia32_lwpins16 (unsigned short, unsigned int, unsigned short) +unsigned char __builtin_ia32_lwpins32 (unsigned int, unsigned int, unsigned int) +unsigned char __builtin_ia32_lwpins64 (unsigned __int64, unsigned int, unsigned int) +@end smallexample + +The following built-in functions are available when @option{-mbmi} is used. +All of them generate the machine instruction that is part of the name. +@smallexample +unsigned int __builtin_ia32_bextr_u32(unsigned int, unsigned int); +unsigned long long __builtin_ia32_bextr_u64 (unsigned long long, unsigned long long); +@end smallexample + +The following built-in functions are available when @option{-mbmi2} is used. +All of them generate the machine instruction that is part of the name. 
+@smallexample +unsigned int _bzhi_u32 (unsigned int, unsigned int) +unsigned int _pdep_u32 (unsigned int, unsigned int) +unsigned int _pext_u32 (unsigned int, unsigned int) +unsigned long long _bzhi_u64 (unsigned long long, unsigned long long) +unsigned long long _pdep_u64 (unsigned long long, unsigned long long) +unsigned long long _pext_u64 (unsigned long long, unsigned long long) +@end smallexample + +The following built-in functions are available when @option{-mlzcnt} is used. +All of them generate the machine instruction that is part of the name. +@smallexample +unsigned short __builtin_ia32_lzcnt_16(unsigned short); +unsigned int __builtin_ia32_lzcnt_u32(unsigned int); +unsigned long long __builtin_ia32_lzcnt_u64 (unsigned long long); +@end smallexample + +The following built-in functions are available when @option{-mfxsr} is used. +All of them generate the machine instruction that is part of the name. +@smallexample +void __builtin_ia32_fxsave (void *) +void __builtin_ia32_fxrstor (void *) +void __builtin_ia32_fxsave64 (void *) +void __builtin_ia32_fxrstor64 (void *) +@end smallexample + +The following built-in functions are available when @option{-mxsave} is used. +All of them generate the machine instruction that is part of the name. +@smallexample +void __builtin_ia32_xsave (void *, long long) +void __builtin_ia32_xrstor (void *, long long) +void __builtin_ia32_xsave64 (void *, long long) +void __builtin_ia32_xrstor64 (void *, long long) +@end smallexample + +The following built-in functions are available when @option{-mxsaveopt} is used. +All of them generate the machine instruction that is part of the name. +@smallexample +void __builtin_ia32_xsaveopt (void *, long long) +void __builtin_ia32_xsaveopt64 (void *, long long) +@end smallexample + +The following built-in functions are available when @option{-mtbm} is used. +Both of them generate the immediate form of the bextr machine instruction. 
+@smallexample +unsigned int __builtin_ia32_bextri_u32 (unsigned int, const unsigned int); +unsigned long long __builtin_ia32_bextri_u64 (unsigned long long, const unsigned long long); +@end smallexample + + +The following built-in functions are available when @option{-m3dnow} is used. +All of them generate the machine instruction that is part of the name. + +@smallexample +void __builtin_ia32_femms (void) +v8qi __builtin_ia32_pavgusb (v8qi, v8qi) +v2si __builtin_ia32_pf2id (v2sf) +v2sf __builtin_ia32_pfacc (v2sf, v2sf) +v2sf __builtin_ia32_pfadd (v2sf, v2sf) +v2si __builtin_ia32_pfcmpeq (v2sf, v2sf) +v2si __builtin_ia32_pfcmpge (v2sf, v2sf) +v2si __builtin_ia32_pfcmpgt (v2sf, v2sf) +v2sf __builtin_ia32_pfmax (v2sf, v2sf) +v2sf __builtin_ia32_pfmin (v2sf, v2sf) +v2sf __builtin_ia32_pfmul (v2sf, v2sf) +v2sf __builtin_ia32_pfrcp (v2sf) +v2sf __builtin_ia32_pfrcpit1 (v2sf, v2sf) +v2sf __builtin_ia32_pfrcpit2 (v2sf, v2sf) +v2sf __builtin_ia32_pfrsqrt (v2sf) +v2sf __builtin_ia32_pfsub (v2sf, v2sf) +v2sf __builtin_ia32_pfsubr (v2sf, v2sf) +v2sf __builtin_ia32_pi2fd (v2si) +v4hi __builtin_ia32_pmulhrw (v4hi, v4hi) +@end smallexample + +The following built-in functions are available when both @option{-m3dnow} +and @option{-march=athlon} are used. All of them generate the machine +instruction that is part of the name. + +@smallexample +v2si __builtin_ia32_pf2iw (v2sf) +v2sf __builtin_ia32_pfnacc (v2sf, v2sf) +v2sf __builtin_ia32_pfpnacc (v2sf, v2sf) +v2sf __builtin_ia32_pi2fw (v2si) +v2sf __builtin_ia32_pswapdsf (v2sf) +v2si __builtin_ia32_pswapdsi (v2si) +@end smallexample + +The following built-in functions are available when @option{-mrtm} is used. +They are used for restricted transactional memory. These are the internal +low-level functions. Normally the functions in +@ref{x86 transactional memory intrinsics} should be used instead.
+ +@smallexample +int __builtin_ia32_xbegin () +void __builtin_ia32_xend () +void __builtin_ia32_xabort (status) +int __builtin_ia32_xtest () +@end smallexample + +@node x86 transactional memory intrinsics +@subsection x86 transactional memory intrinsics + +Hardware transactional memory intrinsics for x86. These allow the use of +memory transactions with RTM (Restricted Transactional Memory). +For using HLE (Hardware Lock Elision) see @ref{x86 specific memory model extensions for transactional memory} instead. +This support is enabled with the @option{-mrtm} option. + +A memory transaction commits all changes to memory in an atomic way, +as visible to other threads. If the transaction fails it is rolled back +and all side effects are discarded. + +Generally there is no guarantee that a memory transaction ever succeeds +and suitable fallback code always needs to be supplied. + +@deftypefn {RTM Function} {unsigned} _xbegin () +Start an RTM (Restricted Transactional Memory) transaction. +Returns _XBEGIN_STARTED when the transaction +started successfully (note this is not 0, so the constant has to be +explicitly tested). When the transaction aborts all side effects +are undone and an abort code is returned. There is no guarantee +any transaction ever succeeds, so there always needs to be a valid +tested fallback path. +@end deftypefn + +@smallexample +#include <immintrin.h> + +if ((status = _xbegin ()) == _XBEGIN_STARTED) @{ + ... transaction code... + _xend (); +@} else @{ + ... non transactional fallback path... +@} +@end smallexample + +Valid abort status bits (when the value is not @code{_XBEGIN_STARTED}) are: + +@table @code +@item _XABORT_EXPLICIT +Transaction explicitly aborted with @code{_xabort}. The parameter passed +to @code{_xabort} is available with @code{_XABORT_CODE(status)}. +@item _XABORT_RETRY +Transaction retry is possible.
+@item _XABORT_CONFLICT +Transaction abort due to a memory conflict with another thread. +@item _XABORT_CAPACITY +Transaction abort due to the transaction using too much memory. +@item _XABORT_DEBUG +Transaction abort due to a debug trap. +@item _XABORT_NESTED +Transaction abort in an inner nested transaction. +@end table + +@deftypefn {RTM Function} {void} _xend () +Commit the current transaction. When no transaction is active this will +fault. All memory side effects of the transaction will become visible +to other threads in an atomic manner. +@end deftypefn + +@deftypefn {RTM Function} {int} _xtest () +Return a nonzero value when a transaction is currently active, otherwise 0. +@end deftypefn + +@deftypefn {RTM Function} {void} _xabort (status) +Abort the current transaction. When no transaction is active this is a no-op. +The @var{status} argument must be an 8-bit constant; it is included in the status code returned +by @code{_xbegin}. +@end deftypefn + @node Target Format Checks @section Format Checks Specific to Particular Target Machines diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 94ca947..ba81ec7 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -676,44 +676,6 @@ Objective-C and Objective-C++ Dialects}.
-mschedule=@var{cpu-type} -mspace-regs -msio -mwsio @gol -munix=@var{unix-std} -nolibdld -static -threads} -@emph{x86 Options} -@gccoptlist{-mtune=@var{cpu-type} -march=@var{cpu-type} @gol --mtune-ctrl=@var{feature-list} -mdump-tune-features -mno-default @gol --mfpmath=@var{unit} @gol --masm=@var{dialect} -mno-fancy-math-387 @gol --mno-fp-ret-in-387 -msoft-float @gol --mno-wide-multiply -mrtd -malign-double @gol --mpreferred-stack-boundary=@var{num} @gol --mincoming-stack-boundary=@var{num} @gol --mcld -mcx16 -msahf -mmovbe -mcrc32 @gol --mrecip -mrecip=@var{opt} @gol --mvzeroupper -mprefer-avx128 @gol --mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 -mavx @gol --mavx2 -mavx512f -mavx512pf -mavx512er -mavx512cd -msha @gol --maes -mpclmul -mfsgsbase -mrdrnd -mf16c -mfma -mprefetchwt1 @gol --mclflushopt -mxsavec -mxsaves @gol --msse4a -m3dnow -mpopcnt -mabm -mbmi -mtbm -mfma4 -mxop -mlzcnt @gol --mbmi2 -mfxsr -mxsave -mxsaveopt -mrtm -mlwp -mmpx -mthreads @gol --mno-align-stringops -minline-all-stringops @gol --minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol --mmemcpy-strategy=@var{strategy} -mmemset-strategy=@var{strategy} @gol --mpush-args -maccumulate-outgoing-args -m128bit-long-double @gol --m96bit-long-double -mlong-double-64 -mlong-double-80 -mlong-double-128 @gol --mregparm=@var{num} -msseregparm @gol --mveclibabi=@var{type} -mvect8-ret-in-mem @gol --mpc32 -mpc64 -mpc80 -mstackrealign @gol --momit-leaf-frame-pointer -mno-red-zone -mno-tls-direct-seg-refs @gol --mcmodel=@var{code-model} -mabi=@var{name} -maddress-mode=@var{mode} @gol --m32 -m64 -mx32 -m16 -mlarge-data-threshold=@var{num} @gol --msse2avx -mfentry -mrecord-mcount -mnop-mcount -m8bit-idiv @gol --mavx256-split-unaligned-load -mavx256-split-unaligned-store @gol --malign-data=@var{type} -mstack-protector-guard=@var{guard}} - -@emph{x86 Windows Options} -@gccoptlist{-mconsole -mcygwin -mno-cygwin -mdll @gol --mnop-fun-dllimport -mthread @gol --municode -mwin32 -mwindows 
-fno-set-stack-executable} - @emph{IA-64 Options} @gccoptlist{-mbig-endian -mlittle-endian -mgnu-as -mgnu-ld -mno-pic @gol -mvolatile-asm-stop -mregister-names -msdata -mno-sdata @gol @@ -1081,6 +1043,44 @@ See RS/6000 and PowerPC Options. @gccoptlist{-mrtp -non-static -Bstatic -Bdynamic @gol -Xbind-lazy -Xbind-now} +@emph{x86 Options} +@gccoptlist{-mtune=@var{cpu-type} -march=@var{cpu-type} @gol +-mtune-ctrl=@var{feature-list} -mdump-tune-features -mno-default @gol +-mfpmath=@var{unit} @gol +-masm=@var{dialect} -mno-fancy-math-387 @gol +-mno-fp-ret-in-387 -msoft-float @gol +-mno-wide-multiply -mrtd -malign-double @gol +-mpreferred-stack-boundary=@var{num} @gol +-mincoming-stack-boundary=@var{num} @gol +-mcld -mcx16 -msahf -mmovbe -mcrc32 @gol +-mrecip -mrecip=@var{opt} @gol +-mvzeroupper -mprefer-avx128 @gol +-mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 -mavx @gol +-mavx2 -mavx512f -mavx512pf -mavx512er -mavx512cd -msha @gol +-maes -mpclmul -mfsgsbase -mrdrnd -mf16c -mfma -mprefetchwt1 @gol +-mclflushopt -mxsavec -mxsaves @gol +-msse4a -m3dnow -mpopcnt -mabm -mbmi -mtbm -mfma4 -mxop -mlzcnt @gol +-mbmi2 -mfxsr -mxsave -mxsaveopt -mrtm -mlwp -mmpx -mthreads @gol +-mno-align-stringops -minline-all-stringops @gol +-minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol +-mmemcpy-strategy=@var{strategy} -mmemset-strategy=@var{strategy} @gol +-mpush-args -maccumulate-outgoing-args -m128bit-long-double @gol +-m96bit-long-double -mlong-double-64 -mlong-double-80 -mlong-double-128 @gol +-mregparm=@var{num} -msseregparm @gol +-mveclibabi=@var{type} -mvect8-ret-in-mem @gol +-mpc32 -mpc64 -mpc80 -mstackrealign @gol +-momit-leaf-frame-pointer -mno-red-zone -mno-tls-direct-seg-refs @gol +-mcmodel=@var{code-model} -mabi=@var{name} -maddress-mode=@var{mode} @gol +-m32 -m64 -mx32 -m16 -mlarge-data-threshold=@var{num} @gol +-msse2avx -mfentry -mrecord-mcount -mnop-mcount -m8bit-idiv @gol +-mavx256-split-unaligned-load -mavx256-split-unaligned-store @gol 
+-malign-data=@var{type} -mstack-protector-guard=@var{guard}} + +@emph{x86 Windows Options} +@gccoptlist{-mconsole -mcygwin -mno-cygwin -mdll @gol +-mnop-fun-dllimport -mthread @gol +-municode -mwin32 -mwindows -fno-set-stack-executable} + @emph{Xstormy16 Options} @gccoptlist{-msim} @@ -11952,8 +11952,6 @@ platform. * GNU/Linux Options:: * H8/300 Options:: * HPPA Options:: -* x86 Options:: -* x86 Windows Options:: * IA-64 Options:: * LM32 Options:: * M32C Options:: @@ -11989,6 +11987,8 @@ platform. * Visium Options:: * VMS Options:: * VxWorks Options:: +* x86 Options:: +* x86 Windows Options:: * Xstormy16 Options:: * Xtensa Options:: * zSeries Options:: @@ -15361,1223 +15361,6 @@ under HP-UX@. This option sets flags for both the preprocessor and linker. @end table -@node x86 Options -@subsection x86 Options -@cindex x86 Options - -These @samp{-m} options are defined for the x86 family of computers. - -@table @gcctabopt - -@item -march=@var{cpu-type} -@opindex march -Generate instructions for the machine type @var{cpu-type}. In contrast to -@option{-mtune=@var{cpu-type}}, which merely tunes the generated code -for the specified @var{cpu-type}, @option{-march=@var{cpu-type}} allows GCC -to generate code that may not run at all on processors other than the one -indicated. Specifying @option{-march=@var{cpu-type}} implies -@option{-mtune=@var{cpu-type}}. - -The choices for @var{cpu-type} are: - -@table @samp -@item native -This selects the CPU to generate code for at compilation time by determining -the processor type of the compiling machine. Using @option{-march=native} -enables all instruction subsets supported by the local machine (hence -the result might not run on different machines). Using @option{-mtune=native} -produces code optimized for the local machine under the constraints -of the selected instruction set. - -@item i386 -Original Intel i386 CPU@. - -@item i486 -Intel i486 CPU@. (No scheduling is implemented for this chip.) 
- -@item i586 -@itemx pentium -Intel Pentium CPU with no MMX support. - -@item pentium-mmx -Intel Pentium MMX CPU, based on Pentium core with MMX instruction set support. - -@item pentiumpro -Intel Pentium Pro CPU@. - -@item i686 -When used with @option{-march}, the Pentium Pro -instruction set is used, so the code runs on all i686 family chips. -When used with @option{-mtune}, it has the same meaning as @samp{generic}. - -@item pentium2 -Intel Pentium II CPU, based on Pentium Pro core with MMX instruction set -support. - -@item pentium3 -@itemx pentium3m -Intel Pentium III CPU, based on Pentium Pro core with MMX and SSE instruction -set support. - -@item pentium-m -Intel Pentium M; low-power version of Intel Pentium III CPU -with MMX, SSE and SSE2 instruction set support. Used by Centrino notebooks. - -@item pentium4 -@itemx pentium4m -Intel Pentium 4 CPU with MMX, SSE and SSE2 instruction set support. - -@item prescott -Improved version of Intel Pentium 4 CPU with MMX, SSE, SSE2 and SSE3 instruction -set support. - -@item nocona -Improved version of Intel Pentium 4 CPU with 64-bit extensions, MMX, SSE, -SSE2 and SSE3 instruction set support. - -@item core2 -Intel Core 2 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3 and SSSE3 -instruction set support. - -@item nehalem -Intel Nehalem CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, -SSE4.1, SSE4.2 and POPCNT instruction set support. - -@item westmere -Intel Westmere CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, -SSE4.1, SSE4.2, POPCNT, AES and PCLMUL instruction set support. - -@item sandybridge -Intel Sandy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, -SSE4.1, SSE4.2, POPCNT, AVX, AES and PCLMUL instruction set support. - -@item ivybridge -Intel Ivy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, -SSE4.1, SSE4.2, POPCNT, AVX, AES, PCLMUL, FSGSBASE, RDRND and F16C -instruction set support. 
- -@item haswell -Intel Haswell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, -SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA, -BMI, BMI2 and F16C instruction set support. - -@item broadwell -Intel Broadwell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, -SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA, -BMI, BMI2, F16C, RDSEED, ADCX and PREFETCHW instruction set support. - -@item bonnell -Intel Bonnell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3 and SSSE3 -instruction set support. - -@item silvermont -Intel Silvermont CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, -SSE4.1, SSE4.2, POPCNT, AES, PCLMUL and RDRND instruction set support. - -@item k6 -AMD K6 CPU with MMX instruction set support. - -@item k6-2 -@itemx k6-3 -Improved versions of AMD K6 CPU with MMX and 3DNow!@: instruction set support. - -@item athlon -@itemx athlon-tbird -AMD Athlon CPU with MMX, 3dNOW!, enhanced 3DNow!@: and SSE prefetch instructions -support. - -@item athlon-4 -@itemx athlon-xp -@itemx athlon-mp -Improved AMD Athlon CPU with MMX, 3DNow!, enhanced 3DNow!@: and full SSE -instruction set support. - -@item k8 -@itemx opteron -@itemx athlon64 -@itemx athlon-fx -Processors based on the AMD K8 core with x86-64 instruction set support, -including the AMD Opteron, Athlon 64, and Athlon 64 FX processors. -(This supersets MMX, SSE, SSE2, 3DNow!, enhanced 3DNow!@: and 64-bit -instruction set extensions.) - -@item k8-sse3 -@itemx opteron-sse3 -@itemx athlon64-sse3 -Improved versions of AMD K8 cores with SSE3 instruction set support. - -@item amdfam10 -@itemx barcelona -CPUs based on AMD Family 10h cores with x86-64 instruction set support. (This -supersets MMX, SSE, SSE2, SSE3, SSE4A, 3DNow!, enhanced 3DNow!, ABM and 64-bit -instruction set extensions.) - -@item bdver1 -CPUs based on AMD Family 15h cores with x86-64 instruction set support. 
(This -supersets FMA4, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX, SSE, SSE2, SSE3, SSE4A, -SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set extensions.) -@item bdver2 -AMD Family 15h core based CPUs with x86-64 instruction set support. (This -supersets BMI, TBM, F16C, FMA, FMA4, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX, -SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set -extensions.) -@item bdver3 -AMD Family 15h core based CPUs with x86-64 instruction set support. (This -supersets BMI, TBM, F16C, FMA, FMA4, FSGSBASE, AVX, XOP, LWP, AES, -PCL_MUL, CX16, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and -64-bit instruction set extensions. -@item bdver4 -AMD Family 15h core based CPUs with x86-64 instruction set support. (This -supersets BMI, BMI2, TBM, F16C, FMA, FMA4, FSGSBASE, AVX, AVX2, XOP, LWP, -AES, PCL_MUL, CX16, MOVBE, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, -SSE4.2, ABM and 64-bit instruction set extensions. - -@item btver1 -CPUs based on AMD Family 14h cores with x86-64 instruction set support. (This -supersets MMX, SSE, SSE2, SSE3, SSSE3, SSE4A, CX16, ABM and 64-bit -instruction set extensions.) - -@item btver2 -CPUs based on AMD Family 16h cores with x86-64 instruction set support. This -includes MOVBE, F16C, BMI, AVX, PCL_MUL, AES, SSE4.2, SSE4.1, CX16, ABM, -SSE4A, SSSE3, SSE3, SSE2, SSE, MMX and 64-bit instruction set extensions. - -@item winchip-c6 -IDT WinChip C6 CPU, dealt in same way as i486 with additional MMX instruction -set support. - -@item winchip2 -IDT WinChip 2 CPU, dealt in same way as i486 with additional MMX and 3DNow!@: -instruction set support. - -@item c3 -VIA C3 CPU with MMX and 3DNow!@: instruction set support. (No scheduling is -implemented for this chip.) - -@item c3-2 -VIA C3-2 (Nehemiah/C5XL) CPU with MMX and SSE instruction set support. -(No scheduling is -implemented for this chip.) - -@item geode -AMD Geode embedded processor with MMX and 3DNow!@: instruction set support. 
-@end table - -@item -mtune=@var{cpu-type} -@opindex mtune -Tune to @var{cpu-type} everything applicable about the generated code, except -for the ABI and the set of available instructions. -While picking a specific @var{cpu-type} schedules things appropriately -for that particular chip, the compiler does not generate any code that -cannot run on the default machine type unless you use a -@option{-march=@var{cpu-type}} option. -For example, if GCC is configured for i686-pc-linux-gnu -then @option{-mtune=pentium4} generates code that is tuned for Pentium 4 -but still runs on i686 machines. - -The choices for @var{cpu-type} are the same as for @option{-march}. -In addition, @option{-mtune} supports 2 extra choices for @var{cpu-type}: - -@table @samp -@item generic -Produce code optimized for the most common IA32/@/AMD64/@/EM64T processors. -If you know the CPU on which your code will run, then you should use -the corresponding @option{-mtune} or @option{-march} option instead of -@option{-mtune=generic}. But, if you do not know exactly what CPU users -of your application will have, then you should use this option. - -As new processors are deployed in the marketplace, the behavior of this -option will change. Therefore, if you upgrade to a newer version of -GCC, code generation controlled by this option will change to reflect -the processors -that are most common at the time that version of GCC is released. - -There is no @option{-march=generic} option because @option{-march} -indicates the instruction set the compiler can use, and there is no -generic instruction set applicable to all processors. In contrast, -@option{-mtune} indicates the processor (or, in this case, collection of -processors) for which the code is optimized. - -@item intel -Produce code optimized for the most current Intel processors, which are -Haswell and Silvermont for this version of GCC. 
If you know the CPU -on which your code will run, then you should use the corresponding -@option{-mtune} or @option{-march} option instead of @option{-mtune=intel}. -But, if you want your application performs better on both Haswell and -Silvermont, then you should use this option. - -As new Intel processors are deployed in the marketplace, the behavior of -this option will change. Therefore, if you upgrade to a newer version of -GCC, code generation controlled by this option will change to reflect -the most current Intel processors at the time that version of GCC is -released. - -There is no @option{-march=intel} option because @option{-march} indicates -the instruction set the compiler can use, and there is no common -instruction set applicable to all processors. In contrast, -@option{-mtune} indicates the processor (or, in this case, collection of -processors) for which the code is optimized. -@end table - -@item -mcpu=@var{cpu-type} -@opindex mcpu -A deprecated synonym for @option{-mtune}. - -@item -mfpmath=@var{unit} -@opindex mfpmath -Generate floating-point arithmetic for selected unit @var{unit}. The choices -for @var{unit} are: - -@table @samp -@item 387 -Use the standard 387 floating-point coprocessor present on the majority of chips and -emulated otherwise. Code compiled with this option runs almost everywhere. -The temporary results are computed in 80-bit precision instead of the precision -specified by the type, resulting in slightly different results compared to most -of other chips. See @option{-ffloat-store} for more detailed description. - -This is the default choice for x86-32 targets. - -@item sse -Use scalar floating-point instructions present in the SSE instruction set. -This instruction set is supported by Pentium III and newer chips, -and in the AMD line -by Athlon-4, Athlon XP and Athlon MP chips. 
The earlier version of the SSE -instruction set supports only single-precision arithmetic, thus the double and -extended-precision arithmetic are still done using 387. A later version, present -only in Pentium 4 and AMD x86-64 chips, supports double-precision -arithmetic too. - -For the x86-32 compiler, you must use @option{-march=@var{cpu-type}}, @option{-msse} -or @option{-msse2} switches to enable SSE extensions and make this option -effective. For the x86-64 compiler, these extensions are enabled by default. - -The resulting code should be considerably faster in the majority of cases and avoid -the numerical instability problems of 387 code, but may break some existing -code that expects temporaries to be 80 bits. - -This is the default choice for the x86-64 compiler. - -@item sse,387 -@itemx sse+387 -@itemx both -Attempt to utilize both instruction sets at once. This effectively doubles the -amount of available registers, and on chips with separate execution units for -387 and SSE the execution resources too. Use this option with care, as it is -still experimental, because the GCC register allocator does not model separate -functional units well, resulting in unstable performance. -@end table - -@item -masm=@var{dialect} -@opindex masm=@var{dialect} -Output assembly instructions using selected @var{dialect}. Supported -choices are @samp{intel} or @samp{att} (the default). Darwin does -not support @samp{intel}. - -@item -mieee-fp -@itemx -mno-ieee-fp -@opindex mieee-fp -@opindex mno-ieee-fp -Control whether or not the compiler uses IEEE floating-point -comparisons. These correctly handle the case where the result of a -comparison is unordered. - -@item -msoft-float -@opindex msoft-float -Generate output containing library calls for floating point. - -@strong{Warning:} the requisite libraries are not part of GCC@. -Normally the facilities of the machine's usual C compiler are used, but -this can't be done directly in cross-compilation. 
You must make your -own arrangements to provide suitable library functions for -cross-compilation. - -On machines where a function returns floating-point results in the 80387 -register stack, some floating-point opcodes may be emitted even if -@option{-msoft-float} is used. - -@item -mno-fp-ret-in-387 -@opindex mno-fp-ret-in-387 -Do not use the FPU registers for return values of functions. - -The usual calling convention has functions return values of types -@code{float} and @code{double} in an FPU register, even if there -is no FPU@. The idea is that the operating system should emulate -an FPU@. - -The option @option{-mno-fp-ret-in-387} causes such values to be returned -in ordinary CPU registers instead. - -@item -mno-fancy-math-387 -@opindex mno-fancy-math-387 -Some 387 emulators do not support the @code{sin}, @code{cos} and -@code{sqrt} instructions for the 387. Specify this option to avoid -generating those instructions. This option is the default on FreeBSD, -OpenBSD and NetBSD@. This option is overridden when @option{-march} -indicates that the target CPU always has an FPU and so the -instruction does not need emulation. These -instructions are not generated unless you also use the -@option{-funsafe-math-optimizations} switch. - -@item -malign-double -@itemx -mno-align-double -@opindex malign-double -@opindex mno-align-double -Control whether GCC aligns @code{double}, @code{long double}, and -@code{long long} variables on a two-word boundary or a one-word -boundary. Aligning @code{double} variables on a two-word boundary -produces code that runs somewhat faster on a Pentium at the -expense of more memory. - -On x86-64, @option{-malign-double} is enabled by default. - -@strong{Warning:} if you use the @option{-malign-double} switch, -structures containing the above types are aligned differently than -the published application binary interface specifications for the x86-32 -and are not binary compatible with structures in code compiled -without that switch. 
- -@item -m96bit-long-double -@itemx -m128bit-long-double -@opindex m96bit-long-double -@opindex m128bit-long-double -These switches control the size of @code{long double} type. The x86-32 -application binary interface specifies the size to be 96 bits, -so @option{-m96bit-long-double} is the default in 32-bit mode. - -Modern architectures (Pentium and newer) prefer @code{long double} -to be aligned to an 8- or 16-byte boundary. In arrays or structures -conforming to the ABI, this is not possible. So specifying -@option{-m128bit-long-double} aligns @code{long double} -to a 16-byte boundary by padding the @code{long double} with an additional -32-bit zero. - -In the x86-64 compiler, @option{-m128bit-long-double} is the default choice as -its ABI specifies that @code{long double} is aligned on 16-byte boundary. - -Notice that neither of these options enable any extra precision over the x87 -standard of 80 bits for a @code{long double}. - -@strong{Warning:} if you override the default value for your target ABI, this -changes the size of -structures and arrays containing @code{long double} variables, -as well as modifying the function calling convention for functions taking -@code{long double}. Hence they are not binary-compatible -with code compiled without that switch. - -@item -mlong-double-64 -@itemx -mlong-double-80 -@itemx -mlong-double-128 -@opindex mlong-double-64 -@opindex mlong-double-80 -@opindex mlong-double-128 -These switches control the size of @code{long double} type. A size -of 64 bits makes the @code{long double} type equivalent to the @code{double} -type. This is the default for 32-bit Bionic C library. A size -of 128 bits makes the @code{long double} type equivalent to the -@code{__float128} type. This is the default for 64-bit Bionic C library. 
- -@strong{Warning:} if you override the default value for your target ABI, this -changes the size of -structures and arrays containing @code{long double} variables, -as well as modifying the function calling convention for functions taking -@code{long double}. Hence they are not binary-compatible -with code compiled without that switch. - -@item -malign-data=@var{type} -@opindex malign-data -Control how GCC aligns variables. Supported values for @var{type} are -@samp{compat} uses increased alignment value compatible uses GCC 4.8 -and earlier, @samp{abi} uses alignment value as specified by the -psABI, and @samp{cacheline} uses increased alignment value to match -the cache line size. @samp{compat} is the default. - -@item -mlarge-data-threshold=@var{threshold} -@opindex mlarge-data-threshold -When @option{-mcmodel=medium} is specified, data objects larger than -@var{threshold} are placed in the large data section. This value must be the -same across all objects linked into the binary, and defaults to 65535. - -@item -mrtd -@opindex mrtd -Use a different function-calling convention, in which functions that -take a fixed number of arguments return with the @code{ret @var{num}} -instruction, which pops their arguments while returning. This saves one -instruction in the caller since there is no need to pop the arguments -there. - -You can specify that an individual function is called with this calling -sequence with the function attribute @code{stdcall}. You can also -override the @option{-mrtd} option by using the function attribute -@code{cdecl}. @xref{Function Attributes}. - -@strong{Warning:} this calling convention is incompatible with the one -normally used on Unix, so you cannot use it if you need to call -libraries compiled with the Unix compiler. - -Also, you must provide function prototypes for all functions that -take variable numbers of arguments (including @code{printf}); -otherwise incorrect code is generated for calls to those -functions. 
- -In addition, seriously incorrect code results if you call a -function with too many arguments. (Normally, extra arguments are -harmlessly ignored.) - -@item -mregparm=@var{num} -@opindex mregparm -Control how many registers are used to pass integer arguments. By -default, no registers are used to pass arguments, and at most 3 -registers can be used. You can control this behavior for a specific -function by using the function attribute @code{regparm}. -@xref{Function Attributes}. - -@strong{Warning:} if you use this switch, and -@var{num} is nonzero, then you must build all modules with the same -value, including any libraries. This includes the system libraries and -startup modules. - -@item -msseregparm -@opindex msseregparm -Use SSE register passing conventions for float and double arguments -and return values. You can control this behavior for a specific -function by using the function attribute @code{sseregparm}. -@xref{Function Attributes}. - -@strong{Warning:} if you use this switch then you must build all -modules with the same value, including any libraries. This includes -the system libraries and startup modules. - -@item -mvect8-ret-in-mem -@opindex mvect8-ret-in-mem -Return 8-byte vectors in memory instead of MMX registers. This is the -default on Solaris@tie{}8 and 9 and VxWorks to match the ABI of the Sun -Studio compilers until version 12. Later compiler versions (starting -with Studio 12 Update@tie{}1) follow the ABI used by other x86 targets, which -is the default on Solaris@tie{}10 and later. @emph{Only} use this option if -you need to remain compatible with existing code produced by those -previous compiler versions or older versions of GCC@. - -@item -mpc32 -@itemx -mpc64 -@itemx -mpc80 -@opindex mpc32 -@opindex mpc64 -@opindex mpc80 - -Set 80387 floating-point precision to 32, 64 or 80 bits. 
When @option{-mpc32} -is specified, the significands of results of floating-point operations are -rounded to 24 bits (single precision); @option{-mpc64} rounds the -significands of results of floating-point operations to 53 bits (double -precision) and @option{-mpc80} rounds the significands of results of -floating-point operations to 64 bits (extended double precision), which is -the default. When this option is used, floating-point operations in higher -precisions are not available to the programmer without setting the FPU -control word explicitly. - -Setting the rounding of floating-point operations to less than the default -80 bits can speed some programs by 2% or more. Note that some mathematical -libraries assume that extended-precision (80-bit) floating-point operations -are enabled by default; routines in such libraries could suffer significant -loss of accuracy, typically through so-called ``catastrophic cancellation'', -when this option is used to set the precision to less than extended precision. - -@item -mstackrealign -@opindex mstackrealign -Realign the stack at entry. On the x86, the @option{-mstackrealign} -option generates an alternate prologue and epilogue that realigns the -run-time stack if necessary. This supports mixing legacy codes that keep -4-byte stack alignment with modern codes that keep 16-byte stack alignment for -SSE compatibility. See also the attribute @code{force_align_arg_pointer}, -applicable to individual functions. - -@item -mpreferred-stack-boundary=@var{num} -@opindex mpreferred-stack-boundary -Attempt to keep the stack boundary aligned to a 2 raised to @var{num} -byte boundary. If @option{-mpreferred-stack-boundary} is not specified, -the default is 4 (16 bytes or 128 bits). - -@strong{Warning:} When generating code for the x86-64 architecture with -SSE extensions disabled, @option{-mpreferred-stack-boundary=3} can be -used to keep the stack boundary aligned to 8 byte boundary. 
Since -x86-64 ABI require 16 byte stack alignment, this is ABI incompatible and -intended to be used in controlled environment where stack space is -important limitation. This option leads to wrong code when functions -compiled with 16 byte stack alignment (such as functions from a standard -library) are called with misaligned stack. In this case, SSE -instructions may lead to misaligned memory access traps. In addition, -variable arguments are handled incorrectly for 16 byte aligned -objects (including x87 long double and __int128), leading to wrong -results. You must build all modules with -@option{-mpreferred-stack-boundary=3}, including any libraries. This -includes the system libraries and startup modules. - -@item -mincoming-stack-boundary=@var{num} -@opindex mincoming-stack-boundary -Assume the incoming stack is aligned to a 2 raised to @var{num} byte -boundary. If @option{-mincoming-stack-boundary} is not specified, -the one specified by @option{-mpreferred-stack-boundary} is used. - -On Pentium and Pentium Pro, @code{double} and @code{long double} values -should be aligned to an 8-byte boundary (see @option{-malign-double}) or -suffer significant run time performance penalties. On Pentium III, the -Streaming SIMD Extension (SSE) data type @code{__m128} may not work -properly if it is not 16-byte aligned. - -To ensure proper alignment of this values on the stack, the stack boundary -must be as aligned as that required by any value stored on the stack. -Further, every function must be generated such that it keeps the stack -aligned. Thus calling a function compiled with a higher preferred -stack boundary from a function compiled with a lower preferred stack -boundary most likely misaligns the stack. It is recommended that -libraries that use callbacks always use the default setting. - -This extra alignment does consume extra stack space, and generally -increases code size. 
Code that is sensitive to stack space usage, such -as embedded systems and operating system kernels, may want to reduce the -preferred alignment to @option{-mpreferred-stack-boundary=2}. - -@need 200 -@item -mmmx -@opindex mmmx -@need 200 -@itemx -msse -@opindex msse -@need 200 -@itemx -msse2 -@need 200 -@itemx -msse3 -@need 200 -@itemx -mssse3 -@need 200 -@itemx -msse4 -@need 200 -@itemx -msse4a -@need 200 -@itemx -msse4.1 -@need 200 -@itemx -msse4.2 -@need 200 -@itemx -mavx -@opindex mavx -@need 200 -@itemx -mavx2 -@need 200 -@itemx -mavx512f -@need 200 -@itemx -mavx512pf -@need 200 -@itemx -mavx512er -@need 200 -@itemx -mavx512cd -@need 200 -@itemx -msha -@opindex msha -@need 200 -@itemx -maes -@opindex maes -@need 200 -@itemx -mpclmul -@opindex mpclmul -@need 200 -@itemx -mclfushopt -@opindex mclfushopt -@need 200 -@itemx -mfsgsbase -@opindex mfsgsbase -@need 200 -@itemx -mrdrnd -@opindex mrdrnd -@need 200 -@itemx -mf16c -@opindex mf16c -@need 200 -@itemx -mfma -@opindex mfma -@need 200 -@itemx -mfma4 -@need 200 -@itemx -mno-fma4 -@need 200 -@itemx -mprefetchwt1 -@opindex mprefetchwt1 -@need 200 -@itemx -mxop -@opindex mxop -@need 200 -@itemx -mlwp -@opindex mlwp -@need 200 -@itemx -m3dnow -@opindex m3dnow -@need 200 -@itemx -mpopcnt -@opindex mpopcnt -@need 200 -@itemx -mabm -@opindex mabm -@need 200 -@itemx -mbmi -@opindex mbmi -@need 200 -@itemx -mbmi2 -@need 200 -@itemx -mlzcnt -@opindex mlzcnt -@need 200 -@itemx -mfxsr -@opindex mfxsr -@need 200 -@itemx -mxsave -@opindex mxsave -@need 200 -@itemx -mxsaveopt -@opindex mxsaveopt -@need 200 -@itemx -mxsavec -@opindex mxsavec -@need 200 -@itemx -mxsaves -@opindex mxsaves -@need 200 -@itemx -mrtm -@opindex mrtm -@need 200 -@itemx -mtbm -@opindex mtbm -@need 200 -@itemx -mmpx -@opindex mmpx -These switches enable the use of instructions in the MMX, SSE, -SSE2, SSE3, SSSE3, SSE4.1, AVX, AVX2, AVX512F, AVX512PF, AVX512ER, AVX512CD, -SHA, AES, PCLMUL, FSGSBASE, RDRND, F16C, FMA, SSE4A, FMA4, XOP, LWP, ABM, -BMI, 
BMI2, FXSR, XSAVE, XSAVEOPT, LZCNT, RTM, MPX or 3DNow!@: -extended instruction sets. Each has a corresponding @option{-mno-} option -to disable use of these instructions. - -These extensions are also available as built-in functions: see -@ref{x86 Built-in Functions}, for details of the functions enabled and -disabled by these switches. - -To generate SSE/SSE2 instructions automatically from floating-point -code (as opposed to 387 instructions), see @option{-mfpmath=sse}. - -GCC depresses SSEx instructions when @option{-mavx} is used. Instead, it -generates new AVX instructions or AVX equivalence for all SSEx instructions -when needed. - -These options enable GCC to use these extended instructions in -generated code, even without @option{-mfpmath=sse}. Applications that -perform run-time CPU detection must compile separate files for each -supported architecture, using the appropriate flags. In particular, -the file containing the CPU detection code should be compiled without -these options. - -@item -mdump-tune-features -@opindex mdump-tune-features -This option instructs GCC to dump the names of the x86 performance -tuning features and default settings. The names can be used in -@option{-mtune-ctrl=@var{feature-list}}. - -@item -mtune-ctrl=@var{feature-list} -@opindex mtune-ctrl=@var{feature-list} -This option is used to do fine grain control of x86 code generation features. -@var{feature-list} is a comma separated list of @var{feature} names. See also -@option{-mdump-tune-features}. When specified, the @var{feature} is turned -on if it is not preceded with @samp{^}, otherwise, it is turned off. -@option{-mtune-ctrl=@var{feature-list}} is intended to be used by GCC -developers. Using it may lead to code paths not covered by testing and can -potentially result in compiler ICEs or runtime errors. - -@item -mno-default -@opindex mno-default -This option instructs GCC to turn off all tunable features. 
See also -@option{-mtune-ctrl=@var{feature-list}} and @option{-mdump-tune-features}. - -@item -mcld -@opindex mcld -This option instructs GCC to emit a @code{cld} instruction in the prologue -of functions that use string instructions. String instructions depend on -the DF flag to select between autoincrement or autodecrement mode. While the -ABI specifies the DF flag to be cleared on function entry, some operating -systems violate this specification by not clearing the DF flag in their -exception dispatchers. The exception handler can be invoked with the DF flag -set, which leads to wrong direction mode when string instructions are used. -This option can be enabled by default on 32-bit x86 targets by configuring -GCC with the @option{--enable-cld} configure option. Generation of @code{cld} -instructions can be suppressed with the @option{-mno-cld} compiler option -in this case. - -@item -mvzeroupper -@opindex mvzeroupper -This option instructs GCC to emit a @code{vzeroupper} instruction -before a transfer of control flow out of the function to minimize -the AVX to SSE transition penalty as well as remove unnecessary @code{zeroupper} -intrinsics. - -@item -mprefer-avx128 -@opindex mprefer-avx128 -This option instructs GCC to use 128-bit AVX instructions instead of -256-bit AVX instructions in the auto-vectorizer. - -@item -mcx16 -@opindex mcx16 -This option enables GCC to generate @code{CMPXCHG16B} instructions. -@code{CMPXCHG16B} allows for atomic operations on 128-bit double quadword -(or oword) data types. -This is useful for high-resolution counters that can be updated -by multiple processors (or cores). This instruction is generated as part of -atomic built-in functions: see @ref{__sync Builtins} or -@ref{__atomic Builtins} for details. - -@item -msahf -@opindex msahf -This option enables generation of @code{SAHF} instructions in 64-bit code. 
-Early Intel Pentium 4 CPUs with Intel 64 support, -prior to the introduction of Pentium 4 G1 step in December 2005, -lacked the @code{LAHF} and @code{SAHF} instructions -which are supported by AMD64. -These are load and store instructions, respectively, for certain status flags. -In 64-bit mode, the @code{SAHF} instruction is used to optimize @code{fmod}, -@code{drem}, and @code{remainder} built-in functions; -see @ref{Other Builtins} for details. - -@item -mmovbe -@opindex mmovbe -This option enables use of the @code{movbe} instruction to implement -@code{__builtin_bswap32} and @code{__builtin_bswap64}. - -@item -mcrc32 -@opindex mcrc32 -This option enables built-in functions @code{__builtin_ia32_crc32qi}, -@code{__builtin_ia32_crc32hi}, @code{__builtin_ia32_crc32si} and -@code{__builtin_ia32_crc32di} to generate the @code{crc32} machine instruction. - -@item -mrecip -@opindex mrecip -This option enables use of @code{RCPSS} and @code{RSQRTSS} instructions -(and their vectorized variants @code{RCPPS} and @code{RSQRTPS}) -with an additional Newton-Raphson step -to increase precision instead of @code{DIVSS} and @code{SQRTSS} -(and their vectorized -variants) for single-precision floating-point arguments. These instructions -are generated only when @option{-funsafe-math-optimizations} is enabled -together with @option{-finite-math-only} and @option{-fno-trapping-math}. -Note that while the throughput of the sequence is higher than the throughput -of the non-reciprocal instruction, the precision of the sequence can be -decreased by up to 2 ulp (i.e. the inverse of 1.0 equals 0.99999994). - -Note that GCC implements @code{1.0f/sqrtf(@var{x})} in terms of @code{RSQRTSS} -(or @code{RSQRTPS}) already with @option{-ffast-math} (or the above option -combination), and doesn't need @option{-mrecip}. 
- -Also note that GCC emits the above sequence with additional Newton-Raphson step -for vectorized single-float division and vectorized @code{sqrtf(@var{x})} -already with @option{-ffast-math} (or the above option combination), and -doesn't need @option{-mrecip}. - -@item -mrecip=@var{opt} -@opindex mrecip=opt -This option controls which reciprocal estimate instructions -may be used. @var{opt} is a comma-separated list of options, which may -be preceded by a @samp{!} to invert the option: - -@table @samp -@item all -Enable all estimate instructions. - -@item default -Enable the default instructions, equivalent to @option{-mrecip}. - -@item none -Disable all estimate instructions, equivalent to @option{-mno-recip}. - -@item div -Enable the approximation for scalar division. - -@item vec-div -Enable the approximation for vectorized division. - -@item sqrt -Enable the approximation for scalar square root. - -@item vec-sqrt -Enable the approximation for vectorized square root. -@end table - -So, for example, @option{-mrecip=all,!sqrt} enables -all of the reciprocal approximations, except for square root. - -@item -mveclibabi=@var{type} -@opindex mveclibabi -Specifies the ABI type to use for vectorizing intrinsics using an -external library. Supported values for @var{type} are @samp{svml} -for the Intel short -vector math library and @samp{acml} for the AMD math core library. -To use this option, both @option{-ftree-vectorize} and -@option{-funsafe-math-optimizations} have to be enabled, and an SVML or ACML -ABI-compatible library must be specified at link time. 
- -GCC currently emits calls to @code{vmldExp2}, -@code{vmldLn2}, @code{vmldLog102}, @code{vmldLog102}, @code{vmldPow2}, -@code{vmldTanh2}, @code{vmldTan2}, @code{vmldAtan2}, @code{vmldAtanh2}, -@code{vmldCbrt2}, @code{vmldSinh2}, @code{vmldSin2}, @code{vmldAsinh2}, -@code{vmldAsin2}, @code{vmldCosh2}, @code{vmldCos2}, @code{vmldAcosh2}, -@code{vmldAcos2}, @code{vmlsExp4}, @code{vmlsLn4}, @code{vmlsLog104}, -@code{vmlsLog104}, @code{vmlsPow4}, @code{vmlsTanh4}, @code{vmlsTan4}, -@code{vmlsAtan4}, @code{vmlsAtanh4}, @code{vmlsCbrt4}, @code{vmlsSinh4}, -@code{vmlsSin4}, @code{vmlsAsinh4}, @code{vmlsAsin4}, @code{vmlsCosh4}, -@code{vmlsCos4}, @code{vmlsAcosh4} and @code{vmlsAcos4} for corresponding -function type when @option{-mveclibabi=svml} is used, and @code{__vrd2_sin}, -@code{__vrd2_cos}, @code{__vrd2_exp}, @code{__vrd2_log}, @code{__vrd2_log2}, -@code{__vrd2_log10}, @code{__vrs4_sinf}, @code{__vrs4_cosf}, -@code{__vrs4_expf}, @code{__vrs4_logf}, @code{__vrs4_log2f}, -@code{__vrs4_log10f} and @code{__vrs4_powf} for the corresponding function type -when @option{-mveclibabi=acml} is used. - -@item -mabi=@var{name} -@opindex mabi -Generate code for the specified calling convention. Permissible values -are @samp{sysv} for the ABI used on GNU/Linux and other systems, and -@samp{ms} for the Microsoft ABI. The default is to use the Microsoft -ABI when targeting Microsoft Windows and the SysV ABI on all other systems. -You can control this behavior for specific functions by -using the function attributes @code{ms_abi} and @code{sysv_abi}. -@xref{Function Attributes}. - -@item -mtls-dialect=@var{type} -@opindex mtls-dialect -Generate code to access thread-local storage using the @samp{gnu} or -@samp{gnu2} conventions. @samp{gnu} is the conservative default; -@samp{gnu2} is more efficient, but it may add compile- and run-time -requirements that cannot be satisfied on all systems. 
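As a minimal sketch of the per-function control that the @option{-mabi} description mentions, the following C fragment marks one function with @code{ms_abi} and another with @code{sysv_abi}. The macro names @code{MS_ABI} and @code{SYSV_ABI} and the function names are hypothetical. The sketch assumes the attributes are only wanted on x86-64 with a GNU-compatible compiler, and it expands them to nothing elsewhere so that the code still compiles.

```c
#include <assert.h>

/* Apply the calling-convention attributes only where this sketch
   assumes they are meaningful (x86-64, GNU-compatible compiler).  */
#if defined(__x86_64__) && defined(__GNUC__)
# define MS_ABI   __attribute__((ms_abi))
# define SYSV_ABI __attribute__((sysv_abi))
#else
# define MS_ABI
# define SYSV_ABI
#endif

/* Same body, two calling conventions.  */
MS_ABI long add_ms(long a, long b)     { return a + b; }
SYSV_ABI long add_sysv(long a, long b) { return a + b; }

/* The compiler sees both declarations, so it emits the right
   argument-passing sequence for each call.  */
long add_both(long a, long b)
{
    long r = add_ms(a, b);
    assert(r == add_sysv(a, b));
    return r;
}
```

Because the attribute is part of the function type, callers that see the declaration need no special handling. Only the registers used to pass the arguments differ.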
- -@item -mpush-args -@itemx -mno-push-args -@opindex mpush-args -@opindex mno-push-args -Use PUSH operations to store outgoing parameters. This method is shorter -and usually as fast as the method using SUB/MOV operations and is enabled -by default. In some cases disabling it may improve performance because of -improved scheduling and reduced dependencies. - -@item -maccumulate-outgoing-args -@opindex maccumulate-outgoing-args -If enabled, the maximum amount of space required for outgoing arguments is -computed in the function prologue. This is faster on most modern CPUs -because of reduced dependencies, improved scheduling and reduced stack usage -when the preferred stack boundary is not equal to 2. The drawback is a notable -increase in code size. This switch implies @option{-mno-push-args}. - -@item -mthreads -@opindex mthreads -Support thread-safe exception handling on MinGW. Programs that rely -on thread-safe exception handling must compile and link all code with the -@option{-mthreads} option. When compiling, @option{-mthreads} defines -@option{-D_MT}; when linking, it links in a special thread helper library -@option{-lmingwthrd} which cleans up per-thread exception-handling data. - -@item -mno-align-stringops -@opindex mno-align-stringops -Do not align the destination of inlined string operations. This switch reduces -code size and improves performance in case the destination is already aligned, -but GCC doesn't know about it. - -@item -minline-all-stringops -@opindex minline-all-stringops -By default GCC inlines string operations only when the destination is -known to be aligned to at least a 4-byte boundary. -This option enables more inlining and increases code -size, but may improve performance of code that depends on fast -@code{memcpy}, @code{strlen}, -and @code{memset} for short lengths. 
- -@item -minline-stringops-dynamically -@opindex minline-stringops-dynamically -For string operations of unknown size, use run-time checks with -inline code for small blocks and a library call for large blocks. - -@item -mstringop-strategy=@var{alg} -@opindex mstringop-strategy=@var{alg} -Override the internal decision heuristic for the particular algorithm to use -for inlining string operations. The allowed values for @var{alg} are: - -@table @samp -@item rep_byte -@itemx rep_4byte -@itemx rep_8byte -Expand using i386 @code{rep} prefix of the specified size. - -@item byte_loop -@itemx loop -@itemx unrolled_loop -Expand into an inline loop. - -@item libcall -Always use a library call. -@end table - -@item -mmemcpy-strategy=@var{strategy} -@opindex mmemcpy-strategy=@var{strategy} -Override the internal decision heuristic to decide if @code{__builtin_memcpy} -should be inlined and what inline algorithm to use when the expected size -of the copy operation is known. @var{strategy} -is a comma-separated list of @var{alg}:@var{max_size}:@var{dest_align} triplets. -@var{alg} is specified in @option{-mstringop-strategy}, @var{max_size} specifies -the max byte size with which inline algorithm @var{alg} is allowed. For the last -triplet, the @var{max_size} must be @code{-1}. The @var{max_size} of the triplets -in the list must be specified in increasing order. The minimal byte size for -@var{alg} is @code{0} for the first triplet and @code{@var{max_size} + 1} of the -preceding range. - -@item -mmemset-strategy=@var{strategy} -@opindex mmemset-strategy=@var{strategy} -The option is similar to @option{-mmemcpy-strategy=} except that it is to control -@code{__builtin_memset} expansion. - -@item -momit-leaf-frame-pointer -@opindex momit-leaf-frame-pointer -Don't keep the frame pointer in a register for leaf functions. This -avoids the instructions to save, set up, and restore frame pointers and -makes an extra register available in leaf functions. 
The option -@option{-fomit-leaf-frame-pointer} removes the frame pointer for leaf functions, -which might make debugging harder. - -@item -mtls-direct-seg-refs -@itemx -mno-tls-direct-seg-refs -@opindex mtls-direct-seg-refs -Controls whether TLS variables may be accessed with offsets from the -TLS segment register (@code{%gs} for 32-bit, @code{%fs} for 64-bit), -or whether the thread base pointer must be added. Whether or not this -is valid depends on the operating system, and whether it maps the -segment to cover the entire TLS area. - -For systems that use the GNU C Library, the default is on. - -@item -msse2avx -@itemx -mno-sse2avx -@opindex msse2avx -Specify that the assembler should encode SSE instructions with VEX -prefix. The option @option{-mavx} turns this on by default. - -@item -mfentry -@itemx -mno-fentry -@opindex mfentry -If profiling is active (@option{-pg}), put the profiling -counter call before the prologue. -Note: On x86 architectures the attribute @code{ms_hook_prologue} -isn't possible at the moment for @option{-mfentry} and @option{-pg}. - -@item -mrecord-mcount -@itemx -mno-record-mcount -@opindex mrecord-mcount -If profiling is active (@option{-pg}), generate a __mcount_loc section -that contains pointers to each profiling call. This is useful for -automatically patching and out calls. - -@item -mnop-mcount -@itemx -mno-nop-mcount -@opindex mnop-mcount -If profiling is active (@option{-pg}), generate the calls to -the profiling functions as nops. This is useful when they -should be patched in later dynamically. This is likely only -useful together with @option{-mrecord-mcount}. - -@item -mskip-rax-setup -@itemx -mno-skip-rax-setup -@opindex mskip-rax-setup -When generating code for the x86-64 architecture with SSE extensions -disabled, @option{-skip-rax-setup} can be used to skip setting up RAX -register when there are no variable arguments passed in vector registers. 
- -@strong{Warning:} Since RAX register is used to avoid unnecessarily -saving vector registers on stack when passing variable arguments, the -impacts of this option are callees may waste some stack space, -misbehave or jump to a random location. GCC 4.4 or newer don't have -those issues, regardless the RAX register value. - -@item -m8bit-idiv -@itemx -mno-8bit-idiv -@opindex m8bit-idiv -On some processors, like Intel Atom, 8-bit unsigned integer divide is -much faster than 32-bit/64-bit integer divide. This option generates a -run-time check. If both dividend and divisor are within range of 0 -to 255, 8-bit unsigned integer divide is used instead of -32-bit/64-bit integer divide. - -@item -mavx256-split-unaligned-load -@itemx -mavx256-split-unaligned-store -@opindex mavx256-split-unaligned-load -@opindex mavx256-split-unaligned-store -Split 32-byte AVX unaligned load and store. - -@item -mstack-protector-guard=@var{guard} -@opindex mstack-protector-guard=@var{guard} -Generate stack protection code using canary at @var{guard}. Supported -locations are @samp{global} for global canary or @samp{tls} for per-thread -canary in the TLS block (the default). This option has effect only when -@option{-fstack-protector} or @option{-fstack-protector-all} is specified. - -@end table - -These @samp{-m} switches are supported in addition to the above -on x86-64 processors in 64-bit environments. - -@table @gcctabopt -@item -m32 -@itemx -m64 -@itemx -mx32 -@itemx -m16 -@opindex m32 -@opindex m64 -@opindex mx32 -@opindex m16 -Generate code for a 16-bit, 32-bit or 64-bit environment. -The @option{-m32} option sets @code{int}, @code{long}, and pointer types -to 32 bits, and -generates code that runs on any i386 system. - -The @option{-m64} option sets @code{int} to 32 bits and @code{long} and pointer -types to 64 bits, and generates code for the x86-64 architecture. -For Darwin only the @option{-m64} option also turns off the @option{-fno-pic} -and @option{-mdynamic-no-pic} options. 
- -The @option{-mx32} option sets @code{int}, @code{long}, and pointer types -to 32 bits, and -generates code for the x86-64 architecture. - -The @option{-m16} option is the same as @option{-m32}, except for that -it outputs the @code{.code16gcc} assembly directive at the beginning of -the assembly output so that the binary can run in 16-bit mode. - -@item -mno-red-zone -@opindex mno-red-zone -Do not use a so-called ``red zone'' for x86-64 code. The red zone is mandated -by the x86-64 ABI; it is a 128-byte area beyond the location of the -stack pointer that is not modified by signal or interrupt handlers -and therefore can be used for temporary data without adjusting the stack -pointer. The flag @option{-mno-red-zone} disables this red zone. - -@item -mcmodel=small -@opindex mcmodel=small -Generate code for the small code model: the program and its symbols must -be linked in the lower 2 GB of the address space. Pointers are 64 bits. -Programs can be statically or dynamically linked. This is the default -code model. - -@item -mcmodel=kernel -@opindex mcmodel=kernel -Generate code for the kernel code model. The kernel runs in the -negative 2 GB of the address space. -This model has to be used for Linux kernel code. - -@item -mcmodel=medium -@opindex mcmodel=medium -Generate code for the medium model: the program is linked in the lower 2 -GB of the address space. Small symbols are also placed there. Symbols -with sizes larger than @option{-mlarge-data-threshold} are put into -large data or BSS sections and can be located above 2GB. Programs can -be statically or dynamically linked. - -@item -mcmodel=large -@opindex mcmodel=large -Generate code for the large model. This model makes no assumptions -about addresses and sizes of sections. - -@item -maddress-mode=long -@opindex maddress-mode=long -Generate code for long address mode. This is only supported for 64-bit -and x32 environments. It is the default address mode for 64-bit -environments. 
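The type widths selected by @option{-m32}, @option{-m64} and @option{-mx32}, as described above, can be checked from C with @code{sizeof}. The sketch below is an illustration only; the helper names (@code{int_bits}, @code{long_bits}, @code{pointer_bits}, @code{is_lp64}) are hypothetical and not part of GCC.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Width in bits of the three types the option descriptions mention.  */
size_t int_bits(void)     { return sizeof(int) * CHAR_BIT; }
size_t long_bits(void)    { return sizeof(long) * CHAR_BIT; }
size_t pointer_bits(void) { return sizeof(void *) * CHAR_BIT; }

/* Returns 1 when long and pointers are both 64 bits wide, which is
   the layout the text describes for -m64, and 0 otherwise.  */
int is_lp64(void)
{
    assert(int_bits() <= long_bits());
    return long_bits() == 64 && pointer_bits() == 64;
}
```

Compiling the same file once per option and printing the three widths is a quick way to confirm which data model a given toolchain configuration actually uses.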
- -@item -maddress-mode=short -@opindex maddress-mode=short -Generate code for short address mode. This is only supported for 32-bit -and x32 environments. It is the default address mode for 32-bit and -x32 environments. -@end table - -@node x86 Windows Options -@subsection x86 Windows Options -@cindex x86 Windows Options -@cindex Windows Options for x86 - -These additional options are available for Microsoft Windows targets: - -@table @gcctabopt -@item -mconsole -@opindex mconsole -This option -specifies that a console application is to be generated, by -instructing the linker to set the PE header subsystem type -required for console applications. -This option is available for Cygwin and MinGW targets and is -enabled by default on those targets. - -@item -mdll -@opindex mdll -This option is available for Cygwin and MinGW targets. It -specifies that a DLL---a dynamic link library---is to be -generated, enabling the selection of the required runtime -startup object and entry point. - -@item -mnop-fun-dllimport -@opindex mnop-fun-dllimport -This option is available for Cygwin and MinGW targets. It -specifies that the @code{dllimport} attribute should be ignored. - -@item -mthread -@opindex mthread -This option is available for MinGW targets. It specifies -that MinGW-specific thread support is to be used. - -@item -municode -@opindex municode -This option is available for MinGW-w64 targets. It causes -the @code{UNICODE} preprocessor macro to be predefined, and -chooses Unicode-capable runtime startup code. - -@item -mwin32 -@opindex mwin32 -This option is available for Cygwin and MinGW targets. It -specifies that the typical Microsoft Windows predefined macros are to -be set in the pre-processor, but does not influence the choice -of runtime library/startup code. - -@item -mwindows -@opindex mwindows -This option is available for Cygwin and MinGW targets. 
It -specifies that a GUI application is to be generated by -instructing the linker to set the PE header subsystem type -appropriately. - -@item -fno-set-stack-executable -@opindex fno-set-stack-executable -This option is available for MinGW targets. It specifies that -the executable flag for the stack used by nested functions isn't -set. This is necessary for binaries running in kernel mode of -Microsoft Windows, as there the User32 API, which is used to set executable -privileges, isn't available. - -@item -fwritable-relocated-rdata -@opindex fno-writable-relocated-rdata -This option is available for MinGW and Cygwin targets. It specifies -that relocated-data in read-only section is put into .data -section. This is a necessary for older runtimes not supporting -modification of .rdata sections for pseudo-relocation. - -@item -mpe-aligned-commons -@opindex mpe-aligned-commons -This option is available for Cygwin and MinGW targets. It -specifies that the GNU extension to the PE file format that -permits the correct alignment of COMMON variables should be -used when generating code. It is enabled by default if -GCC detects that the target assembler found during configuration -supports the feature. -@end table - -See also under @ref{x86 Options} for standard options. - @node IA-64 Options @subsection IA-64 Options @cindex IA-64 Options @@ -22850,6 +21633,1223 @@ Disable lazy binding of function calls. This option is the default and is defined for compatibility with Diab. @end table +@node x86 Options +@subsection x86 Options +@cindex x86 Options + +These @samp{-m} options are defined for the x86 family of computers. + +@table @gcctabopt + +@item -march=@var{cpu-type} +@opindex march +Generate instructions for the machine type @var{cpu-type}. 
In contrast to +@option{-mtune=@var{cpu-type}}, which merely tunes the generated code +for the specified @var{cpu-type}, @option{-march=@var{cpu-type}} allows GCC +to generate code that may not run at all on processors other than the one +indicated. Specifying @option{-march=@var{cpu-type}} implies +@option{-mtune=@var{cpu-type}}. + +The choices for @var{cpu-type} are: + +@table @samp +@item native +This selects the CPU to generate code for at compilation time by determining +the processor type of the compiling machine. Using @option{-march=native} +enables all instruction subsets supported by the local machine (hence +the result might not run on different machines). Using @option{-mtune=native} +produces code optimized for the local machine under the constraints +of the selected instruction set. + +@item i386 +Original Intel i386 CPU@. + +@item i486 +Intel i486 CPU@. (No scheduling is implemented for this chip.) + +@item i586 +@itemx pentium +Intel Pentium CPU with no MMX support. + +@item pentium-mmx +Intel Pentium MMX CPU, based on Pentium core with MMX instruction set support. + +@item pentiumpro +Intel Pentium Pro CPU@. + +@item i686 +When used with @option{-march}, the Pentium Pro +instruction set is used, so the code runs on all i686 family chips. +When used with @option{-mtune}, it has the same meaning as @samp{generic}. + +@item pentium2 +Intel Pentium II CPU, based on Pentium Pro core with MMX instruction set +support. + +@item pentium3 +@itemx pentium3m +Intel Pentium III CPU, based on Pentium Pro core with MMX and SSE instruction +set support. + +@item pentium-m +Intel Pentium M; low-power version of Intel Pentium III CPU +with MMX, SSE and SSE2 instruction set support. Used by Centrino notebooks. + +@item pentium4 +@itemx pentium4m +Intel Pentium 4 CPU with MMX, SSE and SSE2 instruction set support. + +@item prescott +Improved version of Intel Pentium 4 CPU with MMX, SSE, SSE2 and SSE3 instruction +set support. 
+ +@item nocona +Improved version of Intel Pentium 4 CPU with 64-bit extensions, MMX, SSE, +SSE2 and SSE3 instruction set support. + +@item core2 +Intel Core 2 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3 and SSSE3 +instruction set support. + +@item nehalem +Intel Nehalem CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, +SSE4.1, SSE4.2 and POPCNT instruction set support. + +@item westmere +Intel Westmere CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, +SSE4.1, SSE4.2, POPCNT, AES and PCLMUL instruction set support. + +@item sandybridge +Intel Sandy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, +SSE4.1, SSE4.2, POPCNT, AVX, AES and PCLMUL instruction set support. + +@item ivybridge +Intel Ivy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, +SSE4.1, SSE4.2, POPCNT, AVX, AES, PCLMUL, FSGSBASE, RDRND and F16C +instruction set support. + +@item haswell +Intel Haswell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, +SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA, +BMI, BMI2 and F16C instruction set support. + +@item broadwell +Intel Broadwell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, +SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA, +BMI, BMI2, F16C, RDSEED, ADCX and PREFETCHW instruction set support. + +@item bonnell +Intel Bonnell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3 and SSSE3 +instruction set support. + +@item silvermont +Intel Silvermont CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, +SSE4.1, SSE4.2, POPCNT, AES, PCLMUL and RDRND instruction set support. + +@item k6 +AMD K6 CPU with MMX instruction set support. + +@item k6-2 +@itemx k6-3 +Improved versions of AMD K6 CPU with MMX and 3DNow!@: instruction set support. + +@item athlon +@itemx athlon-tbird +AMD Athlon CPU with MMX, 3dNOW!, enhanced 3DNow!@: and SSE prefetch instructions +support. 
+ +@item athlon-4 +@itemx athlon-xp +@itemx athlon-mp +Improved AMD Athlon CPU with MMX, 3DNow!, enhanced 3DNow!@: and full SSE +instruction set support. + +@item k8 +@itemx opteron +@itemx athlon64 +@itemx athlon-fx +Processors based on the AMD K8 core with x86-64 instruction set support, +including the AMD Opteron, Athlon 64, and Athlon 64 FX processors. +(This supersets MMX, SSE, SSE2, 3DNow!, enhanced 3DNow!@: and 64-bit +instruction set extensions.) + +@item k8-sse3 +@itemx opteron-sse3 +@itemx athlon64-sse3 +Improved versions of AMD K8 cores with SSE3 instruction set support. + +@item amdfam10 +@itemx barcelona +CPUs based on AMD Family 10h cores with x86-64 instruction set support. (This +supersets MMX, SSE, SSE2, SSE3, SSE4A, 3DNow!, enhanced 3DNow!, ABM and 64-bit +instruction set extensions.) + +@item bdver1 +CPUs based on AMD Family 15h cores with x86-64 instruction set support. (This +supersets FMA4, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX, SSE, SSE2, SSE3, SSE4A, +SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set extensions.) +@item bdver2 +AMD Family 15h core based CPUs with x86-64 instruction set support. (This +supersets BMI, TBM, F16C, FMA, FMA4, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX, +SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set +extensions.) +@item bdver3 +AMD Family 15h core based CPUs with x86-64 instruction set support. (This +supersets BMI, TBM, F16C, FMA, FMA4, FSGSBASE, AVX, XOP, LWP, AES, +PCL_MUL, CX16, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and +64-bit instruction set extensions.) +@item bdver4 +AMD Family 15h core based CPUs with x86-64 instruction set support. (This +supersets BMI, BMI2, TBM, F16C, FMA, FMA4, FSGSBASE, AVX, AVX2, XOP, LWP, +AES, PCL_MUL, CX16, MOVBE, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, +SSE4.2, ABM and 64-bit instruction set extensions.) + +@item btver1 +CPUs based on AMD Family 14h cores with x86-64 instruction set support. 
(This +supersets MMX, SSE, SSE2, SSE3, SSSE3, SSE4A, CX16, ABM and 64-bit +instruction set extensions.) + +@item btver2 +CPUs based on AMD Family 16h cores with x86-64 instruction set support. This +includes MOVBE, F16C, BMI, AVX, PCL_MUL, AES, SSE4.2, SSE4.1, CX16, ABM, +SSE4A, SSSE3, SSE3, SSE2, SSE, MMX and 64-bit instruction set extensions. + +@item winchip-c6 +IDT WinChip C6 CPU, dealt in same way as i486 with additional MMX instruction +set support. + +@item winchip2 +IDT WinChip 2 CPU, dealt in same way as i486 with additional MMX and 3DNow!@: +instruction set support. + +@item c3 +VIA C3 CPU with MMX and 3DNow!@: instruction set support. (No scheduling is +implemented for this chip.) + +@item c3-2 +VIA C3-2 (Nehemiah/C5XL) CPU with MMX and SSE instruction set support. +(No scheduling is +implemented for this chip.) + +@item geode +AMD Geode embedded processor with MMX and 3DNow!@: instruction set support. +@end table + +@item -mtune=@var{cpu-type} +@opindex mtune +Tune to @var{cpu-type} everything applicable about the generated code, except +for the ABI and the set of available instructions. +While picking a specific @var{cpu-type} schedules things appropriately +for that particular chip, the compiler does not generate any code that +cannot run on the default machine type unless you use a +@option{-march=@var{cpu-type}} option. +For example, if GCC is configured for i686-pc-linux-gnu +then @option{-mtune=pentium4} generates code that is tuned for Pentium 4 +but still runs on i686 machines. + +The choices for @var{cpu-type} are the same as for @option{-march}. +In addition, @option{-mtune} supports 2 extra choices for @var{cpu-type}: + +@table @samp +@item generic +Produce code optimized for the most common IA32/@/AMD64/@/EM64T processors. +If you know the CPU on which your code will run, then you should use +the corresponding @option{-mtune} or @option{-march} option instead of +@option{-mtune=generic}. 
But, if you do not know exactly what CPU users +of your application will have, then you should use this option. + +As new processors are deployed in the marketplace, the behavior of this +option will change. Therefore, if you upgrade to a newer version of +GCC, code generation controlled by this option will change to reflect +the processors +that are most common at the time that version of GCC is released. + +There is no @option{-march=generic} option because @option{-march} +indicates the instruction set the compiler can use, and there is no +generic instruction set applicable to all processors. In contrast, +@option{-mtune} indicates the processor (or, in this case, collection of +processors) for which the code is optimized. + +@item intel +Produce code optimized for the most current Intel processors, which are +Haswell and Silvermont for this version of GCC. If you know the CPU +on which your code will run, then you should use the corresponding +@option{-mtune} or @option{-march} option instead of @option{-mtune=intel}. +But, if you want your application performs better on both Haswell and +Silvermont, then you should use this option. + +As new Intel processors are deployed in the marketplace, the behavior of +this option will change. Therefore, if you upgrade to a newer version of +GCC, code generation controlled by this option will change to reflect +the most current Intel processors at the time that version of GCC is +released. + +There is no @option{-march=intel} option because @option{-march} indicates +the instruction set the compiler can use, and there is no common +instruction set applicable to all processors. In contrast, +@option{-mtune} indicates the processor (or, in this case, collection of +processors) for which the code is optimized. +@end table + +@item -mcpu=@var{cpu-type} +@opindex mcpu +A deprecated synonym for @option{-mtune}. + +@item -mfpmath=@var{unit} +@opindex mfpmath +Generate floating-point arithmetic for selected unit @var{unit}. 
The choices +for @var{unit} are: + +@table @samp +@item 387 +Use the standard 387 floating-point coprocessor present on the majority of chips and +emulated otherwise. Code compiled with this option runs almost everywhere. +The temporary results are computed in 80-bit precision instead of the precision +specified by the type, resulting in slightly different results compared to most +of other chips. See @option{-ffloat-store} for more detailed description. + +This is the default choice for x86-32 targets. + +@item sse +Use scalar floating-point instructions present in the SSE instruction set. +This instruction set is supported by Pentium III and newer chips, +and in the AMD line +by Athlon-4, Athlon XP and Athlon MP chips. The earlier version of the SSE +instruction set supports only single-precision arithmetic, thus the double and +extended-precision arithmetic are still done using 387. A later version, present +only in Pentium 4 and AMD x86-64 chips, supports double-precision +arithmetic too. + +For the x86-32 compiler, you must use @option{-march=@var{cpu-type}}, @option{-msse} +or @option{-msse2} switches to enable SSE extensions and make this option +effective. For the x86-64 compiler, these extensions are enabled by default. + +The resulting code should be considerably faster in the majority of cases and avoid +the numerical instability problems of 387 code, but may break some existing +code that expects temporaries to be 80 bits. + +This is the default choice for the x86-64 compiler. + +@item sse,387 +@itemx sse+387 +@itemx both +Attempt to utilize both instruction sets at once. This effectively doubles the +amount of available registers, and on chips with separate execution units for +387 and SSE the execution resources too. Use this option with care, as it is +still experimental, because the GCC register allocator does not model separate +functional units well, resulting in unstable performance. 
+@end table + +@item -masm=@var{dialect} +@opindex masm=@var{dialect} +Output assembly instructions using selected @var{dialect}. Supported +choices are @samp{intel} or @samp{att} (the default). Darwin does +not support @samp{intel}. + +@item -mieee-fp +@itemx -mno-ieee-fp +@opindex mieee-fp +@opindex mno-ieee-fp +Control whether or not the compiler uses IEEE floating-point +comparisons. These correctly handle the case where the result of a +comparison is unordered. + +@item -msoft-float +@opindex msoft-float +Generate output containing library calls for floating point. + +@strong{Warning:} the requisite libraries are not part of GCC@. +Normally the facilities of the machine's usual C compiler are used, but +this can't be done directly in cross-compilation. You must make your +own arrangements to provide suitable library functions for +cross-compilation. + +On machines where a function returns floating-point results in the 80387 +register stack, some floating-point opcodes may be emitted even if +@option{-msoft-float} is used. + +@item -mno-fp-ret-in-387 +@opindex mno-fp-ret-in-387 +Do not use the FPU registers for return values of functions. + +The usual calling convention has functions return values of types +@code{float} and @code{double} in an FPU register, even if there +is no FPU@. The idea is that the operating system should emulate +an FPU@. + +The option @option{-mno-fp-ret-in-387} causes such values to be returned +in ordinary CPU registers instead. + +@item -mno-fancy-math-387 +@opindex mno-fancy-math-387 +Some 387 emulators do not support the @code{sin}, @code{cos} and +@code{sqrt} instructions for the 387. Specify this option to avoid +generating those instructions. This option is the default on FreeBSD, +OpenBSD and NetBSD@. This option is overridden when @option{-march} +indicates that the target CPU always has an FPU and so the +instruction does not need emulation. 
These +instructions are not generated unless you also use the +@option{-funsafe-math-optimizations} switch. + +@item -malign-double +@itemx -mno-align-double +@opindex malign-double +@opindex mno-align-double +Control whether GCC aligns @code{double}, @code{long double}, and +@code{long long} variables on a two-word boundary or a one-word +boundary. Aligning @code{double} variables on a two-word boundary +produces code that runs somewhat faster on a Pentium at the +expense of more memory. + +On x86-64, @option{-malign-double} is enabled by default. + +@strong{Warning:} if you use the @option{-malign-double} switch, +structures containing the above types are aligned differently than +the published application binary interface specifications for the x86-32 +and are not binary compatible with structures in code compiled +without that switch. + +@item -m96bit-long-double +@itemx -m128bit-long-double +@opindex m96bit-long-double +@opindex m128bit-long-double +These switches control the size of @code{long double} type. The x86-32 +application binary interface specifies the size to be 96 bits, +so @option{-m96bit-long-double} is the default in 32-bit mode. + +Modern architectures (Pentium and newer) prefer @code{long double} +to be aligned to an 8- or 16-byte boundary. In arrays or structures +conforming to the ABI, this is not possible. So specifying +@option{-m128bit-long-double} aligns @code{long double} +to a 16-byte boundary by padding the @code{long double} with an additional +32-bit zero. + +In the x86-64 compiler, @option{-m128bit-long-double} is the default choice as +its ABI specifies that @code{long double} is aligned on 16-byte boundary. + +Notice that neither of these options enable any extra precision over the x87 +standard of 80 bits for a @code{long double}. 
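The distinction drawn above between the storage size of @code{long double} and its precision can be observed from C. The following is a minimal sketch with hypothetical helper names. It assumes a binary floating-point format (@code{FLT_RADIX} equal to 2), so that @code{LDBL_MANT_DIG} counts bits.

```c
#include <assert.h>
#include <float.h>
#include <limits.h>
#include <stddef.h>

/* Storage size of long double in bits, padding included.  This is
   the quantity the size options change.  */
size_t long_double_storage_bits(void)
{
    return sizeof(long double) * CHAR_BIT;
}

/* Significand precision in bits.  LDBL_MANT_DIG counts digits in
   base FLT_RADIX, which are bits only when FLT_RADIX is 2.  */
int long_double_precision_bits(void)
{
    assert(FLT_RADIX == 2);
    return LDBL_MANT_DIG;
}
```

If the text above holds for a given toolchain, building this with @option{-m96bit-long-double} and then with @option{-m128bit-long-double} should change only the first value, because the extra bytes are padding rather than additional significand bits.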
+ +@strong{Warning:} if you override the default value for your target ABI, this +changes the size of +structures and arrays containing @code{long double} variables, +as well as modifying the function calling convention for functions taking +@code{long double}. Hence the resulting code is not binary-compatible +with code compiled without that switch. + +@item -mlong-double-64 +@itemx -mlong-double-80 +@itemx -mlong-double-128 +@opindex mlong-double-64 +@opindex mlong-double-80 +@opindex mlong-double-128 +These switches control the size of the @code{long double} type. A size +of 64 bits makes the @code{long double} type equivalent to the @code{double} +type. This is the default for the 32-bit Bionic C library. A size +of 128 bits makes the @code{long double} type equivalent to the +@code{__float128} type. This is the default for the 64-bit Bionic C library. + +@strong{Warning:} if you override the default value for your target ABI, this +changes the size of +structures and arrays containing @code{long double} variables, +as well as modifying the function calling convention for functions taking +@code{long double}. Hence the resulting code is not binary-compatible +with code compiled without that switch. + +@item -malign-data=@var{type} +@opindex malign-data +Control how GCC aligns variables. Supported values for @var{type} are +@samp{compat}, which uses an increased alignment value compatible with GCC 4.8 +and earlier; @samp{abi}, which uses the alignment value specified by the +psABI; and @samp{cacheline}, which uses an increased alignment value to match +the cache line size. @samp{compat} is the default. + +@item -mlarge-data-threshold=@var{threshold} +@opindex mlarge-data-threshold +When @option{-mcmodel=medium} is specified, data objects larger than +@var{threshold} are placed in the large data section. This value must be the +same across all objects linked into the binary, and defaults to 65535. 
+ +@item -mrtd +@opindex mrtd +Use a different function-calling convention, in which functions that +take a fixed number of arguments return with the @code{ret @var{num}} +instruction, which pops their arguments while returning. This saves one +instruction in the caller since there is no need to pop the arguments +there. + +You can specify that an individual function is called with this calling +sequence with the function attribute @code{stdcall}. You can also +override the @option{-mrtd} option by using the function attribute +@code{cdecl}. @xref{Function Attributes}. + +@strong{Warning:} this calling convention is incompatible with the one +normally used on Unix, so you cannot use it if you need to call +libraries compiled with the Unix compiler. + +Also, you must provide function prototypes for all functions that +take variable numbers of arguments (including @code{printf}); +otherwise incorrect code is generated for calls to those +functions. + +In addition, seriously incorrect code results if you call a +function with too many arguments. (Normally, extra arguments are +harmlessly ignored.) + +@item -mregparm=@var{num} +@opindex mregparm +Control how many registers are used to pass integer arguments. By +default, no registers are used to pass arguments, and at most 3 +registers can be used. You can control this behavior for a specific +function by using the function attribute @code{regparm}. +@xref{Function Attributes}. + +@strong{Warning:} if you use this switch, and +@var{num} is nonzero, then you must build all modules with the same +value, including any libraries. This includes the system libraries and +startup modules. + +@item -msseregparm +@opindex msseregparm +Use SSE register passing conventions for float and double arguments +and return values. You can control this behavior for a specific +function by using the function attribute @code{sseregparm}. +@xref{Function Attributes}. 
+ +@strong{Warning:} if you use this switch then you must build all +modules with the same value, including any libraries. This includes +the system libraries and startup modules. + +@item -mvect8-ret-in-mem +@opindex mvect8-ret-in-mem +Return 8-byte vectors in memory instead of MMX registers. This is the +default on Solaris@tie{}8 and 9 and VxWorks to match the ABI of the Sun +Studio compilers until version 12. Later compiler versions (starting +with Studio 12 Update@tie{}1) follow the ABI used by other x86 targets, which +is the default on Solaris@tie{}10 and later. @emph{Only} use this option if +you need to remain compatible with existing code produced by those +previous compiler versions or older versions of GCC@. + +@item -mpc32 +@itemx -mpc64 +@itemx -mpc80 +@opindex mpc32 +@opindex mpc64 +@opindex mpc80 + +Set 80387 floating-point precision to 32, 64 or 80 bits. When @option{-mpc32} +is specified, the significands of results of floating-point operations are +rounded to 24 bits (single precision); @option{-mpc64} rounds the +significands of results of floating-point operations to 53 bits (double +precision) and @option{-mpc80} rounds the significands of results of +floating-point operations to 64 bits (extended double precision), which is +the default. When this option is used, floating-point operations in higher +precisions are not available to the programmer without setting the FPU +control word explicitly. + +Setting the rounding of floating-point operations to less than the default +80 bits can speed some programs by 2% or more. Note that some mathematical +libraries assume that extended-precision (80-bit) floating-point operations +are enabled by default; routines in such libraries could suffer significant +loss of accuracy, typically through so-called ``catastrophic cancellation'', +when this option is used to set the precision to less than extended precision. + +@item -mstackrealign +@opindex mstackrealign +Realign the stack at entry. 
On the x86, the @option{-mstackrealign} +option generates an alternate prologue and epilogue that realigns the +run-time stack if necessary. This supports mixing legacy code that keeps +4-byte stack alignment with modern code that keeps 16-byte stack alignment for +SSE compatibility. See also the attribute @code{force_align_arg_pointer}, +applicable to individual functions. + +@item -mpreferred-stack-boundary=@var{num} +@opindex mpreferred-stack-boundary +Attempt to keep the stack aligned to a boundary of 2 raised to @var{num} +bytes. If @option{-mpreferred-stack-boundary} is not specified, +the default is 4 (16 bytes or 128 bits). + +@strong{Warning:} When generating code for the x86-64 architecture with +SSE extensions disabled, @option{-mpreferred-stack-boundary=3} can be +used to keep the stack aligned to an 8-byte boundary. Since +the x86-64 ABI requires 16-byte stack alignment, this is ABI-incompatible and +intended to be used in a controlled environment where stack space is +an important limitation. This option leads to wrong code when functions +compiled with 16-byte stack alignment (such as functions from a standard +library) are called with a misaligned stack. In this case, SSE +instructions may lead to misaligned memory access traps. In addition, +variable arguments are handled incorrectly for 16-byte aligned +objects (including x87 @code{long double} and @code{__int128}), leading to wrong +results. You must build all modules with +@option{-mpreferred-stack-boundary=3}, including any libraries. This +includes the system libraries and startup modules. + +@item -mincoming-stack-boundary=@var{num} +@opindex mincoming-stack-boundary +Assume the incoming stack is aligned to a boundary of 2 raised to @var{num} +bytes. If @option{-mincoming-stack-boundary} is not specified, +the one specified by @option{-mpreferred-stack-boundary} is used. 
+ +On Pentium and Pentium Pro, @code{double} and @code{long double} values +should be aligned to an 8-byte boundary (see @option{-malign-double}) or +suffer significant run-time performance penalties. On Pentium III, the +Streaming SIMD Extension (SSE) data type @code{__m128} may not work +properly if it is not 16-byte aligned. + +To ensure proper alignment of these values on the stack, the stack boundary +must be as aligned as that required by any value stored on the stack. +Further, every function must be generated such that it keeps the stack +aligned. Thus calling a function compiled with a higher preferred +stack boundary from a function compiled with a lower preferred stack +boundary most likely misaligns the stack. It is recommended that +libraries that use callbacks always use the default setting. + +This extra alignment does consume extra stack space, and generally +increases code size. Code that is sensitive to stack space usage, such +as embedded systems and operating system kernels, may want to reduce the +preferred alignment to @option{-mpreferred-stack-boundary=2}. 
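The @var{num} argument of @option{-mpreferred-stack-boundary} and @option{-mincoming-stack-boundary} is an exponent, not a byte count. As an illustration only, the arithmetic can be sketched in C; both function names below are hypothetical and are not part of GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Byte alignment implied by a stack-boundary value NUM: 2 raised to NUM.
   NUM is limited to 31 here so that the shift stays well defined.  */
unsigned stack_boundary_bytes(unsigned num)
{
    assert(num < 32);
    return 1u << num;
}

/* Hypothetical helper: nonzero if ADDR is a multiple of the boundary
   selected by NUM.  The mask test works because the boundary is a
   power of two.  */
int is_aligned_to_boundary(uintptr_t addr, unsigned num)
{
    uintptr_t mask = (uintptr_t)stack_boundary_bytes(num) - 1u;
    return (addr & mask) == 0;
}
```

With these definitions, @code{stack_boundary_bytes(4)} gives the default 16-byte boundary, and an address such as 24 satisfies a boundary of 3 (8 bytes) but not a boundary of 4, which is the kind of mismatch the paragraphs above warn about.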
+ +@need 200 +@item -mmmx +@opindex mmmx +@need 200 +@itemx -msse +@opindex msse +@need 200 +@itemx -msse2 +@need 200 +@itemx -msse3 +@need 200 +@itemx -mssse3 +@need 200 +@itemx -msse4 +@need 200 +@itemx -msse4a +@need 200 +@itemx -msse4.1 +@need 200 +@itemx -msse4.2 +@need 200 +@itemx -mavx +@opindex mavx +@need 200 +@itemx -mavx2 +@need 200 +@itemx -mavx512f +@need 200 +@itemx -mavx512pf +@need 200 +@itemx -mavx512er +@need 200 +@itemx -mavx512cd +@need 200 +@itemx -msha +@opindex msha +@need 200 +@itemx -maes +@opindex maes +@need 200 +@itemx -mpclmul +@opindex mpclmul +@need 200 +@itemx -mclflushopt +@opindex mclflushopt +@need 200 +@itemx -mfsgsbase +@opindex mfsgsbase +@need 200 +@itemx -mrdrnd +@opindex mrdrnd +@need 200 +@itemx -mf16c +@opindex mf16c +@need 200 +@itemx -mfma +@opindex mfma +@need 200 +@itemx -mfma4 +@need 200 +@itemx -mno-fma4 +@need 200 +@itemx -mprefetchwt1 +@opindex mprefetchwt1 +@need 200 +@itemx -mxop +@opindex mxop +@need 200 +@itemx -mlwp +@opindex mlwp +@need 200 +@itemx -m3dnow +@opindex m3dnow +@need 200 +@itemx -mpopcnt +@opindex mpopcnt +@need 200 +@itemx -mabm +@opindex mabm +@need 200 +@itemx -mbmi +@opindex mbmi +@need 200 +@itemx -mbmi2 +@need 200 +@itemx -mlzcnt +@opindex mlzcnt +@need 200 +@itemx -mfxsr +@opindex mfxsr +@need 200 +@itemx -mxsave +@opindex mxsave +@need 200 +@itemx -mxsaveopt +@opindex mxsaveopt +@need 200 +@itemx -mxsavec +@opindex mxsavec +@need 200 +@itemx -mxsaves +@opindex mxsaves +@need 200 +@itemx -mrtm +@opindex mrtm +@need 200 +@itemx -mtbm +@opindex mtbm +@need 200 +@itemx -mmpx +@opindex mmpx +These switches enable the use of instructions in the MMX, SSE, +SSE2, SSE3, SSSE3, SSE4.1, AVX, AVX2, AVX512F, AVX512PF, AVX512ER, AVX512CD, +SHA, AES, PCLMUL, FSGSBASE, RDRND, F16C, FMA, SSE4A, FMA4, XOP, LWP, ABM, +BMI, BMI2, FXSR, XSAVE, XSAVEOPT, LZCNT, RTM, MPX or 3DNow!@: +extended instruction sets. Each has a corresponding @option{-mno-} option +to disable use of these instructions. 
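Source code can check at compile time which of these switches are in effect. The sketch below is an illustration, not text from the manual: it assumes the feature macros @code{__SSE2__} and @code{__AVX__}, which GCC predefines when the corresponding instruction set is enabled but which are not mentioned in the list above, and the two function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: 1 if this translation unit was compiled with
   SSE2 enabled (for example by -msse2), 0 otherwise.  */
int built_with_sse2(void)
{
#ifdef __SSE2__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: 1 if this translation unit was compiled with
   AVX enabled (for example by -mavx), 0 otherwise.  */
int built_with_avx(void)
{
#ifdef __AVX__
    return 1;
#else
    return 0;
#endif
}
```

Because the result is fixed when the file is compiled, these helpers describe the build flags of that one file and say nothing about the CPU the program eventually runs on.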
+ +These extensions are also available as built-in functions: see +@ref{x86 Built-in Functions}, for details of the functions enabled and +disabled by these switches. + +To generate SSE/SSE2 instructions automatically from floating-point +code (as opposed to 387 instructions), see @option{-mfpmath=sse}. + +GCC suppresses SSEx instructions when @option{-mavx} is used. Instead, it +generates new AVX instructions or AVX equivalents for all SSEx instructions +when needed. + +These options enable GCC to use these extended instructions in +generated code, even without @option{-mfpmath=sse}. Applications that +perform run-time CPU detection must compile separate files for each +supported architecture, using the appropriate flags. In particular, +the file containing the CPU detection code should be compiled without +these options. + +@item -mdump-tune-features +@opindex mdump-tune-features +This option instructs GCC to dump the names of the x86 performance +tuning features and default settings. The names can be used in +@option{-mtune-ctrl=@var{feature-list}}. + +@item -mtune-ctrl=@var{feature-list} +@opindex mtune-ctrl=@var{feature-list} +This option is used for fine-grained control of x86 code generation features. +@var{feature-list} is a comma-separated list of @var{feature} names. See also +@option{-mdump-tune-features}. When specified, the @var{feature} is turned +on if it is not preceded by @samp{^}; otherwise, it is turned off. +@option{-mtune-ctrl=@var{feature-list}} is intended to be used by GCC +developers. Using it may lead to code paths not covered by testing and can +potentially result in compiler ICEs or runtime errors. + +@item -mno-default +@opindex mno-default +This option instructs GCC to turn off all tunable features. See also +@option{-mtune-ctrl=@var{feature-list}} and @option{-mdump-tune-features}. + +@item -mcld +@opindex mcld +This option instructs GCC to emit a @code{cld} instruction in the prologue +of functions that use string instructions. 
String instructions depend on +the DF flag to select between autoincrement or autodecrement mode. While the +ABI specifies the DF flag to be cleared on function entry, some operating +systems violate this specification by not clearing the DF flag in their +exception dispatchers. The exception handler can be invoked with the DF flag +set, which leads to wrong direction mode when string instructions are used. +This option can be enabled by default on 32-bit x86 targets by configuring +GCC with the @option{--enable-cld} configure option. Generation of @code{cld} +instructions can be suppressed with the @option{-mno-cld} compiler option +in this case. + +@item -mvzeroupper +@opindex mvzeroupper +This option instructs GCC to emit a @code{vzeroupper} instruction +before a transfer of control flow out of the function to minimize +the AVX to SSE transition penalty as well as remove unnecessary @code{zeroupper} +intrinsics. + +@item -mprefer-avx128 +@opindex mprefer-avx128 +This option instructs GCC to use 128-bit AVX instructions instead of +256-bit AVX instructions in the auto-vectorizer. + +@item -mcx16 +@opindex mcx16 +This option enables GCC to generate @code{CMPXCHG16B} instructions. +@code{CMPXCHG16B} allows for atomic operations on 128-bit double quadword +(or oword) data types. +This is useful for high-resolution counters that can be updated +by multiple processors (or cores). This instruction is generated as part of +atomic built-in functions: see @ref{__sync Builtins} or +@ref{__atomic Builtins} for details. + +@item -msahf +@opindex msahf +This option enables generation of @code{SAHF} instructions in 64-bit code. +Early Intel Pentium 4 CPUs with Intel 64 support, +prior to the introduction of Pentium 4 G1 step in December 2005, +lacked the @code{LAHF} and @code{SAHF} instructions +which are supported by AMD64. +These are load and store instructions, respectively, for certain status flags. 
+In 64-bit mode, the @code{SAHF} instruction is used to optimize @code{fmod}, +@code{drem}, and @code{remainder} built-in functions; +see @ref{Other Builtins} for details. + +@item -mmovbe +@opindex mmovbe +This option enables use of the @code{movbe} instruction to implement +@code{__builtin_bswap32} and @code{__builtin_bswap64}. + +@item -mcrc32 +@opindex mcrc32 +This option enables built-in functions @code{__builtin_ia32_crc32qi}, +@code{__builtin_ia32_crc32hi}, @code{__builtin_ia32_crc32si} and +@code{__builtin_ia32_crc32di} to generate the @code{crc32} machine instruction. + +@item -mrecip +@opindex mrecip +This option enables use of @code{RCPSS} and @code{RSQRTSS} instructions +(and their vectorized variants @code{RCPPS} and @code{RSQRTPS}) +with an additional Newton-Raphson step +to increase precision instead of @code{DIVSS} and @code{SQRTSS} +(and their vectorized +variants) for single-precision floating-point arguments. These instructions +are generated only when @option{-funsafe-math-optimizations} is enabled +together with @option{-finite-math-only} and @option{-fno-trapping-math}. +Note that while the throughput of the sequence is higher than the throughput +of the non-reciprocal instruction, the precision of the sequence can be +decreased by up to 2 ulp (i.e. the inverse of 1.0 equals 0.99999994). + +Note that GCC implements @code{1.0f/sqrtf(@var{x})} in terms of @code{RSQRTSS} +(or @code{RSQRTPS}) already with @option{-ffast-math} (or the above option +combination), and doesn't need @option{-mrecip}. + +Also note that GCC emits the above sequence with additional Newton-Raphson step +for vectorized single-float division and vectorized @code{sqrtf(@var{x})} +already with @option{-ffast-math} (or the above option combination), and +doesn't need @option{-mrecip}. + +@item -mrecip=@var{opt} +@opindex mrecip=opt +This option controls which reciprocal estimate instructions +may be used. 
@var{opt} is a comma-separated list of options, which may +be preceded by a @samp{!} to invert the option: + +@table @samp +@item all +Enable all estimate instructions. + +@item default +Enable the default instructions, equivalent to @option{-mrecip}. + +@item none +Disable all estimate instructions, equivalent to @option{-mno-recip}. + +@item div +Enable the approximation for scalar division. + +@item vec-div +Enable the approximation for vectorized division. + +@item sqrt +Enable the approximation for scalar square root. + +@item vec-sqrt +Enable the approximation for vectorized square root. +@end table + +So, for example, @option{-mrecip=all,!sqrt} enables +all of the reciprocal approximations, except for square root. + +@item -mveclibabi=@var{type} +@opindex mveclibabi +Specifies the ABI type to use for vectorizing intrinsics using an +external library. Supported values for @var{type} are @samp{svml} +for the Intel short +vector math library and @samp{acml} for the AMD math core library. +To use this option, both @option{-ftree-vectorize} and +@option{-funsafe-math-optimizations} have to be enabled, and an SVML or ACML +ABI-compatible library must be specified at link time. 
+ +GCC currently emits calls to @code{vmldExp2}, +@code{vmldLn2}, @code{vmldLog102}, @code{vmldLog102}, @code{vmldPow2}, +@code{vmldTanh2}, @code{vmldTan2}, @code{vmldAtan2}, @code{vmldAtanh2}, +@code{vmldCbrt2}, @code{vmldSinh2}, @code{vmldSin2}, @code{vmldAsinh2}, +@code{vmldAsin2}, @code{vmldCosh2}, @code{vmldCos2}, @code{vmldAcosh2}, +@code{vmldAcos2}, @code{vmlsExp4}, @code{vmlsLn4}, @code{vmlsLog104}, +@code{vmlsLog104}, @code{vmlsPow4}, @code{vmlsTanh4}, @code{vmlsTan4}, +@code{vmlsAtan4}, @code{vmlsAtanh4}, @code{vmlsCbrt4}, @code{vmlsSinh4}, +@code{vmlsSin4}, @code{vmlsAsinh4}, @code{vmlsAsin4}, @code{vmlsCosh4}, +@code{vmlsCos4}, @code{vmlsAcosh4} and @code{vmlsAcos4} for corresponding +function type when @option{-mveclibabi=svml} is used, and @code{__vrd2_sin}, +@code{__vrd2_cos}, @code{__vrd2_exp}, @code{__vrd2_log}, @code{__vrd2_log2}, +@code{__vrd2_log10}, @code{__vrs4_sinf}, @code{__vrs4_cosf}, +@code{__vrs4_expf}, @code{__vrs4_logf}, @code{__vrs4_log2f}, +@code{__vrs4_log10f} and @code{__vrs4_powf} for the corresponding function type +when @option{-mveclibabi=acml} is used. + +@item -mabi=@var{name} +@opindex mabi +Generate code for the specified calling convention. Permissible values +are @samp{sysv} for the ABI used on GNU/Linux and other systems, and +@samp{ms} for the Microsoft ABI. The default is to use the Microsoft +ABI when targeting Microsoft Windows and the SysV ABI on all other systems. +You can control this behavior for specific functions by +using the function attributes @code{ms_abi} and @code{sysv_abi}. +@xref{Function Attributes}. + +@item -mtls-dialect=@var{type} +@opindex mtls-dialect +Generate code to access thread-local storage using the @samp{gnu} or +@samp{gnu2} conventions. @samp{gnu} is the conservative default; +@samp{gnu2} is more efficient, but it may add compile- and run-time +requirements that cannot be satisfied on all systems. 
+ +@item -mpush-args +@itemx -mno-push-args +@opindex mpush-args +@opindex mno-push-args +Use PUSH operations to store outgoing parameters. This method is shorter +and usually as fast as the method using SUB/MOV operations and is enabled +by default. In some cases disabling it may improve performance because of +improved scheduling and reduced dependencies. + +@item -maccumulate-outgoing-args +@opindex maccumulate-outgoing-args +If enabled, the maximum amount of space required for outgoing arguments is +computed in the function prologue. This is faster on most modern CPUs +because of reduced dependencies, improved scheduling and reduced stack usage +when the preferred stack boundary is not equal to 2. The drawback is a notable +increase in code size. This switch implies @option{-mno-push-args}. + +@item -mthreads +@opindex mthreads +Support thread-safe exception handling on MinGW. Programs that rely +on thread-safe exception handling must compile and link all code with the +@option{-mthreads} option. When compiling, @option{-mthreads} defines +@option{-D_MT}; when linking, it links in a special thread helper library +@option{-lmingwthrd} which cleans up per-thread exception-handling data. + +@item -mno-align-stringops +@opindex mno-align-stringops +Do not align the destination of inlined string operations. This switch reduces +code size and improves performance in case the destination is already aligned, +but GCC doesn't know about it. + +@item -minline-all-stringops +@opindex minline-all-stringops +By default GCC inlines string operations only when the destination is +known to be aligned to at least a 4-byte boundary. +This option enables more inlining and increases code +size, but may improve performance of code that depends on fast +@code{memcpy}, @code{strlen}, +and @code{memset} for short lengths. 
+ +@item -minline-stringops-dynamically +@opindex minline-stringops-dynamically +For string operations of unknown size, use run-time checks with +inline code for small blocks and a library call for large blocks. + +@item -mstringop-strategy=@var{alg} +@opindex mstringop-strategy=@var{alg} +Override the internal decision heuristic for the particular algorithm to use +for inlining string operations. The allowed values for @var{alg} are: + +@table @samp +@item rep_byte +@itemx rep_4byte +@itemx rep_8byte +Expand using i386 @code{rep} prefix of the specified size. + +@item byte_loop +@itemx loop +@itemx unrolled_loop +Expand into an inline loop. + +@item libcall +Always use a library call. +@end table + +@item -mmemcpy-strategy=@var{strategy} +@opindex mmemcpy-strategy=@var{strategy} +Override the internal decision heuristic to decide if @code{__builtin_memcpy} +should be inlined and what inline algorithm to use when the expected size +of the copy operation is known. @var{strategy} +is a comma-separated list of @var{alg}:@var{max_size}:@var{dest_align} triplets. +@var{alg} is specified in @option{-mstringop-strategy}, @var{max_size} specifies +the max byte size with which inline algorithm @var{alg} is allowed. For the last +triplet, the @var{max_size} must be @code{-1}. The @var{max_size} of the triplets +in the list must be specified in increasing order. The minimal byte size for +@var{alg} is @code{0} for the first triplet and @code{@var{max_size} + 1} of the +preceding range. + +@item -mmemset-strategy=@var{strategy} +@opindex mmemset-strategy=@var{strategy} +The option is similar to @option{-mmemcpy-strategy=} except that it is to control +@code{__builtin_memset} expansion. + +@item -momit-leaf-frame-pointer +@opindex momit-leaf-frame-pointer +Don't keep the frame pointer in a register for leaf functions. This +avoids the instructions to save, set up, and restore frame pointers and +makes an extra register available in leaf functions. 
The option +@option{-fomit-leaf-frame-pointer} removes the frame pointer for leaf functions, +which might make debugging harder. + +@item -mtls-direct-seg-refs +@itemx -mno-tls-direct-seg-refs +@opindex mtls-direct-seg-refs +Controls whether TLS variables may be accessed with offsets from the +TLS segment register (@code{%gs} for 32-bit, @code{%fs} for 64-bit), +or whether the thread base pointer must be added. Whether or not this +is valid depends on the operating system, and whether it maps the +segment to cover the entire TLS area. + +For systems that use the GNU C Library, the default is on. + +@item -msse2avx +@itemx -mno-sse2avx +@opindex msse2avx +Specify that the assembler should encode SSE instructions with the VEX +prefix. The option @option{-mavx} turns this on by default. + +@item -mfentry +@itemx -mno-fentry +@opindex mfentry +If profiling is active (@option{-pg}), put the profiling +counter call before the prologue. +Note: On x86 architectures the attribute @code{ms_hook_prologue} +isn't possible at the moment for @option{-mfentry} and @option{-pg}. + +@item -mrecord-mcount +@itemx -mno-record-mcount +@opindex mrecord-mcount +If profiling is active (@option{-pg}), generate a __mcount_loc section +that contains pointers to each profiling call. This is useful for +automatically patching calls in and out. + +@item -mnop-mcount +@itemx -mno-nop-mcount +@opindex mnop-mcount +If profiling is active (@option{-pg}), generate the calls to +the profiling functions as nops. This is useful when they +should be patched in later dynamically. This is likely only +useful together with @option{-mrecord-mcount}. + +@item -mskip-rax-setup +@itemx -mno-skip-rax-setup +@opindex mskip-rax-setup +When generating code for the x86-64 architecture with SSE extensions +disabled, @option{-mskip-rax-setup} can be used to skip setting up the RAX +register when there are no variable arguments passed in vector registers. 
+ +@strong{Warning:} Since the RAX register is used to avoid unnecessarily +saving vector registers on the stack when passing variable arguments, the +effect of this option is that callees may waste some stack space, +misbehave or jump to a random location. GCC 4.4 or newer does not have +those issues, regardless of the RAX register value. + +@item -m8bit-idiv +@itemx -mno-8bit-idiv +@opindex m8bit-idiv +On some processors, like Intel Atom, 8-bit unsigned integer divide is +much faster than 32-bit/64-bit integer divide. This option generates a +run-time check. If both dividend and divisor are within the range 0 +to 255, 8-bit unsigned integer divide is used instead of +32-bit/64-bit integer divide. + +@item -mavx256-split-unaligned-load +@itemx -mavx256-split-unaligned-store +@opindex mavx256-split-unaligned-load +@opindex mavx256-split-unaligned-store +Split 32-byte AVX unaligned loads and stores. + +@item -mstack-protector-guard=@var{guard} +@opindex mstack-protector-guard=@var{guard} +Generate stack protection code using a canary at @var{guard}. Supported +locations are @samp{global} for a global canary or @samp{tls} for a per-thread +canary in the TLS block (the default). This option has an effect only when +@option{-fstack-protector} or @option{-fstack-protector-all} is specified. + +@end table + +These @samp{-m} switches are supported in addition to the above +on x86-64 processors in 64-bit environments. + +@table @gcctabopt +@item -m32 +@itemx -m64 +@itemx -mx32 +@itemx -m16 +@opindex m32 +@opindex m64 +@opindex mx32 +@opindex m16 +Generate code for a 16-bit, 32-bit or 64-bit environment. +The @option{-m32} option sets @code{int}, @code{long}, and pointer types +to 32 bits, and +generates code that runs on any i386 system. + +The @option{-m64} option sets @code{int} to 32 bits and @code{long} and pointer +types to 64 bits, and generates code for the x86-64 architecture. +For Darwin only the @option{-m64} option also turns off the @option{-fno-pic} +and @option{-mdynamic-no-pic} options. 
+ +The @option{-mx32} option sets @code{int}, @code{long}, and pointer types +to 32 bits, and +generates code for the x86-64 architecture. + +The @option{-m16} option is the same as @option{-m32}, except that +it outputs the @code{.code16gcc} assembly directive at the beginning of +the assembly output so that the binary can run in 16-bit mode. + +@item -mno-red-zone +@opindex mno-red-zone +Do not use a so-called ``red zone'' for x86-64 code. The red zone is mandated +by the x86-64 ABI; it is a 128-byte area beyond the location of the +stack pointer that is not modified by signal or interrupt handlers +and therefore can be used for temporary data without adjusting the stack +pointer. The flag @option{-mno-red-zone} disables this red zone. + +@item -mcmodel=small +@opindex mcmodel=small +Generate code for the small code model: the program and its symbols must +be linked in the lower 2 GB of the address space. Pointers are 64 bits. +Programs can be statically or dynamically linked. This is the default +code model. + +@item -mcmodel=kernel +@opindex mcmodel=kernel +Generate code for the kernel code model. The kernel runs in the +negative 2 GB of the address space. +This model has to be used for Linux kernel code. + +@item -mcmodel=medium +@opindex mcmodel=medium +Generate code for the medium model: the program is linked in the lower 2 +GB of the address space. Small symbols are also placed there. Symbols +with sizes larger than @option{-mlarge-data-threshold} are put into +large data or BSS sections and can be located above 2 GB. Programs can +be statically or dynamically linked. + +@item -mcmodel=large +@opindex mcmodel=large +Generate code for the large model. This model makes no assumptions +about addresses and sizes of sections. + +@item -maddress-mode=long +@opindex maddress-mode=long +Generate code for long address mode. This is only supported for 64-bit +and x32 environments. It is the default address mode for 64-bit +environments. 
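The type widths that @option{-m32}, @option{-m64} and @option{-mx32} select, as described above, can be checked from a C program. The following is a minimal sketch under stated assumptions: the enumeration and function names are hypothetical, and the labels ILP32 and LP64 are the conventional names for these width combinations rather than terms used in this section.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical classification of the widths described for
   -m32/-mx32 (all 32 bits) and -m64 (32-bit int, 64-bit long
   and pointers).  */
enum data_model { MODEL_ILP32, MODEL_LP64, MODEL_OTHER };

/* Classify a combination of widths given in bits.  */
enum data_model classify_data_model(size_t int_bits, size_t long_bits,
                                    size_t pointer_bits)
{
    if (int_bits == 32 && long_bits == 32 && pointer_bits == 32)
        return MODEL_ILP32;
    if (int_bits == 32 && long_bits == 64 && pointer_bits == 64)
        return MODEL_LP64;
    return MODEL_OTHER;
}

/* Classify the widths this translation unit was compiled with.  */
enum data_model current_data_model(void)
{
    return classify_data_model(sizeof(int) * CHAR_BIT,
                               sizeof(long) * CHAR_BIT,
                               sizeof(void *) * CHAR_BIT);
}
```

Note that the widths alone cannot tell @option{-m32} from @option{-mx32}, since both set all three types to 32 bits; the two differ in the architecture the code is generated for.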
+ +@item -maddress-mode=short +@opindex maddress-mode=short +Generate code for short address mode. This is only supported for 32-bit +and x32 environments. It is the default address mode for 32-bit and +x32 environments. +@end table + +@node x86 Windows Options +@subsection x86 Windows Options +@cindex x86 Windows Options +@cindex Windows Options for x86 + +These additional options are available for Microsoft Windows targets: + +@table @gcctabopt +@item -mconsole +@opindex mconsole +This option +specifies that a console application is to be generated, by +instructing the linker to set the PE header subsystem type +required for console applications. +This option is available for Cygwin and MinGW targets and is +enabled by default on those targets. + +@item -mdll +@opindex mdll +This option is available for Cygwin and MinGW targets. It +specifies that a DLL---a dynamic link library---is to be +generated, enabling the selection of the required runtime +startup object and entry point. + +@item -mnop-fun-dllimport +@opindex mnop-fun-dllimport +This option is available for Cygwin and MinGW targets. It +specifies that the @code{dllimport} attribute should be ignored. + +@item -mthread +@opindex mthread +This option is available for MinGW targets. It specifies +that MinGW-specific thread support is to be used. + +@item -municode +@opindex municode +This option is available for MinGW-w64 targets. It causes +the @code{UNICODE} preprocessor macro to be predefined, and +chooses Unicode-capable runtime startup code. + +@item -mwin32 +@opindex mwin32 +This option is available for Cygwin and MinGW targets. It +specifies that the typical Microsoft Windows predefined macros are to +be set in the pre-processor, but does not influence the choice +of runtime library/startup code. + +@item -mwindows +@opindex mwindows +This option is available for Cygwin and MinGW targets. 
It +specifies that a GUI application is to be generated by +instructing the linker to set the PE header subsystem type +appropriately. + +@item -fno-set-stack-executable +@opindex fno-set-stack-executable +This option is available for MinGW targets. It specifies that +the executable flag for the stack used by nested functions isn't +set. This is necessary for binaries running in kernel mode of +Microsoft Windows, as there the User32 API, which is used to set executable +privileges, isn't available. + +@item -fwritable-relocated-rdata +@opindex fno-writable-relocated-rdata +This option is available for MinGW and Cygwin targets. It specifies +that relocated data in a read-only section is put into the .data +section. This is necessary for older runtimes not supporting +modification of .rdata sections for pseudo-relocation. + +@item -mpe-aligned-commons +@opindex mpe-aligned-commons +This option is available for Cygwin and MinGW targets. It +specifies that the GNU extension to the PE file format that +permits the correct alignment of COMMON variables should be +used when generating code. It is enabled by default if +GCC detects that the target assembler found during configuration +supports the feature. +@end table + +See also under @ref{x86 Options} for standard options. + @node Xstormy16 Options @subsection Xstormy16 Options @cindex Xstormy16 Options diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 03faa12..f2c25c2 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -1695,6 +1695,7 @@ constraints that aren't. The compiler source file mentioned in the table heading for each architecture is the definitive reference for the meanings of that architecture's constraints. +@c Please keep this table alphabetized by target! @table @emph @item AArch64 family---@file{config/aarch64/constraints.md} @table @code @@ -1931,6 +1932,157 @@ A floating point constant 0.0 A memory address based on Y or Z pointer with displacement. 
@end table +@item Blackfin family---@file{config/bfin/constraints.md} +@table @code +@item a +P register + +@item d +D register + +@item z +A call clobbered P register. + +@item q@var{n} +A single register. If @var{n} is in the range 0 to 7, the corresponding D +register. If it is @code{A}, then the register P0. + +@item D +Even-numbered D register + +@item W +Odd-numbered D register + +@item e +Accumulator register. + +@item A +Even-numbered accumulator register. + +@item B +Odd-numbered accumulator register. + +@item b +I register + +@item v +B register + +@item f +M register + +@item c +Registers used for circular buffering, i.e. I, B, or L registers. + +@item C +The CC register. + +@item t +LT0 or LT1. + +@item k +LC0 or LC1. + +@item u +LB0 or LB1. + +@item x +Any D, P, B, M, I or L register. + +@item y +Additional registers typically used only in prologues and epilogues: RETS, +RETN, RETI, RETX, RETE, ASTAT, SEQSTAT and USP. + +@item w +Any register except accumulators or CC. + +@item Ksh +Signed 16 bit integer (in the range @minus{}32768 to 32767) + +@item Kuh +Unsigned 16 bit integer (in the range 0 to 65535) + +@item Ks7 +Signed 7 bit integer (in the range @minus{}64 to 63) + +@item Ku7 +Unsigned 7 bit integer (in the range 0 to 127) + +@item Ku5 +Unsigned 5 bit integer (in the range 0 to 31) + +@item Ks4 +Signed 4 bit integer (in the range @minus{}8 to 7) + +@item Ks3 +Signed 3 bit integer (in the range @minus{}3 to 4) + +@item Ku3 +Unsigned 3 bit integer (in the range 0 to 7) + +@item P@var{n} +Constant @var{n}, where @var{n} is a single-digit constant in the range 0 to 4. + +@item PA +An integer equal to one of the MACFLAG_XXX constants that is suitable for +use with either accumulator. + +@item PB +An integer equal to one of the MACFLAG_XXX constants that is suitable for +use only with accumulator A1. + +@item M1 +Constant 255. + +@item M2 +Constant 65535. + +@item J +An integer constant with exactly a single bit set. 
+ +@item L +An integer constant with all bits set except exactly one. + +@item H + +@item Q +Any SYMBOL_REF. +@end table + +@item CR16 Architecture---@file{config/cr16/cr16.h} +@table @code + +@item b +Registers from r0 to r14 (registers without stack pointer) + +@item t +Register from r0 to r11 (all 16-bit registers) + +@item p +Register from r12 to r15 (all 32-bit registers) + +@item I +Signed constant that fits in 4 bits + +@item J +Signed constant that fits in 5 bits + +@item K +Signed constant that fits in 6 bits + +@item L +Unsigned constant that fits in 4 bits + +@item M +Signed constant that fits in 32 bits + +@item N +Check for 64 bits wide constants for add/sub instructions + +@item G +Floating point constant that is legal for store immediate +@end table + @item Epiphany---@file{config/epiphany/constraints.md} @table @code @item U16 @@ -2002,38 +2154,97 @@ Matches control register values to switch fp mode, which are encapsulated in @code{UNSPEC_FP_MODE}. @end table -@item CR16 Architecture---@file{config/cr16/cr16.h} +@item FRV---@file{config/frv/frv.h} @table @code +@item a +Register in the class @code{ACC_REGS} (@code{acc0} to @code{acc7}). @item b -Registers from r0 to r14 (registers without stack pointer) +Register in the class @code{EVEN_ACC_REGS} (@code{acc0} to @code{acc7}). + +@item c +Register in the class @code{CC_REGS} (@code{fcc0} to @code{fcc3} and +@code{icc0} to @code{icc3}). + +@item d +Register in the class @code{GPR_REGS} (@code{gr0} to @code{gr63}). + +@item e +Register in the class @code{EVEN_REGS} (@code{gr0} to @code{gr63}). +Odd registers are excluded not in the class but through the use of a machine +mode larger than 4 bytes. + +@item f +Register in the class @code{FPR_REGS} (@code{fr0} to @code{fr63}). + +@item h +Register in the class @code{FEVEN_REGS} (@code{fr0} to @code{fr63}). +Odd registers are excluded not in the class but through the use of a machine +mode larger than 4 bytes. 
+ +@item l +Register in the class @code{LR_REG} (the @code{lr} register). + +@item q +Register in the class @code{QUAD_REGS} (@code{gr2} to @code{gr63}). +Register numbers not divisible by 4 are excluded not in the class but through +the use of a machine mode larger than 8 bytes. @item t -Register from r0 to r11 (all 16-bit registers) +Register in the class @code{ICC_REGS} (@code{icc0} to @code{icc3}). -@item p -Register from r12 to r15 (all 32-bit registers) +@item u +Register in the class @code{FCC_REGS} (@code{fcc0} to @code{fcc3}). + +@item v +Register in the class @code{ICR_REGS} (@code{cc4} to @code{cc7}). + +@item w +Register in the class @code{FCR_REGS} (@code{cc0} to @code{cc3}). + +@item x +Register in the class @code{QUAD_FPR_REGS} (@code{fr0} to @code{fr63}). +Register numbers not divisible by 4 are excluded not in the class but through +the use of a machine mode larger than 8 bytes. + +@item z +Register in the class @code{SPR_REGS} (@code{lcr} and @code{lr}). + +@item A +Register in the class @code{QUAD_ACC_REGS} (@code{acc0} to @code{acc7}). + +@item B +Register in the class @code{ACCG_REGS} (@code{accg0} to @code{accg7}). + +@item C +Register in the class @code{CR_REGS} (@code{cc0} to @code{cc7}). + +@item G +Floating point constant zero @item I -Signed constant that fits in 4 bits +6-bit signed integer constant @item J -Signed constant that fits in 5 bits - -@item K -Signed constant that fits in 6 bits +10-bit signed integer constant @item L -Unsigned constant that fits in 4 bits +16-bit signed integer constant @item M -Signed constant that fits in 32 bits +16-bit unsigned integer constant @item N -Check for 64 bits wide constants for add/sub instructions +12-bit signed integer constant that is negative---i.e.@: in the +range of @minus{}2048 to @minus{}1 + +@item O +Constant zero + +@item P +12-bit signed integer constant that is greater than zero---i.e.@: in the +range of 1 to 2047. 
-@item G -Floating point constant that is legal for store immediate @end table @item Hewlett-Packard PA-RISC---@file{config/pa/pa.h} @@ -2107,343 +2318,6 @@ A memory operand for floating-point loads and stores A register indirect memory operand @end table -@item PowerPC and IBM RS6000---@file{config/rs6000/constraints.md} -@table @code -@item b -Address base register - -@item d -Floating point register (containing 64-bit value) - -@item f -Floating point register (containing 32-bit value) - -@item v -Altivec vector register - -@item wa -Any VSX register if the -mvsx option was used or NO_REGS. - -@item wd -VSX vector register to hold vector double data or NO_REGS. - -@item wf -VSX vector register to hold vector float data or NO_REGS. - -@item wg -If @option{-mmfpgpr} was used, a floating point register or NO_REGS. - -@item wh -Floating point register if direct moves are available, or NO_REGS. - -@item wi -FP or VSX register to hold 64-bit integers for VSX insns or NO_REGS. - -@item wj -FP or VSX register to hold 64-bit integers for direct moves or NO_REGS. - -@item wk -FP or VSX register to hold 64-bit doubles for direct moves or NO_REGS. - -@item wl -Floating point register if the LFIWAX instruction is enabled or NO_REGS. - -@item wm -VSX register if direct move instructions are enabled, or NO_REGS. - -@item wn -No register (NO_REGS). - -@item wr -General purpose register if 64-bit instructions are enabled or NO_REGS. - -@item ws -VSX vector register to hold scalar double values or NO_REGS. - -@item wt -VSX vector register to hold 128 bit integer or NO_REGS. - -@item wu -Altivec register to use for float/32-bit int loads/stores or NO_REGS. - -@item wv -Altivec register to use for double loads/stores or NO_REGS. - -@item ww -FP or VSX register to perform float operations under @option{-mvsx} or NO_REGS. - -@item wx -Floating point register if the STFIWX instruction is enabled or NO_REGS. - -@item wy -FP or VSX register to perform ISA 2.07 float ops or NO_REGS. 
- -@item wz -Floating point register if the LFIWZX instruction is enabled or NO_REGS. - -@item wD -Int constant that is the element number of the 64-bit scalar in a vector. - -@item wQ -A memory address that will work with the @code{lq} and @code{stq} -instructions. - -@item h -@samp{MQ}, @samp{CTR}, or @samp{LINK} register - -@item q -@samp{MQ} register - -@item c -@samp{CTR} register - -@item l -@samp{LINK} register - -@item x -@samp{CR} register (condition register) number 0 - -@item y -@samp{CR} register (condition register) - -@item z -@samp{XER[CA]} carry bit (part of the XER register) - -@item I -Signed 16-bit constant - -@item J -Unsigned 16-bit constant shifted left 16 bits (use @samp{L} instead for -@code{SImode} constants) - -@item K -Unsigned 16-bit constant - -@item L -Signed 16-bit constant shifted left 16 bits - -@item M -Constant larger than 31 - -@item N -Exact power of 2 - -@item O -Zero - -@item P -Constant whose negation is a signed 16-bit constant - -@item G -Floating point constant that can be loaded into a register with one -instruction per word - -@item H -Integer/Floating point constant that can be loaded into a register using -three instructions - -@item m -Memory operand. -Normally, @code{m} does not allow addresses that update the base register. -If @samp{<} or @samp{>} constraint is also used, they are allowed and -therefore on PowerPC targets in that case it is only safe -to use @samp{m<>} in an @code{asm} statement if that @code{asm} statement -accesses the operand exactly once. The @code{asm} statement must also -use @samp{%U@var{<opno>}} as a placeholder for the ``update'' flag in the -corresponding load or store instruction. For example: - -@smallexample -asm ("st%U0 %1,%0" : "=m<>" (mem) : "r" (val)); -@end smallexample - -is correct but: - -@smallexample -asm ("st %1,%0" : "=m<>" (mem) : "r" (val)); -@end smallexample - -is not. 
- -@item es -A ``stable'' memory operand; that is, one which does not include any -automodification of the base register. This used to be useful when -@samp{m} allowed automodification of the base register, but as those are now only -allowed when @samp{<} or @samp{>} is used, @samp{es} is basically the same -as @samp{m} without @samp{<} and @samp{>}. - -@item Q -Memory operand that is an offset from a register (it is usually better -to use @samp{m} or @samp{es} in @code{asm} statements) - -@item Z -Memory operand that is an indexed or indirect from a register (it is -usually better to use @samp{m} or @samp{es} in @code{asm} statements) - -@item R -AIX TOC entry - -@item a -Address operand that is an indexed or indirect from a register (@samp{p} is -preferable for @code{asm} statements) - -@item S -Constant suitable as a 64-bit mask operand - -@item T -Constant suitable as a 32-bit mask operand - -@item U -System V Release 4 small data area reference - -@item t -AND masks that can be performed by two rldic@{l, r@} instructions - -@item W -Vector constant that does not require memory - -@item j -Vector constant that is all zeros. - -@end table - -@item x86 family---@file{config/i386/constraints.md} -@table @code -@item R -Legacy register---the eight integer registers available on all -i386 processors (@code{a}, @code{b}, @code{c}, @code{d}, -@code{si}, @code{di}, @code{bp}, @code{sp}). - -@item q -Any register accessible as @code{@var{r}l}. In 32-bit mode, @code{a}, -@code{b}, @code{c}, and @code{d}; in 64-bit mode, any integer register. - -@item Q -Any register accessible as @code{@var{r}h}: @code{a}, @code{b}, -@code{c}, and @code{d}. - -@ifset INTERNALS -@item l -Any register that can be used as the index in a base+index memory -access: that is, any general register except the stack pointer. -@end ifset - -@item a -The @code{a} register. - -@item b -The @code{b} register. - -@item c -The @code{c} register. - -@item d -The @code{d} register. 
- -@item S -The @code{si} register. - -@item D -The @code{di} register. - -@item A -The @code{a} and @code{d} registers. This class is used for instructions -that return double word results in the @code{ax:dx} register pair. Single -word values will be allocated either in @code{ax} or @code{dx}. -For example on i386 the following implements @code{rdtsc}: - -@smallexample -unsigned long long rdtsc (void) -@{ - unsigned long long tick; - __asm__ __volatile__("rdtsc":"=A"(tick)); - return tick; -@} -@end smallexample - -This is not correct on x86-64 as it would allocate tick in either @code{ax} -or @code{dx}. You have to use the following variant instead: - -@smallexample -unsigned long long rdtsc (void) -@{ - unsigned int tickl, tickh; - __asm__ __volatile__("rdtsc":"=a"(tickl),"=d"(tickh)); - return ((unsigned long long)tickh << 32)|tickl; -@} -@end smallexample - - -@item f -Any 80387 floating-point (stack) register. - -@item t -Top of 80387 floating-point stack (@code{%st(0)}). - -@item u -Second from top of 80387 floating-point stack (@code{%st(1)}). - -@item y -Any MMX register. - -@item x -Any SSE register. - -@item Yz -First SSE register (@code{%xmm0}). - -@ifset INTERNALS -@item Y2 -Any SSE register, when SSE2 is enabled. - -@item Yi -Any SSE register, when SSE2 and inter-unit moves are enabled. - -@item Ym -Any MMX register, when inter-unit moves are enabled. -@end ifset - -@item I -Integer constant in the range 0 @dots{} 31, for 32-bit shifts. - -@item J -Integer constant in the range 0 @dots{} 63, for 64-bit shifts. - -@item K -Signed 8-bit integer constant. - -@item L -@code{0xFF} or @code{0xFFFF}, for andsi as a zero-extending move. - -@item M -0, 1, 2, or 3 (shifts for the @code{lea} instruction). - -@item N -Unsigned 8-bit integer constant (for @code{in} and @code{out} -instructions). - -@ifset INTERNALS -@item O -Integer constant in the range 0 @dots{} 127, for 128-bit shifts. -@end ifset - -@item G -Standard 80387 floating point constant. 
- -@item C -Standard SSE floating point constant. - -@item e -32-bit signed integer constant, or a symbolic reference known -to fit that range (for immediate operands in sign-extending x86-64 -instructions). - -@item Z -32-bit unsigned integer constant, or a symbolic reference known -to fit that range (for immediate operands in zero-extending x86-64 -instructions). - -@end table - @item Intel IA-64---@file{config/ia64/ia64.h} @table @code @item a @@ -2508,216 +2382,6 @@ now roughly the same as @samp{m} when not used together with @samp{<} or @samp{>}. @end table -@item FRV---@file{config/frv/frv.h} -@table @code -@item a -Register in the class @code{ACC_REGS} (@code{acc0} to @code{acc7}). - -@item b -Register in the class @code{EVEN_ACC_REGS} (@code{acc0} to @code{acc7}). - -@item c -Register in the class @code{CC_REGS} (@code{fcc0} to @code{fcc3} and -@code{icc0} to @code{icc3}). - -@item d -Register in the class @code{GPR_REGS} (@code{gr0} to @code{gr63}). - -@item e -Register in the class @code{EVEN_REGS} (@code{gr0} to @code{gr63}). -Odd registers are excluded not in the class but through the use of a machine -mode larger than 4 bytes. - -@item f -Register in the class @code{FPR_REGS} (@code{fr0} to @code{fr63}). - -@item h -Register in the class @code{FEVEN_REGS} (@code{fr0} to @code{fr63}). -Odd registers are excluded not in the class but through the use of a machine -mode larger than 4 bytes. - -@item l -Register in the class @code{LR_REG} (the @code{lr} register). - -@item q -Register in the class @code{QUAD_REGS} (@code{gr2} to @code{gr63}). -Register numbers not divisible by 4 are excluded not in the class but through -the use of a machine mode larger than 8 bytes. - -@item t -Register in the class @code{ICC_REGS} (@code{icc0} to @code{icc3}). - -@item u -Register in the class @code{FCC_REGS} (@code{fcc0} to @code{fcc3}). - -@item v -Register in the class @code{ICR_REGS} (@code{cc4} to @code{cc7}). 
- -@item w -Register in the class @code{FCR_REGS} (@code{cc0} to @code{cc3}). - -@item x -Register in the class @code{QUAD_FPR_REGS} (@code{fr0} to @code{fr63}). -Register numbers not divisible by 4 are excluded not in the class but through -the use of a machine mode larger than 8 bytes. - -@item z -Register in the class @code{SPR_REGS} (@code{lcr} and @code{lr}). - -@item A -Register in the class @code{QUAD_ACC_REGS} (@code{acc0} to @code{acc7}). - -@item B -Register in the class @code{ACCG_REGS} (@code{accg0} to @code{accg7}). - -@item C -Register in the class @code{CR_REGS} (@code{cc0} to @code{cc7}). - -@item G -Floating point constant zero - -@item I -6-bit signed integer constant - -@item J -10-bit signed integer constant - -@item L -16-bit signed integer constant - -@item M -16-bit unsigned integer constant - -@item N -12-bit signed integer constant that is negative---i.e.@: in the -range of @minus{}2048 to @minus{}1 - -@item O -Constant zero - -@item P -12-bit signed integer constant that is greater than zero---i.e.@: in the -range of 1 to 2047. - -@end table - -@item Blackfin family---@file{config/bfin/constraints.md} -@table @code -@item a -P register - -@item d -D register - -@item z -A call clobbered P register. - -@item q@var{n} -A single register. If @var{n} is in the range 0 to 7, the corresponding D -register. If it is @code{A}, then the register P0. - -@item D -Even-numbered D register - -@item W -Odd-numbered D register - -@item e -Accumulator register. - -@item A -Even-numbered accumulator register. - -@item B -Odd-numbered accumulator register. - -@item b -I register - -@item v -B register - -@item f -M register - -@item c -Registers used for circular buffering, i.e. I, B, or L registers. - -@item C -The CC register. - -@item t -LT0 or LT1. - -@item k -LC0 or LC1. - -@item u -LB0 or LB1. - -@item x -Any D, P, B, M, I or L register. 
- -@item y -Additional registers typically used only in prologues and epilogues: RETS, -RETN, RETI, RETX, RETE, ASTAT, SEQSTAT and USP. - -@item w -Any register except accumulators or CC. - -@item Ksh -Signed 16 bit integer (in the range @minus{}32768 to 32767) - -@item Kuh -Unsigned 16 bit integer (in the range 0 to 65535) - -@item Ks7 -Signed 7 bit integer (in the range @minus{}64 to 63) - -@item Ku7 -Unsigned 7 bit integer (in the range 0 to 127) - -@item Ku5 -Unsigned 5 bit integer (in the range 0 to 31) - -@item Ks4 -Signed 4 bit integer (in the range @minus{}8 to 7) - -@item Ks3 -Signed 3 bit integer (in the range @minus{}3 to 4) - -@item Ku3 -Unsigned 3 bit integer (in the range 0 to 7) - -@item P@var{n} -Constant @var{n}, where @var{n} is a single-digit constant in the range 0 to 4. - -@item PA -An integer equal to one of the MACFLAG_XXX constants that is suitable for -use with either accumulator. - -@item PB -An integer equal to one of the MACFLAG_XXX constants that is suitable for -use only with accumulator A1. - -@item M1 -Constant 255. - -@item M2 -Constant 65535. - -@item J -An integer constant with exactly a single bit set. - -@item L -An integer constant with all bits set except exactly one. - -@item H - -@item Q -Any SYMBOL_REF. -@end table - @item M32C---@file{config/m32c/m32c.c} @table @code @item Rsp @@ -3346,6 +3010,205 @@ A memory reference that is encoded within the opcode. @end table +@item PowerPC and IBM RS6000---@file{config/rs6000/constraints.md} +@table @code +@item b +Address base register + +@item d +Floating point register (containing 64-bit value) + +@item f +Floating point register (containing 32-bit value) + +@item v +Altivec vector register + +@item wa +Any VSX register if the -mvsx option was used or NO_REGS. + +@item wd +VSX vector register to hold vector double data or NO_REGS. + +@item wf +VSX vector register to hold vector float data or NO_REGS. + +@item wg +If @option{-mmfpgpr} was used, a floating point register or NO_REGS. 
+ +@item wh +Floating point register if direct moves are available, or NO_REGS. + +@item wi +FP or VSX register to hold 64-bit integers for VSX insns or NO_REGS. + +@item wj +FP or VSX register to hold 64-bit integers for direct moves or NO_REGS. + +@item wk +FP or VSX register to hold 64-bit doubles for direct moves or NO_REGS. + +@item wl +Floating point register if the LFIWAX instruction is enabled or NO_REGS. + +@item wm +VSX register if direct move instructions are enabled, or NO_REGS. + +@item wn +No register (NO_REGS). + +@item wr +General purpose register if 64-bit instructions are enabled or NO_REGS. + +@item ws +VSX vector register to hold scalar double values or NO_REGS. + +@item wt +VSX vector register to hold 128 bit integer or NO_REGS. + +@item wu +Altivec register to use for float/32-bit int loads/stores or NO_REGS. + +@item wv +Altivec register to use for double loads/stores or NO_REGS. + +@item ww +FP or VSX register to perform float operations under @option{-mvsx} or NO_REGS. + +@item wx +Floating point register if the STFIWX instruction is enabled or NO_REGS. + +@item wy +FP or VSX register to perform ISA 2.07 float ops or NO_REGS. + +@item wz +Floating point register if the LFIWZX instruction is enabled or NO_REGS. + +@item wD +Int constant that is the element number of the 64-bit scalar in a vector. + +@item wQ +A memory address that will work with the @code{lq} and @code{stq} +instructions. 
+ +@item h +@samp{MQ}, @samp{CTR}, or @samp{LINK} register + +@item q +@samp{MQ} register + +@item c +@samp{CTR} register + +@item l +@samp{LINK} register + +@item x +@samp{CR} register (condition register) number 0 + +@item y +@samp{CR} register (condition register) + +@item z +@samp{XER[CA]} carry bit (part of the XER register) + +@item I +Signed 16-bit constant + +@item J +Unsigned 16-bit constant shifted left 16 bits (use @samp{L} instead for +@code{SImode} constants) + +@item K +Unsigned 16-bit constant + +@item L +Signed 16-bit constant shifted left 16 bits + +@item M +Constant larger than 31 + +@item N +Exact power of 2 + +@item O +Zero + +@item P +Constant whose negation is a signed 16-bit constant + +@item G +Floating point constant that can be loaded into a register with one +instruction per word + +@item H +Integer/Floating point constant that can be loaded into a register using +three instructions + +@item m +Memory operand. +Normally, @code{m} does not allow addresses that update the base register. +If @samp{<} or @samp{>} constraint is also used, they are allowed and +therefore on PowerPC targets in that case it is only safe +to use @samp{m<>} in an @code{asm} statement if that @code{asm} statement +accesses the operand exactly once. The @code{asm} statement must also +use @samp{%U@var{<opno>}} as a placeholder for the ``update'' flag in the +corresponding load or store instruction. For example: + +@smallexample +asm ("st%U0 %1,%0" : "=m<>" (mem) : "r" (val)); +@end smallexample + +is correct but: + +@smallexample +asm ("st %1,%0" : "=m<>" (mem) : "r" (val)); +@end smallexample + +is not. + +@item es +A ``stable'' memory operand; that is, one which does not include any +automodification of the base register. This used to be useful when +@samp{m} allowed automodification of the base register, but as those are now only +allowed when @samp{<} or @samp{>} is used, @samp{es} is basically the same +as @samp{m} without @samp{<} and @samp{>}. 
+ +@item Q +Memory operand that is an offset from a register (it is usually better +to use @samp{m} or @samp{es} in @code{asm} statements) + +@item Z +Memory operand that is an indexed or indirect from a register (it is +usually better to use @samp{m} or @samp{es} in @code{asm} statements) + +@item R +AIX TOC entry + +@item a +Address operand that is an indexed or indirect from a register (@samp{p} is +preferable for @code{asm} statements) + +@item S +Constant suitable as a 64-bit mask operand + +@item T +Constant suitable as a 32-bit mask operand + +@item U +System V Release 4 small data area reference + +@item t +AND masks that can be performed by two rldic@{l, r@} instructions + +@item W +Vector constant that does not require memory + +@item j +Vector constant that is all zeros. + +@end table + @item RL78---@file{config/rl78/constraints.md} @table @code @@ -3462,6 +3325,79 @@ A constant in the range 0 to 15, inclusive. @end table +@item S/390 and zSeries---@file{config/s390/s390.h} +@table @code +@item a +Address register (general purpose register except r0) + +@item c +Condition code register + +@item d +Data register (arbitrary general purpose register) + +@item f +Floating-point register + +@item I +Unsigned 8-bit constant (0--255) + +@item J +Unsigned 12-bit constant (0--4095) + +@item K +Signed 16-bit constant (@minus{}32768--32767) + +@item L +Value appropriate as displacement. +@table @code +@item (0..4095) +for short displacement +@item (@minus{}524288..524287) +for long displacement +@end table + +@item M +Constant integer with a value of 0x7fffffff. + +@item N +Multiple letter constraint followed by 4 parameter letters. +@table @code +@item 0..9: +number of the part counting from most to least significant +@item H,Q: +mode of the part +@item D,S,H: +mode of the containing operand +@item 0,F: +value of the other parts (F---all bits set) +@end table +The constraint matches if the specified part of a constant +has a value different from its other parts. 
+ +@item Q +Memory reference without index register and with short displacement. + +@item R +Memory reference with index register and short displacement. + +@item S +Memory reference without index register but with long displacement. + +@item T +Memory reference with index register and long displacement. + +@item U +Pointer with short displacement. + +@item W +Pointer with long displacement. + +@item Y +Shift count operand. + +@end table + @need 1000 @item SPARC---@file{config/sparc/sparc.h} @table @code @@ -3634,149 +3570,6 @@ An immediate for the @code{iohl} instruction. const_int is sign extended to 128 @end table -@item S/390 and zSeries---@file{config/s390/s390.h} -@table @code -@item a -Address register (general purpose register except r0) - -@item c -Condition code register - -@item d -Data register (arbitrary general purpose register) - -@item f -Floating-point register - -@item I -Unsigned 8-bit constant (0--255) - -@item J -Unsigned 12-bit constant (0--4095) - -@item K -Signed 16-bit constant (@minus{}32768--32767) - -@item L -Value appropriate as displacement. -@table @code -@item (0..4095) -for short displacement -@item (@minus{}524288..524287) -for long displacement -@end table - -@item M -Constant integer with a value of 0x7fffffff. - -@item N -Multiple letter constraint followed by 4 parameter letters. -@table @code -@item 0..9: -number of the part counting from most to least significant -@item H,Q: -mode of the part -@item D,S,H: -mode of the containing operand -@item 0,F: -value of the other parts (F---all bits set) -@end table -The constraint matches if the specified part of a constant -has a value different from its other parts. - -@item Q -Memory reference without index register and with short displacement. - -@item R -Memory reference with index register and short displacement. - -@item S -Memory reference without index register but with long displacement. - -@item T -Memory reference with index register and long displacement. 
- -@item U -Pointer with short displacement. - -@item W -Pointer with long displacement. - -@item Y -Shift count operand. - -@end table - -@item Xstormy16---@file{config/stormy16/stormy16.h} -@table @code -@item a -Register r0. - -@item b -Register r1. - -@item c -Register r2. - -@item d -Register r8. - -@item e -Registers r0 through r7. - -@item t -Registers r0 and r1. - -@item y -The carry register. - -@item z -Registers r8 and r9. - -@item I -A constant between 0 and 3 inclusive. - -@item J -A constant that has exactly one bit set. - -@item K -A constant that has exactly one bit clear. - -@item L -A constant between 0 and 255 inclusive. - -@item M -A constant between @minus{}255 and 0 inclusive. - -@item N -A constant between @minus{}3 and 0 inclusive. - -@item O -A constant between 1 and 4 inclusive. - -@item P -A constant between @minus{}4 and @minus{}1 inclusive. - -@item Q -A memory reference that is a stack push. - -@item R -A memory reference that is a stack pop. - -@item S -A memory reference that refers to a constant address of known value. - -@item T -The register indicated by Rx (not implemented yet). - -@item U -A constant that is not between 2 and 15 inclusive. - -@item Z -The constant 0. - -@end table - @item TI C6X family---@file{config/c6x/constraints.md} @table @code @item a @@ -4058,6 +3851,214 @@ Integer constant 0 Integer constant 32 @end table +@item x86 family---@file{config/i386/constraints.md} +@table @code +@item R +Legacy register---the eight integer registers available on all +i386 processors (@code{a}, @code{b}, @code{c}, @code{d}, +@code{si}, @code{di}, @code{bp}, @code{sp}). + +@item q +Any register accessible as @code{@var{r}l}. In 32-bit mode, @code{a}, +@code{b}, @code{c}, and @code{d}; in 64-bit mode, any integer register. + +@item Q +Any register accessible as @code{@var{r}h}: @code{a}, @code{b}, +@code{c}, and @code{d}. 
+ +@ifset INTERNALS +@item l +Any register that can be used as the index in a base+index memory +access: that is, any general register except the stack pointer. +@end ifset + +@item a +The @code{a} register. + +@item b +The @code{b} register. + +@item c +The @code{c} register. + +@item d +The @code{d} register. + +@item S +The @code{si} register. + +@item D +The @code{di} register. + +@item A +The @code{a} and @code{d} registers. This class is used for instructions +that return double word results in the @code{ax:dx} register pair. Single +word values will be allocated either in @code{ax} or @code{dx}. +For example on i386 the following implements @code{rdtsc}: + +@smallexample +unsigned long long rdtsc (void) +@{ + unsigned long long tick; + __asm__ __volatile__("rdtsc":"=A"(tick)); + return tick; +@} +@end smallexample + +This is not correct on x86-64 as it would allocate tick in either @code{ax} +or @code{dx}. You have to use the following variant instead: + +@smallexample +unsigned long long rdtsc (void) +@{ + unsigned int tickl, tickh; + __asm__ __volatile__("rdtsc":"=a"(tickl),"=d"(tickh)); + return ((unsigned long long)tickh << 32)|tickl; +@} +@end smallexample + + +@item f +Any 80387 floating-point (stack) register. + +@item t +Top of 80387 floating-point stack (@code{%st(0)}). + +@item u +Second from top of 80387 floating-point stack (@code{%st(1)}). + +@item y +Any MMX register. + +@item x +Any SSE register. + +@item Yz +First SSE register (@code{%xmm0}). + +@ifset INTERNALS +@item Y2 +Any SSE register, when SSE2 is enabled. + +@item Yi +Any SSE register, when SSE2 and inter-unit moves are enabled. + +@item Ym +Any MMX register, when inter-unit moves are enabled. +@end ifset + +@item I +Integer constant in the range 0 @dots{} 31, for 32-bit shifts. + +@item J +Integer constant in the range 0 @dots{} 63, for 64-bit shifts. + +@item K +Signed 8-bit integer constant. + +@item L +@code{0xFF} or @code{0xFFFF}, for andsi as a zero-extending move. 
+ +@item M +0, 1, 2, or 3 (shifts for the @code{lea} instruction). + +@item N +Unsigned 8-bit integer constant (for @code{in} and @code{out} +instructions). + +@ifset INTERNALS +@item O +Integer constant in the range 0 @dots{} 127, for 128-bit shifts. +@end ifset + +@item G +Standard 80387 floating point constant. + +@item C +Standard SSE floating point constant. + +@item e +32-bit signed integer constant, or a symbolic reference known +to fit that range (for immediate operands in sign-extending x86-64 +instructions). + +@item Z +32-bit unsigned integer constant, or a symbolic reference known +to fit that range (for immediate operands in zero-extending x86-64 +instructions). + +@end table + +@item Xstormy16---@file{config/stormy16/stormy16.h} +@table @code +@item a +Register r0. + +@item b +Register r1. + +@item c +Register r2. + +@item d +Register r8. + +@item e +Registers r0 through r7. + +@item t +Registers r0 and r1. + +@item y +The carry register. + +@item z +Registers r8 and r9. + +@item I +A constant between 0 and 3 inclusive. + +@item J +A constant that has exactly one bit set. + +@item K +A constant that has exactly one bit clear. + +@item L +A constant between 0 and 255 inclusive. + +@item M +A constant between @minus{}255 and 0 inclusive. + +@item N +A constant between @minus{}3 and 0 inclusive. + +@item O +A constant between 1 and 4 inclusive. + +@item P +A constant between @minus{}4 and @minus{}1 inclusive. + +@item Q +A memory reference that is a stack push. + +@item R +A memory reference that is a stack pop. + +@item S +A memory reference that refers to a constant address of known value. + +@item T +The register indicated by Rx (not implemented yet). + +@item U +A constant that is not between 2 and 15 inclusive. + +@item Z +The constant 0. + +@end table + @item Xtensa---@file{config/xtensa/constraints.md} @table @code @item a |
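The comment this patch adds to md.texi ("Please keep this table alphabetized by target!") describes a rule that can be checked mechanically. The following is a minimal sketch, not part of the GCC tree. It assumes that each target heading has the shape `@item NAME---@file{...}` seen in the hunks above, and that ordering is case-insensitive, which is consistent with "x86 family" being placed before "Xstormy16". The names `target_headings` and `first_misordered` are hypothetical helpers introduced only for illustration.

```python
import re

# Assumed shape of a target heading, taken from the hunks above, e.g.
#   @item Blackfin family---@file{config/bfin/constraints.md}
# Per-constraint entries such as "@item a" have no "---@file{" and are skipped.
TARGET_HEADING = re.compile(r"^@item (.+?)---@file\{", re.MULTILINE)


def target_headings(texinfo_source):
    """Return the target names of the Machine Constraints table, in file order."""
    return TARGET_HEADING.findall(texinfo_source)


def first_misordered(headings):
    """Return the first adjacent pair that is out of case-insensitive
    alphabetical order, or None if the list is sorted."""
    for prev, cur in zip(headings, headings[1:]):
        if prev.casefold() > cur.casefold():
            return (prev, cur)
    return None


if __name__ == "__main__":
    # A small excerpt in the order the patch leaves it.
    sample = "\n".join([
        "@item Blackfin family---@file{config/bfin/constraints.md}",
        "@table @code",
        "@item a",
        "P register",
        "@end table",
        "@item CR16 Architecture---@file{config/cr16/cr16.h}",
        "@item Epiphany---@file{config/epiphany/constraints.md}",
        "@item FRV---@file{config/frv/frv.h}",
    ])
    print(first_misordered(target_headings(sample)))
```

Run against the pre-patch order, where CR16 followed Epiphany, `first_misordered` would report that pair. A different collation rule (for example, plain ASCII order) would need a different comparison key.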