author Sandra Loosemore <sandra@codesourcery.com> 2015-01-31 21:11:30 -0500
committer Sandra Loosemore <sandra@gcc.gnu.org> 2015-01-31 21:11:30 -0500
commit b4fbcb1bf2f569af3e57e91132f3573f37ad3800
tree bae709a7cfaad39f410356107f39eff5748c18e4
parent 0353c564debb7e8ab17e53bb92127d8e1d6fe010
md.texi (Machine Constraints): Alphabetize table by target.

2015-01-31 Sandra Loosemore <sandra@codesourcery.com>

gcc/
* doc/md.texi (Machine Constraints): Alphabetize table by target.
* doc/extend.texi (x86 Variable Attributes): Move section to
correct alphabetization after renaming.
(x86 Type Attributes): Likewise.
(Target Builtins): Re-alphabetize menu.
(x86 Built-in Functions): Move section to correct alphabetization
after renaming.
(x86 transactional memory intrinsics): Likewise.
* doc/invoke.texi (Option Summary): Re-alphabetize x86 Options
and x86 Windows Options in table and menu.
(x86 Options): Move section to correct alphabetization after
renaming.
(x86 Windows Options): Likewise.

From-SVN: r220315

-rw-r--r-- gcc/ChangeLog 16
-rw-r--r-- gcc/doc/extend.texi 3034
-rw-r--r-- gcc/doc/invoke.texi 2514
-rw-r--r-- gcc/doc/md.texi 1411
4 files changed, 3496 insertions, 3479 deletions
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 7c06f05..0618d83 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,21 @@
2015-01-31 Sandra Loosemore <sandra@codesourcery.com>
+ * doc/md.texi (Machine Constraints): Alphabetize table by target.
+ * doc/extend.texi (x86 Variable Attributes): Move section to
+ correct alphabetization after renaming.
+ (x86 Type Attributes): Likewise.
+ (Target Builtins): Re-alphabetize menu.
+ (x86 Built-in Functions): Move section to correct alphabetization
+ after renaming.
+ (x86 transactional memory intrinsics): Likewise.
+ * doc/invoke.texi (Option Summary): Re-alphabetize x86 Options
+ and x86 Windows Options in table and menu.
+ (x86 Options): Move section to correct alphabetization after
+ renaming.
+ (x86 Windows Options): Likewise.
+
+2015-01-31 Sandra Loosemore <sandra@codesourcery.com>
+
* doc/extend.texi: Use "x86", "x86-32", and "x86-64" as the
preferred names of the architecture and its 32- and 64-bit
variants.
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 681812e..1806850 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -5521,6 +5521,23 @@ int cpu_clock __attribute__((cb(0x123)));
@end table
+@subsection PowerPC Variable Attributes
+
+Three attributes currently are defined for PowerPC configurations:
+@code{altivec}, @code{ms_struct} and @code{gcc_struct}.
+
+For full documentation of the struct attributes please see the
+documentation in @ref{x86 Variable Attributes}.
+
+For documentation of @code{altivec} attribute please see the
+documentation in @ref{PowerPC Type Attributes}.
+
+@subsection SPU Variable Attributes
+
+The SPU supports the @code{spu_vector} attribute for variables. For
+documentation of this attribute please see the documentation in
+@ref{SPU Type Attributes}.
+
@anchor{x86 Variable Attributes}
@subsection x86 Variable Attributes
@@ -5659,23 +5676,6 @@ Here, @code{t5} takes up 2 bytes.
@end enumerate
@end table
-@subsection PowerPC Variable Attributes
-
-Three attributes currently are defined for PowerPC configurations:
-@code{altivec}, @code{ms_struct} and @code{gcc_struct}.
-
-For full documentation of the struct attributes please see the
-documentation in @ref{x86 Variable Attributes}.
-
-For documentation of @code{altivec} attribute please see the
-documentation in @ref{PowerPC Type Attributes}.
-
-@subsection SPU Variable Attributes
-
-The SPU supports the @code{spu_vector} attribute for variables. For
-documentation of this attribute please see the documentation in
-@ref{SPU Type Attributes}.
-
@subsection Xstormy16 Variable Attributes
One attribute is currently defined for xstormy16 configurations:
@@ -6078,30 +6078,6 @@ Specifically, the @code{based}, @code{tiny}, @code{near}, and
@code{far} attributes may be applied to either. The @code{io} and
@code{cb} attributes may not be applied to types.
-@anchor{x86 Type Attributes}
-@subsection x86 Type Attributes
-
-Two attributes are currently defined for x86 configurations:
-@code{ms_struct} and @code{gcc_struct}.
-
-@table @code
-
-@item ms_struct
-@itemx gcc_struct
-@cindex @code{ms_struct}
-@cindex @code{gcc_struct}
-
-If @code{packed} is used on a structure, or if bit-fields are used
-it may be that the Microsoft ABI packs them differently
-than GCC normally packs them. Particularly when moving packed
-data between functions compiled with GCC and the native Microsoft compiler
-(either via function call or as data in a file), it may be necessary to access
-either format.
-
-Currently @option{-m[no-]ms-bitfields} is provided for the Microsoft Windows x86
-compilers to match the native Microsoft compiler.
-@end table
-
@anchor{PowerPC Type Attributes}
@subsection PowerPC Type Attributes
@@ -6134,6 +6110,30 @@ allows one to declare vector data types supported by the Sony/Toshiba/IBM SPU
Language Extensions Specification. It is intended to support the
@code{__vector} keyword.
+@anchor{x86 Type Attributes}
+@subsection x86 Type Attributes
+
+Two attributes are currently defined for x86 configurations:
+@code{ms_struct} and @code{gcc_struct}.
+
+@table @code
+
+@item ms_struct
+@itemx gcc_struct
+@cindex @code{ms_struct}
+@cindex @code{gcc_struct}
+
+If @code{packed} is used on a structure, or if bit-fields are used
+it may be that the Microsoft ABI packs them differently
+than GCC normally packs them. Particularly when moving packed
+data between functions compiled with GCC and the native Microsoft compiler
+(either via function call or as data in a file), it may be necessary to access
+either format.
+
+Currently @option{-m[no-]ms-bitfields} is provided for the Microsoft Windows x86
+compilers to match the native Microsoft compiler.
+@end table
+
@node Alignment
@section Inquiring on Alignment of Types or Variables
@cindex alignment
@@ -10113,8 +10113,6 @@ instructions, but allow the compiler to schedule those calls.
* AVR Built-in Functions::
* Blackfin Built-in Functions::
* FR-V Built-in Functions::
-* x86 Built-in Functions::
-* x86 transactional memory intrinsics::
* MIPS DSP Built-in Functions::
* MIPS Paired-Single Support::
* MIPS Loongson Built-in Functions::
@@ -10133,6 +10131,8 @@ instructions, but allow the compiler to schedule those calls.
* TI C6X Built-in Functions::
* TILE-Gx Built-in Functions::
* TILEPro Built-in Functions::
+* x86 Built-in Functions::
+* x86 transactional memory intrinsics::
@end menu
@node AArch64 Built-in Functions
@@ -11484,1480 +11484,6 @@ Use the @code{nldub} instruction to load the contents of address @var{x}
into the data cache. The instruction is issued in slot I1@.
@end table
-@node x86 Built-in Functions
-@subsection x86 Built-in Functions
-
-These built-in functions are available for the x86-32 and x86-64 family
-of computers, depending on the command-line switches used.
-
-If you specify command-line switches such as @option{-msse},
-the compiler could use the extended instruction sets even if the built-ins
-are not used explicitly in the program. For this reason, applications
-that perform run-time CPU detection must compile separate files for each
-supported architecture, using the appropriate flags. In particular,
-the file containing the CPU detection code should be compiled without
-these options.
-
-The following machine modes are available for use with MMX built-in functions
-(@pxref{Vector Extensions}): @code{V2SI} for a vector of two 32-bit integers,
-@code{V4HI} for a vector of four 16-bit integers, and @code{V8QI} for a
-vector of eight 8-bit integers. Some of the built-in functions operate on
-MMX registers as a whole 64-bit entity, these use @code{V1DI} as their mode.
-
-If 3DNow!@: extensions are enabled, @code{V2SF} is used as a mode for a vector
-of two 32-bit floating-point values.
-
-If SSE extensions are enabled, @code{V4SF} is used for a vector of four 32-bit
-floating-point values. Some instructions use a vector of four 32-bit
-integers, these use @code{V4SI}. Finally, some instructions operate on an
-entire vector register, interpreting it as a 128-bit integer, these use mode
-@code{TI}.
-
-In 64-bit mode, the x86-64 family of processors uses additional built-in
-functions for efficient use of @code{TF} (@code{__float128}) 128-bit
-floating point and @code{TC} 128-bit complex floating-point values.
-
-The following floating-point built-in functions are available in 64-bit
-mode. All of them implement the function that is part of the name.
-
-@smallexample
-__float128 __builtin_fabsq (__float128)
-__float128 __builtin_copysignq (__float128, __float128)
-@end smallexample
-
-The following built-in function is always available.
-
-@table @code
-@item void __builtin_ia32_pause (void)
-Generates the @code{pause} machine instruction with a compiler memory
-barrier.
-@end table
-
-The following floating-point built-in functions are made available in the
-64-bit mode.
-
-@table @code
-@item __float128 __builtin_infq (void)
-Similar to @code{__builtin_inf}, except the return type is @code{__float128}.
-@findex __builtin_infq
-
-@item __float128 __builtin_huge_valq (void)
-Similar to @code{__builtin_huge_val}, except the return type is @code{__float128}.
-@findex __builtin_huge_valq
-@end table
-
-The following built-in functions are always available and can be used to
-check the target platform type.
-
-@deftypefn {Built-in Function} void __builtin_cpu_init (void)
-This function runs the CPU detection code to check the type of CPU and the
-features supported. This built-in function needs to be invoked along with the built-in functions
-to check CPU type and features, @code{__builtin_cpu_is} and
-@code{__builtin_cpu_supports}, only when used in a function that is
-executed before any constructors are called. The CPU detection code is
-automatically executed in a very high priority constructor.
-
-For example, this function has to be used in @code{ifunc} resolvers that
-check for CPU type using the built-in functions @code{__builtin_cpu_is}
-and @code{__builtin_cpu_supports}, or in constructors on targets that
-don't support constructor priority.
-@smallexample
-
-static void (*resolve_memcpy (void)) (void)
-@{
- // ifunc resolvers fire before constructors, explicitly call the init
- // function.
- __builtin_cpu_init ();
- if (__builtin_cpu_supports ("ssse3"))
- return ssse3_memcpy; // super fast memcpy with ssse3 instructions.
- else
- return default_memcpy;
-@}
-
-void *memcpy (void *, const void *, size_t)
- __attribute__ ((ifunc ("resolve_memcpy")));
-@end smallexample
-
-@end deftypefn
-
-@deftypefn {Built-in Function} int __builtin_cpu_is (const char *@var{cpuname})
-This function returns a positive integer if the run-time CPU
-is of type @var{cpuname}
-and returns @code{0} otherwise. The following CPU names can be detected:
-
-@table @samp
-@item intel
-Intel CPU.
-
-@item atom
-Intel Atom CPU.
-
-@item core2
-Intel Core 2 CPU.
-
-@item corei7
-Intel Core i7 CPU.
-
-@item nehalem
-Intel Core i7 Nehalem CPU.
-
-@item westmere
-Intel Core i7 Westmere CPU.
-
-@item sandybridge
-Intel Core i7 Sandy Bridge CPU.
-
-@item amd
-AMD CPU.
-
-@item amdfam10h
-AMD Family 10h CPU.
-
-@item barcelona
-AMD Family 10h Barcelona CPU.
-
-@item shanghai
-AMD Family 10h Shanghai CPU.
-
-@item istanbul
-AMD Family 10h Istanbul CPU.
-
-@item btver1
-AMD Family 14h CPU.
-
-@item amdfam15h
-AMD Family 15h CPU.
-
-@item bdver1
-AMD Family 15h Bulldozer version 1.
-
-@item bdver2
-AMD Family 15h Bulldozer version 2.
-
-@item bdver3
-AMD Family 15h Bulldozer version 3.
-
-@item bdver4
-AMD Family 15h Bulldozer version 4.
-
-@item btver2
-AMD Family 16h CPU.
-@end table
-
-Here is an example:
-@smallexample
-if (__builtin_cpu_is ("corei7"))
- @{
- do_corei7 (); // Core i7 specific implementation.
- @}
-else
- @{
- do_generic (); // Generic implementation.
- @}
-@end smallexample
-@end deftypefn
-
-@deftypefn {Built-in Function} int __builtin_cpu_supports (const char *@var{feature})
-This function returns a positive integer if the run-time CPU
-supports @var{feature}
-and returns @code{0} otherwise. The following features can be detected:
-
-@table @samp
-@item cmov
-CMOV instruction.
-@item mmx
-MMX instructions.
-@item popcnt
-POPCNT instruction.
-@item sse
-SSE instructions.
-@item sse2
-SSE2 instructions.
-@item sse3
-SSE3 instructions.
-@item ssse3
-SSSE3 instructions.
-@item sse4.1
-SSE4.1 instructions.
-@item sse4.2
-SSE4.2 instructions.
-@item avx
-AVX instructions.
-@item avx2
-AVX2 instructions.
-@item avx512f
-AVX512F instructions.
-@end table
-
-Here is an example:
-@smallexample
-if (__builtin_cpu_supports ("popcnt"))
- @{
- asm("popcnt %1,%0" : "=r"(count) : "rm"(n) : "cc");
- @}
-else
- @{
- count = generic_countbits (n); //generic implementation.
- @}
-@end smallexample
-@end deftypefn
-
-
-The following built-in functions are made available by @option{-mmmx}.
-All of them generate the machine instruction that is part of the name.
-
-@smallexample
-v8qi __builtin_ia32_paddb (v8qi, v8qi)
-v4hi __builtin_ia32_paddw (v4hi, v4hi)
-v2si __builtin_ia32_paddd (v2si, v2si)
-v8qi __builtin_ia32_psubb (v8qi, v8qi)
-v4hi __builtin_ia32_psubw (v4hi, v4hi)
-v2si __builtin_ia32_psubd (v2si, v2si)
-v8qi __builtin_ia32_paddsb (v8qi, v8qi)
-v4hi __builtin_ia32_paddsw (v4hi, v4hi)
-v8qi __builtin_ia32_psubsb (v8qi, v8qi)
-v4hi __builtin_ia32_psubsw (v4hi, v4hi)
-v8qi __builtin_ia32_paddusb (v8qi, v8qi)
-v4hi __builtin_ia32_paddusw (v4hi, v4hi)
-v8qi __builtin_ia32_psubusb (v8qi, v8qi)
-v4hi __builtin_ia32_psubusw (v4hi, v4hi)
-v4hi __builtin_ia32_pmullw (v4hi, v4hi)
-v4hi __builtin_ia32_pmulhw (v4hi, v4hi)
-di __builtin_ia32_pand (di, di)
-di __builtin_ia32_pandn (di,di)
-di __builtin_ia32_por (di, di)
-di __builtin_ia32_pxor (di, di)
-v8qi __builtin_ia32_pcmpeqb (v8qi, v8qi)
-v4hi __builtin_ia32_pcmpeqw (v4hi, v4hi)
-v2si __builtin_ia32_pcmpeqd (v2si, v2si)
-v8qi __builtin_ia32_pcmpgtb (v8qi, v8qi)
-v4hi __builtin_ia32_pcmpgtw (v4hi, v4hi)
-v2si __builtin_ia32_pcmpgtd (v2si, v2si)
-v8qi __builtin_ia32_punpckhbw (v8qi, v8qi)
-v4hi __builtin_ia32_punpckhwd (v4hi, v4hi)
-v2si __builtin_ia32_punpckhdq (v2si, v2si)
-v8qi __builtin_ia32_punpcklbw (v8qi, v8qi)
-v4hi __builtin_ia32_punpcklwd (v4hi, v4hi)
-v2si __builtin_ia32_punpckldq (v2si, v2si)
-v8qi __builtin_ia32_packsswb (v4hi, v4hi)
-v4hi __builtin_ia32_packssdw (v2si, v2si)
-v8qi __builtin_ia32_packuswb (v4hi, v4hi)
-
-v4hi __builtin_ia32_psllw (v4hi, v4hi)
-v2si __builtin_ia32_pslld (v2si, v2si)
-v1di __builtin_ia32_psllq (v1di, v1di)
-v4hi __builtin_ia32_psrlw (v4hi, v4hi)
-v2si __builtin_ia32_psrld (v2si, v2si)
-v1di __builtin_ia32_psrlq (v1di, v1di)
-v4hi __builtin_ia32_psraw (v4hi, v4hi)
-v2si __builtin_ia32_psrad (v2si, v2si)
-v4hi __builtin_ia32_psllwi (v4hi, int)
-v2si __builtin_ia32_pslldi (v2si, int)
-v1di __builtin_ia32_psllqi (v1di, int)
-v4hi __builtin_ia32_psrlwi (v4hi, int)
-v2si __builtin_ia32_psrldi (v2si, int)
-v1di __builtin_ia32_psrlqi (v1di, int)
-v4hi __builtin_ia32_psrawi (v4hi, int)
-v2si __builtin_ia32_psradi (v2si, int)
-
-@end smallexample
-
-The following built-in functions are made available either with
-@option{-msse}, or with a combination of @option{-m3dnow} and
-@option{-march=athlon}. All of them generate the machine
-instruction that is part of the name.
-
-@smallexample
-v4hi __builtin_ia32_pmulhuw (v4hi, v4hi)
-v8qi __builtin_ia32_pavgb (v8qi, v8qi)
-v4hi __builtin_ia32_pavgw (v4hi, v4hi)
-v1di __builtin_ia32_psadbw (v8qi, v8qi)
-v8qi __builtin_ia32_pmaxub (v8qi, v8qi)
-v4hi __builtin_ia32_pmaxsw (v4hi, v4hi)
-v8qi __builtin_ia32_pminub (v8qi, v8qi)
-v4hi __builtin_ia32_pminsw (v4hi, v4hi)
-int __builtin_ia32_pmovmskb (v8qi)
-void __builtin_ia32_maskmovq (v8qi, v8qi, char *)
-void __builtin_ia32_movntq (di *, di)
-void __builtin_ia32_sfence (void)
-@end smallexample
-
-The following built-in functions are available when @option{-msse} is used.
-All of them generate the machine instruction that is part of the name.
-
-@smallexample
-int __builtin_ia32_comieq (v4sf, v4sf)
-int __builtin_ia32_comineq (v4sf, v4sf)
-int __builtin_ia32_comilt (v4sf, v4sf)
-int __builtin_ia32_comile (v4sf, v4sf)
-int __builtin_ia32_comigt (v4sf, v4sf)
-int __builtin_ia32_comige (v4sf, v4sf)
-int __builtin_ia32_ucomieq (v4sf, v4sf)
-int __builtin_ia32_ucomineq (v4sf, v4sf)
-int __builtin_ia32_ucomilt (v4sf, v4sf)
-int __builtin_ia32_ucomile (v4sf, v4sf)
-int __builtin_ia32_ucomigt (v4sf, v4sf)
-int __builtin_ia32_ucomige (v4sf, v4sf)
-v4sf __builtin_ia32_addps (v4sf, v4sf)
-v4sf __builtin_ia32_subps (v4sf, v4sf)
-v4sf __builtin_ia32_mulps (v4sf, v4sf)
-v4sf __builtin_ia32_divps (v4sf, v4sf)
-v4sf __builtin_ia32_addss (v4sf, v4sf)
-v4sf __builtin_ia32_subss (v4sf, v4sf)
-v4sf __builtin_ia32_mulss (v4sf, v4sf)
-v4sf __builtin_ia32_divss (v4sf, v4sf)
-v4sf __builtin_ia32_cmpeqps (v4sf, v4sf)
-v4sf __builtin_ia32_cmpltps (v4sf, v4sf)
-v4sf __builtin_ia32_cmpleps (v4sf, v4sf)
-v4sf __builtin_ia32_cmpgtps (v4sf, v4sf)
-v4sf __builtin_ia32_cmpgeps (v4sf, v4sf)
-v4sf __builtin_ia32_cmpunordps (v4sf, v4sf)
-v4sf __builtin_ia32_cmpneqps (v4sf, v4sf)
-v4sf __builtin_ia32_cmpnltps (v4sf, v4sf)
-v4sf __builtin_ia32_cmpnleps (v4sf, v4sf)
-v4sf __builtin_ia32_cmpngtps (v4sf, v4sf)
-v4sf __builtin_ia32_cmpngeps (v4sf, v4sf)
-v4sf __builtin_ia32_cmpordps (v4sf, v4sf)
-v4sf __builtin_ia32_cmpeqss (v4sf, v4sf)
-v4sf __builtin_ia32_cmpltss (v4sf, v4sf)
-v4sf __builtin_ia32_cmpless (v4sf, v4sf)
-v4sf __builtin_ia32_cmpunordss (v4sf, v4sf)
-v4sf __builtin_ia32_cmpneqss (v4sf, v4sf)
-v4sf __builtin_ia32_cmpnltss (v4sf, v4sf)
-v4sf __builtin_ia32_cmpnless (v4sf, v4sf)
-v4sf __builtin_ia32_cmpordss (v4sf, v4sf)
-v4sf __builtin_ia32_maxps (v4sf, v4sf)
-v4sf __builtin_ia32_maxss (v4sf, v4sf)
-v4sf __builtin_ia32_minps (v4sf, v4sf)
-v4sf __builtin_ia32_minss (v4sf, v4sf)
-v4sf __builtin_ia32_andps (v4sf, v4sf)
-v4sf __builtin_ia32_andnps (v4sf, v4sf)
-v4sf __builtin_ia32_orps (v4sf, v4sf)
-v4sf __builtin_ia32_xorps (v4sf, v4sf)
-v4sf __builtin_ia32_movss (v4sf, v4sf)
-v4sf __builtin_ia32_movhlps (v4sf, v4sf)
-v4sf __builtin_ia32_movlhps (v4sf, v4sf)
-v4sf __builtin_ia32_unpckhps (v4sf, v4sf)
-v4sf __builtin_ia32_unpcklps (v4sf, v4sf)
-v4sf __builtin_ia32_cvtpi2ps (v4sf, v2si)
-v4sf __builtin_ia32_cvtsi2ss (v4sf, int)
-v2si __builtin_ia32_cvtps2pi (v4sf)
-int __builtin_ia32_cvtss2si (v4sf)
-v2si __builtin_ia32_cvttps2pi (v4sf)
-int __builtin_ia32_cvttss2si (v4sf)
-v4sf __builtin_ia32_rcpps (v4sf)
-v4sf __builtin_ia32_rsqrtps (v4sf)
-v4sf __builtin_ia32_sqrtps (v4sf)
-v4sf __builtin_ia32_rcpss (v4sf)
-v4sf __builtin_ia32_rsqrtss (v4sf)
-v4sf __builtin_ia32_sqrtss (v4sf)
-v4sf __builtin_ia32_shufps (v4sf, v4sf, int)
-void __builtin_ia32_movntps (float *, v4sf)
-int __builtin_ia32_movmskps (v4sf)
-@end smallexample
-
-The following built-in functions are available when @option{-msse} is used.
-
-@table @code
-@item v4sf __builtin_ia32_loadups (float *)
-Generates the @code{movups} machine instruction as a load from memory.
-@item void __builtin_ia32_storeups (float *, v4sf)
-Generates the @code{movups} machine instruction as a store to memory.
-@item v4sf __builtin_ia32_loadss (float *)
-Generates the @code{movss} machine instruction as a load from memory.
-@item v4sf __builtin_ia32_loadhps (v4sf, const v2sf *)
-Generates the @code{movhps} machine instruction as a load from memory.
-@item v4sf __builtin_ia32_loadlps (v4sf, const v2sf *)
-Generates the @code{movlps} machine instruction as a load from memory
-@item void __builtin_ia32_storehps (v2sf *, v4sf)
-Generates the @code{movhps} machine instruction as a store to memory.
-@item void __builtin_ia32_storelps (v2sf *, v4sf)
-Generates the @code{movlps} machine instruction as a store to memory.
-@end table
-
-The following built-in functions are available when @option{-msse2} is used.
-All of them generate the machine instruction that is part of the name.
-
-@smallexample
-int __builtin_ia32_comisdeq (v2df, v2df)
-int __builtin_ia32_comisdlt (v2df, v2df)
-int __builtin_ia32_comisdle (v2df, v2df)
-int __builtin_ia32_comisdgt (v2df, v2df)
-int __builtin_ia32_comisdge (v2df, v2df)
-int __builtin_ia32_comisdneq (v2df, v2df)
-int __builtin_ia32_ucomisdeq (v2df, v2df)
-int __builtin_ia32_ucomisdlt (v2df, v2df)
-int __builtin_ia32_ucomisdle (v2df, v2df)
-int __builtin_ia32_ucomisdgt (v2df, v2df)
-int __builtin_ia32_ucomisdge (v2df, v2df)
-int __builtin_ia32_ucomisdneq (v2df, v2df)
-v2df __builtin_ia32_cmpeqpd (v2df, v2df)
-v2df __builtin_ia32_cmpltpd (v2df, v2df)
-v2df __builtin_ia32_cmplepd (v2df, v2df)
-v2df __builtin_ia32_cmpgtpd (v2df, v2df)
-v2df __builtin_ia32_cmpgepd (v2df, v2df)
-v2df __builtin_ia32_cmpunordpd (v2df, v2df)
-v2df __builtin_ia32_cmpneqpd (v2df, v2df)
-v2df __builtin_ia32_cmpnltpd (v2df, v2df)
-v2df __builtin_ia32_cmpnlepd (v2df, v2df)
-v2df __builtin_ia32_cmpngtpd (v2df, v2df)
-v2df __builtin_ia32_cmpngepd (v2df, v2df)
-v2df __builtin_ia32_cmpordpd (v2df, v2df)
-v2df __builtin_ia32_cmpeqsd (v2df, v2df)
-v2df __builtin_ia32_cmpltsd (v2df, v2df)
-v2df __builtin_ia32_cmplesd (v2df, v2df)
-v2df __builtin_ia32_cmpunordsd (v2df, v2df)
-v2df __builtin_ia32_cmpneqsd (v2df, v2df)
-v2df __builtin_ia32_cmpnltsd (v2df, v2df)
-v2df __builtin_ia32_cmpnlesd (v2df, v2df)
-v2df __builtin_ia32_cmpordsd (v2df, v2df)
-v2di __builtin_ia32_paddq (v2di, v2di)
-v2di __builtin_ia32_psubq (v2di, v2di)
-v2df __builtin_ia32_addpd (v2df, v2df)
-v2df __builtin_ia32_subpd (v2df, v2df)
-v2df __builtin_ia32_mulpd (v2df, v2df)
-v2df __builtin_ia32_divpd (v2df, v2df)
-v2df __builtin_ia32_addsd (v2df, v2df)
-v2df __builtin_ia32_subsd (v2df, v2df)
-v2df __builtin_ia32_mulsd (v2df, v2df)
-v2df __builtin_ia32_divsd (v2df, v2df)
-v2df __builtin_ia32_minpd (v2df, v2df)
-v2df __builtin_ia32_maxpd (v2df, v2df)
-v2df __builtin_ia32_minsd (v2df, v2df)
-v2df __builtin_ia32_maxsd (v2df, v2df)
-v2df __builtin_ia32_andpd (v2df, v2df)
-v2df __builtin_ia32_andnpd (v2df, v2df)
-v2df __builtin_ia32_orpd (v2df, v2df)
-v2df __builtin_ia32_xorpd (v2df, v2df)
-v2df __builtin_ia32_movsd (v2df, v2df)
-v2df __builtin_ia32_unpckhpd (v2df, v2df)
-v2df __builtin_ia32_unpcklpd (v2df, v2df)
-v16qi __builtin_ia32_paddb128 (v16qi, v16qi)
-v8hi __builtin_ia32_paddw128 (v8hi, v8hi)
-v4si __builtin_ia32_paddd128 (v4si, v4si)
-v2di __builtin_ia32_paddq128 (v2di, v2di)
-v16qi __builtin_ia32_psubb128 (v16qi, v16qi)
-v8hi __builtin_ia32_psubw128 (v8hi, v8hi)
-v4si __builtin_ia32_psubd128 (v4si, v4si)
-v2di __builtin_ia32_psubq128 (v2di, v2di)
-v8hi __builtin_ia32_pmullw128 (v8hi, v8hi)
-v8hi __builtin_ia32_pmulhw128 (v8hi, v8hi)
-v2di __builtin_ia32_pand128 (v2di, v2di)
-v2di __builtin_ia32_pandn128 (v2di, v2di)
-v2di __builtin_ia32_por128 (v2di, v2di)
-v2di __builtin_ia32_pxor128 (v2di, v2di)
-v16qi __builtin_ia32_pavgb128 (v16qi, v16qi)
-v8hi __builtin_ia32_pavgw128 (v8hi, v8hi)
-v16qi __builtin_ia32_pcmpeqb128 (v16qi, v16qi)
-v8hi __builtin_ia32_pcmpeqw128 (v8hi, v8hi)
-v4si __builtin_ia32_pcmpeqd128 (v4si, v4si)
-v16qi __builtin_ia32_pcmpgtb128 (v16qi, v16qi)
-v8hi __builtin_ia32_pcmpgtw128 (v8hi, v8hi)
-v4si __builtin_ia32_pcmpgtd128 (v4si, v4si)
-v16qi __builtin_ia32_pmaxub128 (v16qi, v16qi)
-v8hi __builtin_ia32_pmaxsw128 (v8hi, v8hi)
-v16qi __builtin_ia32_pminub128 (v16qi, v16qi)
-v8hi __builtin_ia32_pminsw128 (v8hi, v8hi)
-v16qi __builtin_ia32_punpckhbw128 (v16qi, v16qi)
-v8hi __builtin_ia32_punpckhwd128 (v8hi, v8hi)
-v4si __builtin_ia32_punpckhdq128 (v4si, v4si)
-v2di __builtin_ia32_punpckhqdq128 (v2di, v2di)
-v16qi __builtin_ia32_punpcklbw128 (v16qi, v16qi)
-v8hi __builtin_ia32_punpcklwd128 (v8hi, v8hi)
-v4si __builtin_ia32_punpckldq128 (v4si, v4si)
-v2di __builtin_ia32_punpcklqdq128 (v2di, v2di)
-v16qi __builtin_ia32_packsswb128 (v8hi, v8hi)
-v8hi __builtin_ia32_packssdw128 (v4si, v4si)
-v16qi __builtin_ia32_packuswb128 (v8hi, v8hi)
-v8hi __builtin_ia32_pmulhuw128 (v8hi, v8hi)
-void __builtin_ia32_maskmovdqu (v16qi, v16qi)
-v2df __builtin_ia32_loadupd (double *)
-void __builtin_ia32_storeupd (double *, v2df)
-v2df __builtin_ia32_loadhpd (v2df, double const *)
-v2df __builtin_ia32_loadlpd (v2df, double const *)
-int __builtin_ia32_movmskpd (v2df)
-int __builtin_ia32_pmovmskb128 (v16qi)
-void __builtin_ia32_movnti (int *, int)
-void __builtin_ia32_movnti64 (long long int *, long long int)
-void __builtin_ia32_movntpd (double *, v2df)
-void __builtin_ia32_movntdq (v2df *, v2df)
-v4si __builtin_ia32_pshufd (v4si, int)
-v8hi __builtin_ia32_pshuflw (v8hi, int)
-v8hi __builtin_ia32_pshufhw (v8hi, int)
-v2di __builtin_ia32_psadbw128 (v16qi, v16qi)
-v2df __builtin_ia32_sqrtpd (v2df)
-v2df __builtin_ia32_sqrtsd (v2df)
-v2df __builtin_ia32_shufpd (v2df, v2df, int)
-v2df __builtin_ia32_cvtdq2pd (v4si)
-v4sf __builtin_ia32_cvtdq2ps (v4si)
-v4si __builtin_ia32_cvtpd2dq (v2df)
-v2si __builtin_ia32_cvtpd2pi (v2df)
-v4sf __builtin_ia32_cvtpd2ps (v2df)
-v4si __builtin_ia32_cvttpd2dq (v2df)
-v2si __builtin_ia32_cvttpd2pi (v2df)
-v2df __builtin_ia32_cvtpi2pd (v2si)
-int __builtin_ia32_cvtsd2si (v2df)
-int __builtin_ia32_cvttsd2si (v2df)
-long long __builtin_ia32_cvtsd2si64 (v2df)
-long long __builtin_ia32_cvttsd2si64 (v2df)
-v4si __builtin_ia32_cvtps2dq (v4sf)
-v2df __builtin_ia32_cvtps2pd (v4sf)
-v4si __builtin_ia32_cvttps2dq (v4sf)
-v2df __builtin_ia32_cvtsi2sd (v2df, int)
-v2df __builtin_ia32_cvtsi642sd (v2df, long long)
-v4sf __builtin_ia32_cvtsd2ss (v4sf, v2df)
-v2df __builtin_ia32_cvtss2sd (v2df, v4sf)
-void __builtin_ia32_clflush (const void *)
-void __builtin_ia32_lfence (void)
-void __builtin_ia32_mfence (void)
-v16qi __builtin_ia32_loaddqu (const char *)
-void __builtin_ia32_storedqu (char *, v16qi)
-v1di __builtin_ia32_pmuludq (v2si, v2si)
-v2di __builtin_ia32_pmuludq128 (v4si, v4si)
-v8hi __builtin_ia32_psllw128 (v8hi, v8hi)
-v4si __builtin_ia32_pslld128 (v4si, v4si)
-v2di __builtin_ia32_psllq128 (v2di, v2di)
-v8hi __builtin_ia32_psrlw128 (v8hi, v8hi)
-v4si __builtin_ia32_psrld128 (v4si, v4si)
-v2di __builtin_ia32_psrlq128 (v2di, v2di)
-v8hi __builtin_ia32_psraw128 (v8hi, v8hi)
-v4si __builtin_ia32_psrad128 (v4si, v4si)
-v2di __builtin_ia32_pslldqi128 (v2di, int)
-v8hi __builtin_ia32_psllwi128 (v8hi, int)
-v4si __builtin_ia32_pslldi128 (v4si, int)
-v2di __builtin_ia32_psllqi128 (v2di, int)
-v2di __builtin_ia32_psrldqi128 (v2di, int)
-v8hi __builtin_ia32_psrlwi128 (v8hi, int)
-v4si __builtin_ia32_psrldi128 (v4si, int)
-v2di __builtin_ia32_psrlqi128 (v2di, int)
-v8hi __builtin_ia32_psrawi128 (v8hi, int)
-v4si __builtin_ia32_psradi128 (v4si, int)
-v4si __builtin_ia32_pmaddwd128 (v8hi, v8hi)
-v2di __builtin_ia32_movq128 (v2di)
-@end smallexample
-
-The following built-in functions are available when @option{-msse3} is used.
-All of them generate the machine instruction that is part of the name.
-
-@smallexample
-v2df __builtin_ia32_addsubpd (v2df, v2df)
-v4sf __builtin_ia32_addsubps (v4sf, v4sf)
-v2df __builtin_ia32_haddpd (v2df, v2df)
-v4sf __builtin_ia32_haddps (v4sf, v4sf)
-v2df __builtin_ia32_hsubpd (v2df, v2df)
-v4sf __builtin_ia32_hsubps (v4sf, v4sf)
-v16qi __builtin_ia32_lddqu (char const *)
-void __builtin_ia32_monitor (void *, unsigned int, unsigned int)
-v4sf __builtin_ia32_movshdup (v4sf)
-v4sf __builtin_ia32_movsldup (v4sf)
-void __builtin_ia32_mwait (unsigned int, unsigned int)
-@end smallexample
-
-The following built-in functions are available when @option{-mssse3} is used.
-All of them generate the machine instruction that is part of the name.
-
-@smallexample
-v2si __builtin_ia32_phaddd (v2si, v2si)
-v4hi __builtin_ia32_phaddw (v4hi, v4hi)
-v4hi __builtin_ia32_phaddsw (v4hi, v4hi)
-v2si __builtin_ia32_phsubd (v2si, v2si)
-v4hi __builtin_ia32_phsubw (v4hi, v4hi)
-v4hi __builtin_ia32_phsubsw (v4hi, v4hi)
-v4hi __builtin_ia32_pmaddubsw (v8qi, v8qi)
-v4hi __builtin_ia32_pmulhrsw (v4hi, v4hi)
-v8qi __builtin_ia32_pshufb (v8qi, v8qi)
-v8qi __builtin_ia32_psignb (v8qi, v8qi)
-v2si __builtin_ia32_psignd (v2si, v2si)
-v4hi __builtin_ia32_psignw (v4hi, v4hi)
-v1di __builtin_ia32_palignr (v1di, v1di, int)
-v8qi __builtin_ia32_pabsb (v8qi)
-v2si __builtin_ia32_pabsd (v2si)
-v4hi __builtin_ia32_pabsw (v4hi)
-@end smallexample
-
-The following built-in functions are available when @option{-mssse3} is used.
-All of them generate the machine instruction that is part of the name.
-
-@smallexample
-v4si __builtin_ia32_phaddd128 (v4si, v4si)
-v8hi __builtin_ia32_phaddw128 (v8hi, v8hi)
-v8hi __builtin_ia32_phaddsw128 (v8hi, v8hi)
-v4si __builtin_ia32_phsubd128 (v4si, v4si)
-v8hi __builtin_ia32_phsubw128 (v8hi, v8hi)
-v8hi __builtin_ia32_phsubsw128 (v8hi, v8hi)
-v8hi __builtin_ia32_pmaddubsw128 (v16qi, v16qi)
-v8hi __builtin_ia32_pmulhrsw128 (v8hi, v8hi)
-v16qi __builtin_ia32_pshufb128 (v16qi, v16qi)
-v16qi __builtin_ia32_psignb128 (v16qi, v16qi)
-v4si __builtin_ia32_psignd128 (v4si, v4si)
-v8hi __builtin_ia32_psignw128 (v8hi, v8hi)
-v2di __builtin_ia32_palignr128 (v2di, v2di, int)
-v16qi __builtin_ia32_pabsb128 (v16qi)
-v4si __builtin_ia32_pabsd128 (v4si)
-v8hi __builtin_ia32_pabsw128 (v8hi)
-@end smallexample
-
-The following built-in functions are available when @option{-msse4.1} is
-used. All of them generate the machine instruction that is part of the
-name.
-
-@smallexample
-v2df __builtin_ia32_blendpd (v2df, v2df, const int)
-v4sf __builtin_ia32_blendps (v4sf, v4sf, const int)
-v2df __builtin_ia32_blendvpd (v2df, v2df, v2df)
-v4sf __builtin_ia32_blendvps (v4sf, v4sf, v4sf)
-v2df __builtin_ia32_dppd (v2df, v2df, const int)
-v4sf __builtin_ia32_dpps (v4sf, v4sf, const int)
-v4sf __builtin_ia32_insertps128 (v4sf, v4sf, const int)
-v2di __builtin_ia32_movntdqa (v2di *);
-v16qi __builtin_ia32_mpsadbw128 (v16qi, v16qi, const int)
-v8hi __builtin_ia32_packusdw128 (v4si, v4si)
-v16qi __builtin_ia32_pblendvb128 (v16qi, v16qi, v16qi)
-v8hi __builtin_ia32_pblendw128 (v8hi, v8hi, const int)
-v2di __builtin_ia32_pcmpeqq (v2di, v2di)
-v8hi __builtin_ia32_phminposuw128 (v8hi)
-v16qi __builtin_ia32_pmaxsb128 (v16qi, v16qi)
-v4si __builtin_ia32_pmaxsd128 (v4si, v4si)
-v4si __builtin_ia32_pmaxud128 (v4si, v4si)
-v8hi __builtin_ia32_pmaxuw128 (v8hi, v8hi)
-v16qi __builtin_ia32_pminsb128 (v16qi, v16qi)
-v4si __builtin_ia32_pminsd128 (v4si, v4si)
-v4si __builtin_ia32_pminud128 (v4si, v4si)
-v8hi __builtin_ia32_pminuw128 (v8hi, v8hi)
-v4si __builtin_ia32_pmovsxbd128 (v16qi)
-v2di __builtin_ia32_pmovsxbq128 (v16qi)
-v8hi __builtin_ia32_pmovsxbw128 (v16qi)
-v2di __builtin_ia32_pmovsxdq128 (v4si)
-v4si __builtin_ia32_pmovsxwd128 (v8hi)
-v2di __builtin_ia32_pmovsxwq128 (v8hi)
-v4si __builtin_ia32_pmovzxbd128 (v16qi)
-v2di __builtin_ia32_pmovzxbq128 (v16qi)
-v8hi __builtin_ia32_pmovzxbw128 (v16qi)
-v2di __builtin_ia32_pmovzxdq128 (v4si)
-v4si __builtin_ia32_pmovzxwd128 (v8hi)
-v2di __builtin_ia32_pmovzxwq128 (v8hi)
-v2di __builtin_ia32_pmuldq128 (v4si, v4si)
-v4si __builtin_ia32_pmulld128 (v4si, v4si)
-int __builtin_ia32_ptestc128 (v2di, v2di)
-int __builtin_ia32_ptestnzc128 (v2di, v2di)
-int __builtin_ia32_ptestz128 (v2di, v2di)
-v2df __builtin_ia32_roundpd (v2df, const int)
-v4sf __builtin_ia32_roundps (v4sf, const int)
-v2df __builtin_ia32_roundsd (v2df, v2df, const int)
-v4sf __builtin_ia32_roundss (v4sf, v4sf, const int)
-@end smallexample
-
-The following built-in functions are available when @option{-msse4.1} is
-used.
-
-@table @code
-@item v4sf __builtin_ia32_vec_set_v4sf (v4sf, float, const int)
-Generates the @code{insertps} machine instruction.
-@item int __builtin_ia32_vec_ext_v16qi (v16qi, const int)
-Generates the @code{pextrb} machine instruction.
-@item v16qi __builtin_ia32_vec_set_v16qi (v16qi, int, const int)
-Generates the @code{pinsrb} machine instruction.
-@item v4si __builtin_ia32_vec_set_v4si (v4si, int, const int)
-Generates the @code{pinsrd} machine instruction.
-@item v2di __builtin_ia32_vec_set_v2di (v2di, long long, const int)
-Generates the @code{pinsrq} machine instruction in 64-bit mode.
-@end table
-
-The following built-in functions are changed to generate new SSE4.1
-instructions when @option{-msse4.1} is used.
-
-@table @code
-@item float __builtin_ia32_vec_ext_v4sf (v4sf, const int)
-Generates the @code{extractps} machine instruction.
-@item int __builtin_ia32_vec_ext_v4si (v4si, const int)
-Generates the @code{pextrd} machine instruction.
-@item long long __builtin_ia32_vec_ext_v2di (v2di, const int)
-Generates the @code{pextrq} machine instruction in 64-bit mode.
-@end table
-
-The following built-in functions are available when @option{-msse4.2} is
-used. All of them generate the machine instruction that is part of the
-name.
-
-@smallexample
-v16qi __builtin_ia32_pcmpestrm128 (v16qi, int, v16qi, int, const int)
-int __builtin_ia32_pcmpestri128 (v16qi, int, v16qi, int, const int)
-int __builtin_ia32_pcmpestria128 (v16qi, int, v16qi, int, const int)
-int __builtin_ia32_pcmpestric128 (v16qi, int, v16qi, int, const int)
-int __builtin_ia32_pcmpestrio128 (v16qi, int, v16qi, int, const int)
-int __builtin_ia32_pcmpestris128 (v16qi, int, v16qi, int, const int)
-int __builtin_ia32_pcmpestriz128 (v16qi, int, v16qi, int, const int)
-v16qi __builtin_ia32_pcmpistrm128 (v16qi, v16qi, const int)
-int __builtin_ia32_pcmpistri128 (v16qi, v16qi, const int)
-int __builtin_ia32_pcmpistria128 (v16qi, v16qi, const int)
-int __builtin_ia32_pcmpistric128 (v16qi, v16qi, const int)
-int __builtin_ia32_pcmpistrio128 (v16qi, v16qi, const int)
-int __builtin_ia32_pcmpistris128 (v16qi, v16qi, const int)
-int __builtin_ia32_pcmpistriz128 (v16qi, v16qi, const int)
-v2di __builtin_ia32_pcmpgtq (v2di, v2di)
-@end smallexample
-
-The following built-in functions are available when @option{-msse4.2} is
-used.
-
-@table @code
-@item unsigned int __builtin_ia32_crc32qi (unsigned int, unsigned char)
-Generates the @code{crc32b} machine instruction.
-@item unsigned int __builtin_ia32_crc32hi (unsigned int, unsigned short)
-Generates the @code{crc32w} machine instruction.
-@item unsigned int __builtin_ia32_crc32si (unsigned int, unsigned int)
-Generates the @code{crc32l} machine instruction.
-@item unsigned long long __builtin_ia32_crc32di (unsigned long long, unsigned long long)
-Generates the @code{crc32q} machine instruction.
-@end table
-
-The following built-in functions are changed to generate new SSE4.2
-instructions when @option{-msse4.2} is used.
-
-@table @code
-@item int __builtin_popcount (unsigned int)
-Generates the @code{popcntl} machine instruction.
-@item int __builtin_popcountl (unsigned long)
-Generates the @code{popcntl} or @code{popcntq} machine instruction,
-depending on the size of @code{unsigned long}.
-@item int __builtin_popcountll (unsigned long long)
-Generates the @code{popcntq} machine instruction.
-@end table
-
-The following built-in functions are available when @option{-mavx} is
-used. All of them generate the machine instruction that is part of the
-name.
-
-@smallexample
-v4df __builtin_ia32_addpd256 (v4df,v4df)
-v8sf __builtin_ia32_addps256 (v8sf,v8sf)
-v4df __builtin_ia32_addsubpd256 (v4df,v4df)
-v8sf __builtin_ia32_addsubps256 (v8sf,v8sf)
-v4df __builtin_ia32_andnpd256 (v4df,v4df)
-v8sf __builtin_ia32_andnps256 (v8sf,v8sf)
-v4df __builtin_ia32_andpd256 (v4df,v4df)
-v8sf __builtin_ia32_andps256 (v8sf,v8sf)
-v4df __builtin_ia32_blendpd256 (v4df,v4df,int)
-v8sf __builtin_ia32_blendps256 (v8sf,v8sf,int)
-v4df __builtin_ia32_blendvpd256 (v4df,v4df,v4df)
-v8sf __builtin_ia32_blendvps256 (v8sf,v8sf,v8sf)
-v2df __builtin_ia32_cmppd (v2df,v2df,int)
-v4df __builtin_ia32_cmppd256 (v4df,v4df,int)
-v4sf __builtin_ia32_cmpps (v4sf,v4sf,int)
-v8sf __builtin_ia32_cmpps256 (v8sf,v8sf,int)
-v2df __builtin_ia32_cmpsd (v2df,v2df,int)
-v4sf __builtin_ia32_cmpss (v4sf,v4sf,int)
-v4df __builtin_ia32_cvtdq2pd256 (v4si)
-v8sf __builtin_ia32_cvtdq2ps256 (v8si)
-v4si __builtin_ia32_cvtpd2dq256 (v4df)
-v4sf __builtin_ia32_cvtpd2ps256 (v4df)
-v8si __builtin_ia32_cvtps2dq256 (v8sf)
-v4df __builtin_ia32_cvtps2pd256 (v4sf)
-v4si __builtin_ia32_cvttpd2dq256 (v4df)
-v8si __builtin_ia32_cvttps2dq256 (v8sf)
-v4df __builtin_ia32_divpd256 (v4df,v4df)
-v8sf __builtin_ia32_divps256 (v8sf,v8sf)
-v8sf __builtin_ia32_dpps256 (v8sf,v8sf,int)
-v4df __builtin_ia32_haddpd256 (v4df,v4df)
-v8sf __builtin_ia32_haddps256 (v8sf,v8sf)
-v4df __builtin_ia32_hsubpd256 (v4df,v4df)
-v8sf __builtin_ia32_hsubps256 (v8sf,v8sf)
-v32qi __builtin_ia32_lddqu256 (pcchar)
-v32qi __builtin_ia32_loaddqu256 (pcchar)
-v4df __builtin_ia32_loadupd256 (pcdouble)
-v8sf __builtin_ia32_loadups256 (pcfloat)
-v2df __builtin_ia32_maskloadpd (pcv2df,v2df)
-v4df __builtin_ia32_maskloadpd256 (pcv4df,v4df)
-v4sf __builtin_ia32_maskloadps (pcv4sf,v4sf)
-v8sf __builtin_ia32_maskloadps256 (pcv8sf,v8sf)
-void __builtin_ia32_maskstorepd (pv2df,v2df,v2df)
-void __builtin_ia32_maskstorepd256 (pv4df,v4df,v4df)
-void __builtin_ia32_maskstoreps (pv4sf,v4sf,v4sf)
-void __builtin_ia32_maskstoreps256 (pv8sf,v8sf,v8sf)
-v4df __builtin_ia32_maxpd256 (v4df,v4df)
-v8sf __builtin_ia32_maxps256 (v8sf,v8sf)
-v4df __builtin_ia32_minpd256 (v4df,v4df)
-v8sf __builtin_ia32_minps256 (v8sf,v8sf)
-v4df __builtin_ia32_movddup256 (v4df)
-int __builtin_ia32_movmskpd256 (v4df)
-int __builtin_ia32_movmskps256 (v8sf)
-v8sf __builtin_ia32_movshdup256 (v8sf)
-v8sf __builtin_ia32_movsldup256 (v8sf)
-v4df __builtin_ia32_mulpd256 (v4df,v4df)
-v8sf __builtin_ia32_mulps256 (v8sf,v8sf)
-v4df __builtin_ia32_orpd256 (v4df,v4df)
-v8sf __builtin_ia32_orps256 (v8sf,v8sf)
-v2df __builtin_ia32_pd_pd256 (v4df)
-v4df __builtin_ia32_pd256_pd (v2df)
-v4sf __builtin_ia32_ps_ps256 (v8sf)
-v8sf __builtin_ia32_ps256_ps (v4sf)
-int __builtin_ia32_ptestc256 (v4di,v4di,ptest)
-int __builtin_ia32_ptestnzc256 (v4di,v4di,ptest)
-int __builtin_ia32_ptestz256 (v4di,v4di,ptest)
-v8sf __builtin_ia32_rcpps256 (v8sf)
-v4df __builtin_ia32_roundpd256 (v4df,int)
-v8sf __builtin_ia32_roundps256 (v8sf,int)
-v8sf __builtin_ia32_rsqrtps_nr256 (v8sf)
-v8sf __builtin_ia32_rsqrtps256 (v8sf)
-v4df __builtin_ia32_shufpd256 (v4df,v4df,int)
-v8sf __builtin_ia32_shufps256 (v8sf,v8sf,int)
-v4si __builtin_ia32_si_si256 (v8si)
-v8si __builtin_ia32_si256_si (v4si)
-v4df __builtin_ia32_sqrtpd256 (v4df)
-v8sf __builtin_ia32_sqrtps_nr256 (v8sf)
-v8sf __builtin_ia32_sqrtps256 (v8sf)
-void __builtin_ia32_storedqu256 (pchar,v32qi)
-void __builtin_ia32_storeupd256 (pdouble,v4df)
-void __builtin_ia32_storeups256 (pfloat,v8sf)
-v4df __builtin_ia32_subpd256 (v4df,v4df)
-v8sf __builtin_ia32_subps256 (v8sf,v8sf)
-v4df __builtin_ia32_unpckhpd256 (v4df,v4df)
-v8sf __builtin_ia32_unpckhps256 (v8sf,v8sf)
-v4df __builtin_ia32_unpcklpd256 (v4df,v4df)
-v8sf __builtin_ia32_unpcklps256 (v8sf,v8sf)
-v4df __builtin_ia32_vbroadcastf128_pd256 (pcv2df)
-v8sf __builtin_ia32_vbroadcastf128_ps256 (pcv4sf)
-v4df __builtin_ia32_vbroadcastsd256 (pcdouble)
-v4sf __builtin_ia32_vbroadcastss (pcfloat)
-v8sf __builtin_ia32_vbroadcastss256 (pcfloat)
-v2df __builtin_ia32_vextractf128_pd256 (v4df,int)
-v4sf __builtin_ia32_vextractf128_ps256 (v8sf,int)
-v4si __builtin_ia32_vextractf128_si256 (v8si,int)
-v4df __builtin_ia32_vinsertf128_pd256 (v4df,v2df,int)
-v8sf __builtin_ia32_vinsertf128_ps256 (v8sf,v4sf,int)
-v8si __builtin_ia32_vinsertf128_si256 (v8si,v4si,int)
-v4df __builtin_ia32_vperm2f128_pd256 (v4df,v4df,int)
-v8sf __builtin_ia32_vperm2f128_ps256 (v8sf,v8sf,int)
-v8si __builtin_ia32_vperm2f128_si256 (v8si,v8si,int)
-v2df __builtin_ia32_vpermil2pd (v2df,v2df,v2di,int)
-v4df __builtin_ia32_vpermil2pd256 (v4df,v4df,v4di,int)
-v4sf __builtin_ia32_vpermil2ps (v4sf,v4sf,v4si,int)
-v8sf __builtin_ia32_vpermil2ps256 (v8sf,v8sf,v8si,int)
-v2df __builtin_ia32_vpermilpd (v2df,int)
-v4df __builtin_ia32_vpermilpd256 (v4df,int)
-v4sf __builtin_ia32_vpermilps (v4sf,int)
-v8sf __builtin_ia32_vpermilps256 (v8sf,int)
-v2df __builtin_ia32_vpermilvarpd (v2df,v2di)
-v4df __builtin_ia32_vpermilvarpd256 (v4df,v4di)
-v4sf __builtin_ia32_vpermilvarps (v4sf,v4si)
-v8sf __builtin_ia32_vpermilvarps256 (v8sf,v8si)
-int __builtin_ia32_vtestcpd (v2df,v2df,ptest)
-int __builtin_ia32_vtestcpd256 (v4df,v4df,ptest)
-int __builtin_ia32_vtestcps (v4sf,v4sf,ptest)
-int __builtin_ia32_vtestcps256 (v8sf,v8sf,ptest)
-int __builtin_ia32_vtestnzcpd (v2df,v2df,ptest)
-int __builtin_ia32_vtestnzcpd256 (v4df,v4df,ptest)
-int __builtin_ia32_vtestnzcps (v4sf,v4sf,ptest)
-int __builtin_ia32_vtestnzcps256 (v8sf,v8sf,ptest)
-int __builtin_ia32_vtestzpd (v2df,v2df,ptest)
-int __builtin_ia32_vtestzpd256 (v4df,v4df,ptest)
-int __builtin_ia32_vtestzps (v4sf,v4sf,ptest)
-int __builtin_ia32_vtestzps256 (v8sf,v8sf,ptest)
-void __builtin_ia32_vzeroall (void)
-void __builtin_ia32_vzeroupper (void)
-v4df __builtin_ia32_xorpd256 (v4df,v4df)
-v8sf __builtin_ia32_xorps256 (v8sf,v8sf)
-@end smallexample
-
-The following built-in functions are available when @option{-mavx2} is
-used. All of them generate the machine instruction that is part of the
-name.
-
-@smallexample
-v32qi __builtin_ia32_mpsadbw256 (v32qi,v32qi,int)
-v32qi __builtin_ia32_pabsb256 (v32qi)
-v16hi __builtin_ia32_pabsw256 (v16hi)
-v8si __builtin_ia32_pabsd256 (v8si)
-v16hi __builtin_ia32_packssdw256 (v8si,v8si)
-v32qi __builtin_ia32_packsswb256 (v16hi,v16hi)
-v16hi __builtin_ia32_packusdw256 (v8si,v8si)
-v32qi __builtin_ia32_packuswb256 (v16hi,v16hi)
-v32qi __builtin_ia32_paddb256 (v32qi,v32qi)
-v16hi __builtin_ia32_paddw256 (v16hi,v16hi)
-v8si __builtin_ia32_paddd256 (v8si,v8si)
-v4di __builtin_ia32_paddq256 (v4di,v4di)
-v32qi __builtin_ia32_paddsb256 (v32qi,v32qi)
-v16hi __builtin_ia32_paddsw256 (v16hi,v16hi)
-v32qi __builtin_ia32_paddusb256 (v32qi,v32qi)
-v16hi __builtin_ia32_paddusw256 (v16hi,v16hi)
-v4di __builtin_ia32_palignr256 (v4di,v4di,int)
-v4di __builtin_ia32_andsi256 (v4di,v4di)
-v4di __builtin_ia32_andnotsi256 (v4di,v4di)
-v32qi __builtin_ia32_pavgb256 (v32qi,v32qi)
-v16hi __builtin_ia32_pavgw256 (v16hi,v16hi)
-v32qi __builtin_ia32_pblendvb256 (v32qi,v32qi,v32qi)
-v16hi __builtin_ia32_pblendw256 (v16hi,v16hi,int)
-v32qi __builtin_ia32_pcmpeqb256 (v32qi,v32qi)
-v16hi __builtin_ia32_pcmpeqw256 (v16hi,v16hi)
-v8si __builtin_ia32_pcmpeqd256 (v8si,v8si)
-v4di __builtin_ia32_pcmpeqq256 (v4di,v4di)
-v32qi __builtin_ia32_pcmpgtb256 (v32qi,v32qi)
-v16hi __builtin_ia32_pcmpgtw256 (v16hi,v16hi)
-v8si __builtin_ia32_pcmpgtd256 (v8si,v8si)
-v4di __builtin_ia32_pcmpgtq256 (v4di,v4di)
-v16hi __builtin_ia32_phaddw256 (v16hi,v16hi)
-v8si __builtin_ia32_phaddd256 (v8si,v8si)
-v16hi __builtin_ia32_phaddsw256 (v16hi,v16hi)
-v16hi __builtin_ia32_phsubw256 (v16hi,v16hi)
-v8si __builtin_ia32_phsubd256 (v8si,v8si)
-v16hi __builtin_ia32_phsubsw256 (v16hi,v16hi)
-v32qi __builtin_ia32_pmaddubsw256 (v32qi,v32qi)
-v16hi __builtin_ia32_pmaddwd256 (v16hi,v16hi)
-v32qi __builtin_ia32_pmaxsb256 (v32qi,v32qi)
-v16hi __builtin_ia32_pmaxsw256 (v16hi,v16hi)
-v8si __builtin_ia32_pmaxsd256 (v8si,v8si)
-v32qi __builtin_ia32_pmaxub256 (v32qi,v32qi)
-v16hi __builtin_ia32_pmaxuw256 (v16hi,v16hi)
-v8si __builtin_ia32_pmaxud256 (v8si,v8si)
-v32qi __builtin_ia32_pminsb256 (v32qi,v32qi)
-v16hi __builtin_ia32_pminsw256 (v16hi,v16hi)
-v8si __builtin_ia32_pminsd256 (v8si,v8si)
-v32qi __builtin_ia32_pminub256 (v32qi,v32qi)
-v16hi __builtin_ia32_pminuw256 (v16hi,v16hi)
-v8si __builtin_ia32_pminud256 (v8si,v8si)
-int __builtin_ia32_pmovmskb256 (v32qi)
-v16hi __builtin_ia32_pmovsxbw256 (v16qi)
-v8si __builtin_ia32_pmovsxbd256 (v16qi)
-v4di __builtin_ia32_pmovsxbq256 (v16qi)
-v8si __builtin_ia32_pmovsxwd256 (v8hi)
-v4di __builtin_ia32_pmovsxwq256 (v8hi)
-v4di __builtin_ia32_pmovsxdq256 (v4si)
-v16hi __builtin_ia32_pmovzxbw256 (v16qi)
-v8si __builtin_ia32_pmovzxbd256 (v16qi)
-v4di __builtin_ia32_pmovzxbq256 (v16qi)
-v8si __builtin_ia32_pmovzxwd256 (v8hi)
-v4di __builtin_ia32_pmovzxwq256 (v8hi)
-v4di __builtin_ia32_pmovzxdq256 (v4si)
-v4di __builtin_ia32_pmuldq256 (v8si,v8si)
-v16hi __builtin_ia32_pmulhrsw256 (v16hi, v16hi)
-v16hi __builtin_ia32_pmulhuw256 (v16hi,v16hi)
-v16hi __builtin_ia32_pmulhw256 (v16hi,v16hi)
-v16hi __builtin_ia32_pmullw256 (v16hi,v16hi)
-v8si __builtin_ia32_pmulld256 (v8si,v8si)
-v4di __builtin_ia32_pmuludq256 (v8si,v8si)
-v4di __builtin_ia32_por256 (v4di,v4di)
-v16hi __builtin_ia32_psadbw256 (v32qi,v32qi)
-v32qi __builtin_ia32_pshufb256 (v32qi,v32qi)
-v8si __builtin_ia32_pshufd256 (v8si,int)
-v16hi __builtin_ia32_pshufhw256 (v16hi,int)
-v16hi __builtin_ia32_pshuflw256 (v16hi,int)
-v32qi __builtin_ia32_psignb256 (v32qi,v32qi)
-v16hi __builtin_ia32_psignw256 (v16hi,v16hi)
-v8si __builtin_ia32_psignd256 (v8si,v8si)
-v4di __builtin_ia32_pslldqi256 (v4di,int)
-v16hi __builtin_ia32_psllwi256 (v16hi,int)
-v16hi __builtin_ia32_psllw256 (v16hi,v8hi)
-v8si __builtin_ia32_pslldi256 (v8si,int)
-v8si __builtin_ia32_pslld256 (v8si,v4si)
-v4di __builtin_ia32_psllqi256 (v4di,int)
-v4di __builtin_ia32_psllq256 (v4di,v2di)
-v16hi __builtin_ia32_psrawi256 (v16hi,int)
-v16hi __builtin_ia32_psraw256 (v16hi,v8hi)
-v8si __builtin_ia32_psradi256 (v8si,int)
-v8si __builtin_ia32_psrad256 (v8si,v4si)
-v4di __builtin_ia32_psrldqi256 (v4di, int)
-v16hi __builtin_ia32_psrlwi256 (v16hi,int)
-v16hi __builtin_ia32_psrlw256 (v16hi,v8hi)
-v8si __builtin_ia32_psrldi256 (v8si,int)
-v8si __builtin_ia32_psrld256 (v8si,v4si)
-v4di __builtin_ia32_psrlqi256 (v4di,int)
-v4di __builtin_ia32_psrlq256 (v4di,v2di)
-v32qi __builtin_ia32_psubb256 (v32qi,v32qi)
-v16hi __builtin_ia32_psubw256 (v16hi,v16hi)
-v8si __builtin_ia32_psubd256 (v8si,v8si)
-v4di __builtin_ia32_psubq256 (v4di,v4di)
-v32qi __builtin_ia32_psubsb256 (v32qi,v32qi)
-v16hi __builtin_ia32_psubsw256 (v16hi,v16hi)
-v32qi __builtin_ia32_psubusb256 (v32qi,v32qi)
-v16hi __builtin_ia32_psubusw256 (v16hi,v16hi)
-v32qi __builtin_ia32_punpckhbw256 (v32qi,v32qi)
-v16hi __builtin_ia32_punpckhwd256 (v16hi,v16hi)
-v8si __builtin_ia32_punpckhdq256 (v8si,v8si)
-v4di __builtin_ia32_punpckhqdq256 (v4di,v4di)
-v32qi __builtin_ia32_punpcklbw256 (v32qi,v32qi)
-v16hi __builtin_ia32_punpcklwd256 (v16hi,v16hi)
-v8si __builtin_ia32_punpckldq256 (v8si,v8si)
-v4di __builtin_ia32_punpcklqdq256 (v4di,v4di)
-v4di __builtin_ia32_pxor256 (v4di,v4di)
-v4di __builtin_ia32_movntdqa256 (pv4di)
-v4sf __builtin_ia32_vbroadcastss_ps (v4sf)
-v8sf __builtin_ia32_vbroadcastss_ps256 (v4sf)
-v4df __builtin_ia32_vbroadcastsd_pd256 (v2df)
-v4di __builtin_ia32_vbroadcastsi256 (v2di)
-v4si __builtin_ia32_pblendd128 (v4si,v4si)
-v8si __builtin_ia32_pblendd256 (v8si,v8si)
-v32qi __builtin_ia32_pbroadcastb256 (v16qi)
-v16hi __builtin_ia32_pbroadcastw256 (v8hi)
-v8si __builtin_ia32_pbroadcastd256 (v4si)
-v4di __builtin_ia32_pbroadcastq256 (v2di)
-v16qi __builtin_ia32_pbroadcastb128 (v16qi)
-v8hi __builtin_ia32_pbroadcastw128 (v8hi)
-v4si __builtin_ia32_pbroadcastd128 (v4si)
-v2di __builtin_ia32_pbroadcastq128 (v2di)
-v8si __builtin_ia32_permvarsi256 (v8si,v8si)
-v4df __builtin_ia32_permdf256 (v4df,int)
-v8sf __builtin_ia32_permvarsf256 (v8sf,v8sf)
-v4di __builtin_ia32_permdi256 (v4di,int)
-v4di __builtin_ia32_permti256 (v4di,v4di,int)
-v4di __builtin_ia32_extract128i256 (v4di,int)
-v4di __builtin_ia32_insert128i256 (v4di,v2di,int)
-v8si __builtin_ia32_maskloadd256 (pcv8si,v8si)
-v4di __builtin_ia32_maskloadq256 (pcv4di,v4di)
-v4si __builtin_ia32_maskloadd (pcv4si,v4si)
-v2di __builtin_ia32_maskloadq (pcv2di,v2di)
-void __builtin_ia32_maskstored256 (pv8si,v8si,v8si)
-void __builtin_ia32_maskstoreq256 (pv4di,v4di,v4di)
-void __builtin_ia32_maskstored (pv4si,v4si,v4si)
-void __builtin_ia32_maskstoreq (pv2di,v2di,v2di)
-v8si __builtin_ia32_psllv8si (v8si,v8si)
-v4si __builtin_ia32_psllv4si (v4si,v4si)
-v4di __builtin_ia32_psllv4di (v4di,v4di)
-v2di __builtin_ia32_psllv2di (v2di,v2di)
-v8si __builtin_ia32_psrav8si (v8si,v8si)
-v4si __builtin_ia32_psrav4si (v4si,v4si)
-v8si __builtin_ia32_psrlv8si (v8si,v8si)
-v4si __builtin_ia32_psrlv4si (v4si,v4si)
-v4di __builtin_ia32_psrlv4di (v4di,v4di)
-v2di __builtin_ia32_psrlv2di (v2di,v2di)
-v2df __builtin_ia32_gathersiv2df (v2df, pcdouble,v4si,v2df,int)
-v4df __builtin_ia32_gathersiv4df (v4df, pcdouble,v4si,v4df,int)
-v2df __builtin_ia32_gatherdiv2df (v2df, pcdouble,v2di,v2df,int)
-v4df __builtin_ia32_gatherdiv4df (v4df, pcdouble,v4di,v4df,int)
-v4sf __builtin_ia32_gathersiv4sf (v4sf, pcfloat,v4si,v4sf,int)
-v8sf __builtin_ia32_gathersiv8sf (v8sf, pcfloat,v8si,v8sf,int)
-v4sf __builtin_ia32_gatherdiv4sf (v4sf, pcfloat,v2di,v4sf,int)
-v4sf __builtin_ia32_gatherdiv4sf256 (v4sf, pcfloat,v4di,v4sf,int)
-v2di __builtin_ia32_gathersiv2di (v2di, pcint64,v4si,v2di,int)
-v4di __builtin_ia32_gathersiv4di (v4di, pcint64,v4si,v4di,int)
-v2di __builtin_ia32_gatherdiv2di (v2di, pcint64,v2di,v2di,int)
-v4di __builtin_ia32_gatherdiv4di (v4di, pcint64,v4di,v4di,int)
-v4si __builtin_ia32_gathersiv4si (v4si, pcint,v4si,v4si,int)
-v8si __builtin_ia32_gathersiv8si (v8si, pcint,v8si,v8si,int)
-v4si __builtin_ia32_gatherdiv4si (v4si, pcint,v2di,v4si,int)
-v4si __builtin_ia32_gatherdiv4si256 (v4si, pcint,v4di,v4si,int)
-@end smallexample
-
-The following built-in functions are available when @option{-maes} is
-used. All of them generate the machine instruction that is part of the
-name.
-
-@smallexample
-v2di __builtin_ia32_aesenc128 (v2di, v2di)
-v2di __builtin_ia32_aesenclast128 (v2di, v2di)
-v2di __builtin_ia32_aesdec128 (v2di, v2di)
-v2di __builtin_ia32_aesdeclast128 (v2di, v2di)
-v2di __builtin_ia32_aeskeygenassist128 (v2di, const int)
-v2di __builtin_ia32_aesimc128 (v2di)
-@end smallexample
-
-The following built-in function is available when @option{-mpclmul} is
-used.
-
-@table @code
-@item v2di __builtin_ia32_pclmulqdq128 (v2di, v2di, const int)
-Generates the @code{pclmulqdq} machine instruction.
-@end table
-
-The following built-in functions are available when @option{-mfsgsbase} is
-used. All of them generate the machine instruction that is part of the
-name.
-
-@smallexample
-unsigned int __builtin_ia32_rdfsbase32 (void)
-unsigned long long __builtin_ia32_rdfsbase64 (void)
-unsigned int __builtin_ia32_rdgsbase32 (void)
-unsigned long long __builtin_ia32_rdgsbase64 (void)
-void _writefsbase_u32 (unsigned int)
-void _writefsbase_u64 (unsigned long long)
-void _writegsbase_u32 (unsigned int)
-void _writegsbase_u64 (unsigned long long)
-@end smallexample
-
-The following built-in functions are available when @option{-mrdrnd} is
-used. All of them generate the machine instruction that is part of the
-name.
-
-@smallexample
-unsigned int __builtin_ia32_rdrand16_step (unsigned short *)
-unsigned int __builtin_ia32_rdrand32_step (unsigned int *)
-unsigned int __builtin_ia32_rdrand64_step (unsigned long long *)
-@end smallexample
-
-The following built-in functions are available when @option{-msse4a} is used.
-All of them generate the machine instruction that is part of the name.
-
-@smallexample
-void __builtin_ia32_movntsd (double *, v2df)
-void __builtin_ia32_movntss (float *, v4sf)
-v2di __builtin_ia32_extrq (v2di, v16qi)
-v2di __builtin_ia32_extrqi (v2di, const unsigned int, const unsigned int)
-v2di __builtin_ia32_insertq (v2di, v2di)
-v2di __builtin_ia32_insertqi (v2di, v2di, const unsigned int, const unsigned int)
-@end smallexample
-
-The following built-in functions are available when @option{-mxop} is used.
-@smallexample
-v2df __builtin_ia32_vfrczpd (v2df)
-v4sf __builtin_ia32_vfrczps (v4sf)
-v2df __builtin_ia32_vfrczsd (v2df)
-v4sf __builtin_ia32_vfrczss (v4sf)
-v4df __builtin_ia32_vfrczpd256 (v4df)
-v8sf __builtin_ia32_vfrczps256 (v8sf)
-v2di __builtin_ia32_vpcmov (v2di, v2di, v2di)
-v2di __builtin_ia32_vpcmov_v2di (v2di, v2di, v2di)
-v4si __builtin_ia32_vpcmov_v4si (v4si, v4si, v4si)
-v8hi __builtin_ia32_vpcmov_v8hi (v8hi, v8hi, v8hi)
-v16qi __builtin_ia32_vpcmov_v16qi (v16qi, v16qi, v16qi)
-v2df __builtin_ia32_vpcmov_v2df (v2df, v2df, v2df)
-v4sf __builtin_ia32_vpcmov_v4sf (v4sf, v4sf, v4sf)
-v4di __builtin_ia32_vpcmov_v4di256 (v4di, v4di, v4di)
-v8si __builtin_ia32_vpcmov_v8si256 (v8si, v8si, v8si)
-v16hi __builtin_ia32_vpcmov_v16hi256 (v16hi, v16hi, v16hi)
-v32qi __builtin_ia32_vpcmov_v32qi256 (v32qi, v32qi, v32qi)
-v4df __builtin_ia32_vpcmov_v4df256 (v4df, v4df, v4df)
-v8sf __builtin_ia32_vpcmov_v8sf256 (v8sf, v8sf, v8sf)
-v16qi __builtin_ia32_vpcomeqb (v16qi, v16qi)
-v8hi __builtin_ia32_vpcomeqw (v8hi, v8hi)
-v4si __builtin_ia32_vpcomeqd (v4si, v4si)
-v2di __builtin_ia32_vpcomeqq (v2di, v2di)
-v16qi __builtin_ia32_vpcomequb (v16qi, v16qi)
-v4si __builtin_ia32_vpcomequd (v4si, v4si)
-v2di __builtin_ia32_vpcomequq (v2di, v2di)
-v8hi __builtin_ia32_vpcomequw (v8hi, v8hi)
-v8hi __builtin_ia32_vpcomeqw (v8hi, v8hi)
-v16qi __builtin_ia32_vpcomfalseb (v16qi, v16qi)
-v4si __builtin_ia32_vpcomfalsed (v4si, v4si)
-v2di __builtin_ia32_vpcomfalseq (v2di, v2di)
-v16qi __builtin_ia32_vpcomfalseub (v16qi, v16qi)
-v4si __builtin_ia32_vpcomfalseud (v4si, v4si)
-v2di __builtin_ia32_vpcomfalseuq (v2di, v2di)
-v8hi __builtin_ia32_vpcomfalseuw (v8hi, v8hi)
-v8hi __builtin_ia32_vpcomfalsew (v8hi, v8hi)
-v16qi __builtin_ia32_vpcomgeb (v16qi, v16qi)
-v4si __builtin_ia32_vpcomged (v4si, v4si)
-v2di __builtin_ia32_vpcomgeq (v2di, v2di)
-v16qi __builtin_ia32_vpcomgeub (v16qi, v16qi)
-v4si __builtin_ia32_vpcomgeud (v4si, v4si)
-v2di __builtin_ia32_vpcomgeuq (v2di, v2di)
-v8hi __builtin_ia32_vpcomgeuw (v8hi, v8hi)
-v8hi __builtin_ia32_vpcomgew (v8hi, v8hi)
-v16qi __builtin_ia32_vpcomgtb (v16qi, v16qi)
-v4si __builtin_ia32_vpcomgtd (v4si, v4si)
-v2di __builtin_ia32_vpcomgtq (v2di, v2di)
-v16qi __builtin_ia32_vpcomgtub (v16qi, v16qi)
-v4si __builtin_ia32_vpcomgtud (v4si, v4si)
-v2di __builtin_ia32_vpcomgtuq (v2di, v2di)
-v8hi __builtin_ia32_vpcomgtuw (v8hi, v8hi)
-v8hi __builtin_ia32_vpcomgtw (v8hi, v8hi)
-v16qi __builtin_ia32_vpcomleb (v16qi, v16qi)
-v4si __builtin_ia32_vpcomled (v4si, v4si)
-v2di __builtin_ia32_vpcomleq (v2di, v2di)
-v16qi __builtin_ia32_vpcomleub (v16qi, v16qi)
-v4si __builtin_ia32_vpcomleud (v4si, v4si)
-v2di __builtin_ia32_vpcomleuq (v2di, v2di)
-v8hi __builtin_ia32_vpcomleuw (v8hi, v8hi)
-v8hi __builtin_ia32_vpcomlew (v8hi, v8hi)
-v16qi __builtin_ia32_vpcomltb (v16qi, v16qi)
-v4si __builtin_ia32_vpcomltd (v4si, v4si)
-v2di __builtin_ia32_vpcomltq (v2di, v2di)
-v16qi __builtin_ia32_vpcomltub (v16qi, v16qi)
-v4si __builtin_ia32_vpcomltud (v4si, v4si)
-v2di __builtin_ia32_vpcomltuq (v2di, v2di)
-v8hi __builtin_ia32_vpcomltuw (v8hi, v8hi)
-v8hi __builtin_ia32_vpcomltw (v8hi, v8hi)
-v16qi __builtin_ia32_vpcomneb (v16qi, v16qi)
-v4si __builtin_ia32_vpcomned (v4si, v4si)
-v2di __builtin_ia32_vpcomneq (v2di, v2di)
-v16qi __builtin_ia32_vpcomneub (v16qi, v16qi)
-v4si __builtin_ia32_vpcomneud (v4si, v4si)
-v2di __builtin_ia32_vpcomneuq (v2di, v2di)
-v8hi __builtin_ia32_vpcomneuw (v8hi, v8hi)
-v8hi __builtin_ia32_vpcomnew (v8hi, v8hi)
-v16qi __builtin_ia32_vpcomtrueb (v16qi, v16qi)
-v4si __builtin_ia32_vpcomtrued (v4si, v4si)
-v2di __builtin_ia32_vpcomtrueq (v2di, v2di)
-v16qi __builtin_ia32_vpcomtrueub (v16qi, v16qi)
-v4si __builtin_ia32_vpcomtrueud (v4si, v4si)
-v2di __builtin_ia32_vpcomtrueuq (v2di, v2di)
-v8hi __builtin_ia32_vpcomtrueuw (v8hi, v8hi)
-v8hi __builtin_ia32_vpcomtruew (v8hi, v8hi)
-v4si __builtin_ia32_vphaddbd (v16qi)
-v2di __builtin_ia32_vphaddbq (v16qi)
-v8hi __builtin_ia32_vphaddbw (v16qi)
-v2di __builtin_ia32_vphadddq (v4si)
-v4si __builtin_ia32_vphaddubd (v16qi)
-v2di __builtin_ia32_vphaddubq (v16qi)
-v8hi __builtin_ia32_vphaddubw (v16qi)
-v2di __builtin_ia32_vphaddudq (v4si)
-v4si __builtin_ia32_vphadduwd (v8hi)
-v2di __builtin_ia32_vphadduwq (v8hi)
-v4si __builtin_ia32_vphaddwd (v8hi)
-v2di __builtin_ia32_vphaddwq (v8hi)
-v8hi __builtin_ia32_vphsubbw (v16qi)
-v2di __builtin_ia32_vphsubdq (v4si)
-v4si __builtin_ia32_vphsubwd (v8hi)
-v4si __builtin_ia32_vpmacsdd (v4si, v4si, v4si)
-v2di __builtin_ia32_vpmacsdqh (v4si, v4si, v2di)
-v2di __builtin_ia32_vpmacsdql (v4si, v4si, v2di)
-v4si __builtin_ia32_vpmacssdd (v4si, v4si, v4si)
-v2di __builtin_ia32_vpmacssdqh (v4si, v4si, v2di)
-v2di __builtin_ia32_vpmacssdql (v4si, v4si, v2di)
-v4si __builtin_ia32_vpmacsswd (v8hi, v8hi, v4si)
-v8hi __builtin_ia32_vpmacssww (v8hi, v8hi, v8hi)
-v4si __builtin_ia32_vpmacswd (v8hi, v8hi, v4si)
-v8hi __builtin_ia32_vpmacsww (v8hi, v8hi, v8hi)
-v4si __builtin_ia32_vpmadcsswd (v8hi, v8hi, v4si)
-v4si __builtin_ia32_vpmadcswd (v8hi, v8hi, v4si)
-v16qi __builtin_ia32_vpperm (v16qi, v16qi, v16qi)
-v16qi __builtin_ia32_vprotb (v16qi, v16qi)
-v4si __builtin_ia32_vprotd (v4si, v4si)
-v2di __builtin_ia32_vprotq (v2di, v2di)
-v8hi __builtin_ia32_vprotw (v8hi, v8hi)
-v16qi __builtin_ia32_vpshab (v16qi, v16qi)
-v4si __builtin_ia32_vpshad (v4si, v4si)
-v2di __builtin_ia32_vpshaq (v2di, v2di)
-v8hi __builtin_ia32_vpshaw (v8hi, v8hi)
-v16qi __builtin_ia32_vpshlb (v16qi, v16qi)
-v4si __builtin_ia32_vpshld (v4si, v4si)
-v2di __builtin_ia32_vpshlq (v2di, v2di)
-v8hi __builtin_ia32_vpshlw (v8hi, v8hi)
-@end smallexample
-
-The following built-in functions are available when @option{-mfma4} is used.
-All of them generate the machine instruction that is part of the name.
-
-@smallexample
-v2df __builtin_ia32_vfmaddpd (v2df, v2df, v2df)
-v4sf __builtin_ia32_vfmaddps (v4sf, v4sf, v4sf)
-v2df __builtin_ia32_vfmaddsd (v2df, v2df, v2df)
-v4sf __builtin_ia32_vfmaddss (v4sf, v4sf, v4sf)
-v2df __builtin_ia32_vfmsubpd (v2df, v2df, v2df)
-v4sf __builtin_ia32_vfmsubps (v4sf, v4sf, v4sf)
-v2df __builtin_ia32_vfmsubsd (v2df, v2df, v2df)
-v4sf __builtin_ia32_vfmsubss (v4sf, v4sf, v4sf)
-v2df __builtin_ia32_vfnmaddpd (v2df, v2df, v2df)
-v4sf __builtin_ia32_vfnmaddps (v4sf, v4sf, v4sf)
-v2df __builtin_ia32_vfnmaddsd (v2df, v2df, v2df)
-v4sf __builtin_ia32_vfnmaddss (v4sf, v4sf, v4sf)
-v2df __builtin_ia32_vfnmsubpd (v2df, v2df, v2df)
-v4sf __builtin_ia32_vfnmsubps (v4sf, v4sf, v4sf)
-v2df __builtin_ia32_vfnmsubsd (v2df, v2df, v2df)
-v4sf __builtin_ia32_vfnmsubss (v4sf, v4sf, v4sf)
-v2df __builtin_ia32_vfmaddsubpd (v2df, v2df, v2df)
-v4sf __builtin_ia32_vfmaddsubps (v4sf, v4sf, v4sf)
-v2df __builtin_ia32_vfmsubaddpd (v2df, v2df, v2df)
-v4sf __builtin_ia32_vfmsubaddps (v4sf, v4sf, v4sf)
-v4df __builtin_ia32_vfmaddpd256 (v4df, v4df, v4df)
-v8sf __builtin_ia32_vfmaddps256 (v8sf, v8sf, v8sf)
-v4df __builtin_ia32_vfmsubpd256 (v4df, v4df, v4df)
-v8sf __builtin_ia32_vfmsubps256 (v8sf, v8sf, v8sf)
-v4df __builtin_ia32_vfnmaddpd256 (v4df, v4df, v4df)
-v8sf __builtin_ia32_vfnmaddps256 (v8sf, v8sf, v8sf)
-v4df __builtin_ia32_vfnmsubpd256 (v4df, v4df, v4df)
-v8sf __builtin_ia32_vfnmsubps256 (v8sf, v8sf, v8sf)
-v4df __builtin_ia32_vfmaddsubpd256 (v4df, v4df, v4df)
-v8sf __builtin_ia32_vfmaddsubps256 (v8sf, v8sf, v8sf)
-v4df __builtin_ia32_vfmsubaddpd256 (v4df, v4df, v4df)
-v8sf __builtin_ia32_vfmsubaddps256 (v8sf, v8sf, v8sf)
-
-@end smallexample
-
-The following built-in functions are available when @option{-mlwp} is used.
-
-@smallexample
-void __builtin_ia32_llwpcb16 (void *);
-void __builtin_ia32_llwpcb32 (void *);
-void __builtin_ia32_llwpcb64 (void *);
-void * __builtin_ia32_slwpcb16 (void);
-void * __builtin_ia32_slwpcb32 (void);
-void * __builtin_ia32_slwpcb64 (void);
-void __builtin_ia32_lwpval16 (unsigned short, unsigned int, unsigned short)
-void __builtin_ia32_lwpval32 (unsigned int, unsigned int, unsigned int)
-void __builtin_ia32_lwpval64 (unsigned __int64, unsigned int, unsigned int)
-unsigned char __builtin_ia32_lwpins16 (unsigned short, unsigned int, unsigned short)
-unsigned char __builtin_ia32_lwpins32 (unsigned int, unsigned int, unsigned int)
-unsigned char __builtin_ia32_lwpins64 (unsigned __int64, unsigned int, unsigned int)
-@end smallexample
-
-The following built-in functions are available when @option{-mbmi} is used.
-All of them generate the machine instruction that is part of the name.
-@smallexample
-unsigned int __builtin_ia32_bextr_u32 (unsigned int, unsigned int);
-unsigned long long __builtin_ia32_bextr_u64 (unsigned long long, unsigned long long);
-@end smallexample
-
-The following built-in functions are available when @option{-mbmi2} is used.
-All of them generate the machine instruction that is part of the name.
-@smallexample
-unsigned int _bzhi_u32 (unsigned int, unsigned int)
-unsigned int _pdep_u32 (unsigned int, unsigned int)
-unsigned int _pext_u32 (unsigned int, unsigned int)
-unsigned long long _bzhi_u64 (unsigned long long, unsigned long long)
-unsigned long long _pdep_u64 (unsigned long long, unsigned long long)
-unsigned long long _pext_u64 (unsigned long long, unsigned long long)
-@end smallexample
-
-The following built-in functions are available when @option{-mlzcnt} is used.
-All of them generate the machine instruction that is part of the name.
-@smallexample
-unsigned short __builtin_ia32_lzcnt_u16 (unsigned short);
-unsigned int __builtin_ia32_lzcnt_u32 (unsigned int);
-unsigned long long __builtin_ia32_lzcnt_u64 (unsigned long long);
-@end smallexample
-
-The following built-in functions are available when @option{-mfxsr} is used.
-All of them generate the machine instruction that is part of the name.
-@smallexample
-void __builtin_ia32_fxsave (void *)
-void __builtin_ia32_fxrstor (void *)
-void __builtin_ia32_fxsave64 (void *)
-void __builtin_ia32_fxrstor64 (void *)
-@end smallexample
-
-The following built-in functions are available when @option{-mxsave} is used.
-All of them generate the machine instruction that is part of the name.
-@smallexample
-void __builtin_ia32_xsave (void *, long long)
-void __builtin_ia32_xrstor (void *, long long)
-void __builtin_ia32_xsave64 (void *, long long)
-void __builtin_ia32_xrstor64 (void *, long long)
-@end smallexample
-
-The following built-in functions are available when @option{-mxsaveopt} is used.
-All of them generate the machine instruction that is part of the name.
-@smallexample
-void __builtin_ia32_xsaveopt (void *, long long)
-void __builtin_ia32_xsaveopt64 (void *, long long)
-@end smallexample
-
-The following built-in functions are available when @option{-mtbm} is used.
-Both of them generate the immediate form of the @code{bextr} machine instruction.
-@smallexample
-unsigned int __builtin_ia32_bextri_u32 (unsigned int, const unsigned int);
-unsigned long long __builtin_ia32_bextri_u64 (unsigned long long, const unsigned long long);
-@end smallexample
-
-
-The following built-in functions are available when @option{-m3dnow} is used.
-All of them generate the machine instruction that is part of the name.
-
-@smallexample
-void __builtin_ia32_femms (void)
-v8qi __builtin_ia32_pavgusb (v8qi, v8qi)
-v2si __builtin_ia32_pf2id (v2sf)
-v2sf __builtin_ia32_pfacc (v2sf, v2sf)
-v2sf __builtin_ia32_pfadd (v2sf, v2sf)
-v2si __builtin_ia32_pfcmpeq (v2sf, v2sf)
-v2si __builtin_ia32_pfcmpge (v2sf, v2sf)
-v2si __builtin_ia32_pfcmpgt (v2sf, v2sf)
-v2sf __builtin_ia32_pfmax (v2sf, v2sf)
-v2sf __builtin_ia32_pfmin (v2sf, v2sf)
-v2sf __builtin_ia32_pfmul (v2sf, v2sf)
-v2sf __builtin_ia32_pfrcp (v2sf)
-v2sf __builtin_ia32_pfrcpit1 (v2sf, v2sf)
-v2sf __builtin_ia32_pfrcpit2 (v2sf, v2sf)
-v2sf __builtin_ia32_pfrsqrt (v2sf)
-v2sf __builtin_ia32_pfsub (v2sf, v2sf)
-v2sf __builtin_ia32_pfsubr (v2sf, v2sf)
-v2sf __builtin_ia32_pi2fd (v2si)
-v4hi __builtin_ia32_pmulhrw (v4hi, v4hi)
-@end smallexample
-
-The following built-in functions are available when both @option{-m3dnow}
-and @option{-march=athlon} are used. All of them generate the machine
-instruction that is part of the name.
-
-@smallexample
-v2si __builtin_ia32_pf2iw (v2sf)
-v2sf __builtin_ia32_pfnacc (v2sf, v2sf)
-v2sf __builtin_ia32_pfpnacc (v2sf, v2sf)
-v2sf __builtin_ia32_pi2fw (v2si)
-v2sf __builtin_ia32_pswapdsf (v2sf)
-v2si __builtin_ia32_pswapdsi (v2si)
-@end smallexample
-
-The following built-in functions are available when @option{-mrtm} is used.
-They are used for restricted transactional memory. These are the internal
-low-level functions. Normally the functions in
-@ref{x86 transactional memory intrinsics} should be used instead.
-
-@smallexample
-int __builtin_ia32_xbegin ()
-void __builtin_ia32_xend ()
-void __builtin_ia32_xabort (status)
-int __builtin_ia32_xtest ()
-@end smallexample
-
-@node x86 transactional memory intrinsics
-@subsection x86 transactional memory intrinsics
-
-These hardware transactional memory intrinsics for x86 allow the use of
-memory transactions with RTM (Restricted Transactional Memory).
-To use HLE (Hardware Lock Elision), see
-@ref{x86 specific memory model extensions for transactional memory} instead.
-This support is enabled with the @option{-mrtm} option.
-
-A memory transaction commits all changes to memory in an atomic way,
-as visible to other threads. If the transaction fails it is rolled back
-and all side effects discarded.
-
-Generally there is no guarantee that a memory transaction ever succeeds
-and suitable fallback code always needs to be supplied.
-
-@deftypefn {RTM Function} {unsigned} _xbegin ()
-Start an RTM (Restricted Transactional Memory) transaction.
-Returns @code{_XBEGIN_STARTED} when the transaction
-started successfully (note this is not 0, so the constant has to be
-explicitly tested). When the transaction aborts, all side effects
-are undone and an abort code is returned. There is no guarantee
-any transaction ever succeeds, so there always needs to be a valid
-tested fallback path.
-@end deftypefn
-
-@smallexample
-#include <immintrin.h>
-
-if ((status = _xbegin ()) == _XBEGIN_STARTED) @{
- ... transaction code...
- _xend ();
-@} else @{
- ... non transactional fallback path...
-@}
-@end smallexample
-
-Valid abort status bits (when the value is not @code{_XBEGIN_STARTED}) are:
-
-@table @code
-@item _XABORT_EXPLICIT
-Transaction explicitly aborted with @code{_xabort}. The parameter passed
-to @code{_xabort} is available with @code{_XABORT_CODE(status)}.
-@item _XABORT_RETRY
-Transaction retry is possible.
-@item _XABORT_CONFLICT
-Transaction abort due to a memory conflict with another thread.
-@item _XABORT_CAPACITY
-Transaction abort due to the transaction using too much memory.
-@item _XABORT_DEBUG
-Transaction abort due to a debug trap.
-@item _XABORT_NESTED
-Transaction abort in an inner nested transaction.
-@end table
-
-@deftypefn {RTM Function} {void} _xend ()
-Commit the current transaction. When no transaction is active this will
-fault. All memory side effects of the transaction will become visible
-to other threads in an atomic manner.
-@end deftypefn
-
-@deftypefn {RTM Function} {int} _xtest ()
-Return a nonzero value when a transaction is currently active, otherwise 0.
-@end deftypefn
-
-@deftypefn {RTM Function} {void} _xabort (status)
-Abort the current transaction. When no transaction is active this is a no-op.
-@var{status} must be an 8-bit constant; its value is included in the status
-code returned by @code{_xbegin}.
-@end deftypefn
-
@node MIPS DSP Built-in Functions
@subsection MIPS DSP Built-in Functions
@@ -17266,6 +15792,1480 @@ The intrinsic @code{void __tile_network_barrier (void)} is used to
guarantee that no network operations before it are reordered with
those after it.
+@node x86 Built-in Functions
+@subsection x86 Built-in Functions
+
+These built-in functions are available for the x86-32 and x86-64 family
+of computers, depending on the command-line switches used.
+
+If you specify command-line switches such as @option{-msse},
+the compiler could use the extended instruction sets even if the built-ins
+are not used explicitly in the program. For this reason, applications
+that perform run-time CPU detection must compile separate files for each
+supported architecture, using the appropriate flags. In particular,
+the file containing the CPU detection code should be compiled without
+these options.
+
+The following machine modes are available for use with MMX built-in functions
+(@pxref{Vector Extensions}): @code{V2SI} for a vector of two 32-bit integers,
+@code{V4HI} for a vector of four 16-bit integers, and @code{V8QI} for a
+vector of eight 8-bit integers. Some of the built-in functions operate on
+MMX registers as a whole 64-bit entity; these use @code{V1DI} as their mode.
+
+If 3DNow!@: extensions are enabled, @code{V2SF} is used as a mode for a vector
+of two 32-bit floating-point values.
+
+If SSE extensions are enabled, @code{V4SF} is used for a vector of four 32-bit
+floating-point values. Some instructions use a vector of four 32-bit
+integers; these use @code{V4SI}. Finally, some instructions operate on an
+entire vector register, interpreting it as a 128-bit integer; these use mode
+@code{TI}.
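
As a minimal sketch of how lowercase type names such as @code{v4sf} and @code{v4si} in the prototypes of this section can be declared, the fragment below uses GCC's generic @code{vector_size} attribute (see the Vector Extensions reference above). The typedef names and the helpers @code{sum4} and @code{sum4i} are assumptions chosen for illustration; they are not declared by the compiler. The sketch uses only the generic @code{+} operator and element subscripting, not any @code{__builtin_ia32_*} function.

```c
/* Illustrative typedefs (assumed names): 16-byte vectors of four floats
   and four ints, corresponding to the V4SF and V4SI modes.  */
typedef float v4sf __attribute__ ((vector_size (16)));
typedef int v4si __attribute__ ((vector_size (16)));

/* Add two float vectors lane by lane with the generic '+' operator,
   then sum the four lanes of the result.  */
static float
sum4 (v4sf a, v4sf b)
{
  v4sf c = a + b;
  return c[0] + c[1] + c[2] + c[3];
}

/* The same idea for the integer vector type.  */
static int
sum4i (v4si a, v4si b)
{
  v4si c = a + b;
  return c[0] + c[1] + c[2] + c[3];
}
```

Whether the compiler emits SSE instructions for such code depends on the command-line switches in effect; without them it may fall back to scalar code.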
+
+In 64-bit mode, the x86-64 family of processors uses additional built-in
+functions for efficient use of @code{TF} (@code{__float128}) 128-bit
+floating-point and @code{TC} 128-bit complex floating-point values.
+
+The following floating-point built-in functions are available in 64-bit
+mode. All of them implement the function that is part of the name.
+
+@smallexample
+__float128 __builtin_fabsq (__float128)
+__float128 __builtin_copysignq (__float128, __float128)
+@end smallexample
+
+The following built-in function is always available.
+
+@table @code
+@item void __builtin_ia32_pause (void)
+Generates the @code{pause} machine instruction with a compiler memory
+barrier.
+@end table
+
+The following floating-point built-in functions are made available in the
+64-bit mode.
+
+@table @code
+@item __float128 __builtin_infq (void)
+Similar to @code{__builtin_inf}, except the return type is @code{__float128}.
+@findex __builtin_infq
+
+@item __float128 __builtin_huge_valq (void)
+Similar to @code{__builtin_huge_val}, except the return type is @code{__float128}.
+@findex __builtin_huge_valq
+@end table
+
+The following built-in functions are always available and can be used to
+check the target platform type.
+
+@deftypefn {Built-in Function} void __builtin_cpu_init (void)
+This function runs the CPU detection code to check the type of CPU and the
+features supported. It needs to be invoked explicitly, before calling
+@code{__builtin_cpu_is} or @code{__builtin_cpu_supports}, only when those
+functions are used in code that is executed before any constructors are
+called. Otherwise the CPU detection code is automatically executed in a
+very high priority constructor.
+
+For example, this function has to be used in @code{ifunc} resolvers that
+check for CPU type using the built-in functions @code{__builtin_cpu_is}
+and @code{__builtin_cpu_supports}, or in constructors on targets that
+don't support constructor priority.
+@smallexample
+
+static void (*resolve_memcpy (void)) (void)
+@{
+ // ifunc resolvers fire before constructors, explicitly call the init
+ // function.
+ __builtin_cpu_init ();
+ if (__builtin_cpu_supports ("ssse3"))
+ return ssse3_memcpy; // super fast memcpy with ssse3 instructions.
+ else
+ return default_memcpy;
+@}
+
+void *memcpy (void *, const void *, size_t)
+ __attribute__ ((ifunc ("resolve_memcpy")));
+@end smallexample
+
+@end deftypefn
+
+@deftypefn {Built-in Function} int __builtin_cpu_is (const char *@var{cpuname})
+This function returns a positive integer if the run-time CPU
+is of type @var{cpuname}
+and returns @code{0} otherwise. The following CPU names can be detected:
+
+@table @samp
+@item intel
+Intel CPU.
+
+@item atom
+Intel Atom CPU.
+
+@item core2
+Intel Core 2 CPU.
+
+@item corei7
+Intel Core i7 CPU.
+
+@item nehalem
+Intel Core i7 Nehalem CPU.
+
+@item westmere
+Intel Core i7 Westmere CPU.
+
+@item sandybridge
+Intel Core i7 Sandy Bridge CPU.
+
+@item amd
+AMD CPU.
+
+@item amdfam10h
+AMD Family 10h CPU.
+
+@item barcelona
+AMD Family 10h Barcelona CPU.
+
+@item shanghai
+AMD Family 10h Shanghai CPU.
+
+@item istanbul
+AMD Family 10h Istanbul CPU.
+
+@item btver1
+AMD Family 14h CPU.
+
+@item amdfam15h
+AMD Family 15h CPU.
+
+@item bdver1
+AMD Family 15h Bulldozer version 1.
+
+@item bdver2
+AMD Family 15h Bulldozer version 2.
+
+@item bdver3
+AMD Family 15h Bulldozer version 3.
+
+@item bdver4
+AMD Family 15h Bulldozer version 4.
+
+@item btver2
+AMD Family 16h CPU.
+@end table
+
+Here is an example:
+@smallexample
+if (__builtin_cpu_is ("corei7"))
+ @{
+ do_corei7 (); // Core i7 specific implementation.
+ @}
+else
+ @{
+ do_generic (); // Generic implementation.
+ @}
+@end smallexample
+@end deftypefn
+
+@deftypefn {Built-in Function} int __builtin_cpu_supports (const char *@var{feature})
+This function returns a positive integer if the run-time CPU
+supports @var{feature}
+and returns @code{0} otherwise. The following features can be detected:
+
+@table @samp
+@item cmov
+CMOV instruction.
+@item mmx
+MMX instructions.
+@item popcnt
+POPCNT instruction.
+@item sse
+SSE instructions.
+@item sse2
+SSE2 instructions.
+@item sse3
+SSE3 instructions.
+@item ssse3
+SSSE3 instructions.
+@item sse4.1
+SSE4.1 instructions.
+@item sse4.2
+SSE4.2 instructions.
+@item avx
+AVX instructions.
+@item avx2
+AVX2 instructions.
+@item avx512f
+AVX512F instructions.
+@end table
+
+Here is an example:
+@smallexample
+if (__builtin_cpu_supports ("popcnt"))
+ @{
+ asm("popcnt %1,%0" : "=r"(count) : "rm"(n) : "cc");
+ @}
+else
+ @{
+ count = generic_countbits (n); // Generic implementation.
+ @}
+@end smallexample
+@end deftypefn
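
The fragment in the example above can be expanded into a self-contained sketch along the following lines. The function names @code{generic_countbits} and @code{count_bits} are illustrative assumptions, and the preprocessor guard is added here so that the file still compiles on non-x86 targets, where the fallback path is always taken.

```c
/* Portable fallback: clear the lowest set bit until none remain.  */
static unsigned int
generic_countbits (unsigned int n)
{
  unsigned int count = 0;
  while (n != 0)
    {
      n &= n - 1;
      count++;
    }
  return count;
}

/* Count set bits, using the POPCNT instruction only when the run-time
   CPU reports support for it.  */
static unsigned int
count_bits (unsigned int n)
{
#if defined (__GNUC__) && (defined (__i386__) || defined (__x86_64__))
  if (__builtin_cpu_supports ("popcnt"))
    {
      unsigned int count;
      __asm__ ("popcnt %1,%0" : "=r" (count) : "rm" (n) : "cc");
      return count;
    }
#endif
  return generic_countbits (n);
}
```

Because the check happens at run time, this file can be compiled without any instruction-set option, in line with the advice at the start of this section about CPU detection code.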
+
+
+The following built-in functions are made available by @option{-mmmx}.
+All of them generate the machine instruction that is part of the name.
+
+@smallexample
+v8qi __builtin_ia32_paddb (v8qi, v8qi)
+v4hi __builtin_ia32_paddw (v4hi, v4hi)
+v2si __builtin_ia32_paddd (v2si, v2si)
+v8qi __builtin_ia32_psubb (v8qi, v8qi)
+v4hi __builtin_ia32_psubw (v4hi, v4hi)
+v2si __builtin_ia32_psubd (v2si, v2si)
+v8qi __builtin_ia32_paddsb (v8qi, v8qi)
+v4hi __builtin_ia32_paddsw (v4hi, v4hi)
+v8qi __builtin_ia32_psubsb (v8qi, v8qi)
+v4hi __builtin_ia32_psubsw (v4hi, v4hi)
+v8qi __builtin_ia32_paddusb (v8qi, v8qi)
+v4hi __builtin_ia32_paddusw (v4hi, v4hi)
+v8qi __builtin_ia32_psubusb (v8qi, v8qi)
+v4hi __builtin_ia32_psubusw (v4hi, v4hi)
+v4hi __builtin_ia32_pmullw (v4hi, v4hi)
+v4hi __builtin_ia32_pmulhw (v4hi, v4hi)
+di __builtin_ia32_pand (di, di)
+di __builtin_ia32_pandn (di, di)
+di __builtin_ia32_por (di, di)
+di __builtin_ia32_pxor (di, di)
+v8qi __builtin_ia32_pcmpeqb (v8qi, v8qi)
+v4hi __builtin_ia32_pcmpeqw (v4hi, v4hi)
+v2si __builtin_ia32_pcmpeqd (v2si, v2si)
+v8qi __builtin_ia32_pcmpgtb (v8qi, v8qi)
+v4hi __builtin_ia32_pcmpgtw (v4hi, v4hi)
+v2si __builtin_ia32_pcmpgtd (v2si, v2si)
+v8qi __builtin_ia32_punpckhbw (v8qi, v8qi)
+v4hi __builtin_ia32_punpckhwd (v4hi, v4hi)
+v2si __builtin_ia32_punpckhdq (v2si, v2si)
+v8qi __builtin_ia32_punpcklbw (v8qi, v8qi)
+v4hi __builtin_ia32_punpcklwd (v4hi, v4hi)
+v2si __builtin_ia32_punpckldq (v2si, v2si)
+v8qi __builtin_ia32_packsswb (v4hi, v4hi)
+v4hi __builtin_ia32_packssdw (v2si, v2si)
+v8qi __builtin_ia32_packuswb (v4hi, v4hi)
+
+v4hi __builtin_ia32_psllw (v4hi, v4hi)
+v2si __builtin_ia32_pslld (v2si, v2si)
+v1di __builtin_ia32_psllq (v1di, v1di)
+v4hi __builtin_ia32_psrlw (v4hi, v4hi)
+v2si __builtin_ia32_psrld (v2si, v2si)
+v1di __builtin_ia32_psrlq (v1di, v1di)
+v4hi __builtin_ia32_psraw (v4hi, v4hi)
+v2si __builtin_ia32_psrad (v2si, v2si)
+v4hi __builtin_ia32_psllwi (v4hi, int)
+v2si __builtin_ia32_pslldi (v2si, int)
+v1di __builtin_ia32_psllqi (v1di, int)
+v4hi __builtin_ia32_psrlwi (v4hi, int)
+v2si __builtin_ia32_psrldi (v2si, int)
+v1di __builtin_ia32_psrlqi (v1di, int)
+v4hi __builtin_ia32_psrawi (v4hi, int)
+v2si __builtin_ia32_psradi (v2si, int)
+
+@end smallexample
+
+The following built-in functions are made available either with
+@option{-msse}, or with a combination of @option{-m3dnow} and
+@option{-march=athlon}. All of them generate the machine
+instruction that is part of the name.
+
+@smallexample
+v4hi __builtin_ia32_pmulhuw (v4hi, v4hi)
+v8qi __builtin_ia32_pavgb (v8qi, v8qi)
+v4hi __builtin_ia32_pavgw (v4hi, v4hi)
+v1di __builtin_ia32_psadbw (v8qi, v8qi)
+v8qi __builtin_ia32_pmaxub (v8qi, v8qi)
+v4hi __builtin_ia32_pmaxsw (v4hi, v4hi)
+v8qi __builtin_ia32_pminub (v8qi, v8qi)
+v4hi __builtin_ia32_pminsw (v4hi, v4hi)
+int __builtin_ia32_pmovmskb (v8qi)
+void __builtin_ia32_maskmovq (v8qi, v8qi, char *)
+void __builtin_ia32_movntq (di *, di)
+void __builtin_ia32_sfence (void)
+@end smallexample
+
+The following built-in functions are available when @option{-msse} is used.
+All of them generate the machine instruction that is part of the name.
+
+@smallexample
+int __builtin_ia32_comieq (v4sf, v4sf)
+int __builtin_ia32_comineq (v4sf, v4sf)
+int __builtin_ia32_comilt (v4sf, v4sf)
+int __builtin_ia32_comile (v4sf, v4sf)
+int __builtin_ia32_comigt (v4sf, v4sf)
+int __builtin_ia32_comige (v4sf, v4sf)
+int __builtin_ia32_ucomieq (v4sf, v4sf)
+int __builtin_ia32_ucomineq (v4sf, v4sf)
+int __builtin_ia32_ucomilt (v4sf, v4sf)
+int __builtin_ia32_ucomile (v4sf, v4sf)
+int __builtin_ia32_ucomigt (v4sf, v4sf)
+int __builtin_ia32_ucomige (v4sf, v4sf)
+v4sf __builtin_ia32_addps (v4sf, v4sf)
+v4sf __builtin_ia32_subps (v4sf, v4sf)
+v4sf __builtin_ia32_mulps (v4sf, v4sf)
+v4sf __builtin_ia32_divps (v4sf, v4sf)
+v4sf __builtin_ia32_addss (v4sf, v4sf)
+v4sf __builtin_ia32_subss (v4sf, v4sf)
+v4sf __builtin_ia32_mulss (v4sf, v4sf)
+v4sf __builtin_ia32_divss (v4sf, v4sf)
+v4sf __builtin_ia32_cmpeqps (v4sf, v4sf)
+v4sf __builtin_ia32_cmpltps (v4sf, v4sf)
+v4sf __builtin_ia32_cmpleps (v4sf, v4sf)
+v4sf __builtin_ia32_cmpgtps (v4sf, v4sf)
+v4sf __builtin_ia32_cmpgeps (v4sf, v4sf)
+v4sf __builtin_ia32_cmpunordps (v4sf, v4sf)
+v4sf __builtin_ia32_cmpneqps (v4sf, v4sf)
+v4sf __builtin_ia32_cmpnltps (v4sf, v4sf)
+v4sf __builtin_ia32_cmpnleps (v4sf, v4sf)
+v4sf __builtin_ia32_cmpngtps (v4sf, v4sf)
+v4sf __builtin_ia32_cmpngeps (v4sf, v4sf)
+v4sf __builtin_ia32_cmpordps (v4sf, v4sf)
+v4sf __builtin_ia32_cmpeqss (v4sf, v4sf)
+v4sf __builtin_ia32_cmpltss (v4sf, v4sf)
+v4sf __builtin_ia32_cmpless (v4sf, v4sf)
+v4sf __builtin_ia32_cmpunordss (v4sf, v4sf)
+v4sf __builtin_ia32_cmpneqss (v4sf, v4sf)
+v4sf __builtin_ia32_cmpnltss (v4sf, v4sf)
+v4sf __builtin_ia32_cmpnless (v4sf, v4sf)
+v4sf __builtin_ia32_cmpordss (v4sf, v4sf)
+v4sf __builtin_ia32_maxps (v4sf, v4sf)
+v4sf __builtin_ia32_maxss (v4sf, v4sf)
+v4sf __builtin_ia32_minps (v4sf, v4sf)
+v4sf __builtin_ia32_minss (v4sf, v4sf)
+v4sf __builtin_ia32_andps (v4sf, v4sf)
+v4sf __builtin_ia32_andnps (v4sf, v4sf)
+v4sf __builtin_ia32_orps (v4sf, v4sf)
+v4sf __builtin_ia32_xorps (v4sf, v4sf)
+v4sf __builtin_ia32_movss (v4sf, v4sf)
+v4sf __builtin_ia32_movhlps (v4sf, v4sf)
+v4sf __builtin_ia32_movlhps (v4sf, v4sf)
+v4sf __builtin_ia32_unpckhps (v4sf, v4sf)
+v4sf __builtin_ia32_unpcklps (v4sf, v4sf)
+v4sf __builtin_ia32_cvtpi2ps (v4sf, v2si)
+v4sf __builtin_ia32_cvtsi2ss (v4sf, int)
+v2si __builtin_ia32_cvtps2pi (v4sf)
+int __builtin_ia32_cvtss2si (v4sf)
+v2si __builtin_ia32_cvttps2pi (v4sf)
+int __builtin_ia32_cvttss2si (v4sf)
+v4sf __builtin_ia32_rcpps (v4sf)
+v4sf __builtin_ia32_rsqrtps (v4sf)
+v4sf __builtin_ia32_sqrtps (v4sf)
+v4sf __builtin_ia32_rcpss (v4sf)
+v4sf __builtin_ia32_rsqrtss (v4sf)
+v4sf __builtin_ia32_sqrtss (v4sf)
+v4sf __builtin_ia32_shufps (v4sf, v4sf, int)
+void __builtin_ia32_movntps (float *, v4sf)
+int __builtin_ia32_movmskps (v4sf)
+@end smallexample
+
+The following built-in functions are available when @option{-msse} is used.
+
+@table @code
+@item v4sf __builtin_ia32_loadups (float *)
+Generates the @code{movups} machine instruction as a load from memory.
+@item void __builtin_ia32_storeups (float *, v4sf)
+Generates the @code{movups} machine instruction as a store to memory.
+@item v4sf __builtin_ia32_loadss (float *)
+Generates the @code{movss} machine instruction as a load from memory.
+@item v4sf __builtin_ia32_loadhps (v4sf, const v2sf *)
+Generates the @code{movhps} machine instruction as a load from memory.
+@item v4sf __builtin_ia32_loadlps (v4sf, const v2sf *)
+Generates the @code{movlps} machine instruction as a load from memory.
+@item void __builtin_ia32_storehps (v2sf *, v4sf)
+Generates the @code{movhps} machine instruction as a store to memory.
+@item void __builtin_ia32_storelps (v2sf *, v4sf)
+Generates the @code{movlps} machine instruction as a store to memory.
+@end table
+
+The following built-in functions are available when @option{-msse2} is used.
+All of them generate the machine instruction that is part of the name.
+
+@smallexample
+int __builtin_ia32_comisdeq (v2df, v2df)
+int __builtin_ia32_comisdlt (v2df, v2df)
+int __builtin_ia32_comisdle (v2df, v2df)
+int __builtin_ia32_comisdgt (v2df, v2df)
+int __builtin_ia32_comisdge (v2df, v2df)
+int __builtin_ia32_comisdneq (v2df, v2df)
+int __builtin_ia32_ucomisdeq (v2df, v2df)
+int __builtin_ia32_ucomisdlt (v2df, v2df)
+int __builtin_ia32_ucomisdle (v2df, v2df)
+int __builtin_ia32_ucomisdgt (v2df, v2df)
+int __builtin_ia32_ucomisdge (v2df, v2df)
+int __builtin_ia32_ucomisdneq (v2df, v2df)
+v2df __builtin_ia32_cmpeqpd (v2df, v2df)
+v2df __builtin_ia32_cmpltpd (v2df, v2df)
+v2df __builtin_ia32_cmplepd (v2df, v2df)
+v2df __builtin_ia32_cmpgtpd (v2df, v2df)
+v2df __builtin_ia32_cmpgepd (v2df, v2df)
+v2df __builtin_ia32_cmpunordpd (v2df, v2df)
+v2df __builtin_ia32_cmpneqpd (v2df, v2df)
+v2df __builtin_ia32_cmpnltpd (v2df, v2df)
+v2df __builtin_ia32_cmpnlepd (v2df, v2df)
+v2df __builtin_ia32_cmpngtpd (v2df, v2df)
+v2df __builtin_ia32_cmpngepd (v2df, v2df)
+v2df __builtin_ia32_cmpordpd (v2df, v2df)
+v2df __builtin_ia32_cmpeqsd (v2df, v2df)
+v2df __builtin_ia32_cmpltsd (v2df, v2df)
+v2df __builtin_ia32_cmplesd (v2df, v2df)
+v2df __builtin_ia32_cmpunordsd (v2df, v2df)
+v2df __builtin_ia32_cmpneqsd (v2df, v2df)
+v2df __builtin_ia32_cmpnltsd (v2df, v2df)
+v2df __builtin_ia32_cmpnlesd (v2df, v2df)
+v2df __builtin_ia32_cmpordsd (v2df, v2df)
+v2di __builtin_ia32_paddq (v2di, v2di)
+v2di __builtin_ia32_psubq (v2di, v2di)
+v2df __builtin_ia32_addpd (v2df, v2df)
+v2df __builtin_ia32_subpd (v2df, v2df)
+v2df __builtin_ia32_mulpd (v2df, v2df)
+v2df __builtin_ia32_divpd (v2df, v2df)
+v2df __builtin_ia32_addsd (v2df, v2df)
+v2df __builtin_ia32_subsd (v2df, v2df)
+v2df __builtin_ia32_mulsd (v2df, v2df)
+v2df __builtin_ia32_divsd (v2df, v2df)
+v2df __builtin_ia32_minpd (v2df, v2df)
+v2df __builtin_ia32_maxpd (v2df, v2df)
+v2df __builtin_ia32_minsd (v2df, v2df)
+v2df __builtin_ia32_maxsd (v2df, v2df)
+v2df __builtin_ia32_andpd (v2df, v2df)
+v2df __builtin_ia32_andnpd (v2df, v2df)
+v2df __builtin_ia32_orpd (v2df, v2df)
+v2df __builtin_ia32_xorpd (v2df, v2df)
+v2df __builtin_ia32_movsd (v2df, v2df)
+v2df __builtin_ia32_unpckhpd (v2df, v2df)
+v2df __builtin_ia32_unpcklpd (v2df, v2df)
+v16qi __builtin_ia32_paddb128 (v16qi, v16qi)
+v8hi __builtin_ia32_paddw128 (v8hi, v8hi)
+v4si __builtin_ia32_paddd128 (v4si, v4si)
+v2di __builtin_ia32_paddq128 (v2di, v2di)
+v16qi __builtin_ia32_psubb128 (v16qi, v16qi)
+v8hi __builtin_ia32_psubw128 (v8hi, v8hi)
+v4si __builtin_ia32_psubd128 (v4si, v4si)
+v2di __builtin_ia32_psubq128 (v2di, v2di)
+v8hi __builtin_ia32_pmullw128 (v8hi, v8hi)
+v8hi __builtin_ia32_pmulhw128 (v8hi, v8hi)
+v2di __builtin_ia32_pand128 (v2di, v2di)
+v2di __builtin_ia32_pandn128 (v2di, v2di)
+v2di __builtin_ia32_por128 (v2di, v2di)
+v2di __builtin_ia32_pxor128 (v2di, v2di)
+v16qi __builtin_ia32_pavgb128 (v16qi, v16qi)
+v8hi __builtin_ia32_pavgw128 (v8hi, v8hi)
+v16qi __builtin_ia32_pcmpeqb128 (v16qi, v16qi)
+v8hi __builtin_ia32_pcmpeqw128 (v8hi, v8hi)
+v4si __builtin_ia32_pcmpeqd128 (v4si, v4si)
+v16qi __builtin_ia32_pcmpgtb128 (v16qi, v16qi)
+v8hi __builtin_ia32_pcmpgtw128 (v8hi, v8hi)
+v4si __builtin_ia32_pcmpgtd128 (v4si, v4si)
+v16qi __builtin_ia32_pmaxub128 (v16qi, v16qi)
+v8hi __builtin_ia32_pmaxsw128 (v8hi, v8hi)
+v16qi __builtin_ia32_pminub128 (v16qi, v16qi)
+v8hi __builtin_ia32_pminsw128 (v8hi, v8hi)
+v16qi __builtin_ia32_punpckhbw128 (v16qi, v16qi)
+v8hi __builtin_ia32_punpckhwd128 (v8hi, v8hi)
+v4si __builtin_ia32_punpckhdq128 (v4si, v4si)
+v2di __builtin_ia32_punpckhqdq128 (v2di, v2di)
+v16qi __builtin_ia32_punpcklbw128 (v16qi, v16qi)
+v8hi __builtin_ia32_punpcklwd128 (v8hi, v8hi)
+v4si __builtin_ia32_punpckldq128 (v4si, v4si)
+v2di __builtin_ia32_punpcklqdq128 (v2di, v2di)
+v16qi __builtin_ia32_packsswb128 (v8hi, v8hi)
+v8hi __builtin_ia32_packssdw128 (v4si, v4si)
+v16qi __builtin_ia32_packuswb128 (v8hi, v8hi)
+v8hi __builtin_ia32_pmulhuw128 (v8hi, v8hi)
+void __builtin_ia32_maskmovdqu (v16qi, v16qi)
+v2df __builtin_ia32_loadupd (double *)
+void __builtin_ia32_storeupd (double *, v2df)
+v2df __builtin_ia32_loadhpd (v2df, double const *)
+v2df __builtin_ia32_loadlpd (v2df, double const *)
+int __builtin_ia32_movmskpd (v2df)
+int __builtin_ia32_pmovmskb128 (v16qi)
+void __builtin_ia32_movnti (int *, int)
+void __builtin_ia32_movnti64 (long long int *, long long int)
+void __builtin_ia32_movntpd (double *, v2df)
+void __builtin_ia32_movntdq (v2df *, v2df)
+v4si __builtin_ia32_pshufd (v4si, int)
+v8hi __builtin_ia32_pshuflw (v8hi, int)
+v8hi __builtin_ia32_pshufhw (v8hi, int)
+v2di __builtin_ia32_psadbw128 (v16qi, v16qi)
+v2df __builtin_ia32_sqrtpd (v2df)
+v2df __builtin_ia32_sqrtsd (v2df)
+v2df __builtin_ia32_shufpd (v2df, v2df, int)
+v2df __builtin_ia32_cvtdq2pd (v4si)
+v4sf __builtin_ia32_cvtdq2ps (v4si)
+v4si __builtin_ia32_cvtpd2dq (v2df)
+v2si __builtin_ia32_cvtpd2pi (v2df)
+v4sf __builtin_ia32_cvtpd2ps (v2df)
+v4si __builtin_ia32_cvttpd2dq (v2df)
+v2si __builtin_ia32_cvttpd2pi (v2df)
+v2df __builtin_ia32_cvtpi2pd (v2si)
+int __builtin_ia32_cvtsd2si (v2df)
+int __builtin_ia32_cvttsd2si (v2df)
+long long __builtin_ia32_cvtsd2si64 (v2df)
+long long __builtin_ia32_cvttsd2si64 (v2df)
+v4si __builtin_ia32_cvtps2dq (v4sf)
+v2df __builtin_ia32_cvtps2pd (v4sf)
+v4si __builtin_ia32_cvttps2dq (v4sf)
+v2df __builtin_ia32_cvtsi2sd (v2df, int)
+v2df __builtin_ia32_cvtsi642sd (v2df, long long)
+v4sf __builtin_ia32_cvtsd2ss (v4sf, v2df)
+v2df __builtin_ia32_cvtss2sd (v2df, v4sf)
+void __builtin_ia32_clflush (const void *)
+void __builtin_ia32_lfence (void)
+void __builtin_ia32_mfence (void)
+v16qi __builtin_ia32_loaddqu (const char *)
+void __builtin_ia32_storedqu (char *, v16qi)
+v1di __builtin_ia32_pmuludq (v2si, v2si)
+v2di __builtin_ia32_pmuludq128 (v4si, v4si)
+v8hi __builtin_ia32_psllw128 (v8hi, v8hi)
+v4si __builtin_ia32_pslld128 (v4si, v4si)
+v2di __builtin_ia32_psllq128 (v2di, v2di)
+v8hi __builtin_ia32_psrlw128 (v8hi, v8hi)
+v4si __builtin_ia32_psrld128 (v4si, v4si)
+v2di __builtin_ia32_psrlq128 (v2di, v2di)
+v8hi __builtin_ia32_psraw128 (v8hi, v8hi)
+v4si __builtin_ia32_psrad128 (v4si, v4si)
+v2di __builtin_ia32_pslldqi128 (v2di, int)
+v8hi __builtin_ia32_psllwi128 (v8hi, int)
+v4si __builtin_ia32_pslldi128 (v4si, int)
+v2di __builtin_ia32_psllqi128 (v2di, int)
+v2di __builtin_ia32_psrldqi128 (v2di, int)
+v8hi __builtin_ia32_psrlwi128 (v8hi, int)
+v4si __builtin_ia32_psrldi128 (v4si, int)
+v2di __builtin_ia32_psrlqi128 (v2di, int)
+v8hi __builtin_ia32_psrawi128 (v8hi, int)
+v4si __builtin_ia32_psradi128 (v4si, int)
+v4si __builtin_ia32_pmaddwd128 (v8hi, v8hi)
+v2di __builtin_ia32_movq128 (v2di)
+@end smallexample
+
+The following built-in functions are available when @option{-msse3} is used.
+All of them generate the machine instruction that is part of the name.
+
+@smallexample
+v2df __builtin_ia32_addsubpd (v2df, v2df)
+v4sf __builtin_ia32_addsubps (v4sf, v4sf)
+v2df __builtin_ia32_haddpd (v2df, v2df)
+v4sf __builtin_ia32_haddps (v4sf, v4sf)
+v2df __builtin_ia32_hsubpd (v2df, v2df)
+v4sf __builtin_ia32_hsubps (v4sf, v4sf)
+v16qi __builtin_ia32_lddqu (char const *)
+void __builtin_ia32_monitor (void *, unsigned int, unsigned int)
+v4sf __builtin_ia32_movshdup (v4sf)
+v4sf __builtin_ia32_movsldup (v4sf)
+void __builtin_ia32_mwait (unsigned int, unsigned int)
+@end smallexample
+
+The following built-in functions are available when @option{-mssse3} is used.
+All of them generate the machine instruction that is part of the name.
+
+@smallexample
+v2si __builtin_ia32_phaddd (v2si, v2si)
+v4hi __builtin_ia32_phaddw (v4hi, v4hi)
+v4hi __builtin_ia32_phaddsw (v4hi, v4hi)
+v2si __builtin_ia32_phsubd (v2si, v2si)
+v4hi __builtin_ia32_phsubw (v4hi, v4hi)
+v4hi __builtin_ia32_phsubsw (v4hi, v4hi)
+v4hi __builtin_ia32_pmaddubsw (v8qi, v8qi)
+v4hi __builtin_ia32_pmulhrsw (v4hi, v4hi)
+v8qi __builtin_ia32_pshufb (v8qi, v8qi)
+v8qi __builtin_ia32_psignb (v8qi, v8qi)
+v2si __builtin_ia32_psignd (v2si, v2si)
+v4hi __builtin_ia32_psignw (v4hi, v4hi)
+v1di __builtin_ia32_palignr (v1di, v1di, int)
+v8qi __builtin_ia32_pabsb (v8qi)
+v2si __builtin_ia32_pabsd (v2si)
+v4hi __builtin_ia32_pabsw (v4hi)
+@end smallexample
+
+The following built-in functions are available when @option{-mssse3} is used.
+All of them generate the machine instruction that is part of the name.
+
+@smallexample
+v4si __builtin_ia32_phaddd128 (v4si, v4si)
+v8hi __builtin_ia32_phaddw128 (v8hi, v8hi)
+v8hi __builtin_ia32_phaddsw128 (v8hi, v8hi)
+v4si __builtin_ia32_phsubd128 (v4si, v4si)
+v8hi __builtin_ia32_phsubw128 (v8hi, v8hi)
+v8hi __builtin_ia32_phsubsw128 (v8hi, v8hi)
+v8hi __builtin_ia32_pmaddubsw128 (v16qi, v16qi)
+v8hi __builtin_ia32_pmulhrsw128 (v8hi, v8hi)
+v16qi __builtin_ia32_pshufb128 (v16qi, v16qi)
+v16qi __builtin_ia32_psignb128 (v16qi, v16qi)
+v4si __builtin_ia32_psignd128 (v4si, v4si)
+v8hi __builtin_ia32_psignw128 (v8hi, v8hi)
+v2di __builtin_ia32_palignr128 (v2di, v2di, int)
+v16qi __builtin_ia32_pabsb128 (v16qi)
+v4si __builtin_ia32_pabsd128 (v4si)
+v8hi __builtin_ia32_pabsw128 (v8hi)
+@end smallexample
+
+The following built-in functions are available when @option{-msse4.1} is
+used. All of them generate the machine instruction that is part of the
+name.
+
+@smallexample
+v2df __builtin_ia32_blendpd (v2df, v2df, const int)
+v4sf __builtin_ia32_blendps (v4sf, v4sf, const int)
+v2df __builtin_ia32_blendvpd (v2df, v2df, v2df)
+v4sf __builtin_ia32_blendvps (v4sf, v4sf, v4sf)
+v2df __builtin_ia32_dppd (v2df, v2df, const int)
+v4sf __builtin_ia32_dpps (v4sf, v4sf, const int)
+v4sf __builtin_ia32_insertps128 (v4sf, v4sf, const int)
+v2di __builtin_ia32_movntdqa (v2di *)
+v16qi __builtin_ia32_mpsadbw128 (v16qi, v16qi, const int)
+v8hi __builtin_ia32_packusdw128 (v4si, v4si)
+v16qi __builtin_ia32_pblendvb128 (v16qi, v16qi, v16qi)
+v8hi __builtin_ia32_pblendw128 (v8hi, v8hi, const int)
+v2di __builtin_ia32_pcmpeqq (v2di, v2di)
+v8hi __builtin_ia32_phminposuw128 (v8hi)
+v16qi __builtin_ia32_pmaxsb128 (v16qi, v16qi)
+v4si __builtin_ia32_pmaxsd128 (v4si, v4si)
+v4si __builtin_ia32_pmaxud128 (v4si, v4si)
+v8hi __builtin_ia32_pmaxuw128 (v8hi, v8hi)
+v16qi __builtin_ia32_pminsb128 (v16qi, v16qi)
+v4si __builtin_ia32_pminsd128 (v4si, v4si)
+v4si __builtin_ia32_pminud128 (v4si, v4si)
+v8hi __builtin_ia32_pminuw128 (v8hi, v8hi)
+v4si __builtin_ia32_pmovsxbd128 (v16qi)
+v2di __builtin_ia32_pmovsxbq128 (v16qi)
+v8hi __builtin_ia32_pmovsxbw128 (v16qi)
+v2di __builtin_ia32_pmovsxdq128 (v4si)
+v4si __builtin_ia32_pmovsxwd128 (v8hi)
+v2di __builtin_ia32_pmovsxwq128 (v8hi)
+v4si __builtin_ia32_pmovzxbd128 (v16qi)
+v2di __builtin_ia32_pmovzxbq128 (v16qi)
+v8hi __builtin_ia32_pmovzxbw128 (v16qi)
+v2di __builtin_ia32_pmovzxdq128 (v4si)
+v4si __builtin_ia32_pmovzxwd128 (v8hi)
+v2di __builtin_ia32_pmovzxwq128 (v8hi)
+v2di __builtin_ia32_pmuldq128 (v4si, v4si)
+v4si __builtin_ia32_pmulld128 (v4si, v4si)
+int __builtin_ia32_ptestc128 (v2di, v2di)
+int __builtin_ia32_ptestnzc128 (v2di, v2di)
+int __builtin_ia32_ptestz128 (v2di, v2di)
+v2df __builtin_ia32_roundpd (v2df, const int)
+v4sf __builtin_ia32_roundps (v4sf, const int)
+v2df __builtin_ia32_roundsd (v2df, v2df, const int)
+v4sf __builtin_ia32_roundss (v4sf, v4sf, const int)
+@end smallexample
+
+The following built-in functions are available when @option{-msse4.1} is
+used.
+
+@table @code
+@item v4sf __builtin_ia32_vec_set_v4sf (v4sf, float, const int)
+Generates the @code{insertps} machine instruction.
+@item int __builtin_ia32_vec_ext_v16qi (v16qi, const int)
+Generates the @code{pextrb} machine instruction.
+@item v16qi __builtin_ia32_vec_set_v16qi (v16qi, int, const int)
+Generates the @code{pinsrb} machine instruction.
+@item v4si __builtin_ia32_vec_set_v4si (v4si, int, const int)
+Generates the @code{pinsrd} machine instruction.
+@item v2di __builtin_ia32_vec_set_v2di (v2di, long long, const int)
+Generates the @code{pinsrq} machine instruction in 64-bit mode.
+@end table
+
+The following built-in functions are changed to generate new SSE4.1
+instructions when @option{-msse4.1} is used.
+
+@table @code
+@item float __builtin_ia32_vec_ext_v4sf (v4sf, const int)
+Generates the @code{extractps} machine instruction.
+@item int __builtin_ia32_vec_ext_v4si (v4si, const int)
+Generates the @code{pextrd} machine instruction.
+@item long long __builtin_ia32_vec_ext_v2di (v2di, const int)
+Generates the @code{pextrq} machine instruction in 64-bit mode.
+@end table
+
+The following built-in functions are available when @option{-msse4.2} is
+used. All of them generate the machine instruction that is part of the
+name.
+
+@smallexample
+v16qi __builtin_ia32_pcmpestrm128 (v16qi, int, v16qi, int, const int)
+int __builtin_ia32_pcmpestri128 (v16qi, int, v16qi, int, const int)
+int __builtin_ia32_pcmpestria128 (v16qi, int, v16qi, int, const int)
+int __builtin_ia32_pcmpestric128 (v16qi, int, v16qi, int, const int)
+int __builtin_ia32_pcmpestrio128 (v16qi, int, v16qi, int, const int)
+int __builtin_ia32_pcmpestris128 (v16qi, int, v16qi, int, const int)
+int __builtin_ia32_pcmpestriz128 (v16qi, int, v16qi, int, const int)
+v16qi __builtin_ia32_pcmpistrm128 (v16qi, v16qi, const int)
+int __builtin_ia32_pcmpistri128 (v16qi, v16qi, const int)
+int __builtin_ia32_pcmpistria128 (v16qi, v16qi, const int)
+int __builtin_ia32_pcmpistric128 (v16qi, v16qi, const int)
+int __builtin_ia32_pcmpistrio128 (v16qi, v16qi, const int)
+int __builtin_ia32_pcmpistris128 (v16qi, v16qi, const int)
+int __builtin_ia32_pcmpistriz128 (v16qi, v16qi, const int)
+v2di __builtin_ia32_pcmpgtq (v2di, v2di)
+@end smallexample
+
+The following built-in functions are available when @option{-msse4.2} is
+used.
+
+@table @code
+@item unsigned int __builtin_ia32_crc32qi (unsigned int, unsigned char)
+Generates the @code{crc32b} machine instruction.
+@item unsigned int __builtin_ia32_crc32hi (unsigned int, unsigned short)
+Generates the @code{crc32w} machine instruction.
+@item unsigned int __builtin_ia32_crc32si (unsigned int, unsigned int)
+Generates the @code{crc32l} machine instruction.
+@item unsigned long long __builtin_ia32_crc32di (unsigned long long, unsigned long long)
+Generates the @code{crc32q} machine instruction.
+@end table
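
As a hedged illustration of the byte-wise variant, the sketch below accumulates a checksum with @code{__builtin_ia32_crc32qi} when the run-time CPU supports SSE4.2 and otherwise uses a bitwise software loop. The @code{crc32} instruction uses the CRC-32C (Castagnoli) polynomial, so the fallback uses the reflected constant @code{0x82F63B78}. The function names, the @code{target} attribute used in place of @option{-msse4.2}, and the initial and final inversion are choices made for this example rather than anything the built-ins require.

```c
#include <stddef.h>
#include <stdint.h>

/* Software CRC-32C, one bit at a time (reflected polynomial).  */
static uint32_t
crc32c_sw (const unsigned char *p, size_t n)
{
  uint32_t crc = 0xFFFFFFFFu;
  while (n--)
    {
      crc ^= *p++;
      for (int k = 0; k < 8; k++)
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
  return ~crc;
}

#if defined (__GNUC__) && (defined (__i386__) || defined (__x86_64__))
/* Hardware path: the target attribute enables SSE4.2 for this one
   function, so the rest of the file needs no special option.  */
__attribute__ ((target ("sse4.2")))
static uint32_t
crc32c_hw (const unsigned char *p, size_t n)
{
  unsigned int crc = 0xFFFFFFFFu;
  while (n--)
    crc = __builtin_ia32_crc32qi (crc, *p++);
  return ~crc;
}
#endif

/* Pick the hardware path only when the CPU reports SSE4.2.  */
static uint32_t
crc32c (const unsigned char *p, size_t n)
{
#if defined (__GNUC__) && (defined (__i386__) || defined (__x86_64__))
  if (__builtin_cpu_supports ("sse4.2"))
    return crc32c_hw (p, n);
#endif
  return crc32c_sw (p, n);
}
```

Processing one byte per call keeps the sketch short; the wider variants listed in the table consume two, four, or eight bytes per instruction.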
+
+The following built-in functions are changed to generate new SSE4.2
+instructions when @option{-msse4.2} is used.
+
+@table @code
+@item int __builtin_popcount (unsigned int)
+Generates the @code{popcntl} machine instruction.
+@item int __builtin_popcountl (unsigned long)
+Generates the @code{popcntl} or @code{popcntq} machine instruction,
+depending on the size of @code{unsigned long}.
+@item int __builtin_popcountll (unsigned long long)
+Generates the @code{popcntq} machine instruction.
+@end table
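
The three generic built-ins can be called the same way on any target; only the generated code changes with the options in effect. A small sketch, with wrapper names that are purely illustrative:

```c
/* Thin wrappers (hypothetical names) around the generic population
   count built-ins.  Which instruction, if any, the compiler emits for
   them depends on the target options.  */
static int
bits_in_uint (unsigned int x)
{
  return __builtin_popcount (x);
}

static int
bits_in_ulong (unsigned long x)
{
  return __builtin_popcountl (x);
}

static int
bits_in_ullong (unsigned long long x)
{
  return __builtin_popcountll (x);
}
```

Without the relevant option the compiler typically falls back to a library routine or an inline bit-manipulation sequence, so the results are the same either way.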
+
+The following built-in functions are available when @option{-mavx} is
+used. All of them generate the machine instruction that is part of the
+name.
+
+@smallexample
+v4df __builtin_ia32_addpd256 (v4df,v4df)
+v8sf __builtin_ia32_addps256 (v8sf,v8sf)
+v4df __builtin_ia32_addsubpd256 (v4df,v4df)
+v8sf __builtin_ia32_addsubps256 (v8sf,v8sf)
+v4df __builtin_ia32_andnpd256 (v4df,v4df)
+v8sf __builtin_ia32_andnps256 (v8sf,v8sf)
+v4df __builtin_ia32_andpd256 (v4df,v4df)
+v8sf __builtin_ia32_andps256 (v8sf,v8sf)
+v4df __builtin_ia32_blendpd256 (v4df,v4df,int)
+v8sf __builtin_ia32_blendps256 (v8sf,v8sf,int)
+v4df __builtin_ia32_blendvpd256 (v4df,v4df,v4df)
+v8sf __builtin_ia32_blendvps256 (v8sf,v8sf,v8sf)
+v2df __builtin_ia32_cmppd (v2df,v2df,int)
+v4df __builtin_ia32_cmppd256 (v4df,v4df,int)
+v4sf __builtin_ia32_cmpps (v4sf,v4sf,int)
+v8sf __builtin_ia32_cmpps256 (v8sf,v8sf,int)
+v2df __builtin_ia32_cmpsd (v2df,v2df,int)
+v4sf __builtin_ia32_cmpss (v4sf,v4sf,int)
+v4df __builtin_ia32_cvtdq2pd256 (v4si)
+v8sf __builtin_ia32_cvtdq2ps256 (v8si)
+v4si __builtin_ia32_cvtpd2dq256 (v4df)
+v4sf __builtin_ia32_cvtpd2ps256 (v4df)
+v8si __builtin_ia32_cvtps2dq256 (v8sf)
+v4df __builtin_ia32_cvtps2pd256 (v4sf)
+v4si __builtin_ia32_cvttpd2dq256 (v4df)
+v8si __builtin_ia32_cvttps2dq256 (v8sf)
+v4df __builtin_ia32_divpd256 (v4df,v4df)
+v8sf __builtin_ia32_divps256 (v8sf,v8sf)
+v8sf __builtin_ia32_dpps256 (v8sf,v8sf,int)
+v4df __builtin_ia32_haddpd256 (v4df,v4df)
+v8sf __builtin_ia32_haddps256 (v8sf,v8sf)
+v4df __builtin_ia32_hsubpd256 (v4df,v4df)
+v8sf __builtin_ia32_hsubps256 (v8sf,v8sf)
+v32qi __builtin_ia32_lddqu256 (pcchar)
+v32qi __builtin_ia32_loaddqu256 (pcchar)
+v4df __builtin_ia32_loadupd256 (pcdouble)
+v8sf __builtin_ia32_loadups256 (pcfloat)
+v2df __builtin_ia32_maskloadpd (pcv2df,v2df)
+v4df __builtin_ia32_maskloadpd256 (pcv4df,v4df)
+v4sf __builtin_ia32_maskloadps (pcv4sf,v4sf)
+v8sf __builtin_ia32_maskloadps256 (pcv8sf,v8sf)
+void __builtin_ia32_maskstorepd (pv2df,v2df,v2df)
+void __builtin_ia32_maskstorepd256 (pv4df,v4df,v4df)
+void __builtin_ia32_maskstoreps (pv4sf,v4sf,v4sf)
+void __builtin_ia32_maskstoreps256 (pv8sf,v8sf,v8sf)
+v4df __builtin_ia32_maxpd256 (v4df,v4df)
+v8sf __builtin_ia32_maxps256 (v8sf,v8sf)
+v4df __builtin_ia32_minpd256 (v4df,v4df)
+v8sf __builtin_ia32_minps256 (v8sf,v8sf)
+v4df __builtin_ia32_movddup256 (v4df)
+int __builtin_ia32_movmskpd256 (v4df)
+int __builtin_ia32_movmskps256 (v8sf)
+v8sf __builtin_ia32_movshdup256 (v8sf)
+v8sf __builtin_ia32_movsldup256 (v8sf)
+v4df __builtin_ia32_mulpd256 (v4df,v4df)
+v8sf __builtin_ia32_mulps256 (v8sf,v8sf)
+v4df __builtin_ia32_orpd256 (v4df,v4df)
+v8sf __builtin_ia32_orps256 (v8sf,v8sf)
+v2df __builtin_ia32_pd_pd256 (v4df)
+v4df __builtin_ia32_pd256_pd (v2df)
+v4sf __builtin_ia32_ps_ps256 (v8sf)
+v8sf __builtin_ia32_ps256_ps (v4sf)
+int __builtin_ia32_ptestc256 (v4di,v4di,ptest)
+int __builtin_ia32_ptestnzc256 (v4di,v4di,ptest)
+int __builtin_ia32_ptestz256 (v4di,v4di,ptest)
+v8sf __builtin_ia32_rcpps256 (v8sf)
+v4df __builtin_ia32_roundpd256 (v4df,int)
+v8sf __builtin_ia32_roundps256 (v8sf,int)
+v8sf __builtin_ia32_rsqrtps_nr256 (v8sf)
+v8sf __builtin_ia32_rsqrtps256 (v8sf)
+v4df __builtin_ia32_shufpd256 (v4df,v4df,int)
+v8sf __builtin_ia32_shufps256 (v8sf,v8sf,int)
+v4si __builtin_ia32_si_si256 (v8si)
+v8si __builtin_ia32_si256_si (v4si)
+v4df __builtin_ia32_sqrtpd256 (v4df)
+v8sf __builtin_ia32_sqrtps_nr256 (v8sf)
+v8sf __builtin_ia32_sqrtps256 (v8sf)
+void __builtin_ia32_storedqu256 (pchar,v32qi)
+void __builtin_ia32_storeupd256 (pdouble,v4df)
+void __builtin_ia32_storeups256 (pfloat,v8sf)
+v4df __builtin_ia32_subpd256 (v4df,v4df)
+v8sf __builtin_ia32_subps256 (v8sf,v8sf)
+v4df __builtin_ia32_unpckhpd256 (v4df,v4df)
+v8sf __builtin_ia32_unpckhps256 (v8sf,v8sf)
+v4df __builtin_ia32_unpcklpd256 (v4df,v4df)
+v8sf __builtin_ia32_unpcklps256 (v8sf,v8sf)
+v4df __builtin_ia32_vbroadcastf128_pd256 (pcv2df)
+v8sf __builtin_ia32_vbroadcastf128_ps256 (pcv4sf)
+v4df __builtin_ia32_vbroadcastsd256 (pcdouble)
+v4sf __builtin_ia32_vbroadcastss (pcfloat)
+v8sf __builtin_ia32_vbroadcastss256 (pcfloat)
+v2df __builtin_ia32_vextractf128_pd256 (v4df,int)
+v4sf __builtin_ia32_vextractf128_ps256 (v8sf,int)
+v4si __builtin_ia32_vextractf128_si256 (v8si,int)
+v4df __builtin_ia32_vinsertf128_pd256 (v4df,v2df,int)
+v8sf __builtin_ia32_vinsertf128_ps256 (v8sf,v4sf,int)
+v8si __builtin_ia32_vinsertf128_si256 (v8si,v4si,int)
+v4df __builtin_ia32_vperm2f128_pd256 (v4df,v4df,int)
+v8sf __builtin_ia32_vperm2f128_ps256 (v8sf,v8sf,int)
+v8si __builtin_ia32_vperm2f128_si256 (v8si,v8si,int)
+v2df __builtin_ia32_vpermil2pd (v2df,v2df,v2di,int)
+v4df __builtin_ia32_vpermil2pd256 (v4df,v4df,v4di,int)
+v4sf __builtin_ia32_vpermil2ps (v4sf,v4sf,v4si,int)
+v8sf __builtin_ia32_vpermil2ps256 (v8sf,v8sf,v8si,int)
+v2df __builtin_ia32_vpermilpd (v2df,int)
+v4df __builtin_ia32_vpermilpd256 (v4df,int)
+v4sf __builtin_ia32_vpermilps (v4sf,int)
+v8sf __builtin_ia32_vpermilps256 (v8sf,int)
+v2df __builtin_ia32_vpermilvarpd (v2df,v2di)
+v4df __builtin_ia32_vpermilvarpd256 (v4df,v4di)
+v4sf __builtin_ia32_vpermilvarps (v4sf,v4si)
+v8sf __builtin_ia32_vpermilvarps256 (v8sf,v8si)
+int __builtin_ia32_vtestcpd (v2df,v2df,ptest)
+int __builtin_ia32_vtestcpd256 (v4df,v4df,ptest)
+int __builtin_ia32_vtestcps (v4sf,v4sf,ptest)
+int __builtin_ia32_vtestcps256 (v8sf,v8sf,ptest)
+int __builtin_ia32_vtestnzcpd (v2df,v2df,ptest)
+int __builtin_ia32_vtestnzcpd256 (v4df,v4df,ptest)
+int __builtin_ia32_vtestnzcps (v4sf,v4sf,ptest)
+int __builtin_ia32_vtestnzcps256 (v8sf,v8sf,ptest)
+int __builtin_ia32_vtestzpd (v2df,v2df,ptest)
+int __builtin_ia32_vtestzpd256 (v4df,v4df,ptest)
+int __builtin_ia32_vtestzps (v4sf,v4sf,ptest)
+int __builtin_ia32_vtestzps256 (v8sf,v8sf,ptest)
+void __builtin_ia32_vzeroall (void)
+void __builtin_ia32_vzeroupper (void)
+v4df __builtin_ia32_xorpd256 (v4df,v4df)
+v8sf __builtin_ia32_xorps256 (v8sf,v8sf)
+@end smallexample
+
+The following built-in functions are available when @option{-mavx2} is
+used. All of them generate the machine instruction that is part of the
+name.
+
+@smallexample
+v32qi __builtin_ia32_mpsadbw256 (v32qi,v32qi,int)
+v32qi __builtin_ia32_pabsb256 (v32qi)
+v16hi __builtin_ia32_pabsw256 (v16hi)
+v8si __builtin_ia32_pabsd256 (v8si)
+v16hi __builtin_ia32_packssdw256 (v8si,v8si)
+v32qi __builtin_ia32_packsswb256 (v16hi,v16hi)
+v16hi __builtin_ia32_packusdw256 (v8si,v8si)
+v32qi __builtin_ia32_packuswb256 (v16hi,v16hi)
+v32qi __builtin_ia32_paddb256 (v32qi,v32qi)
+v16hi __builtin_ia32_paddw256 (v16hi,v16hi)
+v8si __builtin_ia32_paddd256 (v8si,v8si)
+v4di __builtin_ia32_paddq256 (v4di,v4di)
+v32qi __builtin_ia32_paddsb256 (v32qi,v32qi)
+v16hi __builtin_ia32_paddsw256 (v16hi,v16hi)
+v32qi __builtin_ia32_paddusb256 (v32qi,v32qi)
+v16hi __builtin_ia32_paddusw256 (v16hi,v16hi)
+v4di __builtin_ia32_palignr256 (v4di,v4di,int)
+v4di __builtin_ia32_andsi256 (v4di,v4di)
+v4di __builtin_ia32_andnotsi256 (v4di,v4di)
+v32qi __builtin_ia32_pavgb256 (v32qi,v32qi)
+v16hi __builtin_ia32_pavgw256 (v16hi,v16hi)
+v32qi __builtin_ia32_pblendvb256 (v32qi,v32qi,v32qi)
+v16hi __builtin_ia32_pblendw256 (v16hi,v16hi,int)
+v32qi __builtin_ia32_pcmpeqb256 (v32qi,v32qi)
+v16hi __builtin_ia32_pcmpeqw256 (v16hi,v16hi)
+v8si __builtin_ia32_pcmpeqd256 (v8si,v8si)
+v4di __builtin_ia32_pcmpeqq256 (v4di,v4di)
+v32qi __builtin_ia32_pcmpgtb256 (v32qi,v32qi)
+v16hi __builtin_ia32_pcmpgtw256 (v16hi,v16hi)
+v8si __builtin_ia32_pcmpgtd256 (v8si,v8si)
+v4di __builtin_ia32_pcmpgtq256 (v4di,v4di)
+v16hi __builtin_ia32_phaddw256 (v16hi,v16hi)
+v8si __builtin_ia32_phaddd256 (v8si,v8si)
+v16hi __builtin_ia32_phaddsw256 (v16hi,v16hi)
+v16hi __builtin_ia32_phsubw256 (v16hi,v16hi)
+v8si __builtin_ia32_phsubd256 (v8si,v8si)
+v16hi __builtin_ia32_phsubsw256 (v16hi,v16hi)
+v32qi __builtin_ia32_pmaddubsw256 (v32qi,v32qi)
+v16hi __builtin_ia32_pmaddwd256 (v16hi,v16hi)
+v32qi __builtin_ia32_pmaxsb256 (v32qi,v32qi)
+v16hi __builtin_ia32_pmaxsw256 (v16hi,v16hi)
+v8si __builtin_ia32_pmaxsd256 (v8si,v8si)
+v32qi __builtin_ia32_pmaxub256 (v32qi,v32qi)
+v16hi __builtin_ia32_pmaxuw256 (v16hi,v16hi)
+v8si __builtin_ia32_pmaxud256 (v8si,v8si)
+v32qi __builtin_ia32_pminsb256 (v32qi,v32qi)
+v16hi __builtin_ia32_pminsw256 (v16hi,v16hi)
+v8si __builtin_ia32_pminsd256 (v8si,v8si)
+v32qi __builtin_ia32_pminub256 (v32qi,v32qi)
+v16hi __builtin_ia32_pminuw256 (v16hi,v16hi)
+v8si __builtin_ia32_pminud256 (v8si,v8si)
+int __builtin_ia32_pmovmskb256 (v32qi)
+v16hi __builtin_ia32_pmovsxbw256 (v16qi)
+v8si __builtin_ia32_pmovsxbd256 (v16qi)
+v4di __builtin_ia32_pmovsxbq256 (v16qi)
+v8si __builtin_ia32_pmovsxwd256 (v8hi)
+v4di __builtin_ia32_pmovsxwq256 (v8hi)
+v4di __builtin_ia32_pmovsxdq256 (v4si)
+v16hi __builtin_ia32_pmovzxbw256 (v16qi)
+v8si __builtin_ia32_pmovzxbd256 (v16qi)
+v4di __builtin_ia32_pmovzxbq256 (v16qi)
+v8si __builtin_ia32_pmovzxwd256 (v8hi)
+v4di __builtin_ia32_pmovzxwq256 (v8hi)
+v4di __builtin_ia32_pmovzxdq256 (v4si)
+v4di __builtin_ia32_pmuldq256 (v8si,v8si)
+v16hi __builtin_ia32_pmulhrsw256 (v16hi, v16hi)
+v16hi __builtin_ia32_pmulhuw256 (v16hi,v16hi)
+v16hi __builtin_ia32_pmulhw256 (v16hi,v16hi)
+v16hi __builtin_ia32_pmullw256 (v16hi,v16hi)
+v8si __builtin_ia32_pmulld256 (v8si,v8si)
+v4di __builtin_ia32_pmuludq256 (v8si,v8si)
+v4di __builtin_ia32_por256 (v4di,v4di)
+v16hi __builtin_ia32_psadbw256 (v32qi,v32qi)
+v32qi __builtin_ia32_pshufb256 (v32qi,v32qi)
+v8si __builtin_ia32_pshufd256 (v8si,int)
+v16hi __builtin_ia32_pshufhw256 (v16hi,int)
+v16hi __builtin_ia32_pshuflw256 (v16hi,int)
+v32qi __builtin_ia32_psignb256 (v32qi,v32qi)
+v16hi __builtin_ia32_psignw256 (v16hi,v16hi)
+v8si __builtin_ia32_psignd256 (v8si,v8si)
+v4di __builtin_ia32_pslldqi256 (v4di,int)
+v16hi __builtin_ia32_psllwi256 (v16hi,int)
+v16hi __builtin_ia32_psllw256 (v16hi,v8hi)
+v8si __builtin_ia32_pslldi256 (v8si,int)
+v8si __builtin_ia32_pslld256 (v8si,v4si)
+v4di __builtin_ia32_psllqi256 (v4di,int)
+v4di __builtin_ia32_psllq256 (v4di,v2di)
+v16hi __builtin_ia32_psrawi256 (v16hi,int)
+v16hi __builtin_ia32_psraw256 (v16hi,v8hi)
+v8si __builtin_ia32_psradi256 (v8si,int)
+v8si __builtin_ia32_psrad256 (v8si,v4si)
+v4di __builtin_ia32_psrldqi256 (v4di, int)
+v16hi __builtin_ia32_psrlwi256 (v16hi,int)
+v16hi __builtin_ia32_psrlw256 (v16hi,v8hi)
+v8si __builtin_ia32_psrldi256 (v8si,int)
+v8si __builtin_ia32_psrld256 (v8si,v4si)
+v4di __builtin_ia32_psrlqi256 (v4di,int)
+v4di __builtin_ia32_psrlq256 (v4di,v2di)
+v32qi __builtin_ia32_psubb256 (v32qi,v32qi)
+v16hi __builtin_ia32_psubw256 (v16hi,v16hi)
+v8si __builtin_ia32_psubd256 (v8si,v8si)
+v4di __builtin_ia32_psubq256 (v4di,v4di)
+v32qi __builtin_ia32_psubsb256 (v32qi,v32qi)
+v16hi __builtin_ia32_psubsw256 (v16hi,v16hi)
+v32qi __builtin_ia32_psubusb256 (v32qi,v32qi)
+v16hi __builtin_ia32_psubusw256 (v16hi,v16hi)
+v32qi __builtin_ia32_punpckhbw256 (v32qi,v32qi)
+v16hi __builtin_ia32_punpckhwd256 (v16hi,v16hi)
+v8si __builtin_ia32_punpckhdq256 (v8si,v8si)
+v4di __builtin_ia32_punpckhqdq256 (v4di,v4di)
+v32qi __builtin_ia32_punpcklbw256 (v32qi,v32qi)
+v16hi __builtin_ia32_punpcklwd256 (v16hi,v16hi)
+v8si __builtin_ia32_punpckldq256 (v8si,v8si)
+v4di __builtin_ia32_punpcklqdq256 (v4di,v4di)
+v4di __builtin_ia32_pxor256 (v4di,v4di)
+v4di __builtin_ia32_movntdqa256 (pv4di)
+v4sf __builtin_ia32_vbroadcastss_ps (v4sf)
+v8sf __builtin_ia32_vbroadcastss_ps256 (v4sf)
+v4df __builtin_ia32_vbroadcastsd_pd256 (v2df)
+v4di __builtin_ia32_vbroadcastsi256 (v2di)
+v4si __builtin_ia32_pblendd128 (v4si,v4si)
+v8si __builtin_ia32_pblendd256 (v8si,v8si)
+v32qi __builtin_ia32_pbroadcastb256 (v16qi)
+v16hi __builtin_ia32_pbroadcastw256 (v8hi)
+v8si __builtin_ia32_pbroadcastd256 (v4si)
+v4di __builtin_ia32_pbroadcastq256 (v2di)
+v16qi __builtin_ia32_pbroadcastb128 (v16qi)
+v8hi __builtin_ia32_pbroadcastw128 (v8hi)
+v4si __builtin_ia32_pbroadcastd128 (v4si)
+v2di __builtin_ia32_pbroadcastq128 (v2di)
+v8si __builtin_ia32_permvarsi256 (v8si,v8si)
+v4df __builtin_ia32_permdf256 (v4df,int)
+v8sf __builtin_ia32_permvarsf256 (v8sf,v8sf)
+v4di __builtin_ia32_permdi256 (v4di,int)
+v4di __builtin_ia32_permti256 (v4di,v4di,int)
+v4di __builtin_ia32_extract128i256 (v4di,int)
+v4di __builtin_ia32_insert128i256 (v4di,v2di,int)
+v8si __builtin_ia32_maskloadd256 (pcv8si,v8si)
+v4di __builtin_ia32_maskloadq256 (pcv4di,v4di)
+v4si __builtin_ia32_maskloadd (pcv4si,v4si)
+v2di __builtin_ia32_maskloadq (pcv2di,v2di)
+void __builtin_ia32_maskstored256 (pv8si,v8si,v8si)
+void __builtin_ia32_maskstoreq256 (pv4di,v4di,v4di)
+void __builtin_ia32_maskstored (pv4si,v4si,v4si)
+void __builtin_ia32_maskstoreq (pv2di,v2di,v2di)
+v8si __builtin_ia32_psllv8si (v8si,v8si)
+v4si __builtin_ia32_psllv4si (v4si,v4si)
+v4di __builtin_ia32_psllv4di (v4di,v4di)
+v2di __builtin_ia32_psllv2di (v2di,v2di)
+v8si __builtin_ia32_psrav8si (v8si,v8si)
+v4si __builtin_ia32_psrav4si (v4si,v4si)
+v8si __builtin_ia32_psrlv8si (v8si,v8si)
+v4si __builtin_ia32_psrlv4si (v4si,v4si)
+v4di __builtin_ia32_psrlv4di (v4di,v4di)
+v2di __builtin_ia32_psrlv2di (v2di,v2di)
+v2df __builtin_ia32_gathersiv2df (v2df, pcdouble,v4si,v2df,int)
+v4df __builtin_ia32_gathersiv4df (v4df, pcdouble,v4si,v4df,int)
+v2df __builtin_ia32_gatherdiv2df (v2df, pcdouble,v2di,v2df,int)
+v4df __builtin_ia32_gatherdiv4df (v4df, pcdouble,v4di,v4df,int)
+v4sf __builtin_ia32_gathersiv4sf (v4sf, pcfloat,v4si,v4sf,int)
+v8sf __builtin_ia32_gathersiv8sf (v8sf, pcfloat,v8si,v8sf,int)
+v4sf __builtin_ia32_gatherdiv4sf (v4sf, pcfloat,v2di,v4sf,int)
+v4sf __builtin_ia32_gatherdiv4sf256 (v4sf, pcfloat,v4di,v4sf,int)
+v2di __builtin_ia32_gathersiv2di (v2di, pcint64,v4si,v2di,int)
+v4di __builtin_ia32_gathersiv4di (v4di, pcint64,v4si,v4di,int)
+v2di __builtin_ia32_gatherdiv2di (v2di, pcint64,v2di,v2di,int)
+v4di __builtin_ia32_gatherdiv4di (v4di, pcint64,v4di,v4di,int)
+v4si __builtin_ia32_gathersiv4si (v4si, pcint,v4si,v4si,int)
+v8si __builtin_ia32_gathersiv8si (v8si, pcint,v8si,v8si,int)
+v4si __builtin_ia32_gatherdiv4si (v4si, pcint,v2di,v4si,int)
+v4si __builtin_ia32_gatherdiv4si256 (v4si, pcint,v4di,v4si,int)
+@end smallexample
+
+The following built-in functions are available when @option{-maes} is
+used. All of them generate the machine instruction that is part of the
+name.
+
+@smallexample
+v2di __builtin_ia32_aesenc128 (v2di, v2di)
+v2di __builtin_ia32_aesenclast128 (v2di, v2di)
+v2di __builtin_ia32_aesdec128 (v2di, v2di)
+v2di __builtin_ia32_aesdeclast128 (v2di, v2di)
+v2di __builtin_ia32_aeskeygenassist128 (v2di, const int)
+v2di __builtin_ia32_aesimc128 (v2di)
+@end smallexample
+
+The following built-in function is available when @option{-mpclmul} is
+used.
+
+@table @code
+@item v2di __builtin_ia32_pclmulqdq128 (v2di, v2di, const int)
+Generates the @code{pclmulqdq} machine instruction.
+@end table
+
+The following built-in functions are available when @option{-mfsgsbase} is
+used. All of them generate the machine instruction that is part of the
+name.
+
+@smallexample
+unsigned int __builtin_ia32_rdfsbase32 (void)
+unsigned long long __builtin_ia32_rdfsbase64 (void)
+unsigned int __builtin_ia32_rdgsbase32 (void)
+unsigned long long __builtin_ia32_rdgsbase64 (void)
+void _writefsbase_u32 (unsigned int)
+void _writefsbase_u64 (unsigned long long)
+void _writegsbase_u32 (unsigned int)
+void _writegsbase_u64 (unsigned long long)
+@end smallexample
+
+The following built-in functions are available when @option{-mrdrnd} is
+used. All of them generate the machine instruction that is part of the
+name.
+
+@smallexample
+unsigned int __builtin_ia32_rdrand16_step (unsigned short *)
+unsigned int __builtin_ia32_rdrand32_step (unsigned int *)
+unsigned int __builtin_ia32_rdrand64_step (unsigned long long *)
+@end smallexample
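
The sketch below shows one way the 32-bit step built-in might be wrapped. It is a sketch under stated assumptions, not the documented usage. It assumes the built-in returns nonzero when a random value was delivered, as the corresponding `_rdrand32_step` intrinsic does. The names `have_rdrand`, `hw_random32`, and `random32`, and the retry count of 10, are hypothetical. Instead of requiring `-mrdrnd` for the whole file, it enables the feature for one function with the `target` attribute and checks CPUID at run time (leaf 1, ECX bit 30).

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* CPUID leaf 1, ECX bit 30 reports RDRAND support. */
static int have_rdrand(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ecx >> 30) & 1;
}

/* Retry a few times: the instruction may transiently fail to deliver
   a value.  The retry count is an arbitrary choice for this sketch.  */
__attribute__((target("rdrnd")))
static int hw_random32(unsigned int *out)
{
    for (int tries = 0; tries < 10; tries++)
        if (__builtin_ia32_rdrand32_step(out))
            return 1;
    return 0;
}

/* Returns 1 and stores a value in *out on success, 0 otherwise. */
int random32(unsigned int *out)
{
    assert(out != NULL);
    return have_rdrand() && hw_random32(out);
}
#else
/* Non-x86 targets: the instruction does not exist, so always report failure. */
int random32(unsigned int *out)
{
    assert(out != NULL);
    return 0;
}
#endif
```

A caller should treat a return value of 0 as "no hardware value available" and fall back to another source of randomness.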
+
+The following built-in functions are available when @option{-msse4a} is used.
+All of them generate the machine instruction that is part of the name.
+
+@smallexample
+void __builtin_ia32_movntsd (double *, v2df)
+void __builtin_ia32_movntss (float *, v4sf)
+v2di __builtin_ia32_extrq (v2di, v16qi)
+v2di __builtin_ia32_extrqi (v2di, const unsigned int, const unsigned int)
+v2di __builtin_ia32_insertq (v2di, v2di)
+v2di __builtin_ia32_insertqi (v2di, v2di, const unsigned int, const unsigned int)
+@end smallexample
+
+The following built-in functions are available when @option{-mxop} is used.
+@smallexample
+v2df __builtin_ia32_vfrczpd (v2df)
+v4sf __builtin_ia32_vfrczps (v4sf)
+v2df __builtin_ia32_vfrczsd (v2df)
+v4sf __builtin_ia32_vfrczss (v4sf)
+v4df __builtin_ia32_vfrczpd256 (v4df)
+v8sf __builtin_ia32_vfrczps256 (v8sf)
+v2di __builtin_ia32_vpcmov (v2di, v2di, v2di)
+v2di __builtin_ia32_vpcmov_v2di (v2di, v2di, v2di)
+v4si __builtin_ia32_vpcmov_v4si (v4si, v4si, v4si)
+v8hi __builtin_ia32_vpcmov_v8hi (v8hi, v8hi, v8hi)
+v16qi __builtin_ia32_vpcmov_v16qi (v16qi, v16qi, v16qi)
+v2df __builtin_ia32_vpcmov_v2df (v2df, v2df, v2df)
+v4sf __builtin_ia32_vpcmov_v4sf (v4sf, v4sf, v4sf)
+v4di __builtin_ia32_vpcmov_v4di256 (v4di, v4di, v4di)
+v8si __builtin_ia32_vpcmov_v8si256 (v8si, v8si, v8si)
+v16hi __builtin_ia32_vpcmov_v16hi256 (v16hi, v16hi, v16hi)
+v32qi __builtin_ia32_vpcmov_v32qi256 (v32qi, v32qi, v32qi)
+v4df __builtin_ia32_vpcmov_v4df256 (v4df, v4df, v4df)
+v8sf __builtin_ia32_vpcmov_v8sf256 (v8sf, v8sf, v8sf)
+v16qi __builtin_ia32_vpcomeqb (v16qi, v16qi)
+v8hi __builtin_ia32_vpcomeqw (v8hi, v8hi)
+v4si __builtin_ia32_vpcomeqd (v4si, v4si)
+v2di __builtin_ia32_vpcomeqq (v2di, v2di)
+v16qi __builtin_ia32_vpcomequb (v16qi, v16qi)
+v4si __builtin_ia32_vpcomequd (v4si, v4si)
+v2di __builtin_ia32_vpcomequq (v2di, v2di)
+v8hi __builtin_ia32_vpcomequw (v8hi, v8hi)
+v8hi __builtin_ia32_vpcomeqw (v8hi, v8hi)
+v16qi __builtin_ia32_vpcomfalseb (v16qi, v16qi)
+v4si __builtin_ia32_vpcomfalsed (v4si, v4si)
+v2di __builtin_ia32_vpcomfalseq (v2di, v2di)
+v16qi __builtin_ia32_vpcomfalseub (v16qi, v16qi)
+v4si __builtin_ia32_vpcomfalseud (v4si, v4si)
+v2di __builtin_ia32_vpcomfalseuq (v2di, v2di)
+v8hi __builtin_ia32_vpcomfalseuw (v8hi, v8hi)
+v8hi __builtin_ia32_vpcomfalsew (v8hi, v8hi)
+v16qi __builtin_ia32_vpcomgeb (v16qi, v16qi)
+v4si __builtin_ia32_vpcomged (v4si, v4si)
+v2di __builtin_ia32_vpcomgeq (v2di, v2di)
+v16qi __builtin_ia32_vpcomgeub (v16qi, v16qi)
+v4si __builtin_ia32_vpcomgeud (v4si, v4si)
+v2di __builtin_ia32_vpcomgeuq (v2di, v2di)
+v8hi __builtin_ia32_vpcomgeuw (v8hi, v8hi)
+v8hi __builtin_ia32_vpcomgew (v8hi, v8hi)
+v16qi __builtin_ia32_vpcomgtb (v16qi, v16qi)
+v4si __builtin_ia32_vpcomgtd (v4si, v4si)
+v2di __builtin_ia32_vpcomgtq (v2di, v2di)
+v16qi __builtin_ia32_vpcomgtub (v16qi, v16qi)
+v4si __builtin_ia32_vpcomgtud (v4si, v4si)
+v2di __builtin_ia32_vpcomgtuq (v2di, v2di)
+v8hi __builtin_ia32_vpcomgtuw (v8hi, v8hi)
+v8hi __builtin_ia32_vpcomgtw (v8hi, v8hi)
+v16qi __builtin_ia32_vpcomleb (v16qi, v16qi)
+v4si __builtin_ia32_vpcomled (v4si, v4si)
+v2di __builtin_ia32_vpcomleq (v2di, v2di)
+v16qi __builtin_ia32_vpcomleub (v16qi, v16qi)
+v4si __builtin_ia32_vpcomleud (v4si, v4si)
+v2di __builtin_ia32_vpcomleuq (v2di, v2di)
+v8hi __builtin_ia32_vpcomleuw (v8hi, v8hi)
+v8hi __builtin_ia32_vpcomlew (v8hi, v8hi)
+v16qi __builtin_ia32_vpcomltb (v16qi, v16qi)
+v4si __builtin_ia32_vpcomltd (v4si, v4si)
+v2di __builtin_ia32_vpcomltq (v2di, v2di)
+v16qi __builtin_ia32_vpcomltub (v16qi, v16qi)
+v4si __builtin_ia32_vpcomltud (v4si, v4si)
+v2di __builtin_ia32_vpcomltuq (v2di, v2di)
+v8hi __builtin_ia32_vpcomltuw (v8hi, v8hi)
+v8hi __builtin_ia32_vpcomltw (v8hi, v8hi)
+v16qi __builtin_ia32_vpcomneb (v16qi, v16qi)
+v4si __builtin_ia32_vpcomned (v4si, v4si)
+v2di __builtin_ia32_vpcomneq (v2di, v2di)
+v16qi __builtin_ia32_vpcomneub (v16qi, v16qi)
+v4si __builtin_ia32_vpcomneud (v4si, v4si)
+v2di __builtin_ia32_vpcomneuq (v2di, v2di)
+v8hi __builtin_ia32_vpcomneuw (v8hi, v8hi)
+v8hi __builtin_ia32_vpcomnew (v8hi, v8hi)
+v16qi __builtin_ia32_vpcomtrueb (v16qi, v16qi)
+v4si __builtin_ia32_vpcomtrued (v4si, v4si)
+v2di __builtin_ia32_vpcomtrueq (v2di, v2di)
+v16qi __builtin_ia32_vpcomtrueub (v16qi, v16qi)
+v4si __builtin_ia32_vpcomtrueud (v4si, v4si)
+v2di __builtin_ia32_vpcomtrueuq (v2di, v2di)
+v8hi __builtin_ia32_vpcomtrueuw (v8hi, v8hi)
+v8hi __builtin_ia32_vpcomtruew (v8hi, v8hi)
+v4si __builtin_ia32_vphaddbd (v16qi)
+v2di __builtin_ia32_vphaddbq (v16qi)
+v8hi __builtin_ia32_vphaddbw (v16qi)
+v2di __builtin_ia32_vphadddq (v4si)
+v4si __builtin_ia32_vphaddubd (v16qi)
+v2di __builtin_ia32_vphaddubq (v16qi)
+v8hi __builtin_ia32_vphaddubw (v16qi)
+v2di __builtin_ia32_vphaddudq (v4si)
+v4si __builtin_ia32_vphadduwd (v8hi)
+v2di __builtin_ia32_vphadduwq (v8hi)
+v4si __builtin_ia32_vphaddwd (v8hi)
+v2di __builtin_ia32_vphaddwq (v8hi)
+v8hi __builtin_ia32_vphsubbw (v16qi)
+v2di __builtin_ia32_vphsubdq (v4si)
+v4si __builtin_ia32_vphsubwd (v8hi)
+v4si __builtin_ia32_vpmacsdd (v4si, v4si, v4si)
+v2di __builtin_ia32_vpmacsdqh (v4si, v4si, v2di)
+v2di __builtin_ia32_vpmacsdql (v4si, v4si, v2di)
+v4si __builtin_ia32_vpmacssdd (v4si, v4si, v4si)
+v2di __builtin_ia32_vpmacssdqh (v4si, v4si, v2di)
+v2di __builtin_ia32_vpmacssdql (v4si, v4si, v2di)
+v4si __builtin_ia32_vpmacsswd (v8hi, v8hi, v4si)
+v8hi __builtin_ia32_vpmacssww (v8hi, v8hi, v8hi)
+v4si __builtin_ia32_vpmacswd (v8hi, v8hi, v4si)
+v8hi __builtin_ia32_vpmacsww (v8hi, v8hi, v8hi)
+v4si __builtin_ia32_vpmadcsswd (v8hi, v8hi, v4si)
+v4si __builtin_ia32_vpmadcswd (v8hi, v8hi, v4si)
+v16qi __builtin_ia32_vpperm (v16qi, v16qi, v16qi)
+v16qi __builtin_ia32_vprotb (v16qi, v16qi)
+v4si __builtin_ia32_vprotd (v4si, v4si)
+v2di __builtin_ia32_vprotq (v2di, v2di)
+v8hi __builtin_ia32_vprotw (v8hi, v8hi)
+v16qi __builtin_ia32_vpshab (v16qi, v16qi)
+v4si __builtin_ia32_vpshad (v4si, v4si)
+v2di __builtin_ia32_vpshaq (v2di, v2di)
+v8hi __builtin_ia32_vpshaw (v8hi, v8hi)
+v16qi __builtin_ia32_vpshlb (v16qi, v16qi)
+v4si __builtin_ia32_vpshld (v4si, v4si)
+v2di __builtin_ia32_vpshlq (v2di, v2di)
+v8hi __builtin_ia32_vpshlw (v8hi, v8hi)
+@end smallexample
+
+The following built-in functions are available when @option{-mfma4} is used.
+All of them generate the machine instruction that is part of the name.
+
+@smallexample
+v2df __builtin_ia32_vfmaddpd (v2df, v2df, v2df)
+v4sf __builtin_ia32_vfmaddps (v4sf, v4sf, v4sf)
+v2df __builtin_ia32_vfmaddsd (v2df, v2df, v2df)
+v4sf __builtin_ia32_vfmaddss (v4sf, v4sf, v4sf)
+v2df __builtin_ia32_vfmsubpd (v2df, v2df, v2df)
+v4sf __builtin_ia32_vfmsubps (v4sf, v4sf, v4sf)
+v2df __builtin_ia32_vfmsubsd (v2df, v2df, v2df)
+v4sf __builtin_ia32_vfmsubss (v4sf, v4sf, v4sf)
+v2df __builtin_ia32_vfnmaddpd (v2df, v2df, v2df)
+v4sf __builtin_ia32_vfnmaddps (v4sf, v4sf, v4sf)
+v2df __builtin_ia32_vfnmaddsd (v2df, v2df, v2df)
+v4sf __builtin_ia32_vfnmaddss (v4sf, v4sf, v4sf)
+v2df __builtin_ia32_vfnmsubpd (v2df, v2df, v2df)
+v4sf __builtin_ia32_vfnmsubps (v4sf, v4sf, v4sf)
+v2df __builtin_ia32_vfnmsubsd (v2df, v2df, v2df)
+v4sf __builtin_ia32_vfnmsubss (v4sf, v4sf, v4sf)
+v2df __builtin_ia32_vfmaddsubpd (v2df, v2df, v2df)
+v4sf __builtin_ia32_vfmaddsubps (v4sf, v4sf, v4sf)
+v2df __builtin_ia32_vfmsubaddpd (v2df, v2df, v2df)
+v4sf __builtin_ia32_vfmsubaddps (v4sf, v4sf, v4sf)
+v4df __builtin_ia32_vfmaddpd256 (v4df, v4df, v4df)
+v8sf __builtin_ia32_vfmaddps256 (v8sf, v8sf, v8sf)
+v4df __builtin_ia32_vfmsubpd256 (v4df, v4df, v4df)
+v8sf __builtin_ia32_vfmsubps256 (v8sf, v8sf, v8sf)
+v4df __builtin_ia32_vfnmaddpd256 (v4df, v4df, v4df)
+v8sf __builtin_ia32_vfnmaddps256 (v8sf, v8sf, v8sf)
+v4df __builtin_ia32_vfnmsubpd256 (v4df, v4df, v4df)
+v8sf __builtin_ia32_vfnmsubps256 (v8sf, v8sf, v8sf)
+v4df __builtin_ia32_vfmaddsubpd256 (v4df, v4df, v4df)
+v8sf __builtin_ia32_vfmaddsubps256 (v8sf, v8sf, v8sf)
+v4df __builtin_ia32_vfmsubaddpd256 (v4df, v4df, v4df)
+v8sf __builtin_ia32_vfmsubaddps256 (v8sf, v8sf, v8sf)
+
+@end smallexample
+
+The following built-in functions are available when @option{-mlwp} is used.
+
+@smallexample
+void __builtin_ia32_llwpcb16 (void *);
+void __builtin_ia32_llwpcb32 (void *);
+void __builtin_ia32_llwpcb64 (void *);
+void * __builtin_ia32_llwpcb16 (void);
+void * __builtin_ia32_llwpcb32 (void);
+void * __builtin_ia32_llwpcb64 (void);
+void __builtin_ia32_lwpval16 (unsigned short, unsigned int, unsigned short)
+void __builtin_ia32_lwpval32 (unsigned int, unsigned int, unsigned int)
+void __builtin_ia32_lwpval64 (unsigned __int64, unsigned int, unsigned int)
+unsigned char __builtin_ia32_lwpins16 (unsigned short, unsigned int, unsigned short)
+unsigned char __builtin_ia32_lwpins32 (unsigned int, unsigned int, unsigned int)
+unsigned char __builtin_ia32_lwpins64 (unsigned __int64, unsigned int, unsigned int)
+@end smallexample
+
+The following built-in functions are available when @option{-mbmi} is used.
+All of them generate the machine instruction that is part of the name.
+@smallexample
+unsigned int __builtin_ia32_bextr_u32 (unsigned int, unsigned int);
+unsigned long long __builtin_ia32_bextr_u64 (unsigned long long, unsigned long long);
+@end smallexample
+
+The following built-in functions are available when @option{-mbmi2} is used.
+All of them generate the machine instruction that is part of the name.
+@smallexample
+unsigned int _bzhi_u32 (unsigned int, unsigned int)
+unsigned int _pdep_u32 (unsigned int, unsigned int)
+unsigned int _pext_u32 (unsigned int, unsigned int)
+unsigned long long _bzhi_u64 (unsigned long long, unsigned long long)
+unsigned long long _pdep_u64 (unsigned long long, unsigned long long)
+unsigned long long _pext_u64 (unsigned long long, unsigned long long)
+@end smallexample
+
+The following built-in functions are available when @option{-mlzcnt} is used.
+All of them generate the machine instruction that is part of the name.
+@smallexample
+unsigned short __builtin_ia32_lzcnt_u16 (unsigned short);
+unsigned int __builtin_ia32_lzcnt_u32 (unsigned int);
+unsigned long long __builtin_ia32_lzcnt_u64 (unsigned long long);
+@end smallexample
+
+The following built-in functions are available when @option{-mfxsr} is used.
+All of them generate the machine instruction that is part of the name.
+@smallexample
+void __builtin_ia32_fxsave (void *)
+void __builtin_ia32_fxrstor (void *)
+void __builtin_ia32_fxsave64 (void *)
+void __builtin_ia32_fxrstor64 (void *)
+@end smallexample
+
+The following built-in functions are available when @option{-mxsave} is used.
+All of them generate the machine instruction that is part of the name.
+@smallexample
+void __builtin_ia32_xsave (void *, long long)
+void __builtin_ia32_xrstor (void *, long long)
+void __builtin_ia32_xsave64 (void *, long long)
+void __builtin_ia32_xrstor64 (void *, long long)
+@end smallexample
+
+The following built-in functions are available when @option{-mxsaveopt} is used.
+All of them generate the machine instruction that is part of the name.
+@smallexample
+void __builtin_ia32_xsaveopt (void *, long long)
+void __builtin_ia32_xsaveopt64 (void *, long long)
+@end smallexample
+
+The following built-in functions are available when @option{-mtbm} is used.
+Both of them generate the immediate form of the @code{bextr} machine instruction.
+@smallexample
+unsigned int __builtin_ia32_bextri_u32 (unsigned int, const unsigned int);
+unsigned long long __builtin_ia32_bextri_u64 (unsigned long long, const unsigned long long);
+@end smallexample
+
+
+The following built-in functions are available when @option{-m3dnow} is used.
+All of them generate the machine instruction that is part of the name.
+
+@smallexample
+void __builtin_ia32_femms (void)
+v8qi __builtin_ia32_pavgusb (v8qi, v8qi)
+v2si __builtin_ia32_pf2id (v2sf)
+v2sf __builtin_ia32_pfacc (v2sf, v2sf)
+v2sf __builtin_ia32_pfadd (v2sf, v2sf)
+v2si __builtin_ia32_pfcmpeq (v2sf, v2sf)
+v2si __builtin_ia32_pfcmpge (v2sf, v2sf)
+v2si __builtin_ia32_pfcmpgt (v2sf, v2sf)
+v2sf __builtin_ia32_pfmax (v2sf, v2sf)
+v2sf __builtin_ia32_pfmin (v2sf, v2sf)
+v2sf __builtin_ia32_pfmul (v2sf, v2sf)
+v2sf __builtin_ia32_pfrcp (v2sf)
+v2sf __builtin_ia32_pfrcpit1 (v2sf, v2sf)
+v2sf __builtin_ia32_pfrcpit2 (v2sf, v2sf)
+v2sf __builtin_ia32_pfrsqrt (v2sf)
+v2sf __builtin_ia32_pfsub (v2sf, v2sf)
+v2sf __builtin_ia32_pfsubr (v2sf, v2sf)
+v2sf __builtin_ia32_pi2fd (v2si)
+v4hi __builtin_ia32_pmulhrw (v4hi, v4hi)
+@end smallexample
+
+The following built-in functions are available when both @option{-m3dnow}
+and @option{-march=athlon} are used. All of them generate the machine
+instruction that is part of the name.
+
+@smallexample
+v2si __builtin_ia32_pf2iw (v2sf)
+v2sf __builtin_ia32_pfnacc (v2sf, v2sf)
+v2sf __builtin_ia32_pfpnacc (v2sf, v2sf)
+v2sf __builtin_ia32_pi2fw (v2si)
+v2sf __builtin_ia32_pswapdsf (v2sf)
+v2si __builtin_ia32_pswapdsi (v2si)
+@end smallexample
+
+The following built-in functions are available when @option{-mrtm} is used.
+They are used for restricted transactional memory. These are the internal
+low-level functions. Normally the functions in
+@ref{x86 transactional memory intrinsics} should be used instead.
+
+@smallexample
+int __builtin_ia32_xbegin ()
+void __builtin_ia32_xend ()
+void __builtin_ia32_xabort (status)
+int __builtin_ia32_xtest ()
+@end smallexample
+
+@node x86 transactional memory intrinsics
+@subsection x86 transactional memory intrinsics
+
+Hardware transactional memory intrinsics for x86. These allow you to use
+memory transactions with RTM (Restricted Transactional Memory).
+To use HLE (Hardware Lock Elision), see @ref{x86 specific memory model extensions for transactional memory} instead.
+This support is enabled with the @option{-mrtm} option.
+
+A memory transaction commits all changes to memory in an atomic way,
+as visible to other threads. If the transaction fails, it is rolled back
+and all side effects are discarded.
+
+Generally there is no guarantee that a memory transaction ever succeeds,
+so suitable fallback code always needs to be supplied.
+
+@deftypefn {RTM Function} {unsigned} _xbegin ()
+Start an RTM (Restricted Transactional Memory) transaction.
+Returns @code{_XBEGIN_STARTED} when the transaction
+started successfully (note this is not 0, so the constant has to be
+explicitly tested). When the transaction aborts, all side effects
+are undone and an abort code is returned. There is no guarantee that
+any transaction ever succeeds, so there always needs to be a valid,
+tested fallback path.
+@end deftypefn
+
+@smallexample
+#include <immintrin.h>
+
+unsigned status;
+
+if ((status = _xbegin ()) == _XBEGIN_STARTED) @{
+ /* ... transaction code ... */
+ _xend ();
+@} else @{
+ /* ... non-transactional fallback path; status holds the abort code ... */
+@}
+@end smallexample
+
+Valid abort status bits (when the value is not @code{_XBEGIN_STARTED}) are:
+
+@table @code
+@item _XABORT_EXPLICIT
+Transaction explicitly aborted with @code{_xabort}. The parameter passed
+to @code{_xabort} is available with @code{_XABORT_CODE(status)}.
+@item _XABORT_RETRY
+Transaction retry is possible.
+@item _XABORT_CONFLICT
+Transaction aborted due to a memory conflict with another thread.
+@item _XABORT_CAPACITY
+Transaction aborted due to the transaction using too much memory.
+@item _XABORT_DEBUG
+Transaction aborted due to a debug trap.
+@item _XABORT_NESTED
+Transaction aborted in an inner nested transaction.
+@end table
+
+@deftypefn {RTM Function} {void} _xend ()
+Commit the current transaction. When no transaction is active this
+faults. All memory side effects of the transaction become visible
+to other threads in an atomic manner.
+@end deftypefn
+
+@deftypefn {RTM Function} {int} _xtest ()
+Return a nonzero value when a transaction is currently active, otherwise 0.
+@end deftypefn
+
+@deftypefn {RTM Function} {void} _xabort (status)
+Abort the current transaction. When no transaction is active this is a no-op.
+@var{status} must be an 8-bit constant; it is included in the status code
+returned by @code{_xbegin}.
+@end deftypefn
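
The intrinsics described above can be combined into a complete retry-then-fallback routine. The following is a minimal sketch, not a recommended implementation. The names `rtm_available`, `increment_in_transaction`, and `counter_increment`, the limit of three attempts, and the choice of an atomic add as the fallback are all assumptions made for illustration. It enables RTM for one function through the `target` attribute and checks CPUID at run time (leaf 7, EBX bit 11), so the file does not need to be built with `-mrtm`.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#include <immintrin.h>

/* CPUID leaf 7, subleaf 0, EBX bit 11 reports RTM support. */
static int rtm_available(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ebx >> 11) & 1;
}

/* Try the increment inside a transaction.
   Returns 1 if a transaction committed, 0 if every attempt aborted.  */
__attribute__((target("rtm")))
static int increment_in_transaction(long *counter, int max_attempts)
{
    for (int i = 0; i < max_attempts; i++) {
        unsigned int status = _xbegin();
        if (status == _XBEGIN_STARTED) {
            ++*counter;          /* transactional region */
            _xend();             /* commit */
            return 1;
        }
        /* Aborted: retry only if the hardware says a retry may succeed. */
        if (!(status & _XABORT_RETRY))
            break;
    }
    return 0;
}
#else
/* Non-x86 targets: no RTM, always take the fallback path. */
static int rtm_available(void) { return 0; }
static int increment_in_transaction(long *counter, int max_attempts)
{
    (void) counter;
    (void) max_attempts;
    return 0;
}
#endif

/* Increment *counter, using a transaction when possible and an
   atomic add as the non-transactional fallback path.  */
void counter_increment(long *counter)
{
    assert(counter != NULL);
    if (rtm_available() && increment_in_transaction(counter, 3))
        return;
    __atomic_fetch_add(counter, 1, __ATOMIC_SEQ_CST);
}
```

The fallback always produces the same result as a committed transaction, so the routine behaves identically on hardware without RTM or where every transaction aborts. Real code protecting more than one word would more typically fall back to a lock that the transaction also reads.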
+
@node Target Format Checks
@section Format Checks Specific to Particular Target Machines
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 94ca947..ba81ec7 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -676,44 +676,6 @@ Objective-C and Objective-C++ Dialects}.
-mschedule=@var{cpu-type} -mspace-regs -msio -mwsio @gol
-munix=@var{unix-std} -nolibdld -static -threads}
-@emph{x86 Options}
-@gccoptlist{-mtune=@var{cpu-type} -march=@var{cpu-type} @gol
--mtune-ctrl=@var{feature-list} -mdump-tune-features -mno-default @gol
--mfpmath=@var{unit} @gol
--masm=@var{dialect} -mno-fancy-math-387 @gol
--mno-fp-ret-in-387 -msoft-float @gol
--mno-wide-multiply -mrtd -malign-double @gol
--mpreferred-stack-boundary=@var{num} @gol
--mincoming-stack-boundary=@var{num} @gol
--mcld -mcx16 -msahf -mmovbe -mcrc32 @gol
--mrecip -mrecip=@var{opt} @gol
--mvzeroupper -mprefer-avx128 @gol
--mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 -mavx @gol
--mavx2 -mavx512f -mavx512pf -mavx512er -mavx512cd -msha @gol
--maes -mpclmul -mfsgsbase -mrdrnd -mf16c -mfma -mprefetchwt1 @gol
--mclflushopt -mxsavec -mxsaves @gol
--msse4a -m3dnow -mpopcnt -mabm -mbmi -mtbm -mfma4 -mxop -mlzcnt @gol
--mbmi2 -mfxsr -mxsave -mxsaveopt -mrtm -mlwp -mmpx -mthreads @gol
--mno-align-stringops -minline-all-stringops @gol
--minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol
--mmemcpy-strategy=@var{strategy} -mmemset-strategy=@var{strategy} @gol
--mpush-args -maccumulate-outgoing-args -m128bit-long-double @gol
--m96bit-long-double -mlong-double-64 -mlong-double-80 -mlong-double-128 @gol
--mregparm=@var{num} -msseregparm @gol
--mveclibabi=@var{type} -mvect8-ret-in-mem @gol
--mpc32 -mpc64 -mpc80 -mstackrealign @gol
--momit-leaf-frame-pointer -mno-red-zone -mno-tls-direct-seg-refs @gol
--mcmodel=@var{code-model} -mabi=@var{name} -maddress-mode=@var{mode} @gol
--m32 -m64 -mx32 -m16 -mlarge-data-threshold=@var{num} @gol
--msse2avx -mfentry -mrecord-mcount -mnop-mcount -m8bit-idiv @gol
--mavx256-split-unaligned-load -mavx256-split-unaligned-store @gol
--malign-data=@var{type} -mstack-protector-guard=@var{guard}}
-
-@emph{x86 Windows Options}
-@gccoptlist{-mconsole -mcygwin -mno-cygwin -mdll @gol
--mnop-fun-dllimport -mthread @gol
--municode -mwin32 -mwindows -fno-set-stack-executable}
-
@emph{IA-64 Options}
@gccoptlist{-mbig-endian -mlittle-endian -mgnu-as -mgnu-ld -mno-pic @gol
-mvolatile-asm-stop -mregister-names -msdata -mno-sdata @gol
@@ -1081,6 +1043,44 @@ See RS/6000 and PowerPC Options.
@gccoptlist{-mrtp -non-static -Bstatic -Bdynamic @gol
-Xbind-lazy -Xbind-now}
+@emph{x86 Options}
+@gccoptlist{-mtune=@var{cpu-type} -march=@var{cpu-type} @gol
+-mtune-ctrl=@var{feature-list} -mdump-tune-features -mno-default @gol
+-mfpmath=@var{unit} @gol
+-masm=@var{dialect} -mno-fancy-math-387 @gol
+-mno-fp-ret-in-387 -msoft-float @gol
+-mno-wide-multiply -mrtd -malign-double @gol
+-mpreferred-stack-boundary=@var{num} @gol
+-mincoming-stack-boundary=@var{num} @gol
+-mcld -mcx16 -msahf -mmovbe -mcrc32 @gol
+-mrecip -mrecip=@var{opt} @gol
+-mvzeroupper -mprefer-avx128 @gol
+-mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 -mavx @gol
+-mavx2 -mavx512f -mavx512pf -mavx512er -mavx512cd -msha @gol
+-maes -mpclmul -mfsgsbase -mrdrnd -mf16c -mfma -mprefetchwt1 @gol
+-mclflushopt -mxsavec -mxsaves @gol
+-msse4a -m3dnow -mpopcnt -mabm -mbmi -mtbm -mfma4 -mxop -mlzcnt @gol
+-mbmi2 -mfxsr -mxsave -mxsaveopt -mrtm -mlwp -mmpx -mthreads @gol
+-mno-align-stringops -minline-all-stringops @gol
+-minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol
+-mmemcpy-strategy=@var{strategy} -mmemset-strategy=@var{strategy} @gol
+-mpush-args -maccumulate-outgoing-args -m128bit-long-double @gol
+-m96bit-long-double -mlong-double-64 -mlong-double-80 -mlong-double-128 @gol
+-mregparm=@var{num} -msseregparm @gol
+-mveclibabi=@var{type} -mvect8-ret-in-mem @gol
+-mpc32 -mpc64 -mpc80 -mstackrealign @gol
+-momit-leaf-frame-pointer -mno-red-zone -mno-tls-direct-seg-refs @gol
+-mcmodel=@var{code-model} -mabi=@var{name} -maddress-mode=@var{mode} @gol
+-m32 -m64 -mx32 -m16 -mlarge-data-threshold=@var{num} @gol
+-msse2avx -mfentry -mrecord-mcount -mnop-mcount -m8bit-idiv @gol
+-mavx256-split-unaligned-load -mavx256-split-unaligned-store @gol
+-malign-data=@var{type} -mstack-protector-guard=@var{guard}}
+
+@emph{x86 Windows Options}
+@gccoptlist{-mconsole -mcygwin -mno-cygwin -mdll @gol
+-mnop-fun-dllimport -mthread @gol
+-municode -mwin32 -mwindows -fno-set-stack-executable}
+
@emph{Xstormy16 Options}
@gccoptlist{-msim}
@@ -11952,8 +11952,6 @@ platform.
* GNU/Linux Options::
* H8/300 Options::
* HPPA Options::
-* x86 Options::
-* x86 Windows Options::
* IA-64 Options::
* LM32 Options::
* M32C Options::
@@ -11989,6 +11987,8 @@ platform.
* Visium Options::
* VMS Options::
* VxWorks Options::
+* x86 Options::
+* x86 Windows Options::
* Xstormy16 Options::
* Xtensa Options::
* zSeries Options::
@@ -15361,1223 +15361,6 @@ under HP-UX@. This option sets flags for both the preprocessor and
linker.
@end table
-@node x86 Options
-@subsection x86 Options
-@cindex x86 Options
-
-These @samp{-m} options are defined for the x86 family of computers.
-
-@table @gcctabopt
-
-@item -march=@var{cpu-type}
-@opindex march
-Generate instructions for the machine type @var{cpu-type}. In contrast to
-@option{-mtune=@var{cpu-type}}, which merely tunes the generated code
-for the specified @var{cpu-type}, @option{-march=@var{cpu-type}} allows GCC
-to generate code that may not run at all on processors other than the one
-indicated. Specifying @option{-march=@var{cpu-type}} implies
-@option{-mtune=@var{cpu-type}}.
-
-The choices for @var{cpu-type} are:
-
-@table @samp
-@item native
-This selects the CPU to generate code for at compilation time by determining
-the processor type of the compiling machine. Using @option{-march=native}
-enables all instruction subsets supported by the local machine (hence
-the result might not run on different machines). Using @option{-mtune=native}
-produces code optimized for the local machine under the constraints
-of the selected instruction set.
-
-@item i386
-Original Intel i386 CPU@.
-
-@item i486
-Intel i486 CPU@. (No scheduling is implemented for this chip.)
-
-@item i586
-@itemx pentium
-Intel Pentium CPU with no MMX support.
-
-@item pentium-mmx
-Intel Pentium MMX CPU, based on Pentium core with MMX instruction set support.
-
-@item pentiumpro
-Intel Pentium Pro CPU@.
-
-@item i686
-When used with @option{-march}, the Pentium Pro
-instruction set is used, so the code runs on all i686 family chips.
-When used with @option{-mtune}, it has the same meaning as @samp{generic}.
-
-@item pentium2
-Intel Pentium II CPU, based on Pentium Pro core with MMX instruction set
-support.
-
-@item pentium3
-@itemx pentium3m
-Intel Pentium III CPU, based on Pentium Pro core with MMX and SSE instruction
-set support.
-
-@item pentium-m
-Intel Pentium M; low-power version of Intel Pentium III CPU
-with MMX, SSE and SSE2 instruction set support. Used by Centrino notebooks.
-
-@item pentium4
-@itemx pentium4m
-Intel Pentium 4 CPU with MMX, SSE and SSE2 instruction set support.
-
-@item prescott
-Improved version of Intel Pentium 4 CPU with MMX, SSE, SSE2 and SSE3 instruction
-set support.
-
-@item nocona
-Improved version of Intel Pentium 4 CPU with 64-bit extensions, MMX, SSE,
-SSE2 and SSE3 instruction set support.
-
-@item core2
-Intel Core 2 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3 and SSSE3
-instruction set support.
-
-@item nehalem
-Intel Nehalem CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3,
-SSE4.1, SSE4.2 and POPCNT instruction set support.
-
-@item westmere
-Intel Westmere CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3,
-SSE4.1, SSE4.2, POPCNT, AES and PCLMUL instruction set support.
-
-@item sandybridge
-Intel Sandy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3,
-SSE4.1, SSE4.2, POPCNT, AVX, AES and PCLMUL instruction set support.
-
-@item ivybridge
-Intel Ivy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3,
-SSE4.1, SSE4.2, POPCNT, AVX, AES, PCLMUL, FSGSBASE, RDRND and F16C
-instruction set support.
-
-@item haswell
-Intel Haswell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3,
-SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA,
-BMI, BMI2 and F16C instruction set support.
-
-@item broadwell
-Intel Broadwell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3,
-SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA,
-BMI, BMI2, F16C, RDSEED, ADCX and PREFETCHW instruction set support.
-
-@item bonnell
-Intel Bonnell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3 and SSSE3
-instruction set support.
-
-@item silvermont
-Intel Silvermont CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3,
-SSE4.1, SSE4.2, POPCNT, AES, PCLMUL and RDRND instruction set support.
-
-@item k6
-AMD K6 CPU with MMX instruction set support.
-
-@item k6-2
-@itemx k6-3
-Improved versions of AMD K6 CPU with MMX and 3DNow!@: instruction set support.
-
-@item athlon
-@itemx athlon-tbird
-AMD Athlon CPU with MMX, 3DNow!, enhanced 3DNow!@: and SSE prefetch instructions
-support.
-
-@item athlon-4
-@itemx athlon-xp
-@itemx athlon-mp
-Improved AMD Athlon CPU with MMX, 3DNow!, enhanced 3DNow!@: and full SSE
-instruction set support.
-
-@item k8
-@itemx opteron
-@itemx athlon64
-@itemx athlon-fx
-Processors based on the AMD K8 core with x86-64 instruction set support,
-including the AMD Opteron, Athlon 64, and Athlon 64 FX processors.
-(This supersets MMX, SSE, SSE2, 3DNow!, enhanced 3DNow!@: and 64-bit
-instruction set extensions.)
-
-@item k8-sse3
-@itemx opteron-sse3
-@itemx athlon64-sse3
-Improved versions of AMD K8 cores with SSE3 instruction set support.
-
-@item amdfam10
-@itemx barcelona
-CPUs based on AMD Family 10h cores with x86-64 instruction set support. (This
-supersets MMX, SSE, SSE2, SSE3, SSE4A, 3DNow!, enhanced 3DNow!, ABM and 64-bit
-instruction set extensions.)
-
-@item bdver1
-CPUs based on AMD Family 15h cores with x86-64 instruction set support. (This
-supersets FMA4, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX, SSE, SSE2, SSE3, SSE4A,
-SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set extensions.)
-@item bdver2
-AMD Family 15h core based CPUs with x86-64 instruction set support. (This
-supersets BMI, TBM, F16C, FMA, FMA4, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX,
-SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set
-extensions.)
-@item bdver3
-AMD Family 15h core based CPUs with x86-64 instruction set support. (This
-supersets BMI, TBM, F16C, FMA, FMA4, FSGSBASE, AVX, XOP, LWP, AES,
-PCL_MUL, CX16, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and
-64-bit instruction set extensions.)
-@item bdver4
-AMD Family 15h core based CPUs with x86-64 instruction set support. (This
-supersets BMI, BMI2, TBM, F16C, FMA, FMA4, FSGSBASE, AVX, AVX2, XOP, LWP,
-AES, PCL_MUL, CX16, MOVBE, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1,
-SSE4.2, ABM and 64-bit instruction set extensions.)
-
-@item btver1
-CPUs based on AMD Family 14h cores with x86-64 instruction set support. (This
-supersets MMX, SSE, SSE2, SSE3, SSSE3, SSE4A, CX16, ABM and 64-bit
-instruction set extensions.)
-
-@item btver2
-CPUs based on AMD Family 16h cores with x86-64 instruction set support. This
-includes MOVBE, F16C, BMI, AVX, PCL_MUL, AES, SSE4.2, SSE4.1, CX16, ABM,
-SSE4A, SSSE3, SSE3, SSE2, SSE, MMX and 64-bit instruction set extensions.
-
-@item winchip-c6
-IDT WinChip C6 CPU, handled in the same way as i486 with additional MMX instruction
-set support.
-
-@item winchip2
-IDT WinChip 2 CPU, handled in the same way as i486 with additional MMX and 3DNow!@:
-instruction set support.
-
-@item c3
-VIA C3 CPU with MMX and 3DNow!@: instruction set support. (No scheduling is
-implemented for this chip.)
-
-@item c3-2
-VIA C3-2 (Nehemiah/C5XL) CPU with MMX and SSE instruction set support.
-(No scheduling is
-implemented for this chip.)
-
-@item geode
-AMD Geode embedded processor with MMX and 3DNow!@: instruction set support.
-@end table
-
-@item -mtune=@var{cpu-type}
-@opindex mtune
-Tune to @var{cpu-type} everything applicable about the generated code, except
-for the ABI and the set of available instructions.
-While picking a specific @var{cpu-type} schedules things appropriately
-for that particular chip, the compiler does not generate any code that
-cannot run on the default machine type unless you use a
-@option{-march=@var{cpu-type}} option.
-For example, if GCC is configured for i686-pc-linux-gnu
-then @option{-mtune=pentium4} generates code that is tuned for Pentium 4
-but still runs on i686 machines.
-
-The choices for @var{cpu-type} are the same as for @option{-march}.
-In addition, @option{-mtune} supports 2 extra choices for @var{cpu-type}:
-
-@table @samp
-@item generic
-Produce code optimized for the most common IA32/@/AMD64/@/EM64T processors.
-If you know the CPU on which your code will run, then you should use
-the corresponding @option{-mtune} or @option{-march} option instead of
-@option{-mtune=generic}. But, if you do not know exactly what CPU users
-of your application will have, then you should use this option.
-
-As new processors are deployed in the marketplace, the behavior of this
-option will change. Therefore, if you upgrade to a newer version of
-GCC, code generation controlled by this option will change to reflect
-the processors
-that are most common at the time that version of GCC is released.
-
-There is no @option{-march=generic} option because @option{-march}
-indicates the instruction set the compiler can use, and there is no
-generic instruction set applicable to all processors. In contrast,
-@option{-mtune} indicates the processor (or, in this case, collection of
-processors) for which the code is optimized.
-
-@item intel
-Produce code optimized for the most current Intel processors, which are
-Haswell and Silvermont for this version of GCC. If you know the CPU
-on which your code will run, then you should use the corresponding
-@option{-mtune} or @option{-march} option instead of @option{-mtune=intel}.
-But, if you want your application to perform better on both Haswell and
-Silvermont, then you should use this option.
-
-As new Intel processors are deployed in the marketplace, the behavior of
-this option will change. Therefore, if you upgrade to a newer version of
-GCC, code generation controlled by this option will change to reflect
-the most current Intel processors at the time that version of GCC is
-released.
-
-There is no @option{-march=intel} option because @option{-march} indicates
-the instruction set the compiler can use, and there is no common
-instruction set applicable to all processors. In contrast,
-@option{-mtune} indicates the processor (or, in this case, collection of
-processors) for which the code is optimized.
-@end table
-
-@item -mcpu=@var{cpu-type}
-@opindex mcpu
-A deprecated synonym for @option{-mtune}.
-
-@item -mfpmath=@var{unit}
-@opindex mfpmath
-Generate floating-point arithmetic for selected unit @var{unit}. The choices
-for @var{unit} are:
-
-@table @samp
-@item 387
-Use the standard 387 floating-point coprocessor present on the majority of chips and
-emulated otherwise. Code compiled with this option runs almost everywhere.
-The temporary results are computed in 80-bit precision instead of the precision
-specified by the type, resulting in slightly different results compared to most
-other chips. See @option{-ffloat-store} for a more detailed description.
-
-This is the default choice for x86-32 targets.
-
-@item sse
-Use scalar floating-point instructions present in the SSE instruction set.
-This instruction set is supported by Pentium III and newer chips,
-and in the AMD line
-by Athlon-4, Athlon XP and Athlon MP chips. The earlier version of the SSE
-instruction set supports only single-precision arithmetic, thus the double and
-extended-precision arithmetic are still done using 387. A later version, present
-only in Pentium 4 and AMD x86-64 chips, supports double-precision
-arithmetic too.
-
-For the x86-32 compiler, you must use @option{-march=@var{cpu-type}}, @option{-msse}
-or @option{-msse2} switches to enable SSE extensions and make this option
-effective. For the x86-64 compiler, these extensions are enabled by default.
-
-The resulting code should be considerably faster in the majority of cases and avoid
-the numerical instability problems of 387 code, but may break some existing
-code that expects temporaries to be 80 bits.
-
-This is the default choice for the x86-64 compiler.
-
-@item sse,387
-@itemx sse+387
-@itemx both
-Attempt to utilize both instruction sets at once. This effectively doubles the
-number of available registers, and on chips with separate execution units for
-387 and SSE the execution resources too. Use this option with care, as it is
-still experimental, because the GCC register allocator does not model separate
-functional units well, resulting in unstable performance.
-@end table
-
-@item -masm=@var{dialect}
-@opindex masm=@var{dialect}
-Output assembly instructions using selected @var{dialect}. Supported
-choices are @samp{intel} or @samp{att} (the default). Darwin does
-not support @samp{intel}.
-
-@item -mieee-fp
-@itemx -mno-ieee-fp
-@opindex mieee-fp
-@opindex mno-ieee-fp
-Control whether or not the compiler uses IEEE floating-point
-comparisons. These correctly handle the case where the result of a
-comparison is unordered.
-
-@item -msoft-float
-@opindex msoft-float
-Generate output containing library calls for floating point.
-
-@strong{Warning:} the requisite libraries are not part of GCC@.
-Normally the facilities of the machine's usual C compiler are used, but
-this can't be done directly in cross-compilation. You must make your
-own arrangements to provide suitable library functions for
-cross-compilation.
-
-On machines where a function returns floating-point results in the 80387
-register stack, some floating-point opcodes may be emitted even if
-@option{-msoft-float} is used.
-
-@item -mno-fp-ret-in-387
-@opindex mno-fp-ret-in-387
-Do not use the FPU registers for return values of functions.
-
-The usual calling convention has functions return values of types
-@code{float} and @code{double} in an FPU register, even if there
-is no FPU@. The idea is that the operating system should emulate
-an FPU@.
-
-The option @option{-mno-fp-ret-in-387} causes such values to be returned
-in ordinary CPU registers instead.
-
-@item -mno-fancy-math-387
-@opindex mno-fancy-math-387
-Some 387 emulators do not support the @code{sin}, @code{cos} and
-@code{sqrt} instructions for the 387. Specify this option to avoid
-generating those instructions. This option is the default on FreeBSD,
-OpenBSD and NetBSD@. This option is overridden when @option{-march}
-indicates that the target CPU always has an FPU and so the
-instruction does not need emulation. These
-instructions are not generated unless you also use the
-@option{-funsafe-math-optimizations} switch.
-
-@item -malign-double
-@itemx -mno-align-double
-@opindex malign-double
-@opindex mno-align-double
-Control whether GCC aligns @code{double}, @code{long double}, and
-@code{long long} variables on a two-word boundary or a one-word
-boundary. Aligning @code{double} variables on a two-word boundary
-produces code that runs somewhat faster on a Pentium at the
-expense of more memory.
-
-On x86-64, @option{-malign-double} is enabled by default.
-
-@strong{Warning:} if you use the @option{-malign-double} switch,
-structures containing the above types are aligned differently than
-the published application binary interface specifications for the x86-32
-and are not binary compatible with structures in code compiled
-without that switch.
-
-@item -m96bit-long-double
-@itemx -m128bit-long-double
-@opindex m96bit-long-double
-@opindex m128bit-long-double
-These switches control the size of @code{long double} type. The x86-32
-application binary interface specifies the size to be 96 bits,
-so @option{-m96bit-long-double} is the default in 32-bit mode.
-
-Modern architectures (Pentium and newer) prefer @code{long double}
-to be aligned to an 8- or 16-byte boundary. In arrays or structures
-conforming to the ABI, this is not possible. So specifying
-@option{-m128bit-long-double} aligns @code{long double}
-to a 16-byte boundary by padding the @code{long double} with an additional
-32-bit zero.
-
-In the x86-64 compiler, @option{-m128bit-long-double} is the default choice as
-its ABI specifies that @code{long double} is aligned on 16-byte boundary.
-
-Notice that neither of these options enables any extra precision over the x87
-standard of 80 bits for a @code{long double}.
-
-@strong{Warning:} if you override the default value for your target ABI, this
-changes the size of
-structures and arrays containing @code{long double} variables,
-as well as modifying the function calling convention for functions taking
-@code{long double}. Hence they are not binary-compatible
-with code compiled without that switch.
-
-@item -mlong-double-64
-@itemx -mlong-double-80
-@itemx -mlong-double-128
-@opindex mlong-double-64
-@opindex mlong-double-80
-@opindex mlong-double-128
-These switches control the size of @code{long double} type. A size
-of 64 bits makes the @code{long double} type equivalent to the @code{double}
-type. This is the default for 32-bit Bionic C library. A size
-of 128 bits makes the @code{long double} type equivalent to the
-@code{__float128} type. This is the default for 64-bit Bionic C library.
-
-@strong{Warning:} if you override the default value for your target ABI, this
-changes the size of
-structures and arrays containing @code{long double} variables,
-as well as modifying the function calling convention for functions taking
-@code{long double}. Hence they are not binary-compatible
-with code compiled without that switch.
-
-@item -malign-data=@var{type}
-@opindex malign-data
-Control how GCC aligns variables. Supported values for @var{type} are:
-@samp{compat} uses an increased alignment value compatible with GCC 4.8
-and earlier, @samp{abi} uses the alignment value specified by the
-psABI, and @samp{cacheline} uses an increased alignment value to match
-the cache line size. @samp{compat} is the default.
-
-@item -mlarge-data-threshold=@var{threshold}
-@opindex mlarge-data-threshold
-When @option{-mcmodel=medium} is specified, data objects larger than
-@var{threshold} are placed in the large data section. This value must be the
-same across all objects linked into the binary, and defaults to 65535.
-
-@item -mrtd
-@opindex mrtd
-Use a different function-calling convention, in which functions that
-take a fixed number of arguments return with the @code{ret @var{num}}
-instruction, which pops their arguments while returning. This saves one
-instruction in the caller since there is no need to pop the arguments
-there.
-
-You can specify that an individual function is called with this calling
-sequence with the function attribute @code{stdcall}. You can also
-override the @option{-mrtd} option by using the function attribute
-@code{cdecl}. @xref{Function Attributes}.
-
-@strong{Warning:} this calling convention is incompatible with the one
-normally used on Unix, so you cannot use it if you need to call
-libraries compiled with the Unix compiler.
-
-Also, you must provide function prototypes for all functions that
-take variable numbers of arguments (including @code{printf});
-otherwise incorrect code is generated for calls to those
-functions.
-
-In addition, seriously incorrect code results if you call a
-function with too many arguments. (Normally, extra arguments are
-harmlessly ignored.)
-
-@item -mregparm=@var{num}
-@opindex mregparm
-Control how many registers are used to pass integer arguments. By
-default, no registers are used to pass arguments, and at most 3
-registers can be used. You can control this behavior for a specific
-function by using the function attribute @code{regparm}.
-@xref{Function Attributes}.
-
-@strong{Warning:} if you use this switch, and
-@var{num} is nonzero, then you must build all modules with the same
-value, including any libraries. This includes the system libraries and
-startup modules.
-
-@item -msseregparm
-@opindex msseregparm
-Use SSE register passing conventions for float and double arguments
-and return values. You can control this behavior for a specific
-function by using the function attribute @code{sseregparm}.
-@xref{Function Attributes}.
-
-@strong{Warning:} if you use this switch then you must build all
-modules with the same value, including any libraries. This includes
-the system libraries and startup modules.
-
-@item -mvect8-ret-in-mem
-@opindex mvect8-ret-in-mem
-Return 8-byte vectors in memory instead of MMX registers. This is the
-default on Solaris@tie{}8 and 9 and VxWorks to match the ABI of the Sun
-Studio compilers until version 12. Later compiler versions (starting
-with Studio 12 Update@tie{}1) follow the ABI used by other x86 targets, which
-is the default on Solaris@tie{}10 and later. @emph{Only} use this option if
-you need to remain compatible with existing code produced by those
-previous compiler versions or older versions of GCC@.
-
-@item -mpc32
-@itemx -mpc64
-@itemx -mpc80
-@opindex mpc32
-@opindex mpc64
-@opindex mpc80
-
-Set 80387 floating-point precision to 32, 64 or 80 bits. When @option{-mpc32}
-is specified, the significands of results of floating-point operations are
-rounded to 24 bits (single precision); @option{-mpc64} rounds the
-significands of results of floating-point operations to 53 bits (double
-precision) and @option{-mpc80} rounds the significands of results of
-floating-point operations to 64 bits (extended double precision), which is
-the default. When this option is used, floating-point operations in higher
-precisions are not available to the programmer without setting the FPU
-control word explicitly.
-
-Setting the rounding of floating-point operations to less than the default
-80 bits can speed some programs by 2% or more. Note that some mathematical
-libraries assume that extended-precision (80-bit) floating-point operations
-are enabled by default; routines in such libraries could suffer significant
-loss of accuracy, typically through so-called ``catastrophic cancellation'',
-when this option is used to set the precision to less than extended precision.
-
-@item -mstackrealign
-@opindex mstackrealign
-Realign the stack at entry. On the x86, the @option{-mstackrealign}
-option generates an alternate prologue and epilogue that realigns the
-run-time stack if necessary. This supports mixing legacy codes that keep
-4-byte stack alignment with modern codes that keep 16-byte stack alignment for
-SSE compatibility. See also the attribute @code{force_align_arg_pointer},
-applicable to individual functions.
-
-@item -mpreferred-stack-boundary=@var{num}
-@opindex mpreferred-stack-boundary
-Attempt to keep the stack boundary aligned to a 2 raised to @var{num}
-byte boundary. If @option{-mpreferred-stack-boundary} is not specified,
-the default is 4 (16 bytes or 128 bits).
-
-@strong{Warning:} When generating code for the x86-64 architecture with
-SSE extensions disabled, @option{-mpreferred-stack-boundary=3} can be
-used to keep the stack boundary aligned to an 8-byte boundary. Since the
-x86-64 ABI requires 16-byte stack alignment, this is ABI incompatible and
-intended to be used in a controlled environment where stack space is an
-important limitation. This option leads to wrong code when functions
-compiled with 16-byte stack alignment (such as functions from a standard
-library) are called with a misaligned stack. In this case, SSE
-instructions may lead to misaligned memory access traps. In addition,
-variable arguments are handled incorrectly for 16-byte aligned
-objects (including x87 long double and __int128), leading to wrong
-results. You must build all modules with
-@option{-mpreferred-stack-boundary=3}, including any libraries. This
-includes the system libraries and startup modules.
-
-@item -mincoming-stack-boundary=@var{num}
-@opindex mincoming-stack-boundary
-Assume the incoming stack is aligned to a 2 raised to @var{num} byte
-boundary. If @option{-mincoming-stack-boundary} is not specified,
-the one specified by @option{-mpreferred-stack-boundary} is used.
-
-On Pentium and Pentium Pro, @code{double} and @code{long double} values
-should be aligned to an 8-byte boundary (see @option{-malign-double}) or
-suffer significant run time performance penalties. On Pentium III, the
-Streaming SIMD Extension (SSE) data type @code{__m128} may not work
-properly if it is not 16-byte aligned.
-
-To ensure proper alignment of these values on the stack, the stack boundary
-must be as aligned as that required by any value stored on the stack.
-Further, every function must be generated such that it keeps the stack
-aligned. Thus calling a function compiled with a higher preferred
-stack boundary from a function compiled with a lower preferred stack
-boundary most likely misaligns the stack. It is recommended that
-libraries that use callbacks always use the default setting.
-
-This extra alignment does consume extra stack space, and generally
-increases code size. Code that is sensitive to stack space usage, such
-as embedded systems and operating system kernels, may want to reduce the
-preferred alignment to @option{-mpreferred-stack-boundary=2}.
-
-@need 200
-@item -mmmx
-@opindex mmmx
-@need 200
-@itemx -msse
-@opindex msse
-@need 200
-@itemx -msse2
-@need 200
-@itemx -msse3
-@need 200
-@itemx -mssse3
-@need 200
-@itemx -msse4
-@need 200
-@itemx -msse4a
-@need 200
-@itemx -msse4.1
-@need 200
-@itemx -msse4.2
-@need 200
-@itemx -mavx
-@opindex mavx
-@need 200
-@itemx -mavx2
-@need 200
-@itemx -mavx512f
-@need 200
-@itemx -mavx512pf
-@need 200
-@itemx -mavx512er
-@need 200
-@itemx -mavx512cd
-@need 200
-@itemx -msha
-@opindex msha
-@need 200
-@itemx -maes
-@opindex maes
-@need 200
-@itemx -mpclmul
-@opindex mpclmul
-@need 200
-@itemx -mclflushopt
-@opindex mclflushopt
-@need 200
-@itemx -mfsgsbase
-@opindex mfsgsbase
-@need 200
-@itemx -mrdrnd
-@opindex mrdrnd
-@need 200
-@itemx -mf16c
-@opindex mf16c
-@need 200
-@itemx -mfma
-@opindex mfma
-@need 200
-@itemx -mfma4
-@need 200
-@itemx -mno-fma4
-@need 200
-@itemx -mprefetchwt1
-@opindex mprefetchwt1
-@need 200
-@itemx -mxop
-@opindex mxop
-@need 200
-@itemx -mlwp
-@opindex mlwp
-@need 200
-@itemx -m3dnow
-@opindex m3dnow
-@need 200
-@itemx -mpopcnt
-@opindex mpopcnt
-@need 200
-@itemx -mabm
-@opindex mabm
-@need 200
-@itemx -mbmi
-@opindex mbmi
-@need 200
-@itemx -mbmi2
-@need 200
-@itemx -mlzcnt
-@opindex mlzcnt
-@need 200
-@itemx -mfxsr
-@opindex mfxsr
-@need 200
-@itemx -mxsave
-@opindex mxsave
-@need 200
-@itemx -mxsaveopt
-@opindex mxsaveopt
-@need 200
-@itemx -mxsavec
-@opindex mxsavec
-@need 200
-@itemx -mxsaves
-@opindex mxsaves
-@need 200
-@itemx -mrtm
-@opindex mrtm
-@need 200
-@itemx -mtbm
-@opindex mtbm
-@need 200
-@itemx -mmpx
-@opindex mmpx
-These switches enable the use of instructions in the MMX, SSE,
-SSE2, SSE3, SSSE3, SSE4.1, AVX, AVX2, AVX512F, AVX512PF, AVX512ER, AVX512CD,
-SHA, AES, PCLMUL, FSGSBASE, RDRND, F16C, FMA, SSE4A, FMA4, XOP, LWP, ABM,
-BMI, BMI2, FXSR, XSAVE, XSAVEOPT, LZCNT, RTM, MPX or 3DNow!@:
-extended instruction sets. Each has a corresponding @option{-mno-} option
-to disable use of these instructions.
-
-These extensions are also available as built-in functions: see
-@ref{x86 Built-in Functions}, for details of the functions enabled and
-disabled by these switches.
-
-To generate SSE/SSE2 instructions automatically from floating-point
-code (as opposed to 387 instructions), see @option{-mfpmath=sse}.
-
-GCC does not generate SSEx instructions when @option{-mavx} is used. Instead, it
-generates new AVX instructions or AVX equivalents for all SSEx instructions
-when needed.
-
-These options enable GCC to use these extended instructions in
-generated code, even without @option{-mfpmath=sse}. Applications that
-perform run-time CPU detection must compile separate files for each
-supported architecture, using the appropriate flags. In particular,
-the file containing the CPU detection code should be compiled without
-these options.
-
-@item -mdump-tune-features
-@opindex mdump-tune-features
-This option instructs GCC to dump the names of the x86 performance
-tuning features and default settings. The names can be used in
-@option{-mtune-ctrl=@var{feature-list}}.
-
-@item -mtune-ctrl=@var{feature-list}
-@opindex mtune-ctrl=@var{feature-list}
-This option is used for fine-grained control of x86 code generation features.
-@var{feature-list} is a comma-separated list of @var{feature} names. See also
-@option{-mdump-tune-features}. When specified, the @var{feature} is turned
-on if it is not preceded with @samp{^}, otherwise, it is turned off.
-@option{-mtune-ctrl=@var{feature-list}} is intended to be used by GCC
-developers. Using it may lead to code paths not covered by testing and can
-potentially result in compiler ICEs or runtime errors.
-
-@item -mno-default
-@opindex mno-default
-This option instructs GCC to turn off all tunable features. See also
-@option{-mtune-ctrl=@var{feature-list}} and @option{-mdump-tune-features}.
-
-@item -mcld
-@opindex mcld
-This option instructs GCC to emit a @code{cld} instruction in the prologue
-of functions that use string instructions. String instructions depend on
-the DF flag to select between autoincrement or autodecrement mode. While the
-ABI specifies the DF flag to be cleared on function entry, some operating
-systems violate this specification by not clearing the DF flag in their
-exception dispatchers. The exception handler can be invoked with the DF flag
-set, which leads to wrong direction mode when string instructions are used.
-This option can be enabled by default on 32-bit x86 targets by configuring
-GCC with the @option{--enable-cld} configure option. Generation of @code{cld}
-instructions can be suppressed with the @option{-mno-cld} compiler option
-in this case.
-
-@item -mvzeroupper
-@opindex mvzeroupper
-This option instructs GCC to emit a @code{vzeroupper} instruction
-before a transfer of control flow out of the function to minimize
-the AVX to SSE transition penalty as well as remove unnecessary @code{zeroupper}
-intrinsics.
-
-@item -mprefer-avx128
-@opindex mprefer-avx128
-This option instructs GCC to use 128-bit AVX instructions instead of
-256-bit AVX instructions in the auto-vectorizer.
-
-@item -mcx16
-@opindex mcx16
-This option enables GCC to generate @code{CMPXCHG16B} instructions.
-@code{CMPXCHG16B} allows for atomic operations on 128-bit double quadword
-(or oword) data types.
-This is useful for high-resolution counters that can be updated
-by multiple processors (or cores). This instruction is generated as part of
-atomic built-in functions: see @ref{__sync Builtins} or
-@ref{__atomic Builtins} for details.
-
-@item -msahf
-@opindex msahf
-This option enables generation of @code{SAHF} instructions in 64-bit code.
-Early Intel Pentium 4 CPUs with Intel 64 support,
-prior to the introduction of Pentium 4 G1 step in December 2005,
-lacked the @code{LAHF} and @code{SAHF} instructions
-which are supported by AMD64.
-These are load and store instructions, respectively, for certain status flags.
-In 64-bit mode, the @code{SAHF} instruction is used to optimize @code{fmod},
-@code{drem}, and @code{remainder} built-in functions;
-see @ref{Other Builtins} for details.
-
-@item -mmovbe
-@opindex mmovbe
-This option enables use of the @code{movbe} instruction to implement
-@code{__builtin_bswap32} and @code{__builtin_bswap64}.
-
-@item -mcrc32
-@opindex mcrc32
-This option enables built-in functions @code{__builtin_ia32_crc32qi},
-@code{__builtin_ia32_crc32hi}, @code{__builtin_ia32_crc32si} and
-@code{__builtin_ia32_crc32di} to generate the @code{crc32} machine instruction.
-
-@item -mrecip
-@opindex mrecip
-This option enables use of @code{RCPSS} and @code{RSQRTSS} instructions
-(and their vectorized variants @code{RCPPS} and @code{RSQRTPS})
-with an additional Newton-Raphson step
-to increase precision instead of @code{DIVSS} and @code{SQRTSS}
-(and their vectorized
-variants) for single-precision floating-point arguments. These instructions
-are generated only when @option{-funsafe-math-optimizations} is enabled
-together with @option{-finite-math-only} and @option{-fno-trapping-math}.
-Note that while the throughput of the sequence is higher than the throughput
-of the non-reciprocal instruction, the precision of the sequence can be
-decreased by up to 2 ulp (i.e. the inverse of 1.0 equals 0.99999994).
-
-Note that GCC implements @code{1.0f/sqrtf(@var{x})} in terms of @code{RSQRTSS}
-(or @code{RSQRTPS}) already with @option{-ffast-math} (or the above option
-combination), and doesn't need @option{-mrecip}.
-
-Also note that GCC emits the above sequence with additional Newton-Raphson step
-for vectorized single-float division and vectorized @code{sqrtf(@var{x})}
-already with @option{-ffast-math} (or the above option combination), and
-doesn't need @option{-mrecip}.
-
-@item -mrecip=@var{opt}
-@opindex mrecip=opt
-This option controls which reciprocal estimate instructions
-may be used. @var{opt} is a comma-separated list of options, which may
-be preceded by a @samp{!} to invert the option:
-
-@table @samp
-@item all
-Enable all estimate instructions.
-
-@item default
-Enable the default instructions, equivalent to @option{-mrecip}.
-
-@item none
-Disable all estimate instructions, equivalent to @option{-mno-recip}.
-
-@item div
-Enable the approximation for scalar division.
-
-@item vec-div
-Enable the approximation for vectorized division.
-
-@item sqrt
-Enable the approximation for scalar square root.
-
-@item vec-sqrt
-Enable the approximation for vectorized square root.
-@end table
-
-So, for example, @option{-mrecip=all,!sqrt} enables
-all of the reciprocal approximations, except for square root.
-
-@item -mveclibabi=@var{type}
-@opindex mveclibabi
-Specifies the ABI type to use for vectorizing intrinsics using an
-external library. Supported values for @var{type} are @samp{svml}
-for the Intel short
-vector math library and @samp{acml} for the AMD math core library.
-To use this option, both @option{-ftree-vectorize} and
-@option{-funsafe-math-optimizations} have to be enabled, and an SVML or ACML
-ABI-compatible library must be specified at link time.
-
-GCC currently emits calls to @code{vmldExp2},
-@code{vmldLn2}, @code{vmldLog102}, @code{vmldPow2},
-@code{vmldTanh2}, @code{vmldTan2}, @code{vmldAtan2}, @code{vmldAtanh2},
-@code{vmldCbrt2}, @code{vmldSinh2}, @code{vmldSin2}, @code{vmldAsinh2},
-@code{vmldAsin2}, @code{vmldCosh2}, @code{vmldCos2}, @code{vmldAcosh2},
-@code{vmldAcos2}, @code{vmlsExp4}, @code{vmlsLn4}, @code{vmlsLog104},
-@code{vmlsPow4}, @code{vmlsTanh4}, @code{vmlsTan4},
-@code{vmlsAtan4}, @code{vmlsAtanh4}, @code{vmlsCbrt4}, @code{vmlsSinh4},
-@code{vmlsSin4}, @code{vmlsAsinh4}, @code{vmlsAsin4}, @code{vmlsCosh4},
-@code{vmlsCos4}, @code{vmlsAcosh4} and @code{vmlsAcos4} for the corresponding
-function type when @option{-mveclibabi=svml} is used, and @code{__vrd2_sin},
-@code{__vrd2_cos}, @code{__vrd2_exp}, @code{__vrd2_log}, @code{__vrd2_log2},
-@code{__vrd2_log10}, @code{__vrs4_sinf}, @code{__vrs4_cosf},
-@code{__vrs4_expf}, @code{__vrs4_logf}, @code{__vrs4_log2f},
-@code{__vrs4_log10f} and @code{__vrs4_powf} for the corresponding function type
-when @option{-mveclibabi=acml} is used.
-
-@item -mabi=@var{name}
-@opindex mabi
-Generate code for the specified calling convention. Permissible values
-are @samp{sysv} for the ABI used on GNU/Linux and other systems, and
-@samp{ms} for the Microsoft ABI. The default is to use the Microsoft
-ABI when targeting Microsoft Windows and the SysV ABI on all other systems.
-You can control this behavior for specific functions by
-using the function attributes @code{ms_abi} and @code{sysv_abi}.
-@xref{Function Attributes}.
-
-@item -mtls-dialect=@var{type}
-@opindex mtls-dialect
-Generate code to access thread-local storage using the @samp{gnu} or
-@samp{gnu2} conventions. @samp{gnu} is the conservative default;
-@samp{gnu2} is more efficient, but it may add compile- and run-time
-requirements that cannot be satisfied on all systems.
-
-@item -mpush-args
-@itemx -mno-push-args
-@opindex mpush-args
-@opindex mno-push-args
-Use PUSH operations to store outgoing parameters. This method is shorter
-and usually as fast as the method using SUB/MOV operations and is enabled
-by default. In some cases disabling it may improve performance because of
-improved scheduling and reduced dependencies.
-
-@item -maccumulate-outgoing-args
-@opindex maccumulate-outgoing-args
-If enabled, the maximum amount of space required for outgoing arguments is
-computed in the function prologue. This is faster on most modern CPUs
-because of reduced dependencies, improved scheduling and reduced stack usage
-when the preferred stack boundary is not equal to 2. The drawback is a notable
-increase in code size. This switch implies @option{-mno-push-args}.
-
-@item -mthreads
-@opindex mthreads
-Support thread-safe exception handling on MinGW. Programs that rely
-on thread-safe exception handling must compile and link all code with the
-@option{-mthreads} option. When compiling, @option{-mthreads} defines
-@option{-D_MT}; when linking, it links in a special thread helper library
-@option{-lmingwthrd} which cleans up per-thread exception-handling data.
-
-@item -mno-align-stringops
-@opindex mno-align-stringops
-Do not align the destination of inlined string operations. This switch reduces
-code size and improves performance in case the destination is already aligned,
-but GCC doesn't know about it.
-
-@item -minline-all-stringops
-@opindex minline-all-stringops
-By default GCC inlines string operations only when the destination is
-known to be aligned to at least a 4-byte boundary.
-This enables more inlining and increases code
-size, but may improve performance of code that depends on fast
-@code{memcpy}, @code{strlen},
-and @code{memset} for short lengths.
-
-@item -minline-stringops-dynamically
-@opindex minline-stringops-dynamically
-For string operations of unknown size, use run-time checks with
-inline code for small blocks and a library call for large blocks.
-
-@item -mstringop-strategy=@var{alg}
-@opindex mstringop-strategy=@var{alg}
-Override the internal decision heuristic for the particular algorithm to use
-for inlining string operations. The allowed values for @var{alg} are:
-
-@table @samp
-@item rep_byte
-@itemx rep_4byte
-@itemx rep_8byte
-Expand using i386 @code{rep} prefix of the specified size.
-
-@item byte_loop
-@itemx loop
-@itemx unrolled_loop
-Expand into an inline loop.
-
-@item libcall
-Always use a library call.
-@end table
-
-@item -mmemcpy-strategy=@var{strategy}
-@opindex mmemcpy-strategy=@var{strategy}
-Override the internal decision heuristic to decide if @code{__builtin_memcpy}
-should be inlined and what inline algorithm to use when the expected size
-of the copy operation is known. @var{strategy}
-is a comma-separated list of @var{alg}:@var{max_size}:@var{dest_align} triplets.
-@var{alg} is specified in @option{-mstringop-strategy}, @var{max_size} specifies
-the max byte size with which inline algorithm @var{alg} is allowed. For the last
-triplet, the @var{max_size} must be @code{-1}. The @var{max_size} of the triplets
-in the list must be specified in increasing order. The minimal byte size for
-@var{alg} is @code{0} for the first triplet and @code{@var{max_size} + 1} of the
-preceding range.
-
-@item -mmemset-strategy=@var{strategy}
-@opindex mmemset-strategy=@var{strategy}
-This option is similar to @option{-mmemcpy-strategy=} except that it controls
-@code{__builtin_memset} expansion.
-
-@item -momit-leaf-frame-pointer
-@opindex momit-leaf-frame-pointer
-Don't keep the frame pointer in a register for leaf functions. This
-avoids the instructions to save, set up, and restore frame pointers and
-makes an extra register available in leaf functions. The option
-@option{-momit-leaf-frame-pointer} removes the frame pointer for leaf functions,
-which might make debugging harder.
-
-@item -mtls-direct-seg-refs
-@itemx -mno-tls-direct-seg-refs
-@opindex mtls-direct-seg-refs
-Controls whether TLS variables may be accessed with offsets from the
-TLS segment register (@code{%gs} for 32-bit, @code{%fs} for 64-bit),
-or whether the thread base pointer must be added. Whether or not this
-is valid depends on the operating system, and whether it maps the
-segment to cover the entire TLS area.
-
-For systems that use the GNU C Library, the default is on.
-
-@item -msse2avx
-@itemx -mno-sse2avx
-@opindex msse2avx
-Specify that the assembler should encode SSE instructions with VEX
-prefix. The option @option{-mavx} turns this on by default.
-
-@item -mfentry
-@itemx -mno-fentry
-@opindex mfentry
-If profiling is active (@option{-pg}), put the profiling
-counter call before the prologue.
-Note: On x86 architectures the attribute @code{ms_hook_prologue}
-isn't possible at the moment for @option{-mfentry} and @option{-pg}.
-
-@item -mrecord-mcount
-@itemx -mno-record-mcount
-@opindex mrecord-mcount
-If profiling is active (@option{-pg}), generate a @code{__mcount_loc} section
-that contains pointers to each profiling call. This is useful for
-automatically patching calls in and out.
-
-@item -mnop-mcount
-@itemx -mno-nop-mcount
-@opindex mnop-mcount
-If profiling is active (@option{-pg}), generate the calls to
-the profiling functions as nops. This is useful when they
-should be patched in later dynamically. This is likely only
-useful together with @option{-mrecord-mcount}.
-
-@item -mskip-rax-setup
-@itemx -mno-skip-rax-setup
-@opindex mskip-rax-setup
-When generating code for the x86-64 architecture with SSE extensions
-disabled, @option{-mskip-rax-setup} can be used to skip setting up the RAX
-register when there are no variable arguments passed in vector registers.
-
-@strong{Warning:} Since the RAX register is used to avoid unnecessarily
-saving vector registers on the stack when passing variable arguments, the
-impact of this option is that callees may waste some stack space,
-misbehave or jump to a random location. GCC 4.4 or newer does not have
-those issues, regardless of the RAX register value.
-
-@item -m8bit-idiv
-@itemx -mno-8bit-idiv
-@opindex m8bit-idiv
-On some processors, like Intel Atom, 8-bit unsigned integer divide is
-much faster than 32-bit/64-bit integer divide. This option generates a
-run-time check. If both dividend and divisor are within range of 0
-to 255, 8-bit unsigned integer divide is used instead of
-32-bit/64-bit integer divide.
-
-@item -mavx256-split-unaligned-load
-@itemx -mavx256-split-unaligned-store
-@opindex mavx256-split-unaligned-load
-@opindex mavx256-split-unaligned-store
-Split 32-byte AVX unaligned load and store.
-
-@item -mstack-protector-guard=@var{guard}
-@opindex mstack-protector-guard=@var{guard}
-Generate stack protection code using canary at @var{guard}. Supported
-locations are @samp{global} for global canary or @samp{tls} for per-thread
-canary in the TLS block (the default). This option has effect only when
-@option{-fstack-protector} or @option{-fstack-protector-all} is specified.
-
-@end table
-
-These @samp{-m} switches are supported in addition to the above
-on x86-64 processors in 64-bit environments.
-
-@table @gcctabopt
-@item -m32
-@itemx -m64
-@itemx -mx32
-@itemx -m16
-@opindex m32
-@opindex m64
-@opindex mx32
-@opindex m16
-Generate code for a 16-bit, 32-bit or 64-bit environment.
-The @option{-m32} option sets @code{int}, @code{long}, and pointer types
-to 32 bits, and
-generates code that runs on any i386 system.
-
-The @option{-m64} option sets @code{int} to 32 bits and @code{long} and pointer
-types to 64 bits, and generates code for the x86-64 architecture.
-For Darwin only the @option{-m64} option also turns off the @option{-fno-pic}
-and @option{-mdynamic-no-pic} options.
-
-The @option{-mx32} option sets @code{int}, @code{long}, and pointer types
-to 32 bits, and
-generates code for the x86-64 architecture.
-
-The @option{-m16} option is the same as @option{-m32}, except that
-it outputs the @code{.code16gcc} assembly directive at the beginning of
-the assembly output so that the binary can run in 16-bit mode.
-
-@item -mno-red-zone
-@opindex mno-red-zone
-Do not use a so-called ``red zone'' for x86-64 code. The red zone is mandated
-by the x86-64 ABI; it is a 128-byte area beyond the location of the
-stack pointer that is not modified by signal or interrupt handlers
-and therefore can be used for temporary data without adjusting the stack
-pointer. The flag @option{-mno-red-zone} disables this red zone.
-
-@item -mcmodel=small
-@opindex mcmodel=small
-Generate code for the small code model: the program and its symbols must
-be linked in the lower 2 GB of the address space. Pointers are 64 bits.
-Programs can be statically or dynamically linked. This is the default
-code model.
-
-@item -mcmodel=kernel
-@opindex mcmodel=kernel
-Generate code for the kernel code model. The kernel runs in the
-negative 2 GB of the address space.
-This model has to be used for Linux kernel code.
-
-@item -mcmodel=medium
-@opindex mcmodel=medium
-Generate code for the medium model: the program is linked in the lower 2
-GB of the address space. Small symbols are also placed there. Symbols
-with sizes larger than @option{-mlarge-data-threshold} are put into
-large data or BSS sections and can be located above 2GB. Programs can
-be statically or dynamically linked.
-
-@item -mcmodel=large
-@opindex mcmodel=large
-Generate code for the large model. This model makes no assumptions
-about addresses and sizes of sections.
-
-@item -maddress-mode=long
-@opindex maddress-mode=long
-Generate code for long address mode. This is only supported for 64-bit
-and x32 environments. It is the default address mode for 64-bit
-environments.
-
-@item -maddress-mode=short
-@opindex maddress-mode=short
-Generate code for short address mode. This is only supported for 32-bit
-and x32 environments. It is the default address mode for 32-bit and
-x32 environments.
-@end table
-
-@node x86 Windows Options
-@subsection x86 Windows Options
-@cindex x86 Windows Options
-@cindex Windows Options for x86
-
-These additional options are available for Microsoft Windows targets:
-
-@table @gcctabopt
-@item -mconsole
-@opindex mconsole
-This option
-specifies that a console application is to be generated, by
-instructing the linker to set the PE header subsystem type
-required for console applications.
-This option is available for Cygwin and MinGW targets and is
-enabled by default on those targets.
-
-@item -mdll
-@opindex mdll
-This option is available for Cygwin and MinGW targets. It
-specifies that a DLL---a dynamic link library---is to be
-generated, enabling the selection of the required runtime
-startup object and entry point.
-
-@item -mnop-fun-dllimport
-@opindex mnop-fun-dllimport
-This option is available for Cygwin and MinGW targets. It
-specifies that the @code{dllimport} attribute should be ignored.
-
-@item -mthread
-@opindex mthread
-This option is available for MinGW targets. It specifies
-that MinGW-specific thread support is to be used.
-
-@item -municode
-@opindex municode
-This option is available for MinGW-w64 targets. It causes
-the @code{UNICODE} preprocessor macro to be predefined, and
-chooses Unicode-capable runtime startup code.
-
-@item -mwin32
-@opindex mwin32
-This option is available for Cygwin and MinGW targets. It
-specifies that the typical Microsoft Windows predefined macros are to
-be set in the pre-processor, but does not influence the choice
-of runtime library/startup code.
-
-@item -mwindows
-@opindex mwindows
-This option is available for Cygwin and MinGW targets. It
-specifies that a GUI application is to be generated by
-instructing the linker to set the PE header subsystem type
-appropriately.
-
-@item -fno-set-stack-executable
-@opindex fno-set-stack-executable
-This option is available for MinGW targets. It specifies that
-the executable flag for the stack used by nested functions isn't
-set. This is necessary for binaries running in kernel mode of
-Microsoft Windows, as there the User32 API, which is used to set executable
-privileges, isn't available.
-
-@item -fwritable-relocated-rdata
-@opindex fno-writable-relocated-rdata
-This option is available for MinGW and Cygwin targets. It specifies
-that relocated data in a read-only section is put into the .data
-section. This is necessary for older runtimes not supporting
-modification of .rdata sections for pseudo-relocation.
-
-@item -mpe-aligned-commons
-@opindex mpe-aligned-commons
-This option is available for Cygwin and MinGW targets. It
-specifies that the GNU extension to the PE file format that
-permits the correct alignment of COMMON variables should be
-used when generating code. It is enabled by default if
-GCC detects that the target assembler found during configuration
-supports the feature.
-@end table
-
-See also under @ref{x86 Options} for standard options.
-
@node IA-64 Options
@subsection IA-64 Options
@cindex IA-64 Options
@@ -22850,6 +21633,1223 @@ Disable lazy binding of function calls. This option is the default and
is defined for compatibility with Diab.
@end table
+@node x86 Options
+@subsection x86 Options
+@cindex x86 Options
+
+These @samp{-m} options are defined for the x86 family of computers.
+
+@table @gcctabopt
+
+@item -march=@var{cpu-type}
+@opindex march
+Generate instructions for the machine type @var{cpu-type}. In contrast to
+@option{-mtune=@var{cpu-type}}, which merely tunes the generated code
+for the specified @var{cpu-type}, @option{-march=@var{cpu-type}} allows GCC
+to generate code that may not run at all on processors other than the one
+indicated. Specifying @option{-march=@var{cpu-type}} implies
+@option{-mtune=@var{cpu-type}}.
+
+The choices for @var{cpu-type} are:
+
+@table @samp
+@item native
+This selects the CPU to generate code for at compilation time by determining
+the processor type of the compiling machine. Using @option{-march=native}
+enables all instruction subsets supported by the local machine (hence
+the result might not run on different machines). Using @option{-mtune=native}
+produces code optimized for the local machine under the constraints
+of the selected instruction set.
+
+@item i386
+Original Intel i386 CPU@.
+
+@item i486
+Intel i486 CPU@. (No scheduling is implemented for this chip.)
+
+@item i586
+@itemx pentium
+Intel Pentium CPU with no MMX support.
+
+@item pentium-mmx
+Intel Pentium MMX CPU, based on Pentium core with MMX instruction set support.
+
+@item pentiumpro
+Intel Pentium Pro CPU@.
+
+@item i686
+When used with @option{-march}, the Pentium Pro
+instruction set is used, so the code runs on all i686 family chips.
+When used with @option{-mtune}, it has the same meaning as @samp{generic}.
+
+@item pentium2
+Intel Pentium II CPU, based on Pentium Pro core with MMX instruction set
+support.
+
+@item pentium3
+@itemx pentium3m
+Intel Pentium III CPU, based on Pentium Pro core with MMX and SSE instruction
+set support.
+
+@item pentium-m
+Intel Pentium M; low-power version of Intel Pentium III CPU
+with MMX, SSE and SSE2 instruction set support. Used by Centrino notebooks.
+
+@item pentium4
+@itemx pentium4m
+Intel Pentium 4 CPU with MMX, SSE and SSE2 instruction set support.
+
+@item prescott
+Improved version of Intel Pentium 4 CPU with MMX, SSE, SSE2 and SSE3 instruction
+set support.
+
+@item nocona
+Improved version of Intel Pentium 4 CPU with 64-bit extensions, MMX, SSE,
+SSE2 and SSE3 instruction set support.
+
+@item core2
+Intel Core 2 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3 and SSSE3
+instruction set support.
+
+@item nehalem
+Intel Nehalem CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3,
+SSE4.1, SSE4.2 and POPCNT instruction set support.
+
+@item westmere
+Intel Westmere CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3,
+SSE4.1, SSE4.2, POPCNT, AES and PCLMUL instruction set support.
+
+@item sandybridge
+Intel Sandy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3,
+SSE4.1, SSE4.2, POPCNT, AVX, AES and PCLMUL instruction set support.
+
+@item ivybridge
+Intel Ivy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3,
+SSE4.1, SSE4.2, POPCNT, AVX, AES, PCLMUL, FSGSBASE, RDRND and F16C
+instruction set support.
+
+@item haswell
+Intel Haswell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3,
+SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA,
+BMI, BMI2 and F16C instruction set support.
+
+@item broadwell
+Intel Broadwell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3,
+SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA,
+BMI, BMI2, F16C, RDSEED, ADCX and PREFETCHW instruction set support.
+
+@item bonnell
+Intel Bonnell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3 and SSSE3
+instruction set support.
+
+@item silvermont
+Intel Silvermont CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3,
+SSE4.1, SSE4.2, POPCNT, AES, PCLMUL and RDRND instruction set support.
+
+@item k6
+AMD K6 CPU with MMX instruction set support.
+
+@item k6-2
+@itemx k6-3
+Improved versions of AMD K6 CPU with MMX and 3DNow!@: instruction set support.
+
+@item athlon
+@itemx athlon-tbird
+AMD Athlon CPU with MMX, 3DNow!, enhanced 3DNow!@: and SSE prefetch instructions
+support.
+
+@item athlon-4
+@itemx athlon-xp
+@itemx athlon-mp
+Improved AMD Athlon CPU with MMX, 3DNow!, enhanced 3DNow!@: and full SSE
+instruction set support.
+
+@item k8
+@itemx opteron
+@itemx athlon64
+@itemx athlon-fx
+Processors based on the AMD K8 core with x86-64 instruction set support,
+including the AMD Opteron, Athlon 64, and Athlon 64 FX processors.
+(This supersets MMX, SSE, SSE2, 3DNow!, enhanced 3DNow!@: and 64-bit
+instruction set extensions.)
+
+@item k8-sse3
+@itemx opteron-sse3
+@itemx athlon64-sse3
+Improved versions of AMD K8 cores with SSE3 instruction set support.
+
+@item amdfam10
+@itemx barcelona
+CPUs based on AMD Family 10h cores with x86-64 instruction set support. (This
+supersets MMX, SSE, SSE2, SSE3, SSE4A, 3DNow!, enhanced 3DNow!, ABM and 64-bit
+instruction set extensions.)
+
+@item bdver1
+CPUs based on AMD Family 15h cores with x86-64 instruction set support. (This
+supersets FMA4, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX, SSE, SSE2, SSE3, SSE4A,
+SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set extensions.)
+@item bdver2
+AMD Family 15h core based CPUs with x86-64 instruction set support. (This
+supersets BMI, TBM, F16C, FMA, FMA4, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX,
+SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set
+extensions.)
+@item bdver3
+AMD Family 15h core based CPUs with x86-64 instruction set support. (This
+supersets BMI, TBM, F16C, FMA, FMA4, FSGSBASE, AVX, XOP, LWP, AES,
+PCL_MUL, CX16, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and
+64-bit instruction set extensions.)
+@item bdver4
+AMD Family 15h core based CPUs with x86-64 instruction set support. (This
+supersets BMI, BMI2, TBM, F16C, FMA, FMA4, FSGSBASE, AVX, AVX2, XOP, LWP,
+AES, PCL_MUL, CX16, MOVBE, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1,
+SSE4.2, ABM and 64-bit instruction set extensions.)
+
+@item btver1
+CPUs based on AMD Family 14h cores with x86-64 instruction set support. (This
+supersets MMX, SSE, SSE2, SSE3, SSSE3, SSE4A, CX16, ABM and 64-bit
+instruction set extensions.)
+
+@item btver2
+CPUs based on AMD Family 16h cores with x86-64 instruction set support. This
+includes MOVBE, F16C, BMI, AVX, PCL_MUL, AES, SSE4.2, SSE4.1, CX16, ABM,
+SSE4A, SSSE3, SSE3, SSE2, SSE, MMX and 64-bit instruction set extensions.
+
+@item winchip-c6
+IDT WinChip C6 CPU, treated in the same way as i486 with additional MMX instruction
+set support.
+
+@item winchip2
+IDT WinChip 2 CPU, treated in the same way as i486 with additional MMX and 3DNow!@:
+instruction set support.
+
+@item c3
+VIA C3 CPU with MMX and 3DNow!@: instruction set support. (No scheduling is
+implemented for this chip.)
+
+@item c3-2
+VIA C3-2 (Nehemiah/C5XL) CPU with MMX and SSE instruction set support.
+(No scheduling is
+implemented for this chip.)
+
+@item geode
+AMD Geode embedded processor with MMX and 3DNow!@: instruction set support.
+@end table
+
+@item -mtune=@var{cpu-type}
+@opindex mtune
+Tune to @var{cpu-type} everything applicable about the generated code, except
+for the ABI and the set of available instructions.
+While picking a specific @var{cpu-type} schedules things appropriately
+for that particular chip, the compiler does not generate any code that
+cannot run on the default machine type unless you use a
+@option{-march=@var{cpu-type}} option.
+For example, if GCC is configured for i686-pc-linux-gnu
+then @option{-mtune=pentium4} generates code that is tuned for Pentium 4
+but still runs on i686 machines.
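+
+As a rough sketch of the case above (the file name @file{foo.c} is
+hypothetical, and an i686-pc-linux-gnu default configuration is assumed),
+such a compilation might be invoked as:
+
+@smallexample
+gcc -O2 -mtune=pentium4 -c foo.c
+@end smallexample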
+
+The choices for @var{cpu-type} are the same as for @option{-march}.
+In addition, @option{-mtune} supports 2 extra choices for @var{cpu-type}:
+
+@table @samp
+@item generic
+Produce code optimized for the most common IA32/@/AMD64/@/EM64T processors.
+If you know the CPU on which your code will run, then you should use
+the corresponding @option{-mtune} or @option{-march} option instead of
+@option{-mtune=generic}. But, if you do not know exactly what CPU users
+of your application will have, then you should use this option.
+
+As new processors are deployed in the marketplace, the behavior of this
+option will change. Therefore, if you upgrade to a newer version of
+GCC, code generation controlled by this option will change to reflect
+the processors
+that are most common at the time that version of GCC is released.
+
+There is no @option{-march=generic} option because @option{-march}
+indicates the instruction set the compiler can use, and there is no
+generic instruction set applicable to all processors. In contrast,
+@option{-mtune} indicates the processor (or, in this case, collection of
+processors) for which the code is optimized.
+
+@item intel
+Produce code optimized for the most current Intel processors, which are
+Haswell and Silvermont for this version of GCC. If you know the CPU
+on which your code will run, then you should use the corresponding
+@option{-mtune} or @option{-march} option instead of @option{-mtune=intel}.
+But, if you want your application to perform better on both Haswell and
+Silvermont, then you should use this option.
+
+As new Intel processors are deployed in the marketplace, the behavior of
+this option will change. Therefore, if you upgrade to a newer version of
+GCC, code generation controlled by this option will change to reflect
+the most current Intel processors at the time that version of GCC is
+released.
+
+There is no @option{-march=intel} option because @option{-march} indicates
+the instruction set the compiler can use, and there is no common
+instruction set applicable to all processors. In contrast,
+@option{-mtune} indicates the processor (or, in this case, collection of
+processors) for which the code is optimized.
+@end table
+
+@item -mcpu=@var{cpu-type}
+@opindex mcpu
+A deprecated synonym for @option{-mtune}.
+
+@item -mfpmath=@var{unit}
+@opindex mfpmath
+Generate floating-point arithmetic for selected unit @var{unit}. The choices
+for @var{unit} are:
+
+@table @samp
+@item 387
+Use the standard 387 floating-point coprocessor present on the majority of chips and
+emulated otherwise. Code compiled with this option runs almost everywhere.
+The temporary results are computed in 80-bit precision instead of the precision
+specified by the type, resulting in slightly different results compared to most
+other chips. See @option{-ffloat-store} for a more detailed description.
+
+This is the default choice for x86-32 targets.
+
+@item sse
+Use scalar floating-point instructions present in the SSE instruction set.
+This instruction set is supported by Pentium III and newer chips,
+and in the AMD line
+by Athlon-4, Athlon XP and Athlon MP chips. The earlier version of the SSE
+instruction set supports only single-precision arithmetic, thus the double and
+extended-precision arithmetic are still done using 387. A later version, present
+only in Pentium 4 and AMD x86-64 chips, supports double-precision
+arithmetic too.
+
+For the x86-32 compiler, you must use @option{-march=@var{cpu-type}}, @option{-msse}
+or @option{-msse2} switches to enable SSE extensions and make this option
+effective. For the x86-64 compiler, these extensions are enabled by default.
+
+The resulting code should be considerably faster in the majority of cases and avoid
+the numerical instability problems of 387 code, but may break some existing
+code that expects temporaries to be 80 bits.
+
+This is the default choice for the x86-64 compiler.
+
+@item sse,387
+@itemx sse+387
+@itemx both
+Attempt to utilize both instruction sets at once. This effectively doubles the
+amount of available registers, and on chips with separate execution units for
+387 and SSE the execution resources too. Use this option with care, as it is
+still experimental, because the GCC register allocator does not model separate
+functional units well, resulting in unstable performance.
+@end table
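+
+For instance, assuming a 32-bit target and a hypothetical source file
+@file{foo.c}, SSE2 arithmetic might be requested along these lines:
+
+@smallexample
+gcc -m32 -O2 -msse2 -mfpmath=sse -c foo.c
+@end smallexample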
+
+@item -masm=@var{dialect}
+@opindex masm=@var{dialect}
+Output assembly instructions using selected @var{dialect}. Supported
+choices are @samp{intel} or @samp{att} (the default). Darwin does
+not support @samp{intel}.
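+
+A minimal illustration, assuming a non-Darwin target and a hypothetical
+source file @file{foo.c}; this would write Intel-syntax assembly to
+@file{foo.s}:
+
+@smallexample
+gcc -S -masm=intel foo.c
+@end smallexample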
+
+@item -mieee-fp
+@itemx -mno-ieee-fp
+@opindex mieee-fp
+@opindex mno-ieee-fp
+Control whether or not the compiler uses IEEE floating-point
+comparisons. These correctly handle the case where the result of a
+comparison is unordered.
+
+@item -msoft-float
+@opindex msoft-float
+Generate output containing library calls for floating point.
+
+@strong{Warning:} the requisite libraries are not part of GCC@.
+Normally the facilities of the machine's usual C compiler are used, but
+this can't be done directly in cross-compilation. You must make your
+own arrangements to provide suitable library functions for
+cross-compilation.
+
+On machines where a function returns floating-point results in the 80387
+register stack, some floating-point opcodes may be emitted even if
+@option{-msoft-float} is used.
+
+@item -mno-fp-ret-in-387
+@opindex mno-fp-ret-in-387
+Do not use the FPU registers for return values of functions.
+
+The usual calling convention has functions return values of types
+@code{float} and @code{double} in an FPU register, even if there
+is no FPU@. The idea is that the operating system should emulate
+an FPU@.
+
+The option @option{-mno-fp-ret-in-387} causes such values to be returned
+in ordinary CPU registers instead.
+
+@item -mno-fancy-math-387
+@opindex mno-fancy-math-387
+Some 387 emulators do not support the @code{sin}, @code{cos} and
+@code{sqrt} instructions for the 387. Specify this option to avoid
+generating those instructions. This option is the default on FreeBSD,
+OpenBSD and NetBSD@. This option is overridden when @option{-march}
+indicates that the target CPU always has an FPU and so the
+instruction does not need emulation. These
+instructions are not generated unless you also use the
+@option{-funsafe-math-optimizations} switch.
+
+@item -malign-double
+@itemx -mno-align-double
+@opindex malign-double
+@opindex mno-align-double
+Control whether GCC aligns @code{double}, @code{long double}, and
+@code{long long} variables on a two-word boundary or a one-word
+boundary. Aligning @code{double} variables on a two-word boundary
+produces code that runs somewhat faster on a Pentium at the
+expense of more memory.
+
+On x86-64, @option{-malign-double} is enabled by default.
+
+@strong{Warning:} if you use the @option{-malign-double} switch,
+structures containing the above types are aligned differently than
+the published application binary interface specifications for the x86-32
+and are not binary compatible with structures in code compiled
+without that switch.
+
+@item -m96bit-long-double
+@itemx -m128bit-long-double
+@opindex m96bit-long-double
+@opindex m128bit-long-double
+These switches control the size of the @code{long double} type. The x86-32
+application binary interface specifies the size to be 96 bits,
+so @option{-m96bit-long-double} is the default in 32-bit mode.
+
+Modern architectures (Pentium and newer) prefer @code{long double}
+to be aligned to an 8- or 16-byte boundary. In arrays or structures
+conforming to the ABI, this is not possible. So specifying
+@option{-m128bit-long-double} aligns @code{long double}
+to a 16-byte boundary by padding the @code{long double} with an additional
+32-bit zero.
+
+In the x86-64 compiler, @option{-m128bit-long-double} is the default choice as
+its ABI specifies that @code{long double} is aligned on a 16-byte boundary.
+
+Notice that neither of these options enables any extra precision over the x87
+standard of 80 bits for a @code{long double}.
+
+@strong{Warning:} if you override the default value for your target ABI, this
+changes the size of
+structures and arrays containing @code{long double} variables,
+as well as modifying the function calling convention for functions taking
+@code{long double}. Hence they are not binary-compatible
+with code compiled without that switch.
+
+@item -mlong-double-64
+@itemx -mlong-double-80
+@itemx -mlong-double-128
+@opindex mlong-double-64
+@opindex mlong-double-80
+@opindex mlong-double-128
+These switches control the size of the @code{long double} type. A size
+of 64 bits makes the @code{long double} type equivalent to the @code{double}
+type. This is the default for the 32-bit Bionic C library. A size
+of 128 bits makes the @code{long double} type equivalent to the
+@code{__float128} type. This is the default for the 64-bit Bionic C library.
+
+@strong{Warning:} if you override the default value for your target ABI, this
+changes the size of
+structures and arrays containing @code{long double} variables,
+as well as modifying the function calling convention for functions taking
+@code{long double}. Hence they are not binary-compatible
+with code compiled without that switch.
+
+@item -malign-data=@var{type}
+@opindex malign-data
+Control how GCC aligns variables. Supported values for @var{type} are
+@samp{compat}, which uses an increased alignment value compatible with GCC 4.8
+and earlier; @samp{abi}, which uses the alignment value specified by the
+psABI; and @samp{cacheline}, which uses an increased alignment value to match
+the cache line size. @samp{compat} is the default.
+
+@item -mlarge-data-threshold=@var{threshold}
+@opindex mlarge-data-threshold
+When @option{-mcmodel=medium} is specified, data objects larger than
+@var{threshold} are placed in the large data section. This value must be the
+same across all objects linked into the binary, and defaults to 65535.
+
+@item -mrtd
+@opindex mrtd
+Use a different function-calling convention, in which functions that
+take a fixed number of arguments return with the @code{ret @var{num}}
+instruction, which pops their arguments while returning. This saves one
+instruction in the caller since there is no need to pop the arguments
+there.
+
+You can specify that an individual function is called with this calling
+sequence with the function attribute @code{stdcall}. You can also
+override the @option{-mrtd} option by using the function attribute
+@code{cdecl}. @xref{Function Attributes}.
+
+@strong{Warning:} this calling convention is incompatible with the one
+normally used on Unix, so you cannot use it if you need to call
+libraries compiled with the Unix compiler.
+
+Also, you must provide function prototypes for all functions that
+take variable numbers of arguments (including @code{printf});
+otherwise incorrect code is generated for calls to those
+functions.
+
+In addition, seriously incorrect code results if you call a
+function with too many arguments. (Normally, extra arguments are
+harmlessly ignored.)
+
+@item -mregparm=@var{num}
+@opindex mregparm
+Control how many registers are used to pass integer arguments. By
+default, no registers are used to pass arguments, and at most 3
+registers can be used. You can control this behavior for a specific
+function by using the function attribute @code{regparm}.
+@xref{Function Attributes}.
+
+@strong{Warning:} if you use this switch, and
+@var{num} is nonzero, then you must build all modules with the same
+value, including any libraries. This includes the system libraries and
+startup modules.
+
+@item -msseregparm
+@opindex msseregparm
+Use SSE register passing conventions for @code{float} and @code{double} arguments
+and return values. You can control this behavior for a specific
+function by using the function attribute @code{sseregparm}.
+@xref{Function Attributes}.
+
+@strong{Warning:} if you use this switch then you must build all
+modules with the same value, including any libraries. This includes
+the system libraries and startup modules.
+
+@item -mvect8-ret-in-mem
+@opindex mvect8-ret-in-mem
+Return 8-byte vectors in memory instead of MMX registers. This is the
+default on Solaris@tie{}8 and 9 and VxWorks to match the ABI of the Sun
+Studio compilers until version 12. Later compiler versions (starting
+with Studio 12 Update@tie{}1) follow the ABI used by other x86 targets, which
+is the default on Solaris@tie{}10 and later. @emph{Only} use this option if
+you need to remain compatible with existing code produced by those
+previous compiler versions or older versions of GCC@.
+
+@item -mpc32
+@itemx -mpc64
+@itemx -mpc80
+@opindex mpc32
+@opindex mpc64
+@opindex mpc80
+
+Set 80387 floating-point precision to 32, 64 or 80 bits. When @option{-mpc32}
+is specified, the significands of results of floating-point operations are
+rounded to 24 bits (single precision); @option{-mpc64} rounds the
+significands of results of floating-point operations to 53 bits (double
+precision) and @option{-mpc80} rounds the significands of results of
+floating-point operations to 64 bits (extended double precision), which is
+the default. When this option is used, floating-point operations in higher
+precisions are not available to the programmer without setting the FPU
+control word explicitly.
+
+Setting the rounding of floating-point operations to less than the default
+80 bits can speed some programs by 2% or more. Note that some mathematical
+libraries assume that extended-precision (80-bit) floating-point operations
+are enabled by default; routines in such libraries could suffer significant
+loss of accuracy, typically through so-called ``catastrophic cancellation'',
+when this option is used to set the precision to less than extended precision.
+
+@item -mstackrealign
+@opindex mstackrealign
+Realign the stack at entry. On the x86, the @option{-mstackrealign}
+option generates an alternate prologue and epilogue that realigns the
+run-time stack if necessary. This supports mixing legacy codes that keep
+4-byte stack alignment with modern codes that keep 16-byte stack alignment for
+SSE compatibility. See also the attribute @code{force_align_arg_pointer},
+applicable to individual functions.
+
+@item -mpreferred-stack-boundary=@var{num}
+@opindex mpreferred-stack-boundary
+Attempt to keep the stack boundary aligned to a 2 raised to @var{num}
+byte boundary. If @option{-mpreferred-stack-boundary} is not specified,
+the default is 4 (16 bytes or 128 bits).
+
+@strong{Warning:} When generating code for the x86-64 architecture with
+SSE extensions disabled, @option{-mpreferred-stack-boundary=3} can be
+used to keep the stack boundary aligned to an 8-byte boundary. Since the
+x86-64 ABI requires 16-byte stack alignment, this is ABI-incompatible and
+intended to be used in controlled environments where stack space is an
+important limitation. This option leads to wrong code when functions
+compiled with 16-byte stack alignment (such as functions from a standard
+library) are called with a misaligned stack. In this case, SSE
+instructions may lead to misaligned memory access traps. In addition,
+variable arguments are handled incorrectly for 16-byte aligned
+objects (including x87 @code{long double} and @code{__int128}), leading to wrong
+results. You must build all modules with
+@option{-mpreferred-stack-boundary=3}, including any libraries. This
+includes the system libraries and startup modules.
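
The relation between @var{num} and the resulting alignment described for @option{-mpreferred-stack-boundary} is simply a power of two. A minimal sketch of that arithmetic follows; the helper name `stack_boundary_bytes` is hypothetical and is not part of GCC.

```c
#include <assert.h>

/* Alignment in bytes implied by -mpreferred-stack-boundary=num,
   that is, 2 raised to the power num.  Hypothetical helper for
   illustration only.  */
static unsigned long stack_boundary_bytes(unsigned num)
{
    /* Keep the shift well-defined for the width of unsigned long.  */
    assert(num < 8 * sizeof(unsigned long));
    return 1UL << num;
}
```

With this helper, the default value 4 corresponds to 16 bytes, the value 3 mentioned in the warning corresponds to 8 bytes, and the value 2 corresponds to 4 bytes.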
+
+@item -mincoming-stack-boundary=@var{num}
+@opindex mincoming-stack-boundary
+Assume the incoming stack is aligned to a 2 raised to @var{num} byte
+boundary. If @option{-mincoming-stack-boundary} is not specified,
+the one specified by @option{-mpreferred-stack-boundary} is used.
+
+On Pentium and Pentium Pro, @code{double} and @code{long double} values
+should be aligned to an 8-byte boundary (see @option{-malign-double}) or
+suffer significant run time performance penalties. On Pentium III, the
+Streaming SIMD Extension (SSE) data type @code{__m128} may not work
+properly if it is not 16-byte aligned.
+
+To ensure proper alignment of these values on the stack, the stack boundary
+must be as aligned as that required by any value stored on the stack.
+Further, every function must be generated such that it keeps the stack
+aligned. Thus calling a function compiled with a higher preferred
+stack boundary from a function compiled with a lower preferred stack
+boundary most likely misaligns the stack. It is recommended that
+libraries that use callbacks always use the default setting.
+
+This extra alignment does consume extra stack space, and generally
+increases code size. Code that is sensitive to stack space usage, such
+as embedded systems and operating system kernels, may want to reduce the
+preferred alignment to @option{-mpreferred-stack-boundary=2}.
+
+@need 200
+@item -mmmx
+@opindex mmmx
+@need 200
+@itemx -msse
+@opindex msse
+@need 200
+@itemx -msse2
+@need 200
+@itemx -msse3
+@need 200
+@itemx -mssse3
+@need 200
+@itemx -msse4
+@need 200
+@itemx -msse4a
+@need 200
+@itemx -msse4.1
+@need 200
+@itemx -msse4.2
+@need 200
+@itemx -mavx
+@opindex mavx
+@need 200
+@itemx -mavx2
+@need 200
+@itemx -mavx512f
+@need 200
+@itemx -mavx512pf
+@need 200
+@itemx -mavx512er
+@need 200
+@itemx -mavx512cd
+@need 200
+@itemx -msha
+@opindex msha
+@need 200
+@itemx -maes
+@opindex maes
+@need 200
+@itemx -mpclmul
+@opindex mpclmul
+@need 200
+@itemx -mclflushopt
+@opindex mclflushopt
+@need 200
+@itemx -mfsgsbase
+@opindex mfsgsbase
+@need 200
+@itemx -mrdrnd
+@opindex mrdrnd
+@need 200
+@itemx -mf16c
+@opindex mf16c
+@need 200
+@itemx -mfma
+@opindex mfma
+@need 200
+@itemx -mfma4
+@need 200
+@itemx -mno-fma4
+@need 200
+@itemx -mprefetchwt1
+@opindex mprefetchwt1
+@need 200
+@itemx -mxop
+@opindex mxop
+@need 200
+@itemx -mlwp
+@opindex mlwp
+@need 200
+@itemx -m3dnow
+@opindex m3dnow
+@need 200
+@itemx -mpopcnt
+@opindex mpopcnt
+@need 200
+@itemx -mabm
+@opindex mabm
+@need 200
+@itemx -mbmi
+@opindex mbmi
+@need 200
+@itemx -mbmi2
+@need 200
+@itemx -mlzcnt
+@opindex mlzcnt
+@need 200
+@itemx -mfxsr
+@opindex mfxsr
+@need 200
+@itemx -mxsave
+@opindex mxsave
+@need 200
+@itemx -mxsaveopt
+@opindex mxsaveopt
+@need 200
+@itemx -mxsavec
+@opindex mxsavec
+@need 200
+@itemx -mxsaves
+@opindex mxsaves
+@need 200
+@itemx -mrtm
+@opindex mrtm
+@need 200
+@itemx -mtbm
+@opindex mtbm
+@need 200
+@itemx -mmpx
+@opindex mmpx
+These switches enable the use of instructions in the MMX, SSE,
+SSE2, SSE3, SSSE3, SSE4.1, AVX, AVX2, AVX512F, AVX512PF, AVX512ER, AVX512CD,
+SHA, AES, PCLMUL, FSGSBASE, RDRND, F16C, FMA, SSE4A, FMA4, XOP, LWP, ABM,
+BMI, BMI2, FXSR, XSAVE, XSAVEOPT, LZCNT, RTM, MPX or 3DNow!@:
+extended instruction sets. Each has a corresponding @option{-mno-} option
+to disable use of these instructions.
+
+These extensions are also available as built-in functions: see
+@ref{x86 Built-in Functions}, for details of the functions enabled and
+disabled by these switches.
+
+To generate SSE/SSE2 instructions automatically from floating-point
+code (as opposed to 387 instructions), see @option{-mfpmath=sse}.
+
+GCC suppresses SSEx instructions when @option{-mavx} is used. Instead, it
+generates new AVX instructions or AVX equivalents for all SSEx instructions
+when needed.
+
+These options enable GCC to use these extended instructions in
+generated code, even without @option{-mfpmath=sse}. Applications that
+perform run-time CPU detection must compile separate files for each
+supported architecture, using the appropriate flags. In particular,
+the file containing the CPU detection code should be compiled without
+these options.
+
+@item -mdump-tune-features
+@opindex mdump-tune-features
+This option instructs GCC to dump the names of the x86 performance
+tuning features and default settings. The names can be used in
+@option{-mtune-ctrl=@var{feature-list}}.
+
+@item -mtune-ctrl=@var{feature-list}
+@opindex mtune-ctrl=@var{feature-list}
+This option is used to do fine-grained control of x86 code generation features.
+@var{feature-list} is a comma-separated list of @var{feature} names. See also
+@option{-mdump-tune-features}. When specified, the @var{feature} is turned
+on if it is not preceded by @samp{^}; otherwise, it is turned off.
+@option{-mtune-ctrl=@var{feature-list}} is intended to be used by GCC
+developers. Using it may lead to code paths not covered by testing and can
+potentially result in compiler ICEs or runtime errors.
+
+@item -mno-default
+@opindex mno-default
+This option instructs GCC to turn off all tunable features. See also
+@option{-mtune-ctrl=@var{feature-list}} and @option{-mdump-tune-features}.
+
+@item -mcld
+@opindex mcld
+This option instructs GCC to emit a @code{cld} instruction in the prologue
+of functions that use string instructions. String instructions depend on
+the DF flag to select between autoincrement and autodecrement mode. While the
+ABI specifies the DF flag to be cleared on function entry, some operating
+systems violate this specification by not clearing the DF flag in their
+exception dispatchers. The exception handler can be invoked with the DF flag
+set, which leads to wrong direction mode when string instructions are used.
+This option can be enabled by default on 32-bit x86 targets by configuring
+GCC with the @option{--enable-cld} configure option. Generation of @code{cld}
+instructions can be suppressed with the @option{-mno-cld} compiler option
+in this case.
+
+@item -mvzeroupper
+@opindex mvzeroupper
+This option instructs GCC to emit a @code{vzeroupper} instruction
+before a transfer of control flow out of the function to minimize
+the AVX to SSE transition penalty as well as remove unnecessary @code{zeroupper}
+intrinsics.
+
+@item -mprefer-avx128
+@opindex mprefer-avx128
+This option instructs GCC to use 128-bit AVX instructions instead of
+256-bit AVX instructions in the auto-vectorizer.
+
+@item -mcx16
+@opindex mcx16
+This option enables GCC to generate @code{CMPXCHG16B} instructions.
+@code{CMPXCHG16B} allows for atomic operations on 128-bit double quadword
+(or oword) data types.
+This is useful for high-resolution counters that can be updated
+by multiple processors (or cores). This instruction is generated as part of
+atomic built-in functions: see @ref{__sync Builtins} or
+@ref{__atomic Builtins} for details.
+
+@item -msahf
+@opindex msahf
+This option enables generation of @code{SAHF} instructions in 64-bit code.
+Early Intel Pentium 4 CPUs with Intel 64 support,
+prior to the introduction of Pentium 4 G1 step in December 2005,
+lacked the @code{LAHF} and @code{SAHF} instructions
+which are supported by AMD64.
+These are load and store instructions, respectively, for certain status flags.
+In 64-bit mode, the @code{SAHF} instruction is used to optimize @code{fmod},
+@code{drem}, and @code{remainder} built-in functions;
+see @ref{Other Builtins} for details.
+
+@item -mmovbe
+@opindex mmovbe
+This option enables use of the @code{movbe} instruction to implement
+@code{__builtin_bswap32} and @code{__builtin_bswap64}.
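
As an illustration of where the byte-swap built-ins typically appear, the sketch below reads a big-endian 32-bit value from a byte buffer. The helper name `load_be32` is hypothetical. Whether the compiler actually selects @code{movbe} for it depends on the target options and optimization level, so treat this as an example of the source pattern only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit big-endian value from p (hypothetical helper).
   On a little-endian host the loaded value is byte-swapped with
   __builtin_bswap32; on a big-endian host no swap is needed.  */
static uint32_t load_be32(const unsigned char *p)
{
    uint32_t v;

    assert(p != NULL);
    memcpy(&v, p, sizeof v);   /* avoids unaligned or aliasing access */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    v = __builtin_bswap32(v);
#endif
    return v;
}
```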
+
+@item -mcrc32
+@opindex mcrc32
+This option enables built-in functions @code{__builtin_ia32_crc32qi},
+@code{__builtin_ia32_crc32hi}, @code{__builtin_ia32_crc32si} and
+@code{__builtin_ia32_crc32di} to generate the @code{crc32} machine instruction.
+
+@item -mrecip
+@opindex mrecip
+This option enables use of @code{RCPSS} and @code{RSQRTSS} instructions
+(and their vectorized variants @code{RCPPS} and @code{RSQRTPS})
+with an additional Newton-Raphson step
+to increase precision instead of @code{DIVSS} and @code{SQRTSS}
+(and their vectorized
+variants) for single-precision floating-point arguments. These instructions
+are generated only when @option{-funsafe-math-optimizations} is enabled
+together with @option{-finite-math-only} and @option{-fno-trapping-math}.
+Note that while the throughput of the sequence is higher than the throughput
+of the non-reciprocal instruction, the precision of the sequence can be
+decreased by up to 2 ulp (i.e. the inverse of 1.0 equals 0.99999994).
+
+Note that GCC implements @code{1.0f/sqrtf(@var{x})} in terms of @code{RSQRTSS}
+(or @code{RSQRTPS}) already with @option{-ffast-math} (or the above option
+combination), and doesn't need @option{-mrecip}.
+
+Also note that GCC emits the above sequence with additional Newton-Raphson step
+for vectorized single-float division and vectorized @code{sqrtf(@var{x})}
+already with @option{-ffast-math} (or the above option combination), and
+doesn't need @option{-mrecip}.
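
The Newton-Raphson step mentioned for @option{-mrecip} can be sketched in plain C. The functions below show the textbook refinement formulas applied to an initial estimate; they are an assumption-laden illustration, not the exact instruction sequence GCC emits, and the names `refine_reciprocal` and `refine_rsqrt` are hypothetical.

```c
#include <assert.h>

/* One Newton-Raphson refinement of an estimate x0 of 1/a:
   x1 = x0 * (2 - a * x0).  In generated code, x0 would come from
   an estimate instruction such as RCPSS.  */
static float refine_reciprocal(float a, float x0)
{
    assert(a != 0.0f);
    return x0 * (2.0f - a * x0);
}

/* One Newton-Raphson refinement of an estimate y0 of 1/sqrt(a):
   y1 = y0 * (1.5 - 0.5 * a * y0 * y0).  In generated code, y0 would
   come from an estimate instruction such as RSQRTSS.  */
static float refine_rsqrt(float a, float y0)
{
    assert(a > 0.0f);
    return y0 * (1.5f - 0.5f * a * y0 * y0);
}
```

Each step roughly squares the relative error of the estimate, which is why a single step after a coarse hardware estimate is usually enough for single precision, at the cost of the small accuracy loss described above.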
+
+@item -mrecip=@var{opt}
+@opindex mrecip=opt
+This option controls which reciprocal estimate instructions
+may be used. @var{opt} is a comma-separated list of options, which may
+be preceded by a @samp{!} to invert the option:
+
+@table @samp
+@item all
+Enable all estimate instructions.
+
+@item default
+Enable the default instructions, equivalent to @option{-mrecip}.
+
+@item none
+Disable all estimate instructions, equivalent to @option{-mno-recip}.
+
+@item div
+Enable the approximation for scalar division.
+
+@item vec-div
+Enable the approximation for vectorized division.
+
+@item sqrt
+Enable the approximation for scalar square root.
+
+@item vec-sqrt
+Enable the approximation for vectorized square root.
+@end table
+
+So, for example, @option{-mrecip=all,!sqrt} enables
+all of the reciprocal approximations, except for square root.
+
+@item -mveclibabi=@var{type}
+@opindex mveclibabi
+Specifies the ABI type to use for vectorizing intrinsics using an
+external library. Supported values for @var{type} are @samp{svml}
+for the Intel short
+vector math library and @samp{acml} for the AMD math core library.
+To use this option, both @option{-ftree-vectorize} and
+@option{-funsafe-math-optimizations} have to be enabled, and an SVML or ACML
+ABI-compatible library must be specified at link time.
+
+GCC currently emits calls to @code{vmldExp2},
+@code{vmldLn2}, @code{vmldLog102}, @code{vmldPow2},
+@code{vmldTanh2}, @code{vmldTan2}, @code{vmldAtan2}, @code{vmldAtanh2},
+@code{vmldCbrt2}, @code{vmldSinh2}, @code{vmldSin2}, @code{vmldAsinh2},
+@code{vmldAsin2}, @code{vmldCosh2}, @code{vmldCos2}, @code{vmldAcosh2},
+@code{vmldAcos2}, @code{vmlsExp4}, @code{vmlsLn4}, @code{vmlsLog104},
+@code{vmlsPow4}, @code{vmlsTanh4}, @code{vmlsTan4},
+@code{vmlsAtan4}, @code{vmlsAtanh4}, @code{vmlsCbrt4}, @code{vmlsSinh4},
+@code{vmlsSin4}, @code{vmlsAsinh4}, @code{vmlsAsin4}, @code{vmlsCosh4},
+@code{vmlsCos4}, @code{vmlsAcosh4} and @code{vmlsAcos4} for corresponding
+function type when @option{-mveclibabi=svml} is used, and @code{__vrd2_sin},
+@code{__vrd2_cos}, @code{__vrd2_exp}, @code{__vrd2_log}, @code{__vrd2_log2},
+@code{__vrd2_log10}, @code{__vrs4_sinf}, @code{__vrs4_cosf},
+@code{__vrs4_expf}, @code{__vrs4_logf}, @code{__vrs4_log2f},
+@code{__vrs4_log10f} and @code{__vrs4_powf} for the corresponding function type
+when @option{-mveclibabi=acml} is used.
+
+@item -mabi=@var{name}
+@opindex mabi
+Generate code for the specified calling convention. Permissible values
+are @samp{sysv} for the ABI used on GNU/Linux and other systems, and
+@samp{ms} for the Microsoft ABI. The default is to use the Microsoft
+ABI when targeting Microsoft Windows and the SysV ABI on all other systems.
+You can control this behavior for specific functions by
+using the function attributes @code{ms_abi} and @code{sysv_abi}.
+@xref{Function Attributes}.
+
+@item -mtls-dialect=@var{type}
+@opindex mtls-dialect
+Generate code to access thread-local storage using the @samp{gnu} or
+@samp{gnu2} conventions. @samp{gnu} is the conservative default;
+@samp{gnu2} is more efficient, but it may add compile- and run-time
+requirements that cannot be satisfied on all systems.
+
+@item -mpush-args
+@itemx -mno-push-args
+@opindex mpush-args
+@opindex mno-push-args
+Use PUSH operations to store outgoing parameters. This method is shorter
+and usually as fast as the method using SUB/MOV operations, and is enabled
+by default. In some cases disabling it may improve performance because of
+improved scheduling and reduced dependencies.
+
+@item -maccumulate-outgoing-args
+@opindex maccumulate-outgoing-args
+If enabled, the maximum amount of space required for outgoing arguments is
+computed in the function prologue. This is faster on most modern CPUs
+because of reduced dependencies, improved scheduling and reduced stack usage
+when the preferred stack boundary is not equal to 2. The drawback is a notable
+increase in code size. This switch implies @option{-mno-push-args}.
+
+@item -mthreads
+@opindex mthreads
+Support thread-safe exception handling on MinGW. Programs that rely
+on thread-safe exception handling must compile and link all code with the
+@option{-mthreads} option. When compiling, @option{-mthreads} defines
+@option{-D_MT}; when linking, it links in a special thread helper library
+@option{-lmingwthrd} which cleans up per-thread exception-handling data.
+
+@item -mno-align-stringops
+@opindex mno-align-stringops
+Do not align the destination of inlined string operations. This switch reduces
+code size and improves performance in case the destination is already aligned,
+but GCC doesn't know about it.
+
+@item -minline-all-stringops
+@opindex minline-all-stringops
+By default GCC inlines string operations only when the destination is
+known to be aligned to at least a 4-byte boundary.
+This option enables more inlining and increases code
+size, but may improve performance of code that depends on fast
+@code{memcpy}, @code{strlen},
+and @code{memset} for short lengths.
+
+@item -minline-stringops-dynamically
+@opindex minline-stringops-dynamically
+For string operations of unknown size, use run-time checks with
+inline code for small blocks and a library call for large blocks.
+
+@item -mstringop-strategy=@var{alg}
+@opindex mstringop-strategy=@var{alg}
+Override the internal decision heuristic for the particular algorithm to use
+for inlining string operations. The allowed values for @var{alg} are:
+
+@table @samp
+@item rep_byte
+@itemx rep_4byte
+@itemx rep_8byte
+Expand using i386 @code{rep} prefix of the specified size.
+
+@item byte_loop
+@itemx loop
+@itemx unrolled_loop
+Expand into an inline loop.
+
+@item libcall
+Always use a library call.
+@end table
+
+@item -mmemcpy-strategy=@var{strategy}
+@opindex mmemcpy-strategy=@var{strategy}
+Override the internal decision heuristic to decide if @code{__builtin_memcpy}
+should be inlined and what inline algorithm to use when the expected size
+of the copy operation is known. @var{strategy}
+is a comma-separated list of @var{alg}:@var{max_size}:@var{dest_align} triplets.
+@var{alg} is specified in @option{-mstringop-strategy}; @var{max_size} specifies
+the maximum byte size for which inline algorithm @var{alg} is allowed. For the last
+triplet, the @var{max_size} must be @code{-1}. The @var{max_size} of the triplets
+in the list must be specified in increasing order. The minimal byte size for
+@var{alg} is @code{0} for the first triplet and @code{@var{max_size} + 1} of the
+preceding range for each later triplet.
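
The range rules for the @var{strategy} list can be sketched as a small lookup. The code below assumes the list has already been split into entries; the type `strategy_range` and the function `pick_alg` are hypothetical names for illustration, and the @var{dest_align} field is deliberately left out because it does not affect which size range an entry covers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One entry of a strategy list (hypothetical type).  */
struct strategy_range {
    const char *alg;   /* an algorithm name from -mstringop-strategy */
    long max_size;     /* inclusive upper bound in bytes; -1 in the last entry */
};

/* Return the algorithm covering a copy of `size` bytes, or NULL if
   the list breaks the rules: only the last entry may (and must) use
   -1, and the other max_size values must be strictly increasing.  */
static const char *pick_alg(const struct strategy_range *r, size_t n, long size)
{
    const char *chosen = NULL;
    long min_size = 0;   /* the first range starts at 0 */

    assert(size >= 0);
    for (size_t i = 0; i < n; i++) {
        int last = (i + 1 == n);

        if (last != (r[i].max_size == -1))
            return NULL;             /* -1 missing at the end, or misplaced */
        if (!last && r[i].max_size < min_size)
            return NULL;             /* max_size not increasing */
        if (chosen == NULL && (last || size <= r[i].max_size))
            chosen = r[i].alg;
        min_size = r[i].max_size + 1;   /* the next range starts here */
    }
    return chosen;
}
```

For a list with bounds 128, 1024 and -1, the first entry covers sizes 0 to 128, the second covers 129 to 1024, and the last covers everything larger.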
+
+@item -mmemset-strategy=@var{strategy}
+@opindex mmemset-strategy=@var{strategy}
+This option is similar to @option{-mmemcpy-strategy=} except that it controls
+@code{__builtin_memset} expansion.
+
+@item -momit-leaf-frame-pointer
+@opindex momit-leaf-frame-pointer
+Don't keep the frame pointer in a register for leaf functions. This
+avoids the instructions to save, set up, and restore frame pointers and
+makes an extra register available in leaf functions. The option
+@option{-momit-leaf-frame-pointer} removes the frame pointer for leaf functions,
+which might make debugging harder.
+
+@item -mtls-direct-seg-refs
+@itemx -mno-tls-direct-seg-refs
+@opindex mtls-direct-seg-refs
+Controls whether TLS variables may be accessed with offsets from the
+TLS segment register (@code{%gs} for 32-bit, @code{%fs} for 64-bit),
+or whether the thread base pointer must be added. Whether or not this
+is valid depends on the operating system, and whether it maps the
+segment to cover the entire TLS area.
+
+For systems that use the GNU C Library, the default is on.
+
+@item -msse2avx
+@itemx -mno-sse2avx
+@opindex msse2avx
+Specify that the assembler should encode SSE instructions with VEX
+prefix. The option @option{-mavx} turns this on by default.
+
+@item -mfentry
+@itemx -mno-fentry
+@opindex mfentry
+If profiling is active (@option{-pg}), put the profiling
+counter call before the prologue.
+Note: On x86 architectures the attribute @code{ms_hook_prologue}
+isn't possible at the moment for @option{-mfentry} and @option{-pg}.
+
+@item -mrecord-mcount
+@itemx -mno-record-mcount
+@opindex mrecord-mcount
+If profiling is active (@option{-pg}), generate a @code{__mcount_loc} section
+that contains pointers to each profiling call. This is useful for
+automatically patching calls in and out.
+
+@item -mnop-mcount
+@itemx -mno-nop-mcount
+@opindex mnop-mcount
+If profiling is active (@option{-pg}), generate the calls to
+the profiling functions as nops. This is useful when they
+should be patched in later dynamically. This is likely only
+useful together with @option{-mrecord-mcount}.
+
+@item -mskip-rax-setup
+@itemx -mno-skip-rax-setup
+@opindex mskip-rax-setup
+When generating code for the x86-64 architecture with SSE extensions
+disabled, @option{-mskip-rax-setup} can be used to skip setting up the RAX
+register when there are no variable arguments passed in vector registers.
+
+@strong{Warning:} Since the RAX register is used to avoid unnecessarily
+saving vector registers on the stack when passing variable arguments,
+callees affected by this option may waste some stack space,
+misbehave or jump to a random location. GCC 4.4 and newer do not have
+those issues, regardless of the RAX register value.
+
+@item -m8bit-idiv
+@itemx -mno-8bit-idiv
+@opindex m8bit-idiv
+On some processors, like Intel Atom, 8-bit unsigned integer divide is
+much faster than 32-bit/64-bit integer divide. This option generates a
+run-time check. If both dividend and divisor are within range of 0
+to 255, 8-bit unsigned integer divide is used instead of
+32-bit/64-bit integer divide.
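
The run-time check that @option{-m8bit-idiv} generates can be pictured at the source level as follows. This is only a sketch of the control flow, assuming 32-bit unsigned operands; the function name `udiv_with_8bit_fast_path` is hypothetical, and in real use the compiler inserts the check itself rather than requiring any source change.

```c
#include <assert.h>
#include <stdint.h>

/* Divide two unsigned 32-bit values, taking a narrow fast path when
   both operands fit in the range 0 to 255 (hypothetical helper).  */
static uint32_t udiv_with_8bit_fast_path(uint32_t dividend, uint32_t divisor)
{
    assert(divisor != 0);

    /* A single OR and mask tests both operands at once.  */
    if (((dividend | divisor) & ~UINT32_C(0xff)) == 0) {
        /* Fast path: this is where the 8-bit divide instruction
           would be used in generated code.  */
        return (uint32_t)((uint8_t)dividend / (uint8_t)divisor);
    }

    /* Slow path: the ordinary 32-bit divide.  */
    return dividend / divisor;
}
```

Both paths return the same quotient; the option only pays off when small operands are common enough to outweigh the cost of the extra test and branch.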
+
+@item -mavx256-split-unaligned-load
+@itemx -mavx256-split-unaligned-store
+@opindex mavx256-split-unaligned-load
+@opindex mavx256-split-unaligned-store
+Split 32-byte AVX unaligned load and store.
+
+@item -mstack-protector-guard=@var{guard}
+@opindex mstack-protector-guard=@var{guard}
+Generate stack protection code using the canary at @var{guard}. Supported
+locations are @samp{global} for a global canary or @samp{tls} for a per-thread
+canary in the TLS block (the default). This option has an effect only when
+@option{-fstack-protector} or @option{-fstack-protector-all} is specified.
+
+@end table
+
+These @samp{-m} switches are supported in addition to the above
+on x86-64 processors in 64-bit environments.
+
+@table @gcctabopt
+@item -m32
+@itemx -m64
+@itemx -mx32
+@itemx -m16
+@opindex m32
+@opindex m64
+@opindex mx32
+@opindex m16
+Generate code for a 16-bit, 32-bit or 64-bit environment.
+The @option{-m32} option sets @code{int}, @code{long}, and pointer types
+to 32 bits, and
+generates code that runs on any i386 system.
+
+The @option{-m64} option sets @code{int} to 32 bits and @code{long} and pointer
+types to 64 bits, and generates code for the x86-64 architecture.
+For Darwin only, the @option{-m64} option also turns off the @option{-fno-pic}
+and @option{-mdynamic-no-pic} options.
+
+The @option{-mx32} option sets @code{int}, @code{long}, and pointer types
+to 32 bits, and
+generates code for the x86-64 architecture.
+
+The @option{-m16} option is the same as @option{-m32}, except that
+it outputs the @code{.code16gcc} assembly directive at the beginning of
+the assembly output so that the binary can run in 16-bit mode.
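
The type sizes selected by @option{-m32}, @option{-m64} and @option{-mx32} can be checked from within a program. The sketch below relies on the predefined macros @code{__x86_64__}, @code{__ILP32__} and @code{__i386__}, which GCC defines on x86 targets; the function name `x86_data_model` is hypothetical.

```c
#include <assert.h>

/* Report which x86 environment this translation unit was compiled
   for, and check the pointer and long sizes described above
   (hypothetical helper).  */
static const char *x86_data_model(void)
{
#if defined(__x86_64__) && defined(__ILP32__)
    /* -mx32: x86-64 instructions with 32-bit long and pointers.  */
    assert(sizeof(void *) == 4 && sizeof(long) == 4);
    return "x32";
#elif defined(__x86_64__)
    /* -m64: 64-bit pointers.  */
    assert(sizeof(void *) == 8);
    return "64-bit";
#elif defined(__i386__)
    /* -m32 (and -m16): 32-bit long and pointers.  */
    assert(sizeof(void *) == 4 && sizeof(long) == 4);
    return "32-bit";
#else
    return "not x86";
#endif
}
```

Compiling the same file with each of the three options and calling the function is a quick way to confirm which environment a given toolchain configuration actually targets.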
+
+@item -mno-red-zone
+@opindex mno-red-zone
+Do not use a so-called ``red zone'' for x86-64 code. The red zone is mandated
+by the x86-64 ABI; it is a 128-byte area beyond the location of the
+stack pointer that is not modified by signal or interrupt handlers
+and therefore can be used for temporary data without adjusting the stack
+pointer. The flag @option{-mno-red-zone} disables this red zone.
+
+@item -mcmodel=small
+@opindex mcmodel=small
+Generate code for the small code model: the program and its symbols must
+be linked in the lower 2 GB of the address space. Pointers are 64 bits.
+Programs can be statically or dynamically linked. This is the default
+code model.
+
+@item -mcmodel=kernel
+@opindex mcmodel=kernel
+Generate code for the kernel code model. The kernel runs in the
+negative 2 GB of the address space.
+This model has to be used for Linux kernel code.
+
+@item -mcmodel=medium
+@opindex mcmodel=medium
+Generate code for the medium model: the program is linked in the lower 2
+GB of the address space. Small symbols are also placed there. Symbols
+with sizes larger than @option{-mlarge-data-threshold} are put into
+large data or BSS sections and can be located above 2 GB. Programs can
+be statically or dynamically linked.
+
+@item -mcmodel=large
+@opindex mcmodel=large
+Generate code for the large model. This model makes no assumptions
+about addresses and sizes of sections.
+
+@item -maddress-mode=long
+@opindex maddress-mode=long
+Generate code for long address mode. This is only supported for 64-bit
+and x32 environments. It is the default address mode for 64-bit
+environments.
+
+@item -maddress-mode=short
+@opindex maddress-mode=short
+Generate code for short address mode. This is only supported for 32-bit
+and x32 environments. It is the default address mode for 32-bit and
+x32 environments.
+@end table
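+
+As a minimal sketch of the data models selected by @option{-m32},
+@option{-m64}, and @option{-mx32}, the following program prints the
+widths of the types discussed above. Building it once with each
+option assumes that the corresponding multilib runtime is installed.
+According to the descriptions above, @option{-m64} should report 32, 64,
+and 64, while @option{-m32} and @option{-mx32} should report 32 for all
+three:
+
+@smallexample
+#include <stdio.h>
+
+int
+main (void)
+@{
+  /* Report the widths of int, long and pointers in bits.  */
+  printf ("int: %d, long: %d, pointer: %d\n",
+          (int) sizeof (int) * 8,
+          (int) sizeof (long) * 8,
+          (int) sizeof (void *) * 8);
+  return 0;
+@}
+@end smallexample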
+
+@node x86 Windows Options
+@subsection x86 Windows Options
+@cindex x86 Windows Options
+@cindex Windows Options for x86
+
+These additional options are available for Microsoft Windows targets:
+
+@table @gcctabopt
+@item -mconsole
+@opindex mconsole
+This option
+specifies that a console application is to be generated, by
+instructing the linker to set the PE header subsystem type
+required for console applications.
+This option is available for Cygwin and MinGW targets and is
+enabled by default on those targets.
+
+@item -mdll
+@opindex mdll
+This option is available for Cygwin and MinGW targets. It
+specifies that a DLL---a dynamic link library---is to be
+generated, enabling the selection of the required runtime
+startup object and entry point.
+
+@item -mnop-fun-dllimport
+@opindex mnop-fun-dllimport
+This option is available for Cygwin and MinGW targets. It
+specifies that the @code{dllimport} attribute should be ignored.
+
+@item -mthread
+@opindex mthread
+This option is available for MinGW targets. It specifies
+that MinGW-specific thread support is to be used.
+
+@item -municode
+@opindex municode
+This option is available for MinGW-w64 targets. It causes
+the @code{UNICODE} preprocessor macro to be predefined, and
+chooses Unicode-capable runtime startup code.
+
+@item -mwin32
+@opindex mwin32
+This option is available for Cygwin and MinGW targets. It
+specifies that the typical Microsoft Windows predefined macros are to
+be set in the pre-processor, but does not influence the choice
+of runtime library/startup code.
+
+@item -mwindows
+@opindex mwindows
+This option is available for Cygwin and MinGW targets. It
+specifies that a GUI application is to be generated by
+instructing the linker to set the PE header subsystem type
+appropriately.
+
+@item -fno-set-stack-executable
+@opindex fno-set-stack-executable
+This option is available for MinGW targets. It specifies that
+the executable flag for the stack used by nested functions isn't
+set. This is necessary for binaries running in kernel mode of
+Microsoft Windows, as there the User32 API, which is used to set executable
+privileges, isn't available.
+
+@item -fwritable-relocated-rdata
+@opindex fno-writable-relocated-rdata
+This option is available for MinGW and Cygwin targets. It specifies
+that relocated data in read-only sections is put into the @code{.data}
+section. This is necessary for older runtimes that do not support
+modification of @code{.rdata} sections for pseudo-relocation.
+
+@item -mpe-aligned-commons
+@opindex mpe-aligned-commons
+This option is available for Cygwin and MinGW targets. It
+specifies that the GNU extension to the PE file format that
+permits the correct alignment of COMMON variables should be
+used when generating code. It is enabled by default if
+GCC detects that the target assembler found during configuration
+supports the feature.
+@end table
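+
+As a minimal sketch of a program built with @option{-municode}, the
+following assumes a MinGW-w64 toolchain whose Unicode-capable startup
+code calls @code{wmain} rather than @code{main}. That entry-point name
+is an assumption about the runtime, not something stated above:
+
+@smallexample
+#include <wchar.h>
+
+/* Entry point assumed for -municode; arguments are wide strings.  */
+int
+wmain (int argc, wchar_t **argv)
+@{
+  int i;
+
+  /* Echo each command-line argument on its own line.  */
+  for (i = 0; i < argc; i++)
+    wprintf (L"%ls\n", argv[i]);
+  return 0;
+@}
+@end smallexample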
+
+See also @ref{x86 Options} for the standard options.
+
@node Xstormy16 Options
@subsection Xstormy16 Options
@cindex Xstormy16 Options
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 03faa12..f2c25c2 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -1695,6 +1695,7 @@ constraints that aren't. The compiler source file mentioned in the
table heading for each architecture is the definitive reference for
the meanings of that architecture's constraints.
+@c Please keep this table alphabetized by target!
@table @emph
@item AArch64 family---@file{config/aarch64/constraints.md}
@table @code
@@ -1931,6 +1932,157 @@ A floating point constant 0.0
A memory address based on Y or Z pointer with displacement.
@end table
+@item Blackfin family---@file{config/bfin/constraints.md}
+@table @code
+@item a
+P register
+
+@item d
+D register
+
+@item z
+A call clobbered P register.
+
+@item q@var{n}
+A single register. If @var{n} is in the range 0 to 7, the corresponding D
+register. If it is @code{A}, then the register P0.
+
+@item D
+Even-numbered D register
+
+@item W
+Odd-numbered D register
+
+@item e
+Accumulator register.
+
+@item A
+Even-numbered accumulator register.
+
+@item B
+Odd-numbered accumulator register.
+
+@item b
+I register
+
+@item v
+B register
+
+@item f
+M register
+
+@item c
+Registers used for circular buffering, i.e.@: I, B, or L registers.
+
+@item C
+The CC register.
+
+@item t
+LT0 or LT1.
+
+@item k
+LC0 or LC1.
+
+@item u
+LB0 or LB1.
+
+@item x
+Any D, P, B, M, I or L register.
+
+@item y
+Additional registers typically used only in prologues and epilogues: RETS,
+RETN, RETI, RETX, RETE, ASTAT, SEQSTAT and USP.
+
+@item w
+Any register except accumulators or CC.
+
+@item Ksh
+Signed 16 bit integer (in the range @minus{}32768 to 32767)
+
+@item Kuh
+Unsigned 16 bit integer (in the range 0 to 65535)
+
+@item Ks7
+Signed 7 bit integer (in the range @minus{}64 to 63)
+
+@item Ku7
+Unsigned 7 bit integer (in the range 0 to 127)
+
+@item Ku5
+Unsigned 5 bit integer (in the range 0 to 31)
+
+@item Ks4
+Signed 4 bit integer (in the range @minus{}8 to 7)
+
+@item Ks3
+Signed 3 bit integer (in the range @minus{}4 to 3)
+
+@item Ku3
+Unsigned 3 bit integer (in the range 0 to 7)
+
+@item P@var{n}
+Constant @var{n}, where @var{n} is a single-digit constant in the range 0 to 4.
+
+@item PA
+An integer equal to one of the MACFLAG_XXX constants that is suitable for
+use with either accumulator.
+
+@item PB
+An integer equal to one of the MACFLAG_XXX constants that is suitable for
+use only with accumulator A1.
+
+@item M1
+Constant 255.
+
+@item M2
+Constant 65535.
+
+@item J
+An integer constant with exactly a single bit set.
+
+@item L
+An integer constant with all bits set except exactly one.
+
+@item H
+
+@item Q
+Any SYMBOL_REF.
+@end table
+
+@item CR16 Architecture---@file{config/cr16/cr16.h}
+@table @code
+
+@item b
+Registers from r0 to r14 (registers without stack pointer)
+
+@item t
+Register from r0 to r11 (all 16-bit registers)
+
+@item p
+Register from r12 to r15 (all 32-bit registers)
+
+@item I
+Signed constant that fits in 4 bits
+
+@item J
+Signed constant that fits in 5 bits
+
+@item K
+Signed constant that fits in 6 bits
+
+@item L
+Unsigned constant that fits in 4 bits
+
+@item M
+Signed constant that fits in 32 bits
+
+@item N
+Check for 64 bits wide constants for add/sub instructions
+
+@item G
+Floating point constant that is legal for store immediate
+@end table
+
@item Epiphany---@file{config/epiphany/constraints.md}
@table @code
@item U16
@@ -2002,38 +2154,97 @@ Matches control register values to switch fp mode, which are encapsulated in
@code{UNSPEC_FP_MODE}.
@end table
-@item CR16 Architecture---@file{config/cr16/cr16.h}
+@item FRV---@file{config/frv/frv.h}
@table @code
+@item a
+Register in the class @code{ACC_REGS} (@code{acc0} to @code{acc7}).
@item b
-Registers from r0 to r14 (registers without stack pointer)
+Register in the class @code{EVEN_ACC_REGS} (@code{acc0} to @code{acc7}).
+
+@item c
+Register in the class @code{CC_REGS} (@code{fcc0} to @code{fcc3} and
+@code{icc0} to @code{icc3}).
+
+@item d
+Register in the class @code{GPR_REGS} (@code{gr0} to @code{gr63}).
+
+@item e
+Register in the class @code{EVEN_REGS} (@code{gr0} to @code{gr63}).
+Odd registers are excluded not in the class but through the use of a machine
+mode larger than 4 bytes.
+
+@item f
+Register in the class @code{FPR_REGS} (@code{fr0} to @code{fr63}).
+
+@item h
+Register in the class @code{FEVEN_REGS} (@code{fr0} to @code{fr63}).
+Odd registers are excluded not in the class but through the use of a machine
+mode larger than 4 bytes.
+
+@item l
+Register in the class @code{LR_REG} (the @code{lr} register).
+
+@item q
+Register in the class @code{QUAD_REGS} (@code{gr2} to @code{gr63}).
+Register numbers not divisible by 4 are excluded not in the class but through
+the use of a machine mode larger than 8 bytes.
@item t
-Register from r0 to r11 (all 16-bit registers)
+Register in the class @code{ICC_REGS} (@code{icc0} to @code{icc3}).
-@item p
-Register from r12 to r15 (all 32-bit registers)
+@item u
+Register in the class @code{FCC_REGS} (@code{fcc0} to @code{fcc3}).
+
+@item v
+Register in the class @code{ICR_REGS} (@code{cc4} to @code{cc7}).
+
+@item w
+Register in the class @code{FCR_REGS} (@code{cc0} to @code{cc3}).
+
+@item x
+Register in the class @code{QUAD_FPR_REGS} (@code{fr0} to @code{fr63}).
+Register numbers not divisible by 4 are excluded not in the class but through
+the use of a machine mode larger than 8 bytes.
+
+@item z
+Register in the class @code{SPR_REGS} (@code{lcr} and @code{lr}).
+
+@item A
+Register in the class @code{QUAD_ACC_REGS} (@code{acc0} to @code{acc7}).
+
+@item B
+Register in the class @code{ACCG_REGS} (@code{accg0} to @code{accg7}).
+
+@item C
+Register in the class @code{CR_REGS} (@code{cc0} to @code{cc7}).
+
+@item G
+Floating point constant zero
@item I
-Signed constant that fits in 4 bits
+6-bit signed integer constant
@item J
-Signed constant that fits in 5 bits
-
-@item K
-Signed constant that fits in 6 bits
+10-bit signed integer constant
@item L
-Unsigned constant that fits in 4 bits
+16-bit signed integer constant
@item M
-Signed constant that fits in 32 bits
+16-bit unsigned integer constant
@item N
-Check for 64 bits wide constants for add/sub instructions
+12-bit signed integer constant that is negative---i.e.@: in the
+range of @minus{}2048 to @minus{}1
+
+@item O
+Constant zero
+
+@item P
+12-bit signed integer constant that is greater than zero---i.e.@: in the
+range of 1 to 2047.
-@item G
-Floating point constant that is legal for store immediate
@end table
@item Hewlett-Packard PA-RISC---@file{config/pa/pa.h}
@@ -2107,343 +2318,6 @@ A memory operand for floating-point loads and stores
A register indirect memory operand
@end table
-@item PowerPC and IBM RS6000---@file{config/rs6000/constraints.md}
-@table @code
-@item b
-Address base register
-
-@item d
-Floating point register (containing 64-bit value)
-
-@item f
-Floating point register (containing 32-bit value)
-
-@item v
-Altivec vector register
-
-@item wa
-Any VSX register if the -mvsx option was used or NO_REGS.
-
-@item wd
-VSX vector register to hold vector double data or NO_REGS.
-
-@item wf
-VSX vector register to hold vector float data or NO_REGS.
-
-@item wg
-If @option{-mmfpgpr} was used, a floating point register or NO_REGS.
-
-@item wh
-Floating point register if direct moves are available, or NO_REGS.
-
-@item wi
-FP or VSX register to hold 64-bit integers for VSX insns or NO_REGS.
-
-@item wj
-FP or VSX register to hold 64-bit integers for direct moves or NO_REGS.
-
-@item wk
-FP or VSX register to hold 64-bit doubles for direct moves or NO_REGS.
-
-@item wl
-Floating point register if the LFIWAX instruction is enabled or NO_REGS.
-
-@item wm
-VSX register if direct move instructions are enabled, or NO_REGS.
-
-@item wn
-No register (NO_REGS).
-
-@item wr
-General purpose register if 64-bit instructions are enabled or NO_REGS.
-
-@item ws
-VSX vector register to hold scalar double values or NO_REGS.
-
-@item wt
-VSX vector register to hold 128 bit integer or NO_REGS.
-
-@item wu
-Altivec register to use for float/32-bit int loads/stores or NO_REGS.
-
-@item wv
-Altivec register to use for double loads/stores or NO_REGS.
-
-@item ww
-FP or VSX register to perform float operations under @option{-mvsx} or NO_REGS.
-
-@item wx
-Floating point register if the STFIWX instruction is enabled or NO_REGS.
-
-@item wy
-FP or VSX register to perform ISA 2.07 float ops or NO_REGS.
-
-@item wz
-Floating point register if the LFIWZX instruction is enabled or NO_REGS.
-
-@item wD
-Int constant that is the element number of the 64-bit scalar in a vector.
-
-@item wQ
-A memory address that will work with the @code{lq} and @code{stq}
-instructions.
-
-@item h
-@samp{MQ}, @samp{CTR}, or @samp{LINK} register
-
-@item q
-@samp{MQ} register
-
-@item c
-@samp{CTR} register
-
-@item l
-@samp{LINK} register
-
-@item x
-@samp{CR} register (condition register) number 0
-
-@item y
-@samp{CR} register (condition register)
-
-@item z
-@samp{XER[CA]} carry bit (part of the XER register)
-
-@item I
-Signed 16-bit constant
-
-@item J
-Unsigned 16-bit constant shifted left 16 bits (use @samp{L} instead for
-@code{SImode} constants)
-
-@item K
-Unsigned 16-bit constant
-
-@item L
-Signed 16-bit constant shifted left 16 bits
-
-@item M
-Constant larger than 31
-
-@item N
-Exact power of 2
-
-@item O
-Zero
-
-@item P
-Constant whose negation is a signed 16-bit constant
-
-@item G
-Floating point constant that can be loaded into a register with one
-instruction per word
-
-@item H
-Integer/Floating point constant that can be loaded into a register using
-three instructions
-
-@item m
-Memory operand.
-Normally, @code{m} does not allow addresses that update the base register.
-If @samp{<} or @samp{>} constraint is also used, they are allowed and
-therefore on PowerPC targets in that case it is only safe
-to use @samp{m<>} in an @code{asm} statement if that @code{asm} statement
-accesses the operand exactly once. The @code{asm} statement must also
-use @samp{%U@var{<opno>}} as a placeholder for the ``update'' flag in the
-corresponding load or store instruction. For example:
-
-@smallexample
-asm ("st%U0 %1,%0" : "=m<>" (mem) : "r" (val));
-@end smallexample
-
-is correct but:
-
-@smallexample
-asm ("st %1,%0" : "=m<>" (mem) : "r" (val));
-@end smallexample
-
-is not.
-
-@item es
-A ``stable'' memory operand; that is, one which does not include any
-automodification of the base register. This used to be useful when
-@samp{m} allowed automodification of the base register, but as those are now only
-allowed when @samp{<} or @samp{>} is used, @samp{es} is basically the same
-as @samp{m} without @samp{<} and @samp{>}.
-
-@item Q
-Memory operand that is an offset from a register (it is usually better
-to use @samp{m} or @samp{es} in @code{asm} statements)
-
-@item Z
-Memory operand that is an indexed or indirect from a register (it is
-usually better to use @samp{m} or @samp{es} in @code{asm} statements)
-
-@item R
-AIX TOC entry
-
-@item a
-Address operand that is an indexed or indirect from a register (@samp{p} is
-preferable for @code{asm} statements)
-
-@item S
-Constant suitable as a 64-bit mask operand
-
-@item T
-Constant suitable as a 32-bit mask operand
-
-@item U
-System V Release 4 small data area reference
-
-@item t
-AND masks that can be performed by two rldic@{l, r@} instructions
-
-@item W
-Vector constant that does not require memory
-
-@item j
-Vector constant that is all zeros.
-
-@end table
-
-@item x86 family---@file{config/i386/constraints.md}
-@table @code
-@item R
-Legacy register---the eight integer registers available on all
-i386 processors (@code{a}, @code{b}, @code{c}, @code{d},
-@code{si}, @code{di}, @code{bp}, @code{sp}).
-
-@item q
-Any register accessible as @code{@var{r}l}. In 32-bit mode, @code{a},
-@code{b}, @code{c}, and @code{d}; in 64-bit mode, any integer register.
-
-@item Q
-Any register accessible as @code{@var{r}h}: @code{a}, @code{b},
-@code{c}, and @code{d}.
-
-@ifset INTERNALS
-@item l
-Any register that can be used as the index in a base+index memory
-access: that is, any general register except the stack pointer.
-@end ifset
-
-@item a
-The @code{a} register.
-
-@item b
-The @code{b} register.
-
-@item c
-The @code{c} register.
-
-@item d
-The @code{d} register.
-
-@item S
-The @code{si} register.
-
-@item D
-The @code{di} register.
-
-@item A
-The @code{a} and @code{d} registers. This class is used for instructions
-that return double word results in the @code{ax:dx} register pair. Single
-word values will be allocated either in @code{ax} or @code{dx}.
-For example on i386 the following implements @code{rdtsc}:
-
-@smallexample
-unsigned long long rdtsc (void)
-@{
- unsigned long long tick;
- __asm__ __volatile__("rdtsc":"=A"(tick));
- return tick;
-@}
-@end smallexample
-
-This is not correct on x86-64 as it would allocate tick in either @code{ax}
-or @code{dx}. You have to use the following variant instead:
-
-@smallexample
-unsigned long long rdtsc (void)
-@{
- unsigned int tickl, tickh;
- __asm__ __volatile__("rdtsc":"=a"(tickl),"=d"(tickh));
- return ((unsigned long long)tickh << 32)|tickl;
-@}
-@end smallexample
-
-
-@item f
-Any 80387 floating-point (stack) register.
-
-@item t
-Top of 80387 floating-point stack (@code{%st(0)}).
-
-@item u
-Second from top of 80387 floating-point stack (@code{%st(1)}).
-
-@item y
-Any MMX register.
-
-@item x
-Any SSE register.
-
-@item Yz
-First SSE register (@code{%xmm0}).
-
-@ifset INTERNALS
-@item Y2
-Any SSE register, when SSE2 is enabled.
-
-@item Yi
-Any SSE register, when SSE2 and inter-unit moves are enabled.
-
-@item Ym
-Any MMX register, when inter-unit moves are enabled.
-@end ifset
-
-@item I
-Integer constant in the range 0 @dots{} 31, for 32-bit shifts.
-
-@item J
-Integer constant in the range 0 @dots{} 63, for 64-bit shifts.
-
-@item K
-Signed 8-bit integer constant.
-
-@item L
-@code{0xFF} or @code{0xFFFF}, for andsi as a zero-extending move.
-
-@item M
-0, 1, 2, or 3 (shifts for the @code{lea} instruction).
-
-@item N
-Unsigned 8-bit integer constant (for @code{in} and @code{out}
-instructions).
-
-@ifset INTERNALS
-@item O
-Integer constant in the range 0 @dots{} 127, for 128-bit shifts.
-@end ifset
-
-@item G
-Standard 80387 floating point constant.
-
-@item C
-Standard SSE floating point constant.
-
-@item e
-32-bit signed integer constant, or a symbolic reference known
-to fit that range (for immediate operands in sign-extending x86-64
-instructions).
-
-@item Z
-32-bit unsigned integer constant, or a symbolic reference known
-to fit that range (for immediate operands in zero-extending x86-64
-instructions).
-
-@end table
-
@item Intel IA-64---@file{config/ia64/ia64.h}
@table @code
@item a
@@ -2508,216 +2382,6 @@ now roughly the same as @samp{m} when not used together with @samp{<}
or @samp{>}.
@end table
-@item FRV---@file{config/frv/frv.h}
-@table @code
-@item a
-Register in the class @code{ACC_REGS} (@code{acc0} to @code{acc7}).
-
-@item b
-Register in the class @code{EVEN_ACC_REGS} (@code{acc0} to @code{acc7}).
-
-@item c
-Register in the class @code{CC_REGS} (@code{fcc0} to @code{fcc3} and
-@code{icc0} to @code{icc3}).
-
-@item d
-Register in the class @code{GPR_REGS} (@code{gr0} to @code{gr63}).
-
-@item e
-Register in the class @code{EVEN_REGS} (@code{gr0} to @code{gr63}).
-Odd registers are excluded not in the class but through the use of a machine
-mode larger than 4 bytes.
-
-@item f
-Register in the class @code{FPR_REGS} (@code{fr0} to @code{fr63}).
-
-@item h
-Register in the class @code{FEVEN_REGS} (@code{fr0} to @code{fr63}).
-Odd registers are excluded not in the class but through the use of a machine
-mode larger than 4 bytes.
-
-@item l
-Register in the class @code{LR_REG} (the @code{lr} register).
-
-@item q
-Register in the class @code{QUAD_REGS} (@code{gr2} to @code{gr63}).
-Register numbers not divisible by 4 are excluded not in the class but through
-the use of a machine mode larger than 8 bytes.
-
-@item t
-Register in the class @code{ICC_REGS} (@code{icc0} to @code{icc3}).
-
-@item u
-Register in the class @code{FCC_REGS} (@code{fcc0} to @code{fcc3}).
-
-@item v
-Register in the class @code{ICR_REGS} (@code{cc4} to @code{cc7}).
-
-@item w
-Register in the class @code{FCR_REGS} (@code{cc0} to @code{cc3}).
-
-@item x
-Register in the class @code{QUAD_FPR_REGS} (@code{fr0} to @code{fr63}).
-Register numbers not divisible by 4 are excluded not in the class but through
-the use of a machine mode larger than 8 bytes.
-
-@item z
-Register in the class @code{SPR_REGS} (@code{lcr} and @code{lr}).
-
-@item A
-Register in the class @code{QUAD_ACC_REGS} (@code{acc0} to @code{acc7}).
-
-@item B
-Register in the class @code{ACCG_REGS} (@code{accg0} to @code{accg7}).
-
-@item C
-Register in the class @code{CR_REGS} (@code{cc0} to @code{cc7}).
-
-@item G
-Floating point constant zero
-
-@item I
-6-bit signed integer constant
-
-@item J
-10-bit signed integer constant
-
-@item L
-16-bit signed integer constant
-
-@item M
-16-bit unsigned integer constant
-
-@item N
-12-bit signed integer constant that is negative---i.e.@: in the
-range of @minus{}2048 to @minus{}1
-
-@item O
-Constant zero
-
-@item P
-12-bit signed integer constant that is greater than zero---i.e.@: in the
-range of 1 to 2047.
-
-@end table
-
-@item Blackfin family---@file{config/bfin/constraints.md}
-@table @code
-@item a
-P register
-
-@item d
-D register
-
-@item z
-A call clobbered P register.
-
-@item q@var{n}
-A single register. If @var{n} is in the range 0 to 7, the corresponding D
-register. If it is @code{A}, then the register P0.
-
-@item D
-Even-numbered D register
-
-@item W
-Odd-numbered D register
-
-@item e
-Accumulator register.
-
-@item A
-Even-numbered accumulator register.
-
-@item B
-Odd-numbered accumulator register.
-
-@item b
-I register
-
-@item v
-B register
-
-@item f
-M register
-
-@item c
-Registers used for circular buffering, i.e. I, B, or L registers.
-
-@item C
-The CC register.
-
-@item t
-LT0 or LT1.
-
-@item k
-LC0 or LC1.
-
-@item u
-LB0 or LB1.
-
-@item x
-Any D, P, B, M, I or L register.
-
-@item y
-Additional registers typically used only in prologues and epilogues: RETS,
-RETN, RETI, RETX, RETE, ASTAT, SEQSTAT and USP.
-
-@item w
-Any register except accumulators or CC.
-
-@item Ksh
-Signed 16 bit integer (in the range @minus{}32768 to 32767)
-
-@item Kuh
-Unsigned 16 bit integer (in the range 0 to 65535)
-
-@item Ks7
-Signed 7 bit integer (in the range @minus{}64 to 63)
-
-@item Ku7
-Unsigned 7 bit integer (in the range 0 to 127)
-
-@item Ku5
-Unsigned 5 bit integer (in the range 0 to 31)
-
-@item Ks4
-Signed 4 bit integer (in the range @minus{}8 to 7)
-
-@item Ks3
-Signed 3 bit integer (in the range @minus{}3 to 4)
-
-@item Ku3
-Unsigned 3 bit integer (in the range 0 to 7)
-
-@item P@var{n}
-Constant @var{n}, where @var{n} is a single-digit constant in the range 0 to 4.
-
-@item PA
-An integer equal to one of the MACFLAG_XXX constants that is suitable for
-use with either accumulator.
-
-@item PB
-An integer equal to one of the MACFLAG_XXX constants that is suitable for
-use only with accumulator A1.
-
-@item M1
-Constant 255.
-
-@item M2
-Constant 65535.
-
-@item J
-An integer constant with exactly a single bit set.
-
-@item L
-An integer constant with all bits set except exactly one.
-
-@item H
-
-@item Q
-Any SYMBOL_REF.
-@end table
-
@item M32C---@file{config/m32c/m32c.c}
@table @code
@item Rsp
@@ -3346,6 +3010,205 @@ A memory reference that is encoded within the opcode.
@end table
+@item PowerPC and IBM RS6000---@file{config/rs6000/constraints.md}
+@table @code
+@item b
+Address base register
+
+@item d
+Floating point register (containing 64-bit value)
+
+@item f
+Floating point register (containing 32-bit value)
+
+@item v
+Altivec vector register
+
+@item wa
+Any VSX register if the @option{-mvsx} option was used or NO_REGS.
+
+@item wd
+VSX vector register to hold vector double data or NO_REGS.
+
+@item wf
+VSX vector register to hold vector float data or NO_REGS.
+
+@item wg
+If @option{-mmfpgpr} was used, a floating point register or NO_REGS.
+
+@item wh
+Floating point register if direct moves are available, or NO_REGS.
+
+@item wi
+FP or VSX register to hold 64-bit integers for VSX insns or NO_REGS.
+
+@item wj
+FP or VSX register to hold 64-bit integers for direct moves or NO_REGS.
+
+@item wk
+FP or VSX register to hold 64-bit doubles for direct moves or NO_REGS.
+
+@item wl
+Floating point register if the LFIWAX instruction is enabled or NO_REGS.
+
+@item wm
+VSX register if direct move instructions are enabled, or NO_REGS.
+
+@item wn
+No register (NO_REGS).
+
+@item wr
+General purpose register if 64-bit instructions are enabled or NO_REGS.
+
+@item ws
+VSX vector register to hold scalar double values or NO_REGS.
+
+@item wt
+VSX vector register to hold a 128-bit integer or NO_REGS.
+
+@item wu
+Altivec register to use for float/32-bit int loads/stores or NO_REGS.
+
+@item wv
+Altivec register to use for double loads/stores or NO_REGS.
+
+@item ww
+FP or VSX register to perform float operations under @option{-mvsx} or NO_REGS.
+
+@item wx
+Floating point register if the STFIWX instruction is enabled or NO_REGS.
+
+@item wy
+FP or VSX register to perform ISA 2.07 float ops or NO_REGS.
+
+@item wz
+Floating point register if the LFIWZX instruction is enabled or NO_REGS.
+
+@item wD
+Int constant that is the element number of the 64-bit scalar in a vector.
+
+@item wQ
+A memory address that will work with the @code{lq} and @code{stq}
+instructions.
+
+@item h
+@samp{MQ}, @samp{CTR}, or @samp{LINK} register
+
+@item q
+@samp{MQ} register
+
+@item c
+@samp{CTR} register
+
+@item l
+@samp{LINK} register
+
+@item x
+@samp{CR} register (condition register) number 0
+
+@item y
+@samp{CR} register (condition register)
+
+@item z
+@samp{XER[CA]} carry bit (part of the XER register)
+
+@item I
+Signed 16-bit constant
+
+@item J
+Unsigned 16-bit constant shifted left 16 bits (use @samp{L} instead for
+@code{SImode} constants)
+
+@item K
+Unsigned 16-bit constant
+
+@item L
+Signed 16-bit constant shifted left 16 bits
+
+@item M
+Constant larger than 31
+
+@item N
+Exact power of 2
+
+@item O
+Zero
+
+@item P
+Constant whose negation is a signed 16-bit constant
+
+@item G
+Floating point constant that can be loaded into a register with one
+instruction per word
+
+@item H
+Integer/Floating point constant that can be loaded into a register using
+three instructions
+
+@item m
+Memory operand.
+Normally, @code{m} does not allow addresses that update the base register.
+If the @samp{<} or @samp{>} constraint is also used, such addresses are
+allowed; on PowerPC targets it is then only safe
+to use @samp{m<>} in an @code{asm} statement if that @code{asm} statement
+accesses the operand exactly once. The @code{asm} statement must also
+use @samp{%U@var{<opno>}} as a placeholder for the ``update'' flag in the
+corresponding load or store instruction. For example:
+
+@smallexample
+asm ("st%U0 %1,%0" : "=m<>" (mem) : "r" (val));
+@end smallexample
+
+is correct but:
+
+@smallexample
+asm ("st %1,%0" : "=m<>" (mem) : "r" (val));
+@end smallexample
+
+is not.
+
+@item es
+A ``stable'' memory operand; that is, one which does not include any
+automodification of the base register. This used to be useful when
+@samp{m} allowed automodification of the base register, but as those are now only
+allowed when @samp{<} or @samp{>} is used, @samp{es} is basically the same
+as @samp{m} without @samp{<} and @samp{>}.
+
+@item Q
+Memory operand that is an offset from a register (it is usually better
+to use @samp{m} or @samp{es} in @code{asm} statements)
+
+@item Z
+Memory operand that is indexed or indirect from a register (it is
+usually better to use @samp{m} or @samp{es} in @code{asm} statements)
+
+@item R
+AIX TOC entry
+
+@item a
+Address operand that is indexed or indirect from a register (@samp{p} is
+preferable for @code{asm} statements)
+
+@item S
+Constant suitable as a 64-bit mask operand
+
+@item T
+Constant suitable as a 32-bit mask operand
+
+@item U
+System V Release 4 small data area reference
+
+@item t
+AND masks that can be performed by two rldic@{l, r@} instructions
+
+@item W
+Vector constant that does not require memory
+
+@item j
+Vector constant that is all zeros.
+
+@end table
+
@item RL78---@file{config/rl78/constraints.md}
@table @code
@@ -3462,6 +3325,79 @@ A constant in the range 0 to 15, inclusive.
@end table
+@item S/390 and zSeries---@file{config/s390/s390.h}
+@table @code
+@item a
+Address register (general purpose register except r0)
+
+@item c
+Condition code register
+
+@item d
+Data register (arbitrary general purpose register)
+
+@item f
+Floating-point register
+
+@item I
+Unsigned 8-bit constant (0--255)
+
+@item J
+Unsigned 12-bit constant (0--4095)
+
+@item K
+Signed 16-bit constant (@minus{}32768--32767)
+
+@item L
+Value appropriate as displacement.
+@table @code
+@item (0..4095)
+for short displacement
+@item (@minus{}524288..524287)
+for long displacement
+@end table
+
+@item M
+Constant integer with a value of 0x7fffffff.
+
+@item N
+Multiple letter constraint followed by 4 parameter letters.
+@table @code
+@item 0..9:
+number of the part counting from most to least significant
+@item H,Q:
+mode of the part
+@item D,S,H:
+mode of the containing operand
+@item 0,F:
+value of the other parts (F---all bits set)
+@end table
+The constraint matches if the specified part of a constant
+has a value different from its other parts.
+
+@item Q
+Memory reference without index register and with short displacement.
+
+@item R
+Memory reference with index register and short displacement.
+
+@item S
+Memory reference without index register but with long displacement.
+
+@item T
+Memory reference with index register and long displacement.
+
+@item U
+Pointer with short displacement.
+
+@item W
+Pointer with long displacement.
+
+@item Y
+Shift count operand.
+
+@end table
+
@need 1000
@item SPARC---@file{config/sparc/sparc.h}
@table @code
@@ -3634,149 +3570,6 @@ An immediate for the @code{iohl} instruction. const_int is sign extended to 128
@end table
-@item S/390 and zSeries---@file{config/s390/s390.h}
-@table @code
-@item a
-Address register (general purpose register except r0)
-
-@item c
-Condition code register
-
-@item d
-Data register (arbitrary general purpose register)
-
-@item f
-Floating-point register
-
-@item I
-Unsigned 8-bit constant (0--255)
-
-@item J
-Unsigned 12-bit constant (0--4095)
-
-@item K
-Signed 16-bit constant (@minus{}32768--32767)
-
-@item L
-Value appropriate as displacement.
-@table @code
-@item (0..4095)
-for short displacement
-@item (@minus{}524288..524287)
-for long displacement
-@end table
-
-@item M
-Constant integer with a value of 0x7fffffff.
-
-@item N
-Multiple letter constraint followed by 4 parameter letters.
-@table @code
-@item 0..9:
-number of the part counting from most to least significant
-@item H,Q:
-mode of the part
-@item D,S,H:
-mode of the containing operand
-@item 0,F:
-value of the other parts (F---all bits set)
-@end table
-The constraint matches if the specified part of a constant
-has a value different from its other parts.
-
-@item Q
-Memory reference without index register and with short displacement.
-
-@item R
-Memory reference with index register and short displacement.
-
-@item S
-Memory reference without index register but with long displacement.
-
-@item T
-Memory reference with index register and long displacement.
-
-@item U
-Pointer with short displacement.
-
-@item W
-Pointer with long displacement.
-
-@item Y
-Shift count operand.
-
-@end table
-
-@item Xstormy16---@file{config/stormy16/stormy16.h}
-@table @code
-@item a
-Register r0.
-
-@item b
-Register r1.
-
-@item c
-Register r2.
-
-@item d
-Register r8.
-
-@item e
-Registers r0 through r7.
-
-@item t
-Registers r0 and r1.
-
-@item y
-The carry register.
-
-@item z
-Registers r8 and r9.
-
-@item I
-A constant between 0 and 3 inclusive.
-
-@item J
-A constant that has exactly one bit set.
-
-@item K
-A constant that has exactly one bit clear.
-
-@item L
-A constant between 0 and 255 inclusive.
-
-@item M
-A constant between @minus{}255 and 0 inclusive.
-
-@item N
-A constant between @minus{}3 and 0 inclusive.
-
-@item O
-A constant between 1 and 4 inclusive.
-
-@item P
-A constant between @minus{}4 and @minus{}1 inclusive.
-
-@item Q
-A memory reference that is a stack push.
-
-@item R
-A memory reference that is a stack pop.
-
-@item S
-A memory reference that refers to a constant address of known value.
-
-@item T
-The register indicated by Rx (not implemented yet).
-
-@item U
-A constant that is not between 2 and 15 inclusive.
-
-@item Z
-The constant 0.
-
-@end table
-
@item TI C6X family---@file{config/c6x/constraints.md}
@table @code
@item a
@@ -4058,6 +3851,214 @@ Integer constant 0
Integer constant 32
@end table
+@item x86 family---@file{config/i386/constraints.md}
+@table @code
+@item R
+Legacy register---the eight integer registers available on all
+i386 processors (@code{a}, @code{b}, @code{c}, @code{d},
+@code{si}, @code{di}, @code{bp}, @code{sp}).
+
+@item q
+Any register accessible as @code{@var{r}l}. In 32-bit mode, @code{a},
+@code{b}, @code{c}, and @code{d}; in 64-bit mode, any integer register.
+
+@item Q
+Any register accessible as @code{@var{r}h}: @code{a}, @code{b},
+@code{c}, and @code{d}.
+
+@ifset INTERNALS
+@item l
+Any register that can be used as the index in a base+index memory
+access: that is, any general register except the stack pointer.
+@end ifset
+
+@item a
+The @code{a} register.
+
+@item b
+The @code{b} register.
+
+@item c
+The @code{c} register.
+
+@item d
+The @code{d} register.
+
+@item S
+The @code{si} register.
+
+@item D
+The @code{di} register.
+
+@item A
+The @code{a} and @code{d} registers. This class is used for instructions
+that return double-word results in the @code{ax:dx} register pair. Single-word
+values are allocated in either @code{ax} or @code{dx}.
+For example, on i386 the following implements @code{rdtsc}:
+
+@smallexample
+unsigned long long rdtsc (void)
+@{
+ unsigned long long tick;
+ __asm__ __volatile__("rdtsc":"=A"(tick));
+ return tick;
+@}
+@end smallexample
+
+This is not correct on x86-64, where @code{tick} fits in one register and is
+allocated in either @code{ax} or @code{dx}. Use the following variant instead:
+
+@smallexample
+unsigned long long rdtsc (void)
+@{
+ unsigned int tickl, tickh;
+ __asm__ __volatile__("rdtsc":"=a"(tickl),"=d"(tickh));
+ return ((unsigned long long)tickh << 32)|tickl;
+@}
+@end smallexample
+
+
+@item f
+Any 80387 floating-point (stack) register.
+
+@item t
+Top of 80387 floating-point stack (@code{%st(0)}).
+
+@item u
+Second from top of 80387 floating-point stack (@code{%st(1)}).
+
+@item y
+Any MMX register.
+
+@item x
+Any SSE register.
+
+@item Yz
+First SSE register (@code{%xmm0}).
+
+@ifset INTERNALS
+@item Y2
+Any SSE register, when SSE2 is enabled.
+
+@item Yi
+Any SSE register, when SSE2 and inter-unit moves are enabled.
+
+@item Ym
+Any MMX register, when inter-unit moves are enabled.
+@end ifset
+
+@item I
+Integer constant in the range 0 @dots{} 31, for 32-bit shifts.
+
+@item J
+Integer constant in the range 0 @dots{} 63, for 64-bit shifts.
+
+@item K
+Signed 8-bit integer constant.
+
+@item L
+@code{0xFF} or @code{0xFFFF}, for @code{andsi} as a zero-extending move.
+
+@item M
+0, 1, 2, or 3 (shifts for the @code{lea} instruction).
+
+@item N
+Unsigned 8-bit integer constant (for @code{in} and @code{out}
+instructions).
+
+@ifset INTERNALS
+@item O
+Integer constant in the range 0 @dots{} 127, for 128-bit shifts.
+@end ifset
+
+@item G
+Standard 80387 floating-point constant.
+
+@item C
+Standard SSE floating-point constant.
+
+@item e
+32-bit signed integer constant, or a symbolic reference known
+to fit that range (for immediate operands in sign-extending x86-64
+instructions).
+
+@item Z
+32-bit unsigned integer constant, or a symbolic reference known
+to fit that range (for immediate operands in zero-extending x86-64
+instructions).
+
+@end table
+
+@item Xstormy16---@file{config/stormy16/stormy16.h}
+@table @code
+@item a
+Register r0.
+
+@item b
+Register r1.
+
+@item c
+Register r2.
+
+@item d
+Register r8.
+
+@item e
+Registers r0 through r7.
+
+@item t
+Registers r0 and r1.
+
+@item y
+The carry register.
+
+@item z
+Registers r8 and r9.
+
+@item I
+A constant between 0 and 3 inclusive.
+
+@item J
+A constant that has exactly one bit set.
+
+@item K
+A constant that has exactly one bit clear.
+
+@item L
+A constant between 0 and 255 inclusive.
+
+@item M
+A constant between @minus{}255 and 0 inclusive.
+
+@item N
+A constant between @minus{}3 and 0 inclusive.
+
+@item O
+A constant between 1 and 4 inclusive.
+
+@item P
+A constant between @minus{}4 and @minus{}1 inclusive.
+
+@item Q
+A memory reference that is a stack push.
+
+@item R
+A memory reference that is a stack pop.
+
+@item S
+A memory reference that refers to a constant address of known value.
+
+@item T
+The register indicated by Rx (not implemented yet).
+
+@item U
+A constant that is not between 2 and 15 inclusive.
+
+@item Z
+The constant 0.
+
+@end table
+
@item Xtensa---@file{config/xtensa/constraints.md}
@table @code
@item a