aboutsummaryrefslogtreecommitdiff
path: root/gcc/doc
diff options
context:
space:
mode:
authorBernd Schmidt <bernds@redhat.com>2001-09-29 16:33:20 +0000
committerBernd Schmidt <bernds@gcc.gnu.org>2001-09-29 16:33:20 +0000
commit1255c85c04a16e6151e01c98d75165140a0a848d (patch)
treebe9a372813a643821987008dbcafcad682c3f60f /gcc/doc
parent86be733d75a5daae48fb2690e47d36c444d7e680 (diff)
downloadgcc-1255c85c04a16e6151e01c98d75165140a0a848d.zip
gcc-1255c85c04a16e6151e01c98d75165140a0a848d.tar.gz
gcc-1255c85c04a16e6151e01c98d75165140a0a848d.tar.bz2
Documentation for vector extensions
From-SVN: r45880
Diffstat (limited to 'gcc/doc')
-rw-r--r--gcc/doc/extend.texi71
-rw-r--r--gcc/doc/invoke.texi378
2 files changed, 449 insertions, 0 deletions
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index ea60b57..7ade488 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -430,6 +430,7 @@ extensions, accepted by GCC in C89 mode and in C++.
* Function Names:: Printable strings which are the name of the current
function.
* Return Address:: Getting the return or frame address of a function.
+* Vector Extensions:: Using vector instructions through built-in functions.
* Other Builtins:: Other built-in functions.
* Pragmas:: Pragmas accepted by GCC.
@end menu
@@ -483,6 +484,7 @@ extensions, accepted by GCC in C89 mode and in C++.
* Function Names:: Printable strings which are the name of the current
function.
* Return Address:: Getting the return or frame address of a function.
+* Vector Extensions:: Using vector instructions through built-in functions.
* Other Builtins:: Other built-in functions.
* Pragmas:: Pragmas accepted by GCC.
@end menu
@@ -4147,6 +4149,75 @@ This function should only be used with a non-zero argument for debugging
purposes.
@end deftypefn
+@node Vector Extensions
+@section Using vector instructions through built-in functions
+
+On some targets, the instruction set contains SIMD vector instructions that
+operate on multiple values contained in one large register at the same time.
+For example, on the i386 the MMX, 3Dnow! and SSE extensions can be used
+this way.
+
+The first step in using these extensions is to provide the necessary data
+types. This should be done using an appropriate @code{typedef}:
+
+@example
+typedef int v4si __attribute__ ((mode(V4SI)));
+@end example
+
+The base type @code{int} is effectively ignored by the compiler, the
+actual properties of the new type @code{v4si} are defined by the
+@code{__attribute__}. It defines the machine mode to be used; for vector
+types these have the form @code{VnB}; @code{n} should be the number of
+elements in the vector, and @code{B} should be the base mode of the
+individual elements. The following can be used as base modes:
+
+@table @code
+@item QI
+An integer that is as wide as the smallest addressable unit, usually 8 bits.
+@item HI
+An integer, twice as wide as a QI mode integer, usually 16 bits.
+@item SI
+An integer, four times as wide as a QI mode integer, usually 32 bits.
+@item DI
+An integer, eight times as wide as a QI mode integer, usually 64 bits.
+@item SF
+A floating point value, as wide as a SI mode integer, usually 32 bits.
+@item DF
+A floating point value, as wide as a DI mode integer, usually 64 bits.
+@end table
+
+Not all base types or combinations are always valid; which modes can be used
+is determined by the target machine. For example, if targetting the i386 MMX
+extensions, only @code{V8QI}, @code{V4HI} and @code{V2SI} are allowed modes.
+
+There are no @code{V1xx} vector modes - they would be identical to the
+corresponding base mode.
+
+There is no distinction between signed and unsigned vector modes. This
+distinction is made by the operations that perform on the vectors, not
+by the data type.
+
+The types defined in this manner are somewhat special, they cannot be
+used with most normal C operations (i.e., a vector addition can @emph{not}
+be represented by a normal addition of two vector type variables). You
+can declare only variables and use them in function calls and returns, as
+well as in assignments and some casts. It is possible to cast from one
+vector type to another, provided they are of the same size (in fact, you
+can also cast vectors to and from other datatypes of the same size).
+
+A port that supports vector operations provides a set of built-in functions
+that can be used to operate on vectors. For example, a function to add two
+vectors and multiply the result by a third could look like this:
+
+@example
+v4si f (v4si a, v4si b, v4si c)
+@{
+ v4si tmp = __builtin_addv4si (a, b);
+ return __builtin_mulv4si (tmp, c);
+@}
+
+@end example
+
@node Other Builtins
@section Other built-in functions provided by GCC
@cindex built-in functions
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d2b2afc..2b5f953 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -471,6 +471,7 @@ in the following sections.
-mno-fp-ret-in-387 -msoft-float -msvr3-shlib @gol
-mno-wide-multiply -mrtd -malign-double @gol
-mpreferred-stack-boundary=@var{num} @gol
+-mmmx -msse -m3dnow @gol
-mthreads -mno-align-stringops -minline-all-stringops @gol
-mpush-args -maccumulate-outgoing-args -m128bit-long-double @gol
-m96bit-long-double -mregparm=@var{num} -momit-leaf-frame-pointer}
@@ -7600,6 +7601,383 @@ to stack space usage, such as embedded systems and operating system kernels,
may want to reduce the preferred alignment to
@option{-mpreferred-stack-boundary=2}.
+@item -mmmx
+@itemx -mno-mmx
+@item -msse
+@itemx -mno-sse
+@item -m3dnow
+@itemx -mno-3dnow
+@opindex mmmx
+@opindex mno-mmx
+@opindex msse
+@opindex mno-sse
+@opindex m3dnow
+@opindex mno-3dnow
+These switches enable or disable the use of built-in functions that allow
+direct access to the MMX, SSE and 3Dnow extensions of the instruction set.
+
+The following machine modes are available for use with MMX builtins
+(@pxref{Vector Extensions}): @code{V2SI} for a vector of two 32 bit integers,
+@code{V4HI} for a vector of four 16 bit integers, and @code{V8QI} for a
+vector of eight 8 bit integers. Some of the builtins operate on MMX
+registers as a whole 64 bit entity, these use @code{DI} as their mode.
+
+If 3Dnow extensions are enabled, @code{V2SF} is used as a mode for a vector
+of two 32 bit floating point values.
+
+If SSE extensions are enabled, @code{V4SF} is used for a vector of four 32 bit
+floating point values. Some instructions use a vector of four 32 bit
+integers, these use @code{V4SI}. Finally, some instructions operate on an
+entire vector register, interpreting it as a 128 bit integer, these use mode
+@code{TI}.
+
+The following builtins are made available by @option{-mmmx}:
+@table @code
+@item v8qi __builtin_ia32_paddb (v8qi, v8qi)
+Generates the @code{paddb} machine instruction.
+@item v4hi __builtin_ia32_paddw (v4hi, v4hi)
+Generates the @code{paddw} machine instruction.
+@item v2si __builtin_ia32_paddd (v2si, v2si)
+Generates the @code{paddd} machine instruction.
+@item v8qi __builtin_ia32_psubb (v8qi, v8qi)
+Generates the @code{psubb} machine instruction.
+@item v4hi __builtin_ia32_psubw (v4hi, v4hi)
+Generates the @code{psubw} machine instruction.
+@item v2si __builtin_ia32_psubd (v2si, v2si)
+Generates the @code{psubd} machine instruction.
+
+@item v8qi __builtin_ia32_paddsb (v8qi, v8qi)
+Generates the @code{paddsb} machine instruction.
+@item v4hi __builtin_ia32_paddsw (v4hi, v4hi)
+Generates the @code{paddsw} machine instruction.
+@item v8qi __builtin_ia32_psubsb (v8qi, v8qi)
+Generates the @code{psubsb} machine instruction.
+@item v4hi __builtin_ia32_psubsw (v4hi, v4hi)
+Generates the @code{psubsw} machine instruction.
+
+@item v8qi __builtin_ia32_paddusb (v8qi, v8qi)
+Generates the @code{paddusb} machine instruction.
+@item v4hi __builtin_ia32_paddusw (v4hi, v4hi)
+Generates the @code{paddusw} machine instruction.
+@item v8qi __builtin_ia32_psubusb (v8qi, v8qi)
+Generates the @code{psubusb} machine instruction.
+@item v4hi __builtin_ia32_psubusw (v4hi, v4hi)
+Generates the @code{psubusw} machine instruction.
+
+@item v4hi __builtin_ia32_pmullw (v4hi, v4hi)
+Generates the @code{pmullw} machine instruction.
+@item v4hi __builtin_ia32_pmulhw (v4hi, v4hi)
+Generates the @code{pmulhw} machine instruction.
+
+@item di __builtin_ia32_pand (di, di)
+Generates the @code{pand} machine instruction.
+@item di __builtin_ia32_pandn (di,di)
+Generates the @code{pandn} machine instruction.
+@item di __builtin_ia32_por (di, di)
+Generates the @code{por} machine instruction.
+@item di __builtin_ia32_pxor (di, di)
+Generates the @code{pxor} machine instruction.
+
+@item v8qi __builtin_ia32_pcmpeqb (v8qi, v8qi)
+Generates the @code{pcmpeqb} machine instruction.
+@item v4hi __builtin_ia32_pcmpeqw (v4hi, v4hi)
+Generates the @code{pcmpeqw} machine instruction.
+@item v2si __builtin_ia32_pcmpeqd (v2si, v2si)
+Generates the @code{pcmpeqd} machine instruction.
+@item v8qi __builtin_ia32_pcmpgtb (v8qi, v8qi)
+Generates the @code{pcmpgtb} machine instruction.
+@item v4hi __builtin_ia32_pcmpgtw (v4hi, v4hi)
+Generates the @code{pcmpgtw} machine instruction.
+@item v2si __builtin_ia32_pcmpgtd (v2si, v2si)
+Generates the @code{pcmpgtd} machine instruction.
+
+@item v8qi __builtin_ia32_punpckhbw (v8qi, v8qi)
+Generates the @code{punpckhbw} machine instruction.
+@item v4hi __builtin_ia32_punpckhwd (v4hi, v4hi)
+Generates the @code{punpckhwd} machine instruction.
+@item v2si __builtin_ia32_punpckhdq (v2si, v2si)
+Generates the @code{punpckhdq} machine instruction.
+@item v8qi __builtin_ia32_punpcklbw (v8qi, v8qi)
+Generates the @code{punpcklbw} machine instruction.
+@item v4hi __builtin_ia32_punpcklwd (v4hi, v4hi)
+Generates the @code{punpcklwd} machine instruction.
+@item v2si __builtin_ia32_punpckldq (v2si, v2si)
+Generates the @code{punpckldq} machine instruction.
+
+@item v8qi __builtin_ia32_packsswb (v4hi, v4hi)
+Generates the @code{packsswb} machine instruction.
+@item v4hi __builtin_ia32_packssdw (v2si, v2si)
+Generates the @code{packssdw} machine instruction.
+@item v8qi __builtin_ia32_packuswb (v4hi, v4hi)
+Generates the @code{packuswb} machine instruction.
+
+@end table
+
+The following builtins are made available either with @option{-msse}, or
+with a combination of @option{-m3dnow} and @option{-march=athlon}.
+@table @code
+
+@item v4hi __builtin_ia32_pmulhuw (v4hi, v4hi)
+Generates the @code{pmulhuw} machine instruction.
+
+@item v8qi __builtin_ia32_pavgb (v8qi, v8qi)
+Generates the @code{pavgb} machine instruction.
+@item v4hi __builtin_ia32_pavgw (v4hi, v4hi)
+Generates the @code{pavgw} machine instruction.
+@item v4hi __builtin_ia32_psadbw (v8qi, v8qi)
+Generates the @code{psadbw} machine instruction.
+
+@item v8qi __builtin_ia32_pmaxub (v8qi, v8qi)
+Generates the @code{pmaxub} machine instruction.
+@item v4hi __builtin_ia32_pmaxsw (v4hi, v4hi)
+Generates the @code{pmaxsw} machine instruction.
+@item v8qi __builtin_ia32_pminub (v8qi, v8qi)
+Generates the @code{pminub} machine instruction.
+@item v4hi __builtin_ia32_pminsw (v4hi, v4hi)
+Generates the @code{pminsw} machine instruction.
+
+@item int __builtin_ia32_pextrw (v4hi, int)
+Generates the @code{pextrw} machine instruction.
+@item v4hi __builtin_ia32_pinsrw (v4hi, int, int)
+Generates the @code{pinsrw} machine instruction.
+
+@item int __builtin_ia32_pmovmskb (v8qi)
+Generates the @code{pmovmskb} machine instruction.
+@item void __builtin_ia32_maskmovq (v8qi, v8qi, char *)
+Generates the @code{maskmovq} machine instruction.
+@item void __buitlin_ia32_movntq (di *, di)
+Generates the @code{movntq} machine instruction.
+@item void __buitlin_ia32_sfence (void)
+Generates the @code{sfence} machine instruction.
+@item void __builtin_ia32_prefetch (char *, int selector)
+Generates a prefetch machine instruction, depending on the value of
+selector. If @code{selector} is 0, it generates @code{prefetchnta}; for
+a value of 1, it generates @code{prefetcht0}; for a value of 2, it generates
+@code{prefetcht1}; and for a value of 3 it generates @code{prefetcht2}.
+
+@end table
+
+The following builtins are available when @option{-msse} is used.
+
+@table @code
+@item int __buitlin_ia32_comieq (v4sf, v4sf)
+Generates the @code{comiss} machine instruction and performs an equality
+comparison. The return value is the truth value of that comparison.
+@item int __buitlin_ia32_comineq (v4sf, v4sf)
+Generates the @code{comiss} machine instruction and performs an inequality
+comparison. The return value is the truth value of that comparison.
+@item int __buitlin_ia32_comilt (v4sf, v4sf)
+Generates the @code{comiss} machine instruction and performs a ``less than''
+comparison. The return value is the truth value of that comparison.
+@item int __buitlin_ia32_comile (v4sf, v4sf)
+Generates the @code{comiss} machine instruction and performs a ``less or
+equal'' comparison. The return value is the truth value of that comparison.
+@item int __buitlin_ia32_comigt (v4sf, v4sf)
+Generates the @code{comiss} machine instruction and performs a ``greater than''
+comparison. The return value is the truth value of that comparison.
+@item int __buitlin_ia32_comige (v4sf, v4sf)
+Generates the @code{comiss} machine instruction and performs a ``greater or
+equal'' comparison. The return value is the truth value of that comparison.
+
+@item int __buitlin_ia32_ucomieq (v4sf, v4sf)
+Generates the @code{ucomiss} machine instruction and performs an equality
+comparison. The return value is the truth value of that comparison.
+@item int __buitlin_ia32_ucomineq (v4sf, v4sf)
+Generates the @code{ucomiss} machine instruction and performs an inequality
+comparison. The return value is the truth value of that comparison.
+@item int __buitlin_ia32_ucomilt (v4sf, v4sf)
+Generates the @code{ucomiss} machine instruction and performs a ``less than''
+comparison. The return value is the truth value of that comparison.
+@item int __buitlin_ia32_ucomile (v4sf, v4sf)
+Generates the @code{ucomiss} machine instruction and performs a ``less or
+equal'' comparison. The return value is the truth value of that comparison.
+@item int __buitlin_ia32_ucomigt (v4sf, v4sf)
+Generates the @code{ucomiss} machine instruction and performs a ``greater than''
+comparison. The return value is the truth value of that comparison.
+@item int __buitlin_ia32_ucomige (v4sf, v4sf)
+Generates the @code{ucomiss} machine instruction and performs a ``greater or
+equal'' comparison. The return value is the truth value of that comparison.
+
+@item v4sf __buitlin_ia32_addps (v4sf, v4sf)
+Generates the @code{addps} machine instruction.
+@item v4sf __buitlin_ia32_addss (v4sf, v4sf)
+Generates the @code{addss} machine instruction.
+@item v4sf __buitlin_ia32_subps (v4sf, v4sf)
+Generates the @code{subps} machine instruction.
+@item v4sf __buitlin_ia32_subss (v4sf, v4sf)
+Generates the @code{subss} machine instruction.
+@item v4sf __buitlin_ia32_mulps (v4sf, v4sf)
+Generates the @code{mulps} machine instruction.
+@item v4sf __buitlin_ia32_mulss (v4sf, v4sf)
+Generates the @code{mulss} machine instruction.
+@item v4sf __buitlin_ia32_divps (v4sf, v4sf)
+Generates the @code{divps} machine instruction.
+@item v4sf __buitlin_ia32_divss (v4sf, v4sf)
+Generates the @code{divss} machine instruction.
+
+@item v4si __buitlin_ia32_cmpeqps (v4sf, v4sf)
+Generates the @code{cmpeqps} machine instruction.
+@item v4si __buitlin_ia32_cmplts (v4sf, v4sf)
+Generates the @code{cmpltps} machine instruction.
+@item v4si __buitlin_ia32_cmpleps (v4sf, v4sf)
+Generates the @code{cmpleps} machine instruction.
+@item v4si __buitlin_ia32_cmpgtps (v4sf, v4sf)
+Generates the @code{cmpgtps} machine instruction.
+@item v4si __buitlin_ia32_cmpgeps (v4sf, v4sf)
+Generates the @code{cmpgeps} machine instruction.
+@item v4si __buitlin_ia32_cmpunordps (v4sf, v4sf)
+Generates the @code{cmpunodps} machine instruction.
+@item v4si __buitlin_ia32_cmpneqps (v4sf, v4sf)
+Generates the @code{cmpeqps} machine instruction.
+@item v4si __buitlin_ia32_cmpnltps (v4sf, v4sf)
+Generates the @code{cmpltps} machine instruction.
+@item v4si __buitlin_ia32_cmpnleps (v4sf, v4sf)
+Generates the @code{cmpleps} machine instruction.
+@item v4si __buitlin_ia32_cmpngtps (v4sf, v4sf)
+Generates the @code{cmpgtps} machine instruction.
+@item v4si __buitlin_ia32_cmpngeps (v4sf, v4sf)
+Generates the @code{cmpgeps} machine instruction.
+@item v4si __buitlin_ia32_cmpordps (v4sf, v4sf)
+Generates the @code{cmpunodps} machine instruction.
+
+@item v4si __buitlin_ia32_cmpeqss (v4sf, v4sf)
+Generates the @code{cmpeqss} machine instruction.
+@item v4si __buitlin_ia32_cmpltss (v4sf, v4sf)
+Generates the @code{cmpltss} machine instruction.
+@item v4si __buitlin_ia32_cmpless (v4sf, v4sf)
+Generates the @code{cmpless} machine instruction.
+@item v4si __buitlin_ia32_cmpgtss (v4sf, v4sf)
+Generates the @code{cmpgtss} machine instruction.
+@item v4si __buitlin_ia32_cmpgess (v4sf, v4sf)
+Generates the @code{cmpgess} machine instruction.
+@item v4si __buitlin_ia32_cmpunordss (v4sf, v4sf)
+Generates the @code{cmpunodss} machine instruction.
+@item v4si __buitlin_ia32_cmpneqss (v4sf, v4sf)
+Generates the @code{cmpeqss} machine instruction.
+@item v4si __buitlin_ia32_cmpnlts (v4sf, v4sf)
+Generates the @code{cmpltss} machine instruction.
+@item v4si __buitlin_ia32_cmpnless (v4sf, v4sf)
+Generates the @code{cmpless} machine instruction.
+@item v4si __buitlin_ia32_cmpngtss (v4sf, v4sf)
+Generates the @code{cmpgtss} machine instruction.
+@item v4si __buitlin_ia32_cmpngess (v4sf, v4sf)
+Generates the @code{cmpgess} machine instruction.
+@item v4si __buitlin_ia32_cmpordss (v4sf, v4sf)
+Generates the @code{cmpunodss} machine instruction.
+
+@item v4sf __buitlin_ia32_maxps (v4sf, v4sf)
+Generates the @code{maxps} machine instruction.
+@item v4sf __buitlin_ia32_maxsss (v4sf, v4sf)
+Generates the @code{maxss} machine instruction.
+@item v4sf __buitlin_ia32_minps (v4sf, v4sf)
+Generates the @code{minps} machine instruction.
+@item v4sf __buitlin_ia32_minsss (v4sf, v4sf)
+Generates the @code{minss} machine instruction.
+
+@item ti __buitlin_ia32_andps (ti, ti)
+Generates the @code{andps} machine instruction.
+@item ti __buitlin_ia32_andnps (ti, ti)
+Generates the @code{andnps} machine instruction.
+@item ti __buitlin_ia32_orps (ti, ti)
+Generates the @code{orps} machine instruction.
+@item ti __buitlin_ia32_xorps (ti, ti)
+Generates the @code{xorps} machine instruction.
+
+@item v4sf __buitlin_ia32_movps (v4sf, v4sf)
+Generates the @code{movps} machine instruction.
+@item v4sf __buitlin_ia32_movhlps (v4sf, v4sf)
+Generates the @code{movhlps} machine instruction.
+@item v4sf __buitlin_ia32_movlhps (v4sf, v4sf)
+Generates the @code{movlhps} machine instruction.
+@item v4sf __buitlin_ia32_unpckhps (v4sf, v4sf)
+Generates the @code{unpckhps} machine instruction.
+@item v4sf __buitlin_ia32_unpcklps (v4sf, v4sf)
+Generates the @code{unpcklps} machine instruction.
+
+@item v4sf __buitlin_ia32_cvtpi2ps (v4sf, v2si)
+Generates the @code{cvtpi2ps} machine instruction.
+@item v2si __buitlin_ia32_cvtps2pi (v4sf)
+Generates the @code{cvtps2pi} machine instruction.
+@item v4sf __buitlin_ia32_cvtsi2ss (v4sf, int)
+Generates the @code{cvtsi2ss} machine instruction.
+@item int __buitlin_ia32_cvtss2si (v4sf)
+Generates the @code{cvtsi2ss} machine instruction.
+@item v2si __buitlin_ia32_cvttps2pi (v4sf)
+Generates the @code{cvttps2pi} machine instruction.
+@item int __buitlin_ia32_cvttss2si (v4sf)
+Generates the @code{cvttsi2ss} machine instruction.
+
+@item v4sf __buitlin_ia32_rcpps (v4sf)
+Generates the @code{rcpps} machine instruction.
+@item v4sf __buitlin_ia32_rsqrtps (v4sf)
+Generates the @code{rsqrtps} machine instruction.
+@item v4sf __buitlin_ia32_sqrtps (v4sf)
+Generates the @code{sqrtps} machine instruction.
+@item v4sf __buitlin_ia32_rcpss (v4sf)
+Generates the @code{rcpss} machine instruction.
+@item v4sf __buitlin_ia32_rsqrtss (v4sf)
+Generates the @code{rsqrtss} machine instruction.
+@item v4sf __buitlin_ia32_sqrtss (v4sf)
+Generates the @code{sqrtss} machine instruction.
+
+@item v4sf __buitlin_ia32_shufps (v4sf, v4sf, int)
+Generates the @code{shufps} machine instruction.
+
+@item v4sf __buitlin_ia32_loadaps (float *)
+Generates the @code{movaps} machine instruction as a load from memory.
+@item void __buitlin_ia32_storeaps (float *, v4sf)
+Generates the @code{movaps} machine instruction as a store to memory.
+@item v4sf __buitlin_ia32_loadups (float *)
+Generates the @code{movups} machine instruction as a load from memory.
+@item void __buitlin_ia32_storeups (float *, v4sf)
+Generates the @code{movups} machine instruction as a store to memory.
+@item v4sf __buitlin_ia32_loadsss (float *)
+Generates the @code{movss} machine instruction as a load from memory.
+@item void __buitlin_ia32_storess (float *, v4sf)
+Generates the @code{movss} machine instruction as a store to memory.
+
+@item v4sf __buitlin_ia32_loadhps (v4sf, v2si *)
+Generates the @code{movhps} machine instruction as a load from memory.
+@item v4sf __buitlin_ia32_loadlps (v4sf, v2si *)
+Generates the @code{movlps} machine instruction as a load from memory
+@item void __buitlin_ia32_storehps (v4sf, v2si *)
+Generates the @code{movhps} machine instruction as a store to memory.
+@item void __buitlin_ia32_storelps (v4sf, v2si *)
+Generates the @code{movlps} machine instruction as a store to memory.
+
+@item void __buitlin_ia32_movntps (float *, v4sf)
+Generates the @code{movntps} machine instruction.
+@item int __buitlin_ia32_movmskps (v4sf)
+Generates the @code{movntps} machine instruction.
+
+@item void __buitlin_ia32_storeps1 (float *, v4sf)
+Generates the @code{movaps} machine instruction as a store to memory.
+Before storing, the value is modified with a @code{shufps} instruction
+so that the lowest of the four floating point elements is replicated
+across the entire vector that is stored.
+@item void __buitlin_ia32_storerps (float *, v4sf)
+Generates the @code{movaps} machine instruction as a store to memory.
+Before storing, the value is modified with a @code{shufps} instruction
+so that the order of the four floating point elements in the vector is
+reversed.
+@item v4sf __buitlin_ia32_loadps1 (float *)
+Generates a @code{movss} machine instruction to load a floating point
+value from memory, and a @code{shufps} instruction to replicate the
+loaded value across all four elements of the result vector.
+@item v4sf __buitlin_ia32_loadrps (float *)
+Generates a @code{movaps} machine instruction to load a vector from
+memory, and a @code{shufps} instruction to reverse the order of the
+four floating point elements in the result vector.
+@item v4sf __builtin_ia32_setps (float, float, float, float)
+Constructs a vector from four single floating point values. The return
+value is equal to the value that would result from storing the four
+arguments into consecutive memory locations and then executing a
+@code{movaps} to load the vector from memory.
+@item v4sf __builtin_ia32_setps1 (float)
+Constructs a vector from a single floating point value by replicating
+it across all four elements of the result vector.
+@end table
+
@item -mpush-args
@itemx -mno-push-args
@opindex mpush-args